Angus Robertson

No exception handling with server crash


I've been using an ICS FTP server on several of my servers for 15 years or so, compiled using Delphi 2007, although the server component is compatible with all compilers.  I'm now trying to convert more of my applications to D11.

 

But when built with D11, the FTP server application crashes with a Windows heap corruption exception upon completion of any SSL FTP session, irrespective of what commands were processed during the session.  


The crash only happens when the application uses the OpenSSL DLLs, Win32 or Win64. If I build it with YuOpenSSL, which links the OpenSSL C code into the application, it does not fail.

 

Despite all my error handling efforts, including madExcept, the application is unable to catch the error; I just get lots of Windows Error Reporting and Application Error events.

 

The application itself logs activity, but the last thing logged is the FTP QUIT command; the application crashes before the log writes anything more to disk.

 

What is strange is that this crash only happens on server operating systems, specifically Windows Server 2012 and 2022, and only when accessing the server from another computer, not locally.

The same problem has been reproduced in two different server applications, running interactively or as a Windows service.  

 

The SSL code is well tested and widely used. It's strange that the crash only happens on Windows servers in such rare conditions, and that madExcept cannot catch the error, although the fault is not reported as being in the DLL.

 

Faulting application name: magfserver.exe, version: 2.0.0.7, time stamp: 0x62f69268
Faulting module name: ntdll.dll, version: 10.0.20348.803, time stamp: 0xbee6f04c
Exception code: 0xc0000374
Fault offset: 0x00000000001044a9
Faulting process ID: 0xe00
Faulting application start time: 0x01d8ae7494fc7fe2
Faulting application path: C:\magenta\fileserver\magfserver.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
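The "Exception code" in an Application Error event like the one above is an NTSTATUS value. As a small aid for reading such logs, here is a minimal C sketch that splits a code into the NTSTATUS fields documented by Microsoft (severity, customer bit, facility, code) and names the value in this log plus two related codes. The function names are illustrative, not from any Windows header, and the lookup table is deliberately tiny.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of a 32-bit NTSTATUS value (layout per Microsoft's MS-ERREF). */
typedef struct {
    unsigned severity;  /* bits 31-30: 0 success, 1 info, 2 warning, 3 error */
    unsigned customer;  /* bit 29: set for non-Microsoft codes */
    unsigned facility;  /* bits 27-16 */
    unsigned code;      /* bits 15-0 */
} ntstatus_fields;

ntstatus_fields ntstatus_split(uint32_t status)
{
    ntstatus_fields f;
    f.severity = (status >> 30) & 0x3u;
    f.customer = (status >> 29) & 0x1u;
    f.facility = (status >> 16) & 0xFFFu;
    f.code     = status & 0xFFFFu;
    return f;
}

/* Names for a handful of codes only; anything else is reported as unknown. */
const char *ntstatus_name(uint32_t status)
{
    static const struct { uint32_t value; const char *name; } known[] = {
        { 0xC0000005u, "STATUS_ACCESS_VIOLATION" },
        { 0xC0000374u, "STATUS_HEAP_CORRUPTION" },
        { 0xC0000409u, "STATUS_STACK_BUFFER_OVERRUN" },
    };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (known[i].value == status)
            return known[i].name;
    return "unknown";
}
```

For 0xc0000374 this gives severity 3 (error), facility 0 and code 0x0374, i.e. STATUS_HEAP_CORRUPTION.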

 

Any thoughts on why an application can fail in such rare but very specific circumstances? 

 

Angus

 


Thanks, even if I managed to create a process dump at the moment of corruption, I really would not know how to go looking for the line that caused the problem. 

 

Or why that line only fails in certain but 100% reliably repeatable circumstances, which do not include running under the Delphi debugger.

 

Angus

 


I added syslog logging to the sample, so I now have logs up to the point of the crash, which is in our function ResetSslSession while closing various handles. The crash happens after a call to SSL_free(FSsl), but the exception handler does not catch it.

 

But this still only happens for a remote connection to the server, not locally. The remote connection does raise an earlier network abort error, but this is ignored.

 

So ultimately the problem does not seem to have anything to do with our Delphi code, but rather with the runtime in the OpenSSL DLLs.

 

Angus

 


From your description, the heap corruption you experience is a Windows heap corruption, not a Delphi heap corruption.

 

Do you mean the crash happens during the execution of SSL_free(FSsl), right after it returns, or sometime later?

Are you sure FSsl is not corrupted or already freed?

 


When the exception occurs, is the fault offset always the same (I mean with the same executable)?

If it is, try setting a data breakpoint at that address so that the debugger stops when the location is changed.


Logging works before SSL_free is called, but nothing is logged afterwards. This function is called for every SSL connection in ICS, so there is no way anything is getting corrupted on a platform basis. During a normal close down, the ResetSslSession function gets called a second time with all the various pointers nulled, and behaves itself.
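The behaviour described here, where a second call to ResetSslSession finds every pointer already nulled and does nothing harmful, is the usual "free, then nil" defence against double frees. A minimal C sketch of that discipline follows, using a hypothetical session struct whose fields stand in for the SSL handles; plain malloc/free replace the real OpenSSL calls, so this illustrates the pattern only and is not the ICS code.

```c
#include <stdlib.h>

/* Hypothetical per-connection state: the char buffers stand in for the
   handles a real session holds (SSL object, BIO, peer certificate). */
typedef struct {
    char *ssl;
    char *read_bio;
    char *peer_cert;
    int   frees;      /* counts real releases, for illustration only */
} session;

/* Release one field and null it, so a repeated call sees NULL and skips it. */
static void release(char **field, int *frees)
{
    if (*field != NULL) {
        free(*field);
        *field = NULL;
        (*frees)++;
    }
}

/* Safe to call any number of times: after the first call every field is NULL. */
void session_reset(session *s)
{
    release(&s->peer_cert, &s->frees);
    release(&s->read_bio, &s->frees);
    release(&s->ssl, &s->frees);
}
```

Two caveats apply with real OpenSSL objects. Nulling only protects this one copy of a pointer: if another thread or object still holds the old value, it can still free or use it. And ownership matters: SSL_free also frees the read and write BIOs attached with SSL_set_bio, so freeing those BIOs separately afterwards would itself be a double free.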

 

Setting up remote debugging would be tedious, and I doubt it would be productive. It's only the FTP server that seems unhappy; the D10/D11 web server has been running for over a year.

 

Angus

 


Since the OS is force-closing your application, my mind goes to ASLR or DEP and their implementation on Windows Server, possibly the way OpenSSL is compiled in relation to these.

 

There is a lot of reading, but maybe you could just compile your exe with these options and see if that fixes it before you spend too much time reading the details.

https://www.ideasawakened.com/post/enabling-dep-and-aslr-to-reduce-the-attack-vector-of-your-delphi-applications-on-windows

11 hours ago, Angus Robertson said:

OpenSSL DLLs, win32 or Win64, if I build it with YuOpenSSL which links the C code into the app, it does not fail.  

This makes me think the flags used to compile OpenSSL are where your problem really is. Maybe dump those PE headers and see if you can spot the problem: https://github.com/changeofpace/PE-Header-Dump-Utilities
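For the PE header check suggested here, the DEP and ASLR opt-ins live in the DllCharacteristics field of the optional header, so comparing that field between the OpenSSL DLLs and the executable is a quick first test. Below is a minimal C sketch that reads it from a file image already loaded into memory. It assumes a well-formed PE32 or PE32+ file; the offsets follow the PE/COFF specification, while the function and macro names are my own.

```c
#include <stddef.h>
#include <stdint.h>

/* DllCharacteristics bits from the PE/COFF specification. */
#define PE_HIGH_ENTROPY_VA 0x0020u  /* 64-bit high-entropy ASLR */
#define PE_DYNAMIC_BASE    0x0040u  /* ASLR: image can be relocated */
#define PE_NX_COMPAT       0x0100u  /* DEP compatible */

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns DllCharacteristics, or -1 if the buffer is not a PE image this
   sketch understands. */
long pe_dll_characteristics(const uint8_t *img, size_t len)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;                                /* no DOS header */
    uint32_t pe = rd32(img + 0x3C);               /* e_lfanew */
    if (pe > len || len - pe < 24 + 72)
        return -1;                                /* headers truncated */
    if (rd32(img + pe) != 0x00004550u)            /* "PE\0\0" */
        return -1;
    uint16_t opt_size = rd16(img + pe + 4 + 16);  /* SizeOfOptionalHeader */
    const uint8_t *opt = img + pe + 24;           /* optional header start */
    uint16_t magic = rd16(opt);
    if ((magic != 0x10B && magic != 0x20B) || opt_size < 72)
        return -1;                                /* not PE32 or PE32+ */
    /* DllCharacteristics is at offset 70 in both PE32 and PE32+. */
    return rd16(opt + 70);
}
```

On a Windows machine with Visual Studio, `dumpbin /headers` should report the same flags under "DLL characteristics", which is a handy cross-check.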

 


Thanks, my issue does seem to be OpenSSL DLL related, although the same ResetSslSession function may have been called dozens of times during the FTP session, twice for each file uploaded or downloaded, or not once if the session fails due to authentication. 

 

I should really try older DLLs, and OpenSSL 3.0 DLLs built elsewhere, although none of this explains why the same program built with Delphi 2007 has worked fine on all my servers for 15 years with various OpenSSL releases. Maybe that is PE header related.

 

Angus

1 hour ago, Angus Robertson said:

Although none of this explains why the same program built with Delphi 2007 has worked fine on all my servers for 15 years, with various OpenSSL releases, maybe that is PE header related.

In my experience, bugs can live for a very long time while the application works perfectly. Then you change the compiler or the OS and suddenly the bug surfaces. One cause of this behavior is that the bug corrupts some memory area that is not used until the compiler or OS changes. These kinds of bugs are among the most difficult to locate.
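One way to make such a latent overrun show up deterministically, rather than waiting for a new compiler or OS to expose it, is to surround each allocation with guard bytes and check them when the block is freed. This is similar in spirit to the header/footer checks of debug memory managers. The C sketch below is a minimal, single-threaded illustration with hypothetical function names; it only detects writes past the end of a block, and only at free time.

```c
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 16
#define GUARD_BYTE 0xFD

/* Layout: [size_t size][user bytes][GUARD_SIZE guard bytes].
   The user pointer is only size_t-aligned, which is fine for byte buffers. */
void *guarded_malloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(size_t) + size + GUARD_SIZE);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);                       /* remember size */
    memset(raw + sizeof(size_t) + size, GUARD_BYTE, GUARD_SIZE);
    return raw + sizeof(size_t);
}

/* Frees the block; returns 0 if the guard is intact, -1 if something wrote
   past the end of the block. */
int guarded_free(void *p)
{
    if (p == NULL)
        return 0;
    unsigned char *raw = (unsigned char *)p - sizeof(size_t);
    size_t size;
    memcpy(&size, raw, sizeof size);
    int corrupted = 0;
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (raw[sizeof(size_t) + size + i] != GUARD_BYTE)
            corrupted = 1;
    free(raw);
    return corrupted ? -1 : 0;
}
```

A one-byte overrun that a forgiving allocator would silently absorb is reported here as soon as the block is released.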

 


@Angus Robertson did you find a solution for this issue?

I'm having exactly the same problem with my application, using the OpenSSL DLLs and an ICS server. @FPiette
In our case, the application randomly crashes with exception code 0xc0000374, which is a heap corruption.

We have had this issue for years and could never find its source. We have been debugging intensively for the past few days, and what I found is that the problem doesn't seem to happen if the sockets all run single-threaded; once we have a multi-threading scenario, that's when this error starts happening. There is no simple way to reproduce the error, as it's not deterministic; it's very random. This error happens on the server side (it's an ICS TCP server running our custom protocol).


I've not seen the heap corruption error for a long time, but the four ICS server applications (web, REST, FTP, proxy) installed on my public servers are all built with YuOpenSSL Win64. All my servers are single-threaded, although some FTP commands use a thread for lengthy operations.

 

My Windows Server 2018 in particular has one Win64 server that won't start with the OpenSSL DLLs, yet other servers work, and it works on Server 2022, but I've not looked at that for 18 months since YuOpenSSL works fine. 

 

There are major OpenSSL changes in ICS V9.1, which is almost ready for release; once that is out, I'll do some more testing with the OpenSSL DLLs. And long term, I do plan a new threaded web server, to allow more than one CPU to be used.

 

Angus

 


I have a few points on the subject:

 

1) The exception is raised in the protected part of the OS kernel, not the user part, hence the failure of the application (or madExcept) to get hold of the exception.

2) When you say it doesn't happen with the YuOpenSSL build, this brings up the elephant in the room: the YuOpenSSL build will be using the Delphi application's memory manager instead of the system heap and memory manager, and there is a big difference. While FastMM is built for speed and other features, it is very forgiving of out-of-bounds writes and accesses, because small and medium allocations are contiguous and there is no check for overflowing writes, or even for a simple read after free.

3) I'm not sure what madExcept offers, but I'm sure that EurekaLog has extended functionality to perform deep hooking and to capture and report every handled exception.

4) EurekaLog can also help by performing some fuzzing to improve the chance of catching read-after-free usage.

 

 

I think you can capture and eliminate this exception, or at least get a better understanding of its context, by going after the YuOpenSSL build rather than the OpenSSL DLLs. In my opinion, focusing on that build has the best chance of solving this once and for all, even when there is no exception being raised and no symptoms of memory misuse.
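Point 4, making use-after-free bugs easier to catch, is commonly done by poisoning freed blocks and holding them in a quarantine instead of handing them straight back to the allocator. The C sketch below shows that general technique with hypothetical names and a fixed-size, single-threaded quarantine; it is an illustration, not EurekaLog's implementation. A read after free sees the poison pattern (0xDD bytes, which make stale pointers fail loudly), and a write after free is detected when the block finally leaves quarantine.

```c
#include <stdlib.h>
#include <string.h>

#define POISON 0xDD
#define QUARANTINE_SLOTS 4

typedef struct { unsigned char *p; size_t size; } qblock;

static qblock quarantine[QUARANTINE_SLOTS];
static size_t q_next;       /* next slot to reuse (ring buffer) */
static int uaf_detected;    /* set when a poisoned block was modified */

/* Really release a quarantined block, checking the poison first. */
static void evict(qblock *b)
{
    if (b->p == NULL)
        return;
    for (size_t i = 0; i < b->size; i++)
        if (b->p[i] != POISON) { uaf_detected = 1; break; }
    free(b->p);
    b->p = NULL;
}

/* Used instead of free(): poison the block and park it in the quarantine.
   The caller passes the size; a real allocator would track it itself. */
void q_free(void *p, size_t size)
{
    if (p == NULL)
        return;
    memset(p, POISON, size);
    evict(&quarantine[q_next]);      /* oldest entry leaves the quarantine */
    quarantine[q_next].p = p;
    quarantine[q_next].size = size;
    q_next = (q_next + 1) % QUARANTINE_SLOTS;
}

/* Release everything; returns 1 if any freed block was written after free. */
int q_flush(void)
{
    for (size_t i = 0; i < QUARANTINE_SLOTS; i++)
        evict(&quarantine[i]);
    int found = uaf_detected;
    uaf_detected = 0;
    return found;
}
```

The quarantine size is a trade-off: more slots catch stale accesses that happen later, at the cost of holding more memory.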


To be honest, I've not really thought about this problem in a long time. My original comment was about the FTP server, but I transfer thousands of files daily to and from my servers, so it's not an issue that needs much of my time.

 

I do get a web server crash maybe once every couple of months, but I also get continual attacks on those servers; there are usually 100 or more IPs blocked for attacks. Sometimes the attacks are so heavy that even the firewall gives up; I had to replace it. So it's hard to say whether the problems are Delphi or OpenSSL related. All my servers have heavy logging, but it is only flushed to disk every few seconds, so it is usually lost during a crash. These are live, not experimental, servers; if they stop, my phone rings within a few minutes.
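Losing the last few seconds of log in a crash is the usual cost of buffered logging. A common compromise is to flush immediately only for high-severity lines, or for every line while hunting a crash. The C sketch below shows that idea with standard stdio only; the levels and the `log_line` function are hypothetical. fflush hands the data to the OS, which will still write it out if the process is killed afterwards; surviving a power loss or OS crash would additionally need an OS-level sync (fsync, or FlushFileBuffers on Windows).

```c
#include <stdio.h>

enum log_level { LVL_DEBUG, LVL_INFO, LVL_ERROR };

typedef struct {
    FILE *fp;
    enum log_level flush_at;   /* flush immediately at or above this level */
} logger;

/* Writes one line. Lines at or above flush_at are pushed to the OS at once,
   so they survive if the process dies right afterwards; lower-level lines
   stay in the stdio buffer until a later flush. */
int log_line(logger *lg, enum log_level level, const char *msg)
{
    if (fprintf(lg->fp, "%d %s\n", (int)level, msg) < 0)
        return -1;
    if (level >= lg->flush_at && fflush(lg->fp) != 0)
        return -1;
    return 0;
}
```

Setting `flush_at` to `LVL_DEBUG` flushes every line, which is slower but means the log reflects the last thing the server did before it died.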

 

Angus

 

 

 


Thanks, I'll look at Application Recovery and Restart, if it applies to Windows services.

 

Mine are all set to restart if the application stops, and my services go to a lot of trouble to try to save logs and terminate cleanly on any unexpected error, including emailing me. I've now delayed the email until the service restarts, since corruption sometimes meant the service locked solid instead of stopping and restarting, which needed manual intervention. I keep meaning to write a second monitoring application, but these problems are so rare I never get around to it.

 

Angus

 

 

 


@Kas Ob. I have been using EurekaLog and unfortunately this error is never caught, maybe because it's external. Even with "Catch Handled Exceptions" selected, it doesn't work. I didn't try "Handle every SafeCall exception", but the help file says this option doesn't do anything if "Catch Handled Exceptions" is selected.

 

In my debug sessions, it looks like it could be related to my multi-threading approach, which might be wrong. While it works fine when no SSL is being used, it looks like the SSL library doesn't like it, and we might be performing operations on the socket from different thread contexts.
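If operations on one connection really do arrive from several threads, the usual fix is to ensure only one thread touches that connection's SSL state at a time, either by confining the connection to a single thread or by putting every read, write and close path behind one lock, since OpenSSL does not support using a single SSL object from several threads at once. The C sketch below shows the locking approach with POSIX threads and a hypothetical connection struct, where `ssl_state` stands in for the real SSL object. `conn_close` frees and nulls the state under the same lock, so a late write from another thread sees a closed connection instead of freed memory.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define STATE_SIZE 64

/* Hypothetical connection: ssl_state stands in for the SSL object. */
typedef struct {
    pthread_mutex_t lock;
    char *ssl_state;
    size_t bytes_written;
} conn;

int conn_init(conn *c)
{
    if (pthread_mutex_init(&c->lock, NULL) != 0)
        return -1;
    c->ssl_state = malloc(STATE_SIZE);
    c->bytes_written = 0;
    return c->ssl_state != NULL ? 0 : -1;
}

/* Every operation that touches ssl_state takes the same lock. */
int conn_write(conn *c, const char *data, size_t len)
{
    int rc = -1;
    pthread_mutex_lock(&c->lock);
    if (c->ssl_state != NULL) {            /* still open: safe to use */
        memcpy(c->ssl_state, data, len < STATE_SIZE ? len : STATE_SIZE);
        c->bytes_written += len;
        rc = 0;
    }
    pthread_mutex_unlock(&c->lock);
    return rc;                             /* -1: connection already closed */
}

/* Frees the state under the lock and nulls it, so no other thread can be
   in the middle of a write when the memory goes away. Safe to repeat. */
void conn_close(conn *c)
{
    pthread_mutex_lock(&c->lock);
    free(c->ssl_state);
    c->ssl_state = NULL;
    pthread_mutex_unlock(&c->lock);
}
```

The important part is that the close path takes the same lock as the I/O paths; locking only reads and writes still leaves a window where one thread frees the object while another is using it.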

1 hour ago, Eric Bonilha said:

@Kas Ob. I have been using EurekaLog and unfortunately this error is never caught, maybe because its external, but even when selecting Catch Handled Exceptions is selected, it also doesn't work. I didn't try "Handle every SafeCall exception" but in the help file they say this option doesn't do anything if Catch Handled Exceptions is selected.

I need to explain the difference between the user-mode part of the kernel and the real OS kernel, the hidden and protected one. Let's take a simple OS API like GetTickCount: put it in Delphi code, stop the debugger on it and use Step Into, continuing until you return to Delphi code; the debugger will be able to walk all of it. But if you take one of the sensitive OS APIs, say any of the file handling (or thread) functions, and try to trace it with Step Into using the Delphi debugger or any other debugger, you will end up at an assembly instruction such as SYSCALL or SYSENTER: https://www.felixcloutier.com/x86/syscall https://www.felixcloutier.com/x86/sysenter

These are the gates into protected mode; nothing on our (user) side can see, know or predict what happens on the other side.

 

Now back to exception raising and handling (trapping): all exception handlers are chained, but the chain does not cross into protected mode. No matter where the exception is raised, it will follow that chain, BUT it will not cross that barrier.

So if an exception is raised in the user-mode part of the kernel, in your application, or in any auxiliary library (DLL), it can be caught, or at least inspected or hooked (by overriding the SEH traps), by software like EurekaLog. But if the exception is raised in protected mode, nothing on this side will handle it, and the only way to handle, inspect or trap it is with software running on that side, like a kernel debugger or the OS kernel's own exception handling code.

 

Some APIs do indeed raise an exception on their own, either by code (a software exception) or as a hardware exception (e.g. within a driver). What happens next depends on how the API was implemented. Either the exception is mitigated silently (e.g. an access violation on unallocated memory): the exception is eaten and the API returns to user mode with a simple error code as its return value (saying something like "invalid buffer"), and in user mode no one will know; in other words, neither exception handling code nor a debugger will be triggered. Or, in some very specific situations, the exception is so rare that Microsoft didn't take the time or effort to mitigate it, or it does carry some (security) risk, so it is allowed to run through the kernel exception handler (the OS-level debugger), which calls WER (itself also a debugger). This cannot be handled and leads to the process being terminated, with no control brought back to user mode, I mean none: all the threads are terminated after WER has paused them, and they are not allowed to run again. In other words, if WER is invoked, a risk factor could be involved.

 

Back to EurekaLog: it does a good job of hooking at the system level, but again, all this only applies to the user-mode part of the kernel. Beyond that, no user-mode software will help; what will help is reading the WER report and making sense of it, since it should contain a call stack.

 

1 hour ago, Eric Bonilha said:

maybe because its external

It is definitely external, but we can't be sure where. If it is in the user-mode part of the kernel, then EurekaLog should (or might) capture it, but if it is in protected mode, then no.

One thing is for sure: per the documentation, it is STATUS_HEAP_CORRUPTION, which usually comes down to something simple like a use-after-free, a heap overflow, or something similar.

 

But again, this means an OS API tried to read or write what it classified as a corrupted heap. The OS didn't allocate that memory; your software did (including the OpenSSL library), and it is responsible for the wrong memory/pointer fed to the API.

OR, in THEORY, the API was fed a correct pointer/memory/heap, but another thread wrongfully accessed it and modified something, or even freed it! That is what should be checked.

 

PS: if you supply a blocking read with a buffer to fill (file, socket, whatever) and you free that buffer while the read is still blocking, what exception do you think will be raised?

Yes, it will be STATUS_HEAP_CORRUPTION. This might not be easy to reproduce with the Delphi memory manager (FastMM), but it will be very easy with VirtualAlloc/VirtualFree. With FastMM there will be no error, because freeing memory rarely means actually returning it to the system.
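The PS describes a lifetime bug: a buffer is freed while an I/O operation still holds a pointer to it. One common way to rule this out is to reference-count the buffer, so the code that starts an operation takes a reference and the completion path drops it, and the memory is released only when the last reference goes. A minimal single-threaded C sketch with hypothetical names follows; a multi-threaded version would need atomic increments and decrements (for example C11 `atomic_fetch_sub`).

```c
#include <stdlib.h>

typedef struct {
    int refs;
    size_t size;
    unsigned char data[];   /* flexible array member holds the I/O bytes */
} iobuf;

iobuf *iobuf_new(size_t size)
{
    iobuf *b = malloc(sizeof *b + size);
    if (b != NULL) {
        b->refs = 1;        /* the creator owns the first reference */
        b->size = size;
    }
    return b;
}

/* Taken by whoever starts an operation that will use the buffer later. */
iobuf *iobuf_ref(iobuf *b)
{
    b->refs++;
    return b;
}

/* Drops one reference; returns 1 if this call actually freed the buffer. */
int iobuf_unref(iobuf *b)
{
    if (--b->refs == 0) {
        free(b);
        return 1;
    }
    return 0;
}
```

With this scheme, the owner "freeing" the buffer while a read is pending only drops its own reference; the memory stays valid until the read completes and drops the last one.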

 

Hope that was readable and helpful.


Hm, I'm interested in whether EurekaLog works at all with DDService, and if it does, whether any adjustments are needed, because I'm currently not succeeding.

You are using it in your services, @Angus Robertson, aren't you?

I can send emails manually (the account the service runs under allows it), but that's it.

madExcept didn't do anything either.

 


I maintain and use DDService, but not EurekaLog.

 

I've been using madExcept for many years, but only for logging errors, none of the restart stuff or emails.

 

Strangely, my main web server has crashed twice during the last two nights, restarting within a few seconds each time. That had not happened since November, and that was due to development bugs in ICS. The error was C0000005 (access violation), using YuOpenSSL.

 

But it could be hackers trying to exploit several low-priority vulnerabilities in OpenSSL that are due to be fixed this week.

 

Angus

 

