Jump to content
David Schwartz

Threading question

Recommended Posts

Posted (edited)

I'm working with some code that was written many years ago, and it uses threading to circumvent delays that tended to occur back when computers were slower and it took a while for the UI to update. For the most part, these apps were headless -- just stuff that ran on a server and spat out log files to document their travails.

 

The thing is, the log files are too sparse; there's not enough info in them to support any sort of auditing. So now we're finding that the program has been failing to update things in the DB and side storage for years and nobody knew.

 

I'm trying to add code to capture more details to the log, but I'm getting random AVs that I suspect are due to the threading not properly synchronizing with the main form.

 

Access violation at address 0040DD41 in module 'DocLink.exe'. Read of address 00000001

 

The code uses this to send data from the thread to the main form:

 

procedure TSequence.StatusOut(const stat: string);
begin
  if (MainWinHandle <> 0) then
    Windows.SendMessage(MainWinHandle, WM_STATUSOUT, LongInt(PChar(stat)), 0)
end;

//----------------------------------------------
// this seems to be the other side in the main form:

    procedure WMStatusOut(var Message: TMessage); message WM_STATUSOUT;
. . .

procedure StatusOut(const stat: string);
begin
  LogWrite(stat, true, true);
  FrmMain.memStatus.Lines.Add(stat);
  Application.ProcessMessages;
end;

procedure TfrmMain.WMStatusOut(var Message: TMessage);
begin
  StatusOut(PChar(Message.WParam)); // local StatusOut procedure
  inherited;
end;

The main form stuffs it into a TMemo (memStatus) as well as sending it to a logger.

 

I've done some research and it seems that SendMessage should be sufficient. It's ok for low volumes of traffic, but as I send more data, the threads start throwing these AVs.

 

Personally, I don't see any need for threading today, but it is what it is, and I'm not about to redo everything to remove the threading.

 

Any suggestion on how to deal with this? BTW, I'm only guessing that this is the problem. We've got a version of this code that has been running for 2-1/2 years with none of these errors showing up. It's only now as I'm trying to capture more details in the log file that I'm getting these errors.

Edited by David Schwartz

Share this post


Link to post

I suspect your problem is due to reentrancy.  The code is using Windows messages to update to the UI only in the main VCL thread.  This is a common old school approach prior to the use of TTHread.Synchronize().  I would venture the issue lies in the fact that Application.ProcessMessages is called in the message handler.  As such it will attempt to process any pending messages in the message queue.  When the load created by other threads is sufficient, there will be messages in the queue, and AVs will result as the stack variables will be overwritten.  Take out the Application.ProcessMessages I would bet the AVs disappear.

 

Calling Application.ProcessMessages is common when the main VCL thread may be very busy, but you want the UI controls to update to reflect progress.  It is a CODE SMELL.  If you see it, remove it.  A background thread should be used for longer running tasks and these days there is no excuse for not using one because with the PPL, OTL or even just TThread it isn't that hard any more.  In this case someone who didn't understand how the program works must have added the call to Application.ProcessMessages or thought it might help because the method call to write the log was taking so long.  You might want to move the log writing to a background thread as well, and that would keep the entire processing headless, which is an excellent way to ensure it is testable with minimal dependencies.

 

Share this post


Link to post

Read of address with a low number means a nullpointer exception; the code is attempting to access an already freed / uninitialized object. Did you manage to catch where the AV is raised or you just see it in the logs?

I was suspecting that the thread's "stat" variable is overwritten and therefore relocated; but it's unlikely because

1, it would show a random address, not 1.

2, thanks to @Anders Melander I remembered that SendMessage does not return until the message is processed. So unless "stat" can be written from the outside...

 

Share this post


Link to post
Posted (edited)

@Larry Hengen There are only 4 controls on the main form: a checkbox, two buttons, and a TMemo. The memo gets very little traffic from these StatusOut messages, at least it did until I started adding stuff. I don't know why the Application.ProcessMessages is there other than someone thought it was a Good Idea. I'll remove it and see if your theory is correct.

 

UPDATE: Nope, it didn't help. Now I'll try changing SendMessage to PostMessage.

Edited by David Schwartz

Share this post


Link to post

Changing to PostMessage didn't help either. The messages all showed up as some kind of Asian pictographs and other junk.

 

The class being threaded is derived from TThread, if that helps with anything.

Share this post


Link to post
Posted (edited)

Hmmm. Kind of hard to guess without more information.  But, try trapping/putting breakpoints in all relevant/related object destructors and explicitly nil every object reference where possible straight after to make nil pointer derefences very obvious.  I'm inclined to think something is being freed and subsequently used without you realising and then you're tripping on an invalid reference.  The traps and setting to nil I suggest is intended to help catch where.  (That unguarded/unchecked FfrmMain reference in StatusOut() seems a bit suspicious in particular.  Ensure FfrmMain gets set to nil when the object it points to is destroyed, e.g. in the form class destructor [of course, only if self = FfrmMain!]) and then also only access memStatus in StatusOut if FfrmMain is [still] Assigned.)  Of course I could be barking up entirely the wrong tree. (I'd add, I worked on a codebase with somewhat similar goings on at a point, which worked fine, though it was the case that you could easily be hoisted by your own petard if you're not very careful, as seems to be the case here also -- there was some similar issues at one point due to message arriving in the message queue after the Release message and then code like above assumed objects were still there that actually weren't etc., or something of the sort as I recall, so I'm wondering whether something similar is going on.)  Also, the Application.ProcessMessages is also a bit suspicious and smelly, although it's might not be a problem per se here.  (It's unclear offhand to me why exactly that's there, I'd be inclined to remove it to eliminate it as a cause of trouble unless you absolutely know that it is required to avoid some other behaviour.)  Do you use something like MadExcept or Eurekalog?  (Have you got more context about what's actually happening?)  I might perhaps be open to a quick realtime screenshare look in person if you're open to this? (Discord, Zoom, Skype) Maybe another pair of eyes can help. 

Edited by ByteJuggler

Share this post


Link to post

Okay. I'll try again then. There are multiple problems in the original code.

Like I said, Application.ProcessMessages in itself isn't a problem. The problem is that when you use SendMessage then the WMStatusOut method will run in the context of the thread which means that Application.ProcessMessage will also be executed in the context of the thread and that will definitely end badly at some point. Replacing SendMessage with PostMessage will take care of that (and get rid of ProcessMessages while you're at it. It's the mark of an amateur).

 

Once the above is fixed, the more serious problem will surface: You're passing a pointer to a string owned by the thread to the main thread. You haven't posted the thread code, but I'm betting that there's nothing in your code to prevent the thread from modifying, or even freeing, this string once StatusOut has been called. I'm sure one of the usual suspects will be happy to provide you with an alternative solution.

Share this post


Link to post
2 hours ago, Anders Melander said:

one of the usual suspects will be happy to provide you with an alternative solution. 

If I am among the suspects I would have to disappoint.  I agree that it is likely related to thread ownership of the string passed.  That is why in old code you often see the use of a record type or short strings allocated in the calling thread and freed in the receiving thread.  Like a flaming bag of dung thrown over a wall...up to the receiver to dispose of it.🤣

  • Like 2

Share this post


Link to post
Posted (edited)

ok, let's review... this code was written around 2003 and has been changed very little. The company says it's "rock-solid". The guy who maintained it before me was on the job for 8 years or so, and I'm told he was meticulous. The app we've been running is dated early 2017 and it runs every day at 10 AM. None of this is a problem. However, it has a bug that's not related to any of this that I'm trying to fix. It started in early March and I stumbled onto it only because I started adding a bunch of lines that call StatusOut to report what's going on inside this code. 

 

It's basically behaving like a box containing Shröedinger's bug ... the more I try to zoom in to observe the bug, the more unstable the whole thing gets. 🙂 I used to do a lot of embedded real-time dev work, but have very little experience with multi-threading in Windows in general and Delphi in particular.

 

The main form is very light-weight and only creates the thread and runs it. When the thread is finished, then the main form is closed. It's not getting shut down early, although I did manage to kill it once that left the thread running. That was weird to see.

 

The code in the thread calls the StatusOut( ) method to send status info that's handled as can be seen above: it's copied into a TMemo and sent to a logger. What appears in the memo and the logger are identical. It's not sending records, allocating objects, or anything that needs to be disposed on the other side of the call. It's just sending strings.

 

When I use the other method to send the message, it shows up as trash -- a bunch of Chinese or Japanese or some other text. I don't know why.

 

The messages sent are mostly string constants with IntToStr() calls sometimes added at the end.

 

I discovered there are other things happening that need to be addressed. Unfortunately, there's no way I've found to run this inside of the IDE to debug it, so I have to add debug traces to the code to see what's going on in the servers where it does run.

 

Also, there's a server with Windows XP hosting this which is what motivated this whole thing, because the company needs to move it to over Windows 10, and that's where they started to see some weird stuff. It turns out that the weird stuff they noticed is what's supposed to happen; for whatever reason, it's NOT showing up in the Win XP machine, which is strange, but does not seem to be causing any problems. I don't care, I just need it running properly in Win 10.

 

Yes, a lot of data is assigned to local string vars before it's being handed to StatusOut, and there are lots of Format statements generating data like:

 

StatusOut( Format( '. . .', [a, b, c] ) );

 

where the variables are both local vars and members of an object.

 

Since SendMessage is synchronous, the StatusOut calls are not returning until the other side has dispatched the data that was sent, right?

 

I thought this process was fairly robust, but maybe I need to set up CodeSite instead. Any thoughts on this?

 

During normal operation, the existing code fires off a Windows app as a child process. It runs for quite a while. It performs OCR on one-page PDFs. It's not interesting to watch, and all anybody does is open the VNC window to see that it's running and then they close it. I don't think anybody even looks at the log. They pretty much ignore it unless the client complains, which hasn't happened in ages from what I'm told. (I don't know why, because I discovered it has been broken so it's not attaching the PDFs to proper db records since around March 1st of this year and nobody knew.)

 

I've replaced the Windows app with an OCR library that runs as a DLL, and it does the work in two seconds per page. A process that takes 8 hours with the app takes about 16 minutes with the DLL.

 

The core part of this thing works as-is, except for one piece that's saving data into a zip file. It seems the more StatusOut calls I make to see what's going on, the more weird exceptions I get. Again, it's behaving more like a box containing Shröedinger's bug than a nice friendly Delphi app.

 

 

Edited by David Schwartz

Share this post


Link to post

You probably get an authorization message from both.
I recommend you to close it one by one and try it.

 

LogWrite(stat, true, true);

FrmMain.memStatus.Lines.Add(stat);

 

1,

First, the "log writer" write error might be returning. authorization may have changed. HDD or SSD permisssion ...!

LogWrite(stat, true, true);

 

2,

If the problem is here. New Windows updates have controls for "thread" access.

FrmMain.memStatus.Lines.Add(stat);

 

2, I suggest you look at the create line in "thread".

constructor TSequence.Create;
begin
  inherited Create;
  FHandle := AllocateHWnd(WndProc);
end;

 

I recommend changing the code by giving a few minutes.

As I understand you are posting a message from the "thread" to the forum.

 

unit Sequence;

interface

uses
  System.Classes, Winapi.Messages;

type
  TOnSequenceEvent = procedure(Text: string) of object;

type
  TSequence = class(TThread)
  private
    FHandle: THandle;
    FOnSequenceEvent: TOnSequenceEvent;
    procedure WndProc(var Message: TMessage);
    procedure SequenceEvent(Text: string);
  public
    constructor Create; overload;
    destructor Destroy; override;
  published
    property OnSequenceEvent: TOnSequenceEvent read FOnSequenceEvent write FOnSequenceEvent;
  protected
    procedure Execute; override;
  end;

implementation

{ TSequence }

constructor TSequence.Create;
begin
  inherited Create;
  FHandle := AllocateHWnd(WndProc);
end;

destructor TSequence.Destroy;
begin
  if FHandle > 0 Then
    DeallocateHWnd(FHandle);

  inherited;
end;

procedure TSequence.WndProc(var Message: TMessage);
begin
  try
    Dispatch(Message);
  except
    if Assigned(ApplicationHandleException) then
      ApplicationHandleException(Self);
  end;
end;

procedure TSequence.SequenceEvent(Text: string);
begin
  if Assigned(FOnSequenceEvent) then
    FOnSequenceEvent(Text);
end;

procedure TSequence.Execute;
begin
  { Place thread code here }
end;

end.

Share this post


Link to post
Posted (edited)

If you are using the above management, "" you don't need it.

remove.

Application.ProcessMessages;

 

unit Sequence;

interface

uses
  System.Classes, Winapi.Messages;

type
  TOnSequenceEvent = procedure(Text: string) of object;

type
  TSequence = class(TThread)
  private
    FHandle: THandle;
    FOnSequenceEvent: TOnSequenceEvent;
    procedure SequenceEvent(Text: string);
  public
    constructor Create; overload;
    destructor Destroy; override;
  published
    property OnSequenceEvent: TOnSequenceEvent read FOnSequenceEvent write FOnSequenceEvent;
  end;

implementation

{ TSequence }

constructor TSequence.Create;
begin
  inherited Create;

end;

destructor TSequence.Destroy;
begin

  inherited;
end;

procedure TSequence.SequenceEvent(Text: string);
begin
  if Assigned(FOnSequenceEvent) then
    FOnSequenceEvent(Text);
end;

end.

Edited by Turan Can

Share this post


Link to post
Posted (edited)
6 hours ago, Anders Melander said:

Okay. I'll try again then. There are multiple problems in the original code.

Like I said, Application.ProcessMessages in itself isn't a problem. The problem is that when you use SendMessage then the WMStatusOut method will run in the context of the thread which means that Application.ProcessMessage will also be executed in the context of the thread and that will definitely end badly at some point. Replacing SendMessage with PostMessage will take care of that (and get rid of ProcessMessages while you're at it. It's the mark of an amateur).

 

Once the above is fixed, the more serious problem will surface: You're passing a pointer to a string owned by the thread to the main thread. You haven't posted the thread code, but I'm betting that there's nothing in your code to prevent the thread from modifying, or even freeing, this string once StatusOut has been called. I'm sure one of the usual suspects will be happy to provide you with an alternative solution.

I'm afraid that both the points stated here are factually incorrect. 

 

SendMessage synchronously sends a message to a window. The code is executed by the thread that owns that window.

 

Secondly, there is no reason why code in one thread should not operate on a string variable allocated in a different thread. That's not even what's going on anyway. It's just a pointer to null terminated character array. And the message handler receives that and makes a new string. No problem there. 

Edited by David Heffernan
  • Like 1

Share this post


Link to post

Why do you call ProcessMessages? 

 

And you say there are AVs but you don't have any information. The two addresses are critical, both the code address and the data address. And the call stack will be vital. Aren't you using madExcept or similar? Or if you have this under the debugger then you inspect call stack when the AV is raised. 

Share this post


Link to post
Posted (edited)

Press F7 or F8 to start the debugging session for the application and in CPU View on the code pane press ctrl+g and enter $0040DD41.

Which line is it in the code?

But It's a low address, so it will be most likely system.pas, so you have to look after the call stack, either on AV or by putting a breakpoint there with a condition.

Also, there is an inherited; in the message handler, how do the other handlers look like?

Edited by Attila Kovacs
  • Like 1

Share this post


Link to post
2 hours ago, David Heffernan said:

SendMessage synchronously sends a message to a window. The code is executed by the thread that owns that window.

You are off course correct:

Quote

If the specified window was created by the calling thread, the window procedure is called immediately as a subroutine. If the specified window was created by a different thread, the system switches to that thread and calls the appropriate window procedure. Messages sent between threads are processed only when the receiving thread executes message retrieval code. The sending thread is blocked until the receiving thread processes the message.

 

2 hours ago, David Heffernan said:

Secondly, there is no reason why code in one thread should not operate on a string variable allocated in a different thread. That's not even what's going on anyway. It's just a pointer to null terminated character array. And the message handler receives that and makes a new string. No problem there. 

You are correct if the string resource is protected, but I'm not seeing any evidence of that.

The string copy you're referring to happens in the message handler when the PChar is converted to a string, but before (or while) that happens the tread can have modified or free'd the source string. It's a race condition.

Let's say the relevant code looks like this:

// Thread
procedure TMyThread.Execute;
var
  Buffer: string;
begin
  while (not Terminated) do
  begin
    Buffer := IntToStr(GetTickCount);
    PostMessage(MainWindowHandle, WM_STATUSOUT, WParam(PChar(Buffer)), 0);
    Buffer := 'Lorem Ipsum Dolor';
  end;
end;

// Main thread
procedure TMyForm.WmStatusOut(var Msg: TMessage);
var
  Buffer: string;
begin
  Buffer := PChar(Msg.wParam);
  OutputDebugString(PChar(Buffer));
  
  // Or simply:
  // OutputDebugString(PChar(Msg.wParam));
end;

I can guarantee you that you will see corrupted output.

Share this post


Link to post
Posted (edited)
40 minutes ago, Anders Melander said:

You are correct if the string resource is protected, but I'm not seeing any evidence of that.

The string copy you're referring to happens in the message handler when the PChar is converted to a string, but before (or while) that happens the tread can have modified or free'd the source string. It's a race condition.

The code in the original post uses SendMessage. Which is synchronous.

 

What you say would be true if you used PostMessage. But that is not what is happening here. I guess that you were making that second point on the assumption that PostMessage had to be used, which I didn't pick up. 

 

Debugging with tiny excerpts of code is kinda pointless. The asker can obtain a stack trace and take it from there. 

Edited by David Heffernan

Share this post


Link to post

I don't know if there's enough info provided to solve this.   But, this is what I'd do with this small bit of info:

 

First, comment out the lines in the method: TSequence.StatusOut.  Do the AV's persist?  If so, then look elsewhere.

 

Then comment out one line at a time in procedure StatusOut until the AV's stop.   Is the problem in LogWrite() or frmMain.memStatus.Lines.Add()?   I have to imagine you are assuming the problem is the memo.  If so then why is LogWrite in the example code?

 

Since we're all just guessing here, my guess is that Application.ProcessMessages was leading to a StackOverflow error, which was trapped/muted and led to a different exception elsewhere.  (As when there's one ProcessMessages call, there's likely more.)  Use ProcessMessages as often as you use the With statement....as little as possible.

 

 

Share this post


Link to post
5 hours ago, Anders Melander said:

You are off course correct:

 

You are correct if the string resource is protected, but I'm not seeing any evidence of that.

The string copy you're referring to happens in the message handler when the PChar is converted to a string, but before (or while) that happens the tread can have modified or free'd the source string. It's a race condition.

Let's say the relevant code looks like this:


// Thread
procedure TMyThread.Execute;
var
  Buffer: string;
begin
  while (not Terminated) do
  begin
    Buffer := IntToStr(GetTickCount);
    PostMessage(MainWindowHandle, WM_STATUSOUT, WParam(PChar(Buffer)), 0);
    Buffer := 'Lorem Ipsum Dolor';
  end;
end;

// Main thread
procedure TMyForm.WmStatusOut(var Msg: TMessage);
var
  Buffer: string;
begin
  Buffer := PChar(Msg.wParam);
  OutputDebugString(PChar(Buffer));
  
  // Or simply:
  // OutputDebugString(PChar(Msg.wParam));
end;

I can guarantee you that you will see corrupted output.

 

 

 

I ran your code and didn't see any corrupted output after letting it run for a few minutes.  After my eyes glazed over concentrating on the debugger messages flashing by, so I tweaked it a bit to only output if the string value is different.  See attached.   Perhaps this just needs to run overnight to eventually see the issue.  Your explanation seems very plausible but the string allocation mechanics must allow this to work (at least the vast majority of the time.)  Tested on a 4 virtual core, fairly speedy VM with Windows 10 and Delphi 10.3.3.

 

StringCorruptionTest.pas

StringCorruptionTest.dfm

Share this post


Link to post

@Darian Miller Anders is right, there is a race here. You don't need to run any code to see it. However, its all moot because this code isn't that which the asker is using. The asker's code handles the text fine. 

Share this post


Link to post

@David Heffernan  I was testing Ander's assertion "I can guarantee that you will see corrupted output"   I tested and didn't see it, at least in a limited time window.  But there were many thousands of iterations and no corruptions detected.

 

It was simply something interesting to test...

 

Share this post


Link to post
Posted (edited)

@Darian Miller I've slightly modified your code, with the sleep you can fine-tune to get it synced 😉

 

procedure TMyThread.Execute;
var
  Buffer: string;
  r: integer;
begin
  while (not Terminated) do
  begin
    r := random(3242424);
    Buffer := TEST_STRING + inttostr(r);
    PostMessage(MainWindowHandle, WM_STATUSOUT, WParam(PChar(Buffer)), r);
    // Sleep(30);
  end;
end;


// Main thread
procedure TForm1.WmStatusOut(var Msg: TMessage);
begin
  if PChar(Msg.WParam) <> TEST_STRING + inttostr(Msg.LParam) then
  begin
    OutputDebugString(PChar(Msg.WParam));
  end;
end;

 

or if we want more dramatic results:

procedure TMyThread.Execute;
var
  p: pchar;
  r: integer;
begin
  while (not Terminated) do
  begin
    r := random(3242424);
    p := StrNew(pchar(TEST_STRING + inttostr(r)));
    try
      PostMessage(MainWindowHandle, WM_STATUSOUT, WParam(p), r);
    finally
      StrDispose(p);
    end;
    // Sleep(30);
  end;
end;

 

Edited by Attila Kovacs

Share this post


Link to post
1 hour ago, Darian Miller said:

I ran your code and didn't see any corrupted output after letting it run for a few minutes.

You are using static strings so there's no string allocation involved thus no corruption. The PChar will point to a constant string that never changes. Create the string with Format or IntToStr, as in my example, and you will get corrupted strings immediately.

 

Also; Why WM_USER+1? Is WM_USER cursed?

Share this post


Link to post

I did use your example but didn't see any corruption when looking at it flash by.  I put the TickCount back and changed the WmStatusOut to

 

if StrToIntDef(PChar(Msg.WParam),-1) = -1 then
begin
  OutputDebugString(PChar(Msg.wParam));
end;

 

And there are indeed a few that show up in the messages window over time.  However, they display as normal looking numbers.  Since StrToIntDef is failing I assume there's something amiss....?

 

 

WM_USER.  Simply in the habit of always using WM_USER + something, usually a constant.

 

 

Share this post


Link to post
Posted (edited)

This is running inside a VM and I cannot install Delphi in it, so the debugger cannot be used. If it could, I probably would have tracked this down. (Running the code on my dev machine isn't possible at the moment for reasons I have not fully grasped.)

 

All I'm wanting to do here is capture log data without interfering with the operation of the logic, which seems to be what's happening here. 

 

Alternative approaches are worth considering.

 

What I'll try next: (1) install MadExcept.  (2) try to set up CodeSite Lite to catch the messages rather than the current approach.

 

We might have another logging lib available, like EurekaLog. It was used on one project I worked on a while back, and it seemed kind of clunky. I think it has a separate app to read its log files. (Or am I thinking of another one?)


Any thoughts along that line are welcome.

Edited by David Schwartz

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×