Jump to content
Willicious

How to debug a Not Responding program element

Recommended Posts

My Delphi program has a somewhat complex level selection menu which loads various items including graphics, level information and a treeview for selecting packs and levels.

 

Up until recently, the menu has started to hang and occasionally become unresponsive, occasionally requiring closing the program altogether using Task Manager.

 

My question is, how would I go about locating the source of the problem? Something as generic as a program hang which only happens occasionally doesn’t really point towards anything specific, but the chances are that it is something specific which is causing the issue.

 

I’m happy to share the entire .pas file for the level select menu (about 1700 lines of code) if anyone thinks it necessary, but I wondered if there might be a specific way to find out where loading bottlenecks might be happening, or how to isolate whatever is causing the program to hang. I use RAD Studio 10.4 (the program doesn’t compile at all on 11 or higher).

 

Suggestions welcome, thank you.

Share this post


Link to post

Install madExcept and use that to detect and report when the application freezes (provided the freeze is caused by the application not pumping the message queue).

  • Thanks 1

Share this post


Link to post

Pausing the app in the debugger and inspecting each thread might also offer clues.

In my experience, such hangs are typically caused by

- forgetting to call an inherited method

- deadlock on a resouce

- loop that has exit conditions that are never met

and in rare cases, slow recursions with low stack impact

 

 

  • Thanks 1

Share this post


Link to post
25 minutes ago, Willicious said:

My Delphi program has a somewhat complex level selection menu which loads various items including graphics, level information and a treeview for selecting packs and levels.

 

Up until recently, the menu has started to hang and occasionally become unresponsive, occasionally requiring closing the program altogether using Task Manager.

 

My question is, how would I go about locating the source of the problem? Something as generic as a program hang which only happens occasionally doesn’t really point towards anything specific, but the chances are that it is something specific which is causing the issue.

 

I’m happy to share the entire .pas file for the level select menu (about 1700 lines of code) if anyone thinks it necessary, but I wondered if there might be a specific way to find out where loading bottlenecks might be happening, or how to isolate whatever is causing the program to hang. I use RAD Studio 10.4 (the program doesn’t compile at all on 11 or higher).

 

Suggestions welcome, thank you.

If you can reproduce the problem in the debugger the Run -> Program Pause menu item should get you the code location the program is currently executing.

  • Thanks 1

Share this post


Link to post

Sometimes a logging can help. Log the start and finish of every action you take when the menu is executed (starting loading this / finished, starting that / finished). The part that does not have the finished event in the log is caused the hang.

 

 

 

  • Like 2
  • Thanks 1

Share this post


Link to post
Posted (edited)
On 5/16/2024 at 11:08 AM, Anders Melander said:

Install madExcept and use that to detect and report when the application freezes (provided the freeze is caused by the application not pumping the message queue).

Thanks for the suggestion! Installed, enabled, compiled.

 

It almost seems a bit pathetic to be typing "what do I do now?" but... I really have no other option :classic_blink: Maybe I can buff it up with some other more specific questions:

Does madExcept have a GUI? If not, how can I see its output?

Can I point madExcept to a specific .pas file to look for possible optimizations? If not, how can I know that madExcept is looking at what I need it to look at?

 

You know, that sort of thing :classic_wink:

 

EDIT: Better question - how would I know if the hangs are caused by the app pumping the message queue?

Edited by Willicious

Share this post


Link to post
On 5/16/2024 at 11:17 AM, Lars Fosdal said:

Pausing the app in the debugger and inspecting each thread might also offer clues.

This helped, thanks. I put a breakpoint at the only procedure called by the treeview's click event handler:

procedure TFLevelSelect.tvLevelSelectClick(Sender: TObject);
begin
  SetInfo; <-------------------- breakpoint here
end;

What I noticed is that there was occasionally a significant delay between clicking the treeview and the debugger pausing the program, but only for certain treeview items. Some pause immediately, others take longer. This is consistent with the behaviour I'm noticing with the release version.

 

I'm interested to know what this might point to. Why would the program sometimes be taking a while to acknowledge input?

Share this post


Link to post
On 5/16/2024 at 11:19 AM, PeterBelow said:

If you can reproduce the problem in the debugger the Run -> Program Pause menu item should get you the code location the program is currently executing.

This one is very useful to know in general, thanks.

 

Does Program Pause have a hotkey? Since I'm needing to debug input, I need to be able to pause pretty much immediately after I've clicked the treeview.

Share this post


Link to post
On 5/16/2024 at 11:43 AM, Lajos Juhász said:

Sometimes a logging can help. Log the start and finish of every action you take when the menu is executed (starting loading this / finished, starting that / finished). The part that does not have the finished event in the log is caused the hang.

Last resort as it takes a while to paste in debug strings to every procedure, but if all else fails this is probably what I'll do :classic_huh:

 

In fact, calling it now that this is probably what I'll end up having to do :classic_laugh:

  • Haha 1

Share this post


Link to post

Both the "program not responding" and the delay between clicking a tree node and hitting the breakpoint indicate that your program becomes very slow to process its window message queue.

 

Unfortunately there are many reasons why this is the case. Some ideas that come to my mind are:

 

1. If you know in advance which node to click to experience the delay, press Pause ASAP after clicking and before hitting the breakpoint (a shortcut won't help, since while you click your program is active and not the IDE). Do this a couple of times to learn where your program typically spends that delay. If you end up in a similar spot in your code more than once, chances are you're looking at the cause (check the call stack!)

2. Do you use threads/tasks in connection with .Synchronize calls? Anything lengthy and blocking in synchronize has the potential to cause such effects.

3. Do you have Action(List) OnUpdate handlers that could take a bit longer? Depending on how complex your GUI is, these handlers might get called a LOT, so they should never do anything costly - if so, consider a caching mechanism.

4. Is this a normal VCL TTreeView? These are known to become quite inefficient with a couple of thousand nodes. Depending on what you do, maybe even with less than that.

 

Of course these are all things to keep in mind even if you go down the OutputDebugString route.

  • Thanks 1

Share this post


Link to post
7 hours ago, Willicious said:

It almost seems a bit pathetic to be typing "what do I do now?"

With regard to madExcept; Nothing. You just enable it for your project, check the freeze detection and run the application.

If the application isn't pumping the message queue then madExcept will detect that and pop up its usual message box with a stack trace and options to terminate, restart, etc.

 

However, you would only need to do all this if you cannot reproduce the problem in the debugger. If you can reproduce in the debugger you will get the exact same information by just pausing the application and examining the call stack of the main thread in the debugger.

 

 

7 hours ago, Willicious said:

Does Program Pause have a hotkey?

No. I believe it was possible to break into the debugger using F12 in Windows XP but I think that option was removed (by Microsoft). I have a registry hack somewhere that enables it again, but I don't have time to find it right now. I think it's related to this: https://learn.microsoft.com/en-us/windows/win32/debug/configuring-automatic-debugging

Maybe someone else here knows the answer.

  • Like 1

Share this post


Link to post
Posted (edited)
10 hours ago, msohn said:

Both the "program not responding" and the delay between clicking a tree node and hitting the breakpoint indicate that your program becomes very slow to process its window message queue.

I see. I'm guessing there's no easy answer to this particular problem. Could I maybe limit the number of messages being sent in some way, or prioritize treeview stuff when the form is active?

10 hours ago, msohn said:

1. If you know in advance which node to click to experience the delay, press Pause ASAP after clicking and before hitting the breakpoint ... If you end up in a similar spot in your code more than once, chances are you're looking at the cause (check the call stack!)

Managed to grab the Pause button from the Menu just in time. Anyone's guess why there's no hotkey for this; someone at Embarcadero didn't have their Weetabix the day that feature was added, clearly.

 

Anyways, here's a screen shot of what I can see. CPU processing and no call stack history:

image.thumb.png.8550483baf452b0fc48d2fad2859236f.png

 

However, did get this when I stepped over with F8:

image.thumb.png.d62011a74ebb6bf4375ee1fba733c3a8.png

 

10 hours ago, msohn said:

2. Do you use threads/tasks in connection with .Synchronize calls? Anything lengthy and blocking in synchronize has the potential to cause such effects.

Very unlikely. It's a 32-bit program which (as far I understand) doesn't support multi-threading. It's pretty ancient by today's standards so unlikely to be trying to pull those sorts of manoeuvres (although, I could be wrong about that).

 

10 hours ago, msohn said:

3. Do you have Action(List) OnUpdate handlers that could take a bit longer? Depending on how complex your GUI is, these handlers might get called a LOT, so they should never do anything costly - if so, consider a caching mechanism.

This is more likely. Caching sounds like a very good idea, since each time a treeview item is clicked it has to reload a bunch of info. As far as I can tell, all assets are loaded on form creation, but there could be some re-loading/re-drawing going on somewhere. Thing is though, if the delay is happening before SetInfo is even being called, then it stands to reason that isn't the cause of it. Still, I'd definitely be up for learning how to cache stuff if it's a generally helpful thing to do to save memory.

 

10 hours ago, msohn said:

4. Is this a normal VCL TTreeView? These are known to become quite inefficient with a couple of thousand nodes.

Yes. I've had someone else suggest that I use a better treeview. I'm guessing I'd install it like a plugin, like Bass.dll or GR32 (or madExcept)...? Would definitely be up for exploring that.

 

3 hours ago, Anders Melander said:

However, you would only need to do all this if you cannot reproduce the problem in the debugger. If you can reproduce in the debugger you will get the exact same information by just pausing the application and examining the call stack of the main thread in the debugger.

See above screenshot for debug output on pausing. I'm probably not using the debugger correctly though, because it's gone straight to CPU stuff.

 

Meanwhile, madExcept found nothing, so I've uninstalled. It probably is the message queue, from what's being said.

 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

Thanks for the help so far everyone, I know I'm a bit of a novice; it's nice to have the suggestions.

Edited by Willicious

Share this post


Link to post
42 minutes ago, Willicious said:

It's a 32-bit program which (as far I understand) doesn't support multi-threading.

You must be thinking of DOS/Windows 3.x. All version of Windows since Windows 95 has supported threading.

 

Your screenshots aren't really showing us the call stack. Make sure that after your pause that you switch to the main thread (open the thread list, double click the topmost thread).

 

I would keep madExcept installed if I were you and occasionally enable its memory overwrite detection for your application. That is how I found and fixed some of the memory overrun bugs in NeoLemmix.

  • Like 1
  • Thanks 1

Share this post


Link to post
10 minutes ago, Anders Melander said:

Your screenshots aren't really showing us the call stack. Make sure that after your pause that you switch to the main thread (open the thread list, double click the topmost thread).

image.thumb.png.b96f1645e6405130a7fcf745aff64b1d.png

 

11 minutes ago, Anders Melander said:

I would keep madExcept installed if I were you and occasionally enable its memory overwrite detection for your application. That is how I found and fixed some of the memory overrun bugs in NeoLemmix.

I'll definitely give it another try at some point. For now I'm trying to get a few things stable before I introduce any new project elements (hence trying to get the treeview code optimised).

Share this post


Link to post

Okay, so you are doing something in Application.OnIdle. My guess is that you are doing too much (it's taking too long) and thus slowing everything down.

 

Application.OnIdle is called inside the message pump loop when the the message queue is empty. If you do something in OnIdle that take a bit too long the message pump will be slow to react to new messages in the queue. It could also be that you produce a lot of new messages in OnIdle and thus flood the message queue and keep the loop busy all the time. My guess is that OnIdle is used for animation so the messages are probably mostly WM_PAINT and WM_TIMER.

 

Anyway, have a look at what the OnIdle handler is doing. Does the problem go away if you disable that?

  • Like 1

Share this post


Link to post

You are calling Sleep(interval) in GameScreenMenu.TGameScreenMenu.ApplicationIdle , this interval either too big or being called too many times behaving like deadlock.

 

the problem is not exactly there, but in some sort of combination with waiting on some event or value from the TGameBaseMenuScreen, as this one is in DoLevelSelect is executing ShowModal .

So your ShowModal is implemented in the base class while you ApplicationIdle is in the inherited one, now make sure you are not checking on any Form.Handle as it might be recreated in other words, your ShowModal form might be not visible or lost track of some of its internal fields where your Idle handler is waiting on it.

 

Suggestion :

1) redo or completely remove that Idle handler, by redo i mean re-code it in defensive way assuming you can't trust any locally stored values and check them at runtime every time you evaluating something namely and most critically before calling a Sleep,

 

2) Why on earth you need to call on Sleep to begin with ? if you step back to look at what is going on, OnIdle is executed when the application is Idle and no work to, but you are introducing a work for it by calling Sleep !!

Share this post


Link to post

Commented out the entire ApplicationIdle procedure (including the call for Sleep(1)). The result is that the scroller animation in the Menu screen doesn't update. Otherwise, the program runs normally.

 

However, the problem (i.e. slow-responding Level Select treeview) persists. Here's the resulting call stack (i.e. between physical mouse click and treeview click event handler):

 

image.thumb.png.85ba5b5c7f8f9165b6d7c558623d2a9a.png

 

Here's the ApplicationIdle procedure:

 

procedure TGameMenuScreen.ApplicationIdle(Sender: TObject; var Done: Boolean);
begin
  if fCleanInstallFail then
  begin
    DisableIdle;

    fCleanInstallFail := false;
    ShowMessage('It appears you have installed this version of SuperLemmix over ' +
                'an older major version. It is recommended that you perform a ' +
                'clean install of SuperLemmix whenever updating to a new major ' +
                'version. If you encounter any bugs, especially relating to ' +
                'styles, please test with a fresh install before reporting them.');
    fLastReelUpdateTickCount := GetTickCount64;

    EnableIdle;
  end else if not GameParams.LoadedConfig then
  begin
    DisableIdle;

    GameParams.LoadedConfig := true;
    ShowSetupMenu;

    EnableIdle;
  end else if not fDisableScroller then
    UpdateReel;

  Done := false;
  Sleep(1);
end;

Any suggestions welcome...

Share this post


Link to post
Posted (edited)

From the documentation for OnIdle: https://docwiki.embarcadero.com/Libraries/Athens/en/Vcl.Forms.TApplication.OnIdle

 

"Applications that set Done to false consume an inordinate amount of CPU time, which affects overall system performance."

 

Your call stack shows that you clicked on an Image (no TreeView involved here), which causes some dialog to be shown modally. While this dialog is shown, your TGameMenuScreen.ApplicationIdle code is called continuously.

 

I assume you've written the code for ApplicationIdle - can you describe what that code is intended to do? Because it's not obvious to me. Why the sleep? Why do you never set Done to True?

 

Edit: also, it's quite telling, that you disable the OnIdle while you show a modal form in your ApplicationIdle handler - but you don't seem to do that in DoLevelSelect that is called as a result of clicking the image. I'd try that first, i.e. add DisableIdle and EnableIdle (ideally with a try/finally!) before/after your ShowModal call in DoLevelSelect.

Edited by msohn

Share this post


Link to post
24 minutes ago, msohn said:

add DisableIdle and EnableIdle (ideally with a try/finally!) before/after your ShowModal

I consider this a necessary duty!

Share this post


Link to post
Posted (edited)
45 minutes ago, msohn said:

Your call stack shows that you clicked on an Image (no TreeView involved here), which causes some dialog to be shown modally. While this dialog is shown, your TGameMenuScreen.ApplicationIdle code is called continuously.

The image that's being clicked is to show the modal (Level Select) in the first place, which is where the treeview is displayed.

 

The call stack doesn't show the treeview click event because I've inserted a breakpoint at the treeview click even handler call, and I'm pausing the program between clicking the treeview and the breakpoint happening - so, we never see what the treeview is calling in the call stack. That's OK, though; the point of this exercise is to determine what's causing the delay between treeview click and treeview click event handling.

 

Regarding the ApplicationIdle procedure, I find it unlikely that this is the problem. As I stated in my previous post (important parts highlighted):

1 hour ago, Willicious said:

Commented out the entire ApplicationIdle procedure (including the call for Sleep(1)). The result is that the scroller animation in the Menu screen doesn't update. Otherwise, the program runs normally.

 

However, the problem (i.e. slow-responding Level Select treeview) persists.

 

So, hopefully we can agree that the ApplicationIdle procedure is unlikely (although not uncertain) to be the problem. To address your questions, though:

45 minutes ago, msohn said:

I assume you've written the code for ApplicationIdle - can you describe what that code is intended to do? Because it's not obvious to me. Why the sleep?

I didn't write the code, I'm working on a fork of no fewer than 4 other programmers' work. So, the codebase is full of inconsistencies and calls that seemingly have no purpose, etc. As a relative beginner (I've only been working on it for about a year and a half), it makes the process that bit more difficult, but I do get to learn lots about how not to do things.

 

The ApplicationIdle procedure is intended to do a few initial checks upon first loading the program (is it a new version?, does a user config file exist?), and then handle the situation accordingly. Generally this is only relevant the very first time a brand new version is opened on a user's system. Beyond that, it handles updating the scroller reel (a simple animation on the menu screen). My best guess is that the sleep allows time for the reel to update, but I could be wrong about this. The original developer is difficult to get hold of, and usually can't remember off the top of their head why certain calls have been made, so I have to make (un!)educated guesses and generally leave things as they are unless I have a good reason to change them.

 

45 minutes ago, msohn said:

Why do you never set Done to True?

I didn't know I had to, simple as that really. Best guess is that some previous version of the unit did set Done to true, but this was later removed (perhaps inadvertently). I'm unsure of exactly what to do about this, would you suggest I simply set remove the call to set it to false, or explicitly set it to true somewhere else (possibly when closing the menu screen)?

Edited by Willicious

Share this post


Link to post
Posted (edited)
39 minutes ago, msohn said:

also, it's quite telling, that you disable the OnIdle while you show a modal form in your ApplicationIdle handler - but you don't seem to do that in DoLevelSelect that is called as a result of clicking the image. I'd try that first, i.e. add DisableIdle and EnableIdle (ideally with a try/finally!) before/after your ShowModal call in DoLevelSelect.

 

15 minutes ago, Stano said:

I consider this a necessary duty!

 

I did try this actually (i.e. explicitly disabling idle and then re-enabling Idle before and after the Level Select modal call), but the problem still persists.

 

Can we move away from thinking it has anything to do with ApplicationIdle? That theory has been disproven simply by commenting out the entire procedure. I realise that the call stack doesn't really point to anything else, but maybe the problem isn't actually with the program. It could be a Windows thing, a VLC treeview thing, etc...?

Edited by Willicious

Share this post


Link to post
Posted (edited)

What does the SetInfo do on the TV.onclick?       

 

I populate my treeview childnodes by finding all the menuitems at startup. Then assign the menuitems onclick to the TV. The TV then follows the menu settings.  

 

Emba has Space Rocks game on Getit.  Jim M uses a list to animate the rocks as they created and destroyed.      

 

Treeview has savetofile and loadfromfile.   

 

How is the game paused? is it ctrl- C? In the future pause the game not the IDE, insert a breakpoint, and restart the game. The IDE will then pause at breakpoint.   

 

  

Edited by Pat Foley
Add two items

Share this post


Link to post
44 minutes ago, Willicious said:

Can we move away from thinking it has anything to do with ApplicationIdle? That theory has been disproven simply by commenting out the entire procedure. I realise that the call stack doesn't really point to anything else, but maybe the problem isn't actually with the program. It could be a Windows thing, a VLC treeview thing, etc...?

Well ApplicationIdle definitely contributed to your program not handling the window message queue in a timely manner. But for sure it might not be the only one contributing to it.

 

If you keep ApplicationIdle commented out and repeat the steps to get the call stack again, what do you get?

 

Also, is there anything unusual on the form being shown? Does it already take unusually long for the modal form to show up?

 

Coming back to a previous point of mine, which you seemed to have misunderstood: are there any TActionList components in this project? Actions are updated with the same mechanism, i.e. from within TApplication.Idle. If that takes long, the symptoms would be very much the same.

  • Like 1

Share this post


Link to post
54 minutes ago, Willicious said:

Can we move away from thinking it has anything to do with ApplicationIdle?

OK,

 

But evidently your application usage of Sleep is showing a problem if understanding its impact, see, Sleep(1) aka sleep for 1 millisecond by default will be 1000/64=15.625 ms, limiting that specific path of code to 64 time per second, this will affect UI greatly, so :

 

1) Search for every Sleep call in your project from the main thread and remove them, don't use Sleep in main thread not for 1 or even 0 ms, they makes no sense, for background thread it is understandable and useful, but again if the main thread will wait for background thread to finish rendering an animation to paint it then again UI will behave as you described.

2) In my life i may be have used ApplicationIdle few times that can be counted on hand, mostly were to logout or clean up and close stuff or finish opened procedure to not causing blocking something online, but i have removed tens of these handlers from projects that are not mine, due the abuse and misunderstanding of usage, ApplicationIdle is healthy way to see that your UI is responsive the more is called the better, and never use it for something take long time, if needed then start background thread to do it.

3) Your animation most likely is the cause, this is my guess, a thread is doing something while main thread is waiting on it to paint, that is wrong approach, let the thread render (prepare what ever), on other hand use a timer, simple timer with adequate interval to see if that thread or the animation is ready to be painted then draw/paint it, and exit, don't use interval higher than 60 per second, as this is useless and waste of time, on other hand try to centralize all your animations with one timer and one class that do the draw/paint, you don't need every animated button to use 60 message per second, right ? because this will starve UI from handing message and produce the mentioned behavior.  

  • Like 1

Share this post


Link to post

@Willicious: I managed to get all relevant code from github and have it compiling and running.

 

How do I reproduce this?

 

The only bit of a hang I've seen is when I click one of the larger level group nodes - then it takes quite a while because of hundreds of buttons and labels being created on the fly. And another thing which makes this worse, there are many updates, i.e. SetInfo calls, even though the selected tree node hasn't even changed - a simple cache of what was previously selected would improve that situation - or at least remove the Click and DblClick handlers which call SetInfo - any click will rebuild the GUI even though the selected node has not changed.

  • Thanks 1

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×