Jump to content
Willicious

How to debug a Not Responding program element

Recommended Posts

Posted (edited)
4 hours ago, Kas Ob. said:

Yesterday only i downloaded https://github.com/ericlangedijk/Lemmix at last and tried to build it, but it is not for older Delphi.

That's Lemmix, the original program. There has been NeoLemmix and now SuperLemmix since then.

 

You can get the SuperLemmix source code here (it builds in RAD 10.4, not sure about later versions): https://github.com/Willicious/SuperLemmixPlayer

 

I'll take another look at PE later today, thanks again for your help and advice.

Edited by Willicious

Share this post


Link to post
7 hours ago, Kas Ob. said:

1) Make sure there is no Sleep calls at all ! everything single one of them , these you have to replace them with timers in needed 

It would probably be best if you had looked at the code before making such a broad recommendation... I haven't looked through all the code but I'm certain there at least some cases where it would make no sense to replace Sleep. I know because I wrote that code.

Share this post


Link to post
59 minutes ago, Anders Melander said:

It would probably be best if you had looked at the code before making such a broad recommendation... I haven't looked through all the code but I'm certain there at least some cases where it would make no sense to replace Sleep. I know because I wrote that code.

I do understand.

 

I want the CPU to spike first, go full core, only after that the reason(s) for the stuttering can be identified or isolated, right now and until now all i got from the description is the CPU is doing nothing yet there is unresponsiveness in UI, after unhinge the CPU and processing loops without break, if the stutter still exist then this will mean a thing if not then it is a different thing and so on.

If it is an one-to-one import for very old design like DOS then timing is suffer as modern OS and CPU behave differently and multitasking is shared responsibility between the application and OS, in old days your game should time the rendering and the input and give no concern about OS or other any applications running on the device.

 

When i see a comment in the source right before Sleep(1) call with "  // Relax CPU" then many variation comes to mind, relax for whom or to whom as this game is one thread and the OS will switch it when he see fit not waiting for such a sleep, this makes sense if such sleep is inside a never ending loop, is it ? 

 

All in all, i can't imagine (what ever) reason to call Sleep from main thread. think about it.

Share this post


Link to post

Also i didn't say this will solve the problem or will work as workaround.

 

Sleep can no be used for timing anywhere, and in this application if it is being used as such then a confirmation is needed.

Just like handling Windows control and custom drawing, as example, take Custom Draw events, like ListBox, some might implement to draw one item on every call or even draw the whole list with all the items, still the OS will send punch of draw messages, different messages for might be one or many items, some will simply pointing the X item is not selected followed by Y item selected, while the application just draw the lot twice.

Share this post


Link to post
2 minutes ago, Kas Ob. said:

All in all, i can't imagine (what ever) reason to call Sleep from main thread. think about it.

Okay but lack of imagination isn't a reason to eliminate it 🙂

 

The use of Sleep in OnIdle is definitely a problem but that doesn't mean that all use of Sleep in the main thread is bad.

For example, the fade in/out code is a time limited (sub-second duration) loop containing a Sleep. While the fade in/out is in progress nothing else happens and nothing else should happen. It would be silly to complicate this with threads or timers.

Share this post


Link to post
2 minutes ago, Anders Melander said:

Okay but lack of imagination isn't a reason to eliminate it 🙂

No imagination here just logic an make sense.

 

2 minutes ago, Anders Melander said:

For example, the fade in/out code is a time limited (sub-second duration) loop containing a Sleep. While the fade in/out is in progress nothing else happens and nothing else should happen. It would be silly to complicate this with threads or timers.

OK, but how it is more complicated, on one hand Sleep doesn't have fixed interval ! right ? any other application running on Windows OS can ruin your timing by 16 fold, i saw it 1ms and saw it 15.625ms, these are the most popular.

On other hand either that animation is slow and taking long time or very fast, in both cases a timer or background thread will be more appreciated and will not prevent the used from closing the app as as example or not clicking a button, or even worse buffering few clicks form the user then fire them afterward like machine gun.

And again if it is one animation then one timer will do it, but if there is more then what more sleep calls, or the same one timer or even more timers, for me this is simple question and simple answer.

 

 

300ms delay in IDE autocomplete does feel very different from 200ms, don't you agree ?

Share this post


Link to post

I remembered the worse of them all, clicking show Desktop aka minimize everything, and an application still there, most annoying thing, sadly the Delphi debugger do this without any sense of consideration i would love to see an animation like a hammer and a nail when it refuse to minimize.

Share this post


Link to post
1 minute ago, Kas Ob. said:

Sleep doesn't have fixed interval ! right ?

No and it isn't needed in this particular case. How about you have a peek at the code before you continue?

Share this post


Link to post
Posted (edited)
4 hours ago, Kas Ob. said:

I do understand.

Here are the screenshots as promised. This is graph & threads on opening the app:

 

image.thumb.png.9fa960ac84d51ab8b3959b16bb204a0c.pngimage.thumb.png.a8ee14fb99b0f25f90835f9c456b2192.png 

 

And then again a few mins later after loading and playing a level:

image.thumb.png.90f5e075a3b71a0de4043dd42bcbeb4d.pngimage.thumb.png.bc1442966dba8316a69276e8b32ec40e.png

 

2 more memory leaks after playing, these I'll try to fix using the same method as msohn:

image.thumb.png.d6cea3c1541a94abbc042319730d7227.png

 

Ignore this one, added it by accident and for some reason the site won't let me delete it:

image.png

Edited by Willicious

Share this post


Link to post
8 hours ago, Willicious said:

2 more memory leaks after playing, these I'll try to fix using the same method as msohn:

image.thumb.png.d6cea3c1541a94abbc042319730d7227.png

TNLCursor leak is easy to find and fix

{
..
    Cursors              : array[1..CURSOR_TYPES] of TNLCursor;
..
procedure TGameWindow.InitializeCursor;
...
begin
...
    for i := 1 to CURSOR_TYPES do
    begin
      Cursors[i].Free;

      Cursors[i] := TNLCursor.Create(LocalMaxZoom);
      
procedure TGameWindow.FreeCursors;
var
  i: Integer;
begin
  for i := 0 to Length(Cursors)-1 do   // will leak also it could cause AV, yet it seems never did
    Cursors[i].Free;
end; }

procedure TGameWindow.FreeCursors;    
var
  i: Integer;
begin
  for i := Low(Cursors) to High(Cursors) do    // right way
    Cursors[i].Free;
end;

The unknown need to be known to be fixed, and i don't have a crystal ball.

 

This unknown might be very similar though, enabling Overflow Checking and Range Checking in the project option should prevent bugs and leaks like the above from first run, but i will guess here it never being enabled, and might be huge task to fix all the exception that will raise, none the less enabling them at least when debugging or developing will yield better code quality with less memory leaks.

Share this post


Link to post
9 hours ago, Willicious said:

Here are the screenshots as promised.

I don't see any strange behavior here, there is no CPU load or any bottle necks, on the contrary the cpu is relaxed and if there is not responding then it is either Sleep or long process that can be moved into a background thread and make it do it fast.

 

I explained my position and opinion about depending or even using Sleep in main thread, and yes i looked at the code and i know what i am talking about.

 

My suggestion is to entertain or research the possibility of removing the dependency on Application Idle handler to either background thread that post messages to the main thread or simple timer doing the same as that Idle Handler, aka emulate it in timer, this should not huge deal at all, on contrary it will make sure of best practice in developing for Windows, remember that ApplicationOnIdle is triggered when no other message seen at this moment, while executing any thing in OnIdle there is no OS message handling, while Windows OS will consider application freeze and no responding after only 5 seconds if the application main thread is did empty/grab message(s) from the message queue, yes 5 seconds.

 

After all these posts in this thread, i may be out of date on what is the top priority bug/problem to fix, is it still non responding menu level selection window or something else, 

 

Keep the good work and good luck.

Share this post


Link to post

Found this, and this have be understood and remembered when developing for Windows

https://learn.microsoft.com/en-us/windows/win32/win7appqual/preventing-hangs-in-windows-applications

 

 

Off-topic:

the last two lines in that article are most fascinating, because C++ exceptions are similar to Delphi exceptions in their dissimilarity to C exceptions, exceptions in callbacks are freaking fatal, and most the time are obscure and untraceable due the failed unwinding (unroll) process.

Share this post


Link to post
Posted (edited)
7 hours ago, Kas Ob. said:

TNLCursor leak is easy to find and fix

...

enabling Overflow Checking and Range Checking in the project option should prevent bugs and leaks like the above from first run, but i will guess here it never being enabled, and might be huge task to fix all the exception that will raise

Thanks for the fix, I applied it and it fixed both of the leaks! :classic_cool: I likely wouldn't have spotted this either, because on the surface it did originally appear to be the case that the cursors were being freed correctly.

 

I also enabled Overflow and Range checking; so far, this Integer overflow is reported:

procedure TGameBaseScreen.FadeOut;
var
  RemainingTime: integer;
  OldRemainingTime: integer;
  EndTickCount: Cardinal;
const
  MAX_TIME = 320; // mS
begin
  EndTickCount := GetTickCount + MAX_TIME;
  OldRemainingTime := 0;
  RemainingTime := MAX_TIME;

  ScreenImg.Bitmap.DrawMode := dmBlend; // So MasterAlpha is used to draw the bitmap

  while (RemainingTime >= 0) do
  begin
    if (RemainingTime <> OldRemainingTime) then
    begin
      ScreenImg.Bitmap.MasterAlpha := MulDiv(255, RemainingTime, MAX_TIME);
      ScreenImg.Update;

      OldRemainingTime := RemainingTime;
    end else
      Sleep(1);

    RemainingTime := EndTickCount - GetTickCount; <--------------------------------------- this line here
  end;

  Application.ProcessMessages;
end;

 

Edited by Willicious

Share this post


Link to post
1 hour ago, Willicious said:

RemainingTime := EndTickCount - GetTickCount;

That's a classic - with an easy fix. Try Googling it and see if you can't fix it yourself; You'll learn more from that.

  • Like 1

Share this post


Link to post
1 hour ago, Willicious said:

so far,

Well, you are in for a ride !

These runtime checks might be triggered many times, as the code never or may be for long times evolved without being checked against these.

 

Anyway, each and every case can be fixed, just take your time with them.

For this case :

1) GetTickCount is Cardinal (DWORD), thing to remember.

2) That loop is easy to understand and rewrite performing such subtraction or just slap a workaround around it like this

  while (RemainingTime >= 0) do
  begin
    if (RemainingTime <> OldRemainingTime) then
    begin
      ScreenImg.Bitmap.MasterAlpha := MulDiv(255, RemainingTime, MAX_TIME);
      ScreenImg.Update;

      OldRemainingTime := RemainingTime;
    end else
      Sleep(1);

    if GetTickCount > EndTickCount then   // prevent integer overflow
      Break;

    RemainingTime := EndTickCount - GetTickCount; <--------------------------------------- this line here
  end;

  Application.ProcessMessages;
end;

 

You are in for a ride, and definitely worth it.

Share this post


Link to post
7 hours ago, Kas Ob. said:

After all these posts in this thread, i may be out of date on what is the top priority bug/problem to fix, is it still non responding menu level selection window or something else, 

Yes, let's get the thread back on track. The main thing I need help with here is fixing the slow-responding level select menu & treeview - the fixes that have been applied so far have helped for sure, but haven't eliminated the slow response times. I'm considering adding a progress bar, but if I do this then I'd want to load all information to the treeview at the start, rather than adding some when nodes are clicked. Node clicking should, then, access pre-loaded info (if this is at all possible).

 

Please bear in mind, those that have offered suggestions so far, that I need step-by-step instructions if possible. If you say something like "just use a 3rd party treeview", that means nothing to me without instructions as to where to get it and how to intregrate it into the project. I understand that not everyone has time to type out instructions to a newbie though, so... it's fine if you don't, but chances are I probably won't get around to implementing your suggestion.

 

Regarding general optimization though, it'd be nice to get some help with what to do with Overflow & Range checks. At present, they show a popup and point to a code area (this is great!), but I don't really know what to do from there.

Share this post


Link to post
Posted (edited)
8 minutes ago, Willicious said:

I'm considering adding a progress bar, but if I do this then I'd want to load all information to the treeview at the start, rather than adding some when nodes are clicked. Node clicking should, then, access pre-loaded info (if this is at all possible).

 

From my limited experience with the code, I would say this is definitely the approach to take.

 

Bear in mind though that some information might actually get changed in the dialog (it does, right?) - I had the impression that there were already optimisations in place that would prevent this from working, but I did not get to the bottom of this, so I might be wrong.

 

I would do this first - and only if the result is still not acceptable, would I consider other options like 3rd party trees. But I'm quite positive that won't be necessary with preloaded info.

 

Also: since your end goal is a totally different UI without a tree, why bother? Actually - that UI works so different that maybe you wouldn't even need to cache info at all since there is a lot less info which is displayed once at a time.

Edited by msohn

Share this post


Link to post
1 hour ago, Anders Melander said:

That's a classic - with an easy fix. Try Googling it and see if you can't fix it yourself; You'll learn more from that.

Another source is the source in Vcl.WinXCalendars.pas if your Delphi is new enough.  Look at TBaseCalendarAnimation in how it updates on its timerevent. 

Share this post


Link to post

@Willicious I am sorry !

 

I just tried tried again to build the code for this one https://github.com/Willicious/SuperLemmixPlayer assuming it is the one we are talking about, the readme for this one pointing that Graphics32 is needed and it is, but Graphics32 is not compatible with XE8 or 10, it does have inline variables and recent version of generics that i don't have a clue when they are supported.

 

I don't know even the of the game is compliable with my IDEs, so i can't help beyond eyeballing the code in the IDE, even these IDEs are failing to navigate and i had to use search and that is it.

 

1 hour ago, Willicious said:

Overflow & Range checks

Will produce performance hit, but like that easy to find and fix memory leak, they will prevent such and many many more bugs, so you don't need to enable them all the time, if the performance hit is huge then you can disable them for production or release, yet i like to keep them enabled and disable them in critical parts, but this and from what i see, is too soon, so keep them enabled and fix every exception raised with them is good for now.

 

Though from your screenshot and the CPU usage i don't think these checks performance hit will be noticeable in this application.

Share this post


Link to post
4 hours ago, Kas Ob. said:

but Graphics32 is not compatible with XE8 or 10, it does have inline variables and recent version of generics that i don't have a clue when they are supported.

That's a bug. Right now XE2 and later should be supported. I will probably soon change that to XE7 or something in that neighborhood - so a support window of approximately 10 years.

The inline vars is something I keep introducing by accident simply because I've gotten so used to them that I don't even consciously notice when I use them.

 

I'm not aware of any compatibility problems with the use of generics. I remember that generics was bugged to various degree in some of the early versions but I'm not going to work around that if that's the problem.

 

I'm also not aware of any problems with Delphi 10; I'm using D10.3, D11.2, and D12 myself.

Share this post


Link to post
Posted (edited)
20 hours ago, Anders Melander said:

That's a classic - with an easy fix. Try Googling it and see if you can't fix it yourself; You'll learn more from that.

A Google search for "integer overflow delphi" brought up the answer of using data types that are large enough to contain all values that may possibly be computed and stored in them, and a few responses suggested Int64. ChatGPT seemed to confirm this by suggesting that I change all integer types in the FadeOut procedure to Int64 instead. I went ahead and made this change, and still got the integer overflow error.

 

So, I then applied the fix suggested by Kas Ob. (this one:)

 

    if GetTickCount > EndTickCount then   // prevent integer overflow
      Break;

And, this fixed the error. So, congratulation Kas Ob!

 

What this has done is confirm to me that asking a person, and giving my specific case, is vastly more preferable to generic Googling and even using AI tools. How can I possibly have known that this would fix the issue? The best developer tools on the planet didn't even help solve the problem. In the end, it took a more experienced programmer applying their existing knowledge to the specific issue I was encountering.

 

As a hobbyist with limited time to spend programming, how can I acquire that knowledge?

 

We now have a range check error. Again, I tried googling "range check error delphi" and asking ChatGPT for a solution, neither has produced anything which fixes the issue:

 

procedure TRenderer.ApplyRemovedTerrain(X, Y, W, H: Integer);
var
  PhysicsArrPtr, TerrLayerArrPtr: PColor32Array;
  cx, cy: Integer;
  MapWidth: Integer; // Width of the total PhysicsMap
begin
  // This has two applications:
  // - Removing all non-solid pixels from rlTerrain (possibly created by blending)
  // - Removed pixels from PhysicsMap copied to rlTerrain (called when applying a mask from LemGame via RenderInterface)

  PhysicsArrPtr := PhysicsMap.Bits;
  TerrLayerArrPtr := fLayers[rlTerrain].Bits;

  MapWidth := PhysicsMap.Width;

  for cy := Y to (Y+H-1) do
  begin
    if cy < 0 then Continue;
    if cy >= PhysicsMap.Height then Break;
    for cx := X to (X+W-1) do
    begin
      if cx < 0 then Continue;
      if cx >= MapWidth then Break;
      
      if PhysicsArrPtr[cy * MapWidth + cx] and PM_SOLID = 0 then
      begin
        if GameParams.HighResolution then
        begin
          TerrLayerArrPtr[(cy * MapWidth * 4) + (cx * 2)] := 0;
          TerrLayerArrPtr[(cy * MapWidth * 4) + (cx * 2) + 1] := 0; <---------------------- this line here
          TerrLayerArrPtr[((cy * 2) + 1) * (MapWidth * 2) + (cx * 2)] := 0;
          TerrLayerArrPtr[((cy * 2) + 1) * (MapWidth * 2) + (cx * 2) + 1] := 0;

        end else
          TerrLayerArrPtr[cy * MapWidth + cx] := 0;
      end;
    end;
  end;
end;

My question here would be: what exactly should I be Googling for, and how will I know if what I've found is helpful?

 

If the answer is: "by testing out every single possible thing I find from a generic Google search", that's not exactly encouraging. Stack Overflow, in particular, is full of unspecific answers which usually require an existing level of knowledge and experience to understand in the first place, let alone put into practice for testing.

 

Don't get me wrong, I've been using Google searches, this Forum and various AI tools over the past year to get to where I am now with SuperLemmix, so I can absolutely use these resources effectively, up to a point. When it comes to not actually knowing what I'm supposed to be searching for, or how to know whether the answer is at least relevant if not also correct, this is where I probably need a bit more specific help/mentoring. I've considered paying for lessons many times, because ultimately these are the biggest barriers to progress.

Edited by Willicious

Share this post


Link to post
20 hours ago, msohn said:

From my limited experience with the code, I would say this is definitely the approach to take.

 

Bear in mind though that some information might actually get changed in the dialog (it does, right?) - I had the impression that there were already optimisations in place that would prevent this from working, but I did not get to the bottom of this, so I might be wrong.

OK, thanks. It's good to have that confirmed. I'll look at ways to cache the data for later quick loading.

 

The data that can be changed whilst the same instance of the Level Select dialog is open is level records and talisman completion data, which is all user-specific and parsed from a single text file, so this shouldn't be too problematic to re-load on the fly, as long as the actual baseline pack/level info itself is pre-loaded.

 

When you say "optimisations in place that would prevent this from working", what do you mean by "this"?

 

20 hours ago, msohn said:

Also: since your end goal is a totally different UI without a tree, why bother? Actually - that UI works so different that maybe you wouldn't even need to cache info at all since there is a lot less info which is displayed once at a time.

Fair question. The answer is simply because it will likely take me many months to bring the alternative UI into some semblance of usability. I'll need to learn how to load fonts and display them at the correct size for the size of the Window (or load the letters as bitmaps, and then have to increase the size of the base window so that the font can be more hi-res). Along with many other challenges, of course.

 

So, I'd like to optiise the existing Level Select dialog in the meantime. That way, it's better sooner, and if the challenges of a brand new screen prove to be too large, I'll at least know more about how the existing one works and what I can do to optimise it.

 

Also, I'd like to know how to cache info anyway; it may come in useful for other program elements. I'll add an option to the config menu to clear program cache, and make sure to clear it when a new copy is installed. I need to learn how to do it, though.

Share this post


Link to post
Posted (edited)
14 minutes ago, Willicious said:

I'll look at ways to cache the data for later quick loading.

It already does this. Again, I don't have the code at hand right now, but I remember something like a FLoadLevel which signalled what kind of data had already been loaded so you could call the method again, and if it already loaded the info, skipped the loading, i.e. a cache.

 

So my idea would be to fully update the tree including all the loading in InitializeTreeView, i.e. you would merge SetInfo (edit: only the part that updates the tree) and InitializeTreeView and show a progress while you do that. That would also allow to to get rid of the OnExpanded node handler you added earlier in the thread here.

Quote

The data that can be changed whilst the same instance of the Level Select dialog is open is level records and talisman completion data, which is all user-specific and parsed from a single text file, so this shouldn't be too problematic to re-load on the fly, as long as the actual baseline pack/level info itself is pre-loaded.

 

When you say "optimisations in place that would prevent this from working", what do you mean by "this"?

Because of the above cache, it wouldn't actually reload it. Again, I might be wrong, if the user-data does not have the aforementioned cache.

Also because e.g. the tree nodes are only updated once - for all future calls to SetInfo a node update is skipped if it's text is non-empty. That way all the initially blank nodes created in InitializeTreeView are updated only once, including image index etc. If any of the user data influences the node text or its image, then it wouldn't be properly updated. Again - might be wrong, can't check.

 

Quote

So, I'd like to optiise the existing Level Select dialog in the meantime. That way, it's better sooner, and if the challenges of a brand new screen prove to be too large, I'll at least know more about how the existing one works and what I can do to optimise it.

Fair enough - this was merely directed at considering 3rd party tree views. Also because I'm still optimistic it isn't needed.

 

Edited by msohn
  • Like 1

Share this post


Link to post
6 minutes ago, Willicious said:

A Google search result for "integer overflow delphi" brought up the answer of using data types that are large enough to contain all values that may possibly be computed and stored in them, and a few responses suggested Int64. ChatGPT seemed to confirm this by suggesting that I change all integer types in the FadeOut procedure to Int64 instead. I went ahead and made this change, and still got the integer overflow error.

It sounds like you haven't realized why the overflow occurs. That would have been step 1; Understand the problem before you try to fix it.

 

The problem is that GetTickCount wraps around every 49.7 days. The documentation for GetTickCount would have told you that. If you reboot Windows the problem "goes away" since GetTickCount will start from zero again.

When GetTickCount wraps around the start time becomes "later" than the current time so the delta time becomes negative. Since we are operating on Cardinals (an unsigned type) you get an integer overflow error.

 

The solution is one of the following:

  1. Use another algorithm
  2. Use GetTickCount64 and Int64
  3. Use signed integers and cast the cardinal result to integer.
  4. Disable overflow checks locally for that particular code.

In this particular case I would probably go for 3 or 4.

It should be noted that this isn't really a bug. It only turned into a bug when you enabled overflow checks.

 

25 minutes ago, Willicious said:

What this has done is confirm to me that asking a person, and giving my specific case, is preferable to generic Googling and even using AI tools. How can I possibly have known that this would fix the issue? The best developer tools on the planet didn't even help solve the problem. In the end, it took a more experienced programmer applying their knowledge to the specific issue I was encountering.

You could have solved this problem on your own if you wanted to. It wasn't a difficult one.

 

It might seem easier to ask somebody else for the solution to a problem, and sure it might even solve that particular problem, but what it doesn't do is give you the ability to solve future problems on your own.

How do you suppose the "experienced programmers" acquired their knowledge? They didn't get it by asking for solutions. They got it by analyzing, doing research and experimenting.

  • Like 2

Share this post


Link to post

 

52 minutes ago, Willicious said:

Anyway, we now have a range check error. Again, I tried googling "range check error delphi" and asking ChatGPT for a solution, neither has produced anything which fixes the issue: [...]

My question here would be: what exactly should I be Googling for, and how will I know if what I've found is helpful?

Excuse me but why the hell would you ask ChatGPT about that? It knows nothing about coding; It's a linguistic AI.

 

Also, no amount of googling will help you solve that particular problem. First you need to understand what the error means (what is a range check error), then you must understand what the code does. Pick the code apart and comment it along the way. Run it in the debugger, examine the values passed to the method, validate that they are within the expected bounds, etc. etc. Continue until you completely understand what is going on.

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×