Jump to content
Willicious

How to debug a Not Responding program element

Recommended Posts

5 hours ago, Willicious said:

neither has produced anything which fixes the issue:

A lot being said above, and you need to start and fig deeper and learn why and how.

 

Anyway, this issue is also easy to fix, the cause is this declaration in GR32.pas

  PColor32Array = ^TColor32Array;
  TColor32Array = array [0..0] of TColor32; // pain in the back, and will trigger overflow check
  TArrayOfColor32 = array of TColor32;

And the fix or another slap work around without testing but again using my extra power 

procedure TRenderer.ApplyRemovedTerrain(X, Y, W, H: Integer);
type
  TTempHackColor32Arr = array [0.. MaxInt div SizeOf(TColor32) -1 ] of TColor32;
  PTempHackColor32Arr = ^TTempHackColor32Arr;
var
  //PhysicsArrPtr, TerrLayerArrPtr: PColor32Array;
  PhysicsArrPtr, TerrLayerArrPtr: PTempHackColor32Arr;
  cx, cy: Integer;
  MapWidth: Integer; // Width of the total PhysicsMap
begin
  // This has two applications:
  // - Removing all non-solid pixels from rlTerrain (possibly created by blending)
  // - Removed pixels from PhysicsMap copied to rlTerrain (called when applying a mask from LemGame via RenderInterface)

  PhysicsArrPtr := PTempHackColor32Arr(PhysicsMap.Bits);
  TerrLayerArrPtr := PTempHackColor32Arr(fLayers[rlTerrain].Bits);

The best is to fix GR32 with the following, but don't do that on your own, this is up to Anders

  PColor32Array = ^TColor32Array;
  //TColor32Array = array [0..0] of TColor32;
  TColor32Array = array [0.. MaxInt div SizeOf(TColor32) -1 ] of TColor32;
  TArrayOfColor32 = array of TColor32;

 

  • Like 1

Share this post


Link to post
1 hour ago, Kas Ob. said:

The best is to fix GR32 with the following, but don't do that on your own, this is up to Anders


  PColor32Array = ^TColor32Array;
  //TColor32Array = array [0..0] of TColor32;
  TColor32Array = array [0.. MaxInt div SizeOf(TColor32) -1 ] of TColor32;
  TArrayOfColor32 = array of TColor32;

 

I started to make this change for all the publicly declared array[0..0] types (there's more than a few) but then I realized that this will probably break existing code.

For example if a user of Graphics32 has the following declaration:

type
  TMyStuff = record
    FirstPixel: TColor32;
    TheRestOfThePixels: TColor32Array;
  end;
  
  PMyStuff = ^TMyStuff;

Then the compiler will refuse to compile it because the size of the structure exceeds the max allowed:

Quote

E2100 Data type too large: exceeds 2 GB

 

Given that the [0..0] declaration is a very common technique and the problem here really is that range checking has been enabled for a corpus of code that wasn't written with that in mind (and I'm speaking of *Lemmix here) I think I'll leave it be.

 

My recommendation to get around this for now is to search for PColor32Array and then surround the code with {$RANGECHECKS OFF}...{$RANGECHECKS ON} - or simply declare a new type and cast to it as @Kas Ob. has suggested. Give it a neutral name like TPixelArray/PPixelArray or something like that though; It's not really a hack.

Share this post


Link to post
40 minutes ago, Anders Melander said:

I realized that this will probably break existing code.

 

41 minutes ago, Anders Melander said:

Then the compiler will refuse to compile it because the size of the structure exceeds the max allowed:

This exactly the point, this type should not be used ! (no variable or field should be declared as such)

I searched GR32 source and it is in fact not used anywhere, because it should not appear anywhere, such static array can be used just to make the virtual declaration of it (pointer to) available.

 

Think about it, your example TMyStuff and even in many Windows SDK structures need such type, the easy way is to declare it as [0..0] to cast the header of record, and this exactly what make these structures not memory safe, alas there is no workaround it, though the only approach is to have two versions, or may be in case of such record then prevent the declaration of xxArr version and declare your own with "[0..0] of yyyy", this will prevent the developer from confusing these static arrays, ensuring best practice.

 

In other words, both these declaration 

  TColor32Array = array [0..0] of TColor32;
  TColor32Array = array [0.. MaxInt div SizeOf(TColor32) -1 ] of TColor32;

Can be dangerous, yet the second can't be used !, makes it safer and force you to declare your own need in place, while the first one has zero help from the compiler in protecting memory overflow at runtime, both are tuples, both have their lengths known before hand by the compiler and no runtime protection at all, one is not allowed the other is can be falling knife.

 

I am sounding is my opinion, and that is it.

 

 

ps Wish if Pascal syntax had a modifier to force or at least warn with some type when used like 

  TColor32Array = array [0..0] of TColor32; varprohibited;     // can't be used in as field or var, yet allowed to be used in other types declaration like below.
  TColor32Array = array [0.. MaxInt div SizeOf(TColor32) -1 ] of TColor32; prohibited; // same as above
  PColor32Array = ^TColor32Array; // this is allowed 
  TUnsafeColArr = TColor32Array;  // this is allowed

 

Share this post


Link to post

Also i missed to point this thought about this, missed writing it in the above post while writing slowly and fixing gramer :classic_blush:

type
  TColor32Array = array [0..0] of TColor32;
  
  TMyStuff = record
    FirstPixel: TColor32;
    TheRestOfThePixels: TColor32Array;
  end;
  
  PMyStuff = ^TMyStuff;

This doesn't save any time, shorten the code or makes it clearer (if clearer is a word), for simple fact :

the length of data in that array is 1 for the compiler and any manipulation with such declared var (rec) will only copy one element "like assigning with := ", and an extra var/field contains the actual length should be saved somewhere, on top of that, there must be a manually code written to perform the actual copy of the elements, so... !! again my opinion.

Share this post


Link to post

I think you missed my point: Backward compatibility.

 

While we can agree on the pitfalls of the current declarations and on how the types could have been declared, I have to take the users' existing code into account before I change something that have worked for over 20 years.

 

3 hours ago, Kas Ob. said:

This doesn't save any time, shorten the code or makes it clearer (if clearer is a word)

Again you missed the point. It was just an example of how changing a publicly declared array type can break user code.

Share this post


Link to post
Posted (edited)
On 5/31/2024 at 12:10 PM, Anders Melander said:

Excuse me but why the hell would you ask ChatGPT about that? It knows nothing about coding; It's a linguistic AI.

Yeah, it's not always the best resource for programming, but it can sometimes help with quick solutions (e.g. syntax errors) and can actually help with what you're suggesting (i.e. breaking a block of code down to analyze what it does). It absolutely still requires basic knowledge to be able to make good use of its output though.

 

I'm not a complete beginner by now, and I do understand way more about how things work than I did this time last year (for example, I know the difference between functions, methods, variables, constants, types, classes, and what these are all for). I made my first enum type a couple of weeks ago, for example, and it was satisfying to get it right and have it optimise the code I was working on. That was done with the guidance of another programmer as well. I've been very lucky to have lots of help, and I hope I can one day provide the same for someone else.

 

On 5/31/2024 at 12:10 PM, Anders Melander said:

First you need to understand what the error means (what is a range check error), then you must understand what the code does. Pick the code apart and comment it along the way. Run it in the debugger, examine the values passed to the method, validate that they are within the expected bounds, etc. etc. Continue until you completely understand what is going on.

I generally do take this approach. It's more difficult with some things than others.

 

For range check errors, from what I understand it's if something falls outside the specified bounds, for example:

 

(i = 1 to 10)

 

If I then somehow = 11, that would cause a range check error...?

Edited by Willicious

Share this post


Link to post
Posted (edited)
On 5/31/2024 at 11:45 AM, msohn said:

It already does this ... if it already loaded the info, skipped the loading, i.e. a cache

From what I can tell, none of the loading is done from the level select treeview. It's all done in LemLevel and LemNeoLevelPack and then the parsed data is merely accessed from LevelSelect. But, if that's the case, then what's causing the slow response times? It's very difficult to narrow it down to any one particular thing without guidance, hence this thread.

 

EDIT: It's possible that, by accessing the data from LevelSelect, that causes it to be parsed in LemLevel. I'm not sure which way around it is.

Edited by Willicious

Share this post


Link to post
6 hours ago, Willicious said:

From what I can tell, none of the loading is done from the level select treeview. It's all done in LemLevel and LemNeoLevelPack and then the parsed data is merely accessed from LevelSelect.

Absolutely correct.

Quote

But, if that's the case, then what's causing the slow response times? It's very difficult to narrow it down to any one particular thing without guidance, hence this thread.

To see where the time is spent, look at the VTune screenshot I posted here a few day ago:

You have to read it from the bottom up. The x axis is time spent and it's only showing the time spent in ...BaseMenuScreen.DoLevelSelect, i.e. clicking the button to show the dialog.

 

There's basically 3 main "pillars". The right-most shows all the way from TCustomForm.Show, via your TFLevelSelect.FormShow event handler calling SetInfo. The LoadNodeLabels part is what I was referring to in my last post where only visible, uninitialised nodes are updated. That is something that I would recommend joining with .InitializeTreeView - but do it for all nodes - and show a progress while you do it.

 

The caching happens in LemLevelNeoPack, for example in TNeoLevelEntry.GetTitle. All these property getters call LoadLevelFileData, which checks LoadState telling it what was already loaded and then only load the missing parts or do nothing.

 

Finally, to be clear: someone (you?) already put quite some effort into optimising the LevelSelect dialog, i.e. to only spend as little time as possible with filling the GUI initially (think InitializeTreeView) and then load all other parts upon user-interaction (SetInfo calls). It just turned out to be too slow (maybe too many levels/groups/packs now) and a mess (calling SetInfo at too many places). Now the task is to undo all of that, do the slower loading of everything while showing a progress and giving your applications window message pump more time to "breathe" and fix the freezing. The result hopefully is a better, more responsive compromise: opening the dialog the first time will take a lot longer than now - the progress will move slowly - but subsequently, a lot quicker.

 

  • Thanks 1

Share this post


Link to post
On 6/1/2024 at 9:41 PM, Willicious said:

For range check errors, from what I understand it's if something falls outside the specified bounds, for example:

(i = 1 to 10)

If I then somehow = 11, that would cause a range check error...?

Let me look it up in the help for you... :classic_dry:

 

Quote

Range checking

In the {$R+} state, all array and string-indexing expressions are verified as being within the defined bounds, and all assignments to enumerated and subrange variables are checked to be within range.

Quote

Overflow checking

In the {$Q+} state, certain integer arithmetic operations (+, -, *, Abs, Sqr, Succ, Pred, Inc, and Dec) are checked for overflow. The code for each of these integer arithmetic operations is followed by additional code that verifies that the result is within the supported range.

 

  • Like 1

Share this post


Link to post
Posted (edited)
On 5/31/2024 at 11:45 AM, msohn said:

So my idea would be to fully update the tree including all the loading in InitializeTreeView, i.e. you would merge SetInfo (edit: only the part that updates the tree) and InitializeTreeView and show a progress while you do that. That would also allow to to get rid of the OnExpanded node handler you added earlier in the thread here.

 

On 6/2/2024 at 4:22 AM, msohn said:

The LoadNodeLabels part is what I was referring to in my last post where only visible, uninitialised nodes are updated. That is something that I would recommend joining with .InitializeTreeView - but do it for all nodes - and show a progress while you do it.

OK... I went ahead and:

 

1) Merged LoadIcons and LoadNodeLabels into InitializeTreeview

2) Removed the call to LoadNodeLabels in SetInfo

3) Moved the call to LoadIcons to DisplayLevelInfo (still called via SetInfo, but at least it's now in the right place!)

4) Removed the OnExpanded event handler

 

And WOW! The difference is night and day. It takes much longer to load the Level Select initially (expected, and I'll do a progress bar for that), but once open it's super snappy and responsive, and doesn't bug out when switching between packs any more 🎉

 

Thanks for your help in narrowing down what was causing the delays. I clearly need to install VTune (is there a free/budget version?) as that displays exactly what's causing the delays very clearly, so it will no doubt come in very handy for other features of the program.

 

The only thing I'd probably say is that it might be somewhat frustrating to use the slower-loading Level Select when only wanting to switch quickly between levels (e.g. for testing purposes). It might be worth doing a "Quick Select" version which only displays the treeview and nothing else; this would likely be the default. Then, a button for "Show Level Info" could go ahead and load the full version. I'll see what the feedback is after releasing this slower-loading but quicker-interfacing version of the Level Select.

 

I'd say that pretty much wraps up this topic as resolved, but it would be good to continue the discussion around general optimization. Can the admins split the posts regarding range & overflow checking into a new topic? Happy to continue that discussion here if not.

 

----------------------

 

EDIT: After playing around with it for a bit, I think the (much!) slower loading time is too much of a cost for the slightly snappier treeview. Clicking into Level Select is done quite a lot, especially for testing purposes, and it gets old quick; I can see why it was set up the way it is (i.e. loading node labels only when actually needed). There may be other opportunities for optimization though; I'll take a look at the VTune diagram and see what's possible.

 

Also, I've learned a ton via this thread, so thank you again!

Edited by Willicious
  • Like 2

Share this post


Link to post

I'm glad I took the time to explain how I read the VTune flame graph and that you tried and verified most of what I expected (well guessed tbh). And that you now saw first-hand why the original code was written the way it was. Welcome to the world of maintaining legacy projects😉

 

I'm sure you figured out how to get a copy of VTune, if not refer to the thread by Anders which I linked to in the flame graph post. It contained all I needed to get going.

 

I read your post like you reverted your changes for now, but you might want to consider a few things which could improve the solution you tried:

 

1. Make sure all the loaded info is kept, i.e. when opening the dialog a second time, everything is already there. That way the user has to endure the delay only once. If that proves that memory usage is of no real concern (which I'd expect), you could go ahead and

2. Consider loading the info asynchronously, i.e. in a thread/threadpool as soon as the app starts - then, if the users opens the dialog before the loading has finished, show a progress and wait for it to finish. This should behave great when it works, but making that codebase thread safe is an enormous task, especially since there don't seem to be unit- and functional tests which help you make changes with confidence 

Share this post


Link to post
On 6/4/2024 at 1:58 PM, msohn said:

1. Make sure all the loaded info is kept, i.e. when opening the dialog a second time, everything is already there. That way the user has to endure the delay only once

I'm not sure exactly how to do this. It was my (and, from what you've been saying, your) understanding that the program already did cache the loaded info, but it turns out it doesn't (at least not to any noticeable extent). I think it has to reload everything even in the same session (i.e. with SLX still open, but Level Select being closed and then re-opened).

Share this post


Link to post
On 5/27/2024 at 3:56 PM, msohn said:

Not with the built-in tools, but here's an excellent summary by Remy of what you need to do: https://stackoverflow.com/a/1130506/386473

 

I just used exactly that approach and was able to find and fix the memory leaks in like half an hour. I'll try to make a pull request on GitHub as the patch is a bit larger due to spacing when I introduced new try/finally blocks.

Got one more memory leak that's showed up and I'd like to have a go at fixing it myself. It's narrowed down to being a problem with TProjectile, which doesn't have a destructor. However, I implemented a destructor and this doesn't fix the memory leak, so... more investigation needed.

 

I'm having some problems getting FastMM set up. Managed all but one step:

image.thumb.png.db5bed3c5a8d93ec0fd0c43c2f53ff4d.png

 

I tried running the program anyway and as far as I can tell, nothing happens. Does it open a dialog or spit out a text file? Or something else?

Share this post


Link to post
Posted (edited)
4 hours ago, Willicious said:

It's narrowed down to being a problem with TProjectile, which doesn't have a destructor. However, I implemented a destructor and this doesn't fix the memory leak

Classes do not need a destructor to be destroyed/freed. A class needs a destructor if it allocated other objects or memory that it should free or there is other cleanup it needs to do before being destroyed.

 

If an object is not being freed, adding a destructor would not fix the issue.

4 hours ago, Willicious said:

Does it open a dialog or spit out a text file? Or something else?

Read the comments in FastMM4Options.inc. Specifically:

  {Set this option to log all errors to a text file in the same folder as the
   application. Memory errors (with the FullDebugMode option set) will be
   appended to the log file. Has no effect if "FullDebugMode" is not set.}
  {$define LogErrorsToFile}

  {Set this option to log all memory leaks to a text file in the same folder as
   the application. Memory leak reports (with the FullDebugMode option set)
   will be appended to the log file. Has no effect if "LogErrorsToFile" and
   "FullDebugMode" are not also set. Note that usually all leaks are always
   logged, even if they are "expected" leaks registered through
   AddExpectedMemoryLeaks. Expected leaks registered by pointer may be excluded
   through the HideExpectedLeaksRegisteredByPointer option.}
  {$define LogMemoryLeakDetailToFile}

It has been a while since I did this. I found a post on SO for a reminder. Based on that post, copy FastMM4Options.inc to your project folder. Edit the .inc file and remove the period from {.$define FullDebugMode}

 

Do a full build of the project and any leaks should be written to a text file with additional information about the object leaked and when it was allocated.

Edited by JonRobertson

Share this post


Link to post
10 hours ago, Willicious said:

I'm having some problems getting FastMM set up.

Give madExcept a try again; It's much easier.

image.png.d862fc11a236f857323bca6cf0cccbf4.png

image.thumb.png.be8d2f4b04fbedf58a3f815c1032c081.png

image.png.bbc651c6780b8631aa243f018ba6d090.png

image.png.4f9814f3fe01ff9d6f786a7aa49baaf8.png

  • Like 1

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×