marcocir 2 Posted June 14, 2021 (edited) Hi everybody. I have a problem trying to compile a 64 bit Firemonkey Windows application in Delphi 10.3.3, app that used to compile, both 32 and 64 bits, without problems in Delphi 10.1, 10.2, 10.3.1 and 10.3.2. The first screen of disassembly window is the code generated by the Delphi 10.3.3 compiler, targeting Windows 32 bit in the IDE, for the begin line (577) of the AddProgrammaCompleto method (screen rendering of many fmx panels, layouts, ecc.). The second one is what compiler generates for the same line of code with target Windows 64 bit: as you can see there are differences, of course, but this code, the second time that is executed, ends in a stack overflow error (recursive calling itself?), letting the 64 bit version of my application unusable. My code is exactly the same, this is a simple rebuild switching from 32 to 64 bit. I'd need your help to understand if this simply a bug of the 10.3.3 64 bit windows compiler (and if there is something that I can try to "fix" it in your opinion) or if there is something in my code that, now, is bad for that compiler. Thanks N.B. Hope this is the right place for this type of problem Edited June 14, 2021 by marcocir Share this post Link to post
Remy Lebeau 1396 Posted June 14, 2021 (edited) 29 minutes ago, marcocir said: I'd need your help to understand if this simply a bug of the 10.3.3 64 bit windows compiler (and if there is something that I can try to "fix" it in your opinion) or if there is something in my code that, now, is bad for that compiler. Kind of hard to answer that without seeing your actual Delphi code. Edited June 14, 2021 by Remy Lebeau Share this post Link to post
marcocir 2 Posted June 14, 2021 (edited) Hi Remy, absolutely. This is a medium fmx application, sort of 100.000 lines. I attach the code of method whose the begin is so bad compiled - and hope this could help. Thanks a lot AddProgrammaCompleto method.txt Edited June 14, 2021 by marcocir Share this post Link to post
Arnaud Bouchez 407 Posted June 15, 2021 (edited) The Win64 assembly is not inefficient. It is almost the same code as the Win32 version. And my guess is that Win64 code would be faster. The on-stack TPanel is initialized with explicit zero fill of all bytes in Win32 in a loop, whereas it is with zero fill only of some managed fields within TPanel in Win64, if I understand well the generated asm. Then the RTL calls are comparable. So at execution, my guess is that the Win64 version is likely to be faster. So the compiler is not faulty. On the other hand, your code could be enhanced. One simple optimization would be to write "const Programma: TProgrammaCalcio;" for your program parameter to avoid a RTL array copy call (stack allocation + movsd/movsq + AddRefArray). Using "const" for managed types (string, arrays, records) is always a good idea. Actual stack variable use may be the real cause of the Stack Overflow problem. My guess is that you are abusing of huge variable on stack. Each calls seems to consume $8cb08 bytes on the stack. Much too big! So to fix it: 0) read more about how records and static arrays are used by Delphi, especially about memory allocation and parameter passing 1) use const to avoid a temporary copy on the local stack 2) don't use fixed size arrays for TProgrammaCalcio but a dynamic array allocated on heap 3) TPanel may also be somewhat big - consider using classes and not records to use the heap instead of the stack 4) If you are sure that heap allocation is a bottleneck (I doubt it here), you may create a temporary TPanel within the main TfrmFixedCalcio instance and reuse it Before wondering about the code generated by the Delphi compiler, please review your own code, which is likely to be the real bottleneck for performance. And in such a method, the FMX process and the GUI computation is likely to be the real bottleneck, not your Delphi code, and especially not the prolog/epilog generated by the Delphi compiler. "Premature Optimization is the root of all evil"! (DK) Edited June 15, 2021 by Arnaud Bouchez 1 Share this post Link to post
marcocir 2 Posted June 15, 2021 (edited) Thanks Arnaud. There is a reason why TProgrammaCalcio is not passed as const: I have to update Selezionato (Programma[iProx].Selezionato := 1). The question is always there: maybe it wasn't clear, but my problem isn't the speed of the code, but that 64 bit prolog asm (and not the 32 bit equivalent) seems to recall something endless and the application explodes in a stack overflow error. You are right: that method could be a bit (or a lot ) optimized, but now it can't because the simply execution of the begin line crashes the application! My asm knowledge is not so strong, so this post... p.s think that many of those panels and layouts were added as a trick to fix some fmx... 2014/2015 problems 😉 Edited June 15, 2021 by marcocir Share this post Link to post
Guest Posted June 15, 2021 12 hours ago, marcocir said: the second time that is executed, ends in a stack overflow error (recursive calling itself?) That sound very serious bug in 64bit compiler. Don't have answer per se for this code or the latest compilers, but here few thought running in my brain 1) Looking at this This is wrong on so many levels, see $9863 = 39011 in decimal, the 2 push will fill 8 bytes means 39011*8 = 312088 bytes being filled with 0 and used on stack, this is more than quarter of the 1mb default stack size !, so the compiler is using the stack in uncontrolled ( or semi controlled way) 2) Looking at the code you provided, there is nothing declared in the var sections that require the size established above., i mean it is almost impossible to use 1kb of the stack with simple types, to use simple types ( not arrays ...) for 32bit 1024/4 = 256 , and that the number of needed of simple types vars ( strings, integers, objects..) to fill 1kb 3) Why ? and How ?, these are more complex questions but have factors affecting this, like, Your code looks like too complicated for the compiler so it went to temporary vars this will add to stack usage, considering the Delphi compiler have a brain smaller than a sparrow. Are you using inlined functions, this will bring the stack from these inlined to the host. Also typecasting might be a problem, not sure if there is in that code, but mentioning it anyway. The 64bit is ugly as it can get, but that should not be problem unless the problem is magnified by 2 for 64bit, i mean the stack usage by these temporary vars (storage) in that case each one will take 8 bytes means, your function is hitting two third of the available stack, so my recommendation is split that function into few functions, and double check what are you inlining if there is any. By splitting i mean try to move the loops into their own functions, because it is complex nested loops which always throw the compiler running after butterflies. Now returning to the first line i wrote, if second execution raises a stack overflow then the compiler is faulty no arguing there, that should not happen unless you are using depleting the stack, but looking at the code, the compiler itself depleted the stack and failed to maintain it right. Share this post Link to post
marcocir 2 Posted June 15, 2021 Ok, I've traced the problem to the asm lines of disassembly windows: The stack error is down to the selected line, where, if I've well understood that code, the generated asm tries to move stuff on the stack (rsp), in that loop (jnbe), till it gets the max stack size permitted (?), 1 mb for my app. What's the meaning of that code fragment (that I can't see in the 32 bit prolog asm)? Thanks a lot! Share this post Link to post
Arnaud Bouchez 407 Posted June 15, 2021 (edited) @Kas Ob. Win32 code is fine, it fills the stack variables with zeros, and my guess is that the size fits the TProgrammaCalcio size. The problem is really a stack overflow, not infinite recursion. The compiler has no bug here. User code is faulty. @marcocir Did you read my answer? There is no unexpected recursion involved. The problem is that this TProgrammaCalcio is too huge to fit on the stack. On Win64, this is what mov [rsp+rax],al does: it reserves some space for the stack, by increasing the stack pointer by 4KB chunks (the memory page size), and writing one single byte to it to trigger a page fault and a stack overflow if there is no stack place available. And you are using too much of it with your local variable. On Win32, it doesn't reserve the memory by 4KB chunks, but it fills the stack with zeros - which is slower, but does exactly the same. What does SizeOf(TProgrammaCalcio) return? Is it really a static array? You should switch to a dynamic array instead. A good practice is not to allocate more than 64KB on the stack, especially if there are some internal calls to other functions. In some execution context (e.g. a dll hosted by IIS), the per-thread stack size could be as little as 128KB. Edited June 15, 2021 by Arnaud Bouchez Share this post Link to post
David Heffernan 2345 Posted June 15, 2021 @marcocir start from the assumption that the compiler is correct and that your code is faulty. That's far and away the most likely explanation. With that mindset, read Arnaud's answer again. 1 1 Share this post Link to post
Guest Posted June 15, 2021 @marcocir Do as Arnaud said, as the size of that parameter can cause the high stack usage, also if you can't use const then declare it as var. Also one thing to remember, Stack Overflow exception doesn't always means an recursive had gone uncontrolled, to explain how stack overflow being raised you need to know about stack protection first. So, we have a stack which usually 1mb per thread, the OS will mark few pages as read and write protected means any access to these will raise a hard access violation exception, the system places these pages ( usually 3 or 2 ) with size of 4kb each, at the end of the stack means they are are the lowest point of that 1mb, the system will check raised access violation pointing to these pages and change the exception to stack overflow, instead of access violation, so unless you landed on these pages there will not be a stack overflow even with recursive, also if a function allocating a lot of either local or temp variables it will get close to the end of the stack eventually might hit them without recursive logic. Share this post Link to post
marcocir 2 Posted June 15, 2021 (edited) Arnaud, now I hope I get the point. Yes it is a static array of 305k (32bit) or 562k (64bit). Two passes and we are over the 1MB upper limit. First, my bad, I was convinced that passing an array that way was exactly as passing it like with a var parameter (a pointer, so, no copy of the whole array on the stack). But this is an array of a complex record, with strings, so maybe this is the falling point. Second, I have to understand why on the heart the stack isn't unloaded from the first 562k when my method returns. Sure, maybe I have to re-read some basics, sorry David, you are right, but the start of this story is that the exact same code compiled to 64 bit, since 2014 , from Delphi 10.1 to 10.3.2, without any stack overflow error (and this is a strange thing ;( ). Edited June 15, 2021 by marcocir Share this post Link to post
marcocir 2 Posted June 15, 2021 1 hour ago, Arnaud Bouchez said: So to fix it: 0) read more about how records and static arrays are used by Delphi, especially about memory allocation and parameter passing 1) use const to avoid a temporary copy on the local stack 2) don't use fixed size arrays for TProgrammaCalcio but a dynamic array allocated on heap 3) TPanel may also be somewhat big - consider using classes and not records to use the heap instead of the stack 4) If you are sure that heap allocation is a bottleneck (I doubt it here), you may create a temporary TPanel within the main TfrmFixedCalcio instance and reuse it Ooops, I didn't see that you integrated your answer... Share this post Link to post
Arnaud Bouchez 407 Posted June 15, 2021 The stack is always "unloaded" when the method returns. That is a fact for sure. There is a "mov rbp, rsp" in the function prolog, and a reversed "mov rsp, rbp" in the function epilog. Nice and easy. Look at the stack trace in the debugger when you reach the stack overflow problem. You will find out the exact context. And switch to a dynamic array. Using proper copy() if you want to work on a local copy. Share this post Link to post
David Heffernan 2345 Posted June 15, 2021 (edited) 26 minutes ago, marcocir said: First, my bad, I was convinced that passing an array that way was exactly as passing it like with a var parameter (a pointer, so, no copy of the whole array on the stack). But this is an array of a complex record, with strings, so maybe this is the falling point. It's a fixed length array, so it's a value type. Which means that when you pass it as a value argument, it travels via the stack. If it were a dynamic array then it would be passed as a pointer to the first element. 26 minutes ago, marcocir said: David, you are right, but the start of this story is that the exact same code compiled to 64 bit, since 2014 , from Delphi 10.1 to 10.3.2, without any stack overflow error (and this is a strange thing ;( ). Not that strange really. Changing from 32 to 64 bit is significant. You've been getting lucky up until now. All that copying of huge objects must cost a lot of time and make for inefficient memory usage. Mindset is important here. It's very rare that you will find compiler bugs (even with the Delphi compiler!) So always suspect your own code first. Edited June 15, 2021 by David Heffernan Share this post Link to post
marcocir 2 Posted June 15, 2021 (edited) 12 minutes ago, David Heffernan said: It's a fixed length array, so it's a value type. Which means that when you pass it as a value argument, it travels via the stack. If it were a dynamic array then it would be passed as a pointer to the first element. Not that strange really. Changing from 32 to 64 bit is significant. You've been getting lucky up until now. All that copying of huge objects must cost a lot of time and make for inefficient memory usage. Thanks David, I completely agree, not a great example of optimization here, but bet it that this method had no problems (lucky me, ok) till now. I have, also, to dig on why the first 562k (...) moved on the stack, now are not unloaded from the stack when the method returns. Edited June 15, 2021 by marcocir Share this post Link to post
marcocir 2 Posted June 15, 2021 32 minutes ago, Kas Ob. said: Also one thing to remember, Stack Overflow exception doesn't always means an recursive had gone uncontrolled, to explain how stack overflow being raised you need to know about stack protection first. Thanks Kas, another thing to remember! 😉 Share this post Link to post
Guest Posted June 15, 2021 1 hour ago, Arnaud Bouchez said: Win32 code is fine, it fills the stack variables with zeros, and my guess is that the size fits the TProgrammaCalcio size. Not trying to raise long discussion, but fine is more or less like well it works. The code generated in many parts gives the impression that the compiler currently in un-finished development stage and stopped years ago, literally looks like test beta version compiler since ages. My point is : What is the point of zeroing passed parameter to just be followed by full copy !! a full overwrite is following that zero process, pointing to few things is wrong here 1) The zeroing is done on huge chunk of memory, no matter what machine code and its speed, it should have compiler configurable threshold to switch to intrinsic RTL function like FillChar, ZeroMemory instead of locally do it. 2) Copying large part of memory as above should be switched to RTL specific function. 3) Why zeroing if it gonna be copied over. I think these points are viable to be reported, if reporting will do something about that. @marcocir You mentioned that can't use const, this means one of two 1) You are changing something in the passed array, but not the changes will not be permanent, because it is done on temporary local copy and all changes will vanish once that function exit. 2) You are changing something in the passed array, but you need it only locally for this function, then using var and const only will not help, but you can allocate same size of it on the heap and copy it yourself, just don't forget to switch the passed to const, then change what ever you want on your heap allocated copy. Share this post Link to post
David Heffernan 2345 Posted June 15, 2021 1 hour ago, marcocir said: I have, also, to dig on why the first 562k (...) moved on the stack, now are not unloaded from the stack when the method returns. It is removed from the stack when the method returns. I don't think you've got to the bottom of the issue yet. Share this post Link to post
marcocir 2 Posted June 15, 2021 1 hour ago, David Heffernan said: It is removed from the stack when the method returns. I don't think you've got to the bottom of the issue yet. Yes David, I'm here to learn, if possible. My code is buggy, no problem, and static arrays are horrible in this context. So same exact code, compiled in 10.2.3 version continues to run without problems, and compiled with 10.3.3 version continues to crash with stack overflow error: work to do this summer. This runs (10.2.3): This crashes (10.3.3): Thanks for your great help. Share this post Link to post
David Heffernan 2345 Posted June 15, 2021 Answer likely to be found in the source code Share this post Link to post
marcocir 2 Posted June 15, 2021 1 minute ago, David Heffernan said: Answer likely to be found in the source code Uhmmmm, I agree, but I have compiled the same exact source code, using same Delphi Compiler options. I don't know if it's normal that mov eax,.... instruction at offset $4 is of different length. Ok.., It's enough, thank you all. 😉 Share this post Link to post
Arnaud Bouchez 407 Posted June 23, 2021 My guess is that the default data alignment may have changed between Delphi 10.2 and 10.3, so the static arrays don't have the same size. You may try to use "packed" for all the internal structures of the array. Share this post Link to post
marcocir 2 Posted June 23, 2021 Hi Arnaud, and thanks. I don't know if that change happened, but, as said by David, stack overflow error was there for a problem in my code: the (evil) static array had a size change when I ported the source from Delphi 10.2 to 10.3 , so the overflow problem with the stack, my bad. Share this post Link to post