pyscripter (posted November 25, 2018)

I used to think that enabling the compiler option "Use debug dcus" had minimal impact on performance. I had not done any benchmarking, but in practice I could not tell the difference. So I would include debug dcus even in release versions, so that full stack trace information would be available.

But in my recent tests of Regular Expressions performance, I found that enabling this option increased run times by as much as 70-80%.

Has anyone done any benchmarking, or does anyone have a rough idea about this? Is the large performance hit specific to this case? Can you still get full stack trace information without using debug dcus? (I am using JCL's Debug expert for exception reporting, with jdbg info linked into the executable.)

Example stack trace:

Stack list, generated 14/09/2018 15:56:54
(0000F543){PyScripter.exe} [00410543] System.LocaleCharsFromUnicode (Line 39900, "System.pas" + 1) + $17
(001CDF3F){PyScripter.exe} [005CEF3F] Vcl.Controls.TWinControl.CMEnabledChanged (Line 11672, "Vcl.Controls.pas" + 2) + $2
(002B5F41){PyScripter.exe} [006B6F41] VirtualTrees.TBaseVirtualTree.CMEnabledChanged (Line 15752, "VirtualTrees.pas" + 1) + $2
(0003F413){PyScripter.exe} [00440413] System.Generics.Collections.TListHelper.DoRemoveFwd4 (Line 2295, "System.Generics.Collections.pas" + 3) + $6
(000081DB){PyScripter.exe} [004091DB] System.TMonitor.Exit (Line 18722, "System.pas" + 2) + $7
(000A558C){PyScripter.exe} [004A658C] System.Classes.RemoveFixups (Line 9715, "System.Classes.pas" + 14) + $8
(00007620){PyScripter.exe} [00408620] System.TObject.Destroy (Line 16985, "System.pas" + 0) + $0
dummzeuch (posted November 25, 2018)

Originally the idea was that the debug dcus only contain additional information for the integrated debugger, which should have no performance impact at all. This of course is only true if all compiler (and possibly linker) settings are equal, which I doubt. For example, enabling range checking (which I always do for debug builds) can have a significant impact on performance. I have no idea which compiler options were used for the supplied debug dcus.

The JclDebug stack trace does not require debug dcus, only a detailed map file, which does not have any performance impact.
David Heffernan (posted November 25, 2018, edited the same day)

My understanding has always been that debug dcus are compiled with optimisations etc., so there should be no performance difference. Can you provide a cut-down program that demonstrates the issue?
pyscripter (posted November 25, 2018)

@David Heffernan Please find attached my RE Benchmark application. It is a small console app and comes with the text file used for testing. With debug dcus, the run time increases from 8 to about 14 seconds.

Attachment: RE_Benchmark.7z
David Heffernan (posted November 25, 2018)

Is it possible that the difference is due to the PCRE code being compiled with poor settings? Something similar to this: https://stackoverflow.com/q/27821277/505088
pyscripter (posted November 25, 2018)

But here I am comparing the same app with and without debug dcus, not 32-bit against 64-bit. Actually, I just noticed that the difference only appears with 32-bit in Delphi Rio. Debug dcus make little difference in Rio 64-bit or in Delphi Tokyo.

The surprising thing is that in this benchmark application, most of the computational time is spent inside external C code linked into the application. Could it be that the debug dcus link to different C code in 32-bit Rio compared to the dcus without debug info?

Another finding was that in Delphi Tokyo, the benchmark runs significantly faster in 64-bit than in 32-bit (quite the opposite of David's zlib experience).
Attila Kovacs (posted November 25, 2018)

@pyscripter Hi, I can't see any difference between release and debug under Rio/x86. Both are done in 12.x seconds.
pyscripter (posted November 25, 2018)

@Attila Kovacs Was the debug build made with the "Use debug dcus" option enabled?
Attila Kovacs (posted November 25, 2018)

@pyscripter Uhh, sorry, it wasn't. Now it's as slow as Berlin, as if the enhancements in Rio were missing from the debug dcus. It takes about 20 seconds.
pyscripter (posted November 25, 2018)

At least we can conclude that the problem with debug dcus is confined to the RE units in Win32. Bug report submitted.
Attila Kovacs (posted November 25, 2018)

I don't have the impression that the debug dcu code has been compiled with optimization. I'm also not sure that it should have been. For exception reporting I'm using EurekaLog; the call stack is okay, though maybe not as verbose as with debug dcus. But as a matter of principle I never release with debug dcus.
David Heffernan (posted November 25, 2018)

pyscripter said:

    But here I am comparing the same app with and without debug dcus, not 32-bit against 64-bit. Actually, I just noticed that the difference only appears with 32-bit in Delphi Rio. Debug dcus make little difference in Rio 64-bit or in Delphi Tokyo.

    The surprising thing is that in this benchmark application, most of the computational time is spent inside external C code linked into the application. Could it be that the debug dcus link to different C code in 32-bit Rio compared to the dcus without debug info?

    Another finding was that in Delphi Tokyo, the benchmark runs significantly faster in 64-bit than in 32-bit (quite the opposite of David's zlib experience).

My point is not that it's a 32/64-bit issue. My point is that it could be a difference in the way the C code is compiled.
pyscripter (posted November 25, 2018)

@dummzeuch I checked again: without debug dcus you do not get line information for VCL and RTL routines in the call stack when using JCL Debug. This was the reason I was using the "Use debug dcus" option.