FPiette, posted January 16, 2022: Hello! I need to profile my application to find a performance bottleneck. This 32-bit application, written in Delphi 11, uses IXmlDoc to read a GPX file and build an equivalent Delphi class hierarchy. This process looks abnormally slow. The cause could be IXmlDoc, which is not the fastest XML parser, or the code I wrote, and I hope a profiler will show me where the issue lies. I searched the web and found several products. Among them is GpProfile2017 (https://github.com/ase379/gpprofile2017), which is open source and still maintained. Before jumping into this project, I would like to know whether anyone has experience to share. Thanks.
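To illustrate what profiling a "parse the file, then build objects" step looks like, here is a minimal Python sketch using the standard library's cProfile. It is not Delphi and not GpProfile2017. cProfile is a deterministic profiler that hooks every function call, so it shares the overhead caveats of instrumenting profilers. The TrackPoint class, build_sample and parse_gpx are hypothetical stand-ins for the application described above.

```python
import cProfile
import io
import pstats
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"


class TrackPoint:
    """Hypothetical business class, one instance per <trkpt>."""

    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon


def build_sample(n):
    """Generate a synthetic GPX document with n track points."""
    points = "".join(f'<trkpt lat="{i * 0.001:.6f}" lon="{i * 0.002:.6f}"/>' for i in range(n))
    return f'<gpx xmlns="{GPX_NS}"><trk><trkseg>{points}</trkseg></trk></gpx>'


def parse_gpx(text):
    """DOM-style parse followed by building the object list."""
    root = ET.fromstring(text)
    return [TrackPoint(float(e.get("lat")), float(e.get("lon")))
            for e in root.iter(f"{{{GPX_NS}}}trkpt")]


def profile_parse(text, top=10):
    """Run parse_gpx under cProfile; return the points, the Stats and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    points = parse_gpx(text)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
    stats.print_stats(top)  # the top entries by cumulative time
    return points, stats, out.getvalue()


if __name__ == "__main__":
    points, _, report = profile_parse(build_sample(20_000))
    print(len(points), "points")
    print(report)
```

Sorting by cumulative time puts the callers that include the most work at the top, which is usually the quickest way to see whether the parser call or the object-building code dominates.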
Dalija Prasnikar, posted January 16, 2022: A sampling profiler will give you better insight: https://www.delphitools.info/samplingprofiler/
Dalija Prasnikar, posted January 16, 2022: Unrelated to profiling, there are other optimizations. First, SAX parsing is generally faster than DOM parsing, especially when the DOM is based on interfaces. If you don't need the XML DOM, building your business classes directly during parsing will be more efficient. But not all structures can be parsed equally easily with SAX. Next, IXmlDoc works on top of the standard IDOM interfaces, so there is an additional slowdown there. If you cannot use SAX, modifying your code to work directly with the IDOM interfaces might be a solution, as might using a different DOM parser.
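A minimal sketch of "building your business classes directly during parsing", written in Python with the standard xml.sax module rather than a Delphi SAX parser. The TrackPoint dataclass, the handler and the sample document are illustrative assumptions. Only `trkpt` elements and their optional `ele` child are handled, and no DOM is ever built.

```python
import xml.sax
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackPoint:
    """Hypothetical business object built straight from SAX events."""
    lat: float
    lon: float
    ele: Optional[float] = None


class GpxTrackHandler(xml.sax.ContentHandler):
    """Builds TrackPoint objects as <trkpt> elements stream past; no DOM is kept."""

    def __init__(self):
        super().__init__()
        self.points: List[TrackPoint] = []
        self._current: Optional[TrackPoint] = None
        self._in_ele = False
        self._text: List[str] = []

    def startElement(self, name, attrs):
        # Namespace processing is off by default; with an unprefixed default
        # xmlns (as in the sample) names arrive as "trkpt" and "ele".
        if name == "trkpt":
            self._current = TrackPoint(float(attrs["lat"]), float(attrs["lon"]))
        elif name == "ele" and self._current is not None:
            self._in_ele = True
            self._text = []

    def characters(self, content):
        # Text may be delivered in several chunks, so collect it.
        if self._in_ele:
            self._text.append(content)

    def endElement(self, name):
        if name == "ele" and self._in_ele:
            self._current.ele = float("".join(self._text))
            self._in_ele = False
        elif name == "trkpt" and self._current is not None:
            self.points.append(self._current)
            self._current = None


def parse_track_points(data: bytes) -> List[TrackPoint]:
    handler = GpxTrackHandler()
    xml.sax.parseString(data, handler)
    return handler.points


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="50.85" lon="4.35"><ele>13.5</ele></trkpt>
    <trkpt lat="50.86" lon="4.36"/>
  </trkseg></trk>
</gpx>"""

if __name__ == "__main__":
    for p in parse_track_points(SAMPLE):
        print(p)
```

The handler keeps only the point currently being built, so memory use is proportional to the result list rather than to the number of XML nodes.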
FPiette, posted January 16, 2022: Dalija Prasnikar said: "https://www.delphitools.info/samplingprofiler/" This link is dead.
FPiette, posted January 16, 2022: Dalija Prasnikar said: "But not all structures can be parsed equally easily with SAX." A GPX file has a very simple structure, but there are tens of thousands of nodes (example GPX file). Before changing my code, I would like to profile it to know whether the slowness comes from the XML parser or from my own code, which is fully class-oriented with generic TObjectList. I suspect this is much slower in my case than records and pre-allocated dynamic arrays.
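Python has no value-type records, but a loose analogue of "records in a pre-allocated dynamic array" versus "one heap object per item" is to keep each field in a typed array allocated up front. The sketch below assumes the number of points is known in advance (for example from a first counting pass). TrackColumns and its methods are hypothetical names.

```python
from array import array


class TrackColumns:
    """Hypothetical container: preallocated typed arrays instead of one object per point.

    array('d') stores raw C doubles contiguously, loosely comparable to a Delphi
    dynamic array of records (values stored inline rather than as separate objects).
    """

    def __init__(self, capacity):
        # Allocate all storage up front; add() never grows the arrays.
        self.lat = array("d", [0.0]) * capacity
        self.lon = array("d", [0.0]) * capacity
        self.count = 0

    def add(self, lat, lon):
        if self.count == len(self.lat):
            raise IndexError("capacity exceeded")
        self.lat[self.count] = lat
        self.lon[self.count] = lon
        self.count += 1

    def point(self, index):
        if not 0 <= index < self.count:
            raise IndexError(index)
        return self.lat[index], self.lon[index]


if __name__ == "__main__":
    track = TrackColumns(3)
    track.add(50.85, 4.35)
    track.add(50.86, 4.36)
    print(track.count, track.point(1))
```

Whether this pays off depends on the access pattern; the point is only that storage is allocated once and items are not individual heap objects.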
David Schwartz, posted January 16, 2022: FPiette said: "This link is dead." Works fine for me!
PeterBelow, posted January 16, 2022: FPiette said: "... my own code, which is fully class-oriented with generic TObjectList. I suspect this is much slower in my case than records and pre-allocated dynamic arrays." Set the Capacity property of your list objects to a suitably high value before you start adding objects. That can greatly improve performance, since it reduces the number of times the list has to grow its internal array.
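To show why setting Capacity up front helps, here is a small Python simulation of a list backed by a growable array that counts reallocations and copied elements. It assumes a simple doubling growth policy starting at 4 slots. Delphi's actual TList growth policy differs, and GrowableArray is a hypothetical class, not a Delphi or Python API.

```python
class GrowableArray:
    """Hypothetical list with an explicit backing array, like a TList's internal array."""

    def __init__(self, growth_factor=2):
        self._items = []          # backing storage; slots at index >= count are unused
        self.count = 0
        self.growth_factor = growth_factor
        self.reallocations = 0    # how many times the backing array was replaced
        self.copied = 0           # elements moved during those reallocations

    @property
    def capacity(self):
        return len(self._items)

    @capacity.setter
    def capacity(self, value):
        if value < self.count:
            raise ValueError("capacity below count")
        if value != self.capacity:
            new_items = [None] * value
            new_items[:self.count] = self._items[:self.count]  # copy existing items
            self.copied += self.count
            self.reallocations += 1
            self._items = new_items

    def add(self, item):
        if self.count == self.capacity:
            # Assumed policy: double the size (minimum 4) when full.
            self.capacity = max(4, self.capacity * self.growth_factor)
        self._items[self.count] = item
        self.count += 1


if __name__ == "__main__":
    grown = GrowableArray()
    for i in range(1000):
        grown.add(i)
    reserved = GrowableArray()
    reserved.capacity = 1000   # like setting Capacity before adding
    for i in range(1000):
        reserved.add(i)
    print(grown.reallocations, grown.copied)        # grown on demand
    print(reserved.reallocations, reserved.copied)  # one allocation, no copies
```

With the capacity reserved, the 1000 additions cause a single allocation and no element copies. Growing on demand reallocates repeatedly and copies every earlier element each time.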
Dalija Prasnikar, posted January 16, 2022: FPiette said: "GPX files have a very simple structure, but there are tens of thousands of nodes. Before changing my code, I would like to profile it ..." Understandable. I would do the same. The GPX format is ideal for a SAX parser. Avoiding the allocation of thousands of XML nodes would be my best bet for optimization.
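If a full SAX handler feels too low-level, a middle ground is a streaming parse that discards each node once it has been read, so the number of live XML nodes stays small. A minimal Python sketch with xml.etree.ElementTree.iterparse follows. Detaching finished `trkpt` elements from their `trkseg` parent is the memory-bounding step. The function name and sample are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET

GPX = "{http://www.topografix.com/GPX/1/1}"


def stream_track_points(source):
    """Yield (lat, lon) for each <trkpt>, detaching finished elements as we go."""
    segment = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == GPX + "trkseg":
            segment = elem  # remember the parent so finished points can be removed
        elif event == "end" and elem.tag == GPX + "trkpt":
            point = (float(elem.get("lat")), float(elem.get("lon")))
            if segment is not None:
                # Drop the finished point so the tree does not keep every node alive.
                segment.remove(elem)
            yield point


if __name__ == "__main__":
    data = (b'<gpx xmlns="http://www.topografix.com/GPX/1/1"><trk><trkseg>'
            b'<trkpt lat="1.5" lon="2.5"/><trkpt lat="3" lon="4"/>'
            b'</trkseg></trk></gpx>')
    for lat, lon in stream_track_points(io.BytesIO(data)):
        print(lat, lon)
```

Because the function is a generator, the caller can build its own records while parsing continues, instead of walking a finished DOM afterwards.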
pyscripter, posted January 16, 2022: FPiette said: "This 32-bit application, written in Delphi 11, uses IXmlDoc" Have a look at this. In addition to SAX parsers, you may want to consider XmlLite. XmlLite is a good alternative to SAX on Windows; see the note about push and pull parsers. It offers similar speed and is much easier to program with, and there is a Delphi wrapper (just a single unit to add to your project). In my experience XmlLite was very fast. Microsoft uses XmlLite to parse SVG files. (Edited January 16, 2022 by pyscripter)
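To make the push/pull distinction concrete: with SAX the parser calls your handler (push), while with a pull parser such as XmlLite your loop asks for the next node. XmlLite is a Windows COM API, so the sketch below uses Python's xml.dom.pulldom instead, which gives a pull-style loop (built on SAX internally). The function name and sample data are assumptions.

```python
from xml.dom import pulldom


def pull_track_points(text: str):
    """Pull-style loop: the caller requests the next event instead of receiving callbacks."""
    points = []
    for event, node in pulldom.parseString(text):
        if event == pulldom.START_ELEMENT and node.tagName == "trkpt":
            # Child nodes are only built if expandNode(node) is called; here only
            # the attributes of the start tag are read.
            points.append((float(node.getAttribute("lat")), float(node.getAttribute("lon"))))
    return points


if __name__ == "__main__":
    sample = ('<gpx xmlns="http://www.topografix.com/GPX/1/1"><trk><trkseg>'
              '<trkpt lat="1.5" lon="2.5"/><trkpt lat="3" lon="4"/>'
              '</trkseg></trk></gpx>')
    print(pull_track_points(sample))
```

The control flow stays in ordinary sequential code, which is the "easier to program with" property attributed to pull parsers above.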
Anders Melander, posted January 16, 2022: François, for something like this you probably don't need a profiler at all, and especially not an instrumenting profiler (more on that later). What I always do before resorting to profilers is simply run the code in the IDE and pause execution in the debugger while the time-critical code is executing. Statistically, the current call stack is most likely to show you where your hot spot is. Repeat a few times to verify. This is basically the same approach a sampling profiler uses. The problem with instrumenting profilers is that the overhead of the instrumentation code affects the timing results so much that you can't really rely on them. They're great for determining call graphs and identifying the relative call frequency of different methods, but in my experience you can't use the timing for much.
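The "pause the debugger a few times" technique can be automated, which is essentially what a sampling profiler does. Below is a minimal Python sketch: a loop in the main thread periodically peeks at a worker thread's current frame via sys._current_frames() (a CPython function documented for specialized use) and counts which function it is in. The workload functions are hypothetical, and the sampling interval is an arbitrary assumption.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, duration=1.0, interval=0.005):
    """Periodically record the innermost Python function of one thread."""
    counts = collections.Counter()
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1  # one "pause in the debugger"
        time.sleep(interval)
    return counts


def slow_part():   # hypothetical hot spot
    total = 0
    for i in range(20_000):
        total += i * i
    return total


def fast_part():   # hypothetical cheap step
    return sum(range(200))


def workload(stop):
    while not stop.is_set():
        slow_part()
        fast_part()


def profile_workload(duration=0.5):
    stop = threading.Event()
    worker = threading.Thread(target=workload, args=(stop,))
    worker.start()
    try:
        return sample_stacks(worker.ident, duration)
    finally:
        stop.set()
        worker.join()


if __name__ == "__main__":
    for name, hits in profile_workload().most_common():
        print(f"{name}: {hits}")
```

No code is added to the measured functions, so their timing is not distorted the way per-call instrumentation distorts it; the trade-off is that results are statistical.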
FPiette, posted January 16, 2022: pyscripter said: "XmlLite is a good alternative to SAX on Windows." I'm currently trying OmniXML, which comes with Delphi 11. If that is unsuccessful, I will give XmlLite a try. Thanks.
FPiette, posted January 16, 2022: Anders Melander said: "simply run the code in the IDE and pause the execution in the debugger when the time critical code is executing." When I do that, the debugger always stops at the same place: ntdll.RtlUserThreadStart. And the call stack is empty!
Anders Melander, posted January 16, 2022: FPiette said: "When I do that, the debugger always stops at the same place: ntdll.RtlUserThreadStart. And the call stack is empty!" Are you looking at the main thread? This is probably COM apartment threading at play. If so, one of the other threads will most likely be running code in MSXML.
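When the thread you happen to break into is idle, the work is on another thread, so it helps to look at every thread's stack, not just one. A small Python sketch of that idea: sys._current_frames() (CPython, intended for specialized use) plus traceback.extract_stack() produce a stack per thread, similar to switching threads in the debugger's thread view. The thread name "parser-worker" and worker_body are hypothetical.

```python
import sys
import threading
import traceback


def dump_all_threads():
    """Return {thread name: [function (line N), ...]} from outermost to innermost frame."""
    names = {t.ident: t.name for t in threading.enumerate()}
    report = {}
    for ident, frame in sys._current_frames().items():
        stack = traceback.extract_stack(frame)
        report[names.get(ident, f"thread-{ident}")] = [
            f"{fs.name} (line {fs.lineno})" for fs in stack
        ]
    return report


def worker_body(ready, stop):
    # Hypothetical background work: signal readiness, then block until told to stop.
    ready.set()
    stop.wait()


def demo():
    ready, stop = threading.Event(), threading.Event()
    worker = threading.Thread(target=worker_body, args=(ready, stop), name="parser-worker")
    worker.start()
    ready.wait()
    try:
        return dump_all_threads()
    finally:
        stop.set()
        worker.join()


if __name__ == "__main__":
    for thread_name, frames in demo().items():
        print(thread_name)
        for entry in frames:
            print("   ", entry)
```

Each thread's list reads from the thread entry point down to where it is currently executing, which is what you would inspect for each thread in the debugger.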
Fr0sT.Brutal, posted January 17, 2022: FPiette said: "Before changing my code, I would like to profile it to know whether the slowness comes from the XML parser or from my own code ..." Hmm, I'll probably play Cap'n Obvious here, but why not time the parse/read code and the list-filling code separately (with a primitive GetTickCount)? That's what I do first, before unpacking the serious tools.
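The "time each phase separately" approach is only a few lines in any language. A Python sketch follows, where time.perf_counter plays the role of GetTickCount (with finer resolution) and the two phases mirror "XML parser" versus "my own code". The timed helper, load_track and the sample document are assumptions for illustration.

```python
import time
import xml.etree.ElementTree as ET
from contextlib import contextmanager

TRKPT = "{http://www.topografix.com/GPX/1/1}trkpt"


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def load_track(text):
    results = {}
    with timed("parse", results):   # XML parser only
        root = ET.fromstring(text)
    with timed("fill", results):    # our own code: building the list
        points = [(float(e.get("lat")), float(e.get("lon"))) for e in root.iter(TRKPT)]
    return points, results


if __name__ == "__main__":
    body = "".join(f'<trkpt lat="{i}" lon="{i}"/>' for i in range(50_000))
    doc = f'<gpx xmlns="http://www.topografix.com/GPX/1/1"><trk><trkseg>{body}</trkseg></trk></gpx>'
    pts, timings = load_track(doc)
    for label, seconds in timings.items():
        print(f"{label}: {seconds * 1000:.1f} ms")
```

If one phase clearly dominates, that tells you where to look before reaching for a profiler.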
John Terwiske, posted January 17, 2022: If one starts with a good algorithm, then the only thing that works for me is profiling (without instrumentation). I've had good luck with the free VTune Profiler from Intel. Attached is a picture showing a comparison of Delphi and C++ for a prime-sieving console application on Windows. This sample uses FastMM5, but the cache-miss figures are not much different with the version of FastMM that ships with Delphi. I should also note that the Delphi implementation needs more work (in the algorithm more than anything else), but this might give you an idea of where to look for performance improvements. Also, one needs to jump through some hoops to find the actual line in the Delphi code where bottlenecks appear (unlike some of the profilers mentioned above, which can zero in on the function).
Anders Melander, posted January 17, 2022: John Terwiske said: "Also, one needs to jump through some hoops to find the actual line in Delphi code where bottlenecks appear (unlike some of the profilers mentioned above which can zero in on the function)." Aren't you using map2pdb with VTune?
John Terwiske, posted January 18, 2022: Anders Melander said: "Aren't you using map2pdb with VTune?" Not yet, but it certainly looks worthwhile!
Stefan Glienke, posted January 18, 2022: John Terwiske said: "I should also note that the Delphi implementation needs more work (in the algorithm more than anything else)" Which PrimeSieve implementation do you use? There was quite a fuss about that last year after the YouTube video series by Dave Plummer.
luebbe, posted January 18, 2022: @FPiette, do you know https://github.com/neslib/Neslib.Xml? I switched some of my XML parsing from Delphi's native parser to Neslib.Xml, and parsing became roughly 35x-40x faster (17-20 seconds versus 0.5 seconds or less).
John Terwiske, posted January 18, 2022: Stefan Glienke said: "Which PrimeSieve implementation do you use? There was quite a fuss about that last year after the YouTube video series by Dave Plummer." In the table above, the C++ implementation is the accepted solution for the Dave Plummer YouTube series (Microsoft VS C++ 20, with all optimizations). The Delphi solution (10.3.3) is the accepted solution for Delphi, with a couple of modifications that add about 20% to the number of iterations (just by inlining member functions, iirc). I'm not really surprised by these results; memory-bound apps (my own included) seem to suffer quite a bit with Delphi. What was the fuss about (not that I wish to reopen the topic)?
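For readers unfamiliar with the benchmark under discussion: it is a sieve of Eratosthenes, whose inner loop mostly strides through a large flag array, which is why cache behaviour and compiler code generation matter so much. The Python sketch below is a plain odd-only sieve for illustration. It is not the accepted Delphi or C++ solution from that series, and count_primes is a hypothetical name.

```python
def count_primes(limit):
    """Count primes <= limit with an odd-only sieve of Eratosthenes."""
    if limit < 2:
        return 0
    # Index i stands for the odd number 2*i + 1, so even numbers use no memory.
    size = (limit + 1) // 2
    is_prime = bytearray([1]) * size
    is_prime[0] = 0  # the number 1 is not prime
    i = 1
    while (2 * i + 1) ** 2 <= limit:
        if is_prime[i]:
            p = 2 * i + 1
            start = p * p // 2  # index of p*p; earlier multiples are already marked
            # A step of p in index space is a step of 2p in numbers (odd multiples only).
            is_prime[start::p] = bytes(len(range(start, size, p)))
        i += 1
    return sum(is_prime) + 1  # +1 for the prime 2


if __name__ == "__main__":
    print(count_primes(100))
    print(count_primes(1_000_000))
```

Storing only odd numbers halves the memory touched per pass, one of the classic tweaks that separate a naive sieve from a tuned one.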
FPiette, posted January 18, 2022: luebbe said: "@FPiette, do you know https://github.com/neslib/Neslib.Xml? I switched some of my XML parsing from Delphi's native parser to Neslib.Xml, and parsing became roughly 35x-40x faster (17-20 seconds versus 0.5 seconds or less)." I did not know it! This looks very promising. Thanks a lot.
Stefan Glienke, posted January 18, 2022: John Terwiske said: "What was the fuss about (not that I wish to reopen the topic)?" Mostly the language war that some people made out of it. The accepted Delphi solution, even with inlining, is about half as fast as it could be if you know the quirks of the Delphi compiler. Since the process of deciding what was accepted and what was not compared apples and bananas (such as insisting on creating instances for Delphi/Pascal while using stack-allocated objects in C++ that don't require a heap allocation), I did not bother providing a version. There was also no Docker image with Delphi to run the tests.
FPiette, posted January 21, 2022: To keep you up to date: I have modified my code to use OmniXML, which is delivered with Delphi 11 (unit Xml.Internal.OmniXML), and to use a record instead of a class for the most used data structure. The net result is a speed increase by a factor of 10 (ten!) on a large GPX file. If time permits, I will give Neslib.Xml a try.