Search the Community

Showing results for 'profiling'.


Found 57 results

  1. I am very sure that record finalization is way faster than the heap allocations caused by SetString. If, after profiling, it still has a noticeable impact, then one could move the string keeping out of the record into some other structure. The point is that the strings need to be kept alive for as long as they are inside the dictionary; how that is done does not matter.
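    A rough sketch of that second option, with hypothetical names (a separate list owns the strings, so the record key carries only unmanaged data and nothing has to be finalized when entries are removed):

        uses
          System.SysUtils, System.Generics.Collections;

        type
          TKeyRec = record
            Data: PChar;    // view into a string owned by FKeeper below
            Len: Integer;   // no managed fields -> no record finalization
          end;

        var
          FKeeper: TList<string>;                  // keeps the strings alive
          FDict: TDictionary<TKeyRec, Integer>;

        procedure AddEntry(const S: string; Value: Integer);
        var
          Key: TKeyRec;
        begin
          FKeeper.Add(S);                          // the list holds a reference, so the
          Key.Data := PChar(FKeeper.Last);         // character data stays valid
          Key.Len := Length(S);
          FDict.Add(Key, Value);                   // note: a key like this needs a custom
        end;                                       // IEqualityComparer to hash by content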
  2. Needs to be question #1. It is not uncommon for people to invest much time and energy in pursuit of optimizations they later find were not necessary. As an old friend told me many years ago: First make it work, then make it fast. Profiling always trumps supposition.
  3. John Terwiske

    Delphi profiler

    If one starts with a good algorithm, then the only thing that works for me is profiling (without instrumentation). I've had good luck with (the free) VTune Profiler from Intel. Attached is a picture comparing Delphi and C++ for a prime-sieving console application on Windows. This sample uses FastMM5, but the cache misses are not that different from those with the FastMM version that ships with Delphi. I should also note that the Delphi implementation needs more work (in the algorithm more than anything else), but this might give you an idea of where to look for performance improvements. Also, one needs to jump through some hoops to find the actual line of Delphi code where bottlenecks appear (unlike some of the profilers mentioned above, which can zero in on the function).
  4. Dalija Prasnikar

    Delphi profiler

    Unrelated to the profiling, there are other optimizations. First, SAX parsing is generally more performant than DOM parsing, especially when the DOM is based on interfaces. If you don't need an XML DOM, then building your business classes directly during parsing will be more efficient. But not all structures can be parsed equally easily with SAX. Next, IXmlDoc works on top of the standard IDOM interfaces, so you have an additional slowdown there. If you cannot use SAX, modifying the code to work directly with the IDOM interfaces might be a solution. Or using a different DOM parser.
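    A minimal sketch of the IDOM route, assuming the stock Xml.xmldom unit, the default registered DOM vendor and a console context (the element name is just an example):

        uses
          Xml.xmldom;

        procedure CountOrders(const FileName: string);
        var
          Doc: IDOMDocument;
          Nodes: IDOMNodeList;
        begin
          Doc := GetDOM.createDocument('', '', nil);    // default vendor, no IXMLDocument wrapper
          (Doc as IDOMPersist).load(FileName);          // load straight through the vendor DOM
          Nodes := Doc.getElementsByTagName('order');
          Writeln(Nodes.length, ' order nodes');
        end;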
  5. MichaelT

    The future of Delphi

    Did you visit David I.? 😉 Connect all six boxes, since everything is connected with everything else, and all problems are gone. I doubt that will work. pgAdmin works great in virtual boxes, especially because of the dashboard, and debugging against a run-time environment on the server works great, but only if just the screen is sent to the client. Forget it. I tend to agree that in both cases, Delphi and Python, a solid, well-maintained base of C bindings attracted people. Think of Project JEDI, but the Windows origin and the "integrate everything into the desktop and the Explorer" strategy put another level of complexity on top, which led to anything but simplicity in the end. Apart from that, you talk about something totally different from Delphi. I am not sure a Delphi-like way is an answer to the underlying questions that would be accepted by a broader audience. Elevate Web Builder would be a first step in such a direction once debugging on the client side is possible, but even that is not really required at the moment. Without proprietary add-ons, integrated functionality is in general hard to achieve. It's 25 years too late for 4GL love. There is no such thing as an open, let's say, ABAP stack just for the Web. That's what it sounds like: you want something pretty similar to the SAP GUI, call it a Delphi GUI. I worked on/with something called XSTP-GUI, which integrated Java widgets in the mid/late 1990s; an approach that worked like a charm on a Smalltalk system, taken to the next level on Java, failed with flying colors. JavaScript is about portability, and from this perspective an integrated IDE like Delphi is a meta tool that allows you to build the environment you suggest. The more features you add to the very definition of an IDE - an advanced editor with a menu entry called Tools that supports invoking them in the context of the IDE - the more interfaces (common sense) you add to other disciplines of software engineering that do not care a tiny rabbit shit about you and your IDE. The last revival in such a direction - which your suggestions, going beyond the scope of what an IDE is meant to be, inevitably imply - I have seen in the form of add-ons to the Eclipse IDE, which again failed to succeed even in the mid term because of breaking changes in the Eclipse IDE itself. Do you really think that all the others have to put things in the right place at the right time just because you want to press a button and have everything work as you intended? 😉 Maybe it's the biggest tragedy for the Delphi super-hero that the world never worked this way, and rest assured it never will. The Delphi world is about succeeding in a dynamic environment where anarchy still matters and rulez from time to time, or all the time, and not about a consensus on praising others for leaving things unchanged.

    I tend to agree that developing in an environment other than the target environment makes development pretty complicated. Going beyond an IDE-based approach quickly leads to something else - a workbench, for example, or many of the 1990s approaches that worked pretty well generating C code; indeed, they were abandoned for the wrong reason, 'C', which was heavily bashed in those days. Everything that came later used a virtual machine, especially because of being in a position to utilize dictionaries in the case of the CRM systems (integrated development and execution environments) or exposed RTTI (JVM, .NET runtime, ...). It's no surprise that those who had never worked with, e.g., Smalltalk before tried a revival on another technology called Java or .NET and wasted their time rebuilding all the crap people threw out of the window when XP programming was introduced, including their managers and the bureaucrats. Kidding. Even if the IDEs dominated the scene for a long time, their time has come when it comes to software development. I see no reason why an integrated tool should be the answer. You should not assume that an application is what people want; people just got used to it. Apart from the very early versions, Borland turned Delphi - more precisely, Turbo Pascal - into a Y2K child. I didn't have the impression that Y2K was a challenge for small shops in the first place; everyone asked for business software on Windows. So it's no surprise that Delphi turned into what it is today. It's just not that bad. EMBT had enough to do putting the Wild West-style mess we left behind from 1997 to 2005 into something somehow consistent that no one quite knows what it's good for today. The 1990s were about making money, not software engineering, since it soon turned out that the whole bunch of software methodologies and other failed approaches merely killed budgets. After those days were finally gone with the disappearance of Windows XP, the honest souls were left behind in what, without them, could be called a ghost town. But what those who disappeared over the horizon after a tough ride left behind were the requirements for what is called FMX today, and EMBT had to live with that situation. In both cases the question remains to what extent an IDE-based approach is the answer, both to love of the very details and to a totally open approach attracting developers beyond what's already available. If you remember, the answer to performance used to be: a) buy a new Windows and a new computer, b) let the database do the calculation jobs or use extensive profiling, and c) use assembler.
  6. Jud

    Maximum static memory

    I'm not shuffling the records, I'm grouping them according to how they are to be processed. The records are 16 bytes long (and there are a lot of them). Yesterday (after watching a webinar and installing Delphi 11.0 - yes, it is available) I did more profiling. I said that the quicksort was taking 80%; there is a SwapRecords procedure nested in the quicksort procedure that is taking an additional 7%. I went back to my earlier version (with compiler directives) - the one that uses a non-rectangular dynamic array of dynamic arrays. The profile shows that SetLength is taking most of the time. The profiler shows "system" using 77% of the time; @FillChar is using the most, which must be the zeroing of the dynamic arrays (which I don't need). SysGetMem is second in the profiler. But yesterday I had the idea of using a large dynamic array of small static arrays. I think that is going to work (in 64-bit mode). Also, what Administrators said above just made me think of another possible approach: keep the dynamic array of records, go through and pick out the ones with a certain key, and process them; repeat with the next key. The problem with this is that there are so many keys (and each one will have to make a pass through the array) that I think it will be back to quadratic time. Probably tomorrow I will implement the dynamic array of small static arrays method.
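    A minimal sketch of that layout, with hypothetical sizes: one dynamic array whose elements are small fixed-size arrays of the 16-byte records plus a fill count, so there is a single up-front SetLength (one allocation, one zero-fill) and no per-bucket heap activity in the grouping loop:

        type
          TRec = packed record              // stand-in for the 16-byte record
            Key: UInt64;
            Value: UInt64;
          end;
          TBucket = record
            Count: Integer;
            Items: array[0..14] of TRec;    // small static array; capacity is a guess
          end;

        var
          Buckets: array of TBucket;
        begin
          SetLength(Buckets, 1000000);      // single allocation for all buckets
          // grouping pass, no further allocations:
          //   H := BucketIndexFor(Rec);    // hypothetical grouping function
          //   Buckets[H].Items[Buckets[H].Count] := Rec;
          //   Inc(Buckets[H].Count);
        end;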
  7. It's this rich API that makes the Spring4D collections worthwhile. The perf hit is still worth it for this alone. In a real-world application (FinalBuilder) I have not noticed any perf issues that could be attributed to the Spring4D collections, but then I'm rarely enumerating huge collections - certainly in my profiling efforts there were plenty of areas to optimise, but Spring4D wasn't showing up as an area of concern in the profiler (VTune). Now if only we could have non-ref-counted interfaces (i.e. no calls to AddRef/Release) and we could implement interfaces on records - best of both worlds 😉
  8. Anders Melander

    MAP2PDB - Profiling with VTune

    With an exclusion list that removed most of the VCL and RTL as well as DevExpress, TeeChart, Indy and FireDAC, I managed to reduce the size of my PDB from 200 MB to 35 MB. VTune now loads my project in "just" 5 minutes... It's still struggling though; everything is incredibly slow. I get the impression that Intel has never tried profiling VTune with VTune... or maybe they tried and gave up because it was too slow. Here's my command line:

        map2pdb -v -bind "TurboFooProPlus.map" -exclude:dx*;cx*;system*;winapi*;vcl*;data*;firedac*;soap*;web*;id*
  9. Stefan Glienke

    MAP2PDB - Profiling with VTune

    Profiling map2pdb with VTune using a pdb built with the map file from map2pdb 🤯 I guess that is fixable.
  10. Vincent Parrett

    List of usable RegEx for source code

    As the old saying goes: used regex to solve a problem? Now you have two problems! 🤣 That said, I use regex extensively both in Delphi (I wrote the original System.RegularExpressions code) and .NET (keep it simple, and it's not for parsing HTML!) - but using it to find problems with my code? Ah, nope; for that I use Eyeballs 1.0 and static analysis tools like FixInsight and Pascal Analyzer (yes, I know they have their limitations). Plus I guess I have learned a few things over the years that I apply in new code I write. As for the old code, well, refactoring is a work in progress. Yesterday I rewrote a lexer/parser framework I initially wrote 10 years ago due to performance issues, and that issue was identified through profiling (see the thread on VTune). Switching to records took the memory manager from being a big percentage to insignificant (i.e. it no longer shows up in the profiler). I thought I might get something like a 10% improvement, but was pleasantly surprised to see a 30% improvement! Those sorts of gains don't come often in a mature code base.
  11. Vincent Parrett

    MAP2PDB - Profiling with VTune

    I just had my first big win with VTune 😃 I was looking into improving the performance of loading projects in FinalBuilder 9 (in dev) - we have some customers with huge projects that were taking a while to load. Profiling with VTune showed that most of the time was spent in the project lexer/parser - the lexer created class-based tokens (and some other associated classes), and a lot of time was spent in the lexer and the memory manager. So, 3 hours later, the code has been converted to use records rather than classes - the unit tests all pass (I only had to comment out all the Assert.IsNotNull(token); calls) and the application runs normally (I still need to do a code review to make sure I didn't break things). The result is around a 30% improvement with that change alone! That's just me counting out loud as the project loads 😉 - I'll do more formal timing/testing tomorrow. I also just compared to FinalBuilder 8 and the total improvement is more like 60% - I'll put that down to some manual code review looking for possible hotspots over the last week, and also to switching from the RTL generic collections to using Spring4D everywhere! Time for some sleep, but I'm looking forward to more VTuning tomorrow. Thanks again @Anders Melander for this amazing bit of work!
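    A rough illustration of the kind of change described (hypothetical token type, not the actual FinalBuilder lexer): a heap-allocated class token becomes a value-type record, so creating and discarding a token no longer touches the memory manager at all:

        type
          TTokenKind = (tkIdent, tkNumber, tkSymbol);

          // before: one heap allocation plus one Free per token
          TTokenClass = class
            Kind: TTokenKind;
            Value: string;
          end;

          // after: lives on the stack or inside an array; no allocation per token
          TToken = record
            Kind: TTokenKind;
            Value: string;   // the string field is still managed, but the token
          end;               // itself never goes through GetMem/FreeMem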
  12. Anders Melander

    MAP2PDB - Profiling with VTune

    It took me a bit longer than expected to get here, but I believe I've finally reached the goal. The following shows VTune profiling a Delphi application, with symbol, line number and source code resolution.

    Get the source here: https://bitbucket.org/anders_melander/map2pdb/
    And a precompiled exe here: https://bitbucket.org/anders_melander/map2pdb/downloads/
    The source has only been tested with Delphi 10.3 - it uses inline vars, so it will not compile with older versions.

    Usage:

        map2pdb - Copyright (c) 2021 Anders Melander
        Version 2.0
        Parses the map file produced by Delphi and writes a PDB file.

        Usage: map2pdb [options] <map-filename>

        Options:
          -v                          Verbose output
          -pdb[:<output-filename>]    Writes a PDB (default)
          -yaml[:<output-filename>]   Writes a YAML file that can be used with llvm-pdbutil
          -bind[:<exe-filename>]      Patches a Delphi compiled exe file to include a reference to the pdb file
          -test                       Works on test data. Ignores the input file

    Example:
        1. Configure your project linker options to output a Detailed map file.
        2. Compile the project.
        3. Execute map2pdb <map-filename> -bind
        4. Profile the application with VTune (or whatever).

    Known issues:
        - The -bind switch must occur after the filename, contrary to the usage instructions.
        - PDB files larger than 16Mb are not valid. This is currently by design.
        - 64-bit PE files are not yet supported by the -bind option.

    As should be evident, I decided not to go the DWARF route after all. After spending a few days reading the DWARF specification and examining the FPC source, I decided it would be easier to leverage the PDB knowledge I had already acquired. Not that this has been easy. Even though I've been able to use the LLVM PDB implementation and Microsoft's PDB source as a reference, LLVM's implementation is incomplete and buggy, and the LLVM source is "modern C++", which means that it's close to unreadable in places. Microsoft's source, while written in clean C and guaranteed to be correct, doesn't compile and is poorly commented. Luckily it was nothing a few all-nighters with a disassembler and a hex editor couldn't solve. Enjoy!
  13. Anders Melander

    Profiler for Delphi

    Not quite Game Over, it seems. It appears that the Size being 1 is just because whoever wrote the linker misunderstood the meaning of the field; 1 in this case means that there is one entry, so I could replace the 1 with $1C and the entry would be valid. If I then assume that the data in the debug directory is now valid, then the one entry points to the .debug segment. I assume this segment contains the TDS debug data or something like it (it starts with the TDS signature "FB09"). It is only present (for both 32- and 64-bit) if I link with debug info enabled. Now, since this debug info isn't used anyway when profiling with VTune, I can just hijack the area it occupies and store my IMAGE_DEBUG_TYPE_CODEVIEW structure there. This means that I won't have to deal with adding new sections and updating all the various offsets in the PE header. It should be doable with what I know so far. I have some gardening to take care of now, but I'll give it another go this evening. Stay (V)tuned...
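    For reference, a sketch of the structure involved (a hypothetical declaration of the well-known PDB 7.0 "RSDS" CodeView record, not code taken from map2pdb; the debug directory entry itself is Winapi.Windows.TImageDebugDirectory with _Type = IMAGE_DEBUG_TYPE_CODEVIEW):

        type
          TCodeViewInfoPdb70 = packed record
            Signature: array[0..3] of AnsiChar;  // 'RSDS'
            Guid: TGUID;                         // must match the GUID stored in the PDB
            Age: Cardinal;                       // must match the age stored in the PDB
            // followed by a null-terminated path to the .pdb file
          end;

        // The entry's SizeOfData / AddressOfRawData / PointerToRawData fields then describe
        // where this record was written - here, the hijacked .debug area.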
  14. Mike Torrettinni

    How do you identify bottleneck in Delphi code?

    I just realized I can do a better comparison of new vs. old methods when optimizing code: in the past I would profile the old version and save the profiling results, then run the new one and compare screenshots of the timings. But it is much better to run both methods in the same session and compare the results directly. This way I can keep both methods in the same code, measure small improvements, and switch the old vs. the new version on and off when profiling to see the progress of eliminating or reducing bottlenecks.
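    A minimal sketch of such a side-by-side run, assuming hypothetical OldMethod/NewMethod routines and System.Diagnostics.TStopwatch:

        uses
          System.Diagnostics;

        procedure CompareImplementations;
        var
          SW: TStopwatch;
          I: Integer;
        begin
          SW := TStopwatch.StartNew;
          for I := 1 to 100000 do
            OldMethod;                               // hypothetical: current implementation
          Writeln('Old: ', SW.ElapsedMilliseconds, ' ms');

          SW := TStopwatch.StartNew;
          for I := 1 to 100000 do
            NewMethod;                               // hypothetical: optimized candidate
          Writeln('New: ', SW.ElapsedMilliseconds, ' ms');
        end;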
  15. You began with the question of how to identify a bottleneck. The first criterion should be whether it is observable by your users. If you have a button click event which executes in 200 ms, and you can cut that in half, you may get satisfaction from doing it, but the user will not see the difference. Ordinarily, unless an action a) takes at least dozens of seconds to complete, b) is frequently used, and c) can be significantly sped up (by which I mean 2x or more), the time invested is unlikely to be repaid in user satisfaction. If you are analyzing or converting some kind of data and the amount to process is large, then you are likely looking in the wrong place. Some years ago, I had a spreadsheet which the app took a few minutes to produce. In the end, it was not code rework but the replacement of some memory datasets which made the difference. Profiling showed that I might improve code execution by 10% or so, but changing to a more suitable component brought a speedup of over 20 times.
  16. A bad scenario in my code reminded me of this thread, so I re-read it to be sure I'm not missing anything important. Very useful conclusion! I had an old string manipulation method which I replaced with a better one, and I set it up like this - not sure why I used this wrong approach:

        procedure Work(var aStr: string);
        var
          vTmp1, vTmp2: string;
        begin
          WorkBetter(aStr);
          Exit;
          // here was the old stuff that handled the string more slowly than WorkBetter
          ...
        end;

     When looking at the profiling results and the code, I was sure this method couldn't be the cause of any performance bottleneck, because it doesn't even touch the slow code! Well, of course I was wrong, because it still has to set up and finalize the two local string variables. Thanks again!
  17. Because RTL and VCL code is also built with inlining, I think, so it was a valid thing to test. No, milliseconds: LResultArray := LStopWatch.Elapsed.TotalMilliseconds; The number format is standard Finnish, which can be misleading to some: the decimal separator is a comma and the thousands separator is a space. Will do that when I've got time for it, and also publish my test code so anyone can check it if they want to. The big problem with profiling here is that I still can't reproduce the level of slowdown of the production app, and it is very hard to profile without an instrumenting profiler (which I don't have) without losing everything to the noise of all the other processes in the app. What I do know is that the main problem is ScanLine, but I don't know why it is sometimes very fast and sometimes takes ages in our production code (versus the test app). It is at least a 10x difference. I've tested the production code with just one ScanLine call and incrementing the pointer to the next line, and it is at least 10x faster, as I stated before. I just have to reproduce that in the test app. This, sadly, will be a kind of marathon; I can't spend too much time on it currently. But I'll get there 🙂 -tee-
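     A sketch of the "call ScanLine once and step by the row stride" idiom mentioned above (Bmp is assumed to be a VCL TBitmap with at least two rows; the stride comes out negative for the usual bottom-up DIBs, which the pointer arithmetic handles automatically):

        var
          Row: PByte;
          Stride: NativeInt;
          Y: Integer;
        begin
          Row := Bmp.ScanLine[0];
          Stride := NativeInt(Bmp.ScanLine[1]) - NativeInt(Row);  // bytes between consecutive rows
          for Y := 0 to Bmp.Height - 1 do
          begin
            // ... process the pixel data that Row points to ...
            Inc(Row, Stride);   // no further ScanLine calls inside the loop
          end;
        end;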
  18. Yes, of course that didn't do anything. Why would you expect it to? I think you need to take a step back and think about what you are doing instead of just trying random stuff. Take control of the problem. The numbers you have posted show that you are either measuring time in microseconds or using the thousands separator incorrectly. If you are measuring microseconds, then stop that; numbers that small are not relevant here. One of the first things you should have done would be to locate the bottleneck by profiling your code. If you don't have a profiler, or don't understand how to use one, then you can emulate a sampling profiler by just running the application a few times and pausing it in the debugger. Unless the slowdown is evenly distributed, there's a statistical likelihood that the call stack will show you where the application is spending the majority of its time.
  19. During routine profiling I noticed a function that gets called 1 million+ times, and I wanted to look into it even though it's only 0.24% of total execution time. So, not a bottleneck, but I still wanted to see whether there is anything that needs to be addressed. Here is an example that imitates the real function:

        var
          flag: boolean = true;

        function ProcessString(const aStr: string): string;
        var
          s: string;
          i: integer;
          b: boolean;
        begin
          if flag then
            Exit(aStr);
          // dummy code to use the local variables
          s := aStr;
          i := Length(s);
          b := i = 1;
          if b then
            Result := aStr
          else
            Result := s;
        end;

     As this function almost always just returns the string and exits - as does my real function in 99.9% of cases - only in 0.01% of cases does it execute the lower part of the function. If I split it into these two functions:

        function ProcessStringOLD(const aStr: string): string;
        var
          s: string;
          i: integer;
          b: boolean;
        begin
          s := aStr;
          i := Length(s);
          b := i = 1;
          if b then
            Result := aStr
          else
            Result := s;
        end;

        function ProcessStringNew(const aStr: string): string;
        begin
          if flag then
            Exit(aStr)
          else
            Result := ProcessStringOLD(aStr);
        end;

     then the new ProcessStringNew is 25% faster, because it never executes ProcessStringOLD - makes sense. But if I set flag = false, then of course ProcessStringNew is slower than the original ProcessString, but only by 7%. So the change results in: 25% faster in 99.9% of cases and 7% slower in 0.01% of cases. Does this micro-optimization make sense? I assume a few little changes like this across multiple functions could save some valuable execution time, >1%.
  20. Remy Lebeau

    Need help with IDhttp and Thread

    Did you try profiling your code to see where it is actually spending its time? You are starting a new TTask thread for each ListBox item, but running 100 threads simultaneously will not be faster than processing 100 items in batches of, say, 4-8 threads at a time. Creating more simultaneous threads does not mean the job will be completed faster. If anything, doing so will slow it down, because the OS can only handle so much work simultaneously; the more threads you have running, the more time the OS has to spend switching between them. In general, you should not have more threads running than you have CPU cores. Have you tried using TParallel.For() instead? It uses a smaller pool of threads and will manage them according to its actual workload. Or, at the very least, you can use the TTask constructor that allows your tasks to use threads from a TThreadPool that you create. Also, don't pass TThread.Current to TThread.Queue(). That will link the queued operation to the thread, and if the thread terminates before the queued operation is performed, the operation will be canceled and you won't see the thread's result in your UI. Better to pass nil instead in this case. Also, you are leaking the TIdHTTP object, and thus the TIdSSLIOHandlerSocketOpenSSL and TIdCookieManager objects. Also, there is no need to invoke your TRegEx logic in the context of the main UI thread (I wouldn't even use TRegEx at all); it should be invoked in the context of the TTask thread instead. Only the final ListBox addition (the only part of the thread code that actually touches the UI) should be queued, if it is to be performed at all.
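    A rough sketch of the batched approach being suggested, with hypothetical names (FetchAll, Urls); it assumes the System.Threading TParallel.&For overload that takes a TThreadPool, and it creates and frees the TIdHTTP object inside the worker so nothing leaks:

        uses
          System.SysUtils, System.Classes, System.Threading, Vcl.StdCtrls, IdHTTP;

        procedure FetchAll(const Urls: TArray<string>; ListBox: TListBox);
        var
          Pool: TThreadPool;
        begin
          Pool := TThreadPool.Create;
          try
            Pool.SetMaxWorkerThreads(8);              // cap concurrency instead of 100 threads
            TParallel.&For(0, High(Urls),
              procedure(I: Integer)
              var
                Http: TIdHTTP;
                Body: string;
              begin
                Http := TIdHTTP.Create(nil);          // created and freed per item -> no leak
                try
                  Body := Http.Get(Urls[I]);          // attach an SSL IOHandler here for https
                finally
                  Http.Free;
                end;
                TThread.Queue(nil,                    // nil: don't tie the queued call to this thread
                  procedure
                  begin
                    ListBox.Items.Add(Format('%s: %d chars', [Urls[I], Body.Length]));
                  end);
              end,
              Pool);
          finally
            Pool.Free;
          end;
        end;

    Note that TParallel.&For blocks until all iterations have finished, so in a UI application this routine would itself be run from a background TTask rather than directly from the main thread.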
  21. Guest

    Delphi 10.4 compiler going senile

    I couldn't be more conflicted between these two phrases, yet both are quite true and right. One thing though: Embarcadero should hire more experienced .NET developers to build the IDE right! I saw some discrepancies in the .NET part of the IDE, which is a big part of it - something wrong with the performance counters, as if they had been left on, forgotten and untouched, from Delphi 2009 (at least) to Seattle (at least). I can't be sure, but there is something wrong in that part: the memory usage and the call stacks keep suggesting code built for profiling, not for production. That behaviour is not observed in other .NET applications, and yes, the RAD IDE is eligible to be called a .NET application.
  22. Bill Meyer

    Export to PDF speed

    Again, with no knowledge of content, it is impossible to speculate. If one row per record, you are talking about roughly 60 pages, and though we always want things to be fast, 15 seconds for 60 pages doesn't sound terrible. You need to do some profiling. Also isolate the timing of the report production from any data preparation. Lacking any detailed knowledge of your code, we can offer no specifics.
  23. Mike Torrettinni

    Help with string extraction function

    No, not using any of it, yet. Planning to do some profiling and benchmarking on real data and will see if there are any bottlenecks in this area.
  24. Guest

    TMemoryStream.Write

    You will not see it, not like that, because it is very hard to measure; the magnitude may range between zero and a few cycles (in fact it might even measure as minus a few cycles). Out-of-order execution is a complicated matter, and what I wrote above is only a fraction of how it works, because it needs the compiler's assistance to make sure the assembly code gives the CPU a better chance to use it. Explaining with an example would be better, and the internet is full of resources on this subject, but I found this on SO, which has two nice and accurate answers: https://cs.stackexchange.com/questions/97407/difference-between-delayed-branches-and-out-of-order-execution One more fact: the compiler should be aware of those situations and utilize different registers to help the CPU execute instructions out of order, and here the Delphi compiler is helpless. Now, to see it at work you should create a test/profile case that uses low-level timing directly from the CPU itself (use the RDTSC instruction), then time that part. As we all have modern CPUs, your CPU most likely has some technology like SpeedStep, so go to the BIOS and disable it, then time again to establish a baseline; only after that will you get a better idea of whether out-of-order execution is being used there. Here we need the actual speed of the CPU as built, without those enhancements. We are not talking about slower performance, but we might get better performance by helping the CPU decide and execute a bunch of instructions at the same time. One more thing about profiling: if your timing varies - that is, you don't get a 100% exact result (like 1.23456 seconds every single time) - then you need to make sure that all CPU enhancement technologies are disabled first (SpeedStep, Hyper-Threading...). If you still get varying results, switch to using fibers instead of threads; a fiber will not be interrupted (less interaction from the OS), so it has less chance of suffering a context switch before yielding, and you will get stable results. After that, enable those enhancements again and vary the jumps/branching to measure what the result is. In other words, unless you get the same result from each and every execution without these CPU enhancements, measuring the speed with them is invalid. Yes and no: 2 cycles here, plus an additional few in case out-of-order execution is triggered and helps, might look negligible, but think of it differently - it is the RTL, and it is in every Delphi application running around the globe, so simply put, why not? Imagine how many cycles have been wasted since Delphi 2007, as the OP mentioned. The RTL is not something you change every day, and it should be the best of the best; the number of cycles that could be saved is literally an unimaginable number, right? Now imagine you could squeeze 2 cycles out of most RTL procedures! On the other hand, what is the cost of having a better and faster RTL? Nothing - it is there and will be there forever, and it should keep evolving in that respect. Hiring or outsourcing that part to real professionals who know how things should be done is most likely cheaper than visiting a new city for a marketing festival. On a side note: I have a simple example which will show this out-of-order execution with SIMD instructions that might raise your eyebrows; not sure whether to post it here now or later, after dummzeuch agrees to steer the subject away. dummzeuch looks like a nice guy; he is not Daniel, he is not Lars - those guys are traumatizing material.
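    A minimal sketch of the RDTSC-based timing being suggested (inline assembly for Win32 and Win64; serializing instructions such as CPUID or RDTSCP, which a careful measurement would add, are left out):

        function ReadTSC: UInt64;
        {$IFDEF CPUX64}
        asm
          rdtsc               // low 32 bits in EAX, high 32 bits in EDX
          shl rdx, 32
          or  rax, rdx        // combine into RAX, the function result
        end;
        {$ELSE}
        asm
          rdtsc               // EDX:EAX is already the 64-bit result convention on Win32
        end;
        {$ENDIF}

        // usage: Start := ReadTSC; ...code under test...; Cycles := ReadTSC - Start;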
  25. Stefan Glienke

    FastMM5 now released by Pierre le Riche (small background story)

    FWIW I have done something similar, but in many cases just looking at the disassembly does not tell the entire story - running the code and profiling it (which can be a very tedious process given the many different use cases, etc.) will often tell a different story, and modern hardware architecture just shows you the middle finger. Especially for routines that get inlined in many different places: although inlining might produce slightly better disassembly, in some cases it might be no worse - or even better - not to inline them, if they can be written jump-free, because the code will be smaller and the likelihood of staying in the instruction cache will be higher. But again, we are talking about micro-optimization here that requires a lot of profiling and looking at CPU metrics.