RDP1974 40 Posted January 15, 2021 (edited)
Hello dear Delphinius,
Delphi 64-bit compiler RTL speedup:
- Strong performance speedup for multithreaded server apps
- Deflate compression 5x faster than zlib for WebBroker apps
It brings your client-server experience up to the stars. I'll update it from time to time. Regards.
Edited January 15, 2021 by RDP1974
Lars Fosdal 1792 Posted January 15, 2021
Chance that I will download and use some DLL from an unknown third party: None.
Arnaud Bouchez 407 Posted January 15, 2021
The 3rd-party DLLs are Intel TBB, if I am correct. So you should at least mention it, with the proper licence terms, and provide a link.
About memory management: from my tests the Intel TBB MM is indeed fast, but it eats all the memory, so it is not usable for any serious server-side software that runs for a long time.
Some numbers, tested on FPC/Linux, but you get the idea:
- FPC default heap
  500000 interning 8 KB in 77.34ms i.e. 6,464,959/s, aver. 0us, 98.6 MB/s
  500000 direct 7.6 MB in 100.73ms i.e. 4,963,518/s, aver. 0us, 75.7 MB/s
- glibc 2.23
  500000 interning 8 KB in 76.06ms i.e. 6,573,152/s, aver. 0us, 100.2 MB/s
  500000 direct 7.6 MB in 36.64ms i.e. 13,645,915/s, aver. 0us, 208.2 MB/s
- jemalloc 3.6
  500000 interning 8 KB in 78.60ms i.e. 6,361,323/s, aver. 0us, 97 MB/s
  500000 direct 7.6 MB in 58.08ms i.e. 8,608,667/s, aver. 0us, 131.3 MB/s
- Intel TBB 4.4
  500000 interning 8 KB in 61.96ms i.e. 8,068,810/s, aver. 0us, 123.1 MB/s
  500000 direct 7.6 MB in 36.46ms i.e. 13,711,402/s, aver. 0us, 209.2 MB/s
For multi-threaded processes, we observed the best scaling with TBB on this system, BUT memory consumption rose to about 60 times more space (glibc=2.6GB vs TBB=170GB)!
-> So for serious server work, glibc (FPC_SYNCMEM) sounds like the best candidate.
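The per-second rates and the memory ratio in the post above follow from simple arithmetic. The Python sketch below recomputes them from the printed counts and timings. The helper names are my own and are not part of any benchmark tool; small differences from the posted rates are expected because the timings are printed rounded to two decimals.

```python
def ops_per_second(count: int, elapsed_ms: float) -> float:
    """Operations per second for `count` operations done in `elapsed_ms` milliseconds."""
    return count / (elapsed_ms / 1000.0)


def memory_ratio(larger_gb: float, smaller_gb: float) -> float:
    """How many times more memory the first figure is than the second."""
    return larger_gb / smaller_gb


# Figures taken from the post: FPC default heap, "interning" run.
fpc_interning_rate = ops_per_second(500000, 77.34)

# Figures taken from the post: Intel TBB 4.4, "direct" run.
tbb_direct_rate = ops_per_second(500000, 36.46)

# Multi-threaded memory consumption quoted in the post: TBB=170GB vs glibc=2.6GB.
tbb_vs_glibc = memory_ratio(170.0, 2.6)
```

With the posted numbers, the ratio comes out a little above 65, which matches the "about 60 times" order of magnitude in the post.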
RDP1974 40 Posted January 15, 2021
1 hour ago, Arnaud Bouchez said:
"The 3rd-party DLLs are Intel TBB, if I am correct. So you should at least mention it, with the proper licence terms, and provide a link."
Yes, it's written in the title and in the license. Custom DLL from Intel Performance libraries.
Kind regards
Stefan Glienke 2002 Posted January 15, 2021
Why isn't the code of those DLLs open source as well? If they are simply taken directly from those Intel libraries, provide a link showing how to get them directly from the original source.
RDP1974 40 Posted January 15, 2021
Hi Arnaud, I admire your talent, but why do you say TBB is unusable? It's used in mainstream server and workstation products worldwide without problems.
The DLLs? They are extracted from the royalty-free Intel TBB and IPP packages. I only wrote the Pascal wrappers; no custom source code changes were made. You can compile them yourself; I have put them in the repository because many people cannot build them, or do not have the time to.
For the memory allocator:
https://github.com/oneapi-src/oneTBB/releases
https://github.com/oneapi-src/oneTBB/archive/v2020.3.zip -> see folder TBBMalloc
For the RTL SIMD patches:
https://software.seek.intel.com/performance-libraries -> see IPP
Run the utility to build a custom DLL and export: 'ippsZero_8u'; 'ippsCopy_8u'; 'ippsMove_8u'; 'ippsSet_8u'; 'ippsFind_8u'; 'ippsCompare_8u'; 'ippsUppercaseLatin_8u_I'; 'ippsReplaceC_8u';
For the web deflate acceleration (5x quicker than Windows gzip, WebBroker helper provided):
-> extract IPP under Linux, see the readme for how to patch the original zlib sources, then take the changed sources and compile them with MS VC++.
Kind regards, R.
Still Delphi (VCL) the best framework for Windows apps!
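A speedup claim like the "5x" deflate figure is easiest to check on your own payloads. The sketch below is only a measuring harness written against Python's standard zlib module, not against the IPP-patched zlib discussed here. The function name, the compression level, and the sample payload are my assumptions. To compare two builds, you would run the same measurement against each library on identical input.

```python
import time
import zlib


def deflate_throughput(data: bytes, level: int = 6, repeats: int = 5):
    """Return (MB/s, compressed/original size ratio) for zlib deflate on `data`.

    The best of `repeats` runs is used to reduce timing noise.
    """
    best = float("inf")
    compressed = b""
    for _ in range(repeats):
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading on coarse timers
    return len(data) / best / 1e6, len(compressed) / len(data)


# Hypothetical payload: a repetitive HTML-like response body of roughly 1 MB.
payload = b"<tr><td>customer</td><td>12345</td></tr>\n" * 25000
mb_per_s, ratio = deflate_throughput(payload)
```

Throughput depends heavily on the data and the compression level, so a single ratio such as "5x" only holds for the payloads and settings it was measured with.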
Fr0sT.Brutal 900 Posted January 15, 2021
3 hours ago, Arnaud Bouchez said:
"About memory management: from my tests the Intel TBB MM is indeed fast, but it eats all the memory, so it is not usable for any serious server-side software that runs for a long time."
If that "eaten" memory would otherwise be unused, why do you bother about that consumption? I suspect they just dynamically reserve as much memory as possible for internal needs.
7 minutes ago, RDP1974 said:
"The DLLs? They are extracted from the royalty-free Intel TBB and IPP packages. I only wrote the Pascal wrappers; no custom source code changes were made. You can compile them yourself; I have put them in the repository because many people cannot build them, or do not have the time to. For the memory allocator:"
Links to the original source of these libs would greatly improve the trustworthiness of your project.
RDP1974 40 Posted January 15, 2021
Indeed, TBB is open source: https://github.com/oneapi-src/oneTBB
IPP, meanwhile, is closed source: https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/ipp.html (but it comes with a utility to extract a custom DLL).
Stefan Glienke 2002 Posted January 16, 2021 (edited)
Just pointing this out so you don't get yourself into trouble: IANAL, but the fact that the IPP is under a commercial license, or a free license if you qualify (time limited if I read it correctly - but I just quickly skimmed through it), might make it questionable to actually distribute any parts of it. If you know more about it I would be glad to be proven wrong.
And as said before - even though your intentions are surely to make it easy for users - providing the source for those projects with an explanation of how to build them would put you on the safe side.
Edited January 16, 2021 by Stefan Glienke
pyscripter 689 Posted January 16, 2021
Looks free to me: Free Intel® Software Development Tools End User License Agreements (intel.com). The license is fairly liberal.
pyscripter 689 Posted January 16, 2021
Off Topic: VTune also appears to be a free download (Fix Performance Bottlenecks with Intel® VTune™ Profiler). Any experience of using it with Delphi?
RDP1974 40 Posted January 16, 2021
6 hours ago, Stefan Glienke said:
"might make it questionable to actually distribute any parts of it."
The license permits it. There is a tool to make a custom DLL.
David Heffernan 2345 Posted January 16, 2021
2 hours ago, RDP1974 said:
"The license permits it. There is a tool to make a custom DLL."
Stefan is talking about licensing, not about whether tools exist to build the library. He's talking about distributing, not building.
Arnaud Bouchez 407 Posted January 16, 2021
21 hours ago, Fr0sT.Brutal said:
"If that "eaten" memory would otherwise be unused, why do you bother about that consumption? I suspect they just dynamically reserve as much memory as possible for internal needs."
No, it was not just "reserved": there were a lot more dirty pages with Intel TBB. We tried it in production on Linux, on high-end servers with heavily multi-threaded processes, and the resident size (RES) was much bigger - not only the virtual/shared memory (VIRT/SHR).
Also, the guys from https://unitybase.info - who run very demanding services - evaluated Intel TBB and rejected it.
Either the glibc MM https://sourceware.org/glibc/wiki/MallocInternals or our https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.fpcx64mm.pas gives good results on Linux, with low memory consumption.
Anyway, I wouldn't use Windows to host demanding services. So if you have a Windows server with a lot of memory, you are free to use Intel TBB if you prefer.
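The difference between reserved and actually resident memory can be read per process on Linux from /proc/<pid>/status, where VmSize corresponds to what top shows as VIRT and VmRSS to RES. The sketch below parses those two fields. The function name and the sample figures are made up for illustration; on a real system you would pass in the contents of /proc/self/status or /proc/<pid>/status.

```python
import re

# Matches the two fields of interest, e.g. "VmRSS:\t  204800 kB".
_VM_FIELD = re.compile(r"^(VmSize|VmRSS):\s+(\d+)\s+kB")


def parse_vm_fields(status_text: str) -> dict:
    """Extract VmSize (virtual) and VmRSS (resident) in kB from /proc status text."""
    fields = {}
    for line in status_text.splitlines():
        match = _VM_FIELD.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


# Hypothetical excerpt of a /proc/<pid>/status file (numbers are invented).
SAMPLE_STATUS = "Name:\tserver\nVmSize:\t 1048576 kB\nVmRSS:\t  204800 kB\nThreads:\t16\n"

usage = parse_vm_fields(SAMPLE_STATUS)
resident_share = usage["VmRSS"] / usage["VmSize"]
```

An allocator that only reserves address space shows a large VmSize with a modest VmRSS; the situation described in the post is the one where VmRSS itself grows.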
RDP1974 40 Posted January 16, 2021 (edited)
3 hours ago, David Heffernan said:
"He's talking about distributing, not building."
I'm sure the license permits free distribution.
Edited January 16, 2021 by RDP1974
Stefan Glienke 2002 Posted January 16, 2021
9 hours ago, pyscripter said:
"Off Topic: VTune also appears to be a free download (Fix Performance Bottlenecks with Intel® VTune™ Profiler). Any experience of using it with Delphi?"
The lack of PDB support in Delphi makes it tedious to use, because you only get addresses reported, which you then have to look up manually.
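The manual lookup can be scripted to some extent. The sketch below assumes the detailed map file lists public symbols as lines of the form "SSSS:OOOOOOOO Name" (segment, hexadecimal offset, symbol name); that layout, the function names, and the sample symbols are my assumptions, not something taken from this thread. Converting a profiler's absolute address into a segment offset (image base and section start) is deliberately left out.

```python
import bisect
import re

# One public symbol per line: segment, hex offset, then a single name token.
_PUBLIC_LINE = re.compile(r"^\s*([0-9A-Fa-f]{4}):([0-9A-Fa-f]+)\s+(\S+)\s*$")


def parse_publics(map_text: str, segment: int = 1) -> list:
    """Return sorted, de-duplicated (offset, name) pairs for one segment."""
    symbols = set()
    for line in map_text.splitlines():
        match = _PUBLIC_LINE.match(line)
        if match and int(match.group(1), 16) == segment:
            symbols.add((int(match.group(2), 16), match.group(3)))
    return sorted(symbols)


def lookup(symbols: list, offset: int):
    """Return (name, displacement) of the nearest symbol at or below `offset`."""
    offsets = [start for start, _ in symbols]
    index = bisect.bisect_right(offsets, offset) - 1
    if index < 0:
        return None  # offset lies before the first known symbol
    start, name = symbols[index]
    return name, offset - start


# Hypothetical map file excerpt; the symbol names are invented.
SAMPLE_MAP = """
 0001:00000398       System.Alpha
 0001:000003C0       System.Beta
 0002:00000010       System.Data
"""

code_symbols = parse_publics(SAMPLE_MAP)
hit = lookup(code_symbols, 0x3A4)
```

The nearest-preceding-symbol rule is only a heuristic: an address that falls past the end of the last routine in a unit is still attributed to that routine.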
David Heffernan 2345 Posted January 16, 2021
10 minutes ago, Stefan Glienke said:
"The lack of PDB support in Delphi makes it tedious to use, because you only get addresses reported, which you then have to look up manually."
OMG, if somebody could make a tool to convert detailed map files into PDB files that would be incredibly useful.
Anders Melander 1784 Posted January 16, 2021
1 hour ago, David Heffernan said:
"OMG, if somebody could make a tool to convert detailed map files into PDB files that would be incredibly useful."
There's this old one, as I'm sure you know: https://github.com/andremussche/map2dbg/tree/master/tds2pdb
I think I tried it once, for use with VTune, without success.
David Heffernan 2345 Posted January 16, 2021
2 minutes ago, Anders Melander said:
"There's this old one, as I'm sure you know: https://github.com/andremussche/map2dbg/tree/master/tds2pdb I think I tried it once, for use with VTune, without success."
Last time I checked, that project was dormant. Once upon a time I used Andre's map2dbg to make dbg files that could be used by some tools, but I never had any success with that for 64-bit executables.
Leif Uneus 43 Posted January 16, 2021
See the question from David at SO, years ago: https://stackoverflow.com/q/9422703/576719
Fr0sT.Brutal 900 Posted January 18, 2021
On 1/16/2021 at 2:05 PM, Arnaud Bouchez said:
"No, it was not just "reserved": there were a lot more dirty pages with Intel TBB."
OK, good to know that.
RDP1974 40 Posted January 18, 2021 (edited)
On 1/16/2021 at 12:05 PM, Arnaud Bouchez said:
"No, it was not just "reserved": there were a lot more dirty pages with Intel TBB."
Maybe in an old version? They are making "giant" steps forward.
Edited January 18, 2021 by RDP1974
Arnaud Bouchez 407 Posted January 18, 2021
7 hours ago, RDP1974 said:
"Maybe in an old version? They are making "giant" steps forward."
Tests were done last year on the latest Debian.
RDP1974 40 Posted January 19, 2021
The thread pool TLS cache model of TBB fits the Windows NT kernel particularly well (scheduler, quantum fibers, KI* exposed API over HAL), but sure, it consumes a lot of memory.
Anyway, off topic: I'm using the Delphi Linux compiler with great satisfaction, with FireDAC pooling and SOAP Indy-based custom SSL web services -> very small and very fast. Is nobody else using the same toolchain?
Arnaud Bouchez 407 Posted February 13, 2021
On 1/19/2021 at 1:04 PM, RDP1974 said:
"I'm using the Delphi Linux compiler with great satisfaction, with FireDAC pooling and SOAP Indy-based custom SSL web services -> very small and very fast. Is nobody else using the same toolchain?"
Nope: FPC Linux + mORMot DB and SOA layer for years. With high performance and stability - we had servers handling thousands of requests per second, receiving TB of data, running for months with no restart and no problem. Especially with our MM, which uses much less memory than TBB.
One problem I noticed on Linux with C memory managers running FPC services is that they are subject to SIGABRT if they encounter any memory problem. This is why we worked on our own https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.fpcx64mm.pas which consumes much less memory than TBB, and if there is a problem in our code, we get a GPF exception we can trace, and not a SIGABRT which kills the process.
I can tell you that a SIGABRT for a service is a disaster - it always happens when you are far AFK and can't react quickly. And if you need to install something like https://mmonit.com/monit/ on your server, it becomes complicated...