RDP1974

Delphi 64bit compiler RTL speedup


Hello dear Delphinius,

Delphi 64 bit compiler RTL speedup

Strong performance speedup for multithreaded server apps

Deflate compression 5x faster than zlib for WebBroker apps, bringing your client-server experience up to the stars

I'll update it from time to time.

Regards.

Edited by RDP1974

Chance that I will download and use some DLL from an unknown third party: None.


The third-party DLLs are Intel TBB, if I am correct.

So you should at least mention it, with the proper licence terms, and provide a link.

 

About memory management: from my tests, the Intel TBB MM is indeed fast, but it eats all memory, so it is not usable for any serious server-side software running for a long time.

Some numbers, measured with FPC on Linux, but you get the idea:

    - FPC default heap
     500000 interning 8 KB in 77.34ms i.e. 6,464,959/s, aver. 0us, 98.6 MB/s
     500000 direct 7.6 MB in 100.73ms i.e. 4,963,518/s, aver. 0us, 75.7 MB/s
    - glibc 2.23
     500000 interning 8 KB in 76.06ms i.e. 6,573,152/s, aver. 0us, 100.2 MB/s
     500000 direct 7.6 MB in 36.64ms i.e. 13,645,915/s, aver. 0us, 208.2 MB/s
    - jemalloc 3.6
     500000 interning 8 KB in 78.60ms i.e. 6,361,323/s, aver. 0us, 97 MB/s
     500000 direct 7.6 MB in 58.08ms i.e. 8,608,667/s, aver. 0us, 131.3 MB/s
    - Intel TBB 4.4
     500000 interning 8 KB in 61.96ms i.e. 8,068,810/s, aver. 0us, 123.1 MB/s
     500000 direct 7.6 MB in 36.46ms i.e. 13,711,402/s, aver. 0us, 209.2 MB/s
    for multi-threaded processes, we observed the best scaling with TBB on this system
    BUT memory consumption was more than 60 times higher (glibc=2.6GB vs TBB=170GB)!
    -> so for serious server work, glibc (FPC_SYNCMEM) sounds like the best candidate
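The throughput figures above are simply the operation count divided by the elapsed time, and the memory remark is a ratio of the two reported sizes. A small Python sketch (used only for illustration; the FPC benchmark code itself is not shown in this thread) that re-derives both from the quoted lines, assuming exactly the line format shown:

```python
import re

# Assumed format of one result line, e.g.
# "500000 direct 7.6 MB in 36.64ms i.e. 13,645,915/s, aver. 0us, 208.2 MB/s"
LINE_RE = re.compile(r"^\s*(\d+)\s+.*?\bin\s+([\d.]+)ms\s+i\.e\.\s+([\d,]+)/s")


def check_rate(line: str) -> tuple[float, int]:
    """Return (ops/s recomputed from count and time, ops/s as reported)."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unexpected benchmark line: {line!r}")
    count = int(m.group(1))
    elapsed_s = float(m.group(2)) / 1000.0  # the log prints milliseconds
    reported = int(m.group(3).replace(",", ""))
    return count / elapsed_s, reported


def memory_ratio(glibc_gb: float, tbb_gb: float) -> float:
    """How many times more memory TBB used than glibc."""
    return tbb_gb / glibc_gb


if __name__ == "__main__":
    computed, reported = check_rate(
        "500000 interning 8 KB in 77.34ms i.e. 6,464,959/s, aver. 0us, 98.6 MB/s")
    print(f"computed {computed:,.0f}/s, reported {reported:,}/s")
    print(f"TBB used {memory_ratio(2.6, 170):.1f}x the memory of glibc")
```

The computed and reported rates differ very slightly, presumably because the elapsed times are printed with only two decimals; 170 / 2.6 is about 65.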

 

1 hour ago, Arnaud Bouchez said:

The third-party DLLs are Intel TBB, if I am correct.

So you should at least mention it, with the proper licence terms, and provide a link.

Yes, it's written in the title and in the license: custom DLLs from the Intel Performance Libraries.

Kind regards


Why isn't the code of those DLLs open source as well? If they are taken directly from the Intel libraries, provide a link showing how to get them from the original source.


Hi Arnaud, I admire your talent, but why do you say TBB is unusable? It's used in mainstream server and workstation products worldwide without problems.

 

The DLLs? They are extracted from the Intel TBB and IPP royalty-free packages; I only wrote the Pascal wrappers, with no changes to the original sources.
You can compile them yourself; I put them in the repository because many people cannot build them, or don't have the time to.

for the memory allocator:
https://github.com/oneapi-src/oneTBB/releases
https://github.com/oneapi-src/oneTBB/archive/v2020.3.zip
-> see folder TBBMalloc

for the RTL SIMD patches:
https://software.seek.intel.com/performance-libraries
-> see IPP
run the utility to build a custom DLL that exports:
'ippsZero_8u';
'ippsCopy_8u';
'ippsMove_8u';
'ippsSet_8u';
'ippsFind_8u';
'ippsCompare_8u';
'ippsUppercaseLatin_8u_I';
'ippsReplaceC_8u';
 
for the web deflate acceleration (5x faster than Windows gzip; a WebBroker helper is provided):
-> extract IPP under Linux, follow the readme on how to patch the original zlib sources, then compile the patched sources with MS VC++
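Since the accelerated library is described as patched zlib sources, its output should still be ordinary deflate/gzip data that browsers decode; only the speed changes. To illustrate the framing an HTTP compression helper has to produce, here is a minimal Python sketch using the standard `zlib` module (not IPP, and not the WebBroker helper itself, whose API the thread does not show). It assumes the response is sent with `Content-Encoding: gzip`:

```python
import zlib


def gzip_body(body: bytes, level: int = 6) -> bytes:
    """Compress an HTTP response body for "Content-Encoding: gzip".

    wbits=31 (16 + 15) makes zlib wrap the deflate stream in a gzip header and
    trailer; wbits=15 would give the zlib wrapper, wbits=-15 a raw deflate stream.
    """
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    return compressor.compress(body) + compressor.flush()


def gunzip_body(data: bytes) -> bytes:
    """Decode a gzip body, as the browser does on receipt."""
    return zlib.decompress(data, 31)


if __name__ == "__main__":
    page = b"<html><body>" + b"<p>Hello Delphi</p>" * 1000 + b"</body></html>"
    packed = gzip_body(page)
    print(f"{len(page)} -> {len(packed)} bytes")
    assert gunzip_body(packed) == page
```

Level 6 is zlib's default trade-off between speed and ratio; a faster zlib build would show up as less CPU time per response, not as a different format on the wire.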

kind regards
R.

 

Delphi (VCL) is still the best framework for Windows apps!

 

3 hours ago, Arnaud Bouchez said:

About memory management: from my tests, the Intel TBB MM is indeed fast, but it eats all memory, so it is not usable for any serious server-side software running for a long time.

If that "eaten" memory would otherwise be unused, why bother about that consumption? I suspect they just dynamically reserve as much memory as possible for internal needs.

7 minutes ago, RDP1974 said:

The DLLs? They are extracted from the Intel TBB and IPP royalty-free packages; I only wrote the Pascal wrappers, with no changes to the original sources.
You can compile them yourself; I put them in the repository because many people cannot build them, or don't have the time to.

for the memory allocator:

Links to the original sources of these libs would greatly improve the trustworthiness of your project.


Just pointing this out so you don't get yourself into trouble:

IANAL, but the fact that IPP is under a commercial license, or under a free license if you qualify (time-limited, if I read it correctly, but I only skimmed through it), might make it arguable whether you may actually distribute any parts of it.

If you know more about it, I would be glad to be wrong. And as said before, even though your intention is surely to make it easy for users, providing the sources of those projects with an explanation of how to build them would keep you on the safe side.

Edited by Stefan Glienke
6 hours ago, Stefan Glienke said:

might make it arguable whether you may actually distribute any parts of it.

The license permits it. There is a tool to make custom DLLs.

2 hours ago, RDP1974 said:

The license permits it. There is a tool to make custom DLLs.

Stefan is talking about licensing, not about whether tools exist to build the library. He's talking about distributing, not building.

21 hours ago, Fr0sT.Brutal said:

If that "eaten" memory would otherwise be unused, why bother about that consumption? I suspect they just dynamically reserve as much memory as possible for internal needs.

No, it was not just "reserved": there were a lot more dirty pages with Intel TBB.

We tried it in production on Linux, on high-end servers with heavy multi-threaded processes, and the resident size (RES) was much bigger, not only the virtual/shared memory (VIRT/SHR).
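To tell reserved memory from resident memory on a running Linux service, one option is to compare the `VmSize` and `VmRSS` fields of `/proc/<pid>/status`, which is what `top` reports as VIRT and RES. Reserved but untouched address space only inflates VIRT; RES grows when pages are actually touched. A minimal Python sketch, assuming a Linux host (the thread does not say which tool produced the measurements above):

```python
import os


def parse_status(text: str) -> dict[str, int]:
    """Extract the Vm* fields (values in kB) from a /proc/<pid>/status dump."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        parts = value.split()
        if sep and key.startswith("Vm") and len(parts) == 2 and parts[1] == "kB":
            fields[key] = int(parts[0])
    return fields


def memory_summary(pid: str = "self") -> str:
    """Linux only. VmSize is what top shows as VIRT; VmRSS is what top shows as RES."""
    with open(f"/proc/{pid}/status") as f:
        vm = parse_status(f.read())
    return f"VIRT={vm['VmSize']} kB  RES={vm['VmRSS']} kB"


if __name__ == "__main__":
    if os.path.exists("/proc/self/status"):
        print(memory_summary())
```

For a long-running service, pass its PID instead of `"self"` (for example `memory_summary("1234")`, a hypothetical PID) and sample it periodically: a RES that keeps climbing is real consumption, not just reservation.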

 

Also, the guys from https://unitybase.info, who run very demanding services, evaluated and rejected Intel TBB. Both the glibc MM https://sourceware.org/glibc/wiki/MallocInternals and our https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.fpcx64mm.pas give good results on Linux, with low memory consumption.


Anyway, I wouldn't use Windows to host demanding services. So if you have a Windows server with a lot of memory, you are free to use Intel TBB if you prefer.

3 hours ago, David Heffernan said:

He's talking about distributing, not building.

I'm sure the license permits free distribution.

Edited by RDP1974

10 minutes ago, Stefan Glienke said:

The lack of pdb support in Delphi makes it tedious to use because you only get addresses reported which you then have to manually look up.

 

OMG, if somebody could make a tool to convert detailed map files into PDB files, that would be incredibly useful.
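Until such a converter exists, the manual lookup Stefan describes can at least be scripted. A minimal Python sketch that resolves a raw address against the "Publics by Value" section of a detailed map file. It assumes lines of the form `0001:0000A1B0  Unit.Routine`, that the section ends at the next "Publics by" or "Line numbers" header, and a hypothetical `code_base` parameter giving the runtime address where segment 0001 starts (image base plus section RVA, which you must check for your own binary):

```python
import bisect
import re

# Assumed line shape inside "Publics by Value": " 0001:0000A1B0       Unit.Routine"
PUBLIC_RE = re.compile(r"^\s*([0-9A-Fa-f]{4}):([0-9A-Fa-f]+)\s+(\S+)")


class MapLookup:
    """Resolve raw addresses to 'Symbol+offset' using a Delphi detailed .map file."""

    def __init__(self, map_text: str, code_base: int, segment: int = 1):
        # code_base is a hypothetical input: the runtime address where the given
        # segment starts (image base plus the section's RVA).
        symbols = []
        in_publics = False
        for line in map_text.splitlines():
            if "Publics by" in line:
                in_publics = "Publics by Value" in line
                continue
            if line.startswith("Line numbers"):
                in_publics = False
                continue
            m = PUBLIC_RE.match(line) if in_publics else None
            if m and int(m.group(1), 16) == segment:
                symbols.append((code_base + int(m.group(2), 16), m.group(3)))
        symbols.sort()
        self._addrs = [addr for addr, _ in symbols]
        self._names = [name for _, name in symbols]

    def resolve(self, address: int) -> str:
        """Name the nearest public symbol at or below address."""
        i = bisect.bisect_right(self._addrs, address) - 1
        if i < 0:
            return f"?? ({address:#x})"
        return f"{self._names[i]}+{address - self._addrs[i]:#x}"
```

This only yields `Routine+offset`; mapping to source lines would need the "Line numbers" sections too, which the sketch deliberately skips.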

2 minutes ago, Anders Melander said:

There's this old one, as I'm sure you know: https://github.com/andremussche/map2dbg/tree/master/tds2pdb

I think I tried it once, for use with VTune, without success.

Last time I checked, that project was dormant. Once upon a time I used Andre's map2dbg to make .dbg files that some tools could use, but I never had any success with it for 64-bit executables.

On 1/16/2021 at 2:05 PM, Arnaud Bouchez said:

No, it was not just "reserved": there were a lot more dirty pages with Intel TBB.

OK, good to know that.

On 1/16/2021 at 12:05 PM, Arnaud Bouchez said:

No, it was not just "reserved": there were a lot more dirty pages with Intel TBB.

Maybe with an old version? They are making "giant" steps forward.

Edited by RDP1974

7 hours ago, RDP1974 said:

Maybe with an old version? They are making "giant" steps forward.

Tests were done last year on the latest Debian.


The thread-pool TLS cache model of TBB fits the NT Windows kernel particularly well (scheduler, quantum, fibers, KI* API exposed over the HAL), but sure, it consumes a lot of memory.
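The trade-off both sides describe (lock-free per-thread reuse scales well, but memory freed into one thread's cache is not available to the others) can be shown with a toy model. This Python sketch is not how tbbmalloc actually manages or returns memory; `ToyAllocator` and `simulate` are hypothetical names, and the workers run one after another so the counts are deterministic:

```python
import threading


class ToyAllocator:
    """Toy model only, NOT tbbmalloc's real algorithm.

    per_thread=True: freed blocks go to a thread-local cache (no lock on the fast
    path), and other threads can never reuse them.
    per_thread=False: freed blocks go back to one shared, lock-protected pool.
    """

    def __init__(self, per_thread: bool):
        self.per_thread = per_thread
        self._tls = threading.local()
        self._lock = threading.Lock()
        self._shared: list[bytearray] = []
        self.blocks_from_os = 0  # every block ever requested from the "OS"

    def _local(self) -> list:
        if not hasattr(self._tls, "free"):
            self._tls.free = []  # first use in this thread: empty cache
        return self._tls.free

    def alloc(self) -> bytearray:
        if self.per_thread and self._local():
            return self._local().pop()      # fast path: thread-local reuse
        with self._lock:
            if not self.per_thread and self._shared:
                return self._shared.pop()   # reuse a block freed by any thread
            self.blocks_from_os += 1        # nothing reusable: fresh memory
        return bytearray(64)

    def free(self, block: bytearray) -> None:
        if self.per_thread:
            self._local().append(block)     # stays with this thread only
        else:
            with self._lock:
                self._shared.append(block)


def simulate(per_thread: bool, threads: int, blocks: int) -> int:
    """Run short-lived workers one after another; return blocks taken from the OS."""
    allocator = ToyAllocator(per_thread)

    def worker() -> None:
        live = [allocator.alloc() for _ in range(blocks)]
        for block in live:
            allocator.free(block)

    for _ in range(threads):
        t = threading.Thread(target=worker)
        t.start()
        t.join()  # sequential on purpose, so the counts are deterministic
    return allocator.blocks_from_os


if __name__ == "__main__":
    print("shared pool:      ", simulate(False, 8, 1000))
    print("per-thread caches:", simulate(True, 8, 1000))
```

With 8 short-lived workers of 1000 blocks each, the shared pool asks the "OS" for 1000 blocks in total, while the per-thread caches ask for 8000: faster on the hot path, hungrier overall.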

 

Anyway, off topic: I'm using the Delphi Linux compiler with great satisfaction, together with FireDAC pooling and custom Indy-based SOAP SSL web services: very small and very fast. Is nobody else using the same toolchain?

 

On 1/19/2021 at 1:04 PM, RDP1974 said:

I'm using the Delphi Linux compiler with great satisfaction, together with FireDAC pooling and custom Indy-based SOAP SSL web services: very small and very fast. Is nobody else using the same toolchain?

Nope: FPC on Linux + the mORMot DB and SOA layers, for years now, with high performance and stability: we had servers handling thousands of requests per second, receiving TBs of data, and running for months with no restart and no problem. Especially with our MM, which uses much less memory than TBB.

 

One problem I noticed on Linux with C memory managers running FPC services is that they abort with SIGABRT if they encounter any memory problem.
This is why we worked on our own https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.fpcx64mm.pas, which consumes much less memory than TBB; and if there is a problem in our code, we get a GPF exception we can trace, not a SIGABRT which kills the process. I can tell you that a SIGABRT for a service is a disaster: it always happens when you are far AFK and can't react quickly. And if you need to install something like https://mmonit.com/monit/ on your server, it becomes complicated...
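When a C allocator calls `abort()`, the service normally dies, so recovery has to come from outside the process, which is what a supervisor such as monit provides. A minimal Python sketch of that idea (not monit, and not part of mORMot): on POSIX, `subprocess` reports death by a signal as a negative return code, so an aborted child shows up as `-signal.SIGABRT`. The restart policy and the `run_supervised` name and parameters are assumptions for illustration:

```python
import signal
import subprocess
import sys
import time


def run_supervised(cmd: list[str], max_restarts: int = 3, delay_s: float = 1.0) -> list[int]:
    """Run cmd; restart it while it dies from a signal, up to max_restarts times.

    Returns the list of return codes observed. On POSIX, a child killed by a
    signal has returncode == -signum (e.g. -signal.SIGABRT).
    """
    codes = []
    for attempt in range(max_restarts + 1):
        code = subprocess.run(cmd).returncode
        codes.append(code)
        if code >= 0:  # exited normally (even with an error code): stop supervising
            break
        print(f"attempt {attempt}: killed by {signal.Signals(-code).name}", file=sys.stderr)
        if attempt < max_restarts:
            time.sleep(delay_s)  # back off before restarting
    return codes


if __name__ == "__main__":
    # Simulated service that aborts, like a C allocator detecting heap corruption.
    aborting = [sys.executable, "-c", "import os; os.abort()"]
    print(run_supervised(aborting, max_restarts=2, delay_s=0.1))
```

In production you would also log each crash, cap the restart rate and alert someone; monit, or a systemd unit with `Restart=on-abnormal`, does this without custom code.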
