Jump to content

Arnaud Bouchez

  • Content Count

  • Joined

  • Last visited

  • Days Won


Arnaud Bouchez last won the day on February 13

Arnaud Bouchez had the most liked content!

Community Reputation

251 Excellent


Recent Profile Visitors

The recent visitors block is disabled and is not being shown to other users.

  1. Arnaud Bouchez

    Edge Webview update (?)

    Personal note: each time I see GetIt involved, I remember that mORMot was never accepted as part of it because it was "breaking their license policy". In short, you could use any Delphi version (even the free edition) and create Client-Server apps with it. I guess this is the same reason the great ZEOS library or even UniDAC are not part of it, if I checked correctly their registration. That's why I prefer more open package solutions like Delphinus and I hope in the future the very promising https://github.com/DelphiPackageManager/DPM Old but still relevant discussion at https://synopse.info/forum/viewtopic.php?pid=17453#p17453
  2. Side note: inlining is not necessary faster. Sometimes, after inlining the compiler has troubles assigning properly the registers: so a sub-function with a loop is sometimes faster when NOT inlined, because the loop index and pointer could be in a register, whereas once inlined the stack may be used. The worse use of "inline;" I have seen is perhaps the inlining of FileCreate/FileClose/DeleteFile/RenameFile from SysUtils which required the Windows unit to be part of any unit calling it. With obviously no performance benefit because those calls are slow by nature. Embarcadero made this mistake in early versions of Delphi, then they fixed some of it (but RenameFile is still inlined!), and re-used "inline;" when POSIX was introduced... 😞 I had to redefine those functions in mormot.core.os.pas so that I was not required to write {$ifdef MsWindows} Windows, {$endif} in my units when writing cross-platform code...
  3. Arnaud Bouchez

    Micro optimization: Split strings

    Your code is sometimes not correct. For instance, CustomSplitWithPrecount() exit directly without setting result := nil so it won't change the value passed as input (remember than an array result is in fact a "var" appended argument). All those are microoptimisations - not worth it unless you really need it. I would not use TStringList for sure. But any other method is good enough in most cases. Also no need to use a PChar and increment it. In practice, a loop with an index of the string is safer - and slightly faster since you use only the i variable which is already incremented each time. To optimize any further, I would use PosEx() to find the delimiter which may be faster than your manual search on some targets. The golden rule is to make it right first. Then make it fast - only if it is worth it, and I don't see why it would be worth it.
  4. Arnaud Bouchez

    Quickly zero all local variables?

    First, all managed types (string, variable, dynamic arrays) will be already initialized with zero by the compiler. What you can do is define all local variables inside a record, then call FillChar() on it. There won't be any performance penalty procedure MyFunction; var i: integer; loc: record x,y,z: integer; a: array[0..10] of double; end; begin writeln(loc.x); // write random value (may be 0) FillChar(loc, SizeOf(loc), 0); writeln(loc.x); // write 0 for sure for i := 0 to high(loc.a) do writeln(loc.a[i]); // will write 0 values end; But as drawback, all those variables will be forced to be on stack, so the compiler won't be able to optimize and use register for them - which is the case for the "i" variable above. So don't put ALL local variables in the record, only those needed to be initialized. Anyway, if you have a lot of variables and a lot of code in a method, it may be time to refactor it, and use a dedicated class to implement the logic. This class could contain the variables of the "record" in my sample code. You could keep this class in the implementation section of your unit, for safety. It will be the safer way to debug - and test! One huge benefit of a dedicated class for any complex process is that it could be tested.
  5. If you look at the asm - at least on FPC - in fact CtrNistCarryBigEndian() is inlined so has very little impact. It is called 1/256th times, and only add a two inc/test opcodes. Using branchless instructions seems pointless in this part of the loop: DoBlock() takes dozen of cycles for sure, and the bottleneck is likely to be the critical section. Also note that 2^24 depends on the re-seed parameter, which may be set to something more than 2^24*16 bytes (even NIST seems to allow up to 2^48), so a 3 bytes counter won't be enough. CtrNistCarryBigEndian() is a nice and readable solution, in the context of filling a single block of 16 bytes. Current 32MB default for the reseed value is still far below from the NIST advice of 2^48. We used 32MB from user perspective - previous limit was 1MB which was really paranoid. Anyway, if an application needs a lot of random values, then it will instantiate its own TAesPrng, with a proper reseed, for each huge random need.
  6. Arnaud Bouchez

    Delphi 64bit compiler RTL speedup

    Nope: FPC Linux + mORMot DB and SOA layer since years. With high performance and stability - we had servers handling thousands of requests per seconds receiving TB of data running for months with no restart and no problem. Especially with our MM which uses much less memory than TBB. One problem I noticed on Linux with C memory managers running FPC services is that they are subject to SIGABRT if they encounter any memory problem. This is why we worked on our own https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.fpcx64mm.pas which consumes much less memory than TBB, and if there is a problem in our code, we have a GPF exception we can trace, and not a SIGABRT which kills the process. I can tell you that a SIGABRT for a service is a disaster - it always happen when you are far AFK and can't react quickly. And if you need to install something like https://mmonit.com/monit/ on your server, it becomes complicated...
  7. Two blog posts to share: https://blog.synopse.info/?post/2021/02/13/Fastest-AES-PRNG%2C-AES-CTR-and-AES-GCM-Delphi-implementation https://blog.synopse.info/?post/2021/02/12/New-AesNiHash-for-mORMot-2 TL&DR: new AES assembly code burst AES-CTR AES-GCM AES-PRNG and AES-HASH implementation, especially on x86_64, for mORMot 2. It outperforms OpenSSL for AES-CTR and AES-PRNG, and is magnitude times faster than every other Delphi library I know about.
  8. New hasher in town, to test and benchmark: https://blog.synopse.info/?post/2021/02/12/New-AesNiHash-for-mORMot-2 Murmur and xxHash just far away from it, in terms of speed, and also in terms of collisions I guess... 15GB/s on my Core i3, on both Win32 and Win64. The smallest length of 0-15 bytes are taken without any branch, 16-128 bytes have no loop, and 129+ are hashed with 128 bytes per iteration. Also note its anti-DOS abilities, thanks to its random seed at process startup. So it was especially tuned for a hashmap/dictionary.
  9. Arnaud Bouchez

    AnsiPos and AnsiString

    1. Use RawByteString instead of AnsiString if you don't want to force any conversion. 2. Note that Ansi*() functions are not all meant to deal with AnsiString, but they expect string/UnicodeString types, and deal with current system locale e.g. for comparison or case folding... A bit confusing indeed... 3. Consider using your own version of functions- as we did with https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.base.pas - so you are sure there is no hidden conversion. 4. The main trick is indeed to never let any 'Implicit string cast' warning unfixed. And sometimes use Alt+F2 to see the generated asm, and check there is no hidden "call" during the conversion. 5. Another good idea is to write some unit tests of your core process, uncoupled from TCP itself: write them in the original pre-Unicode Delphi, then recompile the code with the Unicode version of Delphi and ensure they do pass. It will save you a lot of time!
  10. Arnaud Bouchez

    Delphi 64bit compiler RTL speedup

    Tests were done last year on the last Debian.
  11. Arnaud Bouchez

    CrossOver : Win -> Mac

    From my experiment, Delphi has a lot of troubles running under Wine. IIRC Delphi 7 starts, but debugging is not possible. Newer versions didn't start without some dependencies. So Wine is not an option for the Delphi IDE itself. On the contrary, regular VCL apps work well on Wine, if the UI components are almost standard. You may also check https://winebottler.kronenberg.org/ which is a way of packaging a Win executable into a Mac app, embedding Wine with the package.
  12. Arnaud Bouchez

    Delphi 64bit compiler RTL speedup

    No, it was not just "reserved", there was a lot more of dirty pages with Intel TBB. We tried it on production on Linux, on high-end servers with heavy multi-thread process, and the resident size (RES) was much bigger - not only the virtual/shared memory (VIRT/SHR). Also the guys from https://unitybase.info - which have very high demanding services - evaluated and rejected the Intel TBB use. Either the glibc MM https://sourceware.org/glibc/wiki/MallocInternals or our https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.fpcx64mm.pas give good results on Linux, with low memory consumption. Anyway, I wouldn't use Windows to host demanding services. So if you have a Windows server with a lot of memory, you are free to use Intel TBB if you prefer.
  13. Arnaud Bouchez

    Delphi 64bit compiler RTL speedup

    The 3rd party dll are Intel TBB if I am correct. So you should at least mention it, with the proper licence terms, and provide a link. About memory management, from my tests the Intel TBB MM is indeed fast, but eats all memory, so it is not usable for any serious server-side software, running a long time. Some numbers, tested on FPC/Linux, but you got the idea: - FPC default heap 500000 interning 8 KB in 77.34ms i.e. 6,464,959/s, aver. 0us, 98.6 MB/s 500000 direct 7.6 MB in 100.73ms i.e. 4,963,518/s, aver. 0us, 75.7 MB/s - glibc 2.23 500000 interning 8 KB in 76.06ms i.e. 6,573,152/s, aver. 0us, 100.2 MB/s 500000 direct 7.6 MB in 36.64ms i.e. 13,645,915/s, aver. 0us, 208.2 MB/s - jemalloc 3.6 500000 interning 8 KB in 78.60ms i.e. 6,361,323/s, aver. 0us, 97 MB/s 500000 direct 7.6 MB in 58.08ms i.e. 8,608,667/s, aver. 0us, 131.3 MB/s - Intel TBB 4.4 500000 interning 8 KB in 61.96ms i.e. 8,068,810/s, aver. 0us, 123.1 MB/s 500000 direct 7.6 MB in 36.46ms i.e. 13,711,402/s, aver. 0us, 209.2 MB/s for multi-threaded process, we observed best scaling with TBB on this system BUT memory consumption raised to 60 more space (gblic=2.6GB vs TBB=170GB)! -> so for serious server work, glibc (FPC_SYNCMEM) sounds the best candidate
  14. If a method does two diverse actions, define two methods. If a method performs an action which is something on/off or enabled/disabled then you can use a boolean, if the false/true statement is clearly defined by the naming of the method. If a method performs something, but with a custom behavior, don't use boolean (or booleans) but an enumeration or even better a set. It will be much more easy to understand what it does, without looking into the parameter names, and it will be more open to new options/behaviors. function TMyObject.SaveTo(json: boolean): string; // what is the behavior with json=false? function TMyObject.SaveToJson(expanded: boolean): string; // what does SaveToJson(true/false) mean without knowing the parameter name? function TMyObject.SaveToJson(expanded, usecache: boolean): string; // what does SaveToJson(true/false, true/false) mean without knowing the parameters names? type TMyObjectSaveToJsonOptions = set of (sjoExpanded, sjoUseCache); function TMyObject.SaveToJson(options: TMyObjectSaveToJsonOptions): string; // you understand what does SaveToJson([]) or SaveToJson([sjoExpanded]) or SaveToJson[sjoExpanded, sjoUserCache]) mean
  15. Arnaud Bouchez

    The Case of Delphi Const String Parameters

    All this is a void discussion. This code is just broken and should be fixed. It has nothing to do with const or whatever. The compiler is doing what it should, but the code is plain wrong. I fully agree with @David Heffernan here. About the "address", it should be pointer(Value) not @Value. pointer(value) returns the actual pointer of the string content in heap, so will change. It is a faster alternative to @value[1] which works also with value='' -> pointer(value)=nil. @Value returns the memory adress of the local Value variable on the stack, so won't change.