
Arnaud Bouchez

Members
  • Content Count

    324
  • Days Won

    25

Everything posted by Arnaud Bouchez

  1. Arnaud Bouchez

    ANN: Native X.509, RSA and HSM Support for mORMot

    @Kas Ob. Thanks a lot for the very interesting feedback. Some remarks, in no particular order:

    As written in the blog article, we tried to follow the FIPS specs, and used the Mbed TLS source as reference, along with the HAC document. Perhaps not at 100%, but at least for the best-known corner cases. RSA key generation follows all those patterns, so it should rely on a well-proven method to generate primes, and we fixed 65537 as exponent. The random source is the OS (which is likely to be trusted), then XORed with RdRand() on Intel/AMD, then XORed with our own CSPRNG (which is AES based, from an extensive set of entropy sources). XORing a random source with other sources is a common and safe practice to ensure there is no weakness or predictability. We therefore tried to avoid weaknesses like https://ieeexplore.ieee.org/document/9014350 - see the TBigInt.FillPrime method.

    The "certificate cache" is a cache from the raw DER binary: the same ICryptCert instance will be reused for the very same DER/PEM input. Sounds safe. In the code you screenshotted, there is an x.Compare() call which actually ensures that the supplied certificate DER/PEM binary matches the one we have in the trusted list as SKI.

    If the application is using our high-level ICryptCert abstract interface, only safe levels will be available (e.g. a minimal RSA-2048 with SHA-256, or ECC-256 with SHA-256, and proven AES modes): it was meant to be as error-proof as possible for the end user. You just can't sign a certificate from an MD5 or SHA1 hash, or encrypt with RC4, DES or AES-ECB. Note that our plan is to implement TLS 1.3 (and only version 1.3) in the near future, to mitigate MITM attacks even more during the TLS handshake (because all traffic is signed and verified).

    To summarize, if we use some cache or search within the internal lists, we always ensure that the whole DER/PEM binary matches at 100%, not only some fields. We don't even use fingerprints, but every byte of the content. So attacks from forged certificates with only partial fields should be avoided.

    Of course, in real life, if some company needs its application to fulfill some high level of requirements, you may just use OpenSSL or any other library which fits what is needed. With some other potential problems, like loading a wrong or forged external library, or running on a weak POSIX OS... but it is the fun of regulation. 😉 You follow the rules - even if they also are weak.

    Perhaps we missed something, so your feedback is very welcome. We would like to have our mormot.crypt.*.pas units code audited in the future, by some third party or government agency, in our EEC context, and especially the French regulations. The mORMot 1 SynCrypto and SynEcc were already audited some years ago by the security experts of a 1B$ company - but it was an internal audit. But the security is even better with mORMot 2. Please continue to look at the source, and if you see anything wrong or dubious, or see any incorrect comment, do not hesitate to come back!
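To illustrate the XOR-combining idea from this post, here is a minimal sketch in Python. It is not the mORMot implementation: `xor_bytes`, `combined_random` and `weak_source` are hypothetical names, and `secrets.token_bytes` merely stands in for the additional sources (RdRand, the AES-based CSPRNG) mentioned above. The point it shows is that, assuming the sources are independent, XORing in even a useless source does not make the result more predictable than the best source.

```python
import os
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def combined_random(n: int, *extra_sources) -> bytes:
    """Start from the OS source, then XOR in each extra source."""
    out = os.urandom(n)
    for source in extra_sources:
        out = xor_bytes(out, source(n))
    return out


def weak_source(n: int) -> bytes:
    """A deliberately useless source: all zero bytes (assumption for the demo)."""
    return b"\x00" * n


# The constant source adds nothing, but it does not hurt either:
# the OS source still contributes all of its entropy.
key = combined_random(32, secrets.token_bytes, weak_source)
```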
  2. Arnaud Bouchez

    FireDAC Alternative

    Try Zeos - they are Open Source, with very good support. And if you need direct DB access, you can use their ZDBC API which bypasses the TDataSet component so is faster, e.g. for a SELECT with a few rows.
  3. Perhaps the FPU/x87 is not well supported within the emulator. IIRC the 32-bit Delphi RTL uses the FPU for double-to-string conversion, using BCD conversions: https://www.felixcloutier.com/x86/fbstp
  4. You may have noticed that the OpenSSL 1.1.1 series will reach End of Life (EOL) next Monday... The most sensible option is to switch to 3.0 or 3.1 as soon as possible. But we also discovered that switching to OpenSSL 3.0 could lead to big performance regressions... so which version do we need to use? 😮 I just published a blog article about this, and also about how we tried to work around any incompatibility issue within the mORMot OpenSSL layer: https://blog.synopse.info/?post/2023/09/08/End-Of-Live-OpenSSL-1.1-vs-Slow-OpenSSL-3.0
  5. Arnaud Bouchez

    End Of Live OpenSSL 1.1 vs Slow OpenSSL 3.0

    For the TLS layer, I did not notice any huge performance problem during the transmission. Only the certificate checking may take longer than before.
  6. Arnaud Bouchez

    KeepAliveTimeSec of TSslHttpServer

    IIRC the keep-alive timeout is not about idle time, but about the total lifetime of the whole connection.
  7. Arnaud Bouchez

    mORMot 2.1 Released

    We are pleased to announce the release of mORMot 2.1. The download link is available on https://github.com/synopse/mORMot2/releases/tag/2.1.stable
    The reference blog article was just published at https://blog.synopse.info/?post/2023/08/24/mORMot-2.1-Released

    Here is an extract of the release notes:

    Added
      • (C)LDAP, DNS, (S)NTP clients
      • Command Line Parser
      • Native digest/basic HTTP servers authentication
      • Angelize services/daemons manager
      • TTunnelLocal TCP port forwarding
      • SHA-1/SHA-256 HW opcodes asm
      • 7Zip dll wrapper
      • OpenSSL CSR support
      • PostgreSQL async DB with HTTP async backend (for TFB)
      • LUTI continuous integration cross-platform farm

    Changed
      • Upgraded SQLite3 to 3.42.0
      • Stabilized Mac x86_64/aarch64 platforms
      • Lots of bug fixes and enhancements

    Any feedback is welcome! 🙂
  8. Arnaud Bouchez

    mORMot 2.1 Released

    Where to start?

    1) Use TArray<integer> instead of array of integer (seems incredible - see below)
    2) ARC -- happily removed since 10.4 Sydney
    3) Language level incompatibilities: RawByteString, shortstring...
    4) The LongInt and LongWord data types are different on 64-bit POSIX platforms (WTF)
    5) Too much of a moving target (it is almost impossible to target several Delphi compiler versions at once for mobile targets)
    6) Lack of ASM inlined blocks (and we have a lot of very good asm in mORMot)
    7) LLVM backend issues (floating point performance, inlining inconsistency with the Delphi compiler)

    ... in short: when you compare with FPC e.g. on Linux x64, most of the latest Delphi design decisions just make no sense.

    About point 1), I am not sure, but I got it from the official Delphi documentation: https://docwiki.embarcadero.com/RADStudio/Alexandria/en/Migrating_Delphi_Code_to_Mobile_from_Desktop
    This is just a b**s**t idea because at the compiler level "array of" and "TArray<>" are handled the same internally IIRC. It just breaks code with the pre-generics compilers we prefer to support.

    So for now, it is not possible to envisage Delphi "mobile" compiler compatibility without a lot of work. We do not want to pollute our source code with IFDEF everywhere, just to circumvent Delphi inconsistencies.
  9. Arnaud Bouchez

    mORMot 2.1 Released

    Good question. 😉

    The multi-platform support of mORMot 2 was deeply enhanced, but mostly in how it is implemented. Most system-specific code is now within mormot.core.os.pas, which eases a lot the port to other systems.

    But currently, regarding Delphi, we only support Win32/Win64. Other platforms are supported on FPC only. Some new platforms are supported in mORMot 2, like Linux arm32 and aarch64, or Darwin/MacOS aarch64. We test them with our CI farm (using a Mac M1 VM, or even a RPi).

    The reasons are:
    1) we don't need it for our own projects
    2) we don't own any Delphi licence with Mobile and Linux support
    3) FPC cross-platform support is just much better than Delphi (exact same compiler and features everywhere) and does not break anything between versions
    4) we certainly would need a lot of IFDEF to support those platforms - which we are very reluctant to do
    5) a FMX app is likely to need only some REST client support, which does not require the whole mORMot feature set - and you can generate client code for FMX

    To be more precise, with mORMot 2, we still need to fix the client code generation (which is not fully debugged), and introduce some cross-platform client units, which are likely to support TMS WebCore instead of SmartMobileStudio.
  10. Arnaud Bouchez

    Simple ORM

    Generation of SQL with parameters should help the performance and security.
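As a small illustration of why parameters help, here is a sketch using Python's standard sqlite3 module (the table and the `find_user` helper are made up for the example, and this is not the API of any Delphi ORM). The SQL text stays constant, so the prepared statement can be reused, and the value is bound separately, so it is never interpreted as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user(conn, name):
    # The value travels as a bound parameter, never as part of the SQL text.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


rows = find_user(conn, "alice")
# A classic injection attempt is just an unknown name here.
injected = find_user(conn, "alice' OR '1'='1")
```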
  11. Arnaud Bouchez

    Delphi Developer wanted

    Nice seeing another growing project using mORMot. 😉 You can post the offer on Synopse forum, too, if you want.
  12. Arnaud Bouchez

    String comparison in HashTable

    From the asm shown in the video, it seems to compare WORD PTR characters, so it is likely to be UTF-16.
  13. Arnaud Bouchez

    String comparison in HashTable

    1) TL&WR
    Do not try to use those tricks in anything close to a common-use hash table, e.g. your own data library. So much micro-benchmarking for no benefit in practice: all this is not applicable to a common-use library. The Rust RTL has already been optimized and scrutinized by a lot of people and production workloads. I would never try to apply what he found for his own particular case to any hash table implementation.

    2) hash function
    From profiling on real workloads, with a good enough hash function, there are only a few collisions. Of course, FNV is a slow function, with a bad collision rate. With a good hash function, e.g. our AesNiHash32 from https://github.com/synopse/mORMot2/blob/master/src/crypt/mormot.crypt.core.asmx64.inc#L6500 which is inspired by the one in the Go RTL, comparing the first char is enough to reject most collisions, due to its almost-random hash spreading. Then, the idea of "tweaking the hash function" is just a pure waste of computer resources for a common-use library, once you have a realistic hash table with thousands (millions) of items. Of course, this guy wants to hash a table of a few elements, which are known in advance. So it is not a problem for him. So no hash function was the fastest. Of course. But this is not a hash table any more - it is a dedicated algorithm for a specific use-case.

    3) security
    Only using the first and last characters is an awful assumption for a hash process in a common library. It may work for his own dataset, but it is a very unsafe practice. This is the 101 of hash table security: don't make it guessable, or you would expose yourself to hash flooding http://ocert.org/advisories/ocert-2012-001.html

    4) one known algorithm for such a fixed keyword lookup
    The purpose of this video is to quickly find a value within a fixed list of keywords. And from what I have seen in practice, some algorithms would perform better because they won't involve a huge hash table, and won't pollute the CPU cache.

    For instance, this code is used on billions of computers, on billions of datasets, and works very well in practice: https://sqlite.org/src/file?name=src/tokenize.c&ci=trunk
    The code is generated by https://sqlite.org/src/file?name=tool/mkkeywordhash.c&ci=trunk

    An extract is:

        /* Check to see if z[0..n-1] is a keyword. If it is, write the
        ** parser symbol code for that keyword into *pType. Always
        ** return the integer n (the length of the token). */
        static int keywordCode(const char *z, int n, int *pType){
          int i, j;
          const char *zKW;
          if( n>=2 ){
            i = ((charMap(z[0])*4) ^ (charMap(z[n-1])*3) ^ n*1) % 127;
            for(i=((int)aKWHash[i])-1; i>=0; i=((int)aKWNext[i])-1){
              if( aKWLen[i]!=n ) continue;
              zKW = &zKWText[aKWOffset[i]];
              if( (z[0]&~0x20)!=zKW[0] ) continue;
              if( (z[1]&~0x20)!=zKW[1] ) continue;
              j = 2;
              while( j<n && (z[j]&~0x20)==zKW[j] ){ j++; }
              if( j<n ) continue;
              *pType = aKWCode[i];
              break;
            }
          }
          return n;
        }

    Its purpose was to reduce the code size, but in practice, it also reduces CPU cache pollution and tends to be very fast, thanks to a 128 bytes hash table. This code is close to what the video proposes - just even more optimized.
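The same lookup scheme can be sketched in a few lines of Python, just to make the data layout explicit. This is a simplified illustration, not a port of the SQLite code: the keyword list is a tiny made-up subset, and `kw_hash`, `build_tables` and `keyword_code` are hypothetical names. It keeps the three ideas of the extract: a cheap hash from the first character, last character and length, a small bucket table with a "next" chain, and a length check before the full comparison.

```python
KEYWORDS = ["SELECT", "FROM", "WHERE", "INSERT", "UPDATE", "DELETE", "AND", "OR"]
TABLE_SIZE = 127  # same modulus as in the SQLite extract


def kw_hash(word: str) -> int:
    """Hash from first char, last char and length only (fine for a fixed set)."""
    w = word.upper()
    return ((ord(w[0]) * 4) ^ (ord(w[-1]) * 3) ^ len(w)) % TABLE_SIZE


def build_tables(keywords):
    heads = [-1] * TABLE_SIZE    # bucket -> index of first keyword, or -1
    nxt = [-1] * len(keywords)   # keyword index -> next keyword in same bucket
    for i, kw in enumerate(keywords):
        h = kw_hash(kw)
        nxt[i] = heads[h]
        heads[h] = i
    return heads, nxt


HEADS, NEXT = build_tables(KEYWORDS)


def keyword_code(token: str) -> int:
    """Return the index of the keyword, or -1 if token is not a keyword."""
    if len(token) < 2:
        return -1
    i = HEADS[kw_hash(token)]
    while i >= 0:
        kw = KEYWORDS[i]
        # Cheap length check first, full case-insensitive compare second.
        if len(kw) == len(token) and kw == token.upper():
            return i
        i = NEXT[i]
    return -1
```

Such a guessable hash is acceptable here only because the key set is fixed and known in advance, which is exactly the distinction made in points 3) and 4).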
  14. Don't mess with the threads or DB connections of the HTTP/REST server. You would depend on an implementation detail of Mars, with no guarantee it stays the same in the future. If you have a long process, then a temporary dedicated connection is just fine. You could reuse the thread and its connection for the next requests: add a queue to your thread, for pending requests.
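The "dedicated thread with its own connection and a queue of pending requests" pattern could look like the following language-neutral sketch in Python. It has nothing to do with the Mars API: `DbWorker` is a hypothetical class, and an in-memory SQLite database stands in for the real DB connection.

```python
import queue
import sqlite3
import threading


class DbWorker(threading.Thread):
    """One thread owning one connection, fed through a queue."""

    def __init__(self):
        super().__init__(daemon=True)
        self.tasks = queue.Queue()

    def run(self):
        # The connection is created and used only within this thread.
        conn = sqlite3.connect(":memory:")
        while True:
            item = self.tasks.get()
            if item is None:            # sentinel: stop the worker
                break
            sql, params, reply = item
            try:
                reply.put(conn.execute(sql, params).fetchall())
            except Exception as exc:    # hand errors back to the caller
                reply.put(exc)
        conn.close()

    def submit(self, sql, params=()):
        """Queue a request and return a one-slot queue for its result."""
        reply = queue.Queue(maxsize=1)
        self.tasks.put((sql, params, reply))
        return reply

    def stop(self):
        self.tasks.put(None)
        self.join()


worker = DbWorker()
worker.start()
result = worker.submit("SELECT 1 + ?", (41,)).get(timeout=5)
worker.stop()
```

The HTTP/REST server threads only ever touch the queue, so nothing depends on how the server manages its own threads or connections.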
  15. We just introduced in our Open Source mORMot 2 framework two client units to access DNS and LDAP/CLDAP servers. You can resolve IP addresses and services using DNS, and ask for information about your IT infrastructure using LDAP. There are not so many working and cross-platform OpenSource DNS and LDAP libraries around in Delphi or FPC, especially compatible with the latest MS AD versions. And none was able to use Kerberos authentication, or signing/sealing, AFAIK. Last but not least, its DNS and CLDAP server-auto-discovery feature is pretty unique. Please see https://blog.synopse.info/?post/2023/04/19/New-DNS-and-(C)LDAP-Clients-for-Delphi-and-FPC-in-mORMot-2 🙂
  16. For most projects, we want to be able to pass some custom values when starting them. We have the ParamStr and ParamCount global functions, enough to retrieve the basic information. But not enough when you want to go any further.

    We just committed a new command line parser to our Open Source mORMot 2 framework, which works on both Delphi and FPC, follows both Windows and POSIX/Linux conventions, and has many more features (like automated generation of the help message), in an innovative and easy workflow.

    The simplest code may be the following (extracted from the documentation):

        var
          verbose: boolean;
          threads: integer;
        ...
        with Executable.Command do
        begin
          ExeDescription := 'An executable to test mORMot Execute.Command';
          verbose := Option(['v', 'verbose'], 'generate verbose output');
          Get(['t', 'threads'], threads, '#number of threads to run', 5);
          ConsoleWrite(FullDescription);
        end;

    This code will fill the verbose and threads local variables from the command line (with some optional default value), and output on Linux:

        An executable to test mORMot Execute.Command
        Usage: mormot2tests [options] [params]
        Options:
          -v, --verbose generate verbose output
        Params:
          -t, --threads <number> (default 5) number of threads to run

    So, not only can you parse the command line and retrieve values, but you can also add some description text, and let it generate an accurate help message when needed.

    More information available at https://blog.synopse.info/?post/2023/04/19/New-Command-Line-Parser-in-mORMot-2
  17. Arnaud Bouchez

    New Command Line Parser in mORMot 2

    Both syntaxes are of course supported. This is explained in the blog article. What is your exact concern? Is it that you want the quotes to be supported too? Such quotes are not cross-platform I guess. The parser doesn't read quotes, because they are in fact parsed at OS level. IMHO the correct way is to write either /path "C:\Program Files\mORMotHyperServer\" or "/path=C:\Program Files\mORMotHyperServer\" But I did not test this. Any feedback is welcome.
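To show what "quotes are parsed before the program sees them" means in practice, here is a small sketch using Python's shlex module, which applies POSIX shell splitting rules only (Windows command-line rules differ and are not modelled here). The switch name and the path are made-up examples.

```python
import shlex

# What a POSIX shell hands over to the program, argument by argument.
separate = shlex.split('/path "/opt/my app/data"')
joined = shlex.split('"/path=/opt/my app/data"')
unquoted = shlex.split('/path /opt/my app/data')

print(separate)   # the value arrives as its own argument, without quotes
print(joined)     # switch and value arrive as a single argument
print(unquoted)   # without quotes, the path is split at the space
```

In both quoted forms the program never receives the quote characters themselves, so a command line parser has nothing to unquote.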
  18. Arnaud Bouchez

    ANN: mORMot 2 Release Candidate

    The mORMot 2 framework is about to be released as its first 2.0 stable version. I am currently working on preliminary documentation. A first shot is available at https://synopse.info/files/doc/mORMot2.html The framework feature set should now be considered as sealed for this release. There is no open issue reported at https://github.com/synopse/mORMot2/issues or in the forum. Please test it, and give some feedback here to fix any problem before the actual release! We enter a framework code-freeze phase until then. The forum thread for reporting issues and comments is https://synopse.info/forum/viewtopic.php?id=6442 The related blog article is https://blog.synopse.info/?post/2023/01/10/mORMot-2-Release-Candidate
  19. Arnaud Bouchez

    Cyber security Question

    Yes, compute a cryptographic hash of the scripts (MD5 or SHA1 are not enough) before running them. But you need to ensure that the hashes are provided in a safe way, e.g. as constants within a digitally signed executable. You may consider hashing ALL the scripts at startup, and comparing a single hash with the expected value. Then refuse to start if something was tampered with. Instead of a fixed hash, you could add an asymmetric signature of all scripts to your script folder. Then put the signature together with the files, and only store a public key within the executable. You can use https://github.com/synopse/mORMot2/tree/master/src/crypt for those tasks. This is for instance what is run at the core of https://wapt.tranquil.it/store/en/ to protect the Python scripts within each software installation package.
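The "hash ALL the scripts at startup and compare a single value" idea can be sketched as follows in Python with SHA-256. This is only an illustration under assumptions: `folder_digest` and `scripts_untouched` are hypothetical helpers, the `*.py` pattern is arbitrary, and in a real application the expected digest would have to come from a trusted place such as a constant in a signed executable. It does not cover the asymmetric signature variant.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def folder_digest(folder: Path, pattern: str = "*.py") -> str:
    """SHA-256 over the sorted relative names and contents of all scripts."""
    h = hashlib.sha256()
    for path in sorted(folder.rglob(pattern)):
        rel = path.relative_to(folder).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so that file boundaries are unambiguous.
        h.update(len(rel).to_bytes(8, "big") + rel)
        h.update(len(data).to_bytes(8, "big") + data)
    return h.hexdigest()


def scripts_untouched(folder: Path, expected_hex: str) -> bool:
    """Constant-time comparison of the current digest with the expected one."""
    return hmac.compare_digest(folder_digest(folder), expected_hex)


# Demo on a temporary folder: any modified script changes the single digest.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.py").write_text("print('a')")
    (root / "b.py").write_text("print('b')")
    expected = folder_digest(root)
    ok_before = scripts_untouched(root, expected)
    (root / "b.py").write_text("print('tampered')")
    ok_after = scripts_untouched(root, expected)
```

Including the file names in the digest means that renaming, adding or removing a script is detected as well as editing one.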
  20. Arnaud Bouchez

    Sweet 16: Delphi/Object Pascal

    This index is clearly weird. I don't remember anything new in Visual Basic in 2020 which would explain a 400% increase of interest... https://www.tiobe.com/tiobe-index/visual-basic/
  21. Arnaud Bouchez

    App is faster in IDE

    To be fair, an extra 14,000 shows up in the timings on both sides, just more often outside of the IDE. Something is interfering with your application, and waits for 14 seconds. Don't guess, use a profiler. You will see where the time is spent. For instance a good one is https://www.delphitools.info/samplingprofiler/
  22. From my tests running REST services on the same hardware, a Linux server using epoll is always much faster than http.sys. By a huge amount. My remark against WebBroker was not about its coding architecture, it was about its actual memory pressure and performance overhead. And I don't understand why Apache may still be used for any benchmark. 🙂 About Rust/Malloc/Heap, this is because the MS CRT malloc() is poorly coded. At best, it redirects to the MS heap. Nothing in common with our discussion.
  23. FastMM4, MSHeap, TBB or the libc fpalloc are not encapsulating the OS heap manager, they use low-level OS calls like VirtualAlloc or mmap() to reserve big blocks of memory (a few MB), then split them and manage smaller blocks. My guess is that you are making some confusion. About MSHeap, I guess it is documented in https://www.blackhat.com/docs/us-16/materials/us-16-Yason-Windows-10-Segment-Heap-Internals-wp.pdf
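The "reserve a big block from the OS, then split it" principle can be sketched with an anonymous memory mapping. This is a toy bump allocator in Python, only meant to show the idea: `Arena` is a hypothetical class, it never frees individual blocks, and real memory managers like the ones named above are far more elaborate (size classes, free lists, thread handling).

```python
import mmap


class Arena:
    """Reserve one big anonymous block, then hand out slices of it."""

    def __init__(self, size: int = 1 << 20):
        self.block = mmap.mmap(-1, size)   # anonymous mapping from the OS
        self.view = memoryview(self.block)
        self.size = size
        self.used = 0

    def alloc(self, n: int, align: int = 16) -> memoryview:
        # Round the current offset up to the requested alignment.
        start = (self.used + align - 1) & ~(align - 1)
        if start + n > self.size:
            raise MemoryError("arena exhausted")
        self.used = start + n
        return self.view[start:start + n]


arena = Arena(4096)
a = arena.alloc(10)     # occupies offsets 0..9
b = arena.alloc(100)    # starts at the next 16-byte boundary
a[:5] = b"hello"
```

Only the single `mmap` call goes to the OS; every `alloc` afterwards is plain offset arithmetic inside the reserved block.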
  24. All those tests on the localhost on Windows are not very representative. If you want something fast and scaling, use a Linux server, and not over the loopback, which is highly bypassed by the OS itself. Changing the MM in mORMot tests never gives a 10x improvement, because the framework tries to avoid heap allocation as much as possible.

    @Edwin Yip @Stefan Glienke If you don't make any memory allocation, then you have the best performance. Our THttpAsyncServer Event-Driven HTTP Server tries to minimize the memory allocation, and we get very high numbers. https://github.com/synopse/mORMot2/blob/master/src/net/mormot.net.async.pas

    If I understand correctly, the performance went from 353 to 4869 requests per second with ab. I need to emphasize that ab is not a good benchmarking tool for high-performance numbers. You need to use something more scalable like wrk.

    With a mORMot 2 HTTP server on Linux, a benchmark test with wrk reports requests per second much higher than those, even with the default FPC memory manager. And if we use the FastMM4-based mORMot MM (which is tuned for multithreading) we reach 100K requests per second.
    On my old Core i5 7200u laptop:

        abouchez@aaa:~/$ wrk -c 100 -d 15s -t 4 http://localhost:8080/plaintext
        Running 15s test @ http://localhost:8080/plaintext
          4 threads and 100 connections
          Thread Stats   Avg      Stdev     Max   +/- Stdev
            Latency     1.41ms    3.74ms  45.25ms   93.57%
            Req/Sec    30.84k     6.58k   48.49k    65.72%
          1845696 requests in 15.09s, 288.67MB read
        Requests/sec: 122341.58
        Transfer/sec:     19.13MB

    Server code is available in https://github.com/synopse/mORMot2/tree/master/ex/techempower-bench

    If I run the test with ab, I get:

        $ ab -c 100 -n 10000 http://localhost:8080/plaintext
        This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
        Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
        Licensed to The Apache Software Foundation, http://www.apache.org/

        Server Software:        mORMot2
        Server Hostname:        localhost
        Server Port:            8080

        Document Path:          /plaintext
        Document Length:        13 bytes

        Concurrency Level:      100
        Time taken for tests:   0.616 seconds
        Complete requests:      10000
        Failed requests:        0
        Total transferred:      1590000 bytes
        HTML transferred:       130000 bytes
        Requests per second:    16245.71 [#/sec] (mean)
        Time per request:       6.155 [ms] (mean)
        Time per request:       0.062 [ms] (mean, across all concurrent requests)
        Transfer rate:          2522.53 [Kbytes/sec] received

    As you can see, ab is not very good at scaling on multiple threads, especially because by default it does NOT keep alive the connection.
    So if you add the -k switch, then you will have kept-alive connections, which is closer to the actual use of a server I guess:

        $ ab -k -c 100 -n 100000 http://localhost:8080/plaintext
        This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
        Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
        Licensed to The Apache Software Foundation, http://www.apache.org/

        Server Software:        mORMot2
        Server Hostname:        localhost
        Server Port:            8080

        Document Path:          /plaintext
        Document Length:        13 bytes

        Concurrency Level:      100
        Time taken for tests:   1.284 seconds
        Complete requests:      100000
        Failed requests:        0
        Keep-Alive requests:    100000
        Total transferred:      16400000 bytes
        HTML transferred:       1300000 bytes
        Requests per second:    77879.68 [#/sec] (mean)
        Time per request:       1.284 [ms] (mean)
        Time per request:       0.013 [ms] (mean, across all concurrent requests)
        Transfer rate:          12472.92 [Kbytes/sec] received

    Therefore, even with keep-alive, ab achieves only about two thirds of the requests per second rate that wrk is able to reach. Just forget about ab.
    And when I close my test server, I have the following stats:

        {
          "ApiVersion": "Debian Linux 5.10.0 epoll",
          "ServerName": "mORMot2 (Linux)",
          "ProcessName": "8080",
          "SockPort": "8080",
          "ServerKeepAliveTimeOut": 300000,
          "HeadersDefaultBufferSize": 2048,
          "HeadersMaximumSize": 65535,
          "Async": {
            "ThreadPoolCount": 16,
            "ConnectionHigh": 100,
            "Clients": {
              "ReadCount": 1784548,
              "WriteCount": 1627649,
              "ReadBytes": 95843130,
              "WriteBytes": 267528514,
              "Total": 10313
            },
            "Server": {
              "Server": "0.0.0.0",
              "Port": "8080",
              "RawSocket": 5,
              "TimeOut": 10000
            },
            "Accepted": 10313,
            "MaxConnections": 7777777,
            "MaxPending": 100000
          }
        }
        Flags: SERVER assumulthrd lockless erms debug repmemleak
        Small: blocks=14K size=997KB (part of Medium arena)
        Medium: 10MB/21MB peak=21MB current=8 alloc=17 free=9 sleep=0
        Large: 0B/0B peak=0B current=0 alloc=0 free=0 sleep=0
        Small Blocks since beginning: 503K/46MB (as small=43/46 tiny=56/56)
          64=306K 48=104K 32=39K 96=11K 160=9K 192=5K 320=5K 448=5K
          2176=5K 256=4K 80=1K 144=1K 128=888 1056=609 112=469 1152=450
        Small Blocks current: 14K/997KB
          64=10K 48=3K 352=194 32=172 112=128 128=85 96=83 80=76
          16=38 176=23 192=16 576=14 144=12 880=10 272=10 160=9

    The last block is the detailed information of our FastMM4 fork for FPC, in x86_64 asm. https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.fpcx64mm.pas

    The peak MM consumption was 21MB of memory. Compare it with what WebBroker consumes in a similar test. In particular, "sleep=0" indicates that there was NO contention at all during the whole server process. This was achieved just by adding some enhancements to the original FastMM4 code, like a thread-safe round-robin of small blocks arenas. This run was with memory leak reporting (none reported here) and debug/stats mode enabled - so you could save a few % by disabling those features.

    To conclude, it seems that it is the WebBroker technology which is fundamentally broken in terms of performance, not the MM itself. I would also consider how many MB of memory the processes are consuming.
I suspect the MSHeap consumes more than FastMM4. Intel TBB was a nightmare, not usable on production servers, from my tests, in that respect. (Sorry if I was a bit long, but it is a subject I like very much)
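For readers who want to relate the figures in this post to each other, here is a small arithmetic sketch in Python. It only reuses numbers printed in the wrk and ab outputs above; the helper names are made up. It shows how the ab "Time per request (mean)" line follows from the concurrency level and the throughput, and how the keep-alive ab run compares with the wrk run.

```python
def requests_per_second(requests: int, seconds: float) -> float:
    """Throughput as reported on the 'Requests per second' line."""
    return requests / seconds


def mean_time_per_request_ms(concurrency: int, rps: float) -> float:
    # With `concurrency` requests in flight, each one takes on average
    # concurrency / throughput seconds.
    return concurrency / rps * 1000.0


ab_keepalive_rps = 77879.68   # ab -k -c 100 -n 100000
wrk_rps = 122341.58           # wrk -c 100 -d 15s -t 4

ratio = ab_keepalive_rps / wrk_rps
latency_ms = mean_time_per_request_ms(100, ab_keepalive_rps)

print(f"ab -k reaches {ratio:.0%} of the wrk throughput")
print(f"mean time per request at 100 connections: {latency_ms:.3f} ms")
```

The small differences with the printed values come from ab rounding the elapsed time to milliseconds in its report.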