
Arnaud Bouchez


Everything posted by Arnaud Bouchez

  1. Arnaud Bouchez

    Simple ORM

    Generating the SQL with parameters should help both performance and security.
  2. Arnaud Bouchez

    Delphi Developer wanted

    Nice to see another growing project using mORMot. 😉 You can post the offer on the Synopse forum, too, if you want.
  3. Arnaud Bouchez

    String comparison in HashTable

    From the asm shown in the video, it seems to compare WORD PTR characters, so the strings are likely to be UTF-16.
  4. Arnaud Bouchez

    String comparison in HashTable

    1) TL;DR
    Do not try to use those tricks in anything close to a general-purpose hash table, e.g. your own data library. So much micro-benchmarking for no benefit in practice: none of it applies to a general-purpose library. The Rust RTL has already been optimized and scrutinized by a lot of people and production workloads. I would never apply what he found for his own particular case to any hash table implementation.

    2) Hash function
    From profiling on real workloads, with a good enough hash function there are only a few collisions. FNV, of course, is a slow function with a bad collision rate. With a good hash function, e.g. our AesNiHash32 from https://github.com/synopse/mORMot2/blob/master/src/crypt/mormot.crypt.core.asmx64.inc#L6500 which is inspired by the one in the Go RTL, comparing the first character is enough to reject most collisions, thanks to its almost-random hash spreading. So the idea of "tweaking the hash function" is just a pure waste of computing resources for a general-purpose library, once you have a realistic hash table with thousands (or millions) of items. Of course, this guy wants to hash a table of a few elements which are known in advance, so it is not a problem for him. So "no hash function" was the fastest - of course. But this is not a hash table any more: it is a dedicated algorithm for a specific use case.

    3) Security
    Only using the first and last characters is an awful assumption for a hash process in a general-purpose library. It may work for his own dataset, but it is a very unsafe practice. This is the 101 of hash table security: don't make the hash guessable, or you expose yourself to hash flooding - http://ocert.org/advisories/ocert-2012-001.html

    4) A known algorithm for such a fixed keyword lookup
    The purpose of the video is to quickly find a value within a fixed list of keywords. And from what I have seen in practice, some algorithms perform better because they don't involve a huge hash table, and don't pollute the CPU cache.
    For instance, this code is used on billions of computers, on billions of datasets, and works very well in practice: https://sqlite.org/src/file?name=src/tokenize.c&ci=trunk
    The code is generated by https://sqlite.org/src/file?name=tool/mkkeywordhash.c&ci=trunk
    An extract is:

    /* Check to see if z[0..n-1] is a keyword. If it is, write the
    ** parser symbol code for that keyword into *pType.  Always
    ** return the integer n (the length of the token). */
    static int keywordCode(const char *z, int n, int *pType){
      int i, j;
      const char *zKW;
      if( n>=2 ){
        i = ((charMap(z[0])*4) ^ (charMap(z[n-1])*3) ^ n*1) % 127;
        for(i=((int)aKWHash[i])-1; i>=0; i=((int)aKWNext[i])-1){
          if( aKWLen[i]!=n ) continue;
          zKW = &zKWText[aKWOffset[i]];
          if( (z[0]&~0x20)!=zKW[0] ) continue;
          if( (z[1]&~0x20)!=zKW[1] ) continue;
          j = 2;
          while( j<n && (z[j]&~0x20)==zKW[j] ){ j++; }
          if( j<n ) continue;
          *pType = aKWCode[i];
          break;
        }
      }
      return n;
    }

    Its purpose was to reduce the code size, but in practice it also reduces CPU cache pollution and tends to be very fast, thanks to a 128-byte hash table. This code is close to what the video proposes - just even more optimized.
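    To make the idea concrete outside of C, here is a minimal Python sketch of the same fixed-keyword lookup scheme (the keyword list, table size and function names here are illustrative - this is not SQLite's generated code, nor anything from mORMot):

    ```python
    # Sketch: hash first char, last char and length into a tiny table, then
    # verify the few candidates case-insensitively - like SQLite's keywordCode().
    KEYWORDS = ["SELECT", "INSERT", "UPDATE", "DELETE", "WHERE", "FROM", "JOIN"]
    TABLE_SIZE = 32  # tiny table: very little CPU cache pollution

    def _slot(word):
        # mix first char, last char and length, as in mkkeywordhash.c
        return (ord(word[0].upper()) * 4
                ^ ord(word[-1].upper()) * 3
                ^ len(word)) % TABLE_SIZE

    # chained buckets, built once at "code generation" time
    _BUCKETS = [[] for _ in range(TABLE_SIZE)]
    for _kw in KEYWORDS:
        _BUCKETS[_slot(_kw)].append(_kw)

    def keyword_code(token):
        """Return the canonical keyword matching token, or None."""
        for kw in _BUCKETS[_slot(token)]:
            if len(kw) == len(token) and kw == token.upper():
                return kw
        return None
    ```

    The point is the same as in the C version: because the keyword set is fixed and known in advance, a tiny dedicated table beats a general-purpose hash table for this job.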
  5. Don't mess with the threads or DB connections of the HTTP/REST server. You would depend on an implementation detail of Mars, with no guarantee that it stays the same in the future. If you have a long process, then a temporary dedicated connection is just fine. You could reuse the thread and its connection for the next requests: add a queue to your thread for pending requests.
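  The queue-per-worker advice above can be sketched like this (a minimal Python illustration with hypothetical names - not Mars or mORMot code): one long-running thread owns its own resources and drains a queue of pending jobs, so HTTP request handlers never block on the long process.

  ```python
  # Sketch: a dedicated worker thread with its own job queue; request
  # handlers just enqueue work and return immediately.
  import queue
  import threading

  class LongTaskWorker:
      def __init__(self):
          self._queue = queue.Queue()
          self._results = []
          self._thread = threading.Thread(target=self._run, daemon=True)
          self._thread.start()

      def submit(self, job):
          # callable from any HTTP request handler; non-blocking
          self._queue.put(job)

      def _run(self):
          # the worker could open its own temporary DB connection here
          while True:
              job = self._queue.get()
              if job is None:              # sentinel: stop the worker
                  break
              self._results.append(job())  # long process runs on this thread
              self._queue.task_done()

      def stop(self):
          self._queue.put(None)
          self._thread.join()
  ```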
  6. We just introduced in our Open Source mORMot 2 framework two client units to access DNS and LDAP/CLDAP servers. You can resolve IP addresses and services using DNS, and query information about your IT infrastructure using LDAP. There are not many working, cross-platform Open Source DNS and LDAP libraries around for Delphi or FPC, especially ones compatible with the latest MS AD versions. And none of them was able to use Kerberos authentication, or signing/sealing, AFAIK. Last but not least, its DNS and CLDAP server auto-discovery feature is pretty unique. Please see https://blog.synopse.info/?post/2023/04/19/New-DNS-and-(C)LDAP-Clients-for-Delphi-and-FPC-in-mORMot-2 🙂
  7. For most projects, we want to be able to pass some custom values when starting them. We have the ParamStr and ParamCount global functions, which are enough to retrieve the basic information, but not enough when you want to go any further. We just committed a new command line parser to our Open Source mORMot 2 framework, which works on both Delphi and FPC, follows both Windows and POSIX/Linux conventions, and has many more features (like automated generation of the help message), in an innovative and easy workflow. The most simple code may be the following (extracted from the documentation):

  var
    verbose: boolean;
    threads: integer;
  ...
  with Executable.Command do
  begin
    ExeDescription := 'An executable to test mORMot Execute.Command';
    verbose := Option(['v', 'verbose'], 'generate verbose output');
    Get(['t', 'threads'], threads, '#number of threads to run', 5);
    ConsoleWrite(FullDescription);
  end;

  This code will fill the verbose and threads local variables from the command line (with an optional default value), and output on Linux:

  An executable to test mORMot Execute.Command

  Usage: mormot2tests [options] [params]

  Options:
    -v, --verbose       generate verbose output

  Params:
    -t, --threads <number> (default 5)
                        number of threads to run

  So not only can you parse the command line and retrieve values, but you can also add some description text, and have an accurate help message generated when needed. More information is available at https://blog.synopse.info/?post/2023/04/19/New-Command-Line-Parser-in-mORMot-2
  8. Arnaud Bouchez

    New Command Line Parser in mORMot 2

    Both syntaxes are of course supported. This is explained in the blog article. What is your exact concern? Is it that you want quotes to be supported too? Such quotes are not cross-platform, I guess. The parser doesn't read quotes, because they are in fact parsed at OS level. IMHO the correct way is to write either /path "C:\Program Files\mORMotHyperServer\" or "/path=C:\Program Files\mORMotHyperServer\" - but I did not test this. Any feedback is welcome.
  9. Arnaud Bouchez

    ANN: mORMot 2 Release Candidate

    The mORMot 2 framework is about to be released as its first 2.0 stable version. I am currently working on preliminary documentation - a first draft is here: https://synopse.info/files/doc/mORMot2.html The framework feature set should now be considered sealed for this release. There are no open issues reported at https://github.com/synopse/mORMot2/issues or in the forum. Please test it, and give some feedback here, so we can fix any problem before the actual release! We enter a framework code-freeze phase until then. The forum thread for reporting issues and comments is https://synopse.info/forum/viewtopic.php?id=6442 The related blog article is https://blog.synopse.info/?post/2023/01/10/mORMot-2-Release-Candidate
  10. Arnaud Bouchez

    Cyber security Question

    Yes, compute a cryptographic hash of the scripts (MD5 or SHA-1 are not enough) before running them. But you need to ensure that the hashes are provided in a safe way, e.g. as constants within a digitally signed executable. You may consider hashing ALL the scripts at startup, and comparing a single hash with the expected value. Then refuse to start if something was tampered with. Instead of fixed hashes, you could add an asymmetric signature of all scripts to your script folder. Then put the signature together with the files, and only store a public key within the executable. You can use https://github.com/synopse/mORMot2/tree/master/src/crypt for those tasks. This is for instance what runs at the core of https://wapt.tranquil.it/store/en/ to protect the Python scripts within each software installation package.
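    The "hash ALL the scripts at startup, compare a single hash" idea can be sketched as follows (a minimal Python illustration with SHA-256; the folder layout, file pattern and EXPECTED constant are hypothetical - a real deployment would use the signed-executable or asymmetric-signature approach described above):

    ```python
    # Sketch: fold every script in a folder (sorted, so the digest is stable)
    # into one SHA-256 hash, to compare against a known-good constant.
    import hashlib
    from pathlib import Path

    def scripts_digest(folder):
        h = hashlib.sha256()
        for path in sorted(Path(folder).glob("*.py")):
            h.update(path.name.encode())   # include the name: renames count as tampering
            h.update(path.read_bytes())
        return h.hexdigest()

    # At startup (EXPECTED would be a constant inside the signed executable):
    # if scripts_digest("scripts") != EXPECTED:
    #     raise SystemExit("scripts were tampered with - refusing to start")
    ```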
  11. Arnaud Bouchez

    Sweet 16: Delphi/Object Pascal

    This index is clearly weird. I don't remember anything new in Visual Basic in 2020 that would justify a 400% increase in interest... https://www.tiobe.com/tiobe-index/visual-basic/
  12. Arnaud Bouchez

    App is faster in IDE

    To be fair, there is a 14,000 ms addition on both sides, most often outside of the IDE. Something is interfering with your application and making it wait for 14 seconds. Don't guess: use a profiler. You will see where the time is spent. A good one, for instance, is https://www.delphitools.info/samplingprofiler/
  13. From my tests running REST services on the same hardware, a Linux server using epoll is always much faster than http.sys - by a huge amount. My remark against WebBroker was not about its coding architecture; it was about its actual memory pressure and performance overhead. And I don't understand why Apache would still be used for any benchmark. 🙂 About Rust/malloc/heap: this is because the MS CRT malloc() is poorly coded. At best, it redirects to the MS heap. Nothing in common with our discussion.
  14. FastMM4, MSHeap, TBB and the libc fpalloc do not encapsulate the OS heap manager: they use low-level OS calls like VirtualAlloc or mmap() to reserve big blocks of memory (a few MB), then split them and manage the smaller blocks themselves. My guess is that you are confusing the two. About MSHeap, I guess it is documented in https://www.blackhat.com/docs/us-16/materials/us-16-Yason-Windows-10-Segment-Heap-Internals-wp.pdf
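  The "reserve a big block, then split it" principle can be sketched with a toy bump allocator (Python illustration only - real memory managers like FastMM4 are far more elaborate, with free lists, size classes and multiple arenas):

  ```python
  # Sketch: reserve one big anonymous mapping from the OS with mmap()
  # (the moral equivalent of VirtualAlloc), then hand out small aligned
  # blocks from it without any further OS call.
  import mmap

  class Arena:
      def __init__(self, size=1 << 20):      # reserve 1 MB from the OS
          self._mem = mmap.mmap(-1, size)    # anonymous mapping
          self._offset = 0
          self._size = size

      def alloc(self, n):
          """Carve n bytes (16-byte aligned) out of the big block."""
          n = (n + 15) & ~15
          if self._offset + n > self._size:
              raise MemoryError("arena exhausted")
          view = memoryview(self._mem)[self._offset:self._offset + n]
          self._offset += n
          return view
  ```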
  15. All those tests on localhost on Windows are not very representative. If you want something fast and scaling, use a Linux server, and not over the loopback, which is heavily bypassed by the OS itself. Changing the MM in mORMot tests never yields 10x improvements, because the framework tries to avoid heap allocation as much as possible. @Edwin Yip @Stefan Glienke If you don't make any memory allocation, then you have the best performance. Our THttpAsyncServer event-driven HTTP server tries to minimize memory allocation, and we get very high numbers: https://github.com/synopse/mORMot2/blob/master/src/net/mormot.net.async.pas If I understand correctly, the performance went from 353 to 4869 requests per second with ab. I need to emphasize that ab is not a good benchmarking tool for high-performance numbers: you need something more scalable, like wrk. With a mORMot 2 HTTP server on Linux, a wrk benchmark gives requests-per-second numbers much higher than those - with the default FPC memory manager. And if we use the FastMM4-based mORMot MM (which is tuned for multithreading), we reach 100K per second.
On my old Core i5 7200U laptop:

abouchez@aaa:~/$ wrk -c 100 -d 15s -t 4 http://localhost:8080/plaintext
Running 15s test @ http://localhost:8080/plaintext
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.41ms    3.74ms  45.25ms   93.57%
    Req/Sec    30.84k     6.58k   48.49k    65.72%
  1845696 requests in 15.09s, 288.67MB read
Requests/sec: 122341.58
Transfer/sec:     19.13MB

Server code is available in https://github.com/synopse/mORMot2/tree/master/ex/techempower-bench

If I run the test with ab, I get:

$ ab -c 100 -n 10000 http://localhost:8080/plaintext
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Server Software:        mORMot2
Server Hostname:        localhost
Server Port:            8080

Document Path:          /plaintext
Document Length:        13 bytes

Concurrency Level:      100
Time taken for tests:   0.616 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1590000 bytes
HTML transferred:       130000 bytes
Requests per second:    16245.71 [#/sec] (mean)
Time per request:       6.155 [ms] (mean)
Time per request:       0.062 [ms] (mean, across all concurrent requests)
Transfer rate:          2522.53 [Kbytes/sec] received

As you can see, ab is not very good at scaling over multiple threads, especially because by default it does NOT keep the connections alive.
So if you add the -k switch, you get kept-alive connections, which is closer to the actual use of a server, I guess:

$ ab -k -c 100 -n 100000 http://localhost:8080/plaintext
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Server Software:        mORMot2
Server Hostname:        localhost
Server Port:            8080

Document Path:          /plaintext
Document Length:        13 bytes

Concurrency Level:      100
Time taken for tests:   1.284 seconds
Complete requests:      100000
Failed requests:        0
Keep-Alive requests:    100000
Total transferred:      16400000 bytes
HTML transferred:       1300000 bytes
Requests per second:    77879.68 [#/sec] (mean)
Time per request:       1.284 [ms] (mean)
Time per request:       0.013 [ms] (mean, across all concurrent requests)
Transfer rate:          12472.92 [Kbytes/sec] received

Even so, ab achieves only about half of the requests-per-second rate that wrk is able to reach. Just forget about ab.
And when I close my test server, I have the following stats:

{
  "ApiVersion": "Debian Linux 5.10.0 epoll",
  "ServerName": "mORMot2 (Linux)",
  "ProcessName": "8080",
  "SockPort": "8080",
  "ServerKeepAliveTimeOut": 300000,
  "HeadersDefaultBufferSize": 2048,
  "HeadersMaximumSize": 65535,
  "Async": {
    "ThreadPoolCount": 16,
    "ConnectionHigh": 100,
    "Clients": {
      "ReadCount": 1784548,
      "WriteCount": 1627649,
      "ReadBytes": 95843130,
      "WriteBytes": 267528514,
      "Total": 10313
    },
    "Server": {
      "Server": "0.0.0.0",
      "Port": "8080",
      "RawSocket": 5,
      "TimeOut": 10000
    },
    "Accepted": 10313,
    "MaxConnections": 7777777,
    "MaxPending": 100000
  }
}

Flags: SERVER assumulthrd lockless erms debug repmemleak
Small:  blocks=14K size=997KB (part of Medium arena)
Medium: 10MB/21MB peak=21MB current=8 alloc=17 free=9 sleep=0
Large:  0B/0B peak=0B current=0 alloc=0 free=0 sleep=0
Small Blocks since beginning: 503K/46MB (as small=43/46 tiny=56/56)
  64=306K 48=104K 32=39K 96=11K 160=9K 192=5K 320=5K 448=5K
  2176=5K 256=4K 80=1K 144=1K 128=888 1056=609 112=469 1152=450
Small Blocks current: 14K/997KB
  64=10K 48=3K 352=194 32=172 112=128 128=85 96=83 80=76
  16=38 176=23 192=16 576=14 144=12 880=10 272=10 160=9

The last block is the detailed information from our FastMM4 fork for FPC, in x86_64 asm: https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.fpcx64mm.pas The peak MM consumption was 21MB of memory - compare that with what WebBroker consumes in a similar test. In particular, "sleep=0" indicates that there was NO contention at all during the whole server process, just by adding some enhancements to the original FastMM4 code, like a thread-safe round-robin of small-block arenas. This run was with memory leak reporting (none reported here) and debug/stats mode enabled, so you could save a few % by disabling those features. To conclude, it seems that it is the WebBroker technology which is fundamentally broken in terms of performance, not the MM itself. I would also consider how many MB of memory the processes are consuming.
I suspect MSHeap consumes more than FastMM4. Intel TBB was a nightmare in that respect - not usable on production servers, from my tests. (Sorry if I was a bit long, but it is a subject I like very much.)
  16. Arnaud Bouchez

    Components4developers???

    Perhaps they are under attack from Russian hackers...
  17. Arnaud Bouchez

    Is Move the fastest way to copy memory?

    L1 cache access time makes a huge difference. http://blog.skoups.com/?p=592 You could retrieve the L1 cache size, then work on buffers of about 90% of that size (always keep some space for the stack, tables and such). Then, if you work in the API buffer directly, a non-temporal move to the result buffer may help a little. During your process, if you use lookup tables, ensure they don't pollute the cache. But profiling is the key, for sure. Guesses are wrong most of the time...
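    The cache-sized-buffer advice above can be sketched like this (Python illustration only; the 32 KB L1 size is an assumption - query the real size via CPUID or /sys/devices/system/cpu in production code):

    ```python
    # Sketch: process data in chunks of ~90% of the L1 data cache, so each
    # chunk stays cache-hot while it is being transformed.
    L1_CACHE = 32 * 1024            # assumed L1d size; detect it for real use
    CHUNK = int(L1_CACHE * 0.9)     # leave ~10% for stack and lookup tables

    def process_in_chunks(data, transform):
        """Apply transform() chunk by chunk instead of over one huge buffer."""
        out = bytearray()
        for i in range(0, len(data), CHUNK):
            out += transform(data[i:i + CHUNK])   # each chunk fits in L1
        return bytes(out)
    ```

    The transform must of course be position-independent (byte-wise) for chunking to give the same result as one big pass.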
  18. Arnaud Bouchez

    Is Move the fastest way to copy memory?

    That's what I wrote: it is unlikely that an alternate Move() would make a huge difference. When working on buffers, cache locality is a performance key. Working on smaller buffers which fit in the CPU cache (a few MB usually) could be faster than one big Move followed by one big Process. But perhaps your CPU already has a big enough cache (bigger than your picture), so it won't help. About the buffers: couldn't you use a ring of them, so that you don't move the data at all?
  19. Arnaud Bouchez

    How make benchmark?

    It will depend on the database used behind FireDAC or Zeos, and the standard used (ODBC/OleDB/direct...). I would say that both are tuned - just ensure you've got the latest version of Zeos, which has been much more maintained and refined than FireDAC in recent years. Note that FireDAC has some aggressive settings, e.g. for SQLite3 it changes the default safe write settings into faster access. The main interest of Zeos is that the ZDBC low-level layer does not use a TDataSet, so it is (much) faster if you retrieve a single object. You can see those two behaviors in Michal's numbers above, for instance. Also note that mORMot has a direct DB layer, not based on TDataSet, which may be used with FireDAC or Zeos, or with its own direct ODBC/OleDB/Oracle/PostgreSQL/SQLite3 data access. See https://synopse.info/files/html/Synopse mORMot Framework SAD 1.18.html#TITL_27 Note that its ORM is built on top of this DB layer, and adds some unique features like multi-insert SQL generation, so a mORMot TRestBatch is usually much faster than direct naive INSERTs within a transaction. You can reach 1 million inserts per second with SQLite3 and mORMot 2 - https://blog.synopse.info/?post/2022/02/15/mORMot-2-ORM-Performance
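    The batching idea behind TRestBatch can be illustrated with plain Python sqlite3 (a sketch of the principle only - mORMot's multi-insert SQL generation is its own implementation): one executemany() inside a single transaction, instead of row-by-row INSERT round-trips.

    ```python
    # Sketch: batch all rows into one transaction with executemany(),
    # the same principle that makes multi-insert much faster than
    # naive per-row INSERTs.
    import sqlite3

    def batch_insert(rows):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
        with conn:  # one transaction for the whole batch
            conn.executemany("INSERT INTO people (name) VALUES (?)", rows)
        count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
        conn.close()
        return count
    ```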
  20. Arnaud Bouchez

    Update framework question

    I would stick with a static JSON resource, if it is 20KB of data once zipped. Don't use HEAD for it. With a simple GET and proper E-Tag caching, the HTTP server will return 304 on GET if the resource was not modified: just a single request, only returning the data when it has changed. Everything stays at the HTTP server level, so it would be simple and effective.
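    The E-Tag / 304 flow above can be sketched framework-agnostically (a minimal Python illustration of the HTTP conditional-GET mechanism, not any particular server's API):

    ```python
    # Sketch: derive a strong ETag from the content; when the client replays
    # it via If-None-Match, answer 304 and transfer no body at all.
    import hashlib

    def serve_resource(body, if_none_match=None):
        """Return (status, etag, payload) for a conditional GET."""
        etag = '"%s"' % hashlib.sha1(body).hexdigest()
        if if_none_match == etag:
            return 304, etag, b""   # not modified: nothing re-transferred
        return 200, etag, body      # modified (or first request): full data
    ```

    The first GET returns 200 plus the ETag; the client caches both, and every later GET with If-None-Match costs only headers until the JSON actually changes.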
  21. Arnaud Bouchez

    Is Move the fastest way to copy memory?

    Don't expect anything magic from using mORMot's MoveFast() - perhaps a few percent more or less. On Win32 - which is your target, IIRC - the Delphi RTL uses x87 registers. On this platform, MoveFast() uses SSE2 registers for small sizes, so it is likely to be slightly faster, and it will leverage the ERMSB move (i.e. rep movsb) on newer CPUs which support it. To be fair, the mORMot asm is more optimized for x86_64 than for i386, because that is the target platform for the server side, which is the one needing more optimization. But I would just try all the FastCode variants - some can be very verbose, but they "may" be better. What I would do in your case is try not to move any data at all. Isn't it possible to pre-allocate a set of buffers, then just consume them in a circular way, passing them from the acquisition to the processing methods as pointers, with no copy? The fastest move() is... when there is no move. 🙂
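    The "ring of pre-allocated buffers" idea can be sketched like this (Python illustration with hypothetical names - not a mORMot API): the producer writes into the next buffer of the ring, the consumer processes it in place, and no bytes are ever copied between the two stages.

    ```python
    # Sketch: a fixed ring of pre-allocated buffers handed out round-robin,
    # so the acquisition/processing pipeline never allocates or copies.
    class BufferRing:
        def __init__(self, count, size):
            self._buffers = [bytearray(size) for _ in range(count)]
            self._next = 0

        def acquire(self):
            """Hand the next buffer to the producer (no allocation, no copy)."""
            buf = self._buffers[self._next]
            self._next = (self._next + 1) % len(self._buffers)
            return buf
    ```

    A real pipeline would also track which buffers are still being processed before reusing them (e.g. with two indices or a free-list), which is omitted here for brevity.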
  22. Arnaud Bouchez

    Locked SQlite

    See https://www.sqlite.org/lockingv3.html By default, FireDAC opens SQLite3 databases in "exclusive" mode, meaning that only a single connection is allowed. It is much better for performance, but it "locks" the file against being opened outside this main connection. So, as @joaodanet2018 wrote, change the LockingMode in FDConn, or just close the application currently using it.
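    The two locking modes can be demonstrated with Python's sqlite3 module (a sketch of the underlying SQLite PRAGMA, not of FireDAC's LockingMode property): in EXCLUSIVE mode the first write grabs a lock that is kept until the connection closes, so a second connection gets "database is locked".

    ```python
    # Sketch: open an SQLite3 database with PRAGMA locking_mode, which is
    # the mechanism behind FireDAC's exclusive-by-default behavior.
    import sqlite3

    def open_db(path, exclusive=False):
        conn = sqlite3.connect(path, timeout=0.2)  # short busy timeout for the demo
        mode = "EXCLUSIVE" if exclusive else "NORMAL"
        conn.execute("PRAGMA locking_mode = %s" % mode)
        return conn
    ```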
  23. Arnaud Bouchez

    Docx (RTF) to PDF convert

    I really recommend https://www.trichview.com/
  24. Where are you located? (It makes a difference for your potential work status, even remotely.) Do you have some code to show? (e.g. on GitHub or anywhere else)