RDP1974

Members
  • Content Count: 247
  • Days Won: 1

Everything posted by RDP1974

  1. Hi, I have built the libraries from the latest sources of Intel IPP (https://www.intel.com/content/www/us/en/developer/tools/oneapi/ipp.html) and oneTBB (https://www.intel.com/content/www/us/en/developer/tools/oneapi/onetbb.html). The build gave zero warnings or errors. The files are here: https://github.com/RDP1974/Delphi64RTL

     Note that the TBB allocator is quick to expose memory errors such as double frees or overruns. In multithreaded apps such as web applications you will get a large performance improvement. Btw, the Intel license is fully permissive: free to distribute and deploy everywhere. Please let me know if you discover errors.

     Quick test with a WebBroker Indy app producing a plain HTML page:

       program Project1;

       uses
         RDPMM64,
         Vcl.Forms,
         Web.WebReq,
         ...

       procedure TWebModule1.WebModule1DefaultHandlerAction(Sender: TObject;
         Request: TWebRequest; Response: TWebResponse; var Handled: Boolean);
       begin
         Response.Content :=
           '<html>' +
           '<head><title>Web Server Application</title></head>' +
           '<body>Web Server Application ' + FormatDateTime('yyyymmdd.hhnnss', Now) + '</body>' +
           '</html>';
       end;

     Guest: Hyper-V, i9 CPU, Windows Server 2022, 16 cores
     Host: i9 CPU, Windows 10 Pro

     Apache bench: ab -n 1000 -c 100 -k -r http://localhost:8080/

     Delphi 11 default:

       Concurrency Level:      100
       Time taken for tests:   1.845 seconds
       Complete requests:      1000
       Failed requests:        0
       Keep-Alive requests:    0
       Total transferred:      250000 bytes
       HTML transferred:       114000 bytes
       Requests per second:    542.04 [#/sec] (mean)
       Time per request:       184.488 [ms] (mean)
       Time per request:       1.845 [ms] (mean, across all concurrent requests)
       Transfer rate:          132.33 [Kbytes/sec] received

     Delphi 11 (with Intel libs):

       Concurrency Level:      100
       Time taken for tests:   0.297 seconds
       Complete requests:      1000
       Failed requests:        0
       Keep-Alive requests:    0
       Total transferred:      250000 bytes
       HTML transferred:       114000 bytes
       Requests per second:    3364.56 [#/sec] (mean)
       Time per request:       29.722 [ms] (mean)
       Time per request:       0.297 [ms] (mean, across all concurrent requests)
       Transfer rate:          821.42 [Kbytes/sec] received
  2. Well, somebody asked me to build a static DLL without dependencies on the Visual C runtime and Visual C++. This should be done with clang and mingw (but I don't have time right now). As for my repository, I added a thread-safe FIFO queue for high-performance producer-consumer between threads: https://github.com/RDP1974/Delphi64RTL (check testqueue).
  3. The oneAPI TBB concurrent hash map, when you call iterate(), makes a snapshot copy of the collection and publishes it, meanwhile protecting the keys with acc (similar to a critical section). Btw, in a test with 10 threads concurrent_queue is 3x quicker than TOmniQueue, and unfortunately TThreadedQueue goes into deadlock.
  4. Hi, I hope this is not off-topic and is useful anyway. At https://github.com/RDP1974/Delphi64RTL I have added a thread-safe concurrent queue from oneAPI v2022.1, plus a small test (single thread, create and dispose a string, 10M push + 10M pop within 1 sec). I don't have time to do a multithreaded test now. Kind regards. Btw, this repo is the base for my custom server reactor+proactor written in Delphi.
  5. Updated the Intel scalable allocator with a small change for Delphi 12.x. Please let me know if you find errors. Kind regards. Btw, in my test (win11 24h, i9900) the single-thread score is identical to the default MM (D12.3), and in the multithreaded scenario it is the fastest among those tested.
  6. Suppose a pool of 100 TThreads, each with a FIFO queue receiving messages, where each TThread also sends messages simultaneously to all the others: would a TThreadQueue without global locking (as CRT) be the fastest solution?

     As far as I have researched, the Spring4D lock-free queue seems to be the fastest solution (but I cannot find it in the source). Otherwise:
     - OmniThreadLibrary -> TOmniBaseQueue -> dynamically allocated, O(1) enqueue and dequeue, threadsafe, microlocking queue
     - or TOmniMessageQueue (ring buffer)
     - I have also found a ring buffer at https://blog.grijjy.com/2017/01/12/expand-your-collections-collection-part-2-a-generic-ring-buffer/

     Can you please suggest the best code or libraries to implement producer-consumer between threads? Thanks. Btw, if I have time I will make a DLL for tbb::concurrent_queue.
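A minimal baseline sketch of the producer-consumer pattern asked about here, using only the RTL's TThreadedQueue<T> from System.Generics.Collections. It is a simplification: several producers share one queue and a single consumer drains it, rather than 100 threads each owning a queue. The unit name MailboxSketch and the function RunProducerConsumer are hypothetical names made up for this example. It does not claim that TThreadedQueue is the fastest or the most reliable choice; it only gives a reference point against which other queues could be measured.

```delphi
unit MailboxSketch;

interface

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

// Hypothetical helper: starts ProducerCount threads that each push the values
// 1..ItemsPerProducer into one shared queue, drains the queue on the calling
// thread, and returns the sum of everything that was popped.
function RunProducerConsumer(ProducerCount, ItemsPerProducer: Integer): Int64;

implementation

function RunProducerConsumer(ProducerCount, ItemsPerProducer: Integer): Int64;
var
  Queue: TThreadedQueue<Integer>;
  Producers: TArray<TThread>;
  I, Remaining: Integer;
begin
  Result := 0;
  // Depth 1024; push and pop timeouts keep their defaults (wait forever).
  Queue := TThreadedQueue<Integer>.Create(1024);
  try
    SetLength(Producers, ProducerCount);
    for I := 0 to ProducerCount - 1 do
    begin
      Producers[I] := TThread.CreateAnonymousThread(
        procedure
        var
          N: Integer;
        begin
          // Blocks while the queue is full, until the consumer makes room.
          for N := 1 to ItemsPerProducer do
            Queue.PushItem(N);
        end);
      // Keep the thread object alive so we can WaitFor and Free it ourselves.
      Producers[I].FreeOnTerminate := False;
      Producers[I].Start;
    end;

    // Consumer: pop exactly as many items as the producers will push.
    Remaining := ProducerCount * ItemsPerProducer;
    while Remaining > 0 do
    begin
      Inc(Result, Queue.PopItem); // blocks until an item is available
      Dec(Remaining);
    end;

    for I := 0 to ProducerCount - 1 do
    begin
      Producers[I].WaitFor;
      Producers[I].Free;
    end;
  finally
    Queue.Free;
  end;
end;

end.
```

Each producer contributes ItemsPerProducer * (ItemsPerProducer + 1) / 2 to the sum, so the return value doubles as a check that no item was lost or duplicated.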
  7. Suppose a pool of 100 TThreads, each with a FIFO queue receiving messages, where each TThread also sends messages simultaneously to all the others: would a TThreadQueue without global locking (as CRT) be the fastest solution?
  8. Could you please test with TThreadedQueue (with the latest 12.3)?
  9. Can I ask: is TThreadedQueue reliable in the latest Delphi 12.3 release, or do you suggest the OmniThread queue? Same question for the dictionary: do you suggest Spring4D? Or are other libraries better than the default RTL?
  10. Hi, I have a Windows service where I dispatch a custom, dynamic thread pool using the IoCompletionPort API. I also have a component whose methods should be called within the service thread. My question is: do you know if ServiceThread.Queue is reliable? Is this the best way to post things to the main thread safely, without running into race conditions or deadlocks? For example, this code is called within a thread:

       TServiceThread.Queue(Service.ServiceThread, procedure begin dothings end);

     Kind regards
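A small sketch that may help reason about the question above. As far as the RTL goes, TThread.Queue hands the procedure to whichever thread calls CheckSynchronize, which is the process main thread; the first argument only associates the request with a thread object and does not select where it runs. Whether the main thread of a given service pumps that queue often enough is an assumption worth checking in your own setup. The unit name QueueSketch and the function QueuedProcRunsOnMainThread are hypothetical, and the function must itself be called from the main thread.

```delphi
unit QueueSketch;

interface

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: queues a procedure from a worker thread, pumps the
// synchronize queue on the calling (main) thread, and reports whether the
// queued procedure ran on the main thread.
function QueuedProcRunsOnMainThread: Boolean;

implementation

function QueuedProcRunsOnMainThread: Boolean;
var
  Worker: TThread;
  RanOnThread: TThreadID;
  Done: Boolean;
begin
  RanOnThread := 0;
  Done := False;
  Worker := TThread.CreateAnonymousThread(
    procedure
    begin
      // nil: do not tie the queued request to a thread object's lifetime.
      TThread.Queue(nil,
        procedure
        begin
          RanOnThread := TThread.CurrentThread.ThreadID;
          Done := True;
        end);
    end);
  Worker.FreeOnTerminate := False;
  Worker.Start;
  try
    // Nothing runs until the main thread processes the queue. A VCL message
    // loop does this for you; a console or test program must do it by hand.
    while not Done do
      CheckSynchronize(10);
  finally
    Worker.WaitFor;
    Worker.Free;
  end;
  Result := RanOnThread = MainThreadID;
end;

end.
```

The loop around CheckSynchronize is the part that matters: if nothing on the main thread processes the queue, the queued procedure simply never runs.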
  11. Hi, I would like to do remote debugging. I have installed PAServer 23 (from the Delphi 12.2 paserver folder) on a remote VPS, then copied \bin\rmtdbg290.exe to the remote paserver folder to update it. I have set up the profile in the profile manager for Windows and PAServer connects OK. The firewall ports are open for both PAServer and the remote debugger, but I get an error from the Delphi IDE. Can somebody help me get remote debugging of a Windows VCL app working? Thank you
  12. RDP1974

    remote debugging windows 64 vcl

    Sorry, it works: you need to open the firewall on the remote side, because many processes use many ports. See also https://docwiki.embarcadero.com/RADStudio/Athens/en/Installing_a_Debugger_on_a_Remote_Machine Kind regards
  13. RDP1974

    My app dies in Server 2019

    There is a generic all-in-one package: https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170#visual-studio-2015-2017-2019-and-2022 Installing this solved it; probably the DB client DLLs are bound to the UCRT and so need it.
  14. RDP1974

    My app dies in Server 2019

    I had a similar problem with FireDAC MySQL, solved by installing the Visual C++ Redistributable.
  15. Question: do you remember the good old https://fastcode.sourceforge.net/ (2007 project)? I asked ChatGPT to convert 32-bit asm to 64-bit asm for Delphi. Look at this example:

       procedure MoveJOH_SSE2_10(const Source; var Dest; Count : Integer);
       asm
         cmp     ecx, TINYSIZE
         ja      @@Large {Count > TINYSIZE or Count < 0}
         cmp     eax, edx
         jbe     @@SmallCheck
         add     eax, ecx
         add     edx, ecx
         jmp     SmallForwardMove_10
       @@SmallCheck:
         jne     SmallBackwardMove_10
         ret {For Compatibility with Delphi's move for Source = Dest}
       @@Large:
         jng     @@Done {For Compatibility with Delphi's move for Count < 0}
         cmp     eax, edx
         ja      Forwards_SSE2_10
         je      @@Done {For Compatibility with Delphi's move for Source = Dest}
         sub     edx, ecx
         cmp     eax, edx
         lea     edx, [edx+ecx]
         jna     Forwards_SSE2_10
         jmp     Backwards_SSE2_10 {Source/Dest Overlap}
       @@Done:
       end; {MoveJOH_SSE2}

     ChatGPT rewrote this in 64-bit:

       procedure MoveJOH_SSE2_10(const Source; var Dest; Count: Integer);
       asm
         cmp     r8d, TINYSIZE          // Compare Count (R8D) with TINYSIZE
         ja      @@Large                // If Count > TINYSIZE or Count < 0, jump to @@Large
         cmp     rcx, rdx               // Compare Source (RCX) with Dest (RDX)
         jbe     @@SmallCheck
         add     rcx, r8                // Adjust Source address: Source + Count
         add     rdx, r8                // Adjust Dest address: Dest + Count
         jmp     SmallForwardMove_10    // Jump to small forward move
       @@SmallCheck:
         jne     SmallBackwardMove_10   // If Source ≠ Dest, use backward move
         ret                            // If Source = Dest, exit (compatible with Delphi's Move)
       @@Large:
         jng     @@Done                 // If Count < 0, exit (compatible with Delphi's Move)
         cmp     rcx, rdx
         ja      Forwards_SSE2_10       // If Source > Dest, use forward move
         je      @@Done                 // If Source = Dest, exit
         sub     rdx, r8                // Adjust Dest address: Dest - Count
         cmp     rcx, rdx
         lea     rdx, [rdx + r8]        // Restore original Dest address
         jna     Forwards_SSE2_10       // If no overlap, use forward move
         jmp     Backwards_SSE2_10      // If overlap, use backward move
       @@Done:
       end;

     Opinions? Does this make sense to you? Can we take the FastCode 32-bit winners and convert them to 64-bit this easily? Does ChatGPT then own the copyright?
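One way to gain some confidence in a converted routine like the one above is to compare it byte for byte against System.Move over disjoint and overlapping ranges. The harness below is only a sketch and is not part of FastCode: the unit name MoveCheckSketch, the type TMoveProc and the helpers ReferenceMove, NaiveForwardMove and MoveMatchesReference are all hypothetical names. It covers Count >= 0 only, so the negative-count behaviour mentioned in the asm comments would need a separate check.

```delphi
unit MoveCheckSketch;

interface

uses
  System.SysUtils;

type
  // Same parameter list as the routine quoted in the post.
  TMoveProc = procedure(const Source; var Dest; Count: Integer);

// Reference implementation: forwards to the RTL's System.Move.
procedure ReferenceMove(const Source; var Dest; Count: Integer);

// Deliberately wrong for overlapping ranges where Dest > Source; it exists
// only to show that the harness catches such a bug.
procedure NaiveForwardMove(const Source; var Dest; Count: Integer);

// True if Candidate leaves the whole buffer identical to what System.Move
// produces, for every Count in 0..MaxCount and every destination offset from
// -(Count + 1) to Count + 1 relative to the source.
function MoveMatchesReference(Candidate: TMoveProc; MaxCount: Integer): Boolean;

implementation

const
  BufSize = 1024;
  Base = 256; // source offset inside the buffer

type
  TBuf = array[0..BufSize - 1] of Byte;

procedure ReferenceMove(const Source; var Dest; Count: Integer);
begin
  Move(Source, Dest, Count);
end;

procedure NaiveForwardMove(const Source; var Dest; Count: Integer);
var
  S, D: PByte;
  I: Integer;
begin
  S := @Source;
  D := @Dest;
  for I := 0 to Count - 1 do
  begin
    D^ := S^;
    Inc(S);
    Inc(D);
  end;
end;

procedure FillPattern(var Buf: TBuf);
var
  I: Integer;
begin
  for I := 0 to BufSize - 1 do
    Buf[I] := (I * 31 + 7) and $FF;
end;

function MoveMatchesReference(Candidate: TMoveProc; MaxCount: Integer): Boolean;
var
  Expected, Actual: TBuf;
  Count, Delta: Integer;
begin
  Result := False;
  if MaxCount > 255 then
    MaxCount := 255; // keeps every access inside the 1024-byte buffer
  for Count := 0 to MaxCount do
    for Delta := -(Count + 1) to Count + 1 do
    begin
      FillPattern(Expected);
      FillPattern(Actual);
      ReferenceMove(Expected[Base], Expected[Base + Delta], Count);
      Candidate(Actual[Base], Actual[Base + Delta], Count);
      // Comparing the whole buffer also catches writes outside the target.
      if not CompareMem(@Expected, @Actual, SizeOf(TBuf)) then
        Exit;
    end;
  Result := True;
end;

end.
```

A candidate passing this check is not proof of correctness (alignment cases, large counts and negative counts are not exercised), but a failure is a quick signal that the conversion changed behaviour.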
  16. indeed the quality of the system rtl of D12 is outstanding
  17. Hi, I have made a more extended sample: a WebBroker Indy HTTP server that parses JSON, serializes the data to a FireDAC memtable and populates a response, using threadvar blobs.

     > single request test > 1 request, 1 thread:

     ab -n 1 -c 1 -k http://192.168.1.110:8080/

     default:

       Concurrency Level:      1
       Time taken for tests:   0.003 seconds
       Complete requests:      1
       Failed requests:        0
       Keep-Alive requests:    0
       Total transferred:      43146 bytes
       HTML transferred:       43008 bytes
       Requests per second:    287.27 [#/sec] (mean)
       Time per request:       3.481 [ms] (mean)
       Time per request:       3.481 [ms] (mean, across all concurrent requests)
       Transfer rate:          12104.21 [Kbytes/sec] received

     rdp64 intel tbb:

       Concurrency Level:      1
       Time taken for tests:   0.003 seconds
       Complete requests:      1
       Failed requests:        0
       Keep-Alive requests:    0
       Total transferred:      43146 bytes
       HTML transferred:       43008 bytes
       Requests per second:    287.44 [#/sec] (mean)
       Time per request:       3.479 [ms] (mean)
       Time per request:       3.479 [ms] (mean, across all concurrent requests)
       Transfer rate:          12111.17 [Kbytes/sec] received

     msheap:

       Concurrency Level:      1
       Time taken for tests:   0.005 seconds
       Complete requests:      1
       Failed requests:        0
       Keep-Alive requests:    0
       Total transferred:      43146 bytes
       HTML transferred:       43008 bytes
       Requests per second:    191.57 [#/sec] (mean)
       Time per request:       5.220 [ms] (mean)
       Time per request:       5.220 [ms] (mean, across all concurrent requests)
       Transfer rate:          8071.79 [Kbytes/sec] received

     fastmm5:

       Concurrency Level:      1
       Time taken for tests:   0.005 seconds
       Complete requests:      1
       Failed requests:        0
       Keep-Alive requests:    0
       Total transferred:      43146 bytes
       HTML transferred:       43008 bytes
       Requests per second:    191.64 [#/sec] (mean)
       Time per request:       5.218 [ms] (mean)
       Time per request:       5.218 [ms] (mean, across all concurrent requests)
       Transfer rate:          8074.89 [Kbytes/sec] received

     > multi thread test > 100 requests, 100 threads:

     ab -n 100 -c 100 -k http://192.168.1.110:8080/

     default:

       Concurrency Level:      100
       Time taken for tests:   1.549 seconds
       Complete requests:      100
       Failed requests:        0
       Keep-Alive requests:    0
       Total transferred:      4314600 bytes
       HTML transferred:       4300800 bytes
       Requests per second:    64.56 [#/sec] (mean)
       Time per request:       1548.967 [ms] (mean)
       Time per request:       15.490 [ms] (mean, across all concurrent requests)
       Transfer rate:          2720.18 [Kbytes/sec] received

     rdp64 intel tbb:

       Concurrency Level:      100
       Time taken for tests:   0.063 seconds
       Complete requests:      100
       Failed requests:        0
       Keep-Alive requests:    0
       Total transferred:      4314600 bytes
       HTML transferred:       4300800 bytes
       Requests per second:    1596.37 [#/sec] (mean)
       Time per request:       62.642 [ms] (mean)
       Time per request:       0.626 [ms] (mean, across all concurrent requests)
       Transfer rate:          67262.80 [Kbytes/sec] received

     msheap:

       Concurrency Level:      100
       Time taken for tests:   0.070 seconds
       Complete requests:      100
       Failed requests:        0
       Keep-Alive requests:    0
       Total transferred:      4314600 bytes
       HTML transferred:       4300800 bytes
       Requests per second:    1431.02 [#/sec] (mean)
       Time per request:       69.880 [ms] (mean)
       Time per request:       0.699 [ms] (mean, across all concurrent requests)
       Transfer rate:          60295.89 [Kbytes/sec] received

     fastmm5:

       Concurrency Level:      100
       Time taken for tests:   0.110 seconds
       Complete requests:      100
       Failed requests:        0
       Keep-Alive requests:    0
       Total transferred:      4314600 bytes
       HTML transferred:       4300800 bytes
       Requests per second:    909.90 [#/sec] (mean)
       Time per request:       109.902 [ms] (mean)
       Time per request:       1.099 [ms] (mean, across all concurrent requests)
       Transfer rate:          38338.49 [Kbytes/sec] received

     The ab bench is not very granular, so I assume that in a single thread all allocators should be close together. I also did a test on Linux Ubuntu 22 on the same host and the results are similar to Windows with msheap or tbb. Kind regards. Btw, I used the JsonDataObjects and DataSet.Serialize libraries.

     TestWebbroker.zip
  18. I feel dumb: I want to edit the first post and can't find how, and I want to erase one post and it seems impossible! 😕
  19. I can't find where to rename the topic 😕
  20. Lowering the size of the output to 2.5kB (JSON blob) instead of 46kB, ISAPI has this throughput (CPU: an oldish 9th-gen 14nm i9900-kf):

       Concurrency Level:      100
       Time taken for tests:   0.398 seconds
       Complete requests:      10000
       Failed requests:        0
       Keep-Alive requests:    10000
       Total transferred:      27800000 bytes
       HTML transferred:       26100000 bytes
       Requests per second:    25106.45 [#/sec] (mean)
       Time per request:       3.983 [ms] (mean)
       Time per request:       0.040 [ms] (mean, across all concurrent requests)
       Transfer rate:          68160.09 [Kbytes/sec] received
  21. With keep-alive, ISAPI app:

       Concurrency Level:      100
       Time taken for tests:   4.763 seconds
       Complete requests:      10000
       Failed requests:        0
       Keep-Alive requests:    10000
       Total transferred:      431790000 bytes
       HTML transferred:       430080000 bytes
       Requests per second:    2099.45 [#/sec] (mean)
       Time per request:       47.631 [ms] (mean)
       Time per request:       0.476 [ms] (mean, across all concurrent requests)
       Transfer rate:          88527.55 [Kbytes/sec] received

     In the project source:

       Application.MaxConnections := 1000;
       Application.CacheConnections := True;
  22. Please tell me if these tests are disturbing or inappropriate; if so, I will delete them.