
RDPasqua

Members
  • Content count: 25
  • Community reputation: 8 (Neutral)
  • Followers: 1


  1. Hi Andrea, about the slow SQL results: it sounds like you may have several bottlenecks:
     1) zlib gzip or deflate taking a massive amount of CPU
     2) serialized SQL access
     3) slow JSON
     4) static files, which should be sent with the TransmitFile API using the kernel cache
     How to solve them:
     1) Please run a test with HTTP compression disabled, so we can see whether it is the bottleneck.
     2) I'll upload a new zlib, 5x faster than the system default on a single core; please wait until I finish adapting it to the Delphi RTL.
     3) Use FireDAC connection pooling (Pooled=True in the FDManager connection definition) and make your connections use it, so in each thread you do: create connection, create transaction, create query, run the SQL, then free, free, free. This saves the connection time, which can often be 250-1000 ms. For SELECTs set the query to Unidirectional and read-only, with live and cached updates off; use indexes and check the query plan with a plan analyzer.
     4) Avoid using a single connection with queries shared over it.
     5) You can go further with your own per-thread pool using threadvars: declare `type TDBPool = class FConn: TFDConnection; FTrans: TFDTransaction; FQuery: TFDQuery; end;` plus `threadvar MyPool: TDBPool; PoolEnabled: Boolean;` (threadvars cannot take an initializer; in each thread they start as nil/False). Inside the method, check before running SQL: `if not PoolEnabled then begin MyPool := TDBPool.Create; { create FConn, FTrans, FQuery } PoolEnabled := True; end;`. Then in your SQL simply set `MyPool.FQuery.SQL.Text` and call `ExecSQL`, or `Open` and `Close`. This is the fastest method, because the thread you are running in already has a connection, transaction and query ready: nothing to create and nothing to destroy. Tell me if you need further info.
     6) Use stored procedures for complex queries.
     7) Use a fast JSON library, for example the excellent https://blog.grijjy.com/2017/01/30/efficient-and-easy-to-use-json-and-bson-library/ (12x faster).
     Please let me know the results with these changes, thank you. By the way, you can use a profiler to see where most of the CPU time is spent; maybe the SQL server simply cannot deliver more throughput than this 🙂
  2. I can do it, or you can try it yourself: read GetTickCount, run TParallel.For over 1,000,000 calls of the function under test, then subtract the previous GetTickCount value; do this once with RDPMM64 commented out and once with it in use. Anyway, benchmarking a real application, such as a web server as I did, gives a better idea of the real gain.
  3. OK, I have built a CompareMem version, but the Delphi RTL version is faster; CompareText gives similar results, so I have removed these two functions from RDPSimd64, which also avoids binding Math and SysUtils. Pos seems to work correctly now (ANSI, wide and Unicode). Now, with MM, Move, FillChar, ZeroMemory and Pos patched, ApacheBench `ab -n 100000 -c 100 -k -r http://192.168.1.124:8000/hello` (100 concurrent HTTP clients) jumps from ~22-25k ops/sec to ~100k ops/sec (Windows Server 2016, VMware guest with 8 virtual cores on an i7 quad-core host):
     Server Software: CrossHttpServer/2.0
     Server Hostname: 192.168.1.166
     Server Port: 8000
     Document Path: /hello
     Document Length: 11 bytes
     Concurrency Level: 100
     Time taken for tests: 0.985 seconds
     Complete requests: 100000
     Failed requests: 0
     Keep-Alive requests: 100000
     Total transferred: 14200000 bytes
     HTML transferred: 1100000 bytes
     Requests per second: 101502.13 [#/sec] (mean)
     Time per request: 0.985 [ms] (mean)
     Time per request: 0.010 [ms] (mean, across all concurrent requests)
     Transfer rate: 14075.49 [Kbytes/sec] received
     TBB and IPP seem very sensitive to memory errors; for example, a third-party component I tested produces exceptions (probably a double free or a buffer overrun). Pos in IPP behaved differently from the Delphi RTL version (it also handles negative values); it should be fine now. To sum up, try it with your components and let me know the results. Please also let me know if I made errors in the source code or if you trap any exceptions. Thank you. Roberto
     By the way, tested with Rio 10.3.1 (Delphi 64-bit). (I will also send a zlib 5x faster for HTTP compression.) RDPINTELPas.zip
  4. I should try building with Clang. For TBB there should be no trouble: we will have an OBJ to link in. For IPP I suspect there will be a licensing problem; I need to read the license.
  5. Sure, you can use it in DataSnap and ISAPI, in DelphiMVC, in Indy-based servers, in MARS, etc., in any Delphi 64-bit app: in the project source put RdpMM64 as the first unit, then put seamm.dll and seartl.dll in the same folder as the .exe (SEA is the name of a small piece of software I will put on GitHub) or, for ISAPI, in the same folder as the ISAPI DLL. If you run benchmarks, please let me know the results. Consider that most of the time is probably consumed by SQL queries; those should be optimized by setting pooling to true in FDManager, see http://docwiki.embarcadero.com/RADStudio/Rio/en/Multithreading_(FireDAC) (don't create/free a connection every time, reuse one from a pool). If you make good use of threading (TParallel.For, ITask, TThread), you will get a strong speedup. (I'll upload the latest version with Pos, CompareMem and CompareText soon.)
     CompareMem is faster in the Delphi RTL (the overall quality of the RTL is outstanding, from my point of view).
     CompareText is slightly faster in IPP but does not behave identically, so some third-party components break.
     Pos is on average twice as fast, so I will finish it; here too the RTL has some odd behavior that requires adding conditions to the translated routine.
     So the most time-intensive and frequent operations seem to be: MM, ZeroMemory, FillChar, Move, Pos.
     May I ask: in your experience, which RTL routines are important and called often enough to be worth a SIMD counterpart? Please tell me the most used base routines and I will see whether I can translate them (UpperCase, LowerCase, StringReplace, Copy, Delete?).
  6. By the way, I'm enhancing winddriver's excellent Delphi-Cross-Socket library, adding support for the TransmitFile() API, so we will have a powerful web server for the Delphi community.
  7. I'm using the Intel IPP and TBB libraries. TBB (through its tbbmalloc allocator) provides a memory manager optimized for parallelization, multithreading and thread-pool operations. IPP is a set of high-performance routines that use the SIMD instructions of modern CPUs, such as SSE3 and AVX-512. In RDPMM64 I use an optimized compiled TBB library and replace FillChar() with the SIMD version. In RDPSimd64 I patch the most time-intensive RTL routines (FillChar, Move, Pos, CompareText) with SIMD counterparts (I will probably add other basic RTL routines). I also made an optimized zlib, 5x faster than the system gzip; I will upload everything to GitHub together with a high-performance IIS ISAPI filter for fast compression.
  8. Please comment out the Pos() patches: I have found a bug and will correct it soon.
  9. UPDATED: with Intel TBB, from 27k ops/sec to 98k ops/sec on a quad-core CPU:
     Server Software: CrossHttpServer/2.0
     Server Hostname: 192.168.1.166
     Server Port: 8000
     Document Path: /hello
     Document Length: 11 bytes
     Concurrency Level: 100
     Time taken for tests: 1.015 seconds
     Complete requests: 100000
     Failed requests: 0
     Keep-Alive requests: 100000
     Total transferred: 14200000 bytes
     HTML transferred: 1100000 bytes
     Requests per second: **98514.50** [#/sec] (mean)
     Time per request: 1.015 [ms] (mean)
     Time per request: 0.010 [ms] (mean, across all concurrent requests)
     Transfer rate: 13661.19 [Kbytes/sec] received
     So: default 27K/s, Windows Server 2016 heap 91K/s, **Intel TBB 98K/s**.
     Put RDPMM64 as the first unit in the project source. RDP_IntelMMRTL.zip
  10. Hello, I made patches for the Win64 MM and RTL, making Delphi Windows server apps fly; you can check this cool library thread: https://github.com/winddriver/Delphi-Cross-Socket/issues/39#
      Rio 10.3.1 default:
      Server Software: CrossHttpServer/2.0
      Server Hostname: 192.168.1.166
      Server Port: 8000
      Document Path: /hello
      Document Length: 11 bytes
      Concurrency Level: 100
      Time taken for tests: 3.703 seconds
      Complete requests: 100000
      Failed requests: 0
      Keep-Alive requests: 100000
      Total transferred: 14200000 bytes
      HTML transferred: 1100000 bytes
      Requests per second: **27002.22 [#/sec] (mean)**
      Time per request: 3.703 [ms] (mean)
      Time per request: 0.037 [ms] (mean, across all concurrent requests)
      Transfer rate: 3744.45 [Kbytes/sec] received
      Rio 10.3.1 with RDP patches:
      Server Software: CrossHttpServer/2.0
      Server Hostname: 192.168.1.166
      Server Port: 8000
      Document Path: /hello
      Document Length: 11 bytes
      Concurrency Level: 100
      Time taken for tests: 1.094 seconds
      Complete requests: 100000
      Failed requests: 0
      Keep-Alive requests: 100000
      Total transferred: 14200000 bytes
      HTML transferred: 1100000 bytes
      Requests per second: **91442.20 [#/sec] (mean)**
      Time per request: 1.094 [ms] (mean)
      Time per request: 0.011 [ms] (mean, across all concurrent requests)
      Transfer rate: 12680.46 [Kbytes/sec] received
      Please check my Pos() routines; they should be OK.
      If you use my patches, please put a link to my website [www.dellapasqua.com](http://www.dellapasqua.com), and please, if you like, forward me some jobs: internet related, full stack, cloud, embedded, SQL. I'm glad to collaborate with smart people and Delphi companies. Thank you, Roberto Della Pasqua. RDPRTL.zip TestPOs.zip
      By the way, I also have a SIMD zlib 5x faster than the system gzip library; I'll post it next time.
  11. I had a quick look at the source where the exception happens. Well, I have not seen errors; the code also seems very well designed, with interfaces, anonymous methods, abstraction, generics, correct overloading and great management of the Windows API. I tried to reproduce the failing function by calling New() on a record within a loop, without trouble ... :-zzz
  12. I have tested only the application throughput; I don't know whether it's prone to fragmentation (probably the big advantage of adopting an MM layer is avoiding this). How can I test the other parameters you mentioned? (By the way, I see C++ coders use the msvcrt malloc or call the OS heap allocator API directly.) https://github.com/01org/tbb/issues/120#issuecomment-459776671 https://github.com/winddriver/Delphi-Cross-Socket/issues/39 (I miss Per Larsen SleuthQA; I'm trying madExcept now. Do you know any good QA memory checker?)
  13. Yes, indeed, the problem appears only when using Intel TBB. I have tested FastMM-AVX, ScaleMM2, Google TCMalloc and BrainMM; curiously, under Windows Server 2016 the best-performing heap manager (I'm using ApacheBench to test with 1000 concurrent sockets) is the OS itself (calling HeapAlloc, HeapFree... directly). I would like to report the problem on the Intel TBB GitHub. Thanks all for the help. By the way: https://docs.microsoft.com/en-us/windows/desktop/memory/low-fragmentation-heap
  14. Regarding "I have solved with this: IsMultiThread := True; at begin": isn't this an old setting? The problem persists. Just to clarify, and then I'll stop being boring: IMHO the issue deserves a look, because, for example, using TParallel.For and the rest of the System.Threading thread pool with massive concurrency I get no problem at all... Something odd must be going on (sorry if I'm wrong).
  15. OK, I just tried the unmodified Intel-compiled DLL from https://github.com/01org/tbb/releases (see tbbmalloc.dll in the bin folder) and the problem persists, so in my opinion it cannot be a problem with this library, but somewhere in the compiler's low level, the RTL or the memory manager? (Sorry if I insist, but the libraries work perfectly with dozens of Delphi projects, big ones with a variety of components, and the problem appears only with this Delphi Cross Socket IoCompletionPort code.)
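Point 3 of post 1 recommends FireDAC pooling with a create/use/free cycle per request. Below is a minimal sketch of that pattern, modeled on the FireDAC multithreading docwiki linked in post 5. The unit name, the connection definition name `App_Pooled`, the Firebird driver, the database path, the credentials and the `customers` table are all hypothetical placeholders; swap in your own driver unit and parameters. No explicit transaction is created here, so FireDAC's default auto-commit applies.

```pascal
unit PooledQuery; // hypothetical unit name

interface

procedure RegisterPooledConnDef;
function CountCustomers(const ACountry: string): Int64;

implementation

uses
  System.Classes,
  FireDAC.Stan.Intf, FireDAC.Stan.Option, FireDAC.Stan.Def, FireDAC.Stan.Pool,
  FireDAC.Stan.Async, FireDAC.Phys, FireDAC.DApt, FireDAC.Comp.Client,
  FireDAC.Phys.FB; // Firebird driver, chosen only as an example

const
  CPooledDef = 'App_Pooled'; // hypothetical connection definition name

// Call once at startup, before any worker thread touches the database.
procedure RegisterPooledConnDef;
var
  Params: TStringList;
begin
  FDManager.SilentMode := True; // no wait cursor from worker threads
  Params := TStringList.Create;
  try
    Params.Add('Database=C:\data\app.fdb'); // hypothetical path
    Params.Add('User_Name=SYSDBA');         // hypothetical credentials
    Params.Add('Password=masterkey');
    Params.Add('Pooled=True');              // reuse physical connections
    FDManager.AddConnectionDef(CPooledDef, 'FB', Params);
  finally
    Params.Free;
  end;
end;

// Safe to call from many threads: each call uses its own connection object.
function CountCustomers(const ACountry: string): Int64;
var
  Conn: TFDConnection;
  Qry: TFDQuery;
begin
  Conn := TFDConnection.Create(nil);
  try
    Conn.ConnectionDefName := CPooledDef; // pooling works only through a connection definition
    Qry := TFDQuery.Create(nil);
    try
      Qry.Connection := Conn;
      Qry.FetchOptions.Unidirectional := True; // forward-only cursor for a SELECT
      Qry.UpdateOptions.ReadOnly := True;
      Qry.SQL.Text := 'SELECT COUNT(*) FROM customers WHERE country = :country';
      Qry.ParamByName('country').AsString := ACountry;
      Qry.Open; // connects on demand, taking a connection from the pool
      Result := Qry.Fields[0].AsLargeInt;
    finally
      Qry.Free;
    end;
  finally
    Conn.Free; // closing a pooled connection returns it to the pool
  end;
end;

end.
```

The create and free calls stay cheap because the expensive part, opening the physical connection, only happens when the pool has no idle connection to hand out.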
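Point 5 of post 1 describes a per-thread pool kept in threadvars. Here is a sketch of that idea under a few assumptions. The unit, class and routine names are hypothetical. The connection definition name `App_Pooled` is a placeholder that must be registered elsewhere. The separate `PoolEnabled` flag is replaced by a nil check on the threadvar, since threadvars start as nil in every thread. Delphi does not free objects held in threadvars, so the owning thread must call `ReleaseThreadDB` before it exits.

```pascal
unit ThreadLocalDB; // hypothetical unit name

interface

uses
  FireDAC.Comp.Client;

type
  // One connection + transaction + query, owned by exactly one thread.
  TThreadDB = class
  private
    FConn: TFDConnection;
    FTrans: TFDTransaction;
    FQuery: TFDQuery;
  public
    constructor Create(const AConnDefName: string);
    destructor Destroy; override;
    property Query: TFDQuery read FQuery;
  end;

// Returns the calling thread's objects, creating them on first use.
function ThreadDB: TThreadDB;
// Must be called by the same thread before it exits, or the objects leak.
procedure ReleaseThreadDB;

implementation

uses
  FireDAC.Stan.Def, FireDAC.Stan.Async, FireDAC.DApt;

const
  CConnDefName = 'App_Pooled'; // hypothetical connection definition name

threadvar
  // A threadvar cannot take an initializer; each thread starts with nil.
  GThreadDB: TThreadDB;

constructor TThreadDB.Create(const AConnDefName: string);
begin
  inherited Create;
  FConn := TFDConnection.Create(nil);
  FConn.ConnectionDefName := AConnDefName;
  FTrans := TFDTransaction.Create(nil);
  FTrans.Connection := FConn;
  FConn.Transaction := FTrans;
  FQuery := TFDQuery.Create(nil);
  FQuery.Connection := FConn;
  FQuery.Transaction := FTrans;
end;

destructor TThreadDB.Destroy;
begin
  // free dependents before the connection they point to
  FQuery.Free;
  FTrans.Free;
  FConn.Free;
  inherited;
end;

function ThreadDB: TThreadDB;
begin
  if GThreadDB = nil then
    GThreadDB := TThreadDB.Create(CConnDefName);
  Result := GThreadDB;
end;

procedure ReleaseThreadDB;
begin
  GThreadDB.Free; // Free is a no-op on nil
  GThreadDB := nil;
end;

end.
```

A worker then just sets `ThreadDB.Query.SQL.Text`, calls `Open` (or `ExecSQL`) and `Close`, with nothing created or destroyed per request. With thread pools whose threads you do not control, there may be no convenient place to call `ReleaseThreadDB`, which is the main cost of this design.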
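Point 4 of post 1 and post 6 mention the Win32 TransmitFile API for serving static files. The sketch below shows a synchronous call on an already connected, blocking socket. The unit and function names are hypothetical, and TransmitFile is declared here directly from mswsock.dll rather than taken from any Delphi unit. An IOCP server such as Delphi-Cross-Socket would pass an OVERLAPPED structure instead of nil. TransmitFile reads the file through the operating system's cache manager. Client editions of Windows limit it to two concurrent operations, while server editions are optimized for high throughput.

```pascal
unit SendFileHelper; // hypothetical unit name

interface

// Sends the whole file over an already connected, blocking socket.
function SendWholeFile(ASocket: NativeUInt; const AFileName: string): Boolean;

implementation

uses
  Winapi.Windows;

// Declared here directly from mswsock.dll; SOCKET is a UINT_PTR on Win64.
function TransmitFile(hSocket: NativeUInt; hFile: THandle;
  nNumberOfBytesToWrite, nNumberOfBytesPerSend: DWORD;
  lpOverlapped: POverlapped; lpTransmitBuffers: Pointer;
  dwReserved: DWORD): BOOL; stdcall; external 'mswsock.dll';

function SendWholeFile(ASocket: NativeUInt; const AFileName: string): Boolean;
var
  H: THandle;
begin
  H := CreateFile(PChar(AFileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, 0);
  if H = INVALID_HANDLE_VALUE then
    Exit(False);
  try
    // 0 bytes to write = whole file, 0 bytes per send = default chunk size,
    // nil overlapped = synchronous call on a blocking socket
    Result := TransmitFile(ASocket, H, 0, 0, nil, nil, 0);
  finally
    CloseHandle(H);
  end;
end;

end.
```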
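Post 2 suggests timing 1,000,000 TParallel.For calls with and without RDPMM64. This is a rough sketch of that measurement. It swaps GetTickCount for TStopwatch from System.Diagnostics to get finer resolution. The program name and the loop body are hypothetical; the body is just an allocation-heavy placeholder, since string building is what exercises the memory manager. Run it twice, once with the `RdpMM64` line commented out and once with it enabled.

```pascal
program TParallelBench; // hypothetical benchmark program

{$APPTYPE CONSOLE}

uses
  // RdpMM64,  // uncomment (as the very first unit) to compare with the patched memory manager
  System.SysUtils, System.Diagnostics, System.Threading;

const
  CIterations = 1000000;

var
  SW: TStopwatch;
begin
  SW := TStopwatch.StartNew;
  TParallel.For(1, CIterations,
    procedure(I: Integer)
    var
      S: string;
    begin
      // allocation-heavy body: each call builds and releases a short string
      S := IntToStr(I) + '-payload';
      if Length(S) = 0 then
        raise Exception.Create('unreachable');
    end);
  SW.Stop;
  Writeln(Format('%d iterations in %d ms', [CIterations, SW.ElapsedMilliseconds]));
end.
```

As post 2 notes, a micro-benchmark like this only hints at the gain; a real server under ApacheBench load is the better measure.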
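Post 5 explains that RdpMM64 must be the first unit in the project source, with seamm.dll and seartl.dll placed next to the executable (or next to the ISAPI DLL). A minimal project file might look like the sketch below. The project name and the body are hypothetical, and RdpMM64 itself comes from the author's attached archive and is not shown here.

```pascal
program MyServer; // hypothetical project name

{$APPTYPE CONSOLE}

uses
  RdpMM64,        // first unit: installs the replacement memory manager before anything allocates
  System.SysUtils;

begin
  // seamm.dll and seartl.dll are expected in the same folder as this .exe
  Writeln('Memory manager replaced; start the server here.');
end.
```

Order matters because units initialize in the order they are listed, and memory allocated by one manager cannot safely be freed by another.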
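Post 13 reports that on Windows Server 2016 calling the OS heap directly (HeapAlloc, HeapFree) performed best among the memory managers tested. Here is a minimal sketch of plugging the process heap into the Delphi RTL through SetMemoryManager. The unit name is hypothetical, and the Win32 heap functions are declared directly so the unit pulls in no other units. It does no leak reporting, and like RdpMM64 it must be the first unit in the project. On Windows Vista and later, the process heap uses the low-fragmentation heap (the LFH linked in that post) automatically when needed.

```pascal
unit OSHeapMM; // hypothetical unit name; list it first in the project's uses clause

interface

implementation

const
  kernel32 = 'kernel32.dll';
  HEAP_ZERO_MEMORY = $00000008;

// Declared directly so this unit does not depend on Winapi.Windows.
function GetProcessHeap: THandle; stdcall; external kernel32;
function HeapAlloc(hHeap: THandle; dwFlags: Cardinal; dwBytes: NativeUInt): Pointer; stdcall; external kernel32;
function HeapReAlloc(hHeap: THandle; dwFlags: Cardinal; lpMem: Pointer; dwBytes: NativeUInt): Pointer; stdcall; external kernel32;
function HeapFree(hHeap: THandle; dwFlags: Cardinal; lpMem: Pointer): LongBool; stdcall; external kernel32;

var
  GHeap: THandle;

function OSGetMem(Size: NativeInt): Pointer;
begin
  Result := HeapAlloc(GHeap, 0, Size); // nil makes the RTL raise EOutOfMemory
end;

function OSFreeMem(P: Pointer): Integer;
begin
  if HeapFree(GHeap, 0, P) then
    Result := 0 // 0 tells the RTL the free succeeded
  else
    Result := 1;
end;

function OSReallocMem(P: Pointer; Size: NativeInt): Pointer;
begin
  // the RTL handles the nil-pointer and zero-size cases before calling this
  Result := HeapReAlloc(GHeap, 0, P, Size);
end;

function OSAllocMem(Size: NativeInt): Pointer;
begin
  Result := HeapAlloc(GHeap, HEAP_ZERO_MEMORY, Size); // zero-filled block
end;

function OSNoLeakTracking(P: Pointer): Boolean;
begin
  Result := False; // expected-leak registration is not supported
end;

const
  OSMemoryManager: TMemoryManagerEx = (
    GetMem: OSGetMem;
    FreeMem: OSFreeMem;
    ReallocMem: OSReallocMem;
    AllocMem: OSAllocMem;
    RegisterExpectedMemoryLeak: OSNoLeakTracking;
    UnregisterExpectedMemoryLeak: OSNoLeakTracking);

initialization
  GHeap := GetProcessHeap; // the process heap is serialized, so it is thread-safe
  SetMemoryManager(OSMemoryManager);

end.
```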