Arnaud Bouchez

Members
  • Content Count: 274
  • Days Won: 12

Everything posted by Arnaud Bouchez

  1. Arnaud Bouchez

    Components4developers???

    Perhaps they are under attack from Russian hackers...
  2. Arnaud Bouchez

    Is Move the fastest way to copy memory?

    L1 cache access time makes a huge difference. http://blog.skoups.com/?p=592 You could retrieve the L1 cache size, then work on buffers of about 90% of this size (always keep some space for the stack, tables and such). Then, if you work in the API buffer directly, a non-temporal move to the result buffer may help a little. During your process, if you use lookup tables, ensure they don't pollute the cache. But profiling is the key for sure. Guesses are wrong most of the time...
  3. Arnaud Bouchez

    Is Move the fastest way to copy memory?

    That's what I wrote: it is unlikely that an alternate Move() would make a huge difference. When working on buffers, cache locality is key to performance. Working on smaller buffers, which fit in the CPU cache (typically tens of KB for L1, up to a few MB for L2/L3), could be faster than two big Move / Process. But perhaps your CPU already has a good enough cache (bigger than your picture), so it won't help. About the buffers, couldn't you use a ring of them, so that you don't move data?
  4. Arnaud Bouchez

    How make benchmark?

    It will depend on the Database used behind FireDAC or Zeos, and the standard used (ODBC/OleDB/Direct...). I would say that both are tuned - just ensure you got the latest version of Zeos, which has been much more maintained and refined than FireDAC in the last years. Note that FireDAC has some aggressive settings, e.g. for SQLite3 it changes the default safe write settings into faster access. The main interest of Zeos is that the ZDBC low-level layer does not use a TDataSet, so it is (much) faster if you retrieve a single object. You will see those two behaviors in the Michal numbers above, for instance. Also note that mORMot has a direct DB layer, not based on TDataSet, which may be used with FireDAC or Zeos, or with its own direct ODBC/OleDB/Oracle/PostgreSQL/SQLite3 data access. See https://synopse.info/files/html/Synopse mORMot Framework SAD 1.18.html#TITL_27 Note that its ORM is built on top of this unique DB layer, and adds some unique features like multi-insert SQL generation, so a mORMot TRestBatch is usually much faster than direct naive INSERTs within a transaction. You can reach 1 million inserts per second with SQLite3 with mORMot 2 - https://blog.synopse.info/?post/2022/02/15/mORMot-2-ORM-Performance
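To give an idea of what "multi-insert SQL generation" means, here is a minimal sketch in Python with the standard sqlite3 module. It is not how mORMot's TRestBatch is implemented: the people table, the insert_batched name and the 100 rows per statement are assumptions made for illustration only. Several rows are packed into each INSERT statement, and the whole batch runs in a single transaction.

```python
import sqlite3

def insert_batched(conn, rows, rows_per_statement=100):
    """Insert (id, name) rows with multi-row INSERT statements, in one transaction.

    Hypothetical helper: table and column names are assumptions for this sketch.
    """
    with conn:  # commits once at the end, rolls back on error
        for start in range(0, len(rows), rows_per_statement):
            chunk = rows[start:start + rows_per_statement]
            # One "(?, ?)" group per row: a single statement carries many rows
            placeholders = ",".join(["(?, ?)"] * len(chunk))
            flat = [value for row in chunk for value in row]
            conn.execute(
                f"INSERT INTO people(id, name) VALUES {placeholders}", flat
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people(id INTEGER PRIMARY KEY, name TEXT)")
insert_batched(conn, [(i, f"name{i}") for i in range(250)])
```

The number of rows per statement has to stay below the bound-parameter limit of the database engine, so it is worth keeping it configurable and measuring.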
  5. Arnaud Bouchez

    Update framework question

    I would stick with a static JSON resource, if it is 20KB of data once zipped. Don't use HEAD for it. With a simple GET and proper ETag caching, the HTTP server will return 304 on GET if the resource was not modified: just a single request, only returning the data when it changed. Everything will stay at HTTP server level, so it would be simple and effective.
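A rough sketch of the server-side decision, in Python. Any real HTTP server already does this for static files, so this is only to show the logic; make_etag and conditional_get are hypothetical names, the ETag is assumed to be a hash of the content, and weak validators (W/ prefix) are ignored for brevity.

```python
import hashlib
from typing import Optional

def make_etag(body: bytes) -> str:
    """Build a quoted ETag from the content (assumption: a truncated SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def conditional_get(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, payload) for a GET on a static resource."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may carry several comma-separated tags, or "*"
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # client copy is still valid: no body
    return 200, headers, body
```

The client keeps the ETag of its last download and sends it back in If-None-Match: it gets an empty 304 while nothing changed, and the new JSON with a 200 as soon as the resource is updated.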
  6. Arnaud Bouchez

    Is Move the fastest way to copy memory?

    Don't expect anything magic by using mORMot MoveFast(). Perhaps a few percent more or less. On Win32 - which is your target - IIRC the Delphi RTL uses X87 registers. On this platform, MoveFast() uses SSE2 registers for small sizes, so it is likely to be slightly faster, and will leverage ERMSB move (i.e. rep movsb) on newer CPUs which support it. To be fair, mORMot asm is more optimized for x86_64 than for i386 - because it is the target platform for server side, which is the one needing more optimization. But I would just try all FastCode variants - some can be very verbose, but "may" be better. What I would do in your case is try not to move any data at all. Isn't it possible to pre-allocate a set of buffers, then just consume them in a circular way, passing them from the acquisition to the processing methods as pointers, with no copy? The fastest move() is ... when there is no move... 🙂
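The "ring of pre-allocated buffers" idea could look like the following sketch. It is written in Python only to show the structure; in Delphi the views would be plain pointers. BufferRing, acquire_frame and process_frame are hypothetical names, and there is no protection against the producer overtaking a consumer that is still working on a buffer.

```python
class BufferRing:
    """Fixed set of preallocated buffers handed out in circular order."""

    def __init__(self, count: int, size: int):
        self._buffers = [bytearray(size) for _ in range(count)]
        self._next = 0

    def acquire(self) -> memoryview:
        """Return a view on the next buffer: a reference, not a copy."""
        buf = self._buffers[self._next]
        self._next = (self._next + 1) % len(self._buffers)
        return memoryview(buf)

def acquire_frame(view: memoryview, value: int) -> None:
    """Hypothetical acquisition step: data is written straight into the buffer."""
    view[:] = bytes([value]) * len(view)

def process_frame(view: memoryview) -> int:
    """Hypothetical processing step: works in place on the same memory."""
    return sum(view)

ring = BufferRing(count=3, size=4)
frame = ring.acquire()
acquire_frame(frame, 2)
total = process_frame(frame)
```

Acquisition and processing share the same memory block, so no Move() is needed between them; the number of buffers in the ring sets how far acquisition may run ahead of processing.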
  7. Arnaud Bouchez

    Locked SQlite

    See https://www.sqlite.org/lockingv3.html By default, FireDAC opens SQLite3 databases in "exclusive" mode, meaning that only a single connection is allowed. It is much better for performance, but it "locks" the file against being opened outside this main connection. So, as @joaodanet2018 wrote, change the LockingMode in FDconn, or just close the application currently using it.
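The effect can be reproduced outside Delphi with a small Python sketch using the standard sqlite3 module. It assumes that FireDAC's LockingMode setting corresponds to SQLite's locking_mode pragma; is_locked_for_others and demo are hypothetical names.

```python
import os
import sqlite3
import tempfile

def is_locked_for_others(path: str) -> bool:
    """Try to read the database from a second connection, without waiting."""
    other = sqlite3.connect(path, timeout=0)
    try:
        other.execute("SELECT count(*) FROM sqlite_master").fetchone()
        return False
    except sqlite3.OperationalError:  # typically "database is locked"
        return True
    finally:
        other.close()

def demo():
    """Return (locked while the main connection is open, locked after close)."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.db")
        main = sqlite3.connect(path)
        main.execute("PRAGMA locking_mode=EXCLUSIVE")
        # The first write takes the exclusive lock, which is then kept
        main.execute("CREATE TABLE t(id INTEGER PRIMARY KEY)")
        main.commit()
        locked_while_open = is_locked_for_others(path)
        main.close()  # closing the main connection releases the file
        locked_after_close = is_locked_for_others(path)
        return locked_while_open, locked_after_close
```

The second connection fails as long as the exclusive connection is alive, and works again once it is closed, which matches the two options above.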
  8. Arnaud Bouchez

    Docx (RTF) to PDF convert

    I really recommend https://www.trichview.com/
  9. Where are you located? (it makes a difference for your potential work status, even remotely) Do you have some code to show? (e.g. on github or anywhere else)
  10. Note: if you read the file from start to end, memory mapped files are not faster than reading the file into memory. The memory faults make it slower than a regular single FileRead() call. For huge files on Win32 which won't fit in memory, you may use temporary chunks (e.g. 128MB). And if you really load it once and don't want to pollute the OS disk memory cache, consider using the FILE_FLAG_SEQUENTIAL_SCAN flag under Windows. This is what we do with mORMot's FileOpenSequentialRead(). https://devblogs.microsoft.com/oldnewthing/20120120-00/?p=8493
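The "temporary chunks" approach can be sketched as follows, in Python for brevity. read_in_chunks is a hypothetical helper, the 128MB default just mirrors the figure above, and the Windows sequential-scan flag is not set here: this only shows one buffer being reused for the whole file.

```python
def read_in_chunks(path, chunk_size=128 * 1024 * 1024):
    """Yield successive views of the file, reusing one preallocated buffer."""
    buffer = bytearray(chunk_size)
    view = memoryview(buffer)
    # buffering=0: read straight into our buffer, no extra Python-level copy
    with open(path, "rb", buffering=0) as f:
        while True:
            count = f.readinto(buffer)
            if not count:
                break  # end of file
            # Only valid until the next iteration, since the buffer is reused
            yield view[:count]
```

Each chunk must be fully processed (or copied) before asking for the next one, because the same memory is overwritten by the following read.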
  11. Yes, delete() is as bad as copy(), David is right! The idea is to keep the input string untouched, then append the output to a new output string, preallocated once with the maximum potential size. Then call SetLength() once at the end, which is likely to cost almost nothing, since the heap manager can usually resize the content in-place.
  12. You could do it with NO copy() call at all. Just write a small state machine and read the input one char at a time.
  13. The main trick is to avoid memory allocation, i.e. temporary string allocations. For instance, the fewer copy() calls, the better. Try to rewrite your code to allocate one single output string per input string. Just parse the input string from left to right, applying the quotes or dates processing on the fly. Then you could also avoid any input line allocation, and parse the whole input buffer at once. Parsing 100,000 lines could be done much quicker, if properly written. I guess around 500MB/s is easy to reach. For instance, within mORMot, we parse and convert 900MB/s of JSON in pure pascal, including string unquoting.
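As an illustration of the single-pass approach, here is a small sketch in Python. The exact processing is an assumption (the original task is not shown here): it removes the double quotes around fields and turns a doubled quote inside a quoted field into a single one. strip_quotes is a hypothetical name. The output is preallocated once at its maximum possible size and truncated once at the end, with no intermediate copy.

```python
QUOTE = ord('"')

def strip_quotes(data: bytes) -> bytes:
    """Remove field quotes in one pass; "" inside a quoted field yields one quote."""
    out = bytearray(len(data))  # the output can never be longer than the input
    n = 0                       # number of bytes written so far
    in_quotes = False           # the only state of this tiny state machine
    i = 0
    while i < len(data):
        c = data[i]
        if c == QUOTE:
            if in_quotes and i + 1 < len(data) and data[i + 1] == QUOTE:
                out[n] = QUOTE  # escaped quote: keep one, skip the second
                n += 1
                i += 1
            else:
                in_quotes = not in_quotes  # opening or closing quote: dropped
        else:
            out[n] = c
            n += 1
        i += 1
    del out[n:]                 # single final truncation
    return bytes(out)

result = strip_quotes(b'a,"b ""c"" d",e')
```

In Delphi the same structure would use one SetLength() for the output before the loop and one after it, with PChar-like pointers instead of indexes.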
  14. This is a bit tricky (a COM object) but it works fine on Win32. You have the source code at https://github.com/synopse/SynProject We embedded the COM object as a resource within the main exe, and it is uncompressed and registered for the current user.
  15. ShortStrings have a big advantage: they can be allocated on the stack. So they are perfect for small ASCII text, e.g. numbers to text conversion, or when logging information in plain English. Using str() over a shortstring is for instance faster than using IntToStr() and a temporary string (or AnsiString) in a multi-threaded program, because the heap is not involved. Of course, we could use a static array of AnsiChar, but ShortStrings have an advantage because they have their length encoded, so they are safer and faster than #0 terminated strings. So on mobile platforms, we could end up creating a new record type, re-inventing the wheel, whereas the ShortString type is in fact still supported and generated by the compiler, and even used by the RTL at its lowest system level. ShortStrings have been deprecated... and hidden. They could even be restored/unhidden by some tricks like https://www.idefixpack.de/blog/2016/05/system-bytestrings-for-10-1-berlin Why? Because some people at Embarcadero thought it was confusing, and that the language should be "cleaned up" - I translate by "more C# / Java like", with a single string type. This was the very same reason they did hide RawByteString and AnsiString... More a marketing strategy than a technical decision IMHO. I prefer the FPC more "conservative" way of preserving backward compatibility. It is worth noting that the FPC compiler source code itself uses a lot of shortstring internally, so it would never be deprecated on FPC for sure. 😉
  16. Arnaud Bouchez

    About TGUID type...

    Using a local TGUID constant seems the best solution. It is clean and fast (no string/hex conversion involved, just a copy of the TGUID record bytes). No need to do anything else, or to ask for a feature request I guess, because it would be purely cosmetic.
  17. Arnaud Bouchez

    Why compiler allows this difference in declaration?

    IIRC it was almost mandatory to work with Ole Automation and Word/Excel. Named parameters are the usual way of calling the Word/Excel Ole API, from Office macros or Visual Basic. So Delphi did have to support this too. And it is easy to implement named parameters with OLE, because in its ABI, parameters are... indeed named. Whereas for the regular native code function ABI, parameters are not named, but passed in order in registers or on the stack. So implementing named parameters in Delphi would have been feasible, but would require more magic, especially for the "unnamed" parameter. It was requested several times in the past, especially by people coming from a Python background. Named parameters can make calls more explicit. So the question is more for Embarcadero people. 😉 You can emulate this using a record or a class to pass the values, but it is not as elegant.
  18. Yes, I have seen the handcrafted IMT. But I am not convinced the "sub rcx, xxx; jmp xxx" code block causes a noticeable performance penalty - perhaps only on irrelevant micro benchmarks. The CPU lock involved in calling a Delphi virtual method through an interface has a cost for sure https://www.idefixpack.de/blog/2016/05/whats-wrong-with-virtual-methods-called-through-an-interface - but not a sub + jmp. Also, the enumerator instance included into the list itself seems a premature optimization to me if we want to be cross-platform as we want. Calling GetThreadID on a non Windows platform has a true cost if you call the pthread library. And the resulting code complexity makes me wonder if it is really worth it in practice. Better switch to a better memory manager, or just use a record and rely on inlining + stack allocation of an unmanaged enumerator. Since you implemented both of those optimizations: OK, just continue to use them. But I won't go that way with mORMot. Whereas I still find your interface + pre-compiled folded classes an effective and nice trick (even if I just disable byte and word stubs for dictionaries: if you have a dictionary, the hash table will be bigger than the size of the byte/word data itself - so just use integers instead of byte/word for dictionaries in end-user code; but I still have byte/word specializations for IList<> which does not have any hash by default, but may on demand).
  19. Nice timings. Yes, you are right, in mORMot we only use basic iteration to be used in "for ... in .. do" statement with no further composition and flexibility as available with IEnumerable<T>. The difference for small loops is huge (almost 10 times) and for big loops is still relevant (2 times) when records are used. I guess mORMot pointer-based records could be slightly faster than RTL index-based values, especially when managed types are involved. In practice, I find "for .. in .. do" to be the main place for iterations. So to my understanding, records are the way to go for mORMot. Then nothing prevents another method returning complex and fluid IEnumerable<T>. We just don't go that way in mORMot yet.
  20. I discovered that using a record as TEnumerator makes the code faster, and allows some kind of inlining, with no memory allocation. My TSynEnumerator<T> record only uses 3 pointers on the stack with no temporary allocation. I prefer using pointers here to avoid any temporary storage e.g. for managed types. And the GetCurrent and MoveNext functions have no branch and inline very aggressively. And it works with Delphi 2010! 😉 A record here sounds like a much more efficient idea than a class + interface, as @Stefan Glienke did in Spring4D. Stefan, do you have any reason to prefer interfaces also for the enumerator instead of a good old record? From my findings, performance is better with a record, especially thanks to inlining - which is perfect on FPC, but Delphi still calls MoveNext and doesn't inline it. It also avoids a try...finally for most simple functions, and any heap allocation. Please check https://github.com/synopse/mORMot2/commit/17b7a2753bb54057ad0b6d03bd757e370d80dce2
  21. I just found out that Delphi has big troubles with static arrays like THash128 = array[0..15] of byte. If I remove those types, then the Delphi compiler does not emit "F2084 Internal Error" any more... So I will stick with the same kind of types as you did (byte/word/integer/int64/string/interface/variant) and keep those bigger ordinals (THash128/TGUID) to use regular (bloated) generics code on Delphi (they work with no problem on FPC 3.2). Yes, I get a glimpse of what you endure, and I thank you very much for having found such nice tricks in Spring4D. The interface + folded types pattern is awesome. 😄
  22. Is it only me, or do awful and undocumented problems like " F2084 Internal Error: AV004513DE-R00024F47-0 " occur when working on Delphi with generics? @Stefan Glienke How did you manage to circumvent the Delphi compiler limitations? On XE7 for instance, as soon as I use intrinsics I encounter those problems - and the worst is that they are random: sometimes the compilation succeeds, but other times there is the Internal Error, sometimes on Win32, sometimes on Win64... 😞 I got rid of as much "inlined" code as possible, since I discovered generics do not like calling "inline" code. In comparison, FPC seems much more stable. The Lazarus IDE is somewhat lost within generics code (code navigation is not effective) - but at least, it doesn't crash and you can work as you expect. FPC 3.2 generics support is somewhat mature: I can see by disassembling the .o that intrinsics are properly handled with this compiler. Which is good news, since I rely heavily on it for folding base generics specialization using interface, as you did with Spring4D.
  23. Arnaud Bouchez

    64 bit compiler problem

    My guess is that the default data alignment may have changed between Delphi 10.2 and 10.3, so the static arrays don't have the same size. You may try to use "packed" for all the internal structures of the array.
  24. @Stefan Glienke If you can, please take a look at a new mORMot 2 unit: https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.collections.pas In respect to generics.collections, this unit uses interfaces as variable holders, and leverages them to reduce the generated code as much as possible, as the Spring4D 2.0 framework does, but for both Delphi and FPC. Most of the unit is in fact embedding some core collection types into mormot.core.collections.dcu to reduce the user units and executable size for Delphi XE7+ and FPC 3.2+. Thanks a lot for the ideas! It also publishes TDynArray and TSynDictionary high-level features like JSON/binary serialization or thread safety with Generics strong typing. More TDynArray features (like sorting and search) and also TDynArrayHasher features (an optional hash table included in ISynList<T>) are coming.
  25. Arnaud Bouchez

    64 bit compiler problem

    The stack is always "unloaded" when the method returns. That is a fact for sure. There is a "mov rbp, rsp" in the function prolog, and a reversed "mov rsp, rbp" in the function epilog. Nice and easy. Look at the stack trace in the debugger when you reach the stack overflow problem. You will find out the exact context. And switch to a dynamic array, using a proper copy() if you want to work on a local copy.