Jump to content

Arnaud Bouchez

  • Content Count

  • Joined

  • Last visited

  • Days Won


Arnaud Bouchez last won the day on June 3

Arnaud Bouchez had the most liked content!

Community Reputation

278 Excellent


Recent Profile Visitors

The recent visitors block is disabled and is not being shown to other users.

  1. ShortString have a big advantage: they can be allocated on the stack. So they are perfect for small ASCII text, e.g. numbers to text conversion, or when logging information in plain English. Using str() over a shortstring is for instance faster than using IntToString() and a temporary string (or AnsiString) on a multi-thread program, because the heap is not involved. Of course, we could use a static array of AnsiChar, but ShortString have an advantage because they have their length encoded so they are safer and faster than #0 terminated strings. So on mobile platform, we could end up by creating a new record type, re-inventing the wheel whereas the ShortString type is in fact still supported and generated by the compiler, and even used by the RTL at its lowest system level. ShortString have been deprecated... and hidden. They could even be restored/unhidden by some tricks like https://www.idefixpack.de/blog/2016/05/system-bytestrings-for-10-1-berlin Why? Because some people at Embarcadero thought it was confusing, and that the language should be "cleaned up" - I translate by "more C# / Java like", with a single string type. This was the very same reason they did hide RawByteString and AnsiString... More a marketing strategy than a technical decision IMHO. I prefer the FPC more "conservative" way of preserving backward compatibility. It is worth noting that the FPC compiler source code itself uses a lot of shortstring internally, so it would never be deprecated on FPC for sure. 😉
  2. Arnaud Bouchez

    About TGUID type...

    Using a local TGUID constant seems the best solution. It is clean and fast (no string/hex conversion involved, just copy the TGUID record bytes). No need to make anything else, or ask for a feature request I guess, because it would be purely cosmetic.
  3. Arnaud Bouchez

    Why compiler allows this difference in declaration?

    IIRC it was almost mandatory to work with Ole Automation and Word/Excel. Named parameters are the usual way of calling Word/Excel Ole API, from Office macros or Visual Basic. So Delphi did have to support this too. And it is easy to implement named parameters with OLE. Because in its ABI, parameters are... indeed named. Whereas for regular native code function ABI, parameters are not named, but passed in order on registers or the stack. So implementing named parameters in Delphi would have been feasible, but would require more magic, especially for the "unnamed" parameter. It was requested several times in the past, especially from people coming from Python background. Named parameters can make calls more explicit. So the question is more for Embarcadero people. 😉 You can emulate this using a record or a class to pass the values, but it is not as elegant.
  4. Yes, I have seen the handcrafted IMT. But I am not convinced the "sub rcx, xxx; jmp xxx" code block makes a noticeable performance penalty - perhaps only on irrelevant micro benchmarks. The CPU lock involved in calling a Delphi virtual method through an interface has a cost for sure https://www.idefixpack.de/blog/2016/05/whats-wrong-with-virtual-methods-called-through-an-interface - but not a sub + jmp. Also the enumerator instance included into the list itself seems a premature optimization to me if we want to be cross-platform as we want. Calling GetThreadID on a non Windows platform has a true cost if you call the pthread library. And the resulting code complexity makes me wonder if it is really worth it in practice. Better switch to a better memory manager, or just use a record and rely on inlining + stack allocation of an unmanaged enumerator. Since you implemented both of those optimizations: OK, just continue to use them. But I won't go that way with mORMot. Whereas I still find your interface + pre-compiled folded classes an effective and nice trick (even if I just disable byte and word stubs for dictionaries: if you have a dictionary, the hash table will be bigger than the size of the byte/word data itself - so just use integers instead of byte/word for dictionaries in end-user code; but I still have byte/word specializations for IList<> which does not have any hash by default, but may on demand).
  5. Nice timings. Yes, you are right, in mORMot we only use basic iteration to be used in "for ... in .. do" statement with no further composition and flexibility as available with IEnumerable<T>. The difference for small loops is huge (almost 10 times) and for big loops is still relevant (2 times) when records are used. I guess mORMot pointer-based records could be slightly faster than RTL index-based values, especially when managed types are involved. In practice, I find "for .. in .. do" to be the main place for iterations. So to my understanding, records are the way to go for mORMot. Then nothing prevents another method returning complex and fluid IEnumerable<T>. We just don't go that way in mORMot yet.
  6. I discovered that using a record as TEnumerator makes the code faster, and allow some kind of inlining, with no memory allocation. My TSynEnumerator<T> record only uses 3 pointers on the stack with no temporary allocation. I prefer using pointers here to avoid any temporary storage e.g. for managed types. And the GetCurrent and MoveNext functions have no branch and inline very aggressively. And it works with Delphi 2010! 😉 Record here sounds like a much more efficient idea than a class + interface, as @Stefan Glienke did in Spring4D. Stefan, have you any reason to prefer interfaces also for the enumerator instead of good old record? From my finding, performance is better with a record, especially thanks to inlining - which is perfect on FPC, but Delphi still calls MoveNext and don't inline it. It also avoid a try...finally for most simple functions, nor any heap allocation. Please check https://github.com/synopse/mORMot2/commit/17b7a2753bb54057ad0b6d03bd757e370d80dce2
  7. I just found out that Delphi has big troubles with static arrays like THash128 = array[0..15] of byte. If I remove those types, then the Delphi compiler does emit "F2084 Internal Error" any more... So I will stick with the same kind of types as you did (byte/word/integer/int64/string/interface/variant) and keep those bigger ordinals (THash128/TGUID) to use regular (bloated) generics code on Delphi (they work with no problem on FPC 3.2). Yes, I get a glimpse of what you endure, and I thank you much for having found such nice tricks in Spring4D. The interface + folded types pattern is awesome. 😄
  8. Is it only me or awful and undocumented problems like " F2084 Internal Error: AV004513DE-R00024F47-0 " occur when working on Delphi with generics? @Stefan Glienke How did you manage to circumvent the Delphi compiler limitations? On XE7 for instance, as soon as I use intrinsics I encounter those problems - and the worse is that they are random: sometimes, the compilation succeed, but other times there is the Internal Error, sometimes on Win32 sometimes on Win64... 😞 I get rid of as many "inlined" code as possible, since I discovered generics do not like calling "inline" code. In comparison, FPC seems much more stable. The Lazarus IDE is somewhat lost within generics code (code navigation is not effective) - but at least, it doesn't crash and you can work as you expect. FPC 3.2 generics support is somewhat mature: I can see by disassembling the .o that intrinsics are properly handled with this compiler. Which is good news, since I rely heavily on it for folding base generics specialization using interface, as you did with Spring4D.
  9. Arnaud Bouchez

    64 bit compiler problem

    My guess is that the default data alignment may have changed between Delphi 10.2 and 10.3, so the static arrays don't have the same size. You may try to use "packed" for all the internal structures of the array.
  10. @Stefan Glienke If you can, please take a look at a new mORMot 2 unit: https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.collections.pas In respect to generics.collections, this unit uses interfaces as variable holders, and leverage them to reduce the generated code as much as possible, as the Spring4D 2.0 framework does, but for both Delphi and FPC. Most of the unit is in fact embedding some core collection types to mormot.core.collections.dcu to reduce the user units and executable size for Delphi XE7+ and FPC 3.2+. Thanks a lot for the ideas! It also publishes TDynArray and TSynDictionary high-level features like JSON/binary serialization or thread safety with Generics strong typing. More TDynArray features (like sorting and search) and also TDynArrayHasher features (an optional hash table included in ISynList<T>) are coming.
  11. Arnaud Bouchez

    64 bit compiler problem

    The stack is always "unloaded" when the method returns. That is a fact for sure. There is a "mov rbp, rsp" in the function prolog, and a reversed "mov rsp, rbp" in the function epilog. Nice and easy. Look at the stack trace in the debugger when you reach the stack overflow problem. You will find out the exact context. And switch to a dynamic array. Using proper copy() if you want to work on a local copy.
  12. Arnaud Bouchez

    64 bit compiler problem

    @Kas Ob. Win32 code is fine, it fills the stack variables with zeros, and my guess is that the size fits the TProgrammaCalcio size. The problem is really a stack overflow, not infinite recursion. The compiler has no bug here. User code is faulty. @marcocir Did you read my answer? There is no unexpected recursion involved. The problem is that this TProgrammaCalcio is too huge to fit on the stack. On Win64, this is what mov [rsp+rax],al does: it reserves some space for the stack, by increasing the stack pointer by 4KB chunks (the memory page size), and writing one single byte to it to trigger a page fault and a stack overflow if there is no stack place available. And you are using too much of it with your local variable. On Win32, it doesn't reserve the memory by 4KB chunks, but it fills the stack with zeros - which is slower, but does exactly the same. What does SizeOf(TProgrammaCalcio) return? Is it really a static array? You should switch to a dynamic array instead. A good practice is not to allocate more than 64KB on the stack, especially if there are some internal calls to other functions. In some execution context (e.g. a dll hosted by IIS), the per-thread stack size could be as little as 128KB.
  13. TDynArrayHashed is a wrapper around an existing dynamic array. It doesn't hold any data but works with TArray<T>. You could even maintain several hashes for several fields of the array item at the same time. TSynDictionary is a class holding two dynamic arrays defined at runtime (one for the keys, one for the values). So it is closer to the regular collections. But it has additional features (like thread safety, JSON or binary serialization, deprecation after timeout, embedded data search...), which could make the comparison unfair: it does more than lookups. The default syntax is using old fashioned TypeInfo() but you could wrap it into generics syntax instead, using TypeInfo(T). So I guess the comparison is fair. And the executable size would not be bloated because there is no generic involved for the actual implementation of the TSynDictionary class in mORMot. Removing generic/parametrized code as soon as possible is what every library is trying to do, both the Delphi RTL and Delphi Spring. The RTL and Delphi String were rewritten several times to keep the generic syntax as high as possible, and try to leverage compiler magic as much as possible to avoid duplicated or dead code. This is what String4D 2.0 is all about. mORMot just allows to avoid duplicated or dead code regeneration completely.
  14. Arnaud Bouchez

    64 bit compiler problem

    The Win64 assembly is not inefficient. It is almost the same code as the Win32 version. And my guess is that Win64 code would be faster. The on-stack TPanel is initialized with explicit zero fill of all bytes in Win32 in a loop, whereas it is with zero fill only of some managed fields within TPanel in Win64, if I understand well the generated asm. Then the RTL calls are comparable. So at execution, my guess is that the Win64 version is likely to be faster. So the compiler is not faulty. On the other hand, your code could be enhanced. One simple optimization would be to write "const Programma: TProgrammaCalcio;" for your program parameter to avoid a RTL array copy call (stack allocation + movsd/movsq + AddRefArray). Using "const" for managed types (string, arrays, records) is always a good idea. Actual stack variable use may be the real cause of the Stack Overflow problem. My guess is that you are abusing of huge variable on stack. Each calls seems to consume $8cb08 bytes on the stack. Much too big! So to fix it: 0) read more about how records and static arrays are used by Delphi, especially about memory allocation and parameter passing 1) use const to avoid a temporary copy on the local stack 2) don't use fixed size arrays for TProgrammaCalcio but a dynamic array allocated on heap 3) TPanel may also be somewhat big - consider using classes and not records to use the heap instead of the stack 4) If you are sure that heap allocation is a bottleneck (I doubt it here), you may create a temporary TPanel within the main TfrmFixedCalcio instance and reuse it Before wondering about the code generated by the Delphi compiler, please review your own code, which is likely to be the real bottleneck for performance. And in such a method, the FMX process and the GUI computation is likely to be the real bottleneck, not your Delphi code, and especially not the prolog/epilog generated by the Delphi compiler. "Premature Optimization is the root of all evil"! (DK)
  15. Nice reading. If you make some performance comparisons with other libraries, ensure you use the latest mORMot 2 source code - not mORMot 1, but mORMot 2. Some rewrite and optimization have been included in latest mORMot 2. And consider using TDynArrayHashed instead of TSynDictionary if you want a fair comparison - TSynDictionary has a lock to be thread-safe, and copy the retrieved data at lookup so it is not fair comparison with regular mapping classes.