Stefan Glienke

Members
  • Content Count: 1428
  • Days Won: 141

Everything posted by Stefan Glienke

  1. Stefan Glienke

    Out parameter is read before set

    Out parameters are just bad var parameters - just saying.
  2. Delphi developers usually associate constructors with memory allocation - and thus think it is wrong for a value type to have one. C++ has struct ctors and nobody (that I know) thinks they are wrong - because usually, C++ developers don't immediately think of memory allocation when they see a ctor.
  3. In fact, it's a regression that happened just recently, sometime in 10.4.x: https://quality.embarcadero.com/browse/RSP-33455
  4. Looking at asm without $O+ is pointless. Here is the code from both options with $O+:

        RecordCtorVsClassFunc.dpr.26: begin
        00408F94 51               push ecx
        RecordCtorVsClassFunc.dpr.27: LRecCon := TRecConstructor.Create( 42 );
        00408F95 8BC4             mov eax,esp
        00408F97 BA2A000000       mov edx,$0000002a
        00408F9C E813000000       call TRecConstructor.Create
        RecordCtorVsClassFunc.dpr.28: end;
        00408FA1 5A               pop edx
        00408FA2 C3               ret
        00408FA3 90               nop
        RecordCtorVsClassFunc.dpr.33: begin
        00408FA4 51               push ecx
        RecordCtorVsClassFunc.dpr.34: LRecCls := TRecClassFunc.Create( 84 );
        00408FA5 B854000000       mov eax,$00000054
        00408FAA E809000000       call TRecClassFunc.Create
        00408FAF 890424           mov [esp],eax
        RecordCtorVsClassFunc.dpr.35: end;
        00408FB2 5A               pop edx
        00408FB3 C3               ret

    Because you have a record here that fits into a register and does not have any managed type fields, it will simply be returned via eax when using a function. When using a ctor, it passes its address via eax.

    Now let's add a second Integer field to both records and look again:

        RecordCtorVsClassFunc.dpr.26: begin
        00408F94 83C4F8           add esp,-$08
        RecordCtorVsClassFunc.dpr.27: LRecCon := TRecConstructor.Create( 42 );
        00408F97 8BC4             mov eax,esp
        00408F99 BA2A000000       mov edx,$0000002a
        00408F9E E819000000       call TRecConstructor.Create
        RecordCtorVsClassFunc.dpr.28: end;
        00408FA3 59               pop ecx
        00408FA4 5A               pop edx
        00408FA5 C3               ret
        00408FA6 8BC0             mov eax,eax
        RecordCtorVsClassFunc.dpr.33: begin
        00408FA8 83C4F8           add esp,-$08
        RecordCtorVsClassFunc.dpr.34: LRecCls := TRecClassFunc.Create( 84 );
        00408FAB 8BD4             mov edx,esp
        00408FAD B854000000       mov eax,$00000054
        00408FB2 E809000000       call TRecClassFunc.Create
        RecordCtorVsClassFunc.dpr.35: end;
        00408FB7 59               pop ecx
        00408FB8 5A               pop edx
        00408FB9 C3               ret

    Whooop, no difference.

    Now let's add another field - of type string - I stripped the prologue and epilogue from the asm shown to reduce the noise:

        RecordCtorVsClassFunc.dpr.28: begin
        ...
        RecordCtorVsClassFunc.dpr.29: LRecCon := TRecConstructor.Create( 42 );
        004099B1 8D45E8           lea eax,[ebp-$18]
        004099B4 BA2A000000       mov edx,$0000002a
        004099B9 E89A000000       call TRecConstructor.Create
        004099BE 8D55E8           lea edx,[ebp-$18]
        004099C1 8D45F4           lea eax,[ebp-$0c]
        004099C4 8B0DD0984000     mov ecx,[$004098d0]
        004099CA E871D2FFFF       call @CopyRecord
        RecordCtorVsClassFunc.dpr.30: end;
        ...
        RecordCtorVsClassFunc.dpr.35: begin
        ...
        RecordCtorVsClassFunc.dpr.36: LRecCls := TRecClassFunc.Create( 84 );
        00409A22 8D55F4           lea edx,[ebp-$0c]
        00409A25 B854000000       mov eax,$00000054
        00409A2A E82D000000       call TRecClassFunc.Create
        RecordCtorVsClassFunc.dpr.37: end;
        ...

    Now we see a missing optimization when using the ctor as opposed to the function - as you might know, the result of a function returning a managed type (such as a record with at least one field of a managed type such as string) is actually passed as a var param (last parameter after all others, thus edx here). The ctor code however uses an unnecessary temp copy.
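For readers who want to reproduce the listings, here is a rough reconstruction of the kind of program being disassembled. Only the type names TRecConstructor and TRecClassFunc, the variables LRecCon and LRecCls, and the arguments 42 and 84 appear in the post; the field name, the procedure names and the Writeln calls are assumptions made for illustration, so the generated code will not match the addresses above exactly.

```pascal
program RecordCtorVsClassFunc;

{$APPTYPE CONSOLE}
{$O+} // optimization on, as the post insists

type
  // Variant 1: record initialized through a constructor
  TRecConstructor = record
    Value: Integer; // hypothetical field name
    constructor Create(AValue: Integer);
  end;

  // Variant 2: record produced by a static class function
  TRecClassFunc = record
    Value: Integer; // hypothetical field name
    class function Create(AValue: Integer): TRecClassFunc; static;
  end;

constructor TRecConstructor.Create(AValue: Integer);
begin
  Value := AValue;
end;

class function TRecClassFunc.Create(AValue: Integer): TRecClassFunc;
begin
  Result.Value := AValue;
end;

procedure TestCtor; // hypothetical name
var
  LRecCon: TRecConstructor;
begin
  LRecCon := TRecConstructor.Create(42);
  Writeln(LRecCon.Value); // use the value so the assignment is not dead code
end;

procedure TestClassFunc; // hypothetical name
var
  LRecCls: TRecClassFunc;
begin
  LRecCls := TRecClassFunc.Create(84);
  Writeln(LRecCls.Value);
end;

begin
  TestCtor;
  TestClassFunc;
end.
```

To follow the later steps of the post, add a second Integer field and then a string field to both records and compare the CPU view for the two assignments.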
  5. Stefan Glienke

    Spring4D TPair removed

    There is no new released master - there was one single commit for an inc file to fix a compiler issue on OSX64
  6. Stefan Glienke

    Spring4D TPair removed

    1.2 never had TPair - it got introduced in develop and will be in 2.0
  7. Stefan Glienke

    Delphi 10.4.2 always recompiling in IDE

    IIRC there was (maybe still is?) an issue which causes relinking when error insight is enabled - I think it was among the many things that IDEFixPack patched.
  8. Certainly better than the RTL - however if you want it really fast and can justify implementing it in asm then SSE should be used. For reference: https://www.strchr.com/strcmp_and_strlen_using_sse_4.2 Also, people might not want to use your code because of GPLv3 unless they put it into an extra DLL.
  9. Any branching (if, case) is expensive - especially when it cannot be predicted because you have a random pattern. Here is how it's done fast:

        procedure CountAllItems4(const aDocuments: TDocuments; var counts: array of Integer);
        var
          i: integer;
        begin
          counts[0] := 0;
          counts[1] := 0;
          counts[2] := 0;
          for i := Low(aDocuments) to High(aDocuments) do
            Inc(counts[Ord(aDocuments[i].DocType)]);
        end;

        var
          counts: array[TDocType] of Integer;
        begin
          ...
          CountAllItems4(vDocuments, counts);
          q := counts[dtQuote];
          o := counts[dtOrder];
          i := counts[dtInvoice];
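A self-contained sketch of the same idea, set next to a case-based version for contrast. TDocType, its three values, TDocuments and the DocType field come from the snippet in the post; the TDocument record layout, the procedure names and the random test data are assumptions, and the program only checks that both versions agree, it does not measure anything.

```pascal
program CountByIndexSketch;

{$APPTYPE CONSOLE}

type
  TDocType = (dtQuote, dtOrder, dtInvoice);
  TDocument = record // hypothetical: only the field the post uses
    DocType: TDocType;
  end;
  TDocuments = array of TDocument;

// Branching version (hypothetical, for contrast): one conditional jump
// chain per element, hard to predict when the types arrive in random order.
procedure CountWithCase(const aDocuments: TDocuments; var q, o, inv: Integer);
var
  i: Integer;
begin
  q := 0;
  o := 0;
  inv := 0;
  for i := Low(aDocuments) to High(aDocuments) do
    case aDocuments[i].DocType of
      dtQuote: Inc(q);
      dtOrder: Inc(o);
      dtInvoice: Inc(inv);
    end;
end;

// Table version as in the post: the enum ordinal is the index,
// so the loop body contains no data-dependent branch.
procedure CountByTable(const aDocuments: TDocuments; var counts: array of Integer);
var
  i: Integer;
begin
  for i := Low(counts) to High(counts) do
    counts[i] := 0;
  for i := Low(aDocuments) to High(aDocuments) do
    Inc(counts[Ord(aDocuments[i].DocType)]);
end;

procedure Main;
var
  docs: TDocuments;
  counts: array[TDocType] of Integer;
  i, q, o, inv: Integer;
begin
  SetLength(docs, 1000);
  for i := 0 to High(docs) do
    docs[i].DocType := TDocType(Random(3)); // random, unpredictable pattern

  CountWithCase(docs, q, o, inv);
  CountByTable(docs, counts);

  Assert(counts[dtQuote] = q);
  Assert(counts[dtOrder] = o);
  Assert(counts[dtInvoice] = inv);
  Assert(q + o + inv = Length(docs));
  Writeln('both versions agree');
end;

begin
  Main;
end.
```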
  10. We can certainly argue over the term "big" - but performance is either CPU or memory bound - and with hash table items being scattered all over the heap it most certainly will be memory bound. Hash tables are complex beasts and there are many considerations when designing one but usually, you want to avoid blasting your items all over the heap. Interesting read: https://leventov.medium.com/hash-table-tradeoffs-cpu-memory-and-variability-22dc944e6b9a
  11. Stefan Glienke

    JCL installation problems in D10.4.2

    That implies that there is nothing more than some code to compile - fwiw this does not even add the dcu output directory to the library path (which is my biggest issue with that "process") let alone all the other stuff the jcl/jvcl installer might do (I am no jedi dev, so I don't know how much of that is actually necessary).
  12. Stefan Glienke

    JCL installation problems in D10.4.2

    What process is that?
  13. Did you just ask if O(log n) is faster than O(n)?
  14. Stefan Glienke

    recompiling delphi source for Delphi Sydney

    Generics as well but that is related to inlining because generics use a similar mechanism as inlining does. That's why we had some breaking changes in some updates in the past because some devs at Embarcadero obviously forgot about that and fixed some issues in Generics.Collections. I think there are more reasons for F2051 to happen, but to my knowledge the ones I mentioned are those that cannot be solved by switching around compiler options.

    To be honest I still do not fully understand how inlining is being controlled exactly. For example when I write a for in loop over a TCollection in my code and add System.Classes to my project causing a recompile I see that it does not fully inline but does a call to TList<TCollectionItem>.GetItems which is marked as inline. When I use the dcu shipped with Delphi that call is not there. Changing the $INLINE option does not change anything about that.

    Example code - when looking at the asm you will see the call I mentioned before. When the path to the pas file is removed and it uses the dcu the call is not there. I did not find a compiler option that makes it go away when compiling the pas file.

        uses
          System.Classes in 'c:\program files (x86)\embarcadero\studio\21.0\source\rtl\common\System.Classes.pas';

        procedure Main;
        var
          list: TCollection;
          item: TCollectionItem;
        begin
          list := TCollection.Create(TCollectionItem);
          list.Add;
          for item in list do;
        end;

        begin
          Main;
        end.
  15. Stefan Glienke

    recompiling delphi source for Delphi Sydney

    This has nothing to do with the compiler options but with the inlining - in Vcl.ExtCtrls.TCustomGridPanel.Loaded there is a for in loop over a TCollection and the GetCurrent implementation in 10.4 is different from what it was in 10.0. To my knowledge there is no way to compile individual units that have methods that are inlined when being used in other units - so you have to compile the other units as well - in this case you also have to add Vcl.ExtCtrls to your project and recompile that as well, not because you changed anything in that unit but because of the inlined method. This can pile up quite significantly, making it necessary to recompile almost the entire codebase depending on which unit you modified and recompiled.
  16. Neither - all hash tables that have a say when it comes to performance are using some contiguous block of memory (aka array/vector). Otherwise, any possible cache locality is just completely destroyed.
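To make "contiguous block of memory" concrete, here is a minimal open-addressing sketch with linear probing, where all slots live in one dynamic array and a collision is resolved by walking to the neighbouring slot. Everything in it (TIntMap, TSlot, the trivial masking hash, the fixed capacity without resizing or deletion) is a simplification chosen for illustration and not the layout of any particular library.

```pascal
program LinearProbingSketch;

{$APPTYPE CONSOLE}

type
  TSlot = record
    Used: Boolean;
    Key: Integer;
    Value: Integer;
  end;

  // Hypothetical fixed-capacity map from Integer to Integer.
  TIntMap = record
  private
    FSlots: TArray<TSlot>; // one contiguous block holding keys and values
    FCount: Integer;
    function IndexOf(AKey: Integer): Integer;
  public
    procedure Init(ACapacity: Integer); // capacity must be a power of two
    procedure Add(AKey, AValue: Integer);
    function TryGet(AKey: Integer; var AValue: Integer): Boolean;
  end;

procedure TIntMap.Init(ACapacity: Integer);
begin
  Assert((ACapacity > 1) and ((ACapacity and (ACapacity - 1)) = 0));
  FSlots := nil;
  SetLength(FSlots, ACapacity); // new slots are zero-initialized (Used = False)
  FCount := 0;
end;

// Returns the slot holding AKey, or the first empty slot of its probe sequence.
// Terminates because Add always keeps at least one slot empty.
function TIntMap.IndexOf(AKey: Integer): Integer;
var
  mask: Integer;
begin
  mask := Length(FSlots) - 1;
  Result := AKey and mask; // trivial hash, an assumption for brevity
  while FSlots[Result].Used and (FSlots[Result].Key <> AKey) do
    Result := (Result + 1) and mask; // probe the adjacent slot
end;

procedure TIntMap.Add(AKey, AValue: Integer);
var
  i: Integer;
begin
  i := IndexOf(AKey);
  if not FSlots[i].Used then
  begin
    Assert(FCount < Length(FSlots) - 1, 'table full (no resizing in this sketch)');
    FSlots[i].Used := True;
    FSlots[i].Key := AKey;
    Inc(FCount);
  end;
  FSlots[i].Value := AValue; // insert or overwrite
end;

function TIntMap.TryGet(AKey: Integer; var AValue: Integer): Boolean;
var
  i: Integer;
begin
  i := IndexOf(AKey);
  Result := FSlots[i].Used;
  if Result then
    AValue := FSlots[i].Value;
end;

procedure Main;
var
  map: TIntMap;
  v: Integer;
begin
  map.Init(16);
  map.Add(1, 100);
  map.Add(17, 200); // 17 and 15 = 1: collides with key 1, lands in the next slot
  map.Add(-5, 300);
  Assert(map.TryGet(1, v) and (v = 100));
  Assert(map.TryGet(17, v) and (v = 200));
  Assert(map.TryGet(-5, v) and (v = 300));
  Assert(not map.TryGet(33, v));
  Writeln('ok');
end;

begin
  Main;
end.
```

A colliding key ends up right next to its home slot, so a lookup usually touches one or two adjacent cache lines instead of chasing pointers to separately allocated nodes.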
  17. Stefan Glienke

    Micro optimization: Math.InRange

    Simple: AValue-AMin causes an overflow when AMin is bigger than AValue. Your code is shifting the non-zero-based range to a zero-based range causing AValue to possibly become <0. In fact, running the very benchmark code Mike posted will cause it!
  18. Stefan Glienke

    Micro optimization: Math.InRange

    And even that depending on the CPU generation can possibly be detected by the branch predictor because it's a fixed pattern. You need a random sequence of numbers that you check against. However, keep in mind what exact use case you are benchmarking vs what you actually use this function for. Is it random data that needs to be checked for in range? If you have a loop where at some point the counter is in range and at some point runs out of range then the loop counter itself should be limited to just run over the numbers that are in range.
  19. Stefan Glienke

    TArray<T> helper

    GNU License 🤦‍♂️ Some of the methods in that helper are obsolete since XE7 because we have Insert, Add, Delete for dynamic arrays. Most of the other methods are in Spring.pas TArray which is not a helper for the System.Generics.Collections one but reimplements its methods and adds its own. For the TArrayRecord<T> type Spring.pas has Vector<T> (the naming is taken from C++ where this is the dynamic array type name)
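A short sketch of the built-in dynamic array operations available since XE7. Appending is shown with the + operator, which is my assumption about what "Add" refers to above; Insert and Delete are the System intrinsics.

```pascal
program DynArrayIntrinsics;

{$APPTYPE CONSOLE}

procedure Main;
var
  a: TArray<Integer>;
begin
  a := [1, 2, 3];      // array constructor expression
  a := a + [4, 5];     // concatenation: 1, 2, 3, 4, 5
  Insert([0], a, 0);   // insert at index 0: 0, 1, 2, 3, 4, 5
  Delete(a, 1, 2);     // remove 2 elements starting at index 1: 0, 3, 4, 5

  Assert(Length(a) = 4);
  Assert((a[0] = 0) and (a[1] = 3) and (a[3] = 5));
  Writeln('ok');
end;

begin
  Main;
end.
```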
  20. Stefan Glienke

    Micro optimization: Math.InRange

    Indeed here it's the opposite, due to the way it's written it always has to perform both checks (see my third point). But depending on what you do after the if it could be done completely branchless.
  21. Stefan Glienke

    Micro optimization: Math.InRange

    While your assessment of Math.InRange is true (it is coded in a bad way plus the compiler produces way too many conditional jumps - https://quality.embarcadero.com/browse/RSP-21955) you certainly need to read some material on how to properly microbenchmark and how to read assembly. First of all, even though the Delphi compiler is pretty terrible at optimizing away dead code, it might omit the if statement if there is nothing to do after it. Second - be careful if one of your loops spans multiple cache lines while others don't: this usually affects the outcome only slightly, but in a case like this it can change the result noticeably. Third - with a static test like this you prove nothing - the branch predictor will do its job. If you want to benchmark the raw performance of one vs the other you need to give it random data which does not follow the "not in range for a while, in range for a while, not in range until the end" pattern.
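A minimal harness along those lines, as a sketch rather than a rigorous benchmark: the input is random so the branch predictor cannot learn a pattern, the seed is fixed so every candidate sees the same data, and the hit counter is printed so the if statement has an observable effect. The element count, value range and bounds are arbitrary assumptions; the code alignment issue from the second point is not addressed here.

```pascal
program InRangeBenchSketch;

{$APPTYPE CONSOLE}

uses
  System.Diagnostics, System.Math;

const
  N = 10000000; // arbitrary element count

procedure Main;
var
  data: TArray<Integer>;
  i, hits: Integer;
  sw: TStopwatch;
begin
  RandSeed := 42; // fixed seed: reproducible, identical data for each candidate
  SetLength(data, N);
  for i := 0 to High(data) do
    data[i] := Random(1000); // values 0..999 in no predictable order

  hits := 0;
  sw := TStopwatch.StartNew;
  for i := 0 to High(data) do
    if InRange(data[i], 250, 750) then
      Inc(hits); // real work after the if, so it cannot be dropped
  sw.Stop;

  Writeln('hits: ', hits, ' in ', sw.ElapsedMilliseconds, ' ms');
end;

begin
  Main;
end.
```

To compare an alternative implementation, replace the InRange call with the candidate, keep the data generation untouched, and run each variant several times.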
  22. An impressive number of implemented collections for sure but it's only compatible with FreePascal and from a quick look you won't easily make that code Delphi compatible. But thanks for mentioning it - I will certainly run some benchmarks to compare.
  23. @Dany Marmur Agreed - it's the job of runtime library developers to get the most out of their data structures and algorithms so the users of those libraries don't have to worry 99.9% of the time but just choose the reasonable type/algo for the job. Funny enough I am currently in the process of doing exactly that for my library and can tell you that for the RTL it's certainly not the case, unfortunately.
  24. Stefan Glienke

    Is set a nullable type? (record constraint)

    No version ever compiled this code - I just checked with XE8 to 10.4 (all with the latest updates/hotfixes). Reported since 10.0.1 - see https://quality.embarcadero.com/browse/RSP-13198