Everything posted by Stefan Glienke

  1. Stefan Glienke

    Having fun with Delphi

    Fluent API with records containing managed fields (such as strings) unfortunately generates terrible code, because the compiler creates an implicit variable for each method call.
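    A minimal sketch (hypothetical type, not from the original post) of the kind of fluent record API meant here - every chained call returns a record copy, and for each call the compiler emits a hidden temporary that has to be initialized and finalized because of the managed string field:

        type
          TQueryBuilder = record
          private
            FSql: string;
          public
            function Select(const columns: string): TQueryBuilder;
            function From(const table: string): TQueryBuilder;
            function Build: string;
          end;

        function TQueryBuilder.Select(const columns: string): TQueryBuilder;
        begin
          Result.FSql := 'SELECT ' + columns;
        end;

        function TQueryBuilder.From(const table: string): TQueryBuilder;
        begin
          Result.FSql := FSql + ' FROM ' + table;
        end;

        function TQueryBuilder.Build: string;
        begin
          Result := FSql;
        end;

        // Each chained call below gets its own compiler-generated temporary
        // TQueryBuilder holding a managed string:
        // sql := qb.Select('*').From('customers').Build;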
  2. I really lol'ed at that one - Stockholm syndrome, anyone?
  3. The description is in the paper, and the reference implementation in C is in the links on that very page. Be careful though: there are both xoroshiro and xoshiro, in different variations.
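    As an illustration only, here is a sketch of xoshiro256** transcribed from my reading of the public C reference implementation - verify against the original before relying on it. It assumes overflow checking is off ($Q-), since the multiplications are meant to wrap, and the state must be seeded to anything but all zeros (the authors suggest splitmix64 for seeding):

        type
          TXoshiro256ss = record
            s: array[0..3] of UInt64;
            function Next: UInt64;
          end;

        function RotL(x: UInt64; k: Integer): UInt64; inline;
        begin
          Result := (x shl k) or (x shr (64 - k));
        end;

        function TXoshiro256ss.Next: UInt64;
        var
          t: UInt64;
        begin
          Result := RotL(s[1] * 5, 7) * 9;
          t := s[1] shl 17;
          s[2] := s[2] xor s[0];
          s[3] := s[3] xor s[1];
          s[1] := s[1] xor s[2];
          s[0] := s[0] xor s[3];
          s[2] := s[2] xor t;
          s[3] := RotL(s[3], 45);
        end;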
  4. Take a look at @Primož Gabrijelčič's GpRandomGen.pas
  5. Stefan Glienke

    Does Filter Exceptions make Delphi steal focus

    Why are you running unit tests under the debugger if you are not actually interested in debugging? If there are any unexpected exceptions, the tests will be red; then you select those and start looking into the defects.
  6. Stefan Glienke

    Range checking in library code?

    See, my point is this - when a developer writes code like this:

        for i := 0 to list.Count - 1 do
          DoSomething(list[i]);

    it makes absolutely no sense to perform the range check on every item access. Yes, I know you could use the for-in loop to avoid that, but let's take this as just one example of "I have tested this code, I did not make an off-by-one error on the Count (like forgetting that -1), so let me just get the fastest possible code for my RELEASE build." Unfortunately, unless you recompile the RTL/VCL/FMX/you name it yourself, you are stuck with pointless sanity checks all over the place. No, I have not done any full application benchmark to actually measure the impact on the overall performance of all the code that uses RTL lists, but I think we can agree that even if the impact is minor, there is one.

    What I am talking about here is the public API of my code that users can interact with: I have no control over them passing bad values, hence I would like to perform sanity checks to make the code as robust as possible. But I also want the library code to be as fast as possible in a RELEASE configuration (I think this could also stir an argument on whether to turn range/overflow checking on or off then). I certainly don't want to add two levels of API here, but rather make it behave reasonably under DEBUG and RELEASE. I am using these two words for the "tell me when I made a mistake and give me specific information about it" case and the "I tested all the code and don't have out-of-range cases, don't pollute my code with useless sanity checks" case.
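    A minimal sketch of that DEBUG/RELEASE behaviour, using a hypothetical list type (not from the original post) and System.SysUtils for the exception class - the guard simply compiles away in a RELEASE build:

        type
          TMyList = class
          private
            FItems: array of Integer;
            FCount: Integer;
            function GetItem(index: Integer): Integer;
          public
            property Items[index: Integer]: Integer read GetItem; default;
          end;

        function TMyList.GetItem(index: Integer): Integer;
        begin
        {$IFDEF DEBUG}
          // tell me when I made a mistake, with specific information about it
          if Cardinal(index) >= Cardinal(FCount) then
            raise EArgumentOutOfRangeException.CreateFmt(
              'Index %d out of range (Count = %d)', [index, FCount]);
        {$ENDIF}
          // RELEASE: no sanity check, fastest possible item access
          Result := FItems[index];
        end;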
  7. Stefan Glienke

    Organizing enums

    And if you used tinyint you could even save 3 bytes per record 😉
  8. Stefan Glienke

    Delphi 10.4.1 and the IDE Fix Pack

    tbh, not having the IDE Fix Pack available is good in the long run - as long as it existed, nobody (or very few) actually cared to report issues and put some pressure on Embarcadero to address these things. Now that there is no solution available, the pressure on Embarcadero has risen, and I can confirm that they are working on it. Will they suddenly implement all the fixes and optimizations from Andreas? No, but better they do it slowly than relying on a third party, no matter how incredible.
  9. Disclaimer: this is microbenchmark territory! Which is why I wrote "sometimes" - but it can be better to rewrite this:

        if not precondition then
          raise_some_error;
        do_stuff;

    into this:

        if precondition then
          do_stuff
        else
          raise_some_error;

    Of course there are other factors - but if we are already talking about writing an index check as one cmp/jxx pair instead of two, we might as well talk about this one.
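    A brief sketch of that single-comparison index check (assuming index and count are Integers and System.SysUtils is in scope) - casting to Cardinal folds the two signed tests into one unsigned test, because a negative index wraps around to a huge unsigned value:

        // two comparisons, two conditional jumps
        if (index < 0) or (index >= count) then
          raise EArgumentOutOfRangeException.Create('index out of range');

        // one unsigned comparison covering both cases
        if Cardinal(index) >= Cardinal(count) then
          raise EArgumentOutOfRangeException.Create('index out of range');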
  10. Stefan Glienke

    Organizing enums

    If that compiled - and even if it did, it would not prevent [4] from being assigned to that set.
  11. Exactly what I have been doing recently during my refactoring - fun fact: sometimes you even want to rewrite that to avoid the conditional forward jump being taken.
  12. Stefan Glienke

    Organizing enums

    And here I thought enums and the ability to build sets of enums were among those features of Delphi that many other languages such as C++ don't have - where you instead have to and/or with damn bitmasks...
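    For illustration (a hypothetical enum, not from the original thread), a Delphi set of an enum gives you the bitmask semantics without any manual and/or:

        program SetDemo;

        type
          TPermission = (pRead, pWrite, pExecute);
          TPermissions = set of TPermission;

        var
          perms: TPermissions;
        begin
          perms := [pRead, pWrite];   // instead of flags := READ or WRITE
          Include(perms, pExecute);   // instead of flags := flags or EXECUTE
          if pWrite in perms then     // instead of (flags and WRITE) <> 0
            Writeln('writable');
        end.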
  13. Stefan Glienke

    Initialization of returned managed types

    Use FixInsight. It catches those.
  14. Stefan Glienke

    Organizing enums

    Give types a meaningful and obvious name. Then you don't have issues remembering them.
  15. Stefan Glienke

    Simple inlined function question

    Those numbers make me assume that you ran in Debug config (i.e. without $O+).
  16. Stefan Glienke

    Simple inlined function question

    Without going into a more detailed analysis for now (maybe that's a good topic for a future blog post), I would say that even though the older versions produced that extra temp variable and checked against it with the inlined code, the main reason why both loops take different durations is that they end up in one or two cache lines. We had the same situation a while ago in another thread when we measured different string handling routines: some performed better or worse, and then suddenly a small change shifted the result significantly, simply because the instructions emitted for the loop were located differently. Some reading on microbenchmarks: https://engineering.appfolio.com/appfolio-engineering/2019/1/7/microbenchmarks-vs-macrobenchmarks-ie-whats-a-microbenchmark
  17. Stefan Glienke

    Simple inlined function question

    Classic measuring issues. I ran the same code in 10.4.1, and while it produces the same asm code, the first loop ran slower for me (461 vs 232). I have seen this before, and I guess it's because the TStopwatch code is not yet in the cache for the first run - the same is true for the code being measured. That is why, to run good benchmarks, you either run both in their own binary rather than back to back in the same one - so that both are equally affected by being a cold run - or you simply run the benchmark once to warm up and then start measuring. There are more things to consider, but I won't go into detail here.

    Edit: compiled in 10.1, the second loop indeed runs slower for me as well.

        Unit1.pas.46: for i := 1 to loop do
        005CE6D2 8B1D54C95D00   mov ebx,[$005dc954]
        005CE6D8 85DB           test ebx,ebx
        005CE6DA 7E1A           jle $005ce6f6
        Unit1.pas.47: if (vSearchValue = '') and (vItemValue <> '') or (vItemValue = vSearchValue )
        005CE6DC 837DFC00       cmp dword ptr [ebp-$04],$00
        005CE6E0 7506           jnz $005ce6e8
        005CE6E2 837DF800       cmp dword ptr [ebp-$08],$00
        005CE6E6 750B           jnz $005ce6f3
        005CE6E8 8B45F8         mov eax,[ebp-$08]
        005CE6EB 8B55FC         mov edx,[ebp-$04]
        005CE6EE E8DDC2E3FF     call @UStrEqual
        Unit1.pas.46: for i := 1 to loop do
        005CE6F3 4B             dec ebx
        005CE6F4 75E6           jnz $005ce6dc

    vs

        Unit1.pas.53: for i := 1 to loop do
        005CE74B 8B1D54C95D00   mov ebx,[$005dc954]
        005CE751 85DB           test ebx,ebx
        005CE753 7E24           jle $005ce779
        Unit1.pas.54: if IsSearchByValueFound_Inlined(vSearchValue, vItemValue)
        005CE755 837DFC00       cmp dword ptr [ebp-$04],$00
        005CE759 7506           jnz $005ce761
        005CE75B 837DF800       cmp dword ptr [ebp-$08],$00
        005CE75F 7511           jnz $005ce772
        005CE761 8B45F8         mov eax,[ebp-$08]
        005CE764 8B55FC         mov edx,[ebp-$04]
        005CE767 E864C2E3FF     call @UStrEqual
        005CE76C 7404           jz $005ce772
        005CE76E 33C0           xor eax,eax
        005CE770 EB02           jmp $005ce774
        005CE772 B001           mov al,$01
        005CE774 84C0           test al,al
        Unit1.pas.53: for i := 1 to loop do
        005CE776 4B             dec ebx
        005CE777 75DC           jnz $005ce755

    So there is indeed a difference in the generated code which affects the performance, and that goes back to what I said before about the inliner not doing its best job: the compiler still generates that result variable, sets it to either True or False, and then checks it. The first loop is only faster because you don't do anything after the check.

    Edit: One more thing that is important when measuring stuff like this in a direct comparison - cache lines. In my case the second loop is always faster in 10.4 even though both have the same generated code, simply because the first loop spans two cache lines and the second one only one. That is something you cannot easily influence and should not bother with, but it needs to be kept in mind when writing measuring code like this.
  18. Stefan Glienke

    Simple inlined function question

    Unfortunately that is true for almost all code in Delphi - and it is the reason why there are so many "do I better write the code like this or like that" discussions: we constantly have to help the compiler by writing code in certain ways when we want to get the optimum. The inliner is not as effective as it could be - I would guess the reason is that the Delphi compiler is mostly a single-pass compiler, so it does not run another optimization step after inlining. That means there is often register or stack juggling left over after the inlining took place that would not have been there if the code had been written there directly. But again: measure and evaluate whether it matters. And be careful when measuring, because simply taking the two different pieces of code and timing them won't be enough.
  19. So instead of addressing those statements you bring up some completely irrelevant and wrong things about my library? Well... I am all for criticism if you find issues in its design or implementation, but that was just a low blow. 😉 Maybe I am missing something when setting up the list, or I am using the wrong one, but there is clearly a lack of handling for managed types in TsgList<T>, because it simply calls TslListHelper.SetItem, which uses ordinal assignments or System.Move. Here is some code that shows that something is wrong - shouldn't it print 0 to 9? But it prints an empty line and then raises an EInvalidPointer.

        const
          COUNT = 10;

        procedure RunSGL;
        var
          list: TsgList<string>;
          i: Integer;
          s: string;
        begin
          list.From(nil);
          for i := 0 to COUNT-1 do
          begin
            s := i.ToString;
            list.Add(s);
          end;
          s := '';
          for i := 0 to COUNT-1 do
          begin
            s := list[i];
            Writeln(s);
          end;
        end;

    P.S. You can edit your posts - no need for multiposting to address multiple previous comments.
  20. As I wrote, that was not an in-depth benchmark but just meant to get a rough idea, and it's apples and oranges anyway, ymmv.

        unit Unit1;

        interface

        procedure RunSGL;
        procedure RunSpring;
        procedure RunRTL;
        procedure RunArray;

        implementation

        uses
          Diagnostics, Generics.Collections, Spring.Collections, Oz.SGL.Collections;

        const
          COUNT = 10000000;

        procedure RunSGL;
        var
          list: TsgList<Integer>;
          sw: TStopwatch;
          i, n: Integer;
        begin
          list.From(nil);
          list.Count := COUNT;
          sw := TStopwatch.StartNew;
          for i := 0 to COUNT-1 do
            list[i] := i;
          Writeln('SGL SetItem ', sw.ElapsedMilliseconds);
          sw := TStopwatch.StartNew;
          for i := 0 to COUNT-1 do
          begin
            n := list[i];
            if n = -1 then Break;
          end;
          Writeln('SGL GetItem ', sw.ElapsedMilliseconds);
        end;

        procedure RunSpring;
        var
          list: IList<Integer>;
          sw: TStopwatch;
          i, n: Integer;
        begin
          list := TCollections.CreateList<Integer>;
          list.Count := COUNT;
          sw := TStopwatch.StartNew;
          for i := 0 to COUNT-1 do
            list[i] := i;
          Writeln('Spring SetItem ', sw.ElapsedMilliseconds);
          sw := TStopwatch.StartNew;
          for i := 0 to COUNT-1 do
          begin
            n := list[i];
            if n = -1 then Break;
          end;
          Writeln('Spring GetItem ', sw.ElapsedMilliseconds);
        end;

        procedure RunRTL;
        var
          list: TList<Integer>;
          sw: TStopwatch;
          i, n: Integer;
        begin
          list := TList<Integer>.Create;
          list.Count := COUNT;
          sw := TStopwatch.StartNew;
          for i := 0 to COUNT-1 do
            list[i] := i;
          Writeln('RTL SetItem ', sw.ElapsedMilliseconds);
          sw := TStopwatch.StartNew;
          for i := 0 to COUNT-1 do
          begin
            n := list[i];
            if n = -1 then Break;
          end;
          Writeln('RTL GetItem ', sw.ElapsedMilliseconds);
        end;

        procedure RunArray;
        var
          list: TArray<Integer>;
          sw: TStopwatch;
          i, n: Integer;
        begin
          SetLength(list, COUNT);
          sw := TStopwatch.StartNew;
          for i := 0 to COUNT-1 do
            list[i] := i;
          Writeln('array ', sw.ElapsedMilliseconds);
          sw := TStopwatch.StartNew;
          for i := 0 to COUNT-1 do
          begin
            n := list[i];
            if n = -1 then Break;
          end;
          Writeln('array ', sw.ElapsedMilliseconds);
        end;

        end.

    You know that unit testing also includes testing whether exceptions are thrown properly, yes? All of them are expected exceptions, and you saw that all tests are green, yes?
  21. Fun fact: some years ago someone (not me, not Andreas) managed to make record helpers inheritable simply by patching one flag in the compiler. 😉 I am just mentioning this to emphasize that it's not some technical limitation - the compiler simply does not allow it when it sees "record helper" but does when it's a "class helper", even though otherwise it's the same code.
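    A minimal illustration of that asymmetry (hypothetical helpers, not from the original post): a class helper may name an ancestor helper, while the analogous record helper declaration is rejected by the compiler:

        type
          TObjectHelper = class helper for TObject
            function Describe: string;
          end;

          // class helpers can inherit from another helper
          TExtendedObjectHelper = class helper(TObjectHelper) for TObject
            function DescribeMore: string;
          end;

          // the analogous record helper declaration does not compile:
          // TIntegerHelperEx = record helper(TIntegerHelper) for Integer

        function TObjectHelper.Describe: string;
        begin
          Result := ClassName;
        end;

        function TExtendedObjectHelper.DescribeMore: string;
        begin
          Result := Describe + ' (extended)';
        end;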
  22. @Edwin Yip The article you linked to is my blog, not Eric's 😉 Yes, it also suffers from that; however, the types in that library are way smaller and have only the limited functionality of just storing stuff, with no rich IEnumerable API such as Spring4D has. That makes the binary overhead very, very small or even non-existent. What I observed though is that these are not general-purpose collection classes but are obviously tailored to some specific needs - you cannot use TsgList<T> for any type, as it is very limited in what it handles (only non-managed types). I also did not do an in-depth performance comparison, but just adding some Integers to a list was 3-4 times slower than in my latest Spring4D build (which is faster than the RTL). If you watch the video by Herb Sutter I linked in the other thread, I even wonder why someone coming from C++ would need to create his own list type when he could just use TArray<T>, because he would have used vector<T> in C++.
  23. No, but I consider what state-of-the-art hardware likes and what it does not like, which sometimes differs from what people learned 20 years ago. Big O is about the scaling of an algorithm - not about absolute speed. You can have an O(1) algorithm being stomped by an O(n) one simply because you only ever have so much data that the constant factor of the O(1) still makes it slower than the O(n). The constant factor can be influenced either by additional computations (such as hashing) or simply by having to do things that are more expensive hardware-wise (such as memory allocation or memory indirections). That is why, for example, the fastest general-purpose sorting algorithms out there are hybrid ones combining the best sorting algorithms for particular parts of the data to be sorted (such as introsort or timsort), with timsort even being optimized for realistic data patterns such as already, almost, or reverse sorted input. That said, imo pondering over this stuff is only important in certain areas of software development, such as core library development or when you roll your own data types and algorithm implementations. Here is another thing about the "stuff that modern hardware loves" argument. The interesting part starts around 24:00 with all the theory and is followed by two mind-blowing examples at around 41:00:
  24. Concatenation of dynamic arrays is just for convenience - it certainly does not help to improve performance if you are concerned about heap allocations.
  25. There is a difference between using a data structure like an array, which is optimized by default because the hardware really, really likes it, and one where additional work has to be done. Well, the only things to optimize on an array are making the stored type as compact as possible and ensuring it uses as few cache lines as possible. When implementing a linked list naively, you end up with heap memory all over the place - so to make it as cache friendly as possible you have to go the extra mile. That's what I meant. But as you said, it depends; still, for adding/removing at both ends, a circular buffer over an array with "wrap around" logic, which avoids moving elements when operating at the head, will win, I am pretty sure. Additional reading on the subject: https://dzone.com/articles/performance-of-array-vs-linked-list-on-modern-comp
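    A minimal sketch (hypothetical type, Integer elements for brevity) of the wrap-around indexing meant above - pushing or popping at the front only moves the head index, never the elements:

        type
          TIntRingBuffer = record
          private
            FItems: array of Integer;
            FHead: Integer;   // index of the first (oldest) element
            FCount: Integer;
          public
            procedure Init(capacity: Integer);
            procedure PushBack(value: Integer);
            function PopFront: Integer;
          end;

        procedure TIntRingBuffer.Init(capacity: Integer);
        begin
          SetLength(FItems, capacity);
          FHead := 0;
          FCount := 0;
        end;

        procedure TIntRingBuffer.PushBack(value: Integer);
        begin
          Assert(FCount < Length(FItems), 'buffer full');
          // write position wraps around the end of the array
          FItems[(FHead + FCount) mod Length(FItems)] := value;
          Inc(FCount);
        end;

        function TIntRingBuffer.PopFront: Integer;
        begin
          Assert(FCount > 0, 'buffer empty');
          Result := FItems[FHead];
          // advance the head instead of moving the remaining elements
          FHead := (FHead + 1) mod Length(FItems);
          Dec(FCount);
        end;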