
Stefan Glienke

Members
  • Content Count: 1362
  • Days Won: 129

Everything posted by Stefan Glienke

  1. Stefan Glienke

    random between a range

    RandomRange - just saying
  2. I don't know what to think after reading that article. Here are my comments on it:
    - The classic way of truncating the last two digits with div and mod by 10 (or 100) does not involve a costly div or mod instruction on modern compilers (*cough* even Delphi 12 now does it, apart from the bugs that came with it).
    - I think C++ compilers would detect a div and a mod of the same values and optimize the emitted code further, so it does not need the "workaround" the Delphi RTL uses: calculating the modulo by subtracting the div result times 100 from the original value.
    - The pseudo-code he shows for detecting the number of digits is correct, but it is never what actually gets executed: either you rewrite it into a few branches (as you can see in the RTL), or a C++ compiler might unroll the loop, or some other trickery is applied.

    The DivBy100 function was introduced by me in RSP-36119, and I already notified them that DivBy100 can be removed in 12 because the compiler now properly optimizes a div by 100. However, that affects performance only by about 0.5%.

    As David correctly pointed out, the real bottleneck is the heap allocation. And it is not only a single one when you just turn an integer into a string and display it, but many when you concatenate strings and numbers the "classic" way, because that produces a ton of small temporary strings. The issue even exists when using TStringBuilder, where one might think it was built for exactly this kind of optimization. If you look into some overloads of Append, you will see that they naively call IntToStr and pass the result down to the overload that takes a string. This is completely insane: the conversion should be done in place, directly into the internal buffer that TStringBuilder already uses, instead of creating a temporary string, converting the integer into it, and passing that to Append just to copy its content into the buffer. This will likely be my next contribution as part of my "Better RTL" series of JIRA entries.
  3. Stefan Glienke

    Delphi 12 is available

    It's not; what you are referring to is AOT. When .NET code runs, the runtime always JIT-compiles it to machine code (read an explanation by someone who knows more about it than I do). You are mixing things up here: I can also compile to native code with VC++ and still require a specific version of the Visual C++ runtime to be installed.
  4. Stefan Glienke

    Delphi 12 is available

    You do realize that Java and C# are not interpreted like Python, yes? They compile down to machine instructions, just not at compile time but at runtime, hence the term JIT. And FWIW, the code these JIT compilers produce often runs circles around what Delphi produces with the ancient instruction set its compiler knows about. I am getting tired of the mantra "But it compiles to native code!", as if that in itself were something good. If that native code is poorly optimized and mostly looks the same as in '95, or as if compiled with -O0, then how good can it be?
  5. Stefan Glienke

    Delphi 12 is available

    PSA: It looks like integer division is broken in 12 due to the implementation of this feature request. There are at least two reports about it already: https://quality.embarcadero.com/browse/RSP-43274 https://quality.embarcadero.com/browse/RSP-43418 Personally, I would rate this as absolute "needs a hotfix ASAP" severity. I am honestly really sad about this, because it proves right everyone who shies away from requesting any compiler improvements on the grounds that s**t will almost certainly fall apart afterwards. 😒
  6. Stefan Glienke

    Try-Finally-end; & Exit??

    What you actually want to say is that the register allocator is bad (and that also applies to Win64), which I agree with and which has been reported repeatedly. However, pointing out one of thousands of specific cases is not getting us anywhere, especially since there are far more impactful situations where this causes excessive stack spilling.
  7. Stefan Glienke

    Try-Finally-end; & Exit??

    That code only wastes 2 bytes of binary code, because the CPU will most likely apply register renaming and mov elimination (yes, there is some cost for decoding the unnecessary mov). But of all the possible optimizations in the x86 and x86-64 codegen, this is one of the least important ones I can think of. FWIW, the x86-64 codegen emits the lea instruction for both.
  8. Stefan Glienke

    Delphi 12 List Objects x64

    What? NativeInt is basically Int64 on a 64-bit target platform.
  9. Stefan Glienke

    Try-Finally-end; & Exit??

    I do, but I know a dozen other things I would rather have, most of which were reported almost a decade ago or even earlier without anything happening about them, such as this no-brainer.
  10. Stefan Glienke

    Try-Finally-end; & Exit??

    It's cute that you only have x86 in mind when discussing this.
  11. Stefan Glienke

    Try-Finally-end; & Exit??

    Could a goto out of a try block be supported? Yes. It would require properly transferring control to the finally blocks and then to the target label, which would take significant work in the compiler. Given how popular the use of goto is, that would be a complete waste of resources. See how the C# spec defines the behavior of a goto out of a try block: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/statements#13104-the-goto-statement
  12. Stefan Glienke

    Spring4D dictionary and record values

    More features, better performance, fewer bugs (and if there are any, not having to wait for the Delphi version that fixes them), less binary code bloat.
  13. Stefan Glienke

    Delphi 12 is available

    At least in 13 they would have an excuse for poor quality
  14. Also not true (anymore; it might have been decades ago). It can happen that shifting a pointer is better under x86 because of register pressure or register shortage, since it requires only one register as opposed to two when indexing into an array. But indexing into a memory address with an increasing or decreasing index register is always faster.

    Another situation arises when your array is a field of your class and you index into it, because then the compiler is really stupid and re-reads the field every time before indexing into it. But then the issue is re-reading the field, not the indexing. I solved this by putting the array into a local pointer variable and indexing into that one, like here.

    Yet another situation happens on 64-bit when using an Integer index variable, because then the compiler always emits an extra register-widening instruction, which is not necessarily free (yes, I need to fix the code I just pointed to, because it does exactly that by declaring i as Integer and not as NativeInt as it should be; shoot me).

    Oh, one particularly bad thing about dcc64 is that it does not optimize some instructions in loops well. From dcc32 we know the count-down-to-0 behavior of a for-to loop, where it maintains two counters: the actual index variable (if you actually use it within the loop) and the count-down-to-0 variable it uses to control the loop. For that it usually uses the dec/jnz combination, which works well, macro-fuses and all that. On Win64 it emits sub reg, 1 / test reg, reg / jnz, where only test and jnz fuse, which wastes cycles. That extra test is completely bonkers, because the sub (which should actually be a dec) already sets the zero flag! See RSP-37745.

    Another missed loop-optimization opportunity that affects both Win32 and Win64: the compiler could create a loop that counts from -count to -1, which is another optimization technique where you grab the position right after the last element and then index into it with a negative index. This way, if you don't need the index variable for anything other than indexing, you only need two registers (one pointing right after the last element, plus the index), and the loop just needs the nicely fusing inc reg / jnz.
  15. This is false knowledge: it only makes repeated calls to Length when you write a for x in some_dynamic_array do loop.
  16. If you care about such things, you are totally wrong to be using Delphi, especially 64-bit. It makes suboptimal use of the registers available in 64-bit; it produces a crapton of conditional jumps instead of better alternatives that have existed for decades; it does not use SSE (let alone AVX), which has existed almost as long (except for some floating-point stuff); it does zero optimization with regard to loop alignment; and it does not restructure the binary code so that cold code does not sit in the middle of hot code. I could go on, but these are just a few things that matter more than some 300K of binary size if you really care about squeezing the last ns out of your application.
  17. It had better be; I spent quite some time on it. FWIW, it was already introduced in 11.3.
  18. You realize that it's not the "exact same code" if you compare 11 and 12, right? The code in the RTL and in the VCL or FMX (depending on which one you use) changes between those versions. It only takes one use of a class somewhere that was not used before, or one additional list of something inside some class, and the binary size increases. Go diff the source directories of C:\Program Files (x86)\Embarcadero\Studio\22.0 and C:\Program Files (x86)\Embarcadero\Studio\23.0 and check for yourself. Or build with a map file and then diff the map files to find these changes.

    Also (and this might be marginal but contributes to the overall increase): they changed the count and index of all collection types to NativeInt, which means that some instructions involving those might be a few bytes larger on 64-bit (see an x86-64 instruction reference of your choice for details). On the other hand, now that I think about it, it might also save a few bytes, because it no longer does register widening on 64-bit. So take this paragraph with a grain of salt and consider it just additional information.

    I would guess a few fixes and new features here and there can easily add up to 300K more binary size, especially given the general issue the Delphi compiler has with generics (see RSP-16520).
  19. Stefan Glienke

    Delphi 12 is available

    Marketing people don't fix bugs 😉
  20. Stefan Glienke

    Delphi 12 is available

    TBH, IMO using years is kinda OK, and so is a version number where the number after the dot means "update to that version". What was marketing BS was the XE numbering, and later turning the 10.x minor versions into what were actually major versions; combining that with city names just made things more confusing (man, I still have to think about whether Tokyo or Rio came first...). Funnily enough, as far as I can see, the IDEs in 11 and 12 don't reference the city name anywhere.
  21. Stefan Glienke

    LSP processes

    CreateProcess and IPC
  22. Stefan Glienke

    LSP processes

    LSP is a protocol with an official specification, though I don't know exactly which version and which subset of it Embarcadero supports; but it's enough to let you use it in VS Code.
  23. Stefan Glienke

    Intel Simd-sort library

    Indirectly doesn't matter for this algorithm, because it directly sorts a vector of integers or floats, and DB keys are organized in a different layout. Again, the question was not about sorting millions of data elements but strictly about sorting arrays of integers or floats.
  24. Stefan Glienke

    Intel Simd-sort library

    A serious question, because many sort benchmarks are so obsessed with sorting large arrays/vectors of plain integers and floats: do people in reality really have this kind of data that needs to be sorted?
  25. Using 10.4, you can do this with the following code:

        TArray.Sort<yourtype>(list.PList^, list.Comparer, index, count);
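Post 2 argues that turning an integer into text needs neither a real div instruction nor a temporary string. Below is a minimal C sketch of those ideas, assuming an unsigned 32-bit value and a caller-provided buffer with enough room. It emits two digits per step from a lookup table, divides by the constant 100 (optimizing compilers typically lower that to a multiply and shift), derives the remainder by subtraction like the RTL workaround described in the post, counts digits with a few branches, and appends straight into a builder's buffer. This is not the Delphi RTL or TStringBuilder code; digit_pairs, count_digits, append_uint32, and sb_t are hypothetical names, and the fixed-size sb_t has no growth logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two ASCII digits for each value 0..99: "00", "01", ..., "99". */
static const char digit_pairs[201] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/* Digit count via a few branches instead of a division loop. */
static int count_digits(uint32_t v) {
    if (v < 10u) return 1;
    if (v < 100u) return 2;
    if (v < 1000u) return 3;
    if (v < 10000u) return 4;
    if (v < 100000u) return 5;
    if (v < 1000000u) return 6;
    if (v < 10000000u) return 7;
    if (v < 100000000u) return 8;
    if (v < 1000000000u) return 9;
    return 10;
}

/* Writes the decimal form of v to dst (no terminating NUL) and returns
   the number of characters written (1..10). Fills from the end backwards. */
static size_t append_uint32(char *dst, uint32_t v) {
    int len = count_digits(v);
    char *p = dst + len;
    while (v >= 100u) {
        uint32_t q = v / 100u;      /* division by a constant */
        uint32_t r = v - q * 100u;  /* remainder without a second division */
        p -= 2;
        memcpy(p, &digit_pairs[r * 2], 2);
        v = q;
    }
    if (v >= 10u) {
        p -= 2;
        memcpy(p, &digit_pairs[v * 2], 2);
    } else {
        *--p = (char)('0' + v);
    }
    return (size_t)len;
}

/* Hypothetical fixed-capacity string builder (no reallocation). */
typedef struct {
    char data[256];
    size_t len;
} sb_t;

static void sb_append_str(sb_t *sb, const char *s) {
    size_t n = strlen(s);
    assert(sb->len + n < sizeof sb->data);
    memcpy(sb->data + sb->len, s, n + 1);  /* copies the NUL as well */
    sb->len += n;
}

/* Converts the integer directly into the builder's buffer:
   no temporary string is created along the way. */
static void sb_append_uint32(sb_t *sb, uint32_t v) {
    assert(sb->len + 10 < sizeof sb->data);  /* up to 10 digits + NUL */
    sb->len += append_uint32(sb->data + sb->len, v);
    sb->data[sb->len] = '\0';
}
```

A real builder would grow its buffer before writing; the point of the sketch is only that the digits are produced in their final location.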
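Post 5 reports that integer division broke in Delphi 12 after a feature request was implemented, and post 2 mentions that 12 now optimizes a div by 100. The link to the feature request did not survive, so it is an assumption here that it concerns replacing division by a constant with a multiplication. The following C sketch shows the unsigned 32-bit version of that generic reciprocal-multiplication technique, plus a brute-force comparison against real division as a cheap way to gain confidence in such a transformation. It is not Embarcadero's implementation; RECIP100, div100_u32, mod100_u32, and check_div100 are hypothetical names, and signed division (which needs an extra correction for negative dividends) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ceil(2^37 / 100) = 1374389535: the reciprocal of 100 scaled by 2^37. */
#define RECIP100 0x51EB851Fu

/* x / 100 for any uint32_t x using one 32x32->64 multiply and a shift.
   Exact because RECIP100 * 100 - 2^37 = 28, and 28 * x < 2^37 holds
   for every x below 2^32. */
static uint32_t div100_u32(uint32_t x) {
    return (uint32_t)(((uint64_t)x * RECIP100) >> 37);
}

/* x % 100 derived from the quotient: one multiply and one subtract. */
static uint32_t mod100_u32(uint32_t x) {
    return x - div100_u32(x) * 100u;
}

/* Compares against real division for 0..limit-1 and a few edge values;
   returns 1 if every checked input agrees, 0 otherwise. */
static int check_div100(uint32_t limit) {
    static const uint32_t edges[] = {
        99u, 100u, 101u, 4294967199u, 4294967200u, 4294967295u
    };
    for (uint32_t x = 0; x < limit; ++x)
        if (div100_u32(x) != x / 100u || mod100_u32(x) != x % 100u)
            return 0;
    for (size_t i = 0; i < sizeof edges / sizeof edges[0]; ++i) {
        uint32_t x = edges[i];
        if (div100_u32(x) != x / 100u || mod100_u32(x) != x % 100u)
            return 0;
    }
    return 1;
}
```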
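Post 14 describes two loop shapes: copying an array field into a local pointer so it is not re-read on every iteration, and counting from -count to -1 relative to a pointer just past the last element. Here is a C sketch of both, assuming the array is reached through a struct field (standing in for a field of a Delphi class); list_t, sum_indexed, and sum_negative_index are made-up names. The index is a ptrdiff_t, a signed pointer-sized type comparable to NativeInt, so there is no Integer-to-64-bit widening of the kind the post complains about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container: the array is reached through a struct field. */
typedef struct {
    int64_t *items;
    ptrdiff_t count;   /* signed, pointer-sized (cf. NativeInt) */
} list_t;

/* Field copied into a local pointer once, then indexed from 0 upward. */
static int64_t sum_indexed(const list_t *list) {
    const int64_t *items = list->items;  /* read the field only once */
    const ptrdiff_t count = list->count;
    int64_t sum = 0;
    for (ptrdiff_t i = 0; i < count; ++i)
        sum += items[i];
    return sum;
}

/* Counts from -count up to -1 relative to a pointer just past the last
   element. Addressing needs only the end pointer and the index, and the
   loop test is "i != 0" right after the increment (inc/jnz on x86). */
static int64_t sum_negative_index(const list_t *list) {
    const ptrdiff_t count = list->count;
    if (count <= 0)
        return 0;                        /* also avoids NULL + 0 */
    const int64_t *end = list->items + count;
    int64_t sum = 0;
    for (ptrdiff_t i = -count; i != 0; ++i)
        sum += end[i];                   /* end[-count] is items[0] */
    return sum;
}
```

Whether a compiler actually emits the fused inc/jnz pair depends on the compiler and its settings; optimizing C compilers may well produce similar code from the first loop on their own.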