Everything posted by Stefan Glienke
-
DPM Package Manager - presentation
Stefan Glienke replied to Vincent Parrett's topic in Delphi Third-Party
I kinda like DPM - TLAs FTW -
Manage overloaded IfThen functions
Stefan Glienke replied to Mike Torrettinni's topic in General Help
That or something like this. -
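For illustration - one way to collapse many IfThen overloads into a single generic routine; a minimal sketch only, with a made-up TValueHelper name (since Delphi has no standalone generic functions, it lives in a record):

  type
    TValueHelper = record
      class function IfThen<T>(ACondition: Boolean; const ATrue, AFalse: T): T; static;
    end;

  class function TValueHelper.IfThen<T>(ACondition: Boolean; const ATrue, AFalse: T): T;
  begin
    // note: unlike a real conditional expression, both arguments are always evaluated
    if ACondition then
      Result := ATrue
    else
      Result := AFalse;
  end;

  // usage: Caption := TValueHelper.IfThen<string>(Count = 1, 'item', 'items');
-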
Is someone using MVVM?
Stefan Glienke replied to Javier Tarí's topic in Algorithms, Data Structures and Class Design
Best thing since sliced bread ... if you're not in Delphi -
Micro optimization - effect of defined and not used local variables
Stefan Glienke replied to Mike Torrettinni's topic in Algorithms, Data Structures and Class Design
What if we could write optimized code right from the start and would not have to deal with all that shit because the compiler has the intelligence of a rock? What if good coding practices could be taught by the editor via suggestions (look at the quick actions in Visual Studio, which help with many different things - from fixing formatting to suggesting refactorings)? Much backwards compatibility is eyewash and simply means: "we did not change the signature but sacrificed some firstborn to make it still work". If you provide - there it is again - tooling to detect and guide you through moving forward (yes, backwards compatibility is often nice because I don't have to ifdef my code for a dozen different versions), then breaking changes are not bad. -
Micro optimization - effect of defined and not used local variables
Stefan Glienke replied to Mike Torrettinni's topic in Algorithms, Data Structures and Class Design
I disagree - this is exactly the mindset we got trained into all these years because we did not know any better - the world has moved on - heck, there are people working on ML-based programming tools that can suggest refactorings based on refactorings you have done in the past! And yet here we are, mostly doing yolo-driven development - "if it ain't broke it might be ok" (ok, I am exaggerating here). If the tooling can point out possible optimizations because it understands what you are doing, that can only be good regardless of how much of a measurable improvement it makes. Even if it's just for some junior coders at Embarcadero slapping together some ... ahem ... non-ideal code that never gets properly reviewed because of lack of time. It took them years and an actual change to the FreeAndNil function to find bugs in their code that static code analysis could have found ages ago. -
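For context - the FreeAndNil change referred to here is the 10.4 signature change from an untyped var parameter to a typed one, which turns misuse into a compile-time error. A minimal sketch of the effect (made-up Demo procedure, not Embarcadero's actual code):

  // pre 10.4:  procedure FreeAndNil(var Obj);                   - accepts anything
  // 10.4+:     procedure FreeAndNil(const [ref] Obj: TObject);  - accepts only objects
  uses System.SysUtils;

  procedure Demo;
  var
    Obj: TObject;
    Intf: IInterface;
  begin
    Obj := TObject.Create;
    FreeAndNil(Obj);   // fine in both versions
    FreeAndNil(Intf);  // compiled before 10.4 and blew up at runtime,
                       // now rejected by the compiler - exactly the kind of bug
                       // static code analysis could have flagged ages ago
  end;
-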
Manage overloaded IfThen functions
Stefan Glienke replied to Mike Torrettinni's topic in General Help
https://en.wikipedia.org/wiki/Principle_of_least_astonishment And now imagine a compiler that turns that into proper code with as few conditional jumps as possible: https://godbolt.org/z/KqhKnx -
Micro optimization - effect of defined and not used local variables
Stefan Glienke replied to Mike Torrettinni's topic in Algorithms, Data Structures and Class Design
@Mahdi Safsafi Add this on the issue please. -
static array vs. dynamic array
Stefan Glienke replied to FranzB's topic in Algorithms, Data Structures and Class Design
And then you put them into an array and have no contiguous memory where the data resides, but just a bunch of pointers pointing all over the heap -> bad. -
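To make the difference concrete - a minimal sketch with made-up TPointRec/TPointObj types:

  type
    TPointRec = record
      X, Y: Double;
    end;

    TPointObj = class
      X, Y: Double;
    end;

  var
    Recs: array of TPointRec;  // one contiguous block of data - cache friendly to iterate
    Objs: array of TPointObj;  // a contiguous block of pointers only; each instance lives
                               // somewhere else on the heap, so every access is an indirection
-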
Micro optimization - effect of defined and not used local variables
Stefan Glienke replied to Mike Torrettinni's topic in Algorithms, Data Structures and Class Design
And on win32 those try/finally blocks have a significant effect - at times even worse than a heap allocation - because they completely trash a part of the CPU's branch prediction mechanism - see RSP-27375 -
@Mahdi Safsafi Optimized it:

  {$O+}
  function foo(I: Integer): Integer;
  begin
    case I of
      0: Exit(random(255));
      1..5: Exit(i+2);
      // 2: Exit(4);
      // 3: Exit(5);
      // 4: Exit(6);
      // 5: Exit(7);
    else
      Exit(0);
    end;
  end;

Scnr
-
How to optimize exe loading times
Stefan Glienke replied to a topic in Software Testing and Quality Assurance
SamplingProfiler usually gives a good overview to find the particularly time-consuming parts (although the page says it supports up to XE4, it works just fine up to 10.4). -
Exactly - that's why I just recently rearranged some of my code from this pattern (which I personally like very much for its readability and lower indentation):

  if not check then
    SomeErrorMessageStuff/raise/exit;
  Usual stuff;

to:

  if check then
  begin
    do usual stuff
    optionally exit
  end
  else
    Error stuff

with some noticeable improvements. Not only will the common part be on the fallthrough path, it also avoids unnecessary register preservation when you have an error-raising subroutine that never returns, which the compiler does not know. I wish Delphi had something like noreturn
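A concrete sketch of that rearrangement, with made-up names (ParseChecked/RaiseInvalid):

  uses System.SysUtils;

  procedure RaiseInvalid(const Value: string); // never returns, but the compiler cannot know that
  begin
    raise EConvertError.CreateFmt('invalid value: %s', [Value]);
  end;

  function ParseChecked(const Value: string): Integer;
  begin
    if TryStrToInt(Value, Result) then  // common case first - stays on the fallthrough path
      Exit
    else
      RaiseInvalid(Value);              // cold path jumps away to the raising subroutine
  end;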
-
Keep in mind the slightly different branch prediction behavior of different CPUs when microbenchmarking cold code. https://xania.org/201602/bpu-part-one
-
Micro optimization - effect of defined and not used local variables
Stefan Glienke replied to Mike Torrettinni's topic in Algorithms, Data Structures and Class Design
As a guideline: try to remove overhead from prologues and epilogues caused by variables of managed types (explicit or implicit) such as strings or interfaces that are only there for the uncommon path. Another example was the error-raising code in the hextobin thread, which can be put into a subroutine that gets called only in the rare case of an invalid char. Eric Grange wrote a nice article about this some years ago that I like to recommend: https://www.delphitools.info/2009/05/06/code-optimization-go-for-the-jugular/ -
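A minimal sketch of that idea (names made up): building the error message inline forces a hidden string temp - and with it an implicit try/finally - into the hot routine, while hoisting it into a subroutine moves that overhead onto the cold path:

  // the Format call creates a hidden string temp, so the whole function gets an
  // implicit try/finally and a UStrClr in its epilogue - even when no error occurs
  function DigitValueSlow(C: Char): Integer;
  begin
    Result := Ord(C) - Ord('0');
    if Cardinal(Result) > 9 then
      raise EConvertError.Create(Format('invalid digit %s', [C]));
  end;

  // hoisting the raising code keeps the hot routine free of managed temps
  procedure DigitError(C: Char);
  begin
    raise EConvertError.Create(Format('invalid digit %s', [C]));
  end;

  function DigitValueFast(C: Char): Integer;
  begin
    Result := Ord(C) - Ord('0');
    if Cardinal(Result) > 9 then
      DigitError(C);
  end;
-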
Nice, now make it run on multiple cores 😅
-
No need for multiple inheritance - you can also apply the mechanism from Spring.TManagedObject to any other class - simply override NewInstance and FreeInstance
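A minimal sketch of those two hook points (not Spring's actual implementation, just where you would plug in; TCustomManagedObject is a made-up name):

  type
    TCustomManagedObject = class
    public
      class function NewInstance: TObject; override;
      procedure FreeInstance; override;
    end;

  class function TCustomManagedObject.NewInstance: TObject;
  begin
    Result := inherited NewInstance;
    // per-instance initialization goes here - this is the spot where
    // Spring.TManagedObject wires up its managed fields
  end;

  procedure TCustomManagedObject.FreeInstance;
  begin
    // per-instance cleanup goes here, before the memory is released
    inherited FreeInstance;
  end;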
-
Why can't we have both - a fast compilation generating debug-friendly, non-optimized code, and one that churns a little longer and emits those juicy optimizations? Anyhow, the current slowness of the compiler comes from sloppy code in the compiler and not because it does so many amazing things.
-
Especially since one of its selling points is "it compiles to native code" - if that native code is garbage for modern CPUs because it's written like it's the 90s, that's kinda poor.
-
Fair enough - guess I have to go back to the version with the label.
-
Close one, nice! 😉 But I would be surprised if a SIMD loop would not beat it. I am sure doing the entire thing with a SIMD loop would totally destroy any pure Pascal solution.
-
That was a typo that David copied - my first version had 3 checks in the loop where {$B+} made it better, now with only 2 checks I don't need that anymore - see my post with the currently best version.
-
Well, those are optimizations that some people were aware of and some weren't - why would that be unfair? P.S. What did I win? Joking aside - it's always interesting that different people see different things. And at the same time it's very sad that perfectly fine code will be like 3-4 times slower than hardcore optimized code simply because the compiler does not know about some things, does not do zero-cost abstractions (*) and does not reorder code to make it better. (*) I mean seriously - why do an LStrAsg on a string parameter - as if it would go invalid in the middle of the loop or what?! And because you cannot assign to a loop variable it should treat it exactly the same way as a PChar moving over the string.
-
"Get your conditional jumps and error handling garbage outta my hot loop, kay?" function HexToBinStefan(const HexValue: string): string; // put the exception stuff into a subroutine to not pollute our routine procedure Error(c: PChar; s: string); begin raise EConvertError.CreateFmt('Invalid hex digit ''%s'' found in ''%s''', [c^, s]); end; label _Error; type TChar4 = array[0..3] of Char; PChar4 = ^TChar4; {$POINTERMATH ON} PInteger = ^Integer; {$POINTERMATH OFF} const Table: array[0..22] of TChar4 = ( '0000', '0001', '0010', '0011', '0100', '0101', '0110', '0111', '1000', '1001', // 0-9 'xxxx', 'xxxx', 'xxxx', 'xxxx', 'xxxx', 'xxxx', 'xxxx', // :-@ - unused '1010', '1011', '1100', '1101', '1110', '1111'); // A-F var HexDigit: PChar; P: PChar4; i, n: Cardinal; begin // do not use PChar cast because that causes a call to UStrToPWChar // we don't need that special PChar to #0 when HexValue is empty HexDigit := Pointer(HexValue); if HexDigit = nil then Exit; // we know that HexDigit is not nil so we can avoid the conditional jump from Length // this also directly moves it into the correct register for the SetLength call SetLength(Result, PInteger(HexDigit)[-1] * 4); P := PChar4(Result); for i := 1 to PInteger(HexDigit)[-1] do begin // subtract 48 to make '0'-'9' 0-9 which enables unconditionally downcasing any upper case char // when we hit the #0 it will simply produce an invalid value for n that we will break on next n := Cardinal(Integer(Ord(HexDigit^)) - 48) and not 32; // avoid one check by simply subtracting 10 and checking the invalid range of 10-16 // thank you godbolt.org and amazingly optimizing c++ compilers for that idea! <3 if (Cardinal(Integer(n)-10) <= 6) or (n > 22) then goto _error; P^ := Table[n]; Inc(P); Inc(HexDigit); end; Exit; _error: Error(HexDigit, HexValue); end;
-
I was just going to comment on that - a for-in loop on a string causes the compiler to do an LStrAsg to a local variable and iterate that one, which causes a costly implicit try/finally and UStrClr in the epilogue. Also, with all that microbenchmarking - please consider that the compiler might place the code of various implementations well or badly in terms of their layout within cache lines. We had that topic a while ago, where one implementation was simply faster because its hot loop fit into one cache line, while another one, or even a rearrangement of the code, caused it to span two cache lines, affecting the results negatively.
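The two loop shapes being compared, as a sketch (the CountDigits names are made up):

  // for-in copies the string into a hidden local (the LStrAsg), so the routine
  // gets an implicit try/finally and a UStrClr in its epilogue
  function CountDigitsForIn(const S: string): Integer;
  var
    C: Char;
  begin
    Result := 0;
    for C in S do
      if (C >= '0') and (C <= '9') then
        Inc(Result);
  end;

  // indexing (or walking a PChar) iterates the parameter directly -
  // no hidden copy, no implicit try/finally
  function CountDigitsIndexed(const S: string): Integer;
  var
    I: Integer;
  begin
    Result := 0;
    for I := 1 to Length(S) do
      if (S[I] >= '0') and (S[I] <= '9') then
        Inc(Result);
  end;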
-
Kinda pointless to limit the power of SSE to handling single characters instead of simply processing multiple characters at once - especially since you now have a call inside the loop slowing things down significantly, plus having to move the same stuff over and over into xmm2-4. Using SIMD should be 2-10 times faster than the regular one-char-per-iteration implementation.