Stefan Glienke

Members

Content Count
1497
Joined
October 18, 2018
Last visited
14 hours ago
Days Won
152

Everything posted by Stefan Glienke

Function with 2 return values ?

Stefan Glienke replied to Henry Olive's topic in RTL and Delphi Object Pascal

With the current memory model that would be a complete disaster. The main appeal of using interfaces when doing some DI architecture is the ref counting. And the mantra "program to an interface, not an implementation" is factually wrong: read https://blog.ploeh.dk/2010/12/02/Interfacesarenotabstractions/
- November 23, 2022
- 53 replies
Array size 64bits

Stefan Glienke replied to Heremik's topic in RTL and Delphi Object Pascal

The difference in performance is clear - accessing a dynamic array which is a field inside the class is two indirections while accessing a static array which is a field inside the class is only one indirection. If you inspect the generated assembly code you will see that every access to the dynamic array has more instructions. This is the case every time you have repeated access to a field inside a method because the compiler does not store it away as if it was a local variable and then directly read it but basically does Self.Table every time.

For this exact reason I have explicitly written code that first reads the dynamic array into a local pointer variable (to avoid the extra reference counting) and then operates on that one via hardcast back to dynamic array (or via pointermath). That way the compiler can keep it in a register and directly index into it rather than dereferencing Self every time to read that dynamic array.

To try it out, add this code to your Button1Click:

    {$POINTERMATH ON}
      Table: ^DWord;
    {$POINTERMATH OFF}
    begin
      SetLength(Self.Table, NbPrime);
      Table := @Self.Table[0];

Now the code accesses the local Table variable which most likely is stored in a register.
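For anyone who wants to compare both access patterns in isolation, here is a minimal console sketch. TPrimeTable, FTable and the two Sum methods are made-up names for illustration and are not taken from the original project; whether the local pointer really ends up in a register depends on the compiler version and the surrounding code, so check the generated assembly before drawing conclusions.

```pascal
program LocalPointerSketch;

{$APPTYPE CONSOLE}
{$POINTERMATH ON} // lets a typed pointer be indexed like an array

type
  // Hypothetical class for illustration; only the dynamic array field matters.
  TPrimeTable = class
  private
    FTable: TArray<Cardinal>;
  public
    constructor Create(Count: Integer);
    function SumViaField: UInt64;
    function SumViaLocal: UInt64;
  end;

constructor TPrimeTable.Create(Count: Integer);
var
  I: Integer;
begin
  inherited Create;
  SetLength(FTable, Count);
  for I := 0 to Count - 1 do
    FTable[I] := Cardinal(I);
end;

function TPrimeTable.SumViaField: UInt64;
var
  I: NativeInt;
begin
  Result := 0;
  // Every iteration goes through Self to reach FTable, then indexes into it.
  for I := 0 to High(FTable) do
    Result := Result + FTable[I];
end;

function TPrimeTable.SumViaLocal: UInt64;
var
  P: PCardinal;
  I: NativeInt;
begin
  Result := 0;
  // Copy the array pointer once; the cast avoids reference counting
  // and yields nil for an empty array.
  P := Pointer(FTable);
  for I := 0 to Length(FTable) - 1 do
    Result := Result + P[I];
end;

var
  T: TPrimeTable;
begin
  T := TPrimeTable.Create(1000);
  try
    // Both variants must agree; only the generated code differs.
    Writeln(T.SumViaField = T.SumViaLocal);
  finally
    T.Free;
  end;
end.
```

The local pointer is only valid as long as the array is not resized, so take it after SetLength and do not keep it around longer than the loop.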
- November 14, 2022
- 10 replies
A gem from the past (Goto)

Stefan Glienke replied to Mike Torrettinni's topic in Algorithms, Data Structures and Class Design

If you are using XE2 as in your profile you could be affected by this: https://quality.embarcadero.com/browse/RSP-27375 And as David mentions depending on what is inside the try the compiler easily throws any register usage overboard and operates via stack.
- November 9, 2022
- 40 replies
Use of inline variables..

Stefan Glienke replied to Ian Branch's topic in General Help

https://quality.embarcadero.com/browse/RSP-23096
- November 9, 2022
- 22 replies
64bit RTL patches with Intel OneApi and TBB

Stefan Glienke replied to RDP1974's topic in RTL and Delphi Object Pascal

That Poker Benchmark is completely pointless as it has almost zero memory allocations - the majority of CPU time is spent sorting cards and stuff.
- November 3, 2022
- 41 replies
Implement logic in TListView

Stefan Glienke replied to karl Jonson's topic in VCL

I would probably design it like this (enabling and disabling the Checkboxes depending on RadioButton3 Checked)
- November 2, 2022
- 6 replies
Implement logic in TListView

Stefan Glienke replied to karl Jonson's topic in VCL

Use the OnChanging event to allow or disallow selecting an item depending on the existing selection
- November 2, 2022
- 6 replies
Delphifeeds.com??

Stefan Glienke replied to Ian Branch's topic in General Help

That is exactly what I respond when someone tells me that Embarcadero should integrate TestInsight or Spring. Parnassus plugins ...
- October 28, 2022
- 12 replies
generics

Stefan Glienke replied to RDP1974's topic in Algorithms, Data Structures and Class Design

Doesn't that make it a container? 😜
- October 27, 2022
- 19 replies
generics

Stefan Glienke replied to RDP1974's topic in Algorithms, Data Structures and Class Design

Think of collections as "algorithms and datatypes for any type" - then you know the use case of generics. For any algorithm and/or datatype that is not just specific for one exact type.
- October 27, 2022
- 19 replies
The Delphi 11.2 release thread

Stefan Glienke replied to Lars Fosdal's topic in General Help

11.2 is a nightmare - prior to LSP the worst was that ctrl+click did not work. Now, most of the time nothing at all works because LSP dies all the time.
- October 26, 2022
- 123 replies
generics

Stefan Glienke replied to RDP1974's topic in Algorithms, Data Structures and Class Design

As a reaction to one of his answers during Q&A I wrote a blog post. Having said that and personally loving generics for various use cases (as shown in the blog post) there are also things that are solved suboptimally - which I also wrote about. Also if you have used generics in C# then you will likely miss co- and contravariance - oh, look, I also wrote about that. If you are going really fancy with generics and code that uses RTTI you have to be aware of some peculiarities - guess what: wrote about it. Now because in generics you basically have the lowest common denominator and we are lacking quite some ways to specify traits of the supported types via constraints there are several things that you cannot do or where you have to fall back to indirections: the most common example is using comparer interfaces for generic sorting algorithms or hashtables. That is mostly where a naively implemented generic algorithm might be slower than some handcrafted code for the specific type unless you heavily optimize for various cases (as I have done in Spring).
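To make the comparer indirection concrete, here is a small sketch of a naive generic algorithm. TAlgo and MaxOf are hypothetical names made up for this example; IComparer<T> and TComparer<T>.Default come from System.Generics.Defaults. This is the naive shape, not how Spring implements it.

```pascal
program GenericMaxSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Generics.Defaults;

type
  // Hypothetical helper class, for illustration only.
  TAlgo = class
  public
    class function MaxOf<T>(const Values: array of T;
      const Comparer: IComparer<T>): T; static;
  end;

class function TAlgo.MaxOf<T>(const Values: array of T;
  const Comparer: IComparer<T>): T;
var
  I: Integer;
begin
  if Length(Values) = 0 then
    raise EArgumentException.Create('Values must not be empty');
  Result := Values[0];
  for I := 1 to High(Values) do
    // T has no constraint that would allow "Values[I] > Result",
    // so every comparison is an interface call.
    if Comparer.Compare(Values[I], Result) > 0 then
      Result := Values[I];
end;

begin
  Writeln(TAlgo.MaxOf<Integer>([3, 9, 4], TComparer<Integer>.Default));
  Writeln(TAlgo.MaxOf<string>(['pear', 'apple'], TComparer<string>.Default));
end.
```

A handwritten Integer version would compile the comparison down to a single cmp instruction, which is the gap a naive generic implementation has to make up by other means.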
- October 25, 2022
- 19 replies
Best place for Spring4D questions

Stefan Glienke replied to Gary's topic in General Help

Sure, gonna work on that silver tag badge
- September 30, 2022
- 3 replies
Any advantage to using FastMM5??

Stefan Glienke replied to Ian Branch's topic in General Help

In a single-threaded application, FastMM5 will not give you any noticeable improvements. It was designed to overcome the issues of V4 under heavy multithreading.
- September 30, 2022
- 6 replies
64bit RTL patches with Intel OneApi and TBB

Stefan Glienke replied to RDP1974's topic in RTL and Delphi Object Pascal

It still surprises me that people are surprised by how much performance improves under heavy multithreading when not using the default MM. AFAIK mORMot does not use the default MM anyway.
- September 18, 2022
- 41 replies
32bit vs 64bit

Stefan Glienke replied to chkaufmann's topic in Windows API

I can tell you from experience that traveling with an airplane has significant overhead if you live approx 100km from an airport that has flights going to your destination. You missed the point - when David mentioned that nobody should be using Extended he most likely stated a well-meant suggestion and did not include the "I started using Extended like decades ago and don't wanna change it" case.
- September 12, 2022
- 33 replies
32bit vs 64bit

Stefan Glienke replied to chkaufmann's topic in Windows API

Carriages also once were the fastest way to travel - yet nobody today complains that he can't go onto the highway with one. On topic: If the 64bit Delphi compiler(s) (and significant parts of the RTL) would not be even worse than the 32bit compilers and the debugging experience would not be an absolute nightmare I would switch instantly - it's proven that even though a 64bit application might use a bit more memory because the pointers are double the size due to architectural differences it simply can perform better. Simply having more registers is already a huge gain.
- September 12, 2022
- 33 replies
32bit vs 64bit

Stefan Glienke replied to chkaufmann's topic in Windows API

Too bad it's the implicit default type in many places - be it float literals or parameter types.
- September 12, 2022
- 33 replies
64bit RTL patches with Intel OneApi and TBB

Stefan Glienke replied to RDP1974's topic in RTL and Delphi Object Pascal

Tests that run for approx one second are really the way to go when deciding on the proper memory manager. 😂
- September 11, 2022
- 41 replies
64bit RTL patches with Intel OneApi and TBB

Stefan Glienke replied to RDP1974's topic in RTL and Delphi Object Pascal

Been using FastMM5 in production for over 2 years now and never looked back. I don't know of any reliability or fragmentation issues.
- September 6, 2022
- 41 replies
64bit RTL patches with Intel OneApi and TBB

Stefan Glienke replied to RDP1974's topic in RTL and Delphi Object Pascal

That's another reason why precompiled binaries are bad - if I had to guess I would say they are compiled for CPUs that support AVX which Nehalem did not have.
- September 2, 2022
- 41 replies
Profiling Clipper2

Stefan Glienke replied to angusj's topic in Delphi Third-Party

That can be the reason: array accessing needs (at least) two registers: array pointer and index, while incrementing a pointer only needs one. With how for-to loops work we need 3: the array pointer, the incrementing index and a compiler generated one that counts down to 0, which is the one actually being used for the loop. Doing a for i := 1 to count loop that does not actually use i but the shifting pointer, we need 2 registers.

However, I assume the original code has another issue: too many indirections. It does not access into the dynamic array but it first accesses the TList reference. That means we have three indirections: first the field access, second the backing array access, regardless of using a getter or the List property, third indexing into the array. (you see the 3 mov eax, ... instructions following each other) These indirections are causing a data dependency - modern CPUs can execute multiple instructions at the same time if they don't depend on each other - in this case they do, so these instructions cannot execute in parallel, leaving part of the CPU without anything to do. That is the main difference you will see in the code below!

If you would store the TPointerList (not directly as that type because then you have an implicit finally block generated by the compiler because it's a dynamic array, doh) then you would probably have similar runtime because in this code there are enough registers available. Also, make sure to use NativeInt for index variables whenever possible to avoid unnecessary move with sign-extension instructions on 64bit.

With j being ^PIntersectNode the asm looks like this:

x86

    Clipper.Engine.pas.3206: inc(j);
    00BD954C 83C104           add ecx,$04
    Clipper.Engine.pas.3205: repeat
    00BD954F 8B01             mov eax,[ecx]
    00BD9551 8B10             mov edx,[eax]
    00BD9553 8B4004           mov eax,[eax+$04]
    00BD9556 3B4244           cmp eax,[edx+$44]
    00BD9559 7405             jz $00bd9560
    00BD955B 3B4240           cmp eax,[edx+$40]
    00BD955E 75EC             jnz $00bd954c

x64

    Clipper.Engine.pas.3206: inc(j);
    0000000000E9CDCB 4883C008         add rax,$08
    Clipper.Engine.pas.3205: repeat
    0000000000E9CDCF 488B10           mov rdx,[rax]
    0000000000E9CDD2 4C8B02           mov r8,[rdx]
    0000000000E9CDD5 488B5208         mov rdx,[rdx+$08]
    0000000000E9CDD9 49395050         cmp [r8+$50],rdx
    0000000000E9CDDD 7406             jz TClipperBase.ProcessIntersectList + $75
    0000000000E9CDDF 49395048         cmp [r8+$48],rdx
    0000000000E9CDE3 75E6             jnz TClipperBase.ProcessIntersectList + $5B

With indexing it's this:

x86

    Clipper.Engine.pas.3206: inc(j);
    004C954B 42               inc edx
    Clipper.Engine.pas.3205: repeat
    004C954C 8B461C           mov eax,[esi+$1c]
    004C954F 8B4004           mov eax,[eax+$04]
    004C9552 8B0490           mov eax,[eax+edx*4]
    004C9555 8B28             mov ebp,[eax]
    004C9557 8B4004           mov eax,[eax+$04]
    004C955A 3B4544           cmp eax,[ebp+$44]
    004C955D 7405             jz $004c9564
    004C955F 3B4540           cmp eax,[ebp+$40]
    004C9562 75E7             jnz $004c954b

x64

    Clipper.Engine.pas.3206: inc(j);
    000000000031CDD2 4883C001         add rax,$01
    Clipper.Engine.pas.3205: repeat
    000000000031CDD6 488B5320         mov rdx,[rbx+$20]
    000000000031CDDA 488B5208         mov rdx,[rdx+$08]
    000000000031CDDE 488B14C2         mov rdx,[rdx+rax*8]
    000000000031CDE2 4C8B02           mov r8,[rdx]
    000000000031CDE5 488B5208         mov rdx,[rdx+$08]
    000000000031CDE9 49395050         cmp [r8+$50],rdx
    000000000031CDED 7406             jz TClipperBase.ProcessIntersectList + $85
    000000000031CDEF 49395048         cmp [r8+$48],rdx
    000000000031CDF3 75DD             jnz TClipperBase.ProcessIntersectList + $62

However, incrementing a pointer is not always faster because when indexing into an array the CPU can sometimes pipeline those instructions better.

Now we use a local variable of type ^Pointer (with pointermath on to be able to index into it) like this:

    list := Pointer(FIntersectList.List);
    for i := 0 to FIntersectList.Count-1 do
    begin
      // make sure edges are adjacent, otherwise
      // change the intersection order before proceeding
      if not EdgesAdjacentInAEL(list[i]) then
      begin
        j := i;
        repeat
          inc(j);
        until EdgesAdjacentInAEL(list[j]);
        // now swap intersection order
        node := list[i];
        list[i] := list[j];
        list[j] := node;
      end;

and we get this asm for the inner repeat loop (performance is basically equal to using the pointer) - there can easily be some variations as soon as one uses an additional local variable at the "wrong" spot because the register allocator of the Delphi compiler sucks:

    Clipper.Engine.pas.3207: inc(j);
    00E89554 42               inc edx
    Clipper.Engine.pas.3206: repeat
    00E89555 8B0496           mov eax,[esi+edx*4]
    00E89558 8B08             mov ecx,[eax]
    00E8955A 8B4004           mov eax,[eax+$04]
    00E8955D 3B4144           cmp eax,[ecx+$44]
    00E89560 7405             jz $00e89567
    00E89562 3B4140           cmp eax,[ecx+$40]
    00E89565 75ED             jnz $00e89554

I cannot find it right now to verify but I think I've read somewhere that, as previously noted, this kind of instruction sequence might perform better (modifying the index and addressing into the array over shifting a pointer) because the CPU is able to fuse these instructions.

FWIW it's kinda interesting to see what some C++ compilers emit for some loop: https://godbolt.org/z/oq9Gzexa3
- September 2, 2022
- 23 replies
Profiling Clipper2

Stefan Glienke replied to angusj's topic in Delphi Third-Party

FWIW I see more improvement after my changes - but that can have various reasons.

Before optimization

Win32
    Testing edge count: 1000 time: 145 msecs
    Testing edge count: 2000 time: 689 msecs
    Testing edge count: 3000 time: 2.585 msecs
Win64
    Testing edge count: 1000 time: 128 msecs
    Testing edge count: 2000 time: 573 msecs
    Testing edge count: 3000 time: 2.087 msecs

Commit "Improved Delphi performance"

Win32
    Testing edge count: 1000 time: 149 msecs
    Testing edge count: 2000 time: 626 msecs
    Testing edge count: 3000 time: 2.379 msecs
Win64
    Testing edge count: 1000 time: 127 msecs
    Testing edge count: 2000 time: 497 msecs
    Testing edge count: 3000 time: 1.767 msecs

Further improvements

Win32
    Testing edge count: 1000 time: 141 msecs
    Testing edge count: 2000 time: 552 msecs
    Testing edge count: 3000 time: 1.840 msecs
Win64
    Testing edge count: 1000 time: 124 msecs
    Testing edge count: 2000 time: 493 msecs
    Testing edge count: 3000 time: 1.630 msecs

What we can clearly see from the results though is that your code is O(n²) - that is where you might want to invest some time.
- September 2, 2022
- 23 replies
Profiling Clipper2

Stefan Glienke replied to angusj's topic in Delphi Third-Party

First of all, using Int64 on x86 causes quite some impact already. You notice that simply compiling the benchmark as is on x64 makes it run faster - that is not because the x64 compiler is so amazing but simply because now all the Int64 operations are simple register operations. Second - and that is something I have been asking for for quite some time: let the compiler generate fewer conditional jump instructions in favor of conditional mov instructions - especially in comparers that can improve performance quite significantly. To a certain degree one can code that in pascal but that's usually not very pretty. As for the sorting: I did a try using IntroSort which gave a slight improvement over the RTL Quicksort.
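As a rough idea of what coding that in pascal can look like, here is a sketch of a three-way compare written without an if/else chain. CompareInt64 is a hypothetical helper, not part of Clipper2 or the RTL, and there is no guarantee that a given Delphi compiler turns the boolean expressions into setcc/cmov instead of jumps, so inspect the asm before relying on it.

```pascal
program BranchlessCompareSketch;

{$APPTYPE CONSOLE}

// Hypothetical helper for illustration only.
function CompareInt64(const A, B: Int64): Integer; inline;
begin
  // Ord of a Boolean expression is 0 or 1, so the difference is -1, 0 or 1.
  // No subtraction of A and B is involved, so it cannot overflow.
  Result := Ord(A > B) - Ord(A < B);
end;

begin
  Writeln(CompareInt64(5, 7)); // prints -1
  Writeln(CompareInt64(7, 7)); // prints 0
  Writeln(CompareInt64(9, 7)); // prints 1
end.
```

Whether this beats the plain if/else version depends on how predictable the branches are for the data being sorted, so it needs measuring on the real benchmark.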
- September 2, 2022
- 23 replies
Profiling Clipper2

Stefan Glienke replied to angusj's topic in Delphi Third-Party

That's one of the great things they've been doing in the JIT/compiler if you read through the blog article - it can detect when list access won't be out of its range so it can omit any superfluous range checks.

A few additional improvements to your source code:

    function IntersectListSort(node1, node2: Pointer): Integer;
    var
      pt1, pt2: ^TPoint64;
      i: Int64;
    begin
      // note to self - can't return int64 values :)
      pt1 := @PIntersectNode(node1).pt;
      pt2 := @PIntersectNode(node2).pt;
      i := pt2.Y - pt1.Y;
      if (i = 0) then
      begin
        if (pt1 = pt2) then
        begin
          Result := 0;
          Exit;
        end;
        // Sort by X too. Not essential, but it significantly
        // speeds up the secondary sort in ProcessIntersectList .
        i := pt1.X - pt2.X;
      end;

      if i > 0 then Result := 1
      else if i < 0 then Result := -1
      else Result := 0;
    end;

This eliminates as many repeated indirections as possible - that reduces register pressure (especially on x86).

The next improvement is in TClipperBase.ProcessIntersectList which takes the majority of overall time in the benchmark - namely the inner loop that increments j:

    for i := 0 to highI do
    begin
      // make sure edges are adjacent, otherwise
      // change the intersection order before proceeding
      node := UnsafeGet(FIntersectList, i);
      if not EdgesAdjacentInAEL(node) then
      begin
        j := i;
        repeat
          inc(j);
        until EdgesAdjacentInAEL(UnsafeGet(FIntersectList, j));
        // now swap intersection order
        FIntersectList.List[i] := UnsafeGet(FIntersectList, j);
        FIntersectList.List[j] := node;
      end;

First we store the node at i because it's the same after the loop - so there is no need to get it repeatedly. Since you incremented j by 1 anyway I chose to use a repeat loop because I know that it won't do a tail jump to check the condition first. Also it generates slightly better code with an inlined function as its condition.

Which leads to the third improvement:

    function EdgesAdjacentInAEL(node: PIntersectNode): Boolean;
      {$IFDEF INLINING} inline; {$ENDIF}
    var
      active1, active2: PActive;
    begin
      active1 := node.active1;
      active2 := node.active2;
      Result := (active1.nextInAEL = active2) or (active1.prevInAEL = active2);
    end;

Same story as before: eliminate any repeated indirections, fetch active1 and active2 only once instead of accessing both two times (the with you used there before does not change that).

Another big chunk of time can probably be shaved off by using a hand-coded quicksort that does not use the repeated call into IntersectListSort but avoids that call altogether.
- September 2, 2022
- 23 replies
