@AT 0 Posted Saturday at 11:30 PM

Recently I looked into the `assembler` code generated by Delphi 12.3 for the snippet below:

...
var
  [volatile] Src, Dst: TObject;
...
begin
  ...
  Src:=TObject.Create;
  Dst:=TInterlocked.Exchange(Src, nil);
  FreeAndNil(Dst);
  ...
end;

And I see this code (Win32 target):

TestInterlocked.dpr.31: Src:=TObject.Create;
003823FD B201         mov dl,$01
003823FF A1EC162B00   mov eax,[$002b16ec]
00382404 E85B66F3FF   call $002b8a64
00382409 8945FC       mov [ebp-$04],eax
TestInterlocked.dpr.32: Dst:=TInterlocked.Exchange(Src, nil);
0038240C 33C0         xor eax,eax
0038240E 8945F0       mov [ebp-$10],eax
00382411 8B45F0       mov eax,[ebp-$10]
00382414 8945EC       mov [ebp-$14],eax
00382417 8D45FC       lea eax,[ebp-$04]
0038241A 8B55EC       mov edx,[ebp-$14]
0038241D E886F5FFFF   call $003819a8
00382422 8945E8       mov [ebp-$18],eax
00382425 8B45E8       mov eax,[ebp-$18]
00382428 8945F8       mov [ebp-$08],eax
TestInterlocked.dpr.33: FreeAndNil(Dst);
0038242B 8B45F8       mov eax,[ebp-$08]
0038242E 8945E4       mov [ebp-$1c],eax
00382431 33C0         xor eax,eax
00382433 8945F8       mov [ebp-$08],eax
00382436 8B45E4       mov eax,[ebp-$1c]
00382439 E85666F3FF   call $002b8a94

I do not think the code generated for the Dst:=TInterlocked.Exchange(Src, nil); call is atomic and thread-safe. I compared it with what the compiler does for the next snippet:

...
var
  [volatile] Src, Dst: TObject;
...
begin
  ...
  Src:=TObject.Create;
  Dst:=AtomicExchange(pointer(Src), nil);
  FreeAndNil(Dst);
  ...
end;

and I see a big difference:

TestInterlocked.dpr.27: Src:=TObject.Create;
003823D2 B201         mov dl,$01
003823D4 A1EC162B00   mov eax,[$002b16ec]
003823D9 E88666F3FF   call $002b8a64
003823DE 8945FC       mov [ebp-$04],eax
TestInterlocked.dpr.28: Dst:=AtomicExchange(pointer(Src), nil);
003823E1 33C0         xor eax,eax
003823E3 F08745FC     lock xchg [ebp-$04],eax
003823E7 8945F8       mov [ebp-$08],eax
TestInterlocked.dpr.29: FreeAndNil(Dst);
003823EA 8B45F8       mov eax,[ebp-$08]
003823ED 8945F4       mov [ebp-$0c],eax
003823F0 33C0         xor eax,eax
003823F2 8945F8       mov [ebp-$08],eax
003823F5 8B45F4       mov eax,[ebp-$0c]
003823F8 E89766F3FF   call $002b8a94

The AtomicExchange intrinsic function generates atomic code. I suspect the reason for the difference is the type conversions used in TInterlocked.Exchange and TInterlocked.CompareExchange for the floating-point types, TObject, and generic class types. I haven't checked what assembler code the pre-12.x compilers generate for the example above. I would appreciate it if someone could check it with 10.x and 11.x.

P.S. TInterlocked.Exchange generates correct code for simple types (integers, pointers, boolean), but it adds extra code for class types and floating-point types, which makes them vulnerable to multithreaded race-condition errors.
Anders Melander 2038 Posted Sunday at 12:32 AM

46 minutes ago, @AT said:
"I do not think the code generated for the Dst:=TInterlocked.Exchange(Src, nil); call is atomic and thread safe."

Why? What in that assembler makes you think that? TInterlocked.Exchange(pointer, pointer) and TInterlocked.Exchange(TObject, TObject) are both implemented with a call to AtomicExchange. That's the:

call $003819a8

58 minutes ago, @AT said:
"I suspect that the reason of it is type conversions used in TInterlocked.Exchange, TInterlocked.CompareExchange for the floating point types, TObject, and class generic types."

There's no type conversion as such. The compiler just doesn't inline the call as it should.
@AT 0 Posted Sunday at 04:06 AM (edited)

5 hours ago, Anders Melander said:
"Why? What in that assembler makes you think that?"

I showed it in the example above. The AtomicExchange primitive generates correct thread-safe code. The variable's value is exchanged with the register value in one atomic operation, which cannot be interrupted in the middle of execution.

TestInterlocked.dpr.28: Dst:=AtomicExchange(pointer(Src), nil);
003823E1 33C0         xor eax,eax
003823E3 F08745FC     lock xchg [ebp-$04],eax
003823E7 8945F8       mov [ebp-$08],eax

If you look at the TInterlocked class source code, you can find a bunch of type conversions:

class function TInterlocked.Exchange(var Target: Pointer; Value: Pointer): Pointer;
begin
  Result := AtomicExchange(Target, Value);
end;
...
class function TInterlocked.Exchange<T>(var Target: T; Value: T): T;
begin
  TObject((@Result)^) := Exchange(TObject((@Target)^), TObject((@Value)^));
end;

So the compiler inlines only the TInterlocked.Exchange<T>(var Target: T; Value: T): T function, which still makes an explicit underlying call to TInterlocked.Exchange(var Target: Pointer; Value: Pointer): Pointer. The assembler code shows the explicit conversions defined in the Pascal code, such as TObject((@Target)^), TObject((@Value)^) and TObject((@Result)^).

Because this assembler code contains several sets of instructions that move values from registers to variables and back, both before and after the real exchange, the actual result may be overwritten in the middle by another thread.

So my concern is that TInterlocked.Exchange is called an atomic primitive, which is not correct in some of the declared cases.

Edited Sunday at 05:34 AM by @AT
Remy Lebeau 1623 Posted Sunday at 07:45 AM (edited)

4 hours ago, @AT said:
"As this assembler code generates multiple set of instructions (tossing register to variables and) vice versa before and after the real exchange, this actual result may be overwritten in the middle by other thread."

That is not true. During a task switch between two threads, CPU register values are preserved for the thread that is being switched from, and they are restored when that thread is switched back to. So threads cannot overwrite each other's register values. And the variables in question are all local to the calling thread's call stack, so they can't be overwritten by other threads, either (at least, in this example, anyway).

The only way that values could possibly be overwritten is through variables that are shared across thread boundaries (which is not the case in your example), and such overwriting is going to be sensitive to the timing of thread switches, so you are not guaranteed a particular result one way or the other, whether you use the intrinsic exchange or the class-wrapped exchange.

I think you are misdiagnosing the problem (if there even is a problem). Yes, the class version of the exchange is clearly less efficient than the intrinsic, but that doesn't mean the class version is any less thread-safe.

Edited Sunday at 08:28 AM by Remy Lebeau
Dalija Prasnikar 1524 Posted Sunday at 08:15 AM

3 hours ago, @AT said:
"As this assembler code generates multiple set of instructions (tossing register to variables and) vice versa before and after the real exchange, this actual result may be overwritten in the middle by other thread. So my concern is that the TInterlocked.Exchange is called as and atomic primitive, which is not correct in some of declared cases."

It is atomic, because only one thread will be able to make the exchange and retrieve the non-nil value stored in the Src variable, provided that all other threads also use an atomic exchange. The extra shuffling does not matter for atomicity, as the values shuffled before the call are not related to the value stored in the Src variable (the one that will be atomically exchanged by the lock xchg instruction). Note that the lea instruction loads the address, not the value stored in the memory location.
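The claim that only one thread can retrieve the non-nil value can be checked empirically. The following is a minimal sketch, not a proof: the program name, the `Shared` and `Winners` variables, the `RunOnce` helper, and the thread and pass counts are all made up for illustration. It assumes the `TInterlocked.Exchange(TObject, TObject)` overload used in the opening post and lets several threads race on one shared variable.

```delphi
program ExchangeRace;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

var
  Shared: TObject;   // the only variable touched by more than one thread
  Winners: Integer;  // number of threads that received the non-nil value

// Hypothetical helper: lets ThreadCount threads race for Shared once.
procedure RunOnce(ThreadCount: Integer);
var
  Threads: array of TThread;
  I: Integer;
begin
  Shared := TObject.Create;
  Winners := 0;
  SetLength(Threads, ThreadCount);
  for I := 0 to High(Threads) do
  begin
    Threads[I] := TThread.CreateAnonymousThread(
      procedure
      var
        Taken: TObject;  // local to each thread, never shared
      begin
        // The class-wrapped exchange under discussion.
        Taken := TInterlocked.Exchange(Shared, nil);
        if Taken <> nil then
        begin
          TInterlocked.Increment(Winners);
          Taken.Free;
        end;
      end);
    // Keep the thread object alive so that WaitFor can be called on it.
    Threads[I].FreeOnTerminate := False;
  end;
  for I := 0 to High(Threads) do
    Threads[I].Start;
  for I := 0 to High(Threads) do
  begin
    Threads[I].WaitFor;
    Threads[I].Free;
  end;
end;

var
  Pass: Integer;
begin
  for Pass := 1 to 1000 do
  begin
    RunOnce(8);
    if Winners <> 1 then
    begin
      Writeln(Format('pass %d: %d winners', [Pass, Winners]));
      Halt(1);
    end;
  end;
  Writeln('every pass had exactly one winner');
end.
```

If the wrapped exchange were not atomic, some pass would be expected to report zero or several winners, or to crash on a double free. A clean run is only evidence, since race conditions can hide for a long time, but it matches the reasoning above: the extra mov instructions only touch thread-local stack slots.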
DelphiUdIT 254 Posted Sunday at 09:14 AM (edited)

I understand what @AT means. The entire class-based operation is not "singularized," and from a general perspective, it might seem that the operation's "atomicity" isn't guaranteed.

But atomicity concerns ONLY and EXCLUSIVELY the modification of the SRC value, nothing else. And the modification of that value is guaranteed to be atomic via the LOCK.

As for its content, nothing can be said in general: if the variables are "constant" or "local," everything is certain, but if all the parts involved are global variables, then their value is definitely not certain at any given time.

But here we delve into other topics, involving synchronization techniques, barriers, etc. (they also exist at the processor instruction level, such as "lfence" in addition to "lock"). And that is another story, as other writers in this topic have said.

Edited Sunday at 09:15 AM by DelphiUdIT
@AT 0 Posted 14 hours ago

Thank you, folks, for your comments. I have to agree with them.

My original example was simple, but it did not show my concern. The TInterlocked.CompareExchange methods use the same semantics and the same explicit conversion for the generic class types.

I expected the explicit conversion to increase the number of unsuccessful atomic exchange attempts in the generic variant compared to the Pointer variant, because the generic variant executes more CPU instructions. However, my experiments show that the two are close to each other.
Dalija Prasnikar 1524 Posted 13 hours ago

36 minutes ago, @AT said:
"increases unsuccessful atomic exchange"

What do you mean by this? If there are multiple threads trying to do such an atomic operation on a variable, then only a single one will succeed, regardless of which method is called. And one thread will always be successful. That is the whole point of an atomic operation.

You should also remember that in multithreading there is no guarantee which thread will be able to make the atomic exchange, nor which thread will be able to acquire some lock. Even if an operation happens in a particular order 100 times, that does not mean it will happen in the same order the next time.
@AT 0 Posted 13 hours ago (edited)

I meant unsuccessful TInterlocked.CompareExchange calls, the ones that return False. For example, how many collisions (unsuccessful exchanges) may this code generate in a heavily loaded multithreaded environment:

var lSW: TSpinWait;
var lObj: TObject:=CreateNewObject; // CreateNewObject: TObject;
var lOldObj: TObject:=nil;
var lSucc: boolean;
lSW.Reset;
repeat
  // FField is a "global" variable of type TObject
  var lCmp:=FField;
  lOldObj:=TInterlocked.CompareExchange(FField, lObj, lCmp, lSucc);
  if lSucc then
    break;
  lSW.SpinOnce;
until False;
// do something with lOldObj below
// ...

in comparison to:

var lOldObj: TObject:=nil;
FCriticalSection.Enter; // FCriticalSection: TCriticalSection; "global" variable.
try
  lOldObj:=FField;
  FField:=CreateNewObject;
finally
  FCriticalSection.Leave;
end;
// do something with lOldObj below
// ...

It's clear that using the AtomicExchange functions can generate faster code (where possible) than code using a locking mechanism. But my worry was that a multiprocessor concurrent environment may generate a lot of "collisions" in attempts to update a shared variable with the TInterlocked.CompareExchange<T: class> method, due to the additional explicit type conversion code.

Edited 10 hours ago by @AT
DelphiUdIT 254 Posted 11 hours ago

53 minutes ago, @AT said:
"I meant unsuccessful Interlocked.CompareExchange calls returns False. for example, how many collisions (unsuccessful exchanges) this code may generate in heavy loaded multithreading environment:"

The code with ".CompareExchange" in your example cannot exist in that way. The "unsuccessful" exchange is not about a collision but about a comparison: look at https://docwiki.embarcadero.com/Libraries/Athens/en/System.SyncObjs.TInterlocked.CompareExchange

I think you confuse Interlocked with something else. Interlocked operations ensure that they are performed SINGLY in a multithreaded environment (there's no point in using them otherwise). But these operations are ALL ALWAYS performed. The hardware determines how to do it, but it does them all, one at a time. The sequence is not and cannot be known in a multithreaded environment.

For example, these operations can be used to increment a global counter from multiple threads. If the "target," i.e., the memory to be modified, is the same (since it's the same variable), interlocked operations prevent multiple threads from colliding and generating a race condition.
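To make the "comparison, not collision" point concrete, here is a minimal compare-and-swap retry sketch. It is an illustration under assumptions, not the RTL's or any poster's code: `FField`, `Retries`, and `ReplaceField` are hypothetical names, and the sketch uses the three-argument `TInterlocked.CompareExchange(TObject, TObject, TObject)` overload, detecting success by checking whether the returned old value equals the comparand.

```delphi
program CasRetry;

{$APPTYPE CONSOLE}

uses
  System.SyncObjs;

var
  FField: TObject;   // hypothetical shared slot, as in the posts above
  Retries: Integer;  // attempts whose comparison failed

// Installs NewObj into FField and returns the object that was replaced.
function ReplaceField(NewObj: TObject): TObject;
var
  Seen: TObject;
begin
  repeat
    // Snapshot of the current value; it may be stale by the next line.
    Seen := FField;
    // The exchange happens only if FField still equals Seen.
    // The value FField held at that instant is returned either way.
    Result := TInterlocked.CompareExchange(FField, NewObj, Seen);
    if Result = Seen then
      Exit;  // comparison matched, NewObj is now installed
    // Another thread changed FField between the read and the CAS.
    TInterlocked.Increment(Retries);
  until False;
end;

var
  Old: TObject;
begin
  FField := TObject.Create;
  Old := ReplaceField(TObject.Create);
  Old.Free;
  FField.Free;
  Writeln('failed comparisons: ', Retries);
end.
```

A failed attempt here means only that some other thread completed its own update first, so the system as a whole always makes progress. The retry rate depends on how often `FField` changes between the plain read and the CAS, which is a window of a few instructions in either the pointer or the class-typed variant. For an unconditional replacement like this one, a single `TInterlocked.Exchange` does the same job without a loop; compare-and-swap earns its keep when the new value depends on the old one.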
@AT 0 Posted 9 hours ago

1 hour ago, DelphiUdIT said:
"The code with ".CompareExchange" in you example cannot exist in that way."

Thank you for pointing it out. I corrected the example above.

I do not confuse atomic operations with locking primitives. I'm refactoring some legacy code which uses a lot of locking primitives, mostly for updating a single "global" variable of a simple type, class or pointer inside a protected code section. So I'm planning to replace some of them with atomic functions.

As I wrote above, I was worried about the assembler code generated for specific methods: TInterlocked.Exchange<T: class> and TInterlocked.CompareExchange<T: class>. This was the reason why I raised this question.
Anders Melander 2038 Posted 9 hours ago

3 hours ago, @AT said:
"due to additional explicit type conversion code."

Again: There is no type conversion. A hard type-cast is not a type conversion. It's just telling the compiler to treat a variable as a specific type even though it actually is another type. The compiler allows this because the types are of the same binary size.
@AT 0 Posted 9 hours ago

7 minutes ago, Anders Melander said:
"Again: There is no type conversion. A hard type-cast is not a type conversion. It's just telling the compiler to treat a variable as a specific type even though it actually is another type. The compiler allows this because the types are of the same binary size."

OK, type-casting is the correct wording. Thanks again for correcting me.

However, there is a big difference between the pointer(someVar) and T((@someVar)^) type-casts when someVar is a class instance variable. The first variant does not add extra code, but the second one forces the compiler to take the pointer and then dereference it. This was the source of the original question.
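The two cast forms can be compared directly in a small console program. This is only an illustrative sketch with made-up variable names. It shows that both forms reinterpret the same bits of the variable and perform no conversion, but it says nothing about how many instructions a given compiler version emits for each.

```delphi
program CastForms;

{$APPTYPE CONSOLE}

var
  Obj: TObject;
  Direct: Pointer;  // result of the pointer(someVar) form
  Alias: TObject;   // result of the T((@someVar)^) form
begin
  Obj := TObject.Create;
  try
    // Form 1: hard cast of the reference itself.
    Direct := Pointer(Obj);
    // Form 2: take the variable's address, dereference it, and
    // reinterpret the storage as TObject (the pattern quoted from
    // the TInterlocked source earlier in the thread).
    Alias := TObject((@Obj)^);
    if Direct = Pointer(Alias) then
      Writeln('both casts yield the same reference')
    else
      Writeln('the casts differ');
  finally
    Obj.Free;
  end;
end.
```

Any extra load and store instructions for the second form are a code-generation detail. They operate on the calling thread's own stack slots, which is why they affect speed but not atomicity.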
Dalija Prasnikar 1524 Posted 4 hours ago

5 hours ago, @AT said:
"I do not confuse atomic operation with locking primitives. I'm refactoring some legacy code which use a lot of locking primitives mostly for updating a single "global" variable of simple type, class or pointer inside a protected code section. So I'm planning to replace some of them with an atomic functions."

You should be very careful when doing this, because not all code logic protected with locks can be replaced with atomic operations.
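One common case where a lock cannot simply be swapped for an atomic operation is an update that must change two fields together. The sketch below uses a hypothetical `TRange` class, invented for illustration, whose two bounds must stay consistent. Two separate atomic writes would let another thread observe a new low bound paired with an old high bound, so a critical section is kept.

```delphi
program LockedPair;

{$APPTYPE CONSOLE}

uses
  System.SyncObjs;

type
  // Hypothetical example type: two fields that must change together.
  TRange = class
  private
    FLock: TCriticalSection;
    FLow, FHigh: Integer;  // invariant: FLow <= FHigh
  public
    constructor Create;
    destructor Destroy; override;
    procedure SetBounds(ALow, AHigh: Integer);
    procedure GetBounds(out ALow, AHigh: Integer);
  end;

constructor TRange.Create;
begin
  inherited Create;
  FLock := TCriticalSection.Create;
end;

destructor TRange.Destroy;
begin
  FLock.Free;
  inherited;
end;

procedure TRange.SetBounds(ALow, AHigh: Integer);
begin
  FLock.Enter;
  try
    // Both writes become visible to readers as one unit.
    FLow := ALow;
    FHigh := AHigh;
  finally
    FLock.Leave;
  end;
end;

procedure TRange.GetBounds(out ALow, AHigh: Integer);
begin
  FLock.Enter;
  try
    // Readers take the same lock, so they never see a half-updated pair.
    ALow := FLow;
    AHigh := FHigh;
  finally
    FLock.Leave;
  end;
end;

var
  R: TRange;
  MinV, MaxV: Integer;
begin
  R := TRange.Create;
  try
    R.SetBounds(10, 20);
    R.GetBounds(MinV, MaxV);
    Writeln(MinV, ' .. ', MaxV);
  finally
    R.Free;
  end;
end.
```

A single reference or integer that is read and replaced as a whole is the case where an atomic exchange fits. Anything that reads several values, or derives a decision from a value and then acts on it later, generally still needs the lock or a redesign.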
DelphiUdIT 254 Posted 3 hours ago

@AT I still think you don't understand the function of Interlocked and how it works. Even in your example, you use a loop to verify that the function "succeeds." This is legitimate if the variables are modified by other threads and you stay in that loop until the compared values are identical. But since you also mention collisions, I think you actually believe that Interlocked operations are executed or not depending on collisions. That's not the case. It's not a mutex or a semaphore, and it's not a critical section. Interlocked simply protects a memory location from "concurrent" modifications.

And note that in reality, most direct memory operations are already intrinsically protected. For example, Intel guarantees the atomicity of some operations, such as a direct read or write between a register and byte, word, dword, etc. memory, if the memory is aligned. XCHG instructions are also atomic (unlike complex instructions like CMPXCHG) as long as they operate on aligned memory. LOCK can be used on several memory-operating instructions to ensure atomicity even in cases where it is not otherwise guaranteed (e.g., unaligned memory).

Ref: "Intel® 64 and IA-32 Architectures Optimization Reference Manual: Volume 1, April 2024"
Ref: "Intel® 64 and IA-32 Architectures Software Developer’s Manual, March 2025"

Caution: Interlocked functions add a significant delay to instruction execution (i.e., the instruction takes more clock cycles than normal).
Stefan Glienke 2143 Posted 2 hours ago

FWIW I reported the bad inlining due to the way these methods are implemented in TInterlocked and proposed the improvements in https://embt.atlassian.net/servicedesk/customer/portal/1/RSS-3862
Tommi Prami 154 Posted 1 hour ago (edited)

For clarity, maybe a good code example would be needed. 🙂

I checked the Embarcadero documentation, and it would also benefit from code samples and an explanation of what these functions can and can't do...

-Tee-

Edited 1 hour ago by Tommi Prami Typo etc...