Are the TInterlocked.Exchange and CompareExchange implementations really atomic with class types?

@AT · July 26

Recently I looked into the assembler code generated by Delphi 12.3 for the snippet below:

...
var
  [volatile] Src, Dst: TObject;
...
begin
...
  Src:=TObject.Create;
  Dst:=TInterlocked.Exchange(Src, nil);
  FreeAndNil(Dst);
  ...
 end;

And I see this code (Win32 Target):

TestInterlocked.dpr.31: Src:=TObject.Create;
003823FD B201             mov dl,$01
003823FF A1EC162B00       mov eax,[$002b16ec]
00382404 E85B66F3FF       call $002b8a64
00382409 8945FC           mov [ebp-$04],eax
TestInterlocked.dpr.32: Dst:=TInterlocked.Exchange(Src, nil);
0038240C 33C0             xor eax,eax
0038240E 8945F0           mov [ebp-$10],eax
00382411 8B45F0           mov eax,[ebp-$10]
00382414 8945EC           mov [ebp-$14],eax
00382417 8D45FC           lea eax,[ebp-$04]
0038241A 8B55EC           mov edx,[ebp-$14]
0038241D E886F5FFFF       call $003819a8
00382422 8945E8           mov [ebp-$18],eax
00382425 8B45E8           mov eax,[ebp-$18]
00382428 8945F8           mov [ebp-$08],eax
TestInterlocked.dpr.33: FreeAndNil(Dst);
0038242B 8B45F8           mov eax,[ebp-$08]
0038242E 8945E4           mov [ebp-$1c],eax
00382431 33C0             xor eax,eax
00382433 8945F8           mov [ebp-$08],eax
00382436 8B45E4           mov eax,[ebp-$1c]
00382439 E85666F3FF       call $002b8a94

I do not think the code generated for the Dst:=TInterlocked.Exchange(Src, nil); call is atomic and thread safe.

I compared this with what the compiler does for the next snippet:

...
var
  [volatile] Src, Dst: TObject;
...
begin
...
  Src:=TObject.Create;
  Dst:=AtomicExchange(pointer(Src), nil);
  FreeAndNil(Dst);
  ...
 end;

and see the big difference:

TestInterlocked.dpr.27: Src:=TObject.Create;
003823D2 B201             mov dl,$01
003823D4 A1EC162B00       mov eax,[$002b16ec]
003823D9 E88666F3FF       call $002b8a64
003823DE 8945FC           mov [ebp-$04],eax
TestInterlocked.dpr.28: Dst:=AtomicExchange(pointer(Src), nil);
003823E1 33C0             xor eax,eax
003823E3 F08745FC         lock xchg [ebp-$04],eax
003823E7 8945F8           mov [ebp-$08],eax
TestInterlocked.dpr.29: FreeAndNil(Dst);
003823EA 8B45F8           mov eax,[ebp-$08]
003823ED 8945F4           mov [ebp-$0c],eax
003823F0 33C0             xor eax,eax
003823F2 8945F8           mov [ebp-$08],eax
003823F5 8B45F4           mov eax,[ebp-$0c]
003823F8 E89766F3FF       call $002b8a94

The AtomicExchange intrinsic function generates atomic code.

I suspect that the reason is the type conversions used in TInterlocked.Exchange and TInterlocked.CompareExchange for the floating point types, TObject, and generic class types.

I haven't checked what assembler code pre-12.x compilers generate for the example above. I would appreciate it if someone could check it with 10.x and 11.x.

P.S.

TInterlocked.Exchange generates correct code for simple types (integers, pointers, boolean), but the additional code it adds for class types and floating point types makes them vulnerable to multithreaded race conditions.

Anders Melander · July 27

46 minutes ago, @AT said:

I do not think the code generated for the Dst:=TInterlocked.Exchange(Src, nil); call is atomic and thread safe.

Why? What in that assembler makes you think that?

TInterlocked.Exchange(pointer, pointer) and TInterlocked.Exchange(TObject, TObject) are both implemented with a call to AtomicExchange. That's the:

call $003819a8

58 minutes ago, @AT said:

I suspect that the reason is the type conversions used in TInterlocked.Exchange and TInterlocked.CompareExchange for the floating point types, TObject, and generic class types.

There's no type conversion as such. The compiler just doesn't inline the call as it should.

@AT · July 27

5 hours ago, Anders Melander said:

Why? What in that assembler makes you think that?

I showed the example above. The AtomicExchange primitive generates correct thread-safe code. The variable's value is exchanged with the register value in an atomic operation which cannot be interrupted in the middle of execution.

TestInterlocked.dpr.28: Dst:=AtomicExchange(pointer(Src), nil);
003823E1 33C0             xor eax,eax
003823E3 F08745FC         lock xchg [ebp-$04],eax
003823E7 8945F8           mov [ebp-$08],eax

If you look at the TInterlocked class source code you can find a bunch of type conversions:

class function TInterlocked.Exchange(var Target: Pointer; Value: Pointer): Pointer;
begin
  Result := AtomicExchange(Target, Value);
end;

...

class function TInterlocked.Exchange<T>(var Target: T; Value: T): T;
begin
  TObject((@Result)^) := Exchange(TObject((@Target)^), TObject((@Value)^));
end;

So the compiler inlines only the TInterlocked.Exchange<T>(var Target: T; Value: T): T function, with an explicit underlying call to TInterlocked.Exchange(var Target: Pointer; Value: Pointer): Pointer;

The assembler code shows the explicit conversions defined in the Pascal code, like TObject((@Target)^), TObject((@Value)^) and TObject((@Result)^)

As this assembler code contains multiple sets of instructions (moving registers to variables and vice versa) before and after the real exchange, the actual result may be overwritten in the middle by another thread. So my concern is that TInterlocked.Exchange is presented as an atomic primitive, which is not correct in some of the declared cases.

Edited July 27 by @AT

Remy Lebeau · July 27

4 hours ago, @AT said:

As this assembler code contains multiple sets of instructions (moving registers to variables and vice versa) before and after the real exchange, the actual result may be overwritten in the middle by another thread.

That is not true. During a task switch between two threads, CPU register values are preserved for the thread that is being switched from, and they are restored when that thread is switched back to. So threads cannot overwrite each other's register values. And the variables in question are all local to the calling thread's call stack, so they can't be overwritten by other threads, either (at least, in this example, anyway). The only way that values could possibly be overwritten are from variables that are being shared across thread boundaries (which is not the case in your example), and such overwriting is going to be sensitive to the timings between thread switches, so you are not guaranteed a particular result one way or the other whether you use the intrinsic exchange or class-wrapped exchange.

I think you are misdiagnosing the problem (if there even is a problem). Yes, the class version of the exchange is clearly less efficient than the intrinsic, but that doesn't mean the class version is any less thread-safe.

Edited July 27 by Remy Lebeau

Dalija Prasnikar · July 27

3 hours ago, @AT said:

As this assembler code contains multiple sets of instructions (moving registers to variables and vice versa) before and after the real exchange, the actual result may be overwritten in the middle by another thread. So my concern is that TInterlocked.Exchange is presented as an atomic primitive, which is not correct in some of the declared cases.

It is atomic, because only one thread will be able to make the exchange and retrieve the non-nil value stored in the Src variable, provided that all other threads also use atomic exchange. The extra shuffling does not matter for atomicity, as the values shuffled before the call are not related to the value stored in the Src variable (the one that will be atomically exchanged by the lock xchg instruction). Note that the lea instruction loads the address, not the value stored in the memory location.
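The point above can be checked with a small console program. This is only a minimal sketch, not code from the thread: the names Shared, Winners and ThreadCount are made up for illustration, and it assumes a Delphi version that has TThread.CreateAnonymousThread and System.SyncObjs.TInterlocked.

```pascal
program ExchangeOnce;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

const
  ThreadCount = 8; // arbitrary value, for illustration only

var
  Shared: TObject;  // the variable all threads race on (hypothetical name)
  Winners: Integer; // number of threads that received the non-nil value
  Threads: array[0..ThreadCount - 1] of TThread;
  I: Integer;

begin
  Shared := TObject.Create;
  Winners := 0;

  for I := 0 to ThreadCount - 1 do
  begin
    Threads[I] := TThread.CreateAnonymousThread(
      procedure
      var
        Taken: TObject;
      begin
        // Atomically store nil in Shared and take whatever was there before.
        Taken := TInterlocked.Exchange(Shared, nil);
        if Taken <> nil then
        begin
          TInterlocked.Increment(Winners);
          Taken.Free; // no other thread can have received this instance
        end;
      end);
    Threads[I].FreeOnTerminate := False; // we wait for and free them ourselves
    Threads[I].Start;
  end;

  for I := 0 to ThreadCount - 1 do
  begin
    Threads[I].WaitFor;
    Threads[I].Free;
  end;

  Writeln('Winners: ', Winners);
end.
```

However the threads are scheduled, exactly one of them should see a non-nil Taken, so the object is freed once and Winners ends up as 1. The extra register shuffling around the call only touches each thread's own locals.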

DelphiUdIT · July 27

I understand what @@AT means.
The entire class-based operation is not "singularized," and from a general perspective, it might seem that the operation's "atomicity" isn't guaranteed.
But atomicity concerns ONLY and EXCLUSIVELY the modification of the SRC value, nothing else.
And the modification of that value is guaranteed to be atomic via the LOCK.

As for its content, nothing can be defined, since if the variables are "constant" or "local," everything is certain, but if all the parts involved are global variables... then their value is definitely not certain at any given time.

But here we delve into other topics, involving synchronization techniques, barriers, etc. (and they also exist at the processor instruction level, such as "lfence", other than "lock").

And that is another story, as other writers in this topic said.

Edited July 27 by DelphiUdIT

@AT · July 28

Thank you folks for your comments.

I have to agree with your comments. My original example was simple, but did not show my concern.

The TInterlocked.CompareExchange methods use the same semantics and the same explicit conversion for the generic class types. I expected that the explicit conversion increases unsuccessful atomic exchange attempts in the generic variant compared to the Pointer variant, due to the larger number of CPU instructions in the generic method. However, my experiments show that they are close to each other.

Dalija Prasnikar · July 28

36 minutes ago, @AT said:

increases unsuccessful atomic exchange

What do you mean by this?

If there are multiple threads trying to do such an atomic operation on a variable, then only a single one will succeed regardless of which method is called. And one thread will always be successful. That is the whole point of an atomic operation. You should also remember that in multithreading there is no guarantee which thread will be able to make the atomic exchange, nor which thread will be able to acquire some lock. Even if the operation happens in a particular order 100 times, that does not mean that it will happen in the same order the next time.

@AT · July 28

I meant unsuccessful TInterlocked.CompareExchange calls, where Succeeded returns False.

For example, how many collisions (unsuccessful exchanges) may this code generate in a heavily loaded multithreaded environment:

var lSW: TSpinWait;
var lObj: TObject:=CreateNewObject; // CreateNewObject: TObject;
var lOldObj: TObject:=nil;
var lSucc: boolean;

lSW.Reset;
repeat
  // FField is a "global" variable of TObject type
  var lCmp:=FField;
  // CompareExchange returns the value found in FField before the call
  lOldObj:=TInterlocked.CompareExchange(FField, lObj, lCmp);
  lSucc:=lOldObj=lCmp; // the swap happened only if that value was lCmp
  if lSucc then
    break;
  lSW.SpinOnce; // back off before retrying
until False;

// do something with lOldObj below
// ...

in comparison to:

var lOldObj: TObject:=nil;

FCriticalSection.Enter; // FCriticalSection: TCriticalSection; "global" variable.
try
  lOldObj:=FField;
  FField:=CreateNewObject;
finally
  FCriticalSection.Leave;
end;
// do something with lOldObj below
// ...

It's clear that using the AtomicExchange functions can generate faster code (where possible) than code using a locking mechanism. But my worry was that a multiprocessor concurrent environment may generate a lot of "collisions" in attempts to update a shared variable with the TInterlocked.CompareExchange<T: class>... method due to the additional explicit type conversion code.

Edited July 29 by @AT

DelphiUdIT · July 28

53 minutes ago, @AT said:

I meant unsuccessful TInterlocked.CompareExchange calls, where Succeeded returns False.

For example, how many collisions (unsuccessful exchanges) may this code generate in a heavily loaded multithreaded environment:

The code with ".CompareExchange" in your example cannot exist in that way.

The "unsuccessful" exchange is not about a collision but a comparison: look at https://docwiki.embarcadero.com/Libraries/Athens/en/System.SyncObjs.TInterlocked.CompareExchange

I think you are confusing Interlocked with something else.

Interlocked operations ensure that these are performed SINGLY in a multithreaded environment (there's no point in using them otherwise).
But these operations are ALL ALWAYS performed. The hardware determines how to do it, but it does them all one at a time.
The sequence is not and cannot be known in a multithreaded environment.
For example, these operations can be used to increment a global counter from multiple threads.
If the "target," i.e., the memory to be modified, is the same (since it's the same variable), interlock operations prevent multiple threads from colliding and generating a race condition.
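The shared counter case mentioned above can be sketched as follows. This is an illustration only, not code from the thread: Counter, ThreadCount and IncrementsPerThread are made-up names and values.

```pascal
program InterlockedCounter;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

const
  ThreadCount = 4;              // arbitrary, for illustration
  IncrementsPerThread = 100000; // arbitrary, for illustration

var
  Counter: Integer; // shared by all threads
  Threads: array[0..ThreadCount - 1] of TThread;
  I: Integer;

begin
  Counter := 0;

  for I := 0 to ThreadCount - 1 do
  begin
    Threads[I] := TThread.CreateAnonymousThread(
      procedure
      var
        J: Integer;
      begin
        for J := 1 to IncrementsPerThread do
          TInterlocked.Increment(Counter); // every call is performed, one at a time
      end);
    Threads[I].FreeOnTerminate := False; // we wait for and free them ourselves
    Threads[I].Start;
  end;

  for I := 0 to ThreadCount - 1 do
  begin
    Threads[I].WaitFor;
    Threads[I].Free;
  end;

  // No increment is lost, so this is ThreadCount * IncrementsPerThread.
  Writeln('Counter: ', Counter);
end.
```

Replacing TInterlocked.Increment(Counter) with a plain Inc(Counter) would typically lose updates under contention, which is the race condition the interlocked call prevents.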

@AT · July 29

1 hour ago, DelphiUdIT said:

The code with ".CompareExchange" in your example cannot exist in that way.

Thank you for pointing it out. I corrected the example above.

I am not confusing atomic operations with locking primitives. I'm refactoring some legacy code which uses a lot of locking primitives, mostly for updating a single "global" variable of a simple type, class or pointer inside a protected code section. So I'm planning to replace some of them with atomic functions.

As I wrote above, I was worried about the assembler code generated for specific methods: TInterlocked.Exchange<T: class> and TInterlocked.CompareExchange<T: class>... This was the reason why I raised this question.

Anders Melander · July 29

3 hours ago, @AT said:

due to the additional explicit type conversion code.

Again: There is no type conversion.

A hard type-cast is not a type conversion. It's just telling the compiler to treat a variable as a specific type even though it actually is another type. The compiler allows this because the types are of the same binary size.
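A small sketch of what "same binary size, no conversion" means in practice. It is not from the thread; Obj and Raw are hypothetical names, and assertions are switched on explicitly so the checks actually run.

```pascal
program HardCastDemo;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

uses
  System.SysUtils;

var
  Obj: TObject;
  Raw: Pointer;

begin
  Obj := TObject.Create;
  try
    // A class-type variable holds a reference of pointer size.
    Assert(SizeOf(Obj) = SizeOf(Pointer));

    // Hard cast: the same bits, only viewed as another type.
    Raw := Pointer(Obj);
    Assert(TObject(Raw) = Obj);

    // The address-of plus dereference form used inside TInterlocked
    // reaches the same bits as well.
    Assert(TObject((@Raw)^) = Obj);

    Writeln('All casts refer to the same instance');
  finally
    Obj.Free;
  end;
end.
```

None of these casts creates a new value. They only change the type the compiler assigns to an existing memory location.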

@AT · July 29

7 minutes ago, Anders Melander said:

Again: There is no type conversion.

A hard type-cast is not a type conversion. It's just telling the compiler to treat a variable as a specific type even though it actually is another type. The compiler allows this because the types are of the same binary size.

OK, type-casting is the correct wording. Thanks again for correcting me.

However, there is a big difference between the pointer(someVar) and T((@someVar)^) type-casts when someVar is a class instance variable. The first variant does not add extra code, but the second one forces the compiler to take the address and then dereference it. This was the source of the original question.
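For reference, the call forms discussed in this thread side by side. A minimal sketch with the variable names from the first post; the generated code for each form depends on the compiler version and was not verified here. Each form performs the same atomic exchange on Src.

```pascal
program ExchangeForms;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

uses
  System.SysUtils, System.SyncObjs;

var
  Src, Dst: TObject;

begin
  // Form 1: the TObject overload, as in the original question.
  Src := TObject.Create;
  Dst := TInterlocked.Exchange(Src, nil);
  Assert((Src = nil) and (Dst <> nil));
  FreeAndNil(Dst);

  // Form 2: hard cast of the variable so the Pointer overload is selected.
  Src := TObject.Create;
  Dst := TObject(TInterlocked.Exchange(Pointer(Src), nil));
  Assert((Src = nil) and (Dst <> nil));
  FreeAndNil(Dst);

  // Form 3: the compiler intrinsic directly.
  Src := TObject.Create;
  Dst := TObject(AtomicExchange(Pointer(Src), nil));
  Assert((Src = nil) and (Dst <> nil));
  FreeAndNil(Dst);

  Writeln('All three forms exchanged the reference');
end.
```

The forms differ only in how much code surrounds the exchange, not in whether the exchange itself is atomic.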

Dalija Prasnikar · July 29

5 hours ago, @AT said:

I am not confusing atomic operations with locking primitives. I'm refactoring some legacy code which uses a lot of locking primitives, mostly for updating a single "global" variable of a simple type, class or pointer inside a protected code section. So I'm planning to replace some of them with atomic functions.

You should be very careful when doing this, because not all code logic protected with locks can be replaced with atomic operations.
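One case where a lock cannot simply be swapped for atomic calls is an invariant that spans more than one variable. The sketch below is hypothetical (TStats, Add and Average are made-up names, not from the thread) and keeps a sum and a count consistent with each other.

```pascal
program LockVersusAtomic;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.SyncObjs;

type
  // Hypothetical example class: FSum and FCount must always change together.
  TStats = class
  private
    FLock: TCriticalSection;
    FSum: Int64;
    FCount: Integer;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Add(Value: Integer);
    function Average: Double;
  end;

constructor TStats.Create;
begin
  inherited Create;
  FLock := TCriticalSection.Create;
end;

destructor TStats.Destroy;
begin
  FLock.Free;
  inherited;
end;

procedure TStats.Add(Value: Integer);
begin
  // Two separate interlocked calls (one per field) would each be atomic,
  // but a reader could run between them and see a new FSum with an old FCount.
  FLock.Enter;
  try
    Inc(FSum, Value);
    Inc(FCount);
  finally
    FLock.Leave;
  end;
end;

function TStats.Average: Double;
begin
  FLock.Enter;
  try
    if FCount = 0 then
      Result := 0
    else
      Result := FSum / FCount;
  finally
    FLock.Leave;
  end;
end;

var
  Stats: TStats;

begin
  Stats := TStats.Create;
  try
    Stats.Add(2);
    Stats.Add(4);
    Writeln('Average: ', Stats.Average:0:1);
  finally
    Stats.Free;
  end;
end.
```

A single reference or counter, as in the earlier examples of this thread, is a good candidate for an atomic operation. A pair of fields like this one is not, unless both are packed into one object that is replaced as a whole.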

DelphiUdIT · July 29

@@AT

I still think you don't understand the function of Interlocked and how it works.
Even in your example, you use loops to verify that the function "succeeds."
This is legal if the variables are modified by other threads and you enter that "loop" indefinitely until the variables assume an identical value to exit.
But since you also mention collisions, I think you actually believe that Interlocks are executed or not based on collisions.
That's not the case. It's not a mutex or a semaphore, and it's not a critical section.
Interlocked simply protects a memory area from "concurrent" modifications.
And note that in reality, most direct memory operations are already intrinsically protected.
For example, INTEL guarantees the atomicity of some operations such as direct reading or writing, which involves a register and "byte," word, dword, etc. memory if the memory is aligned.
XCHG instructions are also atomic (unlike complex instructions like CMPXCHG) as long as they operate on aligned memory.
LOCK can be used on multiple memory-operating instructions to ensure LOCK even in unintended cases (e.g., unaligned memory).
Ref: "Intel® 64 and IA-32 Architectures Optimization Reference Manual: Volume 1, April 2024";
Ref: "Intel® 64 and IA-32 Architectures Software Developer’s Manual, March 2025"

Caution: Interlocked functions generate a significant delay in instruction execution (i.e., the instruction executes in more clock cycles than normal).

Stefan Glienke · July 29

FWIW I reported the bad inlining due to the way these methods are implemented in TInterlocked and proposed the improvements in https://embt.atlassian.net/servicedesk/customer/portal/1/RSS-3862

Tommi Prami · July 29

For clarity, maybe a good code example would be needed. 🙂

I checked the Embarcadero documentation, and it would also benefit from code samples, and from an explanation of what those functions can and can't do...

-Tee-

Edited July 29 by Tommi Prami
Typo etc...

@AT · July 29

7 hours ago, Stefan Glienke said:

FWIW I reported the bad inlining due to the way these methods are implemented in TInterlocked and proposed the improvements in https://embt.atlassian.net/servicedesk/customer/portal/1/RSS-3862

Thank you Stefan! You ran ahead of me. From my point of view it's not a compiler inlining issue, but a source code issue. If you look at the Embarcadero sources, they use an explicit type cast like ...Exchange(pointer(InstanceVar)... much more often than ...Exchange<classtype>(... in their own code. However, that might be a result of the generic method being implemented late.

10 hours ago, Dalija Prasnikar said:

You should be very careful when doing this, because not all code logic protected with locks can be replaced with atomic operations.

Absolutely.

8 hours ago, DelphiUdIT said:

I still think you don't understand the function of Interlocked and how it works.

I appreciate you correcting my wording and providing additional information which could be useful for the community. However, it does not mean I lack understanding of the difference between atomic operations and locking primitives, or of the use cases where they should be used. This topic is closed for me; I already have a clear answer to my question.
