Jump to content
pyscripter

Revisiting TThreadedQueue and TMonitor

Recommended Posts

Posted (edited)

Success!  If I use the CAS function from OmniThreadLibrary it works.  So it appears that Delphi's InterlockedCompareExchange128 is broken.

 

Passed a test with 1000 threads and 50 million calls to NewWaitObj.

 

I am attaching the revised test code which should now work in both 32bits and 64bits.  We should now try to convince Embarcadero to include the fixes in the Delphi RTL.

iaStressTest.TThreadedQueue.PopItem.pas

Edited by pyscripter

Share this post


Link to post
8 hours ago, Kas Ob. said:

The assembly is good and had nothing to be fixed.

No it is not. It modifies rbx and does not restore it. 

Share this post


Link to post

@David HeffernanI know very little about assembler and I will leave this to experts to resolve.  However I do see push rbx and pop rbx in the code.  Does this not save and restore rbx?   This code has been in the master branch of OmniThreadLibrary for ages and should have been tested extensively.   It does appear to work well we the TMonitor fix.

Share this post


Link to post

I guess David is referring another assembler sequence.

Share this post


Link to post
Posted (edited)
13 minutes ago, Uwe Raabe said:

I guess David is referring another assembler sequence.

Well there is the Delphi implementation which does not work:

function InterlockedCompareExchange128(Destination: Pointer; ExchangeHigh, ExchangeLow: Int64; ComparandResult: Pointer): Bool; stdcall;
asm
      MOV   R10,RCX
      MOV   RBX,R8
      MOV   RCX,RDX
      MOV   RDX,[R9+8]
      MOV   RAX,[R9]
 LOCK CMPXCHG16B [R10]
      SETZ  AL
end;

and the CAS function from OmniThreadLibrary which works well.

function CAS(const oldData: pointer; oldReference: NativeInt; newData: pointer;
  newReference: NativeInt; var destination): boolean;
asm
{$IFNDEF CPUX64}
  push  edi
  push  ebx
  mov   ebx, newData
  mov   ecx, newReference
  mov   edi, destination
  lock cmpxchg8b qword ptr [edi]
  pop   ebx
  pop   edi
{$ELSE CPUX64}
  .noframe
  push  rbx                     //rsp := rsp - 8 !
  mov   rax, oldData
  mov   rbx, newData
  mov   rcx, newReference
  mov   r8, [destination + 8]   //+8 with respect to .noframe
  lock cmpxchg16b [r8]
  pop   rbx
{$ENDIF CPUX64}
  setz  al
end; { CAS }

@David Heffernanhas said that both are wrong (see earlier comments).   By the way Allen Bauer explains .PUSHNV here.

Edited by pyscripter

Share this post


Link to post
8 hours ago, pyscripter said:

And by the way. There is also the  [Align(8)] attribute which works with fields, global variables and records it seems.

Thank you very much, this is the first time i hear this !

Share this post


Link to post

CAS from OTL does restore rbx, but it should do so using .pushnv, for the reasons that Allen's article explains. 

  • Thanks 1

Share this post


Link to post
1 minute ago, David Heffernan said:

CAS from OTL does restore rbx, but it should do so using .pushnv, for the reasons that Allen's article explains. 

@David HeffernanThanks.  Could you please post here what the function should be?

Share this post


Link to post
4 minutes ago, David Heffernan said:

CAS from OTL does restore rbx, but it should do so using .pushnv, for the reasons that Allen's article explains. 

From dosumentation 

Quote

.PUSHNV REG

Generates code to save and restore the non-volatile general purpose register in prologue and epilogue.

It merely does add push REG at the begining and pop REG at the end, for rbx it is saved already in code and doesn't need this.

Share this post


Link to post
Posted (edited)
10 minutes ago, Kas Ob. said:

It merely does add push REG at the begining and pop REG at the end, for rbx it is saved already in code and doesn't need this.

Wrong. It also lets the compiler out the meta data, the unwind data, needed to restore the register in case of an exception. Read Allen's article. 

Edited by David Heffernan

Share this post


Link to post
15 minutes ago, David Heffernan said:

Wrong. It also lets the compiler out the meta data, the unwind data, needed to restore the register in case of an exception. Read Allen's article. 

Would you please point me to that article, i have no idea where to find it.

Share this post


Link to post
Just now, Kas Ob. said:

Would you please point me to that article, i have no idea where to find it.

It is linked by @pyscripter a few replies above.

  • Thanks 1

Share this post


Link to post

If I replace the push and pop rbx with .pushnv rbx and remove the .noframe the code crashes.   How is one supposed to use .pushnv?

 

function CAS(const oldData: pointer; oldReference: NativeInt; newData: pointer;
  newReference: NativeInt; var destination): boolean;
asm
{$IFNDEF CPUX64}

{$ELSE CPUX64}
  .pushnv rbx                   //rsp := rsp - 8 !
  mov   rax, oldData
  mov   rbx, newData
  mov   rcx, newReference
  mov   r8, [destination + 8]   //+8 with respect to .noframe
  lock cmpxchg16b [r8]
{$ENDIF CPUX64}
  setz  al
end; { CAS }

 

Share this post


Link to post
Posted (edited)

Probably remove that +8. But I'm guessing there. I'd never write asm using variable names. You simply have to write the whole thing using registers to keep track of what it where. 

Edited by David Heffernan
  • Like 1

Share this post


Link to post
9 hours ago, pyscripter said:

Actually Delphi does offer support for alignment (undocumented feature see @Uwe Raabe response in this Stackoverflow question).

Beware that this does not align memory allocated from the heap. For that you can use the following:

{$ifdef CPUX64}
  SetMinimumBlockAlignment(mba16Byte);
{$else CPUX64}
  SetMinimumBlockAlignment(mba8Byte);
{$endif CPUX64}

 

52 minutes ago, pyscripter said:

Could you please post here what the function should be?

This works for me:

function InterlockedCompareExchange128(Destination: Pointer; ExchangeHigh, ExchangeLow: Int64; ComparandResult: Pointer): Bool; stdcall;
asm
      .NOFRAME
      .PUSHNV R10
      .PUSHNV RAX
      .PUSHNV RBX
      .PUSHNV RCX
      MOV   R10,Destination
      MOV   RBX,ExchangeLow
      MOV   RCX,ExchangeHigh
      MOV   RDX,[ComparandResult+8]
      MOV   RAX,[ComparandResult]
 LOCK CMPXCHG16B [R10]
      SETZ  AL
end;

The PUSHNV RBX is translated to push RBX + pop RBX. The others do nothing.

Share this post


Link to post
Just now, pyscripter said:

If I replace the push and pop rbx with .pushnv rbx and remove the .noframe the code crashes.   How is one supposed to use .pushnv?


mov   r8, [destination]

This works and passes the test.

Share this post


Link to post
Posted (edited)
7 minutes ago, Anders Melander said:

Beware that this does not align memory allocated from the heap. For that you can use the following:


{$ifdef CPUX64}
  SetMinimumBlockAlignment(mba16Byte);
{$else CPUX64}
  SetMinimumBlockAlignment(mba8Byte);
{$endif CPUX64}

 

This works for me:


function InterlockedCompareExchange128(Destination: Pointer; ExchangeHigh, ExchangeLow: Int64; ComparandResult: Pointer): Bool; stdcall;
asm
      .NOFRAME
      .PUSHNV R10
      .PUSHNV RAX
      .PUSHNV RBX
      .PUSHNV RCX
      MOV   R10,Destination
      MOV   RBX,ExchangeLow
      MOV   RCX,ExchangeHigh
      MOV   RDX,[ComparandResult+8]
      MOV   RAX,[ComparandResult]
 LOCK CMPXCHG16B [R10]
      SETZ  AL
end;

The PUSHNV RBX is translated to push RBX + pop RBX. The others do nothing.

You are restoring volatile registers there. That's wrong. Most egregious is where you restore rax which has the return value. 

 

Ah, i read the bit where you say "the others do nothing". I guess the compiler knows they are volatile / used for param/return passing. I'd remove them all the same. 

 

You really must stop using parameter names in asm because it obfuscates the fact they they live in registers, and which ones. 

Edited by David Heffernan

Share this post


Link to post
3 minutes ago, David Heffernan said:

You are restoring volatile registers there. That's wrong

I'm aware that it's a noop in this case and I was leaving that detail to the compiler...

3 minutes ago, David Heffernan said:

Most egregious is where you restore rax which has the return value. 

...but that's a good point.

 

4 minutes ago, David Heffernan said:

You really must stop using parameter names in asm because it obfuscates the fact they they live in registers, and which ones. 

I get your point but I always start with registers and then refactor the code to use symbols to make it readable. I guess I should keep the register names in a comment.

 

Share this post


Link to post
6 minutes ago, Anders Melander said:

I always start with registers and then refactor the code to use symbols to make it readable. I guess I should keep the register names in a comment.

To me it is much cleaner to comment at the top of the routine which register each parameter travels in, and then stick to registers. That allows you to see teg register usage directly rather than having an extra level of indirection. 

Share this post


Link to post
1 hour ago, Anders Melander said:

Beware that this does not align memory allocated from the heap. For that you can use the following:


{$ifdef CPUX64}
  SetMinimumBlockAlignment(mba16Byte);
{$else CPUX64}
  SetMinimumBlockAlignment(mba8Byte);
{$endif CPUX64}

 

SetMinimumBlockAlignment does nothing in 64bits (see the source code).  Alignment is fixed at mba16Byte always.

  • Thanks 1

Share this post


Link to post

Watching this with great interest!  This level of knowledge is beyond me, but as a consumer of the Delphi library code it would be fantastic to hear that is has been fixed.  What's the best way to "suggest" that Embarcadero incorporate any found fixes?  Contacting Marco?  Voting for a QC report?  Something else?  A big thanks to all contributors on this, it really is fascinating.

  • Like 1

Share this post


Link to post
31 minutes ago, MarkShark said:

What's the best way to "suggest" that Embarcadero incorporate any found fixes?  Contacting Marco?  Voting for a QC report?

There's RSP-23333 which is about this problem. I've already posted a link to the original article and this thread to the report.

Beyond that it would be nice if someone on the 10.4 field test could bring it to the attention of those that can do something about it.

 

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×