pyscripter 689 Posted May 17, 2020 (edited) Success! If I use the CAS function from OmniThreadLibrary it works. So it appears that Delphi's InterlockedCompareExchange128 is broken. Passed a test with 1000 threads and 50 million calls to NewWaitObj. I am attaching the revised test code which should now work in both 32bits and 64bits. We should now try to convince Embarcadero to include the fixes in the Delphi RTL. iaStressTest.TThreadedQueue.PopItem.pas Edited May 17, 2020 by pyscripter Share this post Link to post
David Heffernan 2345 Posted May 17, 2020 8 hours ago, Kas Ob. said: The assembly is good and had nothing to be fixed. No it is not. It modifies rbx and does not restore it. Share this post Link to post
pyscripter 689 Posted May 17, 2020 @David HeffernanI know very little about assembler and I will leave this to experts to resolve. However I do see push rbx and pop rbx in the code. Does this not save and restore rbx? This code has been in the master branch of OmniThreadLibrary for ages and should have been tested extensively. It does appear to work well we the TMonitor fix. Share this post Link to post
Uwe Raabe 2058 Posted May 17, 2020 I guess David is referring another assembler sequence. Share this post Link to post
pyscripter 689 Posted May 17, 2020 (edited) 13 minutes ago, Uwe Raabe said: I guess David is referring another assembler sequence. Well there is the Delphi implementation which does not work: function InterlockedCompareExchange128(Destination: Pointer; ExchangeHigh, ExchangeLow: Int64; ComparandResult: Pointer): Bool; stdcall; asm MOV R10,RCX MOV RBX,R8 MOV RCX,RDX MOV RDX,[R9+8] MOV RAX,[R9] LOCK CMPXCHG16B [R10] SETZ AL end; and the CAS function from OmniThreadLibrary which works well. function CAS(const oldData: pointer; oldReference: NativeInt; newData: pointer; newReference: NativeInt; var destination): boolean; asm {$IFNDEF CPUX64} push edi push ebx mov ebx, newData mov ecx, newReference mov edi, destination lock cmpxchg8b qword ptr [edi] pop ebx pop edi {$ELSE CPUX64} .noframe push rbx //rsp := rsp - 8 ! mov rax, oldData mov rbx, newData mov rcx, newReference mov r8, [destination + 8] //+8 with respect to .noframe lock cmpxchg16b [r8] pop rbx {$ENDIF CPUX64} setz al end; { CAS } @David Heffernanhas said that both are wrong (see earlier comments). By the way Allen Bauer explains .PUSHNV here. Edited May 17, 2020 by pyscripter Share this post Link to post
Guest Posted May 17, 2020 8 hours ago, pyscripter said: And by the way. There is also the [Align(8)] attribute which works with fields, global variables and records it seems. Thank you very much, this is the first time i hear this ! Share this post Link to post
David Heffernan 2345 Posted May 17, 2020 CAS from OTL does restore rbx, but it should do so using .pushnv, for the reasons that Allen's article explains. 1 Share this post Link to post
Guest Posted May 17, 2020 (edited) Sorry for this question, but where can i find the information about this alignement functionality and when they are introduced, as i see they are working with Seattle but not with 2009. Found my answer https://stackoverflow.com/questions/8460862/what-does-packed-now-forces-byte-alignment-of-records-mean Thank you @Uwe Raabe better late than never. Edited May 17, 2020 by Guest Share this post Link to post
pyscripter 689 Posted May 17, 2020 1 minute ago, David Heffernan said: CAS from OTL does restore rbx, but it should do so using .pushnv, for the reasons that Allen's article explains. @David HeffernanThanks. Could you please post here what the function should be? Share this post Link to post
Guest Posted May 17, 2020 4 minutes ago, David Heffernan said: CAS from OTL does restore rbx, but it should do so using .pushnv, for the reasons that Allen's article explains. From dosumentation Quote .PUSHNV REG Generates code to save and restore the non-volatile general purpose register in prologue and epilogue. It merely does add push REG at the begining and pop REG at the end, for rbx it is saved already in code and doesn't need this. Share this post Link to post
David Heffernan 2345 Posted May 17, 2020 (edited) 10 minutes ago, Kas Ob. said: It merely does add push REG at the begining and pop REG at the end, for rbx it is saved already in code and doesn't need this. Wrong. It also lets the compiler out the meta data, the unwind data, needed to restore the register in case of an exception. Read Allen's article. Edited May 17, 2020 by David Heffernan Share this post Link to post
Guest Posted May 17, 2020 15 minutes ago, David Heffernan said: Wrong. It also lets the compiler out the meta data, the unwind data, needed to restore the register in case of an exception. Read Allen's article. Would you please point me to that article, i have no idea where to find it. Share this post Link to post
Leif Uneus 43 Posted May 17, 2020 Just now, Kas Ob. said: Would you please point me to that article, i have no idea where to find it. It is linked by @pyscripter a few replies above. Share this post Link to post
pyscripter 689 Posted May 17, 2020 If I replace the push and pop rbx with .pushnv rbx and remove the .noframe the code crashes. How is one supposed to use .pushnv? function CAS(const oldData: pointer; oldReference: NativeInt; newData: pointer; newReference: NativeInt; var destination): boolean; asm {$IFNDEF CPUX64} {$ELSE CPUX64} .pushnv rbx //rsp := rsp - 8 ! mov rax, oldData mov rbx, newData mov rcx, newReference mov r8, [destination + 8] //+8 with respect to .noframe lock cmpxchg16b [r8] {$ENDIF CPUX64} setz al end; { CAS } Share this post Link to post
David Heffernan 2345 Posted May 17, 2020 (edited) Probably remove that +8. But I'm guessing there. I'd never write asm using variable names. You simply have to write the whole thing using registers to keep track of what it where. Edited May 17, 2020 by David Heffernan 1 Share this post Link to post
Anders Melander 1784 Posted May 17, 2020 9 hours ago, pyscripter said: Actually Delphi does offer support for alignment (undocumented feature see @Uwe Raabe response in this Stackoverflow question). Beware that this does not align memory allocated from the heap. For that you can use the following: {$ifdef CPUX64} SetMinimumBlockAlignment(mba16Byte); {$else CPUX64} SetMinimumBlockAlignment(mba8Byte); {$endif CPUX64} 52 minutes ago, pyscripter said: Could you please post here what the function should be? This works for me: function InterlockedCompareExchange128(Destination: Pointer; ExchangeHigh, ExchangeLow: Int64; ComparandResult: Pointer): Bool; stdcall; asm .NOFRAME .PUSHNV R10 .PUSHNV RAX .PUSHNV RBX .PUSHNV RCX MOV R10,Destination MOV RBX,ExchangeLow MOV RCX,ExchangeHigh MOV RDX,[ComparandResult+8] MOV RAX,[ComparandResult] LOCK CMPXCHG16B [R10] SETZ AL end; The PUSHNV RBX is translated to push RBX + pop RBX. The others do nothing. Share this post Link to post
Leif Uneus 43 Posted May 17, 2020 Just now, pyscripter said: If I replace the push and pop rbx with .pushnv rbx and remove the .noframe the code crashes. How is one supposed to use .pushnv? mov r8, [destination] This works and passes the test. Share this post Link to post
David Heffernan 2345 Posted May 17, 2020 (edited) 7 minutes ago, Anders Melander said: Beware that this does not align memory allocated from the heap. For that you can use the following: {$ifdef CPUX64} SetMinimumBlockAlignment(mba16Byte); {$else CPUX64} SetMinimumBlockAlignment(mba8Byte); {$endif CPUX64} This works for me: function InterlockedCompareExchange128(Destination: Pointer; ExchangeHigh, ExchangeLow: Int64; ComparandResult: Pointer): Bool; stdcall; asm .NOFRAME .PUSHNV R10 .PUSHNV RAX .PUSHNV RBX .PUSHNV RCX MOV R10,Destination MOV RBX,ExchangeLow MOV RCX,ExchangeHigh MOV RDX,[ComparandResult+8] MOV RAX,[ComparandResult] LOCK CMPXCHG16B [R10] SETZ AL end; The PUSHNV RBX is translated to push RBX + pop RBX. The others do nothing. You are restoring volatile registers there. That's wrong. Most egregious is where you restore rax which has the return value. Ah, i read the bit where you say "the others do nothing". I guess the compiler knows they are volatile / used for param/return passing. I'd remove them all the same. You really must stop using parameter names in asm because it obfuscates the fact they they live in registers, and which ones. Edited May 17, 2020 by David Heffernan Share this post Link to post
Anders Melander 1784 Posted May 17, 2020 2 minutes ago, Leif Uneus said: This works and passes the test. As far as I can tell that's a coincidence. RDX needs a value: https://www.felixcloutier.com/x86/cmpxchg8b:cmpxchg16b Share this post Link to post
Anders Melander 1784 Posted May 17, 2020 3 minutes ago, David Heffernan said: You are restoring volatile registers there. That's wrong I'm aware that it's a noop in this case and I was leaving that detail to the compiler... 3 minutes ago, David Heffernan said: Most egregious is where you restore rax which has the return value. ...but that's a good point. 4 minutes ago, David Heffernan said: You really must stop using parameter names in asm because it obfuscates the fact they they live in registers, and which ones. I get your point but I always start with registers and then refactor the code to use symbols to make it readable. I guess I should keep the register names in a comment. Share this post Link to post
David Heffernan 2345 Posted May 17, 2020 6 minutes ago, Anders Melander said: I always start with registers and then refactor the code to use symbols to make it readable. I guess I should keep the register names in a comment. To me it is much cleaner to comment at the top of the routine which register each parameter travels in, and then stick to registers. That allows you to see teg register usage directly rather than having an extra level of indirection. Share this post Link to post
pyscripter 689 Posted May 17, 2020 1 hour ago, Anders Melander said: Beware that this does not align memory allocated from the heap. For that you can use the following: {$ifdef CPUX64} SetMinimumBlockAlignment(mba16Byte); {$else CPUX64} SetMinimumBlockAlignment(mba8Byte); {$endif CPUX64} SetMinimumBlockAlignment does nothing in 64bits (see the source code). Alignment is fixed at mba16Byte always. 1 Share this post Link to post
MarkShark 27 Posted May 17, 2020 Watching this with great interest! This level of knowledge is beyond me, but as a consumer of the Delphi library code it would be fantastic to hear that is has been fixed. What's the best way to "suggest" that Embarcadero incorporate any found fixes? Contacting Marco? Voting for a QC report? Something else? A big thanks to all contributors on this, it really is fascinating. 1 Share this post Link to post
Anders Melander 1784 Posted May 17, 2020 31 minutes ago, MarkShark said: What's the best way to "suggest" that Embarcadero incorporate any found fixes? Contacting Marco? Voting for a QC report? There's RSP-23333 which is about this problem. I've already posted a link to the original article and this thread to the report. Beyond that it would be nice if someone on the 10.4 field test could bring it to the attention of those that can do something about it. Share this post Link to post
Koru 2 Posted May 17, 2020 Reported https://quality.embarcadero.com/browse/RSB-4115 Already openned Share this post Link to post