@Darian Miller has published a very nice article about the state of TThreadedQueue and TMonitor in Delphi. He has also published on GitHub a stress test that shows how TThreadedQueue still fails under stress.
I have played with his stress test and concluded that the problem is almost certainly in TMonitor. TMonitor implements a lock-free stack to recycle events created with the CreateEvent function. The relevant code in SysUtils is:
var
  EventCache: PEventItemHolder;
  EventItemHolders: PEventItemHolder;

procedure Push(var Stack: PEventItemHolder; EventItem: PEventItemHolder);
var
  LStack: PEventItemHolder;
begin
  repeat
    LStack := Stack;
    EventItem.Next := LStack;
  until AtomicCmpExchange(Pointer(Stack), EventItem, LStack) = LStack;
end;

function Pop(var Stack: PEventItemHolder): PEventItemHolder;
begin
  repeat
    Result := Stack;
    if Result = nil then
      Exit;
  until AtomicCmpExchange(Pointer(Stack), Result.Next, Result) = Result;
end;
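This lock-free stack is used by NewWaitObj and FreeWaitObj, which are part of the Monitor support protocol and are used by TMonitor. It works reasonably well under light load, but under stress it fails. The cause is known as the ABA problem: Pop reads the head and its Next pointer, and if other threads pop and push items in between, the head can be the same item again by the time the compare-and-swap runs, so the swap succeeds with a stale Next. The problem is discussed in a similar context in a series of excellent blog posts by @Primož Gabrijelčič: blog post 1, blog post 2, blog post 3.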
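To make the failure concrete, here is a minimal single-threaded replay of one bad interleaving. It is only a sketch: TItem, Cache and the "thread 1" / "thread 2" labels are hypothetical stand-ins for the event holders and the real threads, and the interleaving is forced by hand rather than produced by a scheduler.

```delphi
program AbaReplay;
{$APPTYPE CONSOLE}

type
  PItem = ^TItem;
  TItem = record          // hypothetical stand-in for the event item holder
    Next: PItem;
    Id: Char;
  end;

// Same algorithm as Push/Pop in the post, on the simplified item type.
procedure Push(var Stack: PItem; Item: PItem);
var
  LStack: PItem;
begin
  repeat
    LStack := Stack;
    Item.Next := LStack;
  until AtomicCmpExchange(Pointer(Stack), Item, LStack) = LStack;
end;

function Pop(var Stack: PItem): PItem;
begin
  repeat
    Result := Stack;
    if Result = nil then
      Exit;
  until AtomicCmpExchange(Pointer(Stack), Result.Next, Result) = Result;
end;

var
  Cache: PItem;           // plays the role of EventCache; globals start as nil
  A, B, C: TItem;
  SeenHead, SeenNext, Owned: PItem;
begin
  A.Id := 'A'; B.Id := 'B'; C.Id := 'C';
  Push(Cache, @C); Push(Cache, @B); Push(Cache, @A);  // Cache: A -> B -> C

  // "Thread 1" starts a Pop: it reads the head and its Next, then is preempted.
  SeenHead := Cache;          // A
  SeenNext := SeenHead.Next;  // B

  // "Thread 2" runs in the meantime: pops A, pops B (and keeps it), pushes A back.
  Pop(Cache);
  Owned := Pop(Cache);        // B now belongs to thread 2
  Push(Cache, @A);            // Cache: A -> C

  // "Thread 1" resumes. The head is A again, so the CAS succeeds with the stale Next.
  if AtomicCmpExchange(Pointer(Cache), SeenNext, SeenHead) = SeenHead then
    Writeln('CAS succeeded; head is now ', Cache.Id);
  if Cache = Owned then
    Writeln('The head is an item that thread 2 still owns.');
end.
```

After the replay, B is back on the stack although it was already handed out, so the next Pop would give the same item to a second owner.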
His OmniThreadLibrary contains the following routine, which he uses to deal with this problem:
//either 8-byte or 16-byte CAS, depending on the platform; destination must be properly aligned (8- or 16-byte)
function CAS(const oldData: pointer; oldReference: NativeInt; newData: pointer;
  newReference: NativeInt; var destination): boolean;
asm
{$IFNDEF CPUX64}
  push  edi
  push  ebx
  mov   ebx, newData
  mov   ecx, newReference
  mov   edi, destination
  lock cmpxchg8b qword ptr [edi]
  pop   ebx
  pop   edi
{$ELSE CPUX64}
  .noframe
  push  rbx                     //rsp := rsp - 8 !
  mov   rax, oldData
  mov   rbx, newData
  mov   rcx, newReference
  mov   r8, [destination + 8]   //+8 with respect to .noframe
  lock cmpxchg16b [r8]
  pop   rbx
{$ENDIF CPUX64}
  setz  al
end; { CAS }
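The point of swapping a pointer together with a reference value is that the reference can act as a change counter: even if the same item is on top again, the counter differs and a stale swap is rejected. The sketch below illustrates only that idea and is not the OmniThreadLibrary code. To stay free of assembler it swaps in a different variant: a 32-bit slot index and a 32-bit counter packed into one Int64, so that the built-in AtomicCmpExchange is enough. Slots, Head and all helper names are hypothetical, and it assumes the counter does not wrap around during one preemption.

```delphi
program TaggedStackSketch;
{$APPTYPE CONSOLE}
{$Q-,R-} // the counter is allowed to wrap around

const
  NilIndex = -1;

type
  TSlot = record
    Next: Integer;   // index of the slot below, NilIndex at the bottom
  end;

var
  Slots: array[0..3] of TSlot;  // hypothetical fixed pool of items
  Head: Int64;  // low 32 bits: index of top slot; high 32 bits: change counter

function MakeHead(Index: Integer; Tag: Cardinal): Int64;
begin
  Result := (Int64(Tag) shl 32) or Cardinal(Index);
end;

function HeadIndex(Value: Int64): Integer;
begin
  Result := Integer(Value and $FFFFFFFF);
end;

function HeadTag(Value: Int64): Cardinal;
begin
  Result := Cardinal(Value shr 32);
end;

function ReadHead: Int64;
begin
  // A CAS of 0 against 0 never changes Head but returns it atomically,
  // which also covers 32-bit targets.
  Result := AtomicCmpExchange(Head, 0, 0);
end;

procedure Push(Index: Integer);
var
  Old, Updated: Int64;
begin
  repeat
    Old := ReadHead;
    Slots[Index].Next := HeadIndex(Old);
    Updated := MakeHead(Index, HeadTag(Old) + 1);  // every change bumps the counter
  until AtomicCmpExchange(Head, Updated, Old) = Old;
end;

function Pop: Integer;
var
  Old, Updated: Int64;
begin
  repeat
    Old := ReadHead;
    Result := HeadIndex(Old);
    if Result = NilIndex then
      Exit;
    Updated := MakeHead(Slots[Result].Next, HeadTag(Old) + 1);
  until AtomicCmpExchange(Head, Updated, Old) = Old;
end;

var
  Seen, Stale: Int64;
begin
  Head := MakeHead(NilIndex, 0);
  Push(2); Push(1); Push(0);            // stack: 0 -> 1 -> 2

  // "Thread 1" starts a pop and is preempted after preparing its swap.
  Seen := ReadHead;
  Stale := MakeHead(Slots[HeadIndex(Seen)].Next, HeadTag(Seen) + 1);

  // "Thread 2" pops 0, pops 1 (keeping it) and pushes 0 back.
  Pop; Pop; Push(0);

  // The top index is 0 again, but the counter has moved on, so the swap fails.
  if AtomicCmpExchange(Head, Stale, Seen) <> Seen then
    Writeln('Stale CAS rejected; head index is still ', HeadIndex(ReadHead));
end.
```

With a pointer instead of an index the pair no longer fits in 8 bytes on 64-bit targets, which is where a 16-byte compare-and-swap such as the CAS routine above comes in.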
I have tried to use this CAS function to provide a solution for TMonitor similar to the one in OmniThreadLibrary (see the attached iaStressTest.TThreadedQueue.PopItem, which can be used with the original stress test). Whilst still not perfect, it helps a lot in 32-bit with, say, up to 100 threads. However, it crashes in 64-bit and I do not know why. I am posting this here in case anyone with better knowledge of assembler and thread programming than mine can help with the challenge of fixing TMonitor. It would be nice to try to get a fix included in 10.4. Even if it is not included, it can easily be applied as a patch in the same way as in the attached code.
iaStressTest.TThreadedQueue.PopItem.pas