We were pleased to hear that version 12.0 addressed the deadlock issue. However, we've encountered three major issues since upgrading:
1. Unfortunately, we've observed a 40% performance decrease with the updated threading.pas in version 12.0, which is unacceptable for our needs. The root of the problem appears to be the thread pool growing beyond the maximum worker count we set on our custom TThreadPool. Every additional worker thread forces us to create new per-thread copies within TParallel.For (we manage the necessary copies dynamically using a Dictionary with TThreadID as key). The issue in threading.pas seems to stem from this property being explicitly set to True in TThreadPool.Create:
constructor TThreadPool.Create;
begin
  inherited Create;
  // Initialization code
  FUnlimitedWorkerThreadsWhenBlocked := TRUE;
  // More initialization code
end;
even though the property already declares a default value, or may already have been set to False:
property UnlimitedWorkerThreadsWhenBlocked: Boolean read FUnlimitedWorkerThreadsWhenBlocked
  write FUnlimitedWorkerThreadsWhenBlocked default True;
This causes the TThreadPool.TThreadPoolMonitor.GrowThreadPoolIfStarved procedure to increase the number of worker threads unnecessarily. When I added debugging, I saw it enter the else if FThreadPool.UnlimitedWorkerThreadsWhenBlocked branch multiple times per TParallel.For loop.
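To make the pool growth easier to reproduce outside our application, here is a minimal console sketch of the kind of check we mean. It is not our production code: the cap of 4 workers, the 200 iterations and the 5 ms sleep are arbitrary placeholders, and it assumes UnlimitedWorkerThreadsWhenBlocked can be written from user code, as the property declaration quoted above suggests. It counts the distinct thread IDs that execute loop iterations, in the same spirit as our Dictionary keyed by TThreadID.

```pascal
program PoolGrowthCheck;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes,
  System.SyncObjs,
  System.Generics.Collections,
  System.Threading;

const
  MaxWorkers = 4; // placeholder cap, stands in for our real setting

var
  Pool: TThreadPool;
  Seen: TDictionary<TThreadID, Integer>; // thread ID -> iterations run on it
  Lock: TCriticalSection;
begin
  Pool := TThreadPool.Create;
  Seen := TDictionary<TThreadID, Integer>.Create;
  Lock := TCriticalSection.Create;
  try
    // Lower the minimum first: the maximum cannot be set below the minimum.
    if not Pool.SetMinWorkerThreads(1) then
      Writeln('SetMinWorkerThreads was rejected');
    if not Pool.SetMaxWorkerThreads(MaxWorkers) then
      Writeln('SetMaxWorkerThreads was rejected');

    // Assumption: this property is publicly writable in 12.0.
    // Comment the line out to compare against the constructor default.
    Pool.UnlimitedWorkerThreadsWhenBlocked := False;

    TParallel.For(1, 200,
      procedure(I: Integer)
      var
        Count: Integer;
      begin
        TThread.Sleep(5); // simulate blocking work
        Lock.Acquire;
        try
          if not Seen.TryGetValue(TThread.CurrentThread.ThreadID, Count) then
            Count := 0;
          Seen.AddOrSetValue(TThread.CurrentThread.ThreadID, Count + 1);
        finally
          Lock.Release;
        end;
      end,
      Pool);

    // The calling thread may also appear here if it runs iterations itself.
    Writeln(Format('Distinct threads used: %d (cap requested: %d)',
      [Seen.Count, MaxWorkers]));
  finally
    Lock.Free;
    Seen.Free;
    Pool.Free;
  end;
end.
```

If the reported number of distinct threads clearly exceeds the requested cap only while the property is left at its default, that would match what we see in GrowThreadPoolIfStarved.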
2. In previous versions of Delphi, we encountered access violations when closing the debugger if we used TParallel.For without modifications (with the default thread pool). Switching to custom thread pools mitigated this issue, but in version 12.0 the problem seems to persist even with a custom thread pool. The access violation in 12.0 often occurs here:
procedure TThreadPool.TThreadPoolMonitor.Execute;
begin
  // Procedure code
  Signaled := FThreadPool.FMonitorThreadWakeEvent.WaitFor(TThreadPool.MonitorThreadDelay) = TWaitResult.wrSignaled;
  // More procedure code
end;
I've also seen it happen on different lines within the same procedure. We can only reproduce this in our VCL application. Our DUnit test project does not have the same issue.
3. With the debugger attached, any operation involving TParallel.For runs excruciatingly slowly: about 8-10 times slower than without the debugger, compared to roughly twice as slow in version 11.3.
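For anyone who wants to compare the debugger overhead on their own machine, a timing harness along the following lines may help. It is only a sketch: the iteration count and the trivial loop body are placeholders and do not represent our actual workload. Running it once with the debugger attached and once without gives two elapsed times to compare.

```pascal
program ParallelForTiming;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes,
  System.SyncObjs,
  System.Diagnostics,
  System.Threading;

const
  Iterations = 100000; // placeholder, not our real workload size

var
  Pool: TThreadPool;
  Watch: TStopwatch;
  Total: Int64;
begin
  Pool := TThreadPool.Create; // custom pool, as in our application
  try
    Total := 0;
    Watch := TStopwatch.StartNew;
    TParallel.For(1, Iterations,
      procedure(I: Integer)
      begin
        // Trivial thread-safe work so the loop body is not empty.
        TInterlocked.Add(Total, I);
      end,
      Pool);
    Watch.Stop;
    Writeln(Format('Sum: %d, elapsed: %d ms',
      [Total, Watch.ElapsedMilliseconds]));
  finally
    Pool.Free;
  end;
end.
```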
For full disclosure and additional information, we're using https://github.com/pleriche/FastMM5, but none of the issues reported above change when we remove it. The base runtime just becomes a lot slower without it, because the built-in FastMM4 doesn't cope well with our workload.
We're reaching out for insights or solutions to these issues, as the current performance and stability impacts are significant for our projects.