Der schöne Günther 316 Posted February 9, 2023 Consider the following code: program Project1; uses System.SyncObjs, System.Threading; begin var counter := 0; TParallel.For( 0, 9, procedure(i: Int64) begin TParallel.For( 0, 9, procedure(i: Int64) begin TInterlocked.Increment(counter); end ); end ); Assert(counter = 100); end. It will entirely hang up. When I check the debugger, all worker threads are blocked by waiting for some event that is never happening. As the Parallel Library seems to be modelled after C#, I tried out the exact same thing which works as expected: var counter = 0; Parallel.For( 0, 10, i => Parallel.For( 0, 10, i => Interlocked.Increment(ref counter) ) ); System.Diagnostics.Debug.Assert(counter == 100); The documentation on TParallel.For also does not say anything about this. Share this post Link to post
David Heffernan 2345 Posted February 9, 2023 Works fine here on the latest Delphi. Are you still using Seattle? Share this post Link to post
Attila Kovacs 629 Posted February 9, 2023 (edited) 12 minutes ago, David Heffernan said: Are you still using Seattle? var counter := 0; btw. with 10.1 U2 it runs fine too Edited February 9, 2023 by Attila Kovacs Share this post Link to post
Der schöne Günther 316 Posted February 9, 2023 (edited) 58 minutes ago, David Heffernan said: Works fine here on the latest Delphi. Are you still using Seattle? Ooops, sorry - No. I updated my profile to Delphi 11. I noticed that it does run and produce the expected output when I lower the range to something like 0..6. As soon as it is 8² or more, it will block. Maybe it's depending on the number of cores. On my system, TThreadPool.Default.MinWorkerThreads will report 2 and MaxWorkerThreads is 4. It's just a guess, but I changes the example slightly: program Project2; uses System.SyncObjs, System.Threading; begin var counter := 0; const COUNT = TThreadPool.Default.MaxWorkerThreads * 2; TParallel.For( 0, Pred(count), procedure(i: Int64) begin TParallel.For( 0, Pred(count), procedure(i: Int64) begin TInterlocked.Increment(counter); end ); end ); Assert(counter = (COUNT * COUNT)); end. Edited February 9, 2023 by Der schöne Günther Share this post Link to post
Anders Melander 1784 Posted February 9, 2023 (edited) Reproduced. Delphi 11.2, 32- and 64-bit 4-core CPU. Edited February 9, 2023 by Anders Melander Share this post Link to post
Anders Melander 1784 Posted February 9, 2023 7 minutes ago, Anders Melander said: Reproduced. ...and now I can't. I tried debugging it and the IDE froze. After a restart of the IDE I can't reproduce it anymore. Share this post Link to post
Anders Melander 1784 Posted February 9, 2023 4 minutes ago, Anders Melander said: ...and now I can't. ...and now I can again. Race conditions. Pfft! 🙂 Share this post Link to post
Dalija Prasnikar 1396 Posted February 9, 2023 The problem happens when outer For loop consumes all available threads and hits maximum where no new threads are being allocated for newly requested task. And then if inner tasks cannot get free thread to run, they cannot complete their job. I have test case that will break more easily as number of needed threads is higher, and sleep also increases the time inner thread runs and gives more time for outer for loop to consume all available threads. Using separate pool for inner for loop solves the problem. This can still be an issue if you have code that indirectly uses nested parallel loops. begin var counter := 0; var pool := TThreadPool.Create; const COUNT = TThreadPool.Default.MaxWorkerThreads * 4; TParallel.For( 0, Pred(count), procedure(i: Int64) begin TParallel.For( 0, Pred(count), procedure(i: Int64) begin TInterlocked.Increment(counter); Sleep(10); end , pool); end ); Writeln(counter); Readln; end. Probably the best solution for parallel for loops that can spawn too many tasks would be to create dedicated thread pool for each loop. The cost of creating a thread pool is not big when dealing with longer running tasks. And for fast running tasks, like in this example, parallel loop is overkill anyway as it can run slower than simple for loop in single task. 3 Share this post Link to post
Der schöne Günther 316 Posted February 9, 2023 27 minutes ago, Dalija Prasnikar said: Probably the best solution (...) would be to create dedicated thread pool for each loop. The cost of creating a thread pool is not big when dealing with longer running tasks. Thank you, that's also the temporary workaround I have come up with. I discovered it when running code from a library in a TParallel.For-loop that also made use of TParallel.For. They all implicitly used the default thread pool. Still, I'd like this caveat to be properly documented, and not to discover it in a running factory like I just did. And it still leaves me wondering why the C# runtime does not have these problems. 1 Share this post Link to post
David Heffernan 2345 Posted February 9, 2023 18 minutes ago, Der schöne Günther said: And it still leaves me wondering why the C# runtime does not have these problems. Likely because it has been properly designed by skilled practitioners in this field. 2 2 2 Share this post Link to post
Anders Melander 1784 Posted February 9, 2023 2 hours ago, David Heffernan said: Likely because it has been properly designed by skilled practitioners in this field. That was really helpful. Thanks. 3 hours ago, Der schöne Günther said: Still, I'd like this caveat to be properly documented, and not to discover it in a running factory like I just did. If you view the PPL as an opaque general-purpose library then documentation will not help. Even if the problem/limitation is known by the user of the library, its existence means that the PPL can only be used when running on known configurations (# of CPUs) and cannot safely be used with other libraries that might also use the PPL. In other words: It's a bug; The situation should either be handled or it should fail with an exception. As far as I can see the concrete problem is in TParallel.ForWorker64 on the statement RootTask.Start.Wait. All the inner loop tasks are hanging there, waiting for their code to execute, but since all threads are occupied (by the outer loop tasks), none will ever become available. Deadlock. This situation should be detectable and I think a simple solution would be to temporarily increase the size of the threadpool when it occurs. For example in TThreadPool.TThreadPoolMonitor.GrowThreadPoolIfStarved. Share this post Link to post
Anders Melander 1784 Posted February 9, 2023 Okay so I've looked into this some more and I've got good news and bad news: The bad news is that this is probably "as designed". The good news, well I lied; There isn't any good news. If we look at other thread pool implementations "properly designed by skilled practitioners", the old Win32 thread pools also had a hard upper limit on the number of threads in a pool and suffered from the exact same problem. The newer Vista thread pools are a bit more clever but it still has an upper limit (I believe the default max is 500 threads) and suffered from the exact same problem. Same with .NET thread pools (which are a rewrite of Win32 thread pools); For CLR 2.0 the max is 25 threads per core and for 2.0SP1 the max is 250 per core. The reason for this tenfold increase in default max is actually to avoid experiencing deadlocks caused by running out of threads quite so often. Thus .NET too suffers from the exact same problem. See Concurrent Programming on Windows, chapter 7 for a discussion on all this. So the problem is that the RTL thread pool imposes a hard upper limit on the number of threads in it and that the limit is way too small. Ideally what we'd like, in this case, is for the growth algorithm to be a bit more intelligent and flexible but I doubt that will happen. The easy solution is probably to just increase the max and hope for the best 😕 . I still believe that the library should detect (and fail on) the simple case when all threads are blocked waiting for a thread to become available. It's relatively unlikely that this will occur in real code (there are many other things, not controlled by the PPL, that a thread can wait on) but it's so simple to implement that I believe it's worth the effort. 2 Share this post Link to post