hsvandrew Posted January 1, 2019 (edited)

After much pain, cost and wasted time, I wanted to leave this note for others.

TTask, Parallel Library -- DO NOT USE

My advice to all Delphi developers: stick with OmniThreadLibrary or write your own thread pool using TThread. The Parallel Programming Library (TTask, Parallel.For loops etc.) is based on completely flawed code that still isn't fixed in Rio. You'll invest many hours moving code across to TTask and then have to deal with fundamental issues which you may not notice under normal conditions. These include:

- Flawed logic that runs only a few threads at a time (worse on a virtual machine)
- Flaws which sometimes stop tasks from starting at all
- Faults that, due to point 1, could result in a complete bottleneck of your app if you nest TTasks, which may be hard to detect inside other functions
- Memory leaks on Linux in 10.2.x

Hope this helps someone. We have got to the point where we try to avoid all new EMB-built libraries where possible when a third-party solution exists, as we know a third-party one will be supported, tested and properly written.

Edited January 2, 2019 by hsvandrew
pyscripter Posted January 1, 2019

Please allow me to disagree. I have made non-trivial use of System.Threading in PyScripter (used by thousands of users) without any problems. The main issue I know of with the library is RSP-11267. You can see a fix in that report, but even without the fix you can work around the issue by using a different mechanism for signalling tasks to stop.

As for Parallel.For, I have not used it extensively, but you can see some code of mine in this StackOverflow question that I think is now used in a commercial product. An issue with thread creation by Parallel.For (RSP-21177) has been fixed in Rio.

Advantages of System.Threading:

- By far the easiest way to use tasks (as opposed to threads) in your Delphi application
- Nice interface using anonymous methods
- Avoids third-party dependencies
- It provides usage statistics that can be used for diagnostic purposes
- Nowadays it provides good performance out of the box, but you can still fine-tune the way it works if you are prepared to dig in
- It uses state-of-the-art synchronization techniques
- @Primož Gabrijelčič can correct me on this, but from reading his book "Delphi High Performance" I think that, depending on the type of use, it is competitive with, if not faster than, his own OmniThreadLibrary

The Parallel library, like many other new features of Delphi (e.g. Generics), got a bad reputation due to the many bugs in the early days. Unfortunately, trust is easy to lose and very hard to rebuild. But the library does deserve some love from Embarcadero (fixing the remaining issues) and from Delphi users.
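[Editor's note] The workaround mentioned above ("a different mechanism for signalling tasks to stop") can be sketched as follows. This is a generic pattern, not the fix from the RSP-11267 report: the task polls a flag you own, updated with interlocked operations, instead of relying on the library's own cancellation plumbing.

```pascal
uses
  System.SysUtils, System.SyncObjs, System.Threading;

var
  StopRequested: Integer = 0;  // 0 = keep running, 1 = stop requested

procedure RunWorker;
var
  Task: ITask;
begin
  Task := TTask.Run(
    procedure
    begin
      // Poll our own flag rather than the task's cancellation state
      while TInterlocked.CompareExchange(StopRequested, 0, 0) = 0 do
      begin
        // ... do one slice of work ...
        Sleep(10);
      end;
    end);

  // Later, from any thread:
  TInterlocked.Exchange(StopRequested, 1);
  Task.Wait;
end;
```

The polling interval and the work done per slice are placeholders; the point is only that stopping does not depend on ITask.Cancel.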
Alberto Fornés Posted January 1, 2019

I make simple use of this library and TTask, and I don't see any problem with it (though for sure it is not fully tested). But reading your point "The first task takes forever to start (over a second)", I would like to know if somebody else can confirm this. I have experienced delays at start, but I did not relate them to TTask, because those tasks connect to remote servers and I thought that was the cause. If the TTask start really takes a second, it is possible that there is a problem with TTask, and I would like to know whether in that case it would be better to use another alternative like OmniThreadLibrary. Have a good year!
Berocoder Posted January 1, 2019

Make a small test case to demo the bug so others can confirm it or point to the error in the code.
pyscripter Posted January 1, 2019 (edited)

Quote: "The first task takes forever to start (over a second)"

Of course not. However, parallel programming is not a piece of cake. There are so many ways you can shoot yourself in the foot and then blame the library.

Edited January 1, 2019 by pyscripter
hsvandrew Posted January 1, 2019 (edited)

1 hour ago, pyscripter said: "Of course not. However, parallel programming is not a piece of cake. There are so many ways you can shoot yourself in the foot and then blame the library."

OK, I'm prepared to concede this problem isn't happening anymore on 10.2.3 (some of these results came from earlier in 2018). It is possible this is a Linux-only problem, as that was the platform where the issue was experienced.

Edited January 1, 2019 by hsvandrew
hsvandrew Posted January 2, 2019

I'm going to pause this rant for now; perhaps these issues have been resolved and it's just that the Quality Portal tickets haven't been closed. We spent a lot of time on this in January and it failed miserably (especially for Linux), so we gave up. It appears it might be OK now on 10.2.3/10.3. I will do some more testing and advise.
Primož Gabrijelčič Posted January 2, 2019

The biggest problem with the PPL (apart from a number of terrifying bugs that were only recently fixed) is the annoying and uncontrollable behavior of its thread pool. It will start threads as it sees appropriate, sometimes seemingly without considering the min/max concurrent task configuration (of which the 'min' part is actually not a minimum, as per the source, see below).

  // Returns false if attempting to set a value < 0 or > MaxWorkerThreads. The actual number of pool
  // threads could be less than this value depending on actual demand. Setting this to too few threads could
  // be less than optimal resource utilization.
  function SetMinWorkerThreads(Value: Integer): Boolean;

(BTW, the comment suggests that MinWorkerThreads can be set to the same value as MaxWorkerThreads. This is not the case. It must be less than that.)

"Annoying and uncontrollable behavior": sometimes you schedule N threads, then wait, then schedule M (M > N, but M < MaxWorkerThreads) threads, but only N would start. I can repeatedly reproduce this in Tokyo (and Berlin, and Seattle ...). The problem was mostly, but not completely, fixed in Rio. I noticed some problems there, too, but at the moment I don't yet know how to reproduce the problem from scratch.
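[Editor's note] A minimal sketch of configuring the default pool within the constraint described above (min strictly below max). The multipliers here are arbitrary illustration values, not recommendations:

```pascal
uses
  System.SysUtils, System.Classes, System.Threading;

procedure ConfigureDefaultPool;
begin
  // Raise the ceiling first, then the floor. Per the caveat above,
  // SetMinWorkerThreads returns False unless the value stays below
  // MaxWorkerThreads, so check both results.
  if not TThreadPool.Default.SetMaxWorkerThreads(TThread.ProcessorCount * 4) then
    raise Exception.Create('Could not set MaxWorkerThreads');
  if not TThreadPool.Default.SetMinWorkerThreads(TThread.ProcessorCount * 2) then
    raise Exception.Create('Could not set MinWorkerThreads');
end;
```

Call this once, before any tasks are scheduled on the default pool.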
Lars Fosdal Posted January 2, 2019

13 hours ago, hsvandrew said: "I'm going to pause this rant for now, perhaps these issues have been resolved and its just the tickets in quality control haven't been closed. We spend a lot of time on this in January and it failed miserably (especially for Linux) so we gave up. It appears it might be ok now on 10.2.3/10.3 I will do some more testing and advise"

Please amend the title. The title should IMO reflect the contents of the post, which "-------" does not.
pyscripter Posted January 3, 2019 (edited)

Quote: "Annoying and uncontrollable behavior"

@Primož Gabrijelčič At least in Rio it seems to work OK:

program Project2;

{$APPTYPE CONSOLE}

uses
  WinApi.Windows, System.Diagnostics, System.Threading;

procedure TestParallel;
var
  TaskArray: array [1..100] of ITask;
begin
  for var I := Low(TaskArray) to High(TaskArray) do
    TaskArray[I] := TTask.Create(procedure
      begin
        Sleep(100);
      end).Start;
  TTask.WaitForAll(TaskArray);
end;

begin
  var SW := TStopWatch.StartNew;
  //TThreadPool.Default.SetMinWorkerThreads(50);
  TestParallel;
  WriteLn(SW.ElapsedMilliseconds);
  WriteLn;
  var TPS := TThreadPoolStats.Default;
  WriteLn('WorkerThreadCount: ', TPS.WorkerThreadCount);
  WriteLn('MinLimitWorkerThreadCount: ', TPS.MinLimitWorkerThreadCount);
  WriteLn('MaxLimitWorkerThreadCount: ', TPS.MaxLimitWorkerThreadCount);
  ReadLn;
end.

Output on an 8-processor machine:

1370
WorkerThreadCount: 9
MinLimitWorkerThreadCount: 8
MaxLimitWorkerThreadCount: 200

Output with TThreadPool.Default.SetMinWorkerThreads(50):

233
WorkerThreadCount: 50
MinLimitWorkerThreadCount: 50
MaxLimitWorkerThreadCount: 200

With TParallel.&For it works differently, and the number of threads created relates to the number of processors.

Edited January 3, 2019 by pyscripter
farcodev Posted January 5, 2019

Here's my test with your code, to give a version with 10.2:

402
WorkerThreadCount: 32
MinLimitWorkerThreadCount: 32
MaxLimitWorkerThreadCount: 800

With TThreadPool.Default.SetMinWorkerThreads(50):

214
WorkerThreadCount: 50
MinLimitWorkerThreadCount: 50
MaxLimitWorkerThreadCount: 800

So it all works well here too.
pyscripter Posted January 5, 2019 (edited)

@farcodev 32 logical processors, wow! What is your CPU? How many cores does it have? It appears to be super fast.

Edited January 5, 2019 by pyscripter
farcodev Posted January 5, 2019

@pyscripter A good old Threadripper 1950X :) 16 cores / 32 threads, overclocked at 3.7 GHz. I upgraded to it from my "old" Intel 3930K in August 2017.
FredS Posted January 6, 2019

On 1/2/2019 at 7:45 PM, pyscripter said: "in Rio it seems to work OK"

I modified your code to get a better look at the difference. Both runs were executed on Berlin 10.1.2.

Ideally #1 should reach a higher WorkerThreadCount during its two minutes of execution, but it is still the same with Rio. #3 may be an issue (also in Rio): imagine running a pile of low-intensity tasks first, then executing something that loads data modules, databases and whatnot; starting at the same thread count you left off at could cause out-of-memory issues unless dealt with (not tested). #4 is IMO handled better in Berlin, where the AverageCPUUsage makes the pool drop below MinLimitWorkerThreadCount.

** Rio System.Threading unit, modified to compile in Berlin

#1 Slow increase, never reached full ActiveTask count
00:02:13.8790372
AverageCPUUsage: 35
WorkerThreadCount: 34
MinLimitWorkerThreadCount: 4

#2 Increased to 58 immediately, holds appropriate AverageCPUUsage
00:01:11.9777749
AverageCPUUsage: 90
WorkerThreadCount: 58
MinLimitWorkerThreadCount: 50

#3 Starts off with #2's WorkerThreadCount and holds appropriate AverageCPUUsage
00:01:11.8997475
AverageCPUUsage: 89
WorkerThreadCount: 58
MinLimitWorkerThreadCount: 4

#4 Does NOT hold appropriate AverageCPUUsage
00:01:35.0015869
AverageCPUUsage: 100
WorkerThreadCount: 99
MinLimitWorkerThreadCount: 99

** Berlin System.Threading unit, modified with RSP fixes only

#1 Slow increase, never reached full ActiveTask count
00:02:13.9110588
AverageCPUUsage: 34
WorkerThreadCount: 34
MinLimitWorkerThreadCount: 4

#2 Increased to 58 immediately, holds appropriate AverageCPUUsage
00:01:11.4247095
AverageCPUUsage: 89
WorkerThreadCount: 59
MinLimitWorkerThreadCount: 50

#3 Starts off with #2's WorkerThreadCount and holds appropriate AverageCPUUsage
00:01:11.6894449
AverageCPUUsage: 88
WorkerThreadCount: 59
MinLimitWorkerThreadCount: 4

#4 Drops appropriately below MinLimitWorkerThreadCount to keep AverageCPUUsage
00:01:21.0234632
AverageCPUUsage: 88
WorkerThreadCount: 78
MinLimitWorkerThreadCount: 99

program Project2;

{$APPTYPE CONSOLE}

uses
  WinApi.Windows, System.SysUtils, System.Diagnostics, System.Threading,
  System.Classes, System.TimeSpan;

var
  SW: TStopWatch;
  ActiveTasksTicks: Int64;

const
  SleepMs = 100;
  SleepCycles = 333;
  HighPrime = 100 * 1000;

function IsPrime(const Value: Integer): Boolean;
{$REGION 'History'}
//  18-Jul-2018 - From a Parallel Task example
{$ENDREGION}
var
  Test, k: Integer;
begin
  if Value <= 3 then
    IsPrime := Value > 1
  else if ((Value mod 2) = 0) or ((Value mod 3) = 0) then
    IsPrime := False
  else
  begin
    IsPrime := True;
    k := Trunc(Sqrt(Value));
    Test := 5;
    while Test <= k do
    begin
      if ((Value mod Test) = 0) or ((Value mod (Test + 2)) = 0) then
      begin
        IsPrime := False;
        break; { jump out of the for loop }
      end;
      Test := Test + 6;
    end;
  end;
end;

procedure TestParallel;
var
  TaskArray: array [1..100] of ITask;
  Ticks: Int64;
  i: integer;
begin
  for I := Low(TaskArray) to High(TaskArray) do
    TaskArray[I] := TTask.Create(
      procedure
      var
        s, p: integer;
        Succeeded: Boolean;

        {* GetActiveTasks *}
        function GetActiveTasks: integer;
        var
          i: integer;
        begin
          Result := 0;
          for i := Low(TaskArray) to High(TaskArray) do
            if (TaskArray[i].Status = System.Threading.TTaskStatus.Running) then
              Inc(Result);
        end;

      begin
        for s := 1 to SleepCycles do
        begin
          Ticks := ActiveTasksTicks;
          if (Ticks + TTimeSpan.TicksPerSecond) < SW.ElapsedTicks then
          begin
            System.AtomicCmpExchange(ActiveTasksTicks, SW.ElapsedTicks, Ticks, Succeeded);
            if Succeeded then
              Write(#13 + Format('ActiveTasks: %3.d, CPU: %3.d%%',
                [GetActiveTasks, TThreadPoolStats.Current.CurrentCPUUsage]));
          end;
          Sleep(SleepMs);
          for p := 1 to HighPrime do
            IsPrime(p);
        end;
      end).Start;
  TTask.WaitForAll(TaskArray);
end;

procedure TestTasks(AMinWorkerThreads: integer);
var
  TPS: TThreadPoolStats;
begin
  ActiveTasksTicks := 0;
  SW := TStopWatch.StartNew;
  TThreadPool.Default.SetMinWorkerThreads(AMinWorkerThreads);
  TestParallel;
  Write(#13 + StringOfChar(' ', 72));
  WriteLn(#13 + SW.Elapsed.ToString);
  TPS := TThreadPoolStats.Default;
  WriteLn('AverageCPUUsage: ', TPS.AverageCPUUsage);
  WriteLn('WorkerThreadCount: ', TPS.WorkerThreadCount);
  WriteLn('MinLimitWorkerThreadCount: ', TPS.MinLimitWorkerThreadCount);
  WriteLn;
end;

begin
  TestTasks(TThread.ProcessorCount);
  TestTasks(50);
  TestTasks(TThread.ProcessorCount);
  TestTasks(99);
  ReadLn;
end.
pyscripter Posted January 6, 2019

The optimal thread pool configuration for IO-intensive tasks is of course different from that for computationally intensive tasks. The first case calls for many more threads than the second, since the tasks are in a waiting state most of the time. System.Threading allows you to have separate thread pools for different sets of tasks.
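[Editor's note] A sketch of the separate-pools idea: TTask.Run accepts a TThreadPool parameter, so IO-bound work can go to its own generously sized pool without disturbing the default pool used for CPU-bound tasks. The pool size and the task body here are illustrative assumptions:

```pascal
uses
  System.Threading;

var
  IOPool: TThreadPool;
  Task: ITask;
begin
  // A dedicated pool for IO-bound tasks, sized well above the core
  // count because these tasks mostly wait rather than compute.
  IOPool := TThreadPool.Create;
  try
    IOPool.SetMaxWorkerThreads(100);
    Task := TTask.Run(
      procedure
      begin
        // ... fire an HTTP request and wait for the reply ...
      end,
      IOPool);
    Task.Wait;
  finally
    IOPool.Free;
  end;
end.
```

Freeing the pool only after all its tasks have completed is deliberate; tasks hold a reference to the pool they run on.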
Stefan Glienke Posted January 7, 2019 (edited)

No offense, but I cannot take seriously people who test any threading library by putting Sleep into their thread execution to simulate a real workload...

Edited January 7, 2019 by Stefan Glienke
Daniel Posted January 7, 2019

// Notice: I have just renamed this thread to a more meaningful subject.
Primož Gabrijelčič Posted January 7, 2019

5 minutes ago, Stefan Glienke said: "No offense but I cannot take people serious that test any threading library code by putting sleep into their thread execution to simulate any real workload..."

Because? Unless the thread pool manager is measuring CPU load, the result should be the same. And if the thread pool manager does that, it is easy to replace the code with a busy loop. As far as my bug hunting in Delphi's TThreadPool went, replacing Sleep(2000) with a 2-second busy loop never changed anything.
Stefan Glienke Posted January 7, 2019

Because Sleep "suspends the execution of the current thread" (see MSDN), as opposed to doing some "real" work. That might or might not significantly affect the behavior of the threading library itself, depending on where it has glitches.
pyscripter Posted January 7, 2019 (edited)

@Stefan Glienke It does depend on what you are testing. If one claims that it takes > 1 s for the first task to execute, Sleep is sufficient to show that this is not true. Also, the workload of many real-life multi-threaded applications is not dissimilar: say you fire 100 HTTP requests and wait for the results to come in, or you do asynchronous IO with memory-mapped files (using overlapped events).

I like this use of language:

- "No doubt ..." followed by a debatable statement
- "It is clear that ..." followed by an unclear statement
- "No offence ..." followed by an offensive statement
- etc.

Edited January 7, 2019 by pyscripter
Stefan Glienke Posted January 7, 2019 (edited)

My point was that countering the claim of occasional "uncontrollable behavior" by showing that nothing goes wrong when firing off a bunch of sleeping threads is useless. A thread that does nothing but sleep suspends itself and gives control back to the OS scheduler, which knows not to wake that thread up until the time is over. You can see that in the numbers you posted. Also, when doing IO-heavy stuff you hardly want to put it onto CPU threads; use I/O completion ports instead.

Edited January 7, 2019 by Stefan Glienke
Primož Gabrijelčič Posted January 7, 2019

"Might or might not". Sleep is a good tool to demonstrate a bug. But I agree with you: if it works with Sleep, it still may not work with high-CPU-load code. So proving with Sleep that the library works is not good enough. OTOH, proving with high-CPU-load code that it works is also not good enough. "Unit tests can only be used to prove that parallel code is NOT working." - me

(BTW, I know that we agree here 🙂 I just wanted to state that testing parallel code with Sleep is a perfectly good tool. It is just not good enough on its own.)
Lars Fosdal Posted January 8, 2019

An alternative to Sleep could be a class that creates large arrays of random values and does a Bubble sort on them, optionally logging the progress to disk 😛
Primož Gabrijelčič Posted January 8, 2019

When I need a busy loop I usually just do

  a := 1;
  while not timeout do
    a := Cos(a);
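[Editor's note] Expanded into a self-contained routine for readers who want to drop it into a test: only the Cos loop is from the post above; the TStopwatch timeout wiring is added here.

```pascal
uses
  System.Diagnostics;

// Keep the CPU busy (rather than suspended, as Sleep would leave it)
// until the given number of milliseconds has elapsed.
procedure BusyLoop(Milliseconds: Integer);
var
  SW: TStopwatch;
  a: Double;
begin
  a := 1;
  SW := TStopwatch.StartNew;
  while SW.ElapsedMilliseconds < Milliseconds do
    a := Cos(a);
end;
```

For example, replacing Sleep(2000) in a task body with BusyLoop(2000) turns an idle wait into genuine CPU load, which is the substitution discussed in the posts above.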
Bretthans Posted January 22, 2019

You're wearing out the cosine instruction on the CPU. :-)