David Schwartz 434 Posted January 24, 2022 In general, is there a way to set task thread priority in Windows? (Both in the PPL and OTL.) It seems that if you have more threads than cores, a thread that you want to run on a regular interval can get starved for CPU time by the others that are getting shuffled in and out of cores until they're all done.
Anders Melander 1819 Posted January 24, 2022 (edited) 8 hours ago, David Schwartz said: In general, is there a way to set task thread priority in Windows? (Both in the PPL and OTL.) It seems that if you have more threads than cores, a thread that you want to run on a regular interval can get starved for CPU time by the others that are getting shuffled in and out of cores until they're all done. Unless you actually understand what the Windows thread scheduler does and why, it's generally better to leave that stuff alone. It's a classic newbie mistake to think that one can make a thread run faster/better by raising its priority. https://blog.codinghorror.com/thread-priorities-are-evil/ Quote: "Although there are some edge conditions where micromanaging thread priorities can make sense, it's generally a bad idea. Set up your threads at normal priority and let the operating system deal with scheduling them. No matter how brilliant a programmer you may be, I can practically guarantee you won't be able to outsmart the programmers who wrote the scheduler in your operating system." Edited January 24, 2022 by Anders Melander
darnocian 93 Posted January 25, 2022 (edited) I agree with Anders. Threads can be used for different reasons, though: backgrounding blocking I/O-bound tasks is different from backgrounding CPU-bound tasks. You may want to consider thread pools, as they ensure threads are recycled efficiently and don't normally exceed the number of cores. On multi-core systems I've normally used thread pools, which usually helps in recycling threads. The Delphi tasks library essentially captures this concept at a high level. There is also thread affinity, which can be assigned using SetThreadAffinityMask (on Windows), but I'm not sure how effective it is. The docs also recommend letting the system do the scheduling. Edited January 25, 2022 by darnocian
David Schwartz 434 Posted January 25, 2022 On 1/23/2022 at 10:59 PM, eivindbakkestuen said: TThread.Priority Thanks! 23 hours ago, Anders Melander said: Unless you actually understand what the Windows thread scheduler does and why, it's generally better to leave that stuff alone. It's a classic newbie mistake to think that one can make a thread run faster/better by raising its priority. Well, I can plainly see that the Windows thread scheduler is failing to do what I want. The UI is locking-up after processing a couple of things, even though there's a 300ms timer being used to update it, so nothing further is visible to the user until the number of tasks left in the queue is less than the number of cores, at which point it's like dumping 10 gallons of water on someone all at once who was expecting to see one gallon per minute for 10 minutes. Oddly, in Windows, when you set the thread pool size to 1, the whole asynchronous threading model breaks down and everything runs serially with no asynchronism at all. Which is why Windows had (maybe still has?) this odd "Yield" method that you have to sprinkle liberally throughout your code to ensure no one task hogs too much CPU time. There are warnings I've read that say to beware of this situation where a single thread can hijack and saturate the CPU because everything runs at the same process priority by default. You can solve this by boosting the priority of tasks that are intended to run periodically (i.e., on a timer), for example, to ensure they actually run when their timer triggers them rather than having the timer stuff a message at the end of the message queue that's not processed until everything else has finished. The task triggered by the timer needs to actually RUN periodically, not just at the end. I found out that the OTL also has a way to set a thread's priority, but it took quite a while to track down in the manual.
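For readers looking for the TThread.Priority pointer in concrete form, here is a minimal console sketch, assuming Windows (where TThread.Priority takes the TThreadPriority enumeration) and a Delphi version with System.Classes. The worker body is a hypothetical stand-in that only sleeps; the program name and variable names are illustrative, not from the thread.

```pascal
program PriorityDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

var
  Worker: TThread;
  Applied: TThreadPriority;
begin
  // Hypothetical worker: Sleep stands in for real background work.
  Worker := TThread.CreateAnonymousThread(
    procedure
    begin
      Sleep(200);
    end);
  Worker.FreeOnTerminate := False;  // keep the object so we can WaitFor and Free it
  Worker.Priority := tpLower;       // set while the thread is still suspended
  Applied := Worker.Priority;       // read back what the OS accepted
  Worker.Start;
  Worker.WaitFor;
  Worker.Free;
  Writeln('Priority applied as tpLower: ', Applied = tpLower);
end.
```

Lowering the priority of CPU-heavy workers is usually a gentler change than raising the priority of a periodic thread, since it cannot starve anything else on the machine.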
David Schwartz 434 Posted January 25, 2022 12 minutes ago, darnocian said: Threads can be used for different reasons, though: backgrounding blocking I/O-bound tasks is different from backgrounding CPU-bound tasks. You may want to consider thread pools, as they ensure threads are recycled efficiently and don't normally exceed the number of cores. On multi-core systems I've normally used thread pools, which usually helps in recycling threads. The Delphi tasks library essentially captures this concept at a high level. Adjusting priorities is a primary way you can ensure that tasks intended to run periodically actually DO run periodically. That's the problem I'm faced with here, and the default settings are causing the event messages that should run a foreground process periodically to all get clumped at the end of the message list (or task queue). They need to run when they're triggered, not when everything else is finished. I have a lot of experience doing real-time control stuff on single-CPU (i.e., single-core) systems. We didn't have this problem for a variety of reasons. One was that the Scheduler would wake up periodically and see if there were any higher-priority tasks that needed to run. It would also send tasks that had been running "too long" to the end of the line. And idle tasks would not eat up any CPU time at all. In my case, I've got threads for tasks that take many wall-clock seconds to run, although 99% of that time is spent waiting for a reply to arrive from off-site. In theory, they should be stuffed into an "idle queue" while they're blocked so they don't bogart the cores. I set the thread pool to a relatively large number to ensure as many tasks are waiting for replies as possible. But what happens is they saturate all available cores instead of sitting in an idle queue, and the thread that's supposed to update the UI never gets a chance to run. 
If you have 50 tasks that all sit for 10 seconds waiting for a reply, the CPU should not be saturated with only 'n' tasks running (where n = #cores). If the response time varies from 5 - 15 secs randomly, the CPU cores should not have a few tasks saturating them waiting on their replies while other tasks in the queue that HAVE received replies are sitting waiting to get time to run. This is how things are working right now, and Windows does not seem to be doing a very intelligent job of managing any of it. The periodic task needs to have its priority raised so it runs ahead of the others when it wakes up. The others would do well to have their priority dropped when they begin waiting for their reply so others can get CPU time, and when a reply arrives it would restore the priority of its sleeping task. If anybody has any suggestions other than adjusting task priorities, I'm all ears.
PeterBelow 239 Posted January 25, 2022 57 minutes ago, David Schwartz said: Thanks! Well, I can plainly see that the Windows thread scheduler is failing to do what I want. The UI is locking-up after processing a couple of things, even though there's a 300ms timer being used to update it, so nothing further is visible to the user until the number of tasks left in the queue is less than the number of cores, at which point it's like dumping 10 gallons of water on someone all at once who was expecting to see one gallon per minute for 10 minutes. Windows is not a real-time OS, so you don't have any guarantee for defined thread execution times, and your program has to share the CPU resources with a few dozen OS processes. Anyway, your problem sounds like you flood your UI thread's message queue with update requests (you do Synchronize these requests, I hope, and do not execute the background task's work in a synchronized method). These requests have a higher priority than the paint messages your changes to the UI cause to be added to the queue. If adding calls to a method like

    procedure ProcessPaintrequests;
    var
      Msg: TMsg;
    begin
      // dispatch any pending paint messages first
      while PeekMessage(Msg, 0, WM_PAINT, WM_PAINT, PM_REMOVE) do
        DispatchMessage(Msg);
      while PeekMessage(Msg, 0, WM_NULL, WM_NULL, PM_REMOVE) do
        DispatchMessage(Msg);
    end;

does not fix the problem, you have to lower the priority of your worker threads to give the main thread more time to update the UI.
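A different way to keep the message queue from flooding, not proposed in the thread itself, is to have workers update a shared counter and let the periodic timer read it, so there is one UI update per tick instead of one per completed task. The sketch below is a console stand-in under stated assumptions: the 300 ms polling loop plays the role of the UI timer, Sleep(500) stands in for real work, and the names Completed and WorkerCount are hypothetical.

```pascal
program CoalescedUpdates;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

const
  WorkerCount = 8;

var
  Completed: Integer = 0;  // shared progress counter, written only via TInterlocked
  Workers: array[0..WorkerCount - 1] of TThread;
  Snapshot, I: Integer;
begin
  for I := 0 to High(Workers) do
  begin
    Workers[I] := TThread.CreateAnonymousThread(
      procedure
      begin
        Sleep(500);                         // stand-in for the real task
        TInterlocked.Increment(Completed);  // no message posted to the UI
      end);
    Workers[I].FreeOnTerminate := False;
    Workers[I].Start;
  end;

  // Stand-in for the 300 ms UI timer: one cheap read per tick.
  repeat
    Sleep(300);
    Snapshot := TInterlocked.CompareExchange(Completed, 0, 0);  // atomic read
    Writeln('completed so far: ', Snapshot);
  until Snapshot >= WorkerCount;

  for I := 0 to High(Workers) do
  begin
    Workers[I].WaitFor;
    Workers[I].Free;
  end;
end.
```

In a VCL application the repeat loop would be the timer's OnTimer handler, which only reads the counter and repaints.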
David Schwartz 434 Posted January 25, 2022 (edited) The project in question is attached to my big post at the end of the Third-Party -> OTL board. (The code is in a Zip file, but the main logic is shown in the post.) I wrote it to test OTL's Join abstraction. However, the original form used Delphi's PPL to update some statistics in the form. I moved them to a status bar at the bottom of the window. Under the hood, both PPL and OTL are using the same Windows logic, so the differences in higher-level abstractions are irrelevant for testing purposes. (This IS just a TEST jig.) I can see what's going on in my head, but I'm having trouble getting it expressed correctly in code. I'm testing out different things, and have gotten this far, changing the TSystemInfo.Create method in the Classes.SystemInfo.pas unit to this:

    class constructor TSystemInfo.Create;
    begin
      // create "dummy" task, that will start System.Threading's monitoring, and keeps it running
      // required to get meaningful values for CPU Usage
      //TTask.Run(
      var aThread := TThread.CreateAnonymousThread(
        procedure
        begin
          sleep(100);
        end);
      aThread.Priority := tpHigher;
      aThread.Start;
      //platform won't change at runtime
      FPlatform.Detect;
    end;

I changed TTask.Run to the line below it and then set the Priority to tpHigher and Started the thread. Happily, this corrected the problem with the Join logic not updating the UI for a lot of tasks, which is not obvious but it makes sense to me. Here's why... My CPU has 4 cores. When I set the test to use just 1 core, everything runs as if it's synchronous; i.e., there's no parallelism going on. (That makes sense to me given a message-driven system and nothing limiting the amount of time threads can bogart a core. In an interrupt-driven system with caps on CPU time for each task, I don't think the cores would be so saturated.) The test generates a bunch of tasks that have delays and a variance that is randomly added or subtracted from the base value. 
By default, it's 8 +/- 3, so the delay values will be 5-11 seconds. Now, based on the # of cores and the range of delays, there will be some ratio of threads to cores where all of the threads will essentially start at the same time, and will finish after their appointed delay times (all rounded to the nearest full second). When a thread finishes, it will be moved to the right-hand list. Sometimes they will be sorted by delay time, and will appear in ascending order, even when the #tasks exceeds #cores by quite a bit. (Try it, you'll see.) As you adjust the delay factors and ratio of threads to cores, at some point there will be enough contention for cores that some delays will occur, resulting in the threads no longer being sorted as nicely in the right-hand list. At some point, the contention for CPU time will become so high that the ordering of completion will appear fairly random vs. their specified delay times. Without boosting the priority on the thread above, all of the requests to update the UI get added to the end of the thread queue, and the contention starts right away -- I've never seen a situation where the threads are sorted in the right-hand list by ascending delay times except where the #threads <= #cores. Again, this makes perfect sense to me, although it's not very intuitive. That said, there may be little things I'm overlooking, because I think the ratio of #threads to #cores at which the ordering of things in the right-hand list by ascending delays starts to get messed up should be much higher than what I'm seeing. There is one side-effect of changing that TSystemInfo.Create method from using TTask.Run to use a TThread instead, and that has to do with the way GetCurrentCPUUsage works -- it always displays 0%. This has no effect on the above logic, it's just a quirk of something in the PPL. I'd like to replace the PPL code with OTL code, but I'm not clear which abstraction is best to use for it. 
Edited January 25, 2022 by David Schwartz
Lars Fosdal 1794 Posted January 25, 2022 There are too many unknowns to suggest a complete design. Are multiple queries done on the same connection, one after the other - or shall each query make a new connection? Is there a limit to the number of parallel slow query/responses? Is order or sequence for the query a crucial factor? Is order or sequence for the response processing a crucial factor? I would be thinking along the lines of a request queue of objects containing the connection and any other necessary payload. A request thread pool would be processing the request queue and moving the objects to a wait queue. A receive event would copy the response payload to the object, close the connection, and move the object to the response processing queue. A response thread pool would process the objects from the response queue, populate the data used by the UI, and signal the UI that an update has arrived. The UI thread would throttle the actual update/repaint activity to an acceptable rate. IMO, the thread pool sizes do not need to exceed the number of kernels as long as the threads themselves are not idle looping.
David Schwartz 434 Posted January 25, 2022 (edited) As I said, this is just a test jig for me to get familiar with the OTL and driving the UI properly. I'll be sending out a bunch of requests to my own middle-tier service that forwards them on to the same destination service. My testing on up to 50 queries has shown they take between 3 and 12 seconds to process each one. My middle-tier service extracts a handful of details from the full response packets and returns that to the waiting client threads. The overhead in processing the replies is minimal, so there's not much point in holding the data in the middle tier until all replies have been received, then dumping it all down to the client at that point. The client would have nothing to do at that point while waiting for the reply data to arrive. I chose to use the Join abstraction because my application needs to process all of these requests in parallel then wait to get all of the response data back before it can proceed to do anything further. If you have 100 requests to process and they take an average of 10 seconds each, that's 1000 seconds to process them all in sequence. But in parallel, it'll be closer to 30 seconds. THAT is the ONLY metric I'm trying to shrink by using multi-threading. All of the rest is unaffected. 34 minutes ago, Lars Fosdal said: IMO, the thread pool sizes do not need to exceed the number of kernels as long as the threads themselves are not idle looping. Do you mean "cores" instead of "kernels"? If so, my tests show otherwise. And intuitively it makes no sense either. Let's say you have 40 threads and 4 cores. Each thread can run and send off its request, then go to sleep waiting for a reply. It's very possible that all 40 of those threads could send out their queries before the first reply is received. I don't understand why a bunch of threads all waiting on something to happen are saturating all of the CPU cores while basically doing NOTHING! 
At least by increasing the thread pool size you have a far better chance that each thread will send its request, then go to sleep and let another thread do the same. If you see how things work based on how this test jig works, Windows does a really poor job of reallocating threads to cores when threads have nothing to do. It's clear that increasing the size of the thread pool results in lower overall processing time, up to some point, whereupon the total processing time starts to creep up as the thread pool size grows. That said, I've seen numerous benchmarks from Java and C#/.NET applications that show the overhead in their threading code is so high that the break-even point on saving time by multi-threading is absurdly high. So this test jig shows I can get a serious reduction in overall processing time with this approach. Edited January 25, 2022 by David Schwartz
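The 1000 s versus roughly 30 s figures in the post above can be reproduced with a back-of-the-envelope estimate. This is an idealized sketch, assuming every request takes exactly the average time t, the waiting threads are truly blocked (using no CPU), and the pool holds P threads. The value P = 40 is borrowed from the 40-thread example in the post and is only an assumption; N is the number of requests.

```latex
T_{\text{seq}} = N\,t = 100 \times 10\,\text{s} = 1000\,\text{s}

T_{\text{par}} \approx \left\lceil \frac{N}{P} \right\rceil t
             = \left\lceil \frac{100}{40} \right\rceil \times 10\,\text{s}
             = 3 \times 10\,\text{s} = 30\,\text{s}
```

With a pool limited to 4 threads the same formula gives 25 rounds of 10 s, or 250 s, which is why the pool size matters so much for I/O-bound work.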
Lars Fosdal 1794 Posted January 25, 2022 2 hours ago, David Schwartz said: I don't understand why a bunch of threads all waiting on something to happen are saturating all of the CPU cores while basically doing NOTHING! It depends on how you wait? I meant kernels as in logical processors. Typically twice the number of cores on an Intel CPU.
Lars Fosdal 1794 Posted January 25, 2022 As I write this, my laptop runs 523 processes with 7600+ threads on a 12-kernel (6 core) i7. If all those threads were check/sleep/loop based - I'd be at 100%. Instead, most of them are idle, just waiting for something to happen.
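The point that properly blocked threads cost no CPU can be checked with a small console experiment. This is a sketch under stated assumptions: Sleep stands in for a blocking wait on a network reply, plain TThread objects are used instead of the PPL or OTL pools, and the constants ThreadCount and WaitMs are hypothetical values chosen for the demo.

```pascal
program BlockedThreadsDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Diagnostics;

const
  ThreadCount = 50;   // far more threads than typical core counts
  WaitMs = 1000;      // each thread blocks this long

var
  Threads: array[0..ThreadCount - 1] of TThread;
  Watch: TStopwatch;
  I: Integer;
begin
  Watch := TStopwatch.StartNew;
  for I := 0 to High(Threads) do
  begin
    Threads[I] := TThread.CreateAnonymousThread(
      procedure
      begin
        Sleep(WaitMs);  // blocked in the kernel, not spinning on a core
      end);
    Threads[I].FreeOnTerminate := False;
    Threads[I].Start;
  end;

  for I := 0 to High(Threads) do
  begin
    Threads[I].WaitFor;
    Threads[I].Free;
  end;

  // If blocked threads occupied cores, this would approach
  // ThreadCount * WaitMs / cores; it should instead stay near WaitMs.
  Writeln(Format('%d threads, %d ms each, total %d ms',
    [ThreadCount, WaitMs, Watch.ElapsedMilliseconds]));
end.
```

If a pooled version of the same test takes much longer, the likely cause is the pool's cap on concurrently running tasks rather than the Windows scheduler.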
darnocian 93 Posted January 25, 2022 You may know this, or for other readers... As Lars mentioned, 'waiting' correctly is important. Ideally we should not be sleep()ing before querying a queue. Be mindful not to create 1 thread per request. If there are too many threads, then your scheduler ends up spending more time spinning, which doesn't help. Ideally, the code should use thread synchronisation primitives such as TEvent, calling SetEvent() when adding requests to the queue and WaitFor() when waiting for requests to arrive. Further, we need to ensure we have some locking around whatever structure is used for the queue so that concurrent access doesn't destroy the structure's consistency. I'll write a little example later.
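Pending the example promised above, here is one possible shape for such a queue. It is a sketch, not darnocian's code: TRequestQueue is a hypothetical helper built from TEvent, TCriticalSection and TQueue<string>, and the timeout restarts on every wake-up, which is acceptable for a demo but not for strict deadlines.

```pascal
program EventQueueDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs,
  System.Generics.Collections;

type
  // Hypothetical helper for illustration; not part of the RTL or OTL.
  TRequestQueue = class
  private
    FItems: TQueue<string>;
    FLock: TCriticalSection;
    FHasItems: TEvent;  // manual-reset, signalled while the queue is non-empty
  public
    constructor Create;
    destructor Destroy; override;
    procedure Push(const Item: string);
    function Pop(out Item: string; TimeoutMs: Cardinal): Boolean;
  end;

constructor TRequestQueue.Create;
begin
  inherited Create;
  FItems := TQueue<string>.Create;
  FLock := TCriticalSection.Create;
  FHasItems := TEvent.Create(nil, True, False, '');
end;

destructor TRequestQueue.Destroy;
begin
  FHasItems.Free;
  FLock.Free;
  FItems.Free;
  inherited;
end;

procedure TRequestQueue.Push(const Item: string);
begin
  FLock.Acquire;
  try
    FItems.Enqueue(Item);
    FHasItems.SetEvent;  // wake any consumer blocked in Pop
  finally
    FLock.Release;
  end;
end;

function TRequestQueue.Pop(out Item: string; TimeoutMs: Cardinal): Boolean;
begin
  Result := False;
  // WaitFor blocks without using CPU until signalled or timed out.
  while FHasItems.WaitFor(TimeoutMs) = wrSignaled do
  begin
    FLock.Acquire;
    try
      if FItems.Count > 0 then
      begin
        Item := FItems.Dequeue;
        if FItems.Count = 0 then
          FHasItems.ResetEvent;  // queue drained: later Pops block again
        Exit(True);
      end;
      // Another consumer took the last item first: loop and wait again.
    finally
      FLock.Release;
    end;
  end;
end;

var
  Queue: TRequestQueue;
  Consumer: TThread;
  I: Integer;
begin
  Queue := TRequestQueue.Create;
  try
    Consumer := TThread.CreateAnonymousThread(
      procedure
      var
        Item: string;
      begin
        // Ends once the queue has been empty for 500 ms.
        while Queue.Pop(Item, 500) do
          Writeln('processed ', Item);
      end);
    Consumer.FreeOnTerminate := False;
    Consumer.Start;
    for I := 1 to 5 do
      Queue.Push('request ' + IntToStr(I));
    Consumer.WaitFor;
    Consumer.Free;
  finally
    Queue.Free;
  end;
end.
```

The RTL also ships TThreadedQueue<T> in System.Generics.Collections, which packages the same idea, so a hand-rolled class like this is mainly useful for seeing how the pieces fit together.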
David Schwartz 434 Posted January 27, 2022 Great to know. My test jig is just calling Sleep() because ... that's what's used in all of the threading examples I've seen. My main app isn't working yet, but it will block on calls to the HTTPGet routines. Is there a better way to do that if you're multi-threading and want to minimize your overall processing time for a big batch of REST calls to the same API?