David Schwartz

Everything posted by David Schwartz

  1. I've been going through the videos, and I'm left wondering what high-level abstraction is best to use for a batch of REST requests? The requests themselves are different, but they're all going through the same routine to the same service. E.g., the same Google queries for different cities, or hotel room availability at a bunch of different properties in the same hotel chain. That is, situations where you need to qualify the same query with a different geographical location, or different features at the same location. The point is, the API vendor requires you to submit the same query multiple times, varying a piece of the data each time. In this case, I can see using at least half of the abstractions that OTL provides. Indeed, they show loading multiple web pages in most of the examples. The program's users will want to process a batch of these requests, from 4 to 100 per batch, but typically 10-20 per batch on average. They typically take 8-12 seconds each. The purpose of these queries is to collect different sets of data to be analyzed at the next step. The user needs to wait for them all to complete before proceeding. They're all displayed in a list that I want to update in real-time to show what's going on with each one. When they've all completed, I want to switch to another tab and take the user to the next step that relies on all of the data that was just obtained. Async/Await, Join, ForEach, ParallelTask, and ForkJoin all seem like equally viable candidates in this situation. How would you choose one over another if they all seem equally capable?
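To make the shape concrete: language aside, what I want is a plain fan-out/fan-in — start every request, update the list as each one completes, and proceed only once all are done. A rough sketch of that pattern in Python (the fetch routine here is a stand-in for the real REST call, not an actual API):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def fetch(query):
    # stand-in for the real REST call (8-12 s each in practice)
    time.sleep(0.05)
    return f"result for {query}"

def run_batch(queries, max_workers=20):
    """Fan out all queries, collect results as they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, q): q for q in queries}
        for fut in as_completed(futures):
            # this is where the per-item "real-time" list update would go
            results[futures[fut]] = fut.result()
    return results  # everything is done here -> switch tabs, next step

batch = run_batch([f"city-{i}" for i in range(10)])
```

Whatever OTL abstraction is chosen, it has to provide those two hooks: a per-completion callback for the list, and an all-done signal for the tab switch.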
  2. David Schwartz

    The Curiously Recurring Generic Pattern

    Ok, after reading the article, it makes more sense. It will take some pondering to come up with legitimate use cases. The article ends with: A while back, I was trying to translate a Swagger spec into some Delphi code, and I ran into some issues that this might solve. The Swagger spec is read at run-time and the program generates Delphi code that gets compiled later on. The problem is that the typing isn't really known at run-time when you're reading the Swagger spec. But emitting code that resolves it at a subsequent compile time, without having to know it when it's emitted, might solve the problem...
  3. David Schwartz

    Windows Software Development Kit - why?

    I think it's optional unless there's some dependency that requires it. Which version of Delphi is this?
  4. Global variables in a unit often suggest there are classes that could be created to encapsulate them, possibly as static / class variables. How are they being used? Two things come to mind most frequently: they may be associated with singleton patterns, or they could be acting as buffers for data loaded from a DB or INI files. The presence of lots of global vars tells me the original writers probably used VB a lot and weren't very skilled with OOP concepts. (I once saw a Delphi app that had a ton of global vars in it, and a couple of classes that were used to "collect" a few dozen methods together. There were no data members in these classes; the methods all used the global vars. Yet the code had numerous places where they created and freed these "objects". It was the goofiest code I've ever seen. Overall, it looked like a big VB program simply translated into Pascal, and it made me want to gag. I told my boss I wouldn't touch it with a 10' pole.) I don't think moving methods that access global vars into TThreads is advisable, because that can cause contention issues if the vars are ever updated, and that would require even more coding to resolve. If the algorithm goes through a process first, loads up data into the global vars, and then treats it all as read-only at run-time, then maybe having multiple threads could be useful, but they won't run in the main thread. And that's an optimization I'd leave for later. Why did you think of doing this first? (Just curious.) Off-hand, it sounds like a recipe for disaster. The threads will NOT run in the main form thread (they're separate threads!), and simply using properties won't buy you any kind of protection against contention at run-time between multiple threads unless you build that into the getters and setters -- which, BTW, won't know what threads they're dealing with unless you add even MORE global vars!
They need to be encapsulated into classes FIRST or you're just creating a bigger, more complex mess for yourself. You need to study the code and see what it's doing. But start encapsulating things based on space first (with vars moved into classes), not time (based on threading). Identify groups of related variables and the methods that refer to them, put them into separate classes, and give each one a meaningful name. If the vars are only initialized once and then read-only, make them static / class vars; they're probably used as configuration parameters. If they're used for buffering values being read in or processed, then they're going to be private data members that need properties defined to access them. The logic will be working like this:

- initialize operational / config parameters (some of the global vars)
- for each line or batch of input data:
  -- read the data and put it into some global vars
  -- process the data that was just read
  -- save the results somewhere
- go on to the next step (if needed) -- eg, display results
- wrap everything up and shut down

If you've got some global vars that are all set up as arrays of the same length, this is a sign that you can put them into a class (as single vars, not arrays) and you'd have as many instances as the length of the arrays. You could then put the class instances into a single array or list.
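That last point, sketched quickly (Python used just to show the shape; the field names are made up for illustration):

```python
from dataclasses import dataclass

# BEFORE: parallel global arrays, always indexed together
names   = ["Main St", "Oak Ave", "Elm Dr"]
prices  = [250_000, 310_000, 199_000]
sq_feet = [1800, 2200, 1450]

# AFTER: one class holds what a single index held across all the arrays
@dataclass
class Property:
    name: str
    price: int
    sq_feet: int

# one instance per former array slot, collected in a single list
properties = [Property(n, p, s) for n, p, s in zip(names, prices, sq_feet)]
```

Each instance now travels as a unit, so the "index 3 of five different arrays" failure mode disappears.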
  5. David Schwartz

    Which option to use for a large batch of REST queries?

    This is a test jig and I've been juggling things around trying to see what the differences are.
  6. David Schwartz

    task thread priority?

    In general, is there a way to set task thread priority in Windows? (Both in the PPL and OTL.) It seems that if you have more threads than cores, a thread that you want to run on a regular interval can get starved for CPU time by the others that are getting shuffled in and out of cores until they're all done.
  7. David Schwartz

    task thread priority?

    Great to know. My test jig is just calling Sleep() because ... that's what's used in all of the threading examples I've seen. My main app isn't working yet, but it will block on calls to the HTTPGet routines. Is there a better way to do that if you're multi-threading and want to minimize your overall processing time for a big batch of REST calls to the same API?
  8. David Schwartz

    task thread priority?

    As I said, this is just a test jig for me to get familiar with the OTL and driving the UI properly. I'll be sending out a bunch of requests to my own middle-tier service that forwards them on to the same destination service. My testing on up to 50 queries has shown they take between 3 and 12 seconds to process each one. My middle-tier service extracts a handful of details from the full response packets and returns that to the waiting client threads. The overhead in processing the replies is minimal, so there's not much point in holding the data in the middle tier until all replies have been received and then dumping it all down to the client at that point. The client would have nothing to do while waiting for the reply data to arrive.

I chose to use the Join abstraction because my application needs to process all of these requests in parallel and then wait to get all of the response data back before it can proceed to do anything further. If you have 100 requests to process and they take an average of 10 seconds each, that's 1000 seconds to process them all in sequence. But in parallel, it'll be closer to 30 seconds. THAT is the ONLY metric I'm trying to shrink by using multi-threading. All of the rest is unaffected.

Do you mean "cores" instead of "kernals"? If so, my tests show otherwise. And intuitively it makes no sense either. Let's say you have 40 threads and 4 cores. Each thread can run and send off its request, then go to sleep waiting for a reply. It's very possible that all 40 of those threads could send out their queries before the first reply is received. I don't understand why a bunch of threads all waiting on something to happen are saturating all of the CPU cores while basically doing NOTHING! At least by increasing the thread pool size you have a far better chance that each thread will send its request, then go to sleep and let another thread do the same.
Based on how this test jig behaves, Windows does a really poor job of reallocating threads to cores when threads have nothing to do. It's clear that increasing the size of the thread pool results in lower overall processing time, up to some point, whereupon the total processing time starts to creep up as the thread pool size grows. That said, I've seen numerous benchmarks from Java and C#/.NET applications that show the overhead in their threading code is so high that the break-even point on saving time by multi-threading is absurdly high. So this test jig shows I can get a serious reduction in overall processing time with this approach.
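The underlying claim, that threads blocked on I/O should overlap rather than serialize, is easy to demonstrate with a toy (Python sketch; Sleep stands in for a blocked HTTP call, as in my test jig):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocked_call(_):
    # stands in for a thread sitting idle, waiting for a REST reply
    time.sleep(0.2)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(blocked_call, range(8)))
elapsed = time.perf_counter() - start
# 8 calls x 0.2 s would take 1.6 s serially; overlapped, roughly 0.2 s,
# regardless of core count, because a sleeping thread uses no CPU
```

That's the behavior I expect from the scheduler: blocked threads cost nothing, so pool size (not core count) bounds the win for I/O-bound work.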
  9. David Schwartz

    task thread priority?

    The project in question is attached to my big post at the end of the Third-Party -> OTL board. (The code is in a Zip file, but the main logic is shown in the post.) I wrote it to test OTL's Join abstraction. However, the original form used Delphi's PPL to update some statistics in the form. I moved them to a status bar at the bottom of the window. Under the hood, both PPL and OTL are using the same Windows logic, so the differences in higher-level abstractions are irrelevant for testing purposes. (This IS just a TEST jig.) I can see what's going on in my head, but I'm having trouble getting it expressed correctly in code. I'm testing out different things, and have gotten this far, changing the TSystemInfo.Create method in the Classes.SystemInfo.pas unit to this:

    class constructor TSystemInfo.Create;
    begin
      // create "dummy" task that will start System.Threading's monitoring and keep it running
      // required to get meaningful values for CPU Usage
      //TTask.Run(
      var aThread := TThread.CreateAnonymousThread(
        procedure
        begin
          Sleep(100);
        end);
      aThread.Priority := tpHigher;
      aThread.Start;

      // platform won't change at runtime
      FPlatform.Detect;
    end;

    I changed TTask.Run to the line below it, then set the Priority to tpHigher and Started the thread. Happily, this corrected the problem with the Join logic not updating the UI for a lot of tasks, which is not obvious but makes sense to me. Here's why... My CPU has 4 cores. When I set the test to use just 1 core, everything runs as if it's synchronous; ie, there's no parallelism going on. (That makes sense to me given a message-driven system and nothing limiting the amount of time threads can bogart a core. In an interrupt-driven system with caps on CPU time for each task, I don't think the cores would be so saturated.) The test generates a bunch of tasks that have delays and a variance that is randomly added or subtracted from the base value. By default, it's 8 +/- 3, so the delay values will be 5-11 seconds.
Now, based on the # of cores and the range of delays, there will be some ratio of threads to cores where all of the threads will essentially start at the same time, and will finish after their appointed delay times (all rounded to the nearest full second). When a thread finishes, it will be moved to the right-hand list. Sometimes they will be sorted by delay time, and will appear in ascending order, even when the #tasks exceeds #cores by quite a bit. (Try it, you'll see.) As you adjust the delay factors and ratio of threads to cores, at some point there will be enough contention for cores that some delays will occur, resulting in the threads no longer being sorted as nicely in the right-hand list. At some point, the contention for CPU time will become so high that the ordering of completion will appear fairly random vs. their specified delay times. Without boosting the priority on the thread above, all of the requests to update the UI get added to the end of the thread queue, and the contention starts right away -- I've never seen a situation where the threads are sorted in the right-hand list by ascending delay times except where the #threads <= #cores. Again, this makes perfect sense to me, although it's not very intuitive. That said, there may be little things I'm overlooking, because I think the point at which the #threads to #cores ratio starts to mess up the ordering of things in the right-hand list by ascending delays should be much higher than what I'm seeing. There is one side-effect of changing that TSystemInfo.Create method from using TTask.Run to use a TThread instead, and that has to do with the way GetCurrentCPUUsage works -- it always displays 0%. This has no effect on the above logic; it's just a quirk of something in the PPL. I'd like to replace the PPL code with OTL code, but I'm not clear which abstraction is best to use for it.
  10. David Schwartz

    Delphi MRU Project Manager

    Either one ... but feel free to address both. (The images suggest neither one, which is why I asked.)
  11. David Schwartz

    task thread priority?

    Adjusting priorities is a primary way you can ensure that tasks intended to run periodically actually DO run periodically. That's the problem I'm faced with here: the default settings are causing event messages that should run a foreground process periodically to all get clumped at the end of the message list (or task queue). They need to run when they're triggered, not when everything else is finished. I have a lot of experience doing real-time control stuff on single-CPU (ie, single-core) systems. We didn't have this problem for a variety of reasons. One was that the Scheduler would wake up periodically and see if there were any higher-priority tasks that needed to run. It would also send tasks that had been running "too long" to the end of the line. And idle tasks would not eat up any CPU time at all. In my case, I've got threads for tasks that take many wall-clock seconds to run, although 99% of that time is spent waiting for a reply to arrive from off-site. In theory, they should be stuffed into an "idle queue" while they're blocked so they don't bogart the cores. I set the thread pool to a relatively large number to ensure as many tasks are waiting for replies as possible. But what happens is they saturate all available cores instead of sitting in an idle queue, and the thread that's supposed to update the UI never gets a chance to run. If you have 50 tasks that all sit for 10 seconds waiting for a reply, the CPU should not be saturated with only 'n' tasks running (where n = #cores). If the response time varies from 5 - 15 secs randomly, the CPU cores should not have a few tasks saturating them waiting on their replies while other tasks in the queue that HAVE received replies are sitting waiting to get time to run. This is how things are working right now, and Windows does not seem to be doing a very intelligent job of managing any of it. The periodic task needs to have its priority raised so it runs ahead of the others when it wakes up.
The others would do well to have their priority dropped when they begin waiting for their reply so others can get CPU time, and when a reply arrives it would restore the priority of its sleeping task. If anybody has any suggestions other than adjusting task priorities, I'm all ears.
  12. David Schwartz

    task thread priority?

    Thanks! Well, I can plainly see that the Windows thread scheduler is failing to do what I want. The UI is locking up after processing a couple of things, even though there's a 300ms timer being used to update it, so nothing further is visible to the user until the number of tasks left in the queue is less than the number of cores, at which point it's like dumping 10 gallons of water on someone all at once who was expecting to see one gallon per minute for 10 minutes. Oddly, in Windows, when you set the thread pool size to 1, the whole asynchronous threading model breaks down and everything runs serially with no asynchronism at all. Which is why Windows had (maybe still has?) this odd "Yield" method that you have to sprinkle in liberally throughout your code to ensure no one task hogs too much CPU time. There are warnings I've read that say to beware of this situation, where a single thread can hijack and saturate the CPU because everything runs at the same process priority by default. You can solve this by boosting the priority of tasks that are intended to run periodically (ie, on a timer), for example, to ensure they actually run when their timer triggers them, rather than having the timer stuff a message at the end of the message queue that's not processed until everything else has finished. The task triggered by the timer needs to actually RUN periodically, not just at the end. I found out that the OTL also has a way to set a thread's priority, but it took quite a while to track down in the manual.
  13. David Schwartz

    Which option to use for a large batch of REST queries?

    I've decided that Parallel.Join works best for my needs. Here's part of a test I built. The full code is attached in a zip file. Note that I prefer to use Raize Components (Konopka) but you can probably replace them with regular versions here.

    var
      join : IOmniParallelJoin;

    // I have 2 pages in a PageControl.
    // The first has a TListView that contains a list of things to process, with objects attached to the .Data property.
    // The second has a TListView that shows completed items.
    // Each item in the first one is processed, then moved to the second one.
    // This way, you can see the list shrinking as tasks complete.

    procedure TThreadingTest3_frm.Go_btnClick(Sender: TObject);
    begin
      Main_ntbk.ActivePageIndex := 0;
      Completed_lview.Items.Clear;
      if (Ready_lview.Items.Count = 0) then
        ShowMessage( 'add some tasks first!' )
      else
      begin
        Go_btn.Enabled := False;
        StartProcessing( Ready_lview );
      end;
    end;

    procedure TThreadingTest3_frm.StartProcessing( aLV : TRzListView );
    begin
      var ntasks := 5;   // if this is = 1 ==> everything runs synchronously, not async
      join := Parallel.Join
        .NumTasks( ntasks )
        .OnStopInvoke(
          procedure   // AFTER <<EACH TASK>> COMPLETES, run this in the main thread
          begin
            UpdateLV;   // update the ListView in the main thread
          end )
        .TaskConfig( Parallel.TaskConfig.OnTerminated(   // AFTER <<ALL TASKS>> COMPLETE, run this in the main thread
          procedure (const task: IOmniTaskControl)
          begin
            UpdateLV;   // update the ListView in the main thread
          end ) );

      start_tm := Now;
      for var li in aLV.Items do   // join is a list; this adds tasks to it
        ProcessListItem( li );
      join.NoWait.Execute;   // this schedules everything to run, then returns
    end;

    procedure TThreadingTest3_frm.ProcessListItem( aLI : TListItem );
    begin
      join.Task(   // creates a new task and adds it to the list
        procedure (const joinState: IOmniJoinState)
        begin
          var obj := TmyClass(aLI.Data);
          obj.ElapsedTm := 0;   // signifies we've started
          joinState.task.Invoke( UpdateLV );   // update the ListView in the main thread
          var elapsed := DSiTimeGetTime64;
          // do something
          DSiuSecDelay( obj.DelayAmt * 1000 * 1000 );   // micro-seconds
          // done
          obj.ElapsedTm := DSiElapsedTime64( elapsed );   // in msecs
          joinState.task.Invoke( UpdateLV );   // update the ListView in the main thread
        end );
    end;

    procedure TThreadingTest3_frm.UpdateLV;   // this must run in the main thread
    begin
      var we_are_done := (Ready_lview.Items.Count = 0);
      for var li in Ready_lview.Items do
      begin
        var obj := TmyClass(li.Data);
        if Assigned(obj) then
        begin
          if obj.isReady then
            li.SubItems[1] := 'Started...'
          else if obj.isFinished then
          begin
            var li2 := Completed_lview.Items.Add;
            li2.Caption := li.Caption;
            li2.SubItems.Assign( li.SubItems );
            li2.SubItems[1] := 'Finished!';
            li2.SubItems[3] := Format( '%f', [obj.ElapsedTm / 1000] );
            obj.markComplete;
            obj.Free;
            li.Data := NIL;
            li.Delete;
          end;
        end;
        Application.ProcessMessages;
      end;
      Application.ProcessMessages;

      if we_are_done then
      begin
        if (Main_ntbk.ActivePage <> Completed_TabSheet) then
        begin
          Main_ntbk.ActivePage := Completed_TabSheet;
          ShowMessage( 'ALL DONE! Total time = ' + Format( '%f', [SecondSpan( Now, start_tm )] ) );
        end;
        Go_btn.Enabled := True;
      end;
    end;

    parallel_join_test3.zip
  14. David Schwartz

    Frequent and/or annoying typos you make while coding

    'g' gets flopped a lot, esp. in -ing -> -ign. It happens so often that I've gotten to the point where if there's not a spell-check line under it, I have to really think about it.
  15. David Schwartz

    Which option to use for a large batch of REST queries?

    Ok, got it. But I like abstractions! 🙂
  16. David Schwartz

    Which option to use for a large batch of REST queries?

    I can imagine many different uses for this, for sure! It's a way of implementing a *nix command-line pipe expression in code. In my case, I'm just grabbing a bunch of data, then when it has all been fetched I let the user select pieces of it and it displays a bunch of related items in a heatmap. There's no "import" process required.
  17. David Schwartz

    Which option to use for a large batch of REST queries?

    Thanks, but you're pointing at something that amounts to mere milliseconds of added overhead on a 2GHz CPU. Meanwhile, these tasks are spending ~95% of their time waiting for a reply, which is MANY orders of magnitude greater than any inefficiencies caused by the abstraction layers. So why should I care? The difference won't be perceptible to the user. I suspect I can set up a thread pool of 50 for a 2-core CPU, and I'm guessing it will still not saturate either core. I'm looking forward to testing it. Anyway, this library offers 8 different high-level abstractions, and it's a library I'm just getting familiar with. So I'm still curious what approach others with more experience using it might choose.
  18. David Schwartz

    Which option to use for a large batch of REST queries?

    I guess "huge" is relative. 🙂 As I stated: I'm sending these requests to my own service which forwards on the requests, waits for replies, extracts the exact details needed and returns them to my client app. The rest of the data can be downloaded as well if desired, but it's superfluous in most cases. (And the overhead of downloading that data is minor.) If we have 100 requests per batch and most are just sitting there waiting for a reply, I'm guessing they can all be processed in ~20 seconds of wall-clock time instead of 100 x ~12 secs if processed in series. Even if we're only processing 10, the total processing time is STILL going to be ~12 seconds because that's how long it typically takes for one request, and we cannot speed that up. I cannot do anything about the +/- 25% processing time variations of the service that's being called, so I don't see that optimizing anything will improve overall performance. Do you?
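The back-of-envelope math I'm using above can be written down as a sketch (a hypothetical helper; it ignores the +/- 25% per-request variance and any pool overhead):

```python
import math

def estimated_batch_time(n_requests, pool_size, avg_secs):
    # requests run in "waves" of pool_size; each wave takes ~avg_secs
    return math.ceil(n_requests / pool_size) * avg_secs

# 100 requests, pool of 100, ~12 s each: one wave, ~12 s wall-clock
# 10 requests through the same pool: still one wave, still ~12 s
```

The floor is always one wave (~12 s), which is why optimizing anything on the client side buys essentially nothing here.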
  19. David Schwartz

    Which option to use for a large batch of REST queries?

    Sorry, what does this have to do with calling a REST API in OTL?
  20. David Schwartz

    Which option to use for a large batch of REST queries?

    I think they all do that in slightly different ways.
  21. David Schwartz

    Delphi profiler

    works fine for me!
  22. I need some code written in either C/C++ or PASCAL (Delphi, Free Pascal, etc.) that suppresses breath noise from divers. I can provide a sample audio file that would be used to demonstrate it works. Ideally, it would be small enough to run on a Raspberry Pi Pico CPU (8MB of storage + 256k of RAM). The "breath noise" is a large burst of mostly white noise produced over audio comms when a diver inhales and/or exhales. (Think "Darth Vader's breathing".) My thinking is to have some adaptive measurement of the audible voice stream to measure the most dominant frequencies in the normal speech stream, and compare packets so that when a block of broad-spectrum data appears, it gets attenuated by some amount, like 50dB or so, for as long as it's present. In other words, do some basic spectral analysis and isolate the speaker's dominant frequencies vs. those that are missing most of the time. When you see a large increase in non-dominant broad-spectrum noise, assume it's a breath and attenuate the entire signal. People do not typically say anything meaningful when they're inhaling or exhaling, unless they're screaming very loudly, but that's an exhale. Usually this noise is most present during inhalation. When the device turns on or restarts, it needs to reinitialize its internal model by learning the speaker's dominant frequencies in the audio stream for the current diver and adapt itself. So it's fine to hear the breath noises at first and hear them fade away after 10 breaths or so while the user is speaking normally. Perhaps have a short thing they need to read, like a poem, to help train the model. AI is not required! It just needs to be able to build a statistical model to differentiate the current speaker's normal speaking voice from the breath noise and suppress the sound when the breath noise is detected. It should go without saying, but ... divers won't swap out equipment while under water. So the same person will be using the device for perhaps several hours.
It needs to keep a moving average, so occasional variations in the voice won't cause the breath noise to start showing up if the pitch of their voice changes somewhat under stress. Also, if helium is present in the air mix, the vocal pitch will increase, although this won't be a concern initially. For example: measure the presence of signals in, say, 20 frequency bands in the normal vocal range; some will be heavily present, and most will be almost empty / non-existent. When you see a noise that suddenly shows up in half or more of the bins that are normally empty, then attenuate that sample. (We'll probably need a way to adjust that threshold and perhaps other filter parameters easily during testing without recompiling.) A RPi Pico runs at 185 MHz or so, which should have plenty of bandwidth to handle vocals at a 22 kHz sampling rate in real-time. It can be developed in C/C++ or Pascal (Delphi, Free Pascal, etc) but it needs to be small enough to run in a RPi Pico or similarly configured device. This seems to me like a fairly simple task for someone familiar with FFTs and basic signal processing. I'm sure there are open source FFT libs and what really needs to be done is writing some code to sample frequencies and use it to recognize when a burst of white noise occurs on the input vs. normal speech patterns, and attenuate that signal in real-time. Here is a link that lets you DL the sample data file from Dropbox. REMOVE THE SPACES! ht tp : // w5a. com/u/ breath_sounds
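To make the band-profile idea concrete, here's a rough sketch in Python/NumPy (all names, band counts, and thresholds are made up for illustration; a real implementation for the Pico would be fixed-point C, and these parameters would be the tunable ones mentioned above):

```python
import numpy as np

def band_energies(frame, n_bands=20):
    # magnitude spectrum of one windowed frame, folded into n_bands bins
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    return np.array([b.mean() for b in np.array_split(spec, n_bands)])

class BreathGate:
    """Learn which bands normal speech occupies; attenuate broad-spectrum bursts."""
    def __init__(self, n_bands=20, alpha=0.05, threshold=0.5, atten_db=50.0):
        self.profile = None          # moving-average band profile of speech
        self.alpha = alpha           # learning rate (adapts over ~10 breaths)
        self.threshold = threshold   # fraction of quiet bands that must light up
        self.gain = 10 ** (-atten_db / 20)   # -50 dB -> ~0.003
        self.n_bands = n_bands

    def process(self, frame):
        e = band_energies(frame, self.n_bands)
        if self.profile is None:     # first frame just seeds the model
            self.profile = e.copy()
            return frame
        quiet = self.profile < np.median(self.profile)   # normally-empty bands
        lit = e[quiet] > 4 * (self.profile[quiet] + 1e-9)
        is_breath = quiet.any() and lit.mean() >= self.threshold
        if not is_breath:            # only learn from frames judged to be speech
            self.profile = (1 - self.alpha) * self.profile + self.alpha * e
        return frame * self.gain if is_breath else frame
```

Feeding it windowed frames of speech trains the profile; a frame of broad-spectrum noise then lights up the normally-quiet bands and gets knocked down by ~50dB, while speech frames pass through untouched.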
  23. David Schwartz

    Interface question

    FastMM is what I was referring to. I have not been able to get either of the two most recent V4 versions to work properly in Sydney. It works fine in Tokyo. I've brought it up here before, and apparently it's unique to my machine since nobody else seems to have the problem. I built a tool for it that parses out the log file and makes it super easy to figure out the likely source of the error. Most of the stuff in the log file is redundant, and it's mostly documenting side-effects of the error, not the error itself. E.g., failing to free the Objects in a stringlist throws a gazillion errors related to all of the orphaned objects, making it really hard to figure out that the problem was with the stringlist, which technically did nothing wrong. Also, in some situations, I'll get tons of orphaned string literals showing up that baffle me as to why they're even there. I forgot to properly free something that had a bunch of strings in it, and I guess Unicode sometimes gets them from the heap when they're added to certain containers (?). It's all just noise! In most cases, it comes down to missing one single call to .Free somewhere before either replacing something or freeing the container.
  24. David Schwartz

    Need a grid of buttons that can have text/graphics in them

    Nope. I spent a few hours trying to get those flow-things to work for my needs. They do something similar to word-wrapping, which does not preserve an X/Y grid layout which is what I was after. I could have used something like a big sheet of paper that slides around under a viewport and lets you shrink or expand the scale of the paper, but that's not what these things do. They do what they do. It's just not what I wanted.
  25. David Schwartz

    Need a grid of buttons that can have text/graphics in them

    I'm doing something similar right now. I found a TjanRoundedButton component somewhere and am using that. I lay them out on a panel in an x-by-y matrix. I ended up subclassing it to add some additional things I needed and I use instances of my own class on the panel. The original is fairly basic but easy to extend. I may ultimately switch to a grid, but to my eye, this approach looks better because it offers isolation between the "cells" whereas a grid jams them all together. In my case, I must create them at run-time because the number of rows and cols is data-driven. I found it easier to just lay them down algorithmically rather than use any of the fancy flowgrid/panel/etc options. I end up with one or more tabsheets, based on how many data sets are present. Then I add a TPanel on the tabsheet (which is probably redundant), and then add the buttons in a grid layout. Honestly, while prototyping I put them all down at design time, and it slowed the IDE down quite a bit. I didn't need them at design time, so doing it at run-time keeps the IDE from slowing down. (I've got this other thing that's a 9x9 array of radio buttons organized in columns using 9 panels, and it slows down the IDE as well. But they're static and it helps to have them at design time.)