David Schwartz 428 Posted January 18, 2022

I've been going through the videos, and I'm left wondering which high-level abstraction is best to use for a batch of REST requests. The requests themselves are different, but they all go through the same routine to the same service. E.g., the same Google query for different cities, or hotel room availability at a bunch of different properties in the same hotel chain. That is, situations where you need to qualify the same query with a different geographical location, or with different features at the same location. The point is, the API vendor requires you to submit the same query multiple times, varying a piece of the data each time.

In this case, I can see using at least half of the abstractions that OTL provides; indeed, most of the examples show loading multiple web pages.

The program's users will want to process a batch of these requests, from 4 to 100 per batch but typically 10-20 on average, and they typically take 8-12 seconds each. The purpose of these queries is to collect different sets of data to be analyzed at the next step. The user needs to wait for them all to complete before proceeding. They're all displayed in a list that I want to update in real time to show what's going on with each one. When they've all completed, I want to switch to another tab and take the user to the next step, which relies on all of the data that was just obtained.

Async/Await, Join, ForEach, ParallelTask, and ForkJoin all seem like equally viable candidates in this situation. How would you choose one over another if they all seem equally capable?
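For example, here's roughly what one of those candidates (Parallel.ForEach) might look like for the "same query, different city" case. This is only a sketch to frame the question: the Cities list is a placeholder, the REST call is simulated with a delay, and the exact OTL signatures should be checked against the documentation.

uses
  System.Classes, OtlParallel;

procedure RunBatch(const Cities: TArray<string>);
begin
  Parallel.ForEach(0, High(Cities))
    .NumTasks(10)   // how many requests are in flight at once
    .Execute(
      procedure (const idx: Integer)
      begin
        // issue the same REST query for Cities[idx] here;
        // simulated with a delay of roughly one request's duration
        TThread.Sleep(10000);
      end);
end;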
Fr0sT.Brutal 900 Posted January 18, 2022

I'd use a "task list + N worker threads" abstraction. N is known beforehand (it's the "request limit" option), so you just create N threads, let them consume tasks from a list, and stop when all tasks are done. The main app checks all the threads for a finished state on a timer. Simple, clean, reliable.
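A rough sketch of that pattern in plain Delphi RTL terms, purely for illustration: the request type (a string here), the per-request callback, and the form bits (TMainForm, FQueue, FWorkers, PollTimer) are all made up, not from any particular library.

uses
  System.SysUtils, System.Classes, System.SyncObjs, System.Generics.Collections;

type
  // One worker pulls request descriptors off a shared queue until it is drained.
  TRequestWorker = class(TThread)
  private
    FQueue: TThreadedQueue<string>;   // each string describes one request
    FProcess: TProc<string>;          // the actual REST call goes in here
  protected
    procedure Execute; override;
  public
    constructor Create(AQueue: TThreadedQueue<string>; AProcess: TProc<string>);
  end;

constructor TRequestWorker.Create(AQueue: TThreadedQueue<string>; AProcess: TProc<string>);
begin
  inherited Create(False);  // thread only starts running after construction completes
  FQueue := AQueue;
  FProcess := AProcess;
end;

procedure TRequestWorker.Execute;
var
  req: string;
begin
  // The queue is created with a PopTimeout of 0, so PopItem returns wrTimeout
  // as soon as it is empty and the worker simply exits when there is no more work.
  while FQueue.PopItem(req) = wrSignaled do
    FProcess(req);
end;

// On the main form: enqueue the whole batch, start N workers, then poll a TTimer
// whose OnTimer handler checks Worker.Finished for every worker.
procedure TMainForm.StartBatch(const Requests: TArray<string>; NumWorkers: Integer);
var
  req: string;
  i: Integer;
begin
  FQueue := TThreadedQueue<string>.Create(Length(Requests), INFINITE, 0);
  for req in Requests do
    FQueue.PushItem(req);

  SetLength(FWorkers, NumWorkers);
  for i := 0 to NumWorkers - 1 do
    FWorkers[i] := TRequestWorker.Create(FQueue,
      procedure (AReq: string)
      begin
        // issue the REST request for AReq and store the result somewhere thread-safe
      end);

  PollTimer.Enabled := True;
end;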
Rollo62 538 Posted January 18, 2022

Consider GraphQL, which can be more efficient than REST for this kind of thing.
David Schwartz 428 Posted January 19, 2022

19 hours ago, Fr0sT.Brutal said: I'd use a "task list + N worker threads" abstraction. N is known beforehand (it's the "request limit" option), so you just create N threads, let them consume tasks from a list, and stop when all tasks are done. The main app checks all the threads for a finished state on a timer. Simple, clean, reliable.

I think they all do that in slightly different ways.
David Schwartz 428 Posted January 19, 2022

9 hours ago, Rollo62 said: Consider GraphQL, which can be more efficient than REST for this kind of thing.

Sorry, what does this have to do with calling a REST API in OTL?
Rollo62 538 Posted January 19, 2022

4 hours ago, David Schwartz said: Sorry, what does this have to do with calling a REST API in OTL?

Sorry for giving you totally wrong and off-topic hints. I thought you wanted to work with a huge number of REST queries, maybe with lots of unnecessary data to download as well, where GraphQL as the server interface could help optimize request paths and download sizes.
Fr0sT.Brutal 900 Posted January 19, 2022

5 hours ago, David Schwartz said: I think they all do that in slightly different ways.

Yep, but they all just build unneeded levels of abstraction on top of simple things, IMHO. Your case, as I understand it, doesn't need a dynamic thread pool, constant task streaming, and so on. However, if you want to use OTL, I think the simplest abstraction is the best.
David Schwartz 428 Posted January 19, 2022

2 hours ago, Rollo62 said: Sorry for giving you totally wrong and off-topic hints. I thought you wanted to work with a huge number of REST queries, maybe with lots of unnecessary data to download as well, where GraphQL as the server interface could help optimize request paths and download sizes.

I guess "huge" is relative. 🙂 As I stated:

On 1/17/2022 at 7:32 PM, David Schwartz said: The program's users will want to process a batch of these requests, from 4 to 100 per batch but typically 10-20 on average, and they typically take 8-12 seconds each.

I'm sending these requests to my own service, which forwards the requests, waits for the replies, extracts the exact details needed, and returns them to my client app. The rest of the data can be downloaded as well if desired, but it's superfluous in most cases. (And the overhead of downloading that data is minor.)

If we have 100 requests per batch and most are just sitting there waiting for a reply, I'm guessing they can all be processed in ~20 seconds of wall-clock time instead of 100 x ~12 secs if processed in series. Even if we're only processing 10, the total processing time is STILL going to be ~12 seconds, because that's how long one request typically takes, and we cannot speed that up. I cannot do anything about the +/- 25% variation in the processing time of the service being called, so I don't see that optimizing anything will improve overall performance. Do you?
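A quick back-of-the-envelope way to state that estimate, assuming the requests are purely I/O-bound and run a fixed number at a time (illustrative only, real timings will vary by the +/- 25% mentioned above):

uses
  System.Math;

// Rough wall-clock estimate for a batch of I/O-bound requests that run
// NumWorkers at a time.
function EstimatedWallClockSecs(NumRequests, NumWorkers: Integer;
  AvgRequestSecs: Double): Double;
begin
  Result := Ceil(NumRequests / NumWorkers) * AvgRequestSecs;
end;

// 100 requests, 50 in flight, ~12 s each  ->  2 * 12 = ~24 s wall clock
//  10 requests, 10+ in flight, ~12 s each ->  1 * 12 = ~12 s wall clock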
David Schwartz 428 Posted January 19, 2022

1 hour ago, Fr0sT.Brutal said: Yep, but they all just build unneeded levels of abstraction on top of simple things, IMHO. Your case, as I understand it, doesn't need a dynamic thread pool, constant task streaming, and so on. However, if you want to use OTL, I think the simplest abstraction is the best.

Thanks, but you're pointing at something that amounts to mere milliseconds of added overhead on a 2 GHz CPU. Meanwhile, these tasks spend ~95% of their time waiting for a reply, which is MANY orders of magnitude greater than any inefficiency caused by the abstraction layers. So why should I care? The difference won't be perceptible to the user. I suspect I can set up a thread pool of 50 on a 2-core CPU and it still won't saturate either core. I'm looking forward to testing it.

Anyway, this library offers 8 different high-level abstractions, and it's a library I'm just getting familiar with. So I'm still curious what approach others with more experience using it might choose.
Attila Kovacs 631 Posted January 19, 2022

I'm using "pipeline" for a multi-downloader, with "download" and "import" stages. It's really cool.
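For anyone unfamiliar with that abstraction, a minimal two-stage sketch along those lines might look roughly like this. The stage bodies are placeholders, and the exact Parallel.Pipeline signatures should be checked against the OTL documentation.

uses
  Winapi.Windows, OtlCommon, OtlCollections, OtlParallel;

procedure RunDownloadPipeline(const Urls: TArray<string>);
var
  pipeline: IOmniPipeline;
  url: string;
begin
  pipeline := Parallel.Pipeline
    .Stage(
      procedure (const input, output: IOmniBlockingCollection)
      var
        value: TOmniValue;
      begin
        // stage 1: "download" - turn each URL into raw data
        for value in input do
          output.Add('downloaded data for ' + value.AsString);  // placeholder
      end)
    .Stage(
      procedure (const input, output: IOmniBlockingCollection)
      var
        value: TOmniValue;
      begin
        // stage 2: "import" - parse/store each downloaded chunk
        for value in input do
        begin
          // import value.AsString here
        end;
      end)
    .Run;

  // feed the first stage, then signal that no more work is coming
  for url in Urls do
    pipeline.Input.Add(url);
  pipeline.Input.CompleteAdding;

  pipeline.WaitFor(INFINITE);  // block until both stages are done
end;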
David Schwartz 428 Posted January 19, 2022

38 minutes ago, Attila Kovacs said: I'm using "pipeline" for a multi-downloader, with "download" and "import" stages. It's really cool.

I can imagine many different uses for this, for sure! It's a way of implementing a *nix command-line pipe expression in code. In my case, I'm just grabbing a bunch of data; when it has all been fetched, I let the user select pieces of it, and it displays a bunch of related items in a heatmap. There's no "import" step required.
Fr0sT.Brutal 900 Posted January 19, 2022

1 hour ago, David Schwartz said: Thanks, but you're pointing at something that amounts to mere milliseconds of added overhead on a 2 GHz CPU.

No, I'm not considering performance at all, just the resulting code. I just don't see the benefit of adding more abstractions on top of something that could serve well as-is.
David Schwartz 428 Posted January 19, 2022

18 minutes ago, Fr0sT.Brutal said: No, I'm not considering performance at all, just the resulting code. I just don't see the benefit of adding more abstractions on top of something that could serve well as-is.

Ok, got it. But I like abstractions! 🙂
David Schwartz 428 Posted January 21, 2022

I've decided that Parallel.Join works best for my needs. Here's part of a test I built. The full code is attached in a zip file. Note that I prefer to use Raize Components (Konopka), but you can probably replace them with the regular versions here.

var
  join : IOmniParallelJoin;

// I have 2 pages in a PageControl.
// The first has a TListView that contains a list of things to process, with objects attached to the .Data property.
// The second has a TListView that shows completed items.
// Each item in the first one is processed, then moved to the second one. This way, you can see the list shrinking as tasks complete.

procedure TThreadingTest3_frm.Go_btnClick(Sender: TObject);
begin
  Main_ntbk.ActivePageIndex := 0;
  Completed_lview.Items.Clear;
  if (Ready_lview.Items.Count = 0) then
    ShowMessage( 'add some tasks first!' )
  else
  begin
    Go_btn.Enabled := False;
    StartProcessing( Ready_lview );
  end;
end;

procedure TThreadingTest3_frm.StartProcessing( aLV : TRzListView );
begin
  var ntasks := 5;  // if this is = 1 ==> everything runs synchronously, not async

  join := Parallel.Join
    .NumTasks( ntasks )
    .OnStopInvoke(
      procedure  // AFTER <<EACH TASK>> COMPLETES, run this in the main thread
      begin
        UpdateLV;  // update the ListView in the main thread
      end )
    .TaskConfig( Parallel.TaskConfig.OnTerminated(
      // AFTER <<ALL TASKS>> COMPLETE, run this in the main thread
      procedure (const task: IOmniTaskControl)
      begin
        UpdateLV;  // update the ListView in the main thread
      end ) );

  start_tm := Now;

  for var li in aLV.Items do  // join is a list; this adds tasks to it
    ProcessListItem( li );

  join.NoWait.Execute;  // this schedules everything to run, then returns
end;

procedure TThreadingTest3_frm.ProcessListItem( aLI : TListItem );
begin
  join.Task(  // creates a new task and adds it to the list
    procedure (const joinState: IOmniJoinState)
    begin
      var obj := TmyClass(aLI.Data);
      obj.ElapsedTm := 0;  // signifies we've started
      joinState.task.Invoke( UpdateLV );  // update the ListView in the main thread

      var elapsed := DSiTimeGetTime64;

      // do something
      DSiuSecDelay( obj.DelayAmt * 1000 * 1000 );  // micro-seconds
      // done

      obj.ElapsedTm := DSiElapsedTime64( elapsed );  // in msecs
      joinState.task.Invoke( UpdateLV );  // update the ListView in the main thread
    end );
end;

procedure TThreadingTest3_frm.UpdateLV;  // this must run in the main thread
begin
  var we_are_done := (Ready_lview.Items.Count = 0);

  for var li in Ready_lview.Items do
  begin
    var obj := TmyClass(li.Data);
    if Assigned(obj) then
    begin
      if obj.isReady then
        li.SubItems[1] := 'Started...'
      else if obj.isFinished then
      begin
        var li2 := Completed_lview.Items.Add;
        li2.Caption := li.Caption;
        li2.SubItems.Assign( li.SubItems );
        li2.SubItems[1] := 'Finished!';
        li2.SubItems[3] := Format( '%f', [obj.ElapsedTm / 1000] );
        obj.markComplete;
        obj.Free;
        li.Data := NIL;
        li.Delete;
      end;
    end;
    Application.ProcessMessages;
  end;
  Application.ProcessMessages;

  if we_are_done then
  begin
    if (Main_ntbk.ActivePage <> Completed_TabSheet) then
    begin
      Main_ntbk.ActivePage := Completed_TabSheet;
      ShowMessage( 'ALL DONE! Total time = ' + Format( '%f', [SecondSpan( Now, start_tm )] ) );
    end;
    Go_btn.Enabled := True;
  end;
end;

parallel_join_test3.zip
Dalija Prasnikar 1402 Posted January 25, 2022

On 1/21/2022 at 1:15 PM, David Schwartz said: Application.ProcessMessages;

Why? It serves no purpose whatsoever. I am not familiar enough with OTL to comment on the rest of the code.
David Schwartz 428 Posted January 27, 2022

On 1/25/2022 at 6:25 AM, Dalija Prasnikar said: Why? It serves no purpose whatsoever.

This is a test jig, and I've been juggling things around trying to see what the differences are.
Dalija Prasnikar 1402 Posted January 27, 2022

8 hours ago, David Schwartz said: This is a test jig, and I've been juggling things around trying to see what the differences are.

Testing is one thing; calling something that serves no purpose is another. If the system is overstressed, then pumping messages from UpdateLV will not make it run any faster. If the system is not stressed, it will pump messages even without you forcing it, literally as soon as you exit the UpdateLV method. If you have too many items on the list view and UpdateLV is killing your overall performance, then you should modify that logic: instead of updating the whole list all the time, update only the item that was modified. Yes, you could see some differences in behavior with Application.ProcessMessages, but none that really matter. Just remove those calls.
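A rough sketch of that per-item approach, based on the code posted above. UpdateItem is a hypothetical replacement for UpdateLV; it assumes the same Invoke-style marshalling to the main thread is used to call it for the item that just changed, and the end-of-batch handling is simplified.

// Update only the row whose data object just changed, instead of rescanning
// the whole list. Must run in the main thread.
procedure TThreadingTest3_frm.UpdateItem( aLI : TListItem );
begin
  var obj := TmyClass(aLI.Data);
  if not Assigned(obj) then
    Exit;

  if obj.isReady then
    aLI.SubItems[1] := 'Started...'
  else if obj.isFinished then
  begin
    // move the finished row to the "completed" list view
    var li2 := Completed_lview.Items.Add;
    li2.Caption := aLI.Caption;
    li2.SubItems.Assign( aLI.SubItems );
    li2.SubItems[1] := 'Finished!';
    li2.SubItems[3] := Format( '%f', [obj.ElapsedTm / 1000] );
    obj.markComplete;
    obj.Free;
    aLI.Data := NIL;
    aLI.Delete;

    if (Ready_lview.Items.Count = 0) then
      Go_btn.Enabled := True;  // batch finished
  end;
  // no Application.ProcessMessages needed: the main thread pumps messages
  // on its own as soon as this handler returns
end;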