David Schwartz 426 Posted January 18, 2022 (edited)

I've been going through the videos, and I'm left wondering which high-level abstraction is best to use for a batch of REST requests. The requests themselves are different, but they all go through the same routine to the same service. For example, the same Google query for different cities, or hotel room availability at a bunch of different properties in the same hotel chain. That is, situations where you need to qualify the same query with a different geographical location, or with different features at the same location. The point is, the API vendor requires you to submit the same query multiple times, varying a piece of the data each time. I can see using at least half of the abstractions that OTL provides for this; indeed, most of the examples show loading multiple web pages.

The program's users will want to process a batch of these requests, from 4 to 100 per batch, but typically 10-20 on average. Each one takes 8-12 seconds. The purpose of these queries is to collect different sets of data to be analyzed at the next step, so the user needs to wait for them all to complete before proceeding. They're all displayed in a list that I want to update in real time to show what's going on with each one. When they've all completed, I want to switch to another tab and take the user to the next step, which relies on all of the data that was just obtained.

Async/Await, Join, ForEach, ParallelTask, and ForkJoin all seem like equally viable candidates in this situation. How would you choose one over another if they all seem equally capable?

Edited January 18, 2022 by David Schwartz
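For concreteness, here's roughly the shape I have in mind with one of the candidates (Parallel.ForEach). This is only a sketch; the method names are from memory and should be checked against the OTL docs, and FetchOne / GoToNextStep are placeholders for my own code:

uses
  OtlParallel;

// Sketch only: check the exact OTL method names against the docs.
procedure TMyForm.RunBatch(const queries: TArray<string>);
begin
  Parallel.ForEach(0, High(queries))
    .NumTasks(10)       // cap on concurrent requests
    .NoWait             // return immediately instead of blocking the UI
    .OnStopInvoke(
      procedure
      begin
        GoToNextStep;   // placeholder: all requests done, switch tabs
      end)
    .Execute(
      procedure (const idx: integer)
      begin
        FetchOne(queries[idx]);  // placeholder: the blocking REST call (~8-12 s, mostly waiting)
      end);
end;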
Fr0sT.Brutal 900 Posted January 18, 2022

I'd use a "task list + N worker threads" abstraction. N is known beforehand as a "req limit" option, so you just create N threads, let them consume tasks from a list, and stop when all tasks are done. The main app checks all the threads for their finished state on a timer. Simple, clean, reliable.
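Something along these lines (a rough sketch only; DoRequest, GoToNextStep, FWorkers and CheckTimer are placeholders for your own code):

uses
  System.Classes, System.SysUtils;

type
  TWorker = class(TThread)
  private
    FTasks: TArray<string>;   // shared, read-only list of requests
    FNext: PInteger;          // shared "next unclaimed index" counter
  protected
    procedure Execute; override;
  public
    constructor Create(const ATasks: TArray<string>; ANext: PInteger);
  end;

constructor TWorker.Create(const ATasks: TArray<string>; ANext: PInteger);
begin
  inherited Create(False);
  FTasks := ATasks;
  FNext := ANext;
end;

procedure TWorker.Execute;
var
  i: Integer;
begin
  repeat
    i := AtomicIncrement(FNext^) - 1;   // claim the next task
    if i > High(FTasks) then
      Break;                            // task list exhausted
    DoRequest(FTasks[i]);               // placeholder: the blocking REST call
  until Terminated;
end;

// Main form: create N = "req limit" workers over the same task array,
// then poll them from a TTimer.
procedure TMainForm.CheckTimerTimer(Sender: TObject);
var
  w: TWorker;
begin
  for w in FWorkers do
    if not w.Finished then
      Exit;                 // at least one worker is still busy
  CheckTimer.Enabled := False;
  GoToNextStep;             // all tasks done; proceed with the collected data
end;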
Rollo62 536 Posted January 18, 2022

Consider GraphQL, which can be more efficient than REST.
David Schwartz 426 Posted January 19, 2022

19 hours ago, Fr0sT.Brutal said: I'd use a "task list + N worker threads" abstraction. N is known beforehand as a "req limit" option, so you just create N threads, let them consume tasks from a list, and stop when all tasks are done. The main app checks all the threads for their finished state on a timer. Simple, clean, reliable.

I think they all do that in slightly different ways.
David Schwartz 426 Posted January 19, 2022

9 hours ago, Rollo62 said: Consider GraphQL, which can be more efficient than REST.

Sorry, what does this have to do with calling a REST API in OTL?
Rollo62 536 Posted January 19, 2022 (edited)

4 hours ago, David Schwartz said: Sorry, what does this have to do with calling a REST API in OTL?

Sorry for giving you totally wrong and off-topic hints. I thought you wanted to work with a huge number of REST queries, maybe with lots of unnecessary data to download as well, where GraphQL as the server interface could help optimize request paths and download sizes.

Edited January 19, 2022 by Rollo62
Fr0sT.Brutal 900 Posted January 19, 2022

5 hours ago, David Schwartz said: I think they all do that in slightly different ways.

Yep, but they all just build unneeded levels of abstraction on top of simple things, IMHO. Your case, as I understood it, doesn't need a dynamic thread pool, constant task streaming, and so on. However, if you want to use OTL, I think the simplest abstraction is the best.
David Schwartz 426 Posted January 19, 2022 (edited)

2 hours ago, Rollo62 said: Sorry for giving you totally wrong and off-topic hints. I thought you wanted to work with a huge number of REST queries, maybe with lots of unnecessary data to download as well, where GraphQL as the server interface could help optimize request paths and download sizes.

I guess "huge" is relative. 🙂 As I stated:

On 1/17/2022 at 7:32 PM, David Schwartz said: The program's users will want to process a batch of these requests, from 4 to 100 per batch, but typically 10-20 on average. Each one takes 8-12 seconds.

I'm sending these requests to my own service, which forwards them on, waits for the replies, extracts the exact details needed, and returns them to my client app. The rest of the data can be downloaded as well if desired, but it's superfluous in most cases (and the overhead of downloading it is minor).

If we have 100 requests per batch and most of them are just sitting there waiting for a reply, I'm guessing they can all be processed in ~20 seconds of wall-clock time instead of the 100 x ~12 secs it would take to process them in series. Even if we're only processing 10, the total processing time is STILL going to be ~12 seconds, because that's how long one request typically takes, and we cannot speed that up. I cannot do anything about the +/- 25% processing-time variation of the service being called, so I don't see that optimizing anything else will improve overall performance. Do you?

Edited January 19, 2022 by David Schwartz
David Schwartz 426 Posted January 19, 2022 (edited)

1 hour ago, Fr0sT.Brutal said: Yep, but they all just build unneeded levels of abstraction on top of simple things, IMHO. Your case, as I understood it, doesn't need a dynamic thread pool, constant task streaming, and so on. However, if you want to use OTL, I think the simplest abstraction is the best.

Thanks, but you're pointing at something that amounts to mere milliseconds of added overhead on a 2 GHz CPU. Meanwhile, these tasks spend ~95% of their time waiting for a reply, which is MANY orders of magnitude greater than any inefficiency caused by the abstraction layers. So why should I care? The difference won't be perceptible to the user. I suspect I could set up a thread pool of 50 on a 2-core CPU and still not saturate either core; I'm looking forward to testing it.

Anyway, this library offers 8 different high-level abstractions, and it's a library I'm just getting familiar with. So I'm still curious which approach others with more experience using it would choose.

Edited January 19, 2022 by David Schwartz
Attila Kovacs 629 Posted January 19, 2022

I'm using "pipeline" for a multi-downloader, with my "download" and "import" stages. It's really cool.
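In outline it looks something like this (a from-memory sketch rather than my production code; DownloadOne and ImportOne are placeholders):

uses
  OtlCommon, OtlCollections, OtlParallel;

procedure TMyForm.RunPipeline(const urls: TArray<string>);
var
  pipeline: IOmniPipeline;
  url: string;
begin
  pipeline := Parallel.Pipeline
    .Stage(
      procedure (const input, output: IOmniBlockingCollection)
      var
        v: TOmniValue;
      begin
        for v in input do
          output.Add(DownloadOne(v));   // placeholder: fetch one item, pass the result on
      end)
    .Stage(
      procedure (const input, output: IOmniBlockingCollection)
      var
        v: TOmniValue;
      begin
        for v in input do
          ImportOne(v);                 // placeholder: import/parse the downloaded data
      end)
    .Run;

  for url in urls do
    pipeline.Input.Add(url);
  pipeline.Input.CompleteAdding;        // no more work; the stages drain and shut down
end;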
David Schwartz 426 Posted January 19, 2022 (edited)

38 minutes ago, Attila Kovacs said: I'm using "pipeline" for a multi-downloader, with my "download" and "import" stages. It's really cool.

I can imagine many different uses for this, for sure! It's a way of implementing a *nix command-line pipe expression in code. In my case, I'm just grabbing a bunch of data; once it has all been fetched, the user selects pieces of it and a bunch of related items are displayed in a heatmap. There's no "import" process required.

Edited January 19, 2022 by David Schwartz
Fr0sT.Brutal 900 Posted January 19, 2022

1 hour ago, David Schwartz said: Thanks, but you're pointing at something that amounts to mere milliseconds of added overhead on a 2 GHz CPU.

No, I'm not considering perf at all, just the resulting code. I just don't see benefits in adding more abstractions on top of what could serve well as-is.
David Schwartz 426 Posted January 19, 2022 (edited)

18 minutes ago, Fr0sT.Brutal said: No, I'm not considering perf at all, just the resulting code. I just don't see benefits in adding more abstractions on top of what could serve well as-is.

Ok, got it. But I like abstractions! 🙂

Edited January 19, 2022 by David Schwartz
David Schwartz 426 Posted January 21, 2022 (edited)

I've decided that Parallel.Join works best for my needs. Here's part of a test I built; the full code is attached in a zip file. Note that I prefer to use Raize Components (Konopka), but you can probably replace them with the regular versions here.

var
  join : IOmniParallelJoin;

// I have 2 pages in a PageControl.
// The first has a TListView that contains a list of things to process, with objects attached to the .Data property.
// The second has a TListView that shows completed items.
// Each item in the first one is processed, then moved to the second one.
// This way, you can see the list shrinking as tasks complete.

procedure TThreadingTest3_frm.Go_btnClick(Sender: TObject);
begin
  Main_ntbk.ActivePageIndex := 0;
  Completed_lview.Items.Clear;
  if (Ready_lview.Items.Count = 0) then
    ShowMessage( 'add some tasks first!' )
  else begin
    Go_btn.Enabled := False;
    StartProcessing( Ready_lview );
  end;
end;

procedure TThreadingTest3_frm.StartProcessing( aLV : TRzListView );
begin
  var ntasks := 5;   // if this is = 1, everything runs synchronously, not async

  join := Parallel.Join
    .NumTasks( ntasks )
    .OnStopInvoke(
      procedure   // runs in the main thread after ALL tasks have completed
      begin
        UpdateLV;   // update the ListView in the main thread
      end )
    .TaskConfig( Parallel.TaskConfig.OnTerminated(
      procedure (const task: IOmniTaskControl)   // runs in the main thread as each worker task terminates
      begin
        UpdateLV;   // update the ListView in the main thread
      end ) );

  start_tm := Now;

  for var li in aLV.Items do   // join is a list; this adds tasks to it
    ProcessListItem( li );

  join.NoWait.Execute;   // this schedules everything to run, then returns
end;

procedure TThreadingTest3_frm.ProcessListItem( aLI : TListItem );
begin
  join.Task(   // creates a new task and adds it to the list
    procedure (const joinState: IOmniJoinState)
    begin
      var obj := TmyClass(aLI.Data);
      obj.ElapsedTm := 0;   // signifies we've started
      joinState.task.Invoke( UpdateLV );   // update the ListView in the main thread

      var elapsed := DSiTimeGetTime64;

      // do something
      DSiuSecDelay( obj.DelayAmt * 1000 * 1000 );   // micro-seconds

      // done
      obj.ElapsedTm := DSiElapsedTime64( elapsed );   // in msecs
      joinState.task.Invoke( UpdateLV );   // update the ListView in the main thread
    end );
end;

procedure TThreadingTest3_frm.UpdateLV;   // this must run in the main thread
begin
  var we_are_done := (Ready_lview.Items.Count = 0);

  for var li in Ready_lview.Items do begin
    var obj := TmyClass(li.Data);
    if Assigned(obj) then begin
      if obj.isReady then
        li.SubItems[1] := 'Started...'
      else if obj.isFinished then begin
        var li2 := Completed_lview.Items.Add;
        li2.Caption := li.Caption;
        li2.SubItems.Assign( li.SubItems );
        li2.SubItems[1] := 'Finished!';
        li2.SubItems[3] := Format( '%f', [obj.ElapsedTm / 1000] );
        obj.markComplete;
        obj.Free;
        li.Data := NIL;
        li.Delete;
      end;
    end;
    Application.ProcessMessages;
  end;
  Application.ProcessMessages;

  if we_are_done then begin
    if (Main_ntbk.ActivePage <> Completed_TabSheet) then begin
      Main_ntbk.ActivePage := Completed_TabSheet;
      ShowMessage( 'ALL DONE! Total time = ' + Format( '%f', [SecondSpan( Now, start_tm )] ) );
    end;
    Go_btn.Enabled := True;
  end;
end;

parallel_join_test3.zip

Edited January 21, 2022 by David Schwartz
Dalija Prasnikar 1396 Posted January 25, 2022

On 1/21/2022 at 1:15 PM, David Schwartz said: Application.ProcessMessages;

Why? It serves no purpose whatsoever. I am not familiar enough with OTL to comment on the other code.
David Schwartz 426 Posted January 27, 2022

On 1/25/2022 at 6:25 AM, Dalija Prasnikar said: Why? It serves no purpose whatsoever.

This is a test jig, and I've been juggling things around trying to see what the differences are.
Dalija Prasnikar 1396 Posted January 27, 2022

8 hours ago, David Schwartz said: This is a test jig, and I've been juggling things around trying to see what the differences are.

Testing is one thing; calling something that serves no purpose is another. If the system is overstressed, then pumping messages from UpdateLV will not make it run any faster. If the system is not stressed, it will pump messages even without you forcing it, literally as soon as you exit the UpdateLV method. If you have too many items in the list view and UpdateLV is killing your overall performance, then you should modify that logic: instead of updating the whole list every time, update only the item that was modified. Yes, you could see some differences in behavior with Application.ProcessMessages, but none that really matter. Just remove those calls.
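Roughly what I mean, as an untested sketch based on the code you posted (AllDone stands for whatever you do when the last item finishes, and I'm using plain TThread.Queue here since I don't know the OTL specifics):

// From inside the task, queue an update for just this one item:
TThread.Queue(nil,
  procedure
  begin
    UpdateItem(aLI, obj);
  end);

// Per-item update, always runs in the main thread:
procedure TThreadingTest3_frm.UpdateItem(li: TListItem; obj: TmyClass);
var
  li2: TListItem;
begin
  if obj.isReady then
    li.SubItems[1] := 'Started...'
  else if obj.isFinished then
  begin
    // move only this finished row to the "completed" list
    li2 := Completed_lview.Items.Add;
    li2.Caption := li.Caption;
    li2.SubItems.Assign(li.SubItems);
    li2.SubItems[1] := 'Finished!';
    li2.SubItems[3] := Format('%f', [obj.ElapsedTm / 1000]);
    obj.markComplete;
    obj.Free;
    li.Data := nil;
    li.Delete;
    if Ready_lview.Items.Count = 0 then
      AllDone;   // switch tabs, show the total time, re-enable the button
  end;
end;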