David Schwartz

Which option to use for a large batch of REST queries?


I've been going through the videos, and I'm left wondering which high-level abstraction is best to use for a batch of REST requests.

 

The requests themselves are different, but they're all going through the same routine to the same service.

 

E.g., the same Google queries for different cities, or hotel room availability at a bunch of different properties in the same hotel chain. That is, situations where you need to qualify the same query with a different geographical location, or different features at the same location. The point is, the API vendor requires you to submit the same query multiple times, varying a piece of the data each time.

 

In this case, I can see using at least half of the abstractions that OTL provides. Indeed, they show loading multiple web pages in most of the examples.

 

The program's users will want to process a batch of these requests, from 4 to 100 per batch, but typically 10-20. Each request typically takes 8-12 seconds.

 

The purpose of these queries is to collect different sets of data to be analyzed at the next step. The user needs to wait for them all to complete before proceeding.

 

They're all displayed in a list that I want to update in real-time to show what's going on with each one. When they've all completed I want to switch to another tab and take the user to the next step that relies on all of the data that was just obtained.

 

Async/Await, Join, ForEach, ParallelTask, and ForkJoin all seem like equally viable candidates in this situation.

 

How would you choose one over another if they all seem equally capable?

 

 

Edited by David Schwartz


I'd use the "task list + N worker threads" abstraction. N is known beforehand (it's the "req limit" option), so you just create N threads, let them consume tasks from the list, and stop when all tasks are done. The main app checks all the threads for the finished state on a timer. Simple, clean, reliable.
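A minimal sketch of the "task list + N worker threads" idea, written with plain Delphi RTL threads (TThread from System.Classes, TInterlocked from System.SyncObjs) rather than OTL. NUM_WORKERS, NUM_TASKS and RunQuery are hypothetical; RunQuery only sleeps instead of calling a REST service. Being a console program, it blocks on WaitFor where a VCL app would instead poll each worker's Finished property from a TTimer.

```pascal
program TaskListWorkers;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

const
  NUM_WORKERS = 5;   // N, the "req limit" (hypothetical value)
  NUM_TASKS   = 20;  // size of the batch (hypothetical value)

var
  Tasks: TArray<string>;    // the task list: one entry per query
  NextIndex: Integer = -1;  // shared cursor; workers claim items via TInterlocked
  Completed: Integer = 0;   // number of finished tasks
  Workers: TArray<TThread>;
  i: Integer;

// Hypothetical stand-in for one REST call: sleeps instead of doing real I/O.
procedure RunQuery(const AQuery: string);
begin
  Sleep(50);
end;

function MakeWorker: TThread;
begin
  Result := TThread.CreateAnonymousThread(
    procedure
    var
      idx: Integer;
    begin
      // Claim the next unprocessed item; stop once the list is exhausted.
      idx := TInterlocked.Increment(NextIndex);
      while idx < Length(Tasks) do
      begin
        RunQuery(Tasks[idx]);
        TInterlocked.Increment(Completed);
        idx := TInterlocked.Increment(NextIndex);
      end;
    end);
  // CreateAnonymousThread sets FreeOnTerminate = True; we want to WaitFor and Free it ourselves.
  Result.FreeOnTerminate := False;
end;

begin
  SetLength(Tasks, NUM_TASKS);
  for i := 0 to High(Tasks) do
    Tasks[i] := 'query #' + IntToStr(i);

  SetLength(Workers, NUM_WORKERS);
  for i := 0 to High(Workers) do
  begin
    Workers[i] := MakeWorker;
    Workers[i].Start;
  end;

  // A VCL app would check Workers[i].Finished from a TTimer instead of blocking here.
  for i := 0 to High(Workers) do
  begin
    Workers[i].WaitFor;
    Workers[i].Free;
  end;

  Writeln(Format('%d of %d tasks completed', [Completed, Length(Tasks)]));
end.
```

Because every worker pulls from the same shared index, each item is claimed exactly once, and no more than NUM_WORKERS requests are ever in flight at the same time.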


Consider GraphQL, which is more effective than REST.

19 hours ago, Fr0sT.Brutal said:

I'd use the "task list + N worker threads" abstraction. N is known beforehand (it's the "req limit" option), so you just create N threads, let them consume tasks from the list, and stop when all tasks are done. The main app checks all the threads for the finished state on a timer. Simple, clean, reliable.

I think they all do that in slightly different ways.

9 hours ago, Rollo62 said:

Consider GraphQL, which is more effective than REST.

Sorry, what does this have to do with calling a REST API in OTL?

4 hours ago, David Schwartz said:

Sorry, what does this have to do with calling a REST API in OTL?

Sorry for giving you totally wrong and off-topic hints.

I thought you wanted to work with a huge number of REST queries, maybe with lots of unnecessary data to download as well, where GraphQL as the server interface could help optimize request paths and download sizes.

 

Edited by Rollo62

5 hours ago, David Schwartz said:

I think they all do that in slightly different ways.

Yep, but IMHO they all just build unneeded levels of abstraction on top of simple things. Your case, as I understand it, doesn't need a dynamic thread pool, constant task streaming and so on. However, if you want to use OTL, I think the simplest one is the best.

2 hours ago, Rollo62 said:

Sorry for giving you totally wrong and off-topic hints.

I thought you wanted to work with a huge number of REST queries, maybe with lots of unnecessary data to download as well, where GraphQL as the server interface could help optimize request paths and download sizes.

 

I guess "huge" is relative. 🙂  As I stated: 

On 1/17/2022 at 7:32 PM, David Schwartz said:

The program's users will want to process a batch of these requests, from 4 to 100 per batch, but typically 10-20. Each request typically takes 8-12 seconds.

I'm sending these requests to my own service which forwards on the requests, waits for replies, extracts the exact details needed and returns them to my client app. The rest of the data can be downloaded as well if desired, but it's superfluous in most cases. (And the overhead of downloading that data is minor.)

 

If we have 100 requests per batch and most are just sitting there waiting for a reply, I'm guessing they can all be processed in ~20 seconds of wall-clock time instead of 100 × ~12 seconds (~20 minutes) if processed in series.


Even if we're only processing 10, the total processing time is STILL going to be ~12 seconds because that's how long it typically takes for one request, and we cannot speed that up.
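A rough model behind these estimates, assuming every request takes about $t$ seconds and at most $N$ of the $n$ requests run at the same time ($N$ being whatever concurrency limit the chosen abstraction is configured with):

```latex
T_{\text{wall}} \approx \left\lceil \frac{n}{N} \right\rceil t
\qquad\text{e.g. } n = 100,\ t = 12\,\text{s}:\quad
\begin{aligned}
N = 1   &\;\Rightarrow\; T_{\text{wall}} \approx 100 \cdot 12\,\text{s} = 1200\,\text{s}\ (20\,\text{min})\\
N = 10  &\;\Rightarrow\; T_{\text{wall}} \approx 10 \cdot 12\,\text{s} = 120\,\text{s}\\
N = 100 &\;\Rightarrow\; T_{\text{wall}} \approx 1 \cdot 12\,\text{s} = 12\,\text{s}
\end{aligned}
```

Each round lasts as long as its slowest request, so $t$ should be taken near the top of the 8-12 s range, and the model ignores network and server-side queuing; treat it as a lower bound rather than a prediction.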

 

I cannot do anything about the +/- 25% variation in processing time of the service being called, so I don't see how optimizing anything will improve overall performance. Do you?

 

Edited by David Schwartz

1 hour ago, Fr0sT.Brutal said:

Yep, but IMHO they all just build unneeded levels of abstraction on top of simple things. Your case, as I understand it, doesn't need a dynamic thread pool, constant task streaming and so on. However, if you want to use OTL, I think the simplest one is the best.

Thanks, but you're pointing at something that amounts to mere milliseconds of added overhead on a 2GHz CPU.

 

Meanwhile, these tasks spend ~95% of their time waiting for a reply, which is MANY orders of magnitude longer than any inefficiency the abstraction layers could add.

So why should I care? The difference won't be perceptible to the user.

 

I suspect I can set up a thread pool of 50 for a 2-core CPU and I'm guessing it will still not saturate either core. I'm looking forward to testing it.

 

Anyway, this library offers 8 different high-level abstractions, and it's a library I'm just getting familiar with. So I'm still curious what approach others with more experience using it might choose.

 

 

Edited by David Schwartz

38 minutes ago, Attila Kovacs said:

I'm using "pipeline" for a multi downloader, with my "download" and "import" stages. It's really cool.

I can imagine many different uses for this, for sure! It's a way of implementing a *nix command-line pipe expression in code.
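To illustrate the pipeline idea (a "download" stage feeding an "import" stage, like `download | import` on a *nix command line), here is a minimal two-stage sketch. It deliberately uses the RTL's TThreadedQueue<T> (System.Generics.Collections) as the buffer between stages instead of OTL's own pipeline abstraction; the stage bodies, the END_OF_STREAM sentinel and NUM_ITEMS are hypothetical, and the download step only sleeps.

```pascal
program TwoStagePipeline;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

const
  END_OF_STREAM = '';  // sentinel: tells the next stage that no more input is coming
  NUM_ITEMS = 5;

var
  Downloaded: TThreadedQueue<string>;  // buffer between the "download" and "import" stages
  Imported: TList<string>;             // written only by the import thread until it finishes
  Downloader, Importer: TThread;
  ImportedCount: Integer;
  i: Integer;

begin
  Downloaded := TThreadedQueue<string>.Create(10);  // at most 10 items in flight between stages
  Imported := TList<string>.Create;
  try
    // Stage 1: hypothetical "download" step; pushes one payload per simulated request.
    Downloader := TThread.CreateAnonymousThread(
      procedure
      var
        n: Integer;
      begin
        for n := 1 to NUM_ITEMS do
        begin
          Sleep(20);  // stands in for network I/O
          Downloaded.PushItem(Format('payload %d', [n]));
        end;
        Downloaded.PushItem(END_OF_STREAM);
      end);

    // Stage 2: hypothetical "import" step; handles each payload as soon as it arrives.
    Importer := TThread.CreateAnonymousThread(
      procedure
      var
        item: string;
      begin
        item := Downloaded.PopItem;  // blocks until stage 1 has pushed something
        while item <> END_OF_STREAM do
        begin
          Imported.Add('imported ' + item);
          item := Downloaded.PopItem;
        end;
      end);

    // Keep ownership of both threads so we can wait for them and free them ourselves.
    Downloader.FreeOnTerminate := False;
    Importer.FreeOnTerminate := False;
    Downloader.Start;
    Importer.Start;
    Downloader.WaitFor;
    Importer.WaitFor;
    Downloader.Free;
    Importer.Free;

    for i := 0 to Imported.Count - 1 do
      Writeln(Imported[i]);
    ImportedCount := Imported.Count;
  finally
    Imported.Free;
    Downloaded.Free;
  end;
end.
```

The import stage starts working on the first payload while later ones are still being downloaded, which is the main benefit over "fetch everything, then process".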

 

In my case, I'm just grabbing a bunch of data, then when it has all been fetched I let the user select pieces of it and it displays a bunch of related items in a heatmap. There's no "import" process required.

 

 

Edited by David Schwartz

1 hour ago, David Schwartz said:

Thanks, but you're pointing at something that amounts to mere milliseconds of added overhead on a 2GHz CPU.

No, I'm not considering perf at all, just the resulting code. I just don't see benefits in adding more abstractions on top of what could serve well as-is.

18 minutes ago, Fr0sT.Brutal said:

No, I'm not considering perf at all, just the resulting code. I just don't see benefits in adding more abstractions on top of what could serve well as-is.

Ok, got it. But I like abstractions! 🙂

 

Edited by David Schwartz


I've decided that Parallel.Join works best for my needs. Here's part of a test I built. The full code is attached in a zip file.

Note that I prefer to use Raize Components (Konopka), but you can probably replace them with the standard VCL versions here.

 

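// Units used besides the VCL/Raize ones: OtlParallel, OtlTaskControl, DSiWin32, System.DateUtils
var
  join : IOmniParallelJoin;
  start_tm : TDateTime;  // when the batch started; used for the "Total time" message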

// I have 2 pages in a PageControl
// The first has a TListView that contains a list of things to process, with objects attached to the .Data property
// The second has a TListView that shows completed items.
// Each item in the first one is processed, then moved to the second one. This way, you can see the list shrinking as tasks complete.

procedure TThreadingTest3_frm.Go_btnClick(Sender: TObject);
begin
  Main_ntbk.ActivePageIndex := 0;
  Completed_lview.Items.Clear;   

  if (Ready_lview.Items.Count = 0) then
    ShowMessage( 'add some tasks first!' )
  else
  begin
    Go_btn.Enabled := False;
    StartProcessing( Ready_lview );
  end;
end;

procedure TThreadingTest3_frm.StartProcessing( aLV : TRzListView );
begin
  var ntasks := 5; // if this is 1, everything runs synchronously, not async

  join := Parallel.Join
                  .NumTasks( ntasks )
                  .OnStopInvoke( procedure  // AFTER <<EACH TASK>> COMPLETES, run this in the main thread
                                 begin
                                   UpdateLV;   // update the ListView in the main thread
                                 end
                               )
                  .TaskConfig(
                      Parallel.TaskConfig.OnTerminated(  // AFTER <<ALL TASKS>> COMPLETE, run this in the main thread
                          procedure (const task: IOmniTaskControl)
                          begin
                            UpdateLV;   // update the ListView in the main thread
                          end
                        )
                      )
                  ;
  start_tm := Now;
  for var li in aLV.Items do  // join is a list; this adds tasks to it
    ProcessListItem( li );

  join.NoWait.Execute;        // this schedules everything to run, then returns
end;

procedure TThreadingTest3_frm.ProcessListItem( aLI : TListItem );
begin
  join.Task(  // creates a new task and adds it to the list
    procedure (const joinState: IOmniJoinState)
    begin
      var obj := TmyClass(aLI.Data);
      obj.ElapsedTm := 0; // signifies we've started
      joinState.task.Invoke( UpdateLV );   // update the ListView in the main thread
      var elapsed := DSiTimeGetTime64;

      // do something
      DSiuSecDelay( obj.DelayAmt * 1000 * 1000 ); // micro-seconds
      // done

      obj.ElapsedTm := DSiElapsedTime64( elapsed ); // in msecs
      joinState.task.Invoke( UpdateLV );   // update the ListView in the main thread
    end
  );
end;

procedure TThreadingTest3_frm.UpdateLV;  // this must run in the main thread
begin
  // Walk the list backwards: deleting an item while enumerating forwards
  // would skip the item that slides into its place.
  for var i := Ready_lview.Items.Count - 1 downto 0 do
  begin
    var li := Ready_lview.Items[i];
    var obj := TmyClass(li.Data);
    if Assigned(obj) then
    begin
      if obj.isReady then
        li.SubItems[1] := 'Started...'
      else if obj.isFinished then
      begin
        var li2 := Completed_lview.Items.Add;
        li2.Caption := li.Caption;
        li2.SubItems.Assign( li.SubItems );
        li2.SubItems[1] := 'Finished!';
        li2.SubItems[3] := Format( '%f', [obj.ElapsedTm / 1000] );

        obj.markComplete;
        obj.Free;
        li.Data := NIL;
        li.Delete;
      end;
    end;
  end;
  Application.ProcessMessages;
  // Check for completion AFTER the loop, so the call that removes the last item also wraps up.
  var we_are_done := (Ready_lview.Items.Count = 0);
  if we_are_done then
  begin
    if (Main_ntbk.ActivePage <> Completed_TabSheet) then
    begin
      Main_ntbk.ActivePage := Completed_TabSheet;
      ShowMessage( 'ALL DONE! Total time = '+Format( '%f', [SecondSpan( Now, start_tm )] ) );
    end;
    Go_btn.Enabled := True;
  end;
end;

 

parallel_join_test3.zip

Edited by David Schwartz


 

On 1/21/2022 at 1:15 PM, David Schwartz said:

Application.ProcessMessages; 

Why? It serves no purpose whatsoever.

 

I am not familiar enough with OTL to comment on the rest of the code.

On 1/25/2022 at 6:25 AM, Dalija Prasnikar said:

 

Why? It serves no purpose whatsoever.

 

This is a test jig and I've been juggling things around trying to see what the differences are. 

8 hours ago, David Schwartz said:

This is a test jig and I've been juggling things around trying to see what the differences are. 

Testing is one thing; calling something that serves no purpose is another.

 

If the system is overstressed, pumping messages from UpdateLV will not make it run any faster. If the system is not stressed, it will pump messages even without you forcing it, literally as soon as you exit the UpdateLV method. If you have too many items in the list view and UpdateLV is killing your overall performance, then you should modify that logic: instead of updating the whole list every time, update only the item that was modified.

 

Yes, you could see some differences in behavior with Application.ProcessMessages, but none that really matter. Just remove those.
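A minimal sketch of the "update only the item that was modified" advice, using the RTL's TThread.Queue rather than OTL's Invoke: each worker asks the main thread to refresh only its own row, and nothing calls ProcessMessages. Status stands in for the list-view rows, and UpdateItem, StartWorker and the Sleep-based work are hypothetical. Because a console program has no message loop, its main thread runs the queued calls itself via CheckSynchronize; in a VCL app the normal message loop takes care of that.

```pascal
program PerItemUpdates;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

const
  NUM_ITEMS = 5;

var
  Status: array[0..NUM_ITEMS - 1] of string;  // stands in for the list-view rows
  Remaining: Integer = NUM_ITEMS;             // only read and written in the main thread
  Workers: array[0..NUM_ITEMS - 1] of TThread;
  i: Integer;

// Runs in the main thread and touches exactly one row instead of rescanning them all.
procedure UpdateItem(AIndex: Integer; const AText: string);
begin
  Status[AIndex] := AText;
  Writeln(Format('row %d: %s', [AIndex, AText]));
end;

function StartWorker(AIndex: Integer): TThread;
begin
  Result := TThread.CreateAnonymousThread(
    procedure
    begin
      // Each worker only ever asks the main thread to refresh its own row.
      TThread.Queue(nil,
        procedure
        begin
          UpdateItem(AIndex, 'Started...');
        end);
      Sleep(10 * (AIndex + 1));  // hypothetical work standing in for one REST call
      TThread.Queue(nil,
        procedure
        begin
          UpdateItem(AIndex, 'Finished!');
          Dec(Remaining);
        end);
    end);
  Result.FreeOnTerminate := False;
  Result.Start;
end;

begin
  for i := 0 to NUM_ITEMS - 1 do
    Workers[i] := StartWorker(i);

  // No message loop in a console app, so the main thread runs the queued calls itself.
  while Remaining > 0 do
    CheckSynchronize(100);

  for i := 0 to NUM_ITEMS - 1 do
  begin
    Workers[i].WaitFor;
    Workers[i].Free;
  end;
end.
```

The cost of each update stays constant no matter how many rows the list holds, which is the point of the advice above.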

