Sonjli

REST API: too much data


Hello,

I have to use an external (horrible... 🤮) REST API that gives me a lot of data. In one case I get over 60,000 JSON records per day, so for one year it starts from 60k x 365. And the customer needs all that data.

For now I process it day by day, but that is not a viable option.

I would like to "stream" the data coming from the web service and use an async thread to "eat" the JSON while it streams. How can I do that in Delphi? Is it possible?

(Other ideas are welcome! :classic_ninja:)

 

Thanks!


Do you get a monolithic collection of JSON elements for the entire day, or do they arrive one by one?

If monolithic, you could split it up and add each element to a queue.

Is there a sequencing requirement for the processing?

If not, you could have multiple parallel consumers of that queue, along the lines of the sketch below.
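
Something like this untested sketch, using the stock TThreadedQueue and TTask from the RTL; HandleOneRecord is a placeholder for whatever you do with a single record:

uses
  System.Generics.Collections, System.Threading;

procedure HandleOneRecord(const AJson: string);
begin
  // Placeholder: parse AJson and store/process one record here.
end;

procedure ProcessElements(const AElements: TArray<string>);
const
  ConsumerCount = 4; // tune to your workload and the store behind it
var
  Queue: TThreadedQueue<string>;
  Consumers: array[0..ConsumerCount - 1] of ITask;
  I: Integer;
  Element: string;
begin
  // Bounded queue: the producer blocks when the consumers fall behind,
  // so memory use stays flat no matter how big the input is.
  Queue := TThreadedQueue<string>.Create(1000);
  try
    for I := Low(Consumers) to High(Consumers) do
      Consumers[I] := TTask.Run(
        procedure
        var
          Item: string;
        begin
          while True do
          begin
            Item := Queue.PopItem;
            if Item = '' then
              Break; // empty string is the "no more work" sentinel
            HandleOneRecord(Item);
          end;
        end);

    for Element in AElements do
      Queue.PushItem(Element);

    // One sentinel per consumer lets each of them drain and exit.
    for I := Low(Consumers) to High(Consumers) do
      Queue.PushItem('');

    TTask.WaitForAll(Consumers);
  finally
    Queue.Free;
  end;
end;

If ordering matters within a day, use one consumer per day instead of a shared pool.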


Is this a one-time import of historical data, or is more data available from the API every day? Perhaps it makes sense to write a completely separate program that runs in the background and regularly pulls data into a local database, so that your user-facing program only interacts with the local database?
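
Roughly like this (untested sketch, assuming FireDAC with the SQLite driver; the endpoint URL, table, and JSON field names are invented for illustration):

uses
  System.SysUtils, System.JSON, System.Net.HttpClient,
  FireDAC.Comp.Client, FireDAC.Stan.Def, FireDAC.Stan.Async,
  FireDAC.DApt, FireDAC.Phys.SQLite;

// Pull one day's records from the remote API into a local SQLite table.
// The user-facing program then only ever queries the local database.
procedure PullDayIntoSqlite(const ADay: TDate);
var
  Http: THTTPClient;
  Conn: TFDConnection;
  Arr: TJSONArray;
  Item: TJSONValue;
  Body: string;
begin
  Http := THTTPClient.Create;
  Conn := TFDConnection.Create(nil);
  try
    Conn.Params.DriverID := 'SQLite';
    Conn.Params.Database := 'cache.db';
    Conn.Open;

    Body := Http.Get(Format('https://api.example.com/records?day=%s',
      [FormatDateTime('yyyy-mm-dd', ADay)])).ContentAsString;

    Arr := TJSONObject.ParseJSONValue(Body) as TJSONArray;
    try
      Conn.StartTransaction; // one transaction per day keeps inserts fast
      for Item in Arr do
        Conn.ExecSQL('INSERT INTO records (id, payload) VALUES (:id, :p)',
          [Item.GetValue<string>('id'), Item.ToJSON]);
      Conn.Commit;
    finally
      Arr.Free;
    end;
  finally
    Conn.Free;
    Http.Free;
  end;
end;

Schedule that nightly (Windows Task Scheduler or a service) and the interactive app never touches the slow API at all.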

2 hours ago, Sonjli said:

For now I process it day by day, but that is not a viable option.

I do not understand the problem. Is it processed automatically but too slowly, or are you working manually?

What are you trying to solve?


If your concern is high memory usage, you can try OXml and its SAX parser. If your data is well structured, you can also use Delphi classes together with SAX parsing. This would dramatically reduce memory consumption compared to System.JSON.

http://www.kluug.net/oxml.php

 

However, your problem is not clear, as stated earlier. You might want to add details.


I've had to deal with several variations of that over time. Are there webhooks in place that call your system to hand over each JSON packet? Or is the data just dumped in chunks into a file, accessible by something like FTP? Or do you have to poll a system periodically for new data that has accumulated?

 

60k records per day isn't that big a deal. And text is quite easy to compress, especially if there's a limited lexicon involved.
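
For instance, the stock System.ZLib unit will deflate a day's dump in a couple of lines (sketch; the file names are made up):

uses
  System.SysUtils, System.Classes, System.ZLib;

// Compress a day's JSON dump before archiving it. JSON with a small,
// repetitive lexicon tends to deflate extremely well.
procedure CompressFile(const ASrc, ADest: string);
var
  Src, Dest: TFileStream;
begin
  Src := TFileStream.Create(ASrc, fmOpenRead or fmShareDenyWrite);
  try
    Dest := TFileStream.Create(ADest, fmCreate);
    try
      ZCompressStream(Src, Dest, zcMax);
    finally
      Dest.Free;
    end;
  finally
    Src.Free;
  end;
end;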

 

Then there's the backing store, where it really depends on the total lifetime of the data, what the half-life of searches is (so you know when you can flush it out to secondary storage with low likelihood of needing the majority of it), and tuning the DB to optimize the queries that come in. In America, certain types of records need to be kept for a legally defined minimum length of time, like 7 years for corporate and financial records. Most of it isn't ever looked at, but it needs to be kept online and backed up in case of audits or lawsuits. Just figure out how to compress the crap out of it and stash it away so it's easy to retrieve when needed.

 

The point is, your job is to make sure the recent data is quickly accessible, while the older data sits on slower, more massive drives but is still accessible in a reasonable amount of time. (A subpoena for data might give a month or more to provide it, which should be plenty of time to deal with slower backing stores, or even mag tapes.) Do NOT try to make 100% of it accessible "instantly". The client might have some temporal requirements, but make sure they're reasonable.


You asked:

Quote

I would like to "stream" the data coming from the web service and use an async thread to "eat" the JSON while it streams. How can I do that in Delphi? Is it possible?

Yes, this is certainly possible. What you need is a "streaming parser" that understands the lexical structure of the input (JSON) and fires appropriate events as the file is read, giving you the parsed data.

We use this approach extensively, and because it streams the file it can work with arbitrarily large input files.

We also have code that sits on top of the parser that uses separate knowledge about the Delphi classes (either specifically coded or garnered with RTTI) to build Delphi objects in real time from the JSON and then send each completed object to the application as it is read. And because they're Delphi objects, you could have a method in the object definition that knows what to do with the data. So once you have the appropriate classes defined, processing becomes as simple as instantiating the parser, configuring it, and telling it to parse the input stream. Each object is handed back to you and you can call a method on that object to process it.
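
For what it's worth, the token-level half of this exists in the stock RTL: System.JSON.Readers has TJsonTextReader, a pull parser ported from Json.NET. Here is a minimal, untested sketch that walks one huge top-level array of flat records without ever building a DOM; the 'id'/'payload' field names are placeholders, and nested objects would need a depth counter:

uses
  System.SysUtils, System.Classes, System.JSON.Types, System.JSON.Readers;

// Stream a huge JSON array from disk, handing each record to a callback,
// without holding the whole document in memory.
procedure StreamRecords(const AFileName: string;
  const AOnRecord: TProc<string, string>);
var
  Stream: TFileStream;
  TextReader: TStreamReader;
  Reader: TJsonTextReader;
  Prop, Id, Payload: string;
begin
  Stream := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
  TextReader := TStreamReader.Create(Stream, TEncoding.UTF8);
  Reader := TJsonTextReader.Create(TextReader);
  try
    while Reader.Read do
      if Reader.TokenType = TJsonToken.StartObject then
      begin
        Id := '';
        Payload := '';
        // NB: assumes flat records; a nested object would end this loop early.
        while Reader.Read and (Reader.TokenType <> TJsonToken.EndObject) do
          if Reader.TokenType = TJsonToken.PropertyName then
          begin
            Prop := Reader.Value.AsString;
            Reader.Read; // advance to the property's value token
            if Prop = 'id' then
              Id := Reader.Value.ToString
            else if Prop = 'payload' then
              Payload := Reader.Value.ToString;
          end;
        AOnRecord(Id, Payload); // one completed record at a time
      end;
  finally
    Reader.Free;
    TextReader.Free;
    Stream.Free;
  end;
end;

The RTTI-driven object building described above is the layer you would still have to write (or license) on top of this.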

But of course you have to have the parser.

The code we use has not been publicly released, but if it's of interest and your client has budget, I can ask whether it could be licensed to you, if that would help.

On 7/8/2024 at 3:36 PM, Lars Fosdal said:

Do you get a monolithic collection of JSON elements for the entire day, or do they arrive one by one?

If monolithic, you could split it up and add each element to a queue.

Is there a sequencing requirement for the processing?

If not, you could have multiple parallel consumers of that queue.

Hi. It is a monolithic JSON. Parallel consumers could be an option (one day per task, for example), yes. I have to plan the right boundaries (day, week, etc.), but yes, good one. Something along the lines of the sketch below.
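
If the boundary ends up being one day per task, TParallel.For from the RTL might be the simplest cut (untested sketch; FetchAndProcessDay is a placeholder for the per-day download-and-parse work):

uses
  System.SysUtils, System.Threading;

procedure FetchAndProcessDay(const ADay: TDate);
begin
  // Placeholder: download and process one day's worth of records.
end;

// Run the days of a range in parallel; the RTL thread pool decides
// how many actually execute at once.
procedure ProcessRange(const AFirstDay: TDate; ADayCount: Integer);
begin
  TParallel.For(0, ADayCount - 1,
    procedure(I: Integer)
    begin
      FetchAndProcessDay(AFirstDay + I);
    end);
end;

If the API rate-limits, the concurrency would need to be capped.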

On 7/8/2024 at 4:20 PM, corneliusdavid said:

Is this a one-time import of historical data, or is more data available from the API every day? Perhaps it makes sense to write a completely separate program that runs in the background and regularly pulls data into a local database, so that your user-facing program only interacts with the local database?

Thanks. Another good one: a small SQLite DB loaded by a nightly batch, yes.

On 7/8/2024 at 7:41 PM, ertank said:

If your concern is high memory usage, you can try OXml and its SAX parser. If your data is well structured, you can also use Delphi classes together with SAX parsing. This would dramatically reduce memory consumption compared to System.JSON.

http://www.kluug.net/oxml.php

However, your problem is not clear, as stated earlier. You might want to add details.

Sorry, you're right. My data is JSON (and compressed, at that), not XML. Thanks!

9 minutes ago, Sonjli said:

Sorry, you're right. My data is JSON (and compressed, at that), not XML. Thanks!

Just check the link. There is OJson with "SAX" parsing as well.

 

