David Schwartz

Any suggestions on how to "lazy load" a BLOB/CLOB from an Oracle DB?

Recommended Posts

I'm working with an Oracle DB and am using TOracleDataset from Allround Automations. The app pre-processes some data and saves it to a couple of tables. One string field is mostly pretty small, usually <100 characters. But there are some big instances, and a handful of REALLY HUGE instances (8500 chars). The max varchar2 size is 4000 chars, so to hold the handful of records that are bigger, I replaced it with a CLOB field.

 

The problem is, when the app starts up, it opens the tables using "select * from xyz". That works fine as long as there are no LOB fields -- each one takes 1-2 seconds to open. But adding a CLOB field suddenly causes it to take ~15-20 seconds to open, even if there's just one short line of data in each of the CLOB fields.

 

So I'm trying to figure out some way to "lazy load" the data in the CLOB fields (or even most of the table).

 

The nature of this problem is that most of the fields are static -- which is why they're filled from the initial preprocessing step. Even the initial data in this CLOB field comes from the pre-process step. But the user will need to modify it and then save it. (Ironically, it will usually be shrunken way down in the process.)

 

This CLOB field is not read-only, but read-write. However, the data really isn't needed until the user scrolls to that exact record. Then it gets loaded almost instantly even if it's huge.

 

So I'm trying to figure out a way of minimizing the initial load time when I open the table so it doesn't actually load the CLOB data right then, but rather when the user accesses the record the first time.

 

If it matters, none of the data needs to be pre-loaded. It can be loaded as the user scrolls through it.


It can also be paged, but I don't know how to do that with typical DB-aware controls. (I'm using cxGrids as well as some separate edit controls, and I've heard cxGrids may have that ability.)

 

I'm looking for suggestions on how to resolve this quickly -- as opposed to re-architecting the entire app.

Edited by David Schwartz


Is this yours?

http://forums.allroundautomations.com/list.html

 

If so, it would have been nice to mention this here.

 

Anyway, my idea is:

Create a view that returns all fields 1:1, except your CLOB column, which is truncated to just a few characters. (A truncation marker could also be appended when truncation actually happened.)

Now change your app to show the view instead of the table, and add an event for editing and showing the complete CLOB.
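To make the idea concrete, a minimal sketch of such a view might look like this. All table and column names here are placeholders, not taken from the actual schema; the point is that DBMS_LOB.SUBSTR returns a plain VARCHAR2, so the grid query carries a cheap preview instead of a CLOB locator:

```sql
-- Hypothetical sketch; "xyz" and its columns are placeholder names.
-- DBMS_LOB.SUBSTR(lob, amount, offset) returns VARCHAR2 for a CLOB,
-- so selecting from this view never touches a LOB locator.
CREATE OR REPLACE VIEW xyz_preview AS
SELECT t.id,
       t.other_col,
       DBMS_LOB.SUBSTR(t.big_text, 100, 1) AS big_text_preview,
       -- optional truncation flag, so the UI can show "..." when clipped
       CASE WHEN DBMS_LOB.GETLENGTH(t.big_text) > 100
            THEN 1 ELSE 0 END              AS is_truncated
FROM   xyz t;
```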

 

This is not paging, but it is a quite minimal change to the app.

 

Of course it depends on the capabilities of your components; I've never used them. But at least the Allround Automations tools are well known to me, and they are quite fast. I guess they use their own data access components, so this workaround would be a quick fix.

On 12/22/2018 at 4:07 AM, jobo said:

Anyway, my idea is:

Create a view that returns all fields 1:1, except your CLOB column, which is truncated to just a few characters. (A truncation marker could also be appended when truncation actually happened.)

Now change your app to show the view instead of the table, and add an event for editing and showing the complete CLOB.

This is not paging, but it is a quite minimal change to the app.

Of course it depends on the capabilities of your components; I've never used them. But at least the Allround Automations tools are well known to me, and they are quite fast. I guess they use their own data access components, so this workaround would be a quick fix.

That doesn't solve the problem I'm having.

 

The initial Open() on the table uses "select * from <table>" and the presence of the CLOB field has a ginormous performance impact when it runs the Open(). We're talking 1-2 seconds vs. nearly a full minute!

 

A grid gets loaded with a few of the columns. When you click the grid, it selects the entire record. It's at THAT MOMENT that I want to load the CLOB data into a TDBMemo.

 

It's like lazy-loading just that column, and then just the records as they're selected.

 

The grid in this case is a DevEx cxGrid.

 


I'm no database expert, so I pose this expecting I'll have missed something.


Suppose you put the large fields in their own table with a key pointing to them from the main table's records.

Load the main table, and then use a background process (or on demand) to load the large fields from the second table.
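One way the split could look, as a rough sketch (the table and column names are placeholders, not the poster's actual schema): the hot columns stay in the main table, and the CLOB moves to a 1:1 side table keyed by the same ID, fetched only on demand.

```sql
-- Hypothetical split; "xyz" / "xyz_text" are placeholder names.
-- The CLOB lives in its own 1:1 table, joined by the main table's key.
CREATE TABLE xyz_text (
    xyz_id    NUMBER PRIMARY KEY
              REFERENCES xyz (id),   -- same key as the main table
    big_text  CLOB
);

-- Startup then opens only the lean main table:
--   SELECT * FROM xyz;
-- and the large field is pulled per record, on demand:
--   SELECT big_text FROM xyz_text WHERE xyz_id = :id;
```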

 


You can do it with two queries. Make one query that selects all the fields you want in the grid (don't use "select *"; list the wanted field names instead). Then make another query with a parameter that selects the CLOB field for one record only. Like this:

 

select [clob field] 

from [table name]

where [Primary Field ID] = :ParameterName 

 

Then you open the second query after the first query has landed on the wanted record. The parameter will get its value from the first query. There might be an event on the grid you can use -- I don't know the DevEx grid. You can also use the AfterScroll event of the dataset to open the second query.

 

If you use the second approach, you will need a delay on the opening of the second query, or it will open for every record the user scrolls past. You can use a timer for a small delay: disable it in the BeforeScroll event and enable it in the AfterScroll event. The delay should just be long enough to make sure it does not trigger while the user is still scrolling.
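A sketch of that timer debounce in Delphi might look like this. All component names (qryMaster, qryClob, tmrClob) are placeholders, and the parameter/field access is written in generic TDataSet style -- adjust to your actual component suite's API:

```pascal
// Assumed setup (placeholders): qryMaster drives the grid, qryClob is the
// parameterized single-record CLOB query, tmrClob is a TTimer with
// Interval around 300 ms and Enabled = False at design time.

procedure TForm1.qryMasterBeforeScroll(DataSet: TDataSet);
begin
  tmrClob.Enabled := False;  // cancel any pending fetch while scrolling
end;

procedure TForm1.qryMasterAfterScroll(DataSet: TDataSet);
begin
  tmrClob.Enabled := False;  // restart the countdown on every scroll step
  tmrClob.Enabled := True;
end;

procedure TForm1.tmrClobTimer(Sender: TObject);
begin
  tmrClob.Enabled := False;
  // User has rested on a record: fetch its CLOB only now.
  qryClob.Close;
  qryClob.ParamByName('ParameterName').Value :=
    qryMaster.FieldByName('ID').Value;
  qryClob.Open;
end;
```

The effect is that rapid scrolling keeps resetting the timer, so the CLOB query only runs once the user stops on a record.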

On 12/29/2018 at 11:55 AM, David Schwartz said:

That doesn't solve the problem I'm having.

Well, I'm afraid you basically have two big problems from a technical perspective. Taking your explanation from the other thread https://en.delphipraxis.net/topic/476-any-suggestions-on-how-to-lazy-load-a-blobclob-from-oracle-db/ into account, these problems result from an incomplete or imperfect "migration" from the BDE.

Whenever or wherever a dataset is opened by "select * from anytable;", it lacks the required precision in horizontal and vertical data limitation:

a) * doesn’t cope with specific field handling requirements like BLOB data

b) a missing filter (where clause) forces the components to fetch all data from the remote server (ignoring bandwidth shared with others)

 

Now the question is, how do you imagine a cure? (I'm glancing at the other thread again.)

Are you looking for a kind of magic switch?

Are you doomed to an unintelligent solution (because of management, resources, ...)?

 

When I wrote my idea of a "workaround" I wasn't really aware that you are selecting thousands or millions of such BLOBed records unfiltered. (Which is, from my point of view, a no-go even without BLOB fields involved.)

 

Okay, the good news is: there are magic switches limiting the number of fetched records.

The bad news: with big and complex queries you will sooner or later face the problem that the query itself takes time to evaluate, even on the server side. Any sort operation, for example, is where the trouble starts. And as explained in the other thread, the paging mechanism you are looking for is implemented with sort, limit, and offset clauses.

So if you ask the server to run a complex query against millions of rows, also returning millions of rows without grace, then the magic switch asking for just a few of the records will not help in every situation.

 

Let me put it this way: even a "modern" paging mechanism doesn't help if the data design doesn't scale. You will not find an app that offers high-speed data retrieval without some kind of windowing, boxing, or filtering of the data prior to showing it.

And a normal use case in fact offers a lot of such limits; it's "natural".

A clerk works on one order after the other, not on thousands or millions. The same goes for the customer hotline: they ask for my number each time, and yes, after that we are talking about my order and everything I ordered, but this isn't about millions of order items.

 

And I’m sorry, but your requirements (from the other thread) sound a little like “wash me, but don't wet me”.

 

In any case, you can make use of a powerful database and some good components you already seem to have available to get these problems under control (even with an older version of Delphi). At least that is what I did when I was actively developing Delphi apps: starting with the BDE, then moving to ADO/OLEDB and ODBC, using the mechanisms discussed here.

 

I'm sure my suggestion would help, if you implement it.


Ok, forget all this crap. You guys are so far off in the weeds that it's ridiculous.  Thanks, but let's refocus.

 

The root problem I'm trying to deal with is the performance hit that occurs when varchar2 fields are replaced with CLOB fields. That's all it is. Everything flows from that.

 

****************************************************

 

Here's the deal...

 

I have two pairs of tables that represent master-detail relationships. They're very similar, but not identical. 

 

The data I'm working with is such that I pre-process it ONCE to load up the master tables and also initialize a lot of the records in the detail tables.

 

The two "master" tables have fewer than 500 records each, and the two "detail" tables won't ever have more than 1500 records. (Not "millions"!)

 

The users will use the program to go through the data, making comments, setting other data flags, and doing things that even weak minds are far quicker at figuring out than code.

 

As it happens, there's one field in all four tables that has a very wide range in terms of its length: 95% of the records are under 1k, and 4% are under 4k, all of which fit very happily inside of a single varchar2 field (limited to 4000 chars). But 1% are > 4k and a dozen are > 8k. This forces me to either use CLOBs or employ some crazy-ass paging using 3 varchar2 fields (which I haven't tried yet).

 

The problem is that when I change this field in each table from varchar2 to a CLOB, the TOTAL TIME to simply call Open() on all four tables goes from about two (2) seconds to almost four (4) minutes -- one minute per table. We're talking about only a few hundred records!

 

Given that 99% of the data in those fields is the exact same length (i.e., <4k) whether it's in a varchar2 or a CLOB, the performance impact clearly has something to do with how CLOBs are managed internally, not the tiny number (1%) of really long fields. (And it's NOT data transfer time to the client because, again, <1% of the data is >4k in length.)

 

I'm taking this approach to simply allow me to use data-aware controls so I don't have to manually manage all of the loads and stores whenever the cursor changes. 

 

Maybe someone has a better idea. But please stop talking about nonsense inherited from the BDE and swapping queries at run-time.

 

At its core, this is a very simple problem, and I'm just trying to find a simple solution so the program doesn't take so damn long to initialize (open the tables) when it starts up.

 

 

Edited by David Schwartz


Ok, sorry, I read this as a general discussion, and up to now nobody but you knew the record counts.

Of course, a few thousand records can't in itself be a problem.

So I already asked the magic question, and one way or another you have to pick your poison:

Use some of the magic switches already mentioned.

Do intelligent initialization instead of swallowing the whole database into memory. At the moment the processing seems to be the complete opposite of a "scaling" design.

 

At its core, this is a very simple problem, and I'm just trying to find a simple solution ..

This includes a restriction right at the beginning. Does abstraction make problems simpler and lead to simple solutions?

I don't think so. Divide and conquer! Have a look at the details!

 

All the best for 2019 to all forum visitors!



I think you're still missing the problem.

 

Varchar2 fields represent the baseline performance: it takes ~2 seconds to load 100% of the data into memory using "select * from Xxxx" on ALL FOUR TABLES. It's about 10 MB of data.

 

With this approach, I'm only missing about 2% of the data because of the small number of records that have > 4000 characters in those fields.

 

When I simply replace even one Varchar2 field in a table with a CLOB field, the time to load "select * from Xxxx" goes up by nearly 60 seconds PER TABLE! 

 

That's 0.5 sec to 60 secs. Just changing one field from Varchar2 to CLOB. 

We're talking maybe 100kb more data in a few dozen records TOTAL. There is absolutely no reason that ten megabytes can be loaded in 2 seconds but then an additional 100kb takes FOUR MINUTES!

 

I keep saying, the problem is NOT IN LOADING ALL OF THE RECORDS INTO MEMORY!!!

 

Where is the overhead coming from in switching from a Varchar2 field to a CLOB field? THAT IS THE ISSUE!

 

I've worked with applications that use SQL Server and MySQL with tables that have Memo fields (equivalent to Oracle's CLOBs) holding 100kb chunks of text -- tens of thousands of records -- and running "select * from Xxxx" takes a few seconds to load 100% of them into memory. We're talking databases a couple of orders of magnitude LARGER than what I'm working with here. And all I ever hear about Oracle is how it's supposedly so much faster and more efficient for HUGE databases. I'm sorry, but it's TOTALLY FAILING here on a teeny-weenie change that should have zero impact on performance.

 

This is NOT NOT NOT related to "select * from Xxxx" loading the whole table into memory! Not when the entirety of all four tables is UNDER 10 MB! 

 

What accounts for the disparity between 2 sec and 4 minutes when the size of the data load is only 100kb more? It's NOT "load time"! Not when I've got several GBytes of free memory available!


There are a lot of people seemingly asking about CLOB performance. Here are some links that I've found:

http://betteratoracle.com/posts/45-the-performance-cost-of-clobs

https://stackoverflow.com/questions/38776560/oracle-is-there-any-performance-hit-for-a-clob-column-that-contains-rows-with

 

What timings do you get for both data types when you execute "select * from Xxxx" from a command-line Oracle SQL tool?

Edited by ertank


As suggested by Jobo above, put the CLOB fields in a separate single-record query which you trigger on selection of a specific record in the overall query.

As I understand the documentation of your component suite, there is probably a need to instantiate a TLOBLocator for every record with a CLOB -- it could be that this is causing significant overhead.

Further, it might be a thread or other resource issue on the Oracle DB server hitting you because of the CLOB overhead (this is just a wild guess, though).

On 12/31/2018 at 11:18 PM, David Schwartz said:

That's 0.5 sec to 60 secs. Just changing one field from Varchar2 to CLOB.

This is because the dataset now contains references to the CLOBs, and for each row it needs to do a fetch.

So 0.5 seconds means one fetch; 60 seconds means one fetch plus one additional fetch for each CLOB.

So you will either need DACs and data-aware components that can show the user whether there is data or not (this is determined in the first fetch) and then do the actual BLOB/CLOB fetch when the user, for example, clicks the cell for that field -- I do not have the time to research whether this is possible; use an SQL Monitor* and experiment. Or (much simpler) set up a dataset as a child (one-to-one) of the master so that it is fetched when the user navigates to the record.

 

If TDataset compliance is needed (and i assume it is) then i would go about it like so:

 

Select T.ID, T.B, T.C, CAST(SUBSTRING(T.M from 1 for 10) as VARCHAR) AS M

from T

...

 

Select T.ID, T.M

from T

Where T.ID = :ID

 

(This is Firebird syntax from the top of my head -- not Oracle.)

 

The second query is the "detail". This way you can show the user the first 10 characters (0.5 seconds), and either when the user navigates or when they click a cell, put code there to set the parameter of query 2 and open it. After a BLOB/CLOB post you would need to update that record in the "master" dataset (query 1).
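Since the snippet above is Firebird syntax, a rough Oracle translation of the same two-query idea might look like this (the table T and its columns are placeholders carried over from the example; DBMS_LOB.SUBSTR returns a plain VARCHAR2, so the first query fetches no CLOB locators at all):

```sql
-- "Master" query for the grid: CLOB column M replaced by a 10-char preview.
SELECT t.id, t.b, t.c,
       DBMS_LOB.SUBSTR(t.m, 10, 1) AS m
FROM   t;

-- "Detail" query, opened per record when the user navigates or clicks:
SELECT t.id, t.m
FROM   t
WHERE  t.id = :ID;
```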

 

HTH,

 

/Dany

 

* It will show you when a BLOB/CLOB reference is fetched as part of a row and when the actual BLOB/CLOB is fetched.

Edited by Dany Marmur


Just did a small test with my DAC (IBObjects), a TDataSource, and vanilla VCL TDBGrid + TDBMemo.

This is only one query containing all fields, even the BLOB.

The grid only shows "(MEMO)" or "(memo)" for the cell. All rows are fetched (I have an SQL monitor) without the BLOB content for the grid (only the reference).

The BLOB fetch is done when I navigate to a record, not when the grid is filled.

If I disable the (standalone) TDBMemo control, the BLOB content is never fetched.

Is this what you are after? Perhaps it's some magic done by IBObjects, but I think not.

Either way, I do not think it's possible to have an "all data, all scenarios" solution for this.

As some have already said: the GUI's needs dictate the tactics here.


