Jump to content
Mark Williams

Best/Simplest way to dynamically populate TVirtualStringTree from database

Recommended Posts

I am using TvirtualTree for data display where there can be large numbers of records in a database table (upwards of 30,000). 

Given the virtual paradigm, it obviously makes sense to query for the data only as needed. But I am struggling to find the best way of dealing with this.

It is not ideal to query for the data simply on the firing of the onScroll event. It would just mean that scrolling would become gruesomely slow.

I did try to implement something based on the following design:

  • Use the OnScroll event to fire a timer with an appropriate interval to indicate the user has stopped scrolling for a spell (eg 500ms).
  • Record the topNode at start of timer.
  • After timer interval check if TopNode has changed. 
  • If TopNode hasn't changed, query whether data already obtained for the nodes within the tree's visible area.
  • If data needed, obtain it via a thread so as not to inhibit further scrolling of the tree.
  • Populate the tree with the new data assuming user hasn't scrolled on since the thread executed.

I managed to get this working after a fashion. but it didn't strike me an elegant solution (nor in fact was it terribly reliable probably due to y design).

I am about to revisit this again. It occurs to me that as the virtual paradigm is key to VirtualTree this problem must have been considered and resolved as part of the tree's design so as to greatly simplify the process. If that is so, I can't find out what the solution is meant to be and would be grateful for any pointers.

If there is no easy solution as part of the tree design I would be grateful for a steer as to other ways of reliably and (hopefully) simply resolving this issue.

Share this post


Link to post
9 hours ago, Mark Williams said:

I am using TvirtualTree for data display where there can be large numbers of records in a database table (upwards of 30,000). 

 

Whenever I see such a statement I have to wonder: has the poster ever actually tried to work with a list that large? It is completely useless as an UI element in my opinion, nobody in his right mind wants to scroll around in such a list to find something he is looking for. Give your user a decent search function and show only the database records matching the search criteria. At least, if that is possible, devide the "data space" into chunks of manageable size, e.g all records where a person name starts with the same letter, then allow the user to select the letter to look for.

  • Like 2

Share this post


Link to post
Quote
13 hours ago, PeterBelow said:

has the poster ever actually tried to work with a list that large?

 

Afraid to say "yes". It's used in a document database where the documents can be a random assortment and often poorly titled. Trying to sift through the documents by categories often doesn't work well and the only way to find what you want may be by way of a laborious scroll through the table (even item by item), tedious as it may sound.

 

Your reply also seems to query the point of the virtual paradigm. Why have the ability to host millions of nodes if there is never any point in hosting them?

 

I could load 250/500/1000 items a time and have a button "Get Next 1000", but I don't wish to do it that way. I would prefer (and believe it is also better from a usability stand point) to have a tree that purports to provide access to all the relevant documents simply by scrolling and allows the user to get to the point they want to be by scrolling through the tree quickly. Having to do so in chunks is in my view clunky and obstructive. VirtualTreeView, as I have always understood it is the remedy to such clunkiness: if only I could figure out how!

Edited by Mark Williams
Update

Share this post


Link to post

DevExpress has something they call "Server Mode" wish is a "GridMode" using "refined" queries. I.e. extended "DataProviders".

It is rather convoluted IMHO but it works for specific implementations (it's not a one-for-all). You could check their demos for inspiration.

 

Share this post


Link to post
33 minutes ago, Dany Marmur said:

DevExpress has something they call "Server Mode" wish is a "GridMode" using "refined" queries. I.e. extended "DataProviders".

It is rather convoluted IMHO but it works for specific implementations (it's not a one-for-all). You could check their demos for inspiration.

 

Thanks for the tip. I've had a quick look on their site.  I'm not sure how useful it would be to download their demo product without the source code. There are so many users of VirtualTree and, whilst Peter's observations are never to be taken lightly, I can't believe that my issue, which perhaps might be better summarised as "How best to use TVirtualTreeView virtually with a database", has not been addressed previously. But I cannot find any demos or help online. I suspect that I may be looking to the wrong events and properties for handling this. If so, I can't work out which are the right ones. OnInitNode may be, but it fires for each node and I need to query to get data for all visible nodes.

 

I don't think my original approach was a million miles off, but it didn't perform as well as I hoped. It may be just a case of taking a long hard look at my code and seeing how it can be improved. However, before I do that I would really like some feedback from other virtualTreeview users who have already addressed this problem.

Share this post


Link to post
1 hour ago, Mark Williams said:

I'm not sure how useful it would be to download their demo product without the source code.

Bummer, of course, that will be kind of moot. Bummer.

 

You could request a "trustware" licence from IBO (IBObjects) showing similar stuff but then i think FireDAC has all that if you are on Enterprise.

 

No, i'm not surprised (as to why you are not fining resources on line). I'd guess that the component mainly targets "app-local", local memory and such.

 

For the "transport" (SQL <=> Delphi local) i guess you have too look somewhere else. As i said, traditionally what you are after resides [would reside] in the DataController/DataSet layer rather than in the GUI layer. Not sure i'm making myself clear enough here...

Share this post


Link to post
2 hours ago, Dany Marmur said:

I'd guess that the component mainly targets "app-local", local memory and such.

It certainly handles data from local memory, but I am pretty sure it was also designed for displaying data from databases. 

Share this post


Link to post
21 hours ago, Mark Williams said:

Afraid to say "yes". It's used in a document database where the documents can be a random assortment and often poorly titled. Trying to sift through the documents by categories often doesn't work well and the only way to find what you want may be by way of a laborious scroll through the table (even item by item), tedious as it may sound.

 

Your reply also seems to query the point of the virtual paradigm. Why have the ability to host millions of nodes if there is never any point in hosting them?

 

I could load 250/500/1000 items a time and have a button "Get Next 1000", but I don't wish to do it that way. I would prefer (and believe it is also better from a usability stand point) to have a tree that purports to provide access to all the relevant documents simply by scrolling and allows the user to get to the point they want to be by scrolling through the tree quickly. Having to do so in chunks is in my view clunky and obstructive. VirtualTreeView, as I have always understood it is the remedy to such clunkiness: if only I could figure out how!

What I would do in your case is to load only the document title (and the primary key  plus perhaps a submission date, but only the minimum of data you need for display and to later fetch the full record) for all records at first, perhaps ordering the result by submission date descending or alphabetically. The resulting data would be stored client-side in a suitable data structure, and the tree would show it from that structure. Only when the user indicates that he wants to examine a document in more detail would you fetch the actual document content from the server and display it. 

 

This way your first query should still be fast enough to not cause a too noticable delay; even if it returns 30,000 rows each of the rows would be fairly short, so the total amount of data transferred is not that large.

 

I'm not familiar with this VirtualTreeview control, but with such virtual controls the time saver is not having to load all the data into the control in one operation. But you still need to know, up front, how many "records" your control has to display, so you can create the necessary number of virtual nodes, and you need to have the data available somewhere, to be able to provide it when the control asks for it. If the ultimate source of the data is a database you need not only the number of records the query you use returns, you also need some piece of data for each of the records that allows you to tie each of the virtual nodes to a record in the database unambiguously, usually a primary key. Only then can you fetch the full data for a node when you need it for display. 

 

A database is usually not like a sequential file or list, there is no convenient way to partition a query result set into "pages". Even though some SQL databases support the ANSI SELECT syntax with OFFSET and FIRST ROWS in some manner these are very inefficient queries, since the server still has to compose the full result set on each query, it just returns only part of the resulting rows.

 

Also keep in mind that such list-type controls have a vertical scrollbar. The user can frustrate your attempts at reading the data in chunks sequentially by grabbing the scrollbar thumb and dragging it down, ending up somewhere way below the current page you have laboriously fetched before. You can only react fast to that if you have the data needed to display already stored in some data structure on the client side.

 

In summary: using this kind of display does not work well with the classical RAD "drop a query and tied that to a databound control" way of design. You are much better served by using a list of objects for the client-side data storage. You fill that with objects holding only the minimum of data at first (primary key and a display string, e.g. your document title) and tie the objects to the virtual nodes of your UI. You fetch the rest of the data only when it is needed for display. If you grow ambitious after having that working you can then use a secondary thread with a second conncection to the database to fill the objects corresponding to the next visual page of the control in background, while the user is still staring at the display of the current page.

Edited by PeterBelow

Share this post


Link to post
31 minutes ago, PeterBelow said:

What I would do in your case is to load only the document title (and the primary key  plus perhaps a submission date, but only the minimum of data you need for display and to later fetch the full record) for all records at first, perhaps ordering the result by submission date descending or alphabetically. The resulting data would be stored client-side in a suitable data structure, and the tree would show it from that structure. Only when the user indicates that he wants to examine a document in more detail would you fetch the actual document content from the server and display it. 

 

My initial thoughts were:

  1. On start up fetch the data from the key id field into Query1 for all required rows. This should be very fast even for a large database.
  2. When data is needed for the tree it is fetched into Query2. I think this query would have to also contain the key id field so there is a duplication of data, but that would be relatively minor.
  3. Set no of virtual nodes in the tree to number of rows in Query1
  4. When the tree needs to populate a node it looks to Query1 for the document id.
  5. With that id you would then use FindKey on Query2 to see if it already contains the data.
  6. If not, request the data from the database and add it to Query 2.
  7. Then pass the relevant RecNos from Query2 back to the tree.
  8. Possibly store the recNo from Query2 as node data so that cut out the small overhead of having to go back to Query1 each time a node requests data.

I think this is pretty much in align with your thoughts.

 

I am not too concerned with fast scrolling, I would just show a holding string ("Data populating...") or some such. I am more concerned with handling what happens when scrolling stops for a time. 

 

I previously implemented a thread which fired once the user stopped scrolling for a short period of time. It analysed scroll direction and populated above or below accordingly grabbing a few tree pages more than was necessary.

 

This worked ok, but not great. More requests were being fired to the server than was really necessary and occasionally when the scrolling stopped nothing happened. These are just fine tuning and debugging issues. 

 

Before I plunge in and try to implement something like the above (which will be quite a bit of work), I was just looking for a steer from VirtualTree users who have encountered this issue as to whether this is a sensible approach or whether there is a better one, In particular, I need to know what are the best events to handle in order to implement this. My thoughts are OnScroll, but I have a feeling that there may be better suited events from the many that VirtualTree exposes.

 

Many thanks for your input. As always highly appreciated.

Share this post


Link to post

Mark, I ran into your problem in a different way a while back. I had a huge jumble of data buried in a memo field and needed a way to surface it, while having my user mumble those hated words, "I'll know it when I see it." What I did was say, "What will it look like?" And he described a title that might have this word or that word, a date from roughly this six month period and some other attributes. In other words, he DID have a list of parameters, however vague. So, I wrote what I hope was a smart filter that gave him fields for his 'descriptions', allowed there to be ANDs and ORs in EACH of those fields and a button to FILTER. From millions of records, he often ended up with the exact one. But if not, it was within a screenfull's worth of data.

 

Using a 'filter' set that you process AI-like with a button push running a SQL query, could be a solution. Setting up the VT off that returned dataset might save you a lot of fidgety work. The task is to turn vague descriptors into actual query WHERE parameters. Yes, that will mean User Education but the time spent doing that, saving time scrolling, should be a win-win. Think more GOOGLE and less YOUTUBE Cat Videos. Not that I would EVER be caught watching kittens.

 

I think you're letting the user off the hook. They/He know more about the thing they are searching for than they are telling you.

 

I've done away with 'scrolling grids' in my latest version of my big project. Since I tentpole around one specific field in just about all databases in the project, it's easier for me to do that. But it was a hard decision to abandon the grid that showed just the few fields that were keys somewhere in the design. I watched users ALWAYS go to Ctrl-F to find what they were looking for. Then, when they DID decide to traverse the grid, it was rarely more than a single Next/Prev click. So, I just abandoned the grid entirely, put the Find buttons (there's an Again button) between two one-record movement buttons and that was it. In the menu, there's a popup for a grid CENTRED around the current record. And I've seen it used. Twice. I actually had a counter and kept track of how often the menu item ran.

 

BREAKING myself of how I WORKED, which was scroll and scan frequently, into USER MODE where search and work was the mode, only took 25 years.

 

All of this blather might not directly address your issue. It all centres around that user's inability/ability to at least reduce the total returned dataset by some. I know when I go spelunking to find data, it's always the LAST ONE I LOOK AT. Murphy's tenth law, I believe. I just can't imagine having it be the thirty THOUSANDTH thing I looked at!!!

Share this post


Link to post
23 minutes ago, Gary Mugford said:

Using a 'filter' set that you process AI-like with a button push running a SQL query, could be a solution. Setting up the VT off that returned dataset might save you a lot of fidgety work. The task is to turn vague descriptors into actual query WHERE parameters. Yes, that will mean User Education but the time spent doing that, saving time scrolling, should be a win-win. Think more GOOGLE and less YOUTUBE Cat Videos. Not that I would EVER be caught watching kittens.

 

Thanks for the feedback. 

 

I do provide a search function with AND+OR functionality to search the document index. Also, all the document data gets text scraped/OCRd before going on to the database and there are then search facilities for the document text.

 

All of this does a lot of the time get you where you want to be. But not always and this is an app managing documents where it is often critical that every document must have been looked at.  So the app visually displays which documents have been seen and you can further prod into it by seeing who has looked at it, for how long, did look at the entire document etc.

 

But unfortunately, sometimes the documents are really old and OCR badly and sometimes the descriptions of the documents are utterly meaningless "YZ123456789X-YV.pdf" for example. 

 

So I also want to give the user (which also includes me) the ability to scroll through and see what is there without having to do it in chunks.

 

I've used it in a number of projects where there have been between 20K to 30K individual documents and between 500K and 1M actual pages. Loading the 30K or so records in one hit at start up is not too bad even on a remote internet server as long as you have a good internet connection your end. but I need to allow for slow connections and also to scale up for potentially much larger document numbers.

Share this post


Link to post

Ahhhh, I understand better. I thought it was more along the lines of something I do in a different milieu, which is to track essay submissions for a prof that I'm an occasional Bridge partner with. So, the random names of the essays he's looking for fall into, I remember it was two semesters ago and it was Joe or maybe John, writing on murders or maybe attempted murders of journalists in one of the ex Soviet Union republics. THAT I can narrow down. Yours, not so much.

 

I assume there's a boolean field that marks a document as examined. Is there a facility to convert the PDF to a text document so you can index the words, or is it a truly examine them one at a time situation? And after examination, do you rename the file, discard it or leave it as is with a descriptor field? Feels like this might be a legal evidence app. As bad as OCR is with bad scans, there should STILL be a way to prioritize the ORDER of examination to maximize the time taken looking at them. Even if the KEY WORDS (I CONFESS) fails to be OCR'd correctly, the surrounding text might hit your eventual ordering algorithm. I've seen some impressive results with OCR these days with a little training. Eighty percent on a bad page I though virtually unreadable. And even the fabled 99 percent still has an error every line and a half. Somewhere in between, let's call it 90 percent, means an organizational improvement of, well, 90 percent. Worth the time to seek out the PDF conversion libraries and write yourself something for the computers to do overnight and on long weekends. 

 

Now, if there's nothing automatic out there (I've seen lots of apps, many of which CAN operate on command lines) then the labour/result graph might be wrong. But looking at 30K badly scanned PDFs makes my eyes bleed just thinking about it.

 

Lots of luck, GM

Share this post


Link to post
17 minutes ago, Gary Mugford said:

?I assume there's a boolean field that marks a document as examined. Is there a facility to convert the PDF to a text document so you can index the words, or is it a truly examine them one at a time situation?

Yes there is or rather it's a servant table that records all contact with a document (ie who, when, time spent and depth of their examination based on number of pages visited etc). You can already filter on what's been read and also on how well it's been read. You can also mark documents as thoroughly read.

 

It already extracts the text from the documents in whatever format they come in - ocr/ pdf text extraction etc. These are added to a database.

 

I agree OCR has greatly improved, but there's still that worry that something has been missed. Also, some documents are photographs or are handwritten (yes I know about ICR, but we could be talking doctor scribblings!). Even with a typed document, there may be that critical one word handwritten comment someone has added to a document, which just gets missed unless someone does the grunt work.

 

You guessed right as to legal app.

 

There are all sorts of ways the documents get indexed, including automatic textual analysis and manual grunt input. But assigning manual descriptions, dates etc to documents can mean analysing the whole document and well, you may as well do it as you trawl through the documents and keep a couple of egg cups on the desk to catch the eye bleed!

 

Which brings me back to my long tree of documents and how best to populate it on the fly. 

 

Are there no VirtualTree users out there! I thought we numbered in the thousands?

Edited by Mark Williams

Share this post


Link to post

Wish I was, although it sits in the component library. I use the TfcDBTreeView from Woll2Woll's First Class components in my mapping module. It runs off a self-referential database that has Key, UltimateParent and Parent fields to create the TreeView. The Object itself is dead simple:

 

  object tvMap: TfcDBTreeView
    Left = 4
    Top = 27
    Width = 367
    Height = 518
    BorderStyle = bsSingle
    TabOrder = 1
    DataSources = 'SourceRoot;SourceNodes'
    DisplayFields.Strings = (
      '"Key"   [Desc] "Desc"   [Op] "Operator"'
      'Uses "Errors" "Key" [Grade] "Grade"')
    LevelIndent = 19
  end

Nothing there that VT can't duplicate and you've already done it for your datasets. Now, as for the code ... Again, dead simple and maybe not sophisticated for your stuff, but it works as a self=referential tree exactly as I NEED it to be.

procedure TFrmMap2.tvMapUserExpand(TreeView: TfcDBCustomTreeView;
  Node: TfcDBTreeNode);
var
  qChild: TQuery;
  sourceChild: TDatasource;
begin
   if (node.level + 1) < tvMap.displayfields.count
     then exit;

   { Create new detail paramaterized query }
   sourceChild := TDatasource.create(self);
   sourceChild.name := 'ChildDataSource' + inttostr(node.level + 1);
   qChild := TQuery.create(self);
   sourceChild.dataset := qChild;
   with qChild do begin
      qChild.sql.assign(qNodes.sql);
      qChild.databasename := qNodes.databasename;
      if lastds = nil
        then qChild.datasource := SourceNodes
        else qChild.datasource := lastDS;
      qChild.active := true;
      end;
   with tvMap do begin
     DataSources := DataSources + ';' + sourceChild.name;
     displayfields.add(displayfields[displayfields.count - 1]);
     lastDS := sourceChild;
   end
end;

procedure TFrmMap2.tvMapCalcNodeAttributes(
  TreeView: TfcDBCustomTreeView; Node: TfcDBTreeNode);
begin
   { Determine if expand button should be displayed by using a lookup table }
   node.haschildren :=
       TblLU.Locate('Parent',
                    node.dataset.fieldbyname('Key').asstring, []);
//   node.haschildren :=
//       TblLU.Locate('Parent Index',
//                    node.dataset.fieldbyname('index').asstring, []);
end;

procedure TFrmMap2.tvMapChange(TreeView: TfcDBCustomTreeView;
  Node: TfcDBTreeNode);
begin
   { Synchronize table bound to edit controls }
   if node.DataSet = qRoot
     then begin
       TblView.active := false;
       end
     else begin
       TblView.active := true;
       TblView.locate('Parent;Key',
                  varArrayOf([node.dataset.fieldbyname('Parent').asstring,
                              node.dataset.fieldbyname('Key').asstring]),
                  []);
       end;
end;

procedure TFrmMap2.TbLViewAfterPost(DataSet: TDataSet);
begin
  qNodes.active := false;
  qNodes.active := true;
  Application.ProcessMessages;
end;

Now, I HOPE I didn't induce any errors into the code trying to anonymize it. But you can see it's a minimum of code and I get the ability using the hierarchy fields to keep me functional. In my case, a Key has a Parent and that Parent, somewhere UP the line has an UltimateParent. This allows me the ability to move things around. Adapting it to YOUR case, I'd see the UltimateParent being FiledTo or somesuch thing. The Parent being some combination of Examined with Contents. And Key, of course, would be your document by title. The rest of the fields would be displayed by a Record View to the right or bottom of your tree or wherever.

 

I would then write your text search routines to bring up unanalyzed documents by type and then keyword analysis. Throw in whatever else your AI deems as important in descending order of importance. Then scroll to your heart's delight, although technically, if all the AI has been well-written, the guess is that you scroll and read, one at a time. 

 

OVER-SIMPLIFYING by a lot, I know. But Peter is right when it comes to the viability of scrolling through the kind of list. I don't know HOW you decide to stop and read, but it HAS to be quantifiable. So says the guy who's never seen the code or the data. For which I apologize for my hubris. Wish you well in getting REAL help here. GM

Share this post


Link to post

Thanks for posting all that. In terms of speed of loading and size of returned query etc I know there's a lot of sense in losing all the joins and displaying just the master data with some sort of panel display for the servant data which would update as you highlight each document. The problem with that is it necessitates stopping on each individual document to see what is happening with it, which could in itself become a major issue in terms of speed of scrolling through the tree ie having to arrow down instead of paging through it.

I obviously need to give this a lot more thought, but thanks for food for that thought!

 

Share this post


Link to post

Just because it's possible to access remote tables as easily as local tables doesn't mean you MUST do it that way! Pull down your static data and cache it locally.

 

Most medical coding & billing apps require access to an ICD10 table (60k+ records) along with other mostly static data tables that can also be rather large.

 

In one situation, the devs hooked a combobox to a query that scrolled thru the entire list of ICD10 codes, as well as a larger list that did the same thing, both via a live connection. It's a bit clunky.

 

In another situation, the program downloads a couple dozen tables from the server and caches them locally when it starts up.

 

The thing to focus on is the fact that ICD10 codes are only updated once or twice a year, and many of the other tables only grow -- ie., what was there previously doesn't change.

 

So it ends up being pretty simple to download them to local cache files once, then load those into memory tables at program startup. It's WAY faster than pulling the data down from a server every time. You could also use something like SQLite to hold them. Check the server for updates periodically and set up a scheme that lets you only download the new records for tables that do change periodically.


 

Share this post


Link to post
1 hour ago, David Schwartz said:

Just because it's possible to access remote tables as easily as local tables doesn't mean you MUST do it that way! Pull down your static data and cache it locally.

Thanks for the input. I'm not sure if you mean to cache date locally as in save them to a file on the local machine and load from there on start up. I don't think that's what your saying, but if so it would be something I would be keen to avoid. I would think it would be the same for medical apps which doubtless process highly sensitive data.

 

I only download the data once by way of a query on start up, which then remains static (ie it doesn't refresh). So it's effectively cached locally from that point.  I then periodically run a separate query to check for new/modified records and, if it finds any, it updates the main query.

 

That's fine over fast internet or local connection with 30000 or so records and it's great to have everything stored locally as you say.  On a slow connection it can considerably slow down start up and that's why I am thinking of having the ability to load incrementally only what's needed where there is a slow connection or a monster number of records. However, although downloading in chunks, I am not intending to jettison any data downloaded locally once the user scrolls away. It will be retained locally in case the user returns to the same location in the tree.

 

Effectively, I'm trying to do something similar to what dbGrid does in OnDemand mode save that if you Ctrl+END it loads the entire table, which I am looking to avoid. I should probably delve into the DbGrids unit and see what it reveals.

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×