Mark Williams

Posts posted by Mark Williams


  1. That works in the sense that it masks the exception, but the operation basically stops early.

     

    The example applications do not use masking of this error, and yet they work for me. I assume there must be some difference in the project options, but I could not find any differences that appeared significant in this context. I updated my project options to align just in case, but no joy.
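    For reference, the masking in question is roughly as follows: a minimal sketch using the RTL's System.Math routines, applied before calling into the DLLs. It suppresses the exception in the host app rather than curing whatever raises it, which is presumably why the operation then stops early.

    uses
      System.Math;  // SetExceptionMask and exAllArithmeticExceptions live here

    // Mask all FPU/SSE arithmetic exceptions before calling into the DLLs;
    // the floating point violation is then swallowed rather than raised here.
    procedure MaskFpuExceptions;
    begin
      SetExceptionMask(exAllArithmeticExceptions);
    end;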

     

     


  2. I'm trying to use the TesseractOCR4 wrapper from GitHub.

     

    I have cloned and downloaded the project, and I can get all the example projects to work as expected.

     

    However, I run into floating point operation errors with the DLLs when I try to use it in my own application.

     

    I have all the necessary units included in my uses clause, and I have copied all the necessary DLLs to my application folder, along with the test data.

     

    My app runs, loads the library fine, and I can call the recognize function to extract the text. However, it extracts the text as a single word (i.e. with no gaps between words). I assume this is down to a default setting (although I can't imagine why).

     

    There is a PageSegMode property which can be used to change the way Tesseract recognizes the text. I can change this fine within the example apps, but whenever I try to set it in my own app to one of the pre-defined constants, it causes a floating point violation within one of the DLLs.
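    The failing line is essentially just a property assignment, something like the sketch below. The identifier names follow what I recall from the wrapper's example projects and may not match exactly, so treat them as illustrative:

    // FTesseract is an already-initialized instance of the wrapper's OCR class;
    // psmAuto is one of its pre-defined page segmentation constants. This
    // assignment is what triggers the floating point violation in my app
    // (but not in the example apps).
    FTesseract.PageSegMode := psmAuto;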

     

    I assume there must be some project option, set or unset in my app, which is causing this issue. Can anyone please point me in the right direction?

     

     


  3. Do I really need to make that choice? As the inline vars compile OK, they shouldn't be underlined as code errors. So I was wondering whether there is an option I need to configure in the IDE to prevent this issue, or whether it is a bug in the new IDE.

     

    If it is not a configuration option, then I assume it must be a bug. So I would rather not make the choice: I would rather get an answer, and if it is a bug I will report it and hope it gets fixed.


  4. I thought I'd give the new inline variables a try in 10.3.1. 

     

    When I enter e.g. "var b := True", it is underlined in squiggly red to indicate a code error (as are chunks of the subsequent code), but it compiles fine.
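    A minimal example of the sort of code in question; it compiles and runs, yet the IDE underlines it in red:

    procedure Test;
    begin
      var b := True;            // inline variable, type inferred as Boolean
      var Count: Integer := 0;  // an explicit type is also allowed
      if b then
        Inc(Count);
    end;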

     

    I would like to use this new feature, but this is going to leave me confused as to where actual errors lie.

     

    Is this a bug or do I need to configure something in the IDE to stop this happening?


  5. Occasionally, when I am debugging a particular app, if there is a bug in a section of code, the debugger opens one of my units (always the same one, and one which wasn't previously open or changed) and reports a bug at a specific line in that unit. The report is wrong: there is no bug. If I comment out the supposedly offending line in this unit and recompile, I am taken to the correct bug. After I correct that, I can go back to the incorrectly identified line, uncomment it and recompile successfully.

     

    Does anyone know why I am seeing this behaviour?

     

    I am using 10.3.1, but the issue was present at least as far back as 10.2.


  6. 3 minutes ago, David Schwartz said:

    it's done once, and it can be done in the background (separate thread) so as to be almost unnoticeable.

    I already handle it in a separate thread, so that's all set to go. I also use a separate thread to check for any new records/changes etc. I'll test the encryption side. I have a feeling that the locally stored database will be tens of MBs for a database with around 10,000 records, and my experience of strong encryption for files of that magnitude is that it is slow. There's also the issue of where to store the encryption key. Hard-coding it within the app is not a good idea. I could store it on the server and retrieve it on start up, but the key is something which should probably be changed regularly for security reasons, which would then render the local file inaccessible. I suppose in that case you just load it all over again from the server.


  7. 3 minutes ago, David Schwartz said:

    What I'm trying to say is ... there are plenty of ways to approach local caching regardless of the sensitivity of your data.

    OK. Food for thought. TFDQuery has SaveToStream capabilities, so the data could be saved to a local file. Encrypting and decrypting such a hefty file would probably take a while, but that's probably better than the hit on the server. Certainly another option.
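    A minimal sketch of that idea, assuming a TFDQuery named qryDocs; SaveToFile/LoadFromFile come from FireDAC's TFDDataSet, and sfBinary needs FireDAC.Stan.StorageBin in the uses clause. The encryption step would wrap the file handling:

    const
      CacheFile = 'docs.cache';  // illustrative local path
    begin
      if FileExists(CacheFile) then
        qryDocs.LoadFromFile(CacheFile, sfBinary)   // fast start-up from the local copy
      else
      begin
        qryDocs.Open;                               // full pull from the server
        qryDocs.SaveToFile(CacheFile, sfBinary);    // cache for next time
      end;
    end;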


  8. Thanks for the info. I gave up on byteA in the end. The data is an array of numbers, so I now save it as CSV in a text field and restore it to an array when needed. There does not seem to be any appreciable loss of speed. I would certainly have looked at ftStream otherwise.
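    A sketch of that CSV round-trip; the helper names are illustrative, and the invariant format settings keep the decimal separator stable across locales:

    uses
      System.SysUtils;

    function ArrayToCSV(const Values: TArray<Double>): string;
    var
      Parts: TArray<string>;
      i: Integer;
    begin
      SetLength(Parts, Length(Values));
      for i := 0 to High(Values) do
        Parts[i] := FloatToStr(Values[i], TFormatSettings.Invariant);
      Result := string.Join(',', Parts);
    end;

    function CSVToArray(const CSV: string): TArray<Double>;
    var
      Parts: TArray<string>;
      i: Integer;
    begin
      Parts := CSV.Split([',']);
      SetLength(Result, Length(Parts));
      for i := 0 to High(Parts) do
        Result[i] := StrToFloat(Parts[i], TFormatSettings.Invariant);
    end;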


  9. 1 hour ago, David Schwartz said:

    Just because it's possible to access remote tables as easily as local tables doesn't mean you MUST do it that way! Pull down your static data and cache it locally.

    Thanks for the input. I'm not sure if you mean to cache data locally as in save it to a file on the local machine and load from there on start up. I don't think that's what you're saying, but if so, it is something I would be keen to avoid. I would think the same would apply to medical apps, which doubtless process highly sensitive data.

     

    I only download the data once, by way of a query on start up, and it then remains static (i.e. it doesn't refresh). So it's effectively cached locally from that point. I then periodically run a separate query to check for new/modified records and, if it finds any, it updates the main query.

     

    That's fine over a fast internet or local connection with 30,000 or so records, and it's great to have everything stored locally, as you say. On a slow connection it can considerably slow down start up, and that's why I am thinking of having the ability to load incrementally only what's needed where there is a slow connection or a monster number of records. However, although downloading in chunks, I am not intending to jettison any data already downloaded once the user scrolls away. It will be retained locally in case the user returns to the same location in the tree.

     

    Effectively, I'm trying to do something similar to what DBGrid does in OnDemand mode, save that if you press Ctrl+End it loads the entire table, which I am looking to avoid. I should probably delve into the DBGrids unit and see what it reveals.
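    For comparison, FireDAC itself can fetch incrementally; a sketch, assuming a TFDQuery named qryDocs. It doesn't solve the Ctrl+End problem, since jumping to the last row still forces a full fetch:

    qryDocs.FetchOptions.Mode := fmOnDemand;  // fetch rowsets only as rows are requested
    qryDocs.FetchOptions.RowsetSize := 100;   // rows per round trip to the server
    qryDocs.Open;  // start-up now pulls one rowset rather than every record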


  10. 6 minutes ago, David Heffernan said:

    The reason you don't call FileExists is that CreateFile will fail and GetLastError will tell you that the file does not exist.

    Ok. Noted, but not really my issue.

     

    However, for some reason the function is now working as expected in 64 bit and I have no idea why. 

     

    Thanks anyway.


  11. Thanks for posting all that. In terms of speed of loading, size of the returned query etc., I know there's a lot of sense in losing all the joins and displaying just the master data, with some sort of panel display for the servant data which updates as you highlight each document. The problem with that is that it necessitates stopping on each individual document to see what is happening with it, which could itself become a major issue in terms of speed of scrolling through the tree, i.e. having to arrow down instead of paging through it.

    I obviously need to give this a lot more thought, but thanks for food for that thought!

     


  12. 1 hour ago, David Heffernan said:

    Why do you call FileExists? A single call to CreateFile can tell you what you need. The code as it stands won't tell you whether or not the file is in use. What is hdl? Use THandle as stated above. 

    I copied the code from, I believe, Mike Lischke's site and found it worked exactly as I wanted in 32 bit.

    I never thought about the reason for the FileExists, but I suppose if the file doesn't exist then the answer to IsFileInUse is "no", and if it doesn't exist, why bother creating it? So to my mind it makes sense. However, it certainly isn't the cause of the problem.

     

    As for using THandle, please read my post to which you have responded, where I say "Noted and thanks, but still returns false...". I was unaware of the problem with the HFILE declaration in 64 bit, but I have now changed it and it makes no difference whatsoever. In 64 bit, when I run the function, it creates the file even though it is in use in another process and returns no error, at least not on the CreateFile call. The only error it returns is on the call to CloseHandle, but that only happens in the IDE so, as I stated in my original post, I don't particularly care. Anyway, it's not hard to imagine that the reason CloseHandle bugs out is that the file is already open in another process and the handle should never have been created.

     

    But ignoring the CloseHandle problem, which would not exist if the function worked as expected (as it does in 32 bit), why doesn't it work in 64 bit?

     

    It would be helpful if someone else tried it in a 64 bit app. All you need is a button and an OpenDialog, the above function (obviously with HFileRes changed to THandle) and the following code in your button's OnClick:

    if OpenDialog1.Execute then
      if IsFileInUse(OpenDialog1.FileName) then
        ShowMessage('in use')
      else
        ShowMessage('not in use');

     


  13. 5 minutes ago, MarkShark said:

    HFileRes should be a THandle instead of HFILE (which is a LongWord).  In 64bit CreateFile returns a 64bit value and you are assigning it to a 32bit one.

    Noted and thanks, but still returns false when the file is actually in use in 64 bit. Any idea why?
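    For reference, the change was just to the declaration:

    var
      HFileRes: THandle;  // was HFILE, which is always 32 bits; THandle is
                          // pointer-sized, so the handle is no longer truncated in Win64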


  14. 17 minutes ago, Gary Mugford said:

    I assume there's a boolean field that marks a document as examined. Is there a facility to convert the PDF to a text document so you can index the words, or is it truly an examine-them-one-at-a-time situation?

    Yes, there is; or rather, it's a servant table that records all contact with a document (i.e. who, when, time spent and the depth of their examination based on the number of pages visited etc.). You can already filter on what's been read and also on how well it's been read. You can also mark documents as thoroughly read.

     

    It already extracts the text from the documents in whatever format they come in (OCR, PDF text extraction etc.), and this is added to a database.

     

    I agree OCR has greatly improved, but there's still that worry that something has been missed. Also, some documents are photographs or are handwritten (yes, I know about ICR, but we could be talking doctor scribblings!). Even with a typed document, there may be that critical one-word handwritten comment someone has added, which just gets missed unless someone does the grunt work.

     

    You guessed right as to it being a legal app.

     

    There are all sorts of ways the documents get indexed, including automatic textual analysis and manual grunt input. But assigning manual descriptions, dates etc. to documents can mean analysing the whole document, and, well, you may as well do it as you trawl through the documents and keep a couple of egg cups on the desk to catch the eye bleed!

     

    Which brings me back to my long tree of documents and how best to populate it on the fly. 

     

    Are there no VirtualTree users out there? I thought we numbered in the thousands!


  15. 23 minutes ago, Gary Mugford said:

    Using a 'filter' set that you process AI-like with a button push running a SQL query, could be a solution. Setting up the VT off that returned dataset might save you a lot of fidgety work. The task is to turn vague descriptors into actual query WHERE parameters. Yes, that will mean User Education but the time spent doing that, saving time scrolling, should be a win-win. Think more GOOGLE and less YOUTUBE Cat Videos. Not that I would EVER be caught watching kittens.

     

    Thanks for the feedback. 

     

    I do provide a search function with AND/OR functionality to search the document index. Also, all the document data gets text-scraped/OCRd before going into the database, and there are then search facilities for the document text.

     

    A lot of the time, all of this gets you where you want to be. But not always, and this is an app managing documents where it is often critical that every document has been looked at. So the app visually displays which documents have been seen, and you can prod further by seeing who has looked at a document, for how long, whether they looked at the entire document etc.

     

    But unfortunately, sometimes the documents are really old and OCR badly, and sometimes the descriptions of the documents are utterly meaningless ("YZ123456789X-YV.pdf", for example).

     

    So I also want to give the user (which also includes me) the ability to scroll through and see what is there without having to do it in chunks.

     

    I've used it in a number of projects where there have been between 20K and 30K individual documents and between 500K and 1M actual pages. Loading the 30K or so records in one hit at start up is not too bad, even on a remote internet server, as long as you have a good internet connection at your end. But I need to allow for slow connections and also to scale up for potentially much larger document numbers.


  16. I have used the following function for checking if a file is in use for many years (it is recommended on numerous sites):

    function IsFileInUse(FileName: String): Boolean;
    var
      HFileRes: HFILE;
      hdl:THandle;
    begin
      Result := False;
      if not FileExists(FileName) then
        Exit;
      HFileRes := CreateFile(PChar(FileName),
         GENERIC_READ or GENERIC_WRITE,
         0,
         nil,
         OPEN_EXISTING,
         FILE_ATTRIBUTE_NORMAL,
         0);
      Result := (HFileRes = INVALID_HANDLE_VALUE);
      if not Result then
        CloseHandle(HFileRes);
    end;

    It still works fine for 32 bit, but not 64 bit. On 64 bit, it always returns a valid handle even when the file is in use, and thus reports that the file is not in use even when it is.

     

    Also, when run from the IDE in 64 bit, it crashes with an EExternalException ("External exception C0000008") when trying to close the handle. I have found some posts on this via Google and it seems to be a known issue, the answer being to switch off some debug options.

     

    But I am not particularly troubled by the IDE bug. I am majorly concerned by the failure to detect that a file is in use!

     

    I've tried with CreateFileA and CreateFileW, but with exactly the same result.


  17. 31 minutes ago, PeterBelow said:

    What I would do in your case is to load only the document title (and the primary key  plus perhaps a submission date, but only the minimum of data you need for display and to later fetch the full record) for all records at first, perhaps ordering the result by submission date descending or alphabetically. The resulting data would be stored client-side in a suitable data structure, and the tree would show it from that structure. Only when the user indicates that he wants to examine a document in more detail would you fetch the actual document content from the server and display it. 

     

    My initial thoughts were:

    1. On start up, fetch the data from the key id field into Query1 for all required rows. This should be very fast even for a large database.
    2. When data is needed for the tree, it is fetched into Query2. I think this query would also have to contain the key id field, so there is some duplication of data, but that would be relatively minor.
    3. Set the number of virtual nodes in the tree to the number of rows in Query1.
    4. When the tree needs to populate a node, it looks to Query1 for the document id.
    5. With that id, use FindKey on Query2 to see if it already contains the data.
    6. If not, request the data from the database and add it to Query2.
    7. Then pass the relevant RecNos from Query2 back to the tree.
    8. Possibly store the RecNo from Query2 as node data, to cut out the small overhead of having to go back to Query1 each time a node requests data.

    I think this is pretty much in line with your thoughts; a rough sketch of the node-population side follows below.
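    A minimal sketch of steps 3 to 5, assuming a TVirtualStringTree named Tree and the two FireDAC queries described; I've used Locate rather than FindKey, the field and component names are illustrative, and the OnGetText signature is per recent VirtualTreeView versions:

    procedure TForm1.PrepareTree;
    begin
      Query1.Open;                               // step 1: key ids only
      Tree.RootNodeCount := Query1.RecordCount;  // step 3: one virtual node per row
    end;

    procedure TForm1.TreeGetText(Sender: TBaseVirtualTree; Node: PVirtualNode;
      Column: TColumnIndex; TextType: TVSTTextType; var CellText: string);
    begin
      Query1.RecNo := Node.Index + 1;  // step 4: map the node to its document id
      if Query2.Locate('doc_id', Query1.FieldByName('doc_id').AsInteger, []) then
        CellText := Query2.FieldByName('description').AsString  // steps 5 and 7
      else
        CellText := 'Data populating...';  // step 6 runs in the background, then
                                           // the node is invalidated to repaint
    end;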

     

    I am not too concerned with fast scrolling; I would just show a holding string ("Data populating...") or some such. I am more concerned with handling what happens when scrolling stops for a time.

     

    I previously implemented a thread which fired once the user had stopped scrolling for a short period of time. It analysed the scroll direction and populated above or below accordingly, grabbing a few tree pages more than was necessary.

     

    This worked OK, but not great. More requests were being fired at the server than was really necessary, and occasionally when the scrolling stopped nothing happened. But these are just fine-tuning and debugging issues.

     

    Before I plunge in and try to implement something like the above (which will be quite a bit of work), I was just looking for a steer from VirtualTree users who have encountered this issue as to whether this is a sensible approach or whether there is a better one. In particular, I need to know which are the best events to handle in order to implement this. My thoughts are OnScroll, but I have a feeling there may be better suited events among the many that VirtualTree exposes.

     

    Many thanks for your input. As always highly appreciated.


  18. When using Array DML to insert new records, is there a way of obtaining the values of any autoinc key fields for the inserted rows, so that the key field of each newly inserted record can be updated in the relevant TFDQuery?

     

    If using ApplyUpdates, you merely set the AutoInc field of UpdateOptions and away you go. I can't see any obviously simple way of doing this with Array DML.
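    For context, the insert side of Array DML looks like this; a minimal sketch assuming a table with a single 'name' column (how to get the generated keys back per row is exactly the open question):

    FDQuery1.SQL.Text := 'INSERT INTO docs (name) VALUES (:name)';
    FDQuery1.Params.ArraySize := Length(Names);   // one parameter row per record
    for i := 0 to High(Names) do
      FDQuery1.Params[0].AsStrings[i] := Names[i];
    FDQuery1.Execute(FDQuery1.Params.ArraySize);  // all rows inserted in one round trip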
