
parallax

Members
  • Content Count

    31
  • Joined

  • Last visited

Community Reputation

1 Neutral


  1. I am having issues with testing your code, getting very large files. Could you please attach a test project?
  2. You are correct: double quotes are only used if there is a comma within the string, and quotation marks within strings are escaped by doubling them. So your load clause should work, but it doesn't - about 10% of the rows are discarded somehow.
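To illustrate the convention described above (a made-up row, not from the actual VAERS data): a comma inside a field forces the enclosing quotes, and an embedded quote is doubled:

```
NUMDAYS,SYMPTOM_TEXT,DIED
3,"Patient reported ""severe"" headache, then recovered",N
```

Here only SYMPTOM_TEXT is quoted, because it contains a comma; the embedded quotes around "severe" are written as "" inside the quoted field.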
  3. Not sure why you get that error. There should be 35 fields in the data file:
     MyArr1990VAERSDATAFields: array [0 .. 34] of string = (
       'VAERS_ID', 'RECVDATE', 'STATE', 'AGE_YRS', 'CAGE_YR', 'CAGE_MO',
       'SEX', 'RPT_DATE', 'SYMPTOM_TEXT', 'DIED', 'DATEDIED', 'L_THREAT', 'ER_VISIT',
       'HOSPITAL', 'HOSPDAYS', 'X_STAY', 'DISABLE', 'RECOVD', 'VAX_DATE',
       'ONSET_DATE', 'NUMDAYS', 'LAB_DATA', 'V_ADMINBY', 'V_FUNDBY', 'OTHER_MEDS',
       'CUR_ILL', 'HISTORY', 'PRIOR_VAX', 'SPLTTYPE', 'FORM_VERS', 'TODAYS_DATE',
       'BIRTH_DEFECT', 'OFC_VISIT', 'ER_ED_VISIT', 'ALLERGIES');
  4. UTF8Decode did not work as expected, even though the files are UTF-8 without BOM. But casting the output from MMFileToString to PAnsiChar worked fine, and I came down to 72 seconds for loading the big file. The mapping alone took 3,8 seconds for the 748.xxx lines. That was fast.
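The approach above can be sketched roughly as follows. This is an illustrative Windows-API version, not the thread's actual MMFileToString routine; the function name is invented, error handling is minimal, and a file size under 4 GB is assumed (GetFileSize with a nil high part):

```pascal
uses Windows, SysUtils;

// Map a file read-only and take its raw bytes as an 8-bit string
// via PAnsiChar, skipping a UTF8Decode pass entirely.
function MapFileToAnsi(const FileName: string): AnsiString;
var
  hFile, hMap: THandle;
  P: Pointer;
  Size: DWORD;
begin
  Result := '';
  hFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ,
    nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then Exit;
  try
    Size := GetFileSize(hFile, nil);  // assumes file < 4 GB
    hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
    if hMap <> 0 then
    try
      P := MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
      if P <> nil then
      try
        // Copy the mapped bytes into an AnsiString; no decoding step.
        SetString(Result, PAnsiChar(P), Size);
      finally
        UnmapViewOfFile(P);
      end;
    finally
      CloseHandle(hMap);
    end;
  finally
    CloseHandle(hFile);
  end;
end;
```

Since the files are UTF-8 without BOM, the resulting AnsiString holds the UTF-8 bytes as-is, which is fine for byte-oriented scanning and delimiting.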
  5. Yes, I can imagine that going to x64 will make it work. I just tried the MM-only loading/mapping in Delphi x32, and was out of memory again. Edit: I just tried Delphi x64, and the loading/mapping of the 748.xxx lines took 1,56 secs. But the stringlist in Delphi is Unicode, and all the text ends up in a single string, with the delimiters not being recognized.
  6. Thank you very much, that was a very fast mapping routine - very interesting! Running on Lazarus it loads (or maps) 748.xxx lines in 2,39 secs, but I was only able to insert 300.000 lines into the SG, which took 4,02 secs, before getting Out of Memory. But that is no problem; there is no need to view all the data at once in a grid anyway.
  7. Thank you for the suggestion - but I've tried the ESCAPED BY clause, and I still lose some lines when importing:
     LOAD DATA LOCAL INFILE 'c:/programdata/mysql/mysql server 8.0/uploads/2022VAERSDATA.csv'
     INTO TABLE data
     CHARACTER SET latin1
     FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY ''
     IGNORE 1 LINES
     This inserts 35.673 lines of 39.563, so I am losing close to 10% of the data. When working on this, I came across this section in the manual: https://dev.mysql.com/doc/refman/8.0/en/load-data.html I also noticed this suggestion at SO, but it did not work as indicated in the manual. If anyone has a solution that actually works in this case, I would of course appreciate that very much. Maybe something is escaping me?
  8. Thank you for the suggestion, it looks very interesting. I have not worked with FD yet, so it will take a while to get into it.
  9. Thank you for the idea. If I have to split the file into smaller pieces, I have to read and write the whole file first; but if that can be done more efficiently than line by line, it may work. I would have to test some block reading to see what kind of speed I can get on larger files.
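The block-reading idea above can be sketched like this (a minimal illustration; the 16 MB buffer size and file name are arbitrary, and the line-splitting work is only indicated in a comment):

```pascal
uses Classes, SysUtils;

const
  BlockSize = 16 * 1024 * 1024;  // read 16 MB per call instead of line by line
var
  Src: TFileStream;
  Buf: TBytes;
  BytesRead: Integer;
begin
  Src := TFileStream.Create('2021VAERSDATA.csv', fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Buf, BlockSize);
    repeat
      BytesRead := Src.Read(Buf[0], BlockSize);
      // Process Buf[0 .. BytesRead-1]: scan backwards for the last #10,
      // write complete lines to the current split file, and carry the
      // partial tail line over to the front of the next block.
    until BytesRead < BlockSize;
  finally
    Src.Free;
  end;
end;
```

The key point is that only the partial line at the end of each block needs special handling; everything before the last line break can be written out in one large chunk.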
  10. I was not able to load 620 Mb into a stringlist, or a memo... a smaller file of 39.000 lines/36 Mb went fast, about 120 ms. I have considered memory mapping, but I think it is too much to load into memory. And if I have to convert the file anyway to get rid of the doubled double quotes, then I cannot avoid reading and writing the whole thing, with all the time that takes. If that is the case, I guess the best one can do is as much as possible in a single pass.
  11. Testing the load data function on the original files without conversion now, I see that it loads without error messages, but it does not accept all the lines of the file. Loading the 2021 file takes close to 30 seconds, but it only loads 669.008 of 748.779 lines. The issue may be the doubled double quotes; I am not sure.
  12. Sure, functions may be run in the load statement, in the SET clauses - I know, and I already use that. You are correct that it could have been solved quite easily; I forgot about that in my previous reply. But I was not able to make it work with the quotes, so I did not get that far. I may have overlooked something - maybe it was the doubled double quotes issue that was the problem. I specified the delimiter and tested on very simple files, but to no avail. I checked the net and others have had the same issue; it may be related to the db version - I use MySQL 8.0.23. Since I was not able to resolve this, I went for the conversion routine.
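For reference, the SET-clause conversion mentioned above can look like the following. This is a hedged sketch: the user-variable names, date format, and Y-flag convention are illustrative guesses, and only a few of the 35 columns are shown - a real statement would list all of them:

```sql
LOAD DATA LOCAL INFILE 'c:/programdata/mysql/mysql server 8.0/uploads/2022VAERSDATA.csv'
INTO TABLE data
CHARACTER SET latin1
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
IGNORE 1 LINES
(VAERS_ID, @recvdate, STATE, AGE_YRS, @died)  -- remaining columns omitted for brevity
SET RECVDATE = STR_TO_DATE(@recvdate, '%m/%d/%Y'),
    DIED = (@died = 'Y');
```

Fields read into @variables bypass the column assignment, and the SET clause converts them on the way in - so the date and BOOL conversion can happen during the load rather than in a second pass.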
  13. Thank you, Anders, for a very nice setup, I will consider that. FIFO was in fact something that crossed my mind when I read your previous post, presenting the idea 🙂
  14. You are right about that - if the file can be loaded without conversion, that would be ideal. And it is probably a good thing to be a bit lazy, as that motivates creating automated solutions with minimal manual effort. Regarding a direct data load, the issue is the date format and the BOOLs; they could of course be imported as strings (assuming it worked with the quotes). But then I would not be able to sort on the dates - I could of course spin through the table and convert it in a background thread after the loading, but that is not really ideal. I assumed that if the file was first converted, sorting on dates would be possible, and queries on integers would be faster than on strings.
  15. It is not for me, but for the people who will install the program and use it for analyzing the data. They probably have slower hardware than me, so it would be nice if they would not have to wait for several minutes for the loading to finish. The idea is that this is something everybody should see, because it is so shocking - I mean the sudden spike in cases in 2021. So I want it to work well.