Jump to content

parallax

Members
  • Content Count

    31
  • Joined

  • Last visited

Everything posted by parallax

  1. Good idea, using some logic that should be possible. I am also considering reading larger chunks of the files into an array, and looping through it - if that would be faster than reading single lines as with text files.
  2. Yes, this is what I came up with, and I am checking it now to see if it works ok. It reduced the extraction time from 147 to 83 seconds. function convert(s:string):string; var i:integer; nextNextQuote, nextQuote, nextComma, prevComma, yChar, comma, inString, quote:boolean; m1, m2, d1, d2:char; begin inString:=false; quote:=false; comma:=false; for i:=1 to length(s) do if i<=length(s) then begin nextComma:=false; nextQuote:=false; nextNextQuote:=false; prevComma:=comma; comma:=(s[i]=','); if i<length(s) then nextComma:=(s[succ(i)]=','); if (prevComma and (s[i]='Y') and nextComma) then s[i]:='1'; if (s[i]='"') then begin quote:=true; system.Delete(s, i, 1); end else quote:=false; if (s[i]='"') then begin nextQuote:=true; system.Delete(s, i, 1); end; if (s[i]='"') then begin nextNextQuote:=true; system.Delete(s, i, 1); end; //assume that only single or triple quotes takes us in or out of a string if (quote and (not nextQuote) and (not nextNextQuote)) or (quote and nextQuote and nextNextQuote) then inString:=not inString; if inString and (s[i]=',') then s[i]:=';'; if ((length(s)-i)>10) and (not inString) and prevComma then if (s[i+2]='/') and (s[i+5]='/') and (s[i+10]=',') then begin //pattern: ,??/??/????, is supposed to be a date //change date format from american mm/dd/yyyy to yyyy-mm-dd format m1:=s[i]; m2:=s[i+1]; d1:=s[i+3]; d2:=s[i+4]; s[i]:=s[i+6]; s[i+1]:=s[i+7]; s[i+2]:=s[i+8]; s[i+3]:=s[i+9]; s[i+4]:='-'; s[i+5]:=m1; s[i+6]:=m2; s[i+7]:='-'; s[i+8]:=d1; s[i+9]:=d2; end; end; result:=s; end;
  3. Yes, I know the statement is supposed to handle quotes, but I was not able to make it work with quotes and gave up after a couple of hours. The terminated by clause seemed to have no effect whatsover. Also, there are instances of double and triple quotes, when something is quoted within a string they use double quotes: "This is a string, and then someone said ""Hello there.""" So you can risk having three double quotes in a row if the quote is at the end of the string.
  4. No, I haven't tried that - that's probably a good idea. Will look at it, thanks.
  5. I was able to boil it down to a single loop, but still there is a copy call as well as three stringReplace calls. function commaReplace(s:string):string; var i:integer; inString:boolean; s2, d, dStr:string; begin //replace each comma within double quotes with semicolon, and then remove double quotes //so we can use MySQL LOAD DATA function to load the .csv directly //first remove dual double quotes s:=stringReplace(s,'""','',[rfReplaceAll]); inString:=false; for i:=1 to length(s) do begin if s[i]='"' then inString:=not inString else if inString and (s[i]=',') then s[i]:=';'; if ((length(s)-i)>10) and (not inString) then begin d:=copy(s, i, 10); if regEx.IsMatch(d) then begin //we have a date, change format from american mm/dd/yyyy to yyyy-mm-dd format dStr:=copy(d, 7, 4)+'-'+copy(d, 1, 2)+'-'+copy(d, 4, 2); //insert it in the right place s:=copy(s, 1, i-1)+dStr+copy(s, i+10, length(s)); end; end; end; s2:=stringReplace(s,'"','',[rfReplaceAll]); //convert boolean Y to 1 in the right places, and convert dates //result:=convertDates(stringReplace(s2,',Y,',',1,',[rfReplaceAll])); result:=stringReplace(s2,',Y,',',1,',[rfReplaceAll]); end;
  6. Thanks, I think you have a point, and I have been pondering doing it all in a single loop, but it seemed a bit complicated. May have to figure this out 🙂
×