pyscripter Posted September 9, 2020 (edited)

The most common way to do text processing in Delphi is to load a file into a TStringList and then process the text line by line. Often you also need to save the contents of the string list back to the file. TStringList is one of the most widely used RTL classes. However, it has a number of limitations, discussed below:

a) No easy way to preserve line breaks on load/save. TStringList makes no effort to recognize the type of line break (LF, CR or CRLF) in the files it opens, let alone remember it for use when saving.

b) Information loss without any warning or any option for dealing with it. Internally TStringList uses Unicode strings, but when you save its contents to a file using the default encoding (ANSI), characters that the ANSI code page cannot represent may be lost, and you get no warning and no means of dealing with it. TEncoding.GetBytes suffers from the same problem.

c) No easy way to determine whether a file you loaded into a TStringList contained a BOM. When you load a file (LoadFromFile method), the encoding of the file is stored, but not whether the file contained a BOM. The WriteBOM property is only used when you save a file.

d) Last but not least, no easy way of dealing with UTF-8 encoded files without a BOM. The default text file format on Linux systems, in Visual Studio Code and other editors, and in languages such as Python 3 is UTF-8 without BOM. Reading such files with TStringList is problematic and can result in errors, because it treats such files as ANSI encoded. You could change DefaultEncoding to UTF-8, but then you get errors when you read ANSI files. No effort is made to detect whether a file without a BOM contains UTF-8 sequences.

Overall, it is desirable that when you load a file using LoadFromFile and then save it using SaveToFile, the saved copy is identical to the original.
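The load-side checks behind points a), c) and d) can be sketched with the RTL alone. This is a minimal sketch under stated assumptions, not the code in the attached XStringList.pas: `TEncoding.GetBufferEncoding` is used to report a BOM (it returns the preamble length), a BOM-less buffer is treated as UTF-8 only if it is well-formed UTF-8 and contains at least one multi-byte sequence (otherwise ANSI), and the first line break found is taken as the file's style. `LooksLikeUtf8`, `DetectLineBreak` and `Analyze` are hypothetical names chosen for illustration.

```pascal
program DetectTextFormat;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// True when Bytes is well-formed UTF-8 AND contains at least one multi-byte
// sequence. Pure ASCII returns False because it decodes identically as ANSI.
function LooksLikeUtf8(const Bytes: TBytes): Boolean;
var
  I, K, N: Integer;
  B: Byte;
  HasMultiByte: Boolean;
begin
  Result := False;
  HasMultiByte := False;
  I := 0;
  while I < Length(Bytes) do
  begin
    B := Bytes[I];
    if B < $80 then
    begin
      Inc(I);
      Continue;
    end;
    // Number of continuation bytes implied by the lead byte
    if (B >= $C2) and (B <= $DF) then N := 1
    else if (B >= $E0) and (B <= $EF) then N := 2
    else if (B >= $F0) and (B <= $F4) then N := 3
    else Exit; // stray continuation byte or a byte never valid in UTF-8
    if I + N >= Length(Bytes) then Exit; // truncated sequence
    for K := 1 to N do
      if (Bytes[I + K] and $C0) <> $80 then Exit;
    // Reject overlong forms, UTF-16 surrogates and code points above U+10FFFF
    case B of
      $E0: if Bytes[I + 1] < $A0 then Exit;
      $ED: if Bytes[I + 1] > $9F then Exit;
      $F0: if Bytes[I + 1] < $90 then Exit;
      $F4: if Bytes[I + 1] > $8F then Exit;
    end;
    HasMultiByte := True;
    Inc(I, N + 1);
  end;
  Result := HasMultiByte;
end;

// Returns the first line break found in S (CRLF, LF or CR); sLineBreak if none.
function DetectLineBreak(const S: string): string;
var
  I: Integer;
begin
  for I := Low(S) to High(S) do
    case S[I] of
      #10: Exit(#10);
      #13:
        if (I < High(S)) and (S[I + 1] = #10) then
          Exit(#13#10)
        else
          Exit(#13);
    end;
  Result := sLineBreak;
end;

// Hypothetical helper: works out encoding, BOM presence and line-break style.
procedure Analyze(const Bytes: TBytes; out Encoding: TEncoding;
  out HasBOM: Boolean; out LineBreak: string);
var
  PreambleLen: Integer;
begin
  Encoding := nil; // nil asks GetBufferEncoding to detect a BOM
  PreambleLen := TEncoding.GetBufferEncoding(Bytes, Encoding);
  HasBOM := PreambleLen > 0;
  if not HasBOM then
    if LooksLikeUtf8(Bytes) then
      Encoding := TEncoding.UTF8
    else
      Encoding := TEncoding.ANSI;
  LineBreak := DetectLineBreak(
    Encoding.GetString(Bytes, PreambleLen, Length(Bytes) - PreambleLen));
end;

var
  Bytes: TBytes;
  Enc: TEncoding;
  HasBOM: Boolean;
  LB: string;
begin
  // Sample data: UTF-8 without BOM (GetBytes adds no preamble), LF line breaks
  Bytes := TEncoding.UTF8.GetBytes('caf' + #$00E9 + #10 + 'line 2' + #10);
  Analyze(Bytes, Enc, HasBOM, LB);
  Writeln('UTF-8: ', Enc = TEncoding.UTF8, ' BOM: ', HasBOM, ' LF: ', LB = #10);
end.
```

The detected encoding can then be passed to `LoadFromFile(FileName, Encoding)`, with the BOM flag and line break kept for the save step. The UTF-8 test is a heuristic: a short ANSI file can happen to form valid UTF-8, although that is rare for real text.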
I am attaching a general-purpose TStringList descendant that deals with all the above issues, in case anyone has a use for it: XStringList.pas

Edited September 9, 2020 by pyscripter
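For point b), one way to avoid silent loss is to round-trip the text through the target encoding before saving and compare the result with the original. The sketch below assumes that falling back to UTF-8 is acceptable when the original encoding would lose characters; `CanEncodeLosslessly` and `SavePreservingFormat` are hypothetical helpers, not part of XStringList.pas or the RTL. They rely only on the existing `TStrings.LineBreak` and `TStrings.WriteBOM` properties and the `SaveToFile(FileName, Encoding)` overload.

```pascal
program SaveWithoutSilentLoss;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.IOUtils;

// True when Encoding can represent every character of S. The text is encoded
// and decoded again; a lossy code page substitutes characters (often '?'),
// so the round trip no longer matches the original.
function CanEncodeLosslessly(const S: string; Encoding: TEncoding): Boolean;
begin
  Result := Encoding.GetString(Encoding.GetBytes(S)) = S;
end;

// Hypothetical helper: saves SL with the encoding, BOM flag and line break
// recorded at load time, but switches to UTF-8 rather than silently losing text.
procedure SavePreservingFormat(SL: TStringList; const FileName: string;
  Encoding: TEncoding; HadBOM: Boolean; const LineBreak: string);
begin
  if not CanEncodeLosslessly(SL.Text, Encoding) then
    Encoding := TEncoding.UTF8; // policy choice: raising or asking the user also works
  SL.LineBreak := LineBreak;
  SL.WriteBOM := HadBOM;
  SL.SaveToFile(FileName, Encoding);
end;

var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    // Greek letters cannot be represented in most (non-Greek) ANSI code pages
    SL.Add('Greek: ' + #$03B1#$03B2#$03B3);
    Writeln('ANSI is lossless: ', CanEncodeLosslessly(SL.Text, TEncoding.ANSI));
    SavePreservingFormat(SL, TPath.Combine(TPath.GetTempPath, 'demo.txt'),
      TEncoding.ANSI, False, #10);
  finally
    SL.Free;
  end;
end.
```

SaveToFile still writes a line break after the last line by default, so a file that originally lacked one will not round-trip byte for byte with this sketch alone.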
Stefan Glienke Posted September 9, 2020

Loading an entire file into a TStringList is a bad way to begin with, tbh, yet it is often the easiest way because there are no nice-to-use alternatives out of the box.
pyscripter Posted September 9, 2020 (edited)

8 hours ago, Stefan Glienke said: "Loading an entire file into a TStringList is a bad way to begin with tbh yet often the most easy way because there are no nice to use alternatives out of the box."

Agreed. In the old days (pre-Unicode) I wrote the attached Strmtxt.pas (converts a stream to a text file). It would need to be updated to work with recent versions of Delphi. It works great with buffered streams such as @David Heffernan's TReadOnlyCachedFileStream. With that you could do something like:

    var Stream := TReadOnlyCachedFileStream.Create('c:\temp\t');
    AssignStream(Txt, Stream);
    while not EOF(Txt) do
    begin
      Readln(Txt, S);
      Writeln(MemoText, S);
    end;

However, when you build a text editor such as SynEdit, you typically load the whole file into memory. In any case, the focus of this topic was the limitations of TStringList with regard to encodings, BOMs and line breaks. It was not about the most efficient way to do text processing in Delphi.

STRMTXT.PAS

Edited September 10, 2020 by pyscripter
Remy Lebeau Posted September 10, 2020

1 hour ago, pyscripter said: "In the old days (pre Unicode) I wrote the attached Strmtxt.pas (converts a stream to a text file). It would need to be updated to work with recent versions of Delphi. And it works great with buffered streams such as @David Heffernan's TReadOnlyCachedFileStream. With that you could do something like:"

You can do something similar using TStreamReader and its ReadLine() method, e.g.:

    var Stream := TReadOnlyCachedFileStream.Create('c:\temp\t');
    try
      var Reader := TStreamReader.Create(Stream);
      try
        while not Reader.EndOfStream do
        begin
          var S := Reader.ReadLine;
          // use S as needed...
        end;
      finally
        Reader.Free;
      end;
    finally
      Stream.Free;
    end;
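The same loop can also address point d) of the opening post, since `TStreamReader` accepts a default encoding and a `DetectBOM` flag. A minimal sketch, assuming BOM-less input is UTF-8 (an ANSI file containing non-ASCII characters would then be mis-decoded); `CountLines` is a hypothetical name, and a `TBytesStream` stands in for the file stream so the example runs on its own.

```pascal
program ReadLinesWithEncoding;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Counts the lines of a text stream without loading it all at once.
// Assumption: input without a BOM is UTF-8; DetectBOM = True lets a BOM
// override that default.
function CountLines(Stream: TStream): Integer;
var
  Reader: TStreamReader;
begin
  Result := 0;
  Reader := TStreamReader.Create(Stream, TEncoding.UTF8, True);
  try
    while not Reader.EndOfStream do
    begin
      Reader.ReadLine; // a real loop would process the returned string here
      Inc(Result);
    end;
  finally
    Reader.Free; // does not free Stream: the reader does not own it by default
  end;
end;

var
  Stream: TStream;
begin
  // A TFileStream (or a buffered file stream) in real code; an in-memory
  // UTF-8 buffer without BOM keeps the demo self-contained.
  Stream := TBytesStream.Create(TEncoding.UTF8.GetBytes('one'#10'two'#13#10'three'));
  try
    Writeln(CountLines(Stream));
  finally
    Stream.Free;
  end;
end.
```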
FPiette Posted September 10, 2020

8 hours ago, pyscripter said: "The most common way to do text processing in Delphi is to load a file into a TStringList and then process the text line by line."

That looks easy at first, and you have found the limitations yourself. But you forgot an important one: a TStringList loads the full file into memory. This is probably not desirable if you want to support arbitrarily large files. As previous answers mentioned, there are other possibilities. The best solution depends on what you intend to do. Sometimes it is easy to use a memory-mapped file.
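A memory-mapped file lets the operating system page a file in on demand instead of the program copying it into the heap. The sketch below is Windows-only and uses the Win32 functions `CreateFile`, `CreateFileMapping` and `MapViewOfFile` to count LF bytes; `CountLFMapped` is a hypothetical name, and it assumes the whole file fits in one view (realistic for 64-bit processes). Decoding the mapped bytes into lines with the right encoding is left out.

```pascal
program MappedLineCount;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

// Counts LF bytes in a file through a read-only memory-mapped view.
// Windows-only sketch; it maps the whole file as one view, which suits
// 64-bit processes (a 32-bit process would need to map it in chunks).
function CountLFMapped(const FileName: string): Int64;
var
  hFile, hMap: THandle;
  SizeLow, SizeHigh: DWORD;
  Remaining: Int64;
  Base: Pointer;
  P: PByte;
begin
  Result := 0;
  hFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    SizeLow := GetFileSize(hFile, @SizeHigh);
    if (SizeLow = $FFFFFFFF) and (GetLastError <> 0) then
      RaiseLastOSError;
    Remaining := (Int64(SizeHigh) shl 32) or SizeLow;
    if Remaining = 0 then
      Exit; // CreateFileMapping rejects empty files
    hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
    if hMap = 0 then
      RaiseLastOSError;
    try
      Base := MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
      if Base = nil then
        RaiseLastOSError;
      try
        P := Base;
        while Remaining > 0 do
        begin
          if P^ = 10 then
            Inc(Result);
          Inc(P);
          Dec(Remaining);
        end;
      finally
        UnmapViewOfFile(Base);
      end;
    finally
      CloseHandle(hMap);
    end;
  finally
    CloseHandle(hFile);
  end;
end;

begin
  if ParamCount > 0 then
    Writeln(CountLFMapped(ParamStr(1)));
end.
```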
Rollo62 Posted September 10, 2020 (edited)

I like useful enhancements to such basic classes very much. But as always I ask myself whether it's worth a separate class, or whether it would be better to have an interposer class for the new functionality. With such a basic class I'm not sure whether an interposer would be a good or a bad animal; there could be hard issues when mixing it with the original, static classes. Since I basically compile all code from sources, I'm quite relaxed, but this could lay the ground for nasty problems. What is your opinion: when should you use interposer classes, and when is it better to avoid them?

Edited September 10, 2020 by Rollo62
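For readers unfamiliar with the term: an interposer class reuses the RTL class name, so code that sees it gets the extended class without changing its declarations. The sketch below declares one in a program for brevity (real code would put it in its own unit and list that unit after System.Classes); `JoinWith` is a hypothetical method added only for illustration. It also shows one of the mixing pitfalls: instances created by code that cannot see the interposer are still the original class.

```pascal
program InterposerSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  // Interposer: redeclares the RTL name. Code that sees this declaration
  // (normally: units listing the interposer unit AFTER System.Classes in their
  // uses clause) gets this class whenever it writes TStringList.
  TStringList = class(System.Classes.TStringList)
  public
    // Hypothetical extension, for illustration only
    function JoinWith(const Separator: string): string;
  end;

function TStringList.JoinWith(const Separator: string): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to Count - 1 do
  begin
    if I > 0 then
      Result := Result + Separator;
    Result := Result + Strings[I];
  end;
end;

var
  SL: TStringList; // resolves to the interposer declared above
  Original: System.Classes.TStringList;
begin
  SL := TStringList.Create;
  try
    SL.Add('a');
    SL.Add('b');
    Writeln(SL.JoinWith(', '));
  finally
    SL.Free;
  end;

  // Instances created by code that cannot see the interposer are the
  // original class, so an "is" check against the interposer fails.
  Original := System.Classes.TStringList.Create;
  try
    Writeln(Original is TStringList); // FALSE: not an interposer instance
  finally
    Original.Free;
  end;
end.
```

A plain descendant with a new name (as suggested later in the thread) avoids this ambiguity, at the cost of changing the declarations that should use it.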
Remy Lebeau Posted September 10, 2020 (edited)

9 hours ago, FPiette said: "But you forgot an important one: a TStringList loads the full file into memory."

It is worse than that. It loads the entire file into a local byte array, then decodes the entire byte array into a single Unicode string, and then parses that string to extract the lines into individual substrings. So by the time the TStringList has finished being populated, and before TStrings.LoadFrom...() actually exits, you could be using upwards of 4-5 times the original file size in memory! Granted, this is temporary: the byte array and the intermediate string are freed when LoadFrom...() finally exits. But there is a small window where you have a LOT of memory allocated at one time.

Edited September 10, 2020 by Remy Lebeau
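A rough estimate of that peak, assuming an ANSI or UTF-8 file of N bytes that is mostly ASCII (so it decodes to roughly N UTF-16 characters of 2 bytes each), and ignoring per-string headers, the string list's item array and the line-break characters that are not copied into the lines:

$$
M_{\text{peak}} \approx \underbrace{N}_{\text{byte array}} + \underbrace{2N}_{\text{decoded string}} + \underbrace{2N}_{\text{line substrings}} = 5N
$$

For example, a 200 MB file would briefly need about 1 GB, and roughly the 2N of line strings (about 400 MB) stays allocated after LoadFrom...() returns.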
Rollo62 Posted September 10, 2020

@Stefan Glienke @Dany Marmur Thanks for your considerations. I use interposers heavily on components, to fix and enhance their behaviour. Sometimes I think even that's too much, but every time it turns out that they behave very well. That is where my dark consideration to use them elsewhere too, from time to time, comes from. My thoughts were about enhancing existing units and frameworks by simply adding a uses entry, which "auto-magically" enhances the unit's functionality without too much (or any) reworking of the whole unit. But sure, I was afraid of all those side effects too; that's why I haven't used them on RTL classes so far. Your arguments are right, so I won't touch them and will stay on the safe side.
Rollo62 Posted September 11, 2020 (edited)

18 minutes ago, Fr0sT.Brutal said: "And again the topic has turned to some other direction 😉"

Yes, sorry for that. But it's interesting to see that the world is divided into design-time component lovers and not-so-much lovers 🙂 It still has a bit to do with the great TStringList class (I hope). At least we all agree that the right way to go here is a new, derived class.

Edited September 11, 2020 by Rollo62
Lars Fosdal Posted September 11, 2020

Split off the interposer discussion.
Fr0sT.Brutal Posted September 11, 2020

1 hour ago, Rollo62 said: "Yes, sorry for that. But it's interesting to see that the world is divided into design-time component lovers and not-so-much lovers 🙂 It still has a bit to do with the great TStringList class (I hope). At least we all agree that the right way to go here is a new, derived class."

No problem, that subject was interesting and useful as well. But talking about an ideal string list, what features do you really need? In many years of using Delphi I've never felt that string lists were missing anything.
Uwe Raabe Posted September 11, 2020

6 minutes ago, Fr0sT.Brutal said: "In many years of using Delphi I've never felt that string lists were missing anything."

Same here. I use TStringList far more often without any file involved than for text file processing. If I need an extension to the standard TStringList behavior, I prefer to derive a special-purpose class from it.
Rollo62 Posted September 11, 2020 (edited)

So the circle closes with @pyscripter's solution for BOM handling, which makes a lot of sense to me.

Edited September 11, 2020 by Rollo62