Mike Torrettinni, posted September 4, 2020:

"Use AnsiStrings instead of Strings. Unicode strings are inefficient."

I noticed this as the last comment on one of my old SO questions (scroll to the bottom): https://stackoverflow.com/questions/35942270/charinset-is-much-slower-than-in-should-i-fix-w1050-warning-hint

Is this just one man's opinion, or is there any validity to it? Is it suggesting that, for string manipulation functions, I should convert the Unicode string to an AnsiString variable, process the AnsiString, and then convert back to Unicode for display/export?

My projects are all in Delphi 10.2, so they all work with the string type, i.e. Unicode strings. Importing, processing, exporting and displaying data all work well, even for international customers.

Should I look into whether any string manipulation methods would be faster when working on AnsiStrings? Or should I just ignore the comment and move on? Thanks!

Anders Melander, posted September 4, 2020:

Quoting Mike Torrettinni: "Should I look into if any string manipulation methods would be faster with working on Ansistrings?"

Do you need them to be faster?

David Heffernan, posted September 4, 2020:

No reason to give any credence to that comment. Ignore it and move on. Be happy that your code is not limited to text that can be encoded with whatever ANSI locale the machine it runs on is using.

Dalija Prasnikar, posted September 4, 2020:

Don't use AnsiString if at all possible. You can safely use AnsiString only if you work exclusively with the 7-bit ASCII subset. The reason is that AnsiString <-> Unicode conversion is potentially lossy: you can lose original data.

Using 8-bit strings instead of 16-bit strings can be faster under some circumstances, and they also use less memory. But if you have a use case where 8-bit string performance and memory consumption are needed, you should use UTF8String instead of AnsiString.

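As a rough illustration of the lossy conversion described above, the following console sketch round-trips one Unicode string through an 8-bit ANSI string and through a UTF8String. The type name Latin1String, the choice of code page 1252 and the sample text are assumptions made for this example; none of them come from the thread.

```pascal
program AnsiRoundTrip;

{$APPTYPE CONSOLE}

type
  // Hypothetical helper type: an AnsiString pinned to Windows-1252, so the
  // result does not depend on the ANSI code page of the machine running it.
  Latin1String = type AnsiString(1252);

var
  Original: string;
  Ansi: Latin1String;
  Utf8: UTF8String;
begin
  // 'Za' followed by four Polish letters, written as code points so the
  // encoding of the source file does not matter. U+017C, U+0142 and U+0107
  // have no slot in Windows-1252.
  Original := 'Za'#$017C#$00F3#$0142#$0107;

  Ansi := Latin1String(Original); // UTF-16 -> code page 1252
  Utf8 := UTF8String(Original);   // UTF-16 -> UTF-8

  // Convert both back to UTF-16 and compare with the starting text.
  Writeln('1252 round trip preserved:  ', string(Ansi) = Original);
  Writeln('UTF-8 round trip preserved: ', string(Utf8) = Original);
end.
```

The 1252 round trip should report that the text was not preserved, because characters missing from the code page are substituted during conversion, while the UTF-8 round trip should come back unchanged.
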
Mike Torrettinni, posted September 4, 2020:

Great, good to know I don't need to worry about this! I remember that in D2006 I had to use all sorts of 'magic' with input strings: UTF8Decode/Encode, WideString* conversion methods, switching the system locale to test a customer's language, locale settings... Nice to know that stays in the past! 🙂

Mike Torrettinni, posted September 4, 2020:

Quoting Anders Melander: "Do you need them to be faster?"

I do a lot of input data manipulation, but nothing specific is really slow, and now I see there is no reason to even consider whether String <-> AnsiString conversion would make any of it faster.

pyscripter, posted September 4, 2020 (edited):

Agree with all the above, except that UTF8String is an AnsiString with code page 65001; it is also Unicode compliant and incurs no conversion loss. It is the default string type in FPC. On Linux systems everything is UTF-8. And nowadays 65001 can be set as the default code page in Windows.

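To make the "AnsiString with code page 65001" point concrete, here is a small sketch that inspects a UTF8String at run time using the RTL functions StringCodePage and StringElementSize. The sample text is an arbitrary assumption chosen for the example.

```pascal
program Utf8CodePage;

{$APPTYPE CONSOLE}

var
  S: string;
  U: UTF8String;
begin
  // German sample word written with code points, so the encoding of the
  // source file does not matter.
  S := 'Gr'#$00FC#$00DF'e';
  U := UTF8String(S); // UTF-16 -> UTF-8

  // The code page tag travels with the string data.
  Writeln('Code page of U:    ', StringCodePage(U));
  // Bytes per element: UTF8String is an 8-bit string type.
  Writeln('Element size of U: ', StringElementSize(U));
  // Same text, different element counts in the two encodings.
  Writeln('Length(S), UTF-16 code units: ', Length(S));
  Writeln('Length(U), bytes:             ', Length(U));
end.
```

Note that Length on a UTF8String counts bytes rather than characters, which is one reason code that indexes into 8-bit strings needs extra care with non-ASCII text.
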
Remy Lebeau, posted September 4, 2020 (edited):

Quoting pyscripter: "Agree with all the above, except that UTF8String is an AnsiString with a Code Page 65001 and it incurs no conversion loss. It is the default string type in FPC."

More accurately, UTF-8 is the default encoding used by Lazarus, not by FreePascal itself. On Windows, FPC's RTL sets the default encoding to the system encoding, just as Delphi does (on Linux/OSX, the default encoding is set to UTF-8). Lazarus overrides that RTL setting.

https://wiki.freepascal.org/Unicode_Support_in_Lazarus#RTL_with_default_codepage_UTF-8

Quoting pyscripter: "And nowadays 65001 it can be set as the default code page in Windows."

That is still a beta feature, and it doesn't work quite as well as everyone had hoped it would. Maybe in the future, it will be better.

aehimself, posted September 4, 2020:

I'm using String for strings everywhere with no noticeable performance penalty (although I have never had to write performance-critical programs so far). If I know that encoding comes into play (e.g. receiving data from a web browser) or I'm working with binary data, I use TBytes. Pretty happy so far.

pyscripter, posted September 4, 2020 (edited):

Quoting Remy Lebeau: "That is still a beta feature, and it doesn't work quite as well as everyone had hoped it would. Maybe in the future, it will be better."

I would like to add that if you change the default Windows code page to 65001 and have pas files containing non-ASCII characters (above 127), then your files will be messed up when you open them in Delphi. Also, if you build your Delphi projects you will get warnings or, even worse, produce erroneous executables without warnings. Try to compile SynEdit, for example.

Having said that, this option is great for interacting with console applications and system processes, and for being able, at last, to handle Unicode console input/output.

Fr0sT.Brutal, posted September 4, 2020:

If you care about performance so much, you can use

    if Ord(c) in [Ord('A')..Ord('Z')]

or

    case c of 'A'..'Z': ... end;

David Heffernan, posted September 4, 2020:

Quoting Fr0sT.Brutal: "If you care about performance so much, you can use if Ord(c) in [Ord('A')..Ord('Z')] or case c of 'A'..'Z': ... end;"

Read Andy's comments to my answer in the SO post. If you care about performance, measure it.

Arnaud Bouchez, posted September 5, 2020:

Quoting David Heffernan: "If you care about performance, measure it."

This is the main idea. No premature optimization. Just because a single line ("case ... of") is slightly faster does not mean that your program as a whole will be faster.

AnsiString with the system code page is a bad idea: it cannot store all Unicode content.

UTF-8 is a good idea if you use it from one end of your project to the other. For instance, if your database layer uses "string", then using AnsiString won't help. On the contrary, conversion and memory allocation have a cost, so it may actually be slower. It pays off only if you have UTF-8 from end to end: in our Open Source framework, for example, we use UTF-8 everywhere, from DB to JSON, so no UTF-16 conversion is done. It is perfect for the server side.

But if you write a VCL/FMX RAD app, using plain string makes more sense.

Fr0sT.Brutal, posted September 5, 2020:

Quoting David Heffernan: "Read Andy's comments to my answer in the SO post."

The comment about the inefficiency of Unicode strings? OK, I read it. What was that nonsense supposed to tell me?

Stefan Glienke, posted September 5, 2020:

Quoting Fr0sT.Brutal: "Comment about inefficiency of Unicode strings? Ok, I read it, what that nonsense should have told me?"

When David wrote "Andy" he was referring to "Andreas Hausladen".

Mike Torrettinni, posted September 5, 2020:

Quoting Stefan Glienke: "When David wrote "Andy" he was referring to "Andreas Hausladen""

He commented:

"The CharInSet function suffers from the fact that the set must be stored in memory because it is specified as a function argument. Thus the compiler can't generate fast arithmetic instructions that operate on CPU registers. It has to use the much slower memory bit-test instruction. So there are multiple memory accesses per iteration (non-inlined: function call, bit-test, function return; inlined: 2x stack juggling, bit-test) compared to none"

""Sets are far less efficient": Not in this case. The compiler is smart enough to change the long element list to ['A'..'Z'] itself and then it uses the fast if (c >= 'A') and (c <= 'Z') to implement the in-operator. And that also with correct code for WideChar as long as the set elements are Ord(x)<=#127."

This is quite an impressive insight into the compiler! I just googled him and I see he is the developer of DelphiSpeedUp, IDEFixPack and other tools! I feel quite honored that he took the time to make those comments 🙂

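In the spirit of "measure it", here is a minimal timing sketch that compares a case range test with CharInSet on the same data, using TStopwatch from System.Diagnostics. The data size, the repeat count and all variable names are arbitrary assumptions for the example, and the numbers will vary with compiler version, target platform and build configuration.

```pascal
program CharTestTiming;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

const
  Repeats = 100; // arbitrary; raise it if the timings are too small to compare

var
  Data: string;
  I, R, HitsCase, HitsCharInSet: Integer;
  SW: TStopwatch;
begin
  // Arbitrary test data: one million characters cycling through 'A'..'z'.
  SetLength(Data, 1000000);
  for I := 1 to Length(Data) do
    Data[I] := Char(Ord('A') + I mod 58);

  // Variant 1: range test written as a case statement.
  HitsCase := 0;
  SW := TStopwatch.StartNew;
  for R := 1 to Repeats do
    for I := 1 to Length(Data) do
      case Data[I] of
        'A'..'Z': Inc(HitsCase);
      end;
  Writeln('case ... of: ', SW.ElapsedMilliseconds, ' ms');

  // Variant 2: the same test through CharInSet.
  HitsCharInSet := 0;
  SW := TStopwatch.StartNew;
  for R := 1 to Repeats do
    for I := 1 to Length(Data) do
      if CharInSet(Data[I], ['A'..'Z']) then
        Inc(HitsCharInSet);
  Writeln('CharInSet:   ', SW.ElapsedMilliseconds, ' ms');

  // Printing the counts confirms that both loops did the same work.
  Writeln('Matches: ', HitsCase, ' / ', HitsCharInSet);
end.
```

A tight loop like this only shows the cost of the character test in isolation. Whether the difference matters in a real program depends on how much of its time is actually spent there, so a profile of the real workload is the more useful measurement.
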
Fr0sT.Brutal, posted September 6, 2020:

Quoting Stefan Glienke: "When David wrote "Andy" he was referring to "Andreas Hausladen""

Ah, THAT Andy )) David really puzzled me!