Use of Ansistring in Unicode world?

Mike Torrettinni · September 4, 2020

"Use AnsiStrings instead of Strings. Unicode strings are inefficient." I noticed this as last comment in one of my old SO questions: https://stackoverflow.com/questions/35942270/charinset-is-much-slower-than-in-should-i-fix-w1050-warning-hint ( scroll to the bottom)

Is this just one man's opinion, or is there any validity to it?

Is it suggesting that, for string manipulation functions, I should convert the Unicode string to an AnsiString variable, process the AnsiString, and then convert back to Unicode for display/export?

My projects are all in Delphi 10.2, so they all work with the string type, i.e. Unicode strings. Importing, processing, exporting and displaying data all work well, even for international customers.

Should I look into whether any string manipulation methods would be faster working on AnsiStrings?

Or should I just ignore the comment and move on?

Thanks!

Anders Melander · September 4, 2020

36 minutes ago, Mike Torrettinni said:

Should I look into whether any string manipulation methods would be faster working on AnsiStrings?

Do you need them to be faster?

David Heffernan · September 4, 2020

No reason to give any credence to that comment. Ignore it and move on.

Be happy that your code is not limited to text that can be encoded with whatever ANSI locale the machine it runs on is using.

Dalija Prasnikar · September 4, 2020

Don't use AnsiString if at all possible. You can safely use AnsiStrings only if you work exclusively with the 7-bit ASCII subset. The reason is that AnsiString <-> Unicode conversion is potentially lossy. You can lose original data.

Using 8-bit strings instead of 16-bit strings can be faster under some circumstances, and they also use less memory. But if you have a use case where 8-bit string performance and memory consumption are needed, you should use UTF8String instead of AnsiString.
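
A minimal sketch of the data-loss point above, assuming a Unicode Delphi (2009 or later) console program; the program and variable names are made up for illustration. It round-trips a string containing a Greek and a Cyrillic letter through AnsiString and through UTF8String and reports whether the text survived. The AnsiString result depends on the ANSI code page of the machine it runs on, which is the core of the problem.

```pascal
program LossyConversionDemo; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Original, RoundTrip: string;
  Ansi: AnsiString;
  Utf8: UTF8String;
begin
  // Build the test text from code points (Greek Omega, Cyrillic Zhe)
  // so the encoding of the source file does not matter.
  Original := 'abc' + #$03A9 + #$0416;

  // Explicit casts avoid the compiler's implicit string cast warnings.
  // Characters missing from the system ANSI code page cannot survive this step.
  Ansi := AnsiString(Original);
  RoundTrip := string(Ansi);
  Writeln('AnsiString round trip intact:  ', RoundTrip = Original);

  // UTF-8 can encode every Unicode code point, so this direction is lossless.
  Utf8 := UTF8String(Original);
  RoundTrip := string(Utf8);
  Writeln('UTF8String round trip intact:  ', RoundTrip = Original);
end.
```

Whether the first line reports a loss depends on the active ANSI code page, so the output is deliberately not stated here.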

Mike Torrettinni · September 4, 2020

Great, good to know I don't need to worry about this!

I remember in D2006 I had to use all sorts of 'magic' with input strings, UTF8Decode/Encode, WideString* conversion methods, switching system locale to test customer's language, locale settings... nice to know that stays in the past! 🙂

Mike Torrettinni · September 4, 2020

12 minutes ago, Anders Melander said:

Do you need them to be faster?

I do a lot of input data manipulation, but nothing specific is really slow, and now I see there is no reason to even think about whether String <-> AnsiString conversion would make it any faster.

pyscripter · September 4, 2020

Agree with all the above, except that UTF8String is an AnsiString with code page 65001, is also Unicode compliant, and incurs no conversion loss. It is the default string type in FPC. On Linux systems everything is UTF-8. And nowadays 65001 can be set as the default code page in Windows.

Edited September 4, 2020 by pyscripter
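
A small sketch of the "AnsiString with code page 65001" remark, assuming a Unicode Delphi console program; the program and variable names are illustrative. It uses the RTL routine StringCodePage to print the code page recorded in each string's header, next to DefaultSystemCodePage for comparison. A non-empty UTF8String would be expected to report 65001, while the AnsiString value varies from machine to machine.

```pascal
program CodePageDemo; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

var
  Utf8: UTF8String;
  Ansi: AnsiString;
begin
  Utf8 := UTF8String('abc');
  Ansi := AnsiString('abc');

  // StringCodePage reads the code page stored with the string payload.
  Writeln('UTF8String code page:  ', StringCodePage(Utf8));
  Writeln('AnsiString code page:  ', StringCodePage(Ansi));

  // The system ANSI code page, which differs between machines.
  Writeln('DefaultSystemCodePage: ', DefaultSystemCodePage);
end.
```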

Remy Lebeau · September 4, 2020

15 minutes ago, pyscripter said:

Agree with all the above, except that UTF8String is an AnsiString with code page 65001 and it incurs no conversion loss. It is the default string type in FPC.

More accurately, UTF-8 is the default encoding used by Lazarus, not by FreePascal itself. On Windows, FPC's RTL sets the default encoding to the system encoding, just as Delphi does (on Linux/OSX, the default encoding is set to UTF-8). Lazarus overrides that RTL setting.

https://wiki.freepascal.org/Unicode_Support_in_Lazarus#RTL_with_default_codepage_UTF-8

Quote

And nowadays 65001 can be set as the default code page in Windows.

That is still a beta feature, and it doesn't work quite as well as everyone had hoped it would. Maybe in the future, it will be better.

Edited September 4, 2020 by Remy Lebeau

aehimself · September 4, 2020

I'm using String for string everywhere with no noticeable performance penalty (although, I never had to write performance-critical programs so far). If I know that encoding is in the game (e.g.: receiving data from a web browser) or I'm working with binary data I'm using TBytes.

Pretty happy until now.

pyscripter · September 4, 2020

15 minutes ago, Remy Lebeau said:

That is still a beta feature, and it doesn't work quite as well as everyone had hoped it would. Maybe in the future, it will be better.

I would like to add that if you change the default Windows code page to 65001 and have .pas files containing characters above 127 (i.e. non-ASCII characters in an ANSI encoding), then your files will be messed up when you open them in Delphi. Also, if you build your Delphi projects you will get warnings or, even worse, produce erroneous executables without warnings. Try to compile SynEdit for example.

Having said that, this option is great for interacting with console applications and system processes, and for being able at last to handle Unicode console input/output.

Edited September 4, 2020 by pyscripter

Fr0sT.Brutal · September 4, 2020

If you care about performance so much, you can use

if Ord(c) in [Ord('A')..Ord('Z')] then ...

or

case c of
  'A'..'Z': ...
end;

David Heffernan · September 4, 2020

14 minutes ago, Fr0sT.Brutal said:

If you care about performance so much, you can use

if Ord(c) in [Ord('A')..Ord('Z')] then ...

or

case c of
  'A'..'Z': ...
end;

Read Andy's comments to my answer in the SO post.

If you care about performance, measure it.
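
One way to follow that advice is a rough timing harness. The sketch below assumes a Unicode Delphi console program and uses TStopwatch from System.Diagnostics; the program name, function names, test data and repeat count are all invented for illustration. It times CharInSet against a case statement over the same input. The numbers depend heavily on compiler version, target platform and optimization settings, so treat it as a starting point and not as a verdict.

```pascal
program CharTestTiming; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

const
  Iterations = 20; // arbitrary repeat count, chosen for illustration

// Counts upper-case ASCII letters using CharInSet.
function CountWithCharInSet(const S: string): Integer;
var
  C: Char;
begin
  Result := 0;
  for C in S do
    if CharInSet(C, ['A'..'Z']) then
      Inc(Result);
end;

// Counts upper-case ASCII letters using a case statement.
function CountWithCase(const S: string): Integer;
var
  C: Char;
begin
  Result := 0;
  for C in S do
    case C of
      'A'..'Z': Inc(Result);
    end;
end;

var
  Data: string;
  SW: TStopwatch;
  I, N: Integer;
begin
  // Synthetic input: half matching, half non-matching characters.
  Data := StringOfChar('A', 500000) + StringOfChar('z', 500000);

  N := 0;
  SW := TStopwatch.StartNew;
  for I := 1 to Iterations do
    Inc(N, CountWithCharInSet(Data));
  Writeln('CharInSet: ', SW.ElapsedMilliseconds, ' ms (', N, ' matches)');

  N := 0;
  SW := TStopwatch.StartNew;
  for I := 1 to Iterations do
    Inc(N, CountWithCase(Data));
  Writeln('case:      ', SW.ElapsedMilliseconds, ' ms (', N, ' matches)');
end.
```

Printing the match count alongside the time confirms that both variants do the same work and keeps the loops from being trivially discarded. Real input data from the application would give a more meaningful comparison than the synthetic string used here.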

Arnaud Bouchez · September 5, 2020

10 hours ago, David Heffernan said:

If you care about performance, measure it.

This is the main idea.

No premature optimization. Just because a single line ("case ... of") is slightly faster does not mean that your whole program will be faster.

AnsiString with the system code page is a bad idea: it is not able to store all Unicode content.
UTF-8 is a good idea if you use it from one end to the other in your project.
For instance, if your database layer uses "string" then using AnsiString won't help. On the contrary, conversion and memory allocation have a cost, so it may actually be slower.

It only pays off if you have UTF-8 from end to end. For example, in our Open Source framework we use UTF-8 everywhere, e.g. from DB to JSON, so no UTF-16 conversion is done. It is perfect for the server side. But if you write a VCL/FMX RAD app, using plain string makes more sense.

Fr0sT.Brutal · September 5, 2020

12 hours ago, David Heffernan said:

Read Andy's comments to my answer in the SO post.

The comment about inefficiency of Unicode strings? OK, I read it. What was that nonsense supposed to tell me?

Stefan Glienke · September 5, 2020

7 hours ago, Fr0sT.Brutal said:

The comment about inefficiency of Unicode strings? OK, I read it. What was that nonsense supposed to tell me?

When David wrote "Andy" he was referring to "Andreas Hausladen"

Mike Torrettinni · September 5, 2020

1 hour ago, Stefan Glienke said:

When David wrote "Andy" he was referring to "Andreas Hausladen"

He commented:

"The CharInSet function suffers from the fact that the set must be stored in memory because it is specified as a function argument. Thus the compiler can't generate fast arithmetic instructions that operate on CPU registers. It has to use the much slower memory bit-test instruction. So there are multiple memory accesses per iteration (non-inlined: function call, bit-test, function return; inlined: 2x stack juggling, bit-test) compared to none"

""Sets are far less efficient": Not in this case. The compiler is smart enough to change the long element list to ['A'..'Z'] itself and then it uses the fast if (c >= 'A') and (c <= 'Z') to implement the in-operator. And that also with correct code for WideChar as long as the set elements are Ord(x)<=#127."

This is quite an impressive insight into the compiler!

I just googled him and I see he is the developer of DelphiSpeedUp and IDEFixPack and other tools! I feel quite honored he took the time to make those comments 🙂

Fr0sT.Brutal · September 6, 2020

On 9/5/2020 at 8:59 PM, Stefan Glienke said:

When David wrote "Andy" he was referring to "Andreas Hausladen"

Ah, THAT Andy )) David really puzzled me!
