Mike Torrettinni

Use of AnsiString in a Unicode world?


"Use AnsiStrings instead of Strings. Unicode strings are inefficient." I noticed this as the last comment on one of my old SO questions: https://stackoverflow.com/questions/35942270/charinset-is-much-slower-than-in-should-i-fix-w1050-warning-hint (scroll to the bottom)

Is this just one man's opinion, or is there any validity to it?

 

Is it suggesting that, when dealing with string manipulation functions, I should convert the Unicode string to an AnsiString variable, process the AnsiString, and then convert back to Unicode for display/export...?

 

My projects are all in Delphi 10.2, so they all use the string type, i.e. Unicode strings. Everything works well for importing, processing, exporting and displaying data, even for international customers.

Should I check whether any string manipulation methods would be faster working on AnsiStrings?

 

Or should I just ignore the comment and move on?

 

Thanks!

36 minutes ago, Mike Torrettinni said:

Should I check whether any string manipulation methods would be faster working on AnsiStrings?

Do you need them to be faster?


No reason to give any credence to that comment. Ignore it and move on. 

 

Be happy that your code is not limited to text that can be encoded with whatever ANSI code page the machine it runs on is using.


Don't use AnsiString if at all possible. You can safely use AnsiString only if you work exclusively with the 7-bit ASCII subset. The reason is that AnsiString <-> Unicode conversion is potentially lossy: any character the ANSI code page cannot represent is replaced (typically with '?'), so you can lose original data.
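To make the lossy conversion concrete, here is a minimal console sketch, assuming Delphi 2009 or later on Windows. Instead of plain AnsiString (whose code page depends on the machine) it uses a hypothetical `Win1252String` type pinned to code page 1252, so the result does not depend on the system locale. Cyrillic letters have no slot in that code page, so they cannot survive the round trip.

```pascal
program LossyAnsiRoundTrip;

{$APPTYPE CONSOLE}

type
  // Hypothetical name: an AnsiString bound to Windows-1252 (Western European),
  // so the demo behaves the same regardless of the system ANSI code page.
  Win1252String = type AnsiString(1252);

// True if S comes back unchanged after UTF-16 -> code page 1252 -> UTF-16
function SurvivesAnsiRoundTrip(const S: string): Boolean;
var
  Narrow: Win1252String;
begin
  Narrow := Win1252String(S);   // unmappable characters are replaced here
  Result := string(Narrow) = S; // convert back and compare with the original
end;

const
  Latin = 'Hello';
  // "Privet" in Cyrillic, written as code points so the source file encoding does not matter
  Cyrillic = #$041F#$0440#$0438#$0432#$0435#$0442;

begin
  Writeln('Latin text survives:    ', SurvivesAnsiRoundTrip(Latin));    // TRUE
  Writeln('Cyrillic text survives: ', SurvivesAnsiRoundTrip(Cyrillic)); // FALSE
end.
```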

 

Using 8-bit strings instead of 16-bit strings can be faster in some circumstances, and they also use less memory. But if you have a use case where 8-bit string performance and memory consumption matter, you should use UTF8String instead of AnsiString.
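A rough sketch of the memory side, assuming Delphi 2009+ (where `string` is UTF-16 and `Length` of a UTF8String counts bytes). The helper names `Utf16Bytes` and `Utf8Bytes` are illustrative only, and they count just the character payload, not the string header. UTF-8 halves the size of ASCII-heavy text but saves nothing for Cyrillic, where both encodings need 2 bytes per character.

```pascal
program Utf8VsUtf16Size;

{$APPTYPE CONSOLE}

// Bytes used by the character data of a UTF-16 string
function Utf16Bytes(const S: string): Integer;
begin
  Result := Length(S) * SizeOf(Char); // Char = WideChar, 2 bytes per code unit
end;

// Bytes used by the same text once converted to UTF-8
function Utf8Bytes(const S: string): Integer;
begin
  Result := Length(UTF8String(S)); // Length of a UTF8String counts bytes
end;

const
  Ascii = 'Customer record 12345';
  Cyrillic = #$041F#$0440#$0438#$0432#$0435#$0442;

begin
  // ASCII: 42 bytes as UTF-16, 21 bytes as UTF-8
  Writeln('ASCII:    UTF-16 = ', Utf16Bytes(Ascii), ', UTF-8 = ', Utf8Bytes(Ascii));
  // Cyrillic: 12 bytes either way
  Writeln('Cyrillic: UTF-16 = ', Utf16Bytes(Cyrillic), ', UTF-8 = ', Utf8Bytes(Cyrillic));
end.
```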


Great, good to know I don't need to worry about this!

I remember in D2006 I had to use all sorts of 'magic' with input strings: UTF8Decode/Encode, WideString* conversion methods, switching the system locale to test a customer's language and locale settings... Nice to know that stays in the past! 🙂

12 minutes ago, Anders Melander said:

Do you need them to be faster?

I do a lot of input data manipulation, but none of it is really slow, and now I see there is no reason to even consider whether String <-> AnsiString conversion would make it any faster.


Agree with all the above, except that UTF8String is an AnsiString with code page 65001; it is Unicode compliant and incurs no conversion loss. It is the default string type in FPC. On Linux systems everything is UTF-8. And nowadays 65001 can be set as the default code page in Windows.
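A small check of both claims, assuming Delphi 2009+: `StringCodePage` (System unit) reports the code page stored in the string, and a UTF-16 -> UTF-8 -> UTF-16 round trip gives back the original text, unlike the ANSI case.

```pascal
program Utf8StringCodePage;

{$APPTYPE CONSOLE}

const
  // Cyrillic sample text written as code points
  Cyrillic = #$041F#$0440#$0438#$0432#$0435#$0442;

var
  U8: UTF8String;
begin
  U8 := UTF8String(Cyrillic);      // UTF-16 -> UTF-8, every character is representable
  Writeln(StringCodePage(U8));     // 65001
  Writeln(string(U8) = Cyrillic);  // TRUE: the round trip is lossless
end.
```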

Edited by pyscripter

15 minutes ago, pyscripter said:

Agree with all the above, except that UTF8String is an AnsiString with a Code Page 65001 and it incurs no conversion loss.  It is the default string type in FPC.

More accurately, UTF-8 is the default encoding used by Lazarus, not by FreePascal itself.  On Windows, FPC's RTL sets the default encoding to the system encoding, just as Delphi does (on Linux/OSX, the default encoding is set to UTF-8).  Lazarus overrides that RTL setting.

https://wiki.freepascal.org/Unicode_Support_in_Lazarus#RTL_with_default_codepage_UTF-8

Quote

And nowadays 65001 can be set as the default code page in Windows.

That is still a beta feature, and it doesn't work quite as well as everyone had hoped it would.  Maybe in the future, it will be better.

Edited by Remy Lebeau


I use string for text everywhere with no noticeable performance penalty (although I have never had to write performance-critical programs so far). When encoding comes into play (e.g. receiving data from a web browser) or I'm working with binary data, I use TBytes.
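A minimal sketch of the TBytes approach, assuming Delphi XE2+ unit names; the payload is made up and stands in for bytes received from a browser. The idea is to decode with the encoding the sender actually used (`TEncoding.UTF8` here) rather than relying on the system ANSI code page.

```pascal
program BytesWithExplicitEncoding;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Sample, Text: string;
  Payload: TBytes;
begin
  // Hypothetical payload: "5 " followed by the euro sign (U+20AC), sent as UTF-8
  Sample := '5 ' + #$20AC;
  Payload := TEncoding.UTF8.GetBytes(Sample);
  Writeln('Byte count: ', Length(Payload)); // 5: the euro sign takes 3 bytes in UTF-8

  // Decode with the sender's encoding, not the system ANSI code page
  Text := TEncoding.UTF8.GetString(Payload);
  Writeln('Characters: ', Length(Text));    // 3
  Writeln('Round trip OK: ', Text = Sample); // TRUE
end.
```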

Pretty happy until now.

15 minutes ago, Remy Lebeau said:

That is still a beta feature, and it doesn't work quite as well as everyone had hoped it would.  Maybe in the future, it will be better.

I would like to add that if you change the default Windows code page to 65001 and have .pas files containing characters above 127 (i.e. non-ASCII characters saved in an ANSI encoding), then those files will be garbled when you open them in Delphi. Also, if you build your Delphi projects you will get warnings, or even worse, produce erroneous executables without any warnings. Try compiling SynEdit, for example.

 

Having said that, this option is great for interacting with console applications and system processes, and for finally being able to handle Unicode console input/output.

Edited by pyscripter


If you care about performance so much, you can use

if Ord(c) in [Ord('A')..Ord('Z')] then ...

or

case c of
  'A'..'Z': ...
end;
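The two snippets above, wrapped into complete functions so they can be compiled and compared (the function names are hypothetical). Both take a `Char` (WideChar), so non-ASCII characters such as the Cyrillic 'А' are simply rejected.

```pascal
program UpperAsciiChecks;

{$APPTYPE CONSOLE}

// Set membership on the ordinal value; no W1050 warning, and (per Andreas
// Hausladen's comments on the SO question) the compiler can reduce it to a range compare
function IsUpperAsciiSet(c: Char): Boolean;
begin
  Result := Ord(c) in [Ord('A')..Ord('Z')];
end;

// The same test written as a case statement
function IsUpperAsciiCase(c: Char): Boolean;
begin
  case c of
    'A'..'Z': Result := True;
  else
    Result := False;
  end;
end;

begin
  Writeln(IsUpperAsciiSet('Q'), ' ', IsUpperAsciiCase('Q'));       // TRUE TRUE
  Writeln(IsUpperAsciiSet('q'), ' ', IsUpperAsciiCase('q'));       // FALSE FALSE
  Writeln(IsUpperAsciiSet(#$0410), ' ', IsUpperAsciiCase(#$0410)); // FALSE FALSE (Cyrillic A)
end.
```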

 

14 minutes ago, Fr0sT.Brutal said:

If you care about performance so much, you can use


if Ord(c) in [Ord('A')..Ord('Z')] then ...

or


case c of
  'A'..'Z': ...
end;

 

Read Andy's comments to my answer in the SO post.

 

If you care about performance, measure it. 
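In that spirit, a rough measurement harness, assuming Delphi XE2+ (`System.Diagnostics.TStopwatch`). The input data and function names are made up; timings will vary with compiler version, target platform and Debug/Release settings, so run it on your own data rather than trusting any single number.

```pascal
program CharTestBenchmark;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

// Counts uppercase ASCII letters with CharInSet (the set is passed as an argument)
function CountWithCharInSet(const S: string): Integer;
var
  c: Char;
begin
  Result := 0;
  for c in S do
    if CharInSet(c, ['A'..'Z']) then
      Inc(Result);
end;

// Counts uppercase ASCII letters with a case statement
function CountWithCase(const S: string): Integer;
var
  c: Char;
begin
  Result := 0;
  for c in S do
    case c of
      'A'..'Z': Inc(Result);
    end;
end;

var
  Data: string;
  SW: TStopwatch;
  Count: Integer;
begin
  // Hypothetical input: 5 million lowercase and 5 million uppercase letters
  Data := StringOfChar('a', 5000000) + StringOfChar('Z', 5000000);

  SW := TStopwatch.StartNew;
  Count := CountWithCharInSet(Data);
  Writeln('CharInSet: ', Count, ' matches in ', SW.ElapsedMilliseconds, ' ms');

  SW := TStopwatch.StartNew;
  Count := CountWithCase(Data);
  Writeln('case:      ', Count, ' matches in ', SW.ElapsedMilliseconds, ' ms');
end.
```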

10 hours ago, David Heffernan said:

If you care about performance, measure it. 

This is the main idea.

No premature optimization. Just because a single line ("case ... of") is slightly faster does not mean your program as a whole will be faster.


AnsiString with the system code page is the wrong idea: it cannot store all Unicode content.
UTF-8 is a good idea if you use it from one end of your project to the other.
For instance, if your database layer uses "string", then using AnsiString won't help. On the contrary, conversions and memory allocations have a cost, so it may actually be slower.
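A sketch of where that hidden cost comes from, assuming a hypothetical `StoreField` function standing in for a `string`-based database layer. Passing an AnsiString to it makes the compiler insert an implicit conversion (a new UnicodeString allocation and copy) on every call, which warning W1057 points out.

```pascal
program ImplicitConversionCost;

{$APPTYPE CONSOLE}

// Make sure implicit AnsiString -> string casts are reported (W1057)
{$WARN IMPLICIT_STRING_CAST ON}

// Hypothetical stand-in for a database layer whose API uses "string"
function StoreField(const Value: string): Integer;
begin
  Result := Length(Value);
end;

var
  A: AnsiString;
  Total, I: Integer;
begin
  A := 'some field value';
  Total := 0;
  for I := 1 to 1000 do
    // Each call converts A to a temporary UnicodeString (allocation + copy);
    // the compiler flags this line with W1057
    Inc(Total, StoreField(A));
  Writeln(Total); // 16000
end.
```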

It pays off only if you have UTF-8 from end to end: e.g. in our open source framework we use UTF-8 everywhere, from the DB to JSON, so no UTF-16 conversion is done. That is perfect for the server side. But if you write a VCL/FMX RAD app, using plain string makes more sense.

12 hours ago, David Heffernan said:

Read Andy's comments to my answer in the SO post.

The comment about the inefficiency of Unicode strings? OK, I read it; what was that nonsense supposed to tell me?

7 hours ago, Fr0sT.Brutal said:

The comment about the inefficiency of Unicode strings? OK, I read it; what was that nonsense supposed to tell me?

When David wrote "Andy" he was referring to "Andreas Hausladen"

1 hour ago, Stefan Glienke said:

When David wrote "Andy" he was referring to "Andreas Hausladen"

He commented:

 

"The CharInSet function suffers from the fact that the set must be stored in memory because it is specified as a function argument. Thus the compiler can't generate fast arithmetic instructions that operate on CPU registers. It has to use the much slower memory bit-test instruction. So there are multiple memory accesses per iteration (non-inlined: function call, bit-test, function return; inlined: 2x stack juggling, bit-test) compared to none"

 

""Sets are far less efficient": Not in this case. The compiler is smart enough to change the long element list to ['A'..'Z'] itself and then it uses the fast if (c >= 'A') and (c <= 'Z') to implement the in-operator. And that also with correct code for WideChar as long as the set elements are Ord(x)<=#127."

 

That is quite an impressive insight into the compiler!

 

I just googled him and see he is the developer of DelphiSpeedUp, IDEFixPack and other tools! I feel quite honored that he took the time to make those comments 🙂

On 9/5/2020 at 8:59 PM, Stefan Glienke said:

When David wrote "Andy" he was referring to "Andreas Hausladen"

Ah, THAT Andy )) David really puzzled me!

 
