Does anybody know why WideStringToUCS4String appends a zero (null) as the last character?

Surprisingly, Length(WideStringToUCS4String('')) returns 1 and Length(WideStringToUCS4String('abc')) returns 4.

 

I've never looked at this function, but it's not hard to see what must be going on. There is no 4-byte string type, so you'll be getting a dynamic array back. And there will be a null terminator, as there is for all non-short string types. But since there is no compiler support for treating the type as a string, you just get the dynamic array Length function, which counts the null terminator.

3 hours ago, pyscripter said:

Does anybody know why WideStringToUCS4String appends a zero (null) as the last character?

Despite its name, UCS4String is not actually a native string type, like (Ansi|Raw|UTF8|Unicode|Wide)String are. It is just a dynamic 'array of UCS4Char', so a null UCS4Char is added to the end of the array to allow for null-terminated-string semantics, i.e. you can type-cast a UCS4String to PUCS4Char and iterate the string up to the null terminator, just like any other null-terminated P(Ansi|Wide)Char string.
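
To illustrate the PUCS4Char cast described above, here is a minimal sketch of walking a UCS4String up to its terminator. The program name is made up for the example, and the nil check is an assumption on my part to guard against an array that was never assigned (a nil dynamic array casts to a nil pointer).

```delphi
program UCS4IterateDemo;

{$APPTYPE CONSOLE}

var
  U: UCS4String;
  P: PUCS4Char;
begin
  U := WideStringToUCS4String('abc');

  // A dynamic array is a pointer to its first element, so it can be
  // type-cast to PUCS4Char.
  P := PUCS4Char(U);

  // Guard against a nil (unassigned) array before dereferencing.
  if P <> nil then
    while P^ <> 0 do
    begin
      // Print each code point as a decimal number.
      Writeln(Cardinal(P^));
      Inc(P); // advances by SizeOf(UCS4Char), i.e. 4 bytes
    end;
end.
```

The loop stops at the null UCS4Char, so the terminator itself is never printed.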

 

UCS4String was introduced way back in Delphi 6 (when UTF8String was first added as just an alias for AnsiString), so it couldn't be added as a true native string type back then.

 

They never made UCS4String into a native string type, even though the RTL is now flexible enough to support a native string with 4-byte characters. All of the necessary plumbing was added in Delphi 2009, when UnicodeString was first introduced and UTF8String was turned into its own unique string type. UCS4String could easily be made into a native string type now, if they really wanted to. They probably haven't done so yet because UCS4String is very seldom used by anyone, so they likely didn't want to spend development resources on it.

Quote

Surprisingly, Length(WideStringToUCS4String('')) returns 1 and Length(WideStringToUCS4String('abc')) returns 4.

Yes, because Length() is simply returning the full array length, which includes the null UCS4Char at the end.
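
If you need the character count rather than the array length, you have to discount the terminator yourself. A rough sketch, where UCS4CharCount is a hypothetical helper (it is not part of the RTL) that also tolerates an array with no terminator:

```delphi
program UCS4LengthDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper, not an RTL function: returns the number of
// UCS4Char elements excluding a trailing null, if one is present.
function UCS4CharCount(const S: UCS4String): Integer;
begin
  Result := Length(S);
  if (Result > 0) and (S[Result - 1] = 0) then
    Dec(Result);
end;

var
  U: UCS4String;
begin
  U := WideStringToUCS4String('abc');
  Writeln(Length(U));        // full array length, terminator included
  Writeln(UCS4CharCount(U)); // characters only
end.
```

For 'abc' the first line should be the 4 reported in the question, and the second the 3 you would expect from a real string type.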

Edited by Remy Lebeau