pyscripter, posted October 11, 2019

Does anybody know why WideStringToUCS4String adds a zero as the last character? Surprisingly, Length(WideStringToUCS4String('')) returns 1 and Length(WideStringToUCS4String('abc')) returns 4.
David Heffernan, posted October 11, 2019

I've never looked at this function, but it's not hard to see what must be going on. There is no 4-byte string type, so you'll be getting a dynamic array back. And there will be a null terminator, as there is for all non-short string types. But since there is no compiler support for treating the type as a string, you just get the dynamic array Length function, which counts the null terminator.
pyscripter, posted October 11, 2019

@David Heffernan http://docwiki.embarcadero.com/Libraries/Rio/en/System.UCS4String You were spot on.
Remy Lebeau, posted October 11, 2019 (edited October 11, 2019)

3 hours ago, pyscripter said: "Does anybody know why WideStringToUCS4String adds 0 zero as the last character?"

Despite its name, UCS4String is not actually a native string type, like (Ansi|Raw|UTF8|Unicode|Wide)String are. It is just a dynamic 'array of UCS4Char', so a null UCS4Char is added to the end of the array to allow for null-terminated-string semantics, i.e. you can type-cast a UCS4String to PUCS4Char and iterate the string up to the null terminator, just like any other null-terminated P(Ansi|Wide)Char string.

UCS4String was introduced way back in Delphi 6 (when UTF8String was first added as just an alias for AnsiString), so it couldn't be added as a true native string type back then. They never made UCS4String into a native string type, even though the RTL is now flexible enough to support a native string with 4-byte characters. All of the necessary plumbing was added in Delphi 2009, when UnicodeString was first introduced and UTF8String was turned into its own unique string type. UCS4String could easily be made into a native string type now, if they really wanted to. They probably haven't done so yet because UCS4String is very seldom used by anyone, so they likely didn't want to spend development resources on it.

pyscripter said: "Surprisingly, Length(WideStringToUCS4String('')) returns 1 and Length(WideStringToUCS4String('abc')) returns 4."

Yes, because Length() is simply returning the full array length, which includes the null UCS4Char at the end.
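For anyone who wants to see this in action, here is a minimal sketch of the behaviour described above. It is not code from the thread: the program name and variable names are made up for illustration, and it assumes a Unicode-era Delphi (2009 or later) console project. It shows Length() counting the terminator, and the PUCS4Char cast being walked up to the null UCS4Char.

```pascal
program UCS4LengthDemo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

var
  Empty, S: UCS4String;
  P: PUCS4Char;
  Count: Integer;
begin
  Empty := WideStringToUCS4String('');
  S := WideStringToUCS4String('abc');

  // Length() reports the raw dynamic-array length, terminator included.
  Writeln(Length(Empty));   // 1, as reported in the thread
  Writeln(Length(S));       // 4, as reported in the thread

  // Character count without the trailing null UCS4Char.
  Writeln(Length(S) - 1);   // 3

  // Null-terminated iteration via a PUCS4Char cast.
  P := PUCS4Char(S);
  Count := 0;
  while P^ <> 0 do
  begin
    Inc(Count);
    Inc(P);                 // advances by SizeOf(UCS4Char)
  end;
  Writeln(Count);           // 3
end.
```

So if you need the logical length of a UCS4String produced by these RTL conversion functions, subtracting 1 from Length() (or using High()) appears to be the practical workaround.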
Remy Lebeau, posted September 25, 2020

On 10/11/2019 at 2:26 PM, Remy Lebeau said: "They never made UCS4String into a native string type, even though the RTL is now flexible enough to support a native string with 4-byte characters."

https://quality.embarcadero.com/browse/RSP-31118
A.M. Hoornweg, posted September 27, 2020

On 9/25/2020 at 8:25 PM, Remy Lebeau said: "https://quality.embarcadero.com/browse/RSP-31118"

Voted!