alank2, posted August 23, 2022:
There is a whole list of functions here, but it seems some are deprecated and others are not: https://docwiki.embarcadero.com/RADStudio/Alexandria/en/UTF-8_Conversion_Routines
Many VCL/FMX properties use UnicodeString, so when working with them, what do you use to convert to UTF-8 and back? One is UTF8ToUnicodeString, but I don't see its reverse, which I would have expected to be something like UnicodeStringToUTF8.
Uwe Raabe, posted August 23, 2022:
What variable type are you going to store the UTF-8 in?
Lajos Juhász, posted August 23, 2022:
25 minutes ago, alank2 said: "One is UTF8ToUnicodeString, but I don't see its reverse which I would have expected to possibly be UnicodeStringToUTF8 ?"
It's System.UTF8Encode: https://docwiki.embarcadero.com/Libraries/Alexandria/en/System.UTF8Encode
alank2, posted August 23, 2022:
That really is the question, isn't it? What I've *been doing* is using wchar_t in C++Builder, but looking at that now, I'm wondering whether that is the best approach. Most of the text I work with is going to fit in 7-bit ASCII, but if wchar_t has to use surrogates to support all of Unicode's 17 planes anyway, why not just use UTF-8, which is perhaps more efficient as well? I found this site, which is certainly pro UTF-8: http://utf8everywhere.org/
My question is: for modern C++Builder development, is it better to use wchar_t, or to go back to char and assume it is UTF-8? Both have the issue that a single character may take a variable number of code units anyway. If UTF-8 is the better choice, are the conversions to and from the UnicodeStrings that VCL/FMX uses worth dealing with, or does it make more sense to just store the text in wchar_t? So many things have to be converted to char for the outside world anyway.
I know there may not be a one-size-fits-all answer to this, so I just wanted to get everyone's opinion.
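To put rough numbers on the size question raised in the post above, here is a minimal sketch in portable C++ (nothing C++Builder-specific). The function names utf8_units and utf16_units are made up for illustration, and the code assumes it is only given valid Unicode scalar values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of UTF-8 code units (1 byte each)
// needed to encode one Unicode code point.
std::size_t utf8_units(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);           // outside Unicode's 17 planes otherwise
    if (cp < 0x80)    return 1;       // 7-bit ASCII
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;       // rest of the Basic Multilingual Plane
    return 4;                         // supplementary planes
}

// Hypothetical helper: number of UTF-16 code units (2 bytes each)
// needed to encode one Unicode code point.
std::size_t utf16_units(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 1 : 2;      // 2 means a surrogate pair
}
```

With these, 'A' (U+0041) takes 1 byte in UTF-8 and 2 bytes in UTF-16, the euro sign (U+20AC) takes 3 bytes versus 2, and an emoji such as U+1F600 takes 4 bytes in both. So for mostly-ASCII text UTF-8 is about half the size, and both encodings are variable-width once text leaves the ranges a single code unit can cover.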
Uwe Raabe, posted August 23, 2022:
Well, I cannot speak for C++Builder, but in Delphi there is the type UTF8String, and you can just assign to and from string:

    var
      S: string;
      u8: UTF8String;
    begin
      S := 'Hello World';
      u8 := S;              // string (UTF-16) is converted to UTF-8
      u8 := 'Hello World';
      S := u8;              // UTF-8 is converted back to string
    end;
Remy Lebeau, posted August 23, 2022:
1 hour ago, Uwe Raabe said: "Well, I cannot speak for C++Builder, but in Delphi there is type UTF8String and you can just assign to and from string"
UTF8String exists in C++Builder too, and it performs the same implicit conversion to UTF-8 when other string types are assigned to it.
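For readers curious what a UTF-16 to UTF-8 conversion like the one described above involves, here is a rough sketch in portable C++. It is not the RTL's implementation: utf16_to_utf8 and utf8_to_utf16 are hypothetical helper names, std::u16string and std::string stand in for UnicodeString and UTF8String, and the code assumes well-formed input rather than validating it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode UTF-16 as UTF-8 (assumes well-formed input).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // Combine a surrogate pair into one supplementary-plane code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: decode UTF-8 to UTF-16 (assumes well-formed input).
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        // The lead byte tells us how long the sequence is.
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else                         { cp = b & 0x07; len = 4; }
        assert(i + len <= in.size());  // truncated input is not handled
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

In C++Builder itself none of this should be needed by hand: going by the posts above, assigning between UnicodeString and UTF8String performs the conversion implicitly.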
Remy Lebeau, posted August 23, 2022:
3 hours ago, alank2 said: "My question is, for modern cppbuilder development, is it better to use wchar_t or go back to char and assume it is UTF-8?"
It really depends on what you are using the strings for.
If the strings are mostly for interacting with Embarcadero's RTL/VCL/FMX frameworks, then stick with UnicodeString/System::String, and convert to other string types only when needed.
If the strings are mostly for interacting with external libraries, then use whatever type is most suitable for those libraries, and convert to/from UnicodeString only when needed.
alank2, posted August 24, 2022:
Thanks everyone; I'll take a look at UTF8String!