dummzeuch 1472 Posted March 9, 2019 (edited) A few weeks ago, Sue King contacted me because there was a problem with using dxGetText together with the Nexus DB components. For Unicode-aware Delphi versions, gnugettext.pas declares a function utf8decode which calls System.UTF8ToWideString. After replacing a call to utf8decode with UTF8ToUnicodeString, the problem went away. Since I don’t want to break backwards compatibility with non-Unicode Delphi versions, I have now changed gnugettext.utf8decode to call UTF8ToUnicodeString instead of UTF8ToWideString. I can’t see any problem with this change, but I am far from being an expert on Unicode-related issues. So, if you find any problem with this change, please comment: https://blog.dummzeuch.de/2019/03/09/gnugettext-pas-using-utf8tounicodestring-instead-of-utf8towidestring/ Edited March 9, 2019 by dummzeuch
Hallvard Vassbotn 3 Posted March 9, 2019 I don’t see any problem with that, unless that function was not available in an earlier version of the Delphi RTL. Maybe it should be IFDEFed.
dummzeuch 1472 Posted March 9, 2019 It is available since at least Delphi 2009. That's the version I tested with.
mael 29 Posted March 10, 2019 (edited) Delphi 2009 was the first version to introduce Unicode and UnicodeString, so it's very likely that UTF8ToUnicodeString did not exist before that. But you could use IFDEFs to define UnicodeString as WideString for pre-Unicode Delphi versions, and make a stub UTF8ToUnicodeString that calls UTF8ToWideString. That's how I used to do it, and it worked well. WideString will still not be reference counted, of course. Reference counting could also be a reason for the original issue. I remember that Andreas Hausladen implemented reference counting for WideStrings with a hack. I no longer remember how it was implemented or how deep the hack went (a quick search didn't turn up anything). But if people have this patch installed, it may have unintended consequences, which might have caused the issue. Edited March 10, 2019 by mael
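The IFDEF approach described in this post could look roughly like the following. This is only a minimal sketch, not code taken from gnugettext.pas: the unit name Utf8Compat is hypothetical, and the stub calls the RTL's UTF8Decode (which returns a WideString) rather than UTF8ToWideString, on the assumption that the target is Delphi 6 or later, where UTF8String and UTF8Decode are available.

```pascal
unit Utf8Compat; // hypothetical unit name, for illustration only

interface

{$IFNDEF UNICODE}
type
  // Pre-Unicode compilers have no UnicodeString, so alias it to WideString.
  // Note that WideString is still not reference counted.
  UnicodeString = WideString;

// Stub with the same name as the Delphi 2009+ RTL function.
function UTF8ToUnicodeString(const S: UTF8String): UnicodeString;
{$ENDIF}

implementation

{$IFNDEF UNICODE}
function UTF8ToUnicodeString(const S: UTF8String): UnicodeString;
begin
  // Assumption: System.UTF8Decode exists in the targeted RTL (Delphi 6+).
  Result := System.UTF8Decode(S);
end;
{$ENDIF}

end.
```

On a Unicode-aware compiler the UNICODE symbol is defined, so the unit compiles to nothing and callers get the RTL's own UnicodeString and UTF8ToUnicodeString.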
dummzeuch 1472 Posted March 10, 2019 Utf8Decode existed as an RTL function in older versions (I just checked Delphi 6: It's in system.pas, line 17659). Only in Unicode-aware Delphi versions was it marked as deprecated. The gnugettext.Utf8Decode function has already been enclosed in {$ifdef unicode} ... {$endif} since it was introduced in 2012.
Remy Lebeau 1353 Posted March 10, 2019 (edited) UTF8String was first introduced in Delphi 6 (but it did not become a true UTF-8 string until Delphi 2009). Delphi 6 has the following UTF8 <-> UTF16 functions in the System unit:

function UnicodeToUtf8(Dest: PChar; Source: PWideChar; MaxBytes: Integer): Integer; overload; deprecated;
function Utf8ToUnicode(Dest: PWideChar; Source: PChar; MaxChars: Integer): Integer; overload; deprecated;
function UnicodeToUtf8(Dest: PChar; MaxDestBytes: Cardinal; Source: PWideChar; SourceChars: Cardinal): Cardinal; overload;
function Utf8ToUnicode(Dest: PWideChar; MaxDestChars: Cardinal; Source: PChar; SourceBytes: Cardinal): Cardinal; overload;
function UTF8Encode(const WS: WideString): UTF8String;
function UTF8Decode(const S: UTF8String): WideString;

Do note, however, that these functions did not support 4-byte UTF-8 sequences (Unicode codepoints outside of the BMP), only 1-3 byte sequences (Unicode codepoints in the BMP). That was not fixed until Delphi 2009, when they were rewritten to use platform conversions instead of manual conversions. Now granted, at the time, 4-byte UTF-8 sequences were pretty rare, typically only seen in strings using East Asian languages. But in modern Unicode, they are much more common, especially with the popularity of emojis on the rise, most of which use high codepoint values outside the BMP. Edited March 11, 2019 by Remy Lebeau
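To check how a given RTL handles a 4-byte sequence, a small console test along these lines might help. It is a sketch that assumes Delphi 2009 or later (it uses UnicodeString and UTF8ToUnicodeString); the program name is made up. The bytes F0 9F 98 80 are the UTF-8 encoding of U+1F600, an emoji outside the BMP, whose UTF-16 form is the surrogate pair D83D DE00.

```pascal
program Utf8FourByteDemo; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  U8: UTF8String;
  S: UnicodeString;
  I: Integer;
begin
  // Build the raw bytes one by one. A string literal cast to UTF8String
  // would be re-encoded by a Unicode-aware compiler instead of copied.
  SetLength(U8, 4);
  U8[1] := AnsiChar($F0);
  U8[2] := AnsiChar($9F);
  U8[3] := AnsiChar($98);
  U8[4] := AnsiChar($80);

  S := UTF8ToUnicodeString(U8);

  // A converter that supports 4-byte sequences should produce two
  // UTF-16 code units here (a surrogate pair).
  Writeln('Length = ', Length(S));
  for I := 1 to Length(S) do
    Writeln(IntToHex(Ord(S[I]), 4));
end.
```

A length other than 2, or code units other than the surrogate pair, would indicate that the conversion does not handle codepoints outside the BMP.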