GnuGetText.pas using Utf8ToUnicodeString instead of Utf8ToWideString

dummzeuch · March 9, 2019

A few weeks ago, Sue King contacted me because there was a problem with using dxGetText together with the Nexus DB components.

For Unicode-aware Delphi versions, gnugettext.pas declares a function utf8decode which calls System.UTF8ToWideString. After replacing a call to utf8decode with a call to UTF8ToUnicodeString, the problem went away.

Since I don’t want to break backwards compatibility with non-Unicode Delphi versions, I have now changed gnugettext.utf8decode to call UTF8ToUnicodeString instead of UTF8ToWideString.
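The change described above can be sketched roughly as follows. This is not the actual gnugettext.pas source: the unit name is hypothetical, and the parameter and result types are assumptions made for illustration. Only the {$ifdef unicode} guard and the switch from UTF8ToWideString to UTF8ToUnicodeString come from this thread.

```pascal
unit Utf8DecodeSketch; // hypothetical unit, not the real gnugettext.pas

interface

{$ifdef UNICODE}
// Parameter and result types are assumptions made for this sketch.
function utf8decode(const s: RawByteString): UnicodeString;
{$endif}

implementation

{$ifdef UNICODE}
function utf8decode(const s: RawByteString): UnicodeString;
begin
  // Before the change, the body called UTF8ToWideString(s).
  Result := UTF8ToUnicodeString(s);
end;
{$endif}

end.
```

Because the wrapper only exists when UNICODE is defined, non-Unicode compilers never see the call to UTF8ToUnicodeString.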

I can’t see any problem with this change, but I am far from being an expert on Unicode-related issues. So, if you find any problem with this change, please comment.

https://blog.dummzeuch.de/2019/03/09/gnugettext-pas-using-utf8tounicodestring-instead-of-utf8towidestring/

Edited March 9, 2019 by dummzeuch

Hallvard Vassbotn · March 9, 2019

I don’t see any problem with that, unless that function was not available in an earlier version of the Delphi RTL.

Maybe it should be IFDEFed.

dummzeuch · March 9, 2019

It has been available since at least Delphi 2009. That's the version I tested with.

mael · March 10, 2019

Delphi 2009 was the first version to introduce Unicode support and UnicodeString, so it's very likely that UTF8ToUnicodeString did not exist before that.

But you could use IFDEFs to define UnicodeString as WideString for pre-Unicode Delphi versions, and make a stub UTF8ToUnicodeString that calls UTF8ToWideString.

That's how I used to do it, and it worked well. WideString will still not be reference counted of course.
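A minimal sketch of such a compatibility shim, assuming the UNICODE conditional symbol (defined by Delphi 2009 and later) is used to detect a Unicode-aware compiler. The unit name CompatUtf8 is hypothetical. The stub here calls the RTL's UTF8Decode, which this thread confirms exists in Delphi 6, rather than UTF8ToWideString, whose availability before Delphi 2009 has not been verified here.

```pascal
unit CompatUtf8; // hypothetical unit name, for illustration only

interface

{$IFNDEF UNICODE}
type
  // Pre-Unicode compilers have no UnicodeString; alias it to WideString.
  // Note: WideString is not reference counted.
  UnicodeString = WideString;

// Stub with the same name as the Delphi 2009+ RTL function.
function UTF8ToUnicodeString(const S: UTF8String): UnicodeString;
{$ENDIF}

implementation

{$IFNDEF UNICODE}
function UTF8ToUnicodeString(const S: UTF8String): UnicodeString;
begin
  // System.UTF8Decode returns a WideString in Delphi 6 and later.
  Result := System.UTF8Decode(S);
end;
{$ENDIF}

end.
```

On a Unicode-aware compiler the unit compiles to nothing, so code that uses it picks up the RTL's own UTF8ToUnicodeString instead.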

A reason for the original issue could be reference-counting. I remember that Andreas Hausladen implemented reference counting for WideStrings, with a hack. I am not sure anymore how it was implemented, and how deep the hack went (a quick search didn't turn up anything). But if people have this patch installed, it may have unintended consequences, which might have caused the issue.

Edited March 10, 2019 by mael

dummzeuch · March 10, 2019

Utf8Decode existed as an RTL function in older versions (I just checked Delphi 6: It's in system.pas, line 17659). Only in Unicode aware Delphi versions was it marked as deprecated.

The gnugettext.Utf8Decode function has already been enclosed in {$ifdef unicode} ... {$endif} since it was introduced in 2012.

Remy Lebeau · March 10, 2019

UTF8String was first introduced in Delphi 6 (but it did not become a true UTF-8 string until Delphi 2009). Delphi 6 has the following UTF8 <-> UTF16 functions in the System unit:

function UnicodeToUtf8(Dest: PChar; Source: PWideChar; MaxBytes: Integer): Integer; overload; deprecated;
function Utf8ToUnicode(Dest: PWideChar; Source: PChar; MaxChars: Integer): Integer; overload; deprecated;

function UnicodeToUtf8(Dest: PChar; MaxDestBytes: Cardinal; Source: PWideChar; SourceChars: Cardinal): Cardinal; overload;
function Utf8ToUnicode(Dest: PWideChar; MaxDestChars: Cardinal; Source: PChar; SourceBytes: Cardinal): Cardinal; overload;

function UTF8Encode(const WS: WideString): UTF8String;
function UTF8Decode(const S: UTF8String): WideString;

Do note, however, that these functions did not support 4-byte UTF-8 sequences (Unicode codepoints outside of the BMP), only 1-3 byte sequences (Unicode codepoints in the BMP). That was not fixed until Delphi 2009, when they were rewritten to use platform conversions instead of manual conversions.

Now granted, at the time, 4-byte UTF-8 sequences were pretty rare, typically only seen in strings using East Asian languages. But they are much more common now, especially with the popularity of emojis on the rise, most of which use high codepoint values outside the BMP.
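To make the 4-byte case concrete, here is a small console sketch, assuming a Unicode-aware Windows compiler (Delphi 2009 or later) where UTF8ToUnicodeString is available. The program name is hypothetical. It decodes U+1F600, an emoji outside the BMP whose UTF-8 encoding is the four bytes F0 9F 98 80 and whose correct UTF-16 encoding is the surrogate pair D83D DE00.

```pascal
program Utf8FourByteDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Utf8: UTF8String;
  Wide: UnicodeString;
begin
  // U+1F600 encoded as a 4-byte UTF-8 sequence: F0 9F 98 80.
  // The bytes are assigned one by one to avoid any source-file
  // or code-page conversion of a string literal.
  SetLength(Utf8, 4);
  Utf8[1] := AnsiChar($F0);
  Utf8[2] := AnsiChar($9F);
  Utf8[3] := AnsiChar($98);
  Utf8[4] := AnsiChar($80);

  Wide := UTF8ToUnicodeString(Utf8);

  // A codepoint outside the BMP needs two UTF-16 code units
  // (a surrogate pair), so a full conversion yields Length = 2.
  Writeln('UTF-16 code units: ', Length(Wide));
  if Length(Wide) = 2 then
    Writeln(IntToHex(Ord(Wide[1]), 4), ' ', IntToHex(Ord(Wide[2]), 4));
end.
```

A converter limited to 1-3 byte sequences, as described above for the pre-2009 functions, would not be able to produce that surrogate pair from this input.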

Edited March 11, 2019 by Remy Lebeau
