GnuGetText.pas using Utf8ToUnicodeString instead of Utf8ToWideString

A few weeks ago, Sue King contacted me because there was a problem with using dxGetText together with the Nexus DB components.

For Unicode-aware Delphi versions, gnugettext.pas declares a function utf8decode which calls System.UTF8ToWideString. After replacing a call to utf8decode with a call to UTF8ToUnicodeString, the problem went away.

Since I don’t want to break backwards compatibility with non-Unicode Delphi versions, I have left the calls to utf8decode in place and instead changed gnugettext.utf8decode itself to call UTF8ToUnicodeString instead of UTF8ToWideString.

I can’t see any problem with this change, but I am far from being an expert on Unicode-related issues. So, if you find any problem with this change, please comment.
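
For readers who want to see the shape of the change, here is a minimal sketch. It is not copied from gnugettext.pas: the parameter type, the program name and the test string are assumptions, and only the switch from UTF8ToWideString to UTF8ToUnicodeString and the {$ifdef UNICODE} guard come from this thread. It needs Delphi 2009 or later.

```pascal
program Utf8DecodeSketch;
{$APPTYPE CONSOLE}

{$ifdef UNICODE}
// Hypothetical stand-in for gnugettext.utf8decode; the real declaration
// in gnugettext.pas may use a different parameter type or directives.
function utf8decode(const s: RawByteString): UnicodeString;
begin
  // Before the change: Result := UTF8ToWideString(s);
  Result := UTF8ToUnicodeString(s);
end;
{$endif}

var
  Src: UTF8String;
  Decoded: UnicodeString;
begin
  Src := 'Hello';                 // plain ASCII test input (assumption)
  Decoded := utf8decode(Src);     // resolves to the local wrapper above
  Writeln(Length(Decoded));       // 5 for this ASCII input
end.
```

Because the wrapper keeps its name, existing callers of utf8decode do not have to change; only the RTL function behind it does.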



Edited by dummzeuch

I don’t see any problem with that, unless that function was not available in an earlier version of the Delphi RTL.

Maybe it should be IFDEFed.

It is available since at least Delphi 2009. That's the version I tested with.

Delphi 2009 was the first version to introduce Unicode and UnicodeString, so it's very likely that UTF8ToUnicodeString did not exist before that.

But you could use IFDEFs to define UnicodeString as WideString for pre-Unicode Delphi versions, and make a stub UTF8ToUnicodeString that calls UTF8ToWideString.

That's how I used to do it, and it worked well. WideString will still not be reference counted, of course.
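
A minimal sketch of that approach, under stated assumptions: the stub here forwards to System.UTF8Decode rather than UTF8ToWideString, because UTF8Decode is the function the Delphi 6 listing further down this thread shows returning a WideString. The program name and test string are made up for illustration.

```pascal
program UnicodeStubSketch;
{$APPTYPE CONSOLE}

{$ifndef UNICODE}
type
  // Pre-Unicode Delphi has no UnicodeString, so alias it to WideString.
  UnicodeString = WideString;

// Stub carrying the Delphi 2009+ name. Assumption: forwarding to
// System.UTF8Decode, which returns a WideString in older RTLs.
function UTF8ToUnicodeString(const S: UTF8String): UnicodeString;
begin
  Result := UTF8Decode(S);
end;
{$endif}

var
  Src: UTF8String;
  W: UnicodeString;
begin
  Src := 'abc';                    // plain ASCII test input
  W := UTF8ToUnicodeString(Src);   // RTL function on 2009+, stub before
  Writeln(Length(W));              // 3 for this ASCII input
end.
```

With this in place, calling code can use the UnicodeString and UTF8ToUnicodeString names on both sides of the Delphi 2009 boundary, although on the older compilers the result is still a plain WideString.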


A reason for the original issue could be reference-counting. I remember that Andreas Hausladen implemented reference counting for WideStrings, with a hack. I am not sure anymore how it was implemented, and how deep the hack went (a quick search didn't turn up anything). But if people have this patch installed, it may have unintended consequences, which might have caused the issue.

Edited by mael

Utf8Decode existed as an RTL function in older versions (I just checked Delphi 6: It's in system.pas, line 17659). Only in Unicode aware Delphi versions was it marked as deprecated.


The gnugettext.Utf8Decode function has already been enclosed in {$ifdef unicode} ... {$endif} since it was introduced in 2012.

UTF8String was first introduced in Delphi 6 (but it did not become a true UTF-8 string until Delphi 2009).  Delphi 6 has the following UTF8 <-> UTF16 functions in the System unit:


function UnicodeToUtf8(Dest: PChar; Source: PWideChar; MaxBytes: Integer): Integer; overload; deprecated;
function Utf8ToUnicode(Dest: PWideChar; Source: PChar; MaxChars: Integer): Integer; overload; deprecated;


function UnicodeToUtf8(Dest: PChar; MaxDestBytes: Cardinal; Source: PWideChar; SourceChars: Cardinal): Cardinal; overload;
function Utf8ToUnicode(Dest: PWideChar; MaxDestChars: Cardinal; Source: PChar; SourceBytes: Cardinal): Cardinal; overload;


function UTF8Encode(const WS: WideString): UTF8String;
function UTF8Decode(const S: UTF8String): WideString;


Do note, however, that these functions did not support 4-byte UTF-8 sequences (Unicode codepoints outside of the BMP), only 1-3 byte sequences (Unicode codepoints in the BMP).  That was not fixed until Delphi 2009, when they were rewritten to use platform conversions instead of manual conversions.


Now granted, at the time, 4-byte UTF-8 sequences were pretty rare, typically only seen in strings using East Asian languages.  But in modern Unicode text they are much more common, especially with the popularity of emojis on the rise, most of which use high codepoint values outside the BMP.
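
As a rough illustration of what a 4-byte sequence looks like after decoding, the sketch below feeds the UTF-8 bytes F0 9F 98 80 (U+1F600, an emoji outside the BMP) to UTF8ToUnicodeString. It assumes a desktop Delphi 2009 or later, where AnsiChar and RawByteString are available; the program name and the choice of codepoint are just examples.

```pascal
program FourByteUtf8Sketch;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Raw: RawByteString;
  S: UnicodeString;
begin
  // Build the 4-byte UTF-8 sequence for U+1F600 byte by byte, so the
  // source file encoding cannot alter it.
  SetLength(Raw, 4);
  Raw[1] := AnsiChar($F0);
  Raw[2] := AnsiChar($9F);
  Raw[3] := AnsiChar($98);
  Raw[4] := AnsiChar($80);

  S := UTF8ToUnicodeString(Raw);

  // A codepoint outside the BMP needs two UTF-16 code units
  // (a surrogate pair), so the decoded string has length 2.
  Writeln(Length(S));
  if Length(S) = 2 then
  begin
    Writeln(IntToHex(Ord(S[1]), 4));   // high surrogate, D83D for U+1F600
    Writeln(IntToHex(Ord(S[2]), 4));   // low surrogate, DE00 for U+1F600
  end;
end.
```

A decoder limited to 1-3 byte sequences cannot produce this surrogate pair, which is why strings containing such characters are a useful test when comparing conversion functions.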

Edited by Remy Lebeau
