MarkShark, posted April 12
Hi all! Is there a library routine or API function that will compare two UTF-8 strings or buffers without converting them to UTF-16? I'm looking for something analogous to AnsiCompareStr. Thanks! -Mark
DelphiUdIT, posted April 12
If you only want to know whether two buffers or strings are identical, you can use CompareMem (from System.SysUtils): https://docwiki.embarcadero.com/Libraries/Athens/en/System.SysUtils.CompareMem
Of course, the two blocks must be the same size (to be identical 😉).
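A minimal sketch of the CompareMem approach suggested above, assuming both values are held as UTF8String. The helper name SameUtf8Bytes is hypothetical and not part of the RTL. It checks the byte lengths first and only then compares the memory blocks, so no UTF-16 conversion takes place.

```delphi
program Utf8EqualSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper (not an RTL routine): byte-for-byte equality of two
// UTF-8 strings without converting them to UTF-16.
function SameUtf8Bytes(const A, B: UTF8String): Boolean;
begin
  // Length() on a UTF8String counts bytes, not characters.
  // Blocks of different sizes can never be identical, so check that first,
  // and skip the memory comparison entirely when both strings are empty.
  Result := (Length(A) = Length(B)) and
    ((Length(A) = 0) or CompareMem(Pointer(A), Pointer(B), Length(A)));
end;

var
  S1, S2, S3: UTF8String;
begin
  S1 := 'abc';
  S2 := 'abc';
  S3 := 'abd';
  Writeln(SameUtf8Bytes(S1, S2)); // same bytes
  Writeln(SameUtf8Bytes(S1, S3)); // last byte differs
end.
```

This only answers "are the bytes identical?". Two strings that render the same but use different Unicode normalization forms will compare as different.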
Remy Lebeau, posted April 12 (edited)
MarkShark said: "I'm looking for something analogous to AnsiCompareStr."
The System.AnsiStrings.AnsiCompareStr() function uses the Win32 CompareStringA() function on Windows (the System.SysUtils.AnsiCompareStr() function uses CompareStringW() instead). But CompareStringA() assumes the input strings are in the ANSI encoding of the specified locale. The MSDN documentation says:
"If your application is calling the ANSI version of CompareString, the function converts parameters via the default code page of the supplied locale. Thus, an application can never use CompareString to handle UTF-8 text."
pyscripter, posted April 12 (edited)
Remy Lebeau said: "The System.AnsiStrings.AnsiCompareStr() function uses the Win32 CompareStringA() function on Windows."
Unfortunately (see CompareStringA function (winnls.h) - Win32 apps | Microsoft Learn):
"If your application is calling the ANSI version of CompareString, the function converts parameters via the default code page of the supplied locale. Thus, an application can never use CompareString to handle UTF-8 text."
It also appears that System.AnsiStrings.AnsiCompareStr ignores the code page of the ANSI strings.
pyscripter, posted April 12
Under POSIX, System.AnsiStrings.AnsiCompareStr converts the AnsiStrings to UnicodeStrings and then compares them. FPC has a function, UTF8CompareStr, which sounds promising, but it also converts the strings to UTF-16 and then compares them. Unfortunately, it appears that there is no good way to compare UTF-8 strings directly without converting them. I hope I am wrong.
PeterBelow, posted April 13
MarkShark said: "Is there a library routine or api function that will compare two utf8 strings or buffers without converting them to UTF16? I'm looking for something analogous to AnsiCompareStr."
It depends on what you need to compare for. Equality is easy: that is a simple memory comparison. Larger/smaller implies a collation sequence, which is language-specific. Here, converting to UTF-16 first is, in my opinion, the only practical way. The same applies if you need a case-insensitive comparison, since case conversion is also language-specific and may not be applicable to the language in question at all (e.g. Chinese).
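The convert-then-compare route described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names CompareUtf8Str and CompareUtf8Text are hypothetical, and the sketch relies on UTF8ToString (System) plus the locale-aware AnsiCompareStr and AnsiCompareText from System.SysUtils doing the collation after conversion to UTF-16.

```delphi
program Utf8CollateSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: locale-aware, case-sensitive ordering of two UTF-8
// strings. Both inputs are converted to UnicodeString (UTF-16) first.
function CompareUtf8Str(const A, B: UTF8String): Integer;
begin
  Result := AnsiCompareStr(UTF8ToString(A), UTF8ToString(B));
end;

// Hypothetical helper: same idea, but case-insensitive.
function CompareUtf8Text(const A, B: UTF8String): Integer;
begin
  Result := AnsiCompareText(UTF8ToString(A), UTF8ToString(B));
end;

var
  S1, S2: UTF8String;
begin
  // #$00C4 is U+00C4 (A with diaeresis); written as a character code so the
  // example does not depend on the source file encoding.
  S1 := UTF8Encode(#$00C4'pfel');
  S2 := UTF8Encode('Zebra');

  // In raw UTF-8 bytes the first string starts with $C3, which is greater
  // than 'Z' ($5A). A linguistic collation typically orders it before
  // 'Zebra' instead; the actual result depends on the user's locale.
  Writeln(CompareUtf8Str(S1, S2));

  // Case-insensitive comparison of two ASCII-only strings.
  Writeln(CompareUtf8Text('ABC', 'abc'));
end.
```

The conversion costs an allocation per string, which is the trade-off the posts above accept in exchange for a correct, language-aware result.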
Brandon Staggs, posted April 15
On 4/12/2024 at 6:22 PM, pyscripter said: "Unfortunately it appears that there is no good way to directly compare utf8 strings without converting them. I hope I am wrong."
If you know the UTF-8 strings have been normalized the same way and you only want to test for equality, then do a simple memory comparison. If you need anything more complex than that (case-insensitivity, ignoring diacritics, accounting for different normalizations, etc.), then there is no reason not to convert the strings anyway.