terran Posted August 28, 2024

What does this code mean?

    function PathLastChar(const S: String): PChar;
    { Returns pointer to last character in the string. Is MBCS-aware. Returns nil
      if the string is empty. }
    begin
      if S = '' then
        Result := nil
      else
        Result := CharPrev(Pointer(S), @S[Length(S)+1]);
    end;

I got a Range Check Error because it uses Length(S)+1. And why Pointer? This function is used in InnoUnpacker and worked for others (why?). I replaced the expression with PChar(S[Length(S)]). So why did it work before?
Virgo Posted August 28, 2024

It is written for 1-based strings, where Length(S)+1 is the index of the terminating #0 (unless the string is empty). Are you compiling it with 0-based strings, where that index would go beyond the terminator? I do not know why it uses Pointer(S) and not @S[1]. Pointer(S) probably also works correctly with empty strings, but there is already an empty-string check. CharPrev itself is a Windows API function that accepts PChar parameters.
Remy Lebeau Posted August 28, 2024

terran said: "I replaced it to PChar(S[Length(S)]);."

That would not work. That is taking the last character and type-casting its value to a pointer, which is not the same thing as taking a pointer to the last character. I would have used this instead:

    CharPrev(PChar(S), PChar(S) + Length(S));

That would work with both 0-based and 1-based strings.
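For anyone who wants to try the pointer-arithmetic form in context, here is a minimal sketch of the whole function rewritten that way. The program name, the local variable P and the sample path are my own choices for illustration, not part of InnoUnpacker, and the unit name Winapi.Windows assumes a Delphi version with unit scope names (older versions use Windows).

```pascal
program PathLastCharDemo;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows; // assumption: XE2 or later; use "Windows" on older versions

function PathLastChar(const S: string): PChar;
{ Returns a pointer to the last character of S, or nil if S is empty. }
var
  P: PChar; // hypothetical helper variable, not in the original code
begin
  if S = '' then
    Result := nil
  else
  begin
    P := PChar(S);                        // pointer to the first Char
    Result := CharPrev(P, P + Length(S)); // step back once from the terminator
  end;
end;

var
  Path: string;
begin
  Path := 'C:\Temp\';
  // Dereference the returned pointer to see which Char it refers to.
  Writeln(PathLastChar(Path)^);
end.
```

Because nothing here indexes into the string, this variant should not depend on the range-checking setting or on whether strings are 0-based or 1-based.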
terran Posted August 28, 2024

What's the difference? Zero-based strings are not used. It was probably compiled with range checking disabled.
Remy Lebeau Posted August 29, 2024

terran said: "What's the difference?"

The difference is that:

    @S[Length(S)+1] returns a pointer to the null terminator only on a 1-based string, but goes out of bounds on a 0-based string.
    PChar(S)+Length(S) returns a pointer to the null terminator on both a 1-based and a 0-based string.

terran said: "Zero-based strings not used. Probably it was compiled with range check disabled."

Range checking would need to be disabled, because the null terminator is not included in the Length of the string, so it is not an indexable character, even though it is physically present in memory. All the more reason to use pointer arithmetic instead of character indexing to access the terminator.
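A small sketch of that point, assuming default 1-based strings and with range checking deliberately switched on. The program name and sample string are made up for the demonstration.

```pascal
program TerminatorDemo;
{$APPTYPE CONSOLE}
{$R+} // range checking on, to show that pointer arithmetic is unaffected by it

var
  S: string;
  P: PChar;
begin
  S := 'abc';

  // Pointer arithmetic: no string indexing takes place, so no range check applies.
  P := PChar(S) + Length(S);

  // The Char stored at that address is the terminating #0, so this prints 0.
  Writeln(Ord(P^));

  // The indexed form asks for element Length(S)+1, which lies outside 1..Length(S).
  // With {$R+} this is the kind of access that produced the range check error
  // reported at the top of the thread, so it is left commented out here:
  // P := @S[Length(S) + 1];
end.
```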
terran Posted August 29, 2024

I meant this: "PChar(S[Length(S)])". There's no point in calling CharPrev on the char after the one that is required.
Virgo Posted August 29, 2024

Previous character, not next. CharPrev also works if a character consists of multiple code points...
PeterBelow Posted August 29, 2024

Virgo said: "Previous character, not next. CharPrev works also, if character consists of multiple codepoints..."

That is not relevant for Unicode UTF-16, which is what the String type has used in all Delphi releases for more than a decade. Who relies on ANSI/MBCS strings these days? Windows has used Unicode internally for ages...
Virgo Posted August 29, 2024

PeterBelow said: "That is not relevant for Unicode UTF-16"

It absolutely is relevant to UTF-16. From the CharPrevW documentation:

Quote: This function works with default "user" expectations of characters when dealing with diacritics. For example: A string that contains U+0061 U+030a "LATIN SMALL LETTER A" + COMBINING RING ABOVE" — which looks like "å", will advance two code points, not one. A string that contains U+0061 U+0301 U+0302 U+0303 U+0304 — which looks like "a´^~¯", will advance five code points, not one, and so on.
Anders Melander Posted August 29, 2024

Quote: U+0061 U+030a "LATIN SMALL LETTER A" + COMBINING RING ABOVE"

...also known as the single character U+00E5 (Latin Small Letter A with Ring Above), of which U+0061 U+030a is the decomposition. But even if the input were guaranteed to be composed Unicode, it would not be safe to replace CharPrev with a "-1" without knowing the exact algorithm CharPrev uses internally.

Quote: A string that contains U+0061 U+0301 U+0302 U+0303 U+0304 — which looks like "a´^~¯", will advance five code points, not one, and so on.

The "looks like" part is nonsense, since the glyphs produced by that sequence depend on the font being used to render it, but it seems like CharPrev just skips all Combining Diacritical Marks.
Anders Melander Posted August 29, 2024

Anders Melander said: "the exact algorithm CharPrev uses internally"

As far as I can tell, it uses GetStringType(CT_CTYPE3) and skips code points with the C3_NONSPACING flag or without the C3_ALPHA flag.
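One way to check this on your own machine is to measure how far CharPrev actually steps back on a decomposed sequence. This is only a sketch: the program name and variables are invented for the test, Winapi.Windows assumes a Delphi version with unit scope names, and I am deliberately not stating what it prints, since that is exactly the behaviour under discussion.

```pascal
program CharPrevCombiningDemo;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows; // assumption: XE2 or later; use "Windows" on older versions

var
  S: string;
  Start, Last: PChar;
begin
  // 'a' followed by U+030A COMBINING RING ABOVE: two UTF-16 code units.
  S := 'a'#$030A;

  Start := PChar(S);
  Last := CharPrev(Start, Start + Length(S)); // step back once from the terminator

  // Offset of the returned pointer from the start, in Chars.
  // 0 would mean the combining mark was skipped together with its base letter;
  // 1 would mean CharPrev moved back by a single code unit only.
  Writeln(Last - Start);
end.
```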
Remy Lebeau Posted August 29, 2024

terran said: "I meant this: "PChar(S[Length(S)])". There's no point of using CharPrev of next char of required."

As I stated earlier, PChar(S[Length(S)]) is extracting the last single Char from the string and type-casting its value into a PChar pointer, which is wrong. You need to use the @ operator to get the address of that Char.

But in any case, using S[Length(S)] doesn't take into account that a string contains encoded code units, so it will NOT be the last full character if that character is encoded using multiple code units. CharPrev() takes the encoding into account. The code is getting a pointer to the null terminator and then moving the pointer backwards 1 full character, regardless of how many code units it actually takes.
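To make the value-versus-address distinction concrete, here is a rough sketch. The names Wrong and Right are mine, and it assumes default 1-based strings. Note that Wrong is never dereferenced, because it does not point at anything meaningful.

```pascal
program CastVsAddressDemo;
{$APPTYPE CONSOLE}

var
  S: string;
  Wrong, Right: PChar; // hypothetical names for the two results
begin
  S := 'abc';

  // Value cast: the Char 'c' itself is reinterpreted as a memory address.
  Wrong := PChar(S[Length(S)]);

  // Address-of: a pointer to where the last Char is stored.
  Right := @S[Length(S)];

  // Print both as numbers to compare them; the first is derived from the
  // character's ordinal value, the second is a real address inside the string.
  Writeln(NativeUInt(Wrong));
  Writeln(NativeUInt(Right));

  // Only the second pointer is safe to dereference.
  Writeln(Right^);
end.
```

Even the corrected @S[Length(S)] form only addresses the last code unit, so it is not a replacement for CharPrev when the last character spans several code units.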