What does this function mean? CharPrev / Range Check Error

terran · August 28, 2024

What does this function mean?

function PathLastChar(const S: String): PChar;
{ Returns pointer to last character in the string. Is MBCS-aware. Returns nil
  if the string is empty. }
begin
  if S = '' then
    Result := nil
  else
    Result := CharPrev(Pointer(S), @S[Length(S)+1]);
end;

I get a Range Check Error because it uses Length(S)+1. And why the Pointer cast?

This function is used in InnoUnpacker, and it worked for others (why?).

I replaced it with PChar(S[Length(S)]);. So why did it work before?

Edited August 28, 2024 by terran

Virgo · August 28, 2024

It is written for 1-based strings, where S[Length(S)+1] is the terminating #0 (unless the string is empty). Are you compiling it with 0-based strings, where that index would go past the terminator?

I do not know why it uses Pointer(S) and not @S[1]. Pointer(S) probably also works correctly with empty strings, but there is already an empty-string check.

CharPrev itself is a Windows API function that accepts PChar parameters.

Remy Lebeau · August 28, 2024

12 hours ago, terran said:

I replaced it with PChar(S[Length(S)]);.

That would not work. That is taking the last character and type-casting its value to a pointer, which is not the same thing as taking a pointer to the last character.
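To make the distinction concrete, here is a minimal console sketch. The program and variable names (LastCharPointerDemo, ByIndex, ByArithmetic) are made up for illustration, and it assumes the default 1-based string indexing:

```pascal
program LastCharPointerDemo;

{$APPTYPE CONSOLE}

var
  S: string;
  ByIndex, ByArithmetic: PChar;
begin
  S := 'abc';
  UniqueString(S); // make sure S owns its buffer before taking addresses

  // Address of the last Char: @ applied to an in-range index.
  ByIndex := @S[Length(S)];

  // The same address, computed from the first Char with pointer arithmetic.
  ByArithmetic := PChar(S) + Length(S) - 1;

  Writeln(ByIndex = ByArithmetic); // both forms should name the same address
  Writeln(ByIndex^);               // the Char stored at that address

  // PChar(S[Length(S)]) would instead reinterpret this numeric Char value
  // as if it were an address, which points nowhere useful.
  Writeln(Ord(S[Length(S)]));
end.
```

Note that both pointer forms still step back exactly one Char, so neither is a substitute for CharPrev when a character spans several Chars.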

I would have used this instead:

CharPrev(PChar(S), PChar(S) + Length(S));

That would work with both 0-based and 1-based strings.
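Put back into the original function, it could look like the sketch below. This is only an illustration, not the code InnoUnpacker ships; the demo program around it is hypothetical, and the Winapi.Windows unit name assumes Delphi XE2 or later (older versions use plain Windows):

```pascal
program PathLastCharDemo;

{$APPTYPE CONSOLE}
{$R+} // range checking on: nothing below indexes past Length(S)

uses
  Winapi.Windows; // declares CharPrev

{ Returns pointer to last character in the string, or nil if it is empty. }
function PathLastChar(const S: string): PChar;
var
  P: PChar;
begin
  if S = '' then
    Result := nil
  else
  begin
    P := PChar(S);                        // pointer to the first Char
    Result := CharPrev(P, P + Length(S)); // step back from the terminator
  end;
end;

var
  S: string;
begin
  S := 'C:\Temp\';
  // Offset of the returned pointer from the start of S, counted in Chars.
  Writeln(PathLastChar(S) - PChar(S));
end.
```

The empty-string check is kept so that the function still returns nil in that case, as the original comment promises.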

Edited August 29, 2024 by Remy Lebeau

terran · August 28, 2024

What's the difference?

Zero-based strings are not used.

It was probably compiled with range checking disabled.

Remy Lebeau · August 29, 2024

3 hours ago, terran said:

What's the difference?

The difference is that:

@S[Length(S)+1] returns a pointer to the null-terminator only on a 1-based string, but goes out of bounds on a 0-based string.
PChar(S)+Length(S) returns a pointer to the null-terminator on both a 1-based and a 0-based string.

3 hours ago, terran said:

Zero-based strings are not used.

It was probably compiled with range checking disabled.

Range checking would need to be disabled, because the null terminator is not included in the Length of the string, so it is not an indexable character, even though it is physically present in memory. All the more reason to use pointer arithmetic instead of character indexing to access the terminator.
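A small sketch of that point, with range checking deliberately switched on. The program name and the Terminator variable are only for illustration:

```pascal
program TerminatorDemo;

{$APPTYPE CONSOLE}
{$R+} // range checking enabled

var
  S: string;
  Terminator: PChar;
begin
  S := 'abc';

  // Pointer arithmetic does not go through string indexing,
  // so no range check is involved in reaching the terminator.
  Terminator := PChar(S) + Length(S);
  Writeln(Ord(Terminator^)); // code of the Char just past the last indexable one

  // The indexed form is the one reported above to fail with a range
  // check error under {$R+}, so it is left commented out here:
  // Terminator := @S[Length(S) + 1];
end.
```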

terran · August 29, 2024

I meant this: "PChar(S[Length(S)])".

There's no point in using CharPrev on the character after the one required.

Virgo · August 29, 2024

Previous character, not next.

CharPrev also works if a character consists of multiple code points...

PeterBelow · August 29, 2024

1 hour ago, Virgo said:

Previous character, not next.

CharPrev also works if a character consists of multiple code points...

That is not relevant for Unicode UTF-16, which the String type has used in all Delphi releases for more than a decade. Who relies on ANSI/MBCS strings these days? Windows has used Unicode internally for ages...

Virgo · August 29, 2024

31 minutes ago, PeterBelow said:

That is not relevant for Unicode UTF-16

It absolutely is relevant to UTF-16. From the CharPrevW documentation:

Quote

This function works with default "user" expectations of characters when dealing with diacritics. For example: A string that contains U+0061 U+030a "LATIN SMALL LETTER A" + COMBINING RING ABOVE" — which looks like "å", will advance two code points, not one. A string that contains U+0061 U+0301 U+0302 U+0303 U+0304 — which looks like "a´^~¯", will advance five code points, not one, and so on.

Edited August 29, 2024 by Virgo

Anders Melander · August 29, 2024

Quote

U+0061 U+030a "LATIN SMALL LETTER A" + COMBINING RING ABOVE"

...also known as the single character U+00E5 (Latin Small Letter A with Ring Above) of which U+0061 U+030a is the decomposition.

But even if the input were guaranteed to be composed Unicode, it would not be safe to replace CharPrev with a "-1" without knowing the exact algorithm CharPrev uses internally.

Quote

A string that contains U+0061 U+0301 U+0302 U+0303 U+0304 — which looks like "a´^~¯", will advance five code points, not one, and so on.

The "looks like" part is nonsense, since the glyphs produced by that sequence depend on the font being used to render it, but it seems like CharPrev just skips all Combining Diacritical Marks.

Anders Melander · August 29, 2024

9 minutes ago, Anders Melander said:

the exact algorithm CharPrev uses internally

As far as I can tell it uses GetStringType(CT_CTYPE3) and skips codepoints with the C3_NONSPACING flag or without the C3_ALPHA flag.

Remy Lebeau · August 29, 2024

7 hours ago, terran said:

I meant this: "PChar(S[Length(S)])".

There's no point in using CharPrev on the character after the one required.

As I stated earlier, PChar(S[Length(S)]) is extracting the last single Char from the string and type-casting its value into a PChar pointer, which is wrong. You need to use the @ operator to get the address of that Char.

But in any case, using S[Length(S)] doesn't take into account that a string contains encoded code units, so it will NOT be the last full character if that character is encoded using multiple code units. CharPrev() takes the encoding into account. The code is getting a pointer to the null terminator and then moving the pointer backwards 1 full character, regardless of how many code units it actually takes.
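As a rough way to see this for yourself, the sketch below uses the combining sequence from the CharPrevW documentation quoted earlier in the thread. The program name and the Start/Last variables are made up for illustration, and the offset CharPrev reports is printed rather than assumed, since it depends on what the API does internally:

```pascal
program CodeUnitDemo;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows; // declares CharPrev (plain Windows unit before Delphi XE2)

var
  S: string;
  Start, Last: PChar;
begin
  // 'x', then 'a' followed by U+030A COMBINING RING ABOVE.
  S := 'xa' + #$030A;
  Start := PChar(S);

  // Length counts UTF-16 code units, not user-perceived characters.
  Writeln(Length(S));

  // Indexing the last position yields the combining mark on its own.
  Writeln(Ord(S[Length(S)]));

  // CharPrev starts at the terminator and moves back one character
  // as the API defines it. The offset is printed in Chars.
  Last := CharPrev(Start, Start + Length(S));
  Writeln(Last - Start);
end.
```

If the last line prints a smaller offset than Length(S) - 1, CharPrev has treated the base letter and its combining mark as one character, which plain indexing cannot do.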

Edited August 29, 2024 by Remy Lebeau
