terran

What does this function mean? CharPrev / Range Check Error

What does this string mean?

function PathLastChar(const S: String): PChar;
{ Returns pointer to last character in the string. Is MBCS-aware. Returns nil
  if the string is empty. }
begin
  if S = '' then
    Result := nil
  else
    Result := CharPrev(Pointer(S), @S[Length(S)+1]);

end;

I get a range check error because it uses Length(S)+1. And why the Pointer cast?

This function was used in InnoUnpacker, and worked for others (why?).

I replaced it with PChar(S[Length(S)]);. So why did it work before?

Edited by terran

It is written for 1-based strings, where S[Length(S)+1] is the terminating #0 (unless the string is empty). Are you compiling it with 0-based strings, where that index would go past the terminator?

I do not know why it uses Pointer(S) rather than @S[1]. Pointer(S) probably also works correctly with empty strings, but there is already an empty-string check.

CharPrev itself is a Windows API function that accepts PChar parameters.

12 hours ago, terran said:

I replaced it with PChar(S[Length(S)]);.

That would not work. That is taking the last character and type-casting its value to a pointer, which is not the same thing as taking a pointer to the last character.
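To make the distinction concrete, here is a minimal sketch. The two helper names are made up for illustration, and it assumes 1-based string indexing (the desktop compiler default).

```pascal
program CastVersusAddress;
{$APPTYPE CONSOLE}

// Hypothetical helpers, not part of InnoUnpacker or the RTL.
// Both assume a non-empty, 1-based string.

function LastUnitByCast(const S: string): PChar;
begin
  // Value cast: the Char's ordinal value is reinterpreted as an address.
  Result := PChar(S[Length(S)]);
end;

function LastUnitByAddress(const S: string): PChar;
begin
  // Address-of: a real pointer into the string's character data.
  Result := @S[Length(S)];
end;

var
  S: string;
begin
  S := 'abc';
  // The "pointer" from the cast is just the ordinal of 'c' (99), not an address.
  Writeln(NativeUInt(LastUnitByCast(S)));
  // The real pointer can be dereferenced safely and yields the last Char.
  Writeln(LastUnitByAddress(S)^);
end.
```

Dereferencing the result of the cast version would read from a tiny bogus address, which is why it cannot stand in for the original code.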

 

I would have used this instead:

CharPrev(PChar(S), PChar(S) + Length(S));

That would work with both 0-based and 1-based strings.
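Dropped into the original function, that line might look like the sketch below. The demo path is only an example value, and the unit name Winapi.Windows assumes a Delphi version with unit scope names (older versions use Windows).

```pascal
program PathLastCharDemo;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows; // declares CharPrev

function PathLastChar(const S: string): PChar;
{ Returns a pointer to the last character in the string, or nil if the
  string is empty. Uses pointer arithmetic instead of character indexing,
  so it does not depend on the string index base. }
begin
  if S = '' then
    Result := nil
  else
    // PChar(S) + Length(S) points at the null terminator;
    // CharPrev steps back one character from there.
    Result := CharPrev(PChar(S), PChar(S) + Length(S));
end;

var
  S: string;
begin
  S := 'C:\Temp\'; // example value only
  Writeln(PathLastChar(S)^);
end.
```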

Edited by Remy Lebeau

What's the difference?

 

 

Zero-based strings are not used.

Probably it was compiled with range check disabled.

3 hours ago, terran said:

What's the difference?

The difference is that:

  1. @S[Length(S)+1] returns a pointer to the null-terminator only on a 1-based string, but goes out of bounds on a 0-based string.
  2. PChar(S)+Length(S) returns a pointer to the null-terminator on both a 1-based and a 0-based string.

3 hours ago, terran said:

Zero-based strings are not used.

Probably it was compiled with range check disabled.

Range checking would need to be disabled, because the null terminator is not included in the Length of the string, so it is not an indexable character, even though it is physically present in memory.  All the more reason to use pointer arithmetic instead of character indexing to access the terminator.
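A small sketch of both approaches side by side, compiled with range checking on. The helper name is hypothetical, 1-based indexing is assumed, and whether the indexed form actually raises depends on the compiler version and settings, so the program reports what happens rather than assuming it.

```pascal
program TerminatorDemo;
{$APPTYPE CONSOLE}
{$R+} // range checking on, as in the failing build

uses
  System.SysUtils; // declares ERangeError

// Hypothetical helper: pointer to the null terminator via pointer arithmetic.
function TerminatorByArithmetic(const S: string): PChar;
begin
  Result := PChar(S) + Length(S);
end;

var
  S: string;
  P: PChar;
begin
  S := 'abc';

  // Pointer arithmetic never goes through the string index check.
  P := TerminatorByArithmetic(S);
  Writeln('Terminator via arithmetic: #', Ord(P^));

  // Indexing one past Length(S) is subject to the range check.
  try
    P := @S[Length(S) + 1];
    Writeln('Indexing succeeded: #', Ord(P^));
  except
    on E: ERangeError do
      Writeln('Indexing raised ERangeError');
  end;
end.
```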


I meant this: "PChar(S[Length(S)])".

 

There's no point in calling CharPrev on the character after the one that is required.

 

 


Previous character, not next.

CharPrev also works if a character consists of multiple code points...

1 hour ago, Virgo said:

Previous character, not next.

CharPrev also works if a character consists of multiple code points...

That is not relevant for Unicode UTF-16, which the String type has used in every Delphi release for more than a decade. Who relies on ANSI/MBCS strings these days? Windows has used Unicode internally for ages...

31 minutes ago, PeterBelow said:

That is not relevant for Unicode UTF-16

It absolutely is relevant to UTF-16. From the CharPrevW documentation:

Quote

This function works with default "user" expectations of characters when dealing with diacritics. For example: A string that contains U+0061 U+030a "LATIN SMALL LETTER A" + COMBINING RING ABOVE" — which looks like "å", will advance two code points, not one. A string that contains U+0061 U+0301 U+0302 U+0303 U+0304 — which looks like "a´^~¯", will advance five code points, not one, and so on.

 

Edited by Virgo

Quote

U+0061 U+030a "LATIN SMALL LETTER A" + COMBINING RING ABOVE"

...also known as the single character U+00E5 (Latin Small Letter A with Ring Above) of which U+0061 U+030a is the decomposition.

 

But even if the input were guaranteed to be composed Unicode, it would not be safe to replace CharPrev with a "-1" without knowing the exact algorithm CharPrev uses internally.

 

Quote

A string that contains U+0061 U+0301 U+0302 U+0303 U+0304 — which looks like "a´^~¯", will advance five code points, not one, and so on. 

The "looks like" part is nonsense, since the glyphs produced by that sequence depend on the font being used to render it, but it seems that CharPrev just skips all Combining Diacritical Marks.

7 hours ago, terran said:

I meant this: "PChar(S[Length(S)])".

 

There's no point in calling CharPrev on the character after the one that is required.

As I stated earlier, PChar(S[Length(S)]) is extracting the last single Char from the string and type-casting its value into a PChar pointer, which is wrong.  You need to use the @ operator to get the address of that Char.

 

But in any case, using S[Length(S)] doesn't take into account that a string contains encoded codeunits, so it will NOT be the last full character if the character is encoded using multiple codeunits.  CharPrev() takes the encoding into account.  The code is getting a pointer to the null terminator and then moving the pointer backwards 1 full character regardless of how many codeunits it actually takes.
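A quick way to see this for yourself is to compare where CharPrev lands with where the naive last index lands. This is only a sketch: the helper name is hypothetical, the test string is the "a" + COMBINING RING ABOVE pair from the documentation quoted earlier, and the unit name Winapi.Windows assumes a Delphi version with unit scope names.

```pascal
program CharPrevCombining;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows; // declares CharPrev

// Hypothetical helper: offset, in code units from the start of S, of the
// last character as CharPrev sees it. S must not be empty.
function LastCharOffset(const S: string): NativeInt;
var
  Start: PChar;
begin
  Start := PChar(S);
  Result := CharPrev(Start, Start + Length(S)) - Start;
end;

var
  S: string;
begin
  // U+0061 followed by U+030A: two code units, one user-perceived character.
  S := 'a'#$030A;
  Writeln('Length in code units:  ', Length(S));
  Writeln('Naive last offset:     ', Length(S) - 1);
  Writeln('CharPrev last offset:  ', LastCharOffset(S));
end.
```

If CharPrev behaves as the quoted documentation describes, its offset should point at the base letter rather than at the combining mark, while the naive offset always points at the final code unit.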

 

Edited by Remy Lebeau