Stefan Glienke

Range checking in library code?


I think everyone knows what happens in the following code:

{$APPTYPE CONSOLE}
{$RANGECHECKS OFF}

uses
  Classes,
  Generics.Collections,
  SysUtils;

var
  s: TStringList;
  l: TList<Integer>;
begin
  s := TStringList.Create;
  try
    Writeln(s[0]);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  s.Free;

  l := TList<Integer>.Create;
  try
    Writeln(l[0]);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  l.Free;

  Readln;
end.

Spoiler: 

EStringListError: List index out of bounds (0)
EArgumentOutOfRangeException: Argument out of range

Now we can argue whether it is good or bad that, regardless of the range-check option, the code in those lists always performs range checking. But the question I am pondering is the following:

As the author of a 3rd party library whose users can compile its code to their liking, do I want the same behavior, or do I want to make the range checking for index and count parameters in collection classes depend on {$IFOPT R+} (or some other define) and let the person compiling the code decide whether they want range checking or not?

 

Your suggestions and opinions are welcome.

 

Before someone suggests it: because the storage arrays may be overallocated (capacity beyond the logical Count), you cannot simply rely on the range checking the compiler inserts with {$R+} - that only guards against the allocated length, not against Count.
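To make the idea concrete, here is a minimal sketch (TMyList&lt;T&gt;, FItems, FCount and EMyListError are hypothetical names, not from any shipping library) of an indexer whose explicit bounds check against the logical Count is only compiled in when the unit is built with {$R+}:

```pascal
uses
  SysUtils;

type
  EMyListError = class(Exception);

  TMyList<T> = class
  private
    FItems: TArray<T>;   // storage may be overallocated: Length(FItems) >= FCount
    FCount: Integer;
    function GetItem(index: Integer): T;
  public
    property Items[index: Integer]: T read GetItem; default;
  end;

function TMyList<T>.GetItem(index: Integer): T;
begin
  {$IFOPT R+}
  // Check against the logical FCount - the compiler's own {$R+} check on
  // FItems[index] would only guard against the (larger) allocated length.
  // The Cardinal cast folds the index < 0 test into a single comparison.
  if Cardinal(index) >= Cardinal(FCount) then
    raise EMyListError.CreateFmt('List index out of bounds (%d)', [index]);
  {$ENDIF}
  Result := FItems[index];
end;
```

One caveat with {$IFOPT R+}: it reflects the setting active where the library unit itself is compiled, so with precompiled DCUs the consumer's project option has no effect; a dedicated define avoids that.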

Edited by Stefan Glienke


For me, "3rd party library author" is the key phrase: it means you are the master of your code and take full responsibility for its quality and functionality. Simply put, you owe no one anything; you decide this for yourself, and keep an open mind when people argue or discuss this behaviour.

 

Apart from that, I believe there is always room for errors and bugs, and the wider the range of compilation settings your code goes through, the faster the library will mature. This can be done by letting your users do what they will - chaos, like using a thing without even reading the documentation or the comments.


I would not make the behavior change depend on the $R+ setting.
Your library should be consistent and should never need a recompilation when you switch, e.g., from DEBUG to RELEASE mode.

 

My idea would be to raise an exception on an out-of-bounds error - just like TStringList/TList do.
It is what most users expect, and what they will be confident and pleased with.


In mORMot, the trick is that I also give direct access (as a field, not as a getter, not even an inlined one) to the raw pointer of the array storage. So the user can bypass the range check if needed, e.g. by using a local pointer variable and proper pointer arithmetic. I document the property as low-level raw access to the values, to be used at your own risk.
Advanced users - and other parts of your own library that are safely written and tested - can then bypass the range check by accessing the pointer values directly.
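A minimal sketch of that pattern (TIntList and Values are illustrative names here, not mORMot's actual API): the storage array is a public field, so trusted code can walk it with a local pointer and no per-access check:

```pascal
type
  TIntList = class
  private
    FCount: Integer;
  public
    // Raw access to the storage, documented as "use at your own risk":
    // no getter, no range check, and slots past Count - 1 are unused capacity.
    Values: TArray<Integer>;
    property Count: Integer read FCount;
  end;

function SumAll(list: TIntList): Integer;
var
  p: PInteger;
  i: Integer;
begin
  Result := 0;
  p := Pointer(list.Values);  // nil when empty, but then the loop body never runs
  for i := 1 to list.Count do
  begin
    Inc(Result, p^);          // direct dereference, no bounds check
    Inc(p);                   // raw pointer arithmetic
  end;
end;
```

The safe, checked property can remain the default API; only code that has proven it stays within Count drops down to the raw field.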

Edited by Arnaud Bouchez


Many compilers insert validation routines, but all of them can be disabled. Contracts are also optional. Range checking for C++'s std::vector can be enabled or disabled with a macro (it is enabled by default for Debug and disabled for Release). So IMHO it's better to follow tradition and let users decide whether they want to compile with range checking or not. Personally, I enable it under the debugger and disable it for Release.


See my point is this - when a developer writes code like this:

for i := 0 to list.Count - 1 do
  DoSomething(list[i]);

it makes absolutely no sense to perform the range check on every single item access. Yes, I know you could use the for-in loop to avoid that, but let's just see this as one example of "I have tested this code, I did not make an off-by-one error on the Count (like forgetting that -1), now let me just get the fastest possible code for my RELEASE build".

Unfortunately, unless you recompile the RTL/VCL/FMX/you name it yourself, you are stuck with pointless sanity checks all over the place.

No, I have not benchmarked a full application to actually measure the overall performance impact on all the code that uses RTL lists, but I think we can agree that even if the impact is minor, there is one.

 

What I am talking about here is the public API of my code. Users interact with it, and I have no control over them passing bad values, hence I would like to perform sanity checks to make the code as robust as possible. But I also want the library code to be as fast as possible in a RELEASE configuration (which could also stir an argument on whether to turn range/overflow checking on or off there).

 

I certainly don't want to add two levels of API here, but rather make the one API behave reasonably under DEBUG and RELEASE. I am using these two words for the "tell me when I made a mistake and give me specific information about it" case and the "I tested all the code and don't have out-of-range cases, don't pollute my code with useless sanity checks" case.
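One conventional way to get exactly that DEBUG/RELEASE split without a second API level is an Assert-based check: in the usual configurations assertions ({$C+}) are on in DEBUG and off in RELEASE, and with {$C-} the compiler removes the whole statement, message expression included. A sketch (TMyList, FItems and FCount are hypothetical names):

```pascal
uses
  SysUtils;

function TMyList.GetItem(index: Integer): Integer;
begin
  // DEBUG ({$C+}): raises EAssertionFailed with unit name and line number.
  // RELEASE ({$C-}): the Assert, including the Format call, is not compiled at all.
  Assert((index >= 0) and (index < FCount),
    Format('List index out of bounds (%d)', [index]));
  Result := FItems[index];
end;
```

The trade-off is that callers get EAssertionFailed rather than a specific exception class such as EArgumentOutOfRangeException, so this fits the "tell me where I made a mistake" case better than a documented API contract.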

 

 


Conceptually, a list class is the same as an array. It has extra convenience functionality, for sure, but it is still a random-access array. So my philosophy is that consistency is achieved by having lists match arrays.

 

I'd have the range checks conditional on RANGECHECKS. Indeed that is how I do it in my personal collection library. 


I agree that having two ways to access the list data is helpful - or maybe that is just because I got used to it from Delphi. Delphi has had TList.List forever, so in your example, if the developer cared about speed they should be writing:

for i := 0 to list.Count - 1 do
  DoSomething(list.List[i]);

Mind you, I have rarely found that this makes a difference I could actually time, but sometimes it has.


Maybe it is worth surrounding the range check with conditional directives?

By default, the range check would be enabled in both debug and release builds; demanding users could then achieve maximum performance at the expense of a slightly more involved configuration.


During development you want to see a problem at the level of the subject domain, at the moment it occurs.
Memory corruption from writing outside the array may otherwise be detected far too late.

 

In complicated cases it may take several hours of your life to find such an error.
Do you need to optimize your program at all?
And if you do, remember that code tends to follow the Pareto 80/20 principle (here probably amplified tenfold): 98% of the time is spent in 2% of the code.

You need to optimize that 2%, and only if it is critical for you.

 

Even when I was programming microprocessors (an LSI-11, where the average instruction took 3-10 µs and there was only 8-16 KB of memory), I did not skimp on safety.

 

I programmed in Pascal with all the checks enabled.

And yet my code was more functional and faster than the programs of colleagues who wrote in assembler.

I usually had the better program thanks to the tighter control Pascal gave me over the code.

When I was still young and green, fresh from the Radio Engineering Department of the university and with no specialized training in programming, I already tried to detect repeated code and extract it into a parameterized subprogram.
My favorite book, "Algorithms and Data Structures" by Niklaus Wirth (Mir Publishing House, 1985), is always next to me, although by now its cover has been glued twice and some pages are falling out from use.

 

If my algorithm only works correctly for data in some value domain, I add a check of that condition.

 

Unlike some of my colleagues, who defend the honor of the uniform and think that showing an error message is bad form.

Some of them do not show any errors at all, or even suppress the error output.


When an error is reported, I know what the mistake is and exactly where it happened.
I try to react quickly, preferably closing the problem by the evening and publishing an unscheduled release.
As a result, I have not had support problems with my programs for a long time now; I mostly spend my time extending their functionality rather than closing holes.

 

Real projects rarely have 100% test coverage.
And even 100% coverage does not guarantee the absence of errors in the code: if the logic is complex, there is no guarantee that every combination of branches has been exercised.
A test can detect an error, but tests cannot prove that the code has none.

 

In principle, an efficiently implemented iterator could provide fast and safe iteration over array elements.
For me it would be best if the iterator returned the element's address: then you can inspect the element's fields and change something in place if necessary.

 

The fastest search uses a sentinel: we only compare values and never check the element index.

To do this, we append a sentinel element to the end of the list and set the key value in the sentinel:

List.Sentinel.Key := x;          // plant the key in the sentinel
p := List.First;
while p.Key <> x do
  Inc(PByte(p), ItemSize);       // advance by the element size, no index check
if p <> List.Sentinel then
  ...                            // found a real element at p

 

When the loop stops, we check which element we reached: if it is the sentinel, the key was not found.
Sentinels are useful for more structures than just lists.
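A self-contained sketch of such a sentinel search over a plain dynamic array (assuming the caller always keeps one spare slot at the end for the sentinel; SentinelIndexOf is a name made up for this example):

```pascal
function SentinelIndexOf(var items: TArray<Integer>; count: Integer;
  x: Integer): Integer;
var
  p: PInteger;
begin
  Assert(Length(items) > count);  // one spare slot reserved for the sentinel
  items[count] := x;              // plant the key in the sentinel slot
  p := @items[0];
  Result := 0;
  while p^ <> x do                // only value comparisons, no index check
  begin
    Inc(p);
    Inc(Result);
  end;
  if Result = count then
    Result := -1;                 // we only reached the sentinel: not found
end;
```

Note that planting the key makes the search a mutating operation on the spare slot, so it is unsuitable for shared read-only data.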

 

The current version of the Delphi iterator is not quite perfect: it requires calling two methods (MoveNext and Current) per element.
It would be better if the iterator, or the list itself, executed the supplied code until that code said "I did what I wanted":

DoSomething: TEnumFunc;
list.Enumerate(DoSomething);
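That callback style might be sketched like this (the names TEnumFunc, Enumerate and TIntList are illustrative; the Boolean result is the "I did what I wanted" signal):

```pascal
type
  // Return True to continue, False for "I did what I wanted - stop".
  TEnumFunc = reference to function(var item: Integer): Boolean;

  TIntList = class
  private
    FItems: TArray<Integer>;
    FCount: Integer;
  public
    procedure Enumerate(const callback: TEnumFunc);
  end;

procedure TIntList.Enumerate(const callback: TEnumFunc);
var
  i: Integer;
begin
  // A single bounds-safe loop inside the list replaces a check per access;
  // passing the element by reference also allows in-place modification.
  for i := 0 to FCount - 1 do
    if not callback(FItems[i]) then
      Exit;
end;

// Usage: double every element, never stopping early.
procedure DoubleAll(list: TIntList);
begin
  list.Enumerate(
    function(var item: Integer): Boolean
    begin
      item := item * 2;
      Result := True;
    end);
end;
```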


 

Edited by Marat1961

42 minutes ago, Marat1961 said:

In complicated cases, it may take several days of your life to find such an error.

There, I've fixed it for you. 😉



I wouldn't bother disabling the check.

1) I doubt it causes any noticeable perf loss compared to calling a method and manipulating an item

2) Memory corruption/reading corrupted memory is far more serious than tiny slowdown

3) From my POV and understanding, compiler directives should only control operations with simple types. Classes are out of scope and should use their own checks and exceptions.

4) If you really want to give users the fastest access path, something like a .RawList property would be quite obvious and self-explanatory.

