Tommi Prami

IntToStr algorithm (Interesting read)


https://towardsdatascience.com/34-faster-integer-to-string-conversion-algorithm-c72453d25352

To my understanding Delphi has a somewhat different algorithm, but looking at the "default implementation" mentioned in the article, or the one Delphi has, it is a pretty complex beast in either case, and somewhat computationally heavy.

 

Delphi has lookup tables and DivBy100(), which I would guess makes it faster than the "default implementation". Not sure though.


lr_printer Github repo: https://github.com/tigranh/lr_printer

 

-Tee-

Edited by Tommi Prami


I don't know what to think after reading that article.

 

Here are my comments on it:

- the classic way of truncating the last 2 digits with div and mod 10 (or 100) does not involve a costly div or mod instruction on modern compilers (*cough* even Delphi 12 now does it - apart from the bugs that came with it)

- I think C++ compilers would detect a div and a mod on the same operands and further optimize the code they emit, so it does not require the "workaround" that the Delphi RTL uses of calculating the modulo by subtracting the div result times 100 from the original value (see the sketch after this list)

- the pseudo-code he shows for detecting the number of digits is correct, but it is never what actually gets executed - you either rewrite it into a few branches (as you can see in the RTL), or a C++ compiler might unroll the loop, or some other trickery is applied
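
To illustrate the workaround mentioned above, here is a minimal sketch of the idea (a hypothetical helper, not the actual RTL code):

// Hypothetical helper illustrating the "remainder without mod" workaround:
// compute the quotient once and derive the remainder by subtraction,
// so no separate mod instruction is needed.
procedure SplitByHundred(Value: Cardinal; out Quotient, Remainder: Cardinal);
begin
  Quotient := Value div 100;            // one division (a multiply/shift when optimized)
  Remainder := Value - Quotient * 100;  // instead of Value mod 100
end;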

 

The DivBy100 function was introduced by me in RSP-36119 and I already notified them that DivBy100 can be removed in Delphi 12 because the compiler now properly optimizes a div by 100 - however, that affects performance only by around 0.5% or so.

 

As David correctly pointed out, the real bottleneck is the heap allocation - not just the single one when you turn an integer into a string and display it, but what happens when you concatenate strings and numbers the "classic" way, because that produces a ton of small temporary strings.

That issue even exists when using TStringBuilder, which one might think was built for exactly this kind of optimization. If you look into some overloads of Append you will see that it naively calls IntToStr and passes the result down to the overload that takes a string. This is completely insane, as the conversion should be done directly in place into the internal buffer that TStringBuilder already uses, instead of creating a temporary string, converting the integer into it, and passing that to Append to copy its contents into the buffer. This will likely be my next contribution as part of my "Better RTL" series of JIRA entries.
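
To make the allocation pattern concrete, a small hypothetical illustration (not RTL code):

// Requires System.SysUtils (IntToStr, TStringBuilder).
// Hypothetical example showing where the temporary strings come from:
// every '+' below allocates a short-lived string, and TStringBuilder.Append(Integer)
// currently allocates yet another one via IntToStr before copying it into its buffer.
procedure ConcatExample;
var
  S: string;
  SB: TStringBuilder;
begin
  S := 'x=' + IntToStr(42) + ', y=' + IntToStr(7); // several temporary heap strings

  SB := TStringBuilder.Create;
  try
    SB.Append('x=').Append(42); // Append(Integer) goes through IntToStr internally
    S := SB.ToString;
  finally
    SB.Free;
  end;
end;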

Edited by Stefan Glienke
  • Like 8


From my codebase:

// disable range checks and overflow checks so that Abs() functions in case Value = Low(Value)
{$R-}
{$Q-}
function CopyIntegerToAnsiBuffer(const Value: Integer; var Buffer: array of AnsiChar): Integer;
var
  i, j: Integer;
  val, remainder: Cardinal;
  negative: Boolean;
  tmp: array [0..15] of AnsiChar;
begin
  negative := Value<0;
  val := Abs(Value);
  Result := 0;
  repeat
    DivMod(val, 10, val, remainder);
    tmp[Result] := AnsiChar(remainder + Ord('0'));
    Inc(Result);
  until val=0;
  if negative then begin
    tmp[Result] := '-';
    Inc(Result);
  end;
  Assert(Result<=Length(Buffer));

  i := 0;
  j := Result-1;
  while i<Result do begin
    Buffer[i] := tmp[j];
    Inc(i);
    Dec(j);
  end;
end;

function CopyIntegerToWideBuffer(const Value: Integer; var Buffer: array of WideChar): Integer;
var
  i, j: Integer;
  val, remainder: Cardinal;
  negative: Boolean;
  tmp: array [0..15] of WideChar;
begin
  negative := Value<0;
  val := Abs(Value);
  Result := 0;
  repeat
    DivMod(val, 10, val, remainder);
    tmp[Result] := WideChar(remainder + Ord('0'));
    Inc(Result);
  until val=0;
  if negative then begin
    tmp[Result] := '-';
    Inc(Result);
  end;
  Assert(Result<=Length(Buffer));

  i := 0;
  j := Result-1;
  while i<Result do begin
    Buffer[i] := tmp[j];
    Inc(i);
    Dec(j);
  end;
end;

function CopyInt64ToAnsiBuffer(const Value: Int64; var Buffer: array of AnsiChar): Integer;
var
  i, j: Integer;
  val, remainder: UInt64;
  negative: Boolean;
  tmp: array [0..23] of AnsiChar;
begin
  negative := Value<0;
  val := Abs(Value);
  Result := 0;
  repeat
    DivMod(val, 10, val, remainder);
    tmp[Result] := AnsiChar(remainder + Ord('0'));
    Inc(Result);
  until val=0;
  if negative then begin
    tmp[Result] := '-';
    Inc(Result);
  end;
  Assert(Result<=Length(Buffer));

  i := 0;
  j := Result-1;
  while i<Result do begin
    Buffer[i] := tmp[j];
    Inc(i);
    Dec(j);
  end;
end;

function CopyInt64ToWideBuffer(const Value: Int64; var Buffer: array of WideChar): Integer;
var
  i, j: Integer;
  val, remainder: UInt64;
  negative: Boolean;
  tmp: array [0..23] of WideChar;
begin
  negative := Value<0;
  val := Abs(Value);
  Result := 0;
  repeat
    DivMod(val, 10, val, remainder);
    tmp[Result] := WideChar(remainder + Ord('0'));
    Inc(Result);
  until val=0;
  if negative then begin
    tmp[Result] := '-';
    Inc(Result);
  end;
  Assert(Result<=Length(Buffer));

  i := 0;
  j := Result-1;
  while i<Result do begin
    Buffer[i] := tmp[j];
    Inc(i);
    Dec(j);
  end;
end;
{$IFDEF RANGECHECKSON}{$R+}{$ENDIF}
{$IFDEF OVERFLOWCHECKSON}{$Q+}{$ENDIF}

No heap allocation. I guess I could avoid that DivMod now and do something better.

 

The main point is that a good runtime library should have building block functions like this that allow us to perform such conversions without forcing heap allocation. 

 

The RTL should have functionality like this on top of which all conversion functions can be built. Then the RTL function can be optimised and everyone benefits. 
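
As a minimal sketch of that layering (hypothetical code, assuming the CopyIntegerToWideBuffer routine above), a heap-allocating IntToStr could then be a thin wrapper:

// Hypothetical wrapper built on CopyIntegerToWideBuffer from the post above:
// the conversion happens in a stack buffer and the final string is created
// with a single heap allocation of exactly the right length.
function IntToStrViaBuffer(const Value: Integer): string;
var
  Buffer: array [0..15] of WideChar;
  Len: Integer;
begin
  Len := CopyIntegerToWideBuffer(Value, Buffer);
  SetString(Result, PWideChar(@Buffer[0]), Len);
end;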

Edited by David Heffernan
  • Like 5
  • Thanks 1


I am trying to understand when this stack-based allocation can help in practice. It seems to me that, for the most part, you usually have to do something with this array of char, and inevitably transform it into a string, which will do some heap allocation and copy the stack data.

 

The one thing that pops into my head is saving a large number of integers to a file (as strings, of course), where you can write to the file directly from the char buffer allocated on the stack. Then you avoid any heap allocation across a large number of conversions.

 

Are there many other circumstances where you can avoid the eventual conversion to a heap-allocated string and reap the benefits of the stack-based allocation? It seems limited to cases where you are doing large numbers of IntToStr calls and only need the string value locally, inside the loop.

  • Thanks 1

13 hours ago, Dave Novo said:

The one thing that pops into my head is saving a large number of integers to a file (as strings, of course), where you can write to the file directly from the char buffer allocated on the stack.

Yes, this is it. It's a very important use case though. Imagine writing large YAML or JSON files containing a lot of data. And then imagine trying to do the same with multiple threads with extra contention on a memory manager.

 

The point is that you build IntToStr using a function like mine above. And publish both. This allows the consumer of the library to have both options. Even if avoiding heap allocation is something that is only done in a narrow set of circumstances, that doesn't render it unnecessary.
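
For example, a rough sketch of that pattern (hypothetical code, assuming the CopyIntegerToAnsiBuffer routine above) that writes a row of integers straight to a stream with no per-value heap allocation:

// Requires System.Classes (TStream).
// Hypothetical example: each value is converted into a stack buffer and
// written directly to the stream, so the loop allocates nothing on the heap.
procedure WriteIntegersAsCsvRow(Stream: TStream; const Values: array of Integer);
var
  Buffer: array [0..15] of AnsiChar;
  Len, i: Integer;
  Separator: AnsiChar;
begin
  Separator := ',';
  for i := 0 to High(Values) do
  begin
    Len := CopyIntegerToAnsiBuffer(Values[i], Buffer);
    Stream.WriteBuffer(Buffer[0], Len);
    if i < High(Values) then
      Stream.WriteBuffer(Separator, SizeOf(Separator));
  end;
end;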

Edited by David Heffernan
  • Thanks 1


I never meant to imply that it was unnecessary. In fact, I already applied your excellent code to my own code that was writing out CSV files from large 2D matrices of integers. I was just wondering if there was a wider variety of circumstances that I could also leverage this technique that I was not thinking of. 

 

In fact, I also have to write out large matrices of floating point values, so will try to modify the Delphi float conversion routines to work on array of char similar to what you proposed below. 

 

TBH though, I have not done much timing. I wonder if the overhead of writing the file to disk dwarfs the time it takes for the conversions. 

On 12/12/2023 at 8:51 PM, Dave Novo said:

The one thing that pops into my head is saving a large number of integers to a file (as strings, of course), where you can write to the file directly from the char buffer allocated on the stack.

 

9 hours ago, Dave Novo said:

I already applied your excellent code to my own code that was writing out CSV files from large 2D matrices of integers.

I would suggest ditching the stack and returning to the heap, but without per-operation allocation. I have an implementation very similar to David's (his can easily be refactored to do the same), with the difference that I refactored the converting code into a function that takes a pointer to a 24-char buffer (two functions in fact, one for AnsiChar and one for WideChar). This function takes three parameters:

1) a pointer to this buffer (hence it can be on the stack or on the heap), guaranteed by the caller to have space for 24 chars or 24 bytes, depending on the version.

2) the value to convert.

3) a var parameter to receive the length in chars; this is the same as the result but expressed in chars, for trimming by char count if needed.

The function result is the actual length used, in bytes, for trimming by byte count if needed.

 

With that you can convert your CSV numbers at a really great rate with zero allocations per operation, and you don't need the limited stack space.

 

example

SetLength(T, 24);                               // T: string with room for the longest value
LenByte := IntToStrMem(@T[1], Value, CharsLen);
SetLength(T, CharsLen);                         // trim to the actual number of chars

// or

// loop over this for a raw buffer (an array of bytes)
LenByte := IntToStrMem(@Buffer[P], Value[i], CharsLen);
Inc(P, LenByte);
Buffer[P] := ',';
Inc(P);

.... 
// use
PBufferWC: PWideChar;
..
LenByte := IntToStrMem(PBufferWC, Value[i], CharsLen);
Inc(OverallLength, LenByte);
Inc(PBufferWC, CharsLen);
PBufferWC^ := ',';
Inc(PBufferWC);

 

3 hours ago, Kas Ob. said:

I would suggest ditching the stack and returning to the heap

Kind of irrelevant whether it's heap or stack. Just that you avoid the heap allocation. 

Just now, David Heffernan said:

Kind of irrelevant whether it's heap or stack. Just that you avoid the heap allocation. 

Sure, and I agree; it just gives more portability, and in Dave's case it may be faster.

13 hours ago, Dave Novo said:

In fact, I also have to write out large matrices of floating point values, so will try to modify the Delphi float conversion routines to work on array of char similar to what you proposed below. 

I have routines to do that. Because the Delphi code to convert between text and floats is irredeemably broken. I'm using dragon4 for float to text, but these days there are better algorithms around. I use dtoa in the other direction.

 

13 hours ago, Dave Novo said:

I wonder if the overhead of writing the file to disk dwarfs the time it takes for the conversions. 

My code base is spewing out a lot of floats to text files and yes it makes a significant difference to performance. Especially for files on a local drive.

On 12/14/2023 at 4:04 AM, David Heffernan said:

 Because the Delphi code to convert between text and floats is irredeemably broken.

 

Could you expand on that a little bit please?  And does that apply to C++Builder as well?

 

Posted (edited)
On 12/13/2023 at 10:33 AM, David Heffernan said:

Yes, this is it. It's a very important use case though. Imagine writing large YAML or JSON files containing a lot of data. And then imagine trying to do the same with multiple threads with extra contention on a memory manager.

 

The point is that you build IntToStr using a function like mine above. And publish both. This allows the consumer of the library to have both options. Even if avoiding heap allocation is something that is only done in a narrow set of circumstances, that doesn't render it unnecessary.

The Delphi memory manager is very fast, and for small block allocations it has some parallelism built in. I measured IntToStr(-1583749570) at 17 ns, CopyIntegerToWideBuffer(-1583749570, Buffer) at 13 ns, and the version in System.SysUtils without string allocation at 7 ns. Example code below:

const
  CTwoDigitLookup : packed array[0..99] of array[0..1] of Char =
    ('00','01','02','03','04','05','06','07','08','09',
     '10','11','12','13','14','15','16','17','18','19',
     '20','21','22','23','24','25','26','27','28','29',
     '30','31','32','33','34','35','36','37','38','39',
     '40','41','42','43','44','45','46','47','48','49',
     '50','51','52','53','54','55','56','57','58','59',
     '60','61','62','63','64','65','66','67','68','69',
     '70','71','72','73','74','75','76','77','78','79',
     '80','81','82','83','84','85','86','87','88','89',
     '90','91','92','93','94','95','96','97','98','99');


function DivBy100(i: Cardinal): Cardinal;
{$IF Defined(WIN32)}
asm
        MOV     ECX, 1374389535 // 1/100 * 2^(32+5)
        MUL     ECX
        MOV     EAX, EDX
        SHR     EAX, 5
end;
{$ELSEIF Defined(WIN64)}
asm
        MOV     EAX, ECX
        IMUL    RAX, RAX, 1374389535
        SHR     RAX, 37
end;
{$ELSE}
inline;
begin
  Result := i div 100;
end;
{$ENDIF}

{$R-}
{$Q-}
{$POINTERMATH ON}
function Int32ToWideBuffer(const AValue: Int32; ABuffer: PChar): Integer;
var
  LVal, LRemainder: UInt32;
  LDigits: Integer;
  LNegative: Boolean;
begin
  LVal := Abs(AValue);
  if LVal >= 10000 then
    if LVal >= 1000000 then
      if LVal >= 100000000 then
        LDigits := 9 + Byte(Ord(LVal >= 1000000000))
      else
        LDigits := 7 + Byte(Ord(LVal >= 10000000))
    else
      LDigits := 5 + Byte(Ord(LVal >= 100000))
  else
    if LVal >= 100 then
      LDigits := 3 + Byte(Ord(LVal >= 1000))
    else
      LDigits := 1 + Byte(Ord(LVal >= 10));
  LNegative := AValue < 0;
  ABuffer^ := '-';
  Inc(ABuffer, Ord(LNegative));
  Result := LDigits;
  Inc(Result, Ord(LNegative));
  while LDigits > 1 do
  begin
    LRemainder := LVal;
    LVal := DivBy100(LVal);
    Dec(LRemainder, LVal * 100);
    Dec(LDigits, 2);
    PCardinal(@ABuffer[LDigits])^ := Cardinal(CTwoDigitLookup[LRemainder]);
  end;
  if LDigits <> 0 then
    ABuffer^ := Char(LVal or Ord('0'));
end;
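
For completeness, a hypothetical usage sketch of the routine above (not part of the measured benchmark code):

// Hypothetical usage of Int32ToWideBuffer above: convert into a stack buffer,
// then build the final string with a single heap allocation.
procedure UsageExample;
var
  Buffer: array [0..11] of Char; // 10 digits plus sign is enough for an Int32
  Len: Integer;
  S: string;
begin
  Len := Int32ToWideBuffer(-1583749570, PChar(@Buffer[0]));
  SetString(S, PChar(@Buffer[0]), Len);
end;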

 

Edited by dennis12

13 hours ago, dennis12 said:

The Delphi memory manager is very fast, and for small block allocations it has some parallelism built in.

Automatic stack allocation is faster than all heap allocators, and has no thread contention. And Delphi's current heap allocator is known not to be scalable.

 

I've measured this in a real-world setting. Avoiding heap allocations when converting numbers (integer and real) to text makes a huge performance difference when creating files, e.g. XML, JSON, YAML, even when single-threaded. When multi-threading, the impact is even greater if you use the default heap allocator. I personally use isolated per-thread heaps to reduce contention. That doesn't eliminate it, because every allocation/deallocation that requires a call to VirtualAlloc and friends still has contention.


Unless you show the benchmark code, any results you mention are pointless, because there can be so many things in the benchmark that lead to bias.

You could, for example, run IntToStr in a loop, and this will lead to reusing the same already allocated string block over and over.
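
For instance, a hypothetical micro-benchmark of this shape would understate the allocation cost:

// Hypothetical biased benchmark: S is overwritten every iteration, so the
// string freed in one iteration is typically the very block the memory
// manager hands straight back in the next one.
procedure BiasedBenchmark;
var
  S: string;
  i: Integer;
begin
  for i := 1 to 10000000 do
    S := IntToStr(i); // IntToStr from System.SysUtils
end;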

