Mike Torrettinni

Micro optimization - effect of defined and not used local variables


and "IF" you use it that way:

  • RAD Studio 10.3.3 Arch sample
  • aStr.Length  --> using the helper class for strings

 

[three screenshots]

 

var
  Form1: TForm1;

implementation

{$R *.dfm}

// const OR
var
  bFlag: boolean = true; // change it if needed

function ProcessStringOLD(const aStr: string): string;
begin
  Result := aStr;
  //
  // Result := Result + ' new value '; // no need for any "local" var!
  //
  if (Result.Length = 1) then
    Result := aStr;
end;

function ProcessStringNew(const aStr: string): string;
begin
  try
    if bFlag then
      Exit(aStr);
    //
    Result := ProcessStringOLD(aStr);
    //
  finally
    Form1.Caption := Result;
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  ProcessStringNew('h');
end;

initialization // executed before the code of many other units (yours included)

bFlag := false;

finalization

end.

 

hug

 

 

Edited by emailx45

2 hours ago, David Heffernan said:

Is this code a bottleneck in your program? If not write it in the way that is easiest to read. How many times have I said that to you? Why would you choose to make your code hard to read for no benefit? 

If this change is an actual improvement and it becomes a template for similar functions, it could have a noticeable effect on the overall project. Of course, if in the end the result is not worth it, then the change will not be implemented.

2 hours ago, Bill Meyer said:

Or as a friend advised me when I started learning to code, on my wood-burning CPU: First make it work, then worry about performance.

Very good, thanks! This is a years-old function that works as expected. Maybe it's time for an improvement, but only if there is a benefit in the end. I'm not trying a change for the sake of change.

54 minutes ago, emailx45 said:

initialization // executed before the code of many other units (yours included)
  bFlag := false;
finalization 
end.

 

Thank you. Interesting suggestion, but I don't use unit initialization sections (I think only in 1). I have 'prepare on start' unit/methods that handle/control project behavior, executed before first form is shown.

4 hours ago, Mike Torrettinni said:

Thank you. Interesting suggestion, but I don't use unit initialization sections (I think only in 1). I have 'prepare on start' unit/methods that handle/control project behavior, executed before first form is shown.

This section is typically used to initialize/register classes etc. in Object Pascal.

You can have one in any unit, and the sections are executed according to the order in which the units are used in your project: initialization always runs at startup, before the regular code of other units, and finalization runs at the end, when your app terminates.

 

Delphi uses it in many units (~895 units in the RAD Studio 10.3.3 Arch source code)

  • It is not just an "adornment" in the code, but an important section in its own right, after "Interface" and "Implementation"!
  • Widely used in "FireDAC", for example!

All_Units_Using_INITIALIZATION_and_FINALIZATION_Sections.txt

Edited by emailx45

17 hours ago, Rollo62 said:

You could remove the flag, by the use of a PointerVariable as pointer to function.

Would that not potentially incur a cache miss, if the pointer points to a "remote" function?
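
To picture the idea being quoted here, a minimal sketch of replacing the flag check with a procedural variable might look like the following. The names TProcessString and ProcessString are made up for illustration, and the two function bodies are placeholders, not the code from the thread. Whether the indirect call actually beats a well-predicted branch on a flag is exactly the kind of thing that would have to be measured.

```pascal
program FuncPtrSketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical procedural type; not a name from the thread.
  TProcessString = function(const aStr: string): string;

// Placeholder bodies, only here so the two variants are distinguishable.
function ProcessStringOld(const aStr: string): string;
begin
  Result := aStr + ' (old path)';
end;

function ProcessStringNew(const aStr: string): string;
begin
  Result := aStr + ' (new path)';
end;

var
  // Assigned once at startup instead of testing a boolean flag on every call.
  ProcessString: TProcessString;

begin
  ProcessString := ProcessStringOld;
  Writeln(ProcessString('h'));

  // Switching behaviour means re-pointing the variable, not flipping a flag.
  ProcessString := ProcessStringNew;
  Writeln(ProcessString('h'));
end.
```

The call site stays the same in both cases; only the target of the pointer changes.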

16 hours ago, Kas Ob. said:

move these local managed types vars to be private fields even when each one of them is not used outside one method, here you can recycle them

This makes the object unusable for multi-threading because it is unnecessarily stateful.

7 hours ago, Mike Torrettinni said:

If this change is an actual improvement

Measure your program and find out. 

 

If all you do is micro benchmarks then likely all you will achieve is to make your code harder to read and develop, and your program runs no faster. 

 

Do you know where the bottlenecks are in your program? 

Edited by David Heffernan

7 minutes ago, A.M. Hoornweg said:

This makes the object unusable for multi-threading because it is unnecessarily stateful.

While it can, the impact is manageable most of the time. If an object doesn't use any fields of its own, you are free to use it from multiple threads safely, as long as it doesn't call unsafe outside code (objects, functions). So yes, once you move a local var to an object field, the object should be protected against parallel usage. But, and this is a big but, when was the last time you saw an object without a local field that had been used in multithreading!? Those are rare.

If the object does have fields, then it is already protected, and that one field (converted from a local) would not make a big difference.

It is an approach to squeeze out some juice. It is not ideal, it adds complexity, and it comes with drawbacks like the one you are pointing to, but how is this different from any algorithm we use on a daily basis?

14 minutes ago, Kas Ob. said:

when was the last time you saw an object without a local field that had been used in multithreading

What exactly is a "local field" ?         

 

Do you mean a private field of a class (a member of an instantiated object, located on the heap) , or do you mean a local variable of a procedure or method (located on the stack) ?

 

 

13 minutes ago, A.M. Hoornweg said:

What exactly is a "local field" ?     

Just "fields" instead of "local field".

The right wording of the question is:

When was the last time you saw an object without fields that had been used in multithreading?


Some hints about performance on REAL bottlenecks:

 

It covers, among others, the tip of a sub-function if you have some temporary managed variables (like string).

 

The associated code, proving the slide assumptions, is available at https://synopse.info/files/slides/EKON22_2_High_Performance_Pascal_Code_On_Servers.zip
Worth a look to understand how it works in practice.

But remember:
"Premature Optimization is the Root of All Evil!" (DK)

Edited by Arnaud Bouchez

1 minute ago, Kas Ob. said:

Just "fields" instead of "local field".

The right wording of the question is:

When was the last time you saw an object without fields that had been used in multithreading?

All the time.   I am especially fond of classes that have only class methods. They basically act as namespaces.  
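
For anyone unfamiliar with the pattern, a class that has only class methods could look roughly like this. TTextUtils and Reversed are hypothetical names chosen for the sketch. Because there are no fields, everything the method touches lives in its parameters, locals and result, so concurrent calls share nothing.

```pascal
program ClassMethodsSketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical helper class: no fields, only static class methods,
  // so it is never instantiated and works like a namespace.
  TTextUtils = class
  public
    class function Reversed(const S: string): string; static;
  end;

class function TTextUtils.Reversed(const S: string): string;
var
  i, n: Integer;
begin
  n := Length(S);
  SetLength(Result, n);
  // Copy the characters back to front into the freshly sized result.
  for i := 1 to n do
    Result[i] := S[n - i + 1];
end;

begin
  // Called through the class name; no object is created.
  Writeln(TTextUtils.Reversed('abc'));
end.
```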

 

 

 

10 minutes ago, A.M. Hoornweg said:

All the time.   I am especially fond of classes that have only class methods. They basically act as namespaces.  

Fine, that means you are free to use the first approach: add an extra, otherwise unused parameter and move the allocation of the managed-type variable from the function that is called intensively in a loop to the caller.

I didn't suggest that you or anyone should use that everywhere, but if you want to speed up a loop that calls a function with such variables, then there is a workaround.

I find myself using TStringList very often. It is a great tool that I can't live without, but when it comes to speed in intensive data processing, I found that recycling the list yields better performance. Such usage removes the Create and Free, leaving me to call Clear on exit, which the skipped destructor would otherwise have done.
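
A minimal sketch of that recycling idea, assuming the caller owns a scratch list and hands it to the hot function. CountWords and Scratch are hypothetical names, and the word-splitting is only there to give the list something to do.

```pascal
program RecycleListSketch;

{$APPTYPE CONSOLE}

uses
  System.Classes;

// Hypothetical worker: the caller supplies the scratch list, so no
// TStringList is created or freed inside the hot function.
function CountWords(const Line: string; Scratch: TStringList): Integer;
begin
  Scratch.Clear;                    // recycle instead of Create/Free
  Scratch.DelimitedText := Line;    // split on the delimiter set by the caller
  Result := Scratch.Count;
end;

var
  Scratch: TStringList;
  i, Total: Integer;
begin
  Scratch := TStringList.Create;    // allocated once, outside the loop
  try
    Scratch.Delimiter := ' ';
    Scratch.StrictDelimiter := True;
    Total := 0;
    for i := 1 to 1000 do
      Inc(Total, CountWords('alpha beta gamma', Scratch));
    Writeln(Total);
  finally
    Scratch.Free;
  end;
end.
```

The trade-off is the one raised earlier in the thread: the shared list is state, so this shape is only safe when one thread at a time uses it.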

1 hour ago, Kas Ob. said:

Fine, that means you are free to use the first approach: add an extra, otherwise unused parameter and move the allocation of the managed-type variable from the function that is called intensively in a loop to the caller.

I didn't suggest that you or anyone should use that everywhere, but if you want to speed up a loop that calls a function with such variables, then there is a workaround.

I find myself using TStringList very often. It is a great tool that I can't live without, but when it comes to speed in intensive data processing, I found that recycling the list yields better performance. Such usage removes the Create and Free, leaving me to call Clear on exit, which the skipped destructor would otherwise have done.

Fair enough.   You're using the stringlist as an internally shared object, just for saving some time by not having to create/destroy one whenever you need one.

 

You could take that concept one step further by creating a global stringlist pool (a singleton) from which you can request an available tStringlist whenever you need one.  That pool could be shared among many objects and you could even make it threadsafe if you want.

 

Procedure tMyobject.DoSomething;
VAR ts:tStringlist;
begin
  ts:=StringListPool.GetList;
  ...
  StringListPool.Release(ts);
end;
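
The post only shows the calling side, so here is one possible shape for such a pool, purely as a sketch. The class TStringListPool and its internals (a TStack guarded by a TCriticalSection) are assumptions; only the GetList and Release names come from the snippet above.

```pascal
program PoolSketch;

{$APPTYPE CONSOLE}

uses
  System.Classes, System.SyncObjs, System.Generics.Collections;

type
  // Hypothetical pool class; the thread does not show an implementation.
  TStringListPool = class
  private
    FLock: TCriticalSection;
    FIdle: TStack<TStringList>;   // lists waiting to be reused
  public
    constructor Create;
    destructor Destroy; override;
    function GetList: TStringList;
    procedure Release(AList: TStringList);
  end;

constructor TStringListPool.Create;
begin
  inherited Create;
  FLock := TCriticalSection.Create;
  FIdle := TStack<TStringList>.Create;
end;

destructor TStringListPool.Destroy;
begin
  // The pool owns every idle list and frees them on shutdown.
  while FIdle.Count > 0 do
    FIdle.Pop.Free;
  FIdle.Free;
  FLock.Free;
  inherited;
end;

function TStringListPool.GetList: TStringList;
begin
  Result := nil;
  FLock.Acquire;
  try
    if FIdle.Count > 0 then
      Result := FIdle.Pop;
  finally
    FLock.Release;
  end;
  // Nothing idle: create a new list outside the lock.
  if Result = nil then
    Result := TStringList.Create;
end;

procedure TStringListPool.Release(AList: TStringList);
begin
  AList.Clear;                    // hand back an empty list
  FLock.Acquire;
  try
    FIdle.Push(AList);
  finally
    FLock.Release;
  end;
end;

var
  StringListPool: TStringListPool;
  ts: TStringList;
begin
  StringListPool := TStringListPool.Create;
  try
    ts := StringListPool.GetList;
    try
      ts.Add('hello');
      Writeln(ts.Count);
    finally
      StringListPool.Release(ts);
    end;
  finally
    StringListPool.Free;
  end;
end.
```

The lock makes GetList and Release safe to call from several threads, but each list is still meant to be used by one thread between GetList and Release. Whether the locking costs less than a plain Create/Free would need to be benchmarked.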

 

 

 

The problem of having local variables of managed data types such as strings is that Delphi needs to guarantee that no memory leaks occur.  So there's always a hidden Try/Finally block in such methods that will "finalize" the managed variables and release any allocated heap space. That takes time to execute, even if there's no further "code" in the method.
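
One way to act on this, in line with the sub-function tip mentioned earlier in the thread, is to move the managed temporary into a separate routine that only the slow path calls. The sketch below uses made-up names (Describe, DescribeSlow). Whether the compiler really emits no finalization frame in the hot routine depends on the compiler version and on any hidden temporaries it decides to create, so treat it as something to verify in the CPU view rather than a guarantee.

```pascal
program ManagedLocalSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Slow path in its own routine: the managed local (and the implicit
// try/finally that finalizes it) lives only here.
function DescribeSlow(Value: Integer): string;
var
  Tmp: string;
begin
  Tmp := IntToStr(Value);
  Result := 'value=' + Tmp;
end;

// Hot path: declares no managed locals of its own, so the common case
// is not meant to pay for setting up a finalization frame.
function Describe(Value: Integer): string;
begin
  if Value = 0 then
    Result := 'zero'
  else
    Result := DescribeSlow(Value);
end;

var
  i: Integer;
begin
  for i := 0 to 2 do
    Writeln(Describe(i));
end.
```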

 

 

 

 

 

 

 

 

 

 

 


Good suggestions! At this moment this was a test of micro benchmarking, and if similar concept is applied to multiple methods, it might bring some more than micro improvements. Of course this is not a 'let me test this quickly in 1h and know the results'... it will take time and results might not be what I was hoping for, or I might be surprised and it turns out to be big overall improvement. 🙂

On 11/26/2020 at 7:08 PM, A.M. Hoornweg said:

The problem of having local variables of managed data types such as strings is that Delphi needs to guarantee that no memory leaks occur.  So there's always a hidden Try/Finally block in such methods that will "finalize" the managed variables and release any allocated heap space. That takes time to execute, even if there's no further "code" in the method.

 

I came across this trying to duplicate C++'s std::next_permutation using Delphi.  I went through various modifications, and had left a string declaration in where it was no longer needed:

procedure reverse(var s:AnsiString; const a,x:word);  inline;
var
    i,j : word;
    //t   : string;
begin                          //  x is one past the end of string
   if  a  = x-1 then exit;
   j     := ( x-a ) shr 1;     //  trunc((x-a)/2);
   for i := 1 to j do
            swapCh( s[a-1+i] , s[x-i] );
end;

All permutations of 12 chars = 479,001,600.  C++ = 2s.  Commenting out the string reduced the Delphi code from 9s to 6s.  (I haven't been back to it since then.)

 

On 11/26/2020 at 12:08 PM, A.M. Hoornweg said:

there's always a hidden Try/Finally block in such methods that will "finalize" the managed variables and release any allocated heap space. That takes time to execute, even if there's no further "code" in the method.

And on Win32 those try/finally blocks have a significant effect, at times even worse than a heap allocation, because they completely trash a part of the CPU's branch prediction mechanism - see RSP-27375

Edited by Stefan Glienke


@pmcgee That swapCh caught my eye and signaled that something is wrong. Would you care to share its implementation?

I think it can be faster, but that depends on swapChar: s is a var string, and in many cases the compiler will introduce overhead for handling it and passing it further.

Since you care about the speed of the 12! permutations operation, can you check whether this is faster:

procedure reverse(var s:AnsiString; const a,x:word);  inline;
var
    i,j : word;
    //t   : string;
    tmpChar: Byte;
    SBytes: pByte absolute s;
begin                          //  x is one past the end of string
   if  a  = x-1 then exit;
   j     := ( x-a ) shr 1;     //  trunc((x-a)/2);
   for i := 1 to j do
     begin
       // SBytes is 0-based while s is 1-based, hence the extra -1 on each index
       tmpChar := SBytes[a - 2 + i];
       SBytes[a - 2 + i] := SBytes[x - i - 1];
       SBytes[x - i - 1] := tmpChar;
       //swapCh( s[a-1+i] , s[x-i] );
     end;
end;

Didn't run the code, i hope it is right.

5 minutes ago, Kas Ob. said:

@pmcgee That swapCh caught my eye and signaled that something is wrong. Would you care to share its implementation?

I think it can be faster, but that depends on swapChar: s is a var string, and in many cases the compiler will introduce overhead for handling it and passing it further.

Since you care about the speed of the 12! permutations operation, can you check whether this is faster:


   for i := 1 to j do
     begin
       // SBytes is 0-based while s is 1-based, hence the extra -1 on each index
       tmpChar := SBytes[a - 2 + i];
       SBytes[a - 2 + i] := SBytes[x - i - 1];
       SBytes[x - i - 1] := tmpChar;
       //swapCh( s[a-1+i] , s[x-i] );
     end;

 

I had tried a couple things there ... this was a small improvement.  I haven't pulled apart the assembly code yet.  It's just an ongoing interest.    It'll be fun to try it with char/byte array.
 

procedure swapByte( a:Pbyte ; b:Pbyte );    inline;
begin
    if a <> b then begin
       a^ := a^ + b^;
       b^ := a^ - b^;
       a^ := a^ - b^ ;
    end;
end;

procedure swapChar( var a : Ansichar; var b : Ansichar );   inline;
var c :  Ansichar;
begin
    if a<>b then begin
       c := a; a := b; b := c;
    end;
end;

 


@pmcgee Thank you for sharing.

 

I tried this

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils,
  Winapi.Windows;

  {
procedure swapByte( a:Pbyte ; b:Pbyte );    inline;
begin
    if a <> b then begin
       a^ := a^ + b^;
       b^ := a^ - b^;
       a^ := a^ - b^ ;
    end;
end;  }

procedure swapChar(var a: Ansichar; var b: Ansichar); inline;
var
  c: Ansichar;
begin
  if a <> b then
  begin
    c := a;
    a := b;
    b := c;
  end;
end;

procedure reverse(var s: AnsiString; const a, x: word); inline;
var
  i, j: word;
  //t   : string;
begin                          //  x is one past the end of string
  if a = x - 1 then
    exit;
  j := (x - a) shr 1;     //  trunc((x-a)/2);
  for i := 1 to j do
    swapChar(s[a - 1 + i], s[x - i]);
end;

procedure reverse2(var s: AnsiString; const a, x: word); inline;
var
  i, j: word;
  tmpChar: Byte;
  SBytes: pByte absolute s;
begin
  if a = x - 1 then
    exit;
  j := (x - a) shr 1;     //  trunc((x-a)/2);
  for i := 1 to j do
  begin
    // SBytes is 0-based while s is 1-based, hence the extra -1 on each index
    tmpChar := SBytes[a - 2 + i];
    SBytes[a - 2 + i] := SBytes[x - i - 1];
    SBytes[x - i - 1] := tmpChar;
    //swapCh(s[a - 1 + i], s[x - i]);
  end;
end;

var
  st: AnsiString;
  i: Integer;
  D: Uint64;

begin
  st := '1234567890ab';
  Writeln(st);
  d := GetTickCount;
  for i := 1 to 479001600 do           // 12!
    reverse(st, 1, 12);
  D := GetTickCount - D;
  Writeln(d);

  st := '1234567890ab';
  Writeln(st);
  d := GetTickCount;
  for i := 1 to 479001600 do
    reverse2(st, 1, 12);
  D := GetTickCount - D;
  Writeln(d);

  Readln;
end.

The result:

1234567890ab
14125
1234567890ab
4891

What am i missing here ?

13 hours ago, Stefan Glienke said:

And on Win32 those try/finally blocks have a significant effect, at times even worse than a heap allocation, because they completely trash a part of the CPU's branch prediction mechanism - see RSP-27375

Yes, that's right! I've seen your proposal as well, and I have a better one that solves the issues with yours (Step Over / replicating the code), and it's slightly faster!

To begin, the issue arises because there is a mismatch between call and ret instructions (a ret instruction that doesn't correspond to a call instruction). In your proposal, you introduced a call to fix the issue, but that also introduced the Step Over issue!

Here is my proposal: if we jumped without using a call instruction, then we simply return without using a ret instruction. How? We do a lazy stack pop (add esp, 4) to remove the return address from the stack, then we jump back to the return address (jmp [esp - 4]).

Program Test;

{$APPTYPE CONSOLE}
{$R *.res}
{$O+,W-}

uses
  Diagnostics, Windows;

{$DEFINE PATCH_TRY_FINALLY}
{$DEFINE REPLACE_RET_WITH_JMP}

procedure Test;
var
  i: Integer;
begin
  i := 0;
  try
    Inc(i);
    asm
      nop
      nop
    end;
  finally
    Dec(i);
    Dec(i);
    Dec(i);
{$IFDEF REPLACE_RET_WITH_JMP}
    {
      payload :
      ---------
      add  esp, 4      // remove return address from the stack
      jmp  [esp - 4]   // jmp back (return address)
    }
    Dec(i);
    Dec(i);
{$ENDIF}
  end;
  if i = 0 then;
end;

procedure PatchTryFinally1(address: Pointer);
const
  jmp: array [0 .. 14] of Byte = ($33, $C0, $5A, $59, $59, $64, $89, $10, $E8, $02, $00, $00, $00, $EB, $00);
var
  n: NativeUInt;
  target: Pointer;
  offset: Byte;
begin
  target := PPointer(PByte(address) + 11)^;
  offset := PByte(target) - (PByte(address) + 10) - 5;

  WriteProcessMemory(GetCurrentProcess, address, @jmp, SizeOf(jmp), n);
  WriteProcessMemory(GetCurrentProcess, PByte(address) + SizeOf(jmp) - 1, @offset, 1, n);
  FlushInstructionCache(GetCurrentProcess, address, SizeOf(jmp));
end;

procedure PatchTryFinally2(address: Pointer);
const
  Data: array [0 .. 6] of Byte = ($83, $C4, $04, $FF, $64, $24, $FC);
var
  n: NativeUInt;
begin
  WriteProcessMemory(GetCurrentProcess, address, @Data, SizeOf(Data), n);
end;

procedure PatchTryFinally(address: Pointer);
begin
{$IFDEF REPLACE_RET_WITH_JMP}
  PatchTryFinally2(PByte(@Test) + $32);
{$ELSE}
  PatchTryFinally1(PByte(@Test) + 26);
{$ENDIF}
end;

var
  i: Integer;
  sw: TStopwatch;

begin
{$IFDEF PATCH_TRY_FINALLY}
  PatchTryFinally(PByte(@Test));
{$ENDIF}
  sw := TStopwatch.StartNew;
  Sleep(1);
  sw.ElapsedMilliseconds;

  sw := TStopwatch.StartNew;
  for i := 1 to 100000000 do
    Test;

  Writeln(sw.ElapsedMilliseconds);
  Readln;
end.

 


Maybe this is OT for this thread. I'll look up where to start a new one, and add a link.

 

 

8 hours ago, Kas Ob. said:

@pmcgee Thank you for sharing.

 

I tried this


{$APPTYPE CONSOLE}

end.

The result

What am i missing here ?

 

Edited by pmcgee
