PeaShooter_OMO 35 Posted yesterday at 11:07 AM

This topic is not about an issue; it is just me experimenting in order to learn. I am on Delphi XE3 and Windows 10 and I am interested in the overhead of calling a routine. In the code below I call the same function directly and indirectly to determine the overhead involved in calling a routine in Delphi. I use a 100 million iteration loop to make the difference noticeable.

Obviously a few nanoseconds per call is not much to write home about, but I might be making some wrong assumptions here. Apart from calling directly, what would I have to keep in mind to achieve good performance when calling indirectly, when a more complex function is being used with more complex code and more complex parameters?

    function DirectlyAddOne(const AValue : Integer) : Integer;
    begin
      Result := AValue + 1;
    end;

    function Indirectly(const AValue : Integer) : Integer;
    begin
      Result := DirectlyAddOne(AValue);
    end;

    procedure DoLoop(ACallDirectly : Boolean);
    var
      I : Integer;
      LTickCount : Cardinal;
      LValue : Integer;
    begin
      LValue := 100;
      LTickCount := GetTickCount;
      // 100 million iterations
      for I := 0 to 99999999 do
        if ACallDirectly then
          LValue := DirectlyAddOne(LValue)
        else
          LValue := Indirectly(LValue);
      LTickCount := GetTickCount - LTickCount;
      ShowMessage('Directly? ' + BoolToStr(ACallDirectly, True) + #13#10#13#10 +
                  IntToStr(LValue) + #13#10#13#10 +
                  IntToStr(LTickCount) + ' ms');
    end;

    procedure TForm1.ButtonDirectlyClick(Sender: TObject);
    begin
      DoLoop(True);
    end;

    procedure TForm1.ButtonIndirectlyClick(Sender: TObject);
    begin
      DoLoop(False);
    end;

RoutineOverhead.zip
david berneda 37 Posted yesterday at 11:49 AM

The "if" inside the "for" is also taking time. Try two "for" loops, one for the direct call and another for the indirect call. The indirect routine could also be marked "inline" to eliminate the call. This works well for short, small routines.
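
The two suggestions above (separate loops and an inlined wrapper) could be tried with something like the console sketch below. It is only a sketch under a few assumptions: it uses TStopwatch from System.Diagnostics instead of GetTickCount, and IndirectlyInlined is a hypothetical name for an inline-marked copy of the original Indirectly. The inline directive is a request, so the compiler may still emit a call. No timings are shown because they depend on the machine and compiler settings.

```pascal
program CallOverheadSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

const
  Iterations = 100000000; // 100 million, as in the original post

function DirectlyAddOne(const AValue: Integer): Integer;
begin
  Result := AValue + 1;
end;

function Indirectly(const AValue: Integer): Integer;
begin
  Result := DirectlyAddOne(AValue);
end;

// Hypothetical variant: same wrapper, marked inline so the compiler may
// expand it at the call site instead of emitting a call to the wrapper.
function IndirectlyInlined(const AValue: Integer): Integer; inline;
begin
  Result := DirectlyAddOne(AValue);
end;

var
  I, LValue: Integer;
  LWatch: TStopwatch;
begin
  // Loop 1: direct calls only, no branch inside the loop.
  LValue := 100;
  LWatch := TStopwatch.StartNew;
  for I := 1 to Iterations do
    LValue := DirectlyAddOne(LValue);
  LWatch.Stop;
  Writeln('Direct:            ', LValue, '  ', LWatch.ElapsedMilliseconds, ' ms');

  // Loop 2: calls through the plain wrapper.
  LValue := 100;
  LWatch := TStopwatch.StartNew;
  for I := 1 to Iterations do
    LValue := Indirectly(LValue);
  LWatch.Stop;
  Writeln('Indirect:          ', LValue, '  ', LWatch.ElapsedMilliseconds, ' ms');

  // Loop 3: calls through the wrapper marked inline.
  LValue := 100;
  LWatch := TStopwatch.StartNew;
  for I := 1 to Iterations do
    LValue := IndirectlyInlined(LValue);
  LWatch.Stop;
  Writeln('Indirect (inline): ', LValue, '  ', LWatch.ElapsedMilliseconds, ' ms');
end.
```

Printing LValue after each loop keeps the result in use, and running each loop more than once helps separate real differences from measurement noise.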
DelphiUdIT 248 Posted yesterday at 12:15 PM

14 minutes ago, david berneda said:
> The "if" inside the "for" is also taking time. Try with 2 for loops, one for direct and another for indirect.

The "if" is neutral, since it is always executed (at most a very small jump that is absorbed by the cache) ... you can try swapping the two calls and you will (should) see how the timing changes.

The different times are explained by the overhead of the call, as @PeaShooter_OMO said: load the registers / stack, jump, load the result / stack, return.

It is right to declare them INLINE; in that case you get better timing. You can declare one, the other, or both inline to compare the various timings.
DelphiUdIT 248 Posted yesterday at 12:32 PM

1 hour ago, PeaShooter_OMO said:
> Obviously a few nanoseconds per call is not much to write home about but I might be making some wrong assumptions here. Apart from calling directly what would I have to keep in mind to achieve good performance when calling in an Indirectly way when a more complex function is being used with more complex code and more complex parameters?

Unless the application is time-critical, the best approach is to focus on timing, code readability, and flexibility.

Specifically, break long code into shorter chunks and pass few parameters, so that only the CPU registers are used. Use the INLINE directive so that the code can be compiled by "including" it directly in the caller (in this case the resulting executable will be larger but faster).

You can also write parts in assembler to further optimize speed. Be aware, however, that this requires a thorough understanding of the topic. Above all, always provide a Pascal alternative in case the assembler isn't suitable for the processor used at runtime (e.g., Intel vs. ARM) or for the platform.
Anders Melander 2030 Posted yesterday at 02:15 PM

2 hours ago, PeaShooter_OMO said:
> I am interested in the overhead of calling a routine.

Profile it. The significance of the overhead really depends on what the routine is doing and how often it is called. For example, if the call overhead is X but the routine itself takes 1000*X to execute, then the overhead is insignificant.

Apart from the overhead of the call itself, there's the overhead of passing parameters. You need to learn how parameters are passed in order to optimize them. Not all parameter types can be passed in registers, Delphi doesn't use all available registers, 32-bit and 64-bit do things differently, etc.

Also be aware that if you pass literal floating point numbers as parameters, the compiler might not consider the literal values to be the same type as the parameter, which means that it will produce code to convert the values to the correct type. For example, if the parameter type is Single and you need to pass 0.5, then pass it as Single(0.5) - or declare a typed constant with the value and pass that.

Set Code Inlining Control=Auto to have the compiler automatically inline small routines.

With regard to writing stuff in assembler, be aware that assembler routines can't be inlined, so the call overhead becomes mandatory. For example, in my code one constant bottleneck is calls to Round and Trunc. I have assembler versions of these two functions which are much faster, but unfortunately the call overhead completely eliminates the performance gain, so they are basically useless. It's beyond me why Delphi doesn't implement these two as intrinsics. They are listed as such but they are implemented as regular functions.

1 hour ago, DelphiUdIT said:
> above all, always provide a Pascal alternative in case the assembler isn't suitable for the processor used at runtime (e.g., Intel vs. ARM) or the platform.

Not only that, but the Pascal version will provide a reference implementation to test and benchmark against, and it will help document what the assembler version does.
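
The point above about floating point literals could look roughly like the sketch below. Scale and Half are hypothetical names invented for illustration. Whether the compiler actually emits a conversion for the untyped literal depends on the compiler version and target, so the generated code is worth checking in the CPU view.

```pascal
program SingleLiteralSketch;

{$APPTYPE CONSOLE}

const
  // Typed constant: declared as Single, so it already has the parameter's type.
  Half: Single = 0.5;

// Hypothetical routine taking Single parameters.
function Scale(const AValue, AFactor: Single): Single;
begin
  Result := AValue * AFactor;
end;

var
  LValue, LResult: Single;
begin
  LValue := 10;

  // Untyped literal: the compiler may treat 0.5 as a wider floating point
  // type and generate code to convert it to Single for the call.
  LResult := Scale(LValue, 0.5);
  Writeln(LResult:0:2);

  // Literal cast to the parameter type, as suggested in the post.
  LResult := Scale(LValue, Single(0.5));
  Writeln(LResult:0:2);

  // Typed constant passed instead of a literal.
  LResult := Scale(LValue, Half);
  Writeln(LResult:0:2);
end.
```

All three calls compute the same value; the difference, if any, is only in the code generated at the call site.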