Jump to content
Sign in to follow this  
Mike Torrettinni

Micro optimization: use Pos before StringReplace

Recommended Posts

I usually use Pos before StringReplace to test if StringReplace needs to be called at all, like this:

if Pos(vFind, vTest) > 0 then
  vNew := StringReplace(vTest, vFind, vReplace, [rfReplaceAll]);

 

And I found interesting comment in one of my old questions, 7th comment to question:  I believe the fist step would be to remove all the If Pos(...) > 0 then  - because StringReplace already does it and thus those checks give nothing but a redundant extra scanning of the string. (https://stackoverflow.com/questions/34769953/how-to-improve-multiple-stringreplace-calls)

 

So, since this is comment from an expert, of course I take it as a valid comment, but wanted to run a few tests to confirm his suggestion (or ignore it). Quick test shows these timings in ms:

Short string Find + Pos: 1362
Short string Find - Pos: 1265
Short string NO Find + Pos: 98
Short string NO Find - Pos: 714

Long string Find + Pos: 2547
Long string Find - Pos: 2291
Long string NO Find + Pos: 647
Long string NO Find - Pos: 1457

 

It shows that Pos does add a little time when substring is found, but it is significantly slower not using Pos when substring is not found.

 

Is my test correct or am I missing something obvious?

 

 

 

program Project1;

{$APPTYPE CONSOLE}

{$R *.res}

uses System.SysUtils, System.Diagnostics;

const
  cMaxLoop  = 10000000;
  cShortStr = 'Testing string magna.';
  cLongStr  = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.';
  cFind     = 'mag';
  cNoFind   = 'mga';
  cReplace  = 'X';

var vNewStr: string;
    vSW: TStopWatch;
    i: Integer;

procedure TestShort_Find;
begin
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
    if Pos(cFind, cShortStr) > 0 then
      vNewStr := StringReplace(cShortStr, cFind, cReplace, [rfReplaceAll]);
  Writeln('Short string Find + Pos: ' + vSW.ElapsedMilliseconds.ToString);

  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
    vNewStr := StringReplace(cShortStr, cFind, cReplace, [rfReplaceAll]);
  Writeln('Short string Find - Pos: ' + vSW.ElapsedMilliseconds.ToString);
end;

procedure TestShort_NoFind;
begin
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
    if Pos(cNoFind, cShortStr) > 0 then
      vNewStr := StringReplace(cShortStr, cNoFind, cReplace, [rfReplaceAll]);
  Writeln('Short string NO Find + Pos: ' + vSW.ElapsedMilliseconds.ToString);

  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
    vNewStr := StringReplace(cShortStr, cNoFind, cReplace, [rfReplaceAll]);
  Writeln('Short string NO Find - Pos: ' + vSW.ElapsedMilliseconds.ToString);
end;

procedure TestLong_Find;
begin
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
    if Pos(cFind, cLongStr) > 0 then
      vNewStr := StringReplace(cLongStr, cFind, cReplace, [rfReplaceAll]);
  Writeln('Long string Find + Pos: ' + vSW.ElapsedMilliseconds.ToString);

  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
    vNewStr := StringReplace(cLongStr, cFind, cReplace, [rfReplaceAll]);
  Writeln('Long string Find - Pos: ' + vSW.ElapsedMilliseconds.ToString);
end;

procedure TestLong_NoFind;
begin
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
    if Pos(cNoFind, cLongStr) > 0 then
      vNewStr := StringReplace(cLongStr, cNoFind, cReplace, [rfReplaceAll]);
  Writeln('Long string NO Find + Pos: ' + vSW.ElapsedMilliseconds.ToString);

  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
    vNewStr := StringReplace(cLongStr, cNoFind, cReplace, [rfReplaceAll]);
  Writeln('Long string NO Find - Pos: ' + vSW.ElapsedMilliseconds.ToString);
end;


begin
  TestShort_Find;
  TestShort_NoFind;
  TestLong_Find;
  TestLong_NoFind;

  Readln;
end.

 

Share this post


Link to post
25 minutes ago, Mark- said:

Did you look at the source code for StringReplace?

Yes, it doesn't use Pos to check if it needs replacing at all. Which is strange. But that is in Delphi 10.2.3, perhaps newer version has this fixed.

Which version of Delphi do you use, does it have Pos check at the beginning?

Share this post


Link to post
1 minute ago, Mike Torrettinni said:

...But that is in Delphi 10.2.3, perhaps newer version has this fixed.

Which version of Delphi do you use, does it have Pos check at the beginning?

Not sure it is needs a "fix". pos is case sensitive. A different function would be needed, depending on the value of the flag parameter. 

Same version.

  • Thanks 1

Share this post


Link to post

Nothing needs to be fixed. StringReplace is giving correct output, isn't it? You absolutely don't want to be calling Pos inside an efficient implementation of StringReplace. All that you are learning is that StringReplace is not efficiently written. The correct way to deal with that is to find/write and use a more efficient function to replace text. That is of course if this is a bottleneck in your code. Have you checked? 

 

BTW, do you really have calls to Pos littered through your code whenever you call StringReplace? 

  • Like 2
  • Thanks 1

Share this post


Link to post
1 hour ago, David Heffernan said:

BTW, do you really have calls to Pos littered through your code whenever you call StringReplace? 

Just did the search through my code and I only use Pos when dealing with a lot of data, when there is repetitive call to StringReplace. Otherwise not, since the performance is not important.

 

1 hour ago, David Heffernan said:

StringReplace is not efficiently written.

Any recommendations on more efficient function?

Share this post


Link to post
4 minutes ago, Mike Torrettinni said:

Any recommendations on more efficient function?

No. I'd consider writing one myself if I found that my program was spending a significant amount of time in an inefficient function. But we've had this discussion before and you don't believe in that approach.

Share this post


Link to post
5 minutes ago, Mike Torrettinni said:

Just did the search through my code and I only use Pos when dealing with a lot of data, when there is repetitive call to StringReplace. Otherwise not, since the performance is not important.

You really need to learn about the extract method refactoring.

Share this post


Link to post
2 minutes ago, David Heffernan said:

I'd consider writing one myself if I found that my program was spending a significant amount of time in an inefficient function.

I guess Pos before StringReplace is already an improvement, when substring is not found but we don't know it.

Share this post


Link to post
46 minutes ago, Mike Torrettinni said:

I guess Pos before StringReplace is already an improvement

If the code is not a bottleneck then surely you'd reach the opposite conclusion.

Share this post


Link to post
7 minutes ago, David Heffernan said:

If the code is not a bottleneck then surely you'd reach the opposite conclusion.

Yes, of course. Currently working on this part of the code that is a bottleneck. Want to make sure it doesn't regress, so every step is measured and evaluated.

 

The fact is that only a few customers have data that this code is needed. So, it's a tradeoff between slower by 5% for 5% cases and 50-80% faster for 95% cases.

Share this post


Link to post
2 minutes ago, Mike Torrettinni said:

Yes, of course. Currently working on this part of the code that is a bottleneck. Want to make sure it doesn't regress, so every step is measured and evaluated.

 

The fact is that only a few customers have data that this code is needed. So, it's a tradeoff between slower by 5% for 5% cases and 50-80% faster for 95% cases.

Why not achieve faster in all cases then? Does the code that you are looking at, that is a bottleneck, still call StringReplace multiple times per input string, as per the SO question?

Edited by David Heffernan

Share this post


Link to post
Just now, David Heffernan said:

Why not achieve faster in all cases then? 

Based on my Delphi knowledge this would probably be way more expensive (in hours, days), and highly likely it will not result in 'faster in all cases'.

Share this post


Link to post
17 minutes ago, David Heffernan said:

Does the code that you are looking at, that is a bottleneck, still call StringReplace multiple times per input string, as per the SO question?

Yes, I wasn't successful in creating final version of multiple string replace function in one go.

Share this post


Link to post
7 minutes ago, Mike Torrettinni said:

Yes, I wasn't successful in creating final version of multiple string replace function in one go.

Well there's your problem! Run through the string once only and you'll get a huge boost in performance. I described how to do that in my comment to your Q.

Edited by David Heffernan

Share this post


Link to post

That speed difference probably depends  also on Delphi version. Looking at the code ansi versions of Delphi would have been much slower on those negative cases. In Unicode version I do not see any obvious problems. 

Edited by Virgo

Share this post


Link to post
12 minutes ago, Virgo said:

That speed difference probably depends  also on Delphi version. Looking at the code ansi versions of Delphi would have been much slower on those negative cases. In Unicode version I do not see any obvious problems. 

You are talking about pre 2009 versions?

Share this post


Link to post
22 minutes ago, Mike Torrettinni said:

You are talking about pre 2009 versions?

Yes. Those used AnsiPos that calls AnsiStrPos, which has PChar parameters. And then ends up calling StrLen on both strings. Which basically means multiple full string scans. But Unicode version does not have such obvious problem.

Share this post


Link to post
1 minute ago, Virgo said:

Yes. Those used AnsiPos that calls AnsiStrPos, which has PChar parameters. And then ends up calling StrLen on both strings. Which basically means multiple full string scans. But Unicode version does not have such obvious problem.

I used similar code for replacing strings in pre 2009, but at that time the performance was not my main focus. I guess I should have started at that time and perhaps I wouldn't deal with this now  😉

Share this post


Link to post
15 hours ago, Mike Torrettinni said:

It shows that Pos does add a little time when substring is found, but it is significantly slower not using Pos when substring is not found.

Pos is much faster than Replace so what are you wondering about? Of course to not call Replace at all is faster than call it.

  • Like 1

Share this post


Link to post
7 minutes ago, Fr0sT.Brutal said:

Pos is much faster than Replace so what are you wondering about? Of course to not call Replace at all is faster than call it.

What Mike is concerned about is that calling Replace is slower than filtering with Pos before calling Replace. 

 

In reality he is asking the wrong question. The right solution to the problem (see five year old SO question) is to iterate over the input string once rather than 5-10 times. 

Share this post


Link to post
22 hours ago, Mike Torrettinni said:

Yes, it doesn't use Pos to check if it needs replacing at all.

Maybe you have a special, dedicated RTL shipped with Delphi. Was it in a golden envelope?

  • Confused 1

Share this post


Link to post

Ok, well

 

On 2/1/2021 at 1:47 AM, Mike Torrettinni said:
On 2/1/2021 at 1:19 AM, Mark- said:

Did you look at the source code for StringReplace?

Yes, it doesn't use Pos to check if it needs replacing at all.

System.SysUtils.StringReplace:  "FoundPos := Pos(xOldPattern, Str, FoundPos);"

 

Fail.

 

On 2/1/2021 at 2:05 AM, Mike Torrettinni said:
On 2/1/2021 at 1:19 AM, Mark- said:

Did you look at the source code for StringReplace?

Aha, I just noticed StrigReplace has rfIgnoreCase flag.

 

6th line in System.SysUtils.StringReplace:  "if rfIgnoreCase in Flags then"

 

Fail.

 

 

Your test case:

  vNewStr := 'Fail.';
  if Pos(cNoFind, cLongStr) > 0 then
    vNewStr := StringReplace(cLongStr, cNoFind, cReplace, [rfReplaceAll]);
  Writeln(vNewStr);

Fail.

 

  • Sad 1

Share this post


Link to post
4 hours ago, Attila Kovacs said:

System.SysUtils.StringReplace:  "FoundPos := Pos(xOldPattern, Str, FoundPos);"

 

Fail.

So it is different in different Delphi versions.... In Delphi XE StringReplace does not use Pos... Instead it uses SysUtils.AnsiPos.

But on possible reasons why on negative cases (search string is not found) StringReplace is slower to return then checking with Post before (just in case sensitive version and on Delphi XE):

Multiple functions calls to find position of search string (StringReplace->AnsiPos->StrPosLen->StrLComp

String concatenation always.

 

Edited by Virgo
Clarified, that it uses SysUtils version of AnsiPos
  • Like 1

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this  

×