Jump to content
Sign in to follow this  
Mike Torrettinni

Micro optimization: use Pos before StringReplace

Recommended Posts

6 hours ago, Virgo said:

So it is different in different Delphi versions.... In Delphi XE StringReplace does not use Pos... Instead it uses SysUtils.AnsiPos.

Up to ~XE7 StringReplace is piece of straightforward dumb slow code. Later they optimized it pretty well.

  • Like 1

Share this post


Link to post
14 hours ago, Virgo said:

But on possible reasons why on negative cases (search string is not found) StringReplace is slower to return then checking with Post before (just in case sensitive version and on Delphi XE):

I tried to customize the StringReplace to use Pos and exit asap if string is not found within, but I broke something for other cases, so I had to give up 🙂

Share this post


Link to post

Why don't you fix your tests first? You misled yourself with faulty tests.

Edited by Attila Kovacs

Share this post


Link to post
1 hour ago, Mike Torrettinni said:

I tried to customize the StringReplace to use Pos and exit asap if string is not found within, but I broke something for other cases, so I had to give up 🙂

That's not the right way to solve this. The right way is to iterate over the string once rather than 5-10 times. Why would you want to iterate over the string so many times? 

  • Thanks 1

Share this post


Link to post
4 minutes ago, David Heffernan said:

That's not the right way to solve this. The right way is to iterate over the string once rather than 5-10 times. Why would you want to iterate over the string so many times? 

StringReplace iterates only once over the string, there is nothing to solve.

It's his tests where he is ignoring the fact that StringReplace not only replacing (or not) parts of the string, but also assigns the result to a new string.

Those test are failing and by fixing it the differences are not that big any more. A couple of milliseconds on a loop with 10 million iterations.

 

 

Share this post


Link to post
30 minutes ago, Attila Kovacs said:

StringReplace iterates only once over the string, there is nothing to solve.

The code is the code in the SO question which calls StringReplace multiple times. 

Share this post


Link to post
3 minutes ago, David Heffernan said:

The code is the code in the SO question which calls StringReplace multiple times. 

He isn't using that code at all. At least not in the tests.

Share this post


Link to post
52 minutes ago, Attila Kovacs said:

He isn't using that code at all. At least not in the tests.

Read back in the thread. He says that it's the same code as the SO question that he is trying to make go faster. 

Share this post


Link to post

I see now what you mean, you are right about optimization part (just as 5 years ago).

I had only this thread in my mind.

Share this post


Link to post
2 hours ago, David Heffernan said:

That's not the right way to solve this. The right way is to iterate over the string once rather than 5-10 times. Why would you want to iterate over the string so many times? 

Yes, I put it on ToDo list to revisit - 5 years ago when I tried to implement it it was 10x slower than normal StringReplace call, perhaps because they way I was concatenation strings. Now I can probably improve it, I just need to book the time.

 

Good thing is nobody is disputing that Pos is good improvement to use with StringReplace, when we don't know if call is actually needed or not - if sub string is in the string. Of course for repetitive calls, like in my case.

 

  • Haha 1

Share this post


Link to post
6 hours ago, Mike Torrettinni said:

Good thing is nobody is disputing that Pos is good improvement to use with StringReplace

I did

  • Thanks 1

Share this post


Link to post

I checked my string replacements in more details and on average there are 3 replacements in long strings. So, I included test with 3 replacement and the difference using Pos before StringReplace or without Pos is even smaller, 1% only, while when substring is not found not using Pos adds 100%+ to the execution time.

 

Short string Find + Pos: 1322
Short string Find - Pos: 1275
Short string 3x Find + Pos: 1504
Short string 3x Find - Pos: 1482
Short string NO Find + Pos: 98
Short string NO Find - Pos: 686
Long string Find + Pos: 2525
Long string Find - Pos: 2322
Long string 3x Find + Pos: 2609
Long string 3x Find - Pos: 2517
Long string NO Find + Pos: 660
Long string NO Find - Pos: 1452

 

 

So, using Pos before StringReplace saves a lot of time, unless we are sure that the substring exists, then it adds some time, depending on string length.

 

program Project1;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils,
  System.Diagnostics;

const
  cMaxLoop  = 10000000;
  cShortStr = 'Testing string magna.';
  cShortStr3x = 'Testmag strmaging magna.';
  cLongStr  = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.';
  cLongStr3x  = 'Lorem ipsum dolor sit ametmag, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequatmag.';
  cFind     = 'mag';
  cNoFind   = 'mga';
  cReplace  = 'X';

var vNewStr: string;
    vSW: TStopWatch;
    c: Integer;


procedure TestShort_Find;
begin
  vSW := TStopWatch.StartNew;
  for c := 1 to cMaxLoop do
    if Pos(cFind, cShortStr) > 0 then
      vNewStr := StringReplace(cShortStr, cFind, cReplace, [rfReplaceAll]);
  Writeln('Short string Find + Pos: ' + vSW.ElapsedMilliseconds.ToString);

  vSW := TStopWatch.StartNew;
  for c := 1 to cMaxLoop do
    vNewStr := StringReplace(cShortStr, cFind, cReplace, [rfReplaceAll]);
  Writeln('Short string Find - Pos: ' + vSW.ElapsedMilliseconds.ToString);
end;

procedure TestShort_NoFind;
begin
  vSW := TStopWatch.StartNew;
  for c := 1 to cMaxLoop do
    if Pos(cNoFind, cShortStr) > 0 then
      vNewStr := StringReplace(cShortStr, cNoFind, cReplace, [rfReplaceAll]);
  Writeln('Short string NO Find + Pos: ' + vSW.ElapsedMilliseconds.ToString);

  vSW := TStopWatch.StartNew;
  for c := 1 to cMaxLoop do
    vNewStr := StringReplace(cShortStr, cNoFind, cReplace, [rfReplaceAll]);
  Writeln('Short string NO Find - Pos: ' + vSW.ElapsedMilliseconds.ToString);
end;

procedure TestShort_Find3x;
begin
  vSW := TStopWatch.StartNew;
  for c := 1 to cMaxLoop do
    if Pos(cFind, cShortStr3x) > 0 then
      vNewStr := StringReplace(cShortStr3x, cFind, cReplace, [rfReplaceAll]);
  Writeln('Short string 3x Find + Pos: ' + vSW.ElapsedMilliseconds.ToString);

  vSW := TStopWatch.StartNew;
  for c := 1 to cMaxLoop do
    vNewStr := StringReplace(cShortStr3x, cFind, cReplace, [rfReplaceAll]);
  Writeln('Short string 3x Find - Pos: ' + vSW.ElapsedMilliseconds.ToString);
end;


procedure TestLong_Find;
begin
  vSW := TStopWatch.StartNew;
  for c := 1 to cMaxLoop do
    if Pos(cFind, cLongStr) > 0 then
      vNewStr := StringReplace(cLongStr, cFind, cReplace, [rfReplaceAll]);
  Writeln('Long string Find + Pos: ' + vSW.ElapsedMilliseconds.ToString);

  vSW := TStopWatch.StartNew;
  for c := 1 to cMaxLoop do
    vNewStr := StringReplace(cLongStr, cFind, cReplace, [rfReplaceAll]);
  Writeln('Long string Find - Pos: ' + vSW.ElapsedMilliseconds.ToString);
end;

procedure TestLong_NoFind;
begin
  vSW := TStopWatch.StartNew;
  for c := 1 to cMaxLoop do
    if Pos(cNoFind, cLongStr) > 0 then
      vNewStr := StringReplace(cLongStr, cNoFind, cReplace, [rfReplaceAll]);
  Writeln('Long string NO Find + Pos: ' + vSW.ElapsedMilliseconds.ToString);

  vSW := TStopWatch.StartNew;
  for c := 1 to cMaxLoop do
    vNewStr := StringReplace(cLongStr, cNoFind, cReplace, [rfReplaceAll]);
  Writeln('Long string NO Find - Pos: ' + vSW.ElapsedMilliseconds.ToString);
end;

procedure TestLong_Find3x;
begin
  vSW := TStopWatch.StartNew;
  for c := 1 to cMaxLoop do
    if Pos(cFind, cLongStr3x) > 0 then
      vNewStr := StringReplace(cLongStr3x, cFind, cReplace, [rfReplaceAll]);
  Writeln('Long string 3x Find + Pos: ' + vSW.ElapsedMilliseconds.ToString);

  vSW := TStopWatch.StartNew;
  for c := 1 to cMaxLoop do
    vNewStr := StringReplace(cLongStr3x, cFind, cReplace, [rfReplaceAll]);
  Writeln('Long string 3x Find - Pos: ' + vSW.ElapsedMilliseconds.ToString);
end;




begin
  TestShort_Find;
  TestShort_Find3x;
  TestShort_NoFind;
  TestLong_Find;
  TestLong_Find3x;
  TestLong_NoFind;

  Readln;
end.

 

Share this post


Link to post

Your benchmarks are bogus because the real code replaces multiple strings. It is simple to make this code at least three times faster in the real setting. 

Share this post


Link to post

"percent" is not a unit neither "lot" is

If you are already patching files from your customers, it would have been a little effort to test it yourself on your use case with realtime data and post the results.

I don't think that this test which comparing Pos() with SubString() and giving a result of 0,00006 milliseconds difference, makes any sense.

Share this post


Link to post
36 minutes ago, David Heffernan said:

Your benchmarks are bogus because the real code replaces multiple strings. It is simple to make this code at least three times faster in the real setting. 

I don't understand, what is missing that would be MCVE? I have description, results and code, what could I include to improve the test case?

Share this post


Link to post
1 hour ago, Mike Torrettinni said:

I don't understand, what is missing that would be MCVE? I have description, results and code, what could I include to improve the test case?

The code in this post doesn't accurately represent the code in your app which calls StringReplace repeatedly on the same string. As per the SO question. 

Share this post


Link to post
6 minutes ago, David Heffernan said:

The code in this post doesn't accurately represent the code in your app which calls StringReplace repeatedly on the same string. As per the SO question. 

I referenced exact comment in the SO question which I investigated and prepared MCVE to confirm, or that someone shows my test case is wrong, or what is missing. That was original post.

Then I analyzed real data to try to get as close to it as possible, and second test case shows better representation of real data.

 

Or are you suggesting I post my project source and customer data?

Edited by Mike Torrettinni

Share this post


Link to post
4 minutes ago, Mike Torrettinni said:

I referenced exact comment in the SO question which I investigated and prepared MCVE to confirm, or that someone shows my test case is wrong, or what is missing. That was original post.

Then I analyzed real data to try to get as close to it as possible, and second test case shows better representation of real data.

 

Or are you suggesting I post my project source and customer data?

Sample data might be useful. I'm amazed that you don't see the huge difference between your SO question and the code in this post. In this post you make one replacement. In the SO post you make five. That's the key issue here. And in five years you have not appreciated it. 

  • Like 1

Share this post


Link to post
Just now, David Heffernan said:

Sample data might be useful. I'm amazed that you don't see the huge difference between your SO question and the code in this post. In this post you make one replacement. In the SO post you make five. That's the key issue here. And in five years you have not appreciated it. 

How should I phrase it that this is referencing a comment about redundant Pos with StringReplace and not that I'm looking for a solution for the original SO question?

Share this post


Link to post
13 minutes ago, David Heffernan said:

Sample data might be useful

Data could be entity names, descriptions, notes, comments... free text, in any language. I assumed lorem ipsum is good to be used for such test cases.

Share this post


Link to post
7 hours ago, Mike Torrettinni said:

How should I phrase it that this is referencing a comment about redundant Pos with StringReplace and not that I'm looking for a solution for the original SO question?

Earlier in the thread you said that the original problem from the SO question was driving this topic. 

Share this post


Link to post

One example I didn't test: if substring is at the end of string, so that Pos runs the longest:

 

Long string Find AtTheEnd + Pos: 2938
Long string Find AtTheEnd - Pos: 2298

In this case using Pos adds 32% of time. Which is quite substantial, but in my real data this happens in less than 0.01% cases, so not really valid case.

Edited by Mike Torrettinni

Share this post


Link to post

Just a quick update on a progress of my code:

 

If we use Pos before StringReplace, because actual string replacement is done rarely and not every time this method is called, we can squeeze a little more performance if we move StringReplace out of the current method:

procedure DoStringReplace(var aStr: string; const aOldStr, aNewStr: string);
begin
  aStr := StringReplace(aStr, aOldStr, aNewStr, [rfReplaceAll]);
end;

function PrepareStr(const aStr: string): string;
begin
  Result := aStr;
  ...
  if Pos(cInvalidSubStr, Result) > 0 then    
    DoStringReplace(Result, cInvalidSubStr, '');
end;

In this case the slow safe string handling stuff is done in DoStringReplace method. 

I guess this is the similar optimization trick that many use by moving rare error report message to another method, instead of using string concatenation in the method where error is raised.

 

We just need to make sure we don't make DoStringReplace inlined, then all performance gain is lost. It took me a while to find that out 😉

 

 

Share this post


Link to post
On 2/8/2021 at 2:24 PM, Mike Torrettinni said:

Just a quick update on a progress of my code:

 

If we use Pos before StringReplace, because actual string replacement is done rarely and not every time this method is called, we can squeeze a little more performance if we move StringReplace out of the current method:


procedure DoStringReplace(var aStr: string; const aOldStr, aNewStr: string);
begin
  aStr := StringReplace(aStr, aOldStr, aNewStr, [rfReplaceAll]);
end;

function PrepareStr(const aStr: string): string;
begin
  Result := aStr;
  ...
  if Pos(cInvalidSubStr, Result) > 0 then    
    DoStringReplace(Result, cInvalidSubStr, '');
end;

In this case the slow safe string handling stuff is done in DoStringReplace method. 

I guess this is the similar optimization trick that many use by moving rare error report message to another method, instead of using string concatenation in the method where error is raised.

 

We just need to make sure we don't make DoStringReplace inlined, then all performance gain is lost. It took me a while to find that out 😉

 

 

I really don't understant the interest of putting StringReplace out of the current method. Just note that there are a lot of StringReplace more efficient than the RTL one, and I think usi,g it  is the best way for better performances.

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this  

×