Jump to content
Mike Torrettinni

Multiple string replace - avoid calling StringReplace multiple times

Recommended Posts

I finally have a working version that is faster than calling StringReplace numerous times, but I hope there is a way to make it even faster.

 

So, the purpose of this ReplaceMultiStrings is avoid calling StringRepace multiple times, when we need to replace multiple sub strings, like:

s := StringReplace(s, '&lt;', '<');
s := StringReplace(s, '&gt;', '>');
s := StringReplace(s, '&amp', '&');

 

Here is description of how my function works:

1. Get all Positions of all substrings

2. Sort Positions, so we make replacements from 1..n one by one (if we don't sort and make replacements randomly within a string, it's a mess...)

3. Loop Positions and make replacements (copy each replacement and parts between replacements into single Result string)

 

I like the performance, but I was hoping that a single point of constructing final string will give much better performance. But of course the finding all Positions and sorting takes time.

 

Here are the example timings:

 

image.png.fe38612c3d9e58d69e599b82e9b15f56.png

 

With short strings, the performance is close to 50% faster, but with longer strings the difference is much smaller.

 

Of course there are some limitations if substrings are not unique or substrings are within other substrings. No Case options...and perhaps others.

 

Source code:

 

 

 

program Project1;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils,
  System.Generics.Collections,
  System.Generics.Defaults,
  System.Diagnostics;

type
  // Positions
  TSubStrPos    = record
    Pos   : integer;
    StrIdx: integer;
  end;

  // Test cases
  TTestCase = record
    Str: string;
    OldPatterns,
    NewPatterns: TArray<string>;
  end;
  TTestCases = TArray<TTestCase>;

  // Custom comparer initialized 1x only
  TCustomComparer = class(TInterfacedObject, IComparer<TSubStrPos>)
  public
    function Compare(const Left, Right: TSubStrPos): Integer;
  end;

const
  cMaxLoop = 1000000;
  cShortA  : string = 'abc &lt; 1 &gt; 2';
  cMediumA : string = 'medium &lt; text medium &gt; text medium &lt;&gt; text medium text &gt;';
  cLongA   : string = '&lt;Lorem ipsum sit &gt;amet, &lt;consectetur adipiscing. Integer sagittis dolor a aliquam. &gt;&lt; Vestibulum et purus in dolor consectetur tempor a eget est. Curabitur sit amet erat imperdiet, &lt;scelerisque &gt;suscipit nisl. &lt;Vestibulum velit.';

  cShortB  : string = 'testing and testing';
  cMediumB : string = 'testing and testing and another testing and yet another testing of the test';
  cLongB   : string = 'Lorem te ipsum ingdolor sit stamet, consectetur adipiingiscing elitesting. Morbi bibendum, mauris Vestibulum vulputate, mauris neque pretium terisus, maximus odio ipsum vel est. Integer sollicitudin massa sit amet semper condimentum. rutrumingte';

  cHTML: string =
    '<html><body>This is sample text for StringReplace test.'
    + 'Lorem &lt;ipsum dolor sit amet,&amp; consectetur adipiscing elit. Integer feugiat ullamcorper &amp;ligula, at &amp;vehicula turpis volutpat et. Praesent in sapien vel tortor pulvinar ullamcorper vel et felis. &gt;Praesent mattis efficitur ligula, sit amet'
    + 'venenatis felis. Mauris dui mauris, tempus in turpis at, eleifend tincidunt arcu. Morbi nec tincidunt felis, non viverra nisi. Nulla facilisi.&lt; Donec iaculis, elit &gt;non hendrerit vehicula, urna dui efficitur &quolectus, in tempus metus est'
    + 'sed sem. &lt;Vivamus eleifend euismod orci a volutpat. Ut sed ligula sed sem bibendum facilisis. Morbi gravida neque &lt;nec odio lacinia, at tincidunt &amp;nulla dignissim. Vestibulum vitae &gt;nisi accumsan, dapibus lorem non, &quoconvallis turpis.'
    + 'Pellentesque imp&amp;erdiet malesuada ipsum vel commodo. Nam dignissim&lt; luctus augue, nec maximus felis vehicula eget. Phasellus ultrices lacus &lt;ac elit dapibus &quopellentesque. Morbi vitae eros in turpis&gt; hendrerit volutpat. Cras euismod'
    + ' dolor in scelerisque. Donec &amp;id tincidunt nisi. Praesent ut nunc luctus, pulvinar &lt;mauris ut, hend&gt;rerit nulla. Praesent quis varius lorem.&amp;'
    + '</body></html>';

  cSpecialChars : string = 'Testing special char replacemenets: [Lorem te (ipsum) ing(((d)))olo/r -s+it st\\amet, co_!?nse?ctetur adipi-i!ng?isc/ing +(el_)i[_!?]testing.]';

var xComparer:  IComparer<TSubStrPos>;

function TCustomComparer.Compare(const Left, Right: TSubStrPos): Integer;
begin
  Result := Left.Pos - Right.Pos;
end;


procedure GetSubStrPositions7(const aStr: string; const aOldPatterns, aNewPatterns: array of string; var aPositions: TArray<TSubStrPos>; var aFinalLength: integer);
var vPos, i, vDiff, vIdx: integer;
begin
  aFinalLength := aStr.Length;
  SetLength(aPositions, 32);
  vIdx := 0;
  for i := Low(aOldPatterns) to High(aOldPatterns) do
  begin
    vPos := Pos(aOldPatterns, aStr);
    vDiff := aNewPatterns.Length - aOldPatterns.Length; // calculate diff of replacements, to adjust aFinalLength
    while vPos <> 0 do
    begin
      if vIdx > High(aPositions) then
        SetLength(aPositions, Length(aPositions) * 2);

      aPositions[vIdx].Pos := vPos;
      aPositions[vIdx].StrIdx := i;
      Inc(vIdx);
      aFinalLength := aFinalLength + vDiff;
      vPos := Pos(aOldPatterns, aStr, vPos + 1);
    end;
  end;
  SetLength(aPositions, vIdx);
end;

function ReplaceMultiStringsByPositions7(const aStr: string; const aPositions: TArray<TSubStrPos>;  const aOldPatterns, aNewPatterns: array of string; aResultLen: integer): string;
var i, vResPos, vStrPos: Integer;
begin
  SetLength(Result, aResultLen);

  // move substrings by index + between indexes, if any
  vResPos := 1; vStrPos := 1;
  for i := Low(aPositions) to High(aPositions) do
  begin
    // Copy free text between last Pos and current Pos
    if aPositions.Pos > vStrPos then
    begin
      Move(aStr[vStrPos], Result[vResPos], (aPositions.Pos - vStrPos)*SizeOf(Char));
      Inc(vResPos, aPositions.Pos - vStrPos);
    end;

    // copy New str
    Move(aNewPatterns[aPositions.StrIdx][1], Result[vResPos], aNewPatterns[aPositions.StrIdx].Length * SizeOf(Char));
    // move Pos by New str len
    Inc(vResPos, aNewPatterns[aPositions.StrIdx].Length);
    // move aStr Pos
    vStrPos := aPositions.Pos + aOldPatterns[aPositions.StrIdx].Length;
  end;

  // copy any left after last replacement
  if vStrPos < aStr.Length then
    Move(aStr[vStrPos], Result[vResPos], (aStr.Length - vStrPos + 1) * SizeOf(Char));
end;

procedure SortPositions(var aPositions: TArray<TSubStrPos>);
begin
  if xComparer = nil then
    xComparer  := TCustomComparer.Create; // initialize Comparer

  TArray.Sort<TSubStrPos>(aPositions, xComparer);
end;

function ReplaceMultiStrings7(const aStr: string; const aOldPatterns, aNewPatterns: array of string): string;
var vPositions: TArray<TSubStrPos>;
    vResultLen: integer;
begin
  // Get Positions for all string replacements
  GetSubStrPositions7(aStr, aOldPatterns, aNewPatterns, vPositions, vResultLen);

  // Sort Positions so the replacements go from first to last pos
  SortPositions(vPositions);

  // Replace by indexes
  Result := ReplaceMultiStringsByPositions7(aStr,  vPositions, aOldPatterns, aNewPatterns, vResultLen);
end;


{ Standard StringReplace used multiple times }
function StringReplaceAll(const aStr: string; const aOldPatterns, aNewPatterns: array of string): string;
var i: Integer;
begin
  Result := aStr;
  for i := Low(aOldPatterns) to High(aOldPatterns) do
    Result := StringReplace(Result, aOldPatterns, aNewPatterns, [rfReplaceAll]);
end;


{ Test cases }
procedure AddTestCase(var aTestCases: TTestCases; const aStr: string; const aOldPatterns, aNewPatterns: TArray<string>);
var vTestCase: TTestCase;
begin
  vTestCase.Str := aStr;
  vTestCase.OldPatterns := aOldPatterns;
  vTestCase.NewPatterns := aNewPatterns;

  aTestCases := aTestCases + [vTestCase];
end;

procedure RunTestCases(const aTestCases: TTestCases);
var i: Integer;
    s, s1: string;
begin
  for i := Low(aTestCases) to High(aTestCases) do
  begin
    s := StringReplaceAll(aTestCases.Str, aTestCases.OldPatterns, aTestCases.NewPatterns);
    s1 := ReplaceMultiStrings7(aTestCases.Str, aTestCases.OldPatterns, aTestCases.NewPatterns);
    if (s <> s1) then
      raise Exception.Create('Not equal results: ' + sLineBreak + s + sLineBreak + s1);
  end;
end;


procedure DoTheTiming(const aCaption, aStr: string; const aOldPatterns, aNewPatterns: array of string);
var vSW:TStopWatch;
    i: integer;
    str: string;
begin
  Writeln(aCaption);

  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := StringReplaceAll(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('StringReplaceAll:     ' + vSW.ElapsedMilliseconds.ToString);

  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := ReplaceMultiStrings7(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('ReplaceMultiStrings7: ' + vSW.ElapsedMilliseconds.ToString);
  Writeln;
end;

var vTestCases: TTestCases;

begin
   // make sure string replacement is correct!
  AddTestCase(vTestCases, cShortA, ['&lt;', '&gt;'], ['<', '>']);
  AddTestCase(vTestCases, cMediumA, ['&lt;', '&gt;'], ['<', '>']);
  AddTestCase(vTestCases, cLongA, ['&lt;', '&gt;'], ['<', '>']);
  AddTestCase(vTestCases, cShortB, ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  AddTestCase(vTestCases, cShortB, ['te'], ['AND']);
  AddTestCase(vTestCases, cShortB, ['and'], ['ab']);
  AddTestCase(vTestCases, cMediumB, ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  AddTestCase(vTestCases, cMediumB, ['te'], ['AND']);
  AddTestCase(vTestCases, cMediumB, ['and'], ['ab']);
  AddTestCase(vTestCases, cLongB, ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  AddTestCase(vTestCases, cLongB, ['te'], ['AND']);
  AddTestCase(vTestCases, cLongB, ['and'], ['ab']);
  AddTestCase(vTestCases, cHTML, ['&lt;', '&gt;', '&amp', '&quot;'], ['<', '>', '&', '"']);
  AddTestCase(vTestCases, cSpecialChars, ['\', '/', '!', '?', '[', ']', '_', '(', ')', '-', '+'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']);

  RunTestCases(vTestCases);

//  DoTheTiming('Short str A:',   cShortA,  ['&lt;', '&gt;'], ['<', '>']);
//  DoTheTiming('Medium str A:',  cMediumA, ['&lt;', '&gt;'], ['<', '>']);
//  DoTheTiming('Long str A:',    cLongA,   ['&lt;', '&gt;'], ['<', '>']);

  DoTheTiming('Short str B:',   cShortB,  ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  DoTheTiming('Medium str B:',  cMediumB, ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  DoTheTiming('Long str B:',    cLongB,   ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);

  DoTheTiming('HTML:',    cHTML, ['&lt;', '&gt;', '&amp', '&quot;'], ['<', '>', '&', '"']);
  DoTheTiming('Special Chars:',   cSpecialChars, ['\', '/', '!', '?', '[', ']', '_', '(', ')', '-', '+'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']);

  Writeln('done...');
  Readln;

end.

 
 

 

Any advice is appreciated!

 

 

Edited by Mike Torrettinni

Share this post


Link to post
6 hours ago, Mike Torrettinni said:

Here is description of how my function works:

1. Get all Positions of all substrings

2. Sort Positions, so we make replacements from 1..n one by one (if we don't sort and make replacements randomly within a string, it's a mess...)

3. Loop Positions and make replacements (copy each replacement and parts between replacements into single Result string)

 

Initial idea that multiple scans are bad for performance is good, but your steps are wrong.

 

You should scan the string for all substrings in one go, but you should also build new string while you are scanning. Using TStringBuilder or similar pattern with preallocated buffer would be appropriate.

 

And when I sad scanning string for substrings, don't use Pos function. Scan string one character at the time and match it with patterns you are looking.

Edited by Dalija Prasnikar
  • Thanks 1

Share this post


Link to post

A few notes:

- the position array will very likely be rather short - unless you going to replace words in a novel or something like that. Handwritten InsertionSort will outperform that.

- avoid dynamic allocation for the position array - use a static array with a reasonable standard size and only fall back to dynamically allocating if that is not enough

- after fixing that I get an improvement of around 1/3 over the code you posted and now SamplingProfiler tells me that the code stays in System.Pos for like 50% of the time - that cannot be right

 

Also you have a bug:

  AddTestCase(vTestCases, 'ReplaceReplaceString', ['Replace', 'FooString'], ['Foo', 'Bar']);
 

Edited by Stefan Glienke
  • Thanks 1

Share this post


Link to post
1 hour ago, Dalija Prasnikar said:

And when I sad scanning string for substrings, don't use Pos function. Scan string one character at the time and match it with patterns you are looking.

I tried that, for each char in string loop all substrings if any match. It was slower, perhaps because it was Pos is using asm.

Will have another look at this example and will provide a code.

 

2 hours ago, Stefan Glienke said:

- avoid dynamic allocation for the position array - use a static array with a reasonable standard size and only fall back to dynamically allocating if that is not enough

Initially I had static array [0..1000]  and stored all positions of substrings. But looping and finding if that position is > 0 proved to be bottleneck, even when I had MaxIndex (length of string) < 1000 and was looping from 0 to MaxIndex.

 

I tried many options and I could have made a  mistake with previous versions, but this one proved to be the only one faster, the rest were really fast on short and much slower on longer strings.

 

Will try InsertionSort.

 

 

 

Share this post


Link to post
2 hours ago, Stefan Glienke said:

Also you have a bug:

  AddTestCase(vTestCases, 'ReplaceReplaceString', ['Replace', 'FooString'], ['Foo', 'Bar']);

Not a bug, but a fact that StringReplace replaces substrings sequentially not all in one step, like my function. Multiple calls in StringReplace go like this:

1. s := StringReplace('ReplaceReplaceString', 'Replace', 'Foo'); -> s = 'FooFooString'

2. s := StringReplace(s, 'FooString', 'Bar'); -> s = 'FooBar'

 

my function:

s := ReplaceMultiStrings7('ReplaceReplaceString', ['Replace', 'FooString'], ['Foo', 'Bar']) ->no 'Foo' found in original string, so no replacement, except 'Replace' is replaced with 'Foo' -> s = 'FooFooString'

 

This test raises error because they are different results, but they are both correct results, just this is rare cases where both functions works correct and produce different results.

 

I mean, you can't expect to replace 'Foo' if there is no 'Foo' in original string.

 

Am I wrong in this thinking, does the explanation make sense or am I just trying to justify that my function works correctly?

 

 

Share this post


Link to post

Did you ever have a look at the old faststrings library? It was for ansi strings only, but they had a lot of good ideas. Plus, if you happen to be using ansi strings only for your application (or at least part of it) it is very fast.

  • Like 1

Share this post


Link to post
25 minutes ago, Dave Novo said:

Did you ever have a look at the old faststrings library? It was for ansi strings only, but they had a lot of good ideas. Plus, if you happen to be using ansi strings only for your application (or at least part of it) it is very fast.

I think Delphi's standard StringReplace is based on implementation from FastStrings, already, but there's a lot of tricks that don't apply to unicode anymore. But if you have specific idea I should look at, please let me know.

Share this post


Link to post

@Dalija Prasnikar I have 2 additional versions:

MultiStrReplace3 - which loops string char by char  and compares each substring. The problem here is that I don't precalculate final lenght of Result, so it's a guess. From results we can see is fastest for really short strings, but way slower with medium or long examples.

MultiStringReplaceSB - example using TStringBuilder and is slowest in all examples.

 

image.png.0d39c8689e33837f11f2be35efdfc8fd.png

 

 

Of course there is quite high possibility I have made a mistake and one of the new versions could be best.

 

@Stefan Glienke I implemented InsertionSort and you are correct, it is faster. If you have more details on how you implemented static array that would make it even faster, I would like to know since my implementation of static array didn't prove successful.

 

Updated source:

 

program Project1;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils,
  System.Generics.Collections,
  System.Generics.Defaults,
  System.Diagnostics;

type
  // Positions
  TSubStrPos    = record
    Pos   : integer;
    StrIdx: integer;
  end;

  // Test cases
  TTestCase = record
    Str: string;
    OldPatterns,
    NewPatterns: TArray<string>;
  end;
  TTestCases = TArray<TTestCase>;

  // Custom comparer initialized 1x only
  TCustomComparer = class(TInterfacedObject, IComparer<TSubStrPos>)
  public
    function Compare(const Left, Right: TSubStrPos): Integer;
  end;

const
  cMaxLoop = 1000000;
  cShortA  : string = 'abc &lt; 1 &gt; 2';
  cMediumA : string = 'medium &lt; text medium &gt; text medium &lt;&gt; text medium text &gt;';
  cLongA   : string = '&lt;Lorem ipsum sit &gt;amet, &lt;consectetur adipiscing. Integer sagittis dolor a aliquam. &gt;&lt; Vestibulum et purus in dolor consectetur tempor a eget est. Curabitur sit amet erat imperdiet, &lt;scelerisque &gt;suscipit nisl. &lt;Vestibulum velit.';

  cShortB  : string = 'testing and testing';
  cMediumB : string = 'testing and testing and another testing and yet another testing of the test';
  cLongB   : string = 'Lorem te ipsum ingdolor sit stamet, consectetur adipiingiscing elitesting. Morbi bibendum, mauris Vestibulum vulputate, mauris neque pretium terisus, maximus odio ipsum vel est. Integer sollicitudin massa sit amet semper condimentum. rutrumingte';

  cHTML: string =
    '<html><body>This is sample text for StringReplace test.'
    + 'Lorem &lt;ipsum dolor sit amet,&amp; consectetur adipiscing elit. Integer feugiat ullamcorper &amp;ligula, at &amp;vehicula turpis volutpat et. Praesent in sapien vel tortor pulvinar ullamcorper vel et felis. &gt;Praesent mattis efficitur ligula, sit amet'
    + 'venenatis felis. Mauris dui mauris, tempus in turpis at, eleifend tincidunt arcu. Morbi nec tincidunt felis, non viverra nisi. Nulla facilisi.&lt; Donec iaculis, elit &gt;non hendrerit vehicula, urna dui efficitur &quolectus, in tempus metus est'
    + 'sed sem. &lt;Vivamus eleifend euismod orci a volutpat. Ut sed ligula sed sem bibendum facilisis. Morbi gravida neque &lt;nec odio lacinia, at tincidunt &amp;nulla dignissim. Vestibulum vitae &gt;nisi accumsan, dapibus lorem non, &quoconvallis turpis.'
    + 'Pellentesque imp&amp;erdiet malesuada ipsum vel commodo. Nam dignissim&lt; luctus augue, nec maximus felis vehicula eget. Phasellus ultrices lacus &lt;ac elit dapibus &quopellentesque. Morbi vitae eros in turpis&gt; hendrerit volutpat. Cras euismod'
    + ' dolor in scelerisque. Donec &amp;id tincidunt nisi. Praesent ut nunc luctus, pulvinar &lt;mauris ut, hend&gt;rerit nulla. Praesent quis varius lorem.&amp;'
    + '</body></html>';

  cSpecialChars : string = 'Testing special char replacemenets: [Lorem te (ipsum) ing(((d)))olo/r -s+it st\\amet, co_!?nse?ctetur adipi-i!ng?isc/ing +(el_)i[_!?]testing.]';

var xComparer:  IComparer<TSubStrPos>;

function TCustomComparer.Compare(const Left, Right: TSubStrPos): Integer;
begin
  Result := Left.Pos - Right.Pos;
end;


procedure GetSubStrPositions7(const aStr: string; const aOldPatterns, aNewPatterns: array of string; var aPositions: TArray<TSubStrPos>; var aFinalLength: integer);
var vPos, i, vDiff, vIdx: integer;
begin
  aFinalLength := aStr.Length;
  SetLength(aPositions, 32);
  vIdx := 0;
  for i := Low(aOldPatterns) to High(aOldPatterns) do
  begin
    vPos := Pos(aOldPatterns, aStr);
    vDiff := aNewPatterns.Length - aOldPatterns.Length; // calculate diff of replacements, to adjust aFinalLength
    while vPos <> 0 do
    begin
      if vIdx > High(aPositions) then
        SetLength(aPositions, Length(aPositions) * 2);

      aPositions[vIdx].Pos := vPos;
      aPositions[vIdx].StrIdx := i;
      Inc(vIdx);
      aFinalLength := aFinalLength + vDiff;
      vPos := Pos(aOldPatterns, aStr, vPos + 1);
    end;
  end;
  SetLength(aPositions, vIdx);
end;

function ReplaceMultiStringsByPositions7(const aStr: string; const aPositions: TArray<TSubStrPos>;  const aOldPatterns, aNewPatterns: array of string; aResultLen: integer): string;
var i, vResPos, vStrPos: Integer;
begin
  SetLength(Result, aResultLen);

  // move substrings by index + between indexes, if any
  vResPos := 1; vStrPos := 1;
  for i := Low(aPositions) to High(aPositions) do
  begin
    // Copy free text between last Pos and current Pos
    if aPositions.Pos > vStrPos then
    begin
      Move(aStr[vStrPos], Result[vResPos], (aPositions.Pos - vStrPos)*SizeOf(Char));
      Inc(vResPos, aPositions.Pos - vStrPos);
    end;

    // copy New str
    Move(aNewPatterns[aPositions.StrIdx][1], Result[vResPos], aNewPatterns[aPositions.StrIdx].Length * SizeOf(Char));
    // move Pos by New str len
    Inc(vResPos, aNewPatterns[aPositions.StrIdx].Length);
    // move aStr Pos
    vStrPos := aPositions.Pos + aOldPatterns[aPositions.StrIdx].Length;
  end;

  // copy any left after last replacement
  if vStrPos < aStr.Length then
    Move(aStr[vStrPos], Result[vResPos], (aStr.Length - vStrPos + 1) * SizeOf(Char));
end;

procedure SortPositions(var aPositions: TArray<TSubStrPos>);
begin
  if xComparer = nil then
    xComparer  := TCustomComparer.Create; // initialize Comparer

  TArray.Sort<TSubStrPos>(aPositions, xComparer);
end;

procedure InsertionSort(var A: TArray<TSubStrPos>);
var i, j: Integer;
    tmp: TSubStrPos;
begin
  for i:= 1 to high(A) do
  begin
    j:= i;
    tmp := A;
    while (j > 0) and (A[j-1].Pos > tmp.Pos) do
    begin
      A[j]:= A[j-1];
      Dec(j);
    end;
    A[j]:= tmp;
  end;
end;

function ReplaceMultiStrings7(const aStr: string; const aOldPatterns, aNewPatterns: array of string): string;
var vPositions: TArray<TSubStrPos>;
    vResultLen: integer;
   // vPositions2: TArray<TSubStrPos>;
begin
  // Get Positions for all string replacements
  GetSubStrPositions7(aStr, aOldPatterns, aNewPatterns, vPositions, vResultLen);
  //GetSubStrPositions7(aStr, aOldPatterns, aNewPatterns, vPositions2, vResultLen);

  // Sort Positions so the replacements go from first to last pos
  //SortPositions(vPositions);
  InsertionSort(vPositions); // Insertion sort is faster in these situation because of small number of items to sort

  // Replace by indexes
  Result := ReplaceMultiStringsByPositions7(aStr,  vPositions, aOldPatterns, aNewPatterns, vResultLen);
end;


{ Standard StringReplace used multiple times }
function StringReplaceAll(const aStr: string; const aOldPatterns, aNewPatterns: array of string): string;
var i: Integer;
begin
  Result := aStr;
  for i := Low(aOldPatterns) to High(aOldPatterns) do
    Result := StringReplace(Result, aOldPatterns, aNewPatterns, [rfReplaceAll]);
end;

function MultiStrReplace3(const aStr: string; const aOldPatterns, aNewPatterns: array of string): string;
var
  vCharPos: integer;
  vStrLen: integer;
  i, j, vStrPos: integer;
  vMatched: boolean;
  vResPos, a, b: integer;
begin
  vStrLen := Length(aStr);
  if vStrLen > 0 then
  begin
    SetLength(Result, 2 * vStrLen); // pre-allocate Result
    vStrPos := 1;
    vCharPos := 1;
    vResPos := 1;
    a := Low(aOldPatterns);
    b := High(aOldPatterns);
    while vCharPos <= vStrLen do
    begin

      // find next match
      for i := a to b do
      begin
        vMatched := false;
        if (aStr[vCharPos] = aOldPatterns[1]) then
        begin
          vMatched := True;
          for j := 1 to aOldPatterns.Length - 1 do
          if aStr[vCharPos + j] <> aOldPatterns[j+1] then
          begin
            vMatched := false;
            Break;
          end;
        end;

        // replace matched string
        if vMatched then
        begin
          // add original string up to this point, if needed
          Move(aStr[vStrPos], Result[vResPos], (vCharPos - vStrPos)*SizeOf(Char));
          Inc(vResPos, vCharPos - vStrPos);

          // now add new string
          Move(aNewPatterns[1], Result[vResPos], (aNewPatterns.Length)*SizeOf(Char));
          Inc(vResPos, aNewPatterns.Length);

          // move pos past the matched string
          Inc(vCharPos, aOldPatterns.Length);
          vStrPos := vCharPos;
          Break;
        end;
      end;
      if not vMatched then
        inc(vCharPos)
      else
        vMatched := false;
    end;
    // add end of string
    Move(aStr[vStrPos], Result[vResPos], (aStr.Length - vStrPos +1)*SizeOf(Char));
    SetLength(Result, vResPos + (aStr.Length - vStrPos));
  end;
end;

function MultiStringReplaceSB(const aStr: string; const aOldPatterns, aNewPatterns: array of string): string;
var vSB: TStringBuilder;
  vCharPos: integer;
  vStrLen: integer;
  i, j, vStrPos: integer;
  vMatched: boolean;
   a, b: integer;
begin
  vStrLen := Length(aStr);
  if vStrLen > 0 then
  begin
    vSB := TStringBuilder.Create( 2 * Length(aStr));
    try
      vStrPos := 1;
      vCharPos := 1;
      a := Low(aOldPatterns);
      b := High(aOldPatterns);
      while vCharPos <= vStrLen do
      begin

        // find next match
        for i := a to b do
        begin
          vMatched := false;
          if (aStr[vCharPos] = aOldPatterns[1]) then
          begin
            vMatched := True;
            for j := 1 to aOldPatterns.Length - 1 do
            if aStr[vCharPos + j] <> aOldPatterns[j+1] then
            begin
              vMatched := false;
              Break;
            end;
          end;

          // replace matched string
          if vMatched then
          begin
            // add original string up to this point, if needed
            vSB.Append(Copy(aStr, vStrPos, vCharPos - vStrPos));

            // now add new string
            vSB.Append(aNewPatterns);

            // move pos past the matched string
            Inc(vCharPos, aOldPatterns.Length);
            vStrPos := vCharPos;
            Break;
          end;
        end;
        if not vMatched then
          inc(vCharPos)
        else
          vMatched := false;
      end;

      // add end of string
      vSB.Append(Copy(aStr, vStrPos, aStr.Length - vStrPos +1));
      Result := vSB.ToString;
    finally
      vSB.Free;
    end;
  end;
end;

{ Test cases }
procedure AddTestCase(var aTestCases: TTestCases; const aStr: string; const aOldPatterns, aNewPatterns: TArray<string>);
var vTestCase: TTestCase;
begin
  vTestCase.Str := aStr;
  vTestCase.OldPatterns := aOldPatterns;
  vTestCase.NewPatterns := aNewPatterns;

  aTestCases := aTestCases + [vTestCase];
end;

procedure RunTestCases(const aTestCases: TTestCases);
var i: Integer;
    s, s7, s3, sSB: string;
begin
  for i := Low(aTestCases) to High(aTestCases) do
  begin
    s := StringReplaceAll(aTestCases.Str, aTestCases.OldPatterns, aTestCases.NewPatterns);
    s7 := ReplaceMultiStrings7(aTestCases.Str, aTestCases.OldPatterns, aTestCases.NewPatterns);

    s3 := MultiStrReplace3(aTestCases.Str, aTestCases.OldPatterns, aTestCases.NewPatterns);
    sSB := MultiStringReplaceSB(aTestCases.Str, aTestCases.OldPatterns, aTestCases.NewPatterns);

    if (s <> s7) then
      raise Exception.Create('Not equal results: ' + sLineBreak + s + sLineBreak + s7);

    if (s <> s3) or (s <> sSB) then
      raise Exception.Create('Not equal results: ' + sLineBreak + s + sLineBreak + s3 + sLineBreak + sSB );
  end;
end;


procedure DoTheTiming(const aCaption, aStr: string; const aOldPatterns, aNewPatterns: array of string);
var vSW:TStopWatch;
    i: integer;
    str: string;
begin
  Writeln(aCaption);

  // Standard StringReplace
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := StringReplaceAll(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('StringReplaceAll:     ' + vSW.ElapsedMilliseconds.ToString);

  // ReplaceMultiStrings7
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := ReplaceMultiStrings7(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('ReplaceMultiStrings7: ' + vSW.ElapsedMilliseconds.ToString);

  // MultiStrReplace3
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := MultiStrReplace3(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('MultiStrReplace3:     ' + vSW.ElapsedMilliseconds.ToString);

  // MultiStringReplaceSB
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := MultiStringReplaceSB(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('MultiStringReplaceSB: ' + vSW.ElapsedMilliseconds.ToString);
  Writeln;
end;

var vTestCases: TTestCases;

begin
   // make sure string replacement is correct!
  AddTestCase(vTestCases, cShortA, ['&lt;', '&gt;'], ['<', '>']);
  AddTestCase(vTestCases, cMediumA, ['&lt;', '&gt;'], ['<', '>']);
  AddTestCase(vTestCases, cLongA, ['&lt;', '&gt;'], ['<', '>']);
  AddTestCase(vTestCases, cShortB, ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  AddTestCase(vTestCases, cShortB, ['te'], ['AND']);
  AddTestCase(vTestCases, cShortB, ['and'], ['ab']);
  AddTestCase(vTestCases, cMediumB, ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  AddTestCase(vTestCases, cMediumB, ['te'], ['AND']);
  AddTestCase(vTestCases, cMediumB, ['and'], ['ab']);
  AddTestCase(vTestCases, cLongB, ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  AddTestCase(vTestCases, cLongB, ['te'], ['AND']);
  AddTestCase(vTestCases, cLongB, ['and'], ['ab']);
  AddTestCase(vTestCases, cHTML, ['&lt;', '&gt;', '&amp', '&quot;'], ['<', '>', '&', '"']);
  AddTestCase(vTestCases, cSpecialChars, ['\', '/', '!', '?', '[', ']', '_', '(', ')', '-', '+'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']);

  // Examples of different outcome between calling StringReplace multiple times vs one-time replace all substrings
  //AddTestCase(vTestCases, 'ReplaceReplaceString', ['Replace', 'FooString'], ['Foo', 'Bar']);

  RunTestCases(vTestCases);

//  DoTheTiming('Short str A:',   cShortA,  ['&lt;', '&gt;'], ['<', '>']);
//  DoTheTiming('Medium str A:',  cMediumA, ['&lt;', '&gt;'], ['<', '>']);
//  DoTheTiming('Long str A:',    cLongA,   ['&lt;', '&gt;'], ['<', '>']);

  DoTheTiming('Short str B:',   cShortB,  ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  DoTheTiming('Medium str B:',  cMediumB, ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  DoTheTiming('Long str B:',    cLongB,   ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);

  DoTheTiming('HTML:',    cHTML, ['&lt;', '&gt;', '&amp', '&quot;'], ['<', '>', '&', '"']);
  DoTheTiming('Special Chars:',   cSpecialChars, ['\', '/', '!', '?', '[', ']', '_', '(', ')', '-', '+'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']);

  Writeln('done...');

  Readln;

end.

 

Edited by Mike Torrettinni

Share this post


Link to post

What is that?

Can you omit the for-to index in 10.4 or is there a StringReplace which accepts arrays?

 

 for i := Low(aOldPatterns) to High(aOldPatterns) do
    Result := StringReplace(Result, aOldPatterns, aNewPatterns, [rfReplaceAll]);

 

 

 

 

Edited by Attila Kovacs

Share this post


Link to post
Guest

what about my sample?

 

image.thumb.png.4868b16c12cf2193cf9f0bb5b9acf770.png

 

implementation

{$R *.dfm}

uses
  System.StrUtils;

const
  cShortA: string  = 'abc &lt; 1 &gt; 2';
  cMediumA: string = 'medium &lt; text medium &gt; text medium &lt;&gt; text medium text &gt;';
  cLongA: string   =
    '&lt;Lorem ipsum sit &gt;amet, &lt;consectetur adipiscing. Integer sagittis dolor a aliquam. &gt;&lt; Vestibulum et purus in dolor consectetur tempor a eget est. Curabitur sit amet erat imperdiet, &lt;scelerisque &gt;suscipit nisl. &lt;Vestibulum velit.';

  cShortB: string  = 'testing and testing';
  cMediumB: string = 'testing and testing and another testing and yet another testing of the test';
  cLongB: string   =
    'Lorem te ipsum ingdolor sit stamet, consectetur adipiingiscing elitesting. Morbi bibendum, mauris Vestibulum vulputate, mauris neque pretium terisus, maximus odio ipsum vel est. Integer sollicitudin massa sit amet semper condimentum. rutrumingte';

  cHTML: string = '<html><body>This is sample text for StringReplace test.' +
    'Lorem &lt;ipsum dolor sit amet,&amp; consectetur adipiscing elit. Integer feugiat ullamcorper &amp;ligula, at &amp;vehicula turpis volutpat et. Praesent in sapien vel tortor pulvinar ullamcorper vel et felis. &gt;Praesent mattis efficitur ligula, sit amet'
    + 'venenatis felis. Mauris dui mauris, tempus in turpis at, eleifend tincidunt arcu. Morbi nec tincidunt felis, non viverra nisi. Nulla facilisi.&lt; Donec iaculis, elit &gt;non hendrerit vehicula, urna dui efficitur &quolectus, in tempus metus est'
    + 'sed sem. &lt;Vivamus eleifend euismod orci a volutpat. Ut sed ligula sed sem bibendum facilisis. Morbi gravida neque &lt;nec odio lacinia, at tincidunt &amp;nulla dignissim. Vestibulum vitae &gt;nisi accumsan, dapibus lorem non, &quoconvallis turpis.'
    + 'Pellentesque imp&amp;erdiet malesuada ipsum vel commodo. Nam dignissim&lt; luctus augue, nec maximus felis vehicula eget. Phasellus ultrices lacus &lt;ac elit dapibus &quopellentesque. Morbi vitae eros in turpis&gt; hendrerit volutpat. Cras euismod'
    + ' dolor in scelerisque. Donec &amp;id tincidunt nisi. Praesent ut nunc luctus, pulvinar &lt;mauris ut, hend&gt;rerit nulla. Praesent quis varius lorem.&amp;' + '</body></html>';

  cSpecialChars: string = 'Testing special char replacemenets: [Lorem te (ipsum) ing(((d)))olo/r -s+it st\\amet, co_!?nse?ctetur adipi-i!ng?isc/ing +(el_)i[_!?]testing.]';

  //
var
  lMyArrayString2: TArray<string>;
  //
  lMyDelimiter: string = ' '; // Alt+255 ... old D.O.S. scholl.

  //
procedure TForm1.btnGoJohnnyGoClick(Sender: TObject);
var
  lMyArrayString: TArray<string>;
  lMyStringS    : string;
  lOldString    : TArray<string>;
  lNewString    : TArray<string>;
  lTmpString    : string;
  lIndexStart   : integer;
  i             : integer;
begin
  // just for not use random values...
  lTmpString     := '';
  lIndexStart    := 0;
  i              := 0;
  lMyStringS     := '';
  lOldString     := [];
  lNewString     := [];
  lMyArrayString := [];
  //
  // aggregating all strings... for one step!!!
  lMyStringS := cShortA + lMyDelimiter;
  lMyStringS := lMyStringS + cMediumA + lMyDelimiter;
  lMyStringS := lMyStringS + cLongA + lMyDelimiter;
  lMyStringS := lMyStringS + cShortB + lMyDelimiter;
  lMyStringS := lMyStringS + cMediumB + lMyDelimiter;
  lMyStringS := lMyStringS + cLongB + lMyDelimiter;
  lMyStringS := lMyStringS + cHTML + lMyDelimiter;
  lMyStringS := lMyStringS + cSpecialChars + lMyDelimiter;
  //
  lOldString := ['&lt', '&gt']; // it should be equals items-array => lOldString AND lSub
  lNewString := ['&@@', '&@@']; // ... above note:
  //
  Memo1.Lines.Clear;
  //
  if high(lOldString) <> high(lNewString) then
  begin
    ShowMessage('High(lOldString)<>high(lNewString)');
    exit;
  end;
  //
  Memo1.Lines.Add(lMyStringS);
  Memo1.Lines.Add('---------------');
  //

  //
  // for lTmpString in lOldString do
  for i := low(lOldString) to high(lOldString) do
  begin
    while true do
    begin
      lIndexStart := Pos(lOldString[i], lMyStringS, lIndexStart + 1);
      //
      if (lIndexStart > 0) then
      begin
        // Memo1.Lines.Add(lOldString[i] + ' - ' + lIndexStart.ToString); // debugging...
        //
        lTmpString := lNewString[i]; // just for "CopyTo" use!!!
        //
        lTmpString.CopyTo(0, lMyStringS[lIndexStart], 0, lTmpString.Length);
        //
        // Memo1.Lines.Add('------> ' + lTmpString);
        //
        lIndexStart := lIndexStart + Length(lOldString[i]);
      end
      else
        break;
    end;
  end;
  //
  //
  Memo1.Lines.Add(lMyStringS);
  Memo1.Lines.Add('---------------');
  //
  // if need separate each "string... - in fact, would be "each var"..."
  // --------------------------------------------------------------------
  lMyArrayString := lMyStringS.Split(lMyDelimiter); // recreate the "arrays" if needs separate each "line"
  Memo1.Lines.AddStrings(lMyArrayString);

end;

end.

NOTE: using "POS()" because Im not have other better here...

 

hug

Edited by Guest

Share this post


Link to post

 

6 hours ago, emailx45 said:

lTmpString.CopyTo(0, lMyStringS[lIndexStart], 0, lTmpString.Length);

I didn't know about CopyTo, thanks! Good example, but fails when old and new substrings are of different length, right?

Share this post


Link to post
8 hours ago, Mike Torrettinni said:

Updated source

With Delphi Rio i am not able to build your example. Trying to fix it did not worked well.

Share this post


Link to post
1 minute ago, KodeZwerg said:

With Delphi Rio i am not able to build your example. Trying to fix it did not worked well.

I use 10.2.3, so I can't use anything 'newer' than 10.3. Where does it fail?

Share this post


Link to post
4 minutes ago, Mike Torrettinni said:

I use 10.2.3, so I can't use anything 'newer' than 10.3. Where does it fail?

Pictures say more than a thousand words, so I append some 😉

 

Screenshot - 21_03.png

Screenshot - 21_03 002.png

  • Thanks 1

Share this post


Link to post

Seems that when code is in SPOILER tags could be copied wrong.

 

Here is latest source:

 

program Project1;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils,
  System.Generics.Collections,
  System.Generics.Defaults,
  System.Diagnostics;

type
  // Positions
  TSubStrPos    = record
    Pos   : integer;
    StrIdx: integer;
  end;

  // Test cases
  TTestCase = record
    Str: string;
    OldPatterns,
    NewPatterns: TArray<string>;
  end;
  TTestCases = TArray<TTestCase>;

  // Custom comparer initialized 1x only
  TCustomComparer = class(TInterfacedObject, IComparer<TSubStrPos>)
  public
    function Compare(const Left, Right: TSubStrPos): Integer;
  end;

const
  cMaxLoop = 1000000;
  cShortA  : string = 'abc &lt; 1 &gt; 2';
  cMediumA : string = 'medium &lt; text medium &gt; text medium &lt;&gt; text medium text &gt;';
  cLongA   : string = '&lt;Lorem ipsum sit &gt;amet, &lt;consectetur adipiscing. Integer sagittis dolor a aliquam. &gt;&lt; Vestibulum et purus in dolor consectetur tempor a eget est. Curabitur sit amet erat imperdiet, &lt;scelerisque &gt;suscipit nisl. &lt;Vestibulum velit.';

  cShortB  : string = 'testing and testing';
  cMediumB : string = 'testing and testing and another testing and yet another testing of the test';
  cLongB   : string = 'Lorem te ipsum ingdolor sit stamet, consectetur adipiingiscing elitesting. Morbi bibendum, mauris Vestibulum vulputate, mauris neque pretium terisus, maximus odio ipsum vel est. Integer sollicitudin massa sit amet semper condimentum. rutrumingte';

  cHTML: string =
    '<html><body>This is sample text for StringReplace test.'
    + 'Lorem &lt;ipsum dolor sit amet,&amp; consectetur adipiscing elit. Integer feugiat ullamcorper &amp;ligula, at &amp;vehicula turpis volutpat et. Praesent in sapien vel tortor pulvinar ullamcorper vel et felis. &gt;Praesent mattis efficitur ligula, sit amet'
    + 'venenatis felis. Mauris dui mauris, tempus in turpis at, eleifend tincidunt arcu. Morbi nec tincidunt felis, non viverra nisi. Nulla facilisi.&lt; Donec iaculis, elit &gt;non hendrerit vehicula, urna dui efficitur &quolectus, in tempus metus est'
    + 'sed sem. &lt;Vivamus eleifend euismod orci a volutpat. Ut sed ligula sed sem bibendum facilisis. Morbi gravida neque &lt;nec odio lacinia, at tincidunt &amp;nulla dignissim. Vestibulum vitae &gt;nisi accumsan, dapibus lorem non, &quoconvallis turpis.'
    + 'Pellentesque imp&amp;erdiet malesuada ipsum vel commodo. Nam dignissim&lt; luctus augue, nec maximus felis vehicula eget. Phasellus ultrices lacus &lt;ac elit dapibus &quopellentesque. Morbi vitae eros in turpis&gt; hendrerit volutpat. Cras euismod'
    + ' dolor in scelerisque. Donec &amp;id tincidunt nisi. Praesent ut nunc luctus, pulvinar &lt;mauris ut, hend&gt;rerit nulla. Praesent quis varius lorem.&amp;'
    + '</body></html>';

  cSpecialChars : string = 'Testing special char replacemenets: [Lorem te (ipsum) ing(((d)))olo/r -s+it st\\amet, co_!?nse?ctetur adipi-i!ng?isc/ing +(el_)i[_!?]testing.]';

var xComparer:  IComparer<TSubStrPos>;

function TCustomComparer.Compare(const Left, Right: TSubStrPos): Integer;
begin
  Result := Left.Pos - Right.Pos;
end;


procedure GetSubStrPositions7(const aStr: string; const aOldPatterns, aNewPatterns: array of string; var aPositions: TArray<TSubStrPos>; var aFinalLength: integer);
var vPos, i, vDiff, vIdx: integer;
begin
  aFinalLength := aStr.Length;
  SetLength(aPositions, 32);
  vIdx := 0;
  for i := Low(aOldPatterns) to High(aOldPatterns) do
  begin
    vPos := Pos(aOldPatterns[i], aStr);
    vDiff := aNewPatterns[i].Length - aOldPatterns[i].Length; // calculate diff of replacements, to adjust aFinalLength
    while vPos <> 0 do
    begin
      if vIdx > High(aPositions) then
        SetLength(aPositions, Length(aPositions) * 2);

      aPositions[vIdx].Pos := vPos;
      aPositions[vIdx].StrIdx := i;
      Inc(vIdx);
      aFinalLength := aFinalLength + vDiff;
      vPos := Pos(aOldPatterns[i], aStr, vPos + 1);
    end;
  end;
  SetLength(aPositions, vIdx);
end;

function ReplaceMultiStringsByPositions7(const aStr: string; const aPositions: TArray<TSubStrPos>;  const aOldPatterns, aNewPatterns: array of string; aResultLen: integer): string;
var i, vResPos, vStrPos: Integer;
begin
  SetLength(Result, aResultLen);

  // move substrings by index + between indexes, if any
  vResPos := 1; vStrPos := 1;
  for i := Low(aPositions) to High(aPositions) do
  begin
    // Copy free text between last Pos and current Pos
    if aPositions[i].Pos > vStrPos then
    begin
      Move(aStr[vStrPos], Result[vResPos], (aPositions[i].Pos - vStrPos)*SizeOf(Char));
      Inc(vResPos, aPositions[i].Pos - vStrPos);
    end;

    // copy New str
    Move(aNewPatterns[aPositions[i].StrIdx][1], Result[vResPos], aNewPatterns[aPositions[i].StrIdx].Length * SizeOf(Char));
    // move Pos by New str len
    Inc(vResPos, aNewPatterns[aPositions[i].StrIdx].Length);
    // move aStr Pos
    vStrPos := aPositions[i].Pos + aOldPatterns[aPositions[i].StrIdx].Length;
  end;

  // copy any left after last replacement
  if vStrPos < aStr.Length then
    Move(aStr[vStrPos], Result[vResPos], (aStr.Length - vStrPos + 1) * SizeOf(Char));
end;

procedure SortPositions(var aPositions: TArray<TSubStrPos>);
begin
  if xComparer = nil then
    xComparer  := TCustomComparer.Create; // initialize Comparer

  TArray.Sort<TSubStrPos>(aPositions, xComparer);
end;

procedure InsertionSort(var A: TArray<TSubStrPos>);
var i, j: Integer;
    tmp: TSubStrPos;
begin
  for i:= 1 to high(A) do
  begin
    j:= i;
    tmp := A[i];
    while (j > 0) and (A[j-1].Pos > tmp.Pos) do
    begin
      A[j]:= A[j-1];
      Dec(j);
    end;
    A[j]:= tmp;
  end;
end;

function ReplaceMultiStrings7(const aStr: string; const aOldPatterns, aNewPatterns: array of string): string;
var vPositions: TArray<TSubStrPos>;
    vResultLen: integer;
   // vPositions2: TArray<TSubStrPos>;
begin
  // Get Positions for all string replacements
  GetSubStrPositions7(aStr, aOldPatterns, aNewPatterns, vPositions, vResultLen);
  //GetSubStrPositions7(aStr, aOldPatterns, aNewPatterns, vPositions2, vResultLen);

  // Sort Positions so the replacements go from first to last pos
  //SortPositions(vPositions);
  InsertionSort(vPositions); // Insertion sort is faster in these situation because of small number of items to sort

  // Replace by indexes
  Result := ReplaceMultiStringsByPositions7(aStr,  vPositions, aOldPatterns, aNewPatterns, vResultLen);
end;


{ Standard StringReplace used multiple times }
function StringReplaceAll(const aStr: string; const aOldPatterns, aNewPatterns: array of string): string;
var i: Integer;
begin
  Result := aStr;
  for i := Low(aOldPatterns) to High(aOldPatterns) do
    Result := StringReplace(Result, aOldPatterns[i], aNewPatterns[i], [rfReplaceAll]);
end;

function MultiStrReplace3(const aStr: string; const aOldPatterns, aNewPatterns: array of string): string;
var
  vCharPos: integer;
  vStrLen: integer;
  i, j, vStrPos: integer;
  vMatched: boolean;
  vResPos, a, b: integer;
begin
  vStrLen := Length(aStr);
  if vStrLen > 0 then
  begin
    SetLength(Result, 2 * vStrLen); // pre-allocate Result
    vStrPos := 1;
    vCharPos := 1;
    vResPos := 1;
    a := Low(aOldPatterns);
    b := High(aOldPatterns);
    while vCharPos <= vStrLen do
    begin

      // find next match
      for i := a to b do
      begin
        vMatched := false;
        if (aStr[vCharPos] = aOldPatterns[i][1]) then
        begin
          vMatched := True;
          for j := 1 to aOldPatterns[i].Length - 1 do
          if aStr[vCharPos + j] <> aOldPatterns[i][j+1] then
          begin
            vMatched := false;
            Break;
          end;
        end;

        // replace matched string
        if vMatched then
        begin
          // add original string up to this point, if needed
          Move(aStr[vStrPos], Result[vResPos], (vCharPos - vStrPos)*SizeOf(Char));
          Inc(vResPos, vCharPos - vStrPos);

          // now add new string
          Move(aNewPatterns[i][1], Result[vResPos], (aNewPatterns[i].Length)*SizeOf(Char));
          Inc(vResPos, aNewPatterns[i].Length);

          // move pos past the matched string
          Inc(vCharPos, aOldPatterns[i].Length);
          vStrPos := vCharPos;
          Break;
        end;
      end;
      if not vMatched then
        inc(vCharPos)
      else
        vMatched := false;
    end;
    // add end of string
    Move(aStr[vStrPos], Result[vResPos], (aStr.Length - vStrPos +1)*SizeOf(Char));
    SetLength(Result, vResPos + (aStr.Length - vStrPos));
  end;
end;

function MultiStringReplaceSB(const aStr: string; const aOldPatterns, aNewPatterns: array of string): string;
var vSB: TStringBuilder;
  vCharPos: integer;
  vStrLen: integer;
  i, j, vStrPos: integer;
  vMatched: boolean;
   a, b: integer;
begin
  vStrLen := Length(aStr);
  if vStrLen > 0 then
  begin
    vSB := TStringBuilder.Create( 2 * Length(aStr));
    try
      vStrPos := 1;
      vCharPos := 1;
      a := Low(aOldPatterns);
      b := High(aOldPatterns);
      while vCharPos <= vStrLen do
      begin

        // find next match
        for i := a to b do
        begin
          vMatched := false;
          if (aStr[vCharPos] = aOldPatterns[i][1]) then
          begin
            vMatched := True;
            for j := 1 to aOldPatterns[i].Length - 1 do
            if aStr[vCharPos + j] <> aOldPatterns[i][j+1] then
            begin
              vMatched := false;
              Break;
            end;
          end;

          // replace matched string
          if vMatched then
          begin
            // add original string up to this point, if needed
            vSB.Append(Copy(aStr, vStrPos, vCharPos - vStrPos));

            // now add new string
            vSB.Append(aNewPatterns[i]);

            // move pos past the matched string
            Inc(vCharPos, aOldPatterns[i].Length);
            vStrPos := vCharPos;
            Break;
          end;
        end;
        if not vMatched then
          inc(vCharPos)
        else
          vMatched := false;
      end;

      // add end of string
      vSB.Append(Copy(aStr, vStrPos, aStr.Length - vStrPos +1));
      Result := vSB.ToString;
    finally
      vSB.Free;
    end;
  end;
end;

{ Test cases }
procedure AddTestCase(var aTestCases: TTestCases; const aStr: string; const aOldPatterns, aNewPatterns: TArray<string>);
var vTestCase: TTestCase;
begin
  vTestCase.Str := aStr;
  vTestCase.OldPatterns := aOldPatterns;
  vTestCase.NewPatterns := aNewPatterns;

  aTestCases := aTestCases + [vTestCase];
end;

procedure RunTestCases(const aTestCases: TTestCases);
var i: Integer;
    s, s7, s3, sSB: string;
begin
  for i := Low(aTestCases) to High(aTestCases) do
  begin
    s := StringReplaceAll(aTestCases[i].Str, aTestCases[i].OldPatterns, aTestCases[i].NewPatterns);
    s7 := ReplaceMultiStrings7(aTestCases[i].Str, aTestCases[i].OldPatterns, aTestCases[i].NewPatterns);

    s3 := MultiStrReplace3(aTestCases[i].Str, aTestCases[i].OldPatterns, aTestCases[i].NewPatterns);
    sSB := MultiStringReplaceSB(aTestCases[i].Str, aTestCases[i].OldPatterns, aTestCases[i].NewPatterns);

      if (s <> s7) then
      raise Exception.Create('Not equal results: ' + sLineBreak + s + sLineBreak + s7);

    if (s <> s3) or (s <> sSB) then
      raise Exception.Create('Not equal results: ' + sLineBreak + s + sLineBreak + s3 + sLineBreak + sSB );

  end;
end;


procedure DoTheTiming(const aCaption, aStr: string; const aOldPatterns, aNewPatterns: array of string);
var vSW:TStopWatch;
    i: integer;
    str: string;
begin
  Writeln(aCaption);

  // Standard StringReplace
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := StringReplaceAll(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('StringReplaceAll:     ' + vSW.ElapsedMilliseconds.ToString);

  // ReplaceMultiStrings7
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := ReplaceMultiStrings7(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('ReplaceMultiStrings7: ' + vSW.ElapsedMilliseconds.ToString);

  // MultiStrReplace3
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := MultiStrReplace3(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('MultiStrReplace3:     ' + vSW.ElapsedMilliseconds.ToString);

  // MultiStringReplaceSB
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := MultiStringReplaceSB(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('MultiStringReplaceSB: ' + vSW.ElapsedMilliseconds.ToString);

  Writeln;
end;

var vTestCases: TTestCases;

begin
   // make sure string replacement is correct!
  AddTestCase(vTestCases, cShortA, ['&lt;', '&gt;'], ['<', '>']);
  AddTestCase(vTestCases, cMediumA, ['&lt;', '&gt;'], ['<', '>']);
  AddTestCase(vTestCases, cLongA, ['&lt;', '&gt;'], ['<', '>']);
  AddTestCase(vTestCases, cShortB, ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  AddTestCase(vTestCases, cShortB, ['te'], ['AND']);
  AddTestCase(vTestCases, cShortB, ['and'], ['ab']);
  AddTestCase(vTestCases, cMediumB, ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  AddTestCase(vTestCases, cMediumB, ['te'], ['AND']);
  AddTestCase(vTestCases, cMediumB, ['and'], ['ab']);
  AddTestCase(vTestCases, cLongB, ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  AddTestCase(vTestCases, cLongB, ['te'], ['AND']);
  AddTestCase(vTestCases, cLongB, ['and'], ['ab']);
  AddTestCase(vTestCases, cHTML, ['&lt;', '&gt;', '&amp', '&quot;'], ['<', '>', '&', '"']);
  AddTestCase(vTestCases, cSpecialChars, ['\', '/', '!', '?', '[', ']', '_', '(', ')', '-', '+'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']);

  // Examples of different outcome between calling StringReplace multiple times vs one-time replace all substrings
  //AddTestCase(vTestCases, 'ReplaceReplaceString', ['Replace', 'FooString'], ['Foo', 'Bar']);

  RunTestCases(vTestCases);

//  DoTheTiming('Short str A:',   cShortA,  ['&lt;', '&gt;'], ['<', '>']);
//  DoTheTiming('Medium str A:',  cMediumA, ['&lt;', '&gt;'], ['<', '>']);
//  DoTheTiming('Long str A:',    cLongA,   ['&lt;', '&gt;'], ['<', '>']);

  DoTheTiming('Short str B:',   cShortB,  ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  DoTheTiming('Medium str B:',  cMediumB, ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);
  DoTheTiming('Long str B:',    cLongB,   ['te', 'st', 'ing', 'and'], ['ab', 'cd', 'ef', 'AND']);

  DoTheTiming('HTML:',    cHTML, ['&lt;', '&gt;', '&amp', '&quot;'], ['<', '>', '&', '"']);
  DoTheTiming('Special Chars:',   cSpecialChars, ['\', '/', '!', '?', '[', ']', '_', '(', ')', '-', '+'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']);

  Writeln('done...');
  Readln;

end.

 

Edited by Mike Torrettinni
  • Thanks 1

Share this post


Link to post
2 minutes ago, Mike Torrettinni said:

Seems that when code is in SPOILER tags could be copied wrong.

Yes! That one works cool! Will now checkout and try add own method!

Thank you!

  • Thanks 1

Share this post


Link to post
12 hours ago, Mike Torrettinni said:

I tried that, for each char in string loop all substrings if any match. It was slower, perhaps because it was Pos is using asm.

Possible. 

Share this post


Link to post
8 hours ago, Mike Torrettinni said:

@Dalija PrasnikarMultiStringReplaceSB - example using TStringBuilder and is slowest in all examples.

It seems that TStringBuilder is not quite as fast as it could be. I didn't gave it closer look, so I cannot say whether the issue is creating and destroying TStringBuilder or other TStringBuilder code. 

Share this post


Link to post
45 minutes ago, Dalija Prasnikar said:

It seems that TStringBuilder is not quite as fast as it could be. I didn't gave it closer look, so I cannot say whether the issue is creating and destroying TStringBuilder or other TStringBuilder code. 

I didn't initially test TStringBuilder because I don't have any experience with it and also I remember reading that it was very good solution in early versions of Delphi, then they optimized everything else and it became not so performant in latest versions.

But if it would be faster or the fastest, I would use it!

 

I guess I forgot the point I wanted to make: because of limited experience with TStringBuilder, there's a high chance I don't use it correctly or most efficiently.

Edited by Mike Torrettinni

Share this post


Link to post
34 minutes ago, Mike Torrettinni said:

I didn't initially test TStringBuilder because I don't have any experience with it and also I remember reading that it was very good solution in early versions of Delphi, then they optimized everything else and it became not so performant in latest versions.

But if it would be faster or the fastest, I would use it!

Delphi strings are pretty good as far as performance is concerned, TStringBuilder might have advantage when you need to handle longer strings and where benefits of preallocating buffer will outweigh the cost of the builder allocation, deallocation and try...finally block.

I use TStringBuilder in few places, where it simplifies code, and I never measured the speed because it was not important in there.

 

If you want to knock yourself out with micro-optimizations, you should definitely take a look at Knuth-Morris-Pratt algorithm, especially if you are matching same patterns over and over again https://en.wikipedia.org/wiki/Knuth–Morris–Pratt_algorithm

 

  • Thanks 1

Share this post


Link to post

One of the big issues with heap allocation comes when you have multi threaded code. If your benchmark doesn't test that and you do have multi threaded code, then your benchmark is likely not very useful. 

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×