Jump to content
Mike Torrettinni

Multiple string replace - avoid calling StringReplace multiple times

Recommended Posts

moderator please delete, I do thank you!556640156_Screenshot-21_03003.thumb.png.36e3b9586539524e5c08e8dfa01a85a5.png

 

Edited by KodeZwerg

Share this post


Link to post
6 minutes ago, KodeZwerg said:

Please verify if I made everything correct. If so, new method beats all

Thank you, but it doesn't pass the RunTestCases. Just include the check for new functions, like:

 

var sHim: string;
...
sHim := Him_StringReplace(aTestCases[i].Str, aTestCases[i].OldPatterns, aTestCases[i].NewPatterns);
...
if s <> sHim then      
  raise Exception.Create('Not equal results: ' + sLineBreak + s + sLineBreak + sHim);

 

It seems it only replaces first strings.

Share this post


Link to post

Not faster, just tested.

 

https://github.com/frones/ACBr/blob/master/Fontes/Terceiros/FastStringReplace/StrUtilsEx.pas

Included as:

uses StrUtilsEx;

...

function StringReplaceExAll(const aStr: string; const aOldPatterns, aNewPatterns: array of string): string;
var i: Integer;
begin
  Result := aStr;
  for i := Low(aOldPatterns) to High(aOldPatterns) do
    Result := FastStringReplace(Result, aOldPatterns[i], aNewPatterns[i], [rfReplaceAll]);
end;

procedure RunTestCases(const aTestCases: TTestCases);
var i: Integer;
    s, s7, s3, sSB, sEx: string;
begin
  for i := Low(aTestCases) to High(aTestCases) do
  begin
    s := StringReplaceAll(aTestCases[i].Str, aTestCases[i].OldPatterns, aTestCases[i].NewPatterns);
    s7 := ReplaceMultiStrings7(aTestCases[i].Str, aTestCases[i].OldPatterns, aTestCases[i].NewPatterns);

    s3 := MultiStrReplace3(aTestCases[i].Str, aTestCases[i].OldPatterns, aTestCases[i].NewPatterns);
    sSB := MultiStringReplaceSB(aTestCases[i].Str, aTestCases[i].OldPatterns, aTestCases[i].NewPatterns);

    sEx := StringReplaceExAll(aTestCases[i].Str, aTestCases[i].OldPatterns, aTestCases[i].NewPatterns);

      if (s <> s7) then
      raise Exception.Create('Not equal results: ' + sLineBreak + s + sLineBreak + s7);

    if (s <> s3) or (s <> sSB) then
      raise Exception.Create('Not equal results: ' + sLineBreak + s + sLineBreak + s3 + sLineBreak + sSB );

    if (s <> sEx) then
      raise Exception.Create('Not equal results: ' + sLineBreak + s + sLineBreak + sEx);

  end;
end;


procedure DoTheTiming(const aCaption, aStr: string; const aOldPatterns, aNewPatterns: array of string);
var vSW:TStopWatch;
    i: integer;
    str: string;
begin
  Writeln(aCaption);

  // Standard StringReplace
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := StringReplaceAll(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('StringReplaceAll:     ' + vSW.ElapsedMilliseconds.ToString);

  // ReplaceMultiStrings7
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := ReplaceMultiStrings7(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('ReplaceMultiStrings7: ' + vSW.ElapsedMilliseconds.ToString);

  // MultiStrReplace3
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := MultiStrReplace3(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('MultiStrReplace3:     ' + vSW.ElapsedMilliseconds.ToString);

  // MultiStringReplaceSB
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := MultiStringReplaceSB(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('MultiStringReplaceSB: ' + vSW.ElapsedMilliseconds.ToString);

  // StringReplaceExAll
  vSW := TStopWatch.StartNew;
  for i := 1 to cMaxLoop do
  begin
    str := '';
    str := StringReplaceExAll(aStr, aOldPatterns, aNewPatterns);
  end;
  Writeln('StringReplaceExAll:   ' + vSW.ElapsedMilliseconds.ToString);

  Writeln;
end;

With following Results

381818575_Screenshot-21_03.thumb.png.31c5e09e76b6365cb2803a7fc6ac58a6.png

Continue the journey....

  • Thanks 1

Share this post


Link to post
Guest

hi @Mike Torrettinni 

about new string size, you can use "difference size between old--new", the xxx.Legenth will be the major-values. you see?

 

hug

Edited by Guest

Share this post


Link to post
22 minutes ago, emailx45 said:

hi @Mike Torrettinni 

about new string size, you can use "difference size between old--new", the xxx.Legenth will be the major-values. you see?

 

hug

Correct, but this is only for methods where you do one-pass over string first and count substrings, otherwise you don't know how many substrings there are, right?

Share this post


Link to post
Guest

no!

see my sample verify if the oldstr is founded until the end.

for general use, you have use "looping" as I did in simple case using "true until the end".

as do it any other procedure that needs verify many occurrences. if necessary, you add or delete chars...

if 3 > 2 then delete chars

if 2 > 3 then add chars

 

see CopyTo() internal definition ---> Move()

 

hug

Edited by Guest

Share this post


Link to post
6 minutes ago, emailx45 said:

no! for general use, you have use "looping" as I did in simple case using "true until the end".

as do it any other procedure that needs verify many occurrences. if necessary, you add or delete chars...

if 3 > 2 then delete chars (delete next chars)

if 2 > 3 then add chars (empty places)

 

see CopyTo() internal definition ---> Move()

 

hug

Hm, I probably don't understand correctly. Yes, CopyTo is like Move, but how do you delete chars? 

Share this post


Link to post
Guest

here, we will have that have a "select, and replace" as using mouse, you see?

but this function should be done by Move(), if not you need verify "how many chars is necessary delete or add, according with oldstr and newstr". then, you can add or delete chars conform what the necessity .

 

you see? my english is bad, sorry

 

hug

Edited by Guest

Share this post


Link to post
2 minutes ago, emailx45 said:

here, we will have that have a "select, and replace" as using mouse, you see?

but this function should be done by Move(), if not you need verify "how many chars is necessary delete or add, according with oldstr and newstr". then, you can add or delete chars conform what the necessity .

 

you see? my english is bad, sorry

 

hug

Ok, yes I know that. But I'm looking for performant example. If you have a complete example, I would like to try it out.

Share this post


Link to post
Guest

later, I can try add more code... im out house now.

try remove my code non essencial like memos, etc... and benchmark it if possible. I would like see  the values in your system.

 

hug

Edited by Guest

Share this post


Link to post

Here is my small idea. Not perfect, has limitations, I guess, but usable :classic_biggrin:

 

{$POINTERMATH ON}

const
  GUESS_MASK = 32 - 1;

procedure BuildGuess(AGuess: PByte; const aOld: array of string);
var
  pi, j: Integer;
begin
  FillChar(AGuess^, GUESS_MASK + 1, 0);
  for pi := 0 to High(aOld) do
  begin
    j := Ord(aOld[pi][1]) and GUESS_MASK;
    if AGuess[j] = 0 then
      AGuess[j] := pi + 1
    else
      AGuess[j] := 255;
  end;
end;

function Equals(S1, S2: PChar; ACount: Integer): Boolean;
var
  i: Integer;
begin
  for i := 0 to ACount - 1 do
  begin
    if S1[i] <> S2[i] then
      Exit(False);
  end;
  Result := True;
end;

function MyReplace(const S: string; const aOld, aNew: array of string): string;
label
  L;
var
  lnt:    Integer;
  pcnt:   Integer;
  c, eof: PChar;
  p:      PChar;
  pi, j:  Integer;
  pln:    Integer;
  off:    Integer;
  guess:  array[0..GUESS_MASK] of Byte;
begin
  pln  := 0;
  lnt  := Length(S);
  pcnt := Length(AOld);

  BuildGuess(@guess, aOld);
  SetLength(Result, lnt * 2);

  c   := Pointer(S);
  eof := c + lnt;
  off := PChar(Pointer(Result)) - c;

  while c <> eof do
  begin
    pi := guess[Ord(c^) and GUESS_MASK];
    if pi <> 0 then
    begin
      if pi <> 255 then
      begin
        Dec(pi);
        pln := aOld[pi].Length;
        if (c^ = aOld[pi][1]) and Equals(c, Pointer(aOld[pi]), pln) then
          goto L;
      end
      else
      begin
        pi := 0;
        while pi <> pcnt do
        begin
          pln := aOld[pi].Length;
          if (c^ = aOld[pi][1]) and Equals(c, Pointer(aOld[pi]), pln) then
            goto L;
          Inc(pi);
        end;
      end;
    end;

    c[off] := c^;
    Inc(c);
    Continue;
  L:
    Inc(c, pln);
    Dec(off, pln);

    pln := aNew[pi].Length;
    p   := Pointer(aNew[pi]);

    for j := 0 to pln - 1 do
      c[off + j] := p[j];
    Inc(off, pln);
  end;

  SetLength(Result, @c[off] - PChar(Pointer(Result)));
end;

 

  • Thanks 1

Share this post


Link to post
On 3/20/2021 at 12:40 PM, Mike Torrettinni said:

So, the purpose of this ReplaceMultiStrings is avoid calling StringRepace multiple times, when we need to replace multiple sub strings, like:

I am a bit puzzled here. How do you want to guarantee a consistent outcome? The first "search & replace" will change the string. It will remove characters and insert new ones. Which may get captured by the next search.

 

Suppose you have a string like "Jabba the Hutt lived in a Hut" and you want to replace "Hutt" by "Alien" and "Hut" by "House".  How do you decide which replacement to do first?  

 

 

 

 

 

 

 

  • Like 1

Share this post


Link to post
3 minutes ago, A.M. Hoornweg said:

How do you decide which replacement to do first?  

? I don't understand the question. The input has a natural order. There is nothing to decide.

Share this post


Link to post
Guest
25 minutes ago, A.M. Hoornweg said:

How do you decide which replacement to do first

in my sample, exist a correspondece between oldstr and newstr, as it should be. there is not necessary do it as a "who is the first?", or be, as in ReplaceString by Embarcadero the replacement is done as user say that be. not as would can be when using multiples strings (as a puzzle for all possibilities as a combination/arrangement.

 

but your question is pertinent, because "hut" is in "Hutt"... for sure. It WAS easy understand you... :classic_cheerleader:

 

note: just remember, here is not a dictionary grammar checker or similar, just a way to speed up the exchange of values for others 

 

hug

 

Edited by Guest

Share this post


Link to post
1 hour ago, Attila Kovacs said:

does this respect the order of the substrings?

Hm. I think - yes.

Share this post


Link to post
50 minutes ago, Attila Kovacs said:

I don't understand the question. The input has a natural order. There is nothing to decide.

What I mean is, if you call Embarcadero's stringReplace routine multiple times, then each subsequent replace sees the insertions made by the previous call.

But if you write a MultiStringReplace routine that first analyzes the string and determines everything that is to be replaced, that is not the case. It sees only the original string and not the intermediate insertions. The end result may be different.

  • Like 2

Share this post


Link to post

@A.M. Hoornweg The MultiStringReplace has to consider this, right. I did not follow the whole thread and the codes, but if it's not the case, then yes, you are right, the routine is useless. Btw. your example does not reflects this.

Share this post


Link to post
46 minutes ago, A.M. Hoornweg said:

What I mean is, if you call Embarcadero's stringReplace routine multiple times, then each subsequent replace sees the insertions made by the previous call.

But if you write a MultiStringReplace routine that first analyzes the string and determines everything that is to be replaced, that is not the case. It sees only the original string and not the intermediate insertions. The end result may be different.

That is expected, different implementation could have different results.
 

1 hour ago, A.M. Hoornweg said:

if you call Embarcadero's stringReplace routine multiple times, then each subsequent replace sees the insertions made by the previous call.

I looked into some open source projects and I couldn't find any example where StringReplace(s, 'somestring', '', [rfReplaceAll]) would be used in while Pos('somestring', s) > 0.

So this example would fail on single StringReplace: s := StringReplace('s, 'delphi', '', []),if the purpose is to remove 'delphi' from string: 'This is deldeldelphiphiphi example'.

 

You can also call MultiStringReplace  mutliple consecutive times, if needed.

 

The purpose is to have fast single pass replacements, this means some limitation apply. In most cases you know the type of input you work on. If it's possible to have 'nested/hidden' substrings that will show up after this function, than you should use different approach.

 

Share this post


Link to post
16 minutes ago, Mike Torrettinni said:

That is expected, different implementation could have different results.
 

Do you have a specification about what you want the function to do? It seems like you don't mind. 

  • Like 1

Share this post


Link to post

I brought that up in the third post in this thread already before everyone started polluting it with their attempts before actually knowing the exact requirements 😉 

 

So assuming you want a one pass replacement of multiple matches using the best match I would try something like this:

- sort replacement pairs by length of the to be replaced string descending (to get the best match first (like "hut" and "hutt" you would have "hutt" first to not replace "hut" where you could have replaced "hutt")

- walk the input string char by char and check against the array of replacements if you have a match - due to the sorting by length you will find the most fitting one if there is any

- produce output string on the fly copying the parts with no matches and the replacements

 

I did not write any code for that nor did measure anything but that would be my approach

Edited by Stefan Glienke

Share this post


Link to post
3 hours ago, balabuev said:

Here is my small idea. Not perfect, has limitations, I guess, but usable 

Pretty impressive, seems to be very fast, Still looking into it!

 

Amazing how you guys can come up with such solutions so fast, while I spent days working on mine 😉

Share this post


Link to post
1 minute ago, Mike Torrettinni said:

Pretty impressive, seems to be very fast, Still looking into it!

 

Amazing how you guys can come up with such solutions so fast, while I spent days working on mine 😉

I guess if there are no firm rules on the expected behaviour, then that makes it easier to write faster code. 

  • Like 1
  • Haha 2

Share this post


Link to post
1 hour ago, Stefan Glienke said:

sort replacement pairs by length of the to be replaced string descending

This would be a brand new exotic routine with unpredictable results.

Share this post


Link to post
Guest

a easy way to order each "WORDS" in a string, dont worry about string-size!!!

var
	lStrArrays:TArray<string>;
begin
      lStrArrays := xxxString.Split([' ']    {space} , ExcludeEmpty {or anothers options} );
      //
      // some procedure to order it, or simple List.SORT( xx) would be enought here
      xxList.SORT( lStrArray );
      //
      // now, do your replacements
   ...
      // 
      xxxString := ''.JOIN( lStrArrays );
end;

 

hug

Edited by Guest

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×