Mike Torrettinni 198 Posted July 16, 2020 I saw this example of ExtractContent function: https://stackoverflow.com/a/62918216/5198394 - by Andreas Rejbrandt It should return string without whatever is within brackets: function ExtractContent(const S: string): string; var i, c: Integer; InBracket: Boolean; begin SetLength(Result, S.Length); InBracket := False; c := 0; for i := 1 to S.Length do begin if S[i] = '{' then InBracket := True else if S[i]= '}' then InBracket := False else if not InBracket then begin Inc(c); Result[c] := S[i]; end; end; SetLength(Result, c); end; Then the last comment for this answer is :"... I'd use a pchar indexer and omit the bool flag and use two mutually exclusive (accept and reject) loops..." - by MartynA So, I was trying to implement his suggestions, but this is not my area of expertise. So here is my attempt: Renamed and brackets as parameters: function RemoveTextBetweenChars(const aString: string; const aChar1, aChar2: Char): string; var c : integer; vP : PChar; vSkip : boolean; // when to skip text begin SetLength(Result, aString.Length); c := 0; vP := PChar(aString); vSkip := False; while vP^ <> #0 do begin if vP^ = aChar1 then vSkip := True else if (vP^ = aChar2) then vSkip := false else if Not vSkip then begin Inc(c); Result[c] := vP^; end; Inc(vP); end; SetLength(Result, c); end; But I was only able to implement PChar indexer. I have no idea how to do this without boolean variable and I jhave no idea what two mutually exclusive (accept and reject) loops are. Any help is appreciated! Share this post Link to post
Guest Posted July 16, 2020 function ExtractContent(const S: string): string; var p, r: pchar; begin SetLength(Result, S.Length); r := @Result[1]; p := @S[1]; while p^ <> #0 do begin if p^ = '{' then begin while p^ <> '}' do Inc(p); Inc(p); end; r^ := p^; Inc(r); Inc(p); end; SetLength(Result, r - @Result[1]); end; Share this post Link to post
Guest Posted July 16, 2020 OK, there is bug ! in the inner ( the refuse loop ), i missed check for end of S function ExtractContent(const S: string): string; var p, r: pchar; begin SetLength(Result, S.Length); r := @Result[1]; p := @S[1]; while p^ <> #0 do begin if p^ = '{' then begin while p^ <> '}' do if p^ = #0 then Break else Inc(p); Inc(p); end; r^ := p^; Inc(r); Inc(p); end; SetLength(Result, r - @Result[1]); end; Share this post Link to post
Guest Posted July 16, 2020 still not right Mike, can you spot the bug case and fix it ? Share this post Link to post
Mike Torrettinni 198 Posted July 16, 2020 (edited) Let me try. Edited July 16, 2020 by Mike Torrettinni Share this post Link to post
Mike Torrettinni 198 Posted July 16, 2020 6 minutes ago, Kas Ob. said: still not right Mike, can you spot the bug case and fix it ? Hm, I guess I can't see the bug... and it works. Any hints on the bug? Share this post Link to post
Guest Posted July 16, 2020 look for the margin cases like lack of end closing bracket Share this post Link to post
Mike Torrettinni 198 Posted July 16, 2020 12 minutes ago, Kas Ob. said: look for the margin cases like lack of end closing bracket Aha, it can go past the #0 if missing closing bracket. This is better right: function ExtractContent3(const S: string): string; var p, r: pchar; begin SetLength(Result, S.Length); r := @Result[1]; p := @S[1]; while p^ <> #0 do begin if p^ = '{' then begin while p^ <> '}' do if p^ = #0 then begin sleep(0); Break; end else Inc(p); if p^ = #0 then // if inner loop gets to the end Break else Inc(p); end; r^ := p^; Inc(r); Inc(p); end; SetLength(Result, r - @Result[1]); end; Share this post Link to post
Guest Posted July 16, 2020 (edited) right, but you ended with two if's and two loops means four conditional flow, just the original text !! function ExtractContent(const S: string): string; var p, r: pchar; begin SetLength(Result, S.Length); r := @Result[1]; p := @S[1]; while p^ <> #0 do begin r^ := p^; if p^ = '{' then begin Dec(r); while p^ <> '}' do if p^ = #0 then begin Dec(p); Break; end else Inc(p); end; Inc(r); Inc(p); end; SetLength(Result, r - @Result[1]); end; I hope it is right now, i am lousy programmer ! Edited July 16, 2020 by Guest Share this post Link to post
Mike Torrettinni 198 Posted July 16, 2020 6 minutes ago, Kas Ob. said: right, but you ended with two if's and two loops means four conditional flow, just the original text !! function ExtractContent(const S: string): string; var p, r: pchar; begin SetLength(Result, S.Length); r := @Result[1]; p := @S[1]; while p^ <> #0 do begin r^ := p^; if p^ = '{' then begin Dec(r); while p^ <> '}' do if p^ = #0 then begin Dec(p); Break; end else Inc(p); end; Inc(r); Inc(p); end; SetLength(Result, r - @Result[1]); end; I hope it is right now, i am lousy programmer ! Thank you, looks good to me! 🙂 Share this post Link to post
timfrost 78 Posted July 16, 2020 But the original code was surely much easier to understand and maintain. And who can say whether the last one is 'better'? What surprises me is that nobody has jumped in with a one line incomprehensible Regex. Share this post Link to post
Mike Torrettinni 198 Posted July 16, 2020 New function is only about 17% faster than original - my simple tests. I was hoping that using pointers and all the changes would result in bigger performance gain. 2 minutes ago, timfrost said: But the original code was surely much easier to understand and maintain I agree. This was more exercise for me and trying new things. Share this post Link to post
Mike Torrettinni 198 Posted July 16, 2020 (edited) 5 minutes ago, timfrost said: What surprises me is that nobody has jumped in with a one line incomprehensible Regex. Actually RegEx was accepted answer in that SO question. But I'm staying away from RegEx as much as possible. Edited July 16, 2020 by Mike Torrettinni 2 Share this post Link to post
FredS 138 Posted July 16, 2020 17 minutes ago, timfrost said: one line incomprehensible Regex Pretty simple answer on SO `{.*?}`.. looks good to me unless Lazy and Greedy become a Feature: https://quality.embarcadero.com/browse/RSP-15402 Share this post Link to post
Guest Posted July 16, 2020 28 minutes ago, Mike Torrettinni said: New function is only about 17% faster than original So you are after speed, would you please confirm if this is faster function ExtractContent(const S: string): string; var p, ep, r, sr: pchar; len: Integer; begin len := S.Length; SetLength(Result, len); r := Pointer(Result); //r := @Result[1]; sr := r; p := Pointer(S); //p := @S[1]; ep := p + len; while p < ep do begin r^ := p^; Inc(r); if p^ = '{' then begin Dec(r); while Byte(Ord(p^) * (Ord(p^) - Ord('}'))) <> 0 do Inc(p); end; Inc(p); end; SetLength(Result, r - sr); end; This prefer short string, the shorter the faster it will be, Share this post Link to post
Mike Torrettinni 198 Posted July 16, 2020 10 minutes ago, Kas Ob. said: So you are after speed, would you please confirm if this is faster function ExtractContent(const S: string): string; var p, ep, r, sr: pchar; len: Integer; begin len := S.Length; SetLength(Result, len); r := Pointer(Result); //r := @Result[1]; sr := r; p := Pointer(S); //p := @S[1]; ep := p + len; while p < ep do begin r^ := p^; Inc(r); if p^ = '{' then begin Dec(r); while Byte(Ord(p^) * (Ord(p^) - Ord('}'))) <> 0 do Inc(p); end; Inc(p); end; SetLength(Result, r - sr); end; This prefer short string, the shorter the faster it will be, No significant change, some runs it's a little slower (1587ms vs 1610ms). Wasn't really looking for speed performance, but it would be nice. Share this post Link to post
Mike Torrettinni 198 Posted July 16, 2020 I'm testing for speed performance with this text example: Contrary to popular belief, {Lorem Ipsum} is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. {Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of} "de Finibus Bonorum et {Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of} Lorem Ipsum, {"Lorem ipsum dolor sit amet.."}, comes from a line in section 1.10.32. 1mio iterations: Stopwatch := TStopwatch.StartNew; for i := 1 to 1000000 do ExtractContent5(s); Stopwatch.Stop; Memo5.Lines.Add(stopwatch.ElapsedMilliseconds.ToString); Share this post Link to post
Guest Posted July 16, 2020 with such long text, you might be better to use PosEx and Move instead of walking it. Share this post Link to post
Attila Kovacs 629 Posted July 16, 2020 (edited) Player 3 entered the game (edited) function RemoveTextBetweenChars(const aString: string; const aChar1, aChar2: Char): string; label exit; var c: integer; P, start, res: PChar; begin if aString <> '' then begin SetLength(Result, aString.Length); res := Pointer(Result); P := Pointer(aString); c := 0; while True do begin start := P; while P^ <> aChar1 do begin inc(P); if P^ = #0 then begin Move(start^, res[c], (P - start) * SizeOf(Char)); inc(c, P - start); goto exit; end; end; Move(start^, res[c], (P - start) * SizeOf(Char)); inc(c, P - start); while P^ <> aChar2 do begin inc(P); if P^ = #0 then goto exit; end; inc(P); if P^ = #0 then goto exit; end; exit: SetLength(Result, c); end; end; Edited July 16, 2020 by Attila Kovacs 1 Share this post Link to post
Mike Torrettinni 198 Posted July 16, 2020 16 minutes ago, Attila Kovacs said: Player 3 entered the game function RemoveTextBetweenChars(const aString: string; const aChar1, aChar2: Char): string; var c: integer; P, start: PChar; begin if aString <> '' then begin SetLength(Result, aString.Length); P := Pointer(aString); c := 1; while True do begin start := P; while P^ <> aChar1 do begin inc(P); if P^ = #0 then begin Move(start^, Result[c], (P - start) * SizeOf(Char)); SetLength(Result, P - PChar(aString)); Exit; end; end; Move(start^, Result[c], (P - start) * SizeOf(Char)); inc(c, P - start); while P^ <> aChar2 do begin inc(P); if P^ = #0 then Exit; end; inc(P); if P^ = #0 then Exit; end; end; end; OK, this one is 25% faster, pretty good! Share this post Link to post
Attila Kovacs 629 Posted July 16, 2020 @Mike Torrettinni Slightly modified, wasn't round everything. Share this post Link to post
Stefan Glienke 2002 Posted July 16, 2020 (edited) For parsing purposes it might not be a good idea actually to return a string as that causes heap allocation every time you extract a substring (*) which I would guess does not only happen once but many times. Something like a PChar with additional Length information might be a better fit. If you really need the text as new string entity you can still materialize it. (*) You will just not notice in your benchmark because in the loop the same string instance is being reused all the time - but if you would run this in 10.4 story might be different (see my comment in https://quality.embarcadero.com/browse/RSP-29450) Edited July 16, 2020 by Stefan Glienke Share this post Link to post
Attila Kovacs 629 Posted July 16, 2020 (edited) You are right, even around the Move() was some string magic. With passing PChar is even faster. As for the result type, I'll let it for Mike to tune it. Edited July 16, 2020 by Attila Kovacs Share this post Link to post
Attila Kovacs 629 Posted July 16, 2020 Looking into the Move() implementation, wth is happened to rep movs*? Is it too slow nowadays? Share this post Link to post
Mahdi Safsafi 225 Posted July 16, 2020 27 minutes ago, Attila Kovacs said: Looking into the Move() implementation, wth is happened to rep movs*? Is it too slow nowadays? AFAIK it gets a little bit better on modern CPU. However it's getting abandoned on favor of SIMD instructions. IMO, I think that the RTL Move routine should be implemented as an intrinsic (not a function) in that way the compiler can generate a much better code. Share this post Link to post