Help with string extraction function

Mike Torrettinni · July 16, 2020

51 minutes ago, Stefan Glienke said:

For parsing purposes it might not be a good idea actually to return a string as that causes heap allocation every time you extract a substring (*) which I would guess does not only happen once but many times. Something like a PChar with additional Length information might be a better fit. If you really need the text as new string entity you can still materialize it.

(*) You will just not notice in your benchmark because in the loop the same string instance is being reused all the time - but if you would run this in 10.4 story might be different (see my comment in https://quality.embarcadero.com/browse/RSP-29450)

Wow, I hope they fix this by 10.5. I'm skipping 10.4 anyway.

Mike Torrettinni · July 16, 2020

1 hour ago, Attila Kovacs said:

goto exit;

Oh, you used GoTo... brave man 🙂 It is fastest fuction so far 🙂

Mike Torrettinni · July 17, 2020

6 hours ago, Kas Ob. said:

with such long text, you might be better to use PosEx and Move instead of walking it.

If I look at PosEx code in System.pas it also walks char by char. No?

@Attila Kovacs solutions uses Move, but not PosEx. You think PosEx could be better option for longer texts?

July 17, 2020

5 hours ago, Mike Torrettinni said:

solutions uses Move, but not PosEx. You think PosEx could be better option for longer texts?

No, No i looked at PosEX, it will not .

but wait how your result is very far from mine, i mean my version is way faster for short string and for your string it is even faster than Atilla's

program MikeStrings;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  SysUtils,
  Windows;

var
  iStartTime, iEndTime, iFreq: Int64;
  Duration: Int64;

procedure StartTiming;
begin
  QueryPerformanceFrequency(iFreq);
  QueryPerformanceCounter(iStartTime);
end;

function StopTiming: string;
begin
  QueryPerformanceCounter(iEndTime);
  Duration := iEndTime - iStartTime;
  Result := FloatToStrF(Duration * 1000{for ms}         / iFreq, ffGeneral, 6, 3) + ' ms';
end;

const
  MIKE_MID_STRING =
    'Contrary to popular belief, {Lorem Ipsum} is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. {Richard ' +
    'McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going ' +
    'through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of} "de Finibus Bonorum et ' +
    '{Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of} ' +
    'Lorem Ipsum, {"Lorem ipsum dolor sit amet.."}, comes from a line in section 1.10.32.';
  JUST_SHORT_STRING = '123456{6789}ABCDEF';

function ExtractContentOriginal(const S: string): string;
var
  i, c: Integer;
  InBracket: Boolean;
begin
  SetLength(Result, S.Length);
  InBracket := False;
  c := 0;
  for i := 1 to S.Length do
  begin
    if S[i] = '{' then
      InBracket := True
    else if S[i] = '}' then
      InBracket := False
    else if not InBracket then
    begin
      Inc(c);
      Result[c] := S[i];
    end;
  end;
  SetLength(Result, c);
end;

function ExtractContentKasFixed(const S: string): string;
var
  p, ep, r, sr: pchar;
  len: Integer;
begin
  len := S.Length;
  SetLength(Result, len);
  r := Pointer(Result);
  sr := r;
  p := Pointer(S);
  ep := p + len;
  while p < ep do
  begin
    r^ := p^;
    Inc(r);
    if p^ = '{' then
    begin
      Dec(r);
      while Byte(Ord(p^) * (Ord(p^) - Ord('}'))) <> 0 do
        Inc(p);
    end;
    Inc(p);
  end;

  SetLength(Result, r - sr);
end;

function ExtractContentKasNotFixed(const S: string; const aChar1, aChar2: Char): string;
var
  p, ep, r, sr: pchar;
  len: Integer;
begin
  len := S.Length;
  SetLength(Result, len);
  r := Pointer(Result);
  sr := r;
  p := Pointer(S);
  ep := p + len;
  while p < ep do
  begin
    r^ := p^;
    Inc(r);
    if p^ = aChar1 then
    begin
      Dec(r);
      while Byte(Ord(p^) * (Ord(p^) - Ord(aChar2))) <> 0 do
        Inc(p);
    end;
    Inc(p);
  end;

  SetLength(Result, r - sr);
end;

function ExtractContentAtilla1NotFixed(const aString: string; const aChar1,
  aChar2: Char): string;
label
  exit;
var
  c: integer;
  P, start, res: PChar;
begin
  if aString <> '' then
  begin
    SetLength(Result, aString.Length);
    res := Pointer(Result);
    P := Pointer(aString);
    c := 0;
    while True do
    begin
      start := P;
      while P^ <> aChar1 do
      begin
        inc(P);
        if P^ = #0 then
        begin
          Move(start^, res[c], (P - start) * SizeOf(Char));
          inc(c, P - start);
          goto exit;
        end;
      end;
      Move(start^, res[c], (P - start) * SizeOf(Char));
      inc(c, P - start);
      while P^ <> aChar2 do
      begin
        inc(P);
        if P^ = #0 then
          goto exit;
      end;
      inc(P);
      if P^ = #0 then
        goto exit;
    end;
exit:
    SetLength(Result, c);
  end;
end;

function ExtractContentAtilla1Fixed(const aString: string): string;
label
  exit;
var
  c: integer;
  P, start, res: PChar;
begin
  if aString <> '' then
  begin
    SetLength(Result, aString.Length);
    res := Pointer(Result);
    P := Pointer(aString);
    c := 0;
    while True do
    begin
      start := P;
      while P^ <> '{' do
      begin
        inc(P);
        if P^ = #0 then
        begin
          Move(start^, res[c], (P - start) * SizeOf(Char));
          inc(c, P - start);
          goto exit;
        end;
      end;
      Move(start^, res[c], (P - start) * SizeOf(Char));
      inc(c, P - start);
      while P^ <> '}' do
      begin
        inc(P);
        if P^ = #0 then
          goto exit;
      end;
      inc(P);
      if P^ = #0 then
        goto exit;
    end;
exit:
    SetLength(Result, c);
  end;
end;

procedure RunMikeTest(Loops: Integer);
var
  I: Integer;
begin
  Writeln('Mike test');
  Write('Mid string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentOriginal(MIKE_MID_STRING);
  Writeln(StopTiming);
  Write('Short string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentOriginal(JUST_SHORT_STRING);
  Writeln(StopTiming);
end;

procedure RunKasFixedTest(Loops: Integer);
var
  I: Integer;
begin
  Writeln('Kas fixed test');
  Write('Mid string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentKasFixed(MIKE_MID_STRING);
  Writeln(StopTiming);
  Write('Short string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentKasFixed(JUST_SHORT_STRING);
  Writeln(StopTiming);
end;

procedure RunKasNotFixedTest(Loops: Integer);
var
  I: Integer;
begin
  Writeln('Kas not fixed test');
  Write('Mid string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentKasNotFixed(MIKE_MID_STRING, '{', '}');
  Writeln(StopTiming);
  Write('Short string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentKasNotFixed(JUST_SHORT_STRING, '{', '}');
  Writeln(StopTiming);
end;

procedure RunAtilla1NotFixedTest(Loops: Integer);
var
  I: Integer;
begin
  Writeln('Atilla not fixed test');
  Write('Mid string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentAtilla1NotFixed(MIKE_MID_STRING, '{', '}');
  Writeln(StopTiming);
  Write('Short string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentAtilla1NotFixed(JUST_SHORT_STRING, '{', '}');
  Writeln(StopTiming);
end;

procedure RunAtilla1TestFixed(Loops: Integer);
var
  I: Integer;
begin
  Writeln('Atilla fixed test');
  Write('Mid string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentAtilla1Fixed(MIKE_MID_STRING);
  Writeln(StopTiming);
  Write('Short string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentAtilla1Fixed(JUST_SHORT_STRING);
  Writeln(StopTiming);
end;

const
  ITERATIONS = 1000000;

procedure RunTests;
begin
  RunMikeTest(ITERATIONS);
  RunKasFixedTest(ITERATIONS);
  RunKasNotFixedTest(ITERATIONS);
  RunAtilla1NotFixedTest(ITERATIONS);
  RunAtilla1TestFixed(ITERATIONS);
end;

begin
  RunTests;
  Readln;
end.

And here my result

Quote

Mike test
Mid string 1768.8 ms
Short string 81.5028 ms
Kas fixed test
Mid string 533.336 ms
Short string 30.154 ms
Kas not fixed test
Mid string 658.88 ms
Short string 36.1251 ms
Atilla not fixed test
Mid string 612.567 ms
Short string 42.1707 ms
Atilla fixed test
Mid string 604.432 ms
Short string 40.2711 ms

aehimself · July 17, 2020

When I made a parser like this I was working with indexes returned by String Helpers. Look for the first opener, look for the first closer from the opener. Make sure there are no quotes (or even number of quotes) in between. Make sure there are equal amount of openers and equal amount of closers in between.

I'm not saying it was the fastest solution, but it worked 🙂 And it handled nested sections correctly, too.

Stefan Glienke · July 17, 2020

@Kas Ob. Your benchmark is flawed - every time you test multiple implementations in the same test you have a bias because of sorts of things such as memory layout/alignment, branch prediction, prefetcher and so on.

For example when I run the code you posted "Kas not fixed wins" - when I comment out all others the code runs slower. When I comment out one of the tests all routines suddenly run faster. This clearly is an indicator that depending on where the code is put by the compiler sometimes they might be sitting in the cache more beneficial than some other times. Unfortunately at least to my knowledge with the Delphi compiler you have no control over code alignment - unless you write stuff in asm if course...

Edited July 17, 2020 by Stefan Glienke

July 17, 2020

7 minutes ago, Stefan Glienke said:

For example when I run the code you posted "Kas not fixed wins" - when I comment out all others the code runs slower. When I comment out one of the tests all routines suddenly run faster. This clearly is an indicator that depending on where the code is put by the compiler sometimes they might be sitting in the cache more beneficial than some other times. Unfortunately at least to my knowledge with the Delphi compiler you have no control over code alignment - unless you write stuff in asm if course...

Adding to your comment switching the position of the functions in the same unit will change the result too.

It mostly depend on the loops destination address if it is aligned on 4,2,8... bytes or not, Delphi compiler has didn't got the memo !

FPC and LLVM do add nops in some places to align internal loops with aggressive optimization.

But still flawed is flawed no one arguing, may more like non optimal, one thing though all of those you mentioned (memory layout/alignment, branch prediction, prefetcher) will not play huge role in this case, as the data the iterations is very huge that will eliminate the cache effect while the data and the code are small that will be cached and fitted in small execution window, so it is not the main player here

Try to move the destination of those tight loops !, that is the case and adding $CODEALIGN 16 will not help much too as my outer loop will always will land on 2n, and if you want for the sake of educational purposes, try the same asm but add many as many nops needed just to make the conditional jmp land on multiplication of 16, 8,4,2, you will have better-to-less-better performance at the same order, then try to land on 3 ! see the performance drop 10-50% but will not be completely at random on the contrary it will repeatable process, this will happen because the data are not moving around so this exact effect destination address alignment is amplified. also this behaviour will be completely different on virtualized machines, there do exist random and unpredictable bahaviour.

July 17, 2020

Almost forgot to complain too.

I mentioned 16..2 alignments but for the top performance it will be on 64, Delphi added code alignment, but not sure if this is still the case for latest version ( after Seattle), someone did good with it, but decided that 32 and 16 alignment is not needed, unlike any compiler out there, because who need to read Intel manuals.

That worth opening QP or QC or whatever it called.

For me, i do sometimes low level optimization and that ruin the effect even with code compiled with FPC as obj, Delphi is not allowing it to be aligned right, so i do copy the code at runtime to memory aligned and execute those critical parts, also for the future ( hope not far and not in far galaxy ) when low level optimization of the RTL start such alignment (32 and 64) will be mission critical.

Stefan Glienke · July 17, 2020

I was merely commenting on the things that can happen when using common practice for benchmarking and comparing different algorithms or implementation by simply putting them all into the same executable and then running them after each other in any order and less so on which of them are affecting this particular test.

FWIW here is an optimized version of your routine (tested on 10.3.3 with $O+,W-):

- no unnecessary Inc(r)/Dec(r) when '{' was found

- no unnecessary check on the inner loop before first Inc(p)

- better code generated for inner loop condition - yours generated imul - could even use until (p^ in ['}', #0]) which is even faster but generates W1050 but produces valid code as it uses the 16bit registers still.

function ExtractContent(const S: string): string;
var
  p, ep, r, sr: pchar;
  len: Integer;
begin
  len := S.Length;
  SetLength(Result, len);
  r := Pointer(Result);
  sr := r;
  p := Pointer(S);
  ep := p + len;
  while p < ep do
  begin
    r^ := p^;
    if p^ = '{' then
    begin
      repeat
        Inc(p);
      until (p^ = '}') or (p^ = #0);
    end
    else
      Inc(r);
    Inc(p);
  end;

  SetLength(Result, r - sr);
end;

Edited July 17, 2020 by Stefan Glienke

Attila Kovacs · July 17, 2020

1 hour ago, Stefan Glienke said:

putting them all into the same executable and then running them after each other

no-no, the compiler runs out of registers if do so, always testing separately

Nice routine. I don't know why but if you use parameters instead if consts it performs even better. (Despite comparing against the stack)

ExtractContent(const S: string; const aChar1, aChar2: Char);

this boost it even further:

var zero: Char;

zero := #0;

until (P^ = aChar2) or (P^ = zero);

or even better:

until (P^ = aChar2) or (P = ep);

Edited July 17, 2020 by Attila Kovacs

July 17, 2020

@Stefan Glienke you are right there, and your version is faster

It is fascinating how FPC generate that inner loop with multiplication

Quote

MikeTest.pas:110 Inc(p);
0000000100001878 4883c602 add $0x2,%rsi
MikeTest.pas:109 while Byte(Ord(p^) * (Ord(p^) - Ord('}'))) <> 0 do
000000010000187C 440fb706 movzwl (%rsi),%r8d
0000000100001880 67458d4883 lea -0x7d(%r8d),%r9d
0000000100001885 450fafc1 imul %r9d,%r8d
0000000100001889 4584c0 test %r8b,%r8b
MikeTest.pas:95 begin
000000010000188C 75ea jne 0x100001878 <EXTRACTCONTENTKASFIXED+88>

and this how Delphi does it

Quote

MikeStrings.dpr.106: while Byte(Ord(p^) * (Ord(p^) - Ord('}'))) <> 0 do
00000000004256BC 4C0FB611 movzx r10,byte ptr [rcx]
00000000004256C0 4180EA7D sub r10b,$7d
00000000004256C4 480FB601 movzx rax,byte ptr [rcx]
00000000004256C8 41F6EA imul r10b
00000000004256CB 84C0 test al,al
00000000004256CD 75E9 jnz ExtractContentKasFixed + $58
MikeStrings.dpr.109: Inc(p);
00000000004256CF 4883C102 add rcx,$02
MikeStrings.dpr.99: while p < ep do
00000000004256D3 483BCA cmp rcx,rdx
00000000004256D6 72C8 jb ExtractContentKasFixed + $40
00000000004256D8 90 nop

I have no idea where this nop come from, as it does exactly the opposite of aligning the next line which is landing for a branch, this Delphi XE5, my IDE preferred for the light weight 64,

you have such nop in latest versions ?

Stefan Glienke · July 17, 2020

20 minutes ago, Kas Ob. said:

I have no idea where this nop come from, as it does exactly the opposite of aligning the next line which is landing for a branch, this Delphi XE5, my IDE preferred for the light weight 64, you have such nop in latest versions?

Yes but who cares for the nop (well for this one at least, yes the Win64 compiler inserts a ton of them in weird places), the next line is the SetLength before the ret.

Edited July 17, 2020 by Stefan Glienke

Mahdi Safsafi · July 17, 2020

@Kas Ob. I noticed that you take optimization very seriously. That's really good but just as an advice from someone that did, don't be driven too much. Behind that optimization door resides the evil ! In fact, I spent much time reading Intel doc/books, comparing different compilers (gcc, clang, ...). At the end I become more obsessional and less productive as I started paying much attention to my codes than what is required like taking some time to decide whether I should write if a = b or if a <> b.

Attila Kovacs · July 17, 2020

Slightly modified Stefan's beautiful routine. If I did not screw it up, it's faster because of comparing against the stack (parameters aChar1, aChar2), (I can't explain that), and omitting "sr" leaves more register for the compiler, which was a speed boost again.

Also, the initial "while" became a "repeat until", and instead of "P^ = #0" there is a " P = ep".

function ExtractContent4(const S: string; const aChar1, aChar2: Char): string;
var
  P, ep, r: PChar;
  len: integer;
begin
  len := S.Length;
  if len = 0 then
    exit(S);  
  SetLength(Result, len);
  r := Pointer(Result);
  P := Pointer(S);
  ep := P + len;
  repeat
    r^ := P^;
    if P^ = aChar1 then
    begin
      repeat
        inc(P);
      until (P^ = aChar2) or (P = ep);
    end
    else
      inc(r);
    inc(P);
  until P > ep;
  SetLength(Result, r - Pointer(Result));
end;

Edited July 17, 2020 by Attila Kovacs

Stefan Glienke · July 17, 2020

19 minutes ago, Attila Kovacs said:

Also, the initial "while" became a "repeat until".

Congrats, you just broke it for empty S. Also not storing Pointer(Result) makes it slower at least for me.

43 minutes ago, Mahdi Safsafi said:

@Kas Ob. I noticed that you take optimization very seriously. That's really good but just as an advice from someone that did, don't be driven too much. Behind that optimization door resides the evil ! In fact, I spent much time reading Intel doc/books, comparing different compilers (gcc, clang, ...). At the end I become more obsessional and less productive as I started paying much attention to my codes than what is required like taking some time to decide whether I should write if a = b or if a <> b.

You have to know where it's worth to put effort and where it does not matter.

Edited July 17, 2020 by Stefan Glienke

Mahdi Safsafi · July 17, 2020

@Stefan Glienke For some time it was totally out of my control

Now its getting better but yet I still fight my self some time.

Attila Kovacs · July 17, 2020

@Stefan Glienke thx, updated. Strange, with 10.1U2 it's definitely much faster if not storing it. O+ W-

asm:

ec4.txt

Mike Torrettinni · July 17, 2020

Interesting stuff: asm, registries, aligning... it's like a black-box to me. Reading how little insignificant changes (order of functions calls) affect the performance, is kind of scary.

But it shows how real performance optimization is based on case by case. It's hard (or impossible) to make 1 super performant algorithm for all cases. Just from this thread we can see the differences on optimization for short vs long strings, or like @aehimself pointed out if we need to consider nested brackets, escaped chars, quoted chars...

Here I am thinking if increasing char pointer is faster than using integer iterator, while you guys are discussing condition branching and using stack and full set of registries 🙂

Attila Kovacs · July 17, 2020

@Mike Torrettinni And the same code can perform differently on different CPU's.

9 minutes ago, Mike Torrettinni said:

increasing char pointer is faster than using integer iterator

Because the "integer iterator" does nothing else just calculates the pointer + i over and over. So basically it's a shortcut.

Edited July 17, 2020 by Attila Kovacs

Mahdi Safsafi · July 17, 2020

3 minutes ago, Mike Torrettinni said:

Here I am thinking if increasing char pointer is faster than using integer iterator, while you guys are discussing condition branching and using stack and full set of registries 🙂

On x86, sizeof(pointer) = sizeof(integer) so its much likely you are going to have the same performance.

On x64, sizeof(pointer) = sizeof(int64) so using integer could be some how better.

This was just a general rule, but does not apply always as it depends on the way you're using your code.

Stefan Glienke · July 17, 2020

30 minutes ago, Mahdi Safsafi said:

On x86, sizeof(pointer) = sizeof(integer) so its much likely you are going to have the same performance.

No, because if you are doing something with whatever is at x[ i ] its faster to use a pointer that starts at @x[ 0 ] and just inc that rather than having a loop which then has 2 counter variables (they like to count down to zero while maintaining the upwards counting i). And on x86 that is a precious register that can be used for something else.

Edited July 17, 2020 by Stefan Glienke

David Heffernan · July 17, 2020

1 hour ago, Mike Torrettinni said:

Interesting stuff: asm, registries, aligning... it's like a black-box to me. Reading how little insignificant changes (order of functions calls) affect the performance, is kind of scary.

But it shows how real performance optimization is based on case by case. It's hard (or impossible) to make 1 super performant algorithm for all cases. Just from this thread we can see the differences on optimization for short vs long strings, or like @aehimself pointed out if we need to consider nested brackets, escaped chars, quoted chars...

Here I am thinking if increasing char pointer is faster than using integer iterator, while you guys are discussing condition branching and using stack and full set of registries 🙂

Do you have a performance bottleneck with this code?

Mike Torrettinni · July 17, 2020

9 minutes ago, David Heffernan said:

Do you have a performance bottleneck with this code?

No, not using any of it, yet. Planning to do some profiling and benchmarking on real data and will see if there are any bottlenecks in this area.

July 17, 2020

Just now, Mike Torrettinni said:

No, not using any of it, yet.

Got me good !

Mike Torrettinni · July 17, 2020

4 minutes ago, Kas Ob. said:

Got me good !

Wait, this all happened yesterday... sorry, but I can't implement improvements so fast. What happened here, miscommunication?

Sign In

Help with string extraction function

Recommended Posts

Mike Torrettinni 199

Share this post

Link to post

Mike Torrettinni 199

Share this post

Link to post

Mike Torrettinni 199

Share this post

Link to post

Guest

Share this post

Link to post

aehimself 413

Share this post

Link to post

Stefan Glienke 2175

Share this post

Link to post

Guest

Share this post

Link to post

Guest

Share this post

Link to post

Stefan Glienke 2175

Share this post

Link to post

Attila Kovacs 693

Share this post

Link to post

Guest

Share this post

Link to post

Stefan Glienke 2175

Share this post

Link to post

Mahdi Safsafi 225

Share this post

Link to post

Attila Kovacs 693

Share this post

Link to post

Stefan Glienke 2175

Share this post

Link to post

Mahdi Safsafi 225

Share this post

Link to post

Attila Kovacs 693

Share this post

Link to post

Mike Torrettinni 199

Share this post

Link to post

Attila Kovacs 693

Share this post

Link to post

Mahdi Safsafi 225

Share this post

Link to post

Stefan Glienke 2175

Share this post

Link to post

David Heffernan 2482

Share this post

Link to post

Mike Torrettinni 199

Share this post

Link to post

Guest

Share this post

Link to post

Mike Torrettinni 199

Share this post

Link to post

Create an account or sign in to comment

Create an account