Jump to content
Mike Torrettinni

Help with string extraction function

Recommended Posts

51 minutes ago, Stefan Glienke said:

For parsing purposes it might not be a good idea actually to return a string as that causes heap allocation every time you extract a substring (*) which I would guess does not only happen once but many times. Something like a PChar with additional Length information might be a better fit. If you really need the text as new string entity you can still materialize it.

 

(*) You will just not notice in your benchmark because in the loop the same string instance is being reused all the time - but if you would run this in 10.4 story might be different (see my comment in https://quality.embarcadero.com/browse/RSP-29450)

Wow, I hope they fix this by 10.5. I'm skipping 10.4 anyway.

Share this post


Link to post
6 hours ago, Kas Ob. said:

with such long text, you might be better to use PosEx and Move instead of walking it.

If I look at PosEx code in System.pas it also walks char by char. No?

@Attila Kovacs solutions uses Move, but not PosEx. You think PosEx could be better option for longer texts?

 

 

Share this post


Link to post
Guest
5 hours ago, Mike Torrettinni said:

solutions uses Move, but not PosEx. You think PosEx could be better option for longer texts?

No, No i looked at PosEX, it will not .

 

but wait how your result is very far from mine, i mean my version is way faster for short string and for your string it is even faster than Atilla's

program MikeStrings;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  SysUtils,
  Windows;

var
  iStartTime, iEndTime, iFreq: Int64;
  Duration: Int64;

procedure StartTiming;
begin
  QueryPerformanceFrequency(iFreq);
  QueryPerformanceCounter(iStartTime);
end;

function StopTiming: string;
begin
  QueryPerformanceCounter(iEndTime);
  Duration := iEndTime - iStartTime;
  Result := FloatToStrF(Duration * 1000{for ms}         / iFreq, ffGeneral, 6, 3) + ' ms';
end;

const
  MIKE_MID_STRING =
    'Contrary to popular belief, {Lorem Ipsum} is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. {Richard ' +
    'McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going ' +
    'through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of} "de Finibus Bonorum et ' +
    '{Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of} ' +
    'Lorem Ipsum, {"Lorem ipsum dolor sit amet.."}, comes from a line in section 1.10.32.';
  JUST_SHORT_STRING = '123456{6789}ABCDEF';

function ExtractContentOriginal(const S: string): string;
var
  i, c: Integer;
  InBracket: Boolean;
begin
  SetLength(Result, S.Length);
  InBracket := False;
  c := 0;
  for i := 1 to S.Length do
  begin
    if S[i] = '{' then
      InBracket := True
    else if S[i] = '}' then
      InBracket := False
    else if not InBracket then
    begin
      Inc(c);
      Result[c] := S[i];
    end;
  end;
  SetLength(Result, c);
end;

function ExtractContentKasFixed(const S: string): string;
var
  p, ep, r, sr: pchar;
  len: Integer;
begin
  len := S.Length;
  SetLength(Result, len);
  r := Pointer(Result);
  sr := r;
  p := Pointer(S);
  ep := p + len;
  while p < ep do
  begin
    r^ := p^;
    Inc(r);
    if p^ = '{' then
    begin
      Dec(r);
      while Byte(Ord(p^) * (Ord(p^) - Ord('}'))) <> 0 do
        Inc(p);
    end;
    Inc(p);
  end;

  SetLength(Result, r - sr);
end;

function ExtractContentKasNotFixed(const S: string; const aChar1, aChar2: Char): string;
var
  p, ep, r, sr: pchar;
  len: Integer;
begin
  len := S.Length;
  SetLength(Result, len);
  r := Pointer(Result);
  sr := r;
  p := Pointer(S);
  ep := p + len;
  while p < ep do
  begin
    r^ := p^;
    Inc(r);
    if p^ = aChar1 then
    begin
      Dec(r);
      while Byte(Ord(p^) * (Ord(p^) - Ord(aChar2))) <> 0 do
        Inc(p);
    end;
    Inc(p);
  end;

  SetLength(Result, r - sr);
end;

function ExtractContentAtilla1NotFixed(const aString: string; const aChar1,
  aChar2: Char): string;
label
  exit;
var
  c: integer;
  P, start, res: PChar;
begin
  if aString <> '' then
  begin
    SetLength(Result, aString.Length);
    res := Pointer(Result);
    P := Pointer(aString);
    c := 0;
    while True do
    begin
      start := P;
      while P^ <> aChar1 do
      begin
        inc(P);
        if P^ = #0 then
        begin
          Move(start^, res[c], (P - start) * SizeOf(Char));
          inc(c, P - start);
          goto exit;
        end;
      end;
      Move(start^, res[c], (P - start) * SizeOf(Char));
      inc(c, P - start);
      while P^ <> aChar2 do
      begin
        inc(P);
        if P^ = #0 then
          goto exit;
      end;
      inc(P);
      if P^ = #0 then
        goto exit;
    end;
exit:
    SetLength(Result, c);
  end;
end;

function ExtractContentAtilla1Fixed(const aString: string): string;
label
  exit;
var
  c: integer;
  P, start, res: PChar;
begin
  if aString <> '' then
  begin
    SetLength(Result, aString.Length);
    res := Pointer(Result);
    P := Pointer(aString);
    c := 0;
    while True do
    begin
      start := P;
      while P^ <> '{' do
      begin
        inc(P);
        if P^ = #0 then
        begin
          Move(start^, res[c], (P - start) * SizeOf(Char));
          inc(c, P - start);
          goto exit;
        end;
      end;
      Move(start^, res[c], (P - start) * SizeOf(Char));
      inc(c, P - start);
      while P^ <> '}' do
      begin
        inc(P);
        if P^ = #0 then
          goto exit;
      end;
      inc(P);
      if P^ = #0 then
        goto exit;
    end;
exit:
    SetLength(Result, c);
  end;
end;

procedure RunMikeTest(Loops: Integer);
var
  I: Integer;
begin
  Writeln('Mike test');
  Write('Mid string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentOriginal(MIKE_MID_STRING);
  Writeln(StopTiming);
  Write('Short string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentOriginal(JUST_SHORT_STRING);
  Writeln(StopTiming);
end;

procedure RunKasFixedTest(Loops: Integer);
var
  I: Integer;
begin
  Writeln('Kas fixed test');
  Write('Mid string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentKasFixed(MIKE_MID_STRING);
  Writeln(StopTiming);
  Write('Short string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentKasFixed(JUST_SHORT_STRING);
  Writeln(StopTiming);
end;

procedure RunKasNotFixedTest(Loops: Integer);
var
  I: Integer;
begin
  Writeln('Kas not fixed test');
  Write('Mid string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentKasNotFixed(MIKE_MID_STRING, '{', '}');
  Writeln(StopTiming);
  Write('Short string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentKasNotFixed(JUST_SHORT_STRING, '{', '}');
  Writeln(StopTiming);
end;

procedure RunAtilla1NotFixedTest(Loops: Integer);
var
  I: Integer;
begin
  Writeln('Atilla not fixed test');
  Write('Mid string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentAtilla1NotFixed(MIKE_MID_STRING, '{', '}');
  Writeln(StopTiming);
  Write('Short string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentAtilla1NotFixed(JUST_SHORT_STRING, '{', '}');
  Writeln(StopTiming);
end;

procedure RunAtilla1TestFixed(Loops: Integer);
var
  I: Integer;
begin
  Writeln('Atilla fixed test');
  Write('Mid string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentAtilla1Fixed(MIKE_MID_STRING);
  Writeln(StopTiming);
  Write('Short string'#9);
  StartTiming;
  for I := 1 to Loops do
    ExtractContentAtilla1Fixed(JUST_SHORT_STRING);
  Writeln(StopTiming);
end;

const
  ITERATIONS = 1000000;

procedure RunTests;
begin
  RunMikeTest(ITERATIONS);
  RunKasFixedTest(ITERATIONS);
  RunKasNotFixedTest(ITERATIONS);
  RunAtilla1NotFixedTest(ITERATIONS);
  RunAtilla1TestFixed(ITERATIONS);
end;

begin
  RunTests;
  Readln;
end.

And here my result

Quote

Mike test
Mid string      1768.8 ms
Short string    81.5028 ms
Kas fixed test
Mid string      533.336 ms
Short string    30.154 ms
Kas not fixed test
Mid string      658.88 ms
Short string    36.1251 ms
Atilla not fixed test
Mid string      612.567 ms
Short string    42.1707 ms
Atilla fixed test
Mid string      604.432 ms
Short string    40.2711 ms

 

Share this post


Link to post

When I made a parser like this I was working with indexes returned by String Helpers. Look for the first opener, look for the first closer from the opener. Make sure there are no quotes (or even number of quotes) in between. Make sure there are equal amount of openers and equal amount of closers in between.

I'm not saying it was the fastest solution, but it worked 🙂 And it handled nested sections correctly, too.

Share this post


Link to post

@Kas Ob. Your benchmark is flawed - every time you test multiple implementations in the same test you have a bias because of sorts of things such as memory layout/alignment, branch prediction, prefetcher and so on.

For example when I run the code you posted "Kas not fixed wins" - when I comment out all others the code runs slower. When I comment out one of the tests all routines suddenly run faster. This clearly is an indicator that depending on where the code is put by the compiler sometimes they might be sitting in the cache more beneficial than some other times. Unfortunately at least to my knowledge with the Delphi compiler you have no control over code alignment - unless you write stuff in asm if course...

Edited by Stefan Glienke
  • Like 1

Share this post


Link to post
Guest
7 minutes ago, Stefan Glienke said:

For example when I run the code you posted "Kas not fixed wins" - when I comment out all others the code runs slower. When I comment out one of the tests all routines suddenly run faster. This clearly is an indicator that depending on where the code is put by the compiler sometimes they might be sitting in the cache more beneficial than some other times. Unfortunately at least to my knowledge with the Delphi compiler you have no control over code alignment - unless you write stuff in asm if course...

Adding to your comment switching the position of the functions in the same unit will change the result too. 

It mostly depend on the loops destination address if it is aligned on 4,2,8... bytes or not, Delphi compiler has didn't got the memo !

FPC and LLVM do add nops in some places to align internal loops with aggressive optimization.

 

But still flawed is flawed no one arguing, may more like non optimal, one thing though all of those you mentioned (memory layout/alignment, branch prediction, prefetcher) will not play huge role in this case, as the data the iterations is very huge that will eliminate the cache effect while the data and the code are small that will be cached and fitted in small execution window, so it is not the main player here

Try to move the destination of those tight loops !, that is the case and adding $CODEALIGN 16 will not help much too as my outer loop will always will land on 2n, and if you want for the sake of educational purposes, try the same asm but add many as many nops needed just to make the conditional jmp land on multiplication of 16, 8,4,2, you will have better-to-less-better performance at the same order, then try to land on 3 ! see the performance drop 10-50% but will not be completely at random on the contrary it will repeatable process, this will happen because the data are not moving around so this exact effect destination address alignment is amplified. also this behaviour will be completely different on virtualized machines, there do exist random and unpredictable bahaviour.

Share this post


Link to post
Guest

Almost forgot to complain too.

 

I mentioned 16..2 alignments but for the top performance it will be on 64, Delphi added code alignment, but not sure if this is still the case for latest version ( after Seattle), someone did good with it, but decided that 32 and 16 alignment is not needed, unlike any compiler out there, because who need to read Intel manuals.

 

That worth opening QP or QC or whatever it called.

 

For me, i do sometimes low level optimization and that ruin the effect even with code compiled with FPC as obj, Delphi is not allowing it to be aligned right, so i do copy the code at runtime to memory aligned and execute those critical parts, also for the future ( hope not far and not in far galaxy ) when low level optimization of the RTL start such alignment (32 and 64) will be mission critical.

Share this post


Link to post

I was merely commenting on the things that can happen when using common practice for benchmarking and comparing different algorithms or implementation by simply putting them all into the same executable and then running them after each other in any order and less so on which of them are affecting this particular test.

 

FWIW here is an optimized version of your routine (tested on 10.3.3 with $O+,W-):

- no unnecessary Inc(r)/Dec(r) when '{' was found

- no unnecessary check on the inner loop before first Inc(p)

- better code generated for inner loop condition - yours generated imul - could even use until (p^ in ['}', #0]) which is even faster but generates W1050 but produces valid code as it uses the 16bit registers still.

 

function ExtractContent(const S: string): string;
var
  p, ep, r, sr: pchar;
  len: Integer;
begin
  len := S.Length;
  SetLength(Result, len);
  r := Pointer(Result);
  sr := r;
  p := Pointer(S);
  ep := p + len;
  while p < ep do
  begin
    r^ := p^;
    if p^ = '{' then
    begin
      repeat
        Inc(p);
      until (p^ = '}') or (p^ = #0);
    end
    else
      Inc(r);
    Inc(p);
  end;

  SetLength(Result, r - sr);
end;

 

Edited by Stefan Glienke
  • Thanks 1

Share this post


Link to post
1 hour ago, Stefan Glienke said:

putting them all into the same executable and then running them after each other

no-no, the compiler runs out of registers if do so, always testing separately

 

Nice routine. I don't know why but if you use parameters instead if consts it performs even better. (Despite comparing against the stack)

ExtractContent(const S: string; const aChar1, aChar2: Char);

 

 

this boost it even further:

var zero: Char;

zero := #0;

until (P^ = aChar2) or (P^ = zero);

 

or even better:

until (P^ = aChar2) or (P = ep);

 

 

Edited by Attila Kovacs
  • Like 1

Share this post


Link to post
Guest

@Stefan Glienke you are right there, and your version is faster

 

It is fascinating how FPC generate that inner loop with multiplication 

Quote

MikeTest.pas:110                                      Inc(p);
0000000100001878 4883c602                 add    $0x2,%rsi
MikeTest.pas:109                                      while Byte(Ord(p^) * (Ord(p^) - Ord('}'))) <> 0 do
000000010000187C 440fb706                 movzwl (%rsi),%r8d
0000000100001880 67458d4883               lea    -0x7d(%r8d),%r9d
0000000100001885 450fafc1                     imul   %r9d,%r8d
0000000100001889 4584c0                        test   %r8b,%r8b
MikeTest.pas:95                                      begin
000000010000188C 75ea                         jne    0x100001878 <EXTRACTCONTENTKASFIXED+88>
 

and this how Delphi does it

Quote

MikeStrings.dpr.106: while Byte(Ord(p^) * (Ord(p^) - Ord('}'))) <> 0 do
00000000004256BC 4C0FB611         movzx r10,byte ptr [rcx]
00000000004256C0 4180EA7D         sub r10b,$7d
00000000004256C4 480FB601          movzx rax,byte ptr [rcx]
00000000004256C8 41F6EA              imul r10b
00000000004256CB 84C0                  test al,al
00000000004256CD 75E9                  jnz ExtractContentKasFixed + $58
MikeStrings.dpr.109: Inc(p);
00000000004256CF 4883C102           add rcx,$02
MikeStrings.dpr.99: while p < ep do
00000000004256D3 483BCA              cmp rcx,rdx
00000000004256D6 72C8                   jb ExtractContentKasFixed + $40
00000000004256D8 90                       nop

I have no idea where this nop come from, as it does exactly the opposite of aligning the next line which is landing for a branch, this Delphi XE5, my IDE preferred for the light weight 64,

you have such nop in latest versions ? 

 

 

Share this post


Link to post
20 minutes ago, Kas Ob. said:

I have no idea where this nop come from, as it does exactly the opposite of aligning the next line which is landing for a branch, this Delphi XE5, my IDE preferred for the light weight 64, you have such nop in latest versions?

Yes but who cares for the nop (well for this one at least, yes the Win64 compiler inserts a ton of them in weird places), the next line is the SetLength before the ret.

Edited by Stefan Glienke

Share this post


Link to post

@Kas Ob. I noticed that you take optimization very seriously. That's really good but just as an advice from someone that did, don't be driven too much. Behind that optimization door resides the evil ! In fact, I spent much time reading Intel doc/books, comparing different compilers (gcc, clang, ...). At the end I become more obsessional and less productive as I started paying much attention to my codes than what is required like taking some time to decide whether I should write if a = b or if a <> b.

  • Like 2

Share this post


Link to post

Slightly modified Stefan's beautiful routine. If I did not screw it up, it's faster because of comparing against the stack (parameters aChar1, aChar2), (I can't explain that), and omitting "sr" leaves more register for the compiler, which was a speed boost again.

Also, the initial "while" became a "repeat until", and instead of "P^ = #0" there is a " P = ep".

 

function ExtractContent4(const S: string; const aChar1, aChar2: Char): string;
var
  P, ep, r: PChar;
  len: integer;
begin
  len := S.Length;
  if len = 0 then
    exit(S);  
  SetLength(Result, len);
  r := Pointer(Result);
  P := Pointer(S);
  ep := P + len;
  repeat
    r^ := P^;
    if P^ = aChar1 then
    begin
      repeat
        inc(P);
      until (P^ = aChar2) or (P = ep);
    end
    else
      inc(r);
    inc(P);
  until P > ep;
  SetLength(Result, r - Pointer(Result));
end;

 

Edited by Attila Kovacs
  • Thanks 1

Share this post


Link to post
19 minutes ago, Attila Kovacs said:

Also, the initial "while" became a "repeat until".

Congrats, you just broke it for empty S. Also not storing Pointer(Result) makes it slower at least for me.

 

43 minutes ago, Mahdi Safsafi said:

@Kas Ob. I noticed that you take optimization very seriously. That's really good but just as an advice from someone that did, don't be driven too much. Behind that optimization door resides the evil ! In fact, I spent much time reading Intel doc/books, comparing different compilers (gcc, clang, ...). At the end I become more obsessional and less productive as I started paying much attention to my codes than what is required like taking some time to decide whether I should write if a = b or if a <> b.

You have to know where it's worth to put effort and where it does not matter.

 

Edited by Stefan Glienke

Share this post


Link to post

Interesting stuff: asm, registries, aligning... it's like a black-box to me. Reading how little insignificant changes (order of functions calls) affect the performance, is kind of scary.

 

But it shows how real performance optimization is based on case by case. It's hard (or impossible) to make 1 super performant algorithm for all cases. Just from this thread we can see the differences on optimization for short vs long strings, or like @aehimself pointed out if we need to consider nested brackets, escaped chars, quoted chars...

 

Here I am thinking if increasing char pointer is faster than using integer iterator, while you guys are discussing condition branching and using stack and full set of registries 🙂

Share this post


Link to post

@Mike Torrettinni And the same code can perform differently on different CPU's.

 

 

9 minutes ago, Mike Torrettinni said:

increasing char pointer is faster than using integer iterator

Because the "integer iterator" does nothing else just calculates the pointer + i over and over. So basically it's a shortcut.

 

 

 

Edited by Attila Kovacs

Share this post


Link to post
3 minutes ago, Mike Torrettinni said:

Here I am thinking if increasing char pointer is faster than using integer iterator, while you guys are discussing condition branching and using stack and full set of registries 🙂

On x86, sizeof(pointer) = sizeof(integer) so its much likely you are going to have the same performance.

On x64, sizeof(pointer) = sizeof(int64) so using integer could be some how better.

This was just a general rule, but does not apply always as it depends on the way you're using your code.

Share this post


Link to post
30 minutes ago, Mahdi Safsafi said:

On x86, sizeof(pointer) = sizeof(integer) so its much likely you are going to have the same performance.

No, because if you are doing something with whatever is at x[ i ] its faster to use a pointer that starts at @x[ 0 ] and just inc that rather than having a loop which then has 2 counter variables (they like to count down to zero while maintaining the upwards counting i). And on x86 that is a precious register that can be used for something else.

Edited by Stefan Glienke

Share this post


Link to post
1 hour ago, Mike Torrettinni said:

Interesting stuff: asm, registries, aligning... it's like a black-box to me. Reading how little insignificant changes (order of functions calls) affect the performance, is kind of scary.

 

But it shows how real performance optimization is based on case by case. It's hard (or impossible) to make 1 super performant algorithm for all cases. Just from this thread we can see the differences on optimization for short vs long strings, or like @aehimself pointed out if we need to consider nested brackets, escaped chars, quoted chars...

 

Here I am thinking if increasing char pointer is faster than using integer iterator, while you guys are discussing condition branching and using stack and full set of registries 🙂

Do you have a performance bottleneck with this code?

Share this post


Link to post
9 minutes ago, David Heffernan said:

Do you have a performance bottleneck with this code?

No, not using any of it, yet. Planning to do some profiling and benchmarking on real data and will see if there are any bottlenecks in this area.

Share this post


Link to post
Guest
Just now, Mike Torrettinni said:

No, not using any of it, yet.

Got me good !

Share this post


Link to post
4 minutes ago, Kas Ob. said:

Got me good !

Wait, this all happened yesterday... sorry, but I can't implement improvements so fast. What happened here, miscommunication?

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×