Mike Torrettinni

RegEx performance


I have an old example of RegEx usage and noticed that for one customer it becomes a bottleneck. So I did some benchmarking, and it was very slow because I was creating the RegEx expression on each call.

So I optimized it to create the RegEx only when needed. I also took the chance to see if I could get rid of RegEx altogether, and using Pos and string iteration is faster, of course.
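One way to create each RegEx only once is to keep the compiled expressions in a dictionary keyed by function name. This is a minimal sketch, not the code from the benchmark: TFunctionMatcher and its members are hypothetical names, and it assumes a Delphi version that ships System.Generics.Collections.

```delphi
program RegExCacheSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions, System.Generics.Collections;

type
  // Hypothetical helper (not part of the original code): remembers one
  // TRegEx per function name so the pattern is built only on first use.
  TFunctionMatcher = class
  private
    FCache: TDictionary<string, TRegEx>;
  public
    constructor Create;
    destructor Destroy; override;
    function IsFunctionInString(const aFunction, aStr: string): Boolean;
  end;

constructor TFunctionMatcher.Create;
begin
  inherited Create;
  FCache := TDictionary<string, TRegEx>.Create;
end;

destructor TFunctionMatcher.Destroy;
begin
  FCache.Free;
  inherited;
end;

function TFunctionMatcher.IsFunctionInString(const aFunction, aStr: string): Boolean;
var
  vRegEx: TRegEx;
begin
  // Create the expression only when this function name is seen for the first time
  if not FCache.TryGetValue(aFunction, vRegEx) then
  begin
    vRegEx := TRegEx.Create('\b' + aFunction + '[ ]*\(');
    FCache.Add(aFunction, vRegEx);
  end;
  Result := vRegEx.IsMatch(aStr);
end;

var
  vMatcher: TFunctionMatcher;
begin
  vMatcher := TFunctionMatcher.Create;
  try
    Writeln(vMatcher.IsFunctionInString('FunctionA', 'Testing FunctionA (1)'));
    Writeln(vMatcher.IsFunctionInString('FunctionA', 'FunctionA[1]'));
  finally
    vMatcher.Free;
  end;
end.
```

The dictionary lookup has its own cost, so whether this beats passing a prepared TRegEx around would need to be measured.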

 

The purpose of the RegEx is to find out whether Function() exists in a string. The problem is that sometimes there are spaces before the bracket, like Function (). The most I found was 4 spaces. Before I implemented RegEx, the code checked whether 'Function(' or 'Function (' or 'Function  (' or 'Function   (' is in the string; the RegEx that replaced it is: '\b' + vFunc + '[ ]*\('
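A small sketch of what that pattern accepts and rejects. BuildFunctionPattern is a hypothetical helper, not from the original code; it additionally passes the name through TRegEx.Escape, on the assumption that a function name could contain regex metacharacters.

```delphi
program PatternSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

// Hypothetical helper: builds the pattern described above.
// \b    word boundary, so 'MyFunctionA(' does not count as 'FunctionA('
// [ ]*  zero or more spaces between the name and the bracket
// \(    a literal opening bracket
function BuildFunctionPattern(const aFunction: string): string;
begin
  // Escape is an addition for safety; plain names pass through unchanged
  Result := '\b' + TRegEx.Escape(aFunction) + '[ ]*\(';
end;

var
  vRegEx: TRegEx;
begin
  vRegEx := TRegEx.Create(BuildFunctionPattern('FunctionA'));
  Writeln(vRegEx.IsMatch('FunctionA(x)'));      // no space before the bracket
  Writeln(vRegEx.IsMatch('FunctionA    (x)'));  // four spaces before the bracket
  Writeln(vRegEx.IsMatch('MyFunctionA(x)'));    // name is part of a longer word
  Writeln(vRegEx.IsMatch('FunctionA[x]'));      // no opening bracket at all
end.
```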

 

So I'm happy to get rid of RegEx for this example, but I still wanted to show the code. Perhaps I'm using RegEx the wrong way and someone has a suggestion.

 

 

Here are the timings (ms):

32bit:

[image: 32-bit timings (ms)]

 

In 64-bit, the optimized RegEx is a little faster:

[image: 64-bit timings (ms)]

 

 

And code:

program Project1;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils, System.RegularExpressions, System.Diagnostics;

const
  cNoHit:   string = 'Testing string just for testing and more testing.';
  cShort:   string = 'FunctionA(x)';
  cMedium:  string = 'Testing FunctionA (1) or Writeln(FunctionB  (1+1)) and Not FunctionC( FunctionD(y))';
  cLong:    string = 'Lorem ipsum dolor FunctionA(a)sit amet, consectetur adipiscing FunctionB(FunctionC (2)) elit. Ut enim neque, dictum vel sodales molestie, FunctionD     (10)pretium in neque. Quisque FunctionA          (10)rhoncus. Alma e alma de penatibus et FunctionA(99)';

  cLoop: integer = 100000;
  cFunctions: array[1..5] of string =
    ('FunctionA', 'FunctionB', 'FunctionC', 'FunctionD', 'FunctionN'); // FunctionN is not in test cases

function IsFunctionInString_RegEx(const aFunction, aStr: string): boolean;
var vRegEx: TRegEx;
begin
  // aFunction can be followed by zero or more spaces before the bracket:
  // aFunction(), aFunction (), aFunction  ()...
  vRegEx := TRegEx.Create('\b' + aFunction + '[ ]*\(');
  Result := vRegEx.IsMatch(aStr);
end;

function IsFunctionInString_RegEx_Optimized(const aRegEx: TRegEx; const aFunction, aStr: string): boolean;
begin
  // Optimized: aRegEx compiled only once
  Result := aRegEx.IsMatch(aStr);
end;

function IsFunctionInString(const aFunction, aStr: string): boolean;
var vPos, i: Integer;
begin
  // aFunction can be followed by zero or more spaces before the bracket:
  // aFunction(), aFunction (), aFunction  ()...

  // Find the first occurrence of aFunction and check whether the next
  // character is '(', skipping any spaces. Later occurrences are not checked.
  vPos := Pos(aFunction, aStr);
  if vPos = 0 then
    Exit(False);
  for i := vPos + aFunction.Length to aStr.Length do
  begin
    if aStr[i] = '(' then
      Exit(True);
    if aStr[i] <> ' ' then
      Exit(False);
  end;
  // Reached the end of aStr without finding a bracket
  Result := False;
end;

procedure Test_RegEx(const aStr: string);
var vSW: TStopWatch;
    vFunc: string;
    b: boolean;
    i: Integer;
begin
  vSW := TStopWatch.StartNew;
  for vFunc in cFunctions do
    for i := 1 to cLoop do
      b := IsFunctionInString_RegEx(vFunc, aStr); // test the string passed in, not always cNoHit
  Writeln(' RegEx:            ' + vSW.ElapsedMilliseconds.ToString);
end;

procedure Test_RegEx_Optimized(const aStr: string);
var vSW: TStopWatch;
    vFunc: string;
    b: boolean;
    i: Integer;
    vRegEx: TRegEx;
begin
  vSW := TStopWatch.StartNew;
  for vFunc in cFunctions do
  begin
    // Create the RegEx once per function, outside the timing loop
    vRegEx := TRegEx.Create('\b' + vFunc + '[ ]*\(');
    for i := 1 to cLoop do
      b := IsFunctionInString_RegEx_Optimized(vRegEx, vFunc, aStr); // test the string passed in, not always cNoHit
  end;
  Writeln(' RegEx Optimized:    ' + vSW.ElapsedMilliseconds.ToString);
end;

procedure Test(const aStr: string);
var vSW: TStopWatch;
    vFunc: string;
    b: boolean;
    i: Integer;
begin
  vSW := TStopWatch.StartNew;
  for i := 1 to cLoop do
    for vFunc in cFunctions do
      b := IsFunctionInString(vFunc, aStr); // test the string passed in, not always cNoHit
  Writeln(' Pos:                ' + vSW.ElapsedMilliseconds.ToString);
end;

procedure Validate;
var vStrings: array of string;
    s, vFunc: string;
    vRegEx: TRegEx;
begin
  vStrings := [cNoHit, cShort, cMedium, cLong];
  for s in vStrings do
    for vFunc in cFunctions do
    begin
      vRegEx := TRegEx.Create('\b' + vFunc + '[ ]*\(');
      if (IsFunctionInString(vFunc, s) <> IsFunctionInString_RegEx(vFunc, s)) or
         (IsFunctionInString(vFunc, s) <> IsFunctionInString_RegEx_Optimized(vRegEx, vFunc, s))
      then
        raise Exception.Create('Error validating: ' + vFunc + ' in ' + s);
    end;
end;


begin
  Validate;

  Writeln('No hit str:');
  Test_RegEx(cNoHit);
  Test_RegEx_Optimized(cNoHit);
  Test(cNoHit);
  Writeln('');

  Writeln('Short str:');
  Test_RegEx(cShort);
  Test_RegEx_Optimized(cShort);
  Test(cShort);
  Writeln('');

  Writeln('Medium str:');
  Test_RegEx(cMedium);
  Test_RegEx_Optimized(cMedium);
  Test(cMedium);
  Writeln('');

  Writeln('Long str:');
  Test_RegEx(cLong);
  Test_RegEx_Optimized(cLong);
  Test(cLong);
  Writeln('');

  Writeln('done...');
  Readln;

end.

 


Just skip any spaces and then check if it's '(' or not:

 

function IsFunctionInString(const aFunction, aStr: string): boolean;
var
  vPos: Integer;
  p: PChar;
begin
  vPos := Pos(aFunction, aStr);
  if vPos > 0 then
  begin
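    // Point at the character right after the function name. A Delphi string
    // is null-terminated, so the scan stops at the end of aStr at the latest.
    p := PChar(aStr) + (vPos - 1 + aFunction.Length);
    while p^ = ' ' do
      Inc(p);
    Result := p^ = '(';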
  end
  else
    Result := False;
end;
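Both Pos-based versions only look at the first occurrence of the name and do not check what comes before it, whereas the RegEx tries every position and requires a word boundary. If that difference matters, a variant along these lines could be used. It is a sketch: IsFunctionInStringAll is a hypothetical name, the word-boundary check only approximates \b with ASCII letters, digits and underscore, and it assumes a Delphi version where Pos accepts a start offset.

```delphi
program PosAllOccurrences;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical variant: checks every occurrence of aFunction, not just the first
function IsFunctionInStringAll(const aFunction, aStr: string): Boolean;
var
  vPos, i: Integer;
begin
  Result := False;
  if aFunction = '' then
    Exit;
  vPos := Pos(aFunction, aStr);
  while vPos > 0 do
  begin
    // Rough stand-in for \b: the previous character must not be a word character
    if (vPos = 1) or
       not CharInSet(aStr[vPos - 1], ['A'..'Z', 'a'..'z', '0'..'9', '_']) then
    begin
      // Skip any spaces after the name, then expect the opening bracket
      i := vPos + aFunction.Length;
      while (i <= aStr.Length) and (aStr[i] = ' ') do
        Inc(i);
      if (i <= aStr.Length) and (aStr[i] = '(') then
        Exit(True);
    end;
    // Continue searching after the current occurrence
    vPos := Pos(aFunction, aStr, vPos + 1);
  end;
end;

begin
  // First occurrence has no bracket, second one does
  Writeln(IsFunctionInStringAll('FunctionA', 'FunctionA + FunctionA (1)'));
  // Name only appears as part of a longer identifier
  Writeln(IsFunctionInStringAll('FunctionA', 'MyFunctionA(1)'));
  // Name at the very end of the string, no bracket
  Writeln(IsFunctionInStringAll('FunctionA', 'call FunctionA'));
end.
```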


 

22 minutes ago, Stefan Glienke said:

Just skip any spaces and then check if it's '(' or not:

 




 

Thanks!

 

It doesn't make sense to use RegEx for such an example, right? OK, I see it's usable for a few calls, but for anything that searches lots of data, it is pretty slow.


Opinions on regex differ - however for this case using a regex is like using a bucket-wheel excavator to plant a pansy.

Quote

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

 

16 minutes ago, Stefan Glienke said:

Opinions on regex differ - however for this case using a regex is like using a bucket-wheel excavator to plant a pansy.

Seems like the only RegEx left in my apps will be for email validation. But this is called only once on user entry, so no performance needed.

The rest of RegEx will probably phase out slowly.

15 hours ago, Mike Torrettinni said:

Seems like the only RegEx left in my apps will be for email validation. But this is called only once on user entry, so no performance needed.

The rest of RegEx will probably phase out slowly.

I hope you will use https://emailregex.com/ - because invalid "email validations" are common and annoying.


Why validate email at all? If you need it valid and active, send a confirmation code. Otherwise let it go. "aa@bb.cc" is totally valid but as useless as "weufbowkef". I use this one frequently when a site that I just found and am unlikely to visit again requires my email for some unclear reason.

3 hours ago, DiGi said:

I hope you will use https://emailregex.com/ - because invalid "email validations" are common and annoying.

Thanks! The website kind of proves my point why I'm removing all (most) regex. Funny how comments show so many feel so strongly about email regex 😉

 

3 hours ago, Fr0sT.Brutal said:

Why validate email at all? If you need it valid and active, send a confirmation code. Otherwise let it go. "aa@bb.cc" is totally valid but as useless as "weufbowkef". I use this one frequently when a site that I just found and am unlikely to visit again requires my email for some unclear reason.

Yes, this annoys me too, especially websites requesting email for just basic info.

 

My projects are not like a website, so validation is only there to catch typos. It's part of the licensing form, so there is no problem with understanding the purpose.


We check the email address with plain Delphi code that implements the rules we need.

We also check the TLD against a TLD table, and we check whether the email address is really accepted: look up the MX record and simulate sending a mail. With most servers you can check whether the email address is known to the server (catch-all addresses are always reported as OK, that's a thing).

With this method we know which customers we need to contact to get a new valid email address.

(Our company does not have a customer portal, so we need to check everything ourselves.)

 

