Jump to content
pyscripter

Just-in-time compiling for Delphi's Regular Expressions

Recommended Posts

 

Pcre-Jit-Delphi

 

Delphi wraps the PCRE library into the System.RegularExpressions unit. This unit provides a high-level, easy-to-use interface to PCRE. A limitation however is that it does not provide access to the Just-In-Time (JIT) compiler of PCRE. PCRE JIT can improve the performance of regular expression matching dramatically and is particularly useful, when you apply the same regular expression repetitively. The purpose of the Pcre-Jit-Delphi project is to patch the Delphi RTL in order to provide access to JIT.  Please visit the project page at Github to see how you can use PCRE JIT in your projects.

 

Benchmark

The console project Benchmark.dpr in Demos folder compares the performance of the built-in Delphi regular expressions library, with the those of using Study without JIT and Study with JIT on a commonly used regular expression benchmark.

Here are the results I got from the 64 bit version.

 

                                                        Time     | Match count
==============================================================================
Delphi's own TRegEx:
                                        /Twain/ :        7.00 ms |         811
                                    /(?i)Twain/ :       41.00 ms |         965
                                   /[a-z]shing/ :      384.00 ms |        1540
                   /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ :      461.00 ms |         262
                                    /\b\w+nn\b/ :      588.00 ms |         262
                             /[a-q][^u-z]{13}x/ :      539.00 ms |        4094
                  /Tom|Sawyer|Huckleberry|Finn/ :      757.00 ms |        2598
              /(?i)Tom|Sawyer|Huckleberry|Finn/ :      861.00 ms |        4152
          /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ :     2615.00 ms |        2598
          /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ :     2766.00 ms |        1976
            /Tom.{10,25}river|river.{10,25}Tom/ :      455.00 ms |           2
                                 /[a-zA-Z]+ing/ :      807.00 ms |       78423
                        /\s[a-zA-Z]{0,12}ing\s/ :      560.00 ms |       49659
                /([A-Za-z]awyer|[A-Za-z]inn)\s/ :      789.00 ms |         209
                    /["'][^"']{0,30}[?!\.]["']/ :      321.00 ms |        8885
Total Time:    11963.00 ms
==============================================================================
Delphi's own TRegEx with Study:
                                        /Twain/ :        6.00 ms |         811
                                    /(?i)Twain/ :       41.00 ms |         965
                                   /[a-z]shing/ :      316.00 ms |        1540
                   /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ :       21.00 ms |         262
                                    /\b\w+nn\b/ :      581.00 ms |         262
                             /[a-q][^u-z]{13}x/ :      413.00 ms |        4094
                  /Tom|Sawyer|Huckleberry|Finn/ :       28.00 ms |        2598
              /(?i)Tom|Sawyer|Huckleberry|Finn/ :      217.00 ms |        4152
          /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ :     2632.00 ms |        2598
          /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ :     2785.00 ms |        1976
            /Tom.{10,25}river|river.{10,25}Tom/ :       50.00 ms |           2
                                 /[a-zA-Z]+ing/ :      759.00 ms |       78423
                        /\s[a-zA-Z]{0,12}ing\s/ :      563.00 ms |       49659
                /([A-Za-z]awyer|[A-Za-z]inn)\s/ :      699.00 ms |         209
                    /["'][^"']{0,30}[?!\.]["']/ :       52.00 ms |        8885
Total Time:     9179.00 ms
==============================================================================
Delphi's own TRegEx with JIT:
                                        /Twain/ :       11.00 ms |         811
                                    /(?i)Twain/ :       14.00 ms |         965
                                   /[a-z]shing/ :       12.00 ms |        1540
                   /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ :        3.00 ms |         262
                                    /\b\w+nn\b/ :      126.00 ms |         262
                             /[a-q][^u-z]{13}x/ :      154.00 ms |        4094
                  /Tom|Sawyer|Huckleberry|Finn/ :       22.00 ms |        2598
              /(?i)Tom|Sawyer|Huckleberry|Finn/ :       61.00 ms |        4152
          /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ :      277.00 ms |        2598
          /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ :      346.00 ms |        1976
            /Tom.{10,25}river|river.{10,25}Tom/ :       12.00 ms |           2
                                 /[a-zA-Z]+ing/ :       84.00 ms |       78423
                        /\s[a-zA-Z]{0,12}ing\s/ :      156.00 ms |       49659
                /([A-Za-z]awyer|[A-Za-z]inn)\s/ :       35.00 ms |         209
                    /["'][^"']{0,30}[?!\.]["']/ :       18.00 ms |        8885
Total Time:     1350.00 ms

As you can see the increase in performance is impressive.

 

Credits

 

Many thanks to @Mahdi Safsafi who provided the assembly code for __chkstk  and other Delphi developers (@Kas Ob., @David Heffernan, @Remy Lebeau, @Fr0sT.Brutal) that have provided useful advice.

 

  • Like 6
  • Thanks 2

Share this post


Link to post

Well done! Maybe it's easier to deploy if it's possible to patch at runtime with DDetours

Share this post


Link to post
8 hours ago, Edwin Yip said:

Maybe it's easier to deploy if it's possible to patch at runtime with DDetours

Not possible I am afraid and not easier.    You just need to add the patched System.RegularExpressionsAPI to your project uses clause.

Share this post


Link to post

@pyscripter I just made a benchmark against Perl regex. Yes its weird to compare a compiler vs interpreter ... but Perl beats Delphi own & study regex. However it failed to bit jit regex. A script and detailed result are attached.

Benchmark.rar

Share this post


Link to post
Posted (edited)
1 hour ago, Mahdi Safsafi said:

@pyscripter I just made a benchmark against Perl regex. Yes its weird to compare a compiler vs interpreter ... but Perl beats Delphi own & study regex. However it failed to bit jit regex. A script and detailed result are attached.

Benchmark.rar

Here it is not an issue of compiler/interpreter because PERL is using a C library anyway.

 

Perl uses its own regular expression engine not PCRE.  There are many other options, Google RE, Intel's Hyperscan to mention a couple.   PCRE+JIT is one of the fastest around and most widely used.   I should not forget to mention FLRE written in Pascal, which is probably the fastest of all!  Sometime ago I had posted in Google+ a benchmark comparing many of the above,  But I not sure how to access Google+ material.  Also note that the number of matches differs between Delphi and Perl.  This is not surprising, because there are subtle differences in how these engines work.

Edited by pyscripter

Share this post


Link to post
2 hours ago, pyscripter said:

Not possible I am afraid and not easier.    You just need to add the patched System.RegularExpressionsAPI to your project uses clause.

But the patch is only available for a single specific version of Delphi :classic_wacko:

Share this post


Link to post
1 minute ago, Edwin Yip said:

But the patch is only available for a single specific version of Delphi

You can apply it manually to your own version (it only takes a couple of minutes, because the changes are few) and you only need to do it once).  You can then submit a pull request with the patch in your Delphi version.

Share this post


Link to post
Posted (edited)
  • FLRE is the fastest of all! 
  • One single pascal file which is good
  • The  author is a kind of genius

But

  • not as extensively tested as PCRE
  • poorly documented
  • not very well supported
  • the performance gain compared to PCRE+JIT is not that big
  • System.RegularExpressions has a nicer interface

 

All-in-all I would stick to System.RegularExpressions at least if you add JIT.

Edited by pyscripter
  • Like 2

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×