pyscripter 689 Posted July 24, 2020 Pcre-Jit-Delphi Delphi wraps the PCRE library into the System.RegularExpressions unit. This unit provides a high-level, easy-to-use interface to PCRE. A limitation however is that it does not provide access to the Just-In-Time (JIT) compiler of PCRE. PCRE JIT can improve the performance of regular expression matching dramatically and is particularly useful, when you apply the same regular expression repetitively. The purpose of the Pcre-Jit-Delphi project is to patch the Delphi RTL in order to provide access to JIT. Please visit the project page at Github to see how you can use PCRE JIT in your projects. Benchmark The console project Benchmark.dpr in Demos folder compares the performance of the built-in Delphi regular expressions library, with the those of using Study without JIT and Study with JIT on a commonly used regular expression benchmark. Here are the results I got from the 64 bit version. Time | Match count ============================================================================== Delphi's own TRegEx: /Twain/ : 7.00 ms | 811 /(?i)Twain/ : 41.00 ms | 965 /[a-z]shing/ : 384.00 ms | 1540 /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ : 461.00 ms | 262 /\b\w+nn\b/ : 588.00 ms | 262 /[a-q][^u-z]{13}x/ : 539.00 ms | 4094 /Tom|Sawyer|Huckleberry|Finn/ : 757.00 ms | 2598 /(?i)Tom|Sawyer|Huckleberry|Finn/ : 861.00 ms | 4152 /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ : 2615.00 ms | 2598 /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ : 2766.00 ms | 1976 /Tom.{10,25}river|river.{10,25}Tom/ : 455.00 ms | 2 /[a-zA-Z]+ing/ : 807.00 ms | 78423 /\s[a-zA-Z]{0,12}ing\s/ : 560.00 ms | 49659 /([A-Za-z]awyer|[A-Za-z]inn)\s/ : 789.00 ms | 209 /["'][^"']{0,30}[?!\.]["']/ : 321.00 ms | 8885 Total Time: 11963.00 ms ============================================================================== Delphi's own TRegEx with Study: /Twain/ : 6.00 ms | 811 /(?i)Twain/ : 41.00 ms | 965 /[a-z]shing/ : 316.00 ms | 1540 /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ : 21.00 ms | 262 /\b\w+nn\b/ : 581.00 ms | 262 /[a-q][^u-z]{13}x/ : 413.00 ms | 4094 /Tom|Sawyer|Huckleberry|Finn/ : 28.00 ms | 2598 /(?i)Tom|Sawyer|Huckleberry|Finn/ : 217.00 ms | 4152 /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ : 2632.00 ms | 2598 /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ : 2785.00 ms | 1976 /Tom.{10,25}river|river.{10,25}Tom/ : 50.00 ms | 2 /[a-zA-Z]+ing/ : 759.00 ms | 78423 /\s[a-zA-Z]{0,12}ing\s/ : 563.00 ms | 49659 /([A-Za-z]awyer|[A-Za-z]inn)\s/ : 699.00 ms | 209 /["'][^"']{0,30}[?!\.]["']/ : 52.00 ms | 8885 Total Time: 9179.00 ms ============================================================================== Delphi's own TRegEx with JIT: /Twain/ : 11.00 ms | 811 /(?i)Twain/ : 14.00 ms | 965 /[a-z]shing/ : 12.00 ms | 1540 /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ : 3.00 ms | 262 /\b\w+nn\b/ : 126.00 ms | 262 /[a-q][^u-z]{13}x/ : 154.00 ms | 4094 /Tom|Sawyer|Huckleberry|Finn/ : 22.00 ms | 2598 /(?i)Tom|Sawyer|Huckleberry|Finn/ : 61.00 ms | 4152 /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ : 277.00 ms | 2598 /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ : 346.00 ms | 1976 /Tom.{10,25}river|river.{10,25}Tom/ : 12.00 ms | 2 /[a-zA-Z]+ing/ : 84.00 ms | 78423 /\s[a-zA-Z]{0,12}ing\s/ : 156.00 ms | 49659 /([A-Za-z]awyer|[A-Za-z]inn)\s/ : 35.00 ms | 209 /["'][^"']{0,30}[?!\.]["']/ : 18.00 ms | 8885 Total Time: 1350.00 ms As you can see the increase in performance is impressive. Credits Many thanks to @Mahdi Safsafi who provided the assembly code for __chkstk and other Delphi developers (@Kas Ob., @David Heffernan, @Remy Lebeau, @Fr0sT.Brutal) that have provided useful advice. 6 2 Share this post Link to post
Edwin Yip 154 Posted July 25, 2020 Well done! Maybe it's easier to deploy if it's possible to patch at runtime with DDetours Share this post Link to post
pyscripter 689 Posted July 25, 2020 8 hours ago, Edwin Yip said: Maybe it's easier to deploy if it's possible to patch at runtime with DDetours Not possible I am afraid and not easier. You just need to add the patched System.RegularExpressionsAPI to your project uses clause. Share this post Link to post
pyscripter 689 Posted July 25, 2020 There is an RSP https://quality.embarcadero.com/browse/RSP-21733 that asks for PCRE JIT support in the Delphi RTL. If you want the extra PCRE performance please vote for this. Off-topic. What does RSP stand for? 1 Share this post Link to post
dummzeuch 1505 Posted July 25, 2020 54 minutes ago, pyscripter said: Off-topic. What does RSP stand for? Easy: Revolutionary Socialist Party 😉 1 Share this post Link to post
Mahdi Safsafi 225 Posted July 25, 2020 @pyscripter I just made a benchmark against Perl regex. Yes its weird to compare a compiler vs interpreter ... but Perl beats Delphi own & study regex. However it failed to bit jit regex. A script and detailed result are attached. Benchmark.rar Share this post Link to post
pyscripter 689 Posted July 25, 2020 (edited) 1 hour ago, Mahdi Safsafi said: @pyscripter I just made a benchmark against Perl regex. Yes its weird to compare a compiler vs interpreter ... but Perl beats Delphi own & study regex. However it failed to bit jit regex. A script and detailed result are attached. Benchmark.rar Here it is not an issue of compiler/interpreter because PERL is using a C library anyway. Perl uses its own regular expression engine not PCRE. There are many other options, Google RE, Intel's Hyperscan to mention a couple. PCRE+JIT is one of the fastest around and most widely used. I should not forget to mention FLRE written in Pascal, which is probably the fastest of all! Sometime ago I had posted in Google+ a benchmark comparing many of the above, But I not sure how to access Google+ material. Also note that the number of matches differs between Delphi and Perl. This is not surprising, because there are subtle differences in how these engines work. Edited July 25, 2020 by pyscripter Share this post Link to post
Edwin Yip 154 Posted July 25, 2020 2 hours ago, pyscripter said: Not possible I am afraid and not easier. You just need to add the patched System.RegularExpressionsAPI to your project uses clause. But the patch is only available for a single specific version of Delphi Share this post Link to post
pyscripter 689 Posted July 25, 2020 1 minute ago, Edwin Yip said: But the patch is only available for a single specific version of Delphi You can apply it manually to your own version (it only takes a couple of minutes, because the changes are few) and you only need to do it once). You can then submit a pull request with the patch in your Delphi version. Share this post Link to post
pyscripter 689 Posted July 25, 2020 1 hour ago, dummzeuch said: Easy: Revolutionary Socialist Party 😉 lol. It reminded me of the Ecumenical Liberation Army from the Network film. Great film by the way. Share this post Link to post
Stefan Glienke 2002 Posted July 25, 2020 RAD Studio Project - every Jira project has a unique code 1 Share this post Link to post
Guest Posted July 25, 2020 26 minutes ago, Stefan Glienke said: RAD I know this one and it is easy The Reich Labour Service (Reichsarbeitsdienst; RAD) https://en.wikipedia.org/wiki/RAD Share this post Link to post
Georgge Bakh 29 Posted July 26, 2020 Did someone compare it to https://github.com/BeRo1985/flre ? Share this post Link to post
Tommi Prami 130 Posted July 29, 2020 On 7/26/2020 at 7:11 PM, Georgge Bakh said: Did someone compare it to https://github.com/BeRo1985/flre ? That flre is interesting, should check that out to do some testing with it. Share this post Link to post
pyscripter 689 Posted July 29, 2020 (edited) FLRE is the fastest of all! One single pascal file which is good The author is a kind of genius But not as extensively tested as PCRE poorly documented not very well supported the performance gain compared to PCRE+JIT is not that big System.RegularExpressions has a nicer interface All-in-all I would stick to System.RegularExpressions at least if you add JIT. Edited July 29, 2020 by pyscripter 2 Share this post Link to post