Guest Posted July 24, 2020

Just now, pyscripter said:
    I would prefer to not mess around with the c files and I think that pcre_jit_compile needs the stack checking anyway.

I agree, and you don't need to. The switch that matters here is /Gs, not /GS. I don't see #pragma check_stack(on) in the code, so there is no need to mess with the code.

Also, there is a pcre configuration option that might be worth testing for the exception and stack problems:

Quote
    $PACKAGE-$VERSION configuration summary:

        Install prefix .................. : ${prefix}
        C preprocessor .................. : ${CPP}
        C compiler ...................... : ${CC}
        C++ preprocessor ................ : ${CXXCPP}
        C++ compiler .................... : ${CXX}
        Linker .......................... : ${LD}
        C preprocessor flags ............ : ${CPPFLAGS}
        C compiler flags ................ : ${CFLAGS} ${VISIBILITY_CFLAGS}
        C++ compiler flags .............. : ${CXXFLAGS} ${VISIBILITY_CXXFLAGS}
        Linker flags .................... : ${LDFLAGS}
        Extra libraries ................. : ${LIBS}

        Build 8 bit pcre library ........ : ${enable_pcre8}
        Build 16 bit pcre library ....... : ${enable_pcre16}
        Build 32 bit pcre library ....... : ${enable_pcre32}
        Build C++ library ............... : ${enable_cpp}
        Enable JIT compiling support .... : ${enable_jit}
        Enable UTF-8/16/32 support ...... : ${enable_utf}
        Unicode properties .............. : ${enable_unicode_properties}
        Newline char/sequence ........... : ${enable_newline}
        \R matches only ANYCRLF ......... : ${enable_bsr_anycrlf}
        EBCDIC coding ................... : ${enable_ebcdic}
        EBCDIC code for NL .............. : ${ebcdic_nl_code}
        Rebuild char tables ............. : ${enable_rebuild_chartables}
        Use stack recursion ............. : ${enable_stack_for_recursion}
        POSIX mem threshold ............. : ${with_posix_malloc_threshold}
        Internal link size .............. : ${with_link_size}
        Nested parentheses limit ........ : ${with_parens_nest_limit}
        Match limit ..................... : ${with_match_limit}
        Match limit recursion ........... : ${with_match_limit_recursion}
        Build shared libs ............... : ${enable_shared}
        Build static libs ............... : ${enable_static}
        Use JIT in pcregrep ............. : ${enable_pcregrep_jit}
        Buffer size for pcregrep ........ : ${with_pcregrep_bufsize}
        Link pcregrep with libz ......... : ${enable_pcregrep_libz}
        Link pcregrep with libbz2 ....... : ${enable_pcregrep_libbz2}
        Link pcretest with libedit ...... : ${enable_pcretest_libedit}
        Link pcretest with libreadline .. : ${enable_pcretest_libreadline}
        Valgrind support ................ : ${enable_valgrind}
        Code coverage ................... : ${enable_coverage}

    EOF

Quote
    --disable-stack-for-recursion   don't use stack recursion when matching
Mahdi Safsafi 225 Posted July 24, 2020

7 minutes ago, pyscripter said:
    Could anyone help translate the LLVM code to Delphi PLEASE? The prize is that everyone gets a much faster System.RegularExpressions and we may even convince Embarcadero to implement the changes.

I'd help ... I'm just checking other options.
Mahdi Safsafi 225 Posted July 24, 2020 (edited)

@pyscripter Anyway, I compiled (using CLANG) a simple C program that I tweaked to use chkstk. Then I dumped the function and edited it to be compatible with Delphi:

    procedure chkstk();
    asm
      .NOFRAME
      sub rsp, $10
      mov [rsp], r10
      mov [rsp+8], r11
      xor r11,r11
      lea r10, [rsp+$18]
      sub r10,rax
      cmovb r10,r11
      mov r11, qword ptr gs:[$10]
      cmp r10,r11
      db $f2
      jae @@L1
      and r10w,$F000
    @@L2:
      lea r11, [r11-$1000]
      mov byte [r11],0
      cmp r10,r11
      db $f2
      jne @@L2
    @@L1:
      mov r10, [rsp]
      mov r11, [rsp+8]
      add rsp, $10
      db $f2
      ret
    end;

EDIT: I didn't pay attention to the GAS syntax for branch and interpreted it as hex ... My bad.

Edited July 24, 2020 by Mahdi Safsafi
Guest Posted July 24, 2020

Found it, and you were almost right: /Gs100000000 did it! The problem is in a single call, jit_machine_stack_exec.
pyscripter 689 Posted July 24, 2020 (edited)

27 minutes ago, Mahdi Safsafi said:
    @pyscripter The ___chkstk_ms function that you posted is incomplete !!! ja 0x2b is jumping to some location that is not within the function body ! Perhaps DEFINE_COMPILERRT_FUNCTION is doing something here.

        DEFINE_COMPILERRT_FUNCTION(___chkstk_ms)
                push   %rcx
                push   %rax
                cmp    $0x1000,%rax
                lea    24(%rsp),%rcx
                jb     1f
        2:
                sub    $0x1000,%rcx
                test   %rcx,(%rcx)
                sub    $0x1000,%rax
                cmp    $0x1000,%rax
                ja     2b
        1:
                sub    %rax,%rcx
                test   %rcx,(%rcx)
                pop    %rax
                pop    %rcx
                ret
        END_COMPILERRT_FUNCTION(___chkstk_ms)

    Anyway, I compiled (using CLANG) a simple C program that I tweaked to use chkstk. Then I dumped the function and edited it to be compatible with Delphi:

        procedure chkstk();
        asm
          .NOFRAME
          sub rsp, $10
          mov [rsp], r10
          mov [rsp+8], r11
          xor r11,r11
          lea r10, [rsp+$18]
          sub r10,rax
          cmovb r10,r11
          mov r11, qword ptr gs:[$10]
          cmp r10,r11
          db $f2
          jae @@L1
          and r10w,$F000
        @@L2:
          lea r11, [r11-$1000]
          mov byte [r11],0
          cmp r10,r11
          db $f2
          jne @@L2
        @@L1:
          mov r10, [rsp]
          mov r11, [rsp+8]
          add rsp, $10
          db $f2
          ret
        end;

That works and produces the correct results!! Thanks. So win64 is complete and I will now try to get win32 right.

I would love to know what this function is doing though... What I understand is that it just "touches" the stack, and if an _XCPT_GUARD_PAGE_VIOLATION error occurs the OS traps that and grows the stack. In the event of failure, the OS raises the _XCPT_UNABLE_TO_GROW_STACK exception. Sounds like magic.

Edited July 24, 2020 by pyscripter
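To make the "touching" less magical, here is a minimal C sketch of the loop in the 64-bit routine above. It is only a model under stated assumptions: the function name pages_probed and its parameters are hypothetical, no memory is actually accessed, and limit stands in for the value the assembly reads from gs:[$10] (the lowest committed stack address of the thread). The idea is that Windows commits stack pages on demand through a guard page sitting just below the committed region, so a frame larger than one page must touch every page in descending order or it could jump past the guard page.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u  /* 4 KiB pages, matching the $1000 step in the asm */

/* Hypothetical model of the probe loop: returns how many pages the routine
   would touch. sp is the stack pointer at entry, limit the lowest committed
   stack address, size the requested frame size (passed in rax). */
static unsigned pages_probed(uint64_t sp, uint64_t limit, uint64_t size)
{
    assert(limit % PAGE_SIZE == 0);                  /* the stack limit is page aligned */
    uint64_t target = (sp >= size) ? sp - size : 0;  /* sub + cmovb: clamp to 0 on borrow */
    if (target >= limit)
        return 0;                                    /* frame fits in committed pages */
    target &= ~(uint64_t)(PAGE_SIZE - 1);            /* round down to a page boundary */
    unsigned touched = 0;
    while (limit != target) {                        /* walk down one page at a time */
        limit -= PAGE_SIZE;
        touched++;                                   /* the asm writes a byte at [limit] here */
    }
    return touched;
}
```

For example, with sp = 0x20000 and limit = 0x1F000, a 0x100-byte frame needs no probe, while a 0x3000-byte frame touches two pages. Each touch of the guard page is what lets the OS commit it and move the guard one page further down.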
Guest Posted July 24, 2020

@pyscripter Out of curiosity, why are you using the 16-bit version of pcre? Are the 8-bit and 32-bit versions slower than the 16-bit one?

The total size of the 32-bit obj files is 380 KB:
    -DPCRE_BUILD_PCRE32=ON -DPCRE_BUILD_PCRE8=OFF
while the total size of the 16-bit obj files is 409 KB:
    -DPCRE_BUILD_PCRE16=ON -DPCRE_BUILD_PCRE8=OFF
pyscripter 689 Posted July 24, 2020

Just now, Kas Ob. said:
    @pyscripter Out of curiosity, why are you using the 16-bit version of pcre? Are the 8-bit and 32-bit versions slower than the 16-bit one?
    The total size of the 32-bit obj files is 380 KB:
        -DPCRE_BUILD_PCRE32=ON -DPCRE_BUILD_PCRE8=OFF
    while the total size of the 16-bit obj files is 409 KB:
        -DPCRE_BUILD_PCRE16=ON -DPCRE_BUILD_PCRE8=OFF

PCRE16 is what Delphi uses, and for a reason: it avoids conversions between Delphi strings and PCRE strings.
Mahdi Safsafi 225 Posted July 24, 2020

Quote
    That works and produces the correct results!! Thanks. So win64 is complete and I will now try to get win32 right.

From what I understood from your comment, you're going to compile using MSVC and then use some tools to convert the output? Why not give bcc32c a chance? If it does not work, you can simply fall back to plan B.
pyscripter 689 Posted July 24, 2020 (edited)

19 minutes ago, Mahdi Safsafi said:
    From what I understood from your comment, you're going to compile using MSVC and then use some tools to convert the output? Why not give bcc32c a chance? If it does not work, you can simply fall back to plan B.

There are no conversions needed for Win64. For Win32 I have the issue of Windows calls (such as __imp__VirtualFree@12). Fortunately there is just one obj file that makes such calls and I will find a way to work around it (see earlier posts).

Why not use bcc32c...
    I only have access to the free tools, which only compile x86 code.
    pcre has a number of configuration steps and options, and cmake makes it very easy and convenient to automate everything. It would be very hard to do that correctly by hand.
    cmake has a generator for bcc32c, but it is rather new and I am not sure how well tested it is.
    Sadly, I have more trust in the Visual Studio compiler, especially for code that requires stack checking. Does bcc32c do that?

When I finish I will release the stuff on GitHub, so people can take a look and make adjustments. Embarcadero can use their own compiler if they wish.

Edited July 24, 2020 by pyscripter
Mahdi Safsafi 225 Posted July 24, 2020

4 minutes ago, pyscripter said:
    There are no conversions needed for Win64.

Yeah, I know that. I proposed earlier that you use MSVC for x64 and bcc32c for x86.

Quote
    Sadly, I have more trust in the Visual Studio compiler, especially for code that requires stack checking. Does bcc32c do that?

Sadly, bcc32c (at least the free version) is built on an old LLVM compiler.

Quote
    When I finish I will release the stuff on GitHub, so people can take a look and make adjustments. Embarcadero can use their own compiler if they wish.

Good luck! If you need some help, let me know.
pyscripter 689 Posted July 24, 2020

2 minutes ago, Mahdi Safsafi said:
    Good luck! If you need some help, let me know.

Well, yes. I need a 32-bit __chkstk. Here is the Visual Studio masm code:

            page    ,132
            title   chkstk - C stack checking routine
    ;***
    ;chkstk.asm - C stack checking routine
    ;
    ;       Copyright (c) Microsoft Corporation. All rights reserved.
    ;
    ;Purpose:
    ;       Provides support for automatic stack checking in C procedures
    ;       when stack checking is enabled.
    ;
    ;*******************************************************************************

    .xlist
            include vcruntime.inc
    .list

    ; size of a page of memory

    _PAGESIZE_      equ     1000h

            CODESEG

    page
    ;***
    ;_chkstk - check stack upon procedure entry
    ;
    ;Purpose:
    ;       Provide stack checking on procedure entry. Method is to simply probe
    ;       each page of memory required for the stack in descending order. This
    ;       causes the necessary pages of memory to be allocated via the guard
    ;       page scheme, if possible. In the event of failure, the OS raises the
    ;       _XCPT_UNABLE_TO_GROW_STACK exception.
    ;
    ;       NOTE: Currently, the (EAX < _PAGESIZE_) code path falls through
    ;       to the "lastpage" label of the (EAX >= _PAGESIZE_) code path. This
    ;       is small; a minor speed optimization would be to special case
    ;       this up top. This would avoid the painful save/restore of
    ;       ecx and would shorten the code path by 4-6 instructions.
    ;
    ;Entry:
    ;       EAX = size of local frame
    ;
    ;Exit:
    ;       ESP = new stackframe, if successful
    ;
    ;Uses:
    ;       EAX
    ;
    ;Exceptions:
    ;       _XCPT_GUARD_PAGE_VIOLATION - May be raised on a page probe. NEVER TRAP
    ;                                    THIS!!!! It is used by the OS to grow the
    ;                                    stack on demand.
    ;       _XCPT_UNABLE_TO_GROW_STACK - The stack cannot be grown. More precisely,
    ;                                    the attempt by the OS memory manager to
    ;                                    allocate another guard page in response
    ;                                    to a _XCPT_GUARD_PAGE_VIOLATION has
    ;                                    failed.
    ;
    ;*******************************************************************************

    public  _alloca_probe

    _chkstk proc

    _alloca_probe = _chkstk

            push    ecx

    ; Calculate new TOS.

            lea     ecx, [esp] + 8 - 4      ; TOS before entering function + size for ret value
            sub     ecx, eax                ; new TOS

    ; Handle allocation size that results in wraparound.
    ; Wraparound will result in StackOverflow exception.

            sbb     eax, eax                ; 0 if CF==0, ~0 if CF==1
            not     eax                     ; ~0 if TOS did not wrapped around, 0 otherwise
            and     ecx, eax                ; set to 0 if wraparound

            mov     eax, esp                ; current TOS
            and     eax, not ( _PAGESIZE_ - 1) ; Round down to current page boundary

    cs10:
            cmp     ecx, eax                ; Is new TOS
            bnd jb  short cs20              ; in probed page?
            mov     eax, ecx                ; yes.
            pop     ecx
            xchg    esp, eax                ; update esp
            mov     eax, dword ptr [eax]    ; get return address
            mov     dword ptr [esp], eax    ; and put it at new TOS
            bnd ret

    ; Find next lower page and probe
    cs20:
            sub     eax, _PAGESIZE_         ; decrease by PAGESIZE
            test    dword ptr [eax],eax     ; probe page.
            jmp     short cs10

    _chkstk endp

            end
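The sub / sbb / not / and sequence in the listing is a branch-free way of clamping the new top of stack to 0 when the subtraction wraps around. A small C sketch of just that step, plus the page rounding, might look like the following; the names new_tos and page_floor are hypothetical, and the carry flag is modelled with an ordinary comparison.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u  /* _PAGESIZE_ in the masm source */

/* Model of "sub ecx, eax / sbb eax, eax / not eax / and ecx, eax":
   returns tos - size, or 0 if the subtraction wrapped around. */
static uint32_t new_tos(uint32_t tos, uint32_t size)
{
    uint32_t diff  = tos - size;                   /* sub: may wrap modulo 2^32 */
    uint32_t carry = (tos < size) ? UINT32_MAX : 0; /* sbb eax, eax: ~0 if CF was set */
    return diff & ~carry;                          /* not + and: zero the result on wrap */
}

/* Model of "and eax, not (_PAGESIZE_ - 1)": round down to the page start. */
static uint32_t page_floor(uint32_t addr)
{
    assert((PAGE_SIZE & (PAGE_SIZE - 1)) == 0);    /* mask trick needs a power of two */
    return addr & ~(uint32_t)(PAGE_SIZE - 1);
}
```

With a top of stack at 0x00100000 and a 0x2000-byte frame, new_tos gives 0x000FE000. An oversized request gives 0, so the probe loop keeps walking down until it hits memory the OS cannot commit, which is where the stack overflow exception comes from.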
David Heffernan 2345 Posted July 24, 2020

I always turn off stack checking for this reason. Wouldn't it just be better to put it in a DLL?
Guest Posted July 24, 2020

@pyscripter /Gs100000000 removed the need for chkstk in both the 32-bit and 64-bit builds, as far as I can see with my VS 2017. I am not sure why the protection is needed here: only 2 parameters are passed on the stack, so to hit the stack limit the function would have to be called recursively about 128000 times with a 1 MB stack on 32-bit. Maybe you should consider removing it.

On the other hand, you can simply disable recursion to begin with, using this option (shown here with its default value, OFF):

    SET(PCRE_NO_RECURSE OFF CACHE BOOL "If ON, then don't use stack recursion when matching. See NO_RECURSE in config.h.in for details.")

and then use /Gs100000000 to remove the chkstk call.
Mahdi Safsafi 225 Posted July 24, 2020

2 minutes ago, Kas Ob. said:
    @pyscripter /Gs100000000 did remove the need for chkstk for both 32 and 64 bit, as i see on my VS 2017, now i am not sure why the protection should be needed here as the parameters passed in stack are only to 2 and to hit the stack limits, it should be recursively called 128000 times on 32bit 1mb stack, so may be you should consider remove it.
    Now on other hand you can simply disable recursion to begin with
        SET(PCRE_NO_RECURSE OFF CACHE BOOL "If ON, then don't use stack recursion when matching. See NO_RECURSE in config.h.in for details.")
    then use that /Gs100000000 to remove chkstk call

Wow ... that's 100 MB!
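The two numbers in this exchange can be checked with a little arithmetic. The sketch below is only an illustration with hypothetical helper names. It assumes 8 bytes per call (two 4-byte parameters, ignoring the return address and saved registers, so the real depth would be lower), and it models the documented /Gs rule that a probe is emitted only when a function's locals exceed the given size.

```c
#include <assert.h>
#include <stddef.h>

/* Rough upper bound on nesting depth for a fixed cost per call. */
static size_t calls_until_overflow(size_t stack_bytes, size_t bytes_per_call)
{
    assert(bytes_per_call > 0);
    return stack_bytes / bytes_per_call;
}

/* Model of the /Gs<size> rule: a stack probe is inserted only for
   functions whose locals need more than gs_threshold bytes. */
static int needs_probe(size_t frame_bytes, size_t gs_threshold)
{
    return frame_bytes > gs_threshold;
}
```

A 1 MiB stack at 8 bytes per call gives 131072 calls, which is where the "128000" (128 K) figure comes from. With /Gs100000000 the threshold is 100,000,000 bytes, roughly 95 MiB, so no realistic frame ever qualifies for a probe, whereas with a one-page threshold of 4096 bytes an 8 KB frame would.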
pyscripter 689 Posted July 24, 2020 (edited)

3 hours ago, pyscripter said:
    I understand that a DLL is a safer and easier option, but this is meant to be an enhancement of System.RegularExpressions which uses static linking of obj files.

@David Heffernan I explained why I am linking C code earlier...

@Kas Ob. I would rather use the default options if I can. For example, I am not sure of the performance implications of PCRE_NO_RECURSE. If nothing else works, I will try it.

Edited July 24, 2020 by pyscripter
Guest Posted July 24, 2020

Before you make up your mind, please hear me out, and forgive me for repeating the explanation.

There are two settings for this stack protection:

Quote
    SET(PCRE_MATCH_LIMIT_RECURSION "MATCH_LIMIT" CACHE STRING "Default limit on internal recursion. See MATCH_LIMIT_RECURSION in config.h.in for details.")
    SET(PCRE_NO_RECURSE OFF CACHE BOOL "If ON, then don't use stack recursion when matching. See NO_RECURSE in config.h.in for details.")

The default for the limit comes from:

Quote
    #define MATCH_LIMIT 10000000
    #define MATCH_LIMIT_RECURSION MATCH_LIMIT

Those belong to pcre itself, and they are already applied and in effect, whereas chkstk comes from the JIT engine, which has no such limit, and is enforced by VS. So I really don't see a reason to be afraid of a stack overflow with pcre's own default settings.
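As an illustration of what a recursion limit such as MATCH_LIMIT_RECURSION buys, here is a toy C sketch. It is not PCRE's matcher: toy_match and TOY_ERROR_RECURSIONLIMIT are hypothetical names, and the value -21 is only meant to echo PCRE_ERROR_RECURSIONLIMIT as I recall it from pcre.h, so treat it as an assumption. The point is that a depth counter turns runaway recursion into an error code long before the stack itself is exhausted.

```c
#include <assert.h>

#define TOY_ERROR_RECURSIONLIMIT (-21)  /* hypothetical; echoes PCRE_ERROR_RECURSIONLIMIT */

/* Toy recursive routine: descends depth_needed levels and gives up with an
   error as soon as more than `limit` nested calls would be required. */
static int toy_match(unsigned depth_needed, unsigned depth, unsigned limit)
{
    assert(limit < 100000u);            /* keep the toy itself from overflowing the stack */
    if (depth > limit)
        return TOY_ERROR_RECURSIONLIMIT;
    if (depth_needed == 0)
        return 1;                       /* "matched" */
    return toy_match(depth_needed - 1, depth + 1, limit);
}
```

Calling toy_match(10, 0, 100) succeeds, while toy_match(200, 0, 100) stops at depth 101 and reports the limit error instead of recursing further.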
Mahdi Safsafi 225 Posted July 24, 2020

@pyscripter Here is the x86 function. But it's really weird: the original version shipped with Delphi declares it as empty, and it works. I don't understand why it crashed for you when you defined it as empty.

@Kas Ob's point is very interesting. I didn't read the configuration documentation, but he definitely did ... Perhaps he is right. I suggest that you check that point with him. And who knows ... you may not need to use chkstk.

chkstk.txt
pyscripter 689 Posted July 24, 2020 (edited)

@Mahdi Safsafi Thanks!! Now it works in 32-bits as well.

32-bit benchmark

    Pattern : Time | Match count
    ==============================================================================
    Delphi's own TRegEx:
    /Twain/ : 6.00 ms | 811
    /(?i)Twain/ : 41.00 ms | 965
    /[a-z]shing/ : 448.00 ms | 1540
    /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ : 528.00 ms | 262
    /\b\w+nn\b/ : 634.00 ms | 262
    /[a-q][^u-z]{13}x/ : 543.00 ms | 4094
    /Tom|Sawyer|Huckleberry|Finn/ : 896.00 ms | 2598
    /(?i)Tom|Sawyer|Huckleberry|Finn/ : 1063.00 ms | 4152
    /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ : 2980.00 ms | 2598
    /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ : 3088.00 ms | 1976
    /Tom.{10,25}river|river.{10,25}Tom/ : 528.00 ms | 2
    /[a-zA-Z]+ing/ : 918.00 ms | 78423
    /\s[a-zA-Z]{0,12}ing\s/ : 548.00 ms | 49659
    /([A-Za-z]awyer|[A-Za-z]inn)\s/ : 775.00 ms | 209
    /["'][^"']{0,30}[?!\.]["']/ : 312.00 ms | 8885
    Total Time: 13321.00 ms
    ==============================================================================
    Delphi's own TRegEx with Study:
    /Twain/ : 6.00 ms | 811
    /(?i)Twain/ : 40.00 ms | 965
    /[a-z]shing/ : 318.00 ms | 1540
    /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ : 24.00 ms | 262
    /\b\w+nn\b/ : 619.00 ms | 262
    /[a-q][^u-z]{13}x/ : 450.00 ms | 4094
    /Tom|Sawyer|Huckleberry|Finn/ : 31.00 ms | 2598
    /(?i)Tom|Sawyer|Huckleberry|Finn/ : 256.00 ms | 4152
    /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ : 2875.00 ms | 2598
    /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ : 2905.00 ms | 1976
    /Tom.{10,25}river|river.{10,25}Tom/ : 63.00 ms | 2
    /[a-zA-Z]+ing/ : 829.00 ms | 78423
    /\s[a-zA-Z]{0,12}ing\s/ : 569.00 ms | 49659
    /([A-Za-z]awyer|[A-Za-z]inn)\s/ : 685.00 ms | 209
    /["'][^"']{0,30}[?!\.]["']/ : 56.00 ms | 8885
    Total Time: 9746.00 ms
    ==============================================================================
    Delphi's own TRegEx with JIT:
    /Twain/ : 12.00 ms | 811
    /(?i)Twain/ : 13.00 ms | 965
    /[a-z]shing/ : 13.00 ms | 1540
    /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ : 3.00 ms | 262
    /\b\w+nn\b/ : 189.00 ms | 262
    /[a-q][^u-z]{13}x/ : 154.00 ms | 4094
    /Tom|Sawyer|Huckleberry|Finn/ : 22.00 ms | 2598
    /(?i)Tom|Sawyer|Huckleberry|Finn/ : 64.00 ms | 4152
    /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ : 371.00 ms | 2598
    /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ : 497.00 ms | 1976
    /Tom.{10,25}river|river.{10,25}Tom/ : 12.00 ms | 2
    /[a-zA-Z]+ing/ : 105.00 ms | 78423
    /\s[a-zA-Z]{0,12}ing\s/ : 197.00 ms | 49659
    /([A-Za-z]awyer|[A-Za-z]inn)\s/ : 39.00 ms | 209
    /["'][^"']{0,30}[?!\.]["']/ : 19.00 ms | 8885
    Total Time: 1730.00 ms

I will release it on GitHub (watch for the announcement). Then, @Kas Ob., you can try to get rid of __chkstk and see what happens.

Edited July 24, 2020 by pyscripter
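For a quick reading of the totals above, the speed-up is just the ratio of the reported total times. The helper below is a hypothetical convenience, not part of the benchmark program.

```c
#include <assert.h>

/* Speed-up factor of a faster run relative to a baseline, both in ms. */
static double speedup(double baseline_ms, double faster_ms)
{
    assert(faster_ms > 0.0);
    return baseline_ms / faster_ms;
}
```

Using the reported totals, speedup(13321.0, 1730.0) is about 7.7 for JIT against plain TRegEx, and speedup(9746.0, 1730.0) is about 5.6 for JIT against TRegEx with Study.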
pyscripter 689 Posted July 24, 2020

Here it is: https://github.com/pyscripter/Pcre-Jit-Delphi

Enjoy!