Arnaud Bouchez


Everything posted by Arnaud Bouchez

  1. Arnaud Bouchez

    tiny computer for Delphi apps

    6GB is plenty for Windows 10, as long as you don't run VMs on it. And the Delphi IDE won't use more than 2GB for sure, since it is still a 32-bit application. This mini PC would run Delphi very well - the main bottleneck for an IDE is the disk, and with a good M.2 SSD I don't see why it would be slow. Perhaps the default eMMC storage is not optimal, but I guess it would work well enough. From what I saw, the slowest part of the IDE is the copy protection check at startup... at least if you use Andy's FixPack. 🙂 Such computers are powerful enough for Delphi. Perhaps not for Visual Studio with a lot of plugins.
  2. The new features of this compiler are just untested/unfinished... Even the type helpers are broken with sub-types: if you define TMyInteger = type integer, then you can't use MyInteger.ToString... This non-inheritance is a "feature" which is IMHO wrong. What always works, and is the very same, is to write:

        TUsers = array of TUser;

     > Too bad we are working with Delphi where it does not matter because I can still assign a TUserName to TUserFirstName

     Yes, only var/out parameters have compile-time strong checking... But at least you can read the code and verify its consistency, since the types (and RTTI) are explicit. And you can refer to the type definition comment as documentation. It also helps writing more natural code, by carrying the information in the type itself, not in the parameter name. I prefer:

        function CopySession(source, dest: TSessionID): boolean;
        ...
        property session: TSessionID;

     instead of:

        function CopySession(sourceSessionID, destSessionID: integer): boolean;
        ...
        property sessionID: integer;
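    A minimal sketch of the sub-type helper issue mentioned above (a hypothetical console program; the exact failure depends on the Delphi version):

```pascal
{$APPTYPE CONSOLE}
uses
  System.SysUtils; // brings in the integer type helper

type
  TMyInteger = type integer; // a distinct sub-type of integer

var
  i: integer;
  m: TMyInteger;
begin
  i := 42;
  WriteLn(i.ToString);    // fine: the helper applies to integer
  m := 42;
  // WriteLn(m.ToString); // rejected on affected versions: the helper
  // is not inherited by the distinct sub-type
  WriteLn(integer(m).ToString); // hypothetical workaround: cast back
end.
```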
  3. Strong typing may be a good idea in some contexts. For instance, in DDD (Domain Driven Design) you should better define your genuine types. Instead of writing:

        type
          TUser = record
            Name: string;
            FirstName: string;
          end;

     You should rather define:

        type
          TUserName = type string;
          TUserFirstName = type string;
          TUser = record
            Name: TUserName;
            FirstName: TUserFirstName;
          end;
          TUsers = type TArray<TUser>;

     and so on for any service methods. Such strong typing (T* = type ### defines its own strong type) helps maintaining complex code. I always remind people of the https://en.wikipedia.org/wiki/Mars_Climate_Orbiter disaster. A strong type (as there would have been if Ada had still been used for the software) would have ensured that the force computation used the same unit across modules (imperial vs metric systems). Specific types may also help e.g. for "Design by Contract". Of course, for a DTO you may just use plain string/TArray<string>. But if you want maintainable long-term code, consider defining your own types. See e.g. http://blog.synopse.info/post/2019/09/18/Introducing-Kingdom-Driven-Design-at-EKON-23
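    To make the var/out compile-time checking mentioned above concrete, here is a hedged sketch (procedure and variable names are mine, not from any framework):

```pascal
type
  TUserName = type string;
  TUserFirstName = type string;

procedure SetName(var Dest: TUserName; const Value: TUserName);
begin
  Dest := Value;
end;

var
  n: TUserName;
  f: TUserFirstName;
begin
  n := 'Bouchez';
  f := n;           // compiles: plain assignments are not strongly checked
  // SetName(f, n); // does not compile: a var parameter must be TUserName
end.
```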
  4. Arnaud Bouchez

    Free SQLite -> Interbase tool?

    You may also try our SynDBExplorer tool. If you can serve the Interbase ToGo DB with an Interbase server, connect to it, then choose the "Table Export" feature: it will create the SQLite3 file for you. Note that if you use the SQL-to-text dump conversion, I guess you don't need to change anything in the CREATE TABLE statements. The SQLite3 syntax is very relaxed, thanks to its "column affinity" feature.
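    As a hedged illustration of the "column affinity" point above: SQLite3 accepts non-native type names in CREATE TABLE and simply maps them to an affinity, so an Interbase-style schema usually needs no rewriting (table and column names below are invented):

```sql
CREATE TABLE customer (
  id INTEGER PRIMARY KEY,
  name VARCHAR(80),       -- not a native SQLite3 type: mapped to TEXT affinity
  balance NUMERIC(15,2),  -- mapped to NUMERIC affinity
  created TIMESTAMP       -- mapped to NUMERIC affinity
);
```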
  5. Arnaud Bouchez

    Free SQLite -> Interbase tool?

    If you crash your drive with a hammer, you would also lose all your data. The SQLite3 reference article is really paranoid, and its default settings are aircraft-level secure. If you have exclusive access to the SQLite3 DB, then most of the database corruption problems disappear. To be fair, 'enterprise' DBs don't synch to disk at every transaction either. I have had Oracle databases unable to mount at all after a power failure, whereas SQLite3 may lose some data, but can almost always reopen its SQLite3 file. We have used those settings on production DBs for years, with TBs of processed data and billions of inserts/selects, with no data loss (only the journal mode was left at its default). The OP was talking about a mobile app, where I doubt such paranoia is required.
  6. Arnaud Bouchez

    Free SQLite -> Interbase tool?

    Export/import as SQL? Only the CREATE TABLE statements may need some manual adjustment, but the INSERTs should work directly. The sqlite3 command-line tool has a .dump command - just copy the SQLite3 DB file from your mobile to your desktop to dump it. IMHO SQLite3 would be faster than Interbase - it is at least what I saw with Firebird vs SQLite3 on Windows and Linux. Perhaps Interbase has some secret weapon, but I doubt it very much. And using an Open Source and proven solution like SQLite3 is worth it... Also ensure you make a fair comparison between the two: by default, SQLite3 expects a full synch to the storage media, which is the safest, but slowest approach. So ensure you set JOURNAL_MODE=MEMORY, LOCKING_MODE=EXCLUSIVE and SYNCHRONOUS=OFF. Check http://blog.synopse.info/post/2012/07/26/ACID-and-speed
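    The settings above map to the following SQLite3 pragmas (a sketch: run them once per connection, accepting that the last transactions may be lost on a crash or power failure):

```sql
PRAGMA journal_mode = MEMORY;     -- keep the rollback journal in RAM
PRAGMA locking_mode = EXCLUSIVE;  -- assume we are the only connection
PRAGMA synchronous = OFF;         -- no fsync at each transaction commit
```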
  7. Arnaud Bouchez

    Experience/opinions on FastMM5

    @abak My advice is to switch to FastMM5 only if 1. you actually tested and saw a noticeable (wall-clock) performance improvement, and 2. you are OK with the licence terms. I doubt point 1 will happen in most cases, i.e. if your app is not heavily multi-threaded but is a regular VCL/DB app. Point 2 would require paying for a license if your project is not GPL/LGPL itself.
  8. Arnaud Bouchez

    Experience/opinions on FastMM5

    I don't think alignment is involved in triggering micro-fusion or not. Alignment is just a way to ensure that the CPU instruction decoder is able to fetch as many opcodes as possible: since the CPU is likely to fetch 16 bytes of opcodes at a time, aligning a jump target to 16 bytes may reduce the number of fetches. It is mostly needed for a loop, and could (much more marginally) be beneficial for regular jumps. My reference/bible in that matter is https://www.agner.org/optimize/optimizing_assembly.pdf. But the only true reference is the clock: as you wrote, we need to test/measure, not guess.
  9. Arnaud Bouchez

    Experience/opinions on FastMM5

    @Kas Ob. 1) This modified code is not the same as the initial one, because rdx is modified in between. And the current code is better, since the CPU will macro-fuse the cmp + jmp pair into a single micro-op. 2) It is correct. I will use cmovb here. Thanks! 3) I would never use an undocumented Windows function in production code. There is almost no Sleep() call in my tests, thanks to good spinning, so it won't make any difference in practice. And we focus on Linux, not Windows, for our servers - on which nanosleep is available. Speaking of 100ns resolution is IMHO unrealistic: I suspect there is a context switch otherwise; bigger spinning or calling ThreadSwitch may be just good enough.
  10. Arnaud Bouchez

    Experience/opinions on FastMM5

    You are right: FastMM5 challenged me... and since no one responded to my offer about helping it run on FPC/Linux, and also since I wanted something Open Source but not so restrictive, I created https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.fpcx64mm.pas which is GPL/LGPL and MPL. So you can use it with closed-source software. It uses the same core algorithms as FastMM4. I like it so much, and missed it so much in FPC... 🙂 I was involved in ScaleMM2, and a per-thread arena for small blocks didn't convince me: it tends to consume too much RAM when you have a lot of threads in your process. Note that a threadvar is what the FPC standard MM uses. I wanted to take the best of FastMM4 (which is very proven, stable and efficient), but drive it a little further in terms of multi-threading and code quality. The FastMM4 asm is 32-bit oriented, and its x86_64 version was sometimes not very optimized for this target - just see its abuse of globals, no knowledge of micro-op fusion or of CPU cache lines and locks, and its sparse use of registers. Also, focusing on a single compiler and a single CPU, without porting all the features of FastMM4's pascal mode, helped fpcx64mm appear in only two days. Last but not least, I spent a lot of time this last year in x86_64 assembly, so I know which patterns are expected to be faster. The huge regression test suite of mORMot helps having a proven benchmark - much more aggressive and realistic than the microbenchmarks (like string concatenation in threads, or even the FastCode benchmark) on which most other MMs rely for measurement. When the regression tests are more than twice as fast as with the FPC standard MM on Linux - as @ttomas reported - then we are talking. It runs a lot of different scenarios, with more than 43,000,000 individual tests, and several kinds of HTTP/TCP servers on the loopback, running in-memory or SQLite databases, processing JSON everywhere, with multiple client threads stressing them.
    When I run the test on my Linux machine, I see only a few (less than a dozen) Linux nanosleep system calls (better than the Windows Sleep), and less than 2 ms spent waiting during one minute of heavy tests - and only for Freemem. I really don't like the microbenchmarks used for testing MMs, like the one published in this forum. For instance, IntelTBB is very fast in such benchmarks, but it doesn't release its memory as it should, and it is unusable in practice. I guess that some user code, not written with performance in mind, e.g. abusing str := str + 'something' patterns, would also be more than twice as fast. And if your code has to reallocate huge buffers (>256KB) in a loop, using mremap on Linux may give a huge performance boost, since no data is copied at all - Linux mremap() is much better than what Windows or BSD offer! Yes, huge memory blocks are resized by the Linux kernel by remapping its TLB redirection tables, without copying any memory. No need for AVX512 if you don't copy anything! And plain SSE2 (with non-temporal moves for big buffers) is good enough to saturate the HW memory bandwidth - and faster than ERMS in practice. IMHO there was no need to change the data structures as FastMM5 did - I just tuned/fixed most of its predecessor FastMM4's asm, reserved some additional slots for the smaller blocks (<=80 bytes now have triplets), implemented safe and efficient spinning, implemented some internal instrumentation to catch multi-threading bottlenecks, and then Getmem didn't suffer from contention any more! I knew that FastMM4 plus some tweaks could be faster than anything else - perhaps even FastMM5.
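    For reference, a deliberately naive sketch of the allocation-heavy pattern mentioned above (not mORMot code): each concatenation may reallocate the string, so the memory manager dominates the runtime.

```pascal
var
  s: string;
  i: integer;
begin
  s := '';
  for i := 1 to 100000 do
    // may trigger a heap reallocation and a copy at each iteration
    s := s + 'something';
end.
```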
  11. Arnaud Bouchez

    FastMM5 now released by Pierre le Riche (small background story)

    If I understand correctly, FastMM5 handles several arenas instead of the single one of FastMM4, and tries all of them until one is not currently locked, so thread contention is less likely to happen. One area where FastMM5 may see some improvement is its naive use of "rep movsb", which should rather be a non-temporal SSE2/AVX move for big blocks. Check the numbers at https://stackoverflow.com/a/43574756/458259 for instance. ScaleMM2 and the FPC heap both use a threadvar arena for small blocks, so they don't bother to check for any lock. It is truly lock-free. But each thread maintains its own small-block arena, so it consumes more memory. Other memory managers like Intel TBB or jemalloc also have a similar per-thread approach, but consume much more memory. For instance, TBB is a huge winner in performance, but it consumes up to 60 (sixty!) times more memory! So it is not usable in practice for serious server work - it may help for heavily multi-threaded apps, but not for common services. I tried those other MMs with mORMot and a real, heavily multi-threaded service. Please check the comments at the beginning of https://synopse.info/fossil/artifact/f85c957ff5016106 One big problem with the C memory managers is that they tend to ABORT the process (not an access violation, but SIGABRT) if there is a dangling pointer - which happens sometimes, and is very difficult to track. This paranoia makes them impossible to use in production: you don't want your service to shut down with no prior notice just because the MM complains about a pointer! Our only problem with the FPC heap is that with long-term servers, it tends to fragment the memory and consume some GBs of RAM, whereas a FastMM-like memory manager would have consumed much less. FPC heap memory consumption doesn't leak: it stabilizes after a few days, but it is still higher than FastMM's. The problem with FastMM (both 4 and 5) is that they don't work on Linux x86_64 with FPC. This is why I proposed to help Eric about FPC support.
  12. Arnaud Bouchez

    Experience/opinions on FastMM5

    I didn't see any explicit NUMA support in the source code. I guess the idea is to force the CPU affinity of the process, to avoid NUMA latencies.
  13. Arnaud Bouchez

    FastMM5 now released by Pierre le Riche (small background story)

    What do you think about FPC + Linux support, which is a good environment for multi-threaded servers? The FPC built-in heap is good, but tends to consume a lot of memory with a lot of threads: it maintains small per-thread heaps using a threadvar, whereas FastMM5 uses several arenas which are shared among all threads (I guess the idea is inspired by the ptmalloc/glibc allocator). I tried all the best-known C alternatives, and I was not convinced. The only stable and not bloated memory manager is the one in glibc. But the slightest memory access violation tends to kill/abort the process, so it is not good in production. I could definitely help with the Linux/FPC syscalls and the low-level Intel asm, to include FPC/Linux support in FastMM5. But perhaps I would go in this direction only if using it with the FPC compiler doesn't require a commercial license. What do you think?
  14. Arnaud Bouchez

    Random Access Violation?

    As David wrote, try to make a minimal reproducible example: just a project with ODBC access, running a SELECT query. I thought it may have been some problem with FPU exceptions, which do happen with third-party libraries, but those usually occur in your Delphi code, not in the library code...
  15. Arnaud Bouchez

    Shift-F9 dead

    Perhaps another program is capturing Shift-F9?
  16. Arnaud Bouchez

    'stdcall' for 32-bit vs. 64-bit?

    From the code point of view, there is a single calling convention in Delphi for x86_64. But in practice, on Win64, there may be several calling conventions, e.g. __vectorcall https://en.wikipedia.org/wiki/X86_calling_conventions#Microsoft_vectorcall This is not yet supported by Delphi - and I doubt it will be any day soon, since we have such poor support of vectorization in our compiler. Let's hope a vectorcall; attribute will be implemented someday in Delphi! 🙂 It is supported by FPC - which seems more advanced in terms of optimization and vectorization: https://wiki.freepascal.org/FPC_New_Features_3.2#Support_for_Microsoft.27s_vectorcall_calling_convention Last note: for regular calls, there is a single calling convention per ABI on 64-bit, but the ABI itself (i.e. how parameters are passed and the stack is handled) is not the same on Windows and POSIX. The Windows 64-bit ABI differs from the Linux/POSIX/System V 64-bit ABI on x86_64. Check https://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_calling_conventions It won't affect regular pascal code. But if you mess with assembly, ensure you use the right registers and stack alignment!
  17. Arnaud Bouchez

    language updates in 10.4?

    What do you mean? That you can assign a string to a TNullableInteger? At least under FPC they are type-safe: you are required to call NullableInteger(1234) to fill a TNullableInteger variable - you can't write aNullableInteger := 'toto' - or even aNullableInteger := 1234. Delphi is more relaxed about implicit conversions, so under Delphi it is indeed not typesafe, since you can write aNullableInteger := 'toto'.
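    As a hedged sketch of the pattern discussed (mirroring the variant-based approach; the exact mORMot declarations may differ):

```pascal
type
  // a distinct variant type, so RTTI can tell it apart from plain variant
  TNullableInteger = type variant;

function NullableInteger(const Value: Int64): TNullableInteger;
begin
  Result := Value; // NULL is represented by an unassigned/null variant
end;

var
  v: TNullableInteger;
begin
  v := NullableInteger(1234); // explicit and type-safe under FPC
  // under Delphi, v := 'toto' would also compile, since implicit
  // variant conversions are more relaxed there
end.
```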
  18. Arnaud Bouchez

    language updates in 10.4?

    The main change to the language would be the full ARC removal. The memory model is really part of the language, to my understanding - just as vital as "class" itself. Pure unfair FUD trolling remark: managed records have been available in FPC trunk for a few months, and I guess EMB doesn't like to be behind an Open Source compiler. 😉 We implemented Nullable types using variants, with integrated support in our ORM. It has the advantage of working since Delphi 6, with low overhead, and good integration with the pascal language. See http://blog.synopse.info/post/2015/09/25/ORM-TNullable*-fields-for-NULL-storage - back from 2015!
  19. We use our https://github.com/synopse/mORMot/blob/master/SynLog.pas Open Source logging framework on the server side, with sometimes dozens of threads writing to the same log file. It has very high performance, so logging doesn't slow down the process, and you can do very useful forensics and analysis if needed. Having all threads logging into the same log file is at the same time a nightmare and a blessing. It may be a nightmare, since all operations are interleaved and difficult to identify. It is a blessing with the right tool, able to filter for one or several threads, and then find out what is really occurring: in this case, a single log file is better than several. For instance, one of our https://livemon.com servers generates TBs of logs - see https://leela1.livemon.net/metrics/counters - and we can still handle it. This particular instance has been running for more than 8 months without being restarted, with hundreds of simultaneous connections, logging incoming data every second... 🙂 We wrote our own simple and very usable log viewer tool - see http://blog.synopse.info/post/2011/08/20/Enhanced-Log-viewer and https://synopse.info/files/html/Synopse mORMot Framework SAD 1.18.html#TITL_103 This is the key to having something usable from heavily multi-threaded logs. So if you can, rather use a single log file per process, with proper thread identification and filtering.
  20. @David Heffernan Yes, the fastest heap is the one not used - I tend to allocate a lot of temporary small buffers (e.g. for number-to-text conversion) on the stack instead of using a temporary string. See http://blog.synopse.info/post/2011/05/20/How-to-write-fast-multi-thread-Delphi-applications
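    A hedged sketch of that stack-buffer idea (names and layout are mine; the actual mORMot helpers differ):

```pascal
// convert an unsigned integer to text via a stack buffer, so the only
// heap allocation is the final string itself
function UIntToText(Value: cardinal): string;
var
  tmp: array[0..15] of char; // stack buffer, enough for 32-bit values
  P: PChar;
begin
  P := @tmp[high(tmp)];
  repeat
    dec(P);
    P^ := Char(ord('0') + Value mod 10); // store digits backwards
    Value := Value div 10;
  until Value = 0;
  SetString(Result, P, PChar(@tmp[high(tmp)]) - P);
end;
```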
  21. @Tommi Prami I guess you found out about Lemire in our latest commits - see e.g. https://github.com/synopse/mORMot/commit/a91dfbe2e63761d724adef0703140e717f5b2f00 🙂 @Stefan Glienke It is to be used with a prime size - as in our code - which also reduces memory consumption, since power-of-2 tables are far from optimal in this regard (doubling the slot count can become problematic). With a prime, it actually enhances the distribution, even with a weak hash function, especially compared to anding with a power of 2. The Lemire reduction is as fast as anding with a power of 2, since a multiplication is done in 1 cycle on modern CPUs. Note that Delphi Win32 is not so good at compiling the 64-bit multiplication involved in Lemire's method, whereas FPC has no problem using the i386 mul opcode - which already gives 64-bit results.
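    For reference, Lemire's multiply-shift reduction maps a 32-bit hash into [0, N) with one multiplication and one shift, avoiding a slow "mod" (a minimal sketch, not the exact mORMot code):

```pascal
// the high 32 bits of the 64-bit product Hash*N are uniformly
// spread over 0..N-1 when Hash spans the full 32-bit range
function LemireReduce(Hash, N: cardinal): cardinal;
begin
  Result := (UInt64(Hash) * UInt64(N)) shr 32;
end;
```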
  22. @David Heffernan You just store the ThreadID within the memory block information, or you use a per-convention identification of the memory buffer. Cross-thread deallocations also usually require a ThreadEnded-like event handler, which doesn't exist in Delphi IIRC - but does exist in FPC - so you need to hack TThread. @RDP1974 Last time I checked, FastMM4 (trunk or AVX2 fork) doesn't work well on Linux (at least under FPC). Under Delphi + Linux, FastMM4 is not used at all - it just calls libc free/malloc IIRC. I am not convinced the slowness comes from the libc heap - which is very good in our tests - but from how Delphi/Linux is not optimized (yet). Other MMs like BrainMM or our ScaleMM are not designed for Linux. We also tried a lot of allocators on Linux - see https://github.com/synopse/mORMot/blob/master/SynFPCCMemAligned.pas - in the context of highly multi-threaded servers. In a nutshell, see https://github.com/synopse/mORMot/blob/master/SynFPCCMemAligned.pas#L57 for some numbers. The big issue with those C-based allocators, which is not listed in those comments, apart from using a lot of RAM, is that they stop the executable as soon as some GPF occurs: e.g. a double free will raise a SIGABRT! So they are NOT usable in production, unless you run them under Valgrind with proper debugging. We fall back to using the FPC default heap, which is a bit slower and consumes a lot of RAM (since it has a per-thread heap for smaller blocks), but is very stable. It is written in plain pascal. And the main idea about performance is to avoid as much memory allocation as possible - which is what we tried with mORMot from the ground up: for instance, we build most of the temp strings on the stack, not on the heap. I don't think that rewriting a C allocator in pascal would be much faster. It is very likely to be slower. Only a pure asm version may have some noticeable benefits - just like FastMM4.
      And, personally, I wouldn't invest in Delphi for Linux for server processes: FPC is so much more stable, faster and better maintained... for free!
  23. Arnaud Bouchez

    Reading large UTF8 encoded file in chunks

    When reading several MB of buffers, there is no need to seek backward. Just read the buffer line by line, from the beginning. Use a fast function like our BufferLineLength() above to compute the line length, then search within the line buffer. If you can keep the buffer smaller than your CPU L3 cache, it may have some benefit. Going that way, the CPU will give you the best performance, for several reasons: 1. the whole line is very likely to remain in L1 cache, so searching the line feed, then searching any pattern, will be achieved at full core speed; 2. there will be automatic prefetching from main RAM into the L1/L2 cache when reading ahead in a single direction. If your disk is fast enough (NVMe), you can fill buffers in separate threads (use number of CPU cores - 1), then search in parallel from several files (one core per file - it would be more difficult to properly search the same file on multiple cores). If you don't allocate any memory during the process (do not use string), the parallel search would scale linearly. Always do proper timing of your search speed - also taking into account the OS disk cache, which is likely to be used during testing, but not on real "cold" files.
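    A hedged sketch of that forward-only scan (BufferLineLength and PUTF8Char come from SynCommons.pas; buffer refilling and the actual pattern search are left out):

```pascal
procedure ScanLines(Buffer, BufferEnd: PUTF8Char);
var
  LineLen: PtrInt;
begin
  while Buffer < BufferEnd do
  begin
    // length of the current line, with no UTF-8 decoding involved
    LineLen := BufferLineLength(Buffer, BufferEnd);
    // ... search the pattern within Buffer[0..LineLen-1] here ...
    inc(Buffer, LineLen);
    while (Buffer < BufferEnd) and (Buffer^ in [#10, #13]) do
      inc(Buffer); // skip the CR/LF/CRLF terminator
  end;
end;
```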
  24. Arnaud Bouchez

    Reading large UTF8 encoded file in chunks

    There is a very fast line feed search, using proper x86_64 SSE2 assembly, checking 16 bytes per loop iteration, in our SynCommons.pas:

        function BufferLineLength(Text, TextEnd: PUTF8Char): PtrInt;
        {$ifdef CPUX64}
        {$ifdef FPC}nostackframe; assembler; asm{$else}asm .noframe{$endif}
                {$ifdef MSWINDOWS} // Win64 ABI to System-V ABI
                push    rsi
                push    rdi
                mov     rdi, rcx
                mov     rsi, rdx
                {$endif}
                mov     r8, rsi
                sub     r8, rdi            // rdi=Text, rsi=TextEnd, r8=TextLen
                jz      @fail
                mov     ecx, edi
                movdqa  xmm0, [rip + @for10]
                movdqa  xmm1, [rip + @for13]
                and     rdi, -16           // check first aligned 16 bytes
                and     ecx, 15            // lower 4 bits indicate misalignment
                movdqa  xmm2, [rdi]
                movdqa  xmm3, xmm2
                pcmpeqb xmm2, xmm0
                pcmpeqb xmm3, xmm1
                por     xmm3, xmm2
                pmovmskb eax, xmm3
                shr     eax, cl            // shift out unaligned bytes
                test    eax, eax
                jz      @main
                bsf     eax, eax
                add     rax, rcx
                add     rax, rdi
                sub     rax, rsi
                jae     @fail              // don't exceed TextEnd
                add     rax, r8            // rax = TextFound - TextEnd + (TextEnd - Text) = offset
                {$ifdef MSWINDOWS}
                pop     rdi
                pop     rsi
                {$endif}
                ret
        @main:  add     rdi, 16
                sub     rdi, rsi
                jae     @fail
                jmp     @by16
        {$ifdef FPC}align 16{$else}.align 16{$endif}
        @for10: dq      $0a0a0a0a0a0a0a0a
                dq      $0a0a0a0a0a0a0a0a
        @for13: dq      $0d0d0d0d0d0d0d0d
                dq      $0d0d0d0d0d0d0d0d
        @by16:  movdqa  xmm2, [rdi + rsi]  // check 16 bytes per loop
                movdqa  xmm3, xmm2
                pcmpeqb xmm2, xmm0
                pcmpeqb xmm3, xmm1
                por     xmm3, xmm2
                pmovmskb eax, xmm3
                test    eax, eax
                jnz     @found
                add     rdi, 16
                jnc     @by16
        @fail:  mov     rax, r8            // returns TextLen if no CR/LF found
                {$ifdef MSWINDOWS}
                pop     rdi
                pop     rsi
                {$endif}
                ret
        @found: bsf     eax, eax
                add     rax, rdi
                jc      @fail
                add     rax, r8
                {$ifdef MSWINDOWS}
                pop     rdi
                pop     rsi
                {$endif}
        end;
        {$else}
        {$ifdef FPC}inline;{$endif}
        var c: cardinal;
        begin
          result := 0;
          dec(PtrInt(TextEnd), PtrInt(Text)); // compute TextLen
          if TextEnd <> nil then
            repeat
              c := ord(Text[result]);
              if c > 13 then
              begin
                inc(result);
                if result >= PtrInt(PtrUInt(TextEnd)) then
                  break;
                continue;
              end;
              if (c = 10) or (c = 13) then
                break;
              inc(result);
              if result >= PtrInt(PtrUInt(TextEnd)) then
                break;
            until false;
        end;
        {$endif CPUX64}

    It will be faster than any UTF-8 decoding, for sure. I already hear some people say: "hey, this is premature optimization! the disk is the bottleneck!". But in 2020, my 1TB SSD reads at more than 3GB/s - https://www.sabrent.com/rocket These are real numbers, on my laptop. So searching at GB/s speed does make sense. We use similar techniques at https://www.livemon.com/features/log-management With optimized compression and distributed search, we reach TB/s brute force speed.
  25. Arnaud Bouchez

    Reading large UTF8 encoded file in chunks

    We use similar techniques in our SynCommons.pas unit. See for instance lines 17380 and following:

        // some constants used for UTF-8 conversion, including surrogates
        const
          UTF16_HISURROGATE_MIN = $d800;
          UTF16_HISURROGATE_MAX = $dbff;
          UTF16_LOSURROGATE_MIN = $dc00;
          UTF16_LOSURROGATE_MAX = $dfff;
          UTF8_EXTRABYTES: array[$80..$ff] of byte = (
            0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
            0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
            0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
            0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
            1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
            1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
            2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
            3,3,3,3,3,3,3,3,4,4,4,4,5,5,0,0);
          UTF8_EXTRA: array[0..6] of record
            offset, minimum: cardinal;
          end = ( // http://floodyberry.wordpress.com/2007/04/14/utf-8-conversion-tricks
            (offset: $00000000; minimum: $00010000),
            (offset: $00003080; minimum: $00000080),
            (offset: $000e2080; minimum: $00000800),
            (offset: $03c82080; minimum: $00010000),
            (offset: $fa082080; minimum: $00200000),
            (offset: $82082080; minimum: $04000000),
            (offset: $00000000; minimum: $04000000));
          UTF8_EXTRA_SURROGATE = 3;
          UTF8_FIRSTBYTE: array[2..6] of byte = ($c0,$e0,$f0,$f8,$fc);

    In fact, the state machine I talked about was just about line feeds, not UTF-8. My guess was that UTF-8 decoding could be avoided during the process: if the lines are not truncated, then UTF-8 and Ansi bytes will remain valid sequences. Since lines should be taken into account when processing logs, a first scan would decode the line feeds, then process the line bytes directly, with no string/UnicodeString conversion at all. For fast searching within the UTF-8/Ansi memory buffer, we have some enhanced techniques, e.g. the SBNDM2 algorithm: see TMatch.PrepareContains in our SynTable.pas unit. It is much faster than Pos() or Boyer-Moore for small patterns, with branchless case-insensitivity. It reaches several GB/s of searching speed inside memory buffers. There is even a very fast expression search engine (e.g. search for '404 & mydomain.com') in TExprParserMatch.
    More convenient than a RegEx to me - for a fast RegEx engine, check https://github.com/BeRo1985/flre/ Any memory allocation would greatly reduce the performance of the process.