Arnaud Bouchez's Content - Page 9

Help with string extraction function

Arnaud Bouchez replied to Mike Torrettinni's topic in Algorithms, Data Structures and Class Design

Branch prediction can do wonders in some micro-benchmarks, but it always has a cost until the CPU predictor logic has actually "warmed up". Having no branch at all is always better, since the real problem is branch misprediction. Using the asm generated by a recent GCC with tuned optimization flags as a reference is a good hint of what is likely to be more efficient: Intel and AMD are major GCC contributors. And the wall clock of a realistic process (not a micro-benchmark) is the ultimate reference. We use part of our automated tests as a performance benchmark, running some close-to-reality scenarios.

Help with string extraction function

Arnaud Bouchez replied to Mike Torrettinni's topic in Algorithms, Data Structures and Class Design

Indeed. This is why I use the explicit "if x then repeat until not x" pattern in time-critical loops of my framework - e.g. when processing JSON. As a nice side effect, the variables used in "x" are more likely to be assigned to registers, since they will be used more often. Sometimes an explicit temporary local variable is needed, or use of the 'result' variable if it is a PChar. Both Delphi and FPC require this explicit pattern, unless "x" is a single simple test, where I have seen FPC able to optimize it IIRC.
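A minimal sketch of the pattern described above, assuming Free Pascal in objfpc mode. `SkipBlanks` is a hypothetical helper written for illustration, not code taken from the framework. It uses the PAnsiChar `Result` as the loop cursor and spells the test out twice, so the loop body ends on a single conditional jump.

```pascal
program LoopPattern;
{$mode objfpc}{$H+}

// Hypothetical helper (illustration only): skip blanks and control chars
// in front of a JSON token, stopping at the terminating #0.
function SkipBlanks(P: PAnsiChar): PAnsiChar;
begin
  Result := P; // the PAnsiChar result doubles as the loop cursor
  // "if x then repeat ... until not x": the condition is written twice,
  // once as the entry test and once (negated) as the loop exit test
  if (Result^ <> #0) and (Result^ <= ' ') then
    repeat
      Inc(Result);
    until (Result^ = #0) or (Result^ > ' ');
end;

var
  json: AnsiString;
begin
  json := '   {"a":1}';
  WriteLn(SkipBlanks(PAnsiChar(json))); // prints the text from the first "{" onwards
end.
```

The equivalent `while (Result^ <> #0) and (Result^ <= ' ') do Inc(Result);` is shorter to read. Whether the explicit form is actually faster depends on the compiler version and target, so it is worth timing on real data.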

Why should I use good source control versioning system?

Arnaud Bouchez replied to Mike Torrettinni's topic in Tips / Blogs / Tutorials / Videos

So Delphi would be a weird choice for programming. Regardless of how good it is, it's not widespread any more. This is the weirdest argument from David I have ever read. Alternatives are good. Especially if they are better for simple projects. 😉

Why should I use good source control versioning system?

Arnaud Bouchez replied to Mike Torrettinni's topic in Tips / Blogs / Tutorials / Videos

For a small and efficient Source Control Management system, even stand-alone with no server, you don't need git, tortoise and whatever... Just try https://fossil-scm.org/home/doc/trunk/www/index.wiki Only a 2MB download, for a full-blown distributed SCM, with integrated web server, wiki and tickets. Perfect for projects of any size. It was made by the SQLite3 author, and I have used it for years. Git was meant for Linux. Fossil was made for SQLite3. Make sure you read https://fossil-scm.org/home/doc/trunk/www/concepts.wiki and https://fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki

Buggy Optimizer in Delphi 10.4

Arnaud Bouchez replied to sakura's topic in General Help

I have written a blog article about this problem. https://blog.synopse.info/?post/2020/07/20/Special-Care-of-Delphi-10.4 Hope the issue is fixed soon in the next Delphi patch - after the holidays I guess. 🙂

July 20, 2020 - 7 replies - Tagged with: optimization, bug (and 1 more)

Patch 2 for RAD Studio 10.4 now available

Arnaud Bouchez replied to Marco Cantu's topic in General Help

This is the famous "patch release before holidays" syndrome... 🙂

Rethinking Delphi’s management of floating point control registers

Arnaud Bouchez replied to David Heffernan's topic in RTL and Delphi Object Pascal

I guess this bug may be some inheritance from DOS/TurboPascal years... when the 8087 was already there and no thread was involved.

Rethinking Delphi’s management of floating point control registers

Arnaud Bouchez replied to David Heffernan's topic in RTL and Delphi Object Pascal

Fair enough. 😞 But FPC doesn't suffer from this race condition AFAIK: `procedure Set8087CW(cw:word); begin default8087cw:=cw; asm fnclex fldcw cw end; end;`

Rethinking Delphi’s management of floating point control registers

Arnaud Bouchez replied to David Heffernan's topic in RTL and Delphi Object Pascal

I agree with you. IMHO it is less a breaking change than a bugfix. The behavior you propose seems more stable, in terms of thread-safety. Note that FPC RTL mimics the same behavior, and is affected by the same bug. Is just calling Set8087CW() at thread start enough as a workaround in user-code?

FreeAndNil 10.4 vs 10.3.1 and Pointers

Arnaud Bouchez replied to Sherlock's topic in RTL and Delphi Object Pascal

In the first place, I don't understand how FreeAndNil() on an array pointer could work properly. It would try to call a Destroy method in the VMT, which doesn't exist... as @dummzeuch reported above. What you should do ASAP:
1. Get rid of all those FreeAndNil() calls on anything other than class instances.
2. If you can, try to replace those pointer arrays with dynamic arrays, so you would have reference-counting and automatic release of the content when the variable goes out of scope. You can typecast a dynamic array into an array pointer just by using `pointer(aByteDynArray)`.
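To illustrate the second point, here is a small sketch, assuming Free Pascal in objfpc mode. `TLegacyBytes`, `PLegacyBytes` and `SumBytes` are hypothetical names standing in for existing code that expects an array pointer. The dynamic array owns the memory, and the `pointer()` typecast only lends it to the legacy routine.

```pascal
program DynArrayDemo;
{$mode objfpc}{$H+}

type
  TByteDynArray = array of Byte;
  // hypothetical legacy-style "array pointer" type, as found in older code
  TLegacyBytes = array[0..32767] of Byte;
  PLegacyBytes = ^TLegacyBytes;

// Stand-in for an existing routine that only knows about array pointers.
function SumBytes(P: PLegacyBytes; Count: Integer): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to Count - 1 do
    Inc(Result, P^[i]);
end;

var
  Bytes: TByteDynArray;
  i: Integer;
begin
  SetLength(Bytes, 4); // reference-counted allocation, zero-filled
  for i := 0 to High(Bytes) do
    Bytes[i] := i + 1;
  // pointer(Bytes) is the address of the first item: no copy is made
  WriteLn(SumBytes(Pointer(Bytes), Length(Bytes))); // prints 10
  // no FreeMem/FreeAndNil here: the array is released automatically
end.
```

The pointer is only valid while the dynamic array is alive and has not been resized, so it should not be stored beyond the call.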

tiny computer for Delphi apps

Arnaud Bouchez replied to David Schwartz's topic in General Help

I don't know how big your projects are, but I used a 4GB Win10 computer until recently, with no memory issue at all. With hundreds of thousands of source code lines...

tiny computer for Delphi apps

Arnaud Bouchez replied to David Schwartz's topic in General Help

6GB is plenty for Windows 10, if you don't run VMs on it. And the Delphi IDE won't use more than 2GB for sure, since it is still a 32-bit application. This mini PC would run Delphi very well - the main bottleneck for an IDE is the disk, and with a good M.2 SSD I don't see why it would be slow. Perhaps the default eMMC storage is not optimal, but I guess it would work well enough. From what I saw, the slowest part of the IDE is the copy protection check at startup... at least if you use Andy's FixPack. 🙂 Such computers are powerful enough for Delphi. Perhaps not for Visual Studio with a lot of plugins.

Disadvantage of using defined type of TArray?

Arnaud Bouchez replied to Mike Torrettinni's topic in Algorithms, Data Structures and Class Design

The new features of this compiler are just untested/unfinished... Even the type helpers are broken with sub-types: if you define `TMyInteger = type integer` then you can't use `myinteger.ToString`... Non-inheritance is a "feature" which IMHO is wrong. What always works, and is the very same, is to write: `TUsers = array of TUser;`

> Too bad we are working with Delphi where it does not matter because I can still assign a TUserName to TUserFirstName

Yes, only var/out parameters have strong compile-time checking... But at least you can read the code and verify the consistency anyway, since the type (and RTTI) are explicit. And you can refer to the type definition comment as documentation. It also helps writing more natural code, by having the meaning defined in the type, not in the parameter name. I prefer:
`function CopySession(source, dest: TSessionID): boolean;` .... `property session: TSessionID;`
instead of
`function CopySession(sourceSessionID, destSessionID: integer): boolean` ... `property sessionID: integer;`
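A short sketch of what the compiler does and does not check, assuming Free Pascal in objfpc mode. `TUserID` and `NextSession` are hypothetical names added for the example. Plain assignment between two `type integer` aliases is accepted, whereas a var parameter requires the exact declared type.

```pascal
program StrongTypes;
{$mode objfpc}{$H+}

type
  TSessionID = type Integer; // distinct type, with its own RTTI
  TUserID = type Integer;    // hypothetical second ID type for the demo

// var parameters must receive exactly the declared type
procedure NextSession(var id: TSessionID);
begin
  Inc(id);
end;

var
  session: TSessionID;
  user: TUserID;
begin
  user := 42;
  session := user;       // accepted: plain assignment is not checked
  NextSession(session);  // accepted: the types are identical
  // NextSession(user);  // rejected at compile time: var parameter type mismatch
  WriteLn(session);      // prints 43
end.
```

So the protection is partial, but the declared types still document the intent at every call site.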

Disadvantage of using defined type of TArray?

Arnaud Bouchez replied to Mike Torrettinni's topic in Algorithms, Data Structures and Class Design

Strong typing may be a good idea in some contexts. For instance, in DDD (Domain Driven Design) you had better define your own genuine types. Instead of writing: `type TUser = record Name: string; FirstName: string; end;` you should rather define: `type TUserName = type string; TUserFirstName = type string; TUser = record Name: TUserName; FirstName: TUserFirstName; end; TUsers = type TArray<TUser>;` and so on for any service methods.

Such strong typing (`T* = type ###` defines its own strong type) helps maintaining complex code. I always remind people of the https://en.wikipedia.org/wiki/Mars_Climate_Orbiter disaster. A strong type (as it should have been if Ada would still have been used for the software) would have ensured that the force would use the same unit in the inter-module computation (imperial vs metric systems). Specific types may help e.g. for "Design by contract".

Of course, for a DTO you may just use plain string/TArray<string>. But if you want to have some maintainable long-term code, consider defining your own types. See e.g. http://blog.synopse.info/post/2019/09/18/Introducing-Kingdom-Driven-Design-at-EKON-23

Free SQLite -> Interbase tool?

Arnaud Bouchez replied to sjordi's topic in Databases

You may also try our SynDBExplorer tool. If you can serve the Interbase ToGO DB with Interbase server, connect to it then choose the "Table Export" feature: it will create the SQLite3 file for you. Note that if you use the SQL to text dump conversion, I guess you don't need to change anything in the CREATE TABLE statement. The SQLite3 syntax is very relaxed, thanks to its "column affinity" feature.

Free SQLite -> Interbase tool?

Arnaud Bouchez replied to sjordi's topic in Databases

If you crash your drive with a hammer, you would also lose all your data. The SQLite3 reference article is really paranoid, and its default settings are aircraft-level secure. If you have exclusive access to the SQLite3 DB, then most of the database corruption problems disappear. To be fair, 'enterprise' DBs don't sync to disk at every transaction. I have had Oracle databases not able to mount at all after a power failure. Whereas SQLite3 may lose some data, but can almost always reopen its SQLite3 file. We have used those settings on production DBs for years, with TB of processed data and billions of insert/select, with no data loss (only journal_mode was left to its default). The OP was talking about a mobile app, where I doubt such paranoia is required.

Free SQLite -> Interbase tool?

Arnaud Bouchez replied to sjordi's topic in Databases

Export/import as SQL? Only the CREATE TABLE statements may need some manual adjustment. But the INSERT statements should work directly. The sqlite3 command-line tool has a .dump command - just copy the SQLite3 DB file from your mobile to your desktop to dump it. IMHO SQLite3 would be faster than Interbase - it is at least what I saw with Firebird/SQLite3 on Windows and Linux. Perhaps Interbase has some secret weapon, but I doubt it very much. And using an Open Source and proven solution like SQLite3 is worth it... Also make sure you do a fair comparison between the two. By default, SQLite3 expects a full sync to the storage media, which is the safest, but slowest approach. So make sure you set PRAGMA journal_mode=MEMORY, PRAGMA locking_mode=EXCLUSIVE and PRAGMA synchronous=OFF. Check http://blog.synopse.info/post/2012/07/26/ACID-and-speed

Experience/opinions on FastMM5

Arnaud Bouchez replied to Leif Uneus's topic in RTL and Delphi Object Pascal

@abak My advice is to switch to FastMM5 only if 1. you actually tested and saw a noticeable (on wall clock) performance improvement 2. and you are OK with the license terms. I doubt point 1 will happen in most cases, i.e. if your app is not heavily multi-threaded, but is a regular VCL/DB app. Point 2 would require paying for a license if your project is not GPL/LGPL itself.

Experience/opinions on FastMM5

Arnaud Bouchez replied to Leif Uneus's topic in RTL and Delphi Object Pascal

I don't think alignment is involved in triggering microfusion or not. Alignment is just a way to ensure that the CPU instruction decoder is able to fetch as many opcodes as possible: since the CPU is likely to fetch 16 bytes of opcodes at a time, aligning a jump target to 16 bytes may reduce the number of fetches. It is mostly needed for a loop, and could (much more marginally) be beneficial for regular jumps. My reference/bible on that matter is https://www.agner.org/optimize/optimizing_assembly.pdf But the only true reference is the clock: as you wrote, we need to test/measure, not guess.

Experience/opinions on FastMM5

Arnaud Bouchez replied to Leif Uneus's topic in RTL and Delphi Object Pascal

@Kas Ob.
1) This modified code is not the same as the initial one, because rdx is modified in between. And the current code is better, since the CPU will fuse the cmp + jmp pair into a single operation.
2) It is correct. I will use cmovb here. Thanks!
3) I would never use an undocumented Windows function in production code. There is almost no sleep() call in my tests thanks to good spinning, so it won't make any difference in practice. And we focus on Linux, not Windows, for our servers - where nanosleep is available. Speaking of 100ns resolution is IMHO unrealistic: I suspect there is a context switch anyway; otherwise, longer spinning or calling ThreadSwitch may be just good enough.

Experience/opinions on FastMM5

Arnaud Bouchez replied to Leif Uneus's topic in RTL and Delphi Object Pascal

You are right: FastMM5 challenged me... and since no one responded to my offer about helping it run on FPC/Linux, and also since I wanted something Open Source but not so restrictive, I created https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.fpcx64mm.pas which is GPL/LGPL and MPL. So you can use it with closed software. It uses the same core algorithms as FastMM4. I like it so much, and missed it so much in FPC... 🙂

I was involved in ScaleMM2, and a per-thread arena for small blocks didn't convince me: it tends to consume too much RAM when you have a lot of threads in your process. Note that a threadvar is what the FPC standard MM uses. I wanted to take the best of FastMM4 (which is well proven, stable and efficient), but drive it a little further in terms of multi-threading and code quality. FastMM4 asm is 32-bit oriented, and its x86_64 version was sometimes not very optimized for this target - just see its abuse of globals, its lack of knowledge of micro-op fusion or CPU cache lines and locks, and its sparse use of registers. Also, focusing on a single compiler and a single CPU, without all the features of FastMM4 in pascal mode, helped fpcx64mm appear in only two days. Last but not least, I spent a lot of time this last year in x86_64 assembly, so I know which patterns are expected to be faster.

The huge regression test suite of mORMot helps having a proven benchmark - much more aggressive and realistic than the microbenchmarks (like string concatenation in threads, or even the FastCode benchmark) on which most other MMs rely for measurement. When the regression tests are more than twice as fast as with the FPC standard MM on Linux - as @ttomas reported - then we are talking. It runs a lot of different scenarios, with more than 43,000,000 individual tests, and several kinds of HTTP/TCP servers on the loopback, running in-memory or SQLite databases, processing JSON everywhere, with multiple client threads stressing it.

When I run the tests on my Linux machine, I have only a few (less than a dozen) Linux nanosleep system calls (better than Windows sleep), and less than 2 ms of waiting during 1 minute of heavy tests - and only for Freemem. I really don't like the microbenchmarks used for testing MMs, like the one published in this forum. For instance Intel TBB is very fast for such benchmarks, but it doesn't release its memory as it should, and it is unusable in practice.

I guess that some user code, not written with performance in mind, and e.g. abusing str := str+'something' patterns, would also be more than twice as fast. And if your code has to reallocate huge buffers (>256KB) in a loop, using mremap on Linux may give a huge performance boost since no data would be copied at all - Linux mremap() is much better than what Windows or BSD offer! Yes, huge memory blocks are resized by the Linux kernel by remapping their virtual memory pages, without copying any memory. No need to use AVX512 if you don't copy anything! And plain SSE2 (with non-temporal mov for big buffers) is good enough to saturate the HW memory bandwidth - and faster than ERMS in practice.

IMHO there was no need to change the data structures like FastMM5 did - I just tuned/fixed most of the asm of its predecessor FastMM4, reserved some additional slots for the smaller blocks (<=80 bytes are now triplets), implemented a safe and efficient spinning, implemented some internal instrumentation to catch multi-threading bottlenecks, and then Getmem didn't suffer from contention any more! I knew that FastMM4 plus some tweaks could be faster than anything else - perhaps even FastMM5.

FastMM5 now released by Pierre le Riche (small background story)

Arnaud Bouchez replied to Günther Schoch's topic in Delphi Third-Party

If I understand correctly, FastMM5 handles several arenas instead of one for FastMM4, and tries all of them until it finds one which is not currently locked, so thread contention is less likely to happen. One area where FastMM5 may have some room for improvement is its naive use of "rep movsb", which should rather be a non-temporal SSE2/AVX move for big blocks. Check the numbers in https://stackoverflow.com/a/43574756/458259 for instance.

ScaleMM2 and the FPC heap both use a threadvar arena for small blocks, so they don't bother to check for any lock. It is truly lock-free. But each thread maintains its own small blocks arena, so it consumes more memory. Other Memory Managers like Intel TBB or JeMalloc also have a similar per-thread approach, but consume much more memory. For instance, TBB is a huge winner in performance, but it consumes up to 60 (sixty!) times more memory! So it is not usable in practice for serious server work - it may help for heavily multi-threaded apps, but not for common services. I tried those other MMs with mORMot and a real, heavily multi-threaded service. Please check the comments at the beginning of https://synopse.info/fossil/artifact/f85c957ff5016106

One big problem with the C memory managers is that they tend to ABORT the process (not SIGSEGV but SIGABRT) if there is a dangling pointer - which happens sometimes, and is very difficult to track. This paranoia makes them impossible to use in production: you don't want your service to shut down with no prior notice just because the MM complains about a pointer!

Our only problem with the FPC heap is that with long-running servers, it tends to fragment the memory and consume some GB of RAM, whereas a FastMM-like memory manager would have consumed much less. The FPC heap doesn't leak: its memory consumption stabilizes after a few days, but it is still higher than with FastMM. The problem with FastMM (both 4 and 5) is that they don't work on Linux x86_64 with FPC. This is why I proposed to help Eric about FPC support.
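The "try each arena until one is not locked" idea can be sketched as below. This is only an illustration under my own assumptions, written for Free Pascal in objfpc mode, and is not FastMM5's actual code: `TArena`, `ARENA_COUNT`, `TryLock`, `Unlock` and `AcquireArena` are hypothetical names, and a real memory manager would spin or sleep when every arena is busy instead of returning -1.

```pascal
program ArenaPick;
{$mode objfpc}{$H+}

const
  ARENA_COUNT = 4; // hypothetical number of shared arenas

type
  TArena = record
    Locked: LongInt; // 0 = free, 1 = held by a thread
    // ... the block lists of a real arena would live here
  end;

var
  Arenas: array[0..ARENA_COUNT - 1] of TArena; // globals start zeroed

// Atomically take the lock only if it is currently free.
function TryLock(var a: TArena): Boolean;
begin
  Result := InterlockedCompareExchange(a.Locked, 1, 0) = 0;
end;

procedure Unlock(var a: TArena);
begin
  InterlockedExchange(a.Locked, 0);
end;

// Return the index of the first arena which could be locked, or -1.
function AcquireArena: Integer;
var
  i: Integer;
begin
  for i := 0 to ARENA_COUNT - 1 do
    if TryLock(Arenas[i]) then
      Exit(i);
  Result := -1; // all arenas are busy
end;

var
  first, second: Integer;
begin
  first := AcquireArena;  // arena 0 is free, so it is taken
  second := AcquireArena; // arena 0 is still held: falls through to arena 1
  WriteLn(first, ' ', second); // prints 0 1
  Unlock(Arenas[second]);
  Unlock(Arenas[first]);
end.
```

With a threadvar arena, as in ScaleMM2 or the FPC heap, none of this locking is needed, at the price of one set of small-block pools per thread.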

Experience/opinions on FastMM5

Arnaud Bouchez replied to Leif Uneus's topic in RTL and Delphi Object Pascal

I didn't see any explicit NUMA support in the source code. I guess the idea is to force the CPU affinity of the process, to avoid NUMA latencies.

FastMM5 now released by Pierre le Riche (small background story)

Arnaud Bouchez replied to Günther Schoch's topic in Delphi Third-Party

What do you think about FPC + Linux support, which is a good environment for multi-threaded servers? The FPC built-in heap is good, but tends to consume a lot of memory with a lot of threads: it maintains small per-thread heaps using a threadvar, whereas FastMM5 uses several arenas which are shared among all threads (I guess the idea is inspired by the ptmalloc/glibc allocator). I tried all the best-known C alternatives, and I was not convinced. The only stable and not bloated memory manager is the one in glibc. But the slightest memory access violation tends to kill/abort the process, so it is not good in production. I could definitely help with the Linux/FPC syscalls and the low-level Intel asm, to include FPC/Linux support in FastMM5. But perhaps I would go in this direction only if using FPC as the compiler doesn't require a commercial license. What do you think?

Random Access Violation?

Arnaud Bouchez replied to Nathan Wild's topic in General Help

As David wrote, try to make a minimal reproducible example. Just a project with ODBC access, running a SELECT query. I thought it may have been some problem with FPU exceptions, which happen with third-party libraries, but they usually occur in your Delphi code, not in library code....
