
Arnaud Bouchez


Arnaud Bouchez last won the day on June 27

Arnaud Bouchez had the most liked content!

Community Reputation

136 Excellent



  1. I guess this bug may be a legacy of the DOS/Turbo Pascal years... when the 8087 was already there but no threads were involved.
  2. Fair enough. 😞 But FPC doesn't suffer from this race condition AFAIK:

        procedure Set8087CW(cw: word);
        begin
          default8087cw := cw;
          asm
            fnclex
            fldcw cw
          end;
        end;
  3. I agree with you. IMHO it is less a breaking change than a bugfix. The behavior you propose seems more stable, in terms of thread-safety. Note that FPC RTL mimics the same behavior, and is affected by the same bug. Is just calling Set8087CW() at thread start enough as a workaround in user-code?
  4. Arnaud Bouchez

    FreeAndNil 10.4 vs 10.3.1 and Pointers

    In the first place, I don't understand how FreeAndNil() on an array pointer could work properly. It would try to call a Destroy method in the VMT, which doesn't exist... as @dummzeuch reported above. What you should do ASAP:
    1. Get rid of all those FreeAndNil() calls on anything other than class instances.
    2. If you can, try to replace those pointer arrays with dynamic arrays, so you would have reference-counting and automatic release of the content when the variable goes out of scope. You can cast a dynamic array into an array pointer just by using `pointer(aByteDynArray)`.
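    A minimal sketch of the `pointer(aByteDynArray)` cast, assuming a legacy routine that expects a raw array pointer. The names `TLegacyBytes`, `PLegacyBytes` and `SumBytes` are hypothetical and only stand in for whatever the existing code declares:

    ```pascal
    program DynArrayAsPointer;

    {$ifdef FPC}{$mode delphi}{$endif}
    {$ifdef MSWINDOWS}{$apptype console}{$endif}

    type
      // hypothetical legacy "array pointer" type (an assumption, not from the thread)
      TLegacyBytes = array[0..65535] of byte;
      PLegacyBytes = ^TLegacyBytes;

    // hypothetical legacy routine working on a raw pointer plus an item count
    function SumBytes(p: PLegacyBytes; count: integer): integer;
    var
      i: integer;
    begin
      result := 0;
      for i := 0 to count - 1 do
        inc(result, p^[i]);
    end;

    var
      data: array of byte; // reference-counted: no manual Free/FreeMem needed
      i: integer;
    begin
      SetLength(data, 4);
      for i := 0 to high(data) do
        data[i] := i + 1;
      // pointer(data) is the address of the first item (nil for an empty array)
      writeln(SumBytes(pointer(data), length(data)));
    end.
    ```

    The dynamic array owns the memory, so the legacy routine must not keep the pointer after the array is resized or released.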
  5. Arnaud Bouchez

    tiny computer for Delphi apps

    I don't know how big your projects are, but I used a 4GB Win10 computer until recently, with no memory issue at all. With hundreds of thousands of source code lines...
  6. Arnaud Bouchez

    tiny computer for Delphi apps

    6GB is plenty for Windows 10, if you don't run VMs on it. And the Delphi IDE won't use more than 2GB for sure, since it is still a 32-bit application. This mini PC would run Delphi very well - the main bottleneck for an IDE is the disk, and with a good M.2 SSD I don't see why it would be slow. Perhaps the default eMMC storage may not be optimal, but I guess it would work well enough. From what I saw, the slowest part of the IDE is the copy protection check at startup... at least if you use Andy's FixPack. 🙂 Such computers are powerful enough for Delphi. Perhaps not for Visual Studio with a lot of plugins.
  7. The new features of this compiler are just untested/unfinished... Even the type helpers are broken with sub-types: if you define `TMyInteger = type integer` then you can't use `myinteger.ToString`... Non-inheritance is a "feature" which IMHO is wrong. What always works, and is the very same, is to write:

        TUsers = array of TUser;

    > Too bad we are working with Delphi where it does not matter because I can still assign a TUserName to TUserFirstName

    Yes, only var/out variables have compile-time strong checking... But at least you can read the code and verify the consistency anyway, since the type (and RTTI) are explicit. And you can refer to the type definition comment as documentation. It also helps writing more natural code, by having the meaning defined in the type, not in the parameter name. I prefer:

        function CopySession(source, dest: TSessionID): boolean;
        ....
        property session: TSessionID;

    instead of:

        function CopySession(sourceSessionID, destSessionID: integer): boolean;
        ...
        property sessionID: integer;
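    A small sketch of where the compiler is strict and where it is not, assuming Delphi semantics as described in the post. `TUserID` and `NextSession` are hypothetical names added for illustration; only `TSessionID` comes from the post:

    ```pascal
    program StrongTypeDemo;

    {$ifdef FPC}{$mode delphi}{$endif}
    {$ifdef MSWINDOWS}{$apptype console}{$endif}

    type
      TSessionID = type integer; // distinct type, with its own RTTI
      TUserID = type integer;    // hypothetical second type, for comparison

    // hypothetical routine: a var parameter is where the type must match exactly
    procedure NextSession(var session: TSessionID);
    begin
      inc(session);
    end;

    var
      session: TSessionID;
      user: TUserID;
    begin
      session := 41;
      NextSession(session);   // same declared type: accepted
      user := session;        // plain assignment between the two types still compiles
      // NextSession(user);   // var parameter of another type: rejected by Delphi
      writeln(session, ' ', user);
    end.
    ```

    The commented-out call is the only place where mixing the two types is refused in Delphi; other compilers may be more lenient here, so treat that line as an assumption to verify on your own toolchain.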
  8. Strong typing may be a good idea in some contexts. For instance, in DDD (Domain Driven Design) you had better define your own genuine types. Instead of writing:

        type
          TUser = record
            Name: string;
            FirstName: string;
          end;

    you should rather define:

        type
          TUserName = type string;
          TUserFirstName = type string;
          TUser = record
            Name: TUserName;
            FirstName: TUserFirstName;
          end;
          TUsers = type TArray<TUser>;

    and so on for any service methods. Such strong typing (`T* = type ###` defines its own strong type) helps maintain complex code. I always remind people of the https://en.wikipedia.org/wiki/Mars_Climate_Orbiter disaster. A strong type (as it should have been if Ada had still been used for the software) would have ensured that the force used the same unit in the inter-module computation (English vs metric systems). Specific types may help e.g. for "Design by contract". Of course, for a DTO you may just use plain string/TArray<string>. But if you want to have some maintainable long-term code, consider defining your own types. See e.g. http://blog.synopse.info/post/2019/09/18/Introducing-Kingdom-Driven-Design-at-EKON-23
  9. Arnaud Bouchez

    Free SQLite -> Interbase tool?

    You may also try our SynDBExplorer tool. If you can serve the Interbase ToGO DB with Interbase server, connect to it, then choose the "Table Export" feature: it will create the SQLite3 file for you. Note that if you use the SQL to text dump conversion, I guess you don't need to change anything in the CREATE TABLE statement. The SQLite3 syntax is very relaxed, thanks to its "column affinity" feature.
  10. Arnaud Bouchez

    Free SQLite -> Interbase tool?

    If you smash your drive with a hammer, you would also lose all your data. The SQLite3 reference article is really paranoid, and its default settings are aircraft-level secure. If you have exclusive access to the SQLite3 DB, then most of the database corruption problems disappear. To be fair, 'enterprise' DBs don't sync to disk at every transaction. I have had Oracle databases not able to mount at all after a power failure. Whereas SQLite3 may lose some data, but can almost always reopen its SQLite3 file. We have used those settings on production DBs for years, with TB of processing data and billions of insert/select, with no data loss (only journal_mode was left to its default). The OP was talking about a mobile app, where I doubt such paranoia is required.
  11. Arnaud Bouchez

    Free SQLite -> Interbase tool?

    Export/import as SQL? Only the CREATE TABLE statements may need some manual adjustment. But the INSERT statements should work directly. The sqlite3 command-line tool has a .dump command - just copy the SQLite3 DB file from your mobile to your desktop to dump it. IMHO SQLite3 would be faster than Interbase - it is at least what I saw with Firebird/SQLite3 on Windows and Linux. Perhaps Interbase has some secret weapon, but I doubt it very much. And using an Open Source and proven solution like SQLite3 is worth it... Also make sure you make a fair comparison between the two. By default, SQLite3 expects a full sync to the storage media, which is the safest, but slowest approach. So make sure you set `PRAGMA journal_mode=MEMORY`, `PRAGMA locking_mode=EXCLUSIVE` and `PRAGMA synchronous=OFF`. Check http://blog.synopse.info/post/2012/07/26/ACID-and-speed
  12. Arnaud Bouchez

    Experience/opinions on FastMM5

    @abak My advice is to switch to FastMM5 only if 1. you actually tested and saw a noticeable (wall-clock) performance improvement, and 2. you are OK with the license terms. I doubt point 1 will happen in most cases, i.e. if your app is not heavily multi-threaded, but is a regular VCL/DB app. Point 2 would require paying for a license if your project is not GPL/LGPL itself.
  13. Arnaud Bouchez

    Experience/opinions on FastMM5

    I don't think alignment has anything to do with triggering microfusion or not. Alignment is just a way to ensure that the CPU instruction decoder is able to fetch as many opcodes as possible: since the CPU is likely to fetch 16 bytes of opcodes at a time, aligning a jump target to 16 bytes may reduce the number of fetches. It is mostly needed for a loop, and could (much more marginally) be beneficial for regular jumps. My reference/bible in that matter is https://www.agner.org/optimize/optimizing_assembly.pdf. But the only true reference is the clock: as you wrote, we need to test/measure, not guess.
  14. Arnaud Bouchez

    Experience/opinions on FastMM5

    @Kas Ob.
    1) This modified code is not the same as the initial one, because rdx is modified in between. And the current code is better since the CPU will make a microfusion opcode of cmp + jmp.
    2) It is correct. I will use cmovb here. Thanks!
    3) I would never use an undocumented Windows function in production code. There is almost no sleep() call in my tests thanks to good spinning. So it won't make any difference in practice. And we focus on Linux, not Windows, for our servers - in which nanosleep is there. Speaking of 100ns resolution is IMHO unrealistic: I suspect there is a context switch involved; otherwise, longer spinning or calling ThreadSwitch may be just good enough.
  15. Arnaud Bouchez

    Experience/opinions on FastMM5

    You are right: FastMM5 challenged me... and since no one responded to my offer about helping it run on FPC/Linux, and also since I wanted something Open Source but not so restrictive, I created https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.fpcx64mm.pas which is GPL/LGPL and MPL. So you can use it with closed software. It uses the same core algorithms as FastMM4. I like it so much, and missed it so much in FPC... 🙂

    I was involved in ScaleMM2, and a per-thread arena for small blocks didn't convince me: it tends to consume too much RAM when you have a lot of threads in your process. Note that a threadvar is what the FPC standard MM uses. I wanted to take the best of FastMM4 (which is very proven, stable and efficient), but drive it a little further in terms of multi-threading and code quality. FastMM4 asm is 32-bit oriented, and its x86_64 version was sometimes not very optimized for this target - just see its abuse of globals, no knowledge of micro-op fusion or CPU cache lines and locks, and sparse use of registers. Also, focusing on a single compiler and a single CPU, without all the features of FastMM4 in pascal mode, helped fpcx64mm appear in two days only. Last but not least, I spent a lot of time this last year in x86_64 assembly, so I know which patterns are expected to be faster.

    The huge regression test suite of mORMot helps having a proven benchmark - much more aggressive and realistic than microbenchmarks (like string concatenation in threads, or even the FastCode benchmark) on which most other MMs rely for measurement. When the regression tests are more than twice as fast as with the FPC standard MM on Linux - as @ttomas reported - then we are talking. It runs a lot of different scenarios, with more than 43,000,000 individual tests, and several kinds of HTTP/TCP servers on the loopback, running in-memory or SQLite databases, processing JSON everywhere, with multiple client threads stressing it. When I run the tests on my Linux machine, I have only a few (less than a dozen) Linux system nanosleeps (better than Windows sleep), and less than 2 ms of waiting during one minute of heavy tests - and only for Freemem.

    I really don't like the microbenchmarks used for testing MMs. Like the one published in this forum. For instance IntelTBB is very fast for such benchmarks, but it doesn't release its memory as it should, and it is unusable in practice. I guess that some user code, not written with performance in mind, and e.g. abusing str := str+'something' patterns, would also be more than twice as fast. And if your code has to reallocate huge buffers (>256KB) in a loop, using mremap on Linux may give a huge performance boost since no data would be copied at all - Linux mremap() is much better than what Windows or BSD offer! Yes, huge memory blocks are resized by the Linux kernel by reassigning its TLB redirection tables, without copying any memory. No need to use AVX512 if you don't copy anything! And plain SSE2 (with non-temporal mov for big buffers) is good enough to saturate the HW memory bandwidth - and faster than ERMS in practice.

    IMHO there was no need to change the data structures like FastMM5 did - I just tuned/fixed most of its predecessor FastMM4 asm, reserved some additional slots for the smaller blocks (<=80 bytes are now triplets), implemented a safe and efficient spinning, implemented some internal instrumentation to catch multi-threading bottlenecks, and then Getmem didn't suffer from contention any more! I knew that FastMM4 plus some tweaks could be faster than anything else - perhaps even FastMM5.