Arnaud Bouchez

September 20, 2020

1 hour ago, Anders Melander said:

Funny name you say? How about he optimize it with Gnome sort instead:

It is also known as "stupid sort".

Best algorithm name ever.

September 12, 2020

22 minutes ago, Hans J. Ellingsgaard said:

A count(*) query, on the other hand, has a very limited impact on the server.

I don't agree. This is highly depending on the DB itself.
A count(*) could take a lot of time for a huge database, or with a complex query with no proper indexes.

You could have a count(*) of a few rows (e.g. one or two rows) which takes a lot of time on big tables with a SQL request bypassing the indexes.
In this case, it is better to retrieve the few result rows instead of making a select count(*) following by a select *

September 12, 2020

You may either:

Refine your SQL query, e.g. by adding some proper indexes to the DB, to make it faster on the server;
Actually retrieve all data at once (maybe into an in-memory list/array), but use Application.ProcessMessages within the loop to avoid UI freezing;
Don't care about exact record count: just fetch the first 1000 for instance, and write '>1000' on the UI if there are more rows.

September 10, 2020

As I wrote it never stores a double - SQLite3 doesn't support TDateTime double which is Ole/Windows specific.

In your code, text will be stored.

It is because that SQliteStudio display what is stored, i.e. text, trying several date/time layouts.

It seems that FireDac expects ISO-8601 encoding - just as SQLite3. And don't forget to set the seconds - even :00.

September 9, 2020

IIRC, SQLite3 has no native date/time format, but it can understand ISO 8601 text fields and Unix Epoch integer fields.

September 9, 2020

Thanks for the detailed feedback... from the asm sounds like a compiler issue.

It may be worth a ticket, since it may affect not only our code, but a lot of it!

September 9, 2020

From a Github issue description for our SynPDF Open Source project: Generating a PDF via VLCCanvas and TPdfDocumentGDI causes access violation when compiled with Delphi 10.4.1 with record field alignment compiler option set to "byte" or "off". When this option is set to either of "word", "double word" or "quad word", the PDF gets created without errors. The same exact code works fine when compiled with Delphi 10.4 (patch 3), regardless of the field alignment flag.

We added {$A+} and it seemed to fix the problem.
https://blog.synopse.info/?post/2020/09/09/Record-Alignement-and-Delphi-10.4.1

Sadly, I don't have access to Delphi 10.4.1 since I don't have any commercial licence, and I am waiting for the Community Edition - which is still 10.3 IIRC. So I couldn't debug the root cause and fill a JIRA ticket to EMB.
Perhaps some people from Delphi-Praxis may have encountered this issue, and found the root cause...

Maybe it is was a real fix introduced in 10.4.1, and the previous behavior was incorrect: perhaps an explicit {$A+} is required when working with records... but at least, it breaks existing code, so spreading the info may help...

September 7, 2020

I wonder if the WebSockets encapsulating in HTTP/2 makes a huge performance difference in respect to plain HTTP upgrade.

Speaking about WebSockets frames communications, not HTTP requests.

September 5, 2020

10 hours ago, David Heffernan said:

If you care about performance, measure it.

This is the main idea.

No premature optimization. This is not because a single line ("case ... of") is slightly faster than your work will be faster.

AnsiString with the system code page is a wrong idea - it is not able to store all Unicode content.
UTF-8 is a good idea if you use it from one end to the other in your project.
For instance, if your database layer uses "string" then using AnsiString won't help. On the contrary, conversion and memory allocation has a cost, so it may be actually slower.

Only if you have UTF-8 from end to end, e.g. in our Open Source framework, we use UTF-8 everwhere, e.g. from DB to JSON, so no UTF-16 conversion is done. It is perfect for server side. But if you write a VCL/FMX RAD app, using plain string makes more sense.

September 3, 2020

We can report that a compiler regression about wrongly optimized result value was fixed with 10.4.1.

https://quality.embarcadero.com/browse/RSP-30088

Even if it has been reported in JIRA as "Expected behavior". 😞

https://synopse.info/forum/viewtopic.php?pid=32966#p32966

September 2, 2020

20 hours ago, M.Joos said:

About DWscript: You said it is not cross-platform yet , does that mean that someone is already working on making it cross platform? And what is it that makes it so Windows specific?

Cross-Platform was prepared, there are OS-specific units, but the POSIX versions were never finished nor tested IIRC, since Eric (the maintainer) didn't need anything outside Windows.

As stated by Eric in his blog https://www.delphitools.info/2018/04/20/dwscript-transition-to-delphi-10-2-3/#more-3949 :

Quote

The goal is to target Win32 and Win64 compilers, mobile platforms and Delphi Linux are currently not in the scope.

Darwin/Linux support may be feasible, but Mobile platforms would require some ARM low-level stuff, which may not be easy to do.

September 1, 2020

DWSScipt is my favorite.
Its syntax is modern, and its implementation is very clean. It even has a JIT!

The problem is that it is not cross-platform yet.

The veteran PascalScript is my favorite if cross-platform is needed.

It is stable, and widely used since years.

September 1, 2020

2 hours ago, Lars Fosdal said:

Did it also attempt to take some liberties with regards to HKLM?

Not that I have seen.

You can run Delphi 7 with non-admin rights, as soon as you install it not in "c:\program files".
This did not change since Vista.

August 31, 2020

I don't see Delphi 7 being slow on Windows 10, with the built-in antivirus/antimalware.

I installed it in a c:\Progs\Delphi7 folder, not in the default "c:\program files" sub-folder.
Ensure you installed https://www.idefixpack.de/blog/ide-tools/delphispeedup/ tool.

August 27, 2020

11 hours ago, RDP1974 said:

can I ask, in your Synopse software do you use Windows API IoCompletionPorts with thread pool and WSA* Overlapped I/O read write calls?

On Windows, we use http.sys kernel mode which scales better than anything on this platform. It is faster than IOCP since it runs in the kernel.

On Linux, we use our own thread-pool of socket server, with a nginx frontend as reverse proxy on the unix socket loopback, handling HTTPS and HTTP/2. This is very safe and scalable.

And don't trust micro benchmarks. Even worse, don't write your own benchmark. They won't be as good as measuring of a real application.

As I wrote, Intel TBB is a no-go for real server work due to huge memory consumption. If you have to run some specific API calls to release the memory, this is a big design flow - may be considered as a bug (we don't want to have the application stale as it would have with a GC) - and we would never do it.

To be more precise, we use long-living threads from thread pools. So in practice, the threads are never released, and the memory allocation and the memory release are done in diverse threads: one thread pool handles the socket communication, then other thread pool will consume the data and release the memory. This is a scenario typical from most event-driven servers, running on multi-core CPUs, with a proven ring-oriented architecture. Perhaps Intel TBB is not very good at releasing memory with such pattern - whereas our SynFPCx64MM is very efficient in this case. And we almost never realloc - just alloc/free using the stack as working buffer if necessary.

August 27, 2020

11 hours ago, RDP1974 said:

FastMM5 is fast as TBB under Webbroker with apachebench 100 concurrent users (finally overcoming the FM4 problems), but

TBB is 5x faster than FM5 under TParallel class

TBB is fast as FM4/FM5 in single thread

TBB is fast in benchmarks, but from our experiment not usable on production on a server.

TBB consumes A LOT of memory, much more than FM4/FM5 and alternatives.

Numbers for a real multi-threaded Linux server are a show stopper for using TBB.

On production on a huge Multi Xeon server, RAM consumption after a few hours stabilisation is gblic=2.6GB vs TBB=170GB - 60 times more memory ! With almost no actual performance boost.

This mORMot service handles TB of incoming data, sent by block every second, with thousands of simultaneous HTTPS connections.

See https://github.com/synopse/mORMot/blob/master/SynFPCCMemAligned.pas#L55

So never trust any benchmark.
Try with your real workload.

Quote

Delphi AVX support for Synopse MM?

What we found out with https://github.com/synopse/mORMot/blob/master/SynFPCx64MM.pas may be interesting for the discussion.

Using AVX for medium blocks moves/realloc doesn't change in practice in respect to an inlined SSE2 move (tiny/small/medium blocks), or a non-temporal move (using movntdq opcode instead of plain mov - for large blocks).
For large blocks, using mremap/VirtualAlloc in-place reallocation is a better approach: relying on the OS and performing no move is faster than AVX/AVX2/AVX512.

SynFPCx64MM is currently only for FPC. Used on production with heavily loaded servers.
It is based on FastMM4 design, fully optimized in x86_64 asm, but with a lockless round-robin algorithm for tiny blocks (<=256 bytes), and an optional lockless list for FreeMem - which are the bottleneck for most actual servers. It has several spinning alternatives in case of contention.
And it is really Open Source - not like FastMM5.
We may publish a Delphi-compatible version in the next weeks.

August 24, 2020

It is in records/seconds so the higher the better.

And it includes the ORM layer - which is very low in practice.

You can see for instance that if you use TDataSet (and DB.pas depending units) then reading one record/object has a noticeable overhead, in respect to direct DB access, as ZDBC does or our direct SynDB classes.

For a reference documentation, with some old numbers, you may check https://synopse.info/files/html/Synopse mORMot Framework SAD 1.18.html#TITL_59

Edit: you will see that the ORM also supports MongoDB as backend. Pretty unique.

August 24, 2020

Some discussion with numbers using ZDBC/Zeos 7.3 beta in respect to alternatives is available at https://synopse.info/forum/viewtopic.php?pid=32916#p32916

August 23, 2020

Great News!

Zeos is IMHO the best data access library for Delphi and FPC. And it is Open Source.

The direct ZDBC layer has tremendous performance, and a lot of work has been done for this 7.3 upcoming branch.

August 21, 2020

49 minutes ago, FPiette said:

BRCC32 cannot handle ico files with 24 bits RGB.

I don't see why - but I never used 24bits RGB icons... 16bits are good enough...

August 21, 2020

I would try to disable 3rd party packages first, and re-enable them one by one.

August 20, 2020

On 8/19/2020 at 2:03 AM, Mahdi Safsafi said:

Now, if you just used VirtualAlloc/VirtualFree, de-referencing the pointer is not required and all steps from 1 to 6 aren't necessary at all and paging is not happening too !!!

The important thing is that some of the above steps are heavy ... and that's why on their analyze they were taking hours to free the memory. Because a swap from/to disk/memory is happening all the time.

If the memory pages are swapped on disk, then indeed it will be slow to dereference the pointer.
But in this case, the application is very badly designed: paging on disk should be avoided in all cases, and direct disk API calls should be made instead to flush the data.

The problem is not the use of the MM. The problem is the whole memory allocation design in the application. Less memory should be allocated.

This is what @David Heffernan wrote, and you didn't get his point about swapping.

If the memory page is not on disk - then you may have a cache miss when the pointer is dereferenced.

For big memory blocks, it won't hurt. Calling VirtualFree will take definitively much more CPU than a cache miss.

So I still don't find the relevance of your argumentation.

Last but not least, the article you quoted (without any benchmark and code to prove the point) is very specific to the memory use of a database engine, which claims to be the fastest on the embedded market.
I doubt everytime I read such claim, and I don't see actual code. More like technical marketing arguments than real data.
Raima DB features "needing 350KB of RAM" and "optimized to run on resource-constrained IoT edge devices that require real-time response". So what is the point of benchmarking handling of GB of RAM?

The whole https://raima.com/sqlite-vs-rdm/ is full of FUD. The graphs are a joke. Since they don't even show the benchmark code, I guess they didn't even use a fair comparison and use SQLite in default mode - whereas with exclusive mode and in-memory journal, SQLite3 can be really fast. We have benchmark and code to show that with mORMot - https://synopse.info/files/html/Synopse mORMot Framework SAD 1.18.html#TITL_60 and https://synopse.info/files/html/Synopse mORMot Framework SAD 1.18.html#TITLE_140 (current numbers are even higher).
You may have to find better references.

August 18, 2020

@Mahdi Safsafi
Your article refers to the C malloc on Windows - which is known to be far from optimized - much less optimized than the Delphi MM.
For instance, the conclusion of the article doesn't apply to the Delphi MM: "If you have an application that uses a lot of memory allocation in relatively small chunks, you may want to consider using alternatives to malloc/free on Windows-based systems. While VirtualAlloc/VirtualFree are not appropriate for allocating less than a memory page they can greatly improve database performance and predictability when allocating memory in multiples of a single page.". This is exactly what FastMM4 does.

When I wrote fragmentation won't increase for HUGE blocks, I meant > some MB blocks. With such size, I would probably reuse the very same buffer per thread if performance is needed.

@Kas Ob.

You are just proving my point: if you use very specific OS calls, you may need buffer aligned on memory page.

August 18, 2020

On 8/15/2020 at 3:58 PM, Mahdi Safsafi said:

Large data tends to be aligned, calling Delphi MM will likely result to allocate one extra page for storing pointer info if the requested size is at the system granularity. Also pointer is not aligned at the system page granularity.

Allocating 4KB more for huge blocks is not an issue.

If you want the buffer aligned with system page granularity, then it is a very specific case, only needed by other OS calls, like changing the memory protection flags. It is theoritically possible, but very rare. This is the only reason when using the internal MM is not to be used.

If you expect to see any performance benefit of using memory page-aligned, you are pretty wrong for huge blocks - it doesn't change anything in practice. The only way to increase performance with huge block of memory is by using non-volatile/non-temporal asm opcodes (movnti e.g.), which won't populate the CPU cache. But this is only possible with raw asm, not Delphi code, and will clearly be MM independent.

August 15, 2020

20 hours ago, Mahdi Safsafi said:

Why you're using Delphi MM for such large block ? It would be better to use OS functions.

Delphi MM (FastMM4) is just a wrapper around the OS API for big blocks. No benefit of calling direclty the OS function, which is system-specific, and unsafe. Just use getmem/fremem/reallocmem everywhere.

Sign In

Arnaud Bouchez

Content Count

Joined

Last visited

Days Won

Content Type

Profiles

Forums

Calendar

Posts posted by Arnaud Bouchez

More performance Stringgrid sorting algorithm help

Best Practices for FireDAC FetchMode/RecordCount Settings

Best Practices for FireDAC FetchMode/RecordCount Settings

Firedac - Sqlite - DateTime field

Firedac - Sqlite - DateTime field

Record Alignement and Delphi 10.4.1

Record Alignement and Delphi 10.4.1

ANN: sgcWebSockets 4.2.2 New HTTP/2 Client

Use of Ansistring in Unicode world?

10.4.1 Released today

What ScriptEngine to choose ? DwScript, PascalScript, FastScript, TmsScripter, HtmlScripter

What ScriptEngine to choose ? DwScript, PascalScript, FastScript, TmsScripter, HtmlScripter

Delphi 7 is a lot slower on Windows 10 (compared to Win7)

Delphi 7 is a lot slower on Windows 10 (compared to Win7)

a pair of MM test

a pair of MM test

Zeos 7.3 entered the beta phase.

Zeos 7.3 entered the beta phase.

Zeos 7.3 entered the beta phase.

Free Resource Builder Utility?

Delphi IDE 10.4 Patch 3, crashes when opening pas files in current project

System.GetMemory returning NIL

System.GetMemory returning NIL

System.GetMemory returning NIL

System.GetMemory returning NIL

Browse

Activity