Leaderboard


Arnaud Bouchez (Members)
- Points: 2
- Content Count: 325

Lars Fosdal (Administrators)
- Points: 1
- Content Count: 3524

Bill Meyer (Members)
- Points: 1
- Content Count: 656

Fr0sT.Brutal (Members)
- Points: 1
- Content Count: 2268

Popular Content

Showing content with the highest reputation on 06/03/21 in all areas

Is it possible to get a list of all global variables, consts, types (non RTL)?

Bill Meyer replied to Mike Torrettinni's topic in General Help

I have found it necessary to write small apps to derive what I need from reports out of PAL. The noise level is quite high, and the options do not address things I have needed.
- June 3, 2021
- 8 replies
Listen to UDP in a TThread (Windows Service)

Fr0sT.Brutal replied to Clément's topic in ICS - Internet Component Suite

Btw, TranslateMessage only translates virtual-key messages into character messages, so it is useless in worker threads. Removing it saves 3 lines 😉
- June 3, 2021
- 5 replies
Fast Pos & StringReplace for 64 bit

Arnaud Bouchez replied to Tom de Neef's topic in Algorithms, Data Structures and Class Design

Here is, for instance, how FPC compiles our PosExPas() function for Linux x86_64: https://gist.github.com/synopse/1e30b30a77f6b0288310115085401c1e You can see the resulting asm is very efficient, thanks, to be fair, to the goto used in the Pascal code and to the proper use of pointers/registers. 😉

You may find some inspiring string processing in our https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.unicode.pas#L1404 unit. This link gives you some efficient case-insensitive processing, not only over Latin A-Z chars but over the whole Unicode 10.0 case folding tables (i.e. it works with Greek, Cyrillic, and other folds). This code is UTF-8 focused, because our framework uses UTF-8 instead of UTF-16 for faster processing with minimal memory allocation and usage. Even so, you will find some Pascal code there which is as fast as manual asm without SIMD support. For better performance, branchless SIMD is the key, but it is much more complex and error-prone.

The main trick about case-insensitive search is that, on modern CPUs, a branchless version using a lookup table for A-Z chars is faster than cascaded cmp/ja/jb branches. We just extended this idea to Unicode case folding.
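The lookup-table trick can be sketched as follows. This is a minimal, hypothetical sketch in Python, not the mORMot code (which is Pascal and covers the full Unicode 10.0 case folding tables): it assumes plain ASCII bytes, and `pos_ci` and `UPPER` are names made up for illustration. The point is the structure: every byte is folded through a 256-entry table instead of being range-tested against 'a'..'z'.

```python
# Hypothetical ASCII-only sketch, not the mORMot implementation.
# 256-entry fold table: bytes a-z map to A-Z, every other byte maps to itself.
UPPER = bytes.maketrans(b"abcdefghijklmnopqrstuvwxyz",
                        b"ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def pos_ci(needle: bytes, haystack: bytes, start: int = 0) -> int:
    """Return the 0-based offset of the first case-insensitive (A-Z only)
    match of needle in haystack at or after start, or -1 if there is none."""
    n, h = len(needle), len(haystack)
    if n == 0 or start < 0 or start > h - n:
        return -1
    folded = needle.translate(UPPER)  # fold the needle once, up front
    first = folded[0]
    for i in range(start, h - n + 1):
        # One table lookup per byte replaces an "is it in 'a'..'z'?" range test.
        if UPPER[haystack[i]] != first:
            continue
        j = 1
        while j < n and UPPER[haystack[i + j]] == folded[j]:
            j += 1
        if j == n:
            return i
    return -1
```

For example, `pos_ci(b"mormot", b"Hello, mORMot World")` returns 7. In a compiled language the table fold typically becomes a single indexed load with no conditional jump, which is where the speedup described above comes from; in Python the interpreter overhead hides that difference, so this only shows the shape of the technique.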
Fast Pos & StringReplace for 64 bit

Arnaud Bouchez replied to Tom de Neef's topic in Algorithms, Data Structures and Class Design

In practice, SSE 4.2 is not faster than regular SSE 2 code for strcmp and strlen. More complex processing may benefit from SSE 4.2, but the fastest JSON parser I have seen doesn't use it; it focuses on micro-parallelized branchless processing with regular SIMD instructions: see https://github.com/simdjson/simdjson Memory access is the bottleneck. This is what Agner measured.

For any asm work, it is mandatory to refer to https://agner.org/optimize which provides reference code and reference documentation about how modern asm should be written.

The PosEx_Sha_Pas_2 version is one of the fastest, and probably faster than your version, even though it is written in pure Pascal. For instance, reading a register then applying shr + cmp is not the fastest pattern today. The Pascal version will also work on Mac and Linux, whereas your asm version would need additional code to support the POSIX ABI. We included it (with minimal tweaks, like using NativeInt instead of integer, and an inline stub for register usage) in https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.base.pas#L7974

The first thing to do is to benchmark and compare your code with proper timing and regular tests. Try with some complex processing, not naive loops, which tend to be biased: in naive tests the data remains in the CPU L1 cache, so the numbers are not realistic.
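The advice about proper timing and regular tests can be sketched as a small harness. This is a minimal sketch in Python (3.9+ for `Random.randbytes`), not code from the thread: `naive_pos`, `check_against_reference` and `bench` are hypothetical names, and `bytes.find` serves as the reference implementation. It first checks a candidate against the reference on many random inputs, then times it over many distinct buffers, so a pass does not keep re-reading one small buffer that stays cache-resident.

```python
import random
import time


def naive_pos(needle: bytes, haystack: bytes, start: int = 0) -> int:
    """Hypothetical candidate under test: a plain byte-by-byte search."""
    n = len(needle)
    for i in range(start, len(haystack) - n + 1):
        if haystack[i:i + n] == needle:
            return i
    return -1


def check_against_reference(candidate, trials: int = 2000, seed: int = 1) -> None:
    """Regular tests: compare candidate with bytes.find on many small random
    inputs over a two-letter alphabet, so partial matches are frequent."""
    rng = random.Random(seed)
    for _ in range(trials):
        hay = bytes(rng.choice(b"ab") for _ in range(rng.randint(0, 40)))
        needle = bytes(rng.choice(b"ab") for _ in range(rng.randint(1, 5)))
        start = rng.randint(0, len(hay))
        expected = hay.find(needle, start)
        got = candidate(needle, hay, start)
        assert got == expected, (needle, hay, start, got, expected)


def bench(candidate, needle: bytes, total_bytes: int = 8 << 20,
          chunk: int = 4096, repeats: int = 3, seed: int = 2) -> float:
    """Timing: run candidate over many distinct random chunks (a working set of
    total_bytes, not one small buffer reused in a loop) and return the best
    wall-clock time of one full pass, in seconds."""
    rng = random.Random(seed)
    chunks = [rng.randbytes(chunk) for _ in range(total_bytes // chunk)]
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        for c in chunks:
            candidate(needle, c, 0)
        best = min(best, time.perf_counter() - t0)
    return best
```

Typical use is `check_against_reference(naive_pos)`, which raises on the first mismatch, followed by comparing `bench(naive_pos, b"mORMot")` with `bench(lambda nd, h, s: h.find(nd, s), b"mORMot")`. In Python the interpreter overhead dwarfs cache effects, so this only illustrates the structure of such a harness; the same structure applies when timing compiled Pascal or asm code.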
