Leaderboard


Popular Content

Showing content with the highest reputation on 05/01/20 in all areas

  1. Pierre le Riche

    Experience/opinions on FastMM5

    Hi all,

    I've added experimental support for NUMA in a branch (numa_support). The idea is to link both the arenas and threads (for which performance matters) to a "NUMA mask". When scanning the arenas for available blocks it will perform a bitwise "and" between the mask for the arena and the mask for the thread, and if the result is non-zero then the arena is allowed to serve blocks to that thread. In this way you can completely separate the memory pools between threads or groups of threads.

    I have not tested how well this works in practice (I don't have a NUMA system on hand), but I believe VirtualAlloc is smart enough to provide memory from the NUMA node closest to the CPU the thread is running on.

    I have made it so you can specify a mask by block size. Version 4 is susceptible to cache thrashing when adjacent small blocks share the same cache line and are written to by different CPUs. By using this mechanism that can also be avoided.

    Pierre
  2. Günther Schoch

    Experience/opinions on FastMM5

    Well, during the design phase of FastMM5 this feature was discussed but not (yet) implemented. The background was: a) a lot of software now runs on large AWS nodes or similar virtual servers, where optimization via NUMA is rather a special case; b) modern processors such as the AMD EPYC (https://www.nextplatform.com/2019/08/15/a-deep-dive-into-amds-rome-epyc-architecture/) have internal optimization strategies. But we are open to everything that makes FastMM5 performance significantly better. Regards, Günther (Günther Schoch, gs-soft AG = we sponsored FastMM5)
  3. What do you think about FPC + Linux support, which is a good environment for multi-threaded servers? The FPC built-in heap is good, but tends to consume a lot of memory when there are many threads: it maintains small per-thread heaps using a threadvar, whereas FastMM5 uses several arenas which are shared among all threads (I guess the idea is inspired by the ptmalloc/glibc allocator). I have tried all the best-known C alternatives, and I was not convinced. The only stable and non-bloated memory manager is the one in glibc. But the slightest memory access violation tends to kill/abort the process, so it is not good in production. I could definitely help with the Linux/FPC syscalls and the low-level Intel asm, to include FPC/Linux support in FastMM5. But perhaps I would go in this direction only if using FPC as the compiler doesn't require a commercial license. What do you think?
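
For readers who want to picture the NUMA mask check described in the first post, here is a minimal sketch in C. It is not taken from FastMM5 (which is written in Delphi/asm): the names `arena_t`, `numa_mask`, `has_free_block` and `find_arena` are made up for illustration, and the free-block flag is just a stand-in for a real free-list lookup. It only shows the bitwise "and" between the arena mask and the thread mask deciding whether an arena may serve a thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena descriptor: only the fields this sketch needs. */
typedef struct {
    uint32_t numa_mask;      /* bit set = thread group allowed to use this arena */
    int      has_free_block; /* stand-in for a real free-list check */
} arena_t;

/*
 * Scan the arenas in order and return the index of the first one that
 * (a) shares at least one mask bit with the calling thread and
 * (b) currently has a free block.
 * Returns -1 if no arena qualifies.
 */
static int find_arena(const arena_t *arenas, size_t count, uint32_t thread_mask)
{
    for (size_t i = 0; i < count; i++) {
        /* Non-zero result of the bitwise "and" means the arena may serve this thread. */
        if ((arenas[i].numa_mask & thread_mask) != 0 && arenas[i].has_free_block) {
            return (int)i;
        }
    }
    return -1;
}
```

With arenas masked `0x1, 0x1, 0x2, 0x2`, a thread with mask `0x1` can only ever be served from the first two arenas and a thread with mask `0x2` only from the last two, so the two groups never share a pool. A per-block-size mask, as mentioned in the post, could presumably be modelled by keeping one such mask per size class, but that is an assumption about the design rather than something shown here.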