
Pierre le Riche

Everything posted by Pierre le Riche

  1. Pierre le Riche

    Experience/opinions on FastMM5

    I have not been able to find official confirmation of this on a Microsoft website, but according to several other sources, Windows backs a faulted page with memory closest to the CPU that caused the page fault wherever possible. So effectively it does not matter which thread allocated the virtual memory: the thread that touches a page first determines which physical memory backs it. It would certainly make a lot of sense for it to work that way.

    If you have a real-world workload that you could throw at it, I would really appreciate the feedback. I don't currently have any benchmarks that I think are suitable.

    Assuming the behaviour described above is correct and Windows backs a page with memory from the node of the CPU that touched the page first, I expect this new feature to have no material impact on performance for blocks much larger than 4K, but for smaller blocks there should be a measurable difference.
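The "first touch" policy described above can be illustrated with a toy model. This is a simulation for illustration only, not an OS API: the class and method names are invented, and the policy itself is the assumption the post hedges on.

```python
# Toy model of the assumed "first touch" page-backing policy: a page is backed
# by physical memory from the NUMA node of whichever thread touches it first,
# regardless of which thread reserved the virtual address range.

PAGE_SIZE = 4096  # typical x86 page size, matching the 4K figure in the post

class VirtualRange:
    def __init__(self, size_bytes: int):
        # Reserving/committing virtual memory assigns no physical node yet.
        self.backing_node = [None] * ((size_bytes + PAGE_SIZE - 1) // PAGE_SIZE)

    def touch(self, offset: int, touching_thread_node: int):
        """The first access to a page binds it to the toucher's NUMA node."""
        page = offset // PAGE_SIZE
        if self.backing_node[page] is None:
            self.backing_node[page] = touching_thread_node

# A thread on node 0 touches the first page; a thread on node 1 touches the
# second page first. Each page ends up local to its first toucher, not to
# whichever thread performed the allocation.
r = VirtualRange(2 * PAGE_SIZE)
r.touch(0, touching_thread_node=0)
r.touch(PAGE_SIZE, touching_thread_node=1)
print(r.backing_node)  # [0, 1]
```

Under this model, a multi-page block allocated by one thread but worked on page-by-page by several threads would still end up with node-local backing, which is why the post expects little difference for blocks much larger than 4K.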
  2. Pierre le Riche

    Experience/opinions on FastMM5

    Hi all,

    I've added experimental support for NUMA in a branch (numa_support). The idea is to link both the arenas and the threads (those for which performance matters) to a "NUMA mask". When scanning the arenas for available blocks, it performs a bitwise "and" between the mask for the arena and the mask for the thread; if the result is non-zero, the arena is allowed to serve blocks to that thread. In this way you can completely separate the memory pools of threads or groups of threads.

    I have not tested how well this works in practice (I don't have a NUMA system on hand), but I believe VirtualAlloc is smart enough to provide memory from the NUMA node closest to the CPU the thread is running on. I have made it so you can specify a mask per block size.

    Version 4 is susceptible to cache thrashing when adjacent small blocks share the same cache line and are written to by different CPUs; this mechanism can be used to avoid that as well.

    Pierre
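The arena-selection rule described above can be sketched as follows. This is not FastMM5's actual Delphi code; the function names and the example mask layout are illustrative only — the post specifies just the bitwise-"and" eligibility test.

```python
# Sketch of the NUMA-mask eligibility rule: an arena may serve a thread if
# the bitwise AND of their masks is non-zero (i.e. they share a set bit).

def arena_serves_thread(arena_mask: int, thread_mask: int) -> bool:
    """True when the arena's NUMA mask and the thread's NUMA mask overlap."""
    return (arena_mask & thread_mask) != 0

def eligible_arenas(arena_masks, thread_mask):
    """Indices of the arenas this thread is allowed to allocate from."""
    return [i for i, mask in enumerate(arena_masks)
            if arena_serves_thread(mask, thread_mask)]

# Example layout (hypothetical): arenas 0-1 dedicated to NUMA node 0 (bit 0),
# arenas 2-3 dedicated to NUMA node 1 (bit 1).
arena_masks = [0b01, 0b01, 0b10, 0b10]
print(eligible_arenas(arena_masks, 0b01))  # node-0 threads -> [0, 1]
print(eligible_arenas(arena_masks, 0b10))  # node-1 threads -> [2, 3]
print(eligible_arenas(arena_masks, 0b11))  # unrestricted thread -> all arenas
```

Because eligibility is a mask overlap rather than an exact match, a thread can be given access to several nodes' arenas, or groups of threads can share a private pool, which is what makes the same mechanism usable for avoiding the cache-line sharing problem mentioned above.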