
Pierre le Riche

Everything posted by Pierre le Riche

  1. Pierre le Riche

    Experience/opinions on FastMM5

    I have not been able to find official confirmation of this on a Microsoft website, but according to several other sources, Windows backs a faulted page with memory closest to the CPU that caused the page fault wherever possible. So effectively it does not matter which thread allocated the virtual memory: the thread that touches a page first determines which physical memory backs it. It would certainly make a lot of sense for it to work that way.

    If you have a real-world workload that you could throw at it, I would really appreciate the feedback. I don't currently have any benchmarks that I think are suitable.

    Assuming the behaviour described above is correct and Windows backs a page with memory from the node of the CPU that touched the page first, I expect this new feature to have no material impact on performance for blocks much larger than 4K, but for smaller blocks there should be a measurable difference.
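The "first touch" policy described above can be illustrated with a toy model. This is a simulation for illustration only, not an OS API: the class and method names are invented, and the policy itself is the assumption the post hedges on.

```python
# Toy model of the assumed "first touch" page-backing policy: a page is backed
# by physical memory from the NUMA node of whichever thread touches it first,
# regardless of which thread reserved the virtual address range.

PAGE_SIZE = 4096  # typical x86 page size, matching the 4K figure in the post

class VirtualRange:
    def __init__(self, size_bytes: int):
        # Reserving/committing virtual memory assigns no physical node yet.
        self.backing_node = [None] * ((size_bytes + PAGE_SIZE - 1) // PAGE_SIZE)

    def touch(self, offset: int, touching_thread_node: int):
        """The first access to a page binds it to the toucher's NUMA node."""
        page = offset // PAGE_SIZE
        if self.backing_node[page] is None:
            self.backing_node[page] = touching_thread_node

# A thread on node 0 touches the first page; a thread on node 1 touches the
# second page first. Each page ends up local to its first toucher, not to
# whichever thread performed the allocation.
r = VirtualRange(2 * PAGE_SIZE)
r.touch(0, touching_thread_node=0)
r.touch(PAGE_SIZE, touching_thread_node=1)
print(r.backing_node)  # [0, 1]
```

Under this model, a multi-page block allocated by one thread but worked on page-by-page by several threads would still end up with node-local backing, which is why the post expects little difference for blocks much larger than 4K.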
  2. Pierre le Riche

    Experience/opinions on FastMM5

    Hi all,

    I've added experimental support for NUMA in a branch (numa_support). The idea is to link both the arenas and the threads (those for which performance matters) to a "NUMA mask". When scanning the arenas for available blocks, it performs a bitwise "and" between the mask for the arena and the mask for the thread; if the result is non-zero, the arena is allowed to serve blocks to that thread. In this way you can completely separate the memory pools of threads or groups of threads.

    I have not tested how well this works in practice (I don't have a NUMA system on hand), but I believe VirtualAlloc is smart enough to provide memory from the NUMA node closest to the CPU the thread is running on. I have made it so you can specify a mask per block size.

    Version 4 is susceptible to cache thrashing when adjacent small blocks share the same cache line and are written to by different CPUs; this mechanism can be used to avoid that as well.

    Pierre
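The arena-selection rule described above can be sketched as follows. This is not FastMM5's actual Delphi code; the function names and the example mask layout are illustrative only — the post specifies just the bitwise-"and" eligibility test.

```python
# Sketch of the NUMA-mask eligibility rule: an arena may serve a thread if
# the bitwise AND of their masks is non-zero (i.e. they share a set bit).

def arena_serves_thread(arena_mask: int, thread_mask: int) -> bool:
    """True when the arena's NUMA mask and the thread's NUMA mask overlap."""
    return (arena_mask & thread_mask) != 0

def eligible_arenas(arena_masks, thread_mask):
    """Indices of the arenas this thread is allowed to allocate from."""
    return [i for i, mask in enumerate(arena_masks)
            if arena_serves_thread(mask, thread_mask)]

# Example layout (hypothetical): arenas 0-1 dedicated to NUMA node 0 (bit 0),
# arenas 2-3 dedicated to NUMA node 1 (bit 1).
arena_masks = [0b01, 0b01, 0b10, 0b10]
print(eligible_arenas(arena_masks, 0b01))  # node-0 threads -> [0, 1]
print(eligible_arenas(arena_masks, 0b10))  # node-1 threads -> [2, 3]
print(eligible_arenas(arena_masks, 0b11))  # unrestricted thread -> all arenas
```

Because eligibility is a mask overlap rather than an exact match, a thread can be given access to several nodes' arenas, or groups of threads can share a private pool, which is what makes the same mechanism usable for avoiding the cache-line sharing problem mentioned above.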