A.M. Hoornweg, posted October 30, 2020:
Hello all, I need to map a block of memory (a ring buffer) into the address space of multiple processes for shared access. The Windows API offers a technology called memory-mapped files for this purpose. My question: does the "Interlockedxxxx" API work reliably across process boundaries? The Windows API documentation does not state that explicitly. My hope is that these commands translate into process-agnostic CPU opcodes.
FPiette, posted October 30, 2020:
Quoting A.M. Hoornweg: "does the "Interlockedxxxx" API work reliably across process boundaries?"
I think so, provided the data is in shared memory.
Anders Melander, posted October 30, 2020:
Quoting A.M. Hoornweg: "does the "Interlockedxxxx" API work reliably across process boundaries?"
Yes.
Mahdi Safsafi, posted October 30, 2020:
Quote: "does the "Interlockedxxxx" API work reliably across process boundaries?"
The Interlockedxxxx functions defined in Delphi are just wrappers for System.Atomicxxxx.
Quote: "The Windows API documentation does not state that explicitly."
Yes, it works as long as you respect the alignment requirement.
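
To illustrate the answers above, here is a minimal sketch of an interlocked increment on a counter that lives in a named, pagefile-backed file mapping. The function name `BumpSharedCounter`, the mapping name and the 4096-byte size are assumptions made up for this example; only the Win32 calls themselves are real API.

```delphi
program SharedCounterDemo;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

// Maps a small named section and atomically increments the Integer at offset 0.
// Every process that opens the same name operates on the same physical bytes.
function BumpSharedCounter(const MappingName: string): Integer;
const
  MappingSize = 4096; // hypothetical size, one page is plenty for a counter
var
  hMap: THandle;
  Counter: PInteger;
begin
  // INVALID_HANDLE_VALUE requests a section backed by the page file, not a disk file.
  hMap := CreateFileMapping(INVALID_HANDLE_VALUE, nil, PAGE_READWRITE, 0,
    MappingSize, PChar(MappingName));
  if hMap = 0 then
    RaiseLastOSError;
  try
    Counter := MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, MappingSize);
    if Counter = nil then
      RaiseLastOSError;
    try
      // The view base is aligned to the allocation granularity, so offset 0
      // satisfies the 32-bit alignment the interlocked functions require.
      Result := InterlockedIncrement(Counter^);
    finally
      UnmapViewOfFile(Counter);
    end;
  finally
    CloseHandle(hMap);
  end;
end;

begin
  // 'Local\DemoSharedCounter' is a made-up name for this sketch.
  Writeln('Counter is now ', BumpSharedCounter('Local\DemoSharedCounter'));
end.
```

A freshly created section is zero-filled and disappears when its last handle is closed, so values above 1 only show up while another process keeps the same mapping open.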
A.M. Hoornweg, posted November 2, 2020:
Thank you all! I'm giving it a try.
A.M. Hoornweg, posted November 3, 2020:
Maybe a crazy idea, but since this block of memory is in a shared address space of two applications, would it be possible to place a "critical section" object in there? I'm looking for a lightweight method to serialize access (something more lightweight than a mutex). One process will be writing the ring buffer and another will be reading it.
Guest, posted November 3, 2020:
I have never tried it between processes (shared memory), but WaitOnAddress https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitonaddress does provide an easy way to build a one-producer, multi-consumer mechanism. It is lighter than an event and a little faster in response, so it is worth a try to see whether it works on a shared memory address.
Anders Melander, posted November 3, 2020:
Why not just use the traditional synchronization mechanisms provided by the OS and keep things simple? It seems pointless to focus on micro-optimizing the concurrency control of a buffer backed by virtual memory and accessed synchronously. The sync mechanism is not going to be the bottleneck. I have the code for a shared memory ring buffer somewhere if the OP is interested. I used it a few decades ago to pipe CodeSite log messages between a producer and a consumer service. The consumer wrote the log to disk.
Mahdi Safsafi, posted November 3, 2020:
Quoting A.M. Hoornweg: "Maybe a crazy idea, but since this block of memory is in a shared address space of two applications, would it be possible to place a "critical section" object in there?"
No! A CS is not designed to work between processes. At first sight it may look as if putting it into shared memory would do the trick, but things are more complex than that: a signaling mechanism is involved, usually a semaphore, and it has local (private) visibility. @Kas Ob. WaitOnAddress apparently works only with threads in the same process.
Anders Melander, posted November 3, 2020:
Quoting Mahdi Safsafi: "WaitOnAddress apparently works only with threads in the same process"
And it's extremely subject to race conditions. https://devblogs.microsoft.com/oldnewthing/20160826-00/?p=94185
Mahdi Safsafi, posted November 3, 2020:
Quoting Anders Melander: "And it's extremely subject to race conditions. https://devblogs.microsoft.com/oldnewthing/20160826-00/?p=94185"
Wake Woke Woken ... I'm pretty sure I'm going to wake too early.
Guest, posted November 4, 2020:
Quoting Anders Melander: "And it's extremely subject to race conditions."
I would not use "extremely" there, as race conditions can happen with almost every threading model. In other words, it depends on the design: you can have a race with a CS or with events, and with wrong assumptions you can have one with IOCP. Anyway, thank you for that link. I am adding this one, which has a very interesting table comparing its characteristics with futex: https://devblogs.microsoft.com/oldnewthing/20170601-00/?p=96265
FPiette, posted November 4, 2020:
Quoting A.M. Hoornweg: "I'm looking for a lightweight method to serialize access (something more lightweight than a mutex). One process will be writing the ring buffer and another will be reading it."
InterlockedCompareExchange can be used to implement a spinlock. When using shared memory, this will work interprocess.
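
The spinlock idea can be sketched roughly as follows. The names `SpinAcquire`, `SpinRelease` and `DemoLock` are made up for this example, and `DemoLock` is an ordinary global standing in for an aligned Integer field inside the mapped view.

```delphi
program SpinLockDemo;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

// Lock must be a 32-bit aligned Integer: 0 = free, 1 = held.
procedure SpinAcquire(var Lock: Integer);
begin
  // Atomically replace 0 with 1. The call returns the previous value,
  // so a result of 0 means this caller took the lock.
  while InterlockedCompareExchange(Lock, 1, 0) <> 0 do
    SwitchToThread; // yield the time slice rather than burn CPU while waiting
end;

procedure SpinRelease(var Lock: Integer);
begin
  // An interlocked store makes the release visible to other processes.
  InterlockedExchange(Lock, 0);
end;

var
  DemoLock: Integer = 0; // assumption: stands in for a field in shared memory

begin
  SpinAcquire(DemoLock);
  try
    Writeln('Inside the protected region');
  finally
    SpinRelease(DemoLock);
  end;
end.
```

One design caveat: unlike a mutex, nothing releases such a lock if the owning process dies while holding it, so the protected region should be kept as short as possible.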
Mahdi Safsafi, posted November 4, 2020:
Quoting FPiette: "InterlockedCompareExchange can be used to implement a spinlock. When using shared memory, this will work interprocess."
Yes, that's right, but at what cost? A spinlock is good when the protected operation is lightweight; otherwise it's just overhead on the system/CPU.
Anders Melander, posted November 4, 2020:
Quoting FPiette: "InterlockedCompareExchange can be used to implement a spinlock. When using shared memory, this will work interprocess."
So how will you use a spinlock to signal the consumer thread that data is available without busy-waiting?
A.M. Hoornweg, posted November 4, 2020:
@Anders Melander In this case a signal to the consumer thread that data is available isn't necessary. The producer writes into the ring buffer at 10 Hz; the consumer polls the ring buffer every few seconds and pulls whatever was put in there. A synchronization object is only needed for atomicity of the pointers and counters.
Anders Melander, posted November 4, 2020:
Quoting A.M. Hoornweg: "The producer writes into the ring buffer @ 10Hz, the consumer polls the ring buffer every few seconds and pulls whatever was put in there."
Are you kidding me? What you are asking for is apparently a wheelbarrow while the rest of us are suggesting different ways to build a drag racer. I don't get why you don't just use a TMutex to protect the ring buffer structure and a TSemaphore to signal availability. I mean, it's around 10 lines of extra code compared to the 200-300 lines of code you will need to implement the ring buffer in shared memory.
Quoting A.M. Hoornweg: "A synchronization object is only needed for atomicity of the pointers and counters."
You're aware that you can't store pointers in shared memory, right?
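
The TMutex plus TSemaphore suggestion might look roughly like the sketch below. The object names and the `Produce` and `TryConsume` helpers are assumptions invented for illustration, and the actual buffer copy is left out; both sides run in one process here only to keep the example self-contained.

```delphi
program MutexSemaphoreDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.SyncObjs;

var
  Guard: TMutex;         // protects the ring buffer structure
  Available: TSemaphore; // counts records that are ready to be read

procedure Produce;
begin
  Guard.Acquire;
  try
    // ...copy one record into the shared buffer here...
  finally
    Guard.Release;
  end;
  Available.Release; // signal that one more record is available
end;

function TryConsume(TimeoutMs: Cardinal): Boolean;
begin
  // Sleeps in the kernel until a record is available or the timeout expires.
  Result := Available.WaitFor(TimeoutMs) = wrSignaled;
  if Result then
  begin
    Guard.Acquire;
    try
      // ...copy one record out of the shared buffer here...
    finally
      Guard.Release;
    end;
  end;
end;

begin
  // Named kernel objects are shared by every process that opens the same
  // name. Both names are hypothetical.
  Guard := TMutex.Create(nil, False, 'DemoRingBufferGuard');
  Available := TSemaphore.Create(nil, 0, MaxInt, 'DemoRingBufferAvailable');
  try
    Produce;
    if TryConsume(100) then
      Writeln('Consumed one record')
    else
      Writeln('Timed out');
  finally
    Available.Free;
    Guard.Free;
  end;
end.
```

In a real setup the producer and the consumer would each create the two objects with the same names in their own process.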
A.M. Hoornweg, posted November 5, 2020:
@Anders Melander This recording application stores processed data records at 10 Hz, which is slow by any metric, but it performs sampling and aggregation at a much higher frequency. This is a "finished" application, tried and tested, and we'd like to avoid breaking anything because its reliability is vital to our business. But since the beginning of the COVID-19 pandemic my entire department has been working from home, and we need access to that data. So the idea was to make a tiny modification to that application: give it a circular buffer in RAM that is accessible from an outside process, and dimension that buffer big enough to contain an hour of data. We would then write an independent application that acts as a TCP server, allowing us to stream large chunks of data from the buffer. Not disturbing the data acquisition itself is an absolute necessity, hence my question about a lean locking mechanism. It is absolutely no problem if the consumer must wait a few ms, but the producer should be as undisturbed as possible. And of course I meant the word "pointer" in a generic sense, not as a logical address. The buffer would get a 4 KB header with a version number and some properly aligned control variables. All "pointers" in there will just be record numbers.
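
A header of the kind described above could be laid out as in the following sketch. Every field name, the 64-byte record size and the helper `RecordAddress` are assumptions for illustration; the capacity of 36000 records is simply one hour at 10 Hz. A heap block stands in for the mapped view so the example runs on its own.

```delphi
program RingHeaderDemo;
{$APPTYPE CONSOLE}

type
  // 4 KB header. Each control variable is a 32-bit value at an offset that is
  // a multiple of 4, which keeps it aligned for the interlocked functions.
  TRingHeader = packed record
    Version: Integer;    // offset 0
    RecordSize: Integer; // offset 4, bytes per record
    Capacity: Integer;   // offset 8, number of record slots
    WriteCount: Integer; // offset 12, record number of the next write
    Padding: array[0..4096 - 16 - 1] of Byte; // pads the header to 4096 bytes
  end;
  PRingHeader = ^TRingHeader;

// Turns a record number into an address inside the caller's own view.
// Only record numbers are stored in shared memory, never addresses, because
// each process may map the view at a different base address.
function RecordAddress(Base: Pointer; RecordNo: Integer): Pointer;
var
  H: PRingHeader;
begin
  H := Base;
  Result := PByte(Base) + SizeOf(TRingHeader) +
    NativeInt(RecordNo mod H^.Capacity) * H^.RecordSize;
end;

var
  Buf: Pointer;
  H: PRingHeader;
begin
  // Heap memory stands in for the pointer returned by MapViewOfFile.
  GetMem(Buf, SizeOf(TRingHeader) + 36000 * 64);
  try
    H := Buf;
    H^.Version := 1;
    H^.RecordSize := 64;  // hypothetical record size
    H^.Capacity := 36000; // 10 records per second * 3600 seconds
    H^.WriteCount := 0;
    Writeln('Header size: ', SizeOf(TRingHeader));
    Writeln('Offset of record 1: ',
      NativeUInt(RecordAddress(Buf, 1)) - NativeUInt(Buf));
    // Record number 36000 wraps around to slot 0.
    Writeln('Wraps: ', RecordAddress(Buf, 36000) = RecordAddress(Buf, 0));
  finally
    FreeMem(Buf);
  end;
end.
```

Storing the record size and capacity in the header lets a reader check them, together with the version number, before trusting the layout.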
Anders Melander, posted November 5, 2020:
@A.M. Hoornweg Thanks for the detailed explanation. I understand your challenge much better now. I think it's essential that you realize that the overhead of the lock itself is not going to be a factor at all, and instead focus on minimizing the time the lock is held, i.e. the time to transfer data to and from the shared memory buffer. My recommendation would be to start with a simple solution, and if that turns out not to be fast enough then you can try to come up with something better. I don't know anything about the actual amount of data you're processing, but I think I would strive to process it in many small chunks instead of a few large chunks. On the producer side I would write data packets to a (fast) lock-free queue and have a separate thread read from that queue and write them to the (slower) shared memory queue. If it's essential that the producer isn't blocked, then you will have to accept that data can be dropped if the consumer isn't able to keep up, but I guess you already know that. Again, if you want it, I have a ready-to-use implementation of a shared memory circular buffer that has been used in production for many, many years.
A.M. Hoornweg, posted November 5, 2020:
@Anders Melander That would be very kind of you! I'd like to take a closer look at it at the very least!
Fr0sT.Brutal, posted November 5, 2020:
Quoting A.M. Hoornweg: "The producer writes into the ring buffer @ 10Hz, the consumer polls the ring buffer every few seconds and pulls whatever was put in there."
Why not just use pipes or sockets instead?
Anders Melander, posted November 5, 2020:
Quoting A.M. Hoornweg: "I'd like to take a closer look at it at the very least!"
Okay, here you go. Source and simple demo attached.

Usage:

Producer

    var FRingBuffer := TSharedMemoryRingBuffer.Create('FooBar', 1024*1024); // 1 MB
    ...
    // String
    FRingBuffer.Enqueue('Hello world');

    // Raw bytes
    var Buffer: TBytes;
    ...
    FRingBuffer.Enqueue(Buffer);

Consumer

    var FRingBuffer := TSharedMemoryRingBuffer.Create('FooBar', 1024*1024); // 1 MB
    ...
    // Strings
    while (True) do
    begin
      // Just remove the WaitFor to use polling instead
      if (FRingBuffer.WaitFor(100) = wrSignaled) then
      begin
        var s: string;
        if (FRingBuffer.Dequeue(s)) then
          ...do something with string...
      end;
      ...
    end;

    // Raw bytes
    while (True) do
    begin
      // Just remove the WaitFor to use polling instead
      if (FRingBuffer.WaitFor(100) = wrSignaled) then
      begin
        var Buffer: TBytes;
        if (FRingBuffer.Dequeue(Buffer)) then
          ...do something with buffer...
      end;
      ...
    end;

Attachments: amSharedMemory.pas, SharedMemory.zip