A.M. Hoornweg

Interlocked API and memory-mapped files

Recommended Posts

Hello all,

 

I need to map a block of memory (a ring buffer) into the address space of multiple processes for shared access. The Windows API offers a technology called memory-mapped files for this purpose.

 

My question: does the "Interlockedxxxx" API work reliably across process boundaries? The Windows API documentation does not state that explicitly. My hope is that these functions translate into CPU opcodes that are agnostic of process boundaries.

1 hour ago, A.M. Hoornweg said:

does the "Interlockedxxxx" API work reliably across process boundaries ?

I think so, provided the data is in shared memory.

Quote

does the "Interlockedxxxx" API work reliably across process boundaries 

The Interlockedxxxx functions defined in Delphi are just wrappers for the System.Atomicxxxx intrinsics.

Quote

The windows API documentation does not state that explicitly.

Yes, it works as long as you respect the alignment requirements.
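To illustrate with a minimal C sketch (C11 atomics standing in for the Interlocked functions; `shared_counter` and `bump` are made-up names, not from any real API):

```c
#include <stdatomic.h>
#include <assert.h>

/* Illustrative sketch only: an Interlocked-style increment is a single atomic
 * read-modify-write on the memory location itself. Nothing about it is tied
 * to one process, so two processes mapping the same page can use it safely,
 * provided the variable is naturally aligned (here: a 32-bit counter on a
 * 4-byte boundary). */
static _Alignas(4) atomic_int shared_counter;  /* imagine this in the mapped view */

static int bump(void)
{
    /* behaves like InterlockedIncrement: atomically adds 1, returns the new value */
    return atomic_fetch_add(&shared_counter, 1) + 1;
}
```

On x86 such an operation compiles to a single lock-prefixed instruction; the CPU arbitrates on the memory location itself and has no notion of which process issued it.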


Maybe a crazy idea, but since this block of memory is in a shared address space of two applications, would it be possible to place a "critical section" object in there?   

 

I'm looking for a lightweight method to serialize access (something more lightweight than a mutex). One process will be writing the ring buffer and another will be reading it.

 


Why not just use the traditional synchronization mechanisms provided by the OS and keep things simple? It seems pointless to focus on micro-optimizing the concurrency control of a buffer backed by virtual memory and accessed synchronously. The sync mechanism is not going to be the bottleneck.

 

I have the code for a shared memory ring buffer somewhere if the OP is interested. I used it a few decades ago to pipe CodeSite log messages between a producer and a consumer service. The consumer wrote the log to disk.


4 hours ago, A.M. Hoornweg said:

Maybe a crazy idea, but since this block of memory is in a shared address space of two applications, would it be possible to place a "critical section" object in there?   



No! A critical section is not designed for inter-process use. At first sight it may look like putting it into shared memory will do the trick, but in fact things are more complex than that: there is a signaling mechanism involved, usually a semaphore, which has local (private) visibility.

@Kas Ob. WaitOnAddress apparently works only with threads in the same process.


10 hours ago, Anders Melander said:

And it's extremely subject to race conditions.

I would not use "extremely" there, as race conditions can happen with almost every threading model; in other words, it depends on the design. You can have a race with a CS or with events, and with wrong assumptions you can have one with IOCP.

 

Anyway, thank you for that link, and I am adding this one, with a very interesting table comparing its characteristics with those of a futex:

https://devblogs.microsoft.com/oldnewthing/20170601-00/?p=96265 

16 hours ago, A.M. Hoornweg said:

I'm looking for a lightweight method to serialize access (something more lightweight than a mutex). One process will be writing the ring buffer and another will be reading it.

 

InterlockedCompareExchange can be used to implement a spinlock. When using shared memory, this will work interprocess.
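A minimal sketch of such a spinlock in C, using C11's compare-and-swap as a stand-in for InterlockedCompareExchange (all names here are invented for the example):

```c
#include <stdatomic.h>
#include <assert.h>

/* Sketch of a spinlock built on compare-and-swap, the primitive behind
 * InterlockedCompareExchange. The lock word is ordinary memory; place it in
 * the shared mapping and both processes contend on the same word. */
typedef struct { atomic_int locked; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
    int expected = 0;
    /* try to swap 0 -> 1; on failure CAS writes the current value into
     * `expected`, so reset it and spin until we win */
    while (!atomic_compare_exchange_weak(&l->locked, &expected, 1))
        expected = 0;
}

static void spin_unlock(spinlock_t *l)
{
    atomic_store(&l->locked, 0);  /* release the lock */
}

static spinlock_t demo_lock;

/* single-threaded demonstration of acquire/release */
static int demo(void)
{
    spin_lock(&demo_lock);
    int held = atomic_load(&demo_lock.locked);  /* 1 while held */
    spin_unlock(&demo_lock);
    return held;
}
```

As noted elsewhere in the thread, a spinlock busy-waits while contended, so it only pays off when the lock is held very briefly.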

1 hour ago, FPiette said:

InterlockedCompareExchange can be used to implement a spinlock. When using shared memory, this will work interprocess.

Yes, that's right, but at what cost? A spinlock is good when the operation is lightweight ... otherwise it's just overhead on the system/CPU.

4 hours ago, FPiette said:

InterlockedCompareExchange can be used to implement a spinlock. When using shared memory, this will work interprocess.

So how will you use a spinlock to signal the consumer thread that data is available without doing a busy wait?


@Anders Melander in this case a signal to the consumer thread that data is available isn't necessary. The producer writes into the ring buffer at 10 Hz, the consumer polls the ring buffer every few seconds and pulls whatever was put in there. A synchronization object is only needed for atomicity of the pointers and counters.

Edited by A.M. Hoornweg

1 hour ago, A.M. Hoornweg said:

The producer writes into the ring buffer @ 10Hz, the consumer polls the ring buffer every few seconds and pulls whatever was put in there.

Are you kidding me?

What you are asking for is apparently a wheelbarrow while the rest of us are suggesting different ways to build a drag racer :classic_smile:

 

I don't get why you don't just use a TMutex to protect the ring buffer structure and a TSemaphore to signal availability. I mean, it's around 10 lines of extra code compared to the 200-300 lines of code you will need to implement the ring buffer in shared memory.

 

1 hour ago, A.M. Hoornweg said:

A synchronization object is only needed for atomicity of the pointers and counters.

You're aware that you can't store pointers in shared memory, right?


@Anders Melander This recording application stores processed data records at 10 Hz, which is slow by any metric, but it performs sampling and aggregation at a much higher frequency. This is a "finished" application, tried and tested, and we'd like to avoid breaking anything because its reliability is vital to our business.

 

But since the beginning of the Covid-19 pandemic my entire department has worked from home, and we need to access that data. So the idea was to make a tiny modification to that application: give it a circular buffer in RAM that is accessible from an outside process, and dimension that buffer big enough to contain an hour of data.

 

We would then write an independent application that acts as a TCP server, allowing us to stream large chunks of data from the buffer.  Not disturbing the data acquisition itself is an absolute necessity, hence my question about a lean locking mechanism. It is absolutely no problem if the consumer must wait a few ms, but the producer should be as undisturbed as possible. 

 

 

And of course I meant the word "pointer" in a generic sense, not as a logical address. The buffer would get a 4 KB header with a version number and some properly aligned control variables. All "pointers" in there will just be record numbers.
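To sketch the idea in C (illustrative only; all names, types and sizes are made up, the real header would of course differ):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <assert.h>

/* Illustrative sketch of a header whose "pointers" are monotonically
 * increasing record numbers. With exactly one writer and one reader the
 * counters need only atomic loads/stores, no lock at all; the slot for
 * record n is n % CAPACITY. */
#define CAPACITY 8

typedef struct {
    uint32_t version;
    _Alignas(8) atomic_uint_least64_t head;  /* next record number to write */
    _Alignas(8) atomic_uint_least64_t tail;  /* next record number to read  */
    int records[CAPACITY];
} ring_header_t;

static ring_header_t ring;  /* in reality this lives in the mapped view */

static int ring_put(int value)
{
    uint64_t h = atomic_load(&ring.head);
    if (h - atomic_load(&ring.tail) == CAPACITY)
        return 0;                         /* full */
    ring.records[h % CAPACITY] = value;   /* write the record first... */
    atomic_store(&ring.head, h + 1);      /* ...then publish it        */
    return 1;
}

static int ring_get(void)                 /* returns -1 when empty */
{
    uint64_t t = atomic_load(&ring.tail);
    if (t == atomic_load(&ring.head))
        return -1;
    int value = ring.records[t % CAPACITY];  /* read the record first... */
    atomic_store(&ring.tail, t + 1);         /* ...then free the slot    */
    return value;
}
```

The ordering matters: the record is written before the head counter is advanced, so the reader never sees a published record number before its data is in place.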


@A.M. Hoornweg Thanks for the detailed explanation. I understand your challenge much better now.

 

I think it's essential that you realize that the overhead of the lock itself is not going to be a factor at all and instead focus on minimizing the time the lock is held - i.e. the time to transfer data to and from the shared memory buffer. My recommendation would be to start with a simple solution and if that turns out not to be fast enough then you can try to come up with something better.

I don't know anything about the actual amount of data you're processing, but I think I would strive to process it in many small chunks instead of a few large ones. On the producer side I would write data packets to a (fast) lock-free queue and have a separate thread read from that queue and write them to the (slower) shared-memory queue. If it's essential that the producer isn't blocked, then you will have to accept that data can be dropped if the consumer isn't able to keep up, but I guess you already know that.

Again, if you want it I have a ready to use implementation of a shared memory circular buffer that has been used in production for many, many years.

20 hours ago, A.M. Hoornweg said:

The producer writes into the ring buffer @ 10Hz, the consumer polls the ring buffer every few seconds and pulls whatever was put in there.

Why not just use pipes or sockets instead?

Just now, A.M. Hoornweg said:

I'd like to take a closer look at it at the very least !

Okay, here you go. Source and simple demo attached.

 

Usage:

Producer

var FRingBuffer := TSharedMemoryRingBuffer.Create('FooBar', 1024*1024); // 1 MB
...

// String
FRingBuffer.Enqueue('Hello world');

// Raw bytes
var Buffer: TBytes;
...
FRingBuffer.Enqueue(Buffer);

Consumer

var FRingBuffer := TSharedMemoryRingBuffer.Create('FooBar', 1024*1024); // 1 MB
...

// Strings
while (True) do
begin
  // Just remove the WaitFor to use polling instead
  if (FRingBuffer.WaitFor(100) = wrSignaled) then
  begin
    var s: string;
    if (FRingBuffer.Dequeue(s)) then
      ...do something with string...
  end;
  ...
end;

// Raw bytes
while (True) do
begin
  // Just remove the WaitFor to use polling instead
  if (FRingBuffer.WaitFor(100) = wrSignaled) then
  begin
    var Buffer: TBytes;
    if (FRingBuffer.Dequeue(Buffer)) then
      ...do something with buffer...
  end;
  ...
end;

 

amSharedMemory.pas

SharedMemory.zip


