A.M. Hoornweg

Interlocked API and memory-mapped files


Hello all,

I need to map a block of memory (a ring buffer) into the address space of multiple processes for shared access. The Windows API offers a technology called memory-mapped files for this purpose.
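For readers new to the API, here is a minimal Delphi sketch of how two processes could share such a block. It assumes a section backed by the paging file, with a hypothetical name `Local\FooBarRingBuffer` and a size of 1 MB (both are illustrative, not from the thread). It uses `CreateFileMapping` and `MapViewOfFile` from `Winapi.Windows`. Whichever process runs first creates the section, and the other one opens the same object.

```delphi
program OpenSharedBlock;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

const
  // Hypothetical name and size: producer and consumer must use the same values.
  // "Local\" is the per-session namespace. A service in session 0 talking to a
  // desktop application would typically need "Global\" instead.
  MappingName = 'Local\FooBarRingBuffer';
  MappingSize = 1024 * 1024; // 1 MB

var
  hMap: THandle;
  View: PByte;
  Existed: Boolean;
begin
  // INVALID_HANDLE_VALUE means the section is backed by the paging file,
  // not by a file on disk.
  hMap := CreateFileMapping(INVALID_HANDLE_VALUE, nil, PAGE_READWRITE,
    0, MappingSize, MappingName);
  if hMap = 0 then
    RaiseLastOSError;
  Existed := GetLastError = ERROR_ALREADY_EXISTS; // read before any other API call
  try
    View := MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, MappingSize);
    if View = nil then
      RaiseLastOSError;
    try
      // A newly created section is zero-filled. Each process may see it at a
      // different address, so never store raw pointers inside it.
      Writeln('Section already existed: ', Existed, ', first byte: ', View^);
    finally
      UnmapViewOfFile(View);
    end;
  finally
    CloseHandle(hMap);
  end;
end.
```

Running a second copy while the first is still alive should report that the section already existed.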

My question: does the "Interlockedxxxx" API work reliably across process boundaries? The Windows API documentation does not state that explicitly. My hope is that these functions translate into process-agnostic CPU opcodes.

1 hour ago, A.M. Hoornweg said:

does the "Interlockedxxxx" API work reliably across process boundaries ?

I think so, provided the data is in shared memory.

Quote

does the "Interlockedxxxx" API work reliably across process boundaries 

The Interlockedxxxx functions defined in Delphi are just wrappers for the System.Atomicxxxx intrinsics.

Quote

The windows API documentation does not state that explicitly.

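Yes, it works, as long as you respect the alignment requirement: the target variable of the 32-bit Interlocked functions must be aligned on a 32-bit boundary, and that of the 64-bit variants on a 64-bit boundary.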
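To illustrate both points (alignment and process boundaries), the sketch below maps the same section twice inside one process. That gives two different virtual addresses for the same physical memory, which is the situation two processes would be in. It assumes Delphi's `TInterlocked` class from `System.SyncObjs`, which, like the Interlockedxxxx wrappers, uses locked CPU instructions; the record type is hypothetical.

```delphi
program InterlockedAcrossViews;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils, System.SyncObjs;

type
  // Hypothetical control block at offset 0 of the view. A view always starts
  // at least on a page boundary, so this Integer is naturally aligned.
  TSharedCounter = record
    Counter: Integer;
  end;
  PSharedCounter = ^TSharedCounter;

var
  hMap: THandle;
  A, B: PSharedCounter;
  I: Integer;
begin
  // Anonymous 4 KB section backed by the paging file.
  hMap := CreateFileMapping(INVALID_HANDLE_VALUE, nil, PAGE_READWRITE, 0, 4096, nil);
  if hMap = 0 then
    RaiseLastOSError;
  A := nil;
  B := nil;
  try
    // Two views of one section get two different virtual addresses, just as
    // two processes mapping the same named section would.
    A := MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    B := MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    if (A = nil) or (B = nil) then
      RaiseLastOSError;
    Assert(NativeUInt(@A^.Counter) mod SizeOf(Integer) = 0, 'misaligned counter');
    // The locked instruction acts on the physical memory behind the view,
    // so it does not matter which view (or process) issues it.
    for I := 1 to 1000 do
      TInterlocked.Increment(A^.Counter);
    Writeln('Counter read through the other view: ', B^.Counter); // 1000
  finally
    if A <> nil then
      UnmapViewOfFile(A);
    if B <> nil then
      UnmapViewOfFile(B);
    CloseHandle(hMap);
  end;
end.
```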


Maybe a crazy idea, but since this block of memory is in a shared address space of two applications, would it be possible to place a "critical section" object in there?   

I'm looking for a lightweight method to serialize access (something more lightweight than a mutex). One process will be writing the ring buffer and another will be reading it.


Why not just use the traditional synchronization mechanisms provided by the OS and keep things simple? It seems pointless to focus on micro-optimizing the concurrency control of a buffer backed by virtual memory and accessed synchronously. The sync mechanism is not going to be the bottleneck.

I have the code for a shared memory ring buffer somewhere if the OP is interested. I used it a few decades ago to pipe CodeSite log messages between a producer and a consumer service. The consumer wrote the log to disk.

4 hours ago, A.M. Hoornweg said:

Maybe a crazy idea, but since this block of memory is in a shared address space of two applications, would it be possible to place a "critical section" object in there?   



No! A critical section is not designed to work across processes; Microsoft's documentation states that a critical section object can be used only by the threads of a single process. At first sight it may look as if putting one into shared memory would do the trick, but things are more complex than that: a critical section relies on a signaling mechanism, usually a kernel semaphore or event, whose handle is private to the process that created it.

@Kas Ob. WaitOnAddress apparently works only with threads in the same process.

Guest
10 hours ago, Anders Melander said:

And it's extremely subject to race conditions.

I would not use "extremely" there: race conditions can happen with almost every threading model. In other words, it depends on the design; you can have a race with critical sections or events, and with wrong assumptions you can have one with IOCP.

Anyway, thank you for that link. I am adding this one, which has a very interesting table comparing the characteristics of WaitOnAddress with futexes:

https://devblogs.microsoft.com/oldnewthing/20170601-00/?p=96265 

16 hours ago, A.M. Hoornweg said:

I'm looking for a lightweight method to serialize access (something more lightweight than a mutex). One process will be writing the ring buffer and another will be reading it.

InterlockedCompareExchange can be used to implement a spinlock. When using shared memory, this will work interprocess.
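A minimal sketch of such a spinlock, assuming a 32-bit lock word (0 = free, 1 = held) stored at an aligned offset in the shared header; `SpinAcquire` and `SpinRelease` are hypothetical names, and `TInterlocked.CompareExchange` from `System.SyncObjs` stands in for `InterlockedCompareExchange`. It backs off with `SwitchToThread` and then `Sleep(1)` so a waiter does not burn a whole core. One caveat that matters across processes: if a process dies while holding the lock, the word stays 1 forever, whereas a Windows mutex is reported to the next waiter as abandoned.

```delphi
program SharedSpinLockSketch;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SyncObjs;

// LockWord lives in the shared view: 0 = free, 1 = held.
// It must be a naturally aligned Integer.
procedure SpinAcquire(var LockWord: Integer);
var
  Spins: Cardinal;
begin
  Spins := 0;
  // CompareExchange stores 1 only if the word is still 0 and returns the
  // previous value, so a result of 0 means this caller now owns the lock.
  while TInterlocked.CompareExchange(LockWord, 1, 0) <> 0 do
  begin
    Inc(Spins);
    if Spins < 64 then
      SwitchToThread  // let another ready thread on this CPU run
    else
      Sleep(1);       // holder is slow (or gone): stop burning CPU
  end;
end;

procedure SpinRelease(var LockWord: Integer);
begin
  // Locked exchange is a full barrier on x86/x64, so writes made while
  // holding the lock are visible before the lock is seen as free.
  TInterlocked.Exchange(LockWord, 0);
end;

var
  LockWord: Integer; // stand-in for a field in the mapped header
begin
  SpinAcquire(LockWord);
  try
    // ...update read/write record numbers in the shared header...
  finally
    SpinRelease(LockWord);
  end;
  Writeln('Lock word after release: ', LockWord); // 0
end.
```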

1 hour ago, FPiette said:

InterlockedCompareExchange can be used to implement a spinlock. When using shared memory, this will work interprocess.

Yes, that's right, but at what cost? A spinlock is good when the protected operation is short; otherwise it is just overhead on the system/CPU.

4 hours ago, FPiette said:

InterlockedCompareExchange can be used to implement a spinlock. When using shared memory, this will work interprocess.

So how will you use a spinlock to signal the consumer thread that data is available without doing busy wait?


@Anders Melander In this case a signal to the consumer thread that data is available isn't necessary. The producer writes into the ring buffer at 10 Hz; the consumer polls the ring buffer every few seconds and pulls whatever was put in there. A synchronization object is only needed for the atomicity of the pointers and counters.

Edited by A.M. Hoornweg

1 hour ago, A.M. Hoornweg said:

The producer writes into the ring buffer @ 10Hz, the consumer polls the ring buffer every few seconds and pulls whatever was put in there.

Are you kidding me?

What you are asking for is apparently a wheelbarrow while the rest of us are suggesting different ways to build a drag racer :)

I don't get why you don't just use a TMutex to protect the ring buffer structure and a TSemaphore to signal availability. I mean, it's around 10 lines of extra code compared to the 200-300 lines of code you will need to implement the ring buffer in shared memory.
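Roughly what that suggestion looks like with `TMutex` and `TSemaphore` from `System.SyncObjs`, as a sketch: the object names are hypothetical, and the producer and consumer halves are shown in one program only for brevity. The named-object constructors create each object in whichever process gets there first and open the existing one in the other.

```delphi
program MutexSemaphoreSketch;

{$APPTYPE CONSOLE}

uses
  System.SyncObjs;

const
  // Hypothetical object names; producer and consumer must agree on them.
  LockName  = 'Local\FooBar.Lock';
  ReadyName = 'Local\FooBar.DataReady';

var
  Lock: TMutex;
  DataReady: TSemaphore;
begin
  // Creates the named objects, or opens them if the other process did first.
  Lock := TMutex.Create(nil, False, LockName);
  DataReady := TSemaphore.Create(nil, 0, MaxInt, ReadyName);
  try
    // Producer side: hold the lock only while touching the shared buffer.
    Lock.Acquire;
    try
      // ...copy one record into the mapped view, advance the write index...
    finally
      Lock.Release;
    end;
    DataReady.Release; // one count per write

    // Consumer side: wait up to 100 ms (use 0 to just poll).
    if DataReady.WaitFor(100) = wrSignaled then
    begin
      Lock.Acquire;
      try
        // ...copy everything available out, advance the read index...
        Writeln('Consumer woke up and drained the buffer');
      finally
        Lock.Release;
      end;
    end;
  finally
    DataReady.Free;
    Lock.Free;
  end;
end.
```

Because the consumer drains everything it finds, leftover semaphore counts only cause wakeups that find the buffer empty, and a maximum count of `MaxInt` keeps `Release` from hitting the limit when the consumer polls instead of waiting.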

1 hour ago, A.M. Hoornweg said:

A synchronization object is only needed for atomicity of the pointers and counters.

You're aware that you can't store pointers in shared memory, right?


@Anders Melander This recording application stores processed data records at 10 Hz, which is slow by any metric, but it performs sampling and aggregation at a much higher frequency. This is a "finished" application, tried and tested, and we'd like to avoid breaking anything because its reliability is vital to our business.

But since the beginning of the COVID-19 pandemic, my entire department has been working from home, and we need to access that data. So the idea was to make a tiny modification to that application: give it a circular buffer in RAM that is accessible from an outside process, and dimension that buffer big enough to contain an hour of data.

We would then write an independent application that acts as a TCP server, allowing us to stream large chunks of data from the buffer.  Not disturbing the data acquisition itself is an absolute necessity, hence my question about a lean locking mechanism. It is absolutely no problem if the consumer must wait a few ms, but the producer should be as undisturbed as possible. 

And of course I meant the word "pointer" in a generic sense, not as a logical address. The buffer would get a 4 KB header with a version number and some properly aligned control variables. All "pointers" in there will just be record numbers.
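With record numbers instead of pointers, one producer and one consumer can arguably do without a lock at all. The sketch below is one way to do that, not the poster's design: the producer keeps a monotonic 64-bit `WriteCount`, record N lives in slot `N mod Capacity`, and the consumer re-reads the count after copying to detect whether the producer lapped it during the copy (a seqlock-style check). The header layout and all names are hypothetical, `AllocMem` stands in for the mapped view, and the memory-ordering argument assumes x86/x64 Windows, where the locked instructions behind `TInterlocked` are full barriers.

```delphi
program RingHeaderSketch;

{$APPTYPE CONSOLE}

uses
  System.SyncObjs;

type
  // Hypothetical 4 KB control block at the start of the shared view.
  // All fields are Int64 and 8-byte aligned, as the 64-bit interlocked calls need.
  TRingHeader = record
    Version: Int64;     // layout version, checked by the consumer
    RecordSize: Int64;  // bytes per record
    Capacity: Int64;    // number of record slots after the header
    WriteCount: Int64;  // total records ever written (monotonic)
    Reserved: array[0..507] of Int64; // pads the header to 4096 bytes
  end;
  PRingHeader = ^TRingHeader;

function SlotAddress(Header: PRingHeader; Index: Int64): PByte;
begin
  Result := PByte(Header) + SizeOf(TRingHeader) +
    NativeInt((Index mod Header^.Capacity) * Header^.RecordSize);
end;

// Producer: never waits. Copy the record first, then publish the new count.
procedure ProducerWrite(Header: PRingHeader; const Rec);
var
  N: Int64;
begin
  N := Header^.WriteCount; // only the producer ever writes this field
  Move(Rec, SlotAddress(Header, N)^, Header^.RecordSize);
  TInterlocked.Exchange(Header^.WriteCount, N + 1);
end;

// Consumer: copy record Index into Dest. Returns False if the record is not
// written yet, or if the producer may have overwritten it during the copy.
function ConsumerRead(Header: PRingHeader; Index: Int64; var Dest): Boolean;
var
  Before, After: Int64;
begin
  Before := TInterlocked.Read(Header^.WriteCount);
  // When Before - Index = Capacity, the producer may be rewriting that slot now.
  if (Index >= Before) or (Before - Index >= Header^.Capacity) then
    Exit(False);
  Move(SlotAddress(Header, Index)^, Dest, Header^.RecordSize);
  After := TInterlocked.Read(Header^.WriteCount);
  Result := After - Index < Header^.Capacity; // not lapped during the copy
end;

var
  Mem: PRingHeader;
  I: Integer;
  Value: Int64;
begin
  // Header plus 4 slots of 8 bytes; AllocMem zero-fills, so WriteCount = 0.
  Mem := AllocMem(SizeOf(TRingHeader) + 4 * SizeOf(Int64));
  try
    Mem^.Version := 1;
    Mem^.RecordSize := SizeOf(Int64);
    Mem^.Capacity := 4;
    for I := 0 to 5 do
    begin
      Value := I * 10;
      ProducerWrite(Mem, Value);
    end;
    Writeln('record 1 available: ', ConsumerRead(Mem, 1, Value));
    Writeln('record 2 available: ', ConsumerRead(Mem, 2, Value));
    if ConsumerRead(Mem, 5, Value) then
      Writeln('record 5 = ', Value);
  finally
    FreeMem(Mem);
  end;
end.
```

In the demo, six records go into four slots: `ConsumerRead` refuses record 1 (overwritten) and record 2 (next in line to be overwritten) and returns 50 for record 5. The producer never waits; the price is one slot of usable history and a retry on the consumer side.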


@A.M. Hoornweg Thanks for the detailed explanation. I understand your challenge much better now.

I think it's essential that you realize that the overhead of the lock itself is not going to be a factor at all and instead focus on minimizing the time the lock is held - i.e. the time to transfer data to and from the shared memory buffer. My recommendation would be to start with a simple solution and if that turns out not to be fast enough then you can try to come up with something better.

I don't know anything about the actual amount of data you're processing, but I think I would strive to process it in many small chunks instead of a few large chunks. On the producer side I would write data packets to a (fast) lock-free queue and have a separate thread read from that queue and write them to the (slower) shared memory queue. If it's essential that the producer isn't blocked, then you will have to accept that data can be dropped if the consumer isn't able to keep up, but I guess you already know that.

Again, if you want it I have a ready to use implementation of a shared memory circular buffer that has been used in production for many, many years.

20 hours ago, A.M. Hoornweg said:

The producer writes into the ring buffer @ 10Hz, the consumer polls the ring buffer every few seconds and pulls whatever was put in there.

Why not just use pipes or sockets instead?

Just now, A.M. Hoornweg said:

I'd like to take a closer look at it at the very least !

Okay, here you go. Source and simple demo attached.

Usage:

Producer

var FRingBuffer := TSharedMemoryRingBuffer.Create('FooBar', 1024*1024); // 1 MB
// ...

// String
FRingBuffer.Enqueue('Hello world');

// Raw bytes
var Buffer: TBytes;
// ...fill Buffer...
FRingBuffer.Enqueue(Buffer);

Consumer

var FRingBuffer := TSharedMemoryRingBuffer.Create('FooBar', 1024*1024); // 1 MB
// ...

// Strings
while (True) do
begin
  // Just remove the WaitFor to use polling instead
  // (wrSignaled is declared in System.SyncObjs)
  if (FRingBuffer.WaitFor(100) = wrSignaled) then
  begin
    var s: string;
    if (FRingBuffer.Dequeue(s)) then
    begin
      // ...do something with s...
    end;
  end;
  // ...
end;

// Raw bytes
while (True) do
begin
  // Just remove the WaitFor to use polling instead
  if (FRingBuffer.WaitFor(100) = wrSignaled) then
  begin
    var Buffer: TBytes;
    if (FRingBuffer.Dequeue(Buffer)) then
    begin
      // ...do something with Buffer...
    end;
  end;
  // ...
end;

amSharedMemory.pas

SharedMemory.zip
