hukmac

Strange problem with mutexes

Hello 🙂 I have the following code in Delphi 12, with slow functions PerformCalculations and SaveDataToDisk:

function MutexBarrier(resource_mutex_id :String) :THandle;
var Mutex :THandle; 
begin
  Mutex:=CreateMutex(nil,false,PChar(resource_mutex_id)); 
  WaitForSingleObject(Mutex,INFINITE); 
  result:=Mutex;
end;

procedure Main;      //process main code (N processes are being run):
var mutex :THandle;
begin
  repeat 
    mutex:=MutexBarrier('Mutex1_constant_unique_name'); 
    try
      Prepare;
    finally
      ReleaseMutex(mutex);
      CloseHandle(mutex);
    end;

    PerformCalculations;

    mutex:=MutexBarrier('Mutex2_constant_unique_name');
    try
      SaveDataToDisk;
    finally
      ReleaseMutex(mutex);
      CloseHandle(mutex);
    end;
  until False;
end;


I'm running N processes with this code, each on a separate core (set with affinity; the number of cores is greater than N). The processes do not create additional threads. They are started by a separate script with a command like "proc.exe -core=x", where x is the expected affinity. The affinity is set correctly. But the effect of using the mutexes is strange:

 

1. Initially, all N processes quickly queue up on Mutex1, pass through it one after another, and then perform calculations in parallel on separate cores.
2. Then all N processes start to wait on Mutex2, and one of them (let's call it process A) enters the critical section of Mutex2. This is OK.
3. Then process A, having passed the critical section of Mutex2, goes through Mutex1 and performs calculations.
At the same time all the other processes still wait on Mutex1. This is strange.
4. Only when process A finishes its calculations and joins the Mutex2 queue again does some other process (call it process B) enter the critical section of Mutex2. This is very strange to me.
5. Then process B goes through Mutex1 and performs calculations, etc.

In effect, starting from point 3, only one process is performing calculations at a time. This is very strange to me. Why does Mutex2 behave like that? This is on Windows 10.


Remark 1: When SaveDataToDisk is fast (or not called at all - empty critical section of Mutex2), the problem does not happen, and all processes execute PerformCalculations in parallel.
Remark 2: I have also verified that Prepare, PerformCalculations and SaveDataToDisk are not failing.

I have been searching for a way to make Mutex2 behave as expected, but with no success so far. Maybe someone has an idea?

Edited by hukmac


Why are you creating and destroying the mutexes over and over? Don't do that! Create them one time and then just acquire and release them as needed, e.g.:

function CreateMutexBarrier(const resource_mutex_id : String) : THandle;
begin
  Result := CreateMutex(nil, False, PChar(resource_mutex_id));
  if Result = 0 then
    RaiseLastOSError;
end;

procedure EnterMutexBarrier(mutex : THandle);
begin
  if WaitForSingleObject(mutex, INFINITE) = WAIT_FAILED then
    RaiseLastOSError; 
end;

procedure LeaveMutexBarrier(mutex : THandle);
begin
  if not ReleaseMutex(mutex) then
    RaiseLastOSError; 
end;

procedure Main;
var
  mutex1, mutex2 : THandle;
begin
  mutex1 := CreateMutexBarrier('Mutex1_constant_unique_name'); 
  try
    mutex2 := CreateMutexBarrier('Mutex2_constant_unique_name');
    try
      repeat 
        EnterMutexBarrier(mutex1); 
        try
          Prepare;
        finally
          LeaveMutexBarrier(mutex1);
        end;

        PerformCalculations;

        EnterMutexBarrier(mutex2); 
        try
          SaveDataToDisk;
        finally
          LeaveMutexBarrier(mutex2);
        end;
      until False;
    finally
      CloseHandle(mutex2);
    end;
  finally
    CloseHandle(mutex1);
  end;
end;
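
One case the wait helper above does not distinguish is WAIT_ABANDONED, which WaitForSingleObject returns when the previous owner terminated without releasing the mutex; the caller still becomes the owner in that case. The following is only a sketch of how that result could be surfaced. EnterMutexBarrierChecked and the mutex name are hypothetical and not part of the code above.

```delphi
program AbandonedMutexCheck;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

// Hypothetical variant of EnterMutexBarrier.
// Returns True on a normal acquire and False when the mutex was acquired
// after its previous owner terminated without releasing it.
function EnterMutexBarrierChecked(mutex: THandle): Boolean;
begin
  Result := False;
  case WaitForSingleObject(mutex, INFINITE) of
    WAIT_OBJECT_0:
      Result := True;
    WAIT_ABANDONED:
      Result := False; // we own the mutex, but protected data may be inconsistent
  else
    RaiseLastOSError;  // WAIT_FAILED
  end;
end;

var
  mutex: THandle;
begin
  // 'Demo_abandoned_check' is a made-up name for this demonstration.
  mutex := CreateMutex(nil, False, 'Demo_abandoned_check');
  if mutex = 0 then
    RaiseLastOSError;
  try
    if not EnterMutexBarrierChecked(mutex) then
      Writeln('Previous owner terminated while holding the mutex');
    try
      // critical section would go here
    finally
      ReleaseMutex(mutex);
    end;
  finally
    CloseHandle(mutex);
  end;
end.
```

Whether an abandoned mutex should be treated as an error depends on what the critical section protects, so the sketch only reports it.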

 

Edited by Remy Lebeau


Dear Remy, thank you for your question and proposed code. So here I go with the answer and clarification.
 

> Why are you creating and destroying the mutexes over and over? 

 

First reason: the situation presented above is a simplification (I thought it would be clearer to present the problem in this form). In my real code, for important reasons, each process performs calculations and writes results only a specified number of times (so the Main loop above would not be infinite). When the work is done, the process creates a child process assigned to the same core and terminates itself. The mutex handle would therefore be closed by the OS at the end of the process anyway. I close it manually to make things explicit.

 

Second reason: a named mutex exists in the OS as long as at least one handle to it is open in any process. So while the Mutex2 queue is not empty, the mutex is not destroyed by the OS. It is created just once and then opened many times by processes as the queue forms. If a given queue empties, the mutex is destroyed. But creation and destruction of a mutex by the OS is not costly, especially compared with the time the cores spend in the critical sections (for Mutex2, many seconds).

 

So in my case the problematic Mutex2, whose queue is almost always full of waiting processes, is in fact created once and not destroyed until the last process is destroyed. Only the local handles to the mutex are created and closed as processes hit and pass the barrier.
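
The handle-lifetime behaviour described here can be observed with a small test. This is a minimal sketch, assuming no other process currently holds a handle to the same name; OpenNamedMutex and the mutex name are hypothetical and used only for illustration. It relies on the documented behaviour that CreateMutex succeeds on an existing name and sets the last error to ERROR_ALREADY_EXISTS.

```delphi
program NamedMutexLifetime;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

const
  // Made-up name, used only for this demonstration.
  DemoMutexName = 'Demo_named_mutex_lifetime';

// Opens (or creates) the named mutex and reports whether the kernel
// object already existed at the time of the call.
function OpenNamedMutex(const Name: string; out AlreadyExisted: Boolean): THandle;
var
  Err: DWORD;
begin
  SetLastError(0); // start from a known state
  Result := CreateMutex(nil, False, PChar(Name));
  Err := GetLastError; // read immediately, before any other API call
  if Result = 0 then
    RaiseLastOSError(Err);
  AlreadyExisted := (Err = ERROR_ALREADY_EXISTS);
end;

var
  H1, H2, H3: THandle;
  Existed: Boolean;
begin
  H1 := OpenNamedMutex(DemoMutexName, Existed);
  Writeln('1st handle, object already existed: ', Existed);

  H2 := OpenNamedMutex(DemoMutexName, Existed);
  Writeln('2nd handle, object already existed: ', Existed);

  CloseHandle(H2);
  CloseHandle(H1); // last handle closed, so the object can be destroyed

  H3 := OpenNamedMutex(DemoMutexName, Existed);
  Writeln('3rd handle, object already existed: ', Existed);
  CloseHandle(H3);
end.
```

Under the stated assumption, only the second call should report that the object already existed, because the third call happens after every earlier handle has been closed.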

 

The problem is that the Mutex2 queue is not working as expected: none of the processes waiting on Mutex2 enters its critical section until the last process that was in this critical section, or its child process, hits the barrier again. The expected behaviour is: when process A leaves the Mutex2 critical section and starts doing other things, some other process waiting on Mutex2 enters the critical section (even while process A is calculating), and so on.

 

Is this mutex created improperly, or does some Win10 mechanism need additional configuration to make it work as expected?

 

Second thought: my CPU is an i9-9900K with hyperthreading enabled. Could the problem be related to hyperthreading? Suppose process A is assigned to logical core 0 and the next process in the Mutex2 queue (call it B) is assigned to logical core 1, where both logical cores 0 and 1 are provided by the same SMT unit (physical core 0). If process A leaves the Mutex2 critical section and starts calculating before process B gets into the critical section, then maybe the OS is reluctant to switch the context of physical core 0 and does not release process B into the critical section, nor any of the other waiting processes further down the queue. I will switch off hyperthreading and check.

Edited by hukmac


OK, so... I have checked that the above has nothing to do with hyperthreading. 

 

I have also revisited my observations. Earlier I had somehow assumed that a single PerformCalculations takes at least twice as long as SaveDataToDisk; in fact it was the opposite. So the observed behaviour is expected: SaveDataToDisk is the real bottleneck. Processes enter the Mutex2 critical section as fast as possible, and there is nothing strange about Mutex2. When I artificially increased the time of PerformCalculations to 10 times that of SaveDataToDisk, SaveDataToDisk ran nicely while PerformCalculations was running on other cores, and PerformCalculations in all processes used almost 100% of the CPU time.
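
A timing assumption like this can be checked by measuring each phase directly. The sketch below is illustrative only: TimePhase, TPhaseProc, FakeCalculations and FakeSave are hypothetical names, and Sleep merely stands in for the real work.

```delphi
program PhaseTiming;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

type
  TPhaseProc = procedure;

// Stand-ins for the real routines; the durations are arbitrary.
procedure FakeCalculations;
begin
  Sleep(300);
end;

procedure FakeSave;
begin
  Sleep(700);
end;

// Runs one phase and prints how long it took in milliseconds.
procedure TimePhase(const PhaseName: string; Phase: TPhaseProc);
var
  Watch: TStopwatch;
begin
  Watch := TStopwatch.StartNew;
  Phase();
  Watch.Stop;
  Writeln(Format('%s: %d ms', [PhaseName, Watch.ElapsedMilliseconds]));
end;

begin
  TimePhase('PerformCalculations', FakeCalculations);
  TimePhase('SaveDataToDisk', FakeSave);
end.
```

In the real program the same wrapper could be placed around the calls inside the loop, with the time spent waiting on each mutex measured separately from the time spent inside the critical section.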

 

I will check whether I can speed up SaveDataToDisk. In the example I'm working with, each 3 s calculation creates 800 MB of data to save, and saving it takes ~7 s. I'm working on a Samsung 970 EVO Plus with 3300 MB/s write speed, so there is room for improvement.
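
The numbers above can be turned into a rough estimate of why only about one process computes at a time. This is a back-of-the-envelope sketch, not a measurement: it ignores the time spent in Prepare, and the process count of 8 is an assumption, since N is not stated in the thread.

```delphi
program BottleneckEstimate;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Math;

const
  CalcSeconds   = 3.0;    // from the post: one calculation
  SaveSeconds   = 7.0;    // from the post: one save, serialized by Mutex2
  DataMB        = 800.0;  // from the post: data written per save
  DriveMBPerSec = 3300.0; // from the post: quoted write speed
  ProcessCount  = 8;      // assumption, N is not given in the thread

var
  CyclesPerSec, AvgComputing, EffectiveMBPerSec: Double;
begin
  // Without contention N processes finish N / (calc + save) cycles per
  // second, but the mutex allows at most one save per SaveSeconds.
  CyclesPerSec := Min(ProcessCount / (CalcSeconds + SaveSeconds),
                      1.0 / SaveSeconds);

  // Each completed cycle contains CalcSeconds of computing, so this is the
  // average number of processes computing at any moment.
  AvgComputing := CyclesPerSec * CalcSeconds;

  // Write rate actually achieved while saving.
  EffectiveMBPerSec := DataMB / SaveSeconds;

  Writeln(Format('Cycles per second (all processes): %.3f', [CyclesPerSec]));
  Writeln(Format('Average processes computing:       %.2f', [AvgComputing]));
  Writeln(Format('Effective write rate:              %.1f MB/s', [EffectiveMBPerSec]));
  Writeln(Format('Share of quoted drive speed:       %.1f %%',
    [100.0 * EffectiveMBPerSec / DriveMBPerSec]));
end.
```

With these inputs the save phase caps throughput at one cycle every 7 s, which works out to about 0.43 processes computing on average and an effective write rate of about 114 MB/s, far below the quoted drive speed.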

 

Thank you, Remy, for making me think.

Edited by hukmac
