hukmac, posted Thursday at 05:58 PM (edited)

Hello 🙂

I have the following code in Delphi 12, with slow functions PerformCalculations and SaveDataToDisk:

    function MutexBarrier(resource_mutex_id: String): THandle;
    var
      Mutex: THandle;
    begin
      Mutex := CreateMutex(nil, False, PChar(resource_mutex_id));
      WaitForSingleObject(Mutex, INFINITE);
      Result := Mutex;
    end;

    procedure Main; // process main code (N processes are being run)
    var
      mutex: THandle;
    begin
      repeat
        mutex := MutexBarrier('Mutex1_constant_unique_name');
        try
          Prepare;
        finally
          ReleaseMutex(mutex);
          CloseHandle(mutex);
        end;
        PerformCalculations;
        mutex := MutexBarrier('Mutex2_constant_unique_name');
        try
          SaveDataToDisk;
        finally
          ReleaseMutex(mutex);
          CloseHandle(mutex);
        end;
      until False;
    end;

I'm running N processes with this code, each on a separate core (set with affinity; the number of cores is greater than N). The processes do not create additional threads. They are started by a separate script with a command like "proc.exe -core=x", where x is the expected affinity. Affinity is set correctly. But the effect of using the mutexes is strange:

1. Initially all N processes quickly form a queue on Mutex1, go through it one after another, and then perform calculations in parallel on separate cores.
2. Then all N processes start to wait on Mutex2, and one of them (let's call it process A) enters the critical section of Mutex2. This is OK.
3. Then process A, having passed the critical section of Mutex2, goes through Mutex1 and performs calculations. At the same time all other processes still wait on Mutex2. This is strange.
4. Only when process A finishes its calculations and joins the Mutex2 queue does some other process (call it process B) enter the critical section of Mutex2. This is very strange to me.
5. Then process B goes through Mutex1 and performs calculations, and so on.

In effect, starting from point 3, only one process is performing calculations at a time. This is very strange to me. Why does Mutex2 behave like that? The OS is Win10.
Remark 1: When SaveDataToDisk is fast (or not called at all, leaving the critical section of Mutex2 empty), the problem does not happen and all processes execute PerformCalculations in parallel.

Remark 2: I have also verified that Prepare, PerformCalculations and SaveDataToDisk are not failing.

I have been looking for a way to make Mutex2 behave as expected, but without success so far. Maybe someone has an idea?

Edited Thursday at 06:01 PM by hukmac
Remy Lebeau, posted Thursday at 11:49 PM (edited)

Why are you creating and destroying the mutexes over and over? Don't do that! Create them one time and then just acquire and release them as needed, e.g.:

    function CreateMutexBarrier(const resource_mutex_id: String): THandle;
    begin
      Result := CreateMutex(nil, False, PChar(resource_mutex_id));
      if Result = 0 then RaiseLastOSError;
    end;

    procedure EnterMutexBarrier(mutex: THandle);
    begin
      if WaitForSingleObject(mutex, INFINITE) = WAIT_FAILED then RaiseLastOSError;
    end;

    procedure LeaveMutexBarrier(mutex: THandle);
    begin
      // release the handle passed in as the parameter
      if not ReleaseMutex(mutex) then RaiseLastOSError;
    end;

    procedure Main;
    var
      mutex1, mutex2: THandle;
    begin
      mutex1 := CreateMutexBarrier('Mutex1_constant_unique_name');
      try
        mutex2 := CreateMutexBarrier('Mutex2_constant_unique_name');
        try
          repeat
            EnterMutexBarrier(mutex1);
            try
              Prepare;
            finally
              LeaveMutexBarrier(mutex1);
            end;
            PerformCalculations;
            EnterMutexBarrier(mutex2);
            try
              SaveDataToDisk;
            finally
              LeaveMutexBarrier(mutex2);
            end;
          until False;
        finally
          CloseHandle(mutex2);
        end;
      finally
        CloseHandle(mutex1);
      end;
    end;

Edited Thursday at 11:51 PM by Remy Lebeau
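Because the mutexes in this thread are named and shared between processes that eventually terminate, a wait can also return WAIT_ABANDONED when the previous owner exited without calling ReleaseMutex. Below is a minimal sketch of a wait that distinguishes that case. The helper name EnterMutexChecked and the mutex name are hypothetical and do not come from the thread.

```delphi
program AbandonedMutexCheck;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

// Hypothetical helper (not from the thread). Returns True when the mutex was
// acquired after its previous owner terminated without releasing it, so the
// data it protects may be in an inconsistent state.
function EnterMutexChecked(Mutex: THandle): Boolean;
begin
  Result := False;
  case WaitForSingleObject(Mutex, INFINITE) of
    WAIT_OBJECT_0:
      Result := False; // normal acquisition
    WAIT_ABANDONED:
      Result := True;  // acquired, but the previous owner never released it
  else
    RaiseLastOSError;  // WAIT_FAILED
  end;
end;

var
  Mutex: THandle;
begin
  // Illustrative name only; a real program would use its own constant name.
  Mutex := CreateMutex(nil, False, 'Example_mutex_name');
  if Mutex = 0 then
    RaiseLastOSError;
  try
    if EnterMutexChecked(Mutex) then
      Writeln('Previous owner exited while holding the mutex');
    try
      Writeln('In critical section');
    finally
      ReleaseMutex(Mutex);
    end;
  finally
    CloseHandle(Mutex);
  end;
end.
```

In both the WAIT_OBJECT_0 and WAIT_ABANDONED cases the caller owns the mutex afterwards, so ReleaseMutex is still required.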
hukmac, posted Friday at 01:12 AM (edited)

Dear Remy, thank you for your question and the proposed code. Here is my answer and some clarification.

> Why are you creating and destroying the mutexes over and over?

First reason: the situation presented above is a simplification (I thought it would be clearer to present the problem in this form). In my real code, for important reasons, each process performs calculations and writes results only a specified number of times (so the Main loop above would not be infinite). When the work is done, the process creates a child process assigned to the same core and terminates itself. Thus the mutex handle would be closed by the OS at the end of the process anyway. To make things clear I'm closing it manually.

Second reason: a named mutex exists in the OS as long as at least one handle to it exists in any process. So while the Mutex2 queue is not empty, the mutex is not destroyed by the OS. It is created just once and opened many times by processes as the queue forms. If a given queue empties, the mutex is destroyed. But creation and destruction of the mutex by the OS is not costly, especially compared with the time the cores spend in critical sections (in Mutex2, many seconds). So in my case the problematic Mutex2, whose queue is almost always full of waiting processes, is in fact created once and not destroyed until the last process is destroyed. Only the local handles to the mutex are created and destroyed as processes hit and pass the barrier.

The problem is that the Mutex2 queue is not working as expected: none of the processes waiting on Mutex2 enters the Mutex2 critical section until the last process that was in this critical section, or its child process, hits this barrier again.
Expected behaviour: when process A leaves the Mutex2 critical section and starts doing other things, some other process waiting on Mutex2 enters its critical section (even while process A is calculating), and so on. Is this mutex improperly created, or does some Win10 mechanism need additional configuration to make it work as expected?

Second thought: my CPU is an i9-9900K with hyperthreading enabled. Can the problem be related to hyperthreading? Suppose process A is assigned to logical core 0 and the next process in the Mutex2 queue (call it B) is assigned to logical core 1 (both logical cores 0 and 1 are simulated by the same SMT unit, physical core 0). If process A leaves the Mutex2 critical section and starts calculating before process B gets into the Mutex2 critical section, then maybe the OS will not want to switch the context of physical core 0 and will not release process B into the critical section, nor any of the other waiting processes further down the queue. I will switch off hyperthreading and check.

Edited Friday at 03:11 AM by hukmac
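One way to check whether Mutex2 really holds waiters back is to measure, in each process, how long the wait took and how long the critical section was held. The following is a rough sketch only: it assumes TStopwatch from System.Diagnostics, and the SaveDataToDisk shown here is a Sleep-based stand-in, not the routine from the thread.

```delphi
program MutexTiming;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils, System.Diagnostics;

// Stand-in for the real routine from the thread (assumption for illustration).
procedure SaveDataToDisk;
begin
  Sleep(100);
end;

var
  Mutex: THandle;
  WaitTimer, HoldTimer: TStopwatch;
begin
  Mutex := CreateMutex(nil, False, 'Mutex2_constant_unique_name');
  if Mutex = 0 then
    RaiseLastOSError;
  try
    // Time spent queueing on the mutex.
    WaitTimer := TStopwatch.StartNew;
    if WaitForSingleObject(Mutex, INFINITE) = WAIT_FAILED then
      RaiseLastOSError;
    WaitTimer.Stop;

    // Time spent inside the critical section.
    HoldTimer := TStopwatch.StartNew;
    try
      SaveDataToDisk;
    finally
      HoldTimer.Stop;
      ReleaseMutex(Mutex);
    end;

    Writeln(Format('pid %d: waited %d ms, held %d ms',
      [GetCurrentProcessId, WaitTimer.ElapsedMilliseconds,
       HoldTimer.ElapsedMilliseconds]));
  finally
    CloseHandle(Mutex);
  end;
end.
```

If the hold time is comparable to or longer than the calculation time, long waits on Mutex2 are what a correctly working mutex would produce.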
hukmac, posted Friday at 03:08 AM (edited)

OK, so... I have checked that the above has nothing to do with hyperthreading.

I have also revisited my observations. Earlier I somehow assumed that a single PerformCalculations takes at least twice as long as SaveDataToDisk. It was the opposite. Thus the observed behaviour is expected: SaveDataToDisk is the real bottleneck. Processes are entering the Mutex2 critical section as fast as possible, and there is nothing strange about Mutex2.

When I artificially increased the time of PerformCalculations to 10 times that of SaveDataToDisk, SaveDataToDisk ran nicely while PerformCalculations was running on other cores, and PerformCalculations in all processes used almost 100% of the CPU time.

I will check whether I can speed up SaveDataToDisk. In the example I'm working with, each 3 s calculation creates 800 MB of data to save. Saving this takes ~7 s. I'm working on a Samsung EVO 970 Plus with 3300 MB/s write speed, so there is room for improvement.

Thank you, Remy, for making me think.

Edited Friday at 03:10 AM by hukmac
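The numbers above can be put into a simple back-of-the-envelope model. Since saves are serialized by Mutex2, N processes need at least N save times per round, so one cycle of a process lasts roughly max(calc + save, N * save). The sketch below is an illustration under that assumption and ignores the time spent in Prepare. The function name CalcShare and the process count N = 4 are hypothetical, as the thread does not state N.

```delphi
program SaveBottleneck;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Math;

// Hypothetical helper: fraction of each cycle that one process spends in
// PerformCalculations when all saves are serialized by a single mutex.
function CalcShare(N: Integer; CalcSec, SaveSec: Double): Double;
begin
  Result := CalcSec / Max(CalcSec + SaveSec, N * SaveSec);
end;

begin
  // 800 MB written in about 7 s, figures taken from the post above.
  Writeln(Format('Effective write rate: %.0f MB/s', [800 / 7]));

  // N = 4 is an assumed process count, used only for illustration.
  Writeln(Format('Calc share, 3 s calc / 7 s save: %.2f',
    [CalcShare(4, 3, 7)]));
  Writeln(Format('Calc share, 70 s calc / 7 s save: %.2f',
    [CalcShare(4, 70, 7)]));
end.
```

With the figures from the post, the effective write rate is about 114 MB/s, far below the drive's rated 3300 MB/s, which agrees with the conclusion that SaveDataToDisk is the part worth optimizing.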