Hello Gabr,
I'm seeing a race condition in TOmniResourceCount in the TryAllocate function.
I created a logger following your advice in:
https://stackoverflow.com/questions/8134526/implement-thread-safe-logging
Two methods in the TOmniLogger code relevant to this discussion are:
procedure TOmniLogger.Log(const Text: string);
begin
FLogQueue.Enqueue(Text);
FLogMsgCount.Allocate;
end;
procedure TOmniLogger.Logger(const task: IOmniTask);
var
LogFile: TOmniLogFile;
procedure Flush;
var
logData: TOmniValue;
begin
while FLogQueue.TryDequeue(logData) do
begin
FLogMsgCount.Release;
LogFile.Log(logData.AsString);
end;
end;
begin
LogFile := TOmniLogFile.Create(FFileName);
FTaskName := Format('%s:%s [ThreadId:%d]', [Self.ClassName, SysUtils.ExtractFileName(FFileName), Windows.GetCurrentThreadId]);
try
Log(Format('Task "%s" on thread [%d, %d]', [Task.Name, Task.UniqueID, Windows.GetCurrentThreadId]));
while DSiWaitForTwoObjects(task.TerminateEvent, FLogMsgCount.Handle, false, CMaxLogTimeout_ms) <> WAIT_OBJECT_0 do
Flush;
Flush;
finally
LogFile.Free;
end;
end;
FLogMsgCount is an instance of TOmniResourceCount. The race condition occurs in the tight while DSiWaitForTwoObjects loop and involves FLogMsgCount.Handle.
This code calls two methods from TOmniResourceCount in OtlSync.pas, Release and TryAllocate:
function TOmniResourceCount.Release: cardinal;
begin
orcLock.Acquire;
try
Result := cardinal(orcNumResources.Increment);
if Result = 1 then begin
ResetEvent(orcHandle);
SetEvent(orcAvailable);
end;
finally orcLock.Release; end;
end; { TOmniResourceCount.Release }
///<summary>Like Allocate, but with a timeout.</summary>
function TOmniResourceCount.TryAllocate(var resourceCount: cardinal;
timeout_ms: cardinal): boolean;
var
startTime_ms: int64;
waitTime_ms : int64;
begin
Result := false;
startTime_ms := DSiTimeGetTime64; //TODO: Rewrite this with a faster, non-locking clock
orcLock.Acquire;
repeat
if orcNumResources.Value = 0 then begin
orcLock.Release;
if timeout_ms <= 0 then
Exit;
if timeout_ms = INFINITE then
waitTime_ms := INFINITE
else begin
waitTime_ms := startTime_ms + timeout_ms - DSiTimeGetTime64;
if waitTime_ms <= 0 then
Exit;
end;
if WaitForSingleObject(orcAvailable, waitTime_ms) <> WAIT_OBJECT_0 then
Exit; // skip final Release
orcLock.Acquire;
end;
if orcNumResources.Value > 0 then begin
Result := true;
resourceCount := cardinal(orcNumResources.Decrement);
if resourceCount = 0 then
begin
ResetEvent(orcAvailable); //reset before release - otherwise there's a race condition between this code and .Release
orcLock.Release; //prevent race condition - another thread may wait on orcHandle and destroy this instance
SetEvent(orcHandle);
Exit; // skip final Release
end;
break; //repeat
end;
until false;
orcLock.Release;
end; { TOmniResourceCount.TryAllocate }
Here's the scenario where I see the race condition occurring:
1) resourceCount is 0. This code executes in TryAllocate:
if resourceCount = 0 then begin
ResetEvent(orcAvailable); //reset before release - otherwise there's a race condition between this code and .Release
orcLock.Release; //prevent race condition - another thread may wait on orcHandle and destroy this instance
SetEvent(orcHandle);
Exit; // skip final Release
end;
2) After the orLock.Release and before the SetEvent a thread switch occurs.
3) A call to TOmniResourceCount.Allocate is made via the TOmniLogger.Log procedure above. ResetEvent(orcHandle) is called.
4) A thread switch occurs. Now SetEvent(orcHandle) in TryAllocate executes. This causes:
while DSiWaitForTwoObjects(task.TerminateEvent, FLogMsgCount.Handle, false, CMaxLogTimeout_ms) <> WAIT_OBJECT_0 do
Flush;
To execute. The end result is FLogMsgCount.Handle is stuck in the signaled state and this loop takes over a core.
My workaround is to move the SetEvent(orcHandle) into the critical section (before orcLock.Release in TryAllocate), thus preventing this scenario. Does this seem like the correct approach to you?
Regards,
Stephen