Jump to content
StephenM

Race Condition in TOmniResourceCount

Recommended Posts

Hello Gabr,

 

I'm seeing a race condition in TOmniResourceCount in the TryAllocate function.

 

I created a logger following your advice in:

https://stackoverflow.com/questions/8134526/implement-thread-safe-logging

 

Two methods in the TOmniLogger code relevant to this discussion are:

 

procedure TOmniLogger.Log(const Text: string);
begin
  FLogQueue.Enqueue(Text);
  FLogMsgCount.Allocate;
end;

procedure TOmniLogger.Logger(const task: IOmniTask);
var
  LogFile: TOmniLogFile;

  procedure Flush;
  var
    logData: TOmniValue;
  begin
    while FLogQueue.TryDequeue(logData) do
    begin
      FLogMsgCount.Release;
      LogFile.Log(logData.AsString);
    end;
  end;

begin
  LogFile := TOmniLogFile.Create(FFileName);
  FTaskName := Format('%s:%s [ThreadId:%d]', [Self.ClassName, SysUtils.ExtractFileName(FFileName), Windows.GetCurrentThreadId]);
  try
    Log(Format('Task "%s" on thread [%d, %d]', [Task.Name, Task.UniqueID, Windows.GetCurrentThreadId]));
    while DSiWaitForTwoObjects(task.TerminateEvent, FLogMsgCount.Handle, false, CMaxLogTimeout_ms) <> WAIT_OBJECT_0 do
      Flush;
    Flush;
  finally
    LogFile.Free;
  end;
end;

FLogMsgCount is an instance of TOmniResourceCount.  The race condition occurs in the tight while DSiWaitForTwoObjects loop and involves FLogMsgCount.Handle.

 

This code calls two methods from TOmniResourceCount in OtlSync.pas, Release and TryAllocate:

 

function TOmniResourceCount.Release: cardinal;
begin
  orcLock.Acquire;
  try
    Result := cardinal(orcNumResources.Increment);
    if Result = 1 then begin
      ResetEvent(orcHandle);
      SetEvent(orcAvailable);
    end;
  finally orcLock.Release; end;
end; { TOmniResourceCount.Release }

///<summary>Like Allocate, but with a timeout.</summary>
function TOmniResourceCount.TryAllocate(var resourceCount: cardinal;
  timeout_ms: cardinal): boolean;
var
  startTime_ms: int64;
  waitTime_ms : int64;
begin
  Result := false;
  startTime_ms := DSiTimeGetTime64; //TODO: Rewrite this with a faster, non-locking clock
  orcLock.Acquire;
  repeat
    if orcNumResources.Value = 0 then begin
      orcLock.Release;
      if timeout_ms <= 0 then
        Exit;
      if timeout_ms = INFINITE then
        waitTime_ms := INFINITE
      else begin
        waitTime_ms := startTime_ms + timeout_ms - DSiTimeGetTime64;
        if waitTime_ms <= 0 then
          Exit;
      end;
      if WaitForSingleObject(orcAvailable, waitTime_ms) <> WAIT_OBJECT_0 then
        Exit; // skip final Release
      orcLock.Acquire;
    end;
    if orcNumResources.Value > 0 then begin
      Result := true;
      resourceCount := cardinal(orcNumResources.Decrement);
      if resourceCount = 0 then
      begin
        ResetEvent(orcAvailable); //reset before release - otherwise there's a race condition between this code and .Release
        orcLock.Release; //prevent race condition - another thread may wait on orcHandle and destroy this instance        
        SetEvent(orcHandle);
        Exit; // skip final Release
      end;
      break; //repeat
    end;
  until false;
  orcLock.Release; 
end; { TOmniResourceCount.TryAllocate }

Here's the scenario where I see the race condition occurring:

 

1) resourceCount is 0.  This code executes in TryAllocate:

     if resourceCount = 0 then begin
        ResetEvent(orcAvailable); //reset before release - otherwise there's a race condition between this code and .Release
        orcLock.Release; //prevent race condition - another thread may wait on orcHandle and destroy this instance
        SetEvent(orcHandle);
        Exit; // skip final Release
      end;

 

2) After the orLock.Release and before the SetEvent a thread switch occurs.

 

3) A call to TOmniResourceCount.Allocate is made via the TOmniLogger.Log procedure above.  ResetEvent(orcHandle) is called.

 

4) A thread switch occurs.  Now SetEvent(orcHandle) in TryAllocate executes.  This causes:

    while DSiWaitForTwoObjects(task.TerminateEvent, FLogMsgCount.Handle, false, CMaxLogTimeout_ms) <> WAIT_OBJECT_0 do
      Flush;

    To execute.  The end result is FLogMsgCount.Handle is stuck in the signaled state and this loop takes over a core.

 

My workaround is to move the SetEvent(orcHandle) into the critical section (before orcLock.Release in TryAllocate), thus preventing this scenario.  Does this seem like the correct approach to you?

 

Regards,

 

Stephen

 

 

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×