Jump to content

pyscripter

Members
  • Content Count

    785
  • Joined

  • Last visited

  • Days Won

    42

Posts posted by pyscripter


  1. 4 hours ago, Tommi Prami said:

    I am pretty interested in hearing what are people thinking of implementing with Managed records.

     

    -Tee-

    I think there is a serious issue with the timing of finalization of temp managed records which now is ASAP unlike regular records.  Please see

     

    Timing of finalization of temporary managed records

     

    I would love to hear what the experts think about this.  @Stefan Glienke @David Heffernan @Marco Cantu etc.  


  2. Right now the quality portal is updated at a frenetic rate.  There was a very large number of fixes, I think more than in any other previous version, and I was quite pleased that quite a few of my reports have been resolved, some of the unexpectedly:

     

    System.Threading got some attention with the famous Cancel and Wait issue resolved.

     

    Issues I would like to see fixed now and consider critical are:

     

     

     

    • Like 1

  3. On 5/18/2020 at 12:37 PM, Anders Melander said:

    Anyway, here's my final (hopefully) fixed version of InterlockedCompareExchange128 which conforms to the Win32 API documentation:

    Are the MOV instructions after LOCK CMPXCHG16B [R10] needed?

     

    The following passes the stress test:

     

    {$IFDEF Win64}
    function InterlockedCompareExchange128(Destination: Pointer; ExchangeHigh, ExchangeLow: Int64; ComparandResult: Pointer): boolean;
    // The parameters are in the RCX, RDX, R8 and R9 registers per the MS x64 calling convention:
    //   RCX        Destination
    //   RDX        ExchangeHigh
    //   R8         ExchangeLow
    //   R9         ComparandResult
    //
    // CMPXCHG16B requires the following register setup:
    //   RDX:RAX    ComparandResult.High:ComparandResult.Low
    //   RCX:RBX    ExchangeHigh:ExchangeLow
    // See: https://www.felixcloutier.com/x86/cmpxchg8b:cmpxchg16b
    asm
          .PUSHNV RBX
          MOV   R10,Destination             // RCX
          MOV   RBX,ExchangeLow             // R8
          MOV   RCX,ExchangeHigh            // RDX
          MOV   RDX,[ComparandResult+8]     // R9
          MOV   RAX,[ComparandResult]       // R9
     LOCK CMPXCHG16B [R10]
    //      MOV   [ComparandResult+8],RDX     // R9
    //      MOV   [ComparandResult],RAX       // R9
          SETZ  AL
     end;
    {$ENDIF Win64}

     


  4. 6 minutes ago, Anders Melander said:

    Note that more threads does not necessarily result in a higher likelihood of failure. More threads means more overhead (context switching and so on) and less time spent actually exercising the code we want to test. I think the the goal should be to loop as fast as possible (i.e. perform as many operations as possible), utilizing all available CPUs, with as few context switches and as little kernel mode as possible.

    Correct in general but in this particular test and given the timeout the greater the number of threads (up to a point) the greater the number of calls to NewWaitObj (which calls the stack push and pop).  You can confirm by running the stress test.


  5. I am attaching an updated fix using the corrected InterlockedCompareExchange128 by @Anders Melander.

     

    A note about @Darian Miller stress test from Github.  If you run the stress test with the fix you will see failures reported.   Fist a couple of important pointers:

    I have confirmed that in the cases where the stress test reports a failure, this is not Delphi's fault but what the WaitForSingleObject returns.

     

    To run the stress test in a meaningful way either

    • change the POP-TIMEOUT to a value greater than 15  or
    • change the TTestThread.Execute() so that it just reports cases in which the vStopWatch.ElapsedMilliseconds is zero but it does not stop the test:

    e.g.

     

    procedure TTestThread.Execute();
    var
      Item: TItem;
      vWaitResult:TWaitResult;
      vStopwatch:TStopwatch;
    begin
      pubSyncrhonizedStart.WaitFor();
    
      while not Terminated do
      begin
        if fQueue.ShutDown then Break;
        if pubTestCompletionCheck.WaitFor(0) = wrSignaled then Break;
    
        vStopwatch := TStopwatch.StartNew;
        vWaitResult := FQueue.PopItem( Item );
        vStopWatch.Stop();
    
    
        //Note: Reason for ACCEPTABLE_VARIANCE_MS is that on some low timeout values (like 200ms) it occasionally times out a little early (like 180ms)
        if (vWaitResult = wrTimeout) and (vStopWatch.ElapsedMilliseconds > 0) and (vStopWatch.ElapsedMilliseconds >= POP_TIMEOUT-ACCEPTABLE_TIMEOUTVARIANCE_MS) then
        begin
          //successful PopItem operation as we aren't adding anything into our queue
          Continue;
        end
        else
        begin
          LogIt('TTestThread ERROR: TThreadedQueue.PopItem returned [' + TRttiEnumerationType.GetName(vWaitResult) + '] unexpectedly after [' + IntToStr(vStopWatch.ElapsedMilliseconds) + 'ms]');
          //pubTestCompletionCheck.SetEvent();
          //Break;
        end;
      end;
    end;

    I have tested with a couple of thousand threads in both win32 and win64 and it behaves as expected.

    MonitorWaitStackFix.pas

    • Thanks 1

  6. @Anders MelanderCurrently the only way to pass the tests in 64bits is to use the modified version of the CAS function which now looks like this:

    function CAS(const oldData: pointer; oldReference: NativeInt; newData: pointer;
      newReference: NativeInt; var destination): boolean;
    asm
    {$IFNDEF CPUX64}
    ...
    {$ELSE CPUX64}
      .pushnv rbx
      mov   rax, oldData
      mov   rbx, newData
      mov   rcx, newReference
      mov   r8, [destination]
      lock cmpxchg16b [r8]
    {$ENDIF CPUX64}
      setz  al
    end; { CAS }

    Your implementation of InterlockedCompareExchange128 does not pass the tests, same as Delphi's own implementation.  It would be nice to come up with a working InterlockedCompareExchange128 and submit that as a separate bug report with a working solution.


  7. 1 hour ago, Anders Melander said:

    Beware that this does not align memory allocated from the heap. For that you can use the following:

    
    {$ifdef CPUX64}
      SetMinimumBlockAlignment(mba16Byte);
    {$else CPUX64}
      SetMinimumBlockAlignment(mba8Byte);
    {$endif CPUX64}

     

    SetMinimumBlockAlignment does nothing in 64bits (see the source code).  Alignment is fixed at mba16Byte always.

    • Thanks 1

  8. If I replace the push and pop rbx with .pushnv rbx and remove the .noframe the code crashes.   How is one supposed to use .pushnv?

     

    function CAS(const oldData: pointer; oldReference: NativeInt; newData: pointer;
      newReference: NativeInt; var destination): boolean;
    asm
    {$IFNDEF CPUX64}
    
    {$ELSE CPUX64}
      .pushnv rbx                   //rsp := rsp - 8 !
      mov   rax, oldData
      mov   rbx, newData
      mov   rcx, newReference
      mov   r8, [destination + 8]   //+8 with respect to .noframe
      lock cmpxchg16b [r8]
    {$ENDIF CPUX64}
      setz  al
    end; { CAS }

     


  9. 13 minutes ago, Uwe Raabe said:

    I guess David is referring another assembler sequence.

    Well there is the Delphi implementation which does not work:

    function InterlockedCompareExchange128(Destination: Pointer; ExchangeHigh, ExchangeLow: Int64; ComparandResult: Pointer): Bool; stdcall;
    asm
          MOV   R10,RCX
          MOV   RBX,R8
          MOV   RCX,RDX
          MOV   RDX,[R9+8]
          MOV   RAX,[R9]
     LOCK CMPXCHG16B [R10]
          SETZ  AL
    end;

    and the CAS function from OmniThreadLibrary which works well.

    function CAS(const oldData: pointer; oldReference: NativeInt; newData: pointer;
      newReference: NativeInt; var destination): boolean;
    asm
    {$IFNDEF CPUX64}
      push  edi
      push  ebx
      mov   ebx, newData
      mov   ecx, newReference
      mov   edi, destination
      lock cmpxchg8b qword ptr [edi]
      pop   ebx
      pop   edi
    {$ELSE CPUX64}
      .noframe
      push  rbx                     //rsp := rsp - 8 !
      mov   rax, oldData
      mov   rbx, newData
      mov   rcx, newReference
      mov   r8, [destination + 8]   //+8 with respect to .noframe
      lock cmpxchg16b [r8]
      pop   rbx
    {$ENDIF CPUX64}
      setz  al
    end; { CAS }

    @David Heffernanhas said that both are wrong (see earlier comments).   By the way Allen Bauer explains .PUSHNV here.


  10. Success!  If I use the CAS function from OmniThreadLibrary it works.  So it appears that Delphi's InterlockedCompareExchange128 is broken.

     

    Passed a test with 1000 threads and 50 million calls to NewWaitObj.

     

    I am attaching the revised test code which should now work in both 32bits and 64bits.  We should now try to convince Embarcadero to include the fixes in the Delphi RTL.

    iaStressTest.TThreadedQueue.PopItem.pas


  11. @Kas Ob.Thanks for the info about alignment.

     

    Actually Delphi does offer support for alignment (undocumented feature see @Uwe Raabe response in this Stackoverflow question).  If you change the definition of TEventStack to

      TEventStack = record
        Head: PEventItemHolder;
        Counter: NativeInt;
        procedure Push(EventItem: PEventItemHolder);
        function Pop: PEventItemHolder;
      end align {$IFDEF CPUX64}16{$ELSE CPUX64}8{$ENDIF CPUX64};

    There are not longer crashes.  The test still fails though and I am investigating why.

     

    And by the way. There is also the  [Align(8)] attribute which works with fields, global variables and records it seems.


  12. 11 hours ago, David Heffernan said:

    I can't see where RBX is restored in that InterlockedCompareExchange128 code. It's a NV register.  Function needs to start with .PUSHNV RBX

     

    I can't vouch for the rest of what it does.

    For reference this is from fpc.

        function InterlockedCompareExchange128(var Target: Int128Rec; NewValue: Int128Rec; Comperand: Int128Rec): Int128Rec; assembler;
         {
            win64:
              rcx ... pointer to result
              rdx ... target
              r8  ... NewValue
              r9  ... Comperand
          }
        {$ifdef win64}
          asm
            pushq %rbx
    
            { store result pointer for later use }
            pushq %rcx
    
            { load new value }
            movq (%r8),%rbx
            movq 8(%r8),%rcx
    
            { save target pointer for later use }
            movq %rdx,%r8
    
            { load comperand }
            movq (%r9),%rax
            movq 8(%r9),%rdx
    
            {$ifdef oldbinutils}
               .byte 0xF0,0x49,0x0F,0xC7,0x08
            {$else}
            lock cmpxchg16b (%r8)
            {$endif}
            { restore result pointer }
            popq %rcx
    
            { store result }
            movq %rax,(%rcx)
            movq %rdx,8(%rcx)
    
            popq %rbx
          end;

    Any chance of anyone coming up with a working InterlockedCompareExchange128 for Delphi?


  13. 5 hours ago, Darian Miller said:

    On my Windows 10 system, the Win32 version of your code fails after 6 seconds with about 130 threads created before PopItem throws a premature timeout.   (10.3.3)

     

    Over here win 32 300 threads 10 minutes:

    2020-05-17 01.01.08: StressTestPopItem Start: Waiting up to [600] seconds for PopItem to prematurely timeout.
    2020-05-17 01.01.08: Note: Using [10] as PopTimeout on TThreadedQueue creation
    2020-05-17 01.01.09: TThreadCreator Start: Creating up to [300] threads
    2020-05-17 01.01.09: Note: Creating threads with a StackSize of [65536]
    2020-05-17 01.01.09: TThreadCreator End: [300] worker threads created
    ...
    2020-05-17 01.11.05: New Wait objects created: 14800000
    2020-05-17 01.11.08: StressTestPopItem End: Overall maximum time limit reached for this test without an error detected...we will call this a success!
    2020-05-17 01.11.08: Hit <enter> to exit

    NewWaitObj was called 15 million times.

     

    and Win32 2000 threads 30 seconds.

    2020-05-17 01.30.28: StressTestPopItem Start: Waiting up to [30] seconds for PopItem to prematurely timeout.
    2020-05-17 01.30.28: Note: Using [10] as PopTimeout on TThreadedQueue creation
    2020-05-17 01.30.29: TThreadCreator Start: Creating up to [2000] threads
    2020-05-17 01.30.29: Note: Creating threads with a StackSize of [65536]
    2020-05-17 01.30.29: TThreadCreator End: [2000] worker threads created
    ...
    2020-05-17 01.30.58: New Wait objects created: 5000000
    2020-05-17 01.30.58: StressTestPopItem End: Overall maximum time limit reached for this test without an error detected...we will call this a success!
    2020-05-17 01.30.58: Hit <enter> to exit

    NewWaitObj was called 5 million times in 30 seconds.

     

    Sometimes with many threads say more than 300 I get an early failure before all threads are created and I am not sure why.  Once all the threads are created I get no errors.  If you add Sleep(1000) at the top of  TTestThread.Execute() to give it time for all the threads to be created these startup errors are avoided.


  14. 15 hours ago, Anders Melander said:

    What you (or someone else) need to do is provide an implementation of TInterlocked.CompareExchangeAtomicCmpExchange or CAS that handles 16 bytes and returns a boolean indicating success.

    function InterlockedCompareExchage(var Dest: TEventStack; NewValue, CurrentValue: TEventStack): Boolean;
    begin
      {$IFDEF CPUX64}
      Result := InterlockedCompareExchange128(@Dest, Int64(NewValue.Head), NewValue.Counter,  @CurrentValue);
      {$ELSE CPUX64}
      Result := InterlockedCompareExchange64(Int64(Dest), Int64(NewValue), Int64(CurrentValue)) = Int64(CurrentValue);
      {$ENDIF CPUX64}
    end;

    I have tried the above.  Works well in 32 bits, but crashes in 64bits.

    The implementation of InterlockedCompareExchange128 in WinApi.Windows.pas is: 

    function InterlockedCompareExchange128(Destination: Pointer; ExchangeHigh, ExchangeLow: Int64; ComparandResult: Pointer): Bool; stdcall;
    asm
          MOV   R10,RCX
          MOV   RBX,R8
          MOV   RCX,RDX
          MOV   RDX,[R9+8]
          MOV   RAX,[R9]
     LOCK CMPXCHG16B [R10]
          SETZ  AL
    end;

     

×