Posts posted by stephane


  1. Thank you guys for your hints. After investigating further, it does indeed seem that there are some memory allocation issues. I have started fixing them, and both the single-threaded and the multithreaded versions are now faster. I'll report back here when I am done with this process.
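
    As an illustration only (this is not the project code, and the thread does not say which allocations were involved), a minimal console sketch of how allocation pressure in worker threads can be measured. It uses only the RTL (`System.Threading`, `System.Diagnostics`); `WorkerBody`, `TimeWorkers` and the constants are hypothetical names and values chosen for the example. Each worker either allocates a new scratch array on every iteration or reuses a single one, and the two variants are timed side by side.

    ```pascal
    program AllocContentionSketch;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.Diagnostics, System.Threading;

    const
      CWorkers    = 8;       // same task count as in the question below
      CIterations = 200000;  // hypothetical amount of work per worker
      CBufferSize = 256;     // hypothetical scratch-buffer length

    // Body of one worker. With aReuse = False a new array is allocated and
    // released on every iteration; with aReuse = True the array is
    // allocated once and reused for all iterations.
    procedure WorkerBody(aReuse: Boolean);
    var
      lBuf: TArray<Double>;
      i, j: Integer;
    begin
      if aReuse then
        SetLength(lBuf, CBufferSize);
      for i := 1 to CIterations do
      begin
        if not aReuse then
        begin
          lBuf := nil;                   // release the previous block
          SetLength(lBuf, CBufferSize);  // allocate a new one
        end;
        for j := 0 to High(lBuf) do
          lBuf[j] := i * 0.5 + j;        // stand-in for the real calculation
      end;
    end;

    // Runs CWorkers tasks in parallel and returns the elapsed milliseconds.
    function TimeWorkers(aReuse: Boolean): Int64;
    var
      lTasks: array[0..CWorkers - 1] of ITask;
      lWatch: TStopwatch;
      i: Integer;
    begin
      lWatch := TStopwatch.StartNew;
      for i := 0 to High(lTasks) do
        lTasks[i] := TTask.Run(
          procedure
          begin
            WorkerBody(aReuse);
          end);
      TTask.WaitForAll(lTasks);
      Result := lWatch.ElapsedMilliseconds;
    end;

    begin
      Writeln('allocate per iteration: ', TimeWorkers(False), ' ms');
      Writeln('reuse one buffer:       ', TimeWorkers(True), ' ms');
    end.
    ```

    The absolute numbers depend on the memory manager and the machine, so the sketch prints timings rather than assuming a result. A large gap between the two lines suggests that the allocator, not the calculation, is what limits scaling.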


  2. Hello,

     

    In a VCL application, I am currently trying to optimize a single-threaded task that performs many complex geometric calculations and takes around 2 minutes and 20 seconds to execute. It seems like a good candidate for a multithreaded strategy. My computer has 8 cores and 16 threads, but I am using only 8 threads for now.

     

    Here is the code implementing the Parallel.For loop:

      var lNumTasks := 8;
      SetLength(lVCalculBuckets, lNumTasks);

      Parallel.For<TObject>(lShadingStepListAsObjects.ToArray)
        .NoWait
        .NumTasks(lNumTasks)
        .OnStop(Parallel.CompleteQueue(lResults))
        .Initialize(procInitMultiThread)
        .Finalize(procFinalizeMultiThread)
        .Execute(procExecuteMultiThread);

    procInitMultiThread and procFinalizeMultiThread respectively fill and free the entries of lVCalculBuckets, which holds one copy of our working objects per task:

        procedure TMyClass.procInitMultiThread(aTaskIndex, aFromIndex, aToIndex: Integer);
        var lVCalcul : TVCalcul;
        begin
          // Copy data
          lVCalcul := TVCalcul.Create(nil);
          lVCalcul.CopyLight(Self.VCalcul);
          lVCalculBuckets[aTaskIndex] := lVCalcul;
        end;
    
        procedure TMyClass.procFinalizeMultiThread(aTaskIndex, aFromIndex, aToIndex: Integer);
        var lVCalcul : TVCalcul;
        begin
          // Delete copied data and clear the bucket slot so that no dangling reference remains
          lVCalcul := TVCalcul(lVCalculBuckets[aTaskIndex]);
          lVCalculBuckets[aTaskIndex] := nil;
          lVCalcul.Free;
        end;

    procExecuteMultiThread just performs the calculations and posts the results back to the calling thread so that they can be displayed in the VCL interface:

        procedure TMyClass.procExecuteMultiThread(aTaskIndex: Integer; var aValue: TObject);
        var lVCalcul : TVCalcul;
            lRes: TStepRes;
        begin
          // Retrieve data
          lVCalcul := TVCalcul(lVCalculBuckets[aTaskIndex]);
          if Assigned(lVCalcul) then
          begin
            // Calculate factors
            lRes := TShadingStepRes(aValue);
            lVCalcul.CalculateFactors(lRes.Height, lRes.Width);
    
            // Post results
            lRes.FillResFromVCalcul(lVCalcul);
            lResults.Add(TOmniValue.CastFrom<TStepRes>(lRes));
          end;
        end;

    This implementation runs in about 1 minute and 50 seconds, which is faster than the single-threaded version but far from the gain I expected. I tried simplifying the code by removing the "Post results" part, thinking that it was causing synchronization delays, but that had no effect.

     

    Running the application inside SamplingProfiler and profiling a worker thread shows that 80% of the time spent by this thread is in NtDelayExecution:

    [Screenshot: SamplingProfiler results for a worker thread]

     

    I have no idea why, because as far as I am aware there is no synchronization code in the calculation part itself.

     

    If any of you could point me in the right direction to debug this further, I would much appreciate it.


  3. Hello,

     

    I am using Parallel.ForEach in my project, and it did not speed up the process compared to the single-threaded approach.

     

    So I ran the test "58_ForVsForEach" with CLoopCount set to 2 billion. "Parallel.ForEach" is more than 10 times slower than the plain "for" loop, while "Parallel.For" is the fastest approach:

    [Screenshot: timing results of the 58_ForVsForEach test]

     

    I would have expected "Parallel.ForEach" to be comparable to "Parallel.For" in terms of speed. Am I missing something obvious? 

     

    If this is of any help, I am using Delphi 12.1 on Windows 10 with a 4-core/8-thread processor. I also tried on another computer and got similar results.

     

    Thanks in advance for your help.
