
stephane

Members
  • Content Count

    5
  • Days Won

    1

stephane last won the day on July 8

stephane had the most liked content!

Community Reputation

3 Neutral
  1. stephane

    Parallel.For optimization

    Thank you guys for your hints. After investigating further, it does seem that there are some memory allocation issues. I have started fixing them, and both the single-threaded and the multithreaded versions are now faster. I'll report back here when I am done with this process.
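A minimal sketch of one common way to reduce allocations in a hot routine, in case it helps readers with similar memory allocation issues. It is not the code from this project: `SumSquaresAllocating`, `SumSquaresReusing` and the scratch buffer are hypothetical names used for illustration only. The first variant allocates a temporary array on every call; the second receives a buffer owned by the caller (for example one buffer per worker thread) and only grows it when needed.

```pascal
program ReuseBufferSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  TDoubleArray = TArray<Double>;

// Hypothetical hot routine, variant A: a new temporary array is
// allocated (and later released) on every single call.
function SumSquaresAllocating(aCount: Integer): Double;
var
  lTemp: TDoubleArray;
  i: Integer;
begin
  SetLength(lTemp, aCount); // heap allocation on each call
  for i := 0 to aCount - 1 do
    lTemp[i] := i * i;
  Result := 0;
  for i := 0 to aCount - 1 do
    Result := Result + lTemp[i];
end;

// Hypothetical hot routine, variant B: the caller owns the scratch
// buffer, so the memory manager is only involved when it has to grow.
function SumSquaresReusing(var aScratch: TDoubleArray; aCount: Integer): Double;
var
  i: Integer;
begin
  if Length(aScratch) < aCount then
    SetLength(aScratch, aCount); // happens only when the buffer is too small
  for i := 0 to aCount - 1 do
    aScratch[i] := i * i;
  Result := 0;
  for i := 0 to aCount - 1 do
    Result := Result + aScratch[i];
end;

var
  lScratch: TDoubleArray; // in a multithreaded setup: one buffer per worker
  lA, lB: Double;
  lCall: Integer;
begin
  lA := 0;
  lB := 0;
  for lCall := 1 to 1000 do
  begin
    lA := lA + SumSquaresAllocating(100);
    lB := lB + SumSquaresReusing(lScratch, 100);
  end;
  // Both variants compute the same value; only the allocation pattern differs.
  Writeln('Results match: ', lA = lB);
end.
```

In a Parallel.For setup like the one in this thread, such a buffer could presumably live next to the per-task working object created in the Initialize handler, so that each worker reuses its own memory instead of going through the shared memory manager on every iteration.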
  2. stephane

    Parallel.For optimization

    Thanks a lot for the hint. I found a way to display the caller, and it seems that many of the calls come from the memory manager. I am not sure how to proceed from here, though.
  3. stephane

    Parallel.For optimization

    Hello,

    In a VCL application I am currently trying to optimize a single-threaded task that performs many complex geometric calculations and takes around 2 minutes and 20 seconds to execute. It seems like a good candidate for a multithreaded strategy. My computer has 8 cores and 16 threads, but I am only using 8 threads for now.

    Here is the code implementing the Parallel.For loop:

        var lNumTasks := 8;
        SetLength(lVCalculBuckets, lNumTasks);
        Parallel.For<TObject>(lShadingStepListAsObjects.ToArray)
          .NoWait
          .NumTasks(lNumTasks)
          .OnStop(Parallel.CompleteQueue(lResults))
          .Initialize(procInitMultiThread)
          .Finalize(procFinalizeMultiThread)
          .Execute(procExecuteMultiThread);

    procInitMultiThread and procFinalizeMultiThread create and free the entries of lVCalculBuckets, which holds one copy of our working objects per thread:

        procedure TMyClass.procInitMultiThread(aTaskIndex, aFromIndex, aToIndex: Integer);
        var
          lVCalcul : TVCalcul;
        begin
          // Copy data
          lVCalcul := TVCalcul.Create(nil);
          lVCalcul.CopyLight(Self.VCalcul);
          lVCalculBuckets[aTaskIndex] := lVCalcul;
        end;

        procedure TMyClass.procFinalizeMultiThread(aTaskIndex, aFromIndex, aToIndex: Integer);
        var
          lVCalcul : TVCalcul;
        begin
          // Delete copied data
          lVCalcul := TVCalcul(lVCalculBuckets[aTaskIndex]);
          FreeAndNil(lVCalcul);
        end;

    procExecuteMultiThread just performs the calculations and posts the results back to the calling thread so that they can be displayed in the VCL interface:

        procedure TMyClass.procExecuteMultiThread(aTaskIndex: Integer; var aValue: TObject);
        var
          lVCalcul : TVCalcul;
          lRes: TStepRes;
        begin
          // Retrieve data
          lVCalcul := TVCalcul(lVCalculBuckets[aTaskIndex]);
          if Assigned(lVCalcul) then
          begin
            // Calculate factors
            lRes := TShadingStepRes(aValue);
            lVCalcul.CalculateFactors(lRes.Height, lRes.Width);

            // Post results
            lRes.FillResFromVCalcul(lVCalcul);
            lResults.Add(TOmniValue.CastFrom<TStepRes>(lRes));
          end;
        end;

    This implementation runs in about 1 minute 50 seconds, which is faster than the single-threaded version but far from the gains I expected.

    I tried simplifying the code by removing the "Post results" part, thinking that it was causing synchronization delays, but it had no effect.

    Running the application inside SamplingProfiler and profiling a worker thread shows that 80% of the time spent by this thread is in NtDelayExecution. I have no idea why, because as far as I am aware the calculation part itself contains no synchronization code.

    If any of you could point me in the right direction to debug this further, it would be much appreciated.
  4. stephane

    Parallel.ForEach is really slow

    Thank you guys for your answers. I will try to rewrite my code to use Parallel.For instead of Parallel.ForEach and hopefully I will get much better performance.
  5. Hello, I am using Parallel.ForEach in my project and it didn't speed up the process compared to the single-threaded approach. So I ran the test "58_ForVsForEach" with CLoopCount set to 2 billion: "Parallel.ForEach" is more than 10 times slower than the plain "for" loop, while "Parallel.For" is the fastest approach. I would have expected "Parallel.ForEach" to be comparable to "Parallel.For" in terms of speed. Am I missing something obvious? If this is of any help, I am using Delphi 12.1 on Windows 10 with a 4-core/8-thread processor. I also tried on another computer and got the same kind of results. Thanks in advance for your help.