stephane
Posts posted by stephane
Hello,
In a VCL application I am trying to optimize a single-threaded task that performs many complex geometric calculations and takes around 2 minutes and 20 seconds to execute. It seems like a good candidate for a multithreaded approach. My computer has 8 cores and 16 threads, but I am using only 8 threads for now.
Here is the code implementing the Parallel.For loop:
var lNumTasks := 8;
SetLength(lVCalculBuckets, lNumTasks);
Parallel.For<TObject>(lShadingStepListAsObjects.ToArray)
  .NoWait
  .NumTasks(lNumTasks)
  .OnStop(Parallel.CompleteQueue(lResults))
  .Initialize(procInitMultiThread)
  .Finalize(procFinalizeMultiThread)
  .Execute(procExecuteMultiThread);
procInitMultiThread and procFinalizeMultiThread populate and free lVCalculBuckets, which holds one copy of our working objects per thread:
procedure TMyClass.procInitMultiThread(aTaskIndex, aFromIndex, aToIndex: Integer);
var
  lVCalcul: TVCalcul;
begin
  // Copy data
  lVCalcul := TVCalcul.Create(nil);
  lVCalcul.CopyLight(Self.VCalcul);
  lVCalculBuckets[aTaskIndex] := lVCalcul;
end;

procedure TMyClass.procFinalizeMultiThread(aTaskIndex, aFromIndex, aToIndex: Integer);
var
  lVCalcul: TVCalcul;
begin
  // Delete copied data
  lVCalcul := TVCalcul(lVCalculBuckets[aTaskIndex]);
  FreeAndNil(lVCalcul);
  // FreeAndNil only resets the local variable, so clear the slot as well
  lVCalculBuckets[aTaskIndex] := nil;
end;
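A side note on the finalizer: FreeAndNil called on a local variable resets only that variable, so a slot in lVCalculBuckets is cleared only if it is set to nil explicitly. The following is a minimal console sketch of that behavior, not the actual application code; the Buckets array is a hypothetical stand-in for lVCalculBuckets and plain TObject instances stand in for TVCalcul.

```delphi
program FreeAndNilLocalDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Buckets: array of TObject; // hypothetical stand-in for lVCalculBuckets
  Local: TObject;
begin
  SetLength(Buckets, 1);
  Buckets[0] := TObject.Create;

  // Same pattern as the finalizer: copy the reference, then free the copy.
  Local := Buckets[0];
  FreeAndNil(Local);

  // Local is now nil, but the array slot still holds the old, invalid address.
  Writeln('Local assigned:  ', Assigned(Local));
  Writeln('Bucket assigned: ', Assigned(Buckets[0]));

  // Clearing the slot explicitly avoids leaving a dangling reference behind.
  Buckets[0] := nil;
  Writeln('Bucket assigned after clearing: ', Assigned(Buckets[0]));
end.
```

This is harmless as long as nothing reads the array after the loop finishes, but it matters if the buckets are ever reused or checked with Assigned.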
procExecuteMultiThread just performs the calculations and posts the results back to the calling thread so that they can be displayed in the VCL interface:
procedure TMyClass.procExecuteMultiThread(aTaskIndex: Integer; var aValue: TObject);
var
  lVCalcul: TVCalcul;
  lRes: TStepRes;
begin
  // Retrieve data
  lVCalcul := TVCalcul(lVCalculBuckets[aTaskIndex]);
  if Assigned(lVCalcul) then
  begin
    // Calculate factors
    lRes := TShadingStepRes(aValue);
    lVCalcul.CalculateFactors(lRes.Height, lRes.Width);
    // Post results
    lRes.FillResFromVCalcul(lVCalcul);
    lResults.Add(TOmniValue.CastFrom<TStepRes>(lRes));
  end;
end;
This implementation now runs in about 1 minute and 50 seconds, which is faster than the single-threaded version but far from the gains I expected. I tried simplifying the code by removing the "Post results" part, thinking that it was causing synchronization delays, but that had no effect.
Running the application inside SamplingProfiler and profiling a worker thread shows that 80% of the time spent by this thread is in NtDelayExecution.
Yet I have no idea why, because the calculation part itself contains no synchronization code that I am aware of.
If any of you could point me in the right direction to debug this further, it would be much appreciated.
-
Thank you guys for your answers. I will try to rewrite my code to use Parallel.For instead of Parallel.ForEach and hopefully I will get much better performance.
-
Hello,
I am using Parallel.ForEach in my project and it didn't speed up the process compared to the single-threaded approach.
So I ran the test "58_ForVsForEach" with CLoopCount set to 2 billion. "Parallel.ForEach" is more than 10 times slower than the plain "for" loop, while "Parallel.For" is the fastest approach.
I would have expected "Parallel.ForEach" to be comparable to "Parallel.For" in terms of speed. Am I missing something obvious?
If this is of any help, I am using Delphi 12.1 on Windows 10 with a 4-core/8-thread processor. I also tried on another computer and got the same kind of results.
Thanks in advance for your help.
Parallel.For optimization
in OmniThreadLibrary
Posted · Edited by stephane
Thank you guys for your hints. After investigating further, it does seem that there are some memory allocation issues. I started fixing them and both the single-threaded version and the multithreaded version are now faster. I'll report back here when I am done with this process.
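For anyone hitting the same symptom, here is a minimal sketch of one common kind of fix, assuming the allocations come from temporary arrays created inside the calculation loop. That is only an assumption for illustration, not a description of the real code. All names (SumAllocating, SumReusing, CPointCount, CSteps) are hypothetical. The idea is to allocate a scratch buffer once per worker and reuse it, instead of calling the memory manager on every step.

```delphi
program ReuseScratchBufferDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  CPointCount = 1024;  // hypothetical working-set size
  CSteps      = 10000; // hypothetical number of calculation steps

// Allocates a fresh dynamic array on every call,
// so every step goes through the memory manager.
function SumAllocating(aStep: Integer): Double;
var
  lTmp: TArray<Double>;
  i: Integer;
begin
  SetLength(lTmp, CPointCount);
  Result := 0;
  for i := 0 to High(lTmp) do
  begin
    lTmp[i] := aStep + i;
    Result := Result + lTmp[i];
  end;
end;

// Reuses a caller-owned buffer: no allocation inside the hot loop.
function SumReusing(aStep: Integer; var aScratch: TArray<Double>): Double;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to High(aScratch) do
  begin
    aScratch[i] := aStep + i;
    Result := Result + aScratch[i];
  end;
end;

var
  lScratch: TArray<Double>;
  lStep: Integer;
  lTotalA, lTotalB: Double;
begin
  // One allocation up front; in a parallel loop this would be one buffer per task.
  SetLength(lScratch, CPointCount);

  lTotalA := 0;
  lTotalB := 0;
  for lStep := 1 to CSteps do
  begin
    lTotalA := lTotalA + SumAllocating(lStep);
    lTotalB := lTotalB + SumReusing(lStep, lScratch);
  end;

  // Both variants perform the same arithmetic, so the totals match.
  if lTotalA = lTotalB then
    Writeln('Totals match')
  else
    Writeln('Totals differ');
end.
```

In a multithreaded loop the scratch buffer would live next to the per-task working object (one per entry in lVCalculBuckets), so that workers never share it and never need to allocate while calculating.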