pyscripter 689 Posted May 5, 2020

I revisited this thread and tested the code below:

    program Project1;

    {$APPTYPE CONSOLE}

    {$R *.res}

    uses
      System.SysUtils, System.Threading, System.Diagnostics;

    var
      SW: TStopWatch;

    type
      TThreadPoolStatsHelper = record helper for TThreadPoolStats
        function Formatted: string;
      end;

    function TThreadPoolStatsHelper.Formatted: string;
    begin
      Result := Format('Worker: %2d, Min: %2d, Max: %2d, Idle: %2d, Retired: %2d, Suspended: %2d, CPU(Avg): %3d, CPU: %3d',
        [self.WorkerThreadCount, self.MinLimitWorkerThreadCount,
         self.MaxLimitWorkerThreadCount, self.IdleWorkerThreadCount,
         self.RetiredWorkerThreadCount, self.ThreadSuspended,
         self.AverageCPUUsage, self.CurrentCPUUsage]);
    end;

    procedure Load;
    begin
      TParallel.For(0, 99999999,
        procedure(i: Integer)
        var
          T: Single;
        begin
          T := Sin(i/PI);
        end);
    end;

    begin
      try
        Writeln('PPL Test ---------------');
        Writeln('Before: ' + TThreadPoolStats.Current.Formatted);
        SW := TStopWatch.StartNew;
        Load;
        Writeln('Finished in ' + SW.Elapsed.ToString);
        Sleep(1000);
        Writeln('After: ' + TThreadPoolStats.Current.Formatted);
      except
        on E: Exception do
          Writeln(E.ClassName, ': ', E.Message);
      end;
      ReadLn;
    end.

This is the output:

32-bit

    PPL Test ---------------
    Before: Worker: 0, Min: 8, Max: 200, Idle: 0, Retired: 0, Suspended: 0, CPU(Avg): 0, CPU: 0
    Finished in 00:00:00.7620933
    After: Worker: 8, Min: 8, Max: 200, Idle: 7, Retired: 0, Suspended: 0, CPU(Avg): 8, CPU: 15

64-bit

    PPL Test ---------------
    Before: Worker: 0, Min: 8, Max: 200, Idle: 0, Retired: 0, Suspended: 0, CPU(Avg): 0, CPU: 0
    Finished in 00:00:14.0655228
    After: Worker: 8, Min: 8, Max: 200, Idle: 7, Retired: 0, Suspended: 0, CPU(Avg): 85, CPU: 1

Can anyone explain the huge difference in times? (It was consistent over many runs.)

pyscripter 689 Posted May 5, 2020

Oh, I get it. Sin is highly optimized in 32-bit but apparently not in 64-bit.

Vandrovnik 214 Posted May 5, 2020

Maybe https://github.com/neslib/FastMath and/or http://docwiki.embarcadero.com/RADStudio/Rio/en/Floating_point_precision_control_(Delphi_for_x64) can help.

David Heffernan 2345 Posted May 5, 2020

The issue is that x64 trig functions are very slow for very large values. Nobody actually wants to know sin for 99999999/pi radians. Put in sensible values for the argument to sin and it looks more reasonable. For instance, try using

    T := Sin(i/99999999);

David Heffernan 2345 Posted May 5, 2020

7 hours ago, pyscripter said:
> Oh, I get it. Sin is highly optimized in 32-bit but apparently not in 64-bit.

No. That's wrong. In fact sin is quicker under x64 than under x86, even though sin (and other trig) is implemented in hardware in the x87 unit and in Pascal in x64 (because the SSE2 unit does not have built-in trig).

pyscripter 689 Posted May 5, 2020

8 minutes ago, David Heffernan said:
> The issue is that x64 trig functions are very slow for very large values. Nobody actually wants to know sin for 99999999/pi radians. Put in sensible values for the argument to sin and it looks more reasonable. For instance try using T:=Sin(i/99999999);

Yes, you are right...
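
For anyone who wants to check the argument-range explanation in isolation, here is a minimal single-threaded sketch. It is not taken from any of the posts: the program name, `TimeSin`, `MaxArg` and `Stride` are made up for illustration, and it samples every 10th value of the same 0..99999999 range instead of using `TParallel.For`, so absolute timings will not match the numbers reported in the first post.

```delphi
program SinArgRange;

{$APPTYPE CONSOLE}

uses
  System.Diagnostics;

const
  MaxArg = 99999999; // same upper bound as the loop in the first post
  Stride = 10;       // assumption: sample every 10th value to keep the run short

// Sums Sin(I * Scale) for I = 0, Stride, 2*Stride, ... up to MaxArg
// and returns the elapsed time in milliseconds.
function TimeSin(const Scale: Double): Int64;
var
  SW: TStopwatch;
  I: Integer;
  Acc: Double;
begin
  Acc := 0;
  I := 0;
  SW := TStopwatch.StartNew;
  while I <= MaxArg do
  begin
    Acc := Acc + Sin(I * Scale);
    Inc(I, Stride);
  end;
  Result := SW.ElapsedMilliseconds;
  // Touch Acc so the summed values are actually used.
  if Acc > 1E300 then
    Writeln(Acc);
end;

begin
  // Arguments up to about 3.2E7 radians, as in Sin(i/PI).
  Writeln('Sin(i/Pi):       ', TimeSin(1 / Pi), ' ms');
  // Arguments confined to [0, 1], as in Sin(i/99999999).
  Writeln('Sin(i/99999999): ', TimeSin(1 / MaxArg), ' ms');
  Readln;
end.
```

Build it once for Win32 and once for Win64 and compare the two lines. If the explanation given in the thread holds, the large-argument line should be the one that differs dramatically between the two platforms, while the small-argument line stays comparable.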