Parallel for 32 vs 64 bits

pyscripter · May 5, 2020

I revisited this thread and tested the code below:

program Project1;
{$APPTYPE CONSOLE}
{$R *.res}

uses
  System.SysUtils,
  System.Threading,
  System.Diagnostics;
var
  SW:TStopWatch;
type
  TThreadPoolStatsHelper = record helper for TThreadPoolStats
    function Formatted: string;
  end;

  function TThreadPoolStatsHelper.Formatted: string;
  begin
    Result := Format('Worker: %2d, Min: %2d, Max: %2d, Idle: %2d, Retired: %2d, Suspended: %2d, CPU(Avg): %3d, CPU: %3d',
      [self.WorkerThreadCount,
       self.MinLimitWorkerThreadCount, self.MaxLimitWorkerThreadCount,
       self.IdleWorkerThreadCount, self.RetiredWorkerThreadCount, self.ThreadSuspended,
       self.AverageCPUUsage, self.CurrentCPUUsage]);
  end;

  procedure Load;
  begin
    TParallel.For(0, 99999999, procedure(i: Integer)
    var
      T:Single;
    begin
      T:=Sin(i/PI);
    end);
  end;

begin
  try
    Writeln('PPL Test ---------------');
    Writeln('Before: '+ TThreadPoolStats.Current.Formatted);
    SW:=TStopWatch.StartNew;
    Load;
    Writeln('Finished in '+SW.Elapsed.ToString);
    Sleep(1000);
    Writeln('After: '+TThreadPoolStats.Current.Formatted);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  ReadLn;
end.

This is the output:

32-bits

PPL Test ---------------
Before: Worker: 0, Min: 8, Max: 200, Idle: 0, Retired: 0, Suspended: 0, CPU(Avg): 0, CPU: 0
Finished in 00:00:00.7620933
After: Worker: 8, Min: 8, Max: 200, Idle: 7, Retired: 0, Suspended: 0, CPU(Avg): 8, CPU: 15

64-bits

PPL Test ---------------
Before: Worker: 0, Min: 8, Max: 200, Idle: 0, Retired: 0, Suspended: 0, CPU(Avg): 0, CPU: 0
Finished in 00:00:14.0655228
After: Worker: 8, Min: 8, Max: 200, Idle: 7, Retired: 0, Suspended: 0, CPU(Avg): 85, CPU: 1

Can anyone explain the huge difference in times? (It was consistent over many runs.)

pyscripter · May 5, 2020

Oh I get it. sin is highly optimized in 32-bits but apparently not in 64-bits.

Vandrovnik · May 5, 2020

Maybe https://github.com/neslib/FastMath and/or http://docwiki.embarcadero.com/RADStudio/Rio/en/Floating_point_precision_control_(Delphi_for_x64) can help.

David Heffernan · May 5, 2020

The issue is that x64 trig functions are very slow for very large values. Nobody actually wants to know sin for 99999999/pi radians. Put in sensible values for the argument to sin and it looks more reasonable. For instance try using

T:=Sin(i/99999999);
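
To check the claim about argument size separately from the PPL, here is a minimal single-threaded sketch. It is not from the thread: the program name SinArgRange, the helper TimeSin and the reduced iteration count N are made up for illustration. It times the same number of Sin calls once with large arguments (i/Pi) and once with arguments kept in the range 0..1. Build it for both Win32 and Win64 and compare the two lines; no timings are shown here because they depend on the machine and compiler version.

```pascal
program SinArgRange;
{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Diagnostics;

const
  // Hypothetical iteration count, smaller than in the original test
  // so that the slow case still finishes quickly.
  N = 9999999;

// Times N+1 calls to Sin(i / Divisor) and returns the elapsed milliseconds.
function TimeSin(Divisor: Double): Int64;
var
  SW: TStopwatch;
  i: Integer;
  Acc: Double;
begin
  Acc := 0;
  SW := TStopwatch.StartNew;
  for i := 0 to N do
    Acc := Acc + Sin(i / Divisor);
  Result := SW.ElapsedMilliseconds;
  // Reference the accumulated value so the loop result is actually used.
  if Acc = MaxInt then
    Writeln(Acc);
end;

begin
  // Arguments grow up to roughly N/Pi radians, as in the original test.
  Writeln('Large arguments (i/Pi): ', TimeSin(Pi), ' ms');
  // Arguments stay within 0..1 radians, as in the suggested change.
  Writeln('Small arguments (i/N):  ', TimeSin(N), ' ms');
  ReadLn;
end.
```

If the explanation above holds, the gap between the two lines should be much larger in the 64-bit build than in the 32-bit build.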

David Heffernan · May 5, 2020

7 hours ago, pyscripter said:

Oh I get it. sin is highly optimized in 32-bits but apparently not in 64-bits.

No. That's wrong. In fact, sin is quicker under x64 than under x86, even though sin (and other trig) is implemented in hardware in the x87 unit and in Pascal in x64 (because the SSE2 unit does not have built-in trig).

pyscripter · May 5, 2020

8 minutes ago, David Heffernan said:

The issue is that x64 trig functions are very slow for very large values. Nobody actually wants to know sin for 99999999/pi radians. Put in sensible values for the argument to sin and it looks more reasonable. For instance try using

T:=Sin(i/99999999);

Yes you are right...
