pyscripter

Parallel for 32 vs 64 bits


I revisited this thread and tested the code below:

 

program Project1;
{$APPTYPE CONSOLE}
{$R *.res}

uses
  System.SysUtils,
  System.Threading,
  System.Diagnostics;
var
  SW:TStopWatch;
type
  TThreadPoolStatsHelper = record helper for TThreadPoolStats
    function Formatted: string;
  end;

  function TThreadPoolStatsHelper.Formatted: string;
  begin
    Result := Format('Worker: %2d, Min: %2d, Max: %2d, Idle: %2d, Retired: %2d, Suspended: %2d, CPU(Avg): %3d, CPU: %3d',
      [self.WorkerThreadCount,
       self.MinLimitWorkerThreadCount, self.MaxLimitWorkerThreadCount,
       self.IdleWorkerThreadCount, self.RetiredWorkerThreadCount, self.ThreadSuspended,
       self.AverageCPUUsage, self.CurrentCPUUsage]);
  end;

  procedure Load;
  begin
    TParallel.For(0, 99999999, procedure(i: Integer)
    var
      T:Single;
    begin
      T:=Sin(i/PI);
    end);
  end;

begin
  try
    Writeln('PPL Test ---------------');
    Writeln('Before: '+ TThreadPoolStats.Current.Formatted);
    SW:=TStopWatch.StartNew;
    Load;
    Writeln('Finished in '+SW.Elapsed.ToString);
    Sleep(1000);
    Writeln('After: '+TThreadPoolStats.Current.Formatted);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  ReadLn;
end.

This is the output:

 

32-bits

 

PPL Test ---------------
Before: Worker:  0, Min:  8, Max: 200, Idle:  0, Retired:  0, Suspended:  0, CPU(Avg):   0, CPU:   0
Finished in 00:00:00.7620933
After: Worker:  8, Min:  8, Max: 200, Idle:  7, Retired:  0, Suspended:  0, CPU(Avg):   8, CPU:  15

 

64-bits

PPL Test ---------------
Before: Worker:  0, Min:  8, Max: 200, Idle:  0, Retired:  0, Suspended:  0, CPU(Avg):   0, CPU:   0
Finished in 00:00:14.0655228
After: Worker:  8, Min:  8, Max: 200, Idle:  7, Retired:  0, Suspended:  0, CPU(Avg):  85, CPU:   1

 

Can anyone explain the huge difference in times? (it was consistent over many runs).

 


Oh, I get it. Sin is highly optimized in 32-bit but apparently not in 64-bit.


The issue is that x64 trig functions are very slow for very large values. Nobody actually wants to know sin for 99999999/pi radians. Put in sensible values for the argument to sin and it looks more reasonable. For instance, try using:

 

T:=Sin(i/99999999);
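
To see the effect of the argument range on its own, without the thread pool involved, a single-threaded timing along these lines could be used. This is only a sketch: the program name `SinRangeTiming`, the `TimeSin` helper and the `Stride` constant are made up for illustration, and the stride is there just to keep the run short while still covering the same range of arguments as the original loop.

```pascal
program SinRangeTiming;
{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Diagnostics;

const
  MaxIndex = 99999999; // same upper bound as the TParallel.For call above
  Stride = 10;         // hypothetical: sample every 10th index to shorten the run

// Sums Sin(I / Divisor) over the sampled indices and prints the elapsed time.
// The sum is printed so that the Sin calls have a visible result.
procedure TimeSin(const Caption: string; Divisor: Double);
var
  SW: TStopwatch;
  I: Integer;
  Sum: Double;
begin
  Sum := 0;
  I := 0;
  SW := TStopwatch.StartNew;
  while I <= MaxIndex do
  begin
    Sum := Sum + Sin(I / Divisor);
    Inc(I, Stride);
  end;
  SW.Stop;
  Writeln(Format('%-12s %6d ms (sum = %g)', [Caption, SW.ElapsedMilliseconds, Sum]));
end;

begin
  TimeSin('i/Pi', Pi);             // arguments up to roughly 3.2e7 radians
  TimeSin('i/99999999', MaxIndex); // arguments in the range 0..1
  Readln;
end.
```

Building this once for Win32 and once for Win64 and comparing the two lines in each build should show whether the gap comes from the size of the argument rather than from TParallel.For.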

7 hours ago, pyscripter said:

Oh, I get it. Sin is highly optimized in 32-bit but apparently not in 64-bit.

 

No, that's wrong. In fact sin is quicker under x64 than under x86, even though sin (and other trig) is implemented in hardware by the x87 unit on x86 and in Pascal on x64 (because the SSE2 unit does not have built-in trig).

8 minutes ago, David Heffernan said:

The issue is that x64 trig functions are very slow for very large values. Nobody actually wants to know sin for 99999999/pi radians. Put in sensible values for the argument to sin and it looks more reasonable. For instance, try using:

 

T:=Sin(i/99999999);

Yes you are right...
