Rolf Fankhauser

Poor performance of Python script


Hi again,

 

I would like to use P4D in my application for Real Time Control (RTC) in a program for simulation of sewer systems. The program needs to calculate 500'000 time steps or more. In each time step a Python script should set some controls according to some discharges in the system. The rules are system-specific and the users should be able to define the rules for their system. A perfect application for a scripting language!

So, I created a small application to test the performance of P4D for RTC: a loop with 1 mio iterations.

The loop in Delphi:

for i:=1 to i_end do
       begin
        input := Random * 10.0;
        if (input > 6.0) or (input < 2.0)
        then
          output := 40.0
        else
          output := 20.0
       end;

The Python script 'loop.py':

if (inflow.Value > 6) or (inflow.Value < 2) :
  outflow.Value = 40
else:
  outflow.Value = 20

 

and the Delphi code to execute the script:

      script.LoadFromFile('loop.py');

      for i:=1 to i_end do
       begin
        PythonDelphiVar2.Value := Random * 10.0;
        PythonEngine1.ExecStrings(script);
       end;

 

The results for Delphi (with different methods to measure the time):

 

Time by Now to run loop:           28 ms
Time by GetTickCounts to run loop: 15 ms
Time by StopWatch to run loop:     27664 us
Time by StopWatch to run loop:     27664600 ns

 

and for P4D:

loop started with python script, 1 mio iterations!

 

Time by Now to run loop:           28830 ms
Time by GetTickCounts to run loop: 28844 ms
Time by StopWatch to run loop:     28828936 us
Time by StopWatch to run loop:     28828936500 ns

 

P4D is around 1000 times slower than Delphi! Some ideas to improve the performance would be very welcome.

 

I made some tests with C++Builder running the Python script with the C API. Then Python is very fast: 200 ms for 1 mio iterations. But when I moved this test program from Win 7 to Win 10 it didn't work any more (no error message, silent close of the program when running the script)

At the moment I use Lua:

   advantages: very fast (ca. 300 ms for 1 mio iterations), no external DLLs, everything is included in the main C++Builder application

   disadvantages: the C API is rather complicated, fewer modules are available (though the number is growing), no support for Delphi (maybe there is a component...)

 

Regards, Rolf

   

 

 


Put the loop into the Python script. It's the transition from Delphi to Python and back again every iteration that is killing the performance.
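
A minimal sketch of what moving the loop into Python could look like, assuming the inflow values for all time steps are available up front (which, as the replies below point out, is not the case for real-time control). The function `run_steps` and its list-based interface are hypothetical and not part of P4D; the point is only that the host crosses into Python once instead of once per step.

```python
import random


def calc_outflow(inflow):
    """Rule from loop.py: 40 outside the band 2..6, otherwise 20."""
    if inflow > 6 or inflow < 2:
        return 40
    return 20


def run_steps(inflows):
    """Apply the rule to every time step inside Python, in a single call."""
    return [calc_outflow(q) for q in inflows]


if __name__ == "__main__":
    # Stand-in for the 1 mio random inputs generated on the Delphi side.
    inflows = [random.random() * 10.0 for _ in range(1_000_000)]
    outflows = run_steps(inflows)
    print(len(outflows), "steps evaluated in one call")
```
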


But the main program needs the value for each iteration, that's the real time control!

The Delphi program transmits the input to the python script. The python script calculates the output according to the rule and gives it back to Delphi. This must happen in each iteration (time step).

1 minute ago, Rolf Fankhauser said:

But the main program needs the value for each iteration, that's the real time control!

The Delphi program transmits the input to the python script. The python script calculates the output according to the rule and gives it back to Delphi. This must happen in each iteration (time step).

That's going to be pretty slow then. I'm not sure you are going to be able to combat that. I find it a little hard to believe that your C++ experiment with 200ms for 1mio iterations had the loop in C++. C API for Python of course works fine under Windows 10, so your silent crash is going to be solvable.

Posted (edited)

Your loop calls ExecStrings, which means you pay the overhead of compiling the script string in every iteration.

 

What you can do instead is wrap your Python code in a function:

 

def calc_outflow(inflow):
  if (inflow > 6) or (inflow < 2) :
    return 40
  else:
    return 20

and call the function from Delphi.

 

Uses 
  VarPyth;
 
      script.LoadFromFile('loop.py');
      PythonEngine1.ExecStrings(script);
      var CalcOutFlow: Variant := MainModule.calc_outflow;

      for i:=1 to i_end do
       begin
         Output := CalcOutFlow(Random * 10.0);        
       end;

 

You will get extra speed if instead of 

 

Output := CalcOutFlow(Random * 10.0);

 

you call the function using low-level PythonEngine routines, e.g. PythonEngine.PyObject_CallObject.
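
The compile overhead can also be seen from plain Python. The sketch below is only a rough analogy: it assumes that executing the script text on every iteration behaves like calling `exec` on a source string each time, and it compares that with defining the function once and then calling it. The names `run_script_each_time`, `SCRIPT` and `FUNC_SOURCE` are made up for this illustration.

```python
import timeit

SCRIPT = """
if inflow > 6 or inflow < 2:
    outflow = 40
else:
    outflow = 20
"""

FUNC_SOURCE = """
def calc_outflow(inflow):
    if inflow > 6 or inflow < 2:
        return 40
    return 20
"""


def run_script_each_time(inflow):
    """Execute the source text; Python compiles the string on every call."""
    ns = {"inflow": inflow}
    exec(SCRIPT, ns)
    return ns["outflow"]


# Compile and run the function definition once, then keep the callable.
_ns = {}
exec(compile(FUNC_SOURCE, "loop_function.py", "exec"), _ns)
calc_outflow = _ns["calc_outflow"]

if __name__ == "__main__":
    n = 20_000
    t_script = timeit.timeit(lambda: run_script_each_time(7.5), number=n)
    t_func = timeit.timeit(lambda: calc_outflow(7.5), number=n)
    print(f"exec source each time : {t_script:.3f} s for {n} calls")
    print(f"call compiled function: {t_func:.3f} s for {n} calls")
```

The absolute numbers depend on the machine and the Python version, so none are quoted here.
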

Edited by pyscripter


Thanks for the hint!

 

This was my next idea but I didn't yet know how to call the function.

I already noticed this when I used the C API. Then I also changed from the script to a function and could improve the speed.

For Lua I got an improvement by a factor of 10 using a function.

 


@pyscripter

The compiler didn't accept the following line proposed by you:

 

var CalcOutFlow: Variant := MainModule.calc_outflow;

So, I split it:

CalcOutFlow: Variant; (in var block)

and

CalcOutFlow := MainModule.calc_outflow;

 

But then I get an error on line:    Output := CalcOutFlow(Random * 10.0);  (the compiler does not understand the bracket)

 


Does

Output := MainModule.calc_outflow(Random * 10.0);

work?


Yes !! => now 1 mio iterations in 1800ms, not bad.

I will try the low-level functions of PythonEngine. It seems that they are wrappers of the C-API functions?

Next step would be to use P4D in C++Builder because my application for sewer system simulation is written in C/C++

I think there is a tutorial from David I for installation and use?

 

Thanks for your prompt help!

6 hours ago, Rolf Fankhauser said:

I will try the low-level functions of PythonEngine. It seems that they are wrappers of the C-API functions?

Not wrappers but the exported C-API functions of the Python DLL. You can get the PPyObject corresponding to the Python function by using ExtractPythonObjectFrom(MainModule.calc_outflow).
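
To illustrate that these are plain exported functions of the Python library, here is a small sketch that calls the same `PyObject_CallObject` entry point from Python itself through `ctypes.pythonapi`. It assumes CPython; the helper name `call_via_c_api` is made up for this example and has nothing to do with P4D.

```python
import ctypes


def calc_outflow(inflow):
    if inflow > 6 or inflow < 2:
        return 40.0
    return 20.0


# ctypes.pythonapi gives access to the functions exported by the running
# Python library (CPython only).
call_object = ctypes.pythonapi.PyObject_CallObject
call_object.argtypes = [ctypes.py_object, ctypes.py_object]
# With py_object as the result type, ctypes manages the returned reference.
call_object.restype = ctypes.py_object


def call_via_c_api(func, value):
    """Call func(value) through the exported PyObject_CallObject function."""
    args = (float(value),)  # the C function expects a tuple of arguments
    return call_object(func, args)


if __name__ == "__main__":
    print(call_via_c_api(calc_outflow, 7.5))
```
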

 

6 hours ago, Rolf Fankhauser said:

I think there is a tutorial from David I for installation and use?

 Yes but bear in mind that the package structure was modified since the tutorial video was produced.


I used the following code (not complete) to call the function in C++Builder:

 

PyObject *pName, *pModule, *pDict, *pFunc;
PyObject *pArgs, *pValue, *pType, *pTraceback;
int i;
...
pArgs = PyTuple_New(1);  
...  
for (int j = 0; j < j_end; j++) {
   i = random(10);
   pValue = PyInt_FromLong(i);
   PyTuple_SetItem(pArgs, 0, pValue);
   pValue = PyObject_CallObject(pFunc, pArgs);
}

I guess that PPyObject (Delphi) is equivalent to PyObject (C)

With ExtractPythonObjectFrom(MainModule.calc_outflow)  I would define pFunc, right?

I will try it...

1 hour ago, Rolf Fankhauser said:

I guess that PPyObject (Delphi) is equivalent to PyObject (C)

PPyObject is equivalent to PyObject * (a pointer to PyObject).


Ok, 1 mio iterations in 325 ms (average). That's close to the C-API version and roughly 10 times slower than the compiled Delphi loop. That is usable.

That's the code:

procedure TForm1.btRunScriptedClick(Sender: TObject);
var
   i, i_end: integer;
   start, stop: TDateTime;
   start2, stop2: cardinal;
   sw : TStopWatch;
   script: TStringList;
   PyFunc, PyValue, PyArgs: PPyObject;

begin
   sw := TStopWatch.Create;
   script := TStringList.Create;
   Memo2.Lines.Append('');
   Memo2.Lines.Append('loop started with python script, 1 mio iterations!');
   try
      i_end := 1000000;
      start := Now;
      start2 := GetTickCount;
      sw.Start;
      script.LoadFromFile('loop_function.py');
      PythonEngine1.ExecStrings(script);
      PyFunc := ExtractPythonObjectFrom(MainModule.calc_outflow);
      PyArgs := PythonEngine1.PyTuple_New(1);

      for i:=1 to i_end do
       begin
        Input := Random * 10.0;
        //PyValue := PythonEngine1.PyLong_FromLong(Input);  integer version
        PyValue := PythonEngine1.PyFloat_FromDouble(Input);
        PythonEngine1.PyTuple_SetItem(PyArgs, 0, PyValue);
        PyValue := PythonEngine1.PyObject_CallObject(PyFunc, PyArgs);
        //Output := PythonEngine1.PyLong_AsLong(PyValue); integer version
        Output := PythonEngine1.PyFloat_AsDouble(PyValue);
        //Memo2.Lines.Append('Input: ' + IntToStr(Input) + ', Output: ' + IntToStr(Output));
       end;
      stop := Now;
      stop2 := GetTickCount;
      sw.Stop;
      if sw.IsHighResolution = true then
       Memo2.Lines.Append('Is high resolution!')
      else
       Memo2.Lines.Append('Is not high resolution!');
      Memo2.Lines.Append('');
      Memo2.Lines.Append('Time by Now to run loop:           ' + FloatToStr(RoundTo((stop - start)*24*3600*1000, -3)) + ' ms');
      Memo2.Lines.Append('Time by GetTickCounts to run loop: ' + IntToStr(stop2 - start2) + ' ms');
      Memo2.Lines.Append('Time by StopWatch to run loop:     ' + IntToStr(sw.ElapsedMicroseconds) + ' us');
      Memo2.Lines.Append('Time by StopWatch to run loop:     ' + IntToStr(sw.ElapsedNanoseconds) + ' ns');
   finally
     { with PythonEngine1 do
      begin
       if Assigned(PyValue) then Py_DECREF(PyValue);
       if Assigned(PyArgs) then Py_DECREF(PyArgs);
       if Assigned(PyFunc) then Py_DECREF(PyFunc);
      end;  }
      sw.Free;
      script.Free;
   end;
end;

I have some problems with decrementing the reference counts of the PPyObjects (the commented-out Py_DECREF calls in the finally block), so I removed those calls. With them in place I got an access violation (AV) when I ran the loop multiple times, but I don't understand why.

Without dereferencing I had no problems to run the loop multiple times. 

 

Posted (edited)

You were decrementing the reference count of PyFunc without incrementing it first. Reference counting is tricky in Python. See Reference Counting in Python (tripod.com).

This is why it is safer to use VarPyth, which takes care of ref counting, even if it is slower.

 

The code below has not been tested:

var PyFunc:= ExtractPythonObjectFrom(MainModule.calc_outflow); // borrowed reference
var PE := GetPythonEngine;
PE.Py_INCREF(PyFunc);  // can skip since it is protected inside the temp variant MainModule.calc_outflow but is safer to inc it.
var PyArgs := PE.PyTuple_New(1);  // new reference, needs a Py_DECREF after the loop
for i:=1 to i_end do
begin
  Input := Random * 10.0;
  var PyInput := PE.PythonEngine1.PyFloat_FromDouble(Input);
  PE.PyTuple_SetItem(PyArgs, 0, PyInput);  //  PyInput is now the responsibility of the tuple
  var PyValue := PE.PyObject_CallObject(PyFunc, PyArgs);  // the result of the function is your responsibility
  Output := PE.PyFloat_AsDouble(PyValue);
  PE.Py_DECREF(PyValue); 
end;
PE.Py_DECREF(PyArgs);
PE.Py_DECREF(PyFunc); // to match the increment
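
The bookkeeping behind this can be watched from Python itself. The sketch below assumes CPython and uses `sys.getrefcount`; the function name is made up for the illustration. It does not reproduce the "stolen reference" behaviour of PyTuple_SetItem, it only shows that every owner accounts for exactly one count, which is why releasing a reference you never acquired leaves the count one too low.

```python
import sys


def refcount_delta_when_stored():
    """Return the refcount change while a list holds the object and after it is gone."""
    value = object()
    before = sys.getrefcount(value)
    holder = [value]                   # the list now owns one extra reference
    while_held = sys.getrefcount(value)
    del holder                         # destroying the list releases it again
    after_release = sys.getrefcount(value)
    return while_held - before, after_release - before


if __name__ == "__main__":
    print(refcount_delta_when_stored())
```
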

  

 

 

 

Edited by pyscripter


Thanks a lot !! This reference counting is annoying !! Thanks for the link, I will study the article.
I had assumed that ExtractPythonObjectFrom increments the reference count.

I corrected my code with your above suggestions. You forgot to remove PythonEngine1:

var PyInput := PE.PythonEngine1.PyFloat_FromDouble(Input);

Performance didn't change.
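
To put the timings reported in this thread side by side, here is a small calculation of the average cost per iteration and the slowdown relative to the native Delphi loop. The millisecond values are the ones posted above; the helper names are made up, and the figures include loop and Random overhead, so they are only rough.

```python
ITERATIONS = 1_000_000

# Wall-clock times in milliseconds for 1 mio iterations, as posted in the thread.
timings_ms = {
    "native Delphi loop": 28,
    "ExecStrings per iteration": 28830,
    "VarPyth function call": 1800,
    "PyObject_CallObject": 325,
}


def per_call_us(total_ms, iterations=ITERATIONS):
    """Average cost of one iteration in microseconds."""
    return total_ms * 1000.0 / iterations


def slowdown(total_ms, baseline_ms=timings_ms["native Delphi loop"]):
    """How many times slower than the native loop."""
    return total_ms / baseline_ms


if __name__ == "__main__":
    for name, ms in timings_ms.items():
        print(f"{name:28s} {per_call_us(ms):8.3f} us/call   x{slowdown(ms):.1f}")
```

At about 0.325 us per call, the low-level variant would add roughly 0.16 s to a simulation with 500'000 time steps.
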

 


One of the biggest issues with Python, not necessarily Python4Delphi, is that every value is boxed. Many analytical workflows therefore rely on packages such as numpy/scipy, whose underlying C code operates on whole arrays without boxing and unboxing each value; the Python code then only provides very flexible glue.

import numpy

def test(size):
     x = numpy.random.rand(size) * 10
     return numpy.logical_or(x > 6, x < 2).astype(int) * 20 + 20

test(500000)

To get the above to work, you will need to install numpy if you don't already have it:

pip install numpy

It may be interesting to see if this approach provides any improvements? In the above, I could see there being a cost with the way it still has to evaluate the logical expression with a callback.

 

