Rolf Fankhauser 1 Posted March 22, 2021

Hi again,

I would like to use P4D in my application for Real Time Control (RTC) in a program for simulation of sewer systems. The program needs to calculate 500'000 time steps or more. In each time step a Python script should set some controls according to some discharges in the system. The rules are system-specific and the users should be able to define the rules for their system. A perfect application for a scripting language!

So, I created a small application to test the performance of P4D for RTC: a loop with 1 mio iterations.

The loop in Delphi:

    for i:=1 to i_end do
    begin
      input := Random * 10.0;
      if (input > 6.0) or (input < 2.0) then
        output := 40.0
      else
        output := 20.0
    end;

The Python script 'loop.py':

    if (inflow.Value > 6) or (inflow.Value < 2):
        outflow.Value = 40
    else:
        outflow.Value = 20

and the Delphi code to execute the script:

    script.LoadFromFile('loop.py');
    for i:=1 to i_end do
    begin
      PythonDelphiVar2.Value := Random * 10.0;
      PythonEngine1.ExecStrings(script);
    end;

The results for Delphi (with different methods to measure the time):

    Time by Now to run loop: 28 ms
    Time by GetTickCounts to run loop: 15 ms
    Time by StopWatch to run loop: 27664 us
    Time by StopWatch to run loop: 27664600 ns

and for P4D:

    loop started with python script, 1 mio iterations!
    Time by Now to run loop: 28830 ms
    Time by GetTickCounts to run loop: 28844 ms
    Time by StopWatch to run loop: 28828936 us
    Time by StopWatch to run loop: 28828936500 ns

P4D is around 1000 times slower than Delphi! Some ideas to improve the performance would be very welcome.

I made some tests with C++Builder running the Python script with the C API. Then Python is very fast: 200 ms for 1 mio iterations. But when I moved this test program from Win 7 to Win 10 it didn't work any more (no error message, the program closes silently when running the script).

At the moment I use Lua:

advantages: very fast (ca. 300 ms for 1 mio iterations), no use of external DLLs, everything is included in the main C++Builder application

disadvantages: C API rather complicated, not as many modules (but growing), no support for Delphi (maybe there is a component...)

Regards, Rolf
David Heffernan 2345 Posted March 22, 2021

Put the loop into the Python script. It's the transition from Delphi to Python and back again every iteration that is killing the performance.
Rolf Fankhauser 1 Posted March 22, 2021

But the main program needs the value for each iteration, that's the real time control! The Delphi program transmits the input to the python script. The python script calculates the output according to the rule and gives it back to Delphi. This must happen in each iteration (time step).
David Heffernan 2345 Posted March 22, 2021

1 minute ago, Rolf Fankhauser said:
    But the main program needs the value for each iteration, that's the real time control! The Delphi program transmits the input to the python script. The python script calculates the output according to the rule and gives it back to Delphi. This must happen in each iteration (time step).

That's going to be pretty slow then. I'm not sure you are going to be able to combat that. I find it a little hard to believe that your C++ experiment with 200 ms for 1 mio iterations had the loop in C++.

C API for Python of course works fine under Windows 10, so your silent crash is going to be solvable.
pyscripter 689 Posted March 22, 2021 (edited)

Your loop calls ExecStrings, which means you have the overhead of string compilation in every iteration. What you can do instead is to wrap your Python code in a function:

    def calc_outflow(inflow):
        if (inflow > 6) or (inflow < 2):
            return 40
        else:
            return 20

and call the function from Delphi:

    uses VarPyth;

    script.LoadFromFile('loop.py');
    PythonEngine1.ExecStrings(script);
    var CalcOutFlow: Variant := MainModule.calc_outflow;
    for i:=1 to i_end do
    begin
      Output := CalcOutFlow(Random * 10.0);
    end;

You will get extra speed if instead of

    Output := CalcOutFlow(Random * 10.0);

you call the function using low-level PythonEngine routines, e.g. PythonEngine.PyObject_CallObject.

Edited March 22, 2021 by pyscripter
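To see the compilation overhead in isolation, here is a minimal pure-Python sketch. It is not P4D code: `exec` on the source text in every step only stands in for calling ExecStrings in a loop, and the names `RULE_SOURCE`, `FUNC_SOURCE`, `run_recompiled` and `run_function` are made up for this illustration. Absolute timings will differ from the Delphi measurements, since the Delphi-to-Python transition is not part of this sketch.

```python
import random
import timeit

# Script text that is compiled again on every step (stand-in for ExecStrings in a loop).
RULE_SOURCE = """
if (inflow > 6) or (inflow < 2):
    outflow = 40
else:
    outflow = 20
"""

# The same rule wrapped in a function that is compiled only once.
FUNC_SOURCE = """
def calc_outflow(inflow):
    if (inflow > 6) or (inflow < 2):
        return 40
    return 20
"""


def run_recompiled(inflows):
    """Execute the script text once per value, so it is recompiled each time."""
    namespace = {}
    results = []
    for value in inflows:
        namespace["inflow"] = value
        exec(RULE_SOURCE, namespace)
        results.append(namespace["outflow"])
    return results


def run_function(inflows):
    """Compile the function definition once, then only call it."""
    namespace = {}
    exec(FUNC_SOURCE, namespace)
    calc_outflow = namespace["calc_outflow"]
    return [calc_outflow(value) for value in inflows]


if __name__ == "__main__":
    data = [random.random() * 10.0 for _ in range(10_000)]
    t_script = timeit.timeit(lambda: run_recompiled(data), number=5)
    t_func = timeit.timeit(lambda: run_function(data), number=5)
    print(f"script text per step: {t_script:.3f} s")
    print(f"function call per step: {t_func:.3f} s")
```

Both variants return the same outputs; only the amount of work per step differs.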
Rolf Fankhauser 1 Posted March 22, 2021

Thanks for the hint! This was my next idea but I didn't yet know how to call the function. I already noticed this when I used the C API. Then I also changed from the script to a function and could improve the speed. For Lua I got an improvement of a factor 10 using a function.
Rolf Fankhauser 1 Posted March 23, 2021

@pyscripter The compiler didn't accept the following line proposed by you:

    var CalcOutFlow: Variant := MainModule.calc_outflow;

So, I split it:

    CalcOutFlow: Variant;   (in var block)

and

    CalcOutFlow := MainModule.calc_outflow;

But then I get an error on the line:

    Output := CalcOutFlow(Random * 10.0);

(the compiler does not understand the bracket)
pyscripter 689 Posted March 23, 2021

Does

    Output := MainModule.calc_outflow(Random * 10.0);

work?
Rolf Fankhauser 1 Posted March 23, 2021

Yes !! => now 1 mio iterations in 1800 ms, not bad.

I will try the low-level functions of PythonEngine. It seems that they are wrappers of the C-API functions?

The next step would be to use P4D in C++Builder, because my application for sewer system simulation is written in C/C++. I think there is a tutorial from David I for installation and use?

Thanks for your prompt help!
pyscripter 689 Posted March 23, 2021

6 hours ago, Rolf Fankhauser said:
    I will try the low level functions of PythonEngine. It seems that they are wrapper of the C-API functions?

Not wrappers but the exported C-API functions of the Python DLL. You can get the PPyObject corresponding to the Python function by using ExtractPythonObjectFrom(MainModule.calc_outflow).

6 hours ago, Rolf Fankhauser said:
    I think there is a tutorial from David I for installation and use?

Yes, but bear in mind that the package structure has been modified since the tutorial video was produced.
Rolf Fankhauser 1 Posted March 23, 2021

I used the following code (not complete) to call the function in C++Builder:

    PyObject *pName, *pModule, *pDict, *pFunc;
    PyObject *pArgs, *pValue, *pType, *pTraceback;
    int i;
    ...
    pArgs = PyTuple_New(1);
    ...
    for (int j = 0; j < j_end; j++) {
        i = random(10);
        pValue = PyInt_FromLong(i);
        PyTuple_SetItem(pArgs, 0, pValue);
        pValue = PyObject_CallObject(pFunc, pArgs);
    }

I guess that PPyObject (Delphi) is equivalent to PyObject (C).

With ExtractPythonObjectFrom(MainModule.calc_outflow) I would define pFunc, right? I will try it...
pyscripter 689 Posted March 23, 2021

1 hour ago, Rolf Fankhauser said:
    I guess that PPyObject (Delphi) is equivalent to PyObject (C)

PyObject *
Rolf Fankhauser 1 Posted March 24, 2021

Ok, 1 mio iterations in 325 ms (average). That's close to the C-API version and 10 times slower than compiled. That is usable.

That's the code:

    procedure TForm1.btRunScriptedClick(Sender: TObject);
    var
      i, i_end: integer;
      start, stop: TDateTime;
      start2, stop2: cardinal;
      sw : TStopWatch;
      script: TStringList;
      PyFunc, PyValue, PyArgs: PPyObject;
    begin
      sw := TStopWatch.Create;
      script := TStringList.Create;
      Memo2.Lines.Append('');
      Memo2.Lines.Append('loop started with python script, 1 mio iterations!');
      try
        i_end := 1000000;
        start := Now;
        start2 := GetTickCount;
        sw.Start;
        script.LoadFromFile('loop_function.py');
        PythonEngine1.ExecStrings(script);
        PyFunc := ExtractPythonObjectFrom(MainModule.calc_outflow);
        PyArgs := PythonEngine1.PyTuple_New(1);
        for i:=1 to i_end do
        begin
          Input := Random * 10.0;
          //PyValue := PythonEngine1.PyLong_FromLong(Input); integer version
          PyValue := PythonEngine1.PyFloat_FromDouble(Input);
          PythonEngine1.PyTuple_SetItem(PyArgs, 0, PyValue);
          PyValue := PythonEngine1.PyObject_CallObject(PyFunc, PyArgs);
          //Output := PythonEngine1.PyLong_AsLong(PyValue); integer version
          Output := PythonEngine1.PyFloat_AsDouble(PyValue);
          //Memo2.Lines.Append('Input: ' + IntToStr(Input) + ', Output: ' + IntToStr(Output));
        end;
        stop := Now;
        stop2 := GetTickCount;
        sw.Stop;
        if sw.IsHighResolution = true then
          Memo2.Lines.Append('Is high resolution!')
        else
          Memo2.Lines.Append('Is not high resolution!');
        Memo2.Lines.Append('');
        Memo2.Lines.Append('Time by Now to run loop: ' + FloatToStr(RoundTo((stop - start)*24*3600*1000, -3)) + ' ms');
        Memo2.Lines.Append('Time by GetTickCounts to run loop: ' + IntToStr(stop2 - start2) + ' ms');
        Memo2.Lines.Append('Time by StopWatch to run loop: ' + IntToStr(sw.ElapsedMicroseconds) + ' us');
        Memo2.Lines.Append('Time by StopWatch to run loop: ' + IntToStr(sw.ElapsedNanoseconds) + ' ns');
      finally
        {
        with PythonEngine1 do
        begin
          if Assigned(PyValue) then Py_DECREF(PyValue);
          if Assigned(PyArgs) then Py_DECREF(PyArgs);
          if Assigned(PyFunc) then Py_DECREF(PyFunc);
        end;
        }
        sw.Free;
        script.Free;
      end;
    end;

I have some problems with dereferencing the PPyObjects. Therefore I removed them. I got an AV when I run the loop multiple times with dereferencing. But I don't understand why. Without dereferencing I had no problems to run the loop multiple times.
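For orientation, the timings reported so far in this thread can be put side by side. The short script below only does the arithmetic on the figures quoted in the posts (28 ms, 28830 ms, 1800 ms and 325 ms for 1 mio iterations); the labels and the helper names `per_iteration_us` and `slowdown` are chosen for this illustration.

```python
ITERATIONS = 1_000_000

# Total run times in milliseconds, as reported in the posts above.
timings_ms = {
    "native Delphi loop": 28,
    "ExecStrings in every iteration": 28830,
    "MainModule.calc_outflow via VarPyth": 1800,
    "PyObject_CallObject": 325,
}


def per_iteration_us(total_ms, iterations=ITERATIONS):
    """Average cost of one iteration in microseconds."""
    return total_ms * 1000.0 / iterations


def slowdown(total_ms, baseline_ms):
    """How many times slower a run is than the baseline."""
    return total_ms / baseline_ms


if __name__ == "__main__":
    baseline = timings_ms["native Delphi loop"]
    for label, total in timings_ms.items():
        print(f"{label}: {per_iteration_us(total):.3f} us per iteration, "
              f"{slowdown(total, baseline):.1f} x native")
```

The low-level call comes out at roughly a third of a microsecond per time step, which matches the "about 10 times slower than compiled" estimate.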
pyscripter 689 Posted March 24, 2021 (edited)

You were decrementing the reference count of PyFunc (Py_DECREF) without increasing it first. Reference counting is tricky in Python. See Reference Counting in Python (tripod.com). This is why it is safer to use VarPyth, which takes care of ref counting, even if it is slower.

The code below has not been tested:

    var PyFunc := ExtractPythonObjectFrom(MainModule.calc_outflow); // borrowed reference
    var PE := GetPythonEngine;
    PE.Py_INCREF(PyFunc); // can skip since it is protected inside the temp variant MainModule.calc_outflow but is safer to inc it.
    var PyArgs := PE.PyTuple_New(1); // Will need to be deref
    for i:=1 to i_end do
    begin
      Input := Random * 10.0;
      var PyInput := PE.PythonEngine1.PyFloat_FromDouble(Input);
      PE.PyTuple_SetItem(PyArgs, 0, PyInput); // PyInput is now the responsibility of the tuple
      var PyValue := PE.PyObject_CallObject(PyFunc, PyArgs); // the result of the function is your responsibility
      Output := PE.PyFloat_AsDouble(PyValue);
      PE.Py_DECREF(PyValue);
    end;
    PE.Py_DECREF(PyArgs);
    PE.Py_DECREF(PyFunc); // to match the increment

Edited March 24, 2021 by pyscripter
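The idea of a container owning a reference to its items can be observed from Python itself. The sketch below is only a Python-level analogy and does not call the C API: it uses `sys.getrefcount` to show that putting an object into a tuple adds one reference and that releasing the tuple gives it back. The function name `tuple_reference_deltas` is made up for this example, and the exact absolute counts depend on the CPython version, so only differences are compared.

```python
import sys


def tuple_reference_deltas():
    """Return how the reference count of an object changes while a tuple holds it."""
    value = object()                      # fresh object, referenced only by this local
    base = sys.getrefcount(value)

    holder = (value,)                     # the tuple now keeps its own reference
    with_tuple = sys.getrefcount(value)

    del holder                            # destroying the tuple releases that reference
    after_release = sys.getrefcount(value)

    return with_tuple - base, after_release - base


if __name__ == "__main__":
    gained, remaining = tuple_reference_deltas()
    print("references gained while in the tuple:", gained)
    print("difference after the tuple is gone:", remaining)
```

In the C API the bookkeeping that Python does here automatically is the caller's job, which is why every owned reference needs exactly one matching Py_DECREF.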
Rolf Fankhauser 1 Posted March 24, 2021

Thanks a lot !! This reference counting is annoying !! Thanks for the link, I will study the article. I had assumed that ExtractPythonObjectFrom increments the reference count.

I corrected my code with your above suggestions. You forgot to remove PythonEngine1:

    var PyInput := PE.PythonEngine1.PyFloat_FromDouble(Input);

Performance didn't change.
darnocian 86 Posted May 11, 2021

One of the biggest issues with Python, not necessarily python4d, is that everything is boxed. Many analytical packages build on packages like numpy/scipy, which have some very powerful underlying C code that provides the boost without needing the boxing and unboxing of values; the Python code just provides very flexible glue.

    import numpy

    def test(size):
        x = numpy.random.rand(size) * 10
        return numpy.logical_or(x > 6, x < 2).astype(int) * 20 + 20

    test(500000)

To get the above to work, you will need to install numpy if you don't already have it:

    pip install numpy

It may be interesting to see if this approach provides any improvements. In the above, I could see there being a cost with the way it still has to evaluate the logical expression with a callback.
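The expression `.astype(int) * 20 + 20` in the numpy version replaces the if/else by arithmetic: the condition becomes 1 or 0, so the result is 40 or 20. A small pure-Python check of that mapping (without numpy; the function names `rule_branch` and `rule_arithmetic` are invented for this illustration) could look like this:

```python
def rule_branch(inflow):
    """The control rule written with an explicit branch."""
    if inflow > 6 or inflow < 2:
        return 40
    return 20


def rule_arithmetic(inflow):
    """The same rule as arithmetic: the condition converts to 1 or 0."""
    return int(inflow > 6 or inflow < 2) * 20 + 20


if __name__ == "__main__":
    samples = [i / 10 for i in range(0, 101)]   # 0.0, 0.1, ..., 10.0
    mismatches = [x for x in samples if rule_branch(x) != rule_arithmetic(x)]
    print("mismatching inputs:", mismatches)
```

This only confirms that both formulations give the same outputs; it says nothing about the speed of the vectorised numpy call.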