In case you have not heard, Python 3.12 made a breakthrough towards running threads in parallel: each subinterpreter can now have its own GIL (global interpreter lock). For now, the feature is only available through the C-API, and there are limitations. Still, the performance gain from using multiple cores makes it very desirable. P4D makes the feature easily accessible by adding a new execution mode to TPythonThread, and a new demo (Demo 36) shows how to run threads in parallel. If you are wondering about the performance boost, see the results below:
Classic Subinterpreters:
prime count 78498
prime count 78498
prime count 78498
prime count 78498
prime count 78498
Elapsed ms: 13695
Subinterpreters with own GIL:
prime count 78498
prime count 78498
prime count 78498
prime count 78498
prime count 78498
Elapsed ms: 3482
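The demo's script is not reproduced in this post, but the reported count matches the number of primes below one million (78498), which suggests each thread runs a CPU-bound prime count of that kind. As a rough illustration only, here is a minimal Python sketch of such a workload; the function name count_primes and the sieve approach are assumptions, not the code Demo 36 actually uses:

```python
import math


def count_primes(limit: int) -> int:
    """Count the primes strictly below `limit` (hypothetical helper)."""
    if limit < 2:
        return 0
    # One flag per number; 1 means "still assumed prime".
    is_prime = bytearray([1]) * limit
    is_prime[0] = is_prime[1] = 0
    for n in range(2, math.isqrt(limit - 1) + 1):
        if is_prime[n]:
            # Clear every multiple of n, starting at n * n.
            multiples = len(range(n * n, limit, n))
            is_prime[n * n :: n] = bytes(multiples)
    return sum(is_prime)


if __name__ == "__main__":
    print("prime count", count_primes(1_000_000))  # prime count 78498
```

Pure-Python loops like this hold the GIL for their whole run, which is why threads sharing one GIL gain little from extra cores while interpreters with their own GIL can run side by side.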
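In this run, the own-GIL mode finished in about a quarter of the time (3482 ms versus 13695 ms, roughly a 3.9x speedup). You can find more details in this P4D announcement.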