"How does Delphi (11.3 in particular) handle this with the parallel for?" Hopefully not at all. It is not its job.
The CPU cores are a shared resource; other applications and services use them too. So it is up to the operating system to measure the load on all cores over time and to make scheduling decisions based on that. The only factors (aside from the affinity mask) that influence the scheduler's decisions are the priority of the process and the priorities of its threads.
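If the goal is just to keep a parallel loop from getting in the way of other programs, priority is the knob to turn. Below is a minimal sketch, assuming a Windows console application and the stock System.Threading library; the program name, the loop body and the choice of BELOW_NORMAL_PRIORITY_CLASS are only illustrative, not something the original question prescribes.

```delphi
program LowPriorityParallelFor; // hypothetical demo program

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils, System.SyncObjs, System.Threading;

var
  Total: Int64;
begin
  // Tell the scheduler to favour other processes over this one.
  // The priority class applies to every thread of the process,
  // including the thread pool workers used by TParallel.For.
  if not SetPriorityClass(GetCurrentProcess, BELOW_NORMAL_PRIORITY_CLASS) then
    RaiseLastOSError;

  Total := 0;

  // No affinity mask and no load measurement here:
  // the OS decides which cores run the iterations.
  TParallel.For(1, 1000000,
    procedure(I: Integer)
    begin
      // Placeholder workload: thread-safe accumulation of the loop index.
      TInterlocked.Add(Total, I);
    end);

  Writeln(Total);
end.
```

The loop itself stays untouched; only the process priority changes, so the scheduler remains free to place the work wherever it sees fit.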
As for affinity masks, they are probably not the right tool either once CPUs with 32 or 64 cores (or more) become more widespread in the near future, because of processor groups: a mask only covers the logical processors of a single group, which holds at most 64 of them.
https://docs.microsoft.com/en-us/windows/win32/procthread/processor-groups
How do these groups handle different kinds of cores?
In my opinion, trying to measure CPU load from inside an application and second-guess the scheduler is not the right approach.