Jud

Can't get but 40% CPU usage multi-tasking


I write some programs in Delphi that make heavy use of the CPU for a long time.  I used to use Windows 7, 8, and 10 on Sandy Bridge, Ivy Bridge, and Haswell i7 CPUs, with four hyperthreaded cores.  For multitasking, I could get 98% CPU usage.

 

Now I'm using Windows 11 on 12th-14th generation i7s, with eight hyperthreaded P-cores.  But I cannot get more than about 40% CPU usage with 16 threads running.  I look at Task Manager and Process Monitor, and see the under-utilization of the CPU. I have the power setting set for high performance.

 

What caused this loss in performance - Windows or the new CPUs?  And is there a way to utilize most of the CPU?  

 


Is it a laptop? Maybe the cores get too hot and so get throttled...


First of all, the CPU % in Task Manager covers all logical processors (cores with and without hyperthreading). If you use only some cores (and that decision is made by the ITD), the CPU usage never goes to 100%.

 

I can assure you that you can reach the full power of your processor. In my applications I stay between 50% and 90% CPU usage using all available cores, but I use more than 30 TThreads in my apps.

1 hour ago, Olli73 said:

Maybe the cores get too hot and so get throttled...

Throttling makes the CPU run slower, not run less.

 

2 hours ago, Jud said:

What caused this loss in performance

You can't measure performance by looking at the task manager. Measure the amount of work being done instead.

8 minutes ago, Anders Melander said:

Throttling makes the CPU run slower, not run less.

 

OK, but Copilot tells me:
 

Quote

Yes, when a CPU overheats, individual cores may temporarily shut down to prevent damage. Modern CPUs have thermal protection mechanisms in place to take action when specific temperature thresholds are exceeded. This can include shutting down individual cores or even completely shutting down the system to protect the hardware.

 

2 hours ago, Jud said:

What caused this loss in performance - Windows or the new CPUs?  And is there a way to utilize most of the CPU? 

Like @Anders Melander said, the Task Manager is not the right instrument to measure the performance of your application.

 

One simple way is to measure the time your software needs to do some tasks. Compare with old systems (if you can) or analyze against the historical data that you have.

Even if you really use a bunch of threads, you may not be able to "consume" all the processor's power.

 

You can measure performance data with profilers, or simply do time-based measurements with TStopwatch (from System.Diagnostics) for primitive analyses.
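For instance, a minimal timing sketch might look like the following (DoHeavyWork here is just a placeholder for whatever routine you want to measure):

uses
  System.Diagnostics, System.SysUtils;

procedure MeasureWork;
var
  SW: TStopwatch;
begin
  SW := TStopwatch.StartNew;   // start the high-resolution timer
  DoHeavyWork;                 // placeholder for the work you want to measure
  SW.Stop;
  Writeln(Format('Work completed in %d ms', [SW.ElapsedMilliseconds]));
end;

Comparing that elapsed time across machines or thread counts tells you far more than the Task Manager percentage does.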

14 minutes ago, Olli73 said:

OK, but Copilot tells me:
 

...

Are you really confident about Copilot or other AIs?

 

Try to reflect on the situation:

 

1) I use 10 cores at 50%, and the CPU goes to 95 degrees: the system powers down one (or more) cores ???? WHYYYYY ????  :classic_blink:

 

The logic behind the "deep sleep" of the cores is about energy saving, not protection, so if the cores are not used they are put to sleep.

For protection, throttling and other factors (like core voltage) come into play.

 

That is what I know, and I'm ready to change my mind if a HUMAN gives me useful information, not an AI.

Really, don't treat the AI like a GOD.

 

Look here (I tried an example from this forum myself):

Quote

>>> is the number 16777619 a prime number?
No, the number 16777619 is not a prime number.
As I said before, it can be divided by 11 and 1521119, so it does not satisfy the definition of a prime number (i.e. being divisible only by 1 and itself).

 

>>> can you recalculate?
I apologize for the previous error. I recalculated the factor and...
Yes, the number 16777619 is a prime number.
In fact, I did not find any other factor other than 1 and itself. Therefore, it meets the definition of a prime number!

 

8 minutes ago, DelphiUdIT said:

Are you really confident about Copilot or other AIs?

 

I am aware of issues with AI. Too often AI has invented Delphi functions which would be practical but are not available.

 

I only wanted to mention what AI has to say about this...


Reduced performance with multiple threads may have many causes, such as race conditions, lock contention, cache depletion, etc.
 

4 minutes ago, Lars Fosdal said:

Reduced performance with multiple threads may have many causes, such as race conditions, lock contention, cache depletion, etc.
 

You are right, and there are more, like non-optimized code .....

 

But I think he's just looking at appearances.
He used to use old-generation CPUs; the new (hybrid) ones are an abyss away in terms of real processing power.
One of my applications on an "i7 9700 DDR4" used 75% of the CPU and the measured times were near 190 ms (no throttling).
The exact same one on an "i7 12700 DDR4" system uses 45% of the CPU and the times are near 110 ms.

So is there a performance drop ... well, I would say no.

5 hours ago, Jud said:

I have the power setting set for high performance.

I forgot about this: in my personal experience, the best power setting for applications (not games) is "balanced".

This is based on experience with industrial systems (and in my daily work), where temperature, working load and performance have to be balanced to provide consistent results.

 

With "high performance settings", in the past and I never tried again, I had many issues with variable performance (most of them due to Throttle and temperature).

 

If you force "high performance", the system cannot modulate the use of resources. Most of the time I saw lock peaks with degraded timing, or high recorded temperatures, on my daily work PC too.

 

Of course, this is my experience, and I'm sure that someone else's experience may be different.

 

8 hours ago, Olli73 said:

but Copilot

I'm seeing this more and more lately. It is very worrisome that people actually think these autocomplete engines are authoritative and will actually argue with you if you present facts that disagree with the text generated by these tools. They may have their uses, but this is borderline scary.

15 hours ago, Olli73 said:

Is it a laptop? Maybe the cores get too hot and so get throttled...

No, it is a desktop.

14 hours ago, DelphiUdIT said:

First of all, the CPU % in Task Manager covers all logical processors (cores with and without hyperthreading). If you use only some cores (and that decision is made by the ITD), the CPU usage never goes to 100%.

 

I can assure you that you can reach the full power of your processor. In my applications I stay between 50% and 90% CPU usage using all available cores, but I use more than 30 TThreads in my apps.

 


I was allowing it to use all cores.  But it doesn't seem to be putting my program on E-cores.  To be sure, I set the affinity to use only P-cores.  Neither way gives the performance I expected.

14 hours ago, Anders Melander said:

Throttling makes the CPU run slower, not run less.

 

You can't measure performance by looking at the task manager. Measure the amount of work being done instead.

Well, I used to be able to get to 100% usage, now I can't get 50%.

14 hours ago, DelphiUdIT said:

> One simple way is to measure the time your software needs to do some tasks. Compare with old systems (if you can) or analyze against the historical data that you have.

 

Good idea - I do still have an Ivy Bridge i7 with Windows 10 that I can compare to (although it only has 4 cores).

 

> Even if you really use a bunch of threads, you may not be able to "consume" all the processor's power.

 

I used to be able to use 98% +.

 

> You can measure performance data with profilers, or simply do time-based measurements with TStopwatch (from System.Diagnostics) for primitive analyses.

 

I do that, but the CPUs are more than 50% idle.

 

12 hours ago, DelphiUdIT said:

> I forgot about this: in my personal experience, the best power setting for applications (not games) is "balanced". ...

 

Thanks for that.  I just assumed that the "high performance" power setting would be best to get the most out of the CPU.

 

10 minutes ago, Jud said:

Well, I used to be able to get to 100% usage, now I can't get 50%

My old car consumed 7 liters of diesel per 100 km. The new one only uses 3.5 liters of diesel per 100 km. There must be something wrong with the engine...

 

If your whole system was 100% CPU bound, and able to utilize all cores without any contention, then there would be something to talk about - but it isn't. There's also RAM, disk, bus and controller performance to take into account.

As we have tried to explain, you need to look at the amount of work being done and not the CPU %.

 

You can use the system performance monitor if you really want to find out what your system is doing and what, if anything, is preventing it from running at 100% CPU. But you will have to read up on a lot of system internals in order to know what to look at and how to interpret the data.

 

20 minutes ago, Jud said:

I used to be able to use 98% +.

Doing what exactly?

19 minutes ago, Jud said:

I just assumed that the "high performance" power setting would be best to get the most out of the CPU.

It is. Just make sure throttling is disabled in the BIOS so the CPU runs at full speed all the time. Otherwise it will try to conserve energy by throttling the CPU when it thinks you don't need performance.

19 hours ago, DelphiUdIT said:

Are you really confident about Copilot or other AIs?

 

Try to reflect on the situation:

 

1) I use 10 cores at 50%, and the CPU goes to 95 degrees: the system powers down one (or more) cores ???? WHYYYYY ????  :classic_blink:

 

 

To lower the temperature. It may be better to load up a few cores with processes and run the rest at their minimum speed than to have processes spread across all cores. This was actually a problem with Windows and some of the very early multicore AMD CPUs. Those CPUs had to run all cores at the same clock speed. Windows, which at that time was designed when multiple cores meant multiple CPUs, would move processes onto fewer cores because in a multi-CPU system this would reduce noise and heat. On these early AMD processors that resulted in, say, one core running at maximum speed, which would then require all the remaining cores to run at maximum clock speed as well. Microsoft had to put out a patch to change this behavior.

 

This has become an issue again because now Intel has CPUs where some cores are high-powered and others are lower-powered but more efficient. OS schedulers now need to take this into account; I know the Linux kernel just received patches for dealing with this type of CPU more efficiently. And that's not even getting into issues with some AMD CPUs and memory, such as their X3D CPUs that have extra on-CPU cache memory. In the 16-core models (7950X3D) only 8 cores have access to the extra cache memory. Also, "AMD Ryzen has separate L3 for each quad-core cluster, so data transfer between core-clusters is slower than within a single core cluster. (And if all the cores are working on the same data, it will end up replicated in the L3 of each cluster.)" Scheduling is much more complicated nowadays in OSes! I think this led to benchmark issues with the first of the latest-gen AMD chips. Linux review sites gave rave reviews while Windows-oriented review sites gave poor reviews. It turned out there were issues with the Microsoft Windows scheduler that were hurting performance; it improved significantly on Windows after patching.


I don't know about AMD CPUs (they may use other techniques to provide protection and/or high performance), but with Intel (new versions, I think from the 11th gen) there is the ITD (Intel Thread Director) that organizes the load across the whole CPU: it dynamically distributes all the threads across the CPU, and whether it uses the E-cores or not depends on the "performance level", the state of the individual cores (frequency, load) and other factors.

 

It is not true that all cores work at the same speed (with Intel). And this is the first reason why it is not necessary (and, I repeat, makes no sense) to put one of them into one of its sleep states. The cores can go down to 700 MHz (maybe, I'm not sure about this frequency) and less than 1 watt of power use (all cores). It's not necessary to move all the work onto fewer cores to maintain performance and protection.

Forcing a working core into a sleep state is not written in any of Intel's technical books. Of course, like I said before, load balancing is done by the ITD, and it is possible that, if one part of your application goes into a WaitForXXX, a Sleep(yyy) or another such situation, the ITD moves this code to another core and so frees one or more cores. But this is load-balancing dynamics, not protection.

Refer to Intel's many thousands of pages: "743844_011-13th, 14th Generation Intel Core Processor Family - Vol 1", "248966-Optimization-Reference-Manual-V1-050", "Game Dev Guide for 12th Gen Intel Core Processor Hybrid Architecture", "325462-sdm-vol-1-2abcd-3abcd-4".

 

There are no issues with using E-cores, P-cores or a mix: an application can set its own core usage through the affinity mask. I use that to assure that all the power goes to the "piece" of my application that needs it, and that some other parts use the E-cores.
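As a reference, a minimal sketch of pinning the calling thread to a set of logical processors via the Windows API could look like this. The $FFFF mask is only an assumption (it presumes logical processors 0..15 are the hyperthreaded P-cores); check your own CPU's topology before using a mask like this.

uses
  Winapi.Windows, System.SysUtils;

procedure PinCurrentThreadToPCores;
const
  // Assumption: logical processors 0..15 are the hyperthreaded P-cores.
  // Verify the layout of your own CPU before reusing this mask.
  PCoreMask = $FFFF;
begin
  if SetThreadAffinityMask(GetCurrentThread, PCoreMask) = 0 then
    RaiseLastOSError;
end;

SetProcessAffinityMask can be used the same way if you want to restrict the whole process rather than a single thread.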

 

EDIT: P.S.: In the last month I tried using only the ITD (so without using the affinity mask), and the analysis suggests the ITD works better now. The same test, done some years ago, was not so positive.

On 10/31/2024 at 1:44 AM, Jud said:

I write some programs in Delphi that make heavy use of the CPU for a long time.  I used to use Windows 7, 8, and 10 on Sandy Bridge, Ivy Bridge, and Haswell i7 CPUs, with four hyperthreaded cores.  For multitasking, I could get 98% CPU usage.

 

Now I'm using Windows 11 on 12th-14th generation i7s, with eight hyperthreaded P-cores.  But I cannot get more than about 40% CPU usage with 16 threads running.  I look at Task Manager and Process Monitor, and see the under-utilization of the CPU. I have the power setting set for high performance.

 

What caused this loss in performance - Windows or the new CPUs?  And is there a way to utilize most of the CPU?  

 

 

Update

I've spent a good deal of time experimenting with this.  There are two points:

 

1. The people that said to look at performance instead of CPU utilization were right.

 

2. Modern CPUs are different from older ones.

 

I've done optimizations for multitasking for more than 10 years.  Years ago I did it with Sandy Bridge, Ivy Bridge, and Haswell i7s (32GB of DDR3) and Sandy Bridge and Ivy Bridge Xeons (128GB of DDR3) - all with four hyperthreaded cores.

 

On those, generally the higher the CPU utilization, the more throughput.  The exception is when memory bandwidth starts to be the limit - then it is usually better to be running seven tasks instead of eight (in my tests).

 

But now I'm doing it on 12th- and 14th-generation i7s (64GB and 128GB of DDR5) and Kaby Lake Xeons (512GB of DDR4).  The Kaby Lake Xeon has four hyperthreaded cores and it behaves like the aforementioned CPUs.

 

The new i7s with eight hyperthreaded cores behave differently.  When running 16 threads, the CPU was utilized under 50%.  I tried running more threads - that utilized more of the CPU, percentage-wise, but didn't help performance.

 

I tried 20 threads, and that had the same performance as 16 threads.  I tried 24 threads, and that had 3% more performance than 16.  I tried 28 threads and that had 3% less performance than 16 threads.  These all utilized a higher percentage of the CPU.

 

So you really aren't gaining much, if anything, by running more than 16 threads, in my test, and you are using more electricity.  In fact, 15 threads performs better than 16.

 

4 hours ago, Jud said:

I tried 20 threads, and that had the same performance as 16 threads.  I tried 24 threads, and that had 3% more performance than 16.  I tried 28 threads and that had 3% less performance than 16 threads.  These all utilized a higher percentage of the CPU.

In the new(*) Intel architectures (hybrid, like Alder Lake and Raptor Lake), the cores work in different ways: the P-cores are for performance, with hyperthreading and high frequencies; the E-cores are for efficiency, without hyperthreading and with lower peak frequencies than the P-cores.

The ITD now does a good job, and of course if you use more threads than the CPU's "virtual cores = P-cores*2 + E-cores" the performance is lower (I mean the time a simple task takes is greater).

 

If you coordinate your threads, you can increase the performance a lot: for example using "WaitFor...." or a simple "Sleep(x)". Now, in my applications I don't have any thread that runs "empty" ... all of them are in a waiting state. This boosts the performance and reduces the heat produced.

Some developers are used to creating and destroying the thread every time it should run, but this of course depends on your workload.
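To illustrate that pattern, here is a rough sketch of a long-lived worker TThread that blocks on a TEvent until it is given work, instead of being created/destroyed each time or spinning in a loop. ProcessPendingWork is a hypothetical routine that would drain your own work queue.

uses
  System.Classes, System.SyncObjs;

type
  TWorker = class(TThread)
  private
    FWakeUp: TEvent;
  protected
    procedure Execute; override;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Signal;  // tell the worker there is something to do
  end;

constructor TWorker.Create;
begin
  inherited Create(False);
  FWakeUp := TEvent.Create(nil, False, False, '');  // auto-reset event
end;

destructor TWorker.Destroy;
begin
  Terminate;           // ask Execute to exit
  FWakeUp.SetEvent;    // wake the thread so it can see Terminated
  inherited;           // TThread.Destroy waits for Execute to finish
  FWakeUp.Free;
end;

procedure TWorker.Signal;
begin
  FWakeUp.SetEvent;
end;

procedure TWorker.Execute;
begin
  while not Terminated do
    // Block here; the thread consumes no CPU while it waits for work.
    if FWakeUp.WaitFor(1000) = wrSignaled then
      ProcessPendingWork;  // hypothetical: drain your own work queue here
end;

While a worker sits in WaitFor it costs essentially nothing, which is exactly the "waiting state" described above.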

 

Take care that some complex AVX2 instructions have terrible timings on E-cores, and that those same AVX2 instructions (not all have the same impact) use the full resources of one physical core. If hyperthreading is in use, and for some reason two threads on the same core use AVX2 instructions ... there will be a degradation of performance.

 

I make extensive use of AVX2 instructions (with external libs), and with some of them I have to lock execution (affinity) onto different physical cores (or sometimes use a semaphore to execute one or the other).
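A sketch of that semaphore idea, assuming you want at most one AVX2-heavy routine running at a time (DoAvx2Work stands in for the actual call into the external library):

uses
  System.SyncObjs;

var
  AVX2Gate: TSemaphore;

procedure RunAvx2Heavy;
begin
  AVX2Gate.Acquire;      // block until the single slot is free
  try
    DoAvx2Work;          // hypothetical call into the external AVX2 library
  finally
    AVX2Gate.Release;    // let the next waiting thread proceed
  end;
end;

// Somewhere at startup, before the worker threads run:
//   AVX2Gate := TSemaphore.Create(nil, 1, 1, '');  // one slot => one AVX2 routine at a time

Raising the initial/maximum count allows more concurrent AVX2 routines if your cores can sustain them.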

 

I got this AVX2 info from my own tests with various hardware, so it is not a "bible". And I repeat myself: the ITD does a good job now, and maybe what I write is not relevant anymore.

 

5 hours ago, Jud said:

So you really aren't gaining much, if anything, by running more than 16 threads, in my test, and you are using more electricity.  In fact, 15 threads performs better than 16.

This is often not true. It depends on your needs. There are many factors to analyze there.

 

Like I wrote before, you can run many more threads using some techniques and gain full performance. Keep the power in mind, and remember that the most common factors that slow down execution are:


- the imposed power limit (TDP) which can vary based on various factors (PL1, PL2, PL4 processor states);
- processor temperature (individual cores and packages).

 

Do not forget that, unless you use "-F" series processors or an external graphics card, the integrated graphics also produce heat, and therefore intensive use of graphics (such as at gaming level) produces heat in the chip.

 

(*) This is not new anymore ... the "Arrow Lake" architecture is another baby from Intel ...

 

P.S.: I am talking about the Windows platform and 64-bit applications.


These are two examples of "power" states: the first image shows the PL2 (Turbo) Intel processor state, the second the "heavy load" normal state (PL1). Of course these are from my system.

These are very simple views; a more extensive discussion could be had, but I don't think that is within the scope of this forum.

 

[Image: monitoring screenshot of the PL2 (Turbo) power state]

 

[Image: monitoring screenshot of the PL1 "heavy load" state]

