Can't get but 40% CPU usage multi-tasking

Kas Ob. · November 4, 2024

I am chipping in, i don't have CPU with E-Core and P-Core, but i believe i do have a clear picture on them and will try to explain and clear few misconception about CPU core in general, HT (also called SMT) and the new technology E-Core with P-Core.

(too long to read ?, just skip the latest few lines)

We must start at the beginning,

How Multithread is working ? in other words how OS simulate and provide Multitasking ?

This is essential to understand, as many know it, the running thread which called CPU (aka CPU processor), must be stopped and switch context then continue from the new point, and how low level interrupt involve is irrelevant here, one can imagine a hardware timer by the CPU itself stop the current running process (from execution processing in virtual mode) and start by going to specific address in protected mode, this make the thread (again i am talking about CPU and running Core thread) a kernel one, this thread itself will run in the context of OS scheduler and will pick a new context then continue, hence will appear as the OS providing multitasking by slicing the running time between multiple contexts, these multiple contexts could be in hundreds or thousands, and these have nothing to do with CPU or any hardware, it is agnostic for the CPU and it is irrelevant, so the limit is OS resources to save contexts.

Thread context, is resource(s), from the OS point of view is very small and limited, it is essentially are the current CPU registers (general and others including execution pointer, stack pointer, debug ....etc), these are a must to make the thread continue running multitasking, but the OS also need its own identifiers and resources, but again this one is very small as it is one memory page (4kb) storing the context needed by the hardware ( registers and what not) and the OS specific data like the thread TLS and context recognizing if it is User or Kernel one... that in very simple way and it is enough to understand how the OS with one core CPU ( one running physical processer) can provide multitasking and multithreading.

The process or the operation of switching the context is well called Context Switching.. duh ! and it happens in software, in OS kernel scheduler.

Now with multi physical cores, the above is the same multi core can and will do the exact same as one core, only the kernel scheduler start to become a little more complex (aware) to adjust and when and where these threads continued to run, example if a heavy process thread running with higher priority and taking its sweet time, the scheduler might refuse to perform a context switch and send the thread running to continue its work, or decide to pick a specific thread was running on the same core instead of making salad and arbitrary switching, there is much to write here, but this is enough for now.

Now at some point CPU manufacturer had an idea what if we did the switching of the context instead of the OS kernel, hmmm.... we can simply build our own hardware context and simulate 2 thread while doing nothing at all except storing the hardware context of the running CPU, and thus they created HT (hyperthreading), it doesn't provide anything other than performing hardware context switching on its own, and in theory they can make HT3 version for simulating 3 threads (3 logical core form one physical) or even 128 logical one, but the CPU must have extra space for storing the contexts and that is it, so it is cheating or faking.

Now, how much is that HT useful or gaining performance?

the answer is it depends, mostly yes it is faster, but it could be slower, see... my Windows 10 saying there is ~1460 running thread at this moment of writing, my CPU is i7-2500k, no HT !, my HT i-2600k burned in March so i replaced it with the mother board with used one at $35

, and i am satisfied with this as the change didn't make dramatic or even noticeable performance change for me, anyways, OS is deciding when and where to switch the context to, as almost all of these ~1500 threads are doing nothing, just waiting on some hardware interrupt, signals in general, software ones.

when HT is performing slower?

while if there is very intensive execution, the OS could do better than hardware in the context of context switching because it does know to whom give more execution time slice, here the hardware switching is fooling the OS and wasting time, there is many resources on the internet asking and explaining how and when HT is slower or faster.

So to short the above HT is a cheat trick by the CPU to emulate another core, very helpful with more of the %90 any modern OS CPU usage.

Now to our main subject E-Core and P-Core, intel wanted to break this stall, as we did reach the peak of CPU speed with Sandy Bridge or may be Haswell, they invented and added many tricks to enhance the performance, but they weren't enhancing the performance of the low level instruction execution, all the tricks were about the end of the execution, the result, the visible change, and they did great job mostly in decreasing the power usage and enhancing the output, yet there was the problem with HT and its impact, as HT at anytime can be hindering.

So they implemented and presented a new standard (technology) to involve the OS while adding new simplified cores, physical ones, these ones doesn't have the full potential of latest Intel technology and tricks !, these could be the ones that are waiting, of course this need the OS to signal or mark the current context as ...like.... nah.. this one is performing Sleep(t) or this one is blocking for another thread so this is lower processing priority, we are talking about E-Core (efficiency core), it is full CPU with full capability, BUT it doesn't have the full packages of Intel tricks, like out-of-order-execution in its full window length, or lack the out-of-order-execution or full dedicated L1 cache size like P-Core, i don't know for sure what have being decreased or removed to be honest, and quick search didn't help much, but you got the idea, most articles draw them as smaller in physical size like this, notice the size and the E-Core shared LLC (Last Level Cache, known as L1)

Now after all of the above comes how to utilize the CPU in full with the existence of P-Core and E-Core, i can't answer that too, as i don't have physical access to one, and answering this is little harder than it seems, like this article is deep and great

https://www.anandtech.com/show/17047/the-intel-12th-gen-core-i912900k-review-hybrid-performance-brings-hybrid-complexity

Few extra thoughts

1) While my late deceased HT CPU was with me something around 13 years, big chunk of its life was running with HT disabled, yes if you are going to optimize and do the right clocking then HT will only corrupt your result.

2) To be honest disable HT didn't impact my PC experience at all.

3) if you ever need to use fibers or you are building application with intensive CPU usage, you really should disable HT, as it will only throw sticks in the wheels of your application or server.

4) While it is useful in general, i mean all the Intel tricks, some of them are expensive, and not worth it at all, just waste of money.

5) The huge impact that every one should pick is power consumption when choosing CPUs, as it will reflect how CPU will throttle, and when, the lower CPU power needs, the faster and longer it will perform the intensive processing, while in obvious way if you don't need the intensive usage then don't buy the most expensive, it is simple logic if you think about it, paying top $ for a feature that may save me few minutes per week, is waste of money, and if you calculate if that saving will be in hours per week, then it is valid argument.

5) In general, wouldn't expect E-Core to be a lot of useful in doing work , as they designed and implemented as second degree performer, and should be dedicated to waiting for input, as many of the resources i read they really under performing when it comes to memory operations, aka performing as they were cores from decade or more ago, so utilizing them at %100 could be a mistake as all what they will contribute is more heat by consuming more power, affecting other cores and causing throttling.

6) About throttling, while yes throttling is decreasing the power and the clock, it doesn't mean the newer CPU will not shut down the E-Core, in theory it will be easier and more efficient to shut down a core, but hey, there can't be a shut down, it will be running the core at lowest clock possible with the lowest power possible, this might be what is going on, and this is my personal opinion, modern CPU should start to throttle cores in interleave way, in hope to dissipate the heat in half step instead of lower them all, but based on .. again my opinion, but not the least it will be easier and logical for the efficiency to throttle E-Core before P-Core.

Lastly, for the subject of this thread and the Jud question, you need to monitor your CPU in lowest level you can, share with us a screen shot of Process monitor while application under full load where you expect it to utilize %100, only then we can say something, look at task manager when report the overall of all the cores (logical and physical) is uninformative !, and wrong way to find the short coming, to be honest i didn't see a screen shot of these mixed core as reported by monitoring tools, like Process Monitor or even CPU-Z...

Hope you find this informative and helpful.

DelphiUdIT · November 4, 2024

3 hours ago, Kas Ob. said:

1) While my late deceased HT CPU was with me something around 13 years, big chunk of its life was running with HT disabled, yes if you are going to optimize and do the right clocking then HT will only corrupt your result.

2) To be honest disable HT didn't impact my PC experience at all.

3) if you ever need to use fibers or you are building application with intensive CPU usage, you really should disable HT, as it will only throw sticks in the wheels of your application or server.

I partially agree with you: measuring (always my experience on software and system that I have done, and view competitors's solutions) the performance (timing based) seems that HT in my case is better. I used with the same twins systems HT and not HT (with Alder Lake) and with HT I gain form 15% to 20% of better performance (timing lowered from 150 ms. to 120 ms. with HT).

Of course it is my case, and my be should be better "thinks" for not HT. Ahhh .... the tempearture were lowered from 85/90 C to 80/85 C.

3 hours ago, Kas Ob. said:

5) The huge impact that every one should pick is power consumption when choosing CPUs, as it will reflect how CPU will throttle, and when, the lower CPU power needs, the faster and longer it will perform the intensive processing, while in obvious way if you don't need the intensive usage then don't buy the most expensive, it is simple logic if you think about it, paying top $ for a feature that may save me few minutes per week, is waste of money, and if you calculate if that saving will be in hours per week, then it is valid argument.

There is a scope about that ... an application (or better a system) should (MUST) achieve the "GOAL" ... the price and the techs. go in second plane (of course ALL should be OK, techs, economy, profit, ect ...).

3 hours ago, Kas Ob. said:

6) About throttling, while yes throttling is decreasing the power and the clock, it doesn't mean the newer CPU will not shut down the E-Core, in theory it will be easier and more efficient to shut down a core, but hey, there can't be a shut down, it will be running the core at lowest clock possible with the lowest power possible, this might be what is going on, and this is my personal opinion, modern CPU should start to throttle cores in interleave way, in hope to dissipate the heat in half step instead of lower them all, but based on .. again my opinion, but not the least it will be easier and logical for the efficiency to throttle E-Core before P-Core.

Like I told there is no notation about that CORES will be put in deep sleep when are working. If they are not in use they will be put in deep sleep. But if you do some tests you will notice that is not sense to move some works from one CORE to another and go to sleep the first CORE. Unitll there is a lot of power to use, of course this will do (masks from ITD). After that you will not see anything, the CORES works until they should.

In normal Windows state, without any user appliocation (I mean application that USER LAUNCH, not application that system launch). you will see that the Processor use only one or two THREADS (ONE P-CORE) for all the threads in use, and some E-CORES (one or two). The P-CORE in used is called "preferred" and it (they ... 2 THREADS ... with HT) run at full freq. (in my case 5.8 GHz) ... so YES some CORES are in deep sleep and all threads run in one or two cores, but THIS is for performance ... all the threads occupy only 25% the capacity of the single core and so the performance of the core is still at OPTIMUM level,

If YOU overload one CORE, what is the sense to "move in" some other task to free another CORE and put to it in deep sleep ?

The ITD resolve this for you, dynamically balancing the load. That is the news. And if you are not happy about this, you can affort the use of "Affinity Mask", like I did in the past.

All other CORES except the "preferred" are normally "parking". To use the full power I should do a lot of works (32 software Threads that run without sleeping or waiting). The frequency is still high. Only the power is at max value, throttling the frequency. I think is enough to works at the best conditions. Throttle the single frequency of every CORE is better ? Uhmm I' dont' know but I don't think that is right way. But we'll see in the future.

Sign In

Can't get but 40% CPU usage multi-tasking

Recommended Posts

Kas Ob. 142

Share this post

Link to post

DelphiUdIT 245

Share this post

Link to post

Create an account or sign in to comment

Create an account

Sign in

Browse

Activity