When CPU affinity matters

Keeping an application below a certain total CPU % use

The first use I'll mention is a very common one: keeping a process limited to a certain share of the total available CPU time. By restricting its CPU affinity to specific cores, you control how much of the total CPU time pool it can draw on. On a quad-core processor with four physical cores, that lets you cap it at 25%, 50%, 75%, or 100%. Once logical (HyperThreaded) processors enter the picture, it gets more complex. A CPU monitor may show exactly 50% when a process is limited to 2 of 4 logical cores, but if those two logical cores share one physical core, they do not deliver the capacity of two physical cores. The real computational limits are staggered instead, so the four steps might be something like 25%, 32%, 82%, and 100%. The exact figures are hard to predict and vary from CPU to CPU. Even so, this is an effective way to limit an application's CPU use. It is *not* recommended for system components, security software, or other critical services.

You can do this with:

  1. Process Lasso's Default CPU Affinity
  2. When Process Lasso ProBalance events occur
  3. Via the Process Lasso Watchdog (so that the change is only induced when certain criteria are met)

Remember, the OS CPU scheduler itself will work around any busy cores, so you need not worry too much about putting too much load on too few cores, *but* you should be careful not to under-utilize your computing capacity.
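To make those percentages concrete: an affinity setting is just a bitmask with one bit per logical CPU (the same form the Windows `SetProcessAffinityMask` call takes). The minimal Python sketch below builds that mask and computes the nominal ceiling it implies, assuming every selected logical CPU is equally fast, which, as noted above, stops being true once HyperThreading is involved. The helper names are hypothetical and not part of Process Lasso.

```python
def affinity_mask(cores):
    """Build an affinity bitmask: bit n set means logical CPU n is allowed."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError(f"invalid CPU index: {core}")
        mask |= 1 << core
    return mask


def nominal_cpu_cap(cores, total_logical_cpus):
    """Upper bound on total CPU % for a process restricted to `cores`.

    Assumes every logical CPU has equal capacity; with HyperThreading the
    real computational share can differ a lot from this number.
    """
    allowed = set(cores)  # duplicates do not add capacity
    for core in allowed:
        if not 0 <= core < total_logical_cpus:
            raise ValueError(f"CPU {core} out of range 0..{total_logical_cpus - 1}")
    return 100.0 * len(allowed) / total_logical_cpus


if __name__ == "__main__":
    for cores in ([0], [0, 1], [0, 1, 2], [0, 1, 2, 3]):
        print(f"cores={cores} mask={affinity_mask(cores):#06b} "
              f"cap={nominal_cpu_cap(cores, 4):.0f}%")
```

On a quad-core, restricting a process to cores `[0, 1]` gives the mask `0b0011` and a nominal 50% cap.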

Per-Core Frequency Scaling

Some newer processors, from both AMD (Turbo CORE) and Intel (Turbo Boost), have frequency scaling technologies that can raise the clock speed of specific cores on demand. This emerging hardware feature means that the CPU affinity you set may make a significant difference in real-world performance.

Normally, when a thread gets a time slice (a period in which to use a core), it is given whichever core [CPU] the operating system's scheduler considers most free. This contrasts with the popular misconception that a single thread stays on a single core. It means the thread(s) of an application may get moved around to cores that are not overclocked, or in some cases even underclocked. Scaling up a core does not happen instantly, not by a long shot in CPU time, so a thread that keeps landing on a different core may never benefit from the boost. That is why setting the affinity so that a single-threaded application stays on a single core can make a big difference in such scenarios.

Therefore, for primarily single-threaded (or lightly threaded) applications, it is sometimes best to set the CPU affinity to a specific core or subset of cores. This allows 'Turbo' frequency scaling to kick in and be sustained, instead of the threads skipping around to cores that may not be scaled up, or may even be scaled down.
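On Linux, Python's standard library exposes this kind of pinning through `os.sched_getaffinity` and `os.sched_setaffinity`. The sketch below pins the caller to one logical CPU and returns the old set so it can be restored; it is only an illustration under that assumption (Linux, stdlib `os`). On Windows the equivalent is `SetProcessAffinityMask`, or simply a Process Lasso rule. The function name is hypothetical, and picking the lowest allowed CPU is an arbitrary default, not a way of finding the boosted core.

```python
import os


def pin_to_single_core(core=None):
    """Restrict the caller to a single logical CPU (Linux only).

    Returns the previous affinity set so the caller can restore it.
    `core=None` picks the lowest-numbered CPU currently allowed; this is an
    arbitrary choice, not a way of locating the core that is boosted.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("sched_setaffinity is not available here")
    previous = os.sched_getaffinity(0)  # 0 means "the caller"
    if core is None:
        core = min(previous)
    if core not in previous:
        raise ValueError(f"CPU {core} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(0, {core})
    return previous


if __name__ == "__main__":
    if hasattr(os, "sched_setaffinity"):
        old = pin_to_single_core()
        print("now allowed on:", os.sched_getaffinity(0))
        # ... run the single-threaded, CPU-bound work here ...
        os.sched_setaffinity(0, old)  # restore the original affinity
```

Call it early, before the program starts other threads: on Linux the setting applies to the calling thread, and threads it creates afterwards inherit it.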

Another possible criticism of per-core frequency scaling is that certain threads will run faster than others. This is an unusual scenario that many OS CPU schedulers may not be prepared to handle. Windows 7, with its cycle-based counter, should handle it adequately, but other schedulers may not. While this usually doesn't matter, it could conceivably cause timing issues where timing happens to be important, for instance by exposing a race condition. Sure, race conditions should never exist in the 'wild', but in this imperfect world we know they do.

Regardless, the lesson is that it takes *time* to change frequency, often more time than the code took to execute. Keeping the CPU affinity restricted to certain overclocked cores is better than having the process's threads swapped around to all cores, overclocked or not. As far as I know, no OS scheduler currently takes the active clock speed of individual cores into consideration. Perhaps that will change in a future release of Windows, but it seems unlikely, at least for several years.

Core Thrashing

Just by the name, you know this is a bad thing. You lose performance whenever a thread is moved to a different core, because the data it had built up in the old core's cache is effectively 'lost' and must be reloaded on the new core. In general, the *less* switching between cores, the better. One would hope the OS tries to avoid this, but in quick tests under Windows 7 it doesn't seem to at all. Therefore, it is recommended that you manually adjust the CPU affinity of certain applications to achieve better performance.
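Whether a thread is really bouncing between cores can be checked rather than assumed. On Linux, field 39 of `/proc/self/stat` is the CPU the process's main thread last ran on; the sketch below samples it while spinning and counts how often it changes. Sampling misses migrations that happen between reads, so treat the count as a lower bound. The function names and parameters are hypothetical; on Windows, `GetCurrentProcessorNumber` reports comparable information.

```python
import os


def last_cpu():
    """Return the CPU this process's main thread last ran on (Linux /proc)."""
    with open("/proc/self/stat") as f:
        stat = f.read()
    # The command name (field 2) is in parentheses and may contain spaces,
    # so split only after the last ')'. The remaining fields start at
    # field 3, so field 39 ("processor") is at index 39 - 3 = 36.
    fields = stat[stat.rindex(")") + 1:].split()
    return int(fields[36])


def count_migrations(samples=2000, spin=5000):
    """Burn CPU between samples and count how often the reported CPU changes."""
    migrations = 0
    previous = last_cpu()
    for _ in range(samples):
        x = 0
        for i in range(spin):  # keep the core busy between samples
            x += i
        current = last_cpu()
        if current != previous:
            migrations += 1
            previous = current
    return migrations


if __name__ == "__main__" and os.path.exists("/proc/self/stat"):
    print("migrations observed:", count_migrations())
```

Running it unpinned and then again after `os.sched_setaffinity(0, {n})` shows the difference: a process pinned to one CPU reports zero migrations.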

Intel HyperThreading and AMD Bulldozer

Another important issue is avoiding placing load on a HyperThreaded (non-physical) core. These logical cores offer only a small fraction of the performance of a real core. The Windows scheduler is aware of this and will move work onto them only if needed. As of mid-January 2012, a hotfix for the Windows 7 and Windows Server 2008 R2 schedulers makes them treat AMD Bulldozer CPUs as HyperThreaded, so an 8-core Bulldozer is presented as 4 physical cores with 8 logical cores. The reason is that the AMD Bulldozer platform groups cores in pairs called Bulldozer Modules, and each pair shares some computational units, such as the L2 cache and FPU. The patch was released to spread the load out and avoid placing too much load on two cores that share these units, boosting performance in lightly threaded scenarios.
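To keep work off two logical processors that share one physical core, an affinity can be built from one logical CPU per core. On Linux, each CPU's siblings are listed in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` as strings such as `0,4` or `0-1`. The sketch below parses that format and keeps the lowest-numbered sibling of each group. The helper names are hypothetical, and whether a Bulldozer module's two cores appear as siblings there depends on the kernel, so check the files on the actual machine. On Windows, the corresponding topology data comes from `GetLogicalProcessorInformation`.

```python
import glob


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0-1,4,6-7' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Given each CPU's sibling list, keep the lowest CPU of every group."""
    chosen = set()
    for text in sibling_lists:
        group = parse_cpu_list(text)
        if group:
            chosen.add(group[0])
    return sorted(chosen)


def read_sibling_lists():
    """Read thread_siblings_list for every CPU (Linux sysfs; empty elsewhere)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(f.read())
    return lists


if __name__ == "__main__":
    # Example: a 4-core, 8-thread CPU where CPU n and CPU n+4 share a core.
    example = ["0,4", "1,5", "2,6", "3,7", "0,4", "1,5", "2,6", "3,7"]
    print(one_cpu_per_core(example))  # [0, 1, 2, 3]
```

The resulting list can be passed as a set to `os.sched_setaffinity` on Linux, or turned into a bitmask for a Windows affinity setting.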

When I say Bulldozer+, I mean Piledriver, Steamroller, and all later incarnations we've seen or that are rumored to be in the works.

External and Reference Links

This article is part of a collection of information available at Bitsum Technologies as part of our Process Lasso software.

Get Process Lasso to experience the benefits of automated process priority adjustment and more.

'CPU' vs. 'Core'

These two words are synonyms, though in this world of rapidly evolving technology their meanings have shifted a bit. CPU is now used for a single physical unit, whereas core is used for one of many CPUs on a single die. This can be confusing. Just remember, 4 cores really means 4 CPUs all on one chip.

Reminder about what CPU % use is

I want to remind people that CPU utilization occurs in micro-bursts, and the % use per second is not a perfect representation of how fast or slow a CPU is. Just because that metric shows only 75% of a CPU consumed doesn't mean you had an 'extra' 25% lying around. The speed at which that 75% was executed matters too. At best, this metric gives you some idea of how CPU-intensive your operations are.
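The averaging effect is easy to see by measuring utilization the way most monitors do: CPU time consumed divided by wall-clock time elapsed over an interval. The minimal sketch below uses Python's `time.process_time` and `time.perf_counter`; the workload functions are hypothetical stand-ins. A task that computes flat out for half the window and sleeps for the other half reports roughly 50%, the same as a task that ran steadily at half speed, even though it finished its work in the first half of the window.

```python
import time


def cpu_percent(work):
    """Run `work()` and return this process's CPU time / wall time, as %.

    This is % of ONE logical CPU; divide by os.cpu_count() for the
    "% of total" figure that tools such as Task Manager display.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return 100.0 * cpu_used / wall_used


def burst_then_idle(seconds=0.2):
    """Compute flat out for half the window, then sleep for the other half."""
    deadline = time.perf_counter() + seconds / 2
    while time.perf_counter() < deadline:
        pass  # busy-wait: consumes CPU time
    time.sleep(seconds / 2)  # idle: consumes (almost) no CPU time


def idle(seconds=0.2):
    """Sleep for the whole window."""
    time.sleep(seconds)


if __name__ == "__main__":
    print(f"burst then idle: {cpu_percent(burst_then_idle):.0f}%")
    print(f"idle:            {cpu_percent(idle):.0f}%")
```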
