When CPU Affinity Matters
When CPU affinity matters
Case 1: Limit a Process’s CPU Consumption
One common case where CPU affinity matters is one of CPU resource allocation. Specifically, keeping a process limited to using a certain amount of CPU time or percent of available total CPU time. By limiting a process to specific cores, you have the ability to control the available CPU time it has access to, out of the total CPU time pool. Thus, you can keep it at an increment of 25%, 50%, 75%, or 100% for a fully (all physical) quad-core processor. Now, when you throw in logical processors, the picture gets more complex. While the scheduler may show an exact 50% for 2 of 4 logical cores, that may not mean those two cores are executing at 50% the capacity of two physical cores. The computational capacity limits are actually staggered, so they might instead be: 25%, 32%, 82%, 100%. It is hard to say, and varies for each CPU. However, this is an effective method to limit an application’s CPU use. It is *not* recommended for system components, security software, or other critical services.
You can do this with Process Lasso, using any of three automation features:
Remember, the OS CPU scheduler itself will work-around any busy cores, so you need not worry too much about putting too much on too few cores, *but* you shoudl be careful not to under-utilize your computing capacity.
Case 2: Per-Core Frequency Scaling
Some newer processors, both AMD (TurboCore) and Intel (TurboBoost), have frequency scaling technologies that allows for scaling up of specific cores on-demand. The development of this emerging feature in CPU hardware means that what CPU affinity you set may actually make a big difference in real-world performance.
Normally as a thread gets a time slice (a period in which to use the core), it is granted whichever core [CPU] is determined to be most free by the operating system’s scheduler. Yes, this is in contrast to the popular fallacy that the single thread would stay on a single core. This means that the actual thread(s) of an application might get swapped around to non-overclocked cores, and even underclocked cores in some cases. As you can see, changing the affinity and forcing a single-threaded CPU to stay on a single CPU makes a big difference in such scenarios. The scaling up of a core does not happen instantly, not by a long shot in CPU time.
Therefore, for primarily single (or limited) thread applications, it is sometimes best to set the CPU affinity to a specific core, or subset of cores. This will allow the ‘Turbo’ processor frequency scaling to kick in and be sustained (instead of skipping around to various cores that may not be scaled up, and could even be scaled down).
Another possible criticism of this feature is that certain threads will run faster than others. This is an unusual scenario many OS CPU Schedulers may not be prepared to handle. Windows 7, with its cycle-based counter, should be able to adequately handle it – but others not so much. While this usually doesn’t matter anyway, it could conceivably cause timing issues in cases where timing just happens to be important – for instance a race condition. Sure, race conditions should never exist in the ‘wild’, but in this imperfect world we know they do.
Regardless, the end lesson is that it takes TIME to change frequency — often more time than the code took to execute. Keeping the CPU affinity restricted to certain overclocked cores is more optimal than having the threads of the process swapped around to all cores, over-clocked or not. As of now no OS scheduler does takes the active clock speed of individual cores into consideration. Perhaps that will change in a future release of Windows, but it seems unlikely at least for several years.
Case 3: Core Thrashing
Just by the name, you know this is a bad thing. You lose performance when a thread is swapped to a different core, due to the CPU cache being ‘lost’ each time. In general, the *least* switching of cores the better. One would hope the OS would try to avoid this, but it doesn’t seem to at all in quick tests under Windows 7. Therefore, it is recommended you manually adjust the CPU affinity of certain applications to achieve better performance.
Case 4: Intel HyperThreading and AMD Bulldozer
Another important issue is avoiding placing a load on a HyperThreaded (non-physical) core. These cores offer a small fraction of the performance of a real core. The Windows scheduler is aware of this and will swap to them only if needed. As of mid Jan 2012 the Windows 7 and Windows 2008 R2 schedulers have a hotfix for AMD Bulldozer CPUs that see them as HyperThreaded, cutting them down from 8 physical cores to 4 physical cores, 8 logical cores. This is for two reasons: The AMD Bulldozer platform uses pairs of cores called Bulldozer Modules. Each pair shares some computation units, such as an L2 cache and FPU. To spread out the load and prevent too much load being placed on two cores that have shared computational units, the Windows patch was released, boosting performance in lightly threaded scenarios.
When I say Bulldozer+, I refer to PileDriver, SteamRoller, and all later incarnations we’ve seen, or are rumored to be in the works.
External and Reference Links
- Intel Site on TurboBoost
- Extreme Overclocking AMD Phenom II x64 1055/1090T review
- Slashdot: Intel Turboboost vs. AMD Turbocore
- Tom’s Hardware: CORE Or Boost? AMD’s And Intel’s Turbo Features Dissected