AMD 3D V-Cache CPU Core Selection by CCD

When CPU Affinity Matters

Author: Jeremy Collake
Date: 01 Jan 2013
Categories: Tips and Tweaks

Case 1: Overcome Scheduler Inefficiency

It is sometimes necessary to assign an application or game to a specific set of CPU cores to maximize performance. For instance, on AMD X3D platforms, not all CPU cores have the same amount of cache available. One set of CPU cores (CCD) has more cache, and the other can clock higher. Using Process Lasso, you can direct your application or games to the CCD they perform best on.
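As a rough illustration of what "directing a process to a CCD" means at the OS level, a CPU affinity is a bitmask with one bit per logical processor. The sketch below builds such masks. The layout it uses (32 logical processors, the first CCD on 0-15 and the second on 16-31) is an assumption for illustration only; the real numbering, and which CCD carries the extra cache, depends on the specific CPU.

```python
def affinity_mask(logical_cpus):
    """Return a bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask


# Hypothetical layout: two CCDs of 16 logical processors each.
CCD0 = range(0, 16)
CCD1 = range(16, 32)

ccd0_mask = affinity_mask(CCD0)
ccd1_mask = affinity_mask(CCD1)

print(hex(ccd0_mask))  # 0xffff
print(hex(ccd1_mask))  # 0xffff0000
```

A mask like this is the value an affinity setting ultimately boils down to; tools such as Process Lasso present it as a list of checkboxes instead.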

Similarly, another use case is to avoid placing loads on both logical (Hyper-Threaded) processors of the same physical CPU core. This helps to ensure that there is no contention between on-chip compute units.
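A minimal sketch of picking one logical processor per physical core follows. It assumes the two sibling logical processors of each core are numbered as adjacent pairs (0 and 1, 2 and 3, and so on), which is common but not guaranteed; the function name and the 4-core example are hypothetical.

```python
def one_thread_per_core(physical_cores, threads_per_core=2):
    """Return the first logical CPU index of each physical core.

    Assumes siblings are numbered consecutively, e.g. core 0 owns
    logical CPUs 0 and 1, core 1 owns 2 and 3.
    """
    return [core * threads_per_core for core in range(physical_cores)]


cpus = one_thread_per_core(4)
mask = sum(1 << cpu for cpu in cpus)

print(cpus)       # [0, 2, 4, 6]
print(bin(mask))  # 0b1010101
```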

Case 2: Limit a Process’s CPU Consumption

One common case where CPU affinity matters is CPU resource allocation: specifically, keeping a process limited to a certain amount of CPU time, or a percentage of the total available. By limiting a process to specific cores, you control how much of the total CPU time pool it can access. Thus, on a quad-core processor with four physical cores, you can cap it at 25%, 50%, 75%, or 100%. When you throw in logical processors, the picture gets more complex. While the scheduler may show an exact 50% for 2 of 4 logical cores, that does not necessarily mean those two logical cores deliver 50% of the capacity of two physical cores. The computational capacity limits are actually staggered, so they might instead be something like 25%, 32%, 82%, 100%. It is hard to say, and it varies for each CPU. Even so, this is an effective method to limit an application’s CPU use. It is *not* recommended for system components, security software, or other critical services.

You can do this with Process Lasso, using any of three automation features:

Remember, the OS CPU scheduler itself will work around any busy cores, so you need not worry too much about putting too much on too few cores, *but* you should be careful not to under-utilize your computing capacity.
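The physical-core arithmetic from this case can be sketched as follows. The function name is hypothetical, and the result is only the nominal ceiling: as noted above, with logical processors the real capacity behind each step is not this evenly spaced.

```python
def max_cpu_share(allowed_cores, total_cores):
    """Nominal upper bound, in percent of the total CPU time pool,
    for a process restricted to `allowed_cores` of `total_cores`."""
    if not 0 < allowed_cores <= total_cores:
        raise ValueError("allowed_cores must be between 1 and total_cores")
    return 100.0 * allowed_cores / total_cores


# Ceilings for a processor with four physical cores.
shares = [max_cpu_share(n, 4) for n in range(1, 5)]
print(shares)  # [25.0, 50.0, 75.0, 100.0]
```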

Case 3: Per-Core Frequency Scaling

Some newer processors, both AMD (Turbo Core) and Intel (Turbo Boost), have frequency scaling technologies that allow specific cores to be scaled up on demand. Because of this emerging hardware feature, the CPU affinity you set may actually make a big difference in real-world performance.

Normally, each time a thread gets a time slice (a period in which to use a core), it is granted whichever core is determined to be most free by the operating system’s scheduler. This contrasts with the popular fallacy that a single thread stays on a single core. It means that the thread(s) of an application might get swapped around to non-overclocked cores, and even under-clocked cores in some cases. Changing the affinity to force a single-threaded application to stay on a single core can therefore make a big difference in such scenarios. The scaling up of a core does not happen instantly, not by a long shot in CPU time.

Therefore, for primarily single (or limited) thread applications, it is sometimes best to set the CPU affinity to a specific core, or subset of cores. This will allow the ‘Turbo’ processor frequency scaling to kick in and be sustained (instead of skipping around to various cores that may not be scaled up, and could even be scaled down).
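For readers who want to experiment outside of Process Lasso, here is a minimal sketch of pinning the current process to one logical processor. It uses `os.sched_setaffinity` where Python provides it (Linux) and the Win32 `SetProcessAffinityMask` call through `ctypes` on Windows. The helper name `pin_current_process` is hypothetical, error handling is minimal, and the choice of target core is arbitrary.

```python
import os
import sys


def pin_current_process(cpu):
    """Restrict the current process to one logical CPU.

    Returns True on success, False if unsupported or the call failed.
    """
    if hasattr(os, "sched_setaffinity"):  # Linux
        os.sched_setaffinity(0, {cpu})
        return True
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetCurrentProcess.restype = wintypes.HANDLE
        kernel32.SetProcessAffinityMask.argtypes = [wintypes.HANDLE, ctypes.c_size_t]
        kernel32.SetProcessAffinityMask.restype = wintypes.BOOL
        # The affinity mask has one bit per logical processor.
        handle = kernel32.GetCurrentProcess()
        return bool(kernel32.SetProcessAffinityMask(handle, 1 << cpu))
    return False


# Pick the lowest-numbered CPU we are currently allowed to run on
# (falls back to CPU 0 where the allowed set cannot be queried).
target = min(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else 0
pinned = pin_current_process(target)
print(f"pinned to logical CPU {target}: {pinned}")
```

Pinning to a single core is the most restrictive choice; a small subset of cores is often a safer starting point for applications with a few busy threads.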

One possible criticism of per-core frequency scaling is that certain threads will run faster than others. This is an unusual scenario that many OS CPU schedulers may not be prepared to handle. Windows 7, with its cycle-based counter, should be able to handle it adequately, but others not so much. While this usually doesn’t matter, it could conceivably cause timing issues in cases where timing happens to be important, for instance a race condition. Sure, race conditions should never exist in the ‘wild’, but in this imperfect world we know they do.

Regardless, the end lesson is that it takes TIME to change frequency, often more time than the code took to execute. Keeping the CPU affinity restricted to certain overclocked cores is better than having the threads of the process swapped around to all cores, overclocked or not. As of now, no OS scheduler takes the active clock speed of individual cores into consideration. Perhaps that will change in a future release of Windows, but it seems unlikely for at least several years.

Case 4: Core Thrashing

Just by the name, you know this is a bad thing. You lose performance when a thread is swapped to a different core, because the contents of the CPU cache are ‘lost’ each time. In general, the *less* switching between cores, the better. One would hope the OS would try to avoid this, but it doesn’t seem to at all in quick tests under Windows 7. Therefore, it is recommended that you manually adjust the CPU affinity of certain applications to achieve better performance.

