The 64 Core Threshold – Processor Groups on Windows
TL;DR – Microsoft hacked in support for more than 64 logical CPU cores by adding ‘processor groups’. Existing process-level CPU affinity masks apply to only a single processor group, but threads within a process can be manually assigned to other processor groups.
As systems with more than 64 logical CPU cores become common, it is important to understand some fundamental limitations of Windows. Windows NT was designed to represent CPU affinities throughout the system as a bitmask one machine word wide, which is 64 bits on 64-bit Windows. This seemed like plenty of bit space for the single-core CPUs in use at the time. Now we’ve breached the limits of that bitmask (64 cores), and this is how Microsoft adapted Windows.
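To make the limit concrete, here is a minimal Python sketch of how such a bitmask encodes CPUs: bit n set means logical CPU n is allowed. The helper name affinity_mask is hypothetical and not part of any Windows API; it only illustrates why a 65th CPU has nowhere to go.

```python
MASK_BITS = 64  # width of the affinity mask on 64-bit Windows


def affinity_mask(cpu_indices):
    """Build an affinity bitmask: bit n set means logical CPU n is allowed."""
    mask = 0
    for cpu in cpu_indices:
        if not 0 <= cpu < MASK_BITS:
            # There is simply no bit available for CPU 64 and above.
            raise ValueError(f"CPU {cpu} does not fit in a {MASK_BITS}-bit mask")
        mask |= 1 << cpu
    return mask


print(hex(affinity_mask([0, 1, 2, 3])))  # 0xf
print(hex(affinity_mask(range(64))))     # 0xffffffffffffffff
```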
Windows Server 2008 R2 (and Windows 7) introduced processor groups to allow for more than 64 logical CPU cores. Each group has, at most, 64 cores. Existing APIs and system functions could then continue to accept 64-bit affinity bitmasks, as each would implicitly be operating on a single processor group. This means that, without programmatic adaptation, each process is limited to a single processor group of no more than 64 logical cores.
Individual threads within a process can be manually assigned to a group other than the process’s default group. SetThreadGroupAffinity and other thread APIs support specifying the processor group. In this way, a process’s threads can run on more than one processor group. Once a thread is assigned to a group other than the default, the process becomes a multi-group process. However, new threads will continue to use the default group assigned at process creation.
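As a rough illustration of the thread-level assignment described above, the following Python sketch calls SetThreadGroupAffinity through ctypes. The GROUP_AFFINITY layout follows the Win32 structure of the same name; the helpers make_group_affinity and move_current_thread are hypothetical names introduced here, and the call itself only works on Windows.

```python
import ctypes
import sys


class GROUP_AFFINITY(ctypes.Structure):
    """Mirrors the Win32 GROUP_AFFINITY structure."""
    _fields_ = [
        ("Mask", ctypes.c_size_t),          # KAFFINITY: pointer-sized bitmask
        ("Group", ctypes.c_ushort),         # WORD: processor group number
        ("Reserved", ctypes.c_ushort * 3),  # must be zero
    ]


def make_group_affinity(group, cpu_indices):
    """Hypothetical helper: CPU indices are relative to the group (0..63)."""
    mask = 0
    for cpu in cpu_indices:
        if not 0 <= cpu < 64:
            raise ValueError("CPU index must be in the range 0..63")
        mask |= 1 << cpu
    return GROUP_AFFINITY(Mask=mask, Group=group)


def move_current_thread(group, cpu_indices):
    """Assign the calling thread to `group`; return its previous GROUP_AFFINITY."""
    if sys.platform != "win32":
        raise OSError("processor groups are a Windows feature")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentThread.restype = ctypes.c_void_p
    kernel32.SetThreadGroupAffinity.argtypes = [
        ctypes.c_void_p,
        ctypes.POINTER(GROUP_AFFINITY),
        ctypes.POINTER(GROUP_AFFINITY),
    ]
    kernel32.SetThreadGroupAffinity.restype = ctypes.c_int  # BOOL
    new = make_group_affinity(group, cpu_indices)
    previous = GROUP_AFFINITY()
    ok = kernel32.SetThreadGroupAffinity(
        kernel32.GetCurrentThread(), ctypes.byref(new), ctypes.byref(previous)
    )
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return previous
```

Calling move_current_thread(1, range(36)) from a worker thread would, on a machine where group 1 exists and has at least 36 CPUs, place that thread on the first 36 CPUs of group 1. A group-aware application would typically do this for each worker it wants outside the default group.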
Often processor groups are composed of fewer than 64 cores, as each must encompass an entire NUMA node. This makes sense, as it would not be optimal for a process to use a mix of cores from differing NUMA nodes. It would also be bizarre to have processor groups of different sizes, causing new processes to get more or fewer cores depending on the processor group they were assigned.
Therefore, on a system with 72 logical CPU cores, divided into two NUMA nodes, two processor groups are created, each having 36 logical CPU cores.
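To see how a particular machine has actually been split, the group count and per-group sizes can be queried with GetActiveProcessorGroupCount and GetActiveProcessorCount. The sketch below does so from Python via ctypes; the function name active_group_sizes is hypothetical, and the non-Windows branch is only a portability assumption that reports all CPUs as a single pseudo-group.

```python
import ctypes
import os
import sys


def active_group_sizes():
    """Return a list of logical CPU counts, one entry per active processor group."""
    if sys.platform != "win32":
        # Assumption for portability only: other platforms have no processor
        # groups, so every CPU is reported as one pseudo-group.
        return [os.cpu_count() or 1]
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort  # WORD
    kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]    # WORD GroupNumber
    kernel32.GetActiveProcessorCount.restype = ctypes.c_uint32       # DWORD
    group_count = kernel32.GetActiveProcessorGroupCount()
    return [kernel32.GetActiveProcessorCount(g) for g in range(group_count)]


for group, size in enumerate(active_group_sizes()):
    print(f"Group #{group}: {size} CPU cores")
```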
This creates an interesting scenario where a Windows system with up to 64 cores may allow an application access to more cores than does a system with greater than 64 cores.
To demonstrate the impact, imagine an application that is unaware of processor groups, aptly named UnawareOfGroupsApp.exe.
72 TOTAL CPU CORES
Group #0: 36 CPU cores
Group #1: 36 CPU cores
UnawareOfGroupsApp.exe has default access to 36 CPU cores

48 TOTAL CPU CORES
Group #0: 48 CPU cores
UnawareOfGroupsApp.exe has default access to 48 CPU cores
As you can see, our group-unaware application has access to 48 CPU cores on a 48-core system, but only 36 on a 72-core system!
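The arithmetic behind this comparison can be sketched with a deliberately simplified model. It assumes groups are formed by packing whole NUMA nodes together until the 64-core limit would be exceeded, and that a group-unaware process starts in group 0. This is not Windows' actual assignment algorithm, the function names are hypothetical, and the split of the 48-core system into two 24-core NUMA nodes is an assumption made only for the example.

```python
MAX_GROUP_SIZE = 64


def model_groups(numa_node_sizes):
    """Simplified model: pack whole NUMA nodes into groups of at most 64 CPUs."""
    groups = []
    for node in numa_node_sizes:
        if node > MAX_GROUP_SIZE:
            raise ValueError("model does not cover NUMA nodes larger than a group")
        if groups and groups[-1] + node <= MAX_GROUP_SIZE:
            groups[-1] += node   # node still fits in the current group
        else:
            groups.append(node)  # start a new group with this node
    return groups


def default_cores(numa_node_sizes):
    """Cores a group-unaware process sees, assuming it starts in group 0."""
    return model_groups(numa_node_sizes)[0]


print(model_groups([36, 36]), default_cores([36, 36]))  # [36, 36] 36
print(model_groups([24, 24]), default_cores([24, 24]))  # [48] 48
```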
If this scenario presents a problem, your options are as follows:
- Disable Hyper-Threading/SMT to reduce the logical core count to 64 or fewer, at the cost of reduced overall computational capacity. That allows group-unaware processes to use more cores, since the single processor group then includes all available cores.
- Contact the app developer to have them modify it to be group-aware.
- Contact Bitsum to see what we can do for you. We can externally adapt group-unaware applications to multiple processor groups. However, doing this right requires special attention to the unique attributes of the offending application.
Given this design, you should check your applications’ group support before upgrading your hardware, and try to choose hardware with CPU core counts that are a multiple of 64, since that will result in maximum-size processor groups.
MSDN: Processor Groups