The 64 Core Threshold – Processor Groups on Windows
TL;DR – Microsoft hacked in support for more than 64 logical CPU cores by adding ‘processor groups’. Unless an application is written to take advantage of multiple processor groups (group aware), its threads will be scheduled to only a single processor group. See our new groupextend project to enable group unaware apps to make full use of the CPU by scheduling threads to supplemental processor groups, a feature now also available in Process Lasso.
UPDATE – There have been improvements to processor group support in Windows 11 and Windows Server 2022. Application threads are now scheduled across all processor groups without any group awareness required by the application. Read more at MSDN.
UPDATE 2 – CPU Sets are now supported by Process Lasso, and they allow spanning of multiple processor groups. We suggest experimenting with them if you need to assign a process specific CPU cores from multiple groups.
As systems with more than 64 logical CPU cores become common, it is important to understand some fundamental limitations of Windows. When Windows NT was conceived, a 64-bit bitmask was used to represent CPU affinities throughout the system. This seemed like plenty of bit space for the single-core CPUs in use at the time. Now we’ve breached the limits of that bitmask (64 cores). To address this …
Windows Server 2008 R2 introduced processor groups to allow for more than 64 logical CPU cores. Each group has, at most, 64 cores. Existing APIs and system functions could then continue to accept 64-bit CPU affinity bitmasks, since they implicitly operate on a single processor group. This means that, without programmatic adaptation, each process is limited to a single processor group of no more than 64 logical cores.
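To see how a given machine is divided, the group layout can be queried directly. The following is a minimal sketch in Python using ctypes to call the Win32 functions GetActiveProcessorGroupCount and GetActiveProcessorCount. The function name processor_group_sizes and the single pseudo-group fallback on non-Windows platforms are our own assumptions for illustration, not part of any Windows API.

```python
import ctypes
import os
import sys


def processor_group_sizes():
    """Return the number of active logical processors in each processor group."""
    if sys.platform != "win32":
        # Assumption for illustration only: other platforms have no processor
        # groups, so report all logical CPUs as a single pseudo-group.
        return [os.cpu_count() or 1]

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetActiveProcessorGroupCount.argtypes = []
    kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort  # WORD
    kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]    # WORD GroupNumber
    kernel32.GetActiveProcessorCount.restype = ctypes.c_uint32       # DWORD

    group_count = kernel32.GetActiveProcessorGroupCount()
    return [kernel32.GetActiveProcessorCount(g) for g in range(group_count)]


if __name__ == "__main__":
    for group, size in enumerate(processor_group_sizes()):
        print(f"Group #{group}: {size} logical processors")
```

On a system with 64 or fewer logical cores this should report a single group.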
The process (application) CPU affinities you see in Process Lasso are therefore specific to the default processor group of the application. This is simply the nature of the OS. So if you have a 96 logical core system split into two groups of 48, you will see application CPU affinities of up to 48 logical cores. If an application is on the first group, that CPU affinity will represent the first 48 logical cores. If an application is on the second group, the CPU affinity will represent the second 48 logical cores.
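The group-relative meaning of an affinity mask can be made concrete with a little arithmetic. This sketch translates a (group, mask) pair into flat, system-wide core numbers for the 96-core example. The helper name group_relative_to_system_cpus and the flat numbering are hypothetical, chosen for illustration; Windows itself identifies a logical processor by its group and its number within that group.

```python
def group_relative_to_system_cpus(group, mask, group_sizes):
    """Translate a group-relative affinity mask into flat, system-wide CPU indices.

    group_sizes lists how many logical cores each processor group holds,
    e.g. [48, 48] for the 96-core example.
    """
    if not 0 <= group < len(group_sizes):
        raise ValueError("no such processor group")
    size = group_sizes[group]
    if mask < 0 or (mask >> size) != 0:
        raise ValueError("mask selects cores beyond the end of the group")

    # Cores in earlier groups come first in the flat numbering.
    base = sum(group_sizes[:group])
    return [base + bit for bit in range(size) if (mask >> bit) & 1]


sizes = [48, 48]
all_cores_in_group = (1 << 48) - 1  # lowest 48 bits set

first = group_relative_to_system_cpus(0, all_cores_in_group, sizes)
second = group_relative_to_system_cpus(1, all_cores_in_group, sizes)
print(first[0], first[-1])    # range covered by the mask in group 0
print(second[0], second[-1])  # the same mask in group 1 covers different cores
```

The same mask value therefore refers to different physical cores depending on which group the process was assigned.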
The default processor group is assigned in a round-robin manner, typically per-session since CPU affinities are inherited, so that the session load ends up roughly split between the processor groups. Once an application starts, its default (primary) processor group cannot be changed.
However, individual threads within an application can be manually assigned to a group other than the application’s default group. SetThreadGroupAffinity and other thread APIs support specification of the processor group. In this way, an application’s threads can run on more than one processor group, but they must be manually assigned. Ideally, this is implemented by the application developer, as awareness of what threads are doing and where they should be placed is important.
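As a rough illustration of manual assignment, the sketch below builds a GROUP_AFFINITY structure and passes it to SetThreadGroupAffinity for the calling thread, using Python and ctypes. The helper names make_group_affinity and move_current_thread are hypothetical. The structure layout assumes a 64-bit process, where the mask is 64 bits wide. This is not how groupextend or Process Lasso are implemented, only a sketch of the underlying API call.

```python
import ctypes
import sys


class GROUP_AFFINITY(ctypes.Structure):
    # Mirrors the Win32 layout: KAFFINITY Mask; WORD Group; WORD Reserved[3].
    _fields_ = [
        ("Mask", ctypes.c_size_t),
        ("Group", ctypes.c_ushort),
        ("Reserved", ctypes.c_ushort * 3),
    ]


def make_group_affinity(group, cpus):
    """Build a GROUP_AFFINITY selecting the given group-relative core numbers."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 64:
            raise ValueError("core numbers are relative to the group (0-63)")
        mask |= 1 << cpu
    affinity = GROUP_AFFINITY()
    affinity.Mask = mask
    affinity.Group = group
    return affinity


def move_current_thread(group, cpus):
    """Assign the calling thread to a group; returns its previous group affinity."""
    if sys.platform != "win32":
        raise OSError("processor groups are a Windows feature")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentThread.argtypes = []
    kernel32.GetCurrentThread.restype = ctypes.c_void_p  # HANDLE (pseudo handle)
    kernel32.SetThreadGroupAffinity.argtypes = [
        ctypes.c_void_p,
        ctypes.POINTER(GROUP_AFFINITY),
        ctypes.POINTER(GROUP_AFFINITY),
    ]
    kernel32.SetThreadGroupAffinity.restype = ctypes.c_int  # BOOL

    new = make_group_affinity(group, cpus)
    previous = GROUP_AFFINITY()
    ok = kernel32.SetThreadGroupAffinity(
        kernel32.GetCurrentThread(), ctypes.byref(new), ctypes.byref(previous)
    )
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return previous
```

For example, move_current_thread(1, range(36)) would ask Windows to run the calling thread on the first 36 cores of group 1, and would fail with an error if that group does not exist.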
Once a thread is assigned to a group other than the application’s default, the application becomes multi-group. However, new threads will continue to use the default group assigned when the application started.
When an application’s threads span multiple processor groups, there can be confusion around the application’s CPU affinity mask since it only applies to the default processor group assigned when the application started.
Processor groups often contain fewer than 64 cores, as they cannot span NUMA nodes and must be equally divided. It would be bizarre to have processor groups of different sizes, since applications would get more or fewer cores depending on the processor group they were assigned.
Therefore, on a system with 72 logical CPU cores, divided into two NUMA nodes, two processor groups are created, each having 36 logical CPU cores.
This creates an interesting scenario where a Windows system with up to 64 cores may allow an application access to more cores than does a system with greater than 64 cores.
To demonstrate the impact, imagine an application unaware of processor groups, aptly named UnawareOfGroupsApp.exe.
72 TOTAL CPU CORES
Group #0: 36 CPU cores
Group #1: 36 CPU cores
UnawareOfGroupsApp.exe has default access to 36 CPU cores

48 TOTAL CPU CORES
Group #0: 48 CPU cores
UnawareOfGroupsApp.exe has default access to 48 CPU cores
As you can see, our group unaware application has access to 48 CPU cores on a 48 core system, but only 36 on a 72 core system!
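The arithmetic behind this comparison can be sketched with a simplified model. It assumes the fewest groups of at most 64 cores, split as evenly as possible, and it ignores NUMA topology entirely. The actual layout is decided by Windows and may differ on real hardware. The function names here are hypothetical.

```python
MAX_GROUP_SIZE = 64  # a processor group holds at most 64 logical cores


def model_group_sizes(total_logical_cores):
    """Simplified model: fewest groups of <= 64 cores, split as evenly as possible.

    Assumption: NUMA topology is ignored. Windows decides the real layout.
    """
    if total_logical_cores < 1:
        raise ValueError("need at least one logical core")
    # Ceiling division without floating point.
    groups = -(-total_logical_cores // MAX_GROUP_SIZE)
    base, extra = divmod(total_logical_cores, groups)
    return [base + (1 if i < extra else 0) for i in range(groups)]


def cores_for_unaware_app(total_logical_cores):
    """A group unaware app is confined to one group; model that as group 0."""
    return model_group_sizes(total_logical_cores)[0]


for total in (48, 64, 72, 96, 128):
    print(total, model_group_sizes(total), cores_for_unaware_app(total))
```

Under this model a group unaware application reaches its maximum of 64 cores only when the total core count is a multiple of 64.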
If this scenario presents a problem, your options are as follows:
- Disable Hyper-Threading/SMT to reduce the logical core count to 64 or fewer, at the cost of an overall reduction in computational capacity. That would allow group unaware processes to use more cores, since the single processor group then includes all available cores.
- Contact the app developer to have them modify it to be group aware.
- Contact Bitsum to see what we can do for you. We can adapt group unaware applications. Update: See groupextend tool below.
Given this design, users should check their applications’ group support before upgrading hardware, and try to choose hardware with logical core counts that are a multiple of 64, since that results in maximum-size processor groups.
MSDN: Processor Groups