The 64 Core Threshold – Processor Groups on Windows
As systems with more than 64 logical CPU cores (aka threads) become common, it is important to understand some fundamental limitations of Windows. When Windows NT was conceived, a 64-bit bitmask was used to represent CPU affinities throughout the system. This seemed like plenty of bit space for the single-core CPUs in use at the time. Now we’ve breached the limits of that bitmask (64 cores), and this is how Microsoft adapted Windows.
Windows Server 2008 R2 introduced processor groups to allow for more than 64 logical CPU cores. Each group has, at most, 64 cores. Existing APIs and system functions could then continue to accept 64-bit affinity bitmasks, as each would implicitly operate on a single processor group. This means that, without programmatic adaptation by application developers, each process is limited to a single processor group of no more than 64 logical cores.
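To make the bitmask limit concrete, here is a minimal Python sketch of how a 64-bit affinity mask encodes CPUs, and why a CPU beyond index 63 has to be addressed as a (group, bit) pair instead. The names mask_for_cpus and to_group_relative are hypothetical helpers for illustration, not Windows API calls, and the sketch assumes every group holds the same fixed number of CPUs.

```python
MASK_BITS = 64  # width of the classic affinity bitmask


def mask_for_cpus(cpus):
    """Build a 64-bit affinity mask with one bit set per CPU index."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < MASK_BITS:
            # CPU 64 and above simply has no bit to occupy.
            raise ValueError(f"CPU {cpu} does not fit in a {MASK_BITS}-bit mask")
        mask |= 1 << cpu
    return mask


def to_group_relative(cpu, group_size):
    """Map a system-wide CPU index to (group, bit within that group).

    Assumption: every group holds exactly `group_size` CPUs, which is a
    simplification of how real systems may be laid out.
    """
    if not 1 <= group_size <= MASK_BITS:
        raise ValueError("group_size must be between 1 and 64")
    return divmod(cpu, group_size)


print(hex(mask_for_cpus([0, 1, 63])))  # 0x8000000000000003
print(to_group_relative(40, 36))       # (1, 4)
```

In this model, system-wide CPU 40 on a machine with 36-core groups becomes bit 4 of group 1, so a 64-bit mask is still enough as long as it is always interpreted relative to one group.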
Often processor groups are composed of fewer than 64 cores, as each will encompass an entire NUMA node for scheduling simplicity and efficiency. This makes sense, as it would not be optimal for a process to use a mix of cores from different NUMA nodes. It would also be bizarre to have processor groups of different sizes, causing new processes to get more or fewer cores depending on the processor group they were assigned. Therefore, on a system with 72 logical cores divided into two NUMA nodes, two processor groups of 36 cores each are defined.
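The arithmetic above can be sketched in a few lines of Python. This is a simplified model of the rule as described here (one group when everything fits in 64 cores, otherwise one group per NUMA node), not the actual algorithm Windows uses; group_layout is a hypothetical name, and the sketch assumes equally sized NUMA nodes of at most 64 cores each.

```python
MAX_GROUP_SIZE = 64  # a processor group never exceeds 64 logical cores


def group_layout(logical_cores, numa_nodes):
    """Return a list of processor group sizes under a simplified model."""
    if logical_cores < 1 or numa_nodes < 1:
        raise ValueError("core and node counts must be positive")
    if logical_cores <= MAX_GROUP_SIZE:
        return [logical_cores]  # everything fits in a single group
    if logical_cores % numa_nodes != 0:
        raise ValueError("this sketch assumes equally sized NUMA nodes")
    per_node = logical_cores // numa_nodes
    if per_node > MAX_GROUP_SIZE:
        raise ValueError("a NUMA node above 64 cores is outside this sketch")
    return [per_node] * numa_nodes  # one group per NUMA node


print(group_layout(72, 2))  # [36, 36]
print(group_layout(48, 2))  # [48]
```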
This creates an interesting scenario where a Windows system with up to 64 cores may allow an application access to more cores than a system with more than 64 cores. For example, on a system with 48 logical cores, each process (say an instance of IIS) would have access to all 48 cores. However, on a system with 72 logical cores, each process can have access to only 36 of those cores!
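As a rough illustration of that comparison, the following Python sketch reports how many cores a process that is not processor-group aware can reach, given the sizes of the groups on the machine. The function cores_visible is hypothetical, and the group sizes are supplied by hand to match the examples in this article rather than queried from a real system.

```python
def cores_visible(group_sizes, assigned_group=0):
    """Cores reachable by a process confined to a single processor group."""
    if not group_sizes:
        raise ValueError("at least one processor group is required")
    return group_sizes[assigned_group]


# Group sizes taken from the examples in the text.
systems = {
    "48 logical cores": [48],
    "72 logical cores": [36, 36],
}

for name, groups in systems.items():
    total = sum(groups)
    visible = cores_visible(groups)
    print(f"{name}: {visible} of {total} cores available to each process")
```

Under these assumptions the larger machine exposes fewer cores to any single group-unaware process (36) than the smaller one does (48).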
If this scenario presents a problem, your options as a user are limited. You could disable Hyper-Threading/SMT to bring the logical core count down to 64 or fewer, at the cost of an overall reduction in computational capacity. That would allow each process to use more cores, since the single processor group would then include all available cores. Unfortunately, from the user side, there is no perfect solution, and the general assumption is that any application that can utilize such a large number of CPU cores will be written to make full use of processor groups (e.g. by assigning threads to specific groups or by forking multiple processes).
Given this design, before you upgrade hardware to have more than 64 logical CPU cores, check your application load to determine if that is really the best option. It would be a shame to have to artificially reduce the core count just to allow any single process access to more cores than would otherwise be available in each processor group.
Process Lasso support for processor groups is coming, but it will never be able to overcome this fundamental limitation where each process has access to only a single group of, at most, 64 logical cores.