Fix an issue with the last level cache values on Linux running on certain AMD Processors by mrsharm · Pull Request #108492 · dotnet/runtime

Fixes: #76290

Problem Details

We recently discovered an issue affecting Unix-based VMs running on certain AMD processors: GetLogicalProcessorCacheSizeFromOS picks up the host's last level cache size rather than the VM's, and that value is used to determine the gen0 budget for both WKS and SVR. The cause is that sysconf, the method we try first in GetLogicalProcessorCacheSizeFromOS, returns the last level cache size of the host machine, whereas the fallback code path reads the value from /sys/devices/system/cpu/cpu0/cache/index{LastLevelCache}/size. Consequently, on the same machine with these AMD processors, the gen0 budgets differ significantly between Unix VMs and Windows, and the GC running on Unix-based VMs is probably using a much larger gen0 budget than expected.

The details from my v16 CPU based DevBox with an AMD EPYC 7763 64 Core Processor running Ubuntu 22.04.3 via WSL are as follows:

  • sysconf returns the last level cache size (L3) of the host machine (AMD EPYC™ 7763) as 256 MB.
    • Can be repro’d on the command line using:
      getconf -a | grep "LEVEL3_CACHE_SIZE" => LEVEL3_CACHE_SIZE 268435456
  • Reading /sys/devices/system/cpu/cpu0/cache/index3/size returns 32 MiB, the same as the result of GetLogicalProcessorCacheSizeFromOS on Windows, which calls the GetLogicalProcessorInformation function (sysinfoapi.h).
    • Can be repro’d on the command line using lscpu => L3: 32 MiB (1 instance)
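The sysfs side of this comparison can also be read programmatically. The following is a minimal Python sketch, not the runtime's code: parse_sysfs_cache_size and read_sysfs_cache_size are hypothetical helper names, and the sketch assumes the sysfs size file holds a number followed by a K, M, or G suffix (a 32 MiB cache would typically appear as 32768K).

```python
from pathlib import Path
from typing import Optional

# Binary multipliers for the unit suffix assumed to appear in the sysfs size file.
_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_sysfs_cache_size(text: str) -> int:
    """Convert a sysfs cache size string such as '32768K' to bytes."""
    value = text.strip()
    if not value:
        raise ValueError("empty cache size string")
    suffix = value[-1].upper()
    if suffix in _SUFFIXES:
        return int(value[:-1]) * _SUFFIXES[suffix]
    return int(value)  # no suffix: assume the value is already in bytes


def read_sysfs_cache_size(index: int, cpu: int = 0) -> Optional[int]:
    """Read one cache index's size from sysfs; return None if it is unavailable."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/cache/index{index}/size")
    try:
        return parse_sysfs_cache_size(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    sysconf_l3 = 268435456  # LEVEL3_CACHE_SIZE reported by getconf in the example above
    sysfs_l3 = read_sysfs_cache_size(3)
    print("sysfs L3:", sysfs_l3, "bytes; sysconf L3:", sysconf_l3, "bytes")
```

Under that format assumption, 32768K parses to 33554432 bytes, one eighth of the 268435456 bytes that sysconf reports on this machine.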

How To Check for the Issue

  1. Get sysconf output: getconf -a | grep "LEVEL"
  2. Get the full output of lscpu.
  3. Check whether the L3 (or, if available, L4) cache size reported by sysconf matches the one reported by lscpu.
  4. If the values are different, the issue exists.
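The check above can be automated by comparing the two command outputs. This is a hedged Python sketch with hypothetical function names (getconf_cache_size, lscpu_cache_size, issue_present). It assumes lscpu prints sizes with KiB, MiB, or GiB units, as in the example earlier, and that the cache has a single instance so the lscpu figure is directly comparable to the getconf one.

```python
import re
from typing import Optional

# Unit multipliers for the size format assumed in lscpu output.
_UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def getconf_cache_size(getconf_text: str, level: int = 3) -> Optional[int]:
    """Return LEVEL<level>_CACHE_SIZE in bytes from `getconf -a` output."""
    pattern = rf"^LEVEL{level}_CACHE_SIZE[ \t]+(\d+)"
    match = re.search(pattern, getconf_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def lscpu_cache_size(lscpu_text: str, level: int = 3) -> Optional[int]:
    """Return the L<level> cache size in bytes from `lscpu` output."""
    # Accepts both "L3: 32 MiB ..." and "L3 cache: 32 MiB ..." line styles.
    pattern = rf"^\s*L{level}(?: cache)?:\s+([\d.]+)\s*(KiB|MiB|GiB)"
    match = re.search(pattern, lscpu_text, re.MULTILINE)
    if not match:
        return None
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def issue_present(getconf_text: str, lscpu_text: str, level: int = 3) -> Optional[bool]:
    """True if the two sizes differ, False if equal, None if either is missing."""
    from_sysconf = getconf_cache_size(getconf_text, level)
    from_lscpu = lscpu_cache_size(lscpu_text, level)
    if from_sysconf is None or from_lscpu is None:
        return None
    return from_sysconf != from_lscpu
```

The two text arguments are the captured outputs of getconf -a and lscpu; pass level=4 on machines that expose an L4 cache.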

Solution

  • By default, with no configuration changes, first try to read the cache information from sysfs and, if that fails, fall back to the heuristic we use to compute the value for the ARM* cases.
  • A new configuration, DOTNET_GCCacheSizeFromSysConf, can be set to 1 to revert to the current behavior, i.e., using sysconf to obtain the last level cache size.
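The selection order described above can be modeled as follows. This is an illustrative Python sketch, not the runtime's implementation: select_last_level_cache_size and its reader callables are hypothetical, the ARM* heuristic is passed in rather than reproduced, and the sketch assumes that when DOTNET_GCCacheSizeFromSysConf is 1, sysconf is tried first and sysfs remains the fallback.

```python
from typing import Callable, Mapping, Optional

SizeReader = Callable[[], Optional[int]]


def select_last_level_cache_size(
    env: Mapping[str, str],
    read_sysfs: SizeReader,
    read_sysconf: SizeReader,
    heuristic: Callable[[], int],
) -> int:
    """Pick a last level cache size in the order described above."""
    # Simplification: only the literal value "1" enables the sysconf path.
    use_sysconf = env.get("DOTNET_GCCacheSizeFromSysConf", "0").strip() == "1"
    # Assumption: opting in restores sysconf-first with sysfs as the fallback.
    readers = [read_sysconf, read_sysfs] if use_sysconf else [read_sysfs]
    for reader in readers:
        size = reader()
        if size:  # treat None and 0 as "not available"
            return size
    return heuristic()
```

Passing the readers as callables keeps the ordering logic separate from the platform-specific lookups, so it can be exercised with the 32 MiB and 256 MB values from the example above without touching the real system.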

Performance Testing

Ran with the following GCPerfSim configurations for SVR: -tc 28 -tagb 100 -tlgb 0 -lohar 0 -pohar 0 -sohsr 100-4000 -lohsr 102400-204800 -pohsr 100-204800 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time

Metric                 Not With SysConf    With SysConf
Peak Heap Size (MB)    339.458             2656.194
% Pause Time in GC     39.0                7.7