* [PATCH v3 0/5] sched/fair: SMT-aware asymmetric CPU capacity
@ 2026-04-23  7:36 Andrea Righi
From: Andrea Righi @ 2026-04-23  7:36 UTC
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Christian Loehle, Koba Ko,
	Felix Abecassis, Balbir Singh, Joel Fernandes, Shrikanth Hegde,
	linux-kernel

This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by introducing
SMT awareness.

= Problem =

Nominal per-logical-CPU capacity can overstate usable compute when an SMT
sibling is busy, because the physical core doesn't deliver its full nominal
capacity to each thread. As a result, several asym-cpu-capacity paths may pick
idle, high-capacity CPUs that are not actually good destinations.

= Solution =

This patch set aligns those paths with a simple rule already used elsewhere:
when SMT is active, prefer fully idle cores and avoid treating partially idle
SMT siblings as full-capacity targets where that would mislead load balance.

Patch set summary:
 - Attach sched_domain_shared to sd_asym_cpucapacity on SD_ASYM_CPUCAPACITY
   systems, so that the has_idle_cores hint is used consistently in the wakeup
   idle scan.
 - Prefer fully-idle SMT cores in asym-capacity idle selection: in the wakeup
   fast path, extend select_idle_capacity() / asym_fits_cpu() so idle
   selection can prefer CPUs on fully idle cores.
 - Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY.
 - Add SIS_UTIL support to select_idle_capacity(): give it the same
   SIS_UTIL-controlled idle-scan mechanism already used by select_idle_cpu().
 - Make the asym CPU capacity idle rank values self-documenting.
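
To make the rule above concrete, here is a minimal userspace sketch of the
idea, not the code from the patches. Everything in it is hypothetical and
exists only for illustration: struct cpu_model, enum idle_rank and its
ordering, rank_cpu(), pick_idle_cpu() and misfit_pull_allowed(). The fit check
is modeled on the ~20% headroom used by the kernel's fits_capacity(), and the
relative order of the two middle ranks is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a logical CPU; not a kernel structure. */
struct cpu_model {
	unsigned long capacity;	/* nominal capacity, 1024 = largest CPU */
	bool idle;		/* this logical CPU is idle */
	bool sibling_busy;	/* at least one SMT sibling runs a task */
};

/* Hypothetical candidate ranks; a higher value is a better destination. */
enum idle_rank {
	RANK_NONE = 0,			/* not idle: never a candidate */
	RANK_IDLE_NO_FIT,		/* idle, but too small for the task */
	RANK_IDLE_FIT_BUSY_CORE,	/* idle and fits, sibling is busy */
	RANK_IDLE_FIT_IDLE_CORE,	/* idle, fits, whole core is idle */
};

/* Task fits if its utilization leaves ~20% headroom on the CPU. */
static bool task_fits(unsigned long util, unsigned long capacity)
{
	return util * 1280 < capacity * 1024;
}

static enum idle_rank rank_cpu(const struct cpu_model *c, unsigned long util)
{
	if (!c->idle)
		return RANK_NONE;
	if (!task_fits(util, c->capacity))
		return RANK_IDLE_NO_FIT;
	return c->sibling_busy ? RANK_IDLE_FIT_BUSY_CORE
			       : RANK_IDLE_FIT_IDLE_CORE;
}

/* Return the index of the best-ranked CPU, or -1 if no CPU is idle. */
static int pick_idle_cpu(const struct cpu_model *cpus, size_t n,
			 unsigned long util)
{
	enum idle_rank best = RANK_NONE;
	int best_cpu = -1;

	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		enum idle_rank r = rank_cpu(&cpus[i], util);

		/* Strictly better only: ties keep the first CPU found. */
		if (r > best) {
			best = r;
			best_cpu = (int)i;
		}
	}
	return best_cpu;
}

/* Assumed misfit rule: only pull onto a bigger CPU on a fully idle core. */
static bool misfit_pull_allowed(const struct cpu_model *src,
				const struct cpu_model *dst)
{
	return dst->idle && !dst->sibling_busy &&
	       dst->capacity > src->capacity;
}
```

With this model, an idle CPU of capacity 973 on a fully idle core is chosen
over an idle CPU of capacity 1024 whose sibling is busy, which is the kind of
decision the series aims for when capacities differ by only a few percent.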

This patch set has been tested on the new NVIDIA Vera Rubin platform, where SMT
is enabled and the firmware exposes small frequency variations (+/-~5%) as
differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set.

Without these patches, performance can degrade by up to ~2x on CPU-intensive
workloads, because the SD_ASYM_CPUCAPACITY idle selection policy does not
account for busy SMT siblings.

Alternative approaches have been evaluated, such as equalizing CPU capacities,
either by exposing uniform values via firmware or by normalizing them in the
kernel, grouping CPUs that fall within a small capacity window (+/-5%).
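
For reference, the in-kernel normalization alternative could look roughly like
the sketch below. This is only one possible reading of "capacity window", not
code that was posted: normalize_capacities() and its window_pct parameter are
hypothetical, and the sketch uses a single window anchored at the largest
capacity.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: raise every capacity that lies within window_pct
 * percent of the largest capacity to that largest value, so that small
 * frequency differences no longer look like asymmetric CPU capacity.
 */
static void normalize_capacities(unsigned long *cap, size_t n,
				 unsigned int window_pct)
{
	unsigned long max = 0;

	assert(cap != NULL || n == 0);
	assert(window_pct <= 100);

	for (size_t i = 0; i < n; i++) {
		if (cap[i] > max)
			max = cap[i];
	}

	for (size_t i = 0; i < n; i++) {
		/* Integer form of: cap[i] >= max * (1 - window_pct / 100) */
		if (cap[i] * 100 >= max * (100 - window_pct))
			cap[i] = max;
	}
}
```

With a 5% window, capacities of 1024, 990 and 973 all become 1024, while a
CPU at 512 keeps its value and the topology remains asymmetric.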

However, the SMT-aware SD_ASYM_CPUCAPACITY approach has shown better results so
far. Improving this policy also seems worthwhile in general, as future platforms
may enable SMT with asymmetric CPU topologies.

Performance results on Vera Rubin, comparing SD_ASYM_CPUCAPACITY (mainline)
against SD_ASYM_CPUCAPACITY + SMT awareness (this series):

- NVBLAS benchblas (one task / SMT core):

 +---------------------------------+--------+
 | Configuration                   | gflops |
 +---------------------------------+--------+
 | ASYM (mainline) + SIS_UTIL      |  5478  |
 | ASYM (mainline) + NO_SIS_UTIL   |  5491  |
 |                                 |        |
 | NO ASYM + SIS_UTIL              |  8912  |
 | NO ASYM + NO_SIS_UTIL           |  8978  |
 |                                 |        |
 | ASYM + SMT + SIS_UTIL           |  9259  |
 | ASYM + SMT + NO_SIS_UTIL        |  9291  |
 +---------------------------------+--------+

- DCPerf MediaWiki (all CPUs):

 +---------------------------------+--------+--------+--------+--------+
 | Configuration                   |   rps  |  p50   |  p95   |  p99   |
 +---------------------------------+--------+--------+--------+--------+
 | ASYM (mainline) + SIS_UTIL      |  7994  |  0.052 |  0.223 |  0.246 |
 | ASYM (mainline) + NO_SIS_UTIL   |  7993  |  0.052 |  0.221 |  0.245 |
 |                                 |        |        |        |        |
 | NO ASYM + SIS_UTIL              |  8113  |  0.067 |  0.184 |  0.225 |
 | NO ASYM + NO_SIS_UTIL           |  8093  |  0.068 |  0.184 |  0.223 |
 |                                 |        |        |        |        |
 | ASYM + SMT + SIS_UTIL           |  8129  |  0.076 |  0.149 |  0.188 |
 | ASYM + SMT + NO_SIS_UTIL        |  8138  |  0.076 |  0.148 |  0.186 |
 +---------------------------------+--------+--------+--------+--------+

In the MediaWiki case SMT awareness has less impact, because all CPUs are in
use for most of the run, but it still appears to help reduce tail latency.

See also:
 - https://lore.kernel.org/lkml/20260324005509.1134981-1-arighi@nvidia.com
 - https://lore.kernel.org/lkml/20260318092214.130908-1-arighi@nvidia.com

Changes in v3:
 - Add SIS_UTIL support to select_idle_capacity() (K Prateek Nayak)
 - Attach sched_domain_shared to sd_asym_cpucapacity (K Prateek Nayak)
 - Add enum for the different fit state (K Prateek Nayak)
 - Update has_idle_cores hint (Vincent Guittot)
 - Link to v2: https://lore.kernel.org/all/20260403053654.1559142-1-arighi@nvidia.com

Changes in v2:
 - Rework SMT awareness logic in select_idle_capacity() (K Prateek Nayak)
 - Drop EAS and find_new_ilb() changes for now
 - Link to v1: https://lore.kernel.org/all/20260326151211.1862600-1-arighi@nvidia.com

Git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arighi/linux.git asym-cpu-capacity-smt

Andrea Righi (2):
      sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
      sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity

K Prateek Nayak (3):
      sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity
      sched/fair: Add SIS_UTIL support to select_idle_capacity()
      sched/fair: Make asym CPU capacity idle rank values self-documenting

 kernel/sched/fair.c     | 107 ++++++++++++++++++++++++++++++++++++++++++++----
 kernel/sched/topology.c |  81 +++++++++++++++++++++++++++++++-----
 2 files changed, 168 insertions(+), 20 deletions(-)
