* [PATCH v4 0/9] sched/topology: Optimize sd->shared allocation
@ 2026-03-12  4:44 K Prateek Nayak
From: K Prateek Nayak @ 2026-03-12  4:44 UTC
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Valentin Schneider, linux-kernel
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Chen Yu,
	Shrikanth Hegde, Li Chen, Gautham R. Shenoy, K Prateek Nayak

Hello folks,

I got distracted for a bit but here is v4 of the series with most of the
feedback on v3 incorporated. Nothing much has changed but if you see
anything you don't like, please let me know and we can discuss how to
best address it.

Background
==========

As discussed at LPC'25, the allocation of per-CPU "sched_domain_shared"
objects for each topology level was found to be unnecessary since only
"sd_llc_shared" is ever used by the scheduler and the rest are either
reclaimed during __sdt_free() or remain allocated without any purpose.

Folks are already optimizing for unnecessary sched domain allocations
with commit f79c9aa446d6 ("x86/smpboot: avoid SMT domain attach/destroy
if SMT is not enabled") removing the SMT level entirely on the x86 side
when it is known that the domain will be degenerated anyway by the
scheduler.

New approach to sd->shared allocation
=====================================

This series goes one step further with the "sched_domain_shared"
allocations by moving them out of "sd_data", which is allocated for
every topology level, and into "s_data", which is allocated once per
partition.
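
To make the difference concrete, here is a minimal userspace sketch (not
kernel code) that only counts objects under the two schemes. The helper
names are hypothetical, and the model assumes that every topology level
of a partition allocates one object per CPU, as described above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, not kernel code: count how many "sched_domain_shared"-like
 * objects a single partition would allocate under each scheme.
 */

/* Per-level scheme: one object per CPU for every topology level. */
static inline size_t shared_objs_per_level(size_t nr_cpus, size_t nr_levels)
{
	assert(nr_cpus > 0 && nr_levels > 0);
	return nr_cpus * nr_levels;
}

/* Per-partition scheme: one object per CPU, allocated once. */
static inline size_t shared_objs_per_partition(size_t nr_cpus)
{
	assert(nr_cpus > 0);
	return nr_cpus;
}
```

With 8 CPUs and 4 topology levels, the toy model gives 32 objects for
the per-level scheme against 8 for the per-partition one.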

"sd->shared" is only allocated for the topmost SD_SHARE_LLC domain and
the topology layer uses the sched domain degeneration path to pass the
reference to the final "sd_llc" domain. Since degeneration of a parent
ensures a 1:1 mapping between its span and that of its child, and since
SD_SHARE_LLC domains never overlap, degeneration of an SD_SHARE_LLC
domain means either that its span is the same as that of its child or
that it only contains a single CPU, making it redundant.
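
The handoff described above can be pictured with a toy userspace model.
All names below (toy_domain, toy_shared, toy_degenerate_parent()) are
hypothetical, spans are reduced to a plain bitmask, and the real
degeneration checks in the topology layer consider more than span
equality, so treat this as a sketch under those assumptions rather than
the actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the kernel's real structures differ. */
struct toy_shared {
	int nr_busy;
};

struct toy_domain {
	unsigned long span;		/* bitmask of CPUs covered */
	struct toy_domain *child;	/* next lower level, or NULL */
	struct toy_shared *shared;	/* set on the topmost LLC domain only */
};

/*
 * If the parent spans exactly the same CPUs as its child, treat it as
 * degenerate: hand its "shared" reference down and return the child as
 * the surviving domain. Otherwise the parent survives untouched.
 *
 * A single-CPU domain without a child has nothing to hand down, so it
 * is simply returned as-is here.
 */
static inline struct toy_domain *toy_degenerate_parent(struct toy_domain *parent)
{
	struct toy_domain *child;

	assert(parent != NULL);
	child = parent->child;

	if (!child || child->span != parent->span)
		return parent;

	if (parent->shared && !child->shared) {
		child->shared = parent->shared;
		parent->shared = NULL;
	}

	return child;
}
```

In this model, a parent and child that both span CPUs 0-3 collapse into
the child, which ends up holding the "shared" reference, while a parent
spanning CPUs 0-7 over a child spanning CPUs 0-3 is left alone.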

Future work
===========

This is an initial optimization for a larger idea to break the global
"nohz.idle_cpus_mask" into per-LLC chunks which would embed itself in
"sd_llc_shared" and bloat the struct. Reducing the overhead of
allocating "sched_domain_shared" now would benefit later by reducing the
temporary memory pressure experienced during sched domain rebuild.

The suggestion to entirely remove per-CPU "sd_llc_shared" from Shrikanth
has been deferred to this future work, once a few more users of
"sd_llc_shared" in CONFIG_NO_HZ_COMMON are converted over to use the
per-CPU "sd_nohz->shared" reference, leaving only {test,set}_idle_cores()
using the per-CPU "sd_llc_shared" references.

Misc cleanups
=============

Since the topology layer also checks for the existence of a valid
"sd->shared" when "sd_llc" is present, the handling of "sd_llc_shared"
can also be simplified when a reference to "sd_llc" is already present
in the scope (Patch 8 and Patch 9).

Patches are based on top of:

  git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core

at commit 54a66e431eea ("sched/headers: Inline raw_spin_rq_unlock()").
---
Changelog v3..v4:

o Collected tags from Chenyu, Shrikanth, and Valentin (Thanks a ton for
  reviewing and testing v3)

o Broke off the "imb_numa_nr" calculation into a separate helper to
  avoid having two "if" conditions - one searching for sd_llc and the
  other for sd_llc's parent. (Valentin)

o Moved the claiming of "d.sds" objects into claim_allocations() to keep
  all those bits in one place. (Shrikanth)

o Elaborated with a comment in Patch 9 on why dereferencing "sd->shared"
  in select_idle_cpu() is safe: its lifetime is tied to "sd_llc"
  dereferenced in the caller, and as long as there exists an RCU
  protected reference to "sd", "sd->shared" is also valid. (Shrikanth)

o Illustrated the problematic case highlighted in Patch 1 with an
  example. (Shrikanth)

o Made a note of the larger optimization to "nohz.idle_cpus_mask" coming
  up in the cover letter that is helped along by this optimization.
  (Peter)

o Rebase and more testing with hotplug and cpuset.

v3: https://lore.kernel.org/lkml/20260120113246.27987-1-kprateek.nayak@amd.com/


Changelog rfc v2..v3:

o Broke off the "sd->shared" assignment optimization into a separate
  series for easier review.

o Spotted a case of incorrect calculation of load balancing periods
  in presence of cpuset partitions (Patch 1).

o Broke off the single "sd->shared" assignment optimization patch into
  3 parts for easier review (Patch 2 - Patch 4). The "Reviewed-by:" tag
  from Gautham was dropped as a result.

o Building on recent effort from Peter to remove the superfluous usage
  of rcu_read_lock() in !preemptible() regions, Patch 5 and Patch 6
  clean up the fair task's wakeup path before adding more cleanups in
  Patch 7 and Patch 8.

o Dropped the RFC tag.

v2: https://lore.kernel.org/lkml/20251208083602.31898-1-kprateek.nayak@amd.com/
---
K Prateek Nayak (9):
  sched/topology: Compute sd_weight considering cpuset partitions
  sched/topology: Extract "imb_numa_nr" calculation into a separate
    helper
  sched/topology: Allocate per-CPU sched_domain_shared in s_data
  sched/topology: Switch to assigning "sd->shared" from s_data
  sched/topology: Remove sched_domain_shared allocation with sd_data
  sched/core: Check for rcu_read_lock_any_held() in idle_get_state()
  sched/fair: Remove superfluous rcu_read_lock() in the wakeup path
  sched/fair: Simplify the entry condition for update_idle_cpu_scan()
  sched/fair: Simplify SIS_UTIL handling in select_idle_cpu()

 include/linux/sched/topology.h |   1 -
 kernel/sched/fair.c            |  70 ++++-----
 kernel/sched/sched.h           |   2 +-
 kernel/sched/topology.c        | 263 +++++++++++++++++++++------------
 4 files changed, 199 insertions(+), 137 deletions(-)


base-commit: 54a66e431eeacf23e1dc47cb3507f2d0c068aaf0
-- 
2.34.1


Thread overview: 56+ messages
2026-03-12  4:44 [PATCH v4 0/9] sched/topology: Optimize sd->shared allocation K Prateek Nayak
2026-03-12  4:44 ` [PATCH v4 1/9] sched/topology: Compute sd_weight considering cpuset partitions K Prateek Nayak
2026-03-12  9:34   ` Peter Zijlstra
2026-03-12  9:59     ` K Prateek Nayak
2026-03-12 10:01       ` Peter Zijlstra
2026-03-12 10:09         ` K Prateek Nayak
2026-03-18  8:08   ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-20 23:58     ` Nathan Chancellor
2026-03-21  3:36       ` K Prateek Nayak
2026-03-21  7:33         ` Chen, Yu C
2026-03-21  7:47           ` Chen, Yu C
2026-03-21  8:59             ` K Prateek Nayak
2026-03-21  9:45               ` K Prateek Nayak
2026-03-21 10:13                 ` K Prateek Nayak
2026-03-21 12:48                   ` Chen, Yu C
2026-03-24  2:54                     ` K Prateek Nayak
2026-03-21 14:13                   ` Shrikanth Hegde
2026-03-21 15:14                     ` K Prateek Nayak
2026-03-21 16:38       ` [PATCH] sched/topology: Initialize sd_span after assignment to *sd K Prateek Nayak
2026-03-23  9:08         ` Shrikanth Hegde
2026-03-23 17:34           ` K Prateek Nayak
2026-03-23  9:36         ` Peter Zijlstra
2026-03-23 13:24           ` Jon Hunter
2026-03-23 15:36           ` Chen, Yu C
2026-03-23 17:24           ` K Prateek Nayak
2026-03-23 22:41           ` Nathan Chancellor
2026-03-24  9:10           ` [tip: sched/core] sched/topology: Fix sched_domain_span() tip-bot2 for Peter Zijlstra
2026-03-12  4:44 ` [PATCH v4 2/9] sched/topology: Extract "imb_numa_nr" calculation into a separate helper K Prateek Nayak
2026-03-12 13:37   ` kernel test robot
2026-03-12 15:42     ` K Prateek Nayak
2026-03-12 16:02       ` Peter Zijlstra
2026-03-16  0:18   ` Dietmar Eggemann
2026-03-16  3:41     ` K Prateek Nayak
2026-03-16  8:24       ` Dietmar Eggemann
2026-03-16  8:50         ` K Prateek Nayak
2026-03-18  8:08   ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-12  4:44 ` [PATCH v4 3/9] sched/topology: Allocate per-CPU sched_domain_shared in s_data K Prateek Nayak
2026-03-18  8:08   ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-12  4:44 ` [PATCH v4 4/9] sched/topology: Switch to assigning "sd->shared" from s_data K Prateek Nayak
2026-03-18  8:08   ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-12  4:44 ` [PATCH v4 5/9] sched/topology: Remove sched_domain_shared allocation with sd_data K Prateek Nayak
2026-03-18  8:08   ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-12  4:44 ` [PATCH v4 6/9] sched/core: Check for rcu_read_lock_any_held() in idle_get_state() K Prateek Nayak
2026-03-12  9:46   ` Peter Zijlstra
2026-03-12 10:06     ` K Prateek Nayak
2026-03-18  8:08   ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-12  4:44 ` [PATCH v4 7/9] sched/fair: Remove superfluous rcu_read_lock() in the wakeup path K Prateek Nayak
2026-03-15 23:36   ` Dietmar Eggemann
2026-03-16  3:19     ` K Prateek Nayak
2026-03-18  8:08     ` [tip: sched/core] PM: EM: Switch to rcu_dereference_all() in " tip-bot2 for Dietmar Eggemann
2026-03-18  8:08   ` [tip: sched/core] sched/fair: Remove superfluous rcu_read_lock() in the " tip-bot2 for K Prateek Nayak
2026-03-12  4:44 ` [PATCH v4 8/9] sched/fair: Simplify the entry condition for update_idle_cpu_scan() K Prateek Nayak
2026-03-18  8:08   ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-12  4:44 ` [PATCH v4 9/9] sched/fair: Simplify SIS_UTIL handling in select_idle_cpu() K Prateek Nayak
2026-03-18  8:08   ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-03-16  0:22 ` [PATCH v4 0/9] sched/topology: Optimize sd->shared allocation Dietmar Eggemann
