* [PATCH 0/5] sched/fair: Allow account_cfs_rq_runtime() to throttle current hierarchy
@ 2026-05-28  9:48 K Prateek Nayak
From: K Prateek Nayak @ 2026-05-28  9:48 UTC
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Aaron Lu, Josh Don, K Prateek Nayak,
	linux-kernel

The current hierarchy is always throttled in __schedule() during the
pick, when update_curr() detects a cfs_rq running out of bandwidth and
issues a resched.

This was necessary prior to per-task throttling, where the entire
throttled hierarchy was dequeued at the point of first throttle during
the pick. With per-task throttling, tasks continue to run as usual
until they exit to userspace, dequeuing themselves one by one until the
hierarchy is deemed fully throttled and PELT is frozen.

throttle_cfs_rq() is now simply a propagator of throttle indicators and
nothing more.

Unify the throttling for current hierarchy under
account_cfs_rq_runtime() which is responsible for the time accounting.
If the bandwidth runs out, account_cfs_rq_runtime() will request
sched_cfs_bandwidth_slice() worth of runtime and mark the hierarchy as
throttled if it fails to grab bandwidth.
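
The accounting flow described above can be sketched as a small userspace
toy model. This is not the kernel implementation: the toy_* types and
functions are hypothetical names made up for illustration, and locking,
periods and the cgroup hierarchy are all left out. It only shows the
"charge, try to refill by one slice, mark throttled on failure" order of
operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy types; not the kernel's cfs_bandwidth or cfs_rq. */
struct toy_pool {
	int64_t runtime;		/* runtime left in the global pool (ns) */
	int64_t slice;			/* refill target per request (ns) */
};

struct toy_cfs_rq {
	int64_t runtime_remaining;	/* local runtime left (ns) */
	bool throttled;			/* set once a refill fails */
};

/*
 * Try to top the local runtime up to one slice from the pool.
 * Returns true if the local runtime is positive afterwards.
 */
static bool toy_assign_runtime(struct toy_pool *pool, struct toy_cfs_rq *cfs_rq)
{
	int64_t want = pool->slice - cfs_rq->runtime_remaining;
	int64_t got = want < pool->runtime ? want : pool->runtime;

	pool->runtime -= got;
	cfs_rq->runtime_remaining += got;
	return cfs_rq->runtime_remaining > 0;
}

/* Charge delta ns; refill when out of runtime; throttle if refill fails. */
static void toy_account_runtime(struct toy_pool *pool,
				struct toy_cfs_rq *cfs_rq, int64_t delta)
{
	assert(delta >= 0);

	cfs_rq->runtime_remaining -= delta;
	if (cfs_rq->runtime_remaining > 0)
		return;

	if (!toy_assign_runtime(pool, cfs_rq))
		cfs_rq->throttled = true;
}
```

In this model, the throttle decision is made at the accounting site
itself, so no separate pass is needed later to discover that the
runtime ran out.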

throttle_cfs_rq() will do a task_throttle_setup_work() if it finds the
current task to be on a throttled hierarchy, and the task will naturally
dequeue itself when it exits to userspace without needing an explicit
resched.
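
The deferred dequeue described above can be illustrated with a toy
userspace model. It is a sketch under stated assumptions, not the kernel
code: the toy_* names are hypothetical, only the current task is
modelled, and the "work" is reduced to a single flag that is consumed on
a simulated return to userspace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy types; not the kernel's task_struct or cfs_rq. */
struct toy_task {
	bool on_rq;		/* still queued on the hierarchy */
	bool throttle_work;	/* deferred dequeue is pending */
};

struct toy_hier {
	bool throttled;		/* throttle indicator for the hierarchy */
	int nr_queued;		/* tasks still queued on it */
};

/* Mark the hierarchy throttled; if curr runs on it, queue deferred work. */
static void toy_throttle(struct toy_hier *h, struct toy_task *curr)
{
	h->throttled = true;
	if (curr && curr->on_rq)
		curr->throttle_work = true;
}

/*
 * Simulated return-to-userspace path: run the pending work, if any.
 * Returns true if the task dequeued itself.
 */
static bool toy_exit_to_user(struct toy_hier *h, struct toy_task *t)
{
	if (!t->throttle_work)
		return false;

	t->throttle_work = false;
	if (!h->throttled || !t->on_rq)
		return false;

	assert(h->nr_queued > 0);
	t->on_rq = false;
	h->nr_queued--;
	return true;
}
```

Between toy_throttle() and toy_exit_to_user() the task stays queued and
keeps running, which is the property that makes an explicit resched
unnecessary in this model.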

The first four patches are cleanups and preparation for the final patch,
provided by Peter in [1], which switches over to using
account_cfs_rq_runtime() for throttling.

Following are the results of running hackbench 3 levels deep with the
setup from the "Testing" section of [2], compared to tip:sched/core:

  kernel        :  tip        tip + series

  Min           : 207.33        202.20
  Max           : 210.20        222.47
  Median        : 207.83        218.33
  AMean         : 208.29        215.36
  GMean         : 208.29        215.25
  HMean         : 208.29        215.13
  AMean Stddev  : 1.02          7.37
  AMean CoefVar : 0.49 pct      3.42 pct

  All numbers are in seconds.
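
For reading the table above, the following sketch shows how the derived
rows relate to each other. It assumes that "AMean CoefVar" is the
standard deviation divided by the arithmetic mean, expressed in percent;
that assumption reproduces the 0.49 pct and 3.42 pct rows from the
Stddev and AMean rows. The helper names are hypothetical.

```c
#include <assert.h>

/* Coefficient of variation in percent: stddev relative to the mean. */
static double coef_var_pct(double stddev, double mean)
{
	assert(mean != 0.0);
	return stddev / mean * 100.0;
}

/* Relative change of 'val' against 'base' in percent (positive = larger). */
static double rel_change_pct(double base, double val)
{
	assert(base != 0.0);
	return (val - base) / base * 100.0;
}
```

With the AMean row as input, rel_change_pct(208.29, 215.36) comes out
at roughly 3.4 pct. Since all numbers are in seconds, a positive value
means a longer runtime with the series applied.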

There is slight boot-to-boot variation for this benchmark, but the
utilization numbers in top are more or less similar between the two.
Additional testing and feedback are always appreciated :-)

Patches are based on tip:sched/core at commit 9e005ed21152
("sched/topology: Allow multiple domains to claim sched_domain_shared").
All testing was done on a dual-socket 4th Generation EPYC system (2 x
128C/256T). CONFIG_CFS_BANDWIDTH=n was only build-tested.

Patches also cleanly apply on top of Zecheng's optimization from [3]
when applied on top of the same base. Peter, there is only one trivial
conflict with sched/flat, and Zecheng's optimization is generally
beneficial for deep hierarchies even with flattened pick.

References
==========

[1] https://lore.kernel.org/lkml/20260512110932.GB1889694@noisy.programming.kicks-ass.net/
[2] https://lore.kernel.org/lkml/20250220093257.9380-1-kprateek.nayak@amd.com/
[3] https://lore.kernel.org/lkml/20260522141623.600235-4-zli94@ncsu.edu/

---
K Prateek Nayak (4):
  sched/fair: Convert cfs bandwidth throttling to use guards
  sched/fair: Use throttled_csd_list for local unthrottle
  sched/fair: Call update_curr() before unthrottling the hierarchy
  sched/fair: Move the throttled tasks to a local list in
    tg_unthrottle_up()

Peter Zijlstra (1):
  sched/fair: Unify cfs_rq throttling via account_cfs_rq_runtime()

 kernel/sched/fair.c | 342 +++++++++++++++++++++++---------------------
 1 file changed, 181 insertions(+), 161 deletions(-)


base-commit: 9e005ed21152d4a4bb0ceea71045ff8a642a6feb
-- 
2.34.1


Thread overview: 21+ messages
2026-05-28  9:48 [PATCH 0/5] sched/fair: Allow account_cfs_rq_runtime() to throttle current hierarchy K Prateek Nayak
2026-05-28  9:48 ` [PATCH 1/5] sched/fair: Convert cfs bandwidth throttling to use guards K Prateek Nayak
2026-05-28 21:46   ` Benjamin Segall
2026-05-28  9:48 ` [PATCH 2/5] sched/fair: Use throttled_csd_list for local unthrottle K Prateek Nayak
2026-05-28 21:53   ` Benjamin Segall
2026-05-28  9:48 ` [PATCH 3/5] sched/fair: Call update_curr() before unthrottling the hierarchy K Prateek Nayak
2026-05-28 22:03   ` Benjamin Segall
2026-06-01  3:52   ` Aaron Lu
2026-06-01  5:50     ` K Prateek Nayak
2026-06-01 11:27       ` Peter Zijlstra
2026-06-02  6:33         ` K Prateek Nayak
2026-05-28  9:48 ` [PATCH 4/5] sched/fair: Move the throttled tasks to a local list in tg_unthrottle_up() K Prateek Nayak
2026-05-28 22:14   ` Benjamin Segall
2026-05-28  9:48 ` [PATCH 5/5] sched/fair: Unify cfs_rq throttling via account_cfs_rq_runtime() K Prateek Nayak
2026-05-28 22:44   ` Benjamin Segall
2026-06-01 13:48   ` Peter Zijlstra
2026-06-02  7:01     ` K Prateek Nayak
2026-06-02  8:32       ` Peter Zijlstra
2026-06-02  8:57         ` K Prateek Nayak
2026-05-28 11:45 ` [PATCH 0/5] sched/fair: Allow account_cfs_rq_runtime() to throttle current hierarchy Peter Zijlstra
2026-06-01  6:18 ` Aaron Lu