Sched_ext development
* [PATCHSET v3 sched_ext/for-7.2] sched_ext: Auto-manage ext/fair dl_server bandwidth
From: Andrea Righi @ 2026-05-26 16:42 UTC (permalink / raw)
  To: Tejun Heo, David Vernet, Changwoo Min, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Christian Loehle, Phil Auld,
	Koba Ko, Joel Fernandes, Richard Cheng, Cheng-Yang Chou,
	sched-ext, linux-kernel

Currently, a fixed bandwidth is reserved at boot for both the fair and ext
deadline servers, and this reservation remains unchanged unless explicitly
modified via debugfs. As a result, both servers permanently contribute to global
bandwidth accounting, regardless of whether a BPF scheduler is active.

While unused bandwidth can still be reclaimed at runtime by other classes, this
static reservation prevents RT from fully utilizing the available headroom when
either the sched_ext or the fair class is guaranteed to be inactive (for
example, when no BPF scheduler is loaded, or when sched_ext runs in full mode
and replaces fair).

As discussed at the VIII OSPM summit in Cambridge [1], a better solution would
be to dynamically register and unregister deadline server bandwidth based on the
active sched_ext state. This allows the kernel to automatically enable bandwidth
accounting only for the scheduling class that is currently active, while
disabling it for inactive ones.
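
As a rough illustration of the intended state transitions, here is a minimal
userspace model (not the kernel code in this series). The struct and function
names are hypothetical; BW_SHIFT and the 50ms/1s default reservation are
assumptions modeled on kernel/sched/deadline.c.

```c
#include <stdint.h>

#define BW_SHIFT 20	/* fixed-point shift, assumed to match the kernel */

/* Hypothetical model of the three sched_ext states relevant here. */
enum ext_state { EXT_DISABLED, EXT_PARTIAL, EXT_FULL };

struct server_model {
	uint64_t runtime_ns;
	uint64_t period_ns;
	int attached;		/* contributes to total_bw when non-zero */
};

struct bw_model {
	struct server_model fair;
	struct server_model ext;
	uint64_t total_bw;	/* models the root-domain total_bw */
};

/* runtime/period as a fixed-point fraction of one CPU. */
static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	if (!period)
		return 0;
	return (runtime << BW_SHIFT) / period;
}

/* Add or remove a server's reservation, only on an actual transition. */
static void set_attached(struct bw_model *m, struct server_model *s, int want)
{
	uint64_t bw = to_ratio(s->period_ns, s->runtime_ns);

	if (want && !s->attached)
		m->total_bw += bw;
	else if (!want && s->attached)
		m->total_bw -= bw;
	s->attached = want;
}

/*
 * No BPF scheduler: only fair is accounted.
 * Partial mode: both classes can run tasks, so both are accounted.
 * Full mode: sched_ext replaces fair, so only ext is accounted.
 */
static void apply_state(struct bw_model *m, enum ext_state st)
{
	set_attached(m, &m->ext, st != EXT_DISABLED);
	set_attached(m, &m->fair, st != EXT_FULL);
}
```

With two 50ms/1s servers, the model accounts one reservation when sched_ext is
disabled or in full mode and two reservations in partial mode.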

This patch series implements this automatic register/unregister logic. The
sched_ext total_bw kselftest is also updated to validate the behavior across
the different scheduling configurations and to ensure that bandwidth accounting
follows the expected state transitions.
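
A check of this kind needs to collect the accounted bandwidth values from
debugfs output. The helper below is only a sketch of that step, not the code in
total_bw.c: the function name is hypothetical, and the line format (a
"total_bw" label followed by ':' and an integer) is an assumption about the
debugfs text.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan a text stream for lines containing "total_bw" and store the integer
 * that follows the ':' on each such line. Returns the number of values
 * stored, at most max.
 */
static int read_total_bw(FILE *fp, long *values, int max)
{
	char line[256];
	int n = 0;

	while (n < max && fgets(line, sizeof(line), fp)) {
		char *p = strstr(line, "total_bw");

		if (!p)
			continue;
		p = strchr(p, ':');
		if (!p)
			continue;
		if (sscanf(p + 1, "%ld", &values[n]) == 1)
			n++;
	}
	return n;
}
```

A test could call this before and after loading a BPF scheduler and compare the
collected values against the reservation it expects for each configuration.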

[1] https://retis.santannapisa.it/ospm-summit/

Git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arighi/linux.git dl-server-bw-v3

Changes in v3:
 - Don't bypass __dl_overflow() for detached servers in dl_server_apply_params(),
   so that oversized configs are rejected up front (reported by Sashiko)
 - A potential divide-by-zero in dl_server_apply_params() reported by Sashiko
   has been fixed in a separate patch (not introduced by this patch set):
   https://lore.kernel.org/all/20260526100502.575774-1-arighi@nvidia.com/
 - Link to v2: https://lore.kernel.org/all/20260526082954.550958-1-arighi@nvidia.com/
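
To illustrate the first v3 item, the sketch below models an admission check
that also runs for a detached server. It is a simplified userspace model, not
the kernel's dl_server_apply_params(): the struct and function names are
hypothetical, capacity scaling is ignored, and the assumption that a detached
server contributes no old bandwidth to the check is mine.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define BW_SHIFT	20
#define BW_UNIT		(1ULL << BW_SHIFT)	/* 100% of one CPU */

struct rd_model {
	uint64_t limit_bw;	/* per-CPU limit, e.g. 95% of BW_UNIT */
	unsigned int cpus;
	uint64_t total_bw;	/* currently accounted bandwidth */
};

struct srv_model {
	uint64_t bw;		/* configured bandwidth */
	bool attached;		/* accounted in total_bw? */
};

/* True if replacing old_bw with new_bw would exceed the limit. */
static bool bw_overflow(const struct rd_model *rd, uint64_t old_bw,
			uint64_t new_bw)
{
	return rd->limit_bw * rd->cpus < rd->total_bw - old_bw + new_bw;
}

/*
 * Apply new parameters. The overflow check runs whether or not the server
 * is attached, so an oversized config fails here instead of at attach time.
 */
static int apply_params(struct rd_model *rd, struct srv_model *s,
			uint64_t new_bw)
{
	uint64_t old_bw = s->attached ? s->bw : 0;

	if (bw_overflow(rd, old_bw, new_bw))
		return -EBUSY;

	if (s->attached)
		rd->total_bw = rd->total_bw - old_bw + new_bw;
	s->bw = new_bw;		/* preserved while detached */
	return 0;
}
```

In this model, a detached server that asks for 100% of a CPU under a 95% limit
is rejected immediately, while a request that fits is stored without touching
total_bw until the server is attached.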

Changes in v2:
 - Rework the sched_ext enable path as suggested by Peter: attach ext_server
   before committing the scheduler switch and fail the enable if admission
   control rejects the reservation; detach fair_server only after a successful
   full-mode switch.
 - Added dl_server_swap_bw() for the disable/recovery path so ext_server detach
   and fair_server reattach happen under the same dl_b->lock, closing the
   window where concurrent SCHED_DEADLINE admission could steal the freed
   bandwidth (reported by Sashiko).
 - Fixed the attach/detach accounting issue reported by Sashiko by updating
   rq->dl.this_bw together with root-domain total_bw, draining active or
   non-contending servers before detach and preventing detached servers from
   starting.
 - Reused dl_rq_change_utilization() to drain the server, so the detach path
   goes through the same machinery as dl_server_apply_params().
 - Made root-domain accounting honor the same cpu_active() conditions used by
   root-domain rebuilds, while preserving runtime/period updates made while a
   server is detached.
 - Fixed the total_bw selftest issues reported by Sashiko: check fclose()
   errors for debugfs writes, preserve per-CPU fair_server runtime values, and
   restore all CPUs on cleanup even if one write fails.
 - Link to v1: https://lore.kernel.org/all/20260521174509.1534623-1-arighi@nvidia.com/
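
The race closed by dl_server_swap_bw() can be illustrated with a small
userspace model. This is a sketch under stated assumptions, not the kernel
implementation: a pthread mutex stands in for dl_b->lock, and the struct and
function names are hypothetical.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

struct dl_bw_model {
	pthread_mutex_t lock;	/* stands in for dl_b->lock */
	uint64_t limit_bw;
	uint64_t total_bw;
};

/* Models a concurrent SCHED_DEADLINE admission attempt. */
static int admit(struct dl_bw_model *b, uint64_t bw)
{
	int ret = 0;

	pthread_mutex_lock(&b->lock);
	if (b->total_bw + bw > b->limit_bw)
		ret = -EBUSY;
	else
		b->total_bw += bw;
	pthread_mutex_unlock(&b->lock);
	return ret;
}

/*
 * Release out_bw and claim in_bw in one critical section. No other
 * admission can observe total_bw with out_bw freed but in_bw not yet
 * claimed, so the freed bandwidth cannot be taken in between.
 */
static int swap_bw(struct dl_bw_model *b, uint64_t out_bw, uint64_t in_bw)
{
	int ret = 0;

	pthread_mutex_lock(&b->lock);
	if (b->total_bw - out_bw + in_bw > b->limit_bw)
		ret = -EBUSY;
	else
		b->total_bw = b->total_bw - out_bw + in_bw;
	pthread_mutex_unlock(&b->lock);
	return ret;
}
```

Doing the same work as a detach followed by a separate attach would drop the
lock between the two steps, which is the window described in the second item
above.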

Andrea Righi (2):
      sched_ext: Auto-register/unregister dl_server reservations
      selftests/sched_ext: Validate dl_server attach/detach in total_bw test

 include/linux/sched.h                        |   6 +
 kernel/sched/deadline.c                      | 204 ++++++++++++++++++++++++++-
 kernel/sched/ext.c                           |  71 ++++++++++
 kernel/sched/sched.h                         |   4 +
 tools/testing/selftests/sched_ext/total_bw.c | 201 +++++++++++++++++++++++++-
 5 files changed, 478 insertions(+), 8 deletions(-)

Thread overview: 12+ messages
2026-05-26 16:42 [PATCHSET v3 sched_ext/for-7.2] sched_ext: Auto-manage ext/fair dl_server bandwidth Andrea Righi
2026-05-26 16:42 ` [PATCH 1/2] sched_ext: Auto-register/unregister dl_server reservations Andrea Righi
2026-05-26 17:14   ` sashiko-bot
2026-05-28 11:36   ` Peter Zijlstra
2026-05-28 16:13     ` Andrea Righi
2026-05-26 16:42 ` [PATCH 2/2] selftests/sched_ext: Validate dl_server attach/detach in total_bw test Andrea Righi
2026-05-26 17:33   ` sashiko-bot
2026-05-27 12:36 ` [PATCHSET v3 sched_ext/for-7.2] sched_ext: Auto-manage ext/fair dl_server bandwidth Juri Lelli
2026-05-28 11:33   ` Peter Zijlstra
2026-05-28 16:13     ` Andrea Righi
2026-05-28 15:53 ` Tejun Heo
2026-05-29  9:08   ` Peter Zijlstra
