From: Yuri Andriaccio <yurand2000@gmail.com>
To: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>
Cc: linux-kernel@vger.kernel.org,
Luca Abeni <luca.abeni@santannapisa.it>,
Yuri Andriaccio <yuri.andriaccio@santannapisa.it>
Subject: [RFC PATCH v5 00/29] Hierarchical Constant Bandwidth Server
Date: Thu, 30 Apr 2026 23:38:04 +0200
Message-ID: <20260430213835.62217-1-yurand2000@gmail.com>
Hello,
This is v5 of the Hierarchical Constant Bandwidth Server (HCBS) patchset, which
aims to replace the current RT_GROUP_SCHED mechanism with something more robust
and theoretically sound. The patchset has been presented at OSPM25 and OSPM26
(https://retis.sssup.it/ospm-summit/), and a summary of its inner workings can
be found at https://lwn.net/Articles/1021332/ . Links to the previous versions
of this patchset are at the bottom of this message; version 1 in particular
describes in more detail what this patchset is about and how it is
implemented.
This version addresses the reviewers' comments and introduces the following
notable changes:
- Update to kernel version 7.0.
- General refactoring and cleanups, extensive use of lock guards for cleaner
code.
- Add missing RCU read sections in deadline.c and rt.c code.
- Include fix for non-deferred deadline server logic (Patch 1).
- Account HCBS deadline servers along with all the active tasks when the servers
are active. This ensures correct behaviour for servers that are just
replenished but have no tasks to run.
- Update and reuse __checkparam_dl to also check for HCBS servers' parameters.
- Update default sysctl_sched_rt_runtime to 1s, the same as
sysctl_sched_rt_period. These parameters only manage the bandwidth of deadline
tasks and servers, not the actual parameters of the fair (and ext) servers.
- Add early release of cgroup resources in unregister_rt_sched_group, reducing
the number of RCU grace periods to wait for before the reserved deadline
bandwidth is released from two to one.
- Remove rt_server_try_pull, as it is now possible to pull tasks directly in
rt_server_pick on server replenishment.
- Remove dl_server_stop call when emptying a cgroup's runqueue, as the server is
nonetheless stopped on the next server pick (if the pull operation fails).
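
As a rough illustration of the sysctl_sched_rt_runtime change listed above, the
following sketch computes the runtime/period ratio in fixed point. It assumes
the 20-bit scaling (BW_SHIFT) that the deadline code uses for bandwidth values
and the previous 950000us default; bw_ratio is a hypothetical helper written
for this example, not a function from the patchset.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point shift for bandwidth values (assumed to mirror the kernel's BW_SHIFT). */
#define BW_SHIFT 20
#define BW_UNIT  (1ULL << BW_SHIFT)

/*
 * Hypothetical helper: return runtime/period as a fixed-point fraction
 * with BW_SHIFT fractional bits. Both arguments are in microseconds.
 */
static uint64_t bw_ratio(uint64_t period_us, uint64_t runtime_us)
{
	assert(period_us > 0);
	return (runtime_us << BW_SHIFT) / period_us;
}
```

With a 1s period, a 1s runtime yields the full unit bandwidth (BW_UNIT), whereas
a 950000us runtime yields 996147, i.e. about 95% of it.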
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Summary of the patches:
1) Replenishment logic fix for non-deferred deadline servers
2-5) Preparation patches, so that the RT classes' code can be used both
for normal and cgroup scheduling.
6-17) Implementation of HCBS, no migration and only one level hierarchy.
The old RT_GROUP_SCHED code is removed.
18-19) Remove cgroups v1 in favour of v2.
20) Add support for deeper hierarchies.
21) Update default bandwidth for deadline entities.
22-26) Add support for task migration.
27) Documentation for HCBS.
28-29) Optional debug BUG_ON patches.
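
For readers unfamiliar with the underlying mechanism, here is a minimal sketch
of the textbook CBS budget rule that deadline servers build on: when a server
exhausts its budget, its deadline is postponed by one period and the budget is
recharged. The struct and function names are hypothetical and chosen for this
example; the sketch does not reproduce the deferred/non-deferred handling that
patch 1 fixes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified server state; all times are in nanoseconds. */
struct cbs_server {
	int64_t  runtime;     /* remaining budget; may go negative on overrun */
	uint64_t deadline;    /* absolute scheduling deadline */
	int64_t  dl_runtime;  /* budget granted per period */
	uint64_t dl_period;   /* replenishment period */
};

/* Charge delta_ns of execution time to the server's budget. */
static void cbs_consume(struct cbs_server *s, int64_t delta_ns)
{
	s->runtime -= delta_ns;
}

/*
 * Recharge an exhausted server: each recharge of dl_runtime postpones the
 * deadline by dl_period, so the long-run bandwidth never exceeds
 * dl_runtime / dl_period. The loop absorbs overruns larger than one budget.
 */
static void cbs_replenish(struct cbs_server *s)
{
	assert(s->dl_runtime > 0 && s->dl_period > 0);
	while (s->runtime <= 0) {
		s->deadline += s->dl_period;
		s->runtime += s->dl_runtime;
	}
}
```

A server that still has budget left is not touched by cbs_replenish, which is
what keeps the reservation from being handed out more often than once per
period.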
Updates from v4:
- Rebase to latest tip/master.
- General rebasing/cleanup.
- Update default sysctl_sched_rt_runtime to 1s, same as the period.
- Fix non-deferred deadline server replenishment logic.
- Add missing RCU read sections.
- Account HCBS servers along with their tasks when the servers are active.
- Release bandwidth resources early in unregister_rt_sched_group.
- Drop server_try_pull_task as it is now redundant.
- Remove dl_server_stop call in dequeue_task_rt.
- Update to reuse __checkparam_dl for deadline servers.
Updates from v3:
- Rebase to latest tip/master.
- General rebasing/cleanup.
- Add Documentation.
- Define **live** and **active** groups.
- Introduce server_try_pull_task in place of the removed server_has_task.
- Introduce RELEASE_LOCK helper macro for guard-based locking.
- Update inc/dec_dl_tasks to account for served runqueues regardless of the
server type.
- Fix computing of new bandwidth values in dl_init_tg.
- Fix check in dl_check_tg to use capacity scaling.
- Fix wakeup_preempt_rt to check if curr is a DEADLINE task.
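
To illustrate what capacity scaling means in an admission check such as the one
mentioned for dl_check_tg above, the sketch below scales the bandwidth limit by
CPU capacity before comparing it with the requested total. It assumes the
scheduler's convention that a full-capacity CPU is 1024 (a capacity shift of
10); bw_fits and its parameters are hypothetical and are not taken from the
patchset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capacity of a full-speed CPU is 1 << 10 = 1024 (assumed scheduler convention). */
#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1ULL << SCHED_CAPACITY_SHIFT)

/* Scale a value by a capacity expressed relative to SCHED_CAPACITY_SCALE. */
static uint64_t cap_scale(uint64_t v, uint64_t cap)
{
	return (v * cap) >> SCHED_CAPACITY_SHIFT;
}

/*
 * Hypothetical admission test: does adding new_bw to the already reserved
 * total_bw stay within max_bw once max_bw is scaled by the available capacity?
 */
static bool bw_fits(uint64_t max_bw, uint64_t cap, uint64_t total_bw,
		    uint64_t new_bw)
{
	assert(cap > 0);
	return total_bw + new_bw <= cap_scale(max_bw, cap);
}
```

On a CPU with half the reference capacity (512), only half of max_bw can be
reserved, so a request that fits on a full-capacity CPU may be rejected there.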
Updates from v2:
- Rebase to latest tip/master.
- Remove fair-servers' bw reclaiming.
- Fix a check which prevented execution of wakeup_preempt code.
- Fix a priority check in group_pull_rt_task between tasks of different groups.
- Rework allocation/deallocation code for rt-cgroups.
- Update signatures for some group related migration functions.
- Add documentation for wakeup_preempt preemption rules.
Updates from v1:
- Rebase to latest tip/master.
- Add migration code.
- Split big patches for more readability.
- Refactor code to use guarded locks where applicable.
- Remove unnecessary patches from v1 which have been addressed differently by
mainline updates.
- Remove unnecessary checks and general code cleanup.
Notes:
Patch 1 has already been submitted for review at:
https://lore.kernel.org/all/20260420163410.20808-1-yurand2000@gmail.com/
Patches 28-29 are completely optional and are not meant to be included in the
final patchset: they just add some invasive BUG_ONs that assert some
preconditions expected on some function calls.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Testing v5:
The patchset has been tested with a suite of tests tailored to stress all the
implemented functionalities.
The tests are available at https://github.com/Yurand2000/HCBS-Test-Suite .
Refer to the README of the repository for more details.
Follow these steps to test HCBS v5:
- Get the HCBS patch up and running. Any kernel/distro should work.
- Get, compile and _install_ the tests.
- Run the `go_rt.sh` script to set the frequency of the CPUs to a fixed value
and disable hyperthreading and power saving features.
- Run the `run_tests.sh full` script, to run the whole test suite.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Future Work:
We think the current patchset is stable enough. Our current test suite
demonstrates, on our limited hardware, that the kernel does not throw warnings
and that it is actually possible to guarantee time reservations and isolation
among tenants.
In the hope that the pre-migration patches (2-19) have reached a decent final
form, we of course expect comments on the migration related code (22-26) and the
other patches (1,20-21).
Since the changes addressing the latest review comments were already complete,
we decided to release v5 without the multi-CPU feature presented at OSPM26, as
that code is not yet fully tested and cleaned up; we hope to release it in a
future v6 RFC.
Additional future work:
- capacity-aware bandwidth reservation.
- CPU hotplug/hotunplug management.
Have a nice day,
Yuri
v1: https://lore.kernel.org/all/20250605071412.139240-1-yurand2000@gmail.com/
v2: https://lore.kernel.org/all/20250731105543.40832-1-yurand2000@gmail.com/
v3: https://lore.kernel.org/all/20250929092221.10947-1-yurand2000@gmail.com/
v4: https://lore.kernel.org/all/20251201124205.11169-1-yurand2000@gmail.com/
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Yuri Andriaccio (13):
sched/deadline: Fix replenishment logic for non-deferred servers
sched/rt: Disable RT_GROUP_SCHED
sched/rt: Remove unnecessary runqueue pointer in struct rt_rq
sched/rt: Implement dl-server operations for rt-cgroups
sched/rt: Update task event callbacks for HCBS scheduling
sched/rt: Allow zeroing the runtime of the root control group
sched/rt: Remove support for cgroups-v1
sched/rt: Update default bandwidth for real-time tasks to ONE
sched/rt: Try pull task on empty server pick.
sched/core: Execute enqueued balance callbacks after
migrate_disable_switch
Documentation: Update documentation for real-time cgroups
sched/rt: Add debug BUG_ONs for pre-migration code
sched/rt: Add debug BUG_ONs in migration code
luca abeni (16):
sched/deadline: Do not access dl_se->rq directly
sched/deadline: Distinguish between dl_rq and my_q
sched/rt: Pass an rt_rq instead of an rq where needed
sched/rt: Move functions from rt.c to sched.h
sched/rt: Introduce HCBS specific structs in task_group
sched/core: Initialize HCBS specific structures
sched/deadline: Add dl_init_tg
sched/rt: Add {alloc/unregister/free}_rt_sched_group
sched/deadline: Account rt-cgroups bandwidth in deadline tasks
schedulability tests.
sched/rt: Update rt-cgroup schedulability checks
sched/rt: Remove old RT_GROUP_SCHED data structures
sched/core: Cgroup v2 support
sched/deadline: Allow deeper hierarchies of RT cgroups
sched/rt: Add rt-cgroup migration functions
sched/rt: Hook HCBS migration functions
sched/core: Execute enqueued balance callbacks when changing allowed
CPUs
 Documentation/scheduler/sched-rt-group.rst |  504 ++-
 include/linux/rcupdate.h                   |    1 +
 include/linux/sched.h                      |   10 +-
 kernel/sched/autogroup.c                   |    4 +-
 kernel/sched/core.c                        |   74 +-
 kernel/sched/deadline.c                    |  251 +-
 kernel/sched/debug.c                       |    6 -
 kernel/sched/ext.c                         |    4 +-
 kernel/sched/fair.c                        |    4 +-
 kernel/sched/rt.c                          | 3240 ++++++++++----------
 kernel/sched/sched.h                       |  178 +-
 kernel/sched/syscalls.c                    |    9 +-
 12 files changed, 2393 insertions(+), 1892 deletions(-)
base-commit: 028ef9c96e96197026887c0f092424679298aae8
--
2.53.0
Thread overview: 39+ messages
2026-04-30 21:38 Yuri Andriaccio [this message]
2026-04-30 21:38 ` [RFC PATCH v5 01/29] sched/deadline: Fix replenishment logic for non-deferred servers Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 02/29] sched/deadline: Do not access dl_se->rq directly Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 03/29] sched/deadline: Distinguish between dl_rq and my_q Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 04/29] sched/rt: Pass an rt_rq instead of an rq where needed Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 05/29] sched/rt: Move functions from rt.c to sched.h Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 06/29] sched/rt: Disable RT_GROUP_SCHED Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 07/29] sched/rt: Remove unnecessary runqueue pointer in struct rt_rq Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 08/29] sched/rt: Introduce HCBS specific structs in task_group Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 09/29] sched/core: Initialize HCBS specific structures Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 10/29] sched/deadline: Add dl_init_tg Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 11/29] sched/rt: Add {alloc/unregister/free}_rt_sched_group Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 12/29] sched/deadline: Account rt-cgroups bandwidth in deadline tasks schedulability tests Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 13/29] sched/rt: Implement dl-server operations for rt-cgroups Yuri Andriaccio
2026-05-05 13:04 ` Peter Zijlstra
2026-04-30 21:38 ` [RFC PATCH v5 14/29] sched/rt: Update task event callbacks for HCBS scheduling Yuri Andriaccio
2026-05-05 13:16 ` Peter Zijlstra
2026-04-30 21:38 ` [RFC PATCH v5 15/29] sched/rt: Update rt-cgroup schedulability checks Yuri Andriaccio
2026-05-05 14:36 ` Peter Zijlstra
2026-04-30 21:38 ` [RFC PATCH v5 16/29] sched/rt: Allow zeroing the runtime of the root control group Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 17/29] sched/rt: Remove old RT_GROUP_SCHED data structures Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 18/29] sched/core: Cgroup v2 support Yuri Andriaccio
2026-05-05 14:59 ` Peter Zijlstra
2026-04-30 21:38 ` [RFC PATCH v5 19/29] sched/rt: Remove support for cgroups-v1 Yuri Andriaccio
2026-05-05 15:01 ` Peter Zijlstra
2026-04-30 21:38 ` [RFC PATCH v5 20/29] sched/deadline: Allow deeper hierarchies of RT cgroups Yuri Andriaccio
2026-05-05 15:15 ` Peter Zijlstra
2026-05-05 19:56 ` Tejun Heo
2026-04-30 21:38 ` [RFC PATCH v5 21/29] sched/rt: Update default bandwidth for real-time tasks to ONE Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 22/29] sched/rt: Add rt-cgroup migration functions Yuri Andriaccio
2026-05-05 15:20 ` Peter Zijlstra
2026-05-05 15:24 ` Peter Zijlstra
2026-04-30 21:38 ` [RFC PATCH v5 23/29] sched/rt: Hook HCBS " Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 24/29] sched/core: Execute enqueued balance callbacks when changing allowed CPUs Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 25/29] sched/rt: Try pull task on empty server pick Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 26/29] sched/core: Execute enqueued balance callbacks after migrate_disable_switch Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 27/29] Documentation: Update documentation for real-time cgroups Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 28/29] sched/rt: Add debug BUG_ONs for pre-migration code Yuri Andriaccio
2026-04-30 21:38 ` [RFC PATCH v5 29/29] sched/rt: Add debug BUG_ONs in migration code Yuri Andriaccio