From: Yuri Andriaccio <yurand2000@gmail.com>
To: "Ingo Molnar" <mingo@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Juri Lelli" <juri.lelli@redhat.com>,
"Vincent Guittot" <vincent.guittot@linaro.org>,
"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
"Valentin Schneider" <vschneid@redhat.com>,
"Tejun Heo" <tj@kernel.org>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Koutný" <mkoutny@suse.com>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
Luca Abeni <luca.abeni@santannapisa.it>,
Yuri Andriaccio <yuri.andriaccio@santannapisa.it>
Subject: [RFC PATCH v6 15/25] sched/rt: Update task event callbacks for HCBS scheduling
Date: Mon, 8 Jun 2026 14:15:34 +0200
Message-ID: <20260608121546.69910-16-yurand2000@gmail.com>
In-Reply-To: <20260608121546.69910-1-yurand2000@gmail.com>
Update wakeup_preempt_rt(), switched_{from/to}_rt() and prio_changed_rt()
with the preemption rules specific to rt-cgroups:

- In wakeup_preempt_rt(), check whether the woken task is served by a
  deadline server or lives on the global runqueue. The preemption rules
  (documented in the function) depend on the runqueue of the current
  donor and on the runqueue of the woken task:
  - If both tasks are FIFO/RR tasks on the global runqueue, or in the
    same cgroup, run the standard checks.
  - If the woken task is inside a cgroup and the donor is a FIFO/RR task
    on the global runqueue, always preempt. If the donor is a DEADLINE
    task, check whether the woken task's dl_server preempts the donor.
  - If both tasks are FIFO/RR tasks in different served cgroups, check
    whether the woken task's dl_server preempts the donor's dl_server.
- In prio_changed_rt(), if the task is not running, run the preemption
  checks only if the running task is in the same task group as the task
  whose priority changed.

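The wakeup_preempt_rt() rules listed above can be sketched in userspace C
as follows. This is only an illustration, not the kernel code: every toy_*
name is hypothetical, a NULL pointer stands for "FIFO/RR task on the global
runqueue", and the deadline comparison is a simplified stand-in for
dl_entity_preempt(), which may consider more than the absolute deadline.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sched_dl_entity: absolute deadline only. */
struct toy_dl_entity {
	uint64_t deadline;
};

enum toy_action {
	TOY_NO_RESCHED,     /* leave the donor running */
	TOY_RESCHED,        /* the kernel would call resched_curr(rq) here */
	TOY_STANDARD_CHECK, /* fall through to the usual FIFO/RR priority check */
};

/* Simplified EDF rule (assumption): the earlier deadline wins. */
static bool toy_dl_preempts(const struct toy_dl_entity *a,
			    const struct toy_dl_entity *b)
{
	/* Signed difference keeps the comparison valid across wraparound. */
	return (int64_t)(a->deadline - b->deadline) < 0;
}

/*
 * woken: dl_server of the woken task's cgroup, or NULL if the task is on
 *        the global runqueue.
 * donor: dl_server of the donor's cgroup, or the donor's own entity if it
 *        is a DEADLINE task, or NULL if it is a FIFO/RR task on the global
 *        runqueue.
 */
static enum toy_action toy_wakeup_preempt(const struct toy_dl_entity *woken,
					  const struct toy_dl_entity *donor)
{
	if (woken && donor) {
		/* Same cgroup: plain FIFO/RR priority rules apply. */
		if (woken == donor)
			return TOY_STANDARD_CHECK;

		/* Different servers (or DEADLINE donor): compare entities. */
		return toy_dl_preempts(woken, donor) ? TOY_RESCHED
						     : TOY_NO_RESCHED;
	}

	/* Served task wakes up against a global FIFO/RR donor: preempt. */
	if (woken)
		return TOY_RESCHED;

	/* Global FIFO/RR task wakes up against a served or DEADLINE donor. */
	if (donor)
		return TOY_NO_RESCHED;

	/* Both on the global runqueue. */
	return TOY_STANDARD_CHECK;
}
```

For example, toy_wakeup_preempt(&srv, NULL) yields TOY_RESCHED for any
server entity srv, while toy_wakeup_preempt(NULL, &srv) yields
TOY_NO_RESCHED.
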
Update sched_rt_can_attach() to check whether a task can be attached to a
given cgroup. For now, the check only verifies that the group has non-zero
runtime. Remove the tsk argument from sched_rt_can_attach(), as it is no
longer used.

Change cpu_cgroup_can_attach() to check whether the attached task is a
FIFO/RR task before calling sched_rt_can_attach().

Update __sched_setscheduler() to reject a switch to FIFO/RR for a task
inside a cgroup that has no runtime allocated.

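The two admission paths described above can be sketched in userspace C as
follows. This is an illustration under stated assumptions, not the kernel
code: all toy_* names are hypothetical, the group is reduced to a runtime
value and an autogroup flag, and the dl_bandwidth_enabled() condition of
the real __sched_setscheduler() check is left out for brevity.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct task_group. */
struct toy_task_group {
	uint64_t dl_runtime; /* runtime of the group's dl_server, 0 = none */
	bool is_autogroup;
};

/* Stand-in for rt_group_sched_enabled(); assumed enabled in this sketch. */
static bool toy_rt_group_sched = true;

/* Mirrors sched_rt_can_attach(): refuse groups that cannot run RT tasks. */
static int toy_rt_can_attach(const struct toy_task_group *tg)
{
	if (toy_rt_group_sched && tg->dl_runtime == 0)
		return 0;

	return 1;
}

/* cgroup attach path: only FIFO/RR tasks are subject to the check. */
static int toy_cgroup_can_attach(const struct toy_task_group *tg,
				 bool task_is_rt)
{
	if (task_is_rt && !toy_rt_can_attach(tg))
		return -EINVAL;

	return 0;
}

/* Policy change path: autogroups are exempt from the check. */
static int toy_setscheduler_check(const struct toy_task_group *tg,
				  bool new_policy_is_rt)
{
	if (toy_rt_group_sched && new_policy_is_rt &&
	    !toy_rt_can_attach(tg) && !tg->is_autogroup)
		return -EPERM;

	return 0;
}
```

Note the different error codes: the attach path fails with -EINVAL, while
the policy change fails with -EPERM, as in the diff below.
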
Update task_is_throttled_rt() for SCHED_CORE to return the dl_throttled
value of the task's server, if present. Global rt-tasks are never
throttled.

Update the push/pull migration functions to skip rt-cgroup runqueues.
Migration between cgroup runqueues is implemented in later patches.

Co-developed-by: Alessio Balsini <a.balsini@sssup.it>
Signed-off-by: Alessio Balsini <a.balsini@sssup.it>
Co-developed-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Co-developed-by: luca abeni <luca.abeni@santannapisa.it>
Signed-off-by: luca abeni <luca.abeni@santannapisa.it>
Signed-off-by: Yuri Andriaccio <yurand2000@gmail.com>
---
kernel/sched/core.c | 2 +-
kernel/sched/rt.c | 98 ++++++++++++++++++++++++++++++++++++++---
kernel/sched/sched.h | 2 +-
kernel/sched/syscalls.c | 12 +++++
4 files changed, 105 insertions(+), 9 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9e47a02cfaf7..1252f45feda0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9545,7 +9545,7 @@ static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
goto scx_check;
cgroup_taskset_for_each(task, css, tset) {
- if (!sched_rt_can_attach(css_tg(css), task))
+ if (rt_task(task) && !sched_rt_can_attach(css_tg(css)))
return -EINVAL;
}
scx_check:
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 61e9dab894d1..168a92945b4a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -372,6 +372,9 @@ static inline void rt_queue_push_tasks(struct rt_rq *rt_rq)
{
struct rq *rq = global_rq_of_rt_rq(rt_rq);
+ if (is_dl_group(rt_rq))
+ return;
+
if (!has_pushable_tasks(rt_rq))
return;
@@ -382,6 +385,9 @@ static inline void rt_queue_pull_task(struct rt_rq *rt_rq)
{
struct rq *rq = global_rq_of_rt_rq(rt_rq);
+ if (is_dl_group(rt_rq))
+ return;
+
queue_balance_callback(rq, &per_cpu(rt_pull_head, rq->cpu), pull_rt_task);
}
@@ -1031,7 +1037,55 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
{
struct task_struct *donor = rq->donor;
+ struct sched_dl_entity *woken_dl_se = NULL;
+ struct sched_dl_entity *donor_dl_se = NULL;
+
+ if (!rt_group_sched_enabled())
+ goto same_group_sched;
+
+ /*
+ * Preemption checks are different if the waking task and the current donor
+ * are running on the global runqueue or in a cgroup. The following rules
+ * apply:
+ * - dl-tasks (and equally dl_servers) always preempt FIFO/RR tasks.
+ * - if donor is a FIFO/RR task inside a cgroup (i.e. run by a
+ * dl_server), or donor is a DEADLINE task and waking is a FIFO/RR
+ * task on the root cgroup, do nothing.
+ * - if waking is inside a cgroup but donor is a FIFO/RR task in the
+ * root cgroup, always reschedule.
+ * - if they are both on the global runqueue or in the same cgroup, run
+ * the standard code.
+ * - if they are both in a cgroup, but not the same one, check whether the
+ * woken task's dl_server preempts the donor's dl_server.
+ * - if donor is a DEADLINE task and waking is in a cgroup, check whether
+ * the woken task's server preempts donor.
+ */
+ if (is_dl_group(rt_rq_of_se(&p->rt)))
+ woken_dl_se = dl_group_of(rt_rq_of_se(&p->rt));
+ if (is_dl_group(rt_rq_of_se(&donor->rt)))
+ donor_dl_se = dl_group_of(rt_rq_of_se(&donor->rt));
+ else if (task_has_dl_policy(donor))
+ donor_dl_se = &donor->dl;
+
+ if (woken_dl_se != NULL && donor_dl_se != NULL) {
+ if (woken_dl_se == donor_dl_se) {
+ goto same_group_sched;
+ }
+
+ if (dl_entity_preempt(woken_dl_se, donor_dl_se))
+ resched_curr(rq);
+
+ return;
+
+ } else if (woken_dl_se != NULL) {
+ resched_curr(rq);
+ return;
+
+ } else if (donor_dl_se != NULL) {
+ return;
+ }
+same_group_sched:
/*
* XXX If we're preempted by DL, queue a push?
*/
@@ -1055,7 +1109,8 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
* to move current somewhere else, making room for our non-migratable
* task.
*/
- if (p->prio == donor->prio && !test_tsk_need_resched(rq->curr))
+ if (!is_dl_group(rt_rq_of_se(&p->rt)) &&
+ p->prio == donor->prio && !test_tsk_need_resched(rq->curr))
check_preempt_equal_prio(rq, p);
}
@@ -1362,6 +1417,9 @@ static int push_rt_rq_task(struct rt_rq *rt_rq, bool pull)
struct rt_rq *lowest_rt_rq;
int ret = 0;
+ if (is_dl_group(rt_rq))
+ return 0;
+
if (!rt_rq->overloaded)
return 0;
@@ -1668,6 +1726,9 @@ static void pull_rt_rq_task(struct rt_rq *this_rt_rq)
struct rq *src_rq;
int rt_overload_count = rt_overloaded(this_rq);
+ if (is_dl_group(&this_rq->rt))
+ return;
+
if (likely(!rt_overload_count))
return;
@@ -1811,6 +1872,8 @@ static void rq_offline_rt(struct rq *rq)
*/
static void switched_from_rt(struct rq *rq, struct task_struct *p)
{
+ struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
/*
* If there are other RT tasks then we will reschedule
* and the scheduling of the other RT tasks will handle
@@ -1818,10 +1881,10 @@ static void switched_from_rt(struct rq *rq, struct task_struct *p)
* we may need to handle the pulling of RT tasks
* now.
*/
- if (!task_on_rq_queued(p) || rq->rt.rt_nr_running)
+ if (!task_on_rq_queued(p) || rt_rq->rt_nr_running)
return;
- rt_queue_pull_task(rt_rq_of_se(&p->rt));
+ rt_queue_pull_task(rt_rq);
}
void __init init_sched_rt_class(void)
@@ -1858,6 +1921,7 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
if (task_on_rq_queued(p)) {
if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
rt_queue_push_tasks(rt_rq_of_se(&p->rt));
+
if (p->prio < rq->donor->prio && cpu_online(cpu_of(rq)))
resched_curr(rq);
}
@@ -1870,6 +1934,8 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
static void
prio_changed_rt(struct rq *rq, struct task_struct *p, u64 oldprio)
{
+ struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
if (!task_on_rq_queued(p))
return;
@@ -1882,15 +1948,24 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, u64 oldprio)
* may need to pull tasks to this runqueue.
*/
if (oldprio < p->prio)
- rt_queue_pull_task(rt_rq_of_se(&p->rt));
+ rt_queue_pull_task(rt_rq);
/*
* If there's a higher priority task waiting to run
* then reschedule.
*/
- if (p->prio > rq->rt.highest_prio.curr)
+ if (p->prio > rt_rq->highest_prio.curr)
resched_curr(rq);
} else {
+ /*
+ * This task is not running, thus we check against the currently
+ * running task for preemption. We can preempt only if both tasks are
+ * in the same cgroup or on the global runqueue.
+ */
+ if (rt_group_sched_enabled() &&
+ rt_rq->tg != rt_rq_of_se(&rq->curr->rt)->tg)
+ return;
+
/*
* This task is not running, but if it is
* greater than the current running task
@@ -1983,7 +2058,16 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
#ifdef CONFIG_SCHED_CORE
static int task_is_throttled_rt(struct task_struct *p, int cpu)
{
+#ifdef CONFIG_RT_GROUP_SCHED
+ struct rt_rq *rt_rq;
+
+ rt_rq = task_group(p)->rt_rq[cpu];
+ WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
+
+ return dl_group_of(rt_rq)->dl_throttled;
+#else
return 0;
+#endif
}
#endif /* CONFIG_SCHED_CORE */
@@ -2222,10 +2306,10 @@ long sched_group_rt_period(struct task_group *tg)
return rt_period_us;
}
-int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
+int sched_rt_can_attach(struct task_group *tg)
{
/* Don't accept real-time tasks when there is no way for them to run */
- if (rt_group_sched_enabled() && rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
+ if (rt_group_sched_enabled() && tg->dl_bandwidth.dl_runtime == 0)
return 0;
return 1;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 66d5bd1aa4f1..bde49f216081 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -611,7 +611,7 @@ extern int sched_group_set_rt_runtime(struct task_group *tg, long rt_runtime_us)
extern int sched_group_set_rt_period(struct task_group *tg, u64 rt_period_us);
extern long sched_group_rt_runtime(struct task_group *tg);
extern long sched_group_rt_period(struct task_group *tg);
-extern int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk);
+extern int sched_rt_can_attach(struct task_group *tg);
extern struct task_group *sched_create_group(struct task_group *parent);
extern void sched_online_group(struct task_group *tg,
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 9c1ba10ea5a7..773f744c0460 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -606,6 +606,18 @@ int __sched_setscheduler(struct task_struct *p,
change:
if (user) {
+ /*
+ * Do not allow real-time tasks into groups that have no runtime
+ * assigned.
+ */
+ if (rt_group_sched_enabled() &&
+ dl_bandwidth_enabled() && rt_policy(policy) &&
+ !sched_rt_can_attach(task_group(p)) &&
+ !task_group_is_autogroup(task_group(p))) {
+ retval = -EPERM;
+ goto unlock;
+ }
+
if (dl_bandwidth_enabled() && dl_policy(policy) &&
!(attr->sched_flags & SCHED_FLAG_SUGOV)) {
cpumask_t *span = rq->rd->span;
--
2.54.0