Linux cgroups development
From: Yuri Andriaccio <yurand2000@gmail.com>
To: "Ingo Molnar" <mingo@redhat.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
	"Valentin Schneider" <vschneid@redhat.com>,
	"Tejun Heo" <tj@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Koutný" <mkoutny@suse.com>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	Luca Abeni <luca.abeni@santannapisa.it>,
	Yuri Andriaccio <yuri.andriaccio@santannapisa.it>
Subject: [RFC PATCH v6 15/25] sched/rt: Update task event callbacks for HCBS scheduling
Date: Mon,  8 Jun 2026 14:15:34 +0200
Message-ID: <20260608121546.69910-16-yurand2000@gmail.com>
In-Reply-To: <20260608121546.69910-1-yurand2000@gmail.com>

Update wakeup_preempt_rt(), switched_{from,to}_rt() and prio_changed_rt()
with rt-cgroup-specific preemption rules:

- In wakeup_preempt_rt(), whenever a task wakes up, check whether it is
  served by a deadline server or lives on the global runqueue. The
  preemption rules (documented in the function) depend on the runqueues of
  the current donor and of the woken task:
  - If both tasks are FIFO/RR tasks on the global runqueue, or in the same
    cgroup, run the standard checks.
  - If the woken task is inside a cgroup and the donor is a FIFO/RR task on
    the global runqueue, always preempt. If the donor is a DEADLINE task,
    check whether the woken task's dl server preempts the donor.
  - If both tasks are FIFO/RR tasks in different served groups, check
    whether the woken task's server preempts the donor's server.
  - If the woken task is on the global runqueue and the donor is either
    served by a dl server or a DEADLINE task, do not preempt.
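- In prio_changed_rt(), if the task is not running, only run the preemption
  checks if the running task is in the same task group as the task whose
  priority changed.
- In switched_from_rt(), check the rt_nr_running count of the task's own
  rt_rq, rather than that of the root rt_rq, before queueing a pull.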
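The wakeup rules above can be read as a small decision table. The sketch
below is a user-space model of that table, not kernel code. It assumes
group scheduling is enabled. Its names (struct dl_entity, enum
wakeup_action, wakeup_decision(), earlier_deadline()) are hypothetical. It
approximates dl_entity_preempt() as "earlier absolute deadline wins" and
omits the special-entity case of the real helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sched_dl_entity: only the deadline. */
struct dl_entity {
	unsigned long long deadline;
};

enum wakeup_action {
	WAKEUP_STANDARD_RT_CHECKS, /* fall through to the priority checks */
	WAKEUP_RESCHED,            /* model of resched_curr(rq) */
	WAKEUP_NOTHING,            /* leave the donor running */
};

/* Wrap-safe "a is before b", in the style of dl_time_before(). */
static bool earlier_deadline(const struct dl_entity *a,
			     const struct dl_entity *b)
{
	return (long long)(a->deadline - b->deadline) < 0;
}

/*
 * woken_dl: server of the woken task's group, or NULL if the woken task
 *           is on the global runqueue.
 * donor_dl: server of the donor's group, the donor's own entity if it is
 *           a DEADLINE task, or NULL for a global FIFO/RR donor.
 */
static enum wakeup_action wakeup_decision(const struct dl_entity *woken_dl,
					  const struct dl_entity *donor_dl)
{
	if (woken_dl && donor_dl) {
		/* Same group: the usual in-group priority checks apply. */
		if (woken_dl == donor_dl)
			return WAKEUP_STANDARD_RT_CHECKS;
		/* Different servers (or a DEADLINE donor): compare them. */
		return earlier_deadline(woken_dl, donor_dl) ?
		       WAKEUP_RESCHED : WAKEUP_NOTHING;
	}
	/* A served task always preempts a global FIFO/RR donor. */
	if (woken_dl)
		return WAKEUP_RESCHED;
	/* A global FIFO/RR task never preempts a server or DEADLINE donor. */
	if (donor_dl)
		return WAKEUP_NOTHING;
	/* Both on the global runqueue. */
	return WAKEUP_STANDARD_RT_CHECKS;
}

/* Usage: one assertion per rule listed above. */
static void wakeup_decision_selftest(void)
{
	struct dl_entity server_a = { .deadline = 100 };
	struct dl_entity server_b = { .deadline = 200 };

	assert(wakeup_decision(NULL, NULL) == WAKEUP_STANDARD_RT_CHECKS);
	assert(wakeup_decision(&server_a, &server_a) == WAKEUP_STANDARD_RT_CHECKS);
	assert(wakeup_decision(&server_a, NULL) == WAKEUP_RESCHED);
	assert(wakeup_decision(NULL, &server_a) == WAKEUP_NOTHING);
	assert(wakeup_decision(&server_a, &server_b) == WAKEUP_RESCHED);
	assert(wakeup_decision(&server_b, &server_a) == WAKEUP_NOTHING);
}
```

Returning an action, instead of calling resched_curr() directly, keeps
each rule testable in isolation.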

Update sched_rt_can_attach() to check whether real-time tasks can be
attached to a given cgroup. For now, the check only verifies that the group
has non-zero runtime. Drop the tsk argument from sched_rt_can_attach(): the
rt_task() check moves to the caller, leaving the argument unused.

Change cpu_cgroup_can_attach() to run the sched_rt_can_attach() check only
when the task being attached is a FIFO/RR task.

Update __sched_setscheduler() to refuse, with -EPERM, a user request to
switch a task inside a cgroup to FIFO/RR when the group has no runtime
allocated. Autogroups are exempt from this check.
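The three admission paths above share one predicate. The sketch below is a
user-space model under stated assumptions. struct tg_model,
rt_can_attach(), can_attach() and setscheduler_check() are hypothetical
names. Plain bools stand in for rt_group_sched_enabled(),
dl_bandwidth_enabled(), rt_task()/rt_policy() and
task_group_is_autogroup().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced view of a task group, for illustration only. */
struct tg_model {
	unsigned long long dl_runtime; /* runtime assigned to the group */
	bool is_autogroup;
};

/* Models sched_rt_can_attach(): a group without runtime cannot host
 * real-time tasks while group scheduling is enabled. */
static bool rt_can_attach(bool group_sched, const struct tg_model *tg)
{
	return !(group_sched && tg->dl_runtime == 0);
}

/* Models cpu_cgroup_can_attach(): only FIFO/RR tasks are checked. */
static int can_attach(bool group_sched, const struct tg_model *tg,
		      bool task_is_rt)
{
	if (task_is_rt && !rt_can_attach(group_sched, tg))
		return -EINVAL;
	return 0;
}

/* Models the new user-request check in __sched_setscheduler(). */
static int setscheduler_check(bool group_sched, bool dl_bw_enabled,
			      const struct tg_model *tg, bool new_policy_is_rt)
{
	if (group_sched && dl_bw_enabled && new_policy_is_rt &&
	    !rt_can_attach(group_sched, tg) && !tg->is_autogroup)
		return -EPERM;
	return 0;
}

/* Usage: empty groups reject RT tasks; autogroups are exempt. */
static void admission_selftest(void)
{
	struct tg_model empty = { .dl_runtime = 0, .is_autogroup = false };
	/* 10000 is an arbitrary non-zero runtime. */
	struct tg_model funded = { .dl_runtime = 10000, .is_autogroup = false };
	struct tg_model autogrp = { .dl_runtime = 0, .is_autogroup = true };

	assert(can_attach(true, &empty, true) == -EINVAL);
	assert(can_attach(true, &empty, false) == 0);  /* non-RT tasks */
	assert(can_attach(true, &funded, true) == 0);
	assert(can_attach(false, &empty, true) == 0);  /* groups disabled */
	assert(setscheduler_check(true, true, &empty, true) == -EPERM);
	assert(setscheduler_check(true, true, &autogrp, true) == 0);
	assert(setscheduler_check(true, false, &empty, true) == 0);
}
```

The attach path and the setscheduler path return different errors
(-EINVAL and -EPERM). The model keeps both so that it matches the diff.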

Update task_is_throttled_rt() for SCHED_CORE to return the dl_throttled
state of the task's server, if present; global RT tasks are never
throttled.

Make the RT push/pull migration functions skip rt_rqs served by a dl
server; migration for cgroups is implemented in later patches.

Co-developed-by: Alessio Balsini <a.balsini@sssup.it>
Signed-off-by: Alessio Balsini <a.balsini@sssup.it>
Co-developed-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Co-developed-by: luca abeni <luca.abeni@santannapisa.it>
Signed-off-by: luca abeni <luca.abeni@santannapisa.it>
Signed-off-by: Yuri Andriaccio <yurand2000@gmail.com>
---
 kernel/sched/core.c     |  2 +-
 kernel/sched/rt.c       | 98 ++++++++++++++++++++++++++++++++++++++---
 kernel/sched/sched.h    |  2 +-
 kernel/sched/syscalls.c | 12 +++++
 4 files changed, 105 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9e47a02cfaf7..1252f45feda0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9545,7 +9545,7 @@ static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
 		goto scx_check;

 	cgroup_taskset_for_each(task, css, tset) {
-		if (!sched_rt_can_attach(css_tg(css), task))
+		if (rt_task(task) && !sched_rt_can_attach(css_tg(css)))
 			return -EINVAL;
 	}
 scx_check:
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 61e9dab894d1..168a92945b4a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -372,6 +372,9 @@ static inline void rt_queue_push_tasks(struct rt_rq *rt_rq)
 {
 	struct rq *rq = global_rq_of_rt_rq(rt_rq);

+	if (is_dl_group(rt_rq))
+		return;
+
 	if (!has_pushable_tasks(rt_rq))
 		return;

@@ -382,6 +385,9 @@ static inline void rt_queue_pull_task(struct rt_rq *rt_rq)
 {
 	struct rq *rq = global_rq_of_rt_rq(rt_rq);

+	if (is_dl_group(rt_rq))
+		return;
+
 	queue_balance_callback(rq, &per_cpu(rt_pull_head, rq->cpu), pull_rt_task);
 }

@@ -1031,7 +1037,55 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
 static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct task_struct *donor = rq->donor;
+	struct sched_dl_entity *woken_dl_se = NULL;
+	struct sched_dl_entity *donor_dl_se = NULL;
+
+	if (!rt_group_sched_enabled())
+		goto same_group_sched;
+
+	/*
+	 * Preemption checks are different if the waking task and the current donor
+	 * are running on the global runqueue or in a cgroup. The following rules
+	 * apply:
+	 *   - dl-tasks (and equally dl_servers) always preempt FIFO/RR tasks.
+	 *     - if donor is a FIFO/RR task inside a cgroup (i.e. run by a
+	 *       dl_server), or donor is a DEADLINE task and waking is a FIFO/RR
+	 *       task on the root cgroup, do nothing.
+	 *     - if waking is inside a cgroup but donor is a FIFO/RR task in the
+	 *       root cgroup, always reschedule.
+	 *   - if they are both on the global runqueue or in the same cgroup, run
+	 *     the standard code.
+	 *   - if they are both in a cgroup, but not the same one, check whether the
+	 *     woken task's dl_server preempts the donor's dl_server.
+	 *   - if donor is a DEADLINE task and waking is in a cgroup, check whether
+	 *     the woken task's server preempts donor.
+	 */
+	if (is_dl_group(rt_rq_of_se(&p->rt)))
+		woken_dl_se = dl_group_of(rt_rq_of_se(&p->rt));
+	if (is_dl_group(rt_rq_of_se(&donor->rt)))
+		donor_dl_se = dl_group_of(rt_rq_of_se(&donor->rt));
+	else if (task_has_dl_policy(donor))
+		donor_dl_se = &donor->dl;
+
+	if (woken_dl_se != NULL && donor_dl_se != NULL) {
+		if (woken_dl_se == donor_dl_se) {
+			goto same_group_sched;
+		}
+
+		if (dl_entity_preempt(woken_dl_se, donor_dl_se))
+			resched_curr(rq);
+
+		return;
+
+	} else if (woken_dl_se != NULL) {
+		resched_curr(rq);
+		return;
+
+	} else if (donor_dl_se != NULL) {
+		return;
+	}

+same_group_sched:
 	/*
 	 * XXX If we're preempted by DL, queue a push?
 	 */
@@ -1055,7 +1109,8 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
 	 * to move current somewhere else, making room for our non-migratable
 	 * task.
 	 */
-	if (p->prio == donor->prio && !test_tsk_need_resched(rq->curr))
+	if (!is_dl_group(rt_rq_of_se(&p->rt)) &&
+	    p->prio == donor->prio && !test_tsk_need_resched(rq->curr))
 		check_preempt_equal_prio(rq, p);
 }

@@ -1362,6 +1417,9 @@ static int push_rt_rq_task(struct rt_rq *rt_rq, bool pull)
 	struct rt_rq *lowest_rt_rq;
 	int ret = 0;

+	if (is_dl_group(rt_rq))
+		return 0;
+
 	if (!rt_rq->overloaded)
 		return 0;

@@ -1668,6 +1726,9 @@ static void pull_rt_rq_task(struct rt_rq *this_rt_rq)
 	struct rq *src_rq;
 	int rt_overload_count = rt_overloaded(this_rq);

+	if (is_dl_group(&this_rq->rt))
+		return;
+
 	if (likely(!rt_overload_count))
 		return;

@@ -1811,6 +1872,8 @@ static void rq_offline_rt(struct rq *rq)
  */
 static void switched_from_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	/*
 	 * If there are other RT tasks then we will reschedule
 	 * and the scheduling of the other RT tasks will handle
@@ -1818,10 +1881,10 @@ static void switched_from_rt(struct rq *rq, struct task_struct *p)
 	 * we may need to handle the pulling of RT tasks
 	 * now.
 	 */
-	if (!task_on_rq_queued(p) || rq->rt.rt_nr_running)
+	if (!task_on_rq_queued(p) || rt_rq->rt_nr_running)
 		return;

-	rt_queue_pull_task(rt_rq_of_se(&p->rt));
+	rt_queue_pull_task(rt_rq);
 }

 void __init init_sched_rt_class(void)
@@ -1858,6 +1921,7 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 	if (task_on_rq_queued(p)) {
 		if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
 			rt_queue_push_tasks(rt_rq_of_se(&p->rt));
+
 		if (p->prio < rq->donor->prio && cpu_online(cpu_of(rq)))
 			resched_curr(rq);
 	}
@@ -1870,6 +1934,8 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 static void
 prio_changed_rt(struct rq *rq, struct task_struct *p, u64 oldprio)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	if (!task_on_rq_queued(p))
 		return;

@@ -1882,15 +1948,24 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, u64 oldprio)
 		 * may need to pull tasks to this runqueue.
 		 */
 		if (oldprio < p->prio)
-			rt_queue_pull_task(rt_rq_of_se(&p->rt));
+			rt_queue_pull_task(rt_rq);

 		/*
 		 * If there's a higher priority task waiting to run
 		 * then reschedule.
 		 */
-		if (p->prio > rq->rt.highest_prio.curr)
+		if (p->prio > rt_rq->highest_prio.curr)
 			resched_curr(rq);
 	} else {
+		/*
+		 * This task is not running, thus we check against the currently
+		 * running task for preemption. We can preempt only if both tasks are
+		 * in the same cgroup or on the global runqueue.
+		 */
+		if (rt_group_sched_enabled() &&
+		    rt_rq->tg != rt_rq_of_se(&rq->curr->rt)->tg)
+			return;
+
 		/*
 		 * This task is not running, but if it is
 		 * greater than the current running task
@@ -1983,7 +2058,16 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 #ifdef CONFIG_SCHED_CORE
 static int task_is_throttled_rt(struct task_struct *p, int cpu)
 {
+#ifdef CONFIG_RT_GROUP_SCHED
+	struct rt_rq *rt_rq;
+
+	rt_rq = task_group(p)->rt_rq[cpu];
+	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
+
+	return dl_group_of(rt_rq)->dl_throttled;
+#else
 	return 0;
+#endif
 }
 #endif /* CONFIG_SCHED_CORE */

@@ -2222,10 +2306,10 @@ long sched_group_rt_period(struct task_group *tg)
 	return rt_period_us;
 }

-int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
+int sched_rt_can_attach(struct task_group *tg)
 {
 	/* Don't accept real-time tasks when there is no way for them to run */
-	if (rt_group_sched_enabled() && rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
+	if (rt_group_sched_enabled() && tg->dl_bandwidth.dl_runtime == 0)
 		return 0;

 	return 1;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 66d5bd1aa4f1..bde49f216081 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -611,7 +611,7 @@ extern int sched_group_set_rt_runtime(struct task_group *tg, long rt_runtime_us)
 extern int sched_group_set_rt_period(struct task_group *tg, u64 rt_period_us);
 extern long sched_group_rt_runtime(struct task_group *tg);
 extern long sched_group_rt_period(struct task_group *tg);
-extern int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk);
+extern int sched_rt_can_attach(struct task_group *tg);

 extern struct task_group *sched_create_group(struct task_group *parent);
 extern void sched_online_group(struct task_group *tg,
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 9c1ba10ea5a7..773f744c0460 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -606,6 +606,18 @@ int __sched_setscheduler(struct task_struct *p,
 change:

 	if (user) {
+		/*
+		 * Do not allow real-time tasks into groups that have no runtime
+		 * assigned.
+		 */
+		if (rt_group_sched_enabled() &&
+		    dl_bandwidth_enabled() && rt_policy(policy) &&
+		    !sched_rt_can_attach(task_group(p)) &&
+		    !task_group_is_autogroup(task_group(p))) {
+			retval = -EPERM;
+			goto unlock;
+		}
+
 		if (dl_bandwidth_enabled() && dl_policy(policy) &&
 				!(attr->sched_flags & SCHED_FLAG_SUGOV)) {
 			cpumask_t *span = rq->rd->span;
--
2.54.0



Thread overview: 31+ messages
2026-06-08 12:15 [RFC PATCH v6 00/25] Hierarchical Constant Bandwidth Server Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 01/25] sched/deadline: Fix replenishment logic for non-deferred servers Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 02/25] sched/rt: Update default bandwidth for real-time tasks to ONE Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 03/25] sched/deadline: Do not access dl_se->rq directly Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 04/25] sched/deadline: Distinguish between dl_rq and my_q Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 05/25] sched/rt: Pass an rt_rq instead of an rq where needed Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 06/25] sched/rt: Move functions from rt.c to sched.h Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 07/25] sched/rt: Disable RT_GROUP_SCHED Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 08/25] sched/rt: Remove unnecessary runqueue pointer in struct rt_rq Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 09/25] sched/rt: Introduce HCBS specific structs in task_group Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 10/25] sched/core: Initialize HCBS specific structures Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 11/25] sched/deadline: Add dl_init_tg Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 12/25] sched/rt: Add {alloc/unregister/free}_rt_sched_group Yuri Andriaccio
2026-06-11  8:42   ` Juri Lelli
2026-06-08 12:15 ` [RFC PATCH v6 13/25] sched/deadline: Account rt-cgroups bandwidth in deadline tasks schedulability tests Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 14/25] sched/rt: Implement dl-server operations for rt-cgroups Yuri Andriaccio
2026-06-08 12:15 ` Yuri Andriaccio [this message]
2026-06-08 12:15 ` [RFC PATCH v6 16/25] sched/rt: Remove support for cgroups-v1 Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 17/25] sched/rt: Update rt-cgroup schedulability checks Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 18/25] sched/rt: Update task's RT runqueue when switching scheduling class Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 19/25] sched/rt: Remove old RT_GROUP_SCHED data structures Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 20/25] sched/rt: Add HCBS migration code to related functions Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 21/25] sched/rt: Hook HCBS migration functions Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 22/25] sched/core: Execute enqueued balance callbacks when changing allowed CPUs Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 23/25] sched/rt: Try pull task on empty server pick Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 24/25] sched/core: Execute enqueued balance callbacks after migrate_disable_switch Yuri Andriaccio
2026-06-08 12:15 ` [RFC PATCH v6 25/25] Documentation: Update documentation for real-time cgroups Yuri Andriaccio
2026-06-09 15:46 ` [RFC PATCH v6 00/25] Hierarchical Constant Bandwidth Server Juri Lelli
2026-06-09 16:23   ` Yuri Andriaccio
2026-06-10  9:21     ` Juri Lelli
2026-06-15 20:38 ` Tejun Heo
