public inbox for linux-kernel@vger.kernel.org
From: Yuri Andriaccio <yurand2000@gmail.com>
To: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	Luca Abeni <luca.abeni@santannapisa.it>,
	Yuri Andriaccio <yuri.andriaccio@santannapisa.it>
Subject: [RFC PATCH v5 14/29] sched/rt: Update task event callbacks for HCBS scheduling
Date: Thu, 30 Apr 2026 23:38:18 +0200	[thread overview]
Message-ID: <20260430213835.62217-15-yurand2000@gmail.com> (raw)
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>

Update wakeup_preempt_rt(), switched_{from/to}_rt() and prio_changed_rt()
with rt-cgroup specific preemption rules:

- In wakeup_preempt_rt(), whenever a task wakes up, check whether it is
  served by a deadline server or lives on the global runqueue. The
  preemption rules (documented in the function) depend on the runqueues
  of the donor (current) task and the woken task:
  - If both tasks are FIFO/RR tasks on the global runqueue, or in the
    same cgroup, run the standard checks.
  - If the woken task is inside a cgroup but the donor is a FIFO task on
    the global runqueue, always preempt. If the donor is a DEADLINE
    task, check whether the woken task's dl server preempts the donor.
  - If both tasks are FIFO/RR tasks in different served groups, check
    whether the woken task's server preempts the donor's server.
- In switched_from_rt(), perform a pull only on the global runqueue, and
  do nothing if the task is inside a group. This will change when
  migrations are added.
- In switched_to_rt(), queue a push only on the global runqueue, and
  perform a priority check when the switching task is inside a group.
  This too will change when migrations are added.
- In prio_changed_rt(), queue a pull only on the global runqueue, when
  the running task's priority is lowered. If the task is queued but not
  running, run preemption checks only if both the prio-changed task and
  curr are in the same cgroup.
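
The wakeup rules above can be sketched as a standalone decision
function. This is a simplified model for illustration only: the struct,
field and function names below are hypothetical stand-ins, not the
kernel's types or APIs.

```c
#include <stdbool.h>

/* Simplified model of the wakeup_preempt_rt() decision flow described
 * above. All names here are illustrative, not kernel identifiers. */
enum action { RESCHED, NO_RESCHED, GLOBAL_PATH };

struct wakeup_ctx {
	bool woken_served;    /* woken task runs under a dl_server (cgroup) */
	bool donor_served;    /* donor runs under a dl_server or is DEADLINE */
	bool same_server;     /* both tasks under the same dl_server */
	bool task_preempts;   /* p->prio < donor->prio */
	bool server_preempts; /* woken server's deadline beats the donor's */
};

static enum action wakeup_decision(const struct wakeup_ctx *c)
{
	if (c->woken_served && c->donor_served) {
		if (c->same_server) /* same cgroup: plain priority check */
			return c->task_preempts ? RESCHED : NO_RESCHED;
		/* different servers: compare the servers' deadlines */
		return c->server_preempts ? RESCHED : NO_RESCHED;
	}
	if (c->woken_served)  /* served task vs. global FIFO/RR donor */
		return RESCHED;
	if (c->donor_served)  /* dl_server or DEADLINE donor is kept */
		return NO_RESCHED;
	return GLOBAL_PATH;   /* both global: fall through to standard code */
}
```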

Update sched_rt_can_attach() to check whether a task can be attached to
a given cgroup. For now the check only verifies that the group has
non-zero bandwidth. Remove the tsk argument from sched_rt_can_attach(),
as it is now unused.
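
With the tsk argument gone, the check reduces to a pure function of the
group's state, roughly as in this stand-alone sketch (parameter names
are illustrative, not the kernel's):

```c
#include <stdbool.h>

/* Stand-alone sketch of the reworked sched_rt_can_attach(): the result
 * depends only on whether rt-cgroup scheduling is enabled and the group
 * has dl runtime allocated. Names are illustrative. */
static int can_attach_rt(bool rt_group_sched_enabled, long long dl_runtime)
{
	/* Don't accept real-time tasks when there is no way for them to run */
	if (rt_group_sched_enabled && dl_runtime == 0)
		return 0;

	return 1;
}
```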

Change cpu_cgroup_can_attach() to perform the bandwidth check only when
the attachee is a FIFO/RR task.

Update __sched_setscheduler() to perform additional checks when
switching a task inside a cgroup to FIFO/RR, as the group must have
runtime allocated.

Update task_is_throttled_rt() (used by SCHED_CORE) to return the
throttled state of the task's dl server, if present; global rt-tasks
are never throttled.

Co-developed-by: Alessio Balsini <a.balsini@sssup.it>
Signed-off-by: Alessio Balsini <a.balsini@sssup.it>
Co-developed-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Co-developed-by: luca abeni <luca.abeni@santannapisa.it>
Signed-off-by: luca abeni <luca.abeni@santannapisa.it>
Signed-off-by: Yuri Andriaccio <yurand2000@gmail.com>
---
 kernel/sched/core.c     |   2 +-
 kernel/sched/rt.c       | 106 +++++++++++++++++++++++++++++++++++-----
 kernel/sched/sched.h    |   2 +-
 kernel/sched/syscalls.c |  12 +++++
 4 files changed, 109 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4e58b4f165ed..98a53b60e21f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9270,7 +9270,7 @@ static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
 		goto scx_check;

 	cgroup_taskset_for_each(task, css, tset) {
-		if (!sched_rt_can_attach(css_tg(css), task))
+		if (rt_task(task) && !sched_rt_can_attach(css_tg(css)))
 			return -EINVAL;
 	}
 scx_check:
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index defb812b0e48..67fbf4bbe461 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -975,7 +975,58 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
 static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct task_struct *donor = rq->donor;
+	struct sched_dl_entity *woken_dl_se = NULL;
+	struct sched_dl_entity *donor_dl_se = NULL;
+
+	if (!rt_group_sched_enabled())
+		goto no_group_sched;

+	/*
+	 * Preemption checks are different if the waking task and the current task
+	 * are running on the global runqueue or in a cgroup. The following rules
+	 * apply:
+	 *   - dl-tasks (and equally dl_servers) always preempt FIFO/RR tasks.
+	 *     - if curr is a FIFO/RR task inside a cgroup (i.e. run by a
+	 *       dl_server), or curr is a DEADLINE task and waking is a FIFO/RR task
+	 *       on the root cgroup, do nothing.
+	 *     - if waking is inside a cgroup but curr is a FIFO/RR task in the root
+	 *       cgroup, always reschedule.
+	 *   - if they are both on the global runqueue, run the standard code.
+	 *   - if they are both in the same cgroup, check for tasks priorities.
+	 *   - if they are both in a cgroup, but not the same one, check whether the
+	 *     woken task's dl_server preempts the current's dl_server.
+	 *   - if curr is a DEADLINE task and waking is in a cgroup, check whether
+	 *     the woken task's server preempts curr.
+	 */
+	if (is_dl_group(rt_rq_of_se(&p->rt)))
+		woken_dl_se = dl_group_of(rt_rq_of_se(&p->rt));
+	if (is_dl_group(rt_rq_of_se(&donor->rt)))
+		donor_dl_se = dl_group_of(rt_rq_of_se(&donor->rt));
+	else if (task_has_dl_policy(donor))
+		donor_dl_se = &donor->dl;
+
+	if (woken_dl_se != NULL && donor_dl_se != NULL) {
+		if (woken_dl_se == donor_dl_se) {
+			if (p->prio < donor->prio)
+				resched_curr(rq);
+
+			return;
+		}
+
+		if (dl_entity_preempt(woken_dl_se, donor_dl_se))
+			resched_curr(rq);
+
+		return;
+
+	} else if (woken_dl_se != NULL) {
+		resched_curr(rq);
+		return;
+
+	} else if (donor_dl_se != NULL) {
+		return;
+	}
+
+no_group_sched:
 	/*
 	 * XXX If we're preempted by DL, queue a push?
 	 */
@@ -1026,7 +1077,8 @@ static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool f
 	if (rq->donor->sched_class != &rt_sched_class)
 		update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0);

-	rt_queue_push_tasks(rt_rq);
+	if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) || !is_dl_group(rt_rq))
+		rt_queue_push_tasks(rt_rq);
 }

 static struct sched_rt_entity *pick_next_rt_entity(struct rt_rq *rt_rq)
@@ -1736,6 +1788,8 @@ static void rq_offline_rt(struct rq *rq)
  */
 static void switched_from_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	/*
 	 * If there are other RT tasks then we will reschedule
 	 * and the scheduling of the other RT tasks will handle
@@ -1743,10 +1797,11 @@ static void switched_from_rt(struct rq *rq, struct task_struct *p)
 	 * we may need to handle the pulling of RT tasks
 	 * now.
 	 */
-	if (!task_on_rq_queued(p) || rq->rt.rt_nr_running)
+	if (!task_on_rq_queued(p) || rt_rq->rt_nr_running)
 		return;

-	rt_queue_pull_task(rt_rq_of_se(&p->rt));
+	if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) || !is_dl_group(rt_rq))
+		rt_queue_pull_task(rt_rq);
 }

 void __init init_sched_rt_class(void)
@@ -1766,6 +1821,8 @@ void __init init_sched_rt_class(void)
  */
 static void switched_to_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	/*
 	 * If we are running, update the avg_rt tracking, as the running time
 	 * will now on be accounted into the latter.
@@ -1781,8 +1838,14 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 	 * then see if we can move to another run queue.
 	 */
 	if (task_on_rq_queued(p)) {
-		if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
-			rt_queue_push_tasks(rt_rq_of_se(&p->rt));
+		if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq)) {
+			if (p->prio < rq->donor->prio)
+				resched_curr(rq);
+		} else {
+			if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
+				rt_queue_push_tasks(rt_rq_of_se(&p->rt));
+		}
+
 		if (p->prio < rq->donor->prio && cpu_online(cpu_of(rq)))
 			resched_curr(rq);
 	}
@@ -1795,6 +1858,8 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 static void
 prio_changed_rt(struct rq *rq, struct task_struct *p, u64 oldprio)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	if (!task_on_rq_queued(p))
 		return;

@@ -1807,15 +1872,25 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, u64 oldprio)
 		 * may need to pull tasks to this runqueue.
 		 */
 		if (oldprio < p->prio)
-			rt_queue_pull_task(rt_rq_of_se(&p->rt));
+			if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) || !is_dl_group(rt_rq))
+				rt_queue_pull_task(rt_rq);

 		/*
 		 * If there's a higher priority task waiting to run
 		 * then reschedule.
 		 */
-		if (p->prio > rq->rt.highest_prio.curr)
+		if (p->prio > rt_rq->highest_prio.curr)
 			resched_curr(rq);
 	} else {
+		/*
+		 * This task is not running, thus we check against the currently
+		 * running task for preemption. We can preempt only if both tasks are
+		 * in the same cgroup or on the global runqueue.
+		 */
+		if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) &&
+		    rt_rq_of_se(&p->rt)->tg != rt_rq_of_se(&rq->curr->rt)->tg)
+			return;
+
 		/*
 		 * This task is not running, but if it is
 		 * greater than the current running task
@@ -1908,7 +1983,16 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 #ifdef CONFIG_SCHED_CORE
 static int task_is_throttled_rt(struct task_struct *p, int cpu)
 {
+#ifdef CONFIG_RT_GROUP_SCHED
+	struct rt_rq *rt_rq;
+
+	rt_rq = task_group(p)->rt_rq[cpu];
+	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
+
+	return dl_group_of(rt_rq)->dl_throttled;
+#else
 	return 0;
+#endif
 }
 #endif /* CONFIG_SCHED_CORE */

@@ -2159,16 +2243,16 @@ static int sched_rt_global_constraints(void)
 }
 #endif /* CONFIG_SYSCTL */

-int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
+int sched_rt_can_attach(struct task_group *tg)
 {
 	/* Don't accept real-time tasks when there is no way for them to run */
-	if (rt_group_sched_enabled() && rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
+	if (rt_group_sched_enabled() && tg->dl_bandwidth.dl_runtime == 0)
 		return 0;

 	return 1;
 }

-#else /* !CONFIG_RT_GROUP_SCHED: */
+#else /* !CONFIG_RT_GROUP_SCHED */

 #ifdef CONFIG_SYSCTL
 static int sched_rt_global_constraints(void)
@@ -2176,7 +2260,7 @@ static int sched_rt_global_constraints(void)
 	return 0;
 }
 #endif /* CONFIG_SYSCTL */
-#endif /* !CONFIG_RT_GROUP_SCHED */
+#endif /* CONFIG_RT_GROUP_SCHED */

 #ifdef CONFIG_SYSCTL
 static int sched_rt_global_validate(void)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d949babfe16a..fceb02a04858 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -609,7 +609,7 @@ extern int sched_group_set_rt_runtime(struct task_group *tg, long rt_runtime_us)
 extern int sched_group_set_rt_period(struct task_group *tg, u64 rt_period_us);
 extern long sched_group_rt_runtime(struct task_group *tg);
 extern long sched_group_rt_period(struct task_group *tg);
-extern int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk);
+extern int sched_rt_can_attach(struct task_group *tg);

 extern struct task_group *sched_create_group(struct task_group *parent);
 extern void sched_online_group(struct task_group *tg,
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 806bc88d21ee..15653840c812 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -606,6 +606,18 @@ int __sched_setscheduler(struct task_struct *p,
 change:

 	if (user) {
+		/*
+		 * Do not allow real-time tasks into groups that have no runtime
+		 * assigned.
+		 */
+		if (rt_group_sched_enabled() &&
+				dl_bandwidth_enabled() && rt_policy(policy) &&
+				!sched_rt_can_attach(task_group(p)) &&
+				!task_group_is_autogroup(task_group(p))) {
+			retval = -EPERM;
+			goto unlock;
+		}
+
 		if (dl_bandwidth_enabled() && dl_policy(policy) &&
 				!(attr->sched_flags & SCHED_FLAG_SUGOV)) {
 			cpumask_t *span = rq->rd->span;
--
2.53.0

