From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 14/29] sched/rt: Update task event callbacks for HCBS scheduling
Date: Thu, 30 Apr 2026 23:38:18 +0200
Message-ID: <20260430213835.62217-15-yurand2000@gmail.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Update wakeup_preempt_rt(), switched_{from,to}_rt() and prio_changed_rt()
with the rt-cgroup-specific preemption rules:

- In wakeup_preempt_rt(), whenever a task wakes up, check whether it is
  served by a deadline server or lives on the global runqueue. The
  preemption rules (as documented in the function) change based on the
  runqueues of the current task and the woken task:
  - If both tasks are FIFO/RR tasks on the global runqueue, or in the
    same cgroup, run the standard preemption check.
  - If the woken task is inside a cgroup but the donor is a FIFO task on
    the global runqueue, always preempt. If the donor is a DEADLINE
    task, check whether the woken task's dl server preempts the donor.
  - If both tasks are FIFO/RR tasks served by different groups, check
    whether the woken task's server preempts the donor's server.
- In switched_from_rt(), perform a pull only on the global runqueue, and
  do nothing if the task is inside a group. This will change when
  migrations are added.
- In switched_to_rt(), queue a push only on the global runqueue, and
  perform a priority check when the switching task is inside a group.
  This too will change when migrations are added.
- In prio_changed_rt(), queue a pull only on the global runqueue, if
  the task is not queued. If the task is queued, run the preemption
  checks only if both the task whose priority changed and curr are in
  the same cgroup.

Update sched_rt_can_attach() to check whether a task can be attached to
a given cgroup. For now, the check only verifies that the group has
non-zero bandwidth. Remove the tsk argument from sched_rt_can_attach(),
as it is unused.

Change cpu_cgroup_can_attach() to check that the attachee is a FIFO/RR
task before attaching it to a cgroup.

Update __sched_setscheduler() to perform additional checks when
switching a task inside a cgroup to FIFO/RR, as the group needs to have
runtime allocated.

Update task_is_throttled_rt() for SCHED_CORE: return the server's
is_throttled value if a server is present; global rt-tasks are never
throttled.
Co-developed-by: Alessio Balsini
Signed-off-by: Alessio Balsini
Co-developed-by: Andrea Parri
Signed-off-by: Andrea Parri
Co-developed-by: luca abeni
Signed-off-by: luca abeni
Signed-off-by: Yuri Andriaccio
---
 kernel/sched/core.c     |   2 +-
 kernel/sched/rt.c       | 106 +++++++++++++++++++++++++++++++++++-----
 kernel/sched/sched.h    |   2 +-
 kernel/sched/syscalls.c |  12 +++++
 4 files changed, 109 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4e58b4f165ed..98a53b60e21f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9270,7 +9270,7 @@ static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
 		goto scx_check;
 
 	cgroup_taskset_for_each(task, css, tset) {
-		if (!sched_rt_can_attach(css_tg(css), task))
+		if (rt_task(task) && !sched_rt_can_attach(css_tg(css)))
 			return -EINVAL;
 	}
 scx_check:
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index defb812b0e48..67fbf4bbe461 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -975,7 +975,58 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
 static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct task_struct *donor = rq->donor;
+	struct sched_dl_entity *woken_dl_se = NULL;
+	struct sched_dl_entity *donor_dl_se = NULL;
+
+	if (!rt_group_sched_enabled())
+		goto no_group_sched;
 
+	/*
+	 * Preemption checks are different if the waking task and the current task
+	 * are running on the global runqueue or in a cgroup. The following rules
+	 * apply:
+	 * - dl-tasks (and equally dl_servers) always preempt FIFO/RR tasks.
+	 * - if curr is a FIFO/RR task inside a cgroup (i.e. run by a
+	 *   dl_server), or curr is a DEADLINE task and waking is a FIFO/RR task
+	 *   on the root cgroup, do nothing.
+	 * - if waking is inside a cgroup but curr is a FIFO/RR task in the root
+	 *   cgroup, always reschedule.
+	 * - if they are both on the global runqueue, run the standard code.
+	 * - if they are both in the same cgroup, check for tasks priorities.
+	 * - if they are both in a cgroup, but not the same one, check whether the
+	 *   woken task's dl_server preempts the current's dl_server.
+	 * - if curr is a DEADLINE task and waking is in a cgroup, check whether
+	 *   the woken task's server preempts curr.
+	 */
+	if (is_dl_group(rt_rq_of_se(&p->rt)))
+		woken_dl_se = dl_group_of(rt_rq_of_se(&p->rt));
+	if (is_dl_group(rt_rq_of_se(&donor->rt)))
+		donor_dl_se = dl_group_of(rt_rq_of_se(&donor->rt));
+	else if (task_has_dl_policy(donor))
+		donor_dl_se = &donor->dl;
+
+	if (woken_dl_se != NULL && donor_dl_se != NULL) {
+		if (woken_dl_se == donor_dl_se) {
+			if (p->prio < donor->prio)
+				resched_curr(rq);
+
+			return;
+		}
+
+		if (dl_entity_preempt(woken_dl_se, donor_dl_se))
+			resched_curr(rq);
+
+		return;
+
+	} else if (woken_dl_se != NULL) {
+		resched_curr(rq);
+		return;
+
+	} else if (donor_dl_se != NULL) {
+		return;
+	}
+
+no_group_sched:
 	/*
 	 * XXX If we're preempted by DL, queue a push?
 	 */
@@ -1026,7 +1077,8 @@ static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool f
 	if (rq->donor->sched_class != &rt_sched_class)
 		update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0);
 
-	rt_queue_push_tasks(rt_rq);
+	if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) || !is_dl_group(rt_rq))
+		rt_queue_push_tasks(rt_rq);
 }
 
 static struct sched_rt_entity *pick_next_rt_entity(struct rt_rq *rt_rq)
@@ -1736,6 +1788,8 @@ static void rq_offline_rt(struct rq *rq)
  */
 static void switched_from_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	/*
 	 * If there are other RT tasks then we will reschedule
 	 * and the scheduling of the other RT tasks will handle
@@ -1743,10 +1797,11 @@ static void switched_from_rt(struct rq *rq, struct task_struct *p)
 	 * we may need to handle the pulling of RT tasks
 	 * now.
 	 */
-	if (!task_on_rq_queued(p) || rq->rt.rt_nr_running)
+	if (!task_on_rq_queued(p) || rt_rq->rt_nr_running)
 		return;
 
-	rt_queue_pull_task(rt_rq_of_se(&p->rt));
+	if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) || !is_dl_group(rt_rq))
+		rt_queue_pull_task(rt_rq);
 }
 
 void __init init_sched_rt_class(void)
@@ -1766,6 +1821,8 @@ void __init init_sched_rt_class(void)
  */
 static void switched_to_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	/*
 	 * If we are running, update the avg_rt tracking, as the running time
 	 * will now on be accounted into the latter.
@@ -1781,8 +1838,14 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 	 * then see if we can move to another run queue.
 	 */
 	if (task_on_rq_queued(p)) {
-		if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
-			rt_queue_push_tasks(rt_rq_of_se(&p->rt));
+		if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq)) {
+			if (p->prio < rq->donor->prio)
+				resched_curr(rq);
+		} else {
+			if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
+				rt_queue_push_tasks(rt_rq_of_se(&p->rt));
+		}
+
 		if (p->prio < rq->donor->prio && cpu_online(cpu_of(rq)))
 			resched_curr(rq);
 	}
@@ -1795,6 +1858,8 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 static void
 prio_changed_rt(struct rq *rq, struct task_struct *p, u64 oldprio)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	if (!task_on_rq_queued(p))
 		return;
 
@@ -1807,15 +1872,25 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, u64 oldprio)
 		 * may need to pull tasks to this runqueue.
 		 */
 		if (oldprio < p->prio)
-			rt_queue_pull_task(rt_rq_of_se(&p->rt));
+			if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) || !is_dl_group(rt_rq))
+				rt_queue_pull_task(rt_rq);
 
 		/*
 		 * If there's a higher priority task waiting to run
 		 * then reschedule.
 		 */
-		if (p->prio > rq->rt.highest_prio.curr)
+		if (p->prio > rt_rq->highest_prio.curr)
 			resched_curr(rq);
 	} else {
+		/*
+		 * This task is not running, thus we check against the currently
+		 * running task for preemption. We can preempt only if both tasks are
+		 * in the same cgroup or on the global runqueue.
+		 */
+		if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) &&
+		    rt_rq_of_se(&p->rt)->tg != rt_rq_of_se(&rq->curr->rt)->tg)
+			return;
+
 		/*
 		 * This task is not running, but if it is
 		 * greater than the current running task
@@ -1908,7 +1983,16 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 #ifdef CONFIG_SCHED_CORE
 static int task_is_throttled_rt(struct task_struct *p, int cpu)
 {
+#ifdef CONFIG_RT_GROUP_SCHED
+	struct rt_rq *rt_rq;
+
+	rt_rq = task_group(p)->rt_rq[cpu];
+	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
+
+	return dl_group_of(rt_rq)->dl_throttled;
+#else
 	return 0;
+#endif
 }
 #endif /* CONFIG_SCHED_CORE */
 
@@ -2159,16 +2243,16 @@ static int sched_rt_global_constraints(void)
 }
 #endif /* CONFIG_SYSCTL */
 
-int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
+int sched_rt_can_attach(struct task_group *tg)
 {
 	/* Don't accept real-time tasks when there is no way for them to run */
-	if (rt_group_sched_enabled() && rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
+	if (rt_group_sched_enabled() && tg->dl_bandwidth.dl_runtime == 0)
 		return 0;
 
 	return 1;
 }
 
-#else /* !CONFIG_RT_GROUP_SCHED: */
+#else /* !CONFIG_RT_GROUP_SCHED */
 
 #ifdef CONFIG_SYSCTL
 static int sched_rt_global_constraints(void)
@@ -2176,7 +2260,7 @@ static int sched_rt_global_constraints(void)
 	return 0;
 }
 #endif /* CONFIG_SYSCTL */
-#endif /* !CONFIG_RT_GROUP_SCHED */
+#endif /* CONFIG_RT_GROUP_SCHED */
 
 #ifdef CONFIG_SYSCTL
 static int sched_rt_global_validate(void)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d949babfe16a..fceb02a04858 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -609,7 +609,7 @@ extern int sched_group_set_rt_runtime(struct task_group *tg, long rt_runtime_us)
 extern int sched_group_set_rt_period(struct task_group *tg, u64 rt_period_us);
 extern long sched_group_rt_runtime(struct task_group *tg);
 extern long sched_group_rt_period(struct task_group *tg);
-extern int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk);
+extern int sched_rt_can_attach(struct task_group *tg);
 
 extern struct task_group *sched_create_group(struct task_group *parent);
 extern void sched_online_group(struct task_group *tg,
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 806bc88d21ee..15653840c812 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -606,6 +606,18 @@ int __sched_setscheduler(struct task_struct *p,
 change:
 
 	if (user) {
+		/*
+		 * Do not allow real-time tasks into groups that have no runtime
+		 * assigned.
+		 */
+		if (rt_group_sched_enabled() &&
+		    dl_bandwidth_enabled() && rt_policy(policy) &&
+		    !sched_rt_can_attach(task_group(p)) &&
+		    !task_group_is_autogroup(task_group(p))) {
+			retval = -EPERM;
+			goto unlock;
+		}
+
 		if (dl_bandwidth_enabled() && dl_policy(policy) &&
 		    !(attr->sched_flags & SCHED_FLAG_SUGOV)) {
 			cpumask_t *span = rq->rd->span;
-- 
2.53.0