From: Andrea Righi <arighi@nvidia.com>
To: Juri Lelli <juri.lelli@redhat.com>
Cc: "Ingo Molnar" <mingo@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Vincent Guittot" <vincent.guittot@linaro.org>,
"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
"Valentin Schneider" <vschneid@redhat.com>,
"K Prateek Nayak" <kprateek.nayak@amd.com>,
"Frederic Weisbecker" <frederic@kernel.org>,
linux-kernel@vger.kernel.org,
"David Haufe" <dhaufe@simplextrading.com>,
"Cao Ruichuang" <create0818@163.com>,
"Furkan Çalışkan" <frn1furkan10@gmail.com>
Subject: Re: [PATCH v2] sched/deadline: Make dl-server nohz full aware
Date: Wed, 13 May 2026 13:13:49 +0200
Message-ID: <agRc7bDnav7tgcJT@gpd4>
In-Reply-To: <20260513-upstream-fix-dlserver-nohzfull-b4-v2-1-d3e9cbe5c845@redhat.com>
Hi Juri,
On Wed, May 13, 2026 at 11:13:03AM +0200, Juri Lelli wrote:
> The dl_server_timer() originally caused spurious IPIs on nohz_full
> cores, breaking isolation guarantees. While such IPIs cannot be observed
> on recent kernels, dl-server timers for tick-stopped isolated CPUs still
> fire unnecessarily on housekeeping cores.
>
> The problem is that dl-servers are not coordinated with nohz_full tick
> state. Even when the tick stops on an isolated CPU, its dl-server timer
> continues to fire on housekeeping, wasting cycles and potentially
> affecting housekeeping CPU performance.
>
> Fix by managing servers in sched_can_stop_tick():
>
>  - When RT tasks run with CFS/SCX tasks, start the appropriate server(s)
>    and keep the tick running
>  - When only RT tasks remain, stop all servers and allow tick to stop
>    (except for >1 RR tasks which need the tick for round-robin)
>  - When only CFS/SCX tasks remain, stop all servers before stopping tick
>
> Introduce dl_servers_stop_all() to reduce duplication and abstract
> server management from core.c. Unify RT handling into one block that
> handles both RR and FIFO cases.
>
> Note on SCX: While SCX is incompatible with isolcpus=domain, it does
> support nohz_full. The ext_server handling in this patch targets
> nohz_full configurations without domain isolation.
>
> Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
> Reported-by: David Haufe <dhaufe@simplextrading.com>
> Closes: https://lore.kernel.org/lkml/CAKJHwtOw_G67edzuHVtL1xC5Vyt6StcZzihtDd0yaKudW=rwVw@mail.gmail.com
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
From a sched_ext perspective LGTM.
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Thanks,
-Andrea
> ---
> Changes from v1 [1]
>
> - Fix CFS/SCX server start logic to handle both simultaneously in
> partial switch mode (Furkan)
> - Clarify in commit message that SCX supports nohz_full despite
> isolcpus=domain incompatibility (Andrea)
>
> 1 - https://lore.kernel.org/lkml/20260512-upstream-fix-dlserver-nohzfull-b4-v1-1-a94844387ae7@redhat.com/
> ---
> kernel/sched/core.c | 46 +++++++++++++++++++++++++++-------------------
> kernel/sched/deadline.c | 14 ++++++++++++++
> kernel/sched/sched.h | 1 +
> 3 files changed, 42 insertions(+), 19 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b905805bbcbe4..6d05ce9b1dfe6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1414,30 +1414,40 @@ static inline bool __need_bw_check(struct rq *rq, struct task_struct *p)
>
>  bool sched_can_stop_tick(struct rq *rq)
>  {
> -	int fifo_nr_running;
> -
>  	/* Deadline tasks, even if single, need the tick */
>  	if (rq->dl.dl_nr_running)
>  		return false;
>
>  	/*
> -	 * If there are more than one RR tasks, we need the tick to affect the
> -	 * actual RR behaviour.
> +	 * If there are RT tasks, we may need the tick (for >1 RR tasks),
> +	 * but we must also service lower-priority CFS/SCX tasks via dl-servers.
>  	 */
> -	if (rq->rt.rr_nr_running) {
> -		if (rq->rt.rr_nr_running == 1)
> -			return true;
> -		else
> +	if (rq->rt.rt_nr_running) {
> +		bool cfs_or_scx_queued = false;
> +
> +		if (rq->cfs.h_nr_queued) {
> +			dl_server_start(&rq->fair_server);
> +			cfs_or_scx_queued = true;
> +		}
> +#ifdef CONFIG_SCHED_CLASS_EXT
> +		if (rq->scx.nr_running) {
> +			dl_server_start(&rq->ext_server);
> +			cfs_or_scx_queued = true;
> +		}
> +#endif
> +		if (cfs_or_scx_queued)
>  			return false;
> -	}
>
> -	/*
> -	 * If there's no RR tasks, but FIFO tasks, we can skip the tick, no
> -	 * forced preemption between FIFO tasks.
> -	 */
> -	fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
> -	if (fifo_nr_running)
> +		/*
> +		 * Only RT tasks, no CFS/SCX. Stop servers to prevent spurious
> +		 * wakeups. Tick can stop for single RR or any FIFO, but must
> +		 * run for multiple RR (round-robin behavior).
> +		 */
> +		dl_servers_stop_all(rq);
> +		if (rq->rt.rr_nr_running > 1)
> +			return false;
>  		return true;
> +	}
>
>  	/*
>  	 * If there are no DL,RR/FIFO tasks, there must only be CFS or SCX tasks
> @@ -1462,6 +1472,7 @@ bool sched_can_stop_tick(struct rq *rq)
>  			return false;
>  	}
>
> +	dl_servers_stop_all(rq);
>  	return true;
>  }
>  #endif /* CONFIG_NO_HZ_FULL */
> @@ -8810,10 +8821,7 @@ int sched_cpu_dying(unsigned int cpu)
>  		WARN(true, "Dying CPU not properly vacated!");
>  		dump_rq_tasks(rq, KERN_WARNING);
>  	}
> -	dl_server_stop(&rq->fair_server);
> -#ifdef CONFIG_SCHED_CLASS_EXT
> -	dl_server_stop(&rq->ext_server);
> -#endif
> +	dl_servers_stop_all(rq);
>  	rq_unlock_irqrestore(rq, &rf);
>
>  	calc_load_migrate(rq);
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index edca7849b165d..c2b3d6bbe4828 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1826,6 +1826,20 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
>  	dl_se->dl_server_active = 0;
>  }
>
> +/*
> + * Stop all dl-servers on this runqueue. Called when transitioning to a state
> + * where the tick can be stopped (e.g., single RR/FIFO task, or no RT tasks).
> + * This ensures server timers are disarmed and won't cause spurious wakeups on
> + * nohz_full isolated cores.
> + */
> +void dl_servers_stop_all(struct rq *rq)
> +{
> +	dl_server_stop(&rq->fair_server);
> +#ifdef CONFIG_SCHED_CLASS_EXT
> +	dl_server_stop(&rq->ext_server);
> +#endif
> +}
> +
>  void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
>  		    dl_server_pick_f pick_task)
>  {
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 9f63b15d309d1..26cf1d14efde5 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -412,6 +412,7 @@ extern void dl_server_update_idle(struct sched_dl_entity *dl_se, s64 delta_exec)
>  extern void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec);
>  extern void dl_server_start(struct sched_dl_entity *dl_se);
>  extern void dl_server_stop(struct sched_dl_entity *dl_se);
> +extern void dl_servers_stop_all(struct rq *rq);
>  extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
>  		    dl_server_pick_f pick_task);
>  extern void sched_init_dl_servers(void);
>
> ---
> base-commit: 4ac4d6549a6563878d7c19c154e017f6cb7114d3
> change-id: 20260513-upstream-fix-dlserver-nohzfull-b4-fa741a2b6189
>
> Best regards,
> --
> Juri Lelli <juri.lelli@redhat.com>
>
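
For readers following the thread, the decision flow that the patched
sched_can_stop_tick() implements can be sketched as a small userspace model.
This is only an illustration under stated assumptions: struct mock_rq and all
mock_* functions are hypothetical stand-ins that do not exist in the kernel,
the server "active" flags merely model dl_server_start()/dl_server_stop(), and
the additional CFS/SCX checks the real function performs between the two
core.c hunks (not shown in the diff) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct rq. */
struct mock_rq {
	unsigned int dl_nr_running;	/* models rq->dl.dl_nr_running */
	unsigned int rt_nr_running;	/* models rq->rt.rt_nr_running (RR + FIFO) */
	unsigned int rr_nr_running;	/* models rq->rt.rr_nr_running */
	unsigned int cfs_nr_queued;	/* models rq->cfs.h_nr_queued */
	unsigned int scx_nr_running;	/* models rq->scx.nr_running */
	bool fair_server_active;	/* models the fair dl-server being armed */
	bool ext_server_active;		/* models the ext dl-server being armed */
};

/* Models dl_servers_stop_all(): disarm both servers. */
void mock_servers_stop_all(struct mock_rq *rq)
{
	rq->fair_server_active = false;
	rq->ext_server_active = false;
}

/* Models the patched decision flow; returns true if the tick may stop. */
bool mock_can_stop_tick(struct mock_rq *rq)
{
	/* Deadline tasks, even if single, need the tick. */
	if (rq->dl_nr_running)
		return false;

	if (rq->rt_nr_running) {
		bool cfs_or_scx_queued = false;

		/* RT plus lower-priority work: arm the matching server(s). */
		if (rq->cfs_nr_queued) {
			rq->fair_server_active = true;
			cfs_or_scx_queued = true;
		}
		if (rq->scx_nr_running) {
			rq->ext_server_active = true;
			cfs_or_scx_queued = true;
		}
		if (cfs_or_scx_queued)
			return false;

		/* Only RT tasks: servers are not needed; >1 RR needs the tick. */
		mock_servers_stop_all(rq);
		return rq->rr_nr_running <= 1;
	}

	/*
	 * No DL or RT tasks. The real function runs further CFS/SCX checks
	 * here that may keep the tick; this model omits them.
	 */
	mock_servers_stop_all(rq);
	return true;
}

/* Walks through the cases listed in the commit message. */
void mock_self_check(void)
{
	struct mock_rq mixed = { .rt_nr_running = 1, .cfs_nr_queued = 1 };
	struct mock_rq fifo = { .rt_nr_running = 2, .fair_server_active = true };
	struct mock_rq rr = { .rt_nr_running = 2, .rr_nr_running = 2 };
	struct mock_rq cfs = { .cfs_nr_queued = 1, .fair_server_active = true };

	assert(!mock_can_stop_tick(&mixed));	/* RT + CFS: tick stays on */
	assert(mixed.fair_server_active && !mixed.ext_server_active);

	assert(mock_can_stop_tick(&fifo));	/* FIFO only: tick may stop */
	assert(!fifo.fair_server_active);

	assert(!mock_can_stop_tick(&rr));	/* two RR tasks: tick stays on */

	assert(mock_can_stop_tick(&cfs));	/* CFS only: servers stopped first */
	assert(!cfs.fair_server_active);
}
```

Calling mock_self_check() from a test harness exercises the RT+CFS, FIFO-only,
multi-RR, and CFS-only cases. The point of the sketch is the ordering: servers
are started only while RT tasks coexist with CFS/SCX work, and are stopped on
every path where the model lets the tick stop.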