The Linux Kernel Mailing List
* [PATCH] sched/deadline: Make dl-server nohz full aware
@ 2026-05-12  9:02 Juri Lelli
  2026-05-12 10:06 ` Furkan Çalışkan
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Juri Lelli @ 2026-05-12  9:02 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Andrea Righi, Frederic Weisbecker
  Cc: linux-kernel, David Haufe, Cao Ruichuang, Juri Lelli

The dl_server_timer() causes spurious IPIs on nohz_full cores, breaking
isolation guarantees. The timer executes on a housekeeping core and
eventually calls tick_nohz_dep_set_cpu(), sending IPIs to isolated cores
even when only a single task is running.

The problem is that dl-servers are not coordinated with nohz_full tick
state. Timers can fire and send IPIs to otherwise undisturbed cores.

Fix by managing servers in sched_can_stop_tick():

- When RT tasks run with CFS/SCX tasks, start the appropriate server
  and keep the tick running
- When only RT tasks remain, stop all servers and allow tick to stop
  (except for >1 RR tasks which need the tick for round-robin)
- When only CFS/SCX tasks remain, stop all servers before stopping tick

Introduce dl_servers_stop_all() to reduce duplication and abstract
server management from core.c. Unify RT handling into one block that
handles both RR and FIFO cases.

Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
Reported-by: David Haufe <dhaufe@simplextrading.com>
Closes: https://lore.kernel.org/lkml/CAKJHwtOw_G67edzuHVtL1xC5Vyt6StcZzihtDd0yaKudW=rwVw@mail.gmail.com
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
I had to modify my first original attempt at fixing this (please take a
look at the linked report/discussion) to also take SCX into
consideration.

FYI, I temporarily pushed the script I'm using to repro and verify the
fix here

https://github.com/jlelli/sched-deadline-tests/blob/master/test-dlserver-nohz.sh
---
 kernel/sched/core.c     | 43 +++++++++++++++++++++++--------------------
 kernel/sched/deadline.c | 14 ++++++++++++++
 kernel/sched/sched.h    |  1 +
 3 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b905805bbcbe4..98759255c306b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1414,30 +1414,35 @@ static inline bool __need_bw_check(struct rq *rq, struct task_struct *p)
 
 bool sched_can_stop_tick(struct rq *rq)
 {
-	int fifo_nr_running;
-
 	/* Deadline tasks, even if single, need the tick */
 	if (rq->dl.dl_nr_running)
 		return false;
 
 	/*
-	 * If there are more than one RR tasks, we need the tick to affect the
-	 * actual RR behaviour.
+	 * If there are RT tasks, we may need the tick (for >1 RR tasks),
+	 * but we must also service lower-priority CFS/SCX tasks via dl-servers.
 	 */
-	if (rq->rt.rr_nr_running) {
-		if (rq->rt.rr_nr_running == 1)
-			return true;
-		else
+	if (rq->rt.rt_nr_running) {
+		if (rq->cfs.h_nr_queued) {
+			dl_server_start(&rq->fair_server);
+			return false;
+		}
+#ifdef CONFIG_SCHED_CLASS_EXT
+		if (rq->scx.nr_running) {
+			dl_server_start(&rq->ext_server);
+			return false;
+		}
+#endif
+		/*
+		 * Only RT tasks, no CFS/SCX. Stop servers to prevent spurious
+		 * wakeups. Tick can stop for single RR or any FIFO, but must
+		 * run for multiple RR (round-robin behavior).
+		 */
+		dl_servers_stop_all(rq);
+		if (rq->rt.rr_nr_running > 1)
 			return false;
-	}
-
-	/*
-	 * If there's no RR tasks, but FIFO tasks, we can skip the tick, no
-	 * forced preemption between FIFO tasks.
-	 */
-	fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
-	if (fifo_nr_running)
 		return true;
+	}
 
 	/*
 	 * If there are no DL,RR/FIFO tasks, there must only be CFS or SCX tasks
@@ -1462,6 +1467,7 @@ bool sched_can_stop_tick(struct rq *rq)
 			return false;
 	}
 
+	dl_servers_stop_all(rq);
 	return true;
 }
 #endif /* CONFIG_NO_HZ_FULL */
@@ -8810,10 +8816,7 @@ int sched_cpu_dying(unsigned int cpu)
 		WARN(true, "Dying CPU not properly vacated!");
 		dump_rq_tasks(rq, KERN_WARNING);
 	}
-	dl_server_stop(&rq->fair_server);
-#ifdef CONFIG_SCHED_CLASS_EXT
-	dl_server_stop(&rq->ext_server);
-#endif
+	dl_servers_stop_all(rq);
 	rq_unlock_irqrestore(rq, &rf);
 
 	calc_load_migrate(rq);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index edca7849b165d..c2b3d6bbe4828 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1826,6 +1826,20 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
 	dl_se->dl_server_active = 0;
 }
 
+/*
+ * Stop all dl-servers on this runqueue. Called when transitioning to a state
+ * where the tick can be stopped (e.g., single RR/FIFO task, or no RT tasks).
+ * This ensures server timers are disarmed and won't cause spurious wakeups on
+ * nohz_full isolated cores.
+ */
+void dl_servers_stop_all(struct rq *rq)
+{
+	dl_server_stop(&rq->fair_server);
+#ifdef CONFIG_SCHED_CLASS_EXT
+	dl_server_stop(&rq->ext_server);
+#endif
+}
+
 void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
 		    dl_server_pick_f pick_task)
 {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9f63b15d309d1..26cf1d14efde5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -412,6 +412,7 @@ extern void dl_server_update_idle(struct sched_dl_entity *dl_se, s64 delta_exec)
 extern void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec);
 extern void dl_server_start(struct sched_dl_entity *dl_se);
 extern void dl_server_stop(struct sched_dl_entity *dl_se);
+extern void dl_servers_stop_all(struct rq *rq);
 extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
 		    dl_server_pick_f pick_task);
 extern void sched_init_dl_servers(void);

---
base-commit: 4ac4d6549a6563878d7c19c154e017f6cb7114d3
change-id: 20260512-upstream-fix-dlserver-nohzfull-b4-b745e2a967ed

Best regards,
--  
Juri Lelli <juri.lelli@redhat.com>
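
For readers following the thread, the decision logic described in the changelog can be modelled in user space. The sketch below is not kernel code: struct rq_model, its fields, stop_all_model() and can_stop_tick_model() are hypothetical stand-ins for the runqueue counters and dl-server state touched by the diff. It mirrors only the DL and RT branches of sched_can_stop_tick() as posted in this patch, and it treats the CFS/SCX-only path as "stop servers, tick may stop", which skips the additional checks the real function performs there.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the per-runqueue state. */
struct rq_model {
	unsigned int dl_nr_running;   /* SCHED_DEADLINE tasks */
	unsigned int rt_nr_running;   /* all RT tasks (FIFO + RR) */
	unsigned int rr_nr_running;   /* SCHED_RR tasks only */
	unsigned int cfs_h_nr_queued; /* queued fair tasks */
	unsigned int scx_nr_running;  /* sched_ext tasks */
	bool fair_server_active;      /* models rq->fair_server */
	bool ext_server_active;       /* models rq->ext_server */
};

/* Models dl_servers_stop_all(): mark both servers as stopped. */
static void stop_all_model(struct rq_model *rq)
{
	rq->fair_server_active = false;
	rq->ext_server_active = false;
}

/* Returns true when the model says the tick may be stopped. */
static bool can_stop_tick_model(struct rq_model *rq)
{
	/* Deadline tasks always need the tick. */
	if (rq->dl_nr_running)
		return false;

	if (rq->rt_nr_running) {
		/* RT plus lower-priority work: start a server, keep the tick. */
		if (rq->cfs_h_nr_queued) {
			rq->fair_server_active = true;
			return false;
		}
		if (rq->scx_nr_running) {
			rq->ext_server_active = true;
			return false;
		}
		/* Only RT tasks: no server needed; >1 RR still needs the tick. */
		stop_all_model(rq);
		return rq->rr_nr_running <= 1;
	}

	/* No DL or RT tasks (the real function performs more checks here). */
	stop_all_model(rq);
	return true;
}
```

With this model, a runqueue holding one FIFO task and one fair task keeps the tick and ends up with the fair server active, while a runqueue holding a single FIFO task stops both servers and lets the tick stop.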



* Re: [PATCH] sched/deadline: Make dl-server nohz full aware
  2026-05-12  9:02 [PATCH] sched/deadline: Make dl-server nohz full aware Juri Lelli
@ 2026-05-12 10:06 ` Furkan Çalışkan
  2026-05-12 12:27   ` Juri Lelli
  2026-05-12 14:03 ` Frederic Weisbecker
  2026-05-12 14:55 ` Andrea Righi
  2 siblings, 1 reply; 9+ messages in thread
From: Furkan Çalışkan @ 2026-05-12 10:06 UTC (permalink / raw)
  To: Juri Lelli, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Andrea Righi,
	Frederic Weisbecker
  Cc: linux-kernel, David Haufe, Cao Ruichuang

Hi Juri,

On 5/12/26 12:02, Juri Lelli wrote:
> The dl_server_timer() causes spurious IPIs on nohz_full cores, breaking
> isolation guarantees. The timer executes on a housekeeping core and
> eventually calls tick_nohz_dep_set_cpu(), sending IPIs to isolated cores
> even when only a single task is running.
> 
> The problem is that dl-servers are not coordinated with nohz_full tick
> state. Timers can fire and send IPIs to otherwise undisturbed cores.
> 
> Fix by managing servers in sched_can_stop_tick():
> 
> - When RT tasks run with CFS/SCX tasks, start the appropriate server
>   and keep the tick running
> - When only RT tasks remain, stop all servers and allow tick to stop
>   (except for >1 RR tasks which need the tick for round-robin)
> - When only CFS/SCX tasks remain, stop all servers before stopping tick
> 
> Introduce dl_servers_stop_all() to reduce duplication and abstract
> server management from core.c. Unify RT handling into one block that
> handles both RR and FIFO cases.
> 
> Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
> Reported-by: David Haufe <dhaufe@simplextrading.com>
> Closes: https://lore.kernel.org/lkml/CAKJHwtOw_G67edzuHVtL1xC5Vyt6StcZzihtDd0yaKudW=rwVw@mail.gmail.com
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
> ---
> I had to modify my first original attempt at fixing this (please take a
> look at the linked report/discussion) to also take SCX into
> consideration.
> 
> FYI, I temporarily pushed the script I'm using to repro and verify the
> fix here
> 
> https://github.com/jlelli/sched-deadline-tests/blob/master/test-dlserver-nohz.sh
> ---
>  kernel/sched/core.c     | 43 +++++++++++++++++++++++--------------------
>  kernel/sched/deadline.c | 14 ++++++++++++++
>  kernel/sched/sched.h    |  1 +
>  3 files changed, 38 insertions(+), 20 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b905805bbcbe4..98759255c306b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1414,30 +1414,35 @@ static inline bool __need_bw_check(struct rq *rq, struct task_struct *p)
>  
>  bool sched_can_stop_tick(struct rq *rq)
>  {
> -	int fifo_nr_running;
> -
>  	/* Deadline tasks, even if single, need the tick */
>  	if (rq->dl.dl_nr_running)
>  		return false;
>  
>  	/*
> -	 * If there are more than one RR tasks, we need the tick to affect the
> -	 * actual RR behaviour.
> +	 * If there are RT tasks, we may need the tick (for >1 RR tasks),
> +	 * but we must also service lower-priority CFS/SCX tasks via dl-servers.
>  	 */
> -	if (rq->rt.rr_nr_running) {
> -		if (rq->rt.rr_nr_running == 1)
> -			return true;
> -		else
> +	if (rq->rt.rt_nr_running) {
> +		if (rq->cfs.h_nr_queued) {
> +			dl_server_start(&rq->fair_server);
> +			return false;
> +		}
> +#ifdef CONFIG_SCHED_CLASS_EXT
> +		if (rq->scx.nr_running) {
> +			dl_server_start(&rq->ext_server);
> +			return false;
> +		}
> +#endif

In the above block, the CFS and SCX server start paths are mutually exclusive. 
If both cfs.h_nr_queued and scx.nr_running are non-zero at the same time, only 
fair_server gets started and ext_server remains stopped. Could that leave SCX 
tasks without server coverage in a mixed CFS+SCX+RT scenario?

---
Thanks,
Furkan Caliskan
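
The concern raised above can be illustrated with a small user-space model. Everything here is a hypothetical sketch: struct mixed_rq and both helper functions are invented for illustration and are not kernel code. The first helper follows the early-return shape of the posted hunk; the second shows one possible alternative that checks both classes before returning, and is not a claim about what a later version of the patch does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the counters and server state in question. */
struct mixed_rq {
	unsigned int cfs_h_nr_queued;
	unsigned int scx_nr_running;
	bool fair_server_active;
	bool ext_server_active;
};

/*
 * Shape of the posted hunk: the first matching class wins, so the ext
 * server is never considered while fair tasks are queued.
 * Returns true if a server was started (the tick must keep running).
 */
static bool start_servers_as_posted(struct mixed_rq *rq)
{
	if (rq->cfs_h_nr_queued) {
		rq->fair_server_active = true;
		return true;
	}
	if (rq->scx_nr_running) {
		rq->ext_server_active = true;
		return true;
	}
	return false;
}

/* Illustrative alternative: evaluate both classes, then report. */
static bool start_servers_both(struct mixed_rq *rq)
{
	bool started = false;

	if (rq->cfs_h_nr_queued) {
		rq->fair_server_active = true;
		started = true;
	}
	if (rq->scx_nr_running) {
		rq->ext_server_active = true;
		started = true;
	}
	return started;
}
```

In the model, a runqueue with both fair and SCX tasks leaves ext_server_active false under the first helper and true under the second.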


* Re: [PATCH] sched/deadline: Make dl-server nohz full aware
  2026-05-12 10:06 ` Furkan Çalışkan
@ 2026-05-12 12:27   ` Juri Lelli
  0 siblings, 0 replies; 9+ messages in thread
From: Juri Lelli @ 2026-05-12 12:27 UTC (permalink / raw)
  To: Furkan Çalışkan
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Andrea Righi, Frederic Weisbecker, linux-kernel,
	David Haufe, Cao Ruichuang

Hi Furkan,

On 12/05/26 13:06, Furkan Çalışkan wrote:
> Hi Juri,
> 
> On 5/12/26 12:02, Juri Lelli wrote:
> > The dl_server_timer() causes spurious IPIs on nohz_full cores, breaking
> > isolation guarantees. The timer executes on a housekeeping core and
> > eventually calls tick_nohz_dep_set_cpu(), sending IPIs to isolated cores
> > even when only a single task is running.
> > 
> > The problem is that dl-servers are not coordinated with nohz_full tick
> > state. Timers can fire and send IPIs to otherwise undisturbed cores.
> > 
> > Fix by managing servers in sched_can_stop_tick():
> > 
> > - When RT tasks run with CFS/SCX tasks, start the appropriate server
> >   and keep the tick running
> > - When only RT tasks remain, stop all servers and allow tick to stop
> >   (except for >1 RR tasks which need the tick for round-robin)
> > - When only CFS/SCX tasks remain, stop all servers before stopping tick
> > 
> > Introduce dl_servers_stop_all() to reduce duplication and abstract
> > server management from core.c. Unify RT handling into one block that
> > handles both RR and FIFO cases.
> > 
> > Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
> > Reported-by: David Haufe <dhaufe@simplextrading.com>
> > Closes: https://lore.kernel.org/lkml/CAKJHwtOw_G67edzuHVtL1xC5Vyt6StcZzihtDd0yaKudW=rwVw@mail.gmail.com
> > Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
> > ---
> > I had to modify my first original attempt at fixing this (please take a
> > look at the linked report/discussion) to also take SCX into
> > consideration.
> > 
> > FYI, I temporarily pushed the script I'm using to repro and verify the
> > fix here
> > 
> > https://github.com/jlelli/sched-deadline-tests/blob/master/test-dlserver-nohz.sh
> > ---
> >  kernel/sched/core.c     | 43 +++++++++++++++++++++++--------------------
> >  kernel/sched/deadline.c | 14 ++++++++++++++
> >  kernel/sched/sched.h    |  1 +
> >  3 files changed, 38 insertions(+), 20 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index b905805bbcbe4..98759255c306b 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1414,30 +1414,35 @@ static inline bool __need_bw_check(struct rq *rq, struct task_struct *p)
> >  
> >  bool sched_can_stop_tick(struct rq *rq)
> >  {
> > -	int fifo_nr_running;
> > -
> >  	/* Deadline tasks, even if single, need the tick */
> >  	if (rq->dl.dl_nr_running)
> >  		return false;
> >  
> >  	/*
> > -	 * If there are more than one RR tasks, we need the tick to affect the
> > -	 * actual RR behaviour.
> > +	 * If there are RT tasks, we may need the tick (for >1 RR tasks),
> > +	 * but we must also service lower-priority CFS/SCX tasks via dl-servers.
> >  	 */
> > -	if (rq->rt.rr_nr_running) {
> > -		if (rq->rt.rr_nr_running == 1)
> > -			return true;
> > -		else
> > +	if (rq->rt.rt_nr_running) {
> > +		if (rq->cfs.h_nr_queued) {
> > +			dl_server_start(&rq->fair_server);
> > +			return false;
> > +		}
> > +#ifdef CONFIG_SCHED_CLASS_EXT
> > +		if (rq->scx.nr_running) {
> > +			dl_server_start(&rq->ext_server);
> > +			return false;
> > +		}
> > +#endif
> 
> In the above block, the CFS and SCX server start paths are mutually exclusive. 
> If both cfs.h_nr_queued and scx.nr_running are non-zero at the same time, only 
> fair_server gets started and ext_server remains stopped. Could that leave SCX 
> tasks without server coverage in a mixed CFS+SCX+RT scenario?

Indeed there is the partial switch mode to consider. Can fix in the next
version.

Thanks,
Juri



* Re: [PATCH] sched/deadline: Make dl-server nohz full aware
  2026-05-12  9:02 [PATCH] sched/deadline: Make dl-server nohz full aware Juri Lelli
  2026-05-12 10:06 ` Furkan Çalışkan
@ 2026-05-12 14:03 ` Frederic Weisbecker
  2026-05-12 15:31   ` Juri Lelli
  2026-05-12 14:55 ` Andrea Righi
  2 siblings, 1 reply; 9+ messages in thread
From: Frederic Weisbecker @ 2026-05-12 14:03 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Andrea Righi, linux-kernel, David Haufe,
	Cao Ruichuang, Tejun Heo

On Tue, May 12, 2026 at 11:02:37AM +0200, Juri Lelli wrote:
> The dl_server_timer() causes spurious IPIs on nohz_full cores, breaking
> isolation guarantees. The timer executes on a housekeeping core and
> eventually calls tick_nohz_dep_set_cpu(), sending IPIs to isolated cores
> even when only a single task is running.
> 
> The problem is that dl-servers are not coordinated with nohz_full tick
> state. Timers can fire and send IPIs to otherwise undisturbed cores.
> 
> Fix by managing servers in sched_can_stop_tick():
> 
> - When RT tasks run with CFS/SCX tasks, start the appropriate server
>   and keep the tick running
> - When only RT tasks remain, stop all servers and allow tick to stop
>   (except for >1 RR tasks which need the tick for round-robin)
> - When only CFS/SCX tasks remain, stop all servers before stopping tick
> 
> Introduce dl_servers_stop_all() to reduce duplication and abstract
> server management from core.c. Unify RT handling into one block that
> handles both RR and FIFO cases.
> 
> Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
> Reported-by: David Haufe <dhaufe@simplextrading.com>
> Closes: https://lore.kernel.org/lkml/CAKJHwtOw_G67edzuHVtL1xC5Vyt6StcZzihtDd0yaKudW=rwVw@mail.gmail.com
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>

I indeed observed IPIs originating from dl_server some time
ago but that magically disappeared after some commit from Peter.

Perhaps it came back somehow? Lemme run dynticks-testing again on
latest upstream...

> ---
> I had to modify my first original attempt at fixing this (please take a
> look at the linked report/discussion) to also take SCX into
> consideration.

I thought SCX was disabled when CPU isolation is running?

9f391f94a173 ("sched_ext: Disallow loading BPF scheduler if isolcpus= domain
isolation is in effect")

Thanks.

-- 
Frederic Weisbecker
SUSE Labs


* Re: [PATCH] sched/deadline: Make dl-server nohz full aware
  2026-05-12  9:02 [PATCH] sched/deadline: Make dl-server nohz full aware Juri Lelli
  2026-05-12 10:06 ` Furkan Çalışkan
  2026-05-12 14:03 ` Frederic Weisbecker
@ 2026-05-12 14:55 ` Andrea Righi
  2026-05-12 15:34   ` Juri Lelli
  2 siblings, 1 reply; 9+ messages in thread
From: Andrea Righi @ 2026-05-12 14:55 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Frederic Weisbecker, linux-kernel, David Haufe,
	Cao Ruichuang

Hi Juri,

On Tue, May 12, 2026 at 11:02:37AM +0200, Juri Lelli wrote:
> The dl_server_timer() causes spurious IPIs on nohz_full cores, breaking
> isolation guarantees. The timer executes on a housekeeping core and
> eventually calls tick_nohz_dep_set_cpu(), sending IPIs to isolated cores
> even when only a single task is running.
> 
> The problem is that dl-servers are not coordinated with nohz_full tick
> state. Timers can fire and send IPIs to otherwise undisturbed cores.
> 
> Fix by managing servers in sched_can_stop_tick():
> 
> - When RT tasks run with CFS/SCX tasks, start the appropriate server
>   and keep the tick running
> - When only RT tasks remain, stop all servers and allow tick to stop
>   (except for >1 RR tasks which need the tick for round-robin)
> - When only CFS/SCX tasks remain, stop all servers before stopping tick
> 
> Introduce dl_servers_stop_all() to reduce duplication and abstract
> server management from core.c. Unify RT handling into one block that
> handles both RR and FIFO cases.
> 
> Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
> Reported-by: David Haufe <dhaufe@simplextrading.com>
> Closes: https://lore.kernel.org/lkml/CAKJHwtOw_G67edzuHVtL1xC5Vyt6StcZzihtDd0yaKudW=rwVw@mail.gmail.com
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
> ---
> I had to modify my first original attempt at fixing this (please take a
> look at the linked report/discussion) to also take SCX into
> consideration.

As mentioned by Frederic, we don't allow loading BPF schedulers when isolcpus=
is used, so I think we can simplify the sched_can_stop_tick() part.

> 
> FYI, I temporarily pushed the script I'm using to repro and verify the
> fix here
> 
> https://github.com/jlelli/sched-deadline-tests/blob/master/test-dlserver-nohz.sh
> ---
>  kernel/sched/core.c     | 43 +++++++++++++++++++++++--------------------
>  kernel/sched/deadline.c | 14 ++++++++++++++
>  kernel/sched/sched.h    |  1 +
>  3 files changed, 38 insertions(+), 20 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b905805bbcbe4..98759255c306b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1414,30 +1414,35 @@ static inline bool __need_bw_check(struct rq *rq, struct task_struct *p)
>  
>  bool sched_can_stop_tick(struct rq *rq)
>  {
> -	int fifo_nr_running;
> -
>  	/* Deadline tasks, even if single, need the tick */
>  	if (rq->dl.dl_nr_running)
>  		return false;
>  
>  	/*
> -	 * If there are more than one RR tasks, we need the tick to affect the
> -	 * actual RR behaviour.
> +	 * If there are RT tasks, we may need the tick (for >1 RR tasks),
> +	 * but we must also service lower-priority CFS/SCX tasks via dl-servers.

No need to mention SCX; maybe we can add a note that SCX is incompatible with
isolcpus, so there's no SCX task to run here.

>  	 */
> -	if (rq->rt.rr_nr_running) {
> -		if (rq->rt.rr_nr_running == 1)
> -			return true;
> -		else
> +	if (rq->rt.rt_nr_running) {
> +		if (rq->cfs.h_nr_queued) {
> +			dl_server_start(&rq->fair_server);
> +			return false;
> +		}
> +#ifdef CONFIG_SCHED_CLASS_EXT
> +		if (rq->scx.nr_running) {
> +			dl_server_start(&rq->ext_server);
> +			return false;
> +		}
> +#endif

This #ifdef block can go away.

> +		/*
> +		 * Only RT tasks, no CFS/SCX. Stop servers to prevent spurious

CFS/SCX -> CFS.

> +		 * wakeups. Tick can stop for single RR or any FIFO, but must
> +		 * run for multiple RR (round-robin behavior).
> +		 */
> +		dl_servers_stop_all(rq);
> +		if (rq->rt.rr_nr_running > 1)
>  			return false;
> -	}
> -
> -	/*
> -	 * If there's no RR tasks, but FIFO tasks, we can skip the tick, no
> -	 * forced preemption between FIFO tasks.
> -	 */
> -	fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
> -	if (fifo_nr_running)
>  		return true;
> +	}
>  
>  	/*
>  	 * If there are no DL,RR/FIFO tasks, there must only be CFS or SCX tasks
> @@ -1462,6 +1467,7 @@ bool sched_can_stop_tick(struct rq *rq)
>  			return false;
>  	}
>  
> +	dl_servers_stop_all(rq);
>  	return true;
>  }
>  #endif /* CONFIG_NO_HZ_FULL */
> @@ -8810,10 +8816,7 @@ int sched_cpu_dying(unsigned int cpu)
>  		WARN(true, "Dying CPU not properly vacated!");
>  		dump_rq_tasks(rq, KERN_WARNING);
>  	}
> -	dl_server_stop(&rq->fair_server);
> -#ifdef CONFIG_SCHED_CLASS_EXT
> -	dl_server_stop(&rq->ext_server);
> -#endif
> +	dl_servers_stop_all(rq);
>  	rq_unlock_irqrestore(rq, &rf);
>  
>  	calc_load_migrate(rq);
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index edca7849b165d..c2b3d6bbe4828 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1826,6 +1826,20 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
>  	dl_se->dl_server_active = 0;
>  }
>  
> +/*
> + * Stop all dl-servers on this runqueue. Called when transitioning to a state
> + * where the tick can be stopped (e.g., single RR/FIFO task, or no RT tasks).
> + * This ensures server timers are disarmed and won't cause spurious wakeups on
> + * nohz_full isolated cores.
> + */
> +void dl_servers_stop_all(struct rq *rq)
> +{
> +	dl_server_stop(&rq->fair_server);
> +#ifdef CONFIG_SCHED_CLASS_EXT
> +	dl_server_stop(&rq->ext_server);
> +#endif
> +}

And I think the dl_servers_stop_all() helper still makes sense: stopping the
ext_server is still needed in sched_cpu_dying(), and calling dl_server_stop() on
an already-inactive server is harmless in the no-RT path.

Thanks,
-Andrea

> +
>  void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
>  		    dl_server_pick_f pick_task)
>  {
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 9f63b15d309d1..26cf1d14efde5 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -412,6 +412,7 @@ extern void dl_server_update_idle(struct sched_dl_entity *dl_se, s64 delta_exec)
>  extern void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec);
>  extern void dl_server_start(struct sched_dl_entity *dl_se);
>  extern void dl_server_stop(struct sched_dl_entity *dl_se);
> +extern void dl_servers_stop_all(struct rq *rq);
>  extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
>  		    dl_server_pick_f pick_task);
>  extern void sched_init_dl_servers(void);
> 
> ---
> base-commit: 4ac4d6549a6563878d7c19c154e017f6cb7114d3
> change-id: 20260512-upstream-fix-dlserver-nohzfull-b4-b745e2a967ed
> 
> Best regards,
> --  
> Juri Lelli <juri.lelli@redhat.com>
> 
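
The remark above that stopping an already-inactive server is harmless can be sketched as a user-space model. This is a hedged illustration only: struct server_model, struct rq_servers and both functions are hypothetical, and the early return for an inactive server is an assumption taken from that remark rather than a copy of the kernel's dl_server_stop().

```c
#include <stdbool.h>

/* Hypothetical model of a single dl-server's state. */
struct server_model {
	bool active;          /* models dl_se->dl_server_active */
	bool timer_armed;     /* models a pending server timer */
	unsigned int cancels; /* how many times a timer was actually cancelled */
};

/* Assumed behaviour: stopping an inactive server does nothing. */
static void server_stop_model(struct server_model *s)
{
	if (!s->active)
		return;
	s->timer_armed = false;
	s->cancels++;
	s->active = false;
}

/* Hypothetical model of the two per-runqueue servers. */
struct rq_servers {
	struct server_model fair_server;
	struct server_model ext_server;
};

/* Models dl_servers_stop_all(): safe to call on any mix of states. */
static void servers_stop_all_model(struct rq_servers *rq)
{
	server_stop_model(&rq->fair_server);
	server_stop_model(&rq->ext_server);
}
```

Calling servers_stop_all_model() twice in a row on a runqueue whose fair server is active and whose ext server never started cancels the fair timer exactly once and leaves the ext server untouched, which is the property that makes an unconditional helper convenient for callers.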


* Re: [PATCH] sched/deadline: Make dl-server nohz full aware
  2026-05-12 14:03 ` Frederic Weisbecker
@ 2026-05-12 15:31   ` Juri Lelli
  0 siblings, 0 replies; 9+ messages in thread
From: Juri Lelli @ 2026-05-12 15:31 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Andrea Righi, linux-kernel, David Haufe,
	Cao Ruichuang, Tejun Heo

On 12/05/26 16:03, Frederic Weisbecker wrote:
> On Tue, May 12, 2026 at 11:02:37AM +0200, Juri Lelli wrote:
> > The dl_server_timer() causes spurious IPIs on nohz_full cores, breaking
> > isolation guarantees. The timer executes on a housekeeping core and
> > eventually calls tick_nohz_dep_set_cpu(), sending IPIs to isolated cores
> > even when only a single task is running.
> > 
> > The problem is that dl-servers are not coordinated with nohz_full tick
> > state. Timers can fire and send IPIs to otherwise undisturbed cores.
> > 
> > Fix by managing servers in sched_can_stop_tick():
> > 
> > - When RT tasks run with CFS/SCX tasks, start the appropriate server
> >   and keep the tick running
> > - When only RT tasks remain, stop all servers and allow tick to stop
> >   (except for >1 RR tasks which need the tick for round-robin)
> > - When only CFS/SCX tasks remain, stop all servers before stopping tick
> > 
> > Introduce dl_servers_stop_all() to reduce duplication and abstract
> > server management from core.c. Unify RT handling into one block that
> > handles both RR and FIFO cases.
> > 
> > Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
> > Reported-by: David Haufe <dhaufe@simplextrading.com>
> > Closes: https://lore.kernel.org/lkml/CAKJHwtOw_G67edzuHVtL1xC5Vyt6StcZzihtDd0yaKudW=rwVw@mail.gmail.com
> > Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
> 
> I indeed observed IPIs originating from dl_server some time
> ago but that magically disappeared after some commit from Peter.
> 
> Perhaps it came back somehow? Lemme run dynticks-testing again on
> latest upstream...

So, the IPIs do indeed seem to be gone in my recent testing as well (I
decided to keep the reference/context in the changelog for historical
reasons, but can remove it). But the dl_task_timer (dl-server timer) for
isolated CPUs is still firing on housekeeping CPUs, and that is what this
patch is addressing.

> > ---
> > I had to modify my first original attempt at fixing this (please take a
> > look at the linked report/discussion) to also take SCX into
> > consideration.
> 
> I thought SCX was disabled when CPU isolation is running?
> 
> 9f391f94a173 ("sched_ext: Disallow loading BPF scheduler if isolcpus= domain
> isolation is in effect")

Ah, thanks for pointing it out, it indeed simplifies things. :)

And thanks for the super quick review!

Best,
Juri



* Re: [PATCH] sched/deadline: Make dl-server nohz full aware
  2026-05-12 14:55 ` Andrea Righi
@ 2026-05-12 15:34   ` Juri Lelli
  2026-05-13  6:16     ` Juri Lelli
  0 siblings, 1 reply; 9+ messages in thread
From: Juri Lelli @ 2026-05-12 15:34 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Frederic Weisbecker, linux-kernel, David Haufe,
	Cao Ruichuang

Hi Andrea,

On 12/05/26 16:55, Andrea Righi wrote:
> Hi Juri,

Thanks for the quick review!

> On Tue, May 12, 2026 at 11:02:37AM +0200, Juri Lelli wrote:
> > The dl_server_timer() causes spurious IPIs on nohz_full cores, breaking
> > isolation guarantees. The timer executes on a housekeeping core and
> > eventually calls tick_nohz_dep_set_cpu(), sending IPIs to isolated cores
> > even when only a single task is running.
> > 
> > The problem is that dl-servers are not coordinated with nohz_full tick
> > state. Timers can fire and send IPIs to otherwise undisturbed cores.
> > 
> > Fix by managing servers in sched_can_stop_tick():
> > 
> > - When RT tasks run with CFS/SCX tasks, start the appropriate server
> >   and keep the tick running
> > - When only RT tasks remain, stop all servers and allow tick to stop
> >   (except for >1 RR tasks which need the tick for round-robin)
> > - When only CFS/SCX tasks remain, stop all servers before stopping tick
> > 
> > Introduce dl_servers_stop_all() to reduce duplication and abstract
> > server management from core.c. Unify RT handling into one block that
> > handles both RR and FIFO cases.
> > 
> > Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
> > Reported-by: David Haufe <dhaufe@simplextrading.com>
> > Closes: https://lore.kernel.org/lkml/CAKJHwtOw_G67edzuHVtL1xC5Vyt6StcZzihtDd0yaKudW=rwVw@mail.gmail.com
> > Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
> > ---
> > I had to modify my first original attempt at fixing this (please take a
> > look at the linked report/discussion) to also take SCX into
> > consideration.
> 
> As mentioned by Frederic, we don't allow loading BPF schedulers when isolcpus=
> is used, so I think we can simplify the sched_can_stop_tick() part.

Right! Thanks for confirming.

> > 
> > FYI, I temporarily pushed the script I'm using to repro and verify the
> > fix here
> > 
> > https://github.com/jlelli/sched-deadline-tests/blob/master/test-dlserver-nohz.sh
> > ---
> >  kernel/sched/core.c     | 43 +++++++++++++++++++++++--------------------
> >  kernel/sched/deadline.c | 14 ++++++++++++++
> >  kernel/sched/sched.h    |  1 +
> >  3 files changed, 38 insertions(+), 20 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index b905805bbcbe4..98759255c306b 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1414,30 +1414,35 @@ static inline bool __need_bw_check(struct rq *rq, struct task_struct *p)
> >  
> >  bool sched_can_stop_tick(struct rq *rq)
> >  {
> > -	int fifo_nr_running;
> > -
> >  	/* Deadline tasks, even if single, need the tick */
> >  	if (rq->dl.dl_nr_running)
> >  		return false;
> >  
> >  	/*
> > -	 * If there are more than one RR tasks, we need the tick to affect the
> > -	 * actual RR behaviour.
> > +	 * If there are RT tasks, we may need the tick (for >1 RR tasks),
> > +	 * but we must also service lower-priority CFS/SCX tasks via dl-servers.
> 
> No need to mention SCX; maybe we can add a note that SCX is incompatible with
> isolcpus, so there's no SCX task to run here.

Ack.

> 
> >  	 */
> > -	if (rq->rt.rr_nr_running) {
> > -		if (rq->rt.rr_nr_running == 1)
> > -			return true;
> > -		else
> > +	if (rq->rt.rt_nr_running) {
> > +		if (rq->cfs.h_nr_queued) {
> > +			dl_server_start(&rq->fair_server);
> > +			return false;
> > +		}
> > +#ifdef CONFIG_SCHED_CLASS_EXT
> > +		if (rq->scx.nr_running) {
> > +			dl_server_start(&rq->ext_server);
> > +			return false;
> > +		}
> > +#endif
> 
> This #ifdef block can go away.
> 
> > +		/*
> > +		 * Only RT tasks, no CFS/SCX. Stop servers to prevent spurious
> 
> CFS/SCX -> CFS.
> 
> > +		 * wakeups. Tick can stop for single RR or any FIFO, but must
> > +		 * run for multiple RR (round-robin behavior).
> > +		 */
> > +		dl_servers_stop_all(rq);
> > +		if (rq->rt.rr_nr_running > 1)
> >  			return false;
> > -	}
> > -
> > -	/*
> > -	 * If there's no RR tasks, but FIFO tasks, we can skip the tick, no
> > -	 * forced preemption between FIFO tasks.
> > -	 */
> > -	fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
> > -	if (fifo_nr_running)
> >  		return true;
> > +	}
> >  
> >  	/*
> >  	 * If there are no DL,RR/FIFO tasks, there must only be CFS or SCX tasks
> > @@ -1462,6 +1467,7 @@ bool sched_can_stop_tick(struct rq *rq)
> >  			return false;
> >  	}
> >  
> > +	dl_servers_stop_all(rq);
> >  	return true;
> >  }
> >  #endif /* CONFIG_NO_HZ_FULL */
> > @@ -8810,10 +8816,7 @@ int sched_cpu_dying(unsigned int cpu)
> >  		WARN(true, "Dying CPU not properly vacated!");
> >  		dump_rq_tasks(rq, KERN_WARNING);
> >  	}
> > -	dl_server_stop(&rq->fair_server);
> > -#ifdef CONFIG_SCHED_CLASS_EXT
> > -	dl_server_stop(&rq->ext_server);
> > -#endif
> > +	dl_servers_stop_all(rq);
> >  	rq_unlock_irqrestore(rq, &rf);
> >  
> >  	calc_load_migrate(rq);
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index edca7849b165d..c2b3d6bbe4828 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1826,6 +1826,20 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
> >  	dl_se->dl_server_active = 0;
> >  }
> >  
> > +/*
> > + * Stop all dl-servers on this runqueue. Called when transitioning to a state
> > + * where the tick can be stopped (e.g., single RR/FIFO task, or no RT tasks).
> > + * This ensures server timers are disarmed and won't cause spurious wakeups on
> > + * nohz_full isolated cores.
> > + */
> > +void dl_servers_stop_all(struct rq *rq)
> > +{
> > +	dl_server_stop(&rq->fair_server);
> > +#ifdef CONFIG_SCHED_CLASS_EXT
> > +	dl_server_stop(&rq->ext_server);
> > +#endif
> > +}
> 
> And I think the dl_servers_stop_all() helper still makes sense, stopping the
> ext_server is still needed in sched_cpu_dying() and calling dl_server_stop() on
> an already-inactive server is harmless in the no-RT path.

And ack to all the above. Will send out a v2 soon.

Best,
Juri
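
For reference, the RT branch of the sched_can_stop_tick() hunk quoted above
can be modeled in userspace roughly as below. This is only a sketch:
struct rq_model and both model_* functions are hypothetical stand-ins, not
kernel code, and dl_server_start()/dl_server_stop() are reduced to setting
or clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the runqueue fields the hunk reads. */
struct rq_model {
	unsigned int rt_nr_running;	/* all RT tasks (FIFO + RR) */
	unsigned int rr_nr_running;	/* RR tasks only */
	unsigned int cfs_h_nr_queued;	/* queued fair tasks */
	unsigned int scx_nr_running;	/* queued sched_ext tasks */
	bool fair_server_active;
	bool ext_server_active;
};

/* Models dl_servers_stop_all(): stopping an inactive server changes nothing. */
static void model_servers_stop_all(struct rq_model *rq)
{
	rq->fair_server_active = false;
	rq->ext_server_active = false;
}

/*
 * Models only the "there are RT tasks" branch of the patched function.
 * Returns true when the tick may stop, false when it must keep running.
 */
static bool model_rt_can_stop_tick(struct rq_model *rq)
{
	/* Assumption of this model: the caller only uses it when RT tasks exist. */
	assert(rq->rt_nr_running > 0);
	assert(rq->rr_nr_running <= rq->rt_nr_running);

	/* Lower-priority tasks are queued: start their server, keep the tick. */
	if (rq->cfs_h_nr_queued) {
		rq->fair_server_active = true;
		return false;
	}
	if (rq->scx_nr_running) {
		rq->ext_server_active = true;
		return false;
	}

	/* Only RT tasks remain: no server is needed any more. */
	model_servers_stop_all(rq);

	/* More than one RR task still needs the tick for round-robin. */
	return rq->rr_nr_running <= 1;
}
```

With one FIFO task plus one queued fair task the model starts the fair
server and keeps the tick; with only FIFO tasks, or a single RR task, it
stops both servers and lets the tick stop.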



* Re: [PATCH] sched/deadline: Make dl-server nohz full aware
  2026-05-12 15:34   ` Juri Lelli
@ 2026-05-13  6:16     ` Juri Lelli
  2026-05-13  6:38       ` Andrea Righi
  0 siblings, 1 reply; 9+ messages in thread
From: Juri Lelli @ 2026-05-13  6:16 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Frederic Weisbecker, linux-kernel, David Haufe,
	Cao Ruichuang

On 12/05/26 17:34, Juri Lelli wrote:
> Hi Andrea,
> 
> On 12/05/26 16:55, Andrea Righi wrote:
> > Hi Juri,
> 
> Thanks for the quick review!
> 
> > On Tue, May 12, 2026 at 11:02:37AM +0200, Juri Lelli wrote:
> > > The dl_server_timer() causes spurious IPIs on nohz_full cores, breaking
> > > isolation guarantees. The timer executes on a housekeeping core and
> > > eventually calls tick_nohz_dep_set_cpu(), sending IPIs to isolated cores
> > > even when only a single task is running.
> > > 
> > > The problem is that dl-servers are not coordinated with nohz_full tick
> > > state. Timers can fire and send IPIs to otherwise undisturbed cores.
> > > 
> > > Fix by managing servers in sched_can_stop_tick():
> > > 
> > > - When RT tasks run with CFS/SCX tasks, start the appropriate server
> > >   and keep the tick running
> > > - When only RT tasks remain, stop all servers and allow tick to stop
> > >   (except for >1 RR tasks which need the tick for round-robin)
> > > - When only CFS/SCX tasks remain, stop all servers before stopping tick
> > > 
> > > Introduce dl_servers_stop_all() to reduce duplication and abstract
> > > server management from core.c. Unify RT handling into one block that
> > > handles both RR and FIFO cases.
> > > 
> > > Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
> > > Reported-by: David Haufe <dhaufe@simplextrading.com>
> > > Closes: https://lore.kernel.org/lkml/CAKJHwtOw_G67edzuHVtL1xC5Vyt6StcZzihtDd0yaKudW=rwVw@mail.gmail.com
> > > Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
> > > ---
> > > I had to modify my first original attempt at fixing this (please take a
> > > look at the linked report/discussion) to also take SCX into
> > > consideration.
> > 
> > As mentioned by Frederic, we don't allow loading BPF schedulers when isolcpus=
> > is used, so I think we can simplify the sched_can_stop_tick() part.
> 
> Right! Thanks for confirming.

Ah, but wait. IIUC SCX is incompatible with isolcpus=domain only?
scx_can_stop_tick() seems to confirm we need to take care of it when
domain flag is not present.

So, maybe we still need to consider SCX in this patch? e.g. in
configurations that are not using static domain isolation, but isolate
CPUs by configuring task affinities.
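
To illustrate what "configuring task affinities" means from userspace, here
is a minimal sketch that pins the calling thread to a single CPU with
sched_setaffinity(). The helper name pin_to_first_allowed_cpu() is made up
for this example, and picking the first allowed CPU is an arbitrary choice;
a real setup would presumably pick a CPU from its nohz_full= set instead.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Hypothetical helper: pin the calling thread to the first CPU found in its
 * current affinity mask. Returns the chosen CPU number, or -1 on failure.
 */
static int pin_to_first_allowed_cpu(void)
{
	cpu_set_t allowed, one;
	int cpu;

	/* pid 0 means "the calling thread". */
	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (CPU_ISSET(cpu, &allowed))
			break;
	}
	if (cpu == CPU_SETSIZE)
		return -1;

	/* Build a mask containing only the chosen CPU. */
	CPU_ZERO(&one);
	CPU_SET(cpu, &one);
	assert(CPU_COUNT(&one) == 1);

	if (sched_setaffinity(0, sizeof(one), &one) != 0)
		return -1;

	return cpu;
}
```

Such a task is confined to one CPU without any isolcpus=domain on the
command line, which is the kind of configuration where SCX can still be
loaded.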



* Re: [PATCH] sched/deadline: Make dl-server nohz full aware
  2026-05-13  6:16     ` Juri Lelli
@ 2026-05-13  6:38       ` Andrea Righi
  0 siblings, 0 replies; 9+ messages in thread
From: Andrea Righi @ 2026-05-13  6:38 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Frederic Weisbecker, linux-kernel, David Haufe,
	Cao Ruichuang

Hi Juri,

On Wed, May 13, 2026 at 08:16:21AM +0200, Juri Lelli wrote:
> On 12/05/26 17:34, Juri Lelli wrote:
> > Hi Andrea,
> > 
> > On 12/05/26 16:55, Andrea Righi wrote:
> > > Hi Juri,
> > 
> > Thanks for the quick review!
> > 
> > > On Tue, May 12, 2026 at 11:02:37AM +0200, Juri Lelli wrote:
> > > > The dl_server_timer() causes spurious IPIs on nohz_full cores, breaking
> > > > isolation guarantees. The timer executes on a housekeeping core and
> > > > eventually calls tick_nohz_dep_set_cpu(), sending IPIs to isolated cores
> > > > even when only a single task is running.
> > > > 
> > > > The problem is that dl-servers are not coordinated with nohz_full tick
> > > > state. Timers can fire and send IPIs to otherwise undisturbed cores.
> > > > 
> > > > Fix by managing servers in sched_can_stop_tick():
> > > > 
> > > > - When RT tasks run with CFS/SCX tasks, start the appropriate server
> > > >   and keep the tick running
> > > > - When only RT tasks remain, stop all servers and allow tick to stop
> > > >   (except for >1 RR tasks which need the tick for round-robin)
> > > > - When only CFS/SCX tasks remain, stop all servers before stopping tick
> > > > 
> > > > Introduce dl_servers_stop_all() to reduce duplication and abstract
> > > > server management from core.c. Unify RT handling into one block that
> > > > handles both RR and FIFO cases.
> > > > 
> > > > Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
> > > > Reported-by: David Haufe <dhaufe@simplextrading.com>
> > > > Closes: https://lore.kernel.org/lkml/CAKJHwtOw_G67edzuHVtL1xC5Vyt6StcZzihtDd0yaKudW=rwVw@mail.gmail.com
> > > > Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
> > > > ---
> > > > I had to modify my first original attempt at fixing this (please take a
> > > > look at the linked report/discussion) to also take SCX into
> > > > consideration.
> > > 
> > > As mentioned by Frederic, we don't allow loading BPF schedulers when isolcpus=
> > > is used, so I think we can simplify the sched_can_stop_tick() part.
> > 
> > Right! Thanks for confirming.
> 
> Ah, but wait. IIUC SCX is incompatible with isolcpus=domain only?
> scx_can_stop_tick() seems to confirm we need to take care of it when
> domain flag is not present.
> 
> So, maybe we still need to consider SCX in this patch? e.g. in
> configurations that are not using static domain isolation, but isolate
> CPUs by configuring task affinities.

Ah! That's right. SCX is incompatible with isolcpus=domain, but we do support
nohz_full=..., so I think your original approach is correct. It might be worth
calling out explicitly in the patch description that the SCX handling targets
nohz_full, so we don't make the same mistake in the future.

Thanks,
-Andrea


end of thread, other threads:[~2026-05-13  6:38 UTC | newest]

Thread overview: 9+ messages
2026-05-12  9:02 [PATCH] sched/deadline: Make dl-server nohz full aware Juri Lelli
2026-05-12 10:06 ` Furkan Çalışkan
2026-05-12 12:27   ` Juri Lelli
2026-05-12 14:03 ` Frederic Weisbecker
2026-05-12 15:31   ` Juri Lelli
2026-05-12 14:55 ` Andrea Righi
2026-05-12 15:34   ` Juri Lelli
2026-05-13  6:16     ` Juri Lelli
2026-05-13  6:38       ` Andrea Righi
