The Linux Kernel Mailing List
* [PATCH] sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
@ 2026-02-06 13:25 Juri Lelli
  2026-02-06 15:39 ` Peter Zijlstra
  2026-02-07  8:45 ` Peter Zijlstra
  0 siblings, 2 replies; 8+ messages in thread
From: Juri Lelli @ 2026-02-06 13:25 UTC
  To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
  Cc: Philip Auld, Gabriele Monaco, linux-kernel, Bruno Goncalves,
	Juri Lelli

Running stress-ng --schedpolicy 0 on an RT kernel on a big machine
might lead to the following WARNINGs (edited).

 sched: DL de-boosted task PID 22725: REPLENISH flag missing

 WARNING: CPU: 93 PID: 0 at kernel/sched/deadline.c:239 dequeue_task_dl+0x15c/0x1f8
 ... (running_bw underflow)
 Call trace:
  dequeue_task_dl+0x15c/0x1f8 (P)
  dequeue_task+0x80/0x168
  deactivate_task+0x24/0x50
  push_dl_task+0x264/0x2e0
  dl_task_timer+0x1b0/0x228
  __hrtimer_run_queues+0x188/0x378
  hrtimer_interrupt+0xfc/0x260
  arch_timer_handler_phys+0x34/0x60
  handle_percpu_devid_irq+0xa4/0x230
  generic_handle_domain_irq+0x34/0x60
  __gic_handle_irq_from_irqson.isra.0+0x158/0x298
  gic_handle_irq+0x28/0x80
  call_on_irq_stack+0x30/0x48
  do_interrupt_handler+0xdc/0xe8
  el1_interrupt+0x44/0xc0
  el1h_64_irq_handler+0x18/0x28
  el1h_64_irq+0x80/0x88
  cpuidle_enter_state+0xc4/0x520 (P)
  cpuidle_enter+0x40/0x60
  cpuidle_idle_call+0x13c/0x220
  do_idle+0xa4/0x120
  cpu_startup_entry+0x40/0x50
  secondary_start_kernel+0xe4/0x128
  __secondary_switched+0xc0/0xc8

The problem is that when a SCHED_DEADLINE task (lock holder) is
changed to a lower priority class via sched_setscheduler(), it may
fail to properly inherit the parameters of potential DEADLINE donors
if it didn't already inherit them in the past (because its deadline
was shorter than the donor's at that time). This can corrupt
bandwidth accounting, as enqueue_task_dl() won't recognize the lock
holder as boosted.
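As a rough illustration of the "running_bw underflow" in the splat above, here is a minimal user-space model. It assumes only that active bandwidth is kept in an unsigned per-runqueue counter that is added to on enqueue and subtracted from on dequeue; struct model_dl_rq and its helpers are hypothetical names, not kernel API. If the enqueue path skips the add (because the holder is not seen as boosted), the later subtraction wraps around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a runqueue's active DEADLINE bandwidth (not kernel code). */
struct model_dl_rq {
	uint64_t running_bw;
};

/* Account a task's bandwidth when it is enqueued as an active DEADLINE entity. */
static void model_add_running_bw(struct model_dl_rq *rq, uint64_t bw)
{
	uint64_t old = rq->running_bw;

	rq->running_bw += bw;
	assert(rq->running_bw >= old);	/* no overflow expected in this model */
}

/*
 * Remove a task's bandwidth on dequeue. Returns true if the unsigned
 * counter wrapped, i.e. more bandwidth was removed than was ever added.
 */
static bool model_sub_running_bw(struct model_dl_rq *rq, uint64_t bw)
{
	uint64_t old = rq->running_bw;

	rq->running_bw -= bw;
	return rq->running_bw > old;
}
```

A balanced add/sub pair leaves the counter at zero; a subtract without the matching add reports an underflow.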

The scenario occurs when:
1. A DEADLINE task (donor) blocks on a PI mutex held by another
   DEADLINE task (holder), but the holder doesn't inherit parameters
   (e.g., it already has a shorter deadline)
2. sched_setscheduler() changes the holder from DEADLINE to a lower
   class while still holding the mutex
3. The holder should now inherit DEADLINE parameters from the donor
   and be enqueued with ENQUEUE_REPLENISH, but this doesn't happen
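The three steps above can be traced with a small user-space model. The types and functions are hypothetical simplifications; the only behaviour taken from this thread is that an un-boosted entity's pi_se points to itself (so "boosted" means pi_se points elsewhere) and that a DEADLINE holder inherits only from a donor with an earlier deadline. Deadlines are compared with a plain "<", ignoring wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified DEADLINE entity (not the kernel's sched_dl_entity). */
struct model_dl_se {
	uint64_t deadline;		/* absolute deadline */
	struct model_dl_se *pi_se;	/* parameters actually in use */
};

struct model_task {
	bool normal_is_dl;		/* normal_prio is a DEADLINE priority */
	struct model_dl_se dl;
};

static void model_task_init(struct model_task *t, bool is_dl, uint64_t deadline)
{
	t->normal_is_dl = is_dl;
	t->dl.deadline = deadline;
	t->dl.pi_se = &t->dl;		/* not boosted: uses its own parameters */
}

static bool model_is_boosted(const struct model_task *t)
{
	return t->dl.pi_se != &t->dl;
}

/* Step 1: a DEADLINE donor blocks on the holder's PI mutex. */
static void model_donor_blocks(struct model_task *holder, struct model_task *donor)
{
	if (!holder->normal_is_dl || donor->dl.deadline < holder->dl.deadline)
		holder->dl.pi_se = donor->dl.pi_se;	/* inherit */
	else
		holder->dl.pi_se = &holder->dl;		/* keep own parameters */
}

/* Step 2: sched_setscheduler() moves the holder out of DEADLINE (no fix applied). */
static void model_demote_unfixed(struct model_task *holder)
{
	holder->normal_is_dl = false;	/* pi_se is left untouched */
}
```

After step 2 the holder is no longer DEADLINE by policy, the donor still is, yet the holder is not seen as boosted (step 3).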

Fix the issue by introducing __setscheduler_dl(), which detects when
a task's normal priority class differs from its PI-boosted class.
When a (now!) non-DEADLINE task (normal_prio) is being boosted by a
DEADLINE pi_task (effective prio), it inherits the DEADLINE
parameters (pi_se) and sets the ENQUEUE_REPLENISH flag to ensure
proper bandwidth accounting during the next enqueue operation.
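The decision made by the new helper can be sketched in user space as follows. The structs, the flag value and model_setscheduler_dl() are hypothetical stand-ins: dl_prio() is replaced by plain booleans, rt_mutex_get_top_task() by a pointer argument, and ENQUEUE_REPLENISH by a local constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value for this model only; the kernel defines its own. */
#define MODEL_ENQUEUE_REPLENISH 0x1u

struct model_dl_se {
	struct model_dl_se *pi_se;	/* parameters actually in use */
};

struct model_task {
	bool normal_is_dl;		/* stands in for dl_prio(p->normal_prio) */
	bool prio_is_dl;		/* stands in for dl_prio(p->prio) */
	struct model_dl_se dl;
};

struct model_change_ctx {
	bool queued;
	unsigned int flags;
};

/*
 * Same shape as __setscheduler_dl() in the patch: a task that is no
 * longer DEADLINE by policy, but has a DEADLINE top waiter, takes the
 * waiter's parameters and, if queued, is re-enqueued with REPLENISH.
 */
static void model_setscheduler_dl(struct model_task *p,
				  const struct model_task *pi_task,
				  struct model_change_ctx *scope)
{
	if (!p->normal_is_dl && pi_task && pi_task->prio_is_dl) {
		p->dl.pi_se = pi_task->dl.pi_se;
		if (scope && scope->queued)
			scope->flags |= MODEL_ENQUEUE_REPLENISH;
	}
}
```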

Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
Hello,

The underlying big(ger) issue is that PI is broken for DEADLINE. We know
this; proxy exec is progressing well and will hopefully soon replace all
of this. In the meantime, here comes another piece of duct tape trying
to fix the issue described in the changelog.

The issue was discovered by Bruno Goncalves while running stress-ng
--schedpolicy 0 on RT kernels on large systems (I believe lots of CPUs
and PI-enabled in-kernel mutexes make it easier to trigger). Later on, a
simpler and more focused reproducer was created (with Claude Code's help)
and is available at

https://github.com/jlelli/sched-deadline-tests/blob/master/test_dl_replenish_bug.c

Fix also available from

git@github.com:jlelli/linux.git upstream/fix-deadline-piboost
---
 kernel/sched/syscalls.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 6f10db3646e7f..369e47b4ea863 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -7,6 +7,7 @@
  *  Copyright (C) 1991-2002  Linus Torvalds
  *  Copyright (C) 1998-2024  Ingo Molnar, Red Hat
  */
+#include "linux/sched/rt.h"
 #include <linux/sched.h>
 #include <linux/cpuset.h>
 #include <linux/sched/debug.h>
@@ -284,6 +285,33 @@ static bool check_same_owner(struct task_struct *p)
 		uid_eq(cred->euid, pcred->uid));
 }
 
+#ifdef CONFIG_RT_MUTEXES
+static void __setscheduler_dl(struct task_struct *p,
+			      struct sched_change_ctx *scope)
+{
+	struct task_struct *pi_task = rt_mutex_get_top_task(p);
+
+	/*
+	 * In case a former DEADLINE task (either proper or boosted) gets
+	 * setscheduled to a lower priority class, check if it needs to
+	 * inherit parameters from a potential pi_task. In that case make
+	 * sure replenishment happens with the next enqueue.
+	 */
+	if (!dl_prio(p->normal_prio) &&
+	    (pi_task && dl_prio(pi_task->prio))) {
+		p->dl.pi_se = pi_task->dl.pi_se;
+
+		if (scope && scope->queued)
+			scope->flags |= ENQUEUE_REPLENISH;
+	}
+}
+#else /* !CONFIG_RT_MUTEXES */
+static void __setscheduler_dl(struct task_struct *p,
+			      struct sched_change_ctx *scope)
+{
+}
+#endif /* !CONFIG_RT_MUTEXES */
+
 #ifdef CONFIG_UCLAMP_TASK
 
 static int uclamp_validate(struct task_struct *p,
@@ -657,6 +685,7 @@ int __sched_setscheduler(struct task_struct *p,
 			p->prio = newprio;
 		}
 		__setscheduler_uclamp(p, attr);
+		__setscheduler_dl(p, scope);
 
 		if (scope->queued) {
 			/*

---
base-commit: e34881c84c255bc300f24d9fe685324be20da3d1
change-id: 20260205-upstream-fix-deadline-piboost-b4-2d924be17182

Best regards,
--  
Juri Lelli <juri.lelli@redhat.com>



* Re: [PATCH] sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
  2026-02-06 13:25 [PATCH] sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting Juri Lelli
@ 2026-02-06 15:39 ` Peter Zijlstra
  2026-02-06 15:42   ` Juri Lelli
  2026-02-07  8:45 ` Peter Zijlstra
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2026-02-06 15:39 UTC
  To: Juri Lelli
  Cc: Ingo Molnar, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Philip Auld,
	Gabriele Monaco, linux-kernel, Bruno Goncalves

On Fri, Feb 06, 2026 at 02:25:52PM +0100, Juri Lelli wrote:

> diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
> index 6f10db3646e7f..369e47b4ea863 100644
> --- a/kernel/sched/syscalls.c
> +++ b/kernel/sched/syscalls.c
> @@ -7,6 +7,7 @@
>   *  Copyright (C) 1991-2002  Linus Torvalds
>   *  Copyright (C) 1998-2024  Ingo Molnar, Red Hat
>   */
> +#include "linux/sched/rt.h"
>  #include <linux/sched.h>
>  #include <linux/cpuset.h>
>  #include <linux/sched/debug.h>

Is this clangd being 'helpful'? Or an overeager AI thing? In case of
clangd, add to .clangd:

Completion:
  HeaderInsertion: Never


* Re: [PATCH] sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
  2026-02-06 15:39 ` Peter Zijlstra
@ 2026-02-06 15:42   ` Juri Lelli
  0 siblings, 0 replies; 8+ messages in thread
From: Juri Lelli @ 2026-02-06 15:42 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Philip Auld,
	Gabriele Monaco, linux-kernel, Bruno Goncalves

On 06/02/26 16:39, Peter Zijlstra wrote:
> On Fri, Feb 06, 2026 at 02:25:52PM +0100, Juri Lelli wrote:
> 
> > diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
> > index 6f10db3646e7f..369e47b4ea863 100644
> > --- a/kernel/sched/syscalls.c
> > +++ b/kernel/sched/syscalls.c
> > @@ -7,6 +7,7 @@
> >   *  Copyright (C) 1991-2002  Linus Torvalds
> >   *  Copyright (C) 1998-2024  Ingo Molnar, Red Hat
> >   */
> > +#include "linux/sched/rt.h"
> >  #include <linux/sched.h>
> >  #include <linux/cpuset.h>
> >  #include <linux/sched/debug.h>
> 
> Is this clangd being 'helpful' ? Or an over eager AI thing? In case of
> clangd, add to .clangd:
> 
> Completion:
>   HeaderInsertion: Never
> 

clangd. Thanks for the suggestion! Will add.

Best,
Juri



* Re: [PATCH] sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
  2026-02-06 13:25 [PATCH] sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting Juri Lelli
  2026-02-06 15:39 ` Peter Zijlstra
@ 2026-02-07  8:45 ` Peter Zijlstra
  2026-02-09  9:46   ` Juri Lelli
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2026-02-07  8:45 UTC
  To: Juri Lelli
  Cc: Ingo Molnar, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Philip Auld,
	Gabriele Monaco, linux-kernel, Bruno Goncalves

On Fri, Feb 06, 2026 at 02:25:52PM +0100, Juri Lelli wrote:

> @@ -284,6 +285,33 @@ static bool check_same_owner(struct task_struct *p)
>  		uid_eq(cred->euid, pcred->uid));
>  }
>  
> +#ifdef CONFIG_RT_MUTEXES
> +static void __setscheduler_dl(struct task_struct *p,
> +			      struct sched_change_ctx *scope)
> +{
> +	struct task_struct *pi_task = rt_mutex_get_top_task(p);
> +
> +	/*
> +	 * In case a former DEADLINE task (either proper or boosted) gets
> +	 * setscheduled to a lower priority class, check if it needs to
> +	 * inherit parameters from a potential pi_task. In that case make
> +	 * sure replenishment happens with the next enqueue.
> +	 */
> +	if (!dl_prio(p->normal_prio) &&
> +	    (pi_task && dl_prio(pi_task->prio))) {
> +		p->dl.pi_se = pi_task->dl.pi_se;
> +
> +		if (scope && scope->queued)
> +			scope->flags |= ENQUEUE_REPLENISH;
> +	}
> +}
> +#else /* !CONFIG_RT_MUTEXES */
> +static void __setscheduler_dl(struct task_struct *p,
> +			      struct sched_change_ctx *scope)
> +{
> +}
> +#endif /* !CONFIG_RT_MUTEXES */
> +
>  #ifdef CONFIG_UCLAMP_TASK
>  
>  static int uclamp_validate(struct task_struct *p,
> @@ -657,6 +685,7 @@ int __sched_setscheduler(struct task_struct *p,
>  			p->prio = newprio;
>  		}
>  		__setscheduler_uclamp(p, attr);
> +		__setscheduler_dl(p, scope);
>  
>  		if (scope->queued) {
>  			/*
> 

Urgh... :-)

So normally it would be __setscheduler_params(), but that funks out
because !dl_policy() -- after all, we're demoting the boosted task to be
!DL.

So then we need to fix up things to the effective priority.

Should this not be inside the !KEEP_PARAMS thing? Something like so?

(afaict nothing clears dl_se::pi_se except rt_mutex_setprio() so that
should still be valid here -- so we don't need to go find it again)


diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 6f10db3646e7..ccd2be806e13 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -655,6 +655,10 @@ int __sched_setscheduler(struct task_struct *p,
 			__setscheduler_params(p, attr);
 			p->sched_class = next_class;
 			p->prio = newprio;
+#ifdef CONFIG_RT_MUTEXES
+			if (dl_prio(newprio) && !dl_policy(policy) && p->dl.pi_se)
+				scope->flags |= ENQUEUE_REPLENISH;
+#endif
 		}
 		__setscheduler_uclamp(p, attr);
 



* Re: [PATCH] sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
  2026-02-07  8:45 ` Peter Zijlstra
@ 2026-02-09  9:46   ` Juri Lelli
  2026-02-24 13:22     ` Juri Lelli
  2026-02-24 13:45     ` Peter Zijlstra
  0 siblings, 2 replies; 8+ messages in thread
From: Juri Lelli @ 2026-02-09  9:46 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Philip Auld,
	Gabriele Monaco, linux-kernel, Bruno Goncalves

On 07/02/26 09:45, Peter Zijlstra wrote:
> On Fri, Feb 06, 2026 at 02:25:52PM +0100, Juri Lelli wrote:
> 
> > @@ -284,6 +285,33 @@ static bool check_same_owner(struct task_struct *p)
> >  		uid_eq(cred->euid, pcred->uid));
> >  }
> >  
> > +#ifdef CONFIG_RT_MUTEXES
> > +static void __setscheduler_dl(struct task_struct *p,
> > +			      struct sched_change_ctx *scope)
> > +{
> > +	struct task_struct *pi_task = rt_mutex_get_top_task(p);
> > +
> > +	/*
> > +	 * In case a former DEADLINE task (either proper or boosted) gets
> > +	 * setscheduled to a lower priority class, check if it needs to
> > +	 * inherit parameters from a potential pi_task. In that case make
> > +	 * sure replenishment happens with the next enqueue.
> > +	 */
> > +	if (!dl_prio(p->normal_prio) &&
> > +	    (pi_task && dl_prio(pi_task->prio))) {
> > +		p->dl.pi_se = pi_task->dl.pi_se;
> > +
> > +		if (scope && scope->queued)
> > +			scope->flags |= ENQUEUE_REPLENISH;
> > +	}
> > +}
> > +#else /* !CONFIG_RT_MUTEXES */
> > +static void __setscheduler_dl(struct task_struct *p,
> > +			      struct sched_change_ctx *scope)
> > +{
> > +}
> > +#endif /* !CONFIG_RT_MUTEXES */
> > +
> >  #ifdef CONFIG_UCLAMP_TASK
> >  
> >  static int uclamp_validate(struct task_struct *p,
> > @@ -657,6 +685,7 @@ int __sched_setscheduler(struct task_struct *p,
> >  			p->prio = newprio;
> >  		}
> >  		__setscheduler_uclamp(p, attr);
> > +		__setscheduler_dl(p, scope);
> >  
> >  		if (scope->queued) {
> >  			/*
> > 
> 
> Urgh... :-)

Yeah.

> So normally it would be __setscheduler_params(), but that funks out
> because !dl_policy() -- after all, we're demoting the boosted task to be
> !DL.

In this particular case we have a DEADLINE task (holder) that didn't
get boosted by another DEADLINE task (donor), because the donor had a
later absolute deadline when rt_mutex_setprio() was called. So
p->dl.pi_se still points to &p->dl, enqueue_task_dl() doesn't
recognize the holder as boosted, and it takes the wrong path at the
start.
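A minimal model of the enqueue-time check described above, assuming "boosted" is decided by comparing pi_se against the task's own entity (which is what p->dl.pi_se still pointing to &p->dl implies). All names are hypothetical and the enum values only label the paths; the "REPLENISH missing" predicate mirrors the warning text quoted in the changelog, not actual kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_ENQUEUE_REPLENISH 0x1u	/* hypothetical flag value */

struct model_dl_se {
	struct model_dl_se *pi_se;	/* entity whose parameters are in use */
};

struct model_task {
	bool normal_is_dl;		/* stands in for dl_prio(p->normal_prio) */
	struct model_dl_se dl;
};

enum model_enqueue_path {
	MODEL_PATH_BOOSTED,	/* runs on the donor's parameters */
	MODEL_PATH_DEBOOSTED,	/* non-DEADLINE task not seen as boosted */
	MODEL_PATH_NORMAL,	/* regular DEADLINE task */
};

/* Which path is chosen "at the start" of the modelled enqueue. */
static enum model_enqueue_path model_enqueue_path(const struct model_task *p)
{
	if (p->dl.pi_se != &p->dl)
		return MODEL_PATH_BOOSTED;
	if (!p->normal_is_dl)
		return MODEL_PATH_DEBOOSTED;
	return MODEL_PATH_NORMAL;
}

/* Condition matching "DL de-boosted task ...: REPLENISH flag missing". */
static bool model_replenish_missing(const struct model_task *p, unsigned int flags)
{
	return model_enqueue_path(p) == MODEL_PATH_DEBOOSTED &&
	       !(flags & MODEL_ENQUEUE_REPLENISH);
}
```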

> So then we need to fix up things to the effective priority.

We can use effective priority, right.

> Should this not be inside the !KEEP_PARAMS thing? Something like so?

And do it inside !KEEP_PARAMS, indeed.

> (afaict nothing clears dl_se::pi_se except rt_mutex_setprio() so that
> should still be valid here -- so we don't need to go find it again)

But maybe with something like this? I believe we need to make things
right at this point by "promoting" the task that is moving to a lower
priority class back to DEADLINE (inheriting from the donor it didn't
inherit from in the past). Maybe we can avoid checking pi_task, since
dl_prio(newprio) already implies a DEADLINE top waiter. We could also
move everything into a helper to remove the ifdeffery.

---
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 6f10db3646e7f..856df1a22e3ca 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -655,6 +655,16 @@ int __sched_setscheduler(struct task_struct *p,
 			__setscheduler_params(p, attr);
 			p->sched_class = next_class;
 			p->prio = newprio;
+#ifdef CONFIG_RT_MUTEXES
+			if (dl_prio(newprio) && !dl_policy(policy)) {
+				struct task_struct *pi_task = rt_mutex_get_top_task(p);
+
+				if (pi_task) {
+					p->dl.pi_se = pi_task->dl.pi_se;
+					scope->flags |= ENQUEUE_REPLENISH;
+				}
+			}
+#endif
 		}
 		__setscheduler_uclamp(p, attr);



* Re: [PATCH] sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
  2026-02-09  9:46   ` Juri Lelli
@ 2026-02-24 13:22     ` Juri Lelli
  2026-02-24 13:45     ` Peter Zijlstra
  1 sibling, 0 replies; 8+ messages in thread
From: Juri Lelli @ 2026-02-24 13:22 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Philip Auld,
	Gabriele Monaco, linux-kernel, Bruno Goncalves

Hi Peter,

On 09/02/26 10:46, Juri Lelli wrote:
> On 07/02/26 09:45, Peter Zijlstra wrote:
> > On Fri, Feb 06, 2026 at 02:25:52PM +0100, Juri Lelli wrote:
> > 
> > > @@ -284,6 +285,33 @@ static bool check_same_owner(struct task_struct *p)
> > >  		uid_eq(cred->euid, pcred->uid));
> > >  }
> > >  
> > > +#ifdef CONFIG_RT_MUTEXES
> > > +static void __setscheduler_dl(struct task_struct *p,
> > > +			      struct sched_change_ctx *scope)
> > > +{
> > > +	struct task_struct *pi_task = rt_mutex_get_top_task(p);
> > > +
> > > +	/*
> > > +	 * In case a former DEADLINE task (either proper or boosted) gets
> > > +	 * setscheduled to a lower priority class, check if it needs to
> > > +	 * inherit parameters from a potential pi_task. In that case make
> > > +	 * sure replenishment happens with the next enqueue.
> > > +	 */
> > > +	if (!dl_prio(p->normal_prio) &&
> > > +	    (pi_task && dl_prio(pi_task->prio))) {
> > > +		p->dl.pi_se = pi_task->dl.pi_se;
> > > +
> > > +		if (scope && scope->queued)
> > > +			scope->flags |= ENQUEUE_REPLENISH;
> > > +	}
> > > +}
> > > +#else /* !CONFIG_RT_MUTEXES */
> > > +static void __setscheduler_dl(struct task_struct *p,
> > > +			      struct sched_change_ctx *scope)
> > > +{
> > > +}
> > > +#endif /* !CONFIG_RT_MUTEXES */
> > > +
> > >  #ifdef CONFIG_UCLAMP_TASK
> > >  
> > >  static int uclamp_validate(struct task_struct *p,
> > > @@ -657,6 +685,7 @@ int __sched_setscheduler(struct task_struct *p,
> > >  			p->prio = newprio;
> > >  		}
> > >  		__setscheduler_uclamp(p, attr);
> > > +		__setscheduler_dl(p, scope);
> > >  
> > >  		if (scope->queued) {
> > >  			/*
> > > 
> > 
> > Urgh... :-)
> 
> Yeah.
> 
> > So normally it would be __setscheduler_params(), but that funks out
> > because !dl_policy() -- after all, we're demoting the boosted task to be
> > !DL.
> 
> In this particular case we have a DEADLINE task (holder) that didn't
> take the chance of being boosted by another DEADLINE task (donor),
> because donor had longer dynamic deadline when rt_mutex_setprio() was
> called. So now p->dl.pi_se still points to &p->dl and so enqueue_task_dl
> doesn't recognize the holder as boosted and takes the wrong path at the
> start.
> 
> > So then we need to fix up things to the effective priority.
> 
> We can use effective priority, right.
> 
> > Should this not be inside the !KEEP_PARAMS thing? Something like so?
> 
> And do it inside !KEEP_PARAMS, indeed.
> 
> > (afaict nothing clears dl_se::pi_se except rt_mutex_setprio() so that
> > should still be valid here -- so we don't need to go find it again)
> 
> But, maybe with something like this? I believe we need to make things
> right at this point "promoting" the now becoming lower prio class task
> to DEADLINE (inheriting from the task it didn't inherit from in the
> past). Maybe we can avoid checking pi_task since dl_prio(newprio). And
> also move everything in an helper to remove ifdeffery.
> 
> ---
> diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
> index 6f10db3646e7f..856df1a22e3ca 100644
> --- a/kernel/sched/syscalls.c
> +++ b/kernel/sched/syscalls.c
> @@ -655,6 +655,16 @@ int __sched_setscheduler(struct task_struct *p,
>  			__setscheduler_params(p, attr);
>  			p->sched_class = next_class;
>  			p->prio = newprio;
> +#ifdef CONFIG_RT_MUTEXES
> +			if (dl_prio(newprio) && !dl_policy(policy)) {
> +				struct task_struct *pi_task = rt_mutex_get_top_task(p);
> +
> +				if (pi_task) {
> +					p->dl.pi_se = pi_task->dl.pi_se;
> +					scope->flags |= ENQUEUE_REPLENISH;
> +				}
> +			}
> +#endif
>  		}
>  		__setscheduler_uclamp(p, attr);

When you have a minute, can you take a look at the above?

Thanks!
Juri



* Re: [PATCH] sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
  2026-02-09  9:46   ` Juri Lelli
  2026-02-24 13:22     ` Juri Lelli
@ 2026-02-24 13:45     ` Peter Zijlstra
  2026-02-24 14:05       ` Juri Lelli
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2026-02-24 13:45 UTC
  To: Juri Lelli
  Cc: Ingo Molnar, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Philip Auld,
	Gabriele Monaco, linux-kernel, Bruno Goncalves

On Mon, Feb 09, 2026 at 10:46:18AM +0100, Juri Lelli wrote:

> > (afaict nothing clears dl_se::pi_se except rt_mutex_setprio() so that
> > should still be valid here -- so we don't need to go find it again)
> 
> But, maybe with something like this? I believe we need to make things
> right at this point "promoting" the now becoming lower prio class task
> to DEADLINE (inheriting from the task it didn't inherit from in the
> past). Maybe we can avoid checking pi_task since dl_prio(newprio). And
> also move everything in an helper to remove ifdeffery.
> 
> ---
> diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
> index 6f10db3646e7f..856df1a22e3ca 100644
> --- a/kernel/sched/syscalls.c
> +++ b/kernel/sched/syscalls.c
> @@ -655,6 +655,16 @@ int __sched_setscheduler(struct task_struct *p,
>  			__setscheduler_params(p, attr);
>  			p->sched_class = next_class;
>  			p->prio = newprio;
> +#ifdef CONFIG_RT_MUTEXES
> +			if (dl_prio(newprio) && !dl_policy(policy)) {
> +				struct task_struct *pi_task = rt_mutex_get_top_task(p);
> +
> +				if (pi_task) {
> +					p->dl.pi_se = pi_task->dl.pi_se;
> +					scope->flags |= ENQUEUE_REPLENISH;
> +				}
> +			}
> +#endif
>  		}
>  		__setscheduler_uclamp(p, attr);

I'm still a bit confused about the pi_se thing; if this was a dl task, then
rt_mutex_setprio() would've already set this to pi_task->dl.pi_se, no?

Anyway, yes something like that, possibly with a helper sounds fine.


* Re: [PATCH] sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
  2026-02-24 13:45     ` Peter Zijlstra
@ 2026-02-24 14:05       ` Juri Lelli
  0 siblings, 0 replies; 8+ messages in thread
From: Juri Lelli @ 2026-02-24 14:05 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Philip Auld,
	Gabriele Monaco, linux-kernel, Bruno Goncalves

On 24/02/26 14:45, Peter Zijlstra wrote:
> On Mon, Feb 09, 2026 at 10:46:18AM +0100, Juri Lelli wrote:
> 
> > > (afaict nothing clears dl_se::pi_se except rt_mutex_setprio() so that
> > > should still be valid here -- so we don't need to go find it again)
> > 
> > But, maybe with something like this? I believe we need to make things
> > right at this point "promoting" the now becoming lower prio class task
> > to DEADLINE (inheriting from the task it didn't inherit from in the
> > past). Maybe we can avoid checking pi_task since dl_prio(newprio). And
> > also move everything in an helper to remove ifdeffery.
> > 
> > ---
> > diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
> > index 6f10db3646e7f..856df1a22e3ca 100644
> > --- a/kernel/sched/syscalls.c
> > +++ b/kernel/sched/syscalls.c
> > @@ -655,6 +655,16 @@ int __sched_setscheduler(struct task_struct *p,
> >  			__setscheduler_params(p, attr);
> >  			p->sched_class = next_class;
> >  			p->prio = newprio;
> > +#ifdef CONFIG_RT_MUTEXES
> > +			if (dl_prio(newprio) && !dl_policy(policy)) {
> > +				struct task_struct *pi_task = rt_mutex_get_top_task(p);
> > +
> > +				if (pi_task) {
> > +					p->dl.pi_se = pi_task->dl.pi_se;
> > +					scope->flags |= ENQUEUE_REPLENISH;
> > +				}
> > +			}
> > +#endif
> >  		}
> >  		__setscheduler_uclamp(p, attr);
> 
> I'm still a bit confused on the pi_se thing; if this was a dl task, then
> rt_mutex_setprio() would've already set this to pi_task->dl.pi_se, no?

Let's say it's 'far from ideal', but we currently have

if (dl_prio(prio)) {
	if (!dl_prio(p->normal_prio) ||
	    (pi_task && dl_prio(pi_task->prio) &&
	     dl_entity_preempt(&pi_task->dl, &p->dl))) {	<--
		p->dl.pi_se = pi_task->dl.pi_se;
		scope->flags |= ENQUEUE_REPLENISH;
	} else {
		p->dl.pi_se = &p->dl;
	}
	...
}

So, if the potential donor was DEADLINE but had a later absolute
deadline, we don't boost. But if that donor is still waiting on the
lock, we can boost later, when the holder is moved to a lower priority
class.
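The boost decision quoted above can be modelled in user space to show why a later demotion changes the outcome. This sketch assumes dl_entity_preempt() amounts to "a's absolute deadline is earlier than b's" and uses a wraparound-safe signed-difference comparison for that; it ignores any special-entity handling and leaves the outer dl_prio(prio) guard to the caller. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is before b" for 64-bit timestamps (assumed semantics). */
static bool model_time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/*
 * Simplified form of the quoted condition. Callers are assumed to ask
 * only when the effective priority is DEADLINE (the outer dl_prio(prio)
 * check), so a non-DEADLINE holder always inherits from its donor.
 */
static bool model_should_inherit(bool holder_normal_is_dl, uint64_t holder_deadline,
				 bool donor_is_dl, uint64_t donor_deadline)
{
	if (!holder_normal_is_dl)
		return true;
	return donor_is_dl && model_time_before(donor_deadline, holder_deadline);
}
```

With a DEADLINE holder at deadline 100 and a donor at 200 there is no boost; once the same holder is demoted to a lower class, the same donor boosts it.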

> 
> Anyway, yes something like that, possibly with a helper sounds fine.
> 

Will clean up and resend.

Thanks,
Juri



end of thread

Thread overview: 8+ messages
2026-02-06 13:25 [PATCH] sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting Juri Lelli
2026-02-06 15:39 ` Peter Zijlstra
2026-02-06 15:42   ` Juri Lelli
2026-02-07  8:45 ` Peter Zijlstra
2026-02-09  9:46   ` Juri Lelli
2026-02-24 13:22     ` Juri Lelli
2026-02-24 13:45     ` Peter Zijlstra
2026-02-24 14:05       ` Juri Lelli
