* [PATCH sched_ext/for-7.1-fixes] sched_ext: Reset dsq_vtime and slice when a task leaves SCX
@ 2026-06-08 13:49 Andrea Righi
  2026-06-08 14:11 ` sashiko-bot
  0 siblings, 1 reply; 3+ messages in thread
From: Andrea Righi @ 2026-06-08 13:49 UTC (permalink / raw)
  To: Tejun Heo, David Vernet, Changwoo Min; +Cc: sched-ext, linux-kernel

When a task switches out of the sched_ext class, p->scx.dsq_vtime and
p->scx.slice keep whatever values they last held. On enable, slice is
reset to slice_dfl, but dsq_vtime is owned by the BPF scheduler and is
never cleared by the core, so a task that leaves SCX and later returns
carries a stale dsq_vtime across the round-trip.

The stale values are also visible to other SCX schedulers if they
inspect the scx fields of non-SCX tasks.

Zero dsq_vtime and slice in switched_from_scx() after scx_disable_task()
so the fields are reset to the same baseline a freshly forked task has.
The reset is done after the disable callback so the BPF scheduler can
still observe the task's final values.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 8e88a25bc602f..915d5ece5d277 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3942,10 +3942,11 @@ static void switched_from_scx(struct rq *rq, struct task_struct *p)
 	 * scx_disable_task() would WARN on the non-%ENABLED state and trigger a
 	 * NONE -> READY validation failure.
 	 */
-	if (scx_get_task_state(p) == SCX_TASK_NONE)
-		return;
+	if (scx_get_task_state(p) != SCX_TASK_NONE)
+		scx_disable_task(scx_task_sched(p), p);
 
-	scx_disable_task(scx_task_sched(p), p);
+	p->scx.dsq_vtime = 0;
+	p->scx.slice = 0;
 }
 
 static void switched_to_scx(struct rq *rq, struct task_struct *p) {}
-- 
2.54.0
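
The ordering described in the commit message (disable callback first, reset afterwards) can be sketched as a small userspace model. This is only an illustration, not kernel code: the `model_*` types and functions are hypothetical stand-ins for `struct sched_ext_entity`, `scx_disable_task()` and `switched_from_scx()`, and the ENABLED -> READY transition is an assumption taken from the comment in the hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel task states; not the real enum. */
enum model_task_state {
	MODEL_TASK_NONE,
	MODEL_TASK_READY,
	MODEL_TASK_ENABLED,
};

struct model_scx_entity {
	enum model_task_state state;
	uint64_t dsq_vtime;
	uint64_t slice;
	/* Values captured by the modeled disable callback. */
	uint64_t seen_vtime;
	uint64_t seen_slice;
};

/* Models scx_disable_task(): the scheduler observes the final values. */
static void model_disable_task(struct model_scx_entity *e)
{
	assert(e->state == MODEL_TASK_ENABLED);
	e->seen_vtime = e->dsq_vtime;
	e->seen_slice = e->slice;
	e->state = MODEL_TASK_READY;	/* assumed transition, see lead-in */
}

/* Models the patched switched_from_scx(): disable first, then reset. */
static void model_switched_from(struct model_scx_entity *e)
{
	if (e->state != MODEL_TASK_NONE)
		model_disable_task(e);

	/* Reached on both paths, including the NONE case. */
	e->dsq_vtime = 0;
	e->slice = 0;
}
```

Calling `model_switched_from()` on an enabled entity leaves `seen_vtime` and `seen_slice` holding the pre-reset values while `dsq_vtime` and `slice` end up zero. An entity in the NONE state skips the callback but is still zeroed, which mirrors the removal of the early `return` in the patch.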



* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Reset dsq_vtime and slice when a task leaves SCX
  2026-06-08 13:49 [PATCH sched_ext/for-7.1-fixes] sched_ext: Reset dsq_vtime and slice when a task leaves SCX Andrea Righi
@ 2026-06-08 14:11 ` sashiko-bot
  2026-06-08 16:23   ` Andrea Righi
  0 siblings, 1 reply; 3+ messages in thread
From: sashiko-bot @ 2026-06-08 14:11 UTC (permalink / raw)
  To: Andrea Righi; +Cc: sched-ext

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] Tasks returning to the SCX scheduling class receive a zero slice, violating the intended baseline and causing immediate slice expiration.

Pre-existing issues:
- [High] Tasks with an explicit SCHED_EXT policy retain a stale `dsq_vtime` across global BPF scheduler unloads and reloads, leading to corrupted task prioritization.
--

commit 7f8eecaba1a8e23922f489ed563df593e55f3478
Author: Andrea Righi <arighi@nvidia.com>

sched_ext: Reset dsq_vtime and slice when a task leaves SCX

When a task switches out of the sched_ext class, p->scx.dsq_vtime and
p->scx.slice keep whatever values they last held. This patch zeroes
dsq_vtime and slice in switched_from_scx() after scx_disable_task() so
the fields are reset to the same baseline a freshly forked task has.

> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index bd8a3332999f4..38157af00fb77 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
[ ... ]
> @@ -4097,10 +4097,11 @@ static void switched_from_scx(struct rq *rq, struct task_struct *p)
>  	 * scx_disable_task() would WARN on the non-%ENABLED state and trigger a
>  	 * NONE -> READY validation failure.
>  	 */
> -	if (scx_get_task_state(p) == SCX_TASK_NONE)
> -		return;
> +	if (scx_get_task_state(p) != SCX_TASK_NONE)
> +		scx_disable_task(scx_task_sched(p), p);
>  
> -	scx_disable_task(scx_task_sched(p), p);
> +	p->scx.dsq_vtime = 0;
> +	p->scx.slice = 0;

[Severity: High]
Does zeroing the slice here mean a task returning to the SCX scheduling class
will receive a zero slice?

If neither switched_to_scx() nor __scx_enable_task() resets it to
SCX_SLICE_DFL, it seems this violates the commit's intent of restoring the
baseline of a freshly forked task.

Will entering the BPF scheduler with a 0 slice cause the core to emit a
warned_zero_slice warning and trigger immediate slice exhaustion upon
dispatch (falling back to a 1 nanosecond slice)?

[Severity: High]
This isn't a bug introduced by this patch, but does the approach of clearing
dsq_vtime here miss tasks during global BPF scheduler unloads and reloads?

If tasks have an explicit SCHED_EXT policy, they remain in the ext_sched_class
when a custom BPF scheduler is unloaded. During scx_root_disable() and
scx_root_enable(), old_class == new_class evaluates to true, so
switched_from_scx() is never invoked and dsq_vtime is never cleared.

When the new BPF scheduler is loaded, will these tasks retain the dsq_vtime
from the previous scheduler, causing incorrect sorting and prioritization in
vtime-sorted dispatch queues?

>  }
>  
>  static void switched_to_scx(struct rq *rq, struct task_struct *p) {}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260608134908.3232097-1-arighi@nvidia.com?part=1
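
The second point in the review (a reset tied to a class change never runs when the class stays the same) can be illustrated with a minimal userspace model. This is a sketch under stated assumptions, not kernel code: `reload_task`, `reload_class` and `reload_change_class()` are hypothetical names, and the real class-switch path is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scheduling classes for the model. */
enum reload_class { RELOAD_CLASS_FAIR, RELOAD_CLASS_EXT };

struct reload_task {
	enum reload_class cls;
	uint64_t dsq_vtime;
};

/*
 * Models a class-change hook that only fires when the class really
 * changes. Returns true if the hook ran.
 */
static bool reload_change_class(struct reload_task *t, enum reload_class new_cls)
{
	if (t->cls == new_cls)
		return false;		/* hook skipped, fields untouched */

	if (t->cls == RELOAD_CLASS_EXT)
		t->dsq_vtime = 0;	/* reset done only when leaving EXT */

	t->cls = new_cls;
	return true;
}
```

In this model, a task that stays in `RELOAD_CLASS_EXT` across a scheduler reload keeps its old `dsq_vtime`, because the reset lives in a hook that the same-class path never reaches.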


* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Reset dsq_vtime and slice when a task leaves SCX
  2026-06-08 14:11 ` sashiko-bot
@ 2026-06-08 16:23   ` Andrea Righi
  0 siblings, 0 replies; 3+ messages in thread
From: Andrea Righi @ 2026-06-08 16:23 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: sched-ext

On Mon, Jun 08, 2026 at 02:11:11PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> 
> New issues:
> - [High] Tasks returning to the SCX scheduling class receive a zero slice, violating the intended baseline and causing immediate slice expiration.
> 
> Pre-existing issues:
> - [High] Tasks with an explicit SCHED_EXT policy retain a stale `dsq_vtime` across global BPF scheduler unloads and reloads, leading to corrupted task prioritization.
> --
> 
> commit 7f8eecaba1a8e23922f489ed563df593e55f3478
> Author: Andrea Righi <arighi@nvidia.com>
> 
> sched_ext: Reset dsq_vtime and slice when a task leaves SCX
> 
> When a task switches out of the sched_ext class, p->scx.dsq_vtime and
> p->scx.slice keep whatever values they last held. This patch zeroes
> dsq_vtime and slice in switched_from_scx() after scx_disable_task() so
> the fields are reset to the same baseline a freshly forked task has.
> 
> > diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> > index bd8a3332999f4..38157af00fb77 100644
> > --- a/kernel/sched/ext.c
> > +++ b/kernel/sched/ext.c
> [ ... ]
> > @@ -4097,10 +4097,11 @@ static void switched_from_scx(struct rq *rq, struct task_struct *p)
> >  	 * scx_disable_task() would WARN on the non-%ENABLED state and trigger a
> >  	 * NONE -> READY validation failure.
> >  	 */
> > -	if (scx_get_task_state(p) == SCX_TASK_NONE)
> > -		return;
> > +	if (scx_get_task_state(p) != SCX_TASK_NONE)
> > +		scx_disable_task(scx_task_sched(p), p);
> >  
> > -	scx_disable_task(scx_task_sched(p), p);
> > +	p->scx.dsq_vtime = 0;
> > +	p->scx.slice = 0;
> 
> [Severity: High]
> Does zeroing the slice here mean a task returning to the SCX scheduling class
> will receive a zero slice?
> 
> If neither switched_to_scx() nor __scx_enable_task() resets it to
> SCX_SLICE_DFL, it seems this violates the commit's intent of restoring the
> baseline of a freshly forked task.
> 
> Will entering the BPF scheduler with a 0 slice cause the core to emit a
> warned_zero_slice warning and trigger immediate slice exhaustion upon
> dispatch (falling back to a 1 nanosecond slice)?

This looks like a false positive. And I think it's semantically correct to reset
the slice to zero when a task exits the SCHED_EXT class. I'll rephrase the
"freshly forked task" part, which might be confusing.

> 
> [Severity: High]
> This isn't a bug introduced by this patch, but does the approach of clearing
> dsq_vtime here miss tasks during global BPF scheduler unloads and reloads?
> 
> If tasks have an explicit SCHED_EXT policy, they remain in the ext_sched_class
> when a custom BPF scheduler is unloaded. During scx_root_disable() and
> scx_root_enable(), old_class == new_class evaluates to true, so
> switched_from_scx() is never invoked and dsq_vtime is never cleared.
> 
> When the new BPF scheduler is loaded, will these tasks retain the dsq_vtime
> from the previous scheduler, causing incorrect sorting and prioritization in
> vtime-sorted dispatch queues?

This one, instead, looks legit: we should probably reset dsq_vtime and slice in
scx_disable_task(), after ops.disable() has observed their final values. I'll
fix this in v2.

-Andrea
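
The direction proposed for v2 (reset inside the disable path, after the callback has run) might look roughly like the following userspace sketch. It is not the actual v2 patch: `v2_entity`, `v2_disable_cb`, `v2_disable_task()` and `v2_record_vtime()` are hypothetical names, and the callback only stands in for ops.disable().

```c
#include <assert.h>
#include <stdint.h>

struct v2_entity {
	uint64_t dsq_vtime;
	uint64_t slice;
};

/* Hypothetical callback type standing in for ops.disable(). */
typedef void (*v2_disable_cb)(const struct v2_entity *e, void *ctx);

/* Models a disable path that resets the fields itself. */
static void v2_disable_task(struct v2_entity *e, v2_disable_cb cb, void *ctx)
{
	if (cb)
		cb(e, ctx);	/* the scheduler sees the final values first */

	/* Reset on every disable, whether or not the class changes. */
	e->dsq_vtime = 0;
	e->slice = 0;
}

/* Example callback: record the last vtime it was shown. */
static void v2_record_vtime(const struct v2_entity *e, void *ctx)
{
	*(uint64_t *)ctx = e->dsq_vtime;
}
```

Placing the reset in the disable path rather than in the class-switch hook means it is reached whenever a task is disabled, which is the case the review pointed out for tasks that keep an explicit SCHED_EXT policy across a scheduler reload.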
