* [PATCH sched_ext/for-7.1-fixes] sched_ext: Reset dsq_vtime and slice when a task leaves SCX
@ 2026-06-08 13:49 Andrea Righi
2026-06-08 14:11 ` sashiko-bot
0 siblings, 1 reply; 3+ messages in thread
From: Andrea Righi @ 2026-06-08 13:49 UTC (permalink / raw)
To: Tejun Heo, David Vernet, Changwoo Min; +Cc: sched-ext, linux-kernel
When a task switches out of the sched_ext class, p->scx.dsq_vtime and
p->scx.slice keep whatever values they last held. On enable, slice is
reset to slice_dfl, but dsq_vtime is owned by the BPF scheduler and is
never cleared by the core, so a task that leaves SCX and later returns
carries a stale dsq_vtime across the round-trip.
The stale values are also visible to other SCX schedulers if they
inspects the scx fields of non-SCX tasks.
Zero dsq_vtime and slice in switched_from_scx() after scx_disable_task()
so the fields are reset to the same baseline a freshly forked task has.
The reset is done after the disable callback so the BPF scheduler can
still observe the task's final values.
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
kernel/sched/ext.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 8e88a25bc602f..915d5ece5d277 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3942,10 +3942,11 @@ static void switched_from_scx(struct rq *rq, struct task_struct *p)
* scx_disable_task() would WARN on the non-%ENABLED state and trigger a
* NONE -> READY validation failure.
*/
- if (scx_get_task_state(p) == SCX_TASK_NONE)
- return;
+ if (scx_get_task_state(p) != SCX_TASK_NONE)
+ scx_disable_task(scx_task_sched(p), p);
- scx_disable_task(scx_task_sched(p), p);
+ p->scx.dsq_vtime = 0;
+ p->scx.slice = 0;
}
static void switched_to_scx(struct rq *rq, struct task_struct *p) {}
--
2.54.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Reset dsq_vtime and slice when a task leaves SCX
2026-06-08 13:49 [PATCH sched_ext/for-7.1-fixes] sched_ext: Reset dsq_vtime and slice when a task leaves SCX Andrea Righi
@ 2026-06-08 14:11 ` sashiko-bot
2026-06-08 16:23 ` Andrea Righi
0 siblings, 1 reply; 3+ messages in thread
From: sashiko-bot @ 2026-06-08 14:11 UTC (permalink / raw)
To: Andrea Righi; +Cc: sched-ext
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [High] Tasks returning to the SCX scheduling class receive a zero slice, violating the intended baseline and causing immediate slice expiration.
Pre-existing issues:
- [High] Tasks with an explicit SCHED_EXT policy retain a stale `dsq_vtime` across global BPF scheduler unloads and reloads, leading to corrupted task prioritization.
--
commit 7f8eecaba1a8e23922f489ed563df593e55f3478
Author: Andrea Righi <arighi@nvidia.com>
sched_ext: Reset dsq_vtime and slice when a task leaves SCX
When a task switches out of the sched_ext class, p->scx.dsq_vtime and
p->scx.slice keep whatever values they last held. This patch zeroes
dsq_vtime and slice in switched_from_scx() after scx_disable_task() so
the fields are reset to the same baseline a freshly forked task has.
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index bd8a3332999f4..38157af00fb77 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
[ ... ]
> @@ -4097,10 +4097,11 @@ static void switched_from_scx(struct rq *rq, struct task_struct *p)
> * scx_disable_task() would WARN on the non-%ENABLED state and trigger a
> * NONE -> READY validation failure.
> */
> - if (scx_get_task_state(p) == SCX_TASK_NONE)
> - return;
> + if (scx_get_task_state(p) != SCX_TASK_NONE)
> + scx_disable_task(scx_task_sched(p), p);
>
> - scx_disable_task(scx_task_sched(p), p);
> + p->scx.dsq_vtime = 0;
> + p->scx.slice = 0;
[Severity: High]
Does zeroing the slice here mean a task returning to the SCX scheduling class
will receive a zero slice?
If neither switched_to_scx() nor __scx_enable_task() resets it to
SCX_SLICE_DFL, it seems this violates the commit's intent of restoring the
baseline of a freshly forked task.
Will entering the BPF scheduler with a 0 slice cause the core to emit a
warned_zero_slice warning and trigger immediate slice exhaustion upon
dispatch (falling back to a 1 nanosecond slice)?
[Severity: High]
This isn't a bug introduced by this patch, but does the approach of clearing
dsq_vtime here miss tasks during global BPF scheduler unloads and reloads?
If tasks have an explicit SCHED_EXT policy, they remain in the ext_sched_class
when a custom BPF scheduler is unloaded. During scx_root_disable() and
scx_root_enable(), old_class == new_class evaluates to true, so
switched_from_scx() is never invoked and dsq_vtime is never cleared.
When the new BPF scheduler is loaded, will these tasks retain the dsq_vtime
from the previous scheduler, causing incorrect sorting and prioritization in
vtime-sorted dispatch queues?
> }
>
> static void switched_to_scx(struct rq *rq, struct task_struct *p) {}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260608134908.3232097-1-arighi@nvidia.com?part=1
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Reset dsq_vtime and slice when a task leaves SCX
2026-06-08 14:11 ` sashiko-bot
@ 2026-06-08 16:23 ` Andrea Righi
0 siblings, 0 replies; 3+ messages in thread
From: Andrea Righi @ 2026-06-08 16:23 UTC (permalink / raw)
To: sashiko-reviews; +Cc: sched-ext
On Mon, Jun 08, 2026 at 02:11:11PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
>
> New issues:
> - [High] Tasks returning to the SCX scheduling class receive a zero slice, violating the intended baseline and causing immediate slice expiration.
>
> Pre-existing issues:
> - [High] Tasks with an explicit SCHED_EXT policy retain a stale `dsq_vtime` across global BPF scheduler unloads and reloads, leading to corrupted task prioritization.
> --
>
> commit 7f8eecaba1a8e23922f489ed563df593e55f3478
> Author: Andrea Righi <arighi@nvidia.com>
>
> sched_ext: Reset dsq_vtime and slice when a task leaves SCX
>
> When a task switches out of the sched_ext class, p->scx.dsq_vtime and
> p->scx.slice keep whatever values they last held. This patch zeroes
> dsq_vtime and slice in switched_from_scx() after scx_disable_task() so
> the fields are reset to the same baseline a freshly forked task has.
>
> > diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> > index bd8a3332999f4..38157af00fb77 100644
> > --- a/kernel/sched/ext.c
> > +++ b/kernel/sched/ext.c
> [ ... ]
> > @@ -4097,10 +4097,11 @@ static void switched_from_scx(struct rq *rq, struct task_struct *p)
> > * scx_disable_task() would WARN on the non-%ENABLED state and trigger a
> > * NONE -> READY validation failure.
> > */
> > - if (scx_get_task_state(p) == SCX_TASK_NONE)
> > - return;
> > + if (scx_get_task_state(p) != SCX_TASK_NONE)
> > + scx_disable_task(scx_task_sched(p), p);
> >
> > - scx_disable_task(scx_task_sched(p), p);
> > + p->scx.dsq_vtime = 0;
> > + p->scx.slice = 0;
>
> [Severity: High]
> Does zeroing the slice here mean a task returning to the SCX scheduling class
> will receive a zero slice?
>
> If neither switched_to_scx() nor __scx_enable_task() resets it to
> SCX_SLICE_DFL, it seems this violates the commit's intent of restoring the
> baseline of a freshly forked task.
>
> Will entering the BPF scheduler with a 0 slice cause the core to emit a
> warned_zero_slice warning and trigger immediate slice exhaustion upon
> dispatch (falling back to a 1 nanosecond slice)?
This looks like a false positive. And I think it's semantically correct to reset
the slice to zero when a task exits the SCHED_EXT class. I'll rephrase the
"freshly forked task" part, which might be confusing.
>
> [Severity: High]
> This isn't a bug introduced by this patch, but does the approach of clearing
> dsq_vtime here miss tasks during global BPF scheduler unloads and reloads?
>
> If tasks have an explicit SCHED_EXT policy, they remain in the ext_sched_class
> when a custom BPF scheduler is unloaded. During scx_root_disable() and
> scx_root_enable(), old_class == new_class evaluates to true, so
> switched_from_scx() is never invoked and dsq_vtime is never cleared.
>
> When the new BPF scheduler is loaded, will these tasks retain the dsq_vtime
> from the previous scheduler, causing incorrect sorting and prioritization in
> vtime-sorted dispatch queues?
This one instead looks legit, we should probably reset dsq_vtime and slice in
scx_disable_task() after ops.disable() has observed their final values. I'll fix
this in v2.
-Andrea
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-06-08 16:23 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-08 13:49 [PATCH sched_ext/for-7.1-fixes] sched_ext: Reset dsq_vtime and slice when a task leaves SCX Andrea Righi
2026-06-08 14:11 ` sashiko-bot
2026-06-08 16:23 ` Andrea Righi
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.