From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Daniel Hodges <hodgesd@meta.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
patsomaru@meta.com
Subject: Re: [PATCH] sched_ext: Fix stale direct dispatch state in ddsp_dsq_id
Date: Thu, 2 Apr 2026 09:54:44 +0200
Message-ID: <ac4gxCImroMR6TYl@gpd4>
In-Reply-To: <ac4dbpuXGBUEVm6E@gpd4>

On Thu, Apr 02, 2026 at 09:40:41AM +0200, Andrea Righi wrote:
> Hi Tejun,
>
> On Wed, Apr 01, 2026 at 12:46:58PM -1000, Tejun Heo wrote:
> > Hello,
> >
> > (cc'ing Patrick Somaru. This is the same issue reported on
> > https://github.com/sched-ext/scx/pull/3482)
> >
> > On Wed, Apr 01, 2026 at 11:56:19PM +0200, Andrea Righi wrote:
> > > @p->scx.ddsp_dsq_id can be left set (non-SCX_DSQ_INVALID) in three
> > > scenarios, causing a spurious WARN_ON_ONCE() in mark_direct_dispatch()
> > > when the next wakeup's ops.select_cpu() calls scx_bpf_dsq_insert():
> > >
> > > 1. Deferred dispatch cancellation: when a task is directly dispatched to
> > > a remote CPU's local DSQ via ops.select_cpu() or ops.enqueue(), the
> > > dispatch is deferred (since we can't lock the remote rq while holding
> > > the current one). If the task is dequeued before processing the
> > > dispatch in process_ddsp_deferred_locals(), dispatch_dequeue()
> > > removes the task from the list leaving a stale direct dispatch state.
> > >
> > > Fix: clear ddsp_dsq_id and ddsp_enq_flags in the !list_empty branch
> > > of dispatch_dequeue().
> > >
> > > 2. Holding-cpu dispatch race: when dispatch_to_local_dsq() transfers a
> > > task to another CPU's local DSQ, it sets holding_cpu and releases
> > > DISPATCHING before locking the source rq. If dequeue wins the race
> > > and clears holding_cpu, dispatch_enqueue() is never called and
> > > ddsp_dsq_id is not cleared.
> > >
> > > Fix: clear ddsp_dsq_id and ddsp_enq_flags when clearing holding_cpu
> > > in dispatch_dequeue().
> >
> > These two just mean that dequeue needs to clear it, right?
>
> Correct, we want to clear the state whenever dispatch_dequeue() cancels a
> pending direct dispatch without calling dispatch_enqueue(), so I could have
> just cleared the state unconditionally in the !dsq case and simplified the
> code.
>
> >
> > > 3. Cross-scheduler-instance stale state: When an SCX scheduler exits,
> > > scx_bypass() iterates over all runnable tasks to dequeue/re-enqueue
> > > them, but sleeping tasks are not on any runqueue and are not touched.
> > > If a sleeping task had a deferred dispatch in flight (ddsp_dsq_id
> > > set) at the time the scheduler exited, the state persists. When a new
> > > scheduler instance loads and calls scx_enable_task() for all tasks,
> > > it does not reset this leftover state. The next wakeup's
> > > ops.select_cpu() then sees a non-INVALID ddsp_dsq_id and triggers:
> > >
> > > WARN_ON_ONCE(p->scx.ddsp_dsq_id != SCX_DSQ_INVALID)
> > >
> > > Fix: clear ddsp_dsq_id and ddsp_enq_flags in scx_enable_task() before
> > > calling ops.enable(), ensuring each new scheduler instance starts
> > > with a clean direct dispatch state per task.
> >
> > I don't understand this one. If we fix the missing clearing from dequeue,
> > where would the residual ddsp_dsq_id come from? How would a sleeping task
> > have ddsp_dsq_id set? Note that select_cpu() + enqueue() call sequence is
> > atomic w.r.t. dequeue as both are protected by pi_lock.
> >
> > It's been always a bit bothersome that ddsp_dsq_id was being cleared in
> > dispatch_enqueue(). It was there to catch the cases where ddsp_dsq_id was
> > overridden but it just isn't the right place. Can we do the following?
> >
> > - Add clear_direct_dispatch() which clears ddsp_dsq_id and ddsp_enq_flags.
> >
> > - Add clear_direct_dispatch() call under the enqueue: in do_enqueue_task()
> > and remove ddsp clearing from dispatch_enqueue(). This should catch all
> > cases that ignore ddsp.
> >
> > - Add clear_direct_dispatch() call after dispatch_enqueue() in
> > direct_dispatch(). This clears it for the synchronous consumption.
> >
> > - Add clear_direct_dispatch() call before dispatch_to_local_dsq() call in
> > process_ddsp_deferred_locals(). Note that the function has to cache and
> > clear ddsp fields *before* calling dispatch_to_local_dsq() as the function
> > will migrate the task to another rq and we can't control what happens to
> > it afterwards. Even for the previous synchronous case, it may just be a
> > better pattern to always cache dsq_id and enq_flags in local vars and
> > clear p->scx.ddsp* before calling dispatch_enqueue().
> >
> > - Add clear_direct_dispatch() call to dequeue_task_scx() after
> > dispatch_dequeue().
> >
> > I think this should capture all cases and the fields are cleared where they
> > should be cleared (either consumed or canceled).
>
> I like this, it looks like a better design.
>
> However, I tried it, but I can still trigger the warning unless I also
> clear the direct dispatch state in __scx_enable_task(), so we're still
> missing a case where the state isn't properly cleared.
>
> I think it has something to do with sleeping tasks / queued wakeups /
> bypass, because now I can easily reproduce the warning running a
> `stress-ng --sleep 1` while restarting a scheduler that uses
> SCX_OPS_ALLOW_QUEUED_WAKEUP, like scx_cosmos. Will keep investigating...

I think I see it: waking tasks may have had ddsp_dsq_id set by the outgoing
scheduler's ops.select_cpu() and then been queued on a wake_list via
ttwu_queue_wakelist() when SCX_OPS_ALLOW_QUEUED_WAKEUP is set. Such tasks are
not on the runqueue (on_rq == 0) and are not iterated by scx_bypass(), so their
direct dispatch state won't be cleared by dequeue_task_scx().

How about clearing the direct dispatch state in scx_disable_task()? That would
catch those tasks as well and avoid carrying stale state over from the previous
scx scheduler instance.
Thanks,
-Andrea
Thread overview: 4+ messages
2026-04-01 21:56 [PATCH] sched_ext: Fix stale direct dispatch state in ddsp_dsq_id Andrea Righi
2026-04-01 22:46 ` Tejun Heo
2026-04-02 7:40 ` Andrea Righi
2026-04-02 7:54 ` Andrea Righi [this message]