All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Daniel Hodges <hodgesd@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
	patsomaru@meta.com
Subject: Re: [PATCH] sched_ext: Fix stale direct dispatch state in ddsp_dsq_id
Date: Thu, 2 Apr 2026 09:40:30 +0200	[thread overview]
Message-ID: <ac4dbpuXGBUEVm6E@gpd4> (raw)
In-Reply-To: <ac2gYobvvAXSPjxd@slm.duckdns.org>

Hi Tejun,

On Wed, Apr 01, 2026 at 12:46:58PM -1000, Tejun Heo wrote:
> Hello,
> 
> (cc'ing Patrick Somaru. This is the same issue reported on
> https://github.com/sched-ext/scx/pull/3482)
> 
> On Wed, Apr 01, 2026 at 11:56:19PM +0200, Andrea Righi wrote:
> > @p->scx.ddsp_dsq_id can be left set (non-SCX_DSQ_INVALID) in three
> > scenarios, causing a spurious WARN_ON_ONCE() in mark_direct_dispatch()
> > when the next wakeup's ops.select_cpu() calls scx_bpf_dsq_insert():
> > 
> > 1. Deferred dispatch cancellation: when a task is directly dispatched to
> >    a remote CPU's local DSQ via ops.select_cpu() or ops.enqueue(), the
> >    dispatch is deferred (since we can't lock the remote rq while holding
> >    the current one). If the task is dequeued before processing the
> >    dispatch in process_ddsp_deferred_locals(), dispatch_dequeue()
> >    removes the task from the list leaving a stale direct dispatch state.
> > 
> >    Fix: clear ddsp_dsq_id and ddsp_enq_flags in the !list_empty branch
> >    of dispatch_dequeue().
> > 
> > 2. Holding-cpu dispatch race: when dispatch_to_local_dsq() transfers a
> >    task to another CPU's local DSQ, it sets holding_cpu and releases
> >    DISPATCHING before locking the source rq. If dequeue wins the race
> >    and clears holding_cpu, dispatch_enqueue() is never called and
> >    ddsp_dsq_id is not cleared.
> > 
> >    Fix: clear ddsp_dsq_id and ddsp_enq_flags when clearing holding_cpu
> >    in dispatch_dequeue().
> 
> These two just mean that dequeue need to clear it, right?

Correct, we want to clear the state where dispatch_dequeue() cancels a
pending direct dispatch without calling dispatch_enqueue(), so I could have
just clear the state unconditionally in the !dsq case and simplify the
code.

> 
> > 3. Cross-scheduler-instance stale state: When an SCX scheduler exits,
> >    scx_bypass() iterates over all runnable tasks to dequeue/re-enqueue
> >    them, but sleeping tasks are not on any runqueue and are not touched.
> >    If a sleeping task had a deferred dispatch in flight (ddsp_dsq_id
> >    set) at the time the scheduler exited, the state persists. When a new
> >    scheduler instance loads and calls scx_enable_task() for all tasks,
> >    it does not reset this leftover state. The next wakeup's
> >    ops.select_cpu() then sees a non-INVALID ddsp_dsq_id and triggers:
> > 
> >      WARN_ON_ONCE(p->scx.ddsp_dsq_id != SCX_DSQ_INVALID)
> > 
> >    Fix: clear ddsp_dsq_id and ddsp_enq_flags in scx_enable_task() before
> >    calling ops.enable(), ensuring each new scheduler instance starts
> >    with a clean direct dispatch state per task.
> 
> I don't understand this one. If we fix the missing clearing from dequeue,
> where would the residual ddsp_dsq_id come from? How would a sleeping task
> have ddsp_dsq_id set? Note that select_cpu() + enqueue() call sequence is
> atomic w.r.t. dequeue as both are protected by pi_lock.
> 
> It's been always a bit bothersome that ddsp_dsq_id was being cleared in
> dispatch_enqueue(). It was there to catch the cases where ddsp_dsq_id was
> overridden but it just isn't the right place. Can we do the following?
> 
> - Add clear_direct_dispatch() which clears ddsp_dsq_id and ddsp_enq_flags.
> 
> - Add clear_direct_dispatch() call under the enqueue: in do_enqueue_task()
>   and remove ddsp clearing from dispatch_enqueue(). This should catch all
>   cases that ignore ddsp.
> 
> - Add clear_direct_dispatch() call after dispatch_enqueue() in
>   direct_dispatch(). This clears it for the synchronous consumption.
> 
> - Add clear_direct_dispatch() call before dispatch_to_local_dsq() call in
>   process_ddsp_deferred_locals(). Note that the funciton has to cache and
>   clear ddsp fields *before* calling dispatch_to_local_enq() as the function
>   will migrate the task to another rq and we can't control what happens to
>   it afterwrds. Even for the previous synchronous case, it may just be a
>   better pattern to always cache dsq_id and enq_flags in local vars and
>   clear p->scx.ddsp* before calling dispatch_enqueue().
> 
> - Add clear_direct_dispatch() call to dequeue_task_scx() after
>   dispatch_dequeue().
> 
> I think this should capture all cases and the fields are cleared where they
> should be cleared (either consumed or canceled).

I like this, it looks like a better design.

However, I tried it, but I'm still able to trigger the warning, unless I
clear the direct dispatch state in __scx_enable_task(), so we're still
missing a case that doesn't properly clear the state.

I think it has something to do with sleeping tasks / queued wakeups /
bypass, because now I can easily reproduce the warning running a
`stress-ng --sleep 1` while restarting a scheduler that uses
SCX_OPS_ALLOW_QUEUED_WAKEUP, like scx_cosmos. Will keep investigating...

Thanks,
-Andrea

  reply	other threads:[~2026-04-02  7:40 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-01 21:56 [PATCH] sched_ext: Fix stale direct dispatch state in ddsp_dsq_id Andrea Righi
2026-04-01 22:46 ` Tejun Heo
2026-04-02  7:40   ` Andrea Righi [this message]
2026-04-02  7:54     ` Andrea Righi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ac4dbpuXGBUEVm6E@gpd4 \
    --to=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=hodgesd@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=patsomaru@meta.com \
    --cc=sched-ext@lists.linux.dev \
    --cc=tj@kernel.org \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.