public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Andrea Righi <arighi@nvidia.com>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Daniel Hodges <hodgesd@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
	patsomaru@meta.com
Subject: Re: [PATCH] sched_ext: Fix stale direct dispatch state in ddsp_dsq_id
Date: Wed, 1 Apr 2026 12:46:58 -1000	[thread overview]
Message-ID: <ac2gYobvvAXSPjxd@slm.duckdns.org> (raw)
In-Reply-To: <20260401215619.1188194-1-arighi@nvidia.com>

Hello,

(cc'ing Patrick Somaru. This is the same issue reported on
https://github.com/sched-ext/scx/pull/3482)

On Wed, Apr 01, 2026 at 11:56:19PM +0200, Andrea Righi wrote:
> @p->scx.ddsp_dsq_id can be left set (non-SCX_DSQ_INVALID) in three
> scenarios, causing a spurious WARN_ON_ONCE() in mark_direct_dispatch()
> when the next wakeup's ops.select_cpu() calls scx_bpf_dsq_insert():
> 
> 1. Deferred dispatch cancellation: when a task is directly dispatched to
>    a remote CPU's local DSQ via ops.select_cpu() or ops.enqueue(), the
>    dispatch is deferred (since we can't lock the remote rq while holding
>    the current one). If the task is dequeued before processing the
>    dispatch in process_ddsp_deferred_locals(), dispatch_dequeue()
>    removes the task from the list leaving a stale direct dispatch state.
> 
>    Fix: clear ddsp_dsq_id and ddsp_enq_flags in the !list_empty branch
>    of dispatch_dequeue().
> 
> 2. Holding-cpu dispatch race: when dispatch_to_local_dsq() transfers a
>    task to another CPU's local DSQ, it sets holding_cpu and releases
>    DISPATCHING before locking the source rq. If dequeue wins the race
>    and clears holding_cpu, dispatch_enqueue() is never called and
>    ddsp_dsq_id is not cleared.
> 
>    Fix: clear ddsp_dsq_id and ddsp_enq_flags when clearing holding_cpu
>    in dispatch_dequeue().

These two just mean that dequeue need to clear it, right?

> 3. Cross-scheduler-instance stale state: When an SCX scheduler exits,
>    scx_bypass() iterates over all runnable tasks to dequeue/re-enqueue
>    them, but sleeping tasks are not on any runqueue and are not touched.
>    If a sleeping task had a deferred dispatch in flight (ddsp_dsq_id
>    set) at the time the scheduler exited, the state persists. When a new
>    scheduler instance loads and calls scx_enable_task() for all tasks,
>    it does not reset this leftover state. The next wakeup's
>    ops.select_cpu() then sees a non-INVALID ddsp_dsq_id and triggers:
> 
>      WARN_ON_ONCE(p->scx.ddsp_dsq_id != SCX_DSQ_INVALID)
> 
>    Fix: clear ddsp_dsq_id and ddsp_enq_flags in scx_enable_task() before
>    calling ops.enable(), ensuring each new scheduler instance starts
>    with a clean direct dispatch state per task.

I don't understand this one. If we fix the missing clearing from dequeue,
where would the residual ddsp_dsq_id come from? How would a sleeping task
have ddsp_dsq_id set? Note that select_cpu() + enqueue() call sequence is
atomic w.r.t. dequeue as both are protected by pi_lock.

It's been always a bit bothersome that ddsp_dsq_id was being cleared in
dispatch_enqueue(). It was there to catch the cases where ddsp_dsq_id was
overridden but it just isn't the right place. Can we do the following?

- Add clear_direct_dispatch() which clears ddsp_dsq_id and ddsp_enq_flags.

- Add clear_direct_dispatch() call under the enqueue: in do_enqueue_task()
  and remove ddsp clearing from dispatch_enqueue(). This should catch all
  cases that ignore ddsp.

- Add clear_direct_dispatch() call after dispatch_enqueue() in
  direct_dispatch(). This clears it for the synchronous consumption.

- Add clear_direct_dispatch() call before dispatch_to_local_dsq() call in
  process_ddsp_deferred_locals(). Note that the funciton has to cache and
  clear ddsp fields *before* calling dispatch_to_local_enq() as the function
  will migrate the task to another rq and we can't control what happens to
  it afterwrds. Even for the previous synchronous case, it may just be a
  better pattern to always cache dsq_id and enq_flags in local vars and
  clear p->scx.ddsp* before calling dispatch_enqueue().

- Add clear_direct_dispatch() call to dequeue_task_scx() after
  dispatch_dequeue().

I think this should capture all cases and the fields are cleared where they
should be cleared (either consumed or canceled).

Thanks.

-- 
tejun

  reply	other threads:[~2026-04-01 22:46 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-01 21:56 [PATCH] sched_ext: Fix stale direct dispatch state in ddsp_dsq_id Andrea Righi
2026-04-01 22:46 ` Tejun Heo [this message]
2026-04-02  7:40   ` Andrea Righi
2026-04-02  7:54     ` Andrea Righi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ac2gYobvvAXSPjxd@slm.duckdns.org \
    --to=tj@kernel.org \
    --cc=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=hodgesd@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=patsomaru@meta.com \
    --cc=sched-ext@lists.linux.dev \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox