All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Andrea Righi <arighi@nvidia.com>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Daniel Hodges <hodgesd@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
	patsomaru@meta.com
Subject: Re: [PATCH] sched_ext: Fix stale direct dispatch state in ddsp_dsq_id
Date: Wed, 1 Apr 2026 12:46:58 -1000	[thread overview]
Message-ID: <ac2gYobvvAXSPjxd@slm.duckdns.org> (raw)
In-Reply-To: <20260401215619.1188194-1-arighi@nvidia.com>

Hello,

(cc'ing Patrick Somaru. This is the same issue reported on
https://github.com/sched-ext/scx/pull/3482)

On Wed, Apr 01, 2026 at 11:56:19PM +0200, Andrea Righi wrote:
> @p->scx.ddsp_dsq_id can be left set (non-SCX_DSQ_INVALID) in three
> scenarios, causing a spurious WARN_ON_ONCE() in mark_direct_dispatch()
> when the next wakeup's ops.select_cpu() calls scx_bpf_dsq_insert():
> 
> 1. Deferred dispatch cancellation: when a task is directly dispatched to
>    a remote CPU's local DSQ via ops.select_cpu() or ops.enqueue(), the
>    dispatch is deferred (since we can't lock the remote rq while holding
>    the current one). If the task is dequeued before processing the
>    dispatch in process_ddsp_deferred_locals(), dispatch_dequeue()
>    removes the task from the list leaving a stale direct dispatch state.
> 
>    Fix: clear ddsp_dsq_id and ddsp_enq_flags in the !list_empty branch
>    of dispatch_dequeue().
> 
> 2. Holding-cpu dispatch race: when dispatch_to_local_dsq() transfers a
>    task to another CPU's local DSQ, it sets holding_cpu and releases
>    DISPATCHING before locking the source rq. If dequeue wins the race
>    and clears holding_cpu, dispatch_enqueue() is never called and
>    ddsp_dsq_id is not cleared.
> 
>    Fix: clear ddsp_dsq_id and ddsp_enq_flags when clearing holding_cpu
>    in dispatch_dequeue().

These two just mean that dequeue need to clear it, right?

> 3. Cross-scheduler-instance stale state: When an SCX scheduler exits,
>    scx_bypass() iterates over all runnable tasks to dequeue/re-enqueue
>    them, but sleeping tasks are not on any runqueue and are not touched.
>    If a sleeping task had a deferred dispatch in flight (ddsp_dsq_id
>    set) at the time the scheduler exited, the state persists. When a new
>    scheduler instance loads and calls scx_enable_task() for all tasks,
>    it does not reset this leftover state. The next wakeup's
>    ops.select_cpu() then sees a non-INVALID ddsp_dsq_id and triggers:
> 
>      WARN_ON_ONCE(p->scx.ddsp_dsq_id != SCX_DSQ_INVALID)
> 
>    Fix: clear ddsp_dsq_id and ddsp_enq_flags in scx_enable_task() before
>    calling ops.enable(), ensuring each new scheduler instance starts
>    with a clean direct dispatch state per task.

I don't understand this one. If we fix the missing clearing from dequeue,
where would the residual ddsp_dsq_id come from? How would a sleeping task
have ddsp_dsq_id set? Note that select_cpu() + enqueue() call sequence is
atomic w.r.t. dequeue as both are protected by pi_lock.

It's been always a bit bothersome that ddsp_dsq_id was being cleared in
dispatch_enqueue(). It was there to catch the cases where ddsp_dsq_id was
overridden but it just isn't the right place. Can we do the following?

- Add clear_direct_dispatch() which clears ddsp_dsq_id and ddsp_enq_flags.

- Add clear_direct_dispatch() call under the enqueue: in do_enqueue_task()
  and remove ddsp clearing from dispatch_enqueue(). This should catch all
  cases that ignore ddsp.

- Add clear_direct_dispatch() call after dispatch_enqueue() in
  direct_dispatch(). This clears it for the synchronous consumption.

- Add clear_direct_dispatch() call before dispatch_to_local_dsq() call in
  process_ddsp_deferred_locals(). Note that the funciton has to cache and
  clear ddsp fields *before* calling dispatch_to_local_enq() as the function
  will migrate the task to another rq and we can't control what happens to
  it afterwrds. Even for the previous synchronous case, it may just be a
  better pattern to always cache dsq_id and enq_flags in local vars and
  clear p->scx.ddsp* before calling dispatch_enqueue().

- Add clear_direct_dispatch() call to dequeue_task_scx() after
  dispatch_dequeue().

I think this should capture all cases and the fields are cleared where they
should be cleared (either consumed or canceled).

Thanks.

-- 
tejun

  reply	other threads:[~2026-04-01 22:46 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-01 21:56 [PATCH] sched_ext: Fix stale direct dispatch state in ddsp_dsq_id Andrea Righi
2026-04-01 22:46 ` Tejun Heo [this message]
2026-04-02  7:40   ` Andrea Righi
2026-04-02  7:54     ` Andrea Righi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ac2gYobvvAXSPjxd@slm.duckdns.org \
    --to=tj@kernel.org \
    --cc=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=hodgesd@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=patsomaru@meta.com \
    --cc=sched-ext@lists.linux.dev \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.