From: Andrea Righi <arighi@nvidia.com>
To: Kuba Piecuch <jpiecuch@google.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Emil Tsalapatis <emil@etsalapatis.com>,
Christian Loehle <christian.loehle@arm.com>,
Daniel Hodges <hodgesd@meta.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 sched_ext/for-7.1] sched_ext: Invalidate dispatch decisions on CPU affinity changes
Date: Thu, 19 Mar 2026 20:01:01 +0100
Message-ID: <abxH7cC-IOE2PqKb@gpd4>
In-Reply-To: <DH6UY7YCS6LN.KMPOKET3PSF9@google.com>

Hi Kuba,

On Thu, Mar 19, 2026 at 03:18:38PM +0000, Kuba Piecuch wrote:
> Hi Andrea,
>
> On Thu Mar 19, 2026 at 8:35 AM UTC, Andrea Righi wrote:
> > A BPF scheduler may rely on p->cpus_ptr from ops.dispatch() to select a
> > target CPU. However, task affinity can change between the dispatch
> > decision and its finalization in finish_dispatch(). When this happens,
> > the scheduler may attempt to dispatch a task to a CPU that is no longer
> > allowed, resulting in fatal errors such as:
> >
> > EXIT: runtime error (SCX_DSQ_LOCAL[_ON] target CPU 10 not allowed for stress-ng-race-[13565])
> >
> > This race exists because ops.dispatch() runs without holding the task's
> > run queue lock, allowing a concurrent set_cpus_allowed() to update
> > p->cpus_ptr while the BPF scheduler is still using it. The dispatch is
> > then finalized using stale affinity information.
> >
> > Example timeline:
> >
> >   CPU0                                      CPU1
> >   ----                                      ----
> >                                             task_rq_lock(p)
> >   if (cpumask_test_cpu(cpu, p->cpus_ptr))
> >                                             set_cpus_allowed_scx(p, new_mask)
> >                                             task_rq_unlock(p)
> >       scx_bpf_dsq_insert(p,
> >               SCX_DSQ_LOCAL_ON | cpu, 0)
> >
> > With commit ebf1ccff79c4 ("sched_ext: Fix ops.dequeue() semantics"), BPF
> > schedulers can avoid the affinity race by tracking task state and
> > handling %SCX_DEQ_SCHED_CHANGE in ops.dequeue(): when a task is dequeued
> > due to a property change, the scheduler can update the task state and
> > skip the direct dispatch from ops.dispatch() for non-queued tasks.
> >
> > However, schedulers that do not implement task state tracking and
> > dispatch directly to a local DSQ directly from ops.dispatch() may
> > trigger the scx_error() condition when the kernel validates the
> > destination in dispatch_to_local_dsq().
>
> The two paragraphs above mention "direct dispatch from ops.dispatch()"
> and "dispatch directly to a local DSQ directly from ops.dispatch()".
> My understanding is that a "direct dispatch" can only happen from
> ops.select_cpu() or ops.enqueue(), not from ops.dispatch(). Is this just
> an unfortunate choice of words?
> Would "dispatch to a local DSQ" be a more accurate phrase here?

Oh yes, poor wording on my side. What I meant was
scx_bpf_dsq_insert(SCX_DSQ_LOCAL_ON | cpu) from ops.dispatch(), so
"dispatch to a local DSQ" is definitely better, thanks!
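
To make the race in the quoted timeline concrete, here is a minimal
userspace sketch. All names are hypothetical (none of them are sched_ext
APIs), the interleaving is replayed deterministically instead of using
real threads, and a per-task generation counter stands in for
"invalidating" a dispatch decision. It illustrates the general idea in
the subject line and is not necessarily how the patch implements it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a task; none of these names are sched_ext APIs. */
struct model_task {
	uint64_t cpus_allowed;	/* bit N set => CPU N allowed (stands in for p->cpus_ptr) */
	uint64_t affinity_gen;	/* bumped on every affinity change */
};

/* A dispatch decision taken without the task's rq lock, like ops.dispatch(). */
struct model_decision {
	int cpu;		/* chosen target CPU */
	uint64_t gen;		/* affinity generation seen when deciding */
};

enum model_result { MODEL_DISPATCHED, MODEL_DROPPED, MODEL_FATAL };

static bool cpu_allowed(const struct model_task *p, int cpu)
{
	assert(cpu >= 0 && cpu < 64);	/* keep the shift within the 64-bit mask */
	return p->cpus_allowed & (UINT64_C(1) << cpu);
}

/* CPU1 in the timeline: affinity update, done under the rq lock in the kernel. */
static void model_set_cpus_allowed(struct model_task *p, uint64_t new_mask)
{
	p->cpus_allowed = new_mask;
	p->affinity_gen++;
}

/* CPU0 in the timeline: pick a target CPU based on the mask seen right now. */
static bool model_decide(const struct model_task *p, int cpu,
			 struct model_decision *d)
{
	if (!cpu_allowed(p, cpu))
		return false;
	d->cpu = cpu;
	d->gen = p->affinity_gen;
	return true;
}

/*
 * Finalization, loosely like finish_dispatch(). With invalidate == false only
 * the target CPU is validated, so a stale decision becomes a fatal error.
 * With invalidate == true, a decision taken before any affinity change is
 * dropped instead, so the caller can make a fresh decision later.
 */
static enum model_result model_finish(const struct model_task *p,
				      const struct model_decision *d,
				      bool invalidate)
{
	if (invalidate && d->gen != p->affinity_gen)
		return MODEL_DROPPED;
	if (!cpu_allowed(p, d->cpu))
		return MODEL_FATAL;
	return MODEL_DISPATCHED;
}

/* Replay the timeline: decide on CPU 10, affinity drops CPU 10, then finish. */
static enum model_result replay(bool invalidate)
{
	struct model_task p = { .cpus_allowed = UINT64_C(1) << 10 };
	struct model_decision d;

	if (!model_decide(&p, 10, &d))
		return MODEL_DROPPED;
	model_set_cpus_allowed(&p, UINT64_C(1) << 3);	/* only CPU 3 allowed now */
	return model_finish(&p, &d, invalidate);
}
```

replay(false) returns MODEL_FATAL, mirroring the SCX_DSQ_LOCAL[_ON]
error quoted above, while replay(true) returns MODEL_DROPPED. Dropping on
any affinity change is conservative: it also discards decisions whose
target CPU would still be allowed, trading an occasional extra dispatch
round for never acting on stale affinity.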
-Andrea