public inbox for linux-kernel@vger.kernel.org
From: Andrea Righi <arighi@nvidia.com>
To: Kuba Piecuch <jpiecuch@google.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Emil Tsalapatis <emil@etsalapatis.com>,
	Christian Loehle <christian.loehle@arm.com>,
	Daniel Hodges <hodgesd@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 sched_ext/for-7.1] sched_ext: Invalidate dispatch decisions on CPU affinity changes
Date: Thu, 19 Mar 2026 20:01:01 +0100
Message-ID: <abxH7cC-IOE2PqKb@gpd4>
In-Reply-To: <DH6UY7YCS6LN.KMPOKET3PSF9@google.com>

Hi Kuba,

On Thu, Mar 19, 2026 at 03:18:38PM +0000, Kuba Piecuch wrote:
> Hi Andrea,
> 
> On Thu Mar 19, 2026 at 8:35 AM UTC, Andrea Righi wrote:
> > A BPF scheduler may rely on p->cpus_ptr from ops.dispatch() to select a
> > target CPU. However, task affinity can change between the dispatch
> > decision and its finalization in finish_dispatch(). When this happens,
> > the scheduler may attempt to dispatch a task to a CPU that is no longer
> > allowed, resulting in fatal errors such as:
> >
> >  EXIT: runtime error (SCX_DSQ_LOCAL[_ON] target CPU 10 not allowed for stress-ng-race-[13565])
> >
> > This race exists because ops.dispatch() runs without holding the task's
> > run queue lock, allowing a concurrent set_cpus_allowed() to update
> > p->cpus_ptr while the BPF scheduler is still using it. The dispatch is
> > then finalized using stale affinity information.
> >
> > Example timeline:
> >
> >   CPU0                                      CPU1
> >   ----                                      ----
> >                                             task_rq_lock(p)
> >   if (cpumask_test_cpu(cpu, p->cpus_ptr))
> >                                             set_cpus_allowed_scx(p, new_mask)
> >                                             task_rq_unlock(p)
> >       scx_bpf_dsq_insert(p,
> >               SCX_DSQ_LOCAL_ON | cpu, 0)
> >
> > With commit ebf1ccff79c4 ("sched_ext: Fix ops.dequeue() semantics"), BPF
> > schedulers can avoid the affinity race by tracking task state and
> > handling %SCX_DEQ_SCHED_CHANGE in ops.dequeue(): when a task is dequeued
> > due to a property change, the scheduler can update the task state and
> > skip the direct dispatch from ops.dispatch() for non-queued tasks.
> >
> > However, schedulers that do not implement task state tracking and
> > dispatch directly to a local DSQ directly from ops.dispatch() may
> > trigger the scx_error() condition when the kernel validates the
> > destination in dispatch_to_local_dsq().
> 
> The two paragraphs above mention "direct dispatch from ops.dispatch()"
> and "dispatch directly to a local DSQ directly from ops.dispatch()".
> My understanding is that a "direct dispatch" can only happen from
> ops.select_cpu() or ops.enqueue(), not from ops.dispatch(). Is this just
> an unfortunate choice of words?
> Would "dispatch to a local DSQ" be a more accurate phrase here?

Oh yes, poor wording on my part. What I meant is calling
scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL_ON | cpu, 0) from ops.dispatch(), so
"dispatch to a local DSQ" is definitely better, thanks!
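
To make the case concrete, the timeline quoted above can be replayed in a
small userspace model. This is only a sketch under stated assumptions:
struct model_task, model_cpu_allowed(), model_set_affinity() and
model_replay_race() are hypothetical names, the 64-bit mask merely stands
in for p->cpus_ptr, and none of this is kernel or BPF code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. */
struct model_task {
	uint64_t cpus_allowed;	/* stands in for p->cpus_ptr */
};

/* Models the cpumask_test_cpu(cpu, p->cpus_ptr) check. */
static bool model_cpu_allowed(const struct model_task *p, int cpu)
{
	return cpu >= 0 && cpu < 64 &&
	       (p->cpus_allowed & (UINT64_C(1) << cpu)) != 0;
}

/* Models a concurrent affinity update performed on another CPU. */
static void model_set_affinity(struct model_task *p, uint64_t new_mask)
{
	p->cpus_allowed = new_mask;
}

/*
 * Replays the timeline in order: the scheduler checks affinity, the
 * affinity changes, then the insert into the local DSQ is validated
 * against the current mask. Returns true if the earlier decision is
 * still valid at validation time, false if it has gone stale.
 */
static bool model_replay_race(struct model_task *p, int cpu, uint64_t new_mask)
{
	bool decided = model_cpu_allowed(p, cpu);	/* check in ops.dispatch() */

	model_set_affinity(p, new_mask);		/* mask changes in between */

	return decided && model_cpu_allowed(p, cpu);	/* validation at finalization */
}
```

With a task initially allowed on CPUs 0 and 10, replaying the sequence for
CPU 10 with a new mask that only contains CPU 0 returns false: the decision
looked valid when it was made, but it no longer is when the target is
validated, which is the situation that leads to the error quoted above.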

-Andrea

