From: Kuba Piecuch <jpiecuch@google.com>
To: Andrea Righi <arighi@nvidia.com>, Tejun Heo <tj@kernel.org>,
	 David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>
Cc: Kuba Piecuch <jpiecuch@google.com>,
	Emil Tsalapatis <emil@etsalapatis.com>,
	 Christian Loehle <christian.loehle@arm.com>,
	Daniel Hodges <hodgesd@meta.com>, <sched-ext@lists.linux.dev>,
	 <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 sched_ext/for-7.1] sched_ext: Invalidate dispatch decisions on CPU affinity changes
Date: Thu, 19 Mar 2026 15:18:38 +0000
Message-ID: <DH6UY7YCS6LN.KMPOKET3PSF9@google.com>
In-Reply-To: <20260319083518.94673-1-arighi@nvidia.com>

Hi Andrea,

On Thu Mar 19, 2026 at 8:35 AM UTC, Andrea Righi wrote:
> A BPF scheduler may rely on p->cpus_ptr from ops.dispatch() to select a
> target CPU. However, task affinity can change between the dispatch
> decision and its finalization in finish_dispatch(). When this happens,
> the scheduler may attempt to dispatch a task to a CPU that is no longer
> allowed, resulting in fatal errors such as:
>
>  EXIT: runtime error (SCX_DSQ_LOCAL[_ON] target CPU 10 not allowed for stress-ng-race-[13565])
>
> This race exists because ops.dispatch() runs without holding the task's
> run queue lock, allowing a concurrent set_cpus_allowed() to update
> p->cpus_ptr while the BPF scheduler is still using it. The dispatch is
> then finalized using stale affinity information.
>
> Example timeline:
>
>   CPU0                                      CPU1
>   ----                                      ----
>                                             task_rq_lock(p)
>   if (cpumask_test_cpu(cpu, p->cpus_ptr))
>                                             set_cpus_allowed_scx(p, new_mask)
>                                             task_rq_unlock(p)
>       scx_bpf_dsq_insert(p,
>               SCX_DSQ_LOCAL_ON | cpu, 0)
>
> With commit ebf1ccff79c4 ("sched_ext: Fix ops.dequeue() semantics"), BPF
> schedulers can avoid the affinity race by tracking task state and
> handling %SCX_DEQ_SCHED_CHANGE in ops.dequeue(): when a task is dequeued
> due to a property change, the scheduler can update the task state and
> skip the direct dispatch from ops.dispatch() for non-queued tasks.
>
> However, schedulers that do not implement task state tracking and
> dispatch directly to a local DSQ directly from ops.dispatch() may
> trigger the scx_error() condition when the kernel validates the
> destination in dispatch_to_local_dsq().

The two paragraphs above mention "direct dispatch from ops.dispatch()"
and "dispatch directly to a local DSQ directly from ops.dispatch()".
My understanding is that a "direct dispatch" can only happen from
ops.select_cpu() or ops.enqueue(), not from ops.dispatch(). Is this just
an unfortunate choice of words?
Would "dispatch to a local DSQ" be a more accurate phrase here?
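
For reference, the check-then-insert race in the quoted timeline can be
replayed deterministically in userspace. This is only a toy sketch, not
kernel code: struct toy_task, toy_cpu_allowed(), toy_finish_dispatch()
and toy_replay() are hypothetical names, and a plain 64-bit mask stands
in for p->cpus_ptr.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a task: only the allowed-CPU mask matters. */
struct toy_task {
	uint64_t cpus_allowed;	/* bit n set => CPU n is allowed */
};

static bool toy_cpu_allowed(const struct toy_task *p, int cpu)
{
	return (p->cpus_allowed & (1ULL << cpu)) != 0;
}

/*
 * Assumed model of the final validation step: the dispatch is rejected
 * if the target CPU is not in the task's current mask.
 */
static bool toy_finish_dispatch(const struct toy_task *p, int cpu)
{
	return toy_cpu_allowed(p, cpu);
}

/*
 * Replay the timeline in a single thread. Returns true if the dispatch
 * decision still passes validation when it is finalized.
 */
static bool toy_replay(bool affinity_changes)
{
	struct toy_task p = { .cpus_allowed = 1ULL << 10 };
	int cpu = 10;

	/* "CPU0": the scheduler checks the mask and picks CPU 10. */
	bool decided = toy_cpu_allowed(&p, cpu);

	/* "CPU1": the affinity is updated before the dispatch is finalized. */
	if (affinity_changes)
		p.cpus_allowed = 1ULL << 3;

	/* "CPU0": the dispatch is finalized based on the earlier check. */
	return decided && toy_finish_dispatch(&p, cpu);
}
```

In this model toy_replay(false) passes validation, while toy_replay(true)
fails it because the decision was made against a mask that is stale by
the time the dispatch is finalized.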

Thanks,
Kuba

