From: Peter Zijlstra <peterz@infradead.org>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
    Andrea Righi <andrea.righi@linux.dev>,
    Changwoo Min <changwoo@igalia.com>,
    linux-kernel@vger.kernel.org, sched-ext@lists.linux.dev,
    Wen-Fang Liu <liuwenfang@honor.com>
Subject: Re: [PATCH v2 3/3] sched_ext: Allow scx_bpf_reenqueue_local() to be called from anywhere
Date: Wed, 29 Oct 2025 11:45:46 +0100
Message-ID: <20251029104546.GI3419281@noisy.programming.kicks-ass.net>
In-Reply-To: <aP-3vOtH9Dyg_R9w@slm.duckdns.org>

On Mon, Oct 27, 2025 at 08:19:40AM -1000, Tejun Heo wrote:
> The ops.cpu_acquire/release() callbacks are broken - they miss events under
> multiple conditions and can't be fixed without adding global sched core hooks
> that sched maintainers don't want.

I think I'll object to that statement just a wee bit. I think we can
make it work -- just not with the things proposed earlier.

Anyway, if you want to reduce the sched_ext interface and remove
cpu_acquire/release entirely, this is fine too.

I might still do that wakeup_preempt() change if I can merge / replace
the queue_mask RETRY_TASK logic -- I have vague memories the RT people
also wanted something like this a while ago and it isn't that big of a
change.

> There are two distinct task dispatch gaps that can cause cpu_released flag
> desynchronization:
>
> 1. balance-to-pick_task gap: This is what was originally reported. balance_scx()
> can enqueue a task, but during consume_remote_task() when the rq lock is
> released, a higher priority task can be enqueued and ultimately picked while
> cpu_released remains false. This gap is closeable via RETRY_TASK handling.
>
> 2. ttwu-to-pick_task gap: ttwu() can directly dispatch a task to a CPU's local
> DSQ. By the time the sched path runs on the target CPU, higher class tasks may
> already be queued. In such cases, nothing on sched_ext side will be invoked,
> and the only solution would be a hook invoked regardless of sched class, which
> isn't desirable.
>
> Rather than adding invasive core hooks, BPF schedulers can use generic BPF
> mechanisms like tracepoints. From an SCX scheduler's perspective, this is congruent
> with other mechanisms it already uses and doesn't add further friction.
>
> The main use case for cpu_release() was calling scx_bpf_reenqueue_local() when
> a CPU gets preempted by a higher priority scheduling class. However, the old
> scx_bpf_reenqueue_local() could only be called from cpu_release() context.
>
> Add a new version of scx_bpf_reenqueue_local() that can be called from any
> context by deferring the actual re-enqueue operation. This eliminates the need
> for cpu_acquire/release() ops entirely. Schedulers can now use standard BPF
> mechanisms like the sched_switch tracepoint to detect and handle CPU preemption.
>
> Update scx_qmap to demonstrate the new approach using sched_switch instead of
> cpu_release, with compat support for older kernels. Mark cpu_acquire/release()
> as deprecated. The old scx_bpf_reenqueue_local() variant will be removed in
> v6.23.
>
> Reported-by: Wen-Fang Liu <liuwenfang@honor.com>
> Link: https://lore.kernel.org/all/8d64c74118c6440f81bcf5a4ac6b9f00@honor.com/
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Tejun Heo <tj@kernel.org>

Yeah, this Changelog is much better, thanks!

6.23 is a long time; can't we throw this out quicker? This thing wasn't
supposed to be an ABI after all. A single release cycle seems fine to me ;-)