From: Tejun Heo <tj@kernel.org>
To: David Vernet <void@manifault.com>,
Andrea Righi <arighi@nvidia.com>,
Changwoo Min <changwoo@igalia.com>
Cc: Emil Tsalapatis <emil@etsalapatis.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
Tejun Heo <tj@kernel.org>
Subject: [PATCHSET v4 sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops
Date: Wed, 29 Apr 2026 08:21:14 -1000 [thread overview]
Message-ID: <20260429182131.1780125-1-tj@kernel.org> (raw)
Hello,
v4:
- cmask: bump CMASK_CAS_TRIES to (1U << 23) so abort fires only after
seconds of real spinning, not on plausible contention. The kfunc
slow-path Changwoo suggested would let BPF loops keep banging a
contended cacheline indefinitely - on multi-socket SPRs that path
can stall the machine into hard lockups, so failing hard is the
right behavior. A follow-up patch will add a kfunc to bail the BPF
CAS loops immediately when sch->aborting is set. Switch
__builtin_ctzll() to the ctzll() wrapper for clang compat.
- cid-qmap-port: cid-shard handling was wired against a future kfunc
signature that didn't make it into v3, leaving the snapshot broken.
Drop the shard test plumbing for v4, match the 2-arg
scx_bpf_cid_override(), bound nr_cpu_ids for the verifier, and
rename mode 3 from bad-mono to bad-range. (Changwoo, Andrea)
- Rebased over the exit_cpu plumbing in for-7.2:
scx-error-header: scx_exit() and scx_verror() are macros now;
move both plus the underlying __scx_exit() / scx_vexit()
declarations to ext_internal.h.
cid-struct-ops: dump_cpu callsite shifted into scx_dump_cpu()
helper; the scx_cpu_arg() wrap moved with it.
v3: https://lore.kernel.org/r/20260428203545.181052-1-tj@kernel.org
v2: https://lore.kernel.org/r/20260424172721.3458520-1-tj@kernel.org
v1: https://lore.kernel.org/r/20260421071945.3110084-1-tj@kernel.org
This patchset introduces topological CPU IDs (cids) - dense,
topology-ordered cpu identifiers - and an alternative cid-form struct_ops
type that lets BPF schedulers operate in cid space directly.
Key pieces:
- cid space: scx_cid_init() walks nodes * LLCs * cores * threads and packs
a dense cid mapping. The mapping can be overridden via
scx_bpf_cid_override(). See "Topological CPU IDs" in ext_cid.h for the
model.
- cmask: a base-windowed bitmap over cid space. Kernel and BPF helpers with
identical semantics. Used by scx_qmap for per-task affinity and idle-cid
tracking; meant to be the substrate for sub-sched cid allocation.
- bpf_sched_ext_ops_cid: a parallel struct_ops type whose callbacks take
cids/cmasks instead of cpus/cpumasks. Kernel translates at the boundary
via scx_cpu_arg() / scx_cpu_ret(); the two struct types share offsets up
through @priv (verified by BUILD_BUG_ON) so the union view in scx_sched
works without function-pointer casts. Sub-sched support is tied to
cid-form: validate_ops() rejects cpu-form sub-scheds and cpu-form roots
that expose sub_attach / sub_detach.
- cid-form kfuncs: scx_bpf_kick_cid, scx_bpf_cidperf_{cap,cur,set},
scx_bpf_cid_curr, scx_bpf_task_cid, scx_bpf_this_cid,
scx_bpf_nr_{cids,online_cids}, scx_bpf_cid_to_cpu, scx_bpf_cpu_to_cid.
A cid-form program may not call cpu-only kfuncs (enforced at verifier
load via scx_kfunc_context_filter); the reverse is intentionally
permissive to ease migration.
- scx_qmap port: scx_qmap is converted to cid-form. It uses the cmask-based
idle picker, per-task cid-space cpus_allowed, and cid-form kfuncs
throughout. Sub-sched dispatching via scx_bpf_sub_dispatch() continues to
work.
v4 re-tested on the 16-cpu QEMU VM with the v3 cut (only the 17 cid
patches applied): basic load + stress, cid-override modes
(shuffle/bad-dup/bad-range), and three enable/disable cycles all clean.
No BUG/WARNING/panic in the dump.
Based on sched_ext/for-7.2 (ee8391ba1164).
0001-sched_ext-Add-ext_types.h-for-early-subsystem-wide-d.patch
0002-sched_ext-Rename-ops_cpu_valid-to-scx_cpu_valid-and-.patch
0003-sched_ext-Move-scx_exit-scx_error-and-friends-to-ext.patch
0004-sched_ext-Shift-scx_kick_cpu-validity-check-to-scx_b.patch
0005-sched_ext-Relocate-cpu_acquire-cpu_release-to-end-of.patch
0006-sched_ext-Make-scx_enable-take-scx_enable_cmd.patch
0007-sched_ext-Add-topological-CPU-IDs-cids.patch
0008-sched_ext-Add-scx_bpf_cid_override-kfunc.patch
0009-tools-sched_ext-Add-struct_size-helpers-to-common.bp.patch
0010-sched_ext-Add-cmask-a-base-windowed-bitmap-over-cid-.patch
0011-sched_ext-Add-cid-form-kfunc-wrappers-alongside-cpu-.patch
0012-sched_ext-Add-bpf_sched_ext_ops_cid-struct_ops-type.patch
0013-sched_ext-Forbid-cpu-form-kfuncs-from-cid-form-sched.patch
0014-tools-sched_ext-scx_qmap-Restart-on-hotplug-instead-.patch
0015-tools-sched_ext-scx_qmap-Add-cmask-based-idle-tracki.patch
0016-tools-sched_ext-scx_qmap-Port-to-cid-form-struct_ops.patch
0017-sched_ext-Require-cid-form-struct_ops-for-sub-sched-.patch
Git tree: git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-cid-v4
kernel/sched/build_policy.c | 3 +
kernel/sched/ext.c | 660 ++++++++++++++++++++++++++----
kernel/sched/ext_cid.c | 409 +++++++++++++++++++
kernel/sched/ext_cid.h | 164 ++++++++
kernel/sched/ext_idle.c | 8 +-
kernel/sched/ext_internal.h | 209 +++++++---
kernel/sched/ext_types.h | 104 +++++
tools/sched_ext/include/scx/cid.bpf.h | 666 +++++++++++++++++++++++++++++++
tools/sched_ext/include/scx/common.bpf.h | 23 ++
tools/sched_ext/include/scx/compat.bpf.h | 24 ++
tools/sched_ext/scx_qmap.bpf.c | 350 +++++++++-------
tools/sched_ext/scx_qmap.c | 57 ++-
tools/sched_ext/scx_qmap.h | 2 +-
13 files changed, 2387 insertions(+), 292 deletions(-)
Thanks.
--
tejun
next reply other threads:[~2026-04-29 18:21 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-29 18:21 Tejun Heo [this message]
2026-04-29 18:21 ` [PATCH 01/17] sched_ext: Add ext_types.h for early subsystem-wide defs Tejun Heo
2026-04-29 18:21 ` [PATCH 02/17] sched_ext: Rename ops_cpu_valid() to scx_cpu_valid() and expose it Tejun Heo
2026-04-29 18:21 ` [PATCH 03/17] sched_ext: Move scx_exit(), scx_error() and friends to ext_internal.h Tejun Heo
2026-04-29 18:21 ` [PATCH 04/17] sched_ext: Shift scx_kick_cpu() validity check to scx_bpf_kick_cpu() Tejun Heo
2026-04-29 18:21 ` [PATCH 05/17] sched_ext: Relocate cpu_acquire/cpu_release to end of struct sched_ext_ops Tejun Heo
2026-04-29 18:21 ` [PATCH 06/17] sched_ext: Make scx_enable() take scx_enable_cmd Tejun Heo
2026-04-29 18:21 ` [PATCH 07/17] sched_ext: Add topological CPU IDs (cids) Tejun Heo
2026-04-29 18:21 ` [PATCH 08/17] sched_ext: Add scx_bpf_cid_override() kfunc Tejun Heo
2026-04-29 18:21 ` [PATCH 09/17] tools/sched_ext: Add struct_size() helpers to common.bpf.h Tejun Heo
2026-04-29 18:21 ` [PATCH 10/17] sched_ext: Add cmask, a base-windowed bitmap over cid space Tejun Heo
2026-04-29 18:21 ` [PATCH 11/17] sched_ext: Add cid-form kfunc wrappers alongside cpu-form Tejun Heo
2026-04-29 18:21 ` [PATCH 12/17] sched_ext: Add bpf_sched_ext_ops_cid struct_ops type Tejun Heo
2026-04-29 18:21 ` [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers Tejun Heo
2026-04-29 18:21 ` [PATCH 14/17] tools/sched_ext: scx_qmap: Restart on hotplug instead of cpu_online/offline Tejun Heo
2026-04-29 18:21 ` [PATCH 15/17] tools/sched_ext: scx_qmap: Add cmask-based idle tracking and cid-based idle pick Tejun Heo
2026-04-29 18:21 ` [PATCH 16/17] tools/sched_ext: scx_qmap: Port to cid-form struct_ops Tejun Heo
2026-04-29 18:21 ` [PATCH 17/17] sched_ext: Require cid-form struct_ops for sub-sched support Tejun Heo
2026-04-29 18:29 ` [PATCHSET v4 sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260429182131.1780125-1-tj@kernel.org \
--to=tj@kernel.org \
--cc=arighi@nvidia.com \
--cc=changwoo@igalia.com \
--cc=emil@etsalapatis.com \
--cc=linux-kernel@vger.kernel.org \
--cc=sched-ext@lists.linux.dev \
--cc=void@manifault.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.