From: Tejun Heo <tj@kernel.org>
To: void@manifault.com, arighi@nvidia.com, changwoo@igalia.com
Cc: sched-ext@lists.linux.dev, emil@etsalapatis.com,
linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>
Subject: [PATCHSET sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops
Date: Mon, 20 Apr 2026 21:19:29 -1000
Message-ID: <20260421071945.3110084-1-tj@kernel.org>
Hello,
This patchset introduces topological CPU IDs (cids) - dense,
topology-ordered cpu identifiers - and an alternative cid-form struct_ops
type that lets BPF schedulers operate in cid space directly.
Key pieces:
- cid space: scx_cid_init() walks nodes * LLCs * cores * threads and packs
  a dense cid mapping. The mapping can be overridden via
  scx_bpf_cid_override(). See "Topological CPU IDs" in ext_cid.h for the
  model.
- cmask: a base-windowed bitmap over cid space. Kernel and BPF helpers with
  identical semantics. Used by scx_qmap for per-task affinity and idle-cid
  tracking; meant to be the substrate for sub-sched cid allocation.
- bpf_sched_ext_ops_cid: a parallel struct_ops type whose callbacks take
  cids/cmasks instead of cpus/cpumasks. Kernel translates at the boundary
  via scx_cpu_arg() / scx_cpu_ret(); the two struct types share offsets up
  through @priv (verified by BUILD_BUG_ON) so the union view in scx_sched
  works without function-pointer casts. Sub-sched support is tied to
  cid-form: validate_ops() rejects cpu-form sub-scheds and cpu-form roots
  that expose sub_attach / sub_detach.
- cid-form kfuncs: scx_bpf_kick_cid, scx_bpf_cidperf_{cap,cur,set},
  scx_bpf_cid_curr, scx_bpf_task_cid, scx_bpf_this_cid,
  scx_bpf_nr_{cids,online_cids}, scx_bpf_cid_to_cpu, scx_bpf_cpu_to_cid.
  A cid-form program may not call cpu-only kfuncs (enforced at verifier
  load via scx_kfunc_context_filter); the reverse is intentionally
  permissive to ease migration.
- scx_qmap port: scx_qmap is converted to cid-form. It uses the cmask-based
  idle picker, per-task cid-space cpus_allowed, and cid-form kfuncs
  throughout. Sub-sched dispatching via scx_bpf_sub_dispatch() continues to
  work.
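To illustrate the cid space idea, here is a minimal userspace sketch of
packing a dense, topology-ordered mapping. It is not the code in
ext_cid.c: struct demo_topo, demo_topo_less() and demo_cid_init() are
hypothetical names made up for this example, and it assumes the topology
is already available as a flat per-cpu table.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_CPUS 64	/* illustrative limit, not a kernel constant */

/* Hypothetical per-cpu topology record; not a kernel structure. */
struct demo_topo {
	int node;	/* NUMA node id */
	int llc;	/* last-level cache id */
	int core;	/* physical core id */
	int thread;	/* SMT sibling index within the core */
};

/* Return nonzero if @a sorts before @b: node, then LLC, core, thread. */
static int demo_topo_less(const struct demo_topo *a, const struct demo_topo *b)
{
	if (a->node != b->node)
		return a->node < b->node;
	if (a->llc != b->llc)
		return a->llc < b->llc;
	if (a->core != b->core)
		return a->core < b->core;
	return a->thread < b->thread;
}

/*
 * Fill cpu_to_cid[] and cid_to_cpu[] so that cids are dense (0..nr_cpus-1)
 * and cpus that are close in the topology receive adjacent cids.
 */
static void demo_cid_init(const struct demo_topo *topo, int nr_cpus,
			  int *cpu_to_cid, int *cid_to_cpu)
{
	int i, j;

	assert(nr_cpus > 0 && nr_cpus <= DEMO_MAX_CPUS);

	for (i = 0; i < nr_cpus; i++)
		cid_to_cpu[i] = i;

	/* Insertion sort of cpu numbers by topology key. */
	for (i = 1; i < nr_cpus; i++) {
		int cpu = cid_to_cpu[i];

		for (j = i; j > 0 &&
		     demo_topo_less(&topo[cpu], &topo[cid_to_cpu[j - 1]]); j--)
			cid_to_cpu[j] = cid_to_cpu[j - 1];
		cid_to_cpu[j] = cpu;
	}

	/* The reverse table is the inverse permutation. */
	for (i = 0; i < nr_cpus; i++)
		cpu_to_cid[cid_to_cpu[i]] = i;
}
```

On a machine where SMT siblings are enumerated far apart (cpu0 and cpu2
sharing a core, for example), a mapping built this way places the
siblings at neighbouring cids, which is what lets a contiguous cid range
stand for a topological unit.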
End-to-end testing on a 16-cpu QEMU (identity mapping) and an AMD Ryzen
9 3900X (non-identity cid mapping across CCXes) validated:
- cid <-> cpu table roundtrip
- SCX_DSQ_LOCAL_ON | cid routing (pinned workers land on the right cpu)
- scx_bpf_kick_cid translation (traced entry vs scx_kick_cpu exit)
- set_cmask fidelity (bits match cpu_to_cid(task->cpus_ptr))
- idle picker engagement under light load, backoff under saturation
- sub-sched qmap (root + 3 subs across cgroups)
- cpu-form regression (scx_simple, scx_flatcg still load and schedule)
- kfunc filter denial (cid-form calling scx_bpf_task_cpu rejected at load)
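For readers unfamiliar with the cmask model, the set_cmask fidelity check
above can be pictured with the following userspace sketch of a
base-windowed bitmap. Everything here is an assumption for illustration:
struct demo_cmask and the demo_cmask_*() helpers are hypothetical, and
the real layout and helpers in ext_cid.h and cid.bpf.h may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CMASK_WORDS 4	/* window of up to 256 cids; illustrative */

/* Hypothetical bitmap covering cids in [base, base + nr_bits). */
struct demo_cmask {
	uint32_t base;		/* first cid covered by the window */
	uint32_t nr_bits;	/* number of cids covered */
	uint64_t bits[DEMO_CMASK_WORDS];
};

static void demo_cmask_init(struct demo_cmask *m, uint32_t base,
			    uint32_t nr_bits)
{
	assert(nr_bits <= DEMO_CMASK_WORDS * 64);
	memset(m, 0, sizeof(*m));
	m->base = base;
	m->nr_bits = nr_bits;
}

/* True if @cid falls inside the window. */
static bool demo_cmask_covers(const struct demo_cmask *m, uint32_t cid)
{
	return cid >= m->base && cid - m->base < m->nr_bits;
}

/* Set @cid; cids outside the window are ignored and reported as false. */
static bool demo_cmask_set(struct demo_cmask *m, uint32_t cid)
{
	uint32_t off;

	if (!demo_cmask_covers(m, cid))
		return false;
	off = cid - m->base;
	m->bits[off / 64] |= 1ULL << (off % 64);
	return true;
}

static bool demo_cmask_test(const struct demo_cmask *m, uint32_t cid)
{
	uint32_t off;

	if (!demo_cmask_covers(m, cid))
		return false;
	off = cid - m->base;
	return m->bits[off / 64] & (1ULL << (off % 64));
}

/* Translate a cpu bitmask (bit N = cpu N) into cid space. */
static void demo_cmask_from_cpus(struct demo_cmask *m, uint64_t cpus,
				 const int *cpu_to_cid, int nr_cpus)
{
	assert(nr_cpus >= 0 && nr_cpus <= 64);
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if (cpus & (1ULL << cpu))
			demo_cmask_set(m, (uint32_t)cpu_to_cid[cpu]);
}
```

The fidelity check then amounts to confirming that every cpu in the
task's allowed mask shows up at its translated cid, and that no other
bit inside the window is set.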
The patchset depends on a fix for a pre-existing bug that is being
submitted separately to for-7.1-fixes:
"tools/sched_ext: scx_qmap: Silence task_ctx lookup miss"
https://lore.kernel.org/r/59bc5171ee5aa02746c2f576d0f1e14f@kernel.org
The scx-cid-base branch listed below already carries that fix merged
through for-7.1-fixes; when for-7.1-fixes is merged back into for-7.2
the dependency resolves naturally.
Based on sched_ext/for-7.2 (12ff49d4e1d9) + for-7.1-fixes + the above fix
(scx-cid-base at 5755954f68ad).
0001-sched_ext-Rename-ops_cpu_valid-to-scx_cpu_valid-and-.patch
0002-sched_ext-Move-scx_exit-scx_error-and-friends-to-ext.patch
0003-sched_ext-Shift-scx_kick_cpu-validity-check-to-scx_b.patch
0004-sched_ext-Relocate-cpu_acquire-cpu_release-to-end-of.patch
0005-sched_ext-Make-scx_enable-take-scx_enable_cmd.patch
0006-sched_ext-Add-topological-CPU-IDs-cids.patch
0007-sched_ext-Add-scx_bpf_cid_override-kfunc.patch
0008-tools-sched_ext-Add-struct_size-helpers-to-common.bp.patch
0009-sched_ext-Add-cmask-a-base-windowed-bitmap-over-cid-.patch
0010-sched_ext-Add-cid-form-kfunc-wrappers-alongside-cpu-.patch
0011-sched_ext-Add-bpf_sched_ext_ops_cid-struct_ops-type.patch
0012-sched_ext-Forbid-cpu-form-kfuncs-from-cid-form-sched.patch
0013-tools-sched_ext-scx_qmap-Restart-on-hotplug-instead-.patch
0014-tools-sched_ext-scx_qmap-Add-cmask-based-idle-tracki.patch
0015-tools-sched_ext-scx_qmap-Port-to-cid-form-struct_ops.patch
0016-sched_ext-Require-cid-form-struct_ops-for-sub-sched-.patch
Git tree: git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-cid
 kernel/sched/build_policy.c              |   1 +
 kernel/sched/ext.c                       | 635 +++++++++++++++++++++++++----
 kernel/sched/ext_cid.c                   | 554 +++++++++++++++++++++++++++
 kernel/sched/ext_cid.h                   | 327 ++++++++++++++++
 kernel/sched/ext_idle.c                  |   8 +-
 kernel/sched/ext_internal.h              | 173 +++++++--
 tools/sched_ext/include/scx/cid.bpf.h    | 595 +++++++++++++++++++++++++++++
 tools/sched_ext/include/scx/common.bpf.h |  23 ++
 tools/sched_ext/include/scx/compat.bpf.h |  24 ++
 tools/sched_ext/scx_qmap.bpf.c           | 304 ++++++-------
 tools/sched_ext/scx_qmap.c               |  25 +-
 tools/sched_ext/scx_qmap.h               |   2 +-
 12 files changed, 2419 insertions(+), 252 deletions(-)
--
tejun
Thread overview: 27+ messages
2026-04-21 7:19 Tejun Heo [this message]
2026-04-21 7:19 ` [PATCH 01/16] sched_ext: Rename ops_cpu_valid() to scx_cpu_valid() and expose it Tejun Heo
2026-04-21 13:31 ` Cheng-Yang Chou
2026-04-21 7:19 ` [PATCH 02/16] sched_ext: Move scx_exit(), scx_error() and friends to ext_internal.h Tejun Heo
2026-04-21 13:36 ` Cheng-Yang Chou
2026-04-21 7:19 ` [PATCH 03/16] sched_ext: Shift scx_kick_cpu() validity check to scx_bpf_kick_cpu() Tejun Heo
2026-04-21 13:49 ` Cheng-Yang Chou
2026-04-21 7:19 ` [PATCH 04/16] sched_ext: Relocate cpu_acquire/cpu_release to end of struct sched_ext_ops Tejun Heo
2026-04-21 13:58 ` Cheng-Yang Chou
2026-04-21 7:19 ` [PATCH 05/16] sched_ext: Make scx_enable() take scx_enable_cmd Tejun Heo
2026-04-21 14:25 ` Cheng-Yang Chou
2026-04-21 7:19 ` [PATCH 06/16] sched_ext: Add topological CPU IDs (cids) Tejun Heo
2026-04-21 17:15 ` [PATCH v2 sched_ext/for-7.2] " Tejun Heo
2026-04-21 7:19 ` [PATCH 07/16] sched_ext: Add scx_bpf_cid_override() kfunc Tejun Heo
2026-04-21 7:19 ` [PATCH 08/16] tools/sched_ext: Add struct_size() helpers to common.bpf.h Tejun Heo
2026-04-21 7:19 ` [PATCH 09/16] sched_ext: Add cmask, a base-windowed bitmap over cid space Tejun Heo
2026-04-21 17:30 ` Cheng-Yang Chou
2026-04-21 23:21 ` [PATCH v2] " Tejun Heo
2026-04-21 7:19 ` [PATCH 10/16] sched_ext: Add cid-form kfunc wrappers alongside cpu-form Tejun Heo
2026-04-21 7:19 ` [PATCH 11/16] sched_ext: Add bpf_sched_ext_ops_cid struct_ops type Tejun Heo
2026-04-21 7:19 ` [PATCH 12/16] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers Tejun Heo
2026-04-21 7:19 ` [PATCH 13/16] tools/sched_ext: scx_qmap: Restart on hotplug instead of cpu_online/offline Tejun Heo
2026-04-21 7:19 ` [PATCH 14/16] tools/sched_ext: scx_qmap: Add cmask-based idle tracking and cid-based idle pick Tejun Heo
2026-04-21 7:19 ` [PATCH 15/16] tools/sched_ext: scx_qmap: Port to cid-form struct_ops Tejun Heo
2026-04-21 7:19 ` [PATCH 16/16] sched_ext: Require cid-form struct_ops for sub-sched support Tejun Heo
2026-04-21 18:18 ` [PATCHSET sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops Cheng-Yang Chou
2026-04-21 18:33 ` Tejun Heo