* [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers
@ 2026-04-24 1:32 Tejun Heo
2026-04-24 2:15 ` Zhao Mengmeng
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2026-04-24 1:32 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min
Cc: sched-ext, emil, linux-kernel, Cheng-Yang Chou, Zhao Mengmeng,
Tejun Heo
cid and cpu are both small s32s, trivially confused when a cid-form
scheduler calls a cpu-keyed kfunc. Reject cid-form programs that
reference any kfunc in the new scx_kfunc_ids_cpu_only at verifier load
time.
The reverse direction is intentionally permissive: cpu-form schedulers
can freely call cid-form kfuncs to ease a gradual cpumask -> cid
migration.
The check sits in scx_kfunc_context_filter() right after the SCX
struct_ops gate and before the any/idle allow and per-op allow-list
checks, so it catches cpu-only kfuncs regardless of which set they
belong to (any, idle, or select_cpu).
v2: Sync per-entry kfunc flags with their primary declarations (Zhao).
pahole intersects flags across BTF_ID_FLAGS() occurrences, so
omitting them drops the flags globally.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Cc: Zhao Mengmeng <zhaomzhao@126.com>
---
kernel/sched/ext.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index abc0c798150d..37f37f31b025 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -9979,6 +9979,47 @@ static const struct btf_kfunc_id_set scx_kfunc_set_any = {
.filter = scx_kfunc_context_filter,
};
+/*
+ * cpu-form kfuncs that are forbidden from cid-form schedulers
+ * (bpf_sched_ext_ops_cid). Programs targeting the cid struct_ops type must
+ * use the cid-form alternative (cid/cmask kfuncs).
+ *
+ * Membership overlaps with scx_kfunc_ids_{any,idle,select_cpu}; the filter
+ * tests this set independently and rejects matches before the per-op
+ * allow-list check runs.
+ *
+ * pahole/resolve_btfids scans every BTF_ID_FLAGS() at build time and
+ * intersects flags across duplicate entries, so each entry must carry the
+ * same flags as the kfunc's primary declaration; otherwise the flags get
+ * dropped globally.
+ */
+BTF_KFUNCS_START(scx_kfunc_ids_cpu_only)
+BTF_ID_FLAGS(func, scx_bpf_kick_cpu, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_task_cpu, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_cpu_rq, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_IMPLICIT_ARGS | KF_RET_NULL | KF_RCU_PROTECTED)
+BTF_ID_FLAGS(func, scx_bpf_cpu_node, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpuperf_cap, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpuperf_cur, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpuperf_set, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_get_possible_cpumask, KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_online_cpumask, KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_put_cpumask, KF_RELEASE)
+BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, __scx_bpf_select_cpu_and, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_cpumask, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_cpumask_node, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_smtmask, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_smtmask_node, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_put_idle_cpumask, KF_RELEASE)
+BTF_ID_FLAGS(func, scx_bpf_test_and_clear_cpu_idle, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu_node, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu_node, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_KFUNCS_END(scx_kfunc_ids_cpu_only)
+
/*
* Per-op kfunc allow flags. Each bit corresponds to a context-sensitive kfunc
* group; an op may permit zero or more groups, with the union expressed in
@@ -10042,6 +10083,7 @@ int scx_kfunc_context_filter(const struct bpf_prog *prog, u32 kfunc_id)
bool in_cpu_release = btf_id_set8_contains(&scx_kfunc_ids_cpu_release, kfunc_id);
bool in_idle = btf_id_set8_contains(&scx_kfunc_ids_idle, kfunc_id);
bool in_any = btf_id_set8_contains(&scx_kfunc_ids_any, kfunc_id);
+ bool in_cpu_only = btf_id_set8_contains(&scx_kfunc_ids_cpu_only, kfunc_id);
u32 moff, flags;
/* Not an SCX kfunc - allow. */
@@ -10079,6 +10121,15 @@ int scx_kfunc_context_filter(const struct bpf_prog *prog, u32 kfunc_id)
prog->aux->st_ops != &bpf_sched_ext_ops_cid)
return -EACCES;
+ /*
+ * cid-form schedulers must use cid/cmask kfuncs. cid and cpu are both
+ * small s32s and trivially confused, so cpu-only kfuncs are rejected at
+ * load time. The reverse (cpu-form calling cid-form kfuncs) is
+ * intentionally permissive to ease gradual cpumask -> cid migration.
+ */
+ if (prog->aux->st_ops == &bpf_sched_ext_ops_cid && in_cpu_only)
+ return -EACCES;
+
/* SCX struct_ops: check the per-op allow list. */
if (in_any || in_idle)
return 0;
--
2.53.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers
2026-04-24 1:32 [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers Tejun Heo
@ 2026-04-24 2:15 ` Zhao Mengmeng
0 siblings, 0 replies; 6+ messages in thread
From: Zhao Mengmeng @ 2026-04-24 2:15 UTC (permalink / raw)
To: Tejun Heo, David Vernet, Andrea Righi, Changwoo Min
Cc: sched-ext, emil, linux-kernel, Cheng-Yang Chou
On 4/24/26 09:32, Tejun Heo wrote:
> cid and cpu are both small s32s, trivially confused when a cid-form
> scheduler calls a cpu-keyed kfunc. Reject cid-form programs that
> reference any kfunc in the new scx_kfunc_ids_cpu_only at verifier load
> time.
>
> The reverse direction is intentionally permissive: cpu-form schedulers
> can freely call cid-form kfuncs to ease a gradual cpumask -> cid
> migration.
>
> The check sits in scx_kfunc_context_filter() right after the SCX
> struct_ops gate and before the any/idle allow and per-op allow-list
> checks, so it catches cpu-only kfuncs regardless of which set they
> belong to (any, idle, or select_cpu).
>
> v2: Sync per-entry kfunc flags with their primary declarations (Zhao).
> pahole intersects flags across BTF_ID_FLAGS() occurrences, so
> omitting them drops the flags globally.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reviewed-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
> Cc: Zhao Mengmeng <zhaomzhao@126.com>
> ---
> kernel/sched/ext.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 51 insertions(+)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index abc0c798150d..37f37f31b025 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -9979,6 +9979,47 @@ static const struct btf_kfunc_id_set scx_kfunc_set_any = {
> .filter = scx_kfunc_context_filter,
> };
>
> +/*
> + * cpu-form kfuncs that are forbidden from cid-form schedulers
> + * (bpf_sched_ext_ops_cid). Programs targeting the cid struct_ops type must
> + * use the cid-form alternative (cid/cmask kfuncs).
> + *
> + * Membership overlaps with scx_kfunc_ids_{any,idle,select_cpu}; the filter
> + * tests this set independently and rejects matches before the per-op
> + * allow-list check runs.
> + *
> + * pahole/resolve_btfids scans every BTF_ID_FLAGS() at build time and
> + * intersects flags across duplicate entries, so each entry must carry the
> + * same flags as the kfunc's primary declaration; otherwise the flags get
> + * dropped globally.
> + */
> +BTF_KFUNCS_START(scx_kfunc_ids_cpu_only)
> +BTF_ID_FLAGS(func, scx_bpf_kick_cpu, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_task_cpu, KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_cpu_rq, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_IMPLICIT_ARGS | KF_RET_NULL | KF_RCU_PROTECTED)
> +BTF_ID_FLAGS(func, scx_bpf_cpu_node, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_cpuperf_cap, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_cpuperf_cur, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_cpuperf_set, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_get_possible_cpumask, KF_ACQUIRE)
> +BTF_ID_FLAGS(func, scx_bpf_get_online_cpumask, KF_ACQUIRE)
> +BTF_ID_FLAGS(func, scx_bpf_put_cpumask, KF_RELEASE)
> +BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_ID_FLAGS(func, __scx_bpf_select_cpu_and, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_get_idle_cpumask, KF_IMPLICIT_ARGS | KF_ACQUIRE)
> +BTF_ID_FLAGS(func, scx_bpf_get_idle_cpumask_node, KF_IMPLICIT_ARGS | KF_ACQUIRE)
> +BTF_ID_FLAGS(func, scx_bpf_get_idle_smtmask, KF_IMPLICIT_ARGS | KF_ACQUIRE)
> +BTF_ID_FLAGS(func, scx_bpf_get_idle_smtmask_node, KF_IMPLICIT_ARGS | KF_ACQUIRE)
> +BTF_ID_FLAGS(func, scx_bpf_put_idle_cpumask, KF_RELEASE)
> +BTF_ID_FLAGS(func, scx_bpf_test_and_clear_cpu_idle, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu_node, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu_node, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_KFUNCS_END(scx_kfunc_ids_cpu_only)
> +
Hi Tejun,
Build it and check the vmlinux.h, no signature changes, selftests build well. So looks good to me.
Reviewed-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
> /*
> * Per-op kfunc allow flags. Each bit corresponds to a context-sensitive kfunc
> * group; an op may permit zero or more groups, with the union expressed in
> @@ -10042,6 +10083,7 @@ int scx_kfunc_context_filter(const struct bpf_prog *prog, u32 kfunc_id)
> bool in_cpu_release = btf_id_set8_contains(&scx_kfunc_ids_cpu_release, kfunc_id);
> bool in_idle = btf_id_set8_contains(&scx_kfunc_ids_idle, kfunc_id);
> bool in_any = btf_id_set8_contains(&scx_kfunc_ids_any, kfunc_id);
> + bool in_cpu_only = btf_id_set8_contains(&scx_kfunc_ids_cpu_only, kfunc_id);
> u32 moff, flags;
>
> /* Not an SCX kfunc - allow. */
> @@ -10079,6 +10121,15 @@ int scx_kfunc_context_filter(const struct bpf_prog *prog, u32 kfunc_id)
> prog->aux->st_ops != &bpf_sched_ext_ops_cid)
> return -EACCES;
>
> + /*
> + * cid-form schedulers must use cid/cmask kfuncs. cid and cpu are both
> + * small s32s and trivially confused, so cpu-only kfuncs are rejected at
> + * load time. The reverse (cpu-form calling cid-form kfuncs) is
> + * intentionally permissive to ease gradual cpumask -> cid migration.
> + */
> + if (prog->aux->st_ops == &bpf_sched_ext_ops_cid && in_cpu_only)
> + return -EACCES;
> +
> /* SCX struct_ops: check the per-op allow list. */
> if (in_any || in_idle)
> return 0;
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCHSET v2 REPOST sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops
@ 2026-04-24 17:27 Tejun Heo
2026-04-24 17:27 ` [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers Tejun Heo
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2026-04-24 17:27 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min
Cc: sched-ext, emil, linux-kernel, Cheng-Yang Chou, Zhao Mengmeng,
Tejun Heo
Hello,
Reposting v2 because the original send was not properly threaded -
each patch went out as a standalone top-level message. Content is
unchanged from the original v2.
Original v2: https://lore.kernel.org/r/20260424013220.2923402-1-tj@kernel.org
v2 of https://lore.kernel.org/r/20260421071945.3110084-1-tj@kernel.org
v2:
- Add ext-types.h first patch for early subsystem-wide type defs.
- cid: publish the cid tables with WRITE_ONCE / read with READ_ONCE;
document the visibility contract.
- cid-kfuncs: NULL-guard scx_bpf_this_cid / scx_bpf_task_cid for
TRACING/SYSCALL callers before any SCX sched has enabled.
- cid-struct-ops: use struct_size() for the set_cmask_scratch percpu
alloc; cluster __scx_is_cid_type disable with __scx_enabled disable
in scx_root_disable().
- cid-kfunc-filter: sync per-entry kfunc flags with each kfunc's
primary BTF_ID_FLAGS() declaration (Zhao). pahole intersects flags
across occurrences; omitting them drops the flags globally - the
visible symptom was KF_IMPLICIT_ARGS getting cleared on
scx_bpf_kick_cpu, leaking bpf_prog_aux into vmlinux.h.
- cmask: narrow to the helpers this series actually uses;
cmask_copy_from_kernel contract and runtime guard.
This patchset introduces topological CPU IDs (cids) - dense,
topology-ordered cpu identifiers - and an alternative cid-form struct_ops
type that lets BPF schedulers operate in cid space directly.
Key pieces:
- cid space: scx_cid_init() walks nodes * LLCs * cores * threads and packs
a dense cid mapping. The mapping can be overridden via
scx_bpf_cid_override(). See "Topological CPU IDs" in ext_cid.h for the
model.
- cmask: a base-windowed bitmap over cid space. Kernel and BPF helpers with
identical semantics. Used by scx_qmap for per-task affinity and idle-cid
tracking; meant to be the substrate for sub-sched cid allocation.
- bpf_sched_ext_ops_cid: a parallel struct_ops type whose callbacks take
cids/cmasks instead of cpus/cpumasks. Kernel translates at the boundary
via scx_cpu_arg() / scx_cpu_ret(); the two struct types share offsets up
through @priv (verified by BUILD_BUG_ON) so the union view in scx_sched
works without function-pointer casts. Sub-sched support is tied to
cid-form: validate_ops() rejects cpu-form sub-scheds and cpu-form roots
that expose sub_attach / sub_detach.
- cid-form kfuncs: scx_bpf_kick_cid, scx_bpf_cidperf_{cap,cur,set},
scx_bpf_cid_curr, scx_bpf_task_cid, scx_bpf_this_cid,
scx_bpf_nr_{cids,online_cids}, scx_bpf_cid_to_cpu, scx_bpf_cpu_to_cid.
A cid-form program may not call cpu-only kfuncs (enforced at verifier
load via scx_kfunc_context_filter); the reverse is intentionally
permissive to ease migration.
- scx_qmap port: scx_qmap is converted to cid-form. It uses the cmask-based
idle picker, per-task cid-space cpus_allowed, and cid-form kfuncs
throughout. Sub-sched dispatching via scx_bpf_sub_dispatch() continues to
work.
v2 re-tested on the 16-cpu QEMU: cid-form scx_qmap, cpu-form scx_simple,
cid<->cpu cycling, scx_qmap under stress-ng, hotplug auto-restart, and
sub-sched (root scx_qmap + cgroup-scoped scx_qmap child). Clean.
Based on sched_ext/for-7.2 (c2929bc21dce).
0001-sched_ext-Add-ext_types.h-for-early-subsystem-wide-d.patch
0002-sched_ext-Rename-ops_cpu_valid-to-scx_cpu_valid-and-.patch
0003-sched_ext-Move-scx_exit-scx_error-and-friends-to-ext.patch
0004-sched_ext-Shift-scx_kick_cpu-validity-check-to-scx_b.patch
0005-sched_ext-Relocate-cpu_acquire-cpu_release-to-end-of.patch
0006-sched_ext-Make-scx_enable-take-scx_enable_cmd.patch
0007-sched_ext-Add-topological-CPU-IDs-cids.patch
0008-sched_ext-Add-scx_bpf_cid_override-kfunc.patch
0009-tools-sched_ext-Add-struct_size-helpers-to-common.bp.patch
0010-sched_ext-Add-cmask-a-base-windowed-bitmap-over-cid-.patch
0011-sched_ext-Add-cid-form-kfunc-wrappers-alongside-cpu-.patch
0012-sched_ext-Add-bpf_sched_ext_ops_cid-struct_ops-type.patch
0013-sched_ext-Forbid-cpu-form-kfuncs-from-cid-form-sched.patch
0014-tools-sched_ext-scx_qmap-Restart-on-hotplug-instead-.patch
0015-tools-sched_ext-scx_qmap-Add-cmask-based-idle-tracki.patch
0016-tools-sched_ext-scx_qmap-Port-to-cid-form-struct_ops.patch
0017-sched_ext-Require-cid-form-struct_ops-for-sub-sched-.patch
Git tree: git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-cid-v2
kernel/sched/build_policy.c | 2 +
kernel/sched/ext.c | 650 +++++++++++++++++++++++++----
kernel/sched/ext_cid.c | 417 ++++++++++++++++++++
kernel/sched/ext_cid.h | 164 ++++++++
kernel/sched/ext_idle.c | 8 +-
kernel/sched/ext_internal.h | 203 +++++++---
kernel/sched/ext_types.h | 104 +++++
tools/sched_ext/include/scx/cid.bpf.h | 597 ++++++++++++++++++++++++++++
tools/sched_ext/include/scx/common.bpf.h | 23 ++
tools/sched_ext/include/scx/compat.bpf.h | 24 ++
tools/sched_ext/scx_qmap.bpf.c | 306 ++++++++-------
tools/sched_ext/scx_qmap.c | 25 +-
tools/sched_ext/scx_qmap.h | 2 +-
13 files changed, 2240 insertions(+), 285 deletions(-)
--
tejun
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers
2026-04-24 17:27 [PATCHSET v2 REPOST sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops Tejun Heo
@ 2026-04-24 17:27 ` Tejun Heo
2026-04-27 6:03 ` Zhao Mengmeng
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2026-04-24 17:27 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min
Cc: sched-ext, emil, linux-kernel, Cheng-Yang Chou, Zhao Mengmeng,
Tejun Heo
cid and cpu are both small s32s, trivially confused when a cid-form
scheduler calls a cpu-keyed kfunc. Reject cid-form programs that
reference any kfunc in the new scx_kfunc_ids_cpu_only at verifier load
time.
The reverse direction is intentionally permissive: cpu-form schedulers
can freely call cid-form kfuncs to ease a gradual cpumask -> cid
migration.
The check sits in scx_kfunc_context_filter() right after the SCX
struct_ops gate and before the any/idle allow and per-op allow-list
checks, so it catches cpu-only kfuncs regardless of which set they
belong to (any, idle, or select_cpu).
v2: Sync per-entry kfunc flags with their primary declarations (Zhao).
pahole intersects flags across BTF_ID_FLAGS() occurrences, so
omitting them drops the flags globally.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Cc: Zhao Mengmeng <zhaomzhao@126.com>
---
kernel/sched/ext.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index abc0c798150d..37f37f31b025 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -9979,6 +9979,47 @@ static const struct btf_kfunc_id_set scx_kfunc_set_any = {
.filter = scx_kfunc_context_filter,
};
+/*
+ * cpu-form kfuncs that are forbidden from cid-form schedulers
+ * (bpf_sched_ext_ops_cid). Programs targeting the cid struct_ops type must
+ * use the cid-form alternative (cid/cmask kfuncs).
+ *
+ * Membership overlaps with scx_kfunc_ids_{any,idle,select_cpu}; the filter
+ * tests this set independently and rejects matches before the per-op
+ * allow-list check runs.
+ *
+ * pahole/resolve_btfids scans every BTF_ID_FLAGS() at build time and
+ * intersects flags across duplicate entries, so each entry must carry the
+ * same flags as the kfunc's primary declaration; otherwise the flags get
+ * dropped globally.
+ */
+BTF_KFUNCS_START(scx_kfunc_ids_cpu_only)
+BTF_ID_FLAGS(func, scx_bpf_kick_cpu, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_task_cpu, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_cpu_rq, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_IMPLICIT_ARGS | KF_RET_NULL | KF_RCU_PROTECTED)
+BTF_ID_FLAGS(func, scx_bpf_cpu_node, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpuperf_cap, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpuperf_cur, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpuperf_set, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_get_possible_cpumask, KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_online_cpumask, KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_put_cpumask, KF_RELEASE)
+BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, __scx_bpf_select_cpu_and, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_cpumask, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_cpumask_node, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_smtmask, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_smtmask_node, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_put_idle_cpumask, KF_RELEASE)
+BTF_ID_FLAGS(func, scx_bpf_test_and_clear_cpu_idle, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu_node, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu_node, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_KFUNCS_END(scx_kfunc_ids_cpu_only)
+
/*
* Per-op kfunc allow flags. Each bit corresponds to a context-sensitive kfunc
* group; an op may permit zero or more groups, with the union expressed in
@@ -10042,6 +10083,7 @@ int scx_kfunc_context_filter(const struct bpf_prog *prog, u32 kfunc_id)
bool in_cpu_release = btf_id_set8_contains(&scx_kfunc_ids_cpu_release, kfunc_id);
bool in_idle = btf_id_set8_contains(&scx_kfunc_ids_idle, kfunc_id);
bool in_any = btf_id_set8_contains(&scx_kfunc_ids_any, kfunc_id);
+ bool in_cpu_only = btf_id_set8_contains(&scx_kfunc_ids_cpu_only, kfunc_id);
u32 moff, flags;
/* Not an SCX kfunc - allow. */
@@ -10079,6 +10121,15 @@ int scx_kfunc_context_filter(const struct bpf_prog *prog, u32 kfunc_id)
prog->aux->st_ops != &bpf_sched_ext_ops_cid)
return -EACCES;
+ /*
+ * cid-form schedulers must use cid/cmask kfuncs. cid and cpu are both
+ * small s32s and trivially confused, so cpu-only kfuncs are rejected at
+ * load time. The reverse (cpu-form calling cid-form kfuncs) is
+ * intentionally permissive to ease gradual cpumask -> cid migration.
+ */
+ if (prog->aux->st_ops == &bpf_sched_ext_ops_cid && in_cpu_only)
+ return -EACCES;
+
/* SCX struct_ops: check the per-op allow list. */
if (in_any || in_idle)
return 0;
--
2.53.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers
2026-04-24 17:27 ` [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers Tejun Heo
@ 2026-04-27 6:03 ` Zhao Mengmeng
0 siblings, 0 replies; 6+ messages in thread
From: Zhao Mengmeng @ 2026-04-27 6:03 UTC (permalink / raw)
To: Tejun Heo, David Vernet, Andrea Righi, Changwoo Min
Cc: sched-ext, emil, linux-kernel, Cheng-Yang Chou
On 4/25/26 01:27, Tejun Heo wrote:
> cid and cpu are both small s32s, trivially confused when a cid-form
> scheduler calls a cpu-keyed kfunc. Reject cid-form programs that
> reference any kfunc in the new scx_kfunc_ids_cpu_only at verifier load
> time.
>
> The reverse direction is intentionally permissive: cpu-form schedulers
> can freely call cid-form kfuncs to ease a gradual cpumask -> cid
> migration.
>
> The check sits in scx_kfunc_context_filter() right after the SCX
> struct_ops gate and before the any/idle allow and per-op allow-list
> checks, so it catches cpu-only kfuncs regardless of which set they
> belong to (any, idle, or select_cpu).
>
> v2: Sync per-entry kfunc flags with their primary declarations (Zhao).
> pahole intersects flags across BTF_ID_FLAGS() occurrences, so
> omitting them drops the flags globally.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reviewed-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
> Cc: Zhao Mengmeng <zhaomzhao@126.com>
> ---
> kernel/sched/ext.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 51 insertions(+)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index abc0c798150d..37f37f31b025 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -9979,6 +9979,47 @@ static const struct btf_kfunc_id_set scx_kfunc_set_any = {
> .filter = scx_kfunc_context_filter,
> };
>
> +/*
> + * cpu-form kfuncs that are forbidden from cid-form schedulers
> + * (bpf_sched_ext_ops_cid). Programs targeting the cid struct_ops type must
> + * use the cid-form alternative (cid/cmask kfuncs).
> + *
> + * Membership overlaps with scx_kfunc_ids_{any,idle,select_cpu}; the filter
> + * tests this set independently and rejects matches before the per-op
> + * allow-list check runs.
> + *
> + * pahole/resolve_btfids scans every BTF_ID_FLAGS() at build time and
> + * intersects flags across duplicate entries, so each entry must carry the
> + * same flags as the kfunc's primary declaration; otherwise the flags get
> + * dropped globally.
> + */
> +BTF_KFUNCS_START(scx_kfunc_ids_cpu_only)
> +BTF_ID_FLAGS(func, scx_bpf_kick_cpu, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_task_cpu, KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_cpu_rq, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_IMPLICIT_ARGS | KF_RET_NULL | KF_RCU_PROTECTED)
> +BTF_ID_FLAGS(func, scx_bpf_cpu_node, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_cpuperf_cap, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_cpuperf_cur, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_cpuperf_set, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_get_possible_cpumask, KF_ACQUIRE)
> +BTF_ID_FLAGS(func, scx_bpf_get_online_cpumask, KF_ACQUIRE)
> +BTF_ID_FLAGS(func, scx_bpf_put_cpumask, KF_RELEASE)
> +BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_ID_FLAGS(func, __scx_bpf_select_cpu_and, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_get_idle_cpumask, KF_IMPLICIT_ARGS | KF_ACQUIRE)
> +BTF_ID_FLAGS(func, scx_bpf_get_idle_cpumask_node, KF_IMPLICIT_ARGS | KF_ACQUIRE)
> +BTF_ID_FLAGS(func, scx_bpf_get_idle_smtmask, KF_IMPLICIT_ARGS | KF_ACQUIRE)
> +BTF_ID_FLAGS(func, scx_bpf_get_idle_smtmask_node, KF_IMPLICIT_ARGS | KF_ACQUIRE)
> +BTF_ID_FLAGS(func, scx_bpf_put_idle_cpumask, KF_RELEASE)
> +BTF_ID_FLAGS(func, scx_bpf_test_and_clear_cpu_idle, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu_node, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu_node, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_KFUNCS_END(scx_kfunc_ids_cpu_only)
> +
Hi Tejun,
Build it and check the vmlinux.h, no signature changes, selftests build well. So looks good to me.
Reviewed-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
> /*
> * Per-op kfunc allow flags. Each bit corresponds to a context-sensitive kfunc
> * group; an op may permit zero or more groups, with the union expressed in
> @@ -10042,6 +10083,7 @@ int scx_kfunc_context_filter(const struct bpf_prog *prog, u32 kfunc_id)
> bool in_cpu_release = btf_id_set8_contains(&scx_kfunc_ids_cpu_release, kfunc_id);
> bool in_idle = btf_id_set8_contains(&scx_kfunc_ids_idle, kfunc_id);
> bool in_any = btf_id_set8_contains(&scx_kfunc_ids_any, kfunc_id);
> + bool in_cpu_only = btf_id_set8_contains(&scx_kfunc_ids_cpu_only, kfunc_id);
> u32 moff, flags;
>
> /* Not an SCX kfunc - allow. */
> @@ -10079,6 +10121,15 @@ int scx_kfunc_context_filter(const struct bpf_prog *prog, u32 kfunc_id)
> prog->aux->st_ops != &bpf_sched_ext_ops_cid)
> return -EACCES;
>
> + /*
> + * cid-form schedulers must use cid/cmask kfuncs. cid and cpu are both
> + * small s32s and trivially confused, so cpu-only kfuncs are rejected at
> + * load time. The reverse (cpu-form calling cid-form kfuncs) is
> + * intentionally permissive to ease gradual cpumask -> cid migration.
> + */
> + if (prog->aux->st_ops == &bpf_sched_ext_ops_cid && in_cpu_only)
> + return -EACCES;
> +
> /* SCX struct_ops: check the per-op allow list. */
> if (in_any || in_idle)
> return 0;
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCHSET v3 sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops
@ 2026-04-28 20:35 Tejun Heo
2026-04-28 20:35 ` [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers Tejun Heo
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2026-04-28 20:35 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min
Cc: sched-ext, Emil Tsalapatis, linux-kernel, Tejun Heo
Hello,
v3 (all from the Sashiko AI review at
https://sashiko.dev/#/patchset/20260424172721.3458520-1-tj%40kernel.org):
- cid: drop leaked cpus_read_lock() on scx_cid_init() failure;
BUILD_BUG_ON tightened to NR_CPUS<=8192 to match the BPF cmask
helpers' CMASK_MAX_WORDS coverage.
- bpf-struct-size: use offsetof() in struct_size() to match the
kernel <linux/overflow.h> macro semantics (no inflation from
trailing struct padding).
- cmask: cmask_copy_from_kernel() validates src->base==0 via
probe-read; nr_bits check is bit-level rather than rounded-up
word-count.
- cid-qmap-idle: qmap_init() refuses to load when scx_bpf_nr_cids()
exceeds SCX_QMAP_MAX_CPUS; the task_ctx flex array would otherwise
overflow into the next slab entry.
v2: https://lore.kernel.org/r/20260424172721.3458520-1-tj@kernel.org
v1: https://lore.kernel.org/r/20260421071945.3110084-1-tj@kernel.org
This patchset introduces topological CPU IDs (cids) - dense,
topology-ordered cpu identifiers - and an alternative cid-form struct_ops
type that lets BPF schedulers operate in cid space directly.
Key pieces:
- cid space: scx_cid_init() walks nodes * LLCs * cores * threads and packs
a dense cid mapping. The mapping can be overridden via
scx_bpf_cid_override(). See "Topological CPU IDs" in ext_cid.h for the
model.
- cmask: a base-windowed bitmap over cid space. Kernel and BPF helpers with
identical semantics. Used by scx_qmap for per-task affinity and idle-cid
tracking; meant to be the substrate for sub-sched cid allocation.
- bpf_sched_ext_ops_cid: a parallel struct_ops type whose callbacks take
cids/cmasks instead of cpus/cpumasks. Kernel translates at the boundary
via scx_cpu_arg() / scx_cpu_ret(); the two struct types share offsets up
through @priv (verified by BUILD_BUG_ON) so the union view in scx_sched
works without function-pointer casts. Sub-sched support is tied to
cid-form: validate_ops() rejects cpu-form sub-scheds and cpu-form roots
that expose sub_attach / sub_detach.
- cid-form kfuncs: scx_bpf_kick_cid, scx_bpf_cidperf_{cap,cur,set},
scx_bpf_cid_curr, scx_bpf_task_cid, scx_bpf_this_cid,
scx_bpf_nr_{cids,online_cids}, scx_bpf_cid_to_cpu, scx_bpf_cpu_to_cid.
A cid-form program may not call cpu-only kfuncs (enforced at verifier
load via scx_kfunc_context_filter); the reverse is intentionally
permissive to ease migration.
- scx_qmap port: scx_qmap is converted to cid-form. It uses the cmask-based
idle picker, per-task cid-space cpus_allowed, and cid-form kfuncs
throughout. Sub-sched dispatching via scx_bpf_sub_dispatch() continues to
work.
v3 re-tested on the 16-cpu QEMU: cid-form scx_qmap under stress-ng plus
reload cycles, hotplug auto-restart, and sub-sched (root scx_qmap +
cgroup-scoped scx_qmap child). Clean.
Based on sched_ext/for-7.2 (4939721aad2e).
0001-sched_ext-Add-ext_types.h-for-early-subsystem-wide-d.patch
0002-sched_ext-Rename-ops_cpu_valid-to-scx_cpu_valid-and-.patch
0003-sched_ext-Move-scx_exit-scx_error-and-friends-to-ext.patch
0004-sched_ext-Shift-scx_kick_cpu-validity-check-to-scx_b.patch
0005-sched_ext-Relocate-cpu_acquire-cpu_release-to-end-of.patch
0006-sched_ext-Make-scx_enable-take-scx_enable_cmd.patch
0007-sched_ext-Add-topological-CPU-IDs-cids.patch
0008-sched_ext-Add-scx_bpf_cid_override-kfunc.patch
0009-tools-sched_ext-Add-struct_size-helpers-to-common.bp.patch
0010-sched_ext-Add-cmask-a-base-windowed-bitmap-over-cid-.patch
0011-sched_ext-Add-cid-form-kfunc-wrappers-alongside-cpu-.patch
0012-sched_ext-Add-bpf_sched_ext_ops_cid-struct_ops-type.patch
0013-sched_ext-Forbid-cpu-form-kfuncs-from-cid-form-sched.patch
0014-tools-sched_ext-scx_qmap-Restart-on-hotplug-instead-.patch
0015-tools-sched_ext-scx_qmap-Add-cmask-based-idle-tracki.patch
0016-tools-sched_ext-scx_qmap-Port-to-cid-form-struct_ops.patch
0017-sched_ext-Require-cid-form-struct_ops-for-sub-sched-.patch
Git tree: git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-cid-v3
kernel/sched/build_policy.c | 3 +
kernel/sched/ext.c | 651 ++++++++++++++++++++++++++----
kernel/sched/ext_cid.c | 409 +++++++++++++++++++
kernel/sched/ext_cid.h | 164 ++++++++
kernel/sched/ext_idle.c | 8 +-
kernel/sched/ext_internal.h | 205 +++++++---
kernel/sched/ext_types.h | 104 +++++
tools/sched_ext/include/scx/cid.bpf.h | 667 +++++++++++++++++++++++++++++++
tools/sched_ext/include/scx/common.bpf.h | 23 ++
tools/sched_ext/include/scx/compat.bpf.h | 24 ++
tools/sched_ext/scx_qmap.bpf.c | 346 +++++++++-------
tools/sched_ext/scx_qmap.c | 70 +++-
tools/sched_ext/scx_qmap.h | 2 +-
13 files changed, 2391 insertions(+), 285 deletions(-)
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers
2026-04-28 20:35 [PATCHSET v3 sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops Tejun Heo
@ 2026-04-28 20:35 ` Tejun Heo
0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2026-04-28 20:35 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min
Cc: sched-ext, Emil Tsalapatis, linux-kernel, Tejun Heo,
Cheng-Yang Chou, Zhao Mengmeng
cid and cpu are both small s32s, trivially confused when a cid-form
scheduler calls a cpu-keyed kfunc. Reject cid-form programs that
reference any kfunc in the new scx_kfunc_ids_cpu_only at verifier load
time.
The reverse direction is intentionally permissive: cpu-form schedulers
can freely call cid-form kfuncs to ease a gradual cpumask -> cid
migration.
The check sits in scx_kfunc_context_filter() right after the SCX
struct_ops gate and before the any/idle allow and per-op allow-list
checks, so it catches cpu-only kfuncs regardless of which set they
belong to (any, idle, or select_cpu).
v2: Sync per-entry kfunc flags with their primary declarations (Zhao).
pahole intersects flags across BTF_ID_FLAGS() occurrences, so
omitting them drops the flags globally.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Reviewed-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
---
kernel/sched/ext.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 8e3c60affc0b..d8f8fca5ded9 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -10095,6 +10095,47 @@ static const struct btf_kfunc_id_set scx_kfunc_set_any = {
.filter = scx_kfunc_context_filter,
};
+/*
+ * cpu-form kfuncs that are forbidden from cid-form schedulers
+ * (bpf_sched_ext_ops_cid). Programs targeting the cid struct_ops type must
+ * use the cid-form alternative (cid/cmask kfuncs).
+ *
+ * Membership overlaps with scx_kfunc_ids_{any,idle,select_cpu}; the filter
+ * tests this set independently and rejects matches before the per-op
+ * allow-list check runs.
+ *
+ * pahole/resolve_btfids scans every BTF_ID_FLAGS() at build time and
+ * intersects flags across duplicate entries, so each entry must carry the
+ * same flags as the kfunc's primary declaration; otherwise the flags get
+ * dropped globally.
+ */
+BTF_KFUNCS_START(scx_kfunc_ids_cpu_only)
+BTF_ID_FLAGS(func, scx_bpf_kick_cpu, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_task_cpu, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_cpu_rq, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_IMPLICIT_ARGS | KF_RET_NULL | KF_RCU_PROTECTED)
+BTF_ID_FLAGS(func, scx_bpf_cpu_node, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpuperf_cap, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpuperf_cur, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpuperf_set, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_get_possible_cpumask, KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_online_cpumask, KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_put_cpumask, KF_RELEASE)
+BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, __scx_bpf_select_cpu_and, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_cpumask, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_cpumask_node, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_smtmask, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_smtmask_node, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_put_idle_cpumask, KF_RELEASE)
+BTF_ID_FLAGS(func, scx_bpf_test_and_clear_cpu_idle, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu_node, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu_node, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_KFUNCS_END(scx_kfunc_ids_cpu_only)
+
/*
* Per-op kfunc allow flags. Each bit corresponds to a context-sensitive kfunc
* group; an op may permit zero or more groups, with the union expressed in
@@ -10158,6 +10199,7 @@ int scx_kfunc_context_filter(const struct bpf_prog *prog, u32 kfunc_id)
bool in_cpu_release = btf_id_set8_contains(&scx_kfunc_ids_cpu_release, kfunc_id);
bool in_idle = btf_id_set8_contains(&scx_kfunc_ids_idle, kfunc_id);
bool in_any = btf_id_set8_contains(&scx_kfunc_ids_any, kfunc_id);
+ bool in_cpu_only = btf_id_set8_contains(&scx_kfunc_ids_cpu_only, kfunc_id);
u32 moff, flags;
/* Not an SCX kfunc - allow. */
@@ -10195,6 +10237,15 @@ int scx_kfunc_context_filter(const struct bpf_prog *prog, u32 kfunc_id)
prog->aux->st_ops != &bpf_sched_ext_ops_cid)
return -EACCES;
+ /*
+ * cid-form schedulers must use cid/cmask kfuncs. cid and cpu are both
+ * small s32s and trivially confused, so cpu-only kfuncs are rejected at
+ * load time. The reverse (cpu-form calling cid-form kfuncs) is
+ * intentionally permissive to ease gradual cpumask -> cid migration.
+ */
+ if (prog->aux->st_ops == &bpf_sched_ext_ops_cid && in_cpu_only)
+ return -EACCES;
+
/* SCX struct_ops: check the per-op allow list. */
if (in_any || in_idle)
return 0;
--
2.54.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCHSET v4 sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops
@ 2026-04-29 18:21 Tejun Heo
2026-04-29 18:21 ` [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers Tejun Heo
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2026-04-29 18:21 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min
Cc: Emil Tsalapatis, sched-ext, linux-kernel, Tejun Heo
Hello,
v4:
- cmask: bump CMASK_CAS_TRIES to (1U << 23) so abort fires only after
seconds of real spinning, not on plausible contention. The kfunc
slow-path Changwoo suggested would let BPF loops keep banging a
contended cacheline indefinitely - on multi-socket SPRs that path
can stall the machine into hard lockups, so failing hard is the
right behavior. A follow-up patch will add a kfunc to bail the BPF
CAS loops immediately when sch->aborting is set. Switch
__builtin_ctzll() to the ctzll() wrapper for clang compat.
- cid-qmap-port: cid-shard handling was wired against a future kfunc
signature that didn't make it into v3, leaving the snapshot broken.
Drop the shard test plumbing for v4, match the 2-arg
scx_bpf_cid_override(), bound nr_cpu_ids for the verifier, and
rename mode 3 from bad-mono to bad-range. (Changwoo, Andrea)
- Rebased over the exit_cpu plumbing in for-7.2:
scx-error-header: scx_exit() and scx_verror() are macros now;
move both plus the underlying __scx_exit() / scx_vexit()
declarations to ext_internal.h.
cid-struct-ops: dump_cpu callsite shifted into scx_dump_cpu()
helper; the scx_cpu_arg() wrap moved with it.
v3: https://lore.kernel.org/r/20260428203545.181052-1-tj@kernel.org
v2: https://lore.kernel.org/r/20260424172721.3458520-1-tj@kernel.org
v1: https://lore.kernel.org/r/20260421071945.3110084-1-tj@kernel.org
This patchset introduces topological CPU IDs (cids) - dense,
topology-ordered cpu identifiers - and an alternative cid-form struct_ops
type that lets BPF schedulers operate in cid space directly.
Key pieces:
- cid space: scx_cid_init() walks nodes * LLCs * cores * threads and packs
a dense cid mapping. The mapping can be overridden via
scx_bpf_cid_override(). See "Topological CPU IDs" in ext_cid.h for the
model.
- cmask: a base-windowed bitmap over cid space. Kernel and BPF helpers with
identical semantics. Used by scx_qmap for per-task affinity and idle-cid
tracking; meant to be the substrate for sub-sched cid allocation.
- bpf_sched_ext_ops_cid: a parallel struct_ops type whose callbacks take
cids/cmasks instead of cpus/cpumasks. Kernel translates at the boundary
via scx_cpu_arg() / scx_cpu_ret(); the two struct types share offsets up
through @priv (verified by BUILD_BUG_ON) so the union view in scx_sched
works without function-pointer casts. Sub-sched support is tied to
cid-form: validate_ops() rejects cpu-form sub-scheds and cpu-form roots
that expose sub_attach / sub_detach.
- cid-form kfuncs: scx_bpf_kick_cid, scx_bpf_cidperf_{cap,cur,set},
scx_bpf_cid_curr, scx_bpf_task_cid, scx_bpf_this_cid,
scx_bpf_nr_{cids,online_cids}, scx_bpf_cid_to_cpu, scx_bpf_cpu_to_cid.
A cid-form program may not call cpu-only kfuncs (enforced at verifier
load via scx_kfunc_context_filter); the reverse is intentionally
permissive to ease migration.
- scx_qmap port: scx_qmap is converted to cid-form. It uses the cmask-based
idle picker, per-task cid-space cpus_allowed, and cid-form kfuncs
throughout. Sub-sched dispatching via scx_bpf_sub_dispatch() continues to
work.
v4 re-tested on the 16-cpu QEMU VM with the v3 cut (only the 17 cid
patches applied): basic load + stress, cid-override modes
(shuffle/bad-dup/bad-range), and three enable/disable cycles all clean.
No BUG/WARNING/panic in the dump.
Based on sched_ext/for-7.2 (ee8391ba1164).
0001-sched_ext-Add-ext_types.h-for-early-subsystem-wide-d.patch
0002-sched_ext-Rename-ops_cpu_valid-to-scx_cpu_valid-and-.patch
0003-sched_ext-Move-scx_exit-scx_error-and-friends-to-ext.patch
0004-sched_ext-Shift-scx_kick_cpu-validity-check-to-scx_b.patch
0005-sched_ext-Relocate-cpu_acquire-cpu_release-to-end-of.patch
0006-sched_ext-Make-scx_enable-take-scx_enable_cmd.patch
0007-sched_ext-Add-topological-CPU-IDs-cids.patch
0008-sched_ext-Add-scx_bpf_cid_override-kfunc.patch
0009-tools-sched_ext-Add-struct_size-helpers-to-common.bp.patch
0010-sched_ext-Add-cmask-a-base-windowed-bitmap-over-cid-.patch
0011-sched_ext-Add-cid-form-kfunc-wrappers-alongside-cpu-.patch
0012-sched_ext-Add-bpf_sched_ext_ops_cid-struct_ops-type.patch
0013-sched_ext-Forbid-cpu-form-kfuncs-from-cid-form-sched.patch
0014-tools-sched_ext-scx_qmap-Restart-on-hotplug-instead-.patch
0015-tools-sched_ext-scx_qmap-Add-cmask-based-idle-tracki.patch
0016-tools-sched_ext-scx_qmap-Port-to-cid-form-struct_ops.patch
0017-sched_ext-Require-cid-form-struct_ops-for-sub-sched-.patch
Git tree: git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-cid-v4
kernel/sched/build_policy.c | 3 +
kernel/sched/ext.c | 660 ++++++++++++++++++++++++++----
kernel/sched/ext_cid.c | 409 +++++++++++++++++++
kernel/sched/ext_cid.h | 164 ++++++++
kernel/sched/ext_idle.c | 8 +-
kernel/sched/ext_internal.h | 209 +++++++---
kernel/sched/ext_types.h | 104 +++++
tools/sched_ext/include/scx/cid.bpf.h | 666 +++++++++++++++++++++++++++++++
tools/sched_ext/include/scx/common.bpf.h | 23 ++
tools/sched_ext/include/scx/compat.bpf.h | 24 ++
tools/sched_ext/scx_qmap.bpf.c | 350 +++++++++-------
tools/sched_ext/scx_qmap.c | 57 ++-
tools/sched_ext/scx_qmap.h | 2 +-
13 files changed, 2387 insertions(+), 292 deletions(-)
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers
2026-04-29 18:21 [PATCHSET v4 sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops Tejun Heo
@ 2026-04-29 18:21 ` Tejun Heo
0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2026-04-29 18:21 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min
Cc: Emil Tsalapatis, sched-ext, linux-kernel, Tejun Heo,
Cheng-Yang Chou, Zhao Mengmeng
cid and cpu are both small s32s, trivially confused when a cid-form
scheduler calls a cpu-keyed kfunc. Reject cid-form programs that
reference any kfunc in the new scx_kfunc_ids_cpu_only at verifier load
time.
The reverse direction is intentionally permissive: cpu-form schedulers
can freely call cid-form kfuncs to ease a gradual cpumask -> cid
migration.
The check sits in scx_kfunc_context_filter() right after the SCX
struct_ops gate and before the any/idle allow and per-op allow-list
checks, so it catches cpu-only kfuncs regardless of which set they
belong to (any, idle, or select_cpu).
v2: Sync per-entry kfunc flags with their primary declarations (Zhao).
pahole intersects flags across BTF_ID_FLAGS() occurrences, so
omitting them drops the flags globally.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Reviewed-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Reviewed-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
---
kernel/sched/ext.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 79565fabd9b4..65a00c979691 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -10121,6 +10121,47 @@ static const struct btf_kfunc_id_set scx_kfunc_set_any = {
.filter = scx_kfunc_context_filter,
};
+/*
+ * cpu-form kfuncs that are forbidden from cid-form schedulers
+ * (bpf_sched_ext_ops_cid). Programs targeting the cid struct_ops type must
+ * use the cid-form alternative (cid/cmask kfuncs).
+ *
+ * Membership overlaps with scx_kfunc_ids_{any,idle,select_cpu}; the filter
+ * tests this set independently and rejects matches before the per-op
+ * allow-list check runs.
+ *
+ * pahole/resolve_btfids scans every BTF_ID_FLAGS() at build time and
+ * intersects flags across duplicate entries, so each entry must carry the
+ * same flags as the kfunc's primary declaration; otherwise the flags get
+ * dropped globally.
+ */
+BTF_KFUNCS_START(scx_kfunc_ids_cpu_only)
+BTF_ID_FLAGS(func, scx_bpf_kick_cpu, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_task_cpu, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_cpu_rq, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_IMPLICIT_ARGS | KF_RET_NULL | KF_RCU_PROTECTED)
+BTF_ID_FLAGS(func, scx_bpf_cpu_node, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpuperf_cap, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpuperf_cur, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_cpuperf_set, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_get_possible_cpumask, KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_online_cpumask, KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_put_cpumask, KF_RELEASE)
+BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, __scx_bpf_select_cpu_and, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_cpumask, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_cpumask_node, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_smtmask, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_smtmask_node, KF_IMPLICIT_ARGS | KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_put_idle_cpumask, KF_RELEASE)
+BTF_ID_FLAGS(func, scx_bpf_test_and_clear_cpu_idle, KF_IMPLICIT_ARGS)
+BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu_node, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu_node, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_KFUNCS_END(scx_kfunc_ids_cpu_only)
+
/*
* Per-op kfunc allow flags. Each bit corresponds to a context-sensitive kfunc
* group; an op may permit zero or more groups, with the union expressed in
@@ -10184,6 +10225,7 @@ int scx_kfunc_context_filter(const struct bpf_prog *prog, u32 kfunc_id)
bool in_cpu_release = btf_id_set8_contains(&scx_kfunc_ids_cpu_release, kfunc_id);
bool in_idle = btf_id_set8_contains(&scx_kfunc_ids_idle, kfunc_id);
bool in_any = btf_id_set8_contains(&scx_kfunc_ids_any, kfunc_id);
+ bool in_cpu_only = btf_id_set8_contains(&scx_kfunc_ids_cpu_only, kfunc_id);
u32 moff, flags;
/* Not an SCX kfunc - allow. */
@@ -10221,6 +10263,15 @@ int scx_kfunc_context_filter(const struct bpf_prog *prog, u32 kfunc_id)
prog->aux->st_ops != &bpf_sched_ext_ops_cid)
return -EACCES;
+ /*
+ * cid-form schedulers must use cid/cmask kfuncs. cid and cpu are both
+ * small s32s and trivially confused, so cpu-only kfuncs are rejected at
+ * load time. The reverse (cpu-form calling cid-form kfuncs) is
+ * intentionally permissive to ease gradual cpumask -> cid migration.
+ */
+ if (prog->aux->st_ops == &bpf_sched_ext_ops_cid && in_cpu_only)
+ return -EACCES;
+
/* SCX struct_ops: check the per-op allow list. */
if (in_any || in_idle)
return 0;
--
2.54.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-04-29 18:21 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-24 1:32 [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers Tejun Heo
2026-04-24 2:15 ` Zhao Mengmeng
-- strict thread matches above, loose matches on Subject: below --
2026-04-24 17:27 [PATCHSET v2 REPOST sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops Tejun Heo
2026-04-24 17:27 ` [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers Tejun Heo
2026-04-27 6:03 ` Zhao Mengmeng
2026-04-28 20:35 [PATCHSET v3 sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops Tejun Heo
2026-04-28 20:35 ` [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers Tejun Heo
2026-04-29 18:21 [PATCHSET v4 sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops Tejun Heo
2026-04-29 18:21 ` [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers Tejun Heo
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.