* [PATCH v2 sched_ext/for-7.1-fixes] sched_ext: Fix ops->priv clobber on concurrent attach/detach
From: Andrea Righi @ 2026-05-11 6:18 UTC
To: Tejun Heo, David Vernet, Changwoo Min; +Cc: sched-ext, linux-kernel
Under heavy concurrent attach/detach operations, scx_claim_exit() can
trigger a NULL pointer dereference. This can be reproduced by running
the reload_loop kselftest inside a virtme-ng session:

  $ vng -v -- ./tools/testing/selftests/sched_ext/runner -t reload_loop
  ...
  BUG: kernel NULL pointer dereference, address: 0000000000000400
  RIP: 0010:scx_claim_exit+0x3b/0x120
  Call Trace:
   <TASK>
   bpf_scx_unreg+0x45/0xb0
   bpf_struct_ops_map_link_dealloc+0x39/0x50
   bpf_link_release+0x18/0x20
   __fput+0x10b/0x2e0
   __x64_sys_close+0x47/0xa0

The underlying race (diagnosed by Tejun Heo) is a stomp of @ops->priv,
not a missing NULL check:

  T2 unreg(K)                            T1 reg(K)
  -----------                            ---------
  sch = ops->priv = sch_b800
  scx_disable; flush_disable_work
    [scx_root_disable: scx_root=NULL,
     mutex_unlock, state=DISABLED]
                                         mutex_lock; state ok
                                         scx_alloc_and_add_sched:
                                           ops->priv = sch_a800
                                         scx_root = sch_a800; init=0
                                         state=ENABLED; mutex_unlock
  [flush returns]
  RCU_INIT_POINTER(ops->priv, NULL)      <-- clobbers sch_a800
  kobject_put(sch_b800)

T1 acquires scx_enable_mutex inside scx_root_disable()'s mutex_unlock
window and starts a fresh attach on the same kdata, assigning sch_a800
to @ops->priv. T2 then continues out of scx_disable()/flush_disable_work
and clobbers @ops->priv to NULL, leaking sch_a800; the bpf_link is gone
but state stays SCX_ENABLED, so all future attaches fail with -EBUSY
permanently. The next bpf_scx_unreg() on that kdata then reads NULL
@ops->priv and dereferences it in scx_claim_exit().
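
The interleaving above can be replayed deterministically in a small
userspace model. This is only an illustrative sketch: every model_* name
is hypothetical, the structs are stand-ins rather than the kernel's
types, and the two threads are collapsed into a fixed call order instead
of real concurrency.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
struct model_sched { int id; };
struct model_ops { struct model_sched *priv; };

enum model_state { MODEL_DISABLED, MODEL_ENABLED };

struct model_world {
	enum model_state state;
	struct model_sched *root;
};

/* T1: attach. Models the pre-patch check, which looks only at the state. */
static int model_reg(struct model_world *w, struct model_ops *ops,
		     struct model_sched *sch)
{
	if (w->state != MODEL_DISABLED)
		return -EBUSY;
	ops->priv = sch;
	w->root = sch;
	w->state = MODEL_ENABLED;
	return 0;
}

/* T2, first half: the disable work finishes and the mutex is dropped. */
static void model_unreg_disable(struct model_world *w)
{
	w->root = NULL;
	w->state = MODEL_DISABLED;
}

/* T2, second half: the tail of unreg clears the binding unconditionally. */
static void model_unreg_tail(struct model_ops *ops)
{
	ops->priv = NULL;
}

/* Returns 1 if the replay ends in the stuck state described above. */
int model_replay_clobber(void)
{
	struct model_sched sch_b800 = { 1 }, sch_a800 = { 2 }, sch_next = { 3 };
	struct model_ops ops = { NULL };
	struct model_world w = { MODEL_DISABLED, NULL };
	int later;

	model_reg(&w, &ops, &sch_b800);		/* original attach */
	model_unreg_disable(&w);		/* T2 drops the mutex */
	model_reg(&w, &ops, &sch_a800);		/* T1 attaches in the window */
	model_unreg_tail(&ops);			/* T2 resumes and clobbers */
	later = model_reg(&w, &ops, &sch_next);	/* any later attach */

	return ops.priv == NULL && w.root == &sch_a800 &&
	       w.state == MODEL_ENABLED && later == -EBUSY;
}
```

In this model the final state has a NULL priv while the state is still
enabled, which matches the description: the new instance is unreachable
through the kdata and every later attach is refused.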
Make @ops->priv the lifecycle binding: in scx_root_enable_workfn() and
scx_sub_enable_workfn(), after the existing state check and still under
scx_enable_mutex, refuse with -EBUSY if @ops->priv is non-NULL. This
rejects an attempt to reuse a kdata that is still bound to a previous
scheduler instance, closing the race without changing the unreg side.
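
A minimal userspace sketch of the gate, under the same caveats: the
gate_* names are hypothetical, the caller is assumed to hold the
equivalent of scx_enable_mutex, and the threads are replayed in a fixed
order. It is meant to show why the extra check is enough, not to mirror
the kernel code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
struct gate_sched { int id; };
struct gate_ops { struct gate_sched *priv; };
struct gate_world { int enabled; struct gate_sched *root; };

/* Attach with the additional priv check placed after the state check. */
static int gate_reg(struct gate_world *w, struct gate_ops *ops,
		    struct gate_sched *sch)
{
	if (w->enabled)
		return -EBUSY;
	if (ops->priv)		/* still bound to a previous instance */
		return -EBUSY;
	ops->priv = sch;
	w->root = sch;
	w->enabled = 1;
	return 0;
}

/* Unreg, first half: disable completes and the mutex is dropped. */
static void gate_unreg_disable(struct gate_world *w)
{
	w->root = NULL;
	w->enabled = 0;
}

/* Unreg, second half: the binding is cleared at the tail, as before. */
static void gate_unreg_tail(struct gate_ops *ops)
{
	ops->priv = NULL;
}

/* Returns 1 if the racing attach is refused and a later retry succeeds. */
int gate_replay(void)
{
	struct gate_sched prev = { 1 }, next = { 2 };
	struct gate_ops ops = { NULL };
	struct gate_world w = { 0, NULL };
	int raced, retry;

	if (gate_reg(&w, &ops, &prev) != 0)
		return 0;
	gate_unreg_disable(&w);
	raced = gate_reg(&w, &ops, &next);	/* inside the window */
	gate_unreg_tail(&ops);
	retry = gate_reg(&w, &ops, &next);	/* after teardown finished */

	return raced == -EBUSY && retry == 0 && ops.priv == &next;
}
```

The unreg side is untouched in the sketch, as in the patch: the attach
that lands inside the window is turned away, and the same kdata becomes
usable again once the tail of unreg has cleared priv.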
Fixes: 105dcd005be2 ("sched_ext: Introduce scx_prog_sched()")
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
Changes in v2:
- Address the root cause (a @ops->priv clobber during concurrent attach/detach)
instead of masking the resulting NULL deref (Tejun Heo)
- Drop the v1 scx_prog_sched() fallback to scx_root and the NULL guard in
bpf_scx_unreg().
- Reword the title to reflect the actual bug.
- Link to v1: https://lore.kernel.org/all/20260510224332.2011982-1-arighi@nvidia.com
kernel/sched/ext.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 864fb21344205..c49ada0a89c7f 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -6800,6 +6800,19 @@ static void scx_root_enable_workfn(struct kthread_work *work)
 		goto err_unlock;
 	}
 
+	/*
+	 * @ops->priv binds @ops to its scx_sched instance. It is set here by
+	 * scx_alloc_and_add_sched() and cleared at the tail of bpf_scx_unreg(),
+	 * which runs after scx_root_disable() has dropped scx_enable_mutex. If
+	 * it's still non-NULL here, a previous attachment on @ops has not
+	 * finished tearing down; proceeding would let the in-flight unreg's
+	 * RCU_INIT_POINTER(NULL) clobber the @ops->priv we are about to assign.
+	 */
+	if (rcu_access_pointer(ops->priv)) {
+		ret = -EBUSY;
+		goto err_unlock;
+	}
+
 	ret = alloc_kick_syncs();
 	if (ret)
 		goto err_unlock;
@@ -7118,6 +7131,12 @@ static void scx_sub_enable_workfn(struct kthread_work *work)
 		goto out_unlock;
 	}
 
+	/* See scx_root_enable_workfn() for the @ops->priv check. */
+	if (rcu_access_pointer(ops->priv)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
 	cgrp = cgroup_get_from_id(ops->sub_cgroup_id);
 	if (IS_ERR(cgrp)) {
 		ret = PTR_ERR(cgrp);
--
2.54.0

* Re: [PATCH v2 sched_ext/for-7.1-fixes] sched_ext: Fix ops->priv clobber on concurrent attach/detach
From: Tejun Heo @ 2026-05-11 7:47 UTC
To: Andrea Righi
Cc: David Vernet, Changwoo Min, Emil Tsalapatis, sched-ext,
linux-kernel
Hello, Andrea.
Applied to sched_ext/for-7.1-fixes.
One followup if you have cycles: scx_alloc_and_add_sched() can still
fail after rcu_assign_pointer(ops->priv, sch) (sub-sched
kzalloc/kstrdup and kobject paths). With the new gate, that would
leave the kdata permanently -EBUSY. Could probably be addressed by
clearing @ops->priv on those error paths.
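
The concern can be illustrated with a small userspace sketch. It is an
assumption-laden model, not the follow-up patch: the fup_* names and the
fail_late and clear_on_error parameters are hypothetical, and the only
point is to show how a failure after priv is assigned interacts with the
new gate.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
struct fup_sched { int id; };
struct fup_ops { struct fup_sched *priv; };

/*
 * Models an allocation helper that publishes priv early. A nonzero
 * fail_late simulates a failure after that point; clear_on_error
 * selects whether the error path undoes the binding.
 */
static int fup_alloc_and_add(struct fup_ops *ops, struct fup_sched *sch,
			     int fail_late, int clear_on_error)
{
	ops->priv = sch;
	if (fail_late) {
		if (clear_on_error)
			ops->priv = NULL;
		return -ENOMEM;
	}
	return 0;
}

/* Enable path with the gate: refuse while priv is still set. */
static int fup_enable(struct fup_ops *ops, struct fup_sched *sch,
		      int fail_late, int clear_on_error)
{
	if (ops->priv)
		return -EBUSY;
	return fup_alloc_and_add(ops, sch, fail_late, clear_on_error);
}

/* First attempt fails late; returns the result of the retry. */
int fup_retry_after_failure(int clear_on_error)
{
	struct fup_sched sch = { 1 };
	struct fup_ops ops = { NULL };

	fup_enable(&ops, &sch, 1, clear_on_error);
	return fup_enable(&ops, &sch, 0, clear_on_error);
}
```

With clear_on_error set to 0 the retry returns -EBUSY because the stale
binding survives the failed attempt; with it set to 1 the retry
succeeds.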
Thanks.
--
tejun

* Re: [PATCH v2 sched_ext/for-7.1-fixes] sched_ext: Fix ops->priv clobber on concurrent attach/detach
From: Andrea Righi @ 2026-05-11 7:52 UTC
To: Tejun Heo
Cc: David Vernet, Changwoo Min, Emil Tsalapatis, sched-ext,
linux-kernel
On Sun, May 10, 2026 at 09:47:52PM -1000, Tejun Heo wrote:
> Hello, Andrea.
>
> Applied to sched_ext/for-7.1-fixes.
>
> One followup if you have cycles: scx_alloc_and_add_sched() can still
> fail after rcu_assign_pointer(ops->priv, sch) (sub-sched
> kzalloc/kstrdup and kobject paths). With the new gate, that would
> leave the kdata permanently -EBUSY. Could probably be addressed by
> clearing @ops->priv on those error paths.
Ah! Makes sense, I'll send a follow-up patch.
Thanks,
-Andrea