From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>
Cc: sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: [PATCH sched_ext/for-7.1-fixes] sched_ext: Fix ops->priv NULL pointer deref in bpf_scx_unreg()
Date: Mon, 11 May 2026 00:43:32 +0200
Message-ID: <20260510224332.2011982-1-arighi@nvidia.com>

Under heavy concurrent attach/detach operations, scx_claim_exit() can
trigger a NULL pointer dereference. This can be reproduced by running the
reload_loop kselftest inside a virtme-ng session:

 $ vng -v -- ./tools/testing/selftests/sched_ext/runner -t reload_loop
 ...
 BUG: kernel NULL pointer dereference, address: 0000000000000400
 ...
 RIP: 0010:scx_claim_exit+0x3b/0x120
 Call Trace:
  <TASK>
  bpf_scx_unreg+0x45/0xb0
  bpf_struct_ops_map_link_dealloc+0x39/0x50
  bpf_link_release+0x18/0x20
  __fput+0x10b/0x2e0
  __x64_sys_close+0x47/0xa0

This was introduced by commit 105dcd005be2 ("sched_ext: Introduce
scx_prog_sched()"), which:

 - Made kfuncs look up the scheduler via scx_prog_sched(aux), which
   resolves aux -> struct_ops -> ops->priv.

 - Added RCU_INIT_POINTER(ops->priv, NULL) to bpf_scx_unreg() before
   dropping the kobject reference.

Under concurrent attach/detach of the same struct_ops program, the BPF
program's aux->struct_ops association can resolve to a struct_ops whose
->priv was just cleared by a concurrent bpf_scx_unreg(), or to one where
scx_alloc_and_add_sched() has not yet completed rcu_assign_pointer().

When scx_prog_sched() observes this transient ops->priv == NULL, it
returns NULL; kfuncs like scx_bpf_create_dsq() then return -ENODEV,
which causes ops.init() to fail with -ENODEV. The failed attach enters
the disable path, and the subsequent bpf_scx_unreg() reads NULL from
ops->priv and dereferences it in scx_claim_exit().

Fix it in two places:

 - scx_prog_sched(): when ops is found but ops->priv is NULL, fall
   through to the scx_root path instead of returning NULL. For
   single-sched (the only currently supported configuration), this
   recovers the previous behavior; for sub-sched-aware schedulers the
   existing !root->ops.sub_attach guard keeps the fallback off so
   multi-sched semantics are preserved.

 - bpf_scx_unreg(): return early when ops->priv is NULL, so the function
   is a no-op instead of calling scx_disable(NULL, ...), which
   dereferences the NULL pointer in scx_claim_exit().

Fixes: 105dcd005be2 ("sched_ext: Introduce scx_prog_sched()")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
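For reviewers, the two changes can be illustrated with a small userspace
model. This is only a sketch: struct model_sched, struct model_ops,
model_root, lookup_old(), lookup_new() and unreg_guarded() are hypothetical
stand-ins, not kernel symbols, and the shape of the root fallback (return
root only when it has no sub_attach callback) is assumed from the
description above rather than copied from ext_internal.h. RCU and locking
are not modelled.

```c
/* Userspace model only: simplified stand-ins, not the kernel definitions. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_sched {
	bool has_sub_attach;	/* stands in for sch->ops.sub_attach != NULL */
	int disable_calls;	/* counts modelled scx_disable() invocations */
};

struct model_ops {
	struct model_sched *priv;	/* stands in for ops->priv */
};

/* Stand-in for the global scx_root pointer. */
static struct model_sched *model_root;

/* Lookup before the patch: an associated ops with NULL priv yields NULL. */
static struct model_sched *lookup_old(const struct model_ops *ops)
{
	if (ops)
		return ops->priv;
	if (model_root && !model_root->has_sub_attach)
		return model_root;
	return NULL;
}

/* Lookup after the patch: a NULL priv falls through to the root path. */
static struct model_sched *lookup_new(const struct model_ops *ops)
{
	if (ops && ops->priv)
		return ops->priv;
	/* Assumed guard: the fallback stays off for sub-sched-aware roots. */
	if (model_root && !model_root->has_sub_attach)
		return model_root;
	return NULL;
}

/* Unregister with the new guard; returns false if there was nothing to do. */
static bool unreg_guarded(struct model_ops *ops)
{
	struct model_sched *sch = ops->priv;

	if (!sch)
		return false;	/* previously this NULL would be dereferenced */

	sch->disable_calls++;	/* models scx_disable() + flush */
	ops->priv = NULL;	/* models RCU_INIT_POINTER(ops->priv, NULL) */
	return true;
}
```

In this model, an ops with priv == NULL resolves to NULL under lookup_old()
and to model_root under lookup_new() when the root has no sub_attach, and a
second unreg_guarded() call on the same ops is a no-op.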
kernel/sched/ext.c | 8 ++++++++
kernel/sched/ext_internal.h | 16 +++++++++++++---
2 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 4efe0099f79af..6c476ec5dcbe1 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -7608,6 +7608,14 @@ static void bpf_scx_unreg(void *kdata, struct bpf_link *link)
 	struct sched_ext_ops *ops = kdata;
 	struct scx_sched *sch = rcu_dereference_protected(ops->priv, true);
 
+	/*
+	 * ops->priv can be NULL if scx_alloc_and_add_sched() failed before
+	 * assigning it, or if bpf_scx_unreg() somehow re-entered. There's
+	 * nothing to tear down in either case.
+	 */
+	if (!sch)
+		return;
+
 	scx_disable(sch, SCX_EXIT_UNREG);
 	scx_flush_disable_work(sch);
 	RCU_INIT_POINTER(ops->priv, NULL);
diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
index a075732d4430d..e468a7401ed83 100644
--- a/kernel/sched/ext_internal.h
+++ b/kernel/sched/ext_internal.h
@@ -1433,11 +1433,21 @@ static inline bool scx_task_on_sched(struct scx_sched *sch,
 static inline struct scx_sched *scx_prog_sched(const struct bpf_prog_aux *aux)
 {
 	struct sched_ext_ops *ops;
-	struct scx_sched *root;
+	struct scx_sched *root, *sch;
 
 	ops = bpf_prog_get_assoc_struct_ops(aux);
-	if (likely(ops))
-		return rcu_dereference_all(ops->priv);
+	if (likely(ops)) {
+		sch = rcu_dereference_all(ops->priv);
+		if (likely(sch))
+			return sch;
+		/*
+		 * @aux is associated with @ops but @ops->priv is NULL. This can
+		 * be observed transiently under concurrent attach/detach (e.g.
+		 * bpf_scx_unreg() clears @ops->priv before kdata is freed).
+		 * Continue with the scx_root path so single-sched users keep
+		 * working, sub-sched users see no scheduler.
+		 */
+	}
 
 	root = rcu_dereference_all(scx_root);
 	if (root) {
--
2.54.0