From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>
Cc: sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: [PATCH sched_ext/for-7.1-fixes] sched_ext: Fix ops->priv NULL pointer deref in bpf_scx_unreg()
Date: Mon, 11 May 2026 00:43:32 +0200
Message-ID: <20260510224332.2011982-1-arighi@nvidia.com>

Under heavy concurrent attach/detach operations, scx_claim_exit() can
trigger a NULL pointer dereference. This can be reproduced by running the
reload_loop kselftest inside a virtme-ng session:

 $ vng -v -- ./tools/testing/selftests/sched_ext/runner -t reload_loop
 ...
 BUG: kernel NULL pointer dereference, address: 0000000000000400
 ...
 RIP: 0010:scx_claim_exit+0x3b/0x120
 Call Trace:
  <TASK>
  bpf_scx_unreg+0x45/0xb0
  bpf_struct_ops_map_link_dealloc+0x39/0x50
  bpf_link_release+0x18/0x20
  __fput+0x10b/0x2e0
  __x64_sys_close+0x47/0xa0

This was introduced by commit 105dcd005be2 ("sched_ext: Introduce
scx_prog_sched()"), which:

  - Made kfuncs look up the scheduler via scx_prog_sched(aux), which
    resolves aux -> struct_ops -> ops->priv.

  - Added RCU_INIT_POINTER(ops->priv, NULL) to bpf_scx_unreg() before
    dropping the kobject reference.

Under concurrent attach/detach of the same struct_ops program, the BPF
program's aux->struct_ops association can resolve to a struct_ops whose
->priv was just cleared by a concurrent bpf_scx_unreg(), or to one for
which scx_alloc_and_add_sched() has not yet published the new scheduler
with rcu_assign_pointer().

When scx_prog_sched() observes this transient ops->priv == NULL, it
returns NULL; kfuncs like scx_bpf_create_dsq() then return -ENODEV,
which causes ops.init() to fail with -ENODEV. The failed attach enters
the disable path, and the subsequent bpf_scx_unreg() reads NULL from
ops->priv and passes it to scx_disable(), which dereferences it in
scx_claim_exit().

Fix it in two places:

  - scx_prog_sched(): when ops is found but ops->priv is NULL, fall
    through to the scx_root path instead of returning NULL. For
    single-sched (the only currently supported configuration), this
    recovers the previous behavior; for sub-sched-aware schedulers the
    existing !root->ops.sub_attach guard keeps the fallback off so
    multi-sched semantics are preserved.

  - bpf_scx_unreg(): guard against ops->priv == NULL so that the function
    becomes a no-op instead of calling scx_disable(NULL, ...) and
    dereferencing NULL.

Fixes: 105dcd005be2 ("sched_ext: Introduce scx_prog_sched()")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext.c          |  8 ++++++++
 kernel/sched/ext_internal.h | 16 +++++++++++++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 4efe0099f79af..6c476ec5dcbe1 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -7608,6 +7608,14 @@ static void bpf_scx_unreg(void *kdata, struct bpf_link *link)
 	struct sched_ext_ops *ops = kdata;
 	struct scx_sched *sch = rcu_dereference_protected(ops->priv, true);
 
+	/*
+	 * ops->priv can be NULL if scx_alloc_and_add_sched() failed before
+	 * assigning it, or if bpf_scx_unreg() somehow re-entered. There's
+	 * nothing to tear down in either case.
+	 */
+	if (!sch)
+		return;
+
 	scx_disable(sch, SCX_EXIT_UNREG);
 	scx_flush_disable_work(sch);
 	RCU_INIT_POINTER(ops->priv, NULL);
diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
index a075732d4430d..e468a7401ed83 100644
--- a/kernel/sched/ext_internal.h
+++ b/kernel/sched/ext_internal.h
@@ -1433,11 +1433,21 @@ static inline bool scx_task_on_sched(struct scx_sched *sch,
 static inline struct scx_sched *scx_prog_sched(const struct bpf_prog_aux *aux)
 {
 	struct sched_ext_ops *ops;
-	struct scx_sched *root;
+	struct scx_sched *root, *sch;
 
 	ops = bpf_prog_get_assoc_struct_ops(aux);
-	if (likely(ops))
-		return rcu_dereference_all(ops->priv);
+	if (likely(ops)) {
+		sch = rcu_dereference_all(ops->priv);
+		if (likely(sch))
+			return sch;
+		/*
+		 * @aux is associated with @ops but @ops->priv is NULL. This can
+		 * be observed transiently under concurrent attach/detach (e.g.
+		 * bpf_scx_unreg() clears @ops->priv before kdata is freed).
+		 * Continue with the scx_root path so single-sched users keep
+		 * working, sub-sched users see no scheduler.
+		 */
+	}
 
 	root = rcu_dereference_all(scx_root);
 	if (root) {
-- 
2.54.0


Thread overview: 4+ messages
2026-05-10 22:43 Andrea Righi [this message]
2026-05-11  2:55 ` [PATCH sched_ext/for-7.1-fixes] sched_ext: Fix ops->priv NULL pointer deref in bpf_scx_unreg() Tejun Heo
2026-05-11  5:41   ` Andrea Righi
2026-05-12  0:44 ` sashiko-bot
