From: sashiko-bot@kernel.org
Subject: Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Fix ops->priv NULL pointer deref in bpf_scx_unreg()
Reply-To: sashiko@lists.linux.dev
To: "Andrea Righi"
Cc: sched-ext@lists.linux.dev
In-Reply-To: <20260510224332.2011982-1-arighi@nvidia.com>
References: <20260510224332.2011982-1-arighi@nvidia.com>
Date: Tue, 12 May 2026 00:44:36 +0000
Message-Id: <20260512004437.A23D7C2BCB0@smtp.kernel.org>

Thank you for your contribution! Sashiko AI review found 5 potential
issues to consider:

- [Critical] Dangling pointer in scx_alloc_and_add_sched() leads to a
  use-after-free.
- [High] Spurious scx_error() triggers a denial of service in multi-sched
  environments.
- [High] Early return in bpf_scx_unreg() introduces a permanent resource
  leak and wedges the scheduler state.
- [High] Fallback to scx_root in scx_prog_sched() breaks isolation and
  allows cross-scheduler state corruption.
- [High] Using ops->priv to track the scheduler instance inherently
  breaks sub-schedulers.

--

commit 13d28bde689d42f58394dbdb0585d40676beacf4
Author: Andrea Righi

    sched_ext: Fix ops->priv NULL pointer deref in bpf_scx_unreg()

    Under heavy concurrent attach/detach operations, scx_claim_exit() can
    trigger a NULL pointer dereference. This commit fixes it by falling
    back to the scx_root path in scx_prog_sched() when ops->priv is NULL,
    and returning early in bpf_scx_unreg() if the scheduler pointer is
    NULL.
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 345aa11b84b28..11998530416fb 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -7493,6 +7493,14 @@ static void bpf_scx_unreg(void *kdata, struct bpf_link *link)
> 	struct sched_ext_ops *ops = kdata;
> 	struct scx_sched *sch = rcu_dereference_protected(ops->priv, true);
>
> +	/*
> +	 * ops->priv can be NULL if scx_alloc_and_add_sched() failed before
> +	 * assigning it, or if bpf_scx_unreg() somehow re-entered. There's
> +	 * nothing to tear down in either case.
> +	 */
> +	if (!sch)
> +		return;
> +

Does this lockless check introduce a permanent resource leak when multiple
links are created for the same map?

Looking at bpf_struct_ops_map_link_dealloc(), it does not hold locks
preventing concurrent reg operations. If Thread 1 initiates unreg for a
detached link and executes scx_disable(), but is preempted before
executing RCU_INIT_POINTER(ops->priv, NULL):

  Thread 1: scx_disable()
            ... preempted ...

Thread 2 could successfully attach a new link for the same map, assigning
its new scheduler sch2 to ops->priv. When Thread 1 resumes and executes
RCU_INIT_POINTER(ops->priv, NULL), it clears the pointer to Thread 2's
active scheduler.

When Thread 2 subsequently detaches, would bpf_scx_unreg() read ops->priv,
see NULL, and return early here? This would completely skip scx_disable()
and kobject_put(), permanently leaking sch2 and leaving scx_enable_state
stuck at SCX_ENABLED. Would this prevent new schedulers from being loaded
until the system is rebooted?

Can a dangling pointer in scx_alloc_and_add_sched() lead to a
use-after-free with this ops->priv logic?

While not introduced by this patch, in scx_alloc_and_add_sched(),
rcu_assign_pointer(ops->priv, sch) is called before several fallible
allocations. If a subsequent step like kobject_init_and_add() fails, the
function jumps to the error path and frees sch via kfree(), but fails to
reset ops->priv to NULL.
If a map was previously attached, ran, and armed a stray bpf_timer, could
a subsequent failed re-attach attempt leave ops->priv pointing to the
freed sch? When the stray timer fires, would scx_prog_sched() dereference
ops->priv and return the freed pointer to BPF kfuncs?

> diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
> index a075732d4430d..e468a7401ed83 100644
> --- a/kernel/sched/ext_internal.h
> +++ b/kernel/sched/ext_internal.h
> @@ -1433,11 +1433,21 @@ static inline bool scx_task_on_sched(struct scx_sched *sch,
> static inline struct scx_sched *scx_prog_sched(const struct bpf_prog_aux *aux)
> {
> 	struct sched_ext_ops *ops;
> -	struct scx_sched *root;
> +	struct scx_sched *root, *sch;
>
> 	ops = bpf_prog_get_assoc_struct_ops(aux);
> -	if (likely(ops))
> -		return rcu_dereference_all(ops->priv);
> +	if (likely(ops)) {
> +		sch = rcu_dereference_all(ops->priv);
> +		if (likely(sch))
> +			return sch;
> +		/*
> +		 * @aux is associated with @ops but @ops->priv is NULL. This can
> +		 * be observed transiently under concurrent attach/detach (e.g.
> +		 * bpf_scx_unreg() clears @ops->priv before kdata is freed).
> +		 * Continue with the scx_root path so single-sched users keep
> +		 * working, sub-sched users see no scheduler.
> +		 */
> +	}

Does this fallback to scx_root trigger a denial of service in multi-sched
environments?

If root->ops.sub_attach is true, the fallback bypasses the "return root;"
path and instead hits scx_error(root, "Unassociated program executing with
sub-schedulers"). Because scx_error() is a macro that disables the BPF
scheduler and forces a system-wide fallback to CFS, could encountering
this transient ops->priv == NULL state instantly crash the root scheduler?

Can this fallback to scx_root break isolation and allow cross-scheduler
state corruption?

When a scheduler is detached, bpf_scx_unreg() clears ops->priv to NULL.
However, asynchronous execution contexts like bpf_timer or workqueues
armed by the detached scheduler can still fire.

If this stray program calls a kfunc, it observes ops->priv == NULL and now
falls back to returning the globally active scx_root. If an entirely
different scheduler has since been attached, would the stray program
bypass isolation boundaries and directly manipulate the new scheduler's
state? Prior to this patch, it safely returned NULL.

This isn't a bug introduced by this patch, but does using ops->priv to
track the scheduler instance inherently break sub-schedulers?

Because ops->priv is a single pointer on the shared struct_ops map
(kdata), attaching the same map to multiple cgroups will sequentially
overwrite ops->priv. It will only track the most recently attached
scheduler. If a BPF program executes for an older cgroup's scheduler, will
it read ops->priv, receive the newer cgroup's scheduler, and operate on
the wrong state?

>
> 	root = rcu_dereference_all(scx_root);
> 	if (root) {

--
Sashiko AI review · https://sashiko.dev/#/patchset/20260510224332.2011982-1-arighi@nvidia.com?part=1