From: Puranjay Mohan <puranjay@kernel.org>
To: bpf@vger.kernel.org
Cc: Puranjay Mohan <puranjay@kernel.org>,
Puranjay Mohan <puranjay12@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <martin.lau@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>,
kernel-team@meta.com
Subject: [PATCH bpf v6 1/3] bpf: fix mm lifecycle in open-coded task_vma iterator
Date: Wed, 8 Apr 2026 08:45:35 -0700 [thread overview]
Message-ID: <20260408154539.3832150-2-puranjay@kernel.org> (raw)
In-Reply-To: <20260408154539.3832150-1-puranjay@kernel.org>
The open-coded task_vma iterator reads task->mm locklessly and takes the
mmap lock with mmap_read_trylock(), but never calls mmget(). If the task
exits concurrently, the mm_struct can be freed out from under the
iterator; mm_struct is not SLAB_TYPESAFE_BY_RCU, so the lockless read
results in a use-after-free.
Safely read task->mm with a trylock on alloc_lock and acquire an mm
reference. Drop the reference via bpf_iter_mmput_async() in _destroy()
and error paths. bpf_iter_mmput_async() is a local wrapper around
mmput_async() with a fallback to mmput() on !CONFIG_MMU.
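The get-before-use discipline the fix adopts can be sketched in
userspace C (illustrative only: mm_obj, mm_get and mm_put are made-up
names standing in for mm_struct, mmget() and mmput(), not kernel API):

```c
#include <stddef.h>

/* Toy analogue of mm_struct lifetime: the object stays usable while at
 * least one reference is held and is "freed" on the final put. */
struct mm_obj {
	int refcnt;
	int freed;		/* stand-in for actual deallocation */
};

static void mm_get(struct mm_obj *mm) { mm->refcnt++; }

static void mm_put(struct mm_obj *mm)
{
	if (--mm->refcnt == 0)
		mm->freed = 1;
}

/* The iterator takes its own reference (what the patch adds via mmget()
 * under alloc_lock), so the owner's concurrent exit cannot free the mm
 * while the iterator still uses it. Returns 1 if the mm stayed valid. */
static int iter_outlives_owner(struct mm_obj *mm)
{
	int valid;

	mm_get(mm);		/* iterator: mmget() */
	mm_put(mm);		/* owner exits: drops its reference */
	valid = !mm->freed;	/* still alive: iterator holds a ref */
	mm_put(mm);		/* iterator: mmput_async() in _destroy() */
	return valid;
}
```

Without the mm_get() call, the owner's put would drop the count to zero
and free the object while the iterator still dereferences it.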
Reject irqs-disabled contexts (including NMI) up front. Operations used
by _next() and _destroy() (mmap_read_unlock, bpf_iter_mmput_async)
take spinlocks with IRQs disabled (pool->lock, pi_lock). Running from
NMI or from a tracepoint that fires with those locks held could
deadlock.
A trylock on alloc_lock is used instead of the blocking task_lock()
(get_task_mm) to avoid a deadlock when a softirq BPF program iterates
a task that already holds its alloc_lock on the same CPU.
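The trylock-vs-blocking-lock distinction can be illustrated with a
non-recursive pthread mutex (an analogy only; alloc_lock is a kernel
spinlock, not a pthread mutex): re-acquiring a lock already held in the
same context would block forever, while a trylock fails cleanly.

```c
#include <errno.h>
#include <pthread.h>

/* Returns EBUSY instead of deadlocking when the lock is already held,
 * mirroring the spin_trylock(&task->alloc_lock) choice in the patch. */
static int try_nested_acquire(void)
{
	pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	int ret;

	pthread_mutex_lock(&lock);	/* outer context holds the lock */
	/* A blocking pthread_mutex_lock() here would never return;
	 * trylock reports the contention instead. */
	ret = pthread_mutex_trylock(&lock);
	pthread_mutex_unlock(&lock);
	return ret;
}
```

POSIX guarantees pthread_mutex_trylock() returns EBUSY whenever the
mutex is already locked, which is the same -EBUSY the kfunc returns to
the BPF program in this situation.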
Fixes: 4ac454682158 ("bpf: Introduce task_vma open-coded iterator kfuncs")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/bpf/task_iter.c | 54 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 51 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 98d9b4c0daff..c1f5fbe9dc2f 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -9,6 +9,7 @@
#include <linux/bpf_mem_alloc.h>
#include <linux/btf_ids.h>
#include <linux/mm_types.h>
+#include <linux/sched/mm.h>
#include "mmap_unlock_work.h"
static const char * const iter_task_type_names[] = {
@@ -794,6 +795,15 @@ const struct bpf_func_proto bpf_find_vma_proto = {
.arg5_type = ARG_ANYTHING,
};
+static inline void bpf_iter_mmput_async(struct mm_struct *mm)
+{
+#ifdef CONFIG_MMU
+ mmput_async(mm);
+#else
+ mmput(mm);
+#endif
+}
+
struct bpf_iter_task_vma_kern_data {
struct task_struct *task;
struct mm_struct *mm;
@@ -825,6 +835,24 @@ __bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
BUILD_BUG_ON(sizeof(struct bpf_iter_task_vma_kern) != sizeof(struct bpf_iter_task_vma));
BUILD_BUG_ON(__alignof__(struct bpf_iter_task_vma_kern) != __alignof__(struct bpf_iter_task_vma));
+ /* bpf_iter_mmput_async() needs mmput_async() which requires CONFIG_MMU */
+ if (!IS_ENABLED(CONFIG_MMU)) {
+ kit->data = NULL;
+ return -EOPNOTSUPP;
+ }
+
+ /*
+ * Reject irqs-disabled contexts including NMI. Operations used
+ * by _next() and _destroy() (mmap_read_unlock, bpf_iter_mmput_async)
+ * can take spinlocks with IRQs disabled (pi_lock, pool->lock).
+ * Running from NMI or from a tracepoint that fires with those
+ * locks held could deadlock.
+ */
+ if (irqs_disabled()) {
+ kit->data = NULL;
+ return -EBUSY;
+ }
+
/* is_iter_reg_valid_uninit guarantees that kit hasn't been initialized
* before, so non-NULL kit->data doesn't point to previously
* bpf_mem_alloc'd bpf_iter_task_vma_kern_data
@@ -834,7 +862,25 @@ __bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
return -ENOMEM;
kit->data->task = get_task_struct(task);
+ /*
+ * Safely read task->mm and acquire an mm reference.
+ *
+ * Cannot use get_task_mm() because its task_lock() is a
+ * blocking spin_lock that would deadlock if the target task
+ * already holds alloc_lock on this CPU (e.g. a softirq BPF
+ * program iterating a task interrupted while holding its
+ * alloc_lock).
+ */
+ if (!spin_trylock(&task->alloc_lock)) {
+ err = -EBUSY;
+ goto err_cleanup_iter;
+ }
kit->data->mm = task->mm;
+ if (kit->data->mm && !(task->flags & PF_KTHREAD))
+ mmget(kit->data->mm);
+ else
+ kit->data->mm = NULL;
+ spin_unlock(&task->alloc_lock);
if (!kit->data->mm) {
err = -ENOENT;
goto err_cleanup_iter;
@@ -844,15 +890,16 @@ __bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
irq_work_busy = bpf_mmap_unlock_get_irq_work(&kit->data->work);
if (irq_work_busy || !mmap_read_trylock(kit->data->mm)) {
err = -EBUSY;
- goto err_cleanup_iter;
+ goto err_cleanup_mmget;
}
vma_iter_init(&kit->data->vmi, kit->data->mm, addr);
return 0;
+err_cleanup_mmget:
+ bpf_iter_mmput_async(kit->data->mm);
err_cleanup_iter:
- if (kit->data->task)
- put_task_struct(kit->data->task);
+ put_task_struct(kit->data->task);
bpf_mem_free(&bpf_global_ma, kit->data);
/* NULL kit->data signals failed bpf_iter_task_vma initialization */
kit->data = NULL;
@@ -875,6 +922,7 @@ __bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it)
if (kit->data) {
bpf_mmap_unlock_mm(kit->data->work, kit->data->mm);
put_task_struct(kit->data->task);
+ bpf_iter_mmput_async(kit->data->mm);
bpf_mem_free(&bpf_global_ma, kit->data);
}
}
--
2.52.0
2026-04-08 15:45 [PATCH bpf v6 0/3] bpf: fix and improve open-coded task_vma iterator Puranjay Mohan
2026-04-08 15:45 ` Puranjay Mohan [this message]
2026-04-08 15:45 ` [PATCH bpf v6 2/3] bpf: switch task_vma iterator from mmap_lock to per-VMA locks Puranjay Mohan
2026-04-08 15:45 ` [PATCH bpf v6 3/3] bpf: return VMA snapshot from task_vma iterator Puranjay Mohan
2026-04-09 13:15 ` Mykyta Yatsenko
2026-04-10 19:10 ` [PATCH bpf v6 0/3] bpf: fix and improve open-coded " patchwork-bot+netdevbpf