* [PATCH bpf v4 0/2] bpf: Fix arena VMA use-after-free on fork
@ 2026-04-12 2:27 Weiming Shi
2026-04-12 2:27 ` [PATCH bpf v4 1/2] bpf: Fix use-after-free of arena VMA " Weiming Shi
2026-04-12 2:27 ` [PATCH bpf v4 2/2] selftests/bpf: Add test for arena VMA use-after-free " Weiming Shi
0 siblings, 2 replies; 7+ messages in thread
From: Weiming Shi @ 2026-04-12 2:27 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Barret Rhoden, Emil Tsalapatis, bpf, linux-kernel, Xiang Mei,
Weiming Shi
arena_vm_open() only increments a refcount on the shared vma_list entry
but never registers the new VMA. After fork + parent munmap, vml->vma
becomes a dangling pointer. bpf_arena_free_pages -> zap_pages then
dereferences it, causing a slab-use-after-free in zap_page_range_single.
Patch 1 fixes the bug by tracking each child VMA separately in
arena_vm_open, and adds arena_vm_may_split() to prevent VMA splitting.
Patch 2 adds a selftest that reproduces the issue (requires KASAN to
detect the UAF).
v4:
- Fixed commit message: OOM case description, may_split rationale
v3:
- Added arena_vm_may_split() to prevent VMA splitting
- Reuse remember_vma() in arena_vm_open(), removed HugeTLB references
- selftests: fixed copyright, trimmed comments, use sysconf()
v2:
- Added missing Reported-by tag
Weiming Shi (2):
bpf: Fix use-after-free of arena VMA on fork
selftests/bpf: Add test for arena VMA use-after-free on fork
kernel/bpf/arena.c | 23 ++++--
.../selftests/bpf/prog_tests/arena_fork.c | 80 +++++++++++++++++++
.../testing/selftests/bpf/progs/arena_fork.c | 41 ++++++++++
3 files changed, 138 insertions(+), 6 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/arena_fork.c
create mode 100644 tools/testing/selftests/bpf/progs/arena_fork.c
--
2.43.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH bpf v4 1/2] bpf: Fix use-after-free of arena VMA on fork
2026-04-12 2:27 [PATCH bpf v4 0/2] bpf: Fix arena VMA use-after-free on fork Weiming Shi
@ 2026-04-12 2:27 ` Weiming Shi
2026-04-12 17:50 ` Emil Tsalapatis
2026-04-12 2:27 ` [PATCH bpf v4 2/2] selftests/bpf: Add test for arena VMA use-after-free " Weiming Shi
1 sibling, 1 reply; 7+ messages in thread
From: Weiming Shi @ 2026-04-12 2:27 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Barret Rhoden, Emil Tsalapatis, bpf, linux-kernel, Xiang Mei,
Weiming Shi
arena_vm_open() only increments a refcount on the shared vma_list entry
but never registers the new VMA or updates the stored vma pointer. When
the original VMA is unmapped while a forked copy still exists,
arena_vm_close() drops the refcount without freeing the vma_list entry.
The entry's vma pointer now refers to a freed vm_area_struct. A
subsequent bpf_arena_free_pages() call iterates vma_list and passes
the dangling pointer to zap_page_range_single(), causing a
use-after-free.
The bug is reachable by any process with CAP_BPF and CAP_PERFMON that
can create a BPF_MAP_TYPE_ARENA, mmap it, and fork. It triggers
deterministically -- no race condition is involved.
BUG: KASAN: slab-use-after-free in zap_page_range_single (mm/memory.c:2234)
Call Trace:
<TASK>
zap_page_range_single+0x101/0x110 mm/memory.c:2234
zap_pages+0x80/0xf0 kernel/bpf/arena.c:658
arena_free_pages+0x67a/0x860 kernel/bpf/arena.c:712
bpf_prog_test_run_syscall+0x3da net/bpf/test_run.c:1640
__sys_bpf+0x1662/0x50b0 kernel/bpf/syscall.c:6267
__x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:6360
do_syscall_64+0xf1/0x530 arch/x86/entry/syscall_64.c:63
entry_SYSCALL_64_after_hwframe+0x77 arch/x86/entry/entry_64.S:130
</TASK>
Fix this by tracking each child VMA separately. arena_vm_open() now
clears the inherited vm_private_data and calls remember_vma() to
register a fresh vma_list entry for the new VMA. If remember_vma()
fails due to OOM, vm_private_data stays NULL and arena_vm_close()
skips the cleanup for that VMA. The shared refcount is no longer
needed and is removed.
Also add arena_vm_may_split() returning -EINVAL to prevent VMA
splitting, so that arena_vm_open() only needs to handle fork and the
vma_list tracking stays simple.
Fixes: b90d77e5fd78 ("bpf: Fix remap of arena.")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
kernel/bpf/arena.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index f355cf1c1a16..3462c4463617 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -317,7 +317,6 @@ static u64 arena_map_mem_usage(const struct bpf_map *map)
struct vma_list {
struct vm_area_struct *vma;
struct list_head head;
- refcount_t mmap_count;
};
static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
@@ -327,7 +326,6 @@ static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
vml = kmalloc_obj(*vml);
if (!vml)
return -ENOMEM;
- refcount_set(&vml->mmap_count, 1);
vma->vm_private_data = vml;
vml->vma = vma;
list_add(&vml->head, &arena->vma_list);
@@ -336,9 +334,17 @@ static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
static void arena_vm_open(struct vm_area_struct *vma)
{
- struct vma_list *vml = vma->vm_private_data;
+ struct bpf_map *map = vma->vm_file->private_data;
+ struct bpf_arena *arena = container_of(map, struct bpf_arena, map);
- refcount_inc(&vml->mmap_count);
+ /*
+ * vm_private_data points to the parent's vma_list entry after fork.
+ * Clear it and register this VMA separately.
+ */
+ vma->vm_private_data = NULL;
+ guard(mutex)(&arena->lock);
+ /* OOM is silently ignored; arena_vm_close() handles NULL. */
+ remember_vma(arena, vma);
}
static void arena_vm_close(struct vm_area_struct *vma)
@@ -347,10 +353,9 @@ static void arena_vm_close(struct vm_area_struct *vma)
struct bpf_arena *arena = container_of(map, struct bpf_arena, map);
struct vma_list *vml = vma->vm_private_data;
- if (!refcount_dec_and_test(&vml->mmap_count))
+ if (!vml)
return;
guard(mutex)(&arena->lock);
- /* update link list under lock */
list_del(&vml->head);
vma->vm_private_data = NULL;
kfree(vml);
@@ -415,9 +420,15 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
return VM_FAULT_SIGSEGV;
}
+static int arena_vm_may_split(struct vm_area_struct *vma, unsigned long addr)
+{
+ return -EINVAL;
+}
+
static const struct vm_operations_struct arena_vm_ops = {
.open = arena_vm_open,
.close = arena_vm_close,
+ .may_split = arena_vm_may_split,
.fault = arena_vm_fault,
};
--
2.43.0
* [PATCH bpf v4 2/2] selftests/bpf: Add test for arena VMA use-after-free on fork
2026-04-12 2:27 [PATCH bpf v4 0/2] bpf: Fix arena VMA use-after-free on fork Weiming Shi
2026-04-12 2:27 ` [PATCH bpf v4 1/2] bpf: Fix use-after-free of arena VMA " Weiming Shi
@ 2026-04-12 2:27 ` Weiming Shi
1 sibling, 0 replies; 7+ messages in thread
From: Weiming Shi @ 2026-04-12 2:27 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Barret Rhoden, Emil Tsalapatis, bpf, linux-kernel, Xiang Mei,
Weiming Shi
Add a selftest that reproduces the arena VMA use-after-free fixed in
the previous commit. The test creates an arena, mmaps it, allocates
pages via BPF, forks, has the parent munmap the arena, then has the
child call bpf_arena_free_pages. Without the fix this triggers a
KASAN slab-use-after-free in zap_page_range_single.
Note: the UAF occurs entirely in kernel space and is not observable
from userspace, so this test relies on KASAN to detect the bug.
Without KASAN the test passes regardless of whether the fix is
applied.
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
.../selftests/bpf/prog_tests/arena_fork.c | 80 +++++++++++++++++++
.../testing/selftests/bpf/progs/arena_fork.c | 41 ++++++++++
2 files changed, 121 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/arena_fork.c
create mode 100644 tools/testing/selftests/bpf/progs/arena_fork.c
diff --git a/tools/testing/selftests/bpf/prog_tests/arena_fork.c b/tools/testing/selftests/bpf/prog_tests/arena_fork.c
new file mode 100644
index 000000000000..0235884c5906
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/arena_fork.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+
+/*
+ * Test that forking a process with an arena mmap does not cause a
+ * use-after-free when the parent unmaps and the child frees arena pages.
+ */
+#include <test_progs.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "arena_fork.skel.h"
+
+void test_arena_fork(void)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, opts);
+ struct bpf_map_info info = {};
+ __u32 info_len = sizeof(info);
+ struct arena_fork *skel;
+ long page_size;
+ size_t arena_sz;
+ void *arena_addr;
+ int arena_fd, ret, status;
+ pid_t pid;
+
+ page_size = sysconf(_SC_PAGESIZE);
+ if (!ASSERT_GT(page_size, 0, "page_size"))
+ return;
+
+ skel = arena_fork__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "open_and_load"))
+ return;
+
+ arena_fd = bpf_map__fd(skel->maps.arena);
+
+ /* libbpf mmaps the arena via initial_value */
+ arena_addr = bpf_map__initial_value(skel->maps.arena, &arena_sz);
+ if (!ASSERT_OK_PTR(arena_addr, "arena_mmap"))
+ goto out;
+
+ /* Get real arena byte size for munmap */
+ bpf_map_get_info_by_fd(arena_fd, &info, &info_len);
+ arena_sz = (size_t)info.max_entries * page_size;
+
+ /* Allocate 4 pages in the arena via BPF */
+ ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.arena_alloc),
+ &opts);
+ if (!ASSERT_OK(ret, "alloc_run") ||
+ !ASSERT_OK(opts.retval, "alloc_ret"))
+ goto out;
+
+ /* Fault in a page so zap_pages has work to do */
+ ((volatile char *)arena_addr)[0] = 'A';
+
+ /* Fork: child inherits the arena VMA */
+ pid = fork();
+ if (!ASSERT_GE(pid, 0, "fork"))
+ goto out;
+
+ if (pid == 0) {
+ /* Child: parent will unmap first, then we free pages. */
+ LIBBPF_OPTS(bpf_test_run_opts, child_opts);
+ int free_fd = bpf_program__fd(skel->progs.arena_free);
+
+ usleep(200000); /* let parent munmap first */
+
+ ret = bpf_prog_test_run_opts(free_fd, &child_opts);
+ _exit(ret || child_opts.retval);
+ }
+
+ /* Parent: unmap the arena */
+ munmap(arena_addr, arena_sz);
+
+ waitpid(pid, &status, 0);
+ ASSERT_TRUE(WIFEXITED(status), "child_exited");
+ ASSERT_EQ(WEXITSTATUS(status), 0, "child_exit_code");
+out:
+ arena_fork__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/arena_fork.c b/tools/testing/selftests/bpf/progs/arena_fork.c
new file mode 100644
index 000000000000..783c935a0af8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/arena_fork.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_arena_common.h"
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARENA);
+ __uint(map_flags, BPF_F_MMAPABLE);
+ __uint(max_entries, 16); /* number of pages */
+#ifdef __TARGET_ARCH_arm64
+ __ulong(map_extra, 0x1ull << 32); /* start of mmap() region */
+#else
+ __ulong(map_extra, 0x1ull << 44); /* start of mmap() region */
+#endif
+} arena SEC(".maps");
+
+void __arena *alloc_addr;
+
+SEC("syscall")
+int arena_alloc(void *ctx)
+{
+ void __arena *p;
+
+ p = bpf_arena_alloc_pages(&arena, NULL, 4, NUMA_NO_NODE, 0);
+ if (!p)
+ return 1;
+ alloc_addr = p;
+ return 0;
+}
+
+SEC("syscall")
+int arena_free(void *ctx)
+{
+ if (!alloc_addr)
+ return 1;
+ bpf_arena_free_pages(&arena, alloc_addr, 4);
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.43.0
* Re: [PATCH bpf v4 1/2] bpf: Fix use-after-free of arena VMA on fork
2026-04-12 2:27 ` [PATCH bpf v4 1/2] bpf: Fix use-after-free of arena VMA " Weiming Shi
@ 2026-04-12 17:50 ` Emil Tsalapatis
2026-04-12 21:30 ` Alexei Starovoitov
0 siblings, 1 reply; 7+ messages in thread
From: Emil Tsalapatis @ 2026-04-12 17:50 UTC (permalink / raw)
To: Weiming Shi, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Barret Rhoden, Emil Tsalapatis, bpf, linux-kernel, Xiang Mei
On Sat Apr 11, 2026 at 10:27 PM EDT, Weiming Shi wrote:
> arena_vm_open() only increments a refcount on the shared vma_list entry
> but never registers the new VMA or updates the stored vma pointer. When
> the original VMA is unmapped while a forked copy still exists,
> arena_vm_close() drops the refcount without freeing the vma_list entry.
> The entry's vma pointer now refers to a freed vm_area_struct. A
> subsequent bpf_arena_free_pages() call iterates vma_list and passes
> the dangling pointer to zap_page_range_single(), causing a
> use-after-free.
>
> The bug is reachable by any process with CAP_BPF and CAP_PERFMON that
> can create a BPF_MAP_TYPE_ARENA, mmap it, and fork. It triggers
> deterministically -- no race condition is involved.
>
> BUG: KASAN: slab-use-after-free in zap_page_range_single (mm/memory.c:2234)
> Call Trace:
> <TASK>
> zap_page_range_single+0x101/0x110 mm/memory.c:2234
> zap_pages+0x80/0xf0 kernel/bpf/arena.c:658
> arena_free_pages+0x67a/0x860 kernel/bpf/arena.c:712
> bpf_prog_test_run_syscall+0x3da net/bpf/test_run.c:1640
> __sys_bpf+0x1662/0x50b0 kernel/bpf/syscall.c:6267
> __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:6360
> do_syscall_64+0xf1/0x530 arch/x86/entry/syscall_64.c:63
> entry_SYSCALL_64_after_hwframe+0x77 arch/x86/entry/entry_64.S:130
> </TASK>
>
> Fix this by tracking each child VMA separately. arena_vm_open() now
> clears the inherited vm_private_data and calls remember_vma() to
> register a fresh vma_list entry for the new VMA. If remember_vma()
> fails due to OOM, vm_private_data stays NULL and arena_vm_close()
> skips the cleanup for that VMA. The shared refcount is no longer
> needed and is removed.
>
> Also add arena_vm_may_split() returning -EINVAL to prevent VMA
> splitting, so that arena_vm_open() only needs to handle fork and the
> vma_list tracking stays simple.
>
> Fixes: b90d77e5fd78 ("bpf: Fix remap of arena.")
> Reported-by: Xiang Mei <xmei5@asu.edu>
> Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> ---
> kernel/bpf/arena.c | 23 +++++++++++++++++------
> 1 file changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
> index f355cf1c1a16..3462c4463617 100644
> --- a/kernel/bpf/arena.c
> +++ b/kernel/bpf/arena.c
> @@ -317,7 +317,6 @@ static u64 arena_map_mem_usage(const struct bpf_map *map)
> struct vma_list {
> struct vm_area_struct *vma;
> struct list_head head;
> - refcount_t mmap_count;
> };
>
> static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
> @@ -327,7 +326,6 @@ static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
> vml = kmalloc_obj(*vml);
> if (!vml)
> return -ENOMEM;
> - refcount_set(&vml->mmap_count, 1);
> vma->vm_private_data = vml;
> vml->vma = vma;
> list_add(&vml->head, &arena->vma_list);
> @@ -336,9 +334,17 @@ static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
>
> static void arena_vm_open(struct vm_area_struct *vma)
> {
> - struct vma_list *vml = vma->vm_private_data;
> + struct bpf_map *map = vma->vm_file->private_data;
> + struct bpf_arena *arena = container_of(map, struct bpf_arena, map);
>
> - refcount_inc(&vml->mmap_count);
> + /*
> + * vm_private_data points to the parent's vma_list entry after fork.
> + * Clear it and register this VMA separately.
> + */
> + vma->vm_private_data = NULL;
> + guard(mutex)(&arena->lock);
> + /* OOM is silently ignored; arena_vm_close() handles NULL. */
I don't see how this approach can work, and frankly it makes no sense
to me. This patch doesn't take into account how the vma_list is actually
used. Please think through this: if we could silently skip allocating
the vml, why would we need it in the first place?
> + remember_vma(arena, vma);
> }
>
> static void arena_vm_close(struct vm_area_struct *vma)
> @@ -347,10 +353,9 @@ static void arena_vm_close(struct vm_area_struct *vma)
> struct bpf_arena *arena = container_of(map, struct bpf_arena, map);
> struct vma_list *vml = vma->vm_private_data;
>
> - if (!refcount_dec_and_test(&vml->mmap_count))
> + if (!vml)
> return;
> guard(mutex)(&arena->lock);
> - /* update link list under lock */
> list_del(&vml->head);
> vma->vm_private_data = NULL;
> kfree(vml);
> @@ -415,9 +420,15 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
> return VM_FAULT_SIGSEGV;
> }
>
> +static int arena_vm_may_split(struct vm_area_struct *vma, unsigned long addr)
> +{
> + return -EINVAL;
> +}
> +
> static const struct vm_operations_struct arena_vm_ops = {
> .open = arena_vm_open,
> .close = arena_vm_close,
> + .may_split = arena_vm_may_split,
> .fault = arena_vm_fault,
> };
>
* Re: [PATCH bpf v4 1/2] bpf: Fix use-after-free of arena VMA on fork
2026-04-12 17:50 ` Emil Tsalapatis
@ 2026-04-12 21:30 ` Alexei Starovoitov
2026-04-13 10:12 ` Weiming Shi
0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2026-04-12 21:30 UTC (permalink / raw)
To: Emil Tsalapatis
Cc: Weiming Shi, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Barret Rhoden, bpf, LKML, Xiang Mei
On Sun, Apr 12, 2026 at 10:50 AM Emil Tsalapatis <emil@etsalapatis.com> wrote:
>
> On Sat Apr 11, 2026 at 10:27 PM EDT, Weiming Shi wrote:
> > arena_vm_open() only increments a refcount on the shared vma_list entry
> > but never registers the new VMA or updates the stored vma pointer. When
> > the original VMA is unmapped while a forked copy still exists,
> > arena_vm_close() drops the refcount without freeing the vma_list entry.
> > The entry's vma pointer now refers to a freed vm_area_struct. A
> > subsequent bpf_arena_free_pages() call iterates vma_list and passes
> > the dangling pointer to zap_page_range_single(), causing a
> > use-after-free.
> >
> > The bug is reachable by any process with CAP_BPF and CAP_PERFMON that
> > can create a BPF_MAP_TYPE_ARENA, mmap it, and fork. It triggers
> > deterministically -- no race condition is involved.
> >
> > BUG: KASAN: slab-use-after-free in zap_page_range_single (mm/memory.c:2234)
> > Call Trace:
> > <TASK>
> > zap_page_range_single+0x101/0x110 mm/memory.c:2234
> > zap_pages+0x80/0xf0 kernel/bpf/arena.c:658
> > arena_free_pages+0x67a/0x860 kernel/bpf/arena.c:712
> > bpf_prog_test_run_syscall+0x3da net/bpf/test_run.c:1640
> > __sys_bpf+0x1662/0x50b0 kernel/bpf/syscall.c:6267
> > __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:6360
> > do_syscall_64+0xf1/0x530 arch/x86/entry/syscall_64.c:63
> > entry_SYSCALL_64_after_hwframe+0x77 arch/x86/entry/entry_64.S:130
> > </TASK>
> >
> > Fix this by tracking each child VMA separately. arena_vm_open() now
> > clears the inherited vm_private_data and calls remember_vma() to
> > register a fresh vma_list entry for the new VMA. If remember_vma()
> > fails due to OOM, vm_private_data stays NULL and arena_vm_close()
> > skips the cleanup for that VMA. The shared refcount is no longer
> > needed and is removed.
> >
> > Also add arena_vm_may_split() returning -EINVAL to prevent VMA
> > splitting, so that arena_vm_open() only needs to handle fork and the
> > vma_list tracking stays simple.
> >
> > Fixes: b90d77e5fd78 ("bpf: Fix remap of arena.")
> > Reported-by: Xiang Mei <xmei5@asu.edu>
> > Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> > ---
> > kernel/bpf/arena.c | 23 +++++++++++++++++------
> > 1 file changed, 17 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
> > index f355cf1c1a16..3462c4463617 100644
> > --- a/kernel/bpf/arena.c
> > +++ b/kernel/bpf/arena.c
> > @@ -317,7 +317,6 @@ static u64 arena_map_mem_usage(const struct bpf_map *map)
> > struct vma_list {
> > struct vm_area_struct *vma;
> > struct list_head head;
> > - refcount_t mmap_count;
> > };
> >
> > static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
> > @@ -327,7 +326,6 @@ static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
> > vml = kmalloc_obj(*vml);
> > if (!vml)
> > return -ENOMEM;
> > - refcount_set(&vml->mmap_count, 1);
> > vma->vm_private_data = vml;
> > vml->vma = vma;
> > list_add(&vml->head, &arena->vma_list);
> > @@ -336,9 +334,17 @@ static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
> >
> > static void arena_vm_open(struct vm_area_struct *vma)
> > {
> > - struct vma_list *vml = vma->vm_private_data;
> > + struct bpf_map *map = vma->vm_file->private_data;
> > + struct bpf_arena *arena = container_of(map, struct bpf_arena, map);
> >
> > - refcount_inc(&vml->mmap_count);
> > + /*
> > + * vm_private_data points to the parent's vma_list entry after fork.
> > + * Clear it and register this VMA separately.
> > + */
> > + vma->vm_private_data = NULL;
> > + guard(mutex)(&arena->lock);
> > + /* OOM is silently ignored; arena_vm_close() handles NULL. */
>
> I don't see how this approach can work, and frankly it makes no sense
> to me. This patch doesn't take into account how the vma_list is actually
> used. Please think through this: if we could silently skip allocating
> the vml, why would we need it in the first place?
+1
Weiming,
you should stop trusting AI so blindly.
First, analyze the root cause (the first paragraph of the commit log).
Is this really the case?
Second, I copy pasted it to claude and got the same "fix" back,
but implemented without your bug:
+ vml = kmalloc_obj(*vml);
+ if (!vml) {
+ vma->vm_private_data = NULL;
+ return;
+ }
+ vml->vma = vma;
+ vma->vm_private_data = vml;
+ guard(mutex)(&arena->lock);
+ list_add(&vml->head, &arena->vma_list);
at least this part kinda makes sense...
and, of course, this part too:
- if (!refcount_dec_and_test(&vml->mmap_count))
+ if (!vml)
return;
when you look at it you MUST ask AI back:
"Is this buggy?"
and it will reply:
"
Right — silently dropping the VMA from the list means zap_pages()
won't unmap pages from it, which is a correctness problem, not just
degraded behavior. Since vm_open can't fail, the allocation should use
__GFP_NOFAIL. The struct is tiny so that's fine.
"
and it proceeded adding __GFP_NOFAIL.
which is wrong too.
So please don't just throw broken patches at maintainers.
Do your homework. Fixing one maybe-bug and introducing
more real bugs is not a step forward.
pw-bot: cr
* Re: [PATCH bpf v4 1/2] bpf: Fix use-after-free of arena VMA on fork
2026-04-12 21:30 ` Alexei Starovoitov
@ 2026-04-13 10:12 ` Weiming Shi
2026-04-13 18:53 ` Alexei Starovoitov
0 siblings, 1 reply; 7+ messages in thread
From: Weiming Shi @ 2026-04-13 10:12 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Emil Tsalapatis, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Barret Rhoden, bpf, LKML, Xiang Mei
On 26-04-12 14:30, Alexei Starovoitov wrote:
> On Sun, Apr 12, 2026 at 10:50 AM Emil Tsalapatis <emil@etsalapatis.com> wrote:
> >
> > On Sat Apr 11, 2026 at 10:27 PM EDT, Weiming Shi wrote:
> > > arena_vm_open() only increments a refcount on the shared vma_list entry
> > > but never registers the new VMA or updates the stored vma pointer. When
> > > the original VMA is unmapped while a forked copy still exists,
> > > arena_vm_close() drops the refcount without freeing the vma_list entry.
> > > The entry's vma pointer now refers to a freed vm_area_struct. A
> > > subsequent bpf_arena_free_pages() call iterates vma_list and passes
> > > the dangling pointer to zap_page_range_single(), causing a
> > > use-after-free.
> > >
> > > The bug is reachable by any process with CAP_BPF and CAP_PERFMON that
> > > can create a BPF_MAP_TYPE_ARENA, mmap it, and fork. It triggers
> > > deterministically -- no race condition is involved.
> > >
> > > BUG: KASAN: slab-use-after-free in zap_page_range_single (mm/memory.c:2234)
> > > Call Trace:
> > > <TASK>
> > > zap_page_range_single+0x101/0x110 mm/memory.c:2234
> > > zap_pages+0x80/0xf0 kernel/bpf/arena.c:658
> > > arena_free_pages+0x67a/0x860 kernel/bpf/arena.c:712
> > > bpf_prog_test_run_syscall+0x3da net/bpf/test_run.c:1640
> > > __sys_bpf+0x1662/0x50b0 kernel/bpf/syscall.c:6267
> > > __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:6360
> > > do_syscall_64+0xf1/0x530 arch/x86/entry/syscall_64.c:63
> > > entry_SYSCALL_64_after_hwframe+0x77 arch/x86/entry/entry_64.S:130
> > > </TASK>
> > >
> > > Fix this by tracking each child VMA separately. arena_vm_open() now
> > > clears the inherited vm_private_data and calls remember_vma() to
> > > register a fresh vma_list entry for the new VMA. If remember_vma()
> > > fails due to OOM, vm_private_data stays NULL and arena_vm_close()
> > > skips the cleanup for that VMA. The shared refcount is no longer
> > > needed and is removed.
> > >
> > > Also add arena_vm_may_split() returning -EINVAL to prevent VMA
> > > splitting, so that arena_vm_open() only needs to handle fork and the
> > > vma_list tracking stays simple.
> > >
> > > Fixes: b90d77e5fd78 ("bpf: Fix remap of arena.")
> > > Reported-by: Xiang Mei <xmei5@asu.edu>
> > > Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> > > ---
> > > kernel/bpf/arena.c | 23 +++++++++++++++++------
> > > 1 file changed, 17 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
> > > index f355cf1c1a16..3462c4463617 100644
> > > --- a/kernel/bpf/arena.c
> > > +++ b/kernel/bpf/arena.c
> > > @@ -317,7 +317,6 @@ static u64 arena_map_mem_usage(const struct bpf_map *map)
> > > struct vma_list {
> > > struct vm_area_struct *vma;
> > > struct list_head head;
> > > - refcount_t mmap_count;
> > > };
> > >
> > > static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
> > > @@ -327,7 +326,6 @@ static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
> > > vml = kmalloc_obj(*vml);
> > > if (!vml)
> > > return -ENOMEM;
> > > - refcount_set(&vml->mmap_count, 1);
> > > vma->vm_private_data = vml;
> > > vml->vma = vma;
> > > list_add(&vml->head, &arena->vma_list);
> > > @@ -336,9 +334,17 @@ static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
> > >
> > > static void arena_vm_open(struct vm_area_struct *vma)
> > > {
> > > - struct vma_list *vml = vma->vm_private_data;
> > > + struct bpf_map *map = vma->vm_file->private_data;
> > > + struct bpf_arena *arena = container_of(map, struct bpf_arena, map);
> > >
> > > - refcount_inc(&vml->mmap_count);
> > > + /*
> > > + * vm_private_data points to the parent's vma_list entry after fork.
> > > + * Clear it and register this VMA separately.
> > > + */
> > > + vma->vm_private_data = NULL;
> > > + guard(mutex)(&arena->lock);
> > > + /* OOM is silently ignored; arena_vm_close() handles NULL. */
> >
> > I don't see how this approach can work, and frankly it makes no sense
> > to me. This patch doesn't take into account how the vma_list is actually
> > used. Please think through this: if we could silently skip allocating
> > the vml, why would we need it in the first place?
>
> +1
>
> Weiming,
>
> you should stop trusting AI so blindly.
> First, analyze the root cause (the first paragraph of the commit log).
> Is this really the case?
>
> Second, I copy pasted it to claude and got the same "fix" back,
> but implemented without your bug:
> + vml = kmalloc_obj(*vml);
> + if (!vml) {
> + vma->vm_private_data = NULL;
> + return;
> + }
> + vml->vma = vma;
> + vma->vm_private_data = vml;
> + guard(mutex)(&arena->lock);
> + list_add(&vml->head, &arena->vma_list);
>
> at least this part kinda makes sense...
>
> and, of course, this part too:
>
> - if (!refcount_dec_and_test(&vml->mmap_count))
> + if (!vml)
> return;
>
> when you look at it you MUST ask AI back:
> "Is this buggy?"
>
> and it will reply:
> "
> Right — silently dropping the VMA from the list means zap_pages()
> won't unmap pages from it, which is a correctness problem, not just
> degraded behavior. Since vm_open can't fail, the allocation should use
> __GFP_NOFAIL. The struct is tiny so that's fine.
> "
>
> and it proceeded adding __GFP_NOFAIL.
>
> which is wrong too.
>
> So please don't just throw broken patches at maintainers.
> Do your homework. Fixing one maybe-bug and introducing
> more real bugs is not a step forward.
>
> pw-bot: cr
Thanks for the detailed review, really appreciate it.
I traced through it with GDB + KASAN in QEMU. Here's what happens:
1. mmap → remember_vma()
vml->vma = 0xffff88800abfe700, mmap_count = 1
now Parent VMA = 0xffff88800abfe700
2. fork → arena_vm_open(child_vma)
vml->vma = 0xffff88800abfe700 (unchanged), mmap_count = 2
3. parent munmap → arena_vm_close(parent_vma)
mmap_count = 1
vml->vma is now dangling
4. child bpf_arena_free_pages → zap_pages()
reads vml->vma = 0xffff88800abfe700 → UAF
The core issue is that arena_vm_open() never registers the child
VMA -- it only bumps mmap_count. So vml->vma always points at the
parent, and dangles once the parent unmaps.
What approach would you suggest for fixing this?
* Re: [PATCH bpf v4 1/2] bpf: Fix use-after-free of arena VMA on fork
2026-04-13 10:12 ` Weiming Shi
@ 2026-04-13 18:53 ` Alexei Starovoitov
0 siblings, 0 replies; 7+ messages in thread
From: Alexei Starovoitov @ 2026-04-13 18:53 UTC (permalink / raw)
To: Weiming Shi
Cc: Emil Tsalapatis, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Barret Rhoden, bpf, LKML, Xiang Mei
On Mon, Apr 13, 2026 at 3:12 AM Weiming Shi <bestswngs@gmail.com> wrote:
>
> On 26-04-12 14:30, Alexei Starovoitov wrote:
> > On Sun, Apr 12, 2026 at 10:50 AM Emil Tsalapatis <emil@etsalapatis.com> wrote:
> > >
> > > On Sat Apr 11, 2026 at 10:27 PM EDT, Weiming Shi wrote:
> > > > arena_vm_open() only increments a refcount on the shared vma_list entry
> > > > but never registers the new VMA or updates the stored vma pointer. When
> > > > the original VMA is unmapped while a forked copy still exists,
> > > > arena_vm_close() drops the refcount without freeing the vma_list entry.
> > > > The entry's vma pointer now refers to a freed vm_area_struct. A
> > > > subsequent bpf_arena_free_pages() call iterates vma_list and passes
> > > > the dangling pointer to zap_page_range_single(), causing a
> > > > use-after-free.
> > > >
> > > > The bug is reachable by any process with CAP_BPF and CAP_PERFMON that
> > > > can create a BPF_MAP_TYPE_ARENA, mmap it, and fork. It triggers
> > > > deterministically -- no race condition is involved.
> > > >
> > > > BUG: KASAN: slab-use-after-free in zap_page_range_single (mm/memory.c:2234)
> > > > Call Trace:
> > > > <TASK>
> > > > zap_page_range_single+0x101/0x110 mm/memory.c:2234
> > > > zap_pages+0x80/0xf0 kernel/bpf/arena.c:658
> > > > arena_free_pages+0x67a/0x860 kernel/bpf/arena.c:712
> > > > bpf_prog_test_run_syscall+0x3da net/bpf/test_run.c:1640
> > > > __sys_bpf+0x1662/0x50b0 kernel/bpf/syscall.c:6267
> > > > __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:6360
> > > > do_syscall_64+0xf1/0x530 arch/x86/entry/syscall_64.c:63
> > > > entry_SYSCALL_64_after_hwframe+0x77 arch/x86/entry/entry_64.S:130
> > > > </TASK>
> > > >
> > > > Fix this by tracking each child VMA separately. arena_vm_open() now
> > > > clears the inherited vm_private_data and calls remember_vma() to
> > > > register a fresh vma_list entry for the new VMA. If remember_vma()
> > > > fails due to OOM, vm_private_data stays NULL and arena_vm_close()
> > > > skips the cleanup for that VMA. The shared refcount is no longer
> > > > needed and is removed.
> > > >
> > > > Also add arena_vm_may_split() returning -EINVAL to prevent VMA
> > > > splitting, so that arena_vm_open() only needs to handle fork and the
> > > > vma_list tracking stays simple.
> > > >
> > > > Fixes: b90d77e5fd78 ("bpf: Fix remap of arena.")
> > > > Reported-by: Xiang Mei <xmei5@asu.edu>
> > > > Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> > > > ---
> > > > kernel/bpf/arena.c | 23 +++++++++++++++++------
> > > > 1 file changed, 17 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
> > > > index f355cf1c1a16..3462c4463617 100644
> > > > --- a/kernel/bpf/arena.c
> > > > +++ b/kernel/bpf/arena.c
> > > > @@ -317,7 +317,6 @@ static u64 arena_map_mem_usage(const struct bpf_map *map)
> > > > struct vma_list {
> > > > struct vm_area_struct *vma;
> > > > struct list_head head;
> > > > - refcount_t mmap_count;
> > > > };
> > > >
> > > > static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
> > > > @@ -327,7 +326,6 @@ static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
> > > > vml = kmalloc_obj(*vml);
> > > > if (!vml)
> > > > return -ENOMEM;
> > > > - refcount_set(&vml->mmap_count, 1);
> > > > vma->vm_private_data = vml;
> > > > vml->vma = vma;
> > > > list_add(&vml->head, &arena->vma_list);
> > > > @@ -336,9 +334,17 @@ static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
> > > >
> > > > static void arena_vm_open(struct vm_area_struct *vma)
> > > > {
> > > > - struct vma_list *vml = vma->vm_private_data;
> > > > + struct bpf_map *map = vma->vm_file->private_data;
> > > > + struct bpf_arena *arena = container_of(map, struct bpf_arena, map);
> > > >
> > > > - refcount_inc(&vml->mmap_count);
> > > > + /*
> > > > + * vm_private_data points to the parent's vma_list entry after fork.
> > > > + * Clear it and register this VMA separately.
> > > > + */
> > > > + vma->vm_private_data = NULL;
> > > > + guard(mutex)(&arena->lock);
> > > > + /* OOM is silently ignored; arena_vm_close() handles NULL. */
> > >
> > > I don't see how this approach is going to work, and frankly it makes
> > > no sense to me. This patch doesn't take into account how the vma_list
> > > is actually used. Please think through this: if we could silently
> > > just not allocate the vml, why would we need it in the first place?
> >
> > +1
> >
> > Weiming,
> >
> > you should stop trusting AI so blindly.
> > First, analyze the root cause (the first paragraph of the commit log).
> > Is this really the case?
> >
> > Second, I copy pasted it to claude and got the same "fix" back,
> > but implemented without your bug:
> > + vml = kmalloc_obj(*vml);
> > + if (!vml) {
> > + vma->vm_private_data = NULL;
> > + return;
> > + }
> > + vml->vma = vma;
> > + vma->vm_private_data = vml;
> > + guard(mutex)(&arena->lock);
> > + list_add(&vml->head, &arena->vma_list);
> >
> > at least this part kinda makes sense...
> >
> > and, of course, this part too:
> >
> > - if (!refcount_dec_and_test(&vml->mmap_count))
> > + if (!vml)
> > return;
> >
> > when you look at it you MUST ask AI back:
> > "Is this buggy?"
> >
> > and it will reply:
> > "
> > Right — silently dropping the VMA from the list means zap_pages()
> > won't unmap pages from it, which is a correctness problem, not just
> > degraded behavior. Since vm_open can't fail, the allocation should use
> > __GFP_NOFAIL. The struct is tiny so that's fine.
> > "
> >
> > and it proceeded adding __GFP_NOFAIL.
> >
> > which is wrong too.
> >
> > So please don't just throw broken patches at maintainers.
> > Do your homework. Fixing one maybe-bug and introducing
> > more real bugs is not a step forward.
> >
> > pw-bot: cr
>
> Thanks for the detailed review, really appreciate it.
>
> I traced through it with GDB + KASAN in QEMU. Here's what happens:
>
> 1. mmap → remember_vma()
> vml->vma = 0xffff88800abfe700 (the parent VMA), mmap_count = 1
> 2. fork → arena_vm_open(child_vma)
> vml->vma = 0xffff88800abfe700 (unchanged), mmap_count = 2
>
> 3. parent munmap → arena_vm_close(parent_vma)
> mmap_count = 1
> vml->vma is now dangling
>
> 4. child bpf_arena_free_pages → zap_pages()
> reads vml->vma = 0xffff88800abfe700 → UAF
>
> The core issue is that arena_vm_open() never registers the child
> VMA -- it only bumps mmap_count. So vml->vma always points at
> the parent, and dangles once the parent unmaps.
>
> What approach would you suggest for fixing this?
I'm thinking of just doing this:
diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index f355cf1c1a16..a4f1df1bf0f4 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -489,7 +489,7 @@ static int arena_map_mmap(struct bpf_map *map,
struct vm_area_struct *vma)
* clears VM_MAYEXEC. Set VM_DONTEXPAND as well to avoid
* potential change of user_vm_start.
*/
- vm_flags_set(vma, VM_DONTEXPAND);
+ vm_flags_set(vma, VM_DONTEXPAND | VM_DONTCOPY);
vma->vm_ops = &arena_vm_ops;
return 0;
}
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-04-13 18:53 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-12 2:27 [PATCH bpf v4 0/2] bpf: Fix arena VMA use-after-free on fork Weiming Shi
2026-04-12 2:27 ` [PATCH bpf v4 1/2] bpf: Fix use-after-free of arena VMA " Weiming Shi
2026-04-12 17:50 ` Emil Tsalapatis
2026-04-12 21:30 ` Alexei Starovoitov
2026-04-13 10:12 ` Weiming Shi
2026-04-13 18:53 ` Alexei Starovoitov
2026-04-12 2:27 ` [PATCH bpf v4 2/2] selftests/bpf: Add test for arena VMA use-after-free " Weiming Shi