public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf v3 0/2] bpf: Fix arena VMA use-after-free on fork
@ 2026-04-11 20:08 Weiming Shi
  2026-04-11 20:08 ` [PATCH bpf v3 1/2] bpf: Fix use-after-free of arena VMA " Weiming Shi
  2026-04-11 20:08 ` [PATCH bpf v3 2/2] selftests/bpf: Add test for arena VMA use-after-free " Weiming Shi
  0 siblings, 2 replies; 4+ messages in thread
From: Weiming Shi @ 2026-04-11 20:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Barret Rhoden, Emil Tsalapatis, bpf, Xiang Mei, Weiming Shi

arena_vm_open() only increments a refcount on the shared vma_list entry
but never registers the new VMA. After fork + parent munmap, vml->vma
becomes a dangling pointer. bpf_arena_free_pages() -> zap_pages() then
dereferences it, causing a slab use-after-free in zap_page_range_single().

Patch 1 fixes the bug by tracking each child VMA separately in
arena_vm_open, and adds arena_vm_may_split() to prevent VMA splitting.
Patch 2 adds a selftest that reproduces the issue (requires KASAN to
detect the UAF).

v3:
- Added arena_vm_may_split() to prevent VMA splitting
- Reused remember_vma() in arena_vm_open() and removed HugeTLB references
- selftests: fixed copyright, trimmed comments, switched to sysconf()

v2:
- Added missing Reported-by tag

Weiming Shi (2):
  bpf: Fix use-after-free of arena VMA on fork
  selftests/bpf: Add test for arena VMA use-after-free on fork

 kernel/bpf/arena.c                            | 23 ++++--
 .../selftests/bpf/prog_tests/arena_fork.c     | 80 +++++++++++++++++++
 .../testing/selftests/bpf/progs/arena_fork.c  | 41 ++++++++++
 3 files changed, 138 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/arena_fork.c
 create mode 100644 tools/testing/selftests/bpf/progs/arena_fork.c

-- 
2.43.0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH bpf v3 1/2] bpf: Fix use-after-free of arena VMA on fork
  2026-04-11 20:08 [PATCH bpf v3 0/2] bpf: Fix arena VMA use-after-free on fork Weiming Shi
@ 2026-04-11 20:08 ` Weiming Shi
  2026-04-11 20:47   ` bot+bpf-ci
  2026-04-11 20:08 ` [PATCH bpf v3 2/2] selftests/bpf: Add test for arena VMA use-after-free " Weiming Shi
  1 sibling, 1 reply; 4+ messages in thread
From: Weiming Shi @ 2026-04-11 20:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Barret Rhoden, Emil Tsalapatis, bpf, Xiang Mei, Weiming Shi

arena_vm_open() only increments a refcount on the shared vma_list entry
but never registers the new VMA or updates the stored vma pointer. When
the original VMA is unmapped while a forked copy still exists,
arena_vm_close() drops the refcount without freeing the vma_list entry.
The entry's vma pointer now refers to a freed vm_area_struct. A
subsequent bpf_arena_free_pages() call iterates vma_list and passes
the dangling pointer to zap_page_range_single(), causing a
use-after-free.

The bug is reachable by any process with CAP_BPF and CAP_PERFMON that
can create a BPF_MAP_TYPE_ARENA, mmap it, and fork. It triggers
deterministically -- no race condition is involved.

 BUG: KASAN: slab-use-after-free in zap_page_range_single (mm/memory.c:2234)
 Call Trace:
  <TASK>
  zap_page_range_single+0x101/0x110   mm/memory.c:2234
  zap_pages+0x80/0xf0                 kernel/bpf/arena.c:658
  arena_free_pages+0x67a/0x860        kernel/bpf/arena.c:712
  bpf_prog_test_run_syscall+0x3da     net/bpf/test_run.c:1640
  __sys_bpf+0x1662/0x50b0             kernel/bpf/syscall.c:6267
  __x64_sys_bpf+0x73/0xb0             kernel/bpf/syscall.c:6360
  do_syscall_64+0xf1/0x530            arch/x86/entry/syscall_64.c:63
  entry_SYSCALL_64_after_hwframe+0x77  arch/x86/entry/entry_64.S:130
  </TASK>

Fix this by tracking each child VMA separately. arena_vm_open() now
clears the inherited vm_private_data and calls remember_vma() to
register a fresh vma_list entry for the new VMA. arena_vm_close()
unconditionally removes and frees the entry. The shared refcount is
no longer needed and is removed.

Also add arena_vm_may_split() returning -EINVAL to prevent VMA
splitting, which would break the pgoff arithmetic in arena_vm_fault().

Fixes: b90d77e5fd78 ("bpf: Fix remap of arena.")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
 kernel/bpf/arena.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index f355cf1c1a16..3462c4463617 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -317,7 +317,6 @@ static u64 arena_map_mem_usage(const struct bpf_map *map)
 struct vma_list {
 	struct vm_area_struct *vma;
 	struct list_head head;
-	refcount_t mmap_count;
 };
 
 static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
@@ -327,7 +326,6 @@ static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
 	vml = kmalloc_obj(*vml);
 	if (!vml)
 		return -ENOMEM;
-	refcount_set(&vml->mmap_count, 1);
 	vma->vm_private_data = vml;
 	vml->vma = vma;
 	list_add(&vml->head, &arena->vma_list);
@@ -336,9 +334,17 @@ static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
 
 static void arena_vm_open(struct vm_area_struct *vma)
 {
-	struct vma_list *vml = vma->vm_private_data;
+	struct bpf_map *map = vma->vm_file->private_data;
+	struct bpf_arena *arena = container_of(map, struct bpf_arena, map);
 
-	refcount_inc(&vml->mmap_count);
+	/*
+	 * vm_private_data points to the parent's vma_list entry after fork.
+	 * Clear it and register this VMA separately.
+	 */
+	vma->vm_private_data = NULL;
+	guard(mutex)(&arena->lock);
+	/* OOM is silently ignored; arena_vm_close() handles NULL. */
+	remember_vma(arena, vma);
 }
 
 static void arena_vm_close(struct vm_area_struct *vma)
@@ -347,10 +353,9 @@ static void arena_vm_close(struct vm_area_struct *vma)
 	struct bpf_arena *arena = container_of(map, struct bpf_arena, map);
 	struct vma_list *vml = vma->vm_private_data;
 
-	if (!refcount_dec_and_test(&vml->mmap_count))
+	if (!vml)
 		return;
 	guard(mutex)(&arena->lock);
-	/* update link list under lock */
 	list_del(&vml->head);
 	vma->vm_private_data = NULL;
 	kfree(vml);
@@ -415,9 +420,15 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
 	return VM_FAULT_SIGSEGV;
 }
 
+static int arena_vm_may_split(struct vm_area_struct *vma, unsigned long addr)
+{
+	return -EINVAL;
+}
+
 static const struct vm_operations_struct arena_vm_ops = {
 	.open		= arena_vm_open,
 	.close		= arena_vm_close,
+	.may_split	= arena_vm_may_split,
 	.fault          = arena_vm_fault,
 };
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH bpf v3 2/2] selftests/bpf: Add test for arena VMA use-after-free on fork
  2026-04-11 20:08 [PATCH bpf v3 0/2] bpf: Fix arena VMA use-after-free on fork Weiming Shi
  2026-04-11 20:08 ` [PATCH bpf v3 1/2] bpf: Fix use-after-free of arena VMA " Weiming Shi
@ 2026-04-11 20:08 ` Weiming Shi
  1 sibling, 0 replies; 4+ messages in thread
From: Weiming Shi @ 2026-04-11 20:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Barret Rhoden, Emil Tsalapatis, bpf, Xiang Mei, Weiming Shi

Add a selftest that reproduces the arena VMA use-after-free fixed in
the previous commit. The test creates an arena, mmaps it, allocates
pages via BPF, forks, has the parent munmap the arena, then has the
child call bpf_arena_free_pages. Without the fix this triggers a
KASAN slab-use-after-free in zap_page_range_single.

Note: the UAF occurs entirely in kernel space and is not observable
from userspace, so this test relies on KASAN to detect the bug.
Without KASAN the test passes regardless of whether the fix is
applied.

Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
 .../selftests/bpf/prog_tests/arena_fork.c     | 80 +++++++++++++++++++
 .../testing/selftests/bpf/progs/arena_fork.c  | 41 ++++++++++
 2 files changed, 121 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/arena_fork.c
 create mode 100644 tools/testing/selftests/bpf/progs/arena_fork.c

diff --git a/tools/testing/selftests/bpf/prog_tests/arena_fork.c b/tools/testing/selftests/bpf/prog_tests/arena_fork.c
new file mode 100644
index 000000000000..0235884c5906
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/arena_fork.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+
+/*
+ * Test that forking a process with an arena mmap does not cause a
+ * use-after-free when the parent unmaps and the child frees arena pages.
+ */
+#include <test_progs.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "arena_fork.skel.h"
+
+void test_arena_fork(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts);
+	struct bpf_map_info info = {};
+	__u32 info_len = sizeof(info);
+	struct arena_fork *skel;
+	long page_size;
+	size_t arena_sz;
+	void *arena_addr;
+	int arena_fd, ret, status;
+	pid_t pid;
+
+	page_size = sysconf(_SC_PAGESIZE);
+	if (!ASSERT_GT(page_size, 0, "page_size"))
+		return;
+
+	skel = arena_fork__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		return;
+
+	arena_fd = bpf_map__fd(skel->maps.arena);
+
+	/* libbpf mmaps the arena via initial_value */
+	arena_addr = bpf_map__initial_value(skel->maps.arena, &arena_sz);
+	if (!ASSERT_OK_PTR(arena_addr, "arena_mmap"))
+		goto out;
+
+	/* Get real arena byte size for munmap */
+	ASSERT_OK(bpf_map_get_info_by_fd(arena_fd, &info, &info_len), "map_info");
+	arena_sz = (size_t)info.max_entries * page_size;
+
+	/* Allocate 4 pages in the arena via BPF */
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.arena_alloc),
+				     &opts);
+	if (!ASSERT_OK(ret, "alloc_run") ||
+	    !ASSERT_OK(opts.retval, "alloc_ret"))
+		goto out;
+
+	/* Fault in a page so zap_pages has work to do */
+	((volatile char *)arena_addr)[0] = 'A';
+
+	/* Fork: child inherits the arena VMA */
+	pid = fork();
+	if (!ASSERT_GE(pid, 0, "fork"))
+		goto out;
+
+	if (pid == 0) {
+		/* Child: parent will unmap first, then we free pages. */
+		LIBBPF_OPTS(bpf_test_run_opts, child_opts);
+		int free_fd = bpf_program__fd(skel->progs.arena_free);
+
+		usleep(200000); /* let parent munmap first */
+
+		ret = bpf_prog_test_run_opts(free_fd, &child_opts);
+		_exit(ret || child_opts.retval);
+	}
+
+	/* Parent: unmap the arena */
+	ASSERT_OK(munmap(arena_addr, arena_sz), "munmap");
+
+	ASSERT_EQ(waitpid(pid, &status, 0), pid, "waitpid");
+	ASSERT_TRUE(WIFEXITED(status), "child_exited");
+	ASSERT_EQ(WEXITSTATUS(status), 0, "child_exit_code");
+out:
+	arena_fork__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/arena_fork.c b/tools/testing/selftests/bpf/progs/arena_fork.c
new file mode 100644
index 000000000000..783c935a0af8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/arena_fork.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_arena_common.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARENA);
+	__uint(map_flags, BPF_F_MMAPABLE);
+	__uint(max_entries, 16); /* number of pages */
+#ifdef __TARGET_ARCH_arm64
+	__ulong(map_extra, 0x1ull << 32); /* start of mmap() region */
+#else
+	__ulong(map_extra, 0x1ull << 44); /* start of mmap() region */
+#endif
+} arena SEC(".maps");
+
+void __arena *alloc_addr;
+
+SEC("syscall")
+int arena_alloc(void *ctx)
+{
+	void __arena *p;
+
+	p = bpf_arena_alloc_pages(&arena, NULL, 4, NUMA_NO_NODE, 0);
+	if (!p)
+		return 1;
+	alloc_addr = p;
+	return 0;
+}
+
+SEC("syscall")
+int arena_free(void *ctx)
+{
+	if (!alloc_addr)
+		return 1;
+	bpf_arena_free_pages(&arena, alloc_addr, 4);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH bpf v3 1/2] bpf: Fix use-after-free of arena VMA on fork
  2026-04-11 20:08 ` [PATCH bpf v3 1/2] bpf: Fix use-after-free of arena VMA " Weiming Shi
@ 2026-04-11 20:47   ` bot+bpf-ci
  0 siblings, 0 replies; 4+ messages in thread
From: bot+bpf-ci @ 2026-04-11 20:47 UTC (permalink / raw)
  To: bestswngs, ast, daniel, andrii
  Cc: martin.lau, eddyz87, song, yonghong.song, john.fastabend, kpsingh,
	sdf, haoluo, jolsa, brho, emil, bpf, xmei5, bestswngs, ast,
	andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 2477 bytes --]

> Fix this by tracking each child VMA separately. arena_vm_open() now
> clears the inherited vm_private_data and calls remember_vma() to
> register a fresh vma_list entry for the new VMA. arena_vm_close()
> unconditionally removes and frees the entry.

Should "unconditionally" be "conditionally" here?  When
remember_vma() fails due to OOM in arena_vm_open(), vm_private_data
stays NULL, and arena_vm_close() returns early without removing or
freeing anything:

> diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
> index f355cf1c1a16..3462c4463617 100644
> --- a/kernel/bpf/arena.c
> +++ b/kernel/bpf/arena.c

[ ... ]

> @@ -336,9 +334,17 @@ static int remember_vma(struct bpf_arena *arena, struct vm_area_struct *vma)
>  static void arena_vm_open(struct vm_area_struct *vma)
>  {
> +	struct bpf_map *map = vma->vm_file->private_data;
> +	struct bpf_arena *arena = container_of(map, struct bpf_arena, map);
> +
> +	vma->vm_private_data = NULL;
> +	guard(mutex)(&arena->lock);
> +	/* OOM is silently ignored; arena_vm_close() handles NULL. */
> +	remember_vma(arena, vma);
>  }

[ ... ]

> @@ -347,10 +353,9 @@ static void arena_vm_close(struct vm_area_struct *vma)
>  	struct bpf_arena *arena = container_of(map, struct bpf_arena, map);
>  	struct vma_list *vml = vma->vm_private_data;
>
> +	if (!vml)
>  		return;

arena_vm_close() returns here when vml is NULL, so it does not
unconditionally remove and free the entry.

> Also add arena_vm_may_split() returning -EINVAL to prevent VMA
> splitting, which would break the pgoff arithmetic in arena_vm_fault().

Would splitting actually break the pgoff arithmetic?
arena_vm_fault() uses vmf->pgoff for range_tree operations and
kbase + (u32)(vmf->address) for the kernel mapping.  Both of these
use values that are independent of vm_start and vm_pgoff adjustments
made by __split_vma().

Similarly, compute_pgoff() computes from absolute addresses:

    (u32)(uaddr - (u32)arena->user_vm_start) >> PAGE_SHIFT

And zap_pages() iterates all VMAs in the list, with
unmap_single_vma() clamping each call to the individual VMA's
bounds.

Preventing splitting still seems like the right call for keeping the
VMA tracking simple, but could the commit message give a more
accurate reason?


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24290771286

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-04-11 20:47 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-11 20:08 [PATCH bpf v3 0/2] bpf: Fix arena VMA use-after-free on fork Weiming Shi
2026-04-11 20:08 ` [PATCH bpf v3 1/2] bpf: Fix use-after-free of arena VMA " Weiming Shi
2026-04-11 20:47   ` bot+bpf-ci
2026-04-11 20:08 ` [PATCH bpf v3 2/2] selftests/bpf: Add test for arena VMA use-after-free " Weiming Shi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox