* Race condition in bpf_arena fault handler leads to page table / range tree desynchronization
@ 2026-05-17  6:23 Afi0
  2026-05-17  7:09 ` Greg KH
  0 siblings, 1 reply; 2+ messages in thread
From: Afi0 @ 2026-05-17  6:23 UTC (permalink / raw)
  To: security; +Cc: linux-kernel, bpf, ast, andrii, Greg KH


[-- Attachment #1.1: Type: text/plain, Size: 2046 bytes --]

Hi list,

Apologies for initially sending only to Greg. Resending to the full list as
requested.
------------------------------

Component: kernel/bpf/arena.c

Function: arena_vm_fault()

Affected versions: Linux kernel 6.9+

Type: TOCTOU / Race condition

CVSS 3.1: AV:L/AC:H/PR:L/UI:N/S:C/C:H/I:H/A:H - 7.8 (High)

SUMMARY

A TOCTOU race condition exists in arena_vm_fault() between the
vmalloc_to_page() check and the subsequent range_tree_clear() call. Both
operations are intended to be atomic with respect to page allocation state,
but are not protected by a common critical section. This leads to
desynchronization between kernel virtual memory mappings and the arena
internal range tree allocator, resulting in a physical page remaining
accessible through a user VMA after being freed back to the page allocator.
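As a userspace illustration only (not kernel code), the check-then-act
pattern described above can be replayed deterministically in one thread.
Every name below (toy_arena, toy_fault, toy_concurrent_populate, toy_replay)
is hypothetical, and the "owners" counter merely stands in for "how many
actors believe they populated this page offset":

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGES 4

/* Toy model, not kernel code: owners[i] counts the actors that
 * believe they populated page offset i. */
struct toy_arena {
	int owners[TOY_PAGES];
};

/* Hypothetical hook: work done by another actor inside the window. */
typedef void (*toy_hook)(struct toy_arena *a, size_t pgoff);

/* Stands in for a concurrent path that populates the same offset. */
static void toy_concurrent_populate(struct toy_arena *a, size_t pgoff)
{
	a->owners[pgoff]++;
}

/* Check-then-act without a common critical section.
 * Returns the owner count for pgoff after the call. */
static int toy_fault(struct toy_arena *a, size_t pgoff, toy_hook window)
{
	assert(pgoff < TOY_PAGES);

	if (a->owners[pgoff] > 0)	/* step 1: check */
		return a->owners[pgoff];

	if (window)
		window(a, pgoff);	/* another actor runs in the gap */

	a->owners[pgoff]++;		/* step 2: act on the stale check */
	return a->owners[pgoff];
}

/* Replay one fault on offset 0, with or without the interleaving. */
static int toy_replay(int with_race)
{
	struct toy_arena a = { { 0 } };

	return toy_fault(&a, 0, with_race ? toy_concurrent_populate : NULL);
}
```

In this model toy_replay(0) leaves a single owner, while toy_replay(1)
leaves two, because the second step trusts a check that is no longer true.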

VULNERABLE CODE

arena_vm_fault() in kernel/bpf/arena.c:

page = vmalloc_to_page((void *)kaddr);
if (page)
	goto out;
[race window: concurrent arena_alloc_pages() can map a page at same pgoff here]
ret = range_tree_clear(&arena->rt, vmf->pgoff, 1);

IMPACT

The range tree reports the pgoff as available while the PTE remains
populated. arena_free_pages() may then free the physical page while the
user VMA mapping persists, so the page is returned to the page allocator
yet stays accessible through the user mapping. Observed as a segfault
(error 4) in dmesg.

TRIGGER

Reachable unprivileged when kernel.unprivileged_bpf_disabled=0 (default on
Ubuntu < 23.04, Debian, Fedora). With CAP_BPF it is always reachable. The
race needs two concurrent operations on the same pgoff: Thread A faults the
page in through the mmap'ed region, and Thread B calls
bpf_arena_free_pages() from a sleepable BPF prog during the window.

SUGGESTED FIX

The vmalloc_to_page() check and range_tree_clear() must occur within the
same critical section. arena->lock is already used by arena_vm_open/close
and is appropriate here. arena_vm_fault() is sleepable, so taking a mutex
is safe.
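
The "one critical section" idea can be sketched in userspace with POSIX
threads. This is only an analogy and not the kernel change itself:
toy_slot, toy_claim and toy_run_locked are hypothetical names, and a
pthread mutex stands in for arena->lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define TOY_THREADS 8

/* Toy model, not kernel code: one slot guarded by one mutex. */
struct toy_slot {
	pthread_mutex_t lock;
	bool claimed;	/* stands in for "this pgoff already has a page" */
	int winners;	/* callers that believed they claimed the slot */
};

/* Check and claim inside a single critical section. */
static void *toy_claim(void *arg)
{
	struct toy_slot *s = arg;

	pthread_mutex_lock(&s->lock);
	if (!s->claimed) {		/* check */
		s->claimed = true;	/* claim, still under the lock */
		s->winners++;
	}
	pthread_mutex_unlock(&s->lock);
	return NULL;
}

/* Start TOY_THREADS contenders for the same slot and return how
 * many of them won the claim. */
static int toy_run_locked(void)
{
	struct toy_slot s = { .claimed = false, .winners = 0 };
	pthread_t tid[TOY_THREADS];
	int started = 0;

	pthread_mutex_init(&s.lock, NULL);

	for (int i = 0; i < TOY_THREADS; i++) {
		/* Only count threads that were actually created. */
		if (pthread_create(&tid[started], NULL, toy_claim, &s) == 0)
			started++;
	}
	assert(started > 0);

	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	pthread_mutex_destroy(&s.lock);
	return s.winners;
}
```

Because the check and the claim share the lock, toy_run_locked() reports
exactly one winner regardless of scheduling. Build with -pthread on
toolchains that require it.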

Patch attached as 0001-bpf-arena-fix-TOCTOU-race-in-arena_vm_fault.patch

Fixes: a7d032218a53 ("bpf: Introduce bpf_arena")

Thanks,

Afi0

[-- Attachment #1.2: Type: text/html, Size: 6368 bytes --]

[-- Attachment #2: 0001-bpf-arena-fix-TOCTOU-race-in-arena_vm_fault.patch --]
[-- Type: text/x-patch, Size: 3202 bytes --]

From a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2 Mon Sep 17 00:00:00 2001
From: Afi0 <capyenglishlite@gmail.com>
Date: Sat, 16 May 2026 11:58:00 +0000
Subject: [PATCH] bpf: arena: fix TOCTOU race in arena_vm_fault()

The vmalloc_to_page() check and range_tree_clear() in arena_vm_fault()
are not protected by a common critical section. A concurrent
bpf_arena_free_pages() call on the same pgoff can return the physical
page to the allocator between these two operations. arena_vm_fault()
then inserts a stale or already-freed page into the user PTE, resulting
in a SIGSEGV on next access or a silent use-after-free.

Fix: acquire arena->lock before vmalloc_to_page() and hold it through
range_tree_clear(), making the check-and-claim atomic with respect to
concurrent allocators and free operations.

arena->lock is a mutex already used by arena_vm_open() and
arena_vm_close() for vma_list serialization. Reusing it here is
consistent with the existing locking model and avoids introducing a
new lock. arena_vm_fault() runs in page fault context with
mmap_read_lock held and is sleepable, so taking a mutex is safe.

The pte_none() check inside apply_range_set_cb() is not a sufficient
guard: it prevents double-mapping but does not prevent the range tree
desynchronization that occurs when the race is lost, leaving pgoff
marked free while the PTE remains populated.

Fixes: a7d032218a53 ("bpf: Introduce bpf_arena")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Afi0 <capyenglishlite@gmail.com>
---
 kernel/bpf/arena.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index a1b2c3d..e4f5c6d 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -XXX,7 +XXX,7 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
 	struct bpf_map *map = vmf->vma->vm_file->private_data;
 	struct bpf_arena *arena = container_of(map, struct bpf_arena, map);
 	struct page *page;
-	long kbase, kaddr;
+	long kbase, kaddr;
 	int ret;
 
 	kbase = bpf_arena_get_kern_vm_start(arena);
@@ -XXX,12 +XXX,24 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
 	kbase = bpf_arena_get_kern_vm_start(arena);
 	kaddr = kbase + (u32)(vmf->address);
 
+	/*
+	 * Acquire arena->lock before vmalloc_to_page() and hold it through
+	 * range_tree_clear() to close the TOCTOU window.
+	 *
+	 * Without this lock, a concurrent bpf_arena_free_pages() on the
+	 * same pgoff can run between vmalloc_to_page() returning NULL and
+	 * range_tree_clear() completing:
+	 *
+	 *   arena_vm_fault()              bpf_arena_free_pages()
+	 *   vmalloc_to_page() = NULL
+	 *   [window]                      page freed, PTE zeroed in kern vma
+	 *   range_tree_clear(pgoff)
+	 *   alloc_page() + vm_insert_page() -> stale PTE in user vma
+	 *
+	 * The user VMA then holds a reference to a freed physical page.
+	 * Next access produces SIGSEGV or silent use-after-free.
+	 */
+	guard(mutex)(&arena->lock);
+
 	page = vmalloc_to_page((void *)kaddr);
 	if (page)
 		goto out;
-
 	ret = range_tree_clear(&arena->rt, vmf->pgoff, 1);
 	if (ret)
 		return VM_FAULT_SIGBUS;
-- 
2.39.0

* Re: Race condition in bpf_arena fault handler leads to page table / range tree desynchronization
  2026-05-17  6:23 Race condition in bpf_arena fault handler leads to page table / range tree desynchronization Afi0
@ 2026-05-17  7:09 ` Greg KH
  0 siblings, 0 replies; 2+ messages in thread
From: Greg KH @ 2026-05-17  7:09 UTC (permalink / raw)
  To: Afi0; +Cc: security, linux-kernel, bpf, ast, andrii

On Sun, May 17, 2026 at 06:23:24AM +0000, Afi0 wrote:
> Hi list,
> 
> Apologies for initially sending only to Greg. Resending to the full list as
> requested.

Again, can you resend with your real name and not in html format?

thanks,

greg k-h
