BPF List
* [PATCH v2 0/4] libbpf: move arena variables out of the zero page
@ 2025-11-18  3:00 Emil Tsalapatis
  2025-11-18  3:00 ` [PATCH v2 1/4] selftests/bpf: explicitly account for globals in verifier_arena_large Emil Tsalapatis
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Emil Tsalapatis @ 2025-11-18  3:00 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, john.fastabend, memxor, andrii, eddyz87,
	yonghong.song, Emil Tsalapatis

Modify libbpf to place arena globals at a small offset into the arena
mapping instead of at the very beginning. This allows programs to leave
the "zero page" of the arena unmapped, so that NULL arena pointer
dereferences trigger a page fault and an associated backtrace in BPF
streams. In contrast, the current policy of placing global data in the
zero page means that NULL dereferences silently corrupt global data,
e.g., arena qspinlock state, making arena bugs more difficult to debug.

The patchset adds code to libbpf to move global arena data 16 pages into
the arena mapping. If that offset does not fit, libbpf tries
progressively smaller power-of-two offsets, finally defaulting to 0 if
there is not enough space in the arena. At load time, libbpf adjusts
each symbol's location within the arena by that offset. The patchset
also adds padding to the BPF skeleton's arena datasec struct so that
its fields point to the right locations within the mapping.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>

HISTORY
-------

v1->v2
------

v1: https://lore.kernel.org/bpf/20251117235636.140259-1-emil@etsalapatis.com/

- Fix ifdef guards causing unused variable errors for
  architectures/compilers without addr_space_cast support (CI)

Emil Tsalapatis (4):
  selftests/bpf: explicitly account for globals in verifier_arena_large
  libbpf: add stub for offset-related skeleton padding
  libbpf: offset global arena data into the arena if possible
  selftests/bpf: add tests for the arena offset of globals

 tools/bpf/bpftool/gen.c                       | 23 ++++++-
 tools/lib/bpf/libbpf.c                        | 36 ++++++++++-
 tools/lib/bpf/libbpf.h                        |  9 +++
 tools/lib/bpf/libbpf.map                      |  1 +
 .../selftests/bpf/prog_tests/verifier.c       |  6 ++
 .../bpf/progs/verifier_arena_globals1.c       | 60 ++++++++++++++++++
 .../bpf/progs/verifier_arena_globals2.c       | 49 +++++++++++++++
 .../bpf/progs/verifier_arena_globals3.c       | 61 +++++++++++++++++++
 .../bpf/progs/verifier_arena_large.c          | 25 ++++++--
 9 files changed, 261 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_arena_globals1.c
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_arena_globals2.c
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_arena_globals3.c

-- 
2.49.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/4] selftests/bpf: explicitly account for globals in verifier_arena_large
  2025-11-18  3:00 [PATCH v2 0/4] libbpf: move arena variables out of the zero page Emil Tsalapatis
@ 2025-11-18  3:00 ` Emil Tsalapatis
  2025-11-22  1:44   ` Eduard Zingerman
  2025-11-18  3:00 ` [PATCH v2 2/4] libbpf: add stub for offset-related skeleton padding Emil Tsalapatis
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Emil Tsalapatis @ 2025-11-18  3:00 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, john.fastabend, memxor, andrii, eddyz87,
	yonghong.song, Emil Tsalapatis

The big_alloc1 test in verifier_arena_large assumes that the arena base
and the first page allocated by bpf_arena_alloc_pages are identical.
This is not the case, because the first page of the arena is populated
by global arena data. The test still passes because the code makes the
tacit assumption that the first page is at offset PAGE_SIZE instead of
0.

Make this distinction explicit in the code, and adjust the page offsets
requested during the test to count from the beginning of the arena
instead of using the address of the first allocated page.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
---
 .../selftests/bpf/progs/verifier_arena_large.c    | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_large.c b/tools/testing/selftests/bpf/progs/verifier_arena_large.c
index f19e15400b3e..bd430a34c3ab 100644
--- a/tools/testing/selftests/bpf/progs/verifier_arena_large.c
+++ b/tools/testing/selftests/bpf/progs/verifier_arena_large.c
@@ -23,18 +23,25 @@ int big_alloc1(void *ctx)
 {
 #if defined(__BPF_FEATURE_ADDR_SPACE_CAST)
 	volatile char __arena *page1, *page2, *no_page, *page3;
-	void __arena *base;
+	u64 base;
 
-	page1 = base = bpf_arena_alloc_pages(&arena, NULL, 1, NUMA_NO_NODE, 0);
+	base = (u64)arena_base(&arena);
+
+	page1 = bpf_arena_alloc_pages(&arena, NULL, 1, NUMA_NO_NODE, 0);
 	if (!page1)
 		return 1;
+
+	/* Account for global arena data. */
+	if ((u64)page1 != base + PAGE_SIZE)
+		return 15;
+
 	*page1 = 1;
-	page2 = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE * 2,
+	page2 = bpf_arena_alloc_pages(&arena, (void __arena *)(ARENA_SIZE - PAGE_SIZE),
 				      1, NUMA_NO_NODE, 0);
 	if (!page2)
 		return 2;
 	*page2 = 2;
-	no_page = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE,
+	no_page = bpf_arena_alloc_pages(&arena, (void __arena *)ARENA_SIZE,
 					1, NUMA_NO_NODE, 0);
 	if (no_page)
 		return 3;
-- 
2.49.0



* [PATCH v2 2/4] libbpf: add stub for offset-related skeleton padding
  2025-11-18  3:00 [PATCH v2 0/4] libbpf: move arena variables out of the zero page Emil Tsalapatis
  2025-11-18  3:00 ` [PATCH v2 1/4] selftests/bpf: explicitly account for globals in verifier_arena_large Emil Tsalapatis
@ 2025-11-18  3:00 ` Emil Tsalapatis
  2025-11-18  3:00 ` [PATCH v2 3/4] libbpf: offset global arena data into the arena if possible Emil Tsalapatis
  2025-11-18  3:00 ` [PATCH v2 4/4] selftests/bpf: add tests for the arena offset of globals Emil Tsalapatis
  3 siblings, 0 replies; 15+ messages in thread
From: Emil Tsalapatis @ 2025-11-18  3:00 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, john.fastabend, memxor, andrii, eddyz87,
	yonghong.song, Emil Tsalapatis

Add a stub function that reports the offset at which libbpf places a
map's data within its mapping. This will be used in a subsequent
patch to support offsetting arena variables within the mapped region.

Adjust skeleton generation to account for the new arena memory layout
by adding padding corresponding to the offset into the arena map. Add
a libbpf API function to get the data offset within the map's mapping
during skeleton generation.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
---
 tools/bpf/bpftool/gen.c  | 23 +++++++++++++++++++++--
 tools/lib/bpf/libbpf.c   | 10 ++++++++++
 tools/lib/bpf/libbpf.h   |  9 +++++++++
 tools/lib/bpf/libbpf.map |  1 +
 4 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c
index 993c7d9484a4..6ed125b1b465 100644
--- a/tools/bpf/bpftool/gen.c
+++ b/tools/bpf/bpftool/gen.c
@@ -148,7 +148,8 @@ static int codegen_datasec_def(struct bpf_object *obj,
 			       struct btf *btf,
 			       struct btf_dump *d,
 			       const struct btf_type *sec,
-			       const char *obj_name)
+			       const char *obj_name,
+			       int var_off)
 {
 	const char *sec_name = btf__name_by_offset(btf, sec->name_off);
 	const struct btf_var_secinfo *sec_var = btf_var_secinfos(sec);
@@ -163,6 +164,17 @@ static int codegen_datasec_def(struct bpf_object *obj,
 		strip_mods = true;
 
 	printf("	struct %s__%s {\n", obj_name, sec_ident);
+
+	/*
+	 * Arena variables may be placed in an offset within the section.
+	 * Represent this in the skeleton using a padding struct.
+	 */
+	if (var_off > 0) {
+		printf("\t\tchar __pad%d[%d];\n",
+			pad_cnt, var_off);
+		pad_cnt++;
+	}
+
 	for (i = 0; i < vlen; i++, sec_var++) {
 		const struct btf_type *var = btf__type_by_id(btf, sec_var->type);
 		const char *var_name = btf__name_by_offset(btf, var->name_off);
@@ -279,6 +291,7 @@ static int codegen_datasecs(struct bpf_object *obj, const char *obj_name)
 	struct bpf_map *map;
 	const struct btf_type *sec;
 	char map_ident[256];
+	int var_off;
 	int err = 0;
 
 	d = btf_dump__new(btf, codegen_btf_dump_printf, NULL, NULL);
@@ -303,7 +316,13 @@ static int codegen_datasecs(struct bpf_object *obj, const char *obj_name)
 			printf("	struct %s__%s {\n", obj_name, map_ident);
 			printf("	} *%s;\n", map_ident);
 		} else {
-			err = codegen_datasec_def(obj, btf, d, sec, obj_name);
+			var_off = bpf_map__data_offset(map);
+			if (var_off < 0)  {
+				p_err("bpf_map__data_offset called on unmapped map\n");
+				err = var_off;
+				goto out;
+			}
+			err = codegen_datasec_def(obj, btf, d, sec, obj_name, var_off);
 			if (err)
 				goto out;
 		}
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 706e7481bdf6..32dac36ba8db 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10552,6 +10552,16 @@ const char *bpf_map__name(const struct bpf_map *map)
 	return map->name;
 }
 
+int bpf_map__data_offset(const struct bpf_map *map)
+{
+	if (!map->mmaped)
+		return -EINVAL;
+
+	/* No offsetting for now. */
+	return 0;
+}
+
+
 enum bpf_map_type bpf_map__type(const struct bpf_map *map)
 {
 	return map->def.type;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 5118d0a90e24..549289dd9891 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1314,6 +1314,15 @@ LIBBPF_API int bpf_map__set_exclusive_program(struct bpf_map *map, struct bpf_pr
  */
 LIBBPF_API struct bpf_program *bpf_map__exclusive_program(struct bpf_map *map);
 
+/*
+ * @brief **bpf_map__data_offset** returns the offset of the map's data
+ * within the address mapping.
+ * @param BPF map whose variable offset we are looking into.
+ * @return the offset >= 0 of the map's contents within its mapping; negative
+ * error code, otherwise.
+ */
+LIBBPF_API int bpf_map__data_offset(const struct bpf_map *map);
+
 struct bpf_xdp_set_link_opts {
 	size_t sz;
 	int old_fd;
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 8ed8749907d4..ac932ee3a932 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -451,4 +451,5 @@ LIBBPF_1.7.0 {
 	global:
 		bpf_map__set_exclusive_program;
 		bpf_map__exclusive_program;
+		bpf_map__data_offset;
 } LIBBPF_1.6.0;
-- 
2.49.0



* [PATCH v2 3/4] libbpf: offset global arena data into the arena if possible
  2025-11-18  3:00 [PATCH v2 0/4] libbpf: move arena variables out of the zero page Emil Tsalapatis
  2025-11-18  3:00 ` [PATCH v2 1/4] selftests/bpf: explicitly account for globals in verifier_arena_large Emil Tsalapatis
  2025-11-18  3:00 ` [PATCH v2 2/4] libbpf: add stub for offset-related skeleton padding Emil Tsalapatis
@ 2025-11-18  3:00 ` Emil Tsalapatis
  2025-11-22  3:17   ` Eduard Zingerman
  2025-11-25 22:11   ` Andrii Nakryiko
  2025-11-18  3:00 ` [PATCH v2 4/4] selftests/bpf: add tests for the arena offset of globals Emil Tsalapatis
  3 siblings, 2 replies; 15+ messages in thread
From: Emil Tsalapatis @ 2025-11-18  3:00 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, john.fastabend, memxor, andrii, eddyz87,
	yonghong.song, Emil Tsalapatis

Currently, libbpf places global arena data at the very beginning of
the arena mapping. Stray NULL dereferences into the arena then find
valid data and lead to silent corruption instead of causing an arena
page fault. The data is placed in the mapping at load time, preventing
us from reserving the region using bpf_arena_reserve_pages().

Adjust the arena logic to attempt placing the data at an offset within
the arena (currently 16 pages in) instead of at the very beginning. If
placing the data at an offset would lead to an allocation failure
because the global data is as large as the entire arena, progressively
reduce the offset down to 0 until placement succeeds.

Adjust existing arena tests in the same commit to account for the new
global data offset. New tests that explicitly exercise the new feature
are introduced in the next patch.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
---
 tools/lib/bpf/libbpf.c                        | 30 +++++++++++++++----
 .../bpf/progs/verifier_arena_large.c          | 14 +++++++--
 2 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 32dac36ba8db..6f40c6321935 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -757,6 +757,7 @@ struct bpf_object {
 	int arena_map_idx;
 	void *arena_data;
 	size_t arena_data_sz;
+	__u32 arena_data_off;
 
 	void *jumptables_data;
 	size_t jumptables_data_sz;
@@ -2991,10 +2992,14 @@ static int init_arena_map_data(struct bpf_object *obj, struct bpf_map *map,
 			       void *data, size_t data_sz)
 {
 	const long page_sz = sysconf(_SC_PAGE_SIZE);
+	const size_t data_alloc_sz = roundup(data_sz, page_sz);
+	/* default offset into the arena, may be resized */
+	const long max_off_pages = 16;
 	size_t mmap_sz;
+	long off_pages;
 
 	mmap_sz = bpf_map_mmap_sz(map);
-	if (roundup(data_sz, page_sz) > mmap_sz) {
+	if (data_alloc_sz > mmap_sz) {
 		pr_warn("elf: sec '%s': declared ARENA map size (%zu) is too small to hold global __arena variables of size %zu\n",
 			sec_name, mmap_sz, data_sz);
 		return -E2BIG;
@@ -3006,6 +3011,17 @@ static int init_arena_map_data(struct bpf_object *obj, struct bpf_map *map,
 	memcpy(obj->arena_data, data, data_sz);
 	obj->arena_data_sz = data_sz;
 
+	/*
+	 * find the largest offset for global arena variables
+	 * where they still fit in the arena
+	 */
+	for (off_pages = max_off_pages; off_pages > 0; off_pages >>= 1) {
+		if (off_pages * page_sz + data_alloc_sz <= mmap_sz)
+			break;
+	}
+
+	obj->arena_data_off = off_pages * page_sz;
+
 	/* make bpf_map__init_value() work for ARENA maps */
 	map->mmaped = obj->arena_data;
 
@@ -4663,7 +4679,7 @@ static int bpf_program__record_reloc(struct bpf_program *prog,
 		reloc_desc->type = RELO_DATA;
 		reloc_desc->insn_idx = insn_idx;
 		reloc_desc->map_idx = obj->arena_map_idx;
-		reloc_desc->sym_off = sym->st_value;
+		reloc_desc->sym_off = sym->st_value + obj->arena_data_off;
 
 		map = &obj->maps[obj->arena_map_idx];
 		pr_debug("prog '%s': found arena map %d (%s, sec %d, off %zu) for insn %u\n",
@@ -5624,7 +5640,8 @@ bpf_object__create_maps(struct bpf_object *obj)
 					return err;
 				}
 				if (obj->arena_data) {
-					memcpy(map->mmaped, obj->arena_data, obj->arena_data_sz);
+					memcpy(map->mmaped + obj->arena_data_off, obj->arena_data,
+						obj->arena_data_sz);
 					zfree(&obj->arena_data);
 				}
 			}
@@ -10557,8 +10574,11 @@ int bpf_map__data_offset(const struct bpf_map *map)
 	if (!map->mmaped)
 		return -EINVAL;
 
-	/* No offsetting for now. */
-	return 0;
+	/* Only arenas have offsetting. */
+	if (map->def.type != BPF_MAP_TYPE_ARENA)
+		return 0;
+
+	return map->obj->arena_data_off;
 }
 
 
diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_large.c b/tools/testing/selftests/bpf/progs/verifier_arena_large.c
index bd430a34c3ab..f72198596889 100644
--- a/tools/testing/selftests/bpf/progs/verifier_arena_large.c
+++ b/tools/testing/selftests/bpf/progs/verifier_arena_large.c
@@ -10,6 +10,7 @@
 #include "bpf_arena_common.h"
 
 #define ARENA_SIZE (1ull << 32)
+#define GLOBAL_PGOFF (16)
 
 struct {
 	__uint(type, BPF_MAP_TYPE_ARENA);
@@ -31,8 +32,7 @@ int big_alloc1(void *ctx)
 	if (!page1)
 		return 1;
 
-	/* Account for global arena data. */
-	if ((u64)page1 != base + PAGE_SIZE)
+	if ((u64)page1 != base)
 		return 15;
 
 	*page1 = 1;
@@ -216,6 +216,16 @@ int big_alloc2(void *ctx)
 	__u8 __arena *pg;
 	int i, err;
 
+	/*
+	 * The global data is placed in a page with global offset 16.
+	 * This test is about page allocation contiguity, so avoid
+	 * accounting for the stray allocation by also allocating
+	 * all pages before it. We never use the page range, so leak it.
+	 */
+	pg = bpf_arena_alloc_pages(&arena, NULL, GLOBAL_PGOFF, NUMA_NO_NODE, 0);
+	if (!pg)
+		return 10;
+
 	base = bpf_arena_alloc_pages(&arena, NULL, 1, NUMA_NO_NODE, 0);
 	if (!base)
 		return 1;
-- 
2.49.0



* [PATCH v2 4/4] selftests/bpf: add tests for the arena offset of globals
  2025-11-18  3:00 [PATCH v2 0/4] libbpf: move arena variables out of the zero page Emil Tsalapatis
                   ` (2 preceding siblings ...)
  2025-11-18  3:00 ` [PATCH v2 3/4] libbpf: offset global arena data into the arena if possible Emil Tsalapatis
@ 2025-11-18  3:00 ` Emil Tsalapatis
  3 siblings, 0 replies; 15+ messages in thread
From: Emil Tsalapatis @ 2025-11-18  3:00 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, john.fastabend, memxor, andrii, eddyz87,
	yonghong.song, Emil Tsalapatis

Add tests for the new libbpf arena globals offset logic. The tests
cover all three cases: globals small enough to be placed at the
maximum possible offset, globals as large as the arena itself and
therefore placed at the very beginning, and globals requiring an
intermediate offset into the arena.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
---
 .../selftests/bpf/prog_tests/verifier.c       |  6 ++
 .../bpf/progs/verifier_arena_globals1.c       | 60 ++++++++++++++++++
 .../bpf/progs/verifier_arena_globals2.c       | 49 +++++++++++++++
 .../bpf/progs/verifier_arena_globals3.c       | 61 +++++++++++++++++++
 4 files changed, 176 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_arena_globals1.c
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_arena_globals2.c
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_arena_globals3.c

diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
index 4b4b081b46cc..0c64fbc9a194 100644
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c
+++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
@@ -6,6 +6,9 @@
 #include "verifier_and.skel.h"
 #include "verifier_arena.skel.h"
 #include "verifier_arena_large.skel.h"
+#include "verifier_arena_globals1.skel.h"
+#include "verifier_arena_globals2.skel.h"
+#include "verifier_arena_globals3.skel.h"
 #include "verifier_array_access.skel.h"
 #include "verifier_async_cb_context.skel.h"
 #include "verifier_basic_stack.skel.h"
@@ -147,6 +150,9 @@ static void run_tests_aux(const char *skel_name,
 void test_verifier_and(void)                  { RUN(verifier_and); }
 void test_verifier_arena(void)                { RUN(verifier_arena); }
 void test_verifier_arena_large(void)          { RUN(verifier_arena_large); }
+void test_verifier_arena_globals1(void)       { RUN(verifier_arena_globals1); }
+void test_verifier_arena_globals2(void)       { RUN(verifier_arena_globals2); }
+void test_verifier_arena_globals3(void)       { RUN(verifier_arena_globals3); }
 void test_verifier_basic_stack(void)          { RUN(verifier_basic_stack); }
 void test_verifier_bitfield_write(void)       { RUN(verifier_bitfield_write); }
 void test_verifier_bounds(void)               { RUN(verifier_bounds); }
diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_globals1.c b/tools/testing/selftests/bpf/progs/verifier_arena_globals1.c
new file mode 100644
index 000000000000..761844577981
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_arena_globals1.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#define BPF_NO_KFUNC_PROTOTYPES
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_experimental.h"
+#include "bpf_arena_common.h"
+#include "bpf_misc.h"
+
+#define ARENA_PAGES (64)
+
+/* Set in libbpf. */
+#define GLOBALS_PGOFF (16)
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARENA);
+	__uint(map_flags, BPF_F_MMAPABLE);
+	__uint(max_entries, ARENA_PAGES); /* Arena of 64 pages (standard offset is 16 pages) */
+#ifdef __TARGET_ARCH_arm64
+	__ulong(map_extra, (1ull << 32) | (~0u - __PAGE_SIZE * ARENA_PAGES + 1));
+#else
+	__ulong(map_extra, (1ull << 44) | (~0u - __PAGE_SIZE * ARENA_PAGES + 1));
+#endif
+} arena SEC(".maps");
+
+/*
+ * Global data small enough that we can apply the maximum
+ * offset into the arena. Userspace will also use this to
+ * ensure the offset doesn't unexpectedly change from
+ * under us.
+ */
+char __arena global_data[PAGE_SIZE][ARENA_PAGES - GLOBALS_PGOFF];
+
+SEC("syscall")
+__success __retval(0)
+int check_reserve1(void *ctx)
+{
+#if defined(__BPF_FEATURE_ADDR_SPACE_CAST)
+	__u8 __arena *guard, *globals;
+	int ret;
+
+	guard = (void __arena *)arena_base(&arena);
+	globals = (void __arena *)(arena_base(&arena) + GLOBALS_PGOFF * PAGE_SIZE);
+
+	/* Reserve the region we've offset the globals by. */
+	ret = bpf_arena_reserve_pages(&arena, guard, GLOBALS_PGOFF);
+	if (ret)
+		return 1;
+
+	/* Make sure the globals are placed GLOBALS_PGOFF pages in. */
+	ret = bpf_arena_reserve_pages(&arena, globals, 1);
+	if (!ret)
+		return 2;
+#endif
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_globals2.c b/tools/testing/selftests/bpf/progs/verifier_arena_globals2.c
new file mode 100644
index 000000000000..9b6a08135de5
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_arena_globals2.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#define BPF_NO_KFUNC_PROTOTYPES
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+#include "bpf_arena_common.h"
+
+#define ARENA_PAGES (32)
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARENA);
+	__uint(map_flags, BPF_F_MMAPABLE);
+	__uint(max_entries, ARENA_PAGES); /* Arena of 32 pages (standard offset is 16 pages) */
+#ifdef __TARGET_ARCH_arm64
+	__ulong(map_extra, (1ull << 32) | (~0u - __PAGE_SIZE * ARENA_PAGES + 1));
+#else
+	__ulong(map_extra, (1ull << 44) | (~0u - __PAGE_SIZE * ARENA_PAGES + 1));
+#endif
+} arena SEC(".maps");
+
+/*
+ * Fill the entire arena with global data.
+ * The offset into the arena should be 0.
+ */
+char __arena global_data[PAGE_SIZE][ARENA_PAGES];
+
+SEC("syscall")
+__success __retval(0)
+int check_reserve2(void *ctx)
+{
+#if defined(__BPF_FEATURE_ADDR_SPACE_CAST)
+	void __arena *guard;
+	int ret;
+
+	guard = (void __arena *)arena_base(&arena);
+
+	/* Make sure the data at offset 0 case is properly handled. */
+	ret = bpf_arena_reserve_pages(&arena, guard, 1);
+	if (!ret)
+		return 1;
+#endif
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_globals3.c b/tools/testing/selftests/bpf/progs/verifier_arena_globals3.c
new file mode 100644
index 000000000000..62d1a46955fd
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_arena_globals3.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#define BPF_NO_KFUNC_PROTOTYPES
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+#include "bpf_arena_common.h"
+
+#define ARENA_PAGES (32)
+
+#define ARENA_AVAIL_PAGES (6)
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARENA);
+	__uint(map_flags, BPF_F_MMAPABLE);
+	__uint(max_entries, ARENA_PAGES); /* Arena of 32 pages (standard offset is 16 pages) */
+#ifdef __TARGET_ARCH_arm64
+	__ulong(map_extra, (1ull << 32) | (~0u - __PAGE_SIZE * ARENA_PAGES + 1));
+#else
+	__ulong(map_extra, (1ull << 44) | (~0u - __PAGE_SIZE * ARENA_PAGES + 1));
+#endif
+} arena SEC(".maps");
+
+/*
+ * Enough global data to fill most of the arena. Force libbpf to
+ * adjust the offset into the arena enough for the data to fit.
+ */
+
+char __arena global_data[PAGE_SIZE][ARENA_PAGES - ARENA_AVAIL_PAGES];
+
+SEC("syscall")
+__success __retval(0)
+int check_reserve3(void *ctx)
+{
+#if defined(__BPF_FEATURE_ADDR_SPACE_CAST)
+	void __arena *guard, *globals;
+	int ret;
+
+	guard = (void __arena *)arena_base(&arena);
+	globals = (void __arena *)(arena_base(&arena) + 4 * PAGE_SIZE);
+
+	/*
+	 * The data should be offset 4 pages in (the largest
+	 * possible power of 2 that still leaves enough room
+	 * to the global data).
+	 */
+	ret = bpf_arena_reserve_pages(&arena, guard, 4);
+	if (ret)
+		return 1;
+
+	ret = bpf_arena_reserve_pages(&arena, globals, 1);
+	if (!ret)
+		return 2;
+#endif
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.49.0



* Re: [PATCH v2 1/4] selftests/bpf: explicitly account for globals in verifier_arena_large
  2025-11-18  3:00 ` [PATCH v2 1/4] selftests/bpf: explicitly account for globals in verifier_arena_large Emil Tsalapatis
@ 2025-11-22  1:44   ` Eduard Zingerman
  0 siblings, 0 replies; 15+ messages in thread
From: Eduard Zingerman @ 2025-11-22  1:44 UTC (permalink / raw)
  To: Emil Tsalapatis, bpf
  Cc: ast, daniel, john.fastabend, memxor, andrii, yonghong.song

On Mon, 2025-11-17 at 22:00 -0500, Emil Tsalapatis wrote:
> The big_alloc1 test in verifier_arena_large assumes that the arena base
> and the first page allocated by bpf_arena_alloc_pages are identical.
> This is not the case, because the first page in the arena is populated
> by global arena data. The test still passes because the code makes the
> tacit assumption that the first page is on offset PAGE_SIZE instead of
> 0.
> 
> Make this distinction explicit in the code, and adjust the page offsets
> requested during the test to count from the beginning of the arena
> instead of using the address of the first allocated page.
> 
> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
> ---

Seems correct.

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>

[...]


* Re: [PATCH v2 3/4] libbpf: offset global arena data into the arena if possible
  2025-11-18  3:00 ` [PATCH v2 3/4] libbpf: offset global arena data into the arena if possible Emil Tsalapatis
@ 2025-11-22  3:17   ` Eduard Zingerman
  2025-12-01 18:34     ` Emil Tsalapatis
  2025-11-25 22:11   ` Andrii Nakryiko
  1 sibling, 1 reply; 15+ messages in thread
From: Eduard Zingerman @ 2025-11-22  3:17 UTC (permalink / raw)
  To: Emil Tsalapatis, bpf
  Cc: ast, daniel, john.fastabend, memxor, andrii, yonghong.song

On Mon, 2025-11-17 at 22:00 -0500, Emil Tsalapatis wrote:
> Currently, libbpf places global arena data at the very beginning of
> the arena mapping. Stray NULL dereferences into the arena then find
> valid data and lead to silent corruption instead of causing an arena
> page fault. The data is placed in the mapping at load time, preventing
> us from reserving the region using bpf_arena_reserve_pages().
> 
> Adjust the arena logic to attempt placing the data from an offset within
> the arena (currently 16 pages in) instead of the very beginning. If
> placing the data at an offset would lead to an allocation failure due
> to global data being as large as the entire arena, progressively reduce
> the offset down to 0 until placement succeeds.
> 
> Adjust existing arena tests in the same commit to account for the new
> global data offset. New tests that explicitly consider the new feature
> are introduced in the next patch.
> 
> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
> ---

[...]

> @@ -3006,6 +3011,17 @@ static int init_arena_map_data(struct bpf_object *obj, struct bpf_map *map,
>  	memcpy(obj->arena_data, data, data_sz);
>  	obj->arena_data_sz = data_sz;
>  
> +	/*
> +	 * find the largest offset for global arena variables
> +	 * where they still fit in the arena
> +	 */
> +	for (off_pages = max_off_pages; off_pages > 0; off_pages >>= 1) {
> +		if (off_pages * page_sz + data_alloc_sz <= mmap_sz)
> +			break;
> +	}
> +
> +	obj->arena_data_off = off_pages * page_sz;
> +
>  	/* make bpf_map__init_value() work for ARENA maps */
>  	map->mmaped = obj->arena_data;
>  

Please correct me if I'm wrong about the goals of this change:
a. Avoid allocating global data at NULL address
b. Reserve some space to use by upcoming arena-KASAN functionality.

For (b) wouldn't it be simpler to implicitly increase the arena map
size by an amount needed for KASAN functionality? Then there would be
no need to guess the necessary data offset.

For (a), is there a way to move an address of first valid mmaped page
(from BPF perspective) w/o physically allocating the first page?

[...]


* Re: [PATCH v2 3/4] libbpf: offset global arena data into the arena if possible
  2025-11-18  3:00 ` [PATCH v2 3/4] libbpf: offset global arena data into the arena if possible Emil Tsalapatis
  2025-11-22  3:17   ` Eduard Zingerman
@ 2025-11-25 22:11   ` Andrii Nakryiko
  2025-12-01 20:41     ` Emil Tsalapatis
  1 sibling, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2025-11-25 22:11 UTC (permalink / raw)
  To: Emil Tsalapatis
  Cc: bpf, ast, daniel, john.fastabend, memxor, andrii, eddyz87,
	yonghong.song

On Mon, Nov 17, 2025 at 7:01 PM Emil Tsalapatis <emil@etsalapatis.com> wrote:
>
> Currently, libbpf places global arena data at the very beginning of
> the arena mapping. Stray NULL dereferences into the arena then find
> valid data and lead to silent corruption instead of causing an arena
> page fault. The data is placed in the mapping at load time, preventing
> us from reserving the region using bpf_arena_reserve_pages().
>
> Adjust the arena logic to attempt placing the data from an offset within
> the arena (currently 16 pages in) instead of the very beginning. If
> placing the data at an offset would lead to an allocation failure due
> to global data being as large as the entire arena, progressively reduce
> the offset down to 0 until placement succeeds.
>

I'm not a big fan of adding a single-purpose bpf_map__data_offset(),
tbh, and the whole "let's try to place it within the first 16 pages"
logic also looks a bit random...

Can't we just say that arena-based global variables are always placed
at the end of the arena? Obviously, page aligned all that stuff, so
it's deterministic and well-defined. Seems like we always expect
BPF_MAP_TYPE_ARENA arena explicitly defined in BPF object file with
max_entries set, so that should always work as expected?

And also, I don't think we need to change anything about skeleton
generation logic for the arena. That padding is not necessary, libbpf
should be able to point arena struct to the beginning of arena global
variables, no? As you implemented patch #2, it breaks backwards compat
for no good reason.

WDYT?

pw-bot: cr


> Adjust existing arena tests in the same commit to account for the new
> global data offset. New tests that explicitly consider the new feature
> are introduced in the next patch.
>
> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
> ---
>  tools/lib/bpf/libbpf.c                        | 30 +++++++++++++++----
>  .../bpf/progs/verifier_arena_large.c          | 14 +++++++--
>  2 files changed, 37 insertions(+), 7 deletions(-)
>

[...]


* Re: [PATCH v2 3/4] libbpf: offset global arena data into the arena if possible
  2025-11-22  3:17   ` Eduard Zingerman
@ 2025-12-01 18:34     ` Emil Tsalapatis
  2025-12-01 22:35       ` Eduard Zingerman
  0 siblings, 1 reply; 15+ messages in thread
From: Emil Tsalapatis @ 2025-12-01 18:34 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, ast, daniel, john.fastabend, memxor, andrii, yonghong.song

On Fri, Nov 21, 2025 at 10:17 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2025-11-17 at 22:00 -0500, Emil Tsalapatis wrote:
> > Currently, libbpf places global arena data at the very beginning of
> > the arena mapping. Stray NULL dereferences into the arena then find
> > valid data and lead to silent corruption instead of causing an arena
> > page fault. The data is placed in the mapping at load time, preventing
> > us from reserving the region using bpf_arena_reserve_pages().
> >
> > Adjust the arena logic to attempt placing the data from an offset within
> > the arena (currently 16 pages in) instead of the very beginning. If
> > placing the data at an offset would lead to an allocation failure due
> > to global data being as large as the entire arena, progressively reduce
> > the offset down to 0 until placement succeeds.
> >
> > Adjust existing arena tests in the same commit to account for the new
> > global data offset. New tests that explicitly consider the new feature
> > are introduced in the next patch.
> >
> > Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
> > ---
>
> [...]
>
> > @@ -3006,6 +3011,17 @@ static int init_arena_map_data(struct bpf_object *obj, struct bpf_map *map,
> >       memcpy(obj->arena_data, data, data_sz);
> >       obj->arena_data_sz = data_sz;
> >
> > +     /*
> > +      * find the largest offset for global arena variables
> > +      * where they still fit in the arena
> > +      */
> > +     for (off_pages = max_off_pages; off_pages > 0; off_pages >>= 1) {
> > +             if (off_pages * page_sz + data_alloc_sz <= mmap_sz)
> > +                     break;
> > +     }
> > +
> > +     obj->arena_data_off = off_pages * page_sz;
> > +
> >       /* make bpf_map__init_value() work for ARENA maps */
> >       map->mmaped = obj->arena_data;
> >
>

Hi Eduard,

> Please correct me if I'm wrong about the goals of this change:
> a. Avoid allocating global data at NULL address
> b. Reserve some space to use by upcoming arena-KASAN functionality.
>
> For (b) wouldn't it be simpler to implicitly increase the arena map
> size by an amount needed for KASAN functionality? Then there would be
> no need to guess the necessary data offset.
>

(b) isn't really an issue because we do not need to reserve the shadow
map region. Arena ASAN uses a global variable to denote the base of the
shadow mapping, since the mapping needs to be in the last 1/8th of the
address space, and the address space size is equal to the size of the
arena (and thus configurable by the user).

(More details about how the ASAN region is reserved, for reference:)
The ASAN runtime code initializes the shadow map at load time and makes
the necessary allocations for it. If the globals fall in the ASAN shadow
map region, ASAN will fail for now. This only happens if the arena is
tiny or the size of the globals is massive. Both cases are outliers that
we can support later, but the bottom line is that this is a problem for
the ASAN runtime and we don't have to worry about it in this patch.

> For (a), is there a way to move an address of first valid mmaped page
> (from BPF perspective) w/o physically allocating the first page?
>

IIUC what you mean:

This is what this change combined with bpf_arena_reserve_pages() amounts to.
First, we adjust all the symbol addresses in the mapping to move them
out of the zero page. Then, we use the existing bpf_arena_reserve_pages()
call to prevent the first page from ever being physically allocated.

Alternatively, if you mean transparently adding an offset to every arena
address in xlated/jitted code: that would also adjust the NULL pointers
we're trying to catch to point to the first valid page we can allocate.
In general, I don't think it's possible to do this in the verifier/JIT,
because this change depends on adjusting only the arena globals, and
that is best done at load time.

> [...]


* Re: [PATCH v2 3/4] libbpf: offset global arena data into the arena if possible
  2025-11-25 22:11   ` Andrii Nakryiko
@ 2025-12-01 20:41     ` Emil Tsalapatis
  2025-12-01 22:45       ` Andrii Nakryiko
  0 siblings, 1 reply; 15+ messages in thread
From: Emil Tsalapatis @ 2025-12-01 20:41 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, john.fastabend, memxor, andrii, eddyz87,
	yonghong.song

On Tue, Nov 25, 2025 at 5:12 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Nov 17, 2025 at 7:01 PM Emil Tsalapatis <emil@etsalapatis.com> wrote:
> >
> > Currently, libbpf places global arena data at the very beginning of
> > the arena mapping. Stray NULL dereferences into the arena then find
> > valid data and lead to silent corruption instead of causing an arena
> > page fault. The data is placed in the mapping at load time, preventing
> > us from reserving the region using bpf_arena_reserve_pages().
> >
> > Adjust the arena logic to attempt placing the data from an offset within
> > the arena (currently 16 pages in) instead of the very beginning. If
> > placing the data at an offset would lead to an allocation failure due
> > to global data being as large as the entire arena, progressively reduce
> > the offset down to 0 until placement succeeds.
> >
>

Hi Andrii,

> I'm not a big fan of adding a single-purpose bpf_map__data_offset(),
> tbh, and the whole "let's try to place it within the first 16 pages"
> logic also looks a bit random...
>

Giving it some thought, I think we can get away with removing the new
API call. The intended use is for userspace to know where the global
data is placed within the arena. For example, we've had sched-ext code
in the past that allocated arena pages from userspace using read faults,
and that code had to start in the middle of the arena to avoid
conflicting with globals.

I think it is reasonable to have users call mincore(2) on the mapping to
find which pages were populated by libbpf when loading the globals into
the arena. I can't think of any other use cases for the API call, so we
can remove it. Does this make sense?

For the 16-page rule, I chose 16 as a sane default because we don't want
users choosing it themselves. The logic is to make sure there are no
"false negatives" wrt relocation, i.e., to add as many guard pages as we
can in case the arena is small.

Is it the number or the logic for choosing the offset that looks random?
How would you like to simplify it? Possible ways I see are:

1) Choose a different number, maybe just a single page. That does open
up the possibility of very large offsets added to NULL pointers ending
up in the globals region, and 1 isn't really any less arbitrary than 16
or any other choice.

2) Simplify the logic for choosing the offset. Right now it's just
exponential backoff from 16 all the way down to 0, in which case we fall
back to the existing behavior. We can make it so that relocation either
succeeds at an offset or fails, and avoid trying intermediate offsets.

Unless the code is very confusing (I don't think it is, but it's my own
code so I'm not exactly impartial :) ), I don't think removing the
backoff gets us much.

> Can't we just say that arena-based global variables are always placed
> at the end of the arena? Obviously, page aligned all that stuff, so
> it's deterministic and well-defined. Seems like we always expect
> BPF_MAP_TYPE_ARENA arena explicitly defined in BPF object file with
> max_entries set, so that should always work as expected?
>

The main problem I see with this is that it will cause complications
with the arena shadow map. Putting the global variables at the end
requires us to either forfeit the possibility of applying ASAN to them,
or to complicate the mapping function. The latter will have implications
for performance and code complexity, since it will add a conditional
jump on every ASAN check.

> And also, I don't think we need to change anything about skeleton
> generation logic for the arena. That padding is not necessary, libbpf
> should be able to point arena struct to the beginning of arena global
> variables, no? As you implemented patch #2, it breaks backwards compat
> for no good reason.
>
> WDYT?
>

IIUC you mean modifying the address returned by the arena to point to
where the globals are placed instead of the guard region, right? I
considered it at first, but I think it has its own set of caveats,
mainly that it assumes the program never wants to access the region
below the globals.

Right now the patch makes the assumption that the program will set up a
guard region itself. It may not do so and instead use the zero page to
store data. In that case, accessing the page requires a negative offset
on the arena pointer. It also breaks the mincore(2) trick I describe
above for finding the offset into the arena without a new API call.
AFAICT expanding the libbpf API is the bigger problem of the two, but I
am not sure.

If we'd like both to avoid the extra API and to avoid the padding, we
can always set up the guard mapping by default in libbpf at load time.
Would that be acceptable? It does take away control from the user, but
unless they have a very good reason to constrain the arena size to a
handful of pages, I don't see how the extra guard mapping can be a
problem.

> pw-bot: cr
>
>
> > Adjust existing arena tests in the same commit to account for the new
> > global data offset. New tests that explicitly consider the new feature
> > are introduced in the next patch.
> >
> > Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
> > ---
> >  tools/lib/bpf/libbpf.c                        | 30 +++++++++++++++----
> >  .../bpf/progs/verifier_arena_large.c          | 14 +++++++--
> >  2 files changed, 37 insertions(+), 7 deletions(-)
> >
>
> [...]


* Re: [PATCH v2 3/4] libbpf: offset global arena data into the arena if possible
  2025-12-01 18:34     ` Emil Tsalapatis
@ 2025-12-01 22:35       ` Eduard Zingerman
  2025-12-03 16:07         ` Emil Tsalapatis
  0 siblings, 1 reply; 15+ messages in thread
From: Eduard Zingerman @ 2025-12-01 22:35 UTC (permalink / raw)
  To: Emil Tsalapatis
  Cc: bpf, ast, daniel, john.fastabend, memxor, andrii, yonghong.song

On Mon, 2025-12-01 at 13:34 -0500, Emil Tsalapatis wrote:

[...]

> > For (a), is there a way to move an address of first valid mmaped page
> > (from BPF perspective) w/o physically allocating the first page?
> > 
> 
> IIUC what you mean:
> 
> This is what this change combined with bpf_arena_reserve_pages() amounts to.
> First, we adjust all the symbol addresses in the mapping to move them
> out of the zero page. Then, we use the existing bpf_arena_reserve_pages()
> call to prevent the first page from ever being physically allocated.

libbpf.c:bpf_object__create_maps() allocates arena memory region as
follows:

  map->mmaped = mmap((void *)(long)map->map_extra, ...);

Where 'map_extra' comes from fill_map_from_def() / parse_btf_map_def().
The latter even provides a flag:

  map_def->parts |= MAP_DEF_MAP_EXTRA;

On the kernel side arena.c:arena_map_alloc() uses this 'map_extra' as
a starting arena address:

  arena->user_vm_start = attr->map_extra;

arena.c:arena_alloc_pages() uses 'user_vm_start' as follows:

  uaddr32 = (u32)(arena->user_vm_start + pgoff * PAGE_SIZE);
  ...
  return clear_lo32(arena->user_vm_start) + uaddr32;

Returned value is what BPF program sees as an address, right?
Is there a way to tweak arena_alloc_pages() implementation or arena
creation at libbpf side, such that arena_alloc_pages() never returns
value smaller than 16 * PAGE_SIZE?
E.g. by tweaking the attr->map_extra value in
bpf_object__create_maps() if it was not specified directly by user?

> Alternatively, if you maybe mean transparently add an offset to every
> arena address
>  in xlated/jitted code: That would also adjust the NULL pointers we're
> trying to catch
> to point to the first valid page we can allocate. In general, I don't
> think it's possible
> to do this in the verifier/JIT because this change depends on adjusting only
> the arena globals, and that is best done at load time.

For sure, it does not make sense to do it on verifier side.
It might even be impossible in general case.


* Re: [PATCH v2 3/4] libbpf: offset global arena data into the arena if possible
  2025-12-01 20:41     ` Emil Tsalapatis
@ 2025-12-01 22:45       ` Andrii Nakryiko
  2025-12-03 16:13         ` Emil Tsalapatis
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2025-12-01 22:45 UTC (permalink / raw)
  To: Emil Tsalapatis
  Cc: bpf, ast, daniel, john.fastabend, memxor, andrii, eddyz87,
	yonghong.song

On Mon, Dec 1, 2025 at 12:41 PM Emil Tsalapatis <emil@etsalapatis.com> wrote:
>
> On Tue, Nov 25, 2025 at 5:12 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Nov 17, 2025 at 7:01 PM Emil Tsalapatis <emil@etsalapatis.com> wrote:
> > >
> > > Currently, libbpf places global arena data at the very beginning of
> > > the arena mapping. Stray NULL dereferences into the arena then find
> > > valid data and lead to silent corruption instead of causing an arena
> > > page fault. The data is placed in the mapping at load time, preventing
> > > us from reserving the region using bpf_arena_reserve_pages().
> > >
> > > Adjust the arena logic to attempt placing the data from an offset within
> > > the arena (currently 16 pages in) instead of the very beginning. If
> > > placing the data at an offset would lead to an allocation failure due
> > > to global data being as large as the entire arena, progressively reduce
> > > the offset down to 0 until placement succeeds.
> > >
> >
>
> Hi Andrii,
>
> > I'm not a big fan of adding a single-purpose bpf_map__data_offset(),
> > tbh, and the whole "let's try to place it within the first 16 pages"
> > logic also looks a bit random...
> >
>
> Giving it some thought, I think we can get away with removing the new API call.
> The intended use is for userspace to know where the global data is placed
> within the arena. For example, we've had sched-ext code in the past
> that allocated
> arena pages from userspace using read faults, and that code had to start in the
> had to start in the middle of the arena to avoid conflicting with globals.
>
> I think it is reasonable to have users call mincore(2) into the
> mapping to find which
> pages were populated by libbpf when loading the globals in the arena.
> I can't think
> of any other use cases for the API call, so we can remove it. Does
> this make sense?

I don't know if I agree that it's reasonable to expect users to use
mincore() to learn where libbpf put global variables. Most natural
location is at the beginning of arena map, but if that doesn't work,
then I expected that at the very end of arena would be ok.

>
> For the 16 pages rule I chose 16 as a sane default because we don't
> want users choosing
> it themselves. The logic is to make sure there's no "false negatives" wrt to
> relocation, to add as many guard pages as we can in case the arena is small.

I tried to understand what you are saying about false negatives, but
I'm not sure I get it, sorry. I do agree that asking users to choose
the offset is suboptimal, though.

>
> Is it the number or the logic for choosing the offset that looks random?

The offset you chose is a bit random, plus that piece of logic where
you will be shifting from 16 down to 0 pages, if you happen to fail to
mmap.

> How would you like to simplify it? Possible ways I see are:
>
> 1) Choose a different number, maybe just a single page. That does open up the
> possibility of very large offsets added to NULL pointers ending up in
> the globals
> regions, and 1 isn't really any less arbitrary than 16 or any other choice.

Yeah, I was guessing you didn't choose just one page because you
wanted to catch large field offsets added to NULL. (but yes, 1 is way
less arbitrary than 16, but that's beside the point)

>
> 2) Simplify the logic for choosing the offset. Right now it's just
> exponential backoff
> from 16 all the way to 0, in which case we fall back to existing
> behavior. We can
> make it that relocation either succeeds at an offset or fails, and avoid trying
> intermediate offsets.

Yes, if we absolutely have to go with 16 page offset instead of
putting globals at the end, I'd just fix the offset and it either
works or not. It still feels wrong to hard-code 16 as part of API,
though.

>
> Unless the code is very confusing (I don't think it is, but it's my
> own code so I'm
> not exactly impartial :) ) I don't think removing the backoff gets us much.
>
> > Can't we just say that arena-based global variables are always placed
> > at the end of the arena? Obviously, page aligned all that stuff, so
> > it's deterministic and well-defined. Seems like we always expect
> > BPF_MAP_TYPE_ARENA arena explicitly defined in BPF object file with
> > max_entries set, so that should always work as expected?
> >
>
> The main problem I see with this is that it will cause complications
> with the arena shadow
> map. Putting the global variables at the end requires us to either
> forfeit the possibility of
> applying ASAN to them, or to complicate the mapping function. The
> latter will have implications
> for performance and code complexity, since it will add a conditional
> jump on every ASAN
> check.
>

So maybe elaborate a bit on this? I can't say I understand why
locating globals at the end of arena would change anything for your
ASAN logic. Wouldn't ASAN have to track data at any possible location
anyways? I'm sure you have a good answer here, but please lay it out.

> > And also, I don't think we need to change anything about skeleton
> > generation logic for the arena. That padding is not necessary, libbpf
> > should be able to point arena struct to the beginning of arena global
> > variables, no? As you implemented patch #2, it breaks backwards compat
> > for no good reason.
> >
> > WDYT?
> >
>
> IIUC you mean modifying the address returned by the arena to point
> where the globals are
> placed instead of the guard region, right? I considered it at first
> but I think it has its own set
> of caveats, mainly that it assumes the program never wants to access
> the region below the
> globals.

Hm... I guess it depends on how we look at that *arena pointer. I was
thinking about it as more of pointer to arena-placed global variables,
not some sort of generic way to access any place within the arena. So
in that sense it's natural for that *arena pointer to point to just
global variables, wherever within the arena data area it might be
located.

If user needs to work with other arena pages, they can fetch
memory-mapped memory for arena map itself through
bpf_map__initial_value(skel->maps.arena). I.e., they take `struct
bpf_map` object, ask libbpf where its data is, and then work with it
in "untyped" way.

>
> Right now the patch makes the assumption that the program will set a
> guard region themselves.
> It may not do so and instead use the zero page to store data. In that
> case, accessing the page
> requires a negative offset on the arena page. It also breaks the
> mincore(2) trick I describe
> above for finding the offset into the arena without a new API call.
> AFAICT expanding the
> libbpf API is the bigger problem of the two, but I am not sure.

It just feels like we are exposing way too much implementation details
here. See my point above about using bpf_map__initial_value() (the
name is not great, but it's used to get memory-mapped area for
mmap-able BPF maps).

>
> If we'd like both to avoid the extra API and to avoid the padding, we
> can always set up the guard
> mapping by default in libbpf at load time. Would that be acceptable?
> It does take away control
> from the user, but unless they have very good reason to constrain
> arena size to a handful of pages
> I don't see how the extra guard mapping can be a problem.
>

Again, I'm only guessing what you are trying to say here. Please be
more specific.

> > pw-bot: cr
> >
> >
> > > Adjust existing arena tests in the same commit to account for the new
> > > global data offset. New tests that explicitly consider the new feature
> > > are introduced in the next patch.
> > >
> > > Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
> > > ---
> > >  tools/lib/bpf/libbpf.c                        | 30 +++++++++++++++----
> > >  .../bpf/progs/verifier_arena_large.c          | 14 +++++++--
> > >  2 files changed, 37 insertions(+), 7 deletions(-)
> > >
> >
> > [...]


* Re: [PATCH v2 3/4] libbpf: offset global arena data into the arena if possible
  2025-12-01 22:35       ` Eduard Zingerman
@ 2025-12-03 16:07         ` Emil Tsalapatis
  0 siblings, 0 replies; 15+ messages in thread
From: Emil Tsalapatis @ 2025-12-03 16:07 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, ast, daniel, john.fastabend, memxor, andrii, yonghong.song

On Mon, Dec 1, 2025 at 5:35 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2025-12-01 at 13:34 -0500, Emil Tsalapatis wrote:
>
> [...]
>
> > > For (a), is there a way to move an address of first valid mmaped page
> > > (from BPF perspective) w/o physically allocating the first page?
> > >
> >
> > IIUC what you mean:
> >
> > This is what this change combined with bpf_arena_reserve_pages() amounts to.
> > First, we adjust all the symbol addresses in the mapping to move them
> > out of the zero page. Then, we use the existing bpf_arena_reserve_pages()
> > call to prevent the first page from ever being physically allocated.
>
> libbpf.c:bpf_object__create_maps() allocates arena memory region as
> follows:
>
>   map->mmaped = mmap((void *)(long)map->map_extra, ...);
>
> Where 'map_extra' comes from fill_map_from_def() / parse_btf_map_def().
> The latter even provides a flag:
>
>   map_def->parts |= MAP_DEF_MAP_EXTRA;
>
> On the kernel side arena.c:arena_map_alloc() uses this 'map_extra' as
> a starting arena address:
>
>   arena->user_vm_start = attr->map_extra;
>
> arena.c:arena_alloc_pages() uses 'user_vm_start' as follows:
>
>   uaddr32 = (u32)(arena->user_vm_start + pgoff * PAGE_SIZE);
>   ...
>   return clear_lo32(arena->user_vm_start) + uaddr32;
>
> Returned value is what BPF program sees as an address, right?
> Is there a way to tweak arena_alloc_pages() implementation or arena
> creation at libbpf side, such that arena_alloc_pages() never returns
> value smaller than 16 * PAGE_SIZE?
> E.g. by tweaking the attr->map_extra value in
> bpf_object__create_maps() if it was not specified directly by user?
>

Turns out that we can move the globals to the end of the arena by
moving the shadow map, so the following is moot: but we could always
call bpf_arena_reserve_pages() from libbpf when we first set up the
arena.

It turns out that it's trivial to move the ASAN shadow map around the
address space, so we can remove most of the issues in the patchset
(extra libbpf call, globals in the middle of the address space
complicating the logic). I will send an updated version.

> > Alternatively, if you maybe mean transparently add an offset to every
> > arena address
> >  in xlated/jitted code: That would also adjust the NULL pointers we're
> > trying to catch
> > to point to the first valid page we can allocate. In general, I don't
> > think it's possible
> > to do this in the verifier/JIT because this change depends on adjusting only
> > the arena globals, and that is best done at load time.
>
> For sure, it does not make sense to do it on verifier side.
> It might even be impossible in general case.


* Re: [PATCH v2 3/4] libbpf: offset global arena data into the arena if possible
  2025-12-01 22:45       ` Andrii Nakryiko
@ 2025-12-03 16:13         ` Emil Tsalapatis
  0 siblings, 0 replies; 15+ messages in thread
From: Emil Tsalapatis @ 2025-12-03 16:13 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, john.fastabend, memxor, andrii, eddyz87,
	yonghong.song

On Mon, Dec 1, 2025 at 5:45 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Dec 1, 2025 at 12:41 PM Emil Tsalapatis <emil@etsalapatis.com> wrote:
> >
> > On Tue, Nov 25, 2025 at 5:12 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Mon, Nov 17, 2025 at 7:01 PM Emil Tsalapatis <emil@etsalapatis.com> wrote:
> > > >
> > > > Currently, libbpf places global arena data at the very beginning of
> > > > the arena mapping. Stray NULL dereferences into the arena then find
> > > > valid data and lead to silent corruption instead of causing an arena
> > > > page fault. The data is placed in the mapping at load time, preventing
> > > > us from reserving the region using bpf_arena_reserve_pages().
> > > >
> > > > Adjust the arena logic to attempt placing the data from an offset within
> > > > the arena (currently 16 pages in) instead of the very beginning. If
> > > > placing the data at an offset would lead to an allocation failure due
> > > > to global data being as large as the entire arena, progressively reduce
> > > > the offset down to 0 until placement succeeds.
> > > >
> > >
> >
> > Hi Andrii,
> >
> > > I'm not a big fan of adding a single-purpose bpf_map__data_offset(),
> > > tbh, and the whole "let's try to place it within the first 16 pages"
> > > logic also looks a bit random...
> > >
> >
> > Giving it some thought, I think we can get away with removing the new API call.
> > The intended use is for userspace to know where the global data is placed
> > within the arena. For example, we've had sched-ext code in the past
> > that allocated
> > arena pages from userspace using read faults, and that code had to start in the
> > had to start in the middle of the arena to avoid conflicting with globals.
> >
> > I think it is reasonable to have users call mincore(2) into the
> > mapping to find which
> > pages were populated by libbpf when loading the globals in the arena.
> > I can't think
> > of any other use cases for the API call, so we can remove it. Does
> > this make sense?
>
> I don't know if I agree that it's reasonable to expect users to use
> mincore() to learn where libbpf put global variables. Most natural
> location is at the beginning of arena map, but if that doesn't work,
> then I expected that at the very end of arena would be ok.
>
> >
> > For the 16 pages rule I chose 16 as a sane default because we don't
> > want users choosing
> > it themselves. The logic is to make sure there's no "false negatives" wrt to
> > relocation, to add as many guard pages as we can in case the arena is small.
>
> I tried to understand what you are saying about false negatives, but
> I'm not sure I get it, sorry. I do agree that asking users to choose
> the offset is suboptimal, though.
>
> >
> > Is it the number or the logic for choosing the offset that looks random?
>
> The offset you chose is a bit random, plus that piece of logic where
> you will be shifting from 16 down to 0 pages, if you happen to fail to
> mmap.
>
> > How would you like to simplify it? Possible ways I see are:
> >
> > 1) Choose a different number, maybe just a single page. That does open up the
> > possibility of very large offsets added to NULL pointers ending up in
> > the globals
> > regions, and 1 isn't really any less arbitrary than 16 or any other choice.
>
> Yeah, I was guessing you didn't choose just one page because you
> wanted to catch large field offsets added to NULL. (but yes, 1 is way
> less arbitrary than 16, but that's beside the point)
>
> >
> > 2) Simplify the logic for choosing the offset. Right now it's just
> > exponential backoff
> > from 16 all the way to 0, in which case we fall back to existing
> > behavior. We can
> > make it that relocation either succeeds at an offset or fails, and avoid trying
> > intermediate offsets.
>
> Yes, if we absolutely have to go with 16 page offset instead of
> putting globals at the end, I'd just fix the offset and it either
> works or not. It still feels wrong to hard-code 16 as part of API,
> though.
>
> >
> > Unless the code is very confusing (I don't think it is, but it's my
> > own code so I'm
> > not exactly impartial :) ) I don't think removing the backoff gets us much.
> >
> > > Can't we just say that arena-based global variables are always placed
> > > at the end of the arena? Obviously, page aligned all that stuff, so
> > > it's deterministic and well-defined. Seems like we always expect
> > > BPF_MAP_TYPE_ARENA arena explicitly defined in BPF object file with
> > > max_entries set, so that should always work as expected?
> > >
> >
> > The main problem I see with this is that it will cause complications
> > with the arena shadow
> > map. Putting the global variables at the end requires us to either
> > forfeit the possibility of
> > applying ASAN to them, or to complicate the mapping function. The
> > latter will have implications
> > for performance and code complexity, since it will add a conditional
> > jump on every ASAN
> > check.
> >
>
> So maybe elaborate a bit on this? I can't say to understand why
> locating globals at the end of arena would change anything for your
> ASAN logic. Wouldn't ASAN have to track data at any possible location
> anyways? I'm sure you have a good answer here, but please lay it out.
>

You're actually right, it's trivial to move. I'll send a new version of
the patch that places all globals at the end. I think this removes all
problems present in the current version. The only extra bit in the new
version will be a slight verifier change to handle large direct offsets
into arena maps.

> > > And also, I don't think we need to change anything about skeleton
> > > generation logic for the arena. That padding is not necessary, libbpf
> > > should be able to point arena struct to the beginning of arena global
> > > variables, no? As you implemented patch #2, it breaks backwards compat
> > > for no good reason.
> > >
> > > WDYT?
> > >
> >
> > IIUC you mean modifying the address returned by the arena to point
> > where the globals are
> > placed instead of the guard region, right? I considered it at first
> > but I think it has its own set
> > of caveats, mainly that it assumes the program never wants to access
> > the region below the
> > globals.
>
> Hm... I guess it depends on how we look at that *arena pointer. I was
> thinking about it as more of pointer to arena-placed global variables,
> not some sort of generic way to access any place within the arena. So
> in that sense it's natural for that *arena pointer to point to just
> global variables, wherever within the arena data area it might be
> located.
>
> If user needs to work with other arena pages, they can fetch
> memory-mapped memory for arena map itself through
> bpf_map__initial_value(skel->maps.arena). I.e., they take `struct
> bpf_map` object, ask libbpf where its data is, and then work with it
> in "untyped" way.
>
> >
> > Right now the patch makes the assumption that the program will set a
> > guard region themselves.
> > It may not do so and instead use the zero page to store data. In that
> > case, accessing the page
> > requires a negative offset on the arena page. It also breaks the
> > mincore(2) trick I describe
> > above for finding the offset into the arena without a new API call.
> > AFAICT expanding the
> > libbpf API is the bigger problem of the two, but I am not sure.
>
> It just feels like we are exposing way too much implementation details
> here. See my point above about using bpf_map__initial_value() (the
> name is not great, but it's used to get memory-mapped area for
> mmap-able BPF maps).
>

I wasn't aware of bpf_map__initial_value(); if we already have an
existing way of finding the arena's user_vm_start, then I don't see a
problem with changing the arena pointer to point to the globals instead.
I'll remove the additional libbpf call in the next version.

> >
> > If we'd like both to avoid the extra API and to avoid the padding, we
> > can always set up the guard
> > mapping by default in libbpf at load time. Would that be acceptable?
> > It does take away control
> > from the user, but unless they have very good reason to constrain
> > arena size to a handful of pages
> > I don't see how the extra guard mapping can be a problem.
> >
>
> Again, I'm only guessing what you are trying to say here. Please be
> more specific.
>

I meant we could do bpf_alloc_reserve_pages() at arena load time, but
the point is moot.

> > > pw-bot: cr
> > >
> > >
> > > > Adjust existing arena tests in the same commit to account for the new
> > > > global data offset. New tests that explicitly consider the new feature
> > > > are introduced in the next patch.
> > > >
> > > > Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
> > > > ---
> > > >  tools/lib/bpf/libbpf.c                        | 30 +++++++++++++++----
> > > >  .../bpf/progs/verifier_arena_large.c          | 14 +++++++--
> > > >  2 files changed, 37 insertions(+), 7 deletions(-)
> > > >
> > >
> > > [...]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/4] selftests/bpf: explicitly account for globals in verifier_arena_large
  2025-12-03 16:26 [PATCH v2 0/4] libbpf: move arena variables out of the zero page Emil Tsalapatis
@ 2025-12-03 16:26 ` Emil Tsalapatis
  0 siblings, 0 replies; 15+ messages in thread
From: Emil Tsalapatis @ 2025-12-03 16:26 UTC (permalink / raw)
  To: bpf
  Cc: andrii, eddyz87, ast, daniel, john.fastabend, memxor,
	yonghong.song, Emil Tsalapatis

The big_alloc1 test in verifier_arena_large assumes that the arena base
and the first page allocated by bpf_arena_alloc_pages are identical.
This is not the case, because the first page in the arena is populated
by global arena data. The test still passes because the code makes the
tacit assumption that the first page is at offset PAGE_SIZE instead of
0.

Make this distinction explicit in the code, and adjust the page offsets
requested during the test to count from the beginning of the arena
instead of using the address of the first allocated page.

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
---
 .../selftests/bpf/progs/verifier_arena_large.c    | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_large.c b/tools/testing/selftests/bpf/progs/verifier_arena_large.c
index f19e15400b3e..bd430a34c3ab 100644
--- a/tools/testing/selftests/bpf/progs/verifier_arena_large.c
+++ b/tools/testing/selftests/bpf/progs/verifier_arena_large.c
@@ -23,18 +23,25 @@ int big_alloc1(void *ctx)
 {
 #if defined(__BPF_FEATURE_ADDR_SPACE_CAST)
 	volatile char __arena *page1, *page2, *no_page, *page3;
-	void __arena *base;
+	u64 base;
 
-	page1 = base = bpf_arena_alloc_pages(&arena, NULL, 1, NUMA_NO_NODE, 0);
+	base = (u64)arena_base(&arena);
+
+	page1 = bpf_arena_alloc_pages(&arena, NULL, 1, NUMA_NO_NODE, 0);
 	if (!page1)
 		return 1;
+
+	/* Account for global arena data. */
+	if ((u64)page1 != base + PAGE_SIZE)
+		return 15;
+
 	*page1 = 1;
-	page2 = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE * 2,
+	page2 = bpf_arena_alloc_pages(&arena, (void __arena *)(ARENA_SIZE - PAGE_SIZE),
 				      1, NUMA_NO_NODE, 0);
 	if (!page2)
 		return 2;
 	*page2 = 2;
-	no_page = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE,
+	no_page = bpf_arena_alloc_pages(&arena, (void __arena *)ARENA_SIZE,
 					1, NUMA_NO_NODE, 0);
 	if (no_page)
 		return 3;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-12-03 16:26 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-18  3:00 [PATCH v2 0/4] libbpf: move arena variables out of the zero page Emil Tsalapatis
2025-11-18  3:00 ` [PATCH v2 1/4] selftests/bpf: explicitly account for globals in verifier_arena_large Emil Tsalapatis
2025-11-22  1:44   ` Eduard Zingerman
2025-11-18  3:00 ` [PATCH v2 2/4] libbpf: add stub for offset-related skeleton padding Emil Tsalapatis
2025-11-18  3:00 ` [PATCH v2 3/4] libbpf: offset global arena data into the arena if possible Emil Tsalapatis
2025-11-22  3:17   ` Eduard Zingerman
2025-12-01 18:34     ` Emil Tsalapatis
2025-12-01 22:35       ` Eduard Zingerman
2025-12-03 16:07         ` Emil Tsalapatis
2025-11-25 22:11   ` Andrii Nakryiko
2025-12-01 20:41     ` Emil Tsalapatis
2025-12-01 22:45       ` Andrii Nakryiko
2025-12-03 16:13         ` Emil Tsalapatis
2025-11-18  3:00 ` [PATCH v2 4/4] selftests/bpf: add tests for the arena offset of globals Emil Tsalapatis
  -- strict thread matches above, loose matches on Subject: below --
2025-12-03 16:26 [PATCH v2 0/4] libbpf: move arena variables out of the zero page Emil Tsalapatis
2025-12-03 16:26 ` [PATCH v2 1/4] selftests/bpf: explicitly account for globals in verifier_arena_large Emil Tsalapatis
