public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 0/4] bpf: Introduce global percpu data
@ 2025-05-26 16:21 Leon Hwang
  2025-05-26 16:21 ` [PATCH bpf-next v3 1/4] " Leon Hwang
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Leon Hwang @ 2025-05-26 16:21 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, yonghong.song, song, eddyz87, qmo, dxu,
	leon.hwang, kernel-patches-bot

This patch set introduces global percpu data, similar to commit
6316f78306c1 ("Merge branch 'support-global-data'"), to reduce restrictions
in C for BPF programs.

With this enhancement, it becomes possible to define and use global percpu
variables, analogous to those declared with the DEFINE_PER_CPU() macro in
the kernel[0].

The section name for global percpu data is ".data..percpu". It cannot be
named ".percpu" or ".percpudata" because defining a one-byte percpu
variable (e.g., char run SEC(".data..percpu") = 0;) can trigger a crash
with Clang 17[1]. The name ".data.percpu" is also avoided because some
users already use section names prefixed with ".data.percpu", such as in
this example from test_global_map_resize.c:

int percpu_arr[1] SEC(".data.percpu_arr");

The idea stems from bpfsnoop[2], which itself was inspired by
retsnoop[3]. During testing of bpfsnoop on the v6.6 kernel, two LBR
(Last Branch Record) entries were observed related to the
bpf_get_smp_processor_id() helper.

Since commit 1ae6921009e5 ("bpf: inline bpf_get_smp_processor_id() helper"),
the bpf_get_smp_processor_id() helper has been inlined on x86_64, reducing
the overhead and consequently minimizing these two LBR records.

However, the introduction of global percpu data offers a more robust
solution. By leveraging the percpu_array map and the percpu instruction,
global percpu data can be implemented natively.

This feature also facilitates sharing percpu information between tail
callers and callees or between freplace callers and callees through a
shared global percpu variable. Previously, this was achieved using a
1-entry percpu_array map, which this patch set aims to improve upon.

Links:
[0] https://github.com/torvalds/linux/blob/fbfd64d25c7af3b8695201ebc85efe90be28c5a3/include/linux/percpu-defs.h#L114
[1] https://lore.kernel.org/bpf/fd1b3f58-c27f-403d-ad99-644b7d06ecb3@linux.dev/
[2] https://github.com/bpfsnoop/bpfsnoop
[3] https://github.com/anakryiko/retsnoop

Changes:
v2 -> v3:
  * Use ".data..percpu" as PERCPU_DATA_SEC.
  * Address comment from Alexei:
    * Add u8, array of ints and struct { .. } vars to selftest.

v1 -> v2:
  * Address comments from Andrii:
    * Use LIBBPF_MAP_PERCPU and SEC_PERCPU.
    * Reuse the mmaped field of libbpf's struct bpf_map for .percpu map data.
    * Set .percpu struct pointer to NULL after loading skeleton.
    * Make sure value size of .percpu map is __aligned(8).
    * Use raw_tp and opts.cpu to test global percpu variables on all CPUs.
  * Address comments from Alexei:
    * Test non-zero offset of global percpu variable.
    * Test case about BPF_PSEUDO_MAP_IDX_VALUE.

rfc -> v1:
  * Address comments from Andrii:
    * Keep one image of global percpu variable for all CPUs.
    * Reject non-ARRAY map in bpf_map_direct_read(), check_reg_const_str(),
      and check_bpf_snprintf_call() in verifier.
    * Split out libbpf changes from kernel-side changes.
    * Use ".percpu" as PERCPU_DATA_SEC.
    * Use enum libbpf_map_type to distinguish BSS, DATA, RODATA and
      PERCPU_DATA.
    * Avoid using errno for checking err from libbpf_num_possible_cpus().
    * Use "map '%s': " prefix for error message.

Leon Hwang (4):
  bpf: Introduce global percpu data
  bpf, libbpf: Support global percpu data
  bpf, bpftool: Generate skeleton for global percpu data
  selftests/bpf: Add cases to test global percpu data

 kernel/bpf/arraymap.c                         |  41 +++-
 kernel/bpf/verifier.c                         |  45 ++++
 tools/bpf/bpftool/gen.c                       |  47 ++--
 tools/lib/bpf/libbpf.c                        | 102 ++++++--
 tools/lib/bpf/libbpf.h                        |   9 +
 tools/lib/bpf/libbpf.map                      |   1 +
 tools/testing/selftests/bpf/Makefile          |   2 +-
 .../bpf/prog_tests/global_data_init.c         | 221 +++++++++++++++++-
 .../bpf/progs/test_global_percpu_data.c       |  29 +++
 9 files changed, 459 insertions(+), 38 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_global_percpu_data.c

-- 
2.49.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH bpf-next v3 1/4] bpf: Introduce global percpu data
  2025-05-26 16:21 [PATCH bpf-next v3 0/4] bpf: Introduce global percpu data Leon Hwang
@ 2025-05-26 16:21 ` Leon Hwang
  2025-05-27 22:31   ` Andrii Nakryiko
  2025-05-26 16:21 ` [PATCH bpf-next v3 2/4] bpf, libbpf: Support " Leon Hwang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Leon Hwang @ 2025-05-26 16:21 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, yonghong.song, song, eddyz87, qmo, dxu,
	leon.hwang, kernel-patches-bot

This patch introduces global percpu data, inspired by commit
6316f78306c1 ("Merge branch 'support-global-data'"). It enables the
definition of global percpu variables in BPF, similar to the
DEFINE_PER_CPU() macro in the kernel[0].

For example, in BPF, it is possible to define a global percpu variable like:

int data SEC(".data..percpu");

With this patch, tools like retsnoop[1] and bpfsnoop[2] can simplify their
BPF code for handling LBRs. The code can be updated from

static struct perf_branch_entry lbrs[1][MAX_LBR_ENTRIES] SEC(".data.lbrs");

to

static struct perf_branch_entry lbrs[MAX_LBR_ENTRIES] SEC(".data..percpu.lbrs");

This eliminates the need to retrieve the CPU ID using the
bpf_get_smp_processor_id() helper.

Additionally, by reusing a global percpu data map, sharing information
between tail callers and callees, or between freplace callers and callees,
becomes simpler than reusing dedicated percpu_array maps.

Links:
[0] https://github.com/torvalds/linux/blob/fbfd64d25c7af3b8695201ebc85efe90be28c5a3/include/linux/percpu-defs.h#L114
[1] https://github.com/anakryiko/retsnoop
[2] https://github.com/bpfsnoop/bpfsnoop

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 kernel/bpf/arraymap.c | 41 +++++++++++++++++++++++++++++++++++++--
 kernel/bpf/verifier.c | 45 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index eb28c0f219ee4..91d06f0165a6e 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -249,6 +249,40 @@ static void *percpu_array_map_lookup_elem(struct bpf_map *map, void *key)
 	return this_cpu_ptr(array->pptrs[index & array->index_mask]);
 }
 
+static int percpu_array_map_direct_value_addr(const struct bpf_map *map,
+					      u64 *imm, u32 off)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+
+	if (map->max_entries != 1)
+		return -EOPNOTSUPP;
+	if (off >= map->value_size)
+		return -EINVAL;
+	if (!bpf_jit_supports_percpu_insn())
+		return -EOPNOTSUPP;
+
+	*imm = (u64) array->pptrs[0];
+	return 0;
+}
+
+static int percpu_array_map_direct_value_meta(const struct bpf_map *map,
+					      u64 imm, u32 *off)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	u64 base = (u64) array->pptrs[0];
+	u64 range = array->elem_size;
+
+	if (map->max_entries != 1)
+		return -EOPNOTSUPP;
+	if (imm < base || imm >= base + range)
+		return -ENOENT;
+	if (!bpf_jit_supports_percpu_insn())
+		return -EOPNOTSUPP;
+
+	*off = imm - base;
+	return 0;
+}
+
 /* emit BPF instructions equivalent to C code of percpu_array_map_lookup_elem() */
 static int percpu_array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
 {
@@ -532,9 +566,10 @@ static int array_map_check_btf(const struct bpf_map *map,
 {
 	u32 int_data;
 
-	/* One exception for keyless BTF: .bss/.data/.rodata map */
+	/* One exception for keyless BTF: .bss/.data/.rodata/.data..percpu map */
 	if (btf_type_is_void(key_type)) {
-		if (map->map_type != BPF_MAP_TYPE_ARRAY ||
+		if ((map->map_type != BPF_MAP_TYPE_ARRAY &&
+		     map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY) ||
 		    map->max_entries != 1)
 			return -EINVAL;
 
@@ -815,6 +850,8 @@ const struct bpf_map_ops percpu_array_map_ops = {
 	.map_get_next_key = array_map_get_next_key,
 	.map_lookup_elem = percpu_array_map_lookup_elem,
 	.map_gen_lookup = percpu_array_map_gen_lookup,
+	.map_direct_value_addr = percpu_array_map_direct_value_addr,
+	.map_direct_value_meta = percpu_array_map_direct_value_meta,
 	.map_update_elem = array_map_update_elem,
 	.map_delete_elem = array_map_delete_elem,
 	.map_lookup_percpu_elem = percpu_array_map_lookup_percpu_elem,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d5807d2efc922..9203354208732 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6939,6 +6939,8 @@ static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val,
 	u64 addr;
 	int err;
 
+	if (map->map_type != BPF_MAP_TYPE_ARRAY)
+		return -EINVAL;
 	err = map->ops->map_direct_value_addr(map, &addr, off);
 	if (err)
 		return err;
@@ -7451,6 +7453,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 			/* if map is read-only, track its contents as scalars */
 			if (tnum_is_const(reg->var_off) &&
 			    bpf_map_is_rdonly(map) &&
+			    map->map_type == BPF_MAP_TYPE_ARRAY &&
 			    map->ops->map_direct_value_addr) {
 				int map_off = off + reg->var_off.value;
 				u64 val = 0;
@@ -9414,6 +9417,11 @@ static int check_reg_const_str(struct bpf_verifier_env *env,
 		return -EACCES;
 	}
 
+	if (map->map_type != BPF_MAP_TYPE_ARRAY) {
+		verbose(env, "only array map supports direct string value access\n");
+		return -EINVAL;
+	}
+
 	err = check_map_access(env, regno, reg->off,
 			       map->value_size - reg->off, false,
 			       ACCESS_HELPER);
@@ -11101,6 +11109,11 @@ static int check_bpf_snprintf_call(struct bpf_verifier_env *env,
 		return -EINVAL;
 	num_args = data_len_reg->var_off.value / 8;
 
+	if (fmt_map->map_type != BPF_MAP_TYPE_ARRAY) {
+		verbose(env, "only array map supports snprintf\n");
+		return -EINVAL;
+	}
+
 	/* fmt being ARG_PTR_TO_CONST_STR guarantees that var_off is const
 	 * and map_direct_value_addr is set.
 	 */
@@ -21906,6 +21919,38 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			goto next_insn;
 		}
 
+#ifdef CONFIG_SMP
+		if (insn->code == (BPF_LD | BPF_IMM | BPF_DW) &&
+		    (insn->src_reg == BPF_PSEUDO_MAP_VALUE ||
+		     insn->src_reg == BPF_PSEUDO_MAP_IDX_VALUE)) {
+			struct bpf_map *map;
+
+			aux = &env->insn_aux_data[i + delta];
+			map = env->used_maps[aux->map_index];
+			if (map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
+				goto next_insn;
+
+			/* Reuse the original ld_imm64 insn. And add one
+			 * mov64_percpu_reg insn.
+			 */
+
+			insn_buf[0] = insn[1];
+			insn_buf[1] = BPF_MOV64_PERCPU_REG(insn->dst_reg, insn->dst_reg);
+			cnt = 2;
+
+			i++;
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta    += cnt - 1;
+			env->prog = prog = new_prog;
+			insn      = new_prog->insnsi + i + delta;
+
+			goto next_insn;
+		}
+#endif
+
 		if (insn->code != (BPF_JMP | BPF_CALL))
 			goto next_insn;
 		if (insn->src_reg == BPF_PSEUDO_CALL)
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH bpf-next v3 2/4] bpf, libbpf: Support global percpu data
  2025-05-26 16:21 [PATCH bpf-next v3 0/4] bpf: Introduce global percpu data Leon Hwang
  2025-05-26 16:21 ` [PATCH bpf-next v3 1/4] " Leon Hwang
@ 2025-05-26 16:21 ` Leon Hwang
  2025-05-27 22:31   ` Andrii Nakryiko
  2025-05-27 22:40   ` Alexei Starovoitov
  2025-05-26 16:21 ` [PATCH bpf-next v3 3/4] bpf, bpftool: Generate skeleton for " Leon Hwang
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 24+ messages in thread
From: Leon Hwang @ 2025-05-26 16:21 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, yonghong.song, song, eddyz87, qmo, dxu,
	leon.hwang, kernel-patches-bot

This patch introduces support for global percpu data in libbpf by adding a
new ".data..percpu" section, similar to ".data". It enables efficient
handling of global percpu variables in BPF programs.

This enhancement improves performance for workloads that benefit from
percpu storage.

Additionally, add the bpf_map__is_internal_percpu() API to check whether a
map is an internal map used for global percpu variables.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 tools/lib/bpf/libbpf.c   | 102 +++++++++++++++++++++++++++++++--------
 tools/lib/bpf/libbpf.h   |   9 ++++
 tools/lib/bpf/libbpf.map |   1 +
 3 files changed, 91 insertions(+), 21 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e9c641a2fb203..65f0df09ac6d8 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -518,6 +518,7 @@ struct bpf_struct_ops {
 };
 
 #define DATA_SEC ".data"
+#define PERCPU_DATA_SEC ".data..percpu"
 #define BSS_SEC ".bss"
 #define RODATA_SEC ".rodata"
 #define KCONFIG_SEC ".kconfig"
@@ -532,6 +533,7 @@ enum libbpf_map_type {
 	LIBBPF_MAP_BSS,
 	LIBBPF_MAP_RODATA,
 	LIBBPF_MAP_KCONFIG,
+	LIBBPF_MAP_PERCPU_DATA,
 };
 
 struct bpf_map_def {
@@ -642,6 +644,7 @@ enum sec_type {
 	SEC_DATA,
 	SEC_RODATA,
 	SEC_ST_OPS,
+	SEC_PERCPU_DATA,
 };
 
 struct elf_sec_desc {
@@ -1902,7 +1905,7 @@ static bool map_is_mmapable(struct bpf_object *obj, struct bpf_map *map)
 	struct btf_var_secinfo *vsi;
 	int i, n;
 
-	if (!map->btf_value_type_id)
+	if (!map->btf_value_type_id || map->libbpf_type == LIBBPF_MAP_PERCPU_DATA)
 		return false;
 
 	t = btf__type_by_id(obj->btf, map->btf_value_type_id);
@@ -1926,6 +1929,7 @@ static int
 bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
 			      const char *real_name, int sec_idx, void *data, size_t data_sz)
 {
+	bool is_percpu = type == LIBBPF_MAP_PERCPU_DATA;
 	struct bpf_map_def *def;
 	struct bpf_map *map;
 	size_t mmap_sz;
@@ -1947,9 +1951,9 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
 	}
 
 	def = &map->def;
-	def->type = BPF_MAP_TYPE_ARRAY;
+	def->type = is_percpu ? BPF_MAP_TYPE_PERCPU_ARRAY : BPF_MAP_TYPE_ARRAY;
 	def->key_size = sizeof(int);
-	def->value_size = data_sz;
+	def->value_size = is_percpu ? roundup(data_sz, 8) : data_sz;
 	def->max_entries = 1;
 	def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG
 		? BPF_F_RDONLY_PROG : 0;
@@ -1960,10 +1964,11 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
 	if (map_is_mmapable(obj, map))
 		def->map_flags |= BPF_F_MMAPABLE;
 
-	pr_debug("map '%s' (global data): at sec_idx %d, offset %zu, flags %x.\n",
-		 map->name, map->sec_idx, map->sec_offset, def->map_flags);
+	pr_debug("map '%s' (global %sdata): at sec_idx %d, offset %zu, flags %x.\n",
+		 map->name, is_percpu ? "percpu " : "", map->sec_idx,
+		 map->sec_offset, def->map_flags);
 
-	mmap_sz = bpf_map_mmap_sz(map);
+	mmap_sz = is_percpu ? def->value_size : bpf_map_mmap_sz(map);
 	map->mmaped = mmap(NULL, mmap_sz, PROT_READ | PROT_WRITE,
 			   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
 	if (map->mmaped == MAP_FAILED) {
@@ -1999,6 +2004,13 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
 			continue;
 
 		switch (sec_desc->sec_type) {
+		case SEC_PERCPU_DATA:
+			sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx));
+			err = bpf_object__init_internal_map(obj, LIBBPF_MAP_PERCPU_DATA,
+							    sec_name, sec_idx,
+							    sec_desc->data->d_buf,
+							    sec_desc->data->d_size);
+			break;
 		case SEC_DATA:
 			sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx));
 			err = bpf_object__init_internal_map(obj, LIBBPF_MAP_DATA,
@@ -3363,6 +3375,10 @@ static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
 		fixup_offsets = true;
 	}
 
+	/* .data..percpu DATASEC must have __aligned(8) size. */
+	if (strcmp(sec_name, PERCPU_DATA_SEC) == 0 || str_has_pfx(sec_name, PERCPU_DATA_SEC))
+		t->size = roundup(t->size, 8);
+
 	for (i = 0, vsi = btf_var_secinfos(t); i < vars; i++, vsi++) {
 		const struct btf_type *t_var;
 		struct btf_var *var;
@@ -3923,6 +3939,11 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 				err = bpf_object__add_programs(obj, data, name, idx);
 				if (err)
 					return err;
+			} else if (strcmp(name, PERCPU_DATA_SEC) == 0 ||
+				   str_has_pfx(name, PERCPU_DATA_SEC)) {
+				sec_desc->sec_type = SEC_PERCPU_DATA;
+				sec_desc->shdr = sh;
+				sec_desc->data = data;
 			} else if (strcmp(name, DATA_SEC) == 0 ||
 				   str_has_pfx(name, DATA_SEC ".")) {
 				sec_desc->sec_type = SEC_DATA;
@@ -4452,6 +4473,7 @@ static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
 	case SEC_BSS:
 	case SEC_DATA:
 	case SEC_RODATA:
+	case SEC_PERCPU_DATA:
 		return true;
 	default:
 		return false;
@@ -4477,6 +4499,8 @@ bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx)
 		return LIBBPF_MAP_DATA;
 	case SEC_RODATA:
 		return LIBBPF_MAP_RODATA;
+	case SEC_PERCPU_DATA:
+		return LIBBPF_MAP_PERCPU_DATA;
 	default:
 		return LIBBPF_MAP_UNSPEC;
 	}
@@ -4794,7 +4818,7 @@ static int map_fill_btf_type_info(struct bpf_object *obj, struct bpf_map *map)
 
 	/*
 	 * LLVM annotates global data differently in BTF, that is,
-	 * only as '.data', '.bss' or '.rodata'.
+	 * only as '.data', '.bss', '.rodata' or '.data..percpu'.
 	 */
 	if (!bpf_map__is_internal(map))
 		return -ENOENT;
@@ -5129,23 +5153,47 @@ static int
 bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
 {
 	enum libbpf_map_type map_type = map->libbpf_type;
-	int err, zero = 0;
-	size_t mmap_sz;
+	bool is_percpu = map_type == LIBBPF_MAP_PERCPU_DATA;
+	int err = 0, zero = 0, num_cpus, i;
+	size_t data_sz, elem_sz, mmap_sz;
+	void *data = NULL;
+
+	data_sz = map->def.value_size;
+	if (is_percpu) {
+		num_cpus = libbpf_num_possible_cpus();
+		if (num_cpus < 0) {
+			err = num_cpus;
+			return err;
+		}
+
+		data_sz = data_sz * num_cpus;
+		data = malloc(data_sz);
+		if (!data) {
+			err = -ENOMEM;
+			return err;
+		}
+
+		elem_sz = map->def.value_size;
+		for (i = 0; i < num_cpus; i++)
+			memcpy(data + i * elem_sz, map->mmaped, elem_sz);
+	} else {
+		data = map->mmaped;
+	}
 
 	if (obj->gen_loader) {
 		bpf_gen__map_update_elem(obj->gen_loader, map - obj->maps,
-					 map->mmaped, map->def.value_size);
+					 data, data_sz);
 		if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG)
 			bpf_gen__map_freeze(obj->gen_loader, map - obj->maps);
-		return 0;
+		goto free_data;
 	}
 
-	err = bpf_map_update_elem(map->fd, &zero, map->mmaped, 0);
+	err = bpf_map_update_elem(map->fd, &zero, data, 0);
 	if (err) {
 		err = -errno;
 		pr_warn("map '%s': failed to set initial contents: %s\n",
 			bpf_map__name(map), errstr(err));
-		return err;
+		goto free_data;
 	}
 
 	/* Freeze .rodata and .kconfig map as read-only from syscall side. */
@@ -5155,7 +5203,7 @@ bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
 			err = -errno;
 			pr_warn("map '%s': failed to freeze as read-only: %s\n",
 				bpf_map__name(map), errstr(err));
-			return err;
+			goto free_data;
 		}
 	}
 
@@ -5182,7 +5230,7 @@ bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
 			err = -errno;
 			pr_warn("map '%s': failed to re-mmap() contents: %s\n",
 				bpf_map__name(map), errstr(err));
-			return err;
+			goto free_data;
 		}
 		map->mmaped = mmaped;
 	} else if (map->mmaped) {
@@ -5190,7 +5238,10 @@ bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
 		map->mmaped = NULL;
 	}
 
-	return 0;
+free_data:
+	if (is_percpu)
+		free(data);
+	return err;
 }
 
 static void bpf_map__destroy(struct bpf_map *map);
@@ -10214,16 +10265,20 @@ int bpf_map__fd(const struct bpf_map *map)
 
 static bool map_uses_real_name(const struct bpf_map *map)
 {
-	/* Since libbpf started to support custom .data.* and .rodata.* maps,
-	 * their user-visible name differs from kernel-visible name. Users see
-	 * such map's corresponding ELF section name as a map name.
-	 * This check distinguishes .data/.rodata from .data.* and .rodata.*
-	 * maps to know which name has to be returned to the user.
+	/* Since libbpf started to support custom .data.*, .rodata.* and
+	 * .data..percpu.* maps, their user-visible name differs from
+	 * kernel-visible name. Users see such map's corresponding ELF section
+	 * name as a map name. This check distinguishes
+	 * .data/.rodata/.data..percpu from .data.*, .rodata.* and
+	 * .data..percpu.* maps to know which name has to be returned to the
+	 * user.
 	 */
 	if (map->libbpf_type == LIBBPF_MAP_DATA && strcmp(map->real_name, DATA_SEC) != 0)
 		return true;
 	if (map->libbpf_type == LIBBPF_MAP_RODATA && strcmp(map->real_name, RODATA_SEC) != 0)
 		return true;
+	if (map->libbpf_type == LIBBPF_MAP_PERCPU_DATA && strcmp(map->real_name, PERCPU_DATA_SEC) != 0)
+		return true;
 	return false;
 }
 
@@ -10468,6 +10523,11 @@ bool bpf_map__is_internal(const struct bpf_map *map)
 	return map->libbpf_type != LIBBPF_MAP_UNSPEC;
 }
 
+bool bpf_map__is_internal_percpu(const struct bpf_map *map)
+{
+	return map->libbpf_type == LIBBPF_MAP_PERCPU_DATA;
+}
+
 __u32 bpf_map__ifindex(const struct bpf_map *map)
 {
 	return map->map_ifindex;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index d39f19c8396dc..db5468f78b090 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1091,6 +1091,15 @@ LIBBPF_API void *bpf_map__initial_value(const struct bpf_map *map, size_t *psize
  */
 LIBBPF_API bool bpf_map__is_internal(const struct bpf_map *map);
 
+/**
+ * @brief **bpf_map__is_internal_percpu()** tells the caller whether or not
+ * the passed map is an internal map used for global percpu variables.
+ * @param map the bpf_map
+ * @return true, if the map is an internal map used for global percpu
+ * variables; false, otherwise
+ */
+LIBBPF_API bool bpf_map__is_internal_percpu(const struct bpf_map *map);
+
 /**
  * @brief **bpf_map__set_pin_path()** sets the path attribute that tells where the
  * BPF map should be pinned. This does not actually create the 'pin'.
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 1205f9a4fe048..1c239ac88c699 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -443,4 +443,5 @@ LIBBPF_1.6.0 {
 		bpf_program__line_info_cnt;
 		btf__add_decl_attr;
 		btf__add_type_attr;
+		bpf_map__is_internal_percpu;
 } LIBBPF_1.5.0;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH bpf-next v3 3/4] bpf, bpftool: Generate skeleton for global percpu data
  2025-05-26 16:21 [PATCH bpf-next v3 0/4] bpf: Introduce global percpu data Leon Hwang
  2025-05-26 16:21 ` [PATCH bpf-next v3 1/4] " Leon Hwang
  2025-05-26 16:21 ` [PATCH bpf-next v3 2/4] bpf, libbpf: Support " Leon Hwang
@ 2025-05-26 16:21 ` Leon Hwang
  2025-05-27 22:31   ` Andrii Nakryiko
  2025-05-26 16:21 ` [PATCH bpf-next v3 4/4] selftests/bpf: Add cases to test " Leon Hwang
  2025-05-27 22:31 ` [PATCH bpf-next v3 0/4] bpf: Introduce " Andrii Nakryiko
  4 siblings, 1 reply; 24+ messages in thread
From: Leon Hwang @ 2025-05-26 16:21 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, yonghong.song, song, eddyz87, qmo, dxu,
	leon.hwang, kernel-patches-bot

This patch enhances bpftool to generate skeletons that properly handle
global percpu variables. The generated skeleton now includes a dedicated
structure for percpu data, allowing users to initialize and access percpu
variables more efficiently.

For global percpu variables, the skeleton now includes a nested
structure, e.g.:

struct test_global_percpu_data {
	struct bpf_object_skeleton *skeleton;
	struct bpf_object *obj;
	struct {
		struct bpf_map *data__percpu;
	} maps;
	// ...
	struct test_global_percpu_data__data__percpu {
		int data;
		char run;
		struct {
			char set;
			int i;
			int nums[7];
		} struct_data;
		int nums[7];
	} __aligned(8) *data__percpu;

	// ...
};

  * The "struct test_global_percpu_data__data__percpu *data__percpu" points
    to initialized data, which is actually "maps.data__percpu->mmaped".
  * Before loading the skeleton, updating the
    "struct test_global_percpu_data__data__percpu *data__percpu" modifies
    the initial value of the corresponding global percpu variables.
  * After loading the skeleton, accessing or updating this struct is not
    allowed because this struct pointer has been reset to NULL. Instead,
    users must interact with the global percpu variables via the
    "maps.data__percpu" map.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 tools/bpf/bpftool/gen.c | 47 +++++++++++++++++++++++++++++------------
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c
index 67a60114368f5..c672f52110221 100644
--- a/tools/bpf/bpftool/gen.c
+++ b/tools/bpf/bpftool/gen.c
@@ -92,7 +92,7 @@ static void get_header_guard(char *guard, const char *obj_name, const char *suff
 
 static bool get_map_ident(const struct bpf_map *map, char *buf, size_t buf_sz)
 {
-	static const char *sfxs[] = { ".data", ".rodata", ".bss", ".kconfig" };
+	static const char *sfxs[] = { ".data..percpu", ".data", ".rodata", ".bss", ".kconfig" };
 	const char *name = bpf_map__name(map);
 	int i, n;
 
@@ -117,7 +117,7 @@ static bool get_map_ident(const struct bpf_map *map, char *buf, size_t buf_sz)
 
 static bool get_datasec_ident(const char *sec_name, char *buf, size_t buf_sz)
 {
-	static const char *pfxs[] = { ".data", ".rodata", ".bss", ".kconfig" };
+	static const char *pfxs[] = { ".data..percpu", ".data", ".rodata", ".bss", ".kconfig" };
 	int i, n;
 
 	/* recognize hard coded LLVM section name */
@@ -148,7 +148,8 @@ static int codegen_datasec_def(struct bpf_object *obj,
 			       struct btf *btf,
 			       struct btf_dump *d,
 			       const struct btf_type *sec,
-			       const char *obj_name)
+			       const char *obj_name,
+			       bool is_percpu)
 {
 	const char *sec_name = btf__name_by_offset(btf, sec->name_off);
 	const struct btf_var_secinfo *sec_var = btf_var_secinfos(sec);
@@ -228,7 +229,7 @@ static int codegen_datasec_def(struct bpf_object *obj,
 
 		off = sec_var->offset + sec_var->size;
 	}
-	printf("	} *%s;\n", sec_ident);
+	printf("	}%s *%s;\n", is_percpu ? " __aligned(8)" : "", sec_ident);
 	return 0;
 }
 
@@ -263,13 +264,13 @@ static bool is_mmapable_map(const struct bpf_map *map, char *buf, size_t sz)
 		return true;
 	}
 
-	if (!bpf_map__is_internal(map) || !(bpf_map__map_flags(map) & BPF_F_MMAPABLE))
-		return false;
-
-	if (!get_map_ident(map, buf, sz))
-		return false;
+	if (bpf_map__is_internal(map) &&
+	    ((bpf_map__map_flags(map) & BPF_F_MMAPABLE) ||
+	     bpf_map__is_internal_percpu(map)) &&
+	    get_map_ident(map, buf, sz))
+		return true;
 
-	return true;
+	return false;
 }
 
 static int codegen_datasecs(struct bpf_object *obj, const char *obj_name)
@@ -303,7 +304,8 @@ static int codegen_datasecs(struct bpf_object *obj, const char *obj_name)
 			printf("	struct %s__%s {\n", obj_name, map_ident);
 			printf("	} *%s;\n", map_ident);
 		} else {
-			err = codegen_datasec_def(obj, btf, d, sec, obj_name);
+			err = codegen_datasec_def(obj, btf, d, sec, obj_name,
+						  bpf_map__is_internal_percpu(map));
 			if (err)
 				goto out;
 		}
@@ -795,7 +797,8 @@ static int gen_trace(struct bpf_object *obj, const char *obj_name, const char *h
 	bpf_object__for_each_map(map, obj) {
 		const char *mmap_flags;
 
-		if (!is_mmapable_map(map, ident, sizeof(ident)))
+		if (!is_mmapable_map(map, ident, sizeof(ident)) ||
+		    bpf_map__is_internal_percpu(map))
 			continue;
 
 		if (bpf_map__map_flags(map) & BPF_F_RDONLY_PROG)
@@ -1434,7 +1437,25 @@ static int do_skeleton(int argc, char **argv)
 		static inline int					    \n\
 		%1$s__load(struct %1$s *obj)				    \n\
 		{							    \n\
-			return bpf_object__load_skeleton(obj->skeleton);    \n\
+			int err;					    \n\
+									    \n\
+			err = bpf_object__load_skeleton(obj->skeleton);	    \n\
+			if (err)					    \n\
+				return err;				    \n\
+									    \n\
+		", obj_name);
+
+	if (map_cnt) {
+		bpf_object__for_each_map(map, obj) {
+			if (bpf_map__is_internal_percpu(map) &&
+			    get_map_ident(map, ident, sizeof(ident)))
+				printf("\tobj->%s = NULL;\n", ident);
+		}
+	}
+
+	codegen("\
+		\n\
+			return 0;					    \n\
 		}							    \n\
 									    \n\
 		static inline struct %1$s *				    \n\
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH bpf-next v3 4/4] selftests/bpf: Add cases to test global percpu data
  2025-05-26 16:21 [PATCH bpf-next v3 0/4] bpf: Introduce global percpu data Leon Hwang
                   ` (2 preceding siblings ...)
  2025-05-26 16:21 ` [PATCH bpf-next v3 3/4] bpf, bpftool: Generate skeleton for " Leon Hwang
@ 2025-05-26 16:21 ` Leon Hwang
  2025-05-27 22:31 ` [PATCH bpf-next v3 0/4] bpf: Introduce " Andrii Nakryiko
  4 siblings, 0 replies; 24+ messages in thread
From: Leon Hwang @ 2025-05-26 16:21 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, yonghong.song, song, eddyz87, qmo, dxu,
	leon.hwang, kernel-patches-bot

If the arch (e.g., s390x) does not support the percpu instruction, these
cases do not exercise global percpu data; instead, they verify that loading
the prog fails with -EOPNOTSUPP.

The following APIs have been tested for global percpu data:
1. bpf_map__set_initial_value()
2. bpf_map__initial_value()
3. generated percpu struct pointer pointing to internal map's mmaped data
4. bpf_map__lookup_elem() for global percpu data map
5. bpf_map__is_internal_percpu()

The case is also tested with a light skeleton generated by
'bpftool gen skeleton -L'.

cd tools/testing/selftests/bpf; ./test_progs -t global_percpu_data
132     global_percpu_data_init:OK
133     global_percpu_data_lskel:OK
Summary: 2/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 tools/testing/selftests/bpf/Makefile          |   2 +-
 .../bpf/prog_tests/global_data_init.c         | 221 +++++++++++++++++-
 .../bpf/progs/test_global_percpu_data.c       |  29 +++
 3 files changed, 250 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_global_percpu_data.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index cf5ed3bee573e..26121f53fa420 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -503,7 +503,7 @@ LSKELS := fentry_test.c fexit_test.c fexit_sleep.c atomics.c 		\
 
 # Generate both light skeleton and libbpf skeleton for these
 LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test.c \
-	kfunc_call_test_subprog.c
+	kfunc_call_test_subprog.c test_global_percpu_data.c
 SKEL_BLACKLIST += $$(LSKELS)
 
 test_static_linked.skel.h-deps := test_static_linked1.bpf.o test_static_linked2.bpf.o
diff --git a/tools/testing/selftests/bpf/prog_tests/global_data_init.c b/tools/testing/selftests/bpf/prog_tests/global_data_init.c
index 8466332d7406f..51fdc9c190cac 100644
--- a/tools/testing/selftests/bpf/prog_tests/global_data_init.c
+++ b/tools/testing/selftests/bpf/prog_tests/global_data_init.c
@@ -1,5 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
+#include "bpf/libbpf_internal.h"
+#include "test_global_percpu_data.skel.h"
+#include "test_global_percpu_data.lskel.h"
 
 void test_global_data_init(void)
 {
@@ -8,7 +11,7 @@ void test_global_data_init(void)
 	__u8 *buff = NULL, *newval = NULL;
 	struct bpf_object *obj;
 	struct bpf_map *map;
-        __u32 duration = 0;
+	__u32 duration = 0;
 	size_t sz;
 
 	obj = bpf_object__open_file(file, NULL);
@@ -60,3 +63,219 @@ void test_global_data_init(void)
 	free(newval);
 	bpf_object__close(obj);
 }
+
+void test_global_percpu_data_init(void)
+{
+	struct test_global_percpu_data__data__percpu init_value, *init_data, *data, *percpu_data;
+	int key, prog_fd, err, num_cpus, num_online, i;
+	struct test_global_percpu_data *skel = NULL;
+	__u64 args[2] = {0x1234ULL, 0x5678ULL};
+	size_t elem_sz, init_data_sz;
+	struct bpf_map *map;
+	bool *online;
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .ctx_in = args,
+		    .ctx_size_in = sizeof(args),
+		    .flags = BPF_F_TEST_RUN_ON_CPU,
+	);
+
+	num_cpus = libbpf_num_possible_cpus();
+	if (!ASSERT_GT(num_cpus, 0, "libbpf_num_possible_cpus"))
+		return;
+
+	err = parse_cpu_mask_file("/sys/devices/system/cpu/online",
+				  &online, &num_online);
+	if (!ASSERT_OK(err, "parse_cpu_mask_file"))
+		return;
+
+	elem_sz = sizeof(*percpu_data);
+	percpu_data = calloc(num_cpus, elem_sz);
+	if (!ASSERT_OK_PTR(percpu_data, "calloc percpu_data"))
+		goto out;
+
+	skel = test_global_percpu_data__open();
+	if (!ASSERT_OK_PTR(skel, "test_global_percpu_data__open"))
+		goto out;
+	if (!ASSERT_OK_PTR(skel->data__percpu, "skel->data__percpu"))
+		goto out;
+
+	ASSERT_EQ(skel->data__percpu->data, -1, "skel->data__percpu->data");
+	ASSERT_FALSE(skel->data__percpu->run, "skel->data__percpu->run");
+	ASSERT_EQ(skel->data__percpu->nums[6], 0, "skel->data__percpu->nums[6]");
+	ASSERT_EQ(skel->data__percpu->struct_data.i, -1, "struct_data.i");
+	ASSERT_FALSE(skel->data__percpu->struct_data.set, "struct_data.set");
+	ASSERT_EQ(skel->data__percpu->struct_data.nums[6], 0, "struct_data.nums[6]");
+
+	map = skel->maps.data__percpu;
+	if (!ASSERT_EQ(bpf_map__type(map), BPF_MAP_TYPE_PERCPU_ARRAY, "bpf_map__type"))
+		goto out;
+	if (!ASSERT_TRUE(bpf_map__is_internal_percpu(map), "bpf_map__is_internal_percpu"))
+		goto out;
+
+	init_value.data = 2;
+	init_value.nums[6] = -1;
+	init_value.struct_data.i = 2;
+	init_value.struct_data.nums[6] = -1;
+	err = bpf_map__set_initial_value(map, &init_value, sizeof(init_value));
+	if (!ASSERT_OK(err, "bpf_map__set_initial_value"))
+		goto out;
+
+	init_data = bpf_map__initial_value(map, &init_data_sz);
+	if (!ASSERT_OK_PTR(init_data, "bpf_map__initial_value"))
+		goto out;
+
+	ASSERT_EQ(init_data->data, init_value.data, "init_value data");
+	ASSERT_EQ(init_data->run, init_value.run, "init_value run");
+	ASSERT_EQ(init_data->struct_data.i, init_value.struct_data.i,
+		  "init_value struct_data.i");
+	ASSERT_EQ(init_data->struct_data.nums[6],
+		  init_value.struct_data.nums[6],
+		  "init_value struct_data.nums[6]");
+	ASSERT_EQ(init_data_sz, sizeof(init_value), "init_value size");
+	ASSERT_EQ((void *) init_data, (void *) skel->data__percpu,
+		  "skel->data__percpu eq init_data");
+	ASSERT_EQ(skel->data__percpu->data, init_value.data,
+		  "skel->data__percpu->data");
+	ASSERT_EQ(skel->data__percpu->run, init_value.run,
+		  "skel->data__percpu->run");
+	ASSERT_EQ(skel->data__percpu->struct_data.i, init_value.struct_data.i,
+		  "skel->data__percpu->struct_data.i");
+	ASSERT_EQ(skel->data__percpu->struct_data.nums[6],
+		  init_value.struct_data.nums[6],
+		  "skel->data__percpu->struct_data.nums[6]");
+
+	err = test_global_percpu_data__load(skel);
+	if (err == -EOPNOTSUPP) {
+		test__skip();
+		goto out;
+	}
+	if (!ASSERT_OK(err, "test_global_percpu_data__load"))
+		goto out;
+
+	ASSERT_NULL(skel->data__percpu, "skel->data__percpu");
+
+	err = test_global_percpu_data__attach(skel);
+	if (!ASSERT_OK(err, "test_global_percpu_data__attach"))
+		goto out;
+
+	prog_fd = bpf_program__fd(skel->progs.update_percpu_data);
+
+	/* run on every CPU */
+	for (i = 0; i < num_online; i++) {
+		if (!online[i])
+			continue;
+
+		topts.cpu = i;
+		topts.retval = 0;
+		err = bpf_prog_test_run_opts(prog_fd, &topts);
+		ASSERT_OK(err, "bpf_prog_test_run_opts");
+		ASSERT_EQ(topts.retval, 0, "bpf_prog_test_run_opts retval");
+	}
+
+	key = 0;
+	err = bpf_map__lookup_elem(map, &key, sizeof(key), percpu_data,
+				   elem_sz * num_cpus, 0);
+	if (!ASSERT_OK(err, "bpf_map__lookup_elem"))
+		goto out;
+
+	for (i = 0; i < num_online; i++) {
+		if (!online[i])
+			continue;
+
+		data = percpu_data + i;
+		ASSERT_EQ(data->data, 1, "percpu_data->data");
+		ASSERT_TRUE(data->run, "percpu_data->run");
+		ASSERT_EQ(data->nums[6], 0xc0de, "percpu_data->nums[6]");
+		ASSERT_EQ(data->struct_data.i, 1, "struct_data.i");
+		ASSERT_TRUE(data->struct_data.set, "struct_data.set");
+		ASSERT_EQ(data->struct_data.nums[6], 0xc0de, "struct_data.nums[6]");
+	}
+
+out:
+	test_global_percpu_data__destroy(skel);
+	if (percpu_data)
+		free(percpu_data);
+	free(online);
+}
+
+void test_global_percpu_data_lskel(void)
+{
+	struct test_global_percpu_data__data__percpu *data, *percpu_data;
+	int key, prog_fd, map_fd, err, num_cpus, num_online, i;
+	struct test_global_percpu_data_lskel *lskel = NULL;
+	__u64 args[2] = {0x1234ULL, 0x5678ULL};
+	bool *online;
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .ctx_in = args,
+		    .ctx_size_in = sizeof(args),
+		    .flags = BPF_F_TEST_RUN_ON_CPU,
+	);
+
+	num_cpus = libbpf_num_possible_cpus();
+	if (!ASSERT_GT(num_cpus, 0, "libbpf_num_possible_cpus"))
+		return;
+
+	err = parse_cpu_mask_file("/sys/devices/system/cpu/online",
+				  &online, &num_online);
+	if (!ASSERT_OK(err, "parse_cpu_mask_file"))
+		return;
+
+	percpu_data = calloc(num_cpus, sizeof(*percpu_data));
+	if (!ASSERT_OK_PTR(percpu_data, "calloc percpu_data"))
+		goto out;
+
+	lskel = test_global_percpu_data_lskel__open();
+	if (!ASSERT_OK_PTR(lskel, "test_global_percpu_data_lskel__open"))
+		goto out;
+
+	err = test_global_percpu_data_lskel__load(lskel);
+	if (err == -EOPNOTSUPP) {
+		test__skip();
+		goto out;
+	}
+	if (!ASSERT_OK(err, "test_global_percpu_data_lskel__load"))
+		goto out;
+
+	err = test_global_percpu_data_lskel__attach(lskel);
+	if (!ASSERT_OK(err, "test_global_percpu_data_lskel__attach"))
+		goto out;
+
+	prog_fd = lskel->progs.update_percpu_data.prog_fd;
+
+	/* run on every CPU */
+	for (i = 0; i < num_online; i++) {
+		if (!online[i])
+			continue;
+
+		topts.cpu = i;
+		topts.retval = 0;
+		err = bpf_prog_test_run_opts(prog_fd, &topts);
+		ASSERT_OK(err, "bpf_prog_test_run_opts");
+		ASSERT_EQ(topts.retval, 0, "bpf_prog_test_run_opts retval");
+	}
+
+	key = 0;
+	map_fd = lskel->maps.data__percpu.map_fd;
+	err = bpf_map_lookup_elem(map_fd, &key, percpu_data);
+	if (!ASSERT_OK(err, "bpf_map_lookup_elem"))
+		goto out;
+
+	for (i = 0; i < num_online; i++) {
+		if (!online[i])
+			continue;
+
+		data = percpu_data + i;
+		ASSERT_EQ(data->data, 1, "percpu_data->data");
+		ASSERT_TRUE(data->run, "percpu_data->run");
+		ASSERT_EQ(data->nums[6], 0xc0de, "percpu_data->nums[6]");
+		ASSERT_EQ(data->struct_data.i, 1, "struct_data.i");
+		ASSERT_TRUE(data->struct_data.set, "struct_data.set");
+		ASSERT_EQ(data->struct_data.nums[6], 0xc0de, "struct_data.nums[6]");
+	}
+
+out:
+	test_global_percpu_data_lskel__destroy(lskel);
+	if (percpu_data)
+		free(percpu_data);
+	free(online);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_global_percpu_data.c b/tools/testing/selftests/bpf/progs/test_global_percpu_data.c
new file mode 100644
index 0000000000000..2cf8566b5465f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_global_percpu_data.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+int data SEC(".data..percpu") = -1;
+char run SEC(".data..percpu") = 0;
+int nums[7] SEC(".data..percpu");
+struct {
+	char set;
+	int i;
+	int nums[7];
+} struct_data SEC(".data..percpu") = {
+	.set = 0,
+	.i = -1,
+};
+
+SEC("raw_tp/task_rename")
+int update_percpu_data(struct __sk_buff *skb)
+{
+	struct_data.nums[6] = 0xc0de;
+	struct_data.set = 1;
+	struct_data.i = 1;
+	nums[6] = 0xc0de;
+	data = 1;
+	run = 1;
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next v3 1/4] bpf: Introduce global percpu data
  2025-05-26 16:21 ` [PATCH bpf-next v3 1/4] " Leon Hwang
@ 2025-05-27 22:31   ` Andrii Nakryiko
  2025-05-29  2:03     ` Leon Hwang
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2025-05-27 22:31 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, yonghong.song, song, eddyz87, qmo, dxu,
	kernel-patches-bot

On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> This patch introduces global percpu data, inspired by commit
> 6316f78306c1 ("Merge branch 'support-global-data'"). It enables the
> definition of global percpu variables in BPF, similar to the
> DEFINE_PER_CPU() macro in the kernel[0].
>
> For example, in BPF, it is possible to define a global percpu variable like:
>
> int data SEC(".data..percpu");
>
> With this patch, tools like retsnoop[1] and bpfsnoop[2] can simplify their
> BPF code for handling LBRs. The code can be updated from
>
> static struct perf_branch_entry lbrs[1][MAX_LBR_ENTRIES] SEC(".data.lbrs");
>
> to
>
> static struct perf_branch_entry lbrs[MAX_LBR_ENTRIES] SEC(".data..percpu.lbrs");
>
> This eliminates the need to retrieve the CPU ID using the
> bpf_get_smp_processor_id() helper.
>
> Additionally, by reusing the global percpu data map, sharing information
> between tail callers and callees or freplace callers and callees becomes
> simpler than reusing percpu_array maps.
>
> Links:
> [0] https://github.com/torvalds/linux/blob/fbfd64d25c7af3b8695201ebc85efe90be28c5a3/include/linux/percpu-defs.h#L114
> [1] https://github.com/anakryiko/retsnoop
> [2] https://github.com/bpfsnoop/bpfsnoop
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  kernel/bpf/arraymap.c | 41 +++++++++++++++++++++++++++++++++++++--
>  kernel/bpf/verifier.c | 45 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 84 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index eb28c0f219ee4..91d06f0165a6e 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -249,6 +249,40 @@ static void *percpu_array_map_lookup_elem(struct bpf_map *map, void *key)
>         return this_cpu_ptr(array->pptrs[index & array->index_mask]);
>  }
>
> +static int percpu_array_map_direct_value_addr(const struct bpf_map *map,
> +                                             u64 *imm, u32 off)
> +{
> +       struct bpf_array *array = container_of(map, struct bpf_array, map);
> +
> +       if (map->max_entries != 1)
> +               return -EOPNOTSUPP;
> +       if (off >= map->value_size)
> +               return -EINVAL;
> +       if (!bpf_jit_supports_percpu_insn())
> +               return -EOPNOTSUPP;
> +
> +       *imm = (u64) array->pptrs[0];
> +       return 0;
> +}
> +
> +static int percpu_array_map_direct_value_meta(const struct bpf_map *map,
> +                                             u64 imm, u32 *off)
> +{
> +       struct bpf_array *array = container_of(map, struct bpf_array, map);
> +       u64 base = (u64) array->pptrs[0];
> +       u64 range = array->elem_size;
> +
> +       if (map->max_entries != 1)
> +               return -EOPNOTSUPP;
> +       if (imm < base || imm >= base + range)
> +               return -ENOENT;
> +       if (!bpf_jit_supports_percpu_insn())
> +               return -EOPNOTSUPP;
> +
> +       *off = imm - base;
> +       return 0;
> +}
> +
>  /* emit BPF instructions equivalent to C code of percpu_array_map_lookup_elem() */
>  static int percpu_array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
>  {
> @@ -532,9 +566,10 @@ static int array_map_check_btf(const struct bpf_map *map,
>  {
>         u32 int_data;
>
> -       /* One exception for keyless BTF: .bss/.data/.rodata map */
> +       /* One exception for keyless BTF: .bss/.data/.rodata/.data..percpu map */
>         if (btf_type_is_void(key_type)) {
> -               if (map->map_type != BPF_MAP_TYPE_ARRAY ||
> +               if ((map->map_type != BPF_MAP_TYPE_ARRAY &&
> +                    map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY) ||
>                     map->max_entries != 1)
>                         return -EINVAL;
>
> @@ -815,6 +850,8 @@ const struct bpf_map_ops percpu_array_map_ops = {
>         .map_get_next_key = array_map_get_next_key,
>         .map_lookup_elem = percpu_array_map_lookup_elem,
>         .map_gen_lookup = percpu_array_map_gen_lookup,
> +       .map_direct_value_addr = percpu_array_map_direct_value_addr,
> +       .map_direct_value_meta = percpu_array_map_direct_value_meta,
>         .map_update_elem = array_map_update_elem,
>         .map_delete_elem = array_map_delete_elem,
>         .map_lookup_percpu_elem = percpu_array_map_lookup_percpu_elem,
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d5807d2efc922..9203354208732 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -6939,6 +6939,8 @@ static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val,
>         u64 addr;
>         int err;
>
> +       if (map->map_type != BPF_MAP_TYPE_ARRAY)
> +               return -EINVAL;
>         err = map->ops->map_direct_value_addr(map, &addr, off);
>         if (err)
>                 return err;
> @@ -7451,6 +7453,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>                         /* if map is read-only, track its contents as scalars */
>                         if (tnum_is_const(reg->var_off) &&
>                             bpf_map_is_rdonly(map) &&
> +                           map->map_type == BPF_MAP_TYPE_ARRAY &&
>                             map->ops->map_direct_value_addr) {
>                                 int map_off = off + reg->var_off.value;
>                                 u64 val = 0;
> @@ -9414,6 +9417,11 @@ static int check_reg_const_str(struct bpf_verifier_env *env,
>                 return -EACCES;
>         }
>
> +       if (map->map_type != BPF_MAP_TYPE_ARRAY) {
> +               verbose(env, "only array map supports direct string value access\n");
> +               return -EINVAL;
> +       }
> +
>         err = check_map_access(env, regno, reg->off,
>                                map->value_size - reg->off, false,
>                                ACCESS_HELPER);
> @@ -11101,6 +11109,11 @@ static int check_bpf_snprintf_call(struct bpf_verifier_env *env,
>                 return -EINVAL;
>         num_args = data_len_reg->var_off.value / 8;
>
> +       if (fmt_map->map_type != BPF_MAP_TYPE_ARRAY) {
> +               verbose(env, "only array map supports snprintf\n");
> +               return -EINVAL;
> +       }
> +
>         /* fmt being ARG_PTR_TO_CONST_STR guarantees that var_off is const
>          * and map_direct_value_addr is set.
>          */
> @@ -21906,6 +21919,38 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>                         goto next_insn;
>                 }
>
> +#ifdef CONFIG_SMP

Instead of CONFIG_SMP, I think it's more appropriate to check for
bpf_jit_supports_percpu_insn(). We check CONFIG_SMP for
BPF_FUNC_get_smp_processor_id inlining because of the `cpu_number` per-CPU
variable, not because BPF_MOV64_PERCPU_REG() doesn't work on single
CPU systems (IIUC).

pw-bot: cr


> +               if (insn->code == (BPF_LD | BPF_IMM | BPF_DW) &&
> +                   (insn->src_reg == BPF_PSEUDO_MAP_VALUE ||
> +                    insn->src_reg == BPF_PSEUDO_MAP_IDX_VALUE)) {
> +                       struct bpf_map *map;
> +
> +                       aux = &env->insn_aux_data[i + delta];
> +                       map = env->used_maps[aux->map_index];
> +                       if (map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
> +                               goto next_insn;
> +
> +                       /* Reuse the original ld_imm64 insn. And add one
> +                        * mov64_percpu_reg insn.
> +                        */
> +
> +                       insn_buf[0] = insn[1];
> +                       insn_buf[1] = BPF_MOV64_PERCPU_REG(insn->dst_reg, insn->dst_reg);
> +                       cnt = 2;
> +
> +                       i++;
> +                       new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
> +                       if (!new_prog)
> +                               return -ENOMEM;
> +
> +                       delta    += cnt - 1;
> +                       env->prog = prog = new_prog;
> +                       insn      = new_prog->insnsi + i + delta;
> +
> +                       goto next_insn;
> +               }
> +#endif
> +
>                 if (insn->code != (BPF_JMP | BPF_CALL))
>                         goto next_insn;
>                 if (insn->src_reg == BPF_PSEUDO_CALL)
> --
> 2.49.0
>


* Re: [PATCH bpf-next v3 0/4] bpf: Introduce global percpu data
  2025-05-26 16:21 [PATCH bpf-next v3 0/4] bpf: Introduce global percpu data Leon Hwang
                   ` (3 preceding siblings ...)
  2025-05-26 16:21 ` [PATCH bpf-next v3 4/4] selftests/bpf: Add cases to test " Leon Hwang
@ 2025-05-27 22:31 ` Andrii Nakryiko
  2025-05-28 17:10   ` Yonghong Song
  4 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2025-05-27 22:31 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, yonghong.song, song, eddyz87, qmo, dxu,
	kernel-patches-bot

On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> This patch set introduces global percpu data, similar to commit
> 6316f78306c1 ("Merge branch 'support-global-data'"), to reduce restrictions
> in C for BPF programs.
>
> With this enhancement, it becomes possible to define and use global percpu
> variables, like the DEFINE_PER_CPU() macro in the kernel[0].
>
> The section name for global percpu data is ".data..percpu". It cannot be
> named ".percpu" or ".percpudata" because defining a one-byte percpu
> variable (e.g., char run SEC(".data..percpu") = 0;) can trigger a crash
> with Clang 17[1]. The name ".data.percpu" is also avoided because some

Does this happen with newer Clangs? If not, I don't think a bug in
Clang 17 is reason enough for this weird '.data..percpu' naming
convention. I'd still very much prefer .percpu prefix. .data is used
for non-per-CPU data, we shouldn't share the prefix, if we can avoid
that.

pw-bot: cr


> users already use section names prefixed with ".data.percpu", such as in
> this example from test_global_map_resize.c:
>
> int percpu_arr[1] SEC(".data.percpu_arr");
>
> The idea stems from bpfsnoop[2], which itself was inspired by
> retsnoop[3]. During testing of bpfsnoop on the v6.6 kernel, two LBR
> (Last Branch Record) entries were observed related to the
> bpf_get_smp_processor_id() helper.
>
> Since commit 1ae6921009e5 ("bpf: inline bpf_get_smp_processor_id() helper"),
> the bpf_get_smp_processor_id() helper has been inlined on x86_64, reducing
> the overhead and consequently minimizing these two LBR records.
>
> However, the introduction of global percpu data offers a more robust
> solution. By leveraging the percpu_array map and percpu instruction,
> global percpu data can be implemented intrinsically.
>
> This feature also facilitates sharing percpu information between tail
> callers and callees or between freplace callers and callees through a
> shared global percpu variable. Previously, this was achieved using a
> 1-entry percpu_array map, which this patch set aims to improve upon.
>
> Links:
> [0] https://github.com/torvalds/linux/blob/fbfd64d25c7af3b8695201ebc85efe90be28c5a3/include/linux/percpu-defs.h#L114
> [1] https://lore.kernel.org/bpf/fd1b3f58-c27f-403d-ad99-644b7d06ecb3@linux.dev/
> [2] https://github.com/bpfsnoop/bpfsnoop
> [3] https://github.com/anakryiko/retsnoop
>
> Changes:
> v2 -> v3:
>   * Use ".data..percpu" as PERCPU_DATA_SEC.
>   * Address comment from Alexei:
>     * Add u8, array of ints and struct { .. } vars to selftest.
>
> v1 -> v2:
>   * Address comments from Andrii:
>     * Use LIBBPF_MAP_PERCPU and SEC_PERCPU.
>     * Reuse mmaped of libbpf's struct bpf_map for .percpu map data.
>     * Set .percpu struct pointer to NULL after loading skeleton.
>     * Make sure value size of .percpu map is __aligned(8).
>     * Use raw_tp and opts.cpu to test global percpu variables on all CPUs.
>   * Address comments from Alexei:
>     * Test non-zero offset of global percpu variable.
>     * Test case about BPF_PSEUDO_MAP_IDX_VALUE.
>
> rfc -> v1:
>   * Address comments from Andrii:
>     * Keep one image of global percpu variable for all CPUs.
>     * Reject non-ARRAY map in bpf_map_direct_read(), check_reg_const_str(),
>       and check_bpf_snprintf_call() in verifier.
>     * Split out libbpf changes from kernel-side changes.
>     * Use ".percpu" as PERCPU_DATA_SEC.
>     * Use enum libbpf_map_type to distinguish BSS, DATA, RODATA and
>       PERCPU_DATA.
>     * Avoid using errno for checking err from libbpf_num_possible_cpus().
>     * Use "map '%s': " prefix for error message.
>
> Leon Hwang (4):
>   bpf: Introduce global percpu data
>   bpf, libbpf: Support global percpu data
>   bpf, bpftool: Generate skeleton for global percpu data
>   selftests/bpf: Add cases to test global percpu data
>
>  kernel/bpf/arraymap.c                         |  41 +++-
>  kernel/bpf/verifier.c                         |  45 ++++
>  tools/bpf/bpftool/gen.c                       |  47 ++--
>  tools/lib/bpf/libbpf.c                        | 102 ++++++--
>  tools/lib/bpf/libbpf.h                        |   9 +
>  tools/lib/bpf/libbpf.map                      |   1 +
>  tools/testing/selftests/bpf/Makefile          |   2 +-
>  .../bpf/prog_tests/global_data_init.c         | 221 +++++++++++++++++-
>  .../bpf/progs/test_global_percpu_data.c       |  29 +++
>  9 files changed, 459 insertions(+), 38 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/test_global_percpu_data.c
>
> --
> 2.49.0
>


* Re: [PATCH bpf-next v3 2/4] bpf, libbpf: Support global percpu data
  2025-05-26 16:21 ` [PATCH bpf-next v3 2/4] bpf, libbpf: Support " Leon Hwang
@ 2025-05-27 22:31   ` Andrii Nakryiko
  2025-05-29  2:24     ` Leon Hwang
  2025-05-27 22:40   ` Alexei Starovoitov
  1 sibling, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2025-05-27 22:31 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, yonghong.song, song, eddyz87, qmo, dxu,
	kernel-patches-bot

On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> This patch introduces support for global percpu data in libbpf by adding a
> new ".data..percpu" section, similar to ".data". It enables efficient
> handling of percpu global variables in bpf programs.
>
> This enhancement improves performance for workloads that benefit from
> percpu storage.
>
> Meanwhile, add bpf_map__is_internal_percpu() API to check whether the map
> is an internal map used for global percpu variables.

I'm not a big fan of this super specific API. We do have
bpf_map__is_internal() to let the consumer know that a map is special in
some way, but I'd like to avoid making this fine distinction between
per-CPU internal map vs non-per-CPU (and then why stop there, why not
have kconfig-specific API, ksym-specific check, etc)?

All this is mostly useful just for bpftool for skeleton codegen, and
bpftool already has to know about .percpu prefix, so it doesn't need
this API to make all these decisions. Let's try to drop this
bpf_map__is_internal_percpu() API?

>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  tools/lib/bpf/libbpf.c   | 102 +++++++++++++++++++++++++++++++--------
>  tools/lib/bpf/libbpf.h   |   9 ++++
>  tools/lib/bpf/libbpf.map |   1 +
>  3 files changed, 91 insertions(+), 21 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index e9c641a2fb203..65f0df09ac6d8 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -518,6 +518,7 @@ struct bpf_struct_ops {
>  };
>
>  #define DATA_SEC ".data"
> +#define PERCPU_DATA_SEC ".data..percpu"
>  #define BSS_SEC ".bss"
>  #define RODATA_SEC ".rodata"
>  #define KCONFIG_SEC ".kconfig"
> @@ -532,6 +533,7 @@ enum libbpf_map_type {
>         LIBBPF_MAP_BSS,
>         LIBBPF_MAP_RODATA,
>         LIBBPF_MAP_KCONFIG,
> +       LIBBPF_MAP_PERCPU_DATA,
>  };
>
>  struct bpf_map_def {
> @@ -642,6 +644,7 @@ enum sec_type {
>         SEC_DATA,
>         SEC_RODATA,
>         SEC_ST_OPS,
> +       SEC_PERCPU_DATA,
>  };
>
>  struct elf_sec_desc {
> @@ -1902,7 +1905,7 @@ static bool map_is_mmapable(struct bpf_object *obj, struct bpf_map *map)
>         struct btf_var_secinfo *vsi;
>         int i, n;
>
> -       if (!map->btf_value_type_id)
> +       if (!map->btf_value_type_id || map->libbpf_type == LIBBPF_MAP_PERCPU_DATA)

Not sure this is correct. We should have btf_value_type_id for PERCPU
global data array, no?

>                 return false;
>
>         t = btf__type_by_id(obj->btf, map->btf_value_type_id);
> @@ -1926,6 +1929,7 @@ static int
>  bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
>                               const char *real_name, int sec_idx, void *data, size_t data_sz)
>  {
> +       bool is_percpu = type == LIBBPF_MAP_PERCPU_DATA;
>         struct bpf_map_def *def;
>         struct bpf_map *map;
>         size_t mmap_sz;
> @@ -1947,9 +1951,9 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
>         }
>
>         def = &map->def;
> -       def->type = BPF_MAP_TYPE_ARRAY;
> +       def->type = is_percpu ? BPF_MAP_TYPE_PERCPU_ARRAY : BPF_MAP_TYPE_ARRAY;
>         def->key_size = sizeof(int);
> -       def->value_size = data_sz;
> +       def->value_size = is_percpu ? roundup(data_sz, 8) : data_sz;
>         def->max_entries = 1;
>         def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG
>                 ? BPF_F_RDONLY_PROG : 0;
> @@ -1960,10 +1964,11 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
>         if (map_is_mmapable(obj, map))
>                 def->map_flags |= BPF_F_MMAPABLE;
>
> -       pr_debug("map '%s' (global data): at sec_idx %d, offset %zu, flags %x.\n",
> -                map->name, map->sec_idx, map->sec_offset, def->map_flags);
> +       pr_debug("map '%s' (global %sdata): at sec_idx %d, offset %zu, flags %x.\n",
> +                map->name, is_percpu ? "percpu " : "", map->sec_idx,
> +                map->sec_offset, def->map_flags);
>
> -       mmap_sz = bpf_map_mmap_sz(map);
> +       mmap_sz = is_percpu ? def->value_size : bpf_map_mmap_sz(map);
>         map->mmaped = mmap(NULL, mmap_sz, PROT_READ | PROT_WRITE,
>                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
>         if (map->mmaped == MAP_FAILED) {
> @@ -1999,6 +2004,13 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
>                         continue;
>
>                 switch (sec_desc->sec_type) {
> +               case SEC_PERCPU_DATA:
> +                       sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx));
> +                       err = bpf_object__init_internal_map(obj, LIBBPF_MAP_PERCPU_DATA,
> +                                                           sec_name, sec_idx,
> +                                                           sec_desc->data->d_buf,
> +                                                           sec_desc->data->d_size);
> +                       break;
>                 case SEC_DATA:
>                         sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx));
>                         err = bpf_object__init_internal_map(obj, LIBBPF_MAP_DATA,
> @@ -3363,6 +3375,10 @@ static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
>                 fixup_offsets = true;
>         }
>
> +       /* .data..percpu DATASEC must have __aligned(8) size. */

please remind me why? similarly for def->value_size special casing?
What will happen if we don't explicitly roundup() on libbpf side
(kernel always does roundup(8) for ARRAY value_size anyways, which is
why I am asking)

> +       if (strcmp(sec_name, PERCPU_DATA_SEC) == 0 || str_has_pfx(sec_name, PERCPU_DATA_SEC))
> +               t->size = roundup(t->size, 8);
> +
>         for (i = 0, vsi = btf_var_secinfos(t); i < vars; i++, vsi++) {
>                 const struct btf_type *t_var;
>                 struct btf_var *var;
> @@ -3923,6 +3939,11 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
>                                 err = bpf_object__add_programs(obj, data, name, idx);
>                                 if (err)
>                                         return err;
> +                       } else if (strcmp(name, PERCPU_DATA_SEC) == 0 ||
> +                                  str_has_pfx(name, PERCPU_DATA_SEC)) {
> +                               sec_desc->sec_type = SEC_PERCPU_DATA;
> +                               sec_desc->shdr = sh;
> +                               sec_desc->data = data;
>                         } else if (strcmp(name, DATA_SEC) == 0 ||
>                                    str_has_pfx(name, DATA_SEC ".")) {
>                                 sec_desc->sec_type = SEC_DATA;
> @@ -4452,6 +4473,7 @@ static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
>         case SEC_BSS:
>         case SEC_DATA:
>         case SEC_RODATA:
> +       case SEC_PERCPU_DATA:
>                 return true;
>         default:
>                 return false;
> @@ -4477,6 +4499,8 @@ bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx)
>                 return LIBBPF_MAP_DATA;
>         case SEC_RODATA:
>                 return LIBBPF_MAP_RODATA;
> +       case SEC_PERCPU_DATA:
> +               return LIBBPF_MAP_PERCPU_DATA;
>         default:
>                 return LIBBPF_MAP_UNSPEC;
>         }
> @@ -4794,7 +4818,7 @@ static int map_fill_btf_type_info(struct bpf_object *obj, struct bpf_map *map)
>
>         /*
>          * LLVM annotates global data differently in BTF, that is,
> -        * only as '.data', '.bss' or '.rodata'.
> +        * only as '.data', '.bss', '.rodata' or '.data..percpu'.
>          */
>         if (!bpf_map__is_internal(map))
>                 return -ENOENT;
> @@ -5129,23 +5153,47 @@ static int
>  bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
>  {
>         enum libbpf_map_type map_type = map->libbpf_type;
> -       int err, zero = 0;
> -       size_t mmap_sz;
> +       bool is_percpu = map_type == LIBBPF_MAP_PERCPU_DATA;
> +       int err = 0, zero = 0, num_cpus, i;
> +       size_t data_sz, elem_sz, mmap_sz;
> +       void *data = NULL;
> +
> +       data_sz = map->def.value_size;
> +       if (is_percpu) {
> +               num_cpus = libbpf_num_possible_cpus();
> +               if (num_cpus < 0) {
> +                       err = num_cpus;
> +                       return err;

hm... why not `return num_cpus;`?

> +               }
> +
> +               data_sz = data_sz * num_cpus;
> +               data = malloc(data_sz);
> +               if (!data) {
> +                       err = -ENOMEM;
> +                       return err;
> +               }

[...]

>  /**
>   * @brief **bpf_map__set_pin_path()** sets the path attribute that tells where the
>   * BPF map should be pinned. This does not actually create the 'pin'.
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 1205f9a4fe048..1c239ac88c699 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -443,4 +443,5 @@ LIBBPF_1.6.0 {
>                 bpf_program__line_info_cnt;
>                 btf__add_decl_attr;
>                 btf__add_type_attr;
> +               bpf_map__is_internal_percpu;

alphabetically sorted

>  } LIBBPF_1.5.0;




> --
> 2.49.0
>


* Re: [PATCH bpf-next v3 3/4] bpf, bpftool: Generate skeleton for global percpu data
  2025-05-26 16:21 ` [PATCH bpf-next v3 3/4] bpf, bpftool: Generate skeleton for " Leon Hwang
@ 2025-05-27 22:31   ` Andrii Nakryiko
  2025-05-29  2:56     ` Leon Hwang
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2025-05-27 22:31 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, yonghong.song, song, eddyz87, qmo, dxu,
	kernel-patches-bot, Daniel Müller

Adding libbpf-rs maintainer, Daniel, for awareness, as Rust skeleton
will have to add support for this, once this patch set lands upstream.


On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> This patch enhances bpftool to generate skeletons that properly handle
> global percpu variables. The generated skeleton now includes a dedicated
> structure for percpu data, allowing users to initialize and access percpu
> variables more efficiently.
>
> For global percpu variables, the skeleton now includes a nested
> structure, e.g.:
>
> struct test_global_percpu_data {
>         struct bpf_object_skeleton *skeleton;
>         struct bpf_object *obj;
>         struct {
>                 struct bpf_map *data__percpu;
>         } maps;
>         // ...
>         struct test_global_percpu_data__data__percpu {
>                 int data;
>                 char run;
>                 struct {
>                         char set;
>                         int i;
>                         int nums[7];
>                 } struct_data;
>                 int nums[7];
>         } __aligned(8) *data__percpu;
>
>         // ...
> };
>
>   * The "struct test_global_percpu_data__data__percpu *data__percpu" points
>     to initialized data, which is actually "maps.data__percpu->mmaped".
>   * Before loading the skeleton, updating the
>     "struct test_global_percpu_data__data__percpu *data__percpu" modifies
>     the initial value of the corresponding global percpu variables.
>   * After loading the skeleton, accessing or updating this struct is not
>     allowed because this struct pointer has been reset as NULL. Instead,
>     users must interact with the global percpu variables via the
>     "maps.data__percpu" map.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  tools/bpf/bpftool/gen.c | 47 +++++++++++++++++++++++++++++------------
>  1 file changed, 34 insertions(+), 13 deletions(-)
>
> diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c
> index 67a60114368f5..c672f52110221 100644
> --- a/tools/bpf/bpftool/gen.c
> +++ b/tools/bpf/bpftool/gen.c
> @@ -92,7 +92,7 @@ static void get_header_guard(char *guard, const char *obj_name, const char *suff
>
>  static bool get_map_ident(const struct bpf_map *map, char *buf, size_t buf_sz)
>  {
> -       static const char *sfxs[] = { ".data", ".rodata", ".bss", ".kconfig" };
> +       static const char *sfxs[] = { ".data..percpu", ".data", ".rodata", ".bss", ".kconfig" };
>         const char *name = bpf_map__name(map);
>         int i, n;
>
> @@ -117,7 +117,7 @@ static bool get_map_ident(const struct bpf_map *map, char *buf, size_t buf_sz)
>
>  static bool get_datasec_ident(const char *sec_name, char *buf, size_t buf_sz)
>  {
> -       static const char *pfxs[] = { ".data", ".rodata", ".bss", ".kconfig" };
> +       static const char *pfxs[] = { ".data..percpu", ".data", ".rodata", ".bss", ".kconfig" };
>         int i, n;
>
>         /* recognize hard coded LLVM section name */
> @@ -148,7 +148,8 @@ static int codegen_datasec_def(struct bpf_object *obj,
>                                struct btf *btf,
>                                struct btf_dump *d,
>                                const struct btf_type *sec,
> -                              const char *obj_name)
> +                              const char *obj_name,
> +                              bool is_percpu)
>  {
>         const char *sec_name = btf__name_by_offset(btf, sec->name_off);
>         const struct btf_var_secinfo *sec_var = btf_var_secinfos(sec);
> @@ -228,7 +229,7 @@ static int codegen_datasec_def(struct bpf_object *obj,
>
>                 off = sec_var->offset + sec_var->size;
>         }
> -       printf("        } *%s;\n", sec_ident);
> +       printf("        }%s *%s;\n", is_percpu ? " __aligned(8)" : "", sec_ident);
>         return 0;
>  }
>
> @@ -263,13 +264,13 @@ static bool is_mmapable_map(const struct bpf_map *map, char *buf, size_t sz)
>                 return true;
>         }
>
> -       if (!bpf_map__is_internal(map) || !(bpf_map__map_flags(map) & BPF_F_MMAPABLE))
> -               return false;
> -
> -       if (!get_map_ident(map, buf, sz))
> -               return false;
> +       if (bpf_map__is_internal(map) &&
> +           ((bpf_map__map_flags(map) & BPF_F_MMAPABLE) ||
> +            bpf_map__is_internal_percpu(map)) &&
> +           get_map_ident(map, buf, sz))
> +               return true;
>
> -       return true;
> +       return false;
>  }
>
>  static int codegen_datasecs(struct bpf_object *obj, const char *obj_name)
> @@ -303,7 +304,8 @@ static int codegen_datasecs(struct bpf_object *obj, const char *obj_name)
>                         printf("        struct %s__%s {\n", obj_name, map_ident);
>                         printf("        } *%s;\n", map_ident);
>                 } else {
> -                       err = codegen_datasec_def(obj, btf, d, sec, obj_name);
> +                       err = codegen_datasec_def(obj, btf, d, sec, obj_name,
> +                                                 bpf_map__is_internal_percpu(map));
>                         if (err)
>                                 goto out;
>                 }
> @@ -795,7 +797,8 @@ static int gen_trace(struct bpf_object *obj, const char *obj_name, const char *h
>         bpf_object__for_each_map(map, obj) {
>                 const char *mmap_flags;
>
> -               if (!is_mmapable_map(map, ident, sizeof(ident)))
> +               if (!is_mmapable_map(map, ident, sizeof(ident)) ||
> +                   bpf_map__is_internal_percpu(map))
>                         continue;
>
>                 if (bpf_map__map_flags(map) & BPF_F_RDONLY_PROG)
> @@ -1434,7 +1437,25 @@ static int do_skeleton(int argc, char **argv)
>                 static inline int                                           \n\
>                 %1$s__load(struct %1$s *obj)                                \n\
>                 {                                                           \n\
> -                       return bpf_object__load_skeleton(obj->skeleton);    \n\
> +                       int err;                                            \n\
> +                                                                           \n\
> +                       err = bpf_object__load_skeleton(obj->skeleton);     \n\
> +                       if (err)                                            \n\
> +                               return err;                                 \n\
> +                                                                           \n\
> +               ", obj_name);
> +
> +       if (map_cnt) {
> +               bpf_object__for_each_map(map, obj) {
> +                       if (bpf_map__is_internal_percpu(map) &&
> +                           get_map_ident(map, ident, sizeof(ident)))
> +                               printf("\tobj->%s = NULL;\n", ident);
> +               }
> +       }

hm... maybe we can avoid this by making libbpf re-mmap() this
initialization image to be read-only during bpf_object load? Then the
pointer can stay in the skeleton and be available for querying of
"initialization values" (if anyone cares), and we won't have any extra
post-processing steps in the generated skeleton code?

And Rust skeleton will be able to expose this as a non-mutable
reference with no extra magic behind it?


> +
> +       codegen("\
> +               \n\
> +                       return 0;                                           \n\
>                 }                                                           \n\
>                                                                             \n\
>                 static inline struct %1$s *                                 \n\
> --
> 2.49.0
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf, libbpf: Support global percpu data
  2025-05-26 16:21 ` [PATCH bpf-next v3 2/4] bpf, libbpf: Support " Leon Hwang
  2025-05-27 22:31   ` Andrii Nakryiko
@ 2025-05-27 22:40   ` Alexei Starovoitov
  2025-05-27 23:25     ` Andrii Nakryiko
  1 sibling, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2025-05-27 22:40 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Yonghong Song, Song Liu, Eduard, Quentin Monnet, Daniel Xu,
	kernel-patches-bot

On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> +
> +       data_sz = map->def.value_size;
> +       if (is_percpu) {
> +               num_cpus = libbpf_num_possible_cpus();
> +               if (num_cpus < 0) {
> +                       err = num_cpus;
> +                       return err;
> +               }
> +
> +               data_sz = data_sz * num_cpus;
> +               data = malloc(data_sz);
> +               if (!data) {
> +                       err = -ENOMEM;
> +                       return err;
> +               }
> +
> +               elem_sz = map->def.value_size;
> +               for (i = 0; i < num_cpus; i++)
> +                       memcpy(data + i * elem_sz, map->mmaped, elem_sz);
> +       } else {
> +               data = map->mmaped;
> +       }
>
>         if (obj->gen_loader) {
>                 bpf_gen__map_update_elem(obj->gen_loader, map - obj->maps,
> -                                        map->mmaped, map->def.value_size);
> +                                        data, data_sz);

I missed it earlier, but now I wonder how this is supposed to work ?
skel and lskel may be generated on a system with N cpus,
but loaded with M cpus.

Another concern is num_cpus multiplier can be huge.
lksel adds all that init data into a global array.
Pls avoid this multiplier.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf, libbpf: Support global percpu data
  2025-05-27 22:40   ` Alexei Starovoitov
@ 2025-05-27 23:25     ` Andrii Nakryiko
  2025-05-28  2:35       ` Alexei Starovoitov
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2025-05-27 23:25 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Leon Hwang, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Yonghong Song, Song Liu, Eduard, Quentin Monnet,
	Daniel Xu, kernel-patches-bot

On Tue, May 27, 2025 at 3:40 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> > +
> > +       data_sz = map->def.value_size;
> > +       if (is_percpu) {
> > +               num_cpus = libbpf_num_possible_cpus();
> > +               if (num_cpus < 0) {
> > +                       err = num_cpus;
> > +                       return err;
> > +               }
> > +
> > +               data_sz = data_sz * num_cpus;
> > +               data = malloc(data_sz);
> > +               if (!data) {
> > +                       err = -ENOMEM;
> > +                       return err;
> > +               }
> > +
> > +               elem_sz = map->def.value_size;
> > +               for (i = 0; i < num_cpus; i++)
> > +                       memcpy(data + i * elem_sz, map->mmaped, elem_sz);
> > +       } else {
> > +               data = map->mmaped;
> > +       }
> >
> >         if (obj->gen_loader) {
> >                 bpf_gen__map_update_elem(obj->gen_loader, map - obj->maps,
> > -                                        map->mmaped, map->def.value_size);
> > +                                        data, data_sz);
>
> I missed it earlier, but now I wonder how this is supposed to work ?
> skel and lskel may be generated on a system with N cpus,
> but loaded with M cpus.
>
> Another concern is num_cpus multiplier can be huge.
lskel adds all that init data into a global array.
> Pls avoid this multiplier.

Hm... For skel, the number of CPUs at runtime isn't a problem, it's
only memory waste for this temporary data. But it is forced on us by
kernel contract for MAP_UPDATE_ELEM for per-CPU maps.

Should we have a flag for map update command for per-CPU maps that
would mean "use this data as a value for each CPU"? Then we can
provide just a small piece of initialization data and not have to rely
on the number of CPUs. This will also make lskel part very simple.

Alternatively (and perhaps more flexibly) we can extend
MAP_UPDATE_ELEM with ability to specify specific CPU for per-CPU maps.
I'd probably have a MAP_LOOKUP_ELEM counterpart for this as well. Then
skeleton/light skeleton code can iterate given number of times to
initialize all CPUs using small initial data image.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf, libbpf: Support global percpu data
  2025-05-27 23:25     ` Andrii Nakryiko
@ 2025-05-28  2:35       ` Alexei Starovoitov
  2025-05-28 16:05         ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2025-05-28  2:35 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Leon Hwang, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Yonghong Song, Song Liu, Eduard, Quentin Monnet,
	Daniel Xu, kernel-patches-bot

On Tue, May 27, 2025 at 4:25 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, May 27, 2025 at 3:40 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> > > +
> > > +       data_sz = map->def.value_size;
> > > +       if (is_percpu) {
> > > +               num_cpus = libbpf_num_possible_cpus();
> > > +               if (num_cpus < 0) {
> > > +                       err = num_cpus;
> > > +                       return err;
> > > +               }
> > > +
> > > +               data_sz = data_sz * num_cpus;
> > > +               data = malloc(data_sz);
> > > +               if (!data) {
> > > +                       err = -ENOMEM;
> > > +                       return err;
> > > +               }
> > > +
> > > +               elem_sz = map->def.value_size;
> > > +               for (i = 0; i < num_cpus; i++)
> > > +                       memcpy(data + i * elem_sz, map->mmaped, elem_sz);
> > > +       } else {
> > > +               data = map->mmaped;
> > > +       }
> > >
> > >         if (obj->gen_loader) {
> > >                 bpf_gen__map_update_elem(obj->gen_loader, map - obj->maps,
> > > -                                        map->mmaped, map->def.value_size);
> > > +                                        data, data_sz);
> >
> > I missed it earlier, but now I wonder how this is supposed to work ?
> > skel and lskel may be generated on a system with N cpus,
> > but loaded with M cpus.
> >
> > Another concern is num_cpus multiplier can be huge.
> lskel adds all that init data into a global array.
> > Pls avoid this multiplier.
>
> Hm... For skel, the number of CPUs at runtime isn't a problem, it's
> only memory waste for this temporary data. But it is forced on us by
> kernel contract for MAP_UPDATE_ELEM for per-CPU maps.
>
> Should we have a flag for map update command for per-CPU maps that
> would mean "use this data as a value for each CPU"? Then we can
> provide just a small piece of initialization data and not have to rely
> on the number of CPUs. This will also make lskel part very simple.

Initially it felt too specific, but I think it makes sense.
The contract was too restrictive. Let's add the flag.

> Alternatively (and perhaps more flexibly) we can extend
> MAP_UPDATE_ELEM with ability to specify specific CPU for per-CPU maps.
> I'd probably have a MAP_LOOKUP_ELEM counterpart for this as well. Then
> skeleton/light skeleton code can iterate given number of times to
> initialize all CPUs using small initial data image.

I guess this can be a follow up.
With extra flag lookup/update/delete can look into a new field
in that anonymous struct:
        struct { /* anonymous struct used by BPF_MAP_*_ELEM and
BPF_MAP_FREEZE commands */
                __u32           map_fd;
                __aligned_u64   key;
                union {
                        __aligned_u64 value;
                        __aligned_u64 next_key;
                };
                __u64           flags;
        };

There is also "batch" version of lookup/update/delete.
They probably will need to be extended as well for consistency ?
So I'd only go with the "use data to update all CPUs" flag for now.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf, libbpf: Support global percpu data
  2025-05-28  2:35       ` Alexei Starovoitov
@ 2025-05-28 16:05         ` Andrii Nakryiko
  2025-05-29  2:43           ` Leon Hwang
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2025-05-28 16:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Leon Hwang, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Yonghong Song, Song Liu, Eduard, Quentin Monnet,
	Daniel Xu, kernel-patches-bot

On Tue, May 27, 2025 at 7:35 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, May 27, 2025 at 4:25 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, May 27, 2025 at 3:40 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> > > > +
> > > > +       data_sz = map->def.value_size;
> > > > +       if (is_percpu) {
> > > > +               num_cpus = libbpf_num_possible_cpus();
> > > > +               if (num_cpus < 0) {
> > > > +                       err = num_cpus;
> > > > +                       return err;
> > > > +               }
> > > > +
> > > > +               data_sz = data_sz * num_cpus;
> > > > +               data = malloc(data_sz);
> > > > +               if (!data) {
> > > > +                       err = -ENOMEM;
> > > > +                       return err;
> > > > +               }
> > > > +
> > > > +               elem_sz = map->def.value_size;
> > > > +               for (i = 0; i < num_cpus; i++)
> > > > +                       memcpy(data + i * elem_sz, map->mmaped, elem_sz);
> > > > +       } else {
> > > > +               data = map->mmaped;
> > > > +       }
> > > >
> > > >         if (obj->gen_loader) {
> > > >                 bpf_gen__map_update_elem(obj->gen_loader, map - obj->maps,
> > > > -                                        map->mmaped, map->def.value_size);
> > > > +                                        data, data_sz);
> > >
> > > I missed it earlier, but now I wonder how this is supposed to work ?
> > > skel and lskel may be generated on a system with N cpus,
> > > but loaded with M cpus.
> > >
> > > Another concern is num_cpus multiplier can be huge.
> > > lskel adds all that init data into a global array.
> > > Pls avoid this multiplier.
> >
> > Hm... For skel, the number of CPUs at runtime isn't a problem, it's
> > only memory waste for this temporary data. But it is forced on us by
> > kernel contract for MAP_UPDATE_ELEM for per-CPU maps.
> >
> > Should we have a flag for map update command for per-CPU maps that
> > would mean "use this data as a value for each CPU"? Then we can
> > provide just a small piece of initialization data and not have to rely
> > on the number of CPUs. This will also make lskel part very simple.
>
> Initially it felt too specific, but I think it makes sense.
> The contract was too restrictive. Let's add the flag.
>
> > Alternatively (and perhaps more flexibly) we can extend
> > MAP_UPDATE_ELEM with ability to specify specific CPU for per-CPU maps.
> > I'd probably have a MAP_LOOKUP_ELEM counterpart for this as well. Then
> > skeleton/light skeleton code can iterate given number of times to
> > initialize all CPUs using small initial data image.
>
> I guess this can be a follow up.
> With extra flag lookup/update/delete can look into a new field
> in that anonymous struct:
>         struct { /* anonymous struct used by BPF_MAP_*_ELEM and
> BPF_MAP_FREEZE commands */
>                 __u32           map_fd;
>                 __aligned_u64   key;
>                 union {
>                         __aligned_u64 value;
>                         __aligned_u64 next_key;
>                 };
>                 __u64           flags;
>         };
>

Yep, we'd have two flags: one for "apply across all CPUs", and another
meaning "apply for specified CPU" + new CPU number field. Or the same
flag with a special CPU number value (0xffffffff?).

> There is also "batch" version of lookup/update/delete.
> They probably will need to be extended as well for consistency ?
> So I'd only go with the "use data to update all CPUs" flag for now.

Agreed. But also looking at generic_map_update_batch() it seems like
it just routes everything through single-element updates, so it
shouldn't be hard to add batch support for all this either.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next v3 0/4] bpf: Introduce global percpu data
  2025-05-27 22:31 ` [PATCH bpf-next v3 0/4] bpf: Introduce " Andrii Nakryiko
@ 2025-05-28 17:10   ` Yonghong Song
  2025-05-29  1:59     ` Leon Hwang
  0 siblings, 1 reply; 24+ messages in thread
From: Yonghong Song @ 2025-05-28 17:10 UTC (permalink / raw)
  To: Andrii Nakryiko, Leon Hwang
  Cc: bpf, ast, andrii, daniel, song, eddyz87, qmo, dxu,
	kernel-patches-bot



On 5/27/25 3:31 PM, Andrii Nakryiko wrote:
> On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>> This patch set introduces global percpu data, similar to commit
>> 6316f78306c1 ("Merge branch 'support-global-data'"), to reduce restrictions
>> in C for BPF programs.
>>
>> With this enhancement, it becomes possible to define and use global percpu
>> variables, like the DEFINE_PER_CPU() macro in the kernel[0].
>>
>> The section name for global percpu data is ".data..percpu". It cannot be
>> named ".percpu" or ".percpudata" because defining a one-byte percpu
>> variable (e.g., char run SEC(".data..percpu") = 0;) can trigger a crash
>> with Clang 17[1]. The name ".data.percpu" is also avoided because some
> Does this happen with newer Clangs? If not, I don't think a bug in
> Clang 17 is reason enough for this weird '.data..percpu' naming
> convention. I'd still very much prefer .percpu prefix. .data is used
> for non-per-CPU data, we shouldn't share the prefix, if we can avoid
> that.

I checked and clang17 does have a fatal error with '.percpu'. But clang18
through clang21 are all fine.

For clang17, the error message is
   fatal error: error in backend: unable to write nop sequence of 3 bytes
in llvm/lib/MC/MCAssembler.cpp.

The key reason is in bpf backend llvm/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp

bool BPFAsmBackend::writeNopData(raw_ostream &OS, uint64_t Count,
                                  const MCSubtargetInfo *STI) const {
   if ((Count % 8) != 0)
     return false;

   for (uint64_t i = 0; i < Count; i += 8)
     support::endian::write<uint64_t>(OS, 0x15000000, Endian);

   return true;
}

Since Count is 3, writeNopData returns false and it caused the fatal error.

The bug is likely in MC itself: with the same BPF writeNopData implementation,
clang18 works fine (Count is 8 there). So the bug appears to have been fixed in clang18.

>
> pw-bot: cr
>
>
>> users already use section names prefixed with ".data.percpu", such as in
>> this example from test_global_map_resize.c:
>>
>> int percpu_arr[1] SEC(".data.percpu_arr");
>>
>> The idea stems from the bpfsnoop[2], which itself was inspired by
>> retsnoop[3]. During testing of bpfsnoop on the v6.6 kernel, two LBR
>> (Last Branch Record) entries were observed related to the
>> bpf_get_smp_processor_id() helper.
>>
>> Since commit 1ae6921009e5 ("bpf: inline bpf_get_smp_processor_id() helper"),
>> the bpf_get_smp_processor_id() helper has been inlined on x86_64, reducing
>> the overhead and consequently minimizing these two LBR records.
>>
>> However, the introduction of global percpu data offers a more robust
>> solution. By leveraging the percpu_array map and percpu instruction,
>> global percpu data can be implemented intrinsically.
>>
>> This feature also facilitates sharing percpu information between tail
>> callers and callees or between freplace callers and callees through a
>> shared global percpu variable. Previously, this was achieved using a
>> 1-entry percpu_array map, which this patch set aims to improve upon.
>>
>> Links:
>> [0] https://github.com/torvalds/linux/blob/fbfd64d25c7af3b8695201ebc85efe90be28c5a3/include/linux/percpu-defs.h#L114
>> [1] https://lore.kernel.org/bpf/fd1b3f58-c27f-403d-ad99-644b7d06ecb3@linux.dev/
>> [2] https://github.com/bpfsnoop/bpfsnoop
>> [3] https://github.com/anakryiko/retsnoop
>>
>> Changes:
>> v2 -> v3:
>>    * Use ".data..percpu" as PERCPU_DATA_SEC.
>>    * Address comment from Alexei:
>>      * Add u8, array of ints and struct { .. } vars to selftest.
>>
>> v1 -> v2:
>>    * Address comments from Andrii:
>>      * Use LIBBPF_MAP_PERCPU and SEC_PERCPU.
>>      * Reuse mmaped of libbpf's struct bpf_map for .percpu map data.
>>      * Set .percpu struct pointer to NULL after loading skeleton.
>>      * Make sure value size of .percpu map is __aligned(8).
>>      * Use raw_tp and opts.cpu to test global percpu variables on all CPUs.
>>    * Address comments from Alexei:
>>      * Test non-zero offset of global percpu variable.
>>      * Test case about BPF_PSEUDO_MAP_IDX_VALUE.
>>
>> rfc -> v1:
>>    * Address comments from Andrii:
>>      * Keep one image of global percpu variable for all CPUs.
>>      * Reject non-ARRAY map in bpf_map_direct_read(), check_reg_const_str(),
>>        and check_bpf_snprintf_call() in verifier.
>>      * Split out libbpf changes from kernel-side changes.
>>      * Use ".percpu" as PERCPU_DATA_SEC.
>>      * Use enum libbpf_map_type to distinguish BSS, DATA, RODATA and
>>        PERCPU_DATA.
>>      * Avoid using errno for checking err from libbpf_num_possible_cpus().
>>      * Use "map '%s': " prefix for error message.
>>
>> Leon Hwang (4):
>>    bpf: Introduce global percpu data
>>    bpf, libbpf: Support global percpu data
>>    bpf, bpftool: Generate skeleton for global percpu data
>>    selftests/bpf: Add cases to test global percpu data
>>
>>   kernel/bpf/arraymap.c                         |  41 +++-
>>   kernel/bpf/verifier.c                         |  45 ++++
>>   tools/bpf/bpftool/gen.c                       |  47 ++--
>>   tools/lib/bpf/libbpf.c                        | 102 ++++++--
>>   tools/lib/bpf/libbpf.h                        |   9 +
>>   tools/lib/bpf/libbpf.map                      |   1 +
>>   tools/testing/selftests/bpf/Makefile          |   2 +-
>>   .../bpf/prog_tests/global_data_init.c         | 221 +++++++++++++++++-
>>   .../bpf/progs/test_global_percpu_data.c       |  29 +++
>>   9 files changed, 459 insertions(+), 38 deletions(-)
>>   create mode 100644 tools/testing/selftests/bpf/progs/test_global_percpu_data.c
>>
>> --
>> 2.49.0
>>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next v3 0/4] bpf: Introduce global percpu data
  2025-05-28 17:10   ` Yonghong Song
@ 2025-05-29  1:59     ` Leon Hwang
  0 siblings, 0 replies; 24+ messages in thread
From: Leon Hwang @ 2025-05-29  1:59 UTC (permalink / raw)
  To: Yonghong Song, Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, song, eddyz87, qmo, dxu,
	kernel-patches-bot



On 29/5/25 01:10, Yonghong Song wrote:
> 
> 
> On 5/27/25 3:31 PM, Andrii Nakryiko wrote:
>> On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>> This patch set introduces global percpu data, similar to commit
>>> 6316f78306c1 ("Merge branch 'support-global-data'"), to reduce
>>> restrictions
>>> in C for BPF programs.
>>>
>>> With this enhancement, it becomes possible to define and use global
>>> percpu
>>> variables, like the DEFINE_PER_CPU() macro in the kernel[0].
>>>
>>> The section name for global percpu data is ".data..percpu". It
>>> cannot be
>>> named ".percpu" or ".percpudata" because defining a one-byte percpu
>>> variable (e.g., char run SEC(".data..percpu") = 0;) can trigger a crash
>>> with Clang 17[1]. The name ".data.percpu" is also avoided because some
>> Does this happen with newer Clangs? If not, I don't think a bug in
>> Clang 17 is reason enough for this weird '.data..percpu' naming
>> convention. I'd still very much prefer .percpu prefix. .data is used
>> for non-per-CPU data, we shouldn't share the prefix, if we can avoid
>> that.
> 
> I checked and clang17 does have a fatal error with '.percpu'. But clang18
> through clang21 are all fine.
> 
> For clang17, the error message is
>   fatal error: error in backend: unable to write nop sequence of 3 bytes
> in llvm/lib/MC/MCAssembler.cpp.
> 
> The key reason is in bpf backend llvm/lib/Target/BPF/MCTargetDesc/
> BPFAsmBackend.cpp
> 
> bool BPFAsmBackend::writeNopData(raw_ostream &OS, uint64_t Count,
>                                  const MCSubtargetInfo *STI) const {
>   if ((Count % 8) != 0)
>     return false;
> 
>   for (uint64_t i = 0; i < Count; i += 8)
>     support::endian::write<uint64_t>(OS, 0x15000000, Endian);
> 
>   return true;
> }
> 
> Since Count is 3, writeNopData returns false and it caused the fatal error.
> 
> The bug is likely in MC itself: with the same BPF writeNopData
> implementation,
> clang18 works fine (Count is 8 there). So the bug appears to have been
> fixed in clang18.
> 
From my testing, Clang 18 handles 'char run SEC(".percpu");' correctly
without crashing.

To use the '.percpu' section while avoiding the Clang 17 bug, the test
case can be adjusted as follows:

int data SEC(".percpu") = -1;
int nums[7] SEC(".percpu");
#if defined(__clang__) && __clang_major__ >= 18
char run SEC(".percpu") = 0;
struct {
	char set;
	int i;
	int nums[7];
} struct_data SEC(".percpu") = {
	.set = 0,
	.i = -1,
};
#else
struct {
	int i;
	int nums[7];
} struct_data SEC(".percpu") = {
	.i = -1,
};
#endif

SEC("raw_tp/task_rename")
int update_percpu_data(struct __sk_buff *skb)
{
	struct_data.nums[6] = 0xc0de;
	struct_data.i = 1;
	nums[6] = 0xc0de;
	data = 1;
#if defined(__clang__) && __clang_major__ >= 18
	struct_data.set = 1;
	run = 1;
#endif
	return 0;
}

With this change, the 'char run SEC(".percpu");' declaration will only
be compiled and tested with Clang 18 or newer, effectively avoiding the
crash with Clang 17.

Thanks,
Leon


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next v3 1/4] bpf: Introduce global percpu data
  2025-05-27 22:31   ` Andrii Nakryiko
@ 2025-05-29  2:03     ` Leon Hwang
  0 siblings, 0 replies; 24+ messages in thread
From: Leon Hwang @ 2025-05-29  2:03 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, yonghong.song, song, eddyz87, qmo, dxu,
	kernel-patches-bot



On 28/5/25 06:31, Andrii Nakryiko wrote:
> On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> This patch introduces global percpu data, inspired by commit
>> 6316f78306c1 ("Merge branch 'support-global-data'"). It enables the
>> definition of global percpu variables in BPF, similar to the
>> DEFINE_PER_CPU() macro in the kernel[0].
>>

[...]

>> +
>>         err = check_map_access(env, regno, reg->off,
>>                                map->value_size - reg->off, false,
>>                                ACCESS_HELPER);
>> @@ -11101,6 +11109,11 @@ static int check_bpf_snprintf_call(struct bpf_verifier_env *env,
>>                 return -EINVAL;
>>         num_args = data_len_reg->var_off.value / 8;
>>
>> +       if (fmt_map->map_type != BPF_MAP_TYPE_ARRAY) {
>> +               verbose(env, "only array map supports snprintf\n");
>> +               return -EINVAL;
>> +       }
>> +
>>         /* fmt being ARG_PTR_TO_CONST_STR guarantees that var_off is const
>>          * and map_direct_value_addr is set.
>>          */
>> @@ -21906,6 +21919,38 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>>                         goto next_insn;
>>                 }
>>
>> +#ifdef CONFIG_SMP
> 
> Instead of CONFIG_SMP, I think it's more appropriate to check for
> bpf_jit_supports_percpu_insn(). We check CONFIG_SMP for
> BPF_FUNC_get_smp_processor_id inlining because of `cpu_number` per-CPU
> variable, not because BPF_MOV64_PERCPU_REG() doesn't work on single
> CPU systems (IIUC).
> 
Agreed.

Then, the 'EMIT_mov(dst_reg, src_reg);' can be avoided when dst_reg is
the same as src_reg while handling BPF_MOV64_PERCPU_REG() on x86_64.

Thanks,
Leon



* Re: [PATCH bpf-next v3 2/4] bpf, libbpf: Support global percpu data
  2025-05-27 22:31   ` Andrii Nakryiko
@ 2025-05-29  2:24     ` Leon Hwang
  0 siblings, 0 replies; 24+ messages in thread
From: Leon Hwang @ 2025-05-29  2:24 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, yonghong.song, song, eddyz87, qmo, dxu,
	kernel-patches-bot



On 28/5/25 06:31, Andrii Nakryiko wrote:
> On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> This patch introduces support for global percpu data in libbpf by adding a
>> new ".data..percpu" section, similar to ".data". It enables efficient
>> handling of percpu global variables in bpf programs.
>>
>> This enhancement improves performance for workloads that benefit from
>> percpu storage.
>>
>> Meanwhile, add bpf_map__is_internal_percpu() API to check whether the map
>> is an internal map used for global percpu variables.
> 
> I'm not a big fan of this super specific API. We do have
> bpf_map__is_internal() to let customer know that map is special in
> some way, but I'd like to avoid making this fine distinction between
> per-CPU internal map vs non-per-CPU (and then why stop there, why not
> have kconfig-specific API, ksym-specific check, etc)?
> 
> All this is mostly useful just for bpftool for skeleton codegen, and
> bpftool already has to know about .percpu prefix, so it doesn't need
> this API to make all these decisions. Let's try to drop this
> bpf_map__is_internal_percpu() API?
> 

To remove bpf_map__is_internal_percpu() from the API, its check can be
reimplemented locally in bpftool as:

static bool bpf_map_is_internal_percpu(const struct bpf_map *map)
{
	return bpf_map__is_internal(map) &&
	       bpf_map__type(map) == BPF_MAP_TYPE_PERCPU_ARRAY;
}

This should be functionally equivalent to checking:

	map->libbpf_type == LIBBPF_MAP_PERCPU_DATA;

>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>>  tools/lib/bpf/libbpf.c   | 102 +++++++++++++++++++++++++++++++--------
>>  tools/lib/bpf/libbpf.h   |   9 ++++
>>  tools/lib/bpf/libbpf.map |   1 +
>>  3 files changed, 91 insertions(+), 21 deletions(-)
>>

[...]

>>  struct elf_sec_desc {
>> @@ -1902,7 +1905,7 @@ static bool map_is_mmapable(struct bpf_object *obj, struct bpf_map *map)
>>         struct btf_var_secinfo *vsi;
>>         int i, n;
>>
>> -       if (!map->btf_value_type_id)
>> +       if (!map->btf_value_type_id || map->libbpf_type == LIBBPF_MAP_PERCPU_DATA)
> 
> Not sure this is correct. We should have btf_value_type_id for PERCPU
> global data array, no?
> 

Yes, a PERCPU global data array should indeed have a valid
btf_value_type_id, which is required for seq_show.

This is evident in percpu_array_map_seq_show_elem(), where
btf_value_type_id is used to display the per-CPU values:

static void percpu_array_map_seq_show_elem(struct bpf_map *map, void *key,
					   struct seq_file *m)
{
	// ...

	rcu_read_lock();

	seq_printf(m, "%u: {\n", *(u32 *)key);
	pptr = array->pptrs[index & array->index_mask];
	for_each_possible_cpu(cpu) {
		seq_printf(m, "\tcpu%d: ", cpu);
		btf_type_seq_show(map->btf, map->btf_value_type_id,
				  per_cpu_ptr(pptr, cpu), m);
		seq_putc(m, '\n');
	}
	seq_puts(m, "}\n");

	rcu_read_unlock();
}

So the combination of the !map->btf_value_type_id check with the added
LIBBPF_MAP_PERCPU_DATA condition needs to be reconsidered.

>>                 return false;
>>
>>         t = btf__type_by_id(obj->btf, map->btf_value_type_id);
>> @@ -1926,6 +1929,7 @@ static int
>>  bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
>>                               const char *real_name, int sec_idx, void *data, size_t data_sz)

[...]

>>
>> +       /* .data..percpu DATASEC must have __aligned(8) size. */
> 
> please remind me why? similarly for def->value_size special casing?
> What will happen if we don't explicitly roundup() on libbpf side
> (kernel always does roundup(8) for ARRAY value_size anyways, which is
> why I am asking)
> 

t->size must match def->value_size.

That said, I believe it's acceptable to avoid using roundup() for both
values. I'll test this using seq_show to confirm.

>> +       if (strcmp(sec_name, PERCPU_DATA_SEC) == 0 || str_has_pfx(sec_name, PERCPU_DATA_SEC))
>> +               t->size = roundup(t->size, 8);
>> +
>>         for (i = 0, vsi = btf_var_secinfos(t); i < vars; i++, vsi++) {
>>                 const struct btf_type *t_var;
>>                 struct btf_var *var;
>> @@ -3923,6 +3939,11 @@ static int bpf_object__elf_collect(struct bpf_object *obj)

[...]

>> +       data_sz = map->def.value_size;
>> +       if (is_percpu) {
>> +               num_cpus = libbpf_num_possible_cpus();
>> +               if (num_cpus < 0) {
>> +                       err = num_cpus;
>> +                       return err;
> 
> hm... why not `return num_cpus;`?
> 

Ack.

>> +               }
>> +
>> +               data_sz = data_sz * num_cpus;
>> +               data = malloc(data_sz);
>> +               if (!data) {
>> +                       err = -ENOMEM;
>> +                       return err;
>> +               }
> 
> [...]
> 
>>  /**
>>   * @brief **bpf_map__set_pin_path()** sets the path attribute that tells where the
>>   * BPF map should be pinned. This does not actually create the 'pin'.
>> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
>> index 1205f9a4fe048..1c239ac88c699 100644
>> --- a/tools/lib/bpf/libbpf.map
>> +++ b/tools/lib/bpf/libbpf.map
>> @@ -443,4 +443,5 @@ LIBBPF_1.6.0 {
>>                 bpf_program__line_info_cnt;
>>                 btf__add_decl_attr;
>>                 btf__add_type_attr;
>> +               bpf_map__is_internal_percpu;
> 
> alphabetically sorted
> 

bpf_map__is_internal_percpu will be dropped.

Thanks,
Leon



* Re: [PATCH bpf-next v3 2/4] bpf, libbpf: Support global percpu data
  2025-05-28 16:05         ` Andrii Nakryiko
@ 2025-05-29  2:43           ` Leon Hwang
  2025-06-02 23:50             ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Leon Hwang @ 2025-05-29  2:43 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Yonghong Song, Song Liu, Eduard, Quentin Monnet, Daniel Xu,
	kernel-patches-bot



On 29/5/25 00:05, Andrii Nakryiko wrote:
> On Tue, May 27, 2025 at 7:35 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>

[...]

>>
>> I guess this can be a follow up.
>> With extra flag lookup/update/delete can look into a new field
>> in that anonymous struct:
>>         struct { /* anonymous struct used by BPF_MAP_*_ELEM and
>> BPF_MAP_FREEZE commands */
>>                 __u32           map_fd;
>>                 __aligned_u64   key;
>>                 union {
>>                         __aligned_u64 value;
>>                         __aligned_u64 next_key;
>>                 };
>>                 __u64           flags;
>>         };
>>
> 
> Yep, we'd have two flags: one for "apply across all CPUs", and another
> meaning "apply for specified CPU" + new CPU number field. Or the same
> flag with a special CPU number value (0xffffffff?).
> 
>> There is also "batch" version of lookup/update/delete.
>> They probably will need to be extended as well for consistency ?
>> So I'd only go with the "use data to update all CPUs" flag for now.
> 
> Agreed. But also looking at generic_map_update_batch() it seems like
> it just routes everything through single-element updates, so it
> shouldn't be hard to add batch support for all this either.

Regarding BPF_MAP_UPDATE_{ELEM,BATCH} support for percpu_array maps —
would it make sense to split the flags field as [cpu | flags]?

When the BPF_F_PERCPU_DATA flag is set, the value could be applied
either across all CPUs if the cpu is 0xffffffff, or to a specific CPU
otherwise.

Thanks,
Leon




* Re: [PATCH bpf-next v3 3/4] bpf, bpftool: Generate skeleton for global percpu data
  2025-05-27 22:31   ` Andrii Nakryiko
@ 2025-05-29  2:56     ` Leon Hwang
  2025-06-02 23:50       ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Leon Hwang @ 2025-05-29  2:56 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, yonghong.song, song, eddyz87, qmo, dxu,
	kernel-patches-bot, Daniel Müller



On 28/5/25 06:31, Andrii Nakryiko wrote:
> Adding libbpf-rs maintainer, Daniel, for awareness, as Rust skeleton
> will have to add support for this, once this patch set lands upstream.
> 
> 

[...]

>> +
>> +       if (map_cnt) {
>> +               bpf_object__for_each_map(map, obj) {
>> +                       if (bpf_map__is_internal_percpu(map) &&
>> +                           get_map_ident(map, ident, sizeof(ident)))
>> +                               printf("\tobj->%s = NULL;\n", ident);
>> +               }
>> +       }
> 
> hm... maybe we can avoid this by making libbpf re-mmap() this
> initialization image to be read-only during bpf_object load? Then the
> pointer can stay in the skeleton and be available for querying of
> "initialization values" (if anyone cares), and we won't have any extra
> post-processing steps in code generated skeleton code?
> 
> And Rust skeleton will be able to expose this as a non-mutable
> reference with no extra magic behind it?
> 
> 
We can re-mmap() it as read-only.

However, in the case of the Rust skeleton, users could still use unsafe
code to cast immutable variables to mutable ones.

That said, it's better to guide users toward treating them as immutable.

Thanks,
Leon



* Re: [PATCH bpf-next v3 3/4] bpf, bpftool: Generate skeleton for global percpu data
  2025-05-29  2:56     ` Leon Hwang
@ 2025-06-02 23:50       ` Andrii Nakryiko
  2025-06-03  2:47         ` Leon Hwang
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2025-06-02 23:50 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, yonghong.song, song, eddyz87, qmo, dxu,
	kernel-patches-bot, Daniel Müller

On Wed, May 28, 2025 at 7:56 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 28/5/25 06:31, Andrii Nakryiko wrote:
> > Adding libbpf-rs maintainer, Daniel, for awareness, as Rust skeleton
> > will have to add support for this, once this patch set lands upstream.
> >
> >
>
> [...]
>
> >> +
> >> +       if (map_cnt) {
> >> +               bpf_object__for_each_map(map, obj) {
> >> +                       if (bpf_map__is_internal_percpu(map) &&
> >> +                           get_map_ident(map, ident, sizeof(ident)))
> >> +                               printf("\tobj->%s = NULL;\n", ident);
> >> +               }
> >> +       }
> >
> > hm... maybe we can avoid this by making libbpf re-mmap() this
> > initialization image to be read-only during bpf_object load? Then the
> > pointer can stay in the skeleton and be available for querying of
> > "initialization values" (if anyone cares), and we won't have any extra
> > post-processing steps in code generated skeleton code?
> >
> > And Rust skeleton will be able to expose this as a non-mutable
> > reference with no extra magic behind it?
> >
> >
> We can re-mmap() it as read-only.
>
> However, in the case of the Rust skeleton, users could still use unsafe
> code to cast immutable variables to mutable ones.

you have to actively want to abuse the API to do this, so I wouldn't
be too concerned about this

>
> That said, it's better to guide users toward treating them as immutable.
>
> Thanks,
> Leon
>


* Re: [PATCH bpf-next v3 2/4] bpf, libbpf: Support global percpu data
  2025-05-29  2:43           ` Leon Hwang
@ 2025-06-02 23:50             ` Andrii Nakryiko
  2025-06-03  2:45               ` Leon Hwang
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2025-06-02 23:50 UTC (permalink / raw)
  To: Leon Hwang
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Yonghong Song, Song Liu, Eduard, Quentin Monnet,
	Daniel Xu, kernel-patches-bot

On Wed, May 28, 2025 at 7:44 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 29/5/25 00:05, Andrii Nakryiko wrote:
> > On Tue, May 27, 2025 at 7:35 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >>
>
> [...]
>
> >>
> >> I guess this can be a follow up.
> >> With extra flag lookup/update/delete can look into a new field
> >> in that anonymous struct:
> >>         struct { /* anonymous struct used by BPF_MAP_*_ELEM and
> >> BPF_MAP_FREEZE commands */
> >>                 __u32           map_fd;
> >>                 __aligned_u64   key;
> >>                 union {
> >>                         __aligned_u64 value;
> >>                         __aligned_u64 next_key;
> >>                 };
> >>                 __u64           flags;
> >>         };
> >>
> >
> > Yep, we'd have two flags: one for "apply across all CPUs", and another
> > meaning "apply for specified CPU" + new CPU number field. Or the same
> > flag with a special CPU number value (0xffffffff?).
> >
> >> There is also "batch" version of lookup/update/delete.
> >> They probably will need to be extended as well for consistency ?
> >> So I'd only go with the "use data to update all CPUs" flag for now.
> >
> > Agreed. But also looking at generic_map_update_batch() it seems like
> > it just routes everything through single-element updates, so it
> > shouldn't be hard to add batch support for all this either.
>
> Regarding BPF_MAP_UPDATE_{ELEM,BATCH} support for percpu_array maps —
> would it make sense to split the flags field as [cpu | flags]?

We could encode the CPU number as part of flags, but I'm not sure what
we are trying to achieve here. Adding a dedicated field for the CPU
number would be in line with what we did for BPF_PROG_TEST_RUN, so I
don't see a big problem.

>
> When the BPF_F_PERCPU_DATA flag is set, the value could be applied
> either across all CPUs if the cpu is 0xffffffff, or to a specific CPU
> otherwise.

right

>
> Thanks,
> Leon
>
>


* Re: [PATCH bpf-next v3 2/4] bpf, libbpf: Support global percpu data
  2025-06-02 23:50             ` Andrii Nakryiko
@ 2025-06-03  2:45               ` Leon Hwang
  2025-06-05 16:29                 ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Leon Hwang @ 2025-06-03  2:45 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Yonghong Song, Song Liu, Eduard, Quentin Monnet,
	Daniel Xu, kernel-patches-bot



On 3/6/25 07:50, Andrii Nakryiko wrote:
> On Wed, May 28, 2025 at 7:44 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>>
>> On 29/5/25 00:05, Andrii Nakryiko wrote:
>>> On Tue, May 27, 2025 at 7:35 PM Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>>>
>>
>> [...]
>>
>>>>
>>>> I guess this can be a follow up.
>>>> With extra flag lookup/update/delete can look into a new field
>>>> in that anonymous struct:
>>>>         struct { /* anonymous struct used by BPF_MAP_*_ELEM and
>>>> BPF_MAP_FREEZE commands */
>>>>                 __u32           map_fd;
>>>>                 __aligned_u64   key;
>>>>                 union {
>>>>                         __aligned_u64 value;
>>>>                         __aligned_u64 next_key;
>>>>                 };
>>>>                 __u64           flags;
>>>>         };
>>>>
>>>
>>> Yep, we'd have two flags: one for "apply across all CPUs", and another
>>> meaning "apply for specified CPU" + new CPU number field. Or the same
>>> flag with a special CPU number value (0xffffffff?).
>>>
>>>> There is also "batch" version of lookup/update/delete.
>>>> They probably will need to be extended as well for consistency ?
>>>> So I'd only go with the "use data to update all CPUs" flag for now.
>>>
>>> Agreed. But also looking at generic_map_update_batch() it seems like
>>> it just routes everything through single-element updates, so it
>>> shouldn't be hard to add batch support for all this either.
>>
>> Regarding BPF_MAP_UPDATE_{ELEM,BATCH} support for percpu_array maps —
>> would it make sense to split the flags field as [cpu | flags]?
> 
> We could encode the CPU number as part of flags, but I'm not sure what
> we are trying to achieve here. Adding a dedicated field for the CPU
> number would be in line with what we did for BPF_PROG_TEST_RUN, so I
> don't see a big problem.
> 

It's to avoid breaking existing APIs, such as libbpf's
bpf_map_update_elem() and bpf_map__update_elem(). Otherwise, we would
need to introduce new percpu-specific versions, like
bpf_map_update_percpu_elem() and bpf_map__update_percpu_elem().

Thanks,
Leon



* Re: [PATCH bpf-next v3 3/4] bpf, bpftool: Generate skeleton for global percpu data
  2025-06-02 23:50       ` Andrii Nakryiko
@ 2025-06-03  2:47         ` Leon Hwang
  0 siblings, 0 replies; 24+ messages in thread
From: Leon Hwang @ 2025-06-03  2:47 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, yonghong.song, song, eddyz87, qmo, dxu,
	kernel-patches-bot, Daniel Müller



On 3/6/25 07:50, Andrii Nakryiko wrote:
> On Wed, May 28, 2025 at 7:56 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>>
>> On 28/5/25 06:31, Andrii Nakryiko wrote:
>>> Adding libbpf-rs maintainer, Daniel, for awareness, as Rust skeleton
>>> will have to add support for this, once this patch set lands upstream.
>>>
>>>
>>
>> [...]
>>
>>>> +
>>>> +       if (map_cnt) {
>>>> +               bpf_object__for_each_map(map, obj) {
>>>> +                       if (bpf_map__is_internal_percpu(map) &&
>>>> +                           get_map_ident(map, ident, sizeof(ident)))
>>>> +                               printf("\tobj->%s = NULL;\n", ident);
>>>> +               }
>>>> +       }
>>>
>>> hm... maybe we can avoid this by making libbpf re-mmap() this
>>> initialization image to be read-only during bpf_object load? Then the
>>> pointer can stay in the skeleton and be available for querying of
>>> "initialization values" (if anyone cares), and we won't have any extra
>>> post-processing steps in code generated skeleton code?
>>>
>>> And Rust skeleton will be able to expose this as a non-mutable
>>> reference with no extra magic behind it?
>>>
>>>
>> We can re-mmap() it as read-only.
>>
>> However, in the case of the Rust skeleton, users could still use unsafe
>> code to cast immutable variables to mutable ones.
> 
> you have to actively want to abuse the API to do this, so I wouldn't
> be too concerned about this
> 

Agreed.

It will be marked as read-only using mprotect().

Thanks,
Leon




* Re: [PATCH bpf-next v3 2/4] bpf, libbpf: Support global percpu data
  2025-06-03  2:45               ` Leon Hwang
@ 2025-06-05 16:29                 ` Andrii Nakryiko
  0 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2025-06-05 16:29 UTC (permalink / raw)
  To: Leon Hwang
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Yonghong Song, Song Liu, Eduard, Quentin Monnet,
	Daniel Xu, kernel-patches-bot

On Mon, Jun 2, 2025 at 7:45 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 3/6/25 07:50, Andrii Nakryiko wrote:
> > On Wed, May 28, 2025 at 7:44 PM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >>
> >>
> >> On 29/5/25 00:05, Andrii Nakryiko wrote:
> >>> On Tue, May 27, 2025 at 7:35 PM Alexei Starovoitov
> >>> <alexei.starovoitov@gmail.com> wrote:
> >>>>
> >>
> >> [...]
> >>
> >>>>
> >>>> I guess this can be a follow up.
> >>>> With extra flag lookup/update/delete can look into a new field
> >>>> in that anonymous struct:
> >>>>         struct { /* anonymous struct used by BPF_MAP_*_ELEM and
> >>>> BPF_MAP_FREEZE commands */
> >>>>                 __u32           map_fd;
> >>>>                 __aligned_u64   key;
> >>>>                 union {
> >>>>                         __aligned_u64 value;
> >>>>                         __aligned_u64 next_key;
> >>>>                 };
> >>>>                 __u64           flags;
> >>>>         };
> >>>>
> >>>
> >>> Yep, we'd have two flags: one for "apply across all CPUs", and another
> >>> meaning "apply for specified CPU" + new CPU number field. Or the same
> >>> flag with a special CPU number value (0xffffffff?).
> >>>
> >>>> There is also "batch" version of lookup/update/delete.
> >>>> They probably will need to be extended as well for consistency ?
> >>>> So I'd only go with the "use data to update all CPUs" flag for now.
> >>>
> >>> Agreed. But also looking at generic_map_update_batch() it seems like
> >>> it just routes everything through single-element updates, so it
> >>> shouldn't be hard to add batch support for all this either.
> >>
> >> Regarding BPF_MAP_UPDATE_{ELEM,BATCH} support for percpu_array maps —
> >> would it make sense to split the flags field as [cpu | flags]?
> >
> > We coul;d encode CPU number as part of flags, but I'm not sure what we
> > are trying to achieve here. Adding a dedicated field for cpu number
> > would be in line of what we did for BPF_PROG_TEST_RUN, so I don't see
> > a big problem.
> >
>
> It's to avoid breaking existing APIs, such as libbpf's
> bpf_map_update_elem() and bpf_map__update_elem(). Otherwise, we would
> need to introduce new percpu-specific versions, like
> bpf_map_update_percpu_elem() and bpf_map__update_percpu_elem().

Ok, makes sense. I don't think I have a strong preference, but if we
did go with a separate field, we'd just introduce
bpf_map_{update,delete}_elem_opts() and pass the CPU and flags
through an opts struct. It's not too bad either way.

>
> Thanks,
> Leon
>


end of thread, other threads:[~2025-06-05 16:29 UTC | newest]

Thread overview: 24+ messages:
2025-05-26 16:21 [PATCH bpf-next v3 0/4] bpf: Introduce global percpu data Leon Hwang
2025-05-26 16:21 ` [PATCH bpf-next v3 1/4] " Leon Hwang
2025-05-27 22:31   ` Andrii Nakryiko
2025-05-29  2:03     ` Leon Hwang
2025-05-26 16:21 ` [PATCH bpf-next v3 2/4] bpf, libbpf: Support " Leon Hwang
2025-05-27 22:31   ` Andrii Nakryiko
2025-05-29  2:24     ` Leon Hwang
2025-05-27 22:40   ` Alexei Starovoitov
2025-05-27 23:25     ` Andrii Nakryiko
2025-05-28  2:35       ` Alexei Starovoitov
2025-05-28 16:05         ` Andrii Nakryiko
2025-05-29  2:43           ` Leon Hwang
2025-06-02 23:50             ` Andrii Nakryiko
2025-06-03  2:45               ` Leon Hwang
2025-06-05 16:29                 ` Andrii Nakryiko
2025-05-26 16:21 ` [PATCH bpf-next v3 3/4] bpf, bpftool: Generate skeleton for " Leon Hwang
2025-05-27 22:31   ` Andrii Nakryiko
2025-05-29  2:56     ` Leon Hwang
2025-06-02 23:50       ` Andrii Nakryiko
2025-06-03  2:47         ` Leon Hwang
2025-05-26 16:21 ` [PATCH bpf-next v3 4/4] selftests/bpf: Add cases to test " Leon Hwang
2025-05-27 22:31 ` [PATCH bpf-next v3 0/4] bpf: Introduce " Andrii Nakryiko
2025-05-28 17:10   ` Yonghong Song
2025-05-29  1:59     ` Leon Hwang
