* [RFC PATCH bpf-next v2 0/3] bpf: Introduce BPF_F_CPU flag for percpu_array map
@ 2025-07-07 16:04 Leon Hwang
2025-07-07 16:04 ` [RFC PATCH bpf-next v2 1/3] " Leon Hwang
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Leon Hwang @ 2025-07-07 16:04 UTC (permalink / raw)
To: bpf; +Cc: ast, andrii, daniel, Leon Hwang
This patch set introduces the BPF_F_CPU flag for percpu_array maps, as
discussed in the thread of
"[PATCH bpf-next v3 0/4] bpf: Introduce global percpu data"[1].
The goal is to reduce data caching overhead in light skeletons by allowing
a single value to be reused across all CPUs. This avoids the M:N problem,
where M cached values are needed to update a map on a kernel with N
possible CPUs.
The BPF_F_CPU flag is accompanied by a cpu field, which specifies the
target CPUs for the operation:
* For lookup operations: the flag and cpu field enable querying a value
on the specified CPU.
* For update operations:
* If cpu == (u32)~0, the provided value is copied to all CPUs.
* Otherwise, the value is copied to the specified CPU only.
Currently, this functionality is only supported for percpu_array maps.
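For illustration, here is a minimal userspace sketch of the intended usage.
It assumes the BPF_F_CPU and BPF_ALL_CPUS definitions added by this series
and a percpu_array map with u64 values; the helper names are placeholders.

#include <linux/bpf.h>   /* BPF_F_CPU, BPF_ALL_CPUS (added by this series) */
#include <bpf/bpf.h>     /* bpf_map_update_elem() */

/* write one value into a single CPU's copy of element `key` */
static int update_one_cpu(int map_fd, __u32 key, __u64 val, __u32 cpu)
{
	/* the cpu index is carried in the upper 32 bits of the 64-bit flags */
	__u64 flags = ((__u64)cpu << 32) | BPF_F_CPU;

	return bpf_map_update_elem(map_fd, &key, &val, flags);
}

/* write the same value into every CPU's copy of element `key` */
static int update_all_cpus(int map_fd, __u32 key, __u64 val)
{
	return update_one_cpu(map_fd, key, val, BPF_ALL_CPUS);
}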
Links:
[1] https://lore.kernel.org/bpf/20250526162146.24429-1-leon.hwang@linux.dev/
Changes:
RFC v1 -> RFC v2:
* Address comments from Andrii:
  * Embed cpu into flags on the kernel side.
  * Change the BPF_ALL_CPU macro to a BPF_ALL_CPUS enum.
  * Copy/update elements within RCU protection.
  * Update bpf_map_value_size() to handle the BPF_F_CPU case.
  * Use zero as the default value for the cpu option.
  * Make the API documentation generic.
  * Add size_t:0 to the opts definitions.
  * Update validate_map_op() to handle the BPF_F_CPU case.
  * Use LIBBPF_OPTS instead of DECLARE_LIBBPF_OPTS.
Leon Hwang (3):
bpf: Introduce BPF_F_CPU flag for percpu_array map
bpf, libbpf: Support BPF_F_CPU for percpu_array map
selftests/bpf: Add case to test BPF_F_CPU
include/linux/bpf.h | 3 +-
include/uapi/linux/bpf.h | 7 +
kernel/bpf/arraymap.c | 56 ++++--
kernel/bpf/syscall.c | 52 ++++--
tools/include/uapi/linux/bpf.h | 7 +
tools/lib/bpf/bpf.c | 23 +++
tools/lib/bpf/bpf.h | 36 +++-
tools/lib/bpf/libbpf.c | 58 +++++-
tools/lib/bpf/libbpf.h | 53 +++++-
tools/lib/bpf/libbpf.map | 4 +
tools/lib/bpf/libbpf_common.h | 14 ++
.../selftests/bpf/prog_tests/percpu_alloc.c | 170 ++++++++++++++++++
.../selftests/bpf/progs/percpu_array_flag.c | 24 +++
13 files changed, 460 insertions(+), 47 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/percpu_array_flag.c
--
2.50.0
* [RFC PATCH bpf-next v2 1/3] bpf: Introduce BPF_F_CPU flag for percpu_array map
2025-07-07 16:04 [RFC PATCH bpf-next v2 0/3] bpf: Introduce BPF_F_CPU flag for percpu_array map Leon Hwang
@ 2025-07-07 16:04 ` Leon Hwang
2025-07-11 18:10 ` Andrii Nakryiko
2025-07-07 16:04 ` [RFC PATCH bpf-next v2 2/3] bpf, libbpf: Support BPF_F_CPU " Leon Hwang
2025-07-07 16:04 ` [RFC PATCH bpf-next v2 3/3] selftests/bpf: Add case to test BPF_F_CPU Leon Hwang
2 siblings, 1 reply; 10+ messages in thread
From: Leon Hwang @ 2025-07-07 16:04 UTC (permalink / raw)
To: bpf; +Cc: ast, andrii, daniel, Leon Hwang
This patch introduces support for the BPF_F_CPU flag in percpu_array maps,
allowing the value for a specified CPU to be updated or looked up, or all
CPUs to be updated with a single value.
This enhancement enables:
* Efficient update of all CPUs using a single value when cpu == (u32)~0.
* Targeted update or lookup for a specified CPU otherwise.
The flag is passed via:
* map_flags in bpf_percpu_array_update(), with the cpu embedded in the upper
  32 bits of the flags.
* elem_flags in generic_map_update_batch(), with a separate cpu field in the
  batch attributes (a sketch of the shared 64-bit layout follows).
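For reference, a sketch of the resulting 64-bit layout used by both paths
(the helper name is illustrative only):

#include <linux/types.h>

/* bits 63..32: cpu index (meaningful only when BPF_F_CPU is set)
 * bits 31..0 : BPF_NOEXIST / BPF_EXIST / BPF_F_LOCK / BPF_F_CPU
 */
static inline __u64 pack_cpu_flags(__u32 cpu, __u64 flags)
{
	return ((__u64)cpu << 32) | flags;
}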
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
include/linux/bpf.h | 3 +-
include/uapi/linux/bpf.h | 7 +++++
kernel/bpf/arraymap.c | 56 ++++++++++++++++++++++++++--------
kernel/bpf/syscall.c | 52 +++++++++++++++++++------------
tools/include/uapi/linux/bpf.h | 7 +++++
5 files changed, 92 insertions(+), 33 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 34dd90ec7fad..6ea5fe7fa0d5 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2662,7 +2662,8 @@ int map_set_for_each_callback_args(struct bpf_verifier_env *env,
struct bpf_func_state *callee);
int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
-int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
+int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value,
+ u64 flags);
int bpf_percpu_hash_update(struct bpf_map *map, void *key, void *value,
u64 flags);
int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0670e15a6100..0d3469cb7a06 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1371,6 +1371,12 @@ enum {
BPF_NOEXIST = 1, /* create new element if it didn't exist */
BPF_EXIST = 2, /* update existing element */
BPF_F_LOCK = 4, /* spin_lock-ed map_lookup/map_update */
+ BPF_F_CPU = 8, /* map_update for percpu_array */
+};
+
+enum {
+ /* indicate updating value across all CPUs for percpu maps. */
+ BPF_ALL_CPUS = (__u32)~0,
};
/* flags for BPF_MAP_CREATE command */
@@ -1548,6 +1554,7 @@ union bpf_attr {
__u32 map_fd;
__u64 elem_flags;
__u64 flags;
+ __u32 cpu;
} batch;
struct { /* anonymous struct used by BPF_PROG_LOAD command */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 3d080916faf9..5c5376b89930 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -295,17 +295,24 @@ static void *percpu_array_map_lookup_percpu_elem(struct bpf_map *map, void *key,
return per_cpu_ptr(array->pptrs[index & array->index_mask], cpu);
}
-int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
+int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value, u64 flags)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
u32 index = *(u32 *)key;
void __percpu *pptr;
- int cpu, off = 0;
- u32 size;
+ u32 size, cpu;
+ int off = 0;
if (unlikely(index >= array->map.max_entries))
return -ENOENT;
+ cpu = (u32)(flags >> 32);
+ flags &= (u32)~0;
+ if (unlikely(flags > BPF_F_CPU))
+ return -EINVAL;
+ if (unlikely((flags & BPF_F_CPU) && cpu >= num_possible_cpus()))
+ return -E2BIG;
+
/* per_cpu areas are zero-filled and bpf programs can only
* access 'value_size' of them, so copying rounded areas
* will not leak any kernel data
@@ -313,10 +320,15 @@ int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
size = array->elem_size;
rcu_read_lock();
pptr = array->pptrs[index & array->index_mask];
- for_each_possible_cpu(cpu) {
- copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
- check_and_init_map_value(map, value + off);
- off += size;
+ if (flags & BPF_F_CPU) {
+ copy_map_value_long(map, value, per_cpu_ptr(pptr, cpu));
+ check_and_init_map_value(map, value);
+ } else {
+ for_each_possible_cpu(cpu) {
+ copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
+ check_and_init_map_value(map, value + off);
+ off += size;
+ }
}
rcu_read_unlock();
return 0;
@@ -387,13 +399,21 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
struct bpf_array *array = container_of(map, struct bpf_array, map);
u32 index = *(u32 *)key;
void __percpu *pptr;
- int cpu, off = 0;
- u32 size;
+ bool reuse_value;
+ u32 size, cpu;
+ int off = 0;
- if (unlikely(map_flags > BPF_EXIST))
+ cpu = (u32)(map_flags >> 32);
+ map_flags = map_flags & (u32)~0;
+ if (unlikely(map_flags > BPF_F_CPU))
/* unknown flags */
return -EINVAL;
+ if (unlikely((map_flags & BPF_F_CPU) && cpu != BPF_ALL_CPUS &&
+ cpu >= num_possible_cpus()))
+ /* invalid cpu */
+ return -E2BIG;
+
if (unlikely(index >= array->map.max_entries))
/* all elements were pre-allocated, cannot insert a new one */
return -E2BIG;
@@ -409,12 +429,22 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
* so no kernel data leaks possible
*/
size = array->elem_size;
+ reuse_value = (map_flags & BPF_F_CPU) && cpu == BPF_ALL_CPUS;
rcu_read_lock();
pptr = array->pptrs[index & array->index_mask];
- for_each_possible_cpu(cpu) {
- copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
+ if ((map_flags & BPF_F_CPU) && cpu != BPF_ALL_CPUS) {
+ copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value);
bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
- off += size;
+ } else {
+ for_each_possible_cpu(cpu) {
+ if (!reuse_value) {
+ copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
+ off += size;
+ } else {
+ copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value);
+ }
+ bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
+ }
}
rcu_read_unlock();
return 0;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7db7182a3057..a3ce0cdecb3c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -129,8 +129,12 @@ bool bpf_map_write_active(const struct bpf_map *map)
return atomic64_read(&map->writecnt) != 0;
}
-static u32 bpf_map_value_size(const struct bpf_map *map)
+static u32 bpf_map_value_size(const struct bpf_map *map, u64 flags)
{
+ if ((flags & BPF_F_CPU) &&
+ map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
+ return round_up(map->value_size, 8);
+
if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
@@ -312,7 +316,7 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
err = bpf_percpu_hash_copy(map, key, value);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
- err = bpf_percpu_array_copy(map, key, value);
+ err = bpf_percpu_array_copy(map, key, value, flags);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
err = bpf_percpu_cgroup_storage_copy(map, key, value);
} else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
@@ -1662,7 +1666,7 @@ static int map_lookup_elem(union bpf_attr *attr)
if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM))
return -EINVAL;
- if (attr->flags & ~BPF_F_LOCK)
+ if ((attr->flags & (u32)~0) & ~(BPF_F_LOCK | BPF_F_CPU))
return -EINVAL;
CLASS(fd, f)(attr->map_fd);
@@ -1680,7 +1684,7 @@ static int map_lookup_elem(union bpf_attr *attr)
if (IS_ERR(key))
return PTR_ERR(key);
- value_size = bpf_map_value_size(map);
+ value_size = bpf_map_value_size(map, attr->flags);
err = -ENOMEM;
value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN);
@@ -1749,7 +1753,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
goto err_put;
}
- value_size = bpf_map_value_size(map);
+ value_size = bpf_map_value_size(map, attr->flags);
value = kvmemdup_bpfptr(uvalue, value_size);
if (IS_ERR(value)) {
err = PTR_ERR(value);
@@ -1941,19 +1945,25 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
{
void __user *values = u64_to_user_ptr(attr->batch.values);
void __user *keys = u64_to_user_ptr(attr->batch.keys);
- u32 value_size, cp, max_count;
+ u32 value_size, cp, max_count, cpu = attr->batch.cpu;
+ u64 elem_flags = attr->batch.elem_flags;
void *key, *value;
int err = 0;
- if (attr->batch.elem_flags & ~BPF_F_LOCK)
+ if (elem_flags & ~(BPF_F_LOCK | BPF_F_CPU))
return -EINVAL;
- if ((attr->batch.elem_flags & BPF_F_LOCK) &&
+ if ((elem_flags & BPF_F_LOCK) &&
!btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
return -EINVAL;
}
- value_size = bpf_map_value_size(map);
+ if ((elem_flags & BPF_F_CPU) &&
+ map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
+ return -EINVAL;
+
+ value_size = bpf_map_value_size(map, elem_flags);
+ elem_flags = (((u64)cpu) << 32) | elem_flags;
max_count = attr->batch.count;
if (!max_count)
@@ -1979,8 +1989,7 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
copy_from_user(value, values + cp * value_size, value_size))
break;
- err = bpf_map_update_value(map, map_file, key, value,
- attr->batch.elem_flags);
+ err = bpf_map_update_value(map, map_file, key, value, elem_flags);
if (err)
break;
@@ -2004,18 +2013,24 @@ int generic_map_lookup_batch(struct bpf_map *map,
void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
void __user *values = u64_to_user_ptr(attr->batch.values);
void __user *keys = u64_to_user_ptr(attr->batch.keys);
+ u32 value_size, cp, max_count, cpu = attr->batch.cpu;
void *buf, *buf_prevkey, *prev_key, *key, *value;
- u32 value_size, cp, max_count;
+ u64 elem_flags = attr->batch.elem_flags;
int err;
- if (attr->batch.elem_flags & ~BPF_F_LOCK)
+ if (elem_flags & ~(BPF_F_LOCK | BPF_F_CPU))
return -EINVAL;
- if ((attr->batch.elem_flags & BPF_F_LOCK) &&
+ if ((elem_flags & BPF_F_LOCK) &&
!btf_record_has_field(map->record, BPF_SPIN_LOCK))
return -EINVAL;
- value_size = bpf_map_value_size(map);
+ if ((elem_flags & BPF_F_CPU) &&
+ map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
+ return -EINVAL;
+
+ value_size = bpf_map_value_size(map, elem_flags);
+ elem_flags = (((u64)cpu) << 32) | elem_flags;
max_count = attr->batch.count;
if (!max_count)
@@ -2049,8 +2064,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
rcu_read_unlock();
if (err)
break;
- err = bpf_map_copy_value(map, key, value,
- attr->batch.elem_flags);
+ err = bpf_map_copy_value(map, key, value, elem_flags);
if (err == -ENOENT)
goto next_key;
@@ -2137,7 +2151,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
goto err_put;
}
- value_size = bpf_map_value_size(map);
+ value_size = bpf_map_value_size(map, attr->flags);
err = -ENOMEM;
value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN);
@@ -5445,7 +5459,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
return err;
}
-#define BPF_MAP_BATCH_LAST_FIELD batch.flags
+#define BPF_MAP_BATCH_LAST_FIELD batch.cpu
#define BPF_DO_BATCH(fn, ...) \
do { \
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0670e15a6100..0d3469cb7a06 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1371,6 +1371,12 @@ enum {
BPF_NOEXIST = 1, /* create new element if it didn't exist */
BPF_EXIST = 2, /* update existing element */
BPF_F_LOCK = 4, /* spin_lock-ed map_lookup/map_update */
+ BPF_F_CPU = 8, /* map_update for percpu_array */
+};
+
+enum {
+ /* indicate updating value across all CPUs for percpu maps. */
+ BPF_ALL_CPUS = (__u32)~0,
};
/* flags for BPF_MAP_CREATE command */
@@ -1548,6 +1554,7 @@ union bpf_attr {
__u32 map_fd;
__u64 elem_flags;
__u64 flags;
+ __u32 cpu;
} batch;
struct { /* anonymous struct used by BPF_PROG_LOAD command */
--
2.50.0
* [RFC PATCH bpf-next v2 2/3] bpf, libbpf: Support BPF_F_CPU for percpu_array map
2025-07-07 16:04 [RFC PATCH bpf-next v2 0/3] bpf: Introduce BPF_F_CPU flag for percpu_array map Leon Hwang
2025-07-07 16:04 ` [RFC PATCH bpf-next v2 1/3] " Leon Hwang
@ 2025-07-07 16:04 ` Leon Hwang
2025-07-11 18:10 ` Andrii Nakryiko
2025-07-07 16:04 ` [RFC PATCH bpf-next v2 3/3] selftests/bpf: Add case to test BPF_F_CPU Leon Hwang
2 siblings, 1 reply; 10+ messages in thread
From: Leon Hwang @ 2025-07-07 16:04 UTC (permalink / raw)
To: bpf; +Cc: ast, andrii, daniel, Leon Hwang
This patch adds libbpf support for the BPF_F_CPU flag in percpu_array maps,
introducing the following APIs:
1. bpf_map_update_elem_opts(): update with struct bpf_map_update_elem_opts
2. bpf_map_lookup_elem_opts(): lookup with struct bpf_map_lookup_elem_opts
3. bpf_map__update_elem_opts(): high-level wrapper with input validation
4. bpf_map__lookup_elem_opts(): high-level wrapper with input validation
Behavior (see the usage sketch after this list):
* If opts->cpu == (u32)~0, the update is applied to all CPUs.
* Otherwise, it applies only to the specified CPU.
* Lookup APIs retrieve values from the target CPU when BPF_F_CPU is used.
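A usage sketch of the APIs added in this patch; map_fd is assumed to be the
fd of a percpu_array map with u64 values, and the function name is a
placeholder:

#include <bpf/bpf.h>

static int demo(int map_fd)
{
	LIBBPF_OPTS(bpf_map_update_elem_opts, u_opts,
		.flags = BPF_F_CPU,
		.cpu = 3,	/* touch only CPU 3's copy */
	);
	LIBBPF_OPTS(bpf_map_lookup_elem_opts, l_opts,
		.flags = BPF_F_CPU,
		.cpu = 3,	/* read back CPU 3's copy */
	);
	__u32 key = 0;
	__u64 val = 42;
	int err;

	err = bpf_map_update_elem_opts(map_fd, &key, &val, &u_opts);
	if (err)
		return err;

	err = bpf_map_lookup_elem_opts(map_fd, &key, &val, &l_opts);
	if (err)
		return err;

	/* cpu == BPF_ALL_CPUS copies the same single value to every CPU */
	u_opts.cpu = BPF_ALL_CPUS;
	return bpf_map_update_elem_opts(map_fd, &key, &val, &u_opts);
}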
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
tools/lib/bpf/bpf.c | 23 ++++++++++++++
tools/lib/bpf/bpf.h | 36 +++++++++++++++++++++-
tools/lib/bpf/libbpf.c | 56 +++++++++++++++++++++++++++++++----
tools/lib/bpf/libbpf.h | 53 ++++++++++++++++++++++++++++-----
tools/lib/bpf/libbpf.map | 4 +++
tools/lib/bpf/libbpf_common.h | 14 +++++++++
6 files changed, 172 insertions(+), 14 deletions(-)
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index ab40dbf9f020..8061093d84f9 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -402,6 +402,17 @@ int bpf_map_update_elem(int fd, const void *key, const void *value,
return libbpf_err_errno(ret);
}
+int bpf_map_update_elem_opts(int fd, const void *key, const void *value,
+ const struct bpf_map_update_elem_opts *opts)
+{
+ __u64 flags;
+ __u32 cpu;
+
+ cpu = OPTS_GET(opts, cpu, 0);
+ flags = ((__u64) cpu) << 32 | OPTS_GET(opts, flags, 0);
+ return bpf_map_update_elem(fd, key, value, flags);
+}
+
int bpf_map_lookup_elem(int fd, const void *key, void *value)
{
const size_t attr_sz = offsetofend(union bpf_attr, flags);
@@ -433,6 +444,17 @@ int bpf_map_lookup_elem_flags(int fd, const void *key, void *value, __u64 flags)
return libbpf_err_errno(ret);
}
+int bpf_map_lookup_elem_opts(int fd, const void *key, void *value,
+ const struct bpf_map_lookup_elem_opts *opts)
+{
+ __u64 flags;
+ __u32 cpu;
+
+ cpu = OPTS_GET(opts, cpu, 0);
+ flags = ((__u64) cpu) << 32 | OPTS_GET(opts, flags, 0);
+ return bpf_map_lookup_elem_flags(fd, key, value, flags);
+}
+
int bpf_map_lookup_and_delete_elem(int fd, const void *key, void *value)
{
const size_t attr_sz = offsetofend(union bpf_attr, flags);
@@ -542,6 +564,7 @@ static int bpf_map_batch_common(int cmd, int fd, void *in_batch,
attr.batch.count = *count;
attr.batch.elem_flags = OPTS_GET(opts, elem_flags, 0);
attr.batch.flags = OPTS_GET(opts, flags, 0);
+ attr.batch.cpu = OPTS_GET(opts, cpu, 0);
ret = sys_bpf(cmd, &attr, attr_sz);
*count = attr.batch.count;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 7252150e7ad3..d0ab18b50294 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -163,12 +163,42 @@ LIBBPF_API int bpf_map_delete_elem_flags(int fd, const void *key, __u64 flags);
LIBBPF_API int bpf_map_get_next_key(int fd, const void *key, void *next_key);
LIBBPF_API int bpf_map_freeze(int fd);
+/**
+ * @brief **bpf_map_update_elem_opts** allows for updating map with options.
+ *
+ * @param fd BPF map file descriptor
+ * @param key pointer to key
+ * @param value pointer to value
+ * @param opts options for configuring the way to update map
+ * @return 0, on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_map_update_elem_opts(int fd, const void *key, const void *value,
+ const struct bpf_map_update_elem_opts *opts);
+
+/**
+ * @brief **bpf_map_lookup_elem_opts** allows for looking up the value with
+ * options.
+ *
+ * @param fd BPF map file descriptor
+ * @param key pointer to key
+ * @param value pointer to value
+ * @param opts options for configuring the way to lookup map
+ * @return 0, on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_map_lookup_elem_opts(int fd, const void *key, void *value,
+ const struct bpf_map_lookup_elem_opts *opts);
+
+
struct bpf_map_batch_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u64 elem_flags;
__u64 flags;
+ __u32 cpu;
+ size_t:0;
};
-#define bpf_map_batch_opts__last_field flags
+#define bpf_map_batch_opts__last_field cpu
/**
@@ -286,6 +316,10 @@ LIBBPF_API int bpf_map_lookup_and_delete_batch(int fd, void *in_batch,
* Update spin_lock-ed map elements. This must be
* specified if the map value contains a spinlock.
*
+ * **BPF_F_CPU**
+ * For per-CPU maps, update the value across all CPUs if **opts->cpu** is
+ * (__u32)~0, or only on the specified CPU otherwise.
+ *
* @param fd BPF map file descriptor
* @param keys pointer to an array of *count* keys
* @param values pointer to an array of *count* values
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index aee36402f0a3..35faedef6ab4 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10582,7 +10582,8 @@ bpf_object__find_map_fd_by_name(const struct bpf_object *obj, const char *name)
}
static int validate_map_op(const struct bpf_map *map, size_t key_sz,
- size_t value_sz, bool check_value_sz)
+ size_t value_sz, bool check_value_sz, __u64 flags,
+ __u32 cpu)
{
if (!map_is_created(map)) /* map is not yet created */
return -ENOENT;
@@ -10601,6 +10602,19 @@ static int validate_map_op(const struct bpf_map *map, size_t key_sz,
if (!check_value_sz)
return 0;
+ if (flags & BPF_F_CPU) {
+ if (map->def.type != BPF_MAP_TYPE_PERCPU_ARRAY)
+ return -EINVAL;
+ if (cpu != BPF_ALL_CPUS && cpu >= libbpf_num_possible_cpus())
+ return -E2BIG;
+ if (map->def.value_size != value_sz) {
+ pr_warn("map '%s': unexpected value size %zu provided, expected %u\n",
+ map->name, value_sz, map->def.value_size);
+ return -EINVAL;
+ }
+ return 0;
+ }
+
switch (map->def.type) {
case BPF_MAP_TYPE_PERCPU_ARRAY:
case BPF_MAP_TYPE_PERCPU_HASH:
@@ -10633,32 +10647,62 @@ int bpf_map__lookup_elem(const struct bpf_map *map,
{
int err;
- err = validate_map_op(map, key_sz, value_sz, true);
+ err = validate_map_op(map, key_sz, value_sz, true, 0, 0);
if (err)
return libbpf_err(err);
return bpf_map_lookup_elem_flags(map->fd, key, value, flags);
}
+int bpf_map__lookup_elem_opts(const struct bpf_map *map, const void *key,
+ size_t key_sz, void *value, size_t value_sz,
+ const struct bpf_map_lookup_elem_opts *opts)
+{
+ __u64 flags = OPTS_GET(opts, flags, 0);
+ __u32 cpu = OPTS_GET(opts, cpu, 0);
+ int err;
+
+ err = validate_map_op(map, key_sz, value_sz, true, flags, cpu);
+ if (err)
+ return libbpf_err(err);
+
+ return bpf_map_lookup_elem_opts(map->fd, key, value, opts);
+}
+
int bpf_map__update_elem(const struct bpf_map *map,
const void *key, size_t key_sz,
const void *value, size_t value_sz, __u64 flags)
{
int err;
- err = validate_map_op(map, key_sz, value_sz, true);
+ err = validate_map_op(map, key_sz, value_sz, true, 0, 0);
if (err)
return libbpf_err(err);
return bpf_map_update_elem(map->fd, key, value, flags);
}
+int bpf_map__update_elem_opts(const struct bpf_map *map, const void *key,
+ size_t key_sz, const void *value, size_t value_sz,
+ const struct bpf_map_update_elem_opts *opts)
+{
+ __u64 flags = OPTS_GET(opts, flags, 0);
+ __u32 cpu = OPTS_GET(opts, cpu, 0);
+ int err;
+
+ err = validate_map_op(map, key_sz, value_sz, true, flags, cpu);
+ if (err)
+ return libbpf_err(err);
+
+ return bpf_map_update_elem_opts(map->fd, key, value, opts);
+}
+
int bpf_map__delete_elem(const struct bpf_map *map,
const void *key, size_t key_sz, __u64 flags)
{
int err;
- err = validate_map_op(map, key_sz, 0, false /* check_value_sz */);
+ err = validate_map_op(map, key_sz, 0, false /* check_value_sz */, 0, 0);
if (err)
return libbpf_err(err);
@@ -10671,7 +10715,7 @@ int bpf_map__lookup_and_delete_elem(const struct bpf_map *map,
{
int err;
- err = validate_map_op(map, key_sz, value_sz, true);
+ err = validate_map_op(map, key_sz, value_sz, true, 0, 0);
if (err)
return libbpf_err(err);
@@ -10683,7 +10727,7 @@ int bpf_map__get_next_key(const struct bpf_map *map,
{
int err;
- err = validate_map_op(map, key_sz, 0, false /* check_value_sz */);
+ err = validate_map_op(map, key_sz, 0, false /* check_value_sz */, 0, 0);
if (err)
return libbpf_err(err);
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index d1cf813a057b..fd4940759bc9 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1168,13 +1168,7 @@ LIBBPF_API struct bpf_map *bpf_map__inner_map(struct bpf_map *map);
* @param key pointer to memory containing bytes of the key used for lookup
* @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
* @param value pointer to memory in which looked up value will be stored
- * @param value_sz size in byte of value data memory; it has to match BPF map
- * definition's **value_size**. For per-CPU BPF maps value size has to be
- * a product of BPF map value size and number of possible CPUs in the system
- * (could be fetched with **libbpf_num_possible_cpus()**). Note also that for
- * per-CPU values value size has to be aligned up to closest 8 bytes for
- * alignment reasons, so expected size is: `round_up(value_size, 8)
- * * libbpf_num_possible_cpus()`.
+ * @param value_sz refer to **bpf_map__lookup_elem_opts()**'s description.
* @flags extra flags passed to kernel for this operation
* @return 0, on success; negative error, otherwise
*
@@ -1185,6 +1179,32 @@ LIBBPF_API int bpf_map__lookup_elem(const struct bpf_map *map,
const void *key, size_t key_sz,
void *value, size_t value_sz, __u64 flags);
+/**
+ * @brief **bpf_map__lookup_elem_opts()** allows to lookup BPF map value
+ * corresponding to provided key with options.
+ * @param map BPF map to lookup element in
+ * @param key pointer to memory containing bytes of the key used for lookup
+ * @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
+ * @param value pointer to memory in which looked up value will be stored
+ * @param value_sz size in bytes of value data memory; it has to match the BPF
+ * map definition's **value_size**. For per-CPU BPF maps the value size can be
+ * the definition's **value_size** if **BPF_F_CPU** is specified in
+ * **opts->flags**, otherwise it is a product of the BPF map value size and the
+ * number of possible CPUs in the system (which can be fetched with
+ * **libbpf_num_possible_cpus()**). Note also that for per-CPU values the value
+ * size has to be aligned up to the closest 8 bytes, so the expected size is:
+ * `round_up(value_size, 8) * libbpf_num_possible_cpus()`.
+ * @opts extra options passed to kernel for this operation
+ * @return 0, on success; negative error, otherwise
+ *
+ * **bpf_map__lookup_elem_opts()** is high-level equivalent of
+ * **bpf_map_lookup_elem_opts()** API with added check for key and value size.
+ */
+LIBBPF_API int bpf_map__lookup_elem_opts(const struct bpf_map *map,
+ const void *key, size_t key_sz,
+ void *value, size_t value_sz,
+ const struct bpf_map_lookup_elem_opts *opts);
+
/**
* @brief **bpf_map__update_elem()** allows to insert or update value in BPF
* map that corresponds to provided key.
@@ -1209,6 +1229,25 @@ LIBBPF_API int bpf_map__update_elem(const struct bpf_map *map,
const void *key, size_t key_sz,
const void *value, size_t value_sz, __u64 flags);
+/**
+ * @brief **bpf_map__update_elem_opts()** allows to insert or update value in BPF
+ * map that corresponds to provided key with options.
+ * @param map BPF map to insert to or update element in
+ * @param key pointer to memory containing bytes of the key
+ * @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
+ * @param value pointer to memory containing bytes of the value
+ * @param value_sz refer to **bpf_map__lookup_elem_opts()**'s description.
+ * @opts extra options passed to kernel for this operation
+ * @return 0, on success; negative error, otherwise
+ *
+ * **bpf_map__update_elem_opts()** is high-level equivalent of
+ * **bpf_map_update_elem_opts()** API with added check for key and value size.
+ */
+LIBBPF_API int bpf_map__update_elem_opts(const struct bpf_map *map,
+ const void *key, size_t key_sz,
+ const void *value, size_t value_sz,
+ const struct bpf_map_update_elem_opts *opts);
+
/**
* @brief **bpf_map__delete_elem()** allows to delete element in BPF map that
* corresponds to provided key.
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 1bbf77326420..d21288991d1c 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -436,6 +436,10 @@ LIBBPF_1.6.0 {
bpf_linker__add_buf;
bpf_linker__add_fd;
bpf_linker__new_fd;
+ bpf_map__lookup_elem_opts;
+ bpf_map__update_elem_opts;
+ bpf_map_lookup_elem_opts;
+ bpf_map_update_elem_opts;
bpf_object__prepare;
bpf_prog_stream_read;
bpf_program__attach_cgroup_opts;
diff --git a/tools/lib/bpf/libbpf_common.h b/tools/lib/bpf/libbpf_common.h
index 8fe248e14eb6..84ca89ace1be 100644
--- a/tools/lib/bpf/libbpf_common.h
+++ b/tools/lib/bpf/libbpf_common.h
@@ -89,4 +89,18 @@
memcpy(&NAME, &___##NAME, sizeof(NAME)); \
} while (0)
+struct bpf_map_update_elem_opts {
+ size_t sz; /* size of this struct for forward/backward compatibility */
+ __u64 flags;
+ __u32 cpu;
+ size_t:0;
+};
+
+struct bpf_map_lookup_elem_opts {
+ size_t sz; /* size of this struct for forward/backward compatibility */
+ __u64 flags;
+ __u32 cpu;
+ size_t:0;
+};
+
#endif /* __LIBBPF_LIBBPF_COMMON_H */
--
2.50.0
* [RFC PATCH bpf-next v2 3/3] selftests/bpf: Add case to test BPF_F_CPU
2025-07-07 16:04 [RFC PATCH bpf-next v2 0/3] bpf: Introduce BPF_F_CPU flag for percpu_array map Leon Hwang
2025-07-07 16:04 ` [RFC PATCH bpf-next v2 1/3] " Leon Hwang
2025-07-07 16:04 ` [RFC PATCH bpf-next v2 2/3] bpf, libbpf: Support BPF_F_CPU " Leon Hwang
@ 2025-07-07 16:04 ` Leon Hwang
2025-07-11 18:11 ` Andrii Nakryiko
2 siblings, 1 reply; 10+ messages in thread
From: Leon Hwang @ 2025-07-07 16:04 UTC (permalink / raw)
To: bpf; +Cc: ast, andrii, daniel, Leon Hwang
This patch adds test coverage for the new BPF_F_CPU flag support in
percpu_array maps. The following APIs are exercised:
* bpf_map_update_batch()
* bpf_map_lookup_batch()
* bpf_map_update_elem_opts()
* bpf_map__update_elem_opts()
* bpf_map_lookup_elem_opts()
* bpf_map__lookup_elem_opts()
cd tools/testing/selftests/bpf/
./test_progs -t percpu_alloc/cpu_flag_tests
253/13 percpu_alloc/cpu_flag_tests:OK
253 percpu_alloc:OK
Summary: 1/13 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
.../selftests/bpf/prog_tests/percpu_alloc.c | 170 ++++++++++++++++++
.../selftests/bpf/progs/percpu_array_flag.c | 24 +++
2 files changed, 194 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/percpu_array_flag.c
diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
index 343da65864d6..6f0d0e6dc76a 100644
--- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
+++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
@@ -3,6 +3,7 @@
#include "percpu_alloc_array.skel.h"
#include "percpu_alloc_cgrp_local_storage.skel.h"
#include "percpu_alloc_fail.skel.h"
+#include "percpu_array_flag.skel.h"
static void test_array(void)
{
@@ -115,6 +116,173 @@ static void test_failure(void) {
RUN_TESTS(percpu_alloc_fail);
}
+static void test_cpu_flag(void)
+{
+ int map_fd, *keys = NULL, value_size, cpu, i, j, nr_cpus, err;
+ size_t key_sz = sizeof(int), value_sz = sizeof(u64);
+ struct percpu_array_flag *skel;
+ u64 batch = 0, *values = NULL;
+ const u64 value = 0xDEADC0DE;
+ u32 count, max_entries;
+ struct bpf_map *map;
+ LIBBPF_OPTS(bpf_map_lookup_elem_opts, lookup_opts,
+ .flags = BPF_F_CPU,
+ .cpu = 0,
+ );
+ LIBBPF_OPTS(bpf_map_update_elem_opts, update_opts,
+ .flags = BPF_F_CPU,
+ .cpu = 0,
+ );
+ LIBBPF_OPTS(bpf_map_batch_opts, batch_opts,
+ .elem_flags = BPF_F_CPU,
+ .flags = 0,
+ );
+
+ nr_cpus = libbpf_num_possible_cpus();
+ if (!ASSERT_GT(nr_cpus, 0, "libbpf_num_possible_cpus"))
+ return;
+
+ skel = percpu_array_flag__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "percpu_array_flag__open_and_load"))
+ return;
+
+ map = skel->maps.percpu;
+ map_fd = bpf_map__fd(map);
+ max_entries = bpf_map__max_entries(map);
+
+ value_size = value_sz * nr_cpus;
+ values = calloc(max_entries, value_size);
+ keys = calloc(max_entries, key_sz);
+ if (!ASSERT_FALSE(!keys || !values, "calloc keys and values"))
+ goto out;
+
+ for (i = 0; i < max_entries; i++)
+ keys[i] = i;
+ memset(values, 0, max_entries * value_size);
+
+ batch_opts.cpu = nr_cpus;
+ err = bpf_map_update_batch(map_fd, keys, values, &max_entries, &batch_opts);
+ if (!ASSERT_EQ(err, -E2BIG, "bpf_map_update_batch -E2BIG"))
+ goto out;
+
+ for (cpu = 0; cpu < nr_cpus; cpu++) {
+ memset(values, 0, max_entries * value_size);
+
+ /* clear values on all CPUs */
+ batch_opts.cpu = BPF_ALL_CPUS;
+ batch_opts.elem_flags = BPF_F_CPU;
+ err = bpf_map_update_batch(map_fd, keys, values, &max_entries, &batch_opts);
+ if (!ASSERT_OK(err, "bpf_map_update_batch all cpus"))
+ goto out;
+
+ /* update values on current CPU */
+ for (i = 0; i < max_entries; i++)
+ values[i] = value;
+
+ batch_opts.cpu = cpu;
+ err = bpf_map_update_batch(map_fd, keys, values, &max_entries, &batch_opts);
+ if (!ASSERT_OK(err, "bpf_map_update_batch current cpu"))
+ goto out;
+
+ /* lookup values on current CPU */
+ batch_opts.cpu = cpu;
+ batch_opts.elem_flags = BPF_F_CPU;
+ memset(values, 0, max_entries * value_sz);
+ err = bpf_map_lookup_batch(map_fd, NULL, &batch, keys, values, &count, &batch_opts);
+ if (!ASSERT_TRUE(!err || err == -ENOENT, "bpf_map_lookup_batch current cpu"))
+ goto out;
+
+ for (i = 0; i < max_entries; i++)
+ if (!ASSERT_EQ(values[i], value, "value on current cpu"))
+ goto out;
+
+ /* lookup values on all CPUs */
+ batch_opts.cpu = 0;
+ batch_opts.elem_flags = 0;
+ memset(values, 0, max_entries * value_size);
+ err = bpf_map_lookup_batch(map_fd, NULL, &batch, keys, values, &count, &batch_opts);
+ if (!ASSERT_TRUE(!err || err == -ENOENT, "bpf_map_lookup_batch all cpus"))
+ goto out;
+
+ for (i = 0; i < max_entries; i++) {
+ for (j = 0; j < nr_cpus; j++) {
+ if (!ASSERT_EQ(values[i*nr_cpus + j], j != cpu ? 0 : value,
+ "value on cpu"))
+ goto out;
+ }
+ }
+ }
+
+ update_opts.cpu = nr_cpus;
+ err = bpf_map_update_elem_opts(map_fd, keys, values, &update_opts);
+ if (!ASSERT_EQ(err, -E2BIG, "bpf_map_update_elem_opts -E2BIG"))
+ goto out;
+
+ err = bpf_map__update_elem_opts(map, keys, key_sz, values, value_sz,
+ &update_opts);
+ if (!ASSERT_EQ(err, -E2BIG, "bpf_map__update_elem_opts -E2BIG"))
+ goto out;
+
+ lookup_opts.cpu = nr_cpus;
+ err = bpf_map_lookup_elem_opts(map_fd, keys, values, &lookup_opts);
+ if (!ASSERT_EQ(err, -E2BIG, "bpf_map_lookup_elem_opts -E2BIG"))
+ goto out;
+
+ err = bpf_map__lookup_elem_opts(map, keys, key_sz, values, value_sz,
+ &lookup_opts);
+ if (!ASSERT_EQ(err, -E2BIG, "bpf_map__lookup_elem_opts -E2BIG"))
+ goto out;
+
+ /* clear value on all cpus */
+ batch_opts.cpu = BPF_ALL_CPUS;
+ batch_opts.elem_flags = BPF_F_CPU;
+ memset(values, 0, max_entries * value_sz);
+ err = bpf_map_update_batch(map_fd, keys, values, &max_entries, &batch_opts);
+ if (!ASSERT_OK(err, "bpf_map_update_batch all cpus"))
+ goto out;
+
+ for (cpu = 0; cpu < nr_cpus; cpu++) {
+ /* update value on current cpu */
+ values[0] = value;
+ update_opts.cpu = cpu;
+ for (i = 0; i < max_entries; i++) {
+ err = bpf_map__update_elem_opts(map, keys + i,
+ key_sz, values,
+ value_sz, &update_opts);
+ if (!ASSERT_OK(err, "bpf_map__update_elem_opts current cpu"))
+ goto out;
+
+ for (j = 0; j < nr_cpus; j++) {
+ /* lookup then check value on CPUs */
+ lookup_opts.cpu = j;
+ err = bpf_map__lookup_elem_opts(map, keys + i,
+ key_sz, values,
+ value_sz,
+ &lookup_opts);
+ if (!ASSERT_OK(err, "bpf_map__lookup_elem_opts current cpu"))
+ goto out;
+ if (!ASSERT_EQ(values[0], j != cpu ? 0 : value,
+ "bpf_map__lookup_elem_opts value on current cpu"))
+ goto out;
+ }
+ }
+
+ /* clear value on current cpu */
+ values[0] = 0;
+ err = bpf_map__update_elem_opts(map, keys, key_sz, values,
+ value_sz, &update_opts);
+ if (!ASSERT_OK(err, "bpf_map__update_elem_opts current cpu"))
+ goto out;
+ }
+
+out:
+ if (keys)
+ free(keys);
+ if (values)
+ free(values);
+ percpu_array_flag__destroy(skel);
+}
+
void test_percpu_alloc(void)
{
if (test__start_subtest("array"))
@@ -125,4 +293,6 @@ void test_percpu_alloc(void)
test_cgrp_local_storage();
if (test__start_subtest("failure_tests"))
test_failure();
+ if (test__start_subtest("cpu_flag_tests"))
+ test_cpu_flag();
}
diff --git a/tools/testing/selftests/bpf/progs/percpu_array_flag.c b/tools/testing/selftests/bpf/progs/percpu_array_flag.c
new file mode 100644
index 000000000000..4d92e121958e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/percpu_array_flag.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+struct {
+ __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+ __uint(max_entries, 2);
+ __type(key, int);
+ __type(value, u64);
+} percpu SEC(".maps");
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(test_percpu_array, int x)
+{
+ u64 value = 0xDEADC0DE;
+ int key = 0;
+
+ bpf_map_update_elem(&percpu, &key, &value, BPF_ANY);
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+
--
2.50.0
* Re: [RFC PATCH bpf-next v2 1/3] bpf: Introduce BPF_F_CPU flag for percpu_array map
2025-07-07 16:04 ` [RFC PATCH bpf-next v2 1/3] " Leon Hwang
@ 2025-07-11 18:10 ` Andrii Nakryiko
2025-07-14 12:47 ` Leon Hwang
0 siblings, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2025-07-11 18:10 UTC (permalink / raw)
To: Leon Hwang; +Cc: bpf, ast, andrii, daniel
On Mon, Jul 7, 2025 at 9:04 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> This patch introduces support for the BPF_F_CPU flag in percpu_array maps
> to allow updating or looking up values for specified CPUs or for all CPUs
> with a single value.
>
For next revision, please drop RFC tag, so this is tested and reviewed
as a proper patch set.
> This enhancement enables:
>
> * Efficient update of all CPUs using a single value when cpu == (u32)~0.
> * Targeted update or lookup for a specified CPU otherwise.
>
> The flag is passed via:
>
> * map_flags in bpf_percpu_array_update() along with embedded cpu field.
> * elem_flags in generic_map_update_batch() along with separated cpu field.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> include/linux/bpf.h | 3 +-
> include/uapi/linux/bpf.h | 7 +++++
> kernel/bpf/arraymap.c | 56 ++++++++++++++++++++++++++--------
> kernel/bpf/syscall.c | 52 +++++++++++++++++++------------
> tools/include/uapi/linux/bpf.h | 7 +++++
> 5 files changed, 92 insertions(+), 33 deletions(-)
>
[...]
>
> + cpu = (u32)(flags >> 32);
> + flags &= (u32)~0;
> + if (unlikely(flags > BPF_F_CPU))
> + return -EINVAL;
> + if (unlikely((flags & BPF_F_CPU) && cpu >= num_possible_cpus()))
> + return -E2BIG;
> +
> /* per_cpu areas are zero-filled and bpf programs can only
> * access 'value_size' of them, so copying rounded areas
> * will not leak any kernel data
> @@ -313,10 +320,15 @@ int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
> size = array->elem_size;
> rcu_read_lock();
> pptr = array->pptrs[index & array->index_mask];
> - for_each_possible_cpu(cpu) {
> - copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
> - check_and_init_map_value(map, value + off);
> - off += size;
> + if (flags & BPF_F_CPU) {
> + copy_map_value_long(map, value, per_cpu_ptr(pptr, cpu));
> + check_and_init_map_value(map, value);
> + } else {
> + for_each_possible_cpu(cpu) {
> + copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
> + check_and_init_map_value(map, value + off);
> + off += size;
> + }
> }
> rcu_read_unlock();
> return 0;
> @@ -387,13 +399,21 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
> struct bpf_array *array = container_of(map, struct bpf_array, map);
> u32 index = *(u32 *)key;
> void __percpu *pptr;
> - int cpu, off = 0;
> - u32 size;
> + bool reuse_value;
> + u32 size, cpu;
> + int off = 0;
>
> - if (unlikely(map_flags > BPF_EXIST))
> + cpu = (u32)(map_flags >> 32);
> + map_flags = map_flags & (u32)~0;
be consistent, use &= approach as above
> + if (unlikely(map_flags > BPF_F_CPU))
> /* unknown flags */
> return -EINVAL;
>
> + if (unlikely((map_flags & BPF_F_CPU) && cpu != BPF_ALL_CPUS &&
> + cpu >= num_possible_cpus()))
> + /* invalid cpu */
> + return -E2BIG;
> +
> if (unlikely(index >= array->map.max_entries))
> /* all elements were pre-allocated, cannot insert a new one */
> return -E2BIG;
> @@ -409,12 +429,22 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
> * so no kernel data leaks possible
> */
> size = array->elem_size;
> + reuse_value = (map_flags & BPF_F_CPU) && cpu == BPF_ALL_CPUS;
I find "reuse_value" name extremely misleading, I stumble upon this
every time (because "value" is ambiguous, is it the source value or
map value we are updating?). Please drop it, there is no need for it,
just do `map_flags & BPF_F_CPU` check in that for_each_possible_cpu
loop below
> rcu_read_lock();
> pptr = array->pptrs[index & array->index_mask];
> - for_each_possible_cpu(cpu) {
> - copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
> + if ((map_flags & BPF_F_CPU) && cpu != BPF_ALL_CPUS) {
> + copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value);
> bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
> - off += size;
> + } else {
> + for_each_possible_cpu(cpu) {
> + if (!reuse_value) {
> + copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
> + off += size;
> + } else {
> + copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value);
> + }
simpler and less duplication:
copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
/*
* same user-provided value is used if BPF_F_CPU is specified,
* otherwise value is an array of per-cpu values
*/
if (!(map_flags & BPF_F_CPU))
off += size;
> + bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
> + }
> }
> rcu_read_unlock();
> return 0;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 7db7182a3057..a3ce0cdecb3c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -129,8 +129,12 @@ bool bpf_map_write_active(const struct bpf_map *map)
> return atomic64_read(&map->writecnt) != 0;
> }
>
> -static u32 bpf_map_value_size(const struct bpf_map *map)
> +static u32 bpf_map_value_size(const struct bpf_map *map, u64 flags)
> {
> + if ((flags & BPF_F_CPU) &&
> + map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
formatting is off, keep single line
> + return round_up(map->value_size, 8);
> +
> if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
> map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
> map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
> @@ -312,7 +316,7 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
> map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
> err = bpf_percpu_hash_copy(map, key, value);
> } else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
> - err = bpf_percpu_array_copy(map, key, value);
> + err = bpf_percpu_array_copy(map, key, value, flags);
> } else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
> err = bpf_percpu_cgroup_storage_copy(map, key, value);
> } else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
> @@ -1662,7 +1666,7 @@ static int map_lookup_elem(union bpf_attr *attr)
> if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM))
> return -EINVAL;
>
> - if (attr->flags & ~BPF_F_LOCK)
> + if ((attr->flags & (u32)~0) & ~(BPF_F_LOCK | BPF_F_CPU))
nit: this whole `attr->flags & (u32)~0` looks like an over-engineered
`(u32)attr->flags`...
> return -EINVAL;
we should probably also have a condition checking that upper 32 bits
are zero if BPF_F_CPU is not set?
>
> CLASS(fd, f)(attr->map_fd);
> @@ -1680,7 +1684,7 @@ static int map_lookup_elem(union bpf_attr *attr)
> if (IS_ERR(key))
> return PTR_ERR(key);
>
> - value_size = bpf_map_value_size(map);
> + value_size = bpf_map_value_size(map, attr->flags);
>
> err = -ENOMEM;
> value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN);
> @@ -1749,7 +1753,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
> goto err_put;
> }
>
> - value_size = bpf_map_value_size(map);
> + value_size = bpf_map_value_size(map, attr->flags);
> value = kvmemdup_bpfptr(uvalue, value_size);
> if (IS_ERR(value)) {
> err = PTR_ERR(value);
> @@ -1941,19 +1945,25 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
> {
> void __user *values = u64_to_user_ptr(attr->batch.values);
> void __user *keys = u64_to_user_ptr(attr->batch.keys);
> - u32 value_size, cp, max_count;
> + u32 value_size, cp, max_count, cpu = attr->batch.cpu;
> + u64 elem_flags = attr->batch.elem_flags;
> void *key, *value;
> int err = 0;
>
> - if (attr->batch.elem_flags & ~BPF_F_LOCK)
> + if (elem_flags & ~(BPF_F_LOCK | BPF_F_CPU))
> return -EINVAL;
>
> - if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> + if ((elem_flags & BPF_F_LOCK) &&
> !btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
> return -EINVAL;
> }
>
> - value_size = bpf_map_value_size(map);
> + if ((elem_flags & BPF_F_CPU) &&
> + map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
> + return -EINVAL;
> +
> + value_size = bpf_map_value_size(map, elem_flags);
> + elem_flags = (((u64)cpu) << 32) | elem_flags;
nit: elem_flags |= (u64)cpu << 32;
same effect, but a bit more explicitly stating "we are just adding
stuff to elem_flags"
>
> max_count = attr->batch.count;
> if (!max_count)
> @@ -1979,8 +1989,7 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
> copy_from_user(value, values + cp * value_size, value_size))
> break;
>
> - err = bpf_map_update_value(map, map_file, key, value,
> - attr->batch.elem_flags);
> + err = bpf_map_update_value(map, map_file, key, value, elem_flags);
>
> if (err)
> break;
> @@ -2004,18 +2013,24 @@ int generic_map_lookup_batch(struct bpf_map *map,
> void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
> void __user *values = u64_to_user_ptr(attr->batch.values);
> void __user *keys = u64_to_user_ptr(attr->batch.keys);
> + u32 value_size, cp, max_count, cpu = attr->batch.cpu;
> void *buf, *buf_prevkey, *prev_key, *key, *value;
> - u32 value_size, cp, max_count;
> + u64 elem_flags = attr->batch.elem_flags;
> int err;
>
> - if (attr->batch.elem_flags & ~BPF_F_LOCK)
> + if (elem_flags & ~(BPF_F_LOCK | BPF_F_CPU))
> return -EINVAL;
>
> - if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> + if ((elem_flags & BPF_F_LOCK) &&
> !btf_record_has_field(map->record, BPF_SPIN_LOCK))
> return -EINVAL;
>
> - value_size = bpf_map_value_size(map);
> + if ((elem_flags & BPF_F_CPU) &&
> + map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
> + return -EINVAL;
> +
> + value_size = bpf_map_value_size(map, elem_flags);
> + elem_flags = (((u64)cpu) << 32) | elem_flags;
>
ditto
> max_count = attr->batch.count;
> if (!max_count)
[...]
* Re: [RFC PATCH bpf-next v2 2/3] bpf, libbpf: Support BPF_F_CPU for percpu_array map
2025-07-07 16:04 ` [RFC PATCH bpf-next v2 2/3] bpf, libbpf: Support BPF_F_CPU " Leon Hwang
@ 2025-07-11 18:10 ` Andrii Nakryiko
2025-07-14 12:48 ` Leon Hwang
0 siblings, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2025-07-11 18:10 UTC (permalink / raw)
To: Leon Hwang; +Cc: bpf, ast, andrii, daniel
On Mon, Jul 7, 2025 at 9:04 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> This patch adds libbpf support for the BPF_F_CPU flag in percpu_array maps,
> introducing the following APIs:
>
> 1. bpf_map_update_elem_opts(): update with struct bpf_map_update_elem_opts
> 2. bpf_map_lookup_elem_opts(): lookup with struct bpf_map_lookup_elem_opts
> 3. bpf_map__update_elem_opts(): high-level wrapper with input validation
> 4. bpf_map__lookup_elem_opts(): high-level wrapper with input validation
>
> Behavior:
>
> * If opts->cpu == (u32)~0, the update is applied to all CPUs.
> * Otherwise, it applies only to the specified CPU.
> * Lookup APIs retrieve values from the target CPU when BPF_F_CPU is used.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> tools/lib/bpf/bpf.c | 23 ++++++++++++++
> tools/lib/bpf/bpf.h | 36 +++++++++++++++++++++-
> tools/lib/bpf/libbpf.c | 56 +++++++++++++++++++++++++++++++----
> tools/lib/bpf/libbpf.h | 53 ++++++++++++++++++++++++++++-----
> tools/lib/bpf/libbpf.map | 4 +++
> tools/lib/bpf/libbpf_common.h | 14 +++++++++
> 6 files changed, 172 insertions(+), 14 deletions(-)
>
LGTM, just see the note about libpbf.map file, thanks.
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 1bbf77326420..d21288991d1c 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -436,6 +436,10 @@ LIBBPF_1.6.0 {
> bpf_linker__add_buf;
> bpf_linker__add_fd;
> bpf_linker__new_fd;
> + bpf_map__lookup_elem_opts;
> + bpf_map__update_elem_opts;
> + bpf_map_lookup_elem_opts;
> + bpf_map_update_elem_opts;
I'm planning to cut libbpf 1.6 release early next week, so for the
next revision please add it into 1.7 section
> bpf_object__prepare;
> bpf_prog_stream_read;
> bpf_program__attach_cgroup_opts;
> diff --git a/tools/lib/bpf/libbpf_common.h b/tools/lib/bpf/libbpf_common.h
> index 8fe248e14eb6..84ca89ace1be 100644
> --- a/tools/lib/bpf/libbpf_common.h
> +++ b/tools/lib/bpf/libbpf_common.h
> @@ -89,4 +89,18 @@
> memcpy(&NAME, &___##NAME, sizeof(NAME)); \
> } while (0)
>
> +struct bpf_map_update_elem_opts {
> + size_t sz; /* size of this struct for forward/backward compatibility */
> + __u64 flags;
> + __u32 cpu;
> + size_t:0;
> +};
> +
> +struct bpf_map_lookup_elem_opts {
> + size_t sz; /* size of this struct for forward/backward compatibility */
> + __u64 flags;
> + __u32 cpu;
> + size_t:0;
> +};
> +
> #endif /* __LIBBPF_LIBBPF_COMMON_H */
> --
> 2.50.0
>
* Re: [RFC PATCH bpf-next v2 3/3] selftests/bpf: Add case to test BPF_F_CPU
2025-07-07 16:04 ` [RFC PATCH bpf-next v2 3/3] selftests/bpf: Add case to test BPF_F_CPU Leon Hwang
@ 2025-07-11 18:11 ` Andrii Nakryiko
2025-07-14 12:49 ` Leon Hwang
0 siblings, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2025-07-11 18:11 UTC (permalink / raw)
To: Leon Hwang; +Cc: bpf, ast, andrii, daniel
On Mon, Jul 7, 2025 at 9:04 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> This patch adds test coverage for the new BPF_F_CPU flag support in
> percpu_array maps. The following APIs are exercised:
>
> * bpf_map_update_batch()
> * bpf_map_lookup_batch()
> * bpf_map_update_elem_opts()
> * bpf_map__update_elem_opts()
> * bpf_map_lookup_elem_opts()
> * bpf_map__lookup_elem_opts()
>
> cd tools/testing/selftests/bpf/
> ./test_progs -t percpu_alloc/cpu_flag_tests
> 253/13 percpu_alloc/cpu_flag_tests:OK
> 253 percpu_alloc:OK
> Summary: 1/13 PASSED, 0 SKIPPED, 0 FAILED
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> .../selftests/bpf/prog_tests/percpu_alloc.c | 170 ++++++++++++++++++
> .../selftests/bpf/progs/percpu_array_flag.c | 24 +++
> 2 files changed, 194 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/progs/percpu_array_flag.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
> index 343da65864d6..6f0d0e6dc76a 100644
> --- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
> +++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
> @@ -3,6 +3,7 @@
> #include "percpu_alloc_array.skel.h"
> #include "percpu_alloc_cgrp_local_storage.skel.h"
> #include "percpu_alloc_fail.skel.h"
> +#include "percpu_array_flag.skel.h"
>
> static void test_array(void)
> {
> @@ -115,6 +116,173 @@ static void test_failure(void) {
> RUN_TESTS(percpu_alloc_fail);
> }
>
> +static void test_cpu_flag(void)
> +{
> + int map_fd, *keys = NULL, value_size, cpu, i, j, nr_cpus, err;
> + size_t key_sz = sizeof(int), value_sz = sizeof(u64);
> + struct percpu_array_flag *skel;
> + u64 batch = 0, *values = NULL;
> + const u64 value = 0xDEADC0DE;
> + u32 count, max_entries;
> + struct bpf_map *map;
> + LIBBPF_OPTS(bpf_map_lookup_elem_opts, lookup_opts,
> + .flags = BPF_F_CPU,
> + .cpu = 0,
> + );
> + LIBBPF_OPTS(bpf_map_update_elem_opts, update_opts,
> + .flags = BPF_F_CPU,
> + .cpu = 0,
> + );
> + LIBBPF_OPTS(bpf_map_batch_opts, batch_opts,
> + .elem_flags = BPF_F_CPU,
> + .flags = 0,
> + );
> +
> + nr_cpus = libbpf_num_possible_cpus();
> + if (!ASSERT_GT(nr_cpus, 0, "libbpf_num_possible_cpus"))
> + return;
> +
> + skel = percpu_array_flag__open_and_load();
> + if (!ASSERT_OK_PTR(skel, "percpu_array_flag__open_and_load"))
> + return;
> +
> + map = skel->maps.percpu;
> + map_fd = bpf_map__fd(map);
> + max_entries = bpf_map__max_entries(map);
> +
> + value_size = value_sz * nr_cpus;
> + values = calloc(max_entries, value_size);
> + keys = calloc(max_entries, key_sz);
> + if (!ASSERT_FALSE(!keys || !values, "calloc keys and values"))
ASSERT_xxx are meant to be meaningful in the case that some condition
fails, so using generic ASSERT_FALSE with some complicated condition
is defeating that purpose. Use two separate ASSERT_OK_PTR checks
instead.
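A minimal sketch of that shape, assuming the usual test_progs
ASSERT_OK_PTR() macro:

	values = calloc(max_entries, value_size);
	if (!ASSERT_OK_PTR(values, "calloc values"))
		goto out;
	keys = calloc(max_entries, key_sz);
	if (!ASSERT_OK_PTR(keys, "calloc keys"))
		goto out;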
[...]
* Re: [RFC PATCH bpf-next v2 1/3] bpf: Introduce BPF_F_CPU flag for percpu_array map
2025-07-11 18:10 ` Andrii Nakryiko
@ 2025-07-14 12:47 ` Leon Hwang
0 siblings, 0 replies; 10+ messages in thread
From: Leon Hwang @ 2025-07-14 12:47 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: bpf, ast, andrii, daniel
On 2025/7/12 02:10, Andrii Nakryiko wrote:
> On Mon, Jul 7, 2025 at 9:04 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> This patch introduces support for the BPF_F_CPU flag in percpu_array maps
>> to allow updating or looking up values for specified CPUs or for all CPUs
>> with a single value.
>>
>
> For next revision, please drop RFC tag, so this is tested and reviewed
> as a proper patch set.
>
Sure. I'll drop it.
>> This enhancement enables:
>>
>> * Efficient update of all CPUs using a single value when cpu == (u32)~0.
>> * Targeted update or lookup for a specified CPU otherwise.
>>
>> The flag is passed via:
>>
>> * map_flags in bpf_percpu_array_update() along with embedded cpu field.
>> * elem_flags in generic_map_update_batch() along with separated cpu field.
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>> include/linux/bpf.h | 3 +-
>> include/uapi/linux/bpf.h | 7 +++++
>> kernel/bpf/arraymap.c | 56 ++++++++++++++++++++++++++--------
>> kernel/bpf/syscall.c | 52 +++++++++++++++++++------------
>> tools/include/uapi/linux/bpf.h | 7 +++++
>> 5 files changed, 92 insertions(+), 33 deletions(-)
>>
>
> [...]
>
>>
>> + cpu = (u32)(flags >> 32);
>> + flags &= (u32)~0;
>> + if (unlikely(flags > BPF_F_CPU))
>> + return -EINVAL;
>> + if (unlikely((flags & BPF_F_CPU) && cpu >= num_possible_cpus()))
>> + return -E2BIG;
>> +
>> /* per_cpu areas are zero-filled and bpf programs can only
>> * access 'value_size' of them, so copying rounded areas
>> * will not leak any kernel data
>> @@ -313,10 +320,15 @@ int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
>> size = array->elem_size;
>> rcu_read_lock();
>> pptr = array->pptrs[index & array->index_mask];
>> - for_each_possible_cpu(cpu) {
>> - copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
>> - check_and_init_map_value(map, value + off);
>> - off += size;
>> + if (flags & BPF_F_CPU) {
>> + copy_map_value_long(map, value, per_cpu_ptr(pptr, cpu));
>> + check_and_init_map_value(map, value);
>> + } else {
>> + for_each_possible_cpu(cpu) {
>> + copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
>> + check_and_init_map_value(map, value + off);
>> + off += size;
>> + }
>> }
>> rcu_read_unlock();
>> return 0;
>> @@ -387,13 +399,21 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
>> struct bpf_array *array = container_of(map, struct bpf_array, map);
>> u32 index = *(u32 *)key;
>> void __percpu *pptr;
>> - int cpu, off = 0;
>> - u32 size;
>> + bool reuse_value;
>> + u32 size, cpu;
>> + int off = 0;
>>
>> - if (unlikely(map_flags > BPF_EXIST))
>> + cpu = (u32)(map_flags >> 32);
>> + map_flags = map_flags & (u32)~0;
>
> be consistent, use &= approach as above
>
Ack.
>> + if (unlikely(map_flags > BPF_F_CPU))
>> /* unknown flags */
>> return -EINVAL;
>>
>> + if (unlikely((map_flags & BPF_F_CPU) && cpu != BPF_ALL_CPUS &&
>> + cpu >= num_possible_cpus()))
>> + /* invalid cpu */
>> + return -E2BIG;
>> +
>> if (unlikely(index >= array->map.max_entries))
>> /* all elements were pre-allocated, cannot insert a new one */
>> return -E2BIG;
>> @@ -409,12 +429,22 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
>> * so no kernel data leaks possible
>> */
>> size = array->elem_size;
>> + reuse_value = (map_flags & BPF_F_CPU) && cpu == BPF_ALL_CPUS;
>
> I find "reuse_value" name extremely misleading, I stumble upon this
> every time (because "value" is ambiguous, is it the source value or
> map value we are updating?). Please drop it, there is no need for it,
> just do `map_flags & BPF_F_CPU` check in that for_each_possible_cpu
> loop below
>
Ack.
>> rcu_read_lock();
>> pptr = array->pptrs[index & array->index_mask];
>> - for_each_possible_cpu(cpu) {
>> - copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
>> + if ((map_flags & BPF_F_CPU) && cpu != BPF_ALL_CPUS) {
>> + copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value);
>> bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
>> - off += size;
>> + } else {
>> + for_each_possible_cpu(cpu) {
>> + if (!reuse_value) {
>> + copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
>> + off += size;
>> + } else {
>> + copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value);
>> + }
>
> simpler and less duplication:
>
> copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
> /*
> * same user-provided value is used if BPF_F_CPU is specified,
> * otherwise value is an array of per-cpu values
> */
> if (!(map_flags & BPF_F_CPU))
> off += size;
>
LGTM.
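Folding both suggestions in (drop reuse_value, advance off only when
BPF_F_CPU is not set), the copy path for the next revision could look
roughly like the sketch below; names and helpers follow the hunk above,
so treat it as an outline rather than the final diff:

	size = array->elem_size;
	rcu_read_lock();
	pptr = array->pptrs[index & array->index_mask];
	if ((map_flags & BPF_F_CPU) && cpu != BPF_ALL_CPUS) {
		/* single-CPU update: copy the one user-provided value */
		copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value);
		bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
	} else {
		for_each_possible_cpu(cpu) {
			copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
			/*
			 * The same user-provided value is reused for every CPU
			 * when BPF_F_CPU is set; otherwise value is an array of
			 * per-cpu values and off walks through it.
			 */
			if (!(map_flags & BPF_F_CPU))
				off += size;
			bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
		}
	}
	rcu_read_unlock();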
>> + bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
>> + }
>> }
>> rcu_read_unlock();
>> return 0;
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 7db7182a3057..a3ce0cdecb3c 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -129,8 +129,12 @@ bool bpf_map_write_active(const struct bpf_map *map)
>> return atomic64_read(&map->writecnt) != 0;
>> }
>>
>> -static u32 bpf_map_value_size(const struct bpf_map *map)
>> +static u32 bpf_map_value_size(const struct bpf_map *map, u64 flags)
>> {
>> + if ((flags & BPF_F_CPU) &&
>> + map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
>
> formatting is off, keep single line
>
Ack.
>> + return round_up(map->value_size, 8);
>> +
>> if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
>> map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
>> map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
>> @@ -312,7 +316,7 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
>> map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
>> err = bpf_percpu_hash_copy(map, key, value);
>> } else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
>> - err = bpf_percpu_array_copy(map, key, value);
>> + err = bpf_percpu_array_copy(map, key, value, flags);
>> } else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
>> err = bpf_percpu_cgroup_storage_copy(map, key, value);
>> } else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
>> @@ -1662,7 +1666,7 @@ static int map_lookup_elem(union bpf_attr *attr)
>> if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM))
>> return -EINVAL;
>>
>> - if (attr->flags & ~BPF_F_LOCK)
>> + if ((attr->flags & (u32)~0) & ~(BPF_F_LOCK | BPF_F_CPU))
>
> nit: this whole `attr->flags & (u32)~0` looks like an over-engineered
> `(u32)attr->flags`...
>
>> return -EINVAL;
>
> we should probably also have a condition checking that upper 32 bits
> are zero if BPF_F_CPU is not set?
>
Correct. The upper 32 bits must be zero when BPF_F_CPU is not set. An
additional check along the lines of
`(attr->flags & ~BPF_F_LOCK) && !(attr->flags & BPF_F_CPU)`
would catch that.
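Spelled out, a sketch of the combined validation in map_lookup_elem()
could look like this (equivalent in effect, assuming the cpu index keeps
travelling in the upper 32 bits of attr->flags as in this revision):

	/* only BPF_F_LOCK and BPF_F_CPU are valid in the lower 32 bits */
	if ((u32)attr->flags & ~(BPF_F_LOCK | BPF_F_CPU))
		return -EINVAL;
	/* the upper 32 bits carry the cpu index and require BPF_F_CPU */
	if (!(attr->flags & BPF_F_CPU) && (attr->flags >> 32))
		return -EINVAL;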
>>
>> CLASS(fd, f)(attr->map_fd);
>> @@ -1680,7 +1684,7 @@ static int map_lookup_elem(union bpf_attr *attr)
>> if (IS_ERR(key))
>> return PTR_ERR(key);
>>
>> - value_size = bpf_map_value_size(map);
>> + value_size = bpf_map_value_size(map, attr->flags);
>>
>> err = -ENOMEM;
>> value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN);
>> @@ -1749,7 +1753,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
>> goto err_put;
>> }
>>
>> - value_size = bpf_map_value_size(map);
>> + value_size = bpf_map_value_size(map, attr->flags);
>> value = kvmemdup_bpfptr(uvalue, value_size);
>> if (IS_ERR(value)) {
>> err = PTR_ERR(value);
>> @@ -1941,19 +1945,25 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
>> {
>> void __user *values = u64_to_user_ptr(attr->batch.values);
>> void __user *keys = u64_to_user_ptr(attr->batch.keys);
>> - u32 value_size, cp, max_count;
>> + u32 value_size, cp, max_count, cpu = attr->batch.cpu;
>> + u64 elem_flags = attr->batch.elem_flags;
>> void *key, *value;
>> int err = 0;
>>
>> - if (attr->batch.elem_flags & ~BPF_F_LOCK)
>> + if (elem_flags & ~(BPF_F_LOCK | BPF_F_CPU))
>> return -EINVAL;
>>
>> - if ((attr->batch.elem_flags & BPF_F_LOCK) &&
>> + if ((elem_flags & BPF_F_LOCK) &&
>> !btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
>> return -EINVAL;
>> }
>>
>> - value_size = bpf_map_value_size(map);
>> + if ((elem_flags & BPF_F_CPU) &&
>> + map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
>> + return -EINVAL;
>> +
>> + value_size = bpf_map_value_size(map, elem_flags);
>> + elem_flags = (((u64)cpu) << 32) | elem_flags;
>
> nit: elem_flags |= (u64)cpu << 32;
>
> same effect, but a bit more explicitly stating "we are just adding
> stuff to elem_flags"
>
Ack.
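i.e., for the next revision, roughly:

	value_size = bpf_map_value_size(map, elem_flags);
	/* stash the cpu index in the upper 32 bits of the flags word */
	elem_flags |= (u64)cpu << 32;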
>>
>> max_count = attr->batch.count;
>> if (!max_count)
>> @@ -1979,8 +1989,7 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
>> copy_from_user(value, values + cp * value_size, value_size))
>> break;
>>
>> - err = bpf_map_update_value(map, map_file, key, value,
>> - attr->batch.elem_flags);
>> + err = bpf_map_update_value(map, map_file, key, value, elem_flags);
>>
>> if (err)
>> break;
>> @@ -2004,18 +2013,24 @@ int generic_map_lookup_batch(struct bpf_map *map,
>> void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
>> void __user *values = u64_to_user_ptr(attr->batch.values);
>> void __user *keys = u64_to_user_ptr(attr->batch.keys);
>> + u32 value_size, cp, max_count, cpu = attr->batch.cpu;
>> void *buf, *buf_prevkey, *prev_key, *key, *value;
>> - u32 value_size, cp, max_count;
>> + u64 elem_flags = attr->batch.elem_flags;
>> int err;
>>
>> - if (attr->batch.elem_flags & ~BPF_F_LOCK)
>> + if (elem_flags & ~(BPF_F_LOCK | BPF_F_CPU))
>> return -EINVAL;
>>
>> - if ((attr->batch.elem_flags & BPF_F_LOCK) &&
>> + if ((elem_flags & BPF_F_LOCK) &&
>> !btf_record_has_field(map->record, BPF_SPIN_LOCK))
>> return -EINVAL;
>>
>> - value_size = bpf_map_value_size(map);
>> + if ((elem_flags & BPF_F_CPU) &&
>> + map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
>> + return -EINVAL;
>> +
>> + value_size = bpf_map_value_size(map, elem_flags);
>> + elem_flags = (((u64)cpu) << 32) | elem_flags;
>>
>
> ditto
>
Ack.
>> max_count = attr->batch.count;
>> if (!max_count)
>
> [...]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH bpf-next v2 2/3] bpf, libbpf: Support BPF_F_CPU for percpu_array map
2025-07-11 18:10 ` Andrii Nakryiko
@ 2025-07-14 12:48 ` Leon Hwang
0 siblings, 0 replies; 10+ messages in thread
From: Leon Hwang @ 2025-07-14 12:48 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: bpf, ast, andrii, daniel
On 2025/7/12 02:10, Andrii Nakryiko wrote:
> On Mon, Jul 7, 2025 at 9:04 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> This patch adds libbpf support for the BPF_F_CPU flag in percpu_array maps,
>> introducing the following APIs:
>>
>> 1. bpf_map_update_elem_opts(): update with struct bpf_map_update_elem_opts
>> 2. bpf_map_lookup_elem_opts(): lookup with struct bpf_map_lookup_elem_opts
>> 3. bpf_map__update_elem_opts(): high-level wrapper with input validation
>> 4. bpf_map__lookup_elem_opts(): high-level wrapper with input validation
>>
>> Behavior:
>>
>> * If opts->cpu == (u32)~0, the update is applied to all CPUs.
>> * Otherwise, it applies only to the specified CPU.
>> * Lookup APIs retrieve values from the target CPU when BPF_F_CPU is used.
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>> tools/lib/bpf/bpf.c | 23 ++++++++++++++
>> tools/lib/bpf/bpf.h | 36 +++++++++++++++++++++-
>> tools/lib/bpf/libbpf.c | 56 +++++++++++++++++++++++++++++++----
>> tools/lib/bpf/libbpf.h | 53 ++++++++++++++++++++++++++++-----
>> tools/lib/bpf/libbpf.map | 4 +++
>> tools/lib/bpf/libbpf_common.h | 14 +++++++++
>> 6 files changed, 172 insertions(+), 14 deletions(-)
>>
>
> LGTM, just see the note about the libbpf.map file, thanks.
>
>> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
>> index 1bbf77326420..d21288991d1c 100644
>> --- a/tools/lib/bpf/libbpf.map
>> +++ b/tools/lib/bpf/libbpf.map
>> @@ -436,6 +436,10 @@ LIBBPF_1.6.0 {
>> bpf_linker__add_buf;
>> bpf_linker__add_fd;
>> bpf_linker__new_fd;
>> + bpf_map__lookup_elem_opts;
>> + bpf_map__update_elem_opts;
>> + bpf_map_lookup_elem_opts;
>> + bpf_map_update_elem_opts;
>
> I'm planning to cut libbpf 1.6 release early next week, so for the
> next revision please add it into 1.7 section
>
No problem.
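For reference, a sketch of how the new symbols might sit in a 1.7
section of libbpf.map (the exact version node name and its base will
follow whatever the file contains once 1.6 is cut):

	LIBBPF_1.7.0 {
		global:
			bpf_map__lookup_elem_opts;
			bpf_map__update_elem_opts;
			bpf_map_lookup_elem_opts;
			bpf_map_update_elem_opts;
	} LIBBPF_1.6.0;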
Thanks,
Leon
[...]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH bpf-next v2 3/3] selftests/bpf: Add case to test BPF_F_CPU
2025-07-11 18:11 ` Andrii Nakryiko
@ 2025-07-14 12:49 ` Leon Hwang
0 siblings, 0 replies; 10+ messages in thread
From: Leon Hwang @ 2025-07-14 12:49 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: bpf, ast, andrii, daniel
On 2025/7/12 02:11, Andrii Nakryiko wrote:
> On Mon, Jul 7, 2025 at 9:04 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> This patch adds test coverage for the new BPF_F_CPU flag support in
>> percpu_array maps. The following APIs are exercised:
>>
>> * bpf_map_update_batch()
>> * bpf_map_lookup_batch()
>> * bpf_map_update_elem_opts()
>> * bpf_map__update_elem_opts()
>> * bpf_map_lookup_elem_opts()
>> * bpf_map__lookup_elem_opts()
>>
>> cd tools/testing/selftests/bpf/
>> ./test_progs -t percpu_alloc/cpu_flag_tests
>> 253/13 percpu_alloc/cpu_flag_tests:OK
>> 253 percpu_alloc:OK
>> Summary: 1/13 PASSED, 0 SKIPPED, 0 FAILED
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>> .../selftests/bpf/prog_tests/percpu_alloc.c | 170 ++++++++++++++++++
>> .../selftests/bpf/progs/percpu_array_flag.c | 24 +++
>> 2 files changed, 194 insertions(+)
>> create mode 100644 tools/testing/selftests/bpf/progs/percpu_array_flag.c
>>
>> diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
>> index 343da65864d6..6f0d0e6dc76a 100644
>> --- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
>> @@ -3,6 +3,7 @@
>> #include "percpu_alloc_array.skel.h"
>> #include "percpu_alloc_cgrp_local_storage.skel.h"
>> #include "percpu_alloc_fail.skel.h"
>> +#include "percpu_array_flag.skel.h"
>>
>> static void test_array(void)
>> {
>> @@ -115,6 +116,173 @@ static void test_failure(void) {
>> RUN_TESTS(percpu_alloc_fail);
>> }
>>
>> +static void test_cpu_flag(void)
>> +{
>> + int map_fd, *keys = NULL, value_size, cpu, i, j, nr_cpus, err;
>> + size_t key_sz = sizeof(int), value_sz = sizeof(u64);
>> + struct percpu_array_flag *skel;
>> + u64 batch = 0, *values = NULL;
>> + const u64 value = 0xDEADC0DE;
>> + u32 count, max_entries;
>> + struct bpf_map *map;
>> + LIBBPF_OPTS(bpf_map_lookup_elem_opts, lookup_opts,
>> + .flags = BPF_F_CPU,
>> + .cpu = 0,
>> + );
>> + LIBBPF_OPTS(bpf_map_update_elem_opts, update_opts,
>> + .flags = BPF_F_CPU,
>> + .cpu = 0,
>> + );
>> + LIBBPF_OPTS(bpf_map_batch_opts, batch_opts,
>> + .elem_flags = BPF_F_CPU,
>> + .flags = 0,
>> + );
>> +
>> + nr_cpus = libbpf_num_possible_cpus();
>> + if (!ASSERT_GT(nr_cpus, 0, "libbpf_num_possible_cpus"))
>> + return;
>> +
>> + skel = percpu_array_flag__open_and_load();
>> + if (!ASSERT_OK_PTR(skel, "percpu_array_flag__open_and_load"))
>> + return;
>> +
>> + map = skel->maps.percpu;
>> + map_fd = bpf_map__fd(map);
>> + max_entries = bpf_map__max_entries(map);
>> +
>> + value_size = value_sz * nr_cpus;
>> + values = calloc(max_entries, value_size);
>> + keys = calloc(max_entries, key_sz);
>> + if (!ASSERT_FALSE(!keys || !values, "calloc keys and values"))
>
> ASSERT_xxx checks are meant to produce a meaningful message when some
> condition fails, so using a generic ASSERT_FALSE with a compound
> condition defeats that purpose. Use two separate ASSERT_OK_PTR checks
> instead.
>
Got it.
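i.e., a sketch of the reworked allocation checks (assuming the test's
existing cleanup label, here called "out"):

	values = calloc(max_entries, value_size);
	keys = calloc(max_entries, key_sz);
	if (!ASSERT_OK_PTR(keys, "calloc keys"))
		goto out;
	if (!ASSERT_OK_PTR(values, "calloc values"))
		goto out;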
Thanks,
Leon
^ permalink raw reply [flat|nested] 10+ messages in thread