* [RFC PATCH bpf-next 0/3] bpf: Introduce BPF_F_CPU flag for percpu_array map
@ 2025-06-24 16:53 Leon Hwang
2025-06-24 16:53 ` [RFC PATCH bpf-next 1/3] " Leon Hwang
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Leon Hwang @ 2025-06-24 16:53 UTC (permalink / raw)
To: bpf; +Cc: ast, andrii, daniel, Leon Hwang
This patch set introduces the BPF_F_CPU flag for percpu_array maps, as
discussed in the thread of
"[PATCH bpf-next v3 0/4] bpf: Introduce global percpu data"[1].
The goal is to reduce data caching overhead in light skeletons by allowing
a single value to be reused across all CPUs. This avoids the M:N problem,
where M cached values are needed to update a map on a kernel with N CPUs.
The BPF_F_CPU flag is accompanied by a cpu field, which specifies the
target CPU, or all CPUs, for the operation:
* For lookup operations: the flag and cpu field enable querying a value
on the specified CPU.
* For update operations:
* If cpu == 0xFFFFFFFF, the provided value is copied to all CPUs.
* Otherwise, the value is copied to the specified CPU only.
Currently, this functionality is only supported for percpu_array maps.
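As a quick illustration, here is a minimal userspace sketch using the libbpf
opts API added in patch 2 (illustrative only: it assumes 'map_fd' refers to a
percpu_array map whose value type is __u64, and omits error handling):
  #include <linux/bpf.h>
  #include <bpf/bpf.h>
  LIBBPF_OPTS(bpf_map_update_elem_opts, update_opts,
          .flags = BPF_F_CPU,
          .cpu = BPF_ALL_CPU,     /* copy the single value to every CPU */
  );
  LIBBPF_OPTS(bpf_map_lookup_elem_opts, lookup_opts,
          .flags = BPF_F_CPU,
          .cpu = 1,               /* read back the value of CPU 1 only */
  );
  __u64 value = 0xDEADC0DE;
  int key = 0;
  /* one value, all CPUs */
  bpf_map_update_elem_opts(map_fd, &key, &value, &update_opts);
  /* with BPF_F_CPU the value buffer is value_size bytes,
   * not value_size * nr_cpus
   */
  bpf_map_lookup_elem_opts(map_fd, &key, &value, &lookup_opts);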
Links:
[1] https://lore.kernel.org/bpf/20250526162146.24429-1-leon.hwang@linux.dev/
Leon Hwang (3):
bpf: Introduce BPF_F_CPU flag for percpu_array map
bpf, libbpf: Support BPF_F_CPU for percpu_array map
selftests/bpf: Add case to test BPF_F_CPU
include/linux/bpf.h | 5 +-
include/uapi/linux/bpf.h | 6 +
kernel/bpf/arraymap.c | 46 ++++-
kernel/bpf/syscall.c | 56 ++++--
tools/include/uapi/linux/bpf.h | 6 +
tools/lib/bpf/bpf.c | 37 ++++
tools/lib/bpf/bpf.h | 35 +++-
tools/lib/bpf/libbpf.c | 56 ++++++
tools/lib/bpf/libbpf.h | 45 +++++
tools/lib/bpf/libbpf.map | 4 +
tools/lib/bpf/libbpf_common.h | 12 ++
.../selftests/bpf/prog_tests/percpu_alloc.c | 169 ++++++++++++++++++
.../selftests/bpf/progs/percpu_array_flag.c | 24 +++
13 files changed, 473 insertions(+), 28 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/percpu_array_flag.c
--
2.49.0
* [RFC PATCH bpf-next 1/3] bpf: Introduce BPF_F_CPU flag for percpu_array map
2025-06-24 16:53 [RFC PATCH bpf-next 0/3] bpf: Introduce BPF_F_CPU flag for percpu_array map Leon Hwang
@ 2025-06-24 16:53 ` Leon Hwang
2025-07-01 20:22 ` Andrii Nakryiko
2025-06-24 16:53 ` [RFC PATCH bpf-next 2/3] bpf, libbpf: Support BPF_F_CPU " Leon Hwang
2025-06-24 16:53 ` [RFC PATCH bpf-next 3/3] selftests/bpf: Add case to test BPF_F_CPU Leon Hwang
2 siblings, 1 reply; 13+ messages in thread
From: Leon Hwang @ 2025-06-24 16:53 UTC (permalink / raw)
To: bpf; +Cc: ast, andrii, daniel, Leon Hwang
This patch introduces support for the BPF_F_CPU flag in percpu_array maps
to allow updating or looking up the value for a specific CPU, or updating
all CPUs with a single value.
This enhancement enables:
* Efficient update of all CPUs using a single value when cpu == 0xFFFFFFFF.
* Targeted update or lookup for a specific CPU otherwise.
The flag is passed via:
* map_flags in bpf_percpu_array_update() along with the cpu field.
* elem_flags in generic_map_update_batch() along with the cpu field.
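For reference, a minimal raw-syscall sketch of the new UAPI fields
(illustrative only; 'map_fd' is assumed to be the fd of a percpu_array map
and error handling is omitted):
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>
  union bpf_attr attr = {};
  __u64 value = 0xDEADC0DE;
  __u32 key = 0;
  attr.map_fd = map_fd;
  attr.key    = (__u64)(unsigned long)&key;
  attr.value  = (__u64)(unsigned long)&value;
  attr.flags  = BPF_F_CPU;
  attr.cpu    = BPF_ALL_CPU;   /* or a specific CPU number */
  /* the single value is copied to every CPU of the element at 'key' */
  syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));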
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
include/linux/bpf.h | 5 +--
include/uapi/linux/bpf.h | 6 ++++
kernel/bpf/arraymap.c | 46 ++++++++++++++++++++++++----
kernel/bpf/syscall.c | 56 ++++++++++++++++++++++------------
tools/include/uapi/linux/bpf.h | 6 ++++
5 files changed, 92 insertions(+), 27 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5dd556e89cce..4f4cac6c6b84 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2628,11 +2628,12 @@ int map_set_for_each_callback_args(struct bpf_verifier_env *env,
struct bpf_func_state *callee);
int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
-int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
+int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value,
+ u64 flags, u32 cpu);
int bpf_percpu_hash_update(struct bpf_map *map, void *key, void *value,
u64 flags);
int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
- u64 flags);
+ u64 flags, u32 cpu);
int bpf_stackmap_copy(struct bpf_map *map, void *key, void *value);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 39e7818cca80..a602c45149eb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1359,8 +1359,12 @@ enum {
BPF_NOEXIST = 1, /* create new element if it didn't exist */
BPF_EXIST = 2, /* update existing element */
BPF_F_LOCK = 4, /* spin_lock-ed map_lookup/map_update */
+ BPF_F_CPU = 8, /* map_update for percpu_array */
};
+/* indicate updating value on all CPUs for percpu maps. */
+#define BPF_ALL_CPU 0xFFFFFFFF
+
/* flags for BPF_MAP_CREATE command */
enum {
BPF_F_NO_PREALLOC = (1U << 0),
@@ -1514,6 +1518,7 @@ union bpf_attr {
__aligned_u64 next_key;
};
__u64 flags;
+ __u32 cpu;
};
struct { /* struct used by BPF_MAP_*_BATCH commands */
@@ -1531,6 +1536,7 @@ union bpf_attr {
__u32 map_fd;
__u64 elem_flags;
__u64 flags;
+ __u32 cpu;
} batch;
struct { /* anonymous struct used by BPF_PROG_LOAD command */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index eb28c0f219ee..290462a2b1b9 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -295,22 +295,40 @@ static void *percpu_array_map_lookup_percpu_elem(struct bpf_map *map, void *key,
return per_cpu_ptr(array->pptrs[index & array->index_mask], cpu);
}
-int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
+int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value,
+ u64 flags, u32 cpu)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
u32 index = *(u32 *)key;
void __percpu *pptr;
- int cpu, off = 0;
+ int off = 0;
u32 size;
if (unlikely(index >= array->map.max_entries))
return -ENOENT;
+ if (unlikely(flags > BPF_F_CPU))
+ /* unknown flags */
+ return -EINVAL;
+
/* per_cpu areas are zero-filled and bpf programs can only
* access 'value_size' of them, so copying rounded areas
* will not leak any kernel data
*/
size = array->elem_size;
+
+ if (flags & BPF_F_CPU) {
+ if (cpu >= num_possible_cpus())
+ return -E2BIG;
+
+ rcu_read_lock();
+ pptr = array->pptrs[index & array->index_mask];
+ copy_map_value_long(map, value, per_cpu_ptr(pptr, cpu));
+ check_and_init_map_value(map, value);
+ rcu_read_unlock();
+ return 0;
+ }
+
rcu_read_lock();
pptr = array->pptrs[index & array->index_mask];
for_each_possible_cpu(cpu) {
@@ -382,15 +400,16 @@ static long array_map_update_elem(struct bpf_map *map, void *key, void *value,
}
int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
- u64 map_flags)
+ u64 map_flags, u32 cpu)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
u32 index = *(u32 *)key;
void __percpu *pptr;
- int cpu, off = 0;
+ bool reuse_value;
+ int off = 0;
u32 size;
- if (unlikely(map_flags > BPF_EXIST))
+ if (unlikely(map_flags > BPF_F_CPU))
/* unknown flags */
return -EINVAL;
@@ -409,10 +428,25 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
* so no kernel data leaks possible
*/
size = array->elem_size;
+
+ if ((map_flags & BPF_F_CPU) && cpu != BPF_ALL_CPU) {
+ if (cpu >= num_possible_cpus())
+ return -E2BIG;
+
+ rcu_read_lock();
+ pptr = array->pptrs[index & array->index_mask];
+ copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value);
+ bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
+ rcu_read_unlock();
+ return 0;
+ }
+
+ reuse_value = (map_flags & BPF_F_CPU) && cpu == BPF_ALL_CPU;
rcu_read_lock();
pptr = array->pptrs[index & array->index_mask];
for_each_possible_cpu(cpu) {
- copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
+ copy_map_value_long(map, per_cpu_ptr(pptr, cpu),
+ reuse_value ? value : value + off);
bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
off += size;
}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 56500381c28a..cdff7830baee 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -241,7 +241,7 @@ static int bpf_obj_pin_uptrs(struct btf_record *rec, void *obj)
}
static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
- void *key, void *value, __u64 flags)
+ void *key, void *value, __u64 flags, __u32 cpu)
{
int err;
@@ -265,7 +265,7 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
err = bpf_percpu_hash_update(map, key, value, flags);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
- err = bpf_percpu_array_update(map, key, value, flags);
+ err = bpf_percpu_array_update(map, key, value, flags, cpu);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
err = bpf_percpu_cgroup_storage_update(map, key, value,
flags);
@@ -299,7 +299,7 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
}
static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
- __u64 flags)
+ __u64 flags, __u32 cpu)
{
void *ptr;
int err;
@@ -312,7 +312,7 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
err = bpf_percpu_hash_copy(map, key, value);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
- err = bpf_percpu_array_copy(map, key, value);
+ err = bpf_percpu_array_copy(map, key, value, flags, cpu);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
err = bpf_percpu_cgroup_storage_copy(map, key, value);
} else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
@@ -1648,7 +1648,7 @@ static void *___bpf_copy_key(bpfptr_t ukey, u64 key_size)
}
/* last field in 'union bpf_attr' used by this command */
-#define BPF_MAP_LOOKUP_ELEM_LAST_FIELD flags
+#define BPF_MAP_LOOKUP_ELEM_LAST_FIELD cpu
static int map_lookup_elem(union bpf_attr *attr)
{
@@ -1662,7 +1662,7 @@ static int map_lookup_elem(union bpf_attr *attr)
if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM))
return -EINVAL;
- if (attr->flags & ~BPF_F_LOCK)
+ if (attr->flags & ~(BPF_F_LOCK | BPF_F_CPU))
return -EINVAL;
CLASS(fd, f)(attr->map_fd);
@@ -1691,11 +1691,11 @@ static int map_lookup_elem(union bpf_attr *attr)
if (copy_from_user(value, uvalue, value_size))
err = -EFAULT;
else
- err = bpf_map_copy_value(map, key, value, attr->flags);
+ err = bpf_map_copy_value(map, key, value, attr->flags, attr->cpu);
goto free_value;
}
- err = bpf_map_copy_value(map, key, value, attr->flags);
+ err = bpf_map_copy_value(map, key, value, attr->flags, attr->cpu);
if (err)
goto free_value;
@@ -1713,7 +1713,7 @@ static int map_lookup_elem(union bpf_attr *attr)
}
-#define BPF_MAP_UPDATE_ELEM_LAST_FIELD flags
+#define BPF_MAP_UPDATE_ELEM_LAST_FIELD cpu
static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
{
@@ -1756,7 +1756,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
goto free_key;
}
- err = bpf_map_update_value(map, fd_file(f), key, value, attr->flags);
+ err = bpf_map_update_value(map, fd_file(f), key, value, attr->flags, attr->cpu);
if (!err)
maybe_wait_bpf_programs(map);
@@ -1941,19 +1941,27 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
{
void __user *values = u64_to_user_ptr(attr->batch.values);
void __user *keys = u64_to_user_ptr(attr->batch.keys);
+ u64 elem_flags = attr->batch.elem_flags;
u32 value_size, cp, max_count;
void *key, *value;
int err = 0;
- if (attr->batch.elem_flags & ~BPF_F_LOCK)
+ if (elem_flags & ~(BPF_F_LOCK | BPF_F_CPU))
return -EINVAL;
- if ((attr->batch.elem_flags & BPF_F_LOCK) &&
+ if ((elem_flags & BPF_F_LOCK) &&
!btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
return -EINVAL;
}
- value_size = bpf_map_value_size(map);
+ if (elem_flags & BPF_F_CPU) {
+ if (map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
+ return -EINVAL;
+
+ value_size = round_up(map->value_size, 8);
+ } else {
+ value_size = bpf_map_value_size(map);
+ }
max_count = attr->batch.count;
if (!max_count)
@@ -1980,7 +1988,8 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
break;
err = bpf_map_update_value(map, map_file, key, value,
- attr->batch.elem_flags);
+ attr->batch.elem_flags,
+ attr->batch.cpu);
if (err)
break;
@@ -2005,17 +2014,25 @@ int generic_map_lookup_batch(struct bpf_map *map,
void __user *values = u64_to_user_ptr(attr->batch.values);
void __user *keys = u64_to_user_ptr(attr->batch.keys);
void *buf, *buf_prevkey, *prev_key, *key, *value;
+ u64 elem_flags = attr->batch.elem_flags;
u32 value_size, cp, max_count;
int err;
- if (attr->batch.elem_flags & ~BPF_F_LOCK)
+ if (elem_flags & ~(BPF_F_LOCK | BPF_F_CPU))
return -EINVAL;
- if ((attr->batch.elem_flags & BPF_F_LOCK) &&
+ if ((elem_flags & BPF_F_LOCK) &&
!btf_record_has_field(map->record, BPF_SPIN_LOCK))
return -EINVAL;
- value_size = bpf_map_value_size(map);
+ if (elem_flags & BPF_F_CPU) {
+ if (map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
+ return -EINVAL;
+
+ value_size = round_up(map->value_size, 8);
+ } else {
+ value_size = bpf_map_value_size(map);
+ }
max_count = attr->batch.count;
if (!max_count)
@@ -2050,7 +2067,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
if (err)
break;
err = bpf_map_copy_value(map, key, value,
- attr->batch.elem_flags);
+ attr->batch.elem_flags,
+ attr->batch.cpu);
if (err == -ENOENT)
goto next_key;
@@ -5438,7 +5456,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
return err;
}
-#define BPF_MAP_BATCH_LAST_FIELD batch.flags
+#define BPF_MAP_BATCH_LAST_FIELD batch.cpu
#define BPF_DO_BATCH(fn, ...) \
do { \
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 39e7818cca80..a602c45149eb 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1359,8 +1359,12 @@ enum {
BPF_NOEXIST = 1, /* create new element if it didn't exist */
BPF_EXIST = 2, /* update existing element */
BPF_F_LOCK = 4, /* spin_lock-ed map_lookup/map_update */
+ BPF_F_CPU = 8, /* map_update for percpu_array */
};
+/* indicate updating value on all CPUs for percpu maps. */
+#define BPF_ALL_CPU 0xFFFFFFFF
+
/* flags for BPF_MAP_CREATE command */
enum {
BPF_F_NO_PREALLOC = (1U << 0),
@@ -1514,6 +1518,7 @@ union bpf_attr {
__aligned_u64 next_key;
};
__u64 flags;
+ __u32 cpu;
};
struct { /* struct used by BPF_MAP_*_BATCH commands */
@@ -1531,6 +1536,7 @@ union bpf_attr {
__u32 map_fd;
__u64 elem_flags;
__u64 flags;
+ __u32 cpu;
} batch;
struct { /* anonymous struct used by BPF_PROG_LOAD command */
--
2.49.0
* [RFC PATCH bpf-next 2/3] bpf, libbpf: Support BPF_F_CPU for percpu_array map
2025-06-24 16:53 [RFC PATCH bpf-next 0/3] bpf: Introduce BPF_F_CPU flag for percpu_array map Leon Hwang
2025-06-24 16:53 ` [RFC PATCH bpf-next 1/3] " Leon Hwang
@ 2025-06-24 16:53 ` Leon Hwang
2025-07-01 20:22 ` Andrii Nakryiko
2025-06-24 16:53 ` [RFC PATCH bpf-next 3/3] selftests/bpf: Add case to test BPF_F_CPU Leon Hwang
2 siblings, 1 reply; 13+ messages in thread
From: Leon Hwang @ 2025-06-24 16:53 UTC (permalink / raw)
To: bpf; +Cc: ast, andrii, daniel, Leon Hwang
This patch adds libbpf support for the BPF_F_CPU flag in percpu_array maps,
introducing the following APIs:
1. bpf_map_update_elem_opts(): update with struct bpf_map_update_elem_opts
2. bpf_map_lookup_elem_opts(): lookup with struct bpf_map_lookup_elem_opts
3. bpf_map__update_elem_opts(): high-level wrapper with input validation
4. bpf_map__lookup_elem_opts(): high-level wrapper with input validation
Behavior:
* If opts->cpu == 0xFFFFFFFF, the update is applied to all CPUs.
* Otherwise, it applies only to the specified CPU.
* Lookup APIs retrieve values from the target CPU when BPF_F_CPU is used.
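For example, a minimal sketch of the high-level wrappers (illustrative only:
'map' is assumed to be a struct bpf_map * for a percpu_array whose value type
is __u64, e.g. a skeleton's skel->maps.percpu, and error handling is omitted):
  LIBBPF_OPTS(bpf_map_update_elem_opts, opts,
          .flags = BPF_F_CPU,
          .cpu = 2,        /* update CPU 2 only */
  );
  __u64 value = 42;
  int key = 0, err;
  /* with BPF_F_CPU, value_sz must equal the map's value_size,
   * not value_size * libbpf_num_possible_cpus()
   */
  err = bpf_map__update_elem_opts(map, &key, sizeof(key),
                                  &value, sizeof(value), &opts);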
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
tools/lib/bpf/bpf.c | 37 +++++++++++++++++++++++
tools/lib/bpf/bpf.h | 35 +++++++++++++++++++++-
tools/lib/bpf/libbpf.c | 56 +++++++++++++++++++++++++++++++++++
tools/lib/bpf/libbpf.h | 45 ++++++++++++++++++++++++++++
tools/lib/bpf/libbpf.map | 4 +++
tools/lib/bpf/libbpf_common.h | 12 ++++++++
6 files changed, 188 insertions(+), 1 deletion(-)
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 6eb421ccf91b..80f7ea041187 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -402,6 +402,24 @@ int bpf_map_update_elem(int fd, const void *key, const void *value,
return libbpf_err_errno(ret);
}
+int bpf_map_update_elem_opts(int fd, const void *key, const void *value,
+ const struct bpf_map_update_elem_opts *opts)
+{
+ const size_t attr_sz = offsetofend(union bpf_attr, cpu);
+ union bpf_attr attr;
+ int ret;
+
+ memset(&attr, 0, attr_sz);
+ attr.map_fd = fd;
+ attr.key = ptr_to_u64(key);
+ attr.value = ptr_to_u64(value);
+ attr.flags = OPTS_GET(opts, flags, 0);
+ attr.cpu = OPTS_GET(opts, cpu, BPF_ALL_CPU);
+
+ ret = sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, attr_sz);
+ return libbpf_err_errno(ret);
+}
+
int bpf_map_lookup_elem(int fd, const void *key, void *value)
{
const size_t attr_sz = offsetofend(union bpf_attr, flags);
@@ -433,6 +451,24 @@ int bpf_map_lookup_elem_flags(int fd, const void *key, void *value, __u64 flags)
return libbpf_err_errno(ret);
}
+int bpf_map_lookup_elem_opts(int fd, const void *key, void *value,
+ const struct bpf_map_lookup_elem_opts *opts)
+{
+ const size_t attr_sz = offsetofend(union bpf_attr, cpu);
+ union bpf_attr attr;
+ int ret;
+
+ memset(&attr, 0, attr_sz);
+ attr.map_fd = fd;
+ attr.key = ptr_to_u64(key);
+ attr.value = ptr_to_u64(value);
+ attr.flags = OPTS_GET(opts, flags, 0);
+ attr.cpu = OPTS_GET(opts, cpu, BPF_ALL_CPU);
+
+ ret = sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, attr_sz);
+ return libbpf_err_errno(ret);
+}
+
int bpf_map_lookup_and_delete_elem(int fd, const void *key, void *value)
{
const size_t attr_sz = offsetofend(union bpf_attr, flags);
@@ -542,6 +578,7 @@ static int bpf_map_batch_common(int cmd, int fd, void *in_batch,
attr.batch.count = *count;
attr.batch.elem_flags = OPTS_GET(opts, elem_flags, 0);
attr.batch.flags = OPTS_GET(opts, flags, 0);
+ attr.batch.cpu = OPTS_GET(opts, cpu, BPF_ALL_CPU);
ret = sys_bpf(cmd, &attr, attr_sz);
*count = attr.batch.count;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 1342564214c8..7c6a0a3693c9 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -163,12 +163,41 @@ LIBBPF_API int bpf_map_delete_elem_flags(int fd, const void *key, __u64 flags);
LIBBPF_API int bpf_map_get_next_key(int fd, const void *key, void *next_key);
LIBBPF_API int bpf_map_freeze(int fd);
+/**
+ * @brief **bpf_map_update_elem_opts** allows for updating percpu map with value
+ * on specified CPU or on all CPUs.
+ *
+ * @param fd BPF map file descriptor
+ * @param key pointer to key
+ * @param value pointer to value
+ * @param opts options for configuring the way to update percpu map
+ * @return 0, on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_map_update_elem_opts(int fd, const void *key, const void *value,
+ const struct bpf_map_update_elem_opts *opts);
+
+/**
+ * @brief **bpf_map_lookup_elem_opts** allows for looking up the value from
+ * percpu map on specified CPU.
+ *
+ * @param fd BPF map file descriptor
+ * @param key pointer to key
+ * @param value pointer to value
+ * @param opts options for configuring the way to lookup percpu map
+ * @return 0, on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_map_lookup_elem_opts(int fd, const void *key, void *value,
+ const struct bpf_map_lookup_elem_opts *opts);
+
struct bpf_map_batch_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u64 elem_flags;
__u64 flags;
+ __u32 cpu;
};
-#define bpf_map_batch_opts__last_field flags
+#define bpf_map_batch_opts__last_field cpu
/**
@@ -286,6 +315,10 @@ LIBBPF_API int bpf_map_lookup_and_delete_batch(int fd, void *in_batch,
* Update spin_lock-ed map elements. This must be
* specified if the map value contains a spinlock.
*
+ * **BPF_F_CPU**
+ * As for percpu map, update value on all CPUs if **opts->cpu** is
+ * 0xFFFFFFFF, or on specified CPU otherwise.
+ *
* @param fd BPF map file descriptor
* @param keys pointer to an array of *count* keys
* @param values pointer to an array of *count* values
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 6445165a24f2..30400bdc20d9 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10636,6 +10636,34 @@ int bpf_map__lookup_elem(const struct bpf_map *map,
return bpf_map_lookup_elem_flags(map->fd, key, value, flags);
}
+int bpf_map__lookup_elem_opts(const struct bpf_map *map, const void *key,
+ size_t key_sz, void *value, size_t value_sz,
+ const struct bpf_map_lookup_elem_opts *opts)
+{
+ int nr_cpus = libbpf_num_possible_cpus();
+ __u32 cpu = OPTS_GET(opts, cpu, nr_cpus);
+ __u64 flags = OPTS_GET(opts, flags, 0);
+ int err;
+
+ if (flags & BPF_F_CPU) {
+ if (map->def.type != BPF_MAP_TYPE_PERCPU_ARRAY)
+ return -EINVAL;
+ if (cpu >= nr_cpus)
+ return -E2BIG;
+ if (map->def.value_size != value_sz) {
+ pr_warn("map '%s': unexpected value size %zu provided, expected %u\n",
+ map->name, value_sz, map->def.value_size);
+ return -EINVAL;
+ }
+ } else {
+ err = validate_map_op(map, key_sz, value_sz, true);
+ if (err)
+ return libbpf_err(err);
+ }
+
+ return bpf_map_lookup_elem_opts(map->fd, key, value, opts);
+}
+
int bpf_map__update_elem(const struct bpf_map *map,
const void *key, size_t key_sz,
const void *value, size_t value_sz, __u64 flags)
@@ -10649,6 +10677,34 @@ int bpf_map__update_elem(const struct bpf_map *map,
return bpf_map_update_elem(map->fd, key, value, flags);
}
+int bpf_map__update_elem_opts(const struct bpf_map *map, const void *key,
+ size_t key_sz, const void *value, size_t value_sz,
+ const struct bpf_map_update_elem_opts *opts)
+{
+ int nr_cpus = libbpf_num_possible_cpus();
+ __u32 cpu = OPTS_GET(opts, cpu, nr_cpus);
+ __u64 flags = OPTS_GET(opts, flags, 0);
+ int err;
+
+ if (flags & BPF_F_CPU) {
+ if (map->def.type != BPF_MAP_TYPE_PERCPU_ARRAY)
+ return -EINVAL;
+ if (cpu != BPF_ALL_CPU && cpu >= nr_cpus)
+ return -E2BIG;
+ if (map->def.value_size != value_sz) {
+ pr_warn("map '%s': unexpected value size %zu provided, expected %u\n",
+ map->name, value_sz, map->def.value_size);
+ return -EINVAL;
+ }
+ } else {
+ err = validate_map_op(map, key_sz, value_sz, true);
+ if (err)
+ return libbpf_err(err);
+ }
+
+ return bpf_map_update_elem_opts(map->fd, key, value, opts);
+}
+
int bpf_map__delete_elem(const struct bpf_map *map,
const void *key, size_t key_sz, __u64 flags)
{
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index d1cf813a057b..ba0d15028c72 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1185,6 +1185,28 @@ LIBBPF_API int bpf_map__lookup_elem(const struct bpf_map *map,
const void *key, size_t key_sz,
void *value, size_t value_sz, __u64 flags);
+/**
+ * @brief **bpf_map__lookup_elem_opts()** allows to lookup BPF map value
+ * corresponding to provided key with options to lookup percpu map.
+ * @param map BPF map to lookup element in
+ * @param key pointer to memory containing bytes of the key used for lookup
+ * @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
+ * @param value pointer to memory in which looked up value will be stored
+ * @param value_sz size in byte of value data memory; it has to match BPF map
+ * definition's **value_size**. For per-CPU BPF maps value size can be
+ * definition's **value_size** if **BPF_F_CPU** is specified in **opts->flags**,
+ * or the size described in **bpf_map__lookup_elem()**.
+ * @opts extra options passed to kernel for this operation
+ * @return 0, on success; negative error, otherwise
+ *
+ * **bpf_map__lookup_elem_opts()** is high-level equivalent of
+ * **bpf_map_lookup_elem_opts()** API with added check for key and value size.
+ */
+LIBBPF_API int bpf_map__lookup_elem_opts(const struct bpf_map *map,
+ const void *key, size_t key_sz,
+ void *value, size_t value_sz,
+ const struct bpf_map_lookup_elem_opts *opts);
+
/**
* @brief **bpf_map__update_elem()** allows to insert or update value in BPF
* map that corresponds to provided key.
@@ -1209,6 +1231,29 @@ LIBBPF_API int bpf_map__update_elem(const struct bpf_map *map,
const void *key, size_t key_sz,
const void *value, size_t value_sz, __u64 flags);
+/**
+ * @brief **bpf_map__update_elem_opts()** allows to insert or update value in BPF
+ * map that corresponds to provided key with options for percpu maps.
+ * @param map BPF map to insert to or update element in
+ * @param key pointer to memory containing bytes of the key
+ * @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
+ * @param value pointer to memory containing bytes of the value
+ * @param value_sz size in byte of value data memory; it has to match BPF map
+ * definition's **value_size**. For per-CPU BPF maps value size can be
+ * definition's **value_size** if **BPF_F_CPU** is specified in **opts->flags**,
+ * or the size described in **bpf_map__update_elem()**.
+ * @opts extra options passed to kernel for this operation
+ * @flags extra flags passed to kernel for this operation
+ * @return 0, on success; negative error, otherwise
+ *
+ * **bpf_map__update_elem_opts()** is high-level equivalent of
+ * **bpf_map_update_elem_opts()** API with added check for key and value size.
+ */
+LIBBPF_API int bpf_map__update_elem_opts(const struct bpf_map *map,
+ const void *key, size_t key_sz,
+ const void *value, size_t value_sz,
+ const struct bpf_map_update_elem_opts *opts);
+
/**
* @brief **bpf_map__delete_elem()** allows to delete element in BPF map that
* corresponds to provided key.
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index c7fc0bde5648..c39814adeae9 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -436,6 +436,10 @@ LIBBPF_1.6.0 {
bpf_linker__add_buf;
bpf_linker__add_fd;
bpf_linker__new_fd;
+ bpf_map__lookup_elem_opts;
+ bpf_map__update_elem_opts;
+ bpf_map_lookup_elem_opts;
+ bpf_map_update_elem_opts;
bpf_object__prepare;
bpf_program__attach_cgroup_opts;
bpf_program__func_info;
diff --git a/tools/lib/bpf/libbpf_common.h b/tools/lib/bpf/libbpf_common.h
index 8fe248e14eb6..ef29caf91f9c 100644
--- a/tools/lib/bpf/libbpf_common.h
+++ b/tools/lib/bpf/libbpf_common.h
@@ -89,4 +89,16 @@
memcpy(&NAME, &___##NAME, sizeof(NAME)); \
} while (0)
+struct bpf_map_update_elem_opts {
+ size_t sz; /* size of this struct for forward/backward compatibility */
+ __u64 flags;
+ __u32 cpu;
+};
+
+struct bpf_map_lookup_elem_opts {
+ size_t sz; /* size of this struct for forward/backward compatibility */
+ __u64 flags;
+ __u32 cpu;
+};
+
#endif /* __LIBBPF_LIBBPF_COMMON_H */
--
2.49.0
* [RFC PATCH bpf-next 3/3] selftests/bpf: Add case to test BPF_F_CPU
2025-06-24 16:53 [RFC PATCH bpf-next 0/3] bpf: Introduce BPF_F_CPU flag for percpu_array map Leon Hwang
2025-06-24 16:53 ` [RFC PATCH bpf-next 1/3] " Leon Hwang
2025-06-24 16:53 ` [RFC PATCH bpf-next 2/3] bpf, libbpf: Support BPF_F_CPU " Leon Hwang
@ 2025-06-24 16:53 ` Leon Hwang
2025-07-01 20:22 ` Andrii Nakryiko
2 siblings, 1 reply; 13+ messages in thread
From: Leon Hwang @ 2025-06-24 16:53 UTC (permalink / raw)
To: bpf; +Cc: ast, andrii, daniel, Leon Hwang
This patch adds test coverage for the new BPF_F_CPU flag support in
percpu_array maps. The following APIs are exercised:
* bpf_map_update_batch()
* bpf_map_lookup_batch()
* bpf_map_update_elem_opts()
* bpf_map__update_elem_opts()
* bpf_map_lookup_elem_opts()
* bpf_map__lookup_elem_opts()
cd tools/testing/selftests/bpf/
./test_progs -t percpu_alloc/cpu_flag_tests
251/5 percpu_alloc/cpu_flag_tests:OK
251 percpu_alloc:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
.../selftests/bpf/prog_tests/percpu_alloc.c | 169 ++++++++++++++++++
.../selftests/bpf/progs/percpu_array_flag.c | 24 +++
2 files changed, 193 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/percpu_array_flag.c
diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
index 343da65864d6..5727f4601b49 100644
--- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
+++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
@@ -3,6 +3,7 @@
#include "percpu_alloc_array.skel.h"
#include "percpu_alloc_cgrp_local_storage.skel.h"
#include "percpu_alloc_fail.skel.h"
+#include "percpu_array_flag.skel.h"
static void test_array(void)
{
@@ -115,6 +116,172 @@ static void test_failure(void) {
RUN_TESTS(percpu_alloc_fail);
}
+static void test_cpu_flag(void)
+{
+ int map_fd, *keys = NULL, value_size, cpu, i, j, nr_cpus, err;
+ size_t key_sz = sizeof(int), value_sz = sizeof(u64);
+ struct percpu_array_flag *skel;
+ u64 batch = 0, *values = NULL;
+ const u64 value = 0xDEADC0DE;
+ u32 count, max_entries;
+ struct bpf_map *map;
+ DECLARE_LIBBPF_OPTS(bpf_map_lookup_elem_opts, lookup_opts,
+ .flags = BPF_F_CPU,
+ .cpu = 0,
+ );
+ DECLARE_LIBBPF_OPTS(bpf_map_update_elem_opts, update_opts,
+ .flags = BPF_F_CPU,
+ .cpu = 0,
+ );
+ DECLARE_LIBBPF_OPTS(bpf_map_batch_opts, batch_opts,
+ .elem_flags = BPF_F_CPU,
+ .flags = 0,
+ );
+
+ nr_cpus = libbpf_num_possible_cpus();
+ if (!ASSERT_GT(nr_cpus, 0, "libbpf_num_possible_cpus"))
+ return;
+
+ skel = percpu_array_flag__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "percpu_array_flag__open_and_load"))
+ return;
+
+ map = skel->maps.percpu;
+ map_fd = bpf_map__fd(map);
+ max_entries = bpf_map__max_entries(map);
+
+ value_size = value_sz * nr_cpus;
+ values = calloc(max_entries, value_size);
+ keys = calloc(max_entries, key_sz);
+ if (!ASSERT_FALSE(!keys || !values, "calloc keys and values"))
+ goto out;
+
+ batch_opts.cpu = nr_cpus;
+ err = bpf_map_update_batch(map_fd, keys, values, &max_entries, &batch_opts);
+ if (!ASSERT_EQ(err, -E2BIG, "bpf_map_update_batch -E2BIG"))
+ goto out;
+
+ for (i = 0; i < max_entries; i++)
+ keys[i] = i;
+
+ for (cpu = 0; cpu < nr_cpus; cpu++) {
+ memset(values, 0, max_entries * value_size);
+
+ /* clear values on all CPUs */
+ batch_opts.cpu = BPF_ALL_CPU;
+ batch_opts.elem_flags = BPF_F_CPU;
+ err = bpf_map_update_batch(map_fd, keys, values, &max_entries, &batch_opts);
+ if (!ASSERT_OK(err, "bpf_map_update_batch all cpus"))
+ goto out;
+
+ /* update values on current CPU */
+ for (i = 0; i < max_entries; i++)
+ values[i] = value;
+
+ batch_opts.cpu = cpu;
+ err = bpf_map_update_batch(map_fd, keys, values, &max_entries, &batch_opts);
+ if (!ASSERT_OK(err, "bpf_map_update_batch current cpu"))
+ goto out;
+
+ /* lookup values on current CPU */
+ batch_opts.cpu = cpu;
+ batch_opts.elem_flags = BPF_F_CPU;
+ memset(values, 0, max_entries * value_sz);
+ err = bpf_map_lookup_batch(map_fd, NULL, &batch, keys, values, &count, &batch_opts);
+ if (!ASSERT_TRUE(!err || err == -ENOENT, "bpf_map_lookup_batch current cpu"))
+ goto out;
+
+ for (i = 0; i < max_entries; i++)
+ if (!ASSERT_EQ(values[i], value, "value on current cpu"))
+ goto out;
+
+ /* lookup values on all CPUs */
+ batch_opts.cpu = 0;
+ batch_opts.elem_flags = 0;
+ memset(values, 0, max_entries * value_size);
+ err = bpf_map_lookup_batch(map_fd, NULL, &batch, keys, values, &count, &batch_opts);
+ if (!ASSERT_TRUE(!err || err == -ENOENT, "bpf_map_lookup_batch all cpus"))
+ goto out;
+
+ for (i = 0; i < max_entries; i++) {
+ for (j = 0; j < nr_cpus; j++) {
+ if (!ASSERT_EQ(values[i*nr_cpus + j], j != cpu ? 0 : value,
+ "value on cpu"))
+ goto out;
+ }
+ }
+ }
+
+ update_opts.cpu = nr_cpus;
+ err = bpf_map_update_elem_opts(map_fd, keys, values, &update_opts);
+ if (!ASSERT_EQ(err, -E2BIG, "bpf_map_update_elem_opts -E2BIG"))
+ goto out;
+
+ err = bpf_map__update_elem_opts(map, keys, key_sz, values, value_sz,
+ &update_opts);
+ if (!ASSERT_EQ(err, -E2BIG, "bpf_map__update_elem_opts -E2BIG"))
+ goto out;
+
+ lookup_opts.cpu = nr_cpus;
+ err = bpf_map_lookup_elem_opts(map_fd, keys, values, &lookup_opts);
+ if (!ASSERT_EQ(err, -E2BIG, "bpf_map_lookup_elem_opts -E2BIG"))
+ goto out;
+
+ err = bpf_map__lookup_elem_opts(map, keys, key_sz, values, value_sz,
+ &lookup_opts);
+ if (!ASSERT_EQ(err, -E2BIG, "bpf_map__lookup_elem_opts -E2BIG"))
+ goto out;
+
+ /* clear value on all cpus */
+ batch_opts.cpu = BPF_ALL_CPU;
+ batch_opts.elem_flags = BPF_F_CPU;
+ memset(values, 0, max_entries * value_sz);
+ err = bpf_map_update_batch(map_fd, keys, values, &max_entries, &batch_opts);
+ if (!ASSERT_OK(err, "bpf_map_update_batch all cpus"))
+ goto out;
+
+ for (cpu = 0; cpu < nr_cpus; cpu++) {
+ /* update value on current cpu */
+ values[0] = value;
+ update_opts.cpu = cpu;
+ for (i = 0; i < max_entries; i++) {
+ err = bpf_map__update_elem_opts(map, keys + i,
+ key_sz, values,
+ value_sz, &update_opts);
+ if (!ASSERT_OK(err, "bpf_map__update_elem_opts current cpu"))
+ goto out;
+
+ for (j = 0; j < nr_cpus; j++) {
+ /* lookup then check value on CPUs */
+ lookup_opts.cpu = j;
+ err = bpf_map__lookup_elem_opts(map, keys + i,
+ key_sz, values,
+ value_sz,
+ &lookup_opts);
+ if (!ASSERT_OK(err, "bpf_map__lookup_elem_opts current cpu"))
+ goto out;
+ if (!ASSERT_EQ(values[0], j != cpu ? 0 : value,
+ "bpf_map__lookup_elem_opts value on current cpu"))
+ goto out;
+ }
+ }
+
+ /* clear value on current cpu */
+ values[0] = 0;
+ err = bpf_map__update_elem_opts(map, keys, key_sz, values,
+ value_sz, &update_opts);
+ if (!ASSERT_OK(err, "bpf_map__update_elem_opts current cpu"))
+ goto out;
+ }
+
+out:
+ if (keys)
+ free(keys);
+ if (values)
+ free(values);
+ percpu_array_flag__destroy(skel);
+}
+
void test_percpu_alloc(void)
{
if (test__start_subtest("array"))
@@ -125,4 +292,6 @@ void test_percpu_alloc(void)
test_cgrp_local_storage();
if (test__start_subtest("failure_tests"))
test_failure();
+ if (test__start_subtest("cpu_flag_tests"))
+ test_cpu_flag();
}
diff --git a/tools/testing/selftests/bpf/progs/percpu_array_flag.c b/tools/testing/selftests/bpf/progs/percpu_array_flag.c
new file mode 100644
index 000000000000..4d92e121958e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/percpu_array_flag.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+struct {
+ __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+ __uint(max_entries, 2);
+ __type(key, int);
+ __type(value, u64);
+} percpu SEC(".maps");
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(test_percpu_array, int x)
+{
+ u64 value = 0xDEADC0DE;
+ int key = 0;
+
+ bpf_map_update_elem(&percpu, &key, &value, BPF_ANY);
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+
--
2.49.0
* Re: [RFC PATCH bpf-next 1/3] bpf: Introduce BPF_F_CPU flag for percpu_array map
2025-06-24 16:53 ` [RFC PATCH bpf-next 1/3] " Leon Hwang
@ 2025-07-01 20:22 ` Andrii Nakryiko
2025-07-02 17:01 ` Leon Hwang
0 siblings, 1 reply; 13+ messages in thread
From: Andrii Nakryiko @ 2025-07-01 20:22 UTC (permalink / raw)
To: Leon Hwang; +Cc: bpf, ast, andrii, daniel
On Tue, Jun 24, 2025 at 9:54 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> This patch introduces support for the BPF_F_CPU flag in percpu_array maps
> to allow updating or looking up values for specific CPUs or for all CPUs
> with a single value.
>
> This enhancement enables:
>
> * Efficient update of all CPUs using a single value when cpu == 0xFFFFFFFF.
> * Targeted update or lookup for a specific CPU otherwise.
>
> The flag is passed via:
>
> * map_flags in bpf_percpu_array_update() along with the cpu field.
> * elem_flags in generic_map_update_batch() along with the cpu field.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> include/linux/bpf.h | 5 +--
> include/uapi/linux/bpf.h | 6 ++++
> kernel/bpf/arraymap.c | 46 ++++++++++++++++++++++++----
> kernel/bpf/syscall.c | 56 ++++++++++++++++++++++------------
> tools/include/uapi/linux/bpf.h | 6 ++++
> 5 files changed, 92 insertions(+), 27 deletions(-)
>
[...]
> #define BPF_ALL_CPU 0xFFFFFFFF
at the very least we have to make it an enum, IMO. but I'm in general
unsure if we need it at all... and in any case, should it be named
"BPF_ALL_CPUS" (plural)?
> -int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
> +int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value,
> + u64 flags, u32 cpu)
> {
> struct bpf_array *array = container_of(map, struct bpf_array, map);
> u32 index = *(u32 *)key;
> void __percpu *pptr;
> - int cpu, off = 0;
> + int off = 0;
> u32 size;
>
> if (unlikely(index >= array->map.max_entries))
> return -ENOENT;
>
> + if (unlikely(flags > BPF_F_CPU))
> + /* unknown flags */
> + return -EINVAL;
> +
> /* per_cpu areas are zero-filled and bpf programs can only
> * access 'value_size' of them, so copying rounded areas
> * will not leak any kernel data
> */
> size = array->elem_size;
> +
> + if (flags & BPF_F_CPU) {
> + if (cpu >= num_possible_cpus())
> + return -E2BIG;
> +
> + rcu_read_lock();
> + pptr = array->pptrs[index & array->index_mask];
> + copy_map_value_long(map, value, per_cpu_ptr(pptr, cpu));
> + check_and_init_map_value(map, value);
> + rcu_read_unlock();
> + return 0;
> + }
> +
nit: it seems a bit cleaner to me to not duplicate
rcu_read_{lock,unlock} and pptr fetching
I'd probably add `if ((flags & BPF_F_CPU) && cpu >=
num_possible_cpus())` check, and then within rcu region
if (flags & BPF_F_CPU) {
copy_map_value_long(...);
check_and_init_map_value(...);
} else {
for_each_possible_cpu(cpu) {
copy_map_value_long(...);
check_and_init_map_value(...);
}
}
This to me is more explicitly showing that locking/data fetching isn't
different, and it's only about singular CPU vs all CPUs
(oh, and move int off inside the else branch then as well)
> rcu_read_lock();
> pptr = array->pptrs[index & array->index_mask];
> for_each_possible_cpu(cpu) {
> @@ -382,15 +400,16 @@ static long array_map_update_elem(struct bpf_map *map, void *key, void *value,
> }
>
> int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
> - u64 map_flags)
> + u64 map_flags, u32 cpu)
> {
> struct bpf_array *array = container_of(map, struct bpf_array, map);
> u32 index = *(u32 *)key;
> void __percpu *pptr;
> - int cpu, off = 0;
> + bool reuse_value;
> + int off = 0;
> u32 size;
>
> - if (unlikely(map_flags > BPF_EXIST))
> + if (unlikely(map_flags > BPF_F_CPU))
> /* unknown flags */
> return -EINVAL;
>
> @@ -409,10 +428,25 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
> * so no kernel data leaks possible
> */
> size = array->elem_size;
> +
> + if ((map_flags & BPF_F_CPU) && cpu != BPF_ALL_CPU) {
> + if (cpu >= num_possible_cpus())
> + return -E2BIG;
> +
> + rcu_read_lock();
> + pptr = array->pptrs[index & array->index_mask];
> + copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value);
> + bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
> + rcu_read_unlock();
> + return 0;
> + }
> +
> + reuse_value = (map_flags & BPF_F_CPU) && cpu == BPF_ALL_CPU;
> rcu_read_lock();
> pptr = array->pptrs[index & array->index_mask];
> for_each_possible_cpu(cpu) {
> - copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
> + copy_map_value_long(map, per_cpu_ptr(pptr, cpu),
> + reuse_value ? value : value + off);
> bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
> off += size;
ditto here, I'd not touch rcu locking and bpf_obj_free_fields. The
difference would be singular vs all CPUs, and then for all CPUs with
BPF_F_CPU we just don't update off, getting desired behavior without
extra reuse_value variable?
[...]
> @@ -1941,19 +1941,27 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
> {
> void __user *values = u64_to_user_ptr(attr->batch.values);
> void __user *keys = u64_to_user_ptr(attr->batch.keys);
> + u64 elem_flags = attr->batch.elem_flags;
> u32 value_size, cp, max_count;
> void *key, *value;
> int err = 0;
>
> - if (attr->batch.elem_flags & ~BPF_F_LOCK)
> + if (elem_flags & ~(BPF_F_LOCK | BPF_F_CPU))
> return -EINVAL;
>
> - if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> + if ((elem_flags & BPF_F_LOCK) &&
> !btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
> return -EINVAL;
> }
>
> - value_size = bpf_map_value_size(map);
> + if (elem_flags & BPF_F_CPU) {
> + if (map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
> + return -EINVAL;
> +
> + value_size = round_up(map->value_size, 8);
> + } else {
> + value_size = bpf_map_value_size(map);
> + }
why not roll this into bpf_map_value_size() helper? it's internal,
should be fine
pw-bot: cr
>
> max_count = attr->batch.count;
> if (!max_count)
> @@ -1980,7 +1988,8 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
> break;
>
> err = bpf_map_update_value(map, map_file, key, value,
> - attr->batch.elem_flags);
> + attr->batch.elem_flags,
> + attr->batch.cpu);
So I think we discussed cpu as a separate field vs embedded into flags
field, right? I don't remember what I argued for, but looking at this
patch, it seems like it would be more convenient to have cpu come as
part of flags, no? And I don't mean UAPI-side, there separate cpu
field I think makes most sense. But internally I'd roll it into flags
as ((cpu << 32) | flags), instead of dragging it around everywhere. It
feels unclean to have "cpu" argument to generic
bpf_map_copy_value()...
(and looking at how much code we add just to pass that extra cpu
argument through libbpf API, maybe combining cpu and flags is actually
a way to go?..)
WDYT?
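(For concreteness, a rough sketch of the packing being suggested here,
purely illustrative and not code from the posted patches:
  /* callers fold the UAPI cpu field into the internal flags argument */
  u64 extended_flags = ((u64)attr->cpu << 32) | attr->flags;
  /* and the per-cpu map code unpacks it again */
  u32 cpu = extended_flags >> 32;
  u64 flags = extended_flags & 0xffffffffULL;
so only a single u64 has to be threaded through the generic helpers.)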
[...]
* Re: [RFC PATCH bpf-next 2/3] bpf, libbpf: Support BPF_F_CPU for percpu_array map
2025-06-24 16:53 ` [RFC PATCH bpf-next 2/3] bpf, libbpf: Support BPF_F_CPU " Leon Hwang
@ 2025-07-01 20:22 ` Andrii Nakryiko
2025-07-02 17:28 ` Leon Hwang
0 siblings, 1 reply; 13+ messages in thread
From: Andrii Nakryiko @ 2025-07-01 20:22 UTC (permalink / raw)
To: Leon Hwang; +Cc: bpf, ast, andrii, daniel
On Tue, Jun 24, 2025 at 9:55 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> This patch adds libbpf support for the BPF_F_CPU flag in percpu_array maps,
> introducing the following APIs:
>
> 1. bpf_map_update_elem_opts(): update with struct bpf_map_update_elem_opts
> 2. bpf_map_lookup_elem_opts(): lookup with struct bpf_map_lookup_elem_opts
> 3. bpf_map__update_elem_opts(): high-level wrapper with input validation
> 4. bpf_map__lookup_elem_opts(): high-level wrapper with input validation
>
> Behavior:
>
> * If opts->cpu == 0xFFFFFFFF, the update is applied to all CPUs.
> * Otherwise, it applies only to the specified CPU.
> * Lookup APIs retrieve values from the target CPU when BPF_F_CPU is used.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> tools/lib/bpf/bpf.c | 37 +++++++++++++++++++++++
> tools/lib/bpf/bpf.h | 35 +++++++++++++++++++++-
> tools/lib/bpf/libbpf.c | 56 +++++++++++++++++++++++++++++++++++
> tools/lib/bpf/libbpf.h | 45 ++++++++++++++++++++++++++++
> tools/lib/bpf/libbpf.map | 4 +++
> tools/lib/bpf/libbpf_common.h | 12 ++++++++
> 6 files changed, 188 insertions(+), 1 deletion(-)
>
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 6eb421ccf91b..80f7ea041187 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -402,6 +402,24 @@ int bpf_map_update_elem(int fd, const void *key, const void *value,
> return libbpf_err_errno(ret);
> }
>
> +int bpf_map_update_elem_opts(int fd, const void *key, const void *value,
> + const struct bpf_map_update_elem_opts *opts)
> +{
> + const size_t attr_sz = offsetofend(union bpf_attr, cpu);
> + union bpf_attr attr;
> + int ret;
> +
> + memset(&attr, 0, attr_sz);
> + attr.map_fd = fd;
> + attr.key = ptr_to_u64(key);
> + attr.value = ptr_to_u64(value);
> + attr.flags = OPTS_GET(opts, flags, 0);
> + attr.cpu = OPTS_GET(opts, cpu, BPF_ALL_CPU);
> +
> + ret = sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, attr_sz);
> + return libbpf_err_errno(ret);
> +}
> +
> int bpf_map_lookup_elem(int fd, const void *key, void *value)
> {
> const size_t attr_sz = offsetofend(union bpf_attr, flags);
> @@ -433,6 +451,24 @@ int bpf_map_lookup_elem_flags(int fd, const void *key, void *value, __u64 flags)
> return libbpf_err_errno(ret);
> }
>
> +int bpf_map_lookup_elem_opts(int fd, const void *key, void *value,
> + const struct bpf_map_lookup_elem_opts *opts)
> +{
> + const size_t attr_sz = offsetofend(union bpf_attr, cpu);
> + union bpf_attr attr;
> + int ret;
> +
> + memset(&attr, 0, attr_sz);
> + attr.map_fd = fd;
> + attr.key = ptr_to_u64(key);
> + attr.value = ptr_to_u64(value);
> + attr.flags = OPTS_GET(opts, flags, 0);
> + attr.cpu = OPTS_GET(opts, cpu, BPF_ALL_CPU);
can't do that, setting cpu field to 0xffffffff on old kernels will
cause -EINVAL, immediate backwards compat breakage
just default it to zero, this field should remain zero and not be used
unless flags have BPF_F_CPU
> +
> + ret = sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, attr_sz);
> + return libbpf_err_errno(ret);
> +}
> +
> int bpf_map_lookup_and_delete_elem(int fd, const void *key, void *value)
> {
> const size_t attr_sz = offsetofend(union bpf_attr, flags);
> @@ -542,6 +578,7 @@ static int bpf_map_batch_common(int cmd, int fd, void *in_batch,
> attr.batch.count = *count;
> attr.batch.elem_flags = OPTS_GET(opts, elem_flags, 0);
> attr.batch.flags = OPTS_GET(opts, flags, 0);
> + attr.batch.cpu = OPTS_GET(opts, cpu, BPF_ALL_CPU);
ditto
>
> ret = sys_bpf(cmd, &attr, attr_sz);
> *count = attr.batch.count;
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 1342564214c8..7c6a0a3693c9 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -163,12 +163,41 @@ LIBBPF_API int bpf_map_delete_elem_flags(int fd, const void *key, __u64 flags);
> LIBBPF_API int bpf_map_get_next_key(int fd, const void *key, void *next_key);
> LIBBPF_API int bpf_map_freeze(int fd);
>
> +/**
> + * @brief **bpf_map_update_elem_opts** allows for updating percpu map with value
> + * on specified CPU or on all CPUs.
IMO, a bit too specific a description. xxx_opts APIs are extended
versions of original non-opts APIs allowing to pass extra (optional)
arguments. Keep it generic. cpu field is currently the only "extra",
but this might grow over time
> + *
> + * @param fd BPF map file descriptor
> + * @param key pointer to key
> + * @param value pointer to value
> + * @param opts options for configuring the way to update percpu map
again, too specific
> + * @return 0, on success; negative error code, otherwise (errno is also set to
> + * the error code)
> + */
> +LIBBPF_API int bpf_map_update_elem_opts(int fd, const void *key, const void *value,
> + const struct bpf_map_update_elem_opts *opts);
> +
> +/**
> + * @brief **bpf_map_lookup_elem_opts** allows for looking up the value from
> + * percpu map on specified CPU.
> + *
> + * @param fd BPF map file descriptor
> + * @param key pointer to key
> + * @param value pointer to value
> + * @param opts options for configuring the way to lookup percpu map
> + * @return 0, on success; negative error code, otherwise (errno is also set to
> + * the error code)
> + */
> +LIBBPF_API int bpf_map_lookup_elem_opts(int fd, const void *key, void *value,
> + const struct bpf_map_lookup_elem_opts *opts);
> +
> struct bpf_map_batch_opts {
> size_t sz; /* size of this struct for forward/backward compatibility */
> __u64 elem_flags;
> __u64 flags;
> + __u32 cpu;
add size_t: 0 to avoid having non-zeroed padding at the end (see other
opts structs)
> };
> -#define bpf_map_batch_opts__last_field flags
> +#define bpf_map_batch_opts__last_field cpu
>
>
> /**
> @@ -286,6 +315,10 @@ LIBBPF_API int bpf_map_lookup_and_delete_batch(int fd, void *in_batch,
> * Update spin_lock-ed map elements. This must be
> * specified if the map value contains a spinlock.
> *
> + * **BPF_F_CPU**
> + * As for percpu map, update value on all CPUs if **opts->cpu** is
> + * 0xFFFFFFFF, or on specified CPU otherwise.
> + *
> * @param fd BPF map file descriptor
> * @param keys pointer to an array of *count* keys
> * @param values pointer to an array of *count* values
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 6445165a24f2..30400bdc20d9 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -10636,6 +10636,34 @@ int bpf_map__lookup_elem(const struct bpf_map *map,
> return bpf_map_lookup_elem_flags(map->fd, key, value, flags);
> }
>
> +int bpf_map__lookup_elem_opts(const struct bpf_map *map, const void *key,
> + size_t key_sz, void *value, size_t value_sz,
> + const struct bpf_map_lookup_elem_opts *opts)
> +{
> + int nr_cpus = libbpf_num_possible_cpus();
> + __u32 cpu = OPTS_GET(opts, cpu, nr_cpus);
> + __u64 flags = OPTS_GET(opts, flags, 0);
> + int err;
> +
> + if (flags & BPF_F_CPU) {
> + if (map->def.type != BPF_MAP_TYPE_PERCPU_ARRAY)
> + return -EINVAL;
> + if (cpu >= nr_cpus)
> + return -E2BIG;
> + if (map->def.value_size != value_sz) {
> + pr_warn("map '%s': unexpected value size %zu provided, expected %u\n",
> + map->name, value_sz, map->def.value_size);
> + return -EINVAL;
> + }
shouldn't this go into validate_map_op?..
> + } else {
> + err = validate_map_op(map, key_sz, value_sz, true);
> + if (err)
> + return libbpf_err(err);
> + }
> +
> + return bpf_map_lookup_elem_opts(map->fd, key, value, opts);
> +}
> +
> int bpf_map__update_elem(const struct bpf_map *map,
> const void *key, size_t key_sz,
> const void *value, size_t value_sz, __u64 flags)
> @@ -10649,6 +10677,34 @@ int bpf_map__update_elem(const struct bpf_map *map,
> return bpf_map_update_elem(map->fd, key, value, flags);
> }
>
> +int bpf_map__update_elem_opts(const struct bpf_map *map, const void *key,
> + size_t key_sz, const void *value, size_t value_sz,
> + const struct bpf_map_update_elem_opts *opts)
> +{
> + int nr_cpus = libbpf_num_possible_cpus();
> + __u32 cpu = OPTS_GET(opts, cpu, nr_cpus);
> + __u64 flags = OPTS_GET(opts, flags, 0);
> + int err;
> +
> + if (flags & BPF_F_CPU) {
> + if (map->def.type != BPF_MAP_TYPE_PERCPU_ARRAY)
> + return -EINVAL;
> + if (cpu != BPF_ALL_CPU && cpu >= nr_cpus)
> + return -E2BIG;
> + if (map->def.value_size != value_sz) {
> + pr_warn("map '%s': unexpected value size %zu provided, expected %u\n",
> + map->name, value_sz, map->def.value_size);
> + return -EINVAL;
> + }
same, move into validate_map_op
> + } else {
> + err = validate_map_op(map, key_sz, value_sz, true);
> + if (err)
> + return libbpf_err(err);
> + }
> +
> + return bpf_map_update_elem_opts(map->fd, key, value, opts);
> +}
> +
> int bpf_map__delete_elem(const struct bpf_map *map,
> const void *key, size_t key_sz, __u64 flags)
> {
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index d1cf813a057b..ba0d15028c72 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -1185,6 +1185,28 @@ LIBBPF_API int bpf_map__lookup_elem(const struct bpf_map *map,
> const void *key, size_t key_sz,
> void *value, size_t value_sz, __u64 flags);
>
> +/**
> + * @brief **bpf_map__lookup_elem_opts()** allows to lookup BPF map value
> + * corresponding to provided key with options to lookup percpu map.
> + * @param map BPF map to lookup element in
> + * @param key pointer to memory containing bytes of the key used for lookup
> + * @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
> + * @param value pointer to memory in which looked up value will be stored
> + * @param value_sz size in byte of value data memory; it has to match BPF map
> + * definition's **value_size**. For per-CPU BPF maps value size can be
> + * definition's **value_size** if **BPF_F_CPU** is specified in **opts->flags**,
> + * or the size described in **bpf_map__lookup_elem()**.
let's describe all this sizing in one place (either __lookup_elem or
__lookup_elem_opts) and then refer to that succinctly from another one
(without BPF_F_CPU exception spread out across two API descriptions)
> + * @opts extra options passed to kernel for this operation
> + * @return 0, on success; negative error, otherwise
> + *
> + * **bpf_map__lookup_elem_opts()** is high-level equivalent of
> + * **bpf_map_lookup_elem_opts()** API with added check for key and value size.
> + */
> +LIBBPF_API int bpf_map__lookup_elem_opts(const struct bpf_map *map,
> + const void *key, size_t key_sz,
> + void *value, size_t value_sz,
> + const struct bpf_map_lookup_elem_opts *opts);
> +
> /**
> * @brief **bpf_map__update_elem()** allows to insert or update value in BPF
> * map that corresponds to provided key.
> @@ -1209,6 +1231,29 @@ LIBBPF_API int bpf_map__update_elem(const struct bpf_map *map,
> const void *key, size_t key_sz,
> const void *value, size_t value_sz, __u64 flags);
>
> +/**
> + * @brief **bpf_map__update_elem_opts()** allows to insert or update value in BPF
> + * map that corresponds to provided key with options for percpu maps.
> + * @param map BPF map to insert to or update element in
> + * @param key pointer to memory containing bytes of the key
> + * @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
> + * @param value pointer to memory containing bytes of the value
> + * @param value_sz size in byte of value data memory; it has to match BPF map
> + * definition's **value_size**. For per-CPU BPF maps value size can be
> + * definition's **value_size** if **BPF_F_CPU** is specified in **opts->flags**,
> + * or the size described in **bpf_map__update_elem()**.
> + * @opts extra options passed to kernel for this operation
> + * @flags extra flags passed to kernel for this operation
> + * @return 0, on success; negative error, otherwise
> + *
> + * **bpf_map__update_elem_opts()** is high-level equivalent of
> + * **bpf_map_update_elem_opts()** API with added check for key and value size.
> + */
> +LIBBPF_API int bpf_map__update_elem_opts(const struct bpf_map *map,
> + const void *key, size_t key_sz,
> + const void *value, size_t value_sz,
> + const struct bpf_map_update_elem_opts *opts);
> +
> /**
> * @brief **bpf_map__delete_elem()** allows to delete element in BPF map that
> * corresponds to provided key.
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index c7fc0bde5648..c39814adeae9 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -436,6 +436,10 @@ LIBBPF_1.6.0 {
> bpf_linker__add_buf;
> bpf_linker__add_fd;
> bpf_linker__new_fd;
> + bpf_map__lookup_elem_opts;
> + bpf_map__update_elem_opts;
> + bpf_map_lookup_elem_opts;
> + bpf_map_update_elem_opts;
> bpf_object__prepare;
> bpf_program__attach_cgroup_opts;
> bpf_program__func_info;
> diff --git a/tools/lib/bpf/libbpf_common.h b/tools/lib/bpf/libbpf_common.h
> index 8fe248e14eb6..ef29caf91f9c 100644
> --- a/tools/lib/bpf/libbpf_common.h
> +++ b/tools/lib/bpf/libbpf_common.h
> @@ -89,4 +89,16 @@
> memcpy(&NAME, &___##NAME, sizeof(NAME)); \
> } while (0)
>
> +struct bpf_map_update_elem_opts {
> + size_t sz; /* size of this struct for forward/backward compatibility */
> + __u64 flags;
> + __u32 cpu;
size_t: 0
> +};
> +
> +struct bpf_map_lookup_elem_opts {
> + size_t sz; /* size of this struct for forward/backward compatibility */
> + __u64 flags;
> + __u32 cpu;
size_t: 0
> +};
> +
> #endif /* __LIBBPF_LIBBPF_COMMON_H */
> --
> 2.49.0
>
* Re: [RFC PATCH bpf-next 3/3] selftests/bpf: Add case to test BPF_F_CPU
2025-06-24 16:53 ` [RFC PATCH bpf-next 3/3] selftests/bpf: Add case to test BPF_F_CPU Leon Hwang
@ 2025-07-01 20:22 ` Andrii Nakryiko
2025-07-02 17:29 ` Leon Hwang
0 siblings, 1 reply; 13+ messages in thread
From: Andrii Nakryiko @ 2025-07-01 20:22 UTC (permalink / raw)
To: Leon Hwang; +Cc: bpf, ast, andrii, daniel
On Tue, Jun 24, 2025 at 9:55 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> This patch adds test coverage for the new BPF_F_CPU flag support in
> percpu_array maps. The following APIs are exercised:
>
> * bpf_map_update_batch()
> * bpf_map_lookup_batch()
> * bpf_map_update_elem_opts()
> * bpf_map__update_elem_opts()
> * bpf_map_lookup_elem_opts()
> * bpf_map__lookup_elem_opts()
>
> cd tools/testing/selftests/bpf/
> ./test_progs -t percpu_alloc/cpu_flag_tests
> 251/5 percpu_alloc/cpu_flag_tests:OK
> 251 percpu_alloc:OK
> Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> .../selftests/bpf/prog_tests/percpu_alloc.c | 169 ++++++++++++++++++
> .../selftests/bpf/progs/percpu_array_flag.c | 24 +++
> 2 files changed, 193 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/progs/percpu_array_flag.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
> index 343da65864d6..5727f4601b49 100644
> --- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
> +++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
> @@ -3,6 +3,7 @@
> #include "percpu_alloc_array.skel.h"
> #include "percpu_alloc_cgrp_local_storage.skel.h"
> #include "percpu_alloc_fail.skel.h"
> +#include "percpu_array_flag.skel.h"
>
> static void test_array(void)
> {
> @@ -115,6 +116,172 @@ static void test_failure(void) {
> RUN_TESTS(percpu_alloc_fail);
> }
>
> +static void test_cpu_flag(void)
> +{
> + int map_fd, *keys = NULL, value_size, cpu, i, j, nr_cpus, err;
> + size_t key_sz = sizeof(int), value_sz = sizeof(u64);
> + struct percpu_array_flag *skel;
> + u64 batch = 0, *values = NULL;
> + const u64 value = 0xDEADC0DE;
> + u32 count, max_entries;
> + struct bpf_map *map;
> + DECLARE_LIBBPF_OPTS(bpf_map_lookup_elem_opts, lookup_opts,
> + .flags = BPF_F_CPU,
> + .cpu = 0,
> + );
use shorter LIBBPF_OPTS macro, please
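i.e. the same initializers, just via the shorter form:

        LIBBPF_OPTS(bpf_map_lookup_elem_opts, lookup_opts,
                .flags = BPF_F_CPU,
                .cpu = 0,
        );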
> + DECLARE_LIBBPF_OPTS(bpf_map_update_elem_opts, update_opts,
> + .flags = BPF_F_CPU,
> + .cpu = 0,
> + );
> + DECLARE_LIBBPF_OPTS(bpf_map_batch_opts, batch_opts,
> + .elem_flags = BPF_F_CPU,
> + .flags = 0,
> + );
> +
[...]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH bpf-next 1/3] bpf: Introduce BPF_F_CPU flag for percpu_array map
2025-07-01 20:22 ` Andrii Nakryiko
@ 2025-07-02 17:01 ` Leon Hwang
2025-07-02 17:13 ` Andrii Nakryiko
0 siblings, 1 reply; 13+ messages in thread
From: Leon Hwang @ 2025-07-02 17:01 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: bpf, ast, andrii, daniel
On 2025/7/2 04:22, Andrii Nakryiko wrote:
> On Tue, Jun 24, 2025 at 9:54 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> This patch introduces support for the BPF_F_CPU flag in percpu_array maps
>> to allow updating or looking up values for specific CPUs or for all CPUs
>> with a single value.
>>
>> This enhancement enables:
>>
>> * Efficient update of all CPUs using a single value when cpu == 0xFFFFFFFF.
>> * Targeted update or lookup for a specific CPU otherwise.
>>
>> The flag is passed via:
>>
>> * map_flags in bpf_percpu_array_update() along with the cpu field.
>> * elem_flags in generic_map_update_batch() along with the cpu field.
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>> include/linux/bpf.h | 5 +--
>> include/uapi/linux/bpf.h | 6 ++++
>> kernel/bpf/arraymap.c | 46 ++++++++++++++++++++++++----
>> kernel/bpf/syscall.c | 56 ++++++++++++++++++++++------------
>> tools/include/uapi/linux/bpf.h | 6 ++++
>> 5 files changed, 92 insertions(+), 27 deletions(-)
>>
>
> [...]
>
>> #define BPF_ALL_CPU 0xFFFFFFFF
>
> at the very least we have to make it an enum, IMO. but I'm in general
> unsure if we need it at all... and in any case, should it be named
> "BPF_ALL_CPUS" (plural)?
>
To avoid using such a special value, would it be better to update the value
across all CPUs when cpu equals num_possible_cpus()?
>
>> -int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
>> +int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value,
>> + u64 flags, u32 cpu)
>> {
>> struct bpf_array *array = container_of(map, struct bpf_array, map);
>> u32 index = *(u32 *)key;
>> void __percpu *pptr;
>> - int cpu, off = 0;
>> + int off = 0;
>> u32 size;
>>
>> if (unlikely(index >= array->map.max_entries))
>> return -ENOENT;
>>
>> + if (unlikely(flags > BPF_F_CPU))
>> + /* unknown flags */
>> + return -EINVAL;
>> +
>> /* per_cpu areas are zero-filled and bpf programs can only
>> * access 'value_size' of them, so copying rounded areas
>> * will not leak any kernel data
>> */
>> size = array->elem_size;
>> +
>> + if (flags & BPF_F_CPU) {
>> + if (cpu >= num_possible_cpus())
>> + return -E2BIG;
>> +
>> + rcu_read_lock();
>> + pptr = array->pptrs[index & array->index_mask];
>> + copy_map_value_long(map, value, per_cpu_ptr(pptr, cpu));
>> + check_and_init_map_value(map, value);
>> + rcu_read_unlock();
>> + return 0;
>> + }
>> +
>
> nit: it seems a bit cleaner to me to not duplicate
> rcu_read_{lock,unlock} and pptr fetching
>
> I'd probably add `if ((flags & BPF_F_CPU) && cpu >=
> num_possible_cpus())` check, and then within rcu region
>
> if (flags & BPF_F_CPU) {
> copy_map_value_long(...);
> check_and_init_map_value(...);
> } else {
> for_each_possible_cpu(cpu) {
> copy_map_value_long(...);
> check_and_init_map_value(...);
> }
> }
>
>
> This to me is more explicitly showing that locking/data fetching isn't
> different, and it's only about singular CPU vs all CPUs
>
> (oh, and move int off inside the else branch then as well)
>
LGTM, I'll do it.
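Something like this, then (untested sketch of the suggested shape, with the
existing flag validation kept before this point):

        if ((flags & BPF_F_CPU) && cpu >= num_possible_cpus())
                return -E2BIG;

        size = array->elem_size;

        rcu_read_lock();
        pptr = array->pptrs[index & array->index_mask];
        if (flags & BPF_F_CPU) {
                copy_map_value_long(map, value, per_cpu_ptr(pptr, cpu));
                check_and_init_map_value(map, value);
        } else {
                int off = 0;

                for_each_possible_cpu(cpu) {
                        copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
                        check_and_init_map_value(map, value + off);
                        off += size;
                }
        }
        rcu_read_unlock();
        return 0;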
>
>> rcu_read_lock();
>> pptr = array->pptrs[index & array->index_mask];
>> for_each_possible_cpu(cpu) {
>> @@ -382,15 +400,16 @@ static long array_map_update_elem(struct bpf_map *map, void *key, void *value,
>> }
>>
>> int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
>> - u64 map_flags)
>> + u64 map_flags, u32 cpu)
>> {
>> struct bpf_array *array = container_of(map, struct bpf_array, map);
>> u32 index = *(u32 *)key;
>> void __percpu *pptr;
>> - int cpu, off = 0;
>> + bool reuse_value;
>> + int off = 0;
>> u32 size;
>>
>> - if (unlikely(map_flags > BPF_EXIST))
>> + if (unlikely(map_flags > BPF_F_CPU))
>> /* unknown flags */
>> return -EINVAL;
>>
>> @@ -409,10 +428,25 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
>> * so no kernel data leaks possible
>> */
>> size = array->elem_size;
>> +
>> + if ((map_flags & BPF_F_CPU) && cpu != BPF_ALL_CPU) {
>> + if (cpu >= num_possible_cpus())
>> + return -E2BIG;
>> +
>> + rcu_read_lock();
>> + pptr = array->pptrs[index & array->index_mask];
>> + copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value);
>> + bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
>> + rcu_read_unlock();
>> + return 0;
>> + }
>> +
>> + reuse_value = (map_flags & BPF_F_CPU) && cpu == BPF_ALL_CPU;
>> rcu_read_lock();
>> pptr = array->pptrs[index & array->index_mask];
>> for_each_possible_cpu(cpu) {
>> - copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
>> + copy_map_value_long(map, per_cpu_ptr(pptr, cpu),
>> + reuse_value ? value : value + off);
>> bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
>> off += size;
>
>
> ditto here, I'd not touch rcu locking and bpf_obj_free_fields. The
> difference would be singular vs all CPUs, and then for all CPUs with
> BPF_F_CPU we just don't update off, getting desired behavior without
> extra reuse_value variable?
>
Ack.
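i.e. roughly (untested sketch, still using the BPF_ALL_CPU name from this
revision):

        if ((map_flags & BPF_F_CPU) && cpu != BPF_ALL_CPU &&
            cpu >= num_possible_cpus())
                return -E2BIG;

        rcu_read_lock();
        pptr = array->pptrs[index & array->index_mask];
        if ((map_flags & BPF_F_CPU) && cpu != BPF_ALL_CPU) {
                copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value);
                bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
        } else {
                for_each_possible_cpu(cpu) {
                        copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
                        bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
                        /* reuse the single value for all CPUs: only advance
                         * off when BPF_F_CPU is not set
                         */
                        if (!(map_flags & BPF_F_CPU))
                                off += size;
                }
        }
        rcu_read_unlock();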
> [...]
>
>> @@ -1941,19 +1941,27 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
>> {
>> void __user *values = u64_to_user_ptr(attr->batch.values);
>> void __user *keys = u64_to_user_ptr(attr->batch.keys);
>> + u64 elem_flags = attr->batch.elem_flags;
>> u32 value_size, cp, max_count;
>> void *key, *value;
>> int err = 0;
>>
>> - if (attr->batch.elem_flags & ~BPF_F_LOCK)
>> + if (elem_flags & ~(BPF_F_LOCK | BPF_F_CPU))
>> return -EINVAL;
>>
>> - if ((attr->batch.elem_flags & BPF_F_LOCK) &&
>> + if ((elem_flags & BPF_F_LOCK) &&
>> !btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
>> return -EINVAL;
>> }
>>
>> - value_size = bpf_map_value_size(map);
>> + if (elem_flags & BPF_F_CPU) {
>> + if (map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
>> + return -EINVAL;
>> +
>> + value_size = round_up(map->value_size, 8);
>> + } else {
>> + value_size = bpf_map_value_size(map);
>> + }
>
> why not roll this into bpf_map_value_size() helper? it's internal,
> should be fine
>
That was to avoid having to update value_size through an output pointer, e.g.:
err = bpf_map_value_size(map, elem_flags, &value_size);
However, I'm fine with doing it that way.
> pw-bot: cr
>
>>
>> max_count = attr->batch.count;
>> if (!max_count)
>> @@ -1980,7 +1988,8 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
>> break;
>>
>> err = bpf_map_update_value(map, map_file, key, value,
>> - attr->batch.elem_flags);
>> + attr->batch.elem_flags,
>> + attr->batch.cpu);
>
> So I think we discussed cpu as a separate field vs embedded into flags
> field, right? I don't remember what I argued for, but looking at this
> patch, it seems like it would be more convenient to have cpu come as
> part of flags, no? And I don't mean UAPI-side, there separate cpu
> field I think makes most sense. But internally I'd roll it into flags
> as ((cpu << 32) | flags), instead of dragging it around everywhere. It
> feels unclean to have "cpu" argument to generic
> bpf_map_copy_value()...
>
> (and looking at how much code we add just to pass that extra cpu
> argument through libbpf API, maybe combining cpu and flags is actually
> a way to go?..)
>
> WDYT?
>
I'd like to embed it into the flags field in RFC v2.
We can then discuss it more concretely.
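e.g. something along these lines internally (sketch only):

        /* syscall path: pack cpu into the upper 32 bits of the flags */
        flags = (u64)attr->batch.cpu << 32 | attr->batch.elem_flags;
        ...
        /* map code: unpack where needed */
        cpu = flags >> 32;
        flags = (u32)flags;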
Thanks,
Leon
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH bpf-next 1/3] bpf: Introduce BPF_F_CPU flag for percpu_array map
2025-07-02 17:01 ` Leon Hwang
@ 2025-07-02 17:13 ` Andrii Nakryiko
0 siblings, 0 replies; 13+ messages in thread
From: Andrii Nakryiko @ 2025-07-02 17:13 UTC (permalink / raw)
To: Leon Hwang; +Cc: bpf, ast, andrii, daniel
On Wed, Jul 2, 2025 at 10:02 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 2025/7/2 04:22, Andrii Nakryiko wrote:
> > On Tue, Jun 24, 2025 at 9:54 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >> This patch introduces support for the BPF_F_CPU flag in percpu_array maps
> >> to allow updating or looking up values for specific CPUs or for all CPUs
> >> with a single value.
> >>
> >> This enhancement enables:
> >>
> >> * Efficient update of all CPUs using a single value when cpu == 0xFFFFFFFF.
> >> * Targeted update or lookup for a specific CPU otherwise.
> >>
> >> The flag is passed via:
> >>
> >> * map_flags in bpf_percpu_array_update() along with the cpu field.
> >> * elem_flags in generic_map_update_batch() along with the cpu field.
> >>
> >> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> >> ---
> >> include/linux/bpf.h | 5 +--
> >> include/uapi/linux/bpf.h | 6 ++++
> >> kernel/bpf/arraymap.c | 46 ++++++++++++++++++++++++----
> >> kernel/bpf/syscall.c | 56 ++++++++++++++++++++++------------
> >> tools/include/uapi/linux/bpf.h | 6 ++++
> >> 5 files changed, 92 insertions(+), 27 deletions(-)
> >>
> >
> > [...]
> >
> >> #define BPF_ALL_CPU 0xFFFFFFFF
> >
> > at the very least we have to make it an enum, IMO. but I'm in general
> > unsure if we need it at all... and in any case, should it be named
> > "BPF_ALL_CPUS" (plural)?
> >
>
> To avoid using such a special value, would it be better to update the value
> across all CPUs when cpu equals num_possible_cpus()?
no, I'd keep the special pattern (figuring out the num_possible_cpus value
is an unnecessary complication); it's just a question of whether to add an
enum for it in the UAPI or to just document (u32)~0 as a special case
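(if we do go the enum route, a minimal UAPI sketch could be:

        /* sketch only, naming as discussed above */
        enum {
                BPF_ALL_CPUS = 0xffffffff,
        };

similar to how BPF_F_CURRENT_CPU is exposed today)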
[...]
> > [...]
> >
> >> @@ -1941,19 +1941,27 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
> >> {
> >> void __user *values = u64_to_user_ptr(attr->batch.values);
> >> void __user *keys = u64_to_user_ptr(attr->batch.keys);
> >> + u64 elem_flags = attr->batch.elem_flags;
> >> u32 value_size, cp, max_count;
> >> void *key, *value;
> >> int err = 0;
> >>
> >> - if (attr->batch.elem_flags & ~BPF_F_LOCK)
> >> + if (elem_flags & ~(BPF_F_LOCK | BPF_F_CPU))
> >> return -EINVAL;
> >>
> >> - if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> >> + if ((elem_flags & BPF_F_LOCK) &&
> >> !btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
> >> return -EINVAL;
> >> }
> >>
> >> - value_size = bpf_map_value_size(map);
> >> + if (elem_flags & BPF_F_CPU) {
> >> + if (map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
> >> + return -EINVAL;
> >> +
> >> + value_size = round_up(map->value_size, 8);
> >> + } else {
> >> + value_size = bpf_map_value_size(map);
> >> + }
> >
> > why not roll this into bpf_map_value_size() helper? it's internal,
> > should be fine
> >
>
> That was to avoid having to update value_size through an output pointer, e.g.:
>
> err = bpf_map_value_size(map, elem_flags, &value_size);
>
> However, I'm fine with doing it that way.
if you need to communicate an error, then return a negative value_size? but
alternatively, just do the error checking before calling bpf_map_value_size()
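e.g. (rough sketch; the existing cases are approximated from memory):

        static u32 bpf_map_value_size(const struct bpf_map *map, u64 flags)
        {
                if (flags & BPF_F_CPU)
                        return round_up(map->value_size, 8);
                if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
                    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
                    map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
                    map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
                        return round_up(map->value_size, 8) * num_possible_cpus();
                else if (IS_FD_MAP(map))
                        return sizeof(u32);
                else
                        return map->value_size;
        }

with the BPF_F_CPU vs. map-type check done by the caller beforehand:

        if ((elem_flags & BPF_F_CPU) && map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
                return -EINVAL;

        value_size = bpf_map_value_size(map, elem_flags);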
[...]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH bpf-next 2/3] bpf, libbpf: Support BPF_F_CPU for percpu_array map
2025-07-01 20:22 ` Andrii Nakryiko
@ 2025-07-02 17:28 ` Leon Hwang
2025-07-02 17:30 ` Andrii Nakryiko
0 siblings, 1 reply; 13+ messages in thread
From: Leon Hwang @ 2025-07-02 17:28 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: bpf, ast, andrii, daniel
On 2025/7/2 04:22, Andrii Nakryiko wrote:
> On Tue, Jun 24, 2025 at 9:55 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> This patch adds libbpf support for the BPF_F_CPU flag in percpu_array maps,
>> introducing the following APIs:
>>
>> 1. bpf_map_update_elem_opts(): update with struct bpf_map_update_elem_opts
>> 2. bpf_map_lookup_elem_opts(): lookup with struct bpf_map_lookup_elem_opts
>> 3. bpf_map__update_elem_opts(): high-level wrapper with input validation
>> 4. bpf_map__lookup_elem_opts(): high-level wrapper with input validation
>>
>> Behavior:
>>
>> * If opts->cpu == 0xFFFFFFFF, the update is applied to all CPUs.
>> * Otherwise, it applies only to the specified CPU.
>> * Lookup APIs retrieve values from the target CPU when BPF_F_CPU is used.
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>> tools/lib/bpf/bpf.c | 37 +++++++++++++++++++++++
>> tools/lib/bpf/bpf.h | 35 +++++++++++++++++++++-
>> tools/lib/bpf/libbpf.c | 56 +++++++++++++++++++++++++++++++++++
>> tools/lib/bpf/libbpf.h | 45 ++++++++++++++++++++++++++++
>> tools/lib/bpf/libbpf.map | 4 +++
>> tools/lib/bpf/libbpf_common.h | 12 ++++++++
>> 6 files changed, 188 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> index 6eb421ccf91b..80f7ea041187 100644
>> --- a/tools/lib/bpf/bpf.c
>> +++ b/tools/lib/bpf/bpf.c
>> @@ -402,6 +402,24 @@ int bpf_map_update_elem(int fd, const void *key, const void *value,
>> return libbpf_err_errno(ret);
>> }
>>
>> +int bpf_map_update_elem_opts(int fd, const void *key, const void *value,
>> + const struct bpf_map_update_elem_opts *opts)
>> +{
>> + const size_t attr_sz = offsetofend(union bpf_attr, cpu);
>> + union bpf_attr attr;
>> + int ret;
>> +
>> + memset(&attr, 0, attr_sz);
>> + attr.map_fd = fd;
>> + attr.key = ptr_to_u64(key);
>> + attr.value = ptr_to_u64(value);
>> + attr.flags = OPTS_GET(opts, flags, 0);
>> + attr.cpu = OPTS_GET(opts, cpu, BPF_ALL_CPU);
>> +
>> + ret = sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, attr_sz);
>> + return libbpf_err_errno(ret);
>> +}
>> +
>> int bpf_map_lookup_elem(int fd, const void *key, void *value)
>> {
>> const size_t attr_sz = offsetofend(union bpf_attr, flags);
>> @@ -433,6 +451,24 @@ int bpf_map_lookup_elem_flags(int fd, const void *key, void *value, __u64 flags)
>> return libbpf_err_errno(ret);
>> }
>>
>> +int bpf_map_lookup_elem_opts(int fd, const void *key, void *value,
>> + const struct bpf_map_lookup_elem_opts *opts)
>> +{
>> + const size_t attr_sz = offsetofend(union bpf_attr, cpu);
>> + union bpf_attr attr;
>> + int ret;
>> +
>> + memset(&attr, 0, attr_sz);
>> + attr.map_fd = fd;
>> + attr.key = ptr_to_u64(key);
>> + attr.value = ptr_to_u64(value);
>> + attr.flags = OPTS_GET(opts, flags, 0);
>> + attr.cpu = OPTS_GET(opts, cpu, BPF_ALL_CPU);
>
> can't do that, setting cpu field to 0xffffffff on old kernels will
> cause -EINVAL, immediate backwards compat breakage
>
> just default it to zero, this field should remain zero and not be used
> unless flags have BPF_F_CPU
>
Ack.
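i.e.:

        attr.flags = OPTS_GET(opts, flags, 0);
        attr.cpu = OPTS_GET(opts, cpu, 0); /* stays 0 unless BPF_F_CPU is used */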
>> +
>> + ret = sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, attr_sz);
>> + return libbpf_err_errno(ret);
>> +}
>> +
>> int bpf_map_lookup_and_delete_elem(int fd, const void *key, void *value)
>> {
>> const size_t attr_sz = offsetofend(union bpf_attr, flags);
>> @@ -542,6 +578,7 @@ static int bpf_map_batch_common(int cmd, int fd, void *in_batch,
>> attr.batch.count = *count;
>> attr.batch.elem_flags = OPTS_GET(opts, elem_flags, 0);
>> attr.batch.flags = OPTS_GET(opts, flags, 0);
>> + attr.batch.cpu = OPTS_GET(opts, cpu, BPF_ALL_CPU);
>
> ditto
>
Ack.
>>
>> ret = sys_bpf(cmd, &attr, attr_sz);
>> *count = attr.batch.count;
>> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
>> index 1342564214c8..7c6a0a3693c9 100644
>> --- a/tools/lib/bpf/bpf.h
>> +++ b/tools/lib/bpf/bpf.h
>> @@ -163,12 +163,41 @@ LIBBPF_API int bpf_map_delete_elem_flags(int fd, const void *key, __u64 flags);
>> LIBBPF_API int bpf_map_get_next_key(int fd, const void *key, void *next_key);
>> LIBBPF_API int bpf_map_freeze(int fd);
>>
>> +/**
>> + * @brief **bpf_map_update_elem_opts** allows for updating percpu map with value
>> + * on specified CPU or on all CPUs.
>
> IMO, a bit too specific a description. xxx_opts APIs are extended
> versions of original non-opts APIs allowing to pass extra (optional)
> arguments. Keep it generic. cpu field is currently the only "extra",
> but this might grow over time
>
I'll update it.
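e.g. something like (wording sketch):

 * @brief **bpf_map_update_elem_opts** is an extended version of
 * **bpf_map_update_elem()** that accepts extra, optional parameters via
 * *opts* (currently the target CPU through **BPF_F_CPU**).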
>> + *
>> + * @param fd BPF map file descriptor
>> + * @param key pointer to key
>> + * @param value pointer to value
>> + * @param opts options for configuring the way to update percpu map
>
> again, too specific
>
Ack.
>> + * @return 0, on success; negative error code, otherwise (errno is also set to
>> + * the error code)
>> + */
>> +LIBBPF_API int bpf_map_update_elem_opts(int fd, const void *key, const void *value,
>> + const struct bpf_map_update_elem_opts *opts);
>> +
>> +/**
>> + * @brief **bpf_map_lookup_elem_opts** allows for looking up the value from
>> + * percpu map on specified CPU.
>> + *
>> + * @param fd BPF map file descriptor
>> + * @param key pointer to key
>> + * @param value pointer to value
>> + * @param opts options for configuring the way to lookup percpu map
>> + * @return 0, on success; negative error code, otherwise (errno is also set to
>> + * the error code)
>> + */
>> +LIBBPF_API int bpf_map_lookup_elem_opts(int fd, const void *key, void *value,
>> + const struct bpf_map_lookup_elem_opts *opts);
>> +
>> struct bpf_map_batch_opts {
>> size_t sz; /* size of this struct for forward/backward compatibility */
>> __u64 elem_flags;
>> __u64 flags;
>> + __u32 cpu;
>
> add size_t: 0 to avoid having non-zeroed padding at the end (see other
> opts structs)
>
Ack.
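i.e.:

        struct bpf_map_batch_opts {
                size_t sz; /* size of this struct for forward/backward compatibility */
                __u64 elem_flags;
                __u64 flags;
                __u32 cpu;
                size_t :0;
        };
        #define bpf_map_batch_opts__last_field cpu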
>> };
>> -#define bpf_map_batch_opts__last_field flags
>> +#define bpf_map_batch_opts__last_field cpu
>>
>>
>> /**
>> @@ -286,6 +315,10 @@ LIBBPF_API int bpf_map_lookup_and_delete_batch(int fd, void *in_batch,
>> * Update spin_lock-ed map elements. This must be
>> * specified if the map value contains a spinlock.
>> *
>> + * **BPF_F_CPU**
>> + * As for percpu map, update value on all CPUs if **opts->cpu** is
>> + * 0xFFFFFFFF, or on specified CPU otherwise.
>> + *
>> * @param fd BPF map file descriptor
>> * @param keys pointer to an array of *count* keys
>> * @param values pointer to an array of *count* values
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 6445165a24f2..30400bdc20d9 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -10636,6 +10636,34 @@ int bpf_map__lookup_elem(const struct bpf_map *map,
>> return bpf_map_lookup_elem_flags(map->fd, key, value, flags);
>> }
>>
>> +int bpf_map__lookup_elem_opts(const struct bpf_map *map, const void *key,
>> + size_t key_sz, void *value, size_t value_sz,
>> + const struct bpf_map_lookup_elem_opts *opts)
>> +{
>> + int nr_cpus = libbpf_num_possible_cpus();
>> + __u32 cpu = OPTS_GET(opts, cpu, nr_cpus);
>> + __u64 flags = OPTS_GET(opts, flags, 0);
>> + int err;
>> +
>> + if (flags & BPF_F_CPU) {
>> + if (map->def.type != BPF_MAP_TYPE_PERCPU_ARRAY)
>> + return -EINVAL;
>> + if (cpu >= nr_cpus)
>> + return -E2BIG;
>> + if (map->def.value_size != value_sz) {
>> + pr_warn("map '%s': unexpected value size %zu provided, expected %u\n",
>> + map->name, value_sz, map->def.value_size);
>> + return -EINVAL;
>> + }
>
> shouldn't this go into validate_map_op?..
>
It should.
However, to avoid making validate_map_op() overly complicated, I'd like to
add a validate_map_cpu_op() wrapper that does the cpu checks and then calls
validate_map_op().
>> + } else {
>> + err = validate_map_op(map, key_sz, value_sz, true);
>> + if (err)
>> + return libbpf_err(err);
>> + }
>> +
>> + return bpf_map_lookup_elem_opts(map->fd, key, value, opts);
>> +}
>> +
>> int bpf_map__update_elem(const struct bpf_map *map,
>> const void *key, size_t key_sz,
>> const void *value, size_t value_sz, __u64 flags)
>> @@ -10649,6 +10677,34 @@ int bpf_map__update_elem(const struct bpf_map *map,
>> return bpf_map_update_elem(map->fd, key, value, flags);
>> }
>>
>> +int bpf_map__update_elem_opts(const struct bpf_map *map, const void *key,
>> + size_t key_sz, const void *value, size_t value_sz,
>> + const struct bpf_map_update_elem_opts *opts)
>> +{
>> + int nr_cpus = libbpf_num_possible_cpus();
>> + __u32 cpu = OPTS_GET(opts, cpu, nr_cpus);
>> + __u64 flags = OPTS_GET(opts, flags, 0);
>> + int err;
>> +
>> + if (flags & BPF_F_CPU) {
>> + if (map->def.type != BPF_MAP_TYPE_PERCPU_ARRAY)
>> + return -EINVAL;
>> + if (cpu != BPF_ALL_CPU && cpu >= nr_cpus)
>> + return -E2BIG;
>> + if (map->def.value_size != value_sz) {
>> + pr_warn("map '%s': unexpected value size %zu provided, expected %u\n",
>> + map->name, value_sz, map->def.value_size);
>> + return -EINVAL;
>> + }
>
> same, move into validate_map_op
>
Ack.
>> + } else {
>> + err = validate_map_op(map, key_sz, value_sz, true);
>> + if (err)
>> + return libbpf_err(err);
>> + }
>> +
>> + return bpf_map_update_elem_opts(map->fd, key, value, opts);
>> +}
>> +
>> int bpf_map__delete_elem(const struct bpf_map *map,
>> const void *key, size_t key_sz, __u64 flags)
>> {
>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>> index d1cf813a057b..ba0d15028c72 100644
>> --- a/tools/lib/bpf/libbpf.h
>> +++ b/tools/lib/bpf/libbpf.h
>> @@ -1185,6 +1185,28 @@ LIBBPF_API int bpf_map__lookup_elem(const struct bpf_map *map,
>> const void *key, size_t key_sz,
>> void *value, size_t value_sz, __u64 flags);
>>
>> +/**
>> + * @brief **bpf_map__lookup_elem_opts()** allows to lookup BPF map value
>> + * corresponding to provided key with options to lookup percpu map.
>> + * @param map BPF map to lookup element in
>> + * @param key pointer to memory containing bytes of the key used for lookup
>> + * @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
>> + * @param value pointer to memory in which looked up value will be stored
>> + * @param value_sz size in byte of value data memory; it has to match BPF map
>> + * definition's **value_size**. For per-CPU BPF maps value size can be
>> + * definition's **value_size** if **BPF_F_CPU** is specified in **opts->flags**,
>> + * or the size described in **bpf_map__lookup_elem()**.
>
> let's describe all this sizing in one place (either __lookup_elem or
> __lookup_elem_opts) and then refer to that succinctly from another one
> (without BPF_F_CPU exception spread out across two API descriptions)
>
Let's describe it in __lookup_elem_opts and then refer to that in
__lookup_elem.
>> + * @opts extra options passed to kernel for this operation
>> + * @return 0, on success; negative error, otherwise
>> + *
>> + * **bpf_map__lookup_elem_opts()** is high-level equivalent of
>> + * **bpf_map_lookup_elem_opts()** API with added check for key and value size.
>> + */
>> +LIBBPF_API int bpf_map__lookup_elem_opts(const struct bpf_map *map,
>> + const void *key, size_t key_sz,
>> + void *value, size_t value_sz,
>> + const struct bpf_map_lookup_elem_opts *opts);
>> +
>> /**
>> * @brief **bpf_map__update_elem()** allows to insert or update value in BPF
>> * map that corresponds to provided key.
>> @@ -1209,6 +1231,29 @@ LIBBPF_API int bpf_map__update_elem(const struct bpf_map *map,
>> const void *key, size_t key_sz,
>> const void *value, size_t value_sz, __u64 flags);
>>
>> +/**
>> + * @brief **bpf_map__update_elem_opts()** allows to insert or update value in BPF
>> + * map that corresponds to provided key with options for percpu maps.
>> + * @param map BPF map to insert to or update element in
>> + * @param key pointer to memory containing bytes of the key
>> + * @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
>> + * @param value pointer to memory containing bytes of the value
>> + * @param value_sz size in byte of value data memory; it has to match BPF map
>> + * definition's **value_size**. For per-CPU BPF maps value size can be
>> + * definition's **value_size** if **BPF_F_CPU** is specified in **opts->flags**,
>> + * or the size described in **bpf_map__update_elem()**.
>> + * @opts extra options passed to kernel for this operation
>> + * @flags extra flags passed to kernel for this operation
>> + * @return 0, on success; negative error, otherwise
>> + *
>> + * **bpf_map__update_elem_opts()** is high-level equivalent of
>> + * **bpf_map_update_elem_opts()** API with added check for key and value size.
>> + */
>> +LIBBPF_API int bpf_map__update_elem_opts(const struct bpf_map *map,
>> + const void *key, size_t key_sz,
>> + const void *value, size_t value_sz,
>> + const struct bpf_map_update_elem_opts *opts);
>> +
>> /**
>> * @brief **bpf_map__delete_elem()** allows to delete element in BPF map that
>> * corresponds to provided key.
>> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
>> index c7fc0bde5648..c39814adeae9 100644
>> --- a/tools/lib/bpf/libbpf.map
>> +++ b/tools/lib/bpf/libbpf.map
>> @@ -436,6 +436,10 @@ LIBBPF_1.6.0 {
>> bpf_linker__add_buf;
>> bpf_linker__add_fd;
>> bpf_linker__new_fd;
>> + bpf_map__lookup_elem_opts;
>> + bpf_map__update_elem_opts;
>> + bpf_map_lookup_elem_opts;
>> + bpf_map_update_elem_opts;
>> bpf_object__prepare;
>> bpf_program__attach_cgroup_opts;
>> bpf_program__func_info;
>> diff --git a/tools/lib/bpf/libbpf_common.h b/tools/lib/bpf/libbpf_common.h
>> index 8fe248e14eb6..ef29caf91f9c 100644
>> --- a/tools/lib/bpf/libbpf_common.h
>> +++ b/tools/lib/bpf/libbpf_common.h
>> @@ -89,4 +89,16 @@
>> memcpy(&NAME, &___##NAME, sizeof(NAME)); \
>> } while (0)
>>
>> +struct bpf_map_update_elem_opts {
>> + size_t sz; /* size of this struct for forward/backward compatibility */
>> + __u64 flags;
>> + __u32 cpu;
>
> size_t: 0
>
Ack.
>> +};
>> +
>> +struct bpf_map_lookup_elem_opts {
>> + size_t sz; /* size of this struct for forward/backward compatibility */
>> + __u64 flags;
>> + __u32 cpu;
>
> size_t: 0
>
Ack.
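i.e. the same trailing padding for both new structs, e.g.:

        struct bpf_map_lookup_elem_opts {
                size_t sz; /* size of this struct for forward/backward compatibility */
                __u64 flags;
                __u32 cpu;
                size_t :0;
        };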
Thanks,
Leon
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH bpf-next 3/3] selftests/bpf: Add case to test BPF_F_CPU
2025-07-01 20:22 ` Andrii Nakryiko
@ 2025-07-02 17:29 ` Leon Hwang
0 siblings, 0 replies; 13+ messages in thread
From: Leon Hwang @ 2025-07-02 17:29 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: bpf, ast, andrii, daniel
On 2025/7/2 04:22, Andrii Nakryiko wrote:
> On Tue, Jun 24, 2025 at 9:55 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> This patch adds test coverage for the new BPF_F_CPU flag support in
>> percpu_array maps. The following APIs are exercised:
>>
>> * bpf_map_update_batch()
>> * bpf_map_lookup_batch()
>> * bpf_map_update_elem_opts()
>> * bpf_map__update_elem_opts()
>> * bpf_map_lookup_elem_opts()
>> * bpf_map__lookup_elem_opts()
>>
>> cd tools/testing/selftests/bpf/
>> ./test_progs -t percpu_alloc/cpu_flag_tests
>> 251/5 percpu_alloc/cpu_flag_tests:OK
>> 251 percpu_alloc:OK
>> Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>> .../selftests/bpf/prog_tests/percpu_alloc.c | 169 ++++++++++++++++++
>> .../selftests/bpf/progs/percpu_array_flag.c | 24 +++
>> 2 files changed, 193 insertions(+)
>> create mode 100644 tools/testing/selftests/bpf/progs/percpu_array_flag.c
>>
>> diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
>> index 343da65864d6..5727f4601b49 100644
>> --- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
>> @@ -3,6 +3,7 @@
>> #include "percpu_alloc_array.skel.h"
>> #include "percpu_alloc_cgrp_local_storage.skel.h"
>> #include "percpu_alloc_fail.skel.h"
>> +#include "percpu_array_flag.skel.h"
>>
>> static void test_array(void)
>> {
>> @@ -115,6 +116,172 @@ static void test_failure(void) {
>> RUN_TESTS(percpu_alloc_fail);
>> }
>>
>> +static void test_cpu_flag(void)
>> +{
>> + int map_fd, *keys = NULL, value_size, cpu, i, j, nr_cpus, err;
>> + size_t key_sz = sizeof(int), value_sz = sizeof(u64);
>> + struct percpu_array_flag *skel;
>> + u64 batch = 0, *values = NULL;
>> + const u64 value = 0xDEADC0DE;
>> + u32 count, max_entries;
>> + struct bpf_map *map;
>> + DECLARE_LIBBPF_OPTS(bpf_map_lookup_elem_opts, lookup_opts,
>> + .flags = BPF_F_CPU,
>> + .cpu = 0,
>> + );
>
> use shorter LIBBPF_OPTS macro, please
>
Ack.
[...]
Thanks,
Leon
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH bpf-next 2/3] bpf, libbpf: Support BPF_F_CPU for percpu_array map
2025-07-02 17:28 ` Leon Hwang
@ 2025-07-02 17:30 ` Andrii Nakryiko
2025-07-02 17:32 ` Leon Hwang
0 siblings, 1 reply; 13+ messages in thread
From: Andrii Nakryiko @ 2025-07-02 17:30 UTC (permalink / raw)
To: Leon Hwang; +Cc: bpf, ast, andrii, daniel
On Wed, Jul 2, 2025 at 10:28 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 2025/7/2 04:22, Andrii Nakryiko wrote:
> > On Tue, Jun 24, 2025 at 9:55 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >> This patch adds libbpf support for the BPF_F_CPU flag in percpu_array maps,
> >> introducing the following APIs:
> >>
> >> 1. bpf_map_update_elem_opts(): update with struct bpf_map_update_elem_opts
> >> 2. bpf_map_lookup_elem_opts(): lookup with struct bpf_map_lookup_elem_opts
> >> 3. bpf_map__update_elem_opts(): high-level wrapper with input validation
> >> 4. bpf_map__lookup_elem_opts(): high-level wrapper with input validation
> >>
> >> Behavior:
> >>
> >> * If opts->cpu == 0xFFFFFFFF, the update is applied to all CPUs.
> >> * Otherwise, it applies only to the specified CPU.
> >> * Lookup APIs retrieve values from the target CPU when BPF_F_CPU is used.
> >>
> >> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> >> ---
> >> tools/lib/bpf/bpf.c | 37 +++++++++++++++++++++++
> >> tools/lib/bpf/bpf.h | 35 +++++++++++++++++++++-
> >> tools/lib/bpf/libbpf.c | 56 +++++++++++++++++++++++++++++++++++
> >> tools/lib/bpf/libbpf.h | 45 ++++++++++++++++++++++++++++
> >> tools/lib/bpf/libbpf.map | 4 +++
> >> tools/lib/bpf/libbpf_common.h | 12 ++++++++
> >> 6 files changed, 188 insertions(+), 1 deletion(-)
> >>
[...]
> >> };
> >> -#define bpf_map_batch_opts__last_field flags
> >> +#define bpf_map_batch_opts__last_field cpu
> >>
> >>
> >> /**
> >> @@ -286,6 +315,10 @@ LIBBPF_API int bpf_map_lookup_and_delete_batch(int fd, void *in_batch,
> >> * Update spin_lock-ed map elements. This must be
> >> * specified if the map value contains a spinlock.
> >> *
> >> + * **BPF_F_CPU**
> >> + * As for percpu map, update value on all CPUs if **opts->cpu** is
> >> + * 0xFFFFFFFF, or on specified CPU otherwise.
> >> + *
> >> * @param fd BPF map file descriptor
> >> * @param keys pointer to an array of *count* keys
> >> * @param values pointer to an array of *count* values
> >> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> >> index 6445165a24f2..30400bdc20d9 100644
> >> --- a/tools/lib/bpf/libbpf.c
> >> +++ b/tools/lib/bpf/libbpf.c
> >> @@ -10636,6 +10636,34 @@ int bpf_map__lookup_elem(const struct bpf_map *map,
> >> return bpf_map_lookup_elem_flags(map->fd, key, value, flags);
> >> }
> >>
> >> +int bpf_map__lookup_elem_opts(const struct bpf_map *map, const void *key,
> >> + size_t key_sz, void *value, size_t value_sz,
> >> + const struct bpf_map_lookup_elem_opts *opts)
> >> +{
> >> + int nr_cpus = libbpf_num_possible_cpus();
> >> + __u32 cpu = OPTS_GET(opts, cpu, nr_cpus);
> >> + __u64 flags = OPTS_GET(opts, flags, 0);
> >> + int err;
> >> +
> >> + if (flags & BPF_F_CPU) {
> >> + if (map->def.type != BPF_MAP_TYPE_PERCPU_ARRAY)
> >> + return -EINVAL;
> >> + if (cpu >= nr_cpus)
> >> + return -E2BIG;
> >> + if (map->def.value_size != value_sz) {
> >> + pr_warn("map '%s': unexpected value size %zu provided, expected %u\n",
> >> + map->name, value_sz, map->def.value_size);
> >> + return -EINVAL;
> >> + }
> >
> > shouldn't this go into validate_map_op?..
> >
>
> It should.
>
> However, to avoid making validate_map_op() overly complicated, I'd like to
> add a validate_map_cpu_op() wrapper that does the cpu checks and then calls
> validate_map_op().
validate_map_op is meant to handle all the different conditions, let's
keep all that in one function
[...]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH bpf-next 2/3] bpf, libbpf: Support BPF_F_CPU for percpu_array map
2025-07-02 17:30 ` Andrii Nakryiko
@ 2025-07-02 17:32 ` Leon Hwang
0 siblings, 0 replies; 13+ messages in thread
From: Leon Hwang @ 2025-07-02 17:32 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: bpf, ast, andrii, daniel
On 2025/7/3 01:30, Andrii Nakryiko wrote:
> On Wed, Jul 2, 2025 at 10:28 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>>
>> On 2025/7/2 04:22, Andrii Nakryiko wrote:
>>> On Tue, Jun 24, 2025 at 9:55 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>>>
>>>> This patch adds libbpf support for the BPF_F_CPU flag in percpu_array maps,
>>>> introducing the following APIs:
>>>>
>>>> 1. bpf_map_update_elem_opts(): update with struct bpf_map_update_elem_opts
>>>> 2. bpf_map_lookup_elem_opts(): lookup with struct bpf_map_lookup_elem_opts
>>>> 3. bpf_map__update_elem_opts(): high-level wrapper with input validation
>>>> 4. bpf_map__lookup_elem_opts(): high-level wrapper with input validation
>>>>
>>>> Behavior:
>>>>
>>>> * If opts->cpu == 0xFFFFFFFF, the update is applied to all CPUs.
>>>> * Otherwise, it applies only to the specified CPU.
>>>> * Lookup APIs retrieve values from the target CPU when BPF_F_CPU is used.
>>>>
>>>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>>>> ---
>>>> tools/lib/bpf/bpf.c | 37 +++++++++++++++++++++++
>>>> tools/lib/bpf/bpf.h | 35 +++++++++++++++++++++-
>>>> tools/lib/bpf/libbpf.c | 56 +++++++++++++++++++++++++++++++++++
>>>> tools/lib/bpf/libbpf.h | 45 ++++++++++++++++++++++++++++
>>>> tools/lib/bpf/libbpf.map | 4 +++
>>>> tools/lib/bpf/libbpf_common.h | 12 ++++++++
>>>> 6 files changed, 188 insertions(+), 1 deletion(-)
>>>>
>
> [...]
>
>>>> };
>>>> -#define bpf_map_batch_opts__last_field flags
>>>> +#define bpf_map_batch_opts__last_field cpu
>>>>
>>>>
>>>> /**
>>>> @@ -286,6 +315,10 @@ LIBBPF_API int bpf_map_lookup_and_delete_batch(int fd, void *in_batch,
>>>> * Update spin_lock-ed map elements. This must be
>>>> * specified if the map value contains a spinlock.
>>>> *
>>>> + * **BPF_F_CPU**
>>>> + * As for percpu map, update value on all CPUs if **opts->cpu** is
>>>> + * 0xFFFFFFFF, or on specified CPU otherwise.
>>>> + *
>>>> * @param fd BPF map file descriptor
>>>> * @param keys pointer to an array of *count* keys
>>>> * @param values pointer to an array of *count* values
>>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>>>> index 6445165a24f2..30400bdc20d9 100644
>>>> --- a/tools/lib/bpf/libbpf.c
>>>> +++ b/tools/lib/bpf/libbpf.c
>>>> @@ -10636,6 +10636,34 @@ int bpf_map__lookup_elem(const struct bpf_map *map,
>>>> return bpf_map_lookup_elem_flags(map->fd, key, value, flags);
>>>> }
>>>>
>>>> +int bpf_map__lookup_elem_opts(const struct bpf_map *map, const void *key,
>>>> + size_t key_sz, void *value, size_t value_sz,
>>>> + const struct bpf_map_lookup_elem_opts *opts)
>>>> +{
>>>> + int nr_cpus = libbpf_num_possible_cpus();
>>>> + __u32 cpu = OPTS_GET(opts, cpu, nr_cpus);
>>>> + __u64 flags = OPTS_GET(opts, flags, 0);
>>>> + int err;
>>>> +
>>>> + if (flags & BPF_F_CPU) {
>>>> + if (map->def.type != BPF_MAP_TYPE_PERCPU_ARRAY)
>>>> + return -EINVAL;
>>>> + if (cpu >= nr_cpus)
>>>> + return -E2BIG;
>>>> + if (map->def.value_size != value_sz) {
>>>> + pr_warn("map '%s': unexpected value size %zu provided, expected %u\n",
>>>> + map->name, value_sz, map->def.value_size);
>>>> + return -EINVAL;
>>>> + }
>>>
>>> shouldn't this go into validate_map_op?..
>>>
>>
>> It should.
>>
>> However, to avoid making validate_map_op() overly complicated, I'd like to
>> add a validate_map_cpu_op() wrapper that does the cpu checks and then calls
>> validate_map_op().
>
> validate_map_op is meant to handle all the different conditions, let's
> keep all that in one function
>
Got it.
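Something like this, then (rough sketch; the extended signature is
hypothetical):

        static int validate_map_op(const struct bpf_map *map, size_t key_sz,
                                   size_t value_sz, bool check_value_sz,
                                   __u64 flags, __u32 cpu)
        {
                /* ... existing fd and key_sz checks unchanged ... */

                if (!check_value_sz)
                        return 0;

                if (flags & BPF_F_CPU) {
                        if (map->def.type != BPF_MAP_TYPE_PERCPU_ARRAY)
                                return -EINVAL;
                        if (cpu != BPF_ALL_CPU && cpu >= libbpf_num_possible_cpus())
                                return -E2BIG;
                        if (map->def.value_size != value_sz) {
                                pr_warn("map '%s': unexpected value size %zu provided, expected %u\n",
                                        map->name, value_sz, map->def.value_size);
                                return -EINVAL;
                        }
                        return 0;
                }

                /* ... existing per-cpu / value_size checks unchanged ... */
                return 0;
        }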
Thanks,
Leon
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread; thread overview (13+ messages):
2025-06-24 16:53 [RFC PATCH bpf-next 0/3] bpf: Introduce BPF_F_CPU flag for percpu_array map Leon Hwang
2025-06-24 16:53 ` [RFC PATCH bpf-next 1/3] " Leon Hwang
2025-07-01 20:22 ` Andrii Nakryiko
2025-07-02 17:01 ` Leon Hwang
2025-07-02 17:13 ` Andrii Nakryiko
2025-06-24 16:53 ` [RFC PATCH bpf-next 2/3] bpf, libbpf: Support BPF_F_CPU " Leon Hwang
2025-07-01 20:22 ` Andrii Nakryiko
2025-07-02 17:28 ` Leon Hwang
2025-07-02 17:30 ` Andrii Nakryiko
2025-07-02 17:32 ` Leon Hwang
2025-06-24 16:53 ` [RFC PATCH bpf-next 3/3] selftests/bpf: Add case to test BPF_F_CPU Leon Hwang
2025-07-01 20:22 ` Andrii Nakryiko
2025-07-02 17:29 ` Leon Hwang