[PATCH bpf-next v7 0/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL

* [PATCH bpf-next v7 0/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps
@ 2025-09-10 16:27 Leon Hwang
  2025-09-10 16:27 ` [PATCH bpf-next v7 1/7] bpf: Introduce internal bpf_map_check_op_flags helper function Leon Hwang
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Leon Hwang @ 2025-09-10 16:27 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87, dxu,
	deso, leon.hwang, kernel-patches-bot

This patch set introduces the BPF_F_CPU and BPF_F_ALL_CPUS flags for
percpu maps. The need for a BPF_F_ALL_CPUS flag for percpu_array maps
was raised in the thread
"[PATCH bpf-next v3 0/4] bpf: Introduce global percpu data"[1].

The goal of the BPF_F_ALL_CPUS flag is to reduce data caching overhead in
light skeletons by allowing a single value to be reused to update values
across all CPUs. This avoids the M:N problem, where M cached values are
used to update a map on a kernel with N CPUs.

The BPF_F_CPU flag is accompanied by cpu info embedded in the upper 32
bits of *flags*, which specifies the target CPU for the operation:

* For lookup operations: the flag together with the cpu info enables
  querying the value on the specified CPU.
* For update operations: the flag together with the cpu info enables
  updating the value on the specified CPU.
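
As a rough illustration of the encoding described above, the sketch below
packs a CPU number into the upper 32 bits of a 64-bit flags word and
unpacks it again. The SKETCH_* names and both helpers are hypothetical and
exist only for this example; the numeric values mirror the uapi enum added
in patch 2/7.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the uapi enum added in patch 2/7. They are defined locally
 * (with a SKETCH_ prefix) so the example builds against older headers.
 */
#define SKETCH_BPF_F_CPU      8ULL
#define SKETCH_BPF_F_ALL_CPUS 16ULL

/* Hypothetical helper: build a flags word that targets a single CPU.
 * The low 32 bits carry the ordinary flags, the high 32 bits carry the CPU.
 */
static inline uint64_t sketch_cpu_flags(uint32_t cpu, uint64_t base_flags)
{
	return ((uint64_t)cpu << 32) | (uint32_t)base_flags | SKETCH_BPF_F_CPU;
}

/* Hypothetical helper: recover the CPU number from a flags word, using the
 * same shift the kernel side of this series uses (flags >> 32).
 */
static inline uint32_t sketch_flags_cpu(uint64_t flags)
{
	return (uint32_t)(flags >> 32);
}
```

A caller would pass the resulting word wherever the map operation takes its
64-bit flags; BPF_F_ALL_CPUS needs no cpu info and is passed on its own.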

Links:
[1] https://lore.kernel.org/bpf/20250526162146.24429-1-leon.hwang@linux.dev/

Changes:
v6 -> v7:
* Get correct value size for percpu_hash and lru_percpu_hash in
  update_batch API.
* Set 'count' as 'max_entries' in test cases for lookup_batch API.
* Address comment from Alexei:
  * Move cpu flags check into bpf_map_check_op_flags().

v5 -> v6:
* Move bpf_map_check_op_flags() from 'bpf.h' to 'syscall.c'.
* Address comments from Alexei:
  * Drop the refactoring code of data copying logic for percpu maps.
  * Drop bpf_map_check_op_flags() wrappers.

v4 -> v5:
* Address comments from Andrii:
  * Refactor data copying logic for all percpu maps.
  * Drop this_cpu_ptr() micro-optimization.
  * Drop cpu check in libbpf's validate_map_op().
  * Enhance bpf_map_check_op_flags() using *allowed flags* instead of
    'extra_flags_mask'.

v3 -> v4:
* Address comments from Andrii:
  * Remove unnecessary map_type check in bpf_map_value_size().
  * Reduce code churn.
  * Remove unnecessary do_delete check in
    __htab_map_lookup_and_delete_batch().
  * Introduce bpf_percpu_copy_to_user() and bpf_percpu_copy_from_user().
  * Rename check_map_flags() to bpf_map_check_op_flags() with
    extra_flags_mask.
  * Add human-readable pr_warn() explanations in validate_map_op().
  * Use flags in bpf_map__delete_elem() and
    bpf_map__lookup_and_delete_elem().
  * Drop "for alignment reasons".
v3 link: https://lore.kernel.org/bpf/20250821160817.70285-1-leon.hwang@linux.dev/

v2 -> v3:
* Address comments from Alexei:
  * Use BPF_F_ALL_CPUS instead of BPF_ALL_CPUS magic.
  * Introduce these two cpu flags for all percpu maps.
* Address comments from Jiri:
  * Remove some unnecessary u32 casts.
  * Refactor the map flags check into a more generic function.
  * Fix a code style issue.
v2 link: https://lore.kernel.org/bpf/20250805163017.17015-1-leon.hwang@linux.dev/

v1 -> v2:
* Address comments from Andrii:
  * Embed cpu info entirely in the high 32 bits of *flags*.
  * Use ERANGE instead of E2BIG.
  * Fix a few format issues.

Leon Hwang (7):
  bpf: Introduce internal bpf_map_check_op_flags helper function
  bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags
  bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_array
    maps
  bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash
    and lru_percpu_hash maps
  bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for
    percpu_cgroup_storage maps
  libbpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu maps
  selftests/bpf: Add cases to test BPF_F_CPU and BPF_F_ALL_CPUS flags

 include/linux/bpf-cgroup.h                    |   4 +-
 include/linux/bpf.h                           |  44 +++-
 include/uapi/linux/bpf.h                      |   2 +
 kernel/bpf/arraymap.c                         |  24 +-
 kernel/bpf/hashtab.c                          |  77 ++++--
 kernel/bpf/local_storage.c                    |  22 +-
 kernel/bpf/syscall.c                          |  65 +++--
 tools/include/uapi/linux/bpf.h                |   2 +
 tools/lib/bpf/bpf.h                           |   8 +
 tools/lib/bpf/libbpf.c                        |  26 +-
 tools/lib/bpf/libbpf.h                        |  21 +-
 .../selftests/bpf/prog_tests/percpu_alloc.c   | 233 ++++++++++++++++++
 .../selftests/bpf/progs/percpu_alloc_array.c  |  32 +++
 13 files changed, 471 insertions(+), 89 deletions(-)

--
2.50.1

* [PATCH bpf-next v7 1/7] bpf: Introduce internal bpf_map_check_op_flags helper function
  2025-09-10 16:27 [PATCH bpf-next v7 0/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps Leon Hwang
@ 2025-09-10 16:27 ` Leon Hwang
  2025-09-16 23:44   ` Andrii Nakryiko
  2025-09-10 16:27 ` [PATCH bpf-next v7 2/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags Leon Hwang
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Leon Hwang @ 2025-09-10 16:27 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87, dxu,
	deso, leon.hwang, kernel-patches-bot

Introduce bpf_map_check_op_flags() to unify map flags checking for the
lookup_elem, update_elem, lookup_batch and update_batch APIs.

This makes it convenient to check the BPF_F_CPU and BPF_F_ALL_CPUS flags
in one place for these APIs in the next patch.
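
For readers who want to experiment outside the kernel, the unified check
can be modelled in user space roughly as follows. This is only a sketch:
struct sketch_map and its has_spin_lock member are hypothetical stand-ins
for struct bpf_map and btf_record_has_field(map->record, BPF_SPIN_LOCK).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BPF_F_LOCK 4ULL	/* same value as BPF_F_LOCK in the uapi enum */

/* Hypothetical stand-in for struct bpf_map: only what the check needs. */
struct sketch_map {
	bool has_spin_lock;	/* models the BPF_SPIN_LOCK field lookup */
};

/* Models the helper from this patch: reject flags outside the caller's
 * allowed set, then reject BPF_F_LOCK on maps without a spin lock.
 */
static inline int sketch_check_op_flags(const struct sketch_map *map,
					uint64_t flags, uint64_t allowed_flags)
{
	if (flags & ~allowed_flags)
		return -EINVAL;

	if ((flags & SKETCH_BPF_F_LOCK) && !map->has_spin_lock)
		return -EINVAL;

	return 0;
}
```

Passing an all-ones mask as allowed_flags, as map_update_elem() does in the
diff below, leaves only the spin-lock check active.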

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/bpf.h  | 11 +++++++++++
 kernel/bpf/syscall.c | 34 +++++++++++-----------------------
 2 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8f6e87f0f3a89..c5bf72cec1a62 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3709,4 +3709,15 @@ int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char *
 			   const char **linep, int *nump);
 struct bpf_prog *bpf_prog_find_from_stack(void);
 
+static inline int bpf_map_check_op_flags(struct bpf_map *map, u64 flags, u64 allowed_flags)
+{
+	if (flags & ~allowed_flags)
+		return -EINVAL;
+
+	if ((flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK))
+		return -EINVAL;
+
+	return 0;
+}
+
 #endif /* _LINUX_BPF_H */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3f178a0f8eb12..1504630a72a76 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1669,9 +1669,6 @@ static int map_lookup_elem(union bpf_attr *attr)
 	if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM))
 		return -EINVAL;
 
-	if (attr->flags & ~BPF_F_LOCK)
-		return -EINVAL;
-
 	CLASS(fd, f)(attr->map_fd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
@@ -1679,9 +1676,9 @@ static int map_lookup_elem(union bpf_attr *attr)
 	if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ))
 		return -EPERM;
 
-	if ((attr->flags & BPF_F_LOCK) &&
-	    !btf_record_has_field(map->record, BPF_SPIN_LOCK))
-		return -EINVAL;
+	err = bpf_map_check_op_flags(map, attr->flags, BPF_F_LOCK);
+	if (err)
+		return err;
 
 	key = __bpf_copy_key(ukey, map->key_size);
 	if (IS_ERR(key))
@@ -1744,11 +1741,9 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
 		goto err_put;
 	}
 
-	if ((attr->flags & BPF_F_LOCK) &&
-	    !btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
-		err = -EINVAL;
+	err = bpf_map_check_op_flags(map, attr->flags, ~0);
+	if (err)
 		goto err_put;
-	}
 
 	key = ___bpf_copy_key(ukey, map->key_size);
 	if (IS_ERR(key)) {
@@ -1952,13 +1947,9 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
 	void *key, *value;
 	int err = 0;
 
-	if (attr->batch.elem_flags & ~BPF_F_LOCK)
-		return -EINVAL;
-
-	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
-	    !btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
-		return -EINVAL;
-	}
+	err = bpf_map_check_op_flags(map, attr->batch.elem_flags, BPF_F_LOCK);
+	if (err)
+		return err;
 
 	value_size = bpf_map_value_size(map);
 
@@ -2015,12 +2006,9 @@ int generic_map_lookup_batch(struct bpf_map *map,
 	u32 value_size, cp, max_count;
 	int err;
 
-	if (attr->batch.elem_flags & ~BPF_F_LOCK)
-		return -EINVAL;
-
-	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
-	    !btf_record_has_field(map->record, BPF_SPIN_LOCK))
-		return -EINVAL;
+	err = bpf_map_check_op_flags(map, attr->batch.elem_flags, BPF_F_LOCK);
+	if (err)
+		return err;
 
 	value_size = bpf_map_value_size(map);
 
-- 
2.50.1


* [PATCH bpf-next v7 2/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags
  2025-09-10 16:27 [PATCH bpf-next v7 0/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps Leon Hwang
  2025-09-10 16:27 ` [PATCH bpf-next v7 1/7] bpf: Introduce internal bpf_map_check_op_flags helper function Leon Hwang
@ 2025-09-10 16:27 ` Leon Hwang
  2025-09-16 23:44   ` Andrii Nakryiko
  2025-09-10 16:27 ` [PATCH bpf-next v7 3/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_array maps Leon Hwang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Leon Hwang @ 2025-09-10 16:27 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87, dxu,
	deso, leon.hwang, kernel-patches-bot

Introduce the BPF_F_CPU and BPF_F_ALL_CPUS flags and check them for the
following APIs:

* 'map_lookup_elem()'
* 'map_update_elem()'
* 'generic_map_lookup_batch()'
* 'generic_map_update_batch()'

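Also compute the correct value size for these APIs: when either flag is
set, the value buffer holds a single value instead of one value per
possible CPU.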
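
The CPU-related checks and the value-size rule in this patch can be
modelled in user space as in the sketch below. The function names, the
map_supports_cpu_flags parameter (standing in for
bpf_map_supports_cpu_flags()) and the nr_cpus parameter (standing in for
num_possible_cpus()) are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BPF_F_CPU      8ULL
#define SKETCH_BPF_F_ALL_CPUS 16ULL

/* Models the CPU-related part of the flags check. */
static inline int sketch_check_cpu_flags(uint64_t flags,
					 bool map_supports_cpu_flags,
					 uint32_t nr_cpus)
{
	uint32_t cpu = (uint32_t)(flags >> 32);

	/* The high 32 bits are only meaningful together with BPF_F_CPU. */
	if (!(flags & SKETCH_BPF_F_CPU) && cpu)
		return -EINVAL;

	if (flags & (SKETCH_BPF_F_CPU | SKETCH_BPF_F_ALL_CPUS)) {
		if (!map_supports_cpu_flags)
			return -EINVAL;
		/* The two flags are mutually exclusive. */
		if ((flags & SKETCH_BPF_F_CPU) && (flags & SKETCH_BPF_F_ALL_CPUS))
			return -EINVAL;
		if ((flags & SKETCH_BPF_F_CPU) && cpu >= nr_cpus)
			return -ERANGE;
	}

	return 0;
}

/* Models the value buffer size for a percpu map: one 8-byte-aligned slot
 * when either flag is set, otherwise one slot per possible CPU.
 */
static inline uint32_t sketch_percpu_value_size(uint32_t value_size,
						uint64_t flags, uint32_t nr_cpus)
{
	uint32_t slot = (value_size + 7u) & ~7u;	/* round_up(value_size, 8) */

	if (flags & (SKETCH_BPF_F_CPU | SKETCH_BPF_F_ALL_CPUS))
		return slot;

	return slot * nr_cpus;
}
```

With a 12-byte value on a 4-CPU system, for instance, the default buffer is
4 slots of 16 bytes, while a BPF_F_CPU or BPF_F_ALL_CPUS operation needs a
single 16-byte slot.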

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/bpf.h            | 23 ++++++++++++++++++++++-
 include/uapi/linux/bpf.h       |  2 ++
 kernel/bpf/syscall.c           | 31 +++++++++++++++++--------------
 tools/include/uapi/linux/bpf.h |  2 ++
 4 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c5bf72cec1a62..cfb95e3a93dcc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3709,14 +3709,35 @@ int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char *
 			   const char **linep, int *nump);
 struct bpf_prog *bpf_prog_find_from_stack(void);
 
+static inline bool bpf_map_supports_cpu_flags(enum bpf_map_type map_type)
+{
+	return false;
+}
+
 static inline int bpf_map_check_op_flags(struct bpf_map *map, u64 flags, u64 allowed_flags)
 {
-	if (flags & ~allowed_flags)
+	u32 cpu;
+
+	if ((u32)flags & ~allowed_flags)
 		return -EINVAL;
 
 	if ((flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK))
 		return -EINVAL;
 
+	if (!(flags & BPF_F_CPU) && flags >> 32)
+		return -EINVAL;
+
+	if (flags & (BPF_F_CPU | BPF_F_ALL_CPUS)) {
+		if (!bpf_map_supports_cpu_flags(map->map_type))
+			return -EINVAL;
+		if ((flags & BPF_F_CPU) && (flags & BPF_F_ALL_CPUS))
+			return -EINVAL;
+
+		cpu = flags >> 32;
+		if ((flags & BPF_F_CPU) && cpu >= num_possible_cpus())
+			return -ERANGE;
+	}
+
 	return 0;
 }
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 233de8677382e..be1fdc5042744 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1372,6 +1372,8 @@ enum {
 	BPF_NOEXIST	= 1, /* create new element if it didn't exist */
 	BPF_EXIST	= 2, /* update existing element */
 	BPF_F_LOCK	= 4, /* spin_lock-ed map_lookup/map_update */
+	BPF_F_CPU	= 8, /* cpu flag for percpu maps, upper 32-bit of flags is a cpu number */
+	BPF_F_ALL_CPUS	= 16, /* update value across all CPUs for percpu maps */
 };
 
 /* flags for BPF_MAP_CREATE command */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1504630a72a76..0ce373e31490b 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -131,12 +131,14 @@ bool bpf_map_write_active(const struct bpf_map *map)
 	return atomic64_read(&map->writecnt) != 0;
 }
 
-static u32 bpf_map_value_size(const struct bpf_map *map)
-{
-	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
-	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
-	    map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
-	    map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
+static u32 bpf_map_value_size(const struct bpf_map *map, u64 flags)
+{
+	if (flags & (BPF_F_CPU | BPF_F_ALL_CPUS))
+		return round_up(map->value_size, 8);
+	else if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+		 map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
+		 map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
+		 map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
 		return round_up(map->value_size, 8) * num_possible_cpus();
 	else if (IS_FD_MAP(map))
 		return sizeof(u32);
@@ -1676,7 +1678,7 @@ static int map_lookup_elem(union bpf_attr *attr)
 	if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ))
 		return -EPERM;
 
-	err = bpf_map_check_op_flags(map, attr->flags, BPF_F_LOCK);
+	err = bpf_map_check_op_flags(map, attr->flags, BPF_F_LOCK | BPF_F_CPU);
 	if (err)
 		return err;
 
@@ -1684,7 +1686,7 @@ static int map_lookup_elem(union bpf_attr *attr)
 	if (IS_ERR(key))
 		return PTR_ERR(key);
 
-	value_size = bpf_map_value_size(map);
+	value_size = bpf_map_value_size(map, attr->flags);
 
 	err = -ENOMEM;
 	value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN);
@@ -1751,7 +1753,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
 		goto err_put;
 	}
 
-	value_size = bpf_map_value_size(map);
+	value_size = bpf_map_value_size(map, attr->flags);
 	value = kvmemdup_bpfptr(uvalue, value_size);
 	if (IS_ERR(value)) {
 		err = PTR_ERR(value);
@@ -1947,11 +1949,12 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
 	void *key, *value;
 	int err = 0;
 
-	err = bpf_map_check_op_flags(map, attr->batch.elem_flags, BPF_F_LOCK);
+	err = bpf_map_check_op_flags(map, attr->batch.elem_flags,
+				     BPF_F_LOCK | BPF_F_CPU | BPF_F_ALL_CPUS);
 	if (err)
 		return err;
 
-	value_size = bpf_map_value_size(map);
+	value_size = bpf_map_value_size(map, attr->batch.elem_flags);
 
 	max_count = attr->batch.count;
 	if (!max_count)
@@ -2006,11 +2009,11 @@ int generic_map_lookup_batch(struct bpf_map *map,
 	u32 value_size, cp, max_count;
 	int err;
 
-	err = bpf_map_check_op_flags(map, attr->batch.elem_flags, BPF_F_LOCK);
+	err = bpf_map_check_op_flags(map, attr->batch.elem_flags, BPF_F_LOCK | BPF_F_CPU);
 	if (err)
 		return err;
 
-	value_size = bpf_map_value_size(map);
+	value_size = bpf_map_value_size(map, attr->batch.elem_flags);
 
 	max_count = attr->batch.count;
 	if (!max_count)
@@ -2132,7 +2135,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
 		goto err_put;
 	}
 
-	value_size = bpf_map_value_size(map);
+	value_size = bpf_map_value_size(map, 0);
 
 	err = -ENOMEM;
 	value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 233de8677382e..be1fdc5042744 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1372,6 +1372,8 @@ enum {
 	BPF_NOEXIST	= 1, /* create new element if it didn't exist */
 	BPF_EXIST	= 2, /* update existing element */
 	BPF_F_LOCK	= 4, /* spin_lock-ed map_lookup/map_update */
+	BPF_F_CPU	= 8, /* cpu flag for percpu maps, upper 32-bit of flags is a cpu number */
+	BPF_F_ALL_CPUS	= 16, /* update value across all CPUs for percpu maps */
 };
 
 /* flags for BPF_MAP_CREATE command */
-- 
2.50.1


* [PATCH bpf-next v7 3/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_array maps
  2025-09-10 16:27 [PATCH bpf-next v7 0/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps Leon Hwang
  2025-09-10 16:27 ` [PATCH bpf-next v7 1/7] bpf: Introduce internal bpf_map_check_op_flags helper function Leon Hwang
  2025-09-10 16:27 ` [PATCH bpf-next v7 2/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags Leon Hwang
@ 2025-09-10 16:27 ` Leon Hwang
  2025-09-16 23:44   ` Andrii Nakryiko
  2025-09-10 16:27 ` [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps Leon Hwang
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Leon Hwang @ 2025-09-10 16:27 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87, dxu,
	deso, leon.hwang, kernel-patches-bot

Introduce support for the BPF_F_ALL_CPUS flag in percpu_array maps to
allow updating values for all CPUs with a single value for both
update_elem and update_batch APIs.

Introduce support for the BPF_F_CPU flag in percpu_array maps to allow:

* updating the value for a specified CPU via both the update_elem and
  update_batch APIs.
* looking up the value for a specified CPU via both the lookup_elem and
  lookup_batch APIs.

The BPF_F_CPU flag is passed via:

* map_flags of the lookup_elem and update_elem APIs, along with embedded
  cpu info.
* elem_flags of the lookup_batch and update_batch APIs, along with
  embedded cpu info.
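
The three update modes can be illustrated with a small user-space model.
It is a sketch only: struct sketch_percpu_elem, the fixed SKETCH_NR_CPUS
and the u64 slots are hypothetical simplifications of the percpu pointer
and copy_map_value_long(), and the flags are assumed to have been
validated already.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BPF_F_CPU      8ULL
#define SKETCH_BPF_F_ALL_CPUS 16ULL
#define SKETCH_NR_CPUS        4	/* hypothetical CPU count for the model */

/* One map element: a u64 slot per CPU stands in for the percpu pointer. */
struct sketch_percpu_elem {
	uint64_t slot[SKETCH_NR_CPUS];
};

/* Models the update path for one element. */
static inline void sketch_percpu_update(struct sketch_percpu_elem *elem,
					const uint64_t *value, uint64_t flags)
{
	int cpu, off = 0;

	if (flags & SKETCH_BPF_F_CPU) {
		/* value holds a single entry for the CPU in the high bits */
		elem->slot[flags >> 32] = value[0];
		return;
	}

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		elem->slot[cpu] = value[off];
		/* BPF_F_ALL_CPUS reuses value[0] for every CPU; without it,
		 * value is an array with one entry per CPU.
		 */
		if (!(flags & SKETCH_BPF_F_ALL_CPUS))
			off++;
	}
}
```

Only the default mode reads more than one entry from value, which is why
the two flagged modes can work with a single-slot buffer.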

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/bpf.h   |  9 +++++++--
 kernel/bpf/arraymap.c | 24 +++++++++++++++++++++---
 kernel/bpf/syscall.c  |  2 +-
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cfb95e3a93dcc..0426b29cf6591 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2697,7 +2697,7 @@ int map_set_for_each_callback_args(struct bpf_verifier_env *env,
 				   struct bpf_func_state *callee);
 
 int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
-int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
+int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value, u64 flags);
 int bpf_percpu_hash_update(struct bpf_map *map, void *key, void *value,
 			   u64 flags);
 int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
@@ -3711,7 +3711,12 @@ struct bpf_prog *bpf_prog_find_from_stack(void);
 
 static inline bool bpf_map_supports_cpu_flags(enum bpf_map_type map_type)
 {
-	return false;
+	switch (map_type) {
+	case BPF_MAP_TYPE_PERCPU_ARRAY:
+		return true;
+	default:
+		return false;
+	}
 }
 
 static inline int bpf_map_check_op_flags(struct bpf_map *map, u64 flags, u64 allowed_flags)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 3d080916faf97..dbe2548b35513 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -295,7 +295,7 @@ static void *percpu_array_map_lookup_percpu_elem(struct bpf_map *map, void *key,
 	return per_cpu_ptr(array->pptrs[index & array->index_mask], cpu);
 }
 
-int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
+int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value, u64 map_flags)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	u32 index = *(u32 *)key;
@@ -313,11 +313,18 @@ int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
 	size = array->elem_size;
 	rcu_read_lock();
 	pptr = array->pptrs[index & array->index_mask];
+	if (map_flags & BPF_F_CPU) {
+		cpu = map_flags >> 32;
+		copy_map_value_long(map, value, per_cpu_ptr(pptr, cpu));
+		check_and_init_map_value(map, value);
+		goto unlock;
+	}
 	for_each_possible_cpu(cpu) {
 		copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
 		check_and_init_map_value(map, value + off);
 		off += size;
 	}
+unlock:
 	rcu_read_unlock();
 	return 0;
 }
@@ -390,7 +397,7 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
 	int cpu, off = 0;
 	u32 size;
 
-	if (unlikely(map_flags > BPF_EXIST))
+	if (unlikely((u32)map_flags > BPF_F_ALL_CPUS))
 		/* unknown flags */
 		return -EINVAL;
 
@@ -411,11 +418,22 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
 	size = array->elem_size;
 	rcu_read_lock();
 	pptr = array->pptrs[index & array->index_mask];
+	if (map_flags & BPF_F_CPU) {
+		cpu = map_flags >> 32;
+		copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value);
+		bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
+		goto unlock;
+	}
 	for_each_possible_cpu(cpu) {
 		copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
 		bpf_obj_free_fields(array->map.record, per_cpu_ptr(pptr, cpu));
-		off += size;
+		/* same user-provided value is used if BPF_F_ALL_CPUS is
+		 * specified, otherwise value is an array of per-CPU values.
+		 */
+		if (!(map_flags & BPF_F_ALL_CPUS))
+			off += size;
 	}
+unlock:
 	rcu_read_unlock();
 	return 0;
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0ce373e31490b..2054a943f69cb 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -316,7 +316,7 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
 	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
 		err = bpf_percpu_hash_copy(map, key, value);
 	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
-		err = bpf_percpu_array_copy(map, key, value);
+		err = bpf_percpu_array_copy(map, key, value, flags);
 	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
 		err = bpf_percpu_cgroup_storage_copy(map, key, value);
 	} else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
-- 
2.50.1


* [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps
  2025-09-10 16:27 [PATCH bpf-next v7 0/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps Leon Hwang
                   ` (2 preceding siblings ...)
  2025-09-10 16:27 ` [PATCH bpf-next v7 3/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_array maps Leon Hwang
@ 2025-09-10 16:27 ` Leon Hwang
  2025-09-16 23:44   ` Andrii Nakryiko
  2025-09-10 16:27 ` [PATCH bpf-next v7 5/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_cgroup_storage maps Leon Hwang
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Leon Hwang @ 2025-09-10 16:27 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87, dxu,
	deso, leon.hwang, kernel-patches-bot

Introduce BPF_F_ALL_CPUS flag support for percpu_hash and lru_percpu_hash
maps to allow updating values for all CPUs with a single value for both
update_elem and update_batch APIs.

Introduce BPF_F_CPU flag support for percpu_hash and lru_percpu_hash
maps to allow:

* updating the value for a specified CPU via both the update_elem and
  update_batch APIs.
* looking up the value for a specified CPU via both the lookup_elem and
  lookup_batch APIs.

The BPF_F_CPU flag is passed via:

* map_flags of the lookup_elem and update_elem APIs, along with embedded
  cpu info.
* elem_flags of the lookup_batch and update_batch APIs, along with
  embedded cpu info.
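
On the lookup side, the effect of BPF_F_CPU on the output buffer can be
modelled as below. This is a sketch under assumptions: struct
sketch_hash_elem, the fixed SKETCH_NR_CPUS and the return value (number of
entries written) are hypothetical and only serve the illustration. Lookups
in this series accept BPF_F_CPU but not BPF_F_ALL_CPUS.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BPF_F_CPU 8ULL
#define SKETCH_NR_CPUS   4	/* hypothetical CPU count for the model */

/* One hash element: a u64 slot per CPU stands in for the percpu pointer. */
struct sketch_hash_elem {
	uint64_t slot[SKETCH_NR_CPUS];
};

/* Models the lookup copy for one element and returns how many entries
 * were written to value.
 */
static inline int sketch_percpu_lookup(const struct sketch_hash_elem *elem,
				       uint64_t *value, uint64_t flags)
{
	int cpu;

	if (flags & SKETCH_BPF_F_CPU) {
		/* Copy only the slot of the CPU encoded in the high bits. */
		value[0] = elem->slot[flags >> 32];
		return 1;
	}

	/* Default: copy one entry per CPU, in CPU order. */
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		value[cpu] = elem->slot[cpu];

	return SKETCH_NR_CPUS;
}
```

The same per-element rule explains the batch value size change in the diff
below: with BPF_F_CPU each returned element occupies one slot instead of
one slot per possible CPU.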

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/bpf.h  |  4 ++-
 kernel/bpf/hashtab.c | 77 +++++++++++++++++++++++++++++++-------------
 kernel/bpf/syscall.c |  2 +-
 3 files changed, 58 insertions(+), 25 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0426b29cf6591..38900907dcafb 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2696,7 +2696,7 @@ int map_set_for_each_callback_args(struct bpf_verifier_env *env,
 				   struct bpf_func_state *caller,
 				   struct bpf_func_state *callee);
 
-int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
+int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value, u64 flags);
 int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value, u64 flags);
 int bpf_percpu_hash_update(struct bpf_map *map, void *key, void *value,
 			   u64 flags);
@@ -3713,6 +3713,8 @@ static inline bool bpf_map_supports_cpu_flags(enum bpf_map_type map_type)
 {
 	switch (map_type) {
 	case BPF_MAP_TYPE_PERCPU_ARRAY:
+	case BPF_MAP_TYPE_PERCPU_HASH:
+	case BPF_MAP_TYPE_LRU_PERCPU_HASH:
 		return true;
 	default:
 		return false;
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 71f9931ac64cd..eb8f137258f51 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -937,7 +937,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 }
 
 static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
-			    void *value, bool onallcpus)
+			    void *value, bool onallcpus, u64 map_flags)
 {
 	if (!onallcpus) {
 		/* copy true value_size bytes */
@@ -946,15 +946,26 @@ static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
 		u32 size = round_up(htab->map.value_size, 8);
 		int off = 0, cpu;
 
+		if (map_flags & BPF_F_CPU) {
+			cpu = map_flags >> 32;
+			copy_map_value_long(&htab->map, per_cpu_ptr(pptr, cpu), value);
+			return;
+		}
+
 		for_each_possible_cpu(cpu) {
 			copy_map_value_long(&htab->map, per_cpu_ptr(pptr, cpu), value + off);
-			off += size;
+			/* same user-provided value is used if BPF_F_ALL_CPUS
+			 * is specified, otherwise value is an array of per-CPU
+			 * values.
+			 */
+			if (!(map_flags & BPF_F_ALL_CPUS))
+				off += size;
 		}
 	}
 }
 
 static void pcpu_init_value(struct bpf_htab *htab, void __percpu *pptr,
-			    void *value, bool onallcpus)
+			    void *value, bool onallcpus, u64 map_flags)
 {
 	/* When not setting the initial value on all cpus, zero-fill element
 	 * values for other cpus. Otherwise, bpf program has no way to ensure
@@ -972,7 +983,7 @@ static void pcpu_init_value(struct bpf_htab *htab, void __percpu *pptr,
 				zero_map_value(&htab->map, per_cpu_ptr(pptr, cpu));
 		}
 	} else {
-		pcpu_copy_value(htab, pptr, value, onallcpus);
+		pcpu_copy_value(htab, pptr, value, onallcpus, map_flags);
 	}
 }
 
@@ -984,7 +995,7 @@ static bool fd_htab_map_needs_adjust(const struct bpf_htab *htab)
 static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 					 void *value, u32 key_size, u32 hash,
 					 bool percpu, bool onallcpus,
-					 struct htab_elem *old_elem)
+					 struct htab_elem *old_elem, u64 map_flags)
 {
 	u32 size = htab->map.value_size;
 	bool prealloc = htab_is_prealloc(htab);
@@ -1042,7 +1053,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 			pptr = *(void __percpu **)ptr;
 		}
 
-		pcpu_init_value(htab, pptr, value, onallcpus);
+		pcpu_init_value(htab, pptr, value, onallcpus, map_flags);
 
 		if (!prealloc)
 			htab_elem_set_ptr(l_new, key_size, pptr);
@@ -1147,7 +1158,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	}
 
 	l_new = alloc_htab_elem(htab, key, value, key_size, hash, false, false,
-				l_old);
+				l_old, map_flags);
 	if (IS_ERR(l_new)) {
 		/* all pre-allocated elements are in use or memory exhausted */
 		ret = PTR_ERR(l_new);
@@ -1263,7 +1274,7 @@ static long htab_map_update_elem_in_place(struct bpf_map *map, void *key,
 	u32 key_size, hash;
 	int ret;
 
-	if (unlikely(map_flags > BPF_EXIST))
+	if (unlikely(!onallcpus && map_flags > BPF_EXIST))
 		/* unknown flags */
 		return -EINVAL;
 
@@ -1291,7 +1302,7 @@ static long htab_map_update_elem_in_place(struct bpf_map *map, void *key,
 		/* Update value in-place */
 		if (percpu) {
 			pcpu_copy_value(htab, htab_elem_get_ptr(l_old, key_size),
-					value, onallcpus);
+					value, onallcpus, map_flags);
 		} else {
 			void **inner_map_pptr = htab_elem_value(l_old, key_size);
 
@@ -1300,7 +1311,7 @@ static long htab_map_update_elem_in_place(struct bpf_map *map, void *key,
 		}
 	} else {
 		l_new = alloc_htab_elem(htab, key, value, key_size,
-					hash, percpu, onallcpus, NULL);
+					hash, percpu, onallcpus, NULL, map_flags);
 		if (IS_ERR(l_new)) {
 			ret = PTR_ERR(l_new);
 			goto err;
@@ -1326,7 +1337,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 	u32 key_size, hash;
 	int ret;
 
-	if (unlikely(map_flags > BPF_EXIST))
+	if (unlikely(!onallcpus && map_flags > BPF_EXIST))
 		/* unknown flags */
 		return -EINVAL;
 
@@ -1366,10 +1377,10 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 
 		/* per-cpu hash map can update value in-place */
 		pcpu_copy_value(htab, htab_elem_get_ptr(l_old, key_size),
-				value, onallcpus);
+				value, onallcpus, map_flags);
 	} else {
 		pcpu_init_value(htab, htab_elem_get_ptr(l_new, key_size),
-				value, onallcpus);
+				value, onallcpus, map_flags);
 		hlist_nulls_add_head_rcu(&l_new->hash_node, head);
 		l_new = NULL;
 	}
@@ -1698,9 +1709,16 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	int ret = 0;
 
 	elem_map_flags = attr->batch.elem_flags;
-	if ((elem_map_flags & ~BPF_F_LOCK) ||
-	    ((elem_map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
-		return -EINVAL;
+	if (!do_delete && is_percpu) {
+		ret = bpf_map_check_op_flags(map, elem_map_flags, BPF_F_LOCK | BPF_F_CPU);
+		if (ret)
+			return ret;
+	} else {
+		if ((elem_map_flags & ~BPF_F_LOCK) ||
+		    ((elem_map_flags & BPF_F_LOCK) &&
+		     !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
+			return -EINVAL;
+	}
 
 	map_flags = attr->batch.flags;
 	if (map_flags)
@@ -1724,7 +1742,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	value_size = htab->map.value_size;
 	size = round_up(value_size, 8);
 	if (is_percpu)
-		value_size = size * num_possible_cpus();
+		value_size = (elem_map_flags & BPF_F_CPU) ? size : size * num_possible_cpus();
 	total = 0;
 	/* while experimenting with hash tables with sizes ranging from 10 to
 	 * 1000, it was observed that a bucket can have up to 5 entries.
@@ -1806,10 +1824,17 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 			void __percpu *pptr;
 
 			pptr = htab_elem_get_ptr(l, map->key_size);
-			for_each_possible_cpu(cpu) {
-				copy_map_value_long(&htab->map, dst_val + off, per_cpu_ptr(pptr, cpu));
-				check_and_init_map_value(&htab->map, dst_val + off);
-				off += size;
+			if (elem_map_flags & BPF_F_CPU) {
+				cpu = elem_map_flags >> 32;
+				copy_map_value_long(&htab->map, dst_val, per_cpu_ptr(pptr, cpu));
+				check_and_init_map_value(&htab->map, dst_val);
+			} else {
+				for_each_possible_cpu(cpu) {
+					copy_map_value_long(&htab->map, dst_val + off,
+							    per_cpu_ptr(pptr, cpu));
+					check_and_init_map_value(&htab->map, dst_val + off);
+					off += size;
+				}
 			}
 		} else {
 			value = htab_elem_value(l, key_size);
@@ -2365,7 +2390,7 @@ static void *htab_lru_percpu_map_lookup_percpu_elem(struct bpf_map *map, void *k
 	return NULL;
 }
 
-int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value)
+int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value, u64 map_flags)
 {
 	struct htab_elem *l;
 	void __percpu *pptr;
@@ -2382,16 +2407,22 @@ int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value)
 	l = __htab_map_lookup_elem(map, key);
 	if (!l)
 		goto out;
+	ret = 0;
 	/* We do not mark LRU map element here in order to not mess up
 	 * eviction heuristics when user space does a map walk.
 	 */
 	pptr = htab_elem_get_ptr(l, map->key_size);
+	if (map_flags & BPF_F_CPU) {
+		cpu = map_flags >> 32;
+		copy_map_value_long(map, value, per_cpu_ptr(pptr, cpu));
+		check_and_init_map_value(map, value);
+		goto out;
+	}
 	for_each_possible_cpu(cpu) {
 		copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
 		check_and_init_map_value(map, value + off);
 		off += size;
 	}
-	ret = 0;
 out:
 	rcu_read_unlock();
 	return ret;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 2054a943f69cb..576b759da0101 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -314,7 +314,7 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
 	bpf_disable_instrumentation();
 	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
 	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
-		err = bpf_percpu_hash_copy(map, key, value);
+		err = bpf_percpu_hash_copy(map, key, value, flags);
 	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
 		err = bpf_percpu_array_copy(map, key, value, flags);
 	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
-- 
2.50.1



* [PATCH bpf-next v7 5/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_cgroup_storage maps
  2025-09-10 16:27 [PATCH bpf-next v7 0/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps Leon Hwang
                   ` (3 preceding siblings ...)
  2025-09-10 16:27 ` [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps Leon Hwang
@ 2025-09-10 16:27 ` Leon Hwang
  2025-09-16 23:44   ` Andrii Nakryiko
  2025-09-10 16:27 ` [PATCH bpf-next v7 6/7] libbpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu maps Leon Hwang
  2025-09-10 16:27 ` [PATCH bpf-next v7 7/7] selftests/bpf: Add cases to test BPF_F_CPU and BPF_F_ALL_CPUS flags Leon Hwang
  6 siblings, 1 reply; 29+ messages in thread
From: Leon Hwang @ 2025-09-10 16:27 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87, dxu,
	deso, leon.hwang, kernel-patches-bot

Introduce BPF_F_ALL_CPUS flag support for percpu_cgroup_storage maps to
allow updating values for all CPUs with a single value for update_elem
API.

Introduce BPF_F_CPU flag support for percpu_cgroup_storage maps to
allow:

* updating the value for a specified CPU via the update_elem API.
* looking up the value for a specified CPU via the lookup_elem API.

The BPF_F_CPU flag is passed via map_flags, with the target CPU embedded in
the high 32 bits.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
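The flag encoding described above can be sketched in userspace as follows.
This is a minimal sketch, not part of the patch: the EX_-prefixed macros and
the ex_* helpers are hypothetical names, and the bit positions are assumed
(the authoritative definitions are in the uapi header changed by patch 2/7).

```c
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define EX_BPF_F_CPU      (1ULL << 3)
#define EX_BPF_F_ALL_CPUS (1ULL << 4)

/* Hypothetical helper: build the 64-bit flags word for a single-CPU
 * lookup or update. The CPU number goes into the high 32 bits.
 */
static inline uint64_t ex_cpu_flags(uint32_t cpu)
{
	return ((uint64_t)cpu << 32) | EX_BPF_F_CPU;
}

/* Hypothetical helper mirroring the kernel side of this patch,
 * which recovers the CPU with "cpu = map_flags >> 32".
 */
static inline uint32_t ex_flags_to_cpu(uint64_t flags)
{
	return (uint32_t)(flags >> 32);
}
```

A caller would pass ex_cpu_flags(cpu) as the flags argument of an update or
lookup, or EX_BPF_F_ALL_CPUS alone to write one value to every CPU.
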
 include/linux/bpf-cgroup.h |  4 ++--
 include/linux/bpf.h        |  1 +
 kernel/bpf/local_storage.c | 22 +++++++++++++++++++---
 kernel/bpf/syscall.c       |  2 +-
 4 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index aedf573bdb426..013f4db9903fd 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -172,7 +172,7 @@ void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
 void bpf_cgroup_storage_unlink(struct bpf_cgroup_storage *storage);
 int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *map);
 
-int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key, void *value);
+int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key, void *value, u64 flags);
 int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 				     void *value, u64 flags);
 
@@ -467,7 +467,7 @@ static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(
 static inline void bpf_cgroup_storage_free(
 	struct bpf_cgroup_storage *storage) {}
 static inline int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key,
-						 void *value) {
+						 void *value, u64 flags) {
 	return 0;
 }
 static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 38900907dcafb..7ac563ef6f0b2 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3715,6 +3715,7 @@ static inline bool bpf_map_supports_cpu_flags(enum bpf_map_type map_type)
 	case BPF_MAP_TYPE_PERCPU_ARRAY:
 	case BPF_MAP_TYPE_PERCPU_HASH:
 	case BPF_MAP_TYPE_LRU_PERCPU_HASH:
+	case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE:
 		return true;
 	default:
 		return false;
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index c93a756e035c0..6887a78b4823a 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -180,7 +180,7 @@ static long cgroup_storage_update_elem(struct bpf_map *map, void *key,
 }
 
 int bpf_percpu_cgroup_storage_copy(struct bpf_map *_map, void *key,
-				   void *value)
+				   void *value, u64 map_flags)
 {
 	struct bpf_cgroup_storage_map *map = map_to_storage(_map);
 	struct bpf_cgroup_storage *storage;
@@ -199,11 +199,17 @@ int bpf_percpu_cgroup_storage_copy(struct bpf_map *_map, void *key,
 	 * will not leak any kernel data
 	 */
 	size = round_up(_map->value_size, 8);
+	if (map_flags & BPF_F_CPU) {
+		cpu = map_flags >> 32;
+		bpf_long_memcpy(value, per_cpu_ptr(storage->percpu_buf, cpu), size);
+		goto unlock;
+	}
 	for_each_possible_cpu(cpu) {
 		bpf_long_memcpy(value + off,
 				per_cpu_ptr(storage->percpu_buf, cpu), size);
 		off += size;
 	}
+unlock:
 	rcu_read_unlock();
 	return 0;
 }
@@ -216,7 +222,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *_map, void *key,
 	int cpu, off = 0;
 	u32 size;
 
-	if (map_flags != BPF_ANY && map_flags != BPF_EXIST)
+	if ((u32)map_flags & ~(BPF_ANY | BPF_EXIST | BPF_F_CPU | BPF_F_ALL_CPUS))
 		return -EINVAL;
 
 	rcu_read_lock();
@@ -233,11 +239,21 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *_map, void *key,
 	 * so no kernel data leaks possible
 	 */
 	size = round_up(_map->value_size, 8);
+	if (map_flags & BPF_F_CPU) {
+		cpu = map_flags >> 32;
+		bpf_long_memcpy(per_cpu_ptr(storage->percpu_buf, cpu), value, size);
+		goto unlock;
+	}
 	for_each_possible_cpu(cpu) {
 		bpf_long_memcpy(per_cpu_ptr(storage->percpu_buf, cpu),
 				value + off, size);
-		off += size;
+		/* same user-provided value is used if BPF_F_ALL_CPUS is
+		 * specified, otherwise value is an array of per-CPU values.
+		 */
+		if (!(map_flags & BPF_F_ALL_CPUS))
+			off += size;
 	}
+unlock:
 	rcu_read_unlock();
 	return 0;
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 576b759da0101..a0d399b8a6163 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -318,7 +318,7 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
 	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
 		err = bpf_percpu_array_copy(map, key, value, flags);
 	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
-		err = bpf_percpu_cgroup_storage_copy(map, key, value);
+		err = bpf_percpu_cgroup_storage_copy(map, key, value, flags);
 	} else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
 		err = bpf_stackmap_copy(map, key, value);
 	} else if (IS_FD_ARRAY(map) || IS_FD_PROG_ARRAY(map)) {
-- 
2.50.1



* [PATCH bpf-next v7 6/7] libbpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu maps
  2025-09-10 16:27 [PATCH bpf-next v7 0/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps Leon Hwang
                   ` (4 preceding siblings ...)
  2025-09-10 16:27 ` [PATCH bpf-next v7 5/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_cgroup_storage maps Leon Hwang
@ 2025-09-10 16:27 ` Leon Hwang
  2025-09-16 23:44   ` Andrii Nakryiko
  2025-09-10 16:27 ` [PATCH bpf-next v7 7/7] selftests/bpf: Add cases to test BPF_F_CPU and BPF_F_ALL_CPUS flags Leon Hwang
  6 siblings, 1 reply; 29+ messages in thread
From: Leon Hwang @ 2025-09-10 16:27 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87, dxu,
	deso, leon.hwang, kernel-patches-bot

Add libbpf support for the BPF_F_CPU flag for percpu maps by embedding the
cpu info into the high 32 bits of:

1. **flags**: bpf_map_lookup_elem_flags(), bpf_map__lookup_elem(),
   bpf_map_update_elem() and bpf_map__update_elem()
2. **opts->elem_flags**: bpf_map_lookup_batch() and
   bpf_map_update_batch()

The flag can also be BPF_F_ALL_CPUS, but BPF_F_CPU and BPF_F_ALL_CPUS
cannot be set together.

Behavior:

* With BPF_F_ALL_CPUS, an update applies the single value to all CPUs.
* With BPF_F_CPU, an update writes the value only to the specified CPU.
* With BPF_F_CPU, a lookup reads the value only from the specified CPU.
* Lookup does not support BPF_F_ALL_CPUS.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
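The value-size rule that this patch adds to validate_map_op() can be
sketched as below. This is an illustration under stated assumptions, not the
libbpf code: ex_percpu_value_sz() is a hypothetical helper, the flag bit
positions are assumed, and a return value of 0 stands in for the -EINVAL
path taken when both flags are set.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define EX_BPF_F_CPU      (1ULL << 3)
#define EX_BPF_F_ALL_CPUS (1ULL << 4)

/* Hypothetical helper: expected size of the user buffer for one element
 * of a per-CPU map. Returns 0 if the flags are mutually exclusive.
 */
static inline size_t ex_percpu_value_sz(size_t value_size, int num_cpu,
					uint64_t flags)
{
	/* per-CPU slots are padded to a multiple of 8 bytes */
	size_t elem_sz = (value_size + 7) & ~(size_t)7;

	if ((flags & EX_BPF_F_CPU) && (flags & EX_BPF_F_ALL_CPUS))
		return 0;
	/* a single slot is enough when one CPU or all CPUs share a value */
	if (flags & (EX_BPF_F_CPU | EX_BPF_F_ALL_CPUS))
		return elem_sz;
	/* otherwise the buffer holds one slot per possible CPU */
	return (size_t)num_cpu * elem_sz;
}
```

With a 4-byte value on 8 possible CPUs, the buffer shrinks from 64 bytes to
8 bytes once either flag is set.
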
 tools/lib/bpf/bpf.h    |  8 ++++++++
 tools/lib/bpf/libbpf.c | 26 ++++++++++++++++++++------
 tools/lib/bpf/libbpf.h | 21 ++++++++-------------
 3 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 7252150e7ad35..28acb15e982b3 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -286,6 +286,14 @@ LIBBPF_API int bpf_map_lookup_and_delete_batch(int fd, void *in_batch,
  *    Update spin_lock-ed map elements. This must be
  *    specified if the map value contains a spinlock.
  *
+ * **BPF_F_CPU**
+ *    For percpu maps, update the value on the specified CPU. The cpu
+ *    info is embedded into the high 32 bits of **opts->elem_flags**.
+ *
+ * **BPF_F_ALL_CPUS**
+ *    For percpu maps, update the value across all CPUs. This flag
+ *    cannot be combined with **BPF_F_CPU**.
+ *
  * @param fd BPF map file descriptor
  * @param keys pointer to an array of *count* keys
  * @param values pointer to an array of *count* values
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index fe4fc5438678c..3d60e7a713518 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10603,7 +10603,7 @@ bpf_object__find_map_fd_by_name(const struct bpf_object *obj, const char *name)
 }
 
 static int validate_map_op(const struct bpf_map *map, size_t key_sz,
-			   size_t value_sz, bool check_value_sz)
+			   size_t value_sz, bool check_value_sz, __u64 flags)
 {
 	if (!map_is_created(map)) /* map is not yet created */
 		return -ENOENT;
@@ -10630,6 +10630,20 @@ static int validate_map_op(const struct bpf_map *map, size_t key_sz,
 		int num_cpu = libbpf_num_possible_cpus();
 		size_t elem_sz = roundup(map->def.value_size, 8);
 
+		if (flags & (BPF_F_CPU | BPF_F_ALL_CPUS)) {
+			if ((flags & BPF_F_CPU) && (flags & BPF_F_ALL_CPUS)) {
+				pr_warn("map '%s': can't use BPF_F_CPU and BPF_F_ALL_CPUS at the same time\n",
+					map->name);
+				return -EINVAL;
+			}
+			if (value_sz != elem_sz) {
+				pr_warn("map '%s': unexpected value size %zu provided for per-CPU map, expected %zu\n",
+					map->name, value_sz, elem_sz);
+				return -EINVAL;
+			}
+			break;
+		}
+
 		if (value_sz != num_cpu * elem_sz) {
 			pr_warn("map '%s': unexpected value size %zu provided for per-CPU map, expected %d * %zu = %zd\n",
 				map->name, value_sz, num_cpu, elem_sz, num_cpu * elem_sz);
@@ -10654,7 +10668,7 @@ int bpf_map__lookup_elem(const struct bpf_map *map,
 {
 	int err;
 
-	err = validate_map_op(map, key_sz, value_sz, true);
+	err = validate_map_op(map, key_sz, value_sz, true, flags);
 	if (err)
 		return libbpf_err(err);
 
@@ -10667,7 +10681,7 @@ int bpf_map__update_elem(const struct bpf_map *map,
 {
 	int err;
 
-	err = validate_map_op(map, key_sz, value_sz, true);
+	err = validate_map_op(map, key_sz, value_sz, true, flags);
 	if (err)
 		return libbpf_err(err);
 
@@ -10679,7 +10693,7 @@ int bpf_map__delete_elem(const struct bpf_map *map,
 {
 	int err;
 
-	err = validate_map_op(map, key_sz, 0, false /* check_value_sz */);
+	err = validate_map_op(map, key_sz, 0, false /* check_value_sz */, flags);
 	if (err)
 		return libbpf_err(err);
 
@@ -10692,7 +10706,7 @@ int bpf_map__lookup_and_delete_elem(const struct bpf_map *map,
 {
 	int err;
 
-	err = validate_map_op(map, key_sz, value_sz, true);
+	err = validate_map_op(map, key_sz, value_sz, true, flags);
 	if (err)
 		return libbpf_err(err);
 
@@ -10704,7 +10718,7 @@ int bpf_map__get_next_key(const struct bpf_map *map,
 {
 	int err;
 
-	err = validate_map_op(map, key_sz, 0, false /* check_value_sz */);
+	err = validate_map_op(map, key_sz, 0, false /* check_value_sz */, 0);
 	if (err)
 		return libbpf_err(err);
 
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 2e91148d9b44d..f221dc5c6ba41 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1196,12 +1196,13 @@ LIBBPF_API struct bpf_map *bpf_map__inner_map(struct bpf_map *map);
  * @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
  * @param value pointer to memory in which looked up value will be stored
  * @param value_sz size in byte of value data memory; it has to match BPF map
- * definition's **value_size**. For per-CPU BPF maps value size has to be
- * a product of BPF map value size and number of possible CPUs in the system
- * (could be fetched with **libbpf_num_possible_cpus()**). Note also that for
- * per-CPU values value size has to be aligned up to closest 8 bytes for
- * alignment reasons, so expected size is: `round_up(value_size, 8)
- * * libbpf_num_possible_cpus()`.
+ * definition's **value_size**. For per-CPU BPF maps, value size can be
+ * `round_up(value_size, 8)` if **BPF_F_CPU** or **BPF_F_ALL_CPUS** is
+ * specified in **flags**, otherwise a product of BPF map value size and number
+ * of possible CPUs in the system (could be fetched with
+ * **libbpf_num_possible_cpus()**). Note also that for per-CPU values value
+ * size has to be aligned up to closest 8 bytes, so expected size is:
+ * `round_up(value_size, 8) * libbpf_num_possible_cpus()`.
  * @flags extra flags passed to kernel for this operation
  * @return 0, on success; negative error, otherwise
  *
@@ -1219,13 +1220,7 @@ LIBBPF_API int bpf_map__lookup_elem(const struct bpf_map *map,
  * @param key pointer to memory containing bytes of the key
  * @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
  * @param value pointer to memory containing bytes of the value
- * @param value_sz size in byte of value data memory; it has to match BPF map
- * definition's **value_size**. For per-CPU BPF maps value size has to be
- * a product of BPF map value size and number of possible CPUs in the system
- * (could be fetched with **libbpf_num_possible_cpus()**). Note also that for
- * per-CPU values value size has to be aligned up to closest 8 bytes for
- * alignment reasons, so expected size is: `round_up(value_size, 8)
- * * libbpf_num_possible_cpus()`.
+ * @param value_sz refer to **bpf_map__lookup_elem**'s description.
  * @flags extra flags passed to kernel for this operation
  * @return 0, on success; negative error, otherwise
  *
-- 
2.50.1



* [PATCH bpf-next v7 7/7] selftests/bpf: Add cases to test BPF_F_CPU and BPF_F_ALL_CPUS flags
  2025-09-10 16:27 [PATCH bpf-next v7 0/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps Leon Hwang
                   ` (5 preceding siblings ...)
  2025-09-10 16:27 ` [PATCH bpf-next v7 6/7] libbpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu maps Leon Hwang
@ 2025-09-10 16:27 ` Leon Hwang
  6 siblings, 0 replies; 29+ messages in thread
From: Leon Hwang @ 2025-09-10 16:27 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87, dxu,
	deso, leon.hwang, kernel-patches-bot

Add test coverage for the new BPF_F_CPU and BPF_F_ALL_CPUS flags support
in percpu maps. The following APIs are exercised:

* bpf_map_update_batch()
* bpf_map_lookup_batch()
* bpf_map_update_elem()
* bpf_map__update_elem()
* bpf_map_lookup_elem_flags()
* bpf_map__lookup_elem()

cd tools/testing/selftests/bpf/
./test_progs -t percpu_alloc
253/13  percpu_alloc/cpu_flag_percpu_array:OK
253/14  percpu_alloc/cpu_flag_percpu_hash:OK
253/15  percpu_alloc/cpu_flag_lru_percpu_hash:OK
253/16  percpu_alloc/cpu_flag_percpu_cgroup_storage:OK
253     percpu_alloc:OK
Summary: 1/16 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 .../selftests/bpf/prog_tests/percpu_alloc.c   | 233 ++++++++++++++++++
 .../selftests/bpf/progs/percpu_alloc_array.c  |  32 +++
 2 files changed, 265 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
index 343da65864d6d..fcc51e2a325b4 100644
--- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
+++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
+#include "cgroup_helpers.h"
 #include "percpu_alloc_array.skel.h"
 #include "percpu_alloc_cgrp_local_storage.skel.h"
 #include "percpu_alloc_fail.skel.h"
@@ -115,6 +116,230 @@ static void test_failure(void) {
 	RUN_TESTS(percpu_alloc_fail);
 }
 
+static void test_percpu_map_op_cpu_flag(struct bpf_map *map, void *keys, size_t key_sz,
+					u32 max_entries, bool test_batch)
+{
+	int i, j, cpu, map_fd, value_size, nr_cpus, err;
+	u64 *values = NULL, batch = 0, flags;
+	const u64 value = 0xDEADC0DE;
+	size_t value_sz = sizeof(u64);
+	u32 count = max_entries;
+	LIBBPF_OPTS(bpf_map_batch_opts, batch_opts);
+
+	nr_cpus = libbpf_num_possible_cpus();
+	if (!ASSERT_GT(nr_cpus, 0, "libbpf_num_possible_cpus"))
+		return;
+
+	value_size = value_sz * nr_cpus;
+	values = calloc(max_entries, value_size);
+	if (!ASSERT_OK_PTR(values, "calloc values"))
+		goto out;
+	memset(values, 0, value_size * max_entries);
+
+	map_fd = bpf_map__fd(map);
+	flags = BPF_F_CPU | BPF_F_ALL_CPUS;
+	err = bpf_map_lookup_elem_flags(map_fd, keys, values, flags);
+	if (!ASSERT_ERR(err, "bpf_map_lookup_elem_flags err"))
+		goto out;
+
+	err = bpf_map_update_elem(map_fd, keys, values, flags);
+	if (!ASSERT_ERR(err, "bpf_map_update_elem err"))
+		goto out;
+
+	flags = (u64)nr_cpus << 32 | BPF_F_CPU;
+	err = bpf_map_update_elem(map_fd, keys, values, flags);
+	if (!ASSERT_EQ(err, -ERANGE, "bpf_map_update_elem -ERANGE"))
+		goto out;
+
+	err = bpf_map__update_elem(map, keys, key_sz, values, value_sz, flags);
+	if (!ASSERT_EQ(err, -ERANGE, "bpf_map__update_elem -ERANGE"))
+		goto out;
+
+	err = bpf_map_lookup_elem_flags(map_fd, keys, values, flags);
+	if (!ASSERT_EQ(err, -ERANGE, "bpf_map_lookup_elem_flags -ERANGE"))
+		goto out;
+
+	err = bpf_map__lookup_elem(map, keys, key_sz, values, value_sz, flags);
+	if (!ASSERT_EQ(err, -ERANGE, "bpf_map__lookup_elem -ERANGE"))
+		goto out;
+
+	for (cpu = 0; cpu < nr_cpus; cpu++) {
+		/* clear value on all cpus */
+		values[0] = 0;
+		flags = BPF_F_ALL_CPUS;
+		for (i = 0; i < max_entries; i++) {
+			err = bpf_map__update_elem(map, keys + i * key_sz, key_sz, values,
+						   value_sz, flags);
+			if (!ASSERT_OK(err, "bpf_map__update_elem all_cpus"))
+				goto out;
+		}
+
+		/* update value on specified cpu */
+		for (i = 0; i < max_entries; i++) {
+			values[0] = value;
+			flags = (u64)cpu << 32 | BPF_F_CPU;
+			err = bpf_map__update_elem(map, keys + i * key_sz, key_sz, values,
+						   value_sz, flags);
+			if (!ASSERT_OK(err, "bpf_map__update_elem specified cpu"))
+				goto out;
+
+			/* lookup then check value on CPUs */
+			for (j = 0; j < nr_cpus; j++) {
+				flags = (u64)j << 32 | BPF_F_CPU;
+				err = bpf_map__lookup_elem(map, keys + i * key_sz, key_sz, values,
+							   value_sz, flags);
+				if (!ASSERT_OK(err, "bpf_map__lookup_elem specified cpu"))
+					goto out;
+				if (!ASSERT_EQ(values[0], j != cpu ? 0 : value,
+					       "bpf_map__lookup_elem value on specified cpu"))
+					goto out;
+			}
+		}
+	}
+
+	if (!test_batch)
+		goto out;
+
+	batch_opts.elem_flags = (u64)nr_cpus << 32 | BPF_F_CPU;
+	err = bpf_map_update_batch(map_fd, keys, values, &max_entries, &batch_opts);
+	if (!ASSERT_EQ(err, -ERANGE, "bpf_map_update_batch -ERANGE"))
+		goto out;
+
+	for (cpu = 0; cpu < nr_cpus; cpu++) {
+		memset(values, 0, max_entries * value_size);
+
+		/* clear values across all CPUs */
+		batch_opts.elem_flags = BPF_F_ALL_CPUS;
+		err = bpf_map_update_batch(map_fd, keys, values, &max_entries, &batch_opts);
+		if (!ASSERT_OK(err, "bpf_map_update_batch all_cpus"))
+			goto out;
+
+		/* update values on specified CPU */
+		for (i = 0; i < max_entries; i++)
+			values[i] = value;
+
+		batch_opts.elem_flags = (u64)cpu << 32 | BPF_F_CPU;
+		err = bpf_map_update_batch(map_fd, keys, values, &max_entries, &batch_opts);
+		if (!ASSERT_OK(err, "bpf_map_update_batch specified cpu"))
+			goto out;
+
+		/* lookup values on specified CPU */
+		memset(values, 0, max_entries * value_sz);
+		err = bpf_map_lookup_batch(map_fd, NULL, &batch, keys, values, &count, &batch_opts);
+		if (!ASSERT_TRUE(!err || err == -ENOENT, "bpf_map_lookup_batch specified cpu"))
+			goto out;
+
+		for (i = 0; i < max_entries; i++)
+			if (!ASSERT_EQ(values[i], value, "value on specified cpu"))
+				goto out;
+
+		/* lookup values from all CPUs */
+		batch_opts.elem_flags = 0;
+		memset(values, 0, max_entries * value_size);
+		err = bpf_map_lookup_batch(map_fd, NULL, &batch, keys, values, &count, &batch_opts);
+		if (!ASSERT_TRUE(!err || err == -ENOENT, "bpf_map_lookup_batch all_cpus"))
+			goto out;
+
+		for (i = 0; i < max_entries; i++) {
+			for (j = 0; j < nr_cpus; j++) {
+				if (!ASSERT_EQ(values[i*nr_cpus + j], j != cpu ? 0 : value,
+					       "value on specified cpu"))
+					goto out;
+			}
+		}
+	}
+
+out:
+	if (values)
+		free(values);
+}
+
+static void test_percpu_map_cpu_flag(enum bpf_map_type map_type)
+{
+	struct percpu_alloc_array *skel;
+	size_t key_sz = sizeof(int);
+	int *keys = NULL, i, err;
+	struct bpf_map *map;
+	u32 max_entries;
+
+	skel = percpu_alloc_array__open();
+	if (!ASSERT_OK_PTR(skel, "percpu_alloc_array__open"))
+		return;
+
+	map = skel->maps.percpu;
+	bpf_map__set_type(map, map_type);
+
+	err = percpu_alloc_array__load(skel);
+	if (!ASSERT_OK(err, "test_percpu_alloc__load"))
+		goto out;
+
+	max_entries = bpf_map__max_entries(map);
+	keys = calloc(max_entries, key_sz);
+	if (!ASSERT_OK_PTR(keys, "calloc keys"))
+		goto out;
+
+	for (i = 0; i < max_entries; i++)
+		keys[i] = i;
+
+	test_percpu_map_op_cpu_flag(map, keys, key_sz, max_entries, true);
+out:
+	if (keys)
+		free(keys);
+	percpu_alloc_array__destroy(skel);
+}
+
+static void test_percpu_array_cpu_flag(void)
+{
+	test_percpu_map_cpu_flag(BPF_MAP_TYPE_PERCPU_ARRAY);
+}
+
+static void test_percpu_hash_cpu_flag(void)
+{
+	test_percpu_map_cpu_flag(BPF_MAP_TYPE_PERCPU_HASH);
+}
+
+static void test_lru_percpu_hash_cpu_flag(void)
+{
+	test_percpu_map_cpu_flag(BPF_MAP_TYPE_LRU_PERCPU_HASH);
+}
+
+static void test_percpu_cgroup_storage_cpu_flag(void)
+{
+	struct bpf_cgroup_storage_key key;
+	struct percpu_alloc_array *skel;
+	int cgroup = -1, prog_fd, err;
+	struct bpf_map *map;
+
+	skel = percpu_alloc_array__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "percpu_alloc_array__open_and_load"))
+		return;
+
+	cgroup = create_and_get_cgroup("/cg_percpu");
+	if (!ASSERT_GE(cgroup, 0, "create_and_get_cgroup"))
+		goto out;
+
+	err = join_cgroup("/cg_percpu");
+	if (!ASSERT_OK(err, "join_cgroup"))
+		goto out;
+
+	prog_fd = bpf_program__fd(skel->progs.cgroup_egress);
+	err = bpf_prog_attach(prog_fd, cgroup, BPF_CGROUP_INET_EGRESS, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach"))
+		goto out;
+
+	map = skel->maps.percpu_cgroup_storage;
+	err = bpf_map_get_next_key(bpf_map__fd(map), NULL, &key);
+	if (!ASSERT_OK(err, "bpf_map_get_next_key"))
+		goto out;
+
+	test_percpu_map_op_cpu_flag(map, &key, sizeof(key), 1, false);
+out:
+	bpf_prog_detach2(-1, cgroup, BPF_CGROUP_INET_EGRESS);
+	close(cgroup);
+	cleanup_cgroup_environment();
+	percpu_alloc_array__destroy(skel);
+}
+
 void test_percpu_alloc(void)
 {
 	if (test__start_subtest("array"))
@@ -125,4 +350,12 @@ void test_percpu_alloc(void)
 		test_cgrp_local_storage();
 	if (test__start_subtest("failure_tests"))
 		test_failure();
+	if (test__start_subtest("cpu_flag_percpu_array"))
+		test_percpu_array_cpu_flag();
+	if (test__start_subtest("cpu_flag_percpu_hash"))
+		test_percpu_hash_cpu_flag();
+	if (test__start_subtest("cpu_flag_lru_percpu_hash"))
+		test_lru_percpu_hash_cpu_flag();
+	if (test__start_subtest("cpu_flag_percpu_cgroup_storage"))
+		test_percpu_cgroup_storage_cpu_flag();
 }
diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_array.c b/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
index 37c2d2608ec0b..427301909c349 100644
--- a/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
+++ b/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
@@ -187,4 +187,36 @@ int BPF_PROG(test_array_map_10)
 	return 0;
 }
 
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(max_entries, 2);
+	__type(key, int);
+	__type(value, u64);
+} percpu SEC(".maps");
+
+SEC("?fentry/bpf_fentry_test1")
+int BPF_PROG(test_percpu_array, int x)
+{
+	u64 value = 0xDEADC0DE;
+	int key = 0;
+
+	bpf_map_update_elem(&percpu, &key, &value, BPF_ANY);
+	return 0;
+}
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
+	__type(key, struct bpf_cgroup_storage_key);
+	__type(value, u64);
+} percpu_cgroup_storage SEC(".maps");
+
+SEC("cgroup_skb/egress")
+int cgroup_egress(struct __sk_buff *skb)
+{
+	u64 *val = bpf_get_local_storage(&percpu_cgroup_storage, 0);
+
+	__sync_fetch_and_add(val, 1);
+	return 1;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.50.1



* Re: [PATCH bpf-next v7 1/7] bpf: Introduce internal bpf_map_check_op_flags helper function
  2025-09-10 16:27 ` [PATCH bpf-next v7 1/7] bpf: Introduce internal bpf_map_check_op_flags helper function Leon Hwang
@ 2025-09-16 23:44   ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2025-09-16 23:44 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Wed, Sep 10, 2025 at 9:28 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> It is to unify map flags checking for lookup_elem, update_elem,
> lookup_batch and update_batch APIs.
>
> Therefore, it will be convenient to check BPF_F_CPU and BPF_F_ALL_CPUS
> flags in it for these APIs in next patch.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  include/linux/bpf.h  | 11 +++++++++++
>  kernel/bpf/syscall.c | 34 +++++++++++-----------------------
>  2 files changed, 22 insertions(+), 23 deletions(-)
>

lgtm

Acked-by: Andrii Nakryiko <andrii@kernel.org>

[...]


* Re: [PATCH bpf-next v7 2/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags
  2025-09-10 16:27 ` [PATCH bpf-next v7 2/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags Leon Hwang
@ 2025-09-16 23:44   ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2025-09-16 23:44 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Wed, Sep 10, 2025 at 9:28 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags and check them for
> following APIs:
>
> * 'map_lookup_elem()'
> * 'map_update_elem()'
> * 'generic_map_lookup_batch()'
> * 'generic_map_update_batch()'
>
> And, get the correct value size for these APIs.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  include/linux/bpf.h            | 23 ++++++++++++++++++++++-
>  include/uapi/linux/bpf.h       |  2 ++
>  kernel/bpf/syscall.c           | 31 +++++++++++++++++--------------
>  tools/include/uapi/linux/bpf.h |  2 ++
>  4 files changed, 43 insertions(+), 15 deletions(-)
>

lgtm

Acked-by: Andrii Nakryiko <andrii@kernel.org>


[...]


* Re: [PATCH bpf-next v7 3/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_array maps
  2025-09-10 16:27 ` [PATCH bpf-next v7 3/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_array maps Leon Hwang
@ 2025-09-16 23:44   ` Andrii Nakryiko
  2025-09-17 15:04     ` Leon Hwang
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2025-09-16 23:44 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Wed, Sep 10, 2025 at 9:28 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> Introduce support for the BPF_F_ALL_CPUS flag in percpu_array maps to
> allow updating values for all CPUs with a single value for both
> update_elem and update_batch APIs.
>
> Introduce support for the BPF_F_CPU flag in percpu_array maps to allow:
>
> * update value for specified CPU for both update_elem and update_batch
> APIs.
> * lookup value for specified CPU for both lookup_elem and lookup_batch
> APIs.
>
> The BPF_F_CPU flag is passed via:
>
> * map_flags of lookup_elem and update_elem APIs along with embedded cpu
> info.
> * elem_flags of lookup_batch and update_batch APIs along with embedded
> cpu info.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  include/linux/bpf.h   |  9 +++++++--
>  kernel/bpf/arraymap.c | 24 +++++++++++++++++++++---
>  kernel/bpf/syscall.c  |  2 +-
>  3 files changed, 29 insertions(+), 6 deletions(-)
>

[...]

>
> -int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
> +int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value, u64 map_flags)
>  {
>         struct bpf_array *array = container_of(map, struct bpf_array, map);
>         u32 index = *(u32 *)key;
> @@ -313,11 +313,18 @@ int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
>         size = array->elem_size;
>         rcu_read_lock();
>         pptr = array->pptrs[index & array->index_mask];
> +       if (map_flags & BPF_F_CPU) {
> +               cpu = map_flags >> 32;
> +               copy_map_value_long(map, value, per_cpu_ptr(pptr, cpu));
> +               check_and_init_map_value(map, value);
> +               goto unlock;

goto is not how I'd structure this logic, I think if/else is a more
logical structure here, but this works, I suppose...

> +       }
>         for_each_possible_cpu(cpu) {
>                 copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
>                 check_and_init_map_value(map, value + off);
>                 off += size;
>         }
> +unlock:
>         rcu_read_unlock();
>         return 0;
>  }
> @@ -390,7 +397,7 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
>         int cpu, off = 0;
>         u32 size;
>
> -       if (unlikely(map_flags > BPF_EXIST))
> +       if (unlikely((u32)map_flags > BPF_F_ALL_CPUS))

this will let through BPF_F_LOCK, no? which is not what you intended,
right? So you need to check for

(map_flags & BPF_F_LOCK) || (u32)map_flags > BPF_F_ALL_CPUS
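
To make the gap concrete, here is a small standalone sketch comparing the
two checks. The EX_-prefixed macros and ex_* helpers are hypothetical; the
BPF_F_CPU and BPF_F_ALL_CPUS bit positions are assumed, chosen only so that
BPF_F_LOCK sorts below BPF_F_ALL_CPUS as the comparison above implies.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_BPF_F_LOCK     (1ULL << 2)
/* Assumed bit positions, for illustration only. */
#define EX_BPF_F_CPU      (1ULL << 3)
#define EX_BPF_F_ALL_CPUS (1ULL << 4)

/* The range check as posted in v7: true means "reject with -EINVAL". */
static inline bool ex_rejects_v7(uint64_t map_flags)
{
	return (uint32_t)map_flags > EX_BPF_F_ALL_CPUS;
}

/* The check suggested above, which also rejects BPF_F_LOCK explicitly. */
static inline bool ex_rejects_suggested(uint64_t map_flags)
{
	return (map_flags & EX_BPF_F_LOCK) ||
	       (uint32_t)map_flags > EX_BPF_F_ALL_CPUS;
}
```

Because BPF_F_LOCK is numerically smaller than BPF_F_ALL_CPUS, a pure range
comparison accepts it; the extra mask test closes that hole while still
accepting flags whose CPU number lives in the high 32 bits.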

>                 /* unknown flags */
>                 return -EINVAL;
>

[...]


* Re: [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps
  2025-09-10 16:27 ` [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps Leon Hwang
@ 2025-09-16 23:44   ` Andrii Nakryiko
  2025-09-17 15:20     ` Leon Hwang
  2025-09-19  5:25     ` Leon Hwang
  0 siblings, 2 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2025-09-16 23:44 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Wed, Sep 10, 2025 at 9:28 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> Introduce BPF_F_ALL_CPUS flag support for percpu_hash and lru_percpu_hash
> maps to allow updating values for all CPUs with a single value for both
> update_elem and update_batch APIs.
>
> Introduce BPF_F_CPU flag support for percpu_hash and lru_percpu_hash
> maps to allow:
>
> * update value for specified CPU for both update_elem and update_batch
> APIs.
> * lookup value for specified CPU for both lookup_elem and lookup_batch
> APIs.
>
> The BPF_F_CPU flag is passed via:
>
> * map_flags along with embedded cpu info.
> * elem_flags along with embedded cpu info.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  include/linux/bpf.h  |  4 ++-
>  kernel/bpf/hashtab.c | 77 +++++++++++++++++++++++++++++++-------------
>  kernel/bpf/syscall.c |  2 +-
>  3 files changed, 58 insertions(+), 25 deletions(-)
>

[...]

> @@ -1147,7 +1158,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
>         }
>
>         l_new = alloc_htab_elem(htab, key, value, key_size, hash, false, false,
> -                               l_old);
> +                               l_old, map_flags);
>         if (IS_ERR(l_new)) {
>                 /* all pre-allocated elements are in use or memory exhausted */
>                 ret = PTR_ERR(l_new);
> @@ -1263,7 +1274,7 @@ static long htab_map_update_elem_in_place(struct bpf_map *map, void *key,
>         u32 key_size, hash;
>         int ret;
>
> -       if (unlikely(map_flags > BPF_EXIST))
> +       if (unlikely(!onallcpus && map_flags > BPF_EXIST))

BPF_F_LOCK shouldn't be let through

>                 /* unknown flags */
>                 return -EINVAL;
>

[...]

> @@ -1698,9 +1709,16 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>         int ret = 0;
>
>         elem_map_flags = attr->batch.elem_flags;
> -       if ((elem_map_flags & ~BPF_F_LOCK) ||
> -           ((elem_map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
> -               return -EINVAL;
> +       if (!do_delete && is_percpu) {
> +               ret = bpf_map_check_op_flags(map, elem_map_flags, BPF_F_LOCK | BPF_F_CPU);
> +               if (ret)
> +                       return ret;
> +       } else {
> +               if ((elem_map_flags & ~BPF_F_LOCK) ||
> +                   ((elem_map_flags & BPF_F_LOCK) &&
> +                    !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
> +                       return -EINVAL;
> +       }

partially open-coded bpf_map_check_op_flags() if `do_delete ||
!is_percpu`, right? Have you considered

u32 allowed_flags = 0;

...

allowed_flags = BPF_F_LOCK | BPF_F_CPU;
if (do_delete || !is_percpu)
    allowed_flags &= ~BPF_F_CPU;
err = bpf_map_check_op_flags(map, elem_map_flags, allowed_flags);


This reads way more naturally (in my head...), and avoids open-coding
the helper you just so painstakingly extracted and extended to check all
these conditions.
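
For reference, a minimal self-contained sketch of that mask-based check.
It is not the kernel code: check_op_flags() is a hypothetical stand-in
for bpf_map_check_op_flags() that only rejects flags outside the allowed
mask (the real helper checks more), and the numeric value used for
BPF_F_CPU is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define BPF_F_LOCK 4U /* existing UAPI value */
#define BPF_F_CPU  8U /* assumed value, for illustration only */

static_assert((BPF_F_LOCK & BPF_F_CPU) == 0, "flags must be distinct bits");

/* Hypothetical stand-in for bpf_map_check_op_flags(): reject any flag
 * in the low 32 bits that is not part of allowed_flags.
 */
int check_op_flags(uint64_t flags, uint32_t allowed_flags)
{
	if ((uint32_t)flags & ~allowed_flags)
		return -EINVAL;
	return 0;
}

/* Build the mask once, then clear BPF_F_CPU for the cases that do not
 * support it (lookup_and_delete_batch and non-percpu maps).
 */
int check_batch_elem_flags(uint64_t elem_flags, bool do_delete, bool is_percpu)
{
	uint32_t allowed_flags = BPF_F_LOCK | BPF_F_CPU;

	if (do_delete || !is_percpu)
		allowed_flags &= ~BPF_F_CPU;
	return check_op_flags(elem_flags, allowed_flags);
}
```

With this shape every caller goes through one validation path, and the
only per-case difference is which bits end up in allowed_flags.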

>
>         map_flags = attr->batch.flags;
>         if (map_flags)
> @@ -1724,7 +1742,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>         value_size = htab->map.value_size;
>         size = round_up(value_size, 8);
>         if (is_percpu)
> -               value_size = size * num_possible_cpus();
> +               value_size = (elem_map_flags & BPF_F_CPU) ? size : size * num_possible_cpus();

if (is_percpu && !(elem_map_flags & BPF_F_CPU))
    value_size = size * num_possible_cpus();

?
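
A small sketch comparing the two forms, assuming a stand-alone
round_up_8() in place of the kernel's round_up(x, 8), a num_cpus
parameter in place of num_possible_cpus(), and an illustrative value for
BPF_F_CPU. The function names are hypothetical. The two forms agree
except for a percpu map with BPF_F_CPU set, where the ternary yields the
8-byte-rounded size and the if-form leaves value_size unrounded; which
of the two is intended is not settled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BPF_F_CPU 8ULL /* assumed value, for illustration only */

static_assert((BPF_F_CPU >> 32) == 0, "flag bits must stay in the low 32 bits");

/* Round x up to the next multiple of 8. */
size_t round_up_8(size_t x)
{
	return (x + 7) & ~(size_t)7;
}

/* Form used in the patch: ternary inside the is_percpu branch. */
size_t value_size_patch(size_t value_size, bool is_percpu,
			uint64_t elem_flags, unsigned int num_cpus)
{
	size_t size = round_up_8(value_size);

	if (is_percpu)
		value_size = (elem_flags & BPF_F_CPU) ? size : size * num_cpus;
	return value_size;
}

/* Form suggested in review: a single combined condition. */
size_t value_size_suggested(size_t value_size, bool is_percpu,
			    uint64_t elem_flags, unsigned int num_cpus)
{
	size_t size = round_up_8(value_size);

	if (is_percpu && !(elem_flags & BPF_F_CPU))
		value_size = size * num_cpus;
	return value_size;
}
```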

>         total = 0;
>         /* while experimenting with hash tables with sizes ranging from 10 to
>          * 1000, it was observed that a bucket can have up to 5 entries.

[...]


* Re: [PATCH bpf-next v7 5/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_cgroup_storage maps
  2025-09-10 16:27 ` [PATCH bpf-next v7 5/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_cgroup_storage maps Leon Hwang
@ 2025-09-16 23:44   ` Andrii Nakryiko
  2025-09-17 15:07     ` Leon Hwang
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2025-09-16 23:44 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Wed, Sep 10, 2025 at 9:28 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> Introduce BPF_F_ALL_CPUS flag support for percpu_cgroup_storage maps to
> allow updating values for all CPUs with a single value for update_elem
> API.
>
> Introduce BPF_F_CPU flag support for percpu_cgroup_storage maps to
> allow:
>
> * update value for specified CPU for update_elem API.
> * lookup value for specified CPU for lookup_elem API.
>
> The BPF_F_CPU flag is passed via map_flags along with embedded cpu info.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  include/linux/bpf-cgroup.h |  4 ++--
>  include/linux/bpf.h        |  1 +
>  kernel/bpf/local_storage.c | 22 +++++++++++++++++++---
>  kernel/bpf/syscall.c       |  2 +-
>  4 files changed, 23 insertions(+), 6 deletions(-)
>

[...]

> @@ -216,7 +222,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *_map, void *key,
>         int cpu, off = 0;
>         u32 size;
>
> -       if (map_flags != BPF_ANY && map_flags != BPF_EXIST)
> +       if ((u32)map_flags & ~(BPF_ANY | BPF_EXIST | BPF_F_CPU | BPF_F_ALL_CPUS))
>                 return -EINVAL;

shouldn't bpf_map_check_op_flags() be used here to validate cpu number
and BPF_F_CPU and BPF_F_ALL_CPUS exclusivity?..

[...]


* Re: [PATCH bpf-next v7 6/7] libbpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu maps
  2025-09-10 16:27 ` [PATCH bpf-next v7 6/7] libbpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu maps Leon Hwang
@ 2025-09-16 23:44   ` Andrii Nakryiko
  2025-09-17 15:25     ` Leon Hwang
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2025-09-16 23:44 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Wed, Sep 10, 2025 at 9:28 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> Add libbpf support for the BPF_F_CPU flag for percpu maps by embedding the
> cpu info into the high 32 bits of:
>
> 1. **flags**: bpf_map_lookup_elem_flags(), bpf_map__lookup_elem(),
>    bpf_map_update_elem() and bpf_map__update_elem()
> 2. **opts->elem_flags**: bpf_map_lookup_batch() and
>    bpf_map_update_batch()
>
> And the flag can be BPF_F_ALL_CPUS, but cannot be
> 'BPF_F_CPU | BPF_F_ALL_CPUS'.
>
> Behavior:
>
> * If the flag is BPF_F_ALL_CPUS, the update is applied across all CPUs.
> * If the flag is BPF_F_CPU, it updates value only to the specified CPU.
> * If the flag is BPF_F_CPU, lookup value only from the specified CPU.
> * lookup does not support BPF_F_ALL_CPUS.
>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  tools/lib/bpf/bpf.h    |  8 ++++++++
>  tools/lib/bpf/libbpf.c | 26 ++++++++++++++++++++------
>  tools/lib/bpf/libbpf.h | 21 ++++++++-------------
>  3 files changed, 36 insertions(+), 19 deletions(-)
>

LGTM, but see some wording nits below

Acked-by: Andrii Nakryiko <andrii@kernel.org>
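
The flag encoding described in the commit message above can be sketched
as plain bit manipulation, with no libbpf dependency. The helper names
are hypothetical and the numeric values of BPF_F_CPU and BPF_F_ALL_CPUS
are assumptions for illustration; only the "cpu in the high 32 bits"
layout and the mutual exclusivity come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BPF_F_CPU      8ULL  /* assumed value, for illustration only */
#define BPF_F_ALL_CPUS 16ULL /* assumed value, for illustration only */

static_assert((BPF_F_CPU & BPF_F_ALL_CPUS) == 0, "flags must be distinct bits");

/* Hypothetical helper: flags that target a single CPU. The CPU number
 * goes into the high 32 bits, the flag bit stays in the low 32 bits.
 */
uint64_t flags_for_cpu(uint32_t cpu)
{
	return ((uint64_t)cpu << 32) | BPF_F_CPU;
}

/* Recover the CPU number, as the kernel side does with map_flags >> 32. */
uint32_t cpu_from_flags(uint64_t flags)
{
	return (uint32_t)(flags >> 32);
}

/* Mirror of the exclusivity check: BPF_F_CPU and BPF_F_ALL_CPUS must
 * not be set together.
 */
bool flags_are_valid(uint64_t flags)
{
	return !((flags & BPF_F_CPU) && (flags & BPF_F_ALL_CPUS));
}
```

A caller would pass the result of flags_for_cpu() as **flags** (or as
**opts->elem_flags** for the batch APIs) together with a value buffer of
round_up(value_size, 8) bytes rather than one slot per possible CPU.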

> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 7252150e7ad35..28acb15e982b3 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -286,6 +286,14 @@ LIBBPF_API int bpf_map_lookup_and_delete_batch(int fd, void *in_batch,
>   *    Update spin_lock-ed map elements. This must be
>   *    specified if the map value contains a spinlock.
>   *
> + * **BPF_F_CPU**
> + *    As for percpu maps, update value on the specified CPU. And the cpu
> + *    info is embedded into the high 32 bits of **opts->elem_flags**.
> + *
> + * **BPF_F_ALL_CPUS**
> + *    As for percpu maps, update value across all CPUs. This flag cannot
> + *    be used with BPF_F_CPU at the same time.
> + *
>   * @param fd BPF map file descriptor
>   * @param keys pointer to an array of *count* keys
>   * @param values pointer to an array of *count* values
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index fe4fc5438678c..3d60e7a713518 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -10603,7 +10603,7 @@ bpf_object__find_map_fd_by_name(const struct bpf_object *obj, const char *name)
>  }
>
>  static int validate_map_op(const struct bpf_map *map, size_t key_sz,
> -                          size_t value_sz, bool check_value_sz)
> +                          size_t value_sz, bool check_value_sz, __u64 flags)
>  {
>         if (!map_is_created(map)) /* map is not yet created */
>                 return -ENOENT;
> @@ -10630,6 +10630,20 @@ static int validate_map_op(const struct bpf_map *map, size_t key_sz,
>                 int num_cpu = libbpf_num_possible_cpus();
>                 size_t elem_sz = roundup(map->def.value_size, 8);
>
> +               if (flags & (BPF_F_CPU | BPF_F_ALL_CPUS)) {
> +                       if ((flags & BPF_F_CPU) && (flags & BPF_F_ALL_CPUS)) {
> +                               pr_warn("map '%s': can't use BPF_F_CPU and BPF_F_ALL_CPUS at the same time\n",

"BPF_F_CPU and BPF_F_ALL_CPUS are mutually exclusive" ?

> +                                       map->name);
> +                               return -EINVAL;
> +                       }
> +                       if (value_sz != elem_sz) {
> +                               pr_warn("map '%s': unexpected value size %zu provided for per-CPU map, expected %zu\n",
> +                                       map->name, value_sz, elem_sz);
> +                               return -EINVAL;
> +                       }
> +                       break;
> +               }
> +
>                 if (value_sz != num_cpu * elem_sz) {
>                         pr_warn("map '%s': unexpected value size %zu provided for per-CPU map, expected %d * %zu = %zd\n",
>                                 map->name, value_sz, num_cpu, elem_sz, num_cpu * elem_sz);

[...]

> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 2e91148d9b44d..f221dc5c6ba41 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -1196,12 +1196,13 @@ LIBBPF_API struct bpf_map *bpf_map__inner_map(struct bpf_map *map);
>   * @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
>   * @param value pointer to memory in which looked up value will be stored
>   * @param value_sz size in byte of value data memory; it has to match BPF map
> - * definition's **value_size**. For per-CPU BPF maps value size has to be
> - * a product of BPF map value size and number of possible CPUs in the system
> - * (could be fetched with **libbpf_num_possible_cpus()**). Note also that for
> - * per-CPU values value size has to be aligned up to closest 8 bytes for
> - * alignment reasons, so expected size is: `round_up(value_size, 8)
> - * * libbpf_num_possible_cpus()`.
> + * definition's **value_size**. For per-CPU BPF maps, value size can be
> + * `round_up(value_size, 8)` if **BPF_F_CPU** or **BPF_F_ALL_CPUS** is

nit: if either BPF_F_CPU or BPF_F_ALL_CPUS

> + * specified in **flags**, otherwise a product of BPF map value size and number
> + * of possible CPUs in the system (could be fetched with
> + * **libbpf_num_possible_cpus()**). Note else that for per-CPU values value

Note *also*? Is that what you were trying to say?


> + * size has to be aligned up to closest 8 bytes, so expected size is:
> + * `round_up(value_size, 8) * libbpf_num_possible_cpus()`.
>   * @flags extra flags passed to kernel for this operation
>   * @return 0, on success; negative error, otherwise
>   *
> @@ -1219,13 +1220,7 @@ LIBBPF_API int bpf_map__lookup_elem(const struct bpf_map *map,
>   * @param key pointer to memory containing bytes of the key
>   * @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
>   * @param value pointer to memory containing bytes of the value
> - * @param value_sz size in byte of value data memory; it has to match BPF map
> - * definition's **value_size**. For per-CPU BPF maps value size has to be
> - * a product of BPF map value size and number of possible CPUs in the system
> - * (could be fetched with **libbpf_num_possible_cpus()**). Note also that for
> - * per-CPU values value size has to be aligned up to closest 8 bytes for
> - * alignment reasons, so expected size is: `round_up(value_size, 8)
> - * * libbpf_num_possible_cpus()`.
> + * @param value_sz refer to **bpf_map__lookup_elem**'s description.'
>   * @flags extra flags passed to kernel for this operation
>   * @return 0, on success; negative error, otherwise
>   *
> --
> 2.50.1
>


* Re: [PATCH bpf-next v7 3/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_array maps
  2025-09-16 23:44   ` Andrii Nakryiko
@ 2025-09-17 15:04     ` Leon Hwang
  0 siblings, 0 replies; 29+ messages in thread
From: Leon Hwang @ 2025-09-17 15:04 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Wed Sep 17, 2025 at 7:44 AM +08, Andrii Nakryiko wrote:
> On Wed, Sep 10, 2025 at 9:28 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> Introduce support for the BPF_F_ALL_CPUS flag in percpu_array maps to
>> allow updating values for all CPUs with a single value for both
>> update_elem and update_batch APIs.
>>
>> Introduce support for the BPF_F_CPU flag in percpu_array maps to allow:
>>
>> * update value for specified CPU for both update_elem and update_batch
>> APIs.
>> * lookup value for specified CPU for both lookup_elem and lookup_batch
>> APIs.
>>
>> The BPF_F_CPU flag is passed via:
>>
>> * map_flags of lookup_elem and update_elem APIs along with embedded cpu
>> info.
>> * elem_flags of lookup_batch and update_batch APIs along with embedded
>> cpu info.
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>>  include/linux/bpf.h   |  9 +++++++--
>>  kernel/bpf/arraymap.c | 24 +++++++++++++++++++++---
>>  kernel/bpf/syscall.c  |  2 +-
>>  3 files changed, 29 insertions(+), 6 deletions(-)
>>
>
> [...]
>
>>
>> -int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
>> +int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value, u64 map_flags)
>>  {
>>         struct bpf_array *array = container_of(map, struct bpf_array, map);
>>         u32 index = *(u32 *)key;
>> @@ -313,11 +313,18 @@ int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
>>         size = array->elem_size;
>>         rcu_read_lock();
>>         pptr = array->pptrs[index & array->index_mask];
>> +       if (map_flags & BPF_F_CPU) {
>> +               cpu = map_flags >> 32;
>> +               copy_map_value_long(map, value, per_cpu_ptr(pptr, cpu));
>> +               check_and_init_map_value(map, value);
>> +               goto unlock;
>
> goto is not how I'd structure this logic, I think if/else is a more
> logical structure here, but this works, I suppose...
>

My intention is to avoid putting the existing code inside a new 'else'
block, even if it would only affect indentation.

This way, the original code block stays intact, and git-blame will still
point to the commit that introduced it.

>> +       }
>>         for_each_possible_cpu(cpu) {
>>                 copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
>>                 check_and_init_map_value(map, value + off);
>>                 off += size;
>>         }
>> +unlock:
>>         rcu_read_unlock();
>>         return 0;
>>  }
>> @@ -390,7 +397,7 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
>>         int cpu, off = 0;
>>         u32 size;
>>
>> -       if (unlikely(map_flags > BPF_EXIST))
>> +       if (unlikely((u32)map_flags > BPF_F_ALL_CPUS))
>
> this will let through BPF_F_LOCK, no? which is not what you intended,
> right? So you need to check for
>
> (map_flags & BPF_F_LOCK) || (u32)map_flags > BPF_F_ALL_CPUS
>

Right.

I'll update it in the next revision.
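
A minimal sketch of why the upper-bound comparison alone is not enough.
BPF_F_LOCK uses its existing UAPI value; the values of BPF_F_CPU and
BPF_F_ALL_CPUS are assumptions for illustration, and the function names
are hypothetical. The point only relies on BPF_F_LOCK being numerically
smaller than BPF_F_ALL_CPUS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BPF_F_LOCK     4U  /* existing UAPI value */
#define BPF_F_CPU      8U  /* assumed value, for illustration only */
#define BPF_F_ALL_CPUS 16U /* assumed value, for illustration only */

static_assert(BPF_F_LOCK < BPF_F_ALL_CPUS,
	      "the range check alone cannot reject BPF_F_LOCK");

/* Check as written in v7: only an upper bound on the low 32 bits. */
bool rejected_v7(uint64_t map_flags)
{
	return (uint32_t)map_flags > BPF_F_ALL_CPUS;
}

/* Check with the explicit BPF_F_LOCK test suggested in review. */
bool rejected_fixed(uint64_t map_flags)
{
	return (map_flags & BPF_F_LOCK) ||
	       (uint32_t)map_flags > BPF_F_ALL_CPUS;
}
```

The cast to 32 bits keeps the embedded cpu number in the high half from
tripping the comparison, so only the flag bits are range-checked.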

Thanks,
Leon


* Re: [PATCH bpf-next v7 5/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_cgroup_storage maps
  2025-09-16 23:44   ` Andrii Nakryiko
@ 2025-09-17 15:07     ` Leon Hwang
  2025-09-17 19:12       ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Hwang @ 2025-09-17 15:07 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Wed Sep 17, 2025 at 7:44 AM +08, Andrii Nakryiko wrote:
> On Wed, Sep 10, 2025 at 9:28 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> Introduce BPF_F_ALL_CPUS flag support for percpu_cgroup_storage maps to
>> allow updating values for all CPUs with a single value for update_elem
>> API.
>>
>> Introduce BPF_F_CPU flag support for percpu_cgroup_storage maps to
>> allow:
>>
>> * update value for specified CPU for update_elem API.
>> * lookup value for specified CPU for lookup_elem API.
>>
>> The BPF_F_CPU flag is passed via map_flags along with embedded cpu info.
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>>  include/linux/bpf-cgroup.h |  4 ++--
>>  include/linux/bpf.h        |  1 +
>>  kernel/bpf/local_storage.c | 22 +++++++++++++++++++---
>>  kernel/bpf/syscall.c       |  2 +-
>>  4 files changed, 23 insertions(+), 6 deletions(-)
>>
>
> [...]
>
>> @@ -216,7 +222,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *_map, void *key,
>>         int cpu, off = 0;
>>         u32 size;
>>
>> -       if (map_flags != BPF_ANY && map_flags != BPF_EXIST)
>> +       if ((u32)map_flags & ~(BPF_ANY | BPF_EXIST | BPF_F_CPU | BPF_F_ALL_CPUS))
>>                 return -EINVAL;
>
> shouldn't bpf_map_check_op_flags() be used here to validate cpu number
> and BPF_F_CPU and BPF_F_ALL_CPUS exclusivity?..
>

bpf_map_check_op_flags() is already called in
syscall.c::map_update_elem().

Thanks,
Leon


* Re: [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps
  2025-09-16 23:44   ` Andrii Nakryiko
@ 2025-09-17 15:20     ` Leon Hwang
  2025-09-17 19:11       ` Andrii Nakryiko
  2025-09-19  5:25     ` Leon Hwang
  1 sibling, 1 reply; 29+ messages in thread
From: Leon Hwang @ 2025-09-17 15:20 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Wed Sep 17, 2025 at 7:44 AM +08, Andrii Nakryiko wrote:
> On Wed, Sep 10, 2025 at 9:28 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> Introduce BPF_F_ALL_CPUS flag support for percpu_hash and lru_percpu_hash
>> maps to allow updating values for all CPUs with a single value for both
>> update_elem and update_batch APIs.
>>
>> Introduce BPF_F_CPU flag support for percpu_hash and lru_percpu_hash
>> maps to allow:
>>
>> * update value for specified CPU for both update_elem and update_batch
>> APIs.
>> * lookup value for specified CPU for both lookup_elem and lookup_batch
>> APIs.
>>
>> The BPF_F_CPU flag is passed via:
>>
>> * map_flags along with embedded cpu info.
>> * elem_flags along with embedded cpu info.
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>>  include/linux/bpf.h  |  4 ++-
>>  kernel/bpf/hashtab.c | 77 +++++++++++++++++++++++++++++++-------------
>>  kernel/bpf/syscall.c |  2 +-
>>  3 files changed, 58 insertions(+), 25 deletions(-)
>>
>
> [...]
>
>> @@ -1147,7 +1158,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
>>         }
>>
>>         l_new = alloc_htab_elem(htab, key, value, key_size, hash, false, false,
>> -                               l_old);
>> +                               l_old, map_flags);
>>         if (IS_ERR(l_new)) {
>>                 /* all pre-allocated elements are in use or memory exhausted */
>>                 ret = PTR_ERR(l_new);
>> @@ -1263,7 +1274,7 @@ static long htab_map_update_elem_in_place(struct bpf_map *map, void *key,
>>         u32 key_size, hash;
>>         int ret;
>>
>> -       if (unlikely(map_flags > BPF_EXIST))
>> +       if (unlikely(!onallcpus && map_flags > BPF_EXIST))
>
> BPF_F_LOCK shouldn't be let through
>

Ack.

>>                 /* unknown flags */
>>                 return -EINVAL;
>>
>
> [...]
>
>> @@ -1698,9 +1709,16 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>>         int ret = 0;
>>
>>         elem_map_flags = attr->batch.elem_flags;
>> -       if ((elem_map_flags & ~BPF_F_LOCK) ||
>> -           ((elem_map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
>> -               return -EINVAL;
>> +       if (!do_delete && is_percpu) {
>> +               ret = bpf_map_check_op_flags(map, elem_map_flags, BPF_F_LOCK | BPF_F_CPU);
>> +               if (ret)
>> +                       return ret;
>> +       } else {
>> +               if ((elem_map_flags & ~BPF_F_LOCK) ||
>> +                   ((elem_map_flags & BPF_F_LOCK) &&
>> +                    !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
>> +                       return -EINVAL;
>> +       }
>
> partially open-coded bpf_map_check_op_flags() if `do_delete ||
> !is_percpu`, right? Have you considered
>
> u32 allowed_flags = 0;
>
> ...
>
> allowed_flags = BPF_F_LOCK | BPF_F_CPU;
> if (do_delete || !is_percpu)
>     allowed_flags &= ~BPF_F_CPU;
> err = bpf_map_check_op_flags(map, elem_map_flags, allowed_flags);
>
>
> This reads way more natural (in my head...), and no open-coding the
> helper you just so painstakingly extracted and extended to check all
> these conditions.
>

My intention was to call bpf_map_check_op_flags() only for lookup_batch
on *percpu* hash maps, while excluding lookup_batch on non-percpu hash
maps and the lookup_and_delete_batch API.

I don’t think we should be checking op flags for non-percpu hash maps or
for lookup_and_delete_batch cases.

>>
>>         map_flags = attr->batch.flags;
>>         if (map_flags)
>> @@ -1724,7 +1742,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>>         value_size = htab->map.value_size;
>>         size = round_up(value_size, 8);
>>         if (is_percpu)
>> -               value_size = size * num_possible_cpus();
>> +               value_size = (elem_map_flags & BPF_F_CPU) ? size : size * num_possible_cpus();
>
> if (is_percpu && !(elem_map_flags & BPF_F_CPU))
>     value_size = size * num_possible_cpus();
>
> ?
>

Right, good catch.

Thanks,
Leon


* Re: [PATCH bpf-next v7 6/7] libbpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu maps
  2025-09-16 23:44   ` Andrii Nakryiko
@ 2025-09-17 15:25     ` Leon Hwang
  0 siblings, 0 replies; 29+ messages in thread
From: Leon Hwang @ 2025-09-17 15:25 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Wed Sep 17, 2025 at 7:44 AM +08, Andrii Nakryiko wrote:
> On Wed, Sep 10, 2025 at 9:28 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> Add libbpf support for the BPF_F_CPU flag for percpu maps by embedding the
>> cpu info into the high 32 bits of:
>>
>> 1. **flags**: bpf_map_lookup_elem_flags(), bpf_map__lookup_elem(),
>>    bpf_map_update_elem() and bpf_map__update_elem()
>> 2. **opts->elem_flags**: bpf_map_lookup_batch() and
>>    bpf_map_update_batch()
>>
>> And the flag can be BPF_F_ALL_CPUS, but cannot be
>> 'BPF_F_CPU | BPF_F_ALL_CPUS'.
>>
>> Behavior:
>>
>> * If the flag is BPF_F_ALL_CPUS, the update is applied across all CPUs.
>> * If the flag is BPF_F_CPU, it updates value only to the specified CPU.
>> * If the flag is BPF_F_CPU, lookup value only from the specified CPU.
>> * lookup does not support BPF_F_ALL_CPUS.
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>>  tools/lib/bpf/bpf.h    |  8 ++++++++
>>  tools/lib/bpf/libbpf.c | 26 ++++++++++++++++++++------
>>  tools/lib/bpf/libbpf.h | 21 ++++++++-------------
>>  3 files changed, 36 insertions(+), 19 deletions(-)
>>
>
> LGTM, but see some wording nits below
>
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
>

Thanks.

>> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
>> index 7252150e7ad35..28acb15e982b3 100644
>> --- a/tools/lib/bpf/bpf.h
>> +++ b/tools/lib/bpf/bpf.h
>> @@ -286,6 +286,14 @@ LIBBPF_API int bpf_map_lookup_and_delete_batch(int fd, void *in_batch,
>>   *    Update spin_lock-ed map elements. This must be
>>   *    specified if the map value contains a spinlock.
>>   *
>> + * **BPF_F_CPU**
>> + *    As for percpu maps, update value on the specified CPU. And the cpu
>> + *    info is embedded into the high 32 bits of **opts->elem_flags**.
>> + *
>> + * **BPF_F_ALL_CPUS**
>> + *    As for percpu maps, update value across all CPUs. This flag cannot
>> + *    be used with BPF_F_CPU at the same time.
>> + *
>>   * @param fd BPF map file descriptor
>>   * @param keys pointer to an array of *count* keys
>>   * @param values pointer to an array of *count* values
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index fe4fc5438678c..3d60e7a713518 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -10603,7 +10603,7 @@ bpf_object__find_map_fd_by_name(const struct bpf_object *obj, const char *name)
>>  }
>>
>>  static int validate_map_op(const struct bpf_map *map, size_t key_sz,
>> -                          size_t value_sz, bool check_value_sz)
>> +                          size_t value_sz, bool check_value_sz, __u64 flags)
>>  {
>>         if (!map_is_created(map)) /* map is not yet created */
>>                 return -ENOENT;
>> @@ -10630,6 +10630,20 @@ static int validate_map_op(const struct bpf_map *map, size_t key_sz,
>>                 int num_cpu = libbpf_num_possible_cpus();
>>                 size_t elem_sz = roundup(map->def.value_size, 8);
>>
>> +               if (flags & (BPF_F_CPU | BPF_F_ALL_CPUS)) {
>> +                       if ((flags & BPF_F_CPU) && (flags & BPF_F_ALL_CPUS)) {
>> +                               pr_warn("map '%s': can't use BPF_F_CPU and BPF_F_ALL_CPUS at the same time\n",
>
> "BPF_F_CPU and BPF_F_ALL_CPUS are mutually exclusive" ?
>

Ack.

>> +                                       map->name);
>> +                               return -EINVAL;
>> +                       }
>> +                       if (value_sz != elem_sz) {
>> +                               pr_warn("map '%s': unexpected value size %zu provided for per-CPU map, expected %zu\n",
>> +                                       map->name, value_sz, elem_sz);
>> +                               return -EINVAL;
>> +                       }
>> +                       break;
>> +               }
>> +
>>                 if (value_sz != num_cpu * elem_sz) {
>>                         pr_warn("map '%s': unexpected value size %zu provided for per-CPU map, expected %d * %zu = %zd\n",
>>                                 map->name, value_sz, num_cpu, elem_sz, num_cpu * elem_sz);
>
> [...]
>
>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>> index 2e91148d9b44d..f221dc5c6ba41 100644
>> --- a/tools/lib/bpf/libbpf.h
>> +++ b/tools/lib/bpf/libbpf.h
>> @@ -1196,12 +1196,13 @@ LIBBPF_API struct bpf_map *bpf_map__inner_map(struct bpf_map *map);
>>   * @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
>>   * @param value pointer to memory in which looked up value will be stored
>>   * @param value_sz size in byte of value data memory; it has to match BPF map
>> - * definition's **value_size**. For per-CPU BPF maps value size has to be
>> - * a product of BPF map value size and number of possible CPUs in the system
>> - * (could be fetched with **libbpf_num_possible_cpus()**). Note also that for
>> - * per-CPU values value size has to be aligned up to closest 8 bytes for
>> - * alignment reasons, so expected size is: `round_up(value_size, 8)
>> - * * libbpf_num_possible_cpus()`.
>> + * definition's **value_size**. For per-CPU BPF maps, value size can be
>> + * `round_up(value_size, 8)` if **BPF_F_CPU** or **BPF_F_ALL_CPUS** is
>
> nit: if either BPF_F_CPU or BPF_F_ALL_CPUS
>

Ack.

>> + * specified in **flags**, otherwise a product of BPF map value size and number
>> + * of possible CPUs in the system (could be fetched with
>> + * **libbpf_num_possible_cpus()**). Note else that for per-CPU values value
>
> Note *also*? Is that what you were trying to say?
>

My mistake.

I’ll change it back to *also* in the next revision.

Thanks,
Leon

[...]


* Re: [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps
  2025-09-17 15:20     ` Leon Hwang
@ 2025-09-17 19:11       ` Andrii Nakryiko
  2025-09-18 16:07         ` Leon Hwang
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2025-09-17 19:11 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Wed, Sep 17, 2025 at 8:20 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> On Wed Sep 17, 2025 at 7:44 AM +08, Andrii Nakryiko wrote:
> > On Wed, Sep 10, 2025 at 9:28 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >> Introduce BPF_F_ALL_CPUS flag support for percpu_hash and lru_percpu_hash
> >> maps to allow updating values for all CPUs with a single value for both
> >> update_elem and update_batch APIs.
> >>
> >> Introduce BPF_F_CPU flag support for percpu_hash and lru_percpu_hash
> >> maps to allow:
> >>
> >> * update value for specified CPU for both update_elem and update_batch
> >> APIs.
> >> * lookup value for specified CPU for both lookup_elem and lookup_batch
> >> APIs.
> >>
> >> The BPF_F_CPU flag is passed via:
> >>
> >> * map_flags along with embedded cpu info.
> >> * elem_flags along with embedded cpu info.
> >>
> >> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> >> ---
> >>  include/linux/bpf.h  |  4 ++-
> >>  kernel/bpf/hashtab.c | 77 +++++++++++++++++++++++++++++++-------------
> >>  kernel/bpf/syscall.c |  2 +-
> >>  3 files changed, 58 insertions(+), 25 deletions(-)
> >>
> >
> > [...]
> >
> >> @@ -1147,7 +1158,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
> >>         }
> >>
> >>         l_new = alloc_htab_elem(htab, key, value, key_size, hash, false, false,
> >> -                               l_old);
> >> +                               l_old, map_flags);
> >>         if (IS_ERR(l_new)) {
> >>                 /* all pre-allocated elements are in use or memory exhausted */
> >>                 ret = PTR_ERR(l_new);
> >> @@ -1263,7 +1274,7 @@ static long htab_map_update_elem_in_place(struct bpf_map *map, void *key,
> >>         u32 key_size, hash;
> >>         int ret;
> >>
> >> -       if (unlikely(map_flags > BPF_EXIST))
> >> +       if (unlikely(!onallcpus && map_flags > BPF_EXIST))
> >
> > BPF_F_LOCK shouldn't be let through
> >
>
> Ack.
>
> >>                 /* unknown flags */
> >>                 return -EINVAL;
> >>
> >
> > [...]
> >
> >> @@ -1698,9 +1709,16 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >>         int ret = 0;
> >>
> >>         elem_map_flags = attr->batch.elem_flags;
> >> -       if ((elem_map_flags & ~BPF_F_LOCK) ||
> >> -           ((elem_map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
> >> -               return -EINVAL;
> >> +       if (!do_delete && is_percpu) {
> >> +               ret = bpf_map_check_op_flags(map, elem_map_flags, BPF_F_LOCK | BPF_F_CPU);
> >> +               if (ret)
> >> +                       return ret;
> >> +       } else {
> >> +               if ((elem_map_flags & ~BPF_F_LOCK) ||
> >> +                   ((elem_map_flags & BPF_F_LOCK) &&
> >> +                    !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
> >> +                       return -EINVAL;
> >> +       }
> >
> > partially open-coded bpf_map_check_op_flags() if `do_delete ||
> > !is_percpu`, right? Have you considered
> >
> > u32 allowed_flags = 0;
> >
> > ...
> >
> > allowed_flags = BPF_F_LOCK | BPF_F_CPU;
> > if (do_delete || !is_percpu)
> >     allowed_flags &= ~BPF_F_CPU;
> > err = bpf_map_check_op_flags(map, elem_map_flags, allowed_flags);
> >
> >
> > This reads way more natural (in my head...), and no open-coding the
> > helper you just so painstakingly extracted and extended to check all
> > these conditions.
> >
>
> My intention was to call bpf_map_check_op_flags() only for lookup_batch
> on *percpu* hash maps, while excluding lookup_batch on non-percpu hash
> maps and the lookup_and_delete_batch API.
>
> I don’t think we should be checking op flags for non-percpu hash maps or
> for lookup_and_delete_batch cases.

Can you elaborate on why?

>
> >>
> >>         map_flags = attr->batch.flags;
> >>         if (map_flags)
> >> @@ -1724,7 +1742,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >>         value_size = htab->map.value_size;
> >>         size = round_up(value_size, 8);
> >>         if (is_percpu)
> >> -               value_size = size * num_possible_cpus();
> >> +               value_size = (elem_map_flags & BPF_F_CPU) ? size : size * num_possible_cpus();
> >
> > if (is_percpu && !(elem_map_flags & BPF_F_CPU))
> >     value_size = size * num_possible_cpus();
> >
> > ?
> >
>
> Right, good catch.
>
> Thanks,
> Leon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v7 5/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_cgroup_storage maps
  2025-09-17 15:07     ` Leon Hwang
@ 2025-09-17 19:12       ` Andrii Nakryiko
  2025-09-18 15:38         ` Leon Hwang
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2025-09-17 19:12 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Wed, Sep 17, 2025 at 8:08 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> On Wed Sep 17, 2025 at 7:44 AM +08, Andrii Nakryiko wrote:
> > On Wed, Sep 10, 2025 at 9:28 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >> Introduce BPF_F_ALL_CPUS flag support for percpu_cgroup_storage maps to
> >> allow updating values for all CPUs with a single value for update_elem
> >> API.
> >>
> >> Introduce BPF_F_CPU flag support for percpu_cgroup_storage maps to
> >> allow:
> >>
> >> * update value for specified CPU for update_elem API.
> >> * lookup value for specified CPU for lookup_elem API.
> >>
> >> The BPF_F_CPU flag is passed via map_flags along with embedded cpu info.
> >>
> >> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> >> ---
> >>  include/linux/bpf-cgroup.h |  4 ++--
> >>  include/linux/bpf.h        |  1 +
> >>  kernel/bpf/local_storage.c | 22 +++++++++++++++++++---
> >>  kernel/bpf/syscall.c       |  2 +-
> >>  4 files changed, 23 insertions(+), 6 deletions(-)
> >>
> >
> > [...]
> >
> >> @@ -216,7 +222,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *_map, void *key,
> >>         int cpu, off = 0;
> >>         u32 size;
> >>
> >> -       if (map_flags != BPF_ANY && map_flags != BPF_EXIST)
> >> +       if ((u32)map_flags & ~(BPF_ANY | BPF_EXIST | BPF_F_CPU | BPF_F_ALL_CPUS))
> >>                 return -EINVAL;
> >
> > shouldn't bpf_map_check_op_flags() be used here to validate cpu number
> > and BPF_F_CPU and BPF_F_ALL_CPUS exclusivity?..
> >
>
> bpf_map_check_op_flags() has been called in
> syscall.c::map_update_elem().

ah, I actually tried to double-check that by looking at earlier
patches, but still missed that. Never mind then.

>
> Thanks,
> Leon
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v7 5/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_cgroup_storage maps
  2025-09-17 19:12       ` Andrii Nakryiko
@ 2025-09-18 15:38         ` Leon Hwang
  0 siblings, 0 replies; 29+ messages in thread
From: Leon Hwang @ 2025-09-18 15:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

>> >> @@ -216,7 +222,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *_map, void *key,
>> >>         int cpu, off = 0;
>> >>         u32 size;
>> >>
>> >> -       if (map_flags != BPF_ANY && map_flags != BPF_EXIST)
>> >> +       if ((u32)map_flags & ~(BPF_ANY | BPF_EXIST | BPF_F_CPU | BPF_F_ALL_CPUS))
>> >>                 return -EINVAL;
>> >
>> > shouldn't bpf_map_check_op_flags() be used here to validate cpu number
>> > and BPF_F_CPU and BPF_F_ALL_CPUS exclusivity?..
>> >
>>
>> bpf_map_check_op_flags() has been called in
>> syscall.c::map_update_elem().
>
> ah, I actually tried to double-check that by looking at earlier
> patches, but still missed that. Never mind then.
>

Sorry for the earlier unclear explanation.

Let me restate:

1. Patch #1 introduces bpf_map_check_op_flags().
2. Patch #1 also updates map_update_elem() to call
   bpf_map_check_op_flags().
3. Patch #2 extends bpf_map_check_op_flags() to validate the CPU flags
   and CPU number.

When updating elements of percpu cgroup_storage maps, map_update_elem()
calls bpf_map_check_op_flags() before
bpf_percpu_cgroup_storage_update() is invoked.

So, the CPU flags and CPU number are already validated in
map_update_elem(), and don’t need to be re-checked here.
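
A minimal sketch of the kind of CPU-flag validation described above, kept
outside the kernel so it can be compiled on its own. Everything here is an
assumption for illustration: the SKETCH_* bit values, the placement of the
CPU number in the upper 32 bits of the flags word, the error codes, and the
helper name sketch_check_cpu_flags(). It is not the body of
bpf_map_check_op_flags().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative bit values only (assumption, not taken from the series). */
#define SKETCH_F_CPU      (1ULL << 3)
#define SKETCH_F_ALL_CPUS (1ULL << 4)

/*
 * Hypothetical check: the two CPU flags are mutually exclusive, a CPU
 * number may only be embedded together with SKETCH_F_CPU, and the CPU
 * number must be below nr_cpus.
 */
static int sketch_check_cpu_flags(uint64_t flags, uint32_t nr_cpus)
{
	/* Assumed layout: CPU number lives in the upper 32 bits. */
	uint32_t cpu = (uint32_t)(flags >> 32);

	if ((flags & SKETCH_F_CPU) && (flags & SKETCH_F_ALL_CPUS))
		return -EINVAL;
	if (!(flags & SKETCH_F_CPU) && cpu != 0)
		return -EINVAL;
	if ((flags & SKETCH_F_CPU) && cpu >= nr_cpus)
		return -ERANGE;
	return 0;
}
```

Because a check like this runs once at the syscall entry point, the per-map
update function only needs to reject unknown flag bits.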

Thanks,
Leon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps
  2025-09-17 19:11       ` Andrii Nakryiko
@ 2025-09-18 16:07         ` Leon Hwang
  2025-09-18 19:52           ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Hwang @ 2025-09-18 16:07 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

>> >> @@ -1698,9 +1709,16 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>> >>         int ret = 0;
>> >>
>> >>         elem_map_flags = attr->batch.elem_flags;
>> >> -       if ((elem_map_flags & ~BPF_F_LOCK) ||
>> >> -           ((elem_map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
>> >> -               return -EINVAL;
>> >> +       if (!do_delete && is_percpu) {
>> >> +               ret = bpf_map_check_op_flags(map, elem_map_flags, BPF_F_LOCK | BPF_F_CPU);
>> >> +               if (ret)
>> >> +                       return ret;
>> >> +       } else {
>> >> +               if ((elem_map_flags & ~BPF_F_LOCK) ||
>> >> +                   ((elem_map_flags & BPF_F_LOCK) &&
>> >> +                    !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
>> >> +                       return -EINVAL;
>> >> +       }
>> >
>> > partially open-coded bpf_map_check_op_flags() if `do_delete ||
>> > !is_percpu`, right? Have you considered
>> >
>> > u32 allowed_flags = 0;
>> >
>> > ...
>> >
>> > allowed_flags = BPF_F_LOCK | BPF_F_CPU;
>> > if (do_delete || !is_percpu)
>> >     allowed_flags &= ~BPF_F_CPU;
>> > err = bpf_map_check_op_flags(map, elem_map_flags, allowed_flags);
>> >
>> >
>> > This reads way more natural (in my head...), and no open-coding the
>> > helper you just so painstakingly extracted and extended to check all
>> > these conditions.
>> >
>>
>> My intention was to call bpf_map_check_op_flags() only for lookup_batch
>> on *percpu* hash maps, while excluding lookup_batch on non-percpu hash
>> maps and the lookup_and_delete_batch API.
>>
>> I don’t think we should be checking op flags for non-percpu hash maps or
>> for lookup_and_delete_batch cases.
>
> Can you elaborate on why?
>

I’ve reconsidered your suggestion, and I agree.

With your approach, CPU flags and the CPU number won’t be checked when
'(do_delete || !is_percpu)', which makes sense.

I’d like to update the code as follows:

allowed_flags = BPF_F_LOCK;
if (!do_delete && is_percpu)
    allowed_flags |= BPF_F_CPU;
err = bpf_map_check_op_flags(map, elem_map_flags, allowed_flags);

This way, CPU flags and the CPU number are only validated for the
lookup_batch API on percpu hash maps.
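
For illustration, the mask-building logic above can be exercised in a
standalone sketch. The SKETCH_* values and both helper names are
hypothetical; the stand-in checker only rejects flag bits outside the
allowed mask and omits the spin-lock and CPU-number checks that the real
bpf_map_check_op_flags() is said to perform.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only (assumption, not taken from the series). */
#define SKETCH_F_LOCK (1ULL << 2)
#define SKETCH_F_CPU  (1ULL << 3)

/* Hypothetical stand-in: reject any low-32-bit flag that is not allowed. */
static int sketch_check_op_flags(uint64_t flags, uint64_t allowed_flags)
{
	if ((uint32_t)flags & ~allowed_flags)
		return -EINVAL;
	return 0;
}

/* Build the allowed mask: the CPU flag is only accepted for
 * lookup_batch (no delete) on percpu hash maps. */
static uint64_t sketch_allowed_flags(bool do_delete, bool is_percpu)
{
	uint64_t allowed_flags = SKETCH_F_LOCK;

	if (!do_delete && is_percpu)
		allowed_flags |= SKETCH_F_CPU;
	return allowed_flags;
}
```

With this shape there is a single call to the checker on every path, and
the only thing that varies per path is the mask.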

Thanks,
Leon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps
  2025-09-18 16:07         ` Leon Hwang
@ 2025-09-18 19:52           ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2025-09-18 19:52 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Thu, Sep 18, 2025 at 9:07 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> >> >> @@ -1698,9 +1709,16 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >> >>         int ret = 0;
> >> >>
> >> >>         elem_map_flags = attr->batch.elem_flags;
> >> >> -       if ((elem_map_flags & ~BPF_F_LOCK) ||
> >> >> -           ((elem_map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
> >> >> -               return -EINVAL;
> >> >> +       if (!do_delete && is_percpu) {
> >> >> +               ret = bpf_map_check_op_flags(map, elem_map_flags, BPF_F_LOCK | BPF_F_CPU);
> >> >> +               if (ret)
> >> >> +                       return ret;
> >> >> +       } else {
> >> >> +               if ((elem_map_flags & ~BPF_F_LOCK) ||
> >> >> +                   ((elem_map_flags & BPF_F_LOCK) &&
> >> >> +                    !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
> >> >> +                       return -EINVAL;
> >> >> +       }
> >> >
> >> > partially open-coded bpf_map_check_op_flags() if `do_delete ||
> >> > !is_percpu`, right? Have you considered
> >> >
> >> > u32 allowed_flags = 0;
> >> >
> >> > ...
> >> >
> >> > allowed_flags = BPF_F_LOCK | BPF_F_CPU;
> >> > if (do_delete || !is_percpu)
> >> >     allowed_flags &= ~BPF_F_CPU;
> >> > err = bpf_map_check_op_flags(map, elem_map_flags, allowed_flags);
> >> >
> >> >
> >> > This reads way more natural (in my head...), and no open-coding the
> >> > helper you just so painstakingly extracted and extended to check all
> >> > these conditions.
> >> >
> >>
> >> My intention was to call bpf_map_check_op_flags() only for lookup_batch
> >> on *percpu* hash maps, while excluding lookup_batch on non-percpu hash
> >> maps and the lookup_and_delete_batch API.
> >>
> >> I don’t think we should be checking op flags for non-percpu hash maps or
> >> for lookup_and_delete_batch cases.
> >
> > Can you elaborate on why?
> >
>
> I’ve reconsidered your suggestion, and I agree.
>
> With your approach, CPU flags and the CPU number won’t be checked when
> '(do_delete || !is_percpu)', which makes sense.
>
> I’d like to update the code as follows:
>
> allowed_flags = BPF_F_LOCK;
> if (!do_delete && is_percpu)
>     allowed_flags |= BPF_F_CPU;
> err = bpf_map_check_op_flags(map, elem_map_flags, allowed_flags);
>

sure, lgtm

> This way, CPU flags and the CPU number are only validated for the
> lookup_batch API on percpu hash maps.
>
> Thanks,
> Leon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps
  2025-09-16 23:44   ` Andrii Nakryiko
  2025-09-17 15:20     ` Leon Hwang
@ 2025-09-19  5:25     ` Leon Hwang
  2025-09-19 22:31       ` Andrii Nakryiko
  1 sibling, 1 reply; 29+ messages in thread
From: Leon Hwang @ 2025-09-19  5:25 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot



>> @@ -1724,7 +1742,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>>         value_size = htab->map.value_size;
>>         size = round_up(value_size, 8);
>>         if (is_percpu)
>> -               value_size = size * num_possible_cpus();
>> +               value_size = (elem_map_flags & BPF_F_CPU) ? size : size * num_possible_cpus();
>
> if (is_percpu && !(elem_map_flags & BPF_F_CPU))
>     value_size = size * num_possible_cpus();
>
> ?
>

After looking at it again, I’d like to keep my approach.

When 'elem_map_flags & BPF_F_CPU' is set, 'value_size' has to be
assigned to 'size' ('round_up(value_size, 8)') instead of keeping
'htab->map.value_size'.

Thanks,
Leon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps
  2025-09-19  5:25     ` Leon Hwang
@ 2025-09-19 22:31       ` Andrii Nakryiko
  2025-09-22 14:50         ` Leon Hwang
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2025-09-19 22:31 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Thu, Sep 18, 2025 at 10:25 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> >> @@ -1724,7 +1742,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >>         value_size = htab->map.value_size;
> >>         size = round_up(value_size, 8);
> >>         if (is_percpu)
> >> -               value_size = size * num_possible_cpus();
> >> +               value_size = (elem_map_flags & BPF_F_CPU) ? size : size * num_possible_cpus();
> >
> > if (is_percpu && !(elem_map_flags & BPF_F_CPU))
> >     value_size = size * num_possible_cpus();
> >
> > ?
> >
>
> After looking at it again, I’d like to keep my approach.
>
> When 'elem_map_flags & BPF_F_CPU' is set, 'value_size' has to be
> assigned to 'size' ('round_up(value_size, 8)') instead of keeping
> 'htab->map.value_size'.
>

isn't that what will happen here as well? There is

size = round_up(value_size, 8);

right before that if

> Thanks,
> Leon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps
  2025-09-19 22:31       ` Andrii Nakryiko
@ 2025-09-22 14:50         ` Leon Hwang
  2025-09-22 16:13           ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Hwang @ 2025-09-22 14:50 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Sat Sep 20, 2025 at 6:31 AM +08, Andrii Nakryiko wrote:
> On Thu, Sep 18, 2025 at 10:25 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>>
>> >> @@ -1724,7 +1742,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>> >>         value_size = htab->map.value_size;
>> >>         size = round_up(value_size, 8);
>> >>         if (is_percpu)
>> >> -               value_size = size * num_possible_cpus();
>> >> +               value_size = (elem_map_flags & BPF_F_CPU) ? size : size * num_possible_cpus();
>> >
>> > if (is_percpu && !(elem_map_flags & BPF_F_CPU))
>> >     value_size = size * num_possible_cpus();
>> >
>> > ?
>> >
>>
>> After looking at it again, I’d like to keep my approach.
>>
>> When 'elem_map_flags & BPF_F_CPU' is set, 'value_size' has to be
>> assigned to 'size' ('round_up(value_size, 8)') instead of keeping
>> 'htab->map.value_size'.
>>
>
> isn't that what will happen here as well? There is
>
> size = round_up(value_size, 8);
>
> right before that if
>

As for percpu maps, both 'size' and 'value_size' need to be 8-byte
aligned here, because 'map.value_size' itself is not guaranteed to be
aligned.

In 'htab_map_alloc_check()', there is no alignment check for percpu
maps.

So 'map.value_size' can be unaligned.

Let's look at how 'value_size' is used:

values = kvmalloc_array(value_size, bucket_size, GFP_USER | __GFP_NOWARN);
dst_val = values;
hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
        if (is_percpu) {
                if (elem_map_flags & BPF_F_CPU) {
                        copy_map_value_long(&htab->map, dst_val, per_cpu_ptr(pptr, cpu));
                }
        }
        dst_val += value_size;
}
copy_to_user(uvalues + total * value_size, values,
             value_size * bucket_cnt)

Here, 'value_size' determines how values are laid out and copied.

As a result, when 'is_percpu && (elem_map_flags & BPF_F_CPU)',
'value_size' must be assigned to 'size' in order to make sure it's
8-byte aligned.
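
To make the difference between the two variants concrete, here is a
standalone sketch of the per-element stride each one produces. The helper
names are hypothetical, 'single_cpu' stands for
(elem_map_flags & BPF_F_CPU), and 'ncpus' stands for num_possible_cpus().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a multiple of 8, like round_up(x, 8). */
static uint32_t sketch_round_up8(uint32_t x)
{
	return (x + 7u) & ~7u;
}

/* Stride as in the v7 hunk: percpu maps always use the rounded size. */
static uint32_t sketch_stride_v7(uint32_t value_size, bool is_percpu,
				 bool single_cpu, uint32_t ncpus)
{
	uint32_t size = sketch_round_up8(value_size);

	if (is_percpu)
		return single_cpu ? size : size * ncpus;
	return value_size;
}

/* Stride as in the suggested rewrite: the single-CPU case keeps the
 * unrounded map value size. */
static uint32_t sketch_stride_alt(uint32_t value_size, bool is_percpu,
				  bool single_cpu, uint32_t ncpus)
{
	uint32_t size = sketch_round_up8(value_size);

	if (is_percpu && !single_cpu)
		return size * ncpus;
	return value_size;
}
```

For a 1-byte value on a percpu map with the single-CPU flag set, the first
helper yields a stride of 8 and the second a stride of 1; the two agree in
every other case.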

Thanks,
Leon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps
  2025-09-22 14:50         ` Leon Hwang
@ 2025-09-22 16:13           ` Andrii Nakryiko
  2025-09-23  2:45             ` Leon Hwang
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2025-09-22 16:13 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Mon, Sep 22, 2025 at 7:50 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> On Sat Sep 20, 2025 at 6:31 AM +08, Andrii Nakryiko wrote:
> > On Thu, Sep 18, 2025 at 10:25 PM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >>
> >>
> >> >> @@ -1724,7 +1742,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >> >>         value_size = htab->map.value_size;
> >> >>         size = round_up(value_size, 8);
> >> >>         if (is_percpu)
> >> >> -               value_size = size * num_possible_cpus();
> >> >> +               value_size = (elem_map_flags & BPF_F_CPU) ? size : size * num_possible_cpus();
> >> >
> >> > if (is_percpu && !(elem_map_flags & BPF_F_CPU))
> >> >     value_size = size * num_possible_cpus();
> >> >
> >> > ?
> >> >
> >>
> >> After looking at it again, I’d like to keep my approach.
> >>
> >> When 'elem_map_flags & BPF_F_CPU' is set, 'value_size' has to be
> >> assigned to 'size' ('round_up(value_size, 8)') instead of keeping
> >> 'htab->map.value_size'.
> >>
> >
> > isn't that what will happen here as well? There is
> >
> > size = round_up(value_size, 8);
> >
> > right before that if
> >
>
> As for percpu maps, both 'size' and 'value_size' need to be 8-byte
> aligned here, because 'map.value_size' itself is not guaranteed to be
> aligned.
>
> In 'htab_map_alloc_check()', there is no alignment check for percpu
> maps.
>
> So 'map.value_size' can be unaligned.
>
> Let's look at how 'value_size' is used:
>
> values = kvmalloc_array(value_size, bucket_size, GFP_USER | __GFP_NOWARN);
> dst_val = values;
> hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
>         if (is_percpu) {
>                 if (elem_map_flags & BPF_F_CPU) {
>                         copy_map_value_long(&htab->map, dst_val, per_cpu_ptr(pptr, cpu));
>                 }
>         }
>         dst_val += value_size;
> }
> copy_to_user(uvalues + total * value_size, values,
>              value_size * bucket_cnt)
>
> Here, 'value_size' determines how values are laid out and copied.
>

So in my mind (and maybe it's wrong, tell me), BPF_F_CPU turns a
per-CPU map lookup into an effectively non-per-cpu one. So I'm not
sure we need to do 8 byte alignment of value/key sizes when BPF_F_CPU
is specified.

But if people would like to keep 8 byte alignment anyways for
BPF_F_CPU, that's fine too, I guess.

> As a result, when 'is_percpu && (elem_map_flags & BPF_F_CPU)',
> 'value_size' must be assigned to 'size' in order to make sure it's
> 8-byte aligned.
>
> Thanks,
> Leon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps
  2025-09-22 16:13           ` Andrii Nakryiko
@ 2025-09-23  2:45             ` Leon Hwang
  2025-09-24 23:47               ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Hwang @ 2025-09-23  2:45 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot



On 23/9/25 00:13, Andrii Nakryiko wrote:
> On Mon, Sep 22, 2025 at 7:50 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> On Sat Sep 20, 2025 at 6:31 AM +08, Andrii Nakryiko wrote:
>>> On Thu, Sep 18, 2025 at 10:25 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>>>
>>>>
>>>>
>>>>>> @@ -1724,7 +1742,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>>>>>>         value_size = htab->map.value_size;
>>>>>>         size = round_up(value_size, 8);
>>>>>>         if (is_percpu)
>>>>>> -               value_size = size * num_possible_cpus();
>>>>>> +               value_size = (elem_map_flags & BPF_F_CPU) ? size : size * num_possible_cpus();
>>>>>
>>>>> if (is_percpu && !(elem_map_flags & BPF_F_CPU))
>>>>>     value_size = size * num_possible_cpus();
>>>>>
>>>>> ?
>>>>>
>>>>
>>>> After looking at it again, I’d like to keep my approach.
>>>>
>>>> When 'elem_map_flags & BPF_F_CPU' is set, 'value_size' has to be
>>>> assigned to 'size' ('round_up(value_size, 8)') instead of keeping
>>>> 'htab->map.value_size'.
>>>>
>>>
>>> isn't that what will happen here as well? There is
>>>
>>> size = round_up(value_size, 8);
>>>
>>> right before that if
>>>
>>
>> As for percpu maps, both 'size' and 'value_size' need to be 8-byte
>> aligned here, because 'map.value_size' itself is not guaranteed to be
>> aligned.
>>
>> In 'htab_map_alloc_check()', there is no alignment check for percpu
>> maps.
>>
>> So 'map.value_size' can be unaligned.
>>
>> Let's look at how 'value_size' is used:
>>
>> values = kvmalloc_array(value_size, bucket_size, GFP_USER | __GFP_NOWARN);
>> dst_val = values;
>> hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
>>         if (is_percpu) {
>>                 if (elem_map_flags & BPF_F_CPU) {
>>                         copy_map_value_long(&htab->map, dst_val, per_cpu_ptr(pptr, cpu));
>>                 }
>>         }
>>         dst_val += value_size;
>> }
>> copy_to_user(uvalues + total * value_size, values,
>>              value_size * bucket_cnt)
>>
>> Here, 'value_size' determines how values are laid out and copied.
>>
>
> So in my mind (and maybe it's wrong, tell me), BPF_F_CPU turns a
> per-CPU map lookup into an effectively non-per-cpu one. So I'm not
> sure we need to do 8 byte alignment of value/key sizes when BPF_F_CPU
> is specified.
>
> But if people would like to keep 8 byte alignment anyways for
> BPF_F_CPU, that's fine too, I guess.
>

'value_size' should be 8-byte aligned here.

For example, if 'value_size' is *1* when BPF_F_CPU is specified:

values = kvmalloc_array();  /* 5 bytes (value_size * bucket_size) memory */
copy_map_value_long();      /* copies 8 bytes, writing past the
                               allocated 5 bytes of memory */

To stay consistent with 'copy_map_value_long()', 'value_size' itself
needs to be 8-byte aligned.
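
The arithmetic behind the example can be checked with a small standalone
sketch. The helper name is hypothetical; it computes how many bytes of the
buffer are touched when 'count' elements are written 'stride' bytes apart
and each write is 'copy_len' bytes long.

```c
#include <assert.h>
#include <stddef.h>

/* Highest byte offset written, plus one: the last element starts at
 * (count - 1) * stride and its copy extends copy_len bytes from there. */
static size_t sketch_bytes_touched(size_t stride, size_t copy_len,
				   size_t count)
{
	if (count == 0)
		return 0;
	return (count - 1) * stride + copy_len;
}
```

With a stride of 1, an 8-byte copy and 5 elements, 12 bytes are touched
while only 5 were allocated. Either rounding the stride up to 8 (40 bytes
touched, 40 allocated) or copying exactly 1 byte per element (5 touched,
5 allocated) keeps the writes in bounds.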

That leaves us with two options:

1. Keep 'value_size' unaligned, switch 'copy_map_value_long()' to
   'copy_map_value()'.
2. Require 'value_size' to be 8-byte aligned.

WDYT?

Thanks,
Leon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps
  2025-09-23  2:45             ` Leon Hwang
@ 2025-09-24 23:47               ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2025-09-24 23:47 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, andrii, daniel, jolsa, yonghong.song, song, eddyz87,
	dxu, deso, kernel-patches-bot

On Mon, Sep 22, 2025 at 7:45 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 23/9/25 00:13, Andrii Nakryiko wrote:
> > On Mon, Sep 22, 2025 at 7:50 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >> On Sat Sep 20, 2025 at 6:31 AM +08, Andrii Nakryiko wrote:
> >>> On Thu, Sep 18, 2025 at 10:25 PM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>>>
> >>>>
> >>>>
> >>>>>> @@ -1724,7 +1742,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >>>>>>         value_size = htab->map.value_size;
> >>>>>>         size = round_up(value_size, 8);
> >>>>>>         if (is_percpu)
> >>>>>> -               value_size = size * num_possible_cpus();
> >>>>>> +               value_size = (elem_map_flags & BPF_F_CPU) ? size : size * num_possible_cpus();
> >>>>>
> >>>>> if (is_percpu && !(elem_map_flags & BPF_F_CPU))
> >>>>>     value_size = size * num_possible_cpus();
> >>>>>
> >>>>> ?
> >>>>>
> >>>>
> >>>> After looking at it again, I’d like to keep my approach.
> >>>>
> >>>> When 'elem_map_flags & BPF_F_CPU' is set, 'value_size' has to be
> >>>> assigned to 'size' ('round_up(value_size, 8)') instead of keeping
> >>>> 'htab->map.value_size'.
> >>>>
> >>>
> >>> isn't that what will happen here as well? There is
> >>>
> >>> size = round_up(value_size, 8);
> >>>
> >>> right before that if
> >>>
> >>
> >> As for percpu maps, both 'size' and 'value_size' need to be 8-byte
> >> aligned here, because 'map.value_size' itself is not guaranteed to be
> >> aligned.
> >>
> >> In 'htab_map_alloc_check()', there is no alignment check for percpu
> >> maps.
> >>
> >> So 'map.value_size' can be unaligned.
> >>
> >> Let's look at how 'value_size' is used:
> >>
> >> values = kvmalloc_array(value_size, bucket_size, GFP_USER | __GFP_NOWARN);
> >> dst_val = values;
> >> hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
> >>         if (is_percpu) {
> >>                 if (elem_map_flags & BPF_F_CPU) {
> >>                         copy_map_value_long(&htab->map, dst_val, per_cpu_ptr(pptr, cpu));
> >>                 }
> >>         }
> >>         dst_val += value_size;
> >> }
> >> copy_to_user(uvalues + total * value_size, values,
> >>              value_size * bucket_cnt)
> >>
> >> Here, 'value_size' determines how values are laid out and copied.
> >>
> >
> > So in my mind (and maybe it's wrong, tell me), BPF_F_CPU turns a
> > per-CPU map lookup into an effectively non-per-cpu one. So I'm not
> > sure we need to do 8 byte alignment of value/key sizes when BPF_F_CPU
> > is specified.
> >
> > But if people would like to keep 8 byte alignment anyways for
> > BPF_F_CPU, that's fine too, I guess.
> >
>
> 'value_size' should be 8-byte aligned here.
>
> For example, if 'value_size' is *1* when BPF_F_CPU is specified:
>
> values = kvmalloc_array();  /* 5 bytes (value_size * bucket_size) memory */
> copy_map_value_long();      /* copies 8 bytes, writing past the
>                                allocated 5 bytes of memory */
>
> To stay consistent with 'copy_map_value_long()', 'value_size' itself
> needs to be 8-byte aligned.
>
> That leaves us with two options:
>
> 1. Keep 'value_size' unaligned, switch 'copy_map_value_long()' to
>    'copy_map_value()'.

Yes, this. As I said, I think BPF_F_CPU makes lookup effectively
non-per-CPU, so we should handle that consistently with non-per-CPU map
lookups.

> 2. Require 'value_size' to be 8-byte aligned.
>
> WDYT?
>
> Thanks,
> Leon

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2025-09-24 23:47 UTC | newest]

Thread overview: 29+ messages
2025-09-10 16:27 [PATCH bpf-next v7 0/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps Leon Hwang
2025-09-10 16:27 ` [PATCH bpf-next v7 1/7] bpf: Introduce internal bpf_map_check_op_flags helper function Leon Hwang
2025-09-16 23:44   ` Andrii Nakryiko
2025-09-10 16:27 ` [PATCH bpf-next v7 2/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags Leon Hwang
2025-09-16 23:44   ` Andrii Nakryiko
2025-09-10 16:27 ` [PATCH bpf-next v7 3/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_array maps Leon Hwang
2025-09-16 23:44   ` Andrii Nakryiko
2025-09-17 15:04     ` Leon Hwang
2025-09-10 16:27 ` [PATCH bpf-next v7 4/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and lru_percpu_hash maps Leon Hwang
2025-09-16 23:44   ` Andrii Nakryiko
2025-09-17 15:20     ` Leon Hwang
2025-09-17 19:11       ` Andrii Nakryiko
2025-09-18 16:07         ` Leon Hwang
2025-09-18 19:52           ` Andrii Nakryiko
2025-09-19  5:25     ` Leon Hwang
2025-09-19 22:31       ` Andrii Nakryiko
2025-09-22 14:50         ` Leon Hwang
2025-09-22 16:13           ` Andrii Nakryiko
2025-09-23  2:45             ` Leon Hwang
2025-09-24 23:47               ` Andrii Nakryiko
2025-09-10 16:27 ` [PATCH bpf-next v7 5/7] bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_cgroup_storage maps Leon Hwang
2025-09-16 23:44   ` Andrii Nakryiko
2025-09-17 15:07     ` Leon Hwang
2025-09-17 19:12       ` Andrii Nakryiko
2025-09-18 15:38         ` Leon Hwang
2025-09-10 16:27 ` [PATCH bpf-next v7 6/7] libbpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu maps Leon Hwang
2025-09-16 23:44   ` Andrii Nakryiko
2025-09-17 15:25     ` Leon Hwang
2025-09-10 16:27 ` [PATCH bpf-next v7 7/7] selftests/bpf: Add cases to test BPF_F_CPU and BPF_F_ALL_CPUS flags Leon Hwang
