* [PATCH bpf-next 00/16] Support dynptr key for hash map
@ 2024-10-08 9:14 Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 01/16] bpf: Introduce map flag BPF_F_DYNPTR_IN_KEY Hou Tao
` (16 more replies)
0 siblings, 17 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
Hi,
The patch set aims to add basic dynptr key support for hash map, as
discussed in [1]. The main motivation is to fully utilize the BTF info
of the map key and to support variable-length keys (e.g., strings or any
byte stream) for bpf map. The patch set uses bpf_dynptr to represent the
variable-length parts in the map key, and the total number of
variable-length parts in the map key is currently limited to 4. Due to a
limitation in the bpf memory allocator, the max size of a dynptr in the
map key is limited to 4088 bytes. Besides the variable-length parts
(dynptr parts), fixed-size parts in the map key are still allowed, so
all of the following map key definitions are valid:
struct bpf_dynptr;
struct map_key_1 {
struct bpf_dynptr name;
};
struct map_key_2 {
int pid;
struct bpf_dynptr name;
};
struct map_key_3 {
struct map_key_2 f1;
unsigned long when;
struct bpf_dynptr tag;
};
The patch set supports lookup, update and delete operations on hash
maps with dynptr key for both bpf programs and the bpf syscall. It also
supports lookup_and_delete and get_next_key operations on dynptr map
keys for the bpf syscall.
However, the following operations are not yet fully supported on a
hash map with dynptr key:
1) batched map operation through bpf syscall
2) the memory accounting for dynptr (aka .htab_map_mem_usage)
3) btf print for the dynptr in map key
4) bpftool support
5) the iteration of elements through bpf program
When a bpf program iterates the elements in a hash map with dynptr key
(e.g., through the bpf_for_each_map_elem() helper or a map element
iterator), the dynptr in the map key is not specially treated yet and is
only exposed as a read-only 16-byte buffer.
The patch set is structured as follows:
Patch #1~#5 introduce the BPF_F_DYNPTR_IN_KEY map flag, parse the
bpf_dynptr in the map key, and verify the use of bpf_dynptr in
map-related helpers.
Patch #6~#10 introduce bpf_dynptr_user and support the use of
bpf_dynptr_user in the bpf syscall for map lookup, lookup_delete,
update, delete and get_next_key operations.
Patch #11~#15 update the lookup, lookup_delete, update, delete and
get_next_key callbacks accordingly to support keys with bpf_dynptr for
hash map.
Patch #16 adds positive and negative test cases for hash map with
dynptr key support.
The following are the benchmark results on hash map with dynptr key.
(1) the benchmark compares the performance and memory usage between
normal hash map and dynptr-keyed hash map.
The key definitions for these two maps are shown below:
struct norm_key {
__u64 cookie;
unsigned char desc[MAX_LEN];
};
struct dynptr_key {
__u64 cookie;
struct bpf_dynptr_user desc;
};
When the max length of desc is 256 or greater, the lookup performance of
the dynptr-keyed hash map is better than that of the normal hash map.
When the max length is 512 or greater, the update performance of the
dynptr-keyed hash map is better as well. And the memory consumption of
the dynptr-keyed hash map is smaller than that of the normal hash map.
a) lookup operation
max_entries = 8K (randomly generated data set)
| max length of desc | normal hash-map | dynptr hash-map |
| --- | --- | --- |
| 64 | 12.1 M/s (? MB) | 7.5 M/s (? MB) |
| 128 | 7.5 M/s (? MB) | 6.3 M/s (? MB) |
| 256 | 4.6 M/s (4.9 MB) | 4.9 M/s (? MB) |
| 512 | 2.6 M/s (8.9 MB) | 3.5 M/s (4.6 MB) |
| 1024 | 1.3 M/s (17 MB) | 2.2 M/s (7.4 MB) |
| 2048 | 0.6 M/s (33 MB) | 1.2 M/s (13 MB) |
| 4096 | 0.3 M/s (65 MB) | 0.6 M/s (24 MB) |
| string in file | normal hash-map | dynptr hash-map |
| --- | --- | --- |
| kallsyms | 5.4 M/s (32 MB) | 5.4 M/s (25 MB) |
| string in BTF | 6.8 M/s (23 MB) | 6.8 M/s (17 MB) |
| alexa top 1M sites | 3.2 M/s (192 MB) | 3.0 M/s (139 MB) |
b) update and delete operation
max_entries = 8K (randomly generated data set)
| max length of desc | normal hash-map | dynptr hash-map |
| --- | --- | --- |
| 64 | 4.3 M/s | 3.2 M/s |
| 128 | 3.6 M/s | 2.9 M/s |
| 256 | 2.8 M/s | 2.6 M/s |
| 512 | 1.9 M/s | 2.0 M/s |
| 1024 | 1.0 M/s | 1.3 M/s |
| 2048 | 0.5 M/s | 0.8 M/s |
| 4096 | 0.3 M/s | 0.5 M/s |
| strings in file | normal hash-map | dynptr hash-map |
| --- | --- | --- |
| kallsyms | 3.0 M/s | 2.0 M/s |
| strings in BTF | 3.9 M/s | 3.0 M/s |
| alexa top 1M sites | 2.4 M/s | 2.3 M/s |
(2) the benchmark uses map_perf_test under samples/bpf to test the
overhead of adding dynptr key support in hash map. The test was
conducted on an Intel Xeon CPU with v6.11 as the base kernel version.
It seems that adding dynptr key support to the hash map degrades lookup
performance by about 12% and update performance by about 7%. I will
investigate these degradations first.
a) lookup
max_entries = 8K
before:
0:hash_lookup 72347325 lookups per sec
after:
0:hash_lookup 64758890 lookups per sec
b) update/delete/lookup
max_entries = 8K
before:
0:hash_map_perf pre-alloc 675275 events per sec
0:hash_map_perf kmalloc 666535 events per sec
after:
0:hash_map_perf pre-alloc 626563 events per sec
0:hash_map_perf kmalloc 617234 events per sec
As usual, comments and suggestions are always welcome.
[1]: https://lore.kernel.org/bpf/CAADnVQJWaBRB=P-ZNkppwm=0tZaT3qP8PKLLJ2S5SSA2-S8mxg@mail.gmail.com/
Hou Tao (16):
bpf: Introduce map flag BPF_F_DYNPTR_IN_KEY
bpf: Add two helpers to facilitate the btf parsing of bpf_dynptr
bpf: Parse bpf_dynptr in map key
bpf: Pass flags instead of bool to check_helper_mem_access()
bpf: Support map key with dynptr in verifier
bpf: Introduce bpf_dynptr_user
libbpf: Add helpers for bpf_dynptr_user
bpf: Handle bpf_dynptr_user in bpf syscall when it is used as input
bpf: Handle bpf_dynptr_user in bpf syscall when it is used as output
bpf: Disable unsupported functionalities for map with dynptr key
bpf: Add bpf_mem_alloc_check_size() helper
bpf: Support basic operations for dynptr key in hash map
bpf: Export bpf_dynptr_set_size
bpf: Support get_next_key operation for dynptr key in hash map
bpf: Enable BPF_F_DYNPTR_IN_KEY for hash map
selftests/bpf: Add test cases for hash map with dynptr key
include/linux/bpf.h | 22 +-
include/linux/bpf_mem_alloc.h | 3 +
include/linux/btf.h | 2 +
include/uapi/linux/bpf.h | 9 +
kernel/bpf/btf.c | 46 +-
kernel/bpf/hashtab.c | 314 ++++++++++--
kernel/bpf/helpers.c | 2 +-
kernel/bpf/map_in_map.c | 19 +-
kernel/bpf/memalloc.c | 14 +-
kernel/bpf/syscall.c | 222 ++++++++-
kernel/bpf/verifier.c | 183 ++++++-
tools/include/uapi/linux/bpf.h | 9 +
tools/lib/bpf/bpf.h | 27 ++
.../bpf/prog_tests/htab_dynkey_test.c | 451 ++++++++++++++++++
.../bpf/progs/htab_dynkey_test_failure.c | 270 +++++++++++
.../bpf/progs/htab_dynkey_test_success.c | 399 ++++++++++++++++
16 files changed, 1902 insertions(+), 90 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_dynkey_test.c
create mode 100644 tools/testing/selftests/bpf/progs/htab_dynkey_test_failure.c
create mode 100644 tools/testing/selftests/bpf/progs/htab_dynkey_test_success.c
--
2.44.0
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH bpf-next 01/16] bpf: Introduce map flag BPF_F_DYNPTR_IN_KEY
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
@ 2024-10-08 9:14 ` Hou Tao
2024-10-10 2:21 ` Alexei Starovoitov
2024-10-08 9:14 ` [PATCH bpf-next 02/16] bpf: Add two helpers to facilitate the btf parsing of bpf_dynptr Hou Tao
` (15 subsequent siblings)
16 siblings, 1 reply; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
Introduce the map flag BPF_F_DYNPTR_IN_KEY to support dynptr in the map
key, and add the corresponding helper bpf_map_has_dynptr_key() to check
whether or not a map uses a dynptr key.
A map with dynptr key support needs to use map_extra to specify the
maximum length of these dynptrs. The map implementation will check
during map creation that map_extra does not exceed the limit imposed by
the memory allocator. It may also use map_extra to optimize the memory
allocation for dynptrs.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/bpf.h | 5 +++++
include/uapi/linux/bpf.h | 3 +++
kernel/bpf/syscall.c | 1 +
tools/include/uapi/linux/bpf.h | 3 +++
4 files changed, 12 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 19d8ca8ac960..f61bf427e14e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -308,6 +308,11 @@ struct bpf_map {
s64 __percpu *elem_count;
};
+static inline bool bpf_map_has_dynptr_key(const struct bpf_map *map)
+{
+ return map->map_flags & BPF_F_DYNPTR_IN_KEY;
+}
+
static inline const char *btf_field_type_name(enum btf_field_type type)
{
switch (type) {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c6cd7c7aeeee..07f7df308a01 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1409,6 +1409,9 @@ enum {
/* Do not translate kernel bpf_arena pointers to user pointers */
BPF_F_NO_USER_CONV = (1U << 18),
+
+/* Create a map with bpf_dynptr in key */
+ BPF_F_DYNPTR_IN_KEY = (1U << 19),
};
/* Flags for BPF_PROG_QUERY. */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a8f1808a1ca5..bffd803c5977 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1232,6 +1232,7 @@ static int map_create(union bpf_attr *attr)
if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
attr->map_type != BPF_MAP_TYPE_ARENA &&
+ !(attr->map_flags & BPF_F_DYNPTR_IN_KEY) &&
attr->map_extra != 0)
return -EINVAL;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1fb3cb2636e6..14f223282bfa 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1409,6 +1409,9 @@ enum {
/* Do not translate kernel bpf_arena pointers to user pointers */
BPF_F_NO_USER_CONV = (1U << 18),
+
+/* Create a map with bpf_dynptr in key */
+ BPF_F_DYNPTR_IN_KEY = (1U << 19),
};
/* Flags for BPF_PROG_QUERY. */
--
2.44.0
* [PATCH bpf-next 02/16] bpf: Add two helpers to facilitate the btf parsing of bpf_dynptr
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 01/16] bpf: Introduce map flag BPF_F_DYNPTR_IN_KEY Hou Tao
@ 2024-10-08 9:14 ` Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 03/16] bpf: Parse bpf_dynptr in map key Hou Tao
` (14 subsequent siblings)
16 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
Add the helper btf_type_is_dynptr() to check whether or not a btf_type
is a bpf_dynptr, and add the helper btf_new_bpf_dynptr_record() to
create a btf record which only includes a bpf_dynptr. These two helpers
will be used when bpf_dynptr is used directly as the map key.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/btf.h | 2 ++
kernel/bpf/btf.c | 42 +++++++++++++++++++++++++++++++++++++-----
2 files changed, 39 insertions(+), 5 deletions(-)
diff --git a/include/linux/btf.h b/include/linux/btf.h
index b8a583194c4a..ed89b83e8c19 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -222,8 +222,10 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
u32 expected_offset, u32 expected_size);
struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t,
u32 field_mask, u32 value_size);
+struct btf_record *btf_new_bpf_dynptr_record(void);
int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec);
bool btf_type_is_void(const struct btf_type *t);
+bool btf_type_is_dynptr(const struct btf *btf, const struct btf_type *t);
s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p);
const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 5254fc9c1b24..321356710924 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3923,6 +3923,16 @@ static int btf_field_cmp(const void *_a, const void *_b, const void *priv)
return 0;
}
+static void btf_init_btf_record(struct btf_record *record)
+{
+ record->cnt = 0;
+ record->field_mask = 0;
+ record->spin_lock_off = -EINVAL;
+ record->timer_off = -EINVAL;
+ record->wq_off = -EINVAL;
+ record->refcount_off = -EINVAL;
+}
+
struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t,
u32 field_mask, u32 value_size)
{
@@ -3941,14 +3951,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
/* This needs to be kzalloc to zero out padding and unused fields, see
* comment in btf_record_equal.
*/
- rec = kzalloc(offsetof(struct btf_record, fields[cnt]), GFP_KERNEL | __GFP_NOWARN);
+ rec = kzalloc(struct_size(rec, fields, cnt), GFP_KERNEL | __GFP_NOWARN);
if (!rec)
return ERR_PTR(-ENOMEM);
- rec->spin_lock_off = -EINVAL;
- rec->timer_off = -EINVAL;
- rec->wq_off = -EINVAL;
- rec->refcount_off = -EINVAL;
+ btf_init_btf_record(rec);
for (i = 0; i < cnt; i++) {
field_type_size = btf_field_type_size(info_arr[i].type);
if (info_arr[i].off + field_type_size > value_size) {
@@ -4038,6 +4045,25 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
return ERR_PTR(ret);
}
+struct btf_record *btf_new_bpf_dynptr_record(void)
+{
+ struct btf_record *record;
+
+ record = kzalloc(struct_size(record, fields, 1), GFP_KERNEL | __GFP_NOWARN);
+ if (!record)
+ return ERR_PTR(-ENOMEM);
+
+ btf_init_btf_record(record);
+
+ record->cnt = 1;
+ record->field_mask = BPF_DYNPTR;
+ record->fields[0].offset = 0;
+ record->fields[0].size = sizeof(struct bpf_dynptr);
+ record->fields[0].type = BPF_DYNPTR;
+
+ return record;
+}
+
int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
{
int i;
@@ -7276,6 +7302,12 @@ static bool btf_is_dynptr_ptr(const struct btf *btf, const struct btf_type *t)
return false;
}
+bool btf_type_is_dynptr(const struct btf *btf, const struct btf_type *t)
+{
+ return __btf_type_is_struct(t) && t->size == sizeof(struct bpf_dynptr) &&
+ !strcmp(__btf_name_by_offset(btf, t->name_off), "bpf_dynptr");
+}
+
struct bpf_cand_cache {
const char *name;
u32 name_len;
--
2.44.0
* [PATCH bpf-next 03/16] bpf: Parse bpf_dynptr in map key
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 01/16] bpf: Introduce map flag BPF_F_DYNPTR_IN_KEY Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 02/16] bpf: Add two helpers to facilitate the btf parsing of bpf_dynptr Hou Tao
@ 2024-10-08 9:14 ` Hou Tao
2024-10-10 18:02 ` Eduard Zingerman
2024-10-11 16:29 ` Alexei Starovoitov
2024-10-08 9:14 ` [PATCH bpf-next 04/16] bpf: Pass flags instead of bool to check_helper_mem_access() Hou Tao
` (13 subsequent siblings)
16 siblings, 2 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
To support variable-length keys (e.g., strings) in the map key, use
bpf_dynptr to represent these variable-length objects and save the
bpf_dynptr fields in the map key. As shown in the example below, a map
key with an integer and a string is defined:
struct pid_name {
int pid;
struct bpf_dynptr name;
};
The bpf_dynptr in the map key could also be contained indirectly in a
struct as shown below:
struct pid_name_time {
struct pid_name process;
unsigned long long time;
};
If the whole map key is a bpf_dynptr, the map key could be defined as a
struct wrapping a bpf_dynptr, or bpf_dynptr could be used directly as
the map key:
struct map_key {
struct bpf_dynptr name;
};
A bpf program could use bpf_dynptr_init() to initialize the dynptr
parts in the map key, and a userspace application would use
bpf_dynptr_user_init() or a similar API to initialize the dynptr. Just
like kptrs in the map value, the bpf_dynptr in the map key could also be
defined in a nested struct which is contained in the map key struct.
The patch updates map_create() accordingly to parse these bpf_dynptr
fields in the map key, just like it does for other special fields in the
map value. To enable bpf_dynptr support in the map key, the map_type
should be BPF_MAP_TYPE_HASH and BPF_F_DYNPTR_IN_KEY should be set in
map_flags. For now, the max number of bpf_dynptrs in a map key is
arbitrarily chosen as 4; it may be changed later.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/bpf.h | 13 ++++++++++--
kernel/bpf/btf.c | 4 ++++
kernel/bpf/map_in_map.c | 19 ++++++++++++-----
kernel/bpf/syscall.c | 47 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 76 insertions(+), 7 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f61bf427e14e..3e25e94b7457 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -184,8 +184,8 @@ struct bpf_map_ops {
};
enum {
- /* Support at most 11 fields in a BTF type */
- BTF_FIELDS_MAX = 11,
+ /* Support at most 12 fields in a BTF type */
+ BTF_FIELDS_MAX = 12,
};
enum btf_field_type {
@@ -203,6 +203,7 @@ enum btf_field_type {
BPF_GRAPH_ROOT = BPF_RB_ROOT | BPF_LIST_HEAD,
BPF_REFCOUNT = (1 << 9),
BPF_WORKQUEUE = (1 << 10),
+ BPF_DYNPTR = (1 << 11),
};
typedef void (*btf_dtor_kfunc_t)(void *);
@@ -270,6 +271,7 @@ struct bpf_map {
u32 map_flags;
u32 id;
struct btf_record *record;
+ struct btf_record *key_record;
int numa_node;
u32 btf_key_type_id;
u32 btf_value_type_id;
@@ -337,6 +339,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
return "bpf_rb_node";
case BPF_REFCOUNT:
return "bpf_refcount";
+ case BPF_DYNPTR:
+ return "bpf_dynptr";
default:
WARN_ON_ONCE(1);
return "unknown";
@@ -366,6 +370,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
return sizeof(struct bpf_rb_node);
case BPF_REFCOUNT:
return sizeof(struct bpf_refcount);
+ case BPF_DYNPTR:
+ return sizeof(struct bpf_dynptr);
default:
WARN_ON_ONCE(1);
return 0;
@@ -395,6 +401,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
return __alignof__(struct bpf_rb_node);
case BPF_REFCOUNT:
return __alignof__(struct bpf_refcount);
+ case BPF_DYNPTR:
+ return __alignof__(struct bpf_dynptr);
default:
WARN_ON_ONCE(1);
return 0;
@@ -424,6 +432,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
+ case BPF_DYNPTR:
break;
default:
WARN_ON_ONCE(1);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 321356710924..2604cef53915 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3500,6 +3500,7 @@ static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_
field_mask_test_name(BPF_RB_ROOT, "bpf_rb_root");
field_mask_test_name(BPF_RB_NODE, "bpf_rb_node");
field_mask_test_name(BPF_REFCOUNT, "bpf_refcount");
+ field_mask_test_name(BPF_DYNPTR, "bpf_dynptr");
/* Only return BPF_KPTR when all other types with matchable names fail */
if (field_mask & BPF_KPTR && !__btf_type_is_struct(var_type)) {
@@ -3537,6 +3538,7 @@ static int btf_repeat_fields(struct btf_field_info *info,
case BPF_KPTR_PERCPU:
case BPF_LIST_HEAD:
case BPF_RB_ROOT:
+ case BPF_DYNPTR:
break;
default:
return -EINVAL;
@@ -3659,6 +3661,7 @@ static int btf_find_field_one(const struct btf *btf,
case BPF_LIST_NODE:
case BPF_RB_NODE:
case BPF_REFCOUNT:
+ case BPF_DYNPTR:
ret = btf_find_struct(btf, var_type, off, sz, field_type,
info_cnt ? &info[0] : &tmp);
if (ret < 0)
@@ -4014,6 +4017,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
break;
case BPF_LIST_NODE:
case BPF_RB_NODE:
+ case BPF_DYNPTR:
break;
default:
ret = -EFAULT;
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 645bd30bc9a9..a072835dc645 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -9,7 +9,7 @@
struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
{
- struct bpf_map *inner_map, *inner_map_meta;
+ struct bpf_map *inner_map, *inner_map_meta, *ret;
u32 inner_map_meta_size;
CLASS(fd, f)(inner_map_ufd);
@@ -45,9 +45,13 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
* invalid/empty/valid, but ERR_PTR in case of errors. During
* equality NULL or IS_ERR is equivalent.
*/
- struct bpf_map *ret = ERR_CAST(inner_map_meta->record);
- kfree(inner_map_meta);
- return ret;
+ ret = ERR_CAST(inner_map_meta->record);
+ goto free;
+ }
+ inner_map_meta->key_record = btf_record_dup(inner_map->key_record);
+ if (IS_ERR(inner_map_meta->key_record)) {
+ ret = ERR_CAST(inner_map_meta->key_record);
+ goto free;
}
/* Note: We must use the same BTF, as we also used btf_record_dup above
* which relies on BTF being same for both maps, as some members like
@@ -71,6 +75,10 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
inner_map_meta->bypass_spec_v1 = inner_map->bypass_spec_v1;
}
return inner_map_meta;
+
+free:
+ bpf_map_meta_free(inner_map_meta);
+ return ret;
}
void bpf_map_meta_free(struct bpf_map *map_meta)
@@ -88,7 +96,8 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
meta0->key_size == meta1->key_size &&
meta0->value_size == meta1->value_size &&
meta0->map_flags == meta1->map_flags &&
- btf_record_equal(meta0->record, meta1->record);
+ btf_record_equal(meta0->record, meta1->record) &&
+ btf_record_equal(meta0->key_record, meta1->key_record);
}
void *bpf_map_fd_get_ptr(struct bpf_map *map,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bffd803c5977..aa0a500f8fad 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -561,6 +561,7 @@ void btf_record_free(struct btf_record *rec)
case BPF_TIMER:
case BPF_REFCOUNT:
case BPF_WORKQUEUE:
+ case BPF_DYNPTR:
/* Nothing to release */
break;
default:
@@ -574,7 +575,9 @@ void btf_record_free(struct btf_record *rec)
void bpf_map_free_record(struct bpf_map *map)
{
btf_record_free(map->record);
+ btf_record_free(map->key_record);
map->record = NULL;
+ map->key_record = NULL;
}
struct btf_record *btf_record_dup(const struct btf_record *rec)
@@ -612,6 +615,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
case BPF_TIMER:
case BPF_REFCOUNT:
case BPF_WORKQUEUE:
+ case BPF_DYNPTR:
/* Nothing to acquire */
break;
default:
@@ -728,6 +732,8 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
case BPF_RB_NODE:
case BPF_REFCOUNT:
break;
+ case BPF_DYNPTR:
+ break;
default:
WARN_ON_ONCE(1);
continue;
@@ -737,6 +743,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
static void bpf_map_free(struct bpf_map *map)
{
+ struct btf_record *key_rec = map->key_record;
struct btf_record *rec = map->record;
struct btf *btf = map->btf;
@@ -751,6 +758,7 @@ static void bpf_map_free(struct bpf_map *map)
* eventually calls bpf_map_free_meta, since inner_map_meta is only a
* template bpf_map struct used during verification.
*/
+ btf_record_free(key_rec);
btf_record_free(rec);
/* Delay freeing of btf for maps, as map_free callback may need
* struct_meta info which will be freed with btf_put().
@@ -1081,6 +1089,8 @@ int map_check_no_btf(const struct bpf_map *map,
return -ENOTSUPP;
}
+#define MAX_DYNPTR_CNT_IN_MAP_KEY 4
+
static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
const struct btf *btf, u32 btf_key_id, u32 btf_value_id)
{
@@ -1103,6 +1113,40 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
if (!value_type || value_size != map->value_size)
return -EINVAL;
+ if (btf_type_is_dynptr(btf, key_type))
+ map->key_record = btf_new_bpf_dynptr_record();
+ else
+ map->key_record = btf_parse_fields(btf, key_type, BPF_DYNPTR, map->key_size);
+ if (!IS_ERR_OR_NULL(map->key_record)) {
+ if (map->key_record->cnt > MAX_DYNPTR_CNT_IN_MAP_KEY) {
+ ret = -E2BIG;
+ goto free_map_tab;
+ }
+ if (!bpf_map_has_dynptr_key(map)) {
+ ret = -EINVAL;
+ goto free_map_tab;
+ }
+ if (map->map_type != BPF_MAP_TYPE_HASH) {
+ ret = -EOPNOTSUPP;
+ goto free_map_tab;
+ }
+ if (!bpf_token_capable(token, CAP_BPF)) {
+ ret = -EPERM;
+ goto free_map_tab;
+ }
+ /* Disallow key with dynptr for special map */
+ if (map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG)) {
+ ret = -EACCES;
+ goto free_map_tab;
+ }
+ } else if (bpf_map_has_dynptr_key(map)) {
+ ret = -EINVAL;
+ goto free_map_tab;
+ } else {
+ /* Ensure key_record is either a valid btf_record or NULL */
+ map->key_record = NULL;
+ }
+
map->record = btf_parse_fields(btf, value_type,
BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE,
@@ -1230,6 +1274,9 @@ static int map_create(union bpf_attr *attr)
return -EINVAL;
}
+ if ((attr->map_flags & BPF_F_DYNPTR_IN_KEY) && !attr->btf_key_type_id)
+ return -EINVAL;
+
if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
attr->map_type != BPF_MAP_TYPE_ARENA &&
!(attr->map_flags & BPF_F_DYNPTR_IN_KEY) &&
--
2.44.0
* [PATCH bpf-next 04/16] bpf: Pass flags instead of bool to check_helper_mem_access()
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
` (2 preceding siblings ...)
2024-10-08 9:14 ` [PATCH bpf-next 03/16] bpf: Parse bpf_dynptr in map key Hou Tao
@ 2024-10-08 9:14 ` Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 05/16] bpf: Support map key with dynptr in verifier Hou Tao
` (12 subsequent siblings)
16 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
Extend "bool zero_size_allowed" into "unsigned int flags", so that more
flags can be passed to check_helper_mem_access().
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
kernel/bpf/verifier.c | 42 ++++++++++++++++++++++++------------------
1 file changed, 24 insertions(+), 18 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9a7ed527e47e..2090d2472d7c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5041,9 +5041,11 @@ enum bpf_access_src {
ACCESS_HELPER = 2, /* the access is performed by a helper */
};
+#define ACCESS_F_ZERO_SIZE_ALLOWED BIT(0)
+
static int check_stack_range_initialized(struct bpf_verifier_env *env,
int regno, int off, int access_size,
- bool zero_size_allowed,
+ unsigned int access_flags,
enum bpf_access_src type,
struct bpf_call_arg_meta *meta);
@@ -5077,7 +5079,7 @@ static int check_stack_read_var_off(struct bpf_verifier_env *env,
/* Note that we pass a NULL meta, so raw access will not be permitted.
*/
err = check_stack_range_initialized(env, ptr_regno, off, size,
- false, ACCESS_DIRECT, NULL);
+ 0, ACCESS_DIRECT, NULL);
if (err)
return err;
@@ -7277,7 +7279,7 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
*/
static int check_stack_range_initialized(
struct bpf_verifier_env *env, int regno, int off,
- int access_size, bool zero_size_allowed,
+ int access_size, unsigned int access_flags,
enum bpf_access_src type, struct bpf_call_arg_meta *meta)
{
struct bpf_reg_state *reg = reg_state(env, regno);
@@ -7290,7 +7292,7 @@ static int check_stack_range_initialized(
*/
bool clobber = false;
- if (access_size == 0 && !zero_size_allowed) {
+ if (access_size == 0 && !(access_flags & ACCESS_F_ZERO_SIZE_ALLOWED)) {
verbose(env, "invalid zero-sized read\n");
return -EACCES;
}
@@ -7432,9 +7434,10 @@ static int check_stack_range_initialized(
}
static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
- int access_size, bool zero_size_allowed,
+ int access_size, unsigned int access_flags,
struct bpf_call_arg_meta *meta)
{
+ bool zero_size_allowed = access_flags & ACCESS_F_ZERO_SIZE_ALLOWED;
struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
u32 *max_access;
@@ -7488,7 +7491,7 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
return check_stack_range_initialized(
env,
regno, reg->off, access_size,
- zero_size_allowed, ACCESS_HELPER, meta);
+ access_flags, ACCESS_HELPER, meta);
case PTR_TO_BTF_ID:
return check_ptr_to_btf_access(env, regs, regno, reg->off,
access_size, BPF_READ, -1);
@@ -7532,9 +7535,10 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
*/
static int check_mem_size_reg(struct bpf_verifier_env *env,
struct bpf_reg_state *reg, u32 regno,
- bool zero_size_allowed,
+ unsigned int access_flags,
struct bpf_call_arg_meta *meta)
{
+ bool zero_size_allowed = access_flags & ACCESS_F_ZERO_SIZE_ALLOWED;
int err;
/* This is used to refine r0 return value bounds for helpers
@@ -7577,7 +7581,7 @@ static int check_mem_size_reg(struct bpf_verifier_env *env,
}
err = check_helper_mem_access(env, regno - 1,
reg->umax_value,
- zero_size_allowed, meta);
+ access_flags, meta);
if (!err)
err = mark_chain_precision(env, regno);
return err;
@@ -7604,11 +7608,11 @@ static int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg
mark_ptr_not_null_reg(reg);
}
- err = check_helper_mem_access(env, regno, mem_size, true, &meta);
+ err = check_helper_mem_access(env, regno, mem_size, ACCESS_F_ZERO_SIZE_ALLOWED, &meta);
/* Check access for BPF_WRITE */
meta.raw_mode = true;
- err = err ?: check_helper_mem_access(env, regno, mem_size, true, &meta);
-
+ err = err ?: check_helper_mem_access(env, regno, mem_size, ACCESS_F_ZERO_SIZE_ALLOWED,
+ &meta);
if (may_be_null)
*reg = saved_reg;
@@ -7633,10 +7637,10 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg
mark_ptr_not_null_reg(mem_reg);
}
- err = check_mem_size_reg(env, reg, regno, true, &meta);
+ err = check_mem_size_reg(env, reg, regno, ACCESS_F_ZERO_SIZE_ALLOWED, &meta);
/* Check access for BPF_WRITE */
meta.raw_mode = true;
- err = err ?: check_mem_size_reg(env, reg, regno, true, &meta);
+ err = err ?: check_mem_size_reg(env, reg, regno, ACCESS_F_ZERO_SIZE_ALLOWED, &meta);
if (may_be_null)
*mem_reg = saved_reg;
@@ -8943,7 +8947,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
return -EACCES;
}
err = check_helper_mem_access(env, regno,
- meta->map_ptr->key_size, false,
+ meta->map_ptr->key_size, 0,
NULL);
break;
case ARG_PTR_TO_MAP_VALUE:
@@ -8960,7 +8964,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
}
meta->raw_mode = arg_type & MEM_UNINIT;
err = check_helper_mem_access(env, regno,
- meta->map_ptr->value_size, false,
+ meta->map_ptr->value_size, 0,
meta);
break;
case ARG_PTR_TO_PERCPU_BTF_ID:
@@ -9003,7 +9007,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
*/
meta->raw_mode = arg_type & MEM_UNINIT;
if (arg_type & MEM_FIXED_SIZE) {
- err = check_helper_mem_access(env, regno, fn->arg_size[arg], false, meta);
+ err = check_helper_mem_access(env, regno,
+ fn->arg_size[arg], 0,
+ meta);
if (err)
return err;
if (arg_type & MEM_ALIGNED)
@@ -9011,10 +9017,10 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
}
break;
case ARG_CONST_SIZE:
- err = check_mem_size_reg(env, reg, regno, false, meta);
+ err = check_mem_size_reg(env, reg, regno, 0, meta);
break;
case ARG_CONST_SIZE_OR_ZERO:
- err = check_mem_size_reg(env, reg, regno, true, meta);
+ err = check_mem_size_reg(env, reg, regno, ACCESS_F_ZERO_SIZE_ALLOWED, meta);
break;
case ARG_PTR_TO_DYNPTR:
err = process_dynptr_func(env, regno, insn_idx, arg_type, 0);
--
2.44.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH bpf-next 05/16] bpf: Support map key with dynptr in verifier
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
` (3 preceding siblings ...)
2024-10-08 9:14 ` [PATCH bpf-next 04/16] bpf: Pass flags instead of bool to check_helper_mem_access() Hou Tao
@ 2024-10-08 9:14 ` Hou Tao
2024-10-10 20:30 ` Eduard Zingerman
2024-10-13 13:07 ` Dan Carpenter
2024-10-08 9:14 ` [PATCH bpf-next 06/16] bpf: Introduce bpf_dynptr_user Hou Tao
` (11 subsequent siblings)
16 siblings, 2 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
The patch does the following three things to enable dynptr keys
for bpf maps:
1) Only allow PTR_TO_STACK typed registers for dynptr keys
The main reason is that a bpf_dynptr can only be defined on the stack, so
only PTR_TO_STACK typed registers are allowed for a dynptr key. A
bpf_dynptr could also be represented by a CONST_PTR_TO_DYNPTR typed
register (e.g., in a callback func or subprog), but that is not supported yet.
2) Only allow a fixed offset for the PTR_TO_STACK register
A variable offset for a PTR_TO_STACK typed register is disallowed, because
it would be impossible to check whether the stack access is aligned
with BPF_REG_SIZE and matches the location of the dynptr and
non-dynptr parts in the map key.
3) Check that the layout of the stack content matches the btf_record
First check that the start offset of the stack access is aligned with
BPF_REG_SIZE, then check that the offset and the size of the
dynptr/non-dynptr parts in the stack content are consistent with the
btf_record of the map key.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
kernel/bpf/verifier.c | 143 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 134 insertions(+), 9 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2090d2472d7c..345b45edf2a7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5042,6 +5042,7 @@ enum bpf_access_src {
};
#define ACCESS_F_ZERO_SIZE_ALLOWED BIT(0)
+#define ACCESS_F_DYNPTR_READ_ALLOWED BIT(1)
static int check_stack_range_initialized(struct bpf_verifier_env *env,
int regno, int off, int access_size,
@@ -7267,6 +7268,86 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
return 0;
}
+struct dynptr_key_state {
+ const struct btf_record *rec;
+ const struct btf_field *cur_dynptr;
+ bool valid_dynptr_id;
+ int cur_dynptr_id;
+};
+
+static int init_dynptr_key_state(struct bpf_verifier_env *env, const struct btf_record *rec,
+ struct dynptr_key_state *state)
+{
+ unsigned int i;
+
+ /* Find the first dynptr in the dynptr-key */
+ for (i = 0; i < rec->cnt; i++) {
+ if (rec->fields[i].type == BPF_DYNPTR)
+ break;
+ }
+ if (i >= rec->cnt) {
+ verbose(env, "verifier bug: dynptr not found\n");
+ return -EFAULT;
+ }
+
+ state->rec = rec;
+ state->cur_dynptr = &rec->fields[i];
+ state->valid_dynptr_id = false;
+
+ return 0;
+}
+
+static int check_dynptr_key_access(struct bpf_verifier_env *env, struct dynptr_key_state *state,
+ struct bpf_reg_state *reg, u8 stype, int offset)
+{
+ const struct btf_field *dynptr = state->cur_dynptr;
+
+ /* Non-dynptr part before a dynptr or non-dynptr part after
+ * the last dynptr.
+ */
+ if (offset < dynptr->offset || offset >= dynptr->offset + dynptr->size) {
+ if (stype == STACK_DYNPTR) {
+ verbose(env,
+ "dynptr-key expects non-dynptr at offset %d cur_dynptr_offset %u\n",
+ offset, dynptr->offset);
+ return -EACCES;
+ }
+ } else {
+ if (stype != STACK_DYNPTR) {
+ verbose(env,
+ "dynptr-key expects dynptr at offset %d cur_dynptr_offset %u\n",
+ offset, dynptr->offset);
+ return -EACCES;
+ }
+
+ /* A dynptr is composed of parts from two dynptrs */
+ if (state->valid_dynptr_id && reg->id != state->cur_dynptr_id) {
+ verbose(env, "malformed dynptr-key at offset %d cur_dynptr_offset %u\n",
+ offset, dynptr->offset);
+ return -EACCES;
+ }
+ if (!state->valid_dynptr_id) {
+ state->valid_dynptr_id = true;
+ state->cur_dynptr_id = reg->id;
+ }
+
+ if (offset == dynptr->offset + dynptr->size - 1) {
+ const struct btf_record *rec = state->rec;
+ unsigned int i;
+
+ for (i = dynptr - rec->fields + 1; i < rec->cnt; i++) {
+ if (rec->fields[i].type == BPF_DYNPTR) {
+ state->cur_dynptr = &rec->fields[i];
+ state->valid_dynptr_id = false;
+ break;
+ }
+ }
+ }
+ }
+
+ return 0;
+}
+
/* When register 'regno' is used to read the stack (either directly or through
* a helper function) make sure that it's within stack boundary and, depending
* on the access type and privileges, that all elements of the stack are
@@ -7287,6 +7368,8 @@ static int check_stack_range_initialized(
int err, min_off, max_off, i, j, slot, spi;
char *err_extra = type == ACCESS_HELPER ? " indirect" : "";
enum bpf_access_type bounds_check_type;
+ struct dynptr_key_state dynptr_key;
+ bool dynptr_read_allowed;
/* Some accesses can write anything into the stack, others are
* read-only.
*/
@@ -7312,9 +7395,14 @@ static int check_stack_range_initialized(
if (err)
return err;
-
+ dynptr_read_allowed = access_flags & ACCESS_F_DYNPTR_READ_ALLOWED;
if (tnum_is_const(reg->var_off)) {
min_off = max_off = reg->var_off.value + off;
+
+ if (dynptr_read_allowed && (min_off % BPF_REG_SIZE)) {
+ verbose(env, "R%d misaligned offset %d for dynptr-key\n", regno, min_off);
+ return -EACCES;
+ }
} else {
/* Variable offset is prohibited for unprivileged mode for
* simplicity since it requires corresponding support in
@@ -7329,6 +7417,12 @@ static int check_stack_range_initialized(
regno, err_extra, tn_buf);
return -EACCES;
}
+
+ if (dynptr_read_allowed) {
+ verbose(env, "R%d variable offset prohibited for dynptr-key\n", regno);
+ return -EACCES;
+ }
+
/* Only initialized buffer on stack is allowed to be accessed
* with variable offset. With uninitialized buffer it's hard to
* guarantee that whole memory is marked as initialized on
@@ -7373,19 +7467,26 @@ static int check_stack_range_initialized(
return 0;
}
+ if (dynptr_read_allowed) {
+ err = init_dynptr_key_state(env, meta->map_ptr->key_record, &dynptr_key);
+ if (err)
+ return err;
+ }
for (i = min_off; i < max_off + access_size; i++) {
u8 *stype;
slot = -i - 1;
spi = slot / BPF_REG_SIZE;
if (state->allocated_stack <= slot) {
- verbose(env, "verifier bug: allocated_stack too small");
+ verbose(env, "verifier bug: allocated_stack too small\n");
return -EFAULT;
}
stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
if (*stype == STACK_MISC)
goto mark;
+ if (dynptr_read_allowed && *stype == STACK_DYNPTR)
+ goto mark;
if ((*stype == STACK_ZERO) ||
(*stype == STACK_INVALID && env->allow_uninit_stack)) {
if (clobber) {
@@ -7418,18 +7519,28 @@ static int check_stack_range_initialized(
}
return -EACCES;
mark:
+ if (dynptr_read_allowed) {
+ err = check_dynptr_key_access(env, &dynptr_key,
+ &state->stack[spi].spilled_ptr, *stype,
+ i - min_off);
+ if (err)
+ return err;
+ }
+
/* reading any byte out of 8-byte 'spill_slot' will cause
* the whole slot to be marked as 'read'
- */
- mark_reg_read(env, &state->stack[spi].spilled_ptr,
- state->stack[spi].spilled_ptr.parent,
- REG_LIVE_READ64);
- /* We do not set REG_LIVE_WRITTEN for stack slot, as we can not
+ *
+ * We do not set REG_LIVE_WRITTEN for stack slot, as we can not
* be sure that whether stack slot is written to or not. Hence,
* we must still conservatively propagate reads upwards even if
* helper may write to the entire memory range.
*/
+ mark_reg_read(env, &state->stack[spi].spilled_ptr,
+ state->stack[spi].spilled_ptr.parent,
+ REG_LIVE_READ64);
+
}
+
return 0;
}
@@ -8933,6 +9044,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
meta->map_uid = reg->map_uid;
break;
case ARG_PTR_TO_MAP_KEY:
+ {
+ u32 access_flags = 0;
+
/* bpf_map_xxx(..., map_ptr, ..., key) call:
* check that [key, key + map->key_size) are within
* stack limits and initialized
@@ -8946,10 +9060,21 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
verbose(env, "invalid map_ptr to access map->key\n");
return -EACCES;
}
+ /* Only allow PTR_TO_STACK for dynptr-key */
+ if (bpf_map_has_dynptr_key(meta->map_ptr)) {
+ if (base_type(reg->type) != PTR_TO_STACK) {
+ verbose(env, "map dynptr-key requires stack ptr but got %s\n",
+ reg_type_str(env, reg->type));
+ return -EACCES;
+ }
+ access_flags |= ACCESS_F_DYNPTR_READ_ALLOWED;
+ }
+ meta->raw_mode = false;
err = check_helper_mem_access(env, regno,
- meta->map_ptr->key_size, 0,
- NULL);
+ meta->map_ptr->key_size, access_flags,
+ meta);
break;
+ }
case ARG_PTR_TO_MAP_VALUE:
if (type_may_be_null(arg_type) && register_is_null(reg))
return 0;
--
2.44.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH bpf-next 06/16] bpf: Introduce bpf_dynptr_user
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
` (4 preceding siblings ...)
2024-10-08 9:14 ` [PATCH bpf-next 05/16] bpf: Support map key with dynptr in verifier Hou Tao
@ 2024-10-08 9:14 ` Hou Tao
2024-10-10 21:50 ` Andrii Nakryiko
2024-10-08 9:14 ` [PATCH bpf-next 07/16] libbpf: Add helpers for bpf_dynptr_user Hou Tao
` (10 subsequent siblings)
16 siblings, 1 reply; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
For bpf maps with dynptr key support, the userspace application will use
bpf_dynptr_user to represent the bpf_dynptr in the map key and pass it
to the bpf syscall. The bpf syscall will copy from bpf_dynptr_user to
construct a corresponding bpf_dynptr_kern object when the map key is an
input argument, and copy to bpf_dynptr_user from a bpf_dynptr_kern
object when the map key is an output argument.
For now the size of bpf_dynptr_user must be the same as that of
bpf_dynptr, but the last u32 field is unused, so make it a reserved field.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/uapi/linux/bpf.h | 6 ++++++
tools/include/uapi/linux/bpf.h | 6 ++++++
2 files changed, 12 insertions(+)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 07f7df308a01..72fe6a96b54c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7329,6 +7329,12 @@ struct bpf_dynptr {
__u64 __opaque[2];
} __attribute__((aligned(8)));
+struct bpf_dynptr_user {
+ __u64 data;
+ __u32 size;
+ __u32 rsvd;
+} __attribute__((aligned(8)));
+
struct bpf_list_head {
__u64 __opaque[2];
} __attribute__((aligned(8)));
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 14f223282bfa..f12ce268e6be 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7328,6 +7328,12 @@ struct bpf_dynptr {
__u64 __opaque[2];
} __attribute__((aligned(8)));
+struct bpf_dynptr_user {
+ __u64 data;
+ __u32 size;
+ __u32 rsvd;
+} __attribute__((aligned(8)));
+
struct bpf_list_head {
__u64 __opaque[2];
} __attribute__((aligned(8)));
--
2.44.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH bpf-next 07/16] libbpf: Add helpers for bpf_dynptr_user
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
` (5 preceding siblings ...)
2024-10-08 9:14 ` [PATCH bpf-next 06/16] bpf: Introduce bpf_dynptr_user Hou Tao
@ 2024-10-08 9:14 ` Hou Tao
2024-10-10 21:50 ` Andrii Nakryiko
2024-10-08 9:14 ` [PATCH bpf-next 08/16] bpf: Handle bpf_dynptr_user in bpf syscall when it is used as input Hou Tao
` (9 subsequent siblings)
16 siblings, 1 reply; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
Add bpf_dynptr_user_init() to initialize a bpf_dynptr_user object,
bpf_dynptr_user_{data,size}() to get the address and length of the
dynptr object, and bpf_dynptr_user_set_size() to set its size.
Instead of exporting these symbols, simply add these helpers as
inline functions in bpf.h.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
tools/lib/bpf/bpf.h | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index a4a7b1ad1b63..92b4afac5f5f 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -700,6 +700,33 @@ struct bpf_token_create_opts {
LIBBPF_API int bpf_token_create(int bpffs_fd,
struct bpf_token_create_opts *opts);
+/* sys_bpf() will check the validity of data and size */
+static inline void bpf_dynptr_user_init(void *data, __u32 size,
+ struct bpf_dynptr_user *dynptr)
+{
+ dynptr->data = (__u64)(unsigned long)data;
+ dynptr->size = size;
+ dynptr->rsvd = 0;
+}
+
+static inline void bpf_dynptr_user_set_size(struct bpf_dynptr_user *dynptr,
+ __u32 new_size)
+{
+ dynptr->size = new_size;
+}
+
+static inline __u32
+bpf_dynptr_user_size(const struct bpf_dynptr_user *dynptr)
+{
+ return dynptr->size;
+}
+
+static inline void *
+bpf_dynptr_user_data(const struct bpf_dynptr_user *dynptr)
+{
+ return (void *)(unsigned long)dynptr->data;
+}
+
#ifdef __cplusplus
} /* extern "C" */
#endif
--
2.44.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH bpf-next 08/16] bpf: Handle bpf_dynptr_user in bpf syscall when it is used as input
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
` (6 preceding siblings ...)
2024-10-08 9:14 ` [PATCH bpf-next 07/16] libbpf: Add helpers for bpf_dynptr_user Hou Tao
@ 2024-10-08 9:14 ` Hou Tao
2024-10-13 13:08 ` Dan Carpenter
2024-10-08 9:14 ` [PATCH bpf-next 09/16] bpf: Handle bpf_dynptr_user in bpf syscall when it is used as output Hou Tao
` (8 subsequent siblings)
16 siblings, 1 reply; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
Introduce the bpf_copy_from_dynptr_ukey() helper to handle map keys with
bpf_dynptr when the map key is used in map lookup, update, delete and
get_next_key operations.
The helper places all variable-length data of these bpf_dynptr_user
objects at the end of the map key to simplify the allocation and the
freeing of map keys with dynptrs.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
kernel/bpf/syscall.c | 98 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 87 insertions(+), 11 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index aa0a500f8fad..5bd75db9b12f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1540,10 +1540,83 @@ int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
return -ENOTSUPP;
}
-static void *__bpf_copy_key(void __user *ukey, u64 key_size)
+static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey)
+{
+ const struct btf_record *record;
+ const struct btf_field *field;
+ struct bpf_dynptr_user *uptr;
+ struct bpf_dynptr_kern *kptr;
+ void *key, *new_key, *kdata;
+ unsigned int key_size, size;
+ bpfptr_t udata;
+ unsigned int i;
+ int err;
+
+ key_size = map->key_size;
+ key = kvmemdup_bpfptr(ukey, key_size);
+ if (!key)
+ return ERR_PTR(-ENOMEM);
+
+ size = key_size;
+ record = map->key_record;
+ for (i = 0; i < record->cnt; i++) {
+ field = &record->fields[i];
+ if (field->type != BPF_DYNPTR)
+ continue;
+ uptr = key + field->offset;
+ if (!uptr->size || uptr->size > map->map_extra || uptr->rsvd) {
+ err = -EINVAL;
+ goto free_key;
+ }
+
+ size += uptr->size;
+ /* Overflow ? */
+ if (size < uptr->size) {
+ err = -E2BIG;
+ goto free_key;
+ }
+ }
+
+ /* Place all dynptrs' data in the end of the key */
+ new_key = kvrealloc(key, size, GFP_USER | __GFP_NOWARN);
+ if (!new_key) {
+ err = -ENOMEM;
+ goto free_key;
+ }
+
+ key = new_key;
+ kdata = key + key_size;
+ for (i = 0; i < record->cnt; i++) {
+ field = &record->fields[i];
+ if (field->type != BPF_DYNPTR)
+ continue;
+
+ uptr = key + field->offset;
+ size = uptr->size;
+ udata = make_bpfptr(uptr->data, bpfptr_is_kernel(ukey));
+ if (copy_from_bpfptr(kdata, udata, size)) {
+ err = -EFAULT;
+ goto free_key;
+ }
+ kptr = (struct bpf_dynptr_kern *)uptr;
+ bpf_dynptr_init(kptr, kdata, BPF_DYNPTR_TYPE_LOCAL, 0, size);
+ kdata += size;
+ }
+
+ return key;
+
+free_key:
+ kvfree(key);
+ return ERR_PTR(err);
+}
+
+static void *__bpf_copy_key(const struct bpf_map *map, void __user *ukey)
{
- if (key_size)
- return vmemdup_user(ukey, key_size);
+ if (bpf_map_has_dynptr_key(map))
+ return bpf_copy_from_dynptr_ukey(map, USER_BPFPTR(ukey));
+
+ if (map->key_size)
+ return vmemdup_user(ukey, map->key_size);
if (ukey)
return ERR_PTR(-EINVAL);
@@ -1551,10 +1624,13 @@ static void *__bpf_copy_key(void __user *ukey, u64 key_size)
return NULL;
}
-static void *___bpf_copy_key(bpfptr_t ukey, u64 key_size)
+static void *___bpf_copy_key(const struct bpf_map *map, bpfptr_t ukey)
{
- if (key_size)
- return kvmemdup_bpfptr(ukey, key_size);
+ if (bpf_map_has_dynptr_key(map))
+ return bpf_copy_from_dynptr_ukey(map, ukey);
+
+ if (map->key_size)
+ return kvmemdup_bpfptr(ukey, map->key_size);
if (!bpfptr_is_null(ukey))
return ERR_PTR(-EINVAL);
@@ -1591,7 +1667,7 @@ static int map_lookup_elem(union bpf_attr *attr)
!btf_record_has_field(map->record, BPF_SPIN_LOCK))
return -EINVAL;
- key = __bpf_copy_key(ukey, map->key_size);
+ key = __bpf_copy_key(map, ukey);
if (IS_ERR(key))
return PTR_ERR(key);
@@ -1658,7 +1734,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
goto err_put;
}
- key = ___bpf_copy_key(ukey, map->key_size);
+ key = ___bpf_copy_key(map, ukey);
if (IS_ERR(key)) {
err = PTR_ERR(key);
goto err_put;
@@ -1705,7 +1781,7 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
goto err_put;
}
- key = ___bpf_copy_key(ukey, map->key_size);
+ key = ___bpf_copy_key(map, ukey);
if (IS_ERR(key)) {
err = PTR_ERR(key);
goto err_put;
@@ -1757,7 +1833,7 @@ static int map_get_next_key(union bpf_attr *attr)
return -EPERM;
if (ukey) {
- key = __bpf_copy_key(ukey, map->key_size);
+ key = __bpf_copy_key(map, ukey);
if (IS_ERR(key))
return PTR_ERR(key);
} else {
@@ -2054,7 +2130,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
goto err_put;
}
- key = __bpf_copy_key(ukey, map->key_size);
+ key = __bpf_copy_key(map, ukey);
if (IS_ERR(key)) {
err = PTR_ERR(key);
goto err_put;
--
2.44.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH bpf-next 09/16] bpf: Handle bpf_dynptr_user in bpf syscall when it is used as output
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
` (7 preceding siblings ...)
2024-10-08 9:14 ` [PATCH bpf-next 08/16] bpf: Handle bpf_dynptr_user in bpf syscall when it is used as input Hou Tao
@ 2024-10-08 9:14 ` Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 10/16] bpf: Disable unsupported functionalities for map with dynptr key Hou Tao
` (7 subsequent siblings)
16 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
For the get_next_key operation, unext_key is used as an output argument.
When there is a dynptr in the map key, unext_key will also be used as an
input argument, because the userspace application needs to pre-allocate a
buffer for each variable-length part in the map key and save the
length and the address of these buffers in bpf_dynptr_user objects.
To support the get_next_key op for maps with dynptr keys,
map_get_next_key() first calls bpf_copy_from_dynptr_ukey() to construct a
map key in which each bpf_dynptr_kern object has the same size as the
corresponding bpf_dynptr_user object. It then calls ->map_get_next_key()
to get the next_key, and finally calls bpf_copy_to_dynptr_ukey() to copy
both the non-dynptr part and the dynptr part of the map key to unext_key.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
kernel/bpf/syscall.c | 88 ++++++++++++++++++++++++++++++++++++--------
1 file changed, 73 insertions(+), 15 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5bd75db9b12f..338f17530068 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1540,7 +1540,7 @@ int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
return -ENOTSUPP;
}
-static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey)
+static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey, bool copy_data)
{
const struct btf_record *record;
const struct btf_field *field;
@@ -1548,7 +1548,6 @@ static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey)
struct bpf_dynptr_kern *kptr;
void *key, *new_key, *kdata;
unsigned int key_size, size;
- bpfptr_t udata;
unsigned int i;
int err;
@@ -1563,6 +1562,7 @@ static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey)
field = &record->fields[i];
if (field->type != BPF_DYNPTR)
continue;
+
uptr = key + field->offset;
if (!uptr->size || uptr->size > map->map_extra || uptr->rsvd) {
err = -EINVAL;
@@ -1593,10 +1593,13 @@ static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey)
uptr = key + field->offset;
size = uptr->size;
- udata = make_bpfptr(uptr->data, bpfptr_is_kernel(ukey));
- if (copy_from_bpfptr(kdata, udata, size)) {
- err = -EFAULT;
- goto free_key;
+ if (copy_data) {
+ bpfptr_t udata = make_bpfptr(uptr->data, bpfptr_is_kernel(ukey));
+
+ if (copy_from_bpfptr(kdata, udata, size)) {
+ err = -EFAULT;
+ goto free_key;
+ }
}
kptr = (struct bpf_dynptr_kern *)uptr;
bpf_dynptr_init(kptr, kdata, BPF_DYNPTR_TYPE_LOCAL, 0, size);
@@ -1613,7 +1616,7 @@ static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey)
static void *__bpf_copy_key(const struct bpf_map *map, void __user *ukey)
{
if (bpf_map_has_dynptr_key(map))
- return bpf_copy_from_dynptr_ukey(map, USER_BPFPTR(ukey));
+ return bpf_copy_from_dynptr_ukey(map, USER_BPFPTR(ukey), true);
if (map->key_size)
return vmemdup_user(ukey, map->key_size);
@@ -1627,7 +1630,7 @@ static void *__bpf_copy_key(const struct bpf_map *map, void __user *ukey)
static void *___bpf_copy_key(const struct bpf_map *map, bpfptr_t ukey)
{
if (bpf_map_has_dynptr_key(map))
- return bpf_copy_from_dynptr_ukey(map, ukey);
+ return bpf_copy_from_dynptr_ukey(map, ukey, true);
if (map->key_size)
return kvmemdup_bpfptr(ukey, map->key_size);
@@ -1638,6 +1641,51 @@ static void *___bpf_copy_key(const struct bpf_map *map, bpfptr_t ukey)
return NULL;
}
+static int bpf_copy_to_dynptr_ukey(const struct bpf_map *map,
+ void __user *ukey, void *key)
+{
+ struct bpf_dynptr_user __user *uptr;
+ struct bpf_dynptr_kern *kptr;
+ struct btf_record *record;
+ unsigned int i, offset;
+
+ offset = 0;
+ record = map->key_record;
+ for (i = 0; i < record->cnt; i++) {
+ struct btf_field *field;
+ unsigned int size;
+ u64 udata;
+
+ field = &record->fields[i];
+ if (field->type != BPF_DYNPTR)
+ continue;
+
+ /* Any no-dynptr part before the dynptr ? */
+ if (offset < field->offset &&
+ copy_to_user(ukey + offset, key + offset, field->offset - offset))
+ return -EFAULT;
+
+ /* dynptr part */
+ uptr = ukey + field->offset;
+ if (copy_from_user(&udata, &uptr->data, sizeof(udata)))
+ return -EFAULT;
+
+ kptr = key + field->offset;
+ size = __bpf_dynptr_size(kptr);
+ if (copy_to_user(u64_to_user_ptr(udata), __bpf_dynptr_data(kptr, size), size) ||
+ put_user(size, &uptr->size) || put_user(0, &uptr->rsvd))
+ return -EFAULT;
+
+ offset = field->offset + field->size;
+ }
+
+ if (offset < map->key_size &&
+ copy_to_user(ukey + offset, key + offset, map->key_size - offset))
+ return -EFAULT;
+
+ return 0;
+}
+
/* last field in 'union bpf_attr' used by this command */
#define BPF_MAP_LOOKUP_ELEM_LAST_FIELD flags
@@ -1840,10 +1888,19 @@ static int map_get_next_key(union bpf_attr *attr)
key = NULL;
}
- err = -ENOMEM;
- next_key = kvmalloc(map->key_size, GFP_USER);
- if (!next_key)
+ if (bpf_map_has_dynptr_key(map))
+ next_key = bpf_copy_from_dynptr_ukey(map, USER_BPFPTR(unext_key), false);
+ else
+ next_key = kvmalloc(map->key_size, GFP_USER);
+ if (IS_ERR_OR_NULL(next_key)) {
+ if (!next_key) {
+ err = -ENOMEM;
+ } else {
+ err = PTR_ERR(next_key);
+ next_key = NULL;
+ }
goto free_key;
+ }
if (bpf_map_is_offloaded(map)) {
err = bpf_map_offload_get_next_key(map, key, next_key);
@@ -1857,12 +1914,13 @@ static int map_get_next_key(union bpf_attr *attr)
if (err)
goto free_next_key;
- err = -EFAULT;
- if (copy_to_user(unext_key, next_key, map->key_size) != 0)
+ if (bpf_map_has_dynptr_key(map))
+ err = bpf_copy_to_dynptr_ukey(map, unext_key, next_key);
+ else
+ err = copy_to_user(unext_key, next_key, map->key_size) ? -EFAULT : 0;
+ if (err)
goto free_next_key;
- err = 0;
-
free_next_key:
kvfree(next_key);
free_key:
--
2.44.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH bpf-next 10/16] bpf: Disable unsupported functionalities for map with dynptr key
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
` (8 preceding siblings ...)
2024-10-08 9:14 ` [PATCH bpf-next 09/16] bpf: Handle bpf_dynptr_user in bpf syscall when it is used as output Hou Tao
@ 2024-10-08 9:14 ` Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 11/16] bpf: Add bpf_mem_alloc_check_size() helper Hou Tao
` (6 subsequent siblings)
16 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
For maps with a dynptr key, batched map operations and dumping all
elements through a bpffs file are not yet supported. Therefore, disable
these functionalities for now.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/bpf.h | 3 ++-
kernel/bpf/syscall.c | 4 ++++
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3e25e94b7457..127952377025 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -592,7 +592,8 @@ static inline bool bpf_map_offload_neutral(const struct bpf_map *map)
static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
{
return (map->btf_value_type_id || map->btf_vmlinux_value_type_id) &&
- map->ops->map_seq_show_elem;
+ map->ops->map_seq_show_elem &&
+ !bpf_map_has_dynptr_key(map);
}
int map_check_no_btf(const struct bpf_map *map,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 338f17530068..262f8a5541b6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5316,6 +5316,10 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
err = -EPERM;
goto err_put;
}
+ if (bpf_map_has_dynptr_key(map)) {
+ err = -EOPNOTSUPP;
+ goto err_put;
+ }
if (cmd == BPF_MAP_LOOKUP_BATCH)
BPF_DO_BATCH(map->ops->map_lookup_batch, map, attr, uattr);
--
2.44.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH bpf-next 11/16] bpf: Add bpf_mem_alloc_check_size() helper
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
` (9 preceding siblings ...)
2024-10-08 9:14 ` [PATCH bpf-next 10/16] bpf: Disable unsupported functionalities for map with dynptr key Hou Tao
@ 2024-10-08 9:14 ` Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 12/16] bpf: Support basic operations for dynptr key in hash map Hou Tao
` (5 subsequent siblings)
16 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
Introduce bpf_mem_alloc_check_size() to check whether the allocation
size exceeds the limit of the kmalloc-equivalent allocator. Because
the upper limit for percpu allocations is LLIST_NODE_SZ bytes larger
than for non-percpu allocations, an extra percpu argument is added to
the helper.
It will be used by the following patch to check whether the maximum size
of variable-length key is valid.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/bpf_mem_alloc.h | 3 +++
kernel/bpf/memalloc.c | 14 +++++++++++++-
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index aaf004d94322..e45162ef59bb 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -33,6 +33,9 @@ int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma, struct obj_cgroup *objcg
int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size);
void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
+/* Check the allocation size for kmalloc equivalent allocator */
+int bpf_mem_alloc_check_size(bool percpu, size_t size);
+
/* kmalloc/kfree equivalent: */
void *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size);
void bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr);
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index b3858a76e0b3..146f5b57cfb1 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -35,6 +35,8 @@
*/
#define LLIST_NODE_SZ sizeof(struct llist_node)
+#define BPF_MEM_ALLOC_SIZE_MAX 4096
+
/* similar to kmalloc, but sizeof == 8 bucket is gone */
static u8 size_index[24] __ro_after_init = {
3, /* 8 */
@@ -65,7 +67,7 @@ static u8 size_index[24] __ro_after_init = {
static int bpf_mem_cache_idx(size_t size)
{
- if (!size || size > 4096)
+ if (!size || size > BPF_MEM_ALLOC_SIZE_MAX)
return -1;
if (size <= 192)
@@ -1005,3 +1007,13 @@ void notrace *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags)
return !ret ? NULL : ret + LLIST_NODE_SZ;
}
+
+int bpf_mem_alloc_check_size(bool percpu, size_t size)
+{
+ /* The size of percpu allocation doesn't have LLIST_NODE_SZ overhead */
+ if ((percpu && size > BPF_MEM_ALLOC_SIZE_MAX) ||
+ (!percpu && size > BPF_MEM_ALLOC_SIZE_MAX - LLIST_NODE_SZ))
+ return -E2BIG;
+
+ return 0;
+}
--
2.44.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH bpf-next 12/16] bpf: Support basic operations for dynptr key in hash map
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
` (10 preceding siblings ...)
2024-10-08 9:14 ` [PATCH bpf-next 11/16] bpf: Add bpf_mem_alloc_check_size() helper Hou Tao
@ 2024-10-08 9:14 ` Hou Tao
2024-10-11 16:47 ` Alexei Starovoitov
2024-10-08 9:14 ` [PATCH bpf-next 13/16] bpf: Export bpf_dynptr_set_size Hou Tao
` (4 subsequent siblings)
16 siblings, 1 reply; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
The patch supports lookup, update, delete and lookup_delete operations
for hash maps with dynptr keys. There are two major differences between
the implementation of a normal hash map and a dynptr-keyed hash map:
1) a dynptr-keyed hash map doesn't support pre-allocation
The reason is that the dynptrs in the map key are allocated dynamically
through the bpf memory allocator. The length limit for these dynptrs is
currently 4088 bytes. Because these dynptrs are allocated dynamically, the
memory consumption will be smaller compared with a normal hash map when
there are big differences between the lengths of these dynptrs.
2) the freed element in dynptr-key map will not be reused immediately
For normal hash map, the freed element may be reused immediately by the
newly-added element, so the lookup may return an incorrect result due to
element deletion and element reuse. However dynptr-key map could not do
that, there are pointers (dynptrs) in the map key and the updates of
these dynptrs are not atomic: both the address and the length of the
dynptr will be updated. If the element is reused immediately, the access
of the dynptr in the freed element may incur invalid memory access due
to the mismatch between the address and the size of dynptr, so reuse the
freed element after one RCU grace period.
Beside the differences above, dynptr-keyed hash map also needs to handle
the maybe-nullified dynptr in the map key.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
kernel/bpf/hashtab.c | 283 +++++++++++++++++++++++++++++++++++++++----
1 file changed, 257 insertions(+), 26 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index b14b87463ee0..edf19d36a413 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -88,6 +88,7 @@ struct bpf_htab {
struct bpf_map map;
struct bpf_mem_alloc ma;
struct bpf_mem_alloc pcpu_ma;
+ struct bpf_mem_alloc dynptr_ma;
struct bucket *buckets;
void *elems;
union {
@@ -425,6 +426,7 @@ static int htab_map_alloc_check(union bpf_attr *attr)
bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU);
bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC);
bool zero_seed = (attr->map_flags & BPF_F_ZERO_SEED);
+ bool dynptr_in_key = (attr->map_flags & BPF_F_DYNPTR_IN_KEY);
int numa_node = bpf_map_attr_numa_node(attr);
BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) !=
@@ -438,6 +440,14 @@ static int htab_map_alloc_check(union bpf_attr *attr)
!bpf_map_flags_access_ok(attr->map_flags))
return -EINVAL;
+ if (dynptr_in_key) {
+ if (percpu || lru || prealloc || !attr->map_extra)
+ return -EINVAL;
+ if ((attr->map_extra >> 32) || bpf_dynptr_check_size(attr->map_extra) ||
+ bpf_mem_alloc_check_size(percpu, attr->map_extra))
+ return -E2BIG;
+ }
+
if (!lru && percpu_lru)
return -EINVAL;
@@ -482,6 +492,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
*/
bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU);
bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC);
+ bool dynptr_in_key = (attr->map_flags & BPF_F_DYNPTR_IN_KEY);
struct bpf_htab *htab;
int err, i;
@@ -598,6 +609,11 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
if (err)
goto free_map_locked;
}
+ if (dynptr_in_key) {
+ err = bpf_mem_alloc_init(&htab->dynptr_ma, 0, false);
+ if (err)
+ goto free_map_locked;
+ }
}
return &htab->map;
@@ -610,6 +626,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
free_percpu(htab->map_locked[i]);
bpf_map_area_free(htab->buckets);
+ bpf_mem_alloc_destroy(&htab->dynptr_ma);
bpf_mem_alloc_destroy(&htab->pcpu_ma);
bpf_mem_alloc_destroy(&htab->ma);
free_elem_count:
@@ -620,13 +637,55 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
return ERR_PTR(err);
}
-static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
+static inline u32 __htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
{
if (likely(key_len % 4 == 0))
return jhash2(key, key_len / 4, hashrnd);
return jhash(key, key_len, hashrnd);
}
+static u32 htab_map_dynptr_hash(const void *key, u32 key_len, u32 hashrnd,
+ const struct btf_record *rec)
+{
+ unsigned int i, cnt = rec->cnt;
+ unsigned int hash = hashrnd;
+ unsigned int offset = 0;
+
+ for (i = 0; i < cnt; i++) {
+ const struct btf_field *field = &rec->fields[i];
+ const struct bpf_dynptr_kern *kptr;
+ unsigned int len;
+
+ if (field->type != BPF_DYNPTR)
+ continue;
+
+ /* non-dynptr part ? */
+ if (offset < field->offset)
+ hash = jhash(key + offset, field->offset - offset, hash);
+
+ /* Skip nullified dynptr */
+ kptr = key + field->offset;
+ if (kptr->data) {
+ len = __bpf_dynptr_size(kptr);
+ hash = jhash(__bpf_dynptr_data(kptr, len), len, hash);
+ }
+ offset = field->offset + field->size;
+ }
+
+ if (offset < key_len)
+ hash = jhash(key + offset, key_len - offset, hash);
+
+ return hash;
+}
+
+static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd,
+ const struct btf_record *rec)
+{
+ if (!rec)
+ return __htab_map_hash(key, key_len, hashrnd);
+ return htab_map_dynptr_hash(key, key_len, hashrnd, rec);
+}
+
static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
{
return &htab->buckets[hash & (htab->n_buckets - 1)];
@@ -637,15 +696,68 @@ static inline struct hlist_nulls_head *select_bucket(struct bpf_htab *htab, u32
return &__select_bucket(htab, hash)->head;
}
+static bool is_same_dynptr_key(const void *key, const void *tgt, unsigned int key_size,
+ const struct btf_record *rec)
+{
+ unsigned int i, cnt = rec->cnt;
+ unsigned int offset = 0;
+
+ for (i = 0; i < cnt; i++) {
+ const struct btf_field *field = &rec->fields[i];
+ const struct bpf_dynptr_kern *kptr, *tgt_kptr;
+ const void *data, *tgt_data;
+ unsigned int len;
+
+ if (field->type != BPF_DYNPTR)
+ continue;
+
+ if (offset < field->offset &&
+ memcmp(key + offset, tgt + offset, field->offset - offset))
+ return false;
+
+ /*
+ * For a nullified dynptr in the target key, __bpf_dynptr_size()
+ * will return 0, and there will be no match for the target key.
+ */
+ kptr = key + field->offset;
+ tgt_kptr = tgt + field->offset;
+ len = __bpf_dynptr_size(kptr);
+ if (len != __bpf_dynptr_size(tgt_kptr))
+ return false;
+
+ data = __bpf_dynptr_data(kptr, len);
+ tgt_data = __bpf_dynptr_data(tgt_kptr, len);
+ if (memcmp(data, tgt_data, len))
+ return false;
+
+ offset = field->offset + field->size;
+ }
+
+ if (offset < key_size &&
+ memcmp(key + offset, tgt + offset, key_size - offset))
+ return false;
+
+ return true;
+}
+
+static inline bool htab_is_same_key(const void *key, const void *tgt, unsigned int key_size,
+ const struct btf_record *rec)
+{
+ if (!rec)
+ return !memcmp(key, tgt, key_size);
+ return is_same_dynptr_key(key, tgt, key_size, rec);
+}
+
/* this lookup function can only be called with bucket lock taken */
static struct htab_elem *lookup_elem_raw(struct hlist_nulls_head *head, u32 hash,
- void *key, u32 key_size)
+ void *key, u32 key_size,
+ const struct btf_record *record)
{
struct hlist_nulls_node *n;
struct htab_elem *l;
hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
- if (l->hash == hash && !memcmp(&l->key, key, key_size))
+ if (l->hash == hash && htab_is_same_key(l->key, key, key_size, record))
return l;
return NULL;
@@ -657,14 +769,15 @@ static struct htab_elem *lookup_elem_raw(struct hlist_nulls_head *head, u32 hash
*/
static struct htab_elem *lookup_nulls_elem_raw(struct hlist_nulls_head *head,
u32 hash, void *key,
- u32 key_size, u32 n_buckets)
+ u32 key_size, u32 n_buckets,
+ const struct btf_record *record)
{
struct hlist_nulls_node *n;
struct htab_elem *l;
again:
hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
- if (l->hash == hash && !memcmp(&l->key, key, key_size))
+ if (l->hash == hash && htab_is_same_key(l->key, key, key_size, record))
return l;
if (unlikely(get_nulls_value(n) != (hash & (n_buckets - 1))))
@@ -681,6 +794,7 @@ static struct htab_elem *lookup_nulls_elem_raw(struct hlist_nulls_head *head,
static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+ const struct btf_record *record;
struct hlist_nulls_head *head;
struct htab_elem *l;
u32 hash, key_size;
@@ -689,12 +803,13 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
!rcu_read_lock_bh_held());
key_size = map->key_size;
+ record = map->key_record;
- hash = htab_map_hash(key, key_size, htab->hashrnd);
+ hash = htab_map_hash(key, key_size, htab->hashrnd, record);
head = select_bucket(htab, hash);
- l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets);
+ l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets, record);
return l;
}
@@ -784,6 +899,26 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map,
return insn - insn_buf;
}
+static void htab_free_dynptr_key(struct bpf_htab *htab, void *key)
+{
+ const struct btf_record *record = htab->map.key_record;
+ unsigned int i, cnt = record->cnt;
+
+ for (i = 0; i < cnt; i++) {
+ const struct btf_field *field = &record->fields[i];
+ struct bpf_dynptr_kern *kptr;
+
+ if (field->type != BPF_DYNPTR)
+ continue;
+
+ /* It may be accessed concurrently, so don't overwrite
+ * the kptr.
+ */
+ kptr = key + field->offset;
+ bpf_mem_free_rcu(&htab->dynptr_ma, kptr->data);
+ }
+}
+
static void check_and_free_fields(struct bpf_htab *htab,
struct htab_elem *elem)
{
@@ -834,6 +969,68 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
return l == tgt_l;
}
+static int htab_copy_dynptr_key(struct bpf_htab *htab, void *dst_key, const void *key, u32 key_size)
+{
+ const struct btf_record *rec = htab->map.key_record;
+ struct bpf_dynptr_kern *dst_kptr;
+ const struct btf_field *field;
+ unsigned int i, cnt, offset;
+ int err;
+
+ offset = 0;
+ cnt = rec->cnt;
+ for (i = 0; i < cnt; i++) {
+ const struct bpf_dynptr_kern *kptr;
+ unsigned int len;
+ const void *data;
+ void *dst_data;
+
+ field = &rec->fields[i];
+ if (field->type != BPF_DYNPTR)
+ continue;
+
+ if (offset < field->offset)
+ memcpy(dst_key + offset, key + offset, field->offset - offset);
+
+ /* Doesn't support nullified dynptr in map key */
+ kptr = key + field->offset;
+ if (!kptr->data) {
+ err = -EINVAL;
+ goto out;
+ }
+ len = __bpf_dynptr_size(kptr);
+ data = __bpf_dynptr_data(kptr, len);
+
+ dst_data = bpf_mem_alloc(&htab->dynptr_ma, len);
+ if (!dst_data) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ memcpy(dst_data, data, len);
+ dst_kptr = dst_key + field->offset;
+ bpf_dynptr_init(dst_kptr, dst_data, BPF_DYNPTR_TYPE_LOCAL, 0, len);
+
+ offset = field->offset + field->size;
+ }
+
+ if (offset < key_size)
+ memcpy(dst_key + offset, key + offset, key_size - offset);
+
+ return 0;
+
+out:
+ for (; i > 0; i--) {
+ field = &rec->fields[i - 1];
+ if (field->type != BPF_DYNPTR)
+ continue;
+
+ dst_kptr = dst_key + field->offset;
+ bpf_mem_free(&htab->dynptr_ma, dst_kptr->data);
+ }
+ return err;
+}
+
/* Called from syscall */
static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
{
@@ -850,12 +1047,12 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
if (!key)
goto find_first_elem;
- hash = htab_map_hash(key, key_size, htab->hashrnd);
+ hash = htab_map_hash(key, key_size, htab->hashrnd, NULL);
head = select_bucket(htab, hash);
/* lookup the key */
- l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets);
+ l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets, NULL);
if (!l)
goto find_first_elem;
@@ -895,10 +1092,27 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
{
+ bool dynptr_in_key = bpf_map_has_dynptr_key(&htab->map);
+
+ if (dynptr_in_key)
+ htab_free_dynptr_key(htab, l->key);
+
check_and_free_fields(htab, l);
+
if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr);
- bpf_mem_cache_free(&htab->ma, l);
+
+ /*
+ * For dynptr key, the update of dynptr in the key is not atomic:
+ * both the pointer and the size are updated. If the element is reused
+ * immediately, the access of the dynptr key during lookup procedure may
+ * incur invalid memory access due to mismatch between the size and the
+ * data pointer, so reuse the element after one RCU GP.
+ */
+ if (dynptr_in_key)
+ bpf_mem_cache_free_rcu(&htab->ma, l);
+ else
+ bpf_mem_cache_free(&htab->ma, l);
}
static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
@@ -1046,7 +1260,19 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
}
}
- memcpy(l_new->key, key, key_size);
+ if (bpf_map_has_dynptr_key(&htab->map)) {
+ int copy_err;
+
+ copy_err = htab_copy_dynptr_key(htab, l_new->key, key, key_size);
+ if (copy_err) {
+ bpf_mem_cache_free(&htab->ma, l_new);
+ l_new = ERR_PTR(copy_err);
+ goto dec_count;
+ }
+ } else {
+ memcpy(l_new->key, key, key_size);
+ }
+
if (percpu) {
if (prealloc) {
pptr = htab_elem_get_ptr(l_new, key_size);
@@ -1102,6 +1328,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 map_flags)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+ const struct btf_record *key_record = map->key_record;
struct htab_elem *l_new = NULL, *l_old;
struct hlist_nulls_head *head;
unsigned long flags;
@@ -1118,7 +1345,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
key_size = map->key_size;
- hash = htab_map_hash(key, key_size, htab->hashrnd);
+ hash = htab_map_hash(key, key_size, htab->hashrnd, key_record);
b = __select_bucket(htab, hash);
head = &b->head;
@@ -1128,7 +1355,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
return -EINVAL;
/* find an element without taking the bucket lock */
l_old = lookup_nulls_elem_raw(head, hash, key, key_size,
- htab->n_buckets);
+ htab->n_buckets, key_record);
ret = check_flags(htab, l_old, map_flags);
if (ret)
return ret;
@@ -1149,7 +1376,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
if (ret)
return ret;
- l_old = lookup_elem_raw(head, hash, key, key_size);
+ l_old = lookup_elem_raw(head, hash, key, key_size, key_record);
ret = check_flags(htab, l_old, map_flags);
if (ret)
@@ -1221,7 +1448,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
key_size = map->key_size;
- hash = htab_map_hash(key, key_size, htab->hashrnd);
+ hash = __htab_map_hash(key, key_size, htab->hashrnd);
b = __select_bucket(htab, hash);
head = &b->head;
@@ -1241,7 +1468,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
if (ret)
goto err_lock_bucket;
- l_old = lookup_elem_raw(head, hash, key, key_size);
+ l_old = lookup_elem_raw(head, hash, key, key_size, NULL);
ret = check_flags(htab, l_old, map_flags);
if (ret)
@@ -1290,7 +1517,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
key_size = map->key_size;
- hash = htab_map_hash(key, key_size, htab->hashrnd);
+ hash = __htab_map_hash(key, key_size, htab->hashrnd);
b = __select_bucket(htab, hash);
head = &b->head;
@@ -1299,7 +1526,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
if (ret)
return ret;
- l_old = lookup_elem_raw(head, hash, key, key_size);
+ l_old = lookup_elem_raw(head, hash, key, key_size, NULL);
ret = check_flags(htab, l_old, map_flags);
if (ret)
@@ -1345,7 +1572,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
key_size = map->key_size;
- hash = htab_map_hash(key, key_size, htab->hashrnd);
+ hash = htab_map_hash(key, key_size, htab->hashrnd, NULL);
b = __select_bucket(htab, hash);
head = &b->head;
@@ -1365,7 +1592,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
if (ret)
goto err_lock_bucket;
- l_old = lookup_elem_raw(head, hash, key, key_size);
+ l_old = lookup_elem_raw(head, hash, key, key_size, NULL);
ret = check_flags(htab, l_old, map_flags);
if (ret)
@@ -1411,6 +1638,7 @@ static long htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
static long htab_map_delete_elem(struct bpf_map *map, void *key)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+ const struct btf_record *key_record = map->key_record;
struct hlist_nulls_head *head;
struct bucket *b;
struct htab_elem *l;
@@ -1423,7 +1651,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key)
key_size = map->key_size;
- hash = htab_map_hash(key, key_size, htab->hashrnd);
+ hash = htab_map_hash(key, key_size, htab->hashrnd, key_record);
b = __select_bucket(htab, hash);
head = &b->head;
@@ -1431,7 +1659,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key)
if (ret)
return ret;
- l = lookup_elem_raw(head, hash, key, key_size);
+ l = lookup_elem_raw(head, hash, key, key_size, key_record);
if (l) {
hlist_nulls_del_rcu(&l->hash_node);
@@ -1459,7 +1687,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key)
key_size = map->key_size;
- hash = htab_map_hash(key, key_size, htab->hashrnd);
+ hash = __htab_map_hash(key, key_size, htab->hashrnd);
b = __select_bucket(htab, hash);
head = &b->head;
@@ -1467,7 +1695,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key)
if (ret)
return ret;
- l = lookup_elem_raw(head, hash, key, key_size);
+ l = lookup_elem_raw(head, hash, key, key_size, NULL);
if (l)
hlist_nulls_del_rcu(&l->hash_node);
@@ -1564,6 +1792,7 @@ static void htab_map_free(struct bpf_map *map)
bpf_map_free_elem_count(map);
free_percpu(htab->extra_elems);
bpf_map_area_free(htab->buckets);
+ bpf_mem_alloc_destroy(&htab->dynptr_ma);
bpf_mem_alloc_destroy(&htab->pcpu_ma);
bpf_mem_alloc_destroy(&htab->ma);
if (htab->use_percpu_counter)
@@ -1600,6 +1829,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
bool is_percpu, u64 flags)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+ const struct btf_record *key_record;
struct hlist_nulls_head *head;
unsigned long bflags;
struct htab_elem *l;
@@ -1608,8 +1838,9 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
int ret;
key_size = map->key_size;
+ key_record = map->key_record;
- hash = htab_map_hash(key, key_size, htab->hashrnd);
+ hash = htab_map_hash(key, key_size, htab->hashrnd, key_record);
b = __select_bucket(htab, hash);
head = &b->head;
@@ -1617,7 +1848,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
if (ret)
return ret;
- l = lookup_elem_raw(head, hash, key, key_size);
+ l = lookup_elem_raw(head, hash, key, key_size, key_record);
if (!l) {
ret = -ENOENT;
} else {
--
2.44.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH bpf-next 13/16] bpf: Export bpf_dynptr_set_size
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
` (11 preceding siblings ...)
2024-10-08 9:14 ` [PATCH bpf-next 12/16] bpf: Support basic operations for dynptr key in hash map Hou Tao
@ 2024-10-08 9:14 ` Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 14/16] bpf: Support get_next_key operation for dynptr key in hash map Hou Tao
` (3 subsequent siblings)
16 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
It will be used by the following patch to shrink the size of a dynptr.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/bpf.h | 1 +
kernel/bpf/helpers.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 127952377025..23db20e6402f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1301,6 +1301,7 @@ enum bpf_dynptr_type {
};
int bpf_dynptr_check_size(u32 size);
+void bpf_dynptr_set_size(struct bpf_dynptr_kern *ptr, u32 new_size);
u32 __bpf_dynptr_size(const struct bpf_dynptr_kern *ptr);
const void *__bpf_dynptr_data(const struct bpf_dynptr_kern *ptr, u32 len);
void *__bpf_dynptr_data_rw(const struct bpf_dynptr_kern *ptr, u32 len);
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index f39521b53a4e..f6fd996e30c7 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1674,7 +1674,7 @@ u32 __bpf_dynptr_size(const struct bpf_dynptr_kern *ptr)
return ptr->size & DYNPTR_SIZE_MASK;
}
-static void bpf_dynptr_set_size(struct bpf_dynptr_kern *ptr, u32 new_size)
+void bpf_dynptr_set_size(struct bpf_dynptr_kern *ptr, u32 new_size)
{
u32 metadata = ptr->size & ~DYNPTR_SIZE_MASK;
--
2.44.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH bpf-next 14/16] bpf: Support get_next_key operation for dynptr key in hash map
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
` (12 preceding siblings ...)
2024-10-08 9:14 ` [PATCH bpf-next 13/16] bpf: Export bpf_dynptr_set_size Hou Tao
@ 2024-10-08 9:14 ` Hou Tao
2024-10-08 9:15 ` [PATCH bpf-next 15/16] bpf: Enable BPF_F_DYNPTR_IN_KEY for " Hou Tao
` (2 subsequent siblings)
16 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:14 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
It first passes the key_record to htab_map_hash() and
lookup_nulls_elem_raw() to find the target key, then uses the
htab_copy_dynptr_key() helper to copy from the target key to the next
key used for output.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
kernel/bpf/hashtab.c | 55 ++++++++++++++++++++++++++++++--------------
1 file changed, 38 insertions(+), 17 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index edf19d36a413..b647fe9f8f9f 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -969,7 +969,8 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
return l == tgt_l;
}
-static int htab_copy_dynptr_key(struct bpf_htab *htab, void *dst_key, const void *key, u32 key_size)
+static int htab_copy_dynptr_key(struct bpf_htab *htab, void *dst_key, const void *key, u32 key_size,
+ bool copy_in)
{
const struct btf_record *rec = htab->map.key_record;
struct bpf_dynptr_kern *dst_kptr;
@@ -994,22 +995,32 @@ static int htab_copy_dynptr_key(struct bpf_htab *htab, void *dst_key, const void
/* Doesn't support nullified dynptr in map key */
kptr = key + field->offset;
- if (!kptr->data) {
+ if (copy_in && !kptr->data) {
err = -EINVAL;
goto out;
}
len = __bpf_dynptr_size(kptr);
data = __bpf_dynptr_data(kptr, len);
- dst_data = bpf_mem_alloc(&htab->dynptr_ma, len);
- if (!dst_data) {
- err = -ENOMEM;
- goto out;
- }
+ dst_kptr = dst_key + field->offset;
+ if (copy_in) {
+ dst_data = bpf_mem_alloc(&htab->dynptr_ma, len);
+ if (!dst_data) {
+ err = -ENOMEM;
+ goto out;
+ }
+ bpf_dynptr_init(dst_kptr, dst_data, BPF_DYNPTR_TYPE_LOCAL, 0, len);
+ } else {
+ dst_data = __bpf_dynptr_data_rw(dst_kptr, len);
+ if (!dst_data) {
+ err = -ENOSPC;
+ goto out;
+ }
+ if (__bpf_dynptr_size(dst_kptr) > len)
+ bpf_dynptr_set_size(dst_kptr, len);
+ }
memcpy(dst_data, data, len);
- dst_kptr = dst_key + field->offset;
- bpf_dynptr_init(dst_kptr, dst_data, BPF_DYNPTR_TYPE_LOCAL, 0, len);
offset = field->offset + field->size;
}
@@ -1020,7 +1031,7 @@ static int htab_copy_dynptr_key(struct bpf_htab *htab, void *dst_key, const void
return 0;
out:
- for (; i > 0; i--) {
+ for (; i > 0 && copy_in; i--) {
field = &rec->fields[i - 1];
if (field->type != BPF_DYNPTR)
continue;
@@ -1031,10 +1042,22 @@ static int htab_copy_dynptr_key(struct bpf_htab *htab, void *dst_key, const void
return err;
}
+static inline int htab_copy_next_key(struct bpf_htab *htab, void *next_key, const void *key,
+ u32 key_size)
+{
+ if (!bpf_map_has_dynptr_key(&htab->map)) {
+ memcpy(next_key, key, key_size);
+ return 0;
+ }
+
+ return htab_copy_dynptr_key(htab, next_key, key, key_size, false);
+}
+
/* Called from syscall */
static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+ const struct btf_record *key_record = map->key_record;
struct hlist_nulls_head *head;
struct htab_elem *l, *next_l;
u32 hash, key_size;
@@ -1047,12 +1070,12 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
if (!key)
goto find_first_elem;
- hash = htab_map_hash(key, key_size, htab->hashrnd, NULL);
+ hash = htab_map_hash(key, key_size, htab->hashrnd, key_record);
head = select_bucket(htab, hash);
/* lookup the key */
- l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets, NULL);
+ l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets, key_record);
if (!l)
goto find_first_elem;
@@ -1063,8 +1086,7 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
if (next_l) {
/* if next elem in this hash list is non-zero, just return it */
- memcpy(next_key, next_l->key, key_size);
- return 0;
+ return htab_copy_next_key(htab, next_key, next_l->key, key_size);
}
/* no more elements in this hash list, go to the next bucket */
@@ -1081,8 +1103,7 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
struct htab_elem, hash_node);
if (next_l) {
/* if it's not empty, just return it */
- memcpy(next_key, next_l->key, key_size);
- return 0;
+ return htab_copy_next_key(htab, next_key, next_l->key, key_size);
}
}
@@ -1263,7 +1284,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
if (bpf_map_has_dynptr_key(&htab->map)) {
int copy_err;
- copy_err = htab_copy_dynptr_key(htab, l_new->key, key, key_size);
+ copy_err = htab_copy_dynptr_key(htab, l_new->key, key, key_size, true);
if (copy_err) {
bpf_mem_cache_free(&htab->ma, l_new);
l_new = ERR_PTR(copy_err);
--
2.44.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH bpf-next 15/16] bpf: Enable BPF_F_DYNPTR_IN_KEY for hash map
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
` (13 preceding siblings ...)
2024-10-08 9:14 ` [PATCH bpf-next 14/16] bpf: Support get_next_key operation for dynptr key in hash map Hou Tao
@ 2024-10-08 9:15 ` Hou Tao
2024-10-08 9:15 ` [PATCH bpf-next 16/16] selftests/bpf: Add test cases for hash map with dynptr key Hou Tao
2024-10-11 22:11 ` [PATCH bpf-next 00/16] Support dynptr key for hash map Eduard Zingerman
16 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:15 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
Enable BPF_F_DYNPTR_IN_KEY in HTAB_CREATE_FLAG_MASK to support the
creation of dynptr key hash map.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
kernel/bpf/hashtab.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index b647fe9f8f9f..b34693a7f35c 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -19,7 +19,7 @@
#define HTAB_CREATE_FLAG_MASK \
(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \
- BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED)
+ BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED | BPF_F_DYNPTR_IN_KEY)
#define BATCH_OPS(_name) \
.map_lookup_batch = \
--
2.44.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH bpf-next 16/16] selftests/bpf: Add test cases for hash map with dynptr key
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
` (14 preceding siblings ...)
2024-10-08 9:15 ` [PATCH bpf-next 15/16] bpf: Enable BPF_F_DYNPTR_IN_KEY for " Hou Tao
@ 2024-10-08 9:15 ` Hou Tao
2024-10-11 18:23 ` Alexei Starovoitov
2024-10-11 22:11 ` [PATCH bpf-next 00/16] Support dynptr key for hash map Eduard Zingerman
16 siblings, 1 reply; 47+ messages in thread
From: Hou Tao @ 2024-10-08 9:15 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
From: Hou Tao <houtao1@huawei.com>
Add three positive test cases to test the basic operations on a
dynptr-keyed hash map. These basic operations include lookup, update,
delete, lookup_and_delete, and get_next_key. These operations are
exercised through both the bpf syscall and a bpf program. The three test
cases use different map keys. The first test case uses both a bare
bpf_dynptr and a struct with only a bpf_dynptr as the map key, the
second one uses a struct with an integer and a bpf_dynptr as the map
key, and the last one uses a struct with two bpf_dynptrs as the map key.
Also add multiple negative test cases for the dynptr-keyed hash map.
These test cases check whether the type of the register for the map key
is the expected one, whether the offset of the access is aligned, and
whether the layout of the dynptr and non-dynptr parts on the stack
matches the definition of map->key_record.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
.../bpf/prog_tests/htab_dynkey_test.c | 451 ++++++++++++++++++
.../bpf/progs/htab_dynkey_test_failure.c | 270 +++++++++++
.../bpf/progs/htab_dynkey_test_success.c | 399 ++++++++++++++++
3 files changed, 1120 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_dynkey_test.c
create mode 100644 tools/testing/selftests/bpf/progs/htab_dynkey_test_failure.c
create mode 100644 tools/testing/selftests/bpf/progs/htab_dynkey_test_success.c
diff --git a/tools/testing/selftests/bpf/prog_tests/htab_dynkey_test.c b/tools/testing/selftests/bpf/prog_tests/htab_dynkey_test.c
new file mode 100644
index 000000000000..30fc085cfc4c
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/htab_dynkey_test.c
@@ -0,0 +1,451 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2024. Huawei Technologies Co., Ltd */
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <test_progs.h>
+
+#include "htab_dynkey_test_success.skel.h"
+#include "htab_dynkey_test_failure.skel.h"
+
+struct id_dname_key {
+ int id;
+ struct bpf_dynptr_user name;
+};
+
+struct dname_key {
+ struct bpf_dynptr_user name;
+};
+
+struct multiple_dynptr_key {
+ struct dname_key f_1;
+ unsigned long f_2;
+ struct id_dname_key f_3;
+ unsigned long f_4;
+};
+
+static char *name_list[] = {
+ "systemd",
+ "[rcu_sched]",
+ "[kworker/42:0H-events_highpri]",
+ "[ksoftirqd/58]",
+ "[rcu_tasks_trace]",
+};
+
+#define INIT_VALUE 100
+#define INIT_ID 1000
+
+static void setup_pure_dynptr_key_map(int fd)
+{
+ struct bpf_dynptr_user key, _cur_key, _next_key;
+ struct bpf_dynptr_user *cur_key, *next_key;
+ bool marked[ARRAY_SIZE(name_list)];
+ unsigned int i, next_idx, size;
+ unsigned long value, got;
+ char name[2][64];
+ char msg[64];
+ void *data;
+ int err;
+
+ /* lookup non-existent keys */
+ for (i = 0; i < ARRAY_SIZE(name_list); i++) {
+ snprintf(msg, sizeof(msg), "#%u bad lookup", i);
+ /* Use strdup() to ensure that the content pointed by dynptr is
+ * used for lookup instead of the pointer in dynptr. sys_bpf()
+ * will handle the NULL case properly.
+ */
+ data = strdup(name_list[i]);
+ bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key);
+ err = bpf_map_lookup_elem(fd, &key, &value);
+ ASSERT_EQ(err, -ENOENT, msg);
+ free(data);
+ }
+
+ /* update keys */
+ for (i = 0; i < ARRAY_SIZE(name_list); i++) {
+ snprintf(msg, sizeof(msg), "#%u insert", i);
+ data = strdup(name_list[i]);
+ bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key);
+ value = INIT_VALUE + i;
+ err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
+ ASSERT_OK(err, msg);
+ free(data);
+ }
+
+ /* lookup existent keys */
+ for (i = 0; i < ARRAY_SIZE(name_list); i++) {
+ snprintf(msg, sizeof(msg), "#%u lookup", i);
+ data = strdup(name_list[i]);
+ bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key);
+ got = 0;
+ err = bpf_map_lookup_elem(fd, &key, &got);
+ ASSERT_OK(err, msg);
+ free(data);
+
+ value = INIT_VALUE + i;
+ ASSERT_EQ(got, value, msg);
+ }
+
+ /* delete keys */
+ for (i = 0; i < ARRAY_SIZE(name_list); i++) {
+ snprintf(msg, sizeof(msg), "#%u delete", i);
+ data = strdup(name_list[i]);
+ bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key);
+ err = bpf_map_delete_elem(fd, &key);
+ ASSERT_OK(err, msg);
+ free(data);
+ }
+
+ /* re-insert keys */
+ for (i = 0; i < ARRAY_SIZE(name_list); i++) {
+ snprintf(msg, sizeof(msg), "#%u re-insert", i);
+ data = strdup(name_list[i]);
+ bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key);
+ value = 0;
+ err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
+ ASSERT_OK(err, msg);
+ free(data);
+ }
+
+ /* overwrite keys */
+ for (i = 0; i < ARRAY_SIZE(name_list); i++) {
+ snprintf(msg, sizeof(msg), "#%u overwrite", i);
+ data = strdup(name_list[i]);
+ bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key);
+ value = INIT_VALUE + i;
+ err = bpf_map_update_elem(fd, &key, &value, BPF_EXIST);
+ ASSERT_OK(err, msg);
+ free(data);
+ }
+
+ /* get_next keys */
+ next_idx = 0;
+ cur_key = NULL;
+ next_key = &_next_key;
+ memset(&marked, 0, sizeof(marked));
+ while (true) {
+ bpf_dynptr_user_init(name[next_idx], sizeof(name[next_idx]), next_key);
+ err = bpf_map_get_next_key(fd, cur_key, next_key);
+ if (err) {
+ ASSERT_EQ(err, -ENOENT, "get_next_key");
+ break;
+ }
+
+ size = bpf_dynptr_user_size(next_key);
+ data = bpf_dynptr_user_data(next_key);
+ for (i = 0; i < ARRAY_SIZE(name_list); i++) {
+ if (size == strlen(name_list[i]) + 1 &&
+ !memcmp(name_list[i], data, size)) {
+ ASSERT_FALSE(marked[i], name_list[i]);
+ marked[i] = true;
+ break;
+ }
+ }
+ ASSERT_EQ(next_key->rsvd, 0, "rsvd");
+
+ if (!cur_key)
+ cur_key = &_cur_key;
+ *cur_key = *next_key;
+ next_idx ^= 1;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(marked); i++)
+ ASSERT_TRUE(marked[i], name_list[i]);
+
+ /* lookup_and_delete all elements except the first one */
+ for (i = 1; i < ARRAY_SIZE(name_list); i++) {
+ snprintf(msg, sizeof(msg), "#%u lookup_delete", i);
+ data = strdup(name_list[i]);
+ bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key);
+ got = 0;
+ err = bpf_map_lookup_and_delete_elem(fd, &key, &got);
+ ASSERT_OK(err, msg);
+ free(data);
+
+ value = INIT_VALUE + i;
+ ASSERT_EQ(got, value, msg);
+ }
+
+ /* get the key after the first element */
+ cur_key = &_cur_key;
+ strncpy(name[0], name_list[0], sizeof(name[0]) - 1);
+ name[0][sizeof(name[0]) - 1] = 0;
+ bpf_dynptr_user_init(name[0], strlen(name[0]) + 1, cur_key);
+
+ next_key = &_next_key;
+ bpf_dynptr_user_init(name[1], sizeof(name[1]), next_key);
+ err = bpf_map_get_next_key(fd, cur_key, next_key);
+ ASSERT_EQ(err, -ENOENT, "get_last");
+}
+
+static void setup_mixed_dynptr_key_map(int fd)
+{
+ struct id_dname_key key, _cur_key, _next_key;
+ struct id_dname_key *cur_key, *next_key;
+ bool marked[ARRAY_SIZE(name_list)];
+ unsigned int i, next_idx, size;
+ unsigned long value;
+ char name[2][64];
+ char msg[64];
+ void *data;
+ int err;
+
+ /* Zero the hole */
+ memset(&key, 0, sizeof(key));
+
+ /* lookup non-existent keys */
+ for (i = 0; i < ARRAY_SIZE(name_list); i++) {
+ snprintf(msg, sizeof(msg), "#%u bad lookup", i);
+ key.id = INIT_ID + i;
+ data = strdup(name_list[i]);
+ bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key.name);
+ err = bpf_map_lookup_elem(fd, &key, &value);
+ ASSERT_EQ(err, -ENOENT, msg);
+ free(data);
+ }
+
+ /* update keys */
+ for (i = 0; i < ARRAY_SIZE(name_list); i++) {
+ snprintf(msg, sizeof(msg), "#%u insert", i);
+ key.id = INIT_ID + i;
+ data = strdup(name_list[i]);
+ bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key.name);
+ value = INIT_VALUE + i;
+ err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
+ ASSERT_OK(err, msg);
+ free(data);
+ }
+
+ /* lookup existing keys */
+ for (i = 0; i < ARRAY_SIZE(name_list); i++) {
+ unsigned long got = 0;
+
+ snprintf(msg, sizeof(msg), "#%u lookup", i);
+ key.id = INIT_ID + i;
+ data = strdup(name_list[i]);
+ bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key.name);
+ err = bpf_map_lookup_elem(fd, &key, &got);
+ ASSERT_OK(err, msg);
+ free(data);
+
+ value = INIT_VALUE + i;
+ ASSERT_EQ(got, value, msg);
+ }
+
+ /* delete keys */
+ for (i = 0; i < ARRAY_SIZE(name_list); i++) {
+ snprintf(msg, sizeof(msg), "#%u delete", i);
+ key.id = INIT_ID + i;
+ data = strdup(name_list[i]);
+ bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key.name);
+ err = bpf_map_delete_elem(fd, &key);
+ ASSERT_OK(err, msg);
+ free(data);
+ }
+
+ /* re-insert keys */
+ for (i = 0; i < ARRAY_SIZE(name_list); i++) {
+ snprintf(msg, sizeof(msg), "#%u re-insert", i);
+ key.id = INIT_ID + i;
+ data = strdup(name_list[i]);
+ bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key.name);
+ value = 0;
+ err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
+ ASSERT_OK(err, msg);
+ free(data);
+ }
+
+ /* overwrite keys */
+ for (i = 0; i < ARRAY_SIZE(name_list); i++) {
+ snprintf(msg, sizeof(msg), "#%u overwrite", i);
+ key.id = INIT_ID + i;
+ data = strdup(name_list[i]);
+ bpf_dynptr_user_init(data, strlen(name_list[i]) + 1, &key.name);
+ value = INIT_VALUE + i;
+ err = bpf_map_update_elem(fd, &key, &value, BPF_EXIST);
+ ASSERT_OK(err, msg);
+ free(data);
+ }
+
+ /* get_next keys */
+ next_idx = 0;
+ cur_key = NULL;
+ next_key = &_next_key;
+ memset(&marked, 0, sizeof(marked));
+ while (true) {
+ bpf_dynptr_user_init(name[next_idx], sizeof(name[next_idx]), &next_key->name);
+ err = bpf_map_get_next_key(fd, cur_key, next_key);
+ if (err) {
+ ASSERT_EQ(err, -ENOENT, "last get_next");
+ break;
+ }
+
+ size = bpf_dynptr_user_size(&next_key->name);
+ data = bpf_dynptr_user_data(&next_key->name);
+ for (i = 0; i < ARRAY_SIZE(name_list); i++) {
+ if (size == strlen(name_list[i]) + 1 &&
+ !memcmp(name_list[i], data, size)) {
+ ASSERT_FALSE(marked[i], name_list[i]);
+ ASSERT_EQ(next_key->id, INIT_ID + i, name_list[i]);
+ marked[i] = true;
+ break;
+ }
+ }
+ ASSERT_EQ(next_key->name.rsvd, 0, "rsvd");
+
+ if (!cur_key)
+ cur_key = &_cur_key;
+ *cur_key = *next_key;
+ next_idx ^= 1;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(marked); i++)
+ ASSERT_TRUE(marked[i], name_list[i]);
+}
+
+static void setup_multiple_dynptr_key_map(int fd)
+{
+ struct multiple_dynptr_key key, cur_key, next_key;
+ unsigned long value;
+ unsigned int size;
+ char name[4][64];
+ void *data[2];
+ int err;
+
+ /* Zero the hole */
+ memset(&key, 0, sizeof(key));
+
+ key.f_2 = 2;
+ key.f_3.id = 3;
+ key.f_4 = 4;
+
+ /* lookup a non-existent key */
+ data[0] = strdup(name_list[0]);
+ data[1] = strdup(name_list[1]);
+ bpf_dynptr_user_init(data[0], strlen(name_list[0]) + 1, &key.f_1.name);
+ bpf_dynptr_user_init(data[1], strlen(name_list[1]) + 1, &key.f_3.name);
+ err = bpf_map_lookup_elem(fd, &key, &value);
+ ASSERT_EQ(err, -ENOENT, "lookup");
+
+ /* update key */
+ value = INIT_VALUE;
+ err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
+ ASSERT_OK(err, "update");
+ free(data[0]);
+ free(data[1]);
+
+ /* lookup key */
+ data[0] = strdup(name_list[0]);
+ data[1] = strdup(name_list[1]);
+ bpf_dynptr_user_init(data[0], strlen(name_list[0]) + 1, &key.f_1.name);
+ bpf_dynptr_user_init(data[1], strlen(name_list[1]) + 1, &key.f_3.name);
+ err = bpf_map_lookup_elem(fd, &key, &value);
+ ASSERT_OK(err, "lookup");
+ ASSERT_EQ(value, INIT_VALUE, "lookup");
+
+ /* delete key */
+ err = bpf_map_delete_elem(fd, &key);
+ ASSERT_OK(err, "delete");
+ free(data[0]);
+ free(data[1]);
+
+ /* re-insert keys */
+ bpf_dynptr_user_init(name_list[0], strlen(name_list[0]) + 1, &key.f_1.name);
+ bpf_dynptr_user_init(name_list[1], strlen(name_list[1]) + 1, &key.f_3.name);
+ value = 0;
+ err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
+ ASSERT_OK(err, "re-insert");
+
+ /* overwrite keys */
+ data[0] = strdup(name_list[0]);
+ data[1] = strdup(name_list[1]);
+ bpf_dynptr_user_init(data[0], strlen(name_list[0]) + 1, &key.f_1.name);
+ bpf_dynptr_user_init(data[1], strlen(name_list[1]) + 1, &key.f_3.name);
+ value = INIT_VALUE;
+ err = bpf_map_update_elem(fd, &key, &value, BPF_EXIST);
+ ASSERT_OK(err, "overwrite");
+ free(data[0]);
+ free(data[1]);
+
+ /* get_next_key */
+ bpf_dynptr_user_init(name[0], sizeof(name[0]), &next_key.f_1.name);
+ bpf_dynptr_user_init(name[1], sizeof(name[1]), &next_key.f_3.name);
+ err = bpf_map_get_next_key(fd, NULL, &next_key);
+ ASSERT_OK(err, "first get_next");
+
+ size = bpf_dynptr_user_size(&next_key.f_1.name);
+ data[0] = bpf_dynptr_user_data(&next_key.f_1.name);
+ if (ASSERT_EQ(size, strlen(name_list[0]) + 1, "f_1 size"))
+ ASSERT_TRUE(!memcmp(name_list[0], data[0], size), "f_1 data");
+ ASSERT_EQ(next_key.f_1.name.rsvd, 0, "f_1 rsvd");
+
+ ASSERT_EQ(next_key.f_2, 2, "f_2");
+
+ ASSERT_EQ(next_key.f_3.id, 3, "f_3 id");
+ size = bpf_dynptr_user_size(&next_key.f_3.name);
+ data[0] = bpf_dynptr_user_data(&next_key.f_3.name);
+ if (ASSERT_EQ(size, strlen(name_list[1]) + 1, "f_3 size"))
+ ASSERT_TRUE(!memcmp(name_list[1], data[0], size), "f_3 data");
+ ASSERT_EQ(next_key.f_3.name.rsvd, 0, "f_3 rsvd");
+
+ ASSERT_EQ(next_key.f_4, 4, "f_4");
+
+ cur_key = next_key;
+ bpf_dynptr_user_init(name[2], sizeof(name[2]), &next_key.f_1.name);
+ bpf_dynptr_user_init(name[3], sizeof(name[3]), &next_key.f_3.name);
+ err = bpf_map_get_next_key(fd, &cur_key, &next_key);
+ ASSERT_EQ(err, -ENOENT, "last get_next_key");
+}
+
+static void test_htab_dynptr_key(bool pure, bool multiple)
+{
+ struct htab_dynkey_test_success *skel;
+ struct bpf_program *prog;
+ int err;
+
+ skel = htab_dynkey_test_success__open();
+ if (!ASSERT_OK_PTR(skel, "open()"))
+ return;
+
+ prog = pure ? skel->progs.pure_dynptr_key :
+ (multiple ? skel->progs.multiple_dynptr_key : skel->progs.mixed_dynptr_key);
+ bpf_program__set_autoload(prog, true);
+
+ err = htab_dynkey_test_success__load(skel);
+ if (!ASSERT_OK(err, "load()"))
+ goto out;
+
+ if (pure) {
+ setup_pure_dynptr_key_map(bpf_map__fd(skel->maps.htab_1));
+ setup_pure_dynptr_key_map(bpf_map__fd(skel->maps.htab_2));
+ } else if (multiple) {
+ setup_multiple_dynptr_key_map(bpf_map__fd(skel->maps.htab_4));
+ } else {
+ setup_mixed_dynptr_key_map(bpf_map__fd(skel->maps.htab_3));
+ }
+
+ skel->bss->pid = getpid();
+
+ err = htab_dynkey_test_success__attach(skel);
+ if (!ASSERT_OK(err, "attach()"))
+ goto out;
+
+ usleep(1);
+
+ ASSERT_EQ(skel->bss->test_err, 0, "test");
+out:
+ htab_dynkey_test_success__destroy(skel);
+}
+
+void test_htab_dynkey_test(void)
+{
+ if (test__start_subtest("pure_dynptr_key"))
+ test_htab_dynptr_key(true, false);
+ if (test__start_subtest("mixed_dynptr_key"))
+ test_htab_dynptr_key(false, false);
+ if (test__start_subtest("multiple_dynptr_key"))
+ test_htab_dynptr_key(false, true);
+
+ RUN_TESTS(htab_dynkey_test_failure);
+}
diff --git a/tools/testing/selftests/bpf/progs/htab_dynkey_test_failure.c b/tools/testing/selftests/bpf/progs/htab_dynkey_test_failure.c
new file mode 100644
index 000000000000..c391e4fc5320
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/htab_dynkey_test_failure.c
@@ -0,0 +1,270 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2024. Huawei Technologies Co., Ltd */
+#include <linux/types.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <errno.h>
+
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct bpf_map;
+
+struct id_dname_key {
+ int id;
+ struct bpf_dynptr name;
+};
+
+struct dname_id_key {
+ struct bpf_dynptr name;
+ int id;
+};
+
+struct id_name_key {
+ int id;
+ char name[20];
+};
+
+struct dname_key {
+ struct bpf_dynptr name;
+};
+
+struct dname_dname_key {
+ struct bpf_dynptr name_1;
+ struct bpf_dynptr name_2;
+};
+
+struct dname_dname_id_key {
+ struct dname_dname_key names;
+ __u64 id;
+};
+
+struct dname_id_id_id_key {
+ struct bpf_dynptr name;
+ __u64 id[3];
+};
+
+struct dname_dname_dname_key {
+ struct bpf_dynptr name_1;
+ struct bpf_dynptr name_2;
+ struct bpf_dynptr name_3;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 10);
+ __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_DYNPTR_IN_KEY);
+ __type(key, struct id_dname_key);
+ __type(value, unsigned long);
+ __uint(map_extra, 1024);
+} htab_1 SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 10);
+ __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_DYNPTR_IN_KEY);
+ __type(key, struct dname_key);
+ __type(value, unsigned long);
+ __uint(map_extra, 1024);
+} htab_2 SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 10);
+ __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_DYNPTR_IN_KEY);
+ __type(key, struct dname_dname_id_key);
+ __type(value, unsigned long);
+ __uint(map_extra, 1024);
+} htab_3 SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 10);
+ __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_DYNPTR_IN_KEY);
+ __type(key, struct bpf_dynptr);
+ __type(value, unsigned long);
+ __uint(map_extra, 1024);
+} htab_4 SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_RINGBUF);
+ __uint(max_entries, 4096);
+} ringbuf SEC(".maps");
+
+char dynptr_buf[32] = {};
+
+/* uninitialized dynptr */
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+__failure __msg("dynptr-key expects dynptr at offset 8")
+int BPF_PROG(uninit_dynptr)
+{
+ struct id_dname_key key;
+
+ key.id = 100;
+ bpf_map_lookup_elem(&htab_1, &key);
+
+ return 0;
+}
+
+/* invalid dynptr */
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+__failure __msg("dynptr-key expects dynptr at offset 8")
+int BPF_PROG(invalid_dynptr)
+{
+ struct id_dname_key key;
+
+ key.id = 100;
+ bpf_ringbuf_reserve_dynptr(&ringbuf, 10, 0, &key.name);
+ bpf_ringbuf_discard_dynptr(&key.name, 0);
+ bpf_map_lookup_elem(&htab_1, &key);
+
+ return 0;
+}
+
+/* expects a non-dynptr, got a dynptr */
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+__failure __msg("dynptr-key expects non-dynptr at offset 0")
+int BPF_PROG(invalid_non_dynptr)
+{
+ struct dname_id_key key;
+
+ __builtin_memcpy(dynptr_buf, "test", 4);
+ bpf_dynptr_from_mem(dynptr_buf, 4, 0, &key.name);
+ key.id = 100;
+ bpf_map_lookup_elem(&htab_1, &key);
+
+ return 0;
+}
+
+/* expects a dynptr, got a non-dynptr */
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+__failure __msg("dynptr-key expects dynptr at offset 8")
+int BPF_PROG(no_dynptr)
+{
+ struct id_name_key key;
+
+ key.id = 100;
+ __builtin_memset(key.name, 0, sizeof(key.name));
+ __builtin_memcpy(key.name, "test", 4);
+ bpf_map_lookup_elem(&htab_1, &key);
+
+ return 0;
+}
+
+/* malformed */
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+__failure __msg("malformed dynptr-key at offset 8")
+int BPF_PROG(malformed_dynptr)
+{
+ struct dname_dname_key key;
+
+ bpf_dynptr_from_mem(dynptr_buf, 4, 0, &key.name_1);
+ bpf_dynptr_from_mem(dynptr_buf, 4, 0, &key.name_2);
+
+ bpf_map_lookup_elem(&htab_2, (void *)&key + 8);
+
+ return 0;
+}
+
+/* expects a non-dynptr, got a dynptr */
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+__failure __msg("dynptr-key expects non-dynptr at offset 32")
+int BPF_PROG(invalid_non_dynptr_2)
+{
+ struct dname_dname_dname_key key;
+
+ bpf_dynptr_from_mem(dynptr_buf, 4, 0, &key.name_1);
+ bpf_dynptr_from_mem(dynptr_buf, 4, 0, &key.name_2);
+ bpf_dynptr_from_mem(dynptr_buf, 4, 0, &key.name_3);
+
+ bpf_map_lookup_elem(&htab_3, &key);
+
+ return 0;
+}
+
+/* expects a dynptr, got a non-dynptr */
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+__failure __msg("dynptr-key expects dynptr at offset 16")
+int BPF_PROG(no_dynptr_2)
+{
+ struct dname_id_id_id_key key;
+
+ bpf_dynptr_from_mem(dynptr_buf, 4, 0, &key.name);
+ bpf_map_lookup_elem(&htab_3, &key);
+
+ return 0;
+}
+
+/* misaligned */
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+__failure __msg("R2 misaligned offset -28 for dynptr-key")
+int BPF_PROG(misaligned_dynptr)
+{
+ struct dname_dname_key key;
+
+ bpf_map_lookup_elem(&htab_1, (char *)&key + 4);
+
+ return 0;
+}
+
+/* variable offset */
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+__failure __msg("R2 variable offset prohibited for dynptr-key")
+int BPF_PROG(variable_offset_dynptr)
+{
+ struct bpf_dynptr dynptr_1;
+ struct bpf_dynptr dynptr_2;
+ char *key;
+
+ bpf_dynptr_from_mem(dynptr_buf, 4, 0, &dynptr_1);
+ bpf_dynptr_from_mem(dynptr_buf, 4, 0, &dynptr_2);
+
+ key = (char *)&dynptr_2;
+ key = key + (bpf_get_prandom_u32() & 1) * 16;
+
+ bpf_map_lookup_elem(&htab_2, key);
+
+ return 0;
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+__failure __msg("map dynptr-key requires stack ptr but got map_value")
+int BPF_PROG(map_value_as_key)
+{
+ bpf_map_lookup_elem(&htab_1, dynptr_buf);
+
+ return 0;
+}
+
+static int lookup_htab(struct bpf_map *map, struct id_dname_key *key, void *value, void *data)
+{
+ bpf_map_lookup_elem(&htab_1, key);
+ return 0;
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+__failure __msg("map dynptr-key requires stack ptr but got map_key")
+int BPF_PROG(map_key_as_key)
+{
+ bpf_for_each_map_elem(&htab_1, lookup_htab, NULL, 0);
+ return 0;
+}
+
+__noinline __weak int subprog_lookup_htab(struct bpf_dynptr *dynptr)
+{
+ bpf_map_lookup_elem(&htab_4, dynptr);
+ return 0;
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+__failure __msg("R2 type=dynptr_ptr expected=")
+int BPF_PROG(subprog_dynptr)
+{
+ struct bpf_dynptr dynptr;
+
+ bpf_dynptr_from_mem(dynptr_buf, 4, 0, &dynptr);
+ subprog_lookup_htab(&dynptr);
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/htab_dynkey_test_success.c b/tools/testing/selftests/bpf/progs/htab_dynkey_test_success.c
new file mode 100644
index 000000000000..52736b3519fb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/htab_dynkey_test_success.c
@@ -0,0 +1,399 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2024. Huawei Technologies Co., Ltd */
+#include <linux/types.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <errno.h>
+
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct pure_dynptr_key {
+ struct bpf_dynptr name;
+};
+
+struct mixed_dynptr_key {
+ int id;
+ struct bpf_dynptr name;
+};
+
+struct multiple_dynptr_key {
+ struct pure_dynptr_key f_1;
+ unsigned long f_2;
+ struct mixed_dynptr_key f_3;
+ unsigned long f_4;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 10);
+ __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_DYNPTR_IN_KEY);
+ __type(key, struct bpf_dynptr);
+ __type(value, unsigned long);
+ __uint(map_extra, 1024);
+} htab_1 SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 10);
+ __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_DYNPTR_IN_KEY);
+ __type(key, struct pure_dynptr_key);
+ __type(value, unsigned long);
+ __uint(map_extra, 1024);
+} htab_2 SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 10);
+ __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_DYNPTR_IN_KEY);
+ __type(key, struct mixed_dynptr_key);
+ __type(value, unsigned long);
+ __uint(map_extra, 1024);
+} htab_3 SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 10);
+ __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_DYNPTR_IN_KEY);
+ __type(key, struct multiple_dynptr_key);
+ __type(value, unsigned long);
+ __uint(map_extra, 1024);
+} htab_4 SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_RINGBUF);
+ __uint(max_entries, 4096);
+} ringbuf SEC(".maps");
+
+int pid = 0;
+int test_err = 0;
+char dynptr_buf[2][32] = {{}, {}};
+
+static const char systemd_name[] = "systemd";
+static const char udevd_name[] = "udevd";
+static const char rcu_sched_name[] = "[rcu_sched]";
+
+struct bpf_map;
+
+static int test_pure_dynptr_key_htab(struct bpf_map *htab)
+{
+ unsigned long new_value, *value;
+ struct bpf_dynptr key;
+ int err = 0;
+
+ /* Lookup an existing key */
+ __builtin_memcpy(dynptr_buf[0], systemd_name, sizeof(systemd_name));
+ bpf_dynptr_from_mem(dynptr_buf[0], sizeof(systemd_name), 0, &key);
+ value = bpf_map_lookup_elem(htab, &key);
+ if (!value) {
+ err = 1;
+ goto out;
+ }
+ if (*value != 100) {
+ err = 2;
+ goto out;
+ }
+
+ /* Look up a non-existent key */
+ __builtin_memcpy(dynptr_buf[0], udevd_name, sizeof(udevd_name));
+ bpf_dynptr_from_mem(dynptr_buf[0], sizeof(udevd_name), 0, &key);
+ value = bpf_map_lookup_elem(htab, &key);
+ if (value) {
+ err = 3;
+ goto out;
+ }
+
+ /* Insert a new key */
+ new_value = 42;
+ err = bpf_map_update_elem(htab, &key, &new_value, BPF_NOEXIST);
+ if (err) {
+ err = 4;
+ goto out;
+ }
+
+ /* Insert an existing key */
+ bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(udevd_name), 0, &key);
+ err = bpf_dynptr_write(&key, 0, (void *)udevd_name, sizeof(udevd_name), 0);
+ if (err) {
+ bpf_ringbuf_discard_dynptr(&key, 0);
+ err = 5;
+ goto out;
+ }
+
+ err = bpf_map_update_elem(htab, &key, &new_value, BPF_NOEXIST);
+ bpf_ringbuf_discard_dynptr(&key, 0);
+ if (err != -EEXIST) {
+ err = 6;
+ goto out;
+ }
+
+ /* Lookup it again */
+ bpf_dynptr_from_mem(dynptr_buf[0], sizeof(udevd_name), 0, &key);
+ value = bpf_map_lookup_elem(htab, &key);
+ if (!value) {
+ err = 7;
+ goto out;
+ }
+ if (*value != 42) {
+ err = 8;
+ goto out;
+ }
+
+ /* Delete then lookup it */
+ bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(udevd_name), 0, &key);
+ err = bpf_dynptr_write(&key, 0, (void *)udevd_name, sizeof(udevd_name), 0);
+ if (err) {
+ bpf_ringbuf_discard_dynptr(&key, 0);
+ err = 9;
+ goto out;
+ }
+ err = bpf_map_delete_elem(htab, &key);
+ bpf_ringbuf_discard_dynptr(&key, 0);
+ if (err) {
+ err = 10;
+ goto out;
+ }
+
+ bpf_dynptr_from_mem(dynptr_buf[0], sizeof(udevd_name), 0, &key);
+ value = bpf_map_lookup_elem(htab, &key);
+ if (value) {
+ err = 11;
+ goto out;
+ }
+out:
+ return err;
+}
+
+static int test_mixed_dynptr_key_htab(struct bpf_map *htab)
+{
+ unsigned long new_value, *value;
+ char udevd_name[] = "udevd";
+ struct mixed_dynptr_key key;
+ int err = 0;
+
+ __builtin_memset(&key, 0, sizeof(key));
+ key.id = 1000;
+
+ /* Lookup an existing key */
+ __builtin_memcpy(dynptr_buf[0], systemd_name, sizeof(systemd_name));
+ bpf_dynptr_from_mem(dynptr_buf[0], sizeof(systemd_name), 0, &key.name);
+ value = bpf_map_lookup_elem(htab, &key);
+ if (!value) {
+ err = 1;
+ goto out;
+ }
+ if (*value != 100) {
+ err = 2;
+ goto out;
+ }
+
+ /* Look up a non-existent key */
+ __builtin_memcpy(dynptr_buf[0], udevd_name, sizeof(udevd_name));
+ bpf_dynptr_from_mem(dynptr_buf[0], sizeof(udevd_name), 0, &key.name);
+ value = bpf_map_lookup_elem(htab, &key);
+ if (value) {
+ err = 3;
+ goto out;
+ }
+
+ /* Insert a new key */
+ new_value = 42;
+ err = bpf_map_update_elem(htab, &key, &new_value, BPF_NOEXIST);
+ if (err) {
+ err = 4;
+ goto out;
+ }
+
+ /* Insert an existing key */
+ bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(udevd_name), 0, &key.name);
+ err = bpf_dynptr_write(&key.name, 0, (void *)udevd_name, sizeof(udevd_name), 0);
+ if (err) {
+ bpf_ringbuf_discard_dynptr(&key.name, 0);
+ err = 5;
+ goto out;
+ }
+
+ err = bpf_map_update_elem(htab, &key, &new_value, BPF_NOEXIST);
+ bpf_ringbuf_discard_dynptr(&key.name, 0);
+ if (err != -EEXIST) {
+ err = 6;
+ goto out;
+ }
+
+ /* Lookup it again */
+ bpf_dynptr_from_mem(dynptr_buf[0], sizeof(udevd_name), 0, &key.name);
+ value = bpf_map_lookup_elem(htab, &key);
+ if (!value) {
+ err = 7;
+ goto out;
+ }
+ if (*value != 42) {
+ err = 8;
+ goto out;
+ }
+
+ /* Delete then lookup it */
+ bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(udevd_name), 0, &key.name);
+ err = bpf_dynptr_write(&key.name, 0, (void *)udevd_name, sizeof(udevd_name), 0);
+ if (err) {
+ bpf_ringbuf_discard_dynptr(&key.name, 0);
+ err = 9;
+ goto out;
+ }
+ err = bpf_map_delete_elem(htab, &key);
+ bpf_ringbuf_discard_dynptr(&key.name, 0);
+ if (err) {
+ err = 10;
+ goto out;
+ }
+
+ bpf_dynptr_from_mem(dynptr_buf[0], sizeof(udevd_name), 0, &key.name);
+ value = bpf_map_lookup_elem(htab, &key);
+ if (value) {
+ err = 11;
+ goto out;
+ }
+out:
+ return err;
+}
+
+static int test_multiple_dynptr_key_htab(struct bpf_map *htab)
+{
+ unsigned long new_value, *value;
+ struct multiple_dynptr_key key;
+ int err = 0;
+
+ __builtin_memset(&key, 0, sizeof(key));
+ key.f_2 = 2;
+ key.f_3.id = 3;
+ key.f_4 = 4;
+
+ /* Lookup an existing key */
+ __builtin_memcpy(dynptr_buf[0], systemd_name, sizeof(systemd_name));
+ bpf_dynptr_from_mem(dynptr_buf[0], sizeof(systemd_name), 0, &key.f_1.name);
+ __builtin_memcpy(dynptr_buf[1], rcu_sched_name, sizeof(rcu_sched_name));
+ bpf_dynptr_from_mem(dynptr_buf[1], sizeof(rcu_sched_name), 0, &key.f_3.name);
+ value = bpf_map_lookup_elem(htab, &key);
+ if (!value) {
+ err = 1;
+ goto out;
+ }
+ if (*value != 100) {
+ err = 2;
+ goto out;
+ }
+
+ /* Look up a non-existent key */
+ bpf_dynptr_from_mem(dynptr_buf[1], sizeof(rcu_sched_name), 0, &key.f_1.name);
+ bpf_dynptr_from_mem(dynptr_buf[0], sizeof(systemd_name), 0, &key.f_3.name);
+ value = bpf_map_lookup_elem(htab, &key);
+ if (value) {
+ err = 3;
+ goto out;
+ }
+
+ /* Insert a new key */
+ new_value = 42;
+ err = bpf_map_update_elem(htab, &key, &new_value, BPF_NOEXIST);
+ if (err) {
+ err = 4;
+ goto out;
+ }
+
+ /* Insert an existing key */
+ bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(rcu_sched_name), 0, &key.f_1.name);
+ err = bpf_dynptr_write(&key.f_1.name, 0, (void *)rcu_sched_name, sizeof(rcu_sched_name), 0);
+ if (err) {
+ bpf_ringbuf_discard_dynptr(&key.f_1.name, 0);
+ err = 5;
+ goto out;
+ }
+ err = bpf_map_update_elem(htab, &key, &new_value, BPF_NOEXIST);
+ bpf_ringbuf_discard_dynptr(&key.f_1.name, 0);
+ if (err != -EEXIST) {
+ err = 6;
+ goto out;
+ }
+
+ /* Lookup a non-existent key */
+ bpf_dynptr_from_mem(dynptr_buf[1], sizeof(rcu_sched_name), 0, &key.f_1.name);
+ key.f_4 = 0;
+ value = bpf_map_lookup_elem(htab, &key);
+ if (value) {
+ err = 7;
+ goto out;
+ }
+
+ /* Lookup an existent key */
+ key.f_4 = 4;
+ value = bpf_map_lookup_elem(htab, &key);
+ if (!value) {
+ err = 8;
+ goto out;
+ }
+ if (*value != 42) {
+ err = 9;
+ goto out;
+ }
+
+ /* Delete the newly-inserted key */
+ bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(systemd_name), 0, &key.f_3.name);
+ err = bpf_dynptr_write(&key.f_3.name, 0, (void *)systemd_name, sizeof(systemd_name), 0);
+ if (err) {
+ bpf_ringbuf_discard_dynptr(&key.f_3.name, 0);
+ err = 10;
+ goto out;
+ }
+ err = bpf_map_delete_elem(htab, &key);
+ if (err) {
+ bpf_ringbuf_discard_dynptr(&key.f_3.name, 0);
+ err = 11;
+ goto out;
+ }
+
+ /* Lookup it again */
+ value = bpf_map_lookup_elem(htab, &key);
+ bpf_ringbuf_discard_dynptr(&key.f_3.name, 0);
+ if (value) {
+ err = 12;
+ goto out;
+ }
+out:
+ return err;
+}
+
+SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
+int BPF_PROG(pure_dynptr_key)
+{
+ if (bpf_get_current_pid_tgid() >> 32 != pid)
+ return 0;
+
+ test_err = test_pure_dynptr_key_htab((struct bpf_map *)&htab_1);
+ test_err |= test_pure_dynptr_key_htab((struct bpf_map *)&htab_2) << 8;
+
+ return 0;
+}
+
+SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
+int BPF_PROG(mixed_dynptr_key)
+{
+ if (bpf_get_current_pid_tgid() >> 32 != pid)
+ return 0;
+
+ test_err = test_mixed_dynptr_key_htab((struct bpf_map *)&htab_3);
+
+ return 0;
+}
+
+SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
+int BPF_PROG(multiple_dynptr_key)
+{
+ if (bpf_get_current_pid_tgid() >> 32 != pid)
+ return 0;
+
+ test_err = test_multiple_dynptr_key_htab((struct bpf_map *)&htab_4);
+
+ return 0;
+}
--
2.44.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH bpf-next 01/16] bpf: Introduce map flag BPF_F_DYNPTR_IN_KEY
2024-10-08 9:14 ` [PATCH bpf-next 01/16] bpf: Introduce map flag BPF_F_DYNPTR_IN_KEY Hou Tao
@ 2024-10-10 2:21 ` Alexei Starovoitov
2024-10-21 13:45 ` Hou Tao
0 siblings, 1 reply; 47+ messages in thread
From: Alexei Starovoitov @ 2024-10-10 2:21 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao, Xu Kuohai
On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> index c6cd7c7aeeee..07f7df308a01 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1409,6 +1409,9 @@ enum {
>
> /* Do not translate kernel bpf_arena pointers to user pointers */
> BPF_F_NO_USER_CONV = (1U << 18),
> +
> +/* Create a map with bpf_dynptr in key */
> + BPF_F_DYNPTR_IN_KEY = (1U << 19),
> };
If I'm reading the other patches correctly, this uapi flag
is unnecessary.
BTF describes the fields, and the dynptr is either there or not.
Why require users to add an extra flag?
* Re: [PATCH bpf-next 03/16] bpf: Parse bpf_dynptr in map key
2024-10-08 9:14 ` [PATCH bpf-next 03/16] bpf: Parse bpf_dynptr in map key Hou Tao
@ 2024-10-10 18:02 ` Eduard Zingerman
2024-10-21 13:48 ` Hou Tao
2024-10-11 16:29 ` Alexei Starovoitov
1 sibling, 1 reply; 47+ messages in thread
From: Eduard Zingerman @ 2024-10-10 18:02 UTC (permalink / raw)
To: Hou Tao, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1, xukuohai
On Tue, 2024-10-08 at 17:14 +0800, Hou Tao wrote:
[...]
> diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
> index 645bd30bc9a9..a072835dc645 100644
> --- a/kernel/bpf/map_in_map.c
> +++ b/kernel/bpf/map_in_map.c
[...]
> @@ -45,9 +45,13 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
> * invalid/empty/valid, but ERR_PTR in case of errors. During
> * equality NULL or IS_ERR is equivalent.
> */
> - struct bpf_map *ret = ERR_CAST(inner_map_meta->record);
> - kfree(inner_map_meta);
> - return ret;
> + ret = ERR_CAST(inner_map_meta->record);
> + goto free;
> + }
> + inner_map_meta->key_record = btf_record_dup(inner_map->key_record);
> + if (IS_ERR(inner_map_meta->key_record)) {
> + ret = ERR_CAST(inner_map_meta->key_record);
> + goto free;
The 'goto free' executes a call to bpf_map_meta_free(), which does
btf_put(map_meta->btf), but the corresponding btf_get(inner_map->btf) only
happens on the lines below => when the 'free' branch is taken, we
'put' a BTF object that we never 'get'.
> }
> /* Note: We must use the same BTF, as we also used btf_record_dup above
> * which relies on BTF being same for both maps, as some members like
> @@ -71,6 +75,10 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
> inner_map_meta->bypass_spec_v1 = inner_map->bypass_spec_v1;
> }
> return inner_map_meta;
> +
> +free:
> + bpf_map_meta_free(inner_map_meta);
> + return ret;
> }
>
> void bpf_map_meta_free(struct bpf_map *map_meta)
[...]
* Re: [PATCH bpf-next 05/16] bpf: Support map key with dynptr in verifier
2024-10-08 9:14 ` [PATCH bpf-next 05/16] bpf: Support map key with dynptr in verifier Hou Tao
@ 2024-10-10 20:30 ` Eduard Zingerman
2024-10-10 20:57 ` Eduard Zingerman
2024-10-13 13:07 ` Dan Carpenter
1 sibling, 1 reply; 47+ messages in thread
From: Eduard Zingerman @ 2024-10-10 20:30 UTC (permalink / raw)
To: Hou Tao, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1, xukuohai
On Tue, 2024-10-08 at 17:14 +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> The patch basically does the following three things to enable dynptr key
> for bpf map:
>
> 1) Only allow PTR_TO_STACK typed register for dynptr key
> The main reason is that bpf_dynptr can only be defined in the stack, so
> for dynptr key only PTR_TO_STACK typed register is allowed. bpf_dynptr
> could also be represented by CONST_PTR_TO_DYNPTR typed register (e.g.,
> in callback func or subprog), but it is not supported now.
>
> 2) Only allow fixed-offset for PTR_TO_STACK register
> Variable-offset for PTR_TO_STACK typed register is disallowed, because
> it is impossible to check whether or not the stack access is aligned
> with BPF_REG_SIZE and is matched with the location of dynptr or
> non-dynptr part in the map key.
>
> 3) Check the layout of the stack content is matched with the btf_record
> Firstly check the start offset of the stack access is aligned with
> BPF_REG_SIZE, then check the offset and the size of dynptr/non-dynptr
> parts in the stack content is consistent with the btf_record of the map
> key.
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
The logic of this patch looks correct; however, I find it cumbersome.
The only place where access to a dynptr key is allowed is 'case ARG_PTR_TO_MAP_KEY'
in check_func_arg(), yet a lot of places are modified to facilitate this.
It seems that the logic would be easier to follow if there were a
dedicated function to check dynptr key constraints, called only for
the 'case ARG_PTR_TO_MAP_KEY'. This would also make 'struct dynptr_key_state'
unnecessary, as that state would be tracked inside such a function.
Wdyt?
[...]
* Re: [PATCH bpf-next 05/16] bpf: Support map key with dynptr in verifier
2024-10-10 20:30 ` Eduard Zingerman
@ 2024-10-10 20:57 ` Eduard Zingerman
2024-10-21 13:50 ` Hou Tao
0 siblings, 1 reply; 47+ messages in thread
From: Eduard Zingerman @ 2024-10-10 20:57 UTC (permalink / raw)
To: Hou Tao, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1, xukuohai
On Thu, 2024-10-10 at 13:30 -0700, Eduard Zingerman wrote:
[...]
> The logic of this patch looks correct, however I find it cumbersome.
> The only place where access to dynptr key is allowed is 'case ARG_PTR_TO_MAP_KEY'
> in check_func_arg(), a lot of places are modified to facilitate this.
> It seems that logic would be easier to follow if there would be a
> dedicated function to check dynptr key constraints, called only for
> the 'case ARG_PTR_TO_MAP_KEY'. This would also make 'struct dynptr_key_state'
> unnecessary as this state would be tracked inside such function.
> Wdyt?
Just realized that the change to check_stack_range_initialized() would still
be necessary, as it forbids dynptr access at the moment. Unfortunate.
* Re: [PATCH bpf-next 06/16] bpf: Introduce bpf_dynptr_user
2024-10-08 9:14 ` [PATCH bpf-next 06/16] bpf: Introduce bpf_dynptr_user Hou Tao
@ 2024-10-10 21:50 ` Andrii Nakryiko
2024-10-10 22:12 ` Alexei Starovoitov
2024-10-21 13:51 ` Hou Tao
0 siblings, 2 replies; 47+ messages in thread
From: Andrii Nakryiko @ 2024-10-10 21:50 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> For bpf map with dynptr key support, the userspace application will use
> bpf_dynptr_user to represent the bpf_dynptr in the map key and pass it
> to bpf syscall. The bpf syscall will copy from bpf_dynptr_user to
> construct a corresponding bpf_dynptr_kern object when the map key is an
> input argument, and copy to bpf_dynptr_user from a bpf_dynptr_kern
> object when the map key is an output argument.
>
> For now the size of bpf_dynptr_user must be the same as bpf_dynptr, but
> the last u32 field is not used, so make it a reserved field.
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
> include/uapi/linux/bpf.h | 6 ++++++
> tools/include/uapi/linux/bpf.h | 6 ++++++
> 2 files changed, 12 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 07f7df308a01..72fe6a96b54c 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7329,6 +7329,12 @@ struct bpf_dynptr {
> __u64 __opaque[2];
> } __attribute__((aligned(8)));
>
> +struct bpf_dynptr_user {
bikeshedding: maybe just bpf_udynptr?
> + __u64 data;
> + __u32 size;
> + __u32 rsvd;
> +} __attribute__((aligned(8)));
> +
> struct bpf_list_head {
> __u64 __opaque[2];
> } __attribute__((aligned(8)));
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 14f223282bfa..f12ce268e6be 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -7328,6 +7328,12 @@ struct bpf_dynptr {
> __u64 __opaque[2];
> } __attribute__((aligned(8)));
>
> +struct bpf_dynptr_user {
> + __u64 data;
what if we use __bpf_md_ptr(void *, data), so users can just directly
use this struct (and then the next patch won't be necessary at all)
> + __u32 size;
> + __u32 rsvd;
please call it __reserved
> +} __attribute__((aligned(8)));
> +
> struct bpf_list_head {
> __u64 __opaque[2];
> } __attribute__((aligned(8)));
> --
> 2.44.0
>
* Re: [PATCH bpf-next 07/16] libbpf: Add helpers for bpf_dynptr_user
2024-10-08 9:14 ` [PATCH bpf-next 07/16] libbpf: Add helpers for bpf_dynptr_user Hou Tao
@ 2024-10-10 21:50 ` Andrii Nakryiko
2024-10-21 13:51 ` Hou Tao
0 siblings, 1 reply; 47+ messages in thread
From: Andrii Nakryiko @ 2024-10-10 21:50 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> Add bpf_dynptr_user_init() to initialize a bpf_dynptr_user object,
> bpf_dynptr_user_{data,size}() to get the address and length of the
> dynptr object, and bpf_dynptr_user_set_size() to set its size.
>
> Instead of exporting these symbols, simply add these helpers as
> inline functions in bpf.h.
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
> tools/lib/bpf/bpf.h | 27 +++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
I don't think we need this patch and these APIs at all, let the user work
with bpf_udynptr directly
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index a4a7b1ad1b63..92b4afac5f5f 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -700,6 +700,33 @@ struct bpf_token_create_opts {
> LIBBPF_API int bpf_token_create(int bpffs_fd,
> struct bpf_token_create_opts *opts);
>
> +/* sys_bpf() will check the validity of data and size */
> +static inline void bpf_dynptr_user_init(void *data, __u32 size,
> + struct bpf_dynptr_user *dynptr)
> +{
> + dynptr->data = (__u64)(unsigned long)data;
> + dynptr->size = size;
> + dynptr->rsvd = 0;
> +}
> +
> +static inline void bpf_dynptr_user_set_size(struct bpf_dynptr_user *dynptr,
> + __u32 new_size)
> +{
> + dynptr->size = new_size;
> +}
> +
> +static inline __u32
> +bpf_dynptr_user_size(const struct bpf_dynptr_user *dynptr)
> +{
> + return dynptr->size;
> +}
> +
> +static inline void *
> +bpf_dynptr_user_data(const struct bpf_dynptr_user *dynptr)
> +{
> + return (void *)(unsigned long)dynptr->data;
> +}
> +
> #ifdef __cplusplus
> } /* extern "C" */
> #endif
> --
> 2.44.0
>
* Re: [PATCH bpf-next 06/16] bpf: Introduce bpf_dynptr_user
2024-10-10 21:50 ` Andrii Nakryiko
@ 2024-10-10 22:12 ` Alexei Starovoitov
2024-10-21 13:51 ` Hou Tao
1 sibling, 0 replies; 47+ messages in thread
From: Alexei Starovoitov @ 2024-10-10 22:12 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Hou Tao, bpf, Martin KaFai Lau, Alexei Starovoitov,
Andrii Nakryiko, Eduard Zingerman, Song Liu, Hao Luo,
Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev,
Jiri Olsa, John Fastabend, Hou Tao, Xu Kuohai
On Thu, Oct 10, 2024 at 2:50 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
> >
> > From: Hou Tao <houtao1@huawei.com>
> >
> > For bpf map with dynptr key support, the userspace application will use
> > bpf_dynptr_user to represent the bpf_dynptr in the map key and pass it
> > to bpf syscall. The bpf syscall will copy from bpf_dynptr_user to
> > construct a corresponding bpf_dynptr_kern object when the map key is an
> > input argument, and copy to bpf_dynptr_user from a bpf_dynptr_kern
> > object when the map key is an output argument.
> >
> > For now the size of bpf_dynptr_user must be the same as bpf_dynptr, but
> > the last u32 field is not used, so make it a reserved field.
> >
> > Signed-off-by: Hou Tao <houtao1@huawei.com>
> > ---
> > include/uapi/linux/bpf.h | 6 ++++++
> > tools/include/uapi/linux/bpf.h | 6 ++++++
> > 2 files changed, 12 insertions(+)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 07f7df308a01..72fe6a96b54c 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -7329,6 +7329,12 @@ struct bpf_dynptr {
> > __u64 __opaque[2];
> > } __attribute__((aligned(8)));
> >
> > +struct bpf_dynptr_user {
>
> bikeshedding: maybe just bpf_udynptr?
I don't like it. Looks too similar and easy to misread.
I think bpf_dynptr_user is better.
* Re: [PATCH bpf-next 03/16] bpf: Parse bpf_dynptr in map key
2024-10-08 9:14 ` [PATCH bpf-next 03/16] bpf: Parse bpf_dynptr in map key Hou Tao
2024-10-10 18:02 ` Eduard Zingerman
@ 2024-10-11 16:29 ` Alexei Starovoitov
2024-10-21 14:02 ` Hou Tao
1 sibling, 1 reply; 47+ messages in thread
From: Alexei Starovoitov @ 2024-10-11 16:29 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao, Xu Kuohai
On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> +#define MAX_DYNPTR_CNT_IN_MAP_KEY 4
> +
> static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
> const struct btf *btf, u32 btf_key_id, u32 btf_value_id)
> {
> @@ -1103,6 +1113,40 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
> if (!value_type || value_size != map->value_size)
> return -EINVAL;
>
> + if (btf_type_is_dynptr(btf, key_type))
> + map->key_record = btf_new_bpf_dynptr_record();
> + else
> + map->key_record = btf_parse_fields(btf, key_type, BPF_DYNPTR, map->key_size);
> + if (!IS_ERR_OR_NULL(map->key_record)) {
> + if (map->key_record->cnt > MAX_DYNPTR_CNT_IN_MAP_KEY) {
> + ret = -E2BIG;
> + goto free_map_tab;
Took me a while to grasp that map->key_record is only for dynptr fields
and map->record is for the rest except dynptr fields.
Maybe rename key_record to dynptr_fields ?
Or at least add a comment to struct bpf_map to explain
what each btf_record is for.
It's a kinda arbitrary decision to support multiple dynptr-s per key
while other fields are not.
Maybe worth looking at generalizing it a bit so a single btf_record
can have multiple of certain field kinds?
In addition to btf_record->cnt you'd need btf_record->dynptr_cnt,
but that would be easier to extend in the future ?
* Re: [PATCH bpf-next 12/16] bpf: Support basic operations for dynptr key in hash map
2024-10-08 9:14 ` [PATCH bpf-next 12/16] bpf: Support basic operations for dynptr key in hash map Hou Tao
@ 2024-10-11 16:47 ` Alexei Starovoitov
2024-10-30 10:02 ` Hou Tao
0 siblings, 1 reply; 47+ messages in thread
From: Alexei Starovoitov @ 2024-10-11 16:47 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao, Xu Kuohai
On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> The patch supports lookup, update, delete and lookup_delete operations
> for hash map with dynptr key. There are two major differences between
> the implementation of normal hash map and dynptr-keyed hash map:
>
> 1) dynptr-keyed hash map doesn't support pre-allocation.
> The reason is that the dynptr in the map key is allocated dynamically
> through the bpf mem allocator. The length limitation for these dynptrs is
> 4088 bytes now. Because these dynptrs are allocated dynamically, the
> consumption of memory will be smaller compared with a normal hash map when
> there are big differences between the lengths of these dynptrs.
>
> 2) the freed element in dynptr-key map will not be reused immediately
> For a normal hash map, the freed element may be reused immediately by a
> newly-added element, so a lookup may return an incorrect result due to
> element deletion and element reuse. However, a dynptr-keyed map cannot do
> that: there are pointers (dynptrs) in the map key and the updates of
> these dynptrs are not atomic, because both the address and the length of
> the dynptr will be updated. If the element were reused immediately, the
> access of the dynptr in the freed element might incur an invalid memory
> access due to a mismatch between the address and the size of the dynptr,
> so reuse the freed element only after one RCU grace period.
>
> Beside the differences above, dynptr-keyed hash map also needs to handle
> the maybe-nullified dynptr in the map key.
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
> kernel/bpf/hashtab.c | 283 +++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 257 insertions(+), 26 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index b14b87463ee0..edf19d36a413 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -88,6 +88,7 @@ struct bpf_htab {
> struct bpf_map map;
> struct bpf_mem_alloc ma;
> struct bpf_mem_alloc pcpu_ma;
> + struct bpf_mem_alloc dynptr_ma;
> struct bucket *buckets;
> void *elems;
> union {
> @@ -425,6 +426,7 @@ static int htab_map_alloc_check(union bpf_attr *attr)
> bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU);
> bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC);
> bool zero_seed = (attr->map_flags & BPF_F_ZERO_SEED);
> + bool dynptr_in_key = (attr->map_flags & BPF_F_DYNPTR_IN_KEY);
> int numa_node = bpf_map_attr_numa_node(attr);
>
> BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) !=
> @@ -438,6 +440,14 @@ static int htab_map_alloc_check(union bpf_attr *attr)
> !bpf_map_flags_access_ok(attr->map_flags))
> return -EINVAL;
>
> + if (dynptr_in_key) {
> + if (percpu || lru || prealloc || !attr->map_extra)
> + return -EINVAL;
> + if ((attr->map_extra >> 32) || bpf_dynptr_check_size(attr->map_extra) ||
> + bpf_mem_alloc_check_size(percpu, attr->map_extra))
> + return -E2BIG;
> + }
> +
> if (!lru && percpu_lru)
> return -EINVAL;
>
> @@ -482,6 +492,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
> */
> bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU);
> bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC);
> + bool dynptr_in_key = (attr->map_flags & BPF_F_DYNPTR_IN_KEY);
> struct bpf_htab *htab;
> int err, i;
>
> @@ -598,6 +609,11 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
> if (err)
> goto free_map_locked;
> }
> + if (dynptr_in_key) {
> + err = bpf_mem_alloc_init(&htab->dynptr_ma, 0, false);
> + if (err)
> + goto free_map_locked;
> + }
> }
>
> return &htab->map;
> @@ -610,6 +626,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
> for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
> free_percpu(htab->map_locked[i]);
> bpf_map_area_free(htab->buckets);
> + bpf_mem_alloc_destroy(&htab->dynptr_ma);
> bpf_mem_alloc_destroy(&htab->pcpu_ma);
> bpf_mem_alloc_destroy(&htab->ma);
> free_elem_count:
> @@ -620,13 +637,55 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
> return ERR_PTR(err);
> }
>
> -static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
> +static inline u32 __htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
> {
> if (likely(key_len % 4 == 0))
> return jhash2(key, key_len / 4, hashrnd);
> return jhash(key, key_len, hashrnd);
> }
>
> +static u32 htab_map_dynptr_hash(const void *key, u32 key_len, u32 hashrnd,
> + const struct btf_record *rec)
> +{
> + unsigned int i, cnt = rec->cnt;
> + unsigned int hash = hashrnd;
> + unsigned int offset = 0;
> +
> + for (i = 0; i < cnt; i++) {
> + const struct btf_field *field = &rec->fields[i];
> + const struct bpf_dynptr_kern *kptr;
> + unsigned int len;
> +
> + if (field->type != BPF_DYNPTR)
> + continue;
> +
> + /* non-dynptr part ? */
> + if (offset < field->offset)
> + hash = jhash(key + offset, field->offset - offset, hash);
> +
> + /* Skip nullified dynptr */
> + kptr = key + field->offset;
> + if (kptr->data) {
> + len = __bpf_dynptr_size(kptr);
> + hash = jhash(__bpf_dynptr_data(kptr, len), len, hash);
> + }
> + offset = field->offset + field->size;
> + }
> +
> + if (offset < key_len)
> + hash = jhash(key + offset, key_len - offset, hash);
> +
> + return hash;
> +}
> +
> +static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd,
> + const struct btf_record *rec)
> +{
> + if (!rec)
> + return __htab_map_hash(key, key_len, hashrnd);
> + return htab_map_dynptr_hash(key, key_len, hashrnd, rec);
> +}
> +
> static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
> {
> return &htab->buckets[hash & (htab->n_buckets - 1)];
> @@ -637,15 +696,68 @@ static inline struct hlist_nulls_head *select_bucket(struct bpf_htab *htab, u32
> return &__select_bucket(htab, hash)->head;
> }
>
> +static bool is_same_dynptr_key(const void *key, const void *tgt, unsigned int key_size,
> + const struct btf_record *rec)
> +{
> + unsigned int i, cnt = rec->cnt;
> + unsigned int offset = 0;
> +
> + for (i = 0; i < cnt; i++) {
> + const struct btf_field *field = &rec->fields[i];
> + const struct bpf_dynptr_kern *kptr, *tgt_kptr;
> + const void *data, *tgt_data;
> + unsigned int len;
> +
> + if (field->type != BPF_DYNPTR)
> + continue;
> +
> + if (offset < field->offset &&
> + memcmp(key + offset, tgt + offset, field->offset - offset))
> + return false;
> +
> + /*
> + * For a nullified dynptr in the target key, __bpf_dynptr_size()
> + * will return 0, and there will be no match for the target key.
> + */
> + kptr = key + field->offset;
> + tgt_kptr = tgt + field->offset;
> + len = __bpf_dynptr_size(kptr);
> + if (len != __bpf_dynptr_size(tgt_kptr))
> + return false;
> +
> + data = __bpf_dynptr_data(kptr, len);
> + tgt_data = __bpf_dynptr_data(tgt_kptr, len);
> + if (memcmp(data, tgt_data, len))
> + return false;
> +
> + offset = field->offset + field->size;
> + }
> +
> + if (offset < key_size &&
> + memcmp(key + offset, tgt + offset, key_size - offset))
> + return false;
> +
> + return true;
> +}
> +
> +static inline bool htab_is_same_key(const void *key, const void *tgt, unsigned int key_size,
> + const struct btf_record *rec)
> +{
> + if (!rec)
> + return !memcmp(key, tgt, key_size);
> + return is_same_dynptr_key(key, tgt, key_size, rec);
> +}
> +
> /* this lookup function can only be called with bucket lock taken */
> static struct htab_elem *lookup_elem_raw(struct hlist_nulls_head *head, u32 hash,
> - void *key, u32 key_size)
> + void *key, u32 key_size,
> + const struct btf_record *record)
> {
> struct hlist_nulls_node *n;
> struct htab_elem *l;
>
> hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
> - if (l->hash == hash && !memcmp(&l->key, key, key_size))
> + if (l->hash == hash && htab_is_same_key(l->key, key, key_size, record))
> return l;
>
> return NULL;
> @@ -657,14 +769,15 @@ static struct htab_elem *lookup_elem_raw(struct hlist_nulls_head *head, u32 hash
> */
> static struct htab_elem *lookup_nulls_elem_raw(struct hlist_nulls_head *head,
> u32 hash, void *key,
> - u32 key_size, u32 n_buckets)
> + u32 key_size, u32 n_buckets,
> + const struct btf_record *record)
> {
> struct hlist_nulls_node *n;
> struct htab_elem *l;
>
> again:
> hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
> - if (l->hash == hash && !memcmp(&l->key, key, key_size))
> + if (l->hash == hash && htab_is_same_key(l->key, key, key_size, record))
> return l;
If I'm reading this correctly, the support for dynptr in map keys
adds two map->key_record != NULL checks in the fast path,
hence what you said in the cover letter:
> It seems adding dynptr key support in hash map degrades the lookup
> performance about 12% and degrades the update performance about 7%. Will
> investigate these degradation first.
>
> a) lookup
> max_entries = 8K
>
> before:
> 0:hash_lookup 72347325 lookups per sec
>
> after:
> 0:hash_lookup 64758890 lookups per sec
is surprising.
Two conditional branches contribute to 12% performance loss?
Something fishy.
Try unlikely() to hopefully recover most of it.
After analyzing 'perf report/annotate', of course.
Worst case we can specialize htab_map_gen_lookup() to
call into different helpers.
* Re: [PATCH bpf-next 16/16] selftests/bpf: Add test cases for hash map with dynptr key
2024-10-08 9:15 ` [PATCH bpf-next 16/16] selftests/bpf: Add test cases for hash map with dynptr key Hou Tao
@ 2024-10-11 18:23 ` Alexei Starovoitov
2024-10-21 14:05 ` Hou Tao
0 siblings, 1 reply; 47+ messages in thread
From: Alexei Starovoitov @ 2024-10-11 18:23 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao, Xu Kuohai
On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> +
> +SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
> +int BPF_PROG(pure_dynptr_key)
...
> +SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
> +int BPF_PROG(mixed_dynptr_key)
...
> +SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
> +int BPF_PROG(multiple_dynptr_key)
attaching to syscalls with pid filtering is ok-ish,
but it's a few unnecessary steps.
Use tracing prog for non-sleepable and syscall prog for sleepable
and bpf_prog_run() it.
More predictable and no need for a pid filter.
* Re: [PATCH bpf-next 00/16] Support dynptr key for hash map
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
` (15 preceding siblings ...)
2024-10-08 9:15 ` [PATCH bpf-next 16/16] selftests/bpf: Add test cases for hash map with dynptr key Hou Tao
@ 2024-10-11 22:11 ` Eduard Zingerman
2024-10-21 14:09 ` Hou Tao
16 siblings, 1 reply; 47+ messages in thread
From: Eduard Zingerman @ 2024-10-11 22:11 UTC (permalink / raw)
To: Hou Tao, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1, xukuohai
On Tue, 2024-10-08 at 17:14 +0800, Hou Tao wrote:
[...]
> As usual, comments and suggestions are always welcome.
fwiw, I've read through the patches in the series and the code changes
all seem to make sense. Executing selftests with KASAN enabled also
does not show any issues.
Maybe add benchmarks in v2?
* Re: [PATCH bpf-next 05/16] bpf: Support map key with dynptr in verifier
2024-10-08 9:14 ` [PATCH bpf-next 05/16] bpf: Support map key with dynptr in verifier Hou Tao
2024-10-10 20:30 ` Eduard Zingerman
@ 2024-10-13 13:07 ` Dan Carpenter
2024-10-31 2:39 ` Hou Tao
1 sibling, 1 reply; 47+ messages in thread
From: Dan Carpenter @ 2024-10-13 13:07 UTC (permalink / raw)
To: oe-kbuild, Hou Tao, bpf
Cc: lkp, oe-kbuild-all, Martin KaFai Lau, Alexei Starovoitov,
Andrii Nakryiko, Eduard Zingerman, Song Liu, Hao Luo,
Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev,
Jiri Olsa, John Fastabend, houtao1, xukuohai
Hi Hou,
kernel test robot noticed the following build warnings:
url: https://github.com/intel-lab-lkp/linux/commits/Hou-Tao/bpf-Introduce-map-flag-BPF_F_DYNPTR_IN_KEY/20241008-171136
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20241008091501.8302-6-houtao%40huaweicloud.com
patch subject: [PATCH bpf-next 05/16] bpf: Support map key with dynptr in verifier
config: x86_64-randconfig-161-20241011 (https://download.01.org/0day-ci/archive/20241012/202410120302.bUO1BoP7-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202410120302.bUO1BoP7-lkp@intel.com/
smatch warnings:
kernel/bpf/verifier.c:7471 check_stack_range_initialized() error: we previously assumed 'meta' could be null (see line 7439)
vim +/meta +7471 kernel/bpf/verifier.c
01f810ace9ed37 Andrei Matei 2021-02-06 7361 static int check_stack_range_initialized(
01f810ace9ed37 Andrei Matei 2021-02-06 7362 struct bpf_verifier_env *env, int regno, int off,
81b030a7eaa2ee Hou Tao 2024-10-08 7363 int access_size, unsigned int access_flags,
61df10c7799e27 Kumar Kartikeya Dwivedi 2022-04-25 7364 enum bpf_access_src type, struct bpf_call_arg_meta *meta)
17a5267067f3c3 Alexei Starovoitov 2014-09-26 7365 {
2a159c6f82381a Daniel Borkmann 2018-10-21 7366 struct bpf_reg_state *reg = reg_state(env, regno);
f4d7e40a5b7157 Alexei Starovoitov 2017-12-14 7367 struct bpf_func_state *state = func(env, reg);
f7cf25b2026dc8 Alexei Starovoitov 2019-06-15 7368 int err, min_off, max_off, i, j, slot, spi;
01f810ace9ed37 Andrei Matei 2021-02-06 7369 char *err_extra = type == ACCESS_HELPER ? " indirect" : "";
01f810ace9ed37 Andrei Matei 2021-02-06 7370 enum bpf_access_type bounds_check_type;
cf5a0c90a8bc5f Hou Tao 2024-10-08 7371 struct dynptr_key_state dynptr_key;
cf5a0c90a8bc5f Hou Tao 2024-10-08 7372 bool dynptr_read_allowed;
01f810ace9ed37 Andrei Matei 2021-02-06 7373 /* Some accesses can write anything into the stack, others are
01f810ace9ed37 Andrei Matei 2021-02-06 7374 * read-only.
01f810ace9ed37 Andrei Matei 2021-02-06 7375 */
01f810ace9ed37 Andrei Matei 2021-02-06 7376 bool clobber = false;
17a5267067f3c3 Alexei Starovoitov 2014-09-26 7377
81b030a7eaa2ee Hou Tao 2024-10-08 7378 if (access_size == 0 && !(access_flags & ACCESS_F_ZERO_SIZE_ALLOWED)) {
01f810ace9ed37 Andrei Matei 2021-02-06 7379 verbose(env, "invalid zero-sized read\n");
01f810ace9ed37 Andrei Matei 2021-02-06 7380 return -EACCES;
01f810ace9ed37 Andrei Matei 2021-02-06 7381 }
01f810ace9ed37 Andrei Matei 2021-02-06 7382
01f810ace9ed37 Andrei Matei 2021-02-06 7383 if (type == ACCESS_HELPER) {
01f810ace9ed37 Andrei Matei 2021-02-06 7384 /* The bounds checks for writes are more permissive than for
01f810ace9ed37 Andrei Matei 2021-02-06 7385 * reads. However, if raw_mode is not set, we'll do extra
01f810ace9ed37 Andrei Matei 2021-02-06 7386 * checks below.
01f810ace9ed37 Andrei Matei 2021-02-06 7387 */
01f810ace9ed37 Andrei Matei 2021-02-06 7388 bounds_check_type = BPF_WRITE;
01f810ace9ed37 Andrei Matei 2021-02-06 7389 clobber = true;
01f810ace9ed37 Andrei Matei 2021-02-06 7390 } else {
01f810ace9ed37 Andrei Matei 2021-02-06 7391 bounds_check_type = BPF_READ;
01f810ace9ed37 Andrei Matei 2021-02-06 7392 }
01f810ace9ed37 Andrei Matei 2021-02-06 7393 err = check_stack_access_within_bounds(env, regno, off, access_size,
01f810ace9ed37 Andrei Matei 2021-02-06 7394 type, bounds_check_type);
2011fccfb61bbd Andrey Ignatov 2019-03-28 7395 if (err)
2011fccfb61bbd Andrey Ignatov 2019-03-28 7396 return err;
01f810ace9ed37 Andrei Matei 2021-02-06 7397
cf5a0c90a8bc5f Hou Tao 2024-10-08 7398 dynptr_read_allowed = access_flags & ACCESS_F_DYNPTR_READ_ALLOWED;
01f810ace9ed37 Andrei Matei 2021-02-06 7399 if (tnum_is_const(reg->var_off)) {
01f810ace9ed37 Andrei Matei 2021-02-06 7400 min_off = max_off = reg->var_off.value + off;
cf5a0c90a8bc5f Hou Tao 2024-10-08 7401
cf5a0c90a8bc5f Hou Tao 2024-10-08 7402 if (dynptr_read_allowed && (min_off % BPF_REG_SIZE)) {
cf5a0c90a8bc5f Hou Tao 2024-10-08 7403 verbose(env, "R%d misaligned offset %d for dynptr-key\n", regno, min_off);
cf5a0c90a8bc5f Hou Tao 2024-10-08 7404 return -EACCES;
cf5a0c90a8bc5f Hou Tao 2024-10-08 7405 }
can meta be NULL on this path? If not then this is a false positive.
2011fccfb61bbd Andrey Ignatov 2019-03-28 7406 } else {
088ec26d9c2da9 Andrey Ignatov 2019-04-03 7407 /* Variable offset is prohibited for unprivileged mode for
088ec26d9c2da9 Andrey Ignatov 2019-04-03 7408 * simplicity since it requires corresponding support in
088ec26d9c2da9 Andrey Ignatov 2019-04-03 7409 * Spectre masking for stack ALU.
088ec26d9c2da9 Andrey Ignatov 2019-04-03 7410 * See also retrieve_ptr_limit().
088ec26d9c2da9 Andrey Ignatov 2019-04-03 7411 */
2c78ee898d8f10 Alexei Starovoitov 2020-05-13 7412 if (!env->bypass_spec_v1) {
f1174f77b50c94 Edward Cree 2017-08-07 7413 char tn_buf[48];
f1174f77b50c94 Edward Cree 2017-08-07 7414
914cb781ee1a35 Alexei Starovoitov 2017-11-30 7415 tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
01f810ace9ed37 Andrei Matei 2021-02-06 7416 verbose(env, "R%d%s variable offset stack access prohibited for !root, var_off=%s\n",
01f810ace9ed37 Andrei Matei 2021-02-06 7417 regno, err_extra, tn_buf);
ea25f914dc164c Jann Horn 2017-12-18 7418 return -EACCES;
f1174f77b50c94 Edward Cree 2017-08-07 7419 }
cf5a0c90a8bc5f Hou Tao 2024-10-08 7420
cf5a0c90a8bc5f Hou Tao 2024-10-08 7421 if (dynptr_read_allowed) {
cf5a0c90a8bc5f Hou Tao 2024-10-08 7422 verbose(env, "R%d variable offset prohibited for dynptr-key\n", regno);
cf5a0c90a8bc5f Hou Tao 2024-10-08 7423 return -EACCES;
cf5a0c90a8bc5f Hou Tao 2024-10-08 7424 }
cf5a0c90a8bc5f Hou Tao 2024-10-08 7425
f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7426 /* Only initialized buffer on stack is allowed to be accessed
f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7427 * with variable offset. With uninitialized buffer it's hard to
f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7428 * guarantee that whole memory is marked as initialized on
f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7429 * helper return since specific bounds are unknown what may
f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7430 * cause uninitialized stack leaking.
f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7431 */
f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7432 if (meta && meta->raw_mode)
f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7433 meta = NULL;
f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7434
01f810ace9ed37 Andrei Matei 2021-02-06 7435 min_off = reg->smin_value + off;
01f810ace9ed37 Andrei Matei 2021-02-06 7436 max_off = reg->smax_value + off;
107c26a70ca81b Andrey Ignatov 2019-04-03 7437 }
17a5267067f3c3 Alexei Starovoitov 2014-09-26 7438
435faee1aae9c1 Daniel Borkmann 2016-04-13 @7439 if (meta && meta->raw_mode) {
Check for NULL
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7440 /* Ensure we won't be overwriting dynptrs when simulating byte
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7441 * by byte access in check_helper_call using meta.access_size.
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7442 * This would be a problem if we have a helper in the future
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7443 * which takes:
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7444 *
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7445 * helper(uninit_mem, len, dynptr)
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7446 *
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7447 * Now, uninint_mem may overlap with dynptr pointer. Hence, it
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7448 * may end up writing to dynptr itself when touching memory from
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7449 * arg 1. This can be relaxed on a case by case basis for known
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7450 * safe cases, but reject due to the possibilitiy of aliasing by
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7451 * default.
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7452 */
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7453 for (i = min_off; i < max_off + access_size; i++) {
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7454 int stack_off = -i - 1;
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7455
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7456 spi = __get_spi(i);
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7457 /* raw_mode may write past allocated_stack */
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7458 if (state->allocated_stack <= stack_off)
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7459 continue;
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7460 if (state->stack[spi].slot_type[stack_off % BPF_REG_SIZE] == STACK_DYNPTR) {
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7461 verbose(env, "potential write to dynptr at off=%d disallowed\n", i);
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7462 return -EACCES;
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7463 }
ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7464 }
435faee1aae9c1 Daniel Borkmann 2016-04-13 7465 meta->access_size = access_size;
435faee1aae9c1 Daniel Borkmann 2016-04-13 7466 meta->regno = regno;
435faee1aae9c1 Daniel Borkmann 2016-04-13 7467 return 0;
435faee1aae9c1 Daniel Borkmann 2016-04-13 7468 }
435faee1aae9c1 Daniel Borkmann 2016-04-13 7469
cf5a0c90a8bc5f Hou Tao 2024-10-08 7470 if (dynptr_read_allowed) {
cf5a0c90a8bc5f Hou Tao 2024-10-08 @7471 err = init_dynptr_key_state(env, meta->map_ptr->key_record, &dynptr_key);
^^^^^^^^^^^^^^^^^^^^^^^^^
Unchecked dereference
cf5a0c90a8bc5f Hou Tao 2024-10-08 7472 if (err)
cf5a0c90a8bc5f Hou Tao 2024-10-08 7473 return err;
cf5a0c90a8bc5f Hou Tao 2024-10-08 7474 }
2011fccfb61bbd Andrey Ignatov 2019-03-28 7475 for (i = min_off; i < max_off + access_size; i++) {
cc2b14d51053eb Alexei Starovoitov 2017-12-14 7476 u8 *stype;
cc2b14d51053eb Alexei Starovoitov 2017-12-14 7477
2011fccfb61bbd Andrey Ignatov 2019-03-28 7478 slot = -i - 1;
638f5b90d46016 Alexei Starovoitov 2017-10-31 7479 spi = slot / BPF_REG_SIZE;
6b4a64bafd107e Andrei Matei 2023-12-07 7480 if (state->allocated_stack <= slot) {
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH bpf-next 08/16] bpf: Handle bpf_dynptr_user in bpf syscall when it is used as input
2024-10-08 9:14 ` [PATCH bpf-next 08/16] bpf: Handle bpf_dynptr_user in bpf syscall when it is used as input Hou Tao
@ 2024-10-13 13:08 ` Dan Carpenter
2024-10-31 2:44 ` Hou Tao
0 siblings, 1 reply; 47+ messages in thread
From: Dan Carpenter @ 2024-10-13 13:08 UTC (permalink / raw)
To: oe-kbuild, Hou Tao, bpf
Cc: lkp, oe-kbuild-all, Martin KaFai Lau, Alexei Starovoitov,
Andrii Nakryiko, Eduard Zingerman, Song Liu, Hao Luo,
Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev,
Jiri Olsa, John Fastabend, houtao1, xukuohai
Hi Hou,
kernel test robot noticed the following build warnings:
url: https://github.com/intel-lab-lkp/linux/commits/Hou-Tao/bpf-Introduce-map-flag-BPF_F_DYNPTR_IN_KEY/20241008-171136
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20241008091501.8302-9-houtao%40huaweicloud.com
patch subject: [PATCH bpf-next 08/16] bpf: Handle bpf_dynptr_user in bpf syscall when it is used as input
config: x86_64-randconfig-161-20241011 (https://download.01.org/0day-ci/archive/20241012/202410120530.zUoa1scp-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202410120530.zUoa1scp-lkp@intel.com/
smatch warnings:
kernel/bpf/syscall.c:1557 bpf_copy_from_dynptr_ukey() warn: 'key' is an error pointer or valid
vim +/key +1557 kernel/bpf/syscall.c
e1883aa78ac1fe9 Hou Tao 2024-10-08 1543 static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey)
e1883aa78ac1fe9 Hou Tao 2024-10-08 1544 {
e1883aa78ac1fe9 Hou Tao 2024-10-08 1545 const struct btf_record *record;
e1883aa78ac1fe9 Hou Tao 2024-10-08 1546 const struct btf_field *field;
e1883aa78ac1fe9 Hou Tao 2024-10-08 1547 struct bpf_dynptr_user *uptr;
e1883aa78ac1fe9 Hou Tao 2024-10-08 1548 struct bpf_dynptr_kern *kptr;
e1883aa78ac1fe9 Hou Tao 2024-10-08 1549 void *key, *new_key, *kdata;
e1883aa78ac1fe9 Hou Tao 2024-10-08 1550 unsigned int key_size, size;
e1883aa78ac1fe9 Hou Tao 2024-10-08 1551 bpfptr_t udata;
e1883aa78ac1fe9 Hou Tao 2024-10-08 1552 unsigned int i;
e1883aa78ac1fe9 Hou Tao 2024-10-08 1553 int err;
e1883aa78ac1fe9 Hou Tao 2024-10-08 1554
e1883aa78ac1fe9 Hou Tao 2024-10-08 1555 key_size = map->key_size;
e1883aa78ac1fe9 Hou Tao 2024-10-08 1556 key = kvmemdup_bpfptr(ukey, key_size);
e1883aa78ac1fe9 Hou Tao 2024-10-08 @1557 if (!key)
This should be if (IS_ERR(key))
e1883aa78ac1fe9 Hou Tao 2024-10-08 1558 return ERR_PTR(-ENOMEM);
e1883aa78ac1fe9 Hou Tao 2024-10-08 1559
e1883aa78ac1fe9 Hou Tao 2024-10-08 1560 size = key_size;
e1883aa78ac1fe9 Hou Tao 2024-10-08 1561 record = map->key_record;
* Re: [PATCH bpf-next 01/16] bpf: Introduce map flag BPF_F_DYNPTR_IN_KEY
2024-10-10 2:21 ` Alexei Starovoitov
@ 2024-10-21 13:45 ` Hou Tao
2024-10-22 3:53 ` Alexei Starovoitov
0 siblings, 1 reply; 47+ messages in thread
From: Hou Tao @ 2024-10-21 13:45 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao, Xu Kuohai
Hi,
On 10/10/2024 10:21 AM, Alexei Starovoitov wrote:
> On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> index c6cd7c7aeeee..07f7df308a01 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1409,6 +1409,9 @@ enum {
>>
>> /* Do not translate kernel bpf_arena pointers to user pointers */
>> BPF_F_NO_USER_CONV = (1U << 18),
>> +
>> +/* Create a map with bpf_dynptr in key */
>> + BPF_F_DYNPTR_IN_KEY = (1U << 19),
>> };
> If I'm reading the other patches correctly this uapi flag
> is unnecessary.
> BTF describes the fields and dynptr is either there or not.
> Why require users to add an extra flag ?
Sorry for the late reply. The reason for the extra flag is to let a bpf
map that already used bpf_dynptr in its key keep working as before. I
was not sure whether such a case exists, so I added the flag. If the
case is essentially impossible, I can remove it in the next revision.
* Re: [PATCH bpf-next 03/16] bpf: Parse bpf_dynptr in map key
2024-10-10 18:02 ` Eduard Zingerman
@ 2024-10-21 13:48 ` Hou Tao
0 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-21 13:48 UTC (permalink / raw)
To: Eduard Zingerman, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1, xukuohai
Hi,
On 10/11/2024 2:02 AM, Eduard Zingerman wrote:
> On Tue, 2024-10-08 at 17:14 +0800, Hou Tao wrote:
>
> [...]
>
>> diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
>> index 645bd30bc9a9..a072835dc645 100644
>> --- a/kernel/bpf/map_in_map.c
>> +++ b/kernel/bpf/map_in_map.c
> [...]
>
>> @@ -45,9 +45,13 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
>> * invalid/empty/valid, but ERR_PTR in case of errors. During
>> * equality NULL or IS_ERR is equivalent.
>> */
>> - struct bpf_map *ret = ERR_CAST(inner_map_meta->record);
>> - kfree(inner_map_meta);
>> - return ret;
>> + ret = ERR_CAST(inner_map_meta->record);
>> + goto free;
>> + }
>> + inner_map_meta->key_record = btf_record_dup(inner_map->key_record);
>> + if (IS_ERR(inner_map_meta->key_record)) {
>> + ret = ERR_CAST(inner_map_meta->key_record);
>> + goto free;
> The 'goto free' executes a call to bpf_map_meta_free() which does
> btf_put(map_meta->btf), but corresponding btf_get(inner_map->btf) only
> happens on the lines below => in case when 'free' branch is taken we
> 'put' BTF object that was not 'get' by us.
Yes, but map_meta->btf is NULL at that point, so calling btf_put(NULL)
does no harm. My intent was to simplify the error handling, but it
seems to cause confusion instead. In the next revision I will only undo
the parts that were actually done.
>
>> }
>> /* Note: We must use the same BTF, as we also used btf_record_dup above
>> * which relies on BTF being same for both maps, as some members like
>> @@ -71,6 +75,10 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
>> inner_map_meta->bypass_spec_v1 = inner_map->bypass_spec_v1;
>> }
>> return inner_map_meta;
>> +
>> +free:
>> + bpf_map_meta_free(inner_map_meta);
>> + return ret;
>> }
>>
>> void bpf_map_meta_free(struct bpf_map *map_meta)
> [...]
* Re: [PATCH bpf-next 05/16] bpf: Support map key with dynptr in verifier
2024-10-10 20:57 ` Eduard Zingerman
@ 2024-10-21 13:50 ` Hou Tao
0 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-21 13:50 UTC (permalink / raw)
To: Eduard Zingerman, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1, xukuohai
On 10/11/2024 4:57 AM, Eduard Zingerman wrote:
> On Thu, 2024-10-10 at 13:30 -0700, Eduard Zingerman wrote:
>
> [...]
>
>> The logic of this patch looks correct, however I find it cumbersome.
>> The only place where access to dynptr key is allowed is 'case ARG_PTR_TO_MAP_KEY'
>> in check_func_arg(), a lot of places are modified to facilitate this.
>> It seems that logic would be easier to follow if there would be a
>> dedicated function to check dynptr key constraints, called only for
>> the 'case ARG_PTR_TO_MAP_KEY'. This would als make 'struct dynptr_key_state'
>> unnecessary as this state would be tracked inside such function.
>> Wdyt?
> Just realized that change to check_stack_range_initialized would still
> be necessary, as it forbids dynptr access at the moment. Unfortunate.
Thanks for the suggestion. I will check later whether a cleaner
approach is available.
* Re: [PATCH bpf-next 06/16] bpf: Introduce bpf_dynptr_user
2024-10-10 21:50 ` Andrii Nakryiko
2024-10-10 22:12 ` Alexei Starovoitov
@ 2024-10-21 13:51 ` Hou Tao
1 sibling, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-21 13:51 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
Hi,
On 10/11/2024 5:50 AM, Andrii Nakryiko wrote:
> On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> For bpf map with dynptr key support, the userspace application will use
>> bpf_dynptr_user to represent the bpf_dynptr in the map key and pass it
>> to bpf syscall. The bpf syscall will copy from bpf_dynptr_user to
>> construct a corresponding bpf_dynptr_kern object when the map key is an
>> input argument, and copy to bpf_dynptr_user from a bpf_dynptr_kern
>> object when the map key is an output argument.
>>
>> For now the size of bpf_dynptr_user must be the same as bpf_dynptr, but
>> the last u32 field is not used, so make it a reserved field.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>> include/uapi/linux/bpf.h | 6 ++++++
>> tools/include/uapi/linux/bpf.h | 6 ++++++
>> 2 files changed, 12 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 07f7df308a01..72fe6a96b54c 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -7329,6 +7329,12 @@ struct bpf_dynptr {
>> __u64 __opaque[2];
>> } __attribute__((aligned(8)));
>>
>> +struct bpf_dynptr_user {
> bikeshedding: maybe just bpf_udynptr?
>
>> + __u64 data;
>> + __u32 size;
>> + __u32 rsvd;
>> +} __attribute__((aligned(8)));
>> +
>> struct bpf_list_head {
>> __u64 __opaque[2];
>> } __attribute__((aligned(8)));
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index 14f223282bfa..f12ce268e6be 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -7328,6 +7328,12 @@ struct bpf_dynptr {
>> __u64 __opaque[2];
>> } __attribute__((aligned(8)));
>>
>> +struct bpf_dynptr_user {
>> + __u64 data;
> what if we use __bpf_md_ptr(void *, data), so users can just directly
> use this struct (and then the next patch won't be necessary at all)
Thanks for the suggestion. Will do in v2.
>> + __u32 size;
>> + __u32 rsvd;
> please call it __reserved
Got it.
>
>
>> +} __attribute__((aligned(8)));
>> +
>> struct bpf_list_head {
>> __u64 __opaque[2];
>> } __attribute__((aligned(8)));
>> --
>> 2.44.0
>>
* Re: [PATCH bpf-next 07/16] libbpf: Add helpers for bpf_dynptr_user
2024-10-10 21:50 ` Andrii Nakryiko
@ 2024-10-21 13:51 ` Hou Tao
0 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-21 13:51 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1, xukuohai
Hi,
On 10/11/2024 5:50 AM, Andrii Nakryiko wrote:
> On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Add bpf_dynptr_user_init() to initialize a bpf_dynptr_user object,
>> bpf_dynptr_user_{data,size}() to get the address and length of the
>> dynptr object, and bpf_dynptr_user_set_size() to set its size.
>>
>> Instead of exporting these symbols, simply add these helpers as
>> inline functions in bpf.h.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>> tools/lib/bpf/bpf.h | 27 +++++++++++++++++++++++++++
>> 1 file changed, 27 insertions(+)
>>
> I don't think we need this patch and these APIs at all, let user work
> with bpf_udynptr directly
Got it. Will drop it in v2.
>
>> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
>> index a4a7b1ad1b63..92b4afac5f5f 100644
>> --- a/tools/lib/bpf/bpf.h
>> +++ b/tools/lib/bpf/bpf.h
>> @@ -700,6 +700,33 @@ struct bpf_token_create_opts {
>> LIBBPF_API int bpf_token_create(int bpffs_fd,
>> struct bpf_token_create_opts *opts);
>>
>> +/* sys_bpf() will check the validity of data and size */
>> +static inline void bpf_dynptr_user_init(void *data, __u32 size,
>> + struct bpf_dynptr_user *dynptr)
>> +{
>> + dynptr->data = (__u64)(unsigned long)data;
>> + dynptr->size = size;
>> + dynptr->rsvd = 0;
>> +}
>> +
>> +static inline void bpf_dynptr_user_set_size(struct bpf_dynptr_user *dynptr,
>> + __u32 new_size)
>> +{
>> + dynptr->size = new_size;
>> +}
>> +
>> +static inline __u32
>> +bpf_dynptr_user_size(const struct bpf_dynptr_user *dynptr)
>> +{
>> + return dynptr->size;
>> +}
>> +
>> +static inline void *
>> +bpf_dynptr_user_data(const struct bpf_dynptr_user *dynptr)
>> +{
>> + return (void *)(unsigned long)dynptr->data;
>> +}
>> +
>> #ifdef __cplusplus
>> } /* extern "C" */
>> #endif
>> --
>> 2.44.0
>>
* Re: [PATCH bpf-next 03/16] bpf: Parse bpf_dynptr in map key
2024-10-11 16:29 ` Alexei Starovoitov
@ 2024-10-21 14:02 ` Hou Tao
2024-10-22 3:59 ` Alexei Starovoitov
0 siblings, 1 reply; 47+ messages in thread
From: Hou Tao @ 2024-10-21 14:02 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao, Xu Kuohai
Hi,
On 10/12/2024 12:29 AM, Alexei Starovoitov wrote:
> On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> +#define MAX_DYNPTR_CNT_IN_MAP_KEY 4
>> +
>> static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
>> const struct btf *btf, u32 btf_key_id, u32 btf_value_id)
>> {
>> @@ -1103,6 +1113,40 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
>> if (!value_type || value_size != map->value_size)
>> return -EINVAL;
>>
>> + if (btf_type_is_dynptr(btf, key_type))
>> + map->key_record = btf_new_bpf_dynptr_record();
>> + else
>> + map->key_record = btf_parse_fields(btf, key_type, BPF_DYNPTR, map->key_size);
>> + if (!IS_ERR_OR_NULL(map->key_record)) {
>> + if (map->key_record->cnt > MAX_DYNPTR_CNT_IN_MAP_KEY) {
>> + ret = -E2BIG;
>> + goto free_map_tab;
> Took me a while to grasp that map->key_record is only for dynptr fields
> and map->record is for the rest except dynptr fields.
>
> Maybe rename key_record to dynptr_fields ?
> Or at least add a comment to struct bpf_map to explain
> what each btf_record is for.
I was considering renaming map->record to map->value_record; however, I
was afraid that it might introduce too much churn, so I didn't do it.
But I think it is a good idea to add comments for both btf_records. And
considering that only bpf_dynptr is enabled for the map key, renaming
key_record to dynptr_fields seems reasonable as well.
>
> It's kinda arbitrary decision to support multiple dynptr-s per key
> while other fields are not.
> Maybe worth looking at generalizing it a bit so single btf_record
> can have multiple of certain field kinds?
> In addition to btf_record->cnt you'd need btf_record->dynptr_cnt
> but that would be easier to extend in the future ?
Map values already support multiple kptrs or bpf_list_nodes. And based
on the discussion in [1], I thought multiple-dynptr support in the map
key was necessary, so I enabled it.
[1]:
https://lore.kernel.org/bpf/CAADnVQJWaBRB=P-ZNkppwm=0tZaT3qP8PKLLJ2S5SSA2-S8mxg@mail.gmail.com/
* Re: [PATCH bpf-next 16/16] selftests/bpf: Add test cases for hash map with dynptr key
2024-10-11 18:23 ` Alexei Starovoitov
@ 2024-10-21 14:05 ` Hou Tao
0 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-21 14:05 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao, Xu Kuohai
Hi,
On 10/12/2024 2:23 AM, Alexei Starovoitov wrote:
> On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> +
>> +SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
>> +int BPF_PROG(pure_dynptr_key)
> ...
>
>> +SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
>> +int BPF_PROG(mixed_dynptr_key)
> ...
>> +SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
>> +int BPF_PROG(multiple_dynptr_key)
> attaching to syscalls with pid filtering is ok-ish,
> but it's a few unnecessary steps.
> Use tracing prog for non-sleepable and syscall prog for sleepable
> and bpf_prog_run() it.
> More predictable and no need for a pid filter.
Thanks for the suggestion. Will do in v2.
* Re: [PATCH bpf-next 00/16] Support dynptr key for hash map
2024-10-11 22:11 ` [PATCH bpf-next 00/16] Support dynptr key for hash map Eduard Zingerman
@ 2024-10-21 14:09 ` Hou Tao
0 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-21 14:09 UTC (permalink / raw)
To: Eduard Zingerman, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1, xukuohai
Hi,
On 10/12/2024 6:11 AM, Eduard Zingerman wrote:
> On Tue, 2024-10-08 at 17:14 +0800, Hou Tao wrote:
>
> [...]
>
>> As usual, comments and suggestions are always welcome.
> fwiw, I've read through the patches in the series and code changes
> seem all to make sense. Executing selftests with KASAN enabled also
> does not show any issues.
Thanks for the review and test.
>
> Maybe add benchmarks in v2?
I have written a benchmark that compares the performance of the normal
hash map and the dynptr-key hash map (it is the source of the benchmark
data in the cover letter). However, it was not polished when I posted
v1. Will include it in v2.
* Re: [PATCH bpf-next 01/16] bpf: Introduce map flag BPF_F_DYNPTR_IN_KEY
2024-10-21 13:45 ` Hou Tao
@ 2024-10-22 3:53 ` Alexei Starovoitov
2024-10-22 4:22 ` Hou Tao
0 siblings, 1 reply; 47+ messages in thread
From: Alexei Starovoitov @ 2024-10-22 3:53 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao, Xu Kuohai
On Mon, Oct 21, 2024 at 6:46 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 10/10/2024 10:21 AM, Alexei Starovoitov wrote:
> > On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
> >> index c6cd7c7aeeee..07f7df308a01 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -1409,6 +1409,9 @@ enum {
> >>
> >> /* Do not translate kernel bpf_arena pointers to user pointers */
> >> BPF_F_NO_USER_CONV = (1U << 18),
> >> +
> >> +/* Create a map with bpf_dynptr in key */
> >> + BPF_F_DYNPTR_IN_KEY = (1U << 19),
> >> };
> > If I'm reading the other patches correctly this uapi flag
> > is unnecessary.
> > BTF describes the fields and dynptr is either there or not.
> > Why require users to add an extra flag ?
>
> Sorry for the late reply. The reason for an extra flag is to make a bpf
> map which had already used bpf_dynptr in its key to work as before. I
> was not sure whether or not there is such case, so I added an extra
> flag. If the case is basically impossible, I can remove it in the next
> revision.
Hmm. bpf_dynptr is a kernel type and iirc (after paging in
the context after 12 days of silence) you were proposing to add
a new bpf_dynptr_user type which theoretically can be present
in the key, but it's fine to break such progs.
* Re: [PATCH bpf-next 03/16] bpf: Parse bpf_dynptr in map key
2024-10-21 14:02 ` Hou Tao
@ 2024-10-22 3:59 ` Alexei Starovoitov
2024-10-22 7:20 ` Hou Tao
0 siblings, 1 reply; 47+ messages in thread
From: Alexei Starovoitov @ 2024-10-22 3:59 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao, Xu Kuohai
On Mon, Oct 21, 2024 at 7:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 10/12/2024 12:29 AM, Alexei Starovoitov wrote:
> > On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
> >> +#define MAX_DYNPTR_CNT_IN_MAP_KEY 4
> >> +
> >> static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
> >> const struct btf *btf, u32 btf_key_id, u32 btf_value_id)
> >> {
> >> @@ -1103,6 +1113,40 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
> >> if (!value_type || value_size != map->value_size)
> >> return -EINVAL;
> >>
> >> + if (btf_type_is_dynptr(btf, key_type))
> >> + map->key_record = btf_new_bpf_dynptr_record();
> >> + else
> >> + map->key_record = btf_parse_fields(btf, key_type, BPF_DYNPTR, map->key_size);
> >> + if (!IS_ERR_OR_NULL(map->key_record)) {
> >> + if (map->key_record->cnt > MAX_DYNPTR_CNT_IN_MAP_KEY) {
> >> + ret = -E2BIG;
> >> + goto free_map_tab;
> > Took me a while to grasp that map->key_record is only for dynptr fields
> > and map->record is for the rest except dynptr fields.
> >
> > Maybe rename key_record to dynptr_fields ?
> > Or at least add a comment to struct bpf_map to explain
> > what each btf_record is for.
>
> I was trying to rename map->record to map->value_record, however, I was
> afraid that it may introduce too much churn, so I didn't do that. But I
> think it is a good idea to add comments for both btf_record. And
> considering that only bpf_dynptr is enabled for map key, renaming it to
> dynptr_fields seems reasonable as well.
> >
> > It's kinda arbitrary decision to support multiple dynptr-s per key
> > while other fields are not.
> > Maybe worth looking at generalizing it a bit so single btf_record
> > can have multiple of certain field kinds?
> > In addition to btf_record->cnt you'd need btf_record->dynptr_cnt
> > but that would be easier to extend in the future ?
>
> Map value has already supported multiple kptrs or bpf_list_node.
fwiw I believe we reached a dead end there.
The whole support for bpf_list and bpf_rb_tree may get deprecated
and removed. The expected users didn't materialize.
> And in
> the discussion [1], I thought multiple dynptr support in map key is
> necessary, so I enabled it.
>
> [1]:
> https://lore.kernel.org/bpf/CAADnVQJWaBRB=P-ZNkppwm=0tZaT3qP8PKLLJ2S5SSA2-S8mxg@mail.gmail.com/
Sure. That's a different reasoning and use case.
I'm proposing to use a single btf_record with different cnt-s.
The current btf_record->cnt will stay as-is indicating total number of fields
while btf_record->dynptr_cnt will be just for these dynptrs you're introducing.
Then you won't need two top level btf_record-s.
* Re: [PATCH bpf-next 01/16] bpf: Introduce map flag BPF_F_DYNPTR_IN_KEY
2024-10-22 3:53 ` Alexei Starovoitov
@ 2024-10-22 4:22 ` Hou Tao
0 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-22 4:22 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao, Xu Kuohai
Hi,
On 10/22/2024 11:53 AM, Alexei Starovoitov wrote:
> On Mon, Oct 21, 2024 at 6:46 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> Hi,
>>
>> On 10/10/2024 10:21 AM, Alexei Starovoitov wrote:
>>> On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>>>> index c6cd7c7aeeee..07f7df308a01 100644
>>>> --- a/include/uapi/linux/bpf.h
>>>> +++ b/include/uapi/linux/bpf.h
>>>> @@ -1409,6 +1409,9 @@ enum {
>>>>
>>>> /* Do not translate kernel bpf_arena pointers to user pointers */
>>>> BPF_F_NO_USER_CONV = (1U << 18),
>>>> +
>>>> +/* Create a map with bpf_dynptr in key */
>>>> + BPF_F_DYNPTR_IN_KEY = (1U << 19),
>>>> };
>>> If I'm reading the other patches correctly this uapi flag
>>> is unnecessary.
>>> BTF describes the fields and dynptr is either there or not.
>>> Why require users to add an extra flag ?
>> Sorry for the late reply. The reason for an extra flag is to make a bpf
>> map which had already used bpf_dynptr in its key to work as before. I
>> was not sure whether or not there is such case, so I added an extra
>> flag. If the case is basically impossible, I can remove it in the next
>> revision.
> Hmm. bpf_dynptr is a kernel type and iirc (after paging in
> the context after 12 days of silence) you were proposing to add
> a new bpf_dynptr_user type which theoretically can be present
> in the key, but it's fine to break such progs.
Got it. Will remove the extra flag in the next revision. Sorry again
for the long delay; I will try to reply more promptly next time. To
clarify: bpf_dynptr_user is only for the syscall interface, while
bpf_dynptr will be used in the map key itself.
* Re: [PATCH bpf-next 03/16] bpf: Parse bpf_dynptr in map key
2024-10-22 3:59 ` Alexei Starovoitov
@ 2024-10-22 7:20 ` Hou Tao
2024-10-22 18:44 ` Alexei Starovoitov
0 siblings, 1 reply; 47+ messages in thread
From: Hou Tao @ 2024-10-22 7:20 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao, Xu Kuohai
Hi,
On 10/22/2024 11:59 AM, Alexei Starovoitov wrote:
> On Mon, Oct 21, 2024 at 7:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> Hi,
>>
>> On 10/12/2024 12:29 AM, Alexei Starovoitov wrote:
>>> On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>>>> +#define MAX_DYNPTR_CNT_IN_MAP_KEY 4
>>>> +
>>>> static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
>>>> const struct btf *btf, u32 btf_key_id, u32 btf_value_id)
>>>> {
>>>> @@ -1103,6 +1113,40 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
>>>> if (!value_type || value_size != map->value_size)
>>>> return -EINVAL;
>>>>
>>>> + if (btf_type_is_dynptr(btf, key_type))
>>>> + map->key_record = btf_new_bpf_dynptr_record();
>>>> + else
>>>> + map->key_record = btf_parse_fields(btf, key_type, BPF_DYNPTR, map->key_size);
>>>> + if (!IS_ERR_OR_NULL(map->key_record)) {
>>>> + if (map->key_record->cnt > MAX_DYNPTR_CNT_IN_MAP_KEY) {
>>>> + ret = -E2BIG;
>>>> + goto free_map_tab;
>>> Took me a while to grasp that map->key_record is only for dynptr fields
>>> and map->record is for the rest except dynptr fields.
>>>
>>> Maybe rename key_record to dynptr_fields ?
>>> Or at least add a comment to struct bpf_map to explain
>>> what each btf_record is for.
>> I was trying to rename map->record to map->value_record, however, I was
>> afraid that it may introduce too much churn, so I didn't do that. But I
>> think it is a good idea to add comments for both btf_record. And
>> considering that only bpf_dynptr is enabled for map key, renaming it to
>> dynptr_fields seems reasonable as well.
>>> It's kinda arbitrary decision to support multiple dynptr-s per key
>>> while other fields are not.
>>> Maybe worth looking at generalizing it a bit so single btf_record
>>> can have multiple of certain field kinds?
>>> In addition to btf_record->cnt you'd need btf_record->dynptr_cnt
>>> but that would be easier to extend in the future ?
>> Map value has already supported multiple kptrs or bpf_list_node.
> fwiw I believe we reached the dead end there.
> The whole support for bpf_list and bpf_rb_tree may get deprecated
> and removed. The expected users didn't materialize.
OK.
>
>> And in
>> the discussion [1], I thought multiple dynptr support in map key is
>> necessary, so I enabled it.
>>
>> [1]:
>> https://lore.kernel.org/bpf/CAADnVQJWaBRB=P-ZNkppwm=0tZaT3qP8PKLLJ2S5SSA2-S8mxg@mail.gmail.com/
> Sure. That's a different reasoning and use case.
> I'm proposing to use a single btf_record with different cnt-s.
> The current btf_record->cnt will stay as-is indicating total number of fields
> while btf_record->dynptr_cnt will be just for these dynptrs you're introducing.
> Then you won't need two top level btf_record-s.
I misunderstood your suggestion yesterday; now I see what you are
proposing. However, it seems that a separate counter for each kind of
btf_field would only benefit dynptr fields, because the other types do
not need to iterate over all fields: they just find a single field
through binary search. And I don't understand why a single btf_record
would be enough, because the two btf_records are derived from the map
key and the map value respectively.
* Re: [PATCH bpf-next 03/16] bpf: Parse bpf_dynptr in map key
2024-10-22 7:20 ` Hou Tao
@ 2024-10-22 18:44 ` Alexei Starovoitov
0 siblings, 0 replies; 47+ messages in thread
From: Alexei Starovoitov @ 2024-10-22 18:44 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao, Xu Kuohai
On Tue, Oct 22, 2024 at 12:20 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 10/22/2024 11:59 AM, Alexei Starovoitov wrote:
> > On Mon, Oct 21, 2024 at 7:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
> >> Hi,
> >>
> >> On 10/12/2024 12:29 AM, Alexei Starovoitov wrote:
> >>> On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
> >>>> +#define MAX_DYNPTR_CNT_IN_MAP_KEY 4
> >>>> +
> >>>> static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
> >>>> const struct btf *btf, u32 btf_key_id, u32 btf_value_id)
> >>>> {
> >>>> @@ -1103,6 +1113,40 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
> >>>> if (!value_type || value_size != map->value_size)
> >>>> return -EINVAL;
> >>>>
> >>>> + if (btf_type_is_dynptr(btf, key_type))
> >>>> + map->key_record = btf_new_bpf_dynptr_record();
> >>>> + else
> >>>> + map->key_record = btf_parse_fields(btf, key_type, BPF_DYNPTR, map->key_size);
> >>>> + if (!IS_ERR_OR_NULL(map->key_record)) {
> >>>> + if (map->key_record->cnt > MAX_DYNPTR_CNT_IN_MAP_KEY) {
> >>>> + ret = -E2BIG;
> >>>> + goto free_map_tab;
> >>> Took me a while to grasp that map->key_record is only for dynptr fields
> >>> and map->record is for the rest except dynptr fields.
> >>>
> >>> Maybe rename key_record to dynptr_fields ?
> >>> Or at least add a comment to struct bpf_map to explain
> >>> what each btf_record is for.
> >> I was trying to rename map->record to map->value_record, however, I was
> >> afraid that it may introduce too much churn, so I didn't do that. But I
> >> think it is a good idea to add comments for both btf_record. And
> >> considering that only bpf_dynptr is enabled for map key, renaming it to
> >> dynptr_fields seems reasonable as well.
> >>> It's kinda arbitrary decision to support multiple dynptr-s per key
> >>> while other fields are not.
> >>> Maybe worth looking at generalizing it a bit so single btf_record
> >>> can have multiple of certain field kinds?
> >>> In addition to btf_record->cnt you'd need btf_record->dynptr_cnt
> >>> but that would be easier to extend in the future ?
>> The map value already supports multiple kptrs or bpf_list_nodes.
> > fwiw I believe we reached the dead end there.
> > The whole support for bpf_list and bpf_rb_tree may get deprecated
> > and removed. The expected users didn't materialize.
>
> OK.
> >
> >> And in
> >> the discussion [1], I thought multiple dynptr support in map key is
> >> necessary, so I enabled it.
> >>
> >> [1]:
> >> https://lore.kernel.org/bpf/CAADnVQJWaBRB=P-ZNkppwm=0tZaT3qP8PKLLJ2S5SSA2-S8mxg@mail.gmail.com/
> > Sure. That's a different reasoning and use case.
> > I'm proposing to use a single btf_record with different cnt-s.
> > The current btf_record->cnt will stay as-is indicating total number of fields
> > while btf_record->dynptr_cnt will be just for these dynptrs you're introducing.
> > Then you won't need two top level btf_record-s.
>
> I misunderstood your suggestion yesterday. Now I see what you are
> suggesting. However, it seems that a separate counter for each kind of
> btf_field would only benefit the dynptr field, because the other field
> kinds don't need to iterate over all fields; they just find the one they
> need through a binary search. And I don't understand why a single
> btf_record would be enough, because these two btf_records are derived
> from the map key and the map value respectively.
If we have two btf records (one for key and one for value)
then it's fine.
For now btf_record for key will only have multiple
dynptr-s in there as special fields and it's fine too.
But we need to document it, and when the need arises to have
dynptr-s in a value or kptr/timers/whatever in a key,
then we shouldn't add a 3rd or 4th btf_record to
separate the different kinds of special fields.
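The single-record direction discussed above can be sketched in userspace; all names except `cnt` (which mirrors the existing total count) are illustrative assumptions, not the actual kernel layout:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch of one btf_record carrying a per-kind counter alongside the
 * total count of special fields, so dynptr iteration doesn't require
 * walking every field kind. `dynptr_cnt` and the field layout here
 * are assumptions for illustration only. */
struct btf_field_sketch {
	uint32_t offset;
	uint32_t type;		/* a BPF_DYNPTR-like field kind */
};

struct btf_record_sketch {
	uint32_t cnt;		/* total number of special fields */
	uint32_t dynptr_cnt;	/* how many of them are dynptrs */
	struct btf_field_sketch fields[];
};

static struct btf_record_sketch *record_alloc(uint32_t total,
					      uint32_t ndynptr)
{
	struct btf_record_sketch *rec;

	rec = calloc(1, sizeof(*rec) + total * sizeof(rec->fields[0]));
	if (!rec)
		return NULL;
	rec->cnt = total;
	rec->dynptr_cnt = ndynptr;
	return rec;
}
```

Under this shape there is still one record per BTF type (one for the key, one for the value), matching the conclusion above; the per-kind counter merely avoids multiplying the number of records per type.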
* Re: [PATCH bpf-next 12/16] bpf: Support basic operations for dynptr key in hash map
2024-10-11 16:47 ` Alexei Starovoitov
@ 2024-10-30 10:02 ` Hou Tao
2024-11-02 18:31 ` Alexei Starovoitov
0 siblings, 1 reply; 47+ messages in thread
From: Hou Tao @ 2024-10-30 10:02 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao, Xu Kuohai
Hi,
On 10/12/2024 12:47 AM, Alexei Starovoitov wrote:
> On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> The patch supports lookup, update, delete and lookup_and_delete
>> operations for hash maps with dynptr keys. There are two major
>> differences between the implementation of a normal hash map and a
>> dynptr-keyed hash map:
>>
>> 1) a dynptr-keyed hash map doesn't support pre-allocation.
>> The reason is that the dynptrs in the map key are allocated dynamically
>> through the bpf memory allocator. The length limit for these dynptrs is
>> currently 4088 bytes. Because these dynptrs are allocated dynamically,
>> the memory consumption will be smaller than that of a normal hash map
>> when the lengths of these dynptrs vary widely.
>>
>> 2) a freed element in a dynptr-keyed map will not be reused immediately.
>> For a normal hash map, a freed element may be reused immediately by a
>> newly-added element, so a lookup may return an incorrect result due to
>> element deletion and reuse. However, a dynptr-keyed map cannot do that:
>> there are pointers (dynptrs) in the map key, and the updates of these
>> dynptrs are not atomic, since both the address and the length of the
>> dynptr are updated. If the element were reused immediately, accessing
>> the dynptr in the freed element could incur an invalid memory access
>> due to a mismatch between the address and the size of the dynptr, so
>> the freed element is reused only after one RCU grace period.
>>
>> Besides the differences above, a dynptr-keyed hash map also needs to
>> handle the maybe-nullified dynptr in the map key.
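Why a dynptr-keyed map cannot simply memcmp() key blobs, and why a torn {address, length} update is dangerous, can be illustrated with a minimal userspace sketch (struct and function names here are stand-ins, not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the variable-length part of a map key: a data pointer
 * plus a length. Equality must compare the pointed-to bytes, not the
 * raw struct bytes, since two equal keys normally hold different data
 * pointers. If data and size were updated non-atomically on element
 * reuse, a concurrent reader could pair a new pointer with a stale
 * length and read out of bounds — hence the RCU grace period. */
struct dynptr_sketch {
	const void *data;
	size_t size;
};

static bool dynptr_key_equal(const struct dynptr_sketch *a,
			     const struct dynptr_sketch *b)
{
	return a->size == b->size && !memcmp(a->data, b->data, a->size);
}
```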
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>> kernel/bpf/hashtab.c | 283 +++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 257 insertions(+), 26 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index b14b87463ee0..edf19d36a413 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -88,6 +88,7 @@ struct bpf_htab {
>> struct bpf_map map;
>> struct bpf_mem_alloc ma;
>> struct bpf_mem_alloc pcpu_ma;
>> +
SNIP
>> +
>> +static inline bool htab_is_same_key(const void *key, const void *tgt, unsigned int key_size,
>> + const struct btf_record *rec)
>> +{
>> + if (!rec)
>> + return !memcmp(key, tgt, key_size);
>> + return is_same_dynptr_key(key, tgt, key_size, rec);
>> +}
>> +
>> /* this lookup function can only be called with bucket lock taken */
>> static struct htab_elem *lookup_elem_raw(struct hlist_nulls_head *head, u32 hash,
>> - void *key, u32 key_size)
>> + void *key, u32 key_size,
>> + const struct btf_record *record)
>> {
>> struct hlist_nulls_node *n;
>> struct htab_elem *l;
>>
>> hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
>> - if (l->hash == hash && !memcmp(&l->key, key, key_size))
>> + if (l->hash == hash && htab_is_same_key(l->key, key, key_size, record))
>> return l;
>>
>> return NULL;
>> @@ -657,14 +769,15 @@ static struct htab_elem *lookup_elem_raw(struct hlist_nulls_head *head, u32 hash
>> */
>> static struct htab_elem *lookup_nulls_elem_raw(struct hlist_nulls_head *head,
>> u32 hash, void *key,
>> - u32 key_size, u32 n_buckets)
>> + u32 key_size, u32 n_buckets,
>> + const struct btf_record *record)
>> {
>> struct hlist_nulls_node *n;
>> struct htab_elem *l;
>>
>> again:
>> hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
>> - if (l->hash == hash && !memcmp(&l->key, key, key_size))
>> + if (l->hash == hash && htab_is_same_key(l->key, key, key_size, record))
>> return l;
> If I'm reading this correctly the support for dynptr in map keys
> adds two map->key_record != NULL checks in the fast path,
> hence what you said in cover letter:
>
>> It seems adding dynptr key support in the hash map degrades the lookup
>> performance by about 12% and the update performance by about 7%. Will
>> investigate this degradation first.
>>
>> a) lookup
>> max_entries = 8K
>>
>> before:
>> 0:hash_lookup 72347325 lookups per sec
>>
>> after:
>> 0:hash_lookup 64758890 lookups per sec
> is surprising.
>
> Two conditional branches contribute to 12% performance loss?
> Something fishy.
> Try unlikely() to hopefully recover most of it.
> After analyzing 'perf report/annotate', of course.
Using unlikely()/likely() doesn't help much. It seems the big
performance gap is due to lookup_nulls_elem_raw() no longer being
inlined into __htab_map_lookup_elem(); I still don't know why it is not
inlined after the change. After marking lookup_nulls_elem_raw() as
inline, the performance gap is within ~2% for htab map lookup. For
htab_map_update_elem()/htab_map_delete_elem(), the cause and the result
are similar. Should I mark these two functions (lookup_nulls_elem_raw()
and lookup_elem_raw()) as inline in the next revision, or should I leave
them as-is and try to fix the degradation in another patch set?
> Worst case we can specialize htab_map_gen_lookup() to
> call into different helpers.
> .
* Re: [PATCH bpf-next 05/16] bpf: Support map key with dynptr in verifier
2024-10-13 13:07 ` Dan Carpenter
@ 2024-10-31 2:39 ` Hou Tao
0 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-31 2:39 UTC (permalink / raw)
To: Dan Carpenter, oe-kbuild, bpf
Cc: lkp, oe-kbuild-all, Martin KaFai Lau, Alexei Starovoitov,
Andrii Nakryiko, Eduard Zingerman, Song Liu, Hao Luo,
Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev,
Jiri Olsa, John Fastabend, xukuohai
Hi,
On 10/13/2024 9:07 PM, Dan Carpenter wrote:
> Hi Hou,
>
> kernel test robot noticed the following build warnings:
>
> url: https://github.com/intel-lab-lkp/linux/commits/Hou-Tao/bpf-Introduce-map-flag-BPF_F_DYNPTR_IN_KEY/20241008-171136
> base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> patch link: https://lore.kernel.org/r/20241008091501.8302-6-houtao%40huaweicloud.com
> patch subject: [PATCH bpf-next 05/16] bpf: Support map key with dynptr in verifier
> config: x86_64-randconfig-161-20241011 (https://download.01.org/0day-ci/archive/20241012/202410120302.bUO1BoP7-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> | Closes: https://lore.kernel.org/r/202410120302.bUO1BoP7-lkp@intel.com/
>
> smatch warnings:
> kernel/bpf/verifier.c:7471 check_stack_range_initialized() error: we previously assumed 'meta' could be null (see line 7439)
>
> vim +/meta +7471 kernel/bpf/verifier.c
Thanks for the report, and sorry for the late reply; the mail was lost
in my email client. It is a false positive: when
ACCESS_F_DYNPTR_READ_ALLOWED is set, meta must be non-NULL. But I think
it does no harm to make dynptr_read_allowed depend on a non-NULL meta
pointer.
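The defensive variant suggested above could look like the following sketch; the flag value and the reduced struct are assumptions for illustration, not the kernel definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch: derive dynptr_read_allowed only from a non-NULL meta, so the
 * later meta->map_ptr dereference is provably safe even to a static
 * checker like smatch. Flag value and struct contents are stand-ins. */
#define ACCESS_F_DYNPTR_READ_ALLOWED	(1U << 2)

struct call_arg_meta_sketch {
	const void *map_ptr;
};

static bool dynptr_read_allowed(unsigned int access_flags,
				const struct call_arg_meta_sketch *meta)
{
	return meta && (access_flags & ACCESS_F_DYNPTR_READ_ALLOWED);
}
```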
>
> 01f810ace9ed37 Andrei Matei 2021-02-06 7361 static int check_stack_range_initialized(
> 01f810ace9ed37 Andrei Matei 2021-02-06 7362 struct bpf_verifier_env *env, int regno, int off,
> 81b030a7eaa2ee Hou Tao 2024-10-08 7363 int access_size, unsigned int access_flags,
> 61df10c7799e27 Kumar Kartikeya Dwivedi 2022-04-25 7364 enum bpf_access_src type, struct bpf_call_arg_meta *meta)
> 17a5267067f3c3 Alexei Starovoitov 2014-09-26 7365 {
> 2a159c6f82381a Daniel Borkmann 2018-10-21 7366 struct bpf_reg_state *reg = reg_state(env, regno);
> f4d7e40a5b7157 Alexei Starovoitov 2017-12-14 7367 struct bpf_func_state *state = func(env, reg);
> f7cf25b2026dc8 Alexei Starovoitov 2019-06-15 7368 int err, min_off, max_off, i, j, slot, spi;
> 01f810ace9ed37 Andrei Matei 2021-02-06 7369 char *err_extra = type == ACCESS_HELPER ? " indirect" : "";
> 01f810ace9ed37 Andrei Matei 2021-02-06 7370 enum bpf_access_type bounds_check_type;
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7371 struct dynptr_key_state dynptr_key;
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7372 bool dynptr_read_allowed;
> 01f810ace9ed37 Andrei Matei 2021-02-06 7373 /* Some accesses can write anything into the stack, others are
> 01f810ace9ed37 Andrei Matei 2021-02-06 7374 * read-only.
> 01f810ace9ed37 Andrei Matei 2021-02-06 7375 */
> 01f810ace9ed37 Andrei Matei 2021-02-06 7376 bool clobber = false;
> 17a5267067f3c3 Alexei Starovoitov 2014-09-26 7377
> 81b030a7eaa2ee Hou Tao 2024-10-08 7378 if (access_size == 0 && !(access_flags & ACCESS_F_ZERO_SIZE_ALLOWED)) {
> 01f810ace9ed37 Andrei Matei 2021-02-06 7379 verbose(env, "invalid zero-sized read\n");
> 01f810ace9ed37 Andrei Matei 2021-02-06 7380 return -EACCES;
> 01f810ace9ed37 Andrei Matei 2021-02-06 7381 }
> 01f810ace9ed37 Andrei Matei 2021-02-06 7382
> 01f810ace9ed37 Andrei Matei 2021-02-06 7383 if (type == ACCESS_HELPER) {
> 01f810ace9ed37 Andrei Matei 2021-02-06 7384 /* The bounds checks for writes are more permissive than for
> 01f810ace9ed37 Andrei Matei 2021-02-06 7385 * reads. However, if raw_mode is not set, we'll do extra
> 01f810ace9ed37 Andrei Matei 2021-02-06 7386 * checks below.
> 01f810ace9ed37 Andrei Matei 2021-02-06 7387 */
> 01f810ace9ed37 Andrei Matei 2021-02-06 7388 bounds_check_type = BPF_WRITE;
> 01f810ace9ed37 Andrei Matei 2021-02-06 7389 clobber = true;
> 01f810ace9ed37 Andrei Matei 2021-02-06 7390 } else {
> 01f810ace9ed37 Andrei Matei 2021-02-06 7391 bounds_check_type = BPF_READ;
> 01f810ace9ed37 Andrei Matei 2021-02-06 7392 }
> 01f810ace9ed37 Andrei Matei 2021-02-06 7393 err = check_stack_access_within_bounds(env, regno, off, access_size,
> 01f810ace9ed37 Andrei Matei 2021-02-06 7394 type, bounds_check_type);
> 2011fccfb61bbd Andrey Ignatov 2019-03-28 7395 if (err)
> 2011fccfb61bbd Andrey Ignatov 2019-03-28 7396 return err;
> 01f810ace9ed37 Andrei Matei 2021-02-06 7397
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7398 dynptr_read_allowed = access_flags & ACCESS_F_DYNPTR_READ_ALLOWED;
> 01f810ace9ed37 Andrei Matei 2021-02-06 7399 if (tnum_is_const(reg->var_off)) {
> 01f810ace9ed37 Andrei Matei 2021-02-06 7400 min_off = max_off = reg->var_off.value + off;
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7401
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7402 if (dynptr_read_allowed && (min_off % BPF_REG_SIZE)) {
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7403 verbose(env, "R%d misaligned offset %d for dynptr-key\n", regno, min_off);
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7404 return -EACCES;
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7405 }
>
> can meta be NULL on this path? If not then this is a false positive.
>
> 2011fccfb61bbd Andrey Ignatov 2019-03-28 7406 } else {
> 088ec26d9c2da9 Andrey Ignatov 2019-04-03 7407 /* Variable offset is prohibited for unprivileged mode for
> 088ec26d9c2da9 Andrey Ignatov 2019-04-03 7408 * simplicity since it requires corresponding support in
> 088ec26d9c2da9 Andrey Ignatov 2019-04-03 7409 * Spectre masking for stack ALU.
> 088ec26d9c2da9 Andrey Ignatov 2019-04-03 7410 * See also retrieve_ptr_limit().
> 088ec26d9c2da9 Andrey Ignatov 2019-04-03 7411 */
> 2c78ee898d8f10 Alexei Starovoitov 2020-05-13 7412 if (!env->bypass_spec_v1) {
> f1174f77b50c94 Edward Cree 2017-08-07 7413 char tn_buf[48];
> f1174f77b50c94 Edward Cree 2017-08-07 7414
> 914cb781ee1a35 Alexei Starovoitov 2017-11-30 7415 tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
> 01f810ace9ed37 Andrei Matei 2021-02-06 7416 verbose(env, "R%d%s variable offset stack access prohibited for !root, var_off=%s\n",
> 01f810ace9ed37 Andrei Matei 2021-02-06 7417 regno, err_extra, tn_buf);
> ea25f914dc164c Jann Horn 2017-12-18 7418 return -EACCES;
> f1174f77b50c94 Edward Cree 2017-08-07 7419 }
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7420
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7421 if (dynptr_read_allowed) {
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7422 verbose(env, "R%d variable offset prohibited for dynptr-key\n", regno);
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7423 return -EACCES;
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7424 }
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7425
> f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7426 /* Only initialized buffer on stack is allowed to be accessed
> f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7427 * with variable offset. With uninitialized buffer it's hard to
> f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7428 * guarantee that whole memory is marked as initialized on
> f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7429 * helper return since specific bounds are unknown what may
> f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7430 * cause uninitialized stack leaking.
> f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7431 */
> f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7432 if (meta && meta->raw_mode)
> f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7433 meta = NULL;
> f2bcd05ec7b839 Andrey Ignatov 2019-04-03 7434
> 01f810ace9ed37 Andrei Matei 2021-02-06 7435 min_off = reg->smin_value + off;
> 01f810ace9ed37 Andrei Matei 2021-02-06 7436 max_off = reg->smax_value + off;
> 107c26a70ca81b Andrey Ignatov 2019-04-03 7437 }
> 17a5267067f3c3 Alexei Starovoitov 2014-09-26 7438
> 435faee1aae9c1 Daniel Borkmann 2016-04-13 @7439 if (meta && meta->raw_mode) {
>
> Check for NULL
>
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7440 /* Ensure we won't be overwriting dynptrs when simulating byte
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7441 * by byte access in check_helper_call using meta.access_size.
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7442 * This would be a problem if we have a helper in the future
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7443 * which takes:
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7444 *
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7445 * helper(uninit_mem, len, dynptr)
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7446 *
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7447 * Now, uninint_mem may overlap with dynptr pointer. Hence, it
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7448 * may end up writing to dynptr itself when touching memory from
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7449 * arg 1. This can be relaxed on a case by case basis for known
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7450 * safe cases, but reject due to the possibilitiy of aliasing by
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7451 * default.
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7452 */
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7453 for (i = min_off; i < max_off + access_size; i++) {
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7454 int stack_off = -i - 1;
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7455
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7456 spi = __get_spi(i);
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7457 /* raw_mode may write past allocated_stack */
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7458 if (state->allocated_stack <= stack_off)
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7459 continue;
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7460 if (state->stack[spi].slot_type[stack_off % BPF_REG_SIZE] == STACK_DYNPTR) {
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7461 verbose(env, "potential write to dynptr at off=%d disallowed\n", i);
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7462 return -EACCES;
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7463 }
> ef8fc7a07c0e16 Kumar Kartikeya Dwivedi 2023-01-21 7464 }
> 435faee1aae9c1 Daniel Borkmann 2016-04-13 7465 meta->access_size = access_size;
> 435faee1aae9c1 Daniel Borkmann 2016-04-13 7466 meta->regno = regno;
> 435faee1aae9c1 Daniel Borkmann 2016-04-13 7467 return 0;
> 435faee1aae9c1 Daniel Borkmann 2016-04-13 7468 }
> 435faee1aae9c1 Daniel Borkmann 2016-04-13 7469
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7470 if (dynptr_read_allowed) {
> cf5a0c90a8bc5f Hou Tao 2024-10-08 @7471 err = init_dynptr_key_state(env, meta->map_ptr->key_record, &dynptr_key);
> ^^^^^^^^^^^^^^^^^^^^^^^^^
> Unchecked dereference
>
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7472 if (err)
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7473 return err;
> cf5a0c90a8bc5f Hou Tao 2024-10-08 7474 }
> 2011fccfb61bbd Andrey Ignatov 2019-03-28 7475 for (i = min_off; i < max_off + access_size; i++) {
> cc2b14d51053eb Alexei Starovoitov 2017-12-14 7476 u8 *stype;
> cc2b14d51053eb Alexei Starovoitov 2017-12-14 7477
> 2011fccfb61bbd Andrey Ignatov 2019-03-28 7478 slot = -i - 1;
> 638f5b90d46016 Alexei Starovoitov 2017-10-31 7479 spi = slot / BPF_REG_SIZE;
> 6b4a64bafd107e Andrei Matei 2023-12-07 7480 if (state->allocated_stack <= slot) {
>
* Re: [PATCH bpf-next 08/16] bpf: Handle bpf_dynptr_user in bpf syscall when it is used as input
2024-10-13 13:08 ` Dan Carpenter
@ 2024-10-31 2:44 ` Hou Tao
0 siblings, 0 replies; 47+ messages in thread
From: Hou Tao @ 2024-10-31 2:44 UTC (permalink / raw)
To: Dan Carpenter, oe-kbuild, bpf
Cc: lkp, oe-kbuild-all, Martin KaFai Lau, Alexei Starovoitov,
Andrii Nakryiko, Eduard Zingerman, Song Liu, Hao Luo,
Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev,
Jiri Olsa, John Fastabend, xukuohai
Hi,
On 10/13/2024 9:08 PM, Dan Carpenter wrote:
> Hi Hou,
>
> kernel test robot noticed the following build warnings:
>
> url: https://github.com/intel-lab-lkp/linux/commits/Hou-Tao/bpf-Introduce-map-flag-BPF_F_DYNPTR_IN_KEY/20241008-171136
> base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> patch link: https://lore.kernel.org/r/20241008091501.8302-9-houtao%40huaweicloud.com
> patch subject: [PATCH bpf-next 08/16] bpf: Handle bpf_dynptr_user in bpf syscall when it is used as input
> config: x86_64-randconfig-161-20241011 (https://download.01.org/0day-ci/archive/20241012/202410120530.zUoa1scp-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> | Closes: https://lore.kernel.org/r/202410120530.zUoa1scp-lkp@intel.com/
>
> smatch warnings:
> kernel/bpf/syscall.c:1557 bpf_copy_from_dynptr_ukey() warn: 'key' is an error pointer or valid
>
> vim +/key +1557 kernel/bpf/syscall.c
>
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1543 static void *bpf_copy_from_dynptr_ukey(const struct bpf_map *map, bpfptr_t ukey)
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1544 {
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1545 const struct btf_record *record;
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1546 const struct btf_field *field;
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1547 struct bpf_dynptr_user *uptr;
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1548 struct bpf_dynptr_kern *kptr;
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1549 void *key, *new_key, *kdata;
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1550 unsigned int key_size, size;
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1551 bpfptr_t udata;
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1552 unsigned int i;
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1553 int err;
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1554
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1555 key_size = map->key_size;
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1556 key = kvmemdup_bpfptr(ukey, key_size);
> e1883aa78ac1fe9 Hou Tao 2024-10-08 @1557 if (!key)
>
> This should be if (IS_ERR(key))
Thanks for the report. You are right, the check of the key returned by
kvmemdup_bpfptr() is incorrect. Will fix it in the next revision and
will try to add some test cases for these error paths.
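The distinction matters because kvmemdup_bpfptr() reports failure with an error pointer rather than NULL, so a plain NULL check lets the error pointer flow onward. Userspace mimics of the kernel's err.h helpers make the difference testable (a sketch of the encoding, not the kernel headers):

```c
#include <assert.h>
#include <errno.h>

/* Userspace mimics of the kernel's ERR_PTR()/IS_ERR() helpers, for
 * illustration: error codes are encoded as the top 4095 values of the
 * address space, so NULL is *not* an error pointer and !ptr misses
 * every real failure. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
```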
>
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1558 return ERR_PTR(-ENOMEM);
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1559
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1560 size = key_size;
> e1883aa78ac1fe9 Hou Tao 2024-10-08 1561 record = map->key_record;
>
* Re: [PATCH bpf-next 12/16] bpf: Support basic operations for dynptr key in hash map
2024-10-30 10:02 ` Hou Tao
@ 2024-11-02 18:31 ` Alexei Starovoitov
0 siblings, 0 replies; 47+ messages in thread
From: Alexei Starovoitov @ 2024-11-02 18:31 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao, Xu Kuohai
On Wed, Oct 30, 2024 at 3:02 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> >> a) lookup
> >> max_entries = 8K
> >>
> >> before:
> >> 0:hash_lookup 72347325 lookups per sec
> >>
> >> after:
> >> 0:hash_lookup 64758890 lookups per sec
> > is surprising.
> >
> > Two conditional branches contribute to 12% performance loss?
> > Something fishy.
> > Try unlikely() to hopefully recover most of it.
> > After analyzing 'perf report/annotate', of course.
>
> Using unlikely()/likely() doesn't help much. It seems the big
> performance gap is due to lookup_nulls_elem_raw() no longer being
> inlined into __htab_map_lookup_elem(); I still don't know why it is not
> inlined after the change. After marking lookup_nulls_elem_raw() as
> inline, the performance gap is within ~2% for htab map lookup. For
> htab_map_update_elem()/htab_map_delete_elem(), the cause and the result
> are similar. Should I mark these two functions (lookup_nulls_elem_raw()
> and lookup_elem_raw()) as inline in the next revision, or should I leave
> them as-is and try to fix the degradation in another patch set?
from 12% to 2% by adding 'inline' to lookup_[nulls_]elem_raw() ?
Certainly do it in the patch set.
end of thread, newest: ~2024-11-02 18:31 UTC
Thread overview: 47+ messages
2024-10-08 9:14 [PATCH bpf-next 00/16] Support dynptr key for hash map Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 01/16] bpf: Introduce map flag BPF_F_DYNPTR_IN_KEY Hou Tao
2024-10-10 2:21 ` Alexei Starovoitov
2024-10-21 13:45 ` Hou Tao
2024-10-22 3:53 ` Alexei Starovoitov
2024-10-22 4:22 ` Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 02/16] bpf: Add two helpers to facilitate the btf parsing of bpf_dynptr Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 03/16] bpf: Parse bpf_dynptr in map key Hou Tao
2024-10-10 18:02 ` Eduard Zingerman
2024-10-21 13:48 ` Hou Tao
2024-10-11 16:29 ` Alexei Starovoitov
2024-10-21 14:02 ` Hou Tao
2024-10-22 3:59 ` Alexei Starovoitov
2024-10-22 7:20 ` Hou Tao
2024-10-22 18:44 ` Alexei Starovoitov
2024-10-08 9:14 ` [PATCH bpf-next 04/16] bpf: Pass flags instead of bool to check_helper_mem_access() Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 05/16] bpf: Support map key with dynptr in verifier Hou Tao
2024-10-10 20:30 ` Eduard Zingerman
2024-10-10 20:57 ` Eduard Zingerman
2024-10-21 13:50 ` Hou Tao
2024-10-13 13:07 ` Dan Carpenter
2024-10-31 2:39 ` Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 06/16] bpf: Introduce bpf_dynptr_user Hou Tao
2024-10-10 21:50 ` Andrii Nakryiko
2024-10-10 22:12 ` Alexei Starovoitov
2024-10-21 13:51 ` Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 07/16] libbpf: Add helpers for bpf_dynptr_user Hou Tao
2024-10-10 21:50 ` Andrii Nakryiko
2024-10-21 13:51 ` Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 08/16] bpf: Handle bpf_dynptr_user in bpf syscall when it is used as input Hou Tao
2024-10-13 13:08 ` Dan Carpenter
2024-10-31 2:44 ` Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 09/16] bpf: Handle bpf_dynptr_user in bpf syscall when it is used as output Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 10/16] bpf: Disable unsupported functionalities for map with dynptr key Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 11/16] bpf: Add bpf_mem_alloc_check_size() helper Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 12/16] bpf: Support basic operations for dynptr key in hash map Hou Tao
2024-10-11 16:47 ` Alexei Starovoitov
2024-10-30 10:02 ` Hou Tao
2024-11-02 18:31 ` Alexei Starovoitov
2024-10-08 9:14 ` [PATCH bpf-next 13/16] bpf: Export bpf_dynptr_set_size Hou Tao
2024-10-08 9:14 ` [PATCH bpf-next 14/16] bpf: Support get_next_key operation for dynptr key in hash map Hou Tao
2024-10-08 9:15 ` [PATCH bpf-next 15/16] bpf: Enable BPF_F_DYNPTR_IN_KEY for " Hou Tao
2024-10-08 9:15 ` [PATCH bpf-next 16/16] selftests/bpf: Add test cases for hash map with dynptr key Hou Tao
2024-10-11 18:23 ` Alexei Starovoitov
2024-10-21 14:05 ` Hou Tao
2024-10-11 22:11 ` [PATCH bpf-next 00/16] Support dynptr key for hash map Eduard Zingerman
2024-10-21 14:09 ` Hou Tao