* [PATCH bpf v5 0/7] bpf: Fix the release of inner map
@ 2023-12-04 14:04 Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 1/7] bpf: Check rcu_read_lock_trace_held() before calling bpf map helpers Hou Tao
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Hou Tao @ 2023-12-04 14:04 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, Paul E . McKenney,
houtao1
From: Hou Tao <houtao1@huawei.com>
Hi,
The patchset aims to fix the release of inner map in map array or map
htab. The release of inner map is different with normal map. For normal
map, the map is released after the bpf program which uses the map is
destroyed, because the bpf program tracks the used maps. However bpf
program can not track the used inner map because these inner map may be
updated or deleted dynamically, and for now the ref-counter of inner map
is decreased after the inner map is remove from outer map, so the inner
map may be freed before the bpf program, which is accessing the inner
map, exits and there will be use-after-free problem as demonstrated by
patch #6.
The patchset fixes the problem by deferring the release of inner map.
The freeing of inner map is deferred according to the sleepable
attributes of the bpf programs which own the outer map. Patch #1 fixes
the warning when running the newly-added selftest under interpreter
mode. Patch #2 adds more parameters to .map_fd_put_ptr() to prepare for
the fix. Patch #3 fixes the incorrect value of need_defer when freeing
the fd array. Patch #4 fixes the potential use-after-free problem by
using call_rcu_tasks_trace() and call_rcu() to wait for one tasks trace
RCU GP and one RCU GP unconditionally. Patch #5 optimizes the free of
inner map by removing the unnecessary RCU GP waiting. Patch #6 adds a
selftest to demonstrate the potential use-after-free problem. Patch #7
updates a selftest to update outer map in syscall bpf program.
Please see individual patches for more details. And comments are always
welcome.
Change Log:
v5:
* patch #3: rename fd_array_map_delete_elem_with_deferred_free() to
__fd_array_map_delete_elem() (Alexei)
* patch #5: use atomic64_t instead of atomic_t to prevent potential
overflow (Alexei)
* patch #7: use ptr_to_u64() helper instead of force casting to initialize
pointers in bpf_attr (Alexei)
v4: https://lore.kernel.org/bpf/20231130140120.1736235-1-houtao@huaweicloud.com
* patch #2: don't use "deferred", use "need_defer" uniformly
* patch #3: newly-added, fix the incorrect value of need_defer during
fd array free.
* patch #4: doesn't consider the case in which bpf map is not used by
any bpf program and only use sleepable_refcnt to remove
unnecessary tasks trace RCU GP (Alexei)
* patch #4: remove memory barriers added due to cautiousness (Alexei)
v3: https://lore.kernel.org/bpf/20231124113033.503338-1-houtao@huaweicloud.com
* multiple variable renamings (Martin)
* define BPF_MAP_RCU_GP/BPF_MAP_RCU_TT_GP as bit (Martin)
* use call_rcu() and its variants instead of synchronize_rcu() (Martin)
* remove unnecessary mask in bpf_map_free_deferred() (Martin)
* place atomic_or() and the related smp_mb() together (Martin)
* add patch #6 to demonstrate that updating outer map in syscall
program is dead-lock free (Alexei)
* update comments about the memory barrier in bpf_map_fd_put_ptr()
* update commit message for patch #3 and #4 to describe more details
v2: https://lore.kernel.org/bpf/20231113123324.3914612-1-houtao@huaweicloud.com
* defer the invocation of ops->map_free() instead of bpf_map_put() (Martin)
* update selftest to make it being reproducible under JIT mode (Martin)
* remove unnecessary preparatory patches
v1: https://lore.kernel.org/bpf/20231107140702.1891778-1-houtao@huaweicloud.com
Hou Tao (7):
bpf: Check rcu_read_lock_trace_held() before calling bpf map helpers
bpf: Add map and need_defer parameters to .map_fd_put_ptr()
bpf: Set need_defer as false when clearing fd array during map free
bpf: Defer the free of inner map when necessary
bpf: Optimize the free of inner map
selftests/bpf: Add test cases for inner map
selftests/bpf: Test outer map update operations in syscall program
include/linux/bpf.h | 15 +-
kernel/bpf/arraymap.c | 33 ++--
kernel/bpf/core.c | 4 +
kernel/bpf/hashtab.c | 6 +-
kernel/bpf/helpers.c | 13 +-
kernel/bpf/map_in_map.c | 17 ++-
kernel/bpf/map_in_map.h | 2 +-
kernel/bpf/syscall.c | 40 ++++-
kernel/bpf/verifier.c | 4 +-
.../selftests/bpf/prog_tests/map_in_map.c | 141 ++++++++++++++++++
.../selftests/bpf/prog_tests/syscall.c | 30 +++-
.../selftests/bpf/progs/access_map_in_map.c | 93 ++++++++++++
tools/testing/selftests/bpf/progs/syscall.c | 96 +++++++++++-
13 files changed, 453 insertions(+), 41 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/map_in_map.c
create mode 100644 tools/testing/selftests/bpf/progs/access_map_in_map.c
--
2.29.2
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH bpf v5 1/7] bpf: Check rcu_read_lock_trace_held() before calling bpf map helpers
2023-12-04 14:04 [PATCH bpf v5 0/7] bpf: Fix the release of inner map Hou Tao
@ 2023-12-04 14:04 ` Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 2/7] bpf: Add map and need_defer parameters to .map_fd_put_ptr() Hou Tao
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2023-12-04 14:04 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, Paul E . McKenney,
houtao1
From: Hou Tao <houtao1@huawei.com>
These three bpf_map_{lookup,update,delete}_elem() helpers are also
available for sleepable bpf program, so add the corresponding lock
assertion for sleepable bpf program, otherwise the following warning
will be reported when a sleepable bpf program manipulates bpf map under
interpreter mode (aka bpf_jit_enable=0):
WARNING: CPU: 3 PID: 4985 at kernel/bpf/helpers.c:40 ......
CPU: 3 PID: 4985 Comm: test_progs Not tainted 6.6.0+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
RIP: 0010:bpf_map_lookup_elem+0x54/0x60
......
Call Trace:
<TASK>
? __warn+0xa5/0x240
? bpf_map_lookup_elem+0x54/0x60
? report_bug+0x1ba/0x1f0
? handle_bug+0x40/0x80
? exc_invalid_op+0x18/0x50
? asm_exc_invalid_op+0x1b/0x20
? __pfx_bpf_map_lookup_elem+0x10/0x10
? rcu_lockdep_current_cpu_online+0x65/0xb0
? rcu_is_watching+0x23/0x50
? bpf_map_lookup_elem+0x54/0x60
? __pfx_bpf_map_lookup_elem+0x10/0x10
___bpf_prog_run+0x513/0x3b70
__bpf_prog_run32+0x9d/0xd0
? __bpf_prog_enter_sleepable_recur+0xad/0x120
? __bpf_prog_enter_sleepable_recur+0x3e/0x120
bpf_trampoline_6442580665+0x4d/0x1000
__x64_sys_getpgid+0x5/0x30
? do_syscall_64+0x36/0xb0
entry_SYSCALL_64_after_hwframe+0x6e/0x76
</TASK>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
kernel/bpf/helpers.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 56b0c1f678ee..f43038931935 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -32,12 +32,13 @@
*
* Different map implementations will rely on rcu in map methods
* lookup/update/delete, therefore eBPF programs must run under rcu lock
- * if program is allowed to access maps, so check rcu_read_lock_held in
- * all three functions.
+ * if program is allowed to access maps, so check rcu_read_lock_held() or
+ * rcu_read_lock_trace_held() in all three functions.
*/
BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key)
{
- WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
+ WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
+ !rcu_read_lock_bh_held());
return (unsigned long) map->ops->map_lookup_elem(map, key);
}
@@ -53,7 +54,8 @@ const struct bpf_func_proto bpf_map_lookup_elem_proto = {
BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
void *, value, u64, flags)
{
- WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
+ WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
+ !rcu_read_lock_bh_held());
return map->ops->map_update_elem(map, key, value, flags);
}
@@ -70,7 +72,8 @@ const struct bpf_func_proto bpf_map_update_elem_proto = {
BPF_CALL_2(bpf_map_delete_elem, struct bpf_map *, map, void *, key)
{
- WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
+ WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
+ !rcu_read_lock_bh_held());
return map->ops->map_delete_elem(map, key);
}
--
2.29.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH bpf v5 2/7] bpf: Add map and need_defer parameters to .map_fd_put_ptr()
2023-12-04 14:04 [PATCH bpf v5 0/7] bpf: Fix the release of inner map Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 1/7] bpf: Check rcu_read_lock_trace_held() before calling bpf map helpers Hou Tao
@ 2023-12-04 14:04 ` Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 3/7] bpf: Set need_defer as false when clearing fd array during map free Hou Tao
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2023-12-04 14:04 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, Paul E . McKenney,
houtao1
From: Hou Tao <houtao1@huawei.com>
map is the pointer of outer map, and need_defer needs some explanation.
need_defer tells the implementation to defer the reference release of
the passed element and ensure that the element is still alive before
the bpf program, which may manipulate it, exits.
The following three cases will invoke map_fd_put_ptr() and different
need_defer values will be passed to these callers:
1) release the reference of the old element in the map during map update
or map deletion. The release must be deferred, otherwise the bpf
program may incur use-after-free problem, so need_defer needs to be
true.
2) release the reference of the to-be-added element in the error path of
map update. The to-be-added element is not visible to any bpf
program, so it is OK to pass false for need_defer parameter.
3) release the references of all elements in the map during map release.
Any bpf program which has access to the map must have been exited and
released, so need_defer=false will be OK.
These two parameters will be used by the following patches to fix the
potential use-after-free problem for map-in-map.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/bpf.h | 6 +++++-
kernel/bpf/arraymap.c | 12 +++++++-----
kernel/bpf/hashtab.c | 6 +++---
kernel/bpf/map_in_map.c | 2 +-
kernel/bpf/map_in_map.h | 2 +-
5 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6762dac3ef76..6f68ac24bb9a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -106,7 +106,11 @@ struct bpf_map_ops {
/* funcs called by prog_array and perf_event_array map */
void *(*map_fd_get_ptr)(struct bpf_map *map, struct file *map_file,
int fd);
- void (*map_fd_put_ptr)(void *ptr);
+ /* If need_defer is true, the implementation should guarantee that
+ * the to-be-put element is still alive before the bpf program, which
+ * may manipulate it, exists.
+ */
+ void (*map_fd_put_ptr)(struct bpf_map *map, void *ptr, bool need_defer);
int (*map_gen_lookup)(struct bpf_map *map, struct bpf_insn *insn_buf);
u32 (*map_fd_sys_lookup_elem)(void *ptr);
void (*map_seq_show_elem)(struct bpf_map *map, void *key,
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 2058e89b5ddd..f9aed5909d6e 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -867,7 +867,7 @@ int bpf_fd_array_map_update_elem(struct bpf_map *map, struct file *map_file,
}
if (old_ptr)
- map->ops->map_fd_put_ptr(old_ptr);
+ map->ops->map_fd_put_ptr(map, old_ptr, true);
return 0;
}
@@ -890,7 +890,7 @@ static long fd_array_map_delete_elem(struct bpf_map *map, void *key)
}
if (old_ptr) {
- map->ops->map_fd_put_ptr(old_ptr);
+ map->ops->map_fd_put_ptr(map, old_ptr, true);
return 0;
} else {
return -ENOENT;
@@ -913,8 +913,9 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
return prog;
}
-static void prog_fd_array_put_ptr(void *ptr)
+static void prog_fd_array_put_ptr(struct bpf_map *map, void *ptr, bool need_defer)
{
+ /* bpf_prog is freed after one RCU or tasks trace grace period */
bpf_prog_put(ptr);
}
@@ -1239,8 +1240,9 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map *map,
return ee;
}
-static void perf_event_fd_array_put_ptr(void *ptr)
+static void perf_event_fd_array_put_ptr(struct bpf_map *map, void *ptr, bool need_defer)
{
+ /* bpf_perf_event is freed after one RCU grace period */
bpf_event_entry_free_rcu(ptr);
}
@@ -1294,7 +1296,7 @@ static void *cgroup_fd_array_get_ptr(struct bpf_map *map,
return cgroup_get_from_fd(fd);
}
-static void cgroup_fd_array_put_ptr(void *ptr)
+static void cgroup_fd_array_put_ptr(struct bpf_map *map, void *ptr, bool need_defer)
{
/* cgroup_put free cgrp after a rcu grace period */
cgroup_put(ptr);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index fd8d4b0addfc..5b9146fa825f 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -897,7 +897,7 @@ static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
if (map->ops->map_fd_put_ptr) {
ptr = fd_htab_map_get_ptr(map, l);
- map->ops->map_fd_put_ptr(ptr);
+ map->ops->map_fd_put_ptr(map, ptr, true);
}
}
@@ -2484,7 +2484,7 @@ static void fd_htab_map_free(struct bpf_map *map)
hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
void *ptr = fd_htab_map_get_ptr(map, l);
- map->ops->map_fd_put_ptr(ptr);
+ map->ops->map_fd_put_ptr(map, ptr, false);
}
}
@@ -2525,7 +2525,7 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
ret = htab_map_update_elem(map, key, &ptr, map_flags);
if (ret)
- map->ops->map_fd_put_ptr(ptr);
+ map->ops->map_fd_put_ptr(map, ptr, false);
return ret;
}
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index cd5eafaba97e..2dfeb5835e16 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -127,7 +127,7 @@ void *bpf_map_fd_get_ptr(struct bpf_map *map,
return inner_map;
}
-void bpf_map_fd_put_ptr(void *ptr)
+void bpf_map_fd_put_ptr(struct bpf_map *map, void *ptr, bool need_defer)
{
/* ptr->ops->map_free() has to go through one
* rcu grace period by itself.
diff --git a/kernel/bpf/map_in_map.h b/kernel/bpf/map_in_map.h
index bcb7534afb3c..7d61602354de 100644
--- a/kernel/bpf/map_in_map.h
+++ b/kernel/bpf/map_in_map.h
@@ -13,7 +13,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd);
void bpf_map_meta_free(struct bpf_map *map_meta);
void *bpf_map_fd_get_ptr(struct bpf_map *map, struct file *map_file,
int ufd);
-void bpf_map_fd_put_ptr(void *ptr);
+void bpf_map_fd_put_ptr(struct bpf_map *map, void *ptr, bool need_defer);
u32 bpf_map_fd_sys_lookup_elem(void *ptr);
#endif
--
2.29.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH bpf v5 3/7] bpf: Set need_defer as false when clearing fd array during map free
2023-12-04 14:04 [PATCH bpf v5 0/7] bpf: Fix the release of inner map Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 1/7] bpf: Check rcu_read_lock_trace_held() before calling bpf map helpers Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 2/7] bpf: Add map and need_defer parameters to .map_fd_put_ptr() Hou Tao
@ 2023-12-04 14:04 ` Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 4/7] bpf: Defer the free of inner map when necessary Hou Tao
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2023-12-04 14:04 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, Paul E . McKenney,
houtao1
From: Hou Tao <houtao1@huawei.com>
Both map deletion operation, map release and map free operation use
fd_array_map_delete_elem() to remove the element from fd array and
need_defer is always true in fd_array_map_delete_elem(). For the map
deletion operation and map release operation, need_defer=true is
necessary, because the bpf program, which accesses the element in fd
array, may still alive. However for map free operation, it is certain
that the bpf program which owns the fd array has already been exited, so
setting need_defer as false is appropriate for map free operation.
So fix it by adding need_defer parameter to bpf_fd_array_map_clear() and
adding a new helper __fd_array_map_delete_elem() to handle the map
deletion, map release and map free operations correspondingly.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
kernel/bpf/arraymap.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index f9aed5909d6e..4a4a67956e21 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -871,7 +871,7 @@ int bpf_fd_array_map_update_elem(struct bpf_map *map, struct file *map_file,
return 0;
}
-static long fd_array_map_delete_elem(struct bpf_map *map, void *key)
+static long __fd_array_map_delete_elem(struct bpf_map *map, void *key, bool need_defer)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
void *old_ptr;
@@ -890,13 +890,18 @@ static long fd_array_map_delete_elem(struct bpf_map *map, void *key)
}
if (old_ptr) {
- map->ops->map_fd_put_ptr(map, old_ptr, true);
+ map->ops->map_fd_put_ptr(map, old_ptr, need_defer);
return 0;
} else {
return -ENOENT;
}
}
+static long fd_array_map_delete_elem(struct bpf_map *map, void *key)
+{
+ return __fd_array_map_delete_elem(map, key, true);
+}
+
static void *prog_fd_array_get_ptr(struct bpf_map *map,
struct file *map_file, int fd)
{
@@ -925,13 +930,13 @@ static u32 prog_fd_array_sys_lookup_elem(void *ptr)
}
/* decrement refcnt of all bpf_progs that are stored in this map */
-static void bpf_fd_array_map_clear(struct bpf_map *map)
+static void bpf_fd_array_map_clear(struct bpf_map *map, bool need_defer)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
int i;
for (i = 0; i < array->map.max_entries; i++)
- fd_array_map_delete_elem(map, &i);
+ __fd_array_map_delete_elem(map, &i, need_defer);
}
static void prog_array_map_seq_show_elem(struct bpf_map *map, void *key,
@@ -1110,7 +1115,7 @@ static void prog_array_map_clear_deferred(struct work_struct *work)
{
struct bpf_map *map = container_of(work, struct bpf_array_aux,
work)->map;
- bpf_fd_array_map_clear(map);
+ bpf_fd_array_map_clear(map, true);
bpf_map_put(map);
}
@@ -1260,7 +1265,7 @@ static void perf_event_fd_array_release(struct bpf_map *map,
for (i = 0; i < array->map.max_entries; i++) {
ee = READ_ONCE(array->ptrs[i]);
if (ee && ee->map_file == map_file)
- fd_array_map_delete_elem(map, &i);
+ __fd_array_map_delete_elem(map, &i, true);
}
rcu_read_unlock();
}
@@ -1268,7 +1273,7 @@ static void perf_event_fd_array_release(struct bpf_map *map,
static void perf_event_fd_array_map_free(struct bpf_map *map)
{
if (map->map_flags & BPF_F_PRESERVE_ELEMS)
- bpf_fd_array_map_clear(map);
+ bpf_fd_array_map_clear(map, false);
fd_array_map_free(map);
}
@@ -1304,7 +1309,7 @@ static void cgroup_fd_array_put_ptr(struct bpf_map *map, void *ptr, bool need_de
static void cgroup_fd_array_free(struct bpf_map *map)
{
- bpf_fd_array_map_clear(map);
+ bpf_fd_array_map_clear(map, false);
fd_array_map_free(map);
}
@@ -1349,7 +1354,7 @@ static void array_of_map_free(struct bpf_map *map)
* is protected by fdget/fdput.
*/
bpf_map_meta_free(map->inner_map_meta);
- bpf_fd_array_map_clear(map);
+ bpf_fd_array_map_clear(map, false);
fd_array_map_free(map);
}
--
2.29.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH bpf v5 4/7] bpf: Defer the free of inner map when necessary
2023-12-04 14:04 [PATCH bpf v5 0/7] bpf: Fix the release of inner map Hou Tao
` (2 preceding siblings ...)
2023-12-04 14:04 ` [PATCH bpf v5 3/7] bpf: Set need_defer as false when clearing fd array during map free Hou Tao
@ 2023-12-04 14:04 ` Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 5/7] bpf: Optimize the free of inner map Hou Tao
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2023-12-04 14:04 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, Paul E . McKenney,
houtao1
From: Hou Tao <houtao1@huawei.com>
When updating or deleting an inner map in map array or map htab, the map
may still be accessed by non-sleepable program or sleepable program.
However bpf_map_fd_put_ptr() decreases the ref-counter of the inner map
directly through bpf_map_put(), if the ref-counter is the last one
(which is true for most cases), the inner map will be freed by
ops->map_free() in a kworker. But for now, most .map_free() callbacks
don't use synchronize_rcu() or its variants to wait for the elapse of a
RCU grace period, so after the invocation of ops->map_free completes,
the bpf program which is accessing the inner map may incur
use-after-free problem.
Fix the free of inner map by invoking bpf_map_free_deferred() after both
one RCU grace period and one tasks trace RCU grace period if the inner
map has been removed from the outer map before. The deferment is
accomplished by using call_rcu() or call_rcu_tasks_trace() when
releasing the last ref-counter of bpf map. The newly-added rcu_head
field in bpf_map shares the same storage space with work field to
reduce the size of bpf_map.
Fixes: bba1dc0b55ac ("bpf: Remove redundant synchronize_rcu.")
Fixes: 638e4b825d52 ("bpf: Allows per-cpu maps and map-in-map in sleepable programs")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/bpf.h | 7 ++++++-
kernel/bpf/map_in_map.c | 11 ++++++++---
kernel/bpf/syscall.c | 32 +++++++++++++++++++++++++++-----
3 files changed, 41 insertions(+), 9 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6f68ac24bb9a..15a6bb951b70 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -276,7 +276,11 @@ struct bpf_map {
*/
atomic64_t refcnt ____cacheline_aligned;
atomic64_t usercnt;
- struct work_struct work;
+ /* rcu is used before freeing and work is only used during freeing */
+ union {
+ struct work_struct work;
+ struct rcu_head rcu;
+ };
struct mutex freeze_mutex;
atomic64_t writecnt;
/* 'Ownership' of program-containing map is claimed by the first program
@@ -292,6 +296,7 @@ struct bpf_map {
} owner;
bool bypass_spec_v1;
bool frozen; /* write-once; write-protected by freeze_mutex */
+ bool free_after_mult_rcu_gp;
s64 __percpu *elem_count;
};
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 2dfeb5835e16..3248ff5d8161 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -129,10 +129,15 @@ void *bpf_map_fd_get_ptr(struct bpf_map *map,
void bpf_map_fd_put_ptr(struct bpf_map *map, void *ptr, bool need_defer)
{
- /* ptr->ops->map_free() has to go through one
- * rcu grace period by itself.
+ struct bpf_map *inner_map = ptr;
+
+ /* The inner map may still be used by both non-sleepable and sleepable
+ * bpf program, so free it after one RCU grace period and one tasks
+ * trace RCU grace period.
*/
- bpf_map_put(ptr);
+ if (need_defer)
+ WRITE_ONCE(inner_map->free_after_mult_rcu_gp, true);
+ bpf_map_put(inner_map);
}
u32 bpf_map_fd_sys_lookup_elem(void *ptr)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a144eb286974..88882cb58121 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -718,6 +718,28 @@ static void bpf_map_put_uref(struct bpf_map *map)
}
}
+static void bpf_map_free_in_work(struct bpf_map *map)
+{
+ INIT_WORK(&map->work, bpf_map_free_deferred);
+ /* Avoid spawning kworkers, since they all might contend
+ * for the same mutex like slab_mutex.
+ */
+ queue_work(system_unbound_wq, &map->work);
+}
+
+static void bpf_map_free_rcu_gp(struct rcu_head *rcu)
+{
+ bpf_map_free_in_work(container_of(rcu, struct bpf_map, rcu));
+}
+
+static void bpf_map_free_mult_rcu_gp(struct rcu_head *rcu)
+{
+ if (rcu_trace_implies_rcu_gp())
+ bpf_map_free_rcu_gp(rcu);
+ else
+ call_rcu(rcu, bpf_map_free_rcu_gp);
+}
+
/* decrement map refcnt and schedule it for freeing via workqueue
* (underlying map implementation ops->map_free() might sleep)
*/
@@ -727,11 +749,11 @@ void bpf_map_put(struct bpf_map *map)
/* bpf_map_free_id() must be called first */
bpf_map_free_id(map);
btf_put(map->btf);
- INIT_WORK(&map->work, bpf_map_free_deferred);
- /* Avoid spawning kworkers, since they all might contend
- * for the same mutex like slab_mutex.
- */
- queue_work(system_unbound_wq, &map->work);
+
+ if (READ_ONCE(map->free_after_mult_rcu_gp))
+ call_rcu_tasks_trace(&map->rcu, bpf_map_free_mult_rcu_gp);
+ else
+ bpf_map_free_in_work(map);
}
}
EXPORT_SYMBOL_GPL(bpf_map_put);
--
2.29.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH bpf v5 5/7] bpf: Optimize the free of inner map
2023-12-04 14:04 [PATCH bpf v5 0/7] bpf: Fix the release of inner map Hou Tao
` (3 preceding siblings ...)
2023-12-04 14:04 ` [PATCH bpf v5 4/7] bpf: Defer the free of inner map when necessary Hou Tao
@ 2023-12-04 14:04 ` Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 6/7] selftests/bpf: Add test cases for " Hou Tao
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2023-12-04 14:04 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, Paul E . McKenney,
houtao1
From: Hou Tao <houtao1@huawei.com>
When removing the inner map from the outer map, the inner map will be
freed after one RCU grace period and one RCU tasks trace grace
period, so it is certain that the bpf program, which may access the
inner map, has exited before the inner map is freed.
However there is no need to wait for one RCU tasks trace grace period if
the outer map is only accessed by non-sleepable program. So adding
sleepable_refcnt in bpf_map and increasing sleepable_refcnt when adding
the outer map into env->used_maps for sleepable program. Although the
max number of bpf program is INT_MAX - 1, the number of bpf programs
which are being loaded may be greater than INT_MAX, so using atomic64_t
instead of atomic_t for sleepable_refcnt. When removing the inner map
from the outer map, using sleepable_refcnt to decide whether or not a
RCU tasks trace grace period is needed before freeing the inner map.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/bpf.h | 2 ++
kernel/bpf/core.c | 4 ++++
kernel/bpf/map_in_map.c | 14 +++++++++-----
kernel/bpf/syscall.c | 8 ++++++++
kernel/bpf/verifier.c | 4 +++-
5 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 15a6bb951b70..4a78cc0c81c8 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -297,6 +297,8 @@ struct bpf_map {
bool bypass_spec_v1;
bool frozen; /* write-once; write-protected by freeze_mutex */
bool free_after_mult_rcu_gp;
+ bool free_after_rcu_gp;
+ atomic64_t sleepable_refcnt;
s64 __percpu *elem_count;
};
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index cd3afe57ece3..4b813da8d6c0 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2664,12 +2664,16 @@ void __bpf_free_used_maps(struct bpf_prog_aux *aux,
struct bpf_map **used_maps, u32 len)
{
struct bpf_map *map;
+ bool sleepable;
u32 i;
+ sleepable = aux->sleepable;
for (i = 0; i < len; i++) {
map = used_maps[i];
if (map->ops->map_poke_untrack)
map->ops->map_poke_untrack(map, aux);
+ if (sleepable)
+ atomic64_dec(&map->sleepable_refcnt);
bpf_map_put(map);
}
}
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 3248ff5d8161..8ef269e66ba5 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -131,12 +131,16 @@ void bpf_map_fd_put_ptr(struct bpf_map *map, void *ptr, bool need_defer)
{
struct bpf_map *inner_map = ptr;
- /* The inner map may still be used by both non-sleepable and sleepable
- * bpf program, so free it after one RCU grace period and one tasks
- * trace RCU grace period.
+ /* Defer the freeing of inner map according to the sleepable attribute
+ * of bpf program which owns the outer map, so unnecessary waiting for
+ * RCU tasks trace grace period can be avoided.
*/
- if (need_defer)
- WRITE_ONCE(inner_map->free_after_mult_rcu_gp, true);
+ if (need_defer) {
+ if (atomic64_read(&map->sleepable_refcnt))
+ WRITE_ONCE(inner_map->free_after_mult_rcu_gp, true);
+ else
+ WRITE_ONCE(inner_map->free_after_rcu_gp, true);
+ }
bpf_map_put(inner_map);
}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 88882cb58121..7ba10570ca97 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -750,8 +750,11 @@ void bpf_map_put(struct bpf_map *map)
bpf_map_free_id(map);
btf_put(map->btf);
+ WARN_ON_ONCE(atomic64_read(&map->sleepable_refcnt));
if (READ_ONCE(map->free_after_mult_rcu_gp))
call_rcu_tasks_trace(&map->rcu, bpf_map_free_mult_rcu_gp);
+ else if (READ_ONCE(map->free_after_rcu_gp))
+ call_rcu(&map->rcu, bpf_map_free_rcu_gp);
else
bpf_map_free_in_work(map);
}
@@ -5343,6 +5346,11 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
goto out_unlock;
}
+ /* The bpf program will not access the bpf map, but for the sake of
+ * simplicity, increase sleepable_refcnt for sleepable program as well.
+ */
+ if (prog->aux->sleepable)
+ atomic64_inc(&map->sleepable_refcnt);
memcpy(used_maps_new, used_maps_old,
sizeof(used_maps_old[0]) * prog->aux->used_map_cnt);
used_maps_new[prog->aux->used_map_cnt] = map;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6da370a047fe..0c018cedf47e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18051,10 +18051,12 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
return -E2BIG;
}
+ if (env->prog->aux->sleepable)
+ atomic64_inc(&map->sleepable_refcnt);
/* hold the map. If the program is rejected by verifier,
* the map will be released by release_maps() or it
* will be used by the valid program until it's unloaded
- * and all maps are released in free_used_maps()
+ * and all maps are released in bpf_free_used_maps()
*/
bpf_map_inc(map);
--
2.29.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH bpf v5 6/7] selftests/bpf: Add test cases for inner map
2023-12-04 14:04 [PATCH bpf v5 0/7] bpf: Fix the release of inner map Hou Tao
` (4 preceding siblings ...)
2023-12-04 14:04 ` [PATCH bpf v5 5/7] bpf: Optimize the free of inner map Hou Tao
@ 2023-12-04 14:04 ` Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 7/7] selftests/bpf: Test outer map update operations in syscall program Hou Tao
2023-12-05 2:08 ` [PATCH bpf v5 0/7] bpf: Fix the release of inner map patchwork-bot+netdevbpf
7 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2023-12-04 14:04 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, Paul E . McKenney,
houtao1
From: Hou Tao <houtao1@huawei.com>
Add test cases to test the race between the destroy of inner map due to
map-in-map update and the access of inner map in bpf program. The
following 4 combinations are added:
(1) array map in map array + bpf program
(2) array map in map array + sleepable bpf program
(3) array map in map htab + bpf program
(4) array map in map htab + sleepable bpf program
Before applying the fixes, when running `./test_prog -a map_in_map`, the
following error was reported:
==================================================================
BUG: KASAN: slab-use-after-free in array_map_update_elem+0x48/0x3e0
Read of size 4 at addr ffff888114f33824 by task test_progs/1858
CPU: 1 PID: 1858 Comm: test_progs Tainted: G O 6.6.0+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
Call Trace:
<TASK>
dump_stack_lvl+0x4a/0x90
print_report+0xd2/0x620
kasan_report+0xd1/0x110
__asan_load4+0x81/0xa0
array_map_update_elem+0x48/0x3e0
bpf_prog_be94a9f26772f5b7_access_map_in_array+0xe6/0xf6
trace_call_bpf+0x1aa/0x580
kprobe_perf_func+0xdd/0x430
kprobe_dispatcher+0xa0/0xb0
kprobe_ftrace_handler+0x18b/0x2e0
0xffffffffc02280f7
RIP: 0010:__x64_sys_getpgid+0x1/0x30
......
</TASK>
Allocated by task 1857:
kasan_save_stack+0x26/0x50
kasan_set_track+0x25/0x40
kasan_save_alloc_info+0x1e/0x30
__kasan_kmalloc+0x98/0xa0
__kmalloc_node+0x6a/0x150
__bpf_map_area_alloc+0x141/0x170
bpf_map_area_alloc+0x10/0x20
array_map_alloc+0x11f/0x310
map_create+0x28a/0xb40
__sys_bpf+0x753/0x37c0
__x64_sys_bpf+0x44/0x60
do_syscall_64+0x36/0xb0
entry_SYSCALL_64_after_hwframe+0x6e/0x76
Freed by task 11:
kasan_save_stack+0x26/0x50
kasan_set_track+0x25/0x40
kasan_save_free_info+0x2b/0x50
__kasan_slab_free+0x113/0x190
slab_free_freelist_hook+0xd7/0x1e0
__kmem_cache_free+0x170/0x260
kfree+0x9b/0x160
kvfree+0x2d/0x40
bpf_map_area_free+0xe/0x20
array_map_free+0x120/0x2c0
bpf_map_free_deferred+0xd7/0x1e0
process_one_work+0x462/0x990
worker_thread+0x370/0x670
kthread+0x1b0/0x200
ret_from_fork+0x3a/0x70
ret_from_fork_asm+0x1b/0x30
Last potentially related work creation:
kasan_save_stack+0x26/0x50
__kasan_record_aux_stack+0x94/0xb0
kasan_record_aux_stack_noalloc+0xb/0x20
__queue_work+0x331/0x950
queue_work_on+0x75/0x80
bpf_map_put+0xfa/0x160
bpf_map_fd_put_ptr+0xe/0x20
bpf_fd_array_map_update_elem+0x174/0x1b0
bpf_map_update_value+0x2b7/0x4a0
__sys_bpf+0x2551/0x37c0
__x64_sys_bpf+0x44/0x60
do_syscall_64+0x36/0xb0
entry_SYSCALL_64_after_hwframe+0x6e/0x76
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
.../selftests/bpf/prog_tests/map_in_map.c | 141 ++++++++++++++++++
.../selftests/bpf/progs/access_map_in_map.c | 93 ++++++++++++
2 files changed, 234 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/map_in_map.c
create mode 100644 tools/testing/selftests/bpf/progs/access_map_in_map.c
diff --git a/tools/testing/selftests/bpf/prog_tests/map_in_map.c b/tools/testing/selftests/bpf/prog_tests/map_in_map.c
new file mode 100644
index 000000000000..d2a10eb4e5b5
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/map_in_map.c
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023. Huawei Technologies Co., Ltd */
+#define _GNU_SOURCE
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <test_progs.h>
+#include <bpf/btf.h>
+#include "access_map_in_map.skel.h"
+
+struct thread_ctx {
+ pthread_barrier_t barrier;
+ int outer_map_fd;
+ int start, abort;
+ int loop, err;
+};
+
+static int wait_for_start_or_abort(struct thread_ctx *ctx)
+{
+ while (!ctx->start && !ctx->abort)
+ usleep(1);
+ return ctx->abort ? -1 : 0;
+}
+
+static void *update_map_fn(void *data)
+{
+ struct thread_ctx *ctx = data;
+ int loop = ctx->loop, err = 0;
+
+ if (wait_for_start_or_abort(ctx) < 0)
+ return NULL;
+ pthread_barrier_wait(&ctx->barrier);
+
+ while (loop-- > 0) {
+ int fd, zero = 0;
+
+ fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, 4, 4, 1, NULL);
+ if (fd < 0) {
+ err |= 1;
+ pthread_barrier_wait(&ctx->barrier);
+ continue;
+ }
+
+ /* Remove the old inner map */
+ if (bpf_map_update_elem(ctx->outer_map_fd, &zero, &fd, 0) < 0)
+ err |= 2;
+ close(fd);
+ pthread_barrier_wait(&ctx->barrier);
+ }
+
+ ctx->err = err;
+
+ return NULL;
+}
+
+static void *access_map_fn(void *data)
+{
+ struct thread_ctx *ctx = data;
+ int loop = ctx->loop;
+
+ if (wait_for_start_or_abort(ctx) < 0)
+ return NULL;
+ pthread_barrier_wait(&ctx->barrier);
+
+ while (loop-- > 0) {
+ /* Access the old inner map */
+ syscall(SYS_getpgid);
+ pthread_barrier_wait(&ctx->barrier);
+ }
+
+ return NULL;
+}
+
+static void test_map_in_map_access(const char *prog_name, const char *map_name)
+{
+ struct access_map_in_map *skel;
+ struct bpf_map *outer_map;
+ struct bpf_program *prog;
+ struct thread_ctx ctx;
+ pthread_t tid[2];
+ int err;
+
+ skel = access_map_in_map__open();
+ if (!ASSERT_OK_PTR(skel, "access_map_in_map open"))
+ return;
+
+ prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+ if (!ASSERT_OK_PTR(prog, "find program"))
+ goto out;
+ bpf_program__set_autoload(prog, true);
+
+ outer_map = bpf_object__find_map_by_name(skel->obj, map_name);
+ if (!ASSERT_OK_PTR(outer_map, "find map"))
+ goto out;
+
+ err = access_map_in_map__load(skel);
+ if (!ASSERT_OK(err, "access_map_in_map load"))
+ goto out;
+
+ err = access_map_in_map__attach(skel);
+ if (!ASSERT_OK(err, "access_map_in_map attach"))
+ goto out;
+
+ skel->bss->tgid = getpid();
+
+ memset(&ctx, 0, sizeof(ctx));
+ pthread_barrier_init(&ctx.barrier, NULL, 2);
+ ctx.outer_map_fd = bpf_map__fd(outer_map);
+ ctx.loop = 4;
+
+ err = pthread_create(&tid[0], NULL, update_map_fn, &ctx);
+ if (!ASSERT_OK(err, "close_thread"))
+ goto out;
+
+ err = pthread_create(&tid[1], NULL, access_map_fn, &ctx);
+ if (!ASSERT_OK(err, "read_thread")) {
+ ctx.abort = 1;
+ pthread_join(tid[0], NULL);
+ goto out;
+ }
+
+ ctx.start = 1;
+ pthread_join(tid[0], NULL);
+ pthread_join(tid[1], NULL);
+
+ ASSERT_OK(ctx.err, "err");
+out:
+ access_map_in_map__destroy(skel);
+}
+
+void test_map_in_map(void)
+{
+ if (test__start_subtest("acc_map_in_array"))
+ test_map_in_map_access("access_map_in_array", "outer_array_map");
+ if (test__start_subtest("sleepable_acc_map_in_array"))
+ test_map_in_map_access("sleepable_access_map_in_array", "outer_array_map");
+ if (test__start_subtest("acc_map_in_htab"))
+ test_map_in_map_access("access_map_in_htab", "outer_htab_map");
+ if (test__start_subtest("sleepable_acc_map_in_htab"))
+ test_map_in_map_access("sleepable_access_map_in_htab", "outer_htab_map");
+}
+
diff --git a/tools/testing/selftests/bpf/progs/access_map_in_map.c b/tools/testing/selftests/bpf/progs/access_map_in_map.c
new file mode 100644
index 000000000000..1126871c2ebd
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/access_map_in_map.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023. Huawei Technologies Co., Ltd */
+#include <linux/bpf.h>
+#include <time.h>
+#include <bpf/bpf_helpers.h>
+
+#include "bpf_misc.h"
+
+struct inner_map_type {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __uint(key_size, 4);
+ __uint(value_size, 4);
+ __uint(max_entries, 1);
+} inner_map SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
+ __type(key, int);
+ __type(value, int);
+ __uint(max_entries, 1);
+ __array(values, struct inner_map_type);
+} outer_array_map SEC(".maps") = {
+ .values = {
+ [0] = &inner_map,
+ },
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
+ __type(key, int);
+ __type(value, int);
+ __uint(max_entries, 1);
+ __array(values, struct inner_map_type);
+} outer_htab_map SEC(".maps") = {
+ .values = {
+ [0] = &inner_map,
+ },
+};
+
+char _license[] SEC("license") = "GPL";
+
+int tgid = 0;
+
+static int acc_map_in_map(void *outer_map)
+{
+ int i, key, value = 0xdeadbeef;
+ void *inner_map;
+
+ if ((bpf_get_current_pid_tgid() >> 32) != tgid)
+ return 0;
+
+ /* Find nonexistent inner map */
+ key = 1;
+ inner_map = bpf_map_lookup_elem(outer_map, &key);
+ if (inner_map)
+ return 0;
+
+ /* Find the old inner map */
+ key = 0;
+ inner_map = bpf_map_lookup_elem(outer_map, &key);
+ if (!inner_map)
+ return 0;
+
+ /* Wait for the old inner map to be replaced */
+ for (i = 0; i < 2048; i++)
+ bpf_map_update_elem(inner_map, &key, &value, 0);
+
+ return 0;
+}
+
+SEC("?kprobe/" SYS_PREFIX "sys_getpgid")
+int access_map_in_array(void *ctx)
+{
+ return acc_map_in_map(&outer_array_map);
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int sleepable_access_map_in_array(void *ctx)
+{
+ return acc_map_in_map(&outer_array_map);
+}
+
+SEC("?kprobe/" SYS_PREFIX "sys_getpgid")
+int access_map_in_htab(void *ctx)
+{
+ return acc_map_in_map(&outer_htab_map);
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int sleepable_access_map_in_htab(void *ctx)
+{
+ return acc_map_in_map(&outer_htab_map);
+}
--
2.29.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH bpf v5 7/7] selftests/bpf: Test outer map update operations in syscall program
2023-12-04 14:04 [PATCH bpf v5 0/7] bpf: Fix the release of inner map Hou Tao
` (5 preceding siblings ...)
2023-12-04 14:04 ` [PATCH bpf v5 6/7] selftests/bpf: Add test cases for " Hou Tao
@ 2023-12-04 14:04 ` Hou Tao
2023-12-05 2:08 ` [PATCH bpf v5 0/7] bpf: Fix the release of inner map patchwork-bot+netdevbpf
7 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2023-12-04 14:04 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, Paul E . McKenney,
houtao1
From: Hou Tao <houtao1@huawei.com>
Syscall program is running with rcu_read_lock_trace being held, so if
bpf_map_update_elem() or bpf_map_delete_elem() invokes
synchronize_rcu_tasks_trace() when operating on an outer map, there will
be dead-lock, so add a test to guarantee that it is dead-lock free.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
.../selftests/bpf/prog_tests/syscall.c | 30 +++++-
tools/testing/selftests/bpf/progs/syscall.c | 96 ++++++++++++++++++-
2 files changed, 119 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/syscall.c b/tools/testing/selftests/bpf/prog_tests/syscall.c
index f4d40001155a..0be8301c0ffd 100644
--- a/tools/testing/selftests/bpf/prog_tests/syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/syscall.c
@@ -12,7 +12,7 @@ struct args {
int btf_fd;
};
-void test_syscall(void)
+static void test_syscall_load_prog(void)
{
static char verifier_log[8192];
struct args ctx = {
@@ -32,7 +32,7 @@ void test_syscall(void)
if (!ASSERT_OK_PTR(skel, "skel_load"))
goto cleanup;
- prog_fd = bpf_program__fd(skel->progs.bpf_prog);
+ prog_fd = bpf_program__fd(skel->progs.load_prog);
err = bpf_prog_test_run_opts(prog_fd, &tattr);
ASSERT_EQ(err, 0, "err");
ASSERT_EQ(tattr.retval, 1, "retval");
@@ -53,3 +53,29 @@ void test_syscall(void)
if (ctx.btf_fd > 0)
close(ctx.btf_fd);
}
+
+static void test_syscall_update_outer_map(void)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, opts);
+ struct syscall *skel;
+ int err, prog_fd;
+
+ skel = syscall__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ prog_fd = bpf_program__fd(skel->progs.update_outer_map);
+ err = bpf_prog_test_run_opts(prog_fd, &opts);
+ ASSERT_EQ(err, 0, "err");
+ ASSERT_EQ(opts.retval, 1, "retval");
+cleanup:
+ syscall__destroy(skel);
+}
+
+void test_syscall(void)
+{
+ if (test__start_subtest("load_prog"))
+ test_syscall_load_prog();
+ if (test__start_subtest("update_outer_map"))
+ test_syscall_update_outer_map();
+}
diff --git a/tools/testing/selftests/bpf/progs/syscall.c b/tools/testing/selftests/bpf/progs/syscall.c
index e550f728962d..3d3cafdebe72 100644
--- a/tools/testing/selftests/bpf/progs/syscall.c
+++ b/tools/testing/selftests/bpf/progs/syscall.c
@@ -6,9 +6,15 @@
#include <bpf/bpf_tracing.h>
#include <../../../tools/include/linux/filter.h>
#include <linux/btf.h>
+#include <string.h>
+#include <errno.h>
char _license[] SEC("license") = "GPL";
+struct bpf_map {
+ int id;
+} __attribute__((preserve_access_index));
+
struct args {
__u64 log_buf;
__u32 log_size;
@@ -27,6 +33,37 @@ struct args {
BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_INT, 0, 0), sz), \
BTF_INT_ENC(encoding, bits_offset, bits)
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __type(key, int);
+ __type(value, union bpf_attr);
+ __uint(max_entries, 1);
+} bpf_attr_array SEC(".maps");
+
+struct inner_map_type {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __uint(key_size, 4);
+ __uint(value_size, 4);
+ __uint(max_entries, 1);
+} inner_map SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
+ __type(key, int);
+ __type(value, int);
+ __uint(max_entries, 1);
+ __array(values, struct inner_map_type);
+} outer_array_map SEC(".maps") = {
+ .values = {
+ [0] = &inner_map,
+ },
+};
+
+static inline __u64 ptr_to_u64(const void *ptr)
+{
+ return (__u64) (unsigned long) ptr;
+}
+
static int btf_load(void)
{
struct btf_blob {
@@ -58,7 +95,7 @@ static int btf_load(void)
}
SEC("syscall")
-int bpf_prog(struct args *ctx)
+int load_prog(struct args *ctx)
{
static char license[] = "GPL";
static struct bpf_insn insns[] = {
@@ -94,8 +131,8 @@ int bpf_prog(struct args *ctx)
map_create_attr.max_entries = ctx->max_entries;
map_create_attr.btf_fd = ret;
- prog_load_attr.license = (long) license;
- prog_load_attr.insns = (long) insns;
+ prog_load_attr.license = ptr_to_u64(license);
+ prog_load_attr.insns = ptr_to_u64(insns);
prog_load_attr.log_buf = ctx->log_buf;
prog_load_attr.log_size = ctx->log_size;
prog_load_attr.log_level = 1;
@@ -107,8 +144,8 @@ int bpf_prog(struct args *ctx)
insns[3].imm = ret;
map_update_attr.map_fd = ret;
- map_update_attr.key = (long) &key;
- map_update_attr.value = (long) &value;
+ map_update_attr.key = ptr_to_u64(&key);
+ map_update_attr.value = ptr_to_u64(&value);
ret = bpf_sys_bpf(BPF_MAP_UPDATE_ELEM, &map_update_attr, sizeof(map_update_attr));
if (ret < 0)
return ret;
@@ -119,3 +156,52 @@ int bpf_prog(struct args *ctx)
ctx->prog_fd = ret;
return 1;
}
+
+SEC("syscall")
+int update_outer_map(void *ctx)
+{
+ int zero = 0, ret = 0, outer_fd = -1, inner_fd = -1, err;
+ const int attr_sz = sizeof(union bpf_attr);
+ union bpf_attr *attr;
+
+ attr = bpf_map_lookup_elem((struct bpf_map *)&bpf_attr_array, &zero);
+ if (!attr)
+ goto out;
+
+ memset(attr, 0, attr_sz);
+ attr->map_id = ((struct bpf_map *)&outer_array_map)->id;
+ outer_fd = bpf_sys_bpf(BPF_MAP_GET_FD_BY_ID, attr, attr_sz);
+ if (outer_fd < 0)
+ goto out;
+
+ memset(attr, 0, attr_sz);
+ attr->map_type = BPF_MAP_TYPE_ARRAY;
+ attr->key_size = 4;
+ attr->value_size = 4;
+ attr->max_entries = 1;
+ inner_fd = bpf_sys_bpf(BPF_MAP_CREATE, attr, attr_sz);
+ if (inner_fd < 0)
+ goto out;
+
+ memset(attr, 0, attr_sz);
+ attr->map_fd = outer_fd;
+ attr->key = ptr_to_u64(&zero);
+ attr->value = ptr_to_u64(&inner_fd);
+ err = bpf_sys_bpf(BPF_MAP_UPDATE_ELEM, attr, attr_sz);
+ if (err)
+ goto out;
+
+ memset(attr, 0, attr_sz);
+ attr->map_fd = outer_fd;
+ attr->key = ptr_to_u64(&zero);
+ err = bpf_sys_bpf(BPF_MAP_DELETE_ELEM, attr, attr_sz);
+ if (err)
+ goto out;
+ ret = 1;
+out:
+ if (inner_fd >= 0)
+ bpf_sys_close(inner_fd);
+ if (outer_fd >= 0)
+ bpf_sys_close(outer_fd);
+ return ret;
+}
--
2.29.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v5 0/7] bpf: Fix the release of inner map
2023-12-04 14:04 [PATCH bpf v5 0/7] bpf: Fix the release of inner map Hou Tao
` (6 preceding siblings ...)
2023-12-04 14:04 ` [PATCH bpf v5 7/7] selftests/bpf: Test outer map update operations in syscall program Hou Tao
@ 2023-12-05 2:08 ` patchwork-bot+netdevbpf
7 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-12-05 2:08 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, martin.lau, alexei.starovoitov, andrii, song, haoluo,
yonghong.song, daniel, kpsingh, sdf, jolsa, john.fastabend,
paulmck, houtao1
Hello:
This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Mon, 4 Dec 2023 22:04:18 +0800 you wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> Hi,
>
> The patchset aims to fix the release of inner map in map array or map
> htab. The release of inner map is different with normal map. For normal
> map, the map is released after the bpf program which uses the map is
> destroyed, because the bpf program tracks the used maps. However bpf
> program can not track the used inner map because these inner map may be
> updated or deleted dynamically, and for now the ref-counter of inner map
> is decreased after the inner map is remove from outer map, so the inner
> map may be freed before the bpf program, which is accessing the inner
> map, exits and there will be use-after-free problem as demonstrated by
> patch #6.
>
> [...]
Here is the summary with links:
- [bpf,v5,1/7] bpf: Check rcu_read_lock_trace_held() before calling bpf map helpers
https://git.kernel.org/bpf/bpf-next/c/169410eba271
- [bpf,v5,2/7] bpf: Add map and need_defer parameters to .map_fd_put_ptr()
https://git.kernel.org/bpf/bpf-next/c/20c20bd11a07
- [bpf,v5,3/7] bpf: Set need_defer as false when clearing fd array during map free
https://git.kernel.org/bpf/bpf-next/c/79d93b3c6ffd
- [bpf,v5,4/7] bpf: Defer the free of inner map when necessary
https://git.kernel.org/bpf/bpf-next/c/876673364161
- [bpf,v5,5/7] bpf: Optimize the free of inner map
https://git.kernel.org/bpf/bpf-next/c/af66bfd3c853
- [bpf,v5,6/7] selftests/bpf: Add test cases for inner map
https://git.kernel.org/bpf/bpf-next/c/1624918be84a
- [bpf,v5,7/7] selftests/bpf: Test outer map update operations in syscall program
https://git.kernel.org/bpf/bpf-next/c/e3dd40828534
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2023-12-05 2:08 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-04 14:04 [PATCH bpf v5 0/7] bpf: Fix the release of inner map Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 1/7] bpf: Check rcu_read_lock_trace_held() before calling bpf map helpers Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 2/7] bpf: Add map and need_defer parameters to .map_fd_put_ptr() Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 3/7] bpf: Set need_defer as false when clearing fd array during map free Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 4/7] bpf: Defer the free of inner map when necessary Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 5/7] bpf: Optimize the free of inner map Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 6/7] selftests/bpf: Add test cases for " Hou Tao
2023-12-04 14:04 ` [PATCH bpf v5 7/7] selftests/bpf: Test outer map update operations in syscall program Hou Tao
2023-12-05 2:08 ` [PATCH bpf v5 0/7] bpf: Fix the release of inner map patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox