* [PATCH bpf-next v3 0/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen()
@ 2023-12-14 4:30 Hou Tao
2023-12-14 4:30 ` [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map Hou Tao
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Hou Tao @ 2023-12-14 4:30 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, xingwei lee,
houtao1
From: Hou Tao <houtao1@huawei.com>
Hi,
This simple patch set aims to replace GFP_ATOMIC with GFP_KERNEL in
bpf_event_entry_gen(). The two patches were preparatory patches in the
"Fix the release of inner map" patch set [1] and are no longer needed
for its v2, so they are re-posted to the bpf-next tree.
Patch #1 reduces the scope of rcu_read_lock() when updating an fd map and
patch #2 replaces GFP_ATOMIC with GFP_KERNEL. Please see the individual
patches for more details.
Comments are always welcome.
Change Log:
v3:
* patch #1: fall back to patch #1 from v1. Update comments in
bpf_fd_htab_map_update_elem() to explain the reason for
rcu_read_lock() (Alexei)
v2: https://lore.kernel.org/bpf/20231211073843.1888058-1-houtao@huaweicloud.com/
* patch #1: add rcu_read_lock/unlock() for bpf_fd_array_map_update_elem
as well to make it consistent with
bpf_fd_htab_map_update_elem and update commit message
accordingly (Alexei)
* patch #1/#2: collect Acked-by tags from Yonghong
v1: https://lore.kernel.org/bpf/20231208103357.2637299-1-houtao@huaweicloud.com/
[1]: https://lore.kernel.org/bpf/20231107140702.1891778-1-houtao@huaweicloud.com/
Hou Tao (2):
bpf: Reduce the scope of rcu_read_lock when updating fd map
bpf: Use GFP_KERNEL in bpf_event_entry_gen()
kernel/bpf/arraymap.c | 2 +-
kernel/bpf/hashtab.c | 6 ++++++
kernel/bpf/syscall.c | 4 ----
3 files changed, 7 insertions(+), 5 deletions(-)
--
2.29.2
^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-14  4:30 [PATCH bpf-next v3 0/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen() Hou Tao
@ 2023-12-14  4:30 ` Hou Tao
  2023-12-14  6:22   ` John Fastabend
  2023-12-14  4:30 ` [PATCH bpf-next v3 2/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen() Hou Tao
  2023-12-14  5:10 ` [PATCH bpf-next v3 0/2] " patchwork-bot+netdevbpf
  2 siblings, 1 reply; 11+ messages in thread
From: Hou Tao @ 2023-12-14  4:30 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
    Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
    Stanislav Fomichev, Jiri Olsa, John Fastabend, xingwei lee,
    houtao1

From: Hou Tao <houtao1@huawei.com>

There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
ops->map_fd_put_ptr(), so don't use rcu-read-lock for these two
callbacks.

For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
rcu-read-lock because array->ptrs must still be allocated. For
bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
rcu_read_lock() during the invocation of htab_map_update_elem().

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/hashtab.c | 6 ++++++
 kernel/bpf/syscall.c | 4 ----
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 5b9146fa825f..ec3bdcc6a3cf 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
 	if (IS_ERR(ptr))
 		return PTR_ERR(ptr);
 
+	/* The htab bucket lock is always held during update operations in fd
+	 * htab map, and the following rcu_read_lock() is only used to avoid
+	 * the WARN_ON_ONCE in htab_map_update_elem().
+	 */
+	rcu_read_lock();
 	ret = htab_map_update_elem(map, key, &ptr, map_flags);
+	rcu_read_unlock();
 
 	if (ret)
 		map->ops->map_fd_put_ptr(map, ptr, false);

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d63c1ed42412..3fcf7741146a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -184,15 +184,11 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
 		err = bpf_percpu_cgroup_storage_update(map, key, value,
 						       flags);
 	} else if (IS_FD_ARRAY(map)) {
-		rcu_read_lock();
 		err = bpf_fd_array_map_update_elem(map, map_file, key, value,
 						   flags);
-		rcu_read_unlock();
 	} else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
-		rcu_read_lock();
 		err = bpf_fd_htab_map_update_elem(map, map_file, key, value,
 						  flags);
-		rcu_read_unlock();
 	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
 		/* rcu_read_lock() is not needed */
 		err = bpf_fd_reuseport_array_update_elem(map, key, value,
-- 
2.29.2
* RE: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-14  4:30 ` [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map Hou Tao
@ 2023-12-14  6:22   ` John Fastabend
  2023-12-14  7:31     ` Hou Tao
  0 siblings, 1 reply; 11+ messages in thread
From: John Fastabend @ 2023-12-14  6:22 UTC (permalink / raw)
To: Hou Tao, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
    Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
    Stanislav Fomichev, Jiri Olsa, John Fastabend, xingwei lee,
    houtao1

Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
> callbacks.
>
> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
> rcu-read-lock because array->ptrs must still be allocated. For
> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
> rcu_read_lock() during the invocation of htab_map_update_elem().
>
> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/hashtab.c | 6 ++++++
>  kernel/bpf/syscall.c | 4 ----
>  2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 5b9146fa825f..ec3bdcc6a3cf 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
>  	if (IS_ERR(ptr))
>  		return PTR_ERR(ptr);
>
> +	/* The htab bucket lock is always held during update operations in fd
> +	 * htab map, and the following rcu_read_lock() is only used to avoid
> +	 * the WARN_ON_ONCE in htab_map_update_elem().
> +	 */
> +	rcu_read_lock();
>  	ret = htab_map_update_elem(map, key, &ptr, map_flags);
> +	rcu_read_unlock();

Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()? It
looks like there are two ways to get to htab_map_update_elem() either
through a syscall and the path here (bpf_fd_htab_map_update_elem) or
through a BPF program calling, bpf_update_elem()? In the BPF_CALL
case bpf_map_update_elem() already has,

 WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held())

The htab_map_update_elem() has an additional check for
rcu_read_lock_trace_held(), but not sure where this is coming from
at the moment. Can that be added to the BPF caller side if needed?

Did I miss some caller path?

>  	if (ret)
>  		map->ops->map_fd_put_ptr(map, ptr, false);
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index d63c1ed42412..3fcf7741146a 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -184,15 +184,11 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
>  		err = bpf_percpu_cgroup_storage_update(map, key, value,
>  						       flags);
>  	} else if (IS_FD_ARRAY(map)) {
> -		rcu_read_lock();
>  		err = bpf_fd_array_map_update_elem(map, map_file, key, value,
>  						   flags);
> -		rcu_read_unlock();
>  	} else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
> -		rcu_read_lock();
>  		err = bpf_fd_htab_map_update_elem(map, map_file, key, value,
>  						  flags);
> -		rcu_read_unlock();
>  	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
>  		/* rcu_read_lock() is not needed */
>  		err = bpf_fd_reuseport_array_update_elem(map, key, value,

Any reason to leave the last rcu_read_lock() on the 'else{}' case? If
the rule is we have a reference to the map through the file fdget()? And
any concurrent runners need some locking, xchg, to handle the update a
rcu_read_lock() wont help there.

I didn't audit all the update flows tonight though.

> --
> 2.29.2
* Re: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-14  6:22 ` John Fastabend
@ 2023-12-14  7:31   ` Hou Tao
  2023-12-14 13:55     ` Alexei Starovoitov
  0 siblings, 1 reply; 11+ messages in thread
From: Hou Tao @ 2023-12-14  7:31 UTC (permalink / raw)
To: John Fastabend, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
    Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
    Stanislav Fomichev, Jiri Olsa, xingwei lee, houtao1

Hi,

On 12/14/2023 2:22 PM, John Fastabend wrote:
> Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
>> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
>> callbacks.
>>
>> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
>> rcu-read-lock because array->ptrs must still be allocated. For
>> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
>> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
>> rcu_read_lock() during the invocation of htab_map_update_elem().
>>
>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>  kernel/bpf/hashtab.c | 6 ++++++
>>  kernel/bpf/syscall.c | 4 ----
>>  2 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 5b9146fa825f..ec3bdcc6a3cf 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
>>  	if (IS_ERR(ptr))
>>  		return PTR_ERR(ptr);
>>
>> +	/* The htab bucket lock is always held during update operations in fd
>> +	 * htab map, and the following rcu_read_lock() is only used to avoid
>> +	 * the WARN_ON_ONCE in htab_map_update_elem().
>> +	 */
>> +	rcu_read_lock();
>>  	ret = htab_map_update_elem(map, key, &ptr, map_flags);
>> +	rcu_read_unlock();
> Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()? It
> looks like there are two ways to get to htab_map_update_elem() either
> through a syscall and the path here (bpf_fd_htab_map_update_elem) or
> through a BPF program calling, bpf_update_elem()? In the BPF_CALL
> case bpf_map_update_elem() already has,
>
> WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held())
>
> The htab_map_update_elem() has an additional check for
> rcu_read_lock_trace_held(), but not sure where this is coming from
> at the moment. Can that be added to the BPF caller side if needed?
>
> Did I miss some caller path?

No. But I think the main reason for the extra WARN in
bpf_map_update_elem() is that bpf_map_update_elem() may be inlined by
verifier in do_misc_fixups(), so the WARN_ON_ONCE in
bpf_map_update_elem() will not be invoked ever. For
rcu_read_lock_trace_held(), I have added the assertion in
bpf_map_delete_elem() recently in commit 169410eba271 ("bpf: Check
rcu_read_lock_trace_held() before calling bpf map helpers").

>>  	if (ret)
>>  		map->ops->map_fd_put_ptr(map, ptr, false);
>>
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index d63c1ed42412..3fcf7741146a 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -184,15 +184,11 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
>>  		err = bpf_percpu_cgroup_storage_update(map, key, value,
>>  						       flags);
>>  	} else if (IS_FD_ARRAY(map)) {
>> -		rcu_read_lock();
>>  		err = bpf_fd_array_map_update_elem(map, map_file, key, value,
>>  						   flags);
>> -		rcu_read_unlock();
>>  	} else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
>> -		rcu_read_lock();
>>  		err = bpf_fd_htab_map_update_elem(map, map_file, key, value,
>>  						  flags);
>> -		rcu_read_unlock();
>>  	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
>>  		/* rcu_read_lock() is not needed */
>>  		err = bpf_fd_reuseport_array_update_elem(map, key, value,
> Any reason to leave the last rcu_read_lock() on the 'else{}' case? If
> the rule is we have a reference to the map through the file fdget()? And
> any concurrent runners need some locking, xchg, to handle the update a
> rcu_read_lock() wont help there.
>
> I didn't audit all the update flows tonight though.

It seems it is still necessary for htab and local storage. For normal
htab, it is possible the update is done without taking the bucket lock
(in-place replace), so RCU CS is needed to guarantee the iteration is
still safe. And for local storage (e.g. cgrp local storage) it may also
do in-place update through lookup and then update. We could fold the
calling of rcu_read_lock() into .map_update_elem() if it is necessary.

>> --
>> 2.29.2
* Re: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-14  7:31 ` Hou Tao
@ 2023-12-14 13:55   ` Alexei Starovoitov
  2023-12-14 19:15     ` John Fastabend
  0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2023-12-14 13:55 UTC (permalink / raw)
To: Hou Tao
Cc: John Fastabend, bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu,
    Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
    Stanislav Fomichev, Jiri Olsa, xingwei lee, Hou Tao

On Wed, Dec 13, 2023 at 11:31 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 12/14/2023 2:22 PM, John Fastabend wrote:
> > Hou Tao wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
> >> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
> >> callbacks.
> >>
> >> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
> >> rcu-read-lock because array->ptrs must still be allocated. For
> >> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
> >> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
> >> rcu_read_lock() during the invocation of htab_map_update_elem().
> >>
> >> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> >> Signed-off-by: Hou Tao <houtao1@huawei.com>
> >> ---
> >>  kernel/bpf/hashtab.c | 6 ++++++
> >>  kernel/bpf/syscall.c | 4 ----
> >>  2 files changed, 6 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> >> index 5b9146fa825f..ec3bdcc6a3cf 100644
> >> --- a/kernel/bpf/hashtab.c
> >> +++ b/kernel/bpf/hashtab.c
> >> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
> >>  	if (IS_ERR(ptr))
> >>  		return PTR_ERR(ptr);
> >>
> >> +	/* The htab bucket lock is always held during update operations in fd
> >> +	 * htab map, and the following rcu_read_lock() is only used to avoid
> >> +	 * the WARN_ON_ONCE in htab_map_update_elem().
> >> +	 */
> >> +	rcu_read_lock();
> >>  	ret = htab_map_update_elem(map, key, &ptr, map_flags);
> >> +	rcu_read_unlock();
> > Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()? It
> > looks like there are two ways to get to htab_map_update_elem() either
> > through a syscall and the path here (bpf_fd_htab_map_update_elem) or
> > through a BPF program calling, bpf_update_elem()? In the BPF_CALL
> > case bpf_map_update_elem() already has,
> >
> > WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held())
> >
> > The htab_map_update_elem() has an additional check for
> > rcu_read_lock_trace_held(), but not sure where this is coming from
> > at the moment. Can that be added to the BPF caller side if needed?
> >
> > Did I miss some caller path?
>
> No. But I think the main reason for the extra WARN in
> bpf_map_update_elem() is that bpf_map_update_elem() may be inlined by
> verifier in do_misc_fixups(), so the WARN_ON_ONCE in
> bpf_map_update_elem() will not be invoked ever. For
> rcu_read_lock_trace_held(), I have added the assertion in
> bpf_map_delete_elem() recently in commit 169410eba271 ("bpf: Check
> rcu_read_lock_trace_held() before calling bpf map helpers").

Yep.
We should probably remove WARN_ONs from
bpf_map_update_elem() and others in kernel/bpf/helpers.c
since they are inlined by the verifier with 99% probability
and the WARNs are never called even in DEBUG kernels.
And confusing developers. As this thread shows.

We can replace them with a comment that explains this inlining logic
and where the real WARNs are.
* Re: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-14 13:55 ` Alexei Starovoitov
@ 2023-12-14 19:15   ` John Fastabend
  2023-12-15  3:23     ` Alexei Starovoitov
  2023-12-15  8:18     ` Hou Tao
  1 sibling, 2 replies; 11+ messages in thread
From: John Fastabend @ 2023-12-14 19:15 UTC (permalink / raw)
To: Alexei Starovoitov, Hou Tao
Cc: John Fastabend, bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu,
    Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
    Stanislav Fomichev, Jiri Olsa, xingwei lee, Hou Tao

Alexei Starovoitov wrote:
> On Wed, Dec 13, 2023 at 11:31 PM Hou Tao <houtao@huaweicloud.com> wrote:
> >
> > Hi,
> >
> > On 12/14/2023 2:22 PM, John Fastabend wrote:
> > > Hou Tao wrote:
> > >> From: Hou Tao <houtao1@huawei.com>
> > >>
> > >> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
> > >> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
> > >> callbacks.
> > >>
> > >> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
> > >> rcu-read-lock because array->ptrs must still be allocated. For
> > >> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
> > >> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
> > >> rcu_read_lock() during the invocation of htab_map_update_elem().
> > >>
> > >> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> > >> Signed-off-by: Hou Tao <houtao1@huawei.com>
> > >> ---
> > >>  kernel/bpf/hashtab.c | 6 ++++++
> > >>  kernel/bpf/syscall.c | 4 ----
> > >>  2 files changed, 6 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > >> index 5b9146fa825f..ec3bdcc6a3cf 100644
> > >> --- a/kernel/bpf/hashtab.c
> > >> +++ b/kernel/bpf/hashtab.c
> > >> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
> > >>  	if (IS_ERR(ptr))
> > >>  		return PTR_ERR(ptr);
> > >>
> > >> +	/* The htab bucket lock is always held during update operations in fd
> > >> +	 * htab map, and the following rcu_read_lock() is only used to avoid
> > >> +	 * the WARN_ON_ONCE in htab_map_update_elem().
> > >> +	 */

Ah ok but isn't this comment wrong because you do need rcu read lock to do
the walk with lookup_nulls_elem_raw where there is no lock being held? And
then the subsequent copy in place is fine because you do have a lock.

So its not just to appease the WARN_ON_ONCE here it has an actual real
need?

> > >> +	rcu_read_lock();
> > >>  	ret = htab_map_update_elem(map, key, &ptr, map_flags);
> > >> +	rcu_read_unlock();
> > > Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()? It
> > > looks like there are two ways to get to htab_map_update_elem() either
> > > through a syscall and the path here (bpf_fd_htab_map_update_elem) or
> > > through a BPF program calling, bpf_update_elem()? In the BPF_CALL
> > > case bpf_map_update_elem() already has,
> > >
> > > WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held())
> > >
> > > The htab_map_update_elem() has an additional check for
> > > rcu_read_lock_trace_held(), but not sure where this is coming from
> > > at the moment. Can that be added to the BPF caller side if needed?
> > >
> > > Did I miss some caller path?
> >
> > No. But I think the main reason for the extra WARN in
> > bpf_map_update_elem() is that bpf_map_update_elem() may be inlined by
> > verifier in do_misc_fixups(), so the WARN_ON_ONCE in
> > bpf_map_update_elem() will not be invoked ever. For
> > rcu_read_lock_trace_held(), I have added the assertion in
> > bpf_map_delete_elem() recently in commit 169410eba271 ("bpf: Check
> > rcu_read_lock_trace_held() before calling bpf map helpers").
>
> Yep.
> We should probably remove WARN_ONs from
> bpf_map_update_elem() and others in kernel/bpf/helpers.c
> since they are inlined by the verifier with 99% probability
> and the WARNs are never called even in DEBUG kernels.
> And confusing developers. As this thread shows.

Agree. The rcu_read needs to be close as possible to where its actually
needed and the WARN_ON_ONCE should be dropped if its going to be
inlined.

>
> We can replace them with a comment that explains this inlining logic
> and where the real WARNs are.
* Re: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-14 19:15 ` John Fastabend
@ 2023-12-15  3:23   ` Alexei Starovoitov
  2023-12-15  3:39     ` Hou Tao
  0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2023-12-15  3:23 UTC (permalink / raw)
To: John Fastabend
Cc: Hou Tao, bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu,
    Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
    Stanislav Fomichev, Jiri Olsa, xingwei lee, Hou Tao

On Thu, Dec 14, 2023 at 11:15 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> Alexei Starovoitov wrote:
> > On Wed, Dec 13, 2023 at 11:31 PM Hou Tao <houtao@huaweicloud.com> wrote:
> > >
> > > Hi,
> > >
> > > On 12/14/2023 2:22 PM, John Fastabend wrote:
> > > > Hou Tao wrote:
> > > >> From: Hou Tao <houtao1@huawei.com>
> > > >>
> > > >> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
> > > >> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
> > > >> callbacks.
> > > >>
> > > >> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
> > > >> rcu-read-lock because array->ptrs must still be allocated. For
> > > >> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
> > > >> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
> > > >> rcu_read_lock() during the invocation of htab_map_update_elem().
> > > >>
> > > >> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> > > >> Signed-off-by: Hou Tao <houtao1@huawei.com>
> > > >> ---
> > > >>  kernel/bpf/hashtab.c | 6 ++++++
> > > >>  kernel/bpf/syscall.c | 4 ----
> > > >>  2 files changed, 6 insertions(+), 4 deletions(-)
> > > >>
> > > >> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > > >> index 5b9146fa825f..ec3bdcc6a3cf 100644
> > > >> --- a/kernel/bpf/hashtab.c
> > > >> +++ b/kernel/bpf/hashtab.c
> > > >> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
> > > >>  	if (IS_ERR(ptr))
> > > >>  		return PTR_ERR(ptr);
> > > >>
> > > >> +	/* The htab bucket lock is always held during update operations in fd
> > > >> +	 * htab map, and the following rcu_read_lock() is only used to avoid
> > > >> +	 * the WARN_ON_ONCE in htab_map_update_elem().
> > > >> +	 */
>
> Ah ok but isn't this comment wrong because you do need rcu read lock to do
> the walk with lookup_nulls_elem_raw where there is no lock being held? And
> then the subsequent copy in place is fine because you do have a lock.

Ohh. You're correct.
Not sure what I was thinking.

Hou,
could you please send a follow up to undo my braino.
* Re: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-15  3:23 ` Alexei Starovoitov
@ 2023-12-15  3:39   ` Hou Tao
  0 siblings, 0 replies; 11+ messages in thread
From: Hou Tao @ 2023-12-15  3:39 UTC (permalink / raw)
To: Alexei Starovoitov, John Fastabend
Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
    Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev,
    Jiri Olsa, xingwei lee, Hou Tao

Hi,

On 12/15/2023 11:23 AM, Alexei Starovoitov wrote:
> On Thu, Dec 14, 2023 at 11:15 AM John Fastabend
> <john.fastabend@gmail.com> wrote:
>> Alexei Starovoitov wrote:
>>> On Wed, Dec 13, 2023 at 11:31 PM Hou Tao <houtao@huaweicloud.com> wrote:
>>>> Hi,
>>>>
>>>> On 12/14/2023 2:22 PM, John Fastabend wrote:
>>>>> Hou Tao wrote:
>>>>>> From: Hou Tao <houtao1@huawei.com>
>>>>>>
>>>>>> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
>>>>>> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
>>>>>> callbacks.
>>>>>>
>>>>>> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
>>>>>> rcu-read-lock because array->ptrs must still be allocated. For
>>>>>> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
>>>>>> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
>>>>>> rcu_read_lock() during the invocation of htab_map_update_elem().
>>>>>>
>>>>>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
>>>>>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>>>>>> ---
>>>>>>  kernel/bpf/hashtab.c | 6 ++++++
>>>>>>  kernel/bpf/syscall.c | 4 ----
>>>>>>  2 files changed, 6 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>>>>> index 5b9146fa825f..ec3bdcc6a3cf 100644
>>>>>> --- a/kernel/bpf/hashtab.c
>>>>>> +++ b/kernel/bpf/hashtab.c
>>>>>> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
>>>>>>  	if (IS_ERR(ptr))
>>>>>>  		return PTR_ERR(ptr);
>>>>>>
>>>>>> +	/* The htab bucket lock is always held during update operations in fd
>>>>>> +	 * htab map, and the following rcu_read_lock() is only used to avoid
>>>>>> +	 * the WARN_ON_ONCE in htab_map_update_elem().
>>>>>> +	 */
>> Ah ok but isn't this comment wrong because you do need rcu read lock to do
>> the walk with lookup_nulls_elem_raw where there is no lock being held? And
>> then the subsequent copy in place is fine because you do have a lock.
> Ohh. You're correct.
> Not sure what I was thinking.
>
> Hou,
> could you please send a follow up to undo my braino.

Er, I didn't follow. There is no spin-lock support in fd htab map, so
htab_map_update_elem() won't call lookup_nulls_elem_raw(), instead it
will lock the bucket and invoke lookup_elem_raw(), so I don't think
rcu_read_lock() is indeed needed for the invocation of
htab_map_update_elem(), except to make WARN_ON_ONCE() happy.
* Re: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map 2023-12-14 19:15 ` John Fastabend 2023-12-15 3:23 ` Alexei Starovoitov @ 2023-12-15 8:18 ` Hou Tao 1 sibling, 0 replies; 11+ messages in thread From: Hou Tao @ 2023-12-15 8:18 UTC (permalink / raw) To: John Fastabend, Alexei Starovoitov Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa, xingwei lee, Hou Tao Hi, On 12/15/2023 3:15 AM, John Fastabend wrote: > Alexei Starovoitov wrote: >> On Wed, Dec 13, 2023 at 11:31 PM Hou Tao <houtao@huaweicloud.com> wrote: >>> Hi, >>> >>> On 12/14/2023 2:22 PM, John Fastabend wrote: >>>> Hou Tao wrote: >>>>> From: Hou Tao <houtao1@huawei.com> >>>>> >>>>> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or >>>>> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two >>>>> callbacks. >>>>> >>>>> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need >>>>> rcu-read-lock because array->ptrs must still be allocated. For >>>>> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires >>>>> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use >>>>> rcu_read_lock() during the invocation of htab_map_update_elem(). 
>>>>> >>>>> Acked-by: Yonghong Song <yonghong.song@linux.dev> >>>>> Signed-off-by: Hou Tao <houtao1@huawei.com> >>>>> --- >>>>> kernel/bpf/hashtab.c | 6 ++++++ >>>>> kernel/bpf/syscall.c | 4 ---- >>>>> 2 files changed, 6 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c >>>>> index 5b9146fa825f..ec3bdcc6a3cf 100644 >>>>> --- a/kernel/bpf/hashtab.c >>>>> +++ b/kernel/bpf/hashtab.c >>>>> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file, >>>>> if (IS_ERR(ptr)) >>>>> return PTR_ERR(ptr); >>>>> >>>>> + /* The htab bucket lock is always held during update operations in fd >>>>> + * htab map, and the following rcu_read_lock() is only used to avoid >>>>> + * the WARN_ON_ONCE in htab_map_update_elem(). >>>>> + */ > Ah ok but isn't this comment wrong because you do need rcu read lock to do > the walk with lookup_nulls_elem_raw where there is no lock being held? And > then the subsequent copy in place is fine because you do have a lock. > > So its not just to appease the WARN_ON_ONCE here it has an actual real > need? > >>>>> + rcu_read_lock(); >>>>> ret = htab_map_update_elem(map, key, &ptr, map_flags); >>>>> + rcu_read_unlock(); >>>> Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()? It >>>> looks like there are two ways to get to htab_map_update_elem() either >>>> through a syscall and the path here (bpf_fd_htab_map_update_elem) or >>>> through a BPF program calling, bpf_update_elem()? In the BPF_CALL >>>> case bpf_map_update_elem() already has, >>>> >>>> WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held()) >>>> >>>> The htab_map_update_elem() has an additional check for >>>> rcu_read_lock_trace_held(), but not sure where this is coming from >>>> at the moment. Can that be added to the BPF caller side if needed? >>>> >>>> Did I miss some caller path? >>> No. 
But I think the main reason for the extra WARN in >>> bpf_map_update_elem() is that bpf_map_update_elem() may be inlined by >>> verifier in do_misc_fixups(), so the WARN_ON_ONCE in >>> bpf_map_update_elem() will not be invoked ever. For >>> rcu_read_lock_trace_held(), I have added the assertion in >>> bpf_map_delete_elem() recently in commit 169410eba271 ("bpf: Check >>> rcu_read_lock_trace_held() before calling bpf map helpers"). >> Yep. >> We should probably remove WARN_ONs from >> bpf_map_update_elem() and others in kernel/bpf/helpers.c >> since they are inlined by the verifier with 99% probability >> and the WARNs are never called even in DEBUG kernels. >> And confusing developers. As this thread shows. > Agree. The rcu_read needs to be close as possible to where its actually > needed and the WARN_ON_ONCE should be dropped if its going to be > inlined. I did some investigation on these bpf map helpers and the implementations of these helpers in various kinds of bpf map. It seems most implementations (besides dev_map_hash_ops) already have added proper RCU lock assertions, so I think it is indeed OK to remove WARN_ON_ONCE() on these bpf map helpers after fixing the assertion in dev_map_hash_ops. The following is the details: 1. 
bpf_map_lookup_elem helper (a) hash/lru_hash/percpu_hash/lru_percpu_hash with !rcu_read_lock_held() && !rcu_read_lock_trace_held() && !rcu_read_lock_bh_held() in __htab_map_lookup_elem() (b) array/percpu_array no deletion, so no RCU (c) lpm_trie with rcu_read_lock_bh_held() in trie_lookup_elem() (d) htab_of_maps with !rcu_read_lock_held() && !rcu_read_lock_trace_held() && !rcu_read_lock_bh_held() in __htab_map_lookup_elem() (e) array_of_maps no deletion, so no RCU (f) sockmap rcu_read_lock_held() in __sock_map_lookup_elem() (g) sockhash rcu_read_lock_held() in__sock_hash_lookup_elem() (h) devmap rcu_read_lock_bh_held() in __dev_map_lookup_elem() (i) devmap_hash (incorrect assertion) No rcu_read_lock_bh_held() in __dev_map_hash_lookup_elem() (j) xskmap rcu_read_lock_bh_held() in __xsk_map_lookup_elem() 2. bpf_map_update_elem helper (a) hash/lru_hash/percpu_hash/lru_percpu_hash with !rcu_read_lock_held() && !rcu_read_lock_trace_held() && !rcu_read_lock_bh_held() in htab_map_update_elem()/htab_lru_map_update_elem()/__htab_percpu_map_update_elem()/__htab_lru_percpu_map_update_elem() (b) array/percpu_array no RCU (c) lpm_trie use spin-lock, and no RCU (d) sockmap use spin-lock & with rcu_read_lock_held() in sock_map_update_common() (e) sockhash use spin-lock & with rcu_read_lock_held() in sock_hash_update_common() 3.bpf_map_delete_elem helper (a) hash/lru_hash/percpu_hash/lru_percpu_hash with !rcu_read_lock_held() && !rcu_read_lock_trace_held() && !rcu_read_lock_bh_held() () in htab_map_delete_elem/htab_lru_map_delete_elem (b) array/percpu_array no support (c) lpm_trie use spin-lock, no rcu (d) sockmap use spin-lock (e) sockhash use spin-lock 4. bpf_map_lookup_percpu_elem (a) percpu_hash/lru_percpu_hash with !rcu_read_lock_held() && !rcu_read_lock_trace_held() && !rcu_read_lock_bh_held() in __htab_map_lookup_elem() (b) percpu_array no deletion, no RCU >> We can replace them with a comment that explains this inlining logic >> and where the real WARNs are.. 
* [PATCH bpf-next v3 2/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen()
  2023-12-14  4:30 [PATCH bpf-next v3 0/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen() Hou Tao
  2023-12-14  4:30 ` [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map Hou Tao
@ 2023-12-14  4:30 ` Hou Tao
  2023-12-14  5:10 ` [PATCH bpf-next v3 0/2] " patchwork-bot+netdevbpf
  2 siblings, 0 replies; 11+ messages in thread
From: Hou Tao @ 2023-12-14  4:30 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, xingwei lee,
	houtao1

From: Hou Tao <houtao1@huawei.com>

rcu_read_lock() is no longer held when invoking bpf_event_entry_gen(),
which is called by perf_event_fd_array_get_ptr(), so use GFP_KERNEL
instead of GFP_ATOMIC to reduce the possibility of failures due to
out-of-memory.

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/arraymap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 8d365bda9a8b..b5ec24b3563e 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -1195,7 +1195,7 @@ static struct bpf_event_entry *bpf_event_entry_gen(struct file *perf_file,
 {
 	struct bpf_event_entry *ee;

-	ee = kzalloc(sizeof(*ee), GFP_ATOMIC);
+	ee = kzalloc(sizeof(*ee), GFP_KERNEL);
 	if (ee) {
 		ee->event = perf_file->private_data;
 		ee->perf_file = perf_file;
--
2.29.2
* Re: [PATCH bpf-next v3 0/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen()
  2023-12-14  4:30 [PATCH bpf-next v3 0/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen() Hou Tao
  2023-12-14  4:30 ` [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map Hou Tao
  2023-12-14  4:30 ` [PATCH bpf-next v3 2/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen() Hou Tao
@ 2023-12-14  5:10 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-12-14  5:10 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, martin.lau, alexei.starovoitov, andrii, song, haoluo,
	yonghong.song, daniel, kpsingh, sdf, jolsa, john.fastabend,
	xrivendell7, houtao1

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Thu, 14 Dec 2023 12:30:08 +0800 you wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> Hi,
>
> The simple patch set aims to replace GFP_ATOMIC by GFP_KERNEL in
> bpf_event_entry_gen(). These two patches in the patch set were
> preparatory patches in "Fix the release of inner map" patchset [1] and
> are not needed for v2, so re-post it to bpf-next tree.
>
> [...]

Here is the summary with links:
  - [bpf-next,v3,1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
    https://git.kernel.org/bpf/bpf-next/c/8f82583f9527
  - [bpf-next,v3,2/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen()
    https://git.kernel.org/bpf/bpf-next/c/dc68540913ac

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
end of thread, other threads: [~2023-12-15  8:18 UTC | newest]

Thread overview: 11+ messages -- links below jump to the message on this page:

2023-12-14  4:30 [PATCH bpf-next v3 0/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen() Hou Tao
2023-12-14  4:30 ` [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map Hou Tao
2023-12-14  6:22   ` John Fastabend
2023-12-14  7:31     ` Hou Tao
2023-12-14 13:55       ` Alexei Starovoitov
2023-12-14 19:15         ` John Fastabend
2023-12-15  3:23           ` Alexei Starovoitov
2023-12-15  3:39             ` Hou Tao
2023-12-15  8:18               ` Hou Tao
2023-12-14  4:30 ` [PATCH bpf-next v3 2/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen() Hou Tao
2023-12-14  5:10 ` [PATCH bpf-next v3 0/2] " patchwork-bot+netdevbpf