BPF List
* [PATCH bpf-next v3 0/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen()
@ 2023-12-14  4:30 Hou Tao
  2023-12-14  4:30 ` [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map Hou Tao
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Hou Tao @ 2023-12-14  4:30 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, xingwei lee,
	houtao1

From: Hou Tao <houtao1@huawei.com>

Hi,

This simple patch set replaces GFP_ATOMIC with GFP_KERNEL in
bpf_event_entry_gen(). The two patches were preparatory patches in the
"Fix the release of inner map" patch set [1] and are no longer needed
for its v2, so they are re-posted to the bpf-next tree.

Patch #1 reduces the scope of rcu_read_lock() when updating fd maps and
patch #2 replaces GFP_ATOMIC with GFP_KERNEL. Please see the individual
patches for more details.

Comments are always welcome.

Change Log:
v3:
 * patch #1: fall back to the version of patch #1 from v1. Update the
             comment in bpf_fd_htab_map_update_elem() to explain the
	     reason for using rcu_read_lock() (Alexei)

v2: https://lore.kernel.org/bpf/20231211073843.1888058-1-houtao@huaweicloud.com/
 * patch #1: add rcu_read_lock/unlock() for bpf_fd_array_map_update_elem
             as well to make it consistent with
	     bpf_fd_htab_map_update_elem and update commit message
             accordingly (Alexei)
 * patch #1/#2: collect Acked-by tags from Yonghong

v1: https://lore.kernel.org/bpf/20231208103357.2637299-1-houtao@huaweicloud.com/

[1]: https://lore.kernel.org/bpf/20231107140702.1891778-1-houtao@huaweicloud.com/


Hou Tao (2):
  bpf: Reduce the scope of rcu_read_lock when updating fd map
  bpf: Use GFP_KERNEL in bpf_event_entry_gen()

 kernel/bpf/arraymap.c | 2 +-
 kernel/bpf/hashtab.c  | 6 ++++++
 kernel/bpf/syscall.c  | 4 ----
 3 files changed, 7 insertions(+), 5 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-14  4:30 [PATCH bpf-next v3 0/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen() Hou Tao
@ 2023-12-14  4:30 ` Hou Tao
  2023-12-14  6:22   ` John Fastabend
  2023-12-14  4:30 ` [PATCH bpf-next v3 2/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen() Hou Tao
  2023-12-14  5:10 ` [PATCH bpf-next v3 0/2] " patchwork-bot+netdevbpf
  2 siblings, 1 reply; 11+ messages in thread
From: Hou Tao @ 2023-12-14  4:30 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, xingwei lee,
	houtao1

From: Hou Tao <houtao1@huawei.com>

There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
ops->map_fd_put_ptr(), so don't use rcu_read_lock() for these two
callbacks.

For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
rcu_read_lock() because array->ptrs must still be allocated. For
bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
rcu_read_lock() to be held to avoid the WARN_ON_ONCE(), so hold
rcu_read_lock() only around the invocation of htab_map_update_elem().

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/hashtab.c | 6 ++++++
 kernel/bpf/syscall.c | 4 ----
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 5b9146fa825f..ec3bdcc6a3cf 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
 	if (IS_ERR(ptr))
 		return PTR_ERR(ptr);
 
+	/* The htab bucket lock is always held during update operations in fd
+	 * htab map, and the following rcu_read_lock() is only used to avoid
+	 * the WARN_ON_ONCE in htab_map_update_elem().
+	 */
+	rcu_read_lock();
 	ret = htab_map_update_elem(map, key, &ptr, map_flags);
+	rcu_read_unlock();
 	if (ret)
 		map->ops->map_fd_put_ptr(map, ptr, false);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d63c1ed42412..3fcf7741146a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -184,15 +184,11 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
 		err = bpf_percpu_cgroup_storage_update(map, key, value,
 						       flags);
 	} else if (IS_FD_ARRAY(map)) {
-		rcu_read_lock();
 		err = bpf_fd_array_map_update_elem(map, map_file, key, value,
 						   flags);
-		rcu_read_unlock();
 	} else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
-		rcu_read_lock();
 		err = bpf_fd_htab_map_update_elem(map, map_file, key, value,
 						  flags);
-		rcu_read_unlock();
 	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
 		/* rcu_read_lock() is not needed */
 		err = bpf_fd_reuseport_array_update_elem(map, key, value,
-- 
2.29.2



* [PATCH bpf-next v3 2/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen()
  2023-12-14  4:30 [PATCH bpf-next v3 0/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen() Hou Tao
  2023-12-14  4:30 ` [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map Hou Tao
@ 2023-12-14  4:30 ` Hou Tao
  2023-12-14  5:10 ` [PATCH bpf-next v3 0/2] " patchwork-bot+netdevbpf
  2 siblings, 0 replies; 11+ messages in thread
From: Hou Tao @ 2023-12-14  4:30 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, xingwei lee,
	houtao1

From: Hou Tao <houtao1@huawei.com>

rcu_read_lock() is no longer held when invoking bpf_event_entry_gen(),
which is called by perf_event_fd_array_get_ptr(), so use GFP_KERNEL
instead of GFP_ATOMIC to reduce the possibility of allocation failure
under memory pressure.

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/arraymap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 8d365bda9a8b..b5ec24b3563e 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -1195,7 +1195,7 @@ static struct bpf_event_entry *bpf_event_entry_gen(struct file *perf_file,
 {
 	struct bpf_event_entry *ee;
 
-	ee = kzalloc(sizeof(*ee), GFP_ATOMIC);
+	ee = kzalloc(sizeof(*ee), GFP_KERNEL);
 	if (ee) {
 		ee->event = perf_file->private_data;
 		ee->perf_file = perf_file;
-- 
2.29.2



* Re: [PATCH bpf-next v3 0/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen()
  2023-12-14  4:30 [PATCH bpf-next v3 0/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen() Hou Tao
  2023-12-14  4:30 ` [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map Hou Tao
  2023-12-14  4:30 ` [PATCH bpf-next v3 2/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen() Hou Tao
@ 2023-12-14  5:10 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-12-14  5:10 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, martin.lau, alexei.starovoitov, andrii, song, haoluo,
	yonghong.song, daniel, kpsingh, sdf, jolsa, john.fastabend,
	xrivendell7, houtao1

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Thu, 14 Dec 2023 12:30:08 +0800 you wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Hi,
> 
> The simple patch set aims to replace GFP_ATOMIC by GFP_KERNEL in
> bpf_event_entry_gen(). These two patches in the patch set were
> preparatory patches in "Fix the release of inner map" patchset [1] and
> are not needed for v2, so re-post it to bpf-next tree.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v3,1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
    https://git.kernel.org/bpf/bpf-next/c/8f82583f9527
  - [bpf-next,v3,2/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen()
    https://git.kernel.org/bpf/bpf-next/c/dc68540913ac

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* RE: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-14  4:30 ` [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map Hou Tao
@ 2023-12-14  6:22   ` John Fastabend
  2023-12-14  7:31     ` Hou Tao
  0 siblings, 1 reply; 11+ messages in thread
From: John Fastabend @ 2023-12-14  6:22 UTC (permalink / raw)
  To: Hou Tao, bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, xingwei lee,
	houtao1

Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
> callbacks.
> 
> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
> rcu-read-lock because array->ptrs must still be allocated. For
> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
> rcu_read_lock() during the invocation of htab_map_update_elem().
> 
> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/hashtab.c | 6 ++++++
>  kernel/bpf/syscall.c | 4 ----
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 5b9146fa825f..ec3bdcc6a3cf 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
>  	if (IS_ERR(ptr))
>  		return PTR_ERR(ptr);
>  
> +	/* The htab bucket lock is always held during update operations in fd
> +	 * htab map, and the following rcu_read_lock() is only used to avoid
> +	 * the WARN_ON_ONCE in htab_map_update_elem().
> +	 */
> +	rcu_read_lock();
>  	ret = htab_map_update_elem(map, key, &ptr, map_flags);
> +	rcu_read_unlock();

Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()? It
looks like there are two ways to get to htab_map_update_elem(): either
through a syscall via the path here (bpf_fd_htab_map_update_elem) or
through a BPF program calling bpf_map_update_elem(). In the BPF_CALL
case bpf_map_update_elem() already has,

   WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held())

htab_map_update_elem() has an additional check for
rcu_read_lock_trace_held(), but I'm not sure where that is coming from
at the moment. Can that be added on the BPF caller side if needed?

Did I miss some caller path?

 

>  	if (ret)
>  		map->ops->map_fd_put_ptr(map, ptr, false);
>  
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index d63c1ed42412..3fcf7741146a 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -184,15 +184,11 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
>  		err = bpf_percpu_cgroup_storage_update(map, key, value,
>  						       flags);
>  	} else if (IS_FD_ARRAY(map)) {
> -		rcu_read_lock();
>  		err = bpf_fd_array_map_update_elem(map, map_file, key, value,
>  						   flags);
> -		rcu_read_unlock();
>  	} else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
> -		rcu_read_lock();
>  		err = bpf_fd_htab_map_update_elem(map, map_file, key, value,
>  						  flags);
> -		rcu_read_unlock();
>  	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
>  		/* rcu_read_lock() is not needed */
>  		err = bpf_fd_reuseport_array_update_elem(map, key, value,

Any reason to keep the remaining rcu_read_lock() in the final else {}
case? If the rule is that we hold a reference to the map through
fdget(), then any concurrent updaters need some locking (e.g. xchg) to
handle the update; an rcu_read_lock() won't help there.

I didn't audit all the update flows tonight, though.


> -- 
> 2.29.2
> 
> 




* Re: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-14  6:22   ` John Fastabend
@ 2023-12-14  7:31     ` Hou Tao
  2023-12-14 13:55       ` Alexei Starovoitov
  0 siblings, 1 reply; 11+ messages in thread
From: Hou Tao @ 2023-12-14  7:31 UTC (permalink / raw)
  To: John Fastabend, bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, xingwei lee, houtao1

Hi,

On 12/14/2023 2:22 PM, John Fastabend wrote:
> Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
>> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
>> callbacks.
>>
>> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
>> rcu-read-lock because array->ptrs must still be allocated. For
>> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
>> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
>> rcu_read_lock() during the invocation of htab_map_update_elem().
>>
>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>  kernel/bpf/hashtab.c | 6 ++++++
>>  kernel/bpf/syscall.c | 4 ----
>>  2 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 5b9146fa825f..ec3bdcc6a3cf 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
>>  	if (IS_ERR(ptr))
>>  		return PTR_ERR(ptr);
>>  
>> +	/* The htab bucket lock is always held during update operations in fd
>> +	 * htab map, and the following rcu_read_lock() is only used to avoid
>> +	 * the WARN_ON_ONCE in htab_map_update_elem().
>> +	 */
>> +	rcu_read_lock();
>>  	ret = htab_map_update_elem(map, key, &ptr, map_flags);
>> +	rcu_read_unlock();
> Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()? It
> looks like there are two ways to get to htab_map_update_elem() either
> through a syscall and the path here (bpf_fd_htab_map_update_elem) or
> through a BPF program calling, bpf_update_elem()? In the BPF_CALL
> case bpf_map_update_elem() already has,
>
>    WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held())
>
> The htab_map_update_elem() has an additional check for
> rcu_read_lock_trace_held(), but not sure where this is coming from
> at the moment. Can that be added to the BPF caller side if needed?
>
> Did I miss some caller path?

No. But I think the main reason for the extra WARN is that
bpf_map_update_elem() may be inlined by the verifier in
do_misc_fixups(), so the WARN_ON_ONCE in bpf_map_update_elem() may
never be invoked at all. As for rcu_read_lock_trace_held(), I added
that assertion in bpf_map_delete_elem() recently in commit
169410eba271 ("bpf: Check rcu_read_lock_trace_held() before calling
bpf map helpers").
>  
>
>>  	if (ret)
>>  		map->ops->map_fd_put_ptr(map, ptr, false);
>>  
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index d63c1ed42412..3fcf7741146a 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -184,15 +184,11 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
>>  		err = bpf_percpu_cgroup_storage_update(map, key, value,
>>  						       flags);
>>  	} else if (IS_FD_ARRAY(map)) {
>> -		rcu_read_lock();
>>  		err = bpf_fd_array_map_update_elem(map, map_file, key, value,
>>  						   flags);
>> -		rcu_read_unlock();
>>  	} else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
>> -		rcu_read_lock();
>>  		err = bpf_fd_htab_map_update_elem(map, map_file, key, value,
>>  						  flags);
>> -		rcu_read_unlock();
>>  	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
>>  		/* rcu_read_lock() is not needed */
>>  		err = bpf_fd_reuseport_array_update_elem(map, key, value,
> Any reason to leave the last rcu_read_lock() on the 'else{}' case? If
> the rule is we have a reference to the map through the file fdget()? And
> any concurrent runners need some locking, xchg, to handle the update a
> rcu_read_lock() wont help there.
>
> I didn't audit all the update flows tonight though.

It seems it is still necessary for htab and local storage maps. For a
normal htab, the update may be done without taking the bucket lock
(in-place replace), so an RCU critical section is needed to guarantee
the iteration over the bucket is safe. And local storage (e.g. cgrp
local storage) may also do an in-place update through lookup followed
by update. We could fold the rcu_read_lock() call into
.map_update_elem() if necessary.
>
>
>> -- 
>> 2.29.2
>>
>>



* Re: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-14  7:31     ` Hou Tao
@ 2023-12-14 13:55       ` Alexei Starovoitov
  2023-12-14 19:15         ` John Fastabend
  0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2023-12-14 13:55 UTC (permalink / raw)
  To: Hou Tao
  Cc: John Fastabend, bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, xingwei lee, Hou Tao

On Wed, Dec 13, 2023 at 11:31 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 12/14/2023 2:22 PM, John Fastabend wrote:
> > Hou Tao wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
> >> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
> >> callbacks.
> >>
> >> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
> >> rcu-read-lock because array->ptrs must still be allocated. For
> >> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
> >> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
> >> rcu_read_lock() during the invocation of htab_map_update_elem().
> >>
> >> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> >> Signed-off-by: Hou Tao <houtao1@huawei.com>
> >> ---
> >>  kernel/bpf/hashtab.c | 6 ++++++
> >>  kernel/bpf/syscall.c | 4 ----
> >>  2 files changed, 6 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> >> index 5b9146fa825f..ec3bdcc6a3cf 100644
> >> --- a/kernel/bpf/hashtab.c
> >> +++ b/kernel/bpf/hashtab.c
> >> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
> >>      if (IS_ERR(ptr))
> >>              return PTR_ERR(ptr);
> >>
> >> +    /* The htab bucket lock is always held during update operations in fd
> >> +     * htab map, and the following rcu_read_lock() is only used to avoid
> >> +     * the WARN_ON_ONCE in htab_map_update_elem().
> >> +     */
> >> +    rcu_read_lock();
> >>      ret = htab_map_update_elem(map, key, &ptr, map_flags);
> >> +    rcu_read_unlock();
> > Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()? It
> > looks like there are two ways to get to htab_map_update_elem() either
> > through a syscall and the path here (bpf_fd_htab_map_update_elem) or
> > through a BPF program calling, bpf_update_elem()? In the BPF_CALL
> > case bpf_map_update_elem() already has,
> >
> >    WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held())
> >
> > The htab_map_update_elem() has an additional check for
> > rcu_read_lock_trace_held(), but not sure where this is coming from
> > at the moment. Can that be added to the BPF caller side if needed?
> >
> > Did I miss some caller path?
>
> No. But I think the main reason for the extra WARN in
> bpf_map_update_elem() is that bpf_map_update_elem() may be inlined by
> verifier in do_misc_fixups(), so the WARN_ON_ONCE in
> bpf_map_update_elem() will not be invoked ever. For
> rcu_read_lock_trace_held(), I have added the assertion in
> bpf_map_delete_elem() recently in commit 169410eba271 ("bpf: Check
> rcu_read_lock_trace_held() before calling bpf map helpers").

Yep.
We should probably remove the WARN_ONs from bpf_map_update_elem() and
the other helpers in kernel/bpf/helpers.c, since they are inlined by
the verifier with 99% probability, so the WARNs are never called even
on DEBUG kernels and only confuse developers, as this thread shows.

We can replace them with a comment that explains this inlining logic
and where the real WARNs are.


* Re: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-14 13:55       ` Alexei Starovoitov
@ 2023-12-14 19:15         ` John Fastabend
  2023-12-15  3:23           ` Alexei Starovoitov
  2023-12-15  8:18           ` Hou Tao
  0 siblings, 2 replies; 11+ messages in thread
From: John Fastabend @ 2023-12-14 19:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Hou Tao
  Cc: John Fastabend, bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, xingwei lee, Hou Tao

Alexei Starovoitov wrote:
> On Wed, Dec 13, 2023 at 11:31 PM Hou Tao <houtao@huaweicloud.com> wrote:
> >
> > Hi,
> >
> > On 12/14/2023 2:22 PM, John Fastabend wrote:
> > > Hou Tao wrote:
> > >> From: Hou Tao <houtao1@huawei.com>
> > >>
> > >> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
> > >> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
> > >> callbacks.
> > >>
> > >> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
> > >> rcu-read-lock because array->ptrs must still be allocated. For
> > >> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
> > >> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
> > >> rcu_read_lock() during the invocation of htab_map_update_elem().
> > >>
> > >> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> > >> Signed-off-by: Hou Tao <houtao1@huawei.com>
> > >> ---
> > >>  kernel/bpf/hashtab.c | 6 ++++++
> > >>  kernel/bpf/syscall.c | 4 ----
> > >>  2 files changed, 6 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > >> index 5b9146fa825f..ec3bdcc6a3cf 100644
> > >> --- a/kernel/bpf/hashtab.c
> > >> +++ b/kernel/bpf/hashtab.c
> > >> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
> > >>      if (IS_ERR(ptr))
> > >>              return PTR_ERR(ptr);
> > >>
> > >> +    /* The htab bucket lock is always held during update operations in fd
> > >> +     * htab map, and the following rcu_read_lock() is only used to avoid
> > >> +     * the WARN_ON_ONCE in htab_map_update_elem().
> > >> +     */

Ah, OK, but isn't this comment wrong? You do need the RCU read lock to
do the walk with lookup_nulls_elem_raw(), where no lock is being held,
and then the subsequent in-place copy is fine because you do hold a
lock.

So it's not just to appease the WARN_ON_ONCE here; it has a real need?

> > >> +    rcu_read_lock();
> > >>      ret = htab_map_update_elem(map, key, &ptr, map_flags);
> > >> +    rcu_read_unlock();
> > > Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()? It
> > > looks like there are two ways to get to htab_map_update_elem() either
> > > through a syscall and the path here (bpf_fd_htab_map_update_elem) or
> > > through a BPF program calling, bpf_update_elem()? In the BPF_CALL
> > > case bpf_map_update_elem() already has,
> > >
> > >    WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held())
> > >
> > > The htab_map_update_elem() has an additional check for
> > > rcu_read_lock_trace_held(), but not sure where this is coming from
> > > at the moment. Can that be added to the BPF caller side if needed?
> > >
> > > Did I miss some caller path?
> >
> > No. But I think the main reason for the extra WARN in
> > bpf_map_update_elem() is that bpf_map_update_elem() may be inlined by
> > verifier in do_misc_fixups(), so the WARN_ON_ONCE in
> > bpf_map_update_elem() will not be invoked ever. For
> > rcu_read_lock_trace_held(), I have added the assertion in
> > bpf_map_delete_elem() recently in commit 169410eba271 ("bpf: Check
> > rcu_read_lock_trace_held() before calling bpf map helpers").
> 
> Yep.
> We should probably remove WARN_ONs from
> bpf_map_update_elem() and others in kernel/bpf/helpers.c
> since they are inlined by the verifier with 99% probability
> and the WARNs are never called even in DEBUG kernels.
> And confusing developers. As this thread shows.

Agree. The rcu_read_lock() needs to be as close as possible to where
it's actually needed, and the WARN_ON_ONCE should be dropped if it's
going to be inlined.

> 
> We can replace them with a comment that explains this inlining logic
> and where the real WARNs are.


* Re: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-14 19:15         ` John Fastabend
@ 2023-12-15  3:23           ` Alexei Starovoitov
  2023-12-15  3:39             ` Hou Tao
  2023-12-15  8:18           ` Hou Tao
  1 sibling, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2023-12-15  3:23 UTC (permalink / raw)
  To: John Fastabend
  Cc: Hou Tao, bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, xingwei lee, Hou Tao

On Thu, Dec 14, 2023 at 11:15 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> Alexei Starovoitov wrote:
> > On Wed, Dec 13, 2023 at 11:31 PM Hou Tao <houtao@huaweicloud.com> wrote:
> > >
> > > Hi,
> > >
> > > On 12/14/2023 2:22 PM, John Fastabend wrote:
> > > > Hou Tao wrote:
> > > >> From: Hou Tao <houtao1@huawei.com>
> > > >>
> > > >> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
> > > >> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
> > > >> callbacks.
> > > >>
> > > >> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
> > > >> rcu-read-lock because array->ptrs must still be allocated. For
> > > >> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
> > > >> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
> > > >> rcu_read_lock() during the invocation of htab_map_update_elem().
> > > >>
> > > >> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> > > >> Signed-off-by: Hou Tao <houtao1@huawei.com>
> > > >> ---
> > > >>  kernel/bpf/hashtab.c | 6 ++++++
> > > >>  kernel/bpf/syscall.c | 4 ----
> > > >>  2 files changed, 6 insertions(+), 4 deletions(-)
> > > >>
> > > >> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > > >> index 5b9146fa825f..ec3bdcc6a3cf 100644
> > > >> --- a/kernel/bpf/hashtab.c
> > > >> +++ b/kernel/bpf/hashtab.c
> > > >> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
> > > >>      if (IS_ERR(ptr))
> > > >>              return PTR_ERR(ptr);
> > > >>
> > > >> +    /* The htab bucket lock is always held during update operations in fd
> > > >> +     * htab map, and the following rcu_read_lock() is only used to avoid
> > > >> +     * the WARN_ON_ONCE in htab_map_update_elem().
> > > >> +     */
>
> Ah ok but isn't this comment wrong because you do need rcu read lock to do
> the walk with lookup_nulls_elem_raw where there is no lock being held? And
> then the subsequent copy in place is fine because you do have a lock.

Ohh. You're correct.
Not sure what I was thinking.

Hou,
could you please send a follow up to undo my braino.


* Re: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-15  3:23           ` Alexei Starovoitov
@ 2023-12-15  3:39             ` Hou Tao
  0 siblings, 0 replies; 11+ messages in thread
From: Hou Tao @ 2023-12-15  3:39 UTC (permalink / raw)
  To: Alexei Starovoitov, John Fastabend
  Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev,
	Jiri Olsa, xingwei lee, Hou Tao

Hi,

On 12/15/2023 11:23 AM, Alexei Starovoitov wrote:
> On Thu, Dec 14, 2023 at 11:15 AM John Fastabend
> <john.fastabend@gmail.com> wrote:
>> Alexei Starovoitov wrote:
>>> On Wed, Dec 13, 2023 at 11:31 PM Hou Tao <houtao@huaweicloud.com> wrote:
>>>> Hi,
>>>>
>>>> On 12/14/2023 2:22 PM, John Fastabend wrote:
>>>>> Hou Tao wrote:
>>>>>> From: Hou Tao <houtao1@huawei.com>
>>>>>>
>>>>>> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
>>>>>> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
>>>>>> callbacks.
>>>>>>
>>>>>> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
>>>>>> rcu-read-lock because array->ptrs must still be allocated. For
>>>>>> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
>>>>>> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
>>>>>> rcu_read_lock() during the invocation of htab_map_update_elem().
>>>>>>
>>>>>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
>>>>>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>>>>>> ---
>>>>>>  kernel/bpf/hashtab.c | 6 ++++++
>>>>>>  kernel/bpf/syscall.c | 4 ----
>>>>>>  2 files changed, 6 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>>>>> index 5b9146fa825f..ec3bdcc6a3cf 100644
>>>>>> --- a/kernel/bpf/hashtab.c
>>>>>> +++ b/kernel/bpf/hashtab.c
>>>>>> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
>>>>>>      if (IS_ERR(ptr))
>>>>>>              return PTR_ERR(ptr);
>>>>>>
>>>>>> +    /* The htab bucket lock is always held during update operations in fd
>>>>>> +     * htab map, and the following rcu_read_lock() is only used to avoid
>>>>>> +     * the WARN_ON_ONCE in htab_map_update_elem().
>>>>>> +     */
>> Ah ok but isn't this comment wrong because you do need rcu read lock to do
>> the walk with lookup_nulls_elem_raw where there is no lock being held? And
>> then the subsequent copy in place is fine because you do have a lock.
> Ohh. You're correct.
> Not sure what I was thinking.
>
> Hou,
> could you please send a follow up to undo my braino.
Er, I don't follow. There is no spin-lock support in the fd htab map,
so htab_map_update_elem() won't call lookup_nulls_elem_raw(); instead
it will lock the bucket and invoke lookup_elem_raw(). So I don't think
rcu_read_lock() is really needed for the invocation of
htab_map_update_elem(), except to make the WARN_ON_ONCE() happy.



* Re: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map
  2023-12-14 19:15         ` John Fastabend
  2023-12-15  3:23           ` Alexei Starovoitov
@ 2023-12-15  8:18           ` Hou Tao
  1 sibling, 0 replies; 11+ messages in thread
From: Hou Tao @ 2023-12-15  8:18 UTC (permalink / raw)
  To: John Fastabend, Alexei Starovoitov
  Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev,
	Jiri Olsa, xingwei lee, Hou Tao

Hi,

On 12/15/2023 3:15 AM, John Fastabend wrote:
> Alexei Starovoitov wrote:
>> On Wed, Dec 13, 2023 at 11:31 PM Hou Tao <houtao@huaweicloud.com> wrote:
>>> Hi,
>>>
>>> On 12/14/2023 2:22 PM, John Fastabend wrote:
>>>> Hou Tao wrote:
>>>>> From: Hou Tao <houtao1@huawei.com>
>>>>>
>>>>> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
>>>>> ops->map_fd_put_ptr(), so don't use rcu-read-lock for these two
>>>>> callbacks.
>>>>>
>>>>> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
>>>>> rcu-read-lock because array->ptrs must still be allocated. For
>>>>> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
>>>>> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
>>>>> rcu_read_lock() during the invocation of htab_map_update_elem().
>>>>>
>>>>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
>>>>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>>>>> ---
>>>>>  kernel/bpf/hashtab.c | 6 ++++++
>>>>>  kernel/bpf/syscall.c | 4 ----
>>>>>  2 files changed, 6 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>>>> index 5b9146fa825f..ec3bdcc6a3cf 100644
>>>>> --- a/kernel/bpf/hashtab.c
>>>>> +++ b/kernel/bpf/hashtab.c
>>>>> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
>>>>>      if (IS_ERR(ptr))
>>>>>              return PTR_ERR(ptr);
>>>>>
>>>>> +    /* The htab bucket lock is always held during update operations in fd
>>>>> +     * htab map, and the following rcu_read_lock() is only used to avoid
>>>>> +     * the WARN_ON_ONCE in htab_map_update_elem().
>>>>> +     */
> Ah ok but isn't this comment wrong because you do need rcu read lock to do
> the walk with lookup_nulls_elem_raw where there is no lock being held? And
> then the subsequent copy in place is fine because you do have a lock.
>
> So it's not just to appease the WARN_ON_ONCE here, it has an actual real
> need?
>
>>>>> +    rcu_read_lock();
>>>>>      ret = htab_map_update_elem(map, key, &ptr, map_flags);
>>>>> +    rcu_read_unlock();
>>>> Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()? It
>>>> looks like there are two ways to get to htab_map_update_elem() either
>>>> through a syscall and the path here (bpf_fd_htab_map_update_elem) or
>>>> through a BPF program calling, bpf_update_elem()? In the BPF_CALL
>>>> case bpf_map_update_elem() already has,
>>>>
>>>>    WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held())
>>>>
>>>> The htab_map_update_elem() has an additional check for
>>>> rcu_read_lock_trace_held(), but not sure where this is coming from
>>>> at the moment. Can that be added to the BPF caller side if needed?
>>>>
>>>> Did I miss some caller path?
>>> No. But I think the main reason for the extra WARN in
>>> bpf_map_update_elem() is that bpf_map_update_elem() may be inlined by
>>> the verifier in do_misc_fixups(), so the WARN_ON_ONCE in
>>> bpf_map_update_elem() may never be invoked. For
>>> rcu_read_lock_trace_held(), I recently added the assertion in
>>> bpf_map_delete_elem() in commit 169410eba271 ("bpf: Check
>>> rcu_read_lock_trace_held() before calling bpf map helpers").
>> Yep.
>> We should probably remove WARN_ONs from
>> bpf_map_update_elem() and others in kernel/bpf/helpers.c
>> since they are inlined by the verifier with 99% probability
>> and the WARNs are never called even in DEBUG kernels.
>> And confusing developers. As this thread shows.
> Agree. The rcu_read needs to be as close as possible to where it's actually
> needed, and the WARN_ON_ONCE should be dropped if it's going to be
> inlined.

I did some investigation into these bpf map helpers and their
implementations across the various kinds of bpf map. It seems most
implementations (besides dev_map_hash_ops) have already added proper RCU
lock assertions, so I think it is indeed OK to remove the WARN_ON_ONCE()
from these bpf map helpers after fixing the assertion in
dev_map_hash_ops. The details follow:

1. bpf_map_lookup_elem helper
(a) hash/lru_hash/percpu_hash/lru_percpu_hash
with !rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
!rcu_read_lock_bh_held() in __htab_map_lookup_elem()

(b) array/percpu_array
no deletion, so no RCU

(c) lpm_trie
with rcu_read_lock_bh_held() in trie_lookup_elem()

(d) htab_of_maps
with !rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
!rcu_read_lock_bh_held() in __htab_map_lookup_elem()

(e) array_of_maps
no deletion, so no RCU

(f) sockmap
rcu_read_lock_held() in __sock_map_lookup_elem()

(g) sockhash
rcu_read_lock_held() in __sock_hash_lookup_elem()

(h) devmap
rcu_read_lock_bh_held() in __dev_map_lookup_elem()

(i) devmap_hash (missing assertion)
no rcu_read_lock_bh_held() check in __dev_map_hash_lookup_elem()

(j) xskmap
rcu_read_lock_bh_held() in __xsk_map_lookup_elem()

2. bpf_map_update_elem helper
(a) hash/lru_hash/percpu_hash/lru_percpu_hash
with !rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
!rcu_read_lock_bh_held() in
htab_map_update_elem()/htab_lru_map_update_elem()/__htab_percpu_map_update_elem()/__htab_lru_percpu_map_update_elem()

(b) array/percpu_array
no RCU

(c) lpm_trie
use spin-lock, and no RCU

(d) sockmap
use spin-lock & with rcu_read_lock_held() in sock_map_update_common()

(e) sockhash
use spin-lock & with rcu_read_lock_held() in sock_hash_update_common()

3. bpf_map_delete_elem helper
(a) hash/lru_hash/percpu_hash/lru_percpu_hash
with !rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
!rcu_read_lock_bh_held() in htab_map_delete_elem()/htab_lru_map_delete_elem()

(b) array/percpu_array
no support

(c) lpm_trie
use spin-lock, no rcu

(d) sockmap
use spin-lock

(e) sockhash
use spin-lock

4. bpf_map_lookup_percpu_elem
(a) percpu_hash/lru_percpu_hash
with !rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
!rcu_read_lock_bh_held() in __htab_map_lookup_elem()

(b) percpu_array
no deletion, no RCU

>> We can replace them with a comment that explains this inlining logic
>> and where the real WARNs are..


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-12-15  8:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-14  4:30 [PATCH bpf-next v3 0/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen() Hou Tao
2023-12-14  4:30 ` [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map Hou Tao
2023-12-14  6:22   ` John Fastabend
2023-12-14  7:31     ` Hou Tao
2023-12-14 13:55       ` Alexei Starovoitov
2023-12-14 19:15         ` John Fastabend
2023-12-15  3:23           ` Alexei Starovoitov
2023-12-15  3:39             ` Hou Tao
2023-12-15  8:18           ` Hou Tao
2023-12-14  4:30 ` [PATCH bpf-next v3 2/2] bpf: Use GFP_KERNEL in bpf_event_entry_gen() Hou Tao
2023-12-14  5:10 ` [PATCH bpf-next v3 0/2] " patchwork-bot+netdevbpf
