BPF List
* [PATCH bpf] bpf, lpm_trie: Allow lookups from sleepable BPF programs
@ 2026-05-29 17:42 Vlad Poenaru
  2026-05-29 19:02 ` sashiko-bot
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Vlad Poenaru @ 2026-05-29 17:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf
  Cc: Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa,
	Toke Høiland-Jørgensen, linux-kernel, stable

trie_lookup_elem() annotates its rcu_dereference_check() walks with
only rcu_read_lock_bh_held(). Because rcu_dereference_check(p, c)
resolves to "c || rcu_read_lock_held()", this passes for XDP/NAPI and
classic RCU readers but fails for sleepable BPF programs, which enter
via __bpf_prog_enter_sleepable() and hold only rcu_read_lock_trace().

A sleepable LSM hook that ends up doing bpf_map_lookup_elem() on an LPM
trie therefore triggers lockdep on debug kernels:

  =============================
  WARNING: suspicious RCU usage
  7.1.0-... Tainted: G            E
  -----------------------------
  kernel/bpf/lpm_trie.c:249 suspicious rcu_dereference_check() usage!
  1 lock held by net_tests/540:
   #0: (rcu_tasks_trace_srcu_struct){....}-{0:0},
       at: __bpf_prog_enter_sleepable+0x26/0x280
  Call Trace:
   dump_stack_lvl
   lockdep_rcu_suspicious
   trie_lookup_elem
   bpf_prog_..._enforce_security_socket_connect
   bpf_trampoline_...
   security_socket_connect
   __sys_connect
   do_syscall_64

This is lockdep-only -- no UAF, since Tasks Trace RCU does serialize
against the trie's reclaim path -- but it spams the console once per
distinct callsite on every debug kernel running a sleepable BPF LSM
that does map lookups on an LPM trie, which is increasingly common.

Other map types already use the bpf_rcu_lock_held() helper, which
accepts all three contexts (classic, BH, Tasks Trace). Use it here as
well, matching the established convention.
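
The predicate logic described above can be modelled in userspace C as a
rough sketch. This is not kernel code: struct rcu_ctx and every function
name below are made up for illustration, and bpf_rcu_lock_held() is
modelled simply as "any of the three read-side flavors is held".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an execution context: which RCU flavors it holds. */
struct rcu_ctx {
	bool classic;	/* models rcu_read_lock_held() */
	bool bh;	/* models rcu_read_lock_bh_held() */
	bool trace;	/* models rcu_read_lock_trace_held() */
};

/* Models rcu_dereference_check(p, c): passes when "c || rcu_read_lock_held()". */
static bool deref_check_passes(const struct rcu_ctx *ctx, bool c)
{
	return c || ctx->classic;
}

/* Old annotation in trie_lookup_elem(): c = rcu_read_lock_bh_held(). */
static bool old_lookup_check(const struct rcu_ctx *ctx)
{
	return deref_check_passes(ctx, ctx->bh);
}

/* New annotation: c = bpf_rcu_lock_held(), modelled as classic || bh || trace. */
static bool new_lookup_check(const struct rcu_ctx *ctx)
{
	bool bpf_held = ctx->classic || ctx->bh || ctx->trace;

	return deref_check_passes(ctx, bpf_held);
}

/* Walks through the contexts named in the commit message. */
void lookup_model_self_check(void)
{
	const struct rcu_ctx xdp = { .bh = true };
	const struct rcu_ctx classic_reader = { .classic = true };
	const struct rcu_ctx sleepable = { .trace = true };

	assert(old_lookup_check(&xdp));
	assert(old_lookup_check(&classic_reader));
	assert(!old_lookup_check(&sleepable));	/* the lockdep splat case */
	assert(new_lookup_check(&sleepable));
}
```

In this model only the sleepable context changes outcome between the old
and new annotation; BH and classic readers pass either way.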

Fixes: 694cea395fde ("bpf: Allow RCU-protected lookups to happen from bh context")
Cc: stable@vger.kernel.org
Signed-off-by: Vlad Poenaru <vlad.wing@gmail.com>
---
 kernel/bpf/lpm_trie.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 0f57608b385d..ac36063cb7e6 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -246,7 +246,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
 
 	/* Start walking the trie from the root node ... */
 
-	for (node = rcu_dereference_check(trie->root, rcu_read_lock_bh_held());
+	for (node = rcu_dereference_check(trie->root, bpf_rcu_lock_held());
 	     node;) {
 		unsigned int next_bit;
 		size_t matchlen;
@@ -280,7 +280,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
 		 */
 		next_bit = extract_bit(key->data, node->prefixlen);
 		node = rcu_dereference_check(node->child[next_bit],
-					     rcu_read_lock_bh_held());
+					     bpf_rcu_lock_held());
 	}
 
 	if (!found)
-- 
2.53.0-Meta



* Re: [PATCH bpf] bpf, lpm_trie: Allow lookups from sleepable BPF programs
  2026-05-29 17:42 [PATCH bpf] bpf, lpm_trie: Allow lookups from sleepable BPF programs Vlad Poenaru
@ 2026-05-29 19:02 ` sashiko-bot
  2026-05-29 19:30   ` Emil Tsalapatis
  2026-06-07  9:17   ` Kumar Kartikeya Dwivedi
  2026-05-29 19:19 ` Emil Tsalapatis
  2026-06-09 13:55 ` [PATCH bpf v2 0/2] bpf, lpm_trie: Allow sleepable BPF programs to use LPM tries Vlad Poenaru
  2 siblings, 2 replies; 13+ messages in thread
From: sashiko-bot @ 2026-05-29 19:02 UTC (permalink / raw)
  To: Vlad Poenaru; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] This problem wasn't introduced by this patch, but the fix is incomplete. The patch resolves an RCU lockdep splat in `trie_lookup_elem` for sleepable BPF programs, but leaves identical core kernel locking rule violations in `trie_update_elem` and `trie_delete_elem`. Sleepable eBPF programs hold `rcu_read_lock_trace()`, not the classic `rcu_read_lock()`. The update and delete operations acquire the map's spinlock (`trie->lock`) and correctly serialize access, but they use `rcu_dereference()` to traverse the trie instead of the update-side API `rcu_dereference_protected()`. `rcu_dereference()` strictly asserts that `rcu_read_lock_held()` is true. Holding a spinlock disables preemption but does not satisfy this check, leading to the exact same 'suspicious rcu_dereference_check() usage' lockdep splat that this patch aimed to fix.
--

commit 1ac6e30b3c7fb4d93d67bfcbb5dc35077814b266
Author: Vlad Poenaru <vlad.wing@gmail.com>

bpf, lpm_trie: Allow lookups from sleepable BPF programs

This commit fixes a lockdep warning caused by sleepable BPF programs
accessing LPM tries. It updates the RCU dereference checks in
trie_lookup_elem() to use bpf_rcu_lock_held() instead of
rcu_read_lock_bh_held().

> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 0f57608b385d4..ac36063cb7e62 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
[ ... ]
> @@ -280,7 +280,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
>  		 */
>  		next_bit = extract_bit(key->data, node->prefixlen);
>  		node = rcu_dereference_check(node->child[next_bit],
> -					     rcu_read_lock_bh_held());
> +					     bpf_rcu_lock_held());
>  	}
>  
>  	if (!found)

[Severity: High]
This is a pre-existing issue, but does this patch leave the fix incomplete?

While this patch updates the lookup path to avoid lockdep splats from sleepable
BPF programs, sleepable BPF programs can also trigger map updates and deletes.
The functions trie_update_elem() and trie_delete_elem() acquire the map's
spinlock (trie->lock) but traverse the tree using rcu_dereference():

kernel/bpf/lpm_trie.c:trie_update_elem() {
...
        while ((node = rcu_dereference(*slot))) {
                matchlen = longest_prefix_match(trie, node, key);
...
}

kernel/bpf/lpm_trie.c:trie_delete_elem() {
...
        while ((node = rcu_dereference(*trim))) {
                matchlen = longest_prefix_match(trie, node, key);
...
}

Because rcu_dereference() strictly asserts that rcu_read_lock_held() is true,
and holding a spinlock does not satisfy the classic RCU read lock check, won't
calling map updates or deletes from sleepable programs still trigger the exact
same suspicious rcu_dereference_check() usage lockdep splats?

Should these paths be updated to use rcu_dereference_protected() since the
trie->lock is held?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260529174233.2954240-1-vlad.wing@gmail.com?part=1


* Re: [PATCH bpf] bpf, lpm_trie: Allow lookups from sleepable BPF programs
  2026-05-29 17:42 [PATCH bpf] bpf, lpm_trie: Allow lookups from sleepable BPF programs Vlad Poenaru
  2026-05-29 19:02 ` sashiko-bot
@ 2026-05-29 19:19 ` Emil Tsalapatis
  2026-06-09 13:55 ` [PATCH bpf v2 0/2] bpf, lpm_trie: Allow sleepable BPF programs to use LPM tries Vlad Poenaru
  2 siblings, 0 replies; 13+ messages in thread
From: Emil Tsalapatis @ 2026-05-29 19:19 UTC (permalink / raw)
  To: Vlad Poenaru, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf
  Cc: Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa,
	Toke Høiland-Jørgensen, linux-kernel, stable

On Fri May 29, 2026 at 1:42 PM EDT, Vlad Poenaru wrote:
> trie_lookup_elem() annotates its rcu_dereference_check() walks with
> only rcu_read_lock_bh_held(). Because rcu_dereference_check(p, c)
> resolves to "c || rcu_read_lock_held()", this passes for XDP/NAPI and
> classic RCU readers but fails for sleepable BPF programs, which enter
> via __bpf_prog_enter_sleepable() and hold only rcu_read_lock_trace().
>
> A sleepable LSM hook that ends up doing bpf_map_lookup_elem() on an LPM
> trie therefore triggers lockdep on debug kernels:
>
>   =============================
>   WARNING: suspicious RCU usage
>   7.1.0-... Tainted: G            E
>   -----------------------------
>   kernel/bpf/lpm_trie.c:249 suspicious rcu_dereference_check() usage!
>   1 lock held by net_tests/540:
>    #0: (rcu_tasks_trace_srcu_struct){....}-{0:0},
>        at: __bpf_prog_enter_sleepable+0x26/0x280
>   Call Trace:
>    dump_stack_lvl
>    lockdep_rcu_suspicious
>    trie_lookup_elem
>    bpf_prog_..._enforce_security_socket_connect
>    bpf_trampoline_...
>    security_socket_connect
>    __sys_connect
>    do_syscall_64
>
> This is lockdep-only -- no UAF, since Tasks Trace RCU does serialize
> against the trie's reclaim path -- but it spams the console once per
> distinct callsite on every debug kernel running a sleepable BPF LSM
> that does map lookups on an LPM trie, which is increasingly common.
>
> Other map types already use the bpf_rcu_lock_held() helper, which
> accepts all three contexts (classic, BH, Tasks Trace). Use it here as
> well, matching the established convention.
>
> Fixes: 694cea395fde ("bpf: Allow RCU-protected lookups to happen from bh context")
> Cc: stable@vger.kernel.org
> Signed-off-by: Vlad Poenaru <vlad.wing@gmail.com>

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>

> ---
>  kernel/bpf/lpm_trie.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 0f57608b385d..ac36063cb7e6 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -246,7 +246,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
>  
>  	/* Start walking the trie from the root node ... */
>  
> -	for (node = rcu_dereference_check(trie->root, rcu_read_lock_bh_held());
> +	for (node = rcu_dereference_check(trie->root, bpf_rcu_lock_held());
>  	     node;) {
>  		unsigned int next_bit;
>  		size_t matchlen;
> @@ -280,7 +280,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
>  		 */
>  		next_bit = extract_bit(key->data, node->prefixlen);
>  		node = rcu_dereference_check(node->child[next_bit],
> -					     rcu_read_lock_bh_held());
> +					     bpf_rcu_lock_held());
>  	}
>  
>  	if (!found)



* Re: [PATCH bpf] bpf, lpm_trie: Allow lookups from sleepable BPF programs
  2026-05-29 19:02 ` sashiko-bot
@ 2026-05-29 19:30   ` Emil Tsalapatis
  2026-06-07  9:17   ` Kumar Kartikeya Dwivedi
  1 sibling, 0 replies; 13+ messages in thread
From: Emil Tsalapatis @ 2026-05-29 19:30 UTC (permalink / raw)
  To: sashiko-reviews, Vlad Poenaru; +Cc: bpf

On Fri May 29, 2026 at 3:02 PM EDT, sashiko-bot wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
>
> Pre-existing issues:
> - [High] This problem wasn't introduced by this patch, but the fix is incomplete. The patch resolves an RCU lockdep splat in `trie_lookup_elem` for sleepable BPF programs, but leaves identical core kernel locking rule violations in `trie_update_elem` and `trie_delete_elem`. Sleepable eBPF programs hold `rcu_read_lock_trace()`, not the classic `rcu_read_lock()`. The update and delete operations acquire the map's spinlock (`trie->lock`) and correctly serialize access, but they use `rcu_dereference()` to traverse the trie instead of the update-side API `rcu_dereference_protected()`. `rcu_dereference()` strictly asserts that `rcu_read_lock_held()` is true. Holding a spinlock disables preemption but does not satisfy this check, leading to the exact same 'suspicious rcu_dereference_check() usage' lockdep splat that this patch aimed to fix.
> --
>
> commit 1ac6e30b3c7fb4d93d67bfcbb5dc35077814b266
> Author: Vlad Poenaru <vlad.wing@gmail.com>
>
> bpf, lpm_trie: Allow lookups from sleepable BPF programs
>
> This commit fixes a lockdep warning caused by sleepable BPF programs
> accessing LPM tries. It updates the RCU dereference checks in
> trie_lookup_elem() to use bpf_rcu_lock_held() instead of
> rcu_read_lock_bh_held().
>
>> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
>> index 0f57608b385d4..ac36063cb7e62 100644
>> --- a/kernel/bpf/lpm_trie.c
>> +++ b/kernel/bpf/lpm_trie.c
> [ ... ]
>> @@ -280,7 +280,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
>>  		 */
>>  		next_bit = extract_bit(key->data, node->prefixlen);
>>  		node = rcu_dereference_check(node->child[next_bit],
>> -					     rcu_read_lock_bh_held());
>> +					     bpf_rcu_lock_held());
>>  	}
>>  
>>  	if (!found)
>
> [Severity: High]
> This is a pre-existing issue, but does this patch leave the fix incomplete?
>
> While this patch updates the lookup path to avoid lockdep splats from sleepable
> BPF programs, sleepable BPF programs can also trigger map updates and deletes.
> The functions trie_update_elem() and trie_delete_elem() acquire the map's
> spinlock (trie->lock) but traverse the tree using rcu_dereference():
>
> kernel/bpf/lpm_trie.c:trie_update_elem() {
> ...
>         while ((node = rcu_dereference(*slot))) {
>                 matchlen = longest_prefix_match(trie, node, key);
> ...
> }
>
> kernel/bpf/lpm_trie.c:trie_delete_elem() {
> ...
>         while ((node = rcu_dereference(*trim))) {
>                 matchlen = longest_prefix_match(trie, node, key);
> ...
> }
>
> Because rcu_dereference() strictly asserts that rcu_read_lock_held() is true,
> and holding a spinlock does not satisfy the classic RCU read lock check, won't
> calling map updates or deletes from sleepable programs still trigger the exact
> same suspicious rcu_dereference_check() usage lockdep splats?
>
> Should these paths be updated to use rcu_dereference_protected() since the
> trie->lock is held?

Fair point by the bot



* Re: [PATCH bpf] bpf, lpm_trie: Allow lookups from sleepable BPF programs
  2026-05-29 19:02 ` sashiko-bot
  2026-05-29 19:30   ` Emil Tsalapatis
@ 2026-06-07  9:17   ` Kumar Kartikeya Dwivedi
  1 sibling, 0 replies; 13+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-06-07  9:17 UTC (permalink / raw)
  To: sashiko-reviews, Vlad Poenaru; +Cc: bpf

On Fri May 29, 2026 at 9:02 PM CEST, sashiko-bot wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
>
> Pre-existing issues:
> - [High] This problem wasn't introduced by this patch, but the fix is incomplete. The patch resolves an RCU lockdep splat in `trie_lookup_elem` for sleepable BPF programs, but leaves identical core kernel locking rule violations in `trie_update_elem` and `trie_delete_elem`. Sleepable eBPF programs hold `rcu_read_lock_trace()`, not the classic `rcu_read_lock()`. The update and delete operations acquire the map's spinlock (`trie->lock`) and correctly serialize access, but they use `rcu_dereference()` to traverse the trie instead of the update-side API `rcu_dereference_protected()`. `rcu_dereference()` strictly asserts that `rcu_read_lock_held()` is true. Holding a spinlock disables preemption but does not satisfy this check, leading to the exact same 'suspicious rcu_dereference_check() usage' lockdep splat that this patch aimed to fix.
> --
>
> commit 1ac6e30b3c7fb4d93d67bfcbb5dc35077814b266
> Author: Vlad Poenaru <vlad.wing@gmail.com>
>
> bpf, lpm_trie: Allow lookups from sleepable BPF programs
>
> This commit fixes a lockdep warning caused by sleepable BPF programs
> accessing LPM tries. It updates the RCU dereference checks in
> trie_lookup_elem() to use bpf_rcu_lock_held() instead of
> rcu_read_lock_bh_held().
>
>> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
>> index 0f57608b385d4..ac36063cb7e62 100644
>> --- a/kernel/bpf/lpm_trie.c
>> +++ b/kernel/bpf/lpm_trie.c
> [ ... ]
>> @@ -280,7 +280,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
>>  		 */
>>  		next_bit = extract_bit(key->data, node->prefixlen);
>>  		node = rcu_dereference_check(node->child[next_bit],
>> -					     rcu_read_lock_bh_held());
>> +					     bpf_rcu_lock_held());
>>  	}
>>
>>  	if (!found)
>
> [Severity: High]
> This is a pre-existing issue, but does this patch leave the fix incomplete?
>
> While this patch updates the lookup path to avoid lockdep splats from sleepable
> BPF programs, sleepable BPF programs can also trigger map updates and deletes.
> The functions trie_update_elem() and trie_delete_elem() acquire the map's
> spinlock (trie->lock) but traverse the tree using rcu_dereference():
>
> kernel/bpf/lpm_trie.c:trie_update_elem() {
> ...
>         while ((node = rcu_dereference(*slot))) {
>                 matchlen = longest_prefix_match(trie, node, key);
> ...
> }
>
> kernel/bpf/lpm_trie.c:trie_delete_elem() {
> ...
>         while ((node = rcu_dereference(*trim))) {
>                 matchlen = longest_prefix_match(trie, node, key);
> ...
> }
>
> Because rcu_dereference() strictly asserts that rcu_read_lock_held() is true,
> and holding a spinlock does not satisfy the classic RCU read lock check, won't
> calling map updates or deletes from sleepable programs still trigger the exact
> same suspicious rcu_dereference_check() usage lockdep splats?

Please address this feedback for other instances and respin.

pw-bot: cr

>
> Should these paths be updated to use rcu_dereference_protected() since the
> trie->lock is held?



* [PATCH bpf v2 0/2] bpf, lpm_trie: Allow sleepable BPF programs to use LPM tries
  2026-05-29 17:42 [PATCH bpf] bpf, lpm_trie: Allow lookups from sleepable BPF programs Vlad Poenaru
  2026-05-29 19:02 ` sashiko-bot
  2026-05-29 19:19 ` Emil Tsalapatis
@ 2026-06-09 13:55 ` Vlad Poenaru
  2026-06-09 13:55   ` [PATCH bpf v2 1/2] bpf, lpm_trie: Allow access from sleepable BPF programs Vlad Poenaru
                     ` (2 more replies)
  2 siblings, 3 replies; 13+ messages in thread
From: Vlad Poenaru @ 2026-06-09 13:55 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	John Fastabend, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Toke Høiland-Jørgensen
  Cc: Emil Tsalapatis, linux-kernel

trie_lookup_elem() annotates its rcu_dereference_check() walks with only
rcu_read_lock_bh_held(), so a sleepable BPF program that touches an LPM
trie (e.g. a sleepable LSM hook calling bpf_map_lookup_elem()) trips a
"suspicious RCU usage" lockdep splat on debug kernels: it holds only
rcu_read_lock_trace(), which that annotation does not accept.

Patch 1 relaxes the rcu_dereference annotations in the trie walks so they
no longer trip lockdep from the Tasks Trace context, including the
trie_update_elem()/trie_delete_elem() writer walks (protected by
trie->lock). Patch 2 adds BPF_MAP_TYPE_LPM_TRIE to the verifier's
sleepable map whitelist so sleepable programs can reference an LPM trie
directly, not just as the inner map of a map-of-maps. LPM trie nodes are
reclaimed via bpf_mem_cache_free_rcu(), which chains a regular RCU grace
period into a Tasks Trace grace period before freeing -- the same
discipline BPF_MAP_TYPE_HASH relies on for sleepable access.

Changes since v1:
- Split into a 2-patch series.
- Patch 1 now also converts the trie_update_elem()/trie_delete_elem()
  walks from rcu_dereference() to rcu_dereference_protected(*p, 1),
  addressing review feedback that v1 only fixed the lookup path and left
  the same splat on the writer paths.
- New patch 2 adds the verifier whitelist entry so the fix is actually
  reachable for directly-referenced LPM tries.
- Retitled; v1 was "Allow lookups from sleepable BPF programs".

v1: https://lore.kernel.org/all/20260529174233.2954240-1-vlad.wing@gmail.com/

Vlad Poenaru (2):
  bpf, lpm_trie: Allow access from sleepable BPF programs
  bpf, lpm_trie: Allow sleepable programs to use LPM trie maps directly

 kernel/bpf/lpm_trie.c | 8 ++++----
 kernel/bpf/verifier.c | 1 +
 2 files changed, 5 insertions(+), 4 deletions(-)

--
2.53.0-Meta



* [PATCH bpf v2 1/2] bpf, lpm_trie: Allow access from sleepable BPF programs
  2026-06-09 13:55 ` [PATCH bpf v2 0/2] bpf, lpm_trie: Allow sleepable BPF programs to use LPM tries Vlad Poenaru
@ 2026-06-09 13:55   ` Vlad Poenaru
  2026-06-09 16:36     ` Emil Tsalapatis
  2026-06-09 13:55   ` [PATCH bpf v2 2/2] bpf, lpm_trie: Allow sleepable programs to use LPM trie maps directly Vlad Poenaru
  2026-06-09 19:50   ` [PATCH bpf v2 0/2] bpf, lpm_trie: Allow sleepable BPF programs to use LPM tries patchwork-bot+netdevbpf
  2 siblings, 1 reply; 13+ messages in thread
From: Vlad Poenaru @ 2026-06-09 13:55 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	John Fastabend, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Toke Høiland-Jørgensen
  Cc: Emil Tsalapatis, linux-kernel, stable

trie_lookup_elem() annotates its rcu_dereference_check() walks with
only rcu_read_lock_bh_held().  Because rcu_dereference_check(p, c)
resolves to "c || rcu_read_lock_held()", this passes for XDP/NAPI and
classic RCU readers but fails for sleepable BPF programs, which enter
via __bpf_prog_enter_sleepable() and hold only rcu_read_lock_trace().

trie_update_elem() and trie_delete_elem() have the same problem in a
different form: they walk the trie with plain rcu_dereference(), which
asserts rcu_read_lock_held() unconditionally.  Both are reachable from
sleepable BPF programs via the bpf_map_update_elem / bpf_map_delete_elem
helpers, and from the syscall path under classic rcu_read_lock().  In
the writer paths the trie is actually protected by trie->lock (an
rqspinlock taken across the walk); we never relied on the RCU read-side
lock to keep nodes alive there.

A sleepable LSM hook that ends up touching an LPM trie therefore
triggers lockdep on debug kernels:

  =============================
  WARNING: suspicious RCU usage
  7.1.0-... Tainted: G            E
  -----------------------------
  kernel/bpf/lpm_trie.c:249 suspicious rcu_dereference_check() usage!
  1 lock held by net_tests/540:
   #0: (rcu_tasks_trace_srcu_struct){....}-{0:0},
       at: __bpf_prog_enter_sleepable+0x26/0x280
  Call Trace:
   dump_stack_lvl
   lockdep_rcu_suspicious
   trie_lookup_elem
   bpf_prog_..._enforce_security_socket_connect
   bpf_trampoline_...
   security_socket_connect
   __sys_connect
   do_syscall_64

This is lockdep-only -- no UAF, since Tasks Trace RCU does serialize
against the trie's reclaim path -- but it spams the console once per
distinct callsite on every debug kernel running a sleepable BPF LSM
that touches an LPM trie, which is increasingly common.

For the lookup path, switch the rcu_dereference_check() annotation
from rcu_read_lock_bh_held() to bpf_rcu_lock_held(), which accepts all
three contexts (classic, BH, Tasks Trace).  Other map types already
follow this convention.

For trie_update_elem() and trie_delete_elem(), annotate the walks as
rcu_dereference_protected(*p, 1) -- matching trie_free() in the same
file -- since trie->lock is held across the walk.  rqspinlock has no
lockdep_map, so the predicate degenerates to '1' rather than
lockdep_is_held(&trie->lock); the protection is real but not
machine-verifiable.  trie_get_next_key() also uses bare
rcu_dereference() but is reachable only from the BPF syscall, which
holds classic rcu_read_lock() before dispatching, so it is left
untouched.
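
The difference between the two writer-side annotations can be sketched
in userspace C. This is only a model under stated assumptions, not
kernel code: struct writer_ctx and the function names are hypothetical,
rcu_dereference(p) is modelled as rcu_dereference_check(p, 0), and
rcu_dereference_protected(p, c) as a check of the caller-supplied c
alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical writer context: what is held while walking the trie. */
struct writer_ctx {
	bool classic_rcu;	/* rcu_read_lock() held, e.g. the syscall path */
	bool trie_lock;		/* trie->lock held across the walk */
};

/* Models rcu_dereference(p), i.e. rcu_dereference_check(p, 0). */
static bool rcu_dereference_passes(const struct writer_ctx *ctx)
{
	/* The spinlock is never consulted; only classic RCU counts. */
	return false || ctx->classic_rcu;
}

/* Models rcu_dereference_protected(p, c): only c is checked. */
static bool rcu_dereference_protected_passes(bool c)
{
	return c;
}

void writer_model_self_check(void)
{
	/* Update/delete called from a sleepable program: lock held, no classic RCU. */
	const struct writer_ctx sleepable = { .trie_lock = true };
	/* Update/delete called from the syscall path. */
	const struct writer_ctx syscall_path = { .classic_rcu = true, .trie_lock = true };

	assert(!rcu_dereference_passes(&sleepable));	/* would splat */
	assert(rcu_dereference_passes(&syscall_path));
	assert(rcu_dereference_protected_passes(1));	/* constant predicate */
}
```

Because the predicate is the constant 1, the modelled check passes
whether or not trie_lock is set, which is what "not machine-verifiable"
means here.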

Fixes: 694cea395fde ("bpf: Allow RCU-protected lookups to happen from bh context")
Cc: stable@vger.kernel.org
Signed-off-by: Vlad Poenaru <vlad.wing@gmail.com>
---
 kernel/bpf/lpm_trie.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 0f57608b385d..4d6f25db9ba1 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -246,7 +246,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
 
 	/* Start walking the trie from the root node ... */
 
-	for (node = rcu_dereference_check(trie->root, rcu_read_lock_bh_held());
+	for (node = rcu_dereference_check(trie->root, bpf_rcu_lock_held());
 	     node;) {
 		unsigned int next_bit;
 		size_t matchlen;
@@ -280,7 +280,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
 		 */
 		next_bit = extract_bit(key->data, node->prefixlen);
 		node = rcu_dereference_check(node->child[next_bit],
-					     rcu_read_lock_bh_held());
+					     bpf_rcu_lock_held());
 	}
 
 	if (!found)
@@ -359,7 +359,7 @@ static long trie_update_elem(struct bpf_map *map,
 	 */
 	slot = &trie->root;
 
-	while ((node = rcu_dereference(*slot))) {
+	while ((node = rcu_dereference_protected(*slot, 1))) {
 		matchlen = longest_prefix_match(trie, node, key);
 
 		if (node->prefixlen != matchlen ||
@@ -482,7 +482,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key)
 	trim = &trie->root;
 	trim2 = trim;
 	parent = NULL;
-	while ((node = rcu_dereference(*trim))) {
+	while ((node = rcu_dereference_protected(*trim, 1))) {
 		matchlen = longest_prefix_match(trie, node, key);
 
 		if (node->prefixlen != matchlen ||
-- 
2.53.0-Meta



* [PATCH bpf v2 2/2] bpf, lpm_trie: Allow sleepable programs to use LPM trie maps directly
  2026-06-09 13:55 ` [PATCH bpf v2 0/2] bpf, lpm_trie: Allow sleepable BPF programs to use LPM tries Vlad Poenaru
  2026-06-09 13:55   ` [PATCH bpf v2 1/2] bpf, lpm_trie: Allow access from sleepable BPF programs Vlad Poenaru
@ 2026-06-09 13:55   ` Vlad Poenaru
  2026-06-09 16:19     ` Emil Tsalapatis
  2026-06-10  1:53     ` Hou Tao
  2026-06-09 19:50   ` [PATCH bpf v2 0/2] bpf, lpm_trie: Allow sleepable BPF programs to use LPM tries patchwork-bot+netdevbpf
  2 siblings, 2 replies; 13+ messages in thread
From: Vlad Poenaru @ 2026-06-09 13:55 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	John Fastabend, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Toke Høiland-Jørgensen
  Cc: Emil Tsalapatis, linux-kernel

The previous change relaxed the rcu_dereference annotations in
lpm_trie.c so the trie walks no longer trip lockdep when reached from a
sleepable BPF program holding only rcu_read_lock_trace().  By itself
that only helps tries reached as the inner map of a map-of-maps, or
from the classic-RCU syscall path: a sleepable program that references
an LPM trie directly is still rejected at load time by
check_map_prog_compatibility(), whose sleepable whitelist omits
BPF_MAP_TYPE_LPM_TRIE:

  Sleepable programs can only use array, hash, ringbuf and local storage maps

LPM trie nodes are allocated from a bpf_mem_alloc (trie->ma) and freed
with bpf_mem_cache_free_rcu(), which chains a regular RCU grace period
into a Tasks Trace grace period before the node -- and the value
embedded in it that trie_lookup_elem() returns to the program -- is
released.  That is the same reclaim discipline BPF_MAP_TYPE_HASH relies
on for sleepable access, so a value handed to a sleepable reader cannot
be freed while the program is still running under rcu_read_lock_trace().
The writer paths take trie->lock across the walk and never relied on the
RCU read-side lock to keep nodes alive.

Add BPF_MAP_TYPE_LPM_TRIE to the sleepable map whitelist so these
programs can use LPM tries directly.
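
As a simplified illustration of what the whitelist change amounts to,
the check can be pictured as a switch over the map type. The sketch
below is a userspace model, not the verifier code: the enum, its values
and sleepable_prog_may_use() are hypothetical, and only a handful of map
types are represented.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a few map types; not the kernel's enum. */
enum demo_map_type {
	DEMO_MAP_TYPE_HASH,
	DEMO_MAP_TYPE_LRU_PERCPU_HASH,
	DEMO_MAP_TYPE_LPM_TRIE,
	DEMO_MAP_TYPE_ARRAY_OF_MAPS,
	DEMO_MAP_TYPE_RINGBUF,
	DEMO_MAP_TYPE_OTHER,	/* any type not on the list */
};

/* Models the sleepable-program map whitelist after this patch. */
static bool sleepable_prog_may_use(enum demo_map_type type)
{
	switch (type) {
	case DEMO_MAP_TYPE_HASH:
	case DEMO_MAP_TYPE_LRU_PERCPU_HASH:
	case DEMO_MAP_TYPE_LPM_TRIE:	/* the entry this patch adds */
	case DEMO_MAP_TYPE_ARRAY_OF_MAPS:
	case DEMO_MAP_TYPE_RINGBUF:
		return true;
	default:
		/* Models rejection at load time with the message quoted above. */
		return false;
	}
}

void whitelist_model_self_check(void)
{
	assert(sleepable_prog_may_use(DEMO_MAP_TYPE_LPM_TRIE));
	assert(!sleepable_prog_may_use(DEMO_MAP_TYPE_OTHER));
}
```

Without the LPM_TRIE case, the model falls through to the default
branch, which corresponds to the load-time rejection described above.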

Signed-off-by: Vlad Poenaru <vlad.wing@gmail.com>
---
 kernel/bpf/verifier.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7fb88e1cd7c4..71c1e59e4df4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18122,6 +18122,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 		case BPF_MAP_TYPE_PERCPU_HASH:
 		case BPF_MAP_TYPE_PERCPU_ARRAY:
 		case BPF_MAP_TYPE_LRU_PERCPU_HASH:
+		case BPF_MAP_TYPE_LPM_TRIE:
 		case BPF_MAP_TYPE_ARRAY_OF_MAPS:
 		case BPF_MAP_TYPE_HASH_OF_MAPS:
 		case BPF_MAP_TYPE_RINGBUF:
-- 
2.53.0-Meta



* Re: [PATCH bpf v2 2/2] bpf, lpm_trie: Allow sleepable programs to use LPM trie maps directly
  2026-06-09 13:55   ` [PATCH bpf v2 2/2] bpf, lpm_trie: Allow sleepable programs to use LPM trie maps directly Vlad Poenaru
@ 2026-06-09 16:19     ` Emil Tsalapatis
  2026-06-10  1:53     ` Hou Tao
  1 sibling, 0 replies; 13+ messages in thread
From: Emil Tsalapatis @ 2026-06-09 16:19 UTC (permalink / raw)
  To: Vlad Poenaru, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, John Fastabend, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Jiri Olsa, Toke Høiland-Jørgensen
  Cc: Emil Tsalapatis, linux-kernel

On Tue Jun 9, 2026 at 9:55 AM EDT, Vlad Poenaru wrote:
> The previous change relaxed the rcu_dereference annotations in
> lpm_trie.c so the trie walks no longer trip lockdep when reached from a
> sleepable BPF program holding only rcu_read_lock_trace().  By itself
> that only helps tries reached as the inner map of a map-of-maps, or
> from the classic-RCU syscall path: a sleepable program that references
> an LPM trie directly is still rejected at load time by
> check_map_prog_compatibility(), whose sleepable whitelist omits
> BPF_MAP_TYPE_LPM_TRIE:
>
>   Sleepable programs can only use array, hash, ringbuf and local storage maps
>
> LPM trie nodes are allocated from a bpf_mem_alloc (trie->ma) and freed
> with bpf_mem_cache_free_rcu(), which chains a regular RCU grace period
> into a Tasks Trace grace period before the node -- and the value
> embedded in it that trie_lookup_elem() returns to the program -- is
> released.  That is the same reclaim discipline BPF_MAP_TYPE_HASH relies
> on for sleepable access, so a value handed to a sleepable reader cannot
> be freed while the program is still running under rcu_read_lock_trace().
> The writer paths take trie->lock across the walk and never relied on the
> RCU read-side lock to keep nodes alive.
>
> Add BPF_MAP_TYPE_LPM_TRIE to the sleepable map whitelist so these
> programs can use LPM tries directly.
>
> Signed-off-by: Vlad Poenaru <vlad.wing@gmail.com>

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>

> ---
>  kernel/bpf/verifier.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 7fb88e1cd7c4..71c1e59e4df4 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -18122,6 +18122,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
>  		case BPF_MAP_TYPE_PERCPU_HASH:
>  		case BPF_MAP_TYPE_PERCPU_ARRAY:
>  		case BPF_MAP_TYPE_LRU_PERCPU_HASH:
> +		case BPF_MAP_TYPE_LPM_TRIE:
>  		case BPF_MAP_TYPE_ARRAY_OF_MAPS:
>  		case BPF_MAP_TYPE_HASH_OF_MAPS:
>  		case BPF_MAP_TYPE_RINGBUF:



* Re: [PATCH bpf v2 1/2] bpf, lpm_trie: Allow access from sleepable BPF programs
  2026-06-09 13:55   ` [PATCH bpf v2 1/2] bpf, lpm_trie: Allow access from sleepable BPF programs Vlad Poenaru
@ 2026-06-09 16:36     ` Emil Tsalapatis
  0 siblings, 0 replies; 13+ messages in thread
From: Emil Tsalapatis @ 2026-06-09 16:36 UTC (permalink / raw)
  To: Vlad Poenaru, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, John Fastabend, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Jiri Olsa, Toke Høiland-Jørgensen
  Cc: Emil Tsalapatis, linux-kernel, stable

On Tue Jun 9, 2026 at 9:55 AM EDT, Vlad Poenaru wrote:
> trie_lookup_elem() annotates its rcu_dereference_check() walks with
> only rcu_read_lock_bh_held().  Because rcu_dereference_check(p, c)
> resolves to "c || rcu_read_lock_held()", this passes for XDP/NAPI and
> classic RCU readers but fails for sleepable BPF programs, which enter
> via __bpf_prog_enter_sleepable() and hold only rcu_read_lock_trace().
>
> trie_update_elem() and trie_delete_elem() have the same problem in a
> different form: they walk the trie with plain rcu_dereference(), which
> asserts rcu_read_lock_held() unconditionally.  Both are reachable from
> sleepable BPF programs via the bpf_map_update_elem / bpf_map_delete_elem
> helpers, and from the syscall path under classic rcu_read_lock().  In
> the writer paths the trie is actually protected by trie->lock (an
> rqspinlock taken across the walk); we never relied on the RCU read-side
> lock to keep nodes alive there.
>
> A sleepable LSM hook that ends up touching an LPM trie therefore
> triggers lockdep on debug kernels:
>
>   =============================
>   WARNING: suspicious RCU usage
>   7.1.0-... Tainted: G            E
>   -----------------------------
>   kernel/bpf/lpm_trie.c:249 suspicious rcu_dereference_check() usage!
>   1 lock held by net_tests/540:
>    #0: (rcu_tasks_trace_srcu_struct){....}-{0:0},
>        at: __bpf_prog_enter_sleepable+0x26/0x280
>   Call Trace:
>    dump_stack_lvl
>    lockdep_rcu_suspicious
>    trie_lookup_elem
>    bpf_prog_..._enforce_security_socket_connect
>    bpf_trampoline_...
>    security_socket_connect
>    __sys_connect
>    do_syscall_64
>
> This is lockdep-only -- no UAF, since Tasks Trace RCU does serialize
> against the trie's reclaim path -- but it spams the console once per
> distinct callsite on every debug kernel running a sleepable BPF LSM
> that touches an LPM trie, which is increasingly common.
>
> For the lookup path, switch the rcu_dereference_check() annotation
> from rcu_read_lock_bh_held() to bpf_rcu_lock_held(), which accepts all
> three contexts (classic, BH, Tasks Trace).  Other map types already
> follow this convention.
>
> For trie_update_elem() and trie_delete_elem(), annotate the walks as
> rcu_dereference_protected(*p, 1) -- matching trie_free() in the same
> file -- since trie->lock is held across the walk.  rqspinlock has no
> lockdep_map, so the predicate degenerates to '1' rather than
> lockdep_is_held(&trie->lock); the protection is real but not
> machine-verifiable.  trie_get_next_key() also uses bare
> rcu_dereference() but is reachable only from the BPF syscall, which
> holds classic rcu_read_lock() before dispatching, so it is left
> untouched.
>
> Fixes: 694cea395fde ("bpf: Allow RCU-protected lookups to happen from bh context")
> Cc: stable@vger.kernel.org
> Signed-off-by: Vlad Poenaru <vlad.wing@gmail.com>

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
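
The lockdep reasoning in the commit message can be sketched in userspace. This is only a model under stated assumptions: struct rcu_ctx and every function below are hypothetical stand-ins that replace the kernel's lockdep state with plain flags. The one thing taken from the message itself is the rule that rcu_dereference_check(p, c) passes when "c || rcu_read_lock_held()".

```c
#include <stdbool.h>

/* Userspace model only: which read-side lock the caller "holds". */
struct rcu_ctx {
	bool classic;	/* models rcu_read_lock_held() */
	bool bh;	/* models rcu_read_lock_bh_held() */
	bool trace;	/* models holding rcu_read_lock_trace() */
};

/* Models rcu_dereference_check(p, c): passes if c || rcu_read_lock_held(). */
static bool deref_check_ok(const struct rcu_ctx *ctx, bool c)
{
	return c || ctx->classic;
}

/* Old annotation in trie_lookup_elem(): only the BH flavor is named. */
static bool old_lookup_ok(const struct rcu_ctx *ctx)
{
	return deref_check_ok(ctx, ctx->bh);
}

/* Models bpf_rcu_lock_held(): classic, BH, or Tasks Trace. */
static bool model_bpf_rcu_lock_held(const struct rcu_ctx *ctx)
{
	return ctx->classic || ctx->bh || ctx->trace;
}

/* New annotation: the predicate accepts all three contexts. */
static bool new_lookup_ok(const struct rcu_ctx *ctx)
{
	return deref_check_ok(ctx, model_bpf_rcu_lock_held(ctx));
}
```

With only the trace flag set (the sleepable case), old_lookup_ok() returns false, which is the splat shown above, while new_lookup_ok() returns true. The writer walks are a different case: rcu_dereference_protected(*p, 1) does not consult any read-side lock state at all.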

> ---
>  kernel/bpf/lpm_trie.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 0f57608b385d..4d6f25db9ba1 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -246,7 +246,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
>  
>  	/* Start walking the trie from the root node ... */
>  
> -	for (node = rcu_dereference_check(trie->root, rcu_read_lock_bh_held());
> +	for (node = rcu_dereference_check(trie->root, bpf_rcu_lock_held());
>  	     node;) {
>  		unsigned int next_bit;
>  		size_t matchlen;
> @@ -280,7 +280,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
>  		 */
>  		next_bit = extract_bit(key->data, node->prefixlen);
>  		node = rcu_dereference_check(node->child[next_bit],
> -					     rcu_read_lock_bh_held());
> +					     bpf_rcu_lock_held());
>  	}
>  
>  	if (!found)
> @@ -359,7 +359,7 @@ static long trie_update_elem(struct bpf_map *map,
>  	 */
>  	slot = &trie->root;
>  
> -	while ((node = rcu_dereference(*slot))) {
> +	while ((node = rcu_dereference_protected(*slot, 1))) {
>  		matchlen = longest_prefix_match(trie, node, key);
>  
>  		if (node->prefixlen != matchlen ||
> @@ -482,7 +482,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key)
>  	trim = &trie->root;
>  	trim2 = trim;
>  	parent = NULL;
> -	while ((node = rcu_dereference(*trim))) {
> +	while ((node = rcu_dereference_protected(*trim, 1))) {
>  		matchlen = longest_prefix_match(trie, node, key);
>  
>  		if (node->prefixlen != matchlen ||



* Re: [PATCH bpf v2 0/2] bpf, lpm_trie: Allow sleepable BPF programs to use LPM tries
  2026-06-09 13:55 ` [PATCH bpf v2 0/2] bpf, lpm_trie: Allow sleepable BPF programs to use LPM tries Vlad Poenaru
  2026-06-09 13:55   ` [PATCH bpf v2 1/2] bpf, lpm_trie: Allow access from sleepable BPF programs Vlad Poenaru
  2026-06-09 13:55   ` [PATCH bpf v2 2/2] bpf, lpm_trie: Allow sleepable programs to use LPM trie maps directly Vlad Poenaru
@ 2026-06-09 19:50   ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 13+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-06-09 19:50 UTC (permalink / raw)
  To: Vlad Poenaru
  Cc: bpf, ast, daniel, andrii, john.fastabend, martin.lau, eddyz87,
	memxor, song, yonghong.song, jolsa, toke, emil, linux-kernel

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Tue,  9 Jun 2026 06:55:56 -0700 you wrote:
> trie_lookup_elem() annotates its rcu_dereference_check() walks with only
> rcu_read_lock_bh_held(), so a sleepable BPF program that touches an LPM
> trie (e.g. a sleepable LSM hook calling bpf_map_lookup_elem()) trips a
> "suspicious RCU usage" lockdep splat on debug kernels: it holds only
> rcu_read_lock_trace(), which that annotation does not accept.
> 
> Patch 1 relaxes the rcu_dereference annotations in the trie walks so they
> no longer trip lockdep from the Tasks Trace context, including the
> trie_update_elem()/trie_delete_elem() writer walks (protected by
> trie->lock). Patch 2 adds BPF_MAP_TYPE_LPM_TRIE to the verifier's
> sleepable map whitelist so sleepable programs can reference an LPM trie
> directly, not just as the inner map of a map-of-maps. LPM trie nodes are
> reclaimed via bpf_mem_cache_free_rcu(), which chains a regular RCU grace
> period into a Tasks Trace grace period before freeing -- the same
> discipline BPF_MAP_TYPE_HASH relies on for sleepable access.
> 
> [...]

Here is the summary with links:
  - [bpf,v2,1/2] bpf, lpm_trie: Allow access from sleepable BPF programs
    https://git.kernel.org/bpf/bpf-next/c/2f884d371faf
  - [bpf,v2,2/2] bpf, lpm_trie: Allow sleepable programs to use LPM trie maps directly
    https://git.kernel.org/bpf/bpf-next/c/a3d76e27bbbf

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH bpf v2 2/2] bpf, lpm_trie: Allow sleepable programs to use LPM trie maps directly
  2026-06-09 13:55   ` [PATCH bpf v2 2/2] bpf, lpm_trie: Allow sleepable programs to use LPM trie maps directly Vlad Poenaru
  2026-06-09 16:19     ` Emil Tsalapatis
@ 2026-06-10  1:53     ` Hou Tao
  2026-06-10  2:34       ` Alexei Starovoitov
  1 sibling, 1 reply; 13+ messages in thread
From: Hou Tao @ 2026-06-10  1:53 UTC (permalink / raw)
  To: Vlad Poenaru, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, John Fastabend, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Jiri Olsa, Toke Høiland-Jørgensen
  Cc: Emil Tsalapatis, linux-kernel

Hi,

On 6/9/2026 9:55 PM, Vlad Poenaru wrote:
> The previous change relaxed the rcu_dereference annotations in
> lpm_trie.c so the trie walks no longer trip lockdep when reached from a
> sleepable BPF program holding only rcu_read_lock_trace().  By itself
> that only helps tries reached as the inner map of a map-of-maps, or
> from the classic-RCU syscall path: a sleepable program that references
> an LPM trie directly is still rejected at load time by
> check_map_prog_compatibility(), whose sleepable whitelist omits
> BPF_MAP_TYPE_LPM_TRIE:
>
>   Sleepable programs can only use array, hash, ringbuf and local storage maps
>
> LPM trie nodes are allocated from a bpf_mem_alloc (trie->ma) and freed
> with bpf_mem_cache_free_rcu(), which chains a regular RCU grace period
> into a Tasks Trace grace period before the node -- and the value
> embedded in it that trie_lookup_elem() returns to the program -- is
> released.  That is the same reclaim discipline BPF_MAP_TYPE_HASH relies
> on for sleepable access, so a value handed to a sleepable reader cannot
> be freed while the program is still running under rcu_read_lock_trace().
> The writer paths take trie->lock across the walk and never relied on the
> RCU read-side lock to keep nodes alive.

For trie_lookup_elem(), I think it is not safe to enable its use in
sleepable programs as the patch does, because it may return an
unexpected value. The main reason is that rcu_read_lock_trace() cannot
guarantee that the node currently being looked up will not be reused
concurrently by another update. rcu_read_lock(), however, does provide
that guarantee, because bpf_mem_cache_free_rcu() makes a node reusable
only after one RCU grace period. I think the hash table has a similar
problem, though it already uses some tricks (the hlist_nulls_node
variants) to mitigate it.
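
The reuse concern can be illustrated with a small userspace model. Everything here is a hypothetical sketch: the one-slot cache, struct node and the function names are invented for illustration and do not mirror the bpf_mem_alloc implementation. The only behavior assumed is the one described in this thread, namely that a freed node becomes reusable after a classic RCU grace period, which a reader holding only the Tasks Trace lock does not delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an LPM trie node; not the kernel layout. */
struct node {
	unsigned int prefixlen;
	int value;
};

/* One-slot "cache" so that reuse of the same memory is deterministic. */
static struct node slot;
static bool slot_reusable = true;

static struct node *cache_alloc(unsigned int prefixlen, int value)
{
	if (!slot_reusable)
		return NULL;
	slot_reusable = false;
	slot.prefixlen = prefixlen;
	slot.value = value;
	return &slot;
}

/* Models a deferred free: the node is only queued, not reusable yet. */
static void cache_free_rcu(struct node *n)
{
	assert(n == &slot);
}

/* Models the end of a classic RCU grace period: node becomes reusable. */
static void classic_rcu_gp_elapsed(void)
{
	slot_reusable = true;
}

/*
 * A sleepable reader holds only the Tasks Trace read lock, so it does not
 * delay classic_rcu_gp_elapsed(). Returns the value the reader observes.
 */
static int sleepable_reader_race(void)
{
	struct node *seen;

	slot_reusable = true;			/* reset the model */
	seen = cache_alloc(24, 100);		/* reader finds this node */

	cache_free_rcu(seen);			/* writer deletes it */
	classic_rcu_gp_elapsed();		/* reader does not block this */
	cache_alloc(8, 200);			/* writer reuses the memory */

	return seen->value;			/* reader reads recycled node */
}
```

In this model the reader looked up a node holding 100 but reads 200 from the recycled memory. The memory is still valid, so this is not a use-after-free, but the value belongs to a different entry, which is the "unexpected value" case.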
>
> Add BPF_MAP_TYPE_LPM_TRIE to the sleepable map whitelist so these
> programs can use LPM tries directly.
>
> Signed-off-by: Vlad Poenaru <vlad.wing@gmail.com>
> ---
>  kernel/bpf/verifier.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 7fb88e1cd7c4..71c1e59e4df4 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -18122,6 +18122,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
>  		case BPF_MAP_TYPE_PERCPU_HASH:
>  		case BPF_MAP_TYPE_PERCPU_ARRAY:
>  		case BPF_MAP_TYPE_LRU_PERCPU_HASH:
> +		case BPF_MAP_TYPE_LPM_TRIE:
>  		case BPF_MAP_TYPE_ARRAY_OF_MAPS:
>  		case BPF_MAP_TYPE_HASH_OF_MAPS:
>  		case BPF_MAP_TYPE_RINGBUF:



* Re: [PATCH bpf v2 2/2] bpf, lpm_trie: Allow sleepable programs to use LPM trie maps directly
  2026-06-10  1:53     ` Hou Tao
@ 2026-06-10  2:34       ` Alexei Starovoitov
  0 siblings, 0 replies; 13+ messages in thread
From: Alexei Starovoitov @ 2026-06-10  2:34 UTC (permalink / raw)
  To: Hou Tao, Vlad Poenaru, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, John Fastabend, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Jiri Olsa, Toke Høiland-Jørgensen
  Cc: Emil Tsalapatis, linux-kernel

On Tue Jun 9, 2026 at 6:53 PM PDT, Hou Tao wrote:
> Hi,
>
> On 6/9/2026 9:55 PM, Vlad Poenaru wrote:
>> The previous change relaxed the rcu_dereference annotations in
>> lpm_trie.c so the trie walks no longer trip lockdep when reached from a
>> sleepable BPF program holding only rcu_read_lock_trace().  By itself
>> that only helps tries reached as the inner map of a map-of-maps, or
>> from the classic-RCU syscall path: a sleepable program that references
>> an LPM trie directly is still rejected at load time by
>> check_map_prog_compatibility(), whose sleepable whitelist omits
>> BPF_MAP_TYPE_LPM_TRIE:
>>
>>   Sleepable programs can only use array, hash, ringbuf and local storage maps
>>
>> LPM trie nodes are allocated from a bpf_mem_alloc (trie->ma) and freed
>> with bpf_mem_cache_free_rcu(), which chains a regular RCU grace period
>> into a Tasks Trace grace period before the node -- and the value
>> embedded in it that trie_lookup_elem() returns to the program -- is
>> released.  That is the same reclaim discipline BPF_MAP_TYPE_HASH relies
>> on for sleepable access, so a value handed to a sleepable reader cannot
>> be freed while the program is still running under rcu_read_lock_trace().
>> The writer paths take trie->lock across the walk and never relied on the
>> RCU read-side lock to keep nodes alive.
>
> For trie_lookup_elem(), I think it is not safe to enable its use in
> sleepable programs as the patch does, because it may return an
> unexpected value. The main reason is that rcu_read_lock_trace() cannot
> guarantee that the node currently being looked up will not be reused
> concurrently by another update. rcu_read_lock(), however, does provide
> that guarantee, because bpf_mem_cache_free_rcu() makes a node reusable
> only after one RCU grace period. I think the hash table has a similar
> problem, though it already uses some tricks (the hlist_nulls_node
> variants) to mitigate it.

You're correct. I remember that discussion.
Yet people already use lpm via map-in-map bug/workaround.
So I applied this set to make lpm-in-sleepable usage official
and force us to do a proper fix.

Also both AI bots didn't spot an issue, so the bug won't be
discovered immediately and we won't see a flurry of
"security" reports with slop "fixes". AI isn't that smart yet.


Thread overview: 13+ messages
2026-05-29 17:42 [PATCH bpf] bpf, lpm_trie: Allow lookups from sleepable BPF programs Vlad Poenaru
2026-05-29 19:02 ` sashiko-bot
2026-05-29 19:30   ` Emil Tsalapatis
2026-06-07  9:17   ` Kumar Kartikeya Dwivedi
2026-05-29 19:19 ` Emil Tsalapatis
2026-06-09 13:55 ` [PATCH bpf v2 0/2] bpf, lpm_trie: Allow sleepable BPF programs to use LPM tries Vlad Poenaru
2026-06-09 13:55   ` [PATCH bpf v2 1/2] bpf, lpm_trie: Allow access from sleepable BPF programs Vlad Poenaru
2026-06-09 16:36     ` Emil Tsalapatis
2026-06-09 13:55   ` [PATCH bpf v2 2/2] bpf, lpm_trie: Allow sleepable programs to use LPM trie maps directly Vlad Poenaru
2026-06-09 16:19     ` Emil Tsalapatis
2026-06-10  1:53     ` Hou Tao
2026-06-10  2:34       ` Alexei Starovoitov
2026-06-09 19:50   ` [PATCH bpf v2 0/2] bpf, lpm_trie: Allow sleepable BPF programs to use LPM tries patchwork-bot+netdevbpf
