Re: [PATCH] bpf: Clear rb node linkage when freeing bpf_rb_root

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Kaitao Cheng <kaitao.cheng@linux.dev>
To: sashiko-reviews@lists.linux.dev
Cc: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org, martin.lau@linux.dev, eddyz87@gmail.com,
	memxor@gmail.com, song@kernel.org, yonghong.song@linux.dev,
	jolsa@kernel.org, davemarchevsky@fb.com,
	linux-kernel@vger.kernel.org, chengkaitao@kylinos.cn,
	martin.lau@kernel.org, clm@meta.com, ihor.solodrai@linux.dev
Subject: Re: [PATCH] bpf: Clear rb node linkage when freeing bpf_rb_root
Date: Wed, 3 Jun 2026 16:42:30 +0800	[thread overview]
Message-ID: <fc9077f5-ec68-43d9-a874-c110e38b2530@linux.dev> (raw)
In-Reply-To: <20260601061503.EFD881F00898@smtp.kernel.org>

在 2026/6/1 14:15, sashiko-bot@kernel.org 写道:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> 
> New issues:
> - [High] Unbounded O(N) red-black tree traversal inside an IRQ-disabled spinlock critical section introduces latency spikes, potential soft/hard lockups, and deadlocks.
> 
> Pre-existing issues:
> - [High] Nodes added to a deleted map element's rbtree after `bpf_obj_free_fields` are permanently leaked.
> --
> 
> commit 867fefb5dd2755c5bd12c9f3bd37d5367c6793f3
> Author: Kaitao Cheng <chengkaitao@kylinos.cn>
> 
> bpf: Clear rb node linkage when freeing bpf_rb_root
> 
> This commit modifies bpf_rb_root_free to clear red-black node linkage when
> freeing a root. It mirrors list draining behavior by marking nodes as busy
> while detaching the root, clearing the linkage, and releasing the owner
> before dropping the containing object.
> 
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index b5314c9fed3cf..c7f26336984f0 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2296,22 +2296,30 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
>>  {
>>  	struct rb_root_cached orig_root, *root = rb_root;
>>  	struct rb_node *pos, *n;
>> -	void *obj;
>>  
>>  	BUILD_BUG_ON(sizeof(struct rb_root_cached) > sizeof(struct bpf_rb_root));
>>  	BUILD_BUG_ON(__alignof__(struct rb_root_cached) > __alignof__(struct bpf_rb_root));
>>  
>>  	__bpf_spin_lock_irqsave(spin_lock);
>>  	orig_root = *root;
>> +	bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
>> +		struct bpf_rb_node_kern *node;
>> +
>> +		node = rb_entry(pos, struct bpf_rb_node_kern, rb_node);
>> +		WRITE_ONCE(node->owner, BPF_PTR_POISON);
>> +	}
> 
> [Severity: High]
> Since there is no verifier-enforced limit on the number of nodes in a BPF
> rbtree, could this O(N) post-order traversal trigger latency spikes, RCU
> stalls, or NMI watchdog lockups?
> 
> This loop runs inside the __bpf_spin_lock_irqsave() critical section, which
> disables local interrupts.

The critical sections protected by the bpf_spin_lock are very small.
They only do simple pointer/value assignments and do not perform
any expensive operation, so I do not think they should introduce
noticeable latency by themselves.

> Additionally, since rb_first_postorder() and rb_next_postorder() are
> exported and traceable, if an fentry/fexit BPF program is attached to them
> and attempts to acquire the same bpf_spin_lock, will it cause a hard deadlock?

I am not entirely sure whether this needs to be changed. I would like
to hear others' opinions on this. If we do want to avoid calling
rb_first_postorder()/rb_next_postorder() here, the straightforward
fix would be to duplicate these two helpers with exactly the same
implementation but under different names, and use the renamed versions
locally. That feels somewhat redundant to me, though.

>>  	*root = RB_ROOT_CACHED;
>>  	__bpf_spin_unlock_irqrestore(spin_lock);
> 
> [Severity: High]
> This isn't a bug introduced by this patch, but does this pattern permanently
> leak memory if a concurrent BPF program adds a node after the tree is emptied?
> 
> When a map element is replaced or deleted, bpf_obj_free_fields() detaches
> the tree here and sets *root = RB_ROOT_CACHED. However, a concurrent BPF
> program running inside an RCU critical section might still hold a pointer
> to the map element.
> 
> If that program takes the spinlock and calls bpf_rbtree_add(), it will
> successfully add a new node to the now-emptied tree.
> 
> After the RCU grace period ends, bpf_mem_free_rcu() frees the map element,
> but bpf_obj_free_fields() is never called again. Will the newly added node
> be left behind and leaked?

This looks like a more general pre-existing issue. I am not sure whether
we should fix it in this series, and would like to hear others' opinions.

If we do want to fix it here, maybe we need to distinguish map_free/delete
from in-place map_update. For map_free/delete, a concurrent bpf_rbtree_add()
after the tree is detached should fail and drop the node. For in-place
map_update, the root should still be reset so that it remains usable.

-- 
Thanks
Kaitao Cheng

next prev parent reply	other threads:[~2026-06-03  8:43 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-01  5:58 [PATCH] bpf: Clear rb node linkage when freeing bpf_rb_root Kaitao Cheng
2026-06-01  6:15 ` sashiko-bot
2026-06-03  8:42   ` Kaitao Cheng [this message]
2026-06-01  6:44 ` bot+bpf-ci
2026-06-01 18:06 ` Yonghong Song
2026-06-05  9:51   ` Kaitao Cheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fc9077f5-ec68-43d9-a874-c110e38b2530@linux.dev \
    --to=kaitao.cheng@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=chengkaitao@kylinos.cn \
    --cc=clm@meta.com \
    --cc=daniel@iogearbox.net \
    --cc=davemarchevsky@fb.com \
    --cc=eddyz87@gmail.com \
    --cc=ihor.solodrai@linux.dev \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.lau@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=memxor@gmail.com \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.