From: bot+bpf-ci@kernel.org
To: kaitao.cheng@linux.dev,ast@kernel.org,corbet@lwn.net,martin.lau@linux.dev,daniel@iogearbox.net,andrii@kernel.org,eddyz87@gmail.com,song@kernel.org,yonghong.song@linux.dev,john.fastabend@gmail.com,kpsingh@kernel.org,sdf@fomichev.me,haoluo@google.com,jolsa@kernel.org,shuah@kernel.org,chengkaitao@kylinos.cn,skhan@linuxfoundation.org,memxor@gmail.com
Cc: bpf@vger.kernel.org,linux-kernel@vger.kernel.org,linux-doc@vger.kernel.org,vmalik@redhat.com,linux-kselftest@vger.kernel.org,ast@kernel.org,andrii@kernel.org,daniel@iogearbox.net,martin.lau@kernel.org,eddyz87@gmail.com,yonghong.song@linux.dev,clm@meta.com,ihor.solodrai@linux.dev
Subject: Re: [PATCH RESEND bpf-next v10 2/8] bpf: clear list node owner and unlink before drop
Date: Tue, 12 May 2026 06:41:10 +0000 (UTC) [thread overview]
Message-ID: <da95f45b43d94d17796b19abc4c555e35daafab7d5e1cdca3714fe0f54ab8cce@mail.kernel.org> (raw)
In-Reply-To: <20260512055919.95716-3-kaitao.cheng@linux.dev>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 9cd7b028592c..1e8754877dd1 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2247,10 +2247,11 @@ EXPORT_SYMBOL_GPL(bpf_base_func_proto);
> void bpf_list_head_free(const struct btf_field *field, void *list_head,
> struct bpf_spin_lock *spin_lock)
> {
> - struct list_head *head = list_head, *orig_head = list_head;
> + struct list_head *head = list_head, drain, *pos, *n;
>
> BUILD_BUG_ON(sizeof(struct list_head) > sizeof(struct bpf_list_head));
> BUILD_BUG_ON(__alignof__(struct list_head) > __alignof__(struct bpf_list_head));
> + INIT_LIST_HEAD(&drain);
>
> /* Do the actual list draining outside the lock to not hold the lock for
> * too long, and also prevent deadlocks if tracing programs end up
> @@ -2261,20 +2262,23 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
> __bpf_spin_lock_irqsave(spin_lock);
> if (!head->next || list_empty(head))
> goto unlock;
> - head = head->next;
> + list_for_each_safe(pos, n, head) {
> + WRITE_ONCE(container_of(pos,
> + struct bpf_list_node_kern, list_head)->owner, NULL);
> + list_move_tail(pos, &drain);
> + }
> unlock:
> - INIT_LIST_HEAD(orig_head);
> + INIT_LIST_HEAD(head);
> __bpf_spin_unlock_irqrestore(spin_lock);
>
> - while (head != orig_head) {
> - void *obj = head;
> -
> - obj -= field->graph_root.node_offset;
> - head = head->next;
> + while (!list_empty(&drain)) {
> + pos = drain.next;
> + list_del_init(pos);
Does this create a race condition when a BPF program holds a
refcount-acquired reference to a node being freed?
The patch clears each node's owner to NULL under spin_lock, moves the
node to a local 'drain' list, then drops the lock. After unlock, the
drain loop walks the local list and calls list_del_init(pos) before
__bpf_obj_drop_impl().
Between spin_unlock and list_del_init(pos), each node in 'drain' has
owner == NULL AND is still linked in 'drain'.
If a BPF program holds an owning or refcount-acquired reference to one
of these nodes (call it X), a concurrent push of X to a DIFFERENT
bpf_list_head turns into list corruption. This scenario is structurally
possible because __bpf_obj_drop_impl() uses refcount_dec_and_test() and
only frees the object once the refcount hits 0:
CPU 0 (bpf_list_head_free, lock released)   CPU 1 (BPF prog, ref on X)
-----------------------------------------   ---------------------------
(owner of X == NULL,
 X still linked in drain)
                                            bpf_list_push_back(other, X)
                                              __bpf_list_add: spin_lock()
                                              cmpxchg(X->owner, NULL,
                                                      POISON) -> OK
                                              list_add_tail(&X->list_head,
                                                            other_head)
                                                -> overwrites X->next and
                                                   X->prev; corrupts
                                                   other_head's chain,
                                                   because X is still
                                                   stitched into drain
pos = drain.next;
  (may be X, or a neighbor whose
   stale next still points at X)
list_del_init(pos);
  reads X->next/X->prev, which now
  point into other_head; corrupts
  other_head's list and/or drain
Pre-patch, the owner pointer was left pointing at the old head (it was
never cleared), so any concurrent push attempt's cmpxchg(&node->owner,
NULL, POISON) would fail and return -EINVAL, blocking re-insertion.
This matches what bpf_rb_root_free() still does: it does NOT clear
owner, so a concurrent bpf_rbtree_add()'s cmpxchg fails.
The patch clears owner under the lock but still links the node into the
unprotected local drain list, then drops the lock. The owner==NULL
state now acts as a 'push is allowed' signal even though the node is
still on a list, violating the invariant __bpf_list_add() relies on:
kernel/bpf/helpers.c: __bpf_list_add():
	/* node->owner != NULL implies !list_empty(n), no need to separately
	 * check the latter
	 */
	if (cmpxchg(&node->owner, NULL, BPF_PTR_POISON)) {
		__bpf_obj_drop_impl((void *)n - off, rec, false);
		return -EINVAL;
	}
	tail ? list_add_tail(n, h) : list_add(n, h);
	WRITE_ONCE(node->owner, head);
Compounding this, 'drain' is a local stack variable: corruption can
redirect list pointers into kernel stack memory of the freeing CPU,
giving an unbounded list_add/list_del_init corruption vector.
Could you do list_del_init(pos) under the spin_lock, in the same
critical section that clears owner? Alternatively, swap the order
(list_del_init first, then WRITE_ONCE(owner, NULL), as
__bpf_list_del() does), or leave owner non-NULL as bpf_rb_root_free()
does, so the cmpxchg in a concurrent push still fails.
The commit message's stated goal ('preventing subsequent users who hold
a reference count to the node from acquiring an invalid next node') is
only achievable if the unlinking is done before unlock, not after.
> /* The contained type can also have resources, including a
> * bpf_list_head which needs to be freed.
> */
> - __bpf_obj_drop_impl(obj, field->graph_root.value_rec, false);
> + __bpf_obj_drop_impl((char *)pos - field->graph_root.node_offset,
> + field->graph_root.value_rec, false);
> }
> }
bot+bpf-ci@kernel.org raised a concern in v9 about whether this patch
should include a Fixes tag. The CI bot suggested:
Fixes: f0c5941ff5b2 ("bpf: Support bpf_list_head in map values")
since the concurrent reader issue was introduced when bpf_list_head
support was added. The author responded in v10 that the issue only
appears after the later patch 'bpf: Introduce the bpf_list_del kfunc' is
merged, so no Fixes tag was added.
However, by that reasoning the Fixes tag should point to the commit
introducing bpf_list_del rather than to f0c5941ff5b2, so a Fixes tag
may still be appropriate. Should this patch carry a Fixes tag pointing
to the commit that introduces bpf_list_del?
---
AI reviewed your patch. Please fix the bug or reply explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25716874656
Thread overview: 17+ messages
2026-05-12 5:59 [PATCH RESEND bpf-next v10 0/8] bpf: Extend the bpf_list family of APIs Kaitao cheng
2026-05-12 5:59 ` [PATCH RESEND bpf-next v10 1/8] bpf: refactor __bpf_list_del to take list node pointer Kaitao cheng
2026-05-12 6:41 ` bot+bpf-ci
2026-05-12 8:55 ` Kaitao Cheng
2026-05-12 5:59 ` [PATCH RESEND bpf-next v10 2/8] bpf: clear list node owner and unlink before drop Kaitao cheng
2026-05-12 6:41 ` bot+bpf-ci [this message]
2026-05-12 5:59 ` [PATCH RESEND bpf-next v10 3/8] bpf: Introduce the bpf_list_del kfunc Kaitao cheng
2026-05-12 6:41 ` bot+bpf-ci
2026-05-12 9:36 ` Kaitao Cheng
2026-05-12 5:59 ` [PATCH RESEND bpf-next v10 4/8] bpf: refactor __bpf_list_add to take insertion point via **prev_ptr Kaitao cheng
2026-05-12 5:59 ` [PATCH RESEND bpf-next v10 5/8] bpf: Add bpf_list_add to insert node after a given list node Kaitao cheng
2026-05-12 6:41 ` bot+bpf-ci
2026-05-12 12:05 ` Kaitao Cheng
2026-05-12 5:59 ` [PATCH RESEND bpf-next v10 6/8] bpf: add bpf_list_is_first/last/empty kfuncs Kaitao cheng
2026-05-12 5:59 ` [PATCH RESEND bpf-next v10 7/8] bpf: allow non-owning list-node args via __nonown_allowed Kaitao cheng
2026-05-12 6:41 ` bot+bpf-ci
2026-05-12 5:59 ` [PATCH RESEND bpf-next v10 8/8] selftests/bpf: Add test cases for bpf_list_del/add/is_first/is_last/empty Kaitao cheng