From: Kaitao Cheng <kaitao.cheng@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Eduard Zingerman <eddyz87@gmail.com>
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, ast@kernel.org, memxor@gmail.com,
corbet@lwn.net, martin.lau@linux.dev, daniel@iogearbox.net,
andrii@kernel.org, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, shuah@kernel.org,
chengkaitao@kylinos.cn, skhan@linuxfoundation.org,
vmalik@redhat.com, linux-kselftest@vger.kernel.org,
martin.lau@kernel.org, clm@meta.com, ihor.solodrai@linux.dev,
bot+bpf-ci@kernel.org
Subject: Re: [PATCH RESEND bpf-next v10 2/8] bpf: clear list node owner and unlink before drop
Date: Fri, 15 May 2026 12:34:38 +0800
Message-ID: <0fb2d99b-b122-44fa-a8bc-9befe6e350bc@linux.dev>
In-Reply-To: <DII0TT9LXYCX.2GMM6QA4Q9BPZ@gmail.com>
On 2026/5/14 09:50, Alexei Starovoitov wrote:
> On Wed May 13, 2026 at 3:53 PM PDT, Eduard Zingerman wrote:
>> On Tue, 2026-05-12 at 06:41 +0000, bot+bpf-ci@kernel.org wrote:
>>
>> [...]
>>
>>> When a BPF program holds an owning or refcount-acquired reference to
>>> one of these nodes (node X), which is structurally supported because
>>> __bpf_obj_drop_impl() uses refcount_dec_and_test() and only frees at
>>> refcount 0, a concurrent push to a DIFFERENT bpf_list_head becomes a
>>> corruption:
>>>
>>>   CPU 0 (bpf_list_head_free, lock released)   CPU 1 (BPF prog, refcount X)
>>>   -----------------------------------------   ----------------------------
>>>   (owner of X == NULL, X linked in drain)
>>>                                               bpf_list_push_back(other, X)
>>>                                                 __bpf_list_add: spin_lock()
>>>                                                 cmpxchg(X->owner, NULL,
>>>                                                         POISON) -> OK
>>>                                                 list_add_tail(&X->list_head,
>>>                                                               other_head)
>>>                                                 -> overwrites X->next,
>>>                                                    X->prev, corrupts
>>>                                                    other_head's chain
>>>                                                    because X is still
>>>                                                    stitched into drain
>>>   pos = drain.next;   (may be X or neighbor using X's stale next)
>>>   list_del_init(pos); reads X->next/prev now pointing into other_head,
>>>                       corrupts other_head's list and/or drain
>>
>>
>> Kaitao, this scenario seems plausible, could you please comment on it?
>
> I think bot is correct.
> This patch looks buggy.
> It seems to me to be an optimization that breaks the concurrent logic.
> Maybe just drop this patch and reorder the other one, so that the bot
> sees the nonown suffix logic first.
This patch is still necessary because it addresses the problem discussed
in this thread:
https://lore.kernel.org/all/DH846C0P88QU.16YT12I1LXBZM@etsalapatis.com/
The patch does have a bug, however. To fix the issues we are seeing now,
I propose the additional changes below and would appreciate feedback.
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2263,8 +2263,10 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 	if (!head->next || list_empty(head))
 		goto unlock;
 	list_for_each_safe(pos, n, head) {
-		WRITE_ONCE(container_of(pos,
-			struct bpf_list_node_kern, list_head)->owner, NULL);
+		struct bpf_list_node_kern *node;
+
+		node = container_of(pos, struct bpf_list_node_kern, list_head);
+		WRITE_ONCE(node->owner, BPF_PTR_POISON);
 		list_move_tail(pos, &drain);
 	}
 unlock:
@@ -2272,8 +2274,12 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 	__bpf_spin_unlock_irqrestore(spin_lock);
 
 	while (!list_empty(&drain)) {
+		struct bpf_list_node_kern *node;
+
 		pos = drain.next;
+		node = container_of(pos, struct bpf_list_node_kern, list_head);
 		list_del_init(pos);
+		WRITE_ONCE(node->owner, NULL);
 		/* The contained type can also have resources, including a
 		 * bpf_list_head which needs to be freed.
 		 */
@@ -2481,6 +2487,14 @@ static int __bpf_list_add(struct bpf_list_node_kern *node,
 	if (unlikely(!h->next))
 		INIT_LIST_HEAD(h);
 
+	/* bpf_list_head_free() marks nodes being detached with BPF_PTR_POISON
+	 * before list_del_init(). cmpxchg(NULL, POISON) below would fail with
+	 * that old value and fall into the generic error path, which wrongly
+	 * calls __bpf_obj_drop_impl(). Reject POISON up front instead.
+	 */
+	if (READ_ONCE(node->owner) == BPF_PTR_POISON)
+		return -EINVAL;
+
 	/* node->owner != NULL implies !list_empty(n), no need to separately
 	 * check the latter
 	 */
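
For illustration only, the owner transitions this change is meant to
enforce can be modelled in user space. The sketch below is not kernel
code: struct node, POISON, model_list_add(), model_drain_mark(),
model_drain_release() and drain_scenario() are hypothetical stand-ins,
the model is single-threaded, and it leaves out the spin lock and the
__bpf_obj_drop_impl() error path. It only shows that a node marked
POISON during the drain is refused by a push to a different head until
it has been unlinked and its owner cleared.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct list_head { struct list_head *next, *prev; };
struct node {
	struct list_head list_head;
	_Atomic(void *) owner;		/* NULL, POISON, or the owning head */
};

static char poison_obj;
#define POISON ((void *)&poison_obj)	/* models BPF_PTR_POISON */
#define node_of(p) \
	((struct node *)((char *)(p) - offsetof(struct node, list_head)))

static void init_list(struct list_head *h) { h->next = h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list(n);
}

/* Models the entry checks of __bpf_list_add() with the proposed change. */
static int model_list_add(struct node *n, struct list_head *h)
{
	void *expected = NULL;

	if (atomic_load(&n->owner) == POISON)
		return -EINVAL;		/* node is being drained: reject */
	if (!atomic_compare_exchange_strong(&n->owner, &expected, POISON))
		return -EINVAL;		/* node already belongs to a list */
	add_tail(&n->list_head, h);
	atomic_store(&n->owner, (void *)h);
	return 0;
}

/* Drain phase 1 (under the lock in the real code): mark, then move. */
static void model_drain_mark(struct list_head *head, struct list_head *drain)
{
	while (!list_is_empty(head)) {
		struct list_head *pos = head->next;

		atomic_store(&node_of(pos)->owner, POISON);
		del_init(pos);
		add_tail(pos, drain);
	}
}

/* Drain phase 2 (lock released): unlink first, clear owner afterwards. */
static void model_drain_release(struct list_head *drain)
{
	while (!list_is_empty(drain)) {
		struct list_head *pos = drain->next;

		del_init(pos);
		atomic_store(&node_of(pos)->owner, NULL);
	}
}

/* Returns 0 when every step behaves as described above. */
static int drain_scenario(void)
{
	struct list_head a, b, drain;
	struct node x = { .owner = NULL };
	int bad = 0;

	init_list(&a);
	init_list(&b);
	init_list(&drain);
	init_list(&x.list_head);

	bad += model_list_add(&x, &a) != 0;	   /* X joins list a */
	model_drain_mark(&a, &drain);		   /* a is being freed */
	bad += model_list_add(&x, &b) != -EINVAL; /* push to b is refused */
	bad += !list_is_empty(&b);		   /* b was never touched */
	model_drain_release(&drain);
	bad += atomic_load(&x.owner) != NULL;	   /* owner cleared last */
	bad += model_list_add(&x, &b) != 0;	   /* X can now be reused */
	return bad;
}
```

Calling drain_scenario() from a small main() exercises the whole
sequence. The point of the ordering is that the owner only returns to
NULL after list_del_init(), so a successful cmpxchg(NULL, POISON) can
never observe a node that is still stitched into the drain list.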
--
Thanks
Kaitao Cheng