From: Eduard Zingerman <eddyz87@gmail.com>
To: Kaitao Cheng <kaitao.cheng@linux.dev>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	 linux-doc@vger.kernel.org, ast@kernel.org, memxor@gmail.com,
	corbet@lwn.net, 	martin.lau@linux.dev, daniel@iogearbox.net,
	andrii@kernel.org, song@kernel.org, 	yonghong.song@linux.dev,
	john.fastabend@gmail.com, kpsingh@kernel.org, 	sdf@fomichev.me,
	haoluo@google.com, jolsa@kernel.org, shuah@kernel.org,
	 chengkaitao@kylinos.cn, skhan@linuxfoundation.org,
	vmalik@redhat.com,  linux-kselftest@vger.kernel.org,
	martin.lau@kernel.org, clm@meta.com,  ihor.solodrai@linux.dev,
	bot+bpf-ci@kernel.org
Subject: Re: [PATCH RESEND bpf-next v10 2/8] bpf: clear list node owner and unlink before drop
Date: Fri, 15 May 2026 11:24:09 -0700
Message-ID: <7fa6794161a8bd4fdbc21dad68e86e9770c873cc.camel@gmail.com>
In-Reply-To: <0fb2d99b-b122-44fa-a8bc-9befe6e350bc@linux.dev>

On Fri, 2026-05-15 at 12:34 +0800, Kaitao Cheng wrote:
>
> On 2026/5/14 09:50, Alexei Starovoitov wrote:
> > On Wed May 13, 2026 at 3:53 PM PDT, Eduard Zingerman wrote:
> > > On Tue, 2026-05-12 at 06:41 +0000, bot+bpf-ci@kernel.org wrote:
> > >
> > > [...]
> > >
> > > > When a BPF program holds an owning or refcount-acquired reference to
> > > > one of these nodes (node X), which is structurally supported because
> > > > __bpf_obj_drop_impl() uses refcount_dec_and_test() and only frees at
> > > > refcount 0, a concurrent push to a DIFFERENT bpf_list_head becomes a
> > > > corruption:
> > > >
> > > > CPU 0 (bpf_list_head_free, lock released)  CPU 1 (BPF prog, refcount X)
> > > > -----------------------------------------   ----------------------------
> > > > (owner of X == NULL, X linked in drain)
> > > >                                             bpf_list_push_back(other, X)
> > > >                                               __bpf_list_add: spin_lock()
> > > >                                               cmpxchg(X->owner, NULL,
> > > >                                                       POISON) -> OK
> > > >                                               list_add_tail(&X->list_head,
> > > >                                                             other_head)
> > > >                                                 -> overwrites X->next,
> > > >                                                    X->prev, corrupts
> > > >                                                    other_head's chain
> > > >                                                    because X is still
> > > >                                                    stitched into drain
> > > > pos = drain.next;      (may be X or neighbor using X's stale next)
> > > > list_del_init(pos);    reads X->next/prev now pointing into other_head,
> > > >                        corrupts other_head's list and/or drain
> > >
> > >
> > > Kaitao, this scenario seems plausible, could you please comment on it?
> >
> > I think bot is correct.
> > This patch looks buggy.
> > It seems to me an optimization that breaks the concurrent logic.
> > Maybe just drop this patch and reorder the other one, so that bot
> > sees nonown suffix logic first.
>
> This patch is still necessary because it addresses the problem discussed
> in this thread:
> https://lore.kernel.org/all/DH846C0P88QU.16YT12I1LXBZM@etsalapatis.com/
>
> The patch does have a bug, however. To fix the issues we are seeing now,
> I propose the additional changes below and would appreciate feedback.
>
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2263,8 +2263,10 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
>         if (!head->next || list_empty(head))
>                 goto unlock;
>         list_for_each_safe(pos, n, head) {
> -               WRITE_ONCE(container_of(pos,
> -                       struct bpf_list_node_kern, list_head)->owner, NULL);
> +               struct bpf_list_node_kern *node;
> +
> +               node = container_of(pos, struct bpf_list_node_kern, list_head);
> +               WRITE_ONCE(node->owner, BPF_PTR_POISON);
>                 list_move_tail(pos, &drain);
>         }
>  unlock:
> @@ -2272,8 +2274,12 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
>         __bpf_spin_unlock_irqrestore(spin_lock);
>
>         while (!list_empty(&drain)) {
> +               struct bpf_list_node_kern *node;
> +
>                 pos = drain.next;
> +               node = container_of(pos, struct bpf_list_node_kern, list_head);
>                 list_del_init(pos);
> +               WRITE_ONCE(node->owner, NULL);

I think this still leaves a short race window open.
Why does the .owner field have to be NULL?
Can the logic that requires it to be NULL be extended to accept
POISON as well?

>                 /* The contained type can also have resources, including a
>                  * bpf_list_head which needs to be freed.
>                  */

[...]
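
To make the question concrete, here is a minimal user-space sketch of the
owner handshake in the proposed change, assuming C11 atomics stand in for
the kernel's cmpxchg() and WRITE_ONCE(). The names model_node, MODEL_POISON,
model_try_claim, model_drain_mark and model_drain_release are hypothetical
and do not come from the patch. It models only the owner field, not the list
linkage or the spin lock, so it neither demonstrates nor rules out the
remaining race window.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's BPF_PTR_POISON sentinel. */
#define MODEL_POISON ((void *)0x1)

/* Hypothetical model of the owner field of struct bpf_list_node_kern. */
struct model_node {
	_Atomic(void *) owner;
};

/*
 * Models the claim step of a push, i.e. cmpxchg(owner, NULL, POISON):
 * it succeeds only when owner is currently NULL.
 */
static bool model_try_claim(struct model_node *node)
{
	void *expected = NULL;

	return atomic_compare_exchange_strong(&node->owner, &expected,
					      MODEL_POISON);
}

/* Models the locked phase of the drain: mark the node as busy. */
static void model_drain_mark(struct model_node *node)
{
	atomic_store(&node->owner, MODEL_POISON);
}

/* Models the unlocked phase: reset owner once the node is unlinked. */
static void model_drain_release(struct model_node *node)
{
	atomic_store(&node->owner, NULL);
}
```

In this model, a claim attempted while the node is marked with the poison
value fails, and it succeeds only after the drain has reset owner to NULL
following the unlink.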


Thread overview: 32+ messages
2026-05-12  5:59 [PATCH RESEND bpf-next v10 0/8] bpf: Extend the bpf_list family of APIs Kaitao cheng
2026-05-12  5:59 ` [PATCH RESEND bpf-next v10 1/8] bpf: refactor __bpf_list_del to take list node pointer Kaitao cheng
2026-05-12  6:41   ` bot+bpf-ci
2026-05-12  8:55     ` Kaitao Cheng
2026-05-13 22:30   ` Eduard Zingerman
2026-05-12  5:59 ` [PATCH RESEND bpf-next v10 2/8] bpf: clear list node owner and unlink before drop Kaitao cheng
2026-05-12  6:41   ` bot+bpf-ci
2026-05-13 22:53     ` Eduard Zingerman
2026-05-14  1:50       ` Alexei Starovoitov
2026-05-15  4:34         ` Kaitao Cheng
2026-05-15 18:24           ` Eduard Zingerman [this message]
2026-05-13  6:02   ` sashiko-bot
2026-05-12  5:59 ` [PATCH RESEND bpf-next v10 3/8] bpf: Introduce the bpf_list_del kfunc Kaitao cheng
2026-05-12  6:41   ` bot+bpf-ci
2026-05-12  9:36     ` Kaitao Cheng
2026-05-13 22:32   ` Eduard Zingerman
2026-05-12  5:59 ` [PATCH RESEND bpf-next v10 4/8] bpf: refactor __bpf_list_add to take insertion point via **prev_ptr Kaitao cheng
2026-05-13 22:33   ` Eduard Zingerman
2026-05-12  5:59 ` [PATCH RESEND bpf-next v10 5/8] bpf: Add bpf_list_add to insert node after a given list node Kaitao cheng
2026-05-12  6:41   ` bot+bpf-ci
2026-05-12 12:05     ` Kaitao Cheng
2026-05-13 20:44   ` sashiko-bot
2026-05-13 22:35   ` Eduard Zingerman
2026-05-12  5:59 ` [PATCH RESEND bpf-next v10 6/8] bpf: add bpf_list_is_first/last/empty kfuncs Kaitao cheng
2026-05-13 22:35   ` Eduard Zingerman
2026-05-12  5:59 ` [PATCH RESEND bpf-next v10 7/8] bpf: allow non-owning list-node args via __nonown_allowed Kaitao cheng
2026-05-12  6:41   ` bot+bpf-ci
2026-05-13 22:22   ` sashiko-bot
2026-05-13 22:37   ` Eduard Zingerman
2026-05-13 22:55     ` Eduard Zingerman
2026-05-12  5:59 ` [PATCH RESEND bpf-next v10 8/8] selftests/bpf: Add test cases for bpf_list_del/add/is_first/is_last/empty Kaitao cheng
2026-05-13 22:44   ` sashiko-bot
