BPF List
 help / color / mirror / Atom feed
From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: Cen Zhang <rollkingzzc@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
	Jakub Sitnicki <jakub@cloudflare.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org, zerocling0077@gmail.com,
	2045gemini@gmail.com
Subject: Re: [PATCH] net: skmsg: pin the delayed-work psock in sk_psock_backlog
Date: Fri, 15 May 2026 16:26:01 +0800	[thread overview]
Message-ID: <c5fadf42-583c-4e8c-8484-46781a0581b2@linux.dev> (raw)
In-Reply-To: <CAB7XQsEJDoZHUznSYE3v1DadVXGWP7J21rS4Q=BuMr_jV9tX_w@mail.gmail.com>


On 5/15/26 4:12 PM, Cen Zhang wrote:
> Dear Jiayuan Chen
>
> Jiayuan Chen <jiayuan.chen@linux.dev> 于2026年5月15日周五 14:10写道:
>> Where is the 'last_old_ref_before_put' symbol from? I can't find it
>> anywhere in the tree.
>>
>> If you are using LLMs to dig into races like this, please also have them
>> produce a reproducer, e.g. patch mdelay() into
>>
>> the relevant windows to widen them, then trigger it from userspace.
>>
>>
> Hi Jiayuan,
>
> Thanks for checking this. You are right: last_old_ref_before_put is
> not an in-tree kernel symbol. It was a temporary validation probe
> label which recorded the old psock refcount immediately before the
> backlog worker's final put, and it should not have appeared in the
> commit message as if it were kernel output.
>
> The in-tree path I was trying to describe is:
>
>    sk_psock_backlog() starts at net/core/skmsg.c:670.
>    get path: sk_psock_get(psock->sk), net/core/skmsg.c:692.
>    put path: sk_psock_put(psock->sk, psock), net/core/skmsg.c:746.
>    detach clears sk_user_data at net/core/skmsg.c:892.
>    reattach publishes a replacement psock at net/core/skmsg.c:793.
>    warning path: REFCOUNT_SUB_UAF at lib/refcount.c:28.
>
> The trigger was based on the in-tree sockmap_redir BPF selftest
> under tools/testing/selftests/bpf/prog_tests/.
> The one-shot test used AF_UNIX SOCK_STREAM socket pairs, attached
> the sk_skb verdict program to the input map, inserted one socket
> into the input map and one destination socket into the sockmap at
> key 0, then sent one byte through the input peer so the destination
> psock backlog worker was queued.
> For validation I used a temporary local instrumentation patch in
> net/core/skmsg.c. It added a debugfs-controlled gate in
> sk_psock_backlog() after the TX-enabled check and before the
> existing sk_psock_get(psock->sk) call, plus counters and pr_info()
> snapshots in sk_psock_backlog(), sk_psock_init() and
> sk_psock_drop(). It also stored the pointer returned by
> sk_psock_get(psock->sk) for logging. The worker still used the
> existing get path and the existing sk_psock_put(psock->sk, psock)
> exit path.
> With the worker parked before sk_psock_get(psock->sk), the test
> forked: the child deleted the destination sockmap entry, and the
> parent retried BPF_NOEXIST update of the same key with the same
> destination socket fd until reattach succeeded.
> After the delete completed, the test released the old worker. At
> that point sk->sk_user_data referred to the replacement psock, while

So, should the fix swap the order of sk->sk_user_data = null and 
sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED)?


> the delayed work still belonged to the old psock. The recorded state
> before the warning had the sk_user_data psock and the psock returned
> by sk_psock_get(psock->sk) equal to each other, but different from
> the delayed-work container. The instrumentation was only used to make
> that interleaving deterministic and observable. The warning below is
> the kernel's normal refcount warning path.
>
> The native kernel report from that run was:
>
>    refcount_t: underflow; use-after-free.
>    WARNING: lib/refcount.c:28 at refcount_warn_saturate+0xbf/0xf0
>    Workqueue: events sk_psock_backlog
>    RIP: 0010:refcount_warn_saturate+0xbf/0xf0
>    Call trace:
>      sk_psock_backlog() (net/core/skmsg.c:670)
>      process_one_work() (kernel/workqueue.c:3200)
>
> So the reproducer is instrumentation-assisted, not an
> unmodified upstream selftest. The instrumentation can widen the
> race window and record the participating psock pointers, but it
> does not publish a replacement psock, clear sk->sk_user_data, or
> add an extra put on the old psock. The final warning is reached
> through the existing sk_psock_put(psock->sk, psock) path after
> the test has forced delete-plus-reattach to happen before the
> parked worker resumes.
>
> I will send v2 as a new thread after the netdev 24-hour
> interval, with the lab probe label removed from the commit text.
> If useful, I can also share the small instrumentation/selftest
> diff separately to show the exact widened window.

You can just put the kernel patch and userspace program patch in this 
thread (no need to send a new patch).

Also this patch should be targeted to bpf not net.

--
pw-bot: cr




  reply	other threads:[~2026-05-15  8:26 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-15  5:04 [PATCH] net: skmsg: pin the delayed-work psock in sk_psock_backlog Zhang Cen
2026-05-15  6:09 ` Jiayuan Chen
2026-05-15  8:12   ` Cen Zhang
2026-05-15  8:26     ` Jiayuan Chen [this message]
2026-05-15  8:54       ` Jiayuan Chen
2026-05-15  9:10         ` Cen Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c5fadf42-583c-4e8c-8484-46781a0581b2@linux.dev \
    --to=jiayuan.chen@linux.dev \
    --cc=2045gemini@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jakub@cloudflare.com \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=rollkingzzc@gmail.com \
    --cc=zerocling0077@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox