From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: Cen Zhang <rollkingzzc@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
	Jakub Sitnicki <jakub@cloudflare.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org, zerocling0077@gmail.com,
	2045gemini@gmail.com
Subject: Re: [PATCH] net: skmsg: pin the delayed-work psock in sk_psock_backlog
Date: Fri, 15 May 2026 16:26:01 +0800
Message-ID: <c5fadf42-583c-4e8c-8484-46781a0581b2@linux.dev>
In-Reply-To: <CAB7XQsEJDoZHUznSYE3v1DadVXGWP7J21rS4Q=BuMr_jV9tX_w@mail.gmail.com>


On 5/15/26 4:12 PM, Cen Zhang wrote:
> Dear Jiayuan Chen
>
> On Fri, May 15, 2026 at 14:10, Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>> Where is the 'last_old_ref_before_put' symbol from? I can't find it
>> anywhere in the tree.
>>
>> If you are using LLMs to dig into races like this, please also have them
>> produce a reproducer, e.g. patch mdelay() into the relevant windows to
>> widen them, then trigger it from userspace.
>>
> Hi Jiayuan,
>
> Thanks for checking this. You are right: last_old_ref_before_put is
> not an in-tree kernel symbol. It was a temporary validation probe
> label which recorded the old psock refcount immediately before the
> backlog worker's final put, and it should not have appeared in the
> commit message as if it were kernel output.
>
> The in-tree path I was trying to describe is:
>
>    sk_psock_backlog() starts at net/core/skmsg.c:670.
>    get path: sk_psock_get(psock->sk), net/core/skmsg.c:692.
>    put path: sk_psock_put(psock->sk, psock), net/core/skmsg.c:746.
>    detach clears sk_user_data at net/core/skmsg.c:892.
>    reattach publishes a replacement psock at net/core/skmsg.c:793.
>    warning path: REFCOUNT_SUB_UAF at lib/refcount.c:28.
>
> The trigger was based on the in-tree sockmap_redir BPF selftest
> under tools/testing/selftests/bpf/prog_tests/.
> The one-shot test used AF_UNIX SOCK_STREAM socket pairs, attached
> the sk_skb verdict program to the input map, inserted one socket
> into the input map and one destination socket into the sockmap at
> key 0, then sent one byte through the input peer so the destination
> psock backlog worker was queued.
>
> For validation I used a temporary local instrumentation patch in
> net/core/skmsg.c. It added a debugfs-controlled gate in
> sk_psock_backlog() after the TX-enabled check and before the
> existing sk_psock_get(psock->sk) call, plus counters and pr_info()
> snapshots in sk_psock_backlog(), sk_psock_init() and
> sk_psock_drop(). It also stored the pointer returned by
> sk_psock_get(psock->sk) for logging. The worker still used the
> existing get path and the existing sk_psock_put(psock->sk, psock)
> exit path.
>
> With the worker parked before sk_psock_get(psock->sk), the test
> forked: the child deleted the destination sockmap entry, and the
> parent retried BPF_NOEXIST update of the same key with the same
> destination socket fd until reattach succeeded.
>
> After the delete completed, the test released the old worker. At
> that point sk->sk_user_data referred to the replacement psock, while

So, should the fix swap the order of setting sk->sk_user_data = NULL and
calling sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED)?


> the delayed work still belonged to the old psock. The recorded state
> before the warning had the sk_user_data psock and the psock returned
> by sk_psock_get(psock->sk) equal to each other, but different from
> the delayed-work container. The instrumentation was only used to make
> that interleaving deterministic and observable. The warning below is
> the kernel's normal refcount warning path.
>
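
For readers following along, the recorded state above can be replayed with a
small userspace model. This is only a sketch under assumptions: toy_sock,
toy_psock, toy_psock_get(), toy_psock_put() and toy_replay() are hypothetical
names that do not exist in the kernel, and the plain int counter only imitates
refcount_t. It pairs a get that resolves the psock through the socket with a
put on the delayed-work owner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical); not the kernel's struct sk_psock / struct sock. */
struct toy_psock { int refcnt; };
struct toy_sock  { struct toy_psock *user_data; };

struct toy_result {
    bool same_psock; /* did the get return the delayed-work owner? */
    bool underflow;  /* did the final put hit a zero refcount? */
    int old_refs;    /* refcount left on the old psock */
    int new_refs;    /* refcount left on the replacement psock */
};

/* Imitates a get through the socket: pins whatever psock is published now. */
static struct toy_psock *toy_psock_get(struct toy_sock *sk)
{
    struct toy_psock *p = sk->user_data;

    if (p && p->refcnt > 0) {
        p->refcnt++;
        return p;
    }
    return NULL;
}

/* Imitates a put; returns false where a real refcount would warn. */
static bool toy_psock_put(struct toy_psock *p)
{
    if (p->refcnt <= 0)
        return false;
    p->refcnt--;
    return true;
}

struct toy_result toy_replay(void)
{
    struct toy_psock old = { .refcnt = 1 };   /* reference held by the map entry */
    struct toy_psock repl = { .refcnt = 0 };
    struct toy_sock sk = { .user_data = &old };
    struct toy_psock *work_owner = &old;      /* psock that owns the delayed work */
    struct toy_result r;

    /* Worker is parked here, before its get. */
    sk.user_data = NULL;                      /* delete: unpublish the old psock */
    toy_psock_put(&old);                      /* delete: drop the map reference */
    repl.refcnt = 1;                          /* reattach: new psock, one reference */
    sk.user_data = &repl;                     /* reattach: publish the replacement */

    /* Worker resumes: the get goes through the socket, the put through the owner. */
    struct toy_psock *got = toy_psock_get(&sk);
    assert(got == sk.user_data);

    r.same_psock = (got == work_owner);
    r.underflow = !toy_psock_put(work_owner);
    r.old_refs = old.refcnt;
    r.new_refs = repl.refcnt;
    return r;
}
```

In this model toy_replay() reports an underflow on the old psock and leaves one
extra reference on the replacement. That matches the shape of the report, but
it says nothing about how wide the window is in a real kernel.
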
> The native kernel report from that run was:
>
>    refcount_t: underflow; use-after-free.
>    WARNING: lib/refcount.c:28 at refcount_warn_saturate+0xbf/0xf0
>    Workqueue: events sk_psock_backlog
>    RIP: 0010:refcount_warn_saturate+0xbf/0xf0
>    Call trace:
>      sk_psock_backlog() (net/core/skmsg.c:670)
>      process_one_work() (kernel/workqueue.c:3200)
>
> So the reproducer is instrumentation-assisted, not an
> unmodified upstream selftest. The instrumentation can widen the
> race window and record the participating psock pointers, but it
> does not publish a replacement psock, clear sk->sk_user_data, or
> add an extra put on the old psock. The final warning is reached
> through the existing sk_psock_put(psock->sk, psock) path after
> the test has forced delete-plus-reattach to happen before the
> parked worker resumes.
>
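
The forced ordering described above (park the worker, run delete plus
reattach, then release it) can be sketched in userspace with a mutex and a
condition variable. This is a model under assumptions, not the instrumentation
itself: toy_gate, toy_state, toy_worker() and toy_run_gated() are hypothetical
names, and the integer ids 1 and 2 merely stand for the old and the replacement
psock. The real gate was debugfs-controlled and its code is not shown in this
thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical gate standing in for the debugfs-controlled one. */
struct toy_gate {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool parked;   /* worker has reached the gate */
    bool released; /* test has allowed the worker to continue */
};

struct toy_state {
    struct toy_gate gate;
    int published;      /* id of the psock currently published; 0 means none */
    int seen_by_worker; /* id the worker observed after it resumed */
};

static void *toy_worker(void *arg)
{
    struct toy_state *st = arg;

    pthread_mutex_lock(&st->gate.lock);
    st->gate.parked = true;                 /* tell the test we are parked */
    pthread_cond_broadcast(&st->gate.cond);
    while (!st->gate.released)              /* wait until the test releases us */
        pthread_cond_wait(&st->gate.cond, &st->gate.lock);
    st->seen_by_worker = st->published;     /* the late lookup */
    pthread_mutex_unlock(&st->gate.lock);
    return NULL;
}

int toy_run_gated(void)
{
    struct toy_state st = { .published = 1, .seen_by_worker = -1 };
    pthread_t tid;

    pthread_mutex_init(&st.gate.lock, NULL);
    pthread_cond_init(&st.gate.cond, NULL);
    if (pthread_create(&tid, NULL, toy_worker, &st) != 0)
        return -1;

    pthread_mutex_lock(&st.gate.lock);
    while (!st.gate.parked)                 /* wait for the worker to park */
        pthread_cond_wait(&st.gate.cond, &st.gate.lock);
    assert(st.gate.parked);
    st.published = 0;                       /* delete */
    st.published = 2;                       /* reattach with a replacement */
    st.gate.released = true;                /* only now let the worker run */
    pthread_cond_broadcast(&st.gate.cond);
    pthread_mutex_unlock(&st.gate.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&st.gate.cond);
    pthread_mutex_destroy(&st.gate.lock);
    return st.seen_by_worker;
}
```

toy_run_gated() returns the id the worker observed after release. With the
gate in place that is always the replacement (2), which is what makes the
interleaving deterministic rather than timing-dependent. Build with -pthread
on toolchains that need it.
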
> I will send v2 as a new thread after the netdev 24-hour
> interval, with the lab probe label removed from the commit text.
> If useful, I can also share the small instrumentation/selftest
> diff separately to show the exact widened window.

You can just post the kernel patch and the userspace program patch in this
thread (no need to send a new patch).

Also, this patch should target the bpf tree, not net.

--
pw-bot: cr




