From: Martin KaFai Lau <martin.lau@linux.dev>
To: Kuniyuki Iwashima <kuniyu@google.com>, Jiayuan Chen <mrpre@163.com>
Cc: sashiko@lists.linux.dev, bpf@vger.kernel.org
Subject: Re: [PATCH v2 bpf] sockmap: Fix sk_psock_drop() race vs sock_map_{unhash,close,destroy}().
Date: Thu, 23 Apr 2026 14:17:15 -0700
Message-ID: <202642321855.Yu6v.martin.lau@linux.dev>
In-Reply-To: <062b1b3b-dedd-422b-83a6-1ca78d7270f7@163.com>

On Tue, Apr 21, 2026 at 11:21:27AM +0800, Jiayuan Chen wrote:
> > > @@ -1652,20 +1657,25 @@ void sock_map_destroy(struct sock *sk)
> > >  	void (*saved_destroy)(struct sock *sk);
> > >  	struct sk_psock *psock;
> > > +retry:
> > >  	rcu_read_lock();
> > >  	psock = sk_psock_get(sk);
> > >  	if (unlikely(!psock)) {
> > >  		rcu_read_unlock();
> > >  		saved_destroy = READ_ONCE(sk->sk_prot)->destroy;
> > > +		if (unlikely(saved_destroy == sock_map_destroy))
> > > +			goto retry;
> > Can this unbounded retry loop cause a hard lockup due to softirq preemption
> > or priority inversion?
> >
> > If sk_psock_put() is called from process context with bottom-halves enabled
> > (such as from sock_map_close() after release_sock()), an interrupt could
> > preempt the thread immediately after refcount_dec_and_test(&psock->refcnt)
> > drops to 0 but before sk_psock_drop() disables BH.
> >
> > If a network interrupt fires on the same CPU, NET_RX_SOFTIRQ may process a
> > packet (such as an RST) that triggers socket destruction via:
> > tcp_done() -> inet_csk_destroy_sock() -> sk->sk_prot->destroy()
> >
> > Since sk_psock_drop() has not yet restored the protocol, sk->sk_prot->destroy
> > is still sock_map_destroy().
> >
> > When sock_map_destroy() calls sk_psock_get(), it returns NULL because the
> > refcount is exactly 0. The code then falls into the !psock branch, sees that
> > sk->sk_prot->destroy is still sock_map_destroy(), and jumps to retry.
> >
> > Because the softirq spins infinitely in this tight loop and never yields the
> > CPU, the preempted process context can never execute sk_psock_drop(),
> > resulting in a permanent hard lockup.
>
>
> sock_map_close(sk)
>   |___ sk_psock_put(sk, psock)    <- refcnt-hits-0 window lives here
>   |___ saved_close == tcp_close
>          |__ tcp_close
>                |____ sock_orphan    <- SOCK_DEAD set here
>                |____ (later) inet_csk_destroy_sock
>
> At the exact instant the refcnt can be observed at 0 with
> sk_prot not yet restored, SOCK_DEAD is guaranteed not to be set.
>
> > A similar priority inversion deadlock could also occur on PREEMPT_RT if the
> > thread calling sk_psock_drop() is preempted by a higher-priority task.

Does the same SOCK_DEAD reasoning apply to PREEMPT_RT? It would be useful
to have some explanation of this case in the commit message.

Kuniyuki, does the above make sense? I can fold it in before landing.
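
To make the SOCK_DEAD argument concrete, here is a rough user-space model
of the ordering in Jiayuan's diagram. It is only a sketch: every name in it
(model_sock, model_tcp_done, model_close_with_rst, rst_point) is
hypothetical and not a kernel API. It assumes, as in tcp_done(), that the
destroy path is taken only when SOCK_DEAD is set, and that SOCK_DEAD is set
only by sock_orphan() after sk_psock_put() has returned. It walks one
sequential interleaving per injection point, so it illustrates the claim
rather than proving it for real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_sock {
	int  psock_refcnt;    /* stands in for psock->refcnt */
	bool proto_restored;  /* set once the modelled sk_psock_drop() ran */
	bool sock_dead;       /* stands in for SOCK_DEAD */
	int  destroy_calls;   /* how often the destroy path was entered */
	int  would_spin;      /* destroy entered with refcnt 0, proto not restored */
};

enum rst_point {
	RST_AFTER_PUT,     /* refcnt hit 0, sk_prot not yet restored */
	RST_AFTER_DROP,    /* sk_prot restored, socket not yet orphaned */
	RST_AFTER_ORPHAN,  /* SOCK_DEAD set by the modelled sock_orphan() */
};

/* Models the tail of tcp_done(): only a SOCK_DEAD socket is destroyed. */
static void model_tcp_done(struct model_sock *sk)
{
	if (!sk->sock_dead)
		return; /* the kernel would call sk->sk_state_change() here */

	sk->destroy_calls++;
	if (sk->psock_refcnt == 0 && !sk->proto_restored)
		sk->would_spin++; /* the state the retry loop would spin on */
}

/* Walks the close path from the diagram, injecting one RST at 'at'. */
static struct model_sock model_close_with_rst(enum rst_point at)
{
	struct model_sock sk = { .psock_refcnt = 1 };

	sk.psock_refcnt--;              /* sk_psock_put(): refcnt 1 -> 0 */
	if (at == RST_AFTER_PUT)
		model_tcp_done(&sk);

	sk.proto_restored = true;       /* sk_psock_drop() restores sk_prot */
	if (at == RST_AFTER_DROP)
		model_tcp_done(&sk);

	sk.sock_dead = true;            /* tcp_close() -> sock_orphan() */
	if (at == RST_AFTER_ORPHAN)
		model_tcp_done(&sk);

	assert(sk.psock_refcnt == 0);
	return sk;
}
```

In this model an RST injected inside the refcnt-hits-0 window never enters
the destroy path, so would_spin stays 0 for every injection point. The
PREEMPT_RT case differs only in what preempts the closing thread, not in
when SOCK_DEAD becomes visible, which is why the same reasoning should
carry over if the two assumptions above hold.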

Thread overview: 7+ messages
2026-04-20 19:48 [PATCH v2 bpf] sockmap: Fix sk_psock_drop() race vs sock_map_{unhash,close,destroy}() Kuniyuki Iwashima
2026-04-21 1:13 ` sashiko-bot
2026-04-21 3:21 ` Jiayuan Chen
2026-04-23 21:17 ` Martin KaFai Lau [this message]
2026-04-23 23:02 ` Kuniyuki Iwashima
2026-04-21 9:26 ` Jiayuan Chen
2026-04-24 3:40 ` patchwork-bot+netdevbpf