From: Martin KaFai Lau <martin.lau@linux.dev>
To: Kuniyuki Iwashima <kuniyu@google.com>, Jiayuan Chen <mrpre@163.com>
Cc: sashiko@lists.linux.dev, bpf@vger.kernel.org
Subject: Re: [PATCH v2 bpf] sockmap: Fix sk_psock_drop() race vs sock_map_{unhash,close,destroy}().
Date: Thu, 23 Apr 2026 14:17:15 -0700
Message-ID: <202642321855.Yu6v.martin.lau@linux.dev>
In-Reply-To: <062b1b3b-dedd-422b-83a6-1ca78d7270f7@163.com>

On Tue, Apr 21, 2026 at 11:21:27AM +0800, Jiayuan Chen wrote:
> > > @@ -1652,20 +1657,25 @@ void sock_map_destroy(struct sock *sk)
> > >  	void (*saved_destroy)(struct sock *sk);
> > >  	struct sk_psock *psock;
> > > +retry:
> > >  	rcu_read_lock();
> > >  	psock = sk_psock_get(sk);
> > >  	if (unlikely(!psock)) {
> > >  		rcu_read_unlock();
> > >  		saved_destroy = READ_ONCE(sk->sk_prot)->destroy;
> > > +		if (unlikely(saved_destroy == sock_map_destroy))
> > > +			goto retry;
> >
> > Can this unbounded retry loop cause a hard lockup due to softirq preemption
> > or priority inversion?
> >
> > If sk_psock_put() is called from process context with bottom-halves enabled
> > (such as from sock_map_close() after release_sock()), an interrupt could
> > preempt the thread immediately after refcount_dec_and_test(&psock->refcnt)
> > drops to 0 but before sk_psock_drop() disables BH.
> >
> > If a network interrupt fires on the same CPU, NET_RX_SOFTIRQ may process a
> > packet (such as an RST) that triggers socket destruction via:
> > tcp_done() -> inet_csk_destroy_sock() -> sk->sk_prot->destroy()
> >
> > Since sk_psock_drop() has not yet restored the protocol, sk->sk_prot->destroy
> > is still sock_map_destroy().
> >
> > When sock_map_destroy() calls sk_psock_get(), it returns NULL because the
> > refcount is exactly 0. The code then falls into the !psock branch, sees that
> > sk->sk_prot->destroy is still sock_map_destroy(), and jumps to retry.
> >
> > Because the softirq spins infinitely in this tight loop and never yields the
> > CPU, the preempted process context can never execute sk_psock_drop(),
> > resulting in a permanent hard lockup.
>
>
> sock_map_close(sk)
>   |___ sk_psock_put(sk, psock)   <- refcnt-hits-0 window lives here
>   |___ saved_close == tcp_close
>          |___ tcp_close
>                 |___ sock_orphan   <- SOCK_DEAD set here
>                 |___ (later) inet_csk_destroy_sock
>
> At the exact instant the refcnt can be observed at 0 with
> sk_prot not yet restored, SOCK_DEAD is guaranteed not to be set.
>
> > A similar priority inversion deadlock could also occur on PREEMPT_RT if the
> > thread calling sk_psock_drop() is preempted by a higher-priority task.

Does the same SOCK_DEAD reasoning also apply to PREEMPT_RT?

It would be useful to explain this case in the commit message.

Kuniyuki, does the above make sense? I can fold it in before landing.
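
For readers following the thread, the argument can be illustrated with a
small user-space model. This is only a sketch, not kernel code: every
model_* name is hypothetical, the retry loop is bounded so that the model
terminates, and it assumes (as the call diagram above implies) that the
softirq path reaches ->destroy() only once SOCK_DEAD is set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the socket state discussed above. */
struct model_sock {
	atomic_int psock_refcnt;  /* models psock->refcnt */
	bool proto_restored;      /* true once sk_prot no longer points at sockmap */
	bool sock_dead;           /* models SOCK_DEAD */
};

enum model_result { MODEL_NOT_REACHED, MODEL_DESTROYED, MODEL_SPUN_OUT };

static void model_sock_init(struct model_sock *sk)
{
	atomic_init(&sk->psock_refcnt, 1);
	sk->proto_restored = false;
	sk->sock_dead = false;
}

/* Models sk_psock_get(): take a reference only if the count is non-zero. */
static bool model_psock_get(struct model_sock *sk)
{
	int old = atomic_load(&sk->psock_refcnt);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&sk->psock_refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Models the decrement in sk_psock_put(): true if the count dropped to 0. */
static bool model_psock_put_dec(struct model_sock *sk)
{
	return atomic_fetch_sub(&sk->psock_refcnt, 1) == 1;
}

/* Models sk_psock_drop() restoring the original protocol callbacks. */
static void model_psock_drop(struct model_sock *sk)
{
	sk->proto_restored = true;
}

/* Models sock_orphan() setting SOCK_DEAD later in the close path. */
static void model_sock_orphan(struct model_sock *sk)
{
	sk->sock_dead = true;
}

/* Models the retry loop in the destroy path, bounded by max_spins. */
static enum model_result model_destroy_retry(struct model_sock *sk, int max_spins)
{
	assert(max_spins > 0);
	for (int spins = 0; spins < max_spins; spins++) {
		if (model_psock_get(sk))
			return MODEL_DESTROYED;  /* got a reference, normal path */
		if (sk->proto_restored)
			return MODEL_DESTROYED;  /* would call the original destroy */
		/* refcnt is 0 and proto not restored: kernel would "goto retry" */
	}
	return MODEL_SPUN_OUT;  /* the unbounded loop would never exit here */
}

/* Models the softirq RST path: assumed to destroy only when SOCK_DEAD is set. */
static enum model_result model_softirq_rst(struct model_sock *sk, int max_spins)
{
	if (!sk->sock_dead)
		return MODEL_NOT_REACHED;
	return model_destroy_retry(sk, max_spins);
}
```

In this model, calling model_destroy_retry() directly inside the window
(after model_psock_put_dec() returns true, before model_psock_drop()) runs
out of spins, which is the lockup the bot describes. Going through
model_softirq_rst() in the same window does not reach the loop, because
sock_dead is only set by model_sock_orphan() after the put has returned.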