From: Martin KaFai Lau <martin.lau@linux.dev>
To: Michal Luczaj <mhal@rbox.co>
Cc: John Fastabend <john.fastabend@gmail.com>,
Jakub Sitnicki <jakub@cloudflare.com>,
Kuniyuki Iwashima <kuniyu@google.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
netdev@vger.kernel.org, bpf@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update
Date: Fri, 30 Jan 2026 13:29:44 -0800 [thread overview]
Message-ID: <612a9446-252f-4b14-8605-ae1af000cc41@linux.dev> (raw)
In-Reply-To: <3362017f-9c3d-46cd-b3ce-cb750b565d5b@rbox.co>
On 1/30/26 3:00 AM, Michal Luczaj wrote:
>>> Follow-up to discussion at
>>> https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/.
>>
>> It is a long thread to dig. Please summarize the discussion in the
>> commit message.
>
> OK, there we go:
>
> The root cause of the null-ptr-deref is that unix_stream_connect() sets
> sk_state (`WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)`) _before_ it assigns
> a peer (`unix_peer(sk) = newsk`). sk_state == TCP_ESTABLISHED makes
> sock_map_sk_state_allowed() believe that socket is properly set up, which
> would include having a defined peer.
>
> In other words, there's a window when you can call
> unix_stream_bpf_update_proto() on socket which still has unix_peer(sk) == NULL.
>
> My initial idea was to simply move peer assignment _before_ the sk_state
> update, but the maintainer wasn't interested in changing the
> unix_stream_connect() hot path. He suggested taking care of it in the
> sockmap code.
>
> My understanding is that users are not supposed to put sockets in a sockmap
> when said socket is only half-way through connect() call. Hence `return
> -EINVAL` on a missing peer. Now, if users should be allowed to legally race
> connect() vs. sockmap update, then I guess we can wait for connect() to
> "finalize" e.g. by taking the unix_state_lock(), as discussed below.
>
>> From looking at this commit message, if the existing lock_sock held by
>> update_elem is not useful for af_unix,
>
> Right, the existing lock_sock is not useful. update's lock_sock holds
> sock::sk_lock, while unix_state_lock() holds unix_sock::lock.
It sounds like lock_sock is the incorrect lock to hold for af_unix. Is
taking lock_sock in sock_map doing anything useful for af_unix? Should
sock_map hold the unix_state_lock instead of lock_sock?
Other than update_elem, do other lock_sock() usages in sock_map have a
similar issue for af_unix?
>
>> it is not clear why a new test
>> "!sk_pair" on top of the existing WRITE_ONCE(sk->sk_state...) is a fix.
>
> "On top"? Just to make sure we're looking at the same thing: above I was
> trying to show two parallel flows with unix_peer() fetch in thread-0 and
> WRITE_ONCE(sk->sk_state...) and `unix_peer(sk) = newsk` in thread-1.
>
> It fixes the problem because now update_proto won't call sock_hold(NULL).
>
>> A minor thing is sock_map_sk_state_allowed doesn't have
>> READ_ONCE(sk->sk_state) for sk_is_stream_unix also.
>
> Ok, I'll add this as a separate patch in v2. Along with the !tcp case of
> sock_map_redirect_allowed()?
sgtm. thanks.
>
>> If unix_stream_connect does not hold lock_sock, can unix_state_lock be
>> used here? lock_sock has already been taken, update_elem should not be
>> the hot path.
>
> Yes, it can be used, it was proposed in the old thread. In fact, critical
> section can be empty; only used to wait for unix_stream_connect() to
> release the lock, which would guarantee unix_peer(sk) != NULL by then.
>
> if (!psock->sk_pair) {
> + unix_state_lock(sk);
> + unix_state_unlock(sk);
> sk_pair = unix_peer(sk);
> sock_hold(sk_pair);
I don't have a strong opinion on waiting or checking NULL. imo, both are
not easy to understand. One is sk_state had already been checked earlier
under a lock_sock but still needs to check NULL on unix_peer(). Another
one is an empty unix_state_[un]lock(). If taking unix_state_lock, may as
well just use the existing unix_peer_get(sk). If its return value cannot
(?) be NULL, WARN_ON_ONCE() instead of having a special empty
lock/unlock pattern here. If the correct lock (unix_state_lock) was held
earlier in update_elem, all these would go away.
Also, it is not immediately clear why a non-NULL unix_peer(sk) is safe
here. From looking around af_unix.c, is it because the sk refcnt is held
earlier in update_elem? For unix_stream, unix_peer(sk) will stay valid
until unix_release_sock(sk). Am I reading it correctly?
next prev parent reply other threads:[~2026-01-30 21:30 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-29 16:47 [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
2026-01-29 19:41 ` Martin KaFai Lau
2026-01-30 11:00 ` Michal Luczaj
2026-01-30 21:29 ` Martin KaFai Lau [this message]
2026-01-31 10:06 ` Kuniyuki Iwashima
2026-02-02 15:10 ` Michal Luczaj
2026-02-03 3:53 ` Martin KaFai Lau
2026-02-03 9:57 ` Michal Luczaj
2026-02-03 19:47 ` Kuniyuki Iwashima
2026-02-04 7:15 ` Martin KaFai Lau
2026-02-04 7:58 ` Kuniyuki Iwashima
2026-02-04 15:41 ` Michal Luczaj
2026-02-04 19:16 ` Kuniyuki Iwashima
2026-02-04 20:18 ` Martin KaFai Lau
2026-02-04 19:34 ` Martin KaFai Lau
2026-02-04 21:09 ` Kuniyuki Iwashima
2026-02-05 0:55 ` Martin KaFai Lau
2026-02-05 2:00 ` Martin KaFai Lau
2026-02-05 7:39 ` Kuniyuki Iwashima
2026-02-04 23:25 ` Michal Luczaj
2026-02-05 0:27 ` Kuniyuki Iwashima
2026-02-05 0:31 ` Martin KaFai Lau
2026-02-02 19:15 ` Martin KaFai Lau
2026-02-07 14:37 ` Michal Luczaj
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=612a9446-252f-4b14-8605-ae1af000cc41@linux.dev \
--to=martin.lau@linux.dev \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=jakub@cloudflare.com \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=kuniyu@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mhal@rbox.co \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.