All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Michal Luczaj <mhal@rbox.co>
Cc: John Fastabend <john.fastabend@gmail.com>,
	Jakub Sitnicki <jakub@cloudflare.com>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update
Date: Fri, 30 Jan 2026 13:29:44 -0800	[thread overview]
Message-ID: <612a9446-252f-4b14-8605-ae1af000cc41@linux.dev> (raw)
In-Reply-To: <3362017f-9c3d-46cd-b3ce-cb750b565d5b@rbox.co>

On 1/30/26 3:00 AM, Michal Luczaj wrote:
>>> Follow-up to discussion at
>>> https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/.
>>
>> It is a long thread to dig. Please summarize the discussion in the
>> commit message.
> 
> OK, there we go:
> 
> The root cause of the null-ptr-deref is that unix_stream_connect() sets
> sk_state (`WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)`) _before_ it assigns
> a peer (`unix_peer(sk) = newsk`). sk_state == TCP_ESTABLISHED makes
> sock_map_sk_state_allowed() believe that socket is properly set up, which
> would include having a defined peer.
> 
> In other words, there's a window when you can call
> unix_stream_bpf_update_proto() on socket which still has unix_peer(sk) == NULL.
> 
> My initial idea was to simply move peer assignment _before_ the sk_state
> update, but the maintainer wasn't interested in changing the
> unix_stream_connect() hot path. He suggested taking care of it in the
> sockmap code.
> 
> My understanding is that users are not supposed to put sockets in a sockmap
> when said socket is only half-way through connect() call. Hence `return
> -EINVAL` on a missing peer. Now, if users should be allowed to legally race
> connect() vs. sockmap update, then I guess we can wait for connect() to
> "finalize" e.g. by taking the unix_state_lock(), as discussed below.
> 
>>   From looking at this commit message, if the existing lock_sock held by
>> update_elem is not useful for af_unix,
> 
> Right, the existing lock_sock is not useful. update's lock_sock holds
> sock::sk_lock, while unix_state_lock() holds unix_sock::lock.

It sounds like lock_sock is the incorrect lock to hold for af_unix. Is 
taking lock_sock in sock_map doing anything useful for af_unix? Should 
sock_map hold the unix_state_lock instead of lock_sock?

Other than update_elem, do other lock_sock() usages in sock_map have a 
similar issue for af_unix?

> 
>> it is not clear why a new test
>> "!sk_pair" on top of the existing WRITE_ONCE(sk->sk_state...) is a fix.
> 
> "On top"? Just to make sure we're looking at the same thing: above I was
> trying to show two parallel flows with unix_peer() fetch in thread-0 and
> WRITE_ONCE(sk->sk_state...) and `unix_peer(sk) = newsk` in thread-1.
> 
> It fixes the problem because now update_proto won't call sock_hold(NULL).
> 
>> A minor thing is sock_map_sk_state_allowed doesn't have
>> READ_ONCE(sk->sk_state) for sk_is_stream_unix also.
> 
> Ok, I'll add this as a separate patch in v2. Along with the !tcp case of
> sock_map_redirect_allowed()?

sgtm. thanks.

> 
>> If unix_stream_connect does not hold lock_sock, can unix_state_lock be
>> used here? lock_sock has already been taken, update_elem should not be
>> the hot path.
> 
> Yes, it can be used, it was proposed in the old thread. In fact, critical
> section can be empty; only used to wait for unix_stream_connect() to
> release the lock, which would guarantee unix_peer(sk) != NULL by then.
> 
>          if (!psock->sk_pair) {
> +               unix_state_lock(sk);
> +               unix_state_unlock(sk);
>                  sk_pair = unix_peer(sk);
>                  sock_hold(sk_pair);

I don't have a strong opinion on waiting or checking NULL. imo, both are 
not easy to understand. One is sk_state had already been checked earlier 
under a lock_sock but still needs to check NULL on unix_peer(). Another 
one is an empty unix_state_[un]lock(). If taking unix_state_lock, may as 
well just use the existing unix_peer_get(sk). If its return value cannot 
(?) be NULL, WARN_ON_ONCE() instead of having a special empty 
lock/unlock pattern here. If the correct lock (unix_state_lock) was held 
earlier in update_elem, all these would go away.

Also, it is not immediately clear why a non-NULL unix_peer(sk) is safe 
here. From looking around af_unix.c, is it because the sk refcnt is held 
earlier in update_elem? For unix_stream, unix_peer(sk) will stay valid 
until unix_release_sock(sk). Am I reading it correctly?


  reply	other threads:[~2026-01-30 21:30 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-29 16:47 [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
2026-01-29 19:41 ` Martin KaFai Lau
2026-01-30 11:00   ` Michal Luczaj
2026-01-30 21:29     ` Martin KaFai Lau [this message]
2026-01-31 10:06       ` Kuniyuki Iwashima
2026-02-02 15:10         ` Michal Luczaj
2026-02-03  3:53           ` Martin KaFai Lau
2026-02-03  9:57             ` Michal Luczaj
2026-02-03 19:47               ` Kuniyuki Iwashima
2026-02-04  7:15                 ` Martin KaFai Lau
2026-02-04  7:58                   ` Kuniyuki Iwashima
2026-02-04 15:41                     ` Michal Luczaj
2026-02-04 19:16                       ` Kuniyuki Iwashima
2026-02-04 20:18                         ` Martin KaFai Lau
2026-02-04 19:34                       ` Martin KaFai Lau
2026-02-04 21:09                         ` Kuniyuki Iwashima
2026-02-05  0:55                           ` Martin KaFai Lau
2026-02-05  2:00                             ` Martin KaFai Lau
2026-02-05  7:39                               ` Kuniyuki Iwashima
2026-02-04 23:25                         ` Michal Luczaj
2026-02-05  0:27                           ` Kuniyuki Iwashima
2026-02-05  0:31                           ` Martin KaFai Lau
2026-02-02 19:15         ` Martin KaFai Lau
2026-02-07 14:37           ` Michal Luczaj

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=612a9446-252f-4b14-8605-ae1af000cc41@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jakub@cloudflare.com \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhal@rbox.co \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.