public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Michal Luczaj <mhal@rbox.co>
To: Martin KaFai Lau <martin.lau@linux.dev>
Cc: John Fastabend <john.fastabend@gmail.com>,
	Jakub Sitnicki <jakub@cloudflare.com>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update
Date: Fri, 30 Jan 2026 12:00:09 +0100	[thread overview]
Message-ID: <3362017f-9c3d-46cd-b3ce-cb750b565d5b@rbox.co> (raw)
In-Reply-To: <3fc5611f-394f-40db-b49d-2f26402e221a@linux.dev>

On 1/29/26 20:41, Martin KaFai Lau wrote:
> On 1/29/26 8:47 AM, Michal Luczaj wrote:
>> BPF_MAP_UPDATE_ELEM races unix_stream_connect(): when
>> sock_map_sk_state_allowed() passes (sk_state == TCP_ESTABLISHED),
>> unix_peer(sk) in unix_stream_bpf_update_proto() may still return NULL.
>>
>> 	T0 bpf				T1 connect
>> 	------				----------
>>
>> 				WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
>> sock_map_sk_state_allowed(sk)
>> ...
>> sk_pair = unix_peer(sk)
>> sock_hold(sk_pair)
>> 				sock_hold(newsk)
>> 				smp_mb__after_atomic()
>> 				unix_peer(sk) = newsk
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000080
>> RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0
>> Call Trace:
>>   sock_map_link+0x564/0x8b0
>>   sock_map_update_common+0x6e/0x340
>>   sock_map_update_elem_sys+0x17d/0x240
>>   __sys_bpf+0x26db/0x3250
>>   __x64_sys_bpf+0x21/0x30
>>   do_syscall_64+0x6b/0x3a0
>>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> Follow-up to discussion at
>> https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/.
> 
> It is a long thread to dig. Please summarize the discussion in the 
> commit message.

OK, there we go:

The root cause of the null-ptr-deref is that unix_stream_connect() sets
sk_state (`WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)`) _before_ it assigns
a peer (`unix_peer(sk) = newsk`). sk_state == TCP_ESTABLISHED makes
sock_map_sk_state_allowed() believe that socket is properly set up, which
would include having a defined peer.

In other words, there's a window when you can call
unix_stream_bpf_update_proto() on socket which still has unix_peer(sk) == NULL.

My initial idea was to simply move peer assignment _before_ the sk_state
update, but the maintainer wasn't interested in changing the
unix_stream_connect() hot path. He suggested taking care of it in the
sockmap code.

My understanding is that users are not supposed to put sockets in a sockmap
when said socket is only half-way through connect() call. Hence `return
-EINVAL` on a missing peer. Now, if users should be allowed to legally race
connect() vs. sockmap update, then I guess we can wait for connect() to
"finalize" e.g. by taking the unix_state_lock(), as discussed below.

>  From looking at this commit message, if the existing lock_sock held by 
> update_elem is not useful for af_unix,

Right, the existing lock_sock is not useful. update's lock_sock holds
sock::sk_lock, while unix_state_lock() holds unix_sock::lock.

> it is not clear why a new test 
> "!sk_pair" on top of the existing WRITE_ONCE(sk->sk_state...) is a fix. 

"On top"? Just to make sure we're looking at the same thing: above I was
trying to show two parallel flows with unix_peer() fetch in thread-0 and
WRITE_ONCE(sk->sk_state...) and `unix_peer(sk) = newsk` in thread-1.

It fixes the problem because now update_proto won't call sock_hold(NULL).

> A minor thing is sock_map_sk_state_allowed doesn't have 
> READ_ONCE(sk->sk_state) for sk_is_stream_unix also.

Ok, I'll add this as a separate patch in v2. Along with the !tcp case of
sock_map_redirect_allowed()?

> If unix_stream_connect does not hold lock_sock, can unix_state_lock be 
> used here? lock_sock has already been taken, update_elem should not be 
> the hot path.

Yes, it can be used, it was proposed in the old thread. In fact, critical
section can be empty; only used to wait for unix_stream_connect() to
release the lock, which would guarantee unix_peer(sk) != NULL by then.

        if (!psock->sk_pair) {
+               unix_state_lock(sk);
+               unix_state_unlock(sk);
                sk_pair = unix_peer(sk);
                sock_hold(sk_pair);

>> Fixes: 8866730aed51 ("bpf, sockmap: af_unix stream sockets need to hold ref for pair sock")
>> Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
>> Signed-off-by: Michal Luczaj <mhal@rbox.co>
>> ---
>> Re-triggered while working on an unrelated selftest:
>> https://lore.kernel.org/bpf/20260123-selftest-signal-on-connect-v1-0-b0256e7025b6@rbox.co/
>> ---
>>   net/unix/unix_bpf.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
>> index e0d30d6d22ac..57f3124c9d8d 100644
>> --- a/net/unix/unix_bpf.c
>> +++ b/net/unix/unix_bpf.c
>> @@ -185,6 +185,9 @@ int unix_stream_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool r
>>   	 */
>>   	if (!psock->sk_pair) {
>>   		sk_pair = unix_peer(sk);
>> +		if (unlikely(!sk_pair))
>> +			return -EINVAL;
>> +
>>   		sock_hold(sk_pair);
>>   		psock->sk_pair = sk_pair;
>>   	}
>>
>> ---
>> base-commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
>> change-id: 20260129-unix-proto-update-null-ptr-deref-6a2733bcbbf8
>>
>> Best regards,
> 


  reply	other threads:[~2026-01-30 11:00 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-29 16:47 [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
2026-01-29 19:41 ` Martin KaFai Lau
2026-01-30 11:00   ` Michal Luczaj [this message]
2026-01-30 21:29     ` Martin KaFai Lau
2026-01-31 10:06       ` Kuniyuki Iwashima
2026-02-02 15:10         ` Michal Luczaj
2026-02-03  3:53           ` Martin KaFai Lau
2026-02-03  9:57             ` Michal Luczaj
2026-02-03 19:47               ` Kuniyuki Iwashima
2026-02-04  7:15                 ` Martin KaFai Lau
2026-02-04  7:58                   ` Kuniyuki Iwashima
2026-02-04 15:41                     ` Michal Luczaj
2026-02-04 19:16                       ` Kuniyuki Iwashima
2026-02-04 20:18                         ` Martin KaFai Lau
2026-02-04 19:34                       ` Martin KaFai Lau
2026-02-04 21:09                         ` Kuniyuki Iwashima
2026-02-05  0:55                           ` Martin KaFai Lau
2026-02-05  2:00                             ` Martin KaFai Lau
2026-02-05  7:39                               ` Kuniyuki Iwashima
2026-02-04 23:25                         ` Michal Luczaj
2026-02-05  0:27                           ` Kuniyuki Iwashima
2026-02-05  0:31                           ` Martin KaFai Lau
2026-02-02 19:15         ` Martin KaFai Lau
2026-02-07 14:37           ` Michal Luczaj

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3362017f-9c3d-46cd-b3ce-cb750b565d5b@rbox.co \
    --to=mhal@rbox.co \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jakub@cloudflare.com \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox