* [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update
@ 2026-01-29 16:47 Michal Luczaj
2026-01-29 19:41 ` Martin KaFai Lau
0 siblings, 1 reply; 24+ messages in thread
From: Michal Luczaj @ 2026-01-29 16:47 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki, Kuniyuki Iwashima,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Daniel Borkmann
Cc: netdev, bpf, linux-kernel, Michal Luczaj
BPF_MAP_UPDATE_ELEM races unix_stream_connect(): when
sock_map_sk_state_allowed() passes (sk_state == TCP_ESTABLISHED),
unix_peer(sk) in unix_stream_bpf_update_proto() may still return NULL.
T0 bpf T1 connect
------ ----------
WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
sock_map_sk_state_allowed(sk)
...
sk_pair = unix_peer(sk)
sock_hold(sk_pair)
sock_hold(newsk)
smp_mb__after_atomic()
unix_peer(sk) = newsk
BUG: kernel NULL pointer dereference, address: 0000000000000080
RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0
Call Trace:
sock_map_link+0x564/0x8b0
sock_map_update_common+0x6e/0x340
sock_map_update_elem_sys+0x17d/0x240
__sys_bpf+0x26db/0x3250
__x64_sys_bpf+0x21/0x30
do_syscall_64+0x6b/0x3a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Follow-up to discussion at
https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/.
Fixes: 8866730aed51 ("bpf, sockmap: af_unix stream sockets need to hold ref for pair sock")
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
Re-triggered while working on an unrelated selftest:
https://lore.kernel.org/bpf/20260123-selftest-signal-on-connect-v1-0-b0256e7025b6@rbox.co/
---
net/unix/unix_bpf.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
index e0d30d6d22ac..57f3124c9d8d 100644
--- a/net/unix/unix_bpf.c
+++ b/net/unix/unix_bpf.c
@@ -185,6 +185,9 @@ int unix_stream_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool r
*/
if (!psock->sk_pair) {
sk_pair = unix_peer(sk);
+ if (unlikely(!sk_pair))
+ return -EINVAL;
+
sock_hold(sk_pair);
psock->sk_pair = sk_pair;
}
---
base-commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
change-id: 20260129-unix-proto-update-null-ptr-deref-6a2733bcbbf8
Best regards,
--
Michal Luczaj <mhal@rbox.co>
^ permalink raw reply related [flat|nested] 24+ messages in thread* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-01-29 16:47 [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj @ 2026-01-29 19:41 ` Martin KaFai Lau 2026-01-30 11:00 ` Michal Luczaj 0 siblings, 1 reply; 24+ messages in thread From: Martin KaFai Lau @ 2026-01-29 19:41 UTC (permalink / raw) To: Michal Luczaj Cc: John Fastabend, Jakub Sitnicki, Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann, netdev, bpf, linux-kernel On 1/29/26 8:47 AM, Michal Luczaj wrote: > BPF_MAP_UPDATE_ELEM races unix_stream_connect(): when > sock_map_sk_state_allowed() passes (sk_state == TCP_ESTABLISHED), > unix_peer(sk) in unix_stream_bpf_update_proto() may still return NULL. > > T0 bpf T1 connect > ------ ---------- > > WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED) > sock_map_sk_state_allowed(sk) > ... > sk_pair = unix_peer(sk) > sock_hold(sk_pair) > sock_hold(newsk) > smp_mb__after_atomic() > unix_peer(sk) = newsk > > BUG: kernel NULL pointer dereference, address: 0000000000000080 > RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0 > Call Trace: > sock_map_link+0x564/0x8b0 > sock_map_update_common+0x6e/0x340 > sock_map_update_elem_sys+0x17d/0x240 > __sys_bpf+0x26db/0x3250 > __x64_sys_bpf+0x21/0x30 > do_syscall_64+0x6b/0x3a0 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > Follow-up to discussion at > https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/. It is a long thread to dig. Please summarize the discussion in the commit message. From looking at this commit message, if the existing lock_sock held by update_elem is not useful for af_unix, it is not clear why a new test "!sk_pair" on top of the existing WRITE_ONCE(sk->sk_state...) is a fix. A minor thing is sock_map_sk_state_allowed doesn't have READ_ONCE(sk->sk_state) for sk_is_stream_unix also. If unix_stream_connect does not hold lock_sock, can unix_state_lock be used here? lock_sock has already been taken, update_elem should not be the hot path. > > Fixes: 8866730aed51 ("bpf, sockmap: af_unix stream sockets need to hold ref for pair sock") > Suggested-by: Kuniyuki Iwashima <kuniyu@google.com> > Signed-off-by: Michal Luczaj <mhal@rbox.co> > --- > Re-triggered while working on an unrelated selftest: > https://lore.kernel.org/bpf/20260123-selftest-signal-on-connect-v1-0-b0256e7025b6@rbox.co/ > --- > net/unix/unix_bpf.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c > index e0d30d6d22ac..57f3124c9d8d 100644 > --- a/net/unix/unix_bpf.c > +++ b/net/unix/unix_bpf.c > @@ -185,6 +185,9 @@ int unix_stream_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool r > */ > if (!psock->sk_pair) { > sk_pair = unix_peer(sk); > + if (unlikely(!sk_pair)) > + return -EINVAL; > + > sock_hold(sk_pair); > psock->sk_pair = sk_pair; > } > > --- > base-commit: 63804fed149a6750ffd28610c5c1c98cce6bd377 > change-id: 20260129-unix-proto-update-null-ptr-deref-6a2733bcbbf8 > > Best regards, ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-01-29 19:41 ` Martin KaFai Lau @ 2026-01-30 11:00 ` Michal Luczaj 2026-01-30 21:29 ` Martin KaFai Lau 0 siblings, 1 reply; 24+ messages in thread From: Michal Luczaj @ 2026-01-30 11:00 UTC (permalink / raw) To: Martin KaFai Lau Cc: John Fastabend, Jakub Sitnicki, Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann, netdev, bpf, linux-kernel On 1/29/26 20:41, Martin KaFai Lau wrote: > On 1/29/26 8:47 AM, Michal Luczaj wrote: >> BPF_MAP_UPDATE_ELEM races unix_stream_connect(): when >> sock_map_sk_state_allowed() passes (sk_state == TCP_ESTABLISHED), >> unix_peer(sk) in unix_stream_bpf_update_proto() may still return NULL. >> >> T0 bpf T1 connect >> ------ ---------- >> >> WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED) >> sock_map_sk_state_allowed(sk) >> ... >> sk_pair = unix_peer(sk) >> sock_hold(sk_pair) >> sock_hold(newsk) >> smp_mb__after_atomic() >> unix_peer(sk) = newsk >> >> BUG: kernel NULL pointer dereference, address: 0000000000000080 >> RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0 >> Call Trace: >> sock_map_link+0x564/0x8b0 >> sock_map_update_common+0x6e/0x340 >> sock_map_update_elem_sys+0x17d/0x240 >> __sys_bpf+0x26db/0x3250 >> __x64_sys_bpf+0x21/0x30 >> do_syscall_64+0x6b/0x3a0 >> entry_SYSCALL_64_after_hwframe+0x76/0x7e >> >> Follow-up to discussion at >> https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/. > > It is a long thread to dig. Please summarize the discussion in the > commit message. OK, there we go: The root cause of the null-ptr-deref is that unix_stream_connect() sets sk_state (`WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)`) _before_ it assigns a peer (`unix_peer(sk) = newsk`). sk_state == TCP_ESTABLISHED makes sock_map_sk_state_allowed() believe that socket is properly set up, which would include having a defined peer. In other words, there's a window when you can call unix_stream_bpf_update_proto() on socket which still has unix_peer(sk) == NULL. My initial idea was to simply move peer assignment _before_ the sk_state update, but the maintainer wasn't interested in changing the unix_stream_connect() hot path. He suggested taking care of it in the sockmap code. My understanding is that users are not supposed to put sockets in a sockmap when said socket is only half-way through connect() call. Hence `return -EINVAL` on a missing peer. Now, if users should be allowed to legally race connect() vs. sockmap update, then I guess we can wait for connect() to "finalize" e.g. by taking the unix_state_lock(), as discussed below. > From looking at this commit message, if the existing lock_sock held by > update_elem is not useful for af_unix, Right, the existing lock_sock is not useful. update's lock_sock holds sock::sk_lock, while unix_state_lock() holds unix_sock::lock. > it is not clear why a new test > "!sk_pair" on top of the existing WRITE_ONCE(sk->sk_state...) is a fix. "On top"? Just to make sure we're looking at the same thing: above I was trying to show two parallel flows with unix_peer() fetch in thread-0 and WRITE_ONCE(sk->sk_state...) and `unix_peer(sk) = newsk` in thread-1. It fixes the problem because now update_proto won't call sock_hold(NULL). > A minor thing is sock_map_sk_state_allowed doesn't have > READ_ONCE(sk->sk_state) for sk_is_stream_unix also. Ok, I'll add this as a separate patch in v2. Along with the !tcp case of sock_map_redirect_allowed()? > If unix_stream_connect does not hold lock_sock, can unix_state_lock be > used here? lock_sock has already been taken, update_elem should not be > the hot path. Yes, it can be used, it was proposed in the old thread. In fact, critical section can be empty; only used to wait for unix_stream_connect() to release the lock, which would guarantee unix_peer(sk) != NULL by then. if (!psock->sk_pair) { + unix_state_lock(sk); + unix_state_unlock(sk); sk_pair = unix_peer(sk); sock_hold(sk_pair); >> Fixes: 8866730aed51 ("bpf, sockmap: af_unix stream sockets need to hold ref for pair sock") >> Suggested-by: Kuniyuki Iwashima <kuniyu@google.com> >> Signed-off-by: Michal Luczaj <mhal@rbox.co> >> --- >> Re-triggered while working on an unrelated selftest: >> https://lore.kernel.org/bpf/20260123-selftest-signal-on-connect-v1-0-b0256e7025b6@rbox.co/ >> --- >> net/unix/unix_bpf.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c >> index e0d30d6d22ac..57f3124c9d8d 100644 >> --- a/net/unix/unix_bpf.c >> +++ b/net/unix/unix_bpf.c >> @@ -185,6 +185,9 @@ int unix_stream_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool r >> */ >> if (!psock->sk_pair) { >> sk_pair = unix_peer(sk); >> + if (unlikely(!sk_pair)) >> + return -EINVAL; >> + >> sock_hold(sk_pair); >> psock->sk_pair = sk_pair; >> } >> >> --- >> base-commit: 63804fed149a6750ffd28610c5c1c98cce6bd377 >> change-id: 20260129-unix-proto-update-null-ptr-deref-6a2733bcbbf8 >> >> Best regards, > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-01-30 11:00 ` Michal Luczaj @ 2026-01-30 21:29 ` Martin KaFai Lau 2026-01-31 10:06 ` Kuniyuki Iwashima 0 siblings, 1 reply; 24+ messages in thread From: Martin KaFai Lau @ 2026-01-30 21:29 UTC (permalink / raw) To: Michal Luczaj Cc: John Fastabend, Jakub Sitnicki, Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann, netdev, bpf, linux-kernel On 1/30/26 3:00 AM, Michal Luczaj wrote: >>> Follow-up to discussion at >>> https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/. >> >> It is a long thread to dig. Please summarize the discussion in the >> commit message. > > OK, there we go: > > The root cause of the null-ptr-deref is that unix_stream_connect() sets > sk_state (`WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)`) _before_ it assigns > a peer (`unix_peer(sk) = newsk`). sk_state == TCP_ESTABLISHED makes > sock_map_sk_state_allowed() believe that socket is properly set up, which > would include having a defined peer. > > In other words, there's a window when you can call > unix_stream_bpf_update_proto() on socket which still has unix_peer(sk) == NULL. > > My initial idea was to simply move peer assignment _before_ the sk_state > update, but the maintainer wasn't interested in changing the > unix_stream_connect() hot path. He suggested taking care of it in the > sockmap code. > > My understanding is that users are not supposed to put sockets in a sockmap > when said socket is only half-way through connect() call. Hence `return > -EINVAL` on a missing peer. Now, if users should be allowed to legally race > connect() vs. sockmap update, then I guess we can wait for connect() to > "finalize" e.g. by taking the unix_state_lock(), as discussed below. > >> From looking at this commit message, if the existing lock_sock held by >> update_elem is not useful for af_unix, > > Right, the existing lock_sock is not useful. update's lock_sock holds > sock::sk_lock, while unix_state_lock() holds unix_sock::lock. It sounds like lock_sock is the incorrect lock to hold for af_unix. Is taking lock_sock in sock_map doing anything useful for af_unix? Should sock_map hold the unix_state_lock instead of lock_sock? Other than update_elem, do other lock_sock() usages in sock_map have a similar issue for af_unix? > >> it is not clear why a new test >> "!sk_pair" on top of the existing WRITE_ONCE(sk->sk_state...) is a fix. > > "On top"? Just to make sure we're looking at the same thing: above I was > trying to show two parallel flows with unix_peer() fetch in thread-0 and > WRITE_ONCE(sk->sk_state...) and `unix_peer(sk) = newsk` in thread-1. > > It fixes the problem because now update_proto won't call sock_hold(NULL). > >> A minor thing is sock_map_sk_state_allowed doesn't have >> READ_ONCE(sk->sk_state) for sk_is_stream_unix also. > > Ok, I'll add this as a separate patch in v2. Along with the !tcp case of > sock_map_redirect_allowed()? sgtm. thanks. > >> If unix_stream_connect does not hold lock_sock, can unix_state_lock be >> used here? lock_sock has already been taken, update_elem should not be >> the hot path. > > Yes, it can be used, it was proposed in the old thread. In fact, critical > section can be empty; only used to wait for unix_stream_connect() to > release the lock, which would guarantee unix_peer(sk) != NULL by then. > > if (!psock->sk_pair) { > + unix_state_lock(sk); > + unix_state_unlock(sk); > sk_pair = unix_peer(sk); > sock_hold(sk_pair); I don't have a strong opinion on waiting or checking NULL. imo, both are not easy to understand. One is sk_state had already been checked earlier under a lock_sock but still needs to check NULL on unix_peer(). Another one is an empty unix_state_[un]lock(). If taking unix_state_lock, may as well just use the existing unix_peer_get(sk). If its return value cannot (?) be NULL, WARN_ON_ONCE() instead of having a special empty lock/unlock pattern here. If the correct lock (unix_state_lock) was held earlier in update_elem, all these would go away. Also, it is not immediately clear why a non-NULL unix_peer(sk) is safe here. From looking around af_unix.c, is it because the sk refcnt is held earlier in update_elem? For unix_stream, unix_peer(sk) will stay valid until unix_release_sock(sk). Am I reading it correctly? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-01-30 21:29 ` Martin KaFai Lau @ 2026-01-31 10:06 ` Kuniyuki Iwashima 2026-02-02 15:10 ` Michal Luczaj 2026-02-02 19:15 ` Martin KaFai Lau 0 siblings, 2 replies; 24+ messages in thread From: Kuniyuki Iwashima @ 2026-01-31 10:06 UTC (permalink / raw) To: Martin KaFai Lau Cc: Michal Luczaj, John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann, netdev, bpf, linux-kernel On Fri, Jan 30, 2026 at 1:30 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 1/30/26 3:00 AM, Michal Luczaj wrote: > >>> Follow-up to discussion at > >>> https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/. > >> > >> It is a long thread to dig. Please summarize the discussion in the > >> commit message. > > > > OK, there we go: > > > > The root cause of the null-ptr-deref is that unix_stream_connect() sets > > sk_state (`WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)`) _before_ it assigns > > a peer (`unix_peer(sk) = newsk`). sk_state == TCP_ESTABLISHED makes > > sock_map_sk_state_allowed() believe that socket is properly set up, which > > would include having a defined peer. > > > > In other words, there's a window when you can call > > unix_stream_bpf_update_proto() on socket which still has unix_peer(sk) == NULL. > > > > My initial idea was to simply move peer assignment _before_ the sk_state > > update, but the maintainer wasn't interested in changing the > > unix_stream_connect() hot path. He suggested taking care of it in the > > sockmap code. Yes, we already have a memory barrier for unix_peer(sk) there (to save sock_hold()/sock_put() in sendmsg(), see 830a1e5c212fb) and another one just for sk->sk_state is not worth the unlikely case in sockmap by a buggy user. > > > > My understanding is that users are not supposed to put sockets in a sockmap > > when said socket is only half-way through connect() call. Hence `return > > -EINVAL` on a missing peer. Now, if users should be allowed to legally race > > connect() vs. sockmap update, then I guess we can wait for connect() to > > "finalize" e.g. by taking the unix_state_lock(), as discussed below. If a user hit the issue, the user must have updated sockmap while the user knows connect() had not returned. Such a user must prepare for failures since it could occur before sock_map_sk_state_allowed() too. This is a subtle timing issue and I don't think the kernel should be friendly to such buggy users by waiting for connect() etc. > > > >> From looking at this commit message, if the existing lock_sock held by > >> update_elem is not useful for af_unix, > > > > Right, the existing lock_sock is not useful. update's lock_sock holds > > sock::sk_lock, while unix_state_lock() holds unix_sock::lock. > > It sounds like lock_sock is the incorrect lock to hold for af_unix. Is > taking lock_sock in sock_map doing anything useful for af_unix? Should > sock_map hold the unix_state_lock instead of lock_sock? If sockmap code does not sleep, unix_state_lock can be used there. > > Other than update_elem, do other lock_sock() usages in sock_map have a > similar issue for af_unix? > > > > >> it is not clear why a new test > >> "!sk_pair" on top of the existing WRITE_ONCE(sk->sk_state...) is a fix. > > > > "On top"? Just to make sure we're looking at the same thing: above I was > > trying to show two parallel flows with unix_peer() fetch in thread-0 and > > WRITE_ONCE(sk->sk_state...) and `unix_peer(sk) = newsk` in thread-1. > > > > It fixes the problem because now update_proto won't call sock_hold(NULL). > > > >> A minor thing is sock_map_sk_state_allowed doesn't have > >> READ_ONCE(sk->sk_state) for sk_is_stream_unix also. > > > > Ok, I'll add this as a separate patch in v2. Along with the !tcp case of > > sock_map_redirect_allowed()? > > sgtm. thanks. > > > > >> If unix_stream_connect does not hold lock_sock, can unix_state_lock be > >> used here? lock_sock has already been taken, update_elem should not be > >> the hot path. > > > > Yes, it can be used, it was proposed in the old thread. In fact, critical > > section can be empty; only used to wait for unix_stream_connect() to > > release the lock, which would guarantee unix_peer(sk) != NULL by then. > > > > if (!psock->sk_pair) { > > + unix_state_lock(sk); > > + unix_state_unlock(sk); I don't like this... we had a similar one in recvmsg(MSG_PEEK) path for GC with a biiiiiig comment, which I removed in 118f457da9ed . > > sk_pair = unix_peer(sk); > > sock_hold(sk_pair); > > I don't have a strong opinion on waiting or checking NULL. imo, both are > not easy to understand. One is sk_state had already been checked earlier > under a lock_sock but still needs to check NULL on unix_peer(). Another > one is an empty unix_state_[un]lock(). If taking unix_state_lock, may as > well just use the existing unix_peer_get(sk). Yes, unix_peer_get() can be safely used there (with extra sock_put()). > If its return value cannot > (?) be NULL, WARN_ON_ONCE() instead of having a special empty I suggested WARN_ON_ONCE() because Michal reproduced it with mdelay() and I did not think it could occur in real life, but considering PREEMPT_RT, it could be real. So, the current form in this patch looks good to me. > lock/unlock pattern here. If the correct lock (unix_state_lock) was held > earlier in update_elem, all these would go away. > > Also, it is not immediately clear why a non-NULL unix_peer(sk) is safe > here. From looking around af_unix.c, is it because the sk refcnt is held > earlier in update_elem? For unix_stream, unix_peer(sk) will stay valid > until unix_release_sock(sk). Am I reading it correctly? unix_stream_connect() holds the peer's refcnt, and once unix_peer(sk) is set, it and refcnt are not cleared until close()d. So unix_peer_get() is a bit redundant for sane users. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-01-31 10:06 ` Kuniyuki Iwashima @ 2026-02-02 15:10 ` Michal Luczaj 2026-02-03 3:53 ` Martin KaFai Lau 2026-02-02 19:15 ` Martin KaFai Lau 1 sibling, 1 reply; 24+ messages in thread From: Michal Luczaj @ 2026-02-02 15:10 UTC (permalink / raw) To: Kuniyuki Iwashima, Martin KaFai Lau Cc: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann, netdev, bpf, linux-kernel On 1/31/26 11:06, Kuniyuki Iwashima wrote: > On Fri, Jan 30, 2026 at 1:30 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: >> >> On 1/30/26 3:00 AM, Michal Luczaj wrote: >>>>> Follow-up to discussion at >>>>> https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/. >>>> >>>> It is a long thread to dig. Please summarize the discussion in the >>>> commit message. >>> >>> OK, there we go: >>> >>> The root cause of the null-ptr-deref is that unix_stream_connect() sets >>> sk_state (`WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)`) _before_ it assigns >>> a peer (`unix_peer(sk) = newsk`). sk_state == TCP_ESTABLISHED makes >>> sock_map_sk_state_allowed() believe that socket is properly set up, which >>> would include having a defined peer. >>> >>> In other words, there's a window when you can call >>> unix_stream_bpf_update_proto() on socket which still has unix_peer(sk) == NULL. >>> >>> My initial idea was to simply move peer assignment _before_ the sk_state >>> update, but the maintainer wasn't interested in changing the >>> unix_stream_connect() hot path. He suggested taking care of it in the >>> sockmap code. > > Yes, we already have a memory barrier for unix_peer(sk) there > (to save sock_hold()/sock_put() in sendmsg(), see 830a1e5c212fb) > and another one just for sk->sk_state is not worth the unlikely > case in sockmap by a buggy user. > > >>> >>> My understanding is that users are not supposed to put sockets in a sockmap >>> when said socket is only half-way through connect() call. Hence `return >>> -EINVAL` on a missing peer. Now, if users should be allowed to legally race >>> connect() vs. sockmap update, then I guess we can wait for connect() to >>> "finalize" e.g. by taking the unix_state_lock(), as discussed below. > > If a user hit the issue, the user must have updated sockmap while the > user knows connect() had not returned. Such a user must prepare > for failures since it could occur before sock_map_sk_state_allowed() too. > > This is a subtle timing issue and I don't think the kernel should be > friendly to such buggy users by waiting for connect() etc. > > >>> >>>> From looking at this commit message, if the existing lock_sock held by >>>> update_elem is not useful for af_unix, >>> >>> Right, the existing lock_sock is not useful. update's lock_sock holds >>> sock::sk_lock, while unix_state_lock() holds unix_sock::lock. >> >> It sounds like lock_sock is the incorrect lock to hold for af_unix. Is >> taking lock_sock in sock_map doing anything useful for af_unix? Should >> sock_map hold the unix_state_lock instead of lock_sock? > > If sockmap code does not sleep, unix_state_lock can be used there. > > >> >> Other than update_elem, do other lock_sock() usages in sock_map have a >> similar issue for af_unix? As for the sockmap, I think that would be it. In related news, looks like bpf_iter_unix_seq_show() is missing unix_state_lock(): lock_sock_fast() won't stop unix_release_sock(). E.g. bpf iterator can grab unix_sock::peer as it is being released. >>>> it is not clear why a new test >>>> "!sk_pair" on top of the existing WRITE_ONCE(sk->sk_state...) is a fix. >>> >>> "On top"? Just to make sure we're looking at the same thing: above I was >>> trying to show two parallel flows with unix_peer() fetch in thread-0 and >>> WRITE_ONCE(sk->sk_state...) and `unix_peer(sk) = newsk` in thread-1. >>> >>> It fixes the problem because now update_proto won't call sock_hold(NULL). >>> >>>> A minor thing is sock_map_sk_state_allowed doesn't have >>>> READ_ONCE(sk->sk_state) for sk_is_stream_unix also. >>> >>> Ok, I'll add this as a separate patch in v2. Along with the !tcp case of >>> sock_map_redirect_allowed()? >> >> sgtm. thanks. >> >>> >>>> If unix_stream_connect does not hold lock_sock, can unix_state_lock be >>>> used here? lock_sock has already been taken, update_elem should not be >>>> the hot path. >>> >>> Yes, it can be used, it was proposed in the old thread. In fact, critical >>> section can be empty; only used to wait for unix_stream_connect() to >>> release the lock, which would guarantee unix_peer(sk) != NULL by then. >>> >>> if (!psock->sk_pair) { >>> + unix_state_lock(sk); >>> + unix_state_unlock(sk); > > I don't like this... we had a similar one in recvmsg(MSG_PEEK) path > for GC with a biiiiiig comment, which I removed in 118f457da9ed . > > >>> sk_pair = unix_peer(sk); >>> sock_hold(sk_pair); >> >> I don't have a strong opinion on waiting or checking NULL. imo, both are >> not easy to understand. One is sk_state had already been checked earlier >> under a lock_sock but still needs to check NULL on unix_peer(). Another >> one is an empty unix_state_[un]lock(). If taking unix_state_lock, may as >> well just use the existing unix_peer_get(sk). > > Yes, unix_peer_get() can be safely used there (with extra sock_put()). > > >> If its return value cannot >> (?) be NULL, WARN_ON_ONCE() instead of having a special empty > > I suggested WARN_ON_ONCE() because Michal reproduced it with > mdelay() and I did not think it could occur in real life, but considering > PREEMPT_RT, it could be real. So, the current form in this patch looks > good to me. FWIW, it reproduces on CONFIG_PREEMPT_RT=n without mdelay(). Please see https://lore.kernel.org/bpf/20260123-selftest-signal-on-connect-v1-0-b0256e7025b6@rbox.co/ >> lock/unlock pattern here. If the correct lock (unix_state_lock) was held >> earlier in update_elem, all these would go away. >> >> Also, it is not immediately clear why a non-NULL unix_peer(sk) is safe >> here. From looking around af_unix.c, is it because the sk refcnt is held >> earlier in update_elem? For unix_stream, unix_peer(sk) will stay valid >> until unix_release_sock(sk). Am I reading it correctly? > > unix_stream_connect() holds the peer's refcnt, and once unix_peer(sk) > is set, it and refcnt are not cleared until close()d. So unix_peer_get() is > a bit redundant for sane users. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-02 15:10 ` Michal Luczaj @ 2026-02-03 3:53 ` Martin KaFai Lau 2026-02-03 9:57 ` Michal Luczaj 0 siblings, 1 reply; 24+ messages in thread From: Martin KaFai Lau @ 2026-02-03 3:53 UTC (permalink / raw) To: Michal Luczaj, Kuniyuki Iwashima Cc: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann, netdev, bpf, linux-kernel On 2/2/26 7:10 AM, Michal Luczaj wrote: >>> Other than update_elem, do other lock_sock() usages in sock_map have a >>> similar issue for af_unix? > As for the sockmap, I think that would be it. Thanks for checking. > > In related news, looks like bpf_iter_unix_seq_show() is missing > unix_state_lock(): lock_sock_fast() won't stop unix_release_sock(). E.g. > bpf iterator can grab unix_sock::peer as it is being released. If the concern is the bpf iterator prog may use a released unix_peer(sk) pointer, it should be fine. The unix_peer(sk) pointer is not a trusted pointer to the bpf prog, so nothing bad will happen other than potentially reading incorrect values. However, yeah, the bpf_iter_(tcp|udp)_seq_show is better in the sense that the correct lock is used. For tcp_sock that has many stats, I think it will be particularly useful to read them in a consistent state. I don't have a strong opinion on af_unix. Unlike the sock_map where the lock_sock is not useful for af_unix. The bpf iterator can do bpf_setsockopt, so a lock_sock_fast() is still needed in bpf_iter_unix_seq_show and I think it is the reason lock_sock_fast() is used here. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-03 3:53 ` Martin KaFai Lau @ 2026-02-03 9:57 ` Michal Luczaj 2026-02-03 19:47 ` Kuniyuki Iwashima 0 siblings, 1 reply; 24+ messages in thread From: Michal Luczaj @ 2026-02-03 9:57 UTC (permalink / raw) To: Martin KaFai Lau, Kuniyuki Iwashima Cc: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann, netdev, bpf, linux-kernel On 2/3/26 04:53, Martin KaFai Lau wrote: > On 2/2/26 7:10 AM, Michal Luczaj wrote: >> In related news, looks like bpf_iter_unix_seq_show() is missing >> unix_state_lock(): lock_sock_fast() won't stop unix_release_sock(). E.g. >> bpf iterator can grab unix_sock::peer as it is being released. > > If the concern is the bpf iterator prog may use a released unix_peer(sk) > pointer, it should be fine. The unix_peer(sk) pointer is not a trusted > pointer to the bpf prog, so nothing bad will happen other than > potentially reading incorrect values. But if the prog passes a released peer pointer to a bpf helper: BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0x95/0xb0 Read of size 1 at addr ffff888110654c92 by task test_progs/1936 Call Trace: dump_stack_lvl+0x5d/0x80 print_report+0x170/0x4f3 kasan_report+0xe1/0x180 bpf_skc_to_unix_sock+0x95/0xb0 bpf_prog_382743e45576abd5_dump_unix+0x4a0/0x576 bpf_iter_run_prog+0x5b9/0xb00 bpf_iter_unix_seq_show+0x1f7/0x2e0 bpf_seq_read+0x42c/0x10d0 vfs_read+0x171/0xb20 ksys_read+0xff/0x200 do_syscall_64+0x6b/0x3a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Allocated by task 1941: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_slab_alloc+0x63/0x80 kmem_cache_alloc_noprof+0x1f7/0x6f0 sk_prot_alloc+0x59/0x210 sk_alloc+0x34/0x470 unix_create1+0x86/0x8a0 unix_create+0xd4/0x2d0 __sock_create+0x243/0x5a0 __sys_socket+0x119/0x1d0 __x64_sys_socket+0x72/0xd0 do_syscall_64+0x6b/0x3a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 1941: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x70 __kasan_slab_free+0x47/0x70 kmem_cache_free+0x12c/0x5d0 __sk_destruct+0x432/0x6e0 unix_release_sock+0x9b3/0xf60 unix_release_sock+0x99f/0xf60 unix_release+0x8a/0xf0 __sock_release+0xb0/0x270 sock_close+0x18/0x20 __fput+0x36e/0xac0 fput_close_sync+0xe5/0x1a0 __x64_sys_close+0x7d/0xd0 do_syscall_64+0x6b/0x3a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-03 9:57 ` Michal Luczaj @ 2026-02-03 19:47 ` Kuniyuki Iwashima 2026-02-04 7:15 ` Martin KaFai Lau 0 siblings, 1 reply; 24+ messages in thread From: Kuniyuki Iwashima @ 2026-02-03 19:47 UTC (permalink / raw) To: mhal Cc: bpf, daniel, davem, edumazet, horms, jakub, john.fastabend, kuba, kuniyu, linux-kernel, martin.lau, netdev, pabeni From: Michal Luczaj <mhal@rbox.co> Date: Tue, 3 Feb 2026 10:57:46 +0100 > On 2/3/26 04:53, Martin KaFai Lau wrote: > > On 2/2/26 7:10 AM, Michal Luczaj wrote: > >> In related news, looks like bpf_iter_unix_seq_show() is missing > >> unix_state_lock(): lock_sock_fast() won't stop unix_release_sock(). E.g. > >> bpf iterator can grab unix_sock::peer as it is being released. > > > > If the concern is the bpf iterator prog may use a released unix_peer(sk) > > pointer, it should be fine. The unix_peer(sk) pointer is not a trusted > > pointer to the bpf prog, so nothing bad will happen other than > > potentially reading incorrect values. > > But if the prog passes a released peer pointer to a bpf helper: > > BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0x95/0xb0 > Read of size 1 at addr ffff888110654c92 by task test_progs/1936 Can you cook a patch for this ? probably like below ---8<--- diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 02ebad6afac7..9c7e9fbde362 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -3740,8 +3740,9 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v) return 0; slow = lock_sock_fast(sk); + unix_state_lock(sk); - if (unlikely(sk_unhashed(sk))) { + if (unlikely(sock_flag(other, SOCK_DEAD))) { ret = SEQ_SKIP; goto unlock; } @@ -3751,6 +3752,7 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v) prog = bpf_iter_get_info(&meta, false); ret = unix_prog_seq_show(prog, &meta, v, uid); unlock: + unix_staet_unlock(sk); unlock_sock_fast(sk, slow); return ret; } ---8<--- Thanks! ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-03 19:47 ` Kuniyuki Iwashima @ 2026-02-04 7:15 ` Martin KaFai Lau 2026-02-04 7:58 ` Kuniyuki Iwashima 0 siblings, 1 reply; 24+ messages in thread From: Martin KaFai Lau @ 2026-02-04 7:15 UTC (permalink / raw) To: Kuniyuki Iwashima, mhal Cc: bpf, daniel, davem, edumazet, horms, jakub, john.fastabend, kuba, linux-kernel, netdev, pabeni On 2/3/26 11:47 AM, Kuniyuki Iwashima wrote: > From: Michal Luczaj <mhal@rbox.co> > Date: Tue, 3 Feb 2026 10:57:46 +0100 >> On 2/3/26 04:53, Martin KaFai Lau wrote: >>> On 2/2/26 7:10 AM, Michal Luczaj wrote: >>>> In related news, looks like bpf_iter_unix_seq_show() is missing >>>> unix_state_lock(): lock_sock_fast() won't stop unix_release_sock(). E.g. >>>> bpf iterator can grab unix_sock::peer as it is being released. >>> >>> If the concern is the bpf iterator prog may use a released unix_peer(sk) >>> pointer, it should be fine. The unix_peer(sk) pointer is not a trusted >>> pointer to the bpf prog, so nothing bad will happen other than >>> potentially reading incorrect values. >> >> But if the prog passes a released peer pointer to a bpf helper: >> >> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0x95/0xb0 >> Read of size 1 at addr ffff888110654c92 by task test_progs/1936 hmm... bpf_skc_to_unix_sock is exposed to tracing. bpf_iter is a tracing bpf prog. > > Can you cook a patch for this ? probably like below This can help the bpf_iter but not the other tracing prog such as fentry. > > ---8<--- > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > index 02ebad6afac7..9c7e9fbde362 100644 > --- a/net/unix/af_unix.c > +++ b/net/unix/af_unix.c > @@ -3740,8 +3740,9 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v) > return 0; > > slow = lock_sock_fast(sk); > + unix_state_lock(sk); > > - if (unlikely(sk_unhashed(sk))) { > + if (unlikely(sock_flag(other, SOCK_DEAD))) { > ret = SEQ_SKIP; > goto unlock; > } > @@ -3751,6 +3752,7 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v) > prog = bpf_iter_get_info(&meta, false); > ret = unix_prog_seq_show(prog, &meta, v, uid); > unlock: > + unix_staet_unlock(sk); > unlock_sock_fast(sk, slow); > return ret; > } > ---8<--- > > Thanks! ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-04 7:15 ` Martin KaFai Lau @ 2026-02-04 7:58 ` Kuniyuki Iwashima 2026-02-04 15:41 ` Michal Luczaj 0 siblings, 1 reply; 24+ messages in thread From: Kuniyuki Iwashima @ 2026-02-04 7:58 UTC (permalink / raw) To: Martin KaFai Lau Cc: mhal, bpf, daniel, davem, edumazet, horms, jakub, john.fastabend, kuba, linux-kernel, netdev, pabeni On Tue, Feb 3, 2026 at 11:15 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 2/3/26 11:47 AM, Kuniyuki Iwashima wrote: > > From: Michal Luczaj <mhal@rbox.co> > > Date: Tue, 3 Feb 2026 10:57:46 +0100 > >> On 2/3/26 04:53, Martin KaFai Lau wrote: > >>> On 2/2/26 7:10 AM, Michal Luczaj wrote: > >>>> In related news, looks like bpf_iter_unix_seq_show() is missing > >>>> unix_state_lock(): lock_sock_fast() won't stop unix_release_sock(). E.g. > >>>> bpf iterator can grab unix_sock::peer as it is being released. > >>> > >>> If the concern is the bpf iterator prog may use a released unix_peer(sk) > >>> pointer, it should be fine. The unix_peer(sk) pointer is not a trusted > >>> pointer to the bpf prog, so nothing bad will happen other than > >>> potentially reading incorrect values. > >> > >> But if the prog passes a released peer pointer to a bpf helper: > >> > >> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0x95/0xb0 > >> Read of size 1 at addr ffff888110654c92 by task test_progs/1936 > > hmm... bpf_skc_to_unix_sock is exposed to tracing. bpf_iter is a tracing > bpf prog. > > > > > Can you cook a patch for this ? probably like below > > This can help the bpf_iter but not the other tracing prog such as fentry. Oh well ... then bpf_skc_to_unix_sock() can be used even with SEQ_START_TOKEN at fentry of bpf_iter_unix_seq_show() ?? How about adding notrace to all af_unix bpf iterator functions ? The procfs iterator holds a spinlock of the hashtable from ->start/next() to ->stop() to prevent the race with unix_release_sock(). I think other (non-iterator) functions cannot do such racy access with tracing prog. > > > > > ---8<--- > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > > index 02ebad6afac7..9c7e9fbde362 100644 > > --- a/net/unix/af_unix.c > > +++ b/net/unix/af_unix.c > > @@ -3740,8 +3740,9 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v) > > return 0; > > > > slow = lock_sock_fast(sk); > > + unix_state_lock(sk); > > > > - if (unlikely(sk_unhashed(sk))) { > > + if (unlikely(sock_flag(other, SOCK_DEAD))) { > > ret = SEQ_SKIP; > > goto unlock; > > } > > @@ -3751,6 +3752,7 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v) > > prog = bpf_iter_get_info(&meta, false); > > ret = unix_prog_seq_show(prog, &meta, v, uid); > > unlock: > > + unix_staet_unlock(sk); > > unlock_sock_fast(sk, slow); > > return ret; > > } > > ---8<--- > > > > Thanks! > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-04 7:58 ` Kuniyuki Iwashima @ 2026-02-04 15:41 ` Michal Luczaj 2026-02-04 19:16 ` Kuniyuki Iwashima 2026-02-04 19:34 ` Martin KaFai Lau 0 siblings, 2 replies; 24+ messages in thread From: Michal Luczaj @ 2026-02-04 15:41 UTC (permalink / raw) To: Kuniyuki Iwashima, Martin KaFai Lau Cc: bpf, daniel, davem, edumazet, horms, jakub, john.fastabend, kuba, linux-kernel, netdev, pabeni On 2/4/26 08:58, Kuniyuki Iwashima wrote: > On Tue, Feb 3, 2026 at 11:15 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: >> >> On 2/3/26 11:47 AM, Kuniyuki Iwashima wrote: >>> From: Michal Luczaj <mhal@rbox.co> >>> Date: Tue, 3 Feb 2026 10:57:46 +0100 >>>> On 2/3/26 04:53, Martin KaFai Lau wrote: >>>>> On 2/2/26 7:10 AM, Michal Luczaj wrote: >>>>>> In related news, looks like bpf_iter_unix_seq_show() is missing >>>>>> unix_state_lock(): lock_sock_fast() won't stop unix_release_sock(). E.g. >>>>>> bpf iterator can grab unix_sock::peer as it is being released. >>>>> >>>>> If the concern is the bpf iterator prog may use a released unix_peer(sk) >>>>> pointer, it should be fine. The unix_peer(sk) pointer is not a trusted >>>>> pointer to the bpf prog, so nothing bad will happen other than >>>>> potentially reading incorrect values. >>>> >>>> But if the prog passes a released peer pointer to a bpf helper: >>>> >>>> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0x95/0xb0 >>>> Read of size 1 at addr ffff888110654c92 by task test_progs/1936 >> >> hmm... bpf_skc_to_unix_sock is exposed to tracing. bpf_iter is a tracing >> bpf prog. >> >>> >>> Can you cook a patch for this ? probably like below >> >> This can help the bpf_iter but not the other tracing prog such as fentry. > > Oh well ... then bpf_skc_to_unix_sock() can be used even > with SEQ_START_TOKEN at fentry of bpf_iter_unix_seq_show() ?? > > How about adding notrace to all af_unix bpf iterator functions ? > > The procfs iterator holds a spinlock of the hashtable from > ->start/next() to ->stop() to prevent the race with unix_release_sock(). > > I think other (non-iterator) functions cannot do such racy > access with tracing prog. But then there's SOCK_DGRAM where you can drop unix_peer(sk) without releasing sk; see AF_UNSPEC in unix_dgram_connect(). I think Martin is right, we can crash at many fentries. BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0xa4/0xb0 Read of size 2 at addr ffff888147d38890 by task test_progs/2495 Call Trace: dump_stack_lvl+0x5d/0x80 print_report+0x170/0x4f3 kasan_report+0xe1/0x180 bpf_skc_to_unix_sock+0xa4/0xb0 bpf_prog_564a1c39c35d86a2_unix_shutdown_entry+0x8a/0x8e bpf_trampoline_6442564662+0x47/0xab unix_shutdown+0x9/0x880 __sys_shutdown+0xe1/0x160 __x64_sys_shutdown+0x52/0x90 do_syscall_64+0x6b/0x3a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e > >> >>> >>> ---8<--- >>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c >>> index 02ebad6afac7..9c7e9fbde362 100644 >>> --- a/net/unix/af_unix.c >>> +++ b/net/unix/af_unix.c >>> @@ -3740,8 +3740,9 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v) >>> return 0; >>> >>> slow = lock_sock_fast(sk); >>> + unix_state_lock(sk); >>> >>> - if (unlikely(sk_unhashed(sk))) { >>> + if (unlikely(sock_flag(other, SOCK_DEAD))) { >>> ret = SEQ_SKIP; >>> goto unlock; >>> } >>> @@ -3751,6 +3752,7 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v) >>> prog = bpf_iter_get_info(&meta, false); >>> ret = unix_prog_seq_show(prog, &meta, v, uid); >>> unlock: >>> + unix_staet_unlock(sk); >>> unlock_sock_fast(sk, slow); >>> return ret; >>> } >>> ---8<--- >>> >>> Thanks! >> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-04 15:41 ` Michal Luczaj @ 2026-02-04 19:16 ` Kuniyuki Iwashima 2026-02-04 20:18 ` Martin KaFai Lau 2026-02-04 19:34 ` Martin KaFai Lau 1 sibling, 1 reply; 24+ messages in thread From: Kuniyuki Iwashima @ 2026-02-04 19:16 UTC (permalink / raw) To: Michal Luczaj Cc: Martin KaFai Lau, bpf, daniel, davem, edumazet, horms, jakub, john.fastabend, kuba, linux-kernel, netdev, pabeni On Wed, Feb 4, 2026 at 7:41 AM Michal Luczaj <mhal@rbox.co> wrote: > > On 2/4/26 08:58, Kuniyuki Iwashima wrote: > > On Tue, Feb 3, 2026 at 11:15 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > >> > >> On 2/3/26 11:47 AM, Kuniyuki Iwashima wrote: > >>> From: Michal Luczaj <mhal@rbox.co> > >>> Date: Tue, 3 Feb 2026 10:57:46 +0100 > >>>> On 2/3/26 04:53, Martin KaFai Lau wrote: > >>>>> On 2/2/26 7:10 AM, Michal Luczaj wrote: > >>>>>> In related news, looks like bpf_iter_unix_seq_show() is missing > >>>>>> unix_state_lock(): lock_sock_fast() won't stop unix_release_sock(). E.g. > >>>>>> bpf iterator can grab unix_sock::peer as it is being released. > >>>>> > >>>>> If the concern is the bpf iterator prog may use a released unix_peer(sk) > >>>>> pointer, it should be fine. The unix_peer(sk) pointer is not a trusted > >>>>> pointer to the bpf prog, so nothing bad will happen other than > >>>>> potentially reading incorrect values. > >>>> > >>>> But if the prog passes a released peer pointer to a bpf helper: > >>>> > >>>> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0x95/0xb0 > >>>> Read of size 1 at addr ffff888110654c92 by task test_progs/1936 > >> > >> hmm... bpf_skc_to_unix_sock is exposed to tracing. bpf_iter is a tracing > >> bpf prog. > >> > >>> > >>> Can you cook a patch for this ? probably like below > >> > >> This can help the bpf_iter but not the other tracing prog such as fentry. > > > > Oh well ... then bpf_skc_to_unix_sock() can be used even > > with SEQ_START_TOKEN at fentry of bpf_iter_unix_seq_show() ?? > > > > How about adding notrace to all af_unix bpf iterator functions ? > > > > The procfs iterator holds a spinlock of the hashtable from > > ->start/next() to ->stop() to prevent the race with unix_release_sock(). > > > > I think other (non-iterator) functions cannot do such racy > > access with tracing prog. > > But then there's SOCK_DGRAM where you can drop unix_peer(sk) without > releasing sk; see AF_UNSPEC in unix_dgram_connect(). I think Martin is > right, we can crash at many fentries. Ah I was only considering SOCK_STREAM :p Only solution that came to mind is breaking change enforcing a release function to bpf_skc_to_unix_sock() so that it can hold the peer's refcnt and save the pointer somewhere else until the release function, but I think this is unacceptable. Also, I guess this type of issue could be triggered with any objects that are not refcounted by bpf tracing prog ? For example, inet_sock(sk)->inet_opt could be freed by setsockopt(IP_OPTIONS) even after fentry prog verifies that it's not NULL. I'm not sure if bpf_core_cast() etc allows such access, but if it's allowed, I think there is no general solution. Fortunately that's not null-deref nor oob-write, and it just reads stale info as Martin mentioned... so probably this is WAI for tracing prog ? > > BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0xa4/0xb0 > Read of size 2 at addr ffff888147d38890 by task test_progs/2495 > Call Trace: > dump_stack_lvl+0x5d/0x80 > print_report+0x170/0x4f3 > kasan_report+0xe1/0x180 > bpf_skc_to_unix_sock+0xa4/0xb0 > bpf_prog_564a1c39c35d86a2_unix_shutdown_entry+0x8a/0x8e > bpf_trampoline_6442564662+0x47/0xab > unix_shutdown+0x9/0x880 > __sys_shutdown+0xe1/0x160 > __x64_sys_shutdown+0x52/0x90 > do_syscall_64+0x6b/0x3a0 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > > >> > >>> > >>> ---8<--- > >>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > >>> index 02ebad6afac7..9c7e9fbde362 100644 > >>> --- a/net/unix/af_unix.c > >>> +++ b/net/unix/af_unix.c > >>> @@ -3740,8 +3740,9 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v) > >>> return 0; > >>> > >>> slow = lock_sock_fast(sk); > >>> + unix_state_lock(sk); > >>> > >>> - if (unlikely(sk_unhashed(sk))) { > >>> + if (unlikely(sock_flag(other, SOCK_DEAD))) { > >>> ret = SEQ_SKIP; > >>> goto unlock; > >>> } > >>> @@ -3751,6 +3752,7 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v) > >>> prog = bpf_iter_get_info(&meta, false); > >>> ret = unix_prog_seq_show(prog, &meta, v, uid); > >>> unlock: > >>> + unix_staet_unlock(sk); > >>> unlock_sock_fast(sk, slow); > >>> return ret; > >>> } > >>> ---8<--- > >>> > >>> Thanks! > >> > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-04 19:16 ` Kuniyuki Iwashima @ 2026-02-04 20:18 ` Martin KaFai Lau 0 siblings, 0 replies; 24+ messages in thread From: Martin KaFai Lau @ 2026-02-04 20:18 UTC (permalink / raw) To: Kuniyuki Iwashima, Michal Luczaj Cc: bpf, daniel, davem, edumazet, horms, jakub, john.fastabend, kuba, linux-kernel, netdev, pabeni On 2/4/26 11:16 AM, Kuniyuki Iwashima wrote: > For example, inet_sock(sk)->inet_opt could be freed by > setsockopt(IP_OPTIONS) even after fentry prog verifies > that it's not NULL. This one should be fine because of rcu. > > I'm not sure if bpf_core_cast() etc allows such access, but > if it's allowed, I think there is no general solution. bpf_core_cast (i.e. the "kfunc" bpf_rdonly_cast) does not use the pointer argument, so should be fine. Its return value is marked as PTR_UNTRUSTED. iirc, PTR_UNTRUSTED cannot be passed to helper, so bpf_core_cast should be fine overall. > > Fortunately that's not null-deref nor oob-write, and it just reads > stale info as Martin mentioned... so probably this is WAI for > tracing prog ? afaik, the tracing radius is large, so the prog cannot expect much guarantee. Reading in the bpf prog is fine. The exception is handled. The problem here is passing it to a helper (not kfunc) that depends on the arg being valid. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-04 15:41 ` Michal Luczaj 2026-02-04 19:16 ` Kuniyuki Iwashima @ 2026-02-04 19:34 ` Martin KaFai Lau 2026-02-04 21:09 ` Kuniyuki Iwashima 2026-02-04 23:25 ` Michal Luczaj 1 sibling, 2 replies; 24+ messages in thread From: Martin KaFai Lau @ 2026-02-04 19:34 UTC (permalink / raw) To: Michal Luczaj, Kuniyuki Iwashima Cc: bpf, daniel, davem, edumazet, horms, jakub, john.fastabend, kuba, linux-kernel, netdev, pabeni On 2/4/26 7:41 AM, Michal Luczaj wrote: >>>>>> If the concern is the bpf iterator prog may use a released unix_peer(sk) >>>>>> pointer, it should be fine. The unix_peer(sk) pointer is not a trusted >>>>>> pointer to the bpf prog, so nothing bad will happen other than >>>>>> potentially reading incorrect values. I misremembered that following unix->peer would be marked as (PTR_TO_BTF_ID | PTR_UNTRUSTED). I forgot there are some legacy supports on the PTR_TO_BTF_ID (i.e. without PTR_UNTRUSTED marking). >>>>> >>>>> But if the prog passes a released peer pointer to a bpf helper: >>>>> >>>>> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0x95/0xb0 >>>>> Read of size 1 at addr ffff888110654c92 by task test_progs/1936 >>> >>> hmm... bpf_skc_to_unix_sock is exposed to tracing. bpf_iter is a tracing >>> bpf prog. >>> >>>> >>>> Can you cook a patch for this ? probably like below >>> >>> This can help the bpf_iter but not the other tracing prog such as fentry. >> >> Oh well ... then bpf_skc_to_unix_sock() can be used even >> with SEQ_START_TOKEN at fentry of bpf_iter_unix_seq_show() ?? It is fine. The type is void. >> >> How about adding notrace to all af_unix bpf iterator functions ? but right, other functions taking [unix_]sock pointer could be audited. I don't know af_unix well enough to assess the blast radius or whether some useful functions may become untraceable. >> >> The procfs iterator holds a spinlock of the hashtable from >> ->start/next() to ->stop() to prevent the race with unix_release_sock(). >> >> I think other (non-iterator) functions cannot do such racy >> access with tracing prog. > > But then there's SOCK_DGRAM where you can drop unix_peer(sk) without > releasing sk; see AF_UNSPEC in unix_dgram_connect(). I think Martin is > right, we can crash at many fentries. > > BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0xa4/0xb0 > Read of size 2 at addr ffff888147d38890 by task test_progs/2495 > Call Trace: > dump_stack_lvl+0x5d/0x80 > print_report+0x170/0x4f3 > kasan_report+0xe1/0x180 > bpf_skc_to_unix_sock+0xa4/0xb0 > bpf_prog_564a1c39c35d86a2_unix_shutdown_entry+0x8a/0x8e > bpf_trampoline_6442564662+0x47/0xab > unix_shutdown+0x9/0x880 > __sys_shutdown+0xe1/0x160 > __x64_sys_shutdown+0x52/0x90 > do_syscall_64+0x6b/0x3a0 > entry_SYSCALL_64_after_hwframe+0x76/0x7e This probably is the first case where reading a sk pointer requires a lock. I think it will need to be marked as PTR_UNTRUSTED in the verifier for the unix->peer access, so that it cannot be passed to a helper. There is a BTF_TYPE_SAFE_TRUSTED list. afaik, there is no untrusted one now. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-04 19:34 ` Martin KaFai Lau @ 2026-02-04 21:09 ` Kuniyuki Iwashima 2026-02-05 0:55 ` Martin KaFai Lau 2026-02-04 23:25 ` Michal Luczaj 1 sibling, 1 reply; 24+ messages in thread From: Kuniyuki Iwashima @ 2026-02-04 21:09 UTC (permalink / raw) To: martin.lau Cc: bpf, daniel, davem, edumazet, horms, jakub, john.fastabend, kuba, kuniyu, linux-kernel, mhal, netdev, pabeni From: Martin KaFai Lau <martin.lau@linux.dev> Date: Wed, 4 Feb 2026 11:34:55 -0800 > On 2/4/26 7:41 AM, Michal Luczaj wrote: > >>>>>> If the concern is the bpf iterator prog may use a released unix_peer(sk) > >>>>>> pointer, it should be fine. The unix_peer(sk) pointer is not a trusted > >>>>>> pointer to the bpf prog, so nothing bad will happen other than > >>>>>> potentially reading incorrect values. > > I misremembered that following unix->peer would be marked as > (PTR_TO_BTF_ID | PTR_UNTRUSTED). I forgot there are some legacy supports > on the PTR_TO_BTF_ID (i.e. without PTR_UNTRUSTED marking). > > >>>>> > >>>>> But if the prog passes a released peer pointer to a bpf helper: > >>>>> > >>>>> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0x95/0xb0 > >>>>> Read of size 1 at addr ffff888110654c92 by task test_progs/1936 > >>> > >>> hmm... bpf_skc_to_unix_sock is exposed to tracing. bpf_iter is a tracing > >>> bpf prog. > >>> > >>>> > >>>> Can you cook a patch for this ? probably like below > >>> > >>> This can help the bpf_iter but not the other tracing prog such as fentry. > >> > >> Oh well ... then bpf_skc_to_unix_sock() can be used even > >> with SEQ_START_TOKEN at fentry of bpf_iter_unix_seq_show() ?? > > It is fine. The type is void. > > >> > >> How about adding notrace to all af_unix bpf iterator functions ? > > but right, other functions taking [unix_]sock pointer could be audited. > I don't know af_unix well enough to assess the blast radius or whether > some useful functions may become untraceable. Considering SOCK_DGRAM, the blast radus is much bigger than I thought, so I'd avoid this way if possible by modifying the verifier. > > >> > >> The procfs iterator holds a spinlock of the hashtable from > >> ->start/next() to ->stop() to prevent the race with unix_release_sock(). > >> > >> I think other (non-iterator) functions cannot do such racy > >> access with tracing prog. > > > > But then there's SOCK_DGRAM where you can drop unix_peer(sk) without > > releasing sk; see AF_UNSPEC in unix_dgram_connect(). I think Martin is > > right, we can crash at many fentries. > > > > BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0xa4/0xb0 > > Read of size 2 at addr ffff888147d38890 by task test_progs/2495 > > Call Trace: > > dump_stack_lvl+0x5d/0x80 > > print_report+0x170/0x4f3 > > kasan_report+0xe1/0x180 > > bpf_skc_to_unix_sock+0xa4/0xb0 > > bpf_prog_564a1c39c35d86a2_unix_shutdown_entry+0x8a/0x8e > > bpf_trampoline_6442564662+0x47/0xab > > unix_shutdown+0x9/0x880 > > __sys_shutdown+0xe1/0x160 > > __x64_sys_shutdown+0x52/0x90 > > do_syscall_64+0x6b/0x3a0 > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > This probably is the first case where reading a sk pointer requires a > lock. I think it will need to be marked as PTR_UNTRUSTED in the verifier > for the unix->peer access, so that it cannot be passed to a helper. > There is a BTF_TYPE_SAFE_TRUSTED list. afaik, there is no untrusted one now. Just skimmed the code, and I guess something like below would do that ? and if needed, we could add another helper to fetch peer with a proper release function ? ---8<--- diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 3135643d5695..ef8b4dd21923 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -7177,6 +7177,14 @@ static bool type_is_rcu_or_null(struct bpf_verifier_env *env, return btf_nested_type_is_trusted(&env->log, reg, field_name, btf_id, "__safe_rcu_or_null"); } +static bool type_is_untrusted(struct bpf_verifier_env *env, + struct bpf_reg_state *reg, + const char *field_name, u32 btf_id) +{ + /* TODO: return true if field_name and btf_id is unix_sock.peer. */ + return false; +} + static bool type_is_trusted(struct bpf_verifier_env *env, struct bpf_reg_state *reg, const char *field_name, u32 btf_id) @@ -7307,7 +7315,9 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env, * A regular RCU-protected pointer with __rcu tag can also be deemed * trusted if we are in an RCU CS. Such pointer can be NULL. */ - if (type_is_trusted(env, reg, field_name, btf_id)) { + if (type_is_untrusted(env, reg, field_name, btf_id)) { + flag |= PTR_UNTRUSTED; + } else if (type_is_trusted(env, reg, field_name, btf_id)) { flag |= PTR_TRUSTED; } else if (type_is_trusted_or_null(env, reg, field_name, btf_id)) { flag |= PTR_TRUSTED | PTR_MAYBE_NULL; ---8<--- ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-04 21:09 ` Kuniyuki Iwashima @ 2026-02-05 0:55 ` Martin KaFai Lau 2026-02-05 2:00 ` Martin KaFai Lau 0 siblings, 1 reply; 24+ messages in thread From: Martin KaFai Lau @ 2026-02-05 0:55 UTC (permalink / raw) To: Kuniyuki Iwashima, Alexei Starovoitov Cc: bpf, daniel, davem, edumazet, horms, jakub, john.fastabend, kuba, linux-kernel, mhal, netdev, pabeni On 2/4/26 1:09 PM, Kuniyuki Iwashima wrote: > From: Martin KaFai Lau <martin.lau@linux.dev> > Date: Wed, 4 Feb 2026 11:34:55 -0800 >> On 2/4/26 7:41 AM, Michal Luczaj wrote: >>>>>>>> If the concern is the bpf iterator prog may use a released unix_peer(sk) >>>>>>>> pointer, it should be fine. The unix_peer(sk) pointer is not a trusted >>>>>>>> pointer to the bpf prog, so nothing bad will happen other than >>>>>>>> potentially reading incorrect values. >> >> I misremembered that following unix->peer would be marked as >> (PTR_TO_BTF_ID | PTR_UNTRUSTED). I forgot there are some legacy supports >> on the PTR_TO_BTF_ID (i.e. without PTR_UNTRUSTED marking). >> >>>>>>> >>>>>>> But if the prog passes a released peer pointer to a bpf helper: >>>>>>> >>>>>>> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0x95/0xb0 >>>>>>> Read of size 1 at addr ffff888110654c92 by task test_progs/1936 >>>>> >>>>> hmm... bpf_skc_to_unix_sock is exposed to tracing. bpf_iter is a tracing >>>>> bpf prog. >>>>> >>>>>> >>>>>> Can you cook a patch for this ? probably like below >>>>> >>>>> This can help the bpf_iter but not the other tracing prog such as fentry. >>>> >>>> Oh well ... then bpf_skc_to_unix_sock() can be used even >>>> with SEQ_START_TOKEN at fentry of bpf_iter_unix_seq_show() ?? >> >> It is fine. The type is void. >> >>>> >>>> How about adding notrace to all af_unix bpf iterator functions ? >> >> but right, other functions taking [unix_]sock pointer could be audited. >> I don't know af_unix well enough to assess the blast radius or whether >> some useful functions may become untraceable. > > Considering SOCK_DGRAM, the blast radus is much bigger than > I thought, so I'd avoid this way if possible by modifying > the verifier. > > >> >>>> >>>> The procfs iterator holds a spinlock of the hashtable from >>>> ->start/next() to ->stop() to prevent the race with unix_release_sock(). >>>> >>>> I think other (non-iterator) functions cannot do such racy >>>> access with tracing prog. >>> >>> But then there's SOCK_DGRAM where you can drop unix_peer(sk) without >>> releasing sk; see AF_UNSPEC in unix_dgram_connect(). I think Martin is >>> right, we can crash at many fentries. >>> >>> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0xa4/0xb0 >>> Read of size 2 at addr ffff888147d38890 by task test_progs/2495 >>> Call Trace: >>> dump_stack_lvl+0x5d/0x80 >>> print_report+0x170/0x4f3 >>> kasan_report+0xe1/0x180 >>> bpf_skc_to_unix_sock+0xa4/0xb0 >>> bpf_prog_564a1c39c35d86a2_unix_shutdown_entry+0x8a/0x8e >>> bpf_trampoline_6442564662+0x47/0xab >>> unix_shutdown+0x9/0x880 >>> __sys_shutdown+0xe1/0x160 >>> __x64_sys_shutdown+0x52/0x90 >>> do_syscall_64+0x6b/0x3a0 >>> entry_SYSCALL_64_after_hwframe+0x76/0x7e >> >> This probably is the first case where reading a sk pointer requires a >> lock. I think it will need to be marked as PTR_UNTRUSTED in the verifier >> for the unix->peer access, so that it cannot be passed to a helper. >> There is a BTF_TYPE_SAFE_TRUSTED list. afaik, there is no untrusted one now. > > Just skimmed the code, and I guess something like below would > do that ? and if needed, we could add another helper to fetch > peer with a proper release function ? > > ---8<--- > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 3135643d5695..ef8b4dd21923 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -7177,6 +7177,14 @@ static bool type_is_rcu_or_null(struct bpf_verifier_env *env, > return btf_nested_type_is_trusted(&env->log, reg, field_name, btf_id, "__safe_rcu_or_null"); > } > > +static bool type_is_untrusted(struct bpf_verifier_env *env, > + struct bpf_reg_state *reg, > + const char *field_name, u32 btf_id) > +{ > + /* TODO: return true if field_name and btf_id is unix_sock.peer. */ > + return false; > +} > + > static bool type_is_trusted(struct bpf_verifier_env *env, > struct bpf_reg_state *reg, > const char *field_name, u32 btf_id) > @@ -7307,7 +7315,9 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env, > * A regular RCU-protected pointer with __rcu tag can also be deemed > * trusted if we are in an RCU CS. Such pointer can be NULL. > */ > - if (type_is_trusted(env, reg, field_name, btf_id)) { > + if (type_is_untrusted(env, reg, field_name, btf_id)) { > + flag |= PTR_UNTRUSTED; Something like this but I think the PTR_UNTRUSTED marking should be done right after the clear_trusted_flags() where it is for supporting the depreciated PTR_TO_BTF_ID. Before that ... Alexei, can you advise if we should change the verifier to mark PTR_UNTRUSTED on unix_sock->peer or we can deprecate the bpf_skc_to_* helper support from tracing and ask the user to switch to bpf_core_cast (i.e. bpf_rdonly_cast) by using a WARN_ON_ONCE message? The problem is that the unix_sock->peer pointer is not always valid when passing to the bpf_skc_to_* helpers, so it is a UAF. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-05 0:55 ` Martin KaFai Lau @ 2026-02-05 2:00 ` Martin KaFai Lau 2026-02-05 7:39 ` Kuniyuki Iwashima 0 siblings, 1 reply; 24+ messages in thread From: Martin KaFai Lau @ 2026-02-05 2:00 UTC (permalink / raw) To: Kuniyuki Iwashima, Alexei Starovoitov Cc: bpf, daniel, davem, edumazet, horms, jakub, john.fastabend, kuba, linux-kernel, mhal, netdev, pabeni On 2/4/26 4:55 PM, Martin KaFai Lau wrote: > > > On 2/4/26 1:09 PM, Kuniyuki Iwashima wrote: >> From: Martin KaFai Lau <martin.lau@linux.dev> >> Date: Wed, 4 Feb 2026 11:34:55 -0800 >>> On 2/4/26 7:41 AM, Michal Luczaj wrote: >>>>>>>>> If the concern is the bpf iterator prog may use a released >>>>>>>>> unix_peer(sk) >>>>>>>>> pointer, it should be fine. The unix_peer(sk) pointer is not a >>>>>>>>> trusted >>>>>>>>> pointer to the bpf prog, so nothing bad will happen other than >>>>>>>>> potentially reading incorrect values. >>> >>> I misremembered that following unix->peer would be marked as >>> (PTR_TO_BTF_ID | PTR_UNTRUSTED). I forgot there are some legacy supports >>> on the PTR_TO_BTF_ID (i.e. without PTR_UNTRUSTED marking). >>> >>>>>>>> >>>>>>>> But if the prog passes a released peer pointer to a bpf helper: >>>>>>>> >>>>>>>> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0x95/0xb0 >>>>>>>> Read of size 1 at addr ffff888110654c92 by task test_progs/1936 >>>>>> >>>>>> hmm... bpf_skc_to_unix_sock is exposed to tracing. bpf_iter is a >>>>>> tracing >>>>>> bpf prog. >>>>>> >>>>>>> >>>>>>> Can you cook a patch for this ? probably like below >>>>>> >>>>>> This can help the bpf_iter but not the other tracing prog such as >>>>>> fentry. >>>>> >>>>> Oh well ... then bpf_skc_to_unix_sock() can be used even >>>>> with SEQ_START_TOKEN at fentry of bpf_iter_unix_seq_show() ?? >>> >>> It is fine. The type is void. >>> >>>>> >>>>> How about adding notrace to all af_unix bpf iterator functions ? >>> >>> but right, other functions taking [unix_]sock pointer could be audited. >>> I don't know af_unix well enough to assess the blast radius or whether >>> some useful functions may become untraceable. >> >> Considering SOCK_DGRAM, the blast radus is much bigger than >> I thought, so I'd avoid this way if possible by modifying >> the verifier. >> >> >>> >>>>> >>>>> The procfs iterator holds a spinlock of the hashtable from >>>>> ->start/next() to ->stop() to prevent the race with >>>>> unix_release_sock(). >>>>> >>>>> I think other (non-iterator) functions cannot do such racy >>>>> access with tracing prog. >>>> >>>> But then there's SOCK_DGRAM where you can drop unix_peer(sk) without >>>> releasing sk; see AF_UNSPEC in unix_dgram_connect(). I think Martin is >>>> right, we can crash at many fentries. >>>> >>>> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0xa4/0xb0 >>>> Read of size 2 at addr ffff888147d38890 by task test_progs/2495 >>>> Call Trace: >>>> dump_stack_lvl+0x5d/0x80 >>>> print_report+0x170/0x4f3 >>>> kasan_report+0xe1/0x180 >>>> bpf_skc_to_unix_sock+0xa4/0xb0 >>>> bpf_prog_564a1c39c35d86a2_unix_shutdown_entry+0x8a/0x8e >>>> bpf_trampoline_6442564662+0x47/0xab >>>> unix_shutdown+0x9/0x880 >>>> __sys_shutdown+0xe1/0x160 >>>> __x64_sys_shutdown+0x52/0x90 >>>> do_syscall_64+0x6b/0x3a0 >>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e >>> >>> This probably is the first case where reading a sk pointer requires a >>> lock. I think it will need to be marked as PTR_UNTRUSTED in the verifier >>> for the unix->peer access, so that it cannot be passed to a helper. >>> There is a BTF_TYPE_SAFE_TRUSTED list. afaik, there is no untrusted >>> one now. >> >> Just skimmed the code, and I guess something like below would >> do that ? and if needed, we could add another helper to fetch >> peer with a proper release function ? >> >> ---8<--- >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >> index 3135643d5695..ef8b4dd21923 100644 >> --- a/kernel/bpf/verifier.c >> +++ b/kernel/bpf/verifier.c >> @@ -7177,6 +7177,14 @@ static bool type_is_rcu_or_null(struct >> bpf_verifier_env *env, >> return btf_nested_type_is_trusted(&env->log, reg, field_name, >> btf_id, "__safe_rcu_or_null"); >> } >> +static bool type_is_untrusted(struct bpf_verifier_env *env, >> + struct bpf_reg_state *reg, >> + const char *field_name, u32 btf_id) >> +{ >> + /* TODO: return true if field_name and btf_id is unix_sock.peer. */ >> + return false; >> +} >> + >> static bool type_is_trusted(struct bpf_verifier_env *env, >> struct bpf_reg_state *reg, >> const char *field_name, u32 btf_id) >> @@ -7307,7 +7315,9 @@ static int check_ptr_to_btf_access(struct >> bpf_verifier_env *env, >> * A regular RCU-protected pointer with __rcu tag can also >> be deemed >> * trusted if we are in an RCU CS. Such pointer can be NULL. >> */ >> - if (type_is_trusted(env, reg, field_name, btf_id)) { >> + if (type_is_untrusted(env, reg, field_name, btf_id)) { >> + flag |= PTR_UNTRUSTED; > > Something like this but I think the PTR_UNTRUSTED marking should be done > right after the clear_trusted_flags() where it is for supporting the > depreciated PTR_TO_BTF_ID. Before that ... > > Alexei, can you advise if we should change the verifier to mark > PTR_UNTRUSTED on unix_sock->peer or we can deprecate the bpf_skc_to_* > helper support from tracing and ask the user to switch to bpf_core_cast > (i.e. bpf_rdonly_cast) by using a WARN_ON_ONCE message? After trying more, taking out bpf_skc_to_* is not enough. It still needs to reject passing unix->peer to bpf_setsockopt for bpf_iter, so PTR_UNTRUSTED mark is needed. > > The problem is that the unix_sock->peer pointer is not always valid when > passing to the bpf_skc_to_* helpers, so it is a UAF. > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-05 2:00 ` Martin KaFai Lau @ 2026-02-05 7:39 ` Kuniyuki Iwashima 0 siblings, 0 replies; 24+ messages in thread From: Kuniyuki Iwashima @ 2026-02-05 7:39 UTC (permalink / raw) To: Martin KaFai Lau Cc: Alexei Starovoitov, bpf, daniel, davem, edumazet, horms, jakub, john.fastabend, kuba, linux-kernel, mhal, netdev, pabeni On Wed, Feb 4, 2026 at 6:00 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > > > On 2/4/26 4:55 PM, Martin KaFai Lau wrote: > > > > > > On 2/4/26 1:09 PM, Kuniyuki Iwashima wrote: > >> From: Martin KaFai Lau <martin.lau@linux.dev> > >> Date: Wed, 4 Feb 2026 11:34:55 -0800 > >>> On 2/4/26 7:41 AM, Michal Luczaj wrote: > >>>>>>>>> If the concern is the bpf iterator prog may use a released > >>>>>>>>> unix_peer(sk) > >>>>>>>>> pointer, it should be fine. The unix_peer(sk) pointer is not a > >>>>>>>>> trusted > >>>>>>>>> pointer to the bpf prog, so nothing bad will happen other than > >>>>>>>>> potentially reading incorrect values. > >>> > >>> I misremembered that following unix->peer would be marked as > >>> (PTR_TO_BTF_ID | PTR_UNTRUSTED). I forgot there are some legacy supports > >>> on the PTR_TO_BTF_ID (i.e. without PTR_UNTRUSTED marking). > >>> > >>>>>>>> > >>>>>>>> But if the prog passes a released peer pointer to a bpf helper: > >>>>>>>> > >>>>>>>> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0x95/0xb0 > >>>>>>>> Read of size 1 at addr ffff888110654c92 by task test_progs/1936 > >>>>>> > >>>>>> hmm... bpf_skc_to_unix_sock is exposed to tracing. bpf_iter is a > >>>>>> tracing > >>>>>> bpf prog. > >>>>>> > >>>>>>> > >>>>>>> Can you cook a patch for this ? probably like below > >>>>>> > >>>>>> This can help the bpf_iter but not the other tracing prog such as > >>>>>> fentry. > >>>>> > >>>>> Oh well ... then bpf_skc_to_unix_sock() can be used even > >>>>> with SEQ_START_TOKEN at fentry of bpf_iter_unix_seq_show() ?? > >>> > >>> It is fine. The type is void. > >>> > >>>>> > >>>>> How about adding notrace to all af_unix bpf iterator functions ? > >>> > >>> but right, other functions taking [unix_]sock pointer could be audited. > >>> I don't know af_unix well enough to assess the blast radius or whether > >>> some useful functions may become untraceable. > >> > >> Considering SOCK_DGRAM, the blast radus is much bigger than > >> I thought, so I'd avoid this way if possible by modifying > >> the verifier. > >> > >> > >>> > >>>>> > >>>>> The procfs iterator holds a spinlock of the hashtable from > >>>>> ->start/next() to ->stop() to prevent the race with > >>>>> unix_release_sock(). > >>>>> > >>>>> I think other (non-iterator) functions cannot do such racy > >>>>> access with tracing prog. > >>>> > >>>> But then there's SOCK_DGRAM where you can drop unix_peer(sk) without > >>>> releasing sk; see AF_UNSPEC in unix_dgram_connect(). I think Martin is > >>>> right, we can crash at many fentries. > >>>> > >>>> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0xa4/0xb0 > >>>> Read of size 2 at addr ffff888147d38890 by task test_progs/2495 > >>>> Call Trace: > >>>> dump_stack_lvl+0x5d/0x80 > >>>> print_report+0x170/0x4f3 > >>>> kasan_report+0xe1/0x180 > >>>> bpf_skc_to_unix_sock+0xa4/0xb0 > >>>> bpf_prog_564a1c39c35d86a2_unix_shutdown_entry+0x8a/0x8e > >>>> bpf_trampoline_6442564662+0x47/0xab > >>>> unix_shutdown+0x9/0x880 > >>>> __sys_shutdown+0xe1/0x160 > >>>> __x64_sys_shutdown+0x52/0x90 > >>>> do_syscall_64+0x6b/0x3a0 > >>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e > >>> > >>> This probably is the first case where reading a sk pointer requires a > >>> lock. I think it will need to be marked as PTR_UNTRUSTED in the verifier > >>> for the unix->peer access, so that it cannot be passed to a helper. > >>> There is a BTF_TYPE_SAFE_TRUSTED list. afaik, there is no untrusted > >>> one now. > >> > >> Just skimmed the code, and I guess something like below would > >> do that ? and if needed, we could add another helper to fetch > >> peer with a proper release function ? > >> > >> ---8<--- > >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > >> index 3135643d5695..ef8b4dd21923 100644 > >> --- a/kernel/bpf/verifier.c > >> +++ b/kernel/bpf/verifier.c > >> @@ -7177,6 +7177,14 @@ static bool type_is_rcu_or_null(struct > >> bpf_verifier_env *env, > >> return btf_nested_type_is_trusted(&env->log, reg, field_name, > >> btf_id, "__safe_rcu_or_null"); > >> } > >> +static bool type_is_untrusted(struct bpf_verifier_env *env, > >> + struct bpf_reg_state *reg, > >> + const char *field_name, u32 btf_id) > >> +{ > >> + /* TODO: return true if field_name and btf_id is unix_sock.peer. */ > >> + return false; > >> +} > >> + > >> static bool type_is_trusted(struct bpf_verifier_env *env, > >> struct bpf_reg_state *reg, > >> const char *field_name, u32 btf_id) > >> @@ -7307,7 +7315,9 @@ static int check_ptr_to_btf_access(struct > >> bpf_verifier_env *env, > >> * A regular RCU-protected pointer with __rcu tag can also > >> be deemed > >> * trusted if we are in an RCU CS. Such pointer can be NULL. > >> */ > >> - if (type_is_trusted(env, reg, field_name, btf_id)) { > >> + if (type_is_untrusted(env, reg, field_name, btf_id)) { > >> + flag |= PTR_UNTRUSTED; > > > > Something like this but I think the PTR_UNTRUSTED marking should be done > > right after the clear_trusted_flags() where it is for supporting the > > depreciated PTR_TO_BTF_ID. Before that ... > > > > Alexei, can you advise if we should change the verifier to mark > > PTR_UNTRUSTED on unix_sock->peer or we can deprecate the bpf_skc_to_* > > helper support from tracing and ask the user to switch to bpf_core_cast > > (i.e. bpf_rdonly_cast) by using a WARN_ON_ONCE message? > > After trying more, taking out bpf_skc_to_* is not enough. It still needs > to reject passing unix->peer to bpf_setsockopt for bpf_iter, so > PTR_UNTRUSTED mark is needed. Ah exactly, I'll cook a patch in that way. > > > > > The problem is that the unix_sock->peer pointer is not always valid when > > passing to the bpf_skc_to_* helpers, so it is a UAF. > > > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-04 19:34 ` Martin KaFai Lau 2026-02-04 21:09 ` Kuniyuki Iwashima @ 2026-02-04 23:25 ` Michal Luczaj 2026-02-05 0:27 ` Kuniyuki Iwashima 2026-02-05 0:31 ` Martin KaFai Lau 1 sibling, 2 replies; 24+ messages in thread From: Michal Luczaj @ 2026-02-04 23:25 UTC (permalink / raw) To: Martin KaFai Lau, Kuniyuki Iwashima Cc: bpf, daniel, davem, edumazet, horms, jakub, john.fastabend, kuba, linux-kernel, netdev, pabeni On 2/4/26 20:34, Martin KaFai Lau wrote: >> But then there's SOCK_DGRAM where you can drop unix_peer(sk) without >> releasing sk; see AF_UNSPEC in unix_dgram_connect(). I think Martin is >> right, we can crash at many fentries. >> >> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0xa4/0xb0 >> Read of size 2 at addr ffff888147d38890 by task test_progs/2495 >> Call Trace: >> dump_stack_lvl+0x5d/0x80 >> print_report+0x170/0x4f3 >> kasan_report+0xe1/0x180 >> bpf_skc_to_unix_sock+0xa4/0xb0 >> bpf_prog_564a1c39c35d86a2_unix_shutdown_entry+0x8a/0x8e >> bpf_trampoline_6442564662+0x47/0xab >> unix_shutdown+0x9/0x880 >> __sys_shutdown+0xe1/0x160 >> __x64_sys_shutdown+0x52/0x90 >> do_syscall_64+0x6b/0x3a0 >> entry_SYSCALL_64_after_hwframe+0x76/0x7e > > This probably is the first case where reading a sk pointer requires a > lock. I think it will need to be marked as PTR_UNTRUSTED in the verifier > for the unix->peer access, so that it cannot be passed to a helper. > There is a BTF_TYPE_SAFE_TRUSTED list. afaik, there is no untrusted one now. What about 'fexit/sk_common_release' that does 'bpf_skc_to_unix_sock(sk)'. Is this something the verifies is supposed to handle? $ python -c 'from socket import *; socket(AF_INET, SOCK_DGRAM)' BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0x95/0xb0 Read of size 1 at addr ffff888128312112 by task python/4076 Call Trace: dump_stack_lvl+0x5d/0x80 print_report+0x170/0x4f3 kasan_report+0xe1/0x180 bpf_skc_to_unix_sock+0x95/0xb0 bpf_prog_70a8d38bd5dbf399_release_exit+0x87/0x8b bpf_trampoline_6442556594+0x64/0xd3 inet_release+0x104/0x230 __sock_release+0xb0/0x270 sock_close+0x18/0x20 __fput+0x36e/0xac0 fput_close_sync+0xe5/0x1a0 __x64_sys_close+0x7d/0xd0 do_syscall_64+0x6b/0x3a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-04 23:25 ` Michal Luczaj @ 2026-02-05 0:27 ` Kuniyuki Iwashima 2026-02-05 0:31 ` Martin KaFai Lau 1 sibling, 0 replies; 24+ messages in thread From: Kuniyuki Iwashima @ 2026-02-05 0:27 UTC (permalink / raw) To: Michal Luczaj Cc: Martin KaFai Lau, bpf, daniel, davem, edumazet, horms, jakub, john.fastabend, kuba, linux-kernel, netdev, pabeni On Wed, Feb 4, 2026 at 3:25 PM Michal Luczaj <mhal@rbox.co> wrote: > > On 2/4/26 20:34, Martin KaFai Lau wrote: > >> But then there's SOCK_DGRAM where you can drop unix_peer(sk) without > >> releasing sk; see AF_UNSPEC in unix_dgram_connect(). I think Martin is > >> right, we can crash at many fentries. > >> > >> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0xa4/0xb0 > >> Read of size 2 at addr ffff888147d38890 by task test_progs/2495 > >> Call Trace: > >> dump_stack_lvl+0x5d/0x80 > >> print_report+0x170/0x4f3 > >> kasan_report+0xe1/0x180 > >> bpf_skc_to_unix_sock+0xa4/0xb0 > >> bpf_prog_564a1c39c35d86a2_unix_shutdown_entry+0x8a/0x8e > >> bpf_trampoline_6442564662+0x47/0xab > >> unix_shutdown+0x9/0x880 > >> __sys_shutdown+0xe1/0x160 > >> __x64_sys_shutdown+0x52/0x90 > >> do_syscall_64+0x6b/0x3a0 > >> entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > > This probably is the first case where reading a sk pointer requires a > > lock. I think it will need to be marked as PTR_UNTRUSTED in the verifier > > for the unix->peer access, so that it cannot be passed to a helper. > > There is a BTF_TYPE_SAFE_TRUSTED list. afaik, there is no untrusted one now. > > What about 'fexit/sk_common_release' that does 'bpf_skc_to_unix_sock(sk)'. > Is this something the verifies is supposed to handle? sk_common_release() is not called from AF_UNIX and even other helpers accessing sk->sk_xxx should trigger the same splat. Moreover, any read would trigger the splat at other functions like sk_prot_free(). We could sprinkle notrace over the kernel or have a blocked list for each helper in the verifier, but then we need to inspect all bpf helpers. In the first place, it makes no sense that someone who knows where a pointer is freed tries to read it after being freed and complains about UAF :/ > > $ python -c 'from socket import *; socket(AF_INET, SOCK_DGRAM)' > > BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0x95/0xb0 > Read of size 1 at addr ffff888128312112 by task python/4076 > Call Trace: > dump_stack_lvl+0x5d/0x80 > print_report+0x170/0x4f3 > kasan_report+0xe1/0x180 > bpf_skc_to_unix_sock+0x95/0xb0 > bpf_prog_70a8d38bd5dbf399_release_exit+0x87/0x8b > bpf_trampoline_6442556594+0x64/0xd3 > inet_release+0x104/0x230 > __sock_release+0xb0/0x270 > sock_close+0x18/0x20 > __fput+0x36e/0xac0 > fput_close_sync+0xe5/0x1a0 > __x64_sys_close+0x7d/0xd0 > do_syscall_64+0x6b/0x3a0 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-04 23:25 ` Michal Luczaj 2026-02-05 0:27 ` Kuniyuki Iwashima @ 2026-02-05 0:31 ` Martin KaFai Lau 1 sibling, 0 replies; 24+ messages in thread From: Martin KaFai Lau @ 2026-02-05 0:31 UTC (permalink / raw) To: Michal Luczaj, Kuniyuki Iwashima Cc: bpf, daniel, davem, edumazet, horms, jakub, john.fastabend, kuba, linux-kernel, netdev, pabeni On 2/4/26 3:25 PM, Michal Luczaj wrote: > On 2/4/26 20:34, Martin KaFai Lau wrote: >>> But then there's SOCK_DGRAM where you can drop unix_peer(sk) without >>> releasing sk; see AF_UNSPEC in unix_dgram_connect(). I think Martin is >>> right, we can crash at many fentries. >>> >>> BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0xa4/0xb0 >>> Read of size 2 at addr ffff888147d38890 by task test_progs/2495 >>> Call Trace: >>> dump_stack_lvl+0x5d/0x80 >>> print_report+0x170/0x4f3 >>> kasan_report+0xe1/0x180 >>> bpf_skc_to_unix_sock+0xa4/0xb0 >>> bpf_prog_564a1c39c35d86a2_unix_shutdown_entry+0x8a/0x8e >>> bpf_trampoline_6442564662+0x47/0xab >>> unix_shutdown+0x9/0x880 >>> __sys_shutdown+0xe1/0x160 >>> __x64_sys_shutdown+0x52/0x90 >>> do_syscall_64+0x6b/0x3a0 >>> entry_SYSCALL_64_after_hwframe+0x76/0x7e >> >> This probably is the first case where reading a sk pointer requires a >> lock. I think it will need to be marked as PTR_UNTRUSTED in the verifier >> for the unix->peer access, so that it cannot be passed to a helper. >> There is a BTF_TYPE_SAFE_TRUSTED list. afaik, there is no untrusted one now. > > What about 'fexit/sk_common_release' that does 'bpf_skc_to_unix_sock(sk)'. > Is this something the verifies is supposed to handle? fexit is an obvious one if the bpf prog wants to shoot itself on the foot by using things at the fexit of a free/release function. There are other things can break also if a tracing-capable user wants to exploit at fexit. I don't have good idea how to solve it. Going back to the bpf_skc_to_* helper. The bigger hammer may be, we should properly depreciate the bpf_skc_to_* usage from tracing instead of fixing it and ask the user to stay with the bpf_core_cast. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-01-31 10:06 ` Kuniyuki Iwashima 2026-02-02 15:10 ` Michal Luczaj @ 2026-02-02 19:15 ` Martin KaFai Lau 2026-02-07 14:37 ` Michal Luczaj 1 sibling, 1 reply; 24+ messages in thread From: Martin KaFai Lau @ 2026-02-02 19:15 UTC (permalink / raw) To: Kuniyuki Iwashima, Michal Luczaj, Jakub Sitnicki, John Fastabend Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann, netdev, bpf, linux-kernel On 1/31/26 2:06 AM, Kuniyuki Iwashima wrote: > On Fri, Jan 30, 2026 at 1:30 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: >> >> On 1/30/26 3:00 AM, Michal Luczaj wrote: >>>>> Follow-up to discussion at >>>>> https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/. >>>> >>>> It is a long thread to dig. Please summarize the discussion in the >>>> commit message. >>> >>> OK, there we go: >>> >>> The root cause of the null-ptr-deref is that unix_stream_connect() sets >>> sk_state (`WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)`) _before_ it assigns >>> a peer (`unix_peer(sk) = newsk`). sk_state == TCP_ESTABLISHED makes >>> sock_map_sk_state_allowed() believe that socket is properly set up, which >>> would include having a defined peer. >>> >>> In other words, there's a window when you can call >>> unix_stream_bpf_update_proto() on socket which still has unix_peer(sk) == NULL. >>> >>> My initial idea was to simply move peer assignment _before_ the sk_state >>> update, but the maintainer wasn't interested in changing the >>> unix_stream_connect() hot path. He suggested taking care of it in the >>> sockmap code. > > Yes, we already have a memory barrier for unix_peer(sk) there > (to save sock_hold()/sock_put() in sendmsg(), see 830a1e5c212fb) > and another one just for sk->sk_state is not worth the unlikely > case in sockmap by a buggy user. > > >>> >>> My understanding is that users are not supposed to put sockets in a sockmap >>> when said socket is only half-way through connect() call. Hence `return >>> -EINVAL` on a missing peer. Now, if users should be allowed to legally race >>> connect() vs. sockmap update, then I guess we can wait for connect() to >>> "finalize" e.g. by taking the unix_state_lock(), as discussed below. > > If a user hit the issue, the user must have updated sockmap while the > user knows connect() had not returned. Such a user must prepare > for failures since it could occur before sock_map_sk_state_allowed() too. > > This is a subtle timing issue and I don't think the kernel should be > friendly to such buggy users by waiting for connect() etc. I don't have a use case for parallel connect and map update either. Also, I have no concern about returning -EINVAL in map_update for the not-yet-completed unix_stream_connect(). However, TCP/UDP (and probably vsock also?) do not have this racing issue because sock_map follows the lock usage in TCP/UDP as in other parts of the kernel. Why AF_UNIX is an exception and unix_state_lock() is not used in sock_map. John and Jakub Stinicki, could this be an oversight? >>> >>>> From looking at this commit message, if the existing lock_sock held by >>>> update_elem is not useful for af_unix, >>> >>> Right, the existing lock_sock is not useful. update's lock_sock holds >>> sock::sk_lock, while unix_state_lock() holds unix_sock::lock. >> >> It sounds like lock_sock is the incorrect lock to hold for af_unix. Is >> taking lock_sock in sock_map doing anything useful for af_unix? Should >> sock_map hold the unix_state_lock instead of lock_sock? > > If sockmap code does not sleep, unix_state_lock can be used there. afaik, bpf prog using the sockmap cannot sleep. Search for "if (prog->sleepable)" in "check_map_prog_compatibility()". The bpf prog attached to sock_map cannot sleep either, there is "can_be_sleepable()" check in the verifier. >>> >>> if (!psock->sk_pair) { >>> + unix_state_lock(sk); >>> + unix_state_unlock(sk); > > I don't like this... we had a similar one in recvmsg(MSG_PEEK) path > for GC with a biiiiiig comment, which I removed in 118f457da9ed . Me neither, for both empty critical section and big fat comment. This usually suggests something else is incorrect to begin with. I believe the wrong lock is taken in this case. Unless there is something prohibiting taking unix_state_lock in sock_map, fix it properly instead. > > >>> sk_pair = unix_peer(sk); >>> sock_hold(sk_pair); >> >> I don't have a strong opinion on waiting or checking NULL. imo, both are >> not easy to understand. One is sk_state had already been checked earlier >> under a lock_sock but still needs to check NULL on unix_peer(). Another >> one is an empty unix_state_[un]lock(). If taking unix_state_lock, may as >> well just use the existing unix_peer_get(sk). > > Yes, unix_peer_get() can be safely used there (with extra sock_put()). The sock_map needs to sock_hold(sk_pair) anyway and stores it in psock->sk_pair, so I don't think it needs an extra sock_put(). > > >> If its return value cannot >> (?) be NULL, WARN_ON_ONCE() instead of having a special empty > > I suggested WARN_ON_ONCE() because Michal reproduced it with > mdelay() and I did not think it could occur in real life, but considering > PREEMPT_RT, it could be real. So, the current form in this patch looks > good to me. hmm... If unix_peer_get(sk) is used and it takes (and waits for) the unix_state_lock, it shouldn't be NULL? The above empty [un]lock idea does not check for NULL on unix_peer() either. Or am I missing something? Regardless, if the proper lock is held, all this complication and reasoning will go away. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update 2026-02-02 19:15 ` Martin KaFai Lau @ 2026-02-07 14:37 ` Michal Luczaj 0 siblings, 0 replies; 24+ messages in thread From: Michal Luczaj @ 2026-02-07 14:37 UTC (permalink / raw) To: Martin KaFai Lau, Kuniyuki Iwashima, Jakub Sitnicki, John Fastabend Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann, netdev, bpf, linux-kernel On 2/2/26 20:15, Martin KaFai Lau wrote: > Regardless, if the proper lock is held, all this complication and > reasoning will go away. Here's my attempt: https://lore.kernel.org/bpf/20260207-unix-proto-update-null-ptr-deref-v2-0-9f091330e7cd@rbox.co/ ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2026-02-07 14:38 UTC | newest] Thread overview: 24+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-01-29 16:47 [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj 2026-01-29 19:41 ` Martin KaFai Lau 2026-01-30 11:00 ` Michal Luczaj 2026-01-30 21:29 ` Martin KaFai Lau 2026-01-31 10:06 ` Kuniyuki Iwashima 2026-02-02 15:10 ` Michal Luczaj 2026-02-03 3:53 ` Martin KaFai Lau 2026-02-03 9:57 ` Michal Luczaj 2026-02-03 19:47 ` Kuniyuki Iwashima 2026-02-04 7:15 ` Martin KaFai Lau 2026-02-04 7:58 ` Kuniyuki Iwashima 2026-02-04 15:41 ` Michal Luczaj 2026-02-04 19:16 ` Kuniyuki Iwashima 2026-02-04 20:18 ` Martin KaFai Lau 2026-02-04 19:34 ` Martin KaFai Lau 2026-02-04 21:09 ` Kuniyuki Iwashima 2026-02-05 0:55 ` Martin KaFai Lau 2026-02-05 2:00 ` Martin KaFai Lau 2026-02-05 7:39 ` Kuniyuki Iwashima 2026-02-04 23:25 ` Michal Luczaj 2026-02-05 0:27 ` Kuniyuki Iwashima 2026-02-05 0:31 ` Martin KaFai Lau 2026-02-02 19:15 ` Martin KaFai Lau 2026-02-07 14:37 ` Michal Luczaj
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox