* [PATCH bpf v4 1/5] bpf, sockmap: Annotate af_unix sock::sk_state data-races
2026-04-14 14:13 [PATCH bpf v4 0/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
@ 2026-04-14 14:13 ` Michal Luczaj
2026-04-14 14:13 ` [PATCH bpf v4 2/5] bpf, sockmap: Fix af_unix iter deadlock Michal Luczaj
` (4 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Michal Luczaj @ 2026-04-14 14:13 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S. Miller, Jakub Kicinski,
Simon Horman, Yonghong Song, Andrii Nakryiko, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Cong Wang
Cc: netdev, bpf, linux-kernel, linux-kselftest, Michal Luczaj,
Jiayuan Chen
sock_map_sk_state_allowed() and sock_map_redirect_allowed() read af_unix
socket sk_state locklessly.
Use READ_ONCE(). Note that the sock_map_redirect_allowed() change affects
not only af_unix, but all non-TCP sockets (UDP, af_vsock).
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
net/core/sock_map.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index b0e96337a269..02a68be3002a 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -530,7 +530,7 @@ static bool sock_map_redirect_allowed(const struct sock *sk)
if (sk_is_tcp(sk))
return sk->sk_state != TCP_LISTEN;
else
- return sk->sk_state == TCP_ESTABLISHED;
+ return READ_ONCE(sk->sk_state) == TCP_ESTABLISHED;
}
static bool sock_map_sk_is_suitable(const struct sock *sk)
@@ -543,7 +543,7 @@ static bool sock_map_sk_state_allowed(const struct sock *sk)
if (sk_is_tcp(sk))
return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_LISTEN);
if (sk_is_stream_unix(sk))
- return (1 << sk->sk_state) & TCPF_ESTABLISHED;
+ return (1 << READ_ONCE(sk->sk_state)) & TCPF_ESTABLISHED;
if (sk_is_vsock(sk) &&
(sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET))
return (1 << sk->sk_state) & TCPF_ESTABLISHED;
--
2.53.0
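
For readers less familiar with the annotation, here is a minimal userspace sketch of the lockless state check, not the kernel code itself. The names READ_ONCE_SKETCH, struct fake_sock, SKF() and state_allowed() are made up for illustration; the numeric state values mirror the kernel's TCP_ESTABLISHED (1), TCP_CLOSE (7) and TCP_LISTEN (10). The volatile cast only forces a single, untorn load of the field; it does not add any ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's TCP_* states and TCPF_* masks. */
enum { SK_ESTABLISHED = 1, SK_CLOSE = 7, SK_LISTEN = 10 };
#define SKF(state) (1u << (state))

/* Sketch of READ_ONCE(): a single volatile load of the object.
 * Relies on __typeof__, a GCC/Clang extension (assumption). */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical socket with only the field we care about. */
struct fake_sock {
	unsigned char sk_state;
};

/* Same shape as the af_unix branch of sock_map_sk_state_allowed():
 * build a one-bit mask from the state and test it. */
static bool state_allowed(const struct fake_sock *sk)
{
	return (SKF(READ_ONCE_SKETCH(sk->sk_state)) & SKF(SK_ESTABLISHED)) != 0;
}

/* Small self-check that can be called from a test harness. */
static void state_allowed_selfcheck(void)
{
	struct fake_sock sk = { .sk_state = SK_LISTEN };

	assert(!state_allowed(&sk));
}
```

The check still races with a concurrent state change; the annotation only documents that the read is intentionally lockless and keeps the compiler from re-reading or tearing it.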
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH bpf v4 2/5] bpf, sockmap: Fix af_unix iter deadlock
2026-04-14 14:13 [PATCH bpf v4 0/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
2026-04-14 14:13 ` [PATCH bpf v4 1/5] bpf, sockmap: Annotate af_unix sock::sk_state data-races Michal Luczaj
@ 2026-04-14 14:13 ` Michal Luczaj
2026-04-14 14:13 ` [PATCH bpf v4 3/5] selftests/bpf: Extend bpf_iter_unix to attempt deadlocking Michal Luczaj
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Michal Luczaj @ 2026-04-14 14:13 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S. Miller, Jakub Kicinski,
Simon Horman, Yonghong Song, Andrii Nakryiko, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Cong Wang
Cc: netdev, bpf, linux-kernel, linux-kselftest, Michal Luczaj,
Jiayuan Chen
bpf_iter_unix_seq_show() may deadlock when lock_sock_fast() takes the fast
path and the iter prog attempts to update a sockmap, which ends up spinning
at sock_map_update_elem()'s bh_lock_sock():
WARNING: possible recursive locking detected
test_progs/1393 is trying to acquire lock:
ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: sock_map_update_elem+0xdb/0x1f0
but task is already holding lock:
ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: __lock_sock_fast+0x37/0xe0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(slock-AF_UNIX);
lock(slock-AF_UNIX);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by test_progs/1393:
#0: ffff88814b59c790 (&p->lock){+.+.}-{4:4}, at: bpf_seq_read+0x59/0x10d0
#1: ffff88811ec25fd8 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: bpf_seq_read+0x42c/0x10d0
#2: ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: __lock_sock_fast+0x37/0xe0
#3: ffffffff85a6a7c0 (rcu_read_lock){....}-{1:3}, at: bpf_iter_run_prog+0x51d/0xb00
Call Trace:
dump_stack_lvl+0x5d/0x80
print_deadlock_bug.cold+0xc0/0xce
__lock_acquire+0x130f/0x2590
lock_acquire+0x14e/0x2b0
_raw_spin_lock+0x30/0x40
sock_map_update_elem+0xdb/0x1f0
bpf_prog_2d0075e5d9b721cd_dump_unix+0x55/0x4f4
bpf_iter_run_prog+0x5b9/0xb00
bpf_iter_unix_seq_show+0x1f7/0x2e0
bpf_seq_read+0x42c/0x10d0
vfs_read+0x171/0xb20
ksys_read+0xff/0x200
do_syscall_64+0x6b/0x3a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Fixes: 2c860a43dd77 ("bpf: af_unix: Implement BPF iterator for UNIX domain socket.")
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
net/unix/af_unix.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index b23c33df8b46..590a30d3b2f7 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -3731,15 +3731,14 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v)
struct bpf_prog *prog;
struct sock *sk = v;
uid_t uid;
- bool slow;
int ret;
if (v == SEQ_START_TOKEN)
return 0;
- slow = lock_sock_fast(sk);
+ lock_sock(sk);
- if (unlikely(sk_unhashed(sk))) {
+ if (unlikely(sock_flag(sk, SOCK_DEAD))) {
ret = SEQ_SKIP;
goto unlock;
}
@@ -3749,7 +3748,7 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v)
prog = bpf_iter_get_info(&meta, false);
ret = unix_prog_seq_show(prog, &meta, v, uid);
unlock:
- unlock_sock_fast(sk, slow);
+ release_sock(sk);
return ret;
}
--
2.53.0
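
The difference between the two locking styles can be sketched in userspace as below. This is a simplified model under stated assumptions, not the kernel implementation: struct fake_sk_lock and all function names are hypothetical, the spinlock is a C11 atomic_flag, and the wait-for-owner logic of the real lock_sock() is omitted. The fast path returns with the spinlock still held, so a nested attempt to take it cannot succeed; the owner-style lock only marks ownership and drops the spinlock again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of sk->sk_lock: a spinlock plus an "owned" flag. */
struct fake_sk_lock {
	atomic_flag slock;
	bool owned;
};

static bool slock_trylock(struct fake_sk_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->slock, memory_order_acquire);
}

static void slock_unlock(struct fake_sk_lock *l)
{
	atomic_flag_clear_explicit(&l->slock, memory_order_release);
}

/* Models the lock_sock_fast() fast path: returns with slock held. */
static void fast_lock(struct fake_sk_lock *l)
{
	while (!slock_trylock(l))
		;
}

/* Models lock_sock(): mark ownership under slock, then drop slock.
 * Waiting for a previous owner is left out of this sketch. */
static void owner_lock(struct fake_sk_lock *l)
{
	while (!slock_trylock(l))
		;
	l->owned = true;
	slock_unlock(l);
}

/* Models release_sock(): clear ownership under slock. */
static void owner_unlock(struct fake_sk_lock *l)
{
	while (!slock_trylock(l))
		;
	l->owned = false;
	slock_unlock(l);
}

/* What a nested bh_lock_sock() would observe; a real spin_lock()
 * would spin forever where this returns false. */
static bool nested_update_can_take_slock(struct fake_sk_lock *l)
{
	if (!slock_trylock(l))
		return false;
	slock_unlock(l);
	return true;
}
```

With fast_lock() held, nested_update_can_take_slock() reports false, which corresponds to the recursive acquisition in the splat above. After owner_lock(), the spinlock is free again and the nested acquisition goes through.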
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH bpf v4 3/5] selftests/bpf: Extend bpf_iter_unix to attempt deadlocking
2026-04-14 14:13 [PATCH bpf v4 0/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
2026-04-14 14:13 ` [PATCH bpf v4 1/5] bpf, sockmap: Annotate af_unix sock::sk_state data-races Michal Luczaj
2026-04-14 14:13 ` [PATCH bpf v4 2/5] bpf, sockmap: Fix af_unix iter deadlock Michal Luczaj
@ 2026-04-14 14:13 ` Michal Luczaj
2026-04-15 5:01 ` Kuniyuki Iwashima
2026-04-14 14:13 ` [PATCH bpf v4 4/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
` (2 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: Michal Luczaj @ 2026-04-14 14:13 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S. Miller, Jakub Kicinski,
Simon Horman, Yonghong Song, Andrii Nakryiko, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Cong Wang
Cc: netdev, bpf, linux-kernel, linux-kselftest, Michal Luczaj,
Jiayuan Chen
Updating a sockmap from a unix iterator prog may lead to a deadlock.
Piggyback on the original selftest.
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
tools/testing/selftests/bpf/progs/bpf_iter_unix.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_unix.c b/tools/testing/selftests/bpf/progs/bpf_iter_unix.c
index fea275df9e22..a2652c8c3616 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_unix.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_unix.c
@@ -7,6 +7,13 @@
char _license[] SEC("license") = "GPL";
+SEC(".maps") struct {
+ __uint(type, BPF_MAP_TYPE_SOCKMAP);
+ __uint(max_entries, 1);
+ __type(key, __u32);
+ __type(value, __u64);
+} sockmap;
+
static long sock_i_ino(const struct sock *sk)
{
const struct socket *sk_socket = sk->sk_socket;
@@ -76,5 +83,8 @@ int dump_unix(struct bpf_iter__unix *ctx)
BPF_SEQ_PRINTF(seq, "\n");
+ /* Test for deadlock. */
+ bpf_map_update_elem(&sockmap, &(int){0}, sk, 0);
+
return 0;
}
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH bpf v4 3/5] selftests/bpf: Extend bpf_iter_unix to attempt deadlocking
2026-04-14 14:13 ` [PATCH bpf v4 3/5] selftests/bpf: Extend bpf_iter_unix to attempt deadlocking Michal Luczaj
@ 2026-04-15 5:01 ` Kuniyuki Iwashima
0 siblings, 0 replies; 10+ messages in thread
From: Kuniyuki Iwashima @ 2026-04-15 5:01 UTC (permalink / raw)
To: Michal Luczaj
Cc: John Fastabend, Jakub Sitnicki, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David S. Miller, Jakub Kicinski, Simon Horman,
Yonghong Song, Andrii Nakryiko, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Cong Wang, netdev, bpf, linux-kernel, linux-kselftest,
Jiayuan Chen
On Tue, Apr 14, 2026 at 7:13 AM Michal Luczaj <mhal@rbox.co> wrote:
>
> Updating a sockmap from a unix iterator prog may lead to a deadlock.
> Piggyback on the original selftest.
>
> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH bpf v4 4/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update
2026-04-14 14:13 [PATCH bpf v4 0/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
` (2 preceding siblings ...)
2026-04-14 14:13 ` [PATCH bpf v4 3/5] selftests/bpf: Extend bpf_iter_unix to attempt deadlocking Michal Luczaj
@ 2026-04-14 14:13 ` Michal Luczaj
2026-04-15 5:00 ` Kuniyuki Iwashima
2026-04-14 14:13 ` [PATCH bpf v4 5/5] bpf, sockmap: Take state lock for af_unix iter Michal Luczaj
2026-04-16 0:30 ` [PATCH bpf v4 0/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update patchwork-bot+netdevbpf
5 siblings, 1 reply; 10+ messages in thread
From: Michal Luczaj @ 2026-04-14 14:13 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S. Miller, Jakub Kicinski,
Simon Horman, Yonghong Song, Andrii Nakryiko, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Cong Wang
Cc: netdev, bpf, linux-kernel, linux-kselftest, Michal Luczaj,
钱一铭
unix_stream_connect() sets sk_state (`WRITE_ONCE(sk->sk_state,
TCP_ESTABLISHED)`) _before_ it assigns a peer (`unix_peer(sk) = newsk`).
sk_state == TCP_ESTABLISHED makes sock_map_sk_state_allowed() believe that
socket is properly set up, which would include having a defined peer. IOW,
there's a window when unix_stream_bpf_update_proto() can be called on
socket which still has unix_peer(sk) == NULL.
CPU0 bpf                            CPU1 connect
--------                            ------------

                                    WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
sock_map_sk_state_allowed(sk)
...
sk_pair = unix_peer(sk)
sock_hold(sk_pair)
                                    sock_hold(newsk)
                                    smp_mb__after_atomic()
                                    unix_peer(sk) = newsk
BUG: kernel NULL pointer dereference, address: 0000000000000080
RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0
Call Trace:
sock_map_link+0x564/0x8b0
sock_map_update_common+0x6e/0x340
sock_map_update_elem_sys+0x17d/0x240
__sys_bpf+0x26db/0x3250
__x64_sys_bpf+0x21/0x30
do_syscall_64+0x6b/0x3a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Initial idea was to move peer assignment _before_ the sk_state update[1],
but that involved an additional memory barrier, and changing the hot path
was rejected.
Then a NULL check during proto update in unix_stream_bpf_update_proto() was
considered[2], but the follow-up discussion[3] focused on the root cause,
i.e. sockmap update taking a wrong lock. Or, more specifically, missing
unix_state_lock()[4].
In the end it was concluded that teaching sockmap about the af_unix locking
would be unnecessarily complex[5].
Complexity aside, since BPF_PROG_TYPE_SCHED_CLS and BPF_PROG_TYPE_SCHED_ACT
are allowed to update sockmaps, sock_map_update_elem() taking the unix
lock, as it is currently implemented in unix_state_lock():
spin_lock(&unix_sk(s)->lock), would be problematic. unix_state_lock() taken
in a process context, followed by a softirq-context TC BPF program
attempting to take the same spinlock -- deadlock[6].
This way we circled back to the peer check idea[2].
[1]: https://lore.kernel.org/netdev/ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co/
[2]: https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/
[3]: https://lore.kernel.org/netdev/7603c0e6-cd5b-452b-b710-73b64bd9de26@linux.dev/
[4]: https://lore.kernel.org/netdev/CAAVpQUA+8GL_j63CaKb8hbxoL21izD58yr1NvhOhU=j+35+3og@mail.gmail.com/
[5]: https://lore.kernel.org/bpf/CAAVpQUAHijOMext28Gi10dSLuMzGYh+jK61Ujn+fZ-wvcODR2A@mail.gmail.com/
[6]: https://lore.kernel.org/bpf/dd043c69-4d03-46fe-8325-8f97101435cf@linux.dev/
Summary of scenarios where af_unix/stream connect() may race a sockmap
update:
1. connect() vs. bpf(BPF_MAP_UPDATE_ELEM), i.e. sock_map_update_elem_sys()
Implemented NULL check is sufficient. Once assigned, socket peer won't
be released until socket fd is released. And that's not an issue because
sock_map_update_elem_sys() bumps fd refcnt.
2. connect() vs BPF program doing update
Update restricted per verifier.c:may_update_sockmap() to
BPF_PROG_TYPE_TRACING/BPF_TRACE_ITER
BPF_PROG_TYPE_SOCK_OPS (bpf_sock_map_update() only)
BPF_PROG_TYPE_SOCKET_FILTER
BPF_PROG_TYPE_SCHED_CLS
BPF_PROG_TYPE_SCHED_ACT
BPF_PROG_TYPE_XDP
BPF_PROG_TYPE_SK_REUSEPORT
BPF_PROG_TYPE_FLOW_DISSECTOR
BPF_PROG_TYPE_SK_LOOKUP
Plus one more race to consider:
CPU0 bpf                            CPU1 connect
--------                            ------------

                                    WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
sock_map_sk_state_allowed(sk)
                                    sock_hold(newsk)
                                    smp_mb__after_atomic()
                                    unix_peer(sk) = newsk
sk_pair = unix_peer(sk)
if (unlikely(!sk_pair))
   return -EINVAL;

                                    CPU1 close
                                    ----------

                                    skpair = unix_peer(sk);
                                    unix_peer(sk) = NULL;
                                    sock_put(skpair)
// use after free?
sock_hold(sk_pair)
2.1 BPF program invoking helper function bpf_sock_map_update() ->
BPF_CALL_4(bpf_sock_map_update(), ...)
Helper limited to BPF_PROG_TYPE_SOCK_OPS. Nevertheless, a unix sock
might be accessible via bpf_map_lookup_elem(). Which implies sk
already having psock, which in turn implies sk already having
sk_pair. Since sk_psock_destroy() is queued as RCU work, sk_pair
won't go away while BPF executes the update.
2.2 BPF program invoking helper function bpf_map_update_elem() ->
sock_map_update_elem()
2.2.1 Unix sock accessible to BPF prog only via sockmap lookup in
BPF_PROG_TYPE_SOCKET_FILTER, BPF_PROG_TYPE_SCHED_CLS,
BPF_PROG_TYPE_SCHED_ACT, BPF_PROG_TYPE_XDP,
BPF_PROG_TYPE_SK_REUSEPORT, BPF_PROG_TYPE_FLOW_DISSECTOR,
BPF_PROG_TYPE_SK_LOOKUP.
Pretty much the same as case 2.1.
2.2.2 Unix sock accessible to BPF program directly:
BPF_PROG_TYPE_TRACING, narrowed down to BPF_TRACE_ITER.
Sockmap iterator (sock_map_seq_ops) is safe: unix sock
residing in a sockmap means that the sock already went through
the proto update step.
Unix sock iterator (bpf_iter_unix_seq_ops), on the other hand,
gives access to socks that may still be unconnected. Which
means iterator prog can race sockmap/proto update against
connect().
BUG: KASAN: null-ptr-deref in unix_stream_bpf_update_proto+0x253/0x4d0
Write of size 4 at addr 0000000000000080 by task test_progs/3140
Call Trace:
dump_stack_lvl+0x5d/0x80
kasan_report+0xe4/0x1c0
kasan_check_range+0x125/0x200
unix_stream_bpf_update_proto+0x253/0x4d0
sock_map_link+0x71c/0xec0
sock_map_update_common+0xbc/0x600
sock_map_update_elem+0x19a/0x1f0
bpf_prog_bbbf56096cdd4f01_selective_dump_unix+0x20c/0x217
bpf_iter_run_prog+0x21e/0xae0
bpf_iter_unix_seq_show+0x1e0/0x2a0
bpf_seq_read+0x42c/0x10d0
vfs_read+0x171/0xb20
ksys_read+0xff/0x200
do_syscall_64+0xf7/0x5e0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
While the introduced NULL check prevents null-ptr-deref in the
BPF program path as well, it is insufficient to guard against
a poorly timed close() leading to a use-after-free. This will
be addressed in a subsequent patch.
Reported-by: Michal Luczaj <mhal@rbox.co>
Closes: https://lore.kernel.org/netdev/ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co/
Reported-by: 钱一铭 <yimingqian591@gmail.com>
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
net/unix/unix_bpf.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
index e0d30d6d22ac..57f3124c9d8d 100644
--- a/net/unix/unix_bpf.c
+++ b/net/unix/unix_bpf.c
@@ -185,6 +185,9 @@ int unix_stream_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool r
*/
if (!psock->sk_pair) {
sk_pair = unix_peer(sk);
+ if (unlikely(!sk_pair))
+ return -EINVAL;
+
sock_hold(sk_pair);
psock->sk_pair = sk_pair;
}
--
2.53.0
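
The effect of the added check can be sketched in plain C as follows. This is an illustrative model only: struct fake_sock, struct fake_psock and update_proto_sketch() are hypothetical names, and the reference count is a bare int rather than the kernel's refcount type. The point it shows is that the peer pointer is tested before a reference is taken, so a socket caught between the sk_state write and the peer assignment yields -EINVAL instead of a NULL dereference.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical socket: a reference count and a peer pointer. */
struct fake_sock {
	int refcnt;
	struct fake_sock *peer;
};

/* Hypothetical psock: caches the peer it holds a reference on. */
struct fake_psock {
	struct fake_sock *sk_pair;
};

/* Same shape as the patched branch: bail out if no peer is set yet,
 * otherwise pin the peer and remember it in the psock. */
static int update_proto_sketch(struct fake_sock *sk, struct fake_psock *psock)
{
	if (!psock->sk_pair) {
		struct fake_sock *sk_pair = sk->peer;

		if (!sk_pair)
			return -EINVAL;

		sk_pair->refcnt++;	/* stands in for sock_hold() */
		psock->sk_pair = sk_pair;
	}
	return 0;
}

/* Small self-check: an unconnected socket is rejected. */
static void update_proto_selfcheck(void)
{
	struct fake_sock sk = { .refcnt = 1, .peer = NULL };
	struct fake_psock psock = { .sk_pair = NULL };

	assert(update_proto_sketch(&sk, &psock) == -EINVAL);
}
```

As the commit message notes, this check alone does not cover a concurrent close(): the peer can still be released between the pointer read and the reference increment.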
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH bpf v4 4/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update
2026-04-14 14:13 ` [PATCH bpf v4 4/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
@ 2026-04-15 5:00 ` Kuniyuki Iwashima
0 siblings, 0 replies; 10+ messages in thread
From: Kuniyuki Iwashima @ 2026-04-15 5:00 UTC (permalink / raw)
To: Michal Luczaj
Cc: John Fastabend, Jakub Sitnicki, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David S. Miller, Jakub Kicinski, Simon Horman,
Yonghong Song, Andrii Nakryiko, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Cong Wang, netdev, bpf, linux-kernel, linux-kselftest,
钱一铭
On Tue, Apr 14, 2026 at 7:13 AM Michal Luczaj <mhal@rbox.co> wrote:
>
> unix_stream_connect() sets sk_state (`WRITE_ONCE(sk->sk_state,
> TCP_ESTABLISHED)`) _before_ it assigns a peer (`unix_peer(sk) = newsk`).
> sk_state == TCP_ESTABLISHED makes sock_map_sk_state_allowed() believe that
> socket is properly set up, which would include having a defined peer. IOW,
> there's a window when unix_stream_bpf_update_proto() can be called on
> socket which still has unix_peer(sk) == NULL.
>
> CPU0 bpf                            CPU1 connect
> --------                            ------------
>
>                                     WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
> sock_map_sk_state_allowed(sk)
> ...
> sk_pair = unix_peer(sk)
> sock_hold(sk_pair)
>                                     sock_hold(newsk)
>                                     smp_mb__after_atomic()
>                                     unix_peer(sk) = newsk
>
> BUG: kernel NULL pointer dereference, address: 0000000000000080
> RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0
> Call Trace:
> sock_map_link+0x564/0x8b0
> sock_map_update_common+0x6e/0x340
> sock_map_update_elem_sys+0x17d/0x240
> __sys_bpf+0x26db/0x3250
> __x64_sys_bpf+0x21/0x30
> do_syscall_64+0x6b/0x3a0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Initial idea was to move peer assignment _before_ the sk_state update[1],
> but that involved an additional memory barrier, and changing the hot path
> was rejected.
> Then a NULL check during proto update in unix_stream_bpf_update_proto() was
> considered[2], but the follow-up discussion[3] focused on the root cause,
> i.e. sockmap update taking a wrong lock. Or, more specifically, missing
> unix_state_lock()[4].
> In the end it was concluded that teaching sockmap about the af_unix locking
> would be unnecessarily complex[5].
> Complexity aside, since BPF_PROG_TYPE_SCHED_CLS and BPF_PROG_TYPE_SCHED_ACT
> are allowed to update sockmaps, sock_map_update_elem() taking the unix
> lock, as it is currently implemented in unix_state_lock():
> spin_lock(&unix_sk(s)->lock), would be problematic. unix_state_lock() taken
> in a process context, followed by a softirq-context TC BPF program
> attempting to take the same spinlock -- deadlock[6].
> This way we circled back to the peer check idea[2].
>
> [1]: https://lore.kernel.org/netdev/ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co/
> [2]: https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/
> [3]: https://lore.kernel.org/netdev/7603c0e6-cd5b-452b-b710-73b64bd9de26@linux.dev/
> [4]: https://lore.kernel.org/netdev/CAAVpQUA+8GL_j63CaKb8hbxoL21izD58yr1NvhOhU=j+35+3og@mail.gmail.com/
> [5]: https://lore.kernel.org/bpf/CAAVpQUAHijOMext28Gi10dSLuMzGYh+jK61Ujn+fZ-wvcODR2A@mail.gmail.com/
> [6]: https://lore.kernel.org/bpf/dd043c69-4d03-46fe-8325-8f97101435cf@linux.dev/
>
> Summary of scenarios where af_unix/stream connect() may race a sockmap
> update:
>
> 1. connect() vs. bpf(BPF_MAP_UPDATE_ELEM), i.e. sock_map_update_elem_sys()
>
> Implemented NULL check is sufficient. Once assigned, socket peer won't
> be released until socket fd is released. And that's not an issue because
> sock_map_update_elem_sys() bumps fd refcnt.
>
> 2. connect() vs BPF program doing update
>
> Update restricted per verifier.c:may_update_sockmap() to
>
> BPF_PROG_TYPE_TRACING/BPF_TRACE_ITER
> BPF_PROG_TYPE_SOCK_OPS (bpf_sock_map_update() only)
> BPF_PROG_TYPE_SOCKET_FILTER
> BPF_PROG_TYPE_SCHED_CLS
> BPF_PROG_TYPE_SCHED_ACT
> BPF_PROG_TYPE_XDP
> BPF_PROG_TYPE_SK_REUSEPORT
> BPF_PROG_TYPE_FLOW_DISSECTOR
> BPF_PROG_TYPE_SK_LOOKUP
>
> Plus one more race to consider:
>
> CPU0 bpf                            CPU1 connect
> --------                            ------------
>
>                                     WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
> sock_map_sk_state_allowed(sk)
>                                     sock_hold(newsk)
>                                     smp_mb__after_atomic()
>                                     unix_peer(sk) = newsk
> sk_pair = unix_peer(sk)
> if (unlikely(!sk_pair))
>    return -EINVAL;
>
>                                     CPU1 close
>                                     ----------
>
>                                     skpair = unix_peer(sk);
>                                     unix_peer(sk) = NULL;
>                                     sock_put(skpair)
> // use after free?
> sock_hold(sk_pair)
>
> 2.1 BPF program invoking helper function bpf_sock_map_update() ->
> BPF_CALL_4(bpf_sock_map_update(), ...)
>
> Helper limited to BPF_PROG_TYPE_SOCK_OPS. Nevertheless, a unix sock
> might be accessible via bpf_map_lookup_elem(). Which implies sk
> already having psock, which in turn implies sk already having
> sk_pair. Since sk_psock_destroy() is queued as RCU work, sk_pair
> won't go away while BPF executes the update.
>
> 2.2 BPF program invoking helper function bpf_map_update_elem() ->
> sock_map_update_elem()
>
> 2.2.1 Unix sock accessible to BPF prog only via sockmap lookup in
> BPF_PROG_TYPE_SOCKET_FILTER, BPF_PROG_TYPE_SCHED_CLS,
> BPF_PROG_TYPE_SCHED_ACT, BPF_PROG_TYPE_XDP,
> BPF_PROG_TYPE_SK_REUSEPORT, BPF_PROG_TYPE_FLOW_DISSECTOR,
> BPF_PROG_TYPE_SK_LOOKUP.
>
> Pretty much the same as case 2.1.
>
> 2.2.2 Unix sock accessible to BPF program directly:
> BPF_PROG_TYPE_TRACING, narrowed down to BPF_TRACE_ITER.
>
> Sockmap iterator (sock_map_seq_ops) is safe: unix sock
> residing in a sockmap means that the sock already went through
> the proto update step.
>
> Unix sock iterator (bpf_iter_unix_seq_ops), on the other hand,
> gives access to socks that may still be unconnected. Which
> means iterator prog can race sockmap/proto update against
> connect().
>
> BUG: KASAN: null-ptr-deref in unix_stream_bpf_update_proto+0x253/0x4d0
> Write of size 4 at addr 0000000000000080 by task test_progs/3140
> Call Trace:
> dump_stack_lvl+0x5d/0x80
> kasan_report+0xe4/0x1c0
> kasan_check_range+0x125/0x200
> unix_stream_bpf_update_proto+0x253/0x4d0
> sock_map_link+0x71c/0xec0
> sock_map_update_common+0xbc/0x600
> sock_map_update_elem+0x19a/0x1f0
> bpf_prog_bbbf56096cdd4f01_selective_dump_unix+0x20c/0x217
> bpf_iter_run_prog+0x21e/0xae0
> bpf_iter_unix_seq_show+0x1e0/0x2a0
> bpf_seq_read+0x42c/0x10d0
> vfs_read+0x171/0xb20
> ksys_read+0xff/0x200
> do_syscall_64+0xf7/0x5e0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> While the introduced NULL check prevents null-ptr-deref in the
> BPF program path as well, it is insufficient to guard against
> a poorly timed close() leading to a use-after-free. This will
> be addressed in a subsequent patch.
>
> Reported-by: Michal Luczaj <mhal@rbox.co>
> Closes: https://lore.kernel.org/netdev/ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co/
> Reported-by: 钱一铭 <yimingqian591@gmail.com>
> Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
> Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
> Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()")
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH bpf v4 5/5] bpf, sockmap: Take state lock for af_unix iter
2026-04-14 14:13 [PATCH bpf v4 0/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
` (3 preceding siblings ...)
2026-04-14 14:13 ` [PATCH bpf v4 4/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
@ 2026-04-14 14:13 ` Michal Luczaj
2026-04-15 5:02 ` Kuniyuki Iwashima
2026-04-16 0:30 ` [PATCH bpf v4 0/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update patchwork-bot+netdevbpf
5 siblings, 1 reply; 10+ messages in thread
From: Michal Luczaj @ 2026-04-14 14:13 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S. Miller, Jakub Kicinski,
Simon Horman, Yonghong Song, Andrii Nakryiko, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Cong Wang
Cc: netdev, bpf, linux-kernel, linux-kselftest, Michal Luczaj
When a BPF iterator program updates a sockmap, there is a race condition in
unix_stream_bpf_update_proto() where the `peer` pointer can become stale[1]
during a state transition TCP_ESTABLISHED -> TCP_CLOSE.
CPU0 bpf                            CPU1 close
--------                            ----------
// unix_stream_bpf_update_proto()
sk_pair = unix_peer(sk)
if (unlikely(!sk_pair))
   return -EINVAL;
                                    // unix_release_sock()
                                    skpair = unix_peer(sk);
                                    unix_peer(sk) = NULL;
                                    sock_put(skpair)
sock_hold(sk_pair) // UaF
More practically, this fix guarantees that the iterator program is
consistently provided with a unix socket that remains stable during
iterator execution.
[1]:
BUG: KASAN: slab-use-after-free in unix_stream_bpf_update_proto+0x155/0x490
Write of size 4 at addr ffff8881178c9a00 by task test_progs/2231
Call Trace:
dump_stack_lvl+0x5d/0x80
print_report+0x170/0x4f3
kasan_report+0xe4/0x1c0
kasan_check_range+0x125/0x200
unix_stream_bpf_update_proto+0x155/0x490
sock_map_link+0x71c/0xec0
sock_map_update_common+0xbc/0x600
sock_map_update_elem+0x19a/0x1f0
bpf_prog_bbbf56096cdd4f01_selective_dump_unix+0x20c/0x217
bpf_iter_run_prog+0x21e/0xae0
bpf_iter_unix_seq_show+0x1e0/0x2a0
bpf_seq_read+0x42c/0x10d0
vfs_read+0x171/0xb20
ksys_read+0xff/0x200
do_syscall_64+0xf7/0x5e0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Allocated by task 2236:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
__kasan_slab_alloc+0x63/0x80
kmem_cache_alloc_noprof+0x1d5/0x680
sk_prot_alloc+0x59/0x210
sk_alloc+0x34/0x470
unix_create1+0x86/0x8a0
unix_stream_connect+0x318/0x15b0
__sys_connect+0xfd/0x130
__x64_sys_connect+0x72/0xd0
do_syscall_64+0xf7/0x5e0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 2236:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x70
__kasan_slab_free+0x47/0x70
kmem_cache_free+0x11c/0x590
__sk_destruct+0x432/0x6e0
unix_release_sock+0x9b3/0xf60
unix_release+0x8a/0xf0
__sock_release+0xb0/0x270
sock_close+0x18/0x20
__fput+0x36e/0xac0
fput_close_sync+0xe5/0x1a0
__x64_sys_close+0x7d/0xd0
do_syscall_64+0xf7/0x5e0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Fixes: 2c860a43dd77 ("bpf: af_unix: Implement BPF iterator for UNIX domain socket.")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
net/unix/af_unix.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 590a30d3b2f7..15b48cc6e9b0 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -3737,6 +3737,7 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v)
return 0;
lock_sock(sk);
+ unix_state_lock(sk);
if (unlikely(sock_flag(sk, SOCK_DEAD))) {
ret = SEQ_SKIP;
@@ -3748,6 +3749,7 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v)
prog = bpf_iter_get_info(&meta, false);
ret = unix_prog_seq_show(prog, &meta, v, uid);
unlock:
+ unix_state_unlock(sk);
release_sock(sk);
return ret;
}
--
2.53.0
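
A rough userspace model of why holding the state lock across the iterator program closes the window is given below. Everything in it is hypothetical (struct fake_unix_sock, SKETCH_SKIP, the function names), the lock is a C11 atomic_flag, and it assumes, as in the diagram above, that the release path marks the socket dead and detaches the peer while holding the same lock. Because the peer read, the NULL check and the reference increment all happen under that lock, the release path cannot drop the peer in between.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SKIP 1	/* hypothetical stand-in for SEQ_SKIP */

/* Hypothetical unix socket: state lock, dead flag, refcount, peer. */
struct fake_unix_sock {
	atomic_flag state_lock;
	bool dead;
	int refcnt;
	struct fake_unix_sock *peer;
};

static void sketch_state_lock(struct fake_unix_sock *sk)
{
	while (atomic_flag_test_and_set_explicit(&sk->state_lock,
						 memory_order_acquire))
		;
}

static void sketch_state_unlock(struct fake_unix_sock *sk)
{
	atomic_flag_clear_explicit(&sk->state_lock, memory_order_release);
}

/* Models the iterator: the dead check and the peer pinning both run
 * under the state lock, so they see a consistent socket. */
static int iter_show_sketch(struct fake_unix_sock *sk,
			    struct fake_unix_sock **pinned)
{
	int ret = 0;

	sketch_state_lock(sk);
	if (sk->dead) {
		ret = SKETCH_SKIP;
		goto unlock;
	}
	if (!sk->peer) {
		ret = -EINVAL;
		goto unlock;
	}
	sk->peer->refcnt++;	/* stands in for sock_hold() */
	*pinned = sk->peer;
unlock:
	sketch_state_unlock(sk);
	return ret;
}

/* Models the release path: detach under the lock, drop the reference
 * only after unlocking. */
static void release_sketch(struct fake_unix_sock *sk)
{
	struct fake_unix_sock *peer;

	sketch_state_lock(sk);
	sk->dead = true;
	peer = sk->peer;
	sk->peer = NULL;
	sketch_state_unlock(sk);

	if (peer)
		peer->refcnt--;	/* stands in for sock_put() */
}
```

In this model a socket that has already been released is skipped by the iterator, and a live one has its peer pinned before the release path can run.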
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH bpf v4 5/5] bpf, sockmap: Take state lock for af_unix iter
2026-04-14 14:13 ` [PATCH bpf v4 5/5] bpf, sockmap: Take state lock for af_unix iter Michal Luczaj
@ 2026-04-15 5:02 ` Kuniyuki Iwashima
0 siblings, 0 replies; 10+ messages in thread
From: Kuniyuki Iwashima @ 2026-04-15 5:02 UTC (permalink / raw)
To: Michal Luczaj
Cc: John Fastabend, Jakub Sitnicki, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David S. Miller, Jakub Kicinski, Simon Horman,
Yonghong Song, Andrii Nakryiko, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Cong Wang, netdev, bpf, linux-kernel, linux-kselftest
On Tue, Apr 14, 2026 at 7:13 AM Michal Luczaj <mhal@rbox.co> wrote:
>
> When a BPF iterator program updates a sockmap, there is a race condition in
> unix_stream_bpf_update_proto() where the `peer` pointer can become stale[1]
> during a state transition TCP_ESTABLISHED -> TCP_CLOSE.
>
> CPU0 bpf                            CPU1 close
> --------                            ----------
> // unix_stream_bpf_update_proto()
> sk_pair = unix_peer(sk)
> if (unlikely(!sk_pair))
>    return -EINVAL;
>                                     // unix_release_sock()
>                                     skpair = unix_peer(sk);
>                                     unix_peer(sk) = NULL;
>                                     sock_put(skpair)
> sock_hold(sk_pair) // UaF
>
> More practically, this fix guarantees that the iterator program is
> consistently provided with a unix socket that remains stable during
> iterator execution.
>
> [1]:
> BUG: KASAN: slab-use-after-free in unix_stream_bpf_update_proto+0x155/0x490
> Write of size 4 at addr ffff8881178c9a00 by task test_progs/2231
> Call Trace:
> dump_stack_lvl+0x5d/0x80
> print_report+0x170/0x4f3
> kasan_report+0xe4/0x1c0
> kasan_check_range+0x125/0x200
> unix_stream_bpf_update_proto+0x155/0x490
> sock_map_link+0x71c/0xec0
> sock_map_update_common+0xbc/0x600
> sock_map_update_elem+0x19a/0x1f0
> bpf_prog_bbbf56096cdd4f01_selective_dump_unix+0x20c/0x217
> bpf_iter_run_prog+0x21e/0xae0
> bpf_iter_unix_seq_show+0x1e0/0x2a0
> bpf_seq_read+0x42c/0x10d0
> vfs_read+0x171/0xb20
> ksys_read+0xff/0x200
> do_syscall_64+0xf7/0x5e0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Allocated by task 2236:
> kasan_save_stack+0x30/0x50
> kasan_save_track+0x14/0x30
> __kasan_slab_alloc+0x63/0x80
> kmem_cache_alloc_noprof+0x1d5/0x680
> sk_prot_alloc+0x59/0x210
> sk_alloc+0x34/0x470
> unix_create1+0x86/0x8a0
> unix_stream_connect+0x318/0x15b0
> __sys_connect+0xfd/0x130
> __x64_sys_connect+0x72/0xd0
> do_syscall_64+0xf7/0x5e0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Freed by task 2236:
> kasan_save_stack+0x30/0x50
> kasan_save_track+0x14/0x30
> kasan_save_free_info+0x3b/0x70
> __kasan_slab_free+0x47/0x70
> kmem_cache_free+0x11c/0x590
> __sk_destruct+0x432/0x6e0
> unix_release_sock+0x9b3/0xf60
> unix_release+0x8a/0xf0
> __sock_release+0xb0/0x270
> sock_close+0x18/0x20
> __fput+0x36e/0xac0
> fput_close_sync+0xe5/0x1a0
> __x64_sys_close+0x7d/0xd0
> do_syscall_64+0xf7/0x5e0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
> Fixes: 2c860a43dd77 ("bpf: af_unix: Implement BPF iterator for UNIX domain socket.")
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Thanks for the fixes, Michal !
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH bpf v4 0/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update
2026-04-14 14:13 [PATCH bpf v4 0/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
` (4 preceding siblings ...)
2026-04-14 14:13 ` [PATCH bpf v4 5/5] bpf, sockmap: Take state lock for af_unix iter Michal Luczaj
@ 2026-04-16 0:30 ` patchwork-bot+netdevbpf
5 siblings, 0 replies; 10+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-04-16 0:30 UTC (permalink / raw)
To: Michal Luczaj
Cc: john.fastabend, jakub, edumazet, kuniyu, pabeni, willemb, davem,
kuba, horms, yhs, andrii, ast, daniel, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, shuah, cong.wang,
netdev, bpf, linux-kernel, linux-kselftest, jiayuan.chen,
yimingqian591
Hello:
This series was applied to bpf/bpf.git (master)
by Martin KaFai Lau <martin.lau@kernel.org>:
On Tue, 14 Apr 2026 16:13:14 +0200 you wrote:
> Updating sockmap/sockhash using a unix sock races unix_stream_connect():
> when sock_map_sk_state_allowed() passes (sk_state == TCP_ESTABLISHED),
> unix_peer(sk) in unix_stream_bpf_update_proto() may still return NULL.
>
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
> Changes in v4:
> - Circle back to v1 approach
> - More details in commit messages [Martin]
> - Make unix iter take the state lock [Kuniyuki]
> - Link to v3: https://lore.kernel.org/r/20260306-unix-proto-update-null-ptr-deref-v3-0-2f0c7410c523@rbox.co
>
> [...]
Here is the summary with links:
- [bpf,v4,1/5] bpf, sockmap: Annotate af_unix sock::sk_state data-races
https://git.kernel.org/bpf/bpf/c/a25566084e39
- [bpf,v4,2/5] bpf, sockmap: Fix af_unix iter deadlock
https://git.kernel.org/bpf/bpf/c/4d328dd69538
- [bpf,v4,3/5] selftests/bpf: Extend bpf_iter_unix to attempt deadlocking
https://git.kernel.org/bpf/bpf/c/997b8483d44c
- [bpf,v4,4/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update
https://git.kernel.org/bpf/bpf/c/dca38b7734d2
- [bpf,v4,5/5] bpf, sockmap: Take state lock for af_unix iter
https://git.kernel.org/bpf/bpf/c/64c2f93fc325
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 10+ messages in thread