* [PATCH bpf v2 1/4] bpf, sockmap: Annotate af_unix sock::sk_state data-races
2026-02-07 14:34 [PATCH bpf v2 0/4] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
@ 2026-02-07 14:34 ` Michal Luczaj
2026-02-07 14:34 ` [PATCH bpf v2 2/4] bpf, sockmap: Use sock_map_sk_{acquire,release}() where open-coded Michal Luczaj
` (2 subsequent siblings)
3 siblings, 0 replies; 14+ messages in thread
From: Michal Luczaj @ 2026-02-07 14:34 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki, Kuniyuki Iwashima,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Daniel Borkmann, Willem de Bruijn, Cong Wang,
Alexei Starovoitov, Yonghong Song, Andrii Nakryiko,
Eduard Zingerman, Martin KaFai Lau, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan
Cc: netdev, bpf, linux-kernel, linux-kselftest, Michal Luczaj
sock_map_sk_state_allowed() and sock_map_redirect_allowed() read the
af_unix socket's sk_state locklessly.
Use READ_ONCE(). Note that for sock_map_redirect_allowed() the change
affects not only af_unix, but all non-TCP sockets (UDP, af_vsock).
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
net/core/sock_map.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 5947b38e4f8b..d4f15b846ad4 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -530,7 +530,7 @@ static bool sock_map_redirect_allowed(const struct sock *sk)
if (sk_is_tcp(sk))
return sk->sk_state != TCP_LISTEN;
else
- return sk->sk_state == TCP_ESTABLISHED;
+ return READ_ONCE(sk->sk_state) == TCP_ESTABLISHED;
}
static bool sock_map_sk_is_suitable(const struct sock *sk)
@@ -543,7 +543,7 @@ static bool sock_map_sk_state_allowed(const struct sock *sk)
if (sk_is_tcp(sk))
return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_LISTEN);
if (sk_is_stream_unix(sk))
- return (1 << sk->sk_state) & TCPF_ESTABLISHED;
+ return (1 << READ_ONCE(sk->sk_state)) & TCPF_ESTABLISHED;
if (sk_is_vsock(sk) &&
(sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET))
return (1 << sk->sk_state) & TCPF_ESTABLISHED;
--
2.52.0
^ permalink raw reply [flat|nested] 14+ messages in thread

* [PATCH bpf v2 2/4] bpf, sockmap: Use sock_map_sk_{acquire,release}() where open-coded
2026-02-07 14:34 [PATCH bpf v2 0/4] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
2026-02-07 14:34 ` [PATCH bpf v2 1/4] bpf, sockmap: Annotate af_unix sock::sk_state data-races Michal Luczaj
@ 2026-02-07 14:34 ` Michal Luczaj
2026-02-07 14:34 ` [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock Michal Luczaj
2026-02-07 14:34 ` [PATCH bpf v2 4/4] selftests/bpf: Extend bpf_iter_unix to attempt deadlocking Michal Luczaj
3 siblings, 0 replies; 14+ messages in thread
From: Michal Luczaj @ 2026-02-07 14:34 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki, Kuniyuki Iwashima,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Daniel Borkmann, Willem de Bruijn, Cong Wang,
Alexei Starovoitov, Yonghong Song, Andrii Nakryiko,
Eduard Zingerman, Martin KaFai Lau, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan
Cc: netdev, bpf, linux-kernel, linux-kselftest, Michal Luczaj
Instead of repeating the same (un)locking pattern, reuse
sock_map_sk_{acquire,release}(). This centralizes the code and makes it
easier to adapt sockmap to af_unix-specific locking.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
net/core/sock_map.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index d4f15b846ad4..b6586d9590b7 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -353,11 +353,9 @@ static void sock_map_free(struct bpf_map *map)
sk = xchg(psk, NULL);
if (sk) {
sock_hold(sk);
- lock_sock(sk);
- rcu_read_lock();
+ sock_map_sk_acquire(sk);
sock_map_unref(sk, psk);
- rcu_read_unlock();
- release_sock(sk);
+ sock_map_sk_release(sk);
sock_put(sk);
}
}
@@ -1176,11 +1174,9 @@ static void sock_hash_free(struct bpf_map *map)
*/
hlist_for_each_entry_safe(elem, node, &unlink_list, node) {
hlist_del(&elem->node);
- lock_sock(elem->sk);
- rcu_read_lock();
+ sock_map_sk_acquire(elem->sk);
sock_map_unref(elem->sk, elem);
- rcu_read_unlock();
- release_sock(elem->sk);
+ sock_map_sk_release(elem->sk);
sock_put(elem->sk);
sock_hash_free_elem(htab, elem);
}
@@ -1676,8 +1672,7 @@ void sock_map_close(struct sock *sk, long timeout)
void (*saved_close)(struct sock *sk, long timeout);
struct sk_psock *psock;
- lock_sock(sk);
- rcu_read_lock();
+ sock_map_sk_acquire(sk);
psock = sk_psock(sk);
if (likely(psock)) {
saved_close = psock->saved_close;
@@ -1685,16 +1680,14 @@ void sock_map_close(struct sock *sk, long timeout)
psock = sk_psock_get(sk);
if (unlikely(!psock))
goto no_psock;
- rcu_read_unlock();
sk_psock_stop(psock);
- release_sock(sk);
+ sock_map_sk_release(sk);
cancel_delayed_work_sync(&psock->work);
sk_psock_put(sk, psock);
} else {
saved_close = READ_ONCE(sk->sk_prot)->close;
no_psock:
- rcu_read_unlock();
- release_sock(sk);
+ sock_map_sk_release(sk);
}
/* Make sure we do not recurse. This is a bug.
--
2.52.0
* [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock
2026-02-07 14:34 [PATCH bpf v2 0/4] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
2026-02-07 14:34 ` [PATCH bpf v2 1/4] bpf, sockmap: Annotate af_unix sock::sk_state data-races Michal Luczaj
2026-02-07 14:34 ` [PATCH bpf v2 2/4] bpf, sockmap: Use sock_map_sk_{acquire,release}() where open-coded Michal Luczaj
@ 2026-02-07 14:34 ` Michal Luczaj
2026-02-07 22:00 ` Kuniyuki Iwashima
2026-02-07 14:34 ` [PATCH bpf v2 4/4] selftests/bpf: Extend bpf_iter_unix to attempt deadlocking Michal Luczaj
3 siblings, 1 reply; 14+ messages in thread
From: Michal Luczaj @ 2026-02-07 14:34 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki, Kuniyuki Iwashima,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Daniel Borkmann, Willem de Bruijn, Cong Wang,
Alexei Starovoitov, Yonghong Song, Andrii Nakryiko,
Eduard Zingerman, Martin KaFai Lau, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan
Cc: netdev, bpf, linux-kernel, linux-kselftest, Michal Luczaj
unix_stream_connect() sets sk_state (`WRITE_ONCE(sk->sk_state,
TCP_ESTABLISHED)`) _before_ it assigns a peer (`unix_peer(sk) = newsk`).
sk_state == TCP_ESTABLISHED makes sock_map_sk_state_allowed() believe that
the socket is properly set up, which would include having a defined peer.
IOW, there's a window when unix_stream_bpf_update_proto() can be called on
a socket which still has unix_peer(sk) == NULL.
T0 bpf T1 connect
------ ----------
WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
sock_map_sk_state_allowed(sk)
...
sk_pair = unix_peer(sk)
sock_hold(sk_pair)
sock_hold(newsk)
smp_mb__after_atomic()
unix_peer(sk) = newsk
BUG: kernel NULL pointer dereference, address: 0000000000000080
RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0
Call Trace:
sock_map_link+0x564/0x8b0
sock_map_update_common+0x6e/0x340
sock_map_update_elem_sys+0x17d/0x240
__sys_bpf+0x26db/0x3250
__x64_sys_bpf+0x21/0x30
do_syscall_64+0x6b/0x3a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The initial idea was to move the peer assignment _before_ the sk_state
update[1], but that involved an additional memory barrier, and changing the
hot path was rejected. Then a check during the proto update was
considered[2], but a follow-up discussion[3] concluded that the root cause
is sockmap taking the wrong lock.
Thus, teach sockmap about the af_unix-specific locking: instead of the
usual lock_sock() involving sock::sk_lock, af_unix protects critical
sections under unix_state_lock() operating on unix_sock::lock.
[1]: https://lore.kernel.org/netdev/ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co/
[2]: https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/
[3]: https://lore.kernel.org/netdev/7603c0e6-cd5b-452b-b710-73b64bd9de26@linux.dev/
This patch also happens to fix a deadlock that may occur when
bpf_iter_unix_seq_show()'s lock_sock_fast() takes the fast path and the
iter prog attempts to update a sockmap, which then ends up spinning at
sock_map_update_elem()'s bh_lock_sock():
WARNING: possible recursive locking detected
--------------------------------------------
test_progs/1393 is trying to acquire lock:
ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: sock_map_update_elem+0xdb/0x1f0
but task is already holding lock:
ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: __lock_sock_fast+0x37/0xe0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(slock-AF_UNIX);
lock(slock-AF_UNIX);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by test_progs/1393:
#0: ffff88814b59c790 (&p->lock){+.+.}-{4:4}, at: bpf_seq_read+0x59/0x10d0
#1: ffff88811ec25fd8 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: bpf_seq_read+0x42c/0x10d0
#2: ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: __lock_sock_fast+0x37/0xe0
#3: ffffffff85a6a7c0 (rcu_read_lock){....}-{1:3}, at: bpf_iter_run_prog+0x51d/0xb00
Call Trace:
dump_stack_lvl+0x5d/0x80
print_deadlock_bug.cold+0xc0/0xce
__lock_acquire+0x130f/0x2590
lock_acquire+0x14e/0x2b0
_raw_spin_lock+0x30/0x40
sock_map_update_elem+0xdb/0x1f0
bpf_prog_2d0075e5d9b721cd_dump_unix+0x55/0x4f4
bpf_iter_run_prog+0x5b9/0xb00
bpf_iter_unix_seq_show+0x1f7/0x2e0
bpf_seq_read+0x42c/0x10d0
vfs_read+0x171/0xb20
ksys_read+0xff/0x200
do_syscall_64+0x6b/0x3a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()")
Fixes: 2c860a43dd77 ("bpf: af_unix: Implement BPF iterator for UNIX domain socket.")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
Keeping sparse annotations in sock_map_sk_{acquire,release}() required some
hackery I'm not proud of. Is there a better way?
---
net/core/sock_map.c | 47 +++++++++++++++++++++++++++++++++++++++--------
1 file changed, 39 insertions(+), 8 deletions(-)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index b6586d9590b7..0c638b1f363a 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -12,6 +12,7 @@
#include <linux/list.h>
#include <linux/jhash.h>
#include <linux/sock_diag.h>
+#include <net/af_unix.h>
#include <net/udp.h>
struct bpf_stab {
@@ -115,17 +116,49 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
}
static void sock_map_sk_acquire(struct sock *sk)
- __acquires(&sk->sk_lock.slock)
+ __acquires(sock_or_unix_lock)
{
- lock_sock(sk);
+ if (sk_is_unix(sk)) {
+ unix_state_lock(sk);
+ __release(sk); /* Silence sparse. */
+ } else {
+ lock_sock(sk);
+ }
+
rcu_read_lock();
}
static void sock_map_sk_release(struct sock *sk)
- __releases(&sk->sk_lock.slock)
+ __releases(sock_or_unix_lock)
{
rcu_read_unlock();
- release_sock(sk);
+
+ if (sk_is_unix(sk)) {
+ unix_state_unlock(sk);
+ __acquire(sk); /* Silence sparse. */
+ } else {
+ release_sock(sk);
+ }
+}
+
+static inline void sock_map_sk_acquire_fast(struct sock *sk)
+{
+ local_bh_disable();
+
+ if (sk_is_unix(sk))
+ unix_state_lock(sk);
+ else
+ bh_lock_sock(sk);
+}
+
+static inline void sock_map_sk_release_fast(struct sock *sk)
+{
+ if (sk_is_unix(sk))
+ unix_state_unlock(sk);
+ else
+ bh_unlock_sock(sk);
+
+ local_bh_enable();
}
static void sock_map_add_link(struct sk_psock *psock,
@@ -604,16 +637,14 @@ static long sock_map_update_elem(struct bpf_map *map, void *key,
if (!sock_map_sk_is_suitable(sk))
return -EOPNOTSUPP;
- local_bh_disable();
- bh_lock_sock(sk);
+ sock_map_sk_acquire_fast(sk);
if (!sock_map_sk_state_allowed(sk))
ret = -EOPNOTSUPP;
else if (map->map_type == BPF_MAP_TYPE_SOCKMAP)
ret = sock_map_update_common(map, *(u32 *)key, sk, flags);
else
ret = sock_hash_update_common(map, key, sk, flags);
- bh_unlock_sock(sk);
- local_bh_enable();
+ sock_map_sk_release_fast(sk);
return ret;
}
--
2.52.0
* Re: [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock
2026-02-07 14:34 ` [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock Michal Luczaj
@ 2026-02-07 22:00 ` Kuniyuki Iwashima
2026-02-08 17:14 ` Michal Luczaj
0 siblings, 1 reply; 14+ messages in thread
From: Kuniyuki Iwashima @ 2026-02-07 22:00 UTC (permalink / raw)
To: Michal Luczaj
Cc: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann,
Willem de Bruijn, Cong Wang, Alexei Starovoitov, Yonghong Song,
Andrii Nakryiko, Eduard Zingerman, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, netdev, bpf, linux-kernel, linux-kselftest
On Sat, Feb 7, 2026 at 6:35 AM Michal Luczaj <mhal@rbox.co> wrote:
>
> unix_stream_connect() sets sk_state (`WRITE_ONCE(sk->sk_state,
> TCP_ESTABLISHED)`) _before_ it assigns a peer (`unix_peer(sk) = newsk`).
> sk_state == TCP_ESTABLISHED makes sock_map_sk_state_allowed() believe that
> socket is properly set up, which would include having a defined peer. IOW,
> there's a window when unix_stream_bpf_update_proto() can be called on
> socket which still has unix_peer(sk) == NULL.
>
> T0 bpf T1 connect
> ------ ----------
>
> WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
> sock_map_sk_state_allowed(sk)
> ...
> sk_pair = unix_peer(sk)
> sock_hold(sk_pair)
> sock_hold(newsk)
> smp_mb__after_atomic()
> unix_peer(sk) = newsk
>
> BUG: kernel NULL pointer dereference, address: 0000000000000080
> RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0
> Call Trace:
> sock_map_link+0x564/0x8b0
> sock_map_update_common+0x6e/0x340
> sock_map_update_elem_sys+0x17d/0x240
> __sys_bpf+0x26db/0x3250
> __x64_sys_bpf+0x21/0x30
> do_syscall_64+0x6b/0x3a0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Initial idea was to move peer assignment _before_ the sk_state update[1],
> but that involved an additional memory barrier, and changing the hot path
> was rejected. Then a check during proto update was considered[2], but a
> follow-up discussion[3] concluded the root cause is sockmap taking a wrong
> lock.
>
> Thus, teach sockmap about the af_unix-specific locking: instead of the
> usual lock_sock() involving sock::sk_lock, af_unix protects critical
> sections under unix_state_lock() operating on unix_sock::lock.
>
> [1]: https://lore.kernel.org/netdev/ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co/
> [2]: https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/
> [3]: https://lore.kernel.org/netdev/7603c0e6-cd5b-452b-b710-73b64bd9de26@linux.dev/
>
> This patch also happens to fix a deadlock that may occur when
> bpf_iter_unix_seq_show()'s lock_sock_fast() takes the fast path and the
> iter prog attempts to update a sockmap. Which ends up spinning at
> sock_map_update_elem()'s bh_lock_sock():
Hmm.. this seems to be a more general problem for
bpf iter vs sockmap. bpf_iter_{tcp,udp}_seq_show() also
hold lock_sock(), where this patch's solution does not help.
We need to resolve this regardless of socket family.
Also, I feel lock_sock() should be outside of unix_state_lock()
since the former is usually sleepable. If we used such locking
order in the future, it would trigger ABBA deadlock and we would
have to revisit this problem.
>
> WARNING: possible recursive locking detected
> --------------------------------------------
> test_progs/1393 is trying to acquire lock:
> ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: sock_map_update_elem+0xdb/0x1f0
>
> but task is already holding lock:
> ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: __lock_sock_fast+0x37/0xe0
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(slock-AF_UNIX);
> lock(slock-AF_UNIX);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 4 locks held by test_progs/1393:
> #0: ffff88814b59c790 (&p->lock){+.+.}-{4:4}, at: bpf_seq_read+0x59/0x10d0
> #1: ffff88811ec25fd8 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: bpf_seq_read+0x42c/0x10d0
> #2: ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: __lock_sock_fast+0x37/0xe0
> #3: ffffffff85a6a7c0 (rcu_read_lock){....}-{1:3}, at: bpf_iter_run_prog+0x51d/0xb00
>
> Call Trace:
> dump_stack_lvl+0x5d/0x80
> print_deadlock_bug.cold+0xc0/0xce
> __lock_acquire+0x130f/0x2590
> lock_acquire+0x14e/0x2b0
> _raw_spin_lock+0x30/0x40
> sock_map_update_elem+0xdb/0x1f0
> bpf_prog_2d0075e5d9b721cd_dump_unix+0x55/0x4f4
> bpf_iter_run_prog+0x5b9/0xb00
> bpf_iter_unix_seq_show+0x1f7/0x2e0
> bpf_seq_read+0x42c/0x10d0
> vfs_read+0x171/0xb20
> ksys_read+0xff/0x200
> do_syscall_64+0x6b/0x3a0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
> Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
> Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()")
> Fixes: 2c860a43dd77 ("bpf: af_unix: Implement BPF iterator for UNIX domain socket.")
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
> Keeping sparse annotations in sock_map_sk_{acquire,release}() required some
> hackery I'm not proud of. Is there a better way?
> ---
> net/core/sock_map.c | 47 +++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 39 insertions(+), 8 deletions(-)
>
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index b6586d9590b7..0c638b1f363a 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -12,6 +12,7 @@
> #include <linux/list.h>
> #include <linux/jhash.h>
> #include <linux/sock_diag.h>
> +#include <net/af_unix.h>
> #include <net/udp.h>
>
> struct bpf_stab {
> @@ -115,17 +116,49 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
> }
>
> static void sock_map_sk_acquire(struct sock *sk)
> - __acquires(&sk->sk_lock.slock)
> + __acquires(sock_or_unix_lock)
> {
> - lock_sock(sk);
> + if (sk_is_unix(sk)) {
> + unix_state_lock(sk);
> + __release(sk); /* Silence sparse. */
> + } else {
> + lock_sock(sk);
> + }
> +
> rcu_read_lock();
> }
>
> static void sock_map_sk_release(struct sock *sk)
> - __releases(&sk->sk_lock.slock)
> + __releases(sock_or_unix_lock)
> {
> rcu_read_unlock();
> - release_sock(sk);
> +
> + if (sk_is_unix(sk)) {
> + unix_state_unlock(sk);
> + __acquire(sk); /* Silence sparse. */
> + } else {
> + release_sock(sk);
> + }
> +}
> +
> +static inline void sock_map_sk_acquire_fast(struct sock *sk)
> +{
> + local_bh_disable();
> +
> + if (sk_is_unix(sk))
> + unix_state_lock(sk);
> + else
> + bh_lock_sock(sk);
> +}
> +
> +static inline void sock_map_sk_release_fast(struct sock *sk)
> +{
> + if (sk_is_unix(sk))
> + unix_state_unlock(sk);
> + else
> + bh_unlock_sock(sk);
> +
> + local_bh_enable();
> }
>
> static void sock_map_add_link(struct sk_psock *psock,
> @@ -604,16 +637,14 @@ static long sock_map_update_elem(struct bpf_map *map, void *key,
> if (!sock_map_sk_is_suitable(sk))
> return -EOPNOTSUPP;
>
> - local_bh_disable();
> - bh_lock_sock(sk);
> + sock_map_sk_acquire_fast(sk);
> if (!sock_map_sk_state_allowed(sk))
> ret = -EOPNOTSUPP;
> else if (map->map_type == BPF_MAP_TYPE_SOCKMAP)
> ret = sock_map_update_common(map, *(u32 *)key, sk, flags);
> else
> ret = sock_hash_update_common(map, key, sk, flags);
> - bh_unlock_sock(sk);
> - local_bh_enable();
> + sock_map_sk_release_fast(sk);
> return ret;
> }
>
>
> --
> 2.52.0
>
* Re: [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock
2026-02-07 22:00 ` Kuniyuki Iwashima
@ 2026-02-08 17:14 ` Michal Luczaj
2026-02-09 20:17 ` Martin KaFai Lau
0 siblings, 1 reply; 14+ messages in thread
From: Michal Luczaj @ 2026-02-08 17:14 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann,
Willem de Bruijn, Cong Wang, Alexei Starovoitov, Yonghong Song,
Andrii Nakryiko, Eduard Zingerman, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, netdev, bpf, linux-kernel, linux-kselftest
On 2/7/26 23:00, Kuniyuki Iwashima wrote:
> On Sat, Feb 7, 2026 at 6:35 AM Michal Luczaj <mhal@rbox.co> wrote:
>> This patch also happens to fix a deadlock that may occur when
>> bpf_iter_unix_seq_show()'s lock_sock_fast() takes the fast path and the
>> iter prog attempts to update a sockmap. Which ends up spinning at
>> sock_map_update_elem()'s bh_lock_sock():
>
> Hmm.. this seems to be a more general problem for
> bpf iter vs sockmap. bpf_iter_{tcp,udp}_seq_show() also
> hold lock_sock(), where this patch's solution does not help.
> We need to resolve this regardless of socket family.
I don't see any deadlocks there. Note that I've mentioned the
lock_sock_fast() fast path was a problem, not lock_sock().
> Also, I feel lock_sock() should be outside of unix_state_lock()
> since the former is usually sleepable. If we used such locking
> order in the future, it would trigger ABBA deadlock and we would
> have to revisit this problem.
Do I understand correctly you're suggesting taking lock_sock() for af_unix
just as well?
* Re: [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock
2026-02-08 17:14 ` Michal Luczaj
@ 2026-02-09 20:17 ` Martin KaFai Lau
2026-02-11 10:02 ` Michal Luczaj
0 siblings, 1 reply; 14+ messages in thread
From: Martin KaFai Lau @ 2026-02-09 20:17 UTC (permalink / raw)
To: Michal Luczaj, Kuniyuki Iwashima
Cc: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann,
Willem de Bruijn, Cong Wang, Alexei Starovoitov, Yonghong Song,
Andrii Nakryiko, Eduard Zingerman, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
netdev, bpf, linux-kernel, linux-kselftest
On 2/8/26 9:14 AM, Michal Luczaj wrote:
> On 2/7/26 23:00, Kuniyuki Iwashima wrote:
>> On Sat, Feb 7, 2026 at 6:35 AM Michal Luczaj <mhal@rbox.co> wrote:
>>> This patch also happens to fix a deadlock that may occur when
>>> bpf_iter_unix_seq_show()'s lock_sock_fast() takes the fast path and the
>>> iter prog attempts to update a sockmap. Which ends up spinning at
>>> sock_map_update_elem()'s bh_lock_sock():
>>
>> Hmm.. this seems to be a more general problem for
>> bpf iter vs sockmap. bpf_iter_{tcp,udp}_seq_show() also
>> hold lock_sock(), where this patch's solution does not help.
>> We need to resolve this regardless of socket family.
>
> I don't see any deadlocks there. Note that I've mentioned lock_sock_fast()
> fast path was a problem, not lock_sock().
For the tcp/udp, I think the bpf_iter should be fine: lock_sock() in
seq_show and bh_lock_sock() in map_update. It seems redundant though.
From looking at may_update_sockmap(), other bpf progs (e.g., tc) can do
map_update also. On those paths, I am not sure why
sock_map_update_elem() does not need to check "!sock_owned_by_user(sk)".
If it is indeed an issue, it probably needs to be addressed separately.
It should also be helpful to be consistent with tcp/udp iter and use
lock_sock() instead of lock_sock_fast() in bpf_iter_unix_seq_show().
* Re: [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock
2026-02-09 20:17 ` Martin KaFai Lau
@ 2026-02-11 10:02 ` Michal Luczaj
2026-02-11 13:24 ` Michal Luczaj
2026-02-23 21:43 ` Martin KaFai Lau
0 siblings, 2 replies; 14+ messages in thread
From: Michal Luczaj @ 2026-02-11 10:02 UTC (permalink / raw)
To: Martin KaFai Lau, Kuniyuki Iwashima
Cc: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann,
Willem de Bruijn, Cong Wang, Alexei Starovoitov, Yonghong Song,
Andrii Nakryiko, Eduard Zingerman, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
netdev, bpf, linux-kernel, linux-kselftest
On 2/9/26 21:17, Martin KaFai Lau wrote:
> On 2/8/26 9:14 AM, Michal Luczaj wrote:
>> On 2/7/26 23:00, Kuniyuki Iwashima wrote:
>>> On Sat, Feb 7, 2026 at 6:35 AM Michal Luczaj <mhal@rbox.co> wrote:
>>>> This patch also happens to fix a deadlock that may occur when
>>>> bpf_iter_unix_seq_show()'s lock_sock_fast() takes the fast path and the
>>>> iter prog attempts to update a sockmap. Which ends up spinning at
>>>> sock_map_update_elem()'s bh_lock_sock():
>>>
>>> Hmm.. this seems to be a more general problem for
>>> bpf iter vs sockmap. bpf_iter_{tcp,udp}_seq_show() also
>>> hold lock_sock(), where this patch's solution does not help.
>>> We need to resolve this regardless of socket family.
>>
>> I don't see any deadlocks there. Note that I've mentioned lock_sock_fast()
>> fast path was a problem, not lock_sock().
>
> For the tcp/udp, I think the bpf_iter should be fine: lock_sock() in
> seq_show and bh_lock_sock() in map_update. It seems redundant though.
I wasn't sure what exactly you suspect of being redundant, so I did some
digging:
lock_sock() in tcp/udp iter is expected (among others?) by kfunc
bpf_sock_destroy(). Relevant commit 4ddbcb886268 ("bpf: Add
bpf_sock_destroy kfunc"),
https://lore.kernel.org/all/20230519225157.760788-8-aditi.ghag@isovalent.com/
In short: lock must be taken for synchronization of proto::diag_destroy().
Reasons for bh_lock_sock() during bpf sockmap update are explained in
commit 0126240f448d ("bpf: sockmap: Allow update from BPF"),
https://lore.kernel.org/bpf/20200821102948.21918-6-lmb@cloudflare.com/
In short: socket shouldn't be allowed to change its state during the
update. BH lock because bpf can't sleep.
> From looking at may_update_sockmap(), other bpf progs (e.g., tc) can do
> map_update also. On those paths, I am not sure why
> sock_map_update_elem() does not need to check "!sock_owned_by_user(sk)".
> If it is indeed an issue, it probably needs to be addressed separately.
Since sockmap update can happen in a tracing prog, can you really expect a
socket to be always owned?
> It should also be helpful to be consistent with tcp/udp iter and use
> lock_sock() instead of lock_sock_fast() in bpf_iter_unix_seq_show().
OK, I'll tweak that in v3.
* Re: [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock
2026-02-11 10:02 ` Michal Luczaj
@ 2026-02-11 13:24 ` Michal Luczaj
2026-02-23 21:43 ` Martin KaFai Lau
1 sibling, 0 replies; 14+ messages in thread
From: Michal Luczaj @ 2026-02-11 13:24 UTC (permalink / raw)
To: Martin KaFai Lau, Kuniyuki Iwashima
Cc: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann,
Willem de Bruijn, Cong Wang, Alexei Starovoitov, Yonghong Song,
Andrii Nakryiko, Eduard Zingerman, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
netdev, bpf, linux-kernel, linux-kselftest
Replying to correct myself:
> Since sockmap update can happen in a tracing prog, can you really expect a
> socket to be always owned?
Ha, but I'm wrong! Of the tracers, only BPF_TRACE_ITER allows calling
sock_map_update_elem(). And those are supposed to take lock_sock(). Except
some don't, e.g. sock_map_seq_show(). Or is there something else I'm missing?
* Re: [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock
2026-02-11 10:02 ` Michal Luczaj
2026-02-11 13:24 ` Michal Luczaj
@ 2026-02-23 21:43 ` Martin KaFai Lau
2026-02-24 15:28 ` Michal Luczaj
1 sibling, 1 reply; 14+ messages in thread
From: Martin KaFai Lau @ 2026-02-23 21:43 UTC (permalink / raw)
To: Michal Luczaj, Kuniyuki Iwashima
Cc: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann,
Willem de Bruijn, Cong Wang, Alexei Starovoitov, Yonghong Song,
Andrii Nakryiko, Eduard Zingerman, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
netdev, bpf, linux-kernel, linux-kselftest
On 2/11/26 2:02 AM, Michal Luczaj wrote:
>> It should also be helpful to be consistent with tcp/udp iter and use
>> lock_sock() instead of lock_sock_fast() in bpf_iter_unix_seq_show().
> OK, I'll tweak that in v3.
Hi, Michal. Are you planning to send v3 soon? I don't think the
sock_owned_by_user for the non-tracing prog needs to be addressed in the
same set.
* Re: [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock
2026-02-23 21:43 ` Martin KaFai Lau
@ 2026-02-24 15:28 ` Michal Luczaj
2026-03-03 1:51 ` Martin KaFai Lau
0 siblings, 1 reply; 14+ messages in thread
From: Michal Luczaj @ 2026-02-24 15:28 UTC (permalink / raw)
To: Martin KaFai Lau, Kuniyuki Iwashima
Cc: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann,
Willem de Bruijn, Cong Wang, Alexei Starovoitov, Yonghong Song,
Andrii Nakryiko, Eduard Zingerman, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
netdev, bpf, linux-kernel, linux-kselftest
On 2/23/26 22:43, Martin KaFai Lau wrote:
>
> On 2/11/26 2:02 AM, Michal Luczaj wrote:
>>> It should also be helpful to be consistent with tcp/udp iter and use
>>> lock_sock() instead of lock_sock_fast() in bpf_iter_unix_seq_show().
>> OK, I'll tweak that in v3.
>
> Hi, Michal. Are you planning to send v3 soon? I don't think the
> sock_owned_by_user for the non-tracing prog needs to be addressed in the
> same set.
Yes, I'm working on it. Sorry for the delay, I'm taking my sweet time to
come up with a selftest.
I think I can neatly fit the sock_owned_by_user()-related stuff in this
series, but let me know if you'd rather have it separately. Whichever way,
I don't mind.
* Re: [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock
2026-02-24 15:28 ` Michal Luczaj
@ 2026-03-03 1:51 ` Martin KaFai Lau
2026-03-05 23:35 ` Michal Luczaj
0 siblings, 1 reply; 14+ messages in thread
From: Martin KaFai Lau @ 2026-03-03 1:51 UTC (permalink / raw)
To: Michal Luczaj
Cc: Kuniyuki Iwashima, John Fastabend, Jakub Sitnicki,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Daniel Borkmann, Willem de Bruijn, Cong Wang,
Alexei Starovoitov, Yonghong Song, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan, netdev, bpf,
linux-kernel, linux-kselftest
On 2/24/26 7:28 AM, Michal Luczaj wrote:
> On 2/23/26 22:43, Martin KaFai Lau wrote:
>>
>> On 2/11/26 2:02 AM, Michal Luczaj wrote:
>>>> It should also be helpful to be consistent with tcp/udp iter and use
>>>> lock_sock() instead of lock_sock_fast() in bpf_iter_unix_seq_show().
>>> OK, I'll tweak that in v3.
>>
>> Hi, Michal. Are you planning to send v3 soon? I don't think the
>> sock_owned_by_user for the non-tracing prog needs to be addressed in the
>> same set.
>
> Yes, I'm working on it. Sorry for the delay, I'm taking my sweet time to
> come up with a selftest.
>
> I think I can neatly fit the sock_owned_by_user()-related stuff in this
> series, but let me know if you'd rather have it separately. Whichever way,
> I don't mind.
I think sock_owned_by_user() is not related to the AF_UNIX's sockmap
fix. If it is not, it's better to separate it. I think one thing at a
time is easier to review.
If they are related somewhat or it's easier to review them together, see
if it makes sense to structure them in a way such that the AF_UNIX's
sockmap-related fixes and tests are at the beginning of the patch set so
that they can be applied first if others need more discussion.
Regarding the "Keeping sparse annotations in sock_map_sk_{acquire,release}()
required some hackery..." note: maybe just remove the annotations?
[ btw, from commit 5b63d0ae94cc it looks like sparse support is removed
and the checking depends solely on clang now (?). If that is the case,
what does clang do with the "sock_or_unix_lock"? ]
* Re: [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock
2026-03-03 1:51 ` Martin KaFai Lau
@ 2026-03-05 23:35 ` Michal Luczaj
0 siblings, 0 replies; 14+ messages in thread
From: Michal Luczaj @ 2026-03-05 23:35 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Kuniyuki Iwashima, John Fastabend, Jakub Sitnicki,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Daniel Borkmann, Willem de Bruijn, Cong Wang,
Alexei Starovoitov, Yonghong Song, Andrii Nakryiko,
Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan, netdev, bpf,
linux-kernel, linux-kselftest
On 3/3/26 02:51, Martin KaFai Lau wrote:
> On 2/24/26 7:28 AM, Michal Luczaj wrote:
>> On 2/23/26 22:43, Martin KaFai Lau wrote:
>>>
>>> On 2/11/26 2:02 AM, Michal Luczaj wrote:
>>>>> It would also be helpful to be consistent with the tcp/udp iters and
>>>>> use lock_sock() instead of lock_sock_fast() in bpf_iter_unix_seq_show().
>>>> OK, I'll tweak that in v3.
>>>
>>> Hi, Michal. Are you planning to send v3 soon? I don't think the
>>> sock_owned_by_user for the non-tracing prog needs to be addressed in the
>>> same set.
>>
>> Yes, I'm working on it. Sorry for the delay, I'm taking my sweet time to
>> come up with a selftest.
>>
>> I think I can neatly fit the sock_owned_by_user()-related stuff in this
>> series, but let me know if you'd rather have it separately. Whichever way,
>> I don't mind.
>
> I think sock_owned_by_user() is not related to the AF_UNIX sockmap
> fix. If it isn't, it's better to handle it separately; one thing at a
> time is easier to review.
All right, sending out v3 to reduce the scope:
https://lore.kernel.org/bpf/20260306-unix-proto-update-null-ptr-deref-v3-0-2f0c7410c523@rbox.co/
> If they are related somewhat or it's easier to review them together, see
> if it makes sense to structure them in a way such that the AF_UNIX's
> sockmap-related fixes and tests are at the beginning of the patch set so
> that they can be applied first if others need more discussion.
I'd say those changes are related, but since things are getting unclear, I
won't insist on keeping them together.
So. Simply dropping the BH locks and introducing sock_owned_by_user() in
sock_map_update_elem() is not enough. For example, sock_map_seq_show() and
sock_hash_seq_show() must be adapted before they can take the sock lock
(they currently run under RCU).
But there's a deeper problem: bpf iters let bpf code call a sockmap update
while the sk is halfway through sock_map_close(). This breaks the sockmap
contract, as the update can happen after sock_map_remove_links() but before
saved_close(). The sockmap then ends up holding a closed sk that would
normally have been removed.
For sockmap/sockhash iters this condition can be handled: before running
the bpf tracer prog, check whether the sk is still linked to the map. Since
those are sockmap/sockhash iters, no link means the sk should be skipped.
But I'm not sure about other iters, plus there's the case you mentioned of
bpf tc progs that can call a sockmap update with the sk unowned. So I'll
stop here, worried this may be a doomed approach :)
Long story short: it feels like taking the spinlocks in
sock_map_update_elem() (without unconditionally checking
sock_owned_by_user()) is the best approach, just as it is currently
implemented (plus the af_unix special casing). Some sks will come owned,
some won't. What needs to be added after taking the spinlock(s) is some
kind of is-sk-being-closed check. Or am I missing something?
> Regarding the "Keeping sparse annotations in
> sock_map_sk_{acquire,release}() required some
> hackery...". Maybe just remove the annotations?
> [ btw, from commit 5b63d0ae94cc it looks like sparse support is removed
> and the checking depends solely on clang now (?). If that is the case,
> what does clang do with the "sock_or_unix_lock"? ]
Oh, I haven't seen that one, thanks. Will take a look. For now, I've just
dropped the annotations.
* [PATCH bpf v2 4/4] selftests/bpf: Extend bpf_iter_unix to attempt deadlocking
2026-02-07 14:34 [PATCH bpf v2 0/4] bpf, sockmap: Fix af_unix null-ptr-deref in proto update Michal Luczaj
` (2 preceding siblings ...)
2026-02-07 14:34 ` [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock Michal Luczaj
@ 2026-02-07 14:34 ` Michal Luczaj
3 siblings, 0 replies; 14+ messages in thread
From: Michal Luczaj @ 2026-02-07 14:34 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki, Kuniyuki Iwashima,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Daniel Borkmann, Willem de Bruijn, Cong Wang,
Alexei Starovoitov, Yonghong Song, Andrii Nakryiko,
Eduard Zingerman, Martin KaFai Lau, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan
Cc: netdev, bpf, linux-kernel, linux-kselftest, Michal Luczaj
Updating a sockmap from a unix iterator prog may lead to a deadlock.
Piggyback on the original selftest.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
tools/testing/selftests/bpf/progs/bpf_iter_unix.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_unix.c b/tools/testing/selftests/bpf/progs/bpf_iter_unix.c
index fea275df9e22..a2652c8c3616 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_unix.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_unix.c
@@ -7,6 +7,13 @@
char _license[] SEC("license") = "GPL";
+SEC(".maps") struct {
+ __uint(type, BPF_MAP_TYPE_SOCKMAP);
+ __uint(max_entries, 1);
+ __type(key, __u32);
+ __type(value, __u64);
+} sockmap;
+
static long sock_i_ino(const struct sock *sk)
{
const struct socket *sk_socket = sk->sk_socket;
@@ -76,5 +83,8 @@ int dump_unix(struct bpf_iter__unix *ctx)
BPF_SEQ_PRINTF(seq, "\n");
+ /* Test for deadlock. */
+ bpf_map_update_elem(&sockmap, &(int){0}, sk, 0);
+
return 0;
}
--
2.52.0