From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 6 Mar 2026 14:04:09 +0800
From: Jiayuan Chen
Subject: Re: [PATCH bpf v3 3/5] bpf, sockmap: Fix af_unix iter deadlock
To: Michal Luczaj, John Fastabend, Jakub Sitnicki, Eric Dumazet,
 Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn, "David S. Miller",
 Jakub Kicinski, Simon Horman, Yonghong Song, Andrii Nakryiko,
 Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
 Song Liu, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
 Shuah Khan, Cong Wang
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
In-Reply-To: <20260306-unix-proto-update-null-ptr-deref-v3-3-2f0c7410c523@rbox.co>
References: <20260306-unix-proto-update-null-ptr-deref-v3-0-2f0c7410c523@rbox.co>
 <20260306-unix-proto-update-null-ptr-deref-v3-3-2f0c7410c523@rbox.co>

On 3/6/26 7:30 AM, Michal Luczaj wrote:
> bpf_iter_unix_seq_show() may deadlock when lock_sock_fast() takes the fast
> path and the iter prog attempts to update a sockmap.
> Which ends up spinning at sock_map_update_elem()'s bh_lock_sock():
>
> WARNING: possible recursive locking detected
> test_progs/1393 is trying to acquire lock:
> ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: sock_map_update_elem+0xdb/0x1f0
>
> but task is already holding lock:
> ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: __lock_sock_fast+0x37/0xe0
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(slock-AF_UNIX);
>   lock(slock-AF_UNIX);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 4 locks held by test_progs/1393:
>  #0: ffff88814b59c790 (&p->lock){+.+.}-{4:4}, at: bpf_seq_read+0x59/0x10d0
>  #1: ffff88811ec25fd8 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: bpf_seq_read+0x42c/0x10d0
>  #2: ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: __lock_sock_fast+0x37/0xe0
>  #3: ffffffff85a6a7c0 (rcu_read_lock){....}-{1:3}, at: bpf_iter_run_prog+0x51d/0xb00
>
> Call Trace:
>  dump_stack_lvl+0x5d/0x80
>  print_deadlock_bug.cold+0xc0/0xce
>  __lock_acquire+0x130f/0x2590
>  lock_acquire+0x14e/0x2b0
>  _raw_spin_lock+0x30/0x40
>  sock_map_update_elem+0xdb/0x1f0
>  bpf_prog_2d0075e5d9b721cd_dump_unix+0x55/0x4f4
>  bpf_iter_run_prog+0x5b9/0xb00
>  bpf_iter_unix_seq_show+0x1f7/0x2e0
>  bpf_seq_read+0x42c/0x10d0
>  vfs_read+0x171/0xb20
>  ksys_read+0xff/0x200
>  do_syscall_64+0x6b/0x3a0
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Suggested-by: Kuniyuki Iwashima
> Suggested-by: Martin KaFai Lau
> Fixes: 2c860a43dd77 ("bpf: af_unix: Implement BPF iterator for UNIX domain socket.")
> Signed-off-by: Michal Luczaj
> ---
>  net/unix/af_unix.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 3756a93dc63a..3d2cfb4ecbcd 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -3729,15 +3729,14 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v)
>  	struct bpf_prog *prog;
>  	struct sock *sk = v;
>  	uid_t uid;
> -	bool slow;
>  	int ret;
>
>  	if (v == SEQ_START_TOKEN)
>  		return 0;
>
> -	slow = lock_sock_fast(sk);
> +	lock_sock(sk);
>
> -	if (unlikely(sk_unhashed(sk))) {
> +	if (unlikely(sock_flag(sk, SOCK_DEAD))) {
>  		ret = SEQ_SKIP;
>  		goto unlock;
>  	}

Switching to lock_sock() fixes the deadlock, but it does not provide mutual
exclusion with unix_release_sock(), which uses unix_state_lock() exclusively
and does not touch lock_sock() at all. So a dying socket can still reach the
BPF prog concurrently with unix_release_sock() running on another CPU.

Both SOCK_DEAD and the clearing of unix_peer(sk) happen under
unix_state_lock() in unix_release_sock(). Without taking unix_state_lock()
before the SOCK_DEAD check, there is a window:

  iter                               unix_release_sock()
  ----                               -------------------
  lock_sock(sk)
  SOCK_DEAD == 0 (check passes)
                                     unix_state_lock(sk)
                                     unix_peer(sk) = NULL
                                     sock_set_flag(sk, SOCK_DEAD)
                                     unix_state_unlock(sk)
  BPF prog runs
  -> accesses unix_peer(sk) == NULL
  -> crash

This was not raised in the v2 discussion.

The natural fix is to check SOCK_DEAD under unix_state_lock(). However,
holding unix_state_lock() throughout BPF prog execution would conflict with
patch 5: sock_map_sk_acquire_fast() also takes unix_state_lock() for AF_UNIX
sockets, resulting in a recursive spinlock deadlock.

Kuniyuki, Martin, what is the right approach here?

> @@ -3747,7 +3746,7 @@ static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v)
>  	prog = bpf_iter_get_info(&meta, false);
>  	ret = unix_prog_seq_show(prog, &meta, v, uid);
>  unlock:
> -	unlock_sock_fast(sk, slow);
> +	release_sock(sk);
>  	return ret;
> }