Date: Wed, 25 Mar 2026 23:26:30 -0700
From: Martin KaFai Lau
Subject: Re: [PATCH bpf v3 5/5] bpf, sockmap: Adapt for af_unix-specific lock
To: Michal Luczaj
Cc: Jiayuan Chen, John Fastabend, Jakub Sitnicki, Eric Dumazet,
 Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn, "David S. Miller",
 Jakub Kicinski, Simon Horman, Yonghong Song, Andrii Nakryiko,
 Alexei Starovoitov, Daniel Borkmann, Eduard Zingerman, Song Liu,
 KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
 Cong Wang, netdev@vger.kernel.org, bpf@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
References: <20260306-unix-proto-update-null-ptr-deref-v3-0-2f0c7410c523@rbox.co>
 <20260306-unix-proto-update-null-ptr-deref-v3-5-2f0c7410c523@rbox.co>
 <2192e9ef-894f-4826-b296-4f90e973a31b@163.com>

On 3/15/26 4:58 PM, Michal Luczaj wrote:
>> Beside, from looking at the may_update_sockmap(), I don't know if it is
>> even doable (or useful) to bpf_map_update_elem(unix_sk) in
>> tc/flow_dissector/xdp. One possible path is the SOCK_FILTER when looking
>> at unix_dgram_sendmsg() => sk_filter(). It was not the original use case
>> when the bpf_map_update_elem(sockmap) support was added iirc.
>
> What about a situation when unix_sk is stored in a sockmap, then tc prog
> looks it up and invokes bpf_map_update_elem(unix_sk)?
> I'm not sure it's
> useful, but seems doable.

[ Sorry for the late reply ]

It is a bummer that the bpf_map_update_elem(unix_sk) path is possible
from tc :( Then unix_state_lock() in its current form cannot be safely
acquired in sock_map_update_elem(): it is currently a spin_lock()
instead of spin_lock_bh().

>
>> The only path left is bpf_iter, which I believe was the primary use case
>> when adding bpf_map_update_elem(sockmap) support [1]. It would be nice
>> to avoid bh_lock_sock() when calling from all bpf_iter (tcp/udp/unix)
>> where lock_sock() has already been done. It is more for
>> reading-correctness though. This just came to my mind.
>> has_current_bpf_ctx() can be used to check this. sockopt_lock_sock() has
>> been using it to conditionally take lock_sock() or not.
>
> [ One clarification: bh_lock_sock() is a sock_map_update_elem() thing,
> which can only be called by a bpf prog. IOW, has_current_bpf_ctx() is
> always `true` in sock_map_update_elem(), right? ]

Among the bpf prog types allowed by may_update_sockmap() to do
bpf_map_update_elem(sockmap), only BPF_TRACE_ITER should have
has_current_bpf_ctx() == true. The tc prog (and the others allowed in
may_update_sockmap()) will have has_current_bpf_ctx() == false when
calling sock_map_update_elem().

The tc case of bpf_map_update_elem(unix_sk) is unfortunate and requires
going back to the drawing board. I think checking unix_peer(sk) for NULL
without acquiring unix_state_lock() is needed for the
sock_map_update_elem() path, since changing unix_state_lock() for this
unknown use case seems overkill. Whether sock_map_update_elem_"sys"()
needs unix_state_lock() is up for debate.

For bpf_iter_unix_seq_show(), one thought is to add unix_state_lock()
there before running the bpf iter prog. iiuc, that is also what Kuniyuki
has in mind, to give the bpf iter prog a stable view of the unix_sock.
This could be a followup.
[fwiw, this is why I first thought of has_current_bpf_ctx(): to avoid
sock_map_update_elem() taking unix_state_lock() again if
bpf_iter_unix_seq_show() has acquired unix_state_lock() earlier. I later
concluded (but was proven incorrect) that tc cannot call
bpf_map_update_elem(unix_sk).]

>
> Let me know if I'm correctly rephrasing your idea: assume all bpf-context
> callers hold the socket locked or keep it "stable" (meaning: "sk won't
> surprise sockmap update by some breaking state change coming from another
> thread"). As you said, most bpf iters already take the sock_lock(), and I

Right, all bpf iters (udp, tcp, unix) acquire lock_sock() before running
the bpf iter prog. afaik, the only exception is the netlink bpf iter,
but it cannot be added to a sock_map.

> have a patch that fixes sock_{map,hash}_seq_show(). Then we could try
> dropping that bh_lock_sock().
>
>> [ I would still keep patch 3 though. ]
>
> Right.
>
>> [1]: https://lore.kernel.org/bpf/20200821102948.21918-1-lmb@cloudflare.com/
>>
>>>
>>> In a parallel thread I've asked Kuniyuki if it might be better to
>>> completely drop patch 2/5, which would change how we interact with
>>> sock_map_close(). Let's see how it goes.
>>>
>>
>> If patch 2 is dropped, lock_sock() is always needed for unix_sk?
>
> For sock_map_update_elem_sys() I wanted to lock_sock()+unix_state_lock(),
> following Kuniyuki's suggestion to keep the locking pattern/order (which
> repeats when a unix bpf iter prog invokes bpf_map_update_elem() ->
> sock_map_update_elem()). For sock_map_update_elem() not, since we can't
> sleep there.