From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailtransmit04.runbox.com (mailtransmit04.runbox.com [185.226.149.37]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 29A252566D3; Wed, 1 Apr 2026 19:19:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.226.149.37 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775071193; cv=none; b=dbh5ai4EiLIriDAwkrwqHTnB03gQKONHkKEnm4avYqaLJHG6sS1yPQopkhH3lddEVy5xlpk9+ZXSuWQWK5no2nnnH37ioM1f4+w5xfcu+B354c/1JgcJ5rTVKn/DMjr1E1ta7ROQA4Tca4H6POFxjCrj6N2r9HUCfBogBgS91eU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775071193; c=relaxed/simple; bh=04e6ONstFpXGyUCwmekTXD2mE3YtM61GhSA3gWjC5jI=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=smSMNG+WOMM4GoUru0kOSTMj+qWrUrlRiSb5s+a6AsXerQr4Btg1zSJLmp9MjaB9kXqebPaLjeePXv+C4NXUGCpBcTbsNm2BAwHRtiJtGyeB635xK7olGQ6TWvSU0MeITivgJybbcFfPSYzi+tYBRvvAzLiBz/5AAP/GJhyThjs= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=rbox.co; spf=pass smtp.mailfrom=rbox.co; dkim=pass (2048-bit key) header.d=rbox.co header.i=@rbox.co header.b=lX16oQ8X; arc=none smtp.client-ip=185.226.149.37 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=rbox.co Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=rbox.co Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=rbox.co header.i=@rbox.co header.b="lX16oQ8X" Received: from mailtransmit03.runbox ([10.9.9.163] helo=aibo.runbox.com) by mailtransmit04.runbox.com with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.93) (envelope-from ) id 1w815u-008cy6-CR; Wed, 01 Apr 2026 21:19:14 +0200 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=rbox.co; s=selector1; h=Content-Transfer-Encoding:Content-Type:In-Reply-To:From: References:Cc:To:Subject:MIME-Version:Date:Message-ID; bh=aJ+/jWA2bg1c3IWd84F1W+Yb+GYd9TZfWYB4Twn+Wmc=; b=lX16oQ8X+N2wmoZDVUtyUwL08m 5lqZbmyzJ40LoehUNSkhhy35uGj+weS8/U91OQjf22bLpEK/rfeAJrGAX7QCSsgPlf0COFpwhD1ho DED55vgRrvKgM9DgfWil3mN57OE/BUNKRGw20exiuDxbV4Q1+ksTWKQgbMl8jKjfpaTDSqOsQ0mbm 6IT4VJNkDGCayIoRW6bkZtkQ8h1YkfqBsOzL+gCgfDfOhMaq+s4OUr2IjXqOcPPvUIUI3Zh/NvRbe lRUxYN7CiAZNXRRjlKkiFKVpzrjVIid7KpGtYZGbUZAu/4xGReHZ8CGZ5bvNFQuPQAacCWJyeO857 5ImMXUPg==; Received: from [10.9.9.72] (helo=submission01.runbox) by mailtransmit03.runbox with esmtp (Exim 4.86_2) (envelope-from ) id 1w815d-00031g-Kh; Wed, 01 Apr 2026 21:18:57 +0200 Received: by submission01.runbox with esmtpsa [Authenticated ID (604044)] (TLS1.2:ECDHE_SECP256R1__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.93) id 1w815c-00CE7n-Rw; Wed, 01 Apr 2026 21:18:56 +0200 Message-ID: Date: Wed, 1 Apr 2026 21:18:54 +0200 Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH bpf v3 5/5] bpf, sockmap: Adapt for af_unix-specific lock To: Kuniyuki Iwashima Cc: Martin KaFai Lau , Jiayuan Chen , John Fastabend , Jakub Sitnicki , Eric Dumazet , Paolo Abeni , Willem de Bruijn , "David S. Miller" , Jakub Kicinski , Simon Horman , Yonghong Song , Andrii Nakryiko , Alexei Starovoitov , Daniel Borkmann , Eduard Zingerman , Song Liu , Yonghong Song , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Shuah Khan , Cong Wang , netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org References: <20260306-unix-proto-update-null-ptr-deref-v3-0-2f0c7410c523@rbox.co> <20260306-unix-proto-update-null-ptr-deref-v3-5-2f0c7410c523@rbox.co> <2192e9ef-894f-4826-b296-4f90e973a31b@163.com> <27fa6e91-02a5-46cd-8c95-b75fd2c5fa08@rbox.co> <48bad987-a130-4fb0-b4de-e39deac80ab9@rbox.co> Content-Language: pl-PL, en-GB From: Michal Luczaj In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit On 4/1/26 01:18, Kuniyuki Iwashima wrote: > On Tue, Mar 31, 2026 at 3:44 PM Michal Luczaj wrote: >> >> On 3/31/26 01:27, Kuniyuki Iwashima wrote: >>> On Mon, Mar 30, 2026 at 4:04 PM Michal Luczaj wrote: >>>> On 3/26/26 07:26, Martin KaFai Lau wrote: >>>>> On 3/15/26 4:58 PM, Michal Luczaj wrote: >>>>>>> Beside, from looking at the may_update_sockmap(), I don't know if it is >>>>>>> even doable (or useful) to bpf_map_update_elem(unix_sk) in >>>>>>> tc/flow_dissector/xdp. One possible path is the SOCK_FILTER when looking >>>>>>> at unix_dgram_sendmsg() => sk_filter(). It was not the original use case >>>>>>> when the bpf_map_update_elem(sockmap) support was added iirc. >>>>>> >>>>>> What about a situation when unix_sk is stored in a sockmap, then tc prog >>>>>> looks it up and invokes bpf_map_update_elem(unix_sk)? I'm not sure it's >>>>>> useful, but seems doable. >>>>> >>>>> [ Sorry for the late reply ] >>>>> >>>>> It is a bummer that the bpf_map_update_elem(unix_sk) path is possible >>>>> from tc :( >>>>> >>>>> Then unix_state_lock() in its current form cannot be safely acquired in >>>>> sock_map_update_elem(). It is currently a spin_lock() instead of >>>>> spin_lock_bh(). >>>> >>>> Is there a specific deadlock you have in your mind? >>> >>> lockdep would complain if we used the same lock from >>> different contexts. >>> >>> e.g.) >>> Process context holding unix_state_lock() with the normal spin_lock() >>> -> BH interrupt >>> -> tc prog trying to hold the same lock with spin_lock(). (_bh()) >>> -> deadlock >> >> OK, I get it, thanks. >> >> So here's one more idea: the null-ptr-deref issue is connect() racing >> against sock_map_update_elem_*SYS*() coming from user-space, not the >> can-be-called-from-BH sock_map_update_elem() variant. So can't we assume >> that for any sock_map_update_elem(unix_sk) invoked by a tc prog, unix_sk >> will always be "stable", i.e. in a state that cannot lead to that >> null-ptr-deref? >> >> IOW, if for a tc prog the only way to get hold of unix_sk is look it up in >> a sockmap, then (by the fact that unix_sk _is_ in the sockmap) unix_sk will >> be already safe to use by sock_map_update_elem() without taking the af_unix >> state lock. >> >> Long story short: take the unix state lock in sock_map_update_elem_sys(), >> don't bother in sock_map_update_elem()? > > but it will prevents bpf iter from taking unix_state_lock(). How come? bpf iter can take unix_state_lock() and the prog is free to invoke sock_map_update_elem(). It's the _sys variant that would be taking unix_state_lock() by itself. Or is there some other locking complexity I'm missing? > I don't see a good reason to introduce a new locking rule by unnecessarily > wrapping the entire sockmap update with unix_state_lock() even though > the root bug can be avoided by a simple null check in a leaf function. Yeah, I get that point. I just assumed the root bug was indeed sockmap update missing a lock.