From: Michal Luczaj <mhal@rbox.co>
To: Kuniyuki Iwashima <kuniyu@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
Jakub Sitnicki <jakub@cloudflare.com>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>,
Willem de Bruijn <willemb@google.com>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Simon Horman <horms@kernel.org>,
Yonghong Song <yhs@fb.com>, Andrii Nakryiko <andrii@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>,
Cong Wang <cong.wang@bytedance.com>,
netdev@vger.kernel.org, bpf@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH bpf v3 2/5] bpf, sockmap: Use sock_map_sk_{acquire,release}() where open-coded
Date: Mon, 16 Mar 2026 00:58:53 +0100
Message-ID: <b74fa713-22e1-417c-8c72-b02937dbdaaa@rbox.co>
In-Reply-To: <CAAVpQUADj21wNH=OkUVxW81Zf8RPd1TxLgcfU1wVbWXDY+W6Sg@mail.gmail.com>
On 3/11/26 05:57, Kuniyuki Iwashima wrote:
> On Tue, Mar 10, 2026 at 9:17 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>>
>> On Fri, Mar 6, 2026 at 6:05 AM Michal Luczaj <mhal@rbox.co> wrote:
>>>
>>> On 3/6/26 06:44, Kuniyuki Iwashima wrote:
>>>> On Thu, Mar 5, 2026 at 3:32 PM Michal Luczaj <mhal@rbox.co> wrote:
>>>>>
>>>>> Instead of repeating the same (un)locking pattern, reuse
>>>>> sock_map_sk_{acquire,release}(). This centralizes the code and makes it
>>>>> easier to adapt sockmap to af_unix-specific locking.
>>>>>
>>>>> Signed-off-by: Michal Luczaj <mhal@rbox.co>
>>>>> ---
>>>>> net/core/sock_map.c | 21 +++++++--------------
>>>>> 1 file changed, 7 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
>>>>> index 02a68be3002a..7ba6a7f24ccd 100644
>>>>> --- a/net/core/sock_map.c
>>>>> +++ b/net/core/sock_map.c
>>>>> @@ -353,11 +353,9 @@ static void sock_map_free(struct bpf_map *map)
>>>>> sk = xchg(psk, NULL);
>>>>> if (sk) {
>>>>> sock_hold(sk);
>>>>> - lock_sock(sk);
>>>>> - rcu_read_lock();
>>>>> + sock_map_sk_acquire(sk);
>>>>> sock_map_unref(sk, psk);
>>>>> - rcu_read_unlock();
>>>>> - release_sock(sk);
>>>>> + sock_map_sk_release(sk);
>>>>> sock_put(sk);
>>>>> }
>>>>> }
>>>>> @@ -1176,11 +1174,9 @@ static void sock_hash_free(struct bpf_map *map)
>>>>> */
>>>>> hlist_for_each_entry_safe(elem, node, &unlink_list, node) {
>>>>> hlist_del(&elem->node);
>>>>> - lock_sock(elem->sk);
>>>>> - rcu_read_lock();
>>>>> + sock_map_sk_acquire(elem->sk);
>>>>> sock_map_unref(elem->sk, elem);
>>>>> - rcu_read_unlock();
>>>>> - release_sock(elem->sk);
>>>>> + sock_map_sk_release(elem->sk);
>>>>> sock_put(elem->sk);
>>>>> sock_hash_free_elem(htab, elem);
>>>>> }
>>>>> @@ -1676,8 +1672,7 @@ void sock_map_close(struct sock *sk, long timeout)
>>>>> void (*saved_close)(struct sock *sk, long timeout);
>>>>> struct sk_psock *psock;
>>>>>
>>>>> - lock_sock(sk);
>>>>> - rcu_read_lock();
>>>>> + sock_map_sk_acquire(sk);
>>>>> psock = sk_psock(sk);
>>>>> if (likely(psock)) {
>>>>> saved_close = psock->saved_close;
>>>>> @@ -1685,16 +1680,14 @@ void sock_map_close(struct sock *sk, long timeout)
>>>>> psock = sk_psock_get(sk);
>>>>> if (unlikely(!psock))
>>>>> goto no_psock;
>>>>> - rcu_read_unlock();
>>>>> sk_psock_stop(psock);
>>>>> - release_sock(sk);
>>>>> + sock_map_sk_release(sk);
>>>>
>>>> I think sk_psock_stop() was intentionally put outside
>>>> of rcu_read_lock() to not extend the grace period
>>>> unnecessarily. e.g. while + __sk_msg_free().
>>>>
>>>> Maybe add __sock_map_sk_release() without
>>>> rcu_read_unlock() ?
>>>
>>> How about dropping this patch completely? The more I stare at it, I see no
>>> reason why af_unix state lock would matter in any of these places.
>>
>> I agree. Actually, once it's held, it can be released right away.
>> The lock is only to ensure that peer is set after checking
>> TCP_ESTABLISHED, but it continues holding unix_state_lock()
>> unnecessarily long.
>>
>> Honestly I prefer Martin's idea, using unix_peer_get() in
>> unix_stream_bpf_update_proto().
>
> Pondering again, I'm leaning towards my initial approach,
> just null check for peer, which allows bpf_iter to acquire
> unix_state_lock() and make sure the socket is alive.
> (still lock_sock() is needed for bpf_setsockopt())
>
> The check is lightweight and SOCKMAP does not need to
> hold the lock unnecessarily, and we can provide stable
> result to bpf_iter.

Right, unix_state_lock() would keep the unix sock state stable. But is
keeping the unix state stable that important in the context of bpf iter?

> IOW, if we hold unix_state_lock() in the SOCKMAP path
> (even unix_peer_get()), we cannot use unix_state_lock()
> for bpf_iter and lose stability since it will trigger dead lock.

Assuming the answer to the question above is "not really", I'd say taking
unix_state_lock() during sockmap update (both bpf and non-bpf context)
makes sense for 2 reasons:

1. Fulfil sockmap's locking expectation as Martin pointed out; once the
   state lock is taken, there is no need to (additionally) protect against
   null-ptr-deref during sock_map_update_elem_sys().
2. Slightly better handle the case of a unix sock changing state
   unexpectedly _while_ it is being placed in a sockmap by
   sock_map_update_elem(). At least for now; in the follow-up series we
   may be able to get rid of (or simplify?)
   sock_map_sk_{acquire,release}_fast().

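To make point 1 concrete, here is a minimal userspace sketch of the
invariant. It is not kernel code: a pthread mutex stands in for
unix_state_lock(), and every model_* name is made up for illustration. The
connect side is assumed to flip the state before setting the peer, all
within one critical section, which is my reading of the af_unix connect
path. An updater that checks the state under the same lock can then never
observe "established" with a NULL peer.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for sk_state values. */
enum model_state { MODEL_CLOSE, MODEL_ESTABLISHED };

struct model_sock {
	pthread_mutex_t state_lock;	/* stand-in for unix_state_lock() */
	enum model_state state;
	struct model_sock *peer;
};

/* Connect path: state and peer both change inside one critical section. */
static void model_connect(struct model_sock *sk, struct model_sock *other)
{
	pthread_mutex_lock(&sk->state_lock);
	sk->state = MODEL_ESTABLISHED;
	/* A lockless reader could see ESTABLISHED with peer == NULL here. */
	sk->peer = other;
	pthread_mutex_unlock(&sk->state_lock);
}

/*
 * Update path: returns -1 (think -EOPNOTSUPP) unless established.
 * On success *peer needs no extra NULL check, because the state was
 * tested under the lock that the connect path holds.
 */
static int model_update(struct model_sock *sk, struct model_sock **peer)
{
	int ret = -1;

	pthread_mutex_lock(&sk->state_lock);
	if (sk->state == MODEL_ESTABLISHED) {
		*peer = sk->peer;
		ret = 0;
	}
	pthread_mutex_unlock(&sk->state_lock);

	return ret;
}
```

The flip side is the one Kuniyuki raised: whoever calls the update path
must not already hold the same lock.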

So, what I'm saying is:
- drop the changes touching sock_map_free(), sock_hash_free() and
  sock_map_close() (this patch),
- stick to the lock_sock()-only unix iter,
- for non-bpf-context update, take the unix state lock (in series with
  lock_sock(), per your suggestion), which tackles the null-ptr-deref,
- for bpf-context update, mirror what currently happens for non-af_unix
  socks, which take the bh_lock_sock() spinlock: take the unix state
  spinlock.

This would boil down to:

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 02a68be3002a..61d782db614c 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -12,6 +12,7 @@
 #include <linux/list.h>
 #include <linux/jhash.h>
 #include <linux/sock_diag.h>
+#include <net/af_unix.h>
 #include <net/udp.h>

 struct bpf_stab {
@@ -115,19 +116,45 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
 }

 static void sock_map_sk_acquire(struct sock *sk)
-	__acquires(&sk->sk_lock.slock)
 {
 	lock_sock(sk);
+
+	if (sk_is_unix(sk))
+		unix_state_lock(sk);
+
 	rcu_read_lock();
 }

 static void sock_map_sk_release(struct sock *sk)
-	__releases(&sk->sk_lock.slock)
 {
 	rcu_read_unlock();
+
+	if (sk_is_unix(sk))
+		unix_state_unlock(sk);
+
 	release_sock(sk);
 }

+static void sock_map_sk_acquire_fast(struct sock *sk)
+{
+	if (sk_is_unix(sk)) {
+		unix_state_lock(sk);
+	} else {
+		local_bh_disable();
+		bh_lock_sock(sk);
+	}
+}
+
+static void sock_map_sk_release_fast(struct sock *sk)
+{
+	if (sk_is_unix(sk)) {
+		unix_state_unlock(sk);
+	} else {
+		bh_unlock_sock(sk);
+		local_bh_enable();
+	}
+}
+
 static void sock_map_add_link(struct sk_psock *psock,
 			      struct sk_psock_link *link,
 			      struct bpf_map *map, void *link_raw)
@@ -606,16 +633,14 @@ static long sock_map_update_elem(struct bpf_map *map, void *key,
 	if (!sock_map_sk_is_suitable(sk))
 		return -EOPNOTSUPP;

-	local_bh_disable();
-	bh_lock_sock(sk);
+	sock_map_sk_acquire_fast(sk);
 	if (!sock_map_sk_state_allowed(sk))
 		ret = -EOPNOTSUPP;
 	else if (map->map_type == BPF_MAP_TYPE_SOCKMAP)
 		ret = sock_map_update_common(map, *(u32 *)key, sk, flags);
 	else
 		ret = sock_hash_update_common(map, key, sk, flags);
-	bh_unlock_sock(sk);
-	local_bh_enable();
+	sock_map_sk_release_fast(sk);

 	return ret;
 }
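
For anyone who wants to sanity-check the nesting in the helpers above
without booting a kernel, here is a rough userspace model. It takes no
real locks: it only records which primitive each helper would take and
checks that releases happen in exact reverse order. The model_* names and
the OP_* enum are hypothetical, and is_unix stands in for sk_is_unix(sk).

```c
#include <stdbool.h>

/* One entry per primitive the helpers in the diff touch. */
enum model_op { OP_LOCK_SOCK, OP_STATE_LOCK, OP_RCU, OP_BH_DISABLE, OP_BH_LOCK };

struct model_trace {
	enum model_op held[8];	/* stack of currently held primitives */
	int depth;
	bool ok;		/* cleared on overflow or out-of-order release */
};

static void model_take(struct model_trace *t, enum model_op op)
{
	if (t->depth < 8)
		t->held[t->depth++] = op;
	else
		t->ok = false;
}

/* Only the most recently taken primitive may be dropped. */
static void model_drop(struct model_trace *t, enum model_op op)
{
	if (t->depth > 0 && t->held[t->depth - 1] == op)
		t->depth--;
	else
		t->ok = false;
}

/* Mirrors sock_map_sk_acquire(): lock_sock, then state lock, then RCU. */
static void model_acquire(struct model_trace *t, bool is_unix)
{
	model_take(t, OP_LOCK_SOCK);
	if (is_unix)
		model_take(t, OP_STATE_LOCK);
	model_take(t, OP_RCU);
}

/* Mirrors sock_map_sk_release(): the exact reverse of the above. */
static void model_release(struct model_trace *t, bool is_unix)
{
	model_drop(t, OP_RCU);
	if (is_unix)
		model_drop(t, OP_STATE_LOCK);
	model_drop(t, OP_LOCK_SOCK);
}

/* Mirrors sock_map_sk_acquire_fast(): one spinlock flavour per family. */
static void model_acquire_fast(struct model_trace *t, bool is_unix)
{
	if (is_unix) {
		model_take(t, OP_STATE_LOCK);
	} else {
		model_take(t, OP_BH_DISABLE);
		model_take(t, OP_BH_LOCK);
	}
}

/* Mirrors sock_map_sk_release_fast(). */
static void model_release_fast(struct model_trace *t, bool is_unix)
{
	if (is_unix) {
		model_drop(t, OP_STATE_LOCK);
	} else {
		model_drop(t, OP_BH_LOCK);
		model_drop(t, OP_BH_DISABLE);
	}
}
```

Running acquire followed by release for either family should leave depth
at 0 with ok still set; a mismatched pair clears ok.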