From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: xiyou.wangcong@gmail.com, john.fastabend@gmail.com, jakub@cloudflare.com
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, horms@kernel.org, andrii@kernel.org,
eddyz87@gmail.com, mykolal@fb.com, ast@kernel.org,
daniel@iogearbox.net, martin.lau@linux.dev, song@kernel.org,
yonghong.song@linux.dev, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, shuah@kernel.org,
mhal@rbox.co, jiayuan.chen@linux.dev, sgarzare@redhat.com,
netdev@vger.kernel.org, bpf@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: [PATCH bpf-next v3 1/3] bpf, sockmap: avoid using sk_socket after free when sending
Date: Mon, 17 Mar 2025 17:22:54 +0800
Message-ID: <20250317092257.68760-2-jiayuan.chen@linux.dev>
In-Reply-To: <20250317092257.68760-1-jiayuan.chen@linux.dev>
sk->sk_socket is neither locked nor referenced, so a call to
skb_send_sock() can race with the release of sk_socket.
All socket types (TCP/UDP/Unix/vsock) are affected.
Race conditions:
'''
CPU0                                CPU1

skb_send_sock
  sendmsg_unlocked
    sock_sendmsg
      sock_sendmsg_nosec
                                    close(fd):
                                      ...
                                      ops->release()
                                        sock_map_close()
                                      sk_socket->ops = NULL
                                      free(socket)
      sock->ops->sendmsg
            ^
            panic here
'''
Since sock_map_close() already waits for the workqueue to finish
whenever psock is held, we simply take a psock reference in the backlog
work to avoid the race.
'''
void sock_map_close()
{
    ...
    if (likely(psock)) {
        ...
        psock = sk_psock_get(sk);
        if (unlikely(!psock))
            goto no_psock; <=== Control usually jumps here via goto
        ...
        cancel_delayed_work_sync(&psock->work); <=== not executed
        sk_psock_put(sk, psock);
        ...
    }
}
'''
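The get/put pairing that the fix relies on can be illustrated with a
small user-space sketch. This is not kernel code: it assumes C11 atomics
as a stand-in for the kernel's reference counting, and the names
struct obj, obj_get(), obj_put() and backlog_work() are hypothetical.
obj_get() mimics the "increment unless already zero" behaviour that
sk_psock_get() provides, so a worker that fails to take a reference
knows teardown has already dropped the last one and must not send.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_psock; not kernel code. */
struct obj {
	atomic_int refcnt;
};

/* Take a reference only if the count is still non-zero. */
static bool obj_get(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);	/* catch refcount underflow in the sketch */
	return old == 1;
}

/* Worker shaped like the patched backlog function: bail out early
 * when the last reference is already gone, otherwise hold one for
 * the duration of the work.
 */
static int backlog_work(struct obj *o)
{
	if (!obj_get(o))
		return -1;	/* owner already torn down, do not send */

	/* ... sending would happen here while the reference is held ... */

	obj_put(o);
	return 0;
}
```

While the worker holds its reference, a closer that also manages to take
one can safely wait for the work to finish before the underlying object
goes away; once the count has reached zero, the worker returns without
touching anything.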
The panic I caught:
'''
Workqueue: events sk_psock_backlog
RIP: 0010:sock_sendmsg+0x21d/0x440
RAX: 0000000000000000 RBX: ffffc9000521fad8 RCX: 0000000000000001
...
Call Trace:
<TASK>
? die_addr+0x40/0xa0
? exc_general_protection+0x14c/0x230
? asm_exc_general_protection+0x26/0x30
? sock_sendmsg+0x21d/0x440
? sock_sendmsg+0x3e0/0x440
? __pfx_sock_sendmsg+0x10/0x10
__skb_send_sock+0x543/0xb70
sk_psock_backlog+0x247/0xb80
...
'''
Reported-by: Michal Luczaj <mhal@rbox.co>
Fixes: 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
Some approaches I tried:
1. Add RCU:
- RCU conflicts with mutex_lock in the Unix socket send path.
- Race conditions still exist when reading sk->sk_socket->ops in the
current sock_sendmsg implementation.
2. Increase the reference count of sk_socket->file:
- If the user calls close(fd), nothing happens because the reference
count has not dropped to 0, which is unexpected.
3. Use sock_lock when calling skb_send_sock:
- skb_send_sock itself already does the locking.
- If we call skb_send_sock_locked instead, we have to implement
sendmsg_locked for each protocol, which is not easy for UDP or Unix,
as the sending process involves frequent locking and unlocking, which
makes it challenging to isolate the locking logic.
---
net/core/skmsg.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 0ddc4c718833..6101c1bb279a 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -655,6 +655,14 @@ static void sk_psock_backlog(struct work_struct *work)
bool ingress;
int ret;
+ /* Increment the psock refcnt to synchronize with close(fd) path in
+ * sock_map_close(), ensuring we wait for backlog thread completion
+ * before sk_socket freed. If refcnt increment fails, it indicates
+ * sock_map_close() completed with sk_socket potentially already freed.
+ */
+ if (!sk_psock_get(psock->sk))
+ return;
+
mutex_lock(&psock->work_mutex);
if (unlikely(state->len)) {
len = state->len;
@@ -702,6 +710,7 @@ static void sk_psock_backlog(struct work_struct *work)
}
end:
mutex_unlock(&psock->work_mutex);
+ sk_psock_put(psock->sk, psock);
}
struct sk_psock *sk_psock_init(struct sock *sk, int node)
--
2.47.1