* [PATCH bpf] bpf, sockmap: take the socket lock in udp_bpf_recvmsg()
@ 2026-06-10 17:59 Sechang Lim
2026-06-10 18:12 ` sashiko-bot
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Sechang Lim @ 2026-06-10 17:59 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Cong Wang, Alexei Starovoitov, netdev, bpf,
linux-kernel
udp_bpf_recvmsg() drains the psock ingress_msg queue through
sk_msg_recvmsg() without holding the socket lock. __sk_msg_recvmsg()
walks the head sk_msg's scatterlist and updates each entry in place
(sge->length -= copy, sge->offset += copy), drops the page once an entry
is consumed, and frees the sk_msg when it is drained. ingress_lock only
serializes the queue list operations, not this per-entry consume;
copy_page_to_iter() can fault, so the loop cannot hold a spinlock.
Two threads calling recvmsg() on the same UDP socket then consume the
same sk_msg at once:
CPU0 CPU1
sk_msg_recvmsg sk_msg_recvmsg
copy = sge->length copy = sge->length
copy_page_to_iter(copy) copy_page_to_iter(copy)
sge->length -= copy
/* sge->length == 0 */
sge->length -= copy
/* underflows to ~0U */
The next pass loads the underflowed length into a signed int, the value
is negative so the copied + copy > len clamp is skipped, and
copy_page_to_iter() is handed ~2^64 as its size_t bytes, which trips
page_copy_sane(). The same race can put_page() one page twice and free
the sk_msg while the other reader still walks it.
WARNING: CPU: 0 PID: 5484 at lib/iov_iter.c:356 copy_page_to_iter
WARNING: CPU: 1 PID: 5485 at lib/iov_iter.c:356 copy_page_to_iter
CPU: 1 UID: 0 PID: 5485 Comm: syz.6.1204 Not tainted 7.1.0-rc7 #1
RIP: 0010:copy_page_to_iter+0xa9/0x270
Call Trace:
<TASK>
__sk_msg_recvmsg+0x384/0x9a0
udp_bpf_recvmsg+0x1a0/0x6f0
sock_recvmsg+0x124/0x150
__sys_recvfrom+0x1c2/0x2b0
__x64_sys_recvfrom+0x7b/0x90
do_syscall_64+0x115/0x6d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
tcp_bpf_recvmsg_parser() already serializes this with lock_sock() around
__sk_msg_recvmsg(); udp_bpf_recvmsg() does not. Take the socket lock
around the sk_msg_recvmsg() consume so a given ingress sk_msg is drained
by one reader at a time. The wait path is left unlocked, as
udp_msg_wait_data() sleeps without dropping the socket lock.
Fixes: 1f5be6b3b063 ("udp: Implement udp_bpf_recvmsg() for sockmap")
Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
---
net/ipv4/udp_bpf.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
index 9f33b07b1481..8dd73e9ee1ea 100644
--- a/net/ipv4/udp_bpf.c
+++ b/net/ipv4/udp_bpf.c
@@ -80,7 +80,9 @@ static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
}
msg_bytes_ready:
+ lock_sock(sk);
copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
+ release_sock(sk);
if (!copied) {
long timeo;
int data;
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: take the socket lock in udp_bpf_recvmsg()
2026-06-10 17:59 [PATCH bpf] bpf, sockmap: take the socket lock in udp_bpf_recvmsg() Sechang Lim
@ 2026-06-10 18:12 ` sashiko-bot
2026-06-10 18:35 ` bot+bpf-ci
2026-06-11 21:19 ` Cong Wang
2 siblings, 0 replies; 4+ messages in thread
From: sashiko-bot @ 2026-06-10 18:12 UTC (permalink / raw)
To: Sechang Lim; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
Pre-existing issues:
- [High] `udp_bpf_recvmsg` returns `-EAGAIN` instead of `-EINTR` or `-ERESTARTSYS` on signal interruption for blocking sockets.
- [High] `udp_bpf_recvmsg` enters an infinite busy-wait loop if data is queued in `psock->ingress_skb` but not yet processed into `psock->ingress_msg`.
- [High] Lost wakeups and deadlocks for BPF-redirected packets when the UDP socket is initially empty.
- [High] `vsock_bpf_recvmsg` sleeps while holding the socket lock, causing a deadlock with incoming packet delivery.
--
commit 093cedb7b7f9c8e7af25b932fdeb58d606fcc2ea
Author: Sechang Lim <rhkrqnwk98@gmail.com>
bpf, sockmap: take the socket lock in udp_bpf_recvmsg()
udp_bpf_recvmsg() drains the psock ingress_msg queue through
sk_msg_recvmsg() without holding the socket lock. This commit
adds lock_sock() and release_sock() around the consume path
so a given ingress sk_msg is drained by one reader at a time.
> diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
> index 9f33b07b14813..8dd73e9ee1ea6 100644
> --- a/net/ipv4/udp_bpf.c
> +++ b/net/ipv4/udp_bpf.c
[Severity: High]
This isn't a bug introduced by this patch, but can the unconditional
delegation to sk_udp_recvmsg() in udp_bpf_recvmsg() cause lost wakeups
and deadlocks for BPF-redirected packets when the UDP socket is
initially empty?
If !psock_has_data(psock) is initially true, udp_bpf_recvmsg() does:
if (!psock_has_data(psock)) {
ret = sk_udp_recvmsg(sk, msg, len, flags);
goto out;
}
sk_udp_recvmsg() will sleep on the native socket wait queue checking
only sk->sk_receive_queue.
If a packet arrives and is redirected by a BPF program to
psock->ingress_msg, the wait queue is woken up, but because
sk_receive_queue is still empty, the native UDP wait loop
(__skb_wait_for_more_packets) will immediately go back to sleep.
Does this leave the packet stuck in psock and the user thread hanging
permanently?
> @@ -80,7 +80,9 @@ static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> }
>
> msg_bytes_ready:
> + lock_sock(sk);
> copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
> + release_sock(sk);
[Severity: High]
This is a pre-existing issue, but can this code enter an infinite
busy-wait loop if data is queued in psock->ingress_skb but not yet
processed into psock->ingress_msg?
If psock->ingress_skb is not empty, psock_has_data() evaluates to true.
Then udp_bpf_recvmsg() calls sk_msg_recvmsg(), which only reads from
ingress_msg, returning 0.
Because psock_has_data() is true, udp_msg_wait_data() returns 1
without sleeping. The loop sees data == 1 and psock_has_data() == true
and jumps back to msg_bytes_ready.
Does this create a tight spin loop that rapidly acquires and releases
lock_sock, causing CPU starvation until the backlog workqueue thread
finishes moving the skb?
> if (!copied) {
> long timeo;
> int data;
[Severity: High]
This is a pre-existing issue, but does udp_bpf_recvmsg() violate POSIX
semantics by returning -EAGAIN instead of -EINTR or -ERESTARTSYS on
signal interruption for blocking sockets?
If a blocking recvmsg is interrupted by a signal, wait_woken() inside
udp_msg_wait_data() returns prematurely, causing udp_msg_wait_data()
to return 0.
The code then unconditionally sets copied = -EAGAIN when data is 0:
copied = -EAGAIN;
Does this completely ignore signal_pending(current)?
Additionally, there is a pre-existing issue in net/vmw_vsock/vsock_bpf.c:
[Severity: High]
This isn't a bug introduced by this patch, but does vsock_bpf_recvmsg()
sleep while holding the socket lock, causing a deadlock with incoming
packet delivery?
In vsock_bpf_recvmsg(), the code acquires lock_sock(sk) and calls
vsock_msg_wait_data() when no data is available:
vsock_bpf_recvmsg()
lock_sock(sk);
...
while (copied == 0) {
long timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
if (!vsock_msg_wait_data(sk, psock, timeo)) {
This function calls wait_woken() without releasing the socket lock:
vsock_msg_wait_data()
wait_woken(&wait, TASK_INTERRUPTIBLE, timeo);
Meanwhile, the Rx thread (virtio_transport_recv_pkt()) needs to acquire
lock_sock(sk) to deliver incoming packets and trigger sk_data_ready().
Because the sleeping user task holds the lock, does the Rx thread block
indefinitely, resulting in a deadlock where the user task is never
woken up?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260610175922.1103871-1-rhkrqnwk98@gmail.com?part=1
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: take the socket lock in udp_bpf_recvmsg()
2026-06-10 17:59 [PATCH bpf] bpf, sockmap: take the socket lock in udp_bpf_recvmsg() Sechang Lim
2026-06-10 18:12 ` sashiko-bot
@ 2026-06-10 18:35 ` bot+bpf-ci
2026-06-11 21:19 ` Cong Wang
2 siblings, 0 replies; 4+ messages in thread
From: bot+bpf-ci @ 2026-06-10 18:35 UTC (permalink / raw)
To: rhkrqnwk98, john.fastabend, jakub, davem, edumazet, kuba, pabeni
Cc: horms, cong.wang, ast, netdev, bpf, linux-kernel, ast, andrii,
daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 2619 bytes --]
> udp_bpf_recvmsg() drains the psock ingress_msg queue through
> sk_msg_recvmsg() without holding the socket lock. __sk_msg_recvmsg()
> walks the head sk_msg's scatterlist and updates each entry in place
> (sge->length -= copy, sge->offset += copy), drops the page once an entry
> is consumed, and frees the sk_msg when it is drained. ingress_lock only
> serializes the queue list operations, not this per-entry consume;
> copy_page_to_iter() can fault, so the loop cannot hold a spinlock.
>
> Two threads calling recvmsg() on the same UDP socket then consume the
> same sk_msg at once:
>
> CPU0 CPU1
>
> sk_msg_recvmsg sk_msg_recvmsg
> copy = sge->length copy = sge->length
> copy_page_to_iter(copy) copy_page_to_iter(copy)
> sge->length -= copy
> /* sge->length == 0 */
> sge->length -= copy
> /* underflows to ~0U */
>
> The next pass loads the underflowed length into a signed int, the value
> is negative so the copied + copy > len clamp is skipped, and
> copy_page_to_iter() is handed ~2^64 as its size_t bytes, which trips
> page_copy_sane(). The same race can put_page() one page twice and free
> the sk_msg while the other reader still walks it.
>
> tcp_bpf_recvmsg_parser() already serializes this with lock_sock() around
> __sk_msg_recvmsg(); udp_bpf_recvmsg() does not. Take the socket lock
> around the sk_msg_recvmsg() consume so a given ingress sk_msg is drained
> by one reader at a time. The wait path is left unlocked, as
> udp_msg_wait_data() sleeps without dropping the socket lock.
>
> Fixes: 1f5be6b3b063 ("udp: Implement udp_bpf_recvmsg() for sockmap")
Is the Fixes: tag pointing at the right commit?
Commit 1f5be6b3b063 ("udp: Implement udp_bpf_recvmsg() for sockmap")
introduced udp_bpf_recvmsg() with a lock_sock(sk)/release_sock(sk) pair
wrapping the whole function, including the sk_msg_recvmsg() consume, so the
race described here was not present at that point.
The unlocked consume appears to have been introduced later by
9f2470fbc4cb ("skmsg: Improve udp_bpf_recvmsg() accuracy"), whose diff
removes both lock_sock(sk) and release_sock(sk) from udp_bpf_recvmsg().
Should the tag instead be:
Fixes: 9f2470fbc4cb ("skmsg: Improve udp_bpf_recvmsg() accuracy")
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27296268347
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH bpf] bpf, sockmap: take the socket lock in udp_bpf_recvmsg()
2026-06-10 17:59 [PATCH bpf] bpf, sockmap: take the socket lock in udp_bpf_recvmsg() Sechang Lim
2026-06-10 18:12 ` sashiko-bot
2026-06-10 18:35 ` bot+bpf-ci
@ 2026-06-11 21:19 ` Cong Wang
2 siblings, 0 replies; 4+ messages in thread
From: Cong Wang @ 2026-06-11 21:19 UTC (permalink / raw)
To: Sechang Lim
Cc: John Fastabend, Jakub Sitnicki, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Cong Wang,
Alexei Starovoitov, netdev, bpf, linux-kernel
On Wed, Jun 10, 2026 at 05:59:17PM +0000, Sechang Lim wrote:
> diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
> index 9f33b07b1481..8dd73e9ee1ea 100644
> --- a/net/ipv4/udp_bpf.c
> +++ b/net/ipv4/udp_bpf.c
> @@ -80,7 +80,9 @@ static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> }
>
> msg_bytes_ready:
> + lock_sock(sk);
> copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
> + release_sock(sk);
> if (!copied) {
> long timeo;
> int data;
It seems Kuniyuki's fix covers more than yours:
https://lore.kernel.org/bpf/20260221233234.3814768-4-kuniyu@google.com/
Regards,
Cong
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-06-11 21:19 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-10 17:59 [PATCH bpf] bpf, sockmap: take the socket lock in udp_bpf_recvmsg() Sechang Lim
2026-06-10 18:12 ` sashiko-bot
2026-06-10 18:35 ` bot+bpf-ci
2026-06-11 21:19 ` Cong Wang
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.