From: sashiko-bot@kernel.org
To: "Cong Wang" <xiyou.wangcong@gmail.com>
Cc: bpf@vger.kernel.org
Subject: Re: [RFC PATCH bpf-next 2/5] tcp_bpf: busy-poll the splice ring before parking the receiver
Date: Fri, 12 Jun 2026 03:29:48 +0000 [thread overview]
Message-ID: <20260612032949.68A3F1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260612011452.134466-3-xiyou.wangcong@gmail.com>
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Missing `need_resched()` check in `splice_busy_loop()` allows an unprivileged user to monopolize a CPU, causing soft lockups and denial of service.
- [High] `tcp_bpf_splice_recvmsg` incorrectly returns `-EINTR` instead of `-ERESTARTSYS` for blocking sockets when interrupted by a signal, breaking `SA_RESTART` semantics.
- [High] Providing an invalid memory buffer to `recvmsg` on a spliced socket causes an uninterruptible infinite loop in kernel space, leading to Denial of Service.
--
commit 8f0193a527ddda6d7d50681be8a081f8bc267f16
Author: Cong Wang <xiyou.wangcong@gmail.com>
tcp_bpf: busy-poll the splice ring before parking the receiver
This commit adds an optional bounded busy-poll of the ring before parking,
reusing the socket's SO_BUSY_POLL budget via sk_can_busy_loop() and
sk_busy_loop_timeout(). It spins directly on the in-kernel ring without
polling a NAPI instance.
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 549f370772446..9c4421a74225b 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
[ ... ]
> @@ -1255,6 +1256,33 @@ static long splice_recv_wait(struct sock *sk, struct sk_psock_splice *s,
> splice_recv_ready(sk, s), timeo);
> }
>
> +/* Bounded busy-poll on the ring before parking the receiver. Reuses the
> + * socket's SO_BUSY_POLL budget (sk_ll_usec) via sk_can_busy_loop() and
> + * sk_busy_loop_timeout(); the default budget of 0 makes sk_can_busy_loop()
> + * false so this is a no-op unless the application (or net.core.busy_read)
> + * opted in.
> + *
> + * Unlike sk_busy_loop() / napi_busy_loop(), this spins on the in-kernel
> + * ring directly rather than polling a NAPI instance, so it is effective on
> + * loopback - which delivers via the per-CPU backlog and exposes no
> + * pollable napi_id. Keeping the receiver hot lets a synchronous sender's
> + * small writes accumulate in the ring without a wakeup per message.
> + */
> +static void splice_busy_loop(struct sock *sk, struct sk_psock_splice *s)
> +{
> + unsigned long start;
> +
> + if (!sk_can_busy_loop(sk))
> + return;
> +
> + start = busy_loop_current_time();
> + do {
> + cpu_relax();
> + if (splice_recv_ready(sk, s) || signal_pending(current))
> + return;
> + } while (!sk_busy_loop_timeout(sk, start));
[Severity: High]
Does this loop need a need_resched() check or a call to cond_resched()?
Because an unprivileged user can set SO_BUSY_POLL to an extremely large
value (up to INT_MAX microseconds), triggering this path via recvmsg()
could cause the thread to spin on the CPU without yielding.
On non-preemptible kernels, this might completely starve other tasks,
stall RCU grace periods, and trigger watchdog soft lockups.
> +}
> +
> /* prot->sock_is_readable for paired-splice sockets. tcp_stream_is_readable()
[ ... ]
> @@ -1349,6 +1377,16 @@ static int tcp_bpf_splice_recvmsg(struct sock *sk,
> return 0;
> }
>
> + /* Spin on the ring for the SO_BUSY_POLL budget before
> + * sleeping. If the spin observes data, re-read from the
> + * loop head; otherwise (budget expired or a terminal
> + * condition) proceed to park - splice_recv_wait() returns
> + * immediately for terminal conditions.
> + */
> + splice_busy_loop(sk, s);
> + if (splice_ring_has_data(s))
> + continue;
[Severity: High]
Could this result in an infinite loop if a user provides an invalid
buffer to recvmsg?
If the user buffer is unmapped memory, copy_to_iter() in splice_ring_read()
called earlier in the loop might safely fault and return 0. If
splice_ring_read() returns 0, the ring's tail pointer isn't advanced, and
the data remains in the ring buffer.
tcp_bpf_splice_recvmsg() seems to treat the 0 return as an empty read
rather than an -EFAULT error. When it subsequently checks
splice_ring_has_data() here, it will evaluate to true because the data is
still present.
This triggers the continue statement, which loops back immediately,
potentially creating a tight infinite loop in the kernel.
> +
> timeo = splice_recv_wait(sk, s, timeo);
[Severity: High]
Could this cause the system call to incorrectly return -EINTR instead of
-ERESTARTSYS when interrupted by a signal?
When splice_recv_wait() uses wait_event_interruptible_timeout() and is
interrupted, it returns -ERESTARTSYS (-512). This return value directly
overwrites the timeo variable here.
On the next loop iteration, signal_pending(current) would evaluate to true,
and the error code is computed via sock_intr_errno(timeo). Because timeo
is now -512 instead of MAX_SCHEDULE_TIMEOUT, sock_intr_errno() might
incorrectly return -EINTR instead of -ERESTARTSYS, which breaks SA_RESTART
semantics.
> }
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260612011452.134466-1-xiyou.wangcong@gmail.com?part=2
next prev parent reply other threads:[~2026-06-12 3:29 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-12 1:14 [RFC PATCH bpf-next 0/5] tcp: opportunistic loopback splice for BPF-paired sockets Cong Wang
2026-06-12 1:14 ` [RFC PATCH bpf-next 1/5] tcp_bpf: add bpf_sock_splice_pair kfunc for opportunistic loopback splice Cong Wang
2026-06-12 1:33 ` sashiko-bot
2026-06-12 2:10 ` bot+bpf-ci
2026-06-12 1:14 ` [RFC PATCH bpf-next 2/5] tcp_bpf: busy-poll the splice ring before parking the receiver Cong Wang
2026-06-12 3:29 ` sashiko-bot [this message]
2026-06-12 1:14 ` [RFC PATCH bpf-next 3/5] selftests/bpf: add tcp_splice basic round-trip test Cong Wang
2026-06-12 1:28 ` sashiko-bot
2026-06-12 1:14 ` [RFC PATCH bpf-next 4/5] bpf: allow SO_BUSY_POLL in bpf_setsockopt() Cong Wang
2026-06-12 1:14 ` [RFC PATCH bpf-next 5/5] selftests/bpf: set SO_BUSY_POLL from the tcp_splice sockops prog Cong Wang
2026-06-12 1:26 ` sashiko-bot
2026-06-12 16:01 ` [RFC PATCH bpf-next 0/5] tcp: opportunistic loopback splice for BPF-paired sockets Alexei Starovoitov
2026-06-12 18:12 ` Cong Wang
2026-06-12 18:34 ` Alexei Starovoitov
2026-06-12 20:17 ` Cong Wang
2026-06-13 17:57 ` Jakub Kicinski
2026-06-13 21:25 ` Cong Wang
2026-06-12 22:10 ` [syzbot ci] " syzbot ci
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260612032949.68A3F1F000E9@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox