Re: [RFC PATCH bpf-next 2/5] tcp_bpf: busy-poll the splice ring before parking the receiver

BPF List
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Cong Wang" <xiyou.wangcong@gmail.com>
Cc: bpf@vger.kernel.org
Subject: Re: [RFC PATCH bpf-next 2/5] tcp_bpf: busy-poll the splice ring before parking the receiver
Date: Fri, 12 Jun 2026 03:29:48 +0000	[thread overview]
Message-ID: <20260612032949.68A3F1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260612011452.134466-3-xiyou.wangcong@gmail.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Missing `need_resched()` check in `splice_busy_loop()` allows an unprivileged user to monopolize a CPU, causing soft lockups and denial of service.
- [High] `tcp_bpf_splice_recvmsg` incorrectly returns `-EINTR` instead of `-ERESTARTSYS` for blocking sockets when interrupted by a signal, breaking `SA_RESTART` semantics.
- [High] Providing an invalid memory buffer to `recvmsg` on a spliced socket causes an uninterruptible infinite loop in kernel space, leading to Denial of Service.
--

commit 8f0193a527ddda6d7d50681be8a081f8bc267f16
Author: Cong Wang <xiyou.wangcong@gmail.com>

tcp_bpf: busy-poll the splice ring before parking the receiver

This commit adds an optional bounded busy-poll of the ring before parking,
reusing the socket's SO_BUSY_POLL budget via sk_can_busy_loop() and
sk_busy_loop_timeout(). It spins directly on the in-kernel ring without
polling a NAPI instance.

> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 549f370772446..9c4421a74225b 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c

[ ... ]

> @@ -1255,6 +1256,33 @@ static long splice_recv_wait(struct sock *sk, struct sk_psock_splice *s,
>  					splice_recv_ready(sk, s), timeo);
>  }
>  
> +/* Bounded busy-poll on the ring before parking the receiver. Reuses the
> + * socket's SO_BUSY_POLL budget (sk_ll_usec) via sk_can_busy_loop() and
> + * sk_busy_loop_timeout(); the default budget of 0 makes sk_can_busy_loop()
> + * false so this is a no-op unless the application (or net.core.busy_read)
> + * opted in.
> + *
> + * Unlike sk_busy_loop() / napi_busy_loop(), this spins on the in-kernel
> + * ring directly rather than polling a NAPI instance, so it is effective on
> + * loopback - which delivers via the per-CPU backlog and exposes no
> + * pollable napi_id. Keeping the receiver hot lets a synchronous sender's
> + * small writes accumulate in the ring without a wakeup per message.
> + */
> +static void splice_busy_loop(struct sock *sk, struct sk_psock_splice *s)
> +{
> +	unsigned long start;
> +
> +	if (!sk_can_busy_loop(sk))
> +		return;
> +
> +	start = busy_loop_current_time();
> +	do {
> +		cpu_relax();
> +		if (splice_recv_ready(sk, s) || signal_pending(current))
> +			return;
> +	} while (!sk_busy_loop_timeout(sk, start));

[Severity: High]
Does this loop need a need_resched() check or a call to cond_resched()?

Because an unprivileged user can set SO_BUSY_POLL to an extremely large
value (up to INT_MAX microseconds), triggering this path via recvmsg()
could cause the thread to spin on the CPU without yielding.

On non-preemptible kernels, this might completely starve other tasks,
stall RCU grace periods, and trigger watchdog soft lockups.

> +}
> +
>  /* prot->sock_is_readable for paired-splice sockets. tcp_stream_is_readable()

[ ... ]

> @@ -1349,6 +1377,16 @@ static int tcp_bpf_splice_recvmsg(struct sock *sk,
>  			return 0;
>  		}
>  
> +		/* Spin on the ring for the SO_BUSY_POLL budget before
> +		 * sleeping. If the spin observes data, re-read from the
> +		 * loop head; otherwise (budget expired or a terminal
> +		 * condition) proceed to park - splice_recv_wait() returns
> +		 * immediately for terminal conditions.
> +		 */
> +		splice_busy_loop(sk, s);
> +		if (splice_ring_has_data(s))
> +			continue;

[Severity: High]
Could this result in an infinite loop if a user provides an invalid
buffer to recvmsg?

If the user buffer is unmapped memory, copy_to_iter() in splice_ring_read()
called earlier in the loop might safely fault and return 0. If
splice_ring_read() returns 0, the ring's tail pointer isn't advanced, and
the data remains in the ring buffer.

tcp_bpf_splice_recvmsg() seems to treat the 0 return as an empty read
rather than an -EFAULT error. When it subsequently checks
splice_ring_has_data() here, it will evaluate to true because the data is
still present.

This triggers the continue statement, which loops back immediately,
potentially creating a tight infinite loop in the kernel.

> +
>  		timeo = splice_recv_wait(sk, s, timeo);

[Severity: High]
Could this cause the system call to incorrectly return -EINTR instead of
-ERESTARTSYS when interrupted by a signal?

When splice_recv_wait() uses wait_event_interruptible_timeout() and is
interrupted, it returns -ERESTARTSYS (-512). This return value directly
overwrites the timeo variable here.

On the next loop iteration, signal_pending(current) would evaluate to true,
and the error code is computed via sock_intr_errno(timeo). Because timeo
is now -512 instead of MAX_SCHEDULE_TIMEOUT, sock_intr_errno() might
incorrectly return -EINTR instead of -ERESTARTSYS, which breaks SA_RESTART
semantics.

>  	}
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260612011452.134466-1-xiyou.wangcong@gmail.com?part=2

next prev parent reply	other threads:[~2026-06-12  3:29 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-12  1:14 [RFC PATCH bpf-next 0/5] tcp: opportunistic loopback splice for BPF-paired sockets Cong Wang
2026-06-12  1:14 ` [RFC PATCH bpf-next 1/5] tcp_bpf: add bpf_sock_splice_pair kfunc for opportunistic loopback splice Cong Wang
2026-06-12  1:33   ` sashiko-bot
2026-06-12  2:10   ` bot+bpf-ci
2026-06-12  1:14 ` [RFC PATCH bpf-next 2/5] tcp_bpf: busy-poll the splice ring before parking the receiver Cong Wang
2026-06-12  3:29   ` sashiko-bot [this message]
2026-06-12  1:14 ` [RFC PATCH bpf-next 3/5] selftests/bpf: add tcp_splice basic round-trip test Cong Wang
2026-06-12  1:28   ` sashiko-bot
2026-06-12  1:14 ` [RFC PATCH bpf-next 4/5] bpf: allow SO_BUSY_POLL in bpf_setsockopt() Cong Wang
2026-06-12  1:14 ` [RFC PATCH bpf-next 5/5] selftests/bpf: set SO_BUSY_POLL from the tcp_splice sockops prog Cong Wang
2026-06-12  1:26   ` sashiko-bot
2026-06-12 16:01 ` [RFC PATCH bpf-next 0/5] tcp: opportunistic loopback splice for BPF-paired sockets Alexei Starovoitov
2026-06-12 18:12   ` Cong Wang
2026-06-12 18:34     ` Alexei Starovoitov
2026-06-12 20:17       ` Cong Wang
2026-06-13 17:57   ` Jakub Kicinski
2026-06-13 21:25     ` Cong Wang
2026-06-12 22:10 ` [syzbot ci] " syzbot ci

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260612032949.68A3F1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox