Re: [RFC PATCH bpf-next 2/5] tcp_bpf: busy-poll the splice ring before parking the receiver

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Cong Wang" <xiyou.wangcong@gmail.com>
Cc: bpf@vger.kernel.org
Subject: Re: [RFC PATCH bpf-next 2/5] tcp_bpf: busy-poll the splice ring before parking the receiver
Date: Fri, 12 Jun 2026 03:29:48 +0000	[thread overview]
Message-ID: <20260612032949.68A3F1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260612011452.134466-3-xiyou.wangcong@gmail.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Missing `need_resched()` check in `splice_busy_loop()` allows an unprivileged user to monopolize a CPU, causing soft lockups and denial of service.
- [High] `tcp_bpf_splice_recvmsg` incorrectly returns `-EINTR` instead of `-ERESTARTSYS` for blocking sockets when interrupted by a signal, breaking `SA_RESTART` semantics.
- [High] Providing an invalid memory buffer to `recvmsg` on a spliced socket causes an uninterruptible infinite loop in kernel space, leading to Denial of Service.
--

commit 8f0193a527ddda6d7d50681be8a081f8bc267f16
Author: Cong Wang <xiyou.wangcong@gmail.com>

tcp_bpf: busy-poll the splice ring before parking the receiver

This commit adds an optional bounded busy-poll of the ring before parking,
reusing the socket's SO_BUSY_POLL budget via sk_can_busy_loop() and
sk_busy_loop_timeout(). It spins directly on the in-kernel ring without
polling a NAPI instance.

> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 549f370772446..9c4421a74225b 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c

[ ... ]

> @@ -1255,6 +1256,33 @@ static long splice_recv_wait(struct sock *sk, struct sk_psock_splice *s,
>  					splice_recv_ready(sk, s), timeo);
>  }
>  
> +/* Bounded busy-poll on the ring before parking the receiver. Reuses the
> + * socket's SO_BUSY_POLL budget (sk_ll_usec) via sk_can_busy_loop() and
> + * sk_busy_loop_timeout(); the default budget of 0 makes sk_can_busy_loop()
> + * false so this is a no-op unless the application (or net.core.busy_read)
> + * opted in.
> + *
> + * Unlike sk_busy_loop() / napi_busy_loop(), this spins on the in-kernel
> + * ring directly rather than polling a NAPI instance, so it is effective on
> + * loopback - which delivers via the per-CPU backlog and exposes no
> + * pollable napi_id. Keeping the receiver hot lets a synchronous sender's
> + * small writes accumulate in the ring without a wakeup per message.
> + */
> +static void splice_busy_loop(struct sock *sk, struct sk_psock_splice *s)
> +{
> +	unsigned long start;
> +
> +	if (!sk_can_busy_loop(sk))
> +		return;
> +
> +	start = busy_loop_current_time();
> +	do {
> +		cpu_relax();
> +		if (splice_recv_ready(sk, s) || signal_pending(current))
> +			return;
> +	} while (!sk_busy_loop_timeout(sk, start));

[Severity: High]
Does this loop need a need_resched() check or a call to cond_resched()?

Because an unprivileged user can set SO_BUSY_POLL to an extremely large
value (up to INT_MAX microseconds), triggering this path via recvmsg()
could cause the thread to spin on the CPU without yielding.

On non-preemptible kernels, this might completely starve other tasks,
stall RCU grace periods, and trigger watchdog soft lockups.

> +}
> +
>  /* prot->sock_is_readable for paired-splice sockets. tcp_stream_is_readable()

[ ... ]

> @@ -1349,6 +1377,16 @@ static int tcp_bpf_splice_recvmsg(struct sock *sk,
>  			return 0;
>  		}
>  
> +		/* Spin on the ring for the SO_BUSY_POLL budget before
> +		 * sleeping. If the spin observes data, re-read from the
> +		 * loop head; otherwise (budget expired or a terminal
> +		 * condition) proceed to park - splice_recv_wait() returns
> +		 * immediately for terminal conditions.
> +		 */
> +		splice_busy_loop(sk, s);
> +		if (splice_ring_has_data(s))
> +			continue;

[Severity: High]
Could this result in an infinite loop if a user provides an invalid
buffer to recvmsg?

If the user buffer is unmapped memory, copy_to_iter() in splice_ring_read()
called earlier in the loop might safely fault and return 0. If
splice_ring_read() returns 0, the ring's tail pointer isn't advanced, and
the data remains in the ring buffer.

tcp_bpf_splice_recvmsg() seems to treat the 0 return as an empty read
rather than an -EFAULT error. When it subsequently checks
splice_ring_has_data() here, it will evaluate to true because the data is
still present.

This triggers the continue statement, which loops back immediately,
potentially creating a tight infinite loop in the kernel.

> +
>  		timeo = splice_recv_wait(sk, s, timeo);

[Severity: High]
Could this cause the system call to incorrectly return -EINTR instead of
-ERESTARTSYS when interrupted by a signal?

When splice_recv_wait() uses wait_event_interruptible_timeout() and is
interrupted, it returns -ERESTARTSYS (-512). This return value directly
overwrites the timeo variable here.

On the next loop iteration, signal_pending(current) would evaluate to true,
and the error code is computed via sock_intr_errno(timeo). Because timeo
is now -512 instead of MAX_SCHEDULE_TIMEOUT, sock_intr_errno() might
incorrectly return -EINTR instead of -ERESTARTSYS, which breaks SA_RESTART
semantics.

>  	}
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260612011452.134466-1-xiyou.wangcong@gmail.com?part=2

next prev parent reply	other threads:[~2026-06-12  3:29 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-12  1:14 [RFC PATCH bpf-next 0/5] tcp: opportunistic loopback splice for BPF-paired sockets Cong Wang
2026-06-12  1:14 ` [RFC PATCH bpf-next 1/5] tcp_bpf: add bpf_sock_splice_pair kfunc for opportunistic loopback splice Cong Wang
2026-06-12  1:33   ` sashiko-bot
2026-06-12  2:10   ` bot+bpf-ci
2026-06-12  1:14 ` [RFC PATCH bpf-next 2/5] tcp_bpf: busy-poll the splice ring before parking the receiver Cong Wang
2026-06-12  3:29   ` sashiko-bot [this message]
2026-06-12  1:14 ` [RFC PATCH bpf-next 3/5] selftests/bpf: add tcp_splice basic round-trip test Cong Wang
2026-06-12  1:28   ` sashiko-bot
2026-06-12  1:14 ` [RFC PATCH bpf-next 4/5] bpf: allow SO_BUSY_POLL in bpf_setsockopt() Cong Wang
2026-06-12  1:14 ` [RFC PATCH bpf-next 5/5] selftests/bpf: set SO_BUSY_POLL from the tcp_splice sockops prog Cong Wang
2026-06-12  1:26   ` sashiko-bot
2026-06-12 16:01 ` [RFC PATCH bpf-next 0/5] tcp: opportunistic loopback splice for BPF-paired sockets Alexei Starovoitov
2026-06-12 18:12   ` Cong Wang
2026-06-12 18:34     ` Alexei Starovoitov
2026-06-12 20:17       ` Cong Wang
2026-06-13 17:57   ` Jakub Kicinski
2026-06-13 21:25     ` Cong Wang
2026-06-12 22:10 ` [syzbot ci] " syzbot ci

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260612032949.68A3F1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.