From: Cong Wang <xiyou.wangcong@gmail.com>
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, John Fastabend <john.fastabend@gmail.com>,
Jakub Sitnicki <jakub@cloudflare.com>,
Jiayuan Chen <jiayuan.chen@linux.dev>,
hemanthmalla@gmail.com, zijianzhang@bytedance.com,
Cong Wang <xiyou.wangcong@gmail.com>
Subject: [RFC PATCH bpf-next 0/5] tcp: opportunistic loopback splice for BPF-paired sockets
Date: Thu, 11 Jun 2026 18:14:47 -0700
Message-ID: <20260612011452.134466-1-xiyou.wangcong@gmail.com>

This series adds an opportunistic "loopback splice" fast path for two
locally-connected TCP sockets that a sock_ops BPF program pairs at
handshake completion. Once paired, sendmsg copies the user payload into
a per-direction in-kernel byte ring and recvmsg drains it on the other
side; both copies happen in their own task's mm, so the fast path incurs
no skb construction, no softirq, and no TCP protocol-state processing.
The underlying TCP connection stays fully real: sequence numbers are
frozen at post-handshake values, so FIN/RST/keepalive keep flowing
through the normal paths and the pair tears down via a regular close.
Pairing is opt-in per flow and fallback is per-message: on the same
socket, handshake-style traffic takes the TCP path and the bulk phase
takes the ring. Nothing leaves the host and applications need no
changes: no new address family, no LD_PRELOAD, no source modification.
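
To make the per-direction byte ring concrete, here is a minimal userspace
sketch of the idea, not the code from this series. The names (byte_ring,
ring_write, ring_read), the 4 KB capacity, and the all-or-nothing write
rule used to model per-message fallback are all assumptions made for
illustration. It is single-threaded; a real producer/consumer pair would
also need memory barriers and a wakeup mechanism.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4096u /* hypothetical capacity; must be a power of two */

struct byte_ring {
	uint32_t head; /* total bytes ever written (sender side) */
	uint32_t tail; /* total bytes ever read (receiver side) */
	unsigned char buf[RING_SIZE];
};

/* Bytes currently queued; unsigned wrap-around keeps this correct. */
static size_t ring_used(const struct byte_ring *r)
{
	return (uint32_t)(r->head - r->tail);
}

/*
 * Sender side: copy the whole message into the ring, or copy nothing
 * and return false so the caller can fall back to the regular TCP path
 * (assumed fallback rule, see the text above).
 */
static bool ring_write(struct byte_ring *r, const void *src, size_t len)
{
	uint32_t off = r->head & (RING_SIZE - 1);
	size_t first;

	if (len > RING_SIZE - ring_used(r))
		return false;

	/* Copy up to the end of the buffer, then wrap to the start. */
	first = len < RING_SIZE - off ? len : RING_SIZE - off;
	memcpy(r->buf + off, src, first);
	memcpy(r->buf, (const unsigned char *)src + first, len - first);
	r->head += (uint32_t)len;
	return true;
}

/* Receiver side: drain up to len bytes, returning how many were copied. */
static size_t ring_read(struct byte_ring *r, void *dst, size_t len)
{
	uint32_t off = r->tail & (RING_SIZE - 1);
	size_t avail = ring_used(r);
	size_t first;

	if (len > avail)
		len = avail;

	first = len < RING_SIZE - off ? len : RING_SIZE - off;
	memcpy(dst, r->buf + off, first);
	memcpy((unsigned char *)dst + first, r->buf, len - first);
	r->tail += (uint32_t)len;
	return len;
}
```

Each direction of a paired connection would own one such ring: the
sender calls ring_write() from its sendmsg context and the receiver
calls ring_read() from its recvmsg context, so neither side builds an
skb on this path.
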
The target use cases are co-located endpoints that speak plain TCP:
- regular TCP loopback (127.0.0.1) between processes on the same host;
- container sidecar deployments - e.g. a service-mesh sidecar proxy and
its application in the same pod, talking over loopback or a veth pair -
where the per-skb veth+bridge cost is exactly what the ring sidesteps.
Highlights (TCP_RR, 1 KB request/response, netperf, pinned CPUs,
baseline TCP vs splice; full tables across message sizes and TCP_STREAM
in patches 1 and 2):

  loopback (127.0.0.1):
    without busy-poll:    105.8k -> 235.1k tps  (2.2x)
    with busy-poll 50us:  106.1k -> 713.0k tps  (6.7x)

  container (netns + veth + bridge):
    without busy-poll:     99.9k -> 233.9k tps  (2.3x)
    with busy-poll 50us:  100.4k -> 704.9k tps  (7.0x)

On loopback, synchronous RPC (TCP_RR) with 1 KB messages gains ~2.2x
without busy polling and ~6.7x with it; the gain grows toward smaller
messages and narrows toward 64 KB. Two things drive this: the ring
removes the per-cycle kernel TCP receive-path cost, and the receiver can
spin on the ring directly. Plain loopback delivers via the per-CPU
backlog and exposes no pollable napi_id, so the generic sk_busy_loop()
is a no-op there. Bulk streaming is roughly neutral on bare-metal
loopback but wins decisively (up to ~6x) container-to-container, where
the per-skb veth+bridge cost that the ring sidesteps dominates the path.
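
The spin-before-park pattern described above can be sketched in
userspace as follows. This is an illustration only, not the kernel code
from patch 2: spin_for_data(), the avail counter, and the use of
wall-clock timespec_get() for the budget are assumptions (a real
implementation would use a monotonic clock and relax the CPU between
polls).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Current time in nanoseconds (wall clock, for illustration only). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Poll *avail (bytes the sender has published in the ring) for at most
 * budget_us microseconds. Returns true as soon as data is visible;
 * false means the budget expired and the caller should park, i.e. go
 * to sleep until the sender wakes it.
 */
static bool spin_for_data(atomic_uint *avail, unsigned int budget_us)
{
	uint64_t deadline = now_ns() + (uint64_t)budget_us * 1000u;

	do {
		if (atomic_load_explicit(avail, memory_order_acquire) != 0)
			return true;
	} while (now_ns() < deadline);

	return false;
}
```

With a 50us budget, as in the busy-poll rows of the table, a receiver
whose peer answers within the budget never sleeps, which is where the
difference between the ~2.2x and ~6.7x results would come from.
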
---
Cong Wang (5):
  tcp_bpf: add bpf_sock_splice_pair kfunc for opportunistic loopback
    splice
  tcp_bpf: busy-poll the splice ring before parking the receiver
  selftests/bpf: add tcp_splice basic round-trip test
  bpf: allow SO_BUSY_POLL in bpf_setsockopt()
  selftests/bpf: set SO_BUSY_POLL from the tcp_splice sockops prog

 include/linux/skmsg.h                         |   9 +
 include/net/tcp.h                             |   8 +
 net/core/filter.c                             |   1 +
 net/core/skmsg.c                              |   3 +
 net/ipv4/tcp_bpf.c                            | 847 +++++++++++++++++-
 .../selftests/bpf/prog_tests/tcp_splice.c     | 206 +++++
 .../selftests/bpf/progs/test_tcp_splice.c     | 125 +++
 7 files changed, 1198 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tcp_splice.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_tcp_splice.c

base-commit: 30dee2c176e7954f63d1fa3e52d172f30beb9bfb
--
2.43.0

Thread overview (11+ messages):
2026-06-12 1:14 Cong Wang [this message]
2026-06-12 1:14 ` [RFC PATCH bpf-next 1/5] tcp_bpf: add bpf_sock_splice_pair kfunc for opportunistic loopback splice Cong Wang
2026-06-12 2:10 ` bot+bpf-ci
2026-06-12 1:14 ` [RFC PATCH bpf-next 2/5] tcp_bpf: busy-poll the splice ring before parking the receiver Cong Wang
2026-06-12 1:14 ` [RFC PATCH bpf-next 3/5] selftests/bpf: add tcp_splice basic round-trip test Cong Wang
2026-06-12 1:14 ` [RFC PATCH bpf-next 4/5] bpf: allow SO_BUSY_POLL in bpf_setsockopt() Cong Wang
2026-06-12 1:14 ` [RFC PATCH bpf-next 5/5] selftests/bpf: set SO_BUSY_POLL from the tcp_splice sockops prog Cong Wang
2026-06-12 16:01 ` [RFC PATCH bpf-next 0/5] tcp: opportunistic loopback splice for BPF-paired sockets Alexei Starovoitov
2026-06-12 18:12 ` Cong Wang
2026-06-12 18:34 ` Alexei Starovoitov
2026-06-12 20:17 ` Cong Wang