Netdev List
 help / color / mirror / Atom feed
* [RFC PATCH bpf-next 0/5] tcp: opportunistic loopback splice for BPF-paired sockets
@ 2026-06-12  1:14 Cong Wang
  2026-06-12  1:14 ` [RFC PATCH bpf-next 1/5] tcp_bpf: add bpf_sock_splice_pair kfunc for opportunistic loopback splice Cong Wang
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Cong Wang @ 2026-06-12  1:14 UTC (permalink / raw)
  To: netdev
  Cc: bpf, John Fastabend, Jakub Sitnicki, Jiayuan Chen, hemanthmalla,
	zijianzhang, Cong Wang

This series adds an opportunistic "loopback splice" fast path for two
locally-connected TCP sockets that a sock_ops BPF program pairs at
handshake completion. Once paired, sendmsg copies the user payload into
a per-direction in-kernel byte ring and recvmsg drains it on the other
side; both copies happen in their own task's mm, so the fast path incurs
no skb construction, no softirq, and no TCP protocol-state processing.

The underlying TCP connection stays fully real: sequence numbers are
frozen at post-handshake values, so FIN/RST/keepalive keep flowing
through the normal paths and the pair tears down via a regular close.
Pairing is opt-in per flow and fallback is per-message - handshake-style
traffic takes the TCP path, the bulk phase takes the ring, on the same
socket. Nothing leaves the host and applications need no changes: no new
address family, no LD_PRELOAD, no source modification.

The target use cases are co-located endpoints that speak plain TCP:
 - regular TCP loopback (127.0.0.1) between processes on the same host;
 - container sidecar deployments - e.g. a service-mesh sidecar proxy and
   its application in the same pod, talking over loopback or a veth pair -
   where the per-skb veth+bridge cost is exactly what the ring sidesteps.

Highlights (TCP_RR, 1 KB request/response, netperf, pinned CPUs,
baseline TCP vs splice; full tables across message sizes and TCP_STREAM
in patches 1 and 2):

  loopback (127.0.0.1):
    without busy-poll:   105.8k -> 235.1k tps  (2.2x)
    with busy-poll 50us: 106.1k -> 713.0k tps  (6.7x)

  container (netns + veth + bridge):
    without busy-poll:    99.9k -> 233.9k tps  (2.3x)
    with busy-poll 50us: 100.4k -> 704.9k tps  (7.0x)

Synchronous-RPC (TCP_RR) at a 1 KB message wins ~2.2x without busy
polling and ~6.7x with it (the win grows toward smaller messages and
narrows toward 64 KB), because the ring removes the per-cycle kernel TCP
receive-path cost and the receiver can spin on the ring directly -
loopback delivers via the per-CPU backlog and exposes no pollable
napi_id, so the generic sk_busy_loop() is a no-op there. Bulk streaming
is roughly neutral on bare-metal loopback but wins decisively (up to
~6x) container-to-container, where per-skb veth+bridge cost dominates
the path the ring sidesteps.

---
Cong Wang (5):
  tcp_bpf: add bpf_sock_splice_pair kfunc for opportunistic loopback
    splice
  tcp_bpf: busy-poll the splice ring before parking the receiver
  selftests/bpf: add tcp_splice basic round-trip test
  bpf: allow SO_BUSY_POLL in bpf_setsockopt()
  selftests/bpf: set SO_BUSY_POLL from the tcp_splice sockops prog

 include/linux/skmsg.h                         |   9 +
 include/net/tcp.h                             |   8 +
 net/core/filter.c                             |   1 +
 net/core/skmsg.c                              |   3 +
 net/ipv4/tcp_bpf.c                            | 847 +++++++++++++++++-
 .../selftests/bpf/prog_tests/tcp_splice.c     | 206 +++++
 .../selftests/bpf/progs/test_tcp_splice.c     | 125 +++
 7 files changed, 1198 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tcp_splice.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_tcp_splice.c


base-commit: 30dee2c176e7954f63d1fa3e52d172f30beb9bfb
-- 
2.43.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-06-12 20:17 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-12  1:14 [RFC PATCH bpf-next 0/5] tcp: opportunistic loopback splice for BPF-paired sockets Cong Wang
2026-06-12  1:14 ` [RFC PATCH bpf-next 1/5] tcp_bpf: add bpf_sock_splice_pair kfunc for opportunistic loopback splice Cong Wang
2026-06-12  2:10   ` bot+bpf-ci
2026-06-12  1:14 ` [RFC PATCH bpf-next 2/5] tcp_bpf: busy-poll the splice ring before parking the receiver Cong Wang
2026-06-12  1:14 ` [RFC PATCH bpf-next 3/5] selftests/bpf: add tcp_splice basic round-trip test Cong Wang
2026-06-12  1:14 ` [RFC PATCH bpf-next 4/5] bpf: allow SO_BUSY_POLL in bpf_setsockopt() Cong Wang
2026-06-12  1:14 ` [RFC PATCH bpf-next 5/5] selftests/bpf: set SO_BUSY_POLL from the tcp_splice sockops prog Cong Wang
2026-06-12 16:01 ` [RFC PATCH bpf-next 0/5] tcp: opportunistic loopback splice for BPF-paired sockets Alexei Starovoitov
2026-06-12 18:12   ` Cong Wang
2026-06-12 18:34     ` Alexei Starovoitov
2026-06-12 20:17       ` Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox