From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: bpf@vger.kernel.org, john.fastabend@gmail.com, jakub@cloudflare.com
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	Willem de Bruijn <willemb@google.com>,
	David Ahern <dsahern@kernel.org>,
	Neal Cardwell <ncardwell@google.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>,
	Jiapeng Chong <jiapeng.chong@linux.alibaba.com>,
	Ihor Solodrai <ihor.solodrai@linux.dev>,
	Michal Luczaj <mhal@rbox.co>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Subject: [PATCH bpf-next v1 0/7] bpf/sockmap: add splice support for tcp_bpf
Date: Wed,  4 Mar 2026 14:33:51 +0800
Message-ID: <20260304063643.14581-1-jiayuan.chen@linux.dev>

Starting from Go 1.22.0, TCPConn implements the io.WriterTo interface
through its WriteTo method [1], which internally uses the splice(2)
syscall to transfer data between file descriptors [2].

However, for sockets with sockmap enabled, sk_prot is replaced with
tcp_bpf_prots, which does not provide a splice_read callback. When data
is redirected to a socket's psock ingress queue via bpf_msg_redirect,
splice(2) cannot read from it because the splice path has no knowledge
of the psock queue. This causes TCPConn.WriteTo to return 0 bytes,
effectively breaking Go applications that rely on io.Copy between TCP
connections when sockmap/BPF is in use [3].
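
For reference, the user-space side of this path can be sketched in C as
below. This is a minimal illustration, not the reproducer from [3]: the
helper names tcp_loopback_pair() and read_via_splice() are made up for
this example, and no sockmap or BPF program is attached, so it only shows
the plain TCP case in which splice(2) works as expected.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: build a connected TCP pair over loopback.
 * fds[0] is the client end, fds[1] the accepted server end.
 * Returns 0 on success, -1 on error. */
static int tcp_loopback_pair(int fds[2])
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int lfd, cfd, afd;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* let the kernel pick a free port */

	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
		goto err_listen;

	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd < 0)
		goto err_listen;
	if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto err_client;

	afd = accept(lfd, NULL, NULL);
	if (afd < 0)
		goto err_client;

	close(lfd);
	fds[0] = cfd;
	fds[1] = afd;
	return 0;

err_client:
	close(cfd);
err_listen:
	close(lfd);
	return -1;
}

/* Hypothetical read(2)-like helper that goes through splice(2):
 * socket -> pipe -> buf. Returns bytes read, 0 on EOF, -1 on error. */
static ssize_t read_via_splice(int sock_fd, void *buf, size_t len)
{
	int pipefd[2];
	ssize_t n;

	assert(buf != NULL);
	if (pipe(pipefd) < 0)
		return -1;

	/* The kernel services this through the socket's splice_read path. */
	n = splice(sock_fd, NULL, pipefd[1], NULL, len, SPLICE_F_MOVE);
	if (n > 0)
		n = read(pipefd[0], buf, (size_t)n);

	close(pipefd[0]);
	close(pipefd[1]);
	return n;
}
```

Writing a few bytes to fds[0] and calling read_via_splice() on fds[1]
returns those bytes on a plain TCP socket. Per the description above, the
same splice(2) call yields 0 bytes once the data sits in the psock ingress
queue.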

The simplest fix would be registering a splice callback that just calls
copy_splice_read(), but this results in redundant copies (socket -> kernel
buffer -> pipe -> destination), which defeats the purpose of splice.

Patch 1 adds splice_read to struct proto and sets it in TCP.
Patch 2 adds inet_splice_read and uses it in inet_stream_ops.
Patch 3 refactors tcp_bpf recvmsg with a read actor abstraction.
Patch 4 adds basic splice_read support for sockmap, but this still
involves 2 data copies.
Patch 5 optimizes the splice implementation by transferring page
ownership directly into the pipe, achieving true zero-copy. Benchmarks
show performance on par with the read(2) path.
Patch 6 adds splice selftests. Since splice can seamlessly replace read
operations, we redefine read to splice in the existing selftests so
that all existing test cases also cover the splice path.
Patch 7 adds splice to the sockmap benchmark, which also serves to
verify the effectiveness of our zero-copy implementation.
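
As a rough user-space illustration of the read actor idea behind patches
3 to 5, the sketch below walks a queue once and lets a callback decide
whether bytes are copied (recvmsg-like) or only referenced (splice-like).
It is not the kernel code from this series: struct chunk, queue_read(),
copy_actor() and ref_actor() are hypothetical names chosen for this
example, and plain pointers stand in for page references.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of an ingress queue. */
struct chunk {
	const char *data;
	size_t len;
};

/* A read actor consumes up to len bytes and returns how many it took. */
typedef size_t (*read_actor_t)(void *arg, const char *data, size_t len);

/* Shared queue walk; what "consume" means is left to the actor. */
static size_t queue_read(const struct chunk *q, size_t nr, size_t want,
			 read_actor_t actor, void *arg)
{
	size_t done = 0;

	for (size_t i = 0; i < nr && done < want; i++) {
		size_t n = q[i].len < want - done ? q[i].len : want - done;
		size_t used = actor(arg, q[i].data, n);

		done += used;
		if (used < n)
			break; /* destination is full */
	}
	return done;
}

/* recvmsg-like actor: copy bytes into a flat buffer. */
struct copy_dst {
	char *buf;
	size_t off;
	size_t cap;
};

static size_t copy_actor(void *arg, const char *data, size_t len)
{
	struct copy_dst *d = arg;
	size_t room, n;

	assert(d->off <= d->cap);
	room = d->cap - d->off;
	n = len < room ? len : room;

	memcpy(d->buf + d->off, data, n);
	d->off += n;
	return n;
}

/* splice-like actor: record a reference to the data instead of copying. */
#define MAX_REFS 8

struct ref_dst {
	struct chunk refs[MAX_REFS];
	size_t nr;
};

static size_t ref_actor(void *arg, const char *data, size_t len)
{
	struct ref_dst *d = arg;

	if (d->nr == MAX_REFS)
		return 0;
	d->refs[d->nr].data = data;
	d->refs[d->nr].len = len;
	d->nr++;
	return len;
}
```

The point of the split is that the queue walk is written once, while the
copy and zero-copy behaviours differ only in the actor passed in.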

Benchmark results with rx-verdict-ingress mode (loopback, 8 CPUs):

  read(2):                  ~4292 MB/s
  splice(2) + zero-copy:    ~4270 MB/s
  splice(2) + always-copy:  ~2770 MB/s

Zero-copy splice achieves near-parity with read(2), while the
always-copy fallback is ~35% slower.
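
The percentages follow directly from the throughput figures above. A
small helper (slowdown_pct() is a name made up for this example) makes
the arithmetic explicit:

```c
#include <assert.h>

/* Relative slowdown of `other_mbps` against `base_mbps`, in percent. */
static double slowdown_pct(double base_mbps, double other_mbps)
{
	assert(base_mbps > 0.0);
	return (base_mbps - other_mbps) / base_mbps * 100.0;
}
```

With read(2) as the baseline, slowdown_pct(4292, 2770) is roughly 35.5
for the always-copy path and slowdown_pct(4292, 4270) is roughly 0.5 for
the zero-copy path.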

[1] https://github.com/golang/go/blob/master/src/net/tcpsock.go#L173
[2] https://github.com/golang/go/blob/fdf3bee/src/net/tcpsock_posix.go#L57
[3] https://github.com/jschwinger233/bpf_msg_redirect_bug_reproducer

Jiayuan Chen (7):
  net: add splice_read to struct proto and set it in tcp_prot/tcpv6_prot
  inet: add inet_splice_read() and use it in
    inet_stream_ops/inet6_stream_ops
  tcp_bpf: refactor recvmsg with read actor abstraction
  tcp_bpf: add splice_read support for sockmap
  tcp_bpf: optimize splice_read with zero-copy for non-slab pages
  selftests/bpf: add splice_read tests for sockmap
  selftests/bpf: add splice option to sockmap benchmark

 include/linux/skmsg.h                         |  12 +-
 include/net/inet_common.h                     |   3 +
 include/net/sock.h                            |   3 +
 net/core/skmsg.c                              |  34 ++-
 net/ipv4/af_inet.c                            |  15 +-
 net/ipv4/tcp_bpf.c                            | 227 +++++++++++++++---
 net/ipv4/tcp_ipv4.c                           |   1 +
 net/ipv6/af_inet6.c                           |   2 +-
 net/ipv6/tcp_ipv6.c                           |   1 +
 .../selftests/bpf/benchs/bench_sockmap.c      |  57 ++++-
 .../selftests/bpf/prog_tests/sockmap_basic.c  |  28 ++-
 .../bpf/prog_tests/sockmap_helpers.h          |  62 +++++
 .../selftests/bpf/prog_tests/sockmap_strp.c   |  28 ++-
 13 files changed, 421 insertions(+), 52 deletions(-)

-- 
2.43.0

