From: Jiayuan Chen
To: bpf@vger.kernel.org, john.fastabend@gmail.com, jakub@cloudflare.com
Cc: Jiayuan Chen, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn,
    David Ahern, Neal Cardwell, Alexei Starovoitov, Daniel Borkmann,
    Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
    Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
    Shuah Khan, Jiapeng Chong, Ihor Solodrai, Michal Luczaj,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org
Subject: [PATCH bpf-next v1 0/7] bpf/sockmap: add splice support for tcp_bpf
Date: Wed, 4 Mar 2026 14:33:51 +0800
Message-ID: <20260304063643.14581-1-jiayuan.chen@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

Starting from Go 1.22.0, TCPConn implements the WriteTo interface [1],
which internally uses the splice(2) syscall to transfer data between
file descriptors [2]. However, for sockets with sockmap enabled,
sk_prot is replaced with tcp_bpf_prots which does not provide a
splice_read callback. When data is redirected to a socket's psock
ingress queue via bpf_msg_redirect, splice(2) cannot read from it
because the splice path has no knowledge of the psock queue.
This causes TCPConn.WriteTo to return 0 bytes, effectively breaking Go
applications that rely on io.Copy between TCP connections when
sockmap/BPF is in use [3].

The simplest fix would be registering a splice callback that just calls
copy_splice_read(), but this results in redundant copies (socket ->
kernel buffer -> pipe -> destination), which defeats the purpose of
splice.

Patch 1 adds splice_read to struct proto and sets it in TCP.

Patch 2 adds inet_splice_read and uses it in inet_stream_ops.

Patch 3 refactors tcp_bpf recvmsg with a read actor abstraction.

Patch 4 adds basic splice_read support for sockmap, but this still
involves 2 data copies.

Patch 5 optimizes the splice implementation by transferring page
ownership directly into the pipe, achieving true zero-copy. Benchmarks
show performance on par with the read(2) path.

Patch 6 adds splice selftests. Since splice can seamlessly replace read
operations, we redefine read to splice in the existing selftests so
that all existing test cases also cover the splice path.

Patch 7 adds splice to the sockmap benchmark, which also serves to
verify the effectiveness of our zero-copy implementation.

Benchmark results with rx-verdict-ingress mode (loopback, 8 CPUs):

  read(2):                  ~4292 MB/s
  splice(2) + zero-copy:    ~4270 MB/s
  splice(2) + always-copy:  ~2770 MB/s

Zero-copy splice achieves near-parity with read(2), while the
always-copy fallback is ~35% slower.
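
For readers less familiar with the data path, here is a minimal
userspace sketch of the socket -> pipe -> socket relay that a
splice(2)-based copy loop performs. It is not taken from this series or
from the Go runtime: splice_relay() and splice_demo() are hypothetical
names, and AF_UNIX socketpairs stand in for the TCP connections so that
the example runs without a network or a sockmap. The first splice()
call is the step that depends on the source socket's splice_read
support.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: relay up to len bytes from in_fd to out_fd through
 * an intermediate pipe. Returns the number of bytes relayed, or -1 on error.
 * EINTR/EAGAIN handling is omitted to keep the sketch short.
 */
static ssize_t splice_relay(int in_fd, int out_fd, size_t len)
{
    int pipefd[2];
    ssize_t total = 0;

    assert(in_fd >= 0 && out_fd >= 0);

    if (pipe(pipefd) < 0)
        return -1;

    while ((size_t)total < len) {
        /* Stage 1: source socket -> pipe (uses the socket's splice_read). */
        ssize_t n = splice(in_fd, NULL, pipefd[1], NULL,
                           len - (size_t)total, SPLICE_F_MOVE);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)
            break; /* EOF on the source */

        /* Stage 2: pipe -> destination socket; drain what stage 1 queued. */
        ssize_t left = n;
        while (left > 0) {
            ssize_t w = splice(pipefd[0], NULL, out_fd, NULL,
                               (size_t)left, SPLICE_F_MOVE);
            if (w <= 0) {
                total = -1;
                goto out;
            }
            left -= w;
        }
        total += n;
    }
out:
    close(pipefd[0]);
    close(pipefd[1]);
    return total;
}

/* Hypothetical demo: push a short message through two socketpairs. */
int splice_demo(void)
{
    static const char msg[] = "hello via splice";
    const size_t len = sizeof(msg) - 1;
    char buf[64] = {0};
    int src[2], dst[2];
    int ret = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, src) < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, dst) < 0) {
        close(src[0]);
        close(src[1]);
        return -1;
    }

    /* Writer -> src pair -> splice relay -> dst pair -> reader. */
    if (write(src[0], msg, len) == (ssize_t)len &&
        splice_relay(src[1], dst[0], len) == (ssize_t)len &&
        read(dst[1], buf, sizeof(buf) - 1) == (ssize_t)len &&
        memcmp(buf, msg, len) == 0)
        ret = 0;

    close(src[0]);
    close(src[1]);
    close(dst[0]);
    close(dst[1]);
    return ret;
}
```

splice_demo() returns 0 when the payload arrives intact on the far side
of the second socketpair. The pipe in the middle is what makes a
copy-based splice_read costly: each extra copy into or out of it is
paid on every relayed chunk.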

[1] https://github.com/golang/go/blob/master/src/net/tcpsock.go#L173
[2] https://github.com/golang/go/blob/fdf3bee/src/net/tcpsock_posix.go#L57
[3] https://github.com/jschwinger233/bpf_msg_redirect_bug_reproducer

Jiayuan Chen (7):
  net: add splice_read to struct proto and set it in
    tcp_prot/tcpv6_prot
  inet: add inet_splice_read() and use it in
    inet_stream_ops/inet6_stream_ops
  tcp_bpf: refactor recvmsg with read actor abstraction
  tcp_bpf: add splice_read support for sockmap
  tcp_bpf: optimize splice_read with zero-copy for non-slab pages
  selftests/bpf: add splice_read tests for sockmap
  selftests/bpf: add splice option to sockmap benchmark

 include/linux/skmsg.h                         |  12 +-
 include/net/inet_common.h                     |   3 +
 include/net/sock.h                            |   3 +
 net/core/skmsg.c                              |  34 ++-
 net/ipv4/af_inet.c                            |  15 +-
 net/ipv4/tcp_bpf.c                            | 227 +++++++++++++++---
 net/ipv4/tcp_ipv4.c                           |   1 +
 net/ipv6/af_inet6.c                           |   2 +-
 net/ipv6/tcp_ipv6.c                           |   1 +
 .../selftests/bpf/benchs/bench_sockmap.c      |  57 ++++-
 .../selftests/bpf/prog_tests/sockmap_basic.c  |  28 ++-
 .../bpf/prog_tests/sockmap_helpers.h          |  62 +++++
 .../selftests/bpf/prog_tests/sockmap_strp.c   |  28 ++-
 13 files changed, 421 insertions(+), 52 deletions(-)

-- 
2.43.0