netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2)
@ 2025-07-09 12:47 Vincent Whitchurch via B4 Relay
  2025-07-09 12:47 ` [PATCH bpf-next v3 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-07-09 12:47 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf,
	Vincent Whitchurch

I noticed that if the verdict callback returns SK_PASS, using splice(2)
to read from a socket in a sockmap does not work since it never sees the
data queued on to it.  As far as I can see, this is not a regression but
just something that has never worked, but it does make sockmap unusable
if you can't guarantee that the programs using the socket will not use
splice(2).

This series attempts to fix it and add a test for it.

---
Changes in v3:
- Rebase on latest bpf-next/master
- Link to v2: https://lore.kernel.org/r/20250609-sockmap-splice-v2-0-9c50645cfa32@datadoghq.com

Changes in v2:
- Rebase on latest bpf-next/master
- Remove unnecessary change in inet_dgram_ops
- Remove ->splice_read NULL check in inet_splice_read()
- Use INDIRECT_CALL_1() in inet_splice_read()
- Include test case in default test suite in test_sockmap
- Link to v1: https://lore.kernel.org/r/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com

---
Vincent Whitchurch (5):
      net: Add splice_read to prot
      tcp_bpf: Fix reading with splice(2)
      selftests/bpf: sockmap: Exit with error on failure
      selftests/bpf: sockmap: Allow SK_PASS in verdict
      selftests/bpf: sockmap: Add splice + SK_PASS regression test

 include/net/inet_common.h                  |  3 +
 include/net/sock.h                         |  3 +
 net/ipv4/af_inet.c                         | 13 ++++-
 net/ipv4/tcp_bpf.c                         |  9 +++
 net/ipv4/tcp_ipv4.c                        |  1 +
 net/ipv6/af_inet6.c                        |  2 +-
 net/ipv6/tcp_ipv6.c                        |  1 +
 tools/testing/selftests/bpf/test_sockmap.c | 90 +++++++++++++++++++++++++++++-
 8 files changed, 118 insertions(+), 4 deletions(-)
---
base-commit: ad97cb2ed06a6ba9025fd8bd14fa24369550cbb5
change-id: 20240606-sockmap-splice-d371ac07d7b4

Best regards,
-- 
Vincent Whitchurch <vincent.whitchurch@datadoghq.com>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH bpf-next v3 1/5] net: Add splice_read to prot
  2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
@ 2025-07-09 12:47 ` Vincent Whitchurch via B4 Relay
  2025-07-11 17:27   ` John Fastabend
  2025-07-09 12:47 ` [PATCH bpf-next v3 2/5] tcp_bpf: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-07-09 12:47 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf,
	Vincent Whitchurch

From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>

The TCP BPF code will need to override splice_read(), so add it to prot.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
 include/net/inet_common.h |  3 +++
 include/net/sock.h        |  3 +++
 net/ipv4/af_inet.c        | 13 ++++++++++++-
 net/ipv4/tcp_ipv4.c       |  1 +
 net/ipv6/af_inet6.c       |  2 +-
 net/ipv6/tcp_ipv6.c       |  1 +
 6 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index c17a6585d0b0..2a6480d0d575 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -35,6 +35,9 @@ void __inet_accept(struct socket *sock, struct socket *newsock,
 		   struct sock *newsk);
 int inet_send_prepare(struct sock *sk);
 int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
+ssize_t inet_splice_read(struct socket *sk, loff_t *ppos,
+			 struct pipe_inode_info *pipe, size_t len,
+			 unsigned int flags);
 void inet_splice_eof(struct socket *sock);
 int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 		 int flags);
diff --git a/include/net/sock.h b/include/net/sock.h
index 4c37015b7cf7..4bdebcbcca38 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1280,6 +1280,9 @@ struct proto {
 					   size_t len);
 	int			(*recvmsg)(struct sock *sk, struct msghdr *msg,
 					   size_t len, int flags, int *addr_len);
+	ssize_t			(*splice_read)(struct socket *sock,  loff_t *ppos,
+					       struct pipe_inode_info *pipe, size_t len,
+					       unsigned int flags);
 	void			(*splice_eof)(struct socket *sock);
 	int			(*bind)(struct sock *sk,
 					struct sockaddr *addr, int addr_len);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 76e38092cd8a..9c521d252f66 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -868,6 +868,17 @@ void inet_splice_eof(struct socket *sock)
 }
 EXPORT_SYMBOL_GPL(inet_splice_eof);
 
+ssize_t inet_splice_read(struct socket *sock, loff_t *ppos,
+			 struct pipe_inode_info *pipe, size_t len,
+			 unsigned int flags)
+{
+	struct sock *sk = sock->sk;
+
+	return INDIRECT_CALL_1(sk->sk_prot->splice_read, tcp_splice_read, sock,
+			       ppos, pipe, len, flags);
+}
+EXPORT_SYMBOL_GPL(inet_splice_read);
+
 INDIRECT_CALLABLE_DECLARE(int udp_recvmsg(struct sock *, struct msghdr *,
 					  size_t, int, int *));
 int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
@@ -1071,7 +1082,7 @@ const struct proto_ops inet_stream_ops = {
 	.mmap		   = tcp_mmap,
 #endif
 	.splice_eof	   = inet_splice_eof,
-	.splice_read	   = tcp_splice_read,
+	.splice_read	   = inet_splice_read,
 	.set_peek_off      = sk_set_peek_off,
 	.read_sock	   = tcp_read_sock,
 	.read_skb	   = tcp_read_skb,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 6a14f9e6fef6..0391b6bef35b 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3375,6 +3375,7 @@ struct proto tcp_prot = {
 	.keepalive		= tcp_set_keepalive,
 	.recvmsg		= tcp_recvmsg,
 	.sendmsg		= tcp_sendmsg,
+	.splice_read		= tcp_splice_read,
 	.splice_eof		= tcp_splice_eof,
 	.backlog_rcv		= tcp_v4_do_rcv,
 	.release_cb		= tcp_release_cb,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index acaff1296783..2d6a1d669452 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -705,7 +705,7 @@ const struct proto_ops inet6_stream_ops = {
 #endif
 	.splice_eof	   = inet_splice_eof,
 	.sendmsg_locked    = tcp_sendmsg_locked,
-	.splice_read	   = tcp_splice_read,
+	.splice_read	   = inet_splice_read,
 	.set_peek_off      = sk_set_peek_off,
 	.read_sock	   = tcp_read_sock,
 	.read_skb	   = tcp_read_skb,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index e8e68a142649..f7558e443de6 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -2342,6 +2342,7 @@ struct proto tcpv6_prot = {
 	.keepalive		= tcp_set_keepalive,
 	.recvmsg		= tcp_recvmsg,
 	.sendmsg		= tcp_sendmsg,
+	.splice_read		= tcp_splice_read,
 	.splice_eof		= tcp_splice_eof,
 	.backlog_rcv		= tcp_v6_do_rcv,
 	.release_cb		= tcp_release_cb,

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH bpf-next v3 2/5] tcp_bpf: Fix reading with splice(2)
  2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
  2025-07-09 12:47 ` [PATCH bpf-next v3 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
@ 2025-07-09 12:47 ` Vincent Whitchurch via B4 Relay
  2025-07-09 12:47 ` [PATCH bpf-next v3 3/5] selftests/bpf: sockmap: Exit with error on failure Vincent Whitchurch via B4 Relay
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-07-09 12:47 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf,
	Vincent Whitchurch

From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>

If a socket is added to a sockmap with a verdict program which returns
SK_PASS, splice(2) is not able to read from the socket.

The verdict code removes skbs from the receive queue, checks them using
the bpf program, and then re-queues them onto a separate queue
(psock->ingress_msg).  The sockmap code modifies the TCP recvmsg hook to
check this second queue also so that works. But the splice_read hooks is
not modified and the default tcp_read_splice() only reads the normal
receive queue so it never sees the skbs which have been re-queued.

Fix it by using copy_splice_read() when replacing the proto for the
sockmap.  This could eventually be replaced with a more efficient custom
version.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
 net/ipv4/tcp_bpf.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index ba581785adb4..197429b6adae 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -3,6 +3,7 @@
 
 #include <linux/skmsg.h>
 #include <linux/filter.h>
+#include <linux/fs.h>
 #include <linux/bpf.h>
 #include <linux/init.h>
 #include <linux/wait.h>
@@ -381,6 +382,13 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	return ret;
 }
 
+static ssize_t tcp_bpf_splice_read(struct socket *sock, loff_t *ppos,
+				   struct pipe_inode_info *pipe, size_t len,
+				   unsigned int flags)
+{
+	return copy_splice_read(sock->file, ppos, pipe, len, flags);
+}
+
 static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 				struct sk_msg *msg, int *copied, int flags)
 {
@@ -605,6 +613,7 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
 	prot[TCP_BPF_BASE].destroy		= sock_map_destroy;
 	prot[TCP_BPF_BASE].close		= sock_map_close;
 	prot[TCP_BPF_BASE].recvmsg		= tcp_bpf_recvmsg;
+	prot[TCP_BPF_BASE].splice_read		= tcp_bpf_splice_read;
 	prot[TCP_BPF_BASE].sock_is_readable	= sk_msg_is_readable;
 
 	prot[TCP_BPF_TX]			= prot[TCP_BPF_BASE];

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH bpf-next v3 3/5] selftests/bpf: sockmap: Exit with error on failure
  2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
  2025-07-09 12:47 ` [PATCH bpf-next v3 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
  2025-07-09 12:47 ` [PATCH bpf-next v3 2/5] tcp_bpf: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
@ 2025-07-09 12:47 ` Vincent Whitchurch via B4 Relay
  2025-07-09 12:48 ` [PATCH bpf-next v3 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict Vincent Whitchurch via B4 Relay
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-07-09 12:47 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf,
	Vincent Whitchurch

From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>

If any tests failed, exit the program with a non-zero
error code.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
 tools/testing/selftests/bpf/test_sockmap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index fd2da2234cc9..cf1c36ed32c1 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -2238,7 +2238,9 @@ int main(int argc, char **argv)
 	close(cg_fd);
 	if (cg_created)
 		cleanup_cgroup_environment();
-	return err;
+	if (err)
+		return err;
+	return failed ? 1 : 0;
 }
 
 void running_handler(int a)

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH bpf-next v3 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict
  2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
                   ` (2 preceding siblings ...)
  2025-07-09 12:47 ` [PATCH bpf-next v3 3/5] selftests/bpf: sockmap: Exit with error on failure Vincent Whitchurch via B4 Relay
@ 2025-07-09 12:48 ` Vincent Whitchurch via B4 Relay
  2025-07-09 12:48 ` [PATCH bpf-next v3 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test Vincent Whitchurch via B4 Relay
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-07-09 12:48 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf,
	Vincent Whitchurch

From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>

Add an option to always return SK_PASS in the verdict callback
instead of redirecting the skb.  This allows testing cases
which are not covered by the test program as of now.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
 tools/testing/selftests/bpf/test_sockmap.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index cf1c36ed32c1..8be2714dd573 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -80,6 +80,7 @@ int txmsg_end_push;
 int txmsg_start_pop;
 int txmsg_pop;
 int txmsg_ingress;
+int txmsg_pass_skb;
 int txmsg_redir_skb;
 int txmsg_ktls_skb;
 int txmsg_ktls_skb_drop;
@@ -114,6 +115,7 @@ static const struct option long_options[] = {
 	{"txmsg_start_pop",  required_argument,	NULL, 'w'},
 	{"txmsg_pop",	     required_argument,	NULL, 'x'},
 	{"txmsg_ingress", no_argument,		&txmsg_ingress, 1 },
+	{"txmsg_pass_skb", no_argument,		&txmsg_pass_skb, 1 },
 	{"txmsg_redir_skb", no_argument,	&txmsg_redir_skb, 1 },
 	{"ktls", no_argument,			&ktls, 1 },
 	{"peek", no_argument,			&peek_flag, 1 },
@@ -183,6 +185,7 @@ static void test_reset(void)
 	txmsg_pass = txmsg_drop = txmsg_redir = 0;
 	txmsg_apply = txmsg_cork = 0;
 	txmsg_ingress = txmsg_redir_skb = 0;
+	txmsg_pass_skb = 0;
 	txmsg_ktls_skb = txmsg_ktls_skb_drop = txmsg_ktls_skb_redir = 0;
 	txmsg_omit_skb_parser = 0;
 	skb_use_parser = 0;
@@ -1050,6 +1053,7 @@ static int run_options(struct sockmap_options *options, int cg_fd,  int test)
 {
 	int i, key, next_key, err, zero = 0;
 	struct bpf_program *tx_prog;
+	struct bpf_program *skb_verdict_prog;
 
 	/* If base test skip BPF setup */
 	if (test == BASE || test == BASE_SENDPAGE)
@@ -1066,7 +1070,12 @@ static int run_options(struct sockmap_options *options, int cg_fd,  int test)
 		}
 	}
 
-	links[1] = bpf_program__attach_sockmap(progs[1], map_fd[0]);
+	if (txmsg_pass_skb)
+		skb_verdict_prog = progs[2];
+	else
+		skb_verdict_prog = progs[1];
+
+	links[1] = bpf_program__attach_sockmap(skb_verdict_prog, map_fd[0]);
 	if (!links[1]) {
 		fprintf(stderr, "ERROR: bpf_program__attach_sockmap (sockmap): (%s)\n",
 			strerror(errno));
@@ -1455,6 +1464,8 @@ static void test_options(char *options)
 	}
 	if (txmsg_ingress)
 		append_str(options, "ingress,", OPTSTRING);
+	if (txmsg_pass_skb)
+		append_str(options, "pass_skb,", OPTSTRING);
 	if (txmsg_redir_skb)
 		append_str(options, "redir_skb,", OPTSTRING);
 	if (txmsg_ktls_skb)

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH bpf-next v3 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test
  2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
                   ` (3 preceding siblings ...)
  2025-07-09 12:48 ` [PATCH bpf-next v3 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict Vincent Whitchurch via B4 Relay
@ 2025-07-09 12:48 ` Vincent Whitchurch via B4 Relay
  2025-07-11 15:57 ` [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) John Fastabend
  2025-07-11 23:09 ` Jakub Kicinski
  6 siblings, 0 replies; 12+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-07-09 12:48 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf,
	Vincent Whitchurch

From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>

Add a test which checks that splice(2) is still able to read data from
the socket if it is added to a verdict program which returns SK_PASS.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
 tools/testing/selftests/bpf/test_sockmap.c | 73 ++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 8be2714dd573..b4102161db62 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2017-2018 Covalent IO, Inc. http://covalent.io
+#define _GNU_SOURCE
 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/socket.h>
@@ -965,6 +966,61 @@ static int sendmsg_test(struct sockmap_options *opt)
 	return err;
 }
 
+static int splice_test(struct sockmap_options *opt)
+{
+	int pipefds[2];
+	char buf[1024] = {0};
+	ssize_t bytes;
+	int ret;
+
+	ret = pipe(pipefds);
+	if (ret < 0) {
+		perror("pipe");
+		return ret;
+	}
+
+	bytes = send(c1, buf, sizeof(buf), 0);
+	if (bytes < 0) {
+		perror("send failed");
+		return bytes;
+	}
+	if (bytes == 0) {
+		fprintf(stderr, "send wrote zero bytes\n");
+		return -1;
+	}
+
+	bytes = write(pipefds[1], buf, sizeof(buf));
+	if (bytes < 0) {
+		perror("pipe write failed");
+		return bytes;
+	}
+
+	bytes = splice(p1, NULL, pipefds[1], NULL, sizeof(buf), 0);
+	if (bytes < 0) {
+		perror("splice failed");
+		return bytes;
+	}
+	if (bytes == 0) {
+		fprintf(stderr, "spliced zero bytes\n");
+		return -1;
+	}
+
+	bytes = read(pipefds[0], buf, sizeof(buf));
+	if (bytes < 0) {
+		perror("pipe read failed");
+		return bytes;
+	}
+	if (bytes == 0) {
+		fprintf(stderr, "EOF from pipe\n");
+		return -1;
+	}
+
+	close(pipefds[1]);
+	close(pipefds[0]);
+
+	return 0;
+}
+
 static int forever_ping_pong(int rate, struct sockmap_options *opt)
 {
 	struct timeval timeout;
@@ -1047,6 +1103,7 @@ enum {
 	BASE,
 	BASE_SENDPAGE,
 	SENDPAGE,
+	SPLICE,
 };
 
 static int run_options(struct sockmap_options *options, int cg_fd,  int test)
@@ -1378,6 +1435,8 @@ static int run_options(struct sockmap_options *options, int cg_fd,  int test)
 		options->base = true;
 		options->sendpage = true;
 		err = sendmsg_test(options);
+	} else if (test == SPLICE) {
+		err = splice_test(options);
 	} else
 		fprintf(stderr, "unknown test\n");
 out:
@@ -1993,6 +2052,17 @@ static int populate_progs(char *bpf_file)
 	return 0;
 }
 
+static void test_txmsg_splice_pass(int cgrp, struct sockmap_options *opt)
+{
+	txmsg_omit_skb_parser = 1;
+	txmsg_pass_skb = 1;
+
+	__test_exec(cgrp, SPLICE, opt);
+
+	txmsg_omit_skb_parser = 0;
+	txmsg_pass_skb = 0;
+}
+
 struct _test test[] = {
 	{"txmsg test passthrough", test_txmsg_pass},
 	{"txmsg test redirect", test_txmsg_redir},
@@ -2009,6 +2079,7 @@ struct _test test[] = {
 	{"txmsg test push/pop data", test_txmsg_push_pop},
 	{"txmsg test ingress parser", test_txmsg_ingress_parser},
 	{"txmsg test ingress parser2", test_txmsg_ingress_parser2},
+	{"txmsg test splice pass", test_txmsg_splice_pass},
 };
 
 static int check_whitelist(struct _test *t, struct sockmap_options *opt)
@@ -2187,6 +2258,8 @@ int main(int argc, char **argv)
 				test = BASE_SENDPAGE;
 			} else if (strcmp(optarg, "sendpage") == 0) {
 				test = SENDPAGE;
+			} else if (strcmp(optarg, "splice") == 0) {
+				test = SPLICE;
 			} else {
 				usage(argv);
 				return -1;

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2)
  2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
                   ` (4 preceding siblings ...)
  2025-07-09 12:48 ` [PATCH bpf-next v3 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test Vincent Whitchurch via B4 Relay
@ 2025-07-11 15:57 ` John Fastabend
  2025-07-11 23:09 ` Jakub Kicinski
  6 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2025-07-11 15:57 UTC (permalink / raw)
  To: vincent.whitchurch
  Cc: Jakub Sitnicki, Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf

On 2025-07-09 14:47:56, Vincent Whitchurch via B4 Relay wrote:
> I noticed that if the verdict callback returns SK_PASS, using splice(2)
> to read from a socket in a sockmap does not work since it never sees the
> data queued on to it.  As far as I can see, this is not a regression but
> just something that has never worked, but it does make sockmap unusable
> if you can't guarantee that the programs using the socket will not use
> splice(2).

Correct it was never supported.

> 
> This series attempts to fix it and add a test for it.

Thanks I had this todo on my list, but never got to it.

> 
> ---
> Changes in v3:
> - Rebase on latest bpf-next/master
> - Link to v2: https://lore.kernel.org/r/20250609-sockmap-splice-v2-0-9c50645cfa32@datadoghq.com
> 
> Changes in v2:
> - Rebase on latest bpf-next/master
> - Remove unnecessary change in inet_dgram_ops
> - Remove ->splice_read NULL check in inet_splice_read()
> - Use INDIRECT_CALL_1() in inet_splice_read()
> - Include test case in default test suite in test_sockmap
> - Link to v1: https://lore.kernel.org/r/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com
> 
> ---
> Vincent Whitchurch (5):
>       net: Add splice_read to prot
>       tcp_bpf: Fix reading with splice(2)
>       selftests/bpf: sockmap: Exit with error on failure
>       selftests/bpf: sockmap: Allow SK_PASS in verdict
>       selftests/bpf: sockmap: Add splice + SK_PASS regression test
> 
>  include/net/inet_common.h                  |  3 +
>  include/net/sock.h                         |  3 +
>  net/ipv4/af_inet.c                         | 13 ++++-
>  net/ipv4/tcp_bpf.c                         |  9 +++
>  net/ipv4/tcp_ipv4.c                        |  1 +
>  net/ipv6/af_inet6.c                        |  2 +-
>  net/ipv6/tcp_ipv6.c                        |  1 +
>  tools/testing/selftests/bpf/test_sockmap.c | 90 +++++++++++++++++++++++++++++-
>  8 files changed, 118 insertions(+), 4 deletions(-)
> ---
> base-commit: ad97cb2ed06a6ba9025fd8bd14fa24369550cbb5
> change-id: 20240606-sockmap-splice-d371ac07d7b4
> 
> Best regards,
> -- 
> Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 1/5] net: Add splice_read to prot
  2025-07-09 12:47 ` [PATCH bpf-next v3 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
@ 2025-07-11 17:27   ` John Fastabend
  2025-07-11 18:48     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2025-07-11 17:27 UTC (permalink / raw)
  To: vincent.whitchurch
  Cc: Jakub Sitnicki, Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf

On 2025-07-09 14:47:57, Vincent Whitchurch via B4 Relay wrote:
> From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> 
> The TCP BPF code will need to override splice_read(), so add it to prot.
> 
> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> ---
>  include/net/inet_common.h |  3 +++
>  include/net/sock.h        |  3 +++
>  net/ipv4/af_inet.c        | 13 ++++++++++++-
>  net/ipv4/tcp_ipv4.c       |  1 +
>  net/ipv6/af_inet6.c       |  2 +-
>  net/ipv6/tcp_ipv6.c       |  1 +
>  6 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/inet_common.h b/include/net/inet_common.h
> index c17a6585d0b0..2a6480d0d575 100644
> --- a/include/net/inet_common.h
> +++ b/include/net/inet_common.h
> @@ -35,6 +35,9 @@ void __inet_accept(struct socket *sock, struct socket *newsock,
>  		   struct sock *newsk);
>  int inet_send_prepare(struct sock *sk);
>  int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
> +ssize_t inet_splice_read(struct socket *sk, loff_t *ppos,
> +			 struct pipe_inode_info *pipe, size_t len,
> +			 unsigned int flags);
>  void inet_splice_eof(struct socket *sock);
>  int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
>  		 int flags);
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 4c37015b7cf7..4bdebcbcca38 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1280,6 +1280,9 @@ struct proto {
>  					   size_t len);
>  	int			(*recvmsg)(struct sock *sk, struct msghdr *msg,
>  					   size_t len, int flags, int *addr_len);
> +	ssize_t			(*splice_read)(struct socket *sock,  loff_t *ppos,
> +					       struct pipe_inode_info *pipe, size_t len,
> +					       unsigned int flags);
>  	void			(*splice_eof)(struct socket *sock);
>  	int			(*bind)(struct sock *sk,
>  					struct sockaddr *addr, int addr_len);
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 76e38092cd8a..9c521d252f66 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -868,6 +868,17 @@ void inet_splice_eof(struct socket *sock)
>  }
>  EXPORT_SYMBOL_GPL(inet_splice_eof);
>  
> +ssize_t inet_splice_read(struct socket *sock, loff_t *ppos,
> +			 struct pipe_inode_info *pipe, size_t len,
> +			 unsigned int flags)
> +{
> +	struct sock *sk = sock->sk;
> +
> +	return INDIRECT_CALL_1(sk->sk_prot->splice_read, tcp_splice_read, sock,
> +			       ppos, pipe, len, flags);
> +}

Could we do a indirect_call_2 here?  something like this?

  INDIRECT_CALL_2(sk->sk_prot->splice_read, tcp_splice_read ...

Otherwise the series looks reasonable to me.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 1/5] net: Add splice_read to prot
  2025-07-11 17:27   ` John Fastabend
@ 2025-07-11 18:48     ` Kuniyuki Iwashima
  2025-07-11 18:52       ` Kuniyuki Iwashima
  0 siblings, 1 reply; 12+ messages in thread
From: Kuniyuki Iwashima @ 2025-07-11 18:48 UTC (permalink / raw)
  To: John Fastabend
  Cc: vincent.whitchurch, Jakub Sitnicki, Jakub Kicinski, netdev, bpf

On Fri, Jul 11, 2025 at 10:27 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> On 2025-07-09 14:47:57, Vincent Whitchurch via B4 Relay wrote:
> > From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> >
> > The TCP BPF code will need to override splice_read(), so add it to prot.
> >
> > Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> > ---
> >  include/net/inet_common.h |  3 +++
> >  include/net/sock.h        |  3 +++
> >  net/ipv4/af_inet.c        | 13 ++++++++++++-
> >  net/ipv4/tcp_ipv4.c       |  1 +
> >  net/ipv6/af_inet6.c       |  2 +-
> >  net/ipv6/tcp_ipv6.c       |  1 +
> >  6 files changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/net/inet_common.h b/include/net/inet_common.h
> > index c17a6585d0b0..2a6480d0d575 100644
> > --- a/include/net/inet_common.h
> > +++ b/include/net/inet_common.h
> > @@ -35,6 +35,9 @@ void __inet_accept(struct socket *sock, struct socket *newsock,
> >                  struct sock *newsk);
> >  int inet_send_prepare(struct sock *sk);
> >  int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
> > +ssize_t inet_splice_read(struct socket *sk, loff_t *ppos,
> > +                      struct pipe_inode_info *pipe, size_t len,
> > +                      unsigned int flags);
> >  void inet_splice_eof(struct socket *sock);
> >  int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
> >                int flags);
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 4c37015b7cf7..4bdebcbcca38 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -1280,6 +1280,9 @@ struct proto {
> >                                          size_t len);
> >       int                     (*recvmsg)(struct sock *sk, struct msghdr *msg,
> >                                          size_t len, int flags, int *addr_len);
> > +     ssize_t                 (*splice_read)(struct socket *sock,  loff_t *ppos,
> > +                                            struct pipe_inode_info *pipe, size_t len,
> > +                                            unsigned int flags);
> >       void                    (*splice_eof)(struct socket *sock);
> >       int                     (*bind)(struct sock *sk,
> >                                       struct sockaddr *addr, int addr_len);
> > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > index 76e38092cd8a..9c521d252f66 100644
> > --- a/net/ipv4/af_inet.c
> > +++ b/net/ipv4/af_inet.c
> > @@ -868,6 +868,17 @@ void inet_splice_eof(struct socket *sock)
> >  }
> >  EXPORT_SYMBOL_GPL(inet_splice_eof);
> >
> > +ssize_t inet_splice_read(struct socket *sock, loff_t *ppos,
> > +                      struct pipe_inode_info *pipe, size_t len,
> > +                      unsigned int flags)
> > +{
> > +     struct sock *sk = sock->sk;
> > +
> > +     return INDIRECT_CALL_1(sk->sk_prot->splice_read, tcp_splice_read, sock,
> > +                            ppos, pipe, len, flags);
> > +}
>
> Could we do a indirect_call_2 here?  something like this?
>
>   INDIRECT_CALL_2(sk->sk_prot->splice_read, tcp_splice_read ...
>
> Otherwise the series looks reasonable to me.

What's the second candidate ?
I think we should specify the built-in one and cannot use bpf one.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 1/5] net: Add splice_read to prot
  2025-07-11 18:48     ` Kuniyuki Iwashima
@ 2025-07-11 18:52       ` Kuniyuki Iwashima
  0 siblings, 0 replies; 12+ messages in thread
From: Kuniyuki Iwashima @ 2025-07-11 18:52 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jakub Sitnicki, Jakub Kicinski, netdev, bpf, vincent.whitchurch

[ Fixed up Vincent's address as John's and my reply seems not to be delivered. ]

On Fri, Jul 11, 2025 at 11:48 AM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> On Fri, Jul 11, 2025 at 10:27 AM John Fastabend
> <john.fastabend@gmail.com> wrote:
> >
> > On 2025-07-09 14:47:57, Vincent Whitchurch via B4 Relay wrote:
> > > From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> > >
> > > The TCP BPF code will need to override splice_read(), so add it to prot.
> > >
> > > Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> > > ---
> > >  include/net/inet_common.h |  3 +++
> > >  include/net/sock.h        |  3 +++
> > >  net/ipv4/af_inet.c        | 13 ++++++++++++-
> > >  net/ipv4/tcp_ipv4.c       |  1 +
> > >  net/ipv6/af_inet6.c       |  2 +-
> > >  net/ipv6/tcp_ipv6.c       |  1 +
> > >  6 files changed, 21 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/net/inet_common.h b/include/net/inet_common.h
> > > index c17a6585d0b0..2a6480d0d575 100644
> > > --- a/include/net/inet_common.h
> > > +++ b/include/net/inet_common.h
> > > @@ -35,6 +35,9 @@ void __inet_accept(struct socket *sock, struct socket *newsock,
> > >                  struct sock *newsk);
> > >  int inet_send_prepare(struct sock *sk);
> > >  int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
> > > +ssize_t inet_splice_read(struct socket *sk, loff_t *ppos,
> > > +                      struct pipe_inode_info *pipe, size_t len,
> > > +                      unsigned int flags);
> > >  void inet_splice_eof(struct socket *sock);
> > >  int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
> > >                int flags);
> > > diff --git a/include/net/sock.h b/include/net/sock.h
> > > index 4c37015b7cf7..4bdebcbcca38 100644
> > > --- a/include/net/sock.h
> > > +++ b/include/net/sock.h
> > > @@ -1280,6 +1280,9 @@ struct proto {
> > >                                          size_t len);
> > >       int                     (*recvmsg)(struct sock *sk, struct msghdr *msg,
> > >                                          size_t len, int flags, int *addr_len);
> > > +     ssize_t                 (*splice_read)(struct socket *sock,  loff_t *ppos,
> > > +                                            struct pipe_inode_info *pipe, size_t len,
> > > +                                            unsigned int flags);
> > >       void                    (*splice_eof)(struct socket *sock);
> > >       int                     (*bind)(struct sock *sk,
> > >                                       struct sockaddr *addr, int addr_len);
> > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > > index 76e38092cd8a..9c521d252f66 100644
> > > --- a/net/ipv4/af_inet.c
> > > +++ b/net/ipv4/af_inet.c
> > > @@ -868,6 +868,17 @@ void inet_splice_eof(struct socket *sock)
> > >  }
> > >  EXPORT_SYMBOL_GPL(inet_splice_eof);
> > >
> > > +ssize_t inet_splice_read(struct socket *sock, loff_t *ppos,
> > > +                      struct pipe_inode_info *pipe, size_t len,
> > > +                      unsigned int flags)
> > > +{
> > > +     struct sock *sk = sock->sk;
> > > +
> > > +     return INDIRECT_CALL_1(sk->sk_prot->splice_read, tcp_splice_read, sock,
> > > +                            ppos, pipe, len, flags);
> > > +}
> >
> > Could we do a indirect_call_2 here?  something like this?
> >
> >   INDIRECT_CALL_2(sk->sk_prot->splice_read, tcp_splice_read ...
> >
> > Otherwise the series looks reasonable to me.
>
> What's the second candidate ?
> I think we should specify the built-in one and cannot use bpf one.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2)
  2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
                   ` (5 preceding siblings ...)
  2025-07-11 15:57 ` [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) John Fastabend
@ 2025-07-11 23:09 ` Jakub Kicinski
  2025-07-22  3:33   ` John Fastabend
  6 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2025-07-11 23:09 UTC (permalink / raw)
  To: Vincent Whitchurch via B4 Relay
  Cc: vincent.whitchurch, John Fastabend, Jakub Sitnicki,
	Kuniyuki Iwashima, netdev, bpf

On Wed, 09 Jul 2025 14:47:56 +0200 Vincent Whitchurch via B4 Relay
wrote:
> I noticed that if the verdict callback returns SK_PASS, using splice(2)
> to read from a socket in a sockmap does not work since it never sees the
> data queued on to it.  As far as I can see, this is not a regression but
> just something that has never worked, but it does make sockmap unusable
> if you can't guarantee that the programs using the socket will not use
> splice(2).

On v2 you should you can't replace ops for passively opened
connections. Can that not be addressed instead of adding
an indirect call on the data path?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2)
  2025-07-11 23:09 ` Jakub Kicinski
@ 2025-07-22  3:33   ` John Fastabend
  0 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2025-07-22  3:33 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vincent Whitchurch via B4 Relay, vincent.whitchurch,
	Jakub Sitnicki, Kuniyuki Iwashima, netdev, bpf

On 2025-07-11 16:09:31, Jakub Kicinski wrote:
> On Wed, 09 Jul 2025 14:47:56 +0200 Vincent Whitchurch via B4 Relay
> wrote:
> > I noticed that if the verdict callback returns SK_PASS, using splice(2)
> > to read from a socket in a sockmap does not work since it never sees the
> > data queued on to it.  As far as I can see, this is not a regression but
> > just something that has never worked, but it does make sockmap unusable
> > if you can't guarantee that the programs using the socket will not use
> > splice(2).
> 
> On v2 you should you can't replace ops for passively opened
> connections. Can that not be addressed instead of adding
> an indirect call on the data path?

I guess I missed above? We can create the psock and add the socket
to a map on the accept()?

It would be good to get some fix for this.

Thanks,
John

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-07-22  3:33 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
2025-07-09 12:47 ` [PATCH bpf-next v3 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
2025-07-11 17:27   ` John Fastabend
2025-07-11 18:48     ` Kuniyuki Iwashima
2025-07-11 18:52       ` Kuniyuki Iwashima
2025-07-09 12:47 ` [PATCH bpf-next v3 2/5] tcp_bpf: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
2025-07-09 12:47 ` [PATCH bpf-next v3 3/5] selftests/bpf: sockmap: Exit with error on failure Vincent Whitchurch via B4 Relay
2025-07-09 12:48 ` [PATCH bpf-next v3 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict Vincent Whitchurch via B4 Relay
2025-07-09 12:48 ` [PATCH bpf-next v3 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test Vincent Whitchurch via B4 Relay
2025-07-11 15:57 ` [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) John Fastabend
2025-07-11 23:09 ` Jakub Kicinski
2025-07-22  3:33   ` John Fastabend

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).