bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/5] sockmap: Fix reading with splice(2)
@ 2025-06-09 13:26 Vincent Whitchurch via B4 Relay
  2025-06-09 13:26 ` [PATCH bpf-next v2 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-06-09 13:26 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: Kuniyuki Iwashima, netdev, bpf, Vincent Whitchurch

I noticed that if the verdict callback returns SK_PASS, using splice(2)
to read from the socket doesn't work since it never sees the data queued
on to it.  As far as I can see, this is not a regression but just
something that has never worked.

This series attempts to fix it and add a test for it.

---
Changes in v2:
- Rebase on latest bpf-next/master
- Remove unnecessary change in inet_dgram_ops
- Remove ->splice_read NULL check in inet_splice_read()
- Use INDIRECT_CALL_1() in inet_splice_read()
- Include test case in default test suite in test_sockmap
- Link to v1: https://lore.kernel.org/r/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com

---
Vincent Whitchurch (5):
      net: Add splice_read to prot
      tcp_bpf: Fix reading with splice(2)
      selftests/bpf: sockmap: Exit with error on failure
      selftests/bpf: sockmap: Allow SK_PASS in verdict
      selftests/bpf: sockmap: Add splice + SK_PASS regression test

 include/net/inet_common.h                  |  3 +
 include/net/sock.h                         |  3 +
 net/ipv4/af_inet.c                         | 13 ++++-
 net/ipv4/tcp_bpf.c                         |  9 +++
 net/ipv4/tcp_ipv4.c                        |  1 +
 net/ipv6/af_inet6.c                        |  2 +-
 net/ipv6/tcp_ipv6.c                        |  1 +
 tools/testing/selftests/bpf/test_sockmap.c | 90 +++++++++++++++++++++++++++++-
 8 files changed, 118 insertions(+), 4 deletions(-)
---
base-commit: e41079f53e8792c99cc8888f545c31bc341ea9ac
change-id: 20240606-sockmap-splice-d371ac07d7b4

Best regards,
-- 
Vincent Whitchurch <vincent.whitchurch@datadoghq.com>



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH bpf-next v2 1/5] net: Add splice_read to prot
  2025-06-09 13:26 [PATCH bpf-next v2 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
@ 2025-06-09 13:26 ` Vincent Whitchurch via B4 Relay
  2025-06-09 19:21   ` Jakub Kicinski
  2025-06-09 13:26 ` [PATCH bpf-next v2 2/5] tcp_bpf: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-06-09 13:26 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: Kuniyuki Iwashima, netdev, bpf, Vincent Whitchurch

From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>

The TCP BPF code will need to override splice_read(), so add it to prot.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
 include/net/inet_common.h |  3 +++
 include/net/sock.h        |  3 +++
 net/ipv4/af_inet.c        | 13 ++++++++++++-
 net/ipv4/tcp_ipv4.c       |  1 +
 net/ipv6/af_inet6.c       |  2 +-
 net/ipv6/tcp_ipv6.c       |  1 +
 6 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index c17a6585d0b0..2a6480d0d575 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -35,6 +35,9 @@ void __inet_accept(struct socket *sock, struct socket *newsock,
 		   struct sock *newsk);
 int inet_send_prepare(struct sock *sk);
 int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
+ssize_t inet_splice_read(struct socket *sk, loff_t *ppos,
+			 struct pipe_inode_info *pipe, size_t len,
+			 unsigned int flags);
 void inet_splice_eof(struct socket *sock);
 int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 		 int flags);
diff --git a/include/net/sock.h b/include/net/sock.h
index 92e7c1aae3cc..08f0acf1fce4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1280,6 +1280,9 @@ struct proto {
 					   size_t len);
 	int			(*recvmsg)(struct sock *sk, struct msghdr *msg,
 					   size_t len, int flags, int *addr_len);
+	ssize_t			(*splice_read)(struct socket *sock,  loff_t *ppos,
+					       struct pipe_inode_info *pipe, size_t len,
+					       unsigned int flags);
 	void			(*splice_eof)(struct socket *sock);
 	int			(*bind)(struct sock *sk,
 					struct sockaddr *addr, int addr_len);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 76e38092cd8a..9c521d252f66 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -868,6 +868,17 @@ void inet_splice_eof(struct socket *sock)
 }
 EXPORT_SYMBOL_GPL(inet_splice_eof);
 
+ssize_t inet_splice_read(struct socket *sock, loff_t *ppos,
+			 struct pipe_inode_info *pipe, size_t len,
+			 unsigned int flags)
+{
+	struct sock *sk = sock->sk;
+
+	return INDIRECT_CALL_1(sk->sk_prot->splice_read, tcp_splice_read, sock,
+			       ppos, pipe, len, flags);
+}
+EXPORT_SYMBOL_GPL(inet_splice_read);
+
 INDIRECT_CALLABLE_DECLARE(int udp_recvmsg(struct sock *, struct msghdr *,
 					  size_t, int, int *));
 int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
@@ -1071,7 +1082,7 @@ const struct proto_ops inet_stream_ops = {
 	.mmap		   = tcp_mmap,
 #endif
 	.splice_eof	   = inet_splice_eof,
-	.splice_read	   = tcp_splice_read,
+	.splice_read	   = inet_splice_read,
 	.set_peek_off      = sk_set_peek_off,
 	.read_sock	   = tcp_read_sock,
 	.read_skb	   = tcp_read_skb,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 6a14f9e6fef6..0391b6bef35b 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3375,6 +3375,7 @@ struct proto tcp_prot = {
 	.keepalive		= tcp_set_keepalive,
 	.recvmsg		= tcp_recvmsg,
 	.sendmsg		= tcp_sendmsg,
+	.splice_read		= tcp_splice_read,
 	.splice_eof		= tcp_splice_eof,
 	.backlog_rcv		= tcp_v4_do_rcv,
 	.release_cb		= tcp_release_cb,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index acaff1296783..2d6a1d669452 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -705,7 +705,7 @@ const struct proto_ops inet6_stream_ops = {
 #endif
 	.splice_eof	   = inet_splice_eof,
 	.sendmsg_locked    = tcp_sendmsg_locked,
-	.splice_read	   = tcp_splice_read,
+	.splice_read	   = inet_splice_read,
 	.set_peek_off      = sk_set_peek_off,
 	.read_sock	   = tcp_read_sock,
 	.read_skb	   = tcp_read_skb,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index e8e68a142649..f7558e443de6 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -2342,6 +2342,7 @@ struct proto tcpv6_prot = {
 	.keepalive		= tcp_set_keepalive,
 	.recvmsg		= tcp_recvmsg,
 	.sendmsg		= tcp_sendmsg,
+	.splice_read		= tcp_splice_read,
 	.splice_eof		= tcp_splice_eof,
 	.backlog_rcv		= tcp_v6_do_rcv,
 	.release_cb		= tcp_release_cb,

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH bpf-next v2 2/5] tcp_bpf: Fix reading with splice(2)
  2025-06-09 13:26 [PATCH bpf-next v2 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
  2025-06-09 13:26 ` [PATCH bpf-next v2 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
@ 2025-06-09 13:26 ` Vincent Whitchurch via B4 Relay
  2025-06-09 13:27 ` [PATCH bpf-next v2 3/5] selftests/bpf: sockmap: Exit with error on failure Vincent Whitchurch via B4 Relay
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-06-09 13:26 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: Kuniyuki Iwashima, netdev, bpf, Vincent Whitchurch

From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>

If a socket is added to a sockmap with a verdict program which returns
SK_PASS, splice(2) is not able to read from the socket.

The verdict code removes skbs from the receive queue, checks them using
the bpf program, and then re-queues them onto a separate queue
(psock->ingress_msg).  The sockmap code modifies the TCP recvmsg hook to
check this second queue also so that works. But the splice_read hooks is
not modified and the default tcp_read_splice() only reads the normal
receive queue so it never sees the skbs which have been re-queued.

Fix it by using copy_splice_read() when replacing the proto for the
sockmap.  This could eventually be replaced with a more efficient custom
version.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
 net/ipv4/tcp_bpf.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index ba581785adb4..197429b6adae 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -3,6 +3,7 @@
 
 #include <linux/skmsg.h>
 #include <linux/filter.h>
+#include <linux/fs.h>
 #include <linux/bpf.h>
 #include <linux/init.h>
 #include <linux/wait.h>
@@ -381,6 +382,13 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	return ret;
 }
 
+static ssize_t tcp_bpf_splice_read(struct socket *sock, loff_t *ppos,
+				   struct pipe_inode_info *pipe, size_t len,
+				   unsigned int flags)
+{
+	return copy_splice_read(sock->file, ppos, pipe, len, flags);
+}
+
 static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 				struct sk_msg *msg, int *copied, int flags)
 {
@@ -605,6 +613,7 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
 	prot[TCP_BPF_BASE].destroy		= sock_map_destroy;
 	prot[TCP_BPF_BASE].close		= sock_map_close;
 	prot[TCP_BPF_BASE].recvmsg		= tcp_bpf_recvmsg;
+	prot[TCP_BPF_BASE].splice_read		= tcp_bpf_splice_read;
 	prot[TCP_BPF_BASE].sock_is_readable	= sk_msg_is_readable;
 
 	prot[TCP_BPF_TX]			= prot[TCP_BPF_BASE];

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH bpf-next v2 3/5] selftests/bpf: sockmap: Exit with error on failure
  2025-06-09 13:26 [PATCH bpf-next v2 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
  2025-06-09 13:26 ` [PATCH bpf-next v2 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
  2025-06-09 13:26 ` [PATCH bpf-next v2 2/5] tcp_bpf: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
@ 2025-06-09 13:27 ` Vincent Whitchurch via B4 Relay
  2025-06-09 13:27 ` [PATCH bpf-next v2 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict Vincent Whitchurch via B4 Relay
  2025-06-09 13:27 ` [PATCH bpf-next v2 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test Vincent Whitchurch via B4 Relay
  4 siblings, 0 replies; 9+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-06-09 13:27 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: Kuniyuki Iwashima, netdev, bpf, Vincent Whitchurch

From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>

If any tests failed, exit the program with a non-zero
error code.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
 tools/testing/selftests/bpf/test_sockmap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index fd2da2234cc9..cf1c36ed32c1 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -2238,7 +2238,9 @@ int main(int argc, char **argv)
 	close(cg_fd);
 	if (cg_created)
 		cleanup_cgroup_environment();
-	return err;
+	if (err)
+		return err;
+	return failed ? 1 : 0;
 }
 
 void running_handler(int a)

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH bpf-next v2 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict
  2025-06-09 13:26 [PATCH bpf-next v2 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
                   ` (2 preceding siblings ...)
  2025-06-09 13:27 ` [PATCH bpf-next v2 3/5] selftests/bpf: sockmap: Exit with error on failure Vincent Whitchurch via B4 Relay
@ 2025-06-09 13:27 ` Vincent Whitchurch via B4 Relay
  2025-06-09 13:27 ` [PATCH bpf-next v2 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test Vincent Whitchurch via B4 Relay
  4 siblings, 0 replies; 9+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-06-09 13:27 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: Kuniyuki Iwashima, netdev, bpf, Vincent Whitchurch

From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>

Add an option to always return SK_PASS in the verdict callback
instead of redirecting the skb.  This allows testing cases
which are not covered by the test program as of now.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
 tools/testing/selftests/bpf/test_sockmap.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index cf1c36ed32c1..8be2714dd573 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -80,6 +80,7 @@ int txmsg_end_push;
 int txmsg_start_pop;
 int txmsg_pop;
 int txmsg_ingress;
+int txmsg_pass_skb;
 int txmsg_redir_skb;
 int txmsg_ktls_skb;
 int txmsg_ktls_skb_drop;
@@ -114,6 +115,7 @@ static const struct option long_options[] = {
 	{"txmsg_start_pop",  required_argument,	NULL, 'w'},
 	{"txmsg_pop",	     required_argument,	NULL, 'x'},
 	{"txmsg_ingress", no_argument,		&txmsg_ingress, 1 },
+	{"txmsg_pass_skb", no_argument,		&txmsg_pass_skb, 1 },
 	{"txmsg_redir_skb", no_argument,	&txmsg_redir_skb, 1 },
 	{"ktls", no_argument,			&ktls, 1 },
 	{"peek", no_argument,			&peek_flag, 1 },
@@ -183,6 +185,7 @@ static void test_reset(void)
 	txmsg_pass = txmsg_drop = txmsg_redir = 0;
 	txmsg_apply = txmsg_cork = 0;
 	txmsg_ingress = txmsg_redir_skb = 0;
+	txmsg_pass_skb = 0;
 	txmsg_ktls_skb = txmsg_ktls_skb_drop = txmsg_ktls_skb_redir = 0;
 	txmsg_omit_skb_parser = 0;
 	skb_use_parser = 0;
@@ -1050,6 +1053,7 @@ static int run_options(struct sockmap_options *options, int cg_fd,  int test)
 {
 	int i, key, next_key, err, zero = 0;
 	struct bpf_program *tx_prog;
+	struct bpf_program *skb_verdict_prog;
 
 	/* If base test skip BPF setup */
 	if (test == BASE || test == BASE_SENDPAGE)
@@ -1066,7 +1070,12 @@ static int run_options(struct sockmap_options *options, int cg_fd,  int test)
 		}
 	}
 
-	links[1] = bpf_program__attach_sockmap(progs[1], map_fd[0]);
+	if (txmsg_pass_skb)
+		skb_verdict_prog = progs[2];
+	else
+		skb_verdict_prog = progs[1];
+
+	links[1] = bpf_program__attach_sockmap(skb_verdict_prog, map_fd[0]);
 	if (!links[1]) {
 		fprintf(stderr, "ERROR: bpf_program__attach_sockmap (sockmap): (%s)\n",
 			strerror(errno));
@@ -1455,6 +1464,8 @@ static void test_options(char *options)
 	}
 	if (txmsg_ingress)
 		append_str(options, "ingress,", OPTSTRING);
+	if (txmsg_pass_skb)
+		append_str(options, "pass_skb,", OPTSTRING);
 	if (txmsg_redir_skb)
 		append_str(options, "redir_skb,", OPTSTRING);
 	if (txmsg_ktls_skb)

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH bpf-next v2 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test
  2025-06-09 13:26 [PATCH bpf-next v2 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
                   ` (3 preceding siblings ...)
  2025-06-09 13:27 ` [PATCH bpf-next v2 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict Vincent Whitchurch via B4 Relay
@ 2025-06-09 13:27 ` Vincent Whitchurch via B4 Relay
  4 siblings, 0 replies; 9+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-06-09 13:27 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: Kuniyuki Iwashima, netdev, bpf, Vincent Whitchurch

From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>

Add a test which checks that splice(2) is still able to read data from
the socket if it is added to a verdict program which returns SK_PASS.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
 tools/testing/selftests/bpf/test_sockmap.c | 73 ++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 8be2714dd573..b4102161db62 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2017-2018 Covalent IO, Inc. http://covalent.io
+#define _GNU_SOURCE
 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/socket.h>
@@ -965,6 +966,61 @@ static int sendmsg_test(struct sockmap_options *opt)
 	return err;
 }
 
+static int splice_test(struct sockmap_options *opt)
+{
+	int pipefds[2];
+	char buf[1024] = {0};
+	ssize_t bytes;
+	int ret;
+
+	ret = pipe(pipefds);
+	if (ret < 0) {
+		perror("pipe");
+		return ret;
+	}
+
+	bytes = send(c1, buf, sizeof(buf), 0);
+	if (bytes < 0) {
+		perror("send failed");
+		return bytes;
+	}
+	if (bytes == 0) {
+		fprintf(stderr, "send wrote zero bytes\n");
+		return -1;
+	}
+
+	bytes = write(pipefds[1], buf, sizeof(buf));
+	if (bytes < 0) {
+		perror("pipe write failed");
+		return bytes;
+	}
+
+	bytes = splice(p1, NULL, pipefds[1], NULL, sizeof(buf), 0);
+	if (bytes < 0) {
+		perror("splice failed");
+		return bytes;
+	}
+	if (bytes == 0) {
+		fprintf(stderr, "spliced zero bytes\n");
+		return -1;
+	}
+
+	bytes = read(pipefds[0], buf, sizeof(buf));
+	if (bytes < 0) {
+		perror("pipe read failed");
+		return bytes;
+	}
+	if (bytes == 0) {
+		fprintf(stderr, "EOF from pipe\n");
+		return -1;
+	}
+
+	close(pipefds[1]);
+	close(pipefds[0]);
+
+	return 0;
+}
+
 static int forever_ping_pong(int rate, struct sockmap_options *opt)
 {
 	struct timeval timeout;
@@ -1047,6 +1103,7 @@ enum {
 	BASE,
 	BASE_SENDPAGE,
 	SENDPAGE,
+	SPLICE,
 };
 
 static int run_options(struct sockmap_options *options, int cg_fd,  int test)
@@ -1378,6 +1435,8 @@ static int run_options(struct sockmap_options *options, int cg_fd,  int test)
 		options->base = true;
 		options->sendpage = true;
 		err = sendmsg_test(options);
+	} else if (test == SPLICE) {
+		err = splice_test(options);
 	} else
 		fprintf(stderr, "unknown test\n");
 out:
@@ -1993,6 +2052,17 @@ static int populate_progs(char *bpf_file)
 	return 0;
 }
 
+static void test_txmsg_splice_pass(int cgrp, struct sockmap_options *opt)
+{
+	txmsg_omit_skb_parser = 1;
+	txmsg_pass_skb = 1;
+
+	__test_exec(cgrp, SPLICE, opt);
+
+	txmsg_omit_skb_parser = 0;
+	txmsg_pass_skb = 0;
+}
+
 struct _test test[] = {
 	{"txmsg test passthrough", test_txmsg_pass},
 	{"txmsg test redirect", test_txmsg_redir},
@@ -2009,6 +2079,7 @@ struct _test test[] = {
 	{"txmsg test push/pop data", test_txmsg_push_pop},
 	{"txmsg test ingress parser", test_txmsg_ingress_parser},
 	{"txmsg test ingress parser2", test_txmsg_ingress_parser2},
+	{"txmsg test splice pass", test_txmsg_splice_pass},
 };
 
 static int check_whitelist(struct _test *t, struct sockmap_options *opt)
@@ -2187,6 +2258,8 @@ int main(int argc, char **argv)
 				test = BASE_SENDPAGE;
 			} else if (strcmp(optarg, "sendpage") == 0) {
 				test = SENDPAGE;
+			} else if (strcmp(optarg, "splice") == 0) {
+				test = SPLICE;
 			} else {
 				usage(argv);
 				return -1;

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next v2 1/5] net: Add splice_read to prot
  2025-06-09 13:26 ` [PATCH bpf-next v2 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
@ 2025-06-09 19:21   ` Jakub Kicinski
  2025-06-11  8:57     ` Vincent Whitchurch
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2025-06-09 19:21 UTC (permalink / raw)
  To: Vincent Whitchurch via B4 Relay
  Cc: vincent.whitchurch, John Fastabend, Jakub Sitnicki,
	Kuniyuki Iwashima, netdev, bpf

On Mon, 09 Jun 2025 15:26:58 +0200 Vincent Whitchurch via B4 Relay
wrote:
> The TCP BPF code will need to override splice_read(), so add it to prot.

Can we not override proto_ops in tcp_bpf for some specific reason?
TLS does that, IIUC.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next v2 1/5] net: Add splice_read to prot
  2025-06-09 19:21   ` Jakub Kicinski
@ 2025-06-11  8:57     ` Vincent Whitchurch
  2025-06-20 15:44       ` Vincent Whitchurch
  0 siblings, 1 reply; 9+ messages in thread
From: Vincent Whitchurch @ 2025-06-11  8:57 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vincent Whitchurch via B4 Relay, John Fastabend, Jakub Sitnicki,
	Kuniyuki Iwashima, netdev, bpf

On Mon, Jun 9, 2025 at 9:21 PM Jakub Kicinski <kuba@kernel.org> wrote:
> Can we not override proto_ops in tcp_bpf for some specific reason?
> TLS does that, IIUC.

I see that TLS writes to sk->sk_socket->ops to override the proto_ops.
I added some prints to tcp_bpf_update_proto() but there I see that
sk->sk_socket is NULL in some code paths, like the one below.

 tcp_bpf_update_proto: restore 0 sk_prot 000000002cf13dcc sk_socket
0000000000000000
 CPU: 0 UID: 0 PID: 392 Comm: test_sockmap Not tainted
6.15.0-12313-g39e87f4ff7c3-dirty #77 PREEMPT(voluntary)
 Call Trace:
  <IRQ>
  dump_stack_lvl+0x83/0xa0
  tcp_bpf_update_proto+0x116/0x790
  sock_map_link+0x425/0xdd0
  sock_map_update_common+0xb8/0x6a0
  bpf_sock_map_update+0x102/0x190
  bpf_prog_4d9ceaf804942d01_bpf_sockmap+0x79/0x81
  __cgroup_bpf_run_filter_sock_ops+0x1db/0x4b0
  tcp_init_transfer+0x852/0xc00
  tcp_rcv_state_process+0x3147/0x4b30
  tcp_child_process+0x346/0x8b0
  tcp_v4_rcv+0x1616/0x3e10
  ip_protocol_deliver_rcu+0x93/0x370
  ip_local_deliver_finish+0x29c/0x420
  ip_local_deliver+0x193/0x450
  ip_rcv+0x497/0x710
  __netif_receive_skb_one_core+0x164/0x1b0
  process_backlog+0x3a7/0x12b0
  __napi_poll.constprop.0+0xa0/0x440
  net_rx_action+0x8ce/0xca0
  handle_softirqs+0x1c3/0x7b0
  do_softirq+0xa5/0xd0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next v2 1/5] net: Add splice_read to prot
  2025-06-11  8:57     ` Vincent Whitchurch
@ 2025-06-20 15:44       ` Vincent Whitchurch
  0 siblings, 0 replies; 9+ messages in thread
From: Vincent Whitchurch @ 2025-06-20 15:44 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vincent Whitchurch via B4 Relay, John Fastabend, Jakub Sitnicki,
	Kuniyuki Iwashima, netdev, bpf

On Wed, Jun 11, 2025 at 10:57 AM Vincent Whitchurch
<vincent.whitchurch@datadoghq.com> wrote:
> On Mon, Jun 9, 2025 at 9:21 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > Can we not override proto_ops in tcp_bpf for some specific reason?
> > TLS does that, IIUC.
>
> I see that TLS writes to sk->sk_socket->ops to override the proto_ops.
> I added some prints to tcp_bpf_update_proto() but there I see that
> sk->sk_socket is NULL in some code paths, like the one below.

To expand on this: TLS is able to override the sk->sk_socket->ops
since it can only be installed on the socket via setsockopt(2).
tcp_bpf on the other hand allows being installed on passively
established sockets before they have a sk->sk_socket assigned via
accept(2). So, AFAICS, we can't use the same override mechanism as
TLS.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-06-20 15:45 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-09 13:26 [PATCH bpf-next v2 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
2025-06-09 13:26 ` [PATCH bpf-next v2 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
2025-06-09 19:21   ` Jakub Kicinski
2025-06-11  8:57     ` Vincent Whitchurch
2025-06-20 15:44       ` Vincent Whitchurch
2025-06-09 13:26 ` [PATCH bpf-next v2 2/5] tcp_bpf: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
2025-06-09 13:27 ` [PATCH bpf-next v2 3/5] selftests/bpf: sockmap: Exit with error on failure Vincent Whitchurch via B4 Relay
2025-06-09 13:27 ` [PATCH bpf-next v2 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict Vincent Whitchurch via B4 Relay
2025-06-09 13:27 ` [PATCH bpf-next v2 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test Vincent Whitchurch via B4 Relay

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).