* [PATCH bpf-next v2 0/5] sockmap: Fix reading with splice(2)
@ 2025-06-09 13:26 Vincent Whitchurch via B4 Relay
2025-06-09 13:26 ` [PATCH bpf-next v2 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-06-09 13:26 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki
Cc: Kuniyuki Iwashima, netdev, bpf, Vincent Whitchurch
I noticed that if the verdict callback returns SK_PASS, using splice(2)
to read from the socket doesn't work since it never sees the data queued
on to it. As far as I can see, this is not a regression but just
something that has never worked.
This series attempts to fix it and add a test for it.
---
Changes in v2:
- Rebase on latest bpf-next/master
- Remove unnecessary change in inet_dgram_ops
- Remove ->splice_read NULL check in inet_splice_read()
- Use INDIRECT_CALL_1() in inet_splice_read()
- Include test case in default test suite in test_sockmap
- Link to v1: https://lore.kernel.org/r/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com
---
Vincent Whitchurch (5):
net: Add splice_read to prot
tcp_bpf: Fix reading with splice(2)
selftests/bpf: sockmap: Exit with error on failure
selftests/bpf: sockmap: Allow SK_PASS in verdict
selftests/bpf: sockmap: Add splice + SK_PASS regression test
include/net/inet_common.h | 3 +
include/net/sock.h | 3 +
net/ipv4/af_inet.c | 13 ++++-
net/ipv4/tcp_bpf.c | 9 +++
net/ipv4/tcp_ipv4.c | 1 +
net/ipv6/af_inet6.c | 2 +-
net/ipv6/tcp_ipv6.c | 1 +
tools/testing/selftests/bpf/test_sockmap.c | 90 +++++++++++++++++++++++++++++-
8 files changed, 118 insertions(+), 4 deletions(-)
---
base-commit: e41079f53e8792c99cc8888f545c31bc341ea9ac
change-id: 20240606-sockmap-splice-d371ac07d7b4
Best regards,
--
Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH bpf-next v2 1/5] net: Add splice_read to prot
2025-06-09 13:26 [PATCH bpf-next v2 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
@ 2025-06-09 13:26 ` Vincent Whitchurch via B4 Relay
2025-06-09 19:21 ` Jakub Kicinski
2025-06-09 13:26 ` [PATCH bpf-next v2 2/5] tcp_bpf: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
` (3 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-06-09 13:26 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki
Cc: Kuniyuki Iwashima, netdev, bpf, Vincent Whitchurch
From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
The TCP BPF code will need to override splice_read(), so add it to prot.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
include/net/inet_common.h | 3 +++
include/net/sock.h | 3 +++
net/ipv4/af_inet.c | 13 ++++++++++++-
net/ipv4/tcp_ipv4.c | 1 +
net/ipv6/af_inet6.c | 2 +-
net/ipv6/tcp_ipv6.c | 1 +
6 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index c17a6585d0b0..2a6480d0d575 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -35,6 +35,9 @@ void __inet_accept(struct socket *sock, struct socket *newsock,
struct sock *newsk);
int inet_send_prepare(struct sock *sk);
int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
+ssize_t inet_splice_read(struct socket *sk, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags);
void inet_splice_eof(struct socket *sock);
int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
int flags);
diff --git a/include/net/sock.h b/include/net/sock.h
index 92e7c1aae3cc..08f0acf1fce4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1280,6 +1280,9 @@ struct proto {
size_t len);
int (*recvmsg)(struct sock *sk, struct msghdr *msg,
size_t len, int flags, int *addr_len);
+ ssize_t (*splice_read)(struct socket *sock, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags);
void (*splice_eof)(struct socket *sock);
int (*bind)(struct sock *sk,
struct sockaddr *addr, int addr_len);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 76e38092cd8a..9c521d252f66 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -868,6 +868,17 @@ void inet_splice_eof(struct socket *sock)
}
EXPORT_SYMBOL_GPL(inet_splice_eof);
+ssize_t inet_splice_read(struct socket *sock, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
+{
+ struct sock *sk = sock->sk;
+
+ return INDIRECT_CALL_1(sk->sk_prot->splice_read, tcp_splice_read, sock,
+ ppos, pipe, len, flags);
+}
+EXPORT_SYMBOL_GPL(inet_splice_read);
+
INDIRECT_CALLABLE_DECLARE(int udp_recvmsg(struct sock *, struct msghdr *,
size_t, int, int *));
int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
@@ -1071,7 +1082,7 @@ const struct proto_ops inet_stream_ops = {
.mmap = tcp_mmap,
#endif
.splice_eof = inet_splice_eof,
- .splice_read = tcp_splice_read,
+ .splice_read = inet_splice_read,
.set_peek_off = sk_set_peek_off,
.read_sock = tcp_read_sock,
.read_skb = tcp_read_skb,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 6a14f9e6fef6..0391b6bef35b 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3375,6 +3375,7 @@ struct proto tcp_prot = {
.keepalive = tcp_set_keepalive,
.recvmsg = tcp_recvmsg,
.sendmsg = tcp_sendmsg,
+ .splice_read = tcp_splice_read,
.splice_eof = tcp_splice_eof,
.backlog_rcv = tcp_v4_do_rcv,
.release_cb = tcp_release_cb,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index acaff1296783..2d6a1d669452 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -705,7 +705,7 @@ const struct proto_ops inet6_stream_ops = {
#endif
.splice_eof = inet_splice_eof,
.sendmsg_locked = tcp_sendmsg_locked,
- .splice_read = tcp_splice_read,
+ .splice_read = inet_splice_read,
.set_peek_off = sk_set_peek_off,
.read_sock = tcp_read_sock,
.read_skb = tcp_read_skb,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index e8e68a142649..f7558e443de6 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -2342,6 +2342,7 @@ struct proto tcpv6_prot = {
.keepalive = tcp_set_keepalive,
.recvmsg = tcp_recvmsg,
.sendmsg = tcp_sendmsg,
+ .splice_read = tcp_splice_read,
.splice_eof = tcp_splice_eof,
.backlog_rcv = tcp_v6_do_rcv,
.release_cb = tcp_release_cb,
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH bpf-next v2 2/5] tcp_bpf: Fix reading with splice(2)
2025-06-09 13:26 [PATCH bpf-next v2 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
2025-06-09 13:26 ` [PATCH bpf-next v2 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
@ 2025-06-09 13:26 ` Vincent Whitchurch via B4 Relay
2025-06-09 13:27 ` [PATCH bpf-next v2 3/5] selftests/bpf: sockmap: Exit with error on failure Vincent Whitchurch via B4 Relay
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-06-09 13:26 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki
Cc: Kuniyuki Iwashima, netdev, bpf, Vincent Whitchurch
From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
If a socket is added to a sockmap with a verdict program which returns
SK_PASS, splice(2) is not able to read from the socket.
The verdict code removes skbs from the receive queue, checks them using
the bpf program, and then re-queues them onto a separate queue
(psock->ingress_msg). The sockmap code modifies the TCP recvmsg hook to
check this second queue also so that works. But the splice_read hooks is
not modified and the default tcp_read_splice() only reads the normal
receive queue so it never sees the skbs which have been re-queued.
Fix it by using copy_splice_read() when replacing the proto for the
sockmap. This could eventually be replaced with a more efficient custom
version.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
net/ipv4/tcp_bpf.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index ba581785adb4..197429b6adae 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -3,6 +3,7 @@
#include <linux/skmsg.h>
#include <linux/filter.h>
+#include <linux/fs.h>
#include <linux/bpf.h>
#include <linux/init.h>
#include <linux/wait.h>
@@ -381,6 +382,13 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
return ret;
}
+static ssize_t tcp_bpf_splice_read(struct socket *sock, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
+{
+ return copy_splice_read(sock->file, ppos, pipe, len, flags);
+}
+
static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
struct sk_msg *msg, int *copied, int flags)
{
@@ -605,6 +613,7 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
prot[TCP_BPF_BASE].destroy = sock_map_destroy;
prot[TCP_BPF_BASE].close = sock_map_close;
prot[TCP_BPF_BASE].recvmsg = tcp_bpf_recvmsg;
+ prot[TCP_BPF_BASE].splice_read = tcp_bpf_splice_read;
prot[TCP_BPF_BASE].sock_is_readable = sk_msg_is_readable;
prot[TCP_BPF_TX] = prot[TCP_BPF_BASE];
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH bpf-next v2 3/5] selftests/bpf: sockmap: Exit with error on failure
2025-06-09 13:26 [PATCH bpf-next v2 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
2025-06-09 13:26 ` [PATCH bpf-next v2 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
2025-06-09 13:26 ` [PATCH bpf-next v2 2/5] tcp_bpf: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
@ 2025-06-09 13:27 ` Vincent Whitchurch via B4 Relay
2025-06-09 13:27 ` [PATCH bpf-next v2 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict Vincent Whitchurch via B4 Relay
2025-06-09 13:27 ` [PATCH bpf-next v2 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test Vincent Whitchurch via B4 Relay
4 siblings, 0 replies; 9+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-06-09 13:27 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki
Cc: Kuniyuki Iwashima, netdev, bpf, Vincent Whitchurch
From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
If any tests failed, exit the program with a non-zero
error code.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
tools/testing/selftests/bpf/test_sockmap.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index fd2da2234cc9..cf1c36ed32c1 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -2238,7 +2238,9 @@ int main(int argc, char **argv)
close(cg_fd);
if (cg_created)
cleanup_cgroup_environment();
- return err;
+ if (err)
+ return err;
+ return failed ? 1 : 0;
}
void running_handler(int a)
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH bpf-next v2 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict
2025-06-09 13:26 [PATCH bpf-next v2 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
` (2 preceding siblings ...)
2025-06-09 13:27 ` [PATCH bpf-next v2 3/5] selftests/bpf: sockmap: Exit with error on failure Vincent Whitchurch via B4 Relay
@ 2025-06-09 13:27 ` Vincent Whitchurch via B4 Relay
2025-06-09 13:27 ` [PATCH bpf-next v2 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test Vincent Whitchurch via B4 Relay
4 siblings, 0 replies; 9+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-06-09 13:27 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki
Cc: Kuniyuki Iwashima, netdev, bpf, Vincent Whitchurch
From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
Add an option to always return SK_PASS in the verdict callback
instead of redirecting the skb. This allows testing cases
which are not covered by the test program as of now.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
tools/testing/selftests/bpf/test_sockmap.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index cf1c36ed32c1..8be2714dd573 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -80,6 +80,7 @@ int txmsg_end_push;
int txmsg_start_pop;
int txmsg_pop;
int txmsg_ingress;
+int txmsg_pass_skb;
int txmsg_redir_skb;
int txmsg_ktls_skb;
int txmsg_ktls_skb_drop;
@@ -114,6 +115,7 @@ static const struct option long_options[] = {
{"txmsg_start_pop", required_argument, NULL, 'w'},
{"txmsg_pop", required_argument, NULL, 'x'},
{"txmsg_ingress", no_argument, &txmsg_ingress, 1 },
+ {"txmsg_pass_skb", no_argument, &txmsg_pass_skb, 1 },
{"txmsg_redir_skb", no_argument, &txmsg_redir_skb, 1 },
{"ktls", no_argument, &ktls, 1 },
{"peek", no_argument, &peek_flag, 1 },
@@ -183,6 +185,7 @@ static void test_reset(void)
txmsg_pass = txmsg_drop = txmsg_redir = 0;
txmsg_apply = txmsg_cork = 0;
txmsg_ingress = txmsg_redir_skb = 0;
+ txmsg_pass_skb = 0;
txmsg_ktls_skb = txmsg_ktls_skb_drop = txmsg_ktls_skb_redir = 0;
txmsg_omit_skb_parser = 0;
skb_use_parser = 0;
@@ -1050,6 +1053,7 @@ static int run_options(struct sockmap_options *options, int cg_fd, int test)
{
int i, key, next_key, err, zero = 0;
struct bpf_program *tx_prog;
+ struct bpf_program *skb_verdict_prog;
/* If base test skip BPF setup */
if (test == BASE || test == BASE_SENDPAGE)
@@ -1066,7 +1070,12 @@ static int run_options(struct sockmap_options *options, int cg_fd, int test)
}
}
- links[1] = bpf_program__attach_sockmap(progs[1], map_fd[0]);
+ if (txmsg_pass_skb)
+ skb_verdict_prog = progs[2];
+ else
+ skb_verdict_prog = progs[1];
+
+ links[1] = bpf_program__attach_sockmap(skb_verdict_prog, map_fd[0]);
if (!links[1]) {
fprintf(stderr, "ERROR: bpf_program__attach_sockmap (sockmap): (%s)\n",
strerror(errno));
@@ -1455,6 +1464,8 @@ static void test_options(char *options)
}
if (txmsg_ingress)
append_str(options, "ingress,", OPTSTRING);
+ if (txmsg_pass_skb)
+ append_str(options, "pass_skb,", OPTSTRING);
if (txmsg_redir_skb)
append_str(options, "redir_skb,", OPTSTRING);
if (txmsg_ktls_skb)
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH bpf-next v2 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test
2025-06-09 13:26 [PATCH bpf-next v2 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
` (3 preceding siblings ...)
2025-06-09 13:27 ` [PATCH bpf-next v2 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict Vincent Whitchurch via B4 Relay
@ 2025-06-09 13:27 ` Vincent Whitchurch via B4 Relay
4 siblings, 0 replies; 9+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-06-09 13:27 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki
Cc: Kuniyuki Iwashima, netdev, bpf, Vincent Whitchurch
From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
Add a test which checks that splice(2) is still able to read data from
the socket if it is added to a verdict program which returns SK_PASS.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
tools/testing/selftests/bpf/test_sockmap.c | 73 ++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 8be2714dd573..b4102161db62 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2017-2018 Covalent IO, Inc. http://covalent.io
+#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
@@ -965,6 +966,61 @@ static int sendmsg_test(struct sockmap_options *opt)
return err;
}
+static int splice_test(struct sockmap_options *opt)
+{
+ int pipefds[2];
+ char buf[1024] = {0};
+ ssize_t bytes;
+ int ret;
+
+ ret = pipe(pipefds);
+ if (ret < 0) {
+ perror("pipe");
+ return ret;
+ }
+
+ bytes = send(c1, buf, sizeof(buf), 0);
+ if (bytes < 0) {
+ perror("send failed");
+ return bytes;
+ }
+ if (bytes == 0) {
+ fprintf(stderr, "send wrote zero bytes\n");
+ return -1;
+ }
+
+ bytes = write(pipefds[1], buf, sizeof(buf));
+ if (bytes < 0) {
+ perror("pipe write failed");
+ return bytes;
+ }
+
+ bytes = splice(p1, NULL, pipefds[1], NULL, sizeof(buf), 0);
+ if (bytes < 0) {
+ perror("splice failed");
+ return bytes;
+ }
+ if (bytes == 0) {
+ fprintf(stderr, "spliced zero bytes\n");
+ return -1;
+ }
+
+ bytes = read(pipefds[0], buf, sizeof(buf));
+ if (bytes < 0) {
+ perror("pipe read failed");
+ return bytes;
+ }
+ if (bytes == 0) {
+ fprintf(stderr, "EOF from pipe\n");
+ return -1;
+ }
+
+ close(pipefds[1]);
+ close(pipefds[0]);
+
+ return 0;
+}
+
static int forever_ping_pong(int rate, struct sockmap_options *opt)
{
struct timeval timeout;
@@ -1047,6 +1103,7 @@ enum {
BASE,
BASE_SENDPAGE,
SENDPAGE,
+ SPLICE,
};
static int run_options(struct sockmap_options *options, int cg_fd, int test)
@@ -1378,6 +1435,8 @@ static int run_options(struct sockmap_options *options, int cg_fd, int test)
options->base = true;
options->sendpage = true;
err = sendmsg_test(options);
+ } else if (test == SPLICE) {
+ err = splice_test(options);
} else
fprintf(stderr, "unknown test\n");
out:
@@ -1993,6 +2052,17 @@ static int populate_progs(char *bpf_file)
return 0;
}
+static void test_txmsg_splice_pass(int cgrp, struct sockmap_options *opt)
+{
+ txmsg_omit_skb_parser = 1;
+ txmsg_pass_skb = 1;
+
+ __test_exec(cgrp, SPLICE, opt);
+
+ txmsg_omit_skb_parser = 0;
+ txmsg_pass_skb = 0;
+}
+
struct _test test[] = {
{"txmsg test passthrough", test_txmsg_pass},
{"txmsg test redirect", test_txmsg_redir},
@@ -2009,6 +2079,7 @@ struct _test test[] = {
{"txmsg test push/pop data", test_txmsg_push_pop},
{"txmsg test ingress parser", test_txmsg_ingress_parser},
{"txmsg test ingress parser2", test_txmsg_ingress_parser2},
+ {"txmsg test splice pass", test_txmsg_splice_pass},
};
static int check_whitelist(struct _test *t, struct sockmap_options *opt)
@@ -2187,6 +2258,8 @@ int main(int argc, char **argv)
test = BASE_SENDPAGE;
} else if (strcmp(optarg, "sendpage") == 0) {
test = SENDPAGE;
+ } else if (strcmp(optarg, "splice") == 0) {
+ test = SPLICE;
} else {
usage(argv);
return -1;
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH bpf-next v2 1/5] net: Add splice_read to prot
2025-06-09 13:26 ` [PATCH bpf-next v2 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
@ 2025-06-09 19:21 ` Jakub Kicinski
2025-06-11 8:57 ` Vincent Whitchurch
0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2025-06-09 19:21 UTC (permalink / raw)
To: Vincent Whitchurch via B4 Relay
Cc: vincent.whitchurch, John Fastabend, Jakub Sitnicki,
Kuniyuki Iwashima, netdev, bpf
On Mon, 09 Jun 2025 15:26:58 +0200 Vincent Whitchurch via B4 Relay
wrote:
> The TCP BPF code will need to override splice_read(), so add it to prot.
Can we not override proto_ops in tcp_bpf for some specific reason?
TLS does that, IIUC.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf-next v2 1/5] net: Add splice_read to prot
2025-06-09 19:21 ` Jakub Kicinski
@ 2025-06-11 8:57 ` Vincent Whitchurch
2025-06-20 15:44 ` Vincent Whitchurch
0 siblings, 1 reply; 9+ messages in thread
From: Vincent Whitchurch @ 2025-06-11 8:57 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Vincent Whitchurch via B4 Relay, John Fastabend, Jakub Sitnicki,
Kuniyuki Iwashima, netdev, bpf
On Mon, Jun 9, 2025 at 9:21 PM Jakub Kicinski <kuba@kernel.org> wrote:
> Can we not override proto_ops in tcp_bpf for some specific reason?
> TLS does that, IIUC.
I see that TLS writes to sk->sk_socket->ops to override the proto_ops.
I added some prints to tcp_bpf_update_proto() but there I see that
sk->sk_socket is NULL in some code paths, like the one below.
tcp_bpf_update_proto: restore 0 sk_prot 000000002cf13dcc sk_socket
0000000000000000
CPU: 0 UID: 0 PID: 392 Comm: test_sockmap Not tainted
6.15.0-12313-g39e87f4ff7c3-dirty #77 PREEMPT(voluntary)
Call Trace:
<IRQ>
dump_stack_lvl+0x83/0xa0
tcp_bpf_update_proto+0x116/0x790
sock_map_link+0x425/0xdd0
sock_map_update_common+0xb8/0x6a0
bpf_sock_map_update+0x102/0x190
bpf_prog_4d9ceaf804942d01_bpf_sockmap+0x79/0x81
__cgroup_bpf_run_filter_sock_ops+0x1db/0x4b0
tcp_init_transfer+0x852/0xc00
tcp_rcv_state_process+0x3147/0x4b30
tcp_child_process+0x346/0x8b0
tcp_v4_rcv+0x1616/0x3e10
ip_protocol_deliver_rcu+0x93/0x370
ip_local_deliver_finish+0x29c/0x420
ip_local_deliver+0x193/0x450
ip_rcv+0x497/0x710
__netif_receive_skb_one_core+0x164/0x1b0
process_backlog+0x3a7/0x12b0
__napi_poll.constprop.0+0xa0/0x440
net_rx_action+0x8ce/0xca0
handle_softirqs+0x1c3/0x7b0
do_softirq+0xa5/0xd0
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf-next v2 1/5] net: Add splice_read to prot
2025-06-11 8:57 ` Vincent Whitchurch
@ 2025-06-20 15:44 ` Vincent Whitchurch
0 siblings, 0 replies; 9+ messages in thread
From: Vincent Whitchurch @ 2025-06-20 15:44 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Vincent Whitchurch via B4 Relay, John Fastabend, Jakub Sitnicki,
Kuniyuki Iwashima, netdev, bpf
On Wed, Jun 11, 2025 at 10:57 AM Vincent Whitchurch
<vincent.whitchurch@datadoghq.com> wrote:
> On Mon, Jun 9, 2025 at 9:21 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > Can we not override proto_ops in tcp_bpf for some specific reason?
> > TLS does that, IIUC.
>
> I see that TLS writes to sk->sk_socket->ops to override the proto_ops.
> I added some prints to tcp_bpf_update_proto() but there I see that
> sk->sk_socket is NULL in some code paths, like the one below.
To expand on this: TLS is able to override the sk->sk_socket->ops
since it can only be installed on the socket via setsockopt(2).
tcp_bpf on the other hand allows being installed on passively
established sockets before they have a sk->sk_socket assigned via
accept(2). So, AFAICS, we can't use the same override mechanism as
TLS.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-06-20 15:45 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-09 13:26 [PATCH bpf-next v2 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
2025-06-09 13:26 ` [PATCH bpf-next v2 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
2025-06-09 19:21 ` Jakub Kicinski
2025-06-11 8:57 ` Vincent Whitchurch
2025-06-20 15:44 ` Vincent Whitchurch
2025-06-09 13:26 ` [PATCH bpf-next v2 2/5] tcp_bpf: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
2025-06-09 13:27 ` [PATCH bpf-next v2 3/5] selftests/bpf: sockmap: Exit with error on failure Vincent Whitchurch via B4 Relay
2025-06-09 13:27 ` [PATCH bpf-next v2 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict Vincent Whitchurch via B4 Relay
2025-06-09 13:27 ` [PATCH bpf-next v2 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test Vincent Whitchurch via B4 Relay
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).