* [PATCH bpf-next v3 1/5] net: Add splice_read to prot
2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
@ 2025-07-09 12:47 ` Vincent Whitchurch via B4 Relay
2025-07-11 17:27 ` John Fastabend
2025-07-09 12:47 ` [PATCH bpf-next v3 2/5] tcp_bpf: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
` (5 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-07-09 12:47 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki
Cc: Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf,
Vincent Whitchurch
From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
The TCP BPF code will need to override splice_read(), so add it to prot.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
include/net/inet_common.h | 3 +++
include/net/sock.h | 3 +++
net/ipv4/af_inet.c | 13 ++++++++++++-
net/ipv4/tcp_ipv4.c | 1 +
net/ipv6/af_inet6.c | 2 +-
net/ipv6/tcp_ipv6.c | 1 +
6 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index c17a6585d0b0..2a6480d0d575 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -35,6 +35,9 @@ void __inet_accept(struct socket *sock, struct socket *newsock,
struct sock *newsk);
int inet_send_prepare(struct sock *sk);
int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
+ssize_t inet_splice_read(struct socket *sk, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags);
void inet_splice_eof(struct socket *sock);
int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
int flags);
diff --git a/include/net/sock.h b/include/net/sock.h
index 4c37015b7cf7..4bdebcbcca38 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1280,6 +1280,9 @@ struct proto {
size_t len);
int (*recvmsg)(struct sock *sk, struct msghdr *msg,
size_t len, int flags, int *addr_len);
+ ssize_t (*splice_read)(struct socket *sock, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags);
void (*splice_eof)(struct socket *sock);
int (*bind)(struct sock *sk,
struct sockaddr *addr, int addr_len);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 76e38092cd8a..9c521d252f66 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -868,6 +868,17 @@ void inet_splice_eof(struct socket *sock)
}
EXPORT_SYMBOL_GPL(inet_splice_eof);
+ssize_t inet_splice_read(struct socket *sock, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
+{
+ struct sock *sk = sock->sk;
+
+ return INDIRECT_CALL_1(sk->sk_prot->splice_read, tcp_splice_read, sock,
+ ppos, pipe, len, flags);
+}
+EXPORT_SYMBOL_GPL(inet_splice_read);
+
INDIRECT_CALLABLE_DECLARE(int udp_recvmsg(struct sock *, struct msghdr *,
size_t, int, int *));
int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
@@ -1071,7 +1082,7 @@ const struct proto_ops inet_stream_ops = {
.mmap = tcp_mmap,
#endif
.splice_eof = inet_splice_eof,
- .splice_read = tcp_splice_read,
+ .splice_read = inet_splice_read,
.set_peek_off = sk_set_peek_off,
.read_sock = tcp_read_sock,
.read_skb = tcp_read_skb,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 6a14f9e6fef6..0391b6bef35b 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3375,6 +3375,7 @@ struct proto tcp_prot = {
.keepalive = tcp_set_keepalive,
.recvmsg = tcp_recvmsg,
.sendmsg = tcp_sendmsg,
+ .splice_read = tcp_splice_read,
.splice_eof = tcp_splice_eof,
.backlog_rcv = tcp_v4_do_rcv,
.release_cb = tcp_release_cb,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index acaff1296783..2d6a1d669452 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -705,7 +705,7 @@ const struct proto_ops inet6_stream_ops = {
#endif
.splice_eof = inet_splice_eof,
.sendmsg_locked = tcp_sendmsg_locked,
- .splice_read = tcp_splice_read,
+ .splice_read = inet_splice_read,
.set_peek_off = sk_set_peek_off,
.read_sock = tcp_read_sock,
.read_skb = tcp_read_skb,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index e8e68a142649..f7558e443de6 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -2342,6 +2342,7 @@ struct proto tcpv6_prot = {
.keepalive = tcp_set_keepalive,
.recvmsg = tcp_recvmsg,
.sendmsg = tcp_sendmsg,
+ .splice_read = tcp_splice_read,
.splice_eof = tcp_splice_eof,
.backlog_rcv = tcp_v6_do_rcv,
.release_cb = tcp_release_cb,
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v3 1/5] net: Add splice_read to prot
2025-07-09 12:47 ` [PATCH bpf-next v3 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
@ 2025-07-11 17:27 ` John Fastabend
2025-07-11 18:48 ` Kuniyuki Iwashima
0 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2025-07-11 17:27 UTC (permalink / raw)
To: vincent.whitchurch
Cc: Jakub Sitnicki, Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf
On 2025-07-09 14:47:57, Vincent Whitchurch via B4 Relay wrote:
> From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
>
> The TCP BPF code will need to override splice_read(), so add it to prot.
>
> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> ---
> include/net/inet_common.h | 3 +++
> include/net/sock.h | 3 +++
> net/ipv4/af_inet.c | 13 ++++++++++++-
> net/ipv4/tcp_ipv4.c | 1 +
> net/ipv6/af_inet6.c | 2 +-
> net/ipv6/tcp_ipv6.c | 1 +
> 6 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/inet_common.h b/include/net/inet_common.h
> index c17a6585d0b0..2a6480d0d575 100644
> --- a/include/net/inet_common.h
> +++ b/include/net/inet_common.h
> @@ -35,6 +35,9 @@ void __inet_accept(struct socket *sock, struct socket *newsock,
> struct sock *newsk);
> int inet_send_prepare(struct sock *sk);
> int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
> +ssize_t inet_splice_read(struct socket *sk, loff_t *ppos,
> + struct pipe_inode_info *pipe, size_t len,
> + unsigned int flags);
> void inet_splice_eof(struct socket *sock);
> int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
> int flags);
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 4c37015b7cf7..4bdebcbcca38 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1280,6 +1280,9 @@ struct proto {
> size_t len);
> int (*recvmsg)(struct sock *sk, struct msghdr *msg,
> size_t len, int flags, int *addr_len);
> + ssize_t (*splice_read)(struct socket *sock, loff_t *ppos,
> + struct pipe_inode_info *pipe, size_t len,
> + unsigned int flags);
> void (*splice_eof)(struct socket *sock);
> int (*bind)(struct sock *sk,
> struct sockaddr *addr, int addr_len);
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 76e38092cd8a..9c521d252f66 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -868,6 +868,17 @@ void inet_splice_eof(struct socket *sock)
> }
> EXPORT_SYMBOL_GPL(inet_splice_eof);
>
> +ssize_t inet_splice_read(struct socket *sock, loff_t *ppos,
> + struct pipe_inode_info *pipe, size_t len,
> + unsigned int flags)
> +{
> + struct sock *sk = sock->sk;
> +
> + return INDIRECT_CALL_1(sk->sk_prot->splice_read, tcp_splice_read, sock,
> + ppos, pipe, len, flags);
> +}
Could we do a indirect_call_2 here? something like this?
INDIRECT_CALL_2(sk->sk_prot->splice_read, tcp_splice_read ...
Otherwise the series looks reasonable to me.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v3 1/5] net: Add splice_read to prot
2025-07-11 17:27 ` John Fastabend
@ 2025-07-11 18:48 ` Kuniyuki Iwashima
2025-07-11 18:52 ` Kuniyuki Iwashima
0 siblings, 1 reply; 12+ messages in thread
From: Kuniyuki Iwashima @ 2025-07-11 18:48 UTC (permalink / raw)
To: John Fastabend
Cc: vincent.whitchurch, Jakub Sitnicki, Jakub Kicinski, netdev, bpf
On Fri, Jul 11, 2025 at 10:27 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> On 2025-07-09 14:47:57, Vincent Whitchurch via B4 Relay wrote:
> > From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> >
> > The TCP BPF code will need to override splice_read(), so add it to prot.
> >
> > Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> > ---
> > include/net/inet_common.h | 3 +++
> > include/net/sock.h | 3 +++
> > net/ipv4/af_inet.c | 13 ++++++++++++-
> > net/ipv4/tcp_ipv4.c | 1 +
> > net/ipv6/af_inet6.c | 2 +-
> > net/ipv6/tcp_ipv6.c | 1 +
> > 6 files changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/net/inet_common.h b/include/net/inet_common.h
> > index c17a6585d0b0..2a6480d0d575 100644
> > --- a/include/net/inet_common.h
> > +++ b/include/net/inet_common.h
> > @@ -35,6 +35,9 @@ void __inet_accept(struct socket *sock, struct socket *newsock,
> > struct sock *newsk);
> > int inet_send_prepare(struct sock *sk);
> > int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
> > +ssize_t inet_splice_read(struct socket *sk, loff_t *ppos,
> > + struct pipe_inode_info *pipe, size_t len,
> > + unsigned int flags);
> > void inet_splice_eof(struct socket *sock);
> > int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
> > int flags);
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 4c37015b7cf7..4bdebcbcca38 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -1280,6 +1280,9 @@ struct proto {
> > size_t len);
> > int (*recvmsg)(struct sock *sk, struct msghdr *msg,
> > size_t len, int flags, int *addr_len);
> > + ssize_t (*splice_read)(struct socket *sock, loff_t *ppos,
> > + struct pipe_inode_info *pipe, size_t len,
> > + unsigned int flags);
> > void (*splice_eof)(struct socket *sock);
> > int (*bind)(struct sock *sk,
> > struct sockaddr *addr, int addr_len);
> > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > index 76e38092cd8a..9c521d252f66 100644
> > --- a/net/ipv4/af_inet.c
> > +++ b/net/ipv4/af_inet.c
> > @@ -868,6 +868,17 @@ void inet_splice_eof(struct socket *sock)
> > }
> > EXPORT_SYMBOL_GPL(inet_splice_eof);
> >
> > +ssize_t inet_splice_read(struct socket *sock, loff_t *ppos,
> > + struct pipe_inode_info *pipe, size_t len,
> > + unsigned int flags)
> > +{
> > + struct sock *sk = sock->sk;
> > +
> > + return INDIRECT_CALL_1(sk->sk_prot->splice_read, tcp_splice_read, sock,
> > + ppos, pipe, len, flags);
> > +}
>
> Could we do a indirect_call_2 here? something like this?
>
> INDIRECT_CALL_2(sk->sk_prot->splice_read, tcp_splice_read ...
>
> Otherwise the series looks reasonable to me.
What's the second candidate ?
I think we should specify the built-in one and cannot use bpf one.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v3 1/5] net: Add splice_read to prot
2025-07-11 18:48 ` Kuniyuki Iwashima
@ 2025-07-11 18:52 ` Kuniyuki Iwashima
0 siblings, 0 replies; 12+ messages in thread
From: Kuniyuki Iwashima @ 2025-07-11 18:52 UTC (permalink / raw)
To: John Fastabend
Cc: Jakub Sitnicki, Jakub Kicinski, netdev, bpf, vincent.whitchurch
[ Fixed up Vincent's address as John's and my reply seems not to be delivered. ]
On Fri, Jul 11, 2025 at 11:48 AM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> On Fri, Jul 11, 2025 at 10:27 AM John Fastabend
> <john.fastabend@gmail.com> wrote:
> >
> > On 2025-07-09 14:47:57, Vincent Whitchurch via B4 Relay wrote:
> > > From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> > >
> > > The TCP BPF code will need to override splice_read(), so add it to prot.
> > >
> > > Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> > > ---
> > > include/net/inet_common.h | 3 +++
> > > include/net/sock.h | 3 +++
> > > net/ipv4/af_inet.c | 13 ++++++++++++-
> > > net/ipv4/tcp_ipv4.c | 1 +
> > > net/ipv6/af_inet6.c | 2 +-
> > > net/ipv6/tcp_ipv6.c | 1 +
> > > 6 files changed, 21 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/net/inet_common.h b/include/net/inet_common.h
> > > index c17a6585d0b0..2a6480d0d575 100644
> > > --- a/include/net/inet_common.h
> > > +++ b/include/net/inet_common.h
> > > @@ -35,6 +35,9 @@ void __inet_accept(struct socket *sock, struct socket *newsock,
> > > struct sock *newsk);
> > > int inet_send_prepare(struct sock *sk);
> > > int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
> > > +ssize_t inet_splice_read(struct socket *sk, loff_t *ppos,
> > > + struct pipe_inode_info *pipe, size_t len,
> > > + unsigned int flags);
> > > void inet_splice_eof(struct socket *sock);
> > > int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
> > > int flags);
> > > diff --git a/include/net/sock.h b/include/net/sock.h
> > > index 4c37015b7cf7..4bdebcbcca38 100644
> > > --- a/include/net/sock.h
> > > +++ b/include/net/sock.h
> > > @@ -1280,6 +1280,9 @@ struct proto {
> > > size_t len);
> > > int (*recvmsg)(struct sock *sk, struct msghdr *msg,
> > > size_t len, int flags, int *addr_len);
> > > + ssize_t (*splice_read)(struct socket *sock, loff_t *ppos,
> > > + struct pipe_inode_info *pipe, size_t len,
> > > + unsigned int flags);
> > > void (*splice_eof)(struct socket *sock);
> > > int (*bind)(struct sock *sk,
> > > struct sockaddr *addr, int addr_len);
> > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > > index 76e38092cd8a..9c521d252f66 100644
> > > --- a/net/ipv4/af_inet.c
> > > +++ b/net/ipv4/af_inet.c
> > > @@ -868,6 +868,17 @@ void inet_splice_eof(struct socket *sock)
> > > }
> > > EXPORT_SYMBOL_GPL(inet_splice_eof);
> > >
> > > +ssize_t inet_splice_read(struct socket *sock, loff_t *ppos,
> > > + struct pipe_inode_info *pipe, size_t len,
> > > + unsigned int flags)
> > > +{
> > > + struct sock *sk = sock->sk;
> > > +
> > > + return INDIRECT_CALL_1(sk->sk_prot->splice_read, tcp_splice_read, sock,
> > > + ppos, pipe, len, flags);
> > > +}
> >
> > Could we do a indirect_call_2 here? something like this?
> >
> > INDIRECT_CALL_2(sk->sk_prot->splice_read, tcp_splice_read ...
> >
> > Otherwise the series looks reasonable to me.
>
> What's the second candidate ?
> I think we should specify the built-in one and cannot use bpf one.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH bpf-next v3 2/5] tcp_bpf: Fix reading with splice(2)
2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
2025-07-09 12:47 ` [PATCH bpf-next v3 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
@ 2025-07-09 12:47 ` Vincent Whitchurch via B4 Relay
2025-07-09 12:47 ` [PATCH bpf-next v3 3/5] selftests/bpf: sockmap: Exit with error on failure Vincent Whitchurch via B4 Relay
` (4 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-07-09 12:47 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki
Cc: Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf,
Vincent Whitchurch
From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
If a socket is added to a sockmap with a verdict program which returns
SK_PASS, splice(2) is not able to read from the socket.
The verdict code removes skbs from the receive queue, checks them using
the bpf program, and then re-queues them onto a separate queue
(psock->ingress_msg). The sockmap code modifies the TCP recvmsg hook to
check this second queue also so that works. But the splice_read hooks is
not modified and the default tcp_read_splice() only reads the normal
receive queue so it never sees the skbs which have been re-queued.
Fix it by using copy_splice_read() when replacing the proto for the
sockmap. This could eventually be replaced with a more efficient custom
version.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
net/ipv4/tcp_bpf.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index ba581785adb4..197429b6adae 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -3,6 +3,7 @@
#include <linux/skmsg.h>
#include <linux/filter.h>
+#include <linux/fs.h>
#include <linux/bpf.h>
#include <linux/init.h>
#include <linux/wait.h>
@@ -381,6 +382,13 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
return ret;
}
+static ssize_t tcp_bpf_splice_read(struct socket *sock, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
+{
+ return copy_splice_read(sock->file, ppos, pipe, len, flags);
+}
+
static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
struct sk_msg *msg, int *copied, int flags)
{
@@ -605,6 +613,7 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
prot[TCP_BPF_BASE].destroy = sock_map_destroy;
prot[TCP_BPF_BASE].close = sock_map_close;
prot[TCP_BPF_BASE].recvmsg = tcp_bpf_recvmsg;
+ prot[TCP_BPF_BASE].splice_read = tcp_bpf_splice_read;
prot[TCP_BPF_BASE].sock_is_readable = sk_msg_is_readable;
prot[TCP_BPF_TX] = prot[TCP_BPF_BASE];
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH bpf-next v3 3/5] selftests/bpf: sockmap: Exit with error on failure
2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
2025-07-09 12:47 ` [PATCH bpf-next v3 1/5] net: Add splice_read to prot Vincent Whitchurch via B4 Relay
2025-07-09 12:47 ` [PATCH bpf-next v3 2/5] tcp_bpf: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
@ 2025-07-09 12:47 ` Vincent Whitchurch via B4 Relay
2025-07-09 12:48 ` [PATCH bpf-next v3 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict Vincent Whitchurch via B4 Relay
` (3 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-07-09 12:47 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki
Cc: Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf,
Vincent Whitchurch
From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
If any tests failed, exit the program with a non-zero
error code.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
tools/testing/selftests/bpf/test_sockmap.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index fd2da2234cc9..cf1c36ed32c1 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -2238,7 +2238,9 @@ int main(int argc, char **argv)
close(cg_fd);
if (cg_created)
cleanup_cgroup_environment();
- return err;
+ if (err)
+ return err;
+ return failed ? 1 : 0;
}
void running_handler(int a)
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH bpf-next v3 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict
2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
` (2 preceding siblings ...)
2025-07-09 12:47 ` [PATCH bpf-next v3 3/5] selftests/bpf: sockmap: Exit with error on failure Vincent Whitchurch via B4 Relay
@ 2025-07-09 12:48 ` Vincent Whitchurch via B4 Relay
2025-07-09 12:48 ` [PATCH bpf-next v3 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test Vincent Whitchurch via B4 Relay
` (2 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-07-09 12:48 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki
Cc: Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf,
Vincent Whitchurch
From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
Add an option to always return SK_PASS in the verdict callback
instead of redirecting the skb. This allows testing cases
which are not covered by the test program as of now.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
tools/testing/selftests/bpf/test_sockmap.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index cf1c36ed32c1..8be2714dd573 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -80,6 +80,7 @@ int txmsg_end_push;
int txmsg_start_pop;
int txmsg_pop;
int txmsg_ingress;
+int txmsg_pass_skb;
int txmsg_redir_skb;
int txmsg_ktls_skb;
int txmsg_ktls_skb_drop;
@@ -114,6 +115,7 @@ static const struct option long_options[] = {
{"txmsg_start_pop", required_argument, NULL, 'w'},
{"txmsg_pop", required_argument, NULL, 'x'},
{"txmsg_ingress", no_argument, &txmsg_ingress, 1 },
+ {"txmsg_pass_skb", no_argument, &txmsg_pass_skb, 1 },
{"txmsg_redir_skb", no_argument, &txmsg_redir_skb, 1 },
{"ktls", no_argument, &ktls, 1 },
{"peek", no_argument, &peek_flag, 1 },
@@ -183,6 +185,7 @@ static void test_reset(void)
txmsg_pass = txmsg_drop = txmsg_redir = 0;
txmsg_apply = txmsg_cork = 0;
txmsg_ingress = txmsg_redir_skb = 0;
+ txmsg_pass_skb = 0;
txmsg_ktls_skb = txmsg_ktls_skb_drop = txmsg_ktls_skb_redir = 0;
txmsg_omit_skb_parser = 0;
skb_use_parser = 0;
@@ -1050,6 +1053,7 @@ static int run_options(struct sockmap_options *options, int cg_fd, int test)
{
int i, key, next_key, err, zero = 0;
struct bpf_program *tx_prog;
+ struct bpf_program *skb_verdict_prog;
/* If base test skip BPF setup */
if (test == BASE || test == BASE_SENDPAGE)
@@ -1066,7 +1070,12 @@ static int run_options(struct sockmap_options *options, int cg_fd, int test)
}
}
- links[1] = bpf_program__attach_sockmap(progs[1], map_fd[0]);
+ if (txmsg_pass_skb)
+ skb_verdict_prog = progs[2];
+ else
+ skb_verdict_prog = progs[1];
+
+ links[1] = bpf_program__attach_sockmap(skb_verdict_prog, map_fd[0]);
if (!links[1]) {
fprintf(stderr, "ERROR: bpf_program__attach_sockmap (sockmap): (%s)\n",
strerror(errno));
@@ -1455,6 +1464,8 @@ static void test_options(char *options)
}
if (txmsg_ingress)
append_str(options, "ingress,", OPTSTRING);
+ if (txmsg_pass_skb)
+ append_str(options, "pass_skb,", OPTSTRING);
if (txmsg_redir_skb)
append_str(options, "redir_skb,", OPTSTRING);
if (txmsg_ktls_skb)
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH bpf-next v3 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test
2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
` (3 preceding siblings ...)
2025-07-09 12:48 ` [PATCH bpf-next v3 4/5] selftests/bpf: sockmap: Allow SK_PASS in verdict Vincent Whitchurch via B4 Relay
@ 2025-07-09 12:48 ` Vincent Whitchurch via B4 Relay
2025-07-11 15:57 ` [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) John Fastabend
2025-07-11 23:09 ` Jakub Kicinski
6 siblings, 0 replies; 12+ messages in thread
From: Vincent Whitchurch via B4 Relay @ 2025-07-09 12:48 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki
Cc: Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf,
Vincent Whitchurch
From: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
Add a test which checks that splice(2) is still able to read data from
the socket if it is added to a verdict program which returns SK_PASS.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
---
tools/testing/selftests/bpf/test_sockmap.c | 73 ++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 8be2714dd573..b4102161db62 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2017-2018 Covalent IO, Inc. http://covalent.io
+#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
@@ -965,6 +966,61 @@ static int sendmsg_test(struct sockmap_options *opt)
return err;
}
+static int splice_test(struct sockmap_options *opt)
+{
+ int pipefds[2];
+ char buf[1024] = {0};
+ ssize_t bytes;
+ int ret;
+
+ ret = pipe(pipefds);
+ if (ret < 0) {
+ perror("pipe");
+ return ret;
+ }
+
+ bytes = send(c1, buf, sizeof(buf), 0);
+ if (bytes < 0) {
+ perror("send failed");
+ return bytes;
+ }
+ if (bytes == 0) {
+ fprintf(stderr, "send wrote zero bytes\n");
+ return -1;
+ }
+
+ bytes = write(pipefds[1], buf, sizeof(buf));
+ if (bytes < 0) {
+ perror("pipe write failed");
+ return bytes;
+ }
+
+ bytes = splice(p1, NULL, pipefds[1], NULL, sizeof(buf), 0);
+ if (bytes < 0) {
+ perror("splice failed");
+ return bytes;
+ }
+ if (bytes == 0) {
+ fprintf(stderr, "spliced zero bytes\n");
+ return -1;
+ }
+
+ bytes = read(pipefds[0], buf, sizeof(buf));
+ if (bytes < 0) {
+ perror("pipe read failed");
+ return bytes;
+ }
+ if (bytes == 0) {
+ fprintf(stderr, "EOF from pipe\n");
+ return -1;
+ }
+
+ close(pipefds[1]);
+ close(pipefds[0]);
+
+ return 0;
+}
+
static int forever_ping_pong(int rate, struct sockmap_options *opt)
{
struct timeval timeout;
@@ -1047,6 +1103,7 @@ enum {
BASE,
BASE_SENDPAGE,
SENDPAGE,
+ SPLICE,
};
static int run_options(struct sockmap_options *options, int cg_fd, int test)
@@ -1378,6 +1435,8 @@ static int run_options(struct sockmap_options *options, int cg_fd, int test)
options->base = true;
options->sendpage = true;
err = sendmsg_test(options);
+ } else if (test == SPLICE) {
+ err = splice_test(options);
} else
fprintf(stderr, "unknown test\n");
out:
@@ -1993,6 +2052,17 @@ static int populate_progs(char *bpf_file)
return 0;
}
+static void test_txmsg_splice_pass(int cgrp, struct sockmap_options *opt)
+{
+ txmsg_omit_skb_parser = 1;
+ txmsg_pass_skb = 1;
+
+ __test_exec(cgrp, SPLICE, opt);
+
+ txmsg_omit_skb_parser = 0;
+ txmsg_pass_skb = 0;
+}
+
struct _test test[] = {
{"txmsg test passthrough", test_txmsg_pass},
{"txmsg test redirect", test_txmsg_redir},
@@ -2009,6 +2079,7 @@ struct _test test[] = {
{"txmsg test push/pop data", test_txmsg_push_pop},
{"txmsg test ingress parser", test_txmsg_ingress_parser},
{"txmsg test ingress parser2", test_txmsg_ingress_parser2},
+ {"txmsg test splice pass", test_txmsg_splice_pass},
};
static int check_whitelist(struct _test *t, struct sockmap_options *opt)
@@ -2187,6 +2258,8 @@ int main(int argc, char **argv)
test = BASE_SENDPAGE;
} else if (strcmp(optarg, "sendpage") == 0) {
test = SENDPAGE;
+ } else if (strcmp(optarg, "splice") == 0) {
+ test = SPLICE;
} else {
usage(argv);
return -1;
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2)
2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
` (4 preceding siblings ...)
2025-07-09 12:48 ` [PATCH bpf-next v3 5/5] selftests/bpf: sockmap: Add splice + SK_PASS regression test Vincent Whitchurch via B4 Relay
@ 2025-07-11 15:57 ` John Fastabend
2025-07-11 23:09 ` Jakub Kicinski
6 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2025-07-11 15:57 UTC (permalink / raw)
To: vincent.whitchurch
Cc: Jakub Sitnicki, Kuniyuki Iwashima, Jakub Kicinski, netdev, bpf
On 2025-07-09 14:47:56, Vincent Whitchurch via B4 Relay wrote:
> I noticed that if the verdict callback returns SK_PASS, using splice(2)
> to read from a socket in a sockmap does not work since it never sees the
> data queued on to it. As far as I can see, this is not a regression but
> just something that has never worked, but it does make sockmap unusable
> if you can't guarantee that the programs using the socket will not use
> splice(2).
Correct it was never supported.
>
> This series attempts to fix it and add a test for it.
Thanks I had this todo on my list, but never got to it.
>
> ---
> Changes in v3:
> - Rebase on latest bpf-next/master
> - Link to v2: https://lore.kernel.org/r/20250609-sockmap-splice-v2-0-9c50645cfa32@datadoghq.com
>
> Changes in v2:
> - Rebase on latest bpf-next/master
> - Remove unnecessary change in inet_dgram_ops
> - Remove ->splice_read NULL check in inet_splice_read()
> - Use INDIRECT_CALL_1() in inet_splice_read()
> - Include test case in default test suite in test_sockmap
> - Link to v1: https://lore.kernel.org/r/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com
>
> ---
> Vincent Whitchurch (5):
> net: Add splice_read to prot
> tcp_bpf: Fix reading with splice(2)
> selftests/bpf: sockmap: Exit with error on failure
> selftests/bpf: sockmap: Allow SK_PASS in verdict
> selftests/bpf: sockmap: Add splice + SK_PASS regression test
>
> include/net/inet_common.h | 3 +
> include/net/sock.h | 3 +
> net/ipv4/af_inet.c | 13 ++++-
> net/ipv4/tcp_bpf.c | 9 +++
> net/ipv4/tcp_ipv4.c | 1 +
> net/ipv6/af_inet6.c | 2 +-
> net/ipv6/tcp_ipv6.c | 1 +
> tools/testing/selftests/bpf/test_sockmap.c | 90 +++++++++++++++++++++++++++++-
> 8 files changed, 118 insertions(+), 4 deletions(-)
> ---
> base-commit: ad97cb2ed06a6ba9025fd8bd14fa24369550cbb5
> change-id: 20240606-sockmap-splice-d371ac07d7b4
>
> Best regards,
> --
> Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2)
2025-07-09 12:47 [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) Vincent Whitchurch via B4 Relay
` (5 preceding siblings ...)
2025-07-11 15:57 ` [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2) John Fastabend
@ 2025-07-11 23:09 ` Jakub Kicinski
2025-07-22 3:33 ` John Fastabend
6 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2025-07-11 23:09 UTC (permalink / raw)
To: Vincent Whitchurch via B4 Relay
Cc: vincent.whitchurch, John Fastabend, Jakub Sitnicki,
Kuniyuki Iwashima, netdev, bpf
On Wed, 09 Jul 2025 14:47:56 +0200 Vincent Whitchurch via B4 Relay
wrote:
> I noticed that if the verdict callback returns SK_PASS, using splice(2)
> to read from a socket in a sockmap does not work since it never sees the
> data queued on to it. As far as I can see, this is not a regression but
> just something that has never worked, but it does make sockmap unusable
> if you can't guarantee that the programs using the socket will not use
> splice(2).
On v2 you should you can't replace ops for passively opened
connections. Can that not be addressed instead of adding
an indirect call on the data path?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v3 0/5] sockmap: Fix reading with splice(2)
2025-07-11 23:09 ` Jakub Kicinski
@ 2025-07-22 3:33 ` John Fastabend
0 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2025-07-22 3:33 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Vincent Whitchurch via B4 Relay, vincent.whitchurch,
Jakub Sitnicki, Kuniyuki Iwashima, netdev, bpf
On 2025-07-11 16:09:31, Jakub Kicinski wrote:
> On Wed, 09 Jul 2025 14:47:56 +0200 Vincent Whitchurch via B4 Relay
> wrote:
> > I noticed that if the verdict callback returns SK_PASS, using splice(2)
> > to read from a socket in a sockmap does not work since it never sees the
> > data queued on to it. As far as I can see, this is not a regression but
> > just something that has never worked, but it does make sockmap unusable
> > if you can't guarantee that the programs using the socket will not use
> > splice(2).
>
> On v2 you should you can't replace ops for passively opened
> connections. Can that not be addressed instead of adding
> an indirect call on the data path?
I guess I missed above? We can create the psock and add the socket
to a map on the accept()?
It would be good to get some fix for this.
Thanks,
John
^ permalink raw reply [flat|nested] 12+ messages in thread