* [PATCH bpf v8 0/5] bpf: fix wrong copied_seq calculation and add tests
@ 2025-01-21 5:07 Jiayuan Chen
2025-01-21 5:07 ` [PATCH bpf v8 1/5] strparser: add read_sock callback Jiayuan Chen
` (4 more replies)
0 siblings, 5 replies; 12+ messages in thread
From: Jiayuan Chen @ 2025-01-21 5:07 UTC (permalink / raw)
To: bpf, jakub, john.fastabend
Cc: netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni,
linux-kernel, song, andrii, mhal, yonghong.song, daniel,
xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal,
jolsa, haoluo, sdf, kpsingh, linux-doc, linux-kselftest,
Jiayuan Chen
A previous commit described in this topic
http://lore.kernel.org/bpf/20230523025618.113937-9-john.fastabend@gmail.com
directly updated 'sk->copied_seq' in the tcp_eat_skb() function when the
action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS,
the update logic for 'sk->copied_seq' was moved to
tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature.
That commit works for a single stream_verdict scenario, as it also
modified 'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb'
to remove updating 'sk->copied_seq'.
However, for programs where both stream_parser and stream_verdict are
active(strparser purpose), tcp_read_sock() was used instead of
tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock)
tcp_read_sock() now still update 'sk->copied_seq', leading to duplicated
updates.
In summary, for strparser + SK_PASS, copied_seq is redundantly calculated
in both tcp_read_sock() and tcp_bpf_recvmsg_parser().
The issue causes incorrect copied_seq calculations, which prevent
correct data reads from the recv() interface in user-land.
Also we added test cases for bpf + strparser and separated them from
sockmap_basic, as strparser has more encapsulation and parsing
capabilities compared to sockmap.
Fixes: e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq")
---
V8 -> V7:
https://lore.kernel.org/bpf/20250116140531.108636-1-mrpre@163.com/
Avoid using add read_sock to psock. (Jakub Sitnicki)
Avoid using warpper function to check whether strparser is supported.
V3 -> V7:
https://lore.kernel.org/bpf/20250109094402.50838-1-mrpre@163.com/
https://lore.kernel.org/bpf/20241218053408.437295-1-mrpre@163.com/
Avoid introducing new proto_ops. (Jakub Sitnicki).
Add more edge test cases for strparser + bpf.
Fix patchwork fail of test cases code.
Fix psock fetch without rcu lock.
Move code of modifying to tcp_bpf.c.
V1 -> V3:
https://lore.kernel.org/bpf/20241209152740.281125-1-mrpre@163.com/
Fix patchwork fail by adding Fixes tag.
Save skb data offset for ENOMEM. (John Fastabend)
---
Jiayuan Chen (5):
strparser: add read_sock callback
bpf: fix wrong copied_seq calculation
bpf: disable non stream socket for strparser
selftests/bpf: fix invalid flag of recv()
selftests/bpf: add strparser test for bpf
Documentation/networking/strparser.rst | 9 +-
include/linux/skmsg.h | 2 +
include/net/strparser.h | 2 +
include/net/tcp.h | 8 +
net/core/skmsg.c | 7 +
net/core/sock_map.c | 5 +-
net/ipv4/tcp.c | 29 +-
net/ipv4/tcp_bpf.c | 42 ++
net/strparser/strparser.c | 11 +-
.../selftests/bpf/prog_tests/sockmap_basic.c | 59 +--
.../selftests/bpf/prog_tests/sockmap_strp.c | 452 ++++++++++++++++++
.../selftests/bpf/progs/test_sockmap_strp.c | 53 ++
12 files changed, 614 insertions(+), 65 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/sockmap_strp.c
create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_strp.c
--
2.43.5
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH bpf v8 1/5] strparser: add read_sock callback 2025-01-21 5:07 [PATCH bpf v8 0/5] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen @ 2025-01-21 5:07 ` Jiayuan Chen 2025-01-21 14:19 ` Jakub Sitnicki 2025-01-21 5:07 ` [PATCH bpf v8 2/5] bpf: fix wrong copied_seq calculation Jiayuan Chen ` (3 subsequent siblings) 4 siblings, 1 reply; 12+ messages in thread From: Jiayuan Chen @ 2025-01-21 5:07 UTC (permalink / raw) To: bpf, jakub, john.fastabend Cc: netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal, yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc, linux-kselftest, Jiayuan Chen Added a new read_sock handler, allowing users to customize read operations instead of relying on the native socket's read_sock. Signed-off-by: Jiayuan Chen <mrpre@163.com> --- Documentation/networking/strparser.rst | 9 ++++++++- include/net/strparser.h | 2 ++ net/strparser/strparser.c | 11 +++++++++-- 3 files changed, 19 insertions(+), 3 deletions(-) diff --git a/Documentation/networking/strparser.rst b/Documentation/networking/strparser.rst index 6cab1f74ae05..7f623d1db72a 100644 --- a/Documentation/networking/strparser.rst +++ b/Documentation/networking/strparser.rst @@ -112,7 +112,7 @@ Functions Callbacks ========= -There are six callbacks: +There are seven callbacks: :: @@ -182,6 +182,13 @@ There are six callbacks: the length of the message. skb->len - offset may be greater then full_len since strparser does not trim the skb. + :: + + int (*read_sock)(struct strparser *strp, read_descriptor_t *desc, + sk_read_actor_t recv_actor); + + The read_sock callback is used by strparser instead of + sock->ops->read_sock, if provided. :: int (*read_sock_done)(struct strparser *strp, int err); diff --git a/include/net/strparser.h b/include/net/strparser.h index 41e2ce9e9e10..0a83010b3a64 100644 --- a/include/net/strparser.h +++ b/include/net/strparser.h @@ -43,6 +43,8 @@ struct strparser; struct strp_callbacks { int (*parse_msg)(struct strparser *strp, struct sk_buff *skb); void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb); + int (*read_sock)(struct strparser *strp, read_descriptor_t *desc, + sk_read_actor_t recv_actor); int (*read_sock_done)(struct strparser *strp, int err); void (*abort_parser)(struct strparser *strp, int err); void (*lock)(struct strparser *strp); diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c index 8299ceb3e373..95696f42647e 100644 --- a/net/strparser/strparser.c +++ b/net/strparser/strparser.c @@ -347,7 +347,10 @@ static int strp_read_sock(struct strparser *strp) struct socket *sock = strp->sk->sk_socket; read_descriptor_t desc; - if (unlikely(!sock || !sock->ops || !sock->ops->read_sock)) + if (unlikely(!sock || !sock->ops)) + return -EBUSY; + + if (unlikely(!strp->cb.read_sock && !sock->ops->read_sock)) return -EBUSY; desc.arg.data = strp; @@ -355,7 +358,10 @@ static int strp_read_sock(struct strparser *strp) desc.count = 1; /* give more than one skb per call */ /* sk should be locked here, so okay to do read_sock */ - sock->ops->read_sock(strp->sk, &desc, strp_recv); + if (strp->cb.read_sock) + strp->cb.read_sock(strp, &desc, strp_recv); + else + sock->ops->read_sock(strp->sk, &desc, strp_recv); desc.error = strp->cb.read_sock_done(strp, desc.error); @@ -468,6 +474,7 @@ int strp_init(struct strparser *strp, struct sock *sk, strp->cb.unlock = cb->unlock ? : strp_sock_unlock; strp->cb.rcv_msg = cb->rcv_msg; strp->cb.parse_msg = cb->parse_msg; + strp->cb.read_sock = cb->read_sock; strp->cb.read_sock_done = cb->read_sock_done ? : default_read_sock_done; strp->cb.abort_parser = cb->abort_parser ? : strp_abort_strp; -- 2.43.5 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH bpf v8 1/5] strparser: add read_sock callback 2025-01-21 5:07 ` [PATCH bpf v8 1/5] strparser: add read_sock callback Jiayuan Chen @ 2025-01-21 14:19 ` Jakub Sitnicki 0 siblings, 0 replies; 12+ messages in thread From: Jakub Sitnicki @ 2025-01-21 14:19 UTC (permalink / raw) To: Jiayuan Chen Cc: bpf, john.fastabend, netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal, yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc, linux-kselftest On Tue, Jan 21, 2025 at 01:07 PM +08, Jiayuan Chen wrote: > Added a new read_sock handler, allowing users to customize read operations > instead of relying on the native socket's read_sock. > > Signed-off-by: Jiayuan Chen <mrpre@163.com> > --- Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH bpf v8 2/5] bpf: fix wrong copied_seq calculation 2025-01-21 5:07 [PATCH bpf v8 0/5] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen 2025-01-21 5:07 ` [PATCH bpf v8 1/5] strparser: add read_sock callback Jiayuan Chen @ 2025-01-21 5:07 ` Jiayuan Chen 2025-01-21 14:18 ` Jakub Sitnicki 2025-01-21 5:07 ` [PATCH bpf v8 3/5] bpf: disable non stream socket for strparser Jiayuan Chen ` (2 subsequent siblings) 4 siblings, 1 reply; 12+ messages in thread From: Jiayuan Chen @ 2025-01-21 5:07 UTC (permalink / raw) To: bpf, jakub, john.fastabend Cc: netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal, yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc, linux-kselftest, Jiayuan Chen 'sk->copied_seq' was updated in the tcp_eat_skb() function when the action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS, the update logic for 'sk->copied_seq' was moved to tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature. It works for a single stream_verdict scenario, as it also modified 'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb' to remove updating 'sk->copied_seq'. However, for programs where both stream_parser and stream_verdict are active(strparser purpose), tcp_read_sock() was used instead of tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock) tcp_read_sock() now still update 'sk->copied_seq', leading to duplicated updates. In summary, for strparser + SK_PASS, copied_seq is redundantly calculated in both tcp_read_sock() and tcp_bpf_recvmsg_parser(). The issue causes incorrect copied_seq calculations, which prevent correct data reads from the recv() interface in user-land. We do not want to add new proto_ops to implement a new version of tcp_read_sock, as this would introduce code complexity [1]. We add new callback for strparser for customized read operation. [1]: https://lore.kernel.org/bpf/20241218053408.437295-1-mrpre@163.com Fixes: e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq") Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Jiayuan Chen <mrpre@163.com> --- include/linux/skmsg.h | 2 ++ include/net/tcp.h | 8 ++++++++ net/core/skmsg.c | 7 +++++++ net/ipv4/tcp.c | 29 ++++++++++++++++++++++++----- net/ipv4/tcp_bpf.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 83 insertions(+), 5 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 2cbe0c22a32f..0b9095a281b8 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -91,6 +91,8 @@ struct sk_psock { struct sk_psock_progs progs; #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER) struct strparser strp; + u32 copied_seq; + u32 ingress_bytes; #endif struct sk_buff_head ingress_skb; struct list_head ingress_msg; diff --git a/include/net/tcp.h b/include/net/tcp.h index e9b37b76e894..06affc653247 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -729,6 +729,9 @@ void tcp_get_info(struct sock *, struct tcp_info *); /* Read 'sendfile()'-style from a TCP socket */ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, sk_read_actor_t recv_actor); +int tcp_read_sock_noack(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor, bool noack, + u32 *copied_seq); int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor); struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off); void tcp_read_done(struct sock *sk, size_t len); @@ -2599,6 +2602,11 @@ struct sk_psock; #ifdef CONFIG_BPF_SYSCALL int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore); void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); +#ifdef CONFIG_BPF_STREAM_PARSER +struct strparser; +int tcp_bpf_strp_read_sock(struct strparser *strp, read_descriptor_t *desc, + sk_read_actor_t recv_actor); +#endif /* CONFIG_BPF_STREAM_PARSER */ #endif /* CONFIG_BPF_SYSCALL */ #ifdef CONFIG_INET diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 61f3f3d4e528..0ddc4c718833 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -549,6 +549,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb, return num_sge; } +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER) + psock->ingress_bytes += len; +#endif copied = len; msg->sg.start = 0; msg->sg.size = copied; @@ -1144,6 +1147,10 @@ int sk_psock_init_strp(struct sock *sk, struct sk_psock *psock) if (!ret) sk_psock_set_state(psock, SK_PSOCK_RX_STRP_ENABLED); + if (sk_is_tcp(sk)) { + psock->strp.cb.read_sock = tcp_bpf_strp_read_sock; + psock->copied_seq = tcp_sk(sk)->copied_seq; + } return ret; } diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 0d704bda6c41..285678d8ce07 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1565,12 +1565,13 @@ EXPORT_SYMBOL(tcp_recv_skb); * or for 'peeking' the socket using this routine * (although both would be easy to implement). */ -int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, - sk_read_actor_t recv_actor) +static int __tcp_read_sock(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor, bool noack, + u32 *copied_seq) { struct sk_buff *skb; struct tcp_sock *tp = tcp_sk(sk); - u32 seq = tp->copied_seq; + u32 seq = *copied_seq; u32 offset; int copied = 0; @@ -1624,9 +1625,12 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, tcp_eat_recv_skb(sk, skb); if (!desc->count) break; - WRITE_ONCE(tp->copied_seq, seq); + WRITE_ONCE(*copied_seq, seq); } - WRITE_ONCE(tp->copied_seq, seq); + WRITE_ONCE(*copied_seq, seq); + + if (noack) + goto out; tcp_rcv_space_adjust(sk); @@ -1635,10 +1639,25 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, tcp_recv_skb(sk, seq, &offset); tcp_cleanup_rbuf(sk, copied); } +out: return copied; } + +int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor) +{ + return __tcp_read_sock(sk, desc, recv_actor, false, + &tcp_sk(sk)->copied_seq); +} EXPORT_SYMBOL(tcp_read_sock); +int tcp_read_sock_noack(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor, bool noack, + u32 *copied_seq) +{ + return __tcp_read_sock(sk, desc, recv_actor, noack, copied_seq); +} + int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor) { struct sk_buff *skb; diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 47f65b1b70ca..4dcf88ad8275 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -646,6 +646,48 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops) ops->sendmsg == tcp_sendmsg ? 0 : -ENOTSUPP; } +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER) +int tcp_bpf_strp_read_sock(struct strparser *strp, read_descriptor_t *desc, + sk_read_actor_t recv_actor) +{ + struct sock *sk = strp->sk; + struct sk_psock *psock; + struct tcp_sock *tp; + int copied = 0; + + tp = tcp_sk(sk); + rcu_read_lock(); + psock = sk_psock(sk); + if (WARN_ON(!psock)) { + desc->error = -EINVAL; + goto out; + } + + psock->ingress_bytes = 0; + /* We could easily add copied_seq and noack into desc then call + * ops->read_sock without calling symbol directly. But unfortunately + * most descriptors used by other modules are not inited with zero. + * Also it not work by replacing ops->read_sock without introducing + * new ops as ops itself is located in rodata segment. + */ + copied = tcp_read_sock_noack(sk, desc, recv_actor, true, + &psock->copied_seq); + if (copied < 0) + goto out; + /* recv_actor may redirect skb to another socket(SK_REDIRECT) or + * just put skb into ingress queue of current socket(SK_PASS). + * For SK_REDIRECT, we need 'ack' the frame immediately but for + * SK_PASS, the 'ack' was delay to tcp_bpf_recvmsg_parser() + */ + tp->copied_seq = psock->copied_seq - psock->ingress_bytes; + tcp_rcv_space_adjust(sk); + __tcp_cleanup_rbuf(sk, copied - psock->ingress_bytes); +out: + rcu_read_unlock(); + return copied; +} +#endif /* CONFIG_BPF_STREAM_PARSER */ + int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore) { int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4; -- 2.43.5 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH bpf v8 2/5] bpf: fix wrong copied_seq calculation 2025-01-21 5:07 ` [PATCH bpf v8 2/5] bpf: fix wrong copied_seq calculation Jiayuan Chen @ 2025-01-21 14:18 ` Jakub Sitnicki 2025-01-22 8:55 ` Jiayuan Chen 0 siblings, 1 reply; 12+ messages in thread From: Jakub Sitnicki @ 2025-01-21 14:18 UTC (permalink / raw) To: Jiayuan Chen Cc: bpf, john.fastabend, netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal, yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc, linux-kselftest On Tue, Jan 21, 2025 at 01:07 PM +08, Jiayuan Chen wrote: > 'sk->copied_seq' was updated in the tcp_eat_skb() function when the > action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS, > the update logic for 'sk->copied_seq' was moved to > tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature. > > It works for a single stream_verdict scenario, as it also modified > 'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb' > to remove updating 'sk->copied_seq'. > > However, for programs where both stream_parser and stream_verdict are > active(strparser purpose), tcp_read_sock() was used instead of Nit: missing space, "active (strparser purpose)" ^ > tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock) Nit: missing period, "… (sk_data_ready->strp_data_ready->tcp_read_sock)." ^ > tcp_read_sock() now still update 'sk->copied_seq', leading to duplicated Nit: grammar "still updates" ^ Please run commit descriptions through a language tool or an LLM, if possible. This makes reviewing easier. > updates. > > In summary, for strparser + SK_PASS, copied_seq is redundantly calculated > in both tcp_read_sock() and tcp_bpf_recvmsg_parser(). > > The issue causes incorrect copied_seq calculations, which prevent > correct data reads from the recv() interface in user-land. > > We do not want to add new proto_ops to implement a new version of > tcp_read_sock, as this would introduce code complexity [1]. > > We add new callback for strparser for customized read operation. > > [1]: https://lore.kernel.org/bpf/20241218053408.437295-1-mrpre@163.com > Fixes: e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq") > Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> > Signed-off-by: Jiayuan Chen <mrpre@163.com> > --- > include/linux/skmsg.h | 2 ++ > include/net/tcp.h | 8 ++++++++ > net/core/skmsg.c | 7 +++++++ > net/ipv4/tcp.c | 29 ++++++++++++++++++++++++----- > net/ipv4/tcp_bpf.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 83 insertions(+), 5 deletions(-) > > diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h > index 2cbe0c22a32f..0b9095a281b8 100644 > --- a/include/linux/skmsg.h > +++ b/include/linux/skmsg.h > @@ -91,6 +91,8 @@ struct sk_psock { > struct sk_psock_progs progs; > #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER) > struct strparser strp; > + u32 copied_seq; > + u32 ingress_bytes; > #endif > struct sk_buff_head ingress_skb; > struct list_head ingress_msg; > diff --git a/include/net/tcp.h b/include/net/tcp.h > index e9b37b76e894..06affc653247 100644 > --- a/include/net/tcp.h > +++ b/include/net/tcp.h > @@ -729,6 +729,9 @@ void tcp_get_info(struct sock *, struct tcp_info *); > /* Read 'sendfile()'-style from a TCP socket */ > int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, > sk_read_actor_t recv_actor); > +int tcp_read_sock_noack(struct sock *sk, read_descriptor_t *desc, > + sk_read_actor_t recv_actor, bool noack, > + u32 *copied_seq); > int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor); > struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off); > void tcp_read_done(struct sock *sk, size_t len); > @@ -2599,6 +2602,11 @@ struct sk_psock; > #ifdef CONFIG_BPF_SYSCALL > int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore); > void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); > +#ifdef CONFIG_BPF_STREAM_PARSER > +struct strparser; > +int tcp_bpf_strp_read_sock(struct strparser *strp, read_descriptor_t *desc, > + sk_read_actor_t recv_actor); > +#endif /* CONFIG_BPF_STREAM_PARSER */ > #endif /* CONFIG_BPF_SYSCALL */ > > #ifdef CONFIG_INET > diff --git a/net/core/skmsg.c b/net/core/skmsg.c > index 61f3f3d4e528..0ddc4c718833 100644 > --- a/net/core/skmsg.c > +++ b/net/core/skmsg.c > @@ -549,6 +549,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb, > return num_sge; > } > > +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER) > + psock->ingress_bytes += len; > +#endif > copied = len; > msg->sg.start = 0; > msg->sg.size = copied; > @@ -1144,6 +1147,10 @@ int sk_psock_init_strp(struct sock *sk, struct sk_psock *psock) > if (!ret) > sk_psock_set_state(psock, SK_PSOCK_RX_STRP_ENABLED); > > + if (sk_is_tcp(sk)) { > + psock->strp.cb.read_sock = tcp_bpf_strp_read_sock; > + psock->copied_seq = tcp_sk(sk)->copied_seq; > + } > return ret; > } > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index 0d704bda6c41..285678d8ce07 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -1565,12 +1565,13 @@ EXPORT_SYMBOL(tcp_recv_skb); > * or for 'peeking' the socket using this routine > * (although both would be easy to implement). > */ > -int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, > - sk_read_actor_t recv_actor) > +static int __tcp_read_sock(struct sock *sk, read_descriptor_t *desc, > + sk_read_actor_t recv_actor, bool noack, > + u32 *copied_seq) > { > struct sk_buff *skb; > struct tcp_sock *tp = tcp_sk(sk); > - u32 seq = tp->copied_seq; > + u32 seq = *copied_seq; > u32 offset; > int copied = 0; > > @@ -1624,9 +1625,12 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, > tcp_eat_recv_skb(sk, skb); > if (!desc->count) > break; > - WRITE_ONCE(tp->copied_seq, seq); > + WRITE_ONCE(*copied_seq, seq); > } > - WRITE_ONCE(tp->copied_seq, seq); > + WRITE_ONCE(*copied_seq, seq); > + > + if (noack) > + goto out; > > tcp_rcv_space_adjust(sk); > > @@ -1635,10 +1639,25 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, > tcp_recv_skb(sk, seq, &offset); > tcp_cleanup_rbuf(sk, copied); > } > +out: > return copied; > } > + > +int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, > + sk_read_actor_t recv_actor) > +{ > + return __tcp_read_sock(sk, desc, recv_actor, false, > + &tcp_sk(sk)->copied_seq); > +} > EXPORT_SYMBOL(tcp_read_sock); > > +int tcp_read_sock_noack(struct sock *sk, read_descriptor_t *desc, > + sk_read_actor_t recv_actor, bool noack, > + u32 *copied_seq) > +{ > + return __tcp_read_sock(sk, desc, recv_actor, noack, copied_seq); > +} > + > int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor) > { > struct sk_buff *skb; > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c > index 47f65b1b70ca..4dcf88ad8275 100644 > --- a/net/ipv4/tcp_bpf.c > +++ b/net/ipv4/tcp_bpf.c > @@ -646,6 +646,48 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops) > ops->sendmsg == tcp_sendmsg ? 0 : -ENOTSUPP; > } > > +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER) > +int tcp_bpf_strp_read_sock(struct strparser *strp, read_descriptor_t *desc, > + sk_read_actor_t recv_actor) > +{ > + struct sock *sk = strp->sk; > + struct sk_psock *psock; > + struct tcp_sock *tp; > + int copied = 0; > + > + tp = tcp_sk(sk); > + rcu_read_lock(); > + psock = sk_psock(sk); > + if (WARN_ON(!psock)) { WARN_ON_ONCE, please. We don't want to flood dmesg. > + desc->error = -EINVAL; > + goto out; > + } > + > + psock->ingress_bytes = 0; > + /* We could easily add copied_seq and noack into desc then call > + * ops->read_sock without calling symbol directly. But unfortunately > + * most descriptors used by other modules are not inited with zero. > + * Also it not work by replacing ops->read_sock without introducing > + * new ops as ops itself is located in rodata segment. > + */ Above commment explains an implementation decision and belongs in the patch description, not in the code. It does not help with understanding the code itself. > + copied = tcp_read_sock_noack(sk, desc, recv_actor, true, > + &psock->copied_seq); > + if (copied < 0) > + goto out; > + /* recv_actor may redirect skb to another socket(SK_REDIRECT) or Nit: missing space, "socket (SK_REDIRECT)" > + * just put skb into ingress queue of current socket(SK_PASS). Nit: missing space, "socket (SK_PASS)" > + * For SK_REDIRECT, we need 'ack' the frame immediately but for Nit: grammar, "we need to" Nit: style, "to ack" is an accepted form of "to acknowlege", no need for quotes around it. > + * SK_PASS, the 'ack' was delay to tcp_bpf_recvmsg_parser() Nit: grammar, "we want to delay the ack until tcp_bpf_recvmsg_parser()" > + */ > + tp->copied_seq = psock->copied_seq - psock->ingress_bytes; > + tcp_rcv_space_adjust(sk); > + __tcp_cleanup_rbuf(sk, copied - psock->ingress_bytes); > +out: > + rcu_read_unlock(); > + return copied; > +} > +#endif /* CONFIG_BPF_STREAM_PARSER */ > + > int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore) > { > int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4; ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH bpf v8 2/5] bpf: fix wrong copied_seq calculation 2025-01-21 14:18 ` Jakub Sitnicki @ 2025-01-22 8:55 ` Jiayuan Chen 0 siblings, 0 replies; 12+ messages in thread From: Jiayuan Chen @ 2025-01-22 8:55 UTC (permalink / raw) To: Jakub Sitnicki Cc: bpf, john.fastabend, netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal, yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc, linux-kselftest On Tue, Jan 21, 2025 at 03:18:38PM +0100, Jakub Sitnicki wrote: > On Tue, Jan 21, 2025 at 01:07 PM +08, Jiayuan Chen wrote: > > 'sk->copied_seq' was updated in the tcp_eat_skb() function when the > > action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS, > > the update logic for 'sk->copied_seq' was moved to > > tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature. > > > > It works for a single stream_verdict scenario, as it also modified > > 'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb' > > to remove updating 'sk->copied_seq'. > > > > However, for programs where both stream_parser and stream_verdict are > > active(strparser purpose), tcp_read_sock() was used instead of > > Nit: missing space, "active (strparser purpose)" > ^ > > > tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock) > > Nit: missing period, "… (sk_data_ready->strp_data_ready->tcp_read_sock)." > ^ > > > tcp_read_sock() now still update 'sk->copied_seq', leading to duplicated > > Nit: grammar "still updates" > ^ > Please run commit descriptions through a language tool or an LLM, if > possible. This makes reviewing easier. > Thanks Jakub, I'll review all commit messages and code comments, and also use LLLM for grammar checks. ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH bpf v8 3/5] bpf: disable non stream socket for strparser 2025-01-21 5:07 [PATCH bpf v8 0/5] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen 2025-01-21 5:07 ` [PATCH bpf v8 1/5] strparser: add read_sock callback Jiayuan Chen 2025-01-21 5:07 ` [PATCH bpf v8 2/5] bpf: fix wrong copied_seq calculation Jiayuan Chen @ 2025-01-21 5:07 ` Jiayuan Chen 2025-01-21 14:23 ` Jakub Sitnicki 2025-01-21 5:07 ` [PATCH bpf v8 4/5] selftests/bpf: fix invalid flag of recv() Jiayuan Chen 2025-01-21 5:07 ` [PATCH bpf v8 5/5] selftests/bpf: add strparser test for bpf Jiayuan Chen 4 siblings, 1 reply; 12+ messages in thread From: Jiayuan Chen @ 2025-01-21 5:07 UTC (permalink / raw) To: bpf, jakub, john.fastabend Cc: netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal, yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc, linux-kselftest, Jiayuan Chen Currently, only TCP supports strparser, but sockmap doesn't intercept non-TCP to attach strparser. For example, with UDP, although the read/write handlers are replaced, strparser is not executed due to the lack of read_sock operation. Furthermore, in udp_bpf_recvmsg(), it checks whether psock has data, and if not, it falls back to the native UDP read interface, making UDP + strparser appear to read correctly. According to it's commit history, the behavior is unexpected. Moreover, since UDP lacks the concept of streams, we intercept it directly. Signed-off-by: Jiayuan Chen <mrpre@163.com> --- net/core/sock_map.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/core/sock_map.c b/net/core/sock_map.c index f1b9b3958792..3b0f59d9b4db 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -303,7 +303,10 @@ static int sock_map_link(struct bpf_map *map, struct sock *sk) write_lock_bh(&sk->sk_callback_lock); if (stream_parser && stream_verdict && !psock->saved_data_ready) { - ret = sk_psock_init_strp(sk, psock); + if (sk_is_tcp(sk)) + ret = sk_psock_init_strp(sk, psock); + else + ret = -EOPNOTSUPP; if (ret) { write_unlock_bh(&sk->sk_callback_lock); sk_psock_put(sk, psock); -- 2.43.5 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH bpf v8 3/5] bpf: disable non stream socket for strparser 2025-01-21 5:07 ` [PATCH bpf v8 3/5] bpf: disable non stream socket for strparser Jiayuan Chen @ 2025-01-21 14:23 ` Jakub Sitnicki 0 siblings, 0 replies; 12+ messages in thread From: Jakub Sitnicki @ 2025-01-21 14:23 UTC (permalink / raw) To: Jiayuan Chen Cc: bpf, john.fastabend, netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal, yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc, linux-kselftest On Tue, Jan 21, 2025 at 01:07 PM +08, Jiayuan Chen wrote: > Currently, only TCP supports strparser, but sockmap doesn't intercept > non-TCP to attach strparser. For example, with UDP, although the > read/write handlers are replaced, strparser is not executed due to the > lack of read_sock operation. > > Furthermore, in udp_bpf_recvmsg(), it checks whether psock has data, and > if not, it falls back to the native UDP read interface, making > UDP + strparser appear to read correctly. According to it's commit Nit: typo, "its" > history, the behavior is unexpected. > > Moreover, since UDP lacks the concept of streams, we intercept it > directly. > > Signed-off-by: Jiayuan Chen <mrpre@163.com> > --- > net/core/sock_map.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/net/core/sock_map.c b/net/core/sock_map.c > index f1b9b3958792..3b0f59d9b4db 100644 > --- a/net/core/sock_map.c > +++ b/net/core/sock_map.c > @@ -303,7 +303,10 @@ static int sock_map_link(struct bpf_map *map, struct sock *sk) > > write_lock_bh(&sk->sk_callback_lock); > if (stream_parser && stream_verdict && !psock->saved_data_ready) { > - ret = sk_psock_init_strp(sk, psock); > + if (sk_is_tcp(sk)) > + ret = sk_psock_init_strp(sk, psock); > + else > + ret = -EOPNOTSUPP; > if (ret) { > write_unlock_bh(&sk->sk_callback_lock); > sk_psock_put(sk, psock); Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH bpf v8 4/5] selftests/bpf: fix invalid flag of recv() 2025-01-21 5:07 [PATCH bpf v8 0/5] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen ` (2 preceding siblings ...) 2025-01-21 5:07 ` [PATCH bpf v8 3/5] bpf: disable non stream socket for strparser Jiayuan Chen @ 2025-01-21 5:07 ` Jiayuan Chen 2025-01-21 14:26 ` Jakub Sitnicki 2025-01-21 5:07 ` [PATCH bpf v8 5/5] selftests/bpf: add strparser test for bpf Jiayuan Chen 4 siblings, 1 reply; 12+ messages in thread From: Jiayuan Chen @ 2025-01-21 5:07 UTC (permalink / raw) To: bpf, jakub, john.fastabend Cc: netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal, yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc, linux-kselftest, Jiayuan Chen SOCK_NONBLOCK flag is only effective during socket creation, not during recv. Use MSG_DONTWAIT instead. Signed-off-by: Jiayuan Chen <mrpre@163.com> --- tools/testing/selftests/bpf/prog_tests/sockmap_basic.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c index 884ad87783d5..0c51b7288978 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c @@ -522,8 +522,8 @@ static void test_sockmap_skb_verdict_shutdown(void) if (!ASSERT_EQ(err, 1, "epoll_wait(fd)")) goto out_close; - n = recv(c1, &b, 1, SOCK_NONBLOCK); - ASSERT_EQ(n, 0, "recv_timeout(fin)"); + n = recv(c1, &b, 1, MSG_DONTWAIT); + ASSERT_EQ(n, 0, "recv(fin)"); out_close: close(c1); close(p1); @@ -628,7 +628,7 @@ static void test_sockmap_skb_verdict_fionread(bool pass_prog) ASSERT_EQ(avail, expected, "ioctl(FIONREAD)"); /* On DROP test there will be no data to read */ if (pass_prog) { - recvd = recv_timeout(c1, &buf, sizeof(buf), SOCK_NONBLOCK, IO_TIMEOUT_SEC); + recvd = recv_timeout(c1, &buf, sizeof(buf), MSG_DONTWAIT, IO_TIMEOUT_SEC); ASSERT_EQ(recvd, sizeof(buf), "recv_timeout(c0)"); } -- 2.43.5 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH bpf v8 4/5] selftests/bpf: fix invalid flag of recv() 2025-01-21 5:07 ` [PATCH bpf v8 4/5] selftests/bpf: fix invalid flag of recv() Jiayuan Chen @ 2025-01-21 14:26 ` Jakub Sitnicki 0 siblings, 0 replies; 12+ messages in thread From: Jakub Sitnicki @ 2025-01-21 14:26 UTC (permalink / raw) To: Jiayuan Chen Cc: bpf, john.fastabend, netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal, yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc, linux-kselftest On Tue, Jan 21, 2025 at 01:07 PM +08, Jiayuan Chen wrote: > SOCK_NONBLOCK flag is only effective during socket creation, not during > recv. Use MSG_DONTWAIT instead. > > Signed-off-by: Jiayuan Chen <mrpre@163.com> > --- Good catch. Fixes: 1fa1fe8ff161 ("bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0") Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH bpf v8 5/5] selftests/bpf: add strparser test for bpf 2025-01-21 5:07 [PATCH bpf v8 0/5] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen ` (3 preceding siblings ...) 2025-01-21 5:07 ` [PATCH bpf v8 4/5] selftests/bpf: fix invalid flag of recv() Jiayuan Chen @ 2025-01-21 5:07 ` Jiayuan Chen 2025-01-21 15:06 ` Jakub Sitnicki 4 siblings, 1 reply; 12+ messages in thread From: Jiayuan Chen @ 2025-01-21 5:07 UTC (permalink / raw) To: bpf, jakub, john.fastabend Cc: netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal, yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc, linux-kselftest, Jiayuan Chen Add test cases for bpf + strparser and separated them from sockmap_basic, as strparser has more encapsulation and parsing capabilities compared to sockmap. Signed-off-by: Jiayuan Chen <mrpre@163.com> --- .../selftests/bpf/prog_tests/sockmap_basic.c | 53 -- .../selftests/bpf/prog_tests/sockmap_strp.c | 452 ++++++++++++++++++ .../selftests/bpf/progs/test_sockmap_strp.c | 53 ++ 3 files changed, 505 insertions(+), 53 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/sockmap_strp.c create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_strp.c diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c index 0c51b7288978..f8953455db29 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c @@ -531,57 +531,6 @@ static void test_sockmap_skb_verdict_shutdown(void) test_sockmap_pass_prog__destroy(skel); } -static void test_sockmap_stream_pass(void) -{ - int zero = 0, sent, recvd; - int verdict, parser; - int err, map; - int c = -1, p = -1; - struct test_sockmap_pass_prog *pass = NULL; - char snd[256] = "0123456789"; - char rcv[256] = "0"; - - pass = test_sockmap_pass_prog__open_and_load(); - verdict = bpf_program__fd(pass->progs.prog_skb_verdict); - parser = bpf_program__fd(pass->progs.prog_skb_parser); - map = bpf_map__fd(pass->maps.sock_map_rx); - - err = bpf_prog_attach(parser, map, BPF_SK_SKB_STREAM_PARSER, 0); - if (!ASSERT_OK(err, "bpf_prog_attach stream parser")) - goto out; - - err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0); - if (!ASSERT_OK(err, "bpf_prog_attach stream verdict")) - goto out; - - err = create_pair(AF_INET, SOCK_STREAM, &c, &p); - if (err) - goto out; - - /* sk_data_ready of 'p' will be replaced by strparser handler */ - err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST); - if (!ASSERT_OK(err, "bpf_map_update_elem(p)")) - goto out_close; - - /* - * as 'prog_skb_parser' return the original skb len and - * 'prog_skb_verdict' return SK_PASS, the kernel will just - * pass it through to original socket 'p' - */ - sent = xsend(c, snd, sizeof(snd), 0); - ASSERT_EQ(sent, sizeof(snd), "xsend(c)"); - - recvd = recv_timeout(p, rcv, sizeof(rcv), SOCK_NONBLOCK, - IO_TIMEOUT_SEC); - ASSERT_EQ(recvd, sizeof(rcv), "recv_timeout(p)"); - -out_close: - close(c); - close(p); - -out: - test_sockmap_pass_prog__destroy(pass); -} static void test_sockmap_skb_verdict_fionread(bool pass_prog) { @@ -1101,8 +1050,6 @@ void test_sockmap_basic(void) test_sockmap_progs_query(BPF_SK_SKB_VERDICT); if (test__start_subtest("sockmap skb_verdict shutdown")) test_sockmap_skb_verdict_shutdown(); - if (test__start_subtest("sockmap stream parser and verdict pass")) - test_sockmap_stream_pass(); if (test__start_subtest("sockmap skb_verdict fionread")) test_sockmap_skb_verdict_fionread(true); if (test__start_subtest("sockmap skb_verdict fionread on drop")) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_strp.c b/tools/testing/selftests/bpf/prog_tests/sockmap_strp.c new file mode 100644 index 000000000000..01ed1fca1d9c --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_strp.c @@ -0,0 +1,452 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <error.h> +#include <netinet/tcp.h> +#include <test_progs.h> +#include "sockmap_helpers.h" +#include "test_skmsg_load_helpers.skel.h" +#include "test_sockmap_strp.skel.h" +#define STRP_PKT_HEAD_LEN 4 +#define STRP_PKT_BODY_LEN 6 +#define STRP_PKT_FULL_LEN (STRP_PKT_HEAD_LEN + STRP_PKT_BODY_LEN) +static const char packet[STRP_PKT_FULL_LEN] = "head+body\0"; +static const int test_packet_num = 100; + +/* current implementation of tcp_bpf_recvmsg_parser() invoke + * data_ready with sk held if skb exist in sk_receive_queue. + * Then for data_ready implementation of strparser, it will + * delay the read operation if sk was held and EAGAIN is returned. + */ +static int sockmap_strp_consume_pre_data(int p) +{ + int recvd; + bool retried = false; + char rcv[10]; + +retry: + errno = 0; + recvd = recv_timeout(p, rcv, sizeof(rcv), 0, 1); + if (recvd < 0 && errno == EAGAIN && retried == false) { + /* On the first call, EAGAIN will certainly be returned. + * Waiting 1 second is pretty enough wait workqueue finish. + */ + sleep(1); + retried = true; + goto retry; + } + + if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "recv_timeout(pre-data)") || + !ASSERT_OK(memcmp(packet, rcv, STRP_PKT_FULL_LEN), + "memcmp pre-data")) + return -1; + return 0; +} + +static struct test_sockmap_strp *sockmap_strp_init(int *out_map, bool pass, + bool need_parser) +{ + struct test_sockmap_strp *strp = NULL; + int verdict, parser; + int err; + + strp = test_sockmap_strp__open_and_load(); + *out_map = bpf_map__fd(strp->maps.sock_map); + + if (need_parser) + parser = bpf_program__fd(strp->progs.prog_skb_parser_partial); + else + parser = bpf_program__fd(strp->progs.prog_skb_parser); + + if (pass) + verdict = bpf_program__fd(strp->progs.prog_skb_verdict_pass); + else + verdict = bpf_program__fd(strp->progs.prog_skb_verdict); + + err = bpf_prog_attach(parser, *out_map, BPF_SK_SKB_STREAM_PARSER, 0); + if (!ASSERT_OK(err, "bpf_prog_attach stream parser")) + goto err; + + err = bpf_prog_attach(verdict, *out_map, BPF_SK_SKB_STREAM_VERDICT, 0); + if (!ASSERT_OK(err, "bpf_prog_attach stream verdict")) + goto err; + + return strp; +err: + test_sockmap_strp__destroy(strp); + return NULL; +} + +/* Dispatch packets to different socket by packet size: + * + * ------ ------ + * | pkt4 || pkt1 |... > remote socket + * ------ ------ / ------ ------ + * | pkt8 | pkt7 |... + * ------ ------ \ ------ ------ + * | pkt3 || pkt2 |... > local socket + * ------ ------ + */ +static void test_sockmap_strp_dispatch_pkt(int family, int sotype) +{ + int i, j, zero = 0, one = 1, recvd; + int err, map; + int c0 = -1, p0 = -1, c1 = -1, p1 = -1; + struct test_sockmap_strp *strp = NULL; + int test_cnt = 6; + char rcv[10]; + struct { + char data[7]; + int data_len; + int send_cnt; + int *receiver; + } send_dir[2] = { + /* data expected to deliver to local */ + {"llllll", 6, 0, &p0}, + /* data expected to deliver to remote */ + {"rrrrr", 5, 0, &c1} + }; + + strp = sockmap_strp_init(&map, false, false); + if (!ASSERT_TRUE(strp, "sockmap_strp_init")) + return; + + err = create_socket_pairs(family, sotype, &c0, &c1, &p0, &p1); + if (!ASSERT_OK(err, "create_socket_pairs()")) + goto out; + + err = bpf_map_update_elem(map, &zero, &p0, BPF_NOEXIST); + if (!ASSERT_OK(err, "bpf_map_update_elem(p0)")) + goto out_close; + + err = bpf_map_update_elem(map, &one, &p1, BPF_NOEXIST); + if (!ASSERT_OK(err, "bpf_map_update_elem(p1)")) + goto out_close; + + err = setsockopt(c1, IPPROTO_TCP, TCP_NODELAY, &zero, sizeof(zero)); + if (!ASSERT_OK(err, "setsockopt(TCP_NODELAY)")) + goto out_close; + + /* deliver data with data size greater than 5 to local */ + strp->data->verdict_max_size = 5; + + for (i = 0; i < test_cnt; i++) { + int d = i % 2; + + xsend(c0, send_dir[d].data, send_dir[d].data_len, 0); + send_dir[d].send_cnt++; + } + + for (i = 0; i < 2; i++) { + for (j = 0; j < send_dir[i].send_cnt; j++) { + int expected = send_dir[i].data_len; + + recvd = recv_timeout(*send_dir[i].receiver, rcv, + expected, MSG_DONTWAIT, + IO_TIMEOUT_SEC); + if (!ASSERT_EQ(recvd, expected, "recv_timeout()")) + goto out_close; + if (!ASSERT_OK(memcmp(send_dir[i].data, rcv, recvd), + "memcmp(rcv)")) + goto out_close; + } + } +out_close: + close(c0); + close(c1); + close(p0); + close(p1); +out: + test_sockmap_strp__destroy(strp); +} + +/* We have multiple packets in one skb + * ------------ ------------ ------------ + * | packet1 | packet2 | ... + * ------------ ------------ ------------ + */ +static void test_sockmap_strp_multiple_pkt(int family, int sotype) +{ + int i, zero = 0; + int sent, recvd, total; + int err, map; + int c = -1, p = -1; + struct test_sockmap_strp *strp = NULL; + char *snd = NULL, *rcv = NULL; + + strp = sockmap_strp_init(&map, true, true); + if (!ASSERT_TRUE(strp, "sockmap_strp_init")) + return; + + err = create_pair(family, sotype, &c, &p); + if (err) + goto out; + + err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST); + if (!ASSERT_OK(err, "bpf_map_update_elem(zero, p)")) + goto out_close; + + /* construct multiple packets in one buffer */ + total = test_packet_num * STRP_PKT_FULL_LEN; + snd = malloc(total); + rcv = malloc(total + 1); + if (!ASSERT_TRUE(snd, "malloc(snd)") || + !ASSERT_TRUE(rcv, "malloc(rcv)")) + goto out_close; + + for (i = 0; i < test_packet_num; i++) { + memcpy(snd + i * STRP_PKT_FULL_LEN, + packet, STRP_PKT_FULL_LEN); + } + + sent = xsend(c, snd, total, 0); + if (!ASSERT_EQ(sent, total, "xsend(c)")) + goto out_close; + + /* try to recv one more byte to avoid truncation check */ + recvd = recv_timeout(p, rcv, total + 1, MSG_DONTWAIT, IO_TIMEOUT_SEC); + if (!ASSERT_EQ(recvd, total, "recv(rcv)")) + goto out_close; + + /* we sent TCP segment with multiple encapsulation + * then check whether packets are handled correctly + */ + if (!ASSERT_OK(memcmp(snd, rcv, total), "memcmp(snd, rcv)")) + goto out_close; + +out_close: + close(c); + close(p); + if (snd) + free(snd); + if (rcv) + free(rcv); +out: + test_sockmap_strp__destroy(strp); +} + +/* Test strparser with partial read */ +static void test_sockmap_strp_partial_read(int family, int sotype) +{ + int zero = 0, recvd, off; + int err, map; + int c = -1, p = -1; + struct test_sockmap_strp *strp = NULL; + char rcv[STRP_PKT_FULL_LEN + 1] = "0"; + + strp = sockmap_strp_init(&map, true, true); + if (!ASSERT_TRUE(strp, "sockmap_strp_init")) + return; + + err = create_pair(family, sotype, &c, &p); + if (err) + goto out; + + /* sk_data_ready of 'p' will be replaced by strparser handler */ + err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST); + if (!ASSERT_OK(err, "bpf_map_update_elem(zero, p)")) + goto out_close; + + /* 1.1 send partial head, 1 byte header left*/ + off = STRP_PKT_HEAD_LEN - 1; + xsend(c, packet, off, 0); + recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, 1); + if (!ASSERT_EQ(-1, recvd, "insufficient head, should no data recvd")) + goto out_close; + + /* 1.2 send remaining head and body */ + xsend(c, packet + off, STRP_PKT_FULL_LEN - off, 0); + recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, IO_TIMEOUT_SEC); + if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "should full data recvd")) + goto out_close; + + /* 2.1 send partial head, 1 byte header left */ + off = STRP_PKT_HEAD_LEN - 1; + xsend(c, packet, off, 0); + + /* 2.2 send remaining head and partial body, 1 byte body left */ + xsend(c, packet + off, STRP_PKT_FULL_LEN - off - 1, 0); + off = STRP_PKT_FULL_LEN - 1; + recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, 1); + if (!ASSERT_EQ(-1, recvd, "insufficient body, should no data read")) + goto out_close; + + /* 2.3 send remaining body */ + xsend(c, packet + off, STRP_PKT_FULL_LEN - off, 0); + recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, IO_TIMEOUT_SEC); + if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "should full data recvd")) + goto out_close; + +out_close: + close(c); + close(p); + +out: + test_sockmap_strp__destroy(strp); +} + +/* Test simple socket read/write with strparser + FIONREAD */ +static void test_sockmap_strp_pass(int family, int sotype, bool fionread) +{ + int zero = 0, pkt_size = STRP_PKT_FULL_LEN, sent, recvd, avail; + int err, map; + int c = -1, p = -1; + int test_cnt = 10, i; + struct test_sockmap_strp *strp = NULL; + char rcv[STRP_PKT_FULL_LEN + 1] = "0"; + + strp = sockmap_strp_init(&map, true, true); + if (!ASSERT_TRUE(strp, "sockmap_strp_init")) + return; + + err = create_pair(family, sotype, &c, &p); + if (err) + goto out; + + /* inject some data before bpf process, it should be read + * correctly because we check sk_receive_queue in + * tcp_bpf_recvmsg_parser() + */ + sent = xsend(c, packet, pkt_size, 0); + if (!ASSERT_EQ(sent, pkt_size, "xsend(pre-data)")) + goto out_close; + + /* sk_data_ready of 'p' will be replaced by strparser handler */ + err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST); + if (!ASSERT_OK(err, "bpf_map_update_elem(p)")) + goto out_close; + + /* consume previous data we injected */ + if (sockmap_strp_consume_pre_data(p)) + goto out_close; + + /* Previously, we encountered issues such as deadlocks and + * sequence errors that resulted in the inability to read + * continuously. Therefore, we perform multiple iterations + * of testing here. + */ + for (i = 0; i < test_cnt; i++) { + sent = xsend(c, packet, pkt_size, 0); + if (!ASSERT_EQ(sent, pkt_size, "xsend(c)")) + goto out_close; + + recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, + IO_TIMEOUT_SEC); + if (!ASSERT_EQ(recvd, pkt_size, "recv_timeout(p)") || + !ASSERT_OK(memcmp(packet, rcv, pkt_size), + "memcmp")) + goto out_close; + } + + if (fionread) { + sent = xsend(c, packet, pkt_size, 0); + if (!ASSERT_EQ(sent, pkt_size, "second xsend(c)")) + goto out_close; + + err = ioctl(p, FIONREAD, &avail); + if (!ASSERT_OK(err, "ioctl(FIONREAD) error") || + !ASSERT_EQ(avail, pkt_size, "ioctl(FIONREAD)")) + goto out_close; + + recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, + IO_TIMEOUT_SEC); + if (!ASSERT_EQ(recvd, pkt_size, "second recv_timeout(p)") || + !ASSERT_OK(memcmp(packet, rcv, pkt_size), + "second memcmp")) + goto out_close; + } + +out_close: + close(c); + close(p); + +out: + test_sockmap_strp__destroy(strp); +} + +/* Test strparser with verdict mode */ +static void test_sockmap_strp_verdict(int family, int sotype) +{ + int zero = 0, one = 1, sent, recvd, off; + int err, map; + int c0 = -1, p0 = -1, c1 = -1, p1 = -1; + struct test_sockmap_strp *strp = NULL; + char rcv[STRP_PKT_FULL_LEN + 1] = "0"; + + strp = sockmap_strp_init(&map, false, true); + if (!ASSERT_TRUE(strp, "sockmap_strp_init")) + return; + + /* We simulate a reverse proxy server. + * When p0 receives data from c0, we forward it to c1. + * From c1's perspective, it will consider this data + * as being sent by p1. + */ + err = create_socket_pairs(family, sotype, &c0, &c1, &p0, &p1); + if (!ASSERT_OK(err, "create_socket_pairs()")) + goto out; + + err = bpf_map_update_elem(map, &zero, &p0, BPF_NOEXIST); + if (!ASSERT_OK(err, "bpf_map_update_elem(p0)")) + goto out_close; + + err = bpf_map_update_elem(map, &one, &p1, BPF_NOEXIST); + if (!ASSERT_OK(err, "bpf_map_update_elem(c1)")) + goto out_close; + + sent = xsend(c0, packet, STRP_PKT_FULL_LEN, 0); + if (!ASSERT_EQ(sent, STRP_PKT_FULL_LEN, "xsend(c0)")) + goto out_close; + + recvd = recv_timeout(c1, rcv, sizeof(rcv), MSG_DONTWAIT, + IO_TIMEOUT_SEC); + if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "recv_timeout(p1)") || + !ASSERT_OK(memcmp(packet, rcv, STRP_PKT_FULL_LEN), + "received data does not match the sent data")) + goto out_close; + + /* send again to ensure the stream is functioning correctly. */ + sent = xsend(c0, packet, STRP_PKT_FULL_LEN, 0); + if (!ASSERT_EQ(sent, STRP_PKT_FULL_LEN, "second xsend(c0)")) + goto out_close; + + /* partial read */ + off = STRP_PKT_FULL_LEN / 2; + recvd = recv_timeout(c1, rcv, off, MSG_DONTWAIT, + IO_TIMEOUT_SEC); + recvd += recv_timeout(c1, rcv + off, sizeof(rcv) - off, MSG_DONTWAIT, + IO_TIMEOUT_SEC); + + if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "partial recv_timeout(c1)") || + !ASSERT_OK(memcmp(packet, rcv, STRP_PKT_FULL_LEN), + "partial received data does not match the sent data")) + goto out_close; + +out_close: + close(c0); + close(c1); + close(p0); + close(p1); +out: + test_sockmap_strp__destroy(strp); +} + +void test_sockmap_strp(void) +{ + if (test__start_subtest("sockmap strp tcp pass")) + test_sockmap_strp_pass(AF_INET, SOCK_STREAM, false); + if (test__start_subtest("sockmap strp tcp v6 pass")) + test_sockmap_strp_pass(AF_INET6, SOCK_STREAM, false); + if (test__start_subtest("sockmap strp tcp pass fionread")) + test_sockmap_strp_pass(AF_INET, SOCK_STREAM, true); + if (test__start_subtest("sockmap strp tcp v6 pass fionread")) + test_sockmap_strp_pass(AF_INET6, SOCK_STREAM, true); + if (test__start_subtest("sockmap strp tcp verdict")) + test_sockmap_strp_verdict(AF_INET, SOCK_STREAM); + if (test__start_subtest("sockmap strp tcp v6 verdict")) + test_sockmap_strp_verdict(AF_INET6, SOCK_STREAM); + if (test__start_subtest("sockmap strp tcp partial read")) + test_sockmap_strp_partial_read(AF_INET, SOCK_STREAM); + if (test__start_subtest("sockmap strp tcp multiple packets")) + test_sockmap_strp_multiple_pkt(AF_INET, SOCK_STREAM); + if (test__start_subtest("sockmap strp tcp dispatch")) + test_sockmap_strp_dispatch_pkt(AF_INET, SOCK_STREAM); +} diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_strp.c b/tools/testing/selftests/bpf/progs/test_sockmap_strp.c new file mode 100644 index 000000000000..dde3d5bec515 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_sockmap_strp.c @@ -0,0 +1,53 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/bpf.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_endian.h> +int verdict_max_size = 10000; +struct { + __uint(type, BPF_MAP_TYPE_SOCKMAP); + __uint(max_entries, 20); + __type(key, int); + __type(value, int); +} sock_map SEC(".maps"); + +SEC("sk_skb/stream_verdict") +int prog_skb_verdict(struct __sk_buff *skb) +{ + __u32 one = 1; + + if (skb->len > verdict_max_size) + return SK_PASS; + + return bpf_sk_redirect_map(skb, &sock_map, one, 0); +} + +SEC("sk_skb/stream_verdict") +int prog_skb_verdict_pass(struct __sk_buff *skb) +{ + return SK_PASS; +} + +SEC("sk_skb/stream_parser") +int prog_skb_parser(struct __sk_buff *skb) +{ + return skb->len; +} + +SEC("sk_skb/stream_parser") +int prog_skb_parser_partial(struct __sk_buff *skb) +{ + /* agreement with the test program on a 4-byte size header + * and 6-byte body. + */ + if (skb->len < 4) { + /* need more header to determine full length */ + return 0; + } + /* return full length decoded from header. + * the return value may be larger than skb->len which + * means framework must wait body coming. + */ + return 10; +} + +char _license[] SEC("license") = "GPL"; -- 2.43.5 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH bpf v8 5/5] selftests/bpf: add strparser test for bpf 2025-01-21 5:07 ` [PATCH bpf v8 5/5] selftests/bpf: add strparser test for bpf Jiayuan Chen @ 2025-01-21 15:06 ` Jakub Sitnicki 0 siblings, 0 replies; 12+ messages in thread From: Jakub Sitnicki @ 2025-01-21 15:06 UTC (permalink / raw) To: Jiayuan Chen Cc: bpf, john.fastabend, netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal, yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc, linux-kselftest Thanks for expanding tests. On Tue, Jan 21, 2025 at 01:07 PM +08, Jiayuan Chen wrote: > Add test cases for bpf + strparser and separated them from > sockmap_basic, as strparser has more encapsulation and parsing > capabilities compared to sockmap. > > Signed-off-by: Jiayuan Chen <mrpre@163.com> > --- > .../selftests/bpf/prog_tests/sockmap_basic.c | 53 -- > .../selftests/bpf/prog_tests/sockmap_strp.c | 452 ++++++++++++++++++ > .../selftests/bpf/progs/test_sockmap_strp.c | 53 ++ > 3 files changed, 505 insertions(+), 53 deletions(-) > create mode 100644 tools/testing/selftests/bpf/prog_tests/sockmap_strp.c > create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_strp.c > > diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c > index 0c51b7288978..f8953455db29 100644 > --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c > +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c > @@ -531,57 +531,6 @@ static void test_sockmap_skb_verdict_shutdown(void) > test_sockmap_pass_prog__destroy(skel); > } > > -static void test_sockmap_stream_pass(void) > -{ > - int zero = 0, sent, recvd; > - int verdict, parser; > - int err, map; > - int c = -1, p = -1; > - struct test_sockmap_pass_prog *pass = NULL; > - char snd[256] = "0123456789"; > - char rcv[256] = "0"; > - > - pass = test_sockmap_pass_prog__open_and_load(); > - verdict = bpf_program__fd(pass->progs.prog_skb_verdict); > - parser = bpf_program__fd(pass->progs.prog_skb_parser); > - map = bpf_map__fd(pass->maps.sock_map_rx); > - > - err = bpf_prog_attach(parser, map, BPF_SK_SKB_STREAM_PARSER, 0); > - if (!ASSERT_OK(err, "bpf_prog_attach stream parser")) > - goto out; > - > - err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0); > - if (!ASSERT_OK(err, "bpf_prog_attach stream verdict")) > - goto out; > - > - err = create_pair(AF_INET, SOCK_STREAM, &c, &p); > - if (err) > - goto out; > - > - /* sk_data_ready of 'p' will be replaced by strparser handler */ > - err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST); > - if (!ASSERT_OK(err, "bpf_map_update_elem(p)")) > - goto out_close; > - > - /* > - * as 'prog_skb_parser' return the original skb len and > - * 'prog_skb_verdict' return SK_PASS, the kernel will just > - * pass it through to original socket 'p' > - */ > - sent = xsend(c, snd, sizeof(snd), 0); > - ASSERT_EQ(sent, sizeof(snd), "xsend(c)"); > - > - recvd = recv_timeout(p, rcv, sizeof(rcv), SOCK_NONBLOCK, > - IO_TIMEOUT_SEC); > - ASSERT_EQ(recvd, sizeof(rcv), "recv_timeout(p)"); > - > -out_close: > - close(c); > - close(p); > - > -out: > - test_sockmap_pass_prog__destroy(pass); > -} > > static void test_sockmap_skb_verdict_fionread(bool pass_prog) > { > @@ -1101,8 +1050,6 @@ void test_sockmap_basic(void) > test_sockmap_progs_query(BPF_SK_SKB_VERDICT); > if (test__start_subtest("sockmap skb_verdict shutdown")) > test_sockmap_skb_verdict_shutdown(); > - if (test__start_subtest("sockmap stream parser and verdict pass")) > - test_sockmap_stream_pass(); > if (test__start_subtest("sockmap skb_verdict fionread")) > test_sockmap_skb_verdict_fionread(true); > if (test__start_subtest("sockmap skb_verdict fionread on drop")) > diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_strp.c b/tools/testing/selftests/bpf/prog_tests/sockmap_strp.c > new file mode 100644 > index 000000000000..01ed1fca1d9c > --- /dev/null > +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_strp.c > @@ -0,0 +1,452 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#include <error.h> > +#include <netinet/tcp.h> > +#include <test_progs.h> > +#include "sockmap_helpers.h" > +#include "test_skmsg_load_helpers.skel.h" > +#include "test_sockmap_strp.skel.h" Nit: add new line to separate cpp defines visually > +#define STRP_PKT_HEAD_LEN 4 > +#define STRP_PKT_BODY_LEN 6 > +#define STRP_PKT_FULL_LEN (STRP_PKT_HEAD_LEN + STRP_PKT_BODY_LEN) Nit: add new line to constants visually > +static const char packet[STRP_PKT_FULL_LEN] = "head+body\0"; > +static const int test_packet_num = 100; > + > +/* current implementation of tcp_bpf_recvmsg_parser() invoke Nit: grammar, "invoke*s*" > + * data_ready with sk held if skb exist in sk_receive_queue. Nit: grammar, "exist*s*" > + * Then for data_ready implementation of strparser, it will > + * delay the read operation if sk was held and EAGAIN is returned. > + */ > +static int sockmap_strp_consume_pre_data(int p) > +{ > + int recvd; > + bool retried = false; > + char rcv[10]; > + > +retry: > + errno = 0; > + recvd = recv_timeout(p, rcv, sizeof(rcv), 0, 1); > + if (recvd < 0 && errno == EAGAIN && retried == false) { > + /* On the first call, EAGAIN will certainly be returned. > + * Waiting 1 second is pretty enough wait workqueue finish. Nit: style, "Wait for workqueue to finish." > + */ > + sleep(1); > + retried = true; > + goto retry; > + } > + > + if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "recv_timeout(pre-data)") || Meaningful error message like "recv error or truncated data" would be better. ASSERT_EQ / CHECK macros print the function name, so "(pre-data)" tag is redundant. > + !ASSERT_OK(memcmp(packet, rcv, STRP_PKT_FULL_LEN), > + "memcmp pre-data")) Suggested error message: "data mismatch". > + return -1; > + return 0; > +} > + > +static struct test_sockmap_strp *sockmap_strp_init(int *out_map, bool pass, > + bool need_parser) > +{ > + struct test_sockmap_strp *strp = NULL; > + int verdict, parser; > + int err; > + > + strp = test_sockmap_strp__open_and_load(); > + *out_map = bpf_map__fd(strp->maps.sock_map); > + > + if (need_parser) > + parser = bpf_program__fd(strp->progs.prog_skb_parser_partial); > + else > + parser = bpf_program__fd(strp->progs.prog_skb_parser); > + > + if (pass) > + verdict = bpf_program__fd(strp->progs.prog_skb_verdict_pass); > + else > + verdict = bpf_program__fd(strp->progs.prog_skb_verdict); > + > + err = bpf_prog_attach(parser, *out_map, BPF_SK_SKB_STREAM_PARSER, 0); > + if (!ASSERT_OK(err, "bpf_prog_attach stream parser")) > + goto err; > + > + err = bpf_prog_attach(verdict, *out_map, BPF_SK_SKB_STREAM_VERDICT, 0); > + if (!ASSERT_OK(err, "bpf_prog_attach stream verdict")) > + goto err; > + > + return strp; > +err: > + test_sockmap_strp__destroy(strp); > + return NULL; > +} > + > +/* Dispatch packets to different socket by packet size: > + * > + * ------ ------ > + * | pkt4 || pkt1 |... > remote socket > + * ------ ------ / ------ ------ > + * | pkt8 | pkt7 |... > + * ------ ------ \ ------ ------ > + * | pkt3 || pkt2 |... > local socket > + * ------ ------ > + */ > +static void test_sockmap_strp_dispatch_pkt(int family, int sotype) > +{ > + int i, j, zero = 0, one = 1, recvd; > + int err, map; > + int c0 = -1, p0 = -1, c1 = -1, p1 = -1; > + struct test_sockmap_strp *strp = NULL; > + int test_cnt = 6; > + char rcv[10]; > + struct { > + char data[7]; > + int data_len; > + int send_cnt; > + int *receiver; > + } send_dir[2] = { > + /* data expected to deliver to local */ > + {"llllll", 6, 0, &p0}, > + /* data expected to deliver to remote */ > + {"rrrrr", 5, 0, &c1} > + }; > + > + strp = sockmap_strp_init(&map, false, false); > + if (!ASSERT_TRUE(strp, "sockmap_strp_init")) > + return; > + > + err = create_socket_pairs(family, sotype, &c0, &c1, &p0, &p1); > + if (!ASSERT_OK(err, "create_socket_pairs()")) > + goto out; > + > + err = bpf_map_update_elem(map, &zero, &p0, BPF_NOEXIST); > + if (!ASSERT_OK(err, "bpf_map_update_elem(p0)")) > + goto out_close; > + > + err = bpf_map_update_elem(map, &one, &p1, BPF_NOEXIST); > + if (!ASSERT_OK(err, "bpf_map_update_elem(p1)")) > + goto out_close; > + > + err = setsockopt(c1, IPPROTO_TCP, TCP_NODELAY, &zero, sizeof(zero)); > + if (!ASSERT_OK(err, "setsockopt(TCP_NODELAY)")) > + goto out_close; > + > + /* deliver data with data size greater than 5 to local */ > + strp->data->verdict_max_size = 5; > + > + for (i = 0; i < test_cnt; i++) { > + int d = i % 2; > + > + xsend(c0, send_dir[d].data, send_dir[d].data_len, 0); > + send_dir[d].send_cnt++; > + } > + > + for (i = 0; i < 2; i++) { > + for (j = 0; j < send_dir[i].send_cnt; j++) { > + int expected = send_dir[i].data_len; > + > + recvd = recv_timeout(*send_dir[i].receiver, rcv, > + expected, MSG_DONTWAIT, > + IO_TIMEOUT_SEC); > + if (!ASSERT_EQ(recvd, expected, "recv_timeout()")) > + goto out_close; > + if (!ASSERT_OK(memcmp(send_dir[i].data, rcv, recvd), > + "memcmp(rcv)")) > + goto out_close; > + } > + } > +out_close: > + close(c0); > + close(c1); > + close(p0); > + close(p1); > +out: > + test_sockmap_strp__destroy(strp); > +} > + > +/* We have multiple packets in one skb > + * ------------ ------------ ------------ > + * | packet1 | packet2 | ... > + * ------------ ------------ ------------ > + */ > +static void test_sockmap_strp_multiple_pkt(int family, int sotype) > +{ > + int i, zero = 0; > + int sent, recvd, total; > + int err, map; > + int c = -1, p = -1; > + struct test_sockmap_strp *strp = NULL; > + char *snd = NULL, *rcv = NULL; > + > + strp = sockmap_strp_init(&map, true, true); > + if (!ASSERT_TRUE(strp, "sockmap_strp_init")) > + return; > + > + err = create_pair(family, sotype, &c, &p); > + if (err) > + goto out; > + > + err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST); > + if (!ASSERT_OK(err, "bpf_map_update_elem(zero, p)")) > + goto out_close; > + > + /* construct multiple packets in one buffer */ > + total = test_packet_num * STRP_PKT_FULL_LEN; > + snd = malloc(total); > + rcv = malloc(total + 1); > + if (!ASSERT_TRUE(snd, "malloc(snd)") || > + !ASSERT_TRUE(rcv, "malloc(rcv)")) > + goto out_close; > + > + for (i = 0; i < test_packet_num; i++) { > + memcpy(snd + i * STRP_PKT_FULL_LEN, > + packet, STRP_PKT_FULL_LEN); > + } > + > + sent = xsend(c, snd, total, 0); > + if (!ASSERT_EQ(sent, total, "xsend(c)")) > + goto out_close; > + > + /* try to recv one more byte to avoid truncation check */ > + recvd = recv_timeout(p, rcv, total + 1, MSG_DONTWAIT, IO_TIMEOUT_SEC); > + if (!ASSERT_EQ(recvd, total, "recv(rcv)")) > + goto out_close; > + > + /* we sent TCP segment with multiple encapsulation > + * then check whether packets are handled correctly > + */ > + if (!ASSERT_OK(memcmp(snd, rcv, total), "memcmp(snd, rcv)")) > + goto out_close; > + > +out_close: > + close(c); > + close(p); > + if (snd) > + free(snd); > + if (rcv) > + free(rcv); > +out: > + test_sockmap_strp__destroy(strp); > +} > + > +/* Test strparser with partial read */ > +static void test_sockmap_strp_partial_read(int family, int sotype) > +{ > + int zero = 0, recvd, off; > + int err, map; > + int c = -1, p = -1; > + struct test_sockmap_strp *strp = NULL; > + char rcv[STRP_PKT_FULL_LEN + 1] = "0"; > + > + strp = sockmap_strp_init(&map, true, true); > + if (!ASSERT_TRUE(strp, "sockmap_strp_init")) > + return; > + > + err = create_pair(family, sotype, &c, &p); > + if (err) > + goto out; > + > + /* sk_data_ready of 'p' will be replaced by strparser handler */ > + err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST); > + if (!ASSERT_OK(err, "bpf_map_update_elem(zero, p)")) > + goto out_close; > + > + /* 1.1 send partial head, 1 byte header left*/ Nit: missing space before comment-close tag, "left */". > + off = STRP_PKT_HEAD_LEN - 1; > + xsend(c, packet, off, 0); > + recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, 1); > + if (!ASSERT_EQ(-1, recvd, "insufficient head, should no data recvd")) "partial head sent, expected no data" > + goto out_close; > + > + /* 1.2 send remaining head and body */ > + xsend(c, packet + off, STRP_PKT_FULL_LEN - off, 0); > + recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, IO_TIMEOUT_SEC); > + if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "should full data recvd")) "expected full data" > + goto out_close; > + > + /* 2.1 send partial head, 1 byte header left */ > + off = STRP_PKT_HEAD_LEN - 1; > + xsend(c, packet, off, 0); > + > + /* 2.2 send remaining head and partial body, 1 byte body left */ > + xsend(c, packet + off, STRP_PKT_FULL_LEN - off - 1, 0); > + off = STRP_PKT_FULL_LEN - 1; > + recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, 1); > + if (!ASSERT_EQ(-1, recvd, "insufficient body, should no data read")) "partial body sent, expected no data" > + goto out_close; > + > + /* 2.3 send remaining body */ > + xsend(c, packet + off, STRP_PKT_FULL_LEN - off, 0); > + recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, IO_TIMEOUT_SEC); > + if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "should full data recvd")) "expected full data" > + goto out_close; > + > +out_close: > + close(c); > + close(p); > + > +out: > + test_sockmap_strp__destroy(strp); > +} > + > +/* Test simple socket read/write with strparser + FIONREAD */ > +static void test_sockmap_strp_pass(int family, int sotype, bool fionread) > +{ > + int zero = 0, pkt_size = STRP_PKT_FULL_LEN, sent, recvd, avail; > + int err, map; > + int c = -1, p = -1; > + int test_cnt = 10, i; > + struct test_sockmap_strp *strp = NULL; > + char rcv[STRP_PKT_FULL_LEN + 1] = "0"; > + > + strp = sockmap_strp_init(&map, true, true); > + if (!ASSERT_TRUE(strp, "sockmap_strp_init")) > + return; > + > + err = create_pair(family, sotype, &c, &p); > + if (err) > + goto out; > + > + /* inject some data before bpf process, it should be read > + * correctly because we check sk_receive_queue in > + * tcp_bpf_recvmsg_parser() > + */ > + sent = xsend(c, packet, pkt_size, 0); > + if (!ASSERT_EQ(sent, pkt_size, "xsend(pre-data)")) > + goto out_close; > + > + /* sk_data_ready of 'p' will be replaced by strparser handler */ > + err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST); > + if (!ASSERT_OK(err, "bpf_map_update_elem(p)")) > + goto out_close; > + > + /* consume previous data we injected */ > + if (sockmap_strp_consume_pre_data(p)) > + goto out_close; > + > + /* Previously, we encountered issues such as deadlocks and > + * sequence errors that resulted in the inability to read > + * continuously. Therefore, we perform multiple iterations > + * of testing here. > + */ > + for (i = 0; i < test_cnt; i++) { > + sent = xsend(c, packet, pkt_size, 0); > + if (!ASSERT_EQ(sent, pkt_size, "xsend(c)")) > + goto out_close; > + > + recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, > + IO_TIMEOUT_SEC); > + if (!ASSERT_EQ(recvd, pkt_size, "recv_timeout(p)") || > + !ASSERT_OK(memcmp(packet, rcv, pkt_size), > + "memcmp")) > + goto out_close; > + } > + > + if (fionread) { > + sent = xsend(c, packet, pkt_size, 0); > + if (!ASSERT_EQ(sent, pkt_size, "second xsend(c)")) > + goto out_close; > + > + err = ioctl(p, FIONREAD, &avail); > + if (!ASSERT_OK(err, "ioctl(FIONREAD) error") || > + !ASSERT_EQ(avail, pkt_size, "ioctl(FIONREAD)")) > + goto out_close; > + > + recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, > + IO_TIMEOUT_SEC); > + if (!ASSERT_EQ(recvd, pkt_size, "second recv_timeout(p)") || > + !ASSERT_OK(memcmp(packet, rcv, pkt_size), > + "second memcmp")) > + goto out_close; > + } > + > +out_close: > + close(c); > + close(p); > + > +out: > + test_sockmap_strp__destroy(strp); > +} > + > +/* Test strparser with verdict mode */ > +static void test_sockmap_strp_verdict(int family, int sotype) > +{ > + int zero = 0, one = 1, sent, recvd, off; > + int err, map; > + int c0 = -1, p0 = -1, c1 = -1, p1 = -1; > + struct test_sockmap_strp *strp = NULL; > + char rcv[STRP_PKT_FULL_LEN + 1] = "0"; > + > + strp = sockmap_strp_init(&map, false, true); > + if (!ASSERT_TRUE(strp, "sockmap_strp_init")) > + return; > + > + /* We simulate a reverse proxy server. > + * When p0 receives data from c0, we forward it to c1. > + * From c1's perspective, it will consider this data > + * as being sent by p1. > + */ > + err = create_socket_pairs(family, sotype, &c0, &c1, &p0, &p1); > + if (!ASSERT_OK(err, "create_socket_pairs()")) > + goto out; > + > + err = bpf_map_update_elem(map, &zero, &p0, BPF_NOEXIST); > + if (!ASSERT_OK(err, "bpf_map_update_elem(p0)")) > + goto out_close; > + > + err = bpf_map_update_elem(map, &one, &p1, BPF_NOEXIST); > + if (!ASSERT_OK(err, "bpf_map_update_elem(c1)")) > + goto out_close; > + > + sent = xsend(c0, packet, STRP_PKT_FULL_LEN, 0); > + if (!ASSERT_EQ(sent, STRP_PKT_FULL_LEN, "xsend(c0)")) > + goto out_close; > + > + recvd = recv_timeout(c1, rcv, sizeof(rcv), MSG_DONTWAIT, > + IO_TIMEOUT_SEC); > + if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "recv_timeout(p1)") || > + !ASSERT_OK(memcmp(packet, rcv, STRP_PKT_FULL_LEN), > + "received data does not match the sent data")) > + goto out_close; > + > + /* send again to ensure the stream is functioning correctly. */ > + sent = xsend(c0, packet, STRP_PKT_FULL_LEN, 0); > + if (!ASSERT_EQ(sent, STRP_PKT_FULL_LEN, "second xsend(c0)")) > + goto out_close; > + > + /* partial read */ > + off = STRP_PKT_FULL_LEN / 2; > + recvd = recv_timeout(c1, rcv, off, MSG_DONTWAIT, > + IO_TIMEOUT_SEC); > + recvd += recv_timeout(c1, rcv + off, sizeof(rcv) - off, MSG_DONTWAIT, > + IO_TIMEOUT_SEC); > + > + if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "partial recv_timeout(c1)") || > + !ASSERT_OK(memcmp(packet, rcv, STRP_PKT_FULL_LEN), > + "partial received data does not match the sent data")) > + goto out_close; > + > +out_close: > + close(c0); > + close(c1); > + close(p0); > + close(p1); > +out: > + test_sockmap_strp__destroy(strp); > +} > + > +void test_sockmap_strp(void) > +{ > + if (test__start_subtest("sockmap strp tcp pass")) > + test_sockmap_strp_pass(AF_INET, SOCK_STREAM, false); > + if (test__start_subtest("sockmap strp tcp v6 pass")) > + test_sockmap_strp_pass(AF_INET6, SOCK_STREAM, false); > + if (test__start_subtest("sockmap strp tcp pass fionread")) > + test_sockmap_strp_pass(AF_INET, SOCK_STREAM, true); > + if (test__start_subtest("sockmap strp tcp v6 pass fionread")) > + test_sockmap_strp_pass(AF_INET6, SOCK_STREAM, true); > + if (test__start_subtest("sockmap strp tcp verdict")) > + test_sockmap_strp_verdict(AF_INET, SOCK_STREAM); > + if (test__start_subtest("sockmap strp tcp v6 verdict")) > + test_sockmap_strp_verdict(AF_INET6, SOCK_STREAM); > + if (test__start_subtest("sockmap strp tcp partial read")) > + test_sockmap_strp_partial_read(AF_INET, SOCK_STREAM); > + if (test__start_subtest("sockmap strp tcp multiple packets")) > + test_sockmap_strp_multiple_pkt(AF_INET, SOCK_STREAM); > + if (test__start_subtest("sockmap strp tcp dispatch")) > + test_sockmap_strp_dispatch_pkt(AF_INET, SOCK_STREAM); > +} > diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_strp.c b/tools/testing/selftests/bpf/progs/test_sockmap_strp.c > new file mode 100644 > index 000000000000..dde3d5bec515 > --- /dev/null > +++ b/tools/testing/selftests/bpf/progs/test_sockmap_strp.c > @@ -0,0 +1,53 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#include <linux/bpf.h> > +#include <bpf/bpf_helpers.h> > +#include <bpf/bpf_endian.h> > +int verdict_max_size = 10000; > +struct { > + __uint(type, BPF_MAP_TYPE_SOCKMAP); > + __uint(max_entries, 20); > + __type(key, int); > + __type(value, int); > +} sock_map SEC(".maps"); > + > +SEC("sk_skb/stream_verdict") > +int prog_skb_verdict(struct __sk_buff *skb) > +{ > + __u32 one = 1; > + > + if (skb->len > verdict_max_size) > + return SK_PASS; > + > + return bpf_sk_redirect_map(skb, &sock_map, one, 0); > +} > + > +SEC("sk_skb/stream_verdict") > +int prog_skb_verdict_pass(struct __sk_buff *skb) > +{ > + return SK_PASS; > +} > + > +SEC("sk_skb/stream_parser") > +int prog_skb_parser(struct __sk_buff *skb) > +{ > + return skb->len; > +} > + > +SEC("sk_skb/stream_parser") > +int prog_skb_parser_partial(struct __sk_buff *skb) > +{ > + /* agreement with the test program on a 4-byte size header > + * and 6-byte body. > + */ > + if (skb->len < 4) { > + /* need more header to determine full length */ > + return 0; > + } > + /* return full length decoded from header. > + * the return value may be larger than skb->len which > + * means framework must wait body coming. > + */ > + return 10; > +} > + > +char _license[] SEC("license") = "GPL"; ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-01-22 8:58 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-01-21 5:07 [PATCH bpf v8 0/5] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen 2025-01-21 5:07 ` [PATCH bpf v8 1/5] strparser: add read_sock callback Jiayuan Chen 2025-01-21 14:19 ` Jakub Sitnicki 2025-01-21 5:07 ` [PATCH bpf v8 2/5] bpf: fix wrong copied_seq calculation Jiayuan Chen 2025-01-21 14:18 ` Jakub Sitnicki 2025-01-22 8:55 ` Jiayuan Chen 2025-01-21 5:07 ` [PATCH bpf v8 3/5] bpf: disable non stream socket for strparser Jiayuan Chen 2025-01-21 14:23 ` Jakub Sitnicki 2025-01-21 5:07 ` [PATCH bpf v8 4/5] selftests/bpf: fix invalid flag of recv() Jiayuan Chen 2025-01-21 14:26 ` Jakub Sitnicki 2025-01-21 5:07 ` [PATCH bpf v8 5/5] selftests/bpf: add strparser test for bpf Jiayuan Chen 2025-01-21 15:06 ` Jakub Sitnicki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).