netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf v7 0/5] bpf: fix wrong copied_seq calculation and add tests
@ 2025-01-16 14:05 Jiayuan Chen
  2025-01-16 14:05 ` [PATCH bpf v7 1/5] strparser: add read_sock callback Jiayuan Chen
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Jiayuan Chen @ 2025-01-16 14:05 UTC (permalink / raw)
  To: bpf, jakub, john.fastabend
  Cc: netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni,
	linux-kernel, song, andrii, mhal, yonghong.song, daniel,
	xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal,
	jolsa, haoluo, sdf, kpsingh, linux-doc, Jiayuan Chen

A previous commit described in this topic
http://lore.kernel.org/bpf/20230523025618.113937-9-john.fastabend@gmail.com
directly updated 'sk->copied_seq' in the tcp_eat_skb() function when the
action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS,
the update logic for 'sk->copied_seq' was moved to
tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature.

That commit works for a single stream_verdict scenario, as it also
modified 'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb'
to remove updating 'sk->copied_seq'.

However, for programs where both stream_parser and stream_verdict are
active(strparser purpose), tcp_read_sock() was used instead of
tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock)
tcp_read_sock() now still update 'sk->copied_seq', leading to duplicated
updates.

In summary, for strparser + SK_PASS, copied_seq is redundantly calculated
in both tcp_read_sock() and tcp_bpf_recvmsg_parser().

The issue causes incorrect copied_seq calculations, which prevent
correct data reads from the recv() interface in user-land.

Also we added test cases for bpf + strparser and separated them from
sockmap_basic, as strparser has more encapsulation and parsing
capabilities compared to sockmap.

Fixes: e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq")

---
V3 -> V7:
https://lore.kernel.org/bpf/20250109094402.50838-1-mrpre@163.com/
https://lore.kernel.org/bpf/20241218053408.437295-1-mrpre@163.com/
Avoid introducing new proto_ops. (Jakub Sitnicki).
Add more edge test cases for strparser + bpf.
Fix patchwork fail of test cases code.
Fix psock fetch without rcu lock.
Move code of modifying to tcp_bpf.c.

V1 -> V3:
https://lore.kernel.org/bpf/20241209152740.281125-1-mrpre@163.com/
Fix patchwork fail by adding Fixes tag.
Save skb data offset for ENOMEM. (John Fastabend)

Jiayuan Chen (5):
  strparser: add read_sock callback
  bpf: fix wrong copied_seq calculation
  bpf: disable non stream socket for strparser
  selftests/bpf: fix invalid flag of recv()
  selftests/bpf: add strparser test for bpf

 Documentation/networking/strparser.rst        |  11 +-
 include/linux/skmsg.h                         |   4 +
 include/net/strparser.h                       |   2 +
 include/net/tcp.h                             |   3 +
 net/core/skmsg.c                              |  23 +
 net/core/sock_map.c                           |  13 +-
 net/ipv4/tcp.c                                |  29 +-
 net/ipv4/tcp_bpf.c                            |  47 ++
 net/strparser/strparser.c                     |  11 +-
 .../selftests/bpf/prog_tests/sockmap_basic.c  |  59 +--
 .../selftests/bpf/prog_tests/sockmap_strp.c   | 452 ++++++++++++++++++
 .../selftests/bpf/progs/test_sockmap_strp.c   |  53 ++
 12 files changed, 642 insertions(+), 65 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sockmap_strp.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_strp.c

-- 
2.43.5


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH bpf v7 1/5] strparser: add read_sock callback
  2025-01-16 14:05 [PATCH bpf v7 0/5] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen
@ 2025-01-16 14:05 ` Jiayuan Chen
  2025-01-18 14:56   ` Jakub Sitnicki
  2025-01-16 14:05 ` [PATCH bpf v7 2/5] bpf: fix wrong copied_seq calculation Jiayuan Chen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Jiayuan Chen @ 2025-01-16 14:05 UTC (permalink / raw)
  To: bpf, jakub, john.fastabend
  Cc: netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni,
	linux-kernel, song, andrii, mhal, yonghong.song, daniel,
	xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal,
	jolsa, haoluo, sdf, kpsingh, linux-doc, Jiayuan Chen

Added a new read_sock handler, allowing users to customize read operations
instead of relying on the native socket's read_sock.

Signed-off-by: Jiayuan Chen <mrpre@163.com>
---
 Documentation/networking/strparser.rst | 11 ++++++++++-
 include/net/strparser.h                |  2 ++
 net/strparser/strparser.c              | 11 +++++++++--
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/strparser.rst b/Documentation/networking/strparser.rst
index 6cab1f74ae05..e41c18eee2f4 100644
--- a/Documentation/networking/strparser.rst
+++ b/Documentation/networking/strparser.rst
@@ -112,7 +112,7 @@ Functions
 Callbacks
 =========
 
-There are six callbacks:
+There are seven callbacks:
 
     ::
 
@@ -182,6 +182,15 @@ There are six callbacks:
     the length of the message. skb->len - offset may be greater
     then full_len since strparser does not trim the skb.
 
+    ::
+
+	int (*read_sock)(struct strparser *strp, read_descriptor_t *desc,
+                     sk_read_actor_t recv_actor);
+
+    read_sock is called when the user specify it, allowing for customized
+    read operations. If the callback is not set (NULL in strp_init) native
+    read_sock operation of the socket is used.
+
     ::
 
 	int (*read_sock_done)(struct strparser *strp, int err);
diff --git a/include/net/strparser.h b/include/net/strparser.h
index 41e2ce9e9e10..0a83010b3a64 100644
--- a/include/net/strparser.h
+++ b/include/net/strparser.h
@@ -43,6 +43,8 @@ struct strparser;
 struct strp_callbacks {
 	int (*parse_msg)(struct strparser *strp, struct sk_buff *skb);
 	void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb);
+	int (*read_sock)(struct strparser *strp, read_descriptor_t *desc,
+			 sk_read_actor_t recv_actor);
 	int (*read_sock_done)(struct strparser *strp, int err);
 	void (*abort_parser)(struct strparser *strp, int err);
 	void (*lock)(struct strparser *strp);
diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index 8299ceb3e373..95696f42647e 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -347,7 +347,10 @@ static int strp_read_sock(struct strparser *strp)
 	struct socket *sock = strp->sk->sk_socket;
 	read_descriptor_t desc;
 
-	if (unlikely(!sock || !sock->ops || !sock->ops->read_sock))
+	if (unlikely(!sock || !sock->ops))
+		return -EBUSY;
+
+	if (unlikely(!strp->cb.read_sock && !sock->ops->read_sock))
 		return -EBUSY;
 
 	desc.arg.data = strp;
@@ -355,7 +358,10 @@ static int strp_read_sock(struct strparser *strp)
 	desc.count = 1; /* give more than one skb per call */
 
 	/* sk should be locked here, so okay to do read_sock */
-	sock->ops->read_sock(strp->sk, &desc, strp_recv);
+	if (strp->cb.read_sock)
+		strp->cb.read_sock(strp, &desc, strp_recv);
+	else
+		sock->ops->read_sock(strp->sk, &desc, strp_recv);
 
 	desc.error = strp->cb.read_sock_done(strp, desc.error);
 
@@ -468,6 +474,7 @@ int strp_init(struct strparser *strp, struct sock *sk,
 	strp->cb.unlock = cb->unlock ? : strp_sock_unlock;
 	strp->cb.rcv_msg = cb->rcv_msg;
 	strp->cb.parse_msg = cb->parse_msg;
+	strp->cb.read_sock = cb->read_sock;
 	strp->cb.read_sock_done = cb->read_sock_done ? : default_read_sock_done;
 	strp->cb.abort_parser = cb->abort_parser ? : strp_abort_strp;
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf v7 2/5] bpf: fix wrong copied_seq calculation
  2025-01-16 14:05 [PATCH bpf v7 0/5] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen
  2025-01-16 14:05 ` [PATCH bpf v7 1/5] strparser: add read_sock callback Jiayuan Chen
@ 2025-01-16 14:05 ` Jiayuan Chen
  2025-01-18 14:50   ` Jakub Sitnicki
  2025-01-16 14:05 ` [PATCH bpf v7 3/5] bpf: disable non stream socket for strparser Jiayuan Chen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Jiayuan Chen @ 2025-01-16 14:05 UTC (permalink / raw)
  To: bpf, jakub, john.fastabend
  Cc: netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni,
	linux-kernel, song, andrii, mhal, yonghong.song, daniel,
	xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal,
	jolsa, haoluo, sdf, kpsingh, linux-doc, Jiayuan Chen

'sk->copied_seq' was updated in the tcp_eat_skb() function when the
action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS,
the update logic for 'sk->copied_seq' was moved to
tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature.

It works for a single stream_verdict scenario, as it also modified
'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb'
to remove updating 'sk->copied_seq'.

However, for programs where both stream_parser and stream_verdict are
active(strparser purpose), tcp_read_sock() was used instead of
tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock)
tcp_read_sock() now still update 'sk->copied_seq', leading to duplicated
updates.

In summary, for strparser + SK_PASS, copied_seq is redundantly calculated
in both tcp_read_sock() and tcp_bpf_recvmsg_parser().

The issue causes incorrect copied_seq calculations, which prevent
correct data reads from the recv() interface in user-land.

We do not want to add new proto_ops to implement a new version of
tcp_read_sock, as this would introduce code complexity [1].

We add new callback for strparser for customized read operation, also as
a wrapper function it provides abstraction use psock.

[1]: https://lore.kernel.org/bpf/20241218053408.437295-1-mrpre@163.com
Fixes: e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq")
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jiayuan Chen <mrpre@163.com>
---
 include/linux/skmsg.h |  4 ++++
 include/net/tcp.h     |  3 +++
 net/core/skmsg.c      | 23 +++++++++++++++++++++
 net/ipv4/tcp.c        | 29 +++++++++++++++++++++-----
 net/ipv4/tcp_bpf.c    | 47 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 2cbe0c22a32f..c9343eeac8b3 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -91,6 +91,10 @@ struct sk_psock {
 	struct sk_psock_progs		progs;
 #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
 	struct strparser		strp;
+	int (*read_sock)(struct sock *sk, read_descriptor_t *desc,
+			 sk_read_actor_t recv_actor);
+	u32				copied_seq;
+	u32				ingress_bytes;
 #endif
 	struct sk_buff_head		ingress_skb;
 	struct list_head		ingress_msg;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index e9b37b76e894..88e55e62023c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -729,6 +729,9 @@ void tcp_get_info(struct sock *, struct tcp_info *);
 /* Read 'sendfile()'-style from a TCP socket */
 int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 		  sk_read_actor_t recv_actor);
+int tcp_read_sock_noack(struct sock *sk, read_descriptor_t *desc,
+			sk_read_actor_t recv_actor, bool noack,
+			u32 *copied_seq);
 int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor);
 struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off);
 void tcp_read_done(struct sock *sk, size_t len);
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 61f3f3d4e528..6695659d3447 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -549,6 +549,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
 			return num_sge;
 	}
 
+#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
+	psock->ingress_bytes += len;
+#endif
 	copied = len;
 	msg->sg.start = 0;
 	msg->sg.size = copied;
@@ -1092,6 +1095,25 @@ static int sk_psock_strp_read_done(struct strparser *strp, int err)
 	return err;
 }
 
+static int sk_psock_strp_read_sock(struct strparser *strp,
+				   read_descriptor_t *desc,
+				   sk_read_actor_t recv_actor)
+{
+	struct sock *sk = strp->sk;
+	struct socket *sock = sk->sk_socket;
+	struct sk_psock *psock;
+	int rv = 0;
+
+	rcu_read_lock();
+	psock = sk_psock(sk);
+	if (likely(psock && psock->read_sock))
+		rv = psock->read_sock(sk, desc, recv_actor);
+	else if (sock && sock->ops && sock->ops->read_sock)
+		rv = sock->ops->read_sock(sk, desc, recv_actor);
+	rcu_read_unlock();
+	return rv;
+}
+
 static int sk_psock_strp_parse(struct strparser *strp, struct sk_buff *skb)
 {
 	struct sk_psock *psock = container_of(strp, struct sk_psock, strp);
@@ -1136,6 +1158,7 @@ int sk_psock_init_strp(struct sock *sk, struct sk_psock *psock)
 
 	static const struct strp_callbacks cb = {
 		.rcv_msg	= sk_psock_strp_read,
+		.read_sock	= sk_psock_strp_read_sock,
 		.read_sock_done	= sk_psock_strp_read_done,
 		.parse_msg	= sk_psock_strp_parse,
 	};
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0d704bda6c41..285678d8ce07 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1565,12 +1565,13 @@ EXPORT_SYMBOL(tcp_recv_skb);
  *	  or for 'peeking' the socket using this routine
  *	  (although both would be easy to implement).
  */
-int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
-		  sk_read_actor_t recv_actor)
+static int __tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
+			   sk_read_actor_t recv_actor, bool noack,
+			   u32 *copied_seq)
 {
 	struct sk_buff *skb;
 	struct tcp_sock *tp = tcp_sk(sk);
-	u32 seq = tp->copied_seq;
+	u32 seq = *copied_seq;
 	u32 offset;
 	int copied = 0;
 
@@ -1624,9 +1625,12 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 		tcp_eat_recv_skb(sk, skb);
 		if (!desc->count)
 			break;
-		WRITE_ONCE(tp->copied_seq, seq);
+		WRITE_ONCE(*copied_seq, seq);
 	}
-	WRITE_ONCE(tp->copied_seq, seq);
+	WRITE_ONCE(*copied_seq, seq);
+
+	if (noack)
+		goto out;
 
 	tcp_rcv_space_adjust(sk);
 
@@ -1635,10 +1639,25 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 		tcp_recv_skb(sk, seq, &offset);
 		tcp_cleanup_rbuf(sk, copied);
 	}
+out:
 	return copied;
 }
+
+int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
+		  sk_read_actor_t recv_actor)
+{
+	return __tcp_read_sock(sk, desc, recv_actor, false,
+			       &tcp_sk(sk)->copied_seq);
+}
 EXPORT_SYMBOL(tcp_read_sock);
 
+int tcp_read_sock_noack(struct sock *sk, read_descriptor_t *desc,
+			sk_read_actor_t recv_actor, bool noack,
+			u32 *copied_seq)
+{
+	return __tcp_read_sock(sk, desc, recv_actor, noack, copied_seq);
+}
+
 int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
 	struct sk_buff *skb;
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 47f65b1b70ca..6dcde3506a9b 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -646,6 +646,47 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops)
 	       ops->sendmsg  == tcp_sendmsg ? 0 : -ENOTSUPP;
 }
 
+#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
+static int tcp_bpf_strp_read_sock(struct sock *sk, read_descriptor_t *desc,
+				  sk_read_actor_t recv_actor)
+{
+	struct sk_psock *psock;
+	struct tcp_sock *tp;
+	int copied = 0;
+
+	tp = tcp_sk(sk);
+	rcu_read_lock();
+	psock = sk_psock(sk);
+	if (WARN_ON(!psock)) {
+		desc->error = -EINVAL;
+		goto out;
+	}
+
+	psock->ingress_bytes = 0;
+	/* We could easily add copied_seq and noack into desc then call
+	 * ops->read_sock without calling symbol directly. But unfortunately
+	 * most descriptors used by other modules are not inited with zero.
+	 * Also it not work by replacing ops->read_sock without introducing
+	 * new ops as ops itself is located in rodata segment.
+	 */
+	copied = tcp_read_sock_noack(sk, desc, recv_actor, true,
+				     &psock->copied_seq);
+	if (copied < 0)
+		goto out;
+	/* recv_actor may redirect skb to another socket(SK_REDIRECT) or
+	 * just put skb into ingress queue of current socket(SK_PASS).
+	 * For SK_REDIRECT, we need 'ack' the frame immediately but for
+	 * SK_PASS, the 'ack' was delay to tcp_bpf_recvmsg_parser()
+	 */
+	tp->copied_seq = psock->copied_seq - psock->ingress_bytes;
+	tcp_rcv_space_adjust(sk);
+	__tcp_cleanup_rbuf(sk, copied - psock->ingress_bytes);
+out:
+	rcu_read_unlock();
+	return copied;
+}
+#endif /* CONFIG_BPF_STREAM_PARSER */
+
 int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
 {
 	int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4;
@@ -681,6 +722,12 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
 
 	/* Pairs with lockless read in sk_clone_lock() */
 	sock_replace_proto(sk, &tcp_bpf_prots[family][config]);
+#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
+	if (psock->progs.stream_parser && psock->progs.stream_verdict) {
+		psock->copied_seq = tcp_sk(sk)->copied_seq;
+		psock->read_sock = tcp_bpf_strp_read_sock;
+	}
+#endif
 	return 0;
 }
 EXPORT_SYMBOL_GPL(tcp_bpf_update_proto);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf v7 3/5] bpf: disable non stream socket for strparser
  2025-01-16 14:05 [PATCH bpf v7 0/5] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen
  2025-01-16 14:05 ` [PATCH bpf v7 1/5] strparser: add read_sock callback Jiayuan Chen
  2025-01-16 14:05 ` [PATCH bpf v7 2/5] bpf: fix wrong copied_seq calculation Jiayuan Chen
@ 2025-01-16 14:05 ` Jiayuan Chen
  2025-01-18 15:03   ` Jakub Sitnicki
  2025-01-16 14:05 ` [PATCH bpf v7 4/5] selftests/bpf: fix invalid flag of recv() Jiayuan Chen
  2025-01-16 14:05 ` [PATCH bpf v7 5/5] selftests/bpf: add strparser test for bpf Jiayuan Chen
  4 siblings, 1 reply; 13+ messages in thread
From: Jiayuan Chen @ 2025-01-16 14:05 UTC (permalink / raw)
  To: bpf, jakub, john.fastabend
  Cc: netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni,
	linux-kernel, song, andrii, mhal, yonghong.song, daniel,
	xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal,
	jolsa, haoluo, sdf, kpsingh, linux-doc, Jiayuan Chen

Currently, only TCP supports strparser, but sockmap doesn't intercept
non-TCP to attach strparser. For example, with UDP, although the
read/write handlers are replaced, strparser is not executed due to the
lack of read_sock operation.

Furthermore, in udp_bpf_recvmsg(), it checks whether psock has data, and
if not, it falls back to the native UDP read interface, making
UDP + strparser appear to read correctly. According to it's commit
history, the behavior is unexpected.

Moreover, since UDP lacks the concept of streams, we intercept it
directly. Later, we will try to support Unix streams and add more
check.

Signed-off-by: Jiayuan Chen <mrpre@163.com>
---
 net/core/sock_map.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index f1b9b3958792..c6ee2d1d9cf2 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -214,6 +214,14 @@ static struct sk_psock *sock_map_psock_get_checked(struct sock *sk)
 	return psock;
 }
 
+static bool sock_map_sk_strp_allowed(const struct sock *sk)
+{
+	/* todo: support unix stream socket */
+	if (sk_is_tcp(sk))
+		return true;
+	return false;
+}
+
 static int sock_map_link(struct bpf_map *map, struct sock *sk)
 {
 	struct sk_psock_progs *progs = sock_map_progs(map);
@@ -303,7 +311,10 @@ static int sock_map_link(struct bpf_map *map, struct sock *sk)
 
 	write_lock_bh(&sk->sk_callback_lock);
 	if (stream_parser && stream_verdict && !psock->saved_data_ready) {
-		ret = sk_psock_init_strp(sk, psock);
+		if (sock_map_sk_strp_allowed(sk))
+			ret = sk_psock_init_strp(sk, psock);
+		else
+			ret = -EOPNOTSUPP;
 		if (ret) {
 			write_unlock_bh(&sk->sk_callback_lock);
 			sk_psock_put(sk, psock);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf v7 4/5] selftests/bpf: fix invalid flag of recv()
  2025-01-16 14:05 [PATCH bpf v7 0/5] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen
                   ` (2 preceding siblings ...)
  2025-01-16 14:05 ` [PATCH bpf v7 3/5] bpf: disable non stream socket for strparser Jiayuan Chen
@ 2025-01-16 14:05 ` Jiayuan Chen
  2025-01-16 14:05 ` [PATCH bpf v7 5/5] selftests/bpf: add strparser test for bpf Jiayuan Chen
  4 siblings, 0 replies; 13+ messages in thread
From: Jiayuan Chen @ 2025-01-16 14:05 UTC (permalink / raw)
  To: bpf, jakub, john.fastabend
  Cc: netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni,
	linux-kernel, song, andrii, mhal, yonghong.song, daniel,
	xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal,
	jolsa, haoluo, sdf, kpsingh, linux-doc, Jiayuan Chen

SOCK_NONBLOCK flag is only effective during socket creation, not during
recv. Use MSG_DONTWAIT instead.

Signed-off-by: Jiayuan Chen <mrpre@163.com>
---
 tools/testing/selftests/bpf/prog_tests/sockmap_basic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index 884ad87783d5..0c51b7288978 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -522,8 +522,8 @@ static void test_sockmap_skb_verdict_shutdown(void)
 	if (!ASSERT_EQ(err, 1, "epoll_wait(fd)"))
 		goto out_close;
 
-	n = recv(c1, &b, 1, SOCK_NONBLOCK);
-	ASSERT_EQ(n, 0, "recv_timeout(fin)");
+	n = recv(c1, &b, 1, MSG_DONTWAIT);
+	ASSERT_EQ(n, 0, "recv(fin)");
 out_close:
 	close(c1);
 	close(p1);
@@ -628,7 +628,7 @@ static void test_sockmap_skb_verdict_fionread(bool pass_prog)
 	ASSERT_EQ(avail, expected, "ioctl(FIONREAD)");
 	/* On DROP test there will be no data to read */
 	if (pass_prog) {
-		recvd = recv_timeout(c1, &buf, sizeof(buf), SOCK_NONBLOCK, IO_TIMEOUT_SEC);
+		recvd = recv_timeout(c1, &buf, sizeof(buf), MSG_DONTWAIT, IO_TIMEOUT_SEC);
 		ASSERT_EQ(recvd, sizeof(buf), "recv_timeout(c0)");
 	}
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf v7 5/5] selftests/bpf: add strparser test for bpf
  2025-01-16 14:05 [PATCH bpf v7 0/5] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen
                   ` (3 preceding siblings ...)
  2025-01-16 14:05 ` [PATCH bpf v7 4/5] selftests/bpf: fix invalid flag of recv() Jiayuan Chen
@ 2025-01-16 14:05 ` Jiayuan Chen
  4 siblings, 0 replies; 13+ messages in thread
From: Jiayuan Chen @ 2025-01-16 14:05 UTC (permalink / raw)
  To: bpf, jakub, john.fastabend
  Cc: netdev, martin.lau, ast, edumazet, davem, dsahern, kuba, pabeni,
	linux-kernel, song, andrii, mhal, yonghong.song, daniel,
	xiyou.wangcong, horms, corbet, eddyz87, cong.wang, shuah, mykolal,
	jolsa, haoluo, sdf, kpsingh, linux-doc, Jiayuan Chen

Add test cases for bpf + strparser and separated them from
sockmap_basic, as strparser has more encapsulation and parsing
capabilities compared to sockmap.

Signed-off-by: Jiayuan Chen <mrpre@163.com>
---
 .../selftests/bpf/prog_tests/sockmap_basic.c  |  53 --
 .../selftests/bpf/prog_tests/sockmap_strp.c   | 452 ++++++++++++++++++
 .../selftests/bpf/progs/test_sockmap_strp.c   |  53 ++
 3 files changed, 505 insertions(+), 53 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sockmap_strp.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_strp.c

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index 0c51b7288978..f8953455db29 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -531,57 +531,6 @@ static void test_sockmap_skb_verdict_shutdown(void)
 	test_sockmap_pass_prog__destroy(skel);
 }
 
-static void test_sockmap_stream_pass(void)
-{
-	int zero = 0, sent, recvd;
-	int verdict, parser;
-	int err, map;
-	int c = -1, p = -1;
-	struct test_sockmap_pass_prog *pass = NULL;
-	char snd[256] = "0123456789";
-	char rcv[256] = "0";
-
-	pass = test_sockmap_pass_prog__open_and_load();
-	verdict = bpf_program__fd(pass->progs.prog_skb_verdict);
-	parser = bpf_program__fd(pass->progs.prog_skb_parser);
-	map = bpf_map__fd(pass->maps.sock_map_rx);
-
-	err = bpf_prog_attach(parser, map, BPF_SK_SKB_STREAM_PARSER, 0);
-	if (!ASSERT_OK(err, "bpf_prog_attach stream parser"))
-		goto out;
-
-	err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0);
-	if (!ASSERT_OK(err, "bpf_prog_attach stream verdict"))
-		goto out;
-
-	err = create_pair(AF_INET, SOCK_STREAM, &c, &p);
-	if (err)
-		goto out;
-
-	/* sk_data_ready of 'p' will be replaced by strparser handler */
-	err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST);
-	if (!ASSERT_OK(err, "bpf_map_update_elem(p)"))
-		goto out_close;
-
-	/*
-	 * as 'prog_skb_parser' return the original skb len and
-	 * 'prog_skb_verdict' return SK_PASS, the kernel will just
-	 * pass it through to original socket 'p'
-	 */
-	sent = xsend(c, snd, sizeof(snd), 0);
-	ASSERT_EQ(sent, sizeof(snd), "xsend(c)");
-
-	recvd = recv_timeout(p, rcv, sizeof(rcv), SOCK_NONBLOCK,
-			     IO_TIMEOUT_SEC);
-	ASSERT_EQ(recvd, sizeof(rcv), "recv_timeout(p)");
-
-out_close:
-	close(c);
-	close(p);
-
-out:
-	test_sockmap_pass_prog__destroy(pass);
-}
 
 static void test_sockmap_skb_verdict_fionread(bool pass_prog)
 {
@@ -1101,8 +1050,6 @@ void test_sockmap_basic(void)
 		test_sockmap_progs_query(BPF_SK_SKB_VERDICT);
 	if (test__start_subtest("sockmap skb_verdict shutdown"))
 		test_sockmap_skb_verdict_shutdown();
-	if (test__start_subtest("sockmap stream parser and verdict pass"))
-		test_sockmap_stream_pass();
 	if (test__start_subtest("sockmap skb_verdict fionread"))
 		test_sockmap_skb_verdict_fionread(true);
 	if (test__start_subtest("sockmap skb_verdict fionread on drop"))
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_strp.c b/tools/testing/selftests/bpf/prog_tests/sockmap_strp.c
new file mode 100644
index 000000000000..01ed1fca1d9c
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_strp.c
@@ -0,0 +1,452 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <error.h>
+#include <netinet/tcp.h>
+#include <test_progs.h>
+#include "sockmap_helpers.h"
+#include "test_skmsg_load_helpers.skel.h"
+#include "test_sockmap_strp.skel.h"
+#define STRP_PKT_HEAD_LEN 4
+#define STRP_PKT_BODY_LEN 6
+#define STRP_PKT_FULL_LEN (STRP_PKT_HEAD_LEN + STRP_PKT_BODY_LEN)
+static const char packet[STRP_PKT_FULL_LEN] = "head+body\0";
+static const int test_packet_num = 100;
+
+/* current implementation of tcp_bpf_recvmsg_parser() invoke
+ * data_ready with sk held if skb exist in sk_receive_queue.
+ * Then for data_ready implementation of strparser, it will
+ * delay the read operation if sk was held and EAGAIN is returned.
+ */
+static int sockmap_strp_consume_pre_data(int p)
+{
+	int recvd;
+	bool retried = false;
+	char rcv[10];
+
+retry:
+	errno = 0;
+	recvd = recv_timeout(p, rcv, sizeof(rcv), 0, 1);
+	if (recvd < 0 && errno == EAGAIN && retried == false) {
+		/* On the first call, EAGAIN will certainly be returned.
+		 * Waiting 1 second is pretty enough wait workqueue finish.
+		 */
+		sleep(1);
+		retried = true;
+		goto retry;
+	}
+
+	if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "recv_timeout(pre-data)") ||
+	    !ASSERT_OK(memcmp(packet, rcv, STRP_PKT_FULL_LEN),
+				"memcmp pre-data"))
+		return -1;
+	return 0;
+}
+
+static struct test_sockmap_strp *sockmap_strp_init(int *out_map, bool pass,
+						   bool need_parser)
+{
+	struct test_sockmap_strp *strp = NULL;
+	int verdict, parser;
+	int err;
+
+	strp = test_sockmap_strp__open_and_load();
+	*out_map = bpf_map__fd(strp->maps.sock_map);
+
+	if (need_parser)
+		parser = bpf_program__fd(strp->progs.prog_skb_parser_partial);
+	else
+		parser = bpf_program__fd(strp->progs.prog_skb_parser);
+
+	if (pass)
+		verdict = bpf_program__fd(strp->progs.prog_skb_verdict_pass);
+	else
+		verdict = bpf_program__fd(strp->progs.prog_skb_verdict);
+
+	err = bpf_prog_attach(parser, *out_map, BPF_SK_SKB_STREAM_PARSER, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach stream parser"))
+		goto err;
+
+	err = bpf_prog_attach(verdict, *out_map, BPF_SK_SKB_STREAM_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach stream verdict"))
+		goto err;
+
+	return strp;
+err:
+	test_sockmap_strp__destroy(strp);
+	return NULL;
+}
+
+/* Dispatch packets to different socket by packet size:
+ *
+ *                      ------  ------
+ *                     | pkt4 || pkt1 |... > remote socket
+ *  ------ ------     / ------  ------
+ * | pkt8 | pkt7 |...
+ *  ------ ------     \ ------  ------
+ *                     | pkt3 || pkt2 |... > local socket
+ *                      ------  ------
+ */
+static void test_sockmap_strp_dispatch_pkt(int family, int sotype)
+{
+	int i, j, zero = 0, one = 1, recvd;
+	int err, map;
+	int c0 = -1, p0 = -1, c1 = -1, p1 = -1;
+	struct test_sockmap_strp *strp = NULL;
+	int test_cnt = 6;
+	char rcv[10];
+	struct {
+		char	data[7];
+		int	data_len;
+		int	send_cnt;
+		int	*receiver;
+	} send_dir[2] = {
+		/* data expected to deliver to local */
+		{"llllll", 6, 0, &p0},
+		/* data expected to deliver to remote */
+		{"rrrrr",  5, 0, &c1}
+	};
+
+	strp = sockmap_strp_init(&map, false, false);
+	if (!ASSERT_TRUE(strp, "sockmap_strp_init"))
+		return;
+
+	err = create_socket_pairs(family, sotype, &c0, &c1, &p0, &p1);
+	if (!ASSERT_OK(err, "create_socket_pairs()"))
+		goto out;
+
+	err = bpf_map_update_elem(map, &zero, &p0, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(p0)"))
+		goto out_close;
+
+	err = bpf_map_update_elem(map, &one, &p1, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(p1)"))
+		goto out_close;
+
+	err = setsockopt(c1, IPPROTO_TCP, TCP_NODELAY, &zero, sizeof(zero));
+	if (!ASSERT_OK(err, "setsockopt(TCP_NODELAY)"))
+		goto out_close;
+
+	/* deliver data with data size greater than 5 to local */
+	strp->data->verdict_max_size = 5;
+
+	for (i = 0; i < test_cnt; i++) {
+		int d = i % 2;
+
+		xsend(c0, send_dir[d].data, send_dir[d].data_len, 0);
+		send_dir[d].send_cnt++;
+	}
+
+	for (i = 0; i < 2; i++) {
+		for (j = 0; j < send_dir[i].send_cnt; j++) {
+			int expected = send_dir[i].data_len;
+
+			recvd = recv_timeout(*send_dir[i].receiver, rcv,
+					     expected, MSG_DONTWAIT,
+					     IO_TIMEOUT_SEC);
+			if (!ASSERT_EQ(recvd, expected, "recv_timeout()"))
+				goto out_close;
+			if (!ASSERT_OK(memcmp(send_dir[i].data, rcv, recvd),
+				       "memcmp(rcv)"))
+				goto out_close;
+		}
+	}
+out_close:
+	close(c0);
+	close(c1);
+	close(p0);
+	close(p1);
+out:
+	test_sockmap_strp__destroy(strp);
+}
+
+/* We have multiple packets in one skb
+ * ------------ ------------ ------------
+ * |  packet1  |   packet2  |  ...
+ * ------------ ------------ ------------
+ */
+static void test_sockmap_strp_multiple_pkt(int family, int sotype)
+{
+	int i, zero = 0;
+	int sent, recvd, total;
+	int err, map;
+	int c = -1, p = -1;
+	struct test_sockmap_strp *strp = NULL;
+	char *snd = NULL, *rcv = NULL;
+
+	strp = sockmap_strp_init(&map, true, true);
+	if (!ASSERT_TRUE(strp, "sockmap_strp_init"))
+		return;
+
+	err = create_pair(family, sotype, &c, &p);
+	if (err)
+		goto out;
+
+	err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(zero, p)"))
+		goto out_close;
+
+	/* construct multiple packets in one buffer */
+	total = test_packet_num * STRP_PKT_FULL_LEN;
+	snd = malloc(total);
+	rcv = malloc(total + 1);
+	if (!ASSERT_TRUE(snd, "malloc(snd)") ||
+	    !ASSERT_TRUE(rcv, "malloc(rcv)"))
+		goto out_close;
+
+	for (i = 0; i < test_packet_num; i++) {
+		memcpy(snd + i * STRP_PKT_FULL_LEN,
+		       packet, STRP_PKT_FULL_LEN);
+	}
+
+	sent = xsend(c, snd, total, 0);
+	if (!ASSERT_EQ(sent, total, "xsend(c)"))
+		goto out_close;
+
+	/* try to recv one more byte to avoid truncation check */
+	recvd = recv_timeout(p, rcv, total + 1, MSG_DONTWAIT, IO_TIMEOUT_SEC);
+	if (!ASSERT_EQ(recvd, total, "recv(rcv)"))
+		goto out_close;
+
+	/* we sent TCP segment with multiple encapsulation
+	 * then check whether packets are handled correctly
+	 */
+	if (!ASSERT_OK(memcmp(snd, rcv, total), "memcmp(snd, rcv)"))
+		goto out_close;
+
+out_close:
+	close(c);
+	close(p);
+	if (snd)
+		free(snd);
+	if (rcv)
+		free(rcv);
+out:
+	test_sockmap_strp__destroy(strp);
+}
+
+/* Test strparser with partial read */
+static void test_sockmap_strp_partial_read(int family, int sotype)
+{
+	int zero = 0, recvd, off;
+	int err, map;
+	int c = -1, p = -1;
+	struct test_sockmap_strp *strp = NULL;
+	char rcv[STRP_PKT_FULL_LEN + 1] = "0";
+
+	strp = sockmap_strp_init(&map, true, true);
+	if (!ASSERT_TRUE(strp, "sockmap_strp_init"))
+		return;
+
+	err = create_pair(family, sotype, &c, &p);
+	if (err)
+		goto out;
+
+	/* sk_data_ready of 'p' will be replaced by strparser handler */
+	err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(zero, p)"))
+		goto out_close;
+
+	/* 1.1 send partial head, 1 byte header left*/
+	off = STRP_PKT_HEAD_LEN - 1;
+	xsend(c, packet, off, 0);
+	recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, 1);
+	if (!ASSERT_EQ(-1, recvd, "insufficient head, should no data recvd"))
+		goto out_close;
+
+	/* 1.2 send remaining head and body */
+	xsend(c, packet + off, STRP_PKT_FULL_LEN - off, 0);
+	recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, IO_TIMEOUT_SEC);
+	if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "should full data recvd"))
+		goto out_close;
+
+	/* 2.1 send partial head, 1 byte header left */
+	off = STRP_PKT_HEAD_LEN - 1;
+	xsend(c, packet, off, 0);
+
+	/* 2.2 send remaining head and partial body, 1 byte body left */
+	xsend(c, packet + off, STRP_PKT_FULL_LEN - off - 1, 0);
+	off = STRP_PKT_FULL_LEN - 1;
+	recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, 1);
+	if (!ASSERT_EQ(-1, recvd, "insufficient body, should no data read"))
+		goto out_close;
+
+	/* 2.3 send remaining body */
+	xsend(c, packet + off, STRP_PKT_FULL_LEN - off, 0);
+	recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, IO_TIMEOUT_SEC);
+	if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "should full data recvd"))
+		goto out_close;
+
+out_close:
+	close(c);
+	close(p);
+
+out:
+	test_sockmap_strp__destroy(strp);
+}
+
+/* Test simple socket read/write with strparser + FIONREAD */
+static void test_sockmap_strp_pass(int family, int sotype, bool fionread)
+{
+	int zero = 0, pkt_size = STRP_PKT_FULL_LEN, sent, recvd, avail;
+	int err, map;
+	int c = -1, p = -1;
+	int test_cnt = 10, i;
+	struct test_sockmap_strp *strp = NULL;
+	char rcv[STRP_PKT_FULL_LEN + 1] = "0";
+
+	strp = sockmap_strp_init(&map, true, true);
+	if (!ASSERT_TRUE(strp, "sockmap_strp_init"))
+		return;
+
+	err = create_pair(family, sotype, &c, &p);
+	if (err)
+		goto out;
+
+	/* inject some data before bpf process, it should be read
+	 * correctly because we check sk_receive_queue in
+	 * tcp_bpf_recvmsg_parser()
+	 */
+	sent = xsend(c, packet, pkt_size, 0);
+	if (!ASSERT_EQ(sent, pkt_size, "xsend(pre-data)"))
+		goto out_close;
+
+	/* sk_data_ready of 'p' will be replaced by strparser handler */
+	err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(p)"))
+		goto out_close;
+
+	/* consume previous data we injected */
+	if (sockmap_strp_consume_pre_data(p))
+		goto out_close;
+
+	/* Previously, we encountered issues such as deadlocks and
+	 * sequence errors that resulted in the inability to read
+	 * continuously. Therefore, we perform multiple iterations
+	 * of testing here.
+	 */
+	for (i = 0; i < test_cnt; i++) {
+		sent = xsend(c, packet, pkt_size, 0);
+		if (!ASSERT_EQ(sent, pkt_size, "xsend(c)"))
+			goto out_close;
+
+		recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT,
+				     IO_TIMEOUT_SEC);
+		if (!ASSERT_EQ(recvd, pkt_size, "recv_timeout(p)") ||
+		    !ASSERT_OK(memcmp(packet, rcv, pkt_size),
+				  "memcmp"))
+			goto out_close;
+	}
+
+	if (fionread) {
+		sent = xsend(c, packet, pkt_size, 0);
+		if (!ASSERT_EQ(sent, pkt_size, "second xsend(c)"))
+			goto out_close;
+
+		err = ioctl(p, FIONREAD, &avail);
+		if (!ASSERT_OK(err, "ioctl(FIONREAD) error") ||
+		    !ASSERT_EQ(avail, pkt_size, "ioctl(FIONREAD)"))
+			goto out_close;
+
+		recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT,
+				     IO_TIMEOUT_SEC);
+		if (!ASSERT_EQ(recvd, pkt_size, "second recv_timeout(p)") ||
+		    !ASSERT_OK(memcmp(packet, rcv, pkt_size),
+			      "second memcmp"))
+			goto out_close;
+	}
+
+out_close:
+	close(c);
+	close(p);
+
+out:
+	test_sockmap_strp__destroy(strp);
+}
+
+/* Test strparser with verdict mode */
+static void test_sockmap_strp_verdict(int family, int sotype)
+{
+	int zero = 0, one = 1, sent, recvd, off;
+	int err, map;
+	int c0 = -1, p0 = -1, c1 = -1, p1 = -1;
+	struct test_sockmap_strp *strp = NULL;
+	char rcv[STRP_PKT_FULL_LEN + 1] = "0";
+
+	strp = sockmap_strp_init(&map, false, true);
+	if (!ASSERT_TRUE(strp, "sockmap_strp_init"))
+		return;
+
+	/* We simulate a reverse proxy server.
+	 * When p0 receives data from c0, we forward it to c1.
+	 * From c1's perspective, it will consider this data
+	 * as being sent by p1.
+	 */
+	err = create_socket_pairs(family, sotype, &c0, &c1, &p0, &p1);
+	if (!ASSERT_OK(err, "create_socket_pairs()"))
+		goto out;
+
+	err = bpf_map_update_elem(map, &zero, &p0, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(p0)"))
+		goto out_close;
+
+	err = bpf_map_update_elem(map, &one, &p1, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(c1)"))
+		goto out_close;
+
+	sent = xsend(c0, packet, STRP_PKT_FULL_LEN, 0);
+	if (!ASSERT_EQ(sent, STRP_PKT_FULL_LEN, "xsend(c0)"))
+		goto out_close;
+
+	recvd = recv_timeout(c1, rcv, sizeof(rcv), MSG_DONTWAIT,
+			     IO_TIMEOUT_SEC);
+	if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "recv_timeout(p1)") ||
+	    !ASSERT_OK(memcmp(packet, rcv, STRP_PKT_FULL_LEN),
+			  "received data does not match the sent data"))
+		goto out_close;
+
+	/* send again to ensure the stream is functioning correctly. */
+	sent = xsend(c0, packet, STRP_PKT_FULL_LEN, 0);
+	if (!ASSERT_EQ(sent, STRP_PKT_FULL_LEN, "second xsend(c0)"))
+		goto out_close;
+
+	/* partial read */
+	off = STRP_PKT_FULL_LEN / 2;
+	recvd = recv_timeout(c1, rcv, off, MSG_DONTWAIT,
+			     IO_TIMEOUT_SEC);
+	recvd += recv_timeout(c1, rcv + off, sizeof(rcv) - off, MSG_DONTWAIT,
+			      IO_TIMEOUT_SEC);
+
+	if (!ASSERT_EQ(recvd, STRP_PKT_FULL_LEN, "partial recv_timeout(c1)") ||
+	    !ASSERT_OK(memcmp(packet, rcv, STRP_PKT_FULL_LEN),
+			  "partial received data does not match the sent data"))
+		goto out_close;
+
+out_close:
+	close(c0);
+	close(c1);
+	close(p0);
+	close(p1);
+out:
+	test_sockmap_strp__destroy(strp);
+}
+
+void test_sockmap_strp(void)
+{
+	if (test__start_subtest("sockmap strp tcp pass"))
+		test_sockmap_strp_pass(AF_INET, SOCK_STREAM, false);
+	if (test__start_subtest("sockmap strp tcp v6 pass"))
+		test_sockmap_strp_pass(AF_INET6, SOCK_STREAM, false);
+	if (test__start_subtest("sockmap strp tcp pass fionread"))
+		test_sockmap_strp_pass(AF_INET, SOCK_STREAM, true);
+	if (test__start_subtest("sockmap strp tcp v6 pass fionread"))
+		test_sockmap_strp_pass(AF_INET6, SOCK_STREAM, true);
+	if (test__start_subtest("sockmap strp tcp verdict"))
+		test_sockmap_strp_verdict(AF_INET, SOCK_STREAM);
+	if (test__start_subtest("sockmap strp tcp v6 verdict"))
+		test_sockmap_strp_verdict(AF_INET6, SOCK_STREAM);
+	if (test__start_subtest("sockmap strp tcp partial read"))
+		test_sockmap_strp_partial_read(AF_INET, SOCK_STREAM);
+	if (test__start_subtest("sockmap strp tcp multiple packets"))
+		test_sockmap_strp_multiple_pkt(AF_INET, SOCK_STREAM);
+	if (test__start_subtest("sockmap strp tcp dispatch"))
+		test_sockmap_strp_dispatch_pkt(AF_INET, SOCK_STREAM);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_strp.c b/tools/testing/selftests/bpf/progs/test_sockmap_strp.c
new file mode 100644
index 000000000000..dde3d5bec515
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_strp.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+int verdict_max_size = 10000;
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, 20);
+	__type(key, int);
+	__type(value, int);
+} sock_map SEC(".maps");
+
+SEC("sk_skb/stream_verdict")
+int prog_skb_verdict(struct __sk_buff *skb)
+{
+	__u32 one = 1;
+
+	if (skb->len > verdict_max_size)
+		return SK_PASS;
+
+	return bpf_sk_redirect_map(skb, &sock_map, one, 0);
+}
+
+SEC("sk_skb/stream_verdict")
+int prog_skb_verdict_pass(struct __sk_buff *skb)
+{
+	return SK_PASS;
+}
+
+SEC("sk_skb/stream_parser")
+int prog_skb_parser(struct __sk_buff *skb)
+{
+	return skb->len;
+}
+
+SEC("sk_skb/stream_parser")
+int prog_skb_parser_partial(struct __sk_buff *skb)
+{
+	/* agreement with the test program on a 4-byte size header
+	 * and 6-byte body.
+	 */
+	if (skb->len < 4) {
+		/* need more header to determine full length */
+		return 0;
+	}
+	/* return full length decoded from header.
+	 * the return value may be larger than skb->len which
+	 * means framework must wait body coming.
+	 */
+	return 10;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf v7 2/5] bpf: fix wrong copied_seq calculation
  2025-01-16 14:05 ` [PATCH bpf v7 2/5] bpf: fix wrong copied_seq calculation Jiayuan Chen
@ 2025-01-18 14:50   ` Jakub Sitnicki
  2025-01-18 15:29     ` Jiayuan Chen
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Sitnicki @ 2025-01-18 14:50 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: bpf, john.fastabend, netdev, martin.lau, ast, edumazet, davem,
	dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal,
	yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87,
	cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc

On Thu, Jan 16, 2025 at 10:05 PM +08, Jiayuan Chen wrote:
> 'sk->copied_seq' was updated in the tcp_eat_skb() function when the
> action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS,
> the update logic for 'sk->copied_seq' was moved to
> tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature.
>
> It works for a single stream_verdict scenario, as it also modified
> 'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb'
> to remove updating 'sk->copied_seq'.
>
> However, for programs where both stream_parser and stream_verdict are
> active(strparser purpose), tcp_read_sock() was used instead of
> tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock)
> tcp_read_sock() now still update 'sk->copied_seq', leading to duplicated
> updates.
>
> In summary, for strparser + SK_PASS, copied_seq is redundantly calculated
> in both tcp_read_sock() and tcp_bpf_recvmsg_parser().
>
> The issue causes incorrect copied_seq calculations, which prevent
> correct data reads from the recv() interface in user-land.
>
> We do not want to add new proto_ops to implement a new version of
> tcp_read_sock, as this would introduce code complexity [1].
>
> We add new callback for strparser for customized read operation, also as
> a wrapper function it provides abstraction use psock.
>
> [1]: https://lore.kernel.org/bpf/20241218053408.437295-1-mrpre@163.com
> Fixes: e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq")
> Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
> Signed-off-by: Jiayuan Chen <mrpre@163.com>
> ---

[...]

> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 47f65b1b70ca..6dcde3506a9b 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -646,6 +646,47 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops)
>  	       ops->sendmsg  == tcp_sendmsg ? 0 : -ENOTSUPP;
>  }
>  
> +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> +static int tcp_bpf_strp_read_sock(struct sock *sk, read_descriptor_t *desc,
> +				  sk_read_actor_t recv_actor)
> +{
> +	struct sk_psock *psock;
> +	struct tcp_sock *tp;
> +	int copied = 0;
> +
> +	tp = tcp_sk(sk);
> +	rcu_read_lock();
> +	psock = sk_psock(sk);
> +	if (WARN_ON(!psock)) {
> +		desc->error = -EINVAL;
> +		goto out;
> +	}
> +
> +	psock->ingress_bytes = 0;
> +	/* We could easily add copied_seq and noack into desc then call
> +	 * ops->read_sock without calling symbol directly. But unfortunately
> +	 * most descriptors used by other modules are not inited with zero.
> +	 * Also it not work by replacing ops->read_sock without introducing
> +	 * new ops as ops itself is located in rodata segment.
> +	 */
> +	copied = tcp_read_sock_noack(sk, desc, recv_actor, true,
> +				     &psock->copied_seq);
> +	if (copied < 0)
> +		goto out;
> +	/* recv_actor may redirect skb to another socket(SK_REDIRECT) or
> +	 * just put skb into ingress queue of current socket(SK_PASS).
> +	 * For SK_REDIRECT, we need 'ack' the frame immediately but for
> +	 * SK_PASS, the 'ack' was delay to tcp_bpf_recvmsg_parser()
> +	 */
> +	tp->copied_seq = psock->copied_seq - psock->ingress_bytes;
> +	tcp_rcv_space_adjust(sk);
> +	__tcp_cleanup_rbuf(sk, copied - psock->ingress_bytes);
> +out:
> +	rcu_read_unlock();
> +	return copied;
> +}
> +#endif /* CONFIG_BPF_STREAM_PARSER */
> +
>  int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
>  {
>  	int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4;
> @@ -681,6 +722,12 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
>  
>  	/* Pairs with lockless read in sk_clone_lock() */
>  	sock_replace_proto(sk, &tcp_bpf_prots[family][config]);
> +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> +	if (psock->progs.stream_parser && psock->progs.stream_verdict) {
> +		psock->copied_seq = tcp_sk(sk)->copied_seq;
> +		psock->read_sock = tcp_bpf_strp_read_sock;

Just directly set psock->strp.cb.read_sock to tcp_bpf_strp_read_sock.
Then we don't need this intermediate psock->read_sock callback, which
doesn't do anything useful.

> +	}
> +#endif
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(tcp_bpf_update_proto);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf v7 1/5] strparser: add read_sock callback
  2025-01-16 14:05 ` [PATCH bpf v7 1/5] strparser: add read_sock callback Jiayuan Chen
@ 2025-01-18 14:56   ` Jakub Sitnicki
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2025-01-18 14:56 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: bpf, john.fastabend, netdev, martin.lau, ast, edumazet, davem,
	dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal,
	yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87,
	cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc

On Thu, Jan 16, 2025 at 10:05 PM +08, Jiayuan Chen wrote:
> Added a new read_sock handler, allowing users to customize read operations
> instead of relying on the native socket's read_sock.
>
> Signed-off-by: Jiayuan Chen <mrpre@163.com>
> ---
>  Documentation/networking/strparser.rst | 11 ++++++++++-
>  include/net/strparser.h                |  2 ++
>  net/strparser/strparser.c              | 11 +++++++++--
>  3 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/networking/strparser.rst b/Documentation/networking/strparser.rst
> index 6cab1f74ae05..e41c18eee2f4 100644
> --- a/Documentation/networking/strparser.rst
> +++ b/Documentation/networking/strparser.rst
> @@ -112,7 +112,7 @@ Functions
>  Callbacks
>  =========
>  
> -There are six callbacks:
> +There are seven callbacks:
>  
>      ::
>  
> @@ -182,6 +182,15 @@ There are six callbacks:
>      the length of the message. skb->len - offset may be greater
>      then full_len since strparser does not trim the skb.
>  
> +    ::
> +
> +	int (*read_sock)(struct strparser *strp, read_descriptor_t *desc,
> +                     sk_read_actor_t recv_actor);
> +
> +    read_sock is called when the user specify it, allowing for customized
> +    read operations. If the callback is not set (NULL in strp_init) native
> +    read_sock operation of the socket is used.
> +

Could be one sentence:

        The read_sock callback is used by strparser instead of
        sock->ops->read_sock, if provided.

>      ::
>  
>  	int (*read_sock_done)(struct strparser *strp, int err);

[...]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf v7 3/5] bpf: disable non stream socket for strparser
  2025-01-16 14:05 ` [PATCH bpf v7 3/5] bpf: disable non stream socket for strparser Jiayuan Chen
@ 2025-01-18 15:03   ` Jakub Sitnicki
  2025-01-18 15:32     ` Jiayuan Chen
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Sitnicki @ 2025-01-18 15:03 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: bpf, john.fastabend, netdev, martin.lau, ast, edumazet, davem,
	dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal,
	yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87,
	cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc

On Thu, Jan 16, 2025 at 10:05 PM +08, Jiayuan Chen wrote:
> Currently, only TCP supports strparser, but sockmap doesn't intercept
> non-TCP to attach strparser. For example, with UDP, although the
> read/write handlers are replaced, strparser is not executed due to the
> lack of read_sock operation.
>
> Furthermore, in udp_bpf_recvmsg(), it checks whether psock has data, and
> if not, it falls back to the native UDP read interface, making
> UDP + strparser appear to read correctly. According to it's commit
> history, the behavior is unexpected.
>
> Moreover, since UDP lacks the concept of streams, we intercept it
> directly. Later, we will try to support Unix streams and add more
> check.
>
> Signed-off-by: Jiayuan Chen <mrpre@163.com>
> ---

Needs a Fixes: tag.

>  net/core/sock_map.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index f1b9b3958792..c6ee2d1d9cf2 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -214,6 +214,14 @@ static struct sk_psock *sock_map_psock_get_checked(struct sock *sk)
>  	return psock;
>  }
>  
> +static bool sock_map_sk_strp_allowed(const struct sock *sk)
> +{
> +	/* todo: support unix stream socket */
> +	if (sk_is_tcp(sk))
> +		return true;
> +	return false;
> +}
> +

We don't need this yet, so please don't add it. Especially since this is
fix. It should be kept down to a minimum. Do the sk_is_tcp() check
directly from sock_map_link().

>  static int sock_map_link(struct bpf_map *map, struct sock *sk)
>  {
>  	struct sk_psock_progs *progs = sock_map_progs(map);
> @@ -303,7 +311,10 @@ static int sock_map_link(struct bpf_map *map, struct sock *sk)
>  
>  	write_lock_bh(&sk->sk_callback_lock);
>  	if (stream_parser && stream_verdict && !psock->saved_data_ready) {
> -		ret = sk_psock_init_strp(sk, psock);
> +		if (sock_map_sk_strp_allowed(sk))
> +			ret = sk_psock_init_strp(sk, psock);
> +		else
> +			ret = -EOPNOTSUPP;
>  		if (ret) {
>  			write_unlock_bh(&sk->sk_callback_lock);
>  			sk_psock_put(sk, psock);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf v7 2/5] bpf: fix wrong copied_seq calculation
  2025-01-18 14:50   ` Jakub Sitnicki
@ 2025-01-18 15:29     ` Jiayuan Chen
  2025-01-20  3:35       ` Jiayuan Chen
  0 siblings, 1 reply; 13+ messages in thread
From: Jiayuan Chen @ 2025-01-18 15:29 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, john.fastabend, netdev, martin.lau, ast, edumazet, davem,
	dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal,
	yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87,
	cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc

On Sat, Jan 18, 2025 at 03:50:22PM +0100, Jakub Sitnicki wrote:
> On Thu, Jan 16, 2025 at 10:05 PM +08, Jiayuan Chen wrote:
> > 'sk->copied_seq' was updated in the tcp_eat_skb() function when the
> > action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS,
> > +}
> > +#endif /* CONFIG_BPF_STREAM_PARSER */
> > +
> >  int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
> >  {
> >  	int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4;
> > @@ -681,6 +722,12 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
> >  
> >  	/* Pairs with lockless read in sk_clone_lock() */
> >  	sock_replace_proto(sk, &tcp_bpf_prots[family][config]);
> > +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> > +	if (psock->progs.stream_parser && psock->progs.stream_verdict) {
> > +		psock->copied_seq = tcp_sk(sk)->copied_seq;
> > +		psock->read_sock = tcp_bpf_strp_read_sock;
> 
> Just directly set psock->strp.cb.read_sock to tcp_bpf_strp_read_sock.
> Then we don't need this intermediate psock->read_sock callback, which
> doesn't do anything useful.
>
Ok, I will do this.
(BTW, I intended to avoid bringing "struct strparser" into tcp_bpf.c so I
added a wrapper function instead in skmsg.c without calling it directly) 
> > +	}
> > +#endif
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(tcp_bpf_update_proto);


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf v7 3/5] bpf: disable non stream socket for strparser
  2025-01-18 15:03   ` Jakub Sitnicki
@ 2025-01-18 15:32     ` Jiayuan Chen
  0 siblings, 0 replies; 13+ messages in thread
From: Jiayuan Chen @ 2025-01-18 15:32 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, john.fastabend, netdev, martin.lau, ast, edumazet, davem,
	dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal,
	yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87,
	cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc

On Sat, Jan 18, 2025 at 04:03:33PM +0100, Jakub Sitnicki wrote:
> On Thu, Jan 16, 2025 at 10:05 PM +08, Jiayuan Chen wrote:
> > +	if (sk_is_tcp(sk))
> > +		return true;
> > +	return false;
> > +}
> > +
> 
> We don't need this yet, so please don't add it. Especially since this is
> fix. It should be kept down to a minimum. Do the sk_is_tcp() check
> directly from sock_map_link().
> 
Ok, I will do this.
> >  static int sock_map_link(struct bpf_map *map, struct sock *sk)
> >  {
> >  	struct sk_psock_progs *progs = sock_map_progs(map);
> > @@ -303,7 +311,10 @@ static int sock_map_link(struct bpf_map *map, struct sock *sk)
> >  
> >  	write_lock_bh(&sk->sk_callback_lock);
> >  	if (stream_parser && stream_verdict && !psock->saved_data_ready) {
> > -		ret = sk_psock_init_strp(sk, psock);
> > +		if (sock_map_sk_strp_allowed(sk))
> > +			ret = sk_psock_init_strp(sk, psock);
> > +		else
> > +			ret = -EOPNOTSUPP;
> >  		if (ret) {
> >  			write_unlock_bh(&sk->sk_callback_lock);
> >  			sk_psock_put(sk, psock);


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf v7 2/5] bpf: fix wrong copied_seq calculation
  2025-01-18 15:29     ` Jiayuan Chen
@ 2025-01-20  3:35       ` Jiayuan Chen
  2025-01-20 10:13         ` Jakub Sitnicki
  0 siblings, 1 reply; 13+ messages in thread
From: Jiayuan Chen @ 2025-01-20  3:35 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, john.fastabend, netdev, martin.lau, ast, edumazet, davem,
	dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal,
	yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87,
	cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc

On Sat, Jan 18, 2025 at 11:29:04PM +0800, Jiayuan Chen wrote:
> On Sat, Jan 18, 2025 at 03:50:22PM +0100, Jakub Sitnicki wrote:
> > On Thu, Jan 16, 2025 at 10:05 PM +08, Jiayuan Chen wrote:
> > > 'sk->copied_seq' was updated in the tcp_eat_skb() function when the
> > > action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS,
> > > +}
> > > +#endif /* CONFIG_BPF_STREAM_PARSER */
> > > +
> > >  int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
> > >  {
> > >  	int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4;
> > > @@ -681,6 +722,12 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
> > >  
> > >  	/* Pairs with lockless read in sk_clone_lock() */
> > >  	sock_replace_proto(sk, &tcp_bpf_prots[family][config]);
> > > +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> > > +	if (psock->progs.stream_parser && psock->progs.stream_verdict) {
> > > +		psock->copied_seq = tcp_sk(sk)->copied_seq;
> > > +		psock->read_sock = tcp_bpf_strp_read_sock;
> > 
> > Just directly set psock->strp.cb.read_sock to tcp_bpf_strp_read_sock.
> > Then we don't need this intermediate psock->read_sock callback, which
> > doesn't do anything useful.
> >
> Ok, I will do this.
> (BTW, I intended to avoid bringing "struct strparser" into tcp_bpf.c so I
> added a wrapper function instead in skmsg.c without calling it directly) 
> 
I find that tcp_bpf_update_proto is called before sk_psock_init_strp. Any
assignment of psock->cb.strp will be overwritten in sk_psock_init_strp.

May read_sock still needed. But we can avoid adding wrapper function by
assigning psock->read_sock to cb.read_sock directly like this:

--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1137,10 +1137,11 @@ int sk_psock_init_strp(struct sock *sk, struct sk_psock *psock)
 {
        int ret;

-       static const struct strp_callbacks cb = {
+       struct strp_callbacks cb = {
                .rcv_msg        = sk_psock_strp_read,
                .read_sock_done = sk_psock_strp_read_done,
                .parse_msg      = sk_psock_strp_parse,
+               .read_sock      = psock->read_sock,
        };

        ret = strp_init(&psock->strp, sk, &cb);

---
Thanks


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf v7 2/5] bpf: fix wrong copied_seq calculation
  2025-01-20  3:35       ` Jiayuan Chen
@ 2025-01-20 10:13         ` Jakub Sitnicki
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2025-01-20 10:13 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: bpf, john.fastabend, netdev, martin.lau, ast, edumazet, davem,
	dsahern, kuba, pabeni, linux-kernel, song, andrii, mhal,
	yonghong.song, daniel, xiyou.wangcong, horms, corbet, eddyz87,
	cong.wang, shuah, mykolal, jolsa, haoluo, sdf, kpsingh, linux-doc

On Mon, Jan 20, 2025 at 11:35 AM +08, Jiayuan Chen wrote:
> On Sat, Jan 18, 2025 at 11:29:04PM +0800, Jiayuan Chen wrote:
>> On Sat, Jan 18, 2025 at 03:50:22PM +0100, Jakub Sitnicki wrote:
>> > On Thu, Jan 16, 2025 at 10:05 PM +08, Jiayuan Chen wrote:
>> > > 'sk->copied_seq' was updated in the tcp_eat_skb() function when the
>> > > action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS,
>> > > +}
>> > > +#endif /* CONFIG_BPF_STREAM_PARSER */
>> > > +
>> > >  int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
>> > >  {
>> > >  	int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4;
>> > > @@ -681,6 +722,12 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
>> > >  
>> > >  	/* Pairs with lockless read in sk_clone_lock() */
>> > >  	sock_replace_proto(sk, &tcp_bpf_prots[family][config]);
>> > > +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
>> > > +	if (psock->progs.stream_parser && psock->progs.stream_verdict) {
>> > > +		psock->copied_seq = tcp_sk(sk)->copied_seq;
>> > > +		psock->read_sock = tcp_bpf_strp_read_sock;
>> > 
>> > Just directly set psock->strp.cb.read_sock to tcp_bpf_strp_read_sock.
>> > Then we don't need this intermediate psock->read_sock callback, which
>> > doesn't do anything useful.
>> >
>> Ok, I will do this.
>> (BTW, I intended to avoid bringing "struct strparser" into tcp_bpf.c so I
>> added a wrapper function instead in skmsg.c without calling it directly) 
>> 
> I find that tcp_bpf_update_proto is called before sk_psock_init_strp. Any
> assignment of psock->cb.strp will be overwritten in sk_psock_init_strp.

Or just don't set ->read_sock in strp_init.
It's being reset only because you made it so in patch 1 :-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-01-20 10:13 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-16 14:05 [PATCH bpf v7 0/5] bpf: fix wrong copied_seq calculation and add tests Jiayuan Chen
2025-01-16 14:05 ` [PATCH bpf v7 1/5] strparser: add read_sock callback Jiayuan Chen
2025-01-18 14:56   ` Jakub Sitnicki
2025-01-16 14:05 ` [PATCH bpf v7 2/5] bpf: fix wrong copied_seq calculation Jiayuan Chen
2025-01-18 14:50   ` Jakub Sitnicki
2025-01-18 15:29     ` Jiayuan Chen
2025-01-20  3:35       ` Jiayuan Chen
2025-01-20 10:13         ` Jakub Sitnicki
2025-01-16 14:05 ` [PATCH bpf v7 3/5] bpf: disable non stream socket for strparser Jiayuan Chen
2025-01-18 15:03   ` Jakub Sitnicki
2025-01-18 15:32     ` Jiayuan Chen
2025-01-16 14:05 ` [PATCH bpf v7 4/5] selftests/bpf: fix invalid flag of recv() Jiayuan Chen
2025-01-16 14:05 ` [PATCH bpf v7 5/5] selftests/bpf: add strparser test for bpf Jiayuan Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).