mptcp.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [PATCH mptcp-next v7 0/4] implement mptcp read_sock
@ 2025-07-07  9:34 Geliang Tang
  2025-07-07  9:34 ` [PATCH mptcp-next v7 1/4] mptcp: add eat_recv_skb helper Geliang Tang
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Geliang Tang @ 2025-07-07  9:34 UTC (permalink / raw)
  To: mptcp, pabeni, hare; +Cc: Geliang Tang

From: Geliang Tang <tanggeliang@kylinos.cn>

v7:
 - only patch 1 and patch 2 changed.
 - add a new helper mptcp_eat_recv_skb.
 - invoke skb_peek in mptcp_recv_skb().
 - use while ((skb = mptcp_recv_skb(sk)) != NULL) instead of
 skb_queue_walk_safe(&sk->sk_receive_queue, skb, tmp).

v6:
 - address Paolo's comment for v4, v5 (thanks)

v5:
 - extract the common code of __mptcp_recvmsg_mskq() and mptcp_read_sock()
 into a new helper __mptcp_recvmsg_desc() to reduce duplication code.

v4:
 - v3 doesn't work for MPTCP fallback tests in mptcp_connect.sh, this
   set fix it.
 - invoke __mptcp_move_skbs in mptcp_read_sock.
 - use INDIRECT_CALL_INET_1 in __tcp_splice_read.

v3:
 - merge the two squash-to patches.
 - use sk->sk_rcvbuf instead of INT_MAX as the max len in
 mptcp_read_sock().
 - add splice io mode for mptcp_connect and drop mptcp_splice.c test.
 - the splice test for packetdrill is also added here:
https://github.com/multipath-tcp/packetdrill/pull/162

v2:
 - set splice_read of mptcp
 - add a splice selftest.

I have good news! I recently added MPTCP support to "NVME over TCP".
And my RFC patches are under review by NVME maintainer Hannes.

Replacing "NVME over TCP" with MPTCP is very simple. I used IPPROTO_MPTCP
instead of IPPROTO_TCP to create MPTCP sockets on both target and host
sides, these sockets are created in Kernel space.

nvmet_tcp_add_port:

	ret = sock_create(port->addr.ss_family, SOCK_STREAM,
				IPPROTO_MPTCP, &port->sock);

nvme_tcp_alloc_queue:

	ret = sock_create_kern(current->nsproxy->net_ns,
			ctrl->addr.ss_family, SOCK_STREAM,
			IPPROTO_MPTCP, &queue->sock);

nvme_tcp_try_recv() needs to call .read_sock interface of struct
proto_ops, but it is not implemented in MPTCP. So I implemented it
with reference to __mptcp_recvmsg_mskq().

Since the NVME part patches are still under reviewing, I only send the
MPTCP part patches in this set to MPTCP ML for your opinions.

Geliang Tang (4):
  mptcp: add eat_recv_skb helper
  mptcp: implement .read_sock
  mptcp: implement .splice_read
  selftests: mptcp: add splice io mode

 net/mptcp/protocol.c                          | 215 +++++++++++++++++-
 .../selftests/net/mptcp/mptcp_connect.c       |  63 ++++-
 2 files changed, 271 insertions(+), 7 deletions(-)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH mptcp-next v7 1/4] mptcp: add eat_recv_skb helper
  2025-07-07  9:34 [PATCH mptcp-next v7 0/4] implement mptcp read_sock Geliang Tang
@ 2025-07-07  9:34 ` Geliang Tang
  2025-07-07  9:34 ` [PATCH mptcp-next v7 2/4] mptcp: implement .read_sock Geliang Tang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Geliang Tang @ 2025-07-07  9:34 UTC (permalink / raw)
  To: mptcp, pabeni, hare; +Cc: Geliang Tang

From: Geliang Tang <tanggeliang@kylinos.cn>

This patch extracts the free skb related code in __mptcp_recvmsg_mskq()
into a new helper mptcp_eat_recv_skb().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/mptcp/protocol.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 0c93b36373b1..48365d54bc06 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1845,6 +1845,15 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied);
 
+static void mptcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb)
+{
+	/* avoid the indirect call, we know the destructor is sock_wfree */
+	skb->destructor = NULL;
+	atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
+	sk_mem_uncharge(sk, skb->truesize);
+	sk_eat_skb(sk, skb);
+}
+
 static int __mptcp_recvmsg_mskq(struct sock *sk,
 				struct msghdr *msg,
 				size_t len, int flags,
@@ -1887,12 +1896,7 @@ static int __mptcp_recvmsg_mskq(struct sock *sk,
 		}
 
 		if (!(flags & MSG_PEEK)) {
-			/* avoid the indirect call, we know the destructor is sock_wfree */
-			skb->destructor = NULL;
-			atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
-			sk_mem_uncharge(sk, skb->truesize);
-			__skb_unlink(skb, &sk->sk_receive_queue);
-			__kfree_skb(skb);
+			mptcp_eat_recv_skb(sk, skb);
 			msk->bytes_consumed += count;
 		}
 
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH mptcp-next v7 2/4] mptcp: implement .read_sock
  2025-07-07  9:34 [PATCH mptcp-next v7 0/4] implement mptcp read_sock Geliang Tang
  2025-07-07  9:34 ` [PATCH mptcp-next v7 1/4] mptcp: add eat_recv_skb helper Geliang Tang
@ 2025-07-07  9:34 ` Geliang Tang
  2025-07-08 16:16   ` Matthieu Baerts
  2025-07-07  9:34 ` [PATCH mptcp-next v7 3/4] mptcp: implement .splice_read Geliang Tang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Geliang Tang @ 2025-07-07  9:34 UTC (permalink / raw)
  To: mptcp, pabeni, hare; +Cc: Geliang Tang

From: Geliang Tang <tanggeliang@kylinos.cn>

nvme_tcp_try_recv() needs to call .read_sock interface of struct
proto_ops, but it is not implemented in MPTCP.

This patch implements it with reference to __mptcp_recvmsg_mskq().

v2:
 - first check the sk_state (Matt), but not look for the end of the
end of a connection like TCP in __tcp_read_sock():

	if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
		break;

This will cause a use-after-free error:

	BUG: KASAN: slab-use-after-free in mptcp_read_sock.

v3:
 - Use sk->sk_rcvbuf instead of INT_MAX as the max len.

v4:
 - invoke __mptcp_move_skbs.

Reviewed-by: Hannes Reinecke <hare@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/mptcp/protocol.c | 63 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 48365d54bc06..fc429d175ede 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3962,6 +3962,67 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
 	return mask;
 }
 
+static struct sk_buff *mptcp_recv_skb(struct sock *sk)
+{
+	if (skb_queue_empty(&sk->sk_receive_queue))
+		__mptcp_move_skbs(sk);
+
+	return skb_peek(&sk->sk_receive_queue);
+}
+
+/*
+ * Note:
+ *	- It is assumed that the socket was locked by the caller.
+ */
+static int mptcp_read_sock(struct sock *sk, read_descriptor_t *desc,
+			   sk_read_actor_t recv_actor)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	size_t len = sk->sk_rcvbuf;
+	struct sk_buff *skb;
+	int copied = 0;
+
+	if (sk->sk_state == TCP_LISTEN)
+		return -ENOTCONN;
+	while ((skb = mptcp_recv_skb(sk)) != NULL) {
+		u32 offset = MPTCP_SKB_CB(skb)->offset;
+		u32 data_len = skb->len - offset;
+		u32 size = min_t(size_t, len - copied, data_len);
+		int count;
+
+		count = recv_actor(desc, skb, offset, size);
+		if (count <= 0) {
+			if (!copied)
+				copied = count;
+			break;
+		}
+
+		copied += count;
+
+		if (count < data_len) {
+			MPTCP_SKB_CB(skb)->offset += count;
+			MPTCP_SKB_CB(skb)->map_seq += count;
+			msk->bytes_consumed += count;
+			break;
+		}
+
+		mptcp_eat_recv_skb(sk, skb);
+		msk->bytes_consumed += count;
+
+		if (copied >= len)
+			break;
+	}
+
+	mptcp_rcv_space_adjust(msk, copied);
+
+	if (copied > 0) {
+		mptcp_recv_skb(sk);
+		mptcp_cleanup_rbuf(msk, copied);
+	}
+
+	return copied;
+}
+
 static const struct proto_ops mptcp_stream_ops = {
 	.family		   = PF_INET,
 	.owner		   = THIS_MODULE,
@@ -3982,6 +4043,7 @@ static const struct proto_ops mptcp_stream_ops = {
 	.recvmsg	   = inet_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.set_rcvlowat	   = mptcp_set_rcvlowat,
+	.read_sock	   = mptcp_read_sock,
 };
 
 static struct inet_protosw mptcp_protosw = {
@@ -4086,6 +4148,7 @@ static const struct proto_ops mptcp_v6_stream_ops = {
 	.compat_ioctl	   = inet6_compat_ioctl,
 #endif
 	.set_rcvlowat	   = mptcp_set_rcvlowat,
+	.read_sock	   = mptcp_read_sock,
 };
 
 static struct proto mptcp_v6_prot;
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH mptcp-next v7 3/4] mptcp: implement .splice_read
  2025-07-07  9:34 [PATCH mptcp-next v7 0/4] implement mptcp read_sock Geliang Tang
  2025-07-07  9:34 ` [PATCH mptcp-next v7 1/4] mptcp: add eat_recv_skb helper Geliang Tang
  2025-07-07  9:34 ` [PATCH mptcp-next v7 2/4] mptcp: implement .read_sock Geliang Tang
@ 2025-07-07  9:34 ` Geliang Tang
  2025-07-08 14:52   ` Paolo Abeni
  2025-07-07  9:34 ` [PATCH mptcp-next v7 4/4] selftests: mptcp: add splice io mode Geliang Tang
  2025-07-08 15:25 ` [PATCH mptcp-next v7 0/4] implement mptcp read_sock MPTCP CI
  4 siblings, 1 reply; 10+ messages in thread
From: Geliang Tang @ 2025-07-07  9:34 UTC (permalink / raw)
  To: mptcp, pabeni, hare; +Cc: Geliang Tang

From: Geliang Tang <tanggeliang@kylinos.cn>

This patch implements .splice_read interface of mptcp struct proto_ops
as mptcp_splice_read() with reference to tcp_splice_read().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/mptcp/protocol.c | 136 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 136 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index fc429d175ede..4638d4be2b98 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -4023,6 +4023,140 @@ static int mptcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 	return copied;
 }
 
+/*
+ * MPTCP splice context
+ */
+struct mptcp_splice_state {
+	struct pipe_inode_info *pipe;
+	size_t len;
+	unsigned int flags;
+};
+
+static int mptcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
+				  unsigned int offset, size_t len)
+{
+	struct mptcp_splice_state *mss = rd_desc->arg.data;
+	int ret;
+
+	ret = skb_splice_bits(skb, skb->sk, offset, mss->pipe,
+			      min(rd_desc->count, len), mss->flags);
+	if (ret > 0)
+		rd_desc->count -= ret;
+	return ret;
+}
+
+static int __mptcp_splice_read(struct sock *sk, struct mptcp_splice_state *mss)
+{
+	/* Store MPTCP splice context information in read_descriptor_t. */
+	read_descriptor_t rd_desc = {
+		.arg.data = mss,
+		.count	  = mss->len,
+	};
+
+	return mptcp_read_sock(sk, &rd_desc, mptcp_splice_data_recv);
+}
+
+/**
+ *  mptcp_splice_read - splice data from MPTCP socket to a pipe
+ * @sock:	socket to splice from
+ * @ppos:	position (not valid)
+ * @pipe:	pipe to splice to
+ * @len:	number of bytes to splice
+ * @flags:	splice modifier flags
+ *
+ * Description:
+ *    Will read pages from given socket and fill them into a pipe.
+ *
+ **/
+static ssize_t mptcp_splice_read(struct socket *sock, loff_t *ppos,
+				 struct pipe_inode_info *pipe, size_t len,
+				 unsigned int flags)
+{
+	struct mptcp_splice_state mss = {
+		.pipe = pipe,
+		.len = len,
+		.flags = flags,
+	};
+	struct sock *sk = sock->sk;
+	ssize_t spliced;
+	long timeo;
+	int ret;
+
+	/*
+	 * We can't seek on a socket input
+	 */
+	if (unlikely(*ppos))
+		return -ESPIPE;
+
+	spliced = 0;
+	ret = 0;
+
+	lock_sock(sk);
+
+	timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
+	while (mss.len) {
+		ret = __mptcp_splice_read(sk, &mss);
+		if (ret < 0) {
+			break;
+		} else if (!ret) {
+			if (spliced)
+				break;
+			if (sock_flag(sk, SOCK_DONE))
+				break;
+			if (sk->sk_err) {
+				ret = sock_error(sk);
+				break;
+			}
+			if (sk->sk_shutdown & RCV_SHUTDOWN) {
+				if (__mptcp_move_skbs(sk))
+					continue;
+				break;
+			}
+			if (sk->sk_state == TCP_CLOSE) {
+				ret = -ENOTCONN;
+				break;
+			}
+			if (!timeo) {
+				ret = -EAGAIN;
+				break;
+			}
+			/* if __mptcp_splice_read() got nothing while we have
+			 * an skb in receive queue, we do not want to loop.
+			 * This might happen with URG data.
+			 */
+			if (!skb_queue_empty(&sk->sk_receive_queue))
+				break;
+			ret = sk_wait_data(sk, &timeo, NULL);
+			if (ret < 0)
+				break;
+			if (signal_pending(current)) {
+				ret = sock_intr_errno(timeo);
+				break;
+			}
+			continue;
+		}
+		mss.len -= ret;
+		spliced += ret;
+
+		if (!mss.len || !timeo)
+			break;
+		release_sock(sk);
+		lock_sock(sk);
+
+		if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
+		    (sk->sk_shutdown & RCV_SHUTDOWN) ||
+		    signal_pending(current))
+			break;
+	}
+
+	release_sock(sk);
+
+	if (spliced)
+		return spliced;
+
+	return ret;
+}
+
 static const struct proto_ops mptcp_stream_ops = {
 	.family		   = PF_INET,
 	.owner		   = THIS_MODULE,
@@ -4044,6 +4178,7 @@ static const struct proto_ops mptcp_stream_ops = {
 	.mmap		   = sock_no_mmap,
 	.set_rcvlowat	   = mptcp_set_rcvlowat,
 	.read_sock	   = mptcp_read_sock,
+	.splice_read	   = mptcp_splice_read,
 };
 
 static struct inet_protosw mptcp_protosw = {
@@ -4149,6 +4284,7 @@ static const struct proto_ops mptcp_v6_stream_ops = {
 #endif
 	.set_rcvlowat	   = mptcp_set_rcvlowat,
 	.read_sock	   = mptcp_read_sock,
+	.splice_read	   = mptcp_splice_read,
 };
 
 static struct proto mptcp_v6_prot;
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH mptcp-next v7 4/4] selftests: mptcp: add splice io mode
  2025-07-07  9:34 [PATCH mptcp-next v7 0/4] implement mptcp read_sock Geliang Tang
                   ` (2 preceding siblings ...)
  2025-07-07  9:34 ` [PATCH mptcp-next v7 3/4] mptcp: implement .splice_read Geliang Tang
@ 2025-07-07  9:34 ` Geliang Tang
  2025-07-08 16:23   ` Matthieu Baerts
  2025-07-08 15:25 ` [PATCH mptcp-next v7 0/4] implement mptcp read_sock MPTCP CI
  4 siblings, 1 reply; 10+ messages in thread
From: Geliang Tang @ 2025-07-07  9:34 UTC (permalink / raw)
  To: mptcp, pabeni, hare; +Cc: Geliang Tang

From: Geliang Tang <tanggeliang@kylinos.cn>

This patch adds a new 'splice' io mode for mptcp_connect to test
the newly added read_sock() and splice() functions of MPTCP.

	./mptcp_connect.sh -m splice

v2:
 - only use 'splice' mode for MPTCP fallback tests too.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 .../selftests/net/mptcp/mptcp_connect.c       | 63 ++++++++++++++++++-
 1 file changed, 62 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index ac1349c4b9e5..4c219d620398 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -51,6 +51,7 @@ enum cfg_mode {
 	CFG_MODE_POLL,
 	CFG_MODE_MMAP,
 	CFG_MODE_SENDFILE,
+	CFG_MODE_SPLICE,
 };
 
 enum cfg_peek {
@@ -123,7 +124,7 @@ static void die_usage(void)
 	fprintf(stderr, "\t-j     -- add additional sleep at connection start and tear down "
 		"-- for MPJ tests\n");
 	fprintf(stderr, "\t-l     -- listens mode, accepts incoming connection\n");
-	fprintf(stderr, "\t-m [poll|mmap|sendfile] -- use poll(default)/mmap+write/sendfile\n");
+	fprintf(stderr, "\t-m [poll|mmap|sendfile|splice] -- use poll(default)/mmap+write/sendfile/splice\n");
 	fprintf(stderr, "\t-M mark -- set socket packet mark\n");
 	fprintf(stderr, "\t-o option -- test sockopt <option>\n");
 	fprintf(stderr, "\t-p num -- use port num\n");
@@ -925,6 +926,55 @@ static int copyfd_io_sendfile(int infd, int peerfd, int outfd,
 	return err;
 }
 
+static int do_splice(const int infd, const int outfd, const size_t len,
+		     struct wstate *winfo)
+{
+	int pipefd[2];
+	ssize_t bytes;
+	int err;
+
+	err = pipe(pipefd);
+	if (err)
+		return err;
+
+	while ((bytes = splice(infd, NULL, pipefd[1], NULL,
+			       len - winfo->total_len,
+			       SPLICE_F_MOVE | SPLICE_F_MORE)) > 0) {
+		splice(pipefd[0], NULL, outfd, NULL, bytes,
+		       SPLICE_F_MOVE | SPLICE_F_MORE);
+	}
+
+	close(pipefd[0]);
+	close(pipefd[1]);
+
+	return 0;
+}
+
+static int copyfd_io_splice(int infd, int peerfd, int outfd, unsigned int size,
+			    bool *in_closed_after_out, struct wstate *winfo)
+{
+	int err;
+
+	if (listen_mode) {
+		err = do_splice(peerfd, outfd, size, winfo);
+		if (err)
+			return err;
+
+		err = do_splice(infd, peerfd, size, winfo);
+	} else {
+		err = do_splice(infd, peerfd, size, winfo);
+		if (err)
+			return err;
+
+		shut_wr(peerfd);
+
+		err = do_splice(peerfd, outfd, size, winfo);
+		*in_closed_after_out = true;
+	}
+
+	return err;
+}
+
 static int copyfd_io(int infd, int peerfd, int outfd, bool close_peerfd, struct wstate *winfo)
 {
 	bool in_closed_after_out = false;
@@ -957,6 +1007,14 @@ static int copyfd_io(int infd, int peerfd, int outfd, bool close_peerfd, struct
 					 &in_closed_after_out, winfo);
 		break;
 
+	case CFG_MODE_SPLICE:
+		file_size = get_infd_size(infd);
+		if (file_size < 0)
+			return file_size;
+		ret = copyfd_io_splice(infd, peerfd, outfd, file_size,
+				       &in_closed_after_out, winfo);
+		break;
+
 	default:
 		fprintf(stderr, "Invalid mode %d\n", cfg_mode);
 
@@ -1361,12 +1419,15 @@ int parse_mode(const char *mode)
 		return CFG_MODE_MMAP;
 	if (!strcasecmp(mode, "sendfile"))
 		return CFG_MODE_SENDFILE;
+	if (!strcasecmp(mode, "splice"))
+		return CFG_MODE_SPLICE;
 
 	fprintf(stderr, "Unknown test mode: %s\n", mode);
 	fprintf(stderr, "Supported modes are:\n");
 	fprintf(stderr, "\t\t\"poll\" - interleaved read/write using poll()\n");
 	fprintf(stderr, "\t\t\"mmap\" - send entire input file (mmap+write), then read response (-l will read input first)\n");
 	fprintf(stderr, "\t\t\"sendfile\" - send entire input file (sendfile), then read response (-l will read input first)\n");
+	fprintf(stderr, "\t\t\"splice\" - send entire input file (splice), then read response (-l will read input first)\n");
 
 	die_usage();
 
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH mptcp-next v7 3/4] mptcp: implement .splice_read
  2025-07-07  9:34 ` [PATCH mptcp-next v7 3/4] mptcp: implement .splice_read Geliang Tang
@ 2025-07-08 14:52   ` Paolo Abeni
  2025-07-10  9:10     ` Geliang Tang
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2025-07-08 14:52 UTC (permalink / raw)
  To: Geliang Tang, mptcp, hare; +Cc: Geliang Tang

On 7/7/25 11:34 AM, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
> 
> This patch implements .splice_read interface of mptcp struct proto_ops
> as mptcp_splice_read() with reference to tcp_splice_read().
> 
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
>  net/mptcp/protocol.c | 136 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 136 insertions(+)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index fc429d175ede..4638d4be2b98 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -4023,6 +4023,140 @@ static int mptcp_read_sock(struct sock *sk, read_descriptor_t *desc,
>  	return copied;
>  }
>  
> +/*
> + * MPTCP splice context
> + */
> +struct mptcp_splice_state {
> +	struct pipe_inode_info *pipe;
> +	size_t len;
> +	unsigned int flags;
> +};
> +
> +static int mptcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
> +				  unsigned int offset, size_t len)
> +{
> +	struct mptcp_splice_state *mss = rd_desc->arg.data;
> +	int ret;
> +
> +	ret = skb_splice_bits(skb, skb->sk, offset, mss->pipe,
> +			      min(rd_desc->count, len), mss->flags);
> +	if (ret > 0)
> +		rd_desc->count -= ret;
> +	return ret;
> +}

I have mixed feeling WRT the above. I'm wondering if we should reuse the
same code already existing in TCP, moving tcp_spice_state definition in
some shared hdr and macking tcp_splice_data_recv not static.

> +static int __mptcp_splice_read(struct sock *sk, struct mptcp_splice_state *mss)
> +{
> +	/* Store MPTCP splice context information in read_descriptor_t. */
> +	read_descriptor_t rd_desc = {
> +		.arg.data = mss,
> +		.count	  = mss->len,
> +	};
> +
> +	return mptcp_read_sock(sk, &rd_desc, mptcp_splice_data_recv);
> +}
> +
> +/**
> + *  mptcp_splice_read - splice data from MPTCP socket to a pipe
> + * @sock:	socket to splice from
> + * @ppos:	position (not valid)
> + * @pipe:	pipe to splice to
> + * @len:	number of bytes to splice
> + * @flags:	splice modifier flags
> + *
> + * Description:
> + *    Will read pages from given socket and fill them into a pipe.
> + *
> + **/
> +static ssize_t mptcp_splice_read(struct socket *sock, loff_t *ppos,
> +				 struct pipe_inode_info *pipe, size_t len,
> +				 unsigned int flags)
> +{
> +	struct mptcp_splice_state mss = {
> +		.pipe = pipe,
> +		.len = len,
> +		.flags = flags,
> +	};
> +	struct sock *sk = sock->sk;
> +	ssize_t spliced;
> +	long timeo;
> +	int ret;
> +
> +	/*
> +	 * We can't seek on a socket input
> +	 */
> +	if (unlikely(*ppos))
> +		return -ESPIPE;
> +
> +	spliced = 0;
> +	ret = 0;
> +
> +	lock_sock(sk);
> +
> +	timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
> +	while (mss.len) {
> +		ret = __mptcp_splice_read(sk, &mss);
> +		if (ret < 0) {
> +			break;
> +		} else if (!ret) {
> +			if (spliced)
> +				break;
> +			if (sock_flag(sk, SOCK_DONE))
> +				break;
> +			if (sk->sk_err) {
> +				ret = sock_error(sk);
> +				break;
> +			}
> +			if (sk->sk_shutdown & RCV_SHUTDOWN) {
> +				if (__mptcp_move_skbs(sk))
> +					continue;
> +				break;
> +			}
> +			if (sk->sk_state == TCP_CLOSE) {
> +				ret = -ENOTCONN;
> +				break;
> +			}
> +			if (!timeo) {
> +				ret = -EAGAIN;
> +				break;
> +			}
> +			/* if __mptcp_splice_read() got nothing while we have
> +			 * an skb in receive queue, we do not want to loop.
> +			 * This might happen with URG data.
> +			 */
> +			if (!skb_queue_empty(&sk->sk_receive_queue))
> +				break;
> +			ret = sk_wait_data(sk, &timeo, NULL);
> +			if (ret < 0)
> +				break;
> +			if (signal_pending(current)) {
> +				ret = sock_intr_errno(timeo);
> +				break;
> +			}

I think that moving the above if statement before the queue empty check
will not change the overall behavior.

With that in place you could factor out an

bool mptcp_recv_should_stop(struct sock *sk, int err)

helper from mptcp_recvmsg() and use it verbatim in both in
mptcp_recvmsg() and here.

Side note: suggestions for a better helper name welcome!

/P


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH mptcp-next v7 0/4] implement mptcp read_sock
  2025-07-07  9:34 [PATCH mptcp-next v7 0/4] implement mptcp read_sock Geliang Tang
                   ` (3 preceding siblings ...)
  2025-07-07  9:34 ` [PATCH mptcp-next v7 4/4] selftests: mptcp: add splice io mode Geliang Tang
@ 2025-07-08 15:25 ` MPTCP CI
  4 siblings, 0 replies; 10+ messages in thread
From: MPTCP CI @ 2025-07-08 15:25 UTC (permalink / raw)
  To: Geliang Tang; +Cc: mptcp

Hi Geliang,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Unstable: 1 failed test(s): bpftest_test_progs_mptcp 🔴
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/16145295355

Initiator: Matthieu Baerts (NGI0)
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/47721d9b7e6c
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=979613


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH mptcp-next v7 2/4] mptcp: implement .read_sock
  2025-07-07  9:34 ` [PATCH mptcp-next v7 2/4] mptcp: implement .read_sock Geliang Tang
@ 2025-07-08 16:16   ` Matthieu Baerts
  0 siblings, 0 replies; 10+ messages in thread
From: Matthieu Baerts @ 2025-07-08 16:16 UTC (permalink / raw)
  To: Geliang Tang, mptcp, pabeni, hare; +Cc: Geliang Tang

Hi Geliang,

Thank you for the new version!

(Thank you, Paolo, for the previous reviews!)

On 07/07/2025 11:34, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
> 
> nvme_tcp_try_recv() needs to call .read_sock interface of struct
> proto_ops, but it is not implemented in MPTCP.
> 
> This patch implements it with reference to __mptcp_recvmsg_mskq().

I still think more explanations are needed to help reviewers and
developers later, e.g. explaining what this interface is supposed to do,
what are the particularities linked to MPTCP and the differences with
TCP, why you cannot re-use something already there or the differences
with __mptcp_recvmsg_mskq(), etc.

> v2:
>  - first check the sk_state (Matt), but not look for the end of the
> end of a connection like TCP in __tcp_read_sock():
> 
> 	if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
> 		break;
> 
> This will cause a use-after-free error:
> 
> 	BUG: KASAN: slab-use-after-free in mptcp_read_sock.
> 
> v3:
>  - Use sk->sk_rcvbuf instead of INT_MAX as the max len.
> 
> v4:
>  - invoke __mptcp_move_skbs.

Thank you for the changelog, it is very useful, but please move it in
the comment section, under the last Signed-off-by, after a line
containing "---":

  (...)
  Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
  ---
  Notes:
    v2:
      - (...)

Otherwise, the changelog will be part of the commit message when sending
this patch to netdev.

Same in the other patches.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH mptcp-next v7 4/4] selftests: mptcp: add splice io mode
  2025-07-07  9:34 ` [PATCH mptcp-next v7 4/4] selftests: mptcp: add splice io mode Geliang Tang
@ 2025-07-08 16:23   ` Matthieu Baerts
  0 siblings, 0 replies; 10+ messages in thread
From: Matthieu Baerts @ 2025-07-08 16:23 UTC (permalink / raw)
  To: Geliang Tang, mptcp, pabeni, hare; +Cc: Geliang Tang

Hi Geliang,

On 07/07/2025 11:34, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
> 
> This patch adds a new 'splice' io mode for mptcp_connect to test
> the newly added read_sock() and splice() functions of MPTCP.
> 
> 	./mptcp_connect.sh -m splice

That's good, but CI will not check them by default. For our CI, we can
easily modify it to run extra tests, but CI check stable versions will not.

What about adding new .sh files call mptcp_connect.sh with -m <mode>?

  mptcp_connect_mmap.sh
  mptcp_connect_sendfile.sh
  mptcp_connect_splice.sh

I will quickly check if that can work, and maybe propose patches for the
two others that we could backport to improve the code coverage on stable
releases too. If it works, feel free to do the same for splice.sh. I
will then modify the CI to remove the specific case checking mmap on the
side.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH mptcp-next v7 3/4] mptcp: implement .splice_read
  2025-07-08 14:52   ` Paolo Abeni
@ 2025-07-10  9:10     ` Geliang Tang
  0 siblings, 0 replies; 10+ messages in thread
From: Geliang Tang @ 2025-07-10  9:10 UTC (permalink / raw)
  To: Paolo Abeni, mptcp, hare; +Cc: Geliang Tang

Hi Paolo,

On Tue, 2025-07-08 at 16:52 +0200, Paolo Abeni wrote:
> On 7/7/25 11:34 AM, Geliang Tang wrote:
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> > 
> > This patch implements .splice_read interface of mptcp struct
> > proto_ops
> > as mptcp_splice_read() with reference to tcp_splice_read().
> > 
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> >  net/mptcp/protocol.c | 136
> > +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 136 insertions(+)
> > 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index fc429d175ede..4638d4be2b98 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -4023,6 +4023,140 @@ static int mptcp_read_sock(struct sock *sk,
> > read_descriptor_t *desc,
> >  	return copied;
> >  }
> >  
> > +/*
> > + * MPTCP splice context
> > + */
> > +struct mptcp_splice_state {
> > +	struct pipe_inode_info *pipe;
> > +	size_t len;
> > +	unsigned int flags;
> > +};
> > +
> > +static int mptcp_splice_data_recv(read_descriptor_t *rd_desc,
> > struct sk_buff *skb,
> > +				  unsigned int offset, size_t len)
> > +{
> > +	struct mptcp_splice_state *mss = rd_desc->arg.data;
> > +	int ret;
> > +
> > +	ret = skb_splice_bits(skb, skb->sk, offset, mss->pipe,
> > +			      min(rd_desc->count, len), mss-
> > >flags);
> > +	if (ret > 0)
> > +		rd_desc->count -= ret;
> > +	return ret;
> > +}
> 
> I have mixed feeling WRT the above. I'm wondering if we should reuse
> the
> same code already existing in TCP, moving tcp_spice_state definition
> in
> some shared hdr and macking tcp_splice_data_recv not static.
> 
> > +static int __mptcp_splice_read(struct sock *sk, struct
> > mptcp_splice_state *mss)
> > +{
> > +	/* Store MPTCP splice context information in
> > read_descriptor_t. */
> > +	read_descriptor_t rd_desc = {
> > +		.arg.data = mss,
> > +		.count	  = mss->len,
> > +	};
> > +
> > +	return mptcp_read_sock(sk, &rd_desc,
> > mptcp_splice_data_recv);
> > +}
> > +
> > +/**
> > + *  mptcp_splice_read - splice data from MPTCP socket to a pipe
> > + * @sock:	socket to splice from
> > + * @ppos:	position (not valid)
> > + * @pipe:	pipe to splice to
> > + * @len:	number of bytes to splice
> > + * @flags:	splice modifier flags
> > + *
> > + * Description:
> > + *    Will read pages from given socket and fill them into a pipe.
> > + *
> > + **/
> > +static ssize_t mptcp_splice_read(struct socket *sock, loff_t
> > *ppos,
> > +				 struct pipe_inode_info *pipe,
> > size_t len,
> > +				 unsigned int flags)
> > +{
> > +	struct mptcp_splice_state mss = {
> > +		.pipe = pipe,
> > +		.len = len,
> > +		.flags = flags,
> > +	};
> > +	struct sock *sk = sock->sk;
> > +	ssize_t spliced;
> > +	long timeo;
> > +	int ret;
> > +
> > +	/*
> > +	 * We can't seek on a socket input
> > +	 */
> > +	if (unlikely(*ppos))
> > +		return -ESPIPE;
> > +
> > +	spliced = 0;
> > +	ret = 0;
> > +
> > +	lock_sock(sk);
> > +
> > +	timeo = sock_rcvtimeo(sk, sock->file->f_flags &
> > O_NONBLOCK);
> > +	while (mss.len) {
> > +		ret = __mptcp_splice_read(sk, &mss);
> > +		if (ret < 0) {
> > +			break;
> > +		} else if (!ret) {
> > +			if (spliced)
> > +				break;
> > +			if (sock_flag(sk, SOCK_DONE))
> > +				break;

I noticed that this SOCK_DONE flag is also checked in
tcp_recvmsg_locked() but not in mptcp_recvmsg(). I wonder if this flag
should also be checked in mptcp_recvmsg() too.

> > +			if (sk->sk_err) {
> > +				ret = sock_error(sk);
> > +				break;
> > +			}
> > +			if (sk->sk_shutdown & RCV_SHUTDOWN) {
> > +				if (__mptcp_move_skbs(sk))
> > +					continue;
> > +				break;
> > +			}
> > +			if (sk->sk_state == TCP_CLOSE) {
> > +				ret = -ENOTCONN;
> > +				break;
> > +			}
> > +			if (!timeo) {
> > +				ret = -EAGAIN;
> > +				break;
> > +			}
> > +			/* if __mptcp_splice_read() got nothing
> > while we have
> > +			 * an skb in receive queue, we do not want
> > to loop.
> > +			 * This might happen with URG data.
> > +			 */
> > +			if (!skb_queue_empty(&sk-
> > >sk_receive_queue))
> > +				break;
> > +			ret = sk_wait_data(sk, &timeo, NULL);
> > +			if (ret < 0)
> > +				break;
> > +			if (signal_pending(current)) {
> > +				ret = sock_intr_errno(timeo);
> > +				break;
> > +			}
> 
> I think that moving the above if statement before the queue empty
> check
> will not change the overall behavior.
> 
> With that in place you could factor out an
> 
> bool mptcp_recv_should_stop(struct sock *sk, int err)

This helper can also be used in tcp_recvmsg_locked and tcp_splice_read
too. What about rename it as tcp_recv_should_stop, then add it in
include/net/tcp.h and use it for both TCP and MPTCP. WDYT?

Thanks,
-Geliang

> 
> helper from mptcp_recvmsg() and use it verbatim in both in
> mptcp_recvmsg() and here.
> 
> Side note: suggestions for a better helper name welcome!
> 
> /P
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-07-10  9:10 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-07  9:34 [PATCH mptcp-next v7 0/4] implement mptcp read_sock Geliang Tang
2025-07-07  9:34 ` [PATCH mptcp-next v7 1/4] mptcp: add eat_recv_skb helper Geliang Tang
2025-07-07  9:34 ` [PATCH mptcp-next v7 2/4] mptcp: implement .read_sock Geliang Tang
2025-07-08 16:16   ` Matthieu Baerts
2025-07-07  9:34 ` [PATCH mptcp-next v7 3/4] mptcp: implement .splice_read Geliang Tang
2025-07-08 14:52   ` Paolo Abeni
2025-07-10  9:10     ` Geliang Tang
2025-07-07  9:34 ` [PATCH mptcp-next v7 4/4] selftests: mptcp: add splice io mode Geliang Tang
2025-07-08 16:23   ` Matthieu Baerts
2025-07-08 15:25 ` [PATCH mptcp-next v7 0/4] implement mptcp read_sock MPTCP CI

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).