mptcp.lists.linux.dev archive mirror
* [PATCH mptcp-next v3 0/4] implement mptcp read_sock
@ 2025-06-26  8:41 Geliang Tang
  2025-06-26  8:41 ` [PATCH mptcp-next v3 1/4] mptcp: use sk_eat_skb in recvmsg_mskq Geliang Tang
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Geliang Tang @ 2025-06-26  8:41 UTC (permalink / raw)
  To: mptcp, hare; +Cc: Geliang Tang

v3:
 - merge the two squash-to patches.
 - use sk->sk_rcvbuf instead of INT_MAX as the max len in
 mptcp_read_sock().
 - add splice io mode for mptcp_connect and drop mptcp_splice.c test.
 - the splice test for packetdrill is also added here:
https://github.com/multipath-tcp/packetdrill/pull/162

v2:
 - set splice_read of mptcp
 - add a splice selftest.

I have good news! I recently added MPTCP support to "NVMe over TCP", and
my RFC patches are under review by NVMe maintainer Hannes.

Switching "NVMe over TCP" to MPTCP is simple: use IPPROTO_MPTCP instead of
IPPROTO_TCP when creating the sockets on both the target and host sides.
These sockets are created in kernel space.

nvmet_tcp_add_port:

	ret = sock_create(port->addr.ss_family, SOCK_STREAM,
				IPPROTO_MPTCP, &port->sock);

nvme_tcp_alloc_queue:

	ret = sock_create_kern(current->nsproxy->net_ns,
			ctrl->addr.ss_family, SOCK_STREAM,
			IPPROTO_MPTCP, &queue->sock);
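
For comparison, the same protocol switch can be tried from user space. The
sketch below is not part of this series: open_stream_socket() is a
hypothetical helper, and the fallback to plain TCP is an assumption about
what a caller might want when the kernel has MPTCP disabled.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262	/* value from the Linux UAPI headers */
#endif

/*
 * Hypothetical helper (not from the patches): open a stream socket,
 * preferring MPTCP and falling back to TCP when the kernel refuses.
 * Returns the fd, or -1 with errno set; *is_mptcp reports which
 * protocol was actually used.
 */
static int open_stream_socket(int family, bool *is_mptcp)
{
	int fd = socket(family, SOCK_STREAM, IPPROTO_MPTCP);

	*is_mptcp = fd >= 0;
	if (fd < 0)
		fd = socket(family, SOCK_STREAM, IPPROTO_TCP);

	return fd;
}
```

The only difference from a TCP socket is the protocol argument, which
mirrors the one-argument change in the two kernel call sites above.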

nvme_tcp_try_recv() calls the .read_sock interface of struct proto_ops,
which MPTCP does not implement. So I implemented it, using
__mptcp_recvmsg_mskq() as a reference.
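
The .read_sock contract is callback driven: the socket walks its receive
queue and hands each buffer to an "actor" that reports how many bytes it
consumed. The user-space model below is only a sketch of that control flow.
struct chunk, struct sink, copy_actor() and read_queue() are illustrative
stand-ins, not kernel types or functions.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a queued skb; all names here are illustrative. */
struct chunk {
	const char *data;
	size_t len;
	size_t offset;	/* bytes already consumed, like MPTCP_SKB_CB(skb)->offset */
};

/* Destination the actor copies into. */
struct sink {
	char buf[64];
	size_t used;
};

/* Actor: consume up to len bytes, return how many were taken (0 when full). */
typedef int (*actor_t)(struct sink *dst, const struct chunk *c,
		       size_t offset, size_t len);

static int copy_actor(struct sink *dst, const struct chunk *c,
		      size_t offset, size_t len)
{
	size_t room = sizeof(dst->buf) - dst->used;
	size_t n = len < room ? len : room;

	memcpy(dst->buf + dst->used, c->data + offset, n);
	dst->used += n;
	return (int)n;
}

/* Walk the queue, handing each chunk to the actor until it stops consuming. */
static int read_queue(struct chunk *q, size_t nr, struct sink *dst,
		      actor_t actor)
{
	int copied = 0;

	for (size_t i = 0; i < nr; i++) {
		size_t avail = q[i].len - q[i].offset;
		int count;

		if (!avail)
			continue;	/* already fully consumed */

		count = actor(dst, &q[i], q[i].offset, avail);
		if (count <= 0) {
			if (!copied)
				copied = count;
			break;
		}

		copied += count;
		q[i].offset += count;	/* remember partial progress */
		if ((size_t)count < avail)
			break;		/* actor took less than offered */
	}

	return copied;
}
```

A partially consumed chunk stays in the queue with an advanced offset, so
the next call resumes where the actor stopped.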

Since the NVMe patches are still under review, this set contains only the
MPTCP patches, sent to the MPTCP ML for your opinions.

Geliang Tang (4):
  mptcp: use sk_eat_skb in recvmsg_mskq
  mptcp: implement .read_sock
  mptcp: set .splice_read
  selftests: mptcp: add splice io mode

 net/ipv4/tcp.c                                |  6 ++
 net/mptcp/protocol.c                          | 66 ++++++++++++++++++-
 .../selftests/net/mptcp/mptcp_connect.c       | 61 ++++++++++++++++-
 .../selftests/net/mptcp/mptcp_connect.sh      |  9 ++-
 4 files changed, 138 insertions(+), 4 deletions(-)

-- 
2.48.1



* [PATCH mptcp-next v3 1/4] mptcp: use sk_eat_skb in recvmsg_mskq
  2025-06-26  8:41 [PATCH mptcp-next v3 0/4] implement mptcp read_sock Geliang Tang
@ 2025-06-26  8:41 ` Geliang Tang
  2025-06-26  8:41 ` [PATCH mptcp-next v3 2/4] mptcp: implement .read_sock Geliang Tang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Geliang Tang @ 2025-06-26  8:41 UTC (permalink / raw)
  To: mptcp, hare; +Cc: Geliang Tang

From: Geliang Tang <tanggeliang@kylinos.cn>

Use the sk_eat_skb() helper in __mptcp_recvmsg_mskq() instead of
open-coding the unlink and free.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/mptcp/protocol.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1de42dc4e8ea..2f747ab730e5 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1890,8 +1890,7 @@ static int __mptcp_recvmsg_mskq(struct sock *sk,
 			skb->destructor = NULL;
 			atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
 			sk_mem_uncharge(sk, skb->truesize);
-			__skb_unlink(skb, &sk->sk_receive_queue);
-			__kfree_skb(skb);
+			sk_eat_skb(sk, skb);
 			msk->bytes_consumed += count;
 		}
 
-- 
2.48.1



* [PATCH mptcp-next v3 2/4] mptcp: implement .read_sock
  2025-06-26  8:41 [PATCH mptcp-next v3 0/4] implement mptcp read_sock Geliang Tang
  2025-06-26  8:41 ` [PATCH mptcp-next v3 1/4] mptcp: use sk_eat_skb in recvmsg_mskq Geliang Tang
@ 2025-06-26  8:41 ` Geliang Tang
  2025-06-26  8:41 ` [PATCH mptcp-next v3 3/4] mptcp: set .splice_read Geliang Tang
  2025-06-26  8:41 ` [PATCH mptcp-next v3 4/4] selftests: mptcp: add splice io mode Geliang Tang
  3 siblings, 0 replies; 5+ messages in thread
From: Geliang Tang @ 2025-06-26  8:41 UTC (permalink / raw)
  To: mptcp, hare; +Cc: Geliang Tang

From: Geliang Tang <tanggeliang@kylinos.cn>

nvme_tcp_try_recv() calls the .read_sock interface of struct proto_ops,
which MPTCP does not implement.

This patch implements it, using __mptcp_recvmsg_mskq() as a reference.

v3:
 - Use sk->sk_rcvbuf instead of INT_MAX as the max len.

v2:
 - check sk_state first (Matt), but do not look for the end of a
connection the way TCP does in __tcp_read_sock():

	if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
		break;

Doing so causes a use-after-free error:

	BUG: KASAN: slab-use-after-free in mptcp_read_sock.

Reviewed-by: Hannes Reinecke <hare@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/mptcp/protocol.c | 61 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2f747ab730e5..488a673f4da3 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3956,6 +3956,65 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
 	return mask;
 }
 
+/*
+ * Note:
+ *	- It is assumed that the socket was locked by the caller.
+ */
+static int mptcp_read_sock(struct sock *sk, read_descriptor_t *desc,
+			   sk_read_actor_t recv_actor)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct scm_timestamping_internal tss;
+	size_t len = sk->sk_rcvbuf;
+	struct sk_buff *skb, *tmp;
+	int copied = 0;
+
+	if (sk->sk_state == TCP_LISTEN)
+		return -ENOTCONN;
+	skb_queue_walk_safe(&sk->sk_receive_queue, skb, tmp) {
+		u32 offset = MPTCP_SKB_CB(skb)->offset;
+		u32 data_len = skb->len - offset;
+		u32 size = min_t(size_t, len - copied, data_len);
+		int count;
+
+		count = recv_actor(desc, skb, offset, size);
+		if (count <= 0) {
+			if (!copied)
+				copied = count;
+			break;
+		}
+
+		if (MPTCP_SKB_CB(skb)->has_rxtstamp)
+			tcp_update_recv_tstamps(skb, &tss);
+
+		copied += count;
+
+		if (count < data_len) {
+			MPTCP_SKB_CB(skb)->offset += count;
+			MPTCP_SKB_CB(skb)->map_seq += count;
+			msk->bytes_consumed += count;
+			break;
+		}
+
+		/* avoid the indirect call, we know the destructor is sock_wfree */
+		skb->destructor = NULL;
+		atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
+		sk_mem_uncharge(sk, skb->truesize);
+		sk_eat_skb(sk, skb);
+		msk->bytes_consumed += count;
+
+		if (copied >= len)
+			break;
+	}
+
+	mptcp_rcv_space_adjust(msk, copied);
+
+	if (copied > 0)
+		mptcp_cleanup_rbuf(msk, copied);
+
+	return copied;
+}
+
 static const struct proto_ops mptcp_stream_ops = {
 	.family		   = PF_INET,
 	.owner		   = THIS_MODULE,
@@ -3976,6 +4035,7 @@ static const struct proto_ops mptcp_stream_ops = {
 	.recvmsg	   = inet_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.set_rcvlowat	   = mptcp_set_rcvlowat,
+	.read_sock	   = mptcp_read_sock,
 };
 
 static struct inet_protosw mptcp_protosw = {
@@ -4080,6 +4140,7 @@ static const struct proto_ops mptcp_v6_stream_ops = {
 	.compat_ioctl	   = inet6_compat_ioctl,
 #endif
 	.set_rcvlowat	   = mptcp_set_rcvlowat,
+	.read_sock	   = mptcp_read_sock,
 };
 
 static struct proto mptcp_v6_prot;
-- 
2.48.1



* [PATCH mptcp-next v3 3/4] mptcp: set .splice_read
  2025-06-26  8:41 [PATCH mptcp-next v3 0/4] implement mptcp read_sock Geliang Tang
  2025-06-26  8:41 ` [PATCH mptcp-next v3 1/4] mptcp: use sk_eat_skb in recvmsg_mskq Geliang Tang
  2025-06-26  8:41 ` [PATCH mptcp-next v3 2/4] mptcp: implement .read_sock Geliang Tang
@ 2025-06-26  8:41 ` Geliang Tang
  2025-06-26  8:41 ` [PATCH mptcp-next v3 4/4] selftests: mptcp: add splice io mode Geliang Tang
  3 siblings, 0 replies; 5+ messages in thread
From: Geliang Tang @ 2025-06-26  8:41 UTC (permalink / raw)
  To: mptcp, hare; +Cc: Geliang Tang

From: Geliang Tang <tanggeliang@kylinos.cn>

Set the .splice_read interface of the MPTCP struct proto_ops to
tcp_splice_read, and invoke .read_sock from __tcp_splice_read().
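
The dispatch pattern described above (call the protocol's own handler when
one is registered, otherwise use the default) can be modelled in a few
lines. This is a user-space sketch only: struct ops_model, default_read(),
custom_read() and do_read() are hypothetical names, and the integer
argument stands in for the real socket and descriptor parameters.

```c
#include <stddef.h>

/* Illustrative stand-in for struct proto_ops with one optional hook. */
struct ops_model {
	int (*read_sock)(int arg);	/* may be NULL */
};

/* Stand-in for the generic fallback path. */
static int default_read(int arg)
{
	return arg;
}

/* Stand-in for a protocol-specific handler. */
static int custom_read(int arg)
{
	return arg * 2;
}

/* Prefer the registered hook, fall back to the default when it is absent. */
static int do_read(const struct ops_model *ops, int arg)
{
	if (ops->read_sock)
		return ops->read_sock(arg);

	return default_read(arg);
}
```

With this shape, the shared splice code does not need to know which
protocol owns the socket; registering the hook is enough.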

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/ipv4/tcp.c       | 6 ++++++
 net/mptcp/protocol.c | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c18682c3fa33..98ce7a624b33 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -779,6 +779,12 @@ static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
 		.arg.data = tss,
 		.count	  = tss->len,
 	};
+	const struct proto_ops *ops;
+
+	ops = READ_ONCE(sk->sk_socket->ops);
+
+	if (likely(ops->read_sock))
+		return ops->read_sock(sk, &rd_desc, tcp_splice_data_recv);
 
 	return tcp_read_sock(sk, &rd_desc, tcp_splice_data_recv);
 }
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 488a673f4da3..45db44700e64 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -4036,6 +4036,7 @@ static const struct proto_ops mptcp_stream_ops = {
 	.mmap		   = sock_no_mmap,
 	.set_rcvlowat	   = mptcp_set_rcvlowat,
 	.read_sock	   = mptcp_read_sock,
+	.splice_read	   = tcp_splice_read,
 };
 
 static struct inet_protosw mptcp_protosw = {
@@ -4141,6 +4142,7 @@ static const struct proto_ops mptcp_v6_stream_ops = {
 #endif
 	.set_rcvlowat	   = mptcp_set_rcvlowat,
 	.read_sock	   = mptcp_read_sock,
+	.splice_read	   = tcp_splice_read,
 };
 
 static struct proto mptcp_v6_prot;
-- 
2.48.1



* [PATCH mptcp-next v3 4/4] selftests: mptcp: add splice io mode
  2025-06-26  8:41 [PATCH mptcp-next v3 0/4] implement mptcp read_sock Geliang Tang
                   ` (2 preceding siblings ...)
  2025-06-26  8:41 ` [PATCH mptcp-next v3 3/4] mptcp: set .splice_read Geliang Tang
@ 2025-06-26  8:41 ` Geliang Tang
  3 siblings, 0 replies; 5+ messages in thread
From: Geliang Tang @ 2025-06-26  8:41 UTC (permalink / raw)
  To: mptcp, hare; +Cc: Geliang Tang

From: Geliang Tang <tanggeliang@kylinos.cn>

This patch adds a new 'splice' io mode to mptcp_connect to test
the newly added read_sock() and splice() support in MPTCP.

	./mptcp_connect.sh -m splice
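
For readers unfamiliar with the two-step splice pattern (source to pipe,
then pipe to destination), here is a stand-alone sketch. It is not the code
from this patch: splice_relay() is a hypothetical variant that also drains
the pipe after every read and reports errors, which is one possible design
and not something the selftest is claimed to do.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for splice() */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical splice relay: move data from infd to outfd through a
 * pipe, draining the pipe completely after every read so that a short
 * write on outfd does not lose data. Returns bytes relayed, or -1.
 */
static ssize_t splice_relay(int infd, int outfd, size_t chunk)
{
	ssize_t in, out, total = 0;
	int pipefd[2];

	if (pipe(pipefd))
		return -1;

	while ((in = splice(infd, NULL, pipefd[1], NULL, chunk,
			    SPLICE_F_MOVE | SPLICE_F_MORE)) > 0) {
		while (in > 0) {
			out = splice(pipefd[0], NULL, outfd, NULL, in,
				     SPLICE_F_MOVE | SPLICE_F_MORE);
			if (out <= 0) {
				in = -1;	/* report the failure */
				goto done;
			}
			in -= out;
			total += out;
		}
	}

done:
	close(pipefd[0]);
	close(pipefd[1]);
	return in < 0 ? -1 : total;
}
```

The outer loop ends when the source reports end of file (0) or an error
(negative), and the function distinguishes the two in its return value.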

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 .../selftests/net/mptcp/mptcp_connect.c       | 61 ++++++++++++++++++-
 .../selftests/net/mptcp/mptcp_connect.sh      |  9 ++-
 2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index ac1349c4b9e5..ce4b4ed9164d 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -51,6 +51,7 @@ enum cfg_mode {
 	CFG_MODE_POLL,
 	CFG_MODE_MMAP,
 	CFG_MODE_SENDFILE,
+	CFG_MODE_SPLICE,
 };
 
 enum cfg_peek {
@@ -123,7 +124,7 @@ static void die_usage(void)
 	fprintf(stderr, "\t-j     -- add additional sleep at connection start and tear down "
 		"-- for MPJ tests\n");
 	fprintf(stderr, "\t-l     -- listens mode, accepts incoming connection\n");
-	fprintf(stderr, "\t-m [poll|mmap|sendfile] -- use poll(default)/mmap+write/sendfile\n");
+	fprintf(stderr, "\t-m [poll|mmap|sendfile|splice] -- use poll(default)/mmap+write/sendfile/splice\n");
 	fprintf(stderr, "\t-M mark -- set socket packet mark\n");
 	fprintf(stderr, "\t-o option -- test sockopt <option>\n");
 	fprintf(stderr, "\t-p num -- use port num\n");
@@ -925,6 +926,53 @@ static int copyfd_io_sendfile(int infd, int peerfd, int outfd,
 	return err;
 }
 
+static int do_splice(const int infd, const int outfd, const size_t len)
+{
+	int pipefd[2];
+	ssize_t bytes;
+	int err;
+
+	err = pipe(pipefd);
+	if (err)
+		return err;
+
+	while ((bytes = splice(infd, NULL, pipefd[1], NULL, len,
+			       SPLICE_F_MOVE | SPLICE_F_MORE)) > 0) {
+		splice(pipefd[0], NULL, outfd, NULL, bytes,
+		       SPLICE_F_MOVE | SPLICE_F_MORE);
+	}
+
+	close(pipefd[0]);
+	close(pipefd[1]);
+
+	return 0;
+}
+
+static int copyfd_io_splice(int infd, int peerfd, int outfd, unsigned int size,
+			    bool *in_closed_after_out, struct wstate *winfo)
+{
+	int err;
+
+	if (listen_mode) {
+		err = do_splice(peerfd, outfd, size);
+		if (err)
+			return err;
+
+		err = do_splice(infd, peerfd, size);
+	} else {
+		err = do_splice(infd, peerfd, size);
+		if (err)
+			return err;
+
+		shut_wr(peerfd);
+
+		err = do_splice(peerfd, outfd, size);
+		*in_closed_after_out = true;
+	}
+
+	return err;
+}
+
 static int copyfd_io(int infd, int peerfd, int outfd, bool close_peerfd, struct wstate *winfo)
 {
 	bool in_closed_after_out = false;
@@ -957,6 +1005,14 @@ static int copyfd_io(int infd, int peerfd, int outfd, bool close_peerfd, struct
 					 &in_closed_after_out, winfo);
 		break;
 
+	case CFG_MODE_SPLICE:
+		file_size = get_infd_size(infd);
+		if (file_size < 0)
+			return file_size;
+		ret = copyfd_io_splice(infd, peerfd, outfd, file_size,
+				       &in_closed_after_out, winfo);
+		break;
+
 	default:
 		fprintf(stderr, "Invalid mode %d\n", cfg_mode);
 
@@ -1361,12 +1417,15 @@ int parse_mode(const char *mode)
 		return CFG_MODE_MMAP;
 	if (!strcasecmp(mode, "sendfile"))
 		return CFG_MODE_SENDFILE;
+	if (!strcasecmp(mode, "splice"))
+		return CFG_MODE_SPLICE;
 
 	fprintf(stderr, "Unknown test mode: %s\n", mode);
 	fprintf(stderr, "Supported modes are:\n");
 	fprintf(stderr, "\t\t\"poll\" - interleaved read/write using poll()\n");
 	fprintf(stderr, "\t\t\"mmap\" - send entire input file (mmap+write), then read response (-l will read input first)\n");
 	fprintf(stderr, "\t\t\"sendfile\" - send entire input file (sendfile), then read response (-l will read input first)\n");
+	fprintf(stderr, "\t\t\"splice\" - send entire input file (splice), then read response (-l will read input first)\n");
 
 	die_usage();
 
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 5e3c56253274..4dd59a2ed21e 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -339,7 +339,14 @@ do_transfer()
 	fi
 
 	if [ -n "$testmode" ]; then
-		extra_args+=" -m $testmode"
+		if [ ${testmode} = "splice" ]; then
+			# only use 'splice' mode for MPTCP tests
+			if [ ${cl_proto} = "MPTCP" ] && [ ${srv_proto} = "MPTCP" ]; then
+				extra_args+=" -m splice"
+			fi
+		else
+			extra_args+=" -m $testmode"
+		fi
 	fi
 
 	if [ -n "$extra_args" ] && $options_log; then
-- 
2.48.1


