MPTCP Linux Development
* [RFC mptcp-next v2 0/7] NVME over MPTCP
@ 2025-11-26 10:39 Geliang Tang
  2025-11-26 10:39 ` [RFC mptcp-next v2 1/7] mptcp: allow overridden write_space to be invoked Geliang Tang
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:39 UTC
  To: mptcp, hare, hare; +Cc: Geliang Tang

From: Geliang Tang <tanggeliang@kylinos.cn>

v2:
 - Patch 1 fixes the timeout issue reported in v1, thanks to Paolo and Gang
Yan for their help.
 - Patch 5 implements an MPTCP-specific sock_set_syncnt helper.

This series (previously named "MPTCP support to 'NVME over TCP'") had three
RFC versions sent to Hannes in May, with subsequent revisions based on his
input. Following that, I initiated the process of upstreaming the dependent
"implement mptcp read_sock" series to the main MPTCP repository, which has
now largely taken shape.

Depends on: implement mptcp read_sock, v14
Based-on: <cover.1763974740.git.tanggeliang@kylinos.cn>

Geliang Tang (7):
  mptcp: allow overridden write_space to be invoked
  mptcp: add sock_set_reuseaddr
  mptcp: add sock_set_nodelay
  nvmet-tcp: add mptcp support
  mptcp: add sock_set_syncnt
  nvme-tcp: add mptcp support
  selftests: mptcp: add NVMe-over-MPTCP test

 drivers/nvme/host/tcp.c                       |  28 +++-
 drivers/nvme/target/configfs.c                |   1 +
 drivers/nvme/target/tcp.c                     |  38 +++++-
 include/linux/nvme.h                          |   1 +
 include/net/mptcp.h                           |  15 +++
 net/mptcp/protocol.c                          |  52 ++++++++
 net/mptcp/protocol.h                          |   2 +-
 tools/testing/selftests/net/mptcp/config      |   7 +
 .../testing/selftests/net/mptcp/mptcp_nvme.sh | 120 ++++++++++++++++++
 9 files changed, 260 insertions(+), 4 deletions(-)
 create mode 100755 tools/testing/selftests/net/mptcp/mptcp_nvme.sh

-- 
2.51.0



* [RFC mptcp-next v2 1/7] mptcp: allow overridden write_space to be invoked
  2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
@ 2025-11-26 10:39 ` Geliang Tang
  2025-11-26 10:39 ` [RFC mptcp-next v2 2/7] mptcp: add sock_set_reuseaddr Geliang Tang
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:39 UTC
  To: mptcp, hare, hare
  Cc: Geliang Tang, Paolo Abeni, Hui Zhu, Gang Yan, zhenwei pi

From: Geliang Tang <tanggeliang@kylinos.cn>

NVMe installs its own sk_write_space callback on the socket, but
mptcp_write_space() calls sk_stream_write_space() directly, so the
override never runs. Invoke the callback through sk->sk_write_space
instead.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/mptcp/protocol.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 199f28f3dd5e..483143e2d0b5 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -973,7 +973,7 @@ static inline void mptcp_write_space(struct sock *sk)
 	/* pairs with memory barrier in mptcp_poll */
 	smp_mb();
 	if (mptcp_stream_memory_free(sk, 1))
-		sk_stream_write_space(sk);
+		INDIRECT_CALL_1(sk->sk_write_space, sk_stream_write_space, sk);
 }
 
 static inline void __mptcp_sync_sndbuf(struct sock *sk)
-- 
2.51.0
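
The effect of the one-line change in patch 1 can be modelled in userspace.
This is only a sketch under stated assumptions: struct fake_sock and every
function below are hypothetical stand-ins, not kernel code. In the kernel,
INDIRECT_CALL_1() additionally compares the pointer against the expected
default so that the common case can be a direct call; the sketch shows only
the dispatch through the callback slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock: only the callback slot matters. */
struct fake_sock {
	void (*write_space)(struct fake_sock *sk);
	bool memory_free;
	int default_calls;
	int override_calls;
};

/* Plays the role of sk_stream_write_space(), the default callback. */
void default_write_space(struct fake_sock *sk)
{
	sk->default_calls++;
}

/* Plays the role of a callback installed by an upper layer such as NVMe. */
void override_write_space(struct fake_sock *sk)
{
	sk->override_calls++;
}

/* Before the patch: the default is called directly, bypassing the slot. */
void notify_direct(struct fake_sock *sk)
{
	if (sk->memory_free)
		default_write_space(sk);
}

/* After the patch: dispatch through the slot so an override runs. */
void notify_indirect(struct fake_sock *sk)
{
	assert(sk->write_space != NULL);
	if (sk->memory_free)
		sk->write_space(sk);
}

/* Install an override, notify once, and report how often it ran. */
int demo_override_calls(bool indirect)
{
	struct fake_sock sk = {
		.write_space = default_write_space,
		.memory_free = true,
	};

	sk.write_space = override_write_space;
	if (indirect)
		notify_indirect(&sk);
	else
		notify_direct(&sk);
	return sk.override_calls;
}
```

With the override installed, demo_override_calls(false) leaves the override
untouched, while demo_override_calls(true) runs it once.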



* [RFC mptcp-next v2 2/7] mptcp: add sock_set_reuseaddr
  2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
  2025-11-26 10:39 ` [RFC mptcp-next v2 1/7] mptcp: allow overridden write_space to be invoked Geliang Tang
@ 2025-11-26 10:39 ` Geliang Tang
  2025-11-26 10:40 ` [RFC mptcp-next v2 3/7] mptcp: add sock_set_nodelay Geliang Tang
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:39 UTC
  To: mptcp, hare, hare; +Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi

From: Geliang Tang <tanggeliang@kylinos.cn>

Implement an MPTCP-specific sock_set_reuseaddr helper, which will be used
for NVMe over MPTCP. Calling the plain TCP helpers on an MPTCP socket, for
example tcp_sock_set_nodelay(), causes list corruption:

  nvmet: adding nsid 1 to subsystem nqn.2014-08.org.nvmexpress.mptcpdev
  nvmet_tcp: enabling port 1234 (127.0.0.1:4420)
   slab MPTCP start ffff8880108f0b80 pointer offset 2480 size 2816
  list_add corruption. prev->next should be next (ffff8880108f1530), but
  was ffff8885108f1530. (prev=ffff8880108f1530).
  ------------[ cut here ]------------
  kernel BUG at lib/list_debug.c:32!
  Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
  CPU: 1 UID: 0 PID: 182 Comm: nvme Not tainted 6.16.0-rc3+ #1 PREEMPT(full)
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011

Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 include/net/mptcp.h  |  4 ++++
 net/mptcp/protocol.c | 15 +++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 4cf59e83c1c5..c7bf444eee56 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -237,6 +237,8 @@ static inline __be32 mptcp_reset_option(const struct sk_buff *skb)
 }
 
 void mptcp_active_detect_blackhole(struct sock *sk, bool expired);
+
+void mptcp_sock_set_reuseaddr(struct sock *sk);
 #else
 
 static inline void mptcp_init(void)
@@ -323,6 +325,8 @@ static inline struct request_sock *mptcp_subflow_reqsk_alloc(const struct reques
 static inline __be32 mptcp_reset_option(const struct sk_buff *skb)  { return htonl(0u); }
 
 static inline void mptcp_active_detect_blackhole(struct sock *sk, bool expired) { }
+
+static inline void mptcp_sock_set_reuseaddr(struct sock *sk) { }
 #endif /* CONFIG_MPTCP */
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f9a544e31637..21067245a8f5 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3764,6 +3764,21 @@ static void mptcp_sock_check_graft(struct sock *sk, struct sock *ssk)
 	}
 }
 
+void mptcp_sock_set_reuseaddr(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sock *ssk;
+
+	lock_sock(sk);
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk))
+		goto unlock;
+	ssk->sk_reuse = SK_CAN_REUSE;
+unlock:
+	release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_reuseaddr);
+
 bool mptcp_finish_join(struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
-- 
2.51.0
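
For comparison with the in-kernel helper in patch 2, a userspace program
requests the same option through setsockopt(). The sketch below is not part
of the series. The function names are hypothetical, and the fallback to
plain TCP when the kernel refuses IPPROTO_MPTCP is an assumption made so
that the example also runs on systems without MPTCP.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262	/* value from the Linux UAPI headers */
#endif

/*
 * Open a stream socket, preferring MPTCP. Falling back to TCP is an
 * assumption of this sketch; the series itself does not fall back.
 */
int open_stream_socket(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

	if (fd < 0)
		fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	return fd;
}

/* Set SO_REUSEADDR and read it back: 1 if enabled, 0 if not, -1 on error. */
int reuseaddr_roundtrip(void)
{
	int one = 1, got = 0;
	socklen_t len = sizeof(got);
	int fd = open_stream_socket();
	int ret = -1;

	if (fd < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) == 0 &&
	    getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &got, &len) == 0)
		ret = got != 0;
	close(fd);
	return ret;
}
```

Kernel users such as nvmet-tcp cannot go through the setsockopt() path in
the same way, which is why the series adds exported helpers instead.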



* [RFC mptcp-next v2 3/7] mptcp: add sock_set_nodelay
  2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
  2025-11-26 10:39 ` [RFC mptcp-next v2 1/7] mptcp: allow overridden write_space to be invoked Geliang Tang
  2025-11-26 10:39 ` [RFC mptcp-next v2 2/7] mptcp: add sock_set_reuseaddr Geliang Tang
@ 2025-11-26 10:40 ` Geliang Tang
  2025-11-26 10:40 ` [RFC mptcp-next v2 4/7] nvmet-tcp: add mptcp support Geliang Tang
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:40 UTC
  To: mptcp, hare, hare; +Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi

From: Geliang Tang <tanggeliang@kylinos.cn>

Implement an MPTCP-specific sock_set_nodelay helper, which will be used for
NVMe over MPTCP.

Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 include/net/mptcp.h  |  4 ++++
 net/mptcp/protocol.c | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index c7bf444eee56..4fc8f08c6588 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -239,6 +239,8 @@ static inline __be32 mptcp_reset_option(const struct sk_buff *skb)
 void mptcp_active_detect_blackhole(struct sock *sk, bool expired);
 
 void mptcp_sock_set_reuseaddr(struct sock *sk);
+
+void mptcp_sock_set_nodelay(struct sock *sk);
 #else
 
 static inline void mptcp_init(void)
@@ -327,6 +329,8 @@ static inline __be32 mptcp_reset_option(const struct sk_buff *skb)  { return hto
 static inline void mptcp_active_detect_blackhole(struct sock *sk, bool expired) { }
 
 static inline void mptcp_sock_set_reuseaddr(struct sock *sk) { }
+
+static inline void mptcp_sock_set_nodelay(struct sock *sk) { }
 #endif /* CONFIG_MPTCP */
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 21067245a8f5..9790bc74a0b7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3779,6 +3779,24 @@ void mptcp_sock_set_reuseaddr(struct sock *sk)
 }
 EXPORT_SYMBOL(mptcp_sock_set_reuseaddr);
 
+void mptcp_sock_set_nodelay(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sock *ssk;
+
+	lock_sock(sk);
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk))
+		goto unlock;
+
+	lock_sock(ssk);
+	__tcp_sock_set_nodelay(ssk, true);
+	release_sock(ssk);
+unlock:
+	release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_nodelay);
+
 bool mptcp_finish_join(struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
-- 
2.51.0



* [RFC mptcp-next v2 4/7] nvmet-tcp: add mptcp support
  2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
                   ` (2 preceding siblings ...)
  2025-11-26 10:40 ` [RFC mptcp-next v2 3/7] mptcp: add sock_set_nodelay Geliang Tang
@ 2025-11-26 10:40 ` Geliang Tang
  2025-11-26 10:40 ` [RFC mptcp-next v2 5/7] mptcp: add sock_set_syncnt Geliang Tang
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:40 UTC
  To: mptcp, hare, hare; +Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi

From: Geliang Tang <tanggeliang@kylinos.cn>

Add a new NVMe target transport type, NVMF_TRTYPE_MPTCP, and define a new
nvmet_fabrics_ops named nvmet_mptcp_ops, which is almost the same as
nvmet_tcp_ops except for .type.

In nvmet_tcp_add_port(), check whether disc_addr.trtype is
NVMF_TRTYPE_MPTCP to decide whether to pass IPPROTO_MPTCP to sock_create()
and create an MPTCP socket instead of a TCP one.

In nvmet_tcp_done_recv_pdu(), select between the two nvmet_fabrics_ops
according to the protocol of the queue's socket.

v2:
 - use trtype instead of tsas (Hannes).

v3:
 - check mptcp protocol from disc_addr.trtype instead of passing a
parameter (Hannes).

v4:
 - check CONFIG_MPTCP.

Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 drivers/nvme/target/configfs.c |  1 +
 drivers/nvme/target/tcp.c      | 38 ++++++++++++++++++++++++++++++++--
 include/linux/nvme.h           |  1 +
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index e44ef69dffc2..14c642cd458e 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -37,6 +37,7 @@ static struct nvmet_type_name_map nvmet_transport[] = {
 	{ NVMF_TRTYPE_RDMA,	"rdma" },
 	{ NVMF_TRTYPE_FC,	"fc" },
 	{ NVMF_TRTYPE_TCP,	"tcp" },
+	{ NVMF_TRTYPE_MPTCP,	"mptcp" },
 	{ NVMF_TRTYPE_PCI,	"pci" },
 	{ NVMF_TRTYPE_LOOP,	"loop" },
 };
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index d543da09ef8e..a0a165d7f9cd 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -16,6 +16,7 @@
 #include <net/tls.h>
 #include <net/tls_prot.h>
 #include <net/handshake.h>
+#include <net/mptcp.h>
 #include <linux/inet.h>
 #include <linux/llist.h>
 #include <trace/events/sock.h>
@@ -212,6 +213,7 @@ static DEFINE_MUTEX(nvmet_tcp_queue_mutex);
 
 static struct workqueue_struct *nvmet_tcp_wq;
 static const struct nvmet_fabrics_ops nvmet_tcp_ops;
+static const struct nvmet_fabrics_ops nvmet_mptcp_ops;
 static void nvmet_tcp_free_cmd(struct nvmet_tcp_cmd *c);
 static void nvmet_tcp_free_cmd_buffers(struct nvmet_tcp_cmd *cmd);
 
@@ -1039,7 +1041,9 @@ static int nvmet_tcp_done_recv_pdu(struct nvmet_tcp_queue *queue)
 	req = &queue->cmd->req;
 	memcpy(req->cmd, nvme_cmd, sizeof(*nvme_cmd));
 
-	if (unlikely(!nvmet_req_init(req, &queue->nvme_sq, &nvmet_tcp_ops))) {
+	if (unlikely(!nvmet_req_init(req, &queue->nvme_sq,
+				     queue->sock->sk->sk_protocol == IPPROTO_MPTCP ?
+				     &nvmet_mptcp_ops : &nvmet_tcp_ops))) {
 		pr_err("failed cmd %p id %d opcode %d, data_len: %d, status: %04x\n",
 			req->cmd, req->cmd->common.command_id,
 			req->cmd->common.opcode,
@@ -2007,6 +2011,7 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 {
 	struct nvmet_tcp_port *port;
 	__kernel_sa_family_t af;
+	int proto = IPPROTO_TCP;
 	int ret;
 
 	port = kzalloc(sizeof(*port), GFP_KERNEL);
@@ -2027,6 +2032,11 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 		goto err_port;
 	}
 
+#ifdef CONFIG_MPTCP
+	if (nport->disc_addr.trtype == NVMF_TRTYPE_MPTCP)
+		proto = IPPROTO_MPTCP;
+#endif
+
 	ret = inet_pton_with_scope(&init_net, af, nport->disc_addr.traddr,
 			nport->disc_addr.trsvcid, &port->addr);
 	if (ret) {
@@ -2041,7 +2051,7 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 		port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
 
 	ret = sock_create(port->addr.ss_family, SOCK_STREAM,
-				IPPROTO_TCP, &port->sock);
+				proto, &port->sock);
 	if (ret) {
 		pr_err("failed to create a socket\n");
 		goto err_port;
@@ -2050,7 +2060,11 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 	port->sock->sk->sk_user_data = port;
 	port->data_ready = port->sock->sk->sk_data_ready;
 	port->sock->sk->sk_data_ready = nvmet_tcp_listen_data_ready;
+	proto == IPPROTO_MPTCP ?
+	mptcp_sock_set_reuseaddr(port->sock->sk) :
 	sock_set_reuseaddr(port->sock->sk);
+	proto == IPPROTO_MPTCP ?
+	mptcp_sock_set_nodelay(port->sock->sk) :
 	tcp_sock_set_nodelay(port->sock->sk);
 	if (so_priority > 0)
 		sock_set_priority(port->sock->sk, so_priority);
@@ -2193,6 +2207,19 @@ static const struct nvmet_fabrics_ops nvmet_tcp_ops = {
 	.host_traddr		= nvmet_tcp_host_port_addr,
 };
 
+static const struct nvmet_fabrics_ops nvmet_mptcp_ops = {
+	.owner			= THIS_MODULE,
+	.type			= NVMF_TRTYPE_MPTCP,
+	.msdbd			= 1,
+	.add_port		= nvmet_tcp_add_port,
+	.remove_port		= nvmet_tcp_remove_port,
+	.queue_response		= nvmet_tcp_queue_response,
+	.delete_ctrl		= nvmet_tcp_delete_ctrl,
+	.install_queue		= nvmet_tcp_install_queue,
+	.disc_traddr		= nvmet_tcp_disc_port_addr,
+	.host_traddr		= nvmet_tcp_host_port_addr,
+};
+
 static int __init nvmet_tcp_init(void)
 {
 	int ret;
@@ -2206,6 +2233,12 @@ static int __init nvmet_tcp_init(void)
 	if (ret)
 		goto err;
 
+	ret = nvmet_register_transport(&nvmet_mptcp_ops);
+	if (ret) {
+		nvmet_unregister_transport(&nvmet_tcp_ops);
+		goto err;
+	}
+
 	return 0;
 err:
 	destroy_workqueue(nvmet_tcp_wq);
@@ -2216,6 +2249,7 @@ static void __exit nvmet_tcp_exit(void)
 {
 	struct nvmet_tcp_queue *queue;
 
+	nvmet_unregister_transport(&nvmet_mptcp_ops);
 	nvmet_unregister_transport(&nvmet_tcp_ops);
 
 	flush_workqueue(nvmet_wq);
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 655d194f8e72..8069667ad47e 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -68,6 +68,7 @@ enum {
 	NVMF_TRTYPE_RDMA	= 1,	/* RDMA */
 	NVMF_TRTYPE_FC		= 2,	/* Fibre Channel */
 	NVMF_TRTYPE_TCP		= 3,	/* TCP/IP */
+	NVMF_TRTYPE_MPTCP	= 4,	/* Multipath TCP */
 	NVMF_TRTYPE_LOOP	= 254,	/* Reserved for host usage */
 	NVMF_TRTYPE_MAX,
 };
-- 
2.51.0
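
The two selection steps described in patch 4 can be sketched as plain
functions. All sketch_* names are hypothetical and exist only for this
illustration. The numeric transport values mirror the enum in the patch,
and the mptcp_enabled parameter stands in for the CONFIG_MPTCP check.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262	/* value from the Linux UAPI headers */
#endif

/* Values mirror the patch; the names are local to this sketch. */
enum sketch_trtype {
	SKETCH_TRTYPE_TCP   = 3,
	SKETCH_TRTYPE_MPTCP = 4,
};

/* Hypothetical stand-in for struct nvmet_fabrics_ops. */
struct sketch_ops {
	int type;
};

const struct sketch_ops sketch_tcp_ops   = { .type = SKETCH_TRTYPE_TCP };
const struct sketch_ops sketch_mptcp_ops = { .type = SKETCH_TRTYPE_MPTCP };

/* Port setup: map the configured transport type to a socket protocol. */
int sketch_proto_for_trtype(int trtype, bool mptcp_enabled)
{
	if (mptcp_enabled && trtype == SKETCH_TRTYPE_MPTCP)
		return IPPROTO_MPTCP;
	return IPPROTO_TCP;
}

/* Request path: pick the ops table from the socket's protocol. */
const struct sketch_ops *sketch_ops_for_proto(int sk_protocol)
{
	if (sk_protocol == IPPROTO_MPTCP)
		return &sketch_mptcp_ops;
	return &sketch_tcp_ops;
}
```

Writing the selection as small helpers with if/else is one possible
alternative to the conditional-expression statements used in the diff. It
is a style choice and does not change behaviour.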



* [RFC mptcp-next v2 5/7] mptcp: add sock_set_syncnt
  2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
                   ` (3 preceding siblings ...)
  2025-11-26 10:40 ` [RFC mptcp-next v2 4/7] nvmet-tcp: add mptcp support Geliang Tang
@ 2025-11-26 10:40 ` Geliang Tang
  2025-11-26 10:40 ` [RFC mptcp-next v2 6/7] nvme-tcp: add mptcp support Geliang Tang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:40 UTC
  To: mptcp, hare, hare; +Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi

From: Geliang Tang <tanggeliang@kylinos.cn>

Implement an MPTCP-specific sock_set_syncnt helper, which will be used for
NVMe over MPTCP.

Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 include/net/mptcp.h  |  7 +++++++
 net/mptcp/protocol.c | 19 +++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 4fc8f08c6588..7d981973df48 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -241,6 +241,8 @@ void mptcp_active_detect_blackhole(struct sock *sk, bool expired);
 void mptcp_sock_set_reuseaddr(struct sock *sk);
 
 void mptcp_sock_set_nodelay(struct sock *sk);
+
+int mptcp_sock_set_syncnt(struct sock *sk, int val);
 #else
 
 static inline void mptcp_init(void)
@@ -331,6 +333,11 @@ static inline void mptcp_active_detect_blackhole(struct sock *sk, bool expired)
 static inline void mptcp_sock_set_reuseaddr(struct sock *sk) { }
 
 static inline void mptcp_sock_set_nodelay(struct sock *sk) { }
+
+static inline int mptcp_sock_set_syncnt(struct sock *sk, int val)
+{
+	return 0;
+}
 #endif /* CONFIG_MPTCP */
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 9790bc74a0b7..543fa3131af7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3797,6 +3797,25 @@ void mptcp_sock_set_nodelay(struct sock *sk)
 }
 EXPORT_SYMBOL(mptcp_sock_set_nodelay);
 
+int mptcp_sock_set_syncnt(struct sock *sk, int val)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sock *ssk;
+
+	if (val < 1 || val > MAX_TCP_SYNCNT)
+		return -EINVAL;
+
+	lock_sock(sk);
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk))
+		goto unlock;
+	WRITE_ONCE(inet_csk(ssk)->icsk_syn_retries, val);
+unlock:
+	release_sock(sk);
+	return 0;
+}
+EXPORT_SYMBOL(mptcp_sock_set_syncnt);
+
 bool mptcp_finish_join(struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
-- 
2.51.0
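
The validation in patch 5 can be modelled as follows. This is a sketch with
hypothetical types, not the kernel implementation. It keeps the same range
check (MAX_TCP_SYNCNT is 127 in include/net/tcp.h) but, unlike the posted
helper, reports a failed subflow lookup to the caller instead of returning
0. Whether the real helper should do that is a question for review. The
-ENOENT code is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_SYNCNT 127	/* same value as MAX_TCP_SYNCNT */

/* Hypothetical stand-ins for the first subflow and the MPTCP socket. */
struct sketch_subflow {
	int syn_retries;
};

struct sketch_msk {
	struct sketch_subflow *first;	/* NULL models a failed lookup */
};

/* Validate the retry count, then store it on the first subflow. */
int sketch_set_syncnt(struct sketch_msk *msk, int val)
{
	if (val < 1 || val > SKETCH_MAX_SYNCNT)
		return -EINVAL;
	if (!msk->first)
		return -ENOENT;	/* the posted helper returns 0 here */
	msk->first->syn_retries = val;
	return 0;
}

/* Small driver: build a socket with or without a subflow and set val. */
int sketch_try_syncnt(int val, int have_subflow)
{
	struct sketch_subflow ssk = { .syn_retries = 0 };
	struct sketch_msk msk = { .first = have_subflow ? &ssk : NULL };
	int ret = sketch_set_syncnt(&msk, val);

	if (ret == 0)
		assert(ssk.syn_retries == val);
	return ret;
}
```

A caller that ignores the return value, as the host driver does for
tcp_sock_set_syncnt(), behaves the same with either variant.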



* [RFC mptcp-next v2 6/7] nvme-tcp: add mptcp support
  2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
                   ` (4 preceding siblings ...)
  2025-11-26 10:40 ` [RFC mptcp-next v2 5/7] mptcp: add sock_set_syncnt Geliang Tang
@ 2025-11-26 10:40 ` Geliang Tang
  2025-11-26 10:40 ` [RFC mptcp-next v2 7/7] selftests: mptcp: add NVMe-over-MPTCP test Geliang Tang
  2025-11-26 12:29 ` [RFC mptcp-next v2 0/7] NVME over MPTCP MPTCP CI
  7 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:40 UTC
  To: mptcp, hare, hare; +Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi

From: Geliang Tang <tanggeliang@kylinos.cn>

Define a new nvmf_transport_ops named nvme_mptcp_transport, which is almost
the same as nvme_tcp_transport except for .name.

In nvme_tcp_alloc_queue(), check whether opts->transport is "mptcp" to
decide whether to pass IPPROTO_MPTCP to sock_create_kern() and create an
MPTCP socket instead of a TCP one.

v2:
 - use 'trtype' instead of '--mptcp' (Hannes)

v3:
 - check mptcp protocol from opts->transport instead of passing a
parameter (Hannes).

v4:
 - check CONFIG_MPTCP.

Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 drivers/nvme/host/tcp.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 6795b8286c35..383e56ccc539 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -19,6 +19,7 @@
 #include <linux/blk-mq.h>
 #include <net/busy_poll.h>
 #include <trace/events/sock.h>
+#include <net/mptcp.h>
 
 #include "nvme.h"
 #include "fabrics.h"
@@ -1766,6 +1767,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 {
 	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
 	struct nvme_tcp_queue *queue = &ctrl->queues[qid];
+	int proto = IPPROTO_TCP;
 	int ret, rcv_pdu_size;
 	struct file *sock_file;
 
@@ -1782,9 +1784,14 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 		queue->cmnd_capsule_len = sizeof(struct nvme_command) +
 						NVME_TCP_ADMIN_CCSZ;
 
+#ifdef CONFIG_MPTCP
+	if (!strcmp(ctrl->ctrl.opts->transport, "mptcp"))
+		proto = IPPROTO_MPTCP;
+#endif
+
 	ret = sock_create_kern(current->nsproxy->net_ns,
 			ctrl->addr.ss_family, SOCK_STREAM,
-			IPPROTO_TCP, &queue->sock);
+			proto, &queue->sock);
 	if (ret) {
 		dev_err(nctrl->device,
 			"failed to create socket: %d\n", ret);
@@ -1801,9 +1808,13 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 	nvme_tcp_reclassify_socket(queue->sock);
 
 	/* Single syn retry */
+	proto == IPPROTO_MPTCP ?
+	mptcp_sock_set_syncnt(queue->sock->sk, 1) :
 	tcp_sock_set_syncnt(queue->sock->sk, 1);
 
 	/* Set TCP no delay */
+	proto == IPPROTO_MPTCP ?
+	mptcp_sock_set_nodelay(queue->sock->sk) :
 	tcp_sock_set_nodelay(queue->sock->sk);
 
 	/*
@@ -3022,6 +3033,19 @@ static struct nvmf_transport_ops nvme_tcp_transport = {
 	.create_ctrl	= nvme_tcp_create_ctrl,
 };
 
+static struct nvmf_transport_ops nvme_mptcp_transport = {
+	.name		= "mptcp",
+	.module		= THIS_MODULE,
+	.required_opts	= NVMF_OPT_TRADDR,
+	.allowed_opts	= NVMF_OPT_TRSVCID | NVMF_OPT_RECONNECT_DELAY |
+			  NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
+			  NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
+			  NVMF_OPT_NR_WRITE_QUEUES | NVMF_OPT_NR_POLL_QUEUES |
+			  NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE | NVMF_OPT_TLS |
+			  NVMF_OPT_KEYRING | NVMF_OPT_TLS_KEY | NVMF_OPT_CONCAT,
+	.create_ctrl	= nvme_tcp_create_ctrl,
+};
+
 static int __init nvme_tcp_init_module(void)
 {
 	unsigned int wq_flags = WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_SYSFS;
@@ -3047,6 +3071,7 @@ static int __init nvme_tcp_init_module(void)
 		atomic_set(&nvme_tcp_cpu_queues[cpu], 0);
 
 	nvmf_register_transport(&nvme_tcp_transport);
+	nvmf_register_transport(&nvme_mptcp_transport);
 	return 0;
 }
 
@@ -3054,6 +3079,7 @@ static void __exit nvme_tcp_cleanup_module(void)
 {
 	struct nvme_tcp_ctrl *ctrl;
 
+	nvmf_unregister_transport(&nvme_mptcp_transport);
 	nvmf_unregister_transport(&nvme_tcp_transport);
 
 	mutex_lock(&nvme_tcp_ctrl_mutex);
-- 
2.51.0
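
The host-side flow in patch 6 (choose the protocol from the transport name,
then configure the socket with the matching helper family) can be sketched
in userspace as below. Everything prefixed sketch_ is hypothetical. The
struct only records which settings were applied and by which helper family.
No real socket is created.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262	/* value from the Linux UAPI headers */
#endif

/* Hypothetical socket model: records how it was configured. */
struct sketch_sock {
	int protocol;
	int syn_retries;
	int nodelay;
	int used_mptcp_helpers;
};

/* Map the transport name given at connect time to a socket protocol. */
int sketch_proto_for_transport(const char *transport)
{
	if (transport && strcmp(transport, "mptcp") == 0)
		return IPPROTO_MPTCP;
	return IPPROTO_TCP;
}

/* Models tcp_sock_set_syncnt(sk, 1) followed by tcp_sock_set_nodelay(sk). */
void sketch_tcp_setup(struct sketch_sock *sk)
{
	sk->syn_retries = 1;
	sk->nodelay = 1;
}

/* Models the MPTCP helpers, which apply the same settings to a subflow. */
void sketch_mptcp_setup(struct sketch_sock *sk)
{
	sk->syn_retries = 1;
	sk->nodelay = 1;
	sk->used_mptcp_helpers = 1;
}

/* Pick the protocol once, then branch once for all per-socket settings. */
struct sketch_sock sketch_alloc_queue_sock(const char *transport)
{
	struct sketch_sock sk = {
		.protocol = sketch_proto_for_transport(transport),
	};

	if (sk.protocol == IPPROTO_MPTCP)
		sketch_mptcp_setup(&sk);
	else
		sketch_tcp_setup(&sk);
	return sk;
}
```

Grouping the per-protocol calls behind one branch is only one way to
structure this. The diff instead branches at each call site.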



* [RFC mptcp-next v2 7/7] selftests: mptcp: add NVMe-over-MPTCP test
  2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
                   ` (5 preceding siblings ...)
  2025-11-26 10:40 ` [RFC mptcp-next v2 6/7] nvme-tcp: add mptcp support Geliang Tang
@ 2025-11-26 10:40 ` Geliang Tang
  2025-11-26 12:29 ` [RFC mptcp-next v2 0/7] NVME over MPTCP MPTCP CI
  7 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:40 UTC
  To: mptcp, hare, hare; +Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi

From: Geliang Tang <tanggeliang@kylinos.cn>

Add a selftest for NVMe over MPTCP. It exercises the nvme discover,
connect, list, and disconnect commands, and runs fio random-read and
random-write workloads against the connected device.

Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 tools/testing/selftests/net/mptcp/config      |   7 +
 .../testing/selftests/net/mptcp/mptcp_nvme.sh | 120 ++++++++++++++++++
 2 files changed, 127 insertions(+)
 create mode 100755 tools/testing/selftests/net/mptcp/mptcp_nvme.sh

diff --git a/tools/testing/selftests/net/mptcp/config b/tools/testing/selftests/net/mptcp/config
index 59051ee2a986..0eee348eff8b 100644
--- a/tools/testing/selftests/net/mptcp/config
+++ b/tools/testing/selftests/net/mptcp/config
@@ -34,3 +34,10 @@ CONFIG_NFT_SOCKET=m
 CONFIG_NFT_TPROXY=m
 CONFIG_SYN_COOKIES=y
 CONFIG_VETH=y
+CONFIG_CONFIGFS_FS=y
+CONFIG_NVME_CORE=y
+CONFIG_NVME_FABRICS=y
+CONFIG_NVME_TCP=y
+CONFIG_NVME_TARGET=y
+CONFIG_NVME_TARGET_TCP=y
+CONFIG_NVME_MULTIPATH=y
diff --git a/tools/testing/selftests/net/mptcp/mptcp_nvme.sh b/tools/testing/selftests/net/mptcp/mptcp_nvme.sh
new file mode 100755
index 000000000000..1c24198003e2
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_nvme.sh
@@ -0,0 +1,120 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+trtype="${1:-mptcp}"
+traddr="${2:-127.0.0.1}"
+ns=1
+port=1234
+trsvcid=4420
+nqn=nqn.2014-08.org.nvmexpress.${trtype}dev
+final_ret=0
+
+#ip mptcp monitor &
+
+cleanup()
+{
+	rm -rf /sys/kernel/config/nvmet/ports/${port}/subsystems/${trtype}subsys
+	rmdir /sys/kernel/config/nvmet/ports/${port}
+	echo 0 > /sys/kernel/config/nvmet/subsystems/${nqn}/namespaces/${ns}/enable
+	echo -n 0 > /sys/kernel/config/nvmet/subsystems/${nqn}/namespaces/${ns}/device_path
+	rmdir /sys/kernel/config/nvmet/subsystems/${nqn}/namespaces/${ns}
+	rmdir /sys/kernel/config/nvmet/subsystems/${nqn}
+	losetup -d /dev/loop100
+	rm -rf /tmp/test.raw
+}
+
+check_error()
+{
+	if dmesg | grep -E -q "starting error recovery|Buffer I/O error"; then
+		cleanup
+		exit 1
+	fi
+}
+
+dd if=/dev/zero of=/tmp/test.raw bs=1M count=0 seek=512
+losetup /dev/loop100 /tmp/test.raw
+cd /sys/kernel/config/nvmet/subsystems
+mkdir ${nqn}
+cd ${nqn}
+echo 1 > attr_allow_any_host
+cd namespaces
+mkdir ${ns}
+cd ${ns}
+echo /dev/loop100 > device_path
+echo 1 > enable
+cd /sys/kernel/config/nvmet/ports
+mkdir ${port}
+cd ${port}
+echo ${trtype} > addr_trtype
+echo ipv4 > addr_adrfam
+echo ${traddr} > addr_traddr
+echo ${trsvcid} > addr_trsvcid
+cd subsystems
+ln -s ../../../subsystems/${nqn} ${trtype}subsys
+
+echo
+echo "nvme discover"
+echo
+nvme discover -t ${trtype} -a ${traddr} -s ${trsvcid}
+
+echo
+echo "nvme connect"
+echo
+devname=$(set -o pipefail; nvme connect -t ${trtype} -a ${traddr} -s ${trsvcid} -n ${nqn} | awk '{print $4}')
+lret=$?
+if [ $lret -ne 0 ]; then
+	final_ret=${lret}
+fi
+check_error
+
+sleep 0.5
+echo
+echo "nvme list"
+echo
+nvme list
+lret=$?
+if [ $lret -ne 0 ]; then
+	final_ret=${lret}
+fi
+check_error
+
+echo
+echo "fio randread"
+echo
+fio --name=global --direct=1 --norandommap --randrepeat=0 --ioengine=libaio \
+    --thread=1 --blocksize=4k --runtime=10 --time_based --rw=randread --numjobs=4 \
+    --iodepth=256 --group_reporting --size=100% --name=libaio_4_256_4k_randread \
+    --filename=/dev/${devname}n1
+lret=$?
+if [ $lret -ne 0 ]; then
+	final_ret=${lret}
+fi
+check_error
+
+echo
+echo "fio randwrite"
+echo
+fio --name=global --direct=1 --norandommap --randrepeat=0 --ioengine=libaio \
+    --thread=1 --blocksize=4k --runtime=10 --time_based --rw=randwrite --numjobs=4 \
+    --iodepth=256 --group_reporting --size=100% --name=libaio_4_256_4k_randwrite \
+    --filename=/dev/${devname}n1
+lret=$?
+if [ $lret -ne 0 ]; then
+	final_ret=${lret}
+fi
+check_error
+
+sleep 0.5
+echo
+echo "nvme disconnect"
+echo
+nvme disconnect -n ${nqn}
+lret=$?
+if [ $lret -ne 0 ]; then
+	final_ret=${lret}
+fi
+check_error
+
+cleanup
+echo "final_ret=${final_ret}"
+exit ${final_ret}
-- 
2.51.0



* Re: [RFC mptcp-next v2 0/7] NVME over MPTCP
  2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
                   ` (6 preceding siblings ...)
  2025-11-26 10:40 ` [RFC mptcp-next v2 7/7] selftests: mptcp: add NVMe-over-MPTCP test Geliang Tang
@ 2025-11-26 12:29 ` MPTCP CI
  7 siblings, 0 replies; 9+ messages in thread
From: MPTCP CI @ 2025-11-26 12:29 UTC
  To: Geliang Tang; +Cc: mptcp

Hi Geliang,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 1 failed test(s): packetdrill_sockopts 🔴
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19701453448

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/018aa03d0c20
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1027779


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
