* [RFC mptcp-next v2 0/7] NVME over MPTCP
@ 2025-11-26 10:39 Geliang Tang
2025-11-26 10:39 ` [RFC mptcp-next v2 1/7] mptcp: allow overridden write_space to be invoked Geliang Tang
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:39 UTC (permalink / raw)
To: mptcp, hare, hare; +Cc: Geliang Tang
From: Geliang Tang <tanggeliang@kylinos.cn>
v2:
- Patch 1 fixes the timeout issue reported in v1, thanks to Paolo and Gang
Yan for their help.
- Patch 5 implements an MPTCP-specific sock_set_syncnt helper.
This series (previously named "MPTCP support to 'NVME over TCP'") went through
three RFC versions sent to Hannes in May, each revised based on his input.
Following that, I started upstreaming the dependent "implement mptcp
read_sock" series to the main MPTCP repository, which has now largely taken
shape.
Depends on: implement mptcp read_sock, v14
Based-on: <cover.1763974740.git.tanggeliang@kylinos.cn>
Geliang Tang (7):
mptcp: allow overridden write_space to be invoked
mptcp: add sock_set_reuseaddr
mptcp: add sock_set_nodelay
nvmet-tcp: add mptcp support
mptcp: add sock_set_syncnt
nvme-tcp: add mptcp support
selftests: mptcp: add NVMe-over-MPTCP test
drivers/nvme/host/tcp.c | 28 +++-
drivers/nvme/target/configfs.c | 1 +
drivers/nvme/target/tcp.c | 38 +++++-
include/linux/nvme.h | 1 +
include/net/mptcp.h | 15 +++
net/mptcp/protocol.c | 52 ++++++++
net/mptcp/protocol.h | 2 +-
tools/testing/selftests/net/mptcp/config | 7 +
.../testing/selftests/net/mptcp/mptcp_nvme.sh | 120 ++++++++++++++++++
9 files changed, 260 insertions(+), 4 deletions(-)
create mode 100755 tools/testing/selftests/net/mptcp/mptcp_nvme.sh
--
2.51.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC mptcp-next v2 1/7] mptcp: allow overridden write_space to be invoked
2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
@ 2025-11-26 10:39 ` Geliang Tang
2025-11-26 10:39 ` [RFC mptcp-next v2 2/7] mptcp: add sock_set_reuseaddr Geliang Tang
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:39 UTC (permalink / raw)
To: mptcp, hare, hare
Cc: Geliang Tang, Paolo Abeni, Hui Zhu, Gang Yan, zhenwei pi
From: Geliang Tang <tanggeliang@kylinos.cn>
NVMe installs its own sk_write_space callback on its sockets. This patch
ensures that such an overridden sk_write_space callback is invoked by MPTCP
rather than being bypassed by a direct call to sk_stream_write_space().
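As a reference for what the one-line change does, here is a minimal userspace mock (not kernel code; the macro only mimics the behavior of include/linux/indirect_call_wrapper.h) showing that a replaced sk_write_space callback is now reached:

```c
#include <assert.h>

/* Userspace mock of the kernel's INDIRECT_CALL_1(): if the function
 * pointer still equals the expected callee, call it directly (cheap,
 * retpoline-friendly); otherwise go through the pointer, so an
 * overridden callback such as NVMe's is reached. */
#define INDIRECT_CALL_1(f, f1, ...) \
	((f) == (f1) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))

static int default_calls, nvme_calls;

static void sk_stream_write_space_mock(void *sk) { (void)sk; default_calls++; }
static void nvme_write_space_mock(void *sk)      { (void)sk; nvme_calls++; }

/* Stand-in for the patched tail of mptcp_write_space(). */
static void write_space(void (*sk_write_space)(void *), void *sk)
{
	INDIRECT_CALL_1(sk_write_space, sk_stream_write_space_mock, sk);
}
```

Before the patch the default callee was called unconditionally, so the first branch of the mock was the only reachable one.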
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
net/mptcp/protocol.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 199f28f3dd5e..483143e2d0b5 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -973,7 +973,7 @@ static inline void mptcp_write_space(struct sock *sk)
/* pairs with memory barrier in mptcp_poll */
smp_mb();
if (mptcp_stream_memory_free(sk, 1))
- sk_stream_write_space(sk);
+ INDIRECT_CALL_1(sk->sk_write_space, sk_stream_write_space, sk);
}
static inline void __mptcp_sync_sndbuf(struct sock *sk)
--
2.51.0
* [RFC mptcp-next v2 2/7] mptcp: add sock_set_reuseaddr
2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
2025-11-26 10:39 ` [RFC mptcp-next v2 1/7] mptcp: allow overridden write_space to be invoked Geliang Tang
@ 2025-11-26 10:39 ` Geliang Tang
2025-11-26 10:40 ` [RFC mptcp-next v2 3/7] mptcp: add sock_set_nodelay Geliang Tang
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:39 UTC (permalink / raw)
To: mptcp, hare, hare; +Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi
From: Geliang Tang <tanggeliang@kylinos.cn>
Implement an MPTCP-specific sock_set_reuseaddr helper, which will be used for
NVMe over MPTCP. Calling the TCP-specific socket helpers (such as
tcp_sock_set_nodelay()) directly on an MPTCP socket causes list corruption:
nvmet: adding nsid 1 to subsystem nqn.2014-08.org.nvmexpress.mptcpdev
nvmet_tcp: enabling port 1234 (127.0.0.1:4420)
slab MPTCP start ffff8880108f0b80 pointer offset 2480 size 2816
list_add corruption. prev->next should be next (ffff8880108f1530), but
was ffff8885108f1530. (prev=ffff8880108f1530).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:32!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 1 UID: 0 PID: 182 Comm: nvme Not tainted 6.16.0-rc3+ #1 PREEMPT(full)
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
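For illustration, the shape of the new helper can be mocked in userspace (simplified structs, locking and the void return type elided — these are assumptions, not kernel API): the option is applied to the first subflow obtained via __mptcp_nmpc_sk(), never by treating the MPTCP socket as a tcp_sock:

```c
#include <assert.h>
#include <errno.h>

enum { SK_NO_REUSE, SK_CAN_REUSE };

/* Simplified stand-ins for struct sock and struct mptcp_sock. */
struct mock_sock { int sk_reuse; };
struct mock_msk  { struct mock_sock *first; };

/* Mirrors mptcp_sock_set_reuseaddr(): lock_sock()/release_sock() are
 * elided; the reuse flag lands on the first subflow socket. */
static int mock_set_reuseaddr(struct mock_msk *msk)
{
	struct mock_sock *ssk = msk->first;	/* __mptcp_nmpc_sk() */

	if (!ssk)
		return -EINVAL;			/* IS_ERR(ssk) path */
	ssk->sk_reuse = SK_CAN_REUSE;
	return 0;
}
```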
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
include/net/mptcp.h | 4 ++++
net/mptcp/protocol.c | 15 +++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 4cf59e83c1c5..c7bf444eee56 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -237,6 +237,8 @@ static inline __be32 mptcp_reset_option(const struct sk_buff *skb)
}
void mptcp_active_detect_blackhole(struct sock *sk, bool expired);
+
+void mptcp_sock_set_reuseaddr(struct sock *sk);
#else
static inline void mptcp_init(void)
@@ -323,6 +325,8 @@ static inline struct request_sock *mptcp_subflow_reqsk_alloc(const struct reques
static inline __be32 mptcp_reset_option(const struct sk_buff *skb) { return htonl(0u); }
static inline void mptcp_active_detect_blackhole(struct sock *sk, bool expired) { }
+
+static inline void mptcp_sock_set_reuseaddr(struct sock *sk) { }
#endif /* CONFIG_MPTCP */
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f9a544e31637..21067245a8f5 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3764,6 +3764,21 @@ static void mptcp_sock_check_graft(struct sock *sk, struct sock *ssk)
}
}
+void mptcp_sock_set_reuseaddr(struct sock *sk)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct sock *ssk;
+
+ lock_sock(sk);
+ ssk = __mptcp_nmpc_sk(msk);
+ if (IS_ERR(ssk))
+ goto unlock;
+ ssk->sk_reuse = SK_CAN_REUSE;
+unlock:
+ release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_reuseaddr);
+
bool mptcp_finish_join(struct sock *ssk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
--
2.51.0
* [RFC mptcp-next v2 3/7] mptcp: add sock_set_nodelay
2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
2025-11-26 10:39 ` [RFC mptcp-next v2 1/7] mptcp: allow overridden write_space to be invoked Geliang Tang
2025-11-26 10:39 ` [RFC mptcp-next v2 2/7] mptcp: add sock_set_reuseaddr Geliang Tang
@ 2025-11-26 10:40 ` Geliang Tang
2025-11-26 10:40 ` [RFC mptcp-next v2 4/7] nvmet-tcp: add mptcp support Geliang Tang
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:40 UTC (permalink / raw)
To: mptcp, hare, hare; +Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi
From: Geliang Tang <tanggeliang@kylinos.cn>
Implement an MPTCP-specific sock_set_nodelay helper, which will be used for
NVMe over MPTCP.
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
include/net/mptcp.h | 4 ++++
net/mptcp/protocol.c | 18 ++++++++++++++++++
2 files changed, 22 insertions(+)
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index c7bf444eee56..4fc8f08c6588 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -239,6 +239,8 @@ static inline __be32 mptcp_reset_option(const struct sk_buff *skb)
void mptcp_active_detect_blackhole(struct sock *sk, bool expired);
void mptcp_sock_set_reuseaddr(struct sock *sk);
+
+void mptcp_sock_set_nodelay(struct sock *sk);
#else
static inline void mptcp_init(void)
@@ -327,6 +329,8 @@ static inline __be32 mptcp_reset_option(const struct sk_buff *skb) { return hto
static inline void mptcp_active_detect_blackhole(struct sock *sk, bool expired) { }
static inline void mptcp_sock_set_reuseaddr(struct sock *sk) { }
+
+static inline void mptcp_sock_set_nodelay(struct sock *sk) { }
#endif /* CONFIG_MPTCP */
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 21067245a8f5..9790bc74a0b7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3779,6 +3779,24 @@ void mptcp_sock_set_reuseaddr(struct sock *sk)
}
EXPORT_SYMBOL(mptcp_sock_set_reuseaddr);
+void mptcp_sock_set_nodelay(struct sock *sk)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct sock *ssk;
+
+ lock_sock(sk);
+ ssk = __mptcp_nmpc_sk(msk);
+ if (IS_ERR(ssk))
+ goto unlock;
+
+ lock_sock(ssk);
+ __tcp_sock_set_nodelay(ssk, true);
+ release_sock(ssk);
+unlock:
+ release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_nodelay);
+
bool mptcp_finish_join(struct sock *ssk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
--
2.51.0
* [RFC mptcp-next v2 4/7] nvmet-tcp: add mptcp support
2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
` (2 preceding siblings ...)
2025-11-26 10:40 ` [RFC mptcp-next v2 3/7] mptcp: add sock_set_nodelay Geliang Tang
@ 2025-11-26 10:40 ` Geliang Tang
2025-11-26 10:40 ` [RFC mptcp-next v2 5/7] mptcp: add sock_set_syncnt Geliang Tang
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:40 UTC (permalink / raw)
To: mptcp, hare, hare; +Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi
From: Geliang Tang <tanggeliang@kylinos.cn>
This patch adds a new nvme target transport type, NVMF_TRTYPE_MPTCP, and
defines a new nvmet_fabrics_ops instance named nvmet_mptcp_ops, which is
identical to nvmet_tcp_ops except for .type.
nvmet_tcp_add_port() checks whether disc_addr.trtype is NVMF_TRTYPE_MPTCP to
decide whether to pass IPPROTO_MPTCP to sock_create() and create an MPTCP
socket instead of a TCP one.
nvmet_tcp_done_recv_pdu() then picks between the two nvmet_fabrics_ops
instances according to the protocol of the socket in use.
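The protocol selection added here boils down to a trtype check; a userspace sketch of just that decision (the numeric IPPROTO_MPTCP fallback define and the local TRTYPE_* enum are illustrative assumptions mirroring NVMF_TRTYPE_TCP/NVMF_TRTYPE_MPTCP):

```c
#include <assert.h>
#include <netinet/in.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262	/* value from the Linux uapi headers */
#endif

enum { TRTYPE_TCP = 3, TRTYPE_MPTCP = 4 };	/* mirrors NVMF_TRTYPE_* */

/* Mirrors the nvmet_tcp_add_port() hunk: choose the protocol passed to
 * sock_create() based on the configured transport type. */
static int proto_for_trtype(int trtype)
{
	return trtype == TRTYPE_MPTCP ? IPPROTO_MPTCP : IPPROTO_TCP;
}
```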
v2:
- use trtype instead of tsas (Hannes).
v3:
- check mptcp protocol from disc_addr.trtype instead of passing a
parameter (Hannes).
v4:
- check CONFIG_MPTCP.
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
drivers/nvme/target/configfs.c | 1 +
drivers/nvme/target/tcp.c | 38 ++++++++++++++++++++++++++++++++--
include/linux/nvme.h | 1 +
3 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index e44ef69dffc2..14c642cd458e 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -37,6 +37,7 @@ static struct nvmet_type_name_map nvmet_transport[] = {
{ NVMF_TRTYPE_RDMA, "rdma" },
{ NVMF_TRTYPE_FC, "fc" },
{ NVMF_TRTYPE_TCP, "tcp" },
+ { NVMF_TRTYPE_MPTCP, "mptcp" },
{ NVMF_TRTYPE_PCI, "pci" },
{ NVMF_TRTYPE_LOOP, "loop" },
};
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index d543da09ef8e..a0a165d7f9cd 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -16,6 +16,7 @@
#include <net/tls.h>
#include <net/tls_prot.h>
#include <net/handshake.h>
+#include <net/mptcp.h>
#include <linux/inet.h>
#include <linux/llist.h>
#include <trace/events/sock.h>
@@ -212,6 +213,7 @@ static DEFINE_MUTEX(nvmet_tcp_queue_mutex);
static struct workqueue_struct *nvmet_tcp_wq;
static const struct nvmet_fabrics_ops nvmet_tcp_ops;
+static const struct nvmet_fabrics_ops nvmet_mptcp_ops;
static void nvmet_tcp_free_cmd(struct nvmet_tcp_cmd *c);
static void nvmet_tcp_free_cmd_buffers(struct nvmet_tcp_cmd *cmd);
@@ -1039,7 +1041,9 @@ static int nvmet_tcp_done_recv_pdu(struct nvmet_tcp_queue *queue)
req = &queue->cmd->req;
memcpy(req->cmd, nvme_cmd, sizeof(*nvme_cmd));
- if (unlikely(!nvmet_req_init(req, &queue->nvme_sq, &nvmet_tcp_ops))) {
+ if (unlikely(!nvmet_req_init(req, &queue->nvme_sq,
+ queue->sock->sk->sk_protocol == IPPROTO_MPTCP ?
+ &nvmet_mptcp_ops : &nvmet_tcp_ops))) {
pr_err("failed cmd %p id %d opcode %d, data_len: %d, status: %04x\n",
req->cmd, req->cmd->common.command_id,
req->cmd->common.opcode,
@@ -2007,6 +2011,7 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
{
struct nvmet_tcp_port *port;
__kernel_sa_family_t af;
+ int proto = IPPROTO_TCP;
int ret;
port = kzalloc(sizeof(*port), GFP_KERNEL);
@@ -2027,6 +2032,11 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
goto err_port;
}
+#ifdef CONFIG_MPTCP
+ if (nport->disc_addr.trtype == NVMF_TRTYPE_MPTCP)
+ proto = IPPROTO_MPTCP;
+#endif
+
ret = inet_pton_with_scope(&init_net, af, nport->disc_addr.traddr,
nport->disc_addr.trsvcid, &port->addr);
if (ret) {
@@ -2041,7 +2051,7 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
ret = sock_create(port->addr.ss_family, SOCK_STREAM,
- IPPROTO_TCP, &port->sock);
+ proto, &port->sock);
if (ret) {
pr_err("failed to create a socket\n");
goto err_port;
@@ -2050,7 +2060,11 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
port->sock->sk->sk_user_data = port;
port->data_ready = port->sock->sk->sk_data_ready;
port->sock->sk->sk_data_ready = nvmet_tcp_listen_data_ready;
+ proto == IPPROTO_MPTCP ?
+ mptcp_sock_set_reuseaddr(port->sock->sk) :
sock_set_reuseaddr(port->sock->sk);
+ proto == IPPROTO_MPTCP ?
+ mptcp_sock_set_nodelay(port->sock->sk) :
tcp_sock_set_nodelay(port->sock->sk);
if (so_priority > 0)
sock_set_priority(port->sock->sk, so_priority);
@@ -2193,6 +2207,19 @@ static const struct nvmet_fabrics_ops nvmet_tcp_ops = {
.host_traddr = nvmet_tcp_host_port_addr,
};
+static const struct nvmet_fabrics_ops nvmet_mptcp_ops = {
+ .owner = THIS_MODULE,
+ .type = NVMF_TRTYPE_MPTCP,
+ .msdbd = 1,
+ .add_port = nvmet_tcp_add_port,
+ .remove_port = nvmet_tcp_remove_port,
+ .queue_response = nvmet_tcp_queue_response,
+ .delete_ctrl = nvmet_tcp_delete_ctrl,
+ .install_queue = nvmet_tcp_install_queue,
+ .disc_traddr = nvmet_tcp_disc_port_addr,
+ .host_traddr = nvmet_tcp_host_port_addr,
+};
+
static int __init nvmet_tcp_init(void)
{
int ret;
@@ -2206,6 +2233,12 @@ static int __init nvmet_tcp_init(void)
if (ret)
goto err;
+ ret = nvmet_register_transport(&nvmet_mptcp_ops);
+ if (ret) {
+ nvmet_unregister_transport(&nvmet_tcp_ops);
+ goto err;
+ }
+
return 0;
err:
destroy_workqueue(nvmet_tcp_wq);
@@ -2216,6 +2249,7 @@ static void __exit nvmet_tcp_exit(void)
{
struct nvmet_tcp_queue *queue;
+ nvmet_unregister_transport(&nvmet_mptcp_ops);
nvmet_unregister_transport(&nvmet_tcp_ops);
flush_workqueue(nvmet_wq);
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 655d194f8e72..8069667ad47e 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -68,6 +68,7 @@ enum {
NVMF_TRTYPE_RDMA = 1, /* RDMA */
NVMF_TRTYPE_FC = 2, /* Fibre Channel */
NVMF_TRTYPE_TCP = 3, /* TCP/IP */
+ NVMF_TRTYPE_MPTCP = 4, /* Multipath TCP */
NVMF_TRTYPE_LOOP = 254, /* Reserved for host usage */
NVMF_TRTYPE_MAX,
};
--
2.51.0
* [RFC mptcp-next v2 5/7] mptcp: add sock_set_syncnt
2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
` (3 preceding siblings ...)
2025-11-26 10:40 ` [RFC mptcp-next v2 4/7] nvmet-tcp: add mptcp support Geliang Tang
@ 2025-11-26 10:40 ` Geliang Tang
2025-11-26 10:40 ` [RFC mptcp-next v2 6/7] nvme-tcp: add mptcp support Geliang Tang
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:40 UTC (permalink / raw)
To: mptcp, hare, hare; +Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi
From: Geliang Tang <tanggeliang@kylinos.cn>
Implement an MPTCP-specific sock_set_syncnt helper, which will be used for
NVMe over MPTCP.
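The new helper validates the count the same way the TCP helper does before writing icsk_syn_retries on the first subflow; a userspace sketch of just that check (MAX_TCP_SYNCNT is 127 in include/net/tcp.h):

```c
#include <assert.h>
#include <errno.h>

#define MAX_TCP_SYNCNT	127	/* include/net/tcp.h */

/* Mirrors the range check in mptcp_sock_set_syncnt() before the value
 * is stored in the first subflow's icsk_syn_retries. */
static int set_syncnt(int *syn_retries, int val)
{
	if (val < 1 || val > MAX_TCP_SYNCNT)
		return -EINVAL;
	*syn_retries = val;
	return 0;
}
```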
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
include/net/mptcp.h | 7 +++++++
net/mptcp/protocol.c | 19 +++++++++++++++++++
2 files changed, 26 insertions(+)
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 4fc8f08c6588..7d981973df48 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -241,6 +241,8 @@ void mptcp_active_detect_blackhole(struct sock *sk, bool expired);
void mptcp_sock_set_reuseaddr(struct sock *sk);
void mptcp_sock_set_nodelay(struct sock *sk);
+
+int mptcp_sock_set_syncnt(struct sock *sk, int val);
#else
static inline void mptcp_init(void)
@@ -331,6 +333,11 @@ static inline void mptcp_active_detect_blackhole(struct sock *sk, bool expired)
static inline void mptcp_sock_set_reuseaddr(struct sock *sk) { }
static inline void mptcp_sock_set_nodelay(struct sock *sk) { }
+
+static inline int mptcp_sock_set_syncnt(struct sock *sk, int val)
+{
+ return 0;
+}
#endif /* CONFIG_MPTCP */
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 9790bc74a0b7..543fa3131af7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3797,6 +3797,25 @@ void mptcp_sock_set_nodelay(struct sock *sk)
}
EXPORT_SYMBOL(mptcp_sock_set_nodelay);
+int mptcp_sock_set_syncnt(struct sock *sk, int val)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct sock *ssk;
+
+ if (val < 1 || val > MAX_TCP_SYNCNT)
+ return -EINVAL;
+
+ lock_sock(sk);
+ ssk = __mptcp_nmpc_sk(msk);
+ if (IS_ERR(ssk))
+ goto unlock;
+ WRITE_ONCE(inet_csk(ssk)->icsk_syn_retries, val);
+unlock:
+ release_sock(sk);
+ return 0;
+}
+EXPORT_SYMBOL(mptcp_sock_set_syncnt);
+
bool mptcp_finish_join(struct sock *ssk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
--
2.51.0
* [RFC mptcp-next v2 6/7] nvme-tcp: add mptcp support
2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
` (4 preceding siblings ...)
2025-11-26 10:40 ` [RFC mptcp-next v2 5/7] mptcp: add sock_set_syncnt Geliang Tang
@ 2025-11-26 10:40 ` Geliang Tang
2025-11-26 10:40 ` [RFC mptcp-next v2 7/7] selftests: mptcp: add NVMe-over-MPTCP test Geliang Tang
2025-11-26 12:29 ` [RFC mptcp-next v2 0/7] NVME over MPTCP MPTCP CI
7 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:40 UTC (permalink / raw)
To: mptcp, hare, hare; +Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi
From: Geliang Tang <tanggeliang@kylinos.cn>
This patch defines a new nvmf_transport_ops named nvme_mptcp_transport, which
is identical to nvme_tcp_transport except for .name.
nvme_tcp_alloc_queue() checks whether opts->transport is "mptcp" to decide
whether to pass IPPROTO_MPTCP to sock_create_kern() and create an MPTCP
socket instead of a TCP one.
v2:
- use 'trtype' instead of '--mptcp' (Hannes)
v3:
- check mptcp protocol from opts->transport instead of passing a
parameter (Hannes).
v4:
- check CONFIG_MPTCP.
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
drivers/nvme/host/tcp.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 6795b8286c35..383e56ccc539 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -19,6 +19,7 @@
#include <linux/blk-mq.h>
#include <net/busy_poll.h>
#include <trace/events/sock.h>
+#include <net/mptcp.h>
#include "nvme.h"
#include "fabrics.h"
@@ -1766,6 +1767,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
{
struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
struct nvme_tcp_queue *queue = &ctrl->queues[qid];
+ int proto = IPPROTO_TCP;
int ret, rcv_pdu_size;
struct file *sock_file;
@@ -1782,9 +1784,14 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
queue->cmnd_capsule_len = sizeof(struct nvme_command) +
NVME_TCP_ADMIN_CCSZ;
+#ifdef CONFIG_MPTCP
+ if (!strcmp(ctrl->ctrl.opts->transport, "mptcp"))
+ proto = IPPROTO_MPTCP;
+#endif
+
ret = sock_create_kern(current->nsproxy->net_ns,
ctrl->addr.ss_family, SOCK_STREAM,
- IPPROTO_TCP, &queue->sock);
+ proto, &queue->sock);
if (ret) {
dev_err(nctrl->device,
"failed to create socket: %d\n", ret);
@@ -1801,9 +1808,13 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
nvme_tcp_reclassify_socket(queue->sock);
/* Single syn retry */
+ proto == IPPROTO_MPTCP ?
+ mptcp_sock_set_syncnt(queue->sock->sk, 1) :
tcp_sock_set_syncnt(queue->sock->sk, 1);
/* Set TCP no delay */
+ proto == IPPROTO_MPTCP ?
+ mptcp_sock_set_nodelay(queue->sock->sk) :
tcp_sock_set_nodelay(queue->sock->sk);
/*
@@ -3022,6 +3033,19 @@ static struct nvmf_transport_ops nvme_tcp_transport = {
.create_ctrl = nvme_tcp_create_ctrl,
};
+static struct nvmf_transport_ops nvme_mptcp_transport = {
+ .name = "mptcp",
+ .module = THIS_MODULE,
+ .required_opts = NVMF_OPT_TRADDR,
+ .allowed_opts = NVMF_OPT_TRSVCID | NVMF_OPT_RECONNECT_DELAY |
+ NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
+ NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
+ NVMF_OPT_NR_WRITE_QUEUES | NVMF_OPT_NR_POLL_QUEUES |
+ NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE | NVMF_OPT_TLS |
+ NVMF_OPT_KEYRING | NVMF_OPT_TLS_KEY | NVMF_OPT_CONCAT,
+ .create_ctrl = nvme_tcp_create_ctrl,
+};
+
static int __init nvme_tcp_init_module(void)
{
unsigned int wq_flags = WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_SYSFS;
@@ -3047,6 +3071,7 @@ static int __init nvme_tcp_init_module(void)
atomic_set(&nvme_tcp_cpu_queues[cpu], 0);
nvmf_register_transport(&nvme_tcp_transport);
+ nvmf_register_transport(&nvme_mptcp_transport);
return 0;
}
@@ -3054,6 +3079,7 @@ static void __exit nvme_tcp_cleanup_module(void)
{
struct nvme_tcp_ctrl *ctrl;
+ nvmf_unregister_transport(&nvme_mptcp_transport);
nvmf_unregister_transport(&nvme_tcp_transport);
mutex_lock(&nvme_tcp_ctrl_mutex);
--
2.51.0
* [RFC mptcp-next v2 7/7] selftests: mptcp: add NVMe-over-MPTCP test
2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
` (5 preceding siblings ...)
2025-11-26 10:40 ` [RFC mptcp-next v2 6/7] nvme-tcp: add mptcp support Geliang Tang
@ 2025-11-26 10:40 ` Geliang Tang
2025-11-26 12:29 ` [RFC mptcp-next v2 0/7] NVME over MPTCP MPTCP CI
7 siblings, 0 replies; 9+ messages in thread
From: Geliang Tang @ 2025-11-26 10:40 UTC (permalink / raw)
To: mptcp, hare, hare; +Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi
From: Geliang Tang <tanggeliang@kylinos.cn>
Add a test case for NVMe over MPTCP. It verifies that the nvme discover,
connect, list, and disconnect commands work as expected, and evaluates
read/write performance using fio.
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
tools/testing/selftests/net/mptcp/config | 7 +
.../testing/selftests/net/mptcp/mptcp_nvme.sh | 120 ++++++++++++++++++
2 files changed, 127 insertions(+)
create mode 100755 tools/testing/selftests/net/mptcp/mptcp_nvme.sh
diff --git a/tools/testing/selftests/net/mptcp/config b/tools/testing/selftests/net/mptcp/config
index 59051ee2a986..0eee348eff8b 100644
--- a/tools/testing/selftests/net/mptcp/config
+++ b/tools/testing/selftests/net/mptcp/config
@@ -34,3 +34,10 @@ CONFIG_NFT_SOCKET=m
CONFIG_NFT_TPROXY=m
CONFIG_SYN_COOKIES=y
CONFIG_VETH=y
+CONFIG_CONFIGFS_FS=y
+CONFIG_NVME_CORE=y
+CONFIG_NVME_FABRICS=y
+CONFIG_NVME_TCP=y
+CONFIG_NVME_TARGET=y
+CONFIG_NVME_TARGET_TCP=y
+CONFIG_NVME_MULTIPATH=y
diff --git a/tools/testing/selftests/net/mptcp/mptcp_nvme.sh b/tools/testing/selftests/net/mptcp/mptcp_nvme.sh
new file mode 100755
index 000000000000..1c24198003e2
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_nvme.sh
@@ -0,0 +1,120 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+trtype="${1:-mptcp}"
+traddr="${2:-127.0.0.1}"
+ns=1
+port=1234
+trsvcid=4420
+nqn=nqn.2014-08.org.nvmexpress.${trtype}dev
+final_ret=0
+
+#ip mptcp monitor &
+
+cleanup()
+{
+ rm -rf /sys/kernel/config/nvmet/ports/${port}/subsystems/${trtype}subsys
+ rmdir /sys/kernel/config/nvmet/ports/${port}
+ echo 0 > /sys/kernel/config/nvmet/subsystems/${nqn}/namespaces/${ns}/enable
+ echo -n 0 > /sys/kernel/config/nvmet/subsystems/${nqn}/namespaces/${ns}/device_path
+ rmdir /sys/kernel/config/nvmet/subsystems/${nqn}/namespaces/${ns}
+ rmdir /sys/kernel/config/nvmet/subsystems/${nqn}
+ losetup -d /dev/loop100
+ rm -rf /tmp/test.raw
+}
+
+check_error()
+{
+ if dmesg | grep -E -q "starting error recovery|Buffer I/O error"; then
+ cleanup
+ exit 1
+ fi
+}
+
+dd if=/dev/zero of=/tmp/test.raw bs=1M count=0 seek=512
+losetup /dev/loop100 /tmp/test.raw
+cd /sys/kernel/config/nvmet/subsystems
+mkdir ${nqn}
+cd ${nqn}
+echo 1 > attr_allow_any_host
+cd namespaces
+mkdir ${ns}
+cd ${ns}
+echo /dev/loop100 > device_path
+echo 1 > enable
+cd /sys/kernel/config/nvmet/ports
+mkdir ${port}
+cd ${port}
+echo ${trtype} > addr_trtype
+echo ipv4 > addr_adrfam
+echo ${traddr} > addr_traddr
+echo ${trsvcid} > addr_trsvcid
+cd subsystems
+ln -s ../../../subsystems/${nqn} ${trtype}subsys
+
+echo
+echo "nvme discover"
+echo
+nvme discover -t ${trtype} -a ${traddr} -s ${trsvcid}
+
+echo
+echo "nvme connect"
+echo
+devname=$(nvme connect -t ${trtype} -a ${traddr} -s ${trsvcid} -n ${nqn} | awk '{print $4}')
+lret=$?
+if [ $lret -ne 0 ]; then
+ final_ret=${lret}
+fi
+check_error
+
+sleep 0.5
+echo
+echo "nvme list"
+echo
+nvme list
+lret=$?
+if [ $lret -ne 0 ]; then
+ final_ret=${lret}
+fi
+check_error
+
+echo
+echo "fio randread"
+echo
+fio --name=global --direct=1 --norandommap --randrepeat=0 --ioengine=libaio \
+ --thread=1 --blocksize=4k --runtime=10 --time_based --rw=randread --numjobs=4 \
+ --iodepth=256 --group_reporting --size=100% --name=libaio_4_256_4k_randread \
+ --filename=/dev/${devname}n1
+lret=$?
+if [ $lret -ne 0 ]; then
+ final_ret=${lret}
+fi
+check_error
+
+echo
+echo "fio randwrite"
+echo
+fio --name=global --direct=1 --norandommap --randrepeat=0 --ioengine=libaio \
+ --thread=1 --blocksize=4k --runtime=10 --time_based --rw=randwrite --numjobs=4 \
+ --iodepth=256 --group_reporting --size=100% --name=libaio_4_256_4k_randwrite \
+ --filename=/dev/${devname}n1
+lret=$?
+if [ $lret -ne 0 ]; then
+ final_ret=${lret}
+fi
+check_error
+
+sleep 0.5
+echo
+echo "nvme disconnect"
+echo
+nvme disconnect -n ${nqn}
+lret=$?
+if [ $lret -ne 0 ]; then
+ final_ret=${lret}
+fi
+check_error
+
+cleanup
+echo "final_ret=${final_ret}"
+exit ${final_ret}
--
2.51.0
* Re: [RFC mptcp-next v2 0/7] NVME over MPTCP
2025-11-26 10:39 [RFC mptcp-next v2 0/7] NVME over MPTCP Geliang Tang
` (6 preceding siblings ...)
2025-11-26 10:40 ` [RFC mptcp-next v2 7/7] selftests: mptcp: add NVMe-over-MPTCP test Geliang Tang
@ 2025-11-26 12:29 ` MPTCP CI
7 siblings, 0 replies; 9+ messages in thread
From: MPTCP CI @ 2025-11-26 12:29 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp
Hi Geliang,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 1 failed test(s): packetdrill_sockopts 🔴
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19701453448
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/018aa03d0c20
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1027779
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)