* [RFC mptcp-next v6 0/7] NVME over MPTCP
@ 2026-03-30 9:43 Geliang Tang
2026-03-30 9:43 ` [RFC mptcp-next v6 1/7] nvmet-tcp: define target tcp_sockops struct Geliang Tang
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: Geliang Tang @ 2026-03-30 9:43 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
From: Geliang Tang <tanggeliang@kylinos.cn>
v6:
- introduce nvmet_tcp_sockops and nvme_tcp_sockops structures
- fix set_reuseaddr, set_nodelay and set_syncnt: add sockopt_seq_inc()
calls, set only the first subflow, and synchronize the values to the
other subflows in sync_socket_options()
- add implementations for no_linger, set_priority and set_tos
- this version no longer depends on the "mptcp: fix stall because of
data_ready" series of fixes
v5:
- address comments reported by ai-review: set msk->nodelay to true in
mptcp_sock_set_nodelay, set sk->sk_reuse to ssk->sk_reuse in
mptcp_sock_set_reuseaddr, add mptcp_nvme.sh to TEST_PROGS, and adjust
the order of patches.
- remove TLS-related options from .allowed_opts of
nvme_mptcp_transport.
- some cleanups for selftest.
- Link: https://patchwork.kernel.org/project/mptcp/cover/cover.1773374342.git.tanggeliang@kylinos.cn/
v4:
- a new patch to set nvme iopolicy as Nilay suggested.
- resend the whole set to trigger AI review.
- Link: https://patchwork.kernel.org/project/mptcp/cover/cover.1772683110.git.tanggeliang@kylinos.cn/
v3:
- update the implementation of sock_set_nodelay: originally it set only
the first subflow; now it sets every subflow.
- use the sk_is_msk helper in this set.
- update the selftest to perform testing under a multi-interface
environment.
- Link: https://patchwork.kernel.org/project/mptcp/cover/cover.1770627071.git.tanggeliang@kylinos.cn/
v2:
- Patch 1 fixes the timeout issue reported in v1, thanks to Paolo and Gang
Yan for their help.
- Patch 5 implements an MPTCP-specific sock_set_syncnt helper.
- Link: https://patchwork.kernel.org/project/mptcp/cover/cover.1764152990.git.tanggeliang@kylinos.cn/
This series (previously named "MPTCP support to 'NVME over TCP'") went
through three RFC versions sent to Hannes in May, revised each time
based on his input. After that, I started upstreaming the dependent
"implement mptcp read_sock" series to the MPTCP repository; it was
recently merged into net-next.
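For reference, here is a minimal usage sketch. The addresses, port
number and NQN are examples borrowed from the selftest in patch 7,
not fixed values:

  # target: export a port over the new "mptcp" trtype via configfs
  echo mptcp   > /sys/kernel/config/nvmet/ports/1234/addr_trtype
  echo ipv4    > /sys/kernel/config/nvmet/ports/1234/addr_adrfam
  echo 0.0.0.0 > /sys/kernel/config/nvmet/ports/1234/addr_traddr
  echo 4420    > /sys/kernel/config/nvmet/ports/1234/addr_trsvcid

  # host: discover and connect using the new "mptcp" transport
  nvme discover -t mptcp -a 10.1.1.1 -s 4420
  nvme connect  -t mptcp -a 10.1.1.1 -s 4420 -n <subsystem-nqn>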
Geliang Tang (7):
nvmet-tcp: define target tcp_sockops struct
nvmet-tcp: implement target mptcp sockops
nvmet-tcp: register target mptcp transport
nvme-tcp: define host tcp_sockops struct
nvme-tcp: implement host mptcp sockops
nvme-tcp: register host mptcp transport
selftests: mptcp: add NVMe over MPTCP test
drivers/nvme/host/tcp.c | 66 +++++-
drivers/nvme/target/configfs.c | 1 +
drivers/nvme/target/tcp.c | 79 ++++++-
include/linux/nvme.h | 1 +
include/net/mptcp.h | 27 +++
net/mptcp/protocol.h | 1 +
net/mptcp/sockopt.c | 120 ++++++++++
tools/testing/selftests/net/mptcp/Makefile | 1 +
tools/testing/selftests/net/mptcp/config | 7 +
.../testing/selftests/net/mptcp/mptcp_nvme.sh | 205 ++++++++++++++++++
10 files changed, 494 insertions(+), 14 deletions(-)
create mode 100755 tools/testing/selftests/net/mptcp/mptcp_nvme.sh
--
2.51.0
* [RFC mptcp-next v6 1/7] nvmet-tcp: define target tcp_sockops struct
2026-03-30 9:43 [RFC mptcp-next v6 0/7] NVME over MPTCP Geliang Tang
@ 2026-03-30 9:43 ` Geliang Tang
2026-03-30 9:43 ` [RFC mptcp-next v6 2/7] nvmet-tcp: implement target mptcp sockops Geliang Tang
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Geliang Tang @ 2026-03-30 9:43 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
From: Geliang Tang <tanggeliang@kylinos.cn>
To add MPTCP support to "NVMe over TCP", the target side needs to pass
IPPROTO_MPTCP to sock_create() instead of IPPROTO_TCP to create an MPTCP
socket. Additionally, the setsockopt operations for this socket need to
be switched to a set of MPTCP-specific functions.
This patch defines the nvmet_tcp_sockops structure, which contains the
protocol of the socket and a set of function pointers for these socket
operations. A "sockops" field is also added to struct nvmet_tcp_port.
A TCP-specific version of struct nvmet_tcp_sockops is defined. In
nvmet_tcp_add_port(), port->sockops is set to it when trtype is TCP.
All locations that previously called the TCP setsockopt helpers
directly are updated to call the corresponding function pointers in
the nvmet_tcp_sockops structure.
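As a sketch of the intent, another transport would only need to
provide its own instance of the struct. The "foo" names below are
placeholders, not existing kernel symbols:

  static const struct nvmet_tcp_sockops nvmet_foo_sockops = {
  	.proto		= IPPROTO_TCP,	/* passed to sock_create() */
  	.set_reuseaddr	= foo_sock_set_reuseaddr,
  	.set_nodelay	= foo_sock_set_nodelay,
  	.set_priority	= foo_sock_set_priority,
  	.no_linger	= foo_sock_no_linger,
  	.set_tos	= foo_sock_set_tos,
  };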
Cc: Hannes Reinecke <hare@suse.de>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
drivers/nvme/target/tcp.c | 40 ++++++++++++++++++++++++++++++++-------
1 file changed, 33 insertions(+), 7 deletions(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index acc71a26733f..b20adfb10737 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -198,12 +198,22 @@ struct nvmet_tcp_queue {
void (*write_space)(struct sock *);
};
+struct nvmet_tcp_sockops {
+ int proto;
+ void (*set_reuseaddr)(struct sock *sk);
+ void (*set_nodelay)(struct sock *sk);
+ void (*set_priority)(struct sock *sk, u32 priority);
+ void (*no_linger)(struct sock *sk);
+ void (*set_tos)(struct sock *sk, int val);
+};
+
struct nvmet_tcp_port {
struct socket *sock;
struct work_struct accept_work;
struct nvmet_port *nport;
struct sockaddr_storage addr;
void (*data_ready)(struct sock *);
+ struct nvmet_tcp_sockops sockops;
};
static DEFINE_IDA(nvmet_tcp_queue_ida);
@@ -1703,14 +1713,14 @@ static int nvmet_tcp_set_queue_sock(struct nvmet_tcp_queue *queue)
* close. This is done to prevent stale data from being sent should
* the network connection be restored before TCP times out.
*/
- sock_no_linger(sock->sk);
+ queue->port->sockops.no_linger(sock->sk);
if (so_priority > 0)
- sock_set_priority(sock->sk, so_priority);
+ queue->port->sockops.set_priority(sock->sk, so_priority);
/* Set socket type of service */
if (inet->rcv_tos > 0)
- ip_sock_set_tos(sock->sk, inet->rcv_tos);
+ queue->port->sockops.set_tos(sock->sk, inet->rcv_tos);
ret = 0;
write_lock_bh(&sock->sk->sk_callback_lock);
@@ -2030,6 +2040,15 @@ static void nvmet_tcp_listen_data_ready(struct sock *sk)
read_unlock_bh(&sk->sk_callback_lock);
}
+static const struct nvmet_tcp_sockops nvmet_tcp_sockops = {
+ .proto = IPPROTO_TCP,
+ .set_reuseaddr = sock_set_reuseaddr,
+ .set_nodelay = tcp_sock_set_nodelay,
+ .set_priority = sock_set_priority,
+ .no_linger = sock_no_linger,
+ .set_tos = ip_sock_set_tos,
+};
+
static int nvmet_tcp_add_port(struct nvmet_port *nport)
{
struct nvmet_tcp_port *port;
@@ -2054,6 +2073,13 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
goto err_port;
}
+ if (nport->disc_addr.trtype == NVMF_TRTYPE_TCP) {
+ port->sockops = nvmet_tcp_sockops;
+ } else {
+ ret = -EINVAL;
+ goto err_port;
+ }
+
ret = inet_pton_with_scope(&init_net, af, nport->disc_addr.traddr,
nport->disc_addr.trsvcid, &port->addr);
if (ret) {
@@ -2068,7 +2094,7 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
ret = sock_create(port->addr.ss_family, SOCK_STREAM,
- IPPROTO_TCP, &port->sock);
+ port->sockops.proto, &port->sock);
if (ret) {
pr_err("failed to create a socket\n");
goto err_port;
@@ -2077,10 +2103,10 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
port->sock->sk->sk_user_data = port;
port->data_ready = port->sock->sk->sk_data_ready;
port->sock->sk->sk_data_ready = nvmet_tcp_listen_data_ready;
- sock_set_reuseaddr(port->sock->sk);
- tcp_sock_set_nodelay(port->sock->sk);
+ port->sockops.set_reuseaddr(port->sock->sk);
+ port->sockops.set_nodelay(port->sock->sk);
if (so_priority > 0)
- sock_set_priority(port->sock->sk, so_priority);
+ port->sockops.set_priority(port->sock->sk, so_priority);
ret = kernel_bind(port->sock, (struct sockaddr_unsized *)&port->addr,
sizeof(port->addr));
--
2.51.0
* [RFC mptcp-next v6 2/7] nvmet-tcp: implement target mptcp sockops
2026-03-30 9:43 [RFC mptcp-next v6 0/7] NVME over MPTCP Geliang Tang
2026-03-30 9:43 ` [RFC mptcp-next v6 1/7] nvmet-tcp: define target tcp_sockops struct Geliang Tang
@ 2026-03-30 9:43 ` Geliang Tang
2026-03-30 9:43 ` [RFC mptcp-next v6 3/7] nvmet-tcp: register target mptcp transport Geliang Tang
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Geliang Tang @ 2026-03-30 9:43 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
From: Geliang Tang <tanggeliang@kylinos.cn>
This patch introduces a new NVMe target transport type NVMF_TRTYPE_MPTCP
to support MPTCP.
An MPTCP-specific version of struct nvmet_tcp_sockops is implemented,
and it is assigned to port->sockops when the transport type is MPTCP.
Dedicated MPTCP helpers are introduced for setting socket options. These
helpers set the values on the first subflow socket of an MPTCP connection.
The values are then synchronized to other newly created subflows in
sync_socket_options().
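All of the new helpers follow the same pattern; roughly (with the
value-specific assignment elided):

  lock_sock(sk);			/* sk is the parent MPTCP socket */
  sockopt_seq_inc(msk);		/* mark existing subflows out of sync */
  /* record the value on the msk so sync_socket_options() sees it */
  ssk = __mptcp_nmpc_sk(msk);	/* first subflow, if already created */
  if (!IS_ERR(ssk))
  	/* apply the same value to ssk */;
  release_sock(sk);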
Cc: Hannes Reinecke <hare@suse.de>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
drivers/nvme/target/tcp.c | 13 ++++++
include/linux/nvme.h | 1 +
include/net/mptcp.h | 20 ++++++++
net/mptcp/sockopt.c | 98 +++++++++++++++++++++++++++++++++++++++
4 files changed, 132 insertions(+)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index b20adfb10737..03f876440f6d 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -2049,6 +2049,15 @@ static const struct nvmet_tcp_sockops nvmet_tcp_sockops = {
.set_tos = ip_sock_set_tos,
};
+static const struct nvmet_tcp_sockops nvmet_mptcp_sockops = {
+ .proto = IPPROTO_MPTCP,
+ .set_reuseaddr = mptcp_sock_set_reuseaddr,
+ .set_nodelay = mptcp_sock_set_nodelay,
+ .set_priority = mptcp_sock_set_priority,
+ .no_linger = mptcp_sock_no_linger,
+ .set_tos = mptcp_sock_set_tos,
+};
+
static int nvmet_tcp_add_port(struct nvmet_port *nport)
{
struct nvmet_tcp_port *port;
@@ -2075,6 +2084,10 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
if (nport->disc_addr.trtype == NVMF_TRTYPE_TCP) {
port->sockops = nvmet_tcp_sockops;
+#ifdef CONFIG_MPTCP
+ } else if (nport->disc_addr.trtype == NVMF_TRTYPE_MPTCP) {
+ port->sockops = nvmet_mptcp_sockops;
+#endif
} else {
ret = -EINVAL;
goto err_port;
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 655d194f8e72..8069667ad47e 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -68,6 +68,7 @@ enum {
NVMF_TRTYPE_RDMA = 1, /* RDMA */
NVMF_TRTYPE_FC = 2, /* Fibre Channel */
NVMF_TRTYPE_TCP = 3, /* TCP/IP */
+ NVMF_TRTYPE_MPTCP = 4, /* Multipath TCP */
NVMF_TRTYPE_LOOP = 254, /* Reserved for host usage */
NVMF_TRTYPE_MAX,
};
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 4cf59e83c1c5..6eca3ff13324 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -237,6 +237,16 @@ static inline __be32 mptcp_reset_option(const struct sk_buff *skb)
}
void mptcp_active_detect_blackhole(struct sock *sk, bool expired);
+
+void mptcp_sock_set_reuseaddr(struct sock *sk);
+
+void mptcp_sock_set_nodelay(struct sock *sk);
+
+void mptcp_sock_set_priority(struct sock *sk, u32 priority);
+
+void mptcp_sock_no_linger(struct sock *sk);
+
+void mptcp_sock_set_tos(struct sock *sk, int val);
#else
static inline void mptcp_init(void)
@@ -323,6 +333,16 @@ static inline struct request_sock *mptcp_subflow_reqsk_alloc(const struct reques
static inline __be32 mptcp_reset_option(const struct sk_buff *skb) { return htonl(0u); }
static inline void mptcp_active_detect_blackhole(struct sock *sk, bool expired) { }
+
+static inline void mptcp_sock_set_reuseaddr(struct sock *sk) { }
+
+static inline void mptcp_sock_set_nodelay(struct sock *sk) { }
+
+static inline void mptcp_sock_set_priority(struct sock *sk, u32 priority) { }
+
+static inline void mptcp_sock_no_linger(struct sock *sk) { }
+
+static inline void mptcp_sock_set_tos(struct sock *sk, int val) { }
#endif /* CONFIG_MPTCP */
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index de90a2897d2d..c6a2ccab7049 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -1537,6 +1537,7 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
static const unsigned int tx_rx_locks = SOCK_RCVBUF_LOCK | SOCK_SNDBUF_LOCK;
struct sock *sk = (struct sock *)msk;
bool keep_open;
+ u32 priority;
keep_open = sock_flag(sk, SOCK_KEEPOPEN);
if (ssk->sk_prot->keepalive)
@@ -1586,6 +1587,11 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
inet_assign_bit(FREEBIND, ssk, inet_test_bit(FREEBIND, sk));
inet_assign_bit(BIND_ADDRESS_NO_PORT, ssk, inet_test_bit(BIND_ADDRESS_NO_PORT, sk));
WRITE_ONCE(inet_sk(ssk)->local_port_range, READ_ONCE(inet_sk(sk)->local_port_range));
+
+ ssk->sk_reuse = sk->sk_reuse;
+ priority = READ_ONCE(sk->sk_priority);
+ if (priority > 0)
+ sock_set_priority(ssk, priority);
}
void mptcp_sockopt_sync_locked(struct mptcp_sock *msk, struct sock *ssk)
@@ -1652,3 +1658,95 @@ int mptcp_set_rcvlowat(struct sock *sk, int val)
}
return 0;
}
+
+void mptcp_sock_set_reuseaddr(struct sock *sk)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct sock *ssk;
+
+ lock_sock(sk);
+ sockopt_seq_inc(msk);
+ sk->sk_reuse = SK_CAN_REUSE;
+ ssk = __mptcp_nmpc_sk(msk);
+ if (IS_ERR(ssk))
+ goto unlock;
+ sock_set_reuseaddr(ssk);
+unlock:
+ release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_reuseaddr);
+
+void mptcp_sock_set_nodelay(struct sock *sk)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct sock *ssk;
+
+ lock_sock(sk);
+ sockopt_seq_inc(msk);
+ msk->nodelay = true;
+ ssk = __mptcp_nmpc_sk(msk);
+ if (IS_ERR(ssk))
+ goto unlock;
+ lock_sock(ssk);
+ __tcp_sock_set_nodelay(ssk, true);
+ release_sock(ssk);
+unlock:
+ release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_nodelay);
+
+void mptcp_sock_set_priority(struct sock *sk, u32 priority)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct sock *ssk;
+
+ lock_sock(sk);
+ sockopt_seq_inc(msk);
+ sock_set_priority(sk, priority);
+ ssk = __mptcp_nmpc_sk(msk);
+ if (IS_ERR(ssk))
+ goto unlock;
+ lock_sock(ssk);
+ sock_set_priority(ssk, priority);
+ release_sock(ssk);
+unlock:
+ release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_priority);
+
+void mptcp_sock_no_linger(struct sock *sk)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct sock *ssk;
+
+ lock_sock(sk);
+ sockopt_seq_inc(msk);
+ WRITE_ONCE(sk->sk_lingertime, 0);
+ sock_set_flag(sk, SOCK_LINGER);
+ ssk = __mptcp_nmpc_sk(msk);
+ if (IS_ERR(ssk))
+ goto unlock;
+ sock_no_linger(ssk);
+unlock:
+ release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_no_linger);
+
+void mptcp_sock_set_tos(struct sock *sk, int val)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct sock *ssk;
+
+ lock_sock(sk);
+ sockopt_seq_inc(msk);
+ __ip_sock_set_tos(sk, val);
+ ssk = __mptcp_nmpc_sk(msk);
+ if (IS_ERR(ssk))
+ goto unlock;
+ lock_sock(ssk);
+ __ip_sock_set_tos(ssk, val);
+ release_sock(ssk);
+unlock:
+ release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_tos);
--
2.51.0
* [RFC mptcp-next v6 3/7] nvmet-tcp: register target mptcp transport
2026-03-30 9:43 [RFC mptcp-next v6 0/7] NVME over MPTCP Geliang Tang
2026-03-30 9:43 ` [RFC mptcp-next v6 1/7] nvmet-tcp: define target tcp_sockops struct Geliang Tang
2026-03-30 9:43 ` [RFC mptcp-next v6 2/7] nvmet-tcp: implement target mptcp sockops Geliang Tang
@ 2026-03-30 9:43 ` Geliang Tang
2026-03-30 9:43 ` [RFC mptcp-next v6 4/7] nvme-tcp: define host tcp_sockops struct Geliang Tang
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Geliang Tang @ 2026-03-30 9:43 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
From: Geliang Tang <tanggeliang@kylinos.cn>
This patch defines a new nvmet_fabrics_ops named nvmet_mptcp_ops, which
is almost identical to nvmet_tcp_ops except for the .type field.
It is registered in nvmet_tcp_init() and unregistered in nvmet_tcp_exit().
This new nvmet_fabrics_ops is selected in nvmet_tcp_done_recv_pdu() based
on the protocol type.
A MODULE_ALIAS for "nvmet-transport-4" is also added.
v2:
- use trtype instead of tsas (Hannes).
v3:
- check mptcp protocol from disc_addr.trtype instead of passing a
parameter (Hannes).
v4:
- check CONFIG_MPTCP.
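With the alias in place, automatic module loading is expected to work
as for plain TCP, assuming the nvmet core still resolves transports
via request_module("nvmet-transport-%d"):

  # linking a subsystem into an mptcp port should pull in this
  # module through the "nvmet-transport-4" alias
  echo mptcp > /sys/kernel/config/nvmet/ports/1234/addr_trtype
  ln -s /sys/kernel/config/nvmet/subsystems/<nqn> \
        /sys/kernel/config/nvmet/ports/1234/subsystems/<nqn>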
Cc: Hannes Reinecke <hare@suse.de>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
drivers/nvme/target/configfs.c | 1 +
drivers/nvme/target/tcp.c | 26 +++++++++++++++++++++++++-
2 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 3088e044dbcb..4b7498ffb102 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -38,6 +38,7 @@ static struct nvmet_type_name_map nvmet_transport[] = {
{ NVMF_TRTYPE_RDMA, "rdma" },
{ NVMF_TRTYPE_FC, "fc" },
{ NVMF_TRTYPE_TCP, "tcp" },
+ { NVMF_TRTYPE_MPTCP, "mptcp" },
{ NVMF_TRTYPE_PCI, "pci" },
{ NVMF_TRTYPE_LOOP, "loop" },
};
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 03f876440f6d..86ff4ed0f753 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -222,6 +222,7 @@ static DEFINE_MUTEX(nvmet_tcp_queue_mutex);
static struct workqueue_struct *nvmet_tcp_wq;
static const struct nvmet_fabrics_ops nvmet_tcp_ops;
+static const struct nvmet_fabrics_ops nvmet_mptcp_ops;
static void nvmet_tcp_free_cmd(struct nvmet_tcp_cmd *c);
static void nvmet_tcp_free_cmd_buffers(struct nvmet_tcp_cmd *cmd);
@@ -1077,7 +1078,9 @@ static int nvmet_tcp_done_recv_pdu(struct nvmet_tcp_queue *queue)
req = &queue->cmd->req;
memcpy(req->cmd, nvme_cmd, sizeof(*nvme_cmd));
- if (unlikely(!nvmet_req_init(req, &queue->nvme_sq, &nvmet_tcp_ops))) {
+ if (unlikely(!nvmet_req_init(req, &queue->nvme_sq,
+ sk_is_tcp(queue->sock->sk) ?
+ &nvmet_tcp_ops : &nvmet_mptcp_ops))) {
pr_err("failed cmd %p id %d opcode %d, data_len: %d, status: %04x\n",
req->cmd, req->cmd->common.command_id,
req->cmd->common.opcode,
@@ -2259,6 +2262,19 @@ static const struct nvmet_fabrics_ops nvmet_tcp_ops = {
.host_traddr = nvmet_tcp_host_port_addr,
};
+static const struct nvmet_fabrics_ops nvmet_mptcp_ops = {
+ .owner = THIS_MODULE,
+ .type = NVMF_TRTYPE_MPTCP,
+ .msdbd = 1,
+ .add_port = nvmet_tcp_add_port,
+ .remove_port = nvmet_tcp_remove_port,
+ .queue_response = nvmet_tcp_queue_response,
+ .delete_ctrl = nvmet_tcp_delete_ctrl,
+ .install_queue = nvmet_tcp_install_queue,
+ .disc_traddr = nvmet_tcp_disc_port_addr,
+ .host_traddr = nvmet_tcp_host_port_addr,
+};
+
static int __init nvmet_tcp_init(void)
{
int ret;
@@ -2272,6 +2288,12 @@ static int __init nvmet_tcp_init(void)
if (ret)
goto err;
+ ret = nvmet_register_transport(&nvmet_mptcp_ops);
+ if (ret) {
+ nvmet_unregister_transport(&nvmet_tcp_ops);
+ goto err;
+ }
+
return 0;
err:
destroy_workqueue(nvmet_tcp_wq);
@@ -2282,6 +2304,7 @@ static void __exit nvmet_tcp_exit(void)
{
struct nvmet_tcp_queue *queue;
+ nvmet_unregister_transport(&nvmet_mptcp_ops);
nvmet_unregister_transport(&nvmet_tcp_ops);
flush_workqueue(nvmet_wq);
@@ -2301,3 +2324,4 @@ module_exit(nvmet_tcp_exit);
MODULE_DESCRIPTION("NVMe target TCP transport driver");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS("nvmet-transport-3"); /* 3 == NVMF_TRTYPE_TCP */
+MODULE_ALIAS("nvmet-transport-4"); /* 4 == NVMF_TRTYPE_MPTCP */
--
2.51.0
* [RFC mptcp-next v6 4/7] nvme-tcp: define host tcp_sockops struct
2026-03-30 9:43 [RFC mptcp-next v6 0/7] NVME over MPTCP Geliang Tang
` (2 preceding siblings ...)
2026-03-30 9:43 ` [RFC mptcp-next v6 3/7] nvmet-tcp: register target mptcp transport Geliang Tang
@ 2026-03-30 9:43 ` Geliang Tang
2026-03-30 9:43 ` [RFC mptcp-next v6 5/7] nvme-tcp: implement host mptcp sockops Geliang Tang
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Geliang Tang @ 2026-03-30 9:43 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
From: Geliang Tang <tanggeliang@kylinos.cn>
To add MPTCP support to "NVMe over TCP", the host side needs to pass
IPPROTO_MPTCP to sock_create_kern() instead of IPPROTO_TCP to create an
MPTCP socket.
Similar to the target-side nvmet_tcp_sockops, this patch defines the
host-side nvme_tcp_sockops structure, which contains the protocol of the
socket and a set of function pointers for socket operations. The only
difference is that it defines .set_syncnt instead of .set_reuseaddr.
A TCP-specific version of this structure is defined, and a sockops
field is added to struct nvme_tcp_ctrl. When the transport string is
"tcp", the TCP version is assigned to ctrl->sockops.
All locations that previously called TCP setsockopt functions are updated
to call the corresponding function pointers in the nvme_tcp_sockops
structure.
Cc: Hannes Reinecke <hare@suse.de>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
drivers/nvme/host/tcp.c | 39 +++++++++++++++++++++++++++++++++------
1 file changed, 33 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 243dab830dc8..910c4186eb3b 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -182,6 +182,15 @@ struct nvme_tcp_queue {
void (*write_space)(struct sock *);
};
+struct nvme_tcp_sockops {
+ int proto;
+ int (*set_syncnt)(struct sock *sk, int val);
+ void (*set_nodelay)(struct sock *sk);
+ void (*no_linger)(struct sock *sk);
+ void (*set_priority)(struct sock *sk, u32 priority);
+ void (*set_tos)(struct sock *sk, int val);
+};
+
struct nvme_tcp_ctrl {
/* read only in the hot path */
struct nvme_tcp_queue *queues;
@@ -198,6 +207,8 @@ struct nvme_tcp_ctrl {
struct delayed_work connect_work;
struct nvme_tcp_request async_req;
u32 io_queues[HCTX_MAX_TYPES];
+
+ struct nvme_tcp_sockops sockops;
};
static LIST_HEAD(nvme_tcp_ctrl_list);
@@ -1785,7 +1796,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
ret = sock_create_kern(current->nsproxy->net_ns,
ctrl->addr.ss_family, SOCK_STREAM,
- IPPROTO_TCP, &queue->sock);
+ ctrl->sockops.proto, &queue->sock);
if (ret) {
dev_err(nctrl->device,
"failed to create socket: %d\n", ret);
@@ -1802,24 +1813,24 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
nvme_tcp_reclassify_socket(queue->sock);
/* Single syn retry */
- tcp_sock_set_syncnt(queue->sock->sk, 1);
+ ctrl->sockops.set_syncnt(queue->sock->sk, 1);
/* Set TCP no delay */
- tcp_sock_set_nodelay(queue->sock->sk);
+ ctrl->sockops.set_nodelay(queue->sock->sk);
/*
* Cleanup whatever is sitting in the TCP transmit queue on socket
* close. This is done to prevent stale data from being sent should
* the network connection be restored before TCP times out.
*/
- sock_no_linger(queue->sock->sk);
+ ctrl->sockops.no_linger(queue->sock->sk);
if (so_priority > 0)
- sock_set_priority(queue->sock->sk, so_priority);
+ ctrl->sockops.set_priority(queue->sock->sk, so_priority);
/* Set socket type of service */
if (nctrl->opts->tos >= 0)
- ip_sock_set_tos(queue->sock->sk, nctrl->opts->tos);
+ ctrl->sockops.set_tos(queue->sock->sk, nctrl->opts->tos);
/* Set 10 seconds timeout for icresp recvmsg */
queue->sock->sk->sk_rcvtimeo = 10 * HZ;
@@ -2886,6 +2897,15 @@ nvme_tcp_existing_controller(struct nvmf_ctrl_options *opts)
return found;
}
+static const struct nvme_tcp_sockops nvme_tcp_sockops = {
+ .proto = IPPROTO_TCP,
+ .set_syncnt = tcp_sock_set_syncnt,
+ .set_nodelay = tcp_sock_set_nodelay,
+ .no_linger = sock_no_linger,
+ .set_priority = sock_set_priority,
+ .set_tos = ip_sock_set_tos,
+};
+
static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(struct device *dev,
struct nvmf_ctrl_options *opts)
{
@@ -2960,6 +2980,13 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(struct device *dev,
if (ret)
goto out_kfree_queues;
+ if (!strcmp(ctrl->ctrl.opts->transport, "tcp")) {
+ ctrl->sockops = nvme_tcp_sockops;
+ } else {
+ ret = -EINVAL;
+ goto out_kfree_queues;
+ }
+
return ctrl;
out_kfree_queues:
kfree(ctrl->queues);
--
2.51.0
* [RFC mptcp-next v6 5/7] nvme-tcp: implement host mptcp sockops
2026-03-30 9:43 [RFC mptcp-next v6 0/7] NVME over MPTCP Geliang Tang
` (3 preceding siblings ...)
2026-03-30 9:43 ` [RFC mptcp-next v6 4/7] nvme-tcp: define host tcp_sockops struct Geliang Tang
@ 2026-03-30 9:43 ` Geliang Tang
2026-03-30 9:43 ` [RFC mptcp-next v6 6/7] nvme-tcp: register host mptcp transport Geliang Tang
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Geliang Tang @ 2026-03-30 9:43 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
From: Geliang Tang <tanggeliang@kylinos.cn>
An MPTCP-specific version of struct nvme_tcp_sockops is implemented,
and it is assigned to ctrl->sockops when the transport string is "mptcp".
The socket option setting logic is similar to the target side, except that
mptcp_sock_set_syncnt is newly defined for the host side.
The new helper sets the value on the first subflow socket of an MPTCP
connection; the value is then synchronized to newly created subflows
in sync_socket_options().
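For comparison, tcp_sock_set_syncnt() on the subflow corresponds to
the TCP_SYNCNT socket option; a userspace analogue would be roughly:

  /* needs <sys/socket.h>, <netinet/in.h> and <netinet/tcp.h> */
  int set_syncnt(int fd, int val)
  {
  	return setsockopt(fd, IPPROTO_TCP, TCP_SYNCNT,
  			  &val, sizeof(val));
  }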
Cc: Hannes Reinecke <hare@suse.de>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
drivers/nvme/host/tcp.c | 13 +++++++++++++
include/net/mptcp.h | 7 +++++++
net/mptcp/protocol.h | 1 +
net/mptcp/sockopt.c | 22 ++++++++++++++++++++++
4 files changed, 43 insertions(+)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 910c4186eb3b..cf80f9354471 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2906,6 +2906,15 @@ static const struct nvme_tcp_sockops nvme_tcp_sockops = {
.set_tos = ip_sock_set_tos,
};
+static const struct nvme_tcp_sockops nvme_mptcp_sockops = {
+ .proto = IPPROTO_MPTCP,
+ .set_syncnt = mptcp_sock_set_syncnt,
+ .set_nodelay = mptcp_sock_set_nodelay,
+ .no_linger = mptcp_sock_no_linger,
+ .set_priority = mptcp_sock_set_priority,
+ .set_tos = mptcp_sock_set_tos,
+};
+
static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(struct device *dev,
struct nvmf_ctrl_options *opts)
{
@@ -2982,6 +2991,10 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(struct device *dev,
if (!strcmp(ctrl->ctrl.opts->transport, "tcp")) {
ctrl->sockops = nvme_tcp_sockops;
+#ifdef CONFIG_MPTCP
+ } else if (!strcmp(ctrl->ctrl.opts->transport, "mptcp")) {
+ ctrl->sockops = nvme_mptcp_sockops;
+#endif
} else {
ret = -EINVAL;
goto out_kfree_queues;
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 6eca3ff13324..d4bc9bde2256 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -247,6 +247,8 @@ void mptcp_sock_set_priority(struct sock *sk, u32 priority);
void mptcp_sock_no_linger(struct sock *sk);
void mptcp_sock_set_tos(struct sock *sk, int val);
+
+int mptcp_sock_set_syncnt(struct sock *sk, int val);
#else
static inline void mptcp_init(void)
@@ -343,6 +345,11 @@ static inline void mptcp_sock_set_priority(struct sock *sk, u32 priority) { }
static inline void mptcp_sock_no_linger(struct sock *sk) { }
static inline void mptcp_sock_set_tos(struct sock *sk, int val) { }
+
+static inline int mptcp_sock_set_syncnt(struct sock *sk, int val)
+{
+ return 0;
+}
#endif /* CONFIG_MPTCP */
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index f5d4d7d030f2..84e80816b2a4 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -335,6 +335,7 @@ struct mptcp_sock {
int keepalive_idle;
int keepalive_intvl;
int maxseg;
+ int icsk_syn_retries;
struct work_struct work;
struct sk_buff *ooo_last_skb;
struct rb_root out_of_order_queue;
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index c6a2ccab7049..df1b6d2c4d7e 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -1592,6 +1592,7 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
priority = READ_ONCE(sk->sk_priority);
if (priority > 0)
sock_set_priority(ssk, priority);
+ tcp_sock_set_syncnt(ssk, msk->icsk_syn_retries);
}
void mptcp_sockopt_sync_locked(struct mptcp_sock *msk, struct sock *ssk)
@@ -1750,3 +1751,24 @@ void mptcp_sock_set_tos(struct sock *sk, int val)
release_sock(sk);
}
EXPORT_SYMBOL(mptcp_sock_set_tos);
+
+int mptcp_sock_set_syncnt(struct sock *sk, int val)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct sock *ssk;
+
+ if (val < 1 || val > MAX_TCP_SYNCNT)
+ return -EINVAL;
+
+ lock_sock(sk);
+ sockopt_seq_inc(msk);
+ msk->icsk_syn_retries = val;
+ ssk = __mptcp_nmpc_sk(msk);
+ if (IS_ERR(ssk))
+ goto unlock;
+ tcp_sock_set_syncnt(ssk, val);
+unlock:
+ release_sock(sk);
+ return 0;
+}
+EXPORT_SYMBOL(mptcp_sock_set_syncnt);
--
2.51.0
* [RFC mptcp-next v6 6/7] nvme-tcp: register host mptcp transport
2026-03-30 9:43 [RFC mptcp-next v6 0/7] NVME over MPTCP Geliang Tang
` (4 preceding siblings ...)
2026-03-30 9:43 ` [RFC mptcp-next v6 5/7] nvme-tcp: implement host mptcp sockops Geliang Tang
@ 2026-03-30 9:43 ` Geliang Tang
2026-03-30 9:43 ` [RFC mptcp-next v6 7/7] selftests: mptcp: add NVMe over MPTCP test Geliang Tang
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Geliang Tang @ 2026-03-30 9:43 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
From: Geliang Tang <tanggeliang@kylinos.cn>
This patch defines a new nvmf_transport_ops named nvme_mptcp_transport,
which is almost the same as nvme_tcp_transport except for the .name and
.allowed_opts fields.
MPTCP currently does not support TLS. The four TLS-related options
(NVMF_OPT_TLS, NVMF_OPT_KEYRING, NVMF_OPT_TLS_KEY, and NVMF_OPT_CONCAT)
have been removed from allowed_opts. They will be added back once MPTCP
TLS is supported.
It is registered in nvme_tcp_init_module() and unregistered in
nvme_tcp_cleanup_module().
v2:
- use 'trtype' instead of '--mptcp' (Hannes)
v3:
- check mptcp protocol from opts->transport instead of passing a
parameter (Hannes).
v4:
- check CONFIG_MPTCP.
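Since the TLS options are not in .allowed_opts, a connect request that
asks for them should be rejected by the fabrics option parser, e.g.
(a hedged example, not tested here):

  # expected to fail until MPTCP TLS support is added
  nvme connect -t mptcp -a <traddr> -s 4420 -n <subsystem-nqn> --tls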
Cc: Hannes Reinecke <hare@suse.de>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
drivers/nvme/host/tcp.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index cf80f9354471..9231310d4a9b 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -3063,6 +3063,18 @@ static struct nvmf_transport_ops nvme_tcp_transport = {
.create_ctrl = nvme_tcp_create_ctrl,
};
+static struct nvmf_transport_ops nvme_mptcp_transport = {
+ .name = "mptcp",
+ .module = THIS_MODULE,
+ .required_opts = NVMF_OPT_TRADDR,
+ .allowed_opts = NVMF_OPT_TRSVCID | NVMF_OPT_RECONNECT_DELAY |
+ NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
+ NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
+ NVMF_OPT_NR_WRITE_QUEUES | NVMF_OPT_NR_POLL_QUEUES |
+ NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE,
+ .create_ctrl = nvme_tcp_create_ctrl,
+};
+
static int __init nvme_tcp_init_module(void)
{
unsigned int wq_flags = WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_SYSFS;
@@ -3088,6 +3100,7 @@ static int __init nvme_tcp_init_module(void)
atomic_set(&nvme_tcp_cpu_queues[cpu], 0);
nvmf_register_transport(&nvme_tcp_transport);
+ nvmf_register_transport(&nvme_mptcp_transport);
return 0;
}
@@ -3095,6 +3108,7 @@ static void __exit nvme_tcp_cleanup_module(void)
{
struct nvme_tcp_ctrl *ctrl;
+ nvmf_unregister_transport(&nvme_mptcp_transport);
nvmf_unregister_transport(&nvme_tcp_transport);
mutex_lock(&nvme_tcp_ctrl_mutex);
--
2.51.0
* [RFC mptcp-next v6 7/7] selftests: mptcp: add NVMe over MPTCP test
2026-03-30 9:43 [RFC mptcp-next v6 0/7] NVME over MPTCP Geliang Tang
` (5 preceding siblings ...)
2026-03-30 9:43 ` [RFC mptcp-next v6 6/7] nvme-tcp: register host mptcp transport Geliang Tang
@ 2026-03-30 9:43 ` Geliang Tang
2026-03-30 10:06 ` [RFC mptcp-next v6 0/7] NVME over MPTCP MPTCP CI
2026-03-30 11:38 ` MPTCP CI
8 siblings, 0 replies; 10+ messages in thread
From: Geliang Tang @ 2026-03-30 9:43 UTC (permalink / raw)
To: mptcp
Cc: Geliang Tang, Hannes Reinecke, Nilay Shroff, Ming Lei, zhenwei pi,
Hui Zhu, Gang Yan
From: Geliang Tang <tanggeliang@kylinos.cn>
Add a test case for NVMe over MPTCP. It verifies that the nvme list,
discover, connect, and disconnect commands work as expected, and
measures read/write performance with fio.
The test simulates four NICs on both the target and host sides, each
rate-limited with netem. It shows that 'NVMe over MPTCP' delivers up
to four times the bandwidth of standard TCP:
# ./mptcp_nvme.sh tcp
READ: bw=112MiB/s (118MB/s), 112MiB/s-112MiB/s (118MB/s-118MB/s),
io=1123MiB (1177MB), run=10018-10018msec
WRITE: bw=112MiB/s (117MB/s), 112MiB/s-112MiB/s (117MB/s-117MB/s),
io=1118MiB (1173MB), run=10018-10018msec
# ./mptcp_nvme.sh mptcp
READ: bw=427MiB/s (448MB/s), 427MiB/s-427MiB/s (448MB/s-448MB/s),
io=4286MiB (4494MB), run=10039-10039msec
WRITE: bw=387MiB/s (406MB/s), 387MiB/s-387MiB/s (406MB/s-406MB/s),
io=3885MiB (4073MB), run=10043-10043msec
Also add NVMe iopolicy testing to mptcp_nvme.sh, with the default set
to "numa". It can be set to "round-robin" or "queue-depth".
# ./mptcp_nvme.sh mptcp round-robin
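To double-check that the extra subflows are actually created, the
following can be run in the host namespace while fio is active (an
extra manual check, not part of the script):

  ip netns exec <ns2> ss -Mni          # -M lists MPTCP sockets
  ip netns exec <ns2> nstat | grep -i mptcp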
Cc: Hannes Reinecke <hare@suse.de>
Cc: Nilay Shroff <nilay@linux.ibm.com>
Cc: Ming Lei <ming.lei@redhat.com>
Co-developed-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Co-developed-by: Hui Zhu <zhuhui@kylinos.cn>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Co-developed-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
tools/testing/selftests/net/mptcp/Makefile | 1 +
tools/testing/selftests/net/mptcp/config | 7 +
.../testing/selftests/net/mptcp/mptcp_nvme.sh | 205 ++++++++++++++++++
3 files changed, 213 insertions(+)
create mode 100755 tools/testing/selftests/net/mptcp/mptcp_nvme.sh
diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile
index 22ba0da2adb8..7b308447a58b 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -13,6 +13,7 @@ TEST_PROGS := \
mptcp_connect_sendfile.sh \
mptcp_connect_splice.sh \
mptcp_join.sh \
+ mptcp_nvme.sh \
mptcp_sockopt.sh \
pm_netlink.sh \
simult_flows.sh \
diff --git a/tools/testing/selftests/net/mptcp/config b/tools/testing/selftests/net/mptcp/config
index 59051ee2a986..0eee348eff8b 100644
--- a/tools/testing/selftests/net/mptcp/config
+++ b/tools/testing/selftests/net/mptcp/config
@@ -34,3 +34,10 @@ CONFIG_NFT_SOCKET=m
CONFIG_NFT_TPROXY=m
CONFIG_SYN_COOKIES=y
CONFIG_VETH=y
+CONFIG_CONFIGFS_FS=y
+CONFIG_NVME_CORE=y
+CONFIG_NVME_FABRICS=y
+CONFIG_NVME_TCP=y
+CONFIG_NVME_TARGET=y
+CONFIG_NVME_TARGET_TCP=y
+CONFIG_NVME_MULTIPATH=y
diff --git a/tools/testing/selftests/net/mptcp/mptcp_nvme.sh b/tools/testing/selftests/net/mptcp/mptcp_nvme.sh
new file mode 100755
index 000000000000..bc201a300b72
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_nvme.sh
@@ -0,0 +1,205 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+. "$(dirname "$0")/mptcp_lib.sh"
+
+ret=0
+trtype="${1:-mptcp}"
+iopolicy=${2:-"numa"} # round-robin, queue-depth
+nqn=nqn.2014-08.org.nvmexpress.${trtype}dev
+ns=1
+port=1234
+trsvcid=4420
+ns1=""
+ns2=""
+
+ns1_cleanup()
+{
+ mount -t configfs none /sys/kernel/config
+
+ rm -rf /sys/kernel/config/nvmet/ports/"${port}"/subsystems/"${trtype}"subsys
+ rmdir /sys/kernel/config/nvmet/ports/"${port}"
+ echo 0 > /sys/kernel/config/nvmet/subsystems/"${nqn}"/namespaces/"${ns}"/enable
+ echo -n 0 > /sys/kernel/config/nvmet/subsystems/"${nqn}"/namespaces/"${ns}"/device_path
+ rmdir /sys/kernel/config/nvmet/subsystems/"${nqn}"/namespaces/"${ns}"
+ rmdir /sys/kernel/config/nvmet/subsystems/"${nqn}"
+}
+
+ns2_cleanup()
+{
+ nvme disconnect -n "${nqn}" || true
+}
+
+cleanup()
+{
+ ip netns exec "$ns2" bash <<- EOF
+ $(declare -f ns2_cleanup)
+ ns2_cleanup
+ EOF
+
+ sleep 1
+
+ ip netns exec "$ns1" bash <<- EOF
+ $(declare -f ns1_cleanup)
+ ns1_cleanup
+ EOF
+
+ losetup -d /dev/loop100
+ rm -rf /tmp/test.raw
+
+ mptcp_lib_ns_exit "$ns1" "$ns2"
+
+ kill "$monitor_pid_ns1" 2>/dev/null
+ wait "$monitor_pid_ns1" 2>/dev/null
+
+ kill "$monitor_pid_ns2" 2>/dev/null
+ wait "$monitor_pid_ns2" 2>/dev/null
+
+ unset -v trtype nqn ns port trsvcid
+}
+
+init()
+{
+ mptcp_lib_ns_init ns1 ns2
+
+ # ns1 ns2
+ # 10.1.1.1 10.1.1.2
+ # 10.1.2.1 10.1.2.2
+ # 10.1.3.1 10.1.3.2
+ # 10.1.4.1 10.1.4.2
+ for i in {1..4}; do
+ ip link add ns1eth"$i" netns "$ns1" type veth peer name ns2eth"$i" netns "$ns2"
+ ip -net "$ns1" addr add 10.1."$i".1/24 dev ns1eth"$i"
+ ip -net "$ns1" addr add dead:beef:"$i"::1/64 dev ns1eth"$i" nodad
+ ip -net "$ns1" link set ns1eth"$i" up
+ ip -net "$ns2" addr add 10.1."$i".2/24 dev ns2eth"$i"
+ ip -net "$ns2" addr add dead:beef:"$i"::2/64 dev ns2eth"$i" nodad
+ ip -net "$ns2" link set ns2eth"$i" up
+ ip -net "$ns2" route add default via 10.1."$i".1 dev ns2eth"$i" metric 10"$i"
+ ip -net "$ns2" route add default via dead:beef:"$i"::1 dev ns2eth"$i" metric 10"$i"
+
+ # Add tc qdisc to both namespaces for bandwidth limiting
+ tc -n "$ns1" qdisc add dev ns1eth"$i" root netem rate 1000mbit
+ tc -n "$ns2" qdisc add dev ns2eth"$i" root netem rate 1000mbit
+ done
+
+ mptcp_lib_pm_nl_set_limits "${ns1}" 8 8
+
+ mptcp_lib_pm_nl_add_endpoint "$ns1" 10.1.2.1 flags signal
+ mptcp_lib_pm_nl_add_endpoint "$ns1" 10.1.3.1 flags signal
+ mptcp_lib_pm_nl_add_endpoint "$ns1" 10.1.4.1 flags signal
+
+ mptcp_lib_pm_nl_set_limits "${ns2}" 8 8
+
+ mptcp_lib_pm_nl_add_endpoint "$ns2" 10.1.2.2 flags subflow
+ mptcp_lib_pm_nl_add_endpoint "$ns2" 10.1.3.2 flags subflow
+ mptcp_lib_pm_nl_add_endpoint "$ns2" 10.1.4.2 flags subflow
+
+ ip -n "${ns1}" mptcp monitor &
+ monitor_pid_ns1=$!
+ ip -n "${ns2}" mptcp monitor &
+ monitor_pid_ns2=$!
+}
+
+run_target()
+{
+ mount -t configfs none /sys/kernel/config
+
+ cd /sys/kernel/config/nvmet/subsystems || exit
+ mkdir -p "${nqn}"
+ cd "${nqn}" || exit
+ echo 1 > attr_allow_any_host
+ mkdir -p namespaces/"${ns}"
+ echo /dev/loop100 > namespaces/"${ns}"/device_path
+ echo 1 > namespaces/"${ns}"/enable
+
+ cd /sys/kernel/config/nvmet/ports || exit
+ mkdir -p "${port}"
+ cd "${port}" || exit
+ echo "${trtype}" > addr_trtype
+ echo ipv4 > addr_adrfam
+ echo 0.0.0.0 > addr_traddr
+ echo "${trsvcid}" > addr_trsvcid
+
+ cd subsystems || exit
+ ln -sf ../../../subsystems/"${nqn}" "${trtype}"subsys
+}
+
+run_host()
+{
+ local traddr=10.1.1.1
+
+ echo "nvme discover -a ${traddr}"
+ nvme discover -t "${trtype}" -a "${traddr}" -s "${trsvcid}"
+ if [ $? -ne 0 ]; then
+ return "${KSFT_FAIL}"
+ fi
+
+ echo "nvme connect"
+ devname=$(nvme connect -t "${trtype}" -a "${traddr}" -s "${trsvcid}" -n "${nqn}" |
+ awk '{print $NF}')
+
+ sleep 1
+
+ echo "nvme list"
+ nvme list
+
+ subname=$(nvme list-subsys /dev/"${devname}"n1 |
+ grep -o 'nvme-subsys[0-9]*' | head -1)
+
+ echo "${iopolicy}" > /sys/class/nvme-subsystem/"${subname}"/iopolicy
+ cat /sys/class/nvme-subsystem/"${subname}"/iopolicy
+
+ echo "fio randread /dev/${devname}n1"
+ fio --name=global --direct=1 --norandommap --randrepeat=0 --ioengine=libaio \
+ --thread=1 --blocksize=4k --runtime=10 --time_based --rw=randread --numjobs=4 \
+ --iodepth=256 --group_reporting --size=100% --name=libaio_4_256_4k_randread \
+ --size=4m --filename=/dev/"${devname}"n1
+
+ sleep 1
+
+ echo "fio randwrite /dev/${devname}n1"
+ fio --name=global --direct=1 --norandommap --randrepeat=0 --ioengine=libaio \
+ --thread=1 --blocksize=4k --runtime=10 --time_based --rw=randwrite --numjobs=4 \
+ --iodepth=256 --group_reporting --size=100% --name=libaio_4_256_4k_randwrite \
+ --size=4m --filename=/dev/"${devname}"n1
+
+ nvme flush /dev/"${devname}"n1
+
+ sleep 1
+}
+
+init
+trap cleanup EXIT
+
+dd if=/dev/zero of=/tmp/test.raw bs=1M count=0 seek=512
+losetup /dev/loop100 /tmp/test.raw
+
+run_test()
+{
+ export trtype nqn ns port trsvcid
+ export iopolicy
+
+ if ! ip netns exec "$ns1" bash <<- EOF
+ $(declare -f run_target)
+ run_target
+ exit \$?
+ EOF
+ then
+ ret="${KSFT_FAIL}"
+ fi
+
+ if ! ip netns exec "$ns2" bash <<- EOF
+ $(declare -f run_host)
+ run_host
+ exit \$?
+ EOF
+ then
+ ret="${KSFT_FAIL}"
+ fi
+}
+
+run_test "$@"
+
+mptcp_lib_result_print_all_tap
+exit "$ret"
--
2.51.0
* Re: [RFC mptcp-next v6 0/7] NVME over MPTCP
2026-03-30 9:43 [RFC mptcp-next v6 0/7] NVME over MPTCP Geliang Tang
` (6 preceding siblings ...)
2026-03-30 9:43 ` [RFC mptcp-next v6 7/7] selftests: mptcp: add NVMe over MPTCP test Geliang Tang
@ 2026-03-30 10:06 ` MPTCP CI
2026-03-30 11:38 ` MPTCP CI
8 siblings, 0 replies; 10+ messages in thread
From: MPTCP CI @ 2026-03-30 10:06 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp
Hi Geliang,
Thank you for your modifications, that's great!
But sadly, our CI spotted some issues with it when trying to build it.
You can find more details there:
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/23738944865
Status: failure
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/6cb584e4f7f0
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1074466
Feel free to reply to this email if you cannot access logs, if you need
some support to fix the error, if this doesn't seem to be caused by your
modifications or if the error is a false positive one.
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
* Re: [RFC mptcp-next v6 0/7] NVME over MPTCP
2026-03-30 9:43 [RFC mptcp-next v6 0/7] NVME over MPTCP Geliang Tang
` (7 preceding siblings ...)
2026-03-30 10:06 ` [RFC mptcp-next v6 0/7] NVME over MPTCP MPTCP CI
@ 2026-03-30 11:38 ` MPTCP CI
8 siblings, 0 replies; 10+ messages in thread
From: MPTCP CI @ 2026-03-30 11:38 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp
Hi Geliang,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 2 failed test(s): packetdrill_dss selftest_diag 🔴
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/23738944825
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/6cb584e4f7f0
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1074466
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts already made to have a stable
test suite when executed on a public CI like this one, it is possible
that some reported issues are not due to your modifications. Still, do
not hesitate to help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)