* [LSF/MM/BPF TOPIC] NVMe over MPTCP: Multi-Fold Acceleration for NVMe over TCP in Multi-NIC Environments
@ 2026-01-29  4:13 Geliang Tang
  2026-02-25  5:57 ` Ming Lei
  2026-02-25 15:07 ` Nilay Shroff
  0 siblings, 2 replies; 10+ messages in thread
From: Geliang Tang @ 2026-01-29  4:13 UTC (permalink / raw)
  To: lsf-pc, linux-nvme
  Cc: mptcp, Matthieu Baerts, Mat Martineau, Paolo Abeni,
	Hannes Reinecke

As one of the MPTCP upstream developers, I have recently been working
on adding MPTCP support to 'NVMe over TCP'. In my tests, this approach
achieves a multi-fold performance improvement over standard TCP. The
implementation and testing phases are largely complete. The code is
currently in the RFC stage and has undergone several rounds of
discussion and iteration on the MPTCP mailing list [1]. It will be sent
to the NVMe mailing list shortly.

1. Introduction to MPTCP

Multipath TCP (MPTCP), standardized in RFC 8684, represents a major
evolution of the TCP protocol. It enables a single transport connection
to utilize multiple network paths simultaneously, providing benefits in
redundancy, resilience, and bandwidth aggregation. Since its
introduction in Linux kernel v5.6, it has become a key technology for
modern networking, particularly in multi-NIC environments.

On a supported system such as Linux, an MPTCP socket is created by
specifying the IPPROTO_MPTCP protocol in the socket() system call:

	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

This creates a socket that appears as a standard TCP socket to the
application but uses the MPTCP protocol stack underneath.
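
As a rough user-space illustration of the socket() call above, the
sketch below tries MPTCP first and falls back to plain TCP when the
kernel refuses the protocol. open_stream_socket() is a hypothetical
helper name, not an existing API, and the fallback policy is an
assumption made for the example rather than part of the proposal.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* IPPROTO_MPTCP is 262 in the Linux UAPI headers; older libc headers
 * may not define it yet. */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/*
 * Hypothetical helper (not part of any library): try to create an
 * MPTCP socket and fall back to plain TCP if that fails, e.g. because
 * MPTCP is not built in or is disabled via net.mptcp.enabled.
 * Returns a file descriptor, or -1 with errno set by the TCP attempt.
 * If used_mptcp is non-NULL, it is set to 1 when MPTCP was used and
 * to 0 otherwise.
 */
static int open_stream_socket(int family, int *used_mptcp)
{
	int fd;

	assert(family == AF_INET || family == AF_INET6);

	fd = socket(family, SOCK_STREAM, IPPROTO_MPTCP);
	if (used_mptcp)
		*used_mptcp = (fd >= 0);
	if (fd >= 0)
		return fd;

	/* Fallback: ordinary TCP socket with the same family and type. */
	return socket(family, SOCK_STREAM, IPPROTO_TCP);
}
```

A caller would use the returned descriptor exactly like a TCP socket
(connect(), send(), recv(), close()); only the protocol argument
differs.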

For more details, please visit the project website: https://mptcp.dev.

2. Implementation

'NVMe over TCP' establishes multiple TCP connections between the target
and host for data transfer. This includes one admin queue connection
for management traffic and multiple I/O queue connections for data
traffic, with the number typically scaling with available CPU cores.
While these multiple TCP connections (using the same IP address but
different ports) help distribute computational load across CPUs, all
data traffic still flows through a single network interface card (NIC),
even in multi-NIC environments.

The 'NVMe over MPTCP' solution enhances 'NVMe over TCP' by replacing
the multiple TCP connections with multiple MPTCP connections, leaving
other mechanisms unchanged. Internally, each MPTCP connection can
establish multiple subflows based on the number of configured NICs.
This distributes data traffic across all available NICs, thereby
increasing aggregate transmission speed.

Therefore, the primary change required is to modify the protocol
parameter from IPPROTO_TCP to IPPROTO_MPTCP when creating sockets on
both the target and host sides. The existing calls are:

	Target side:

	sock_create(port->addr.ss_family, SOCK_STREAM,
			IPPROTO_TCP, &port->sock);

	Host side:

	sock_create_kern(current->nsproxy->net_ns,
			ctrl->addr.ss_family, SOCK_STREAM,
			IPPROTO_TCP, &queue->sock);

A new NVMe transport type, named NVMF_TRTYPE_MPTCP (suggested by Hannes
Reinecke), has been introduced to determine whether to create a TCP or
MPTCP socket:

	Target side:

	if (nport->disc_addr.trtype == NVMF_TRTYPE_MPTCP)
		proto = IPPROTO_MPTCP;

	Host side:

	if (!strcmp(ctrl->ctrl.opts->transport, "mptcp"))
		proto = IPPROTO_MPTCP;
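
The selection logic above can be modelled outside the kernel as a small
function that maps a transport name to a protocol number. This is only
a user-space sketch: nvme_sock_proto() is a hypothetical name, and the
RFC patches may structure the check differently.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Fallback for libc headers that predate MPTCP (Linux value: 262). */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/*
 * Hypothetical helper mirroring the host-side check above: the
 * "mptcp" transport selects IPPROTO_MPTCP, and every other transport
 * name keeps the existing IPPROTO_TCP behaviour.
 */
static int nvme_sock_proto(const char *transport)
{
	assert(transport != NULL);

	if (!strcmp(transport, "mptcp"))
		return IPPROTO_MPTCP;
	return IPPROTO_TCP;
}
```

The returned value would then be passed as the protocol argument of the
socket creation call, so the rest of the connection setup stays
untouched.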

3. Performance Benefits

This new feature has been evaluated in different environments:

I conducted 'NVMe over MPTCP' tests between two PCs, each equipped with
two Gigabit NICs and directly connected via Ethernet cables. Using
'NVMe over TCP', the fio benchmark showed a speed of approximately 100
MiB/s. In contrast, 'NVMe over MPTCP' achieved about 200 MiB/s with
fio, doubling the throughput.

In a virtual machine test environment simulating four NICs on both
sides, 'NVMe over MPTCP' delivered bandwidth up to four times that of
standard TCP.

4. Configuration

To achieve the described multi-fold acceleration benefits, both the
target and host sides must be deployed in multi-NIC environments with
properly configured MPTCP endpoints. The target side should use the
'signal' flag for its endpoints, while the host side should use the
'subflow' flag.

	Target side:

	# ip mptcp endpoint add 192.168.1.2 id 2 dev enp3s0f1 signal
	# echo mptcp > /sys/kernel/config/nvmet/ports/1234/addr_trtype

	Host side:

	# ip mptcp endpoint add 192.168.1.4 id 2 dev enp1s0f1 subflow
	# nvme discover -t mptcp ...
	# nvme connect -t mptcp ...

5. Dependencies

The modifications in the NVMe subsystem are minimal. Most of the code
change is on the MPTCP side, to implement interfaces to use MPTCP from
the kernel space, similar to what is done today with TCP.

'NVMe over TCP' uses the read_sock interface for data receiving.
Consequently, the read_sock interface for MPTCP has been implemented
('implement mptcp read_sock' series [2]), which is under review on the
MPTCP mailing list.

As 'NVMe over TCP' can utilize TLS for encryption, KTLS support for
MPTCP has also been added ('MPTCP KTLS support' series [3]), which is
currently in the RFC stage. TLS mode for 'NVMe over MPTCP' has been
successfully validated.

Corresponding updates are also required in user-space libraries and
tools, including libnvme [4], nvme-cli [5], and ktls-utils [6], to add
MPTCP support.

6. Discussion

The current approach is to define a new transport type,
NVMF_TRTYPE_MPTCP, but I understand that this requires registering a
new protocol number with NVMexpress.org. Hannes Reinecke also suggested
declaring MPTCP as a TCP 'variant', but I found some drawbacks with
that approach. I would like to discuss them and find possible
solutions.

I would also like to discuss how to incorporate MPTCP support into the
NVMe protocol specifications. I lack experience in modifying these
specifications and would appreciate guidance and assistance from the
NVMe community.

Thanks,
-Geliang

[1]
NVME over MPTCP
https://patchwork.kernel.org/project/mptcp/cover/cover.1764152990.git.tanggeliang@kylinos.cn/
[2]
implement mptcp read_sock
https://patchwork.kernel.org/project/mptcp/cover/cover.1765023923.git.tanggeliang@kylinos.cn/
[3]
MPTCP KTLS support
https://patchwork.kernel.org/project/mptcp/cover/cover.1768294706.git.tanggeliang@kylinos.cn/
[4]
libnvme: add mptcp trtype
https://patchwork.kernel.org/project/mptcp/patch/99f6e63b5c9677f29a9bc8cdd87b2064b258435f.1764206766.git.tanggeliang@kylinos.cn/
[5]
fabrics: add mptcp support
https://github.com/linux-nvme/nvme-cli/commit/f468531d0592ad22b71760d883409363b1f8a9d6
[6]
add mptcp support
https://github.com/oracle/ktls-utils/commit/4a45e486c65be986ef349ed10b0fc9bd5dbf107d




* Re: [LSF/MM/BPF TOPIC] NVMe over MPTCP: Multi-Fold Acceleration for NVMe over TCP in Multi-NIC Environments
  2026-01-29  4:13 [LSF/MM/BPF TOPIC] NVMe over MPTCP: Multi-Fold Acceleration for NVMe over TCP in Multi-NIC Environments Geliang Tang
@ 2026-02-25  5:57 ` Ming Lei
  2026-02-26  9:44   ` Geliang Tang
  2026-02-25 15:07 ` Nilay Shroff
  1 sibling, 1 reply; 10+ messages in thread
From: Ming Lei @ 2026-02-25  5:57 UTC (permalink / raw)
  To: Geliang Tang
  Cc: lsf-pc, linux-nvme, mptcp, Matthieu Baerts, Mat Martineau,
	Paolo Abeni, Hannes Reinecke

Hi Geliang,

This looks like an interesting topic!

On Thu, Jan 29, 2026 at 12:13:25PM +0800, Geliang Tang wrote:
> As one of the MPTCP upstream developers, I'm recently working on adding
> MPTCP support to 'NVMe over TCP'. This approach achieves a multi-fold
> performance improvement over using standard TCP. The implementation and
> testing phases are largely complete. The code is currently in the RFC
> stage and has undergone several rounds of discussion and iteration on
> the MPTCP mailing list [1]. It will be sent to the NVMe mailing list
> shortly.
> 
> 1. Introduction to MPTCP
> 
> Multipath TCP (MPTCP), standardized in RFC 8684, represents a major
> evolution of the TCP protocol. It enables a single transport connection
> to utilize multiple network paths simultaneously, providing benefits in
> redundancy, resilience, and bandwidth aggregation. Since its
> introduction in Linux kernel v5.6, it has become a key technology for
> modern networking, particularly in multi-NIC environments.
> 
> On a supported system such as Linux, an MPTCP socket is created by
> specifying the IPPROTO_MPTCP protocol in the socket() system call:
> 
> 	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
> 
> This creates a socket that appears as a standard TCP socket to the
> application but uses the MPTCP protocol stack underneath.
> 
> For more details, please visit the project website: https://mptcp.dev.
> 
> 2. Implementation
> 
> 'NVMe over TCP' establishes multiple TCP connections between the target
> and host for data transfer. This includes one admin queue connection
> for management traffic and multiple I/O queue connections for data
> traffic, with the number typically scaling with available CPU cores.
> While these multiple TCP connections (using the same IP address but
> different ports) help distribute computational load across CPUs, all
> data traffic still flows through a single network interface card (NIC),
> even in multi-NIC environments.
> 
> The 'NVMe over MPTCP' solution enhances 'NVMe over TCP' by replacing
> the multiple TCP connections with multiple MPTCP connections, leaving
> other mechanisms unchanged. Internally, each MPTCP connection can
> establish multiple subflows based on the number of configured NICs.
> This distributes data traffic across all available NICs, thereby
> increasing aggregate transmission speed.

NVMe also supports multipath, which can apply load balancing or a
similar algorithm to make full use of the available network links and
bandwidth.

Maybe you can compare MPTCP with NVMe multipath from this viewpoint.



Thanks,
Ming




* Re: [LSF/MM/BPF TOPIC] NVMe over MPTCP: Multi-Fold Acceleration for NVMe over TCP in Multi-NIC Environments
  2026-01-29  4:13 [LSF/MM/BPF TOPIC] NVMe over MPTCP: Multi-Fold Acceleration for NVMe over TCP in Multi-NIC Environments Geliang Tang
  2026-02-25  5:57 ` Ming Lei
@ 2026-02-25 15:07 ` Nilay Shroff
  2026-02-26  9:54   ` Geliang Tang
  1 sibling, 1 reply; 10+ messages in thread
From: Nilay Shroff @ 2026-02-25 15:07 UTC (permalink / raw)
  To: Geliang Tang, lsf-pc, linux-nvme
  Cc: mptcp, Matthieu Baerts, Mat Martineau, Paolo Abeni,
	Hannes Reinecke



On 1/29/26 9:43 AM, Geliang Tang wrote:
> 3. Performance Benefits
> 
> This new feature has been evaluated in different environments:
> 
> I conducted 'NVMe over MPTCP' tests between two PCs, each equipped with
> two Gigabit NICs and directly connected via Ethernet cables. Using
> 'NVMe over TCP', the fio benchmark showed a speed of approximately 100
> MiB/s. In contrast, 'NVMe over MPTCP' achieved about 200 MiB/s with
> fio, doubling the throughput.
> 
> In a virtual machine test environment simulating four NICs on both
> sides, 'NVMe over MPTCP' delivered bandwidth up to four times that of
> standard TCP.

This is interesting. Did you try using an NVMe multipath iopolicy other
than the default numa policy? Assuming both the host and target are multihomed,
configuring round-robin or queue-depth may provide performance comparable
to what you are seeing with MPTCP.

I think MPTCP distributes traffic using transport-level metrics such as
RTT, cwnd, and packet loss, whereas the NVMe multipath layer makes decisions
based on ANA state, queue depth, and NUMA locality. In a setup with multiple
active paths, switching the iopolicy from numa to round-robin or queue-depth
could improve load distribution across controllers and thus improve performance.

IMO, it would be useful to test with those policies and compare the results
against the MPTCP setup.

Thanks,
--Nilay



* Re: [LSF/MM/BPF TOPIC] NVMe over MPTCP: Multi-Fold Acceleration for NVMe over TCP in Multi-NIC Environments
  2026-02-25  5:57 ` Ming Lei
@ 2026-02-26  9:44   ` Geliang Tang
  0 siblings, 0 replies; 10+ messages in thread
From: Geliang Tang @ 2026-02-26  9:44 UTC (permalink / raw)
  To: Ming Lei
  Cc: lsf-pc, linux-nvme, mptcp, Matthieu Baerts, Mat Martineau,
	Paolo Abeni, Hannes Reinecke

Hi Ming,

On Wed, 2026-02-25 at 13:57 +0800, Ming Lei wrote:
> Hi Geliang,
> 
> Looks one interesting topic!

Thanks for your reply.

> 
> On Thu, Jan 29, 2026 at 12:13:25PM +0800, Geliang Tang wrote:
> > As one of the MPTCP upstream developers, I'm recently working on
> > adding
> > MPTCP support to 'NVMe over TCP'. This approach achieves a multi-
> > fold
> > performance improvement over using standard TCP. The implementation
> > and
> > testing phases are largely complete. The code is currently in the
> > RFC
> > stage and has undergone several rounds of discussion and iteration
> > on
> > the MPTCP mailing list [1]. It will be sent to the NVMe mailing
> > list
> > shortly.
> > 
> > 1. Introduction to MPTCP
> > 
> > Multipath TCP (MPTCP), standardized in RFC 8684, represents a major
> > evolution of the TCP protocol. It enables a single transport
> > connection
> > to utilize multiple network paths simultaneously, providing
> > benefits in
> > redundancy, resilience, and bandwidth aggregation. Since its
> > introduction in Linux kernel v5.6, it has become a key technology
> > for
> > modern networking, particularly in multi-NIC environments.
> > 
> > On a supported system such as Linux, an MPTCP socket is created by
> > specifying the IPPROTO_MPTCP protocol in the socket() system call:
> > 
> > 	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
> > 
> > This creates a socket that appears as a standard TCP socket to the
> > application but uses the MPTCP protocol stack underneath.
> > 
> > For more details, please visit the project website:
> > https://mptcp.dev.
> > 
> > 2. Implementation
> > 
> > 'NVMe over TCP' establishes multiple TCP connections between the
> > target
> > and host for data transfer. This includes one admin queue
> > connection
> > for management traffic and multiple I/O queue connections for data
> > traffic, with the number typically scaling with available CPU
> > cores.
> > While these multiple TCP connections (using the same IP address but
> > different ports) help distribute computational load across CPUs,
> > all
> > data traffic still flows through a single network interface card
> > (NIC),
> > even in multi-NIC environments.
> > 
> > The 'NVMe over MPTCP' solution enhances 'NVMe over TCP' by
> > replacing
> > the multiple TCP connections with multiple MPTCP connections,
> > leaving
> > other mechanisms unchanged. Internally, each MPTCP connection can
> > establish multiple subflows based on the number of configured NICs.
> > This distributes data traffic across all available NICs, thereby
> > increasing aggregate transmission speed.
> 
> NVMe supports multipath, which can apply load balance or sort of
> algorithm
> to maximize network link/bandwidth too.
> 
> Maybe you can compare mptcp with multipath in this viewpoint.

This is indeed worth comparing. Although they work at different layers,
their goals are similar. I'll compare them in a follow-up and get back
to you.

Thanks,
-Geliang

> 
> 
> 
> Thanks,
> Ming
> 



* Re: [LSF/MM/BPF TOPIC] NVMe over MPTCP: Multi-Fold Acceleration for NVMe over TCP in Multi-NIC Environments
  2026-02-25 15:07 ` Nilay Shroff
@ 2026-02-26  9:54   ` Geliang Tang
  2026-03-05  4:30     ` Geliang Tang
  0 siblings, 1 reply; 10+ messages in thread
From: Geliang Tang @ 2026-02-26  9:54 UTC (permalink / raw)
  To: Nilay Shroff, lsf-pc, linux-nvme
  Cc: mptcp, Matthieu Baerts, Mat Martineau, Paolo Abeni,
	Hannes Reinecke

Hi Nilay,

Thanks for your reply.

On Wed, 2026-02-25 at 20:37 +0530, Nilay Shroff wrote:
> 
> 
> On 1/29/26 9:43 AM, Geliang Tang wrote:
> > 3. Performance Benefits
> > 
> > This new feature has been evaluated in different environments:
> > 
> > I conducted 'NVMe over MPTCP' tests between two PCs, each equipped
> > with
> > two Gigabit NICs and directly connected via Ethernet cables. Using
> > 'NVMe over TCP', the fio benchmark showed a speed of approximately
> > 100
> > MiB/s. In contrast, 'NVMe over MPTCP' achieved about 200 MiB/s with
> > fio, doubling the throughput.
> > 
> > In a virtual machine test environment simulating four NICs on both
> > sides, 'NVMe over MPTCP' delivered bandwidth up to four times that
> > of
> > standard TCP.
> 
> This is interesting. Did you try using an NVMe multipath iopolicy
> other
> than the default numa policy? Assuming both the host and target are
> multihomed,
> configuring round-robin or queue-depth may provide performance
> comparable
> to what you are seeing with MPTCP.
> 
> I think MPTCP shall distribute traffic using transport-level metrics
> such as
> RTT, cwnd, and packet loss, whereas the NVMe multipath layer makes
> decisions
> based on ANA state, queue depth, and NUMA locality. In a setup with
> multiple
> active paths, switching the iopolicy from numa to round-robin or
> queue-depth
> could improve load distribution across controllers and thus improve
> performance.
> 
> IMO, it would be useful to test with those policies and compare the
> results
> against the MPTCP setup.

Ming Lei also made a similar comment. In my experiments, I didn't set
the multipath iopolicy, so I was using the default numa policy. In the
follow-up, I'll adjust it to round-robin or queue-depth and rerun the
experiments. I'll share the results in this email thread.

Thanks,
-Geliang

> 
> Thanks,
> --Nilay



* Re: [LSF/MM/BPF TOPIC] NVMe over MPTCP: Multi-Fold Acceleration for NVMe over TCP in Multi-NIC Environments
  2026-02-26  9:54   ` Geliang Tang
@ 2026-03-05  4:30     ` Geliang Tang
  2026-05-13 10:04       ` Geliang Tang
  0 siblings, 1 reply; 10+ messages in thread
From: Geliang Tang @ 2026-03-05  4:30 UTC (permalink / raw)
  To: Nilay Shroff, lsf-pc, linux-nvme, Ming Lei, Javier González
  Cc: mptcp, Matthieu Baerts, Mat Martineau, Paolo Abeni,
	Hannes Reinecke

Hi Nilay, Ming,

Thank you again for your interest in NVMe over MPTCP.

On Thu, 2026-02-26 at 17:54 +0800, Geliang Tang wrote:
> Hi Nilay,
> 
> Thanks for your reply.
> 
> On Wed, 2026-02-25 at 20:37 +0530, Nilay Shroff wrote:
> > 
> > 
> > On 1/29/26 9:43 AM, Geliang Tang wrote:
> > > 3. Performance Benefits
> > > 
> > > This new feature has been evaluated in different environments:
> > > 
> > > I conducted 'NVMe over MPTCP' tests between two PCs, each
> > > equipped
> > > with
> > > two Gigabit NICs and directly connected via Ethernet cables.
> > > Using
> > > 'NVMe over TCP', the fio benchmark showed a speed of
> > > approximately
> > > 100
> > > MiB/s. In contrast, 'NVMe over MPTCP' achieved about 200 MiB/s
> > > with
> > > fio, doubling the throughput.
> > > 
> > > In a virtual machine test environment simulating four NICs on
> > > both
> > > sides, 'NVMe over MPTCP' delivered bandwidth up to four times
> > > that
> > > of
> > > standard TCP.
> > 
> > This is interesting. Did you try using an NVMe multipath iopolicy
> > other
> > than the default numa policy? Assuming both the host and target are
> > multihomed,
> > configuring round-robin or queue-depth may provide performance
> > comparable
> > to what you are seeing with MPTCP.
> > 
> > I think MPTCP shall distribute traffic using transport-level
> > metrics
> > such as
> > RTT, cwnd, and packet loss, whereas the NVMe multipath layer makes
> > decisions
> > based on ANA state, queue depth, and NUMA locality. In a setup with
> > multiple
> > active paths, switching the iopolicy from numa to round-robin or
> > queue-depth
> > could improve load distribution across controllers and thus improve
> > performance.
> > 
> > IMO, it would be useful to test with those policies and compare the
> > results
> > against the MPTCP setup.
> 
> Ming Lei also made a similar comment. In my experiments, I didn't set
> the multipath iopolicy, so I was using the default numa policy. In
> the
> follow-up, I'll adjust it to round-robin or queue-depth and rerun the
> experiments. I'll share the results in this email thread.

Based on your feedback, I have added iopolicy support to the NVMe over
MPTCP selftest script (see patch 8 in [1]). We can set the iopolicy to
round-robin like this:

 # ./mptcp_nvme.sh mptcp round-robin

This demonstrates that "NVMe over MPTCP" and "NVMe multipath" can work
simultaneously without conflict.
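
For readers who want to switch the policy without the selftest script:
on Linux the NVMe multipath I/O policy is exposed as an "iopolicy"
sysfs attribute of the NVMe subsystem, for example
/sys/class/nvme-subsystem/nvme-subsys0/iopolicy (the subsystem index 0
is an assumption and varies per system). The sketch below writes a
policy name to such an attribute. set_iopolicy() is a hypothetical
helper, and it only accepts the three policy names discussed in this
thread; how mptcp_nvme.sh sets the policy internally is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write an I/O policy name to the given sysfs
 * attribute path. Returns 0 on success, -1 if the policy name is not
 * one of the three discussed in this thread or if the write fails.
 */
static int set_iopolicy(const char *attr_path, const char *policy)
{
	static const char *const known[] = {
		"numa", "round-robin", "queue-depth"
	};
	size_t i;
	int ok = 0;
	FILE *f;

	assert(attr_path != NULL && policy != NULL);

	/* Reject anything that is not a known policy name. */
	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++)
		if (!strcmp(policy, known[i]))
			ok = 1;
	if (!ok)
		return -1;

	f = fopen(attr_path, "w");
	if (!f)
		return -1;
	if (fputs(policy, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffer; a failed flush is an error too. */
	return fclose(f) == 0 ? 0 : -1;
}
```

Writing to the real sysfs attribute requires root privileges and a
kernel with native NVMe multipath enabled.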

Using this test script, I compared three I/O policies: numa,
round-robin, and queue-depth. The fio results were very similar across
all three. It's possible that this test environment doesn't fully
expose the differences between I/O policies. I will follow up with
further tests.

Thanks,
-Geliang

[1]
NVME over MPTCP, v4
https://patchwork.kernel.org/project/mptcp/cover/cover.1772683110.git.tanggeliang@kylinos.cn/

> 
> Thanks,
> -Geliang
> 
> > 
> > Thanks,
> > --Nilay



* Re: [LSF/MM/BPF TOPIC] NVMe over MPTCP: Multi-Fold Acceleration for NVMe over TCP in Multi-NIC Environments
  2026-03-05  4:30     ` Geliang Tang
@ 2026-05-13 10:04       ` Geliang Tang
  2026-05-19  7:31         ` Geliang Tang
  0 siblings, 1 reply; 10+ messages in thread
From: Geliang Tang @ 2026-05-13 10:04 UTC (permalink / raw)
  To: lsf-pc, Javier González, Nilay Shroff, Ming Lei,
	Matthieu Baerts, Mat Martineau, Paolo Abeni, Hannes Reinecke,
	John Meneghini, Randy Jennings
  Cc: mptcp, linux-nvme


Hello everyone,

Thank you for your interest in NVMe over MPTCP. I have attached the
slides from the presentation to this email.

Please note that the demo in the slides only configured a single NVMe
path. I will follow up here with MPTCP performance test results for
configurations with several NVMe multipath paths.

Thanks,
-Geliang


[-- Attachment #2: lsfmmbpf2026-nvme-over-mptcp.pdf --]
[-- Type: application/pdf, Size: 130018 bytes --]


* Re: [LSF/MM/BPF TOPIC] NVMe over MPTCP: Multi-Fold Acceleration for NVMe over TCP in Multi-NIC Environments
  2026-05-13 10:04       ` Geliang Tang
@ 2026-05-19  7:31         ` Geliang Tang
  2026-05-26 10:16           ` Geliang Tang
  2026-05-28 15:59           ` Randy Jennings
  0 siblings, 2 replies; 10+ messages in thread
From: Geliang Tang @ 2026-05-19  7:31 UTC (permalink / raw)
  To: lsf-pc, Javier González, Nilay Shroff, Ming Lei,
	Matthieu Baerts, Mat Martineau, Paolo Abeni, Hannes Reinecke,
	John Meneghini, Randy Jennings
  Cc: mptcp, linux-nvme


Hi,

The performance test results of MPTCP under several NVMe multipath
settings are now ready.

On Wed, 2026-05-13 at 18:04 +0800, Geliang Tang wrote:
> Hello everyone,
> 
> Thank you for your interest in NVMe over MPTCP. I have attached the
> slides from the presentation to this email.
> 
> Please note that the demo in the slides only configured a single NVMe
> multipath. Subsequently, I will post the MPTCP performance test
> results
> under several NVMe multipaths here.

To test the performance of TCP and MPTCP under NVMe multipath, I added
two more arguments, "path" and "loss", to the original NVMe over MPTCP
selftest script. The latest code is available at [1].

The script now accepts the following four arguments:

  mptcp_nvme.sh [trtype] [path] [iopolicy] [loss]

  trtype   Transport type (tcp|mptcp) - default: mptcp
  path     Number of multipath (1-4) - default: 1
  iopolicy I/O policy (numa|round-robin|queue-depth) - default: numa
  loss     Enable packet loss (0|1) - default: 0

The first argument is the transport type. The second argument, "path",
specifies how many NVMe multipath paths to create. The third argument
is the I/O policy. The fourth argument controls whether the network
environment is lossy. When set to 0, each NIC is only rate-limited to
125 MB/s (tc arguments: rate 1000mbit). When set to 1, each NIC keeps
the same 125 MB/s rate limit and additionally experiences a 5 ms delay
and 0.5% packet loss (tc arguments: rate 1000mbit delay 5ms loss 0.5%).
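
The 125 MB/s figure follows directly from the tc rate: 1000 Mbit/s
divided by 8 bits per byte. Since fio reports MiB/s first, it helps to
convert as well: 125 MB/s is about 119.2 MiB/s per NIC, so four NICs
top out at roughly 477 MiB/s before protocol overhead. The helper names
below are made up for this sketch.

```c
#include <assert.h>

/* Convert a tc "rate" in Mbit/s (decimal megabits) to MB/s. */
static double mbit_to_mb_per_sec(double mbit)
{
	assert(mbit >= 0.0);
	return mbit / 8.0;
}

/* Convert a tc "rate" in Mbit/s to MiB/s (1 MiB = 1024 * 1024 bytes). */
static double mbit_to_mib_per_sec(double mbit)
{
	assert(mbit >= 0.0);
	return mbit * 1e6 / 8.0 / (1024.0 * 1024.0);
}

/* Upper bound for n equally rate-limited NICs, in MiB/s. */
static double aggregate_mib_per_sec(double mbit, int nics)
{
	assert(nics >= 0);
	return mbit_to_mib_per_sec(mbit) * nics;
}
```

These are ceilings on payload-carrying capacity, so measured fio
bandwidth is expected to land somewhat below them.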


First set of tests: lossless network, path=4, loss=0. The tc output is
as follows:

  qdisc netem 8031: root refcnt 25 limit 1000 rate 1Gbit
			seed 1626193586047356330

Lossless network, comparison between TCP and MPTCP using the "numa"
policy - MPTCP delivers about four times the bandwidth of TCP:

# ./mptcp_nvme.sh tcp 4 numa 0
   READ: bw=114MiB/s (119MB/s), 114MiB/s-114MiB/s (119MB/s-119MB/s),
			io=1200MiB (1259MB), run=10533-10533msec
  WRITE: bw=114MiB/s (119MB/s), 114MiB/s-114MiB/s (119MB/s-119MB/s),
			io=1203MiB (1261MB), run=10570-10570msec

# ./mptcp_nvme.sh mptcp 4 numa 0
   READ: bw=445MiB/s (467MB/s), 445MiB/s-445MiB/s (467MB/s-467MB/s),
			io=4512MiB (4731MB), run=10130-10130msec
  WRITE: bw=443MiB/s (465MB/s), 443MiB/s-443MiB/s (465MB/s-465MB/s),
			io=4504MiB (4723MB), run=10158-10158msec

Lossless network, comparison between TCP and MPTCP using the "round-
robin" policy - MPTCP and TCP show similar performance:

# ./mptcp_nvme.sh tcp 4 round-robin 0
   READ: bw=456MiB/s (478MB/s), 456MiB/s-456MiB/s (478MB/s-478MB/s),
			io=4683MiB (4910MB), run=10278-10278msec
  WRITE: bw=455MiB/s (477MB/s), 455MiB/s-455MiB/s (477MB/s-477MB/s),
			io=4660MiB (4887MB), run=10239-10239msec

# ./mptcp_nvme.sh mptcp 4 round-robin 0
   READ: bw=446MiB/s (467MB/s), 446MiB/s-446MiB/s (467MB/s-467MB/s),
			io=4565MiB (4786MB), run=10239-10239msec
  WRITE: bw=445MiB/s (467MB/s), 445MiB/s-445MiB/s (467MB/s-467MB/s),
			io=4575MiB (4797MB), run=10280-10280msec

Lossless network, comparison between TCP and MPTCP using the "queue-
depth" policy - MPTCP and TCP show similar performance:

# ./mptcp_nvme.sh tcp 4 queue-depth 0
   READ: bw=456MiB/s (478MB/s), 456MiB/s-456MiB/s (478MB/s-478MB/s),
			io=4632MiB (4857MB), run=10169-10169msec
  WRITE: bw=455MiB/s (477MB/s), 455MiB/s-455MiB/s (477MB/s-477MB/s),
			io=4666MiB (4893MB), run=10250-10250msec

# ./mptcp_nvme.sh mptcp 4 queue-depth 0
   READ: bw=446MiB/s (467MB/s), 446MiB/s-446MiB/s (467MB/s-467MB/s),
			io=4568MiB (4790MB), run=10249-10249msec
  WRITE: bw=445MiB/s (467MB/s), 445MiB/s-445MiB/s (467MB/s-467MB/s),
			io=4563MiB (4784MB), run=10245-10245msec


Second set of tests: lossy network, path=4, loss=1. The tc output is as
follows:

  qdisc netem 8051: root refcnt 25 limit 1000 delay 5ms loss 0.5%
			rate 1Gbit seed 14946049878654165618

Lossy network, comparison between TCP and MPTCP using the "round-robin"
policy - MPTCP delivers about four times the bandwidth of TCP:

# ./mptcp_nvme.sh tcp 4 round-robin 1
   READ: bw=106MiB/s (111MB/s), 106MiB/s-106MiB/s (111MB/s-111MB/s),
			io=1574MiB (1650MB), run=14906-14906msec
  WRITE: bw=98.5MiB/s (103MB/s), 98.5MiB/s-98.5MiB/s (103MB/s-103MB/s),
			io=1455MiB (1526MB), run=14770-14770msec

# ./mptcp_nvme.sh mptcp 4 round-robin 1
   READ: bw=426MiB/s (447MB/s), 426MiB/s-426MiB/s (447MB/s-447MB/s),
			io=4533MiB (4753MB), run=10637-10637msec
  WRITE: bw=428MiB/s (449MB/s), 428MiB/s-428MiB/s (449MB/s-449MB/s),
			io=4507MiB (4725MB), run=10522-10522msec

Lossy network, comparison between TCP and MPTCP using the "queue-depth"
policy - MPTCP delivers about 2.5 times (READ) to 3.2 times (WRITE) the
bandwidth of TCP:

# ./mptcp_nvme.sh tcp 4 queue-depth 1
   READ: bw=168MiB/s (176MB/s), 168MiB/s-168MiB/s (176MB/s-176MB/s),
			io=2179MiB (2285MB), run=12965-12965msec
  WRITE: bw=128MiB/s (134MB/s), 128MiB/s-128MiB/s (134MB/s-134MB/s),
			io=1590MiB (1667MB), run=12418-12418msec

# ./mptcp_nvme.sh mptcp 4 queue-depth 1
   READ: bw=425MiB/s (445MB/s), 425MiB/s-425MiB/s (445MB/s-445MB/s),
			io=4536MiB (4756MB), run=10677-10677msec
  WRITE: bw=414MiB/s (434MB/s), 414MiB/s-414MiB/s (434MB/s-434MB/s),
			io=4447MiB (4663MB), run=10733-10733msec


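Conclusion: on a lossless network, MPTCP achieves bandwidth aggregation
comparable to that of NVMe multipath with the round-robin or
queue-depth policy; on a lossy network (5 ms delay, 0.5% packet loss),
it sustains much higher throughput than NVMe multipath over plain TCP.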
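
To make the comparisons easier to check, the MPTCP/TCP ratio can be
recomputed from the READ bandwidths reported by fio in this message
(path=4 in every run). The struct and function names in this sketch are
made up for illustration; the numbers are copied from the fio summary
lines above.

```c
#include <assert.h>

/* One TCP vs MPTCP comparison; bandwidths are fio READ bw in MiB/s. */
struct bw_pair {
	const char *label;
	double tcp;
	double mptcp;
};

/* Values copied from the fio summaries in this message (path=4). */
static const struct bw_pair read_bw[] = {
	{ "lossless, numa",        114.0, 445.0 },
	{ "lossless, round-robin", 456.0, 446.0 },
	{ "lossless, queue-depth", 456.0, 446.0 },
	{ "lossy, round-robin",    106.0, 426.0 },
	{ "lossy, queue-depth",    168.0, 425.0 },
};

/* Ratio of MPTCP to TCP bandwidth for one comparison. */
static double speedup(const struct bw_pair *p)
{
	assert(p->tcp > 0.0);
	return p->mptcp / p->tcp;
}
```

Calling speedup() on each entry gives roughly 3.9, 0.98, 0.98, 4.0 and
2.5 in that order, so the gain under loss depends on the I/O policy
used for the TCP baseline.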

The full test results are in the attachment.

Thanks,
-Geliang

[1]
https://patchwork.kernel.org/project/mptcp/cover/cover.1779159524.git.tanggeliang@kylinos.cn/

> 
> Thanks,
> -Geliang
> 
> On Thu, 2026-03-05 at 12:30 +0800, Geliang Tang wrote:
> > Hi Nilay, Ming,
> > 
> > Thank you again for your interest in NVMe over MPTCP.
> > 
> > On Thu, 2026-02-26 at 17:54 +0800, Geliang Tang wrote:
> > > Hi Nilay,
> > > 
> > > Thanks for your reply.
> > > 
> > > On Wed, 2026-02-25 at 20:37 +0530, Nilay Shroff wrote:
> > > > 
> > > > 
> > > > On 1/29/26 9:43 AM, Geliang Tang wrote:
> > > > > 3. Performance Benefits
> > > > > 
> > > > > This new feature has been evaluated in different environments:
> > > > > 
> > > > > I conducted 'NVMe over MPTCP' tests between two PCs, each
> > > > > equipped with two Gigabit NICs and directly connected via
> > > > > Ethernet cables. Using 'NVMe over TCP', the fio benchmark showed
> > > > > a speed of approximately 100 MiB/s. In contrast, 'NVMe over
> > > > > MPTCP' achieved about 200 MiB/s with fio, doubling the
> > > > > throughput.
> > > > > 
> > > > > In a virtual machine test environment simulating four NICs on
> > > > > both sides, 'NVMe over MPTCP' delivered bandwidth up to four
> > > > > times that of standard TCP.
> > > > 
> > > > This is interesting. Did you try using an NVMe multipath iopolicy
> > > > other than the default numa policy? Assuming both the host and
> > > > target are multihomed, configuring round-robin or queue-depth may
> > > > provide performance comparable to what you are seeing with MPTCP.
> > > > 
> > > > I think MPTCP shall distribute traffic using transport-level
> > > > metrics such as RTT, cwnd, and packet loss, whereas the NVMe
> > > > multipath layer makes decisions based on ANA state, queue depth,
> > > > and NUMA locality. In a setup with multiple active paths, switching
> > > > the iopolicy from numa to round-robin or queue-depth could improve
> > > > load distribution across controllers and thus improve performance.
> > > > 
> > > > IMO, it would be useful to test with those policies and compare the
> > > > results against the MPTCP setup.
> > > 
> > > Ming Lei also made a similar comment. In my experiments, I didn't set
> > > the multipath iopolicy, so I was using the default numa policy. In
> > > the follow-up, I'll adjust it to round-robin or queue-depth and rerun
> > > the experiments. I'll share the results in this email thread.
> > 
> > Based on your feedback, I have added iopolicy support to the NVMe over
> > MPTCP selftest script (see patch 8 in [1]). We can set the iopolicy to
> > round-robin like this:
> > 
> >  # ./mptcp_nvme.sh mptcp round-robin
> > 
> > This demonstrates that "NVMe over MPTCP" and "NVMe multipath" can work
> > simultaneously without conflict.
> > 
> > Using this test script, I compared three I/O policies: numa,
> > round-robin, and queue-depth. The results for fio were very similar.
> > It's possible that this test environment doesn't fully reflect the
> > differences in I/O policies. I will continue to follow up with further
> > tests.
> > 
> > Thanks,
> > -Geliang
> > 
> > [1]
> > NVME over MPTCP, v4
> > https://patchwork.kernel.org/project/mptcp/cover/cover.1772683110.git.tanggeliang@kylinos.cn/
> > 
> > > 
> > > Thanks,
> > > -Geliang
> > > 
> > > > 
> > > > Thanks,
> > > > --Nilay

[-- Attachment #2: nvme-over-mptcp-multipath-tests.log --]
[-- Type: text/x-log, Size: 66748 bytes --]

tgl@ThinkBook:~/mptcp_net-next/mptcp-selftests$ sudo ./mptcp_nvme.sh tcp 4 numa 0
0+0 records in
0+0 records out
0 bytes copied, 0.000033553 s, 0.0 B/s
qdisc netem 8031: root refcnt 25 limit 1000 rate 1Gbit seed 1626193586047356330
qdisc netem 8032: root refcnt 25 limit 1000 rate 1Gbit seed 6042711589015947724
qdisc netem 8033: root refcnt 25 limit 1000 rate 1Gbit seed 17633132008389812116
qdisc netem 8034: root refcnt 25 limit 1000 rate 1Gbit seed 7224188079500556986
qdisc netem 8035: root refcnt 25 limit 1000 rate 1Gbit seed 9443293930724710079
qdisc netem 8036: root refcnt 25 limit 1000 rate 1Gbit seed 16373570367022413035
qdisc netem 8037: root refcnt 25 limit 1000 rate 1Gbit seed 13985178702777805862
qdisc netem 8038: root refcnt 25 limit 1000 rate 1Gbit seed 17161966827319017756
nvme discover -a 10.1.1.1

Discovery Log Number of Records 2, Generation counter 47
=====Discovery Log Entry 0======
trtype:  tcp
adrfam:  ipv4
subtype: current discovery subsystem
treq:    not specified, sq flow control disable supported
portid:  20316
trsvcid: 23502
subnqn:  nqn.2014-08.org.nvmexpress.discovery
traddr:  10.1.1.1
eflags:  none
sectype: none
=====Discovery Log Entry 1======
trtype:  tcp
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified, sq flow control disable supported
portid:  20316
trsvcid: 23502
subnqn:  nqn.2014-08.org.nvmexpress.tcpdev.6825.11970
traddr:  10.1.1.1
eflags:  none
sectype: none
Connecting to 10.1.1.1:23502
connecting to device: nvme1
Connecting to 10.1.2.1:23502
connecting to device: nvme2
Connecting to 10.1.3.1:23502
connecting to device: nvme3
Connecting to 10.1.4.1:23502
connecting to device: nvme4
nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            SS1Q24313Z2CD54R50B8 UMIS RPJYJ1T24MML1AWY                    0x1          1.02  TB /   1.02  TB    512   B +  0 B   1.0L06C1
/dev/nvme1n1          /dev/ng1n1            10da529f68f6534c815e Linux                                    0x1        536.87  MB / 536.87  MB    512   B +  0 B   7.1.0-rc
fio randread /dev/nvme1n1
libaio_4_256_128k_randread: (g=0): rw=randread, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [r(4)][100.0%][r=114MiB/s][r=913 IOPS][eta 00m:00s]
libaio_4_256_128k_randread: (groupid=0, jobs=4): err= 0: pid=6996: Tue May 19 14:28:47 2026
  read: IOPS=911, BW=114MiB/s (119MB/s)(1200MiB/10533msec)
    slat (usec): min=2, max=962951, avg=4180.26, stdev=24042.85
    clat (msec): min=88, max=3807, avg=1090.51, stdev=670.77
     lat (msec): min=88, max=3807, avg=1094.69, stdev=672.21
    clat percentiles (msec):
     |  1.00th=[  384],  5.00th=[  447], 10.00th=[  481], 20.00th=[  600],
     | 30.00th=[  718], 40.00th=[  835], 50.00th=[  894], 60.00th=[  969],
     | 70.00th=[ 1099], 80.00th=[ 1368], 90.00th=[ 2106], 95.00th=[ 2769],
     | 99.00th=[ 3306], 99.50th=[ 3473], 99.90th=[ 3742], 99.95th=[ 3809],
     | 99.99th=[ 3809]
   bw (  KiB/s): min=11264, max=253440, per=99.26%, avg=115826.65, stdev=17233.15, samples=76
   iops        : min=   88, max= 1980, avg=904.90, stdev=134.63, samples=76
  lat (msec)   : 100=0.06%, 250=0.33%, 500=11.35%, 750=20.27%, 1000=31.71%
  lat (msec)   : 2000=24.87%, >=2000=11.40%
  cpu          : usr=0.01%, sys=0.20%, ctx=10293, majf=0, minf=32772
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.3%, 16=0.7%, 32=1.3%, >=64=97.4%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=9602,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=114MiB/s (119MB/s), 114MiB/s-114MiB/s (119MB/s-119MB/s), io=1200MiB (1259MB), run=10533-10533msec

Disk stats (read/write):
  nvme1n1: ios=9467/0, sectors=2423552/0, merge=0/0, ticks=5060639/0, in_queue=5060639, util=99.12%
fio randwrite /dev/nvme1n1
libaio_4_256_128k_randwrite: (g=0): rw=randwrite, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [w(4)][100.0%][w=109MiB/s][w=874 IOPS][eta 00m:00s]
libaio_4_256_128k_randwrite: (groupid=0, jobs=4): err= 0: pid=7029: Tue May 19 14:28:59 2026
  write: IOPS=910, BW=114MiB/s (119MB/s)(1203MiB/10570msec); 0 zone resets
    slat (usec): min=2, max=1095.4k, avg=4161.24, stdev=28251.90
    clat (msec): min=176, max=9210, avg=1088.71, stdev=1244.47
     lat (msec): min=178, max=9210, avg=1092.87, stdev=1247.38
    clat percentiles (msec):
     |  1.00th=[  259],  5.00th=[  363], 10.00th=[  405], 20.00th=[  481],
     | 30.00th=[  531], 40.00th=[  609], 50.00th=[  701], 60.00th=[  835],
     | 70.00th=[ 1045], 80.00th=[ 1301], 90.00th=[ 2072], 95.00th=[ 2903],
     | 99.00th=[ 8087], 99.50th=[ 9194], 99.90th=[ 9194], 99.95th=[ 9194],
     | 99.99th=[ 9194]
   bw (  KiB/s): min=11008, max=305152, per=100.00%, avg=119677.01, stdev=22187.64, samples=74
   iops        : min=   86, max= 2384, avg=934.98, stdev=173.34, samples=74
  lat (msec)   : 250=0.85%, 500=23.29%, 750=30.67%, 1000=13.58%, 2000=20.92%
  lat (msec)   : >=2000=10.69%
  cpu          : usr=0.08%, sys=0.11%, ctx=6954, majf=0, minf=4
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.3%, 16=0.7%, 32=1.3%, >=64=97.4%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,9623,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=114MiB/s (119MB/s), 114MiB/s-114MiB/s (119MB/s-119MB/s), io=1203MiB (1261MB), run=10570-10570msec

Disk stats (read/write):
  nvme1n1: ios=56/9456, sectors=1984/2420736, merge=0/0, ticks=76/4946676, in_queue=4946752, util=100.00%
NVMe Flush: success

TAP version 13
1..1
ok 1 - mptcp_nvme: nvme over tcp test # time=25025ms
NQN:nqn.2014-08.org.nvmexpress.tcpdev.6825.11970 disconnected 4 controller(s)
/sys/kernel/config/nvmet /home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
/home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
tgl@ThinkBook:~/mptcp_net-next/mptcp-selftests$ sudo ./mptcp_nvme.sh mptcp 4 numa 0
0+0 records in
0+0 records out
0 bytes copied, 0.000035046 s, 0.0 B/s
qdisc netem 8039: root refcnt 25 limit 1000 rate 1Gbit seed 6085134754699925076
qdisc netem 803a: root refcnt 25 limit 1000 rate 1Gbit seed 450455490517112707
qdisc netem 803b: root refcnt 25 limit 1000 rate 1Gbit seed 264120141577159822
qdisc netem 803c: root refcnt 25 limit 1000 rate 1Gbit seed 14027700267729747694
qdisc netem 803d: root refcnt 25 limit 1000 rate 1Gbit seed 1021893091432102865
qdisc netem 803e: root refcnt 25 limit 1000 rate 1Gbit seed 9767710148348395904
qdisc netem 803f: root refcnt 25 limit 1000 rate 1Gbit seed 10286576419486170728
qdisc netem 8040: root refcnt 25 limit 1000 rate 1Gbit seed 12482312460578339362
nvme discover -a 10.1.1.1

Discovery Log Number of Records 2, Generation counter 56
=====Discovery Log Entry 0======
trtype:  unrecognized
adrfam:  ipv4
subtype: current discovery subsystem
treq:    not specified, sq flow control disable supported
portid:  24021
trsvcid: 5862
subnqn:  nqn.2014-08.org.nvmexpress.discovery
traddr:  10.1.1.1
eflags:  none
=====Discovery Log Entry 1======
trtype:  unrecognized
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified, sq flow control disable supported
portid:  24021
trsvcid: 5862
subnqn:  nqn.2014-08.org.nvmexpress.mptcpdev.7084.21368
traddr:  10.1.1.1
eflags:  none
Connecting to 10.1.1.1:5862
connecting to device: nvme1
Connecting to 10.1.2.1:5862
connecting to device: nvme2
Connecting to 10.1.3.1:5862
connecting to device: nvme3
Connecting to 10.1.4.1:5862
connecting to device: nvme4
nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            SS1Q24313Z2CD54R50B8 UMIS RPJYJ1T24MML1AWY                    0x1          1.02  TB /   1.02  TB    512   B +  0 B   1.0L06C1
/dev/nvme1n1          /dev/ng1n1            2641199e93532d5f5106 Linux                                    0x1        536.87  MB / 536.87  MB    512   B +  0 B   7.1.0-rc
fio randread /dev/nvme1n1
libaio_4_256_128k_randread: (g=0): rw=randread, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [r(4)][100.0%][r=442MiB/s][r=3532 IOPS][eta 00m:00s]
libaio_4_256_128k_randread: (groupid=0, jobs=4): err= 0: pid=7255: Tue May 19 14:29:20 2026
  read: IOPS=3563, BW=445MiB/s (467MB/s)(4512MiB/10130msec)
    slat (usec): min=2, max=385056, avg=1097.92, stdev=8410.04
    clat (msec): min=11, max=1294, avg=285.22, stdev=164.51
     lat (msec): min=38, max=1298, avg=286.32, stdev=164.88
    clat percentiles (msec):
     |  1.00th=[   73],  5.00th=[  104], 10.00th=[  125], 20.00th=[  153],
     | 30.00th=[  180], 40.00th=[  213], 50.00th=[  247], 60.00th=[  296],
     | 70.00th=[  342], 80.00th=[  393], 90.00th=[  485], 95.00th=[  584],
     | 99.00th=[  827], 99.50th=[ 1167], 99.90th=[ 1267], 99.95th=[ 1267],
     | 99.99th=[ 1301]
   bw (  KiB/s): min=119040, max=852992, per=98.43%, avg=448986.20, stdev=44471.69, samples=80
   iops        : min=  930, max= 6664, avg=3507.65, stdev=347.44, samples=80
  lat (msec)   : 20=0.01%, 50=0.04%, 100=4.16%, 250=46.61%, 500=40.23%
  lat (msec)   : 750=7.57%, 1000=0.65%, 2000=0.73%
  cpu          : usr=0.07%, sys=0.70%, ctx=4877, majf=0, minf=32772
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=36098,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=445MiB/s (467MB/s), 445MiB/s-445MiB/s (467MB/s-467MB/s), io=4512MiB (4731MB), run=10130-10130msec

Disk stats (read/write):
  nvme1n1: ios=35140/0, sectors=8995840/0, merge=0/0, ticks=5536502/0, in_queue=5536502, util=99.04%
fio randwrite /dev/nvme1n1
libaio_4_256_128k_randwrite: (g=0): rw=randwrite, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [w(4)][100.0%][w=443MiB/s][w=3541 IOPS][eta 00m:00s]
libaio_4_256_128k_randwrite: (groupid=0, jobs=4): err= 0: pid=7290: Tue May 19 14:29:31 2026
  write: IOPS=3547, BW=443MiB/s (465MB/s)(4504MiB/10158msec); 0 zone resets
    slat (usec): min=2, max=202481, avg=1024.27, stdev=6657.70
    clat (msec): min=31, max=1350, avg=286.57, stdev=153.23
     lat (msec): min=31, max=1350, avg=287.60, stdev=153.56
    clat percentiles (msec):
     |  1.00th=[   66],  5.00th=[  104], 10.00th=[  127], 20.00th=[  163],
     | 30.00th=[  192], 40.00th=[  224], 50.00th=[  253], 60.00th=[  292],
     | 70.00th=[  338], 80.00th=[  401], 90.00th=[  485], 95.00th=[  575],
     | 99.00th=[  776], 99.50th=[  877], 99.90th=[ 1099], 99.95th=[ 1099],
     | 99.99th=[ 1351]
   bw (  KiB/s): min=189952, max=765952, per=98.70%, avg=448125.70, stdev=39877.96, samples=80
   iops        : min= 1484, max= 5984, avg=3500.95, stdev=311.54, samples=80
  lat (msec)   : 50=0.27%, 100=4.29%, 250=44.62%, 500=41.75%, 750=7.88%
  lat (msec)   : 1000=0.85%, 2000=0.34%
  cpu          : usr=0.41%, sys=0.44%, ctx=9102, majf=0, minf=4
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,36032,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=443MiB/s (465MB/s), 443MiB/s-443MiB/s (465MB/s-465MB/s), io=4504MiB (4723MB), run=10158-10158msec

Disk stats (read/write):
  nvme1n1: ios=48/35948, sectors=1672/9202688, merge=0/0, ticks=7/6644720, in_queue=6644727, util=99.02%
NVMe Flush: success

TAP version 13
1..1
ok 1 - mptcp_nvme: nvme over mptcp test # time=24516ms
NQN:nqn.2014-08.org.nvmexpress.mptcpdev.7084.21368 disconnected 4 controller(s)
/sys/kernel/config/nvmet /home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
/home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
tgl@ThinkBook:~/mptcp_net-next/mptcp-selftests$ sudo ./mptcp_nvme.sh tcp 4 round-robin 0
0+0 records in
0+0 records out
0 bytes copied, 0.000031583 s, 0.0 B/s
qdisc netem 8041: root refcnt 25 limit 1000 rate 1Gbit seed 17645609376687379696
qdisc netem 8042: root refcnt 25 limit 1000 rate 1Gbit seed 17131351233718588600
qdisc netem 8043: root refcnt 25 limit 1000 rate 1Gbit seed 12336840797390620536
qdisc netem 8044: root refcnt 25 limit 1000 rate 1Gbit seed 3442025196063469443
qdisc netem 8045: root refcnt 25 limit 1000 rate 1Gbit seed 1220869316842653191
qdisc netem 8046: root refcnt 25 limit 1000 rate 1Gbit seed 5375117655844217989
qdisc netem 8047: root refcnt 25 limit 1000 rate 1Gbit seed 7113140317580801833
qdisc netem 8048: root refcnt 25 limit 1000 rate 1Gbit seed 8218613337378753343
nvme discover -a 10.1.1.1

Discovery Log Number of Records 2, Generation counter 65
=====Discovery Log Entry 0======
trtype:  tcp
adrfam:  ipv4
subtype: current discovery subsystem
treq:    not specified, sq flow control disable supported
portid:  26201
trsvcid: 22625
subnqn:  nqn.2014-08.org.nvmexpress.discovery
traddr:  10.1.1.1
eflags:  none
sectype: none
=====Discovery Log Entry 1======
trtype:  tcp
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified, sq flow control disable supported
portid:  26201
trsvcid: 22625
subnqn:  nqn.2014-08.org.nvmexpress.tcpdev.7486.4297
traddr:  10.1.1.1
eflags:  none
sectype: none
Connecting to 10.1.1.1:22625
connecting to device: nvme1
Connecting to 10.1.2.1:22625
connecting to device: nvme2
Connecting to 10.1.3.1:22625
connecting to device: nvme3
Connecting to 10.1.4.1:22625
connecting to device: nvme4
nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            SS1Q24313Z2CD54R50B8 UMIS RPJYJ1T24MML1AWY                    0x1          1.02  TB /   1.02  TB    512   B +  0 B   1.0L06C1
/dev/nvme1n1          /dev/ng1n1            10989b90c05a143e4117 Linux                                    0x1        536.87  MB / 536.87  MB    512   B +  0 B   7.1.0-rc
fio randread /dev/nvme1n1
libaio_4_256_128k_randread: (g=0): rw=randread, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [r(4)][100.0%][r=456MiB/s][r=3644 IOPS][eta 00m:00s]
libaio_4_256_128k_randread: (groupid=0, jobs=4): err= 0: pid=7655: Tue May 19 14:29:54 2026
  read: IOPS=3645, BW=456MiB/s (478MB/s)(4683MiB/10278msec)
    slat (usec): min=2, max=1076, avg= 9.75, stdev=17.01
    clat (msec): min=7, max=2007, avg=280.55, stdev=214.94
     lat (msec): min=7, max=2007, avg=280.56, stdev=214.94
    clat percentiles (msec):
     |  1.00th=[   32],  5.00th=[   52], 10.00th=[   63], 20.00th=[   93],
     | 30.00th=[  136], 40.00th=[  186], 50.00th=[  236], 60.00th=[  288],
     | 70.00th=[  355], 80.00th=[  430], 90.00th=[  542], 95.00th=[  667],
     | 99.00th=[  995], 99.50th=[ 1167], 99.90th=[ 1670], 99.95th=[ 1770],
     | 99.99th=[ 1921]
   bw (  KiB/s): min=243200, max=716800, per=99.98%, avg=466483.20, stdev=34411.30, samples=80
   iops        : min= 1900, max= 5600, avg=3644.40, stdev=268.84, samples=80
  lat (msec)   : 10=0.03%, 20=0.26%, 50=4.24%, 100=17.38%, 250=30.84%
  lat (msec)   : 500=34.50%, 750=9.36%, 1000=2.45%, 2000=0.94%, >=2000=0.01%
  cpu          : usr=0.18%, sys=0.95%, ctx=38408, majf=0, minf=32772
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.3%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=37464,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=456MiB/s (478MB/s), 456MiB/s-456MiB/s (478MB/s-478MB/s), io=4683MiB (4910MB), run=10278-10278msec

Disk stats (read/write):
  nvme1n1: ios=36957/0, sectors=9460992/0, merge=0/0, ticks=10056816/0, in_queue=10056816, util=100.00%
fio randwrite /dev/nvme1n1
libaio_4_256_128k_randwrite: (g=0): rw=randwrite, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [w(4)][100.0%][w=481MiB/s][w=3844 IOPS][eta 00m:00s]
libaio_4_256_128k_randwrite: (groupid=0, jobs=4): err= 0: pid=7695: Tue May 19 14:30:05 2026
  write: IOPS=3641, BW=455MiB/s (477MB/s)(4660MiB/10239msec); 0 zone resets
    slat (usec): min=2, max=189874, avg=241.45, stdev=4981.17
    clat (msec): min=3, max=3202, avg=280.63, stdev=340.07
     lat (msec): min=4, max=3202, avg=280.87, stdev=340.69
    clat percentiles (msec):
     |  1.00th=[   27],  5.00th=[   43], 10.00th=[   57], 20.00th=[   79],
     | 30.00th=[  101], 40.00th=[  131], 50.00th=[  171], 60.00th=[  226],
     | 70.00th=[  292], 80.00th=[  401], 90.00th=[  609], 95.00th=[  860],
     | 99.00th=[ 1972], 99.50th=[ 2433], 99.90th=[ 2869], 99.95th=[ 2937],
     | 99.99th=[ 3171]
   bw (  KiB/s): min=112640, max=868096, per=99.59%, avg=464153.60, stdev=50584.91, samples=80
   iops        : min=  880, max= 6782, avg=3626.20, stdev=395.19, samples=80
  lat (msec)   : 4=0.01%, 10=0.05%, 20=0.26%, 50=7.31%, 100=22.09%
  lat (msec)   : 250=34.13%, 500=22.17%, 750=7.29%, 1000=3.68%, 2000=2.05%
  lat (msec)   : >=2000=0.96%
  cpu          : usr=0.38%, sys=0.74%, ctx=37915, majf=0, minf=4
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.3%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,37283,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=455MiB/s (477MB/s), 455MiB/s-455MiB/s (477MB/s-477MB/s), io=4660MiB (4887MB), run=10239-10239msec

Disk stats (read/write):
  nvme1n1: ios=52/36889, sectors=1952/9443584, merge=0/0, ticks=7/9065684, in_queue=9065691, util=99.03%
NVMe Flush: success

TAP version 13
1..1
ok 1 - mptcp_nvme: nvme over tcp test # time=24541ms
NQN:nqn.2014-08.org.nvmexpress.tcpdev.7486.4297 disconnected 4 controller(s)
/sys/kernel/config/nvmet /home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
/home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
tgl@ThinkBook:~/mptcp_net-next/mptcp-selftests$ sudo ./mptcp_nvme.sh mptcp 4 round-robin 0
0+0 records in
0+0 records out
0 bytes copied, 0.000034286 s, 0.0 B/s
qdisc netem 8049: root refcnt 25 limit 1000 rate 1Gbit seed 1506589290665984997
qdisc netem 804a: root refcnt 25 limit 1000 rate 1Gbit seed 16222507226790466665
qdisc netem 804b: root refcnt 25 limit 1000 rate 1Gbit seed 1183797428335127788
qdisc netem 804c: root refcnt 25 limit 1000 rate 1Gbit seed 1369710343512163978
qdisc netem 804d: root refcnt 25 limit 1000 rate 1Gbit seed 14215765399017529043
qdisc netem 804e: root refcnt 25 limit 1000 rate 1Gbit seed 16863051369914697256
qdisc netem 804f: root refcnt 25 limit 1000 rate 1Gbit seed 1086698515664012853
qdisc netem 8050: root refcnt 25 limit 1000 rate 1Gbit seed 9133601558953778418
nvme discover -a 10.1.1.1

Discovery Log Number of Records 2, Generation counter 74
=====Discovery Log Entry 0======
trtype:  unrecognized
adrfam:  ipv4
subtype: current discovery subsystem
treq:    not specified, sq flow control disable supported
portid:  25587
trsvcid: 20822
subnqn:  nqn.2014-08.org.nvmexpress.discovery
traddr:  10.1.1.1
eflags:  none
=====Discovery Log Entry 1======
trtype:  unrecognized
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified, sq flow control disable supported
portid:  25587
trsvcid: 20822
subnqn:  nqn.2014-08.org.nvmexpress.mptcpdev.7759.18799
traddr:  10.1.1.1
eflags:  none
Connecting to 10.1.1.1:20822
connecting to device: nvme1
Connecting to 10.1.2.1:20822
connecting to device: nvme2
Connecting to 10.1.3.1:20822
connecting to device: nvme3
Connecting to 10.1.4.1:20822
connecting to device: nvme4
nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            SS1Q24313Z2CD54R50B8 UMIS RPJYJ1T24MML1AWY                    0x1          1.02  TB /   1.02  TB    512   B +  0 B   1.0L06C1
/dev/nvme1n1          /dev/ng1n1            c8903574af001e2d8144 Linux                                    0x1        536.87  MB / 536.87  MB    512   B +  0 B   7.1.0-rc
fio randread /dev/nvme1n1
libaio_4_256_128k_randread: (g=0): rw=randread, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [r(4)][100.0%][r=443MiB/s][r=3547 IOPS][eta 00m:00s]
libaio_4_256_128k_randread: (groupid=0, jobs=4): err= 0: pid=7932: Tue May 19 14:30:23 2026
  read: IOPS=3566, BW=446MiB/s (467MB/s)(4565MiB/10239msec)
    slat (usec): min=2, max=230767, avg=107.20, stdev=3298.93
    clat (msec): min=12, max=1820, avg=286.67, stdev=234.19
     lat (msec): min=12, max=1952, avg=286.78, stdev=234.60
    clat percentiles (msec):
     |  1.00th=[   30],  5.00th=[   47], 10.00th=[   64], 20.00th=[   99],
     | 30.00th=[  136], 40.00th=[  176], 50.00th=[  226], 60.00th=[  288],
     | 70.00th=[  359], 80.00th=[  439], 90.00th=[  567], 95.00th=[  718],
     | 99.00th=[ 1133], 99.50th=[ 1469], 99.90th=[ 1703], 99.95th=[ 1720],
     | 99.99th=[ 1754]
   bw (  KiB/s): min=258816, max=738816, per=99.53%, avg=454371.10, stdev=35123.19, samples=80
   iops        : min= 2022, max= 5772, avg=3549.70, stdev=274.40, samples=80
  lat (msec)   : 20=0.24%, 50=5.56%, 100=14.60%, 250=33.79%, 500=31.58%
  lat (msec)   : 750=9.98%, 1000=2.77%, 2000=1.49%
  cpu          : usr=0.11%, sys=1.11%, ctx=28917, majf=0, minf=32772
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=36516,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=446MiB/s (467MB/s), 446MiB/s-446MiB/s (467MB/s-467MB/s), io=4565MiB (4786MB), run=10239-10239msec

Disk stats (read/write):
  nvme1n1: ios=36065/0, sectors=9232640/0, merge=0/0, ticks=9849440/0, in_queue=9849440, util=99.08%
fio randwrite /dev/nvme1n1
libaio_4_256_128k_randwrite: (g=0): rw=randwrite, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [w(4)][100.0%][w=449MiB/s][w=3591 IOPS][eta 00m:00s]
libaio_4_256_128k_randwrite: (groupid=0, jobs=4): err= 0: pid=7964: Tue May 19 14:30:34 2026
  write: IOPS=3560, BW=445MiB/s (467MB/s)(4575MiB/10280msec); 0 zone resets
    slat (usec): min=3, max=106224, avg=36.02, stdev=1317.16
    clat (msec): min=3, max=1458, avg=287.19, stdev=208.34
     lat (msec): min=3, max=1458, avg=287.23, stdev=208.38
    clat percentiles (msec):
     |  1.00th=[   56],  5.00th=[   74], 10.00th=[   90], 20.00th=[  117],
     | 30.00th=[  148], 40.00th=[  182], 50.00th=[  224], 60.00th=[  279],
     | 70.00th=[  347], 80.00th=[  426], 90.00th=[  584], 95.00th=[  718],
     | 99.00th=[  961], 99.50th=[ 1083], 99.90th=[ 1368], 99.95th=[ 1385],
     | 99.99th=[ 1435]
   bw (  KiB/s): min=273664, max=696320, per=99.94%, avg=455459.95, stdev=27313.01, samples=80
   iops        : min= 2138, max= 5440, avg=3558.20, stdev=213.39, samples=80
  lat (msec)   : 4=0.01%, 10=0.05%, 20=0.10%, 50=0.39%, 100=13.15%
  lat (msec)   : 250=41.14%, 500=30.83%, 750=10.33%, 1000=3.22%, 2000=0.80%
  cpu          : usr=0.48%, sys=0.81%, ctx=26520, majf=0, minf=4
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.3%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,36601,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=445MiB/s (467MB/s), 445MiB/s-445MiB/s (467MB/s-467MB/s), io=4575MiB (4797MB), run=10280-10280msec

Disk stats (read/write):
  nvme1n1: ios=73/36055, sectors=2520/9230080, merge=0/0, ticks=23/10146459, in_queue=10146482, util=100.00%
NVMe Flush: success

TAP version 13
1..1
ok 1 - mptcp_nvme: nvme over mptcp test # time=24539ms
NQN:nqn.2014-08.org.nvmexpress.mptcpdev.7759.18799 disconnected 4 controller(s)
/sys/kernel/config/nvmet /home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
/home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
$ sudo ./mptcp_nvme.sh tcp 4 queue-depth 0
[sudo: authenticate] Password:        
0+0 records in
0+0 records out
0 bytes copied, 0.000034443 s, 0.0 B/s
qdisc netem 8061: root refcnt 25 limit 1000 rate 1Gbit seed 3295449951790515148
qdisc netem 8062: root refcnt 25 limit 1000 rate 1Gbit seed 3138537503814513313
qdisc netem 8063: root refcnt 25 limit 1000 rate 1Gbit seed 10184262077164245255
qdisc netem 8064: root refcnt 25 limit 1000 rate 1Gbit seed 8196776831254145068
qdisc netem 8065: root refcnt 25 limit 1000 rate 1Gbit seed 3794130921615195350
qdisc netem 8066: root refcnt 25 limit 1000 rate 1Gbit seed 3768240293358984722
qdisc netem 8067: root refcnt 25 limit 1000 rate 1Gbit seed 11220714754356808479
qdisc netem 8068: root refcnt 25 limit 1000 rate 1Gbit seed 12767913258139064713
nvme discover -a 10.1.1.1

Discovery Log Number of Records 2, Generation counter 101
=====Discovery Log Entry 0======
trtype:  tcp
adrfam:  ipv4
subtype: current discovery subsystem
treq:    not specified, sq flow control disable supported
portid:  29891
trsvcid: 7688
subnqn:  nqn.2014-08.org.nvmexpress.discovery
traddr:  10.1.1.1
eflags:  none
sectype: none
=====Discovery Log Entry 1======
trtype:  tcp
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified, sq flow control disable supported
portid:  29891
trsvcid: 7688
subnqn:  nqn.2014-08.org.nvmexpress.tcpdev.8843.27030
traddr:  10.1.1.1
eflags:  none
sectype: none
Connecting to 10.1.1.1:7688
connecting to device: nvme1
Connecting to 10.1.2.1:7688
connecting to device: nvme2
Connecting to 10.1.3.1:7688
connecting to device: nvme3
Connecting to 10.1.4.1:7688
connecting to device: nvme4
nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            SS1Q24313Z2CD54R50B8 UMIS RPJYJ1T24MML1AWY                    0x1          1.02  TB /   1.02  TB    512   B +  0 B   1.0L06C1
/dev/nvme1n1          /dev/ng1n1            9f68b5437684bbcdddd1 Linux                                    0x1        536.87  MB / 536.87  MB    512   B +  0 B   7.1.0-rc
fio randread /dev/nvme1n1
libaio_4_256_128k_randread: (g=0): rw=randread, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [r(4)][100.0%][r=454MiB/s][r=3635 IOPS][eta 00m:00s]
libaio_4_256_128k_randread: (groupid=0, jobs=4): err= 0: pid=9014: Tue May 19 14:37:44 2026
  read: IOPS=3644, BW=456MiB/s (478MB/s)(4632MiB/10169msec)
    slat (usec): min=2, max=830059, avg=160.63, stdev=6754.87
    clat (usec): min=1666, max=3091.4k, avg=280481.59, stdev=335991.37
     lat (msec): min=2, max=3091, avg=280.64, stdev=336.28
    clat percentiles (msec):
     |  1.00th=[   16],  5.00th=[   31], 10.00th=[   44], 20.00th=[   65],
     | 30.00th=[   89], 40.00th=[  126], 50.00th=[  174], 60.00th=[  239],
     | 70.00th=[  313], 80.00th=[  418], 90.00th=[  609], 95.00th=[  835],
     | 99.00th=[ 1838], 99.50th=[ 2366], 99.90th=[ 2769], 99.95th=[ 2802],
     | 99.99th=[ 2937]
   bw (  KiB/s): min=101888, max=941312, per=98.88%, avg=461260.80, stdev=54054.15, samples=80
   iops        : min=  796, max= 7354, avg=3603.60, stdev=422.30, samples=80
  lat (msec)   : 2=0.01%, 4=0.04%, 10=0.35%, 20=1.41%, 50=11.51%
  lat (msec)   : 100=19.90%, 250=28.85%, 500=23.31%, 750=7.81%, 1000=3.63%
  lat (msec)   : 2000=2.44%, >=2000=0.75%
  cpu          : usr=0.11%, sys=0.98%, ctx=37347, majf=0, minf=32772
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.3%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=37059,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=456MiB/s (478MB/s), 456MiB/s-456MiB/s (478MB/s-478MB/s), io=4632MiB (4857MB), run=10169-10169msec

Disk stats (read/write):
  nvme1n1: ios=36964/0, sectors=9462784/0, merge=0/0, ticks=9407103/0, in_queue=9407103, util=100.00%
fio randwrite /dev/nvme1n1
libaio_4_256_128k_randwrite: (g=0): rw=randwrite, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [w(4)][100.0%][w=457MiB/s][w=3656 IOPS][eta 00m:00s]
libaio_4_256_128k_randwrite: (groupid=0, jobs=4): err= 0: pid=9056: Tue May 19 14:37:56 2026
  write: IOPS=3641, BW=455MiB/s (477MB/s)(4666MiB/10250msec); 0 zone resets
    slat (usec): min=2, max=281750, avg=150.14, stdev=4550.86
    clat (msec): min=4, max=3125, avg=280.88, stdev=311.14
     lat (msec): min=4, max=3125, avg=281.03, stdev=311.59
    clat percentiles (msec):
     |  1.00th=[   33],  5.00th=[   57], 10.00th=[   71], 20.00th=[   95],
     | 30.00th=[  121], 40.00th=[  150], 50.00th=[  184], 60.00th=[  236],
     | 70.00th=[  296], 80.00th=[  384], 90.00th=[  550], 95.00th=[  802],
     | 99.00th=[ 1787], 99.50th=[ 2072], 99.90th=[ 2769], 99.95th=[ 2937],
     | 99.99th=[ 3104]
   bw (  KiB/s): min=218880, max=769280, per=99.70%, avg=464743.95, stdev=37729.75, samples=80
   iops        : min= 1710, max= 6010, avg=3630.75, stdev=294.76, samples=80
  lat (msec)   : 10=0.05%, 20=0.18%, 50=3.43%, 100=18.35%, 250=40.40%
  lat (msec)   : 500=25.52%, 750=6.32%, 1000=2.32%, 2000=2.79%, >=2000=0.62%
  cpu          : usr=0.38%, sys=0.82%, ctx=39052, majf=0, minf=4
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.3%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,37327,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=455MiB/s (477MB/s), 455MiB/s-455MiB/s (477MB/s-477MB/s), io=4666MiB (4893MB), run=10250-10250msec

Disk stats (read/write):
  nvme1n1: ios=52/36902, sectors=1952/9446912, merge=0/0, ticks=9/9443482, in_queue=9443491, util=99.05%
NVMe Flush: success

TAP version 13
1..1
ok 1 - mptcp_nvme: nvme over tcp test # time=24526ms
NQN:nqn.2014-08.org.nvmexpress.tcpdev.8843.27030 disconnected 4 controller(s)
/sys/kernel/config/nvmet /home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
/home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
tgl@ThinkBook:~/mptcp_net-next/mptcp-selftests$ sudo ./mptcp_nvme.sh mptcp 4 queue-depth 0
0+0 records in
0+0 records out
0 bytes copied, 0.000033222 s, 0.0 B/s
qdisc netem 8069: root refcnt 25 limit 1000 rate 1Gbit seed 10161739475702683234
qdisc netem 806a: root refcnt 25 limit 1000 rate 1Gbit seed 12159989797435071587
qdisc netem 806b: root refcnt 25 limit 1000 rate 1Gbit seed 14498810451485745943
qdisc netem 806c: root refcnt 25 limit 1000 rate 1Gbit seed 18442905267492529790
qdisc netem 806d: root refcnt 25 limit 1000 rate 1Gbit seed 2985926051413726176
qdisc netem 806e: root refcnt 25 limit 1000 rate 1Gbit seed 4435161703691035974
qdisc netem 806f: root refcnt 25 limit 1000 rate 1Gbit seed 16832561787704440488
qdisc netem 8070: root refcnt 25 limit 1000 rate 1Gbit seed 8416006940920284114
nvme discover -a 10.1.1.1

Discovery Log Number of Records 2, Generation counter 110
=====Discovery Log Entry 0======
trtype:  unrecognized
adrfam:  ipv4
subtype: current discovery subsystem
treq:    not specified, sq flow control disable supported
portid:  28051
trsvcid: 19031
subnqn:  nqn.2014-08.org.nvmexpress.discovery
traddr:  10.1.1.1
eflags:  none
=====Discovery Log Entry 1======
trtype:  unrecognized
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified, sq flow control disable supported
portid:  28051
trsvcid: 19031
subnqn:  nqn.2014-08.org.nvmexpress.mptcpdev.9156.18561
traddr:  10.1.1.1
eflags:  none
Connecting to 10.1.1.1:19031
connecting to device: nvme1
Connecting to 10.1.2.1:19031
connecting to device: nvme2
Connecting to 10.1.3.1:19031
connecting to device: nvme3
Connecting to 10.1.4.1:19031
connecting to device: nvme4
nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            SS1Q24313Z2CD54R50B8 UMIS RPJYJ1T24MML1AWY                    0x1          1.02  TB /   1.02  TB    512   B +  0 B   1.0L06C1
/dev/nvme1n1          /dev/ng1n1            261e91ea8603f223e6fe Linux                                    0x1        536.87  MB / 536.87  MB    512   B +  0 B   7.1.0-rc
fio randread /dev/nvme1n1
libaio_4_256_128k_randread: (g=0): rw=randread, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [r(4)][100.0%][r=458MiB/s][r=3665 IOPS][eta 00m:00s]
libaio_4_256_128k_randread: (groupid=0, jobs=4): err= 0: pid=9331: Tue May 19 14:38:16 2026
  read: IOPS=3565, BW=446MiB/s (467MB/s)(4568MiB/10249msec)
    slat (usec): min=2, max=621, avg=11.66, stdev=15.54
    clat (msec): min=15, max=1787, avg=286.66, stdev=218.14
     lat (msec): min=15, max=1787, avg=286.67, stdev=218.14
    clat percentiles (msec):
     |  1.00th=[   35],  5.00th=[   49], 10.00th=[   70], 20.00th=[  108],
     | 30.00th=[  142], 40.00th=[  184], 50.00th=[  230], 60.00th=[  284],
     | 70.00th=[  347], 80.00th=[  439], 90.00th=[  584], 95.00th=[  726],
     | 99.00th=[  995], 99.50th=[ 1150], 99.90th=[ 1368], 99.95th=[ 1569],
     | 99.99th=[ 1737]
   bw (  KiB/s): min=265728, max=655872, per=99.62%, avg=454694.40, stdev=25269.02, samples=80
   iops        : min= 2076, max= 5124, avg=3552.30, stdev=197.41, samples=80
  lat (msec)   : 20=0.16%, 50=5.19%, 100=12.82%, 250=35.67%, 500=30.99%
  lat (msec)   : 750=10.73%, 1000=3.47%, 2000=0.98%
  cpu          : usr=0.11%, sys=1.25%, ctx=29785, majf=0, minf=32772
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=36545,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=446MiB/s (467MB/s), 446MiB/s-446MiB/s (467MB/s-467MB/s), io=4568MiB (4790MB), run=10249-10249msec

Disk stats (read/write):
  nvme1n1: ios=36005/0, sectors=9217280/0, merge=0/0, ticks=10137035/0, in_queue=10137035, util=99.06%
fio randwrite /dev/nvme1n1
libaio_4_256_128k_randwrite: (g=0): rw=randwrite, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [w(4)][100.0%][w=448MiB/s][w=3582 IOPS][eta 00m:00s]
libaio_4_256_128k_randwrite: (groupid=0, jobs=4): err= 0: pid=9365: Tue May 19 14:38:27 2026
  write: IOPS=3562, BW=445MiB/s (467MB/s)(4563MiB/10245msec); 0 zone resets
    slat (usec): min=2, max=2021, avg=11.72, stdev=20.90
    clat (msec): min=4, max=1463, avg=287.22, stdev=202.83
     lat (msec): min=4, max=1463, avg=287.24, stdev=202.83
    clat percentiles (msec):
     |  1.00th=[   59],  5.00th=[   75], 10.00th=[   92], 20.00th=[  122],
     | 30.00th=[  153], 40.00th=[  190], 50.00th=[  232], 60.00th=[  279],
     | 70.00th=[  342], 80.00th=[  430], 90.00th=[  558], 95.00th=[  684],
     | 99.00th=[  969], 99.50th=[ 1133], 99.90th=[ 1368], 99.95th=[ 1418],
     | 99.99th=[ 1418]
   bw (  KiB/s): min=287744, max=602624, per=99.59%, avg=454153.50, stdev=21675.66, samples=80
   iops        : min= 2248, max= 4708, avg=3548.05, stdev=169.34, samples=80
  lat (msec)   : 10=0.06%, 20=0.07%, 50=0.31%, 100=12.62%, 250=41.04%
  lat (msec)   : 500=32.01%, 750=10.08%, 1000=2.96%, 2000=0.86%
  cpu          : usr=0.50%, sys=0.83%, ctx=26931, majf=0, minf=4
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,36500,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=445MiB/s (467MB/s), 445MiB/s-445MiB/s (467MB/s-467MB/s), io=4563MiB (4784MB), run=10245-10245msec

Disk stats (read/write):
  nvme1n1: ios=45/36009, sectors=1328/9218304, merge=0/0, ticks=9/10121003, in_queue=10121012, util=99.06%
NVMe Flush: success

TAP version 13
1..1
ok 1 - mptcp_nvme: nvme over mptcp test # time=24554ms
NQN:nqn.2014-08.org.nvmexpress.mptcpdev.9156.18561 disconnected 4 controller(s)
/sys/kernel/config/nvmet /home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
/home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
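
As a rough sanity check on the two no-loss queue-depth runs above, the
measured fio bandwidth can be compared with the shaped link capacity. The
sketch below assumes four usable paths (one per 10.1.x.1 address), each
limited to 1 Gbit/s by netem, and ignores protocol overhead; the path count
and the helper names are assumptions for illustration, not part of the
selftest.

```python
# Compare measured fio bandwidth with the netem-shaped capacity.
# Assumption: four paths (10.1.1.1 .. 10.1.4.1), each capped at 1 Gbit/s,
# with no allowance for TCP/IP or NVMe/TCP header overhead.
PATHS = 4
RATE_BITS_PER_SEC = 1_000_000_000  # "rate 1Gbit" from the qdisc lines


def mib_per_sec(bits_per_sec: float) -> float:
    """Convert a bit rate into MiB/s, the unit fio reports."""
    return bits_per_sec / 8 / (1024 * 1024)


capacity = mib_per_sec(PATHS * RATE_BITS_PER_SEC)

# Bandwidth figures (MiB/s) taken from the "Run status" lines of the
# queue-depth, no-loss runs.
measured = {
    "tcp read": 456,
    "tcp write": 455,
    "mptcp read": 446,
    "mptcp write": 445,
}


def utilization(bw_mib: float) -> float:
    """Fraction of the assumed aggregate capacity that was used."""
    return bw_mib / capacity


for name, bw in measured.items():
    print(f"{name}: {bw} MiB/s = {utilization(bw):.1%} of {capacity:.1f} MiB/s")
```

Under these assumptions the aggregate capacity is roughly 477 MiB/s, so both
transports sit at about 93% to 96% of it when no delay or loss is injected.
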
tgl@ThinkBook:~/mptcp_net-next/mptcp-selftests$ sudo ./mptcp_nvme.sh tcp 4 round-robin 1
0+0 records in
0+0 records out
0 bytes copied, 0.000036756 s, 0.0 B/s
qdisc netem 8051: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 14946049878654165618
qdisc netem 8052: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 11230459861705135323
qdisc netem 8053: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 9641862848092404184
qdisc netem 8054: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 6252522120019877633
qdisc netem 8055: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 15159846717589848988
qdisc netem 8056: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 2882562256880768222
qdisc netem 8057: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 9830209649295320810
qdisc netem 8058: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 11605172172603852230
nvme discover -a 10.1.1.1

Discovery Log Number of Records 2, Generation counter 83
=====Discovery Log Entry 0======
trtype:  tcp
adrfam:  ipv4
subtype: current discovery subsystem
treq:    not specified, sq flow control disable supported
portid:  25757
trsvcid: 8795
subnqn:  nqn.2014-08.org.nvmexpress.discovery
traddr:  10.1.1.1
eflags:  none
sectype: none
=====Discovery Log Entry 1======
trtype:  tcp
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified, sq flow control disable supported
portid:  25757
trsvcid: 8795
subnqn:  nqn.2014-08.org.nvmexpress.tcpdev.8020.19160
traddr:  10.1.1.1
eflags:  none
sectype: none
Connecting to 10.1.1.1:8795
connecting to device: nvme1
Connecting to 10.1.2.1:8795
connecting to device: nvme2
Connecting to 10.1.3.1:8795
connecting to device: nvme3
Connecting to 10.1.4.1:8795
connecting to device: nvme4
nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            SS1Q24313Z2CD54R50B8 UMIS RPJYJ1T24MML1AWY                    0x1          1.02  TB /   1.02  TB    512   B +  0 B   1.0L06C1
/dev/nvme1n1          /dev/ng1n1            d713f8a662af6c1145b6 Linux                                    0x1        536.87  MB / 536.87  MB    512   B +  0 B   7.1.0-rc
fio randread /dev/nvme1n1
libaio_4_256_128k_randread: (g=0): rw=randread, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 1 (f=1): [_(3),r(1)][62.5%][r=63.8MiB/s][r=510 IOPS][eta 00m:09s]
libaio_4_256_128k_randread: (groupid=0, jobs=4): err= 0: pid=8190: Tue May 19 14:31:01 2026
  read: IOPS=844, BW=106MiB/s (111MB/s)(1574MiB/14906msec)
    slat (usec): min=2, max=593278, avg=701.94, stdev=14737.94
    clat (msec): min=11, max=9029, avg=1158.52, stdev=1632.75
     lat (msec): min=11, max=9029, avg=1159.22, stdev=1633.95
    clat percentiles (msec):
     |  1.00th=[   12],  5.00th=[   12], 10.00th=[   18], 20.00th=[   32],
     | 30.00th=[   66], 40.00th=[  169], 50.00th=[  351], 60.00th=[  718],
     | 70.00th=[ 1284], 80.00th=[ 2106], 90.00th=[ 3574], 95.00th=[ 5134],
     | 99.00th=[ 6544], 99.50th=[ 7550], 99.90th=[ 8792], 99.95th=[ 8792],
     | 99.99th=[ 9060]
   bw (  KiB/s): min=32000, max=442796, per=100.00%, avg=149431.05, stdev=30858.28, samples=79
   iops        : min=  250, max= 3459, avg=1167.37, stdev=241.06, samples=79
  lat (msec)   : 20=11.95%, 50=14.53%, 100=7.70%, 250=11.45%, 500=8.76%
  lat (msec)   : 750=6.30%, 1000=4.42%, 2000=14.27%, >=2000=20.62%
  cpu          : usr=0.03%, sys=0.20%, ctx=10272, majf=0, minf=32772
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.3%, 16=0.5%, 32=1.0%, >=64=98.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=12590,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=106MiB/s (111MB/s), 106MiB/s-106MiB/s (111MB/s-111MB/s), io=1574MiB (1650MB), run=14906-14906msec

Disk stats (read/write):
  nvme1n1: ios=12589/0, sectors=3222784/0, merge=0/0, ticks=10598011/0, in_queue=10598011, util=99.39%
fio randwrite /dev/nvme1n1
libaio_4_256_128k_randwrite: (g=0): rw=randwrite, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 3 (f=3): [w(1),f(1),w(1),_(1)][70.0%][w=31.9MiB/s][w=255 IOPS][eta 00m:06s]
libaio_4_256_128k_randwrite: (groupid=0, jobs=4): err= 0: pid=8222: Tue May 19 14:31:17 2026
  write: IOPS=788, BW=98.5MiB/s (103MB/s)(1455MiB/14770msec); 0 zone resets
    slat (usec): min=2, max=831622, avg=358.60, stdev=12909.46
    clat (msec): min=21, max=8504, avg=1214.70, stdev=1453.79
     lat (msec): min=21, max=8504, avg=1215.06, stdev=1453.93
    clat percentiles (msec):
     |  1.00th=[   22],  5.00th=[   36], 10.00th=[   53], 20.00th=[  112],
     | 30.00th=[  213], 40.00th=[  401], 50.00th=[  659], 60.00th=[ 1020],
     | 70.00th=[ 1418], 80.00th=[ 1989], 90.00th=[ 3339], 95.00th=[ 4799],
     | 99.00th=[ 6141], 99.50th=[ 6342], 99.90th=[ 7819], 99.95th=[ 7819],
     | 99.99th=[ 7886]
   bw (  KiB/s): min=39936, max=271265, per=100.00%, avg=139243.92, stdev=14876.84, samples=80
   iops        : min=  312, max= 2119, avg=1087.77, stdev=116.20, samples=80
  lat (msec)   : 50=8.18%, 100=10.16%, 250=13.95%, 500=12.16%, 750=8.50%
  lat (msec)   : 1000=6.50%, 2000=20.70%, >=2000=19.85%
  cpu          : usr=0.12%, sys=0.21%, ctx=13938, majf=0, minf=4
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.3%, 16=0.5%, 32=1.1%, >=64=97.8%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,11643,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=98.5MiB/s (103MB/s), 98.5MiB/s-98.5MiB/s (103MB/s-103MB/s), io=1455MiB (1526MB), run=14770-14770msec

Disk stats (read/write):
  nvme1n1: ios=90/11639, sectors=2976/2979584, merge=0/0, ticks=16935/10792314, in_queue=10809249, util=100.00%
NVMe Flush: success

TAP version 13
1..1
ok 1 - mptcp_nvme: nvme over tcp test # time=37684ms
NQN:nqn.2014-08.org.nvmexpress.tcpdev.8020.19160 disconnected 4 controller(s)
/sys/kernel/config/nvmet /home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
/home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
tgl@ThinkBook:~/mptcp_net-next/mptcp-selftests$ sudo ./mptcp_nvme.sh mptcp 4 round-robin 1
0+0 records in
0+0 records out
0 bytes copied, 0.000034975 s, 0.0 B/s
qdisc netem 8059: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 16648645294543629576
qdisc netem 805a: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 12381805247586694531
qdisc netem 805b: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 16335976436197243438
qdisc netem 805c: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 14431250154863119196
qdisc netem 805d: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 16886676323171265152
qdisc netem 805e: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 8248230441049121426
qdisc netem 805f: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 3846556862478236863
qdisc netem 8060: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 18342167217463336805
nvme discover -a 10.1.1.1

Discovery Log Number of Records 2, Generation counter 92
=====Discovery Log Entry 0======
trtype:  unrecognized
adrfam:  ipv4
subtype: current discovery subsystem
treq:    not specified, sq flow control disable supported
portid:  23106
trsvcid: 23601
subnqn:  nqn.2014-08.org.nvmexpress.discovery
traddr:  10.1.1.1
eflags:  none
=====Discovery Log Entry 1======
trtype:  unrecognized
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified, sq flow control disable supported
portid:  23106
trsvcid: 23601
subnqn:  nqn.2014-08.org.nvmexpress.mptcpdev.8281.3834
traddr:  10.1.1.1
eflags:  none
Connecting to 10.1.1.1:23601
connecting to device: nvme1
Connecting to 10.1.2.1:23601
connecting to device: nvme2
Connecting to 10.1.3.1:23601
connecting to device: nvme3
Connecting to 10.1.4.1:23601
connecting to device: nvme4
nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            SS1Q24313Z2CD54R50B8 UMIS RPJYJ1T24MML1AWY                    0x1          1.02  TB /   1.02  TB    512   B +  0 B   1.0L06C1
/dev/nvme1n1          /dev/ng1n1            e40fc3718d44d3bc547c Linux                                    0x1        536.87  MB / 536.87  MB    512   B +  0 B   7.1.0-rc
fio randread /dev/nvme1n1
libaio_4_256_128k_randread: (g=0): rw=randread, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [r(4)][100.0%][r=442MiB/s][r=3534 IOPS][eta 00m:00s]
libaio_4_256_128k_randread: (groupid=0, jobs=4): err= 0: pid=8455: Tue May 19 14:31:41 2026
  read: IOPS=3409, BW=426MiB/s (447MB/s)(4533MiB/10637msec)
    slat (usec): min=2, max=8926, avg= 9.82, stdev=49.04
    clat (msec): min=13, max=1821, avg=294.51, stdev=256.51
     lat (msec): min=13, max=1821, avg=294.52, stdev=256.52
    clat percentiles (msec):
     |  1.00th=[   22],  5.00th=[   39], 10.00th=[   51], 20.00th=[   80],
     | 30.00th=[  120], 40.00th=[  167], 50.00th=[  228], 60.00th=[  288],
     | 70.00th=[  363], 80.00th=[  464], 90.00th=[  634], 95.00th=[  818],
     | 99.00th=[ 1167], 99.50th=[ 1284], 99.90th=[ 1603], 99.95th=[ 1636],
     | 99.99th=[ 1754]
   bw (  KiB/s): min=252779, max=695040, per=100.00%, avg=448147.72, stdev=26919.12, samples=81
   iops        : min= 1974, max= 5430, avg=3501.11, stdev=210.33, samples=81
  lat (msec)   : 20=0.64%, 50=9.10%, 100=15.72%, 250=28.46%, 500=28.97%
  lat (msec)   : 750=11.02%, 1000=3.52%, 2000=2.56%
  cpu          : usr=0.10%, sys=0.99%, ctx=29544, majf=0, minf=32772
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=36266,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=426MiB/s (447MB/s), 426MiB/s-426MiB/s (447MB/s-447MB/s), io=4533MiB (4753MB), run=10637-10637msec

Disk stats (read/write):
  nvme1n1: ios=36139/0, sectors=9251584/0, merge=0/0, ticks=10343665/0, in_queue=10343665, util=100.00%
fio randwrite /dev/nvme1n1
libaio_4_256_128k_randwrite: (g=0): rw=randwrite, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [w(4)][100.0%][w=444MiB/s][w=3555 IOPS][eta 00m:00s]
libaio_4_256_128k_randwrite: (groupid=0, jobs=4): err= 0: pid=8499: Tue May 19 14:31:53 2026
  write: IOPS=3426, BW=428MiB/s (449MB/s)(4507MiB/10522msec); 0 zone resets
    slat (usec): min=3, max=45871, avg=14.63, stdev=249.32
    clat (msec): min=20, max=1490, avg=297.39, stdev=222.83
     lat (msec): min=20, max=1490, avg=297.40, stdev=222.83
    clat percentiles (msec):
     |  1.00th=[   35],  5.00th=[   51], 10.00th=[   66], 20.00th=[   99],
     | 30.00th=[  144], 40.00th=[  194], 50.00th=[  247], 60.00th=[  305],
     | 70.00th=[  368], 80.00th=[  464], 90.00th=[  617], 95.00th=[  743],
     | 99.00th=[  986], 99.50th=[ 1083], 99.90th=[ 1301], 99.95th=[ 1368],
     | 99.99th=[ 1485]
   bw (  KiB/s): min=286976, max=641024, per=100.00%, avg=446654.17, stdev=24140.84, samples=81
   iops        : min= 2242, max= 5008, avg=3489.49, stdev=188.60, samples=81
  lat (msec)   : 50=4.77%, 100=15.95%, 250=30.00%, 500=32.46%, 750=12.07%
  lat (msec)   : 1000=3.86%, 2000=0.89%
  cpu          : usr=0.45%, sys=0.96%, ctx=25181, majf=0, minf=4
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,36052,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=428MiB/s (449MB/s), 428MiB/s-428MiB/s (449MB/s-449MB/s), io=4507MiB (4725MB), run=10522-10522msec

Disk stats (read/write):
  nvme1n1: ios=85/35927, sectors=2736/9197312, merge=0/0, ticks=2753/10368824, in_queue=10371577, util=100.00%
NVMe Flush: success

TAP version 13
1..1
ok 1 - mptcp_nvme: nvme over mptcp test # time=29910ms
NQN:nqn.2014-08.org.nvmexpress.mptcpdev.8281.3834 disconnected 4 controller(s)
/sys/kernel/config/nvmet /home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
/home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
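
To put numbers on the two round-robin runs with 5ms delay and 0.5% loss
above, the fio "Run status" summary lines can be parsed and compared. This is
only a sketch: the regular expression and helper names are illustrative, and
each figure comes from a single run of about 10 to 15 seconds, so the ratios
are indicative rather than statistically solid.

```python
import re

# "Run status" lines copied from the round-robin runs with netem
# "delay 5ms loss 0.5%" (TCP first, then MPTCP).
SUMMARY = {
    "tcp": [
        "   READ: bw=106MiB/s (111MB/s), 106MiB/s-106MiB/s (111MB/s-111MB/s), io=1574MiB (1650MB), run=14906-14906msec",
        "  WRITE: bw=98.5MiB/s (103MB/s), 98.5MiB/s-98.5MiB/s (103MB/s-103MB/s), io=1455MiB (1526MB), run=14770-14770msec",
    ],
    "mptcp": [
        "   READ: bw=426MiB/s (447MB/s), 426MiB/s-426MiB/s (447MB/s-447MB/s), io=4533MiB (4753MB), run=10637-10637msec",
        "  WRITE: bw=428MiB/s (449MB/s), 428MiB/s-428MiB/s (449MB/s-449MB/s), io=4507MiB (4725MB), run=10522-10522msec",
    ],
}

# Matches the operation name and the aggregate bandwidth in MiB/s.
LINE_RE = re.compile(r"^\s*(READ|WRITE): bw=([0-9.]+)MiB/s")


def parse(lines):
    """Return {"READ": MiB/s, "WRITE": MiB/s} for one transport."""
    result = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            result[match.group(1)] = float(match.group(2))
    return result


tcp = parse(SUMMARY["tcp"])
mptcp = parse(SUMMARY["mptcp"])
speedup = {op: mptcp[op] / tcp[op] for op in tcp}

for op, ratio in speedup.items():
    print(f"{op}: {tcp[op]} -> {mptcp[op]} MiB/s ({ratio:.2f}x)")
```

For these two runs the ratio comes out at roughly 4x for reads and a little
above 4x for writes.
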
tgl@ThinkBook:~/mptcp_net-next/mptcp-selftests$ sudo ./mptcp_nvme.sh tcp 4 queue-depth 1
0+0 records in
0+0 records out
0 bytes copied, 0.000030985 s, 0.0 B/s
qdisc netem 8071: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 13101495999964109604
qdisc netem 8072: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 11517974220235216241
qdisc netem 8073: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 12492982716523034098
qdisc netem 8074: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 10279723881541909769
qdisc netem 8075: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 2346606414154275258
qdisc netem 8076: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 7578393669647277521
qdisc netem 8077: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 17269306750778379857
qdisc netem 8078: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 908546285485823085
nvme discover -a 10.1.1.1

Discovery Log Number of Records 2, Generation counter 119
=====Discovery Log Entry 0======
trtype:  tcp
adrfam:  ipv4
subtype: current discovery subsystem
treq:    not specified, sq flow control disable supported
portid:  24646
trsvcid: 14447
subnqn:  nqn.2014-08.org.nvmexpress.discovery
traddr:  10.1.1.1
eflags:  none
sectype: none
=====Discovery Log Entry 1======
trtype:  tcp
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified, sq flow control disable supported
portid:  24646
trsvcid: 14447
subnqn:  nqn.2014-08.org.nvmexpress.tcpdev.10012.27992
traddr:  10.1.1.1
eflags:  none
sectype: none
Connecting to 10.1.1.1:14447
connecting to device: nvme1
Connecting to 10.1.2.1:14447
connecting to device: nvme2
Connecting to 10.1.3.1:14447
connecting to device: nvme3
Connecting to 10.1.4.1:14447
connecting to device: nvme4
nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            SS1Q24313Z2CD54R50B8 UMIS RPJYJ1T24MML1AWY                    0x1          1.02  TB /   1.02  TB    512   B +  0 B   1.0L06C1
/dev/nvme1n1          /dev/ng1n1            e9540cf45de68e42c683 Linux                                    0x1        536.87  MB / 536.87  MB    512   B +  0 B   7.1.0-rc
fio randread /dev/nvme1n1
libaio_4_256_128k_randread: (g=0): rw=randread, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 1 (f=1): [_(2),r(1),_(1)][100.0%][r=63.8MiB/s][r=510 IOPS][eta 00m:00s]
libaio_4_256_128k_randread: (groupid=0, jobs=4): err= 0: pid=10182: Tue May 19 14:39:25 2026
  read: IOPS=1344, BW=168MiB/s (176MB/s)(2179MiB/12965msec)
    slat (usec): min=2, max=503567, avg=115.68, stdev=6521.43
    clat (msec): min=11, max=5731, avg=713.51, stdev=1011.82
     lat (msec): min=11, max=5731, avg=713.63, stdev=1012.11
    clat percentiles (msec):
     |  1.00th=[   12],  5.00th=[   17], 10.00th=[   23], 20.00th=[   41],
     | 30.00th=[   71], 40.00th=[  142], 50.00th=[  232], 60.00th=[  384],
     | 70.00th=[  667], 80.00th=[ 1267], 90.00th=[ 2333], 95.00th=[ 3171],
     | 99.00th=[ 4010], 99.50th=[ 4329], 99.90th=[ 5604], 99.95th=[ 5671],
     | 99.99th=[ 5738]
   bw (  KiB/s): min=48128, max=585984, per=100.00%, avg=210060.80, stdev=34447.60, samples=80
   iops        : min=  376, max= 4578, avg=1641.10, stdev=269.12, samples=80
  lat (msec)   : 20=6.61%, 50=17.33%, 100=10.93%, 250=17.03%, 500=12.87%
  lat (msec)   : 750=7.16%, 1000=4.85%, 2000=9.51%, >=2000=13.72%
  cpu          : usr=0.04%, sys=0.36%, ctx=16206, majf=0, minf=32772
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.2%, 16=0.4%, 32=0.7%, >=64=98.6%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=17431,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=168MiB/s (176MB/s), 168MiB/s-168MiB/s (176MB/s-176MB/s), io=2179MiB (2285MB), run=12965-12965msec

Disk stats (read/write):
  nvme1n1: ios=17428/0, sectors=4461568/0, merge=0/0, ticks=10855250/0, in_queue=10855250, util=100.00%
fio randwrite /dev/nvme1n1
libaio_4_256_128k_randwrite: (g=0): rw=randwrite, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 3 (f=3): [w(3),_(1)][66.7%][w=31.9MiB/s][w=255 IOPS][eta 00m:06s]
libaio_4_256_128k_randwrite: (groupid=0, jobs=4): err= 0: pid=10218: Tue May 19 14:39:39 2026
  write: IOPS=1024, BW=128MiB/s (134MB/s)(1590MiB/12418msec); 0 zone resets
    slat (usec): min=2, max=889, avg=12.18, stdev=10.43
    clat (msec): min=21, max=4787, avg=973.45, stdev=979.48
     lat (msec): min=21, max=4787, avg=973.47, stdev=979.48
    clat percentiles (msec):
     |  1.00th=[   22],  5.00th=[   37], 10.00th=[   62], 20.00th=[  150],
     | 30.00th=[  271], 40.00th=[  426], 50.00th=[  617], 60.00th=[  860],
     | 70.00th=[ 1217], 80.00th=[ 1787], 90.00th=[ 2534], 95.00th=[ 3071],
     | 99.00th=[ 3876], 99.50th=[ 4044], 99.90th=[ 4329], 99.95th=[ 4665],
     | 99.99th=[ 4799]
   bw (  KiB/s): min=49664, max=286720, per=100.00%, avg=149785.60, stdev=17248.23, samples=80
   iops        : min=  388, max= 2240, avg=1170.20, stdev=134.75, samples=80
  lat (msec)   : 50=7.35%, 100=7.59%, 250=13.54%, 500=15.93%, 750=11.41%
  lat (msec)   : 1000=8.88%, 2000=18.64%, >=2000=16.65%
  cpu          : usr=0.13%, sys=0.27%, ctx=16346, majf=0, minf=4
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.3%, 16=0.5%, 32=1.0%, >=64=98.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,12722,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=128MiB/s (134MB/s), 128MiB/s-128MiB/s (134MB/s-134MB/s), io=1590MiB (1667MB), run=12418-12418msec

Disk stats (read/write):
  nvme1n1: ios=131/12721, sectors=3784/3256576, merge=0/0, ticks=1945/10940196, in_queue=10942141, util=100.00%
NVMe Flush: success

TAP version 13
1..1
ok 1 - mptcp_nvme: nvme over tcp test # time=35027ms
NQN:nqn.2014-08.org.nvmexpress.tcpdev.10012.27992 disconnected 4 controller(s)
/sys/kernel/config/nvmet /home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
/home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
tgl@ThinkBook:~/mptcp_net-next/mptcp-selftests$ sudo ./mptcp_nvme.sh mptcp 4 queue-depth 1
0+0 records in
0+0 records out
0 bytes copied, 0.000034865 s, 0.0 B/s
qdisc netem 8079: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 11684875286331491500
qdisc netem 807a: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 15405903518535477173
qdisc netem 807b: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 11645838655906894761
qdisc netem 807c: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 1760572176314863578
qdisc netem 807d: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 15635004858252533828
qdisc netem 807e: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 933733748701051554
qdisc netem 807f: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 13388314664729634730
qdisc netem 8080: root refcnt 25 limit 1000 delay 5ms loss 0.5% rate 1Gbit seed 2611167444991410732
nvme discover -a 10.1.1.1

Discovery Log Number of Records 2, Generation counter 128
=====Discovery Log Entry 0======
trtype:  unrecognized
adrfam:  ipv4
subtype: current discovery subsystem
treq:    not specified, sq flow control disable supported
portid:  24481
trsvcid: 1192
subnqn:  nqn.2014-08.org.nvmexpress.discovery
traddr:  10.1.1.1
eflags:  none
=====Discovery Log Entry 1======
trtype:  unrecognized
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified, sq flow control disable supported
portid:  24481
trsvcid: 1192
subnqn:  nqn.2014-08.org.nvmexpress.mptcpdev.10360.8450
traddr:  10.1.1.1
eflags:  none
Connecting to 10.1.1.1:1192
connecting to device: nvme1
Connecting to 10.1.2.1:1192
connecting to device: nvme2
Connecting to 10.1.3.1:1192
connecting to device: nvme3
Connecting to 10.1.4.1:1192
connecting to device: nvme4
nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            SS1Q24313Z2CD54R50B8 UMIS RPJYJ1T24MML1AWY                    0x1          1.02  TB /   1.02  TB    512   B +  0 B   1.0L06C1
/dev/nvme1n1          /dev/ng1n1            ec42151c9f27c53e5e5b Linux                                    0x1        536.87  MB / 536.87  MB    512   B +  0 B   7.1.0-rc
fio randread /dev/nvme1n1
libaio_4_256_128k_randread: (g=0): rw=randread, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [r(4)][100.0%][r=444MiB/s][r=3548 IOPS][eta 00m:00s]
libaio_4_256_128k_randread: (groupid=0, jobs=4): err= 0: pid=10530: Tue May 19 14:40:12 2026
  read: IOPS=3398, BW=425MiB/s (445MB/s)(4536MiB/10677msec)
    slat (usec): min=2, max=1225, avg= 9.51, stdev=11.40
    clat (msec): min=11, max=1802, avg=295.38, stdev=251.61
     lat (msec): min=11, max=1802, avg=295.39, stdev=251.61
    clat percentiles (msec):
     |  1.00th=[   22],  5.00th=[   41], 10.00th=[   56], 20.00th=[   90],
     | 30.00th=[  130], 40.00th=[  178], 50.00th=[  230], 60.00th=[  288],
     | 70.00th=[  359], 80.00th=[  456], 90.00th=[  625], 95.00th=[  802],
     | 99.00th=[ 1200], 99.50th=[ 1368], 99.90th=[ 1636], 99.95th=[ 1720],
     | 99.99th=[ 1754]
   bw (  KiB/s): min=254208, max=672512, per=100.00%, avg=451392.00, stdev=26485.91, samples=80
   iops        : min= 1986, max= 5254, avg=3526.50, stdev=206.92, samples=80
  lat (msec)   : 20=0.65%, 50=7.55%, 100=14.81%, 250=30.73%, 500=29.98%
  lat (msec)   : 750=10.05%, 1000=4.12%, 2000=2.11%
  cpu          : usr=0.12%, sys=0.99%, ctx=29081, majf=0, minf=32772
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=36285,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=425MiB/s (445MB/s), 425MiB/s-425MiB/s (445MB/s-445MB/s), io=4536MiB (4756MB), run=10677-10677msec

Disk stats (read/write):
  nvme1n1: ios=36271/0, sectors=9285376/0, merge=0/0, ticks=10433719/0, in_queue=10433719, util=100.00%
fio randwrite /dev/nvme1n1
libaio_4_256_128k_randwrite: (g=0): rw=randwrite, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=256
...
fio-3.41
Starting 4 threads
Jobs: 4 (f=4): [w(4)][100.0%][w=435MiB/s][w=3480 IOPS][eta 00m:00s]
libaio_4_256_128k_randwrite: (groupid=0, jobs=4): err= 0: pid=10574: Tue May 19 14:40:24 2026
  write: IOPS=3314, BW=414MiB/s (434MB/s)(4447MiB/10733msec); 0 zone resets
    slat (usec): min=3, max=391916, avg=64.18, stdev=2661.10
    clat (msec): min=20, max=1286, avg=303.98, stdev=245.38
     lat (msec): min=20, max=1286, avg=304.04, stdev=245.45
    clat percentiles (msec):
     |  1.00th=[   31],  5.00th=[   45], 10.00th=[   59], 20.00th=[   92],
     | 30.00th=[  134], 40.00th=[  184], 50.00th=[  243], 60.00th=[  305],
     | 70.00th=[  376], 80.00th=[  477], 90.00th=[  634], 95.00th=[  827],
     | 99.00th=[ 1099], 99.50th=[ 1150], 99.90th=[ 1217], 99.95th=[ 1250],
     | 99.99th=[ 1284]
   bw (  KiB/s): min=209664, max=781056, per=100.00%, avg=439949.26, stdev=34944.39, samples=81
   iops        : min= 1638, max= 6102, avg=3437.04, stdev=273.01, samples=81
  lat (msec)   : 50=6.96%, 100=15.29%, 250=29.00%, 500=30.76%, 750=11.29%
  lat (msec)   : 1000=4.32%, 2000=2.38%
  cpu          : usr=0.46%, sys=0.94%, ctx=23523, majf=0, minf=4
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,35573,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=414MiB/s (434MB/s), 414MiB/s-414MiB/s (434MB/s-434MB/s), io=4447MiB (4663MB), run=10733-10733msec

Disk stats (read/write):
  nvme1n1: ios=117/35561, sectors=3008/9103616, merge=0/0, ticks=4265/10379195, in_queue=10383460, util=100.00%
NVMe Flush: success

TAP version 13
1..1
ok 1 - mptcp_nvme: nvme over mptcp test # time=30332ms
NQN:nqn.2014-08.org.nvmexpress.mptcpdev.10360.8450 disconnected 4 controller(s)
/sys/kernel/config/nvmet /home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
/home/tgl/mptcp_net-next/tools/testing/selftests/net/mptcp
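
When comparing several runs, the "Run status" summary lines in logs like the one above can be extracted mechanically. The following is a minimal sketch and not part of the selftest: the helper name `parse_run_status` and the regular expression are assumptions based only on the summary lines visible in this log, so other fio versions or units (KiB/s, GiB/s) would need adjustments.

```python
import re

# Matches fio summary lines such as:
#   READ: bw=425MiB/s (445MB/s), ..., io=4536MiB (4756MB), run=10677-10677msec
# Assumption: bandwidth and io are reported in MiB/s and MiB, as in this log.
RUN_STATUS = re.compile(
    r"^\s*(READ|WRITE): bw=([\d.]+)MiB/s .*?io=([\d.]+)MiB .*?run=(\d+)-(\d+)msec"
)

def parse_run_status(line):
    """Return (direction, bw_mib_s, io_mib, run_msec) or None if no match."""
    m = RUN_STATUS.match(line)
    if m is None:
        return None
    direction, bw, io, run_min, _run_max = m.groups()
    return direction, float(bw), float(io), int(run_min)

# Summary line copied from the MPTCP queue-depth lossy run above.
line = ("   READ: bw=425MiB/s (445MB/s), 425MiB/s-425MiB/s "
        "(445MB/s-445MB/s), io=4536MiB (4756MB), run=10677-10677msec")
direction, bw, io, run_msec = parse_run_status(line)

# Cross-check: the reported bandwidth should be close to io / runtime.
derived_bw = io / (run_msec / 1000.0)
print(direction, bw, round(derived_bw, 1))
```

The cross-check at the end is only a sanity test that the parsed fields belong together; it does not validate the measurement itself.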


* Re: [LSF/MM/BPF TOPIC] NVMe over MPTCP: Multi-Fold Acceleration for NVMe over TCP in Multi-NIC Environments
  2026-05-19  7:31         ` Geliang Tang
@ 2026-05-26 10:16           ` Geliang Tang
  2026-05-28 15:59           ` Randy Jennings
  1 sibling, 0 replies; 10+ messages in thread
From: Geliang Tang @ 2026-05-26 10:16 UTC (permalink / raw)
  To: Nilay Shroff; +Cc: mptcp, linux-nvme

Hi Nilay,

Off-list.

On Tue, 2026-05-19 at 15:31 +0800, Geliang Tang wrote:
> Hi,
> 
> The performance test results of MPTCP under several NVMe multipath
> settings are now ready.
> 
> On Wed, 2026-05-13 at 18:04 +0800, Geliang Tang wrote:
> > Hello everyone,
> > 
> > Thank you for your interest in NVMe over MPTCP. I have attached the
> > slides from the presentation to this email.
> > 
> > Please note that the demo in the slides only configured a single NVMe
> > multipath. Subsequently, I will post the MPTCP performance test results
> > under several NVMe multipaths here.
> 
> To test the performance of TCP and MPTCP under NVMe multipath, I added
> two more arguments, "path" and "loss", to the original NVMe MPTCP
> selftest script. The latest code is available at [1].
> 
> The script now accepts the following four arguments:
> 
>   mptcp_nvme.sh [trtype] [path] [iopolicy] [loss]
> 
>   trtype   Transport type (tcp|mptcp) - default: mptcp
>   path     Number of multipath (1-4) - default: 1
>   iopolicy I/O policy (numa|round-robin|queue-depth) - default: numa
>   loss     Enable packet loss (0|1) - default: 0
> 
> The first argument is the transport type. The second argument, "path",
> specifies how many NVMe multipaths to create. The third argument is the
> I/O policy. The fourth argument controls whether the network
> environment is lossy. When set to 0, each NIC is rate-limited to 125
> MB/s (tc arguments: rate 1000mbit). When set to 1, in addition to the
> same rate limit of 125 MB/s, each NIC also experiences a 5 ms delay and
> 0.5% packet loss (tc arguments: rate 1000mbit delay 5ms loss 0.5%).
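
As a rough cross-check of the units above: tc's "1000mbit" is a decimal bit rate, while fio reports MiB/s first and MB/s in parentheses. The sketch below only converts the configured rate limit into both units; the variable names are illustrative, and the result is a raw line-rate ceiling that ignores Ethernet, IP, TCP/MPTCP and NVMe/TCP header overhead.

```python
# Convert the netem rate limit into the units fio reports.
RATE_BITS_PER_S = 1000 * 10**6   # tc "rate 1000mbit", displayed as "rate 1Gbit"
PATHS = 4                        # the "path" argument used in these tests

per_nic_mb_s = RATE_BITS_PER_S / 8 / 10**6    # decimal megabytes per second
per_nic_mib_s = RATE_BITS_PER_S / 8 / 2**20   # binary mebibytes per second
aggregate_mib_s = per_nic_mib_s * PATHS       # ceiling if all NICs are used

print(f"per NIC: {per_nic_mb_s:.0f} MB/s = {per_nic_mib_s:.1f} MiB/s")
print(f"{PATHS} NICs:  {aggregate_mib_s:.1f} MiB/s raw ceiling")
```

Any fio result close to the per-NIC figure suggests only one NIC is carrying traffic, while results approaching the aggregate figure suggest all four are in use.
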
> 
> 
> First set of tests: lossless network, path=4, loss=0. The tc output is
> as follows:
> 
>   qdisc netem 8031: root refcnt 25 limit 1000 rate 1Gbit
> 			seed 1626193586047356330
> 
> Lossless network, comparison between TCP and MPTCP using the "numa"
> policy - MPTCP delivers roughly four times the bandwidth of TCP:
> 
> # ./mptcp_nvme.sh tcp 4 numa 0
>    READ: bw=114MiB/s (119MB/s), 114MiB/s-114MiB/s (119MB/s-119MB/s),
> 			io=1200MiB (1259MB), run=10533-10533msec
>   WRITE: bw=114MiB/s (119MB/s), 114MiB/s-114MiB/s (119MB/s-119MB/s),
> 			io=1203MiB (1261MB), run=10570-10570msec
> 
> # ./mptcp_nvme.sh mptcp 4 numa 0
>    READ: bw=445MiB/s (467MB/s), 445MiB/s-445MiB/s (467MB/s-467MB/s),
> 			io=4512MiB (4731MB), run=10130-10130msec
>   WRITE: bw=443MiB/s (465MB/s), 443MiB/s-443MiB/s (465MB/s-465MB/s),
> 			io=4504MiB (4723MB), run=10158-10158msec
> 
> Lossless network, comparison between TCP and MPTCP using the
> "round-robin" policy - MPTCP and TCP show similar performance:
> 
> # ./mptcp_nvme.sh tcp 4 round-robin 0
>    READ: bw=456MiB/s (478MB/s), 456MiB/s-456MiB/s (478MB/s-478MB/s),
> 			io=4683MiB (4910MB), run=10278-10278msec
>   WRITE: bw=455MiB/s (477MB/s), 455MiB/s-455MiB/s (477MB/s-477MB/s),
> 			io=4660MiB (4887MB), run=10239-10239msec
> 
> # ./mptcp_nvme.sh mptcp 4 round-robin 0
>    READ: bw=446MiB/s (467MB/s), 446MiB/s-446MiB/s (467MB/s-467MB/s),
> 			io=4565MiB (4786MB), run=10239-10239msec
>   WRITE: bw=445MiB/s (467MB/s), 445MiB/s-445MiB/s (467MB/s-467MB/s),
> 			io=4575MiB (4797MB), run=10280-10280msec
> 
> Lossless network, comparison between TCP and MPTCP using the
> "queue-depth" policy - MPTCP and TCP show similar performance:
> 
> # ./mptcp_nvme.sh tcp 4 queue-depth 0
>    READ: bw=456MiB/s (478MB/s), 456MiB/s-456MiB/s (478MB/s-478MB/s),
> 			io=4632MiB (4857MB), run=10169-10169msec
>   WRITE: bw=455MiB/s (477MB/s), 455MiB/s-455MiB/s (477MB/s-477MB/s),
> 			io=4666MiB (4893MB), run=10250-10250msec
> 
> # ./mptcp_nvme.sh mptcp 4 queue-depth 0
>    READ: bw=446MiB/s (467MB/s), 446MiB/s-446MiB/s (467MB/s-467MB/s),
> 			io=4568MiB (4790MB), run=10249-10249msec
>   WRITE: bw=445MiB/s (467MB/s), 445MiB/s-445MiB/s (467MB/s-467MB/s),
> 			io=4563MiB (4784MB), run=10245-10245msec
> 
> 
> Second set of tests: lossy network, path=4, loss=1. The tc output is as
> follows:
> 
>   qdisc netem 8051: root refcnt 25 limit 1000 delay 5ms loss 0.5%
> 			rate 1Gbit seed 14946049878654165618
> 
> Lossy network, comparison between TCP and MPTCP using the
> "round-robin" policy - MPTCP delivers roughly four times the bandwidth
> of TCP:
> 
> # ./mptcp_nvme.sh tcp 4 round-robin 1
>    READ: bw=106MiB/s (111MB/s), 106MiB/s-106MiB/s (111MB/s-111MB/s),
> 			io=1574MiB (1650MB), run=14906-14906msec
>   WRITE: bw=98.5MiB/s (103MB/s), 98.5MiB/s-98.5MiB/s (103MB/s-103MB/s),
> 			io=1455MiB (1526MB), run=14770-14770msec
> 
> # ./mptcp_nvme.sh mptcp 4 round-robin 1
>    READ: bw=426MiB/s (447MB/s), 426MiB/s-426MiB/s (447MB/s-447MB/s),
> 			io=4533MiB (4753MB), run=10637-10637msec
>   WRITE: bw=428MiB/s (449MB/s), 428MiB/s-428MiB/s (449MB/s-449MB/s),
> 			io=4507MiB (4725MB), run=10522-10522msec
> 
> Lossy network, comparison between TCP and MPTCP using the
> "queue-depth" policy - MPTCP delivers about 2.5 times the read
> bandwidth and about 3.2 times the write bandwidth of TCP:
> 
> # ./mptcp_nvme.sh tcp 4 queue-depth 1
>    READ: bw=168MiB/s (176MB/s), 168MiB/s-168MiB/s (176MB/s-176MB/s),
> 			io=2179MiB (2285MB), run=12965-12965msec
>   WRITE: bw=128MiB/s (134MB/s), 128MiB/s-128MiB/s (134MB/s-134MB/s),
> 			io=1590MiB (1667MB), run=12418-12418msec
> 
> # ./mptcp_nvme.sh mptcp 4 queue-depth 1
>    READ: bw=425MiB/s (445MB/s), 425MiB/s-425MiB/s (445MB/s-445MB/s),
> 			io=4536MiB (4756MB), run=10677-10677msec
>   WRITE: bw=414MiB/s (434MB/s), 414MiB/s-414MiB/s (434MB/s-434MB/s),
> 			io=4447MiB (4663MB), run=10733-10733msec
> 
> 
> Conclusion: MPTCP achieves bandwidth aggregation comparable to that of
> NVMe multipath while offering better resilience against network
> interference.
> 
> The full test results are in the attachment.
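
To make the comparison easier to read, the MPTCP/TCP ratios can be computed directly from the fio summaries quoted above. This is a minimal sketch: the bandwidth figures are copied from the quoted results, while the dictionary layout and the helper name `speedup` are illustrative choices, not part of the test script.

```python
# (read, write) bandwidth in MiB/s, copied from the fio summaries quoted above.
results = {
    ("lossless", "numa"):        {"tcp": (114, 114),  "mptcp": (445, 443)},
    ("lossless", "round-robin"): {"tcp": (456, 455),  "mptcp": (446, 445)},
    ("lossless", "queue-depth"): {"tcp": (456, 455),  "mptcp": (446, 445)},
    ("lossy", "round-robin"):    {"tcp": (106, 98.5), "mptcp": (426, 428)},
    ("lossy", "queue-depth"):    {"tcp": (168, 128),  "mptcp": (425, 414)},
}

def speedup(entry):
    """Return MPTCP/TCP bandwidth ratios as (read_ratio, write_ratio)."""
    (tcp_r, tcp_w), (mp_r, mp_w) = entry["tcp"], entry["mptcp"]
    return mp_r / tcp_r, mp_w / tcp_w

ratios = {key: speedup(entry) for key, entry in results.items()}
for (network, policy), (read_x, write_x) in ratios.items():
    print(f"{network:8s} {policy:12s} read x{read_x:.2f}  write x{write_x:.2f}")
```

On these numbers the lossless numa case and the lossy round-robin case come out at about 4x, the lossless round-robin and queue-depth cases at just under 1x, and the lossy queue-depth case at roughly 2.5x for reads and 3.2x for writes.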

Thank you very much for your interest in NVMe MPTCP. During my
presentation, I mentioned that I would share the MPTCP performance test
results with several NVMe multipath paths configured. I spent some time
updating the scripts, and the latest results are quoted above. What do
you think of them? Any suggestions would be welcome.

Thanks,
-Geliang

> 
> Thanks,
> -Geliang
> 
> [1]
> https://patchwork.kernel.org/project/mptcp/cover/cover.1779159524.git.tanggeliang@kylinos.cn/
> 
> > 
> > Thanks,
> > -Geliang
> > 
> > On Thu, 2026-03-05 at 12:30 +0800, Geliang Tang wrote:
> > > Hi Nilay, Ming,
> > > 
> > > Thank you again for your interest in NVMe over MPTCP.
> > > 
> > > On Thu, 2026-02-26 at 17:54 +0800, Geliang Tang wrote:
> > > > Hi Nilay,
> > > > 
> > > > Thanks for your reply.
> > > > 
> > > > On Wed, 2026-02-25 at 20:37 +0530, Nilay Shroff wrote:
> > > > > 
> > > > > 
> > > > > On 1/29/26 9:43 AM, Geliang Tang wrote:
> > > > > > 3. Performance Benefits
> > > > > > 
> > > > > > > This new feature has been evaluated in different environments:
> > > > > > > 
> > > > > > > I conducted 'NVMe over MPTCP' tests between two PCs, each
> > > > > > > equipped with two Gigabit NICs and directly connected via
> > > > > > > Ethernet cables. Using 'NVMe over TCP', the fio benchmark
> > > > > > > showed a speed of approximately 100 MiB/s. In contrast, 'NVMe
> > > > > > > over MPTCP' achieved about 200 MiB/s with fio, doubling the
> > > > > > > throughput.
> > > > > > > 
> > > > > > > In a virtual machine test environment simulating four NICs on
> > > > > > > both sides, 'NVMe over MPTCP' delivered bandwidth up to four
> > > > > > > times that of standard TCP.
> > > > > 
> > > > > > This is interesting. Did you try using an NVMe multipath iopolicy
> > > > > > other than the default numa policy? Assuming both the host and
> > > > > > target are multihomed, configuring round-robin or queue-depth
> > > > > > may provide performance comparable to what you are seeing with
> > > > > > MPTCP.
> > > > > > 
> > > > > > I think MPTCP shall distribute traffic using transport-level
> > > > > > metrics such as RTT, cwnd, and packet loss, whereas the NVMe
> > > > > > multipath layer makes decisions based on ANA state, queue depth,
> > > > > > and NUMA locality. In a setup with multiple active paths,
> > > > > > switching the iopolicy from numa to round-robin or queue-depth
> > > > > > could improve load distribution across controllers and thus
> > > > > > improve performance.
> > > > > > 
> > > > > > IMO, it would be useful to test with those policies and compare
> > > > > > the results against the MPTCP setup.
> > > > 
> > > > > Ming Lei also made a similar comment. In my experiments, I didn't
> > > > > set the multipath iopolicy, so I was using the default numa
> > > > > policy. In the follow-up, I'll adjust it to round-robin or
> > > > > queue-depth and rerun the experiments. I'll share the results in
> > > > > this email thread.
> > > 
> > > > Based on your feedback, I have added iopolicy support to the NVMe
> > > > over MPTCP selftest script (see patch 8 in [1]). We can set the
> > > > iopolicy to round-robin like this:
> > > > 
> > > >  # ./mptcp_nvme.sh mptcp round-robin
> > > > 
> > > > This demonstrates that "NVMe over MPTCP" and "NVMe multipath" can
> > > > work simultaneously without conflict.
> > > > 
> > > > Using this test script, I compared three I/O policies: numa,
> > > > round-robin, and queue-depth. The results for fio were very
> > > > similar. It's possible that this test environment doesn't fully
> > > > reflect the differences in I/O policies. I will continue to follow
> > > > up with further tests.
> > > 
> > > Thanks,
> > > -Geliang
> > > 
> > > [1]
> > > NVME over MPTCP, v4
> > > https://patchwork.kernel.org/project/mptcp/cover/cover.1772683110.git.tanggeliang@kylinos.cn/
> > > 
> > > > 
> > > > Thanks,
> > > > -Geliang
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > --Nilay


* Re: [LSF/MM/BPF TOPIC] NVMe over MPTCP: Multi-Fold Acceleration for NVMe over TCP in Multi-NIC Environments
  2026-05-19  7:31         ` Geliang Tang
  2026-05-26 10:16           ` Geliang Tang
@ 2026-05-28 15:59           ` Randy Jennings
  1 sibling, 0 replies; 10+ messages in thread
From: Randy Jennings @ 2026-05-28 15:59 UTC (permalink / raw)
  To: Geliang Tang
  Cc: lsf-pc, Javier González, Nilay Shroff, Ming Lei,
	Matthieu Baerts, Mat Martineau, Paolo Abeni, Hannes Reinecke,
	John Meneghini, mptcp, linux-nvme

On Tue, May 19, 2026 at 12:31 AM Geliang Tang <geliang@kernel.org> wrote:
> Lossless network, comparison between TCP and MPTCP using the
> "queue-depth" policy - MPTCP and TCP show similar performance:
>
> # ./mptcp_nvme.sh tcp 4 queue-depth 0
>    READ: bw=456MiB/s (478MB/s), 456MiB/s-456MiB/s (478MB/s-478MB/s),
>                         io=4632MiB (4857MB), run=10169-10169msec
>   WRITE: bw=455MiB/s (477MB/s), 455MiB/s-455MiB/s (477MB/s-477MB/s),
>                         io=4666MiB (4893MB), run=10250-10250msec
>
> # ./mptcp_nvme.sh mptcp 4 queue-depth 0
>    READ: bw=446MiB/s (467MB/s), 446MiB/s-446MiB/s (467MB/s-467MB/s),
>                         io=4568MiB (4790MB), run=10249-10249msec
>   WRITE: bw=445MiB/s (467MB/s), 445MiB/s-445MiB/s (467MB/s-467MB/s),
>                         io=4563MiB (4784MB), run=10245-10245msec
This makes much more sense to me.  Have you tested a case where one path
is _not_ flaky but is slower (three paths at 100GBps and one at 50GBps, or
something like that)?

>
>
> Second set of tests: lossy network, path=4, loss=1. The tc output is as
> follows:
>
>   qdisc netem 8051: root refcnt 25 limit 1000 delay 5ms loss 0.5%
>                         rate 1Gbit seed 14946049878654165618
>
>
> Lossy network, comparison between TCP and MPTCP using the "queue-depth"
> policy - MPTCP is four times faster than TCP:
>
> # ./mptcp_nvme.sh tcp 4 queue-depth 1
>    READ: bw=168MiB/s (176MB/s), 168MiB/s-168MiB/s (176MB/s-176MB/s),
>                         io=2179MiB (2285MB), run=12965-12965msec
>   WRITE: bw=128MiB/s (134MB/s), 128MiB/s-128MiB/s (134MB/s-134MB/s),
>                         io=1590MiB (1667MB), run=12418-12418msec
>
> # ./mptcp_nvme.sh mptcp 4 queue-depth 1
>    READ: bw=425MiB/s (445MB/s), 425MiB/s-425MiB/s (445MB/s-445MB/s),
>                         io=4536MiB (4756MB), run=10677-10677msec
>   WRITE: bw=414MiB/s (434MB/s), 414MiB/s-414MiB/s (434MB/s-434MB/s),
>                         io=4447MiB (4663MB), run=10733-10733msec
>
>
> Conclusion: MPTCP achieves bandwidth aggregation comparable to that of
> NVMe multipath while offering better resilience against network
> interference.
This is interesting.  So, one flaky path out of 4 reduces bandwidth to
1/4 bandwidth, effectively a penalty of 2 paths (I was expecting more of a
penalty), while MPTCP can shake it off. Do you have a
hypothesis/understanding of why?

I have a guess that selective retransmission might be kicking in (which
would be good), but how is that different from the expected behavior of
IP/NIC bonding (which, I think, could be implemented without an
NVMe driver/protocol change)?  We generally point people away from
IP/NIC bonding, although I am (personally) not sure why.

Sincerely,
Randy Jennings



Thread overview: 10+ messages
2026-01-29  4:13 [LSF/MM/BPF TOPIC] NVMe over MPTCP: Multi-Fold Acceleration for NVMe over TCP in Multi-NIC Environments Geliang Tang
2026-02-25  5:57 ` Ming Lei
2026-02-26  9:44   ` Geliang Tang
2026-02-25 15:07 ` Nilay Shroff
2026-02-26  9:54   ` Geliang Tang
2026-03-05  4:30     ` Geliang Tang
2026-05-13 10:04       ` Geliang Tang
2026-05-19  7:31         ` Geliang Tang
2026-05-26 10:16           ` Geliang Tang
2026-05-28 15:59           ` Randy Jennings
