public inbox for linux-nvme@lists.infradead.org
From: John Meneghini <jmeneghi@redhat.com>
To: Sagi Grimberg <sagi@grimberg.me>, Hannes Reinecke <hare@suse.de>,
	"Ewan D. Milne" <emilne@redhat.com>,
	linux-nvme@lists.infradead.org
Cc: tsong@purestorage.com, mlombard@redhat.com
Subject: Re: [PATCH 3/3] nvme-multipath: add "use_nonoptimized" module option
Date: Wed, 27 Sep 2023 15:11:51 +0200	[thread overview]
Message-ID: <0d1c8cfd-99e7-228f-8ccb-94461a4f776a@redhat.com> (raw)
In-Reply-To: <c6c4cbab-9f03-f7f2-6985-c627f2aaece8@grimberg.me>

Ewan and I discussed this patch and agreed it is not something we want to go upstream. It was only included here for completeness.
This patch was only used to increase the number of active paths during the testing of patches 01 and 02.  The test bed Ewan used
originally had only 4 35Gbps nvme-tcp controllers (2 active optimized and 2 active non-optimized).  He used this patch to change
the multi-pathing policy and enable the use of all 4 controllers - resulting in 4 active paths.
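
For anyone reproducing these tests, the iopolicy is switched at runtime through sysfs. A sketch - the subsystem name here
matches the test bed listed below, and the "queue-depth" value assumes patch 02 is applied (stock kernels only accept
"numa" and "round-robin"):

```shell
# Show the current I/O policy for the subsystem (name assumed: nvme-subsys3).
cat /sys/class/nvme-subsystem/nvme-subsys3/iopolicy

# Switch between the policies under test. "queue-depth" is only available
# with patch 02 applied; "numa" and "round-robin" exist in stock kernels.
echo round-robin > /sys/class/nvme-subsystem/nvme-subsys3/iopolicy
echo queue-depth > /sys/class/nvme-subsystem/nvme-subsys3/iopolicy
```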

Those test results can be seen here:

https://people.redhat.com/jmeneghi/.multipath/test1/A400-TCP-FIO-RR.ps - round-robin io tests
https://people.redhat.com/jmeneghi/.multipath/test1/A400-TCP-FIO-QD.ps - queue-depth io tests

After sending these patches upstream on Monday, Ewan and I built a new test bed with 8 controllers - 4 active optimized paths and
4 non-optimized paths.  This provided a true multi-path test bed, and the use_nonoptimized patch wasn't needed.

In addition, this test bed has a mixed controller subsystem consisting of 4 32Gbps nvme-fc controllers and 4 100Gbps nvme-tcp
controllers.  This provided the optimal mix of active optimized controller paths with different transports that have inherently
different latency characteristics.

  [root@rhel-storage-104 ~]# nvme list-subsys
nvme-subsys3 - NQN=nqn.1992-08.com.netapp:sn.2b82d9b13bb211ee8744d039ea989119:subsystem.SS104a
\
  +- nvme10 fc traddr=nn-0x2027d039ea98949e:pn-0x202cd039ea98949e,host_traddr=nn-0x200000109b9b7f0d:pn-0x100000109b9b7f0d live
  +- nvme11 fc traddr=nn-0x2027d039ea98949e:pn-0x2029d039ea98949e,host_traddr=nn-0x200000109b9b7f0c:pn-0x100000109b9b7f0c live
  +- nvme12 fc traddr=nn-0x2027d039ea98949e:pn-0x2028d039ea98949e,host_traddr=nn-0x200000109b9b7f0d:pn-0x100000109b9b7f0d live
  +- nvme13 tcp traddr=172.18.50.13,trsvcid=4420,src_addr=172.18.50.3 live
  +- nvme2 tcp traddr=172.18.60.16,trsvcid=4420,src_addr=172.18.60.4 live
  +- nvme3 fc traddr=nn-0x2027d039ea98949e:pn-0x202dd039ea98949e,host_traddr=nn-0x200000109b9b7f0c:pn-0x100000109b9b7f0c live
  +- nvme4 tcp traddr=172.18.50.15,trsvcid=4420,src_addr=172.18.50.3 live
  +- nvme9 tcp traddr=172.18.60.14,trsvcid=4420,src_addr=172.18.60.4 live

I shared the performance graphs for this test bed using these patches at the ALPSS conference today.

Graphs of inflight I/O on 4 Optimized paths (2 FC, 2 TCP) with round-robin:

https://people.redhat.com/jmeneghi/.multipath/test2/A400-TEST1-FIO-RR.ps

and queue-depth:

https://people.redhat.com/jmeneghi/.multipath/test2/A400-TEST1-FIO-QD.ps

Also included is a graph showing the combined number of inflight I/Os on all paths, plotted with both RR and QD:

https://people.redhat.com/jmeneghi/.multipath/test2/A400-TEST1-FIO-MAX.ps

What we see in these graphs is that RR is only using 1 of the 4 possible paths, while with QD all 4 paths are used about equally
and the maximum inflight I/O count almost doubles.
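
The queue-depth effect can be illustrated with a toy model (my own sketch, not the kernel implementation; the latencies
and rates are made up for illustration): each new I/O goes to the path with the fewest outstanding requests, so the
slower paths accumulate depth and receive proportionally fewer submissions, while every path still stays busy.

```python
import heapq

# Toy queue-depth path selector: two fast paths and two slow paths.
# Latency values are arbitrary illustration, not measured numbers.
latencies = {"fc0": 1.0, "fc1": 1.0, "tcp0": 3.0, "tcp1": 3.0}
outstanding = {p: 0 for p in latencies}
sent = {p: 0 for p in latencies}
events = []  # min-heap of (completion_time, path)
t = 0.0

for _ in range(10000):
    t += 0.25  # issue one I/O every 0.25 time units
    # Retire I/Os whose completion time has passed.
    while events and events[0][0] <= t:
        _, p = heapq.heappop(events)
        outstanding[p] -= 1
    # Queue-depth selection: send to the path with the fewest outstanding I/Os.
    path = min(outstanding, key=outstanding.get)
    outstanding[path] += 1
    sent[path] += 1
    heapq.heappush(events, (t + latencies[path], path))

print(sent)  # all four paths see traffic; the low-latency paths take most of it
```

Round-robin, by contrast, ignores the per-path depth entirely, which is consistent with the imbalance visible in the
graphs above.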

Hope this helps.

John A. Meneghini
Senior Principal Platform Storage Engineer
RHEL SST - Platform Storage Group
jmeneghi@redhat.com

On 9/27/23 13:31, Sagi Grimberg wrote:
> 
>>> Setting nvme_core.use_nonoptimized=true will cause the path
>>> selector to treat optimized and nonoptimized paths equally.
>>>
>>> This is because although an NVMe fabrics target device may report
>>> an unoptimized ANA state, it is possible that other factors such
>>> as fabric latency are a large factor in the I/O service time.  And,
>>> throughput may improve overall if nonoptimized ports are also used.
>>>
>>> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
>>> ---
>>>   drivers/nvme/host/multipath.c | 22 +++++++++++++++++++---
>>>   1 file changed, 19 insertions(+), 3 deletions(-)
>>>
>> No. Please don't.
>>
>> There's a reason why controllers specify paths as 'active/optimized' or 'active/non-optimized'. If they had wanted us to use 
>> all paths they would have put them into the same group.
>> They tend to get very unhappy if you start using them at the same time.
>> (Triggering failover etc.)
> 
> I have to agree here. This is effectively a modparam that says
> all paths are optimized regardless of what the controller reports.
> 
> While I do acknowledge that there may be some merit to use non-optimized
> paths as well, but its almost impossible to know some latent optimum
> path distribution. Hence the host forfeits even attempting.
> 
> If the controller wants all path used, it should make all paths
> optimized and the host can examine QD accumulating on some paths
> vs others.
> 


