From: Nilay Shroff <nilay@linux.ibm.com>
To: linux-nvme@lists.infradead.org
Cc: hch@lst.de, kbusch@kernel.org, sagi@grimberg.me, axboe@fb.com,
gjoyce@linux.ibm.com, Nilay Shroff <nilay@linux.ibm.com>
Subject: [PATCH RFC 0/1] Add visibility for native NVMe multipath using debugfs
Date: Mon, 22 Jul 2024 15:01:08 +0530 [thread overview]
Message-ID: <20240722093124.42581-1-nilay@linux.ibm.com> (raw)
Hi,
This patch proposes adding a new debugfs file entry for NVMe native
multipath. NVMe native multipath today supports three different
io-policies (numa, round-robin and queue-depth) for selecting the optimal
I/O path and forwarding data. However, we don't yet have any visibility
into which I/O path the NVMe native multipath code actually selects.
IMO, it'd be nice to have this information available under debugfs, so
that a user can validate that the I/O path being chosen is optimal for a
given io-policy. This patch proposes adding a debugfs file for each head
disk node on the system: a file named "multipath" under
"/sys/kernel/debug/block/nvmeXnY/".
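To illustrate the layout of the proposed file, here is a rough userspace
model of how its contents could be rendered. This is only a sketch:
render_multipath() and struct path_info are names invented for this
example, not the patch's actual kernel implementation (which would use
debugfs and the seq_file interface instead of snprintf into a buffer).

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* One row of the proposed "multipath" debugfs file (illustrative only). */
struct path_info {
	int node;               /* NUMA node */
	const char *path;       /* hidden namespace, e.g. "nvme2c2n2" */
	const char *ctrl;       /* controller, e.g. "nvme2" */
	const char *ana_state;  /* e.g. "optimized" */
};

/* Render the file contents into buf, mirroring the layout shown in the
 * cover letter for the numa iopolicy. Returns the number of bytes written. */
static int render_multipath(char *buf, size_t len, const char *iopolicy,
			    const struct path_info *paths, int npaths)
{
	int off = snprintf(buf, len,
			   "io-policy: %s\nio-path:\n--------\n"
			   "node  current-path  ctrl   ana-state\n", iopolicy);
	for (int i = 0; i < npaths; i++)
		off += snprintf(buf + off, len - off, "%-5d %-13s %-6s %s\n",
				paths[i].node, paths[i].path,
				paths[i].ctrl, paths[i].ana_state);
	return off;
}
```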
Please find below the output generated with this patch applied on a system
with a multi-controller PCIe NVMe disk attached. The system is also an
NVMf-TCP host connected to an NVMf-TCP target over two NICs. Two NUMA
nodes were online when the output below was captured:
# cat /sys/devices/system/node/online
2-3
# nvme list -v
Subsystem Subsystem-NQN Controllers
---------------- ------------------------------------------------------------------------------------------------ ----------------
nvme-subsys1 nvmet_subsystem nvme1, nvme3
nvme-subsys2 nqn.2019-10.com.kioxia:KCM7DRUG1T92:3D60A04906N1 nvme0, nvme2
Device Cntlid SN MN FR TxPort Address Slot Subsystem Namespaces
---------------- ------ -------------------- ---------------------------------------- -------- ------ -------------- ------ ------------ ----------------
nvme0 2 3D60A04906N1 1.6TB NVMe Gen4 U.2 SSD IV REV.CAS2 pcie 0524:28:00.0 U50EE.001.WZS000E-P3-C4-R1 nvme-subsys2 nvme2n2
nvme2 1 3D60A04906N1 1.6TB NVMe Gen4 U.2 SSD IV REV.CAS2 pcie 0584:28:00.0 U50EE.001.WZS000E-P3-C4-R2 nvme-subsys2 nvme2n2
nvme1 1 a224673364d1dcb6fab9 Linux 6.9.0 tcp traddr=10.0.0.200,trsvcid=4420,src_addr=10.0.0.100 nvme-subsys1 nvme1n1
nvme3 2 a224673364d1dcb6fab9 Linux 6.9.0 tcp traddr=20.0.0.200,trsvcid=4420,src_addr=20.0.0.100 nvme-subsys1 nvme1n1
Device Generic NSID Usage Format Controllers
----------------- ----------------- ---------- -------------------------- ---------------- ----------------
/dev/nvme1n1 /dev/ng1n1 0x1 5.75 GB / 5.75 GB 4 KiB + 0 B nvme1, nvme3
/dev/nvme2n2 /dev/ng2n2 0x2 0.00 B / 5.75 GB 4 KiB + 0 B nvme0, nvme2
# cat /sys/class/nvme-subsystem/nvme-subsys2/iopolicy
numa
# cat /sys/kernel/debug/block/nvme2n2/multipath
io-policy: numa
io-path:
--------
node current-path ctrl ana-state
2 nvme2c2n2 nvme2 optimized
3 nvme2c0n2 nvme0 optimized
The above output shows that the currently selected iopolicy is numa. A
workload running I/O on NUMA node 2 and accessing namespace "nvme2n2"
uses path nvme2c2n2 and controller nvme2 for forwarding data, and the
current ana-state for that path is optimized. Similarly, an I/O workload
running on NUMA node 3 would use path nvme2c0n2 and controller nvme0.
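The per-node choice made by the numa policy can be modeled loosely in
userspace C as picking the ANA-optimized path with the smallest NUMA
distance from the requesting node. This is only an illustrative sketch
with made-up names (numa_select(), struct mpath, the distance table); the
kernel's actual nvme_find_path() additionally falls back to non-optimized
paths and handles many other cases.

```c
#include <stddef.h>
#include <limits.h>
#include <string.h>
#include <assert.h>

enum ana_state { ANA_OPTIMIZED, ANA_NONOPTIMIZED, ANA_INACCESSIBLE };

struct mpath {
	const char *name;     /* e.g. "nvme2c2n2" */
	enum ana_state ana;
	int distance[4];      /* NUMA distance from node i to this path */
};

/* Pick the ANA-optimized path with the smallest NUMA distance from `node`.
 * (The kernel also considers non-optimized paths when no optimized path
 * exists; that fallback is omitted here for brevity.) */
static const struct mpath *numa_select(const struct mpath *p, int n, int node)
{
	const struct mpath *best = NULL;
	int best_dist = INT_MAX;

	for (int i = 0; i < n; i++) {
		if (p[i].ana != ANA_OPTIMIZED)
			continue;
		if (p[i].distance[node] < best_dist) {
			best_dist = p[i].distance[node];
			best = &p[i];
		}
	}
	return best;
}
```

With the distances arranged as in the example above, node 2 would stick to
nvme2c2n2 and node 3 to nvme2c0n2.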
Now, changing the iopolicy to round-robin:
# echo "round-robin" > /sys/class/nvme-subsystem/nvme-subsys2/iopolicy
# cat /sys/kernel/debug/block/nvme2n2/multipath
io-policy: round-robin
io-path:
--------
node rr-path ctrl ana-state
2 nvme2c2n2 nvme2 optimized
2 nvme2c0n2 nvme0 optimized
3 nvme2c2n2 nvme2 optimized
3 nvme2c0n2 nvme0 optimized
The above output shows that the currently selected iopolicy is
round-robin. An I/O workload running on NUMA node 2 and accessing
namespace "nvme2n2" toggles between the paths nvme2c2n2/nvme2 and
nvme2c0n2/nvme0, and the same is true for a workload running on node 3.
Both I/O paths are currently optimized.
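The round-robin toggling can be sketched as keeping a per-node index of
the last path used and advancing to the next usable path on each I/O.
Again, rr_select() and rr_last are invented names for this userspace
model; the kernel tracks a per-node current path pointer and walks the
subsystem's path list rather than an array.

```c
#include <stddef.h>
#include <assert.h>

enum ana_state { ANA_OPTIMIZED, ANA_NONOPTIMIZED };

struct mpath {
	const char *name;   /* e.g. "nvme2c2n2" */
	enum ana_state ana;
};

static int rr_last[4] = { -1, -1, -1, -1 };  /* per-node last path index */

/* Return the next ANA-optimized path after the one last used by `node`,
 * wrapping around the path array. */
static const struct mpath *rr_select(const struct mpath *p, int n, int node)
{
	for (int k = 1; k <= n; k++) {
		int i = (rr_last[node] + k) % n;

		if (p[i].ana == ANA_OPTIMIZED) {
			rr_last[node] = i;
			return &p[i];
		}
	}
	return NULL;  /* no usable path */
}
```

Two consecutive selections from the same node return different paths, and
the third wraps back to the first, matching the toggling described above.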
The namespace "nvme1n1" is accessible over fabrics (NVMf-TCP).
# cat /sys/kernel/debug/block/nvme1n1/multipath
io-policy: queue-depth
io-path:
--------
node path ctrl qdepth ana-state
2 nvme1c1n1 nvme1 1328 optimized
2 nvme1c3n1 nvme3 1324 optimized
3 nvme1c1n1 nvme1 1328 optimized
3 nvme1c3n1 nvme3 1324 optimized
The above output was captured while I/O was running against namespace
nvme1n1. It shows that the iopolicy is set to "queue-depth". For an I/O
workload running on NUMA node 2 and accessing namespace "nvme1n1", the
I/O path nvme1c1n1/nvme1 has a queue depth of 1328 and the other path
nvme1c3n1/nvme3 has a queue depth of 1324. Both paths are optimized, and
it seems both are utilized almost equally for forwarding I/O. The same
can be said for a workload running on NUMA node 3.
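The queue-depth policy's choice can be modeled as picking the usable path
with the fewest in-flight requests, which is why the two queue depths stay
close together under load. This is a simplified userspace sketch with
invented names (qd_select(), struct qpath); in the kernel the count
corresponds to the controller's nr_active counter and the selection again
lives in the path-lookup code.

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

struct qpath {
	const char *name;   /* e.g. "nvme1c1n1" */
	int nr_active;      /* in-flight requests on this path */
	int optimized;      /* nonzero if ANA-optimized */
};

/* Pick the optimized path with the smallest number of in-flight requests. */
static const struct qpath *qd_select(const struct qpath *p, int n)
{
	const struct qpath *best = NULL;

	for (int i = 0; i < n; i++) {
		if (!p[i].optimized)
			continue;
		if (!best || p[i].nr_active < best->nr_active)
			best = &p[i];
	}
	return best;
}
```

With the depths shown above (1328 vs. 1324), the next I/O would be routed
to nvme1c3n1, nudging the two depths back toward each other.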
Nilay Shroff (1):
nvme-multipath: Add debugfs entry for showing multipath info
drivers/nvme/host/multipath.c | 92 +++++++++++++++++++++++++++++++++++
drivers/nvme/host/nvme.h | 1 +
2 files changed, 93 insertions(+)
--
2.45.2
Thread overview: 11+ messages
2024-07-22 9:31 Nilay Shroff [this message]
2024-07-22 9:31 ` [PATCH RFC 1/1] nvme-multipath: Add debugfs entry for showing multipath info Nilay Shroff
2024-07-22 14:18 ` [PATCH RFC 0/1] Add visibility for native NVMe multipath using debugfs Daniel Wagner
2024-07-23 5:18 ` Nilay Shroff
2024-07-23 7:40 ` Daniel Wagner
2024-07-24 13:41 ` Christoph Hellwig
2024-07-25 6:23 ` Nilay Shroff
2024-07-24 14:37 ` Keith Busch
2024-07-25 6:20 ` Nilay Shroff
2024-07-28 20:47 ` Sagi Grimberg
2024-07-29 4:50 ` Nilay Shroff