* [PATCH RFC 0/1] Add visibility for native NVMe multipath using debugfs
@ 2024-07-22 9:31 Nilay Shroff
From: Nilay Shroff @ 2024-07-22 9:31 UTC (permalink / raw)
To: linux-nvme; +Cc: hch, kbusch, sagi, axboe, gjoyce, Nilay Shroff
Hi,
This patch proposes adding a new debugfs file entry for NVMe native
multipath. NVMe native multipath today supports three different
io-policies (numa, round-robin and queue-depth) for selecting the optimal
I/O path and forwarding data. However, we don't yet have any visibility
into which I/O path is being selected by the NVMe native multipath code.

IMO, it'd be nice to have this visibility available under debugfs, which
could help a user validate that the I/O path being chosen is optimal for
a given io-policy. This patch proposes adding a debugfs file for each
head disk node on the system: a file named "multipath" under
"/sys/kernel/debug/block/nvmeXnY/".

Please find below the output generated with this patch applied on a system
with a multi-controller PCIe NVMe disk attached. The system is also an
NVMf-TCP host connected to an NVMf-TCP target over two NICs. The system
had two NUMA nodes online when the output below was captured:
# cat /sys/devices/system/node/online
2-3
# nvme list -v
Subsystem Subsystem-NQN Controllers
---------------- ------------------------------------------------------------------------------------------------ ----------------
nvme-subsys1 nvmet_subsystem nvme1, nvme3
nvme-subsys2 nqn.2019-10.com.kioxia:KCM7DRUG1T92:3D60A04906N1 nvme0, nvme2
Device Cntlid SN MN FR TxPort Address Slot Subsystem Namespaces
---------------- ------ -------------------- ---------------------------------------- -------- ------ -------------- ------ ------------ ----------------
nvme0 2 3D60A04906N1 1.6TB NVMe Gen4 U.2 SSD IV REV.CAS2 pcie 0524:28:00.0 U50EE.001.WZS000E-P3-C4-R1 nvme-subsys2 nvme2n2
nvme2 1 3D60A04906N1 1.6TB NVMe Gen4 U.2 SSD IV REV.CAS2 pcie 0584:28:00.0 U50EE.001.WZS000E-P3-C4-R2 nvme-subsys2 nvme2n2
nvme1 1 a224673364d1dcb6fab9 Linux 6.9.0 tcp traddr=10.0.0.200,trsvcid=4420,src_addr=10.0.0.100 nvme-subsys1 nvme1n1
nvme3 2 a224673364d1dcb6fab9 Linux 6.9.0 tcp traddr=20.0.0.200,trsvcid=4420,src_addr=20.0.0.100 nvme-subsys1 nvme1n1
Device Generic NSID Usage Format Controllers
----------------- ----------------- ---------- -------------------------- ---------------- ----------------
/dev/nvme1n1 /dev/ng1n1 0x1 5.75 GB / 5.75 GB 4 KiB + 0 B nvme1, nvme3
/dev/nvme2n2 /dev/ng2n2 0x2 0.00 B / 5.75 GB 4 KiB + 0 B nvme0, nvme2
# cat /sys/class/nvme-subsystem/nvme-subsys2/iopolicy
numa
# cat /sys/kernel/debug/block/nvme2n2/multipath
io-policy: numa
io-path:
--------
node current-path ctrl ana-state
2 nvme2c2n2 nvme2 optimized
3 nvme2c0n2 nvme0 optimized
The above output shows that the currently selected iopolicy is numa. When
a workload running on NUMA node 2 issues I/O to namespace "nvme2n2", it
uses path nvme2c2n2 and controller nvme2 for forwarding data, and the
current ana-state of that path is optimized. Similarly, an I/O workload
running on NUMA node 3 would use path nvme2c0n2 and controller nvme0.

Now, changing the iopolicy to round-robin:
# echo "round-robin" > /sys/class/nvme-subsystem/nvme-subsys2/iopolicy
# cat /sys/kernel/debug/block/nvme2n2/multipath
io-policy: round-robin
io-path:
--------
node rr-path ctrl ana-state
2 nvme2c2n2 nvme2 optimized
2 nvme2c0n2 nvme0 optimized
3 nvme2c2n2 nvme2 optimized
3 nvme2c0n2 nvme0 optimized
The above output shows that the currently selected iopolicy is
round-robin. When an I/O workload running on NUMA node 2 accesses
namespace "nvme2n2", the I/O path toggles between nvme2c2n2/nvme2 and
nvme2c0n2/nvme0, and the same is true for an I/O workload running on
node 3. Both I/O paths are currently optimized.

The namespace "nvme1n1" is accessible over fabrics (NVMf-TCP).
# cat /sys/kernel/debug/block/nvme1n1/multipath
io-policy: queue-depth
io-path:
--------
node path ctrl qdepth ana-state
2 nvme1c1n1 nvme1 1328 optimized
2 nvme1c3n1 nvme3 1324 optimized
3 nvme1c1n1 nvme1 1328 optimized
3 nvme1c3n1 nvme3 1324 optimized
The above output was captured while I/O was running against namespace
nvme1n1. It shows that the iopolicy is set to "queue-depth". For an I/O
workload running on NUMA node 2 and accessing namespace "nvme1n1", the
I/O path nvme1c1n1/nvme1 has a queue depth of 1328 and the other I/O
path nvme1c3n1/nvme3 has a queue depth of 1324. Both paths are optimized,
and it seems both paths are being utilized about equally for forwarding
I/O. The same could be said for a workload running on NUMA node 3.
Nilay Shroff (1):
nvme-multipath: Add debugfs entry for showing multipath info
drivers/nvme/host/multipath.c | 92 +++++++++++++++++++++++++++++++++++
drivers/nvme/host/nvme.h | 1 +
2 files changed, 93 insertions(+)
--
2.45.2
* [PATCH RFC 1/1] nvme-multipath: Add debugfs entry for showing multipath info
From: Nilay Shroff @ 2024-07-22 9:31 UTC (permalink / raw)
To: linux-nvme; +Cc: hch, kbusch, sagi, axboe, gjoyce, Nilay Shroff
NVMe native multipath supports different io-policies for selecting the
I/O path; however, we don't have any visibility into which path is being
selected by the multipath code for forwarding I/O. This patch adds that
visibility through a debugfs file for each head disk node on the system:
it creates a file named "multipath" under
"/sys/kernel/debug/block/nvmeXnY/". This file shows the currently
selected "io-policy" and prints a table with, for each online node, its
respective I/O path, controller name, ana-state and, optionally, the
queue depth of each path (when the selected io-policy is queue-depth).
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/multipath.c | 90 +++++++++++++++++++++++++++++++++++
drivers/nvme/host/nvme.h | 1 +
2 files changed, 91 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 91d9eb3c22ef..143d4b279b43 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -6,6 +6,7 @@
 #include <linux/backing-dev.h>
 #include <linux/moduleparam.h>
 #include <linux/vmalloc.h>
+#include <linux/debugfs.h>
 #include <trace/events/block.h>
 #include "nvme.h"
@@ -628,6 +629,91 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
 			ctrl->subsys->instance, head->instance);
 	return 0;
 }
+static void nvme_mpath_numa_show(struct seq_file *m, struct nvme_ns_head *head)
+{
+	int node;
+	struct nvme_ns *ns;
+
+	seq_printf(m, "%-4s %-12s %-6s %s\n",
+			"node", "current-path", "ctrl", "ana-state");
+
+	for_each_online_node(node) {
+		ns = srcu_dereference(head->current_path[node], &head->srcu);
+		if (ns)
+			seq_printf(m, "%-4d %-12s %-6s %s\n",
+					node, ns->disk->disk_name,
+					dev_name(ns->ctrl->device),
+					nvme_ana_state_names[ns->ana_state]);
+	}
+}
+
+static void nvme_mpath_rr_show(struct seq_file *m, struct nvme_ns_head *head)
+{
+	int node;
+	struct nvme_ns *ns;
+
+	seq_printf(m, "%-4s %-12s %-6s %s\n",
+			"node", "rr-path", "ctrl", "ana-state");
+
+	for_each_online_node(node) {
+		list_for_each_entry_rcu(ns, &head->list, siblings) {
+			seq_printf(m, "%-4d %-12s %-6s %s\n",
+					node, ns->disk->disk_name,
+					dev_name(ns->ctrl->device),
+					nvme_ana_state_names[ns->ana_state]);
+		}
+	}
+}
+
+static void nvme_mpath_qd_show(struct seq_file *m, struct nvme_ns_head *head)
+{
+	int node;
+	struct nvme_ns *ns;
+
+	seq_printf(m, "%-4s %-12s %-6s %-10s %s\n",
+			"node", "path", "ctrl", "qdepth", "ana-state");
+
+	for_each_online_node(node) {
+		list_for_each_entry_rcu(ns, &head->list, siblings) {
+			seq_printf(m, "%-4d %-12s %-6s %-10d %s\n",
+					node, ns->disk->disk_name,
+					dev_name(ns->ctrl->device),
+					atomic_read(&ns->ctrl->nr_active),
+					nvme_ana_state_names[ns->ana_state]);
+
+		}
+	}
+}
+
+static int nvme_mpath_show(struct seq_file *m, void *p)
+{
+	struct nvme_ns_head *head = m->private;
+	int iopolicy = READ_ONCE(head->subsys->iopolicy);
+
+	seq_printf(m, "io-policy: %s\n", nvme_iopolicy_names[iopolicy]);
+
+	seq_puts(m, "io-path:\n");
+	seq_puts(m, "--------\n");
+
+	if (iopolicy == NVME_IOPOLICY_NUMA)
+		nvme_mpath_numa_show(m, head);
+	else if (iopolicy == NVME_IOPOLICY_RR)
+		nvme_mpath_rr_show(m, head);
+	else if (iopolicy == NVME_IOPOLICY_QD)
+		nvme_mpath_qd_show(m, head);
+
+	return 0;
+}
+
+static int nvme_mpath_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, nvme_mpath_show, inode->i_private);
+}
+static const struct file_operations nvme_mpath_fops = {
+	.open = nvme_mpath_open,
+	.read = seq_read,
+	.release = single_release
+};
 static void nvme_mpath_set_live(struct nvme_ns *ns)
 {
@@ -650,6 +736,9 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
 			return;
 		}
 		nvme_add_ns_head_cdev(head);
+		head->debugfs = debugfs_create_file("multipath", 0400,
+				head->disk->queue->debugfs_dir, head,
+				&nvme_mpath_fops);
 	}
 	mutex_lock(&head->lock);
@@ -969,6 +1058,7 @@ void nvme_mpath_shutdown_disk(struct nvme_ns_head *head)
 		return;
 	kblockd_schedule_work(&head->requeue_work);
 	if (test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
+		debugfs_remove(head->debugfs);
 		nvme_cdev_del(&head->cdev, &head->cdev_device);
 		del_gendisk(head->disk);
 	}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index f900e44243ae..5b4c0b70cedf 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -493,6 +493,7 @@ struct nvme_ns_head {
 	struct work_struct	requeue_work;
 	struct mutex		lock;
 	unsigned long		flags;
+	struct dentry		*debugfs;
 #define NVME_NSHEAD_DISK_LIVE	0
 	struct nvme_ns __rcu	*current_path[];
 #endif
--
2.45.2
* Re: [PATCH RFC 0/1] Add visibility for native NVMe multipath using debugfs
From: Daniel Wagner @ 2024-07-22 14:18 UTC (permalink / raw)
To: Nilay Shroff; +Cc: linux-nvme, hch, kbusch, sagi, axboe, gjoyce
On Mon, Jul 22, 2024 at 03:01:08PM GMT, Nilay Shroff wrote:
> This patch propose adding a new debugfs file entry for NVMe native
> multipath. As we know NVMe native multipath today supports three different
> io-policies (numa, round-robin and queue-depth) for selecting optimal I/O
> path and forwarding data. However we don't have yet any visibility to find
> the I/O path being selected by NVMe native multipath code.
>
> IMO, it'd be nice to have this visibility information available under
> debugfs which could help a user to validate the I/O path being chosen is
> optimal for a given io policy. This patch propose adding a debugfs file
> for each head disk node on the system. The proposal is to create a file
> named "multipath" under "/sys/kernel/debug/nvmeXnY/".
>
> Please find below output generated with this patch applied on a system
> with a multi-controller PCIe NVMe disk attached to it. This system is also
> an NVMf-TCP host which is connected to NVMf-TCP target over two NIC cards.
> This system has two numa nodes online when the below output was
> captured:
Wouldn't it make sense to extend nvme-cli instead of adding additional
debugfs entries to the kernel, e.g. by extending show-topology?
* Re: [PATCH RFC 0/1] Add visibility for native NVMe multipath using debugfs
From: Nilay Shroff @ 2024-07-23 5:18 UTC (permalink / raw)
To: Daniel Wagner; +Cc: linux-nvme, hch, kbusch, sagi, axboe, gjoyce
On 7/22/24 19:48, Daniel Wagner wrote:
> On Mon, Jul 22, 2024 at 03:01:08PM GMT, Nilay Shroff wrote:
>> This patch propose adding a new debugfs file entry for NVMe native
>> multipath. As we know NVMe native multipath today supports three different
>> io-policies (numa, round-robin and queue-depth) for selecting optimal I/O
>> path and forwarding data. However we don't have yet any visibility to find
>> the I/O path being selected by NVMe native multipath code.
>>
>> IMO, it'd be nice to have this visibility information available under
>> debugfs which could help a user to validate the I/O path being chosen is
>> optimal for a given io policy. This patch propose adding a debugfs file
>> for each head disk node on the system. The proposal is to create a file
>> named "multipath" under "/sys/kernel/debug/nvmeXnY/".
>>
>> Please find below output generated with this patch applied on a system
>> with a multi-controller PCIe NVMe disk attached to it. This system is also
>> an NVMf-TCP host which is connected to NVMf-TCP target over two NIC cards.
>> This system has two numa nodes online when the below output was
>> captured:
>
> Wouldn't it make sense to extend nvme-cli instead adding additional
> debugfs entries to the kernel, e.g. extending show-topology?
>
Yeah, we may extend nvme-cli to print this (multipathing) information, but from
where would nvme-cli retrieve it? AFAIK, today this multipath information
is not exported by the NVMe driver. So we would first have to make it available
from the driver, either through sysfs or an ioctl, and then nvme-cli could parse
it and show it to the user. If everyone thinks it's worth extending nvme-cli so
that it can display this information, then yes, we can certainly implement it.
Please suggest.
Thanks,
--Nilay
* Re: [PATCH RFC 0/1] Add visibility for native NVMe multipath using debugfs
From: Daniel Wagner @ 2024-07-23 7:40 UTC (permalink / raw)
To: Nilay Shroff; +Cc: linux-nvme, hch, kbusch, sagi, axboe, gjoyce
On Tue, Jul 23, 2024 at 10:48:02AM GMT, Nilay Shroff wrote:
> On 7/22/24 19:48, Daniel Wagner wrote:
> > On Mon, Jul 22, 2024 at 03:01:08PM GMT, Nilay Shroff wrote:
> >> This patch propose adding a new debugfs file entry for NVMe native
> >> multipath. As we know NVMe native multipath today supports three different
> >> io-policies (numa, round-robin and queue-depth) for selecting optimal I/O
> >> path and forwarding data. However we don't have yet any visibility to find
> >> the I/O path being selected by NVMe native multipath code.
> >>
> >> IMO, it'd be nice to have this visibility information available under
> >> debugfs which could help a user to validate the I/O path being chosen is
> >> optimal for a given io policy. This patch propose adding a debugfs file
> >> for each head disk node on the system. The proposal is to create a file
> >> named "multipath" under "/sys/kernel/debug/nvmeXnY/".
> >>
> >> Please find below output generated with this patch applied on a system
> >> with a multi-controller PCIe NVMe disk attached to it. This system is also
> >> an NVMf-TCP host which is connected to NVMf-TCP target over two NIC cards.
> >> This system has two numa nodes online when the below output was
> >> captured:
> >
> > Wouldn't it make sense to extend nvme-cli instead adding additional
> > debugfs entries to the kernel, e.g. extending show-topology?
> >
> Yeah we may extend nvme-cli to print this(multipathing) information however from
> where would nvme-cli retrieve that information? AFAIK, today this multipath information
> is not exported by NVMe driver. So we have to first make this information available from
> driver either through sysfs or ioctl and then nvme-cli could parse it and show it to the
> user. If everyone thinks that it's worth extending nvme-cli so that it could display this
> information then yes we can certainly implement it. Please suggest.
debugfs might not always be available. IIRC, when lockdown is enabled,
debugfs (or parts of it) is not available.

I'd suggest going the full way: expose the relevant information via sysfs
and extend libnvme and nvme-cli. But this is just my take on it.
* Re: [PATCH RFC 0/1] Add visibility for native NVMe multipath using debugfs
From: Christoph Hellwig @ 2024-07-24 13:41 UTC (permalink / raw)
To: Daniel Wagner; +Cc: Nilay Shroff, linux-nvme, hch, kbusch, sagi, axboe, gjoyce
On Tue, Jul 23, 2024 at 09:40:55AM +0200, Daniel Wagner wrote:
> debugfs might always be available. IIRC when lockdown is enabled, debugfs
> is not available or parts of it.
>
> I'd suggest going the full way and add expose the relevant information
> via sysfs and extend libnvme and nvme-cli. But this just my take on this.
Yes, if we want to do this properly sysfs is the place to go, not
debugfs.
* Re: [PATCH RFC 0/1] Add visibility for native NVMe multipath using debugfs
From: Keith Busch @ 2024-07-24 14:37 UTC (permalink / raw)
To: Nilay Shroff; +Cc: linux-nvme, hch, sagi, axboe, gjoyce
On Mon, Jul 22, 2024 at 03:01:08PM +0530, Nilay Shroff wrote:
> # cat /sys/kernel/debug/block/nvme1n1/multipath
> io-policy: queue-depth
> io-path:
> --------
> node path ctrl qdepth ana-state
> 2 nvme1c1n1 nvme1 1328 optimized
> 2 nvme1c3n1 nvme3 1324 optimized
> 3 nvme1c1n1 nvme1 1328 optimized
> 3 nvme1c3n1 nvme3 1324 optimized
>
> The above output was captured while I/O was running and accessing
> namespace nvme1n1. From the above output, we see that iopolicy is set to
> "queue-depth". When we have I/O workload running on numa node 2, accessing
> namespace "nvme1n1", the I/O path nvme1c1n1/nvme1 has queue depth of 1328
> and another I/O path nvme1c3n1/nvme3 has queue depth of 1324. Both paths
> are optimized and seems that both paths are equally utilized for
> forwarding I/O.
You can get the outstanding queue depth from iostats too, and that
doesn't rely on the queue-depth io policy. It does, however, require that
stats are enabled, but that's probably a more reasonable prerequisite
than a particular io policy.
> The same could be said for workload running on numa
> node 3.
The output for all numa nodes will be the same regardless of which node
a workload is running on (the accounting isn't per-node), so I'm not
sure outputting qdepth again for each node is useful.
* Re: [PATCH RFC 0/1] Add visibility for native NVMe multipath using debugfs
From: Nilay Shroff @ 2024-07-25 6:20 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-nvme, hch, sagi, axboe, gjoyce
On 7/24/24 20:07, Keith Busch wrote:
> On Mon, Jul 22, 2024 at 03:01:08PM +0530, Nilay Shroff wrote:
>> # cat /sys/kernel/debug/block/nvme1n1/multipath
>> io-policy: queue-depth
>> io-path:
>> --------
>> node path ctrl qdepth ana-state
>> 2 nvme1c1n1 nvme1 1328 optimized
>> 2 nvme1c3n1 nvme3 1324 optimized
>> 3 nvme1c1n1 nvme1 1328 optimized
>> 3 nvme1c3n1 nvme3 1324 optimized
>>
>> The above output was captured while I/O was running and accessing
>> namespace nvme1n1. From the above output, we see that iopolicy is set to
>> "queue-depth". When we have I/O workload running on numa node 2, accessing
>> namespace "nvme1n1", the I/O path nvme1c1n1/nvme1 has queue depth of 1328
>> and another I/O path nvme1c3n1/nvme3 has queue depth of 1324. Both paths
>> are optimized and seems that both paths are equally utilized for
>> forwarding I/O.
>
> You can get the outstanding queue-depth from iostats too, and that
> doesn't rely on queue-depth io policy. It does, however, require stats
> are enabled, but that's probably a more reasonable given than an io
> policy.
>
Yes, correct, a user could use iostat to find the queue depth in real time
while an I/O workload is running.
>> The same could be said for workload running on numa
>> node 3.
>
> The output for all numa nodes will be the same regardless of which node
> a workload is running on (the accounting isn't per-node), so I'm not
> sure outputting qdepth again for each node is useful.
Agreed, so in that case we may only show the available I/O paths of the
head disk node when the io-policy is set to "queue-depth". As you
suggested, we don't need to show the paths per NUMA node; instead, we
can show the "qdepth" once for each I/O path.
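Something along these lines, derived from the RFC patch by simply dropping
the per-node loop (rough, untested sketch; the column layout is just an
example):

static void nvme_mpath_qd_show(struct seq_file *m, struct nvme_ns_head *head)
{
        struct nvme_ns *ns;

        /* One line per path; nr_active is per controller, not per node. */
        seq_printf(m, "%-12s %-6s %-10s %s\n",
                        "path", "ctrl", "qdepth", "ana-state");

        list_for_each_entry_rcu(ns, &head->list, siblings) {
                seq_printf(m, "%-12s %-6s %-10d %s\n",
                                ns->disk->disk_name,
                                dev_name(ns->ctrl->device),
                                atomic_read(&ns->ctrl->nr_active),
                                nvme_ana_state_names[ns->ana_state]);
        }
}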
IMO, though it's possible to find the queue depth by monitoring the
iostat output, it'd be convenient to have it readily available in the
one place where we add further visibility into multipathing.
Thanks,
--Nilay
* Re: [PATCH RFC 0/1] Add visibility for native NVMe multipath using debugfs
From: Nilay Shroff @ 2024-07-25 6:23 UTC (permalink / raw)
To: Christoph Hellwig, Daniel Wagner; +Cc: linux-nvme, kbusch, sagi, axboe, gjoyce
On 7/24/24 19:11, Christoph Hellwig wrote:
> On Tue, Jul 23, 2024 at 09:40:55AM +0200, Daniel Wagner wrote:
>> debugfs might always be available. IIRC when lockdown is enabled, debugfs
>> is not available or parts of it.
>>
>> I'd suggest going the full way and add expose the relevant information
>> via sysfs and extend libnvme and nvme-cli. But this just my take on this.
>
> Yes, if we want to do this properly sysfs is the place to go, not
> debugfs.
>
Alright, I will send out another patch with the relevant changes for
sysfs, and later for libnvme/nvme-cli.
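For the sysfs side, I'm thinking of something roughly like the below
(untested sketch; the attribute name "queue_depth", where it gets
registered, and the use of nr_active are just assumptions at this point,
not the final interface, and hooking it into the namespace attribute
group is omitted):

static ssize_t queue_depth_show(struct device *dev,
                struct device_attribute *attr, char *buf)
{
        struct nvme_ns *ns = nvme_get_ns_from_dev(dev);

        /* Only meaningful while the queue-depth iopolicy is in use. */
        return sysfs_emit(buf, "%d\n", atomic_read(&ns->ctrl->nr_active));
}
static DEVICE_ATTR_RO(queue_depth);

libnvme/nvme-cli could then read such per-path files and format (or
json-encode) them as needed.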
Thanks,
--Nilay
* Re: [PATCH RFC 0/1] Add visibility for native NVMe multipath using debugfs
From: Sagi Grimberg @ 2024-07-28 20:47 UTC (permalink / raw)
To: Nilay Shroff, linux-nvme; +Cc: hch, kbusch, axboe, gjoyce
> # cat /sys/kernel/debug/block/nvme2n2/multipath
> io-policy: numa
> io-path:
> --------
> node current-path ctrl ana-state
> 2 nvme2c2n2 nvme2 optimized
> 3 nvme2c0n2 nvme0 optimized
>
> The above output shows that current selected iopolicy is numa. And when we
> have workload running I/O on numa node 2, accessing namespace "nvme2n2",
> it uses path nvme2c2n2 and controller nvme2 for forwarding data. Moreover
> the current ana-state for this path is optimized. Similarly, for I/O
> workload running on numa node 3 would use path nvme2c0n2 and controller
> nvme0.
>
> Now changing the iopolicy to round-robin,
>
> # echo "round-robin" > /sys/class/nvme-subsystem/nvme-subsys2/iopolicy
>
> # cat /sys/kernel/debug/block/nvme2n2/multipath
> io-policy: round-robin
> io-path:
> --------
> node rr-path ctrl ana-state
> 2 nvme2c2n2 nvme2 optimized
> 2 nvme2c0n2 nvme0 optimized
> 3 nvme2c2n2 nvme2 optimized
> 3 nvme2c0n2 nvme0 optimized
Can we avoid formatted output in sysfs? I'd much prefer that
nvme-cli/libnvme format this (it may be wanted as json in the future,
for example)...

Can we simply expose the individual components and have userspace format
the output?
* Re: [PATCH RFC 0/1] Add visibility for native NVMe multipath using debugfs
From: Nilay Shroff @ 2024-07-29 4:50 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme; +Cc: hch, kbusch, axboe, gjoyce
On 7/29/24 02:17, Sagi Grimberg wrote:
>
>> # cat /sys/kernel/debug/block/nvme2n2/multipath
>> io-policy: numa
>> io-path:
>> --------
>> node current-path ctrl ana-state
>> 2 nvme2c2n2 nvme2 optimized
>> 3 nvme2c0n2 nvme0 optimized
>>
>> The above output shows that current selected iopolicy is numa. And when we
>> have workload running I/O on numa node 2, accessing namespace "nvme2n2",
>> it uses path nvme2c2n2 and controller nvme2 for forwarding data. Moreover
>> the current ana-state for this path is optimized. Similarly, for I/O
>> workload running on numa node 3 would use path nvme2c0n2 and controller
>> nvme0.
>>
>> Now changing the iopolicy to round-robin,
>>
>> # echo "round-robin" > /sys/class/nvme-subsystem/nvme-subsys2/iopolicy
>>
>> # cat /sys/kernel/debug/block/nvme2n2/multipath
>> io-policy: round-robin
>> io-path:
>> --------
>> node rr-path ctrl ana-state
>> 2 nvme2c2n2 nvme2 optimized
>> 2 nvme2c0n2 nvme0 optimized
>> 3 nvme2c2n2 nvme2 optimized
>> 3 nvme2c0n2 nvme0 optimized
>
> Can we avoid a formatted output in sysfs? I'd much rather prefer that nvme-cli/libnvme to
> format this (maybe this may be wanted as json in the future for example)...
>
> Can we simply expose the individual components and have userpace format the output?
Yes, that's what I am planning to implement. The sysfs entries would only
expose the relevant information from the NVMe driver, and libnvme/nvme-cli
would then format the sysfs output as needed.
Thanks,
--Nilay