From: Nilay Shroff <nilay@linux.ibm.com>
To: linux-nvme@lists.infradead.org
Cc: kbusch@kernel.org, sagi@grimberg.me, hch@lst.de, dwagner@suse.de,
hare@suse.de, chaitanyak@nvidia.com, axboe@fb.com,
gjoyce@linux.ibm.com
Subject: [PATCHv7 RFC 2/3] nvme-multipath: Add visibility for numa io-policy
Date: Sun, 12 Jan 2025 18:11:45 +0530
Message-ID: <20250112124154.60690-3-nilay@linux.ibm.com>
In-Reply-To: <20250112124154.60690-1-nilay@linux.ibm.com>
This patch adds visibility into native NVMe multipath for the numa
io-policy. It adds a new attribute file named "numa_nodes" under the
gendisk device node of each namespace path. The file prints the list of
NUMA nodes for which the given namespace path is the preferred path. The
value is a comma-delimited list of nodes and/or A-B ranges of nodes.
For instance, if we have a shared namespace accessible from two different
controllers/paths, then listing the multipath directory of the shared
namespace's head node shows the following output:
$ ls -l /sys/block/nvme1n1/multipath/
nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
In the above example, nvme1n1 is the head gendisk node created for a
shared namespace, and this namespace is accessible from the nvme1c1n1 and
nvme1c3n1 paths. For the numa io-policy we can then refer to the
"numa_nodes" attribute file created under each namespace path:
$ cat /sys/block/nvme1n1/multipath/nvme1c1n1/numa_nodes
0-1
$ cat /sys/block/nvme1n1/multipath/nvme1c3n1/numa_nodes
2-3
From the above output, we infer that an I/O workload targeted at nvme1n1
and running on NUMA nodes 0 and 1 would prefer using path nvme1c1n1.
Similarly, an I/O workload running on NUMA nodes 2 and 3 would prefer
using path nvme1c3n1. Reading the "numa_nodes" file when the configured
io-policy is anything other than numa shows no output.
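As a rough userspace sketch (not part of this patch), a tool that
consumes "numa_nodes" could parse the list format described above as
follows. The function name parse_node_list() and the MAX_NODES bound are
assumptions made purely for illustration; the only thing taken from this
patch is the output format, i.e. a comma-delimited list of nodes and/or
A-B ranges, or empty output when the io-policy is not numa.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: illustrative upper bound, not a kernel constant. */
#define MAX_NODES 1024

static bool is_digit(char c)
{
	return c >= '0' && c <= '9';
}

/*
 * Parse a node list such as "0-1" or "0,2-3" (optionally terminated by
 * a newline, as sysfs output is) into a boolean map indexed by node id.
 *
 * Returns the number of distinct nodes found, 0 for empty input
 * (e.g. io-policy is not numa), or -1 if the input is malformed.
 */
int parse_node_list(const char *s, bool *nodes, size_t max_nodes)
{
	int count = 0;

	assert(s && nodes);
	memset(nodes, 0, max_nodes * sizeof(*nodes));

	while (*s && *s != '\n') {
		char *end;
		unsigned long lo, hi, n;

		if (!is_digit(*s))
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		s = end;

		/* Optional "-B" part of an "A-B" range. */
		if (*s == '-') {
			s++;
			if (!is_digit(*s))
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
		}
		if (hi < lo || hi >= max_nodes)
			return -1;

		for (n = lo; n <= hi; n++) {
			if (!nodes[n]) {
				nodes[n] = true;
				count++;
			}
		}

		if (*s == ',') {
			s++;
			/* A comma must be followed by another entry. */
			if (!is_digit(*s))
				return -1;
		} else if (*s && *s != '\n') {
			return -1;
		}
	}
	return count;
}
```

With the example above, feeding the contents of nvme1c1n1/numa_nodes
("0-1") to such a helper would mark nodes 0 and 1, and a caller could
then check whether the node it runs on is served by that path.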
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/multipath.c | 27 +++++++++++++++++++++++++++
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/sysfs.c | 5 +++++
3 files changed, 33 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index eccc26616e38..3f402a7f4af7 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -976,6 +976,33 @@ static ssize_t ana_state_show(struct device *dev, struct device_attribute *attr,
}
DEVICE_ATTR_RO(ana_state);
+static ssize_t numa_nodes_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ int node, srcu_idx;
+ nodemask_t numa_nodes;
+ struct nvme_ns *current_ns;
+ struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+ struct nvme_ns_head *head = ns->head;
+
+ if (head->subsys->iopolicy != NVME_IOPOLICY_NUMA)
+ return 0;
+
+ nodes_clear(numa_nodes);
+
+ srcu_idx = srcu_read_lock(&head->srcu);
+ for_each_node(node) {
+ current_ns = srcu_dereference(head->current_path[node],
+ &head->srcu);
+ if (ns == current_ns)
+ node_set(node, numa_nodes);
+ }
+ srcu_read_unlock(&head->srcu, srcu_idx);
+
+ return sysfs_emit(buf, "%*pbl\n", nodemask_pr_args(&numa_nodes));
+}
+DEVICE_ATTR_RO(numa_nodes);
+
static int nvme_lookup_ana_group_desc(struct nvme_ctrl *ctrl,
struct nvme_ana_group_desc *desc, void *data)
{
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 643bf580bd09..fd9f2070d86f 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -984,6 +984,7 @@ static inline void nvme_trace_bio_complete(struct request *req)
extern bool multipath;
extern struct device_attribute dev_attr_ana_grpid;
extern struct device_attribute dev_attr_ana_state;
+extern struct device_attribute dev_attr_numa_nodes;
extern struct device_attribute subsys_attr_iopolicy;
static inline bool nvme_disk_is_ns_head(struct gendisk *disk)
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 5a23e23b0d01..5a5ee0beb166 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -258,6 +258,7 @@ static struct attribute *nvme_ns_attrs[] = {
#ifdef CONFIG_NVME_MULTIPATH
&dev_attr_ana_grpid.attr,
&dev_attr_ana_state.attr,
+ &dev_attr_numa_nodes.attr,
#endif
&dev_attr_io_passthru_err_log_enabled.attr,
NULL,
@@ -290,6 +291,10 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
if (!nvme_ctrl_use_ana(nvme_get_ns_from_dev(dev)->ctrl))
return 0;
}
+ if (a == &dev_attr_numa_nodes.attr) {
+ if (nvme_disk_is_ns_head(dev_to_disk(dev)))
+ return 0;
+ }
#endif
return a->mode;
}
--
2.47.1