* [PATCH v8 0/2] nvme: queue-depth multipath iopolicy
@ 2024-06-25 12:26 John Meneghini
2024-06-25 12:26 ` [PATCH v8 1/2] nvme-multipath: prepare for "queue-depth" iopolicy John Meneghini
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: John Meneghini @ 2024-06-25 12:26 UTC (permalink / raw)
To: kbusch, hch, sagi
Cc: linux-nvme, linux-kernel, emilne, jmeneghi, jrani, randyj,
chaitanyak, hare
I've addressed all of Christoph's review comments and rebased this to
the latest nvme-6.11 branch. I also retested everything. The test
results can be seen at:
https://github.com/johnmeneghini/iopolicy/?tab=readme-ov-file#sample-data
Please add this to nvme-6.11.
Changes since V7
Broke the changes to nvme_find_path() into a prepare patch.
This made the patch much more readable. Removed the WARN_ON_ONCE
in nvme_mpath_end_request() and changed the pr_notice message
in nvme_subsys_iopolicy_update().
Changes since V6:
Cleanup tab formatting in nvme.h and removed extra white lines.
Removed the results variable from nvme_mpath_end_request().
Changes since V5:
Refactored nvme_find_path() to reduce the spaghetti code.
Cleaned up all comments and reduced the total size of the
diff, and fixed the commit message. Thomas Song now
gets credit as the first author.
Changes since V4:
Removed atomic_set() from and return if (old_iopolicy == iopolicy)
At the beginning of nvme_subsys_iopolicy_update().
Changes since V3:
Addresssed all review comments, fixed the commit log, and moved
nr_counter initialization from nvme_mpath_init_ctlr() to
nvme_mpath_init_identify().
Changes since V2:
Add the NVME_MPATH_CNT_ACTIVE flag to eliminate a READ_ONCE in the
completion path and increment/decrement the active_nr count on all mpath
IOs - including passthru commands.
Send a pr_notice when ever the iopolicy on a subsystem is changed. This
is important for support reasons. It is fully expected that users will
be changing the iopolicy with active IO in progress.
Squashed everything and rebased to nvme-v6.10
Changes since V1:
I'm re-issuing Ewan's queue-depth patches in preparation for LSFMM
These patches were first show at ALPSS 2023 where I shared the following
graphs which measure the IO distribution across 4 active-optimized
controllers using the round-robin verses queue-depth iopolicy.
https://people.redhat.com/jmeneghi/ALPSS_2023/NVMe_QD_Multipathing.pdf
Since that time we have continued testing these patches with a number of
different nvme-of storage arrays and test bed configurations, and I've
codified the tests and methods we use to measure IO distribution
All of my test results, together with the scripts I used to generate these
graphs, are available at:
https://github.com/johnmeneghini/iopolicy
Please use the scripts in this repository to do your own testing.
These patches are based on nvme-v6.9
John Meneghini (1):
nvme-multipath: prepare for "queue-depth" iopolicy
Thomas Song (1):
nvme-multipath: implement "queue-depth" iopolicy
drivers/nvme/host/core.c | 2 +-
drivers/nvme/host/multipath.c | 102 +++++++++++++++++++++++++++++++---
drivers/nvme/host/nvme.h | 4 ++
3 files changed, 99 insertions(+), 9 deletions(-)
--
2.45.2
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v8 1/2] nvme-multipath: prepare for "queue-depth" iopolicy
2024-06-25 12:26 [PATCH v8 0/2] nvme: queue-depth multipath iopolicy John Meneghini
@ 2024-06-25 12:26 ` John Meneghini
2024-06-26 5:05 ` Christoph Hellwig
` (2 more replies)
2024-06-25 12:26 ` [PATCH v8 2/2] nvme-multipath: implement " John Meneghini
2024-06-25 16:37 ` [PATCH v8 0/2] nvme: queue-depth multipath iopolicy Keith Busch
2 siblings, 3 replies; 12+ messages in thread
From: John Meneghini @ 2024-06-25 12:26 UTC (permalink / raw)
To: kbusch, hch, sagi
Cc: linux-nvme, linux-kernel, emilne, jmeneghi, jrani, randyj,
chaitanyak, hare
This patch prepares for the introduction of a new iopolicy by breaking up
the nvme_find_path() code path into sub-routines.
Signed-off-by: John Meneghini <jmeneghi@redhat.com>
---
drivers/nvme/host/multipath.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 8a4d0e377114..0ade378e514b 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -291,10 +291,14 @@ static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
}
-static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
- int node, struct nvme_ns *old)
+static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head)
{
struct nvme_ns *ns, *found = NULL;
+ int node = numa_node_id();
+ struct nvme_ns *old = srcu_dereference(head->current_path[node], &head->srcu);
+
+ if (unlikely(!old))
+ return __nvme_find_path(head, node);
if (list_is_singular(&head->list)) {
if (nvme_path_is_disabled(old))
@@ -340,7 +344,7 @@ static inline bool nvme_path_is_optimized(struct nvme_ns *ns)
ns->ana_state == NVME_ANA_OPTIMIZED;
}
-inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
+static struct nvme_ns *nvme_numa_path(struct nvme_ns_head *head)
{
int node = numa_node_id();
struct nvme_ns *ns;
@@ -348,14 +352,19 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
ns = srcu_dereference(head->current_path[node], &head->srcu);
if (unlikely(!ns))
return __nvme_find_path(head, node);
-
- if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR)
- return nvme_round_robin_path(head, node, ns);
if (unlikely(!nvme_path_is_optimized(ns)))
return __nvme_find_path(head, node);
return ns;
}
+inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
+{
+ if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR)
+ return nvme_round_robin_path(head);
+ else
+ return nvme_numa_path(head);
+}
+
static bool nvme_available_path(struct nvme_ns_head *head)
{
struct nvme_ns *ns;
--
2.45.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v8 2/2] nvme-multipath: implement "queue-depth" iopolicy
2024-06-25 12:26 [PATCH v8 0/2] nvme: queue-depth multipath iopolicy John Meneghini
2024-06-25 12:26 ` [PATCH v8 1/2] nvme-multipath: prepare for "queue-depth" iopolicy John Meneghini
@ 2024-06-25 12:26 ` John Meneghini
2024-06-26 5:06 ` Christoph Hellwig
` (2 more replies)
2024-06-25 16:37 ` [PATCH v8 0/2] nvme: queue-depth multipath iopolicy Keith Busch
2 siblings, 3 replies; 12+ messages in thread
From: John Meneghini @ 2024-06-25 12:26 UTC (permalink / raw)
To: kbusch, hch, sagi
Cc: linux-nvme, linux-kernel, emilne, jmeneghi, jrani, randyj,
chaitanyak, hare
From: Thomas Song <tsong@purestorage.com>
The round-robin path selector is inefficient in cases where there is a
difference in latency between paths. In the presence of one or more
high latency paths the round-robin selector continues to use the high
latency path equally. This results in a bias towards the highest latency
path and can cause a significant decrease in overall performance as IOs
pile on the highest latency path. This problem is acute with NVMe-oF
controllers.
The queue-depth path selector sends I/O down the path with the lowest
number of requests in its request queue. Paths with lower latency will
clear requests more quickly and have less requests queued compared to
higher latency paths. The goal of this path selector is to make more use
of lower latency paths which will bring down overall IO latency and
increase throughput and performance.
Signed-off-by: Thomas Song <tsong@purestorage.com>
[emilne: patch developed by Thomas Song @ Pure Storage, fixed whitespace
and compilation warnings, updated MODULE_PARM description, and
fixed potential issue with ->current_path[] being used]
Co-developed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
[jmeneghi: various changes and improvements, addressed review comments]
Co-developed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: John Meneghini <jmeneghi@redhat.com>
Link: https://lore.kernel.org/linux-nvme/20240509202929.831680-1-jmeneghi@redhat.com/
Tested-by: Marco Patalano <mpatalan@redhat.com>
Reviewed-by: Randy Jennings <randyj@purestorage.com>
Tested-by: Jyoti Rani <jrani@purestorage.com>
Tested-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
drivers/nvme/host/core.c | 2 +-
drivers/nvme/host/multipath.c | 85 +++++++++++++++++++++++++++++++++--
drivers/nvme/host/nvme.h | 4 ++
3 files changed, 86 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8d8e7a3549c6..20e7505852ce 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -110,7 +110,7 @@ struct workqueue_struct *nvme_delete_wq;
EXPORT_SYMBOL_GPL(nvme_delete_wq);
static LIST_HEAD(nvme_subsystems);
-static DEFINE_MUTEX(nvme_subsystems_lock);
+DEFINE_MUTEX(nvme_subsystems_lock);
static DEFINE_IDA(nvme_instance_ida);
static dev_t nvme_ctrl_base_chr_devt;
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 0ade378e514b..63d1dbc3c87a 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -17,6 +17,7 @@ MODULE_PARM_DESC(multipath,
static const char *nvme_iopolicy_names[] = {
[NVME_IOPOLICY_NUMA] = "numa",
[NVME_IOPOLICY_RR] = "round-robin",
+ [NVME_IOPOLICY_QD] = "queue-depth",
};
static int iopolicy = NVME_IOPOLICY_NUMA;
@@ -29,6 +30,8 @@ static int nvme_set_iopolicy(const char *val, const struct kernel_param *kp)
iopolicy = NVME_IOPOLICY_NUMA;
else if (!strncmp(val, "round-robin", 11))
iopolicy = NVME_IOPOLICY_RR;
+ else if (!strncmp(val, "queue-depth", 11))
+ iopolicy = NVME_IOPOLICY_QD;
else
return -EINVAL;
@@ -43,7 +46,7 @@ static int nvme_get_iopolicy(char *buf, const struct kernel_param *kp)
module_param_call(iopolicy, nvme_set_iopolicy, nvme_get_iopolicy,
&iopolicy, 0644);
MODULE_PARM_DESC(iopolicy,
- "Default multipath I/O policy; 'numa' (default) or 'round-robin'");
+ "Default multipath I/O policy; 'numa' (default), 'round-robin' or 'queue-depth'");
void nvme_mpath_default_iopolicy(struct nvme_subsystem *subsys)
{
@@ -128,6 +131,11 @@ void nvme_mpath_start_request(struct request *rq)
struct nvme_ns *ns = rq->q->queuedata;
struct gendisk *disk = ns->head->disk;
+ if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD) {
+ atomic_inc(&ns->ctrl->nr_active);
+ nvme_req(rq)->flags |= NVME_MPATH_CNT_ACTIVE;
+ }
+
if (!blk_queue_io_stat(disk->queue) || blk_rq_is_passthrough(rq))
return;
@@ -141,6 +149,9 @@ void nvme_mpath_end_request(struct request *rq)
{
struct nvme_ns *ns = rq->q->queuedata;
+ if ((nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE))
+ atomic_dec_if_positive(&ns->ctrl->nr_active);
+
if (!(nvme_req(rq)->flags & NVME_MPATH_IO_STATS))
return;
bdev_end_io_acct(ns->head->disk->part0, req_op(rq),
@@ -338,6 +349,42 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head)
return found;
}
+static struct nvme_ns *nvme_queue_depth_path(struct nvme_ns_head *head)
+{
+ struct nvme_ns *best_opt = NULL, *best_nonopt = NULL, *ns;
+ unsigned int min_depth_opt = UINT_MAX, min_depth_nonopt = UINT_MAX;
+ unsigned int depth;
+
+ list_for_each_entry_rcu(ns, &head->list, siblings) {
+ if (nvme_path_is_disabled(ns))
+ continue;
+
+ depth = atomic_read(&ns->ctrl->nr_active);
+
+ switch (ns->ana_state) {
+ case NVME_ANA_OPTIMIZED:
+ if (depth < min_depth_opt) {
+ min_depth_opt = depth;
+ best_opt = ns;
+ }
+ break;
+ case NVME_ANA_NONOPTIMIZED:
+ if (depth < min_depth_nonopt) {
+ min_depth_nonopt = depth;
+ best_nonopt = ns;
+ }
+ break;
+ default:
+ break;
+ }
+
+ if (min_depth_opt == 0)
+ return best_opt;
+ }
+
+ return best_opt ? best_opt : best_nonopt;
+}
+
static inline bool nvme_path_is_optimized(struct nvme_ns *ns)
{
return nvme_ctrl_state(ns->ctrl) == NVME_CTRL_LIVE &&
@@ -359,10 +406,14 @@ static struct nvme_ns *nvme_numa_path(struct nvme_ns_head *head)
inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
{
- if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR)
+ switch (READ_ONCE(head->subsys->iopolicy)) {
+ case NVME_IOPOLICY_QD:
+ return nvme_queue_depth_path(head);
+ case NVME_IOPOLICY_RR:
return nvme_round_robin_path(head);
- else
+ default:
return nvme_numa_path(head);
+ }
}
static bool nvme_available_path(struct nvme_ns_head *head)
@@ -794,6 +845,29 @@ static ssize_t nvme_subsys_iopolicy_show(struct device *dev,
nvme_iopolicy_names[READ_ONCE(subsys->iopolicy)]);
}
+static void nvme_subsys_iopolicy_update(struct nvme_subsystem *subsys,
+ int iopolicy)
+{
+ struct nvme_ctrl *ctrl;
+ int old_iopolicy = READ_ONCE(subsys->iopolicy);
+
+ if (old_iopolicy == iopolicy)
+ return;
+
+ WRITE_ONCE(subsys->iopolicy, iopolicy);
+
+ /* iopolicy changes clear the mpath by design */
+ mutex_lock(&nvme_subsystems_lock);
+ list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
+ nvme_mpath_clear_ctrl_paths(ctrl);
+ mutex_unlock(&nvme_subsystems_lock);
+
+ pr_notice("subsysnqn %s iopolicy changed from %s to %s\n",
+ subsys->subnqn,
+ nvme_iopolicy_names[old_iopolicy],
+ nvme_iopolicy_names[iopolicy]);
+}
+
static ssize_t nvme_subsys_iopolicy_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t count)
{
@@ -803,7 +877,7 @@ static ssize_t nvme_subsys_iopolicy_store(struct device *dev,
for (i = 0; i < ARRAY_SIZE(nvme_iopolicy_names); i++) {
if (sysfs_streq(buf, nvme_iopolicy_names[i])) {
- WRITE_ONCE(subsys->iopolicy, i);
+ nvme_subsys_iopolicy_update(subsys, i);
return count;
}
}
@@ -911,6 +985,9 @@ int nvme_mpath_init_identify(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
!(ctrl->subsys->cmic & NVME_CTRL_CMIC_ANA))
return 0;
+ /* initialize this in the identify path to cover controller resets */
+ atomic_set(&ctrl->nr_active, 0);
+
if (!ctrl->max_namespaces ||
ctrl->max_namespaces > le32_to_cpu(id->nn)) {
dev_err(ctrl->device,
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index c63f2b452369..b512012b7044 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -49,6 +49,7 @@ extern unsigned int admin_timeout;
extern struct workqueue_struct *nvme_wq;
extern struct workqueue_struct *nvme_reset_wq;
extern struct workqueue_struct *nvme_delete_wq;
+extern struct mutex nvme_subsystems_lock;
/*
* List of workarounds for devices that required behavior not specified in
@@ -195,6 +196,7 @@ enum {
NVME_REQ_CANCELLED = (1 << 0),
NVME_REQ_USERCMD = (1 << 1),
NVME_MPATH_IO_STATS = (1 << 2),
+ NVME_MPATH_CNT_ACTIVE = (1 << 3),
};
static inline struct nvme_request *nvme_req(struct request *req)
@@ -360,6 +362,7 @@ struct nvme_ctrl {
size_t ana_log_size;
struct timer_list anatt_timer;
struct work_struct ana_work;
+ atomic_t nr_active;
#endif
#ifdef CONFIG_NVME_HOST_AUTH
@@ -408,6 +411,7 @@ static inline enum nvme_ctrl_state nvme_ctrl_state(struct nvme_ctrl *ctrl)
enum nvme_iopolicy {
NVME_IOPOLICY_NUMA,
NVME_IOPOLICY_RR,
+ NVME_IOPOLICY_QD,
};
struct nvme_subsystem {
--
2.45.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v8 0/2] nvme: queue-depth multipath iopolicy
2024-06-25 12:26 [PATCH v8 0/2] nvme: queue-depth multipath iopolicy John Meneghini
2024-06-25 12:26 ` [PATCH v8 1/2] nvme-multipath: prepare for "queue-depth" iopolicy John Meneghini
2024-06-25 12:26 ` [PATCH v8 2/2] nvme-multipath: implement " John Meneghini
@ 2024-06-25 16:37 ` Keith Busch
2024-06-26 16:54 ` Keith Busch
2 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2024-06-25 16:37 UTC (permalink / raw)
To: John Meneghini
Cc: hch, sagi, linux-nvme, linux-kernel, emilne, jrani, randyj,
chaitanyak, hare
On Tue, Jun 25, 2024 at 08:26:03AM -0400, John Meneghini wrote:
> Please add this to nvme-6.11.
This looks good to me! I'll give another day for potential
comments/reviews, but I think this okay for 6.11.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v8 1/2] nvme-multipath: prepare for "queue-depth" iopolicy
2024-06-25 12:26 ` [PATCH v8 1/2] nvme-multipath: prepare for "queue-depth" iopolicy John Meneghini
@ 2024-06-26 5:05 ` Christoph Hellwig
2024-06-26 8:02 ` Chaitanya Kulkarni
2024-06-26 8:24 ` Sagi Grimberg
2 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2024-06-26 5:05 UTC (permalink / raw)
To: John Meneghini
Cc: kbusch, hch, sagi, linux-nvme, linux-kernel, emilne, jrani,
randyj, chaitanyak, hare
On Tue, Jun 25, 2024 at 08:26:04AM -0400, John Meneghini wrote:
> + struct nvme_ns *old = srcu_dereference(head->current_path[node], &head->srcu);
Please avoid the overly long line here.
> +inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
> +{
> + if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR)
> + return nvme_round_robin_path(head);
> + else
> + return nvme_numa_path(head);
No need for an else after a return.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v8 2/2] nvme-multipath: implement "queue-depth" iopolicy
2024-06-25 12:26 ` [PATCH v8 2/2] nvme-multipath: implement " John Meneghini
@ 2024-06-26 5:06 ` Christoph Hellwig
2024-06-26 8:19 ` Chaitanya Kulkarni
2024-06-26 8:24 ` Sagi Grimberg
2 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2024-06-26 5:06 UTC (permalink / raw)
To: John Meneghini
Cc: kbusch, hch, sagi, linux-nvme, linux-kernel, emilne, jrani,
randyj, chaitanyak, hare
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v8 1/2] nvme-multipath: prepare for "queue-depth" iopolicy
2024-06-25 12:26 ` [PATCH v8 1/2] nvme-multipath: prepare for "queue-depth" iopolicy John Meneghini
2024-06-26 5:05 ` Christoph Hellwig
@ 2024-06-26 8:02 ` Chaitanya Kulkarni
2024-06-26 8:24 ` Sagi Grimberg
2 siblings, 0 replies; 12+ messages in thread
From: Chaitanya Kulkarni @ 2024-06-26 8:02 UTC (permalink / raw)
To: John Meneghini, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
emilne@redhat.com, jrani@purestorage.com, randyj@purestorage.com,
hare@kernel.org
On 6/25/24 05:26, John Meneghini wrote:
> This patch prepares for the introduction of a new iopolicy by breaking up
> the nvme_find_path() code path into sub-routines.
>
> Signed-off-by: John Meneghini <jmeneghi@redhat.com>
> ---
> drivers/nvme/host/multipath.c | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 8a4d0e377114..0ade378e514b 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -291,10 +291,14 @@ static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
> return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
> }
>
> -static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
> - int node, struct nvme_ns *old)
> +static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head)
> {
> struct nvme_ns *ns, *found = NULL;
> + int node = numa_node_id();
> + struct nvme_ns *old = srcu_dereference(head->current_path[node], &head->srcu);
> +
> + if (unlikely(!old))
> + return __nvme_find_path(head, node);
nit:- above looks little bit odd with long line, I'd just :-
struct nvme_ns *old;
old = srcu_dereference(head->current_path[node], &head->srcu);
if (unlikely(!old))
return __nvme_find_path(head, node);
>
> if (list_is_singular(&head->list)) {
> if (nvme_path_is_disabled(old))
> @@ -340,7 +344,7 @@ static inline bool nvme_path_is_optimized(struct nvme_ns *ns)
> ns->ana_state == NVME_ANA_OPTIMIZED;
> }
>
> -inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
> +static struct nvme_ns *nvme_numa_path(struct nvme_ns_head *head)
> {
> int node = numa_node_id();
> struct nvme_ns *ns;
> @@ -348,14 +352,19 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
> ns = srcu_dereference(head->current_path[node], &head->srcu);
> if (unlikely(!ns))
> return __nvme_find_path(head, node);
> -
> - if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR)
> - return nvme_round_robin_path(head, node, ns);
> if (unlikely(!nvme_path_is_optimized(ns)))
> return __nvme_find_path(head, node);
> return ns;
> }
>
> +inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
> +{
> + if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR)
> + return nvme_round_robin_path(head);
> + else
> + return nvme_numa_path(head);
nit: - else is not needed ...
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v8 2/2] nvme-multipath: implement "queue-depth" iopolicy
2024-06-25 12:26 ` [PATCH v8 2/2] nvme-multipath: implement " John Meneghini
2024-06-26 5:06 ` Christoph Hellwig
@ 2024-06-26 8:19 ` Chaitanya Kulkarni
2024-06-26 8:24 ` Sagi Grimberg
2 siblings, 0 replies; 12+ messages in thread
From: Chaitanya Kulkarni @ 2024-06-26 8:19 UTC (permalink / raw)
To: John Meneghini, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
emilne@redhat.com, jrani@purestorage.com, randyj@purestorage.com,
hare@kernel.org
On 6/25/24 05:26, John Meneghini wrote:
> From: Thomas Song<tsong@purestorage.com>
>
> The round-robin path selector is inefficient in cases where there is a
> difference in latency between paths. In the presence of one or more
> high latency paths the round-robin selector continues to use the high
> latency path equally. This results in a bias towards the highest latency
> path and can cause a significant decrease in overall performance as IOs
> pile on the highest latency path. This problem is acute with NVMe-oF
> controllers.
>
> The queue-depth path selector sends I/O down the path with the lowest
> number of requests in its request queue. Paths with lower latency will
> clear requests more quickly and have less requests queued compared to
> higher latency paths. The goal of this path selector is to make more use
> of lower latency paths which will bring down overall IO latency and
> increase throughput and performance.
>
> Signed-off-by: Thomas Song<tsong@purestorage.com>
> [emilne: patch developed by Thomas Song @ Pure Storage, fixed whitespace
> and compilation warnings, updated MODULE_PARM description, and
> fixed potential issue with ->current_path[] being used]
> Co-developed-by: Ewan D. Milne<emilne@redhat.com>
> Signed-off-by: Ewan D. Milne<emilne@redhat.com>
> [jmeneghi: various changes and improvements, addressed review comments]
> Co-developed-by: John Meneghini<jmeneghi@redhat.com>
> Signed-off-by: John Meneghini<jmeneghi@redhat.com>
> Link:https://lore.kernel.org/linux-nvme/20240509202929.831680-1-jmeneghi@redhat.com/
> Tested-by: Marco Patalano<mpatalan@redhat.com>
> Reviewed-by: Randy Jennings<randyj@purestorage.com>
> Tested-by: Jyoti Rani<jrani@purestorage.com>
> Tested-by: John Meneghini<jmeneghi@redhat.com>
> Reviewed-by: Hannes Reinecke<hare@suse.de>
thanks for sharing github link, those scripts are in-general helpful ...
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v8 1/2] nvme-multipath: prepare for "queue-depth" iopolicy
2024-06-25 12:26 ` [PATCH v8 1/2] nvme-multipath: prepare for "queue-depth" iopolicy John Meneghini
2024-06-26 5:05 ` Christoph Hellwig
2024-06-26 8:02 ` Chaitanya Kulkarni
@ 2024-06-26 8:24 ` Sagi Grimberg
2 siblings, 0 replies; 12+ messages in thread
From: Sagi Grimberg @ 2024-06-26 8:24 UTC (permalink / raw)
To: John Meneghini, kbusch, hch
Cc: linux-nvme, linux-kernel, emilne, jrani, randyj, chaitanyak, hare
other than the comment from hch,
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v8 2/2] nvme-multipath: implement "queue-depth" iopolicy
2024-06-25 12:26 ` [PATCH v8 2/2] nvme-multipath: implement " John Meneghini
2024-06-26 5:06 ` Christoph Hellwig
2024-06-26 8:19 ` Chaitanya Kulkarni
@ 2024-06-26 8:24 ` Sagi Grimberg
2 siblings, 0 replies; 12+ messages in thread
From: Sagi Grimberg @ 2024-06-26 8:24 UTC (permalink / raw)
To: John Meneghini, kbusch, hch
Cc: linux-nvme, linux-kernel, emilne, jrani, randyj, chaitanyak, hare
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v8 0/2] nvme: queue-depth multipath iopolicy
2024-06-25 16:37 ` [PATCH v8 0/2] nvme: queue-depth multipath iopolicy Keith Busch
@ 2024-06-26 16:54 ` Keith Busch
2024-06-26 17:26 ` Randy Jennings
0 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2024-06-26 16:54 UTC (permalink / raw)
To: John Meneghini
Cc: hch, sagi, linux-nvme, linux-kernel, emilne, jrani, randyj,
chaitanyak, hare
On Tue, Jun 25, 2024 at 10:37:33AM -0600, Keith Busch wrote:
> On Tue, Jun 25, 2024 at 08:26:03AM -0400, John Meneghini wrote:
> > Please add this to nvme-6.11.
>
> This looks good to me! I'll give another day for potential
> comments/reviews, but I think this okay for 6.11.
I fixed up the suggestions from Christoph while applying. Thanks,
patches are now in nvme-6.11.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v8 0/2] nvme: queue-depth multipath iopolicy
2024-06-26 16:54 ` Keith Busch
@ 2024-06-26 17:26 ` Randy Jennings
0 siblings, 0 replies; 12+ messages in thread
From: Randy Jennings @ 2024-06-26 17:26 UTC (permalink / raw)
To: Keith Busch
Cc: John Meneghini, hch, sagi, linux-nvme, linux-kernel, emilne,
jrani, chaitanyak, hare
> I fixed up the suggestions from Christoph while applying. Thanks,
> patches are now in nvme-6.11.
Thank you very much!
Sincerely,
Randy Jennings
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2024-06-26 17:26 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-25 12:26 [PATCH v8 0/2] nvme: queue-depth multipath iopolicy John Meneghini
2024-06-25 12:26 ` [PATCH v8 1/2] nvme-multipath: prepare for "queue-depth" iopolicy John Meneghini
2024-06-26 5:05 ` Christoph Hellwig
2024-06-26 8:02 ` Chaitanya Kulkarni
2024-06-26 8:24 ` Sagi Grimberg
2024-06-25 12:26 ` [PATCH v8 2/2] nvme-multipath: implement " John Meneghini
2024-06-26 5:06 ` Christoph Hellwig
2024-06-26 8:19 ` Chaitanya Kulkarni
2024-06-26 8:24 ` Sagi Grimberg
2024-06-25 16:37 ` [PATCH v8 0/2] nvme: queue-depth multipath iopolicy Keith Busch
2024-06-26 16:54 ` Keith Busch
2024-06-26 17:26 ` Randy Jennings
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox