* [PATCH v7 0/1] nvme: queue-depth multipath iopolicy
@ 2024-06-19 16:35 John Meneghini
2024-06-19 16:35 ` [PATCH v7 1/1] nvme-multipath: implement "queue-depth" iopolicy John Meneghini
0 siblings, 1 reply; 8+ messages in thread
From: John Meneghini @ 2024-06-19 16:35 UTC (permalink / raw)
To: kbusch, hch, sagi
Cc: linux-nvme, linux-kernel, emilne, jmeneghi, jrani, randyj,
chaitanyak, hare
I've addressed Chaitanya's and Hannes's review comments
and retested everything. Test results can be seen at:
https://github.com/johnmeneghini/iopolicy/tree/sample3
Please add this to nvme-6.11.
Changes since V6:
Cleaned up tab formatting in nvme.h and removed extra blank lines. Removed the
results variable from nvme_mpath_end_request().
Changes since V5:
Refactored nvme_find_path() to reduce the spaghetti code.
Cleaned up all comments and reduced the total size of the
diff, and fixed the commit message. Thomas Song now
gets credit as the first author.
Changes since V4:
Removed the atomic_set() and added an early return when (old_iopolicy == iopolicy)
at the beginning of nvme_subsys_iopolicy_update().
Changes since V3:
Addressed all review comments, fixed the commit log, and moved
nr_counter initialization from nvme_mpath_init_ctlr() to
nvme_mpath_init_identify().
Changes since V2:
Add the NVME_MPATH_CNT_ACTIVE flag to eliminate a READ_ONCE in the
completion path and increment/decrement the active_nr count on all mpath
IOs - including passthru commands.
Send a pr_notice whenever the iopolicy on a subsystem is changed. This
is important for support reasons. It is fully expected that users will
be changing the iopolicy with active IO in progress.
Squashed everything and rebased to nvme-v6.10
Changes since V1:
I'm re-issuing Ewan's queue-depth patches in preparation for LSFMM.
These patches were first shown at ALPSS 2023, where I shared the following
graphs which measure the IO distribution across 4 active-optimized
controllers using the round-robin versus queue-depth iopolicy.
https://people.redhat.com/jmeneghi/ALPSS_2023/NVMe_QD_Multipathing.pdf
Since that time we have continued testing these patches with a number of
different nvme-of storage arrays and test bed configurations, and I've
codified the tests and methods we use to measure IO distribution.
All of my test results, together with the scripts I used to generate these
graphs, are available at:
https://github.com/johnmeneghini/iopolicy
Please use the scripts in this repository to do your own testing.
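To illustrate the effect those graphs measure, here is a hypothetical, userspace-only C simulation (not the kernel code, and not part of this patch): two paths with different drain rates receive I/Os chosen either by strict alternation or by picking the shallower queue, and we measure the backlog that accumulates on the slower path. The tick counts and drain rates are arbitrary illustration values.

```c
#include <assert.h>

/*
 * Toy userspace model -- NOT the kernel code. Two paths with
 * different drain rates receive two I/Os per tick, chosen either by
 * strict alternation (round-robin) or by picking the path with the
 * smaller outstanding count (queue-depth). Returns the backlog left
 * on the slow path after `ticks` rounds.
 */
static int slow_path_backlog(int ticks, int use_qd)
{
	int depth[2] = { 0, 0 };	/* [0] = fast path, [1] = slow path */
	int rr = 0;

	for (int t = 0; t < ticks; t++) {
		for (int i = 0; i < 2; i++) {	/* dispatch two I/Os */
			int target;

			if (use_qd)	/* least-loaded path, fast path on ties */
				target = depth[0] <= depth[1] ? 0 : 1;
			else		/* strict alternation */
				target = rr++ % 2;
			depth[target]++;
		}
		/* the fast path drains up to three I/Os per tick ... */
		depth[0] -= 3;
		if (depth[0] < 0)
			depth[0] = 0;
		/* ... the slow path drains one I/O every other tick */
		if (t % 2 == 0 && depth[1] > 0)
			depth[1]--;
	}
	return depth[1];
}
```

Under round-robin the slow path's backlog grows without bound, while the queue-depth policy keeps it near zero, which is the same bias the ALPSS graphs show on real hardware.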
These patches are based on nvme-v6.9
Thomas Song (1):
nvme-multipath: implement "queue-depth" iopolicy
drivers/nvme/host/core.c | 2 +-
drivers/nvme/host/multipath.c | 103 +++++++++++++++++++++++++++++++---
drivers/nvme/host/nvme.h | 5 ++
3 files changed, 101 insertions(+), 9 deletions(-)
--
2.45.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v7 1/1] nvme-multipath: implement "queue-depth" iopolicy
2024-06-19 16:35 [PATCH v7 0/1] nvme: queue-depth multipath iopolicy John Meneghini
@ 2024-06-19 16:35 ` John Meneghini
2024-06-20 6:56 ` Christoph Hellwig
0 siblings, 1 reply; 8+ messages in thread
From: John Meneghini @ 2024-06-19 16:35 UTC (permalink / raw)
To: kbusch, hch, sagi
Cc: linux-nvme, linux-kernel, emilne, jmeneghi, jrani, randyj,
chaitanyak, hare
From: Thomas Song <tsong@purestorage.com>
The round-robin path selector is inefficient in cases where there is a
difference in latency between paths. In the presence of one or more
high latency paths the round-robin selector continues to use the high
latency path equally. This results in a bias towards the highest latency
path and can cause a significant decrease in overall performance as IOs
pile on the highest latency path. This problem is acute with NVMe-oF
controllers.
The queue-depth policy instead sends I/O requests down the path with the
least amount of requests in its request queue. Paths with lower latency
will clear requests more quickly and have less requests in their queues
compared to higher latency paths. The goal of this path selector is to
make more use of lower latency paths which will bring down overall IO
latency and increase throughput and performance.
Signed-off-by: Thomas Song <tsong@purestorage.com>
[emilne: patch developed by Thomas Song @ Pure Storage, fixed whitespace
and compilation warnings, updated MODULE_PARM description, and
fixed potential issue with ->current_path[] being used]
Co-developed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
[jmeneghi: vairious changes and improvements, addressed review comments]
Co-developed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: John Meneghini <jmeneghi@redhat.com>
Link: https://lore.kernel.org/linux-nvme/20240509202929.831680-1-jmeneghi@redhat.com/
Tested-by: Marco Patalano <mpatalan@redhat.com>
Reviewed-by: Randy Jennings <randyj@purestorage.com>
Tested-by: Jyoti Rani <jrani@purestorage.com>
Tested-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
drivers/nvme/host/core.c | 2 +-
drivers/nvme/host/multipath.c | 103 +++++++++++++++++++++++++++++++---
drivers/nvme/host/nvme.h | 5 ++
3 files changed, 101 insertions(+), 9 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7c9f91314d36..c10ff8815d82 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -110,7 +110,7 @@ struct workqueue_struct *nvme_delete_wq;
EXPORT_SYMBOL_GPL(nvme_delete_wq);
static LIST_HEAD(nvme_subsystems);
-static DEFINE_MUTEX(nvme_subsystems_lock);
+DEFINE_MUTEX(nvme_subsystems_lock);
static DEFINE_IDA(nvme_instance_ida);
static dev_t nvme_ctrl_base_chr_devt;
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 03a6868f4dbc..4e97cfe76a95 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -17,6 +17,7 @@ MODULE_PARM_DESC(multipath,
static const char *nvme_iopolicy_names[] = {
[NVME_IOPOLICY_NUMA] = "numa",
[NVME_IOPOLICY_RR] = "round-robin",
+ [NVME_IOPOLICY_QD] = "queue-depth",
};
static int iopolicy = NVME_IOPOLICY_NUMA;
@@ -29,6 +30,8 @@ static int nvme_set_iopolicy(const char *val, const struct kernel_param *kp)
iopolicy = NVME_IOPOLICY_NUMA;
else if (!strncmp(val, "round-robin", 11))
iopolicy = NVME_IOPOLICY_RR;
+ else if (!strncmp(val, "queue-depth", 11))
+ iopolicy = NVME_IOPOLICY_QD;
else
return -EINVAL;
@@ -43,7 +46,7 @@ static int nvme_get_iopolicy(char *buf, const struct kernel_param *kp)
module_param_call(iopolicy, nvme_set_iopolicy, nvme_get_iopolicy,
&iopolicy, 0644);
MODULE_PARM_DESC(iopolicy,
- "Default multipath I/O policy; 'numa' (default) or 'round-robin'");
+ "Default multipath I/O policy; 'numa' (default), 'round-robin' or 'queue-depth'");
void nvme_mpath_default_iopolicy(struct nvme_subsystem *subsys)
{
@@ -128,6 +131,11 @@ void nvme_mpath_start_request(struct request *rq)
struct nvme_ns *ns = rq->q->queuedata;
struct gendisk *disk = ns->head->disk;
+ if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD) {
+ atomic_inc(&ns->ctrl->nr_active);
+ nvme_req(rq)->flags |= NVME_MPATH_CNT_ACTIVE;
+ }
+
if (!blk_queue_io_stat(disk->queue) || blk_rq_is_passthrough(rq))
return;
@@ -141,6 +149,9 @@ void nvme_mpath_end_request(struct request *rq)
{
struct nvme_ns *ns = rq->q->queuedata;
+ if ((nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE))
+ WARN_ON_ONCE((atomic_dec_if_positive(&ns->ctrl->nr_active)) < 0);
+
if (!(nvme_req(rq)->flags & NVME_MPATH_IO_STATS))
return;
bdev_end_io_acct(ns->head->disk->part0, req_op(rq),
@@ -291,10 +302,14 @@ static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
}
-static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
- int node, struct nvme_ns *old)
+static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head)
{
- struct nvme_ns *ns, *found = NULL;
+ struct nvme_ns *ns, *old, *found = NULL;
+ int node = numa_node_id();
+
+ old = srcu_dereference(head->current_path[node], &head->srcu);
+ if (unlikely(!old))
+ return __nvme_find_path(head, node);
if (list_is_singular(&head->list)) {
if (nvme_path_is_disabled(old))
@@ -334,13 +349,49 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
return found;
}
+static struct nvme_ns *nvme_queue_depth_path(struct nvme_ns_head *head)
+{
+ struct nvme_ns *best_opt = NULL, *best_nonopt = NULL, *ns;
+ unsigned int min_depth_opt = UINT_MAX, min_depth_nonopt = UINT_MAX;
+ unsigned int depth;
+
+ list_for_each_entry_rcu(ns, &head->list, siblings) {
+ if (nvme_path_is_disabled(ns))
+ continue;
+
+ depth = atomic_read(&ns->ctrl->nr_active);
+
+ switch (ns->ana_state) {
+ case NVME_ANA_OPTIMIZED:
+ if (depth < min_depth_opt) {
+ min_depth_opt = depth;
+ best_opt = ns;
+ }
+ break;
+ case NVME_ANA_NONOPTIMIZED:
+ if (depth < min_depth_nonopt) {
+ min_depth_nonopt = depth;
+ best_nonopt = ns;
+ }
+ break;
+ default:
+ break;
+ }
+
+ if (min_depth_opt == 0)
+ return best_opt;
+ }
+
+ return best_opt ? best_opt : best_nonopt;
+}
+
static inline bool nvme_path_is_optimized(struct nvme_ns *ns)
{
return nvme_ctrl_state(ns->ctrl) == NVME_CTRL_LIVE &&
ns->ana_state == NVME_ANA_OPTIMIZED;
}
-inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
+static struct nvme_ns *nvme_numa_path(struct nvme_ns_head *head)
{
int node = numa_node_id();
struct nvme_ns *ns;
@@ -349,13 +400,24 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
if (unlikely(!ns))
return __nvme_find_path(head, node);
- if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR)
- return nvme_round_robin_path(head, node, ns);
if (unlikely(!nvme_path_is_optimized(ns)))
return __nvme_find_path(head, node);
return ns;
}
+
+inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
+{
+ switch (READ_ONCE(head->subsys->iopolicy)) {
+ case NVME_IOPOLICY_QD:
+ return nvme_queue_depth_path(head);
+ case NVME_IOPOLICY_RR:
+ return nvme_round_robin_path(head);
+ default:
+ return nvme_numa_path(head);
+ }
+}
+
static bool nvme_available_path(struct nvme_ns_head *head)
{
struct nvme_ns *ns;
@@ -803,6 +865,28 @@ static ssize_t nvme_subsys_iopolicy_show(struct device *dev,
nvme_iopolicy_names[READ_ONCE(subsys->iopolicy)]);
}
+static void nvme_subsys_iopolicy_update(struct nvme_subsystem *subsys,
+ int iopolicy)
+{
+ struct nvme_ctrl *ctrl;
+ int old_iopolicy = READ_ONCE(subsys->iopolicy);
+
+ if (old_iopolicy == iopolicy)
+ return;
+
+ WRITE_ONCE(subsys->iopolicy, iopolicy);
+
+ /* iopolicy changes clear the mpath by design */
+ mutex_lock(&nvme_subsystems_lock);
+ list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
+ nvme_mpath_clear_ctrl_paths(ctrl);
+ mutex_unlock(&nvme_subsystems_lock);
+
+ pr_notice("%s: changed from %s to %s for subsysnqn %s\n", __func__,
+ nvme_iopolicy_names[old_iopolicy], nvme_iopolicy_names[iopolicy],
+ subsys->subnqn);
+}
+
static ssize_t nvme_subsys_iopolicy_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t count)
{
@@ -812,7 +896,7 @@ static ssize_t nvme_subsys_iopolicy_store(struct device *dev,
for (i = 0; i < ARRAY_SIZE(nvme_iopolicy_names); i++) {
if (sysfs_streq(buf, nvme_iopolicy_names[i])) {
- WRITE_ONCE(subsys->iopolicy, i);
+ nvme_subsys_iopolicy_update(subsys, i);
return count;
}
}
@@ -923,6 +1007,9 @@ int nvme_mpath_init_identify(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
!(ctrl->subsys->cmic & NVME_CTRL_CMIC_ANA))
return 0;
+ /* initialize this in the identify path to cover controller resets */
+ atomic_set(&ctrl->nr_active, 0);
+
if (!ctrl->max_namespaces ||
ctrl->max_namespaces > le32_to_cpu(id->nn)) {
dev_err(ctrl->device,
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 73442d3f504b..c440e4cb60f8 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -50,6 +50,8 @@ extern struct workqueue_struct *nvme_wq;
extern struct workqueue_struct *nvme_reset_wq;
extern struct workqueue_struct *nvme_delete_wq;
+extern struct mutex nvme_subsystems_lock;
+
/*
* List of workarounds for devices that required behavior not specified in
* the standard.
@@ -195,6 +197,7 @@ enum {
NVME_REQ_CANCELLED = (1 << 0),
NVME_REQ_USERCMD = (1 << 1),
NVME_MPATH_IO_STATS = (1 << 2),
+ NVME_MPATH_CNT_ACTIVE = (1 << 3),
};
static inline struct nvme_request *nvme_req(struct request *req)
@@ -360,6 +363,7 @@ struct nvme_ctrl {
size_t ana_log_size;
struct timer_list anatt_timer;
struct work_struct ana_work;
+ atomic_t nr_active;
#endif
#ifdef CONFIG_NVME_HOST_AUTH
@@ -408,6 +412,7 @@ static inline enum nvme_ctrl_state nvme_ctrl_state(struct nvme_ctrl *ctrl)
enum nvme_iopolicy {
NVME_IOPOLICY_NUMA,
NVME_IOPOLICY_RR,
+ NVME_IOPOLICY_QD,
};
struct nvme_subsystem {
--
2.45.1
* Re: [PATCH v7 1/1] nvme-multipath: implement "queue-depth" iopolicy
2024-06-19 16:35 ` [PATCH v7 1/1] nvme-multipath: implement "queue-depth" iopolicy John Meneghini
@ 2024-06-20 6:56 ` Christoph Hellwig
2024-06-20 14:41 ` John Meneghini
2024-06-20 17:54 ` John Meneghini
0 siblings, 2 replies; 8+ messages in thread
From: Christoph Hellwig @ 2024-06-20 6:56 UTC (permalink / raw)
To: John Meneghini
Cc: kbusch, hch, sagi, linux-nvme, linux-kernel, emilne, jrani,
randyj, chaitanyak, hare
> [jmeneghi: vairious changes and improvements, addressed review comments]
s/vairious/various/
> + if ((nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE))
No need for the double braces here.
> + WARN_ON_ONCE((atomic_dec_if_positive(&ns->ctrl->nr_active)) < 0);
Overly long line.
But I don't understand why you need the WARN_ON anyway. If the value
must always be positive there is no point in atomic_dec_if_positive.
If misaccounting is fine there WARN_ON is counterproductive.
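For readers unfamiliar with the primitive under discussion: atomic_dec_if_positive() only decrements when the counter is positive, and returns the old value minus one either way, so a negative return means nothing was decremented. A rough userspace sketch of those semantics using C11 atomics (an illustrative stand-in, not the kernel's atomic_t implementation):

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Userspace sketch of atomic_dec_if_positive() semantics: decrement
 * only when the counter is positive; return the new value on success,
 * or a negative value (old - 1) without touching the counter. This is
 * why pairing it with a WARN_ON on a negative result is redundant:
 * the primitive itself already refuses to underflow.
 */
static int dec_if_positive(_Atomic int *v)
{
	int old = atomic_load(v);

	while (old > 0) {
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return old - 1;	/* decremented: new value */
		/* CAS failure reloaded `old`; retry */
	}
	return old - 1;	/* counter unchanged; result is negative */
}

/* self-check: start at 2 and decrement three times */
static int demo_sequence_ok(void)
{
	_Atomic int c = 2;

	return dec_if_positive(&c) == 1 &&
	       dec_if_positive(&c) == 0 &&
	       dec_if_positive(&c) == -1 &&	/* refused: stays at 0 */
	       atomic_load(&c) == 0;
}
```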
> -static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
> - int node, struct nvme_ns *old)
> +static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head)
> {
> - struct nvme_ns *ns, *found = NULL;
> + struct nvme_ns *ns, *old, *found = NULL;
> + int node = numa_node_id();
> +
> + old = srcu_dereference(head->current_path[node], &head->srcu);
> + if (unlikely(!old))
> + return __nvme_find_path(head, node);
Can you split the refactoring of the existing path selectors into a
prep patch, please?
> +static void nvme_subsys_iopolicy_update(struct nvme_subsystem *subsys,
> + int iopolicy)
> +{
> + struct nvme_ctrl *ctrl;
> + int old_iopolicy = READ_ONCE(subsys->iopolicy);
> +
> + if (old_iopolicy == iopolicy)
> + return;
> +
> + WRITE_ONCE(subsys->iopolicy, iopolicy);
What is the atomicy model here? There doesn't seem to be any
global lock protecting it? Maybe move it into the
nvme_subsystems_lock critical section?
> + pr_notice("%s: changed from %s to %s for subsysnqn %s\n", __func__,
> + nvme_iopolicy_names[old_iopolicy], nvme_iopolicy_names[iopolicy],
> + subsys->subnqn);
The function is not really relevant here, this should become something
like:
pr_notice("%s: changing iopolicy from %s to %s\n",
subsys->subnqn,
nvme_iopolicy_names[old_iopolicy],
nvme_iopolicy_names[iopolicy]);
or maybe:
dev_notice(&subsys->dev, "changing iopolicy from %s to %s\n",
nvme_iopolicy_names[old_iopolicy],
nvme_iopolicy_names[iopolicy]);
* Re: [PATCH v7 1/1] nvme-multipath: implement "queue-depth" iopolicy
2024-06-20 6:56 ` Christoph Hellwig
@ 2024-06-20 14:41 ` John Meneghini
2024-06-20 17:54 ` John Meneghini
1 sibling, 0 replies; 8+ messages in thread
From: John Meneghini @ 2024-06-20 14:41 UTC (permalink / raw)
To: Christoph Hellwig
Cc: kbusch, sagi, linux-nvme, linux-kernel, emilne, jrani, randyj,
chaitanyak, hare
On 6/20/24 02:56, Christoph Hellwig wrote:
>> + WARN_ON_ONCE((atomic_dec_if_positive(&ns->ctrl->nr_active)) < 0);
> Overly long line.
>
> But I don't understand why you need the WARN_ON anyway. If the value
> must always be positive there is no point in atomic_dec_if_positive.
> If misaccounting is fine there WARN_ON is counterproductive.
Agreed. I wanted this here for testing purposes. In all of my hours of running tests I never saw this warning go off, so I
agree it should be removed.
/John
* Re: [PATCH v7 1/1] nvme-multipath: implement "queue-depth" iopolicy
2024-06-20 6:56 ` Christoph Hellwig
2024-06-20 14:41 ` John Meneghini
@ 2024-06-20 17:54 ` John Meneghini
2024-06-24 8:46 ` Christoph Hellwig
1 sibling, 1 reply; 8+ messages in thread
From: John Meneghini @ 2024-06-20 17:54 UTC (permalink / raw)
To: Christoph Hellwig
Cc: kbusch, sagi, linux-nvme, linux-kernel, emilne, jrani, randyj,
chaitanyak, hare
On 6/20/24 02:56, Christoph Hellwig wrote:
>> [jmeneghi: vairious changes and improvements, addressed review comments]
>> -static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
>> - int node, struct nvme_ns *old)
>> +static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head)
>> {
>> - struct nvme_ns *ns, *found = NULL;
>> + struct nvme_ns *ns, *old, *found = NULL;
>> + int node = numa_node_id();
>> +
>> + old = srcu_dereference(head->current_path[node], &head->srcu);
>> + if (unlikely(!old))
>> + return __nvme_find_path(head, node);
>
> Can you split the refactoring of the existing path selectors into a
> prep patch, please?
Yes, I can do that.
>> +static void nvme_subsys_iopolicy_update(struct nvme_subsystem *subsys,
>> + int iopolicy)
>> +{
>> + struct nvme_ctrl *ctrl;
>> + int old_iopolicy = READ_ONCE(subsys->iopolicy);
>> +
>> + if (old_iopolicy == iopolicy)
>> + return;
>> +
>> + WRITE_ONCE(subsys->iopolicy, iopolicy);
>
> What is the atomicy model here? There doesn't seem to be any
> global lock protecting it? Maybe move it into the
> nvme_subsystems_lock critical section?
Good question. I didn't write this code. Yes, I agree this looks racy. Updates to the subsys->iopolicy variable are not atomic.
They don't need to be. The process of changing the iopolicy doesn't need to be synchronized and each CPU's cache will be updated
lazily. This was done to avoid the expense of adding (another) atomic read to the IO path.
So really, the nvme_subsystems_lock only protects the list entries in &nvme_subsystems. I can move the WRITE_ONCE() update to
after the lock, but then where did the caller - nvme_subsys_iopolicy_update() - get its reference for the nvme_subsystem* from?
And there's no lock or synchronization on the read side.
The subsys->iopolicy has always been unprotected. It's been that way since before this change.
At least... that's my read of the code.
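The lockless pattern described here can be sketched in userspace with C11 relaxed atomics standing in for READ_ONCE()/WRITE_ONCE() (an illustrative model, not the kernel code): the store side publishes the new policy with one aligned write, the hot path reads it with one aligned load, and a reader briefly seeing the old value is harmless because every policy selects a valid path.

```c
#include <assert.h>
#include <stdatomic.h>

enum iopolicy { IOPOLICY_NUMA, IOPOLICY_RR, IOPOLICY_QD };

/* one policy word per subsystem; no lock guards it */
static _Atomic int subsys_iopolicy = IOPOLICY_NUMA;

/* config path: analogous to WRITE_ONCE(subsys->iopolicy, iopolicy) */
static void set_iopolicy(int policy)
{
	atomic_store_explicit(&subsys_iopolicy, policy,
			      memory_order_relaxed);
}

/* I/O hot path: analogous to READ_ONCE(head->subsys->iopolicy) */
static int get_iopolicy(void)
{
	return atomic_load_explicit(&subsys_iopolicy,
				    memory_order_relaxed);
}
```

The relaxed ordering mirrors the kernel pattern: no barrier or atomic RMW is paid in the hot path, at the cost of readers converging on the new policy only eventually.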
>> + pr_notice("%s: changed from %s to %s for subsysnqn %s\n", __func__,
>> + nvme_iopolicy_names[old_iopolicy], nvme_iopolicy_names[iopolicy],
>> + subsys->subnqn);
>
> The function is not really relevant here, this should become something
> like:
>
> pr_notice("%s: changing iopolicy from %s to %s\n",
> subsys->subnqn,
> nvme_iopolicy_names[old_iopolicy],
> nvme_iopolicy_names[iopolicy]);
How about:
pr_notice("Changed iopolicy from %s to %s for subsysnqn %s\n",
nvme_iopolicy_names[old_iopolicy],
nvme_iopolicy_names[iopolicy],
subsys->subnqn);
> or maybe:
>
> dev_notice(&subsys->dev, "changing iopolicy from %s to %s\n",
> nvme_iopolicy_names[old_iopolicy],
> nvme_iopolicy_names[iopolicy]);
>
The dev_notice will only spit out the nvme controller number (e.g. nvme5c5n3) and that's not very helpful to the user. I want to
include the subsystem NQN because that's easily visible and something the user can see on both the storage and the host.
Example:
root:resultsF > echo "round-robin" > /sys/class/nvme-subsystem/nvme-subsys11/iopolicy
[Thu Jun 20 13:42:59 2024] Changed iopolicy from queue-depth to round-robin for subsysnqn
nqn.1992-08.com.netapp:sn.2b82d9b13bb211ee8744d039ea989119:subsystem.SS104a
root:resultsF > nvme list-subsys | grep -A 8 nqn.1992-08.com.netapp:sn.2b82d9b13bb211ee8744d039ea989119:subsystem.SS104a
nvme-subsys11 - NQN=nqn.1992-08.com.netapp:sn.2b82d9b13bb211ee8744d039ea989119:subsystem.SS104a
hostnqn=nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0034-5310-8057-b2c04f355333
iopolicy=round-robin
\
+- nvme8 tcp traddr=172.18.60.14,trsvcid=4420,src_addr=172.18.60.4 live
+- nvme17 tcp traddr=172.18.50.15,trsvcid=4420,src_addr=172.18.50.3 live
+- nvme12 tcp traddr=172.18.60.16,trsvcid=4420,src_addr=172.18.60.4 live
+- nvme10 tcp traddr=172.18.50.13,trsvcid=4420,src_addr=172.18.50.3 live
/John
* Re: [PATCH v7 1/1] nvme-multipath: implement "queue-depth" iopolicy
2024-06-20 17:54 ` John Meneghini
@ 2024-06-24 8:46 ` Christoph Hellwig
2024-06-24 17:50 ` John Meneghini
0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2024-06-24 8:46 UTC (permalink / raw)
To: John Meneghini
Cc: Christoph Hellwig, kbusch, sagi, linux-nvme, linux-kernel, emilne,
jrani, randyj, chaitanyak, hare
On Thu, Jun 20, 2024 at 01:54:29PM -0400, John Meneghini wrote:
>>> +static void nvme_subsys_iopolicy_update(struct nvme_subsystem *subsys,
>>> + int iopolicy)
>>> +{
>>> + struct nvme_ctrl *ctrl;
>>> + int old_iopolicy = READ_ONCE(subsys->iopolicy);
>>> +
>>> + if (old_iopolicy == iopolicy)
>>> + return;
>>> +
>>> + WRITE_ONCE(subsys->iopolicy, iopolicy);
>>
>> What is the atomicy model here? There doesn't seem to be any
>> global lock protecting it? Maybe move it into the
>> nvme_subsystems_lock critical section?
>
> Good question. I didn't write this code. Yes, I agree this looks racy.
> Updates to the subsys->iopolicy variable are not atomic. They don't need to
> be. The process of changing the iopolicy doesn't need to be synchronized
> and each CPU's cache will be updated lazily. This was done to avoid the
> expense of adding (another) atomic read the io path.
Looks like all sysfs ->store calls for the same attribute are protected
by of->mutex in kernfs_fop_write_iter and we should actually be fine
here. Sorry for the noise.
>> pr_notice("%s: changing iopolicy from %s to %s\n",
>> subsys->subnqn,
>> nvme_iopolicy_names[old_iopolicy],
>> nvme_iopolicy_names[iopolicy]);
>
> How about:
>
> pr_notice("Changed iopolicy from %s to %s for subsysnqn %s\n",
> nvme_iopolicy_names[old_iopolicy],
> nvme_iopolicy_names[iopolicy],
> subsys->subnqn);
Having the identification as the prefix seems easier to parse
and grep for.
* Re: [PATCH v7 1/1] nvme-multipath: implement "queue-depth" iopolicy
2024-06-24 8:46 ` Christoph Hellwig
@ 2024-06-24 17:50 ` John Meneghini
2024-06-24 17:54 ` Christoph Hellwig
0 siblings, 1 reply; 8+ messages in thread
From: John Meneghini @ 2024-06-24 17:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: kbusch, sagi, linux-nvme, linux-kernel, emilne, jrani, randyj,
chaitanyak, hare
On 6/24/24 04:46, Christoph Hellwig wrote:
> On Thu, Jun 20, 2024 at 01:54:29PM -0400, John Meneghini wrote:
>> How about:
>>
>> pr_notice("Changed iopolicy from %s to %s for subsysnqn %s\n",
>> nvme_iopolicy_names[old_iopolicy],
>> nvme_iopolicy_names[iopolicy],
>> subsys->subnqn);
>
> Having the identification as the prefixe seems easier to parse
> and grep for.
To be clear, Christoph, you're saying you'd like the NQN in the prefix of the dmesg?
E.g.:
root> echo "round-robin" > /sys/class/nvme-subsystem/nvme-subsys11/iopolicy
[Thu Jun 24 11:44:20 2024] subsysnqn nqn.1992-08.com.netapp:sn.2b82d9b13bb211ee8744d039ea989119:subsystem.SS104a iopolicy
changed from queue-depth to round-robin.
Correct?
/John
* Re: [PATCH v7 1/1] nvme-multipath: implement "queue-depth" iopolicy
2024-06-24 17:50 ` John Meneghini
@ 2024-06-24 17:54 ` Christoph Hellwig
0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2024-06-24 17:54 UTC (permalink / raw)
To: John Meneghini
Cc: Christoph Hellwig, kbusch, sagi, linux-nvme, linux-kernel, emilne,
jrani, randyj, chaitanyak, hare
On Mon, Jun 24, 2024 at 01:50:21PM -0400, John Meneghini wrote:
>> Having the identification as the prefix seems easier to parse
>> and grep for.
>
> To be clear Christoph, your saying you'd like the NQN in the prefix of the dmesg?
Yes.