From: Keith Busch <kbusch@kernel.org>
To: John Meneghini <jmeneghi@redhat.com>
Cc: hch@lst.de, sagi@grimberg.me, emilne@redhat.com,
linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
jrani@purestorage.com, randyj@purestorage.com, hare@kernel.org
Subject: Re: [PATCH v3 1/1] nvme: multipath: Implemented new iopolicy "queue-depth"
Date: Mon, 20 May 2024 14:50:04 -0600
Message-ID: <Zku3fBuauZQX6bEO@kbusch-mbp>
In-Reply-To: <20240520202045.427110-2-jmeneghi@redhat.com>
On Mon, May 20, 2024 at 04:20:45PM -0400, John Meneghini wrote:
> From: "Ewan D. Milne" <emilne@redhat.com>
>
> The round-robin path selector is inefficient in cases where there is a
> difference in latency between multiple active optimized paths. In the
> presence of one or more high latency paths the round-robin selector
> continues to use the high latency path equally. This results in a bias
> towards the highest latency path and can cause a significant decrease
> in overall performance as IOs pile on the highest latency path. This
> problem is particularly acute with NVMe-oF controllers.
The patch looks pretty good to me. Just a few questions/comments.
> static LIST_HEAD(nvme_subsystems);
> -static DEFINE_MUTEX(nvme_subsystems_lock);
> +DEFINE_MUTEX(nvme_subsystems_lock);
This seems odd. Why is this lock protecting both the global
nvme_subsystems list and the subsystem controllers? IOW, why isn't the
subsys->ctrls list protected by the more fine-grained 'subsys->lock'
instead of this global lock?
> @@ -43,7 +46,7 @@ static int nvme_get_iopolicy(char *buf, const struct kernel_param *kp)
> module_param_call(iopolicy, nvme_set_iopolicy, nvme_get_iopolicy,
> &iopolicy, 0644);
> MODULE_PARM_DESC(iopolicy,
> - "Default multipath I/O policy; 'numa' (default) or 'round-robin'");
> + "Default multipath I/O policy; 'numa' (default) , 'round-robin' or 'queue-depth'");
Unnecessary space before the ','.
> +	if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD) {
> +		atomic_inc(&ns->ctrl->nr_active);
> +		nvme_req(rq)->flags |= NVME_MPATH_CNT_ACTIVE;
> +	}
> +
>  	if (!blk_queue_io_stat(disk->queue) || blk_rq_is_passthrough(rq))
>  		return;
>
> @@ -140,8 +148,12 @@ void nvme_mpath_end_request(struct request *rq)
>  {
>  	struct nvme_ns *ns = rq->q->queuedata;
>
> +	if ((nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE))
> +		atomic_dec_if_positive(&ns->ctrl->nr_active);
You can just do an atomic_dec(), since your new flag ties this to the
atomic_inc().
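To illustrate the atomic_dec() point, here is a minimal userspace sketch using C11 atomics. struct ctrl, struct req, start_request(), end_request() and MPATH_CNT_ACTIVE are hypothetical stand-ins for nvme_ctrl, the request, nvme_mpath_start_request()/nvme_mpath_end_request() and NVME_MPATH_CNT_ACTIVE. The sketch assumes nothing else resets nr_active while a flagged request is in flight.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace sketch only: these types and the flag are hypothetical
 * stand-ins for nvme_ctrl, the request and NVME_MPATH_CNT_ACTIVE.
 */
#define MPATH_CNT_ACTIVE (1u << 0)

struct ctrl {
	atomic_int nr_active;	/* outstanding requests on this controller */
};

struct req {
	struct ctrl *ctrl;
	unsigned int flags;
};

/* Count the request only while the queue-depth policy is selected. */
void start_request(struct req *rq, bool qd_policy)
{
	if (qd_policy) {
		atomic_fetch_add(&rq->ctrl->nr_active, 1);
		rq->flags |= MPATH_CNT_ACTIVE;
	}
}

/*
 * The flag records that start_request() incremented the counter for this
 * request, so a plain decrement balances it. No "if positive" check is
 * needed, assuming nothing else resets nr_active in the meantime.
 */
void end_request(struct req *rq)
{
	if (rq->flags & MPATH_CNT_ACTIVE) {
		int prev = atomic_fetch_sub(&rq->ctrl->nr_active, 1);

		assert(prev > 0);	/* an unbalanced decrement would trip this */
	}
}
```

Because the decrement is gated on the per-request flag rather than on the current iopolicy, a request that was not counted at start is never decremented at completion.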
> +static struct nvme_ns *nvme_queue_depth_path(struct nvme_ns_head *head)
> +{
> +	struct nvme_ns *best_opt = NULL, *best_nonopt = NULL, *ns;
> +	unsigned int min_depth_opt = UINT_MAX, min_depth_nonopt = UINT_MAX;
> +	unsigned int depth;
> +
> +	list_for_each_entry_rcu(ns, &head->list, siblings) {
> +		if (nvme_path_is_disabled(ns))
> +			continue;
> +
> +		depth = atomic_read(&ns->ctrl->nr_active);
> +
> +		switch (ns->ana_state) {
> +		case NVME_ANA_OPTIMIZED:
> +			if (depth < min_depth_opt) {
> +				min_depth_opt = depth;
> +				best_opt = ns;
> +			}
> +			break;
> +
> +		case NVME_ANA_NONOPTIMIZED:
> +			if (depth < min_depth_nonopt) {
> +				min_depth_nonopt = depth;
> +				best_nonopt = ns;
> +			}
> +			break;
> +		default:
> +			break;
> +		}
Could we break out of this loop early if "min_depth_opt == 0"? We can't
find a better path than that, so there's no need to read the rest of the
paths.
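Here is a sketch of the early exit, again in userspace C with hypothetical names. struct path stands in for a namespace path, its depth field for atomic_read(&ns->ctrl->nr_active), and ANA_INACCESSIBLE for any state the quoted default case skips. The nvme_path_is_disabled() check is omitted. The fallback to the least-busy non-optimized path is assumed, because the quoted hunk stops before the return. The only change from the quoted loop is that it stops once an idle optimized path is found.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical ANA states mirroring NVME_ANA_OPTIMIZED / _NONOPTIMIZED. */
enum ana_state { ANA_OPTIMIZED, ANA_NONOPTIMIZED, ANA_INACCESSIBLE };

struct path {
	enum ana_state state;
	unsigned int depth;	/* stands in for the controller's nr_active */
};

/*
 * Return the index of the least-busy optimized path, falling back to the
 * least-busy non-optimized path, or -1 if none is usable. *visited reports
 * how many paths were examined, to make the early exit observable.
 */
int queue_depth_path(const struct path *paths, size_t n, size_t *visited)
{
	unsigned int min_opt = UINT_MAX, min_nonopt = UINT_MAX;
	int best_opt = -1, best_nonopt = -1;
	size_t i, seen = 0;

	assert(n == 0 || paths != NULL);

	for (i = 0; i < n; i++) {
		const struct path *p = &paths[i];

		seen++;
		if (p->state == ANA_OPTIMIZED && p->depth < min_opt) {
			min_opt = p->depth;
			best_opt = (int)i;
		} else if (p->state == ANA_NONOPTIMIZED &&
			   p->depth < min_nonopt) {
			min_nonopt = p->depth;
			best_nonopt = (int)i;
		}

		/* An idle optimized path cannot be beaten: stop scanning. */
		if (min_opt == 0)
			break;
	}

	if (visited)
		*visited = seen;
	return best_opt >= 0 ? best_opt : best_nonopt;
}
```

With paths {optimized:4, non-optimized:0, optimized:0, optimized:1}, the scan returns index 2 after examining three entries and never reads the fourth.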
> +void nvme_subsys_iopolicy_update(struct nvme_subsystem *subsys, int iopolicy)
> +{
> +	struct nvme_ctrl *ctrl;
> +	int old_iopolicy = READ_ONCE(subsys->iopolicy);
> +
Let's add a check here:

	if (old_iopolicy == iopolicy)
		return;
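Here is a sketch of where that check sits, using hypothetical userspace types. It assumes the update does some per-controller work after changing the policy, here resetting each controller's nr_active counter. The early return skips that work when the same policy is written again. The resets field exists only to make this observable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the iopolicy values, nvme_ctrl and the subsystem. */
enum iopolicy { IOPOLICY_NUMA, IOPOLICY_RR, IOPOLICY_QD };

struct ctrl {
	atomic_int nr_active;
};

struct subsys {
	enum iopolicy iopolicy;
	struct ctrl *ctrls;
	size_t nr_ctrls;
	unsigned int resets;	/* how many times the reset path actually ran */
};

void subsys_iopolicy_update(struct subsys *subsys, enum iopolicy iopolicy)
{
	enum iopolicy old_iopolicy = subsys->iopolicy;
	size_t i;

	assert(subsys->nr_ctrls == 0 || subsys->ctrls != NULL);

	/* Re-selecting the current policy: skip the per-controller work. */
	if (old_iopolicy == iopolicy)
		return;

	subsys->iopolicy = iopolicy;
	for (i = 0; i < subsys->nr_ctrls; i++)
		atomic_store(&subsys->ctrls[i].nr_active, 0);
	subsys->resets++;
}
```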
> @@ -935,6 +940,7 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl);
> void nvme_mpath_shutdown_disk(struct nvme_ns_head *head);
> void nvme_mpath_start_request(struct request *rq);
> void nvme_mpath_end_request(struct request *rq);
> +void nvme_subsys_iopolicy_update(struct nvme_subsystem *subsys, int iopolicy);
This function isn't used outside multipath.c, so it should be static.