From: Nilay Shroff <nilay@linux.ibm.com>
To: Sagi Grimberg <sagi@grimberg.me>,
John Meneghini <jmeneghi@redhat.com>,
kbusch@kernel.org, hch@lst.de, emilne@redhat.com
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
jrani@purestorage.com, randyj@purestorage.com, hare@kernel.org
Subject: Re: [PATCH v3 1/1] nvme: multipath: Implemented new iopolicy "queue-depth"
Date: Tue, 21 May 2024 15:37:10 +0530
Message-ID: <95fe3168-ec39-4932-b9fc-26484de49191@linux.ibm.com>
In-Reply-To: <3b8d33db-f2c3-469a-bfa0-8fc52594f243@grimberg.me>
On 5/21/24 15:15, Sagi Grimberg wrote:
>
>
> On 21/05/2024 11:48, Nilay Shroff wrote:
>>
>> On 5/21/24 01:50, John Meneghini wrote:
>>> From: "Ewan D. Milne" <emilne@redhat.com>
>>>
>>> The round-robin path selector is inefficient in cases where there is a
>>> difference in latency between multiple active optimized paths. In the
>>> presence of one or more high latency paths the round-robin selector
>>> continues to use the high latency path equally. This results in a bias
>>> towards the highest latency path and can cause a significant decrease
>>> in overall performance as IOs pile on the highest latency path. This
>>> problem is particularly acute with NVMe-oF controllers.
>>>
>>> The queue-depth policy instead sends I/O requests down the path with the
>>> fewest requests in its request queue. Paths with lower latency
>>> will clear requests more quickly and have fewer requests in their queues
>>> compared to higher latency paths. The goal of this path selector is to
>>> make more use of lower latency paths, which will bring down overall IO
>>> latency.
>>>
>>> Signed-off-by: Ewan D. Milne <emilne@redhat.com>
>>> [tsong: patch developed by Thomas Song @ Pure Storage, fixed whitespace
>>> and compilation warnings, updated MODULE_PARM description, and
>>> fixed potential issue with ->current_path[] being used]
>>> Signed-off-by: Thomas Song <tsong@purestorage.com>
>>> [jmeneghi: various changes and improvements, addressed review comments]
>>> Signed-off-by: John Meneghini <jmeneghi@redhat.com>
>>> Link: https://lore.kernel.org/linux-nvme/20240509202929.831680-1-jmeneghi@redhat.com/
>>> Tested-by: Marco Patalano <mpatalan@redhat.com>
>>> Reviewed-by: Randy Jennings <randyj@purestorage.com>
>>> Tested-by: Jyoti Rani <jrani@purestorage.com>
>>> ---
>>> drivers/nvme/host/core.c | 2 +-
>>> drivers/nvme/host/multipath.c | 86 +++++++++++++++++++++++++++++++++--
>>> drivers/nvme/host/nvme.h | 9 ++++
>>> 3 files changed, 92 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index a066429b790d..1dd7c52293ff 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -110,7 +110,7 @@ struct workqueue_struct *nvme_delete_wq;
>>> EXPORT_SYMBOL_GPL(nvme_delete_wq);
>>> static LIST_HEAD(nvme_subsystems);
>>> -static DEFINE_MUTEX(nvme_subsystems_lock);
>>> +DEFINE_MUTEX(nvme_subsystems_lock);
>>> static DEFINE_IDA(nvme_instance_ida);
>>> static dev_t nvme_ctrl_base_chr_devt;
>>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>>> index 5397fb428b24..0e2b6e720e95 100644
>>> --- a/drivers/nvme/host/multipath.c
>>> +++ b/drivers/nvme/host/multipath.c
>>> @@ -17,6 +17,7 @@ MODULE_PARM_DESC(multipath,
>>> static const char *nvme_iopolicy_names[] = {
>>> [NVME_IOPOLICY_NUMA] = "numa",
>>> [NVME_IOPOLICY_RR] = "round-robin",
>>> + [NVME_IOPOLICY_QD] = "queue-depth",
>>> };
>>> static int iopolicy = NVME_IOPOLICY_NUMA;
>>> @@ -29,6 +30,8 @@ static int nvme_set_iopolicy(const char *val, const struct kernel_param *kp)
>>> iopolicy = NVME_IOPOLICY_NUMA;
>>> else if (!strncmp(val, "round-robin", 11))
>>> iopolicy = NVME_IOPOLICY_RR;
>>> + else if (!strncmp(val, "queue-depth", 11))
>>> + iopolicy = NVME_IOPOLICY_QD;
>>> else
>>> return -EINVAL;
>>> @@ -43,7 +46,7 @@ static int nvme_get_iopolicy(char *buf, const struct kernel_param *kp)
>>> module_param_call(iopolicy, nvme_set_iopolicy, nvme_get_iopolicy,
>>> &iopolicy, 0644);
>>> MODULE_PARM_DESC(iopolicy,
>>> - "Default multipath I/O policy; 'numa' (default) or 'round-robin'");
>>> + "Default multipath I/O policy; 'numa' (default) , 'round-robin' or 'queue-depth'");
>>> void nvme_mpath_default_iopolicy(struct nvme_subsystem *subsys)
>>> {
>>> @@ -127,6 +130,11 @@ void nvme_mpath_start_request(struct request *rq)
>>> struct nvme_ns *ns = rq->q->queuedata;
>>> struct gendisk *disk = ns->head->disk;
>>> + if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD) {
>>> + atomic_inc(&ns->ctrl->nr_active);
>>> + nvme_req(rq)->flags |= NVME_MPATH_CNT_ACTIVE;
>>> + }
>>> +
>>> if (!blk_queue_io_stat(disk->queue) || blk_rq_is_passthrough(rq))
>>> return;
>>> @@ -140,8 +148,12 @@ void nvme_mpath_end_request(struct request *rq)
>>> {
>>> struct nvme_ns *ns = rq->q->queuedata;
>>> + if ((nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE))
>>> + atomic_dec_if_positive(&ns->ctrl->nr_active);
>>> +
>>> if (!(nvme_req(rq)->flags & NVME_MPATH_IO_STATS))
>>> return;
>>> +
>>> bdev_end_io_acct(ns->head->disk->part0, req_op(rq),
>>> blk_rq_bytes(rq) >> SECTOR_SHIFT,
>>> nvme_req(rq)->start_time);
>>> @@ -330,6 +342,40 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
>>> return found;
>>> }
>>>
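For reference while reading the comment below, here is a small user-space model of the accounting the patch adds. It is only a sketch: the selector function itself is not quoted above, so model_queue_depth_path() is my reading of the commit message ("the path with the fewest requests"), and struct model_ctrl and all model_* names are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nvme_ctrl: only the in-flight counter. */
struct model_ctrl {
	atomic_int nr_active;
};

/* Models the atomic_inc() added to nvme_mpath_start_request(). */
static void model_start_request(struct model_ctrl *ctrl)
{
	atomic_fetch_add(&ctrl->nr_active, 1);
}

/*
 * Models the decrement in nvme_mpath_end_request(). Like
 * atomic_dec_if_positive(), it never takes the counter below zero.
 */
static void model_end_request(struct model_ctrl *ctrl)
{
	int old = atomic_load(&ctrl->nr_active);

	/* On failure the CAS reloads 'old', so the bound is re-checked. */
	while (old > 0 &&
	       !atomic_compare_exchange_weak(&ctrl->nr_active, &old, old - 1))
		;
}

/* Assumed selector: index of the path with the fewest in-flight requests. */
static size_t model_queue_depth_path(struct model_ctrl *paths, size_t n)
{
	size_t best = 0;
	int best_depth;
	size_t i;

	assert(n > 0);
	best_depth = atomic_load(&paths[0].nr_active);
	for (i = 1; i < n; i++) {
		int depth = atomic_load(&paths[i].nr_active);

		if (depth < best_depth) {
			best_depth = depth;
			best = i;
		}
	}
	return best;
}
```

The point of the model is that the selector trusts nr_active completely, so every increment needs a matching decrement on every completion path.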
>> I think you also need to decrement the nr_active counter when an in-flight nvme
>> request is cancelled. If the request is cancelled, nvme_mpath_end_request() wouldn't
>> be invoked, so you may want to decrement nr_active from nvme_cancel_request() as below:
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index bf7615cb36ee..4fea7883ce8e 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -497,8 +497,9 @@ EXPORT_SYMBOL_GPL(nvme_host_path_error);
>> bool nvme_cancel_request(struct request *req, void *data)
>> {
>> - dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
>> - "Cancelling I/O %d", req->tag);
>> + struct nvme_ctrl *ctrl = (struct nvme_ctrl *)data;
>> +
>> + dev_dbg_ratelimited(ctrl->device, "Cancelling I/O %d", req->tag);
>> /* don't abort one completed or idle request */
>> if (blk_mq_rq_state(req) != MQ_RQ_IN_FLIGHT)
>> @@ -506,6 +507,8 @@ bool nvme_cancel_request(struct request *req, void *data)
>> nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
>> nvme_req(req)->flags |= NVME_REQ_CANCELLED;
>> + if ((nvme_req(req)->flags & NVME_MPATH_CNT_ACTIVE))
>> + atomic_dec(&ctrl->nr_active);
>
> I don't think this matters, because cancellation only happens when we
> tear down the controller anyway...
>
I think if we reset the nvme controller then we don't tear down the
controller, do we? In that case we cancel all pending requests and
later restart the controller.
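
To make the concern concrete, here is a user-space sketch of the reset case. It assumes the premise I raised, namely that a cancelled request never reaches nvme_mpath_end_request(); the sketch does not prove that premise, it only shows the consequence if it holds. All sim_* names are hypothetical, and clearing the flag after the decrement is my own addition to guard against a double decrement, it is not part of the diff above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SIM_CNT_ACTIVE 0x1u	/* stand-in for NVME_MPATH_CNT_ACTIVE */

/* Hypothetical stand-in for struct nvme_ctrl. */
struct sim_ctrl {
	atomic_int nr_active;
};

/* Hypothetical stand-in for a request and its nvme_req() flags. */
struct sim_req {
	unsigned int flags;
	bool in_flight;
};

/* Models the accounting done in nvme_mpath_start_request(). */
static void sim_start(struct sim_ctrl *ctrl, struct sim_req *req)
{
	atomic_fetch_add(&ctrl->nr_active, 1);
	req->flags |= SIM_CNT_ACTIVE;
	req->in_flight = true;
}

/*
 * Cancel one request. With dec_on_cancel set, the counter is dropped
 * here, as in the nvme_cancel_request() change proposed above.
 */
static void sim_cancel(struct sim_ctrl *ctrl, struct sim_req *req,
		       bool dec_on_cancel)
{
	if (!req->in_flight)
		return;	/* don't touch a completed or idle request */
	req->in_flight = false;
	if (dec_on_cancel && (req->flags & SIM_CNT_ACTIVE)) {
		atomic_fetch_sub(&ctrl->nr_active, 1);
		req->flags &= ~SIM_CNT_ACTIVE;	/* assumption: avoid double dec */
	}
}

/* Start n requests, cancel them all as a reset would, return the leftover. */
static int sim_reset_leftover(int n, bool dec_on_cancel)
{
	struct sim_ctrl ctrl;
	struct sim_req reqs[16];
	int i;

	assert(n >= 0 && n <= 16);
	atomic_init(&ctrl.nr_active, 0);
	for (i = 0; i < n; i++) {
		reqs[i].flags = 0;
		reqs[i].in_flight = false;
		sim_start(&ctrl, &reqs[i]);
	}
	for (i = 0; i < n; i++)
		sim_cancel(&ctrl, &reqs[i], dec_on_cancel);
	return atomic_load(&ctrl.nr_active);
}
```

In this model, sim_reset_leftover(8, false) leaves the counter at 8 after the reset, so a queue-depth selector would keep treating the restarted controller as busy, while sim_reset_leftover(8, true) returns it to 0.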
Thanks,
--Nilay