* [PATCH 1/1] nvme-multipath: Skip nr_active increments in RETRY disposition
@ 2025-09-24 22:43 Amit Chaudhary
2025-09-24 23:02 ` Keith Busch
0 siblings, 1 reply; 5+ messages in thread
From: Amit Chaudhary @ 2025-09-24 22:43 UTC (permalink / raw)
To: achaudhary, Keith Busch, Jens Axboe, Christoph Hellwig,
Sagi Grimberg
Cc: mkhalfella, randyj, jmeneghi, emilne, linux-nvme, linux-kernel
For queue-depth I/O policy, this patch fixes unbalanced I/Os across
nvme multipaths.
Issue Description:
The RETRY disposition incorrectly increments ns->ctrl->nr_active
counter and reinitializes iostat start-time. In such cases nr_active
counter never goes back to zero until that path disconnects and
reconnects.
Such a path is not chosen for new I/Os if multiple RETRY cases on a given
a path cause its queue-depth counter to be artificially higher compared
to other paths. This leads to unbalanced I/Os across paths.
The patch skips calling nvme_mpath_start_request() on a RETRY disposition
if nvme_req(rq)->retries counter is non-zero. This avoids reincrementing
nr_active and also from restarting io_stat start time.
base-commit: e989a3da2d371a4b6597ee8dee5c72e407b4db7a
Fixes: d4d957b53d91eeb ("nvme-multipath: support io stats on the mpath device")
Signed-off-by: Amit Chaudhary <achaudhary@purestorage.com>
Reviewed-by: Randy Jennings <randyj@purestorage.com>
---
drivers/nvme/host/nvme.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 102fae6a231c..6ca6529ba83a 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -1150,7 +1150,7 @@ static inline void nvme_hwmon_exit(struct nvme_ctrl *ctrl)
static inline void nvme_start_request(struct request *rq)
{
- if (rq->cmd_flags & REQ_NVME_MPATH)
+ if ((rq->cmd_flags & REQ_NVME_MPATH) && (!nvme_req(rq)->retries))
nvme_mpath_start_request(rq);
blk_mq_start_request(rq);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH 1/1] nvme-multipath: Skip nr_active increments in RETRY disposition
2025-09-24 22:43 [PATCH 1/1] nvme-multipath: Skip nr_active increments in RETRY disposition Amit Chaudhary
@ 2025-09-24 23:02 ` Keith Busch
2025-09-25 1:14 ` Mohamed Khalfella
0 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2025-09-24 23:02 UTC (permalink / raw)
To: Amit Chaudhary
Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, mkhalfella, randyj,
jmeneghi, emilne, linux-nvme, linux-kernel
On Wed, Sep 24, 2025 at 03:43:18PM -0700, Amit Chaudhary wrote:
> static inline void nvme_start_request(struct request *rq)
> {
> - if (rq->cmd_flags & REQ_NVME_MPATH)
> + if ((rq->cmd_flags & REQ_NVME_MPATH) && (!nvme_req(rq)->retries))
> nvme_mpath_start_request(rq);
> blk_mq_start_request(rq);
> }
Using "retries" is bit indirect as a proxy for multipath active counts.
Could this be moved to the mpath start instead, directly using the flag
that accounts for the path? This also helps to keep track if the command
gets retried across a user toggling the policy to "qd".
---
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 3da980dc60d91..1c630967ddd40 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -182,7 +182,8 @@ void nvme_mpath_start_request(struct request *rq)
struct nvme_ns *ns = rq->q->queuedata;
struct gendisk *disk = ns->head->disk;
- if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD) {
+ if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD &&
+ !(nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE)) {
atomic_inc(&ns->ctrl->nr_active);
nvme_req(rq)->flags |= NVME_MPATH_CNT_ACTIVE;
}
--
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH 1/1] nvme-multipath: Skip nr_active increments in RETRY disposition
2025-09-24 23:02 ` Keith Busch
@ 2025-09-25 1:14 ` Mohamed Khalfella
2025-09-25 14:43 ` Keith Busch
0 siblings, 1 reply; 5+ messages in thread
From: Mohamed Khalfella @ 2025-09-25 1:14 UTC (permalink / raw)
To: Keith Busch
Cc: Amit Chaudhary, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
randyj, jmeneghi, emilne, linux-nvme, linux-kernel
On 2025-09-24 17:02:51 -0600, Keith Busch wrote:
> On Wed, Sep 24, 2025 at 03:43:18PM -0700, Amit Chaudhary wrote:
> > static inline void nvme_start_request(struct request *rq)
> > {
> > - if (rq->cmd_flags & REQ_NVME_MPATH)
> > + if ((rq->cmd_flags & REQ_NVME_MPATH) && (!nvme_req(rq)->retries))
> > nvme_mpath_start_request(rq);
> > blk_mq_start_request(rq);
> > }
>
> Using "retries" is bit indirect as a proxy for multipath active counts.
> Could this be moved to the mpath start instead, directly using the flag
> that accounts for the path? This also helps to keep track if the command
> gets retried across a user toggling the policy to "qd".
>
> ---
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 3da980dc60d91..1c630967ddd40 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -182,7 +182,8 @@ void nvme_mpath_start_request(struct request *rq)
> struct nvme_ns *ns = rq->q->queuedata;
> struct gendisk *disk = ns->head->disk;
>
> - if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD) {
> + if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD &&
> + !(nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE)) {
> atomic_inc(&ns->ctrl->nr_active);
> nvme_req(rq)->flags |= NVME_MPATH_CNT_ACTIVE;
> }
> --
193 nvme_req(rq)->flags |= NVME_MPATH_IO_STATS;
194 nvme_req(rq)->start_time = bdev_start_io_acct(disk->part0, req_op(rq),
195 jiffies);
Doing it this way might messup with stats accounting because the two
lines above will be executed on request retry. I do not think we need
that, right?
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH 1/1] nvme-multipath: Skip nr_active increments in RETRY disposition
2025-09-25 1:14 ` Mohamed Khalfella
@ 2025-09-25 14:43 ` Keith Busch
2025-09-25 15:59 ` Mohamed Khalfella
0 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2025-09-25 14:43 UTC (permalink / raw)
To: Mohamed Khalfella
Cc: Amit Chaudhary, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
randyj, jmeneghi, emilne, linux-nvme, linux-kernel
On Wed, Sep 24, 2025 at 06:14:27PM -0700, Mohamed Khalfella wrote:
> On 2025-09-24 17:02:51 -0600, Keith Busch wrote:
> > On Wed, Sep 24, 2025 at 03:43:18PM -0700, Amit Chaudhary wrote:
> > > static inline void nvme_start_request(struct request *rq)
> > > {
> > > - if (rq->cmd_flags & REQ_NVME_MPATH)
> > > + if ((rq->cmd_flags & REQ_NVME_MPATH) && (!nvme_req(rq)->retries))
> > > nvme_mpath_start_request(rq);
> > > blk_mq_start_request(rq);
> > > }
> >
> > Using "retries" is bit indirect as a proxy for multipath active counts.
> > Could this be moved to the mpath start instead, directly using the flag
> > that accounts for the path? This also helps to keep track if the command
> > gets retried across a user toggling the policy to "qd".
> >
> > ---
> > diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> > index 3da980dc60d91..1c630967ddd40 100644
> > --- a/drivers/nvme/host/multipath.c
> > +++ b/drivers/nvme/host/multipath.c
> > @@ -182,7 +182,8 @@ void nvme_mpath_start_request(struct request *rq)
> > struct nvme_ns *ns = rq->q->queuedata;
> > struct gendisk *disk = ns->head->disk;
> >
> > - if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD) {
> > + if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD &&
> > + !(nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE)) {
> > atomic_inc(&ns->ctrl->nr_active);
> > nvme_req(rq)->flags |= NVME_MPATH_CNT_ACTIVE;
> > }
> > --
>
> 193 nvme_req(rq)->flags |= NVME_MPATH_IO_STATS;
> 194 nvme_req(rq)->start_time = bdev_start_io_acct(disk->part0, req_op(rq),
> 195 jiffies);
>
> Doing it this way might messup with stats accounting because the two
> lines above will be executed on request retry. I do not think we need
> that, right?
Yeah, but we can use the other flag to know if it's already been
accounted:
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -182,12 +182,14 @@ void nvme_mpath_start_request(struct request *rq)
struct nvme_ns *ns = rq->q->queuedata;
struct gendisk *disk = ns->head->disk;
- if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD) {
+ if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD &&
+ !(nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE)) {
atomic_inc(&ns->ctrl->nr_active);
nvme_req(rq)->flags |= NVME_MPATH_CNT_ACTIVE;
}
- if (!blk_queue_io_stat(disk->queue) || blk_rq_is_passthrough(rq))
+ if (!blk_queue_io_stat(disk->queue) || blk_rq_is_passthrough(rq) ||
+ nvme_req(rq)->flags & NVME_MPATH_IO_STATS)
return;
nvme_req(rq)->flags |= NVME_MPATH_IO_STATS;
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH 1/1] nvme-multipath: Skip nr_active increments in RETRY disposition
2025-09-25 14:43 ` Keith Busch
@ 2025-09-25 15:59 ` Mohamed Khalfella
0 siblings, 0 replies; 5+ messages in thread
From: Mohamed Khalfella @ 2025-09-25 15:59 UTC (permalink / raw)
To: Keith Busch
Cc: Amit Chaudhary, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
randyj, jmeneghi, emilne, linux-nvme, linux-kernel
On 2025-09-25 08:43:44 -0600, Keith Busch wrote:
> On Wed, Sep 24, 2025 at 06:14:27PM -0700, Mohamed Khalfella wrote:
> > On 2025-09-24 17:02:51 -0600, Keith Busch wrote:
> > > On Wed, Sep 24, 2025 at 03:43:18PM -0700, Amit Chaudhary wrote:
> > > > static inline void nvme_start_request(struct request *rq)
> > > > {
> > > > - if (rq->cmd_flags & REQ_NVME_MPATH)
> > > > + if ((rq->cmd_flags & REQ_NVME_MPATH) && (!nvme_req(rq)->retries))
> > > > nvme_mpath_start_request(rq);
> > > > blk_mq_start_request(rq);
> > > > }
> > >
> > > Using "retries" is bit indirect as a proxy for multipath active counts.
> > > Could this be moved to the mpath start instead, directly using the flag
> > > that accounts for the path? This also helps to keep track if the command
> > > gets retried across a user toggling the policy to "qd".
> > >
> > > ---
> > > diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> > > index 3da980dc60d91..1c630967ddd40 100644
> > > --- a/drivers/nvme/host/multipath.c
> > > +++ b/drivers/nvme/host/multipath.c
> > > @@ -182,7 +182,8 @@ void nvme_mpath_start_request(struct request *rq)
> > > struct nvme_ns *ns = rq->q->queuedata;
> > > struct gendisk *disk = ns->head->disk;
> > >
> > > - if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD) {
> > > + if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD &&
> > > + !(nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE)) {
> > > atomic_inc(&ns->ctrl->nr_active);
> > > nvme_req(rq)->flags |= NVME_MPATH_CNT_ACTIVE;
> > > }
> > > --
> >
> > 193 nvme_req(rq)->flags |= NVME_MPATH_IO_STATS;
> > 194 nvme_req(rq)->start_time = bdev_start_io_acct(disk->part0, req_op(rq),
> > 195 jiffies);
> >
> > Doing it this way might messup with stats accounting because the two
> > lines above will be executed on request retry. I do not think we need
> > that, right?
>
> Yeah, but we can use the other flag to know if it's already been
> accounted:
>
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -182,12 +182,14 @@ void nvme_mpath_start_request(struct request *rq)
> struct nvme_ns *ns = rq->q->queuedata;
> struct gendisk *disk = ns->head->disk;
>
> - if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD) {
> + if (READ_ONCE(ns->head->subsys->iopolicy) == NVME_IOPOLICY_QD &&
> + !(nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE)) {
> atomic_inc(&ns->ctrl->nr_active);
> nvme_req(rq)->flags |= NVME_MPATH_CNT_ACTIVE;
> }
>
> - if (!blk_queue_io_stat(disk->queue) || blk_rq_is_passthrough(rq))
> + if (!blk_queue_io_stat(disk->queue) || blk_rq_is_passthrough(rq) ||
> + nvme_req(rq)->flags & NVME_MPATH_IO_STATS)
> return;
>
> nvme_req(rq)->flags |= NVME_MPATH_IO_STATS;
This works. However, I find Amit's change more straight forward to
understand. nvme_mpath_start_request()/nvme_mpath_end_request() are
called when request started/ended respectively. For a request that has
been retried on the same path nvme_mpath_start_request() need not be
called again. Such retry should be transparent to multipath layer.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-09-25 15:59 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-24 22:43 [PATCH 1/1] nvme-multipath: Skip nr_active increments in RETRY disposition Amit Chaudhary
2025-09-24 23:02 ` Keith Busch
2025-09-25 1:14 ` Mohamed Khalfella
2025-09-25 14:43 ` Keith Busch
2025-09-25 15:59 ` Mohamed Khalfella
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox