From: Mohamed Khalfella <mkhalfella@purestorage.com>
To: Daniel Wagner <dwagner@suse.de>
Cc: Daniel Wagner <wagi@kernel.org>, Christoph Hellwig <hch@lst.de>,
Sagi Grimberg <sagi@grimberg.me>, Keith Busch <kbusch@kernel.org>,
Hannes Reinecke <hare@suse.de>,
John Meneghini <jmeneghi@redhat.com>,
randyj@purestorage.com, linux-nvme@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 3/3] nvme: delay failover by command quiesce timeout
Date: Tue, 15 Apr 2025 17:40:16 -0700 [thread overview]
Message-ID: <20250416004016.GC78596-mkhalfella@purestorage.com> (raw)
In-Reply-To: <6f0d50b2-7a16-4298-8129-c3a0b1426d26@flourine.local>
On 2025-04-15 14:17:48 +0200, Daniel Wagner wrote:
> On Thu, Apr 10, 2025 at 01:51:37AM -0700, Mohamed Khalfella wrote:
> > > +void nvme_schedule_failover(struct nvme_ctrl *ctrl)
> > > +{
> > > + unsigned long delay;
> > > +
> > > + if (ctrl->cqt)
> > > + delay = msecs_to_jiffies(ctrl->cqt);
> > > + else
> > > + delay = ctrl->kato * HZ;
> >
> > I thought that delay = m * ctrl->kato + ctrl->cqt
> > where m = ctrl->ctratt & NVME_CTRL_ATTR_TBKAS ? 3 : 2
> > no?
>
> The failover schedule delay is the additional amount of time we have to
> wait for the target to cleanup (CQT). If the CTQ is not valid I thought
> the spec said to wait for a KATO. Possible I got that wrong.
>
> The factor 3 or 2 is relavant for the timeout value for the KATO command
> we schedule. The failover schedule timeout is ontop of the command
> timeout value.
>
> > > --- a/drivers/nvme/host/multipath.c
> > > +++ b/drivers/nvme/host/multipath.c
> > > @@ -86,9 +86,11 @@ void nvme_mpath_start_freeze(struct nvme_subsystem *subsys)
> > > void nvme_failover_req(struct request *req)
> > > {
> > > struct nvme_ns *ns = req->q->queuedata;
> > > + struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
> > > u16 status = nvme_req(req)->status & NVME_SCT_SC_MASK;
> > > unsigned long flags;
> > > struct bio *bio;
> > > + enum nvme_ctrl_state state = nvme_ctrl_state(ctrl);
> > >
> > > nvme_mpath_clear_current_path(ns);
> > >
> > > @@ -121,9 +123,53 @@ void nvme_failover_req(struct request *req)
> > > blk_steal_bios(&ns->head->requeue_list, req);
> > > spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
> > >
> > > - nvme_req(req)->status = 0;
> > > - nvme_end_req(req);
> > > - kblockd_schedule_work(&ns->head->requeue_work);
> > > + spin_lock_irqsave(&ctrl->lock, flags);
> > > + list_add_tail(&req->queuelist, &ctrl->failover_list);
> > > + spin_unlock_irqrestore(&ctrl->lock, flags);
> >
> > I see this is the only place where held requests are added to
> > failover_list.
> >
> > - Will this hold admin requests in failover_list?
>
> Yes.
>
> > - What about requests that do not go through nvme_failover_req(), like
> > passthrough requests, do we not want to hold these requests until it
> > is safe for them to be retried?
>
> Pasthrough commands should fail immediately. Userland is in charge here,
> not the kernel. At least this what should happen here.
I see your point. Unless I am missing something these requests should be
held equally to bio requests from multipath layer. Let us say app
submitted write a request that got canceled immediately, how does the app
know when it is safe to retry the write request?
Holding requests like write until it is safe to be retried is the whole
point of this work, right?
>
> > - In case of controller reset or delete if nvme_disable_ctrl()
> > successfully disables the controller, then we do not want to add
> > canceled requests to failover_list, right? Does this implementation
> > consider this case?
>
> Not sure. I've tested a few things but I am pretty sure this RFC is far
> from being complete.
next prev parent reply other threads:[~2025-04-16 0:40 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-24 12:07 [PATCH RFC 0/3] nvme: add support for command quiesce timeout Daniel Wagner
2025-03-24 12:07 ` [PATCH RFC 1/3] nvmet: add command quiesce time Daniel Wagner
2025-04-01 9:33 ` Hannes Reinecke
2025-04-10 9:00 ` Mohamed Khalfella
2025-04-16 11:37 ` Daniel Wagner
2025-03-24 12:07 ` [PATCH RFC 2/3] nvme: store cqt value into nvme ctrl object Daniel Wagner
2025-04-01 9:34 ` Hannes Reinecke
2025-03-24 12:07 ` [PATCH RFC 3/3] nvme: delay failover by command quiesce timeout Daniel Wagner
2025-04-01 9:37 ` Hannes Reinecke
2025-04-15 12:00 ` Daniel Wagner
2025-04-01 13:32 ` Nilay Shroff
2025-04-15 12:05 ` Daniel Wagner
2025-04-10 8:51 ` Mohamed Khalfella
2025-04-14 22:28 ` Sagi Grimberg
2025-04-15 12:11 ` Daniel Wagner
2025-04-15 21:07 ` Sagi Grimberg
2025-04-15 23:02 ` Randy Jennings
2025-04-15 23:35 ` Sagi Grimberg
2025-04-15 23:57 ` Randy Jennings
2025-04-16 22:15 ` Sagi Grimberg
2025-04-17 0:47 ` Randy Jennings
2025-04-15 12:17 ` Daniel Wagner
2025-04-15 22:56 ` Randy Jennings
2025-04-16 6:39 ` Daniel Wagner
2025-04-16 0:17 ` Mohamed Khalfella
2025-04-16 6:57 ` Daniel Wagner
2025-04-16 13:39 ` Mohamed Khalfella
2025-04-16 0:40 ` Mohamed Khalfella [this message]
2025-04-16 8:30 ` Daniel Wagner
2025-04-16 13:53 ` Mohamed Khalfella
2025-04-16 22:21 ` Sagi Grimberg
2025-04-16 22:59 ` Mohamed Khalfella
2025-04-17 7:28 ` Hannes Reinecke
2025-04-10 16:07 ` Jiewei Ke
2025-04-10 17:13 ` Jiewei Ke
2025-04-13 22:03 ` Sagi Grimberg
2025-04-16 8:51 ` Daniel Wagner
2025-04-16 0:23 ` Mohamed Khalfella
2025-04-16 11:33 ` Daniel Wagner
[not found] <8F2489FD-1663-4A52-A50B-F15046AC2878@163.com>
2025-04-15 12:34 ` Daniel Wagner
2025-04-15 15:08 ` Jiewei Ke
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250416004016.GC78596-mkhalfella@purestorage.com \
--to=mkhalfella@purestorage.com \
--cc=dwagner@suse.de \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=jmeneghi@redhat.com \
--cc=kbusch@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=randyj@purestorage.com \
--cc=sagi@grimberg.me \
--cc=wagi@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox