From: Keith Busch <keith.busch@linux.intel.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>, Keith Busch <keith.busch@intel.com>,
Laurence Oberman <loberman@redhat.com>,
Sagi Grimberg <sagi@grimberg.me>,
James Smart <james.smart@broadcom.com>,
linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
Johannes Thumshirn <jthumshirn@suse.de>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 1/6] nvme: Sync request queues on reset
Date: Mon, 21 May 2018 09:59:09 -0600
Message-ID: <20180521155909.GK5528@localhost.localdomain>
In-Reply-To: <20180521152536.GB19099@ming.t460p>

On Mon, May 21, 2018 at 11:25:43PM +0800, Ming Lei wrote:
> On Mon, May 21, 2018 at 08:04:13AM -0600, Keith Busch wrote:
> > On Sat, May 19, 2018 at 08:01:42AM +0800, Ming Lei wrote:
> > > > You keep saying that, but the controller state is global to the
> > > > controller. It doesn't matter which namespace request_queue started the
> > > > reset: every namespace's request queue sees the RESETTING controller state
> > >
> > > When timeouts come, the global state of RESETTING may not be updated
> > > yet, so all the timeouts may not observe the state.
> >
> > Even prior to the RESETTING state, every single command, no matter
> > which namespace or request_queue it came on, is reclaimed by the driver.
> > There *should* be no requests to time out after nvme_dev_disable is called
> > because the nvme driver returned control of all requests in the tagset
> > to blk-mq.
>
> The timed-out requests won't be canceled by nvme_dev_disable().

??? nvme_dev_disable cancels all started requests. There are no exceptions.

> If the timed-out request is handled as RESET_TIMER, there may
> be a new timeout event triggered again.
>
> >
> > In any case, if blk-mq decides it won't complete those requests, we
> > can just swap the order in the reset_work: sync first, then unconditionally
> > disable. Does the following snippet look more okay?
> >
> > ---
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 17a0190bd88f..42af077ee07a 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2307,11 +2307,14 @@ static void nvme_reset_work(struct work_struct *work)
> >  		goto out;
> >
> >  	/*
> > -	 * If we're called to reset a live controller first shut it down before
> > -	 * moving on.
> > +	 * Ensure there are no timeout work in progress prior to forcefully
> > +	 * disabling the queue. There is no harm in disabling the device even
> > +	 * when it was already disabled, as this will forcefully reclaim any
> > +	 * IOs that are stuck due to blk-mq's timeout handling that prevents
> > +	 * timed out requests from completing.
> >  	 */
> > -	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
> > -		nvme_dev_disable(dev, false);
> > +	nvme_sync_queues(&dev->ctrl);
> > +	nvme_dev_disable(dev, false);
>
> That may not work reliably either.
>
> For example, request A from NS_0 is timed-out and handled as RESET_TIMER,
> meantime request B from NS_1 is timed-out and handled as EH_HANDLED.

Meanwhile, request B's nvme_dev_disable prior to returning EH_HANDLED
cancels all requests, which includes request A.

> When the above reset work is run for handling timeout of req B,
> new timeout event on request A may come just between the above
> nvme_sync_queues() and nvme_dev_disable()

Syncing the queues either stops the timer from running, or waits for it to
complete. We are in the RESETTING state: if request A's timeout happens
to be running, we're not restarting the timer; we're returning EH_HANDLED.
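
To make the RESETTING case concrete, here is a rough userspace sketch of
the timeout decision described above. It is not the driver's nvme_timeout():
the model_* names are hypothetical, and the LIVE branch (re-arm the timer
once while an abort is outstanding) is an assumption added only for contrast.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum model_ctrl_state { MODEL_CTRL_LIVE, MODEL_CTRL_RESETTING };
enum model_eh_ret { MODEL_EH_RESET_TIMER, MODEL_EH_HANDLED };

struct model_ctrl {
	enum model_ctrl_state state;
	int disable_calls;	/* times the model "disabled" the device */
};

/* Stand-in for nvme_dev_disable(); here it only counts invocations. */
static void model_dev_disable(struct model_ctrl *ctrl)
{
	ctrl->disable_calls++;
}

/*
 * Timeout decision: once the controller is RESETTING the timer is never
 * re-armed; the handler disables the device and reports the request handled.
 */
static enum model_eh_ret model_timeout(struct model_ctrl *ctrl, bool abort_sent)
{
	if (ctrl->state == MODEL_CTRL_RESETTING) {
		model_dev_disable(ctrl);
		return MODEL_EH_HANDLED;
	}
	if (!abort_sent)
		return MODEL_EH_RESET_TIMER;	/* assumed: let the abort run */
	ctrl->state = MODEL_CTRL_RESETTING;
	model_dev_disable(ctrl);
	return MODEL_EH_HANDLED;
}

/* Small self-check exercising the RESETTING path. */
static void model_timeout_selfcheck(void)
{
	struct model_ctrl ctrl = { MODEL_CTRL_RESETTING, 0 };

	assert(model_timeout(&ctrl, false) == MODEL_EH_HANDLED);
	assert(ctrl.disable_calls == 1);
}
```

In this model a timeout that fires for request A after the state change can
only return MODEL_EH_HANDLED, so there is no path that restarts its timer.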

> then nvme_dev_disable()
> can't cover request A, and finally the timed-out event for req A
> will call nvme_dev_disable() when the current reset is just in-progress,
> then this reset can't move on, and IO hang is caused.

At no point in the nvme_reset are we waiting for any IO to complete.
Reset continues to make forward progress.
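
The ordering argument (sync first, then disable unconditionally) can be
sketched in userspace as well. This only models the claim made in this
message, that disable reclaims every started request once timeout work has
been flushed; whether blk-mq actually lets those requests complete is the
point under dispute in the thread. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request model; not struct request from blk-mq. */
struct toy_req {
	bool started;		/* handed to the (pretend) device */
	bool timeout_running;	/* a timeout handler is executing for it */
	bool cancelled;		/* reclaimed by toy_dev_disable() */
};

/* Stand-in for nvme_sync_queues(): on return no timeout handler is running. */
static void toy_sync_queues(struct toy_req *reqs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		reqs[i].timeout_running = false;	/* models waiting for it */
}

/* Stand-in for nvme_dev_disable(): cancels every started request. */
static size_t toy_dev_disable(struct toy_req *reqs, size_t n)
{
	size_t cancelled = 0;

	for (size_t i = 0; i < n; i++) {
		if (!reqs[i].started)
			continue;
		reqs[i].started = false;
		reqs[i].cancelled = true;
		cancelled++;
	}
	return cancelled;
}

/* Ordering from the snippet: sync first, then disable unconditionally. */
static size_t toy_reset_prepare(struct toy_req *reqs, size_t n)
{
	toy_sync_queues(reqs, n);
	return toy_dev_disable(reqs, n);
}

/* Self-check: request A (mid-timeout) and request B are both reclaimed. */
static void toy_reset_selfcheck(void)
{
	struct toy_req reqs[3] = {
		{ .started = true, .timeout_running = true },	/* request A */
		{ .started = true },				/* request B */
		{ .started = false },				/* never issued */
	};

	assert(toy_reset_prepare(reqs, 3) == 2);
	assert(reqs[0].cancelled && reqs[1].cancelled && !reqs[2].cancelled);
	assert(!reqs[0].timeout_running);
}
```

Nothing in toy_reset_prepare() waits on a request completing, which is the
property the reset path relies on to keep making forward progress.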