Re: [PATCH 1/6] nvme: Sync request queues on reset

From: Keith Busch <keith.busch@linux.intel.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>, Keith Busch <keith.busch@intel.com>,
	Laurence Oberman <loberman@redhat.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	James Smart <james.smart@broadcom.com>,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 1/6] nvme: Sync request queues on reset
Date: Mon, 21 May 2018 09:59:09 -0600
Message-ID: <20180521155909.GK5528@localhost.localdomain>
In-Reply-To: <20180521152536.GB19099@ming.t460p>

On Mon, May 21, 2018 at 11:25:43PM +0800, Ming Lei wrote:
> On Mon, May 21, 2018 at 08:04:13AM -0600, Keith Busch wrote:
> > On Sat, May 19, 2018 at 08:01:42AM +0800, Ming Lei wrote:
> > > > You keep saying that, but the controller state is global to the
> > > > controller. It doesn't matter which namespace request_queue started the
> > > > reset: every namespace's request queue sees the RESETTING controller state
> > > 
> > > When timeouts come, the global state of RESETTING may not be updated
> > > yet, so all the timeouts may not observe the state.
> > 
> > Even prior to the RESETTING state, every single command, no matter
> > which namespace or request_queue it came on, is reclaimed by the driver.
> > There *should* be no requests to timeout after nvme_dev_disable is called
> > because the nvme driver returned control of all requests in the tagset
> > to blk-mq.
> 
> The timed-out requests won't be canceled by nvme_dev_disable().

??? nvme_dev_disable cancels all started requests. There are no exceptions.

> If a timed-out request is handled as RESET_TIMER, a new timeout
> event may be triggered again.
> 
> > 
> > In any case, if blk-mq decides it won't complete those requests, we
> > can just swap the order in the reset_work: sync first, unconditionally
> > disable. Does the following snippet look more okay?
> > 
> > ---
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 17a0190bd88f..42af077ee07a 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2307,11 +2307,14 @@ static void nvme_reset_work(struct work_struct *work)
> >  		goto out;
> >  
> >  	/*
> > -	 * If we're called to reset a live controller first shut it down before
> > -	 * moving on.
> > +	 * Ensure there are no timeout work in progress prior to forcefully
> > +	 * disabling the queue. There is no harm in disabling the device even
> > +	 * when it was already disabled, as this will forcefully reclaim any
> > +	 * IOs that are stuck due to blk-mq's timeout handling that prevents
> > +	 * timed out requests from completing.
> >  	 */
> > -	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
> > -		nvme_dev_disable(dev, false);
> > +	nvme_sync_queues(&dev->ctrl);
> > +	nvme_dev_disable(dev, false);
> 
> That may not work reliably either.
> 
> For example, request A from NS_0 is timed-out and handled as RESET_TIMER,
> meantime request B from NS_1 is timed-out and handled as EH_HANDLED.

Meanwhile, request B's nvme_dev_disable prior to returning EH_HANDLED
cancels all requests, which includes request A.

> When the above reset work is run for handling timeout of req B,
> new timeout event on request A may come just between the above
> nvme_sync_queues() and nvme_dev_disable()

Syncing the queues either stops the timer from running, or waits for it
to complete. We are in the RESETTING state: if request A's timeout happens
to be running, we're not restarting the timer; we're returning EH_HANDLED.
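
To make the ordering concrete, here is a minimal user-space sketch of the
argument above. It is a model under stated assumptions, not driver code:
every model_* name is hypothetical, and the "abort already tried" rule used
to decide between RESET_TIMER and EH_HANDLED on a live controller is an
assumption made only to reproduce the request A / request B example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the driver. */
enum model_state { MODEL_LIVE, MODEL_RESETTING };
enum model_eh { MODEL_EH_HANDLED, MODEL_EH_RESET_TIMER };

struct model_rq {
	int started;		/* driver still owns the request */
	int timer_armed;	/* a timeout can still fire for it */
	int aborted;		/* assumption: an abort was already tried */
};

struct model_ctrl {
	enum model_state state;	/* global to the controller */
	struct model_rq *rqs;	/* every request, from every namespace */
	size_t nr_rqs;
};

/* Cancel every started request, whichever namespace queued it. */
static void model_dev_disable(struct model_ctrl *ctrl)
{
	for (size_t i = 0; i < ctrl->nr_rqs; i++) {
		ctrl->rqs[i].started = 0;
		ctrl->rqs[i].timer_armed = 0;
	}
}

/* Timeout decision: only a live controller may re-arm the timer. */
static enum model_eh model_timeout(struct model_ctrl *ctrl, struct model_rq *rq)
{
	if (ctrl->state == MODEL_LIVE && !rq->aborted) {
		rq->aborted = 1;
		rq->timer_armed = 1;
		return MODEL_EH_RESET_TIMER;
	}
	/* Escalate: enter RESETTING and reclaim everything. */
	ctrl->state = MODEL_RESETTING;
	model_dev_disable(ctrl);
	return MODEL_EH_HANDLED;
}

/* Returns 0 when request A ends up reclaimed with no timer pending. */
static int model_walkthrough(void)
{
	struct model_rq rqs[2] = {
		{ .started = 1, .timer_armed = 1 },		  /* A, NS_0 */
		{ .started = 1, .timer_armed = 1, .aborted = 1 }, /* B, NS_1 */
	};
	struct model_ctrl ctrl = { MODEL_LIVE, rqs, 2 };

	/* A times out first and is handled as RESET_TIMER. */
	assert(model_timeout(&ctrl, &rqs[0]) == MODEL_EH_RESET_TIMER);
	/* B times out, disables the device, and returns EH_HANDLED. */
	assert(model_timeout(&ctrl, &rqs[1]) == MODEL_EH_HANDLED);
	/* A late timeout on A now observes RESETTING: no timer restart. */
	assert(model_timeout(&ctrl, &rqs[0]) == MODEL_EH_HANDLED);

	return rqs[0].started || rqs[0].timer_armed;
}
```

Calling model_walkthrough() from a test main() walks the A/B sequence: in
this model, once B's timeout has moved the controller to RESETTING, a late
timeout on A can only return EH_HANDLED, so no timer is left armed.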

> then nvme_dev_disable()
> can't cover request A, and finally the timed-out event for req A
> will nvme_dev_disable() when the current reset is just in-progress,
> then this reset can't move on, and IO hang is caused.

At no point in the nvme_reset are we waiting for any IO to complete.
Reset continues to make forward progress.
