[PATCHv3 3/9] nvme: Move all IO out of controller reset


From: keith.busch@linux.intel.com (Keith Busch)
Subject: [PATCHv3 3/9] nvme: Move all IO out of controller reset
Date: Fri, 25 May 2018 08:41:25 -0600
Message-ID: <20180525144125.GP11037@localhost.localdomain>
In-Reply-To: <20180525130057.GF23463@lst.de>

On Fri, May 25, 2018 at 03:00:57PM +0200, Christoph Hellwig wrote:
> On Thu, May 24, 2018 at 02:34:54PM -0600, Keith Busch wrote:
> > IO may be retryable, so don't wait for them in the reset path.
> 
> Can't parse this.
> 
> 
> > These
> > commands may trigger a reset if that IO expires without a completion,
> 
> What are "these commands"?

Sorry, I'm referring to the failed requests that were requeued after
a controller disabling. We've been dispatching them from reset_work and
just hoping they don't time out.
 
> > placing it on the requeue list, so waiting for these would deadlock the
> > reset handler.
> > 
> > To fix the theoretical deadlock, this patch unblocks IO submission from
> 
> How did you find it if it is theoretical?

Variants of the blktests block/011 test can trigger this. I've never seen
it in real life, but it's not too much of a leap to imagine it can happen.
 
> > This patch is also renaming the function 'nvme_dev_add' to a
> > more appropriate name that describes what it's actually doing:
> > nvme_alloc_io_tags.
> 
> Can you split this out into a separate patch?

Sure thing.
 
> > @@ -3175,6 +3175,8 @@ static void nvme_scan_work(struct work_struct *work)
> >  	struct nvme_id_ctrl *id;
> >  	unsigned nn;
> >  
> > +	if (ctrl->ops->update_hw_ctx)
> > +		ctrl->ops->update_hw_ctx(ctrl);
> 
> nvme_scan_work gets kicked from all kinds of places including
> ioctls and AERs. I don't think the code you added below should
> be called from all of them.

True, most of the time nothing happens on this call. I'm trying to not
require another work_struct, and scan_work provides a safe context for
what this needs to accomplish, but I can try to find another way.

> > +static void nvme_pci_update_hw_ctx(struct nvme_ctrl *ctrl)
> > +{
> > +	struct nvme_dev *dev = to_nvme_dev(ctrl);
> > +	bool unfreeze;
> > +
> > +	mutex_lock(&dev->shutdown_lock);
> > +	unfreeze = dev->queues_froze;
> > +	mutex_unlock(&dev->shutdown_lock);
> 
> No need to take a mutex here if you sample a single <= register
> sized value.
> 
> > +	if (!unfreeze)
> > +		return;
> 
> But this whole scheme stinks to me.  For one we are adding more ad-hoc
> state outside the state machine, second it all seems very "ad-hoc".
> 
> > +
> > +	nvme_wait_freeze(&dev->ctrl);
> > +	blk_mq_update_nr_hw_queues(ctrl->tagset, dev->online_queues - 1);
> > +	nvme_free_queues(dev, dev->online_queues);
> > +	nvme_unfreeze(&dev->ctrl);
> > +
> > +	mutex_lock(&dev->shutdown_lock);
> > +	dev->queues_froze = false;
> > +	mutex_unlock(&dev->shutdown_lock);
> 
> Same here.  Simple READ_ONCE/WRITE_ONCE will give you the right
> memory barriers with no need for the lock.

Good point.
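
A minimal userspace sketch of the lockless variant suggested above, for
illustration only. The READ_ONCE/WRITE_ONCE macros here are simplified
stand-ins for the kernel's (they only force a single, untorn access through
a volatile cast), and struct demo_dev, its updates counter, and
demo_update_hw_ctx() are hypothetical names that are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified userspace stand-ins for the kernel macros (assumption):
 * they make the compiler emit exactly one access to the flag.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical, pared-down device: only the flag from the patch. */
struct demo_dev {
	bool queues_froze;
	int updates;	/* how many times the slow path ran */
};

/* Returns true if the hw context update ran, false if there was nothing to do. */
static bool demo_update_hw_ctx(struct demo_dev *dev)
{
	assert(dev != NULL);

	/* Sample the flag once, without taking shutdown_lock. */
	if (!READ_ONCE(dev->queues_froze))
		return false;

	/*
	 * In the patch, the wait-freeze / update_nr_hw_queues /
	 * free-queues / unfreeze sequence sits here.
	 */
	dev->updates++;

	WRITE_ONCE(dev->queues_froze, false);
	return true;
}
```

The sketch only covers the flag accesses; it makes no claim about whatever
ordering the surrounding freeze and unfreeze calls need.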
 
> Also except for the nvme_free_queues this all is generic code,
> so I think we want this in the core.

That can be arranged.

> And I wonder where this would fit better than the scan work, but I
> can't think of anything else but an entirely new work_struct, which
> isn't all that great either.

Yeah, I was trying to avoid introducing yet another work_struct here.

> > @@ -2211,7 +2228,10 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
> >  	     dev->ctrl.state == NVME_CTRL_RESETTING)) {
> >  		u32 csts = readl(dev->bar + NVME_REG_CSTS);
> >  
> > -		nvme_start_freeze(&dev->ctrl);
> > +		if (!dev->queues_froze)	{
> > +			nvme_start_freeze(&dev->ctrl);
> > +			dev->queues_froze = true;
> > +		}
> 
> And this sounds like another indicator for a new FROZEN state.  Once
> the ctrl already is frozen we really shouldn't even end up in here
> anymore.

I'll have to think about this idea.
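
To make the suggestion concrete, here is a rough userspace sketch of how a
FROZEN state could replace the queues_froze flag. Everything in it is
hypothetical: the DEMO_* states, struct demo_ctrl, and both functions are
invented for illustration, and the transition table is an assumption, not
the driver's actual state machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states; DEMO_FROZEN is the proposed addition. */
enum demo_state { DEMO_LIVE, DEMO_RESETTING, DEMO_FROZEN };

struct demo_ctrl {
	enum demo_state state;
	int freeze_calls;	/* stands in for nvme_start_freeze() */
};

/* Allow only the transitions assumed for this sketch. */
static bool demo_change_state(struct demo_ctrl *ctrl, enum demo_state new_state)
{
	bool ok = false;

	assert(ctrl != NULL);

	switch (new_state) {
	case DEMO_RESETTING:
		ok = (ctrl->state == DEMO_LIVE);
		break;
	case DEMO_FROZEN:
		ok = (ctrl->state == DEMO_RESETTING);
		break;
	case DEMO_LIVE:
		ok = (ctrl->state == DEMO_RESETTING ||
		      ctrl->state == DEMO_FROZEN);
		break;
	}

	if (ok)
		ctrl->state = new_state;
	return ok;
}

/* Freeze at most once: a second call fails the transition and does nothing. */
static void demo_disable(struct demo_ctrl *ctrl)
{
	if (demo_change_state(ctrl, DEMO_FROZEN))
		ctrl->freeze_calls++;
}
```

With this shape, the "already frozen" check lives in the transition table
rather than in a separate boolean, so a repeated disable is a no-op.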
