[PATCH RFC] nvme: fix race condition between remove and scan_work

Linux-NVME Archive on lore.kernel.org

From: kbusch@kernel.org (Keith Busch)
Subject: [PATCH RFC] nvme: fix race condition between remove and scan_work
Date: Wed, 24 Apr 2019 10:26:59 -0600
Message-ID: <20190424162659.GA15412@localhost.localdomain>
In-Reply-To: <a9d6bf42-e01a-4e56-acc7-5d87ac9179f9@grimberg.me>

On Wed, Apr 24, 2019 at 09:23:10AM -0700, Sagi Grimberg wrote:
> >   	/* If PCI error recovery process is happening, we cannot reset or
> >   	 * the recovery mechanism will surely fail.
> > @@ -1329,7 +1330,13 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
> >   			 "I/O %d QID %d timeout, reset controller\n",
> >   			 req->tag, nvmeq->qid);
> >   		nvme_dev_disable(dev, false);
> > -		nvme_reset_ctrl(&dev->ctrl);
> > +		/*
> > +		 * If reset ctrl fail, we need to drain all requests in ctx
> > +		 * and elevator, avoiding io stuck forever.
> > +		 */
> > +		error = nvme_reset_ctrl(&dev->ctrl);
> > +		if (error)
> > +			blk_mq_unquiesce_queue(dev->ctrl.admin_q);
> 
> Is the DELETING state the only one we can hit here? Or can we also meet
> other states that fail the transition to RESETTING (CONNECTING/DEAD)?

It could be CONNECTING, or a reset could already be scheduled (RESETTING);
in either case we wouldn't want to unquiesce.

When we do want to unquiesce, though, we also want to do that to the
IO queues, not just the admin queue. The patch below is untested, but it
might be a step in the right direction:

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a90cf5d63aac..acfb34c945b2 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1315,6 +1315,10 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		nvme_dev_disable(dev, false);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		return BLK_EH_DONE;
+	case NVME_CTRL_DELETING:
+		nvme_dev_disable(dev, true);
+		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
+		return BLK_EH_DONE;
 	default:
 		break;
 	}
@@ -2438,8 +2442,11 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	 * must flush all entered requests to their failed completion to avoid
 	 * deadlocking blk-mq hot-cpu notifier.
 	 */
-	if (shutdown)
+	if (shutdown) {
 		nvme_start_queues(&dev->ctrl);
+		if (dev->ctrl.admin_q)
+			blk_mq_unquiesce_queue(dev->ctrl.admin_q);
+	}
 	mutex_unlock(&dev->shutdown_lock);
 }
 
--

Thread overview: 7+ messages
2019-04-11 13:32 [PATCH RFC] nvme: fix race condition between remove and scan_work Yufen Yu
2019-04-19 12:28 ` yuyufen
2019-04-24 16:23 ` Sagi Grimberg
2019-04-24 16:26   ` Keith Busch [this message]
2019-04-24 16:42     ` Sagi Grimberg
2019-04-30 13:14       ` yuyufen
2019-04-30 14:58         ` Sagi Grimberg
