public inbox for linux-nvme@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH] nvme: ensure disabling pairs with unquiesce
@ 2023-06-30 17:47 Keith Busch
  2023-07-06 12:51 ` Christoph Hellwig
  0 siblings, 1 reply; 3+ messages in thread
From: Keith Busch @ 2023-06-30 17:47 UTC (permalink / raw)
  To: linux-nvme, hch; +Cc: sagi, Keith Busch, Ming Lei

From: Keith Busch <kbusch@kernel.org>

If any error handling that disables the controller fails to queue the
reset work, like if the state changed to disconnected inbetween, then
the failed teardown needs to unquiesce the queues since it's no longer
paired with reset_work.

Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/pci.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c9224d39195e5..bf21bd08ee7ed 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -108,6 +108,7 @@ MODULE_PARM_DESC(noacpi, "disable acpi bios quirks");
 struct nvme_dev;
 struct nvme_queue;
 
+static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown);
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
 static void nvme_delete_io_queues(struct nvme_dev *dev);
 static void nvme_update_attrs(struct nvme_dev *dev);
@@ -1298,9 +1299,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
 	 */
 	if (nvme_should_reset(dev, csts)) {
 		nvme_warn_reset(dev, csts);
-		nvme_dev_disable(dev, false);
-		nvme_reset_ctrl(&dev->ctrl);
-		return BLK_EH_DONE;
+		goto disable;
 	}
 
 	/*
@@ -1351,10 +1350,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
 			 "I/O %d QID %d timeout, reset controller\n",
 			 req->tag, nvmeq->qid);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
-		nvme_dev_disable(dev, false);
-		nvme_reset_ctrl(&dev->ctrl);
-
-		return BLK_EH_DONE;
+		goto disable;
 	}
 
 	if (atomic_dec_return(&dev->ctrl.abort_limit) < 0) {
@@ -1391,6 +1387,12 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
 	 * as the device then is in a faulty state.
 	 */
 	return BLK_EH_RESET_TIMER;
+
+disable:
+	if (!nvme_disable_prepare_reset(dev, false) &&
+	    nvme_try_sched_reset(&dev->ctrl))
+		nvme_unquiesce_io_queues(&dev->ctrl);
+	return BLK_EH_DONE;
 }
 
 static void nvme_free_queue(struct nvme_queue *nvmeq)
@@ -3278,7 +3280,8 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 	case pci_channel_io_frozen:
 		dev_warn(dev->ctrl.device,
 			"frozen state error detected, reset controller\n");
-		nvme_dev_disable(dev, false);
+		if (nvme_disable_prepare_reset(dev, false))
+			return PCI_ERS_RESULT_DISCONNECT;
 		return PCI_ERS_RESULT_NEED_RESET;
 	case pci_channel_io_perm_failure:
 		dev_warn(dev->ctrl.device,
@@ -3294,7 +3297,8 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
 
 	dev_info(dev->ctrl.device, "restart after slot reset\n");
 	pci_restore_state(pdev);
-	nvme_reset_ctrl(&dev->ctrl);
+	if (!nvme_try_sched_reset(&dev->ctrl))
+		nvme_unquiesce_io_queues(&dev->ctrl);
 	return PCI_ERS_RESULT_RECOVERED;
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] nvme: ensure disabling pairs with unquiesce
  2023-06-30 17:47 [PATCH] nvme: ensure disabling pairs with unquiesce Keith Busch
@ 2023-07-06 12:51 ` Christoph Hellwig
  2023-07-06 15:24   ` Keith Busch
  0 siblings, 1 reply; 3+ messages in thread
From: Christoph Hellwig @ 2023-07-06 12:51 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme, hch, sagi, Keith Busch, Ming Lei

> +static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown);

Can we rename nvme_disable_prepare_reset while we're at it?  The
name already is a bit suboptimal in the xisting code, but with
the new calles becomes really confusing.

nvme_pci_wait_for_reset_and_disable is what I could come up with.
It feels a bit too verbose, but at least it's descriptive.

> +disable:
> +	if (!nvme_disable_prepare_reset(dev, false) &&
> +	    nvme_try_sched_reset(&dev->ctrl))

Why do we wait for a reset just to schedule another reset here?

Also I feel splitting out the changes to wait for reset from those
that add the unquiesce when we can't reset into another patch
would probably help beeing able to understand the changes here.


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] nvme: ensure disabling pairs with unquiesce
  2023-07-06 12:51 ` Christoph Hellwig
@ 2023-07-06 15:24   ` Keith Busch
  0 siblings, 0 replies; 3+ messages in thread
From: Keith Busch @ 2023-07-06 15:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, sagi, Ming Lei

On Thu, Jul 06, 2023 at 02:51:34PM +0200, Christoph Hellwig wrote:
> > +static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown);
> 
> Can we rename nvme_disable_prepare_reset while we're at it?  The
> name already is a bit suboptimal in the xisting code, but with
> the new calles becomes really confusing.
> 
> nvme_pci_wait_for_reset_and_disable is what I could come up with.
> It feels a bit too verbose, but at least it's descriptive.
> 
> > +disable:
> > +	if (!nvme_disable_prepare_reset(dev, false) &&
> > +	    nvme_try_sched_reset(&dev->ctrl))
> 
> Why do we wait for a reset just to schedule another reset here?

nvme_disable_prepare_reset() just waits for the ctrl->state to become
"RESETTING" so the code path knows no one else can schedule a reset
during disable. The actual reset has to enable happens later.

The following nvme_try_sched_reset() just does the required 2nd part if
the state change was successful. The only way that should fail is if the
controller was removed during the disabling.

There is a problem with multi-namespace, though: the reset_work
synchronizes each request_queue while in the RESETTING state, so if each
request_queue's timeout handler blocks for the RESETTING state, then
it's a deadlock. I will need to reconsider how to fix this.

> Also I feel splitting out the changes to wait for reset from those
> that add the unquiesce when we can't reset into another patch
> would probably help beeing able to understand the changes here.


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-07-06 15:24 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-30 17:47 [PATCH] nvme: ensure disabling pairs with unquiesce Keith Busch
2023-07-06 12:51 ` Christoph Hellwig
2023-07-06 15:24   ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox