Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: keith.busch@intel.com (Keith Busch)
Subject: Bug Report: can't unload nvme module in case of disabled device
Date: Thu, 10 Aug 2017 15:34:23 -0400	[thread overview]
Message-ID: <20170810193423.GA1180@localhost.localdomain> (raw)
In-Reply-To: <20170810191717.GA1604@localhost.localdomain>

On Thu, Aug 10, 2017@03:17:17PM -0400, Keith Busch wrote:
> On Thu, Aug 10, 2017@12:45:36PM -0400, Keith Busch wrote:
> > On Tue, Aug 01, 2017@03:58:10PM +0300, Max Gurtovoy wrote:
> > > Hi all,
> > > 
> > > I would like to report a bug that reproduced by the following steps (I'm
> > > using 4.13.0-rc3+):
> > > 
> > > 1. modprobe nvme
> > > 2. echo 0 >  /sys/block/nvme0n1/device/device/enable
> > > 3. nvme list (stuck for more than 1-2 mins)
> > > 4. modprobe -r nvme (stuck forever)
> > > 
> > > log:
> > > 
> > > [ 1342.388888] nvme nvme0: controller is down; will reset: CSTS=0x3,
> > > PCI_STATUS=0x10
> > > [ 1476.021392] INFO: task kworker/u98:1:436 blocked for more than 120
> > > seconds.
> > > [ 1476.029072]       Not tainted 4.13.0-rc3+ #19
> > > [ 1476.033878] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > > this message.
> > > [ 1476.042505] kworker/u98:1   D    0   436      2 0x00000000
> > > [ 1476.048569] Workqueue: nvme-wq nvme_reset_work [nvme]
> > > [ 1476.054133] Call Trace:
> > > [ 1476.056862]  __schedule+0x1dc/0x780
> > > [ 1476.060706]  schedule+0x36/0x80
> > > [ 1476.064180]  blk_mq_freeze_queue_wait+0x4b/0xb0
> > > [ 1476.069175]  ? remove_wait_queue+0x60/0x60
> > > [ 1476.073693]  nvme_wait_freeze+0x33/0x50 [nvme_core]
> > > [ 1476.079068]  nvme_reset_work+0x6b9/0xc40 [nvme]
> > > [ 1476.084075]  ? __switch_to+0x23e/0x4a0
> > > [ 1476.088209]  process_one_work+0x149/0x360
> > > [ 1476.092625]  worker_thread+0x4d/0x3c0
> > > [ 1476.096692]  kthread+0x109/0x140
> > > [ 1476.100247]  ? rescuer_thread+0x380/0x380
> > > [ 1476.104664]  ? kthread_park+0x60/0x60
> > > [ 1476.108698]  ret_from_fork+0x25/0x30
> > 
> > This looks like a path does not pair the freeze start with the reset's
> > freeze wait. I'll have to see what the pci 'enable' sysfs entry does.
> 
> I see how the freeze start/stops are not paired in this scenario:
> nvme_dev_disable doesn't start the freeze if the pci device isn't
> disabled. It uses this to know if it is disabling the device twice.
> 
> In this test, though, you are disabling the pci device without the
> driver's knowledge, so that breaks that logic. In light of that, we'll
> need different criteria to know when the driver should start a freeze.
> I'll test some things out and send a patch.

This should fix it for your scenario, but I am not completely sure this
can't get the freeze depth higher than we need.

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index cd888a4..ca03980 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2006,12 +2006,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 
 	mutex_lock(&dev->shutdown_lock);
+
+	if (dev->ctrl.state == NVME_CTRL_LIVE ||
+	    dev->ctrl.state == NVME_CTRL_RESETTING)
+		nvme_start_freeze(&dev->ctrl);
+
 	if (pci_is_enabled(pdev)) {
 		u32 csts = readl(dev->bar + NVME_REG_CSTS);
 
-		if (dev->ctrl.state == NVME_CTRL_LIVE ||
-		    dev->ctrl.state == NVME_CTRL_RESETTING)
-			nvme_start_freeze(&dev->ctrl);
 		dead = !!((csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
 			pdev->error_state  != pci_channel_io_normal);
 	}
--

      reply	other threads:[~2017-08-10 19:34 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-01 12:58 Bug Report: can't unload nvme module in case of disabled device Max Gurtovoy
2017-08-10  8:59 ` Christoph Hellwig
2017-08-10 17:04   ` Max Gurtovoy
2017-08-10 19:36     ` Keith Busch
2017-08-13  8:29       ` Max Gurtovoy
2017-08-14 20:24         ` Keith Busch
2017-08-10 16:45 ` Keith Busch
2017-08-10 19:17   ` Keith Busch
2017-08-10 19:34     ` Keith Busch [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170810193423.GA1180@localhost.localdomain \
    --to=keith.busch@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox