* linux-nvme: driver removal deadlock when device hot-removed
From: Michael Schoberg (mschoberg) @ 2018-09-21 16:32 UTC
I'm not sure if this is the correct forum, but hopefully someone can help instruct me on how to proceed towards resolving this issue.
I'm working on a problem that occurs when an NVMe device is hot-removed while I/O is in flight. I see a driver deadlock during recovery, or more precisely, during removal of the driver instance as part of the recovery attempt. What seems to be happening is that when the system calls the NVMe timeout routine, it eventually calls outside the driver and runs into a mutex deadlock. I'm running a 4.18.8 kernel.
The code path appears as follows (note: the device has been removed, so there is no possibility of it being recovered):
nvme_timeout --> nvme_warn_reset
nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Note: pci_read_config_word() returns PCIBIOS_SUCCESSFUL even though the returned values show the device is detached. I'm not quite sure how a configuration read works against a detached device, but it's pretty clear the returned values are not valid.
Within nvme_timeout(), it begins the process to reset the controller:
nvme_reset_ctrl --> nvme_reset_work --> nvme_remove_dead_ctrl --> nvme_kill_queues --> nvme_set_queue_dying --> revalidate_disk
The thread hits a mutex deadlock after getting back to fs/block_dev.c::revalidate_disk(): mutex_lock(&bdev->bd_mutex)
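For reference, a simplified sketch of the locking in revalidate_disk() (paraphrased, not the verbatim fs/block_dev.c source; the function name here is just for illustration) shows where the reset path gets stuck:

static int revalidate_disk_sketch(struct gendisk *disk)
{
        struct block_device *bdev = bdget_disk(disk, 0);

        if (!bdev)
                return 0;

        mutex_lock(&bdev->bd_mutex);    /* <-- the nvme reset work blocks here */
        /* ... check_disk_size_change(), clear bdev->bd_invalidated ... */
        mutex_unlock(&bdev->bd_mutex);
        bdput(bdev);
        return 0;
}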
This is the point where the driver appears completely stuck and never recovers. A power cycle or system reset is required to restore operation (reboot or shutdown hangs). I'm not sure what within block_dev.c is holding bd_mutex and causing the deadlock, so it's very possible there is a cleaner solution than what I am using.
* linux-nvme: driver removal deadlock when device hot-removed
From: Keith Busch @ 2018-09-21 16:53 UTC
On Fri, Sep 21, 2018 at 04:32:19PM +0000, Michael Schoberg (mschoberg) wrote:
> I'm not sure if this is the correct forum, but hopefully someone can
> help instruct me on how to proceed towards resolving this issue.
>
> I'm working on a problem that occurs when an NVMe device is hot-removed
> while I/O is in flight. I see a driver deadlock during recovery, or
> more precisely, during removal of the driver instance as part of the
> recovery attempt. What seems to be happening is that when the system
> calls the NVMe timeout routine, it eventually calls outside the driver
> and runs into a mutex deadlock. I'm running a 4.18.8 kernel.
>
> The code path appears as follows (note: the device has been removed,
> so there is no possibility of it being recovered):
>
> nvme_timeout --> nvme_warn_reset
We shouldn't get to nvme_timeout on a surprise removal. If you have
native PCIe hotplug-capable slots, the path ought to be:
pciehp_isr
remove_board
pciehp_unconfigure_device
pci_remove_bus_device
device_release_driver
nvme_remove
nvme_disable
And that should immediately clear outstanding IO and prevent new IO from
entering the driver.
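Roughly, the disable path forces every outstanding request to a failed completion so nothing is left waiting on a device that will never answer. A simplified sketch of that part (not the verbatim pci.c code, and the helper name is made up):

static void fail_all_outstanding_io(struct nvme_dev *dev)
{
        /* complete every request the dead controller still owns */
        blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
        blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
}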
Given what you're observing, I think your situation must be one of the
following:
1. You don't have PCIe hotplug-capable hardware
2. You don't have a PCIe hotplug-capable kernel
3. You are not using native PCIe hotplug with your platform
> nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
>
> Note: pci_read_config_word() returns PCIBIOS_SUCCESSFUL even though the
> returned values show the device is detached. I'm not quite sure how a
> configuration read works against a detached device, but it's pretty
> clear the returned values are not valid.
Since you're seeing PCIBIOS_SUCCESSFUL, the host is actually dispatching
the config request to a non-existent device. The reply will be a PCIe
Unsupported Request since there is nothing backing the address you're
trying to read, and you will see this as an "all 1's completion" with
no other indication of failure.
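So the only way to notice the device is gone from a config read is to look at the value itself. A hypothetical check (the helper name is made up; the PCI core's pci_device_is_present() does essentially the same vendor ID test):

static bool dev_looks_removed(struct pci_dev *pdev)
{
        u16 vendor;

        /* the read "succeeds" even with nothing behind the address */
        if (pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor) != PCIBIOS_SUCCESSFUL)
                return true;
        return vendor == 0xffff;        /* all 1's completion: device is gone */
}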
> Within nvme_timeout(), it begins the process to reset the controller:
>
> nvme_reset_ctrl --> nvme_reset_work --> nvme_remove_dead_ctrl --> nvme_kill_queues --> nvme_set_queue_dying --> revalidate_disk
>
> The thread hits a mutex deadlock after getting back to fs/block_dev.c::revalidate_disk(): mutex_lock(&bdev->bd_mutex)
>
> This is the point where the driver appears completely stuck and never
> recovers. A power cycle or system reset is required to restore
> operation (reboot or shutdown hangs). I'm not sure what within
> block_dev.c is holding bd_mutex and causing the deadlock, so it's
> very possible there is a cleaner solution than what I am using.
I'm not sure what could be holding it either. Maybe it's a task waiting
for an entered request to complete, in which case we should have the
nvme driver drain entered requests to failure in this path. The following
should be safe:
---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 800ee9b345f3..7c1330986a6c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2233,7 +2233,7 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status)
 	dev_warn(dev->ctrl.device, "Removing after probe failure status: %d\n", status);
 
 	nvme_get_ctrl(&dev->ctrl);
-	nvme_dev_disable(dev, false);
+	nvme_dev_disable(dev, true);
 	nvme_kill_queues(&dev->ctrl);
 	if (!queue_work(nvme_wq, &dev->remove_work))
 		nvme_put_ctrl(&dev->ctrl);
--
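(For context: the second argument to nvme_dev_disable() is the shutdown flag. Passing true here should also flush requests that have already entered the queues out to their failed completion instead of leaving them pending, which is what would release whichever task is sitting on bd_mutex.)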