linux-nvme.lists.infradead.org archive mirror
* linux-nvme: driver removal deadlock when device hot-removed
@ 2018-09-21 16:32 Michael Schoberg (mschoberg)
  2018-09-21 16:53 ` Keith Busch
  0 siblings, 1 reply; 2+ messages in thread
From: Michael Schoberg (mschoberg) @ 2018-09-21 16:32 UTC


I'm not sure if this is the correct forum, but hopefully someone can advise me on how to proceed towards resolving this issue.

I'm working on a problem that occurs when an NVMe device is hot-removed while I/O is in flight. I see a driver deadlock during recovery or, more precisely, during removal of the driver instance while recovery is being attempted. What seems to happen is that the nvme timeout routine eventually calls outside the driver and runs into a mutex deadlock. I'm running a 4.18.8 kernel.

The code path appears as follows (note - the device has been removed so there is no possibility of it being recovered):

	nvme_timeout --> nvme_warn_reset

	nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff

*pci_read_config_word() returns PCIBIOS_SUCCESSFUL even though the returned values show the device is detached. I'm not quite sure how a configuration read works with a detached device, but it's pretty clear the returned values are not valid.
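
For illustration only: reads from a device that is no longer on the bus commonly complete with every bit set, which matches the CSTS=0xffffffff and PCI_STATUS=0xffff values in the log line above. The user-space sketch below shows the kind of all-ones check that could distinguish "register read succeeded" from "device is still there". The enum, the macro names, and both functions are hypothetical and are not part of the nvme driver or the PCI core.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification used only in this sketch. */
enum dev_presence { DEV_PRESENT, DEV_PROBABLY_GONE };

/* The values reported in the log line: every bit set. */
#define CSTS_ALL_ONES       UINT32_C(0xffffffff)
#define PCI_STATUS_ALL_ONES UINT16_C(0xffff)

/*
 * Assumption: a successful return code from the read says nothing about
 * whether the device answered, so the data itself has to be inspected.
 * Both registers reading as all-ones is treated as "probably gone".
 */
static enum dev_presence classify_reads(uint32_t csts, uint16_t pci_status)
{
    if (csts == CSTS_ALL_ONES && pci_status == PCI_STATUS_ALL_ONES)
        return DEV_PROBABLY_GONE;
    return DEV_PRESENT;
}

/* Human-readable name for a classification, for log messages. */
static const char *presence_name(enum dev_presence p)
{
    switch (p) {
    case DEV_PRESENT:
        return "present";
    case DEV_PROBABLY_GONE:
        return "probably gone";
    }
    assert(!"unknown dev_presence value");
    return "unknown";
}
```

With the values from the log, classify_reads(0xffffffff, 0xffff) yields DEV_PROBABLY_GONE. Whether the real timeout path should skip the reset in that case is a separate question that this sketch does not answer.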

nvme_timeout() then begins the process of resetting the controller:

	nvme_reset_ctrl --> nvme_reset_work --> nvme_remove_dead_ctrl --> nvme_kill_queues --> nvme_set_queue_dying --> revalidate_disk

The thread hits a mutex deadlock after returning to fs/block_dev.c::revalidate_disk(): mutex_lock(&bdev->bd_mutex)

This is the point where the driver appears completely stuck and never recovers. A power cycle or system reset is required to restore operation (reboot or shutdown hangs). I'm not sure what within block_dev.c is holding bd_mutex and causing the deadlock, so it's very possible there is a cleaner solution than the one I am using.
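
As a rough user-space model of the pattern described above (not the kernel code, and not a diagnosis of who actually holds bd_mutex): suppose one context takes the mutex and then waits for the reset work to finish, while the reset work itself needs the same mutex. The sketch uses pthreads, and the worker uses a trylock so that the example reports the conflict instead of hanging. All names here (model_bd_mutex, reset_worker, and so on) are hypothetical stand-ins.

```c
#include <errno.h>
#include <pthread.h>

/* Stand-in for bdev->bd_mutex in this model. */
static pthread_mutex_t model_bd_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for the revalidate_disk() step. The path described above
 * uses a blocking mutex_lock(); a trylock is used here only so the
 * model returns EBUSY where the blocking version would wait forever.
 */
static int model_revalidate_trylock(void)
{
    int ret = pthread_mutex_trylock(&model_bd_mutex);

    if (ret == 0)
        pthread_mutex_unlock(&model_bd_mutex);
    return ret;
}

/* Stand-in for the reset work; stores the lock attempt's result. */
static void *reset_worker(void *arg)
{
    *(int *)arg = model_revalidate_trylock();
    return NULL;
}

/*
 * Assumed scenario: the holder takes the mutex and then waits for the
 * worker to complete. Returns the worker's lock result, or -1 if the
 * worker thread could not be created.
 */
static int model_holder_waits_for_worker(void)
{
    pthread_t worker;
    int worker_ret = -1;

    pthread_mutex_lock(&model_bd_mutex);
    if (pthread_create(&worker, NULL, reset_worker, &worker_ret) != 0) {
        pthread_mutex_unlock(&model_bd_mutex);
        return -1;
    }
    pthread_join(worker, NULL);   /* holder waits while still locked */
    pthread_mutex_unlock(&model_bd_mutex);
    return worker_ret;
}
```

In this model the worker sees EBUSY for as long as the holder waits on it; with a blocking lock in the worker, neither side could make progress. On a live system, the blocked-task report from sysrq-w may show which task is sitting on the mutex.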
