From mboxrd@z Thu Jan 1 00:00:00 1970
From: keith.busch@linux.intel.com (Keith Busch)
Date: Mon, 21 May 2018 09:03:31 -0600
Subject: [PATCH 3/6] nvme: Move all IO out of controller reset
In-Reply-To: <20180521145850.GA19099@ming.t460p>
References: <20180518163823.27820-1-keith.busch@intel.com>
 <20180518163823.27820-3-keith.busch@intel.com>
 <20180518230357.GC18334@ming.t460p>
 <20180521142219.GC5528@localhost.localdomain>
 <20180521145850.GA19099@ming.t460p>
Message-ID: <20180521150331.GI5528@localhost.localdomain>

On Mon, May 21, 2018 at 10:58:51PM +0800, Ming Lei wrote:
> On Mon, May 21, 2018 at 08:22:19AM -0600, Keith Busch wrote:
> > On Sat, May 19, 2018 at 07:03:58AM +0800, Ming Lei wrote:
> > > On Fri, May 18, 2018 at 10:38:20AM -0600, Keith Busch wrote:
> > > > +
> > > > +	if (unfreeze)
> > > > +		nvme_wait_freeze(&dev->ctrl);
> > > > +
> > >
> > > timeout may comes just before&during blk_mq_update_nr_hw_queues() or
> > > the above nvme_wait_freeze(), then both two may hang forever.
> >
> > Why would it hang forever? The scan_work doesn't stop a timeout from
> > triggering a reset to reclaim requests necessary to complete a freeze.
>
> nvme_dev_disable() will quiesce queues, then nvme_wait_freeze() or
> blk_mq_update_nr_hw_queues() may hang forever.

nvme_dev_disable is just the first part of the timeout sequence. You have
to follow it through to the reset_work that either restarts or kills the
queues.
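
To make the argument above concrete, here is a toy userspace model of the
two-step sequence. It is only a sketch: every type and function in it
(toy_ctrl, toy_dev_disable, toy_reset_work, toy_freeze_done) is hypothetical
and is not a kernel API, and it assumes, as a simplification, that requests
stay outstanding while the queue is quiesced and drain once the reset either
restarts or kills the queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: nothing here is real kernel code or a real kernel API. */
enum queue_state { QUEUE_LIVE, QUEUE_QUIESCED, QUEUE_DEAD };

struct toy_ctrl {
	enum queue_state state;
	int outstanding;	/* requests entered but not yet completed */
	bool reset_pending;	/* the timeout path has scheduled a reset */
};

/* Step 1 of the timeout sequence: quiesce the queue and schedule a reset. */
static void toy_dev_disable(struct toy_ctrl *c)
{
	c->state = QUEUE_QUIESCED;
	c->reset_pending = true;
}

/*
 * Step 2: the reset either restarts the queue (requests are redispatched
 * and complete) or kills it (requests are failed). In this simplified
 * model both outcomes leave nothing outstanding.
 */
static void toy_reset_work(struct toy_ctrl *c, bool ctrl_recovered)
{
	if (!c->reset_pending)
		return;
	c->reset_pending = false;
	c->state = ctrl_recovered ? QUEUE_LIVE : QUEUE_DEAD;
	c->outstanding = 0;
}

/* A freeze waiter can finish only once no requests are outstanding. */
static bool toy_freeze_done(const struct toy_ctrl *c)
{
	return c->outstanding == 0;
}
```

In this model, looking only at the state right after toy_dev_disable()
suggests a freeze waiter is stuck, but the waiter makes progress as soon as
toy_reset_work() runs, whichever branch it takes.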