From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Mon, 21 May 2018 10:23:55 -0600
From: Keith Busch
To: Ming Lei
Cc: Jens Axboe, Keith Busch, Laurence Oberman, Sagi Grimberg,
	James Smart, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, Johannes Thumshirn, Christoph Hellwig
Subject: Re: [PATCH 3/6] nvme: Move all IO out of controller reset
Message-ID: <20180521162354.GM5528@localhost.localdomain>
References: <20180518163823.27820-1-keith.busch@intel.com>
	<20180518163823.27820-3-keith.busch@intel.com>
	<20180518230357.GC18334@ming.t460p>
	<20180521142219.GC5528@localhost.localdomain>
	<20180521145850.GA19099@ming.t460p>
	<20180521150331.GI5528@localhost.localdomain>
	<20180521153425.GC19099@ming.t460p>
	<20180521154433.GJ5528@localhost.localdomain>
	<20180521160452.GD19099@ming.t460p>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20180521160452.GD19099@ming.t460p>
List-ID:

On Tue, May 22, 2018 at 12:04:53AM +0800, Ming Lei wrote:
> On Mon, May 21, 2018 at 09:44:33AM -0600, Keith Busch wrote:
> > On Mon, May 21, 2018 at 11:34:27PM +0800, Ming Lei wrote:
> > > nvme_dev_disable() quiesces queues first before killing queues.
> > >
> > > If queues are quiesced during or before nvme_wait_freeze() is run
> > > from the 2nd part of reset, the 2nd part can't move on, and IO hang
> > > is caused. Finally no reset can be scheduled at all.
> >
> > But this patch moves nvme_wait_freeze outside the reset path, so I'm
> > afraid I'm unable to follow how you've concluded the wait freeze is
> > somehow part of the reset.
>
> For example:
>
> 1) the 1st timeout event:
>
> 	- nvme_dev_disable()
> 	- reset
> 	- scan_work
>
> 2) the 2nd timeout event:
>
> 	nvme_dev_disable() may come just after nvme_start_queues() in
> 	the above reset of the 1st timeout. And nvme_timeout() won't
> 	schedule a new reset since the controller state is NVME_CTRL_CONNECTING.

Let me get this straight -- you're saying nvme_start_queues is going to
somehow immediately trigger timeout work? I can't see how that could
possibly happen in real life, but we can just remove it and use the
existing nvme_start_ctrl to handle that in the LIVE state.
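
For reference, the interleaving described in the quoted example can be
modelled in user space. The following is only a toy sketch, not driver
code: every toy_* name is hypothetical, and the rule that a reset can be
scheduled only from the LIVE state is a simplifying assumption standing
in for the driver's real state-transition checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy states, named after the controller states mentioned in the thread. */
enum toy_ctrl_state { TOY_LIVE, TOY_RESETTING, TOY_CONNECTING };

/* Hypothetical stand-in for the controller; not a kernel structure. */
struct toy_ctrl {
	enum toy_ctrl_state state;
	bool queues_quiesced;
	int resets_scheduled;
};

/* Assumption: a new reset is accepted only while the state is LIVE. */
static bool toy_schedule_reset(struct toy_ctrl *c)
{
	if (c->state != TOY_LIVE)
		return false;
	c->state = TOY_RESETTING;
	c->resets_scheduled++;
	return true;
}

/* Models a timeout event: quiesce the queues, then try to schedule a reset. */
static bool toy_timeout(struct toy_ctrl *c)
{
	c->queues_quiesced = true;
	return toy_schedule_reset(c);
}

/* Models the reset work up to the point where the queues are restarted. */
static void toy_reset_until_start_queues(struct toy_ctrl *c)
{
	assert(c->state == TOY_RESETTING);
	c->state = TOY_CONNECTING;
	c->queues_quiesced = false;
}

/* Quiesced queues with no reset able to run means nothing restarts them. */
static bool toy_stuck(const struct toy_ctrl *c)
{
	return c->queues_quiesced && c->state == TOY_CONNECTING;
}
```

Starting from TOY_LIVE, the sequence toy_timeout(),
toy_reset_until_start_queues(), toy_timeout() leaves toy_stuck() true
with only one reset ever scheduled, which is the hang under discussion.
The model says nothing about how likely the second timeout is to land in
that window.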