From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754174AbeA2UNk (ORCPT ); Mon, 29 Jan 2018 15:13:40 -0500 Received: from mga02.intel.com ([134.134.136.20]:27327 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932150AbeA2UNh (ORCPT ); Mon, 29 Jan 2018 15:13:37 -0500 X-Amp-Result: UNSCANNABLE X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.46,432,1511856000"; d="scan'208";a="27252189" Date: Mon, 29 Jan 2018 13:17:16 -0700 From: Keith Busch To: Sagi Grimberg Cc: Jianchao Wang , axboe@fb.com, hch@lst.de, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] nvme-pci: use NOWAIT flag for nvme_set_host_mem Message-ID: <20180129201716.GB25515@localhost.localdomain> References: <1517195255-21832-1-git-send-email-jianchao.w.wang@oracle.com> <20180129160145.GA25515@localhost.localdomain> <1b7d3700-945f-9272-b6aa-d2ebeaf0cb1e@grimberg.me> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1b7d3700-945f-9272-b6aa-d2ebeaf0cb1e@grimberg.me> User-Agent: Mutt/1.9.1 (2017-09-22) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jan 29, 2018 at 09:55:41PM +0200, Sagi Grimberg wrote: > > Thanks for the fix. It looks like we still have a problem, though. > > Commands submitted with the "shutdown_lock" held need to be able to make > > forward progress without relying on a completion, but this one could > > block indefinitely. > > Can you explain to me why is the shutdown_lock needed to synchronize > nvme_dev_disable? More concretely, how is nvme_dev_disable different > from other places where we rely on the ctrl state to serialize stuff? > > The only reason I see would be to protect against completion-after-abort > scenario but I think the block layer should protect against it (checks > if the request timeout timer fired). We can probably find a way to use the state machine for this. Disabling the controller pre-dates the state machine, and the mutex is there to protect against two actors shutting the controller down at the same time, like a hot removal at the same time as a timeout handling reset.