From mboxrd@z Thu Jan 1 00:00:00 1970
From: ming.lei@redhat.com (Ming Lei)
Date: Tue, 21 May 2019 17:45:42 +0800
Subject: [PATCH 5.0 66/95] nvme: cancel request synchronously
In-Reply-To:
References: <20190509181309.180685671@linuxfoundation.org>
 <20190509181314.082604502@linuxfoundation.org>
Message-ID: <20190521094535.GA28632@ming.t460p>

On Tue, May 21, 2019 at 11:36:26AM +0300, Max Gurtovoy wrote:
> On 5/9/2019 9:42 PM, Greg Kroah-Hartman wrote:
> > [ Upstream commit eb3afb75b57c28599af0dfa03a99579d410749e9 ]
> >
> > nvme_cancel_request() is used in error handler, and it is always
> > reliable to cancel request synchronously, and avoids possible race
> > in which request may be completed after real hw queue is destroyed.
>
> Ming,
>
> If the completion is async in the block layer, can't a "good" request (not a
> canceled one..) complete after real HW queue is destroyed ?

In theory, it can't.

1) in case of error recovery

It is the driver's responsibility to synchronize normal completion with
error handling. NVMe PCI calls nvme_dev_disable() to shut down the
controller, and there won't be any more good requests after
nvme_dev_disable() returns.

I am not very familiar with the NVMe RDMA code, but
nvme_rdma_stop_io_queues() is supposed to do the same thing to avoid
racing with normal completion. Otherwise, simply canceling in-flight
requests isn't enough.

2) in case of device removal

blk_cleanup_queue() drains all in-queue requests, so this issue can't
occur.

Thanks,
Ming