From mboxrd@z Thu Jan  1 00:00:00 1970
From: ming.lei@redhat.com (Ming Lei)
Date: Mon, 18 Mar 2019 23:16:19 +0800
Subject: [PATCH 1/2] blk-mq: introduce blk_mq_complete_request_sync()
In-Reply-To: <1552921495.152266.8.camel@acm.org>
References: <20190318032950.17770-1-ming.lei@redhat.com>
 <20190318032950.17770-2-ming.lei@redhat.com>
 <20190318073826.GA29746@ming.t460p>
 <1552921495.152266.8.camel@acm.org>
Message-ID: <20190318151618.GA20371@ming.t460p>

On Mon, Mar 18, 2019 at 08:04:55AM -0700, Bart Van Assche wrote:
> On Mon, 2019-03-18 at 15:38 +0800, Ming Lei wrote:
> > On Sun, Mar 17, 2019 at 09:09:09PM -0700, Bart Van Assche wrote:
> > > On 3/17/19 8:29 PM, Ming Lei wrote:
> > > > In NVMe's error handler, the typical steps for tearing down
> > > > hardware are followed:
> > > >
> > > > 1) stop blk_mq hw queues
> > > > 2) stop the real hw queues
> > > > 3) cancel in-flight requests via
> > > >    blk_mq_tagset_busy_iter(tags, cancel_request, ...)
> > > >    cancel_request():
> > > >        mark the request as abort
> > > >        blk_mq_complete_request(req);
> > > > 4) destroy real hw queues
> > > >
> > > > However, there may be a race between #3 and #4, because
> > > > blk_mq_complete_request() actually completes the request
> > > > asynchronously.
> > > >
> > > > This patch introduces blk_mq_complete_request_sync() for fixing
> > > > the above race.
> > >
> > > Other block drivers wait until outstanding requests have completed
> > > by calling blk_cleanup_queue() before hardware queues are
> > > destroyed. Why can't the NVMe driver follow that approach?
> >
> > The teardown of the controller can be done in the error handler, in
> > which the request queues may not have been cleaned up yet. Almost all
> > NVMe controllers' error handling follows the above steps, for example:
> >
> > nvme_rdma_error_recovery_work()
> >     ->nvme_rdma_teardown_io_queues()
> >
> > nvme_timeout()
> >     ->nvme_dev_disable
>
> Hi Ming,
>
> This makes me wonder whether the current design of the NVMe core is the
> best design we can come up with? The structure of e.g. the SRP initiator
> and target drivers is similar to the NVMeOF drivers. However, there is
> no need in the SRP initiator driver to terminate requests synchronously.
> Is this due to

I am not familiar with SRP. Could you explain what the SRP initiator
driver does when the controller is in a bad state, and especially how it
deals with in-flight IO requests in that situation?

> differences in the error handling approaches in the SCSI and NVMe core
> drivers?

As far as I can tell, I don't see an obvious design issue in the NVMe
host drivers, which try their best to recover the controller and retry
all in-flight IO.

Thanks,
Ming