From mboxrd@z Thu Jan 1 00:00:00 1970 From: james.smart@broadcom.com (James Smart) Date: Mon, 5 Nov 2018 12:04:18 -0800 Subject: [RFC PATCH] nvme of: don't flush scan work inside reset context In-Reply-To: <20181105115734.15515-1-ming.lei@redhat.com> References: <20181105115734.15515-1-ming.lei@redhat.com> Message-ID: <782deffe-eb72-3550-2d69-980604609fd5@broadcom.com> On 11/5/2018 3:57 AM, Ming Lei wrote: > When scan work is in-progress, any controller error may trigger > reset, now fc, rdma and loop host tries to flush scan work > inside reset context. > > This way can cause deadlock easily because any IO during controler > recovery(reset) can't be completed until the recovery is done. > > This patch tries to address the deadlock issue by not flushing > scan work inside reset context. Actually not see obvious reason > to do that: > > - once reset is done, a new scan will be scheduled. > - PCI NVMe doesn't do that way > > Cc: James Smart > Cc: Keith Busch > Cc: Christoph Hellwig > Cc: Sagi Grimberg > Signed-off-by: Ming Lei > --- > drivers/nvme/host/core.c | 4 ++-- > drivers/nvme/host/fc.c | 2 +- > drivers/nvme/host/nvme.h | 2 +- > drivers/nvme/host/pci.c | 2 +- > drivers/nvme/host/rdma.c | 2 +- > drivers/nvme/target/loop.c | 2 +- > 6 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index 2e65be8b1387..fbbb6bd8fbc5 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -161,7 +161,7 @@ static void nvme_delete_ctrl_work(struct work_struct *work) > "Removing ctrl: NQN \"%s\"\n", ctrl->opts->subsysnqn); > > flush_work(&ctrl->reset_work); > - nvme_stop_ctrl(ctrl); > + nvme_stop_ctrl(ctrl, true); > nvme_remove_namespaces(ctrl); > ctrl->ops->delete_ctrl(ctrl); > nvme_uninit_ctrl(ctrl); > @@ -3469,7 +3469,7 @@ void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status, > } > EXPORT_SYMBOL_GPL(nvme_complete_async_event); > > -void nvme_stop_ctrl(struct nvme_ctrl *ctrl) > +void nvme_stop_ctrl(struct nvme_ctrl *ctrl, bool flush_scan) > { > nvme_mpath_stop(ctrl); > nvme_stop_keep_alive(ctrl); > Keith: This was missing a snippet in nvme_stop_ctrl() for ?-????????????? flush_work(&ctrl->scan_work); ?+???????????? if (flush_scan) ?+ ????????????????? flush_work(&ctrl->scan_work); I also believe Ming's patch isn't enough.? The issue for the transports are several of the flush cases require an io to complete to finish flushing - not just scan_work.?? nvme_mpath_stop() syncs with ctrl->ana_work, which may be waiting for an ana log to complete.? fw_act_queues() may be in the midst of polling the CSTS bit register or doing a fw_slot log read, or a stop_ctrl() routine for the transport may be in the middle of a start_ctrl call. Give the transport may be at a point where it's lost connectivity to the controller (never really a case for pci unless you consider hot unplug), before calling ctlr stop, the transport has to terminate/return outstanding io and stop further io from being accepted (usually by marking the queue as not connected). rdma seems to pretty much do the right thing for it's error handling path - where it schedules the err_work handler to do this. But that's missing from the reset controller path (and is nooped if reset is in progress and stuck in one of these flushes when a timeout occurs). fc resets the device, thus calls the flush routines, when it resets or has an error. So it never gets to call delete_association to teardown/mark things unconnected befpre waiting on the flushes. I think we need to have the reset_work routines call the routine that deletes/tearsdown before it calls nvme_stop_ctrl().? These routines will likely call nvme_stop_keep_alive right before doing so as well. -- james