[RFC PATCH] nvme of: don't flush scan work inside reset context

All of lore.kernel.org
 help / color / mirror / Atom feed

From: james.smart@broadcom.com (James Smart)
Subject: [RFC PATCH] nvme of: don't flush scan work inside reset context
Date: Mon, 5 Nov 2018 12:04:18 -0800	[thread overview]
Message-ID: <782deffe-eb72-3550-2d69-980604609fd5@broadcom.com> (raw)
In-Reply-To: <20181105115734.15515-1-ming.lei@redhat.com>

On 11/5/2018 3:57 AM, Ming Lei wrote:
> When scan work is in-progress, any controller error may trigger
> reset, now fc, rdma and loop host tries to flush scan work
> inside reset context.
>
> This way can cause deadlock easily because any IO during controler
> recovery(reset) can't be completed until the recovery is done.
>
> This patch tries to address the deadlock issue by not flushing
> scan work inside reset context. Actually not see obvious reason
> to do that:
>
> - once reset is done, a new scan will be scheduled.
> - PCI NVMe doesn't do that way
>
> Cc: James Smart <james.smart at broadcom.com>
> Cc: Keith Busch <keith.busch at intel.com>
> Cc: Christoph Hellwig <hch at lst.de>
> Cc: Sagi Grimberg <sagi at grimberg.me>
> Signed-off-by: Ming Lei <ming.lei at redhat.com>
> ---
>   drivers/nvme/host/core.c   | 4 ++--
>   drivers/nvme/host/fc.c     | 2 +-
>   drivers/nvme/host/nvme.h   | 2 +-
>   drivers/nvme/host/pci.c    | 2 +-
>   drivers/nvme/host/rdma.c   | 2 +-
>   drivers/nvme/target/loop.c | 2 +-
>   6 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 2e65be8b1387..fbbb6bd8fbc5 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -161,7 +161,7 @@ static void nvme_delete_ctrl_work(struct work_struct *work)
>   		 "Removing ctrl: NQN \"%s\"\n", ctrl->opts->subsysnqn);
>   
>   	flush_work(&ctrl->reset_work);
> -	nvme_stop_ctrl(ctrl);
> +	nvme_stop_ctrl(ctrl, true);
>   	nvme_remove_namespaces(ctrl);
>   	ctrl->ops->delete_ctrl(ctrl);
>   	nvme_uninit_ctrl(ctrl);
> @@ -3469,7 +3469,7 @@ void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
>   }
>   EXPORT_SYMBOL_GPL(nvme_complete_async_event);
>   
> -void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
> +void nvme_stop_ctrl(struct nvme_ctrl *ctrl, bool flush_scan)
>   {
>   	nvme_mpath_stop(ctrl);
>   	nvme_stop_keep_alive(ctrl);
>

Keith: This was missing a snippet in nvme_stop_ctrl() for

 ?-????????????? flush_work(&ctrl->scan_work);
 ?+???????????? if (flush_scan)
 ?+ ????????????????? flush_work(&ctrl->scan_work);

I also believe Ming's patch isn't enough.? The issue for the transports 
are several of the flush cases require an io to complete to finish 
flushing - not just scan_work.?? nvme_mpath_stop() syncs with 
ctrl->ana_work, which may be waiting for an ana log to complete.? 
fw_act_queues() may be in the midst of polling the CSTS bit register or 
doing a fw_slot log read, or a stop_ctrl() routine for the transport may 
be in the middle of a start_ctrl call.

Give the transport may be at a point where it's lost connectivity to the 
controller (never really a case for pci unless you consider hot unplug), 
before calling ctlr stop, the transport has to terminate/return 
outstanding io and stop further io from being accepted (usually by 
marking the queue as not connected).

rdma seems to pretty much do the right thing for it's error handling 
path - where it schedules the err_work handler to do this. But that's 
missing from the reset controller path (and is nooped if reset is in 
progress and stuck in one of these flushes when a timeout occurs).

fc resets the device, thus calls the flush routines, when it resets or 
has an error. So it never gets to call delete_association to 
teardown/mark things unconnected befpre waiting on the flushes.

I think we need to have the reset_work routines call the routine that 
deletes/tearsdown before it calls nvme_stop_ctrl().? These routines will 
likely call nvme_stop_keep_alive right before doing so as well.

-- james

next prev parent reply	other threads:[~2018-11-05 20:04 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-05 11:57 [RFC PATCH] nvme of: don't flush scan work inside reset context Ming Lei
2018-11-05 16:28 ` Keith Busch
2018-11-06  0:30   ` Ming Lei
2018-11-05 20:04 ` James Smart [this message]
2018-11-06  1:18   ` Ming Lei
2018-11-06  5:45     ` James Smart
2018-11-07  1:58       ` Ming Lei
2018-11-08  0:19       ` Sagi Grimberg
2018-11-08 17:49         ` James Smart
2018-11-07  3:26 ` Sagi Grimberg
2018-11-07  3:51   ` Ming Lei
2018-11-07  4:38     ` Sagi Grimberg
2018-11-07  8:34       ` Ming Lei
2018-11-07 18:38         ` Sagi Grimberg
2018-11-07 19:27           ` James Smart
2018-11-08  0:05           ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=782deffe-eb72-3550-2d69-980604609fd5@broadcom.com \
    --to=james.smart@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.