[PATCH 2/2] nvme: flush scan_work when resetting controller

All of lore.kernel.org
 help / color / mirror / Atom feed

From: ming.lei@redhat.com (Ming Lei)
Subject: [PATCH 2/2] nvme: flush scan_work when resetting controller
Date: Mon, 24 Jun 2019 11:29:46 +0800	[thread overview]
Message-ID: <20190624032945.GA6563@ming.t460p> (raw)
In-Reply-To: <5ec67ad6-a61a-b28f-9676-864a5f04bbad@suse.de>

On Fri, Jun 21, 2019@09:59:02AM +0200, Hannes Reinecke wrote:
> On 6/21/19 8:58 AM, Ming Lei wrote:
> > On Fri, Jun 21, 2019@08:14:45AM +0200, Hannes Reinecke wrote:
> >> On 6/20/19 3:36 AM, Ming Lei wrote:
> >>> On Tue, Jun 18, 2019@12:10:25PM +0200, Hannes Reinecke wrote:
> >>>> When resetting the controller there is no point whatsoever to
> >>>> have a scan run in parallel; we cannot access the controller and
> >>>
> >>> scan won't be run in parallel, because .scan_work is embedded in
> >>> 'struct nvme_ctrl' which is per-HBA.
> >>>
> >> Wrong. We do.
> >> Not sure why having it embedded in the controller structure might
> >> prevent this from happening; both reset and scan are embedded, but
> >> running on different queues:
> > 
> > I mean the scan_work function itself is run exclusively, but yes, it can be 
> > run when resetting is in-progress.
> > 
> >>
> >> void nvme_queue_scan(struct nvme_ctrl *ctrl)
> >> {
> >> 	/*
> >> 	 * Only new queue scan work when admin and IO queues are both alive
> >> 	 */
> >> 	if (ctrl->state == NVME_CTRL_LIVE)
> >> 		queue_work(nvme_wq, &ctrl->scan_work);
> >> }
> >> EXPORT_SYMBOL_GPL(nvme_queue_scan);
> >>
> >> int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
> >> {
> >> 	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
> >> 		return -EBUSY;
> >> 	if (!queue_work(nvme_reset_wq, &ctrl->reset_work))
> >> 		return -EBUSY;
> >> 	return 0;
> >> }
> >> EXPORT_SYMBOL_GPL(nvme_reset_ctrl);
> >>
> >> So there's nothing stopping them to run in parallel.
> >>
> >>>> we cannot tell which devices are present and which not.
> >>>> Additionally we'll run a scan after reset anyway.
> >>>> So flush existing scans before reconnecting, ensuring to
> >>>> short-circuit the scan workqueue function if the controller state
> >>>> isn't live to avoid lockups.
> >>>
> >>> This way may cause dead-lock.
> >>>
> >>> 1) nvme_revalidate_disk() might freeze queue in flush context, however
> >>> any in-flight requests won't be completed until reset is done, so
> >>> deadlock may be caused by flushing scans in reset context.
> >>>
> >> Which is why I'm checking the controller state; I've observed the
> >> deadlock plenty of times before introducing the controller state check.
> > 
> > Your check can't help wrt. the deadlock, for example:
> > 
> > 1) in scan work context:
> > 
> > - blk_mq_freeze_queue() is being started after passing the controller
> >   state check
> > 
> > 2) timeout & reset is triggered in another context at the exact same time:
> > 
> > - all in-flight IOs won't be freed until disable controller & reset is done.
> > 
> > - now flush_work() in reset context can't move on, because
> >   blk_mq_freeze_queue() in scan context can't make progress.
> > 
> There might be a difference between RDMA and FC implementations; for FC
> we terminate all outstanding I/Os from the HW side, so each I/O will be
> returned with an aborted status.
> Which for all tests I (and NetApp :-) did was enough to get
> 'blk_mq_freeze_queue()' unstuck and the flush_work to complete.
> We _did_ observed, however, that the state checks are absolutely
> critical to this, otherwise we indeed ended up with a stuck flush_work().

I have explained that the check can't fix the issue completely, and it is
like a workround.

> 
> >>
> >>> 2) sync IO may be involved in revalidate_disk() which is called in
> >>> scan context, so deadlock is caused for same reason with 1).
> >>>
> >> I/O during revalidate_disk() is protected by the state check, too, so we
> >> won't be issuing any I/O during resetting.
> >>
> >> To be precise: any I/O in flight when reset is triggered will be
> >> terminated, and any subsequent I/O is short-circuited by the state check.
> > 
> > No, any I/O in flight before resetting is just terminated from hardware,
> > but still in blk-mq sw or scheduler queue, so either sync IO or queue
> > freezing won't make progress.
> > 
> > Please see nvme_complete_rq(), all these IO will be retried usually.
> > 
> For the multipath case the request will be requeued via
> blk_steal_bios(), and the original request will be completed.

The sync IO during scan is actually waiting for bio completion, not request.

So there is still the issue in case of multipath.

Thanks,
Ming

     prev parent reply	other threads:[~2019-06-24  3:29 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-18 10:10 [PATCH 0/2] nvme: flush rescan worker before resetting Hannes Reinecke
2019-06-18 10:10 ` [PATCH 1/2] nvme: Do not remove namespaces during reset Hannes Reinecke
2019-06-18 17:30   ` Sagi Grimberg
2019-06-20  1:22   ` Ming Lei
2019-06-18 10:10 ` [PATCH 2/2] nvme: flush scan_work when resetting controller Hannes Reinecke
2019-06-18 17:41   ` Sagi Grimberg
2019-06-19  6:22     ` Hannes Reinecke
2019-06-19 16:56       ` Sagi Grimberg
2019-06-19 18:45         ` Hannes Reinecke
2019-06-19 20:04           ` Sagi Grimberg
2019-06-21 16:26             ` Sagi Grimberg
2019-06-24  5:48               ` Hannes Reinecke
2019-06-24  6:13               ` Hannes Reinecke
2019-06-24 18:08                 ` Sagi Grimberg
2019-06-24 18:51                   ` James Smart
2019-06-25  6:07                   ` Hannes Reinecke
2019-06-25 21:50                     ` Sagi Grimberg
2019-06-26  5:34                       ` Hannes Reinecke
2019-06-26 20:22                         ` Sagi Grimberg
2019-07-02  5:38                           ` Sagi Grimberg
2019-07-02 13:29                             ` Hannes Reinecke
2019-06-20  1:36   ` Ming Lei
2019-06-21  6:14     ` Hannes Reinecke
2019-06-21  6:58       ` Ming Lei
2019-06-21  7:59         ` Hannes Reinecke
2019-06-21 17:23           ` James Smart
2019-06-21 17:23           ` James Smart
2019-06-24  3:29           ` Ming Lei [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190624032945.GA6563@ming.t460p \
    --to=ming.lei@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.