From: ming.lei@redhat.com (Ming Lei)
Subject: [RFC PATCH] nvme of: don't flush scan work inside reset context
Date: Wed, 7 Nov 2018 09:58:29 +0800 [thread overview]
Message-ID: <20181107015828.GE3531@ming.t460p> (raw)
In-Reply-To: <dbfed2f9-4ad3-40ce-7c6d-0feae2b4128f@broadcom.com>
Hi James,
On Mon, Nov 05, 2018@09:45:21PM -0800, James Smart wrote:
>
>
> On 11/5/2018 5:18 PM, Ming Lei wrote:
> >
> > > I also believe Ming's patch isn't enough.? The issue for the transports are
> > > several of the flush cases require an io to complete to finish flushing -
> > > not just scan_work.?? nvme_mpath_stop() syncs with ctrl->ana_work, which may
> > > be waiting for an ana log to complete.? fw_act_queues() may be in the midst
> > > of polling the CSTS bit register or doing a fw_slot log read, or a
> > > stop_ctrl() routine for the transport may be in the middle of a start_ctrl
> > > call.
> > These commands won't be retried since REQ_FAILFAST_DRIVER is set for any
> > request allocated via nvme_alloc_request().
> >
> > So the above flush cases you mentioned should be easier to handle than IO from
> > scan context. However, looks the patch we discussed before by moving
> > nvme_stop_ctrl() after nvme_rdma_shutdown_ctrl() is still needed for
> > avoiding hang on flush in these commands.
>
> Agree - nvme_stop_ctrl() has to move so the transport routines that
> stop/return the io can run.? Leaving it in the same place with an argument
> won't be enough.
OK, I will include this part into RFC V2 and post out for further review.
>
>
> >
> > > Give the transport may be at a point where it's lost connectivity to the
> > > controller (never really a case for pci unless you consider hot unplug),
> > > before calling ctlr stop, the transport has to terminate/return outstanding
> > > io and stop further io from being accepted (usually by marking the queue as
> > > not connected).
> > This patch doesn't change this point, all in-flight IO is still terminate/return.
> > With or without this patch, all these IO will be retried too after controller is
> > recovered.
> >
> > Or you mean all these IOs have to terminate/return before calling
> > ctrl->ops->stop_ctrl().
>
> Patch as is, that calls the flush routines pointed out, will deadlock as the
> io won't be terminated and returned by the transport. The flush routines,
> thus stop_ctrl, has to be called after the transport stops the queues and
> aborts/returns the ios. Retries after that will do what they do.
All normal IOs are retried and won't be terminated during reset.
However, it is very unusual to wait for completion of these normal IOs
inside reset handler, and the only such usage I saw is
flush_work(&ctrl->scan_work) in fc/rdma/loop.
>
> But the concern I have is - if things like ns_revalidate requires normal io
> to be completed to be "flushed", then we have an issue. The non-normal io,
> generated by the core layer and those issued by multipath will fast-fail.
> But the normal io, whether non-multipath or last-path with multipath, may
> not be failed as they will be continually requeued or wait for a new path.
> So I don't know how it exits this deadlock.
Except for flush_work(&ctrl->scan_work), do we have other cases in which normal
IO is waited for completion inside reset handler? Looks not found.
For other waiting for completion of normal IO, it is the responsibility
of reset or error or timeout handler to guarantee forward progress, and
there shouldn't be such deadlock.
Thanks,
Ming
next prev parent reply other threads:[~2018-11-07 1:58 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-11-05 11:57 [RFC PATCH] nvme of: don't flush scan work inside reset context Ming Lei
2018-11-05 16:28 ` Keith Busch
2018-11-06 0:30 ` Ming Lei
2018-11-05 20:04 ` James Smart
2018-11-06 1:18 ` Ming Lei
2018-11-06 5:45 ` James Smart
2018-11-07 1:58 ` Ming Lei [this message]
2018-11-08 0:19 ` Sagi Grimberg
2018-11-08 17:49 ` James Smart
2018-11-07 3:26 ` Sagi Grimberg
2018-11-07 3:51 ` Ming Lei
2018-11-07 4:38 ` Sagi Grimberg
2018-11-07 8:34 ` Ming Lei
2018-11-07 18:38 ` Sagi Grimberg
2018-11-07 19:27 ` James Smart
2018-11-08 0:05 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181107015828.GE3531@ming.t460p \
--to=ming.lei@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.