From: keith.busch@linux.intel.com (Keith Busch)
Subject: [PATCH V5 0/9] nvme: pci: fix & improve timeout handling
Date: Mon, 14 May 2018 09:18:21 -0600
Message-ID: <20180514151821.GE7772@localhost.localdomain>
In-Reply-To: <20180512002110.GA23631@ming.t460p>

Hi Ming,

On Sat, May 12, 2018 at 08:21:22AM +0800, Ming Lei wrote:
> > [ 760.679960] nvme nvme1: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
> > [ 760.701468] nvme nvme1: EH 0: after shutdown, top eh: 1
> > [ 760.727099] pci_raw_set_power_state: 62 callbacks suppressed
> > [ 760.727103] nvme 0000:86:00.0: Refused to change power state, currently in D3
>
> EH may not cover this kind of failure, so it fails on the first try.
Indeed, the test simulates a permanently broken link, so recovery is
not expected. Success in this case just means completing driver
unbinding.
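The quoted log reports CSTS=0xffffffff and PCI_STATUS=0xffff. A PCIe read aimed at a device the host can no longer reach generally completes with all bits set, so these values indicate that the link is gone, not that the controller is merely wedged and could be reset. The helper below is a hypothetical illustration of that check. It is not a function in the nvme driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an nvme driver API): returns true when both the
 * 32-bit NVMe CSTS register and the 16-bit PCI status register read back as
 * all ones, the usual signature of a device that is no longer reachable.
 */
static bool link_looks_dead(uint32_t csts, uint16_t pci_status)
{
	return csts == UINT32_MAX && pci_status == UINT16_MAX;
}
```

A single all-ones register could in principle be a legitimate value, so requiring both reads to be all ones is a conservative choice for this sketch.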
> > [ 760.727483] nvme nvme1: EH 0: state 4, eh_done -19, top eh 1
> > [ 760.727485] nvme nvme1: EH 0: after recovery -19
> > [ 760.727488] nvme nvme1: EH: fail controller
>
> The above issue (hang in nvme_remove()) is still an old issue, which
> happens because queues are kept quiesced during remove, so could you
> please test the following change?
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 1dec353388be..c78e5a0cde06 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3254,6 +3254,11 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>  	 */
>  	if (ctrl->state == NVME_CTRL_DEAD)
>  		nvme_kill_queues(ctrl);
> +	else {
> +		if (ctrl->admin_q)
> +			blk_mq_unquiesce_queue(ctrl->admin_q);
> +		nvme_start_queues(ctrl);
> +	}
>  
>  	down_write(&ctrl->namespaces_rwsem);
>  	list_splice_init(&ctrl->namespaces, &ns_list);
The above won't actually do anything here: the broken link puts the
controller in the DEAD state, so we take the nvme_kill_queues() branch
instead, and killing the queues also unquiesces them.
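A minimal userspace model of the control flow in the quoted hunk, assuming only what is stated above: that killing the queues also unquiesces them. The enum, struct and function names are hypothetical stand-ins, not kernel types or APIs. With the state set to DEAD, the proposed else branch never runs, yet the queues still end up unquiesced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins; the real controller has more states. */
enum fake_ctrl_state { FAKE_CTRL_LIVE, FAKE_CTRL_DEAD };

struct fake_ctrl {
	enum fake_ctrl_state state;
	bool admin_quiesced;
	bool io_quiesced;
	bool else_branch_ran;	/* records whether the proposed hunk executed */
};

/* Models the one property relied on here: killing queues also unquiesces them. */
static void fake_kill_queues(struct fake_ctrl *c)
{
	c->admin_quiesced = false;
	c->io_quiesced = false;
}

/* Mirrors the branch structure of the quoted nvme_remove_namespaces() hunk. */
static void fake_remove_namespaces(struct fake_ctrl *c)
{
	if (c->state == FAKE_CTRL_DEAD) {
		fake_kill_queues(c);
	} else {
		c->else_branch_ran = true;
		c->admin_quiesced = false;	/* stands in for blk_mq_unquiesce_queue(admin_q) */
		c->io_quiesced = false;		/* stands in for nvme_start_queues() */
	}
}
```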
> BTW, this failure is hard to trigger in my environment, so I haven't
> seen this issue, but I did verify that the nested EH can recover from
> an error during reset.
It's actually pretty easy to trigger this one. I just modified blktests'
block/019 to remove the hotplug slot check, then ran it on a block device
that isn't hot-pluggable.