From: Keith Busch <kbusch@kernel.org>
To: Tyler Ramer <tyaramer@gmail.com>
Cc: Jens Axboe <axboe@fb.com>,
linux-kernel@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
linux-nvme@lists.infradead.org, Sagi Grimberg <sagi@grimberg.me>
Subject: Re: [PATCH] nvme-pci: Shutdown when removing dead controller
Date: Sun, 6 Oct 2019 13:21:11 -0600
Message-ID: <20191006192109.GA9983@keith-busch>
In-Reply-To: <CAKcoMVC2LdcmUx6j5JzuT-TsFGz=mwQ0MsprrKR2qeXoTmQ-TQ@mail.gmail.com>
On Fri, Oct 04, 2019 at 11:36:42AM -0400, Tyler Ramer wrote:
> Here's a failure we had which represents the issue the patch is
> intended to solve:
>
> Aug 26 15:00:56 testhost kernel: nvme nvme4: async event result 00010300
> Aug 26 15:01:27 testhost kernel: nvme nvme4: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> Aug 26 15:02:10 testhost kernel: nvme nvme4: Device not ready; aborting reset
> Aug 26 15:02:10 testhost kernel: nvme nvme4: Removing after probe failure status: -19
>
> The CSTS warning comes from nvme_timeout and is printed by
> nvme_warn_reset. A reset then occurs, so the controller state
> should be NVME_CTRL_RESETTING.
>
> Now, in nvme_reset_work, the controller is never marked "CONNECTING" at:
>
> if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_CONNECTING))
>
> because, several lines above, nvme_pci_configure_admin_queue returns
> an error, which triggers a goto out_unlock and prints "Removing
> after probe failure status: -19".
>
> Because the state is never changed to NVME_CTRL_CONNECTING or
> NVME_CTRL_DELETING, the logic added in
> https://github.com/torvalds/linux/commit/2036f7263d70e67d70a67899a468588cb7356bc9
> should not apply.
Nor should it: there is no IO in flight at this point, so there
can't be any timeout work to check the state.
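The state reasoning above can be modeled in a few lines of user-space C. This is a simplified sketch, not driver code: enum model_ctrl_state, struct model_ctrl, model_change_state() and model_reset_work() are hypothetical names, and the transition rules are an assumed subset of what nvme_change_ctrl_state() accepts. It illustrates that when admin-queue setup fails, the controller is left in RESETTING, so a check keyed on CONNECTING or DELETING is never reached.

```c
#include <assert.h>   /* used by the example checks */
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a subset of the controller states. */
enum model_ctrl_state {
	MODEL_NEW,
	MODEL_LIVE,
	MODEL_RESETTING,
	MODEL_CONNECTING,
	MODEL_DELETING,
	MODEL_DEAD,
};

struct model_ctrl {
	enum model_ctrl_state state;
};

/* Assumed, simplified transition rules: updates the state and returns
 * true only if moving from the current state to new_state is allowed. */
static bool model_change_state(struct model_ctrl *c,
			       enum model_ctrl_state new_state)
{
	enum model_ctrl_state old = c->state;
	bool ok = false;

	switch (new_state) {
	case MODEL_LIVE:
		ok = old == MODEL_NEW || old == MODEL_RESETTING ||
		     old == MODEL_CONNECTING;
		break;
	case MODEL_RESETTING:
		ok = old == MODEL_NEW || old == MODEL_LIVE;
		break;
	case MODEL_CONNECTING:
		ok = old == MODEL_NEW || old == MODEL_RESETTING;
		break;
	case MODEL_DELETING:
		ok = old == MODEL_LIVE || old == MODEL_RESETTING ||
		     old == MODEL_CONNECTING;
		break;
	case MODEL_DEAD:
		ok = old == MODEL_DELETING;
		break;
	default:
		break;
	}
	if (ok)
		c->state = new_state;
	return ok;
}

/* Mirrors the ordering described in the thread: admin-queue setup runs
 * while the controller is still RESETTING, and only a successful setup
 * reaches the CONNECTING transition. admin_queue_result stands in for
 * the setup's return value (0 or a negative errno). */
static int model_reset_work(struct model_ctrl *c, int admin_queue_result)
{
	if (c->state != MODEL_RESETTING)
		return -EBUSY;             /* the driver WARN_ON()s here; errno is arbitrary in this model */
	if (admin_queue_result)
		return admin_queue_result; /* like "goto out_unlock": state stays RESETTING */
	if (!model_change_state(c, MODEL_CONNECTING))
		return -EBUSY;
	return 0;
}
```

For example, starting from LIVE, moving to RESETTING and then calling model_reset_work(&c, -ENODEV) returns -ENODEV (-19 on Linux, the status in the log) and leaves c.state at MODEL_RESETTING.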
> We can further validate that dev->ctrl.state == NVME_CTRL_RESETTING
> thanks to the WARN_ON in nvme_reset_work.
I'm not sure I see what this is fixing. Setting the shutdown parameter
to true is usually just to get the queues flushed, but the
nvme_kill_queues() call that follows accomplishes the same thing.
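At the register level, per the NVMe specification, disabling a controller clears CC.EN (bit 0), while a normal shutdown writes 01b to CC.SHN (bits 15:14) and then waits for CSTS.SHST to report completion. A minimal sketch of the resulting CC values, assuming those bit positions; the CC_* macros and the cc_for_disable()/cc_for_shutdown() helpers are hypothetical and are not the driver's own disable/shutdown routines.

```c
#include <assert.h>   /* used by the example checks */
#include <stdint.h>

/* Controller Configuration (CC) fields from the NVMe specification.
 * Macro names are local to this sketch. */
#define CC_EN          (1u << 0)             /* Enable */
#define CC_SHN_SHIFT   14                    /* Shutdown Notification, bits 15:14 */
#define CC_SHN_MASK    (3u << CC_SHN_SHIFT)
#define CC_SHN_NORMAL  (1u << CC_SHN_SHIFT)  /* 01b: normal shutdown */

/* Disable path: drop any shutdown notification and clear EN. */
static uint32_t cc_for_disable(uint32_t cc)
{
	cc &= ~CC_SHN_MASK;
	cc &= ~CC_EN;
	return cc;
}

/* Shutdown path: leave EN set and request a normal shutdown; the host
 * would then poll CSTS.SHST for "shutdown processing complete". */
static uint32_t cc_for_shutdown(uint32_t cc)
{
	cc &= ~CC_SHN_MASK;
	cc |= CC_SHN_NORMAL;
	return cc;
}
```

For example, with an enabled CC value of 0x00460001, cc_for_disable() yields 0x00460000 and cc_for_shutdown() yields 0x00464001.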
> On Thu, Oct 3, 2019 at 3:13 PM Tyler Ramer <tyaramer@gmail.com> wrote:
> >
> > Always shut down the controller when nvme_remove_dead_ctrl is
> > reached.
> >
> > It's possible for nvme_remove_dead_ctrl to be called as part of a
> > failed reset, when there is a bad NVME_CSTS. The controller won't
> > be coming back online, so we should shut it down rather than just
> > disable it.
> >
> > Signed-off-by: Tyler Ramer <tyaramer@gmail.com>
> > ---
> > drivers/nvme/host/pci.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index c0808f9eb8ab..c3f5ba22c625 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2509,7 +2509,7 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
> >  static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
> >  {
> >  	nvme_get_ctrl(&dev->ctrl);
> > -	nvme_dev_disable(dev, false);
> > +	nvme_dev_disable(dev, true);
> >  	nvme_kill_queues(&dev->ctrl);
> >  	if (!queue_work(nvme_wq, &dev->remove_work))
> >  		nvme_put_ctrl(&dev->ctrl);
> > --
> > 2.23.0
> >
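The failure log quoted earlier reports CSTS=0x3. Decoding it with the Controller Status layout from the NVMe specification (RDY in bit 0, CFS in bit 1, SHST in bits 3:2) shows that both RDY and CFS are set: the controller still reports ready while also signalling a fatal error. The CSTS_* macros, struct csts_fields and decode_csts() are hypothetical helpers for this sketch only.

```c
#include <assert.h>   /* used by the example checks */
#include <stdbool.h>
#include <stdint.h>

/* Controller Status (CSTS) fields from the NVMe specification.
 * Macro names are local to this sketch. */
#define CSTS_RDY         (1u << 0)   /* Ready */
#define CSTS_CFS         (1u << 1)   /* Controller Fatal Status */
#define CSTS_SHST_SHIFT  2           /* Shutdown Status, bits 3:2 */
#define CSTS_SHST_MASK   (3u << CSTS_SHST_SHIFT)

struct csts_fields {
	bool ready;
	bool fatal;
	unsigned int shst;  /* 0: normal operation, 1: shutdown occurring, 2: shutdown complete */
};

/* Split a raw CSTS value into the fields above. */
static struct csts_fields decode_csts(uint32_t csts)
{
	struct csts_fields f = {
		.ready = (csts & CSTS_RDY) != 0,
		.fatal = (csts & CSTS_CFS) != 0,
		.shst  = (csts & CSTS_SHST_MASK) >> CSTS_SHST_SHIFT,
	};
	return f;
}
```

For the logged value, decode_csts(0x3) gives ready = true, fatal = true and shst = 0.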
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
Thread overview: 24+ messages
2019-10-03 19:13 [PATCH] nvme-pci: Shutdown when removing dead controller Tyler Ramer
2019-10-04 15:36 ` Tyler Ramer
2019-10-05  2:07 ` Singh, Balbir
2019-10-05 21:58 ` Tyler Ramer
2019-10-06 19:21 ` Keith Busch [this message]
2019-10-07 15:13 ` Tyler Ramer
2019-10-07 15:44 ` Keith Busch
2019-10-07 17:50 ` [PATCH v2] " Tyler Ramer
2019-10-07 18:28 ` Keith Busch
2019-10-07 19:32 ` Tyler Ramer
2019-10-07 22:11 ` [PATCH] " Singh, Balbir
2019-10-11 14:28 ` [PATCH v3] Always shutdown the controller when nvme_remove_dead_ctrl is reached Tyler Ramer