From: hch@lst.de (Christoph Hellwig)
Subject: [PATCHv3 1/9] nvme: Sync request queues on reset
Date: Fri, 25 May 2018 14:42:09 +0200
Message-ID: <20180525124209.GD23463@lst.de>
In-Reply-To: <20180524203500.14081-2-keith.busch@intel.com>

On Thu, May 24, 2018 at 02:34:52PM -0600, Keith Busch wrote:
> This patch fixes races that occur with simultaneous controller
> resets
Wait, how do we end up with simultaneous controller resets? Not only
do we have the NVME_CTRL_RESETTING state, we also execute all resets
from ctrl->reset_work, so they are implicitly single-threaded.
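
To illustrate the serialization argument above, here is a minimal userspace sketch, not kernel code. The names struct ctrl, try_schedule_reset() and reset_done() are hypothetical stand-ins for the controller state check and for queueing ctrl->reset_work. It assumes that a reset may only be scheduled by the caller that wins the transition into the resetting state:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the controller states; not the kernel's enum. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING };

struct ctrl {
	_Atomic int state;	/* models the controller state */
	int resets_queued;	/* models queueing the single reset work item */
};

/* Returns true only for the caller that wins the LIVE -> RESETTING transition. */
static bool try_schedule_reset(struct ctrl *c)
{
	int expected = CTRL_LIVE;

	if (!atomic_compare_exchange_strong(&c->state, &expected, CTRL_RESETTING))
		return false;	/* a reset is already pending or running */
	c->resets_queued++;	/* only the winner gets here */
	return true;
}

/* Models the end of the reset work: the controller goes live again. */
static void reset_done(struct ctrl *c)
{
	assert(atomic_load(&c->state) == CTRL_RESETTING);
	atomic_store(&c->state, CTRL_LIVE);
}
```

In this model a second try_schedule_reset() call fails until reset_done() has run, which is the sense in which resets are single-threaded. It says nothing about a timeout handler that disables the controller without going through the reset path.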
> by synchronizing request queues prior to initializing the
> controller. Without this, a timeout thread may attempt disabling a
> controller at the same time as we're trying to enable it.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
> drivers/nvme/host/core.c | 21 +++++++++++++++++++--
> drivers/nvme/host/nvme.h | 1 +
> drivers/nvme/host/pci.c | 11 +++++++----
> 3 files changed, 27 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index dc8aa2c1c22a..33034e469bbc 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3469,6 +3469,12 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
> }
> EXPORT_SYMBOL_GPL(nvme_init_ctrl);
>
> +static void nvme_start_queue(struct nvme_ns *ns)
> +{
> +	blk_mq_unquiesce_queue(ns->queue);
> +	blk_mq_kick_requeue_list(ns->queue);
> +}
> +
> /**
> * nvme_kill_queues(): Ends all namespace queues
> * @ctrl: the dead controller that needs to end
> @@ -3497,7 +3503,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
> 		blk_set_queue_dying(ns->queue);
>
> 		/* Forcibly unquiesce queues to avoid blocking dispatch */
> -		blk_mq_unquiesce_queue(ns->queue);
> +		nvme_start_queue(ns);
> 	}
> 	up_read(&ctrl->namespaces_rwsem);
> }
> @@ -3567,11 +3573,22 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
>
> 	down_read(&ctrl->namespaces_rwsem);
> 	list_for_each_entry(ns, &ctrl->namespaces, list)
> -		blk_mq_unquiesce_queue(ns->queue);
> +		nvme_start_queue(ns);
> 	up_read(&ctrl->namespaces_rwsem);
The whole "kick the requeue list when starting queues" bit seems like
it should be a separate patch.
> }
> EXPORT_SYMBOL_GPL(nvme_start_queues);
>
> +void nvme_sync_queues(struct nvme_ctrl *ctrl)
> +{
> +	struct nvme_ns *ns;
> +
> +	down_read(&ctrl->namespaces_rwsem);
> +	list_for_each_entry(ns, &ctrl->namespaces, list)
> +		blk_sync_queue(ns->queue);
> +	up_read(&ctrl->namespaces_rwsem);
> +}
> +EXPORT_SYMBOL_GPL(nvme_sync_queues);
> +
> int nvme_reinit_tagset(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set)
> {
> 	if (!ctrl->ops->reinit_request)
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index ec6e4acc4d48..4f43918cd902 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -403,6 +403,7 @@ int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
> void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
> 		volatile union nvme_result *res);
>
> +void nvme_sync_queues(struct nvme_ctrl *ctrl);
> void nvme_stop_queues(struct nvme_ctrl *ctrl);
> void nvme_start_queues(struct nvme_ctrl *ctrl);
> void nvme_kill_queues(struct nvme_ctrl *ctrl);
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 917e1714f7d9..9da28e10d942 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2317,11 +2317,14 @@ static void nvme_reset_work(struct work_struct *work)
> 		goto out;
>
> 	/*
> -	 * If we're called to reset a live controller first shut it down before
> -	 * moving on.
> +	 * Ensure there are no timeout work in progress prior to forcefully
> +	 * disabling the queue. There is no harm in disabling the device even
> +	 * when it was already disabled, as this will forcefully reclaim any
> +	 * IOs that are stuck due to blk-mq's timeout handling that prevents
> +	 * timed out requests from completing.
> 	 */
> -	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
> -		nvme_dev_disable(dev, false);
> +	nvme_sync_queues(&dev->ctrl);
> +	nvme_dev_disable(dev, false);
And this part also makes sense to me, but I don't really understand
how it relates to the commit message.
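
For reference, the ordering in the hunk above (wait for any in-flight timeout work, then disable unconditionally) can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions: struct dev, start_timeout(), sync_timeouts() and reset() are hypothetical names, a pthread stands in for the block layer's timeout work, and pthread_join() stands in for blk_sync_queue().

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device model; none of these names come from the driver. */
struct dev {
	pthread_t timeout_thread;
	bool timeout_running;	/* a timeout handler has been started */
	bool enabled;
	int disable_calls;	/* how many times the device was disabled */
};

/* Disabling is idempotent: calling it on a disabled device is harmless. */
static void dev_disable(struct dev *d)
{
	d->enabled = false;
	d->disable_calls++;
}

/* Models a timeout handler that decides to disable the device. */
static void *timeout_handler(void *arg)
{
	dev_disable(arg);
	return NULL;
}

/* Starts the timeout handler; returns 0 on success like pthread_create(). */
static int start_timeout(struct dev *d)
{
	int ret = pthread_create(&d->timeout_thread, NULL, timeout_handler, d);

	if (ret == 0)
		d->timeout_running = true;
	return ret;
}

/* Models the sync step: wait until no timeout handler is running. */
static void sync_timeouts(struct dev *d)
{
	if (d->timeout_running) {
		pthread_join(d->timeout_thread, NULL);
		d->timeout_running = false;
	}
}

/* Models the reset path: sync first, then disable, then re-enable. */
static void reset(struct dev *d)
{
	sync_timeouts(d);	/* no handler can race with us past this point */
	dev_disable(d);		/* unconditional, even if already disabled */
	d->enabled = true;
}
```

Because reset() joins the handler before touching the device, the handler's disable can never land after the re-enable, which is the race the sync is meant to close in this model.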