From: keith.busch@intel.com (Keith Busch)
Subject: [PATCH v2] nvme-rdma: fix timeout handler
Date: Tue, 8 Jan 2019 08:43:53 -0700
Message-ID: <20190108154353.GB18014@localhost.localdomain>
In-Reply-To: <20190108085734.9681-1-sagi@grimberg.me>
On Tue, Jan 08, 2019 at 12:57:34AM -0800, Sagi Grimberg wrote:
> Currently, we have several problems with the timeout
> handler:
> 1. If we timeout on the controller establishment flow, we will hang
> because we don't execute the error recovery (and we shouldn't because
> the create_ctrl flow needs to fail and cleanup on its own)
> 2. We might also hang if we get a disconnect on a queue while the
> controller is already deleting. This racy flow can cause the controller
> disable/shutdown admin command to hang.
>
> We cannot complete a timed out request from the timeout handler without
> mutual exclusion from the teardown flow (e.g. nvme_rdma_error_recovery_work).
> So we serialize it in the timeout handler and teardown io and admin
> queues to guarantee that no one races with us from completing the
> request.
>
> Reported-by: Jaesoo Lee <jalee at purestorage.com>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
> This is a slightly different version that looks more like pci
> behavior as we teardown the controller if we are connecting
> or deleting in the timeout handler.

This looks good to me.

Reviewed-by: Keith Busch <keith.busch at intel.com>

> drivers/nvme/host/rdma.c | 26 ++++++++++++++++++--------
> 1 file changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index e63f36c09c9a..079d59c04a0e 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1681,18 +1681,28 @@ static enum blk_eh_timer_return
> nvme_rdma_timeout(struct request *rq, bool reserved)
> {
> struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
> + struct nvme_rdma_queue *queue = req->queue;
> + struct nvme_rdma_ctrl *ctrl = queue->ctrl;
>
> - dev_warn(req->queue->ctrl->ctrl.device,
> - "I/O %d QID %d timeout, reset controller\n",
> - rq->tag, nvme_rdma_queue_idx(req->queue));
> + dev_warn(ctrl->ctrl.device, "I/O %d QID %d timeout\n",
> + rq->tag, nvme_rdma_queue_idx(queue));
>
> - /* queue error recovery */
> - nvme_rdma_error_recovery(req->queue->ctrl);
> + if (ctrl->ctrl.state != NVME_CTRL_LIVE) {
> + /*
> + * teardown immediately if controller times out while starting
> + * or we are already started error recovery. all outstanding
> + * requests are completed on shutdown, so we return BLK_EH_DONE.
> + */
> + flush_work(&ctrl->err_work);
> + nvme_rdma_teardown_io_queues(ctrl, false);
> + nvme_rdma_teardown_admin_queue(ctrl, false);
> + return BLK_EH_DONE;
> + }
>
> - /* fail with DNR on cmd timeout */
> - nvme_req(rq)->status = NVME_SC_ABORT_REQ | NVME_SC_DNR;
> + dev_warn(ctrl->ctrl.device, "starting error recovery\n");
> + nvme_rdma_error_recovery(ctrl);
>
> - return BLK_EH_DONE;
> + return BLK_EH_RESET_TIMER;
> }
>
> static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
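
For readers following the control flow of the quoted hunk, here is a minimal userspace sketch of the decision the new handler makes. It is only a model under stated assumptions: every `model_*` / `MODEL_*` name is hypothetical and stands in for the kernel types, the two boolean fields merely record which path was taken, and none of the real locking, `flush_work()` or queue teardown is reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel enums; not the real definitions. */
enum model_ctrl_state {
	MODEL_CTRL_CONNECTING,
	MODEL_CTRL_LIVE,
	MODEL_CTRL_DELETING,
};

enum model_eh_return {
	MODEL_EH_DONE,        /* request was completed, timer is finished */
	MODEL_EH_RESET_TIMER, /* request still owned by recovery, re-arm timer */
};

/* Hypothetical controller model: only the fields the sketch needs. */
struct model_ctrl {
	enum model_ctrl_state state;
	bool queues_torn_down;   /* set by the synchronous teardown branch */
	bool recovery_scheduled; /* set when async error recovery is queued */
};

/*
 * Models the branch structure of the patched timeout handler:
 * - controller not live: tear down inline so outstanding requests are
 *   completed, then report DONE;
 * - controller live: hand off to error recovery and re-arm the timer.
 */
enum model_eh_return model_timeout(struct model_ctrl *ctrl)
{
	assert(ctrl != NULL);

	if (ctrl->state != MODEL_CTRL_LIVE) {
		/* stands in for flush_work + io/admin queue teardown */
		ctrl->queues_torn_down = true;
		return MODEL_EH_DONE;
	}

	/* stands in for scheduling the error recovery work */
	ctrl->recovery_scheduled = true;
	return MODEL_EH_RESET_TIMER;
}
```

Calling `model_timeout()` on a controller in the connecting or deleting state takes the inline teardown branch, while a live controller takes the recovery branch and leaves the request to be completed later by the recovery path.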