From: Christoph Hellwig <hch@lst.de>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>,
linux-nvme@lists.infradead.org,
James Smart <james.smart@broadcom.com>
Subject: Re: [PATCH v2 7/8] nvme-rdma: fix timeout handler
Date: Fri, 14 Aug 2020 08:52:30 +0200 [thread overview]
Message-ID: <20200814065230.GD1719@lst.de> (raw)
In-Reply-To: <20200806191127.592062-8-sagi@grimberg.me>
On Thu, Aug 06, 2020 at 12:11:26PM -0700, Sagi Grimberg wrote:
> Currently we check if the controller state != LIVE, and
> we directly fail the command under the assumption that this
> is the connect command or an admin command within the
> controller initialization sequence.
>
> This is wrong, we need to check if the request risking
> controller setup/teardown blocking if not completed and
> only then fail.
FYI: you can use up to 73 characters in the commit log..
> +++ b/drivers/nvme/host/rdma.c
> @@ -1185,6 +1185,7 @@ static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
> if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
> return;
>
> + dev_warn(ctrl->ctrl.device, "starting error recovery\n");
Should this really be a warning? I'd turn this down to _info.
> +static void nvme_rdma_complete_timed_out(struct request *rq)
> +{
> + struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
> + struct nvme_rdma_queue *queue = req->queue;
> + struct nvme_rdma_ctrl *ctrl = queue->ctrl;
> +
> + /* fence other contexts that may complete the command */
> + mutex_lock(&ctrl->teardown_lock);
> + nvme_rdma_stop_queue(queue);
> + if (blk_mq_request_completed(rq))
> + goto out;
> + nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
> + blk_mq_complete_request(rq);
> +out:
Nit: I'd probably avoid the goto here for a slightly simpler flow.
> {
> @@ -1961,29 +1979,43 @@ nvme_rdma_timeout(struct request *rq, bool reserved)
> dev_warn(ctrl->ctrl.device, "I/O %d QID %d timeout\n",
> rq->tag, nvme_rdma_queue_idx(queue));
>
> + switch (ctrl->ctrl.state) {
> + case NVME_CTRL_RESETTING:
> + if (!nvme_rdma_queue_idx(queue)) {
> + /*
> + * if we are in teardown we must complete immediately
> + * because we may block the teardown sequence (e.g.
> + * nvme_disable_ctrl timed out).
> + */
Please start the setence with an upper case character.
> + nvme_rdma_complete_timed_out(rq);
> + return BLK_EH_DONE;
> + }
> + /*
> + * Restart the timer if a controller reset is already scheduled.
> + * Any timed out commands would be handled before entering the
> + * connecting state.
> + */
> return BLK_EH_RESET_TIMER;
> + case NVME_CTRL_CONNECTING:
> + if (reserved || !nvme_rdma_queue_idx(queue)) {
> + /*
> + * if we are connecting we must complete immediately
> + * connect (reserved) or admin requests because we may
> + * block controller setup sequence.
> + */
> + nvme_rdma_complete_timed_out(rq);
> + return BLK_EH_DONE;
A goto to share the immediate completion branch would be nice. I wonder
if we should also do it for the reserved case during shutdown even if
that should never happen and entirely share the code, though:
switch (ctrl->ctrl.state) {
case NVME_CTRL_RESETTING:
case NVME_CTRL_CONNECTING:
/*
* If we are connecting or connecting, we must complete
* connect (reserved) or admin requests immediately, because
* they may block the controller setup or teardown sequence.
*/
if (reserved || !nvme_rdma_queue_idx(queue)) {
nvme_rdma_complete_timed_out(rq);
return BLK_EH_DONE;
}
break;
default:
break;
}
nvme_rdma_error_recovery(ctrl);
return BLK_EH_RESET_TIMER;
}
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
next prev parent reply other threads:[~2020-08-14 6:52 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-06 19:11 [PATCH v2 0/8] fix possible controller reset hangs in nvme-tcp/nvme-rdma Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 1/8] nvme-fabrics: allow to queue requests for live queues Sagi Grimberg
2020-08-14 6:44 ` Christoph Hellwig
2020-08-14 7:08 ` Sagi Grimberg
2020-08-14 7:22 ` Christoph Hellwig
2020-08-14 15:55 ` James Smart
2020-08-14 17:49 ` Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 2/8] nvme: have nvme_wait_freeze_timeout return if it timed out Sagi Grimberg
2020-08-14 6:45 ` Christoph Hellwig
2020-08-14 7:09 ` Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 3/8] nvme-tcp: serialize controller teardown double completion Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 4/8] nvme-tcp: fix timeout handler Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 5/8] nvme-tcp: fix reset hang if controller died in the middle of a reset Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 6/8] nvme-rdma: serialize controller teardown sequences Sagi Grimberg
2020-08-14 6:45 ` Christoph Hellwig
2020-08-14 21:12 ` James Smart
2020-08-19 0:35 ` Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 7/8] nvme-rdma: fix timeout handler Sagi Grimberg
2020-08-14 6:52 ` Christoph Hellwig [this message]
2020-08-14 7:14 ` Sagi Grimberg
2020-08-14 23:19 ` James Smart
2020-08-19 0:26 ` Sagi Grimberg
2020-08-14 23:27 ` James Smart
2020-08-14 23:30 ` James Smart
2020-08-19 0:39 ` Sagi Grimberg
2020-08-19 0:38 ` Sagi Grimberg
2020-08-06 19:11 ` [PATCH v2 8/8] nvme-rdma: fix reset hang if controller died in the middle of a reset Sagi Grimberg
2020-08-14 6:53 ` Christoph Hellwig
2020-08-11 22:16 ` [PATCH v2 0/8] fix possible controller reset hangs in nvme-tcp/nvme-rdma Sagi Grimberg
2020-08-13 15:39 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200814065230.GD1719@lst.de \
--to=hch@lst.de \
--cc=james.smart@broadcom.com \
--cc=kbusch@kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=sagi@grimberg.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.