From: Hannes Reinecke <hare@suse.de>
To: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@wdc.com>,
	Sagi Grimberg <sagi@grimberg.me>, Daniel Wagner <dwagner@suse.de>,
	linux-nvme@lists.infradead.org, Hannes Reinecke <hare@suse.de>,
	Chao Leng <lengchao@huawei.com>
Subject: [PATCH 1/3] nvme: fixup kato deadlock
Date: Tue, 2 Mar 2021 10:26:42 +0100
Message-ID: <20210302092644.80701-2-hare@suse.de>
In-Reply-To: <20210302092644.80701-1-hare@suse.de>

A customer of ours has run into this deadlock with RDMA:
- The ka_work workqueue item is executed.
- A new ka_work workqueue item is scheduled just after that.
- Now both the kato request timeout _and_ the workqueue delay
  will expire at roughly the same time.
- If the timing is right, the workqueue item executes _before_
  the kato request timeout triggers.
- The kato request timeout triggers and starts error recovery.
- Error recovery deadlocks, as it needs to flush the kato
  workqueue item, which is stuck in nvme_alloc_request() because
  all reserved tags are in use.

The reserved tags would have been freed up later when cancelling all
outstanding requests in the queue:

	nvme_stop_keep_alive(&ctrl->ctrl);
	nvme_rdma_teardown_io_queues(ctrl, false);
	nvme_start_queues(&ctrl->ctrl);
	nvme_rdma_teardown_admin_queue(ctrl, false);
	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);

but as we're stuck in nvme_stop_keep_alive() we'll never get this far.

To fix this, a new controller flag 'NVME_CTRL_KATO_RUNNING' is added,
which short-circuits the nvme_keep_alive() function if a keep-alive
command is already running.
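
As a rough userspace illustration of the flag logic (not the kernel code itself), the guard can be modelled with a C11 atomic exchange standing in for test_and_set_bit(). The names fake_ctrl, fake_keep_alive(), fake_keep_alive_end_io() and kato_guard_demo() are made up for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nvme_ctrl; only the flag matters here. */
struct fake_ctrl {
	atomic_bool kato_running;	/* models NVME_CTRL_KATO_RUNNING */
	int submitted;			/* keep-alives actually sent */
};

/* Models the guard in nvme_keep_alive(): true if a command was sent. */
static bool fake_keep_alive(struct fake_ctrl *ctrl)
{
	/* Like test_and_set_bit(): sets the flag, returns the old value. */
	if (atomic_exchange(&ctrl->kato_running, true))
		return false;	/* one keep-alive already in flight, skip */
	ctrl->submitted++;
	return true;
}

/* Models the completion handler: clearing the flag re-arms the guard. */
static void fake_keep_alive_end_io(struct fake_ctrl *ctrl)
{
	atomic_store(&ctrl->kato_running, false);
}

/* Walks through: send, skipped duplicate, completion, send again. */
void kato_guard_demo(void)
{
	struct fake_ctrl ctrl;
	bool first, second, third;

	atomic_init(&ctrl.kato_running, false);
	ctrl.submitted = 0;

	first = fake_keep_alive(&ctrl);		/* sent */
	second = fake_keep_alive(&ctrl);	/* skipped, still running */
	fake_keep_alive_end_io(&ctrl);
	third = fake_keep_alive(&ctrl);		/* sent again */

	assert(first && !second && third);
	assert(ctrl.submitted == 2);
}
```

Calling kato_guard_demo() from a small main() exercises the sequence. The point of the exchange is that checking and setting the flag happen as one atomic step, so two racing callers cannot both submit.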

Additionally, we should allocate the KATO request with
BLK_MQ_REQ_NOWAIT, as we must not block on request allocation; if we
cannot get a request we cannot determine whether the connection is
healthy, and need to reset it anyway.
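
To illustrate the difference a non-blocking allocation makes, here is a minimal userspace sketch of a reserved-tag pool that fails immediately instead of sleeping when no tag is free. It is only a model under stated assumptions: fake_admin_queue, fake_alloc_reserved_nowait(), fake_free_reserved() and nowait_alloc_demo() are hypothetical names, and -EAGAIN is this sketch's choice of error code, not a statement about what the block layer returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical model of the admin queue's reserved tag pool. */
struct fake_admin_queue {
	atomic_int free_reserved;	/* reserved tags currently available */
};

/*
 * Take one reserved tag without ever blocking.
 * Returns 0 on success, -EAGAIN if all reserved tags are in use.
 */
static int fake_alloc_reserved_nowait(struct fake_admin_queue *q)
{
	int avail = atomic_load(&q->free_reserved);

	while (avail > 0) {
		/* On failure, avail is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&q->free_reserved,
						 &avail, avail - 1))
			return 0;
	}
	return -EAGAIN;	/* caller must handle this instead of waiting */
}

/* Return a tag to the pool, e.g. on completion or cancellation. */
static void fake_free_reserved(struct fake_admin_queue *q)
{
	atomic_fetch_add(&q->free_reserved, 1);
}

/* One reserved tag: the second allocation fails until the first is freed. */
void nowait_alloc_demo(void)
{
	struct fake_admin_queue q;
	int first, second, third;

	atomic_init(&q.free_reserved, 1);

	first = fake_alloc_reserved_nowait(&q);		/* gets the tag */
	second = fake_alloc_reserved_nowait(&q);	/* pool exhausted */
	fake_free_reserved(&q);
	third = fake_alloc_reserved_nowait(&q);		/* tag is back */

	assert(first == 0 && second == -EAGAIN && third == 0);
}
```

With a blocking allocator the second call would sleep until a tag is returned, which is exactly the situation the flush in error recovery cannot get out of; failing fast hands the decision back to the caller.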
Cc: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/core.c | 10 ++++++++--
 drivers/nvme/host/nvme.h |  1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 587f8395435b..f890b310499e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1207,6 +1207,7 @@ static void nvme_keep_alive_end_io(struct request *rq, blk_status_t status)
 	bool startka = false;
 
 	blk_mq_free_request(rq);
+	clear_bit(NVME_CTRL_KATO_RUNNING, &ctrl->flags);
 
 	if (status) {
 		dev_err(ctrl->device,
@@ -1229,10 +1230,15 @@ static int nvme_keep_alive(struct nvme_ctrl *ctrl)
 {
 	struct request *rq;
 
+	if (test_and_set_bit(NVME_CTRL_KATO_RUNNING, &ctrl->flags))
+		return 0;
+
 	rq = nvme_alloc_request(ctrl->admin_q, &ctrl->ka_cmd,
-			BLK_MQ_REQ_RESERVED);
-	if (IS_ERR(rq))
+			BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
+	if (IS_ERR(rq)) {
+		clear_bit(NVME_CTRL_KATO_RUNNING, &ctrl->flags);
 		return PTR_ERR(rq);
+	}
 
 	rq->timeout = ctrl->kato * HZ;
 	rq->end_io_data = ctrl;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 07b34175c6ce..23711f6b7d13 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -344,6 +344,7 @@ struct nvme_ctrl {
 	int nr_reconnects;
 	unsigned long flags;
 #define NVME_CTRL_FAILFAST_EXPIRED 0
+#define NVME_CTRL_KATO_RUNNING 1
 	struct nvmf_ctrl_options *opts;
 
 	struct page *discard_page;
--
2.29.2
Thread overview: 18+ messages
2021-03-02 9:26 [PATCHv2 0/3] nvme: sanitize KATO handling Hannes Reinecke
2021-03-02 9:26 ` Hannes Reinecke [this message]
2021-03-03 8:42 ` [PATCH 1/3] nvme: fixup kato deadlock Christoph Hellwig
2021-03-03 12:01 ` Hannes Reinecke
2021-03-03 12:35 ` Christoph Hellwig
2021-03-03 13:11 ` Hannes Reinecke
2021-03-03 14:23 ` Hannes Reinecke
2021-03-04 8:02 ` Christoph Hellwig
2021-03-04 8:56 ` Hannes Reinecke
2021-03-02 9:26 ` [PATCH 2/3] nvme: sanitize KATO setting Hannes Reinecke
2021-03-03 8:53 ` Chao Leng
2021-03-03 12:40 ` Christoph Hellwig
2021-03-05 20:38 ` Sagi Grimberg
2021-03-08 13:11 ` Max Gurtovoy
2021-03-08 13:54 ` Hannes Reinecke
2021-03-02 9:26 ` [PATCH 3/3] nvme: add 'kato' sysfs attribute Hannes Reinecke
2021-03-05 20:38 ` Sagi Grimberg
2021-03-08 13:06 ` Max Gurtovoy