From: Daniel Wagner <dwagner@suse.de>
To: linux-nvme@lists.infradead.org
Cc: Daniel Wagner <dwagner@suse.de>
Subject: [RFC] nvme-rdma: Stop queues when starting with error recovery
Date: Mon, 23 May 2022 17:21:02 +0200
Message-ID: <20220523152102.41000-1-dwagner@suse.de>
When we enter error recovery we should stop all queue activity and
cancel all armed timers.

For example, we could arm an ANATT timer right before we enter
error recovery and then fail to recover before the timer fires.
The timer is supposed to be active only while the controller is in
LIVE state, hence we should call nvme_stop_ctrl() when starting the
recovery activities.
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
nvme_stop_ctrl() does cancel pending ANATT timers. But so far I
haven't got hold of logs from when the two controllers get back live,
so this might not work as expected.

My question is: do we just want to cancel the timer, or is
nvme_stop_ctrl() the right function here? Obviously, the same problem
exists for nvme-tcp.
[ 889.241541] nvme nvme0: creating 4 I/O queues.
[ 892.341152] nvme nvme0: mapped 4/0/0 default/read/poll queues.
[ 892.350942] nvme nvme0: new ctrl: NQN "XXX", addr 192.20.93.101:4420
[ 892.402493] nvme nvme1: creating 4 I/O queues.
[ 895.392810] nvme nvme1: mapped 4/0/0 default/read/poll queues.
[ 895.402029] nvme nvme1: new ctrl: NQN "XXX", addr 192.20.93.102:4420
[ 895.471730] nvme nvme2: creating 4 I/O queues.
[ 898.509195] nvme nvme2: mapped 4/0/0 default/read/poll queues.
[ 898.519015] nvme nvme2: new ctrl: NQN "XXX", addr 192.20.193.101:4420
[ 898.571169] nvme nvme3: creating 4 I/O queues.
[ 901.592283] nvme nvme3: mapped 4/0/0 default/read/poll queues.
[ 901.601832] nvme nvme3: new ctrl: NQN "XXX", addr 192.20.193.102:4420
[ 983.429977] nvme nvme3: I/O 0 QID 0 timeout
[ 983.434472] nvme nvme3: starting error recovery
[ 984.549958] nvme nvme0: I/O 0 QID 0 timeout
[ 984.554452] nvme nvme0: starting error recovery
[ 986.962375] nvme nvme3: failed nvme_keep_alive_end_io error=10
[ 986.986898] nvme nvme3: Reconnecting in 10 seconds...
[ 1226.486740] nvme nvme3: Reconnecting in 10 seconds...
[ 1227.749980] nvme nvme0: rdma connection establishment failed (-110)
[ 1227.761593] nvme nvme0: Failed reconnect attempt 18
[ 1227.766848] nvme nvme0: Reconnecting in 10 seconds...
[ 1235.685958] nvme nvme0: ANATT timeout, resetting controller.
[ 1235.692107] nvme nvme3: ANATT timeout, resetting controller.
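To make the race in the log above easier to reason about, here is a minimal userspace model of it. This is only a sketch, not kernel code: every model_* name, the state enum and the one-second tick loop are hypothetical and exist purely for illustration. It assumes only what is stated above, namely that the old recovery path stops keep-alive but leaves a pending ANATT timer armed, while the proposed path cancels it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified controller states (not the kernel's enum). */
enum model_state { MODEL_LIVE, MODEL_RESETTING, MODEL_CONNECTING };

struct model_ctrl {
	enum model_state state;
	bool keep_alive_armed;
	bool anatt_armed;        /* an ANATT timer is pending */
	long anatt_expires;      /* absolute expiry in model seconds */
	int late_anatt_timeouts; /* expiries seen while not LIVE */
};

/* Arm the modeled ANATT timer; anatt is the timeout in seconds. */
static void model_arm_anatt(struct model_ctrl *c, long now, long anatt)
{
	assert(anatt > 0);
	c->anatt_armed = true;
	c->anatt_expires = now + anatt;
}

/* Advance the model clock; record an expiry that happens outside LIVE. */
static void model_tick(struct model_ctrl *c, long now)
{
	if (!c->anatt_armed || now < c->anatt_expires)
		return;
	c->anatt_armed = false;
	if (c->state != MODEL_LIVE)
		c->late_anatt_timeouts++;
}

/* Old behaviour in this model: only keep-alive is stopped. */
static void model_recovery_old(struct model_ctrl *c)
{
	c->state = MODEL_RESETTING;
	c->keep_alive_armed = false;
}

/* Proposed behaviour in this model: the ANATT timer is cancelled too. */
static void model_recovery_new(struct model_ctrl *c)
{
	c->state = MODEL_RESETTING;
	c->keep_alive_armed = false;
	c->anatt_armed = false;
}

/*
 * Arm the timer while LIVE, enter recovery one second later, then keep
 * failing to reconnect until well past the expiry. Returns the number of
 * ANATT expiries observed while the controller was not LIVE.
 */
static int model_run(bool use_new, long anatt)
{
	struct model_ctrl c = { .state = MODEL_LIVE, .keep_alive_armed = true };
	long now = 0;

	model_arm_anatt(&c, now, anatt);

	now = 1;
	if (use_new)
		model_recovery_new(&c);
	else
		model_recovery_old(&c);
	c.state = MODEL_CONNECTING; /* reconnect attempts never succeed here */

	for (; now <= anatt + 10; now++)
		model_tick(&c, now);

	return c.late_anatt_timeouts;
}
```

In this model, model_run(false, 10) reports one late expiry, matching the "ANATT timeout, resetting controller." lines while reconnecting, and model_run(true, 10) reports none. It does not answer whether cancelling just the timer would be sufficient in the real driver.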
drivers/nvme/host/rdma.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index b87c8ae41d9b..209dd1becd6c 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1197,8 +1197,7 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 	struct nvme_rdma_ctrl *ctrl = container_of(work,
 			struct nvme_rdma_ctrl, err_work);
 
-	nvme_stop_keep_alive(&ctrl->ctrl);
-	flush_work(&ctrl->ctrl.async_event_work);
+	nvme_stop_ctrl(&ctrl->ctrl);
 	nvme_rdma_teardown_io_queues(ctrl, false);
 	nvme_start_queues(&ctrl->ctrl);
 	nvme_rdma_teardown_admin_queue(ctrl, false);
--
2.29.2
Thread overview: 6+ messages
2022-05-23 15:21 Daniel Wagner [this message]
2022-05-24 13:12 ` [RFC] nvme-rdma: Stop queues when starting with error recovery Sagi Grimberg
2022-05-24 13:13 ` Sagi Grimberg
2022-05-24 13:38 ` Daniel Wagner
2022-05-24 13:48 ` Sagi Grimberg
2022-05-24 15:07 ` Daniel Wagner