From: Jiewei Ke <jiewei@smartx.com>
To: wagi@kernel.org
Cc: hare@suse.de, hch@lst.de, jmeneghi@redhat.com, kbusch@kernel.org,
linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
mkhalfella@purestorage.com, randyj@purestorage.com,
sagi@grimberg.me
Subject: Re: [PATCH RFC 3/3] nvme: delay failover by command quiesce timeout
Date: Thu, 10 Apr 2025 12:07:57 -0400 [thread overview]
Message-ID: <20250410160757.2567493-1-jiewei@smartx.com> (raw)
In-Reply-To: <20250324-tp4129-v1-3-95a747b4c33b@kernel.org>
Hi Daniel,
I just noticed that your patchset addresses a similar issue to the one I'm
trying to solve with my recently submitted patchset [1]. Compared to your
approach, mine differs in a few key aspects:
1. Only aborted requests are delayed for retry. In the current implementation,
nvmf_complete_timed_out_request and nvme_cancel_request set the request status
to NVME_SC_HOST_ABORTED_CMD. These requests are usually already sent to the
target, but may have timed out or been canceled before a response is received.
Since the target may still be processing them, the host needs to delay retrying
to ensure the target has completed or cleaned up the stale requests. On the
other hand, requests that are not aborted - such as those that never got
submitted due to no usable path (e.g., from nvme_ns_head_submit_bio), or those
that already received an ANA error from the target - do not need delayed retry.
2. The host explicitly disconnects and stops KeepAlive before delay scheduling
retrying requests. This aligns with Section 9.6 "Communication Loss Handling"
of the NVMe Base Specification 2.1. Once the host disconnects, the target may
take up to the KATO interval to detect the lost connection and begin cleaning
up any remaining requests.
@@ -2178,6 +2180,7 @@ static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
nvme_quiesce_admin_queue(&ctrl->ctrl);
nvme_disable_ctrl(&ctrl->ctrl, shutdown);
nvme_rdma_teardown_admin_queue(ctrl, shutdown);
+ nvme_delay_kick_retry_lists(&ctrl->ctrl); <<< delay kick retry after teardown all queues
}
3. Delayed retry of aborted requests is supported across multiple scenarios,
including error recovery work, controller reset, controller deletion, and
controller reconnect failure handling. Here's the relevant code for reference.
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 9109d5476417..f07b3960df7c 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2449,6 +2449,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
destroy_admin:
nvme_stop_keep_alive(ctrl);
nvme_tcp_teardown_admin_queue(ctrl, new);
+ nvme_delay_kick_retry_lists(ctrl); <<< requests may be timed out when ctrl reconnects
return ret;
}
@@ -2494,6 +2495,7 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work)
nvme_tcp_teardown_admin_queue(ctrl, false);
nvme_unquiesce_admin_queue(ctrl);
nvme_auth_stop(ctrl);
+ nvme_delay_kick_retry_lists(ctrl); <<< retry_lists may contain timed out or cancelled requests
if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
/* state change failure is ok if we started ctrl delete */
@@ -2513,6 +2515,7 @@ static void nvme_tcp_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
nvme_quiesce_admin_queue(ctrl);
nvme_disable_ctrl(ctrl, shutdown);
nvme_tcp_teardown_admin_queue(ctrl, shutdown);
+ nvme_delay_kick_retry_lists(ctrl); <<< retry_lists may contain timed out or cancelled requests when ctrl reset or delete
}
Besides, in nvme_tcp_error_recovery_work, the delayed retry must occur after
nvme_tcp_teardown_io_queues, because the teardown cancels requests that may need
to be retried too.
One limitation of my patchset is that it does not yet include full CQT support,
and due to testing environment constraints, only nvme_tcp and nvme_rdma are
currently covered.
I'd be happy to discuss the pros and cons of both approaches - perhaps we can
combine the best aspects.
Looking forward to your thoughts.
Thanks,
Jiewei
[1] https://lore.kernel.org/linux-nvme/20250410122054.2526358-1-jiewei@smartx.com/
next prev parent reply other threads:[~2025-04-10 17:10 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-24 12:07 [PATCH RFC 0/3] nvme: add support for command quiesce timeout Daniel Wagner
2025-03-24 12:07 ` [PATCH RFC 1/3] nvmet: add command quiesce time Daniel Wagner
2025-04-01 9:33 ` Hannes Reinecke
2025-04-10 9:00 ` Mohamed Khalfella
2025-04-16 11:37 ` Daniel Wagner
2025-03-24 12:07 ` [PATCH RFC 2/3] nvme: store cqt value into nvme ctrl object Daniel Wagner
2025-04-01 9:34 ` Hannes Reinecke
2025-03-24 12:07 ` [PATCH RFC 3/3] nvme: delay failover by command quiesce timeout Daniel Wagner
2025-04-01 9:37 ` Hannes Reinecke
2025-04-15 12:00 ` Daniel Wagner
2025-04-01 13:32 ` Nilay Shroff
2025-04-15 12:05 ` Daniel Wagner
2025-04-10 8:51 ` Mohamed Khalfella
2025-04-14 22:28 ` Sagi Grimberg
2025-04-15 12:11 ` Daniel Wagner
2025-04-15 21:07 ` Sagi Grimberg
2025-04-15 23:02 ` Randy Jennings
2025-04-15 23:35 ` Sagi Grimberg
2025-04-15 23:57 ` Randy Jennings
2025-04-16 22:15 ` Sagi Grimberg
2025-04-17 0:47 ` Randy Jennings
2025-04-15 12:17 ` Daniel Wagner
2025-04-15 22:56 ` Randy Jennings
2025-04-16 6:39 ` Daniel Wagner
2025-04-16 0:17 ` Mohamed Khalfella
2025-04-16 6:57 ` Daniel Wagner
2025-04-16 13:39 ` Mohamed Khalfella
2025-04-16 0:40 ` Mohamed Khalfella
2025-04-16 8:30 ` Daniel Wagner
2025-04-16 13:53 ` Mohamed Khalfella
2025-04-16 22:21 ` Sagi Grimberg
2025-04-16 22:59 ` Mohamed Khalfella
2025-04-17 7:28 ` Hannes Reinecke
2025-04-10 16:07 ` Jiewei Ke [this message]
2025-04-10 17:13 ` Jiewei Ke
2025-04-13 22:03 ` Sagi Grimberg
2025-04-16 8:51 ` Daniel Wagner
2025-04-16 0:23 ` Mohamed Khalfella
2025-04-16 11:33 ` Daniel Wagner
[not found] <8F2489FD-1663-4A52-A50B-F15046AC2878@163.com>
2025-04-15 12:34 ` Daniel Wagner
2025-04-15 15:08 ` Jiewei Ke
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250410160757.2567493-1-jiewei@smartx.com \
--to=jiewei@smartx.com \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=jmeneghi@redhat.com \
--cc=kbusch@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=mkhalfella@purestorage.com \
--cc=randyj@purestorage.com \
--cc=sagi@grimberg.me \
--cc=wagi@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox