From: Mohamed Khalfella <mkhalfella@purestorage.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Chaitanya Kulkarni <kch@nvidia.com>,
Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
Keith Busch <kbusch@kernel.org>,
Aaron Dailey <adailey@purestorage.com>,
Randy Jennings <randyj@purestorage.com>,
John Meneghini <jmeneghi@redhat.com>,
Hannes Reinecke <hare@suse.de>,
linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 07/14] nvme: Add RECOVERING nvme controller state
Date: Thu, 25 Dec 2025 09:17:09 -0800 [thread overview]
Message-ID: <20251225171709.GA8129-mkhalfella@purestorage.com> (raw)
In-Reply-To: <e9bb2664-9d96-45f0-a39d-6deaad0829e9@grimberg.me>
On Thu 2025-12-25 15:29:52 +0200, Sagi Grimberg wrote:
>
>
> On 26/11/2025 4:11, Mohamed Khalfella wrote:
> > Add NVME_CTRL_RECOVERING as a new controller state to be used when
> > impacted controller is being recovered. A LIVE controller enters
> > RECOVERING state when an IO error is encountered. While recovering
> > inflight IOs will not be canceled if they timeout. These IOs will be
> > canceled after recovery finishes. Also, while recovering a controller
> > can not be reset or deleted. This is intentional because reset or delete
> > will result in canceling inflight IOs. When recovery finishes, the
> > impacted controller transitions from RECOVERING state to RESETTING state.
> > Reset codepath takes care of queues teardown and inflight requests
> > cancellation.
>
> Is RECOVERING really capturing the nature of this state? Maybe RESETTLING?
> or QUIESCING?
Naming is hard. QUIESCING sounds better, I will renaming it to
QUIESCING.
>
> >
> > Note, there is no transition from RECOVERING to RESETTING added to
> > nvme_change_ctrl_state(). The reason is that user should not be allowed
> > to reset or delete a controller that is being recovered.
> >
> > Add NVME_CTRL_RECOVERED controller flag. This flag is set on a controller
> > about to schedule delayed work for time based recovery.
> >
> > Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> > ---
> > drivers/nvme/host/core.c | 10 ++++++++++
> > drivers/nvme/host/nvme.h | 2 ++
> > drivers/nvme/host/sysfs.c | 1 +
> > 3 files changed, 13 insertions(+)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index aa007a7b9606..f5b84bc327d3 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -574,6 +574,15 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
> > break;
> > }
> > break;
> > + case NVME_CTRL_RECOVERING:
> > + switch (old_state) {
> > + case NVME_CTRL_LIVE:
> > + changed = true;
> > + fallthrough;
> > + default:
> > + break;
> > + }
> > + break;
>
> That is a strange transition...
Why is it strange?
We transition to RECOVERING state only if controller is LIVE. This is
when we expect to have inflight user IOs to be quiesced by CCR. We do
not care about inflight requests in other states.
next prev parent reply other threads:[~2025-12-25 17:17 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-26 2:11 [RFC PATCH 00/14] TP8028 Rapid Path Failure Recovery Mohamed Khalfella
2025-11-26 2:11 ` [RFC PATCH 01/14] nvmet: Rapid Path Failure Recovery set controller identify fields Mohamed Khalfella
2025-12-16 1:35 ` Randy Jennings
2025-11-26 2:11 ` [RFC PATCH 02/14] nvmet/debugfs: Add ctrl uniquifier and random values Mohamed Khalfella
2025-12-16 1:43 ` Randy Jennings
2025-11-26 2:11 ` [RFC PATCH 03/14] nvmet: Implement CCR nvme command Mohamed Khalfella
2025-12-16 3:01 ` Randy Jennings
2025-12-31 21:14 ` Mohamed Khalfella
2025-12-25 13:14 ` Sagi Grimberg
2025-12-25 17:33 ` Mohamed Khalfella
2025-12-27 9:39 ` Sagi Grimberg
2025-12-31 21:35 ` Mohamed Khalfella
2025-11-26 2:11 ` [RFC PATCH 04/14] nvmet: Implement CCR logpage Mohamed Khalfella
2025-12-16 3:11 ` Randy Jennings
2025-11-26 2:11 ` [RFC PATCH 05/14] nvmet: Send an AEN on CCR completion Mohamed Khalfella
2025-12-16 3:31 ` Randy Jennings
2025-12-25 13:23 ` Sagi Grimberg
2025-12-25 18:13 ` Mohamed Khalfella
2025-12-27 9:48 ` Sagi Grimberg
2025-12-31 22:00 ` Mohamed Khalfella
2026-01-04 21:09 ` Sagi Grimberg
2026-01-07 2:58 ` Randy Jennings
2026-01-30 22:31 ` Mohamed Khalfella
2025-11-26 2:11 ` [RFC PATCH 06/14] nvme: Rapid Path Failure Recovery read controller identify fields Mohamed Khalfella
2025-12-18 15:22 ` Randy Jennings
2025-12-31 22:26 ` Mohamed Khalfella
2026-01-02 19:06 ` Mohamed Khalfella
2025-11-26 2:11 ` [RFC PATCH 07/14] nvme: Add RECOVERING nvme controller state Mohamed Khalfella
2025-12-18 23:18 ` Randy Jennings
2025-12-19 1:39 ` Randy Jennings
2025-12-25 13:29 ` Sagi Grimberg
2025-12-25 17:17 ` Mohamed Khalfella [this message]
2025-12-27 9:52 ` Sagi Grimberg
2025-12-31 22:45 ` Mohamed Khalfella
2025-12-27 9:55 ` Sagi Grimberg
2025-12-31 22:36 ` Mohamed Khalfella
2025-12-31 23:04 ` Mohamed Khalfella
2025-11-26 2:11 ` [RFC PATCH 08/14] nvme: Implement cross-controller reset recovery Mohamed Khalfella
2025-12-19 1:21 ` Randy Jennings
2025-12-27 10:14 ` Sagi Grimberg
2025-12-31 0:04 ` Randy Jennings
2026-01-04 21:14 ` Sagi Grimberg
2026-01-07 3:16 ` Randy Jennings
2025-12-31 23:43 ` Mohamed Khalfella
2026-01-04 21:39 ` Sagi Grimberg
2026-01-30 22:01 ` Mohamed Khalfella
2025-11-26 2:11 ` [RFC PATCH 09/14] nvme: Implement cross-controller reset completion Mohamed Khalfella
2025-12-19 1:31 ` Randy Jennings
2025-12-27 10:24 ` Sagi Grimberg
2025-12-31 23:51 ` Mohamed Khalfella
2026-01-04 21:15 ` Sagi Grimberg
2026-01-30 22:32 ` Mohamed Khalfella
2025-11-26 2:11 ` [RFC PATCH 10/14] nvme-tcp: Use CCR to recover controller that hits an error Mohamed Khalfella
2025-12-19 2:06 ` Randy Jennings
2026-01-01 0:04 ` Mohamed Khalfella
2025-12-27 10:35 ` Sagi Grimberg
2025-12-31 0:13 ` Randy Jennings
2026-01-04 21:19 ` Sagi Grimberg
2026-01-01 0:27 ` Mohamed Khalfella
2025-11-26 2:11 ` [RFC PATCH 11/14] nvme-rdma: " Mohamed Khalfella
2025-12-19 2:16 ` Randy Jennings
2025-12-27 10:36 ` Sagi Grimberg
2025-11-26 2:11 ` [RFC PATCH 12/14] nvme-fc: Decouple error recovery from controller reset Mohamed Khalfella
2025-12-19 2:59 ` Randy Jennings
2025-11-26 2:12 ` [RFC PATCH 13/14] nvme-fc: Use CCR to recover controller that hits an error Mohamed Khalfella
2025-12-20 1:21 ` Randy Jennings
2025-11-26 2:12 ` [RFC PATCH 14/14] nvme-fc: Hold inflight requests while in RECOVERING state Mohamed Khalfella
2025-12-20 1:44 ` Randy Jennings
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251225171709.GA8129-mkhalfella@purestorage.com \
--to=mkhalfella@purestorage.com \
--cc=adailey@purestorage.com \
--cc=axboe@kernel.dk \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=jmeneghi@redhat.com \
--cc=kbusch@kernel.org \
--cc=kch@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=randyj@purestorage.com \
--cc=sagi@grimberg.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox