From: Mohamed Khalfella <mkhalfella@purestorage.com>
To: Hannes Reinecke <hare@suse.de>
Cc: Justin Tee <justin.tee@broadcom.com>,
Naresh Gottumukkala <nareshgottumukkala83@gmail.com>,
Paul Ely <paul.ely@broadcom.com>,
Chaitanya Kulkarni <kch@nvidia.com>,
Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
Keith Busch <kbusch@kernel.org>, Sagi Grimberg <sagi@grimberg.me>,
Aaron Dailey <adailey@purestorage.com>,
Randy Jennings <randyj@purestorage.com>,
Dhaval Giani <dgiani@purestorage.com>,
linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 07/14] nvme: Introduce FENCING and FENCED controller states
Date: Tue, 3 Feb 2026 11:13:31 -0800
Message-ID: <20260203191331.GD3729-mkhalfella@purestorage.com>
In-Reply-To: <fe62af4b-718c-423d-918a-d05acdadb980@suse.de>
On Tue 2026-02-03 06:07:35 +0100, Hannes Reinecke wrote:
> On 1/30/26 23:34, Mohamed Khalfella wrote:
> > FENCING is a new controller state that a LIVE controller enters when an
> > error is encountered. While in the FENCING state, in-flight I/Os that
> > time out are not canceled, because they should be held until either CCR
> > succeeds or time-based recovery completes. Although the queues remain
> > alive, requests are not allowed to be sent in this state, and the
> > controller cannot be reset or deleted. This is intentional, because
> > resetting or deleting the controller results in canceling in-flight I/Os.
> >
> > FENCED is a short-term state the controller enters before it is reset.
> > It exists only to prevent manual resets from happening while the
> > controller is in the FENCING state.
> >
> > Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> > ---
> > drivers/nvme/host/core.c | 25 +++++++++++++++++++++++--
> > drivers/nvme/host/nvme.h | 4 ++++
> > drivers/nvme/host/sysfs.c | 2 ++
> > 3 files changed, 29 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 8961d612ccb0..3e1e02822dd4 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -574,10 +574,29 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
> > break;
> > }
> > break;
> > + case NVME_CTRL_FENCING:
> > + switch (old_state) {
> > + case NVME_CTRL_LIVE:
> > + changed = true;
> > + fallthrough;
> > + default:
> > + break;
> > + }
> > + break;
> > + case NVME_CTRL_FENCED:
> > + switch (old_state) {
> > + case NVME_CTRL_FENCING:
> > + changed = true;
> > + fallthrough;
> > + default:
> > + break;
> > + }
> > + break;
> > case NVME_CTRL_RESETTING:
> > switch (old_state) {
> > case NVME_CTRL_NEW:
> > case NVME_CTRL_LIVE:
> > + case NVME_CTRL_FENCED:
> > changed = true;
> > fallthrough;
> > default:
> > @@ -760,6 +779,7 @@ blk_status_t nvme_fail_nonready_command(struct nvme_ctrl *ctrl,
> >
> > if (state != NVME_CTRL_DELETING_NOIO &&
> > state != NVME_CTRL_DELETING &&
> > + state != NVME_CTRL_FENCING &&
>
> Shouldn't 'FENCED' be in here, too?
Agreed. I will add FENCED in both places, nvme_fail_nonready_command()
and __nvme_check_ready().
>
> > state != NVME_CTRL_DEAD &&
> > !test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) &&
> > !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
> > @@ -802,10 +822,11 @@ bool __nvme_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
> > req->cmd->fabrics.fctype == nvme_fabrics_type_auth_receive))
> > return true;
> > break;
> > - default:
> > - break;
> > + case NVME_CTRL_FENCING:
>
> Similar here.
>
> > case NVME_CTRL_DEAD:
> > return false;
> > + default:
> > + break;
> > }
> > }
> >
> > diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> > index 9dd9f179ad88..00866bbc66f3 100644
> > --- a/drivers/nvme/host/nvme.h
> > +++ b/drivers/nvme/host/nvme.h
> > @@ -251,6 +251,8 @@ static inline u16 nvme_req_qid(struct request *req)
> > enum nvme_ctrl_state {
> > NVME_CTRL_NEW,
> > NVME_CTRL_LIVE,
> > + NVME_CTRL_FENCING,
> > + NVME_CTRL_FENCED,
> > NVME_CTRL_RESETTING,
> > NVME_CTRL_CONNECTING,
> > NVME_CTRL_DELETING,
> > @@ -777,6 +779,8 @@ static inline bool nvme_state_terminal(struct nvme_ctrl *ctrl)
> > switch (nvme_ctrl_state(ctrl)) {
> > case NVME_CTRL_NEW:
> > case NVME_CTRL_LIVE:
> > + case NVME_CTRL_FENCING:
> > + case NVME_CTRL_FENCED:
> > case NVME_CTRL_RESETTING:
> > case NVME_CTRL_CONNECTING:
> > return false;
> > diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
> > index f81bbb6ec768..4ec9dfeb736e 100644
> > --- a/drivers/nvme/host/sysfs.c
> > +++ b/drivers/nvme/host/sysfs.c
> > @@ -443,6 +443,8 @@ static ssize_t nvme_sysfs_show_state(struct device *dev,
> > static const char *const state_name[] = {
> > [NVME_CTRL_NEW] = "new",
> > [NVME_CTRL_LIVE] = "live",
> > + [NVME_CTRL_FENCING] = "fencing",
> > + [NVME_CTRL_FENCED] = "fenced",
> > [NVME_CTRL_RESETTING] = "resetting",
> > [NVME_CTRL_CONNECTING] = "connecting",
> > [NVME_CTRL_DELETING] = "deleting",
>
> You need to modify nvme-tcp.c:nvme_tcp_timeout() too, as this checks
> 'just' for 'LIVE' state and will abort/terminate commands when in
> FENCING. Similar argument for nvme-rdma.c. And nvme-fc.c also needs an
> audit to ensure it works correctly.
Exactly. The changes to nvme-tcp, nvme-rdma, and nvme-fc are in the
transport-specific patches (10 through 14 of this series). For tcp and
rdma, the timeout callback has been modified so that it does not abort
or terminate commands while the controller is in the FENCING state.
For nvme-fc, nvme_fc_start_ioerr_recovery() does nothing if the
controller is in the FENCING state.
Thread overview: 82+ messages
2026-01-30 22:34 [PATCH v2 00/14] TP8028 Rapid Path Failure Recovery Mohamed Khalfella
2026-01-30 22:34 ` [PATCH v2 01/14] nvmet: Rapid Path Failure Recovery set controller identify fields Mohamed Khalfella
2026-02-03 3:03 ` Hannes Reinecke
2026-02-03 18:14 ` Mohamed Khalfella
2026-02-04 0:34 ` Hannes Reinecke
2026-02-07 13:41 ` Sagi Grimberg
2026-02-14 0:42 ` Randy Jennings
2026-02-14 3:56 ` Mohamed Khalfella
2026-01-30 22:34 ` [PATCH v2 02/14] nvmet/debugfs: Add ctrl uniquifier and random values Mohamed Khalfella
2026-02-03 3:04 ` Hannes Reinecke
2026-02-07 13:47 ` Sagi Grimberg
2026-02-11 0:50 ` Randy Jennings
2026-02-11 1:02 ` Mohamed Khalfella
2026-01-30 22:34 ` [PATCH v2 03/14] nvmet: Implement CCR nvme command Mohamed Khalfella
2026-02-03 3:19 ` Hannes Reinecke
2026-02-03 18:40 ` Mohamed Khalfella
2026-02-04 0:38 ` Hannes Reinecke
2026-02-04 0:44 ` Mohamed Khalfella
2026-02-04 0:55 ` Hannes Reinecke
2026-02-04 17:52 ` Mohamed Khalfella
2026-02-07 13:58 ` Sagi Grimberg
2026-02-08 23:10 ` Mohamed Khalfella
2026-02-09 19:27 ` Mohamed Khalfella
2026-02-11 1:34 ` Randy Jennings
2026-02-07 14:11 ` Sagi Grimberg
2026-01-30 22:34 ` [PATCH v2 04/14] nvmet: Implement CCR logpage Mohamed Khalfella
2026-02-03 3:21 ` Hannes Reinecke
2026-02-07 14:11 ` Sagi Grimberg
2026-02-11 1:49 ` Randy Jennings
2026-01-30 22:34 ` [PATCH v2 05/14] nvmet: Send an AEN on CCR completion Mohamed Khalfella
2026-02-03 3:27 ` Hannes Reinecke
2026-02-03 18:48 ` Mohamed Khalfella
2026-02-04 0:43 ` Hannes Reinecke
2026-02-07 14:12 ` Sagi Grimberg
2026-02-11 1:52 ` Randy Jennings
2026-01-30 22:34 ` [PATCH v2 06/14] nvme: Rapid Path Failure Recovery read controller identify fields Mohamed Khalfella
2026-02-03 3:28 ` Hannes Reinecke
2026-02-07 14:13 ` Sagi Grimberg
2026-02-11 1:56 ` Randy Jennings
2026-01-30 22:34 ` [PATCH v2 07/14] nvme: Introduce FENCING and FENCED controller states Mohamed Khalfella
2026-02-03 5:07 ` Hannes Reinecke
2026-02-03 19:13 ` Mohamed Khalfella [this message]
2026-01-30 22:34 ` [PATCH v2 08/14] nvme: Implement cross-controller reset recovery Mohamed Khalfella
2026-02-03 5:19 ` Hannes Reinecke
2026-02-03 20:00 ` Mohamed Khalfella
2026-02-04 1:10 ` Hannes Reinecke
2026-02-04 23:24 ` Mohamed Khalfella
2026-02-11 3:44 ` Randy Jennings
2026-02-11 15:19 ` Hannes Reinecke
2026-02-10 22:09 ` James Smart
2026-02-10 22:27 ` Mohamed Khalfella
2026-02-10 22:49 ` James Smart
2026-02-10 23:25 ` Mohamed Khalfella
2026-02-11 0:12 ` Mohamed Khalfella
2026-02-11 3:33 ` Randy Jennings
2026-01-30 22:34 ` [PATCH v2 09/14] nvme: Implement cross-controller reset completion Mohamed Khalfella
2026-02-03 5:22 ` Hannes Reinecke
2026-02-03 20:07 ` Mohamed Khalfella
2026-01-30 22:34 ` [PATCH v2 10/14] nvme-tcp: Use CCR to recover controller that hits an error Mohamed Khalfella
2026-02-03 5:34 ` Hannes Reinecke
2026-02-03 21:24 ` Mohamed Khalfella
2026-02-04 0:48 ` Randy Jennings
2026-02-04 2:57 ` Hannes Reinecke
2026-02-10 1:39 ` Mohamed Khalfella
2026-01-30 22:34 ` [PATCH v2 11/14] nvme-rdma: " Mohamed Khalfella
2026-02-03 5:35 ` Hannes Reinecke
2026-01-30 22:34 ` [PATCH v2 12/14] nvme-fc: Decouple error recovery from controller reset Mohamed Khalfella
2026-02-03 5:40 ` Hannes Reinecke
2026-02-03 21:29 ` Mohamed Khalfella
2026-02-03 19:19 ` James Smart
2026-02-03 22:49 ` James Smart
2026-02-04 0:15 ` Mohamed Khalfella
2026-02-04 0:11 ` Mohamed Khalfella
2026-02-05 0:08 ` James Smart
2026-02-05 0:59 ` Mohamed Khalfella
2026-02-09 22:53 ` Mohamed Khalfella
2026-01-30 22:34 ` [PATCH v2 13/14] nvme-fc: Use CCR to recover controller that hits an error Mohamed Khalfella
2026-02-03 5:43 ` Hannes Reinecke
2026-02-10 22:12 ` James Smart
2026-02-10 22:20 ` Mohamed Khalfella
2026-02-13 19:29 ` Mohamed Khalfella
2026-01-30 22:34 ` [PATCH v2 14/14] nvme-fc: Hold inflight requests while in FENCING state Mohamed Khalfella