From: Mohamed Khalfella <mkhalfella@purestorage.com>
To: James Smart <jsmart833426@gmail.com>
Cc: Justin Tee <justin.tee@broadcom.com>,
Naresh Gottumukkala <nareshgottumukkala83@gmail.com>,
Paul Ely <paul.ely@broadcom.com>,
Chaitanya Kulkarni <kch@nvidia.com>,
Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
Keith Busch <kbusch@kernel.org>, Sagi Grimberg <sagi@grimberg.me>,
Aaron Dailey <adailey@purestorage.com>,
Randy Jennings <randyj@purestorage.com>,
Dhaval Giani <dgiani@purestorage.com>,
Hannes Reinecke <hare@suse.de>,
linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 08/14] nvme: Implement cross-controller reset recovery
Date: Tue, 10 Feb 2026 16:12:43 -0800
Message-ID: <20260211001243.GS3729-mkhalfella@purestorage.com>
In-Reply-To: <20260210232553.GR3729-mkhalfella@purestorage.com>
On Tue 2026-02-10 15:25:55 -0800, Mohamed Khalfella wrote:
> On Tue 2026-02-10 14:49:15 -0800, James Smart wrote:
> > On 2/10/2026 2:27 PM, Mohamed Khalfella wrote:
> > > On Tue 2026-02-10 14:09:27 -0800, James Smart wrote:
> > >> On 1/30/2026 2:34 PM, Mohamed Khalfella wrote:
> > >> ...
> > >>> +unsigned long nvme_fence_ctrl(struct nvme_ctrl *ictrl)
> > >>> +{
> > >>> + unsigned long deadline, now, timeout;
> > >>> + struct nvme_ctrl *sctrl;
> > >>> + u32 min_cntlid = 0;
> > >>> + int ret;
> > >>> +
> > >>> + timeout = nvme_fence_timeout_ms(ictrl);
> > >>> + dev_info(ictrl->device, "attempting CCR, timeout %lums\n", timeout);
> > >>> +
> > >>> + now = jiffies;
> > >>> + deadline = now + msecs_to_jiffies(timeout);
> > >>> + while (time_before(now, deadline)) {
> > >>
> > >> Q: don't we have something to identify whether the controller's subsystem
> > >> supports CCR before we start selecting controllers and sending CCR?
> > >>
> > >> I would think on older devices that don't support it we should be
> > >> skipping this loop. The loop could push out the Time-Based delay without
> > >> any CCR being sent.
> > >
> > > I do not think we have something that identifies CCR support at the
> > > subsystem level. The spec defines CCRL at the controller level. The loop
> > > should not be that bad. nvme_find_ctrl_ccr() should return NULL if CCR is
> > > not supported and nvme_fence_ctrl() will return immediately.
> > >
> > >>
> > >> -- james
> > >>
> >
> > I would think CCRL on the failed controller would be enough to assume
> > the subsystem supports it.
>
> ictrl->ccr_limit is a good indication that the subsystem supports CCR. I do
> not think it is enough, though, for two reasons:
>
> - Maybe this controller does not support CCR but others on the same
> subsystem do. Nothing prevents a subsystem from putting a cap on
> CCR at the subsystem level.
> - Maybe this controller supports the CCR command but cannot take one now
> because all of its CCR slots are in use. This can happen in the case of
> a cascading failure.
>
> >
> > I'm not worried about the host code being bad. It's more the
> > multiple paths that must have cmds sent to them and getting error
> > responses for unknown cmds (should be responded to ok, but you never
> > know) as well as creating conditions for other errors where there will
> > be no return for it - e.g. other paths losing connectivity while the CCR is
> > outstanding, etc. Yes, they all have to work, but why bother adding
> > these flows to an old controller that would never do CCR?
>
> If nvme_find_ctrl_ccr() returns a source controller to use, then we know
> the controller supports CCR and has an available slot to process
> this CCR request. I do not see how this code would send a CCR request to an
> old controller that does not know about the CCR command.
>
> I am not fully opposed to using ictrl->ccr_limit to return early, though I
> do not see the need for it. If you feel strongly about it I can update
> nvme_fence_ctrl() to do so.
>
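A minimal userspace model of the two options being discussed, assuming a flat array of controllers that all belong to one subsystem. `struct model_ctrl`, `ccr_busy`, `model_find_ccr_source()` and `model_would_send_ccr()` are hypothetical names made up for illustration; only `ccr_limit` mirrors a field named in this thread, and the finder merely approximates the selection rule described for nvme_find_ctrl_ccr(). It is a sketch, not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side model; only ccr_limit mirrors a field named in the thread. */
struct model_ctrl {
	unsigned int ccr_limit; /* mirrors ctrl->ccr_limit; 0 means no CCR support */
	unsigned int ccr_busy;  /* hypothetical: CCR slots currently in use */
};

/* Hypothetical stand-in for nvme_find_ctrl_ccr(): a source must be a
 * different controller that supports CCR and has a free slot. */
static const struct model_ctrl *model_find_ccr_source(const struct model_ctrl *ctrls,
						      size_t n,
						      const struct model_ctrl *ictrl)
{
	assert(ctrls != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct model_ctrl *c = &ctrls[i];

		if (c == ictrl)
			continue; /* never use the failed controller as the source */
		if (c->ccr_limit > 0 && c->ccr_busy < c->ccr_limit)
			return c;
	}
	return NULL;
}

/* Would the fence path send at least one CCR?  With early_return set,
 * the failed controller's own ccr_limit gates the whole attempt. */
static bool model_would_send_ccr(const struct model_ctrl *ctrls, size_t n,
				 const struct model_ctrl *ictrl, bool early_return)
{
	if (early_return && ictrl->ccr_limit == 0)
		return false;
	/* Without a usable source the loop returns before sending anything. */
	return model_find_ccr_source(ctrls, n, ictrl) != NULL;
}
```

In this model, a subsystem where every controller reports a zero limit sends no CCR with or without the early return. The early return only changes the outcome when the failed controller lacks CCR but a sibling supports it, and it does not help when CCR is supported but every sibling slot is busy.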
I forgot to mention that ctrl->ccr_limit is initialized from id->ccrl in
nvme_init_identify(). If this value is greater than zero, then we know
the controller supports CCR. nvme_find_ctrl_ccr() checks for that,
so the returned source controller must support CCR and have a slot
available for it.
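A sketch of the slot accounting implied above, as a hypothetical userspace model rather than the patch's code. `model_init_identify()` copies CCRL into `ccr_limit` the way the message describes for nvme_init_identify(), and `model_get_ccr_source()` / `model_put_ccr_source()` reserve and release a slot. The reservation counter `ccr_active`, the `struct model_id_ctrl` type, and every `model_*` name are assumptions made for illustration; the real driver may track slots differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of Identify Controller data; only CCRL matters here. */
struct model_id_ctrl {
	unsigned int ccrl; /* CCR limit reported by the controller */
};

/* Hypothetical host-side controller model. */
struct model_ctrl {
	unsigned int ccr_limit;  /* mirrors ctrl->ccr_limit */
	unsigned int ccr_active; /* hypothetical: slots currently reserved */
};

/* Models the step described above: ccr_limit comes straight from id->ccrl. */
static void model_init_identify(struct model_ctrl *ctrl,
				const struct model_id_ctrl *id)
{
	ctrl->ccr_limit = id->ccrl;
	ctrl->ccr_active = 0;
}

/* Hypothetical find-and-reserve: return a controller other than ictrl that
 * supports CCR (ccr_limit > 0) and has a free slot, reserving that slot. */
static struct model_ctrl *model_get_ccr_source(struct model_ctrl *ctrls, size_t n,
					       const struct model_ctrl *ictrl)
{
	for (size_t i = 0; i < n; i++) {
		struct model_ctrl *c = &ctrls[i];

		if (c == ictrl || c->ccr_limit == 0)
			continue; /* failed controller, or no CCR support */
		if (c->ccr_active >= c->ccr_limit)
			continue; /* all CCR slots in use */
		c->ccr_active++;
		return c;
	}
	return NULL;
}

/* Hypothetical release once the CCR completes or times out. */
static void model_put_ccr_source(struct model_ctrl *src)
{
	assert(src->ccr_active > 0);
	src->ccr_active--;
}
```

With a single sibling reporting a CCRL of 1, a second failure while the first CCR is still outstanding finds no source and would fall back without sending a CCR, which is the cascading-failure case raised earlier in the thread.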