public inbox for linux-nvme@lists.infradead.org
From: Mohamed Khalfella <mkhalfella@purestorage.com>
To: Hannes Reinecke <hare@suse.de>
Cc: Justin Tee <justin.tee@broadcom.com>,
	Naresh Gottumukkala <nareshgottumukkala83@gmail.com>,
	Paul Ely <paul.ely@broadcom.com>,
	Chaitanya Kulkarni <kch@nvidia.com>,
	Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	Keith Busch <kbusch@kernel.org>, Sagi Grimberg <sagi@grimberg.me>,
	James Smart <jsmart833426@gmail.com>,
	Aaron Dailey <adailey@purestorage.com>,
	Randy Jennings <randyj@purestorage.com>,
	Dhaval Giani <dgiani@purestorage.com>,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 09/21] nvme: Implement cross-controller reset completion
Date: Wed, 18 Feb 2026 04:47:08 -0800
Message-ID: <20260218124708.GJ2392949-mkhalfella@purestorage.com>
In-Reply-To: <1a9feff7-5a0d-4385-b582-c9ab60d724db@suse.de>

On Wed 2026-02-18 08:51:31 +0100, Hannes Reinecke wrote:
> On 2/17/26 19:25, Mohamed Khalfella wrote:
> > On Mon 2026-02-16 13:43:51 +0100, Hannes Reinecke wrote:
> [ .. ]
> >>
> >> We really would need some indicator whether 'ccr' is supported at all.
> > 
> > Why do we need this indicator, other than exporting it via sysfs?
> > 
> To avoid false positives.

We will never try CCR on a controller that does not support it. False
positive of what?

> 
> >> Using the number of available CCR commands would be an option, even
> >> though that would require us to keep two counters (one for the number
> >> of possible outstanding CCRs, and one for the number of actual
> >> outstanding CCRs).
> > 
> > As mentioned above, ctrl->ccr_limit gives us the number of CCRs
> > available now. It is not a 100% indicator of whether CCR is supported,
> > but it is enough to implement CCR. A second counter can help us skip
> > trying CCR if we know the impacted controller does not support it.
> > 
> > Do you think it is worth it?
> > 
> Yes. The problem is that we want to get towards TP8028 compliance, which
> forces us to wait for 2*KATO + CQT before requests on the failed path
> can be retried. That will cause a _noticeable_ stall on the application
> side. And the only way to shorten that is CCR; once we get confirmation
> from CCR we can start retrying immediately.
> At the same time the current implementation only waits for 1*KATO before
> retrying, so there will be a regression if we switch to TP8028-compliant
> KATO handling for systems not supporting CCR.

The statement above is not correct. Careful thought and testing went into
not introducing such a regression. If CCR is not supported, then
nvme_find_ctrl_ccr() returns NULL and nvme_fence_ctrl() returns
immediately: no CCR command is sent and there is no wait for an AEN.

What happens next depends on whether ictrl->cqt is supported. If it is
not, which is the case for systems in the field today, requests are
retried immediately. Requests are not held in this case, and no delay is
seen in the failover case.
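
To make the flow above concrete, here is a rough userspace sketch of the
decision, not the code from the patch. Everything prefixed sketch_ is
hypothetical. It assumes that cqt == 0 models "CQT not supported" and that
a nonzero ccr_limit on some peer controller is all that is needed to
attempt CCR. The "hold requests" branch is only inferred from the
paragraph above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct nvme_ctrl. */
struct sketch_ctrl {
	unsigned int ccr_limit;	/* CCR commands this controller can accept now */
	unsigned int cqt;	/* assumption: 0 models "CQT not supported" */
};

enum sketch_action {
	SKETCH_RETRY_NOW,	/* no CCR, no CQT: retry requests immediately */
	SKETCH_HOLD_REQUESTS,	/* no CCR but CQT reported: requests are held */
	SKETCH_TRY_CCR,		/* some peer controller can issue CCR */
};

/* Loosely modeled on nvme_find_ctrl_ccr(): scan peers for CCR capacity. */
static struct sketch_ctrl *sketch_find_ctrl_ccr(struct sketch_ctrl *peers,
						size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (peers[i].ccr_limit > 0)
			return &peers[i];
	return NULL;	/* CCR unsupported or no slots available right now */
}

/* Decide what to do for the impacted controller ictrl. */
static enum sketch_action sketch_fence(const struct sketch_ctrl *ictrl,
				       struct sketch_ctrl *peers, size_t n)
{
	assert(ictrl != NULL);
	if (sketch_find_ctrl_ccr(peers, n))
		return SKETCH_TRY_CCR;
	/* No CCR command is sent and there is no wait for an AEN. */
	return ictrl->cqt ? SKETCH_HOLD_REQUESTS : SKETCH_RETRY_NOW;
}
```

With no CCR-capable peer and cqt == 0, sketch_fence() returns
SKETCH_RETRY_NOW, which corresponds to the no-regression case for systems
in the field today.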

> 
> So we can (and should) use CCR as the determining factor whether we
> want to switch to TP8028-compliant behaviour or stick with the original
> implementation.

We already check CCR support and availability in nvme_find_ctrl_ccr().
A second counter would only spare us the loop in nvme_find_ctrl_ccr(),
which is not worth it IMO.

> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke                  Kernel Storage Architect
> hare@suse.de                                +49 911 74053 688
> SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
> HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
