From: Mohamed Khalfella <mkhalfella@purestorage.com>
To: Justin Tee <justin.tee@broadcom.com>,
	Naresh Gottumukkala <nareshgottumukkala83@gmail.com>,
	Paul Ely <paul.ely@broadcom.com>,
	Chaitanya Kulkarni <kch@nvidia.com>, Jens Axboe <axboe@kernel.dk>,
	Keith Busch <kbusch@kernel.org>, Sagi Grimberg <sagi@grimberg.me>,
	James Smart <jsmart833426@gmail.com>,
	Hannes Reinecke <hare@suse.de>
Cc: Aaron Dailey <adailey@purestorage.com>,
	Randy Jennings <randyj@purestorage.com>,
	Dhaval Giani <dgiani@purestorage.com>,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 00/15] TP8028 Rapid Path Failure Recovery
Date: Tue, 12 May 2026 14:40:40 -0700
Message-ID: <20260512214040.GI10532-mkhalfella@purestorage.com>
In-Reply-To: <20260328004518.1729186-1-mkhalfella@purestorage.com>

On Fri 2026-03-27 17:43:31 -0700, Mohamed Khalfella wrote:
> This patchset adds support for TP8028 Rapid Path Failure Recovery for
> both nvme target and initiator. Rapid Path Failure Recovery brings
> Cross-Controller Reset (CCR) functionality to nvme. This allows nvme
> host to send an nvme command to a source nvme controller to reset
> the impacted nvme controller, provided that both source and impacted
> controllers are in the same nvme subsystem.
> 
> The main use of CCR is when one path to the nvme subsystem fails.
> Inflight IOs on impacted nvme controller need to be terminated first
> before they can be retried on another path. Otherwise data corruption
> may happen. CCR provides a quick way to terminate these IOs on the
> unreachable nvme controller, so recovery can proceed without
> unnecessary delays. If CCR is not possible, inflight requests are held
> for the duration defined by TP4129 KATO Corrections and Clarifications
> before they are allowed to be retried.
> 
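The choice described in the paragraph above can be sketched as follows. This is only an illustration: plan_recovery(), struct recovery_plan, and the enum names are hypothetical and do not appear in the patches. The fallback hold time is taken as an input because the TP4129 calculation is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of a recovery attempt; not a kernel type. */
enum recovery_action {
	RETRY_NOW,       /* CCR reset the impacted controller */
	HOLD_THEN_RETRY, /* CCR not possible or failed: wait first */
};

struct recovery_plan {
	enum recovery_action action;
	uint32_t hold_ms; /* time to hold inflight requests, 0 if none */
};

/*
 * ccr_succeeded:    true if another controller in the same subsystem
 *                   completed the cross-controller reset.
 * fallback_hold_ms: hold time the caller derived from the TP4129 rules
 *                   (assumed input, not computed here).
 */
static struct recovery_plan plan_recovery(bool ccr_succeeded,
					  uint32_t fallback_hold_ms)
{
	struct recovery_plan plan;

	if (ccr_succeeded) {
		/* Inflight IOs were terminated, safe to retry elsewhere. */
		plan.action = RETRY_NOW;
		plan.hold_ms = 0;
	} else {
		/* Cannot prove termination, so hold before retrying. */
		plan.action = HOLD_THEN_RETRY;
		plan.hold_ms = fallback_hold_ms;
	}
	return plan;
}
```

In both branches the requests are retried on another path only once the impacted controller can no longer complete them.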
> 
> On the target side:
> 
> * New struct members have been added to support CCR. struct nvme_id_ctrl
>   has been updated with CIU (Controller Instance Uniquifier), CIRN
>   (Controller Instance Random Number), and CQT (Command Quiesce Time).
>   The combination of CIU, CNTLID, and CIRN is used to identify impacted
>   controller in CCR command.
> 
> * CCR nvme command implemented on the target causes impacted controller
>   to fail and drop connections to host.
> 
> * CCR logpage contains the status of pending CCR requests. An entry is
>   added to the logpage after CCR request is validated. Completed CCR
>   requests are removed from the logpage when controller becomes ready or
>   when requested in get logpage command.
> 
> * An AEN is sent when CCR completes to let the host know that it is safe
>   to retry inflight requests.
> 
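As a rough illustration of how the (CIU, CNTLID, CIRN) combination identifies the impacted controller, here is a minimal sketch. struct ctrl_ident and find_impacted() are hypothetical names, and the field widths are assumptions; this is not the nvmet code from the series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for a target-side controller; not an nvmet struct. */
struct ctrl_ident {
	uint16_t cntlid; /* controller ID */
	uint8_t  ciu;    /* Controller Instance Uniquifier (width assumed) */
	uint64_t cirn;   /* Controller Instance Random Number (width assumed) */
};

/*
 * Return the controller whose CNTLID, CIU and CIRN all match the values
 * carried in the request, or NULL if none matches. Requiring all three
 * means a request naming an older instance of the same CNTLID is rejected.
 */
static const struct ctrl_ident *find_impacted(const struct ctrl_ident *ctrls,
					      size_t n,
					      const struct ctrl_ident *want)
{
	for (size_t i = 0; i < n; i++) {
		if (ctrls[i].cntlid == want->cntlid &&
		    ctrls[i].ciu == want->ciu &&
		    ctrls[i].cirn == want->cirn)
			return &ctrls[i];
	}
	return NULL;
}
```

Matching on CNTLID alone would not be enough in this sketch, because a controller ID can be reused by a new instance after a reconnect.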
> 
> On the host side:
> 
> * CIU, CIRN, and CQT have been added to struct nvme_ctrl. CIU and CIRN
>   have been added to sysfs to make the values visible to the user.
>   CIU and CIRN can be used to construct and manually send admin-passthru
>   CCR commands.
> 
> * New controller states FENCING and FENCED have been added to make sure
>   that inflight requests do not get canceled if they time out during
>   the fencing process. FENCED exists so that the controller state
>   machine does not have a transition from FENCING to RESETTING. Instead,
>   the path is FENCING -> FENCED -> RESETTING. This prevents a controller
>   that is being fenced from getting reset. The impacted controller is
>   reset only after fencing finishes.
> 
> * Controller recovery in nvme_fence_ctrl() is invoked when LIVE
>   controller hits an error or when a request times out. CCR is attempted
>   first to reset impacted controller. If it fails then inflight requests
>   are held until it is safe to retry them.
> 
> * Updated nvme fabric transports nvme-tcp, nvme-rdma, and nvme-fc to
>   use CCR recovery.
> 
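The FENCING -> FENCED -> RESETTING ordering described above can be sketched as a small transition check. This is a reduced model with hypothetical names (enum ctrl_state, transition_allowed()); it covers only the four states mentioned here and is not the state machine code in the host driver.

```c
#include <stdbool.h>

/* Reduced set of controller states, for illustration only. */
enum ctrl_state { ST_LIVE, ST_FENCING, ST_FENCED, ST_RESETTING };

/*
 * Return true if moving from 'from' to 'to' is allowed in this model.
 * There is deliberately no FENCING -> RESETTING edge: a controller that
 * is still being fenced must reach FENCED before it can be reset.
 */
static bool transition_allowed(enum ctrl_state from, enum ctrl_state to)
{
	switch (to) {
	case ST_FENCING:
		/* Fencing starts when a LIVE controller hits an error. */
		return from == ST_LIVE;
	case ST_FENCED:
		/* Reached only once fencing has finished. */
		return from == ST_FENCING;
	case ST_RESETTING:
		/* LIVE -> RESETTING models a manual reset (assumed here). */
		return from == ST_LIVE || from == ST_FENCED;
	default:
		return false;
	}
}
```

With this table, a reset requested while the controller is in FENCING is refused, which is what keeps inflight requests from being canceled early.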
> 
> Ideally all inflight requests should be held during controller recovery
> and only retried after recovery is done. However, there are known
> situations where that is not the case in this implementation. These gaps
> will be addressed in future patches:
> 
> * Manual controller reset from sysfs will put the controller in the
>   RESETTING state; all inflight requests are canceled immediately and
>   may be retried on another path.
> 
> * Manual controller delete from sysfs will also cause all inflight
>   requests to be canceled immediately; they may be retried on another
>   path.
> 
> * In nvme-fc, nvme controller will be deleted if remote port disappears
>   with no timeout specified. This results in immediate cancellation of
>   requests that may be retried on another path.
> 
> * In nvme-rdma, if the HCA is removed, all nvme controllers will be
>   deleted. This cancels inflight IOs, which may then be retried on
>   another path.
> 
> 
> Changes from v3:
> - nvmet: Implement CCR nvme command
>   - Fixed a bug in the order of members of struct nvme_cross_ctrl_reset_cmd
>   - Use kmalloc_obj() instead of kmalloc()
> 
> - nvme: Implement cross-controller reset recovery
>   - Now that CQT has been removed, updated nvme_fence_ctrl() to return
>     success or failure instead of remaining time.
>   - Updated nvme_issue_wait_ccr() to respect deadline set in
>     nvme_fence_ctrl().

v4 dropped the CQT patches in order to focus on CCR. However, I have come
to the understanding that we need to bring the CQT patches back. The plan
is for v5 to be similar to v3, plus the minor fixes that came in v4.

Sagi - Does this sound good to you?

> 
> - nvme-tcp: Use CCR to recover controller that hits an error
> - nvme-rdma: Use CCR to recover controller that hits an error
>   - Updated log nvme_fence_ctrl() return value
> 
> - nvme-fc: Refactor IO error recovery
>   - Updated the commit message
>   - Updated nvme_fc_start_ioerr_recovery() to handle
>     CONNECTING case first.
> 
> - nvme-fc: Use CCR to recover controller that hits an error
>   - Updated log nvme_fence_ctrl() return value
> 
> - nvmet: Add support for CQT to nvme target
> - nvme: Add support for CQT to nvme host
> - nvme: Update CCR completion wait timeout to consider CQT
> - nvme-tcp: Extend FENCING state per TP4129 on CCR failure
> - nvme-rdma: Extend FENCING state per TP4129 on CCR failure
> - nvme-fc: Extend FENCING state per TP4129 on CCR failure
>   - Dropped CQT patches
> 
> 
> v3: https://lore.kernel.org/all/20260214042753.4073668-1-mkhalfella@purestorage.com/
> 
> 
> Mohamed Khalfella (15):
>   nvmet: Rapid Path Failure Recovery set controller identify fields
>   nvmet/debugfs: Export controller CIU and CIRN via debugfs
>   nvmet: Implement CCR nvme command
>   nvmet: Implement CCR logpage
>   nvmet: Send an AEN on CCR completion
>   nvme: Rapid Path Failure Recovery read controller identify fields
>   nvme: Introduce FENCING and FENCED controller states
>   nvme: Implement cross-controller reset recovery
>   nvme: Implement cross-controller reset completion
>   nvme-tcp: Use CCR to recover controller that hits an error
>   nvme-rdma: Use CCR to recover controller that hits an error
>   nvme-fc: Refactor IO error recovery
>   nvme-fc: Use CCR to recover controller that hits an error
>   nvme-fc: Hold inflight requests while in FENCING state
>   nvme-fc: Do not cancel requests in io taget before it is initialized
> 
>  drivers/nvme/host/constants.c   |   1 +
>  drivers/nvme/host/core.c        | 225 +++++++++++++++++++++++++++++++-
>  drivers/nvme/host/fc.c          | 215 +++++++++++++++++++++---------
>  drivers/nvme/host/nvme.h        |  24 ++++
>  drivers/nvme/host/rdma.c        |  30 ++++-
>  drivers/nvme/host/sysfs.c       |  25 ++++
>  drivers/nvme/host/tcp.c         |  30 ++++-
>  drivers/nvme/target/admin-cmd.c | 123 +++++++++++++++++
>  drivers/nvme/target/core.c      | 110 +++++++++++++++-
>  drivers/nvme/target/debugfs.c   |  21 +++
>  drivers/nvme/target/nvmet.h     |  18 ++-
>  include/linux/nvme.h            |  65 ++++++++-
>  12 files changed, 812 insertions(+), 75 deletions(-)
> 
> 
> base-commit: dd09eb443372f9390d36051d86ebe06e9919aeec
> -- 
> 2.52.0
> 

