Re: [PATCH v4 00/15] TP8028 Rapid Path Failure Recovery

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

* Re: [PATCH v4 00/15] TP8028 Rapid Path Failure Recovery
       [not found] <20260328004518.1729186-1-mkhalfella@purestorage.com>
@ 2026-05-12 21:40 ` Mohamed Khalfella
  2026-05-12 22:02   ` Sagi Grimberg
       [not found] ` <20260328004518.1729186-3-mkhalfella@purestorage.com>
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 12+ messages in thread
From: Mohamed Khalfella @ 2026-05-12 21:40 UTC (permalink / raw)
  To: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
	Jens Axboe, Keith Busch, Sagi Grimberg, James Smart,
	Hannes Reinecke
  Cc: Aaron Dailey, Randy Jennings, Dhaval Giani, linux-nvme,
	linux-kernel

On Fri 2026-03-27 17:43:31 -0700, Mohamed Khalfella wrote:
> This patchset adds support for TP8028 Rapid Path Failure Recovery for
> both nvme target and initiator. Rapid Path Failure Recovery brings
> Cross-Controller Reset (CCR) functionality to nvme. This allows nvme
> host to send an nvme command to a source nvme controller to reset
> the impacted nvme controller, provided that both source and impacted
> controllers are in the same nvme subsystem.
> 
> The main use of CCR is when one path to the nvme subsystem fails.
> Inflight IOs on impacted nvme controller need to be terminated first
> before they can be retried on another path. Otherwise data corruption
> may happen. CCR provides a quick way to terminate these IOs on the
> unreachable nvme controller allowing recovery to move quickly avoiding
> unnecessary delays. In case of CCR is not possible, inflight requests
> are held for duration defined by TP4129 KATO Corrections and
> Clarifications before they are allowed to be retried.
> 
> 
> On the target side:
> 
> * New struct members have been added to support CCR. struct nvme_id_ctrl
>   has been updated with CIU (Controller Instance Uniquifier), CIRN
>   (Controller Instance Random Number), and CQT (Command Quiesce Time).
>   The combination of CIU, CNTLID, and CIRN is used to identify impacted
>   controller in CCR command.
> 
> * CCR nvme command implemented on the target causes impacted controller
>   to fail and drop connections to host.
> 
> * CCR logpage contains the status of pending CCR requests. An entry is
>   added to the logpage after CCR request is validated. Completed CCR
>   requests are removed from the logpage when controller becomes ready or
>   when requested in get logpage command.
> 
> * An AEN is sent when CCR completes to let the host know that it is safe
>   to retry inflight requests.
> 
> 
> On the host side:
> 
> * CIU, CIRN, and CQT have been added to struct nvme_ctrl. CIU and CIRN
>   have been added to sysfs to make the values visible to the user.
>   CIU and CIRN can be used to construct and manually send admin-passthru
>   CCR commands.
> 
> * New controller states FENCING and FENCED have been added to make sure
>   that inflight request do not get canceled if they timeout during
>   fencing process. FENCED exists so that controller state machine does
>   not have a transition from FENCING to RESETTING. Instead FENCING ->
>   FENCED -> RESETTING. This prevents a controller being fenced from
>   getting reset. Only after fencing finishes the impacted controller is
>   reset.
> 
> * Controller recovery in nvme_fence_ctrl() is invoked when LIVE
>   controller hits an error or when a request times out. CCR is attempted
>   first to reset impacted controller. If it fails then inflight requests
>   are held until it is safe to retry them.
> 
> * Updated nvme fabric transports nvme-tcp, nvme-rdma, and nvme-fc to
>   use CCR recovery.
> 
> 
> Ideally all inflight requests should be held during controller recovery
> and only retried after recovery is done. However, there are known
> situations where that is not the case in this implementation. These gaps
> will be addressed in future patches:
> 
> * Manual controller reset from sysfs will result in controller going to
>   RESETTING state and all inflight requests to be canceled immediately
>   and may be retried on another path.
> 
> * Manual controller delete from sysfs will also result in all inflight
>   requests to be canceled immediately and may be retried on another path.
> 
> * In nvme-fc, nvme controller will be deleted if remote port disappears
>   with no timeout specified. This results in immediate cancellation of
>   requests that may be retried on another path.
> 
> * In nvme-rdma if HCA is removed all nvme controllers will be deleted.
>   This results in canceling inflight IOs and may be they will be retried
>   on another path.
> 
> 
> Changes from v3:
> - nvmet: Implement CCR nvme command
>   - Fixed a bug in the order of members of struct nvme_cross_ctrl_reset_cmd
>   - Use kmalloc_obj() instead of kmalloc()
> 
> - nvme: Implement cross-controller reset recovery
>   - Now CQT has been removed updated nvme_fence_ctrl() to return
>     success or failure instead of remaining time.
>   - Updated nvme_issue_wait_ccr() to respect deadline set in
>     nvme_fence_ctrl().

v4 dropped CQT patches in order to focus on CCR. However, I came to the
understanding that we need to bring CQT patches back. The plan for v5 is
to be similar to v3 plus minor fixes came in v4.

Sagi - Does this sound good to you?

> 
> - nvme-tcp: Use CCR to recover controller that hits an error
> - nvme-rdma: Use CCR to recover controller that hits an error
>   - Updated log nvme_fence_ctrl() return value
> 
> - nvme-fc: Refactor IO error recovery
>   - Updated the commit message
>   - Updated nvme_fc_start_ioerr_recovery() to handle
>     CONNECTING case first.
> 
> - nvme-fc: Use CCR to recover controller that hits an error
>   - Updated log nvme_fence_ctrl() return value
> 
> - nvmet: Add support for CQT to nvme target
> - nvme: Add support for CQT to nvme host
> - nvme: Update CCR completion wait timeout to consider CQT
> - nvme-tcp: Extend FENCING state per TP4129 on CCR failure
> - nvme-rdma: Extend FENCING state per TP4129 on CCR failure
> - nvme-fc: Extend FENCING state per TP4129 on CCR failure
>   - Dropped CQT patches
> 
> 
> v3: https://lore.kernel.org/all/20260214042753.4073668-1-mkhalfella@purestorage.com/
> 
> *** BLURB HERE ***
> 
> 
> Mohamed Khalfella (15):
>   nvmet: Rapid Path Failure Recovery set controller identify fields
>   nvmet/debugfs: Export controller CIU and CIRN via debugfs
>   nvmet: Implement CCR nvme command
>   nvmet: Implement CCR logpage
>   nvmet: Send an AEN on CCR completion
>   nvme: Rapid Path Failure Recovery read controller identify fields
>   nvme: Introduce FENCING and FENCED controller states
>   nvme: Implement cross-controller reset recovery
>   nvme: Implement cross-controller reset completion
>   nvme-tcp: Use CCR to recover controller that hits an error
>   nvme-rdma: Use CCR to recover controller that hits an error
>   nvme-fc: Refactor IO error recovery
>   nvme-fc: Use CCR to recover controller that hits an error
>   nvme-fc: Hold inflight requests while in FENCING state
>   nvme-fc: Do not cancel requests in io taget before it is initialized
> 
>  drivers/nvme/host/constants.c   |   1 +
>  drivers/nvme/host/core.c        | 225 +++++++++++++++++++++++++++++++-
>  drivers/nvme/host/fc.c          | 215 +++++++++++++++++++++---------
>  drivers/nvme/host/nvme.h        |  24 ++++
>  drivers/nvme/host/rdma.c        |  30 ++++-
>  drivers/nvme/host/sysfs.c       |  25 ++++
>  drivers/nvme/host/tcp.c         |  30 ++++-
>  drivers/nvme/target/admin-cmd.c | 123 +++++++++++++++++
>  drivers/nvme/target/core.c      | 110 +++++++++++++++-
>  drivers/nvme/target/debugfs.c   |  21 +++
>  drivers/nvme/target/nvmet.h     |  18 ++-
>  include/linux/nvme.h            |  65 ++++++++-
>  12 files changed, 812 insertions(+), 75 deletions(-)
> 
> 
> base-commit: dd09eb443372f9390d36051d86ebe06e9919aeec
> -- 
> 2.52.0
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 00/15] TP8028 Rapid Path Failure Recovery
  2026-05-12 21:40 ` [PATCH v4 00/15] TP8028 Rapid Path Failure Recovery Mohamed Khalfella
@ 2026-05-12 22:02   ` Sagi Grimberg
  0 siblings, 0 replies; 12+ messages in thread
From: Sagi Grimberg @ 2026-05-12 22:02 UTC (permalink / raw)
  To: Mohamed Khalfella, Justin Tee, Naresh Gottumukkala, Paul Ely,
	Chaitanya Kulkarni, Jens Axboe, Keith Busch, James Smart,
	Hannes Reinecke
  Cc: Aaron Dailey, Randy Jennings, Dhaval Giani, linux-nvme,
	linux-kernel

> v4 dropped CQT patches in order to focus on CCR. However, I came to the
> understanding that we need to bring CQT patches back. The plan for v5 is
> to be similar to v3 plus minor fixes came in v4.
>
> Sagi - Does this sound good to you?

Yes

^ permalink raw reply	[flat|nested] 12+ messages in thread

[parent not found: <20260328004518.1729186-3-mkhalfella@purestorage.com>]

* Re: [PATCH v4 02/15] nvmet/debugfs: Export controller CIU and CIRN via debugfs
       [not found] ` <20260328004518.1729186-3-mkhalfella@purestorage.com>
@ 2026-05-14 23:42   ` Randy Jennings
  0 siblings, 0 replies; 12+ messages in thread
From: Randy Jennings @ 2026-05-14 23:42 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
	Jens Axboe, Keith Busch, Sagi Grimberg, James Smart,
	Hannes Reinecke, Aaron Dailey, Dhaval Giani, linux-nvme,
	linux-kernel

On Fri, Mar 27, 2026 at 5:45 PM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> Export ctrl->ciu and ctrl->cirn as debugfs files under controller
> debugfs directory.
>
> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

Reviewed-by: Randy Jennings <randyj@purestorage.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

[parent not found: <20260328004518.1729186-4-mkhalfella@purestorage.com>]

* Re: [PATCH v4 03/15] nvmet: Implement CCR nvme command
       [not found] ` <20260328004518.1729186-4-mkhalfella@purestorage.com>
@ 2026-05-15  0:18   ` Randy Jennings
  0 siblings, 0 replies; 12+ messages in thread
From: Randy Jennings @ 2026-05-15  0:18 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
	Jens Axboe, Keith Busch, Sagi Grimberg, James Smart,
	Hannes Reinecke, Aaron Dailey, Dhaval Giani, linux-nvme,
	linux-kernel

On Fri, Mar 27, 2026 at 5:45 PM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> Defined by TP8028 Rapid Path Failure Recovery, CCR (Cross-Controller
> Reset) command is an nvme command issued to source controller by
> initiator to reset impacted controller. Implement CCR command for linux
> nvme target.
>
> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> ---

> +       mutex_lock(&sctrl->lock);
> +       list_for_each_entry(ccr, &sctrl->ccr_list, entry) {
> +               if (ccr->ctrl == ictrl) {
> +                       status = NVME_SC_CCR_IN_PROGRESS | NVME_STATUS_DNR;
> +                       goto out_unlock;
> +               }
> +
> +               ccr_total++;
> +               if (ccr->ctrl)
> +                       ccr_active++;
> +       }
> +
> +       if (ccr_active >= NVMF_CCR_LIMIT) {
> +               status = NVME_SC_CCR_LIMIT_EXCEEDED;
> +               goto out_unlock;
> +       }
> +       if (ccr_total >= NVMF_CCR_PER_PAGE) {
> +               status = NVME_SC_CCR_LOGPAGE_FULL;
> +               goto out_unlock;
> +       }
> +
> +       new_ccr = kmalloc_obj(*new_ccr, GFP_KERNEL);
This allocation could be done optimistically outside of the mutex.
But that would lead to more complicated code; probably not worth it
here.

Reviewed-by: Randy Jennings <randyj@purestorage.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

[parent not found: <20260328004518.1729186-5-mkhalfella@purestorage.com>]

* Re: [PATCH v4 04/15] nvmet: Implement CCR logpage
       [not found] ` <20260328004518.1729186-5-mkhalfella@purestorage.com>
@ 2026-05-15  0:38   ` Randy Jennings
  0 siblings, 0 replies; 12+ messages in thread
From: Randy Jennings @ 2026-05-15  0:38 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
	Jens Axboe, Keith Busch, Sagi Grimberg, James Smart,
	Hannes Reinecke, Aaron Dailey, Dhaval Giani, linux-nvme,
	linux-kernel

On Fri, Mar 27, 2026 at 5:45 PM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> Defined by TP8028 Rapid Path Failure Recovery, CCR (Cross-Controller
> Reset) log page contains an entry for each CCR request submitted to
> source controller. Implement CCR logpage for nvme linux target.

> +/* NVMe Cross-Controller Reset Status */
> +enum {
> +       NVME_CCR_STATUS_IN_PROGRESS,
> +       NVME_CCR_STATUS_SUCCESS,
> +       NVME_CCR_STATUS_FAILED,
> +};
> +
Looking at the rest of the code, all the enums are defined except
/* NVMe Namespace Write Protect State */
which does define the value of the first entry (0).

I think it would be prefereable to add explicit values here (0, 1, 2) even
though the implicit values should be correct.

Sincerely,
Randy Jennings

^ permalink raw reply	[flat|nested] 12+ messages in thread

[parent not found: <20260328004518.1729186-6-mkhalfella@purestorage.com>]

* Re: [PATCH v4 05/15] nvmet: Send an AEN on CCR completion
       [not found] ` <20260328004518.1729186-6-mkhalfella@purestorage.com>
@ 2026-05-15  0:50   ` Randy Jennings
  0 siblings, 0 replies; 12+ messages in thread
From: Randy Jennings @ 2026-05-15  0:50 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
	Jens Axboe, Keith Busch, Sagi Grimberg, James Smart,
	Hannes Reinecke, Aaron Dailey, Dhaval Giani, linux-nvme,
	linux-kernel

On Fri, Mar 27, 2026 at 5:45 PM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> Send an AEN to initiator when impacted controller exists. The
> notification points to CCR log page that initiator can read to check
> which CCR operation completed.
>
> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> +static void nvmet_add_async_event_locked(struct nvmet_ctrl *ctrl, u8 event_type,
>                 u8 event_info, u8 log_page)
>  {
>         struct nvmet_async_event *aen;
> @@ -218,13 +218,19 @@ void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
>         aen->event_info = event_info;
>         aen->log_page = log_page;
Please add
lockdep_assert_held(&ctrl->lock);
Looking at the rest of the code, this should go directly under the
local variable definitions.
>
> -       mutex_lock(&ctrl->lock);
>         list_add_tail(&aen->entry, &ctrl->async_events);
> -       mutex_unlock(&ctrl->lock);

Sincerely,
Randy Jennings

^ permalink raw reply	[flat|nested] 12+ messages in thread

[parent not found: <20260328004518.1729186-7-mkhalfella@purestorage.com>]

* Re: [PATCH v4 06/15] nvme: Rapid Path Failure Recovery read controller identify fields
       [not found] ` <20260328004518.1729186-7-mkhalfella@purestorage.com>
@ 2026-05-15  2:03   ` Randy Jennings
  0 siblings, 0 replies; 12+ messages in thread
From: Randy Jennings @ 2026-05-15  2:03 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
	Jens Axboe, Keith Busch, Sagi Grimberg, James Smart,
	Hannes Reinecke, Aaron Dailey, Dhaval Giani, linux-nvme,
	linux-kernel

On Fri, Mar 27, 2026 at 5:45 PM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> TP8028 Rapid path failure added new fileds to controller identify
> response. Read CIU (Controller Instance Uniquifier), CIRN (Controller
> Instance Random Number), and CCRL (Cross-Controller Reset Limit) from
> controller identify response. Expose CIU and CIRN as sysfs attributes
> so the values can be used directrly by user if needed.
>
> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

Reviewed-by: Randy Jennings <randyj@purestorage.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

[parent not found: <20260328004518.1729186-8-mkhalfella@purestorage.com>]

* Re: [PATCH v4 07/15] nvme: Introduce FENCING and FENCED controller states
       [not found] ` <20260328004518.1729186-8-mkhalfella@purestorage.com>
@ 2026-05-15  2:06   ` Randy Jennings
  0 siblings, 0 replies; 12+ messages in thread
From: Randy Jennings @ 2026-05-15  2:06 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
	Jens Axboe, Keith Busch, Sagi Grimberg, James Smart,
	Hannes Reinecke, Aaron Dailey, Dhaval Giani, linux-nvme,
	linux-kernel

On Fri, Mar 27, 2026 at 5:46 PM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> FENCING is a new controller state that a LIVE controller enters when an
> error is encountered. While in FENCING state, inflight IOs that timeout
> are not canceled because they should be held until either CCR succeeds
> or time-based recovery completes. While the queues remain alive, requests
> are not allowed to be sent in this state, and the controller cannot be
> reset or deleted. This is intentional because resetting or deleting the
> controller results in canceling inflight IOs.
>
> FENCED is a short-term state the controller enters before it is reset.
> It exists only to prevent manual resets from happening while controller
> is in FENCING state.
>
> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>

Reviewed-by: Randy Jennings <randyj@purestorage.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

[parent not found: <20260328004518.1729186-2-mkhalfella@purestorage.com>]

* Re: [PATCH v4 01/15] nvmet: Rapid Path Failure Recovery set controller identify fields
       [not found] ` <20260328004518.1729186-2-mkhalfella@purestorage.com>
@ 2026-05-15  2:08   ` Randy Jennings
  0 siblings, 0 replies; 12+ messages in thread
From: Randy Jennings @ 2026-05-15  2:08 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
	Jens Axboe, Keith Busch, Sagi Grimberg, James Smart,
	Hannes Reinecke, Aaron Dailey, Dhaval Giani, linux-nvme,
	linux-kernel

On Fri, Mar 27, 2026 at 5:45 PM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> TP8028 Rapid Path Failure Recovery defined new fields in controller
> identify response. The newly defined fields are:
>
> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>

Reviewed-by: Randy Jennings <randyj@purestorage.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

[parent not found: <20260328004518.1729186-9-mkhalfella@purestorage.com>]

* Re: [PATCH v4 08/15] nvme: Implement cross-controller reset recovery
       [not found] ` <20260328004518.1729186-9-mkhalfella@purestorage.com>
@ 2026-05-15  2:32   ` Randy Jennings
  0 siblings, 0 replies; 12+ messages in thread
From: Randy Jennings @ 2026-05-15  2:32 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
	Jens Axboe, Keith Busch, Sagi Grimberg, James Smart,
	Hannes Reinecke, Aaron Dailey, Dhaval Giani, linux-nvme,
	linux-kernel

On Fri, Mar 27, 2026 at 5:46 PM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> A host that has more than one path connecting to an nvme subsystem
> typically has an nvme controller associated with every path. This is
> mostly applicable to nvmeof. If one path goes down, inflight IOs on that
> path should not be retried immediately on another path because this
> could lead to data corruption as described in TP4129. TP8028 defines
> cross-controller reset mechanism that can be used by host to terminate
> IOs on the failed path using one of the remaining healthy paths. Only
> after IOs are terminated, or long enough time passes as defined by
> TP4129, inflight IOs should be retried on another path. Implement core
> cross-controller reset shared logic to be used by the transports.
>
> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>

> +       now = jiffies;
> +       if (time_before(now, deadline))
> +               tmo = min_t(unsigned long,
> +                           secs_to_jiffies(ictrl->kato), deadline - now);
At LSF last week, we talked about skipping logic here and just using
the rest of the time (deadline - now).

The arguments for shortening the time were dubious, so simpler logic
seemed better.

> +
> +       if (!wait_for_completion_timeout(&ccr.complete, tmo)) {
> +               ret = -ETIMEDOUT;
> +               goto out;
> +       }

Sincerely,
Randy Jennings

^ permalink raw reply	[flat|nested] 12+ messages in thread

[parent not found: <20260328004518.1729186-10-mkhalfella@purestorage.com>]

* Re: [PATCH v4 09/15] nvme: Implement cross-controller reset completion
       [not found] ` <20260328004518.1729186-10-mkhalfella@purestorage.com>
@ 2026-05-15  2:47   ` Randy Jennings
       [not found]   ` <73a9c0e2-ecd0-4170-8723-259529617ec0@suse.de>
  1 sibling, 0 replies; 12+ messages in thread
From: Randy Jennings @ 2026-05-15  2:47 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
	Jens Axboe, Keith Busch, Sagi Grimberg, James Smart,
	Hannes Reinecke, Aaron Dailey, Dhaval Giani, linux-nvme,
	linux-kernel

On Fri, Mar 27, 2026 at 5:46 PM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> An nvme source controller that issues CCR command expects to receive an
> NVME_AER_NOTICE_CCR_COMPLETED when pending CCR succeeds or fails. Add
> sctrl->ccr_work to read NVME_LOG_CCR logpage and wakeup any thread
> waiting on CCR completion.
>
> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> ---
> +       ret = nvme_get_log(ctrl, 0, NVME_LOG_CCR, 0x01,
> +                          0x00, log, sizeof(*log), 0);
The 0x01 is the log specific parameter.  0x01 sets the Remove Completed flag.
Please explain that in a comment (if not assigning it to a well named local/
defining a constant).

Sincerely,
Randy Jennings

^ permalink raw reply	[flat|nested] 12+ messages in thread

[parent not found: <73a9c0e2-ecd0-4170-8723-259529617ec0@suse.de>]

[parent not found: <20260331165510.GD2861-mkhalfella@purestorage.com>]

[parent not found: <019cf04f-8988-46fd-aecd-0f77ac5f8b8a@suse.de>]

[parent not found: <20260407190940.GF2861-mkhalfella@purestorage.com>]

* Re: [PATCH v4 09/15] nvme: Implement cross-controller reset completion
       [not found]         ` <20260407190940.GF2861-mkhalfella@purestorage.com>
@ 2026-05-15  2:49           ` Randy Jennings
  0 siblings, 0 replies; 12+ messages in thread
From: Randy Jennings @ 2026-05-15  2:49 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: Hannes Reinecke, Justin Tee, Naresh Gottumukkala, Paul Ely,
	Chaitanya Kulkarni, Jens Axboe, Keith Busch, Sagi Grimberg,
	James Smart, Aaron Dailey, Dhaval Giani, linux-nvme, linux-kernel

On Tue, Apr 7, 2026 at 12:09 PM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> On Tue 2026-04-07 07:48:50 +0200, Hannes Reinecke wrote:
> > On 3/31/26 18:55, Mohamed Khalfella wrote:
> > > On Mon 2026-03-30 12:53:07 +0200, Hannes Reinecke wrote:
> > >> On 3/28/26 01:43, Mohamed Khalfella wrote:
> > >>> An nvme source controller that issues CCR command expects to receive an
> > >>> NVME_AER_NOTICE_CCR_COMPLETED when pending CCR succeeds or fails. Add
> > >>> sctrl->ccr_work to read NVME_LOG_CCR logpage and wakeup any thread
> > >>> waiting on CCR completion.
> > >>>
> > >>> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> > >>> ---
> > >>>    drivers/nvme/host/core.c | 49 +++++++++++++++++++++++++++++++++++++++-
> > >>>    drivers/nvme/host/nvme.h |  1 +
> > >>>    2 files changed, 49 insertions(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > >>> index 5603ae36444f..793f203bfc38 100644
> > >>> --- a/drivers/nvme/host/core.c
> > >>> +++ b/drivers/nvme/host/core.c
> > >>> @@ -1920,7 +1920,8 @@ EXPORT_SYMBOL_GPL(nvme_set_queue_count);
> > >>>
> > >>>    #define NVME_AEN_SUPPORTED \
> > >>>           (NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_FW_ACT | \
> > >>> -  NVME_AEN_CFG_ANA_CHANGE | NVME_AEN_CFG_DISC_CHANGE)
> > >>> +  NVME_AEN_CFG_ANA_CHANGE | NVME_AEN_CFG_CCR_COMPLETE | \
> > >>> +  NVME_AEN_CFG_DISC_CHANGE)
> > >>>
> > >>>    static void nvme_enable_aen(struct nvme_ctrl *ctrl)
> > >>>    {
> > >>> @@ -4873,6 +4874,47 @@ static void nvme_get_fw_slot_info(struct nvme_ctrl *ctrl)
> > >>>           kfree(log);
> > >>>    }
> > >>>
> > >>> +static void nvme_ccr_work(struct work_struct *work)
> > >>> +{
> > >>> + struct nvme_ctrl *ctrl = container_of(work, struct nvme_ctrl, ccr_work);
> > >>> + struct nvme_ccr_entry *ccr;
> > >>> + struct nvme_ccr_log_entry *entry;
> > >>> + struct nvme_ccr_log *log;
> > >>> + unsigned long flags;
> > >>> + int ret, i;
> > >>> +
> > >>> + log = kmalloc(sizeof(*log), GFP_KERNEL);
> > >>> + if (!log)
> > >>> +         return;
> > >>> +
> > >>> + ret = nvme_get_log(ctrl, 0, NVME_LOG_CCR, 0x01,
> > >>> +                    0x00, log, sizeof(*log), 0);
> > >>> + if (ret)
> > >>> +         goto out;
> > >>> +
> > >>> + spin_lock_irqsave(&ctrl->lock, flags);
> > >>> + for (i = 0; i < le16_to_cpu(log->ne); i++) {
> > >>> +         entry = &log->entries[i];
> > >>> +         if (entry->ccrs == NVME_CCR_STATUS_IN_PROGRESS)
> > >>> +                 continue;
> > >>> +
> > >>> +         list_for_each_entry(ccr, &ctrl->ccr_list, list) {
> > >>> +                 struct nvme_ctrl *ictrl = ccr->ictrl;
> > >>> +
> > >>> +                 if (ictrl->cntlid != le16_to_cpu(entry->icid) ||
> > >>> +                     ictrl->ciu != entry->ciu)
> > >>> +                         continue;
> > >>> +
> > >>> +                 /* Complete matching entry */
> > >>> +                 ccr->ccrs = entry->ccrs;
> > >>> +                 complete(&ccr->complete);
> > >>> +         }
> > >>> + }
> > >>> + spin_unlock_irqrestore(&ctrl->lock, flags);
> > >>> +out:
> > >>> + kfree(log);
> > >>> +}
> > >>> +
> > >>>    static void nvme_fw_act_work(struct work_struct *work)
> > >>>    {
> > >>>           struct nvme_ctrl *ctrl = container_of(work,
> > >>> @@ -4949,6 +4991,9 @@ static bool nvme_handle_aen_notice(struct nvme_ctrl *ctrl, u32 result)
> > >>>           case NVME_AER_NOTICE_DISC_CHANGED:
> > >>>                   ctrl->aen_result = result;
> > >>>                   break;
> > >>> + case NVME_AER_NOTICE_CCR_COMPLETED:
> > >>> +         queue_work(nvme_wq, &ctrl->ccr_work);
> > >>> +         break;
> > >>>           default:
> > >>>                   dev_warn(ctrl->device, "async event result %08x\n", result);
> > >>>           }
> > >>> @@ -5144,6 +5189,7 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
> > >>>           nvme_stop_failfast_work(ctrl);
> > >>>           flush_work(&ctrl->async_event_work);
> > >>>           cancel_work_sync(&ctrl->fw_act_work);
> > >>> + cancel_work_sync(&ctrl->ccr_work);
> > >>>           if (ctrl->ops->stop_ctrl)
> > >>>                   ctrl->ops->stop_ctrl(ctrl);
> > >>>    }
> > >>> @@ -5267,6 +5313,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
> > >>>           ctrl->quirks = quirks;
> > >>>           ctrl->numa_node = NUMA_NO_NODE;
> > >>>           INIT_WORK(&ctrl->scan_work, nvme_scan_work);
> > >>> + INIT_WORK(&ctrl->ccr_work, nvme_ccr_work);
> > >>>           INIT_WORK(&ctrl->async_event_work, nvme_async_event_work);
> > >>>           INIT_WORK(&ctrl->fw_act_work, nvme_fw_act_work);
> > >>>           INIT_WORK(&ctrl->delete_work, nvme_delete_ctrl_work);
> > >>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> > >>> index f2bcff9ccd25..776ee8aa5a93 100644
> > >>> --- a/drivers/nvme/host/nvme.h
> > >>> +++ b/drivers/nvme/host/nvme.h
> > >>> @@ -419,6 +419,7 @@ struct nvme_ctrl {
> > >>>           struct nvme_effects_log *effects;
> > >>>           struct xarray cels;
> > >>>           struct work_struct scan_work;
> > >>> + struct work_struct ccr_work;
> > >>>           struct work_struct async_event_work;
> > >>>           struct delayed_work ka_work;
> > >>>           struct delayed_work failfast_work;
> > >>
> > >> Hmm. The 'nvme_fence_ctrl' operation introduced in the previous patch
> > >> is synchronous, yet in this patch we're looking a a log page to figure
> > >> out if the cross-controller reset is complete.
> > >> Which is slightly irritating.
> > >> Wouldn't it be better to make the 'nvme_fence_ctrl' operation
> > >> asynchronous, and then have a separate function to wait for the fence
> > >> operation to complete (which then could look at log pages etc)?
> > >
> > > True nvme_fence_ctrl() is synchronous, but it runs in from ctrl->fencing_work.
> > > What is it that you find irritating about nvme_fence_ctrl()?
> > >
> >
> > Thins is, in order to make nvme_fence_ctrl() synchronous we have to
> > wait for the operation itself (which is asynchronous) to complete.
> > And that wait in itself is implemented by a wait queue.
> > So we're having a wait queue calling nvme_fence_ctrl(), which calls
> > another wait queue waiting for a completion.
> > And then (if the IRS bit is not set) calling another waitqueue for
> > checking the log page.
>
> There is no point of checking the CCR logpage before getting AEN. Sure
> we can implement some sort of polling, but I do not think this is the
> right approach.
>
> >
> > I think we could simplify this by simply making nvme_fence_ctrl()
> > asynchronous, which could do away with all the workqueue handling.
>
> I am not sure I understand exactly how nvme_fence_ctrl() can be make
> asynchronous. Can you provide example code?

I was able to convince Hannes at LSF to withdraw the request for an
asynchronous structure for the code.

Sincerely,
Randy Jennings

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-05-15  2:49 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260328004518.1729186-1-mkhalfella@purestorage.com>
2026-05-12 21:40 ` [PATCH v4 00/15] TP8028 Rapid Path Failure Recovery Mohamed Khalfella
2026-05-12 22:02   ` Sagi Grimberg
     [not found] ` <20260328004518.1729186-3-mkhalfella@purestorage.com>
2026-05-14 23:42   ` [PATCH v4 02/15] nvmet/debugfs: Export controller CIU and CIRN via debugfs Randy Jennings
     [not found] ` <20260328004518.1729186-4-mkhalfella@purestorage.com>
2026-05-15  0:18   ` [PATCH v4 03/15] nvmet: Implement CCR nvme command Randy Jennings
     [not found] ` <20260328004518.1729186-5-mkhalfella@purestorage.com>
2026-05-15  0:38   ` [PATCH v4 04/15] nvmet: Implement CCR logpage Randy Jennings
     [not found] ` <20260328004518.1729186-6-mkhalfella@purestorage.com>
2026-05-15  0:50   ` [PATCH v4 05/15] nvmet: Send an AEN on CCR completion Randy Jennings
     [not found] ` <20260328004518.1729186-7-mkhalfella@purestorage.com>
2026-05-15  2:03   ` [PATCH v4 06/15] nvme: Rapid Path Failure Recovery read controller identify fields Randy Jennings
     [not found] ` <20260328004518.1729186-8-mkhalfella@purestorage.com>
2026-05-15  2:06   ` [PATCH v4 07/15] nvme: Introduce FENCING and FENCED controller states Randy Jennings
     [not found] ` <20260328004518.1729186-2-mkhalfella@purestorage.com>
2026-05-15  2:08   ` [PATCH v4 01/15] nvmet: Rapid Path Failure Recovery set controller identify fields Randy Jennings
     [not found] ` <20260328004518.1729186-9-mkhalfella@purestorage.com>
2026-05-15  2:32   ` [PATCH v4 08/15] nvme: Implement cross-controller reset recovery Randy Jennings
     [not found] ` <20260328004518.1729186-10-mkhalfella@purestorage.com>
2026-05-15  2:47   ` [PATCH v4 09/15] nvme: Implement cross-controller reset completion Randy Jennings
     [not found]   ` <73a9c0e2-ecd0-4170-8723-259529617ec0@suse.de>
     [not found]     ` <20260331165510.GD2861-mkhalfella@purestorage.com>
     [not found]       ` <019cf04f-8988-46fd-aecd-0f77ac5f8b8a@suse.de>
     [not found]         ` <20260407190940.GF2861-mkhalfella@purestorage.com>
2026-05-15  2:49           ` Randy Jennings

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox