From: Nilay Shroff <nilay@linux.ibm.com>
To: kbusch@kernel.org
Cc: linux-nvme@lists.infradead.org, hch@lst.de, sagi@grimberg.me,
	gjoyce@linux.ibm.com, axboe@fb.com,
	Nilay Shroff <nilay@linux.ibm.com>
Subject: [PATCH v3 0/1] nvme-pci: recover from NVM subsystem reset
Date: Tue,  4 Jun 2024 14:40:03 +0530	[thread overview]
Message-ID: <20240604091523.1422027-1-nilay@linux.ibm.com> (raw)

Hi Keith,

My previous attempt to get attention for this patch didn't garner enough 
eyeballs, so I have rewritten the text and included some more background 
here. For those interested, I have also copied below the link to the 
previous email where we had some discussion about this patch.

An NVM subsystem reset may be needed to activate an NVMe controller 
firmware image after the image has been committed to a slot, or in some 
cases to recover from a controller fatal error. When executed, an NVM 
subsystem reset may cause the loss of communication with the NVMe 
controller, and the only ways to re-establish communication with the 
NVMe adapter are to re-enumerate the PCI bus, hotplug the NVMe disk, or 
reboot the OS. Fortunately, the PPC architecture supports an extended 
PCI capability which can help recover from the loss of PCI adapter 
communication. The EEH (Enhanced Error Handling) hardware features on 
PPC machines allow PCI bus errors to be cleared and a PCI card to be 
"rebooted", without actually having to reboot the OS, re-enumerate the 
PCI bus, or hotplug the NVMe disk.

In the current implementation, when the user executes the NVM subsystem 
reset command, the kernel writes to the NVM subsystem reset register 
(NSSR) and then initiates the nvme reset work. The nvme reset work first 
shuts down the controller, which requires access to PCIe config space. 
As writing to NSSR typically causes the loss of communication with the 
NVMe controller, the nvme reset work that immediately follows fails to 
read/write PCIe config space; this leads the nvme driver to believe the 
controller is dead, so the driver cleans up all resources associated 
with that NVMe controller and marks the controller dead. As a result, 
PCI error recovery (EEH on PPC) never gets a chance to recover the 
device from the lost adapter communication.
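
For illustration only, here is a rough sketch of that path. It is 
simplified and not the literal upstream code; the function name 
subsys_reset_sketch is made up, while NVME_REG_NSSR, the reg_write32() 
op and nvme_reset_ctrl() are existing driver interfaces:

/* Simplified sketch of the existing NVM subsystem reset path. */
static int subsys_reset_sketch(struct nvme_ctrl *ctrl)
{
	int ret;

	/* Write the magic value "NVMe" to the NVM Subsystem Reset register */
	ret = ctrl->ops->reg_write32(ctrl, NVME_REG_NSSR, 0x4E564D65);
	if (ret)
		return ret;

	/*
	 * Schedule the controller reset work. By the time it runs, the
	 * link may already be gone, so its config space accesses fail
	 * and the controller ends up being marked dead.
	 */
	return nvme_reset_ctrl(ctrl);
}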

This patch detects the case where communication with the NVMe adapter 
is lost and PCI error recovery has been initiated by the platform. In 
that case it allows the error recovery to make forward progress and 
prevents the nvme reset work (which was initiated after the NVM 
subsystem reset) from marking the controller dead. If PCI error recovery 
is unable to recover the device, it sets the PCI channel state to 
"permanent failure" and the device is then removed.

I have tested the following cases with this patch applied:
1. NVM subsystem reset while no IO is running 
2. NVM subsystem reset while IO is ongoing
3. Inject PCI error while reset work is scheduled and no IO is running
4. Inject PCI error while reset work is scheduled and IO is ongoing
   
   For all of the above cases (1-4), I verified that PCI error recovery 
   could successfully recover the NVMe disk.

5. NVM subsystem reset and then immediately hot remove the NVMe disk: 
   In this case, though PCI error recovery is initiated, it cannot make 
   forward progress (as the disk is hot removed), so the controller is 
   deleted and all of its associated resources are freed.

6. NVM subsystem reset and PCI error recovery is unable to recover the 
   device:
   In this case the controller is deleted and all of its associated 
   resources are freed.

7. NVM subsystem reset on a platform which doesn't support PCI error 
   recovery:
   In this case the nvme reset work frees the resources associated with 
   the controller and marks it dead.

Changelog:
==========
Changes from v2:
  - Formatting cleanup 
  - Updated commit changelog to better describe the issue
  - Added the cover letter to provide more details about the NVM 
    subsystem reset and error recovery (EEH)

Changes from v1:
  - Allow a controller to move from CONNECTING state to 
    RESETTING state (Keith)

  - Fix a race condition between the reset work and the PCI error 
    handler code which may prevent the reset work and PCI recovery 
    from making forward progress (Keith)

Link: https://lore.kernel.org/all/20240209050342.406184-1-nilay@linux.ibm.com/

Nilay Shroff (1):
  nvme-pci : Fix EEH failure on ppc after subsystem reset

 drivers/nvme/host/core.c |  1 +
 drivers/nvme/host/pci.c  | 20 +++++++++++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

-- 
2.45.1




Thread overview: 11+ messages
2024-06-04  9:10 Nilay Shroff [this message]
2024-06-04  9:10 ` [PATCH v3 1/1] nvme-pci : Fix EEH failure on ppc after subsystem reset Nilay Shroff
2024-06-10 12:32   ` Maurizio Lombardi
2024-06-12 11:07     ` Nilay Shroff
2024-06-12 13:10       ` Maurizio Lombardi
2024-06-12 17:07         ` Nilay Shroff
2024-06-13  7:02           ` Maurizio Lombardi
2024-06-14  9:51   ` Hannes Reinecke
2024-06-21 16:37   ` Keith Busch
2024-06-22 15:07     ` Nilay Shroff
2024-06-24 16:07       ` Keith Busch
