From: Hannes Reinecke <hare@suse.de>
To: Nilay Shroff <nilay@linux.ibm.com>, kbusch@kernel.org
Cc: linux-nvme@lists.infradead.org, hch@lst.de, sagi@grimberg.me,
gjoyce@linux.ibm.com, axboe@fb.com
Subject: Re: [PATCH v3 1/1] nvme-pci : Fix EEH failure on ppc after subsystem reset
Date: Fri, 14 Jun 2024 11:51:14 +0200 [thread overview]
Message-ID: <72d9ab72-8b19-455e-a1db-d2bca9754bbf@suse.de> (raw)
In-Reply-To: <20240604091523.1422027-2-nilay@linux.ibm.com>
On 6/4/24 11:10, Nilay Shroff wrote:
> Executing the NVMe subsystem reset command may cause the NVMe
> adapter to lose communication with the kernel. Today the only ways
> to recover the adapter are to re-enumerate the PCI bus, hotplug the
> NVMe disk, or reboot the OS.
>
> The PPC architecture supports a mechanism called EEH (enhanced error
> handling) which allows PCI bus errors to be cleared and a PCI card to
> be rebooted without having to physically hotplug the NVMe disk or
> reboot the OS.
>
> In the current implementation, when the user executes the nvme
> subsystem reset command and the kernel loses communication with the
> NVMe adapter, subsequent reads/writes to the PCIe config space of the
> device fail. Failing to read/write the PCI config space makes the
> NVMe driver assume permanent loss of communication with the device,
> so the driver marks the NVMe controller dead and frees all resources
> associated with that controller. Once the NVMe controller is dead,
> EEH recovery cannot succeed.
>
> This patch fixes the issue: after the user executes the subsystem
> reset command, if communication with the NVMe adapter is lost and
> EEH recovery is initiated, we allow the EEH recovery to make forward
> progress and give the EEH thread a fair chance to recover the
> adapter. If the EEH thread cannot recover the adapter communication,
> it sets the PCI channel state of the erring adapter to "permanent
> failure" and removes the device.
>
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
> drivers/nvme/host/core.c | 1 +
> drivers/nvme/host/pci.c | 21 ++++++++++++++++++---
> 2 files changed, 19 insertions(+), 3 deletions(-)
>
Looks odd, but I cannot see a better way out here.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich
Thread overview: 11 messages
2024-06-04 9:10 [PATCH v3 0/1] nvme-pci: recover from NVM subsystem reset Nilay Shroff
2024-06-04 9:10 ` [PATCH v3 1/1] nvme-pci : Fix EEH failure on ppc after " Nilay Shroff
2024-06-10 12:32 ` Maurizio Lombardi
2024-06-12 11:07 ` Nilay Shroff
2024-06-12 13:10 ` Maurizio Lombardi
2024-06-12 17:07 ` Nilay Shroff
2024-06-13 7:02 ` Maurizio Lombardi
2024-06-14 9:51 ` Hannes Reinecke [this message]
2024-06-21 16:37 ` Keith Busch
2024-06-22 15:07 ` Nilay Shroff
2024-06-24 16:07 ` Keith Busch