* [PATCHv2] nvme-pci: try function level reset on init failure
@ 2025-07-15 19:16 ` Keith Busch
2025-07-15 23:04 ` Chaitanya Kulkarni
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Keith Busch @ 2025-07-15 19:16 UTC (permalink / raw)
To: linux-nvme, hch; +Cc: Keith Busch, Chaitanya Kulkarni
From: Keith Busch <kbusch@kernel.org>
NVMe devices from multiple vendors appear to get stuck in a reset state
that we can't get out of with an NVMe level Controller Reset. The kernel
would report these with messages that look like:
Device not ready; aborting reset, CSTS=0x1
These have historically required a power cycle to make them usable
again, but in many cases, a PCIe FLR is sufficient to restart operation
without a power cycle. Try it if the initial controller reset fails
during any nvme reset attempt.
Cc: Chaitanya Kulkarni <chaitanyak@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
v1->v2:
Added code comment explaining the escalation
Add an informational kernel message that this event occurred
Use the "pcie_reset_flr()" API instead of "pcie_flr()" since that one
checks for quirks and capabilities before writing FLR config bits.
Note, NVMe PCI Transport Spec mandates FLR capability, so the latter
should not apply to any compliant device, but you never know...
drivers/nvme/host/pci.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4cf87fb5d8573..f8f8cb6a4786a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2067,8 +2067,28 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
 	 * might be pointing at!
 	 */
 	result = nvme_disable_ctrl(&dev->ctrl, false);
-	if (result < 0)
-		return result;
+	if (result < 0) {
+		struct pci_dev *pdev = to_pci_dev(dev->dev);
+
+		/*
+		 * The NVMe Controller Reset method did not get an expected
+		 * CSTS.RDY transition, so something with the device appears to
+		 * be stuck. Use the lower level and bigger hammer PCIe
+		 * Function Level Reset to attempt restoring the device to its
+		 * initial state, and try again.
+		 */
+		result = pcie_reset_flr(pdev, false);
+		if (result < 0)
+			return result;
+
+		pci_restore_state(pdev);
+		result = nvme_disable_ctrl(&dev->ctrl, false);
+		if (result < 0)
+			return result;
+
+		dev_info(&dev->ctrl.device,
+			"controller reset completed after pcie flr\n");
+	}
 
 	result = nvme_alloc_queue(dev, 0, NVME_AQ_DEPTH);
 	if (result)
--
2.47.1
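
The escalation in the patch above can be modeled in userspace as a rough sketch. Everything here is hypothetical scaffolding: `struct mock_ctrl`, `mock_disable_ctrl()`, `mock_reset_flr()` and `mock_configure_admin_queue()` are stand-ins invented for illustration, not kernel APIs, and the error codes are assumptions chosen only to keep the two failure paths distinguishable. The sketch shows the ordering only: controller reset first, FLR on failure, then one retry of the controller reset.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a controller; not the kernel's struct nvme_dev. */
struct mock_ctrl {
	bool stuck;        /* models CSTS.RDY never making the expected transition */
	bool flr_capable;  /* models the function advertising FLR support */
	int flr_count;     /* number of function level resets issued so far */
};

/* Models an NVMe-level Controller Reset: fails while the device is stuck. */
static int mock_disable_ctrl(struct mock_ctrl *c)
{
	return c->stuck ? -ENODEV : 0;
}

/* Models a capability-checked FLR: refuses devices that do not support it. */
static int mock_reset_flr(struct mock_ctrl *c)
{
	if (!c->flr_capable)
		return -ENOTTY;
	c->flr_count++;
	c->stuck = false;  /* assumption: the FLR un-sticks the device */
	return 0;
}

/* Try the controller reset; on failure escalate to FLR and retry once. */
static int mock_configure_admin_queue(struct mock_ctrl *c)
{
	int result;

	assert(c != NULL);
	result = mock_disable_ctrl(c);
	if (result < 0) {
		result = mock_reset_flr(c);
		if (result < 0)
			return result;

		result = mock_disable_ctrl(c);
		if (result < 0)
			return result;

		printf("controller reset completed after pcie flr\n");
	}
	return 0;
}
```

In this model a healthy controller never sees an FLR, a stuck but FLR-capable one recovers after exactly one, and a stuck controller without FLR support propagates the error from the reset helper unchanged.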
* Re: [PATCHv2] nvme-pci: try function level reset on init failure
2025-07-15 19:16 ` [PATCHv2] nvme-pci: try function level reset on init failure Keith Busch
@ 2025-07-15 23:04 ` Chaitanya Kulkarni
2025-07-17 8:38 ` Nitesh Shetty
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Chaitanya Kulkarni @ 2025-07-15 23:04 UTC (permalink / raw)
To: Keith Busch, linux-nvme@lists.infradead.org, hch@lst.de; +Cc: Keith Busch
On 7/15/25 12:16, Keith Busch wrote:
> From: Keith Busch<kbusch@kernel.org>
>
> NVMe devices from multiple vendors appear to get stuck in a reset state
> that we can't get out of with an NVMe level Controller Reset. The kernel
> would report these with messages that look like:
>
> Device not ready; aborting reset, CSTS=0x1
>
> These have historically required a power cycle to make them usable
> again, but in many cases, a PCIe FLR is sufficient to restart operation
> without a power cycle. Try it if the initial controller reset fails
> during any nvme reset attempt.
>
> Cc: Chaitanya Kulkarni<chaitanyak@nvidia.com>
> Signed-off-by: Keith Busch<kbusch@kernel.org>
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
* Re: nvme-pci: try function level reset on init failure
2025-07-15 19:16 ` [PATCHv2] nvme-pci: try function level reset on init failure Keith Busch
2025-07-15 23:04 ` Chaitanya Kulkarni
@ 2025-07-17 8:38 ` Nitesh Shetty
2025-07-17 11:40 ` [PATCHv2] " Christoph Hellwig
2025-07-17 13:54 ` Keith Busch
3 siblings, 0 replies; 8+ messages in thread
From: Nitesh Shetty @ 2025-07-17 8:38 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-nvme, hch, Keith Busch, Chaitanya Kulkarni
On 15/07/25 12:16PM, Keith Busch wrote:
>From: Keith Busch <kbusch@kernel.org>
>
>NVMe devices from multiple vendors appear to get stuck in a reset state
>that we can't get out of with an NVMe level Controller Reset. The kernel
>would report these with messages that look like:
>
> Device not ready; aborting reset, CSTS=0x1
>
>These have historically required a power cycle to make them usable
>again, but in many cases, a PCIe FLR is sufficient to restart operation
>without a power cycle. Try it if the initial controller reset fails
>during any nvme reset attempt.
>
>Cc: Chaitanya Kulkarni <chaitanyak@nvidia.com>
>Signed-off-by: Keith Busch <kbusch@kernel.org>
>---
>v1->v2:
>
> Added code comment explaining the escalation
>
> Add an informational kernel message that this event occurred
>
> Use the "pcie_reset_flr()" API instead of "pcie_flr()" since that one
> checks for quirks and capabilities before writing FLR config bits.
> Note, NVMe PCI Transport Spec mandates FLR capability, so the latter
> should not apply to any compliant device, but you never know...
>
> drivers/nvme/host/pci.c | 24 ++++++++++++++++++++++--
> 1 file changed, 22 insertions(+), 2 deletions(-)
>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
* Re: [PATCHv2] nvme-pci: try function level reset on init failure
2025-07-15 19:16 ` [PATCHv2] nvme-pci: try function level reset on init failure Keith Busch
2025-07-15 23:04 ` Chaitanya Kulkarni
2025-07-17 8:38 ` Nitesh Shetty
@ 2025-07-17 11:40 ` Christoph Hellwig
2025-07-17 13:54 ` Keith Busch
3 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2025-07-17 11:40 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-nvme, hch, Keith Busch, Chaitanya Kulkarni
Thanks,
applied to nvme-6.17.
* Re: [PATCHv2] nvme-pci: try function level reset on init failure
2025-07-15 19:16 ` [PATCHv2] nvme-pci: try function level reset on init failure Keith Busch
` (2 preceding siblings ...)
2025-07-17 11:40 ` [PATCHv2] " Christoph Hellwig
@ 2025-07-17 13:54 ` Keith Busch
2025-07-17 13:57 ` Christoph Hellwig
3 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2025-07-17 13:54 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-nvme, hch, Chaitanya Kulkarni
On Tue, Jul 15, 2025 at 12:16:27PM -0700, Keith Busch wrote:
> + dev_info(&dev->ctrl.device,
> + "controller reset completed after pcie flr\n");
Urg, sorry, this was the wrong diff: that '&' shouldn't be there, and
build bot will probably flag this.
Christoph, could you squash this in?
-- >8 --
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a7aa8ba4951f4..73d5a5298822a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2083,7 +2083,7 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
 		if (result < 0)
 			return result;
 
-		dev_info(&dev->ctrl.device,
+		dev_info(dev->ctrl.device,
 			"controller reset completed after pcie flr\n");
 	}
 
--
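
For readers wondering why the stray '&' matters: the fixup implies that the `device` member is already a pointer, so taking its address yields a pointer to a pointer, which is not what a helper expecting a `struct device *` wants. The following is only a miniature sketch of that type mismatch. The `struct device` and `struct mock_ctrl` definitions, `mock_dev_name()` and the `IS_DEVICE_PTR()` macro are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature types; only the pointer-ness of the member matters. */
struct device {
	const char *name;
};

struct mock_ctrl {
	struct device *device;  /* already a pointer, as the fixup implies */
};

/* Stand-in for a logging helper that takes a struct device pointer. */
static const char *mock_dev_name(const struct device *dev)
{
	assert(dev != NULL);
	return dev->name;
}

/* Evaluates to 1 only when the expression has type struct device *. */
#define IS_DEVICE_PTR(expr) _Generic((expr), struct device *: 1, default: 0)
```

With these definitions, `IS_DEVICE_PTR(ctrl.device)` evaluates to 1 while `IS_DEVICE_PTR(&ctrl.device)` evaluates to 0, because the latter has type `struct device **`.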
* Re: [PATCHv2] nvme-pci: try function level reset on init failure
2025-07-17 13:54 ` Keith Busch
@ 2025-07-17 13:57 ` Christoph Hellwig
2025-07-17 15:44 ` Keith Busch
0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2025-07-17 13:57 UTC (permalink / raw)
To: Keith Busch; +Cc: Keith Busch, linux-nvme, hch, Chaitanya Kulkarni
On Thu, Jul 17, 2025 at 07:54:45AM -0600, Keith Busch wrote:
> On Tue, Jul 15, 2025 at 12:16:27PM -0700, Keith Busch wrote:
> > + dev_info(&dev->ctrl.device,
> > + "controller reset completed after pcie flr\n");
>
> Urg, sorry, this was the wrong diff: that '&' shouldn't be there, and
> build bot will probably flag this.
>
> Christoph, could you squash this in?
In fact I already did, but forgot to push out the result or tell
anyone..
* Re: [PATCHv2] nvme-pci: try function level reset on init failure
2025-07-17 13:57 ` Christoph Hellwig
@ 2025-07-17 15:44 ` Keith Busch
2025-07-17 15:46 ` Christoph Hellwig
0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2025-07-17 15:44 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, Chaitanya Kulkarni
On Thu, Jul 17, 2025 at 03:57:06PM +0200, Christoph Hellwig wrote:
> On Thu, Jul 17, 2025 at 07:54:45AM -0600, Keith Busch wrote:
> > Christoph, could you squash this in?
>
> In fact I already did, but forgot to push out the result or tell
> anyone..
Thanks, but the new push appears to have incorporated some unrelated
changes to fs/verity/verify.c
* Re: [PATCHv2] nvme-pci: try function level reset on init failure
2025-07-17 15:44 ` Keith Busch
@ 2025-07-17 15:46 ` Christoph Hellwig
0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2025-07-17 15:46 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Keith Busch, linux-nvme, Chaitanya Kulkarni
On Thu, Jul 17, 2025 at 09:44:51AM -0600, Keith Busch wrote:
> On Thu, Jul 17, 2025 at 03:57:06PM +0200, Christoph Hellwig wrote:
> > On Thu, Jul 17, 2025 at 07:54:45AM -0600, Keith Busch wrote:
> > > Christoph, could you squash this in?
> >
> > In fact I already did, but forgot to push out the result or tell
> > anyone..
>
> Thanks, but the new push appears to have incorporated some unrelated
> changes to fs/verity/verify.c
Fixed. That's what you get for not committing things immediately before
rushing to a meeting.