* [PATCH] nvme-pci: fix queue unquiesce check on slot_reset
@ 2025-04-24 17:18 Keith Busch
2025-04-25 13:19 ` Christoph Hellwig
2025-04-29 13:10 ` Christoph Hellwig
0 siblings, 2 replies; 6+ messages in thread
From: Keith Busch @ 2025-04-24 17:18 UTC
To: linux-nvme, hch; +Cc: Keith Busch, Dhankaran Singh Ajravat
From: Keith Busch <kbusch@kernel.org>
A zero return means the reset was successfully scheduled. We don't want
to unquiesce the queues while the reset_work is pending, as that will
just flush out requeued requests to a failed completion.
Fixes: 71a5bb153be104 ("nvme: ensure disabling pairs with unquiesce")
Reported-by: Dhankaran Singh Ajravat <dhankaran@meta.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/nvme/host/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b178d52eac1b7..c9e2a5450bc0f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3575,7 +3575,7 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
 	dev_info(dev->ctrl.device, "restart after slot reset\n");
 	pci_restore_state(pdev);
-	if (!nvme_try_sched_reset(&dev->ctrl))
+	if (nvme_try_sched_reset(&dev->ctrl))
 		nvme_unquiesce_io_queues(&dev->ctrl);
 	return PCI_ERS_RESULT_RECOVERED;
 }
--
2.47.1
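The return convention the commit message relies on can be sketched outside the kernel. This is a simplified userspace model, not the driver code: `struct toy_ctrl`, `toy_try_sched_reset()` and `toy_slot_reset()` are hypothetical names, and the condition under which scheduling fails is an assumption made for illustration. Only the convention that zero means "reset scheduled" is taken from the patch description.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins for driver state; none of these are kernel types. */
enum toy_state { TOY_LIVE, TOY_RESETTING };

struct toy_ctrl {
	enum toy_state state;
	bool reset_pending;	/* models reset_work having been queued */
	bool queues_quiesced;	/* models the I/O queues being quiesced */
};

/*
 * Hypothetical model of the scheduling helper: returns 0 when the reset
 * was scheduled and a negative errno otherwise. The failure condition
 * used here is an assumption for the sketch.
 */
static int toy_try_sched_reset(struct toy_ctrl *ctrl)
{
	if (ctrl->state != TOY_RESETTING || ctrl->reset_pending)
		return -EBUSY;
	ctrl->reset_pending = true;
	return 0;
}

/* Mirrors the shape of the fixed check in the patch. */
static void toy_slot_reset(struct toy_ctrl *ctrl)
{
	/*
	 * Nonzero: no reset was scheduled by this call, so unquiesce now.
	 * Zero: leave the queues quiesced while the reset is pending.
	 */
	if (toy_try_sched_reset(ctrl))
		ctrl->queues_quiesced = false;
}
```

With the pre-patch `!` in place, the two branches swap: the queues would be unquiesced exactly when the reset had just been scheduled, which is the case the commit message says flushes requeued requests to a failed completion.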
* Re: [PATCH] nvme-pci: fix queue unquiesce check on slot_reset
2025-04-24 17:18 [PATCH] nvme-pci: fix queue unquiesce check on slot_reset Keith Busch
@ 2025-04-25 13:19 ` Christoph Hellwig
2025-04-25 14:33 ` Keith Busch
2025-04-29 13:10 ` Christoph Hellwig
1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2025-04-25 13:19 UTC
To: Keith Busch; +Cc: linux-nvme, hch, Keith Busch, Dhankaran Singh Ajravat
On Thu, Apr 24, 2025 at 10:18:01AM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> A zero return means the reset was successfully scheduled. We don't want
> to unquiesce the queues while the reset_work is pending, as that will
> just flush out requeued requests to a failed completion.
>
> Fixes: 71a5bb153be104 ("nvme: ensure disabling pairs with unquiesce")
Sounds like this code path isn't getting tested all that much if this stuck
around for so long..
* Re: [PATCH] nvme-pci: fix queue unquiesce check on slot_reset
2025-04-25 13:19 ` Christoph Hellwig
@ 2025-04-25 14:33 ` Keith Busch
2025-04-28 15:22 ` Keith Busch
0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2025-04-25 14:33 UTC
To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, Dhankaran Singh Ajravat
On Fri, Apr 25, 2025 at 03:19:35PM +0200, Christoph Hellwig wrote:
> On Thu, Apr 24, 2025 at 10:18:01AM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> >
> > A zero return means the reset was successfully scheduled. We don't want
> > to unquiesce the queues while the reset_work is pending, as that will
> > just flush out requeued requests to a failed completion.
> >
> > Fixes: 71a5bb153be104 ("nvme: ensure disabling pairs with unquiesce")
>
> Sounds like this code path isn't getting tested all that much if this stuck
> around for so long..
The conditions that trigger PCIe errors were the primary concern. Of
course you'll get IO errors, right? The PCIe connection is flakey! But
we are supposed to retry IOs after recovery, which we weren't doing, and
that was a secondary concern I embarrassingly overlooked for many
reports.
* Re: [PATCH] nvme-pci: fix queue unquiesce check on slot_reset
2025-04-25 14:33 ` Keith Busch
@ 2025-04-28 15:22 ` Keith Busch
2025-04-29 12:13 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2025-04-28 15:22 UTC
To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, Dhankaran Singh Ajravat
On Fri, Apr 25, 2025 at 08:33:28AM -0600, Keith Busch wrote:
> On Fri, Apr 25, 2025 at 03:19:35PM +0200, Christoph Hellwig wrote:
> > On Thu, Apr 24, 2025 at 10:18:01AM -0700, Keith Busch wrote:
> > > From: Keith Busch <kbusch@kernel.org>
> > >
> > > A zero return means the reset was successfully scheduled. We don't want
> > > to unquiesce the queues while the reset_work is pending, as that will
> > > just flush out requeued requests to a failed completion.
> > >
> > > Fixes: 71a5bb153be104 ("nvme: ensure disabling pairs with unquiesce")
> >
> > Sounds like this code path isn't getting tested all that much if this stuck
> > around for so long..
>
> The conditions that trigger PCIe errors were the primary concern. Of
> course you'll get IO errors, right? The PCIe connection is flakey! But
> we are supposed to retry IOs after recovery, which we weren't doing, and
> that was a secondary concern I embarrassingly overlooked for many
> reports.
We can still take this, right? It is a good fix, even if we misunderstood
why IO was failing for over a year there.
* Re: [PATCH] nvme-pci: fix queue unquiesce check on slot_reset
2025-04-28 15:22 ` Keith Busch
@ 2025-04-29 12:13 ` Christoph Hellwig
0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2025-04-29 12:13 UTC
To: Keith Busch
Cc: Christoph Hellwig, Keith Busch, linux-nvme,
Dhankaran Singh Ajravat
On Mon, Apr 28, 2025 at 09:22:55AM -0600, Keith Busch wrote:
> > > Sounds like this code path isn't getting tested all that much if this stuck
> > > around for so long..
> >
> > The conditions that trigger PCIe errors were the primary concern. Of
> > course you'll get IO errors, right? The PCIe connection is flakey! But
> > we are supposed to retry IOs after recovery, which we weren't doing, and
> > that was a secondary concern I embarrassingly overlooked for many
> > reports.
>
> We can still take this, right?
Absolutely. Just wondering how this managed to slip through.
* Re: [PATCH] nvme-pci: fix queue unquiesce check on slot_reset
2025-04-24 17:18 [PATCH] nvme-pci: fix queue unquiesce check on slot_reset Keith Busch
2025-04-25 13:19 ` Christoph Hellwig
@ 2025-04-29 13:10 ` Christoph Hellwig
1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2025-04-29 13:10 UTC
To: Keith Busch; +Cc: linux-nvme, hch, Keith Busch, Dhankaran Singh Ajravat
Thanks,
added to nvme-6.15.