* [PATCH v2 0/1] nvme: Fix problem when booting from NVMe drive was leading to a hang.
From: Michael Kropaczek @ 2024-03-04 18:25 UTC
To: linux-nvme
Cc: Michael Kropaczek, Keith Busch, Jens Axboe, Christoph Hellwig,
Sagi Grimberg
Description:
During an endurance test in which a system was repeatedly rebooted from
an NVMe drive, the boot process occasionally hung. The test was set to
1000 reboot cycles at 120 s intervals; the hang occurred after ~300
cycles. Investigation showed that the NVMe driver did not disable the
host memory buffer during shutdown, leaving the NVMe controller in a
state that prevented proper initialization in the BIOS pre-boot stage.
Adding a call to nvme_set_host_mem(dev, 0) in the shutdown path fixed
the issue.
Michael Kropaczek (1):
nvme: Fix problem when booting from NVMe drive was leading to a hang.
drivers/nvme/host/pci.c | 8 ++++++++
1 file changed, 8 insertions(+)
base-commit: 8d30528a170905ede9ab6ab81f229e441808590b
--
2.34.1
From 9eec234181015af624d8e5cd8670ba5d82d0ce7e Mon Sep 17 00:00:00 2001
From: Michael Kropaczek <michael.kropaczek@solidigm.com>
Date: Thu, 29 Feb 2024 15:33:27 -0800
Subject: [PATCH v2 1/1] nvme: Fix problem when booting from NVMe drive was
leading to a hang.
To: linux-nvme@lists.infradead.org
Cc: Keith Busch <kbusch@kernel.org>,
Jens Axboe <axboe@fb.com>,
Christoph Hellwig <hch@lst.de>,
Sagi Grimberg <sagi@grimberg.me>,
Michael Kropaczek <michael.kropaczek@solidigm.com>
On certain host architectures/HW, DRAM was keeping memory contents over reboot
cycles. Certain NVMe controllers were accessing host memory after startup which
led to undefined state, preventing proper initialization in BIOS boot stage.
Freeing host memory during host's shutdown prevents the problem from occurring.
Signed-off-by: Michael Kropaczek <michael.kropaczek@solidigm.com>
---
drivers/nvme/host/pci.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e6267a6aa380..e5292c7b301f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2593,6 +2593,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 			nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
 	}
 
+	/*
+	 * On certain host architectures/HW, DRAM was keeping memory contents over reboot-cycles.
+	 * It was observed that certain controllers were accessing host memory after
+	 * resetting which led to undefined state preventing proper initialization.
+	 */
+	if (shutdown && dev->hmb)
+		nvme_set_host_mem(dev, 0);
+
 	nvme_quiesce_io_queues(&dev->ctrl);
 
 	if (!dead && dev->ctrl.queue_count > 0) {
--
2.34.1
* Re: [PATCH v2 0/1] nvme: Fix problem when booting from NVMe drive was leading to a hang.
From: Christoph Hellwig @ 2024-03-05 13:51 UTC
To: Michael Kropaczek
Cc: linux-nvme, Keith Busch, Jens Axboe, Christoph Hellwig,
Sagi Grimberg
On Mon, Mar 04, 2024 at 10:25:07AM -0800, Michael Kropaczek wrote:
> Description:
>
> During an endurance test in which a system was repeatedly rebooted from
> an NVMe drive, the boot process occasionally hung. The test was set to
> 1000 reboot cycles at 120 s intervals; the hang occurred after ~300
> cycles. Investigation showed that the NVMe driver did not disable the
> host memory buffer during shutdown, leaving the NVMe controller in a
> state that prevented proper initialization in the BIOS pre-boot stage.
> Adding a call to nvme_set_host_mem(dev, 0) in the shutdown path fixed
> the issue.
Something odd is going on with your patch submissions, as you seem
to send a single mail with a cover letter and the actual patch.
That's why I missed the last one as I've been waiting for the actual
patch mail which never arrived.
> + /*
> + * On certain host architectures/HW, DRAM was keeping memory contents over reboot-cycles.
> + * It was observed that certain controllers were accessing host memory after
> + * resetting which led to undefined state preventing proper initialization.
> + */
Block comments should never span 80 characters. But more importantly
I don't even think we need this comment at all, this is a clear bug
fix and the code is self-describing.
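For reference, a user-space sketch of the guard with the block comment
dropped, as suggested above. The mock_* names and the disable_calls
counter are hypothetical stand-ins, not driver symbols; the real
nvme_set_host_mem() issues a Set Features command, which the stub here
only records. dev->hmb is assumed to behave as a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nvme_dev: only what the guard reads. */
struct mock_nvme_dev {
	bool hmb;                   /* assumed: true while an HMB is enabled */
	unsigned int disable_calls; /* bookkeeping for this sketch only */
};

/*
 * Stub for nvme_set_host_mem(dev, 0). It records the request instead of
 * sending a command to a controller.
 */
static int mock_set_host_mem(struct mock_nvme_dev *dev, unsigned int bits)
{
	if (bits == 0)
		dev->disable_calls++;
	return 0;
}

/* The two added lines from the patch, without the comment. */
static void mock_dev_disable(struct mock_nvme_dev *dev, bool shutdown)
{
	if (shutdown && dev->hmb)
		mock_set_host_mem(dev, 0);
}
```

The disable request is only made on the shutdown path and only when a
host memory buffer is in use; a plain reset (shutdown == false) leaves
it alone.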
* Re: [PATCH v2 0/1] nvme: Fix problem when booting from NVMe drive was leading to a hang.
From: Keith Busch @ 2024-03-05 15:17 UTC
To: Christoph Hellwig
Cc: Michael Kropaczek, linux-nvme, Jens Axboe, Sagi Grimberg
On Tue, Mar 05, 2024 at 02:51:00PM +0100, Christoph Hellwig wrote:
> > + /*
> > + * On certain host architectures/HW, DRAM was keeping memory contents over reboot-cycles.
> > + * It was observed that certain controllers were accessing host memory after
> > + * resetting which led to undefined state preventing proper initialization.
> > + */
>
> Block comments should never span 80 characters. But more importantly
> I don't even think we need this comment at all, this is a clear bug
> fix and the code is self-describing.
It sounds like a firmware bug. Spec says:
"The host memory resources are not persistent in the controller across
a reset event."
Exiting a shutdown state requires a CC.EN transition, which is a reset
event.
It still may be good practice for the host driver to explicitly disable
host memory, but as I said last time, doing this in the shutdown path
deadlocks if the drive fails to produce a response, which is why we
removed it in the first place.
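A toy model of the spec statement quoted above, as a sketch only. All
toy_* names are hypothetical, and the "conforming" flag is an assumption
used to contrast a controller that drops its host memory state on a
reset event with one that keeps it (the suspected firmware bug). It does
not model the shutdown-path deadlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified controller state. */
struct toy_ctrl {
	bool enabled;    /* models CC.EN */
	bool hmb_active; /* controller believes it may use host memory */
};

/* CC.EN 0 -> 1 */
static void toy_enable(struct toy_ctrl *c)
{
	c->enabled = true;
}

/* Host enables or disables the host memory buffer (enabled ctrl only). */
static void toy_set_hmb(struct toy_ctrl *c, bool on)
{
	if (c->enabled)
		c->hmb_active = on;
}

/*
 * CC.EN 1 -> 0, a reset event. A conforming controller forgets the
 * host memory resources; the non-conforming one (assumed) does not.
 */
static void toy_reset(struct toy_ctrl *c, bool conforming)
{
	c->enabled = false;
	if (conforming)
		c->hmb_active = false;
}
```

In this model an explicit toy_set_hmb(c, false) before the reset gives
the same end state for both controllers, which is why a host-side
disable can mask the firmware behaviour.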
* RE: [PATCH v2 0/1] nvme: Fix problem when booting from NVMe drive was leading to a hang.
From: Michael Kropaczek @ 2024-03-05 17:49 UTC
To: Keith Busch, Christoph Hellwig
Cc: linux-nvme@lists.infradead.org, Jens Axboe, Sagi Grimberg
Thank you, Keith, Christoph,
The comment will be removed in the next version,
Michael