* [PATCH 0/2] scsi: smartpqi: fix PCIe hot reset recovery
@ 2026-05-06 14:01 Mateusz Nowicki
From: Mateusz Nowicki @ 2026-05-06 14:01 UTC
To: don.brace
Cc: martin.petersen, James.Bottomley, storagedev, linux-scsi,
linux-kernel
A PCIe bus reset (e.g. "echo 1 > /sys/bus/pci/devices/<bdf>/reset") on a
controller without FLR support leaves the HPE SR932i-p Gen10+ unusable
until reboot: smartpqi registers no pci_error_handlers, so the driver is
never notified. Meanwhile, firmware reverts to SIS mode and drops all
queue mappings while the driver keeps driving the controller in PQI mode.
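You can check from user space whether a controller falls into the "without FLR support" case. Recent kernels expose a reset_method attribute next to reset in sysfs. It lists the supported reset methods, space-separated, in the order the kernel tries them (e.g. "flr bus"). The helper below is a hypothetical sketch and is not part of this series. Its function names are illustrative, and it treats both "flr" and "af_flr" as function-level reset.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if tok names a function-level reset method. */
static bool is_flr_method(const char *tok)
{
	return strcmp(tok, "flr") == 0 || strcmp(tok, "af_flr") == 0;
}

/*
 * Return true if the space-separated method list in buf (the contents of
 * a reset_method attribute) includes a function-level reset. Whole tokens
 * are compared rather than substrings.
 */
static bool methods_include_flr(const char *buf)
{
	char copy[256];
	char *tok, *save = NULL;

	snprintf(copy, sizeof(copy), "%s", buf);
	for (tok = strtok_r(copy, " \t\n", &save); tok;
	     tok = strtok_r(NULL, " \t\n", &save)) {
		if (is_flr_method(tok))
			return true;
	}
	return false;
}

/*
 * Read a reset_method file such as /sys/bus/pci/devices/<bdf>/reset_method.
 * Returns 1 if a function-level reset is listed, 0 if not, and -1 if the
 * file cannot be opened or read.
 */
static int device_has_flr(const char *path)
{
	char buf[256];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return methods_include_flr(buf) ? 1 : 0;
}
```

A return of 0 means no function-level method is listed, so another method such as "bus" would be used. This is the case the series targets. A return of -1 also covers devices that expose no reset_method attribute at all.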
Patch 1 adds .reset_prepare / .reset_done reusing
pqi_ofa_ctrl_quiesce() / _unquiesce() / pqi_ctrl_init_resume().
Patch 2 raises SIS_CTRL_READY_RESUME_TIMEOUT_SECS from 90s to 180s,
matching the cold-boot path. Without it, patch 1 fails at the SIS
ready check because firmware boot after reset takes ~125s on the
SR932i-p Gen10+.
Tested on HPE SR932i-p Gen10+ against Linus' master at 74fe02ce122a.
Note: the From: header is my Posteo address because my employer's SMTP
is unavailable for external mailing lists. The Signed-off-by carries
the Microchip attribution.
Mateusz Nowicki (2):
scsi: smartpqi: add pci_error_handlers for bus reset recovery
scsi: smartpqi: increase SIS ctrl ready resume timeout to 180s
drivers/scsi/smartpqi/smartpqi_init.c | 47 +++++++++++++++++++++++++++
drivers/scsi/smartpqi/smartpqi_sis.c | 2 +-
2 files changed, 48 insertions(+), 1 deletion(-)
--
2.43.0
* [PATCH 1/2] scsi: smartpqi: add pci_error_handlers for bus reset recovery
From: Mateusz Nowicki @ 2026-05-06 14:01 UTC
To: don.brace
Cc: martin.petersen, James.Bottomley, storagedev, linux-scsi,
linux-kernel, Mateusz Nowicki
The smartpqi driver does not register pci_error_handlers. When the PCI
subsystem performs a bus reset (e.g.
"echo 1 > /sys/bus/pci/devices/<bdf>/reset") on a controller without FLR
support, the driver is not notified. Firmware reverts to SIS mode and
drops the admin and operational queue mappings while the driver still
believes PQI is active, so SCSI I/O hangs until reboot.
Add .reset_prepare and .reset_done callbacks reusing the existing
SIS -> PQI recovery helpers.
reset_prepare:
- pqi_wait_until_ofa_finished()
- pqi_ofa_ctrl_quiesce()
- clear controller_online and pqi_mode_enabled
reset_done:
- ssleep(PQI_POST_RESET_DELAY_SECS)
- pqi_ofa_ctrl_unquiesce()
- pqi_ctrl_init_resume() to drive SIS -> PQI, recreate queues,
re-enable events and rescan
- pqi_take_ctrl_offline() on failure
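A small user-space model can illustrate the ordering above. Everything in it is hypothetical, and no smartpqi or PCI core code is involved. It tracks only two things: the mode the firmware is running and the mode the driver believes is active. Without the callbacks, the two diverge after a bus reset. With prepare/done, they stay consistent, and the controller is left offline if the firmware does not come up in time.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: none of these names exist in smartpqi. */
enum fw_mode { FW_SIS, FW_PQI };

struct model_ctrl {
	enum fw_mode fw;  /* mode the firmware is actually running */
	bool driver_pqi;  /* driver's view, cf. pqi_mode_enabled */
	bool online;      /* cf. controller_online */
	bool quiesced;    /* I/O and scans blocked */
};

/* Mirrors the reset_prepare steps: quiesce, then mark offline / not PQI. */
static void model_reset_prepare(struct model_ctrl *c)
{
	c->quiesced = true;
	c->online = false;
	c->driver_pqi = false;
}

/* The reset itself: firmware restarts in SIS mode, queue mappings are lost. */
static void model_bus_reset(struct model_ctrl *c)
{
	c->fw = FW_SIS;
}

/*
 * Mirrors the reset_done steps: unquiesce, then attempt SIS -> PQI.
 * fw_ready_in_time stands in for the SIS ready check succeeding.
 */
static int model_reset_done(struct model_ctrl *c, bool fw_ready_in_time)
{
	c->quiesced = false;
	if (!fw_ready_in_time) {
		c->online = false;	/* take the controller offline */
		return -ETIMEDOUT;
	}
	c->fw = FW_PQI;			/* queues recreated, PQI re-enabled */
	c->driver_pqi = true;
	c->online = true;
	return 0;
}

/* True when driver and firmware agree on whether PQI is active. */
static bool model_consistent(const struct model_ctrl *c)
{
	return c->driver_pqi == (c->fw == FW_PQI);
}
```

In the patch, "quiesced" corresponds to pqi_ofa_ctrl_quiesce()/_unquiesce(), and the success path corresponds to pqi_ctrl_init_resume(). The model deliberately ignores timing.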
No new helpers or exports. Tested on HPE SR932i-p Gen10+.
Signed-off-by: Mateusz Nowicki <mateusz.nowicki@microchip.com>
---
drivers/scsi/smartpqi/smartpqi_init.c | 47 +++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
index 2026ac645d6a..c4003d3cda7e 100644
--- a/drivers/scsi/smartpqi/smartpqi_init.c
+++ b/drivers/scsi/smartpqi/smartpqi_init.c
@@ -10677,12 +10677,59 @@ static const struct pci_device_id pqi_pci_id_table[] = {
MODULE_DEVICE_TABLE(pci, pqi_pci_id_table);
+static void pqi_reset_prepare(struct pci_dev *pci_dev)
+{
+ struct pqi_ctrl_info *ctrl_info = pci_get_drvdata(pci_dev);
+
+ if (!ctrl_info)
+ return;
+
+ dev_info(&pci_dev->dev, "PCI reset prepare\n");
+
+ pqi_wait_until_ofa_finished(ctrl_info);
+
+ pqi_ofa_ctrl_quiesce(ctrl_info);
+
+ ctrl_info->controller_online = false;
+ ctrl_info->pqi_mode_enabled = false;
+}
+
+static void pqi_reset_done(struct pci_dev *pci_dev)
+{
+ int rc;
+ struct pqi_ctrl_info *ctrl_info = pci_get_drvdata(pci_dev);
+
+ if (!ctrl_info)
+ return;
+
+ dev_info(&pci_dev->dev, "PCI reset done - reinitializing\n");
+
+ ssleep(PQI_POST_RESET_DELAY_SECS);
+
+ pqi_ofa_ctrl_unquiesce(ctrl_info);
+
+ rc = pqi_ctrl_init_resume(ctrl_info);
+ if (rc) {
+ dev_err(&pci_dev->dev, "reset recovery failed: %d\n", rc);
+ pqi_take_ctrl_offline(ctrl_info, PQI_FIRMWARE_KERNEL_NOT_UP);
+ return;
+ }
+
+ dev_info(&pci_dev->dev, "reset recovery complete\n");
+}
+
+static const struct pci_error_handlers pqi_pci_error_handlers = {
+ .reset_prepare = pqi_reset_prepare,
+ .reset_done = pqi_reset_done,
+};
+
static struct pci_driver pqi_pci_driver = {
.name = DRIVER_NAME_SHORT,
.id_table = pqi_pci_id_table,
.probe = pqi_pci_probe,
.remove = pqi_pci_remove,
.shutdown = pqi_shutdown,
+ .err_handler = &pqi_pci_error_handlers,
#if defined(CONFIG_PM)
.driver = {
.pm = &pqi_pm_ops
--
2.43.0
* [PATCH 2/2] scsi: smartpqi: increase SIS ctrl ready resume timeout to 180s
From: Mateusz Nowicki @ 2026-05-06 14:01 UTC
To: don.brace
Cc: martin.petersen, James.Bottomley, storagedev, linux-scsi,
linux-kernel, Mateusz Nowicki
After a PCIe hot reset, firmware boot can exceed the 90 second timeout
in sis_wait_for_ctrl_ready_resume(). On the HPE SR932i-p Gen10+, boot
takes ~125s, so pqi_ctrl_init_resume() fails with -ETIMEDOUT:
smartpqi 0000:84:00.0: PCI reset prepare
smartpqi 0000:84:00.0: PCI reset done - reinitializing
smartpqi 0000:84:00.0: controller not ready after 90 seconds
smartpqi 0000:84:00.0: reset recovery failed: -110
Match SIS_CTRL_READY_TIMEOUT_SECS (180s) used on the cold-boot path.
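The failure above follows from simple arithmetic on the poll loop. A 90 s limit expires about 35 s before a ~125 s firmware boot completes, while 180 s leaves roughly 55 s of margin. The sketch below models a poll-until-ready loop on a virtual clock, polling every SIS_CTRL_READY_POLL_INTERVAL_MSECS (10 ms). It assumes the wait reads a ready indication once per interval until a deadline. The function names are hypothetical and are not the driver's sis_wait_for_ctrl_ready_resume().

```c
#include <errno.h>
#include <stdbool.h>

/* Value from smartpqi_sis.c; everything else here is a hypothetical model. */
#define POLL_INTERVAL_MSECS 10

/* Simulated firmware: reports ready once boot_msecs have elapsed. */
static bool fw_ready(unsigned long now_msecs, unsigned long boot_msecs)
{
	return now_msecs >= boot_msecs;
}

/*
 * Poll every POLL_INTERVAL_MSECS of virtual time until the firmware is
 * ready or timeout_secs elapse. Returns 0 on success, -ETIMEDOUT otherwise.
 * *polls receives the number of status reads performed.
 */
static int wait_for_ready(unsigned int timeout_secs, unsigned long boot_msecs,
			  unsigned long *polls)
{
	unsigned long now = 0;
	unsigned long deadline = timeout_secs * 1000UL;

	*polls = 0;
	for (;;) {
		(*polls)++;
		if (fw_ready(now, boot_msecs))
			return 0;
		if (now >= deadline)
			return -ETIMEDOUT;
		now += POLL_INTERVAL_MSECS;	/* a real driver would sleep here */
	}
}
```

With a simulated 125 s boot, wait_for_ready(90, 125000, &polls) gives up with -ETIMEDOUT after 9001 reads. wait_for_ready(180, 125000, &polls) succeeds after 12501 reads.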
Signed-off-by: Mateusz Nowicki <mateusz.nowicki@microchip.com>
---
drivers/scsi/smartpqi/smartpqi_sis.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/smartpqi/smartpqi_sis.c b/drivers/scsi/smartpqi/smartpqi_sis.c
index ae5a264d062d..df06302cec38 100644
--- a/drivers/scsi/smartpqi/smartpqi_sis.c
+++ b/drivers/scsi/smartpqi/smartpqi_sis.c
@@ -58,7 +58,7 @@
#define SIS_CTRL_KERNEL_UP 0x80
#define SIS_CTRL_KERNEL_PANIC 0x100
#define SIS_CTRL_READY_TIMEOUT_SECS 180
-#define SIS_CTRL_READY_RESUME_TIMEOUT_SECS 90
+#define SIS_CTRL_READY_RESUME_TIMEOUT_SECS 180
#define SIS_CTRL_READY_POLL_INTERVAL_MSECS 10
enum sis_fw_triage_status {
--
2.43.0
* Re: [PATCH 0/2] scsi: smartpqi: fix PCIe hot reset recovery
From: Laurence Oberman @ 2026-05-06 22:21 UTC
To: Mateusz Nowicki, don.brace
Cc: martin.petersen, James.Bottomley, storagedev, linux-scsi,
linux-kernel
On Wed, 2026-05-06 at 14:01 +0000, Mateusz Nowicki wrote:
> [...]
Hello
I was able to reproduce this, so I am testing the patches as well.
They look correct to me; I will reply again with a review after
testing.
Thanks
Laurence
[2513778.140012] smartpqi 0000:64:00.0: no heartbeat detected - last heartbeat count: 4207808511
[2513778.140031] smartpqi 0000:64:00.0: controller offline: reason code 0x4 (no controller heartbeat detected)
[2513778.141346] sd 1:0:0:0: [sda] tag#549 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=18s
[2513778.141355] sd 1:0:0:0: [sda] tag#550 FAILED Result:
"xfs_buf_ioend_handle_error+0xd5/0x3f0 [xfs]" at daddr 0x9f78 len 8 error 5
[2513778.141526] XFS (dm-0): log I/O error -5
* Re: [PATCH 0/2] scsi: smartpqi: fix PCIe hot reset recovery
From: Laurence Oberman @ 2026-05-07 1:45 UTC
To: Mateusz Nowicki, don.brace
Cc: martin.petersen, James.Bottomley, storagedev, linux-scsi,
linux-kernel
On Wed, 2026-05-06 at 18:21 -0400, Laurence Oberman wrote:
> [...]
Hello
For the series:
I tested the patches and it recovers with them applied.
The patches look good.
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>