[PATCH 0/5] PCI: Reduce AER / EEH deviations

linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/5] PCI: Reduce AER / EEH deviations
@ 2025-08-13  5:11 Lukas Wunner
  2025-08-13  5:11 ` [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors Lukas Wunner
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Lukas Wunner @ 2025-08-13  5:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Sathyanarayanan Kuppuswamy, Niklas Schnelle, Linas Vepstas,
	Mahesh J Salgaonkar, Oliver OHalloran, Manivannan Sadhasivam,
	linuxppc-dev, linux-pci, Shahed Shaikh, Manish Chopra,
	GR-Linux-NIC-Dev, Nilesh Javali, GR-QLogic-Storage-Upstream,
	Edward Cree, linux-net-drivers, James Smart, Dick Kennedy,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

The kernel supports three different PCI error recovery mechanisms:

* AER per PCIe r7.0 sec 6.2 (drivers/pci/pcie/aer.c + err.c)
* EEH on PowerPC (arch/powerpc/kernel/eeh_driver.c)
* zPCI on s390 (arch/s390/pci/pci_event.c)

In theory, they should all follow Documentation/PCI/pci-error-recovery.rst
to afford uniform behavior to drivers across platforms.

In practice, there are deviations which this series seeks to reduce.

One particular pain point is AER not allowing drivers to opt in to a
Bus Reset on Non-Fatal Errors (patch [1/5]).  EEH allows this and the
"xe" graphics driver would like to take advantage of it on AER-capable
platforms.  Patches [2/5] to [4/5] address various other deviations,
while patch [5/5] cleans up old gunk in code comments.

I've gone through all drivers implementing pci_error_handlers to ascertain
that no regressions are introduced by these changes.  Nevertheless further
reviewing and testing would be appreciated to raise the confidence.
Thanks!

Lukas Wunner (5):
  PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
  PCI/ERR: Fix uevent on failure to recover
  PCI/ERR: Notify drivers on failure to recover
  PCI/ERR: Update device error_state already after reset
  PCI/ERR: Remove remnants of .link_reset() callback

 .../ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c   |  1 -
 .../net/ethernet/qlogic/qlcnic/qlcnic_main.c  |  2 -
 drivers/net/ethernet/sfc/efx_common.c         |  3 --
 drivers/net/ethernet/sfc/falcon/efx.c         |  3 --
 drivers/net/ethernet/sfc/siena/efx_common.c   |  3 --
 drivers/pci/pcie/err.c                        | 40 ++++++++++++++-----
 drivers/scsi/lpfc/lpfc_init.c                 |  2 +-
 drivers/scsi/qla2xxx/qla_os.c                 |  5 ---
 8 files changed, 32 insertions(+), 27 deletions(-)

-- 
2.47.2

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
  2025-08-13  5:11 [PATCH 0/5] PCI: Reduce AER / EEH deviations Lukas Wunner
@ 2025-08-13  5:11 ` Lukas Wunner
  2025-08-13 23:01   ` Sathyanarayanan Kuppuswamy
                     ` (2 more replies)
  2025-08-13  5:11 ` [PATCH 2/5] PCI/ERR: Fix uevent on failure to recover Lukas Wunner
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 22+ messages in thread
From: Lukas Wunner @ 2025-08-13  5:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Sathyanarayanan Kuppuswamy, Niklas Schnelle, Linas Vepstas,
	Mahesh J Salgaonkar, Oliver OHalloran, Manivannan Sadhasivam,
	linuxppc-dev, linux-pci

When Advanced Error Reporting was introduced in September 2006 by commit
6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver"), it
sought to adhere to the recovery flow and callbacks specified in
Documentation/PCI/pci-error-recovery.rst.

That document had been added in January 2006, when Enhanced Error Handling
(EEH) was introduced for PowerPC with commit 065c6359071c ("[PATCH] PCI
Error Recovery: documentation").

However the AER driver deviates from the document in that it never
performs a Secondary Bus Reset on Non-Fatal Errors, but always on Fatal
Errors.  By contrast, EEH allows drivers to opt in or out of a Bus Reset
regardless of error severity, by returning PCI_ERS_RESULT_NEED_RESET or
PCI_ERS_RESULT_CAN_RECOVER from their ->error_detected() callback.  If all
drivers agree that they can recover without a Bus Reset, EEH skips it.
Should one of them request a Bus Reset, it overrides all other drivers.

This inconsistency between EEH and AER seems problematic because drivers
need to be aware of and cope with it.

The file Documentation/PCI/pcieaer-howto.rst hints at a rationale for
always performing a Bus Reset on Fatal Errors:  "Fatal errors [...] cause
the link to be unreliable.  [...] This [reset_link] callback is used to
reset the PCIe physical link when a fatal error happens.  If an error
message indicates a fatal error, [...] performing link reset at upstream
is necessary."

There's no such rationale provided for never performing a Bus Reset on
Non-Fatal Errors.

The "xe" driver has a need to attempt a reset of local units on graphics
cards upon a Non-Fatal Error.  If that is insufficient for recovery, the
driver wants to opt in to a Bus Reset.

Accommodate such use cases and align AER more closely with EEH by
performing a Bus Reset in pcie_do_recovery() if drivers request it and the
faulting device's channel_state is pci_channel_io_normal.  The AER driver
sets this channel_state for Non-Fatal Errors.  For Fatal Errors, it uses
pci_channel_io_frozen.

This limits the deviation from Documentation/PCI/pci-error-recovery.rst
and EEH to the unconditional Bus Reset on Fatal Errors.

pcie_do_recovery() is also invoked by the Downstream Port Containment and
Error Disconnect Recover drivers.  They both set the channel_state to
pci_channel_io_frozen, hence pcie_do_recovery() continues to always invoke
the ->reset_subordinates() callback in their case.  That is necessary
because the callback brings the link back up at the containing Downstream
Port.

There are two behavioral changes resulting from this commit:

First, if channel_state is pci_channel_io_normal and one of the affected
drivers returns PCI_ERS_RESULT_NEED_RESET from its ->error_detected()
callback, a Bus Reset will now be performed.  There are drivers doing this
and although it would be possible to avoid a behavioral change by letting
them return PCI_ERS_RESULT_CAN_RECOVER instead, the impression I got from
examination of all drivers is that they actually expect or want a Bus
Reset (cxl_error_detected() is a case in point).  In any case, if they can
cope with a Bus Reset on Fatal Errors, they shouldn't have issues with a
Bus Reset on Non-Fatal Errors.

Second, if channel_state is pci_channel_io_frozen and all affected drivers
return PCI_ERS_RESULT_CAN_RECOVER from ->error_detected(), their
->mmio_enabled() callback is now invoked prior to performing a Bus Reset,
instead of afterwards.  This actually makes sense:  For example,
drivers/scsi/sym53c8xx_2/sym_glue.c dumps debug registers in its
->mmio_enabled() callback.  Doing so after reset right now captures the
post-reset state instead of the faulting state, which is useless.

There is only one other driver which implements ->mmio_enabled() and
returns PCI_ERS_RESULT_CAN_RECOVER from ->error_detected() for
channel_state pci_channel_io_frozen, drivers/scsi/ipr.c (IBM Power RAID).
It appears to only be used on EEH platforms.  So the second behavioral
change is limited to these two drivers.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pcie/err.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index de6381c690f5..e795e5ae6b03 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -217,15 +217,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL);

 	pci_dbg(bridge, "broadcast error_detected message\n");
-	if (state == pci_channel_io_frozen) {
+	if (state == pci_channel_io_frozen)
 		pci_walk_bridge(bridge, report_frozen_detected, &status);
-		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
-			pci_warn(bridge, "subordinate device reset failed\n");
-			goto failed;
-		}
-	} else {
+	else
 		pci_walk_bridge(bridge, report_normal_detected, &status);
-	}

 	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
 		status = PCI_ERS_RESULT_RECOVERED;
@@ -233,6 +228,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 		pci_walk_bridge(bridge, report_mmio_enabled, &status);
 	}

+	if (status == PCI_ERS_RESULT_NEED_RESET ||
+	    state == pci_channel_io_frozen) {
+		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
+			pci_warn(bridge, "subordinate device reset failed\n");
+			goto failed;
+		}
+	}
+
 	if (status == PCI_ERS_RESULT_NEED_RESET) {
 		/*
 		 * TODO: Should call platform-specific
-- 
2.47.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/5] PCI/ERR: Fix uevent on failure to recover
  2025-08-13  5:11 [PATCH 0/5] PCI: Reduce AER / EEH deviations Lukas Wunner
  2025-08-13  5:11 ` [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors Lukas Wunner
@ 2025-08-13  5:11 ` Lukas Wunner
  2025-08-13 23:01   ` Sathyanarayanan Kuppuswamy
  2025-08-14  7:08   ` Niklas Schnelle
  2025-08-13  5:11 ` [PATCH 3/5] PCI/ERR: Notify drivers " Lukas Wunner
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 22+ messages in thread
From: Lukas Wunner @ 2025-08-13  5:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Sathyanarayanan Kuppuswamy, Niklas Schnelle, Linas Vepstas,
	Mahesh J Salgaonkar, Oliver OHalloran, Manivannan Sadhasivam,
	linuxppc-dev, linux-pci

Upon failure to recover from a PCIe error through AER, DPC or EDR, a
uevent is sent to inform user space about disconnection of the bridge
whose subordinate devices failed to recover.

However the bridge itself is not disconnected.  Instead, a uevent should
be sent for each of the subordinate devices.

Only if the "bridge" happens to be a Root Complex Event Collector or
Integrated Endpoint does it make sense to send a uevent for it (because
there are no subordinate devices).

Right now if there is a mix of subordinate devices with and without
pci_error_handlers, a BEGIN_RECOVERY event is sent for those with
pci_error_handlers but no FAILED_RECOVERY event is ever sent for them
afterwards.  Fix it.

Fixes: 856e1eb9bdd4 ("PCI/AER: Add uevents in AER and EEH error/resume")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org  # v4.16+
---
 drivers/pci/pcie/err.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index e795e5ae6b03..21d554359fb1 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -108,6 +108,12 @@ static int report_normal_detected(struct pci_dev *dev, void *data)
 	return report_error_detected(dev, pci_channel_io_normal, data);
 }
 
+static int report_perm_failure_detected(struct pci_dev *dev, void *data)
+{
+	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
+	return 0;
+}
+
 static int report_mmio_enabled(struct pci_dev *dev, void *data)
 {
 	struct pci_driver *pdrv;
@@ -272,7 +278,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 failed:
 	pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);
 
-	pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT);
+	pci_walk_bridge(bridge, report_perm_failure_detected, NULL);
 
 	pci_info(bridge, "device recovery failed\n");
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/5] PCI/ERR: Notify drivers on failure to recover
  2025-08-13  5:11 [PATCH 0/5] PCI: Reduce AER / EEH deviations Lukas Wunner
  2025-08-13  5:11 ` [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors Lukas Wunner
  2025-08-13  5:11 ` [PATCH 2/5] PCI/ERR: Fix uevent on failure to recover Lukas Wunner
@ 2025-08-13  5:11 ` Lukas Wunner
  2025-08-13 23:05   ` Sathyanarayanan Kuppuswamy
  2025-08-13  5:11 ` [PATCH 4/5] PCI/ERR: Update device error_state already after reset Lukas Wunner
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2025-08-13  5:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Sathyanarayanan Kuppuswamy, Niklas Schnelle, Linas Vepstas,
	Mahesh J Salgaonkar, Oliver OHalloran, Manivannan Sadhasivam,
	linuxppc-dev, linux-pci

According to Documentation/PCI/pci-error-recovery.rst, the following shall
occur on failure to recover from a PCIe Uncorrectable Error:

  STEP 6: Permanent Failure
  -------------------------
  A "permanent failure" has occurred, and the platform cannot recover
  the device.  The platform will call error_detected() with a
  pci_channel_state_t value of pci_channel_io_perm_failure.

  The device driver should, at this point, assume the worst. It should
  cancel all pending I/O, refuse all new I/O, returning -EIO to
  higher layers. The device driver should then clean up all of its
  memory and remove itself from kernel operations, much as it would
  during system shutdown.

Sathya notes that AER does not call error_detected() on failure and thus
deviates from the document (as well as EEH, for which the document was
originally added).

Most drivers do nothing on permanent failure, but the SCSI drivers and a
number of Ethernet drivers do take advantage of the notification to flush
queues and give up resources.

Amend AER to notify such drivers and align with the documentation and EEH.

Link: https://lore.kernel.org/r/f496fc0f-64d7-46a4-8562-dba74e31a956@linux.intel.com/
Suggested-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pcie/err.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 21d554359fb1..930bb60fb761 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -110,7 +110,19 @@ static int report_normal_detected(struct pci_dev *dev, void *data)
 
 static int report_perm_failure_detected(struct pci_dev *dev, void *data)
 {
+	struct pci_driver *pdrv;
+	const struct pci_error_handlers *err_handler;
+
+	device_lock(&dev->dev);
+	pdrv = dev->driver;
+	if (!pdrv || !pdrv->err_handler || !pdrv->err_handler->error_detected)
+		goto out;
+
+	err_handler = pdrv->err_handler;
+	err_handler->error_detected(dev, pci_channel_io_perm_failure);
+out:
 	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
+	device_unlock(&dev->dev);
 	return 0;
 }
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/5] PCI/ERR: Update device error_state already after reset
  2025-08-13  5:11 [PATCH 0/5] PCI: Reduce AER / EEH deviations Lukas Wunner
                   ` (2 preceding siblings ...)
  2025-08-13  5:11 ` [PATCH 3/5] PCI/ERR: Notify drivers " Lukas Wunner
@ 2025-08-13  5:11 ` Lukas Wunner
  2025-08-13 23:43   ` Sathyanarayanan Kuppuswamy
  2025-08-13  5:11 ` [PATCH 5/5] PCI/ERR: Remove remnants of .link_reset() callback Lukas Wunner
  2025-08-13 18:21 ` [PATCH 0/5] PCI: Reduce AER / EEH deviations Bjorn Helgaas
  5 siblings, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2025-08-13  5:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Sathyanarayanan Kuppuswamy, Niklas Schnelle, Linas Vepstas,
	Mahesh J Salgaonkar, Oliver OHalloran, Manivannan Sadhasivam,
	linuxppc-dev, linux-pci, Shahed Shaikh, Manish Chopra,
	GR-Linux-NIC-Dev, Nilesh Javali, GR-QLogic-Storage-Upstream,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski <ku 

After a Fatal Error has been reported by a device and has been recovered
through a Secondary Bus Reset, AER updates the device's error_state to
pci_channel_io_normal before invoking its driver's ->resume() callback.

By contrast, EEH updates the error_state earlier, namely after resetting
the device and before invoking its driver's ->slot_reset() callback.
Commit c58dc575f3c8 ("powerpc/pseries: Set error_state to
pci_channel_io_normal in eeh_report_reset()") explains in great detail
that the earlier invocation is necessitated by various drivers checking
accessibility of the device with pci_channel_offline() and avoiding
accesses if it returns true.  It returns true for any other error_state
than pci_channel_io_normal.

The device should be accessible already after reset, hence the reasoning
is that it's safe to update the error_state immediately afterwards.

This deviation between AER and EEH seems problematic because drivers
behave differently depending on which error recovery mechanism the
platform uses.  Three drivers have gone so far as to update the
error_state themselves, presumably to work around AER's behavior.

For consistency, amend AER to update the error_state at the same recovery
steps as EEH.  Drop the now unnecessary workaround from the three drivers.

Keep updating the error_state before ->resume() in case ->error_detected()
or ->mmio_enabled() return PCI_ERS_RESULT_RECOVERED, which causes
->slot_reset() to be skipped.  There are drivers doing this even for Fatal
Errors, e.g. mhi_pci_error_detected().

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 1 -
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c    | 2 --
 drivers/pci/pcie/err.c                              | 3 ++-
 drivers/scsi/qla2xxx/qla_os.c                       | 5 -----
 4 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
index d7cdea8f604d..91e7b38143ea 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
@@ -4215,7 +4215,6 @@ static pci_ers_result_t qlcnic_83xx_io_slot_reset(struct pci_dev *pdev)
 	struct qlcnic_adapter *adapter = pci_get_drvdata(pdev);
 	int err = 0;
 
-	pdev->error_state = pci_channel_io_normal;
 	err = pci_enable_device(pdev);
 	if (err)
 		goto disconnect;
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 53cdd36c4123..e051d8c7a28d 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -3766,8 +3766,6 @@ static int qlcnic_attach_func(struct pci_dev *pdev)
 	struct qlcnic_adapter *adapter = pci_get_drvdata(pdev);
 	struct net_device *netdev = adapter->netdev;
 
-	pdev->error_state = pci_channel_io_normal;
-
 	err = pci_enable_device(pdev);
 	if (err)
 		return err;
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 930bb60fb761..bebe4bc111d7 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -153,7 +153,8 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
 
 	device_lock(&dev->dev);
 	pdrv = dev->driver;
-	if (!pdrv || !pdrv->err_handler || !pdrv->err_handler->slot_reset)
+	if (!pci_dev_set_io_state(dev, pci_channel_io_normal) ||
+	    !pdrv || !pdrv->err_handler || !pdrv->err_handler->slot_reset)
 		goto out;
 
 	err_handler = pdrv->err_handler;
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index d4b484c0fd9d..4460421834cb 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -7883,11 +7883,6 @@ qla2xxx_pci_slot_reset(struct pci_dev *pdev)
 	       "Slot Reset.\n");
 
 	ha->pci_error_state = QLA_PCI_SLOT_RESET;
-	/* Workaround: qla2xxx driver which access hardware earlier
-	 * needs error state to be pci_channel_io_online.
-	 * Otherwise mailbox command timesout.
-	 */
-	pdev->error_state = pci_channel_io_normal;
 
 	pci_restore_state(pdev);
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/5] PCI/ERR: Remove remnants of .link_reset() callback
  2025-08-13  5:11 [PATCH 0/5] PCI: Reduce AER / EEH deviations Lukas Wunner
                   ` (3 preceding siblings ...)
  2025-08-13  5:11 ` [PATCH 4/5] PCI/ERR: Update device error_state already after reset Lukas Wunner
@ 2025-08-13  5:11 ` Lukas Wunner
  2025-08-14  0:40   ` Sathyanarayanan Kuppuswamy
  2025-08-13 18:21 ` [PATCH 0/5] PCI: Reduce AER / EEH deviations Bjorn Helgaas
  5 siblings, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2025-08-13  5:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Sathyanarayanan Kuppuswamy, Niklas Schnelle, Linas Vepstas,
	Mahesh J Salgaonkar, Oliver OHalloran, Manivannan Sadhasivam,
	linuxppc-dev, linux-pci, Edward Cree, linux-net-drivers,
	James Smart, Dick Kennedy, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Back in 2017, commit 2fd260f03b6a ("PCI/AER: Remove unused .link_reset()
callback") removed .link_reset() from struct pci_error_handlers, but left
a few code comments behind which still mention it.  Remove them.

The code comments in the SolarFlare Ethernet drivers point out that no
.mmio_enabled() callback is needed because the driver's .error_detected()
callback always returns PCI_ERS_RESULT_NEED_RESET, which causes
pcie_do_recovery() to skip .mmio_enabled().  That's not quite correct
because efx_io_error_detected() does return PCI_ERS_RESULT_RECOVERED under
certain conditions and then .mmio_enabled() would indeed be called if it
were implemented.  Remove this misleading portion of the code comment as
well.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/net/ethernet/sfc/efx_common.c       | 3 ---
 drivers/net/ethernet/sfc/falcon/efx.c       | 3 ---
 drivers/net/ethernet/sfc/siena/efx_common.c | 3 ---
 drivers/scsi/lpfc/lpfc_init.c               | 2 +-
 4 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx_common.c b/drivers/net/ethernet/sfc/efx_common.c
index 5a14d94163b1..e8fdbb62d872 100644
--- a/drivers/net/ethernet/sfc/efx_common.c
+++ b/drivers/net/ethernet/sfc/efx_common.c
@@ -1258,9 +1258,6 @@ static void efx_io_resume(struct pci_dev *pdev)
 
 /* For simplicity and reliability, we always require a slot reset and try to
  * reset the hardware when a pci error affecting the device is detected.
- * We leave both the link_reset and mmio_enabled callback unimplemented:
- * with our request for slot reset the mmio_enabled callback will never be
- * called, and the link_reset callback is not used by AER or EEH mechanisms.
  */
 const struct pci_error_handlers efx_err_handlers = {
 	.error_detected = efx_io_error_detected,
diff --git a/drivers/net/ethernet/sfc/falcon/efx.c b/drivers/net/ethernet/sfc/falcon/efx.c
index b07f7e4e2877..0c784656fde9 100644
--- a/drivers/net/ethernet/sfc/falcon/efx.c
+++ b/drivers/net/ethernet/sfc/falcon/efx.c
@@ -3128,9 +3128,6 @@ static void ef4_io_resume(struct pci_dev *pdev)
 
 /* For simplicity and reliability, we always require a slot reset and try to
  * reset the hardware when a pci error affecting the device is detected.
- * We leave both the link_reset and mmio_enabled callback unimplemented:
- * with our request for slot reset the mmio_enabled callback will never be
- * called, and the link_reset callback is not used by AER or EEH mechanisms.
  */
 static const struct pci_error_handlers ef4_err_handlers = {
 	.error_detected = ef4_io_error_detected,
diff --git a/drivers/net/ethernet/sfc/siena/efx_common.c b/drivers/net/ethernet/sfc/siena/efx_common.c
index a0966f879664..35036cc902fe 100644
--- a/drivers/net/ethernet/sfc/siena/efx_common.c
+++ b/drivers/net/ethernet/sfc/siena/efx_common.c
@@ -1285,9 +1285,6 @@ static void efx_io_resume(struct pci_dev *pdev)
 
 /* For simplicity and reliability, we always require a slot reset and try to
  * reset the hardware when a pci error affecting the device is detected.
- * We leave both the link_reset and mmio_enabled callback unimplemented:
- * with our request for slot reset the mmio_enabled callback will never be
- * called, and the link_reset callback is not used by AER or EEH mechanisms.
  */
 const struct pci_error_handlers efx_siena_err_handlers = {
 	.error_detected = efx_io_error_detected,
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 4081d2a358ee..cf08bb5b37c3 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -14377,7 +14377,7 @@ lpfc_sli_prep_dev_for_perm_failure(struct lpfc_hba *phba)
  * as desired.
  *
  * Return codes
- * 	PCI_ERS_RESULT_CAN_RECOVER - can be recovered with reset_link
+ *	PCI_ERS_RESULT_CAN_RECOVER - can be recovered without reset
  * 	PCI_ERS_RESULT_NEED_RESET - need to reset before recovery
  * 	PCI_ERS_RESULT_DISCONNECT - device could not be recovered
  **/
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/5] PCI: Reduce AER / EEH deviations
  2025-08-13  5:11 [PATCH 0/5] PCI: Reduce AER / EEH deviations Lukas Wunner
                   ` (4 preceding siblings ...)
  2025-08-13  5:11 ` [PATCH 5/5] PCI/ERR: Remove remnants of .link_reset() callback Lukas Wunner
@ 2025-08-13 18:21 ` Bjorn Helgaas
  5 siblings, 0 replies; 22+ messages in thread
From: Bjorn Helgaas @ 2025-08-13 18:21 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Sathyanarayanan Kuppuswamy, Niklas Schnelle, Linas Vepstas,
	Mahesh J Salgaonkar, Oliver OHalloran, Manivannan Sadhasivam,
	linuxppc-dev, linux-pci, Shahed Shaikh, Manish Chopra,
	GR-Linux-NIC-Dev, Nilesh Javali, GR-QLogic-Storage-Upstream,
	Edward Cree, linux-net-drivers, James Smart, Dick Kennedy,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi,
	er.kernel.org, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev

On Wed, Aug 13, 2025 at 07:11:00AM +0200, Lukas Wunner wrote:
> The kernel supports three different PCI error recovery mechanisms:
> 
> * AER per PCIe r7.0 sec 6.2 (drivers/pci/pcie/aer.c + err.c)
> * EEH on PowerPC (arch/powerpc/kernel/eeh_driver.c)
> * zPCI on s390 (arch/s390/pci/pci_event.c)
> 
> In theory, they should all follow Documentation/PCI/pci-error-recovery.rst
> to afford uniform behavior to drivers across platforms.
> 
> In practice, there are deviations which this series seeks to reduce.
> 
> One particular pain point is AER not allowing drivers to opt in to a
> Bus Reset on Non-Fatal Errors (patch [1/5]).  EEH allows this and the
> "xe" graphics driver would like to take advantage of it on AER-capable
> platforms.  Patches [2/5] to [4/5] address various other deviations,
> while patch [5/5] cleans up old gunk in code comments.
> 
> I've gone through all drivers implementing pci_error_handlers to ascertain
> that no regressions are introduced by these changes.  Nevertheless further
> reviewing and testing would be appreciated to raise the confidence.
> Thanks!
> 
> Lukas Wunner (5):
>   PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
>   PCI/ERR: Fix uevent on failure to recover
>   PCI/ERR: Notify drivers on failure to recover
>   PCI/ERR: Update device error_state already after reset
>   PCI/ERR: Remove remnants of .link_reset() callback
> 
>  .../ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c   |  1 -
>  .../net/ethernet/qlogic/qlcnic/qlcnic_main.c  |  2 -
>  drivers/net/ethernet/sfc/efx_common.c         |  3 --
>  drivers/net/ethernet/sfc/falcon/efx.c         |  3 --
>  drivers/net/ethernet/sfc/siena/efx_common.c   |  3 --
>  drivers/pci/pcie/err.c                        | 40 ++++++++++++++-----
>  drivers/scsi/lpfc/lpfc_init.c                 |  2 +-
>  drivers/scsi/qla2xxx/qla_os.c                 |  5 ---
>  8 files changed, 32 insertions(+), 27 deletions(-)

Applied to pci/aer for v6.18, thanks, Lukas!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
  2025-08-13  5:11 ` [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors Lukas Wunner
@ 2025-08-13 23:01   ` Sathyanarayanan Kuppuswamy
  2025-08-17 13:45     ` Lukas Wunner
  2025-08-14  7:56   ` Niklas Schnelle
  2025-08-17 16:11   ` Sathyanarayanan Kuppuswamy
  2 siblings, 1 reply; 22+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-08-13 23:01 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Niklas Schnelle, Linas Vepstas, Mahesh J Salgaonkar,
	Oliver OHalloran, Manivannan Sadhasivam, linuxppc-dev, linux-pci



On 8/12/25 10:11 PM, Lukas Wunner wrote:
> When Advanced Error Reporting was introduced in September 2006 by commit
> 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver"), it
> sought to adhere to the recovery flow and callbacks specified in
> Documentation/PCI/pci-error-recovery.rst.
>
> That document had been added in January 2006, when Enhanced Error Handling
> (EEH) was introduced for PowerPC with commit 065c6359071c ("[PATCH] PCI
> Error Recovery: documentation").
>
> However the AER driver deviates from the document in that it never
> performs a Secondary Bus Reset on Non-Fatal Errors, but always on Fatal
> Errors.  By contrast, EEH allows drivers to opt in or out of a Bus Reset
> regardless of error severity, by returning PCI_ERS_RESULT_NEED_RESET or
> PCI_ERS_RESULT_CAN_RECOVER from their ->error_detected() callback.  If all
> drivers agree that they can recover without a Bus Reset, EEH skips it.
> Should one of them request a Bus Reset, it overrides all other drivers.
>
> This inconsistency between EEH and AER seems problematic because drivers
> need to be aware of and cope with it.
>
> The file Documentation/PCI/pcieaer-howto.rst hints at a rationale for
> always performing a Bus Reset on Fatal Errors:  "Fatal errors [...] cause
> the link to be unreliable.  [...] This [reset_link] callback is used to
> reset the PCIe physical link when a fatal error happens.  If an error
> message indicates a fatal error, [...] performing link reset at upstream
> is necessary."

In the code we don't seem to differentiate link_reset and slot_reset. But
the Documentation calls them into two steps. Do you think we should
fix the Documentation?

> There's no such rationale provided for never performing a Bus Reset on
> Non-Fatal Errors.
>
> The "xe" driver has a need to attempt a reset of local units on graphics
> cards upon a Non-Fatal Error.  If that is insufficient for recovery, the
> driver wants to opt in to a Bus Reset.
>
> Accommodate such use cases and align AER more closely with EEH by
> performing a Bus Reset in pcie_do_recovery() if drivers request it and the
> faulting device's channel_state is pci_channel_io_normal.  The AER driver
> sets this channel_state for Non-Fatal Errors.  For Fatal Errors, it uses
> pci_channel_io_frozen.
>
> This limits the deviation from Documentation/PCI/pci-error-recovery.rst
> and EEH to the unconditional Bus Reset on Fatal Errors.
>
> pcie_do_recovery() is also invoked by the Downstream Port Containment and
> Error Disconnect Recover drivers.  They both set the channel_state to
> pci_channel_io_frozen, hence pcie_do_recovery() continues to always invoke
> the ->reset_subordinates() callback in their case.  That is necessary
> because the callback brings the link back up at the containing Downstream
> Port.
>
> There are two behavioral changes resulting from this commit:
>
> First, if channel_state is pci_channel_io_normal and one of the affected
> drivers returns PCI_ERS_RESULT_NEED_RESET from its ->error_detected()
> callback, a Bus Reset will now be performed.  There are drivers doing this
> and although it would be possible to avoid a behavioral change by letting
> them return PCI_ERS_RESULT_CAN_RECOVER instead, the impression I got from
> examination of all drivers is that they actually expect or want a Bus
> Reset (cxl_error_detected() is a case in point).  In any case, if they can
> cope with a Bus Reset on Fatal Errors, they shouldn't have issues with a
> Bus Reset on Non-Fatal Errors.
>
> Second, if channel_state is pci_channel_io_frozen and all affected drivers
> return PCI_ERS_RESULT_CAN_RECOVER from ->error_detected(), their
> ->mmio_enabled() callback is now invoked prior to performing a Bus Reset,
> instead of afterwards.  This actually makes sense:  For example,
> drivers/scsi/sym53c8xx_2/sym_glue.c dumps debug registers in its
> ->mmio_enabled() callback.  Doing so after reset right now captures the
> post-reset state instead of the faulting state, which is useless.
>
> There is only one other driver which implements ->mmio_enabled() and
> returns PCI_ERS_RESULT_CAN_RECOVER from ->error_detected() for
> channel_state pci_channel_io_frozen, drivers/scsi/ipr.c (IBM Power RAID).
> It appears to only be used on EEH platforms.  So the second behavioral
> change is limited to these two drivers.
>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>   drivers/pci/pcie/err.c | 17 ++++++++++-------
>   1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index de6381c690f5..e795e5ae6b03 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -217,15 +217,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   	pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL);
>   
>   	pci_dbg(bridge, "broadcast error_detected message\n");
> -	if (state == pci_channel_io_frozen) {
> +	if (state == pci_channel_io_frozen)
>   		pci_walk_bridge(bridge, report_frozen_detected, &status);
> -		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> -			pci_warn(bridge, "subordinate device reset failed\n");
> -			goto failed;
> -		}
> -	} else {
> +	else
>   		pci_walk_bridge(bridge, report_normal_detected, &status);
> -	}
>   
>   	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>   		status = PCI_ERS_RESULT_RECOVERED;
> @@ -233,6 +228,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   		pci_walk_bridge(bridge, report_mmio_enabled, &status);
>   	}
>   
> +	if (status == PCI_ERS_RESULT_NEED_RESET ||
> +	    state == pci_channel_io_frozen) {

Why not let report_frozen_detected() update status to
PCI_ERS_RESULT_NEED_RESET?

For fatal errors, I believe most driver callbacks already return
PCI_ERS_RESULT_NEED_RESET by default. Even if that’s not always the case,
since a reset is required anyway, it might make more sense to update the
status there.If you do it, you can skip the above check.
> +		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> +			pci_warn(bridge, "subordinate device reset failed\n");
> +			goto failed;
> +		}
> +	}

For pci_channel_io_frozen,
> +
>   	if (status == PCI_ERS_RESULT_NEED_RESET) {
>   		/*
>   		 * TODO: Should call platform-specific

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/5] PCI/ERR: Fix uevent on failure to recover
  2025-08-13  5:11 ` [PATCH 2/5] PCI/ERR: Fix uevent on failure to recover Lukas Wunner
@ 2025-08-13 23:01   ` Sathyanarayanan Kuppuswamy
  2025-08-14  7:08   ` Niklas Schnelle
  1 sibling, 0 replies; 22+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-08-13 23:01 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Niklas Schnelle, Linas Vepstas, Mahesh J Salgaonkar,
	Oliver OHalloran, Manivannan Sadhasivam, linuxppc-dev, linux-pci


On 8/12/25 10:11 PM, Lukas Wunner wrote:
> Upon failure to recover from a PCIe error through AER, DPC or EDR, a
> uevent is sent to inform user space about disconnection of the bridge
> whose subordinate devices failed to recover.
>
> However the bridge itself is not disconnected.  Instead, a uevent should
> be sent for each of the subordinate devices.
>
> Only if the "bridge" happens to be a Root Complex Event Collector or
> Integrated Endpoint does it make sense to send a uevent for it (because
> there are no subordinate devices).
>
> Right now if there is a mix of subordinate devices with and without
> pci_error_handlers, a BEGIN_RECOVERY event is sent for those with
> pci_error_handlers but no FAILED_RECOVERY event is ever sent for them
> afterwards.  Fix it.
>
> Fixes: 856e1eb9bdd4 ("PCI/AER: Add uevents in AER and EEH error/resume")
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org  # v4.16+
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

>   drivers/pci/pcie/err.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index e795e5ae6b03..21d554359fb1 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -108,6 +108,12 @@ static int report_normal_detected(struct pci_dev *dev, void *data)
>   	return report_error_detected(dev, pci_channel_io_normal, data);
>   }
>   
> +static int report_perm_failure_detected(struct pci_dev *dev, void *data)
> +{
> +	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> +	return 0;
> +}
> +
>   static int report_mmio_enabled(struct pci_dev *dev, void *data)
>   {
>   	struct pci_driver *pdrv;
> @@ -272,7 +278,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   failed:
>   	pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);
>   
> -	pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT);
> +	pci_walk_bridge(bridge, report_perm_failure_detected, NULL);
>   
>   	pci_info(bridge, "device recovery failed\n");
>   

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/5] PCI/ERR: Notify drivers on failure to recover
  2025-08-13  5:11 ` [PATCH 3/5] PCI/ERR: Notify drivers " Lukas Wunner
@ 2025-08-13 23:05   ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 22+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-08-13 23:05 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Niklas Schnelle, Linas Vepstas, Mahesh J Salgaonkar,
	Oliver OHalloran, Manivannan Sadhasivam, linuxppc-dev, linux-pci


On 8/12/25 10:11 PM, Lukas Wunner wrote:
> According to Documentation/PCI/pci-error-recovery.rst, the following shall
> occur on failure to recover from a PCIe Uncorrectable Error:
>
>    STEP 6: Permanent Failure
>    -------------------------
>    A "permanent failure" has occurred, and the platform cannot recover
>    the device.  The platform will call error_detected() with a
>    pci_channel_state_t value of pci_channel_io_perm_failure.
>
>    The device driver should, at this point, assume the worst. It should
>    cancel all pending I/O, refuse all new I/O, returning -EIO to
>    higher layers. The device driver should then clean up all of its
>    memory and remove itself from kernel operations, much as it would
>    during system shutdown.
>
> Sathya notes that AER does not call error_detected() on failure and thus
> deviates from the document (as well as EEH, for which the document was
> originally added).
>
> Most drivers do nothing on permanent failure, but the SCSI drivers and a
> number of Ethernet drivers do take advantage of the notification to flush
> queues and give up resources.
>
> Amend AER to notify such drivers and align with the documentation and EEH.
>
> Link: https://lore.kernel.org/r/f496fc0f-64d7-46a4-8562-dba74e31a956@linux.intel.com/
> Suggested-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

>   drivers/pci/pcie/err.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 21d554359fb1..930bb60fb761 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -110,7 +110,19 @@ static int report_normal_detected(struct pci_dev *dev, void *data)
>   
>   static int report_perm_failure_detected(struct pci_dev *dev, void *data)
>   {
> +	struct pci_driver *pdrv;
> +	const struct pci_error_handlers *err_handler;
> +
> +	device_lock(&dev->dev);
> +	pdrv = dev->driver;
> +	if (!pdrv || !pdrv->err_handler || !pdrv->err_handler->error_detected)
> +		goto out;
> +
> +	err_handler = pdrv->err_handler;
> +	err_handler->error_detected(dev, pci_channel_io_perm_failure);
> +out:
>   	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> +	device_unlock(&dev->dev);
>   	return 0;
>   }
>   

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/5] PCI/ERR: Update device error_state already after reset
  2025-08-13  5:11 ` [PATCH 4/5] PCI/ERR: Update device error_state already after reset Lukas Wunner
@ 2025-08-13 23:43   ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 22+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-08-13 23:43 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Niklas Schnelle, Linas Vepstas, Mahesh J Salgaonkar,
	Oliver OHalloran, Manivannan Sadhasivam, linuxppc-dev, linux-pci,
	Shahed Shaikh, Manish Chopra, GR-Linux-NIC-Dev, Nilesh Javali,
	GR-QLogic-Storage-Upstream, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev


On 8/12/25 10:11 PM, Lukas Wunner wrote:
> After a Fatal Error has been reported by a device and has been recovered
> through a Secondary Bus Reset, AER updates the device's error_state to
> pci_channel_io_normal before invoking its driver's ->resume() callback.
>
> By contrast, EEH updates the error_state earlier, namely after resetting
> the device and before invoking its driver's ->slot_reset() callback.
> Commit c58dc575f3c8 ("powerpc/pseries: Set error_state to
> pci_channel_io_normal in eeh_report_reset()") explains in great detail
> that the earlier invocation is necessitated by various drivers checking
> accessibility of the device with pci_channel_offline() and avoiding
> accesses if it returns true.  It returns true for any other error_state
> than pci_channel_io_normal.
>
> The device should be accessible already after reset, hence the reasoning
> is that it's safe to update the error_state immediately afterwards.
>
> This deviation between AER and EEH seems problematic because drivers
> behave differently depending on which error recovery mechanism the
> platform uses.  Three drivers have gone so far as to update the
> error_state themselves, presumably to work around AER's behavior.
>
> For consistency, amend AER to update the error_state at the same recovery
> steps as EEH.  Drop the now unnecessary workaround from the three drivers.
>
> Keep updating the error_state before ->resume() in case ->error_detected()
> or ->mmio_enabled() return PCI_ERS_RESULT_RECOVERED, which causes
> ->slot_reset() to be skipped.  There are drivers doing this even for Fatal
> Errors, e.g. mhi_pci_error_detected().
>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

>   drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 1 -
>   drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c    | 2 --
>   drivers/pci/pcie/err.c                              | 3 ++-
>   drivers/scsi/qla2xxx/qla_os.c                       | 5 -----
>   4 files changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> index d7cdea8f604d..91e7b38143ea 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> @@ -4215,7 +4215,6 @@ static pci_ers_result_t qlcnic_83xx_io_slot_reset(struct pci_dev *pdev)
>   	struct qlcnic_adapter *adapter = pci_get_drvdata(pdev);
>   	int err = 0;
>   
> -	pdev->error_state = pci_channel_io_normal;
>   	err = pci_enable_device(pdev);
>   	if (err)
>   		goto disconnect;
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> index 53cdd36c4123..e051d8c7a28d 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> @@ -3766,8 +3766,6 @@ static int qlcnic_attach_func(struct pci_dev *pdev)
>   	struct qlcnic_adapter *adapter = pci_get_drvdata(pdev);
>   	struct net_device *netdev = adapter->netdev;
>   
> -	pdev->error_state = pci_channel_io_normal;
> -
>   	err = pci_enable_device(pdev);
>   	if (err)
>   		return err;
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 930bb60fb761..bebe4bc111d7 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -153,7 +153,8 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
>   
>   	device_lock(&dev->dev);
>   	pdrv = dev->driver;
> -	if (!pdrv || !pdrv->err_handler || !pdrv->err_handler->slot_reset)
> +	if (!pci_dev_set_io_state(dev, pci_channel_io_normal) ||
> +	    !pdrv || !pdrv->err_handler || !pdrv->err_handler->slot_reset)
>   		goto out;
>   
>   	err_handler = pdrv->err_handler;
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index d4b484c0fd9d..4460421834cb 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -7883,11 +7883,6 @@ qla2xxx_pci_slot_reset(struct pci_dev *pdev)
>   	       "Slot Reset.\n");
>   
>   	ha->pci_error_state = QLA_PCI_SLOT_RESET;
> -	/* Workaround: qla2xxx driver which access hardware earlier
> -	 * needs error state to be pci_channel_io_online.
> -	 * Otherwise mailbox command timesout.
> -	 */
> -	pdev->error_state = pci_channel_io_normal;
>   
>   	pci_restore_state(pdev);
>   

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] PCI/ERR: Remove remnants of .link_reset() callback
  2025-08-13  5:11 ` [PATCH 5/5] PCI/ERR: Remove remnants of .link_reset() callback Lukas Wunner
@ 2025-08-14  0:40   ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 22+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-08-14  0:40 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Niklas Schnelle, Linas Vepstas, Mahesh J Salgaonkar,
	Oliver OHalloran, Manivannan Sadhasivam, linuxppc-dev, linux-pci,
	Edward Cree, linux-net-drivers, James Smart, Dick Kennedy,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev


On 8/12/25 10:11 PM, Lukas Wunner wrote:
> Back in 2017, commit 2fd260f03b6a ("PCI/AER: Remove unused .link_reset()
> callback") removed .link_reset() from struct pci_error_handlers, but left
> a few code comments behind which still mention it.  Remove them.
>
> The code comments in the SolarFlare Ethernet drivers point out that no
> .mmio_enabled() callback is needed because the driver's .error_detected()
> callback always returns PCI_ERS_RESULT_NEED_RESET, which causes
> pcie_do_recovery() to skip .mmio_enabled().  That's not quite correct
> because efx_io_error_detected() does return PCI_ERS_RESULT_RECOVERED under
> certain conditions and then .mmio_enabled() would indeed be called if it
> were implemented.  Remove this misleading portion of the code comment as
> well.
>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

>   drivers/net/ethernet/sfc/efx_common.c       | 3 ---
>   drivers/net/ethernet/sfc/falcon/efx.c       | 3 ---
>   drivers/net/ethernet/sfc/siena/efx_common.c | 3 ---
>   drivers/scsi/lpfc/lpfc_init.c               | 2 +-
>   4 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/sfc/efx_common.c b/drivers/net/ethernet/sfc/efx_common.c
> index 5a14d94163b1..e8fdbb62d872 100644
> --- a/drivers/net/ethernet/sfc/efx_common.c
> +++ b/drivers/net/ethernet/sfc/efx_common.c
> @@ -1258,9 +1258,6 @@ static void efx_io_resume(struct pci_dev *pdev)
>   
>   /* For simplicity and reliability, we always require a slot reset and try to
>    * reset the hardware when a pci error affecting the device is detected.
> - * We leave both the link_reset and mmio_enabled callback unimplemented:
> - * with our request for slot reset the mmio_enabled callback will never be
> - * called, and the link_reset callback is not used by AER or EEH mechanisms.
>    */
>   const struct pci_error_handlers efx_err_handlers = {
>   	.error_detected = efx_io_error_detected,
> diff --git a/drivers/net/ethernet/sfc/falcon/efx.c b/drivers/net/ethernet/sfc/falcon/efx.c
> index b07f7e4e2877..0c784656fde9 100644
> --- a/drivers/net/ethernet/sfc/falcon/efx.c
> +++ b/drivers/net/ethernet/sfc/falcon/efx.c
> @@ -3128,9 +3128,6 @@ static void ef4_io_resume(struct pci_dev *pdev)
>   
>   /* For simplicity and reliability, we always require a slot reset and try to
>    * reset the hardware when a pci error affecting the device is detected.
> - * We leave both the link_reset and mmio_enabled callback unimplemented:
> - * with our request for slot reset the mmio_enabled callback will never be
> - * called, and the link_reset callback is not used by AER or EEH mechanisms.
>    */
>   static const struct pci_error_handlers ef4_err_handlers = {
>   	.error_detected = ef4_io_error_detected,
> diff --git a/drivers/net/ethernet/sfc/siena/efx_common.c b/drivers/net/ethernet/sfc/siena/efx_common.c
> index a0966f879664..35036cc902fe 100644
> --- a/drivers/net/ethernet/sfc/siena/efx_common.c
> +++ b/drivers/net/ethernet/sfc/siena/efx_common.c
> @@ -1285,9 +1285,6 @@ static void efx_io_resume(struct pci_dev *pdev)
>   
>   /* For simplicity and reliability, we always require a slot reset and try to
>    * reset the hardware when a pci error affecting the device is detected.
> - * We leave both the link_reset and mmio_enabled callback unimplemented:
> - * with our request for slot reset the mmio_enabled callback will never be
> - * called, and the link_reset callback is not used by AER or EEH mechanisms.
>    */
>   const struct pci_error_handlers efx_siena_err_handlers = {
>   	.error_detected = efx_io_error_detected,
> diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
> index 4081d2a358ee..cf08bb5b37c3 100644
> --- a/drivers/scsi/lpfc/lpfc_init.c
> +++ b/drivers/scsi/lpfc/lpfc_init.c
> @@ -14377,7 +14377,7 @@ lpfc_sli_prep_dev_for_perm_failure(struct lpfc_hba *phba)
>    * as desired.
>    *
>    * Return codes
> - * 	PCI_ERS_RESULT_CAN_RECOVER - can be recovered with reset_link
> + *	PCI_ERS_RESULT_CAN_RECOVER - can be recovered without reset
>    * 	PCI_ERS_RESULT_NEED_RESET - need to reset before recovery
>    * 	PCI_ERS_RESULT_DISCONNECT - device could not be recovered
>    **/

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/5] PCI/ERR: Fix uevent on failure to recover
  2025-08-13  5:11 ` [PATCH 2/5] PCI/ERR: Fix uevent on failure to recover Lukas Wunner
  2025-08-13 23:01   ` Sathyanarayanan Kuppuswamy
@ 2025-08-14  7:08   ` Niklas Schnelle
  1 sibling, 0 replies; 22+ messages in thread
From: Niklas Schnelle @ 2025-08-14  7:08 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Sathyanarayanan Kuppuswamy, Linas Vepstas, Mahesh J Salgaonkar,
	Oliver OHalloran, Manivannan Sadhasivam, linuxppc-dev, linux-pci

On Wed, 2025-08-13 at 07:11 +0200, Lukas Wunner wrote:
> Upon failure to recover from a PCIe error through AER, DPC or EDR, a
> uevent is sent to inform user space about disconnection of the bridge
> whose subordinate devices failed to recover.
> 
> However the bridge itself is not disconnected.  Instead, a uevent should
> be sent for each of the subordinate devices.
> 
> Only if the "bridge" happens to be a Root Complex Event Collector or
> Integrated Endpoint does it make sense to send a uevent for it (because
> there are no subordinate devices).
> 
> Right now if there is a mix of subordinate devices with and without
> pci_error_handlers, a BEGIN_RECOVERY event is sent for those with
> pci_error_handlers but no FAILED_RECOVERY event is ever sent for them
> afterwards.  Fix it.
> 
> Fixes: 856e1eb9bdd4 ("PCI/AER: Add uevents in AER and EEH error/resume")
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org  # v4.16+
> ---
--- snip ---
> 
> +static int report_perm_failure_detected(struct pci_dev *dev, void *data)
> +{
> +	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> +	return 0;
> +}
> +
>  static int report_mmio_enabled(struct pci_dev *dev, void *data)
>  {
>  	struct pci_driver *pdrv;
> @@ -272,7 +278,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  failed:
>  	pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);
>  
> -	pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT);
> +	pci_walk_bridge(bridge, report_perm_failure_detected, NULL);
>  
>  	pci_info(bridge, "device recovery failed\n");
>  

Thanks for catching this during review of my other error recovery
uevent fix! Looks good and I like that you kept the report_*_detected()
naming which makes the mismatched symmetry of the existing code quite
easy to see.

Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
  2025-08-13  5:11 ` [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors Lukas Wunner
  2025-08-13 23:01   ` Sathyanarayanan Kuppuswamy
@ 2025-08-14  7:56   ` Niklas Schnelle
  2025-08-14  9:36     ` Lukas Wunner
  2025-08-18 23:17     ` Linas Vepstas
  2025-08-17 16:11   ` Sathyanarayanan Kuppuswamy
  2 siblings, 2 replies; 22+ messages in thread
From: Niklas Schnelle @ 2025-08-14  7:56 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Sathyanarayanan Kuppuswamy, Linas Vepstas, Mahesh J Salgaonkar,
	Oliver OHalloran, Manivannan Sadhasivam, linuxppc-dev, linux-pci

On Wed, 2025-08-13 at 07:11 +0200, Lukas Wunner wrote:
> When Advanced Error Reporting was introduced in September 2006 by commit
> 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver"), it
> sought to adhere to the recovery flow and callbacks specified in
> Documentation/PCI/pci-error-recovery.rst.
> 
--- snip ---
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/pcie/err.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index de6381c690f5..e795e5ae6b03 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -217,15 +217,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  	pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL);
>  
>  	pci_dbg(bridge, "broadcast error_detected message\n");
> -	if (state == pci_channel_io_frozen) {
> +	if (state == pci_channel_io_frozen)
>  		pci_walk_bridge(bridge, report_frozen_detected, &status);
> -		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> -			pci_warn(bridge, "subordinate device reset failed\n");
> -			goto failed;
> -		}
> -	} else {
> +	else
>  		pci_walk_bridge(bridge, report_normal_detected, &status);
> -	}
>  
>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>  		status = PCI_ERS_RESULT_RECOVERED;

On s390 PCI errors leave the device with MMIO blocked until either the
error state is cleared or we reset via the firmware interface. With
this change and the pci_channel_io_frozen case AER would now do the
report_mmio_enabled() before the reset with nothing happening between
report_frozen_detected() and report_mmio_enabled() is MMIO enabled at
this point? I think this callback really only makes sense if you have
an equivalent to s390's clearing of the error state that enables MMIO
but doesn't otherwise reset. Similarly EEH has eeh_pci_enable(pe,
EEH_OPT_THAW_MMIO).

> @@ -233,6 +228,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  		pci_walk_bridge(bridge, report_mmio_enabled, &status);
>  	}
>  
> +	if (status == PCI_ERS_RESULT_NEED_RESET ||
> +	    state == pci_channel_io_frozen) {
> +		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> +			pci_warn(bridge, "subordinate device reset failed\n");
> +			goto failed;
> +		}
> +	}
> +
>  	if (status == PCI_ERS_RESULT_NEED_RESET) {
>  		/*
>  		 * TODO: Should call platform-specific

I wonder if it might make sense to merge the reset into the above
existing if. That would also work well with Sathyanarayanan's
suggestion to have state == pci_channel_io_frozen upgrade to
PCI_ERS_RESULT_NEED_RESET. I'm a bit confused by that TODO comment and
anyway, it asks for a platform-specific reset to be added bu
reset_subordinate() already seems to allow platform specific behavior,
so maybe the moved reset should replace the TODO? Also I think for the
kind of broken case where the state is pci_channel_io_frozen but the
drivers just reports PCI_ERS_RESULT_CAN_RECOVER it looks like there
would be a reset but no calls to ->slot_reset().

Thanks,
Niklas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
  2025-08-14  7:56   ` Niklas Schnelle
@ 2025-08-14  9:36     ` Lukas Wunner
  2025-08-14 19:29       ` Sathyanarayanan Kuppuswamy
  2025-08-14 20:31       ` Niklas Schnelle
  2025-08-18 23:17     ` Linas Vepstas
  1 sibling, 2 replies; 22+ messages in thread
From: Lukas Wunner @ 2025-08-14  9:36 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Bjorn Helgaas, Riana Tauro, Aravind Iddamsetty, Sean C. Dardis,
	Terry Bowman, Sathyanarayanan Kuppuswamy, Linas Vepstas,
	Mahesh J Salgaonkar, Oliver OHalloran, Manivannan Sadhasivam,
	linuxppc-dev, linux-pci

On Thu, Aug 14, 2025 at 09:56:09AM +0200, Niklas Schnelle wrote:
> On Wed, 2025-08-13 at 07:11 +0200, Lukas Wunner wrote:
> > +++ b/drivers/pci/pcie/err.c
> > @@ -217,15 +217,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> >  	pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL);
> >  
> >  	pci_dbg(bridge, "broadcast error_detected message\n");
> > -	if (state == pci_channel_io_frozen) {
> > +	if (state == pci_channel_io_frozen)
> >  		pci_walk_bridge(bridge, report_frozen_detected, &status);
> > -		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> > -			pci_warn(bridge, "subordinate device reset failed\n");
> > -			goto failed;
> > -		}
> > -	} else {
> > +	else
> >  		pci_walk_bridge(bridge, report_normal_detected, &status);
> > -	}
> >  
> >  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> >  		status = PCI_ERS_RESULT_RECOVERED;
> 
> On s390 PCI errors leave the device with MMIO blocked until either the
> error state is cleared or we reset via the firmware interface. With
> this change and the pci_channel_io_frozen case AER would now do the
> report_mmio_enabled() before the reset with nothing happening between
> report_frozen_detected() and report_mmio_enabled() is MMIO enabled at
> this point?

Yes, MMIO access is controlled through the Memory Space Enable and
Bus Master Enable bits in the Command Register (PCIe r7.0 sec 7.5.1.1.3).
Drivers generally set those bits on probe and they're not automatically
cleared by hardware upon an Uncorrectable Error.

EEH and s390 blocking MMIO access likely serves the purpose of preventing
corrupted data being propagated upstream.  AER doesn't support that
(or at least doesn't mandate that -- certain implementations might
be capable of blocking poisoned data).

Thus with AER, MMIO access is usually possible already in ->error_detected().
The only reason why drivers shouldn't be doing that and instead defer
MMIO access to ->mmio_enabled() is to be compatible with EEH and s390.

There are exceptions to this rule:  E.g. if the Uncorrectable Error
was "Surprise Down", the link to the device is down and MMIO access
isn't possible, neither in ->error_detected() nor ->mmio_enabled().
Drivers should check whether an MMIO read results in an "all ones"
response (PCI_POSSIBLE_ERROR()), which is usually what the host bridge
fabricates upon a Completion Timeout caused by the link being down
and the Endpoint being inaccessible.

(There's a list of all the errors with default severity etc in PCIe r7.0
sec 6.2.7.)

I believe DPC was added to the PCIe Base Specification to prevent
propagating corrupted data upstream, similarly to EEH and s390.
DPC brings down the link immediately upon an Uncorrectable Error
(the Linux driver confines this to Fatal Errors), but as a side
effect this results in a Hot Reset being propagated down the
hierarchy, making it impossible to access the device in the
faulting state to retrieve debug information etc.  After the link
has been brought up again, the device is in post-reset state.
So DPC doesn't allow for reset-less recovery.

With the ordering change introduced by this commit, ->mmio_enabled()
will no longer be able to access MMIO space in the DPC case because
the link hasn't been brought back up until ->slot_reset().  But as
explained in the commit message, I only found two drivers affected
by this.  One seems to be EEH-specific (drivers/scsi/ipr.c).
And the other one (drivers/scsi/sym53c8xx_2/sym_glue.c) dumps debug
registers in ->mmio_enabled() and I'm arguing that with this commit
it's dumping "all ones", but right now it's dumping the post-reset
state of the device, which isn't any more useful.

> I think this callback really only makes sense if you have
> an equivalent to s390's clearing of the error state that enables MMIO
> but doesn't otherwise reset. Similarly EEH has eeh_pci_enable(pe,
> EEH_OPT_THAW_MMIO).

Right.

> > @@ -233,6 +228,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> >  		pci_walk_bridge(bridge, report_mmio_enabled, &status);
> >  	}
> >  
> > +	if (status == PCI_ERS_RESULT_NEED_RESET ||
> > +	    state == pci_channel_io_frozen) {
> > +		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> > +			pci_warn(bridge, "subordinate device reset failed\n");
> > +			goto failed;
> > +		}
> > +	}
> > +
> >  	if (status == PCI_ERS_RESULT_NEED_RESET) {
> >  		/*
> >  		 * TODO: Should call platform-specific
> 
> I wonder if it might make sense to merge the reset into the above
> existing if.

There are drivers such as drivers/bus/mhi/host/pci_generic.c which
return PCI_ERS_RESULT_RECOVERED from ->error_detected().  So they
fall through directly to the ->resume() stage.  They're doing this
even in the pci_channel_io_frozen case (i.e. for Fatal Errors).

But for DPC we must call reset_subordinates() to bring the link back up.
And for Fatal Errors, Documentation/PCI/pcieaer-howto.rst suggests that
we must likewise call it because the link may be unreliable.

Hence the if-condition must use a boolean OR, i.e.:

	if (status == PCI_ERS_RESULT_NEED_RESET ||
	    state == pci_channel_io_frozen) {

... whereas if I would move the invocation of reset_subordinates() inside
the existing "if (status == PCI_ERS_RESULT_NEED_RESET)", it would be
equivalent to a boolean AND.

I would have to amend drivers such as drivers/bus/mhi/host/pci_generic.c
to work with the changed logic and I figured that it's simpler to only
change pcie_do_recovery() rather than touching all affected drivers.
I don't have any of that hardware and so it seems risky touching all
those drivers.

> I'm a bit confused by that TODO comment and
> anyway, it asks for a platform-specific reset to be added bu
> reset_subordinate() already seems to allow platform specific behavior,
> so maybe the moved reset should replace the TODO?

Manivannan has a patch pending which removes the TODO:

https://lore.kernel.org/all/20250715-pci-port-reset-v6-1-6f9cce94e7bb@oss.qualcomm.com/

> Also I think for the
> kind of broken case where the state is pci_channel_io_frozen but the
> drivers just reports PCI_ERS_RESULT_CAN_RECOVER it looks like there
> would be a reset but no calls to ->slot_reset().

Right, but that's what AER is currently doing and drivers such as
drivers/bus/mhi/host/pci_generic.c are written to work with the
existing flow.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
  2025-08-14  9:36     ` Lukas Wunner
@ 2025-08-14 19:29       ` Sathyanarayanan Kuppuswamy
  2025-08-17 13:17         ` Lukas Wunner
  2025-08-14 20:31       ` Niklas Schnelle
  1 sibling, 1 reply; 22+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-08-14 19:29 UTC (permalink / raw)
  To: Lukas Wunner, Niklas Schnelle
  Cc: Bjorn Helgaas, Riana Tauro, Aravind Iddamsetty, Sean C. Dardis,
	Terry Bowman, Linas Vepstas, Mahesh J Salgaonkar,
	Oliver OHalloran, Manivannan Sadhasivam, linuxppc-dev, linux-pci


On 8/14/25 2:36 AM, Lukas Wunner wrote:
> On Thu, Aug 14, 2025 at 09:56:09AM +0200, Niklas Schnelle wrote:
>> On Wed, 2025-08-13 at 07:11 +0200, Lukas Wunner wrote:
>>> +++ b/drivers/pci/pcie/err.c
>>> @@ -217,15 +217,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>>   	pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL);
>>>   
>>>   	pci_dbg(bridge, "broadcast error_detected message\n");
>>> -	if (state == pci_channel_io_frozen) {
>>> +	if (state == pci_channel_io_frozen)
>>>   		pci_walk_bridge(bridge, report_frozen_detected, &status);
>>> -		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
>>> -			pci_warn(bridge, "subordinate device reset failed\n");
>>> -			goto failed;
>>> -		}
>>> -	} else {
>>> +	else
>>>   		pci_walk_bridge(bridge, report_normal_detected, &status);
>>> -	}
>>>   
>>>   	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>>>   		status = PCI_ERS_RESULT_RECOVERED;
>> On s390 PCI errors leave the device with MMIO blocked until either the
>> error state is cleared or we reset via the firmware interface. With
>> this change and the pci_channel_io_frozen case AER would now do the
>> report_mmio_enabled() before the reset with nothing happening between
>> report_frozen_detected() and report_mmio_enabled() is MMIO enabled at
>> this point?
> Yes, MMIO access is controlled through the Memory Space Enable and
> Bus Master Enable bits in the Command Register (PCIe r7.0 sec 7.5.1.1.3).
> Drivers generally set those bits on probe and they're not automatically
> cleared by hardware upon an Uncorrectable Error.
>
> EEH and s390 blocking MMIO access likely serves the purpose of preventing
> corrupted data being propagated upstream.  AER doesn't support that
> (or at least doesn't mandate that -- certain implementations might
> be capable of blocking poisoned data).
>
> Thus with AER, MMIO access is usually possible already in ->error_detected().
> The only reason why drivers shouldn't be doing that and instead defer
> MMIO access to ->mmio_enabled() is to be compatible with EEH and s390.
>
> There are exceptions to this rule:  E.g. if the Uncorrectable Error
> was "Surprise Down", the link to the device is down and MMIO access
> isn't possible, neither in ->error_detected() nor ->mmio_enabled().
> Drivers should check whether an MMIO read results in an "all ones"
> response (PCI_POSSIBLE_ERROR()), which is usually what the host bridge
> fabricates upon a Completion Timeout caused by the link being down
> and the Endpoint being inaccessible.
>
> (There's a list of all the errors with default severity etc in PCIe r7.0
> sec 6.2.7.)
>
> I believe DPC was added to the PCIe Base Specification to prevent
> propagating corrupted data upstream, similarly to EEH and s390.
> DPC brings down the link immediately upon an Uncorrectable Error
> (the Linux driver confines this to Fatal Errors), but as a side
> effect this results in a Hot Reset being propagated down the
> hierarchy, making it impossible to access the device in the
> faulting state to retrieve debug information etc.  After the link
> has been brought up again, the device is in post-reset state.
> So DPC doesn't allow for reset-less recovery.
>
> With the ordering change introduced by this commit, ->mmio_enabled()
> will no longer be able to access MMIO space in the DPC case because
> the link hasn't been brought back up until ->slot_reset().  But as
> explained in the commit message, I only found two drivers affected
> by this.  One seems to be EEH-specific (drivers/scsi/ipr.c).
> And the other one (drivers/scsi/sym53c8xx_2/sym_glue.c) dumps debug
> registers in ->mmio_enabled() and I'm arguing that with this commit
> it's dumping "all ones", but right now it's dumping the post-reset
> state of the device, which isn't any more useful.
>
>> I think this callback really only makes sense if you have
>> an equivalent to s390's clearing of the error state that enables MMIO
>> but doesn't otherwise reset. Similarly EEH has eeh_pci_enable(pe,
>> EEH_OPT_THAW_MMIO).
> Right.
>
>>> @@ -233,6 +228,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>>   		pci_walk_bridge(bridge, report_mmio_enabled, &status);
>>>   	}
>>>   
>>> +	if (status == PCI_ERS_RESULT_NEED_RESET ||
>>> +	    state == pci_channel_io_frozen) {
>>> +		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
>>> +			pci_warn(bridge, "subordinate device reset failed\n");
>>> +			goto failed;
>>> +		}
>>> +	}
>>> +
>>>   	if (status == PCI_ERS_RESULT_NEED_RESET) {
>>>   		/*
>>>   		 * TODO: Should call platform-specific
>> I wonder if it might make sense to merge the reset into the above
>> existing if.
> There are drivers such as drivers/bus/mhi/host/pci_generic.c which
> return PCI_ERS_RESULT_RECOVERED from ->error_detected().  So they
> fall through directly to the ->resume() stage.  They're doing this
> even in the pci_channel_io_frozen case (i.e. for Fatal Errors).
>
> But for DPC we must call reset_subordinates() to bring the link back up.
> And for Fatal Errors, Documentation/PCI/pcieaer-howto.rst suggests that
> we must likewise call it because the link may be unreliable.

For fatal errors, since we already ignore the value returned by
->error_detected() (by calling reset_subordinates() unconditionally), why
not update status accordingly in report_frozen_detected() and notify the
driver about the reset?

That way, the reset logic could be unified under a single if
(status == PCI_ERS_RESULT_NEED_RESET) condition.

Checking the drivers/bus/mhi/host/pci_generic.c implementation, it looks
like calling slot_reset callback looks harmless.
>
> Hence the if-condition must use a boolean OR, i.e.:
>
> 	if (status == PCI_ERS_RESULT_NEED_RESET ||
> 	    state == pci_channel_io_frozen) {
>
> ... whereas if I would move the invocation of reset_subordinates() inside
> the existing "if (status == PCI_ERS_RESULT_NEED_RESET)", it would be
> equivalent to a boolean AND.
>
> I would have to amend drivers such as drivers/bus/mhi/host/pci_generic.c
> to work with the changed logic and I figured that it's simpler to only
> change pcie_do_recovery() rather than touching all affected drivers.
> I don't have any of that hardware and so it seems risky touching all
> those drivers.
>
>> I'm a bit confused by that TODO comment and
>> anyway, it asks for a platform-specific reset to be added bu
>> reset_subordinate() already seems to allow platform specific behavior,
>> so maybe the moved reset should replace the TODO?
> Manivannan has a patch pending which removes the TODO:
>
> https://lore.kernel.org/all/20250715-pci-port-reset-v6-1-6f9cce94e7bb@oss.qualcomm.com/
>
>> Also I think for the
>> kind of broken case where the state is pci_channel_io_frozen but the
>> drivers just reports PCI_ERS_RESULT_CAN_RECOVER it looks like there
>> would be a reset but no calls to ->slot_reset().
> Right, but that's what AER is currently doing and drivers such as
> drivers/bus/mhi/host/pci_generic.c are written to work with the
> existing flow.
>
> Thanks,
>
> Lukas
>
-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
  2025-08-14  9:36     ` Lukas Wunner
  2025-08-14 19:29       ` Sathyanarayanan Kuppuswamy
@ 2025-08-14 20:31       ` Niklas Schnelle
  1 sibling, 0 replies; 22+ messages in thread
From: Niklas Schnelle @ 2025-08-14 20:31 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Riana Tauro, Aravind Iddamsetty, Sean C. Dardis,
	Terry Bowman, Sathyanarayanan Kuppuswamy, Linas Vepstas,
	Mahesh J Salgaonkar, Oliver OHalloran, Manivannan Sadhasivam,
	linuxppc-dev, linux-pci

On Thu, 2025-08-14 at 11:36 +0200, Lukas Wunner wrote:
> On Thu, Aug 14, 2025 at 09:56:09AM +0200, Niklas Schnelle wrote:
> > On Wed, 2025-08-13 at 07:11 +0200, Lukas Wunner wrote:
> > 
--- snip ---
> > On s390 PCI errors leave the device with MMIO blocked until either the
> > error state is cleared or we reset via the firmware interface. With
> > this change and the pci_channel_io_frozen case AER would now do the
> > report_mmio_enabled() before the reset with nothing happening between
> > report_frozen_detected() and report_mmio_enabled() is MMIO enabled at
> > this point?
> 
> Yes, MMIO access is controlled through the Memory Space Enable and
> Bus Master Enable bits in the Command Register (PCIe r7.0 sec 7.5.1.1.3).
> Drivers generally set those bits on probe and they're not automatically
> cleared by hardware upon an Uncorrectable Error.
> 
> EEH and s390 blocking MMIO access likely serves the purpose of preventing
> corrupted data being propagated upstream.  AER doesn't support that
> (or at least doesn't mandate that -- certain implementations might
> be capable of blocking poisoned data).
> 
> Thus with AER, MMIO access is usually possible already in ->error_detected().
> The only reason why drivers shouldn't be doing that and instead defer
> MMIO access to ->mmio_enabled() is to be compatible with EEH and s390.
> 
> There are exceptions to this rule:  E.g. if the Uncorrectable Error
> was "Surprise Down", the link to the device is down and MMIO access
> isn't possible, neither in ->error_detected() nor ->mmio_enabled().
> Drivers should check whether an MMIO read results in an "all ones"
> response (PCI_POSSIBLE_ERROR()), which is usually what the host bridge
> fabricates upon a Completion Timeout caused by the link being down
> and the Endpoint being inaccessible.
> 
> (There's a list of all the errors with default severity etc in PCIe r7.0
> sec 6.2.7.)
> 
> I believe DPC was added to the PCIe Base Specification to prevent
> propagating corrupted data upstream, similarly to EEH and s390.
> DPC brings down the link immediately upon an Uncorrectable Error
> (the Linux driver confines this to Fatal Errors), but as a side
> effect this results in a Hot Reset being propagated down the
> hierarchy, making it impossible to access the device in the
> faulting state to retrieve debug information etc.  After the link
> has been brought up again, the device is in post-reset state.
> So DPC doesn't allow for reset-less recovery.
> 
> With the ordering change introduced by this commit, ->mmio_enabled()
> will no longer be able to access MMIO space in the DPC case because
> the link hasn't been brought back up until ->slot_reset().  But as
> explained in the commit message, I only found two drivers affected
> by this.  One seems to be EEH-specific (drivers/scsi/ipr.c).
> And the other one (drivers/scsi/sym53c8xx_2/sym_glue.c) dumps debug
> registers in ->mmio_enabled() and I'm arguing that with this commit
> it's dumping "all ones", but right now it's dumping the post-reset
> state of the device, which isn't any more useful.
> 
> > I think this callback really only makes sense if you have
> > an equivalent to s390's clearing of the error state that enables MMIO
> > but doesn't otherwise reset. Similarly EEH has eeh_pci_enable(pe,
> > EEH_OPT_THAW_MMIO).
> 
> Right.
--- snip ---
> > I wonder if it might make sense to merge the reset into the above
> > existing if.
> 
> There are drivers such as drivers/bus/mhi/host/pci_generic.c which
> return PCI_ERS_RESULT_RECOVERED from ->error_detected().  So they
> fall through directly to the ->resume() stage.  They're doing this
> even in the pci_channel_io_frozen case (i.e. for Fatal Errors).
> 
> But for DPC we must call reset_subordinates() to bring the link back up.
> And for Fatal Errors, Documentation/PCI/pcieaer-howto.rst suggests that
> we must likewise call it because the link may be unreliable.
> 
> Hence the if-condition must use a boolean OR, i.e.:
> 
> 	if (status == PCI_ERS_RESULT_NEED_RESET ||
> 	    state == pci_channel_io_frozen) {
> 
> ... whereas if I would move the invocation of reset_subordinates() inside
> the existing "if (status == PCI_ERS_RESULT_NEED_RESET)", it would be
> equivalent to a boolean AND.
> 
> I would have to amend drivers such as drivers/bus/mhi/host/pci_generic.c
> to work with the changed logic and I figured that it's simpler to only
> change pcie_do_recovery() rather than touching all affected drivers.
> I don't have any of that hardware and so it seems risky touching all
> those drivers.
> 
> > I'm a bit confused by that TODO comment and
> > anyway, it asks for a platform-specific reset to be added bu
> > reset_subordinate() already seems to allow platform specific behavior,
> > so maybe the moved reset should replace the TODO?
> 
> Manivannan has a patch pending which removes the TODO:
> 
> https://lore.kernel.org/all/20250715-pci-port-reset-v6-1-6f9cce94e7bb@oss.qualcomm.com/
> 
> > Also I think for the
> > kind of broken case where the state is pci_channel_io_frozen but the
> > drivers just reports PCI_ERS_RESULT_CAN_RECOVER it looks like there
> > would be a reset but no calls to ->slot_reset().
> 
> Right, but that's what AER is currently doing and drivers such as
> drivers/bus/mhi/host/pci_generic.c are written to work with the
> existing flow.
> 
> Thanks,
> 
> Lukas

Thank you for the very clear and extremely detailed explanation both in
your replies and the commit messages, this is really helpful and I
learned a lot. I know it was a lot of work and I appreciate you taking
that time and effort especially when the code was already merged!

Warm regards,
Niklas Schnelle

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
  2025-08-14 19:29       ` Sathyanarayanan Kuppuswamy
@ 2025-08-17 13:17         ` Lukas Wunner
  2025-08-17 16:10           ` Sathyanarayanan Kuppuswamy
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2025-08-17 13:17 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy
  Cc: Niklas Schnelle, Bjorn Helgaas, Riana Tauro, Aravind Iddamsetty,
	Sean C. Dardis, Terry Bowman, Linas Vepstas, Mahesh J Salgaonkar,
	Oliver OHalloran, Manivannan Sadhasivam, linuxppc-dev, linux-pci

On Thu, Aug 14, 2025 at 12:29:25PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 8/14/25 2:36 AM, Lukas Wunner wrote:
> > On Thu, Aug 14, 2025 at 09:56:09AM +0200, Niklas Schnelle wrote:
> > > On Wed, 2025-08-13 at 07:11 +0200, Lukas Wunner wrote:
> > > > @@ -233,6 +228,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> > > >   		pci_walk_bridge(bridge, report_mmio_enabled, &status);
> > > >   	}
> > > > +	if (status == PCI_ERS_RESULT_NEED_RESET ||
> > > > +	    state == pci_channel_io_frozen) {
> > > > +		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> > > > +			pci_warn(bridge, "subordinate device reset failed\n");
> > > > +			goto failed;
> > > > +		}
> > > > +	}
> > > > +
> > > >   	if (status == PCI_ERS_RESULT_NEED_RESET) {
> > > >   		/*
> > > >   		 * TODO: Should call platform-specific
> > > 
> > > I wonder if it might make sense to merge the reset into the above
> > > existing if.
> > 
> > There are drivers such as drivers/bus/mhi/host/pci_generic.c which
> > return PCI_ERS_RESULT_RECOVERED from ->error_detected().  So they
> > fall through directly to the ->resume() stage.  They're doing this
> > even in the pci_channel_io_frozen case (i.e. for Fatal Errors).
> > 
> > But for DPC we must call reset_subordinates() to bring the link back up.
> > And for Fatal Errors, Documentation/PCI/pcieaer-howto.rst suggests that
> > we must likewise call it because the link may be unreliable.
> 
> For fatal errors, since we already ignore the value returned by
> ->error_detected() (by calling reset_subordinates() unconditionally), why
> not update status accordingly in report_frozen_detected() and notify the
> driver about the reset?
> 
> That way, the reset logic could be unified under a single if
> (status == PCI_ERS_RESULT_NEED_RESET) condition.
> 
> Checking the drivers/bus/mhi/host/pci_generic.c implementation, it looks
> like calling slot_reset callback looks harmless.

Unfortunately it's not harmless:

mhi_pci_slot_reset() calls pci_enable_device().  But a corresponding
call to pci_disable_device() is only performed before in
mhi_pci_error_detected() if that function returns
PCI_ERS_RESULT_NEED_RESET.

So there would be an enable_cnt imbalance if I'd change the logic to
overwrite the driver's vote with PCI_ERS_RESULT_NEED_RESET in the
pci_channel_io_frozen case and call its ->slot_reset() callback.

The approach taken by this patch is to minimize risk, avoid any changes
to drivers, make do with minimal changes to pcie_do_recovery() and
limit the behavioral change.

I think overriding status = PCI_ERS_RESULT_NEED_RESET and calling drivers'
->slot_reset() would have to be done in a separate patch on top and would
require going through all drivers again to see which ones need to be
amended.

Also, note that report_frozen_detected() is too early to set
"status = PCI_ERS_RESULT_NEED_RESET".  That needs to happen after the
->mmio_enabled() step, so that drivers get a chance to examine the
device even in the pci_channel_io_frozen case before a reset is
performed.  (The ->mmio_enabled() step is only performed if "status" is
PCI_ERS_RESULT_CAN_RECOVER.)

So then the code would look like this:

	if (state == pci_channel_io_frozen)
		status = PCI_ERS_RESULT_NEED_RESET;

	if (status == PCI_ERS_RESULT_NEED_RESET) {
		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
			pci_warn(bridge, "subordinate device reset failed\n");
			goto failed;
		}

		status = PCI_ERS_RESULT_RECOVERED;
		pci_dbg(bridge, "broadcast slot_reset message\n");
		pci_walk_bridge(bridge, report_slot_reset, &status);
	}

... which isn't very different from the present patch:

        if (status == PCI_ERS_RESULT_NEED_RESET ||
            state == pci_channel_io_frozen) {
                if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
                        pci_warn(bridge, "subordinate device reset failed\n");
                        goto failed;
                }
        }

        if (status == PCI_ERS_RESULT_NEED_RESET) {
                status = PCI_ERS_RESULT_RECOVERED;
                pci_dbg(bridge, "broadcast slot_reset message\n");
                pci_walk_bridge(bridge, report_slot_reset, &status);
        }

... except that this patch avoids touching any drivers.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
  2025-08-13 23:01   ` Sathyanarayanan Kuppuswamy
@ 2025-08-17 13:45     ` Lukas Wunner
  0 siblings, 0 replies; 22+ messages in thread
From: Lukas Wunner @ 2025-08-17 13:45 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy
  Cc: Bjorn Helgaas, Riana Tauro, Aravind Iddamsetty, Sean C. Dardis,
	Terry Bowman, Niklas Schnelle, Linas Vepstas, Mahesh J Salgaonkar,
	Oliver OHalloran, Manivannan Sadhasivam, linuxppc-dev, linux-pci

On Wed, Aug 13, 2025 at 04:01:09PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 8/12/25 10:11 PM, Lukas Wunner wrote:
> > The file Documentation/PCI/pcieaer-howto.rst hints at a rationale for
> > always performing a Bus Reset on Fatal Errors:  "Fatal errors [...] cause
> > the link to be unreliable.  [...] This [reset_link] callback is used to
> > reset the PCIe physical link when a fatal error happens.  If an error
> > message indicates a fatal error, [...] performing link reset at upstream
> > is necessary."
> 
> In the code we don't seem to differentiate link_reset and slot_reset. But
> the Documentation calls them into two steps. Do you think we should
> fix the Documentation?

reset_link and slot_reset are two different things:

* slot_reset is the ->slot_reset() callback in struct pci_error_handlers.

* reset_link is the reset_subordinates() callback passed in to
  pcie_do_recovery().

Commit 8f1bbfbc3596 renamed reset_link() to reset_subordinates() but
neglected to update Documentation/PCI/pcieaer-howto.rst.

Commit b6cf1a42f916 dropped the reset_link() callback from struct
pcie_port_service_driver and dropped default_reset_link() in favor
of passing aer_root_reset() to pcie_do_recovery().  Yet the documentation
continues referring to a "default reset_link callback" and incorrectly
claims that "Upstream Port drivers may provide their own reset_link
functions".

I've begun updating the documentation and intend to submit that separately.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
  2025-08-17 13:17         ` Lukas Wunner
@ 2025-08-17 16:10           ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 22+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-08-17 16:10 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Niklas Schnelle, Bjorn Helgaas, Riana Tauro, Aravind Iddamsetty,
	Sean C. Dardis, Terry Bowman, Linas Vepstas, Mahesh J Salgaonkar,
	Oliver OHalloran, Manivannan Sadhasivam, linuxppc-dev, linux-pci


On 8/17/25 6:17 AM, Lukas Wunner wrote:
> On Thu, Aug 14, 2025 at 12:29:25PM -0700, Sathyanarayanan Kuppuswamy wrote:
>> On 8/14/25 2:36 AM, Lukas Wunner wrote:
>>> On Thu, Aug 14, 2025 at 09:56:09AM +0200, Niklas Schnelle wrote:
>>>> On Wed, 2025-08-13 at 07:11 +0200, Lukas Wunner wrote:
>>>>> @@ -233,6 +228,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>>>>    		pci_walk_bridge(bridge, report_mmio_enabled, &status);
>>>>>    	}
>>>>> +	if (status == PCI_ERS_RESULT_NEED_RESET ||
>>>>> +	    state == pci_channel_io_frozen) {
>>>>> +		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
>>>>> +			pci_warn(bridge, "subordinate device reset failed\n");
>>>>> +			goto failed;
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>>    	if (status == PCI_ERS_RESULT_NEED_RESET) {
>>>>>    		/*
>>>>>    		 * TODO: Should call platform-specific
>>>> I wonder if it might make sense to merge the reset into the above
>>>> existing if.
>>> There are drivers such as drivers/bus/mhi/host/pci_generic.c which
>>> return PCI_ERS_RESULT_RECOVERED from ->error_detected().  So they
>>> fall through directly to the ->resume() stage.  They're doing this
>>> even in the pci_channel_io_frozen case (i.e. for Fatal Errors).
>>>
>>> But for DPC we must call reset_subordinates() to bring the link back up.
>>> And for Fatal Errors, Documentation/PCI/pcieaer-howto.rst suggests that
>>> we must likewise call it because the link may be unreliable.
>> For fatal errors, since we already ignore the value returned by
>> ->error_detected() (by calling reset_subordinates() unconditionally), why
>> not update status accordingly in report_frozen_detected() and notify the
>> driver about the reset?
>>
>> That way, the reset logic could be unified under a single if
>> (status == PCI_ERS_RESULT_NEED_RESET) condition.
>>
>> Checking the drivers/bus/mhi/host/pci_generic.c implementation, it looks
>> like calling slot_reset callback looks harmless.
> Unfortunately it's not harmless:
>
> mhi_pci_slot_reset() calls pci_enable_device().  But a corresponding
> call to pci_disable_device() is only performed before in
> mhi_pci_error_detected() if that function returns
> PCI_ERS_RESULT_NEED_RESET.
>
> So there would be an enable_cnt imbalance if I'd change the logic to
> overwrite the driver's vote with PCI_ERS_RESULT_NEED_RESET in the
> pci_channel_io_frozen case and call its ->slot_reset() callback.
>
> The approach taken by this patch is to minimize risk, avoid any changes
> to drivers, make do with minimal changes to pcie_do_recovery() and
> limit the behavioral change.
>
> I think overriding status = PCI_ERS_RESULT_NEED_RESET and calling drivers'
> ->slot_reset() would have to be done in a separate patch on top and would
> require going through all drivers again to see which ones need to be
> amended.
>
> Also, note that report_frozen_detected() is too early to set
> "status = PCI_ERS_RESULT_NEED_RESET".  That needs to happen after the
> ->mmio_enabled() step, so that drivers get a chance to examine the
> device even in the pci_channel_io_frozen case before a reset is
> performed.  (The ->mmio_enabled() step is only performed if "status" is
> PCI_ERS_RESULT_CAN_RECOVER.)
>
> So then the code would look like this:
>
> 	if (state == pci_channel_io_frozen)
> 		status = PCI_ERS_RESULT_NEED_RESET;
>
> 	if (status == PCI_ERS_RESULT_NEED_RESET) {
> 		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> 			pci_warn(bridge, "subordinate device reset failed\n");
> 			goto failed;
> 		}
>
> 		status = PCI_ERS_RESULT_RECOVERED;
> 		pci_dbg(bridge, "broadcast slot_reset message\n");
> 		pci_walk_bridge(bridge, report_slot_reset, &status);
> 	}
>
> ... which isn't very different from the present patch:
>
>          if (status == PCI_ERS_RESULT_NEED_RESET ||
>              state == pci_channel_io_frozen) {
>                  if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
>                          pci_warn(bridge, "subordinate device reset failed\n");
>                          goto failed;
>                  }
>          }
>
>          if (status == PCI_ERS_RESULT_NEED_RESET) {
>                  status = PCI_ERS_RESULT_RECOVERED;
>                  pci_dbg(bridge, "broadcast slot_reset message\n");
>                  pci_walk_bridge(bridge, report_slot_reset, &status);
>          }
>
> ... except that this patch avoids touching any drivers.


Makes sense. Thanks for the clarification.


>
> Thanks,
>
> Lukas
>
-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
  2025-08-13  5:11 ` [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors Lukas Wunner
  2025-08-13 23:01   ` Sathyanarayanan Kuppuswamy
  2025-08-14  7:56   ` Niklas Schnelle
@ 2025-08-17 16:11   ` Sathyanarayanan Kuppuswamy
  2 siblings, 0 replies; 22+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-08-17 16:11 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Niklas Schnelle, Linas Vepstas, Mahesh J Salgaonkar,
	Oliver OHalloran, Manivannan Sadhasivam, linuxppc-dev, linux-pci


On 8/12/25 10:11 PM, Lukas Wunner wrote:
> When Advanced Error Reporting was introduced in September 2006 by commit
> 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver"), it
> sought to adhere to the recovery flow and callbacks specified in
> Documentation/PCI/pci-error-recovery.rst.
>
> That document had been added in January 2006, when Enhanced Error Handling
> (EEH) was introduced for PowerPC with commit 065c6359071c ("[PATCH] PCI
> Error Recovery: documentation").
>
> However the AER driver deviates from the document in that it never
> performs a Secondary Bus Reset on Non-Fatal Errors, but always on Fatal
> Errors.  By contrast, EEH allows drivers to opt in or out of a Bus Reset
> regardless of error severity, by returning PCI_ERS_RESULT_NEED_RESET or
> PCI_ERS_RESULT_CAN_RECOVER from their ->error_detected() callback.  If all
> drivers agree that they can recover without a Bus Reset, EEH skips it.
> Should one of them request a Bus Reset, it overrides all other drivers.
>
> This inconsistency between EEH and AER seems problematic because drivers
> need to be aware of and cope with it.
>
> The file Documentation/PCI/pcieaer-howto.rst hints at a rationale for
> always performing a Bus Reset on Fatal Errors:  "Fatal errors [...] cause
> the link to be unreliable.  [...] This [reset_link] callback is used to
> reset the PCIe physical link when a fatal error happens.  If an error
> message indicates a fatal error, [...] performing link reset at upstream
> is necessary."
>
> There's no such rationale provided for never performing a Bus Reset on
> Non-Fatal Errors.
>
> The "xe" driver has a need to attempt a reset of local units on graphics
> cards upon a Non-Fatal Error.  If that is insufficient for recovery, the
> driver wants to opt in to a Bus Reset.
>
> Accommodate such use cases and align AER more closely with EEH by
> performing a Bus Reset in pcie_do_recovery() if drivers request it and the
> faulting device's channel_state is pci_channel_io_normal.  The AER driver
> sets this channel_state for Non-Fatal Errors.  For Fatal Errors, it uses
> pci_channel_io_frozen.
>
> This limits the deviation from Documentation/PCI/pci-error-recovery.rst
> and EEH to the unconditional Bus Reset on Fatal Errors.
>
> pcie_do_recovery() is also invoked by the Downstream Port Containment and
> Error Disconnect Recover drivers.  They both set the channel_state to
> pci_channel_io_frozen, hence pcie_do_recovery() continues to always invoke
> the ->reset_subordinates() callback in their case.  That is necessary
> because the callback brings the link back up at the containing Downstream
> Port.
>
> There are two behavioral changes resulting from this commit:
>
> First, if channel_state is pci_channel_io_normal and one of the affected
> drivers returns PCI_ERS_RESULT_NEED_RESET from its ->error_detected()
> callback, a Bus Reset will now be performed.  There are drivers doing this
> and although it would be possible to avoid a behavioral change by letting
> them return PCI_ERS_RESULT_CAN_RECOVER instead, the impression I got from
> examination of all drivers is that they actually expect or want a Bus
> Reset (cxl_error_detected() is a case in point).  In any case, if they can
> cope with a Bus Reset on Fatal Errors, they shouldn't have issues with a
> Bus Reset on Non-Fatal Errors.
>
> Second, if channel_state is pci_channel_io_frozen and all affected drivers
> return PCI_ERS_RESULT_CAN_RECOVER from ->error_detected(), their
> ->mmio_enabled() callback is now invoked prior to performing a Bus Reset,
> instead of afterwards.  This actually makes sense:  For example,
> drivers/scsi/sym53c8xx_2/sym_glue.c dumps debug registers in its
> ->mmio_enabled() callback.  Doing so after reset right now captures the
> post-reset state instead of the faulting state, which is useless.
>
> There is only one other driver which implements ->mmio_enabled() and
> returns PCI_ERS_RESULT_CAN_RECOVER from ->error_detected() for
> channel_state pci_channel_io_frozen, drivers/scsi/ipr.c (IBM Power RAID).
> It appears to only be used on EEH platforms.  So the second behavioral
> change is limited to these two drivers.
>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

>   drivers/pci/pcie/err.c | 17 ++++++++++-------
>   1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index de6381c690f5..e795e5ae6b03 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -217,15 +217,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   	pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL);
>   
>   	pci_dbg(bridge, "broadcast error_detected message\n");
> -	if (state == pci_channel_io_frozen) {
> +	if (state == pci_channel_io_frozen)
>   		pci_walk_bridge(bridge, report_frozen_detected, &status);
> -		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> -			pci_warn(bridge, "subordinate device reset failed\n");
> -			goto failed;
> -		}
> -	} else {
> +	else
>   		pci_walk_bridge(bridge, report_normal_detected, &status);
> -	}
>   
>   	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>   		status = PCI_ERS_RESULT_RECOVERED;
> @@ -233,6 +228,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   		pci_walk_bridge(bridge, report_mmio_enabled, &status);
>   	}
>   
> +	if (status == PCI_ERS_RESULT_NEED_RESET ||
> +	    state == pci_channel_io_frozen) {
> +		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> +			pci_warn(bridge, "subordinate device reset failed\n");
> +			goto failed;
> +		}
> +	}
> +
>   	if (status == PCI_ERS_RESULT_NEED_RESET) {
>   		/*
>   		 * TODO: Should call platform-specific

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
  2025-08-14  7:56   ` Niklas Schnelle
  2025-08-14  9:36     ` Lukas Wunner
@ 2025-08-18 23:17     ` Linas Vepstas
  1 sibling, 0 replies; 22+ messages in thread
From: Linas Vepstas @ 2025-08-18 23:17 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Lukas Wunner, Bjorn Helgaas, Riana Tauro, Aravind Iddamsetty,
	Sean C. Dardis, Terry Bowman, Sathyanarayanan Kuppuswamy,
	Mahesh J Salgaonkar, Oliver OHalloran, Manivannan Sadhasivam,
	linuxppc-dev, linux-pci

On Thu, Aug 14, 2025 at 2:56 AM Niklas Schnelle <schnelle@linux.ibm.com> wrote:
>
> On Wed, 2025-08-13 at 07:11 +0200, Lukas Wunner wrote:
> > When Advanced Error Reporting was introduced in September 2006 by commit
> > 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver"), it
> > sought to adhere to the recovery flow and callbacks specified in
> > Documentation/PCI/pci-error-recovery.rst.
> >
> --- snip ---
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > ---
> >  drivers/pci/pcie/err.c | 17 ++++++++++-------
> >  1 file changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > index de6381c690f5..e795e5ae6b03 100644
> > --- a/drivers/pci/pcie/err.c
> > +++ b/drivers/pci/pcie/err.c
> > @@ -217,15 +217,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> >       pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL);
> >
> >       pci_dbg(bridge, "broadcast error_detected message\n");
> > -     if (state == pci_channel_io_frozen) {
> > +     if (state == pci_channel_io_frozen)
> >               pci_walk_bridge(bridge, report_frozen_detected, &status);
> > -             if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> > -                     pci_warn(bridge, "subordinate device reset failed\n");
> > -                     goto failed;
> > -             }
> > -     } else {
> > +     else
> >               pci_walk_bridge(bridge, report_normal_detected, &status);
> > -     }
> >
> >       if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> >               status = PCI_ERS_RESULT_RECOVERED;
>
> On s390 PCI errors leave the device with MMIO blocked until either the
> error state is cleared or we reset via the firmware interface. With
> this change and the pci_channel_io_frozen case AER would now do the
> report_mmio_enabled() before the reset with nothing happening between
> report_frozen_detected() and report_mmio_enabled() is MMIO enabled at
> this point? I think this callback really only makes sense if you have
> an equivalent to s390's clearing of the error state that enables MMIO
> but doesn't otherwise reset. Similarly EEH has eeh_pci_enable(pe,
> EEH_OPT_THAW_MMIO).

The original intent was that if the channel locked up e.g. due to some
uncorrectable ECC error or some transient errors due to electrical
problems on the bus (bad reflection of some pulse off some poorly
terminated connector) then such an error would almost surely be
transient and very very unlikely to repeat.

Thus it would be OK to re-enable the MMIO (without otherwise resetting
any channel controller state) and let the device driver examine the PCI
config registers. If they all look good, don't contain any scrambled addrs
or bitflags, then completely normal operations could be resumed without
any further messing around, resetting, invalidating etc.

But first, the device driver needs to examine the config registers and
that cannot be done unless MMIO is enabled.  If MMIO is enabled,
and the PCI config regs appear to contain garbage, then that garbage
can be logged in some error report or crash dump. After this
got done, the device driver would invalidate any pending i/o (for example,
half-finished blocks in some s390 orb, irb, schib, ioccw, whatever)
make sure that assorted channel subsystems are actually halted,
and then attempt  a reset of the bus, the bus controllers (390 channel
or subchannel) and probably the device as well. If that reset succeeds,
then the device driver can restart with a fresh, clean device and a working
channel.  And maybe, if we're lucky, start handling any pending i/o requests.

In practice, this worked great for network adapters. However, if the affected
device was some storage controller for e.g. some mounted filesystem then
(way back when) it was hopeless, because the Linux block subsystem did
not know how to deal with transient errors like this.  Trying to figure out how
to unscramble the block subsystem, and keep mounted filesystems shielded
from this chaos was the one thing I couldn't figure out how to solve.  It seemed
important.  But we've come a long way since then, so I dunno.

-- Linas

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-08-18 23:18 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-13  5:11 [PATCH 0/5] PCI: Reduce AER / EEH deviations Lukas Wunner
2025-08-13  5:11 ` [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors Lukas Wunner
2025-08-13 23:01   ` Sathyanarayanan Kuppuswamy
2025-08-17 13:45     ` Lukas Wunner
2025-08-14  7:56   ` Niklas Schnelle
2025-08-14  9:36     ` Lukas Wunner
2025-08-14 19:29       ` Sathyanarayanan Kuppuswamy
2025-08-17 13:17         ` Lukas Wunner
2025-08-17 16:10           ` Sathyanarayanan Kuppuswamy
2025-08-14 20:31       ` Niklas Schnelle
2025-08-18 23:17     ` Linas Vepstas
2025-08-17 16:11   ` Sathyanarayanan Kuppuswamy
2025-08-13  5:11 ` [PATCH 2/5] PCI/ERR: Fix uevent on failure to recover Lukas Wunner
2025-08-13 23:01   ` Sathyanarayanan Kuppuswamy
2025-08-14  7:08   ` Niklas Schnelle
2025-08-13  5:11 ` [PATCH 3/5] PCI/ERR: Notify drivers " Lukas Wunner
2025-08-13 23:05   ` Sathyanarayanan Kuppuswamy
2025-08-13  5:11 ` [PATCH 4/5] PCI/ERR: Update device error_state already after reset Lukas Wunner
2025-08-13 23:43   ` Sathyanarayanan Kuppuswamy
2025-08-13  5:11 ` [PATCH 5/5] PCI/ERR: Remove remnants of .link_reset() callback Lukas Wunner
2025-08-14  0:40   ` Sathyanarayanan Kuppuswamy
2025-08-13 18:21 ` [PATCH 0/5] PCI: Reduce AER / EEH deviations Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).