netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] PCI: Reduce AER / EEH deviations
@ 2025-08-13  5:11 Lukas Wunner
  2025-08-13  5:11 ` [PATCH 4/5] PCI/ERR: Update device error_state already after reset Lukas Wunner
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Lukas Wunner @ 2025-08-13  5:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Sathyanarayanan Kuppuswamy, Niklas Schnelle, Linas Vepstas,
	Mahesh J Salgaonkar, Oliver OHalloran, Manivannan Sadhasivam,
	linuxppc-dev, linux-pci, Shahed Shaikh, Manish Chopra,
	GR-Linux-NIC-Dev, Nilesh Javali, GR-QLogic-Storage-Upstream,
	Edward Cree, linux-net-drivers, James Smart, Dick Kennedy,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

The kernel supports three different PCI error recovery mechanisms:

* AER per PCIe r7.0 sec 6.2 (drivers/pci/pcie/aer.c + err.c)
* EEH on PowerPC (arch/powerpc/kernel/eeh_driver.c)
* zPCI on s390 (arch/s390/pci/pci_event.c)

In theory, they should all follow Documentation/PCI/pci-error-recovery.rst
to afford uniform behavior to drivers across platforms.

In practice, there are deviations which this series seeks to reduce.

One particular pain point is AER not allowing drivers to opt in to a
Bus Reset on Non-Fatal Errors (patch [1/5]).  EEH allows this and the
"xe" graphics driver would like to take advantage of it on AER-capable
platforms.  Patches [2/5] to [4/5] address various other deviations,
while patch [5/5] cleans up old gunk in code comments.

I've gone through all drivers implementing pci_error_handlers to ascertain
that no regressions are introduced by these changes.  Nevertheless further
reviewing and testing would be appreciated to raise the confidence.
Thanks!

Lukas Wunner (5):
  PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
  PCI/ERR: Fix uevent on failure to recover
  PCI/ERR: Notify drivers on failure to recover
  PCI/ERR: Update device error_state already after reset
  PCI/ERR: Remove remnants of .link_reset() callback

 .../ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c   |  1 -
 .../net/ethernet/qlogic/qlcnic/qlcnic_main.c  |  2 -
 drivers/net/ethernet/sfc/efx_common.c         |  3 --
 drivers/net/ethernet/sfc/falcon/efx.c         |  3 --
 drivers/net/ethernet/sfc/siena/efx_common.c   |  3 --
 drivers/pci/pcie/err.c                        | 40 ++++++++++++++-----
 drivers/scsi/lpfc/lpfc_init.c                 |  2 +-
 drivers/scsi/qla2xxx/qla_os.c                 |  5 ---
 8 files changed, 32 insertions(+), 27 deletions(-)

-- 
2.47.2


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 4/5] PCI/ERR: Update device error_state already after reset
  2025-08-13  5:11 [PATCH 0/5] PCI: Reduce AER / EEH deviations Lukas Wunner
@ 2025-08-13  5:11 ` Lukas Wunner
  2025-08-13 23:43   ` Sathyanarayanan Kuppuswamy
  2025-08-13  5:11 ` [PATCH 5/5] PCI/ERR: Remove remnants of .link_reset() callback Lukas Wunner
  2025-08-13 18:21 ` [PATCH 0/5] PCI: Reduce AER / EEH deviations Bjorn Helgaas
  2 siblings, 1 reply; 6+ messages in thread
From: Lukas Wunner @ 2025-08-13  5:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Sathyanarayanan Kuppuswamy, Niklas Schnelle, Linas Vepstas,
	Mahesh J Salgaonkar, Oliver OHalloran, Manivannan Sadhasivam,
	linuxppc-dev, linux-pci, Shahed Shaikh, Manish Chopra,
	GR-Linux-NIC-Dev, Nilesh Javali, GR-QLogic-Storage-Upstream,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski <ku 

After a Fatal Error has been reported by a device and has been recovered
through a Secondary Bus Reset, AER updates the device's error_state to
pci_channel_io_normal before invoking its driver's ->resume() callback.

By contrast, EEH updates the error_state earlier, namely after resetting
the device and before invoking its driver's ->slot_reset() callback.
Commit c58dc575f3c8 ("powerpc/pseries: Set error_state to
pci_channel_io_normal in eeh_report_reset()") explains in great detail
that the earlier invocation is necessitated by various drivers checking
accessibility of the device with pci_channel_offline() and avoiding
accesses if it returns true.  It returns true for any other error_state
than pci_channel_io_normal.

The device should be accessible already after reset, hence the reasoning
is that it's safe to update the error_state immediately afterwards.

This deviation between AER and EEH seems problematic because drivers
behave differently depending on which error recovery mechanism the
platform uses.  Three drivers have gone so far as to update the
error_state themselves, presumably to work around AER's behavior.

For consistency, amend AER to update the error_state at the same recovery
steps as EEH.  Drop the now unnecessary workaround from the three drivers.

Keep updating the error_state before ->resume() in case ->error_detected()
or ->mmio_enabled() return PCI_ERS_RESULT_RECOVERED, which causes
->slot_reset() to be skipped.  There are drivers doing this even for Fatal
Errors, e.g. mhi_pci_error_detected().

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 1 -
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c    | 2 --
 drivers/pci/pcie/err.c                              | 3 ++-
 drivers/scsi/qla2xxx/qla_os.c                       | 5 -----
 4 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
index d7cdea8f604d..91e7b38143ea 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
@@ -4215,7 +4215,6 @@ static pci_ers_result_t qlcnic_83xx_io_slot_reset(struct pci_dev *pdev)
 	struct qlcnic_adapter *adapter = pci_get_drvdata(pdev);
 	int err = 0;
 
-	pdev->error_state = pci_channel_io_normal;
 	err = pci_enable_device(pdev);
 	if (err)
 		goto disconnect;
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 53cdd36c4123..e051d8c7a28d 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -3766,8 +3766,6 @@ static int qlcnic_attach_func(struct pci_dev *pdev)
 	struct qlcnic_adapter *adapter = pci_get_drvdata(pdev);
 	struct net_device *netdev = adapter->netdev;
 
-	pdev->error_state = pci_channel_io_normal;
-
 	err = pci_enable_device(pdev);
 	if (err)
 		return err;
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 930bb60fb761..bebe4bc111d7 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -153,7 +153,8 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
 
 	device_lock(&dev->dev);
 	pdrv = dev->driver;
-	if (!pdrv || !pdrv->err_handler || !pdrv->err_handler->slot_reset)
+	if (!pci_dev_set_io_state(dev, pci_channel_io_normal) ||
+	    !pdrv || !pdrv->err_handler || !pdrv->err_handler->slot_reset)
 		goto out;
 
 	err_handler = pdrv->err_handler;
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index d4b484c0fd9d..4460421834cb 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -7883,11 +7883,6 @@ qla2xxx_pci_slot_reset(struct pci_dev *pdev)
 	       "Slot Reset.\n");
 
 	ha->pci_error_state = QLA_PCI_SLOT_RESET;
-	/* Workaround: qla2xxx driver which access hardware earlier
-	 * needs error state to be pci_channel_io_online.
-	 * Otherwise mailbox command timesout.
-	 */
-	pdev->error_state = pci_channel_io_normal;
 
 	pci_restore_state(pdev);
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 5/5] PCI/ERR: Remove remnants of .link_reset() callback
  2025-08-13  5:11 [PATCH 0/5] PCI: Reduce AER / EEH deviations Lukas Wunner
  2025-08-13  5:11 ` [PATCH 4/5] PCI/ERR: Update device error_state already after reset Lukas Wunner
@ 2025-08-13  5:11 ` Lukas Wunner
  2025-08-14  0:40   ` Sathyanarayanan Kuppuswamy
  2025-08-13 18:21 ` [PATCH 0/5] PCI: Reduce AER / EEH deviations Bjorn Helgaas
  2 siblings, 1 reply; 6+ messages in thread
From: Lukas Wunner @ 2025-08-13  5:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Sathyanarayanan Kuppuswamy, Niklas Schnelle, Linas Vepstas,
	Mahesh J Salgaonkar, Oliver OHalloran, Manivannan Sadhasivam,
	linuxppc-dev, linux-pci, Edward Cree, linux-net-drivers,
	James Smart, Dick Kennedy, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Back in 2017, commit 2fd260f03b6a ("PCI/AER: Remove unused .link_reset()
callback") removed .link_reset() from struct pci_error_handlers, but left
a few code comments behind which still mention it.  Remove them.

The code comments in the SolarFlare Ethernet drivers point out that no
.mmio_enabled() callback is needed because the driver's .error_detected()
callback always returns PCI_ERS_RESULT_NEED_RESET, which causes
pcie_do_recovery() to skip .mmio_enabled().  That's not quite correct
because efx_io_error_detected() does return PCI_ERS_RESULT_RECOVERED under
certain conditions and then .mmio_enabled() would indeed be called if it
were implemented.  Remove this misleading portion of the code comment as
well.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/net/ethernet/sfc/efx_common.c       | 3 ---
 drivers/net/ethernet/sfc/falcon/efx.c       | 3 ---
 drivers/net/ethernet/sfc/siena/efx_common.c | 3 ---
 drivers/scsi/lpfc/lpfc_init.c               | 2 +-
 4 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx_common.c b/drivers/net/ethernet/sfc/efx_common.c
index 5a14d94163b1..e8fdbb62d872 100644
--- a/drivers/net/ethernet/sfc/efx_common.c
+++ b/drivers/net/ethernet/sfc/efx_common.c
@@ -1258,9 +1258,6 @@ static void efx_io_resume(struct pci_dev *pdev)
 
 /* For simplicity and reliability, we always require a slot reset and try to
  * reset the hardware when a pci error affecting the device is detected.
- * We leave both the link_reset and mmio_enabled callback unimplemented:
- * with our request for slot reset the mmio_enabled callback will never be
- * called, and the link_reset callback is not used by AER or EEH mechanisms.
  */
 const struct pci_error_handlers efx_err_handlers = {
 	.error_detected = efx_io_error_detected,
diff --git a/drivers/net/ethernet/sfc/falcon/efx.c b/drivers/net/ethernet/sfc/falcon/efx.c
index b07f7e4e2877..0c784656fde9 100644
--- a/drivers/net/ethernet/sfc/falcon/efx.c
+++ b/drivers/net/ethernet/sfc/falcon/efx.c
@@ -3128,9 +3128,6 @@ static void ef4_io_resume(struct pci_dev *pdev)
 
 /* For simplicity and reliability, we always require a slot reset and try to
  * reset the hardware when a pci error affecting the device is detected.
- * We leave both the link_reset and mmio_enabled callback unimplemented:
- * with our request for slot reset the mmio_enabled callback will never be
- * called, and the link_reset callback is not used by AER or EEH mechanisms.
  */
 static const struct pci_error_handlers ef4_err_handlers = {
 	.error_detected = ef4_io_error_detected,
diff --git a/drivers/net/ethernet/sfc/siena/efx_common.c b/drivers/net/ethernet/sfc/siena/efx_common.c
index a0966f879664..35036cc902fe 100644
--- a/drivers/net/ethernet/sfc/siena/efx_common.c
+++ b/drivers/net/ethernet/sfc/siena/efx_common.c
@@ -1285,9 +1285,6 @@ static void efx_io_resume(struct pci_dev *pdev)
 
 /* For simplicity and reliability, we always require a slot reset and try to
  * reset the hardware when a pci error affecting the device is detected.
- * We leave both the link_reset and mmio_enabled callback unimplemented:
- * with our request for slot reset the mmio_enabled callback will never be
- * called, and the link_reset callback is not used by AER or EEH mechanisms.
  */
 const struct pci_error_handlers efx_siena_err_handlers = {
 	.error_detected = efx_io_error_detected,
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 4081d2a358ee..cf08bb5b37c3 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -14377,7 +14377,7 @@ lpfc_sli_prep_dev_for_perm_failure(struct lpfc_hba *phba)
  * as desired.
  *
  * Return codes
- * 	PCI_ERS_RESULT_CAN_RECOVER - can be recovered with reset_link
+ *	PCI_ERS_RESULT_CAN_RECOVER - can be recovered without reset
  * 	PCI_ERS_RESULT_NEED_RESET - need to reset before recovery
  * 	PCI_ERS_RESULT_DISCONNECT - device could not be recovered
  **/
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/5] PCI: Reduce AER / EEH deviations
  2025-08-13  5:11 [PATCH 0/5] PCI: Reduce AER / EEH deviations Lukas Wunner
  2025-08-13  5:11 ` [PATCH 4/5] PCI/ERR: Update device error_state already after reset Lukas Wunner
  2025-08-13  5:11 ` [PATCH 5/5] PCI/ERR: Remove remnants of .link_reset() callback Lukas Wunner
@ 2025-08-13 18:21 ` Bjorn Helgaas
  2 siblings, 0 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2025-08-13 18:21 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Sathyanarayanan Kuppuswamy, Niklas Schnelle, Linas Vepstas,
	Mahesh J Salgaonkar, Oliver OHalloran, Manivannan Sadhasivam,
	linuxppc-dev, linux-pci, Shahed Shaikh, Manish Chopra,
	GR-Linux-NIC-Dev, Nilesh Javali, GR-QLogic-Storage-Upstream,
	Edward Cree, linux-net-drivers, James Smart, Dick Kennedy,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi,
	er.kernel.org, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev

On Wed, Aug 13, 2025 at 07:11:00AM +0200, Lukas Wunner wrote:
> The kernel supports three different PCI error recovery mechanisms:
> 
> * AER per PCIe r7.0 sec 6.2 (drivers/pci/pcie/aer.c + err.c)
> * EEH on PowerPC (arch/powerpc/kernel/eeh_driver.c)
> * zPCI on s390 (arch/s390/pci/pci_event.c)
> 
> In theory, they should all follow Documentation/PCI/pci-error-recovery.rst
> to afford uniform behavior to drivers across platforms.
> 
> In practice, there are deviations which this series seeks to reduce.
> 
> One particular pain point is AER not allowing drivers to opt in to a
> Bus Reset on Non-Fatal Errors (patch [1/5]).  EEH allows this and the
> "xe" graphics driver would like to take advantage of it on AER-capable
> platforms.  Patches [2/5] to [4/5] address various other deviations,
> while patch [5/5] cleans up old gunk in code comments.
> 
> I've gone through all drivers implementing pci_error_handlers to ascertain
> that no regressions are introduced by these changes.  Nevertheless further
> reviewing and testing would be appreciated to raise the confidence.
> Thanks!
> 
> Lukas Wunner (5):
>   PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
>   PCI/ERR: Fix uevent on failure to recover
>   PCI/ERR: Notify drivers on failure to recover
>   PCI/ERR: Update device error_state already after reset
>   PCI/ERR: Remove remnants of .link_reset() callback
> 
>  .../ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c   |  1 -
>  .../net/ethernet/qlogic/qlcnic/qlcnic_main.c  |  2 -
>  drivers/net/ethernet/sfc/efx_common.c         |  3 --
>  drivers/net/ethernet/sfc/falcon/efx.c         |  3 --
>  drivers/net/ethernet/sfc/siena/efx_common.c   |  3 --
>  drivers/pci/pcie/err.c                        | 40 ++++++++++++++-----
>  drivers/scsi/lpfc/lpfc_init.c                 |  2 +-
>  drivers/scsi/qla2xxx/qla_os.c                 |  5 ---
>  8 files changed, 32 insertions(+), 27 deletions(-)

Applied to pci/aer for v6.18, thanks, Lukas!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 4/5] PCI/ERR: Update device error_state already after reset
  2025-08-13  5:11 ` [PATCH 4/5] PCI/ERR: Update device error_state already after reset Lukas Wunner
@ 2025-08-13 23:43   ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 6+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-08-13 23:43 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Niklas Schnelle, Linas Vepstas, Mahesh J Salgaonkar,
	Oliver OHalloran, Manivannan Sadhasivam, linuxppc-dev, linux-pci,
	Shahed Shaikh, Manish Chopra, GR-Linux-NIC-Dev, Nilesh Javali,
	GR-QLogic-Storage-Upstream, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev


On 8/12/25 10:11 PM, Lukas Wunner wrote:
> After a Fatal Error has been reported by a device and has been recovered
> through a Secondary Bus Reset, AER updates the device's error_state to
> pci_channel_io_normal before invoking its driver's ->resume() callback.
>
> By contrast, EEH updates the error_state earlier, namely after resetting
> the device and before invoking its driver's ->slot_reset() callback.
> Commit c58dc575f3c8 ("powerpc/pseries: Set error_state to
> pci_channel_io_normal in eeh_report_reset()") explains in great detail
> that the earlier invocation is necessitated by various drivers checking
> accessibility of the device with pci_channel_offline() and avoiding
> accesses if it returns true.  It returns true for any other error_state
> than pci_channel_io_normal.
>
> The device should be accessible already after reset, hence the reasoning
> is that it's safe to update the error_state immediately afterwards.
>
> This deviation between AER and EEH seems problematic because drivers
> behave differently depending on which error recovery mechanism the
> platform uses.  Three drivers have gone so far as to update the
> error_state themselves, presumably to work around AER's behavior.
>
> For consistency, amend AER to update the error_state at the same recovery
> steps as EEH.  Drop the now unnecessary workaround from the three drivers.
>
> Keep updating the error_state before ->resume() in case ->error_detected()
> or ->mmio_enabled() return PCI_ERS_RESULT_RECOVERED, which causes
> ->slot_reset() to be skipped.  There are drivers doing this even for Fatal
> Errors, e.g. mhi_pci_error_detected().
>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

>   drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 1 -
>   drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c    | 2 --
>   drivers/pci/pcie/err.c                              | 3 ++-
>   drivers/scsi/qla2xxx/qla_os.c                       | 5 -----
>   4 files changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> index d7cdea8f604d..91e7b38143ea 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> @@ -4215,7 +4215,6 @@ static pci_ers_result_t qlcnic_83xx_io_slot_reset(struct pci_dev *pdev)
>   	struct qlcnic_adapter *adapter = pci_get_drvdata(pdev);
>   	int err = 0;
>   
> -	pdev->error_state = pci_channel_io_normal;
>   	err = pci_enable_device(pdev);
>   	if (err)
>   		goto disconnect;
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> index 53cdd36c4123..e051d8c7a28d 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> @@ -3766,8 +3766,6 @@ static int qlcnic_attach_func(struct pci_dev *pdev)
>   	struct qlcnic_adapter *adapter = pci_get_drvdata(pdev);
>   	struct net_device *netdev = adapter->netdev;
>   
> -	pdev->error_state = pci_channel_io_normal;
> -
>   	err = pci_enable_device(pdev);
>   	if (err)
>   		return err;
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 930bb60fb761..bebe4bc111d7 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -153,7 +153,8 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
>   
>   	device_lock(&dev->dev);
>   	pdrv = dev->driver;
> -	if (!pdrv || !pdrv->err_handler || !pdrv->err_handler->slot_reset)
> +	if (!pci_dev_set_io_state(dev, pci_channel_io_normal) ||
> +	    !pdrv || !pdrv->err_handler || !pdrv->err_handler->slot_reset)
>   		goto out;
>   
>   	err_handler = pdrv->err_handler;
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index d4b484c0fd9d..4460421834cb 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -7883,11 +7883,6 @@ qla2xxx_pci_slot_reset(struct pci_dev *pdev)
>   	       "Slot Reset.\n");
>   
>   	ha->pci_error_state = QLA_PCI_SLOT_RESET;
> -	/* Workaround: qla2xxx driver which access hardware earlier
> -	 * needs error state to be pci_channel_io_online.
> -	 * Otherwise mailbox command timesout.
> -	 */
> -	pdev->error_state = pci_channel_io_normal;
>   
>   	pci_restore_state(pdev);
>   

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 5/5] PCI/ERR: Remove remnants of .link_reset() callback
  2025-08-13  5:11 ` [PATCH 5/5] PCI/ERR: Remove remnants of .link_reset() callback Lukas Wunner
@ 2025-08-14  0:40   ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 6+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-08-14  0:40 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Riana Tauro, Aravind Iddamsetty, Sean C. Dardis, Terry Bowman,
	Niklas Schnelle, Linas Vepstas, Mahesh J Salgaonkar,
	Oliver OHalloran, Manivannan Sadhasivam, linuxppc-dev, linux-pci,
	Edward Cree, linux-net-drivers, James Smart, Dick Kennedy,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev


On 8/12/25 10:11 PM, Lukas Wunner wrote:
> Back in 2017, commit 2fd260f03b6a ("PCI/AER: Remove unused .link_reset()
> callback") removed .link_reset() from struct pci_error_handlers, but left
> a few code comments behind which still mention it.  Remove them.
>
> The code comments in the SolarFlare Ethernet drivers point out that no
> .mmio_enabled() callback is needed because the driver's .error_detected()
> callback always returns PCI_ERS_RESULT_NEED_RESET, which causes
> pcie_do_recovery() to skip .mmio_enabled().  That's not quite correct
> because efx_io_error_detected() does return PCI_ERS_RESULT_RECOVERED under
> certain conditions and then .mmio_enabled() would indeed be called if it
> were implemented.  Remove this misleading portion of the code comment as
> well.
>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

>   drivers/net/ethernet/sfc/efx_common.c       | 3 ---
>   drivers/net/ethernet/sfc/falcon/efx.c       | 3 ---
>   drivers/net/ethernet/sfc/siena/efx_common.c | 3 ---
>   drivers/scsi/lpfc/lpfc_init.c               | 2 +-
>   4 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/sfc/efx_common.c b/drivers/net/ethernet/sfc/efx_common.c
> index 5a14d94163b1..e8fdbb62d872 100644
> --- a/drivers/net/ethernet/sfc/efx_common.c
> +++ b/drivers/net/ethernet/sfc/efx_common.c
> @@ -1258,9 +1258,6 @@ static void efx_io_resume(struct pci_dev *pdev)
>   
>   /* For simplicity and reliability, we always require a slot reset and try to
>    * reset the hardware when a pci error affecting the device is detected.
> - * We leave both the link_reset and mmio_enabled callback unimplemented:
> - * with our request for slot reset the mmio_enabled callback will never be
> - * called, and the link_reset callback is not used by AER or EEH mechanisms.
>    */
>   const struct pci_error_handlers efx_err_handlers = {
>   	.error_detected = efx_io_error_detected,
> diff --git a/drivers/net/ethernet/sfc/falcon/efx.c b/drivers/net/ethernet/sfc/falcon/efx.c
> index b07f7e4e2877..0c784656fde9 100644
> --- a/drivers/net/ethernet/sfc/falcon/efx.c
> +++ b/drivers/net/ethernet/sfc/falcon/efx.c
> @@ -3128,9 +3128,6 @@ static void ef4_io_resume(struct pci_dev *pdev)
>   
>   /* For simplicity and reliability, we always require a slot reset and try to
>    * reset the hardware when a pci error affecting the device is detected.
> - * We leave both the link_reset and mmio_enabled callback unimplemented:
> - * with our request for slot reset the mmio_enabled callback will never be
> - * called, and the link_reset callback is not used by AER or EEH mechanisms.
>    */
>   static const struct pci_error_handlers ef4_err_handlers = {
>   	.error_detected = ef4_io_error_detected,
> diff --git a/drivers/net/ethernet/sfc/siena/efx_common.c b/drivers/net/ethernet/sfc/siena/efx_common.c
> index a0966f879664..35036cc902fe 100644
> --- a/drivers/net/ethernet/sfc/siena/efx_common.c
> +++ b/drivers/net/ethernet/sfc/siena/efx_common.c
> @@ -1285,9 +1285,6 @@ static void efx_io_resume(struct pci_dev *pdev)
>   
>   /* For simplicity and reliability, we always require a slot reset and try to
>    * reset the hardware when a pci error affecting the device is detected.
> - * We leave both the link_reset and mmio_enabled callback unimplemented:
> - * with our request for slot reset the mmio_enabled callback will never be
> - * called, and the link_reset callback is not used by AER or EEH mechanisms.
>    */
>   const struct pci_error_handlers efx_siena_err_handlers = {
>   	.error_detected = efx_io_error_detected,
> diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
> index 4081d2a358ee..cf08bb5b37c3 100644
> --- a/drivers/scsi/lpfc/lpfc_init.c
> +++ b/drivers/scsi/lpfc/lpfc_init.c
> @@ -14377,7 +14377,7 @@ lpfc_sli_prep_dev_for_perm_failure(struct lpfc_hba *phba)
>    * as desired.
>    *
>    * Return codes
> - * 	PCI_ERS_RESULT_CAN_RECOVER - can be recovered with reset_link
> + *	PCI_ERS_RESULT_CAN_RECOVER - can be recovered without reset
>    * 	PCI_ERS_RESULT_NEED_RESET - need to reset before recovery
>    * 	PCI_ERS_RESULT_DISCONNECT - device could not be recovered
>    **/

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-08-14  0:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-13  5:11 [PATCH 0/5] PCI: Reduce AER / EEH deviations Lukas Wunner
2025-08-13  5:11 ` [PATCH 4/5] PCI/ERR: Update device error_state already after reset Lukas Wunner
2025-08-13 23:43   ` Sathyanarayanan Kuppuswamy
2025-08-13  5:11 ` [PATCH 5/5] PCI/ERR: Remove remnants of .link_reset() callback Lukas Wunner
2025-08-14  0:40   ` Sathyanarayanan Kuppuswamy
2025-08-13 18:21 ` [PATCH 0/5] PCI: Reduce AER / EEH deviations Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).