public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/5] PCI/AER: Report fatal errors of RCiEP and EP if link recovered
@ 2026-01-24  7:45 Shuai Xue
  2026-01-24  7:45 ` [PATCH v7 1/5] PCI/DPC: Clarify naming for error port in DPC Handling Shuai Xue
                   ` (4 more replies)
  0 siblings, 5 replies; 35+ messages in thread
From: Shuai Xue @ 2026-01-24  7:45 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy
  Cc: mahesh, oohall, xueshuai, Jonathan.Cameron, terry.bowman,
	tianruidong, lukas

changes since v6:
- add aer_report_frozen_error helper per Lukas
- add explicit log for reset case per Lukas
- add comments for pci_dev_put per Kuppuswamy and Lukas
- init info in call site per Kuppuswamy
- rename back to add_error_device per Kuppuswamy and Lukas
- highlight newly added log per Kuppuswamy 
- drop origin patch 4 due to useless aer_cap check per Lukas
- add fixes tag and cc stable for patch 4 per Kuppuswamy
- add new patch 5 to fix pcie_clear_device_status per Lukas

changes since v5:
- rebase to 6.18-rc1
- add separate patches to clear device fatal status per Kuppuswamy

changes since v4:
- rebase to 6.17-rc6
- pick up Reviewed-by tag for PATCH[2] from Sathyanarayanan
- minor typos in commit log per Manivannan

changes since v3:
- squash patch 1 and 2 into one patch per Sathyanarayanan
- add comments note for dpc_process_error per Sathyanarayanan
- pick up Reviewed-by tag for PATCH[1] from Sathyanarayanan

changes since v2:
- moving the "err_port" rename to a separate patch per Sathyanarayanan
- rewrite comments of dpc_process_error per Sathyanarayanan
- remove NULL initialization for err_dev per Sathyanarayanan

changes since v1:
- rewrite commit log per Bjorn
- refactor aer_get_device_error_info to reduce duplication per Keith
- fix to avoid reporting fatal errors twice for root and downstream ports per Keith

The AER driver has historically avoided reading the configuration space of an
endpoint or RCiEP that reported a fatal error, considering the link to that
device unreliable. Consequently, when a fatal error occurs, the AER and DPC
drivers do not report specific error types, resulting in logs like:

  pcieport 0015:00:00.0: EDR: EDR event received
  pcieport 0015:00:00.0: EDR: Reported EDR dev: 0015:00:00.0
  pcieport 0015:00:00.0: DPC: containment event, status:0x200d, ERR_FATAL received from 0015:01:00.0
  pcieport 0015:00:00.0: AER: broadcast error_detected message
  pcieport 0015:00:00.0: AER: broadcast mmio_enabled message
  pcieport 0015:00:00.0: AER: broadcast resume message
  pcieport 0015:00:00.0: pciehp: Slot(21): Link Down/Up ignored
  pcieport 0015:00:00.0: AER: device recovery successful
  pcieport 0015:00:00.0: EDR: DPC port successfully recovered
  pcieport 0015:00:00.0: EDR: Status for 0015:00:00.0: 0x80

AER status registers are sticky and Write-1-to-Clear. If the link recovered
after a hot reset, we can still safely access the AER status and TLP header
of the error device. In that case, report the fatal errors, which helps to
figure out the root cause.
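The RW1C ("sticky", Write-1-to-Clear) behavior referred to above can be
modeled in a few lines of C. This is a simplified, standalone sketch (real
AER registers also have read-only and masked bits; the bit values used in
the test are illustrative, matching the TLP error bit 12 seen in the log):

```c
#include <stdint.h>

/*
 * Simplified model of an RW1C (Write-1-to-Clear) status register, such as
 * the AER Uncorrectable Error Status register: writing 1 to a bit clears
 * it, writing 0 leaves it unchanged, so stale bits survive until software
 * explicitly acknowledges them.
 */
uint32_t rw1c_write(uint32_t reg, uint32_t val)
{
	return reg & ~val;	/* every 1 written clears that bit */
}
```

Because the bits survive a hot reset, reading them afterwards still yields
the pre-reset error record; clearing requires an explicit write-back.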

After this patch, the logs look like:

  pcieport 0015:00:00.0: EDR: EDR event received
  pcieport 0015:00:00.0: EDR: Reported EDR dev: 0015:00:00.0
  pcieport 0015:00:00.0: DPC: containment event, status:0x200d, ERR_FATAL received from 0015:01:00.0
  pcieport 0015:00:00.0: AER: broadcast error_detected message
+ vfio-pci 0015:01:00.0: AER: Errors reported prior to reset
+ vfio-pci 0015:01:00.0: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Receiver ID)
+ vfio-pci 0015:01:00.0:   device [144d:a80a] error status/mask=00001000/00400000
+ vfio-pci 0015:01:00.0:    [12] TLP                    (First)
+ vfio-pci 0015:01:00.0: AER:   TLP Header: 0x4a004010 0x00000040 0x01000000 0xffffffff
  pcieport 0015:00:00.0: AER: broadcast mmio_enabled message
  pcieport 0015:00:00.0: AER: broadcast resume message
  pcieport 0015:00:00.0: pciehp: Slot(21): Link Down/Up ignored
  pcieport 0015:00:00.0: AER: device recovery successful
  pcieport 0015:00:00.0: EDR: DPC port successfully recovered
  pcieport 0015:00:00.0: EDR: Status for 0015:00:00.0: 0x80


Shuai Xue (5):
  PCI/DPC: Clarify naming for error port in DPC Handling
  PCI/DPC: Run recovery on device that detected the error
  PCI/AER: Report fatal errors of RCiEP and EP if link recovered
  PCI/AER: Clear both AER fatal and non-fatal status
  PCI/AER: Only clear error bits in pcie_clear_device_status()

 drivers/pci/pci.c             |  2 +-
 drivers/pci/pci.h             |  6 ++++--
 drivers/pci/pcie/aer.c        | 32 ++++++++++++++++++++++++++++----
 drivers/pci/pcie/dpc.c        | 31 ++++++++++++++++++++++++-------
 drivers/pci/pcie/edr.c        | 35 ++++++++++++++++++-----------------
 drivers/pci/pcie/err.c        |  7 ++++++-
 include/uapi/linux/pci_regs.h |  1 +
 7 files changed, 82 insertions(+), 32 deletions(-)

-- 
2.39.3


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v7 1/5] PCI/DPC: Clarify naming for error port in DPC Handling
  2026-01-24  7:45 [PATCH v7 0/5] PCI/AER: Report fatal errors of RCiEP and EP if link recovered Shuai Xue
@ 2026-01-24  7:45 ` Shuai Xue
  2026-01-27 10:10   ` Jonathan Cameron
  2026-01-24  7:45 ` [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error Shuai Xue
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 35+ messages in thread
From: Shuai Xue @ 2026-01-24  7:45 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy
  Cc: mahesh, oohall, xueshuai, Jonathan.Cameron, terry.bowman,
	tianruidong, lukas

dpc_handler() is registered for the error port, which receives the DPC
interrupt, and acpi_dpc_port_get() locates the port that experienced the
containment event.

Rename edev and pdev to err_port for clarity, so that a later patch can
avoid misusing err_port in pcie_do_recovery().

No functional changes intended.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/pci/pcie/dpc.c | 10 +++++-----
 drivers/pci/pcie/edr.c | 34 +++++++++++++++++-----------------
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index fc18349614d7..bff29726c6a5 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -361,21 +361,21 @@ static bool dpc_is_surprise_removal(struct pci_dev *pdev)
 
 static irqreturn_t dpc_handler(int irq, void *context)
 {
-	struct pci_dev *pdev = context;
+	struct pci_dev *err_port = context;
 
 	/*
 	 * According to PCIe r6.0 sec 6.7.6, errors are an expected side effect
 	 * of async removal and should be ignored by software.
 	 */
-	if (dpc_is_surprise_removal(pdev)) {
-		dpc_handle_surprise_removal(pdev);
+	if (dpc_is_surprise_removal(err_port)) {
+		dpc_handle_surprise_removal(err_port);
 		return IRQ_HANDLED;
 	}
 
-	dpc_process_error(pdev);
+	dpc_process_error(err_port);
 
 	/* We configure DPC so it only triggers on ERR_FATAL */
-	pcie_do_recovery(pdev, pci_channel_io_frozen, dpc_reset_link);
+	pcie_do_recovery(err_port, pci_channel_io_frozen, dpc_reset_link);
 
 	return IRQ_HANDLED;
 }
diff --git a/drivers/pci/pcie/edr.c b/drivers/pci/pcie/edr.c
index e86298dbbcff..521fca2f40cb 100644
--- a/drivers/pci/pcie/edr.c
+++ b/drivers/pci/pcie/edr.c
@@ -150,7 +150,7 @@ static int acpi_send_edr_status(struct pci_dev *pdev, struct pci_dev *edev,
 
 static void edr_handle_event(acpi_handle handle, u32 event, void *data)
 {
-	struct pci_dev *pdev = data, *edev;
+	struct pci_dev *pdev = data, *err_port;
 	pci_ers_result_t estate = PCI_ERS_RESULT_DISCONNECT;
 	u16 status;
 
@@ -169,36 +169,36 @@ static void edr_handle_event(acpi_handle handle, u32 event, void *data)
 	 * may be that port or a parent of it (PCI Firmware r3.3, sec
 	 * 4.6.13).
 	 */
-	edev = acpi_dpc_port_get(pdev);
-	if (!edev) {
+	err_port = acpi_dpc_port_get(pdev);
+	if (!err_port) {
 		pci_err(pdev, "Firmware failed to locate DPC port\n");
 		return;
 	}
 
-	pci_dbg(pdev, "Reported EDR dev: %s\n", pci_name(edev));
+	pci_dbg(pdev, "Reported EDR dev: %s\n", pci_name(err_port));
 
 	/* If port does not support DPC, just send the OST */
-	if (!edev->dpc_cap) {
-		pci_err(edev, FW_BUG "This device doesn't support DPC\n");
+	if (!err_port->dpc_cap) {
+		pci_err(err_port, FW_BUG "This device doesn't support DPC\n");
 		goto send_ost;
 	}
 
 	/* Check if there is a valid DPC trigger */
-	pci_read_config_word(edev, edev->dpc_cap + PCI_EXP_DPC_STATUS, &status);
+	pci_read_config_word(err_port, err_port->dpc_cap + PCI_EXP_DPC_STATUS, &status);
 	if (!(status & PCI_EXP_DPC_STATUS_TRIGGER)) {
-		pci_err(edev, "Invalid DPC trigger %#010x\n", status);
+		pci_err(err_port, "Invalid DPC trigger %#010x\n", status);
 		goto send_ost;
 	}
 
-	dpc_process_error(edev);
-	pci_aer_raw_clear_status(edev);
+	dpc_process_error(err_port);
+	pci_aer_raw_clear_status(err_port);
 
 	/*
 	 * Irrespective of whether the DPC event is triggered by ERR_FATAL
 	 * or ERR_NONFATAL, since the link is already down, use the FATAL
 	 * error recovery path for both cases.
 	 */
-	estate = pcie_do_recovery(edev, pci_channel_io_frozen, dpc_reset_link);
+	estate = pcie_do_recovery(err_port, pci_channel_io_frozen, dpc_reset_link);
 
 send_ost:
 
@@ -207,15 +207,15 @@ static void edr_handle_event(acpi_handle handle, u32 event, void *data)
 	 * to firmware. If not successful, send _OST(0xF, BDF << 16 | 0x81).
 	 */
 	if (estate == PCI_ERS_RESULT_RECOVERED) {
-		pci_dbg(edev, "DPC port successfully recovered\n");
-		pcie_clear_device_status(edev);
-		acpi_send_edr_status(pdev, edev, EDR_OST_SUCCESS);
+		pci_dbg(err_port, "DPC port successfully recovered\n");
+		pcie_clear_device_status(err_port);
+		acpi_send_edr_status(pdev, err_port, EDR_OST_SUCCESS);
 	} else {
-		pci_dbg(edev, "DPC port recovery failed\n");
-		acpi_send_edr_status(pdev, edev, EDR_OST_FAILED);
+		pci_dbg(err_port, "DPC port recovery failed\n");
+		acpi_send_edr_status(pdev, err_port, EDR_OST_FAILED);
 	}
 
-	pci_dev_put(edev);
+	pci_dev_put(err_port);
 }
 
 void pci_acpi_add_edr_notifier(struct pci_dev *pdev)
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error
  2026-01-24  7:45 [PATCH v7 0/5] PCI/AER: Report fatal errors of RCiEP and EP if link recovered Shuai Xue
  2026-01-24  7:45 ` [PATCH v7 1/5] PCI/DPC: Clarify naming for error port in DPC Handling Shuai Xue
@ 2026-01-24  7:45 ` Shuai Xue
  2026-01-27 10:24   ` Jonathan Cameron
  2026-02-02 14:02   ` Lukas Wunner
  2026-01-24  7:45 ` [PATCH v7 3/5] PCI/AER: Report fatal errors of RCiEP and EP if link recovered Shuai Xue
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 35+ messages in thread
From: Shuai Xue @ 2026-01-24  7:45 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy
  Cc: mahesh, oohall, xueshuai, Jonathan.Cameron, terry.bowman,
	tianruidong, lukas

The current implementation of pcie_do_recovery() assumes that the
recovery process runs on the device that detected the error. However,
the DPC driver currently passes the port that experienced the DPC event
to pcie_do_recovery().

Use the DPC Error Source ID register to correctly identify the device
that detected the error. When the error device is passed instead,
pcie_do_recovery() will find its upstream bridge and walk the bridges
potentially affected by AER. Subsequent commits will then be able to
accurately access the AER status of the error device.

No functional changes intended.
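For reference, the DPC Error Source ID is a standard PCIe Requester ID:
bits [15:8] are the bus number, [7:3] the device, [2:0] the function. The
decoding below mirrors the kernel's PCI_BUS_NUM()/PCI_SLOT()/PCI_FUNC()
helpers, but uses prefixed macro names so the sketch stands alone outside
the kernel tree:

```c
#include <stdint.h>

/* Requester ID layout: [15:8] bus, [7:3] device (slot), [2:0] function */
#define EX_PCI_BUS_NUM(id)	(((id) >> 8) & 0xff)
#define EX_PCI_SLOT(id)		(((id) >> 3) & 0x1f)
#define EX_PCI_FUNC(id)		((id) & 0x07)
```

For the log above, a source ID of 0x0100 decodes to bus 0x01, device 0,
function 0, i.e. the 0015:01:00.0 endpoint (the domain is taken from the
reporting port, not from the source ID).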

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/pci/pci.h      |  2 +-
 drivers/pci/pcie/dpc.c | 25 +++++++++++++++++++++----
 drivers/pci/pcie/edr.c |  7 ++++---
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 0e67014aa001..58640e656897 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -771,7 +771,7 @@ struct rcec_ea {
 void pci_save_dpc_state(struct pci_dev *dev);
 void pci_restore_dpc_state(struct pci_dev *dev);
 void pci_dpc_init(struct pci_dev *pdev);
-void dpc_process_error(struct pci_dev *pdev);
+struct pci_dev *dpc_process_error(struct pci_dev *pdev);
 pci_ers_result_t dpc_reset_link(struct pci_dev *pdev);
 bool pci_dpc_recovered(struct pci_dev *pdev);
 unsigned int dpc_tlp_log_len(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index bff29726c6a5..f6069f621683 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -260,10 +260,20 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
 	return 1;
 }
 
-void dpc_process_error(struct pci_dev *pdev)
+/**
+ * dpc_process_error - handle the DPC error status
+ * @pdev: the port that experienced the containment event
+ *
+ * Return: the device that detected the error.
+ *
+ * NOTE: The device reference count is increased, the caller must decrement
+ * the reference count by calling pci_dev_put().
+ */
+struct pci_dev *dpc_process_error(struct pci_dev *pdev)
 {
 	u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
 	struct aer_err_info info = {};
+	struct pci_dev *err_dev;
 
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
 
@@ -279,6 +289,7 @@ void dpc_process_error(struct pci_dev *pdev)
 			pci_aer_clear_nonfatal_status(pdev);
 			pci_aer_clear_fatal_status(pdev);
 		}
+		err_dev = pci_dev_get(pdev);
 		break;
 	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE:
 	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE:
@@ -290,6 +301,8 @@ void dpc_process_error(struct pci_dev *pdev)
 				"ERR_FATAL" : "ERR_NONFATAL",
 			 pci_domain_nr(pdev->bus), PCI_BUS_NUM(source),
 			 PCI_SLOT(source), PCI_FUNC(source));
+		err_dev = pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus),
+					    PCI_BUS_NUM(source), source & 0xff);
 		break;
 	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_IN_EXT:
 		ext_reason = status & PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT;
@@ -304,8 +317,11 @@ void dpc_process_error(struct pci_dev *pdev)
 		if (ext_reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_RP_PIO &&
 		    pdev->dpc_rp_extensions)
 			dpc_process_rp_pio_error(pdev);
+		err_dev = pci_dev_get(pdev);
 		break;
 	}
+
+	return err_dev;
 }
 
 static void pci_clear_surpdn_errors(struct pci_dev *pdev)
@@ -361,7 +377,7 @@ static bool dpc_is_surprise_removal(struct pci_dev *pdev)
 
 static irqreturn_t dpc_handler(int irq, void *context)
 {
-	struct pci_dev *err_port = context;
+	struct pci_dev *err_port = context, *err_dev;
 
 	/*
 	 * According to PCIe r6.0 sec 6.7.6, errors are an expected side effect
@@ -372,10 +388,11 @@ static irqreturn_t dpc_handler(int irq, void *context)
 		return IRQ_HANDLED;
 	}
 
-	dpc_process_error(err_port);
+	err_dev = dpc_process_error(err_port);
 
 	/* We configure DPC so it only triggers on ERR_FATAL */
-	pcie_do_recovery(err_port, pci_channel_io_frozen, dpc_reset_link);
+	pcie_do_recovery(err_dev, pci_channel_io_frozen, dpc_reset_link);
+	pci_dev_put(err_dev);
 
 	return IRQ_HANDLED;
 }
diff --git a/drivers/pci/pcie/edr.c b/drivers/pci/pcie/edr.c
index 521fca2f40cb..b6e9d652297e 100644
--- a/drivers/pci/pcie/edr.c
+++ b/drivers/pci/pcie/edr.c
@@ -150,7 +150,7 @@ static int acpi_send_edr_status(struct pci_dev *pdev, struct pci_dev *edev,
 
 static void edr_handle_event(acpi_handle handle, u32 event, void *data)
 {
-	struct pci_dev *pdev = data, *err_port;
+	struct pci_dev *pdev = data, *err_port, *err_dev;
 	pci_ers_result_t estate = PCI_ERS_RESULT_DISCONNECT;
 	u16 status;
 
@@ -190,7 +190,7 @@ static void edr_handle_event(acpi_handle handle, u32 event, void *data)
 		goto send_ost;
 	}
 
-	dpc_process_error(err_port);
+	err_dev = dpc_process_error(err_port);
 	pci_aer_raw_clear_status(err_port);
 
 	/*
@@ -198,7 +198,8 @@ static void edr_handle_event(acpi_handle handle, u32 event, void *data)
 	 * or ERR_NONFATAL, since the link is already down, use the FATAL
 	 * error recovery path for both cases.
 	 */
-	estate = pcie_do_recovery(err_port, pci_channel_io_frozen, dpc_reset_link);
+	estate = pcie_do_recovery(err_dev, pci_channel_io_frozen, dpc_reset_link);
+	pci_dev_put(err_dev);
 
 send_ost:
 
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v7 3/5] PCI/AER: Report fatal errors of RCiEP and EP if link recovered
  2026-01-24  7:45 [PATCH v7 0/5] PCI/AER: Report fatal errors of RCiEP and EP if link recovered Shuai Xue
  2026-01-24  7:45 ` [PATCH v7 1/5] PCI/DPC: Clarify naming for error port in DPC Handling Shuai Xue
  2026-01-24  7:45 ` [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error Shuai Xue
@ 2026-01-24  7:45 ` Shuai Xue
  2026-01-27 10:36   ` Jonathan Cameron
  2026-01-28 16:50   ` Kuppuswamy Sathyanarayanan
  2026-01-24  7:45 ` [PATCH v7 4/5] PCI/AER: Clear both AER fatal and non-fatal status Shuai Xue
  2026-01-24  7:45 ` [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status() Shuai Xue
  4 siblings, 2 replies; 35+ messages in thread
From: Shuai Xue @ 2026-01-24  7:45 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy
  Cc: mahesh, oohall, xueshuai, Jonathan.Cameron, terry.bowman,
	tianruidong, lukas

The AER driver has historically avoided reading the configuration space of
an endpoint or RCiEP that reported a fatal error, considering the link to
that device unreliable. Consequently, when a fatal error occurs, the AER
and DPC drivers do not report specific error types, resulting in logs like:

  pcieport 0015:00:00.0: EDR: EDR event received
  pcieport 0015:00:00.0: EDR: Reported EDR dev: 0015:00:00.0
  pcieport 0015:00:00.0: DPC: containment event, status:0x200d, ERR_FATAL received from 0015:01:00.0
  pcieport 0015:00:00.0: AER: broadcast error_detected message
  pcieport 0015:00:00.0: AER: broadcast mmio_enabled message
  pcieport 0015:00:00.0: AER: broadcast resume message
  pcieport 0015:00:00.0: pciehp: Slot(21): Link Down/Up ignored
  pcieport 0015:00:00.0: AER: device recovery successful
  pcieport 0015:00:00.0: EDR: DPC port successfully recovered
  pcieport 0015:00:00.0: EDR: Status for 0015:00:00.0: 0x80

AER status registers are sticky and Write-1-to-Clear. If the link recovered
after a hot reset, we can still safely access the AER status and TLP header
of the error device. In that case, report the fatal errors, which helps to
figure out the root cause.

After this patch, the logs look like:

  pcieport 0015:00:00.0: EDR: EDR event received
  pcieport 0015:00:00.0: EDR: Reported EDR dev: 0015:00:00.0
  pcieport 0015:00:00.0: DPC: containment event, status:0x200d, ERR_FATAL received from 0015:01:00.0
  pcieport 0015:00:00.0: AER: broadcast error_detected message
+ vfio-pci 0015:01:00.0: AER: Errors reported prior to reset
+ vfio-pci 0015:01:00.0: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Receiver ID)
+ vfio-pci 0015:01:00.0:   device [144d:a80a] error status/mask=00001000/00400000
+ vfio-pci 0015:01:00.0:    [12] TLP                    (First)
+ vfio-pci 0015:01:00.0: AER:   TLP Header: 0x4a004010 0x00000040 0x01000000 0xffffffff
  pcieport 0015:00:00.0: AER: broadcast mmio_enabled message
  pcieport 0015:00:00.0: AER: broadcast resume message
  pcieport 0015:00:00.0: pciehp: Slot(21): Link Down/Up ignored
  pcieport 0015:00:00.0: AER: device recovery successful
  pcieport 0015:00:00.0: EDR: DPC port successfully recovered
  pcieport 0015:00:00.0: EDR: Status for 0015:00:00.0: 0x80

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/pci/pci.h      |  4 +++-
 drivers/pci/pcie/aer.c | 32 ++++++++++++++++++++++++++++----
 drivers/pci/pcie/dpc.c |  2 +-
 drivers/pci/pcie/err.c |  5 +++++
 4 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 58640e656897..bd020ba0cef0 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -746,8 +746,10 @@ struct aer_err_info {
 	struct pcie_tlp_log tlp;	/* TLP Header */
 };
 
-int aer_get_device_error_info(struct aer_err_info *info, int i);
+int aer_get_device_error_info(struct aer_err_info *info, int i,
+			      bool link_healthy);
 void aer_print_error(struct aer_err_info *info, int i);
+void aer_report_frozen_error(struct pci_dev *dev);
 
 int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
 		      unsigned int tlp_len, bool flit,
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e0bcaa896803..4c0a2bbe9197 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1384,12 +1384,14 @@ EXPORT_SYMBOL_GPL(aer_recover_queue);
  * aer_get_device_error_info - read error status from dev and store it to info
  * @info: pointer to structure to store the error record
  * @i: index into info->dev[]
+ * @link_healthy: link is healthy or not
  *
  * Return: 1 on success, 0 on error.
  *
  * Note that @info is reused among all error devices. Clear fields properly.
  */
-int aer_get_device_error_info(struct aer_err_info *info, int i)
+int aer_get_device_error_info(struct aer_err_info *info, int i,
+			      bool link_healthy)
 {
 	struct pci_dev *dev;
 	int type, aer;
@@ -1420,7 +1422,8 @@ int aer_get_device_error_info(struct aer_err_info *info, int i)
 	} else if (type == PCI_EXP_TYPE_ROOT_PORT ||
 		   type == PCI_EXP_TYPE_RC_EC ||
 		   type == PCI_EXP_TYPE_DOWNSTREAM ||
-		   info->severity == AER_NONFATAL) {
+		   info->severity == AER_NONFATAL ||
+		   (info->severity == AER_FATAL && link_healthy)) {
 
 		/* Link is still healthy for IO reads */
 		pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,
@@ -1447,17 +1450,38 @@ int aer_get_device_error_info(struct aer_err_info *info, int i)
 	return 1;
 }
 
+void aer_report_frozen_error(struct pci_dev *dev)
+{
+	struct aer_err_info info;
+	int type = pci_pcie_type(dev);
+
+	if (type != PCI_EXP_TYPE_ENDPOINT && type != PCI_EXP_TYPE_RC_END)
+		return;
+
+	info.error_dev_num = 0;
+	info.severity = AER_FATAL;
+	info.level = KERN_ERR;
+	add_error_device(&info, dev);
+
+	if (aer_get_device_error_info(&info, 0, true)) {
+		pci_err(dev, "Errors reported prior to reset\n");
+		aer_print_error(&info, 0);
+	}
+
+	pci_dev_put(dev); /* pairs with pci_dev_get() in add_error_device() */
+}
+
 static inline void aer_process_err_devices(struct aer_err_info *e_info)
 {
 	int i;
 
 	/* Report all before handling them, to not lose records by reset etc. */
 	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
-		if (aer_get_device_error_info(e_info, i))
+		if (aer_get_device_error_info(e_info, i, false))
 			aer_print_error(e_info, i);
 	}
 	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
-		if (aer_get_device_error_info(e_info, i))
+		if (aer_get_device_error_info(e_info, i, false))
 			handle_error_source(e_info->dev[i], e_info);
 	}
 }
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index f6069f621683..21c4e8371279 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -284,7 +284,7 @@ struct pci_dev *dpc_process_error(struct pci_dev *pdev)
 		pci_warn(pdev, "containment event, status:%#06x: unmasked uncorrectable error detected\n",
 			 status);
 		if (dpc_get_aer_uncorrect_severity(pdev, &info) &&
-		    aer_get_device_error_info(&info, 0)) {
+		    aer_get_device_error_info(&info, 0, false)) {
 			aer_print_error(&info, 0);
 			pci_aer_clear_nonfatal_status(pdev);
 			pci_aer_clear_fatal_status(pdev);
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index bebe4bc111d7..0780ea09478b 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -253,6 +253,11 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 			pci_warn(bridge, "subordinate device reset failed\n");
 			goto failed;
 		}
+
+		/* Link recovered, report fatal errors of RCiEP or EP */
+		if (state == pci_channel_io_frozen)
+			aer_report_frozen_error(dev);
+
 	}
 
 	if (status == PCI_ERS_RESULT_NEED_RESET) {
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v7 4/5] PCI/AER: Clear both AER fatal and non-fatal status
  2026-01-24  7:45 [PATCH v7 0/5] PCI/AER: Report fatal errors of RCiEP and EP if link recovered Shuai Xue
                   ` (2 preceding siblings ...)
  2026-01-24  7:45 ` [PATCH v7 3/5] PCI/AER: Report fatal errors of RCiEP and EP if link recovered Shuai Xue
@ 2026-01-24  7:45 ` Shuai Xue
  2026-01-27 10:39   ` Jonathan Cameron
                     ` (2 more replies)
  2026-01-24  7:45 ` [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status() Shuai Xue
  4 siblings, 3 replies; 35+ messages in thread
From: Shuai Xue @ 2026-01-24  7:45 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy
  Cc: mahesh, oohall, xueshuai, Jonathan.Cameron, terry.bowman,
	tianruidong, lukas

The DPC driver clears the AER fatal status for the port that reported
the error, but not for the downstream device that detected the error.
The current recovery code only clears the non-fatal AER status, leaving
fatal status bits set in the error device.

Use pci_aer_raw_clear_status() to clear both fatal and non-fatal error
status in the error device, ensuring all AER status bits are properly
cleared after recovery.

Fixes: aa344bc8b727 ("PCI/ERR: Clear AER status only when we control AER")
Cc: stable@vger.kernel.org
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/pci/pcie/err.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 0780ea09478b..5e463efc3d05 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -285,7 +285,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	 */
 	if (host->native_aer || pcie_ports_native) {
 		pcie_clear_device_status(dev);
-		pci_aer_clear_nonfatal_status(dev);
+		pci_aer_raw_clear_status(dev);
 	}
 
 	pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status()
  2026-01-24  7:45 [PATCH v7 0/5] PCI/AER: Report fatal errors of RCiEP and EP if link recovered Shuai Xue
                   ` (3 preceding siblings ...)
  2026-01-24  7:45 ` [PATCH v7 4/5] PCI/AER: Clear both AER fatal and non-fatal status Shuai Xue
@ 2026-01-24  7:45 ` Shuai Xue
  2026-01-27 10:45   ` Jonathan Cameron
                     ` (2 more replies)
  4 siblings, 3 replies; 35+ messages in thread
From: Shuai Xue @ 2026-01-24  7:45 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy
  Cc: mahesh, oohall, xueshuai, Jonathan.Cameron, terry.bowman,
	tianruidong, lukas

Currently, pcie_clear_device_status() clears the entire PCIe Device
Status register (PCI_EXP_DEVSTA), which includes both error status bits
and other status bits such as AUX Power Detected (AUXPD) and
Transactions Pending (TRPND).

Clearing non-error status bits can interfere with other drivers or
subsystems that may rely on them. To fix this, clear only the error
bits (0xf) while preserving the other status bits.
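A standalone sketch of the difference between the old and new write-back,
using a simplified model that treats every Device Status bit as RW1C to
show what an unmasked write-back could clear (the bit definitions match
include/uapi/linux/pci_regs.h; the helper name is illustrative):

```c
#include <stdint.h>

/* Device Status register bits, as in include/uapi/linux/pci_regs.h */
#define EX_DEVSTA_CED	0x0001	/* Correctable Error Detected */
#define EX_DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
#define EX_DEVSTA_FED	0x0004	/* Fatal Error Detected */
#define EX_DEVSTA_URD	0x0008	/* Unsupported Request Detected */
#define EX_DEVSTA_ERR	0x000f	/* the four error bits above */
#define EX_DEVSTA_AUXPD	0x0010	/* AUX Power Detected */
#define EX_DEVSTA_TRPND	0x0020	/* Transactions Pending */

/* Model an RW1C write-back: each 1 written clears the matching bit. */
uint16_t devsta_after_write(uint16_t sta, uint16_t written)
{
	return sta & ~written;
}
```

With sta = FED|AUXPD|TRPND, writing back the raw value would clear all
three bits, while writing back `sta & EX_DEVSTA_ERR` clears only FED.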

Fixes: ec752f5d54d7 ("PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL")
Cc: stable@vger.kernel.org
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/pci/pci.c             | 2 +-
 include/uapi/linux/pci_regs.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 13dbb405dc31..0b947f90c333 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2246,7 +2246,7 @@ void pcie_clear_device_status(struct pci_dev *dev)
 	u16 sta;
 
 	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
-	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
+	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta & PCI_EXP_DEVSTA_ERR);
 }
 #endif
 
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 3add74ae2594..f4b68203bc4e 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -534,6 +534,7 @@
 #define  PCI_EXP_DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
 #define  PCI_EXP_DEVSTA_FED	0x0004	/* Fatal Error Detected */
 #define  PCI_EXP_DEVSTA_URD	0x0008	/* Unsupported Request Detected */
+#define  PCI_EXP_DEVSTA_ERR	0xf	/* Error bits */
 #define  PCI_EXP_DEVSTA_AUXPD	0x0010	/* AUX Power Detected */
 #define  PCI_EXP_DEVSTA_TRPND	0x0020	/* Transactions Pending */
 #define PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1	12	/* v1 endpoints without link end here */
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 1/5] PCI/DPC: Clarify naming for error port in DPC Handling
  2026-01-24  7:45 ` [PATCH v7 1/5] PCI/DPC: Clarify naming for error port in DPC Handling Shuai Xue
@ 2026-01-27 10:10   ` Jonathan Cameron
  0 siblings, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2026-01-27 10:10 UTC (permalink / raw)
  To: Shuai Xue
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, terry.bowman,
	tianruidong, lukas

On Sat, 24 Jan 2026 15:45:53 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:

> dpc_handler() is registered for the error port, which receives the DPC
> interrupt, and acpi_dpc_port_get() locates the port that experienced the
> containment event.
> 
> Rename edev and pdev to err_port for clarity, so that a later patch can
> avoid misusing err_port in pcie_do_recovery().
> 
> No functional changes intended.
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Seems like a good readability improvement to me.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error
  2026-01-24  7:45 ` [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error Shuai Xue
@ 2026-01-27 10:24   ` Jonathan Cameron
  2026-01-28 12:27     ` Shuai Xue
  2026-02-02 14:02   ` Lukas Wunner
  1 sibling, 1 reply; 35+ messages in thread
From: Jonathan Cameron @ 2026-01-27 10:24 UTC (permalink / raw)
  To: Shuai Xue
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, terry.bowman,
	tianruidong, lukas

On Sat, 24 Jan 2026 15:45:54 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:

> The current implementation of pcie_do_recovery() assumes that the
> recovery process runs on the device that detected the error. However,
> the DPC driver currently passes the port that experienced the DPC event
> to pcie_do_recovery().
> 
> Use the DPC Error Source ID register to correctly identify the device
> that detected the error. When the error device is passed instead,
> pcie_do_recovery() will find its upstream bridge and walk the bridges
> potentially affected by AER. Subsequent commits will then be able to
> accurately access the AER status of the error device.
> 
> No functional changes intended.
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---
>  drivers/pci/pci.h      |  2 +-
>  drivers/pci/pcie/dpc.c | 25 +++++++++++++++++++++----
>  drivers/pci/pcie/edr.c |  7 ++++---
>  3 files changed, 26 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 0e67014aa001..58640e656897 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -771,7 +771,7 @@ struct rcec_ea {
>  void pci_save_dpc_state(struct pci_dev *dev);
>  void pci_restore_dpc_state(struct pci_dev *dev);
>  void pci_dpc_init(struct pci_dev *pdev);
> -void dpc_process_error(struct pci_dev *pdev);
> +struct pci_dev *dpc_process_error(struct pci_dev *pdev);
>  pci_ers_result_t dpc_reset_link(struct pci_dev *pdev);
>  bool pci_dpc_recovered(struct pci_dev *pdev);
>  unsigned int dpc_tlp_log_len(struct pci_dev *dev);
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index bff29726c6a5..f6069f621683 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -260,10 +260,20 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
>  	return 1;
>  }
>  
> -void dpc_process_error(struct pci_dev *pdev)
> +/**
> + * dpc_process_error - handle the DPC error status
> + * @pdev: the port that experienced the containment event
> + *
> + * Return: the device that detected the error.
> + *
> + * NOTE: The device reference count is increased, the caller must decrement
> + * the reference count by calling pci_dev_put().
> + */
> +struct pci_dev *dpc_process_error(struct pci_dev *pdev)

Maybe it makes sense to carry the err_port naming for the pci_dev
in here as well?  Seems stronger than just relying on people
reading the documentation you've added.
 
>  {
>  	u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
>  	struct aer_err_info info = {};
> +	struct pci_dev *err_dev;
>  
>  	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
>  
> @@ -279,6 +289,7 @@ void dpc_process_error(struct pci_dev *pdev)
>  			pci_aer_clear_nonfatal_status(pdev);
>  			pci_aer_clear_fatal_status(pdev);
>  		}
> +		err_dev = pci_dev_get(pdev);
>  		break;
>  	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE:
>  	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE:
> @@ -290,6 +301,8 @@ void dpc_process_error(struct pci_dev *pdev)
>  				"ERR_FATAL" : "ERR_NONFATAL",
>  			 pci_domain_nr(pdev->bus), PCI_BUS_NUM(source),
>  			 PCI_SLOT(source), PCI_FUNC(source));
> +		err_dev = pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus),
> +					    PCI_BUS_NUM(source), source & 0xff);

Bunch of replication in here with the pci_warn(). Maybe some local variables?
Maybe even rebuild the final parameter from PCI_DEVFN(slot, func) just to make the
association with the print really obvious?

Is there any chance that this might return NULL?   Feels like maybe that's
only a possibility on a broken setup, but I'm not sure of all the wonderful
races around hotplug and DPC occurring before the OS has caught up.

>  		break;
>  	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_IN_EXT:
>  		ext_reason = status & PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT;
> @@ -304,8 +317,11 @@ void dpc_process_error(struct pci_dev *pdev)
>  		if (ext_reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_RP_PIO &&
>  		    pdev->dpc_rp_extensions)
>  			dpc_process_rp_pio_error(pdev);
> +		err_dev = pci_dev_get(pdev);
>  		break;
>  	}
> +
> +	return err_dev;
>  }

> diff --git a/drivers/pci/pcie/edr.c b/drivers/pci/pcie/edr.c
> index 521fca2f40cb..b6e9d652297e 100644
> --- a/drivers/pci/pcie/edr.c
> +++ b/drivers/pci/pcie/edr.c
> @@ -150,7 +150,7 @@ static int acpi_send_edr_status(struct pci_dev *pdev, struct pci_dev *edev,
>  
>  static void edr_handle_event(acpi_handle handle, u32 event, void *data)
>  {
> -	struct pci_dev *pdev = data, *err_port;
> +	struct pci_dev *pdev = data, *err_port, *err_dev;
>  	pci_ers_result_t estate = PCI_ERS_RESULT_DISCONNECT;
>  	u16 status;
>  
> @@ -190,7 +190,7 @@ static void edr_handle_event(acpi_handle handle, u32 event, void *data)
>  		goto send_ost;
>  	}
>  
> -	dpc_process_error(err_port);
> +	err_dev = dpc_process_error(err_port);
>  	pci_aer_raw_clear_status(err_port);
>  
>  	/*
> @@ -198,7 +198,8 @@ static void edr_handle_event(acpi_handle handle, u32 event, void *data)
>  	 * or ERR_NONFATAL, since the link is already down, use the FATAL
>  	 * error recovery path for both cases.
>  	 */
> -	estate = pcie_do_recovery(err_port, pci_channel_io_frozen, dpc_reset_link);
> +	estate = pcie_do_recovery(err_dev, pci_channel_io_frozen, dpc_reset_link);
> +	pci_dev_put(err_dev);
>  
>  send_ost:
>  


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 3/5] PCI/AER: Report fatal errors of RCiEP and EP if link recoverd
  2026-01-24  7:45 ` [PATCH v7 3/5] PCI/AER: Report fatal errors of RCiEP and EP if link recoverd Shuai Xue
@ 2026-01-27 10:36   ` Jonathan Cameron
  2026-01-28 12:29     ` Shuai Xue
  2026-01-28 16:50   ` Kuppuswamy Sathyanarayanan
  1 sibling, 1 reply; 35+ messages in thread
From: Jonathan Cameron @ 2026-01-27 10:36 UTC (permalink / raw)
  To: Shuai Xue
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, terry.bowman,
	tianruidong, lukas

On Sat, 24 Jan 2026 15:45:55 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:

> The AER driver has historically avoided reading the configuration space of
> an endpoint or RCiEP that reported a fatal error, considering the link to
> that device unreliable. Consequently, when a fatal error occurs, the AER
> and DPC drivers do not report specific error types, resulting in logs like:
> 
>   pcieport 0015:00:00.0: EDR: EDR event received
>   pcieport 0015:00:00.0: EDR: Reported EDR dev: 0015:00:00.0
>   pcieport 0015:00:00.0: DPC: containment event, status:0x200d, ERR_FATAL received from 0015:01:00.0
>   pcieport 0015:00:00.0: AER: broadcast error_detected message
>   pcieport 0015:00:00.0: AER: broadcast mmio_enabled message
>   pcieport 0015:00:00.0: AER: broadcast resume message
>   pcieport 0015:00:00.0: pciehp: Slot(21): Link Down/Up ignored
>   pcieport 0015:00:00.0: AER: device recovery successful
>   pcieport 0015:00:00.0: EDR: DPC port successfully recovered
>   pcieport 0015:00:00.0: EDR: Status for 0015:00:00.0: 0x80
> 
> AER status registers are sticky and Write-1-to-clear. If the link recovered
> after hot reset, we can still safely access the AER status and TLP header of
> the error device. In such a case, report the fatal errors, which helps to
> figure out the error root cause.
> 
> After this patch, the logs look like:
> 
>   pcieport 0015:00:00.0: EDR: EDR event received
>   pcieport 0015:00:00.0: EDR: Reported EDR dev: 0015:00:00.0
>   pcieport 0015:00:00.0: DPC: containment event, status:0x200d, ERR_FATAL received from 0015:01:00.0
>   pcieport 0015:00:00.0: AER: broadcast error_detected message
> + vfio-pci 0015:01:00.0: AER: Errors reported prior to reset
> + vfio-pci 0015:01:00.0: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Receiver ID)
> + vfio-pci 0015:01:00.0:   device [144d:a80a] error status/mask=00001000/00400000
> + vfio-pci 0015:01:00.0:    [12] TLP                    (First)
> + vfio-pci 0015:01:00.0: AER:   TLP Header: 0x4a004010 0x00000040 0x01000000 0xffffffff
>   pcieport 0015:00:00.0: AER: broadcast mmio_enabled message
>   pcieport 0015:00:00.0: AER: broadcast resume message
>   pcieport 0015:00:00.0: pciehp: Slot(21): Link Down/Up ignored
>   pcieport 0015:00:00.0: AER: device recovery successful
>   pcieport 0015:00:00.0: EDR: DPC port successfully recovered
>   pcieport 0015:00:00.0: EDR: Status for 0015:00:00.0: 0x80
> 
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Hi Shuai,

With the structure zeroed below (just to make this easier to review, not because
there is a bug as far as I can see)
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e0bcaa896803..4c0a2bbe9197 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c

> @@ -1447,17 +1450,38 @@ int aer_get_device_error_info(struct aer_err_info *info, int i)
>  	return 1;
>  }
>  
> +void aer_report_frozen_error(struct pci_dev *dev)
> +{
> +	struct aer_err_info info;
> +	int type = pci_pcie_type(dev);
> +
> +	if (type != PCI_EXP_TYPE_ENDPOINT && type != PCI_EXP_TYPE_RC_END)
> +		return;
> +

struct aer_err_info has a bunch of fields. I'd just make sure it's zeroed
to avoid us having to check that they are all filled in. = {};
or do it with the initial values being assigned.

	info = (struct aer_err_info) {
	.error_dev_num = 0,
		.severity = AER_FATAL,
		.level = KERN_ERR,
	};
	add_error_device(&info, dev);

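The compound-literal assignment suggested above overwrites the whole struct, so any field not named in the initializer is zeroed. A standalone sketch (the struct below is a hypothetical reduced stand-in for struct aer_err_info, and AER_FATAL's value is assumed for illustration only):

```c
#include <assert.h>

/* Hypothetical reduced mirror of struct aer_err_info: the three fields
 * assigned in the suggestion, plus one extra field standing in for the
 * many others the real struct carries. */
struct aer_err_info_sketch {
	int error_dev_num;
	int severity;
	int level;
	unsigned int status;	/* stands in for the remaining fields */
};

#define AER_FATAL	2	/* assumed value, illustration only */
#define KERN_ERR_LEVEL	3	/* stands in for KERN_ERR */

/* Assigning a compound literal replaces the whole object: fields not
 * named in the initializer are implicitly zero-initialized, so no
 * stale stack data survives. */
static struct aer_err_info_sketch make_info(void)
{
	struct aer_err_info_sketch info;

	info.status = 0xdeadbeef;	/* pretend stale stack garbage */
	info = (struct aer_err_info_sketch) {
		.error_dev_num = 0,
		.severity = AER_FATAL,
		.level = KERN_ERR_LEVEL,
	};
	return info;
}
```

This is why the compound literal is equivalent to the plain `= {}` initializer plus explicit assignments, while being harder to get wrong as fields are added later.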

> +	info.error_dev_num = 0;
> +	info.severity = AER_FATAL;
> +	info.level = KERN_ERR;
> +	add_error_device(&info, dev);
> +
> +	if (aer_get_device_error_info(&info, 0, true)) {
> +		pci_err(dev, "Errors reported prior to reset\n");
> +		aer_print_error(&info, 0);
> +	}
> +
> +	pci_dev_put(dev); /* pairs with pci_dev_get() in add_error_device() */
> +}


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 4/5] PCI/AER: Clear both AER fatal and non-fatal status
  2026-01-24  7:45 ` [PATCH v7 4/5] PCI/AER: Clear both AER fatal and non-fatal status Shuai Xue
@ 2026-01-27 10:39   ` Jonathan Cameron
  2026-01-28 12:30     ` Shuai Xue
  2026-01-28 16:58   ` Kuppuswamy Sathyanarayanan
  2026-02-03  8:06   ` Lukas Wunner
  2 siblings, 1 reply; 35+ messages in thread
From: Jonathan Cameron @ 2026-01-27 10:39 UTC (permalink / raw)
  To: Shuai Xue
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, terry.bowman,
	tianruidong, lukas

On Sat, 24 Jan 2026 15:45:56 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:

> The DPC driver clears AER fatal status for the port that reported the
> error, but not for the downstream device that detected the error.  The
> current recovery code only clears non-fatal AER status, leaving fatal
> status bits set in the error device.
> 
> Use pci_aer_raw_clear_status() to clear both fatal and non-fatal error
> status in the error device, ensuring all AER status bits are properly
> cleared after recovery.
> 
> Fixes: aa344bc8b727 ("PCI/ERR: Clear AER status only when we control AER")
> Cc: stable@vger.kernel.org
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Shouldn't this be first patch in series to make it easier to backport?

Otherwise seems reasonable to me, but others know these flows better than me
so hopefully we'll get some more review.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
>  drivers/pci/pcie/err.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 0780ea09478b..5e463efc3d05 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -285,7 +285,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  	 */
>  	if (host->native_aer || pcie_ports_native) {
>  		pcie_clear_device_status(dev);
> -		pci_aer_clear_nonfatal_status(dev);
> +		pci_aer_raw_clear_status(dev);
>  	}
>  
>  	pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status()
  2026-01-24  7:45 ` [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status() Shuai Xue
@ 2026-01-27 10:45   ` Jonathan Cameron
  2026-01-28 12:45     ` Shuai Xue
  2026-01-28 17:01   ` Kuppuswamy Sathyanarayanan
  2026-02-03  7:53   ` Lukas Wunner
  2 siblings, 1 reply; 35+ messages in thread
From: Jonathan Cameron @ 2026-01-27 10:45 UTC (permalink / raw)
  To: Shuai Xue
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, terry.bowman,
	tianruidong, lukas

On Sat, 24 Jan 2026 15:45:57 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:

> Currently, pcie_clear_device_status() clears the entire PCIe Device
> Status register (PCI_EXP_DEVSTA), which includes both error status bits
> and other status bits such as AUX Power Detected (AUXPD) and
> Transactions Pending (TRPND).
> 
> Clearing non-error status bits can interfere with other drivers or
> subsystems that may rely on these bits. To fix it, only clear the error
> bits (0xf) while preserving other status bits.
> 
> Fixes: ec752f5d54d7 ("PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL")
> Cc: stable@vger.kernel.org
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Similar to previous. Drag to start of series to make backports easier if
we think this is a fix that affects real cases.  For stuff that's defined
in the PCI 6.2 spec, AUX Power Detected and Transactions Pending are RO, but
the interesting one is Emergency Power Reduction Detected, which is RW1C
and hence the reason this fix is potentially needed, plus future features
using the remaining bits.

I'd be more explicit in the commit message for that.  Talk about it not
mattering for AUXPD or TRPND because they are RO, vs the ones we
aren't using yet in Linux with the emergency power reduction detected
as an example that clearly shows we need to mask this!

J


> ---
>  drivers/pci/pci.c             | 2 +-
>  include/uapi/linux/pci_regs.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 13dbb405dc31..0b947f90c333 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2246,7 +2246,7 @@ void pcie_clear_device_status(struct pci_dev *dev)
>  	u16 sta;
>  
>  	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
> -	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
> +	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta & PCI_EXP_DEVSTA_ERR);
>  }
>  #endif
>  
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 3add74ae2594..f4b68203bc4e 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -534,6 +534,7 @@
>  #define  PCI_EXP_DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
>  #define  PCI_EXP_DEVSTA_FED	0x0004	/* Fatal Error Detected */
>  #define  PCI_EXP_DEVSTA_URD	0x0008	/* Unsupported Request Detected */
> +#define  PCI_EXP_DEVSTA_ERR	0xf	/* Error bits */
>  #define  PCI_EXP_DEVSTA_AUXPD	0x0010	/* AUX Power Detected */
>  #define  PCI_EXP_DEVSTA_TRPND	0x0020	/* Transactions Pending */
>  #define PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1	12	/* v1 endpoints without link end here */


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error
  2026-01-27 10:24   ` Jonathan Cameron
@ 2026-01-28 12:27     ` Shuai Xue
  2026-01-28 15:02       ` Jonathan Cameron
  0 siblings, 1 reply; 35+ messages in thread
From: Shuai Xue @ 2026-01-28 12:27 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, terry.bowman,
	tianruidong, lukas



On 1/27/26 6:24 PM, Jonathan Cameron wrote:
> On Sat, 24 Jan 2026 15:45:54 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> 
>> The current implementation of pcie_do_recovery() assumes that the
>> recovery process is executed for the device that detected the error.
>> However, the DPC driver currently passes the error port that experienced
>> the DPC event to pcie_do_recovery().
>>
>> Use the SOURCE ID register to correctly identify the device that
>> detected the error. When passed the error device, pcie_do_recovery()
>> will find the upstream bridge and walk the bridges that are
>> potentially affected by AER. Subsequent commits will then be able to
>> accurately access the AER status of the error device.
>>
>> No functional changes intended.
>>
>> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> ---
>>   drivers/pci/pci.h      |  2 +-
>>   drivers/pci/pcie/dpc.c | 25 +++++++++++++++++++++----
>>   drivers/pci/pcie/edr.c |  7 ++++---
>>   3 files changed, 26 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index 0e67014aa001..58640e656897 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -771,7 +771,7 @@ struct rcec_ea {
>>   void pci_save_dpc_state(struct pci_dev *dev);
>>   void pci_restore_dpc_state(struct pci_dev *dev);
>>   void pci_dpc_init(struct pci_dev *pdev);
>> -void dpc_process_error(struct pci_dev *pdev);
>> +struct pci_dev *dpc_process_error(struct pci_dev *pdev);
>>   pci_ers_result_t dpc_reset_link(struct pci_dev *pdev);
>>   bool pci_dpc_recovered(struct pci_dev *pdev);
>>   unsigned int dpc_tlp_log_len(struct pci_dev *dev);
>> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
>> index bff29726c6a5..f6069f621683 100644
>> --- a/drivers/pci/pcie/dpc.c
>> +++ b/drivers/pci/pcie/dpc.c
>> @@ -260,10 +260,20 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
>>   	return 1;
>>   }
>>   
>> -void dpc_process_error(struct pci_dev *pdev)
>> +/**
>> + * dpc_process_error - handle the DPC error status
>> + * @pdev: the port that experienced the containment event
>> + *
>> + * Return: the device that detected the error.
>> + *
>> + * NOTE: The device reference count is increased, the caller must decrement
>> + * the reference count by calling pci_dev_put().
>> + */
>> +struct pci_dev *dpc_process_error(struct pci_dev *pdev)
> 
> Maybe it makes sense to carry the err_port naming for the pci_dev
> in here as well?  Seems stronger than just relying on people
> reading the documentation you've added.

Good point. I think renaming the parameter would improve clarity. However,
I'd prefer to handle it in a separate patch to keep this change focused on
the functional modification. Would that work for you?

>   
>>   {
>>   	u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
>>   	struct aer_err_info info = {};
>> +	struct pci_dev *err_dev;
>>   
>>   	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
>>   
>> @@ -279,6 +289,7 @@ void dpc_process_error(struct pci_dev *pdev)
>>   			pci_aer_clear_nonfatal_status(pdev);
>>   			pci_aer_clear_fatal_status(pdev);
>>   		}
>> +		err_dev = pci_dev_get(pdev);
>>   		break;
>>   	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE:
>>   	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE:
>> @@ -290,6 +301,8 @@ void dpc_process_error(struct pci_dev *pdev)
>>   				"ERR_FATAL" : "ERR_NONFATAL",
>>   			 pci_domain_nr(pdev->bus), PCI_BUS_NUM(source),
>>   			 PCI_SLOT(source), PCI_FUNC(source));
>> +		err_dev = pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus),
>> +					    PCI_BUS_NUM(source), source & 0xff);
> 
> Bunch of replication in here with the pci_warn(). Maybe some local variables?
> Maybe even rebuild the final parameter from PCI_DEVFN(slot, func) just to make the
> association with the print really obvious?

Agreed. Here's the improved version:

--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -293,17 +293,28 @@ struct pci_dev *dpc_process_error(struct pci_dev *pdev)
                 break;
         case PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE:
         case PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE:
-               pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID,
-                                    &source);
+       {
+               int domain, bus;
+               u8 slot, func, devfn;
+               const char *err_type;
+
+               pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
+
+               /* Extract source device location */
+               domain = pci_domain_nr(pdev->bus);
+               bus = PCI_BUS_NUM(source);
+               slot = PCI_SLOT(source);
+               func = PCI_FUNC(source);
+               devfn = PCI_DEVFN(slot, func);
+
+               err_type = (reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE) ?
+                          "ERR_FATAL" : "ERR_NONFATAL";
+
                 pci_warn(pdev, "containment event, status:%#06x, %s received from %04x:%02x:%02x.%d\n",
-                        status,
-                        (reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE) ?
-                               "ERR_FATAL" : "ERR_NONFATAL",
-                        pci_domain_nr(pdev->bus), PCI_BUS_NUM(source),
-                        PCI_SLOT(source), PCI_FUNC(source));
-               err_dev = pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus),
-                                           PCI_BUS_NUM(source), source & 0xff);
+                        status, err_type, domain, bus, slot, func);
+               err_dev = pci_get_domain_bus_and_slot(domain, bus, devfn);
                 break;
+       }
         case PCI_EXP_DPC_STATUS_TRIGGER_RSN_IN_EXT:
                 ext_reason = status & PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT;
                 pci_warn(pdev, "containment event, status:%#06x: %s detected\n",

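The devfn rebuild in the diff above can be sanity-checked with a small standalone sketch. The first three macros are copied from include/uapi/linux/pci.h and PCI_BUS_NUM from drivers/pci/pci.h; the helper function is hypothetical, written only to show that decomposing a Source ID with PCI_SLOT()/PCI_FUNC() and recombining with PCI_DEVFN() recovers the low byte that the original code passed as `source & 0xff`:

```c
#include <assert.h>

/* Copied from include/uapi/linux/pci.h */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)
/* Copied from drivers/pci/pci.h */
#define PCI_BUS_NUM(x)		(((x) >> 8) & 0xff)

/* Hypothetical helper: rebuild the devfn byte of a DPC Source ID
 * (bus number in bits 15:8, devfn in bits 7:0) from its slot and
 * function parts, exactly as the proposed diff does. */
static unsigned int devfn_from_source(unsigned short source)
{
	unsigned char slot = PCI_SLOT(source);
	unsigned char func = PCI_FUNC(source);

	return PCI_DEVFN(slot, func);
}
```

Because PCI_SLOT() and PCI_FUNC() partition the low byte without overlap, `PCI_DEVFN(PCI_SLOT(x), PCI_FUNC(x)) == (x & 0xff)` holds for any Source ID, so the rewrite is behavior-preserving while making the association with the pci_warn() format obvious.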
> 
> Is there any chance that this might return NULL?   Feels like maybe that's
> only a possibility on a broken setup, but I'm not sure of all the wonderful
> races around hotplug and DPC occurring before the OS has caught up.

Surprise Down events are handled separately in
dpc_handle_surprise_removal() and won't reach dpc_process_error().
Please correct me if I missed anything.

Thanks for valuable comments.

Best Regards,
Shuai

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 3/5] PCI/AER: Report fatal errors of RCiEP and EP if link recoverd
  2026-01-27 10:36   ` Jonathan Cameron
@ 2026-01-28 12:29     ` Shuai Xue
  0 siblings, 0 replies; 35+ messages in thread
From: Shuai Xue @ 2026-01-28 12:29 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, terry.bowman,
	tianruidong, lukas



On 1/27/26 6:36 PM, Jonathan Cameron wrote:
> On Sat, 24 Jan 2026 15:45:55 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> 
>> The AER driver has historically avoided reading the configuration space of
>> an endpoint or RCiEP that reported a fatal error, considering the link to
>> that device unreliable. Consequently, when a fatal error occurs, the AER
>> and DPC drivers do not report specific error types, resulting in logs like:
>>
>>    pcieport 0015:00:00.0: EDR: EDR event received
>>    pcieport 0015:00:00.0: EDR: Reported EDR dev: 0015:00:00.0
>>    pcieport 0015:00:00.0: DPC: containment event, status:0x200d, ERR_FATAL received from 0015:01:00.0
>>    pcieport 0015:00:00.0: AER: broadcast error_detected message
>>    pcieport 0015:00:00.0: AER: broadcast mmio_enabled message
>>    pcieport 0015:00:00.0: AER: broadcast resume message
>>    pcieport 0015:00:00.0: pciehp: Slot(21): Link Down/Up ignored
>>    pcieport 0015:00:00.0: AER: device recovery successful
>>    pcieport 0015:00:00.0: EDR: DPC port successfully recovered
>>    pcieport 0015:00:00.0: EDR: Status for 0015:00:00.0: 0x80
>>
>> AER status registers are sticky and Write-1-to-clear. If the link recovered
>> after hot reset, we can still safely access the AER status and TLP header of
>> the error device. In such a case, report the fatal errors, which helps to
>> figure out the error root cause.
>>
>> After this patch, the logs look like:
>>
>>    pcieport 0015:00:00.0: EDR: EDR event received
>>    pcieport 0015:00:00.0: EDR: Reported EDR dev: 0015:00:00.0
>>    pcieport 0015:00:00.0: DPC: containment event, status:0x200d, ERR_FATAL received from 0015:01:00.0
>>    pcieport 0015:00:00.0: AER: broadcast error_detected message
>> + vfio-pci 0015:01:00.0: AER: Errors reported prior to reset
>> + vfio-pci 0015:01:00.0: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Receiver ID)
>> + vfio-pci 0015:01:00.0:   device [144d:a80a] error status/mask=00001000/00400000
>> + vfio-pci 0015:01:00.0:    [12] TLP                    (First)
>> + vfio-pci 0015:01:00.0: AER:   TLP Header: 0x4a004010 0x00000040 0x01000000 0xffffffff
>>    pcieport 0015:00:00.0: AER: broadcast mmio_enabled message
>>    pcieport 0015:00:00.0: AER: broadcast resume message
>>    pcieport 0015:00:00.0: pciehp: Slot(21): Link Down/Up ignored
>>    pcieport 0015:00:00.0: AER: device recovery successful
>>    pcieport 0015:00:00.0: EDR: DPC port successfully recovered
>>    pcieport 0015:00:00.0: EDR: Status for 0015:00:00.0: 0x80
>>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Hi Shuai,
> 
> With the structure zeroed below (just to make this easier to review, not because
> there is a bug as far as I can see)
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> 
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index e0bcaa896803..4c0a2bbe9197 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
> 
>> @@ -1447,17 +1450,38 @@ int aer_get_device_error_info(struct aer_err_info *info, int i)
>>   	return 1;
>>   }
>>   
>> +void aer_report_frozen_error(struct pci_dev *dev)
>> +{
>> +	struct aer_err_info info;
>> +	int type = pci_pcie_type(dev);
>> +
>> +	if (type != PCI_EXP_TYPE_ENDPOINT && type != PCI_EXP_TYPE_RC_END)
>> +		return;
>> +
> 
> struct aer_err_info has a bunch of fields. I'd just make sure it's zeroed
> to avoid us having to check that they are all filled in. = {};
> or do it with the initial values being assigned.
> 
> 	info = (struct aer_err_info) {
> 		.error_dev_num = 0,
> 		.severity = AER_FATAL,
> 		.level = KERN_ERR,
> 	};
> 	add_error_device(&info, dev);
> 
> 

Got it. Will fix it in next verison.

Thanks for valuable comments.

Best Regards,
Shuai

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 4/5] PCI/AER: Clear both AER fatal and non-fatal status
  2026-01-27 10:39   ` Jonathan Cameron
@ 2026-01-28 12:30     ` Shuai Xue
  0 siblings, 0 replies; 35+ messages in thread
From: Shuai Xue @ 2026-01-28 12:30 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, terry.bowman,
	tianruidong, lukas



On 1/27/26 6:39 PM, Jonathan Cameron wrote:
> On Sat, 24 Jan 2026 15:45:56 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> 
>> The DPC driver clears AER fatal status for the port that reported the
>> error, but not for the downstream device that detected the error.  The
>> current recovery code only clears non-fatal AER status, leaving fatal
>> status bits set in the error device.
>>
>> Use pci_aer_raw_clear_status() to clear both fatal and non-fatal error
>> status in the error device, ensuring all AER status bits are properly
>> cleared after recovery.
>>
>> Fixes: aa344bc8b727 ("PCI/ERR: Clear AER status only when we control AER")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Shouldn't this be first patch in series to make it easier to backport?

Yes, you are right. Will move it as the first one.

> 
> Otherwise seems reasonable to me, but others know these flows better than me
> so hopefully we'll get some more review.
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Thanks.

Best Regards,
Shuai

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status()
  2026-01-27 10:45   ` Jonathan Cameron
@ 2026-01-28 12:45     ` Shuai Xue
  2026-02-03  7:44       ` Lukas Wunner
  0 siblings, 1 reply; 35+ messages in thread
From: Shuai Xue @ 2026-01-28 12:45 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, terry.bowman,
	tianruidong, lukas



On 1/27/26 6:45 PM, Jonathan Cameron wrote:
> On Sat, 24 Jan 2026 15:45:57 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> 
>> Currently, pcie_clear_device_status() clears the entire PCIe Device
>> Status register (PCI_EXP_DEVSTA), which includes both error status bits
>> and other status bits such as AUX Power Detected (AUXPD) and
>> Transactions Pending (TRPND).
>>
>> Clearing non-error status bits can interfere with other drivers or
>> subsystems that may rely on these bits. To fix it, only clear the error
>> bits (0xf) while preserving other status bits.
>>
>> Fixes: ec752f5d54d7 ("PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL")
>> Cc: stable@vger.kernel.org
>> Suggested-by: Lukas Wunner <lukas@wunner.de>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Similar to previous. Drag to start of series to make backports easier if
> we think this is a fix that affects real cases.  

Thank you for the detailed feedback.

I'll move this patch to the start of the series for easier backporting.

> For stuff that's defined
> in the PCI 6.2 spec, AUX power and Transactions Pending are RO, but
> the interesting one is Emergency Power Reduction Detected, which is RW1C
> and hence the reason this fix is potentially needed, plus future features
> using the remaining bits.
> 
> I'd be more explicit in the commit message for that.  Talk about it not
> mattering for AUXPD or TRPND because they are RO, vs the ones we
> aren't using yet in Linux with the emergency power reduction detected
> as an example that clearly shows we need to mask this!
> 

You're absolutely right - the commit
message should be more explicit about the different bit types and their
implications.

The revised the commit message is:

PCI/AER: Only clear error bits in PCIe Device Status register

Currently, pcie_clear_device_status() clears the entire PCIe Device
Status register (PCI_EXP_DEVSTA), which includes both error status bits
and other status bits.

According to PCIe Base Spec r6.0 sec 7.5.3.5, the Device Status register
contains different types of status bits:

- Read-only bits (AUXPD, TRPND): Writing to these has no effect, so
   including them in the write is harmless but unnecessary.

- RW1C (read/write 1 to clear) non-error bits: For example, Emergency
   Power Reduction Detected (bit 6). Unconditionally clearing these bits
   can interfere with other drivers or subsystems that rely on them.

- Reserved bits: May be used for future features and should be preserved.

To prevent unintended side effects, modify pcie_clear_device_status() to
only clear the four error status bits (CED, NFED, FED, URD) while
preserving all other status bits.

Fixes: ec752f5d54d7 ("PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL")
Cc: stable@vger.kernel.org
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
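The RW1C hazard described in the revised commit message can be modeled with a standalone sketch. The PCI_EXP_DEVSTA_* bit values are the real ones from include/uapi/linux/pci_regs.h (PCI_EXP_DEVSTA_ERR is the mask proposed by this patch); the EPRD bit and the register model are hypothetical, written only to illustrate why writing the full status back clears more than intended:

```c
#include <assert.h>

/* Bit values from include/uapi/linux/pci_regs.h */
#define PCI_EXP_DEVSTA_CED	0x0001	/* Correctable Error Detected */
#define PCI_EXP_DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
#define PCI_EXP_DEVSTA_FED	0x0004	/* Fatal Error Detected */
#define PCI_EXP_DEVSTA_URD	0x0008	/* Unsupported Request Detected */
#define PCI_EXP_DEVSTA_ERR	0x000f	/* mask proposed by this patch */

/* Hypothetical stand-in for the RW1C Emergency Power Reduction
 * Detected bit (PCIe r6.0 sec 7.5.3.5); not defined in pci_regs.h. */
#define DEVSTA_EPRD_SKETCH	0x0040

/* Toy model of a Device Status register in which the four error bits
 * and EPRD are RW1C: writing 1 clears the bit, writing 0 leaves it
 * untouched. RO bits would simply ignore the write. */
static unsigned short devsta_write(unsigned short reg, unsigned short val)
{
	unsigned short rw1c = PCI_EXP_DEVSTA_ERR | DEVSTA_EPRD_SKETCH;

	return reg & ~(val & rw1c);
}
```

With CED, FED, and EPRD all set (0x0045), writing the raw status back (the old pcie_clear_device_status() behavior) clears EPRD as a side effect, whereas masking the write with PCI_EXP_DEVSTA_ERR clears only the error bits and leaves EPRD pending for whoever consumes it.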

> J

Thanks.

Best Regards,
Shuai

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error
  2026-01-28 12:27     ` Shuai Xue
@ 2026-01-28 15:02       ` Jonathan Cameron
  2026-01-29  5:49         ` Shuai Xue
  0 siblings, 1 reply; 35+ messages in thread
From: Jonathan Cameron @ 2026-01-28 15:02 UTC (permalink / raw)
  To: Shuai Xue
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, terry.bowman,
	tianruidong, lukas

On Wed, 28 Jan 2026 20:27:31 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:

> On 1/27/26 6:24 PM, Jonathan Cameron wrote:
> > On Sat, 24 Jan 2026 15:45:54 +0800
> > Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> >   
> >> The current implementation of pcie_do_recovery() assumes that the
> >> recovery process is executed for the device that detected the error.
> >> However, the DPC driver currently passes the error port that experienced
> >> the DPC event to pcie_do_recovery().
> >>
> >> Use the SOURCE ID register to correctly identify the device that
> >> detected the error. When passed the error device, pcie_do_recovery()
> >> will find the upstream bridge and walk the bridges that are
> >> potentially affected by AER. Subsequent commits will then be able to
> >> accurately access the AER status of the error device.
> >>
> >> No functional changes intended.
> >>
> >> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> >> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Hi Shuai,

> >> ---
> >>   drivers/pci/pci.h      |  2 +-
> >>   drivers/pci/pcie/dpc.c | 25 +++++++++++++++++++++----
> >>   drivers/pci/pcie/edr.c |  7 ++++---
> >>   3 files changed, 26 insertions(+), 8 deletions(-)
> >>
...

> >> -void dpc_process_error(struct pci_dev *pdev)
> >> +/**
> >> + * dpc_process_error - handle the DPC error status
> >> + * @pdev: the port that experienced the containment event
> >> + *
> >> + * Return: the device that detected the error.
> >> + *
> >> + * NOTE: The device reference count is increased, the caller must decrement
> >> + * the reference count by calling pci_dev_put().
> >> + */
> >> +struct pci_dev *dpc_process_error(struct pci_dev *pdev)  
> > 
> > Maybe it makes sense to carry the err_port naming for the pci_dev
> > in here as well?  Seems stronger than just relying on people
> > reading the documentation you've added.  
> 
> Good point. I think renaming the parameter would improve clarity. However,
> I'd prefer to handle it in a separate patch to keep this change focused on
> the functional modification. Would that work for you?
> 
Sure. Ideal would be a precursor patch, but if it's much easier to
do on top of this I'm fine with that.

> >     
> >>   {
> >>   	u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
> >>   	struct aer_err_info info = {};
> >> +	struct pci_dev *err_dev;
> >>   
> >>   	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
> >>   
> >> @@ -279,6 +289,7 @@ void dpc_process_error(struct pci_dev *pdev)
> >>   			pci_aer_clear_nonfatal_status(pdev);
> >>   			pci_aer_clear_fatal_status(pdev);
> >>   		}
> >> +		err_dev = pci_dev_get(pdev);
> >>   		break;
> >>   	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE:
> >>   	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE:
> >> @@ -290,6 +301,8 @@ void dpc_process_error(struct pci_dev *pdev)
> >>   				"ERR_FATAL" : "ERR_NONFATAL",
> >>   			 pci_domain_nr(pdev->bus), PCI_BUS_NUM(source),
> >>   			 PCI_SLOT(source), PCI_FUNC(source));
> >> +		err_dev = pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus),
> >> +					    PCI_BUS_NUM(source), source & 0xff);  
> > 
> > Bunch of replication in here with the pci_warn(). Maybe some local variables?
> > Maybe even rebuild the final parameter from PCI_DEVFN(slot, func) just to make the
> > association with the print really obvious?  
> 
> Agreed. Here's the improved version:
> 
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -293,17 +293,28 @@ struct pci_dev *dpc_process_error(struct pci_dev *pdev)
>                  break;
>          case PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE:
>          case PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE:
> -               pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID,
> -                                    &source);
> +       {
> +               int domain, bus;
> +               u8 slot, func, devfn;
> +               const char *err_type;
> +
> +               pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
> +
> +               /* Extract source device location */
> +               domain = pci_domain_nr(pdev->bus);
> +               bus = PCI_BUS_NUM(source);
> +               slot = PCI_SLOT(source);
> +               func = PCI_FUNC(source);
> +               devfn = PCI_DEVFN(slot, func);
> +
> +               err_type = (reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE) ?
> +                          "ERR_FATAL" : "ERR_NONFATAL";
> +
>                  pci_warn(pdev, "containment event, status:%#06x, %s received from %04x:%02x:%02x.%d\n",
> -                        status,
> -                        (reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE) ?
> -                               "ERR_FATAL" : "ERR_NONFATAL",
> -                        pci_domain_nr(pdev->bus), PCI_BUS_NUM(source),
> -                        PCI_SLOT(source), PCI_FUNC(source));
> -               err_dev = pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus),
> -                                           PCI_BUS_NUM(source), source & 0xff);
> +                        status, err_type, domain, bus, slot, func);
> +               err_dev = pci_get_domain_bus_and_slot(domain, bus, devfn);
Maybe don't bother with local variables for the things only used once.
e.g.

		err_dev = pci_get_domain_bus_and_slot(domain, bus, PCI_DEVFN(slot, func));

and possibly the same for err_type.

I don't mind though if you prefer to use locals for everything.



>                  break;
> +       }
>          case PCI_EXP_DPC_STATUS_TRIGGER_RSN_IN_EXT:
>                  ext_reason = status & PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT;
>                  pci_warn(pdev, "containment event, status:%#06x: %s detected\n",
> 
> > 
> > Is there any chance that this might return NULL?   Feels like maybe that's
> > only a possibility on a broken setup, but I'm not sure of all the wonderful
> > races around hotplug and DPC occurring before the OS has caught up.  
> 
> Surprise Down events are handled separately in
> dpc_handle_surprise_removal() and won't reach dpc_process_error().
> Please correct me if I missed anything.

Probably fine. I just didn't check particularly closely.  

Jonathan
> 
> Thanks for valuable comments.
> 
> Best Regards,
> Shuai


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 3/5] PCI/AER: Report fatal errors of RCiEP and EP if link recoverd
  2026-01-24  7:45 ` [PATCH v7 3/5] PCI/AER: Report fatal errors of RCiEP and EP if link recoverd Shuai Xue
  2026-01-27 10:36   ` Jonathan Cameron
@ 2026-01-28 16:50   ` Kuppuswamy Sathyanarayanan
  2026-01-29 11:46     ` Shuai Xue
  1 sibling, 1 reply; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2026-01-28 16:50 UTC (permalink / raw)
  To: Shuai Xue, linux-pci, linux-kernel, linuxppc-dev, bhelgaas,
	kbusch
  Cc: mahesh, oohall, Jonathan.Cameron, terry.bowman, tianruidong,
	lukas

Hi,

On 1/23/2026 11:45 PM, Shuai Xue wrote:
> The AER driver has historically avoided reading the configuration space of
> an endpoint or RCiEP that reported a fatal error, considering the link to
> that device unreliable. Consequently, when a fatal error occurs, the AER
> and DPC drivers do not report specific error types, resulting in logs like:
> 
>   pcieport 0015:00:00.0: EDR: EDR event received
>   pcieport 0015:00:00.0: EDR: Reported EDR dev: 0015:00:00.0
>   pcieport 0015:00:00.0: DPC: containment event, status:0x200d, ERR_FATAL received from 0015:01:00.0
>   pcieport 0015:00:00.0: AER: broadcast error_detected message
>   pcieport 0015:00:00.0: AER: broadcast mmio_enabled message
>   pcieport 0015:00:00.0: AER: broadcast resume message
>   pcieport 0015:00:00.0: pciehp: Slot(21): Link Down/Up ignored
>   pcieport 0015:00:00.0: AER: device recovery successful
>   pcieport 0015:00:00.0: EDR: DPC port successfully recovered
>   pcieport 0015:00:00.0: EDR: Status for 0015:00:00.0: 0x80
> 
> AER status registers are sticky and Write-1-to-Clear. If the link recovered
> after a hot reset, we can still safely access the AER status and TLP header
> of the error device. In that case, report fatal errors, which helps to
> figure out the error root cause.
> 
> After this patch, the logs look like:
> 
>   pcieport 0015:00:00.0: EDR: EDR event received
>   pcieport 0015:00:00.0: EDR: Reported EDR dev: 0015:00:00.0
>   pcieport 0015:00:00.0: DPC: containment event, status:0x200d, ERR_FATAL received from 0015:01:00.0
>   pcieport 0015:00:00.0: AER: broadcast error_detected message
> + vfio-pci 0015:01:00.0: AER: Errors reported prior to reset
> + vfio-pci 0015:01:00.0: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Receiver ID)
> + vfio-pci 0015:01:00.0:   device [144d:a80a] error status/mask=00001000/00400000
> + vfio-pci 0015:01:00.0:    [12] TLP                    (First)
> + vfio-pci 0015:01:00.0: AER:   TLP Header: 0x4a004010 0x00000040 0x01000000 0xffffffff
>   pcieport 0015:00:00.0: AER: broadcast mmio_enabled message
>   pcieport 0015:00:00.0: AER: broadcast resume message
>   pcieport 0015:00:00.0: pciehp: Slot(21): Link Down/Up ignored
>   pcieport 0015:00:00.0: AER: device recovery successful
>   pcieport 0015:00:00.0: EDR: DPC port successfully recovered
>   pcieport 0015:00:00.0: EDR: Status for 0015:00:00.0: 0x80
> 
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---

LGTM. Few suggestions inline.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


>  drivers/pci/pci.h      |  4 +++-
>  drivers/pci/pcie/aer.c | 32 ++++++++++++++++++++++++++++----
>  drivers/pci/pcie/dpc.c |  2 +-
>  drivers/pci/pcie/err.c |  5 +++++
>  4 files changed, 37 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 58640e656897..bd020ba0cef0 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -746,8 +746,10 @@ struct aer_err_info {
>  	struct pcie_tlp_log tlp;	/* TLP Header */
>  };
>  
> -int aer_get_device_error_info(struct aer_err_info *info, int i);
> +int aer_get_device_error_info(struct aer_err_info *info, int i,
> +			      bool link_healthy);
>  void aer_print_error(struct aer_err_info *info, int i);
> +void aer_report_frozen_error(struct pci_dev *dev);
>  
>  int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
>  		      unsigned int tlp_len, bool flit,
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e0bcaa896803..4c0a2bbe9197 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1384,12 +1384,14 @@ EXPORT_SYMBOL_GPL(aer_recover_queue);
>   * aer_get_device_error_info - read error status from dev and store it to info
>   * @info: pointer to structure to store the error record
>   * @i: index into info->dev[]
> + * @link_healthy: link is healthy or not
>   *
>   * Return: 1 on success, 0 on error.
>   *
>   * Note that @info is reused among all error devices. Clear fields properly.
>   */
> -int aer_get_device_error_info(struct aer_err_info *info, int i)
> +int aer_get_device_error_info(struct aer_err_info *info, int i,
> +			      bool link_healthy)
>  {
>  	struct pci_dev *dev;
>  	int type, aer;
> @@ -1420,7 +1422,8 @@ int aer_get_device_error_info(struct aer_err_info *info, int i)
>  	} else if (type == PCI_EXP_TYPE_ROOT_PORT ||
>  		   type == PCI_EXP_TYPE_RC_EC ||
>  		   type == PCI_EXP_TYPE_DOWNSTREAM ||
> -		   info->severity == AER_NONFATAL) {
> +		   info->severity == AER_NONFATAL ||
> +		   (info->severity == AER_FATAL && link_healthy)) {
>  
>  		/* Link is still healthy for IO reads */
>  		pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,
> @@ -1447,17 +1450,38 @@ int aer_get_device_error_info(struct aer_err_info *info, int i)
>  	return 1;
>  }
>  
> +void aer_report_frozen_error(struct pci_dev *dev)

Since this function focuses specifically on printing fatal error details, would
aer_print_fatal_error() be a more descriptive name?

> +{
> +	struct aer_err_info info;
> +	int type = pci_pcie_type(dev);
> +
> +	if (type != PCI_EXP_TYPE_ENDPOINT && type != PCI_EXP_TYPE_RC_END)
> +		return;
> +
> +	info.error_dev_num = 0;
> +	info.severity = AER_FATAL;
> +	info.level = KERN_ERR;
> +	add_error_device(&info, dev);
> +
> +	if (aer_get_device_error_info(&info, 0, true)) {
> +		pci_err(dev, "Errors reported prior to reset\n");

The 'prior to reset' context depends on where this is called. I'd suggest moving
this log to the caller or removing it entirely to keep this helper generic.

> +		aer_print_error(&info, 0);
> +	}
> +
> +	pci_dev_put(dev); /* pairs with pci_dev_get() in add_error_device() */
> +}
> +
>  static inline void aer_process_err_devices(struct aer_err_info *e_info)
>  {
>  	int i;
>  
>  	/* Report all before handling them, to not lose records by reset etc. */
>  	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
> -		if (aer_get_device_error_info(e_info, i))
> +		if (aer_get_device_error_info(e_info, i, false))
>  			aer_print_error(e_info, i);
>  	}
>  	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
> -		if (aer_get_device_error_info(e_info, i))
> +		if (aer_get_device_error_info(e_info, i, false))
>  			handle_error_source(e_info->dev[i], e_info);
>  	}
>  }
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index f6069f621683..21c4e8371279 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -284,7 +284,7 @@ struct pci_dev *dpc_process_error(struct pci_dev *pdev)
>  		pci_warn(pdev, "containment event, status:%#06x: unmasked uncorrectable error detected\n",
>  			 status);
>  		if (dpc_get_aer_uncorrect_severity(pdev, &info) &&
> -		    aer_get_device_error_info(&info, 0)) {
> +		    aer_get_device_error_info(&info, 0, false)) {
>  			aer_print_error(&info, 0);
>  			pci_aer_clear_nonfatal_status(pdev);
>  			pci_aer_clear_fatal_status(pdev);
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index bebe4bc111d7..0780ea09478b 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -253,6 +253,11 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  			pci_warn(bridge, "subordinate device reset failed\n");
>  			goto failed;
>  		}
> +
> +		/* Link recovered, report fatal errors of RCiEP or EP */
> +		if (state == pci_channel_io_frozen)

To align with your comment regarding RCiEPs and EPs, should we explicitly
validate the device type here before calling the report function?

> +			aer_report_frozen_error(dev);
> +
>  	}
>  
>  	if (status == PCI_ERS_RESULT_NEED_RESET) {

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 4/5] PCI/AER: Clear both AER fatal and non-fatal status
  2026-01-24  7:45 ` [PATCH v7 4/5] PCI/AER: Clear both AER fatal and non-fatal status Shuai Xue
  2026-01-27 10:39   ` Jonathan Cameron
@ 2026-01-28 16:58   ` Kuppuswamy Sathyanarayanan
  2026-02-03  8:06   ` Lukas Wunner
  2 siblings, 0 replies; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2026-01-28 16:58 UTC (permalink / raw)
  To: Shuai Xue, linux-pci, linux-kernel, linuxppc-dev, bhelgaas,
	kbusch
  Cc: mahesh, oohall, Jonathan.Cameron, terry.bowman, tianruidong,
	lukas



On 1/23/2026 11:45 PM, Shuai Xue wrote:
> The DPC driver clears AER fatal status for the port that reported the
> error, but not for the downstream device that deteced the error.  The
> current recovery code only clears non-fatal AER status, leaving fatal
> status bits set in the error device.
> 
> Use pci_aer_raw_clear_status() to clear both fatal and non-fatal error
> status in the error device, ensuring all AER status bits are properly
> cleared after recovery.
> 
> Fixes: aa344bc8b727 ("PCI/ERR: Clear AER status only when we control AER")
> Cc: stable@vger.kernel.org
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


>  drivers/pci/pcie/err.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 0780ea09478b..5e463efc3d05 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -285,7 +285,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  	 */
>  	if (host->native_aer || pcie_ports_native) {
>  		pcie_clear_device_status(dev);
> -		pci_aer_clear_nonfatal_status(dev);
> +		pci_aer_raw_clear_status(dev);
>  	}
>  
>  	pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status()
  2026-01-24  7:45 ` [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status() Shuai Xue
  2026-01-27 10:45   ` Jonathan Cameron
@ 2026-01-28 17:01   ` Kuppuswamy Sathyanarayanan
  2026-01-29 12:09     ` Shuai Xue
  2026-02-03  7:53   ` Lukas Wunner
  2 siblings, 1 reply; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2026-01-28 17:01 UTC (permalink / raw)
  To: Shuai Xue, linux-pci, linux-kernel, linuxppc-dev, bhelgaas,
	kbusch
  Cc: mahesh, oohall, Jonathan.Cameron, terry.bowman, tianruidong,
	lukas



On 1/23/2026 11:45 PM, Shuai Xue wrote:
> Currently, pcie_clear_device_status() clears the entire PCIe Device
> Status register (PCI_EXP_DEVSTA), which includes both error status bits
> and other status bits such as AUX Power Detected (AUXPD) and
> Transactions Pending (TRPND).
> 
> Clearing non-error status bits can interfere with other drivers or
> subsystems that may rely on these bits. To fix it, only clear the error
> bits (0xf) while preserving other status bits.
> 
> Fixes: ec752f5d54d7 ("PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL")
> Cc: stable@vger.kernel.org
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


>  drivers/pci/pci.c             | 2 +-
>  include/uapi/linux/pci_regs.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 13dbb405dc31..0b947f90c333 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2246,7 +2246,7 @@ void pcie_clear_device_status(struct pci_dev *dev)
>  	u16 sta;
>  
>  	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
> -	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
> +	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta & PCI_EXP_DEVSTA_ERR);
>  }
>  #endif
>  
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 3add74ae2594..f4b68203bc4e 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -534,6 +534,7 @@
>  #define  PCI_EXP_DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
>  #define  PCI_EXP_DEVSTA_FED	0x0004	/* Fatal Error Detected */
>  #define  PCI_EXP_DEVSTA_URD	0x0008	/* Unsupported Request Detected */
> +#define  PCI_EXP_DEVSTA_ERR	0xf	/* Error bits */

To align with other macros, use 0x000F?

>  #define  PCI_EXP_DEVSTA_AUXPD	0x0010	/* AUX Power Detected */
>  #define  PCI_EXP_DEVSTA_TRPND	0x0020	/* Transactions Pending */
>  #define PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1	12	/* v1 endpoints without link end here */

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error
  2026-01-28 15:02       ` Jonathan Cameron
@ 2026-01-29  5:49         ` Shuai Xue
  0 siblings, 0 replies; 35+ messages in thread
From: Shuai Xue @ 2026-01-29  5:49 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, terry.bowman,
	tianruidong, lukas



On 1/28/26 11:02 PM, Jonathan Cameron wrote:
> On Wed, 28 Jan 2026 20:27:31 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> 
>> On 1/27/26 6:24 PM, Jonathan Cameron wrote:
>>> On Sat, 24 Jan 2026 15:45:54 +0800
>>> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>>>    
>>>> The current implementation of pcie_do_recovery() assumes that the
>>>> recovery process is executed for the device that detected the error.
>>>> However, the DPC driver currently passes the error port that experienced
>>>> the DPC event to pcie_do_recovery().
>>>>
>>>> Use the Source ID register to correctly identify the device that
>>>> detected the error. When passed the error device, pcie_do_recovery()
>>>> will find the upstream bridge and walk the bridges potentially
>>>> affected by AER errors. Subsequent commits will then be able to
>>>> accurately access the AER status of the error device.
>>>>
>>>> No functional changes intended.
>>>>
>>>> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Hi Shuai,
> 
>>>> ---
>>>>    drivers/pci/pci.h      |  2 +-
>>>>    drivers/pci/pcie/dpc.c | 25 +++++++++++++++++++++----
>>>>    drivers/pci/pcie/edr.c |  7 ++++---
>>>>    3 files changed, 26 insertions(+), 8 deletions(-)
>>>>
> ...
> 
>>>> -void dpc_process_error(struct pci_dev *pdev)
>>>> +/**
>>>> + * dpc_process_error - handle the DPC error status
>>>> + * @pdev: the port that experienced the containment event
>>>> + *
>>>> + * Return: the device that detected the error.
>>>> + *
>>>> + * NOTE: The device reference count is increased, the caller must decrement
>>>> + * the reference count by calling pci_dev_put().
>>>> + */
>>>> +struct pci_dev *dpc_process_error(struct pci_dev *pdev)
>>>
>>> Maybe it makes sense to carry the err_port naming for the pci_dev
>>> in here as well?  Seems stronger than just relying on people
>>> reading the documentation you've added.
>>
>> Good point. I think renaming the parameter would improve clarity. However,
>> I'd prefer to handle it in a separate patch to keep this change focused on
>> the functional modification. Would that work for you?
>>
> Sure. Ideal would be a precursor patch, but if it's much easier to
> do on top of this I'm fine with that.
> 

Got it.

>>>      
>>>>    {
>>>>    	u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
>>>>    	struct aer_err_info info = {};
>>>> +	struct pci_dev *err_dev;
>>>>    
>>>>    	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
>>>>    
>>>> @@ -279,6 +289,7 @@ void dpc_process_error(struct pci_dev *pdev)
>>>>    			pci_aer_clear_nonfatal_status(pdev);
>>>>    			pci_aer_clear_fatal_status(pdev);
>>>>    		}
>>>> +		err_dev = pci_dev_get(pdev);
>>>>    		break;
>>>>    	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE:
>>>>    	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE:
>>>> @@ -290,6 +301,8 @@ void dpc_process_error(struct pci_dev *pdev)
>>>>    				"ERR_FATAL" : "ERR_NONFATAL",
>>>>    			 pci_domain_nr(pdev->bus), PCI_BUS_NUM(source),
>>>>    			 PCI_SLOT(source), PCI_FUNC(source));
>>>> +		err_dev = pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus),
>>>> +					    PCI_BUS_NUM(source), source & 0xff);
>>>
>>> Bunch of replication in here with the pci_warn(). Maybe some local variables?
>>> Maybe even rebuild the final parameter from PCI_DEVFN(slot, func) just to make the
>>> association with the print really obvious?
>>
>> Agreed. Here's the improved version:
>>
>> --- a/drivers/pci/pcie/dpc.c
>> +++ b/drivers/pci/pcie/dpc.c
>> @@ -293,17 +293,28 @@ struct pci_dev *dpc_process_error(struct pci_dev *pdev)
>>                   break;
>>           case PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE:
>>           case PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE:
>> -               pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID,
>> -                                    &source);
>> +       {
>> +               int domain, bus;
>> +               u8 slot, func, devfn;
>> +               const char *err_type;
>> +
>> +               pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
>> +
>> +               /* Extract source device location */
>> +               domain = pci_domain_nr(pdev->bus);
>> +               bus = PCI_BUS_NUM(source);
>> +               slot = PCI_SLOT(source);
>> +               func = PCI_FUNC(source);
>> +               devfn = PCI_DEVFN(slot, func);
>> +
>> +               err_type = (reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE) ?
>> +                          "ERR_FATAL" : "ERR_NONFATAL";
>> +
>>                   pci_warn(pdev, "containment event, status:%#06x, %s received from %04x:%02x:%02x.%d\n",
>> -                        status,
>> -                        (reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE) ?
>> -                               "ERR_FATAL" : "ERR_NONFATAL",
>> -                        pci_domain_nr(pdev->bus), PCI_BUS_NUM(source),
>> -                        PCI_SLOT(source), PCI_FUNC(source));
>> -               err_dev = pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus),
>> -                                           PCI_BUS_NUM(source), source & 0xff);
>> +                        status, err_type, domain, bus, slot, func);
>> +               err_dev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> Maybe don't bother with local variables for the things only used once.
> e.g.
> 
> 		err_dev = pci_get_domain_bus_and_slot(domain, bus, PCI_DEVFN(slot, func));
> 
> and possibly the same for err_type.
> 
> I don't mind though if you prefer to use locals for everything.

Sure, will remove local devfn and err_type.

> 
> 
> 
>>                   break;
>> +       }
>>           case PCI_EXP_DPC_STATUS_TRIGGER_RSN_IN_EXT:
>>                   ext_reason = status & PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT;
>>                   pci_warn(pdev, "containment event, status:%#06x: %s detected\n",
>>
>>>
>>> Is there any chance that this might return NULL?   Feels like maybe that's
>>> only a possibility on a broken setup, but I'm not sure of all the wonderful
>>> races around hotplug and DPC occurring before the OS has caught up.
>>
>> Surprise Down events are handled separately in
>> dpc_handle_surprise_removal() and won't reach dpc_process_error().
>> Please correct me if I missed anything.
> 
> Probably fine. I just didn't check particularly closely.
> 
> Jonathan


Thanks for valuable comments.

Best Regards,
Shuai


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 3/5] PCI/AER: Report fatal errors of RCiEP and EP if link recoverd
  2026-01-28 16:50   ` Kuppuswamy Sathyanarayanan
@ 2026-01-29 11:46     ` Shuai Xue
  0 siblings, 0 replies; 35+ messages in thread
From: Shuai Xue @ 2026-01-29 11:46 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, linux-pci, linux-kernel, linuxppc-dev,
	bhelgaas, kbusch
  Cc: mahesh, oohall, Jonathan.Cameron, terry.bowman, tianruidong,
	lukas

Hi, Kuppuswamy,

On 1/29/26 12:50 AM, Kuppuswamy Sathyanarayanan wrote:
> Hi,
> 
> On 1/23/2026 11:45 PM, Shuai Xue wrote:
>> The AER driver has historically avoided reading the configuration space of
>> an endpoint or RCiEP that reported a fatal error, considering the link to
>> that device unreliable. Consequently, when a fatal error occurs, the AER
>> and DPC drivers do not report specific error types, resulting in logs like:
>>
>>    pcieport 0015:00:00.0: EDR: EDR event received
>>    pcieport 0015:00:00.0: EDR: Reported EDR dev: 0015:00:00.0
>>    pcieport 0015:00:00.0: DPC: containment event, status:0x200d, ERR_FATAL received from 0015:01:00.0
>>    pcieport 0015:00:00.0: AER: broadcast error_detected message
>>    pcieport 0015:00:00.0: AER: broadcast mmio_enabled message
>>    pcieport 0015:00:00.0: AER: broadcast resume message
>>    pcieport 0015:00:00.0: pciehp: Slot(21): Link Down/Up ignored
>>    pcieport 0015:00:00.0: AER: device recovery successful
>>    pcieport 0015:00:00.0: EDR: DPC port successfully recovered
>>    pcieport 0015:00:00.0: EDR: Status for 0015:00:00.0: 0x80
>>
>> AER status registers are sticky and Write-1-to-Clear. If the link recovered
>> after a hot reset, we can still safely access the AER status and TLP header
>> of the error device. In that case, report fatal errors, which helps to
>> figure out the error root cause.
>>
>> After this patch, the logs look like:
>>
>>    pcieport 0015:00:00.0: EDR: EDR event received
>>    pcieport 0015:00:00.0: EDR: Reported EDR dev: 0015:00:00.0
>>    pcieport 0015:00:00.0: DPC: containment event, status:0x200d, ERR_FATAL received from 0015:01:00.0
>>    pcieport 0015:00:00.0: AER: broadcast error_detected message
>> + vfio-pci 0015:01:00.0: AER: Errors reported prior to reset
>> + vfio-pci 0015:01:00.0: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Receiver ID)
>> + vfio-pci 0015:01:00.0:   device [144d:a80a] error status/mask=00001000/00400000
>> + vfio-pci 0015:01:00.0:    [12] TLP                    (First)
>> + vfio-pci 0015:01:00.0: AER:   TLP Header: 0x4a004010 0x00000040 0x01000000 0xffffffff
>>    pcieport 0015:00:00.0: AER: broadcast mmio_enabled message
>>    pcieport 0015:00:00.0: AER: broadcast resume message
>>    pcieport 0015:00:00.0: pciehp: Slot(21): Link Down/Up ignored
>>    pcieport 0015:00:00.0: AER: device recovery successful
>>    pcieport 0015:00:00.0: EDR: DPC port successfully recovered
>>    pcieport 0015:00:00.0: EDR: Status for 0015:00:00.0: 0x80
>>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> ---
> 
> LGTM. Few suggestions inline.
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Thank you for the thorough review and the Reviewed-by tag!

> 
> 
>>   drivers/pci/pci.h      |  4 +++-
>>   drivers/pci/pcie/aer.c | 32 ++++++++++++++++++++++++++++----
>>   drivers/pci/pcie/dpc.c |  2 +-
>>   drivers/pci/pcie/err.c |  5 +++++
>>   4 files changed, 37 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index 58640e656897..bd020ba0cef0 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -746,8 +746,10 @@ struct aer_err_info {
>>   	struct pcie_tlp_log tlp;	/* TLP Header */
>>   };
>>   
>> -int aer_get_device_error_info(struct aer_err_info *info, int i);
>> +int aer_get_device_error_info(struct aer_err_info *info, int i,
>> +			      bool link_healthy);
>>   void aer_print_error(struct aer_err_info *info, int i);
>> +void aer_report_frozen_error(struct pci_dev *dev);
>>   
>>   int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
>>   		      unsigned int tlp_len, bool flit,
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index e0bcaa896803..4c0a2bbe9197 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -1384,12 +1384,14 @@ EXPORT_SYMBOL_GPL(aer_recover_queue);
>>    * aer_get_device_error_info - read error status from dev and store it to info
>>    * @info: pointer to structure to store the error record
>>    * @i: index into info->dev[]
>> + * @link_healthy: link is healthy or not
>>    *
>>    * Return: 1 on success, 0 on error.
>>    *
>>    * Note that @info is reused among all error devices. Clear fields properly.
>>    */
>> -int aer_get_device_error_info(struct aer_err_info *info, int i)
>> +int aer_get_device_error_info(struct aer_err_info *info, int i,
>> +			      bool link_healthy)
>>   {
>>   	struct pci_dev *dev;
>>   	int type, aer;
>> @@ -1420,7 +1422,8 @@ int aer_get_device_error_info(struct aer_err_info *info, int i)
>>   	} else if (type == PCI_EXP_TYPE_ROOT_PORT ||
>>   		   type == PCI_EXP_TYPE_RC_EC ||
>>   		   type == PCI_EXP_TYPE_DOWNSTREAM ||
>> -		   info->severity == AER_NONFATAL) {
>> +		   info->severity == AER_NONFATAL ||
>> +		   (info->severity == AER_FATAL && link_healthy)) {
>>   
>>   		/* Link is still healthy for IO reads */
>>   		pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,
>> @@ -1447,17 +1450,38 @@ int aer_get_device_error_info(struct aer_err_info *info, int i)
>>   	return 1;
>>   }
>>   
>> +void aer_report_frozen_error(struct pci_dev *dev)
> 
> Since this function focuses specifically on printing fatal error details, would
> aer_print_fatal_error() be a more descriptive name?

Good point. aer_print_fatal_error() is indeed more descriptive and aligns
better with the existing aer_print_error() naming convention. I'll rename
it in next version.

> 
>> +{
>> +	struct aer_err_info info;
>> +	int type = pci_pcie_type(dev);
>> +
>> +	if (type != PCI_EXP_TYPE_ENDPOINT && type != PCI_EXP_TYPE_RC_END)
>> +		return;
>> +
>> +	info.error_dev_num = 0;
>> +	info.severity = AER_FATAL;
>> +	info.level = KERN_ERR;
>> +	add_error_device(&info, dev);
>> +
>> +	if (aer_get_device_error_info(&info, 0, true)) {
>> +		pci_err(dev, "Errors reported prior to reset\n");
> 
> The 'prior to reset' context depends on where this is called. I'd suggest moving
> this log to the caller or removing it entirely to keep this helper generic.

Agreed. Moving the log message to the caller makes the function more
generic and reusable. I'll move it to pcie_do_recovery() in next version.

> 
>> +		aer_print_error(&info, 0);
>> +	}
>> +
>> +	pci_dev_put(dev); /* pairs with pci_dev_get() in add_error_device() */
>> +}
>> +
>>   static inline void aer_process_err_devices(struct aer_err_info *e_info)
>>   {
>>   	int i;
>>   
>>   	/* Report all before handling them, to not lose records by reset etc. */
>>   	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
>> -		if (aer_get_device_error_info(e_info, i))
>> +		if (aer_get_device_error_info(e_info, i, false))
>>   			aer_print_error(e_info, i);
>>   	}
>>   	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
>> -		if (aer_get_device_error_info(e_info, i))
>> +		if (aer_get_device_error_info(e_info, i, false))
>>   			handle_error_source(e_info->dev[i], e_info);
>>   	}
>>   }
>> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
>> index f6069f621683..21c4e8371279 100644
>> --- a/drivers/pci/pcie/dpc.c
>> +++ b/drivers/pci/pcie/dpc.c
>> @@ -284,7 +284,7 @@ struct pci_dev *dpc_process_error(struct pci_dev *pdev)
>>   		pci_warn(pdev, "containment event, status:%#06x: unmasked uncorrectable error detected\n",
>>   			 status);
>>   		if (dpc_get_aer_uncorrect_severity(pdev, &info) &&
>> -		    aer_get_device_error_info(&info, 0)) {
>> +		    aer_get_device_error_info(&info, 0, false)) {
>>   			aer_print_error(&info, 0);
>>   			pci_aer_clear_nonfatal_status(pdev);
>>   			pci_aer_clear_fatal_status(pdev);
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index bebe4bc111d7..0780ea09478b 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -253,6 +253,11 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>   			pci_warn(bridge, "subordinate device reset failed\n");
>>   			goto failed;
>>   		}
>> +
>> +		/* Link recovered, report fatal errors of RCiEP or EP */
>> +		if (state == pci_channel_io_frozen)
> 
> To align with your comment regarding RCiEPs and EPs, should we explicitly
> validate the device type here before calling the report function?
> 
>> +			aer_report_frozen_error(dev);

I prefer to keep the type validation inside aer_print_fatal_error() to
make the function self-contained and safe. This way, if it's called from
other places in the future, we don't need to duplicate the type checking
logic.


Thanks again for the valuable feedback. I'll send next version with these
improvements.

Best regards,
Shuai



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status()
  2026-01-28 17:01   ` Kuppuswamy Sathyanarayanan
@ 2026-01-29 12:09     ` Shuai Xue
  0 siblings, 0 replies; 35+ messages in thread
From: Shuai Xue @ 2026-01-29 12:09 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, linux-pci, linux-kernel, linuxppc-dev,
	bhelgaas, kbusch
  Cc: mahesh, oohall, Jonathan.Cameron, terry.bowman, tianruidong,
	lukas



On 1/29/26 1:01 AM, Kuppuswamy Sathyanarayanan wrote:
> 
> 
> On 1/23/2026 11:45 PM, Shuai Xue wrote:
>> Currently, pcie_clear_device_status() clears the entire PCIe Device
>> Status register (PCI_EXP_DEVSTA), which includes both error status bits
>> and other status bits such as AUX Power Detected (AUXPD) and
>> Transactions Pending (TRPND).
>>
>> Clearing non-error status bits can interfere with other drivers or
>> subsystems that may rely on these bits. To fix it, only clear the error
>> bits (0xf) while preserving other status bits.
>>
>> Fixes: ec752f5d54d7 ("PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL")
>> Cc: stable@vger.kernel.org
>> Suggested-by: Lukas Wunner <lukas@wunner.de>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> ---
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> 
>>   drivers/pci/pci.c             | 2 +-
>>   include/uapi/linux/pci_regs.h | 1 +
>>   2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 13dbb405dc31..0b947f90c333 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -2246,7 +2246,7 @@ void pcie_clear_device_status(struct pci_dev *dev)
>>   	u16 sta;
>>   
>>   	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
>> -	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
>> +	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta & PCI_EXP_DEVSTA_ERR);
>>   }
>>   #endif
>>   
>> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
>> index 3add74ae2594..f4b68203bc4e 100644
>> --- a/include/uapi/linux/pci_regs.h
>> +++ b/include/uapi/linux/pci_regs.h
>> @@ -534,6 +534,7 @@
>>   #define  PCI_EXP_DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
>>   #define  PCI_EXP_DEVSTA_FED	0x0004	/* Fatal Error Detected */
>>   #define  PCI_EXP_DEVSTA_URD	0x0008	/* Unsupported Request Detected */
>> +#define  PCI_EXP_DEVSTA_ERR	0xf	/* Error bits */
> 
> To align with other macros, use 0x000F?
> 
Sure, will fix it.

Thanks.

Best Regards,
Shuai


* Re: [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error
  2026-01-24  7:45 ` [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error Shuai Xue
  2026-01-27 10:24   ` Jonathan Cameron
@ 2026-02-02 14:02   ` Lukas Wunner
  2026-02-02 21:09     ` Lukas Wunner
  2026-02-06  8:41     ` Shuai Xue
  1 sibling, 2 replies; 35+ messages in thread
From: Lukas Wunner @ 2026-02-02 14:02 UTC (permalink / raw)
  To: Shuai Xue
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, Jonathan.Cameron,
	terry.bowman, tianruidong

On Sat, Jan 24, 2026 at 03:45:54PM +0800, Shuai Xue wrote:
> +++ b/drivers/pci/pcie/dpc.c
> @@ -372,10 +388,11 @@ static irqreturn_t dpc_handler(int irq, void *context)
>  		return IRQ_HANDLED;
>  	}
>  
> -	dpc_process_error(err_port);
> +	err_dev = dpc_process_error(err_port);
>  
>  	/* We configure DPC so it only triggers on ERR_FATAL */
> -	pcie_do_recovery(err_port, pci_channel_io_frozen, dpc_reset_link);
> +	pcie_do_recovery(err_dev, pci_channel_io_frozen, dpc_reset_link);
> +	pci_dev_put(err_dev);
>  
>  	return IRQ_HANDLED;
>  }

You're assuming that the parent of the Requester is always identical
to the containing Downstream Port.  But that's not necessarily the case:

E.g., imagine a DPC-capable Root Port with a PCIe switch below
whose Downstream Ports are not DPC-capable.  Let's say an Endpoint
beneath the PCIe switch sends ERR_FATAL upstream.  AFAICS, your patch
will cause pcie_do_recovery() to invoke dpc_reset_link() with the
Downstream Port of the PCIe switch as argument.  That function will
then happily use pdev->dpc_cap even though it's 0.

A possible solution may be to amend dpc_reset_link() to walk upwards,
starting at pdev, and search for a pci_dev whose pdev->dpc_cap is
non-zero.  That should be the containing DPC-capable port.

I don't really like the err_dev and err_port terminology invented here.
We generally use spec terms to make it easy to connect the code to the
spec.  The spec uses the term "Downstream Port" to denote the containing
Downstream Port, so a name such as "dport" may be better than "err_port".

Similarly, the spec uses "Requester" or "Source" for the reporting agent,
so "req", "requester", "src" or "source" may all be better names than
"err_dev".

Thanks,

Lukas


* Re: [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error
  2026-02-02 14:02   ` Lukas Wunner
@ 2026-02-02 21:09     ` Lukas Wunner
  2026-02-07  7:48       ` Shuai Xue
  2026-02-06  8:41     ` Shuai Xue
  1 sibling, 1 reply; 35+ messages in thread
From: Lukas Wunner @ 2026-02-02 21:09 UTC (permalink / raw)
  To: Shuai Xue
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, Jonathan.Cameron,
	terry.bowman, tianruidong

On Mon, Feb 02, 2026 at 03:02:54PM +0100, Lukas Wunner wrote:
> You're assuming that the parent of the Requester is always identical
> to the containing Downstream Port.  But that's not necessarily the case:
> 
> E.g., imagine a DPC-capable Root Port with a PCIe switch below
> whose Downstream Ports are not DPC-capable.  Let's say an Endpoint
> beneath the PCIe switch sends ERR_FATAL upstream.  AFAICS, your patch
> will cause pcie_do_recovery() to invoke dpc_reset_link() with the
> Downstream Port of the PCIe switch as argument.  That function will
> then happily use pdev->dpc_cap even though it's 0.

Thinking about this some more, I realized there's another problem:

In a scenario like the one I've outlined above, after your change,
pcie_do_recovery() will only broadcast error_detected (and other
callbacks) below the downstream port of the PCIe switch -- and not
to any other devices below the containing Root Port.

However, the DPC-induced Link Down event at the Root Port results
in a Hot Reset being propagated down the hierarchy to any device
below the Root Port.  So with your change, the siblings of the
downstream port on the PCIe switch will no longer be informed of
the reset and thus are no longer given an opportunity to recover
after reset.

The premise on which this patch is built is false -- that the bridge
upstream of the error-reporting device is always equal to the
containing Downstream Port.

It seems the only reason why you want to pass the reporting device
to pcie_do_recovery() is that you want to call pcie_clear_device_status()
and pci_aer_clear_nonfatal_status() with that device.

However as I've said before, those calls are AER-specific and should
be moved out of pcie_do_recovery() so that it becomes generic and can
be used by EEH and s390:

https://lore.kernel.org/all/aPYKe1UKKkR7qrt1@wunner.de/

There's another problem:  When a device experiences an error while DPC
is ongoing (i.e. while the link is down), its ERR_FATAL or ERR_NONFATAL
Message may not come through.  Still the error bits are set in the
device's Uncorrectable Error Status register.  So I think what we need to
do is walk the hierarchy below the containing Downstream Port after the
link is back up and search for devices with any error bits set,
then report and clear those errors.  We may do this after first
examining the device in the DPC Error Source ID register.
Any additional errors found while walking the hierarchy can then
be identified as "occurred during DPC recovery".

Thanks,

Lukas


* Re: [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status()
  2026-01-28 12:45     ` Shuai Xue
@ 2026-02-03  7:44       ` Lukas Wunner
  2026-02-06  8:12         ` Shuai Xue
  0 siblings, 1 reply; 35+ messages in thread
From: Lukas Wunner @ 2026-02-03  7:44 UTC (permalink / raw)
  To: Shuai Xue
  Cc: Jonathan Cameron, linux-pci, linux-kernel, linuxppc-dev, bhelgaas,
	kbusch, sathyanarayanan.kuppuswamy, mahesh, oohall, terry.bowman,
	tianruidong

On Wed, Jan 28, 2026 at 08:45:36PM +0800, Shuai Xue wrote:
> The revised commit message is:
> 
> PCI/AER: Only clear error bits in PCIe Device Status register
> 
> Currently, pcie_clear_device_status() clears the entire PCIe Device
> Status register (PCI_EXP_DEVSTA), which includes both error status bits
> and other status bits.
> 
> According to PCIe Base Spec r6.0 sec 7.5.3.5, the Device Status register
> contains different types of status bits:

Always cite the latest spec revision, i.e. PCIe r7.0 sec 7.5.3.5.

> - RW1C (read/write 1 to clear) non-error bits: For example, Emergency
>   Power Reduction Detected (bit 6). Unconditionally clearing these bits
>   can interfere with other drivers or subsystems that rely on them.

It would be good to explicitly call out that this bit was introduced with
PCIe r5.0 in 2019 and that it's currently the only writable bit in the
register besides the error bits.

> - Reserved bits: May be used for future features and should be preserved.

Wrong, they're marked "RsvdZ" (not "RsvdP"), so they're supposed to be
written as zero (not preserved).

Thanks,

Lukas


* Re: [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status()
  2026-01-24  7:45 ` [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status() Shuai Xue
  2026-01-27 10:45   ` Jonathan Cameron
  2026-01-28 17:01   ` Kuppuswamy Sathyanarayanan
@ 2026-02-03  7:53   ` Lukas Wunner
  2026-02-06  7:39     ` Shuai Xue
  2 siblings, 1 reply; 35+ messages in thread
From: Lukas Wunner @ 2026-02-03  7:53 UTC (permalink / raw)
  To: Shuai Xue
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, Jonathan.Cameron,
	terry.bowman, tianruidong

On Sat, Jan 24, 2026 at 03:45:57PM +0800, Shuai Xue wrote:
> +++ b/drivers/pci/pci.c
> @@ -2246,7 +2246,7 @@ void pcie_clear_device_status(struct pci_dev *dev)
>  	u16 sta;
>  
>  	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
> -	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
> +	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta & PCI_EXP_DEVSTA_ERR);
>  }

I don't think there's any harm to write error bits which are currently 0,
so I'd just get rid of the pcie_capability_read_word() and directly write
the error bits.

> +++ b/include/uapi/linux/pci_regs.h
> @@ -534,6 +534,7 @@
>  #define  PCI_EXP_DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
>  #define  PCI_EXP_DEVSTA_FED	0x0004	/* Fatal Error Detected */
>  #define  PCI_EXP_DEVSTA_URD	0x0008	/* Unsupported Request Detected */
> +#define  PCI_EXP_DEVSTA_ERR	0xf	/* Error bits */

There's only one user of PCI_EXP_DEVSTA_ERR and it feels a little
awkward to define a macro in a uapi header which does not correspond
to an "official" bit definition but is just there for convenience.

So maybe it's better to simply use the macros for the four bits in
pcie_clear_device_status()?  Might also be slightly clearer.

This patch could be submitted individually instead of being part
of this series.

Thanks,

Lukas


* Re: [PATCH v7 4/5] PCI/AER: Clear both AER fatal and non-fatal status
  2026-01-24  7:45 ` [PATCH v7 4/5] PCI/AER: Clear both AER fatal and non-fatal status Shuai Xue
  2026-01-27 10:39   ` Jonathan Cameron
  2026-01-28 16:58   ` Kuppuswamy Sathyanarayanan
@ 2026-02-03  8:06   ` Lukas Wunner
  2026-02-07  8:34     ` Shuai Xue
  2 siblings, 1 reply; 35+ messages in thread
From: Lukas Wunner @ 2026-02-03  8:06 UTC (permalink / raw)
  To: Shuai Xue
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, Jonathan.Cameron,
	terry.bowman, tianruidong

On Sat, Jan 24, 2026 at 03:45:56PM +0800, Shuai Xue wrote:
> The DPC driver clears AER fatal status for the port that reported the
> error, but not for the downstream device that detected the error.  The
> current recovery code only clears non-fatal AER status, leaving fatal
> status bits set in the error device.

That's not quite accurate:

The error device has undergone a Hot Reset as a result of the Link Down
event.  To be able to use it again, pci_restore_state() is invoked by
the driver's ->slot_reset() callback.  And pci_restore_state() does
clear fatal status bits.

pci_restore_state()
  pci_aer_clear_status()
    pci_aer_raw_clear_status()

> Use pci_aer_raw_clear_status() to clear both fatal and non-fatal error
> status in the error device, ensuring all AER status bits are properly
> cleared after recovery.

Well, pci_restore_state() already clears all AER status bits so why
is this patch necessary?

> +++ b/drivers/pci/pcie/err.c
> @@ -285,7 +285,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  	 */
>  	if (host->native_aer || pcie_ports_native) {
>  		pcie_clear_device_status(dev);
> -		pci_aer_clear_nonfatal_status(dev);
> +		pci_aer_raw_clear_status(dev);
>  	}

This code path is for the case when pcie_do_recovery() is called
with state=pci_channel_io_normal, i.e. in the nonfatal case.
That's why only the nonfatal bits need to be cleared here.

In the fatal case clearing of the error bits is done by
pci_restore_state().

I understand that this is subtle and should probably be changed
to improve clarity, but this patch doesn't look like a step
in that direction.

Thanks,

Lukas


* Re: [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status()
  2026-02-03  7:53   ` Lukas Wunner
@ 2026-02-06  7:39     ` Shuai Xue
  0 siblings, 0 replies; 35+ messages in thread
From: Shuai Xue @ 2026-02-06  7:39 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, Jonathan.Cameron,
	terry.bowman, tianruidong



On 2/3/26 3:53 PM, Lukas Wunner wrote:
> On Sat, Jan 24, 2026 at 03:45:57PM +0800, Shuai Xue wrote:
>> +++ b/drivers/pci/pci.c
>> @@ -2246,7 +2246,7 @@ void pcie_clear_device_status(struct pci_dev *dev)
>>   	u16 sta;
>>   
>>   	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
>> -	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
>> +	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta & PCI_EXP_DEVSTA_ERR);
>>   }
> 
> I don't think there's any harm to write error bits which are currently 0,
> so I'd just get rid of the pcie_capability_read_word() and directly write
> the error bits.

Good point. I will remove the read step.

> 
>> +++ b/include/uapi/linux/pci_regs.h
>> @@ -534,6 +534,7 @@
>>   #define  PCI_EXP_DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
>>   #define  PCI_EXP_DEVSTA_FED	0x0004	/* Fatal Error Detected */
>>   #define  PCI_EXP_DEVSTA_URD	0x0008	/* Unsupported Request Detected */
>> +#define  PCI_EXP_DEVSTA_ERR	0xf	/* Error bits */
> 
> There's only one user of PCI_EXP_DEVSTA_ERR and it feels a little
> awkward to define a macro in a uapi header which does not correspond
> to an "official" bit definition but is just there for convenience.
> 
> So maybe it's better to simply use the macros for the four bits in
> pcie_clear_device_status()?  Might also be slightly clearer.

Agreed, will move PCI_EXP_DEVSTA_ERR to drivers/pci/pci.c.

> 
> This patch could be submitted individually instead of being part
> of this series.


Got it. Will send an individual patch.

> 
> Thanks,
> 
> Lukas

Thanks for the valuable comments.
Shuai



* Re: [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status()
  2026-02-03  7:44       ` Lukas Wunner
@ 2026-02-06  8:12         ` Shuai Xue
  0 siblings, 0 replies; 35+ messages in thread
From: Shuai Xue @ 2026-02-06  8:12 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Jonathan Cameron, linux-pci, linux-kernel, linuxppc-dev, bhelgaas,
	kbusch, sathyanarayanan.kuppuswamy, mahesh, oohall, terry.bowman,
	tianruidong



On 2/3/26 3:44 PM, Lukas Wunner wrote:
> On Wed, Jan 28, 2026 at 08:45:36PM +0800, Shuai Xue wrote:
>> The revised the commit message is:
>>
>> PCI/AER: Only clear error bits in PCIe Device Status register
>>
>> Currently, pcie_clear_device_status() clears the entire PCIe Device
>> Status register (PCI_EXP_DEVSTA), which includes both error status bits
>> and other status bits.
>>
>> According to PCIe Base Spec r6.0 sec 7.5.3.5, the Device Status register
>> contains different types of status bits:
> 
> Always cite the latest spec revision, i.e. PCIe r7.0 sec 7.5.3.5.

Sure, I will update the cite version.

> 
>> - RW1C (read/write 1 to clear) non-error bits: For example, Emergency
>>    Power Reduction Detected (bit 6). Unconditionally clearing these bits
>>    can interfere with other drivers or subsystems that rely on them.
> 
> It would be good to explicitly call out that this bit was introduced with
> PCIe r5.0 in 2019 and that it's currently the only writable bit in the
> register besides the error bits.

Sure, will add it.

> 
>> - Reserved bits: May be used for future features and should be preserved.
> 
> Wrong, they're marked "RsvdZ" (not "RsvdP"), so they're supposed to be
> written as zero (not preserved).

Thanks for correcting me. Will fix it.

> 
> Thanks,
> 
> Lukas


Thanks for the valuable comments.
Shuai



* Re: [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error
  2026-02-02 14:02   ` Lukas Wunner
  2026-02-02 21:09     ` Lukas Wunner
@ 2026-02-06  8:41     ` Shuai Xue
  1 sibling, 0 replies; 35+ messages in thread
From: Shuai Xue @ 2026-02-06  8:41 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, Jonathan.Cameron,
	terry.bowman, tianruidong



On 2/2/26 10:02 PM, Lukas Wunner wrote:
> On Sat, Jan 24, 2026 at 03:45:54PM +0800, Shuai Xue wrote:
>> +++ b/drivers/pci/pcie/dpc.c
>> @@ -372,10 +388,11 @@ static irqreturn_t dpc_handler(int irq, void *context)
>>   		return IRQ_HANDLED;
>>   	}
>>   
>> -	dpc_process_error(err_port);
>> +	err_dev = dpc_process_error(err_port);
>>   
>>   	/* We configure DPC so it only triggers on ERR_FATAL */
>> -	pcie_do_recovery(err_port, pci_channel_io_frozen, dpc_reset_link);
>> +	pcie_do_recovery(err_dev, pci_channel_io_frozen, dpc_reset_link);
>> +	pci_dev_put(err_dev);
>>   
>>   	return IRQ_HANDLED;
>>   }
> 
> You're assuming that the parent of the Requester is always identical
> to the containing Downstream Port.  But that's not necessarily the case:
> 
> E.g., imagine a DPC-capable Root Port with a PCIe switch below
> whose Downstream Ports are not DPC-capable.  Let's say an Endpoint
> beneath the PCIe switch sends ERR_FATAL upstream.  AFAICS, your patch
> will cause pcie_do_recovery() to invoke dpc_reset_link() with the
> Downstream Port of the PCIe switch as argument.  That function will
> then happily use pdev->dpc_cap even though it's 0.

I see. Good point.

> 
> A possible solution may be to amend dpc_reset_link() to walk upwards,
> starting at pdev, and search for a pci_dev whose pdev->dpc_cap is
> non-zero.  That should be the containing DPC-capable port.

See my reply to your later mail.

> 
> I don't really like the err_dev and err_port terminology invented here.
> We generally use spec terms to make it easy to connect the code to the
> spec.  The spec uses the term "Downstream Port" to denote the containing
> Downstream Port, so a name such as "dport" may be better than "err_port".
 >
> 
> Similarly, the spec uses "Requester" or "Source" for the reporting agent,
> so "req", "requester", "src" or "source" may all be better names than
> "err_dev".

Sure, I will use the common "dport" and "req" terms from the spec to make it more readable.


> 
> Thanks,
> 
> Lukas

Thanks for the valuable comments.
Shuai



* Re: [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error
  2026-02-02 21:09     ` Lukas Wunner
@ 2026-02-07  7:48       ` Shuai Xue
  2026-02-27  8:28         ` Shuai Xue
  0 siblings, 1 reply; 35+ messages in thread
From: Shuai Xue @ 2026-02-07  7:48 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, Jonathan.Cameron,
	terry.bowman, tianruidong



On 2/3/26 5:09 AM, Lukas Wunner wrote:
> On Mon, Feb 02, 2026 at 03:02:54PM +0100, Lukas Wunner wrote:
>> You're assuming that the parent of the Requester is always identical
>> to the containing Downstream Port.  But that's not necessarily the case:
>>
>> E.g., imagine a DPC-capable Root Port with a PCIe switch below
>> whose Downstream Ports are not DPC-capable.  Let's say an Endpoint
>> beneath the PCIe switch sends ERR_FATAL upstream.  AFAICS, your patch
>> will cause pcie_do_recovery() to invoke dpc_reset_link() with the
>> Downstream Port of the PCIe switch as argument.  That function will
>> then happily use pdev->dpc_cap even though it's 0.
> 
> Thinking about this some more, I realized there's another problem:
> 
> In a scenario like the one I've outlined above, after your change,
> pcie_do_recovery() will only broadcast error_detected (and other
> callbacks) below the downstream port of the PCIe switch -- and not
> to any other devices below the containing Root Port.
> 
> However, the DPC-induced Link Down event at the Root Port results
> in a Hot Reset being propagated down the hierarchy to any device
> below the Root Port.  So with your change, the siblings of the
> downstream port on the PCIe switch will no longer be informed of
> the reset and thus are no longer given an opportunity to recover
> after reset.
> 
> The premise on which this patch is built is false -- that the bridge
> upstream of the error-reporting device is always equal to the
> containing Downstream Port.

Thanks again for the very detailed analysis and for the pointers to
your earlier mail.

You are right, thanks for pointing it out.

> 
> It seems the only reason why you want to pass the reporting device
> to pcie_do_recovery() is that you want to call pcie_clear_device_status()
> and pci_aer_clear_nonfatal_status() with that device.

In the AER path, pcie_do_recovery() is indeed invoked with the Error
Source device found by find_source_device(), and internally it treats
that dev as the function that detected the error and derives the
containing Downstream Port (bridge) from it.  For DPC, however, the
error-detecting function is the DPC-capable Downstream Port itself, not
the Endpoint identified as Error Source, so passing the Endpoint to
pcie_do_recovery() breaks that assumption.
> 
> However as I've said before, those calls are AER-specific and should
> be moved out of pcie_do_recovery() so that it becomes generic and can
> be used by EEH and s390:
> 
> https://lore.kernel.org/all/aPYKe1UKKkR7qrt1@wunner.de/

Sure, I'd like to move it out.  I will remove the AER-specific calls
(pcie_clear_device_status() and pci_aer_raw_clear_status()) from
pcie_do_recovery() itself, and instead handle them in the AER and DPC
code paths where we already know which device(s) are the actual error
sources.  That way, pcie_do_recovery() becomes a generic recovery
framework that can be reused by EEH and s390.

> 
> There's another problem:  When a device experiences an error while DPC
> is ongoing (i.e. while the link is down), its ERR_FATAL or ERR_NONFATAL
> Message may not come through.  Still the error bits are set in the
> device's Uncorrectable Error Status register.  So I think what we need to
> do is walk the hierarchy below the containing Downstream Port after the
> link is back up and search for devices with any error bits set,
> then report and clear those errors.  We may do this after first
> examining the device in the DPC Error Source ID register.
> Any additional errors found while walking the hierarchy can then
> be identified as "occurred during DPC recovery".

I agree this is similar in spirit to find_source_device() -- both walk
the bus and check AER Status registers.  For the DPC case, I'll perform
this walk after the link is back up (i.e., after dpc_reset_link()
succeeds).

Regarding pci_restore_state() in slot_reset(): I see now that it does
call pci_aer_clear_status(dev) (at line 1844 in pci.c), which will
clear the AER Status registers.  So if we walk the hierarchy after
the slot_reset callbacks, the error bits accumulated during DPC will
already be cleared.

To avoid losing those errors, I think the walk should happen after
dpc_reset_link() succeeds but *before* pcie_do_recovery() invokes the
slot_reset callbacks.  That way, we can capture the AER Status bits
before pci_restore_state() clears them.

Does that sound like the right approach, or would you prefer a
different placement?

Thanks a lot for your guidance.

Best Regards,
Shuai



* Re: [PATCH v7 4/5] PCI/AER: Clear both AER fatal and non-fatal status
  2026-02-03  8:06   ` Lukas Wunner
@ 2026-02-07  8:34     ` Shuai Xue
  0 siblings, 0 replies; 35+ messages in thread
From: Shuai Xue @ 2026-02-07  8:34 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, Jonathan.Cameron,
	terry.bowman, tianruidong



On 2/3/26 4:06 PM, Lukas Wunner wrote:
> On Sat, Jan 24, 2026 at 03:45:56PM +0800, Shuai Xue wrote:
>> The DPC driver clears AER fatal status for the port that reported the
>> error, but not for the downstream device that detected the error.  The
>> current recovery code only clears non-fatal AER status, leaving fatal
>> status bits set in the error device.
> 
> That's not quite accurate:
> 
> The error device has undergone a Hot Reset as a result of the Link Down
> event.  To be able to use it again, pci_restore_state() is invoked by
> the driver's ->slot_reset() callback.  And pci_restore_state() does
> clear fatal status bits.
> 
> pci_restore_state()
>    pci_aer_clear_status()
>      pci_aer_raw_clear_status()
> 
>> Use pci_aer_raw_clear_status() to clear both fatal and non-fatal error
>> status in the error device, ensuring all AER status bits are properly
>> cleared after recovery.
> 
> Well, pci_restore_state() already clears all AER status bits so why
> is this patch necessary?

You're right that many drivers call pci_restore_state() in their
->slot_reset() callback, which clears all AER status bits.  However,
since ->slot_reset() is driver-defined and not all drivers invoke
pci_restore_state(), there could be cases where fatal AER status bits
remain set after the frozen recovery completes.

> 
>> +++ b/drivers/pci/pcie/err.c
>> @@ -285,7 +285,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>   	 */
>>   	if (host->native_aer || pcie_ports_native) {
>>   		pcie_clear_device_status(dev);
>> -		pci_aer_clear_nonfatal_status(dev);
>> +		pci_aer_raw_clear_status(dev);
>>   	}
> 
> This code path is for the case when pcie_do_recovery() is called
> with state=pci_channel_io_normal, i.e. in the nonfatal case.
> That's why only the nonfatal bits need to be cleared here.
> 
> In the fatal case clearing of the error bits is done by
> pci_restore_state().
> 
> I understand that this is subtle and should probably be changed
> to improve clarity, but this patch doesn't look like a step
> in that direction.
> 

I notice that pcie_clear_device_status()
(called just before pci_aer_clear_nonfatal_status()) already clears
*all* error bits in the Device Status register (CED, NFED, FED, URD),
including the Fatal Error Detected (FED) bit, regardless of whether
we're in a fatal or nonfatal recovery path.

So there's an inconsistency in the current design:
- pcie_clear_device_status() clears all error bits (including FED)
- pci_aer_clear_nonfatal_status() only clears nonfatal AER status

This means that even in the nonfatal case, the FED bit in Device
Status is cleared, even though we preserve the fatal bits in the AER
Uncorrectable Error Status register.

Would you prefer that I:
1. Make both pcie_clear_device_status() and the AER clearing
    conditional on the 'state' parameter, so that nonfatal recovery
    truly preserves fatal bits in both registers, or
2. Drop this patch and instead move the AER-specific clearing logic
    out of pcie_do_recovery() entirely (as you suggested earlier), so
    that each caller can handle the status clearing explicitly for the
    appropriate error sources and severity?

Thanks,
Shuai



* Re: [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error
  2026-02-07  7:48       ` Shuai Xue
@ 2026-02-27  8:28         ` Shuai Xue
  2026-02-27 10:47           ` Lukas Wunner
  0 siblings, 1 reply; 35+ messages in thread
From: Shuai Xue @ 2026-02-27  8:28 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, Jonathan.Cameron,
	terry.bowman, tianruidong



On 2/7/26 3:48 PM, Shuai Xue wrote:
> 
> 
> On 2/3/26 5:09 AM, Lukas Wunner wrote:
>> On Mon, Feb 02, 2026 at 03:02:54PM +0100, Lukas Wunner wrote:
>>> You're assuming that the parent of the Requester is always identical
>>> to the containing Downstream Port.  But that's not necessarily the case:
>>>
>>> E.g., imagine a DPC-capable Root Port with a PCIe switch below
>>> whose Downstream Ports are not DPC-capable.  Let's say an Endpoint
>>> beneath the PCIe switch sends ERR_FATAL upstream.  AFAICS, your patch
>>> will cause pcie_do_recovery() to invoke dpc_reset_link() with the
>>> Downstream Port of the PCIe switch as argument.  That function will
>>> then happily use pdev->dpc_cap even though it's 0.
>>
>> Thinking about this some more, I realized there's another problem:
>>
>> In a scenario like the one I've outlined above, after your change,
>> pcie_do_recovery() will only broadcast error_detected (and other
>> callbacks) below the downstream port of the PCIe switch -- and not
>> to any other devices below the containing Root Port.
>>
>> However, the DPC-induced Link Down event at the Root Port results
>> in a Hot Reset being propagated down the hierarchy to any device
>> below the Root Port.  So with your change, the siblings of the
>> downstream port on the PCIe switch will no longer be informed of
>> the reset and thus are no longer given an opportunity to recover
>> after reset.
>>
>> The premise on which this patch is built is false -- that the bridge
>> upstream of the error-reporting device is always equal to the
>> containing Downstream Port.
> 
> Thanks again for the very detailed analysis and for the pointers to
> your earlier mail.
> 
> You are right, thanks for pointing it out.
> 
>>
>> It seems the only reason why you want to pass the reporting device
>> to pcie_do_recovery() is that you want to call pcie_clear_device_status()
>> and pci_aer_clear_nonfatal_status() with that device.
> 
> In the AER path, pcie_do_recovery() is indeed invoked with the Error
> Source device found by find_source_device(), and internally it treats
> that dev as the function that detected the error and derives the
> containing Downstream Port (bridge) from it.  For DPC, however, the
> error-detecting function is the DPC-capable Downstream Port itself, not
> the Endpoint identified as Error Source, so passing the Endpoint to
> pcie_do_recovery() breaks that assumption.
>>
>> However as I've said before, those calls are AER-specific and should
>> be moved out of pcie_do_recovery() so that it becomes generic and can
>> be used by EEH and s390:
>>
>> https://lore.kernel.org/all/aPYKe1UKKkR7qrt1@wunner.de/
> 
> Sure, I'd like to move it out.  I will remove the AER-specific calls
> (pcie_clear_device_status() and pci_aer_raw_clear_status()) from
> pcie_do_recovery() itself, and instead handle them in the AER and DPC
> code paths where we already know which device(s) are the actual error
> sources.  That way, pcie_do_recovery() becomes a generic recovery
> framework that can be reused by EEH and s390.
> 
>>
>> There's another problem:  When a device experiences an error while DPC
>> is ongoing (i.e. while the link is down), its ERR_FATAL or ERR_NONFATAL
>> Message may not come through.  Still the error bits are set in the
>> device's Uncorrectable Error Status register.  So I think what we need to
>> do is walk the hierarchy below the containing Downstream Port after the
>> link is back up and search for devices with any error bits set,
>> then report and clear those errors.  We may do this after first
>> examining the device in the DPC Error Source ID register.
>> Any additional errors found while walking the hierarchy can then
>> be identified as "occurred during DPC recovery".
> 
> I agree this is similar in spirit to find_source_device() -- both walk
> the bus and check AER Status registers.  For the DPC case, I'll perform
> this walk after the link is back up (i.e., after dpc_reset_link()
> succeeds).
> 
> Regarding pci_restore_state() in slot_reset(): I see now that it does
> call pci_aer_clear_status(dev) (at line 1844 in pci.c), which will
> clear the AER Status registers.  So if we walk the hierarchy after
> the slot_reset callbacks, the error bits accumulated during DPC will
> already be cleared.
> 
> To avoid losing those errors, I think the walk should happen after
> dpc_reset_link() succeeds but *before* pcie_do_recovery() invokes the
> slot_reset callbacks.  That way, we can capture the AER Status bits
> before pci_restore_state() clears them.
> 
> Does that sound like the right approach, or would you prefer a
> different placement?
> 
> Thanks a lot for your guidance.
> 
> Best Regards,
> Shuai
> 

Hi, Lukas,

Gentle ping. Any feedback?

Thanks.
Shuai


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error
  2026-02-27  8:28         ` Shuai Xue
@ 2026-02-27 10:47           ` Lukas Wunner
  2026-02-27 12:28             ` Shuai Xue
  0 siblings, 1 reply; 35+ messages in thread
From: Lukas Wunner @ 2026-02-27 10:47 UTC (permalink / raw)
  To: Shuai Xue
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, Jonathan.Cameron,
	terry.bowman, tianruidong

On Fri, Feb 27, 2026 at 04:28:59PM +0800, Shuai Xue wrote:
> On 2/7/26 3:48 PM, Shuai Xue wrote:
> > Regarding pci_restore_state() in slot_reset(): I see now that it does
> > call pci_aer_clear_status(dev) (at line 1844 in pci.c), which will
> > clear the AER Status registers. So if we walk the hierarchy after
> > the slot_reset callbacks, the error bits accumulated during DPC will
> > already be cleared.
> > 
> > To avoid losing those errors, I think the walk should happen after
> > dpc_reset_link() succeeds but *before* pcie_do_recovery() invokes the
> > slot_reset callbacks. That way, we can capture the AER Status bits
> > before pci_restore_state() clears them.
> > 
> > Does that sound like the right approach, or would you prefer a
> > different placement?

The problem is that if the hierarchy that was reset is deeper than
one level, you first need to call pci_restore_state() on all the
PCIe Upstream and Downstream Ports that were reset before you can
access the Endpoints at the bottom of the hierarchy.

E.g. if DPC occurs at a Root Port with multiple nested PCIe switches
below, the Endpoints at the "leaves" of that tree are only accessible
once Config Space has been restored at all the PCIe switches
in-between the Endpoints and the DPC-capable Root Port.

Hence your proposal unfortunately won't work.

I think the solution is to move pci_aer_clear_status() out of
pci_restore_state() into the callers that actually need it.
But that requires going through every single caller.
I've begun doing that last week and am about 60% done.

Once pci_restore_state() no longer clears the error bits, we can
report and clear them after the "report_slot_reset" stage (which
is where drivers call pci_restore_state()).

I've also changed my mind and I think reporting and clearing
the error bits *could* happen in pcie_do_recovery() even if it
were used for EEH and s390 because those platforms may plug in
AER-capable devices as well and so we do need to clear the bits
regardless of the error recovery mechanism used.

Let me get back to you once I've gone through all the callers of
pci_restore_state().  Please be patient.

Thank you!

Lukas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error
  2026-02-27 10:47           ` Lukas Wunner
@ 2026-02-27 12:28             ` Shuai Xue
  0 siblings, 0 replies; 35+ messages in thread
From: Shuai Xue @ 2026-02-27 12:28 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-kernel, linuxppc-dev, bhelgaas, kbusch,
	sathyanarayanan.kuppuswamy, mahesh, oohall, Jonathan.Cameron,
	terry.bowman, tianruidong



On 2/27/26 6:47 PM, Lukas Wunner wrote:
> On Fri, Feb 27, 2026 at 04:28:59PM +0800, Shuai Xue wrote:
>> On 2/7/26 3:48 PM, Shuai Xue wrote:
>>> Regarding pci_restore_state() in slot_reset(): I see now that it does
>>> call pci_aer_clear_status(dev) (at line 1844 in pci.c), which will
>>> clear the AER Status registers. So if we walk the hierarchy after
>>> the slot_reset callbacks, the error bits accumulated during DPC will
>>> already be cleared.
>>>
>>> To avoid losing those errors, I think the walk should happen after
>>> dpc_reset_link() succeeds but *before* pcie_do_recovery() invokes the
>>> slot_reset callbacks. That way, we can capture the AER Status bits
>>> before pci_restore_state() clears them.
>>>
>>> Does that sound like the right approach, or would you prefer a
>>> different placement?
> 
> The problem is that if the hierarchy that was reset is deeper than
> one level, you first need to call pci_restore_state() on all the
> PCIe Upstream and Downstream Ports that were reset before you can
> access the Endpoints at the bottom of the hierarchy.
> 
> E.g. if DPC occurs at a Root Port with multiple nested PCIe switches
> below, the Endpoints at the "leaves" of that tree are only accessible
> once Config Space has been restored at all the PCIe switches
> in-between the Endpoints and the DPC-capable Root Port.
> 
> Hence your proposal unfortunately won't work.
> 
> I think the solution is to move pci_aer_clear_status() out of
> pci_restore_state() into the callers that actually need it.
> But that requires going through every single caller.
> I've begun doing that last week and am about 60% done.
> 
> Once pci_restore_state() no longer clears the error bits, we can
> report and clear them after the "report_slot_reset" stage (which
> is where drivers call pci_restore_state()).
> 
> I've also changed my mind and I think reporting and clearing
> the error bits *could* happen in pcie_do_recovery() even if it
> were used for EEH and s390 because those platforms may plug in
> AER-capable devices as well and so we do need to clear the bits
> regardless of the error recovery mechanism used.
> 
> Let me get back to you once I've gone through all the callers of
> pci_restore_state().  Please be patient.
> 

Sure, glad to hear you have been working on that.


Thanks.
Shuai


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2026-02-27 12:28 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-24  7:45 [PATCH v7 0/5] PCI/AER: Report fatal errors of RCiEP and EP if link recovered Shuai Xue
2026-01-24  7:45 ` [PATCH v7 1/5] PCI/DPC: Clarify naming for error port in DPC Handling Shuai Xue
2026-01-27 10:10   ` Jonathan Cameron
2026-01-24  7:45 ` [PATCH v7 2/5] PCI/DPC: Run recovery on device that detected the error Shuai Xue
2026-01-27 10:24   ` Jonathan Cameron
2026-01-28 12:27     ` Shuai Xue
2026-01-28 15:02       ` Jonathan Cameron
2026-01-29  5:49         ` Shuai Xue
2026-02-02 14:02   ` Lukas Wunner
2026-02-02 21:09     ` Lukas Wunner
2026-02-07  7:48       ` Shuai Xue
2026-02-27  8:28         ` Shuai Xue
2026-02-27 10:47           ` Lukas Wunner
2026-02-27 12:28             ` Shuai Xue
2026-02-06  8:41     ` Shuai Xue
2026-01-24  7:45 ` [PATCH v7 3/5] PCI/AER: Report fatal errors of RCiEP and EP if link recovered Shuai Xue
2026-01-27 10:36   ` Jonathan Cameron
2026-01-28 12:29     ` Shuai Xue
2026-01-28 16:50   ` Kuppuswamy Sathyanarayanan
2026-01-29 11:46     ` Shuai Xue
2026-01-24  7:45 ` [PATCH v7 4/5] PCI/AER: Clear both AER fatal and non-fatal status Shuai Xue
2026-01-27 10:39   ` Jonathan Cameron
2026-01-28 12:30     ` Shuai Xue
2026-01-28 16:58   ` Kuppuswamy Sathyanarayanan
2026-02-03  8:06   ` Lukas Wunner
2026-02-07  8:34     ` Shuai Xue
2026-01-24  7:45 ` [PATCH v7 5/5] PCI/AER: Only clear error bits in pcie_clear_device_status() Shuai Xue
2026-01-27 10:45   ` Jonathan Cameron
2026-01-28 12:45     ` Shuai Xue
2026-02-03  7:44       ` Lukas Wunner
2026-02-06  8:12         ` Shuai Xue
2026-01-28 17:01   ` Kuppuswamy Sathyanarayanan
2026-01-29 12:09     ` Shuai Xue
2026-02-03  7:53   ` Lukas Wunner
2026-02-06  7:39     ` Shuai Xue

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox