Linux PCI subsystem development
 help / color / mirror / Atom feed
* [PATCH v8 0/5] PCI: Add support for resetting the Root Ports in a platform specific way
@ 2026-05-18 14:59 Manivannan Sadhasivam via B4 Relay
  2026-05-18 14:59 ` [PATCH v8 1/5] PCI: dwc: ep: Clear MSI iATU mapping in dw_pcie_ep_cleanup() Manivannan Sadhasivam via B4 Relay
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Manivannan Sadhasivam via B4 Relay @ 2026-05-18 14:59 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Will Deacon, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Manivannan Sadhasivam, Rob Herring, Heiko Stuebner, Philipp Zabel
  Cc: linux-pci, linux-kernel, linuxppc-dev, linux-arm-kernel,
	linux-arm-msm, linux-rockchip, Niklas Cassel, Wilfred Mallawa,
	Krishna Chaitanya Chundru, mani, Lukas Wunner, Richard Zhu,
	Brian Norris, Wilson Ding, Manivannan Sadhasivam, Frank Li,
	Manivannan Sadhasivam

Hi,

Currently, in the event of AER/DPC, PCI core will try to reset the slot (Root
Port) and its subordinate devices by invoking bridge control reset and FLR. But
in some cases like AER Fatal error, it might be necessary to reset the Root
Ports using the PCI host bridge drivers in a platform specific way (as indicated
by the TODO in the pcie_do_recovery() function in drivers/pci/pcie/err.c).
Otherwise, the PCI link won't be recovered successfully.

So this series adds a new callback 'pci_host_bridge::reset_root_port' for the
host bridge drivers to reset the Root Port when a fatal error happens.

Also, this series allows the host bridge drivers to handle PCI link down event
by resetting the Root Ports and recovering the bus. This is accomplished by the
help of the new 'pci_host_handle_link_down()' API. Host bridge drivers are
expected to call this API (preferrably from a threaded IRQ handler) with
relevant Root Port 'pci_dev' when a link down event is detected for the port.
The API will reuse the pcie_do_recovery() function to recover the link if AER
support is enabled, otherwise it will directly call the reset_root_port()
callback of the host bridge driver (if exists).

For reference, I've modified the pcie-qcom driver to call
pci_host_handle_link_down() API with Root Port 'pci_dev' after receiving the
LDn global_irq event and populated 'pci_host_bridge::reset_root_port()'
callback to reset the Root Ports.
 
Testing
-------
Tested on Qcom Lemans AU Ride platform with Host and EP SoCs connected over PCIe
link. Simulated the LDn by disabling LTSSM_EN on the EP and I could verify that
the link was getting recovered successfully.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
Changes in v8:
- Removed pci_save_state() for the Root Port during recovery as the PCI core now
  saves the config space during enumeration
- Added save/restore in pci_endpoint_test.c driver to save the config space
  after enabling BME and restoring it after reset
- Added a patch to unmap MSI address post LDn
- Rebased on top of v7.1-rc1
- Link to v7: https://lore.kernel.org/r/20260310-pci-port-reset-v7-0-9dd00ccc25ab@oss.qualcomm.com

Changes in v7:
- Dropped Rockchip Root port reset patch due to reported issues. But the series
  works on other platforms as tested by others.
- Added pci_{lock/unlock}_rescan_remove() to guard pci_bus_error_reset() as the
  device could be removed in-between due to Native hotplug interrupt.
- Rebased on top of v7.0-rc1
- Link to v6: https://lore.kernel.org/r/20250715-pci-port-reset-v6-0-6f9cce94e7bb@oss.qualcomm.com

Changes in v6:
- Incorporated the patch: https://lore.kernel.org/all/20250524185304.26698-2-manivannan.sadhasivam@linaro.org/
- Link to v5: https://lore.kernel.org/r/20250715-pci-port-reset-v5-0-26a5d278db40@oss.qualcomm.com

Changes in v5:
* Reworked the pci_host_handle_link_down() to accept Root Port instead of
  resetting all Root Ports in the event of link down.
* Renamed 'reset_slot' to 'reset_root_port' to avoid confusion as both terms
  were used interchangibly and the series is intended to reset Root Port only.
* Added the Rockchip driver change to this series.
* Dropped the applied patches and review/tested tags due to rework.
* Rebased on top of v6.16-rc1.

Changes in v4:
- Handled link down first in the irq handler
- Updated ICC & OPP bandwidth after link up in reset_slot() callback
- Link to v3: https://lore.kernel.org/r/20250417-pcie-reset-slot-v3-0-59a10811c962@linaro.org

Changes in v3:
- Made the pci-host-common driver as a common library for host controller
  drivers
- Moved the reset slot code to pci-host-common library
- Link to v2: https://lore.kernel.org/r/20250416-pcie-reset-slot-v2-0-efe76b278c10@linaro.org

Changes in v2:
- Moved calling reset_slot() callback from pcie_do_recovery() to pcibios_reset_secondary_bus()
- Link to v1: https://lore.kernel.org/r/20250404-pcie-reset-slot-v1-0-98952918bf90@linaro.org

---
Manivannan Sadhasivam (5):
      PCI: dwc: ep: Clear MSI iATU mapping in dw_pcie_ep_cleanup()
      PCI/ERR: Add support for resetting the Root Ports in a platform specific way
      PCI: host-common: Add link down handling for Root Ports
      PCI: qcom: Add support for resetting the Root Port due to link down event
      misc: pci_endpoint_test: Add AER error handlers

 drivers/misc/pci_endpoint_test.c                |  23 ++++
 drivers/pci/controller/dwc/pcie-designware-ep.c |   5 +
 drivers/pci/controller/dwc/pcie-qcom.c          | 143 +++++++++++++++++++++++-
 drivers/pci/controller/pci-host-common.c        |  35 ++++++
 drivers/pci/controller/pci-host-common.h        |   1 +
 drivers/pci/pci.c                               |  14 +++
 drivers/pci/pcie/err.c                          |   6 +-
 include/linux/pci.h                             |   1 +
 8 files changed, 222 insertions(+), 6 deletions(-)
---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20250715-pci-port-reset-4d9519570123

Best regards,
-- 
Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v8 1/5] PCI: dwc: ep: Clear MSI iATU mapping in dw_pcie_ep_cleanup()
  2026-05-18 14:59 [PATCH v8 0/5] PCI: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam via B4 Relay
@ 2026-05-18 14:59 ` Manivannan Sadhasivam via B4 Relay
  2026-05-18 16:04   ` sashiko-bot
  2026-05-18 14:59 ` [PATCH v8 2/5] PCI/ERR: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam via B4 Relay
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Manivannan Sadhasivam via B4 Relay @ 2026-05-18 14:59 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Will Deacon, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Manivannan Sadhasivam, Rob Herring, Heiko Stuebner, Philipp Zabel
  Cc: linux-pci, linux-kernel, linuxppc-dev, linux-arm-kernel,
	linux-arm-msm, linux-rockchip, Niklas Cassel, Wilfred Mallawa,
	Krishna Chaitanya Chundru, mani, Lukas Wunner, Richard Zhu,
	Brian Norris, Wilson Ding, Manivannan Sadhasivam

From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>

The MSI iATU mapping is currently only cleared when the endpoint is
stopped via configfs or when the host updates the MSI address/size.
This avoids redundant iATU reconfiguration every time the endpoint
raises an MSI interrupt.

However, a fundamental reset triggered by PERST# assert/deassert
resets all iATU inbound/outbound registers without going through the
configfs stop path. If the host also retains the same MSI address/size
after PERST# deassert, the driver never clears the stale MSI iATU
mapping. It then continues using this stale mapping to raise the MSI
interrupts, which can cause IOMMU faults and MSI failures on the host.

Fix this by clearing the MSI iATU mapping inside dw_pcie_ep_cleanup(),
which is already called as part of the PERST# assert/deassert sequence.
This unmaps the MSI iATU region and sets the msi_iatu_mapped flag to
false, ensuring that dw_pcie_ep_raise_msi_irq() performs a fresh iATU
mapping on its next invocation, regardless of whether the host changed
the MSI address/size.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
 drivers/pci/controller/dwc/pcie-designware-ep.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index d4dc3b24da60..4ae0e1b55f39 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -1035,6 +1035,11 @@ void dw_pcie_ep_cleanup(struct dw_pcie_ep *ep)
 {
 	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
 
+	if (ep->msi_iatu_mapped) {
+		dw_pcie_ep_unmap_addr(ep->epc, 0, 0, ep->msi_mem_phys);
+		ep->msi_iatu_mapped = false;
+	}
+
 	dwc_pcie_debugfs_deinit(pci);
 	dw_pcie_edma_remove(pci);
 }

-- 
2.48.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v8 2/5] PCI/ERR: Add support for resetting the Root Ports in a platform specific way
  2026-05-18 14:59 [PATCH v8 0/5] PCI: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam via B4 Relay
  2026-05-18 14:59 ` [PATCH v8 1/5] PCI: dwc: ep: Clear MSI iATU mapping in dw_pcie_ep_cleanup() Manivannan Sadhasivam via B4 Relay
@ 2026-05-18 14:59 ` Manivannan Sadhasivam via B4 Relay
  2026-05-18 16:27   ` sashiko-bot
  2026-05-18 14:59 ` [PATCH v8 3/5] PCI: host-common: Add link down handling for Root Ports Manivannan Sadhasivam via B4 Relay
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Manivannan Sadhasivam via B4 Relay @ 2026-05-18 14:59 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Will Deacon, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Manivannan Sadhasivam, Rob Herring, Heiko Stuebner, Philipp Zabel
  Cc: linux-pci, linux-kernel, linuxppc-dev, linux-arm-kernel,
	linux-arm-msm, linux-rockchip, Niklas Cassel, Wilfred Mallawa,
	Krishna Chaitanya Chundru, mani, Lukas Wunner, Richard Zhu,
	Brian Norris, Wilson Ding, Manivannan Sadhasivam, Frank Li,
	Manivannan Sadhasivam

From: Manivannan Sadhasivam <mani@kernel.org>

Some host bridge devices require resetting the Root Ports in a platform
specific way to recover them from error conditions such as Fatal AER
errors, Link Down etc... So introduce pci_host_bridge::reset_root_port()
callback and call it from pcibios_reset_secondary_bus() if available. Also,
save the Root Port config space before reset and restore it afterwards.

The 'reset_root_port' callback is responsible for resetting the given Root
Port referenced by the 'pci_dev' pointer in a platform specific way and
bring it back to the working state if possible. If any error occurs during
the reset operation, relevant errno should be returned.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Tested-by: Brian Norris <briannorris@chromium.org>
Tested-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Tested-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
---
 drivers/pci/pci.c      | 13 +++++++++++++
 drivers/pci/pcie/err.c |  5 -----
 include/linux/pci.h    |  1 +
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8f7cfcc00090..651505b3bd60 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4809,6 +4809,19 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
 
 void __weak pcibios_reset_secondary_bus(struct pci_dev *dev)
 {
+	struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
+	int ret;
+
+	if (pci_is_root_bus(dev->bus) && host->reset_root_port) {
+		ret = host->reset_root_port(host, dev);
+		if (ret)
+			pci_err(dev, "Failed to reset Root Port: %d\n", ret);
+		else
+			pci_restore_state(dev);
+
+		return;
+	}
+
 	pci_reset_secondary_bus(dev);
 }
 
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index bebe4bc111d7..13b9d9eb714f 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -256,11 +256,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	}
 
 	if (status == PCI_ERS_RESULT_NEED_RESET) {
-		/*
-		 * TODO: Should call platform-specific
-		 * functions to reset slot before calling
-		 * drivers' slot_reset callbacks?
-		 */
 		status = PCI_ERS_RESULT_RECOVERED;
 		pci_dbg(bridge, "broadcast slot_reset message\n");
 		pci_walk_bridge(bridge, report_slot_reset, &status);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2c4454583c11..439dbd0d9184 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -646,6 +646,7 @@ struct pci_host_bridge {
 	void (*release_fn)(struct pci_host_bridge *);
 	int (*enable_device)(struct pci_host_bridge *bridge, struct pci_dev *dev);
 	void (*disable_device)(struct pci_host_bridge *bridge, struct pci_dev *dev);
+	int (*reset_root_port)(struct pci_host_bridge *bridge, struct pci_dev *dev);
 	void		*release_data;
 	unsigned int	ignore_reset_delay:1;	/* For entire hierarchy */
 	unsigned int	no_ext_tags:1;		/* No Extended Tags */

-- 
2.48.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v8 3/5] PCI: host-common: Add link down handling for Root Ports
  2026-05-18 14:59 [PATCH v8 0/5] PCI: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam via B4 Relay
  2026-05-18 14:59 ` [PATCH v8 1/5] PCI: dwc: ep: Clear MSI iATU mapping in dw_pcie_ep_cleanup() Manivannan Sadhasivam via B4 Relay
  2026-05-18 14:59 ` [PATCH v8 2/5] PCI/ERR: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam via B4 Relay
@ 2026-05-18 14:59 ` Manivannan Sadhasivam via B4 Relay
  2026-05-18 17:05   ` sashiko-bot
  2026-05-18 14:59 ` [PATCH v8 4/5] PCI: qcom: Add support for resetting the Root Port due to link down event Manivannan Sadhasivam via B4 Relay
  2026-05-18 14:59 ` [PATCH v8 5/5] misc: pci_endpoint_test: Add AER error handlers Manivannan Sadhasivam via B4 Relay
  4 siblings, 1 reply; 11+ messages in thread
From: Manivannan Sadhasivam via B4 Relay @ 2026-05-18 14:59 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Will Deacon, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Manivannan Sadhasivam, Rob Herring, Heiko Stuebner, Philipp Zabel
  Cc: linux-pci, linux-kernel, linuxppc-dev, linux-arm-kernel,
	linux-arm-msm, linux-rockchip, Niklas Cassel, Wilfred Mallawa,
	Krishna Chaitanya Chundru, mani, Lukas Wunner, Richard Zhu,
	Brian Norris, Wilson Ding, Manivannan Sadhasivam, Frank Li,
	Manivannan Sadhasivam

From: Manivannan Sadhasivam <mani@kernel.org>

The PCI link, when down, needs to be recovered to bring it back. But on
some platforms, that cannot be done in a generic way as link recovery
procedure is platform specific. So add a new API
pci_host_handle_link_down() that could be called by the host bridge drivers
for a specific Root Port when the link goes down.

The API accepts the 'pci_dev' corresponding to the Root Port which observed
the link down event. If CONFIG_PCIEAER is enabled, the API calls
pcie_do_recovery() function with 'pci_channel_io_frozen' as the state. This
will result in the execution of the AER Fatal error handling code. Since
the link down recovery is pretty much the same as AER Fatal error handling,
pcie_do_recovery() helper is reused here. First, the AER error_detected()
callback will be triggered for the bridge and then for the downstream
devices. Finally, pci_host_reset_root_port() will be called for the Root
Port, which will reset the Root Port using 'reset_root_port' callback to
recover the link. Once that's done, resume message will be broadcasted to
the bridge and the downstream devices, indicating successful link recovery.

But if CONFIG_PCIEAER is not enabled in the kernel, only
pci_host_reset_root_port() API will be called, which will in turn call
pci_bus_error_reset() to just reset the Root Port as there is no way we
could inform the drivers about link recovery.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Tested-by: Brian Norris <briannorris@chromium.org>
Tested-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Tested-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
---
 drivers/pci/controller/pci-host-common.c | 35 ++++++++++++++++++++++++++++++++
 drivers/pci/controller/pci-host-common.h |  1 +
 drivers/pci/pci.c                        |  1 +
 drivers/pci/pcie/err.c                   |  1 +
 4 files changed, 38 insertions(+)

diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c
index d6258c1cffe5..15ebff8a542a 100644
--- a/drivers/pci/controller/pci-host-common.c
+++ b/drivers/pci/controller/pci-host-common.c
@@ -12,9 +12,11 @@
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/of_pci.h>
+#include <linux/pci.h>
 #include <linux/pci-ecam.h>
 #include <linux/platform_device.h>
 
+#include "../pci.h"
 #include "pci-host-common.h"
 
 static void gen_pci_unmap_cfg(void *ptr)
@@ -106,5 +108,38 @@ void pci_host_common_remove(struct platform_device *pdev)
 }
 EXPORT_SYMBOL_GPL(pci_host_common_remove);
 
+static pci_ers_result_t pci_host_reset_root_port(struct pci_dev *dev)
+{
+	int ret;
+
+	pci_lock_rescan_remove();
+	ret = pci_bus_error_reset(dev);
+	pci_unlock_rescan_remove();
+	if (ret) {
+		pci_err(dev, "Failed to reset Root Port: %d\n", ret);
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	pci_info(dev, "Root Port has been reset\n");
+
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void pci_host_recover_root_port(struct pci_dev *port)
+{
+#if IS_ENABLED(CONFIG_PCIEAER)
+	pcie_do_recovery(port, pci_channel_io_frozen, pci_host_reset_root_port);
+#else
+	pci_host_reset_root_port(port);
+#endif
+}
+
+void pci_host_handle_link_down(struct pci_dev *port)
+{
+	pci_info(port, "Recovering Root Port due to Link Down\n");
+	pci_host_recover_root_port(port);
+}
+EXPORT_SYMBOL_GPL(pci_host_handle_link_down);
+
 MODULE_DESCRIPTION("Common library for PCI host controller drivers");
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/pci/controller/pci-host-common.h b/drivers/pci/controller/pci-host-common.h
index b5075d4bd7eb..dd12dd1a1b23 100644
--- a/drivers/pci/controller/pci-host-common.h
+++ b/drivers/pci/controller/pci-host-common.h
@@ -17,6 +17,7 @@ int pci_host_common_init(struct platform_device *pdev,
 			 struct pci_host_bridge *bridge,
 			 const struct pci_ecam_ops *ops);
 void pci_host_common_remove(struct platform_device *pdev);
+void pci_host_handle_link_down(struct pci_dev *port);
 
 struct pci_config_window *pci_host_common_ecam_create(struct device *dev,
 	struct pci_host_bridge *bridge, const struct pci_ecam_ops *ops);
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 651505b3bd60..35dc9f54a8ef 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5669,6 +5669,7 @@ int pci_bus_error_reset(struct pci_dev *bridge)
 {
 	return pci_reset_bridge(bridge, PCI_RESET_NO_RESTORE);
 }
+EXPORT_SYMBOL_GPL(pci_bus_error_reset);
 
 int pci_try_reset_bridge(struct pci_dev *bridge)
 {
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 13b9d9eb714f..d77403d8855b 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -292,3 +292,4 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 
 	return status;
 }
+EXPORT_SYMBOL_GPL(pcie_do_recovery);

-- 
2.48.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v8 4/5] PCI: qcom: Add support for resetting the Root Port due to link down event
  2026-05-18 14:59 [PATCH v8 0/5] PCI: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam via B4 Relay
                   ` (2 preceding siblings ...)
  2026-05-18 14:59 ` [PATCH v8 3/5] PCI: host-common: Add link down handling for Root Ports Manivannan Sadhasivam via B4 Relay
@ 2026-05-18 14:59 ` Manivannan Sadhasivam via B4 Relay
  2026-05-18 17:38   ` sashiko-bot
  2026-05-18 14:59 ` [PATCH v8 5/5] misc: pci_endpoint_test: Add AER error handlers Manivannan Sadhasivam via B4 Relay
  4 siblings, 1 reply; 11+ messages in thread
From: Manivannan Sadhasivam via B4 Relay @ 2026-05-18 14:59 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Will Deacon, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Manivannan Sadhasivam, Rob Herring, Heiko Stuebner, Philipp Zabel
  Cc: linux-pci, linux-kernel, linuxppc-dev, linux-arm-kernel,
	linux-arm-msm, linux-rockchip, Niklas Cassel, Wilfred Mallawa,
	Krishna Chaitanya Chundru, mani, Lukas Wunner, Richard Zhu,
	Brian Norris, Wilson Ding, Manivannan Sadhasivam,
	Manivannan Sadhasivam

From: Manivannan Sadhasivam <mani@kernel.org>

The PCIe link can go down under circumstances such as the device firmware
crash, link instability, etc... When that happens, the PCIe Root Port needs
to be reset to make it operational again. Currently, the driver is not
handling the link down event, due to which the users have to restart the
machine to make PCIe link operational again. So fix it by detecting the
link down event and resetting the Root Port.

Since the Qcom PCIe controllers report the link down event through the
'global' IRQ, enable the link down event by setting PARF_INT_ALL_LINK_DOWN
bit in PARF_INT_ALL_MASK register.

In the case of the event, iterate through the available Root Ports and call
pci_host_handle_link_down() API with Root Port 'pci_dev' to let the PCI
core handle the link down condition. Since Qcom PCIe controllers only
support one Root Port per controller instance, the API will be called only
once. But the looping is necessary as there is no PCI API available to
fetch the Root Port instance without the child 'pci_dev'.

The API will internally call, 'pci_host_bridge::reset_root_port()' callback
to reset the Root Port in a platform specific way. So implement the
callback to reset the Root Port by first resetting the PCIe core, followed
by reinitializing the resources and then finally starting the link again.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Tested-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
---
 drivers/pci/controller/dwc/pcie-qcom.c | 143 ++++++++++++++++++++++++++++++++-
 1 file changed, 142 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index af6bf5cce65b..feda8abf5f85 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -56,6 +56,10 @@
 #define PARF_AXI_MSTR_WR_ADDR_HALT_V2		0x1a8
 #define PARF_Q2A_FLUSH				0x1ac
 #define PARF_LTSSM				0x1b0
+#define PARF_INT_ALL_STATUS			0x224
+#define PARF_INT_ALL_CLEAR			0x228
+#define PARF_INT_ALL_MASK			0x22c
+#define PARF_STATUS				0x230
 #define PARF_SID_OFFSET				0x234
 #define PARF_BDF_TRANSLATE_CFG			0x24c
 #define PARF_DBI_BASE_ADDR_V2			0x350
@@ -131,6 +135,13 @@
 
 /* PARF_LTSSM register fields */
 #define LTSSM_EN				BIT(8)
+#define SW_CLEAR_FLUSH_MODE			BIT(10)
+#define FLUSH_MODE				BIT(11)
+
+/* PARF_INT_ALL_{STATUS/CLEAR/MASK} register fields */
+#define INT_ALL_LINK_DOWN			1
+#define PARF_INT_ALL_LINK_DOWN			BIT(INT_ALL_LINK_DOWN)
+#define PARF_INT_MSI_DEV_0_7			GENMASK(30, 23)
 
 /* PARF_NO_SNOOP_OVERRIDE register fields */
 #define WR_NO_SNOOP_OVERRIDE_EN			BIT(1)
@@ -142,6 +153,9 @@
 /* PARF_BDF_TO_SID_CFG fields */
 #define BDF_TO_SID_BYPASS			BIT(0)
 
+/* PARF_STATUS fields */
+#define FLUSH_COMPLETED				BIT(8)
+
 /* ELBI_SYS_CTRL register fields */
 #define ELBI_SYS_CTRL_LT_ENABLE			BIT(0)
 
@@ -166,6 +180,7 @@
 						PCIE_CAP_SLOT_POWER_LIMIT_SCALE)
 
 #define PERST_DELAY_US				1000
+#define FLUSH_TIMEOUT_US			100
 
 #define QCOM_PCIE_CRC8_POLYNOMIAL		(BIT(2) | BIT(1) | BIT(0))
 
@@ -282,11 +297,14 @@ struct qcom_pcie {
 	const struct qcom_pcie_cfg *cfg;
 	struct dentry *debugfs;
 	struct list_head ports;
+	int global_irq;
 	bool suspended;
 	bool use_pm_opp;
 };
 
 #define to_qcom_pcie(x)		dev_get_drvdata((x)->dev)
+static int qcom_pcie_reset_root_port(struct pci_host_bridge *bridge,
+				  struct pci_dev *pdev);
 
 static void __qcom_pcie_perst_assert(struct qcom_pcie *pcie, bool assert)
 {
@@ -1330,6 +1348,8 @@ static int qcom_pcie_host_init(struct dw_pcie_rp *pp)
 			goto err_assert_reset;
 	}
 
+	pp->bridge->reset_root_port = qcom_pcie_reset_root_port;
+
 	return 0;
 
 err_assert_reset:
@@ -1613,6 +1633,78 @@ static void qcom_pcie_icc_opp_update(struct qcom_pcie *pcie)
 	}
 }
 
+/*
+ * Qcom PCIe controllers only support one Root Port per controller instance. So
+ * this function ignores the 'pci_dev' associated with the Root Port and just
+ * resets the host bridge, which in turn resets the Root Port also.
+ */
+static int qcom_pcie_reset_root_port(struct pci_host_bridge *bridge,
+				  struct pci_dev *pdev)
+{
+	struct pci_bus *bus = bridge->bus;
+	struct dw_pcie_rp *pp = bus->sysdata;
+	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+	struct qcom_pcie *pcie = to_qcom_pcie(pci);
+	struct device *dev = pcie->pci->dev;
+	u32 val;
+	int ret;
+
+	/* Wait for the pending transactions to be completed */
+	ret = readl_relaxed_poll_timeout(pcie->parf + PARF_STATUS, val,
+					 val & FLUSH_COMPLETED, 10,
+					 FLUSH_TIMEOUT_US);
+	if (ret) {
+		dev_err(dev, "Flush completion failed: %d\n", ret);
+		goto err_host_deinit;
+	}
+
+	/* Clear the FLUSH_MODE to allow the core to be reset */
+	val = readl(pcie->parf + PARF_LTSSM);
+	val |= SW_CLEAR_FLUSH_MODE;
+	writel(val, pcie->parf + PARF_LTSSM);
+
+	/* Wait for the FLUSH_MODE to clear */
+	ret = readl_relaxed_poll_timeout(pcie->parf + PARF_LTSSM, val,
+					 !(val & FLUSH_MODE), 10,
+					 FLUSH_TIMEOUT_US);
+	if (ret) {
+		dev_err(dev, "Flush mode clear failed: %d\n", ret);
+		goto err_host_deinit;
+	}
+
+	qcom_pcie_host_deinit(pp);
+
+	ret = qcom_pcie_host_init(pp);
+	if (ret) {
+		dev_err(dev, "Host init failed\n");
+		return ret;
+	}
+
+	ret = dw_pcie_setup_rc(pp);
+	if (ret)
+		goto err_host_deinit;
+
+	/*
+	 * Re-enable global IRQ events as the PARF_INT_ALL_MASK register is
+	 * non-sticky.
+	 */
+	if (pcie->global_irq)
+		writel_relaxed(PARF_INT_ALL_LINK_DOWN | PARF_INT_MSI_DEV_0_7,
+				pcie->parf + PARF_INT_ALL_MASK);
+
+	qcom_pcie_start_link(pci);
+	dw_pcie_wait_for_link(pci);
+
+	dev_dbg(dev, "Root Port reset completed\n");
+
+	return 0;
+
+err_host_deinit:
+	qcom_pcie_host_deinit(pp);
+
+	return ret;
+}
+
 static int qcom_pcie_link_transition_count(struct seq_file *s, void *data)
 {
 	struct qcom_pcie *pcie = (struct qcom_pcie *)dev_get_drvdata(s->private);
@@ -1650,6 +1742,27 @@ static void qcom_pcie_init_debugfs(struct qcom_pcie *pcie)
 				    qcom_pcie_link_transition_count);
 }
 
+static irqreturn_t qcom_pcie_global_irq_thread(int irq, void *data)
+{
+	struct qcom_pcie *pcie = data;
+	struct dw_pcie_rp *pp = &pcie->pci->pp;
+	struct device *dev = pcie->pci->dev;
+	struct pci_dev *port;
+	unsigned long status = readl_relaxed(pcie->parf + PARF_INT_ALL_STATUS);
+
+	writel_relaxed(status, pcie->parf + PARF_INT_ALL_CLEAR);
+
+	if (test_and_clear_bit(INT_ALL_LINK_DOWN, &status)) {
+		dev_dbg(dev, "Received Link down event\n");
+		for_each_pci_bridge(port, pp->bridge->bus) {
+			if (pci_pcie_type(port) == PCI_EXP_TYPE_ROOT_PORT)
+				pci_host_handle_link_down(port);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
 static void qcom_pci_free_msi(void *ptr)
 {
 	struct dw_pcie_rp *pp = (struct dw_pcie_rp *)ptr;
@@ -1852,7 +1965,7 @@ static int qcom_pcie_probe(struct platform_device *pdev)
 	struct dw_pcie_rp *pp;
 	struct resource *res;
 	struct dw_pcie *pci;
-	int ret;
+	int ret, irq;
 
 	pcie_cfg = of_device_get_match_data(dev);
 	if (!pcie_cfg) {
@@ -2009,6 +2122,32 @@ static int qcom_pcie_probe(struct platform_device *pdev)
 		goto err_phy_exit;
 	}
 
+	irq = platform_get_irq_byname_optional(pdev, "global");
+	if (irq > 0) {
+		const char *name;
+
+		name = devm_kasprintf(dev, GFP_KERNEL, "qcom_pcie_global_irq%d",
+				      pci_domain_nr(pp->bridge->bus));
+		if (!name) {
+			ret = -ENOMEM;
+			goto err_host_deinit;
+		}
+
+		ret = devm_request_threaded_irq(&pdev->dev, irq, NULL,
+						qcom_pcie_global_irq_thread,
+						IRQF_ONESHOT, name, pcie);
+		if (ret) {
+			dev_err_probe(&pdev->dev, ret,
+				      "Failed to request Global IRQ\n");
+			goto err_host_deinit;
+		}
+
+		writel_relaxed(PARF_INT_ALL_LINK_DOWN | PARF_INT_MSI_DEV_0_7,
+				pcie->parf + PARF_INT_ALL_MASK);
+
+		pcie->global_irq = irq;
+	}
+
 	qcom_pcie_icc_opp_update(pcie);
 
 	if (pcie->mhi)
@@ -2016,6 +2155,8 @@ static int qcom_pcie_probe(struct platform_device *pdev)
 
 	return 0;
 
+err_host_deinit:
+	dw_pcie_host_deinit(pp);
 err_phy_exit:
 	list_for_each_entry_safe(port, tmp_port, &pcie->ports, list) {
 		list_for_each_entry_safe(perst, tmp_perst, &port->perst, list)

-- 
2.48.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v8 5/5] misc: pci_endpoint_test: Add AER error handlers
  2026-05-18 14:59 [PATCH v8 0/5] PCI: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam via B4 Relay
                   ` (3 preceding siblings ...)
  2026-05-18 14:59 ` [PATCH v8 4/5] PCI: qcom: Add support for resetting the Root Port due to link down event Manivannan Sadhasivam via B4 Relay
@ 2026-05-18 14:59 ` Manivannan Sadhasivam via B4 Relay
  2026-05-18 17:55   ` sashiko-bot
  4 siblings, 1 reply; 11+ messages in thread
From: Manivannan Sadhasivam via B4 Relay @ 2026-05-18 14:59 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Will Deacon, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Manivannan Sadhasivam, Rob Herring, Heiko Stuebner, Philipp Zabel
  Cc: linux-pci, linux-kernel, linuxppc-dev, linux-arm-kernel,
	linux-arm-msm, linux-rockchip, Niklas Cassel, Wilfred Mallawa,
	Krishna Chaitanya Chundru, mani, Lukas Wunner, Richard Zhu,
	Brian Norris, Wilson Ding, Manivannan Sadhasivam

From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>

This Endpoint test driver doesn't need to do anything fancy in its error
handlers, but just restore the config space that was saved during probe and
report the correct result. This helps in making sure that the AER recovery
succeeds.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
 drivers/misc/pci_endpoint_test.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index dbd017cabbb9..3e89bd48c196 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -1327,6 +1327,8 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
 		goto err_kfree_name;
 	}
 
+	pci_save_state(pdev);
+
 	return 0;
 
 err_kfree_name:
@@ -1448,12 +1450,33 @@ static const struct pci_device_id pci_endpoint_test_tbl[] = {
 };
 MODULE_DEVICE_TABLE(pci, pci_endpoint_test_tbl);
 
+static pci_ers_result_t pci_endpoint_test_error_detected(struct pci_dev *pdev,
+					      pci_channel_state_t state)
+{
+	if (state == pci_channel_io_perm_failure)
+		return PCI_ERS_RESULT_DISCONNECT;
+
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t pci_endpoint_test_slot_reset(struct pci_dev *pdev)
+{
+	pci_restore_state(pdev);
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+static const struct pci_error_handlers pci_endpoint_test_err_handler = {
+	.error_detected = pci_endpoint_test_error_detected,
+	.slot_reset = pci_endpoint_test_slot_reset,
+};
+
 static struct pci_driver pci_endpoint_test_driver = {
 	.name		= DRV_MODULE_NAME,
 	.id_table	= pci_endpoint_test_tbl,
 	.probe		= pci_endpoint_test_probe,
 	.remove		= pci_endpoint_test_remove,
 	.sriov_configure = pci_sriov_configure_simple,
+	.err_handler    = &pci_endpoint_test_err_handler,
 };
 module_pci_driver(pci_endpoint_test_driver);
 

-- 
2.48.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 1/5] PCI: dwc: ep: Clear MSI iATU mapping in dw_pcie_ep_cleanup()
  2026-05-18 14:59 ` [PATCH v8 1/5] PCI: dwc: ep: Clear MSI iATU mapping in dw_pcie_ep_cleanup() Manivannan Sadhasivam via B4 Relay
@ 2026-05-18 16:04   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-18 16:04 UTC (permalink / raw)
  To: Manivannan Sadhasivam via B4 Relay; +Cc: linux-pci

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] dw_pcie_ep_cleanup() modifies the MSI iATU mapping without holding ep->epc->lock, introducing a critical race condition with concurrent EPF interrupt requests.
--

commit f01add14b7c42730aa5b71bb56cc8682eb2fa21a
Author: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>

PCI: dwc: ep: Clear MSI iATU mapping in dw_pcie_ep_cleanup()

This commit clears the MSI iATU mapping in dw_pcie_ep_cleanup() to avoid
retaining a stale MSI iATU mapping after a fundamental reset triggered by
PERST# assert/deassert.

> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
> index d4dc3b24da607..4ae0e1b55f390 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> @@ -1035,6 +1035,11 @@ void dw_pcie_ep_cleanup(struct dw_pcie_ep *ep)
>  {
>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>  
> +	if (ep->msi_iatu_mapped) {
> +		dw_pcie_ep_unmap_addr(ep->epc, 0, 0, ep->msi_mem_phys);

Is it safe to modify the MSI iATU mapping without holding ep->epc->lock here?

dw_pcie_ep_cleanup() can be invoked asynchronously from reset/link-down
threaded IRQ handlers (e.g., qcom_pcie_perst_irq_thread() calling
qcom_pcie_perst_deassert()) which might not hold the endpoint controller lock.

Concurrently, an endpoint function driver can attempt to raise an MSI
interrupt via pci_epc_raise_irq(), which acquires ep->epc->lock and invokes
dw_pcie_ep_raise_msi_irq().

If dw_pcie_ep_cleanup() unmaps the window right before
dw_pcie_ep_raise_msi_irq() writes to ep->msi_mem, could this result in
a write to an unmapped PCIe outbound window and trigger an AXI bus error?

Furthermore, dw_pcie_ep_unmap_addr() modifies the shared ep->ob_window_map
bitmask. Does modifying this without the EPC lock race against other functions
like dw_pcie_ep_map_addr(), potentially corrupting the hardware outbound window
routing?

> +		ep->msi_iatu_mapped = false;
> +	}
> +
>  	dwc_pcie_debugfs_deinit(pci);
>  	dw_pcie_edma_remove(pci);
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260518-pci-port-reset-v8-0-eb5a7d331dfc@oss.qualcomm.com?part=1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 2/5] PCI/ERR: Add support for resetting the Root Ports in a platform specific way
  2026-05-18 14:59 ` [PATCH v8 2/5] PCI/ERR: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam via B4 Relay
@ 2026-05-18 16:27   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-18 16:27 UTC (permalink / raw)
  To: Manivannan Sadhasivam via B4 Relay; +Cc: linux-pci

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Missing pci_save_state() before the Root Port reset, leading to restoration of stale or uninitialized configuration space.
- [Medium] Silent failure and bypassed fallback when the platform-specific reset_root_port() fails.
--

commit b100b0c0e31e848c799fc6d648ef37421c08fb9f
Author: Manivannan Sadhasivam <mani@kernel.org>

PCI/ERR: Add support for resetting the Root Ports in a platform specific way

This commit adds a reset_root_port callback to pci_host_bridge to allow
platform-specific root port resets during error recovery. It calls this
from pcibios_reset_secondary_bus() if available, attempting to save and
restore the configuration space across the reset.

> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 8f7cfcc000901..651505b3bd60e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4809,6 +4809,19 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
>  
>  void __weak pcibios_reset_secondary_bus(struct pci_dev *dev)
>  {
> +	struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
> +	int ret;
> +
> +	if (pci_is_root_bus(dev->bus) && host->reset_root_port) {

The commit message says "save the Root Port config space before reset and
restore it afterwards". Should there be a call to pci_save_state() here
before initiating the reset?

If the state is never saved immediately prior to the reset, won't
pci_restore_state() restore whatever state was last saved in
dev->saved_config_space (e.g. from probe or suspend)?

> +		ret = host->reset_root_port(host, dev);
> +		if (ret)
> +			pci_err(dev, "Failed to reset Root Port: %d\n", ret);
> +		else
> +			pci_restore_state(dev);
> +
> +		return;

If host->reset_root_port() returns an error, the function returns early
without doing the standard reset or propagating the error (since the
function returns void).

Will this cause callers to falsely assume the reset was successful?

Should this fall through to the standard pci_reset_secondary_bus() if the
platform-specific reset fails?

> +	}
> +
>  	pci_reset_secondary_bus(dev);
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260518-pci-port-reset-v8-0-eb5a7d331dfc@oss.qualcomm.com?part=2

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 3/5] PCI: host-common: Add link down handling for Root Ports
  2026-05-18 14:59 ` [PATCH v8 3/5] PCI: host-common: Add link down handling for Root Ports Manivannan Sadhasivam via B4 Relay
@ 2026-05-18 17:05   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-18 17:05 UTC (permalink / raw)
  To: Manivannan Sadhasivam via B4 Relay; +Cc: linux-pci

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Hardware state desynchronization when CONFIG_PCIEAER is disabled.
- [Medium] Commit message claims error_detected() callback is triggered for the bridge, but the implementation skips it.
--

commit f07cf37e729701c6c67b90c92f65dd76b53b9fec
Author: Manivannan Sadhasivam <mani@kernel.org>

PCI: host-common: Add link down handling for Root Ports

This commit introduces a new API, pci_host_handle_link_down(), for host
bridge drivers to recover a specific Root Port when a link down event is
observed. It utilizes pcie_do_recovery() with AER handling if enabled, or
falls back to pci_bus_error_reset() otherwise.

> First, the AER error_detected() callback will be triggered for the bridge and then for the downstream devices.

Is this description accurate regarding the bridge receiving the callback?

Looking at pcie_do_recovery(), it uses pci_walk_bridge() to trigger the
recovery callbacks. However, pci_walk_bridge() handles bridges with
subordinate buses by iterating over devices on the subordinate bus.

Consequently, the Root Port (the bridge itself) appears to be excluded from
the callback walk and might not receive the error_detected() callback.

> diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c
> --- a/drivers/pci/controller/pci-host-common.c
> +++ b/drivers/pci/controller/pci-host-common.c

[ ... ]

> @@ -106,5 +108,38 @@ void pci_host_common_remove(struct platform_device *pdev)
>  }
>  EXPORT_SYMBOL_GPL(pci_host_common_remove);
>  
> +static pci_ers_result_t pci_host_reset_root_port(struct pci_dev *dev)
> +{
> +	int ret;
> +
> +	pci_lock_rescan_remove();
> +	ret = pci_bus_error_reset(dev);
> +	pci_unlock_rescan_remove();
> +	if (ret) {
> +		pci_err(dev, "Failed to reset Root Port: %d\n", ret);
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
> +
> +	pci_info(dev, "Root Port has been reset\n");
> +
> +	return PCI_ERS_RESULT_RECOVERED;
> +}
> +
> +static void pci_host_recover_root_port(struct pci_dev *port)
> +{
> +#if IS_ENABLED(CONFIG_PCIEAER)
> +	pcie_do_recovery(port, pci_channel_io_frozen, pci_host_reset_root_port);
> +#else
> +	pci_host_reset_root_port(port);

If CONFIG_PCIEAER is disabled, this directly calls pci_host_reset_root_port()
which performs a bus reset using PCI_RESET_NO_RESTORE via
pci_bus_error_reset().

Since pcie_do_recovery() is skipped, downstream device drivers are never
notified via their error_detected(), slot_reset(), or resume() callbacks.

Does this leave the hardware configuration space of these devices wiped while
the bound drivers remain active and unaware?

If subsequent MMIO accesses by these drivers hit unmapped space and DMA
fails, could this lead to device lockups or data corruption? Would it be
safer to hot-remove and rescan the devices instead of blindly resetting the
bus when AER is unavailable?

> +#endif
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260518-pci-port-reset-v8-0-eb5a7d331dfc@oss.qualcomm.com?part=3

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 4/5] PCI: qcom: Add support for resetting the Root Port due to link down event
  2026-05-18 14:59 ` [PATCH v8 4/5] PCI: qcom: Add support for resetting the Root Port due to link down event Manivannan Sadhasivam via B4 Relay
@ 2026-05-18 17:38   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-18 17:38 UTC (permalink / raw)
  To: Manivannan Sadhasivam via B4 Relay; +Cc: linux-pci

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Critical] System crash on error path due to unclocked controller.
- [High] Unintended MSI interception by the global IRQ handler causes severe performance regressions or silently drops events.
- [High] Data race and use-after-free via lockless iteration of PCI bus devices in IRQ thread.
- [Medium] Ignored return value masks link retraining failures.
--

commit f4a6523af4f70d1a2c5951a7190c4516cab3c8fc
Author: Manivannan Sadhasivam <mani@kernel.org>

PCI: qcom: Add support for resetting the Root Port due to link down event

This commit detects the PCIe link down event and resets the Root Port
by first resetting the PCIe core, reinitializing resources, and restarting
the link to make it operational again.

> diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
> index af6bf5cce65be..feda8abf5f851 100644
> --- a/drivers/pci/controller/dwc/pcie-qcom.c
> +++ b/drivers/pci/controller/dwc/pcie-qcom.c

[ ... ]

> @@ -1613,6 +1633,78 @@ static void qcom_pcie_icc_opp_update(struct qcom_pcie *pcie)
> +static int qcom_pcie_reset_root_port(struct pci_host_bridge *bridge,
> +				  struct pci_dev *pdev)
> +{

[ ... ]

> +	ret = dw_pcie_setup_rc(pp);
> +	if (ret)
> +		goto err_host_deinit;
> +
> +	/*
> +	 * Re-enable global IRQ events as the PARF_INT_ALL_MASK register is
> +	 * non-sticky.
> +	 */
> +	if (pcie->global_irq)
> +		writel_relaxed(PARF_INT_ALL_LINK_DOWN | PARF_INT_MSI_DEV_0_7,
> +				pcie->parf + PARF_INT_ALL_MASK);
> +
> +	qcom_pcie_start_link(pci);
> +	dw_pcie_wait_for_link(pci);
> +
> +	dev_dbg(dev, "Root Port reset completed\n");
> +
> +	return 0;

Does dw_pcie_wait_for_link() return an error code (such as -ETIMEDOUT) if the
link fails to retrain?

Since this return value is ignored and the function unconditionally returns 0,
the PCI core is falsely informed that the recovery was successful.

Could this prevent the PCI subsystem from properly disconnecting the downstream
devices if the link fails to come up, leaving zombie device structures that
cannot communicate?

> +
> +err_host_deinit:
> +	qcom_pcie_host_deinit(pp);
> +
> +	return ret;
> +}

If an error occurs earlier in this function and the code jumps to
err_host_deinit, qcom_pcie_host_deinit() will be called, which turns off the
PHY power, regulators, and PCIe clocks.

When this function returns the error code back to the PCI core, the PCI core
handles the reset failure by returning PCI_ERS_RESULT_DISCONNECT and
triggering device removal.

Since the PCI core and userspace tools might attempt to read the devices'
configuration space during unbind, and the clocks are now disabled, will these
MMIO accesses cause a synchronous external abort and a kernel panic?

Should the controller remain clocked on the error path so it can safely reject
config accesses with 0xFFFFFFFF?

> @@ -1650,6 +1742,27 @@ static void qcom_pcie_init_debugfs(struct qcom_pcie *pcie)
> +static irqreturn_t qcom_pcie_global_irq_thread(int irq, void *data)
> +{
> +	struct qcom_pcie *pcie = data;
> +	struct dw_pcie_rp *pp = &pcie->pci->pp;
> +	struct device *dev = pcie->pci->dev;
> +	struct pci_dev *port;
> +	unsigned long status = readl_relaxed(pcie->parf + PARF_INT_ALL_STATUS);
> +
> +	writel_relaxed(status, pcie->parf + PARF_INT_ALL_CLEAR);
> +

In qcom_pcie_reset_root_port() and qcom_pcie_probe(), PARF_INT_MSI_DEV_0_7
is unmasked in PARF_INT_ALL_MASK. By explicitly unmasking this bit alongside
PARF_INT_ALL_LINK_DOWN, does this cause MSI interrupts to be incorrectly
routed to this global IRQ handler?

The threaded handler reads the status and writes it to PARF_INT_ALL_CLEAR,
but ignores the MSI bits and unconditionally returns IRQ_HANDLED.

Could this lead to silent dropping of MSI interrupts if the Synopsys DesignWare
core relies on this status being preserved?

Also, for high-frequency interrupt sources, would every MSI unnecessarily wake
up qcom_pcie_global_irq_thread, causing excessive CPU overhead?

> +	if (test_and_clear_bit(INT_ALL_LINK_DOWN, &status)) {
> +		dev_dbg(dev, "Received Link down event\n");
> +		for_each_pci_bridge(port, pp->bridge->bus) {
> +			if (pci_pcie_type(port) == PCI_EXP_TYPE_ROOT_PORT)
> +				pci_host_handle_link_down(port);
> +		}
> +	}

Does iterating over the root bus devices using for_each_pci_bridge() in an
asynchronous threaded IRQ context require holding the pci_bus_sem read-write
semaphore?

Without locking, could this iteration race with concurrent PCI topology
changes like hotplug events, AER recoveries, or driver unbinds, potentially
leading to list corruption or a use-after-free if a device structure is
freed while the loop is executing?

Would it be safer to use pci_walk_bus() or explicitly acquire
down_read(&pci_bus_sem)?

> +
> +	return IRQ_HANDLED;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260518-pci-port-reset-v8-0-eb5a7d331dfc@oss.qualcomm.com?part=4

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 5/5] misc: pci_endpoint_test: Add AER error handlers
  2026-05-18 14:59 ` [PATCH v8 5/5] misc: pci_endpoint_test: Add AER error handlers Manivannan Sadhasivam via B4 Relay
@ 2026-05-18 17:55   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-18 17:55 UTC (permalink / raw)
  To: Manivannan Sadhasivam via B4 Relay; +Cc: linux-pci

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Race condition in probe: pci_save_state() is called after misc_register()
--

commit 1e7c4e54131311f06f05246675701727353fa564
Author: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>

misc: pci_endpoint_test: Add AER error handlers

This commit adds AER error handlers to the PCI endpoint test driver.
It saves the PCI configuration state during probe and restores it
during a slot reset to ensure AER recovery succeeds.

> diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
> index dbd017cabbb92..3e89bd48c196b 100644
> --- a/drivers/misc/pci_endpoint_test.c
> +++ b/drivers/misc/pci_endpoint_test.c
> @@ -1327,6 +1327,8 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
>  		goto err_kfree_name;
>  	}
>  
> +	pci_save_state(pdev);
> +

Could this introduce a race condition with userspace?

Since misc_register() was just called above, the device node is already
accessible to userspace. If a userspace program immediately opens the device
and issues an ioctl like PCITEST_SET_IRQTYPE, it will trigger
pci_alloc_irq_vectors().

This could alter the PCI configuration space (such as modifying the PCI_COMMAND
register to set up MSI/MSI-X) before pci_save_state() executes.

If this happens, pci_save_state() would capture the modified state as the
baseline. During an AER recovery, pci_restore_state() would then restore
this modified state instead of the pristine hardware configuration.

Should pci_save_state() be called before misc_register() to ensure the base
configuration is correctly snapshotted?

>  	return 0;
>  
>  err_kfree_name:
> @@ -1448,12 +1450,33 @@ static const struct pci_device_id pci_endpoint_test_tbl[] = {

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260518-pci-port-reset-v8-0-eb5a7d331dfc@oss.qualcomm.com?part=5

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-05-18 17:55 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-18 14:59 [PATCH v8 0/5] PCI: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam via B4 Relay
2026-05-18 14:59 ` [PATCH v8 1/5] PCI: dwc: ep: Clear MSI iATU mapping in dw_pcie_ep_cleanup() Manivannan Sadhasivam via B4 Relay
2026-05-18 16:04   ` sashiko-bot
2026-05-18 14:59 ` [PATCH v8 2/5] PCI/ERR: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam via B4 Relay
2026-05-18 16:27   ` sashiko-bot
2026-05-18 14:59 ` [PATCH v8 3/5] PCI: host-common: Add link down handling for Root Ports Manivannan Sadhasivam via B4 Relay
2026-05-18 17:05   ` sashiko-bot
2026-05-18 14:59 ` [PATCH v8 4/5] PCI: qcom: Add support for resetting the Root Port due to link down event Manivannan Sadhasivam via B4 Relay
2026-05-18 17:38   ` sashiko-bot
2026-05-18 14:59 ` [PATCH v8 5/5] misc: pci_endpoint_test: Add AER error handlers Manivannan Sadhasivam via B4 Relay
2026-05-18 17:55   ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox