[PATCH v4 0/5] PCI: Add support for resetting the slots in a platform specific way

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v4 0/5] PCI: Add support for resetting the slots in a platform specific way
@ 2025-05-08  7:10 Manivannan Sadhasivam
  2025-05-08  7:10 ` [PATCH v4 1/5] PCI/ERR: Remove misleading TODO regarding kernel panic Manivannan Sadhasivam
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Manivannan Sadhasivam @ 2025-05-08  7:10 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara
  Cc: dingwei, cassel, Lukas Wunner, Krishna Chaitanya Chundru,
	linuxppc-dev, linux-pci, linux-kernel, linux-arm-msm,
	linux-arm-kernel, linux-riscv, Manivannan Sadhasivam

Hi,

Currently, in the event of AER/DPC, PCI core will try to reset the slot and its
subordinate devices by invoking bridge control reset and FLR. But in some
cases like AER Fatal error, it might be necessary to reset the slots using the
PCI host bridge drivers in a platform specific way (as indicated by the TODO in
the pcie_do_recovery() function in drivers/pci/pcie/err.c). Otherwise, the PCI
link won't be recovered successfully.

So this series adds a new callback 'pci_host_bridge::reset_slot' for the host
bridge drivers to reset the slot when a fatal error happens.

Also, this series allows the host bridge drivers to handle PCI link down event
by resetting the slots and recovering the bus. This is accomplished by the
help of a new API 'pci_host_handle_link_down()'. Host bridge drivers are
expected to call this API (preferrably from a threaded IRQ handler) when a link
down event is detected. The API will reuse the pcie_do_recovery() function to
recover the link if AER support is enabled, otherwise it will directly call the
reset_slot() callback of the host bridge driver (if exists).

For reference, I've modified the pcie-qcom driver to call
pci_host_handle_link_down() after receiving LINK_DOWN global_irq event and
populated the 'pci_host_bridge::reset_slot()' callback to reset the controller
(there by slots). Since the Qcom PCIe controllers support only a single root
port (slot) per controller instance, reset_slot() callback is going to be
invoked only once. For multi root port controllers, this callback is supposed to
identify the slots using the supplied 'pci_dev' pointer and reset them.

NOTE
====

This series is a reworked version of the earlier series [1] that I submitted for
handling PCI link down event. In this series, I've made use of the AER helpers
to recover the link as it allows notifying the device drivers and also
allows saving/restoring the config space.

Testing
=======

This series is tested on Qcom RB5 and SA8775p Ride boards by triggering the link
down event manually by writing to LTSSM register. For the error recovery to
succeed (if AER is enabled), all the drivers in the bridge hierarchy should have
the 'err_handlers' populated. Otherwise, the link recovery will fail.

[1] https://lore.kernel.org/linux-pci/20250221172309.120009-1-manivannan.sadhasivam@linaro.org

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---
Changes in v4:
- Handled link down first in the irq handler
- Updated ICC & OPP bandwidth after link up in reset_slot() callback
- Link to v3: https://lore.kernel.org/r/20250417-pcie-reset-slot-v3-0-59a10811c962@linaro.org

Changes in v3:
- Made the pci-host-common driver as a common library for host controller
  drivers
- Moved the reset slot code to pci-host-common library
- Link to v2: https://lore.kernel.org/r/20250416-pcie-reset-slot-v2-0-efe76b278c10@linaro.org

Changes in v2:
- Moved calling reset_slot() callback from pcie_do_recovery() to pcibios_reset_secondary_bus()
- Link to v1: https://lore.kernel.org/r/20250404-pcie-reset-slot-v1-0-98952918bf90@linaro.org

---
Manivannan Sadhasivam (5):
      PCI/ERR: Remove misleading TODO regarding kernel panic
      PCI/ERR: Add support for resetting the slots in a platform specific way
      PCI: host-common: Make the driver as a common library for host controller drivers
      PCI: host-common: Add link down handling for host bridges
      PCI: qcom: Add support for resetting the slot due to link down event

 drivers/pci/controller/Kconfig                    |   8 +-
 drivers/pci/controller/dwc/Kconfig                |   1 +
 drivers/pci/controller/dwc/pcie-hisi.c            |   1 +
 drivers/pci/controller/dwc/pcie-qcom.c            | 112 ++++++++++++++++++++--
 drivers/pci/controller/pci-host-common.c          |  64 ++++++++++++-
 drivers/pci/controller/pci-host-common.h          |  17 ++++
 drivers/pci/controller/pci-host-generic.c         |   2 +
 drivers/pci/controller/pci-thunder-ecam.c         |   2 +
 drivers/pci/controller/pci-thunder-pem.c          |   1 +
 drivers/pci/controller/pcie-apple.c               |   2 +
 drivers/pci/controller/plda/pcie-microchip-host.c |   1 +
 drivers/pci/pci.c                                 |  13 +++
 drivers/pci/pcie/err.c                            |   7 +-
 include/linux/pci-ecam.h                          |   6 --
 include/linux/pci.h                               |   1 +
 15 files changed, 212 insertions(+), 26 deletions(-)
---
base-commit: 08733088b566b58283f0f12fb73f5db6a9a9de30
change-id: 20250404-pcie-reset-slot-730bfa71a202

Best regards,
-- 
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 1/5] PCI/ERR: Remove misleading TODO regarding kernel panic
  2025-05-08  7:10 [PATCH v4 0/5] PCI: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
@ 2025-05-08  7:10 ` Manivannan Sadhasivam
  2025-05-09  3:04   ` Wilfred Mallawa
                     ` (2 more replies)
  2025-05-08  7:10 ` [PATCH v4 2/5] PCI/ERR: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 28+ messages in thread
From: Manivannan Sadhasivam @ 2025-05-08  7:10 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara
  Cc: dingwei, cassel, Lukas Wunner, Krishna Chaitanya Chundru,
	linuxppc-dev, linux-pci, linux-kernel, linux-arm-msm,
	linux-arm-kernel, linux-riscv, Manivannan Sadhasivam

A PCI device is just another peripheral in a system. So failure to
recover it, must not result in a kernel panic. So remove the TODO which
is quite misleading.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---
 drivers/pci/pcie/err.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 31090770fffcc94e15ba6e89f649c6f84bfdf0d5..de6381c690f5c21f00021cdc7bde8d93a5c7db52 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -271,7 +271,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 
 	pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT);
 
-	/* TODO: Should kernel panic here? */
 	pci_info(bridge, "device recovery failed\n");
 
 	return status;

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 2/5] PCI/ERR: Add support for resetting the slots in a platform specific way
  2025-05-08  7:10 [PATCH v4 0/5] PCI: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
  2025-05-08  7:10 ` [PATCH v4 1/5] PCI/ERR: Remove misleading TODO regarding kernel panic Manivannan Sadhasivam
@ 2025-05-08  7:10 ` Manivannan Sadhasivam
  2025-05-08  7:18   ` Lukas Wunner
                     ` (3 more replies)
  2025-05-08  7:10 ` [PATCH v4 3/5] PCI: host-common: Make the driver as a common library for host controller drivers Manivannan Sadhasivam
                   ` (3 subsequent siblings)
  5 siblings, 4 replies; 28+ messages in thread
From: Manivannan Sadhasivam @ 2025-05-08  7:10 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara
  Cc: dingwei, cassel, Lukas Wunner, Krishna Chaitanya Chundru,
	linuxppc-dev, linux-pci, linux-kernel, linux-arm-msm,
	linux-arm-kernel, linux-riscv, Manivannan Sadhasivam

Some host bridge devices require resetting the slots in a platform specific
way to recover them from error conditions such as Fatal AER errors, Link
Down etc... So introduce pci_host_bridge::reset_slot callback and call it
from pcibios_reset_secondary_bus() if available.

The 'reset_slot' callback is responsible for resetting the given slot
referenced by the 'pci_dev' pointer in a platform specific way and bring it
back to the working state if possible. If any error occurs during the slot
reset operation, relevant errno should be returned.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---
 drivers/pci/pci.c      | 12 ++++++++++++
 drivers/pci/pcie/err.c |  5 -----
 include/linux/pci.h    |  1 +
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 4d7c9f64ea24ec754a135a2585c99489cfa641a9..13709bb898a967968540826a2b7ee8ade6b7e082 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4982,7 +4982,19 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
 
 void __weak pcibios_reset_secondary_bus(struct pci_dev *dev)
 {
+	struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
+	int ret;
+
+	if (host->reset_slot) {
+		ret = host->reset_slot(host, dev);
+		if (ret)
+			pci_err(dev, "failed to reset slot: %d\n", ret);
+
+		return;
+	}
+
 	pci_reset_secondary_bus(dev);
+
 }
 
 /**
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index de6381c690f5c21f00021cdc7bde8d93a5c7db52..b834fc0d705938540d3d7d3d8739770c09fe7cf1 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -234,11 +234,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	}
 
 	if (status == PCI_ERS_RESULT_NEED_RESET) {
-		/*
-		 * TODO: Should call platform-specific
-		 * functions to reset slot before calling
-		 * drivers' slot_reset callbacks?
-		 */
 		status = PCI_ERS_RESULT_RECOVERED;
 		pci_dbg(bridge, "broadcast slot_reset message\n");
 		pci_walk_bridge(bridge, report_slot_reset, &status);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0e8e3fd77e96713054388bdc82f439e51023c1bf..8d7d2a49b76cf64b4218b179cec495e0d69ddf6f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -599,6 +599,7 @@ struct pci_host_bridge {
 	void (*release_fn)(struct pci_host_bridge *);
 	int (*enable_device)(struct pci_host_bridge *bridge, struct pci_dev *dev);
 	void (*disable_device)(struct pci_host_bridge *bridge, struct pci_dev *dev);
+	int (*reset_slot)(struct pci_host_bridge *bridge, struct pci_dev *dev);
 	void		*release_data;
 	unsigned int	ignore_reset_delay:1;	/* For entire hierarchy */
 	unsigned int	no_ext_tags:1;		/* No Extended Tags */

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 3/5] PCI: host-common: Make the driver as a common library for host controller drivers
  2025-05-08  7:10 [PATCH v4 0/5] PCI: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
  2025-05-08  7:10 ` [PATCH v4 1/5] PCI/ERR: Remove misleading TODO regarding kernel panic Manivannan Sadhasivam
  2025-05-08  7:10 ` [PATCH v4 2/5] PCI/ERR: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
@ 2025-05-08  7:10 ` Manivannan Sadhasivam
  2025-05-08  7:10 ` [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges Manivannan Sadhasivam
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Manivannan Sadhasivam @ 2025-05-08  7:10 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara
  Cc: dingwei, cassel, Lukas Wunner, Krishna Chaitanya Chundru,
	linuxppc-dev, linux-pci, linux-kernel, linux-arm-msm,
	linux-arm-kernel, linux-riscv, Manivannan Sadhasivam

This common library will be used as a placeholder for helper functions
shared by the host controller drivers. This avoids placing the host
controller drivers specific helpers in drivers/pci/*.c, to avoid enlarging
the kernel Image on platforms that do not use host controller drivers at
all (like x86/ACPI platforms).

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---
 drivers/pci/controller/Kconfig                    |  8 ++++----
 drivers/pci/controller/dwc/pcie-hisi.c            |  1 +
 drivers/pci/controller/pci-host-common.c          |  6 ++++--
 drivers/pci/controller/pci-host-common.h          | 16 ++++++++++++++++
 drivers/pci/controller/pci-host-generic.c         |  2 ++
 drivers/pci/controller/pci-thunder-ecam.c         |  2 ++
 drivers/pci/controller/pci-thunder-pem.c          |  1 +
 drivers/pci/controller/pcie-apple.c               |  2 ++
 drivers/pci/controller/plda/pcie-microchip-host.c |  1 +
 include/linux/pci-ecam.h                          |  6 ------
 10 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
index 9800b768105402d6dd1ba4b134c2ec23da6e4201..9bb8bf669a807272777b6168d042f8fd7490aeec 100644
--- a/drivers/pci/controller/Kconfig
+++ b/drivers/pci/controller/Kconfig
@@ -3,6 +3,10 @@
 menu "PCI controller drivers"
 	depends on PCI
 
+config PCI_HOST_COMMON
+	tristate
+	select PCI_ECAM
+
 config PCI_AARDVARK
 	tristate "Aardvark PCIe controller"
 	depends on (ARCH_MVEBU && ARM64) || COMPILE_TEST
@@ -119,10 +123,6 @@ config PCI_FTPCI100
 	depends on OF
 	default ARCH_GEMINI
 
-config PCI_HOST_COMMON
-	tristate
-	select PCI_ECAM
-
 config PCI_HOST_GENERIC
 	tristate "Generic PCI host controller"
 	depends on OF
diff --git a/drivers/pci/controller/dwc/pcie-hisi.c b/drivers/pci/controller/dwc/pcie-hisi.c
index 8904b5b85ee589576afcb6c81bb4bd39ff960c15..3c17897e56fcb60ec08cf522ee1485f90a2f36a3 100644
--- a/drivers/pci/controller/dwc/pcie-hisi.c
+++ b/drivers/pci/controller/dwc/pcie-hisi.c
@@ -15,6 +15,7 @@
 #include <linux/pci-acpi.h>
 #include <linux/pci-ecam.h>
 #include "../../pci.h"
+#include "../pci-host-common.h"
 
 #if defined(CONFIG_PCI_HISI) || (defined(CONFIG_ACPI) && defined(CONFIG_PCI_QUIRKS))
 
diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c
index f441bfd6f96a8bde1c07fcf97d43d0693c424a27..f93bc7034e697250711833a5151f7ef177cd62a0 100644
--- a/drivers/pci/controller/pci-host-common.c
+++ b/drivers/pci/controller/pci-host-common.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Generic PCI host driver common code
+ * Common library for PCI host controller drivers
  *
  * Copyright (C) 2014 ARM Limited
  *
@@ -15,6 +15,8 @@
 #include <linux/pci-ecam.h>
 #include <linux/platform_device.h>
 
+#include "pci-host-common.h"
+
 static void gen_pci_unmap_cfg(void *ptr)
 {
 	pci_ecam_free((struct pci_config_window *)ptr);
@@ -94,5 +96,5 @@ void pci_host_common_remove(struct platform_device *pdev)
 }
 EXPORT_SYMBOL_GPL(pci_host_common_remove);
 
-MODULE_DESCRIPTION("Generic PCI host common driver");
+MODULE_DESCRIPTION("Common library for PCI host controller drivers");
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/pci/controller/pci-host-common.h b/drivers/pci/controller/pci-host-common.h
new file mode 100644
index 0000000000000000000000000000000000000000..d8be024ca68d43afb147fd9104d632b907277144
--- /dev/null
+++ b/drivers/pci/controller/pci-host-common.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Common library for PCI host controller drivers
+ *
+ * Copyright (C) 2014 ARM Limited
+ *
+ * Author: Will Deacon <will.deacon@arm.com>
+ */
+
+#ifndef _PCI_HOST_COMMON_H
+#define _PCI_HOST_COMMON_H
+
+int pci_host_common_probe(struct platform_device *pdev);
+void pci_host_common_remove(struct platform_device *pdev);
+
+#endif
diff --git a/drivers/pci/controller/pci-host-generic.c b/drivers/pci/controller/pci-host-generic.c
index 4051b9b61dace669422e5a6453cc9f58a081beb5..c1bc0d34348f44c9fdd549811f637fb50fe89c64 100644
--- a/drivers/pci/controller/pci-host-generic.c
+++ b/drivers/pci/controller/pci-host-generic.c
@@ -14,6 +14,8 @@
 #include <linux/pci-ecam.h>
 #include <linux/platform_device.h>
 
+#include "pci-host-common.h"
+
 static const struct pci_ecam_ops gen_pci_cfg_cam_bus_ops = {
 	.bus_shift	= 16,
 	.pci_ops	= {
diff --git a/drivers/pci/controller/pci-thunder-ecam.c b/drivers/pci/controller/pci-thunder-ecam.c
index 08161065a89c35a95714df935ef437dfc8845697..b5b4a958e6a22b21501cad45bb242a95a784efc1 100644
--- a/drivers/pci/controller/pci-thunder-ecam.c
+++ b/drivers/pci/controller/pci-thunder-ecam.c
@@ -11,6 +11,8 @@
 #include <linux/pci-ecam.h>
 #include <linux/platform_device.h>
 
+#include "pci-host-common.h"
+
 #if defined(CONFIG_PCI_HOST_THUNDER_ECAM) || (defined(CONFIG_ACPI) && defined(CONFIG_PCI_QUIRKS))
 
 static void set_val(u32 v, int where, int size, u32 *val)
diff --git a/drivers/pci/controller/pci-thunder-pem.c b/drivers/pci/controller/pci-thunder-pem.c
index f1bd5de67997cddac173723bc7f4ec20aaf20064..5fa037fb61dc356f3029d1b5cae632ae1da5bb9b 100644
--- a/drivers/pci/controller/pci-thunder-pem.c
+++ b/drivers/pci/controller/pci-thunder-pem.c
@@ -14,6 +14,7 @@
 #include <linux/platform_device.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include "../pci.h"
+#include "pci-host-common.h"
 
 #if defined(CONFIG_PCI_HOST_THUNDER_PEM) || (defined(CONFIG_ACPI) && defined(CONFIG_PCI_QUIRKS))
 
diff --git a/drivers/pci/controller/pcie-apple.c b/drivers/pci/controller/pcie-apple.c
index 18e11b9a7f46479348815c3f706319189e0a80b5..edd4c8c683c6a693401b47f5f056641c13ae89f8 100644
--- a/drivers/pci/controller/pcie-apple.c
+++ b/drivers/pci/controller/pcie-apple.c
@@ -29,6 +29,8 @@
 #include <linux/of_irq.h>
 #include <linux/pci-ecam.h>
 
+#include "pci-host-common.h"
+
 #define CORE_RC_PHYIF_CTL		0x00024
 #define   CORE_RC_PHYIF_CTL_RUN		BIT(0)
 #define CORE_RC_PHYIF_STAT		0x00028
diff --git a/drivers/pci/controller/plda/pcie-microchip-host.c b/drivers/pci/controller/plda/pcie-microchip-host.c
index 3fdfffdf027001bf88df8e1c2538587298228220..24bbf93b8051fa0d9027ce6983eae34cad81065e 100644
--- a/drivers/pci/controller/plda/pcie-microchip-host.c
+++ b/drivers/pci/controller/plda/pcie-microchip-host.c
@@ -23,6 +23,7 @@
 #include <linux/wordpart.h>
 
 #include "../../pci.h"
+#include "../pci-host-common.h"
 #include "pcie-plda.h"
 
 #define MC_MAX_NUM_INBOUND_WINDOWS		8
diff --git a/include/linux/pci-ecam.h b/include/linux/pci-ecam.h
index 3a10f8cfc3ad5c90585a8fc971be714011ed18fe..d930651473b4d0b406e657a24ede87e09517d091 100644
--- a/include/linux/pci-ecam.h
+++ b/include/linux/pci-ecam.h
@@ -93,10 +93,4 @@ extern const struct pci_ecam_ops al_pcie_ops;	/* Amazon Annapurna Labs PCIe */
 extern const struct pci_ecam_ops tegra194_pcie_ops; /* Tegra194 PCIe */
 extern const struct pci_ecam_ops loongson_pci_ecam_ops; /* Loongson PCIe */
 #endif
-
-#if IS_ENABLED(CONFIG_PCI_HOST_COMMON)
-/* for DT-based PCI controllers that support ECAM */
-int pci_host_common_probe(struct platform_device *pdev);
-void pci_host_common_remove(struct platform_device *pdev);
-#endif
 #endif

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges
  2025-05-08  7:10 [PATCH v4 0/5] PCI: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
                   ` (2 preceding siblings ...)
  2025-05-08  7:10 ` [PATCH v4 3/5] PCI: host-common: Make the driver as a common library for host controller drivers Manivannan Sadhasivam
@ 2025-05-08  7:10 ` Manivannan Sadhasivam
  2025-05-14  6:30   ` Krishna Chaitanya Chundru
                     ` (2 more replies)
  2025-05-08  7:10 ` [PATCH v4 5/5] PCI: qcom: Add support for resetting the slot due to link down event Manivannan Sadhasivam
  2025-05-14 16:32 ` [PATCH v4 0/5] PCI: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
  5 siblings, 3 replies; 28+ messages in thread
From: Manivannan Sadhasivam @ 2025-05-08  7:10 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara
  Cc: dingwei, cassel, Lukas Wunner, Krishna Chaitanya Chundru,
	linuxppc-dev, linux-pci, linux-kernel, linux-arm-msm,
	linux-arm-kernel, linux-riscv, Manivannan Sadhasivam

The PCI link, when down, needs to be recovered to bring it back. But that
cannot be done in a generic way as link recovery procedure is specific to
host bridges. So add a new API pci_host_handle_link_down() that could be
called by the host bridge drivers when the link goes down.

The API will iterate through all the slots and calls the pcie_do_recovery()
function with 'pci_channel_io_frozen' as the state. This will result in the
execution of the AER Fatal error handling code. Since the link down
recovery is pretty much the same as AER Fatal error handling,
pcie_do_recovery() helper is reused here. First the AER error_detected
callback will be triggered for the bridge and the downstream devices. Then,
pci_host_reset_slot() will be called for the slot, which will reset the
slot using 'reset_slot' callback to recover the link. Once that's done,
resume message will be broadcasted to the bridge and the downstream devices
indicating successful link recovery.

In case if the AER support is not enabled in the kernel, only
pci_bus_error_reset() will be called for each slots as there is no way we
could inform the drivers about link recovery.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---
 drivers/pci/controller/pci-host-common.c | 58 ++++++++++++++++++++++++++++++++
 drivers/pci/controller/pci-host-common.h |  1 +
 drivers/pci/pci.c                        |  1 +
 drivers/pci/pcie/err.c                   |  1 +
 4 files changed, 61 insertions(+)

diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c
index f93bc7034e697250711833a5151f7ef177cd62a0..f916f0a874a61ddfbfd99f96975c00fb66dd224c 100644
--- a/drivers/pci/controller/pci-host-common.c
+++ b/drivers/pci/controller/pci-host-common.c
@@ -12,9 +12,11 @@
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/of_pci.h>
+#include <linux/pci.h>
 #include <linux/pci-ecam.h>
 #include <linux/platform_device.h>
 
+#include "../pci.h"
 #include "pci-host-common.h"
 
 static void gen_pci_unmap_cfg(void *ptr)
@@ -96,5 +98,61 @@ void pci_host_common_remove(struct platform_device *pdev)
 }
 EXPORT_SYMBOL_GPL(pci_host_common_remove);
 
+#if IS_ENABLED(CONFIG_PCIEAER)
+static pci_ers_result_t pci_host_reset_slot(struct pci_dev *dev)
+{
+	int ret;
+
+	ret = pci_bus_error_reset(dev);
+	if (ret) {
+		pci_err(dev, "Failed to reset slot: %d\n", ret);
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	pci_info(dev, "Slot has been reset\n");
+
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void pci_host_recover_slots(struct pci_host_bridge *host)
+{
+	struct pci_bus *bus = host->bus;
+	struct pci_dev *dev;
+
+	for_each_pci_bridge(dev, bus) {
+		if (!pci_is_root_bus(bus))
+			continue;
+
+		pcie_do_recovery(dev, pci_channel_io_frozen,
+				 pci_host_reset_slot);
+	}
+}
+#else
+static void pci_host_recover_slots(struct pci_host_bridge *host)
+{
+	struct pci_bus *bus = host->bus;
+	struct pci_dev *dev;
+	int ret;
+
+	for_each_pci_bridge(dev, bus) {
+		if (!pci_is_root_bus(bus))
+			continue;
+
+		ret = pci_bus_error_reset(dev);
+		if (ret)
+			pci_err(dev, "Failed to reset slot: %d\n", ret);
+		else
+			pci_info(dev, "Slot has been reset\n");
+	}
+}
+#endif
+
+void pci_host_handle_link_down(struct pci_host_bridge *bridge)
+{
+	dev_info(&bridge->dev, "Recovering slots due to Link Down\n");
+	pci_host_recover_slots(bridge);
+}
+EXPORT_SYMBOL_GPL(pci_host_handle_link_down);
+
 MODULE_DESCRIPTION("Common library for PCI host controller drivers");
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/pci/controller/pci-host-common.h b/drivers/pci/controller/pci-host-common.h
index d8be024ca68d43afb147fd9104d632b907277144..904698c1a2695888a0fc9c2fac360e456116eb1d 100644
--- a/drivers/pci/controller/pci-host-common.h
+++ b/drivers/pci/controller/pci-host-common.h
@@ -12,5 +12,6 @@
 
 int pci_host_common_probe(struct platform_device *pdev);
 void pci_host_common_remove(struct platform_device *pdev);
+void pci_host_handle_link_down(struct pci_host_bridge *bridge);
 
 #endif
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 13709bb898a967968540826a2b7ee8ade6b7e082..4d396bbab4a8f33cae0ffe8982da120a9f1d92c9 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5781,6 +5781,7 @@ int pci_bus_error_reset(struct pci_dev *bridge)
 	mutex_unlock(&pci_slot_mutex);
 	return pci_bus_reset(bridge->subordinate, PCI_RESET_DO_RESET);
 }
+EXPORT_SYMBOL_GPL(pci_bus_error_reset);
 
 /**
  * pci_probe_reset_bus - probe whether a PCI bus can be reset
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index b834fc0d705938540d3d7d3d8739770c09fe7cf1..3e3084bb7cb7fa06b526e6fab60e77927aba0ad0 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -270,3 +270,4 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 
 	return status;
 }
+EXPORT_SYMBOL_GPL(pcie_do_recovery);

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 5/5] PCI: qcom: Add support for resetting the slot due to link down event
  2025-05-08  7:10 [PATCH v4 0/5] PCI: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
                   ` (3 preceding siblings ...)
  2025-05-08  7:10 ` [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges Manivannan Sadhasivam
@ 2025-05-08  7:10 ` Manivannan Sadhasivam
  2025-05-14  6:22   ` Krishna Chaitanya Chundru
  2025-05-14 16:32 ` [PATCH v4 0/5] PCI: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
  5 siblings, 1 reply; 28+ messages in thread
From: Manivannan Sadhasivam @ 2025-05-08  7:10 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara
  Cc: dingwei, cassel, Lukas Wunner, Krishna Chaitanya Chundru,
	linuxppc-dev, linux-pci, linux-kernel, linux-arm-msm,
	linux-arm-kernel, linux-riscv, Manivannan Sadhasivam

The PCIe link can go down under circumstances such as the device firmware
crash, link instability, etc... When that happens, the PCIe slot needs to
be reset to make it operational again. Currently, the driver is not
handling the link down event, due to which the users have to restart the
machine to make PCIe link operational again. So fix it by detecting the
link down event and resetting the slot.

Since the Qcom PCIe controllers report the link down event through the
'global' IRQ, enable the link down event by setting PARF_INT_ALL_LINK_DOWN
bit in PARF_INT_ALL_MASK register.

Then in the case of the event, call pci_host_handle_link_down() API
in the handler to let the PCI core handle the link down condition. Note
that both link up and link down events could be set at a time when the
handler runs. So always handle link down first.

The API will internally call, 'pci_host_bridge::reset_slot()' callback to
reset the slot in a platform specific way. So implement the callback to
reset the slot by first resetting the PCIe core, followed by reinitializing
the resources and then finally starting the link again.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---
 drivers/pci/controller/dwc/Kconfig     |   1 +
 drivers/pci/controller/dwc/pcie-qcom.c | 112 ++++++++++++++++++++++++++++++---
 2 files changed, 105 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
index d9f0386396edf66ad0e514a0f545ed24d89fcb6c..ce04ee6fbd99cbcce5d2f3a75ebd72a17070b7b7 100644
--- a/drivers/pci/controller/dwc/Kconfig
+++ b/drivers/pci/controller/dwc/Kconfig
@@ -296,6 +296,7 @@ config PCIE_QCOM
 	select PCIE_DW_HOST
 	select CRC8
 	select PCIE_QCOM_COMMON
+	select PCI_HOST_COMMON
 	help
 	  Say Y here to enable PCIe controller support on Qualcomm SoCs. The
 	  PCIe controller uses the DesignWare core plus Qualcomm-specific
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index dc98ae63362db0422384b1879a2b9a7dc564d091..e577619d0f8ceddf0955139ae6b939842f8cb7be 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -34,6 +34,7 @@
 #include <linux/units.h>
 
 #include "../../pci.h"
+#include "../pci-host-common.h"
 #include "pcie-designware.h"
 #include "pcie-qcom-common.h"
 
@@ -55,6 +56,7 @@
 #define PARF_INT_ALL_STATUS			0x224
 #define PARF_INT_ALL_CLEAR			0x228
 #define PARF_INT_ALL_MASK			0x22c
+#define PARF_STATUS				0x230
 #define PARF_SID_OFFSET				0x234
 #define PARF_BDF_TRANSLATE_CFG			0x24c
 #define PARF_DBI_BASE_ADDR_V2			0x350
@@ -130,9 +132,14 @@
 
 /* PARF_LTSSM register fields */
 #define LTSSM_EN				BIT(8)
+#define SW_CLEAR_FLUSH_MODE			BIT(10)
+#define FLUSH_MODE				BIT(11)
 
 /* PARF_INT_ALL_{STATUS/CLEAR/MASK} register fields */
-#define PARF_INT_ALL_LINK_UP			BIT(13)
+#define INT_ALL_LINK_DOWN			1
+#define INT_ALL_LINK_UP				13
+#define PARF_INT_ALL_LINK_DOWN			BIT(INT_ALL_LINK_DOWN)
+#define PARF_INT_ALL_LINK_UP			BIT(INT_ALL_LINK_UP)
 #define PARF_INT_MSI_DEV_0_7			GENMASK(30, 23)
 
 /* PARF_NO_SNOOP_OVERRIDE register fields */
@@ -145,6 +152,9 @@
 /* PARF_BDF_TO_SID_CFG fields */
 #define BDF_TO_SID_BYPASS			BIT(0)
 
+/* PARF_STATUS fields */
+#define FLUSH_COMPLETED				BIT(8)
+
 /* ELBI_SYS_CTRL register fields */
 #define ELBI_SYS_CTRL_LT_ENABLE			BIT(0)
 
@@ -169,6 +179,7 @@
 						PCIE_CAP_SLOT_POWER_LIMIT_SCALE)
 
 #define PERST_DELAY_US				1000
+#define FLUSH_TIMEOUT_US			100
 
 #define QCOM_PCIE_CRC8_POLYNOMIAL		(BIT(2) | BIT(1) | BIT(0))
 
@@ -274,11 +285,14 @@ struct qcom_pcie {
 	struct icc_path *icc_cpu;
 	const struct qcom_pcie_cfg *cfg;
 	struct dentry *debugfs;
+	int global_irq;
 	bool suspended;
 	bool use_pm_opp;
 };
 
 #define to_qcom_pcie(x)		dev_get_drvdata((x)->dev)
+static int qcom_pcie_reset_slot(struct pci_host_bridge *bridge,
+				  struct pci_dev *pdev);
 
 static void qcom_ep_reset_assert(struct qcom_pcie *pcie)
 {
@@ -1263,6 +1277,8 @@ static int qcom_pcie_host_init(struct dw_pcie_rp *pp)
 			goto err_assert_reset;
 	}
 
+	pp->bridge->reset_slot = qcom_pcie_reset_slot;
+
 	return 0;
 
 err_assert_reset:
@@ -1517,6 +1533,74 @@ static void qcom_pcie_icc_opp_update(struct qcom_pcie *pcie)
 	}
 }
 
+static int qcom_pcie_reset_slot(struct pci_host_bridge *bridge,
+				  struct pci_dev *pdev)
+{
+	struct pci_bus *bus = bridge->bus;
+	struct dw_pcie_rp *pp = bus->sysdata;
+	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+	struct qcom_pcie *pcie = to_qcom_pcie(pci);
+	struct device *dev = pcie->pci->dev;
+	u32 val;
+	int ret;
+
+	/* Wait for the pending transactions to be completed */
+	ret = readl_relaxed_poll_timeout(pcie->parf + PARF_STATUS, val,
+					 val & FLUSH_COMPLETED, 10,
+					 FLUSH_TIMEOUT_US);
+	if (ret) {
+		dev_err(dev, "Flush completion failed: %d\n", ret);
+		goto err_host_deinit;
+	}
+
+	/* Clear the FLUSH_MODE to allow the core to be reset */
+	val = readl(pcie->parf + PARF_LTSSM);
+	val |= SW_CLEAR_FLUSH_MODE;
+	writel(val, pcie->parf + PARF_LTSSM);
+
+	/* Wait for the FLUSH_MODE to clear */
+	ret = readl_relaxed_poll_timeout(pcie->parf + PARF_LTSSM, val,
+					 !(val & FLUSH_MODE), 10,
+					 FLUSH_TIMEOUT_US);
+	if (ret) {
+		dev_err(dev, "Flush mode clear failed: %d\n", ret);
+		goto err_host_deinit;
+	}
+
+	qcom_pcie_host_deinit(pp);
+
+	ret = qcom_pcie_host_init(pp);
+	if (ret) {
+		dev_err(dev, "Host init failed\n");
+		return ret;
+	}
+
+	ret = dw_pcie_setup_rc(pp);
+	if (ret)
+		goto err_host_deinit;
+
+	/*
+	 * Re-enable global IRQ events as the PARF_INT_ALL_MASK register is
+	 * non-sticky.
+	 */
+	if (pcie->global_irq)
+		writel_relaxed(PARF_INT_ALL_LINK_UP | PARF_INT_ALL_LINK_DOWN |
+			       PARF_INT_MSI_DEV_0_7, pcie->parf + PARF_INT_ALL_MASK);
+
+	qcom_pcie_start_link(pci);
+	if (!dw_pcie_wait_for_link(pci))
+		qcom_pcie_icc_opp_update(pcie);
+
+	dev_dbg(dev, "Slot reset completed\n");
+
+	return 0;
+
+err_host_deinit:
+	qcom_pcie_host_deinit(pp);
+
+	return ret;
+}
+
 static int qcom_pcie_link_transition_count(struct seq_file *s, void *data)
 {
 	struct qcom_pcie *pcie = (struct qcom_pcie *)dev_get_drvdata(s->private);
@@ -1559,11 +1643,20 @@ static irqreturn_t qcom_pcie_global_irq_thread(int irq, void *data)
 	struct qcom_pcie *pcie = data;
 	struct dw_pcie_rp *pp = &pcie->pci->pp;
 	struct device *dev = pcie->pci->dev;
-	u32 status = readl_relaxed(pcie->parf + PARF_INT_ALL_STATUS);
+	unsigned long status = readl_relaxed(pcie->parf + PARF_INT_ALL_STATUS);
 
 	writel_relaxed(status, pcie->parf + PARF_INT_ALL_CLEAR);
 
-	if (FIELD_GET(PARF_INT_ALL_LINK_UP, status)) {
+	/*
+	 * It is possible that both Link Up and Link Down events might have
+	 * happended. So always handle Link Down first.
+	 */
+	if (test_and_clear_bit(INT_ALL_LINK_DOWN, &status)) {
+		dev_dbg(dev, "Received Link down event\n");
+		pci_host_handle_link_down(pp->bridge);
+	}
+
+	if (test_and_clear_bit(INT_ALL_LINK_UP, &status)) {
 		dev_dbg(dev, "Received Link up event. Starting enumeration!\n");
 		/* Rescan the bus to enumerate endpoint devices */
 		pci_lock_rescan_remove();
@@ -1571,11 +1664,12 @@ static irqreturn_t qcom_pcie_global_irq_thread(int irq, void *data)
 		pci_unlock_rescan_remove();
 
 		qcom_pcie_icc_opp_update(pcie);
-	} else {
-		dev_WARN_ONCE(dev, 1, "Received unknown event. INT_STATUS: 0x%08x\n",
-			      status);
 	}
 
+	if (status)
+		dev_WARN_ONCE(dev, 1, "Received unknown event. INT_STATUS: 0x%08x\n",
+			      (u32) status);
+
 	return IRQ_HANDLED;
 }
 
@@ -1732,8 +1826,10 @@ static int qcom_pcie_probe(struct platform_device *pdev)
 			goto err_host_deinit;
 		}
 
-		writel_relaxed(PARF_INT_ALL_LINK_UP | PARF_INT_MSI_DEV_0_7,
-			       pcie->parf + PARF_INT_ALL_MASK);
+		writel_relaxed(PARF_INT_ALL_LINK_UP | PARF_INT_ALL_LINK_DOWN |
+			       PARF_INT_MSI_DEV_0_7, pcie->parf + PARF_INT_ALL_MASK);
+
+		pcie->global_irq = irq;
 	}
 
 	qcom_pcie_icc_opp_update(pcie);

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/5] PCI/ERR: Add support for resetting the slots in a platform specific way
  2025-05-08  7:10 ` [PATCH v4 2/5] PCI/ERR: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
@ 2025-05-08  7:18   ` Lukas Wunner
  2025-05-09  2:59   ` Wilfred Mallawa
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: Lukas Wunner @ 2025-05-08  7:18 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczy??ski, Rob Herring, Zhou Wang,
	Will Deacon, Robert Richter, Alyssa Rosenzweig, Marc Zyngier,
	Conor Dooley, Daire McNamara, dingwei, cassel,
	Krishna Chaitanya Chundru, linuxppc-dev, linux-pci, linux-kernel,
	linux-arm-msm, linux-arm-kernel, linux-riscv

On Thu, May 08, 2025 at 12:40:31PM +0530, Manivannan Sadhasivam wrote:
> Some host bridge devices require resetting the slots in a platform specific
> way to recover them from error conditions such as Fatal AER errors, Link
> Down etc... So introduce pci_host_bridge::reset_slot callback and call it
> from pcibios_reset_secondary_bus() if available.
> 
> The 'reset_slot' callback is responsible for resetting the given slot
> referenced by the 'pci_dev' pointer in a platform specific way and bring it
> back to the working state if possible. If any error occurs during the slot
> reset operation, relevant errno should be returned.
> 
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

Reviewed-by: Lukas Wunner <lukas@wunner.de>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/5] PCI/ERR: Add support for resetting the slots in a platform specific way
  2025-05-08  7:10 ` [PATCH v4 2/5] PCI/ERR: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
  2025-05-08  7:18   ` Lukas Wunner
@ 2025-05-09  2:59   ` Wilfred Mallawa
  2025-05-14  6:33   ` Krishna Chaitanya Chundru
  2025-05-14 16:42   ` Sathyanarayanan Kuppuswamy
  3 siblings, 0 replies; 28+ messages in thread
From: Wilfred Mallawa @ 2025-05-09  2:59 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Mahesh J Salgaonkar, Oliver O'Halloran,
	Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Zhou Wang, Will Deacon, Robert Richter,
	Alyssa Rosenzweig, Marc Zyngier, Conor Dooley, Daire McNamara
  Cc: dingwei, cassel, Lukas Wunner, Krishna Chaitanya Chundru,
	linuxppc-dev, linux-pci, linux-kernel, linux-arm-msm,
	linux-arm-kernel, linux-riscv

On Thu, 2025-05-08 at 12:40 +0530, Manivannan Sadhasivam wrote:
> Some host bridge devices require resetting the slots in a platform
> specific
> way to recover them from error conditions such as Fatal AER errors,
> Link
> Down etc... So introduce pci_host_bridge::reset_slot callback and
> call it
> from pcibios_reset_secondary_bus() if available.
> 
> The 'reset_slot' callback is responsible for resetting the given slot
> referenced by the 'pci_dev' pointer in a platform specific way and
> bring it
> back to the working state if possible. If any error occurs during the
> slot
> reset operation, relevant errno should be returned.
> 
> Signed-off-by: Manivannan Sadhasivam
> <manivannan.sadhasivam@linaro.org>
> 
Hey Manivannan,

I've been testing this by adding support for the reset_slot() callback
in the dw-rockchip driver. Which has now fixed issues with sysfs issued
bus resets to endpoints. So feel free to use:

Tested-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>

Regards,
Wilfred


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 1/5] PCI/ERR: Remove misleading TODO regarding kernel panic
  2025-05-08  7:10 ` [PATCH v4 1/5] PCI/ERR: Remove misleading TODO regarding kernel panic Manivannan Sadhasivam
@ 2025-05-09  3:04   ` Wilfred Mallawa
  2025-05-09  6:11   ` Ethan Zhao
  2025-05-14 16:39   ` Sathyanarayanan Kuppuswamy
  2 siblings, 0 replies; 28+ messages in thread
From: Wilfred Mallawa @ 2025-05-09  3:04 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Mahesh J Salgaonkar, Oliver O'Halloran,
	Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Zhou Wang, Will Deacon, Robert Richter,
	Alyssa Rosenzweig, Marc Zyngier, Conor Dooley, Daire McNamara
  Cc: dingwei, cassel, Lukas Wunner, Krishna Chaitanya Chundru,
	linuxppc-dev, linux-pci, linux-kernel, linux-arm-msm,
	linux-arm-kernel, linux-riscv

On Thu, 2025-05-08 at 12:40 +0530, Manivannan Sadhasivam wrote:
> A PCI device is just another peripheral in a system. So failure to
> recover it, must not result in a kernel panic. So remove the TODO
> which
> is quite misleading.
> 
> Signed-off-by: Manivannan Sadhasivam
> <manivannan.sadhasivam@linaro.org>
Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 1/5] PCI/ERR: Remove misleading TODO regarding kernel panic
  2025-05-08  7:10 ` [PATCH v4 1/5] PCI/ERR: Remove misleading TODO regarding kernel panic Manivannan Sadhasivam
  2025-05-09  3:04   ` Wilfred Mallawa
@ 2025-05-09  6:11   ` Ethan Zhao
  2025-05-09  6:51     ` Manivannan Sadhasivam
  2025-05-14 16:39   ` Sathyanarayanan Kuppuswamy
  2 siblings, 1 reply; 28+ messages in thread
From: Ethan Zhao @ 2025-05-09  6:11 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Mahesh J Salgaonkar, Oliver O'Halloran,
	Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Zhou Wang, Will Deacon, Robert Richter,
	Alyssa Rosenzweig, Marc Zyngier, Conor Dooley, Daire McNamara
  Cc: dingwei, cassel, Lukas Wunner, Krishna Chaitanya Chundru,
	linuxppc-dev, linux-pci, linux-kernel, linux-arm-msm,
	linux-arm-kernel, linux-riscv



On 5/8/2025 3:10 PM, Manivannan Sadhasivam wrote:
> A PCI device is just another peripheral in a system. So failure to
> recover it, must not result in a kernel panic. So remove the TODO which
> is quite misleading.
> 
Could you explain what the result would be if A PCI device failed to
recovery from FATAL/NON_FATAL aer error or DPC event ? what else
better choice we have as next step ? or just saying "failed" then
go ahead ?

Thanks,
Ethan
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---
>   drivers/pci/pcie/err.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 31090770fffcc94e15ba6e89f649c6f84bfdf0d5..de6381c690f5c21f00021cdc7bde8d93a5c7db52 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -271,7 +271,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   
>   	pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT);
>   
> -	/* TODO: Should kernel panic here? */
>   	pci_info(bridge, "device recovery failed\n");
>   
>   	return status;
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 1/5] PCI/ERR: Remove misleading TODO regarding kernel panic
  2025-05-09  6:11   ` Ethan Zhao
@ 2025-05-09  6:51     ` Manivannan Sadhasivam
  0 siblings, 0 replies; 28+ messages in thread
From: Manivannan Sadhasivam @ 2025-05-09  6:51 UTC (permalink / raw)
  To: Ethan Zhao
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara, dingwei, cassel,
	Lukas Wunner, Krishna Chaitanya Chundru, linuxppc-dev, linux-pci,
	linux-kernel, linux-arm-msm, linux-arm-kernel, linux-riscv

On Fri, May 09, 2025 at 02:11:00PM +0800, Ethan Zhao wrote:
> 
> 
> On 5/8/2025 3:10 PM, Manivannan Sadhasivam wrote:
> > A PCI device is just another peripheral in a system. So failure to
> > recover it, must not result in a kernel panic. So remove the TODO which
> > is quite misleading.
> > 
> Could you explain what the result would be if A PCI device failed to
> recovery from FATAL/NON_FATAL aer error or DPC event ? what else
> better choice we have as next step ? or just saying "failed" then
> go ahead ?
> 

If the recovery is not possible (with device,bus,host reset), then there is
nothing could be done.

- Mani

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 5/5] PCI: qcom: Add support for resetting the slot due to link down event
  2025-05-08  7:10 ` [PATCH v4 5/5] PCI: qcom: Add support for resetting the slot due to link down event Manivannan Sadhasivam
@ 2025-05-14  6:22   ` Krishna Chaitanya Chundru
  2025-05-14 16:35     ` Manivannan Sadhasivam
  0 siblings, 1 reply; 28+ messages in thread
From: Krishna Chaitanya Chundru @ 2025-05-14  6:22 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Mahesh J Salgaonkar, Oliver O'Halloran,
	Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Zhou Wang, Will Deacon, Robert Richter,
	Alyssa Rosenzweig, Marc Zyngier, Conor Dooley, Daire McNamara
  Cc: dingwei, cassel, Lukas Wunner, linuxppc-dev, linux-pci,
	linux-kernel, linux-arm-msm, linux-arm-kernel, linux-riscv



On 5/8/2025 12:40 PM, Manivannan Sadhasivam wrote:
> The PCIe link can go down under circumstances such as the device firmware
> crash, link instability, etc... When that happens, the PCIe slot needs to
> be reset to make it operational again. Currently, the driver is not
> handling the link down event, due to which the users have to restart the
> machine to make PCIe link operational again. So fix it by detecting the
> link down event and resetting the slot.
> 
> Since the Qcom PCIe controllers report the link down event through the
> 'global' IRQ, enable the link down event by setting PARF_INT_ALL_LINK_DOWN
> bit in PARF_INT_ALL_MASK register.
> 
> Then in the case of the event, call pci_host_handle_link_down() API
> in the handler to let the PCI core handle the link down condition. Note
> that both link up and link down events could be set at a time when the
> handler runs. So always handle link down first.
> 
> The API will internally call, 'pci_host_bridge::reset_slot()' callback to
> reset the slot in a platform specific way. So implement the callback to
> reset the slot by first resetting the PCIe core, followed by reinitializing
> the resources and then finally starting the link again.
> 
Only one comment see below.
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
> ---
>   drivers/pci/controller/dwc/Kconfig     |   1 +
>   drivers/pci/controller/dwc/pcie-qcom.c | 112 ++++++++++++++++++++++++++++++---
>   2 files changed, 105 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
> index d9f0386396edf66ad0e514a0f545ed24d89fcb6c..ce04ee6fbd99cbcce5d2f3a75ebd72a17070b7b7 100644
> --- a/drivers/pci/controller/dwc/Kconfig
> +++ b/drivers/pci/controller/dwc/Kconfig
> @@ -296,6 +296,7 @@ config PCIE_QCOM
>   	select PCIE_DW_HOST
>   	select CRC8
>   	select PCIE_QCOM_COMMON
> +	select PCI_HOST_COMMON
>   	help
>   	  Say Y here to enable PCIe controller support on Qualcomm SoCs. The
>   	  PCIe controller uses the DesignWare core plus Qualcomm-specific
> diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
> index dc98ae63362db0422384b1879a2b9a7dc564d091..e577619d0f8ceddf0955139ae6b939842f8cb7be 100644
> --- a/drivers/pci/controller/dwc/pcie-qcom.c
> +++ b/drivers/pci/controller/dwc/pcie-qcom.c
> @@ -34,6 +34,7 @@
>   #include <linux/units.h>
>   
>   #include "../../pci.h"
> +#include "../pci-host-common.h"
>   #include "pcie-designware.h"
>   #include "pcie-qcom-common.h"
>   
> @@ -55,6 +56,7 @@
>   #define PARF_INT_ALL_STATUS			0x224
>   #define PARF_INT_ALL_CLEAR			0x228
>   #define PARF_INT_ALL_MASK			0x22c
> +#define PARF_STATUS				0x230
>   #define PARF_SID_OFFSET				0x234
>   #define PARF_BDF_TRANSLATE_CFG			0x24c
>   #define PARF_DBI_BASE_ADDR_V2			0x350
> @@ -130,9 +132,14 @@
>   
>   /* PARF_LTSSM register fields */
>   #define LTSSM_EN				BIT(8)
> +#define SW_CLEAR_FLUSH_MODE			BIT(10)
> +#define FLUSH_MODE				BIT(11)
>   
>   /* PARF_INT_ALL_{STATUS/CLEAR/MASK} register fields */
> -#define PARF_INT_ALL_LINK_UP			BIT(13)
> +#define INT_ALL_LINK_DOWN			1
> +#define INT_ALL_LINK_UP				13
> +#define PARF_INT_ALL_LINK_DOWN			BIT(INT_ALL_LINK_DOWN)
> +#define PARF_INT_ALL_LINK_UP			BIT(INT_ALL_LINK_UP)
>   #define PARF_INT_MSI_DEV_0_7			GENMASK(30, 23)
>   
>   /* PARF_NO_SNOOP_OVERRIDE register fields */
> @@ -145,6 +152,9 @@
>   /* PARF_BDF_TO_SID_CFG fields */
>   #define BDF_TO_SID_BYPASS			BIT(0)
>   
> +/* PARF_STATUS fields */
> +#define FLUSH_COMPLETED				BIT(8)
> +
>   /* ELBI_SYS_CTRL register fields */
>   #define ELBI_SYS_CTRL_LT_ENABLE			BIT(0)
>   
> @@ -169,6 +179,7 @@
>   						PCIE_CAP_SLOT_POWER_LIMIT_SCALE)
>   
>   #define PERST_DELAY_US				1000
> +#define FLUSH_TIMEOUT_US			100
>   
>   #define QCOM_PCIE_CRC8_POLYNOMIAL		(BIT(2) | BIT(1) | BIT(0))
>   
> @@ -274,11 +285,14 @@ struct qcom_pcie {
>   	struct icc_path *icc_cpu;
>   	const struct qcom_pcie_cfg *cfg;
>   	struct dentry *debugfs;
> +	int global_irq;
>   	bool suspended;
>   	bool use_pm_opp;
>   };
>   
>   #define to_qcom_pcie(x)		dev_get_drvdata((x)->dev)
> +static int qcom_pcie_reset_slot(struct pci_host_bridge *bridge,
> +				  struct pci_dev *pdev);
>   
>   static void qcom_ep_reset_assert(struct qcom_pcie *pcie)
>   {
> @@ -1263,6 +1277,8 @@ static int qcom_pcie_host_init(struct dw_pcie_rp *pp)
>   			goto err_assert_reset;
>   	}
>   
> +	pp->bridge->reset_slot = qcom_pcie_reset_slot;
> +
>   	return 0;
>   
>   err_assert_reset:
> @@ -1517,6 +1533,74 @@ static void qcom_pcie_icc_opp_update(struct qcom_pcie *pcie)
>   	}
>   }
>   
> +static int qcom_pcie_reset_slot(struct pci_host_bridge *bridge,
> +				  struct pci_dev *pdev)
> +{
> +	struct pci_bus *bus = bridge->bus;
> +	struct dw_pcie_rp *pp = bus->sysdata;
> +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> +	struct qcom_pcie *pcie = to_qcom_pcie(pci);
> +	struct device *dev = pcie->pci->dev;
> +	u32 val;
> +	int ret;
> +
> +	/* Wait for the pending transactions to be completed */
> +	ret = readl_relaxed_poll_timeout(pcie->parf + PARF_STATUS, val,
> +					 val & FLUSH_COMPLETED, 10,
> +					 FLUSH_TIMEOUT_US);
> +	if (ret) {
> +		dev_err(dev, "Flush completion failed: %d\n", ret);
> +		goto err_host_deinit;
> +	}
> +
> +	/* Clear the FLUSH_MODE to allow the core to be reset */
> +	val = readl(pcie->parf + PARF_LTSSM);
> +	val |= SW_CLEAR_FLUSH_MODE;
> +	writel(val, pcie->parf + PARF_LTSSM);
> +
> +	/* Wait for the FLUSH_MODE to clear */
> +	ret = readl_relaxed_poll_timeout(pcie->parf + PARF_LTSSM, val,
> +					 !(val & FLUSH_MODE), 10,
> +					 FLUSH_TIMEOUT_US);
> +	if (ret) {
> +		dev_err(dev, "Flush mode clear failed: %d\n", ret);
> +		goto err_host_deinit;
> +	}
> +
> +	qcom_pcie_host_deinit(pp);
> +
> +	ret = qcom_pcie_host_init(pp);
> +	if (ret) {
> +		dev_err(dev, "Host init failed\n");
> +		return ret;
> +	}
> +
> +	ret = dw_pcie_setup_rc(pp);
> +	if (ret)
> +		goto err_host_deinit;
> +
> +	/*
> +	 * Re-enable global IRQ events as the PARF_INT_ALL_MASK register is
> +	 * non-sticky.
> +	 */
> +	if (pcie->global_irq)
> +		writel_relaxed(PARF_INT_ALL_LINK_UP | PARF_INT_ALL_LINK_DOWN |
> +			       PARF_INT_MSI_DEV_0_7, pcie->parf + PARF_INT_ALL_MASK);
> +
> +	qcom_pcie_start_link(pci);
> +	if (!dw_pcie_wait_for_link(pci))
> +		qcom_pcie_icc_opp_update(pcie);
This icc opp update can we removed as this can updated from the global
IRQ.

- Krishna Chaitanya.
> +
> +	dev_dbg(dev, "Slot reset completed\n");
> +
> +	return 0;
> +
> +err_host_deinit:
> +	qcom_pcie_host_deinit(pp);
> +
> +	return ret;
> +}
> +
>   static int qcom_pcie_link_transition_count(struct seq_file *s, void *data)
>   {
>   	struct qcom_pcie *pcie = (struct qcom_pcie *)dev_get_drvdata(s->private);
> @@ -1559,11 +1643,20 @@ static irqreturn_t qcom_pcie_global_irq_thread(int irq, void *data)
>   	struct qcom_pcie *pcie = data;
>   	struct dw_pcie_rp *pp = &pcie->pci->pp;
>   	struct device *dev = pcie->pci->dev;
> -	u32 status = readl_relaxed(pcie->parf + PARF_INT_ALL_STATUS);
> +	unsigned long status = readl_relaxed(pcie->parf + PARF_INT_ALL_STATUS);
>   
>   	writel_relaxed(status, pcie->parf + PARF_INT_ALL_CLEAR);
>   
> -	if (FIELD_GET(PARF_INT_ALL_LINK_UP, status)) {
> +	/*
> +	 * It is possible that both Link Up and Link Down events might have
> +	 * happended. So always handle Link Down first.
> +	 */
> +	if (test_and_clear_bit(INT_ALL_LINK_DOWN, &status)) {
> +		dev_dbg(dev, "Received Link down event\n");
> +		pci_host_handle_link_down(pp->bridge);
> +	}
> +
> +	if (test_and_clear_bit(INT_ALL_LINK_UP, &status)) {
>   		dev_dbg(dev, "Received Link up event. Starting enumeration!\n");
>   		/* Rescan the bus to enumerate endpoint devices */
>   		pci_lock_rescan_remove();
> @@ -1571,11 +1664,12 @@ static irqreturn_t qcom_pcie_global_irq_thread(int irq, void *data)
>   		pci_unlock_rescan_remove();
>   
>   		qcom_pcie_icc_opp_update(pcie);
> -	} else {
> -		dev_WARN_ONCE(dev, 1, "Received unknown event. INT_STATUS: 0x%08x\n",
> -			      status);
>   	}
>   
> +	if (status)
> +		dev_WARN_ONCE(dev, 1, "Received unknown event. INT_STATUS: 0x%08x\n",
> +			      (u32) status);
> +
>   	return IRQ_HANDLED;
>   }
>   
> @@ -1732,8 +1826,10 @@ static int qcom_pcie_probe(struct platform_device *pdev)
>   			goto err_host_deinit;
>   		}
>   
> -		writel_relaxed(PARF_INT_ALL_LINK_UP | PARF_INT_MSI_DEV_0_7,
> -			       pcie->parf + PARF_INT_ALL_MASK);
> +		writel_relaxed(PARF_INT_ALL_LINK_UP | PARF_INT_ALL_LINK_DOWN |
> +			       PARF_INT_MSI_DEV_0_7, pcie->parf + PARF_INT_ALL_MASK);
> +
> +		pcie->global_irq = irq;
>   	}
>   
>   	qcom_pcie_icc_opp_update(pcie);
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges
  2025-05-08  7:10 ` [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges Manivannan Sadhasivam
@ 2025-05-14  6:30   ` Krishna Chaitanya Chundru
  2025-05-14 16:33     ` Manivannan Sadhasivam
  2025-05-28 22:35   ` Bjorn Helgaas
  2025-06-02 21:19   ` Bjorn Helgaas
  2 siblings, 1 reply; 28+ messages in thread
From: Krishna Chaitanya Chundru @ 2025-05-14  6:30 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Mahesh J Salgaonkar, Oliver O'Halloran,
	Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Zhou Wang, Will Deacon, Robert Richter,
	Alyssa Rosenzweig, Marc Zyngier, Conor Dooley, Daire McNamara
  Cc: dingwei, cassel, Lukas Wunner, linuxppc-dev, linux-pci,
	linux-kernel, linux-arm-msm, linux-arm-kernel, linux-riscv



On 5/8/2025 12:40 PM, Manivannan Sadhasivam wrote:
> The PCI link, when down, needs to be recovered to bring it back. But that
> cannot be done in a generic way as link recovery procedure is specific to
> host bridges. So add a new API pci_host_handle_link_down() that could be
> called by the host bridge drivers when the link goes down.
> 
> The API will iterate through all the slots and calls the pcie_do_recovery()
> function with 'pci_channel_io_frozen' as the state. This will result in the
> execution of the AER Fatal error handling code. Since the link down
> recovery is pretty much the same as AER Fatal error handling,
> pcie_do_recovery() helper is reused here. First the AER error_detected
> callback will be triggered for the bridge and the downstream devices. Then,
> pci_host_reset_slot() will be called for the slot, which will reset the
> slot using 'reset_slot' callback to recover the link. Once that's done,
> resume message will be broadcasted to the bridge and the downstream devices
> indicating successful link recovery.
> 
> In case if the AER support is not enabled in the kernel, only
> pci_bus_error_reset() will be called for each slots as there is no way we
> could inform the drivers about link recovery.
> 
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---
>   drivers/pci/controller/pci-host-common.c | 58 ++++++++++++++++++++++++++++++++
>   drivers/pci/controller/pci-host-common.h |  1 +
>   drivers/pci/pci.c                        |  1 +
>   drivers/pci/pcie/err.c                   |  1 +
>   4 files changed, 61 insertions(+)
> 
> diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c
> index f93bc7034e697250711833a5151f7ef177cd62a0..f916f0a874a61ddfbfd99f96975c00fb66dd224c 100644
> --- a/drivers/pci/controller/pci-host-common.c
> +++ b/drivers/pci/controller/pci-host-common.c
> @@ -12,9 +12,11 @@
>   #include <linux/of.h>
>   #include <linux/of_address.h>
>   #include <linux/of_pci.h>
> +#include <linux/pci.h>
>   #include <linux/pci-ecam.h>
>   #include <linux/platform_device.h>
>   
> +#include "../pci.h"
>   #include "pci-host-common.h"
>   
>   static void gen_pci_unmap_cfg(void *ptr)
> @@ -96,5 +98,61 @@ void pci_host_common_remove(struct platform_device *pdev)
>   }
>   EXPORT_SYMBOL_GPL(pci_host_common_remove);
>   
> +#if IS_ENABLED(CONFIG_PCIEAER)
> +static pci_ers_result_t pci_host_reset_slot(struct pci_dev *dev)
> +{
> +	int ret;
> +
> +	ret = pci_bus_error_reset(dev);
> +	if (ret) {
> +		pci_err(dev, "Failed to reset slot: %d\n", ret);
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
> +
> +	pci_info(dev, "Slot has been reset\n");
> +
> +	return PCI_ERS_RESULT_RECOVERED;
> +}
> +
> +static void pci_host_recover_slots(struct pci_host_bridge *host)
> +{
> +	struct pci_bus *bus = host->bus;
> +	struct pci_dev *dev;
> +
> +	for_each_pci_bridge(dev, bus) {
> +		if (!pci_is_root_bus(bus))
bus here is always constant here, we may need to have check
for dev here like if (!pci_is_root_bus(dev->bus))
> +			continue;
> +
> +		pcie_do_recovery(dev, pci_channel_io_frozen,
> +				 pci_host_reset_slot);
> +	}
> +}
> +#else
> +static void pci_host_recover_slots(struct pci_host_bridge *host)
> +{
> +	struct pci_bus *bus = host->bus;
> +	struct pci_dev *dev;
> +	int ret;
> +
> +	for_each_pci_bridge(dev, bus) {
> +		if (!pci_is_root_bus(bus))Same comment as above.

- Krishna Chaitanya.
> +			continue;
> +
> +		ret = pci_bus_error_reset(dev);
> +		if (ret)
> +			pci_err(dev, "Failed to reset slot: %d\n", ret);
> +		else
> +			pci_info(dev, "Slot has been reset\n");
> +	}
> +}
> +#endif
> +
> +void pci_host_handle_link_down(struct pci_host_bridge *bridge)
> +{
> +	dev_info(&bridge->dev, "Recovering slots due to Link Down\n");
> +	pci_host_recover_slots(bridge);
> +}
> +EXPORT_SYMBOL_GPL(pci_host_handle_link_down);
> +
>   MODULE_DESCRIPTION("Common library for PCI host controller drivers");
>   MODULE_LICENSE("GPL v2");
> diff --git a/drivers/pci/controller/pci-host-common.h b/drivers/pci/controller/pci-host-common.h
> index d8be024ca68d43afb147fd9104d632b907277144..904698c1a2695888a0fc9c2fac360e456116eb1d 100644
> --- a/drivers/pci/controller/pci-host-common.h
> +++ b/drivers/pci/controller/pci-host-common.h
> @@ -12,5 +12,6 @@
>   
>   int pci_host_common_probe(struct platform_device *pdev);
>   void pci_host_common_remove(struct platform_device *pdev);
> +void pci_host_handle_link_down(struct pci_host_bridge *bridge);
>   
>   #endif
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 13709bb898a967968540826a2b7ee8ade6b7e082..4d396bbab4a8f33cae0ffe8982da120a9f1d92c9 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5781,6 +5781,7 @@ int pci_bus_error_reset(struct pci_dev *bridge)
>   	mutex_unlock(&pci_slot_mutex);
>   	return pci_bus_reset(bridge->subordinate, PCI_RESET_DO_RESET);
>   }
> +EXPORT_SYMBOL_GPL(pci_bus_error_reset);
>   
>   /**
>    * pci_probe_reset_bus - probe whether a PCI bus can be reset
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index b834fc0d705938540d3d7d3d8739770c09fe7cf1..3e3084bb7cb7fa06b526e6fab60e77927aba0ad0 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -270,3 +270,4 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   
>   	return status;
>   }
> +EXPORT_SYMBOL_GPL(pcie_do_recovery);
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/5] PCI/ERR: Add support for resetting the slots in a platform specific way
  2025-05-08  7:10 ` [PATCH v4 2/5] PCI/ERR: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
  2025-05-08  7:18   ` Lukas Wunner
  2025-05-09  2:59   ` Wilfred Mallawa
@ 2025-05-14  6:33   ` Krishna Chaitanya Chundru
  2025-05-14 16:42   ` Sathyanarayanan Kuppuswamy
  3 siblings, 0 replies; 28+ messages in thread
From: Krishna Chaitanya Chundru @ 2025-05-14  6:33 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Mahesh J Salgaonkar, Oliver O'Halloran,
	Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Zhou Wang, Will Deacon, Robert Richter,
	Alyssa Rosenzweig, Marc Zyngier, Conor Dooley, Daire McNamara
  Cc: dingwei, cassel, Lukas Wunner, linuxppc-dev, linux-pci,
	linux-kernel, linux-arm-msm, linux-arm-kernel, linux-riscv



On 5/8/2025 12:40 PM, Manivannan Sadhasivam wrote:
> Some host bridge devices require resetting the slots in a platform specific
> way to recover them from error conditions such as Fatal AER errors, Link
> Down etc... So introduce pci_host_bridge::reset_slot callback and call it
> from pcibios_reset_secondary_bus() if available.
> 
> The 'reset_slot' callback is responsible for resetting the given slot
> referenced by the 'pci_dev' pointer in a platform specific way and bring it
> back to the working state if possible. If any error occurs during the slot
> reset operation, relevant errno should be returned.
> 
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>

- Krishna Chaitanya.
> ---
>   drivers/pci/pci.c      | 12 ++++++++++++
>   drivers/pci/pcie/err.c |  5 -----
>   include/linux/pci.h    |  1 +
>   3 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 4d7c9f64ea24ec754a135a2585c99489cfa641a9..13709bb898a967968540826a2b7ee8ade6b7e082 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4982,7 +4982,19 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
>   
>   void __weak pcibios_reset_secondary_bus(struct pci_dev *dev)
>   {
> +	struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
> +	int ret;
> +
> +	if (host->reset_slot) {
> +		ret = host->reset_slot(host, dev);
> +		if (ret)
> +			pci_err(dev, "failed to reset slot: %d\n", ret);
> +
> +		return;
> +	}
> +
>   	pci_reset_secondary_bus(dev);
> +
>   }
>   
>   /**
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index de6381c690f5c21f00021cdc7bde8d93a5c7db52..b834fc0d705938540d3d7d3d8739770c09fe7cf1 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -234,11 +234,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   	}
>   
>   	if (status == PCI_ERS_RESULT_NEED_RESET) {
> -		/*
> -		 * TODO: Should call platform-specific
> -		 * functions to reset slot before calling
> -		 * drivers' slot_reset callbacks?
> -		 */
>   		status = PCI_ERS_RESULT_RECOVERED;
>   		pci_dbg(bridge, "broadcast slot_reset message\n");
>   		pci_walk_bridge(bridge, report_slot_reset, &status);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 0e8e3fd77e96713054388bdc82f439e51023c1bf..8d7d2a49b76cf64b4218b179cec495e0d69ddf6f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -599,6 +599,7 @@ struct pci_host_bridge {
>   	void (*release_fn)(struct pci_host_bridge *);
>   	int (*enable_device)(struct pci_host_bridge *bridge, struct pci_dev *dev);
>   	void (*disable_device)(struct pci_host_bridge *bridge, struct pci_dev *dev);
> +	int (*reset_slot)(struct pci_host_bridge *bridge, struct pci_dev *dev);
>   	void		*release_data;
>   	unsigned int	ignore_reset_delay:1;	/* For entire hierarchy */
>   	unsigned int	no_ext_tags:1;		/* No Extended Tags */
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 0/5] PCI: Add support for resetting the slots in a platform specific way
  2025-05-08  7:10 [PATCH v4 0/5] PCI: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
                   ` (4 preceding siblings ...)
  2025-05-08  7:10 ` [PATCH v4 5/5] PCI: qcom: Add support for resetting the slot due to link down event Manivannan Sadhasivam
@ 2025-05-14 16:32 ` Manivannan Sadhasivam
  5 siblings, 0 replies; 28+ messages in thread
From: Manivannan Sadhasivam @ 2025-05-14 16:32 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara, Manivannan Sadhasivam
  Cc: dingwei, cassel, Lukas Wunner, Krishna Chaitanya Chundru,
	linuxppc-dev, linux-pci, linux-kernel, linux-arm-msm,
	linux-arm-kernel, linux-riscv


On Thu, 08 May 2025 12:40:29 +0530, Manivannan Sadhasivam wrote:
> Currently, in the event of AER/DPC, PCI core will try to reset the slot and its
> subordinate devices by invoking bridge control reset and FLR. But in some
> cases like AER Fatal error, it might be necessary to reset the slots using the
> PCI host bridge drivers in a platform specific way (as indicated by the TODO in
> the pcie_do_recovery() function in drivers/pci/pcie/err.c). Otherwise, the PCI
> link won't be recovered successfully.
> 
> [...]

Applied, thanks!

[1/5] PCI/ERR: Remove misleading TODO regarding kernel panic
      commit: 0b56264b66b695aa76115ae68c1365dff80814e0
[2/5] PCI/ERR: Add support for resetting the slots in a platform specific way
      commit: e2889025d83fb38faac80d976f1789af3320ecb2
[3/5] PCI: host-common: Make the driver as a common library for host controller drivers
      commit: a811ffd43754dfcaccff497de1d8ee0bd30f8f2f
[4/5] PCI: host-common: Add link down handling for host bridges
      commit: f24509b96c33c1bdd9cc57bc4f8a58391e846c8c
[5/5] PCI: qcom: Add support for resetting the slot due to link down event
      commit: 40eba89968afc15302fa5aaef207fd9ccaf1ecb2

Best regards,
-- 
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges
  2025-05-14  6:30   ` Krishna Chaitanya Chundru
@ 2025-05-14 16:33     ` Manivannan Sadhasivam
  0 siblings, 0 replies; 28+ messages in thread
From: Manivannan Sadhasivam @ 2025-05-14 16:33 UTC (permalink / raw)
  To: Krishna Chaitanya Chundru
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara, dingwei, cassel,
	Lukas Wunner, linuxppc-dev, linux-pci, linux-kernel,
	linux-arm-msm, linux-arm-kernel, linux-riscv

On Wed, May 14, 2025 at 12:00:11PM +0530, Krishna Chaitanya Chundru wrote:
> 
> 
> On 5/8/2025 12:40 PM, Manivannan Sadhasivam wrote:
> > The PCI link, when down, needs to be recovered to bring it back. But that
> > cannot be done in a generic way as link recovery procedure is specific to
> > host bridges. So add a new API pci_host_handle_link_down() that could be
> > called by the host bridge drivers when the link goes down.
> > 
> > The API will iterate through all the slots and calls the pcie_do_recovery()
> > function with 'pci_channel_io_frozen' as the state. This will result in the
> > execution of the AER Fatal error handling code. Since the link down
> > recovery is pretty much the same as AER Fatal error handling,
> > pcie_do_recovery() helper is reused here. First the AER error_detected
> > callback will be triggered for the bridge and the downstream devices. Then,
> > pci_host_reset_slot() will be called for the slot, which will reset the
> > slot using 'reset_slot' callback to recover the link. Once that's done,
> > resume message will be broadcasted to the bridge and the downstream devices
> > indicating successful link recovery.
> > 
> > In case if the AER support is not enabled in the kernel, only
> > pci_bus_error_reset() will be called for each slots as there is no way we
> > could inform the drivers about link recovery.
> > 
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > ---
> >   drivers/pci/controller/pci-host-common.c | 58 ++++++++++++++++++++++++++++++++
> >   drivers/pci/controller/pci-host-common.h |  1 +
> >   drivers/pci/pci.c                        |  1 +
> >   drivers/pci/pcie/err.c                   |  1 +
> >   4 files changed, 61 insertions(+)
> > 
> > diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c
> > index f93bc7034e697250711833a5151f7ef177cd62a0..f916f0a874a61ddfbfd99f96975c00fb66dd224c 100644
> > --- a/drivers/pci/controller/pci-host-common.c
> > +++ b/drivers/pci/controller/pci-host-common.c
> > @@ -12,9 +12,11 @@
> >   #include <linux/of.h>
> >   #include <linux/of_address.h>
> >   #include <linux/of_pci.h>
> > +#include <linux/pci.h>
> >   #include <linux/pci-ecam.h>
> >   #include <linux/platform_device.h>
> > +#include "../pci.h"
> >   #include "pci-host-common.h"
> >   static void gen_pci_unmap_cfg(void *ptr)
> > @@ -96,5 +98,61 @@ void pci_host_common_remove(struct platform_device *pdev)
> >   }
> >   EXPORT_SYMBOL_GPL(pci_host_common_remove);
> > +#if IS_ENABLED(CONFIG_PCIEAER)
> > +static pci_ers_result_t pci_host_reset_slot(struct pci_dev *dev)
> > +{
> > +	int ret;
> > +
> > +	ret = pci_bus_error_reset(dev);
> > +	if (ret) {
> > +		pci_err(dev, "Failed to reset slot: %d\n", ret);
> > +		return PCI_ERS_RESULT_DISCONNECT;
> > +	}
> > +
> > +	pci_info(dev, "Slot has been reset\n");
> > +
> > +	return PCI_ERS_RESULT_RECOVERED;
> > +}
> > +
> > +static void pci_host_recover_slots(struct pci_host_bridge *host)
> > +{
> > +	struct pci_bus *bus = host->bus;
> > +	struct pci_dev *dev;
> > +
> > +	for_each_pci_bridge(dev, bus) {
> > +		if (!pci_is_root_bus(bus))
> bus here is always constant here, we may need to have check
> for dev here like if (!pci_is_root_bus(dev->bus))

Good catch! Ammended it while applying.

- Mani

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 5/5] PCI: qcom: Add support for resetting the slot due to link down event
  2025-05-14  6:22   ` Krishna Chaitanya Chundru
@ 2025-05-14 16:35     ` Manivannan Sadhasivam
  0 siblings, 0 replies; 28+ messages in thread
From: Manivannan Sadhasivam @ 2025-05-14 16:35 UTC (permalink / raw)
  To: Krishna Chaitanya Chundru
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara, dingwei, cassel,
	Lukas Wunner, linuxppc-dev, linux-pci, linux-kernel,
	linux-arm-msm, linux-arm-kernel, linux-riscv

On Wed, May 14, 2025 at 11:52:13AM +0530, Krishna Chaitanya Chundru wrote:
> 
> 
> On 5/8/2025 12:40 PM, Manivannan Sadhasivam wrote:
> > The PCIe link can go down under circumstances such as the device firmware
> > crash, link instability, etc... When that happens, the PCIe slot needs to
> > be reset to make it operational again. Currently, the driver is not
> > handling the link down event, due to which the users have to restart the
> > machine to make PCIe link operational again. So fix it by detecting the
> > link down event and resetting the slot.
> > 
> > Since the Qcom PCIe controllers report the link down event through the
> > 'global' IRQ, enable the link down event by setting PARF_INT_ALL_LINK_DOWN
> > bit in PARF_INT_ALL_MASK register.
> > 
> > Then in the case of the event, call pci_host_handle_link_down() API
> > in the handler to let the PCI core handle the link down condition. Note
> > that both link up and link down events could be set at a time when the
> > handler runs. So always handle link down first.
> > 
> > The API will internally call, 'pci_host_bridge::reset_slot()' callback to
> > reset the slot in a platform specific way. So implement the callback to
> > reset the slot by first resetting the PCIe core, followed by reinitializing
> > the resources and then finally starting the link again.
> > 
> Only one comment see below.
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Reviewed-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
> > ---
> >   drivers/pci/controller/dwc/Kconfig     |   1 +
> >   drivers/pci/controller/dwc/pcie-qcom.c | 112 ++++++++++++++++++++++++++++++---
> >   2 files changed, 105 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
> > index d9f0386396edf66ad0e514a0f545ed24d89fcb6c..ce04ee6fbd99cbcce5d2f3a75ebd72a17070b7b7 100644
> > --- a/drivers/pci/controller/dwc/Kconfig
> > +++ b/drivers/pci/controller/dwc/Kconfig
> > @@ -296,6 +296,7 @@ config PCIE_QCOM
> >   	select PCIE_DW_HOST
> >   	select CRC8
> >   	select PCIE_QCOM_COMMON
> > +	select PCI_HOST_COMMON
> >   	help
> >   	  Say Y here to enable PCIe controller support on Qualcomm SoCs. The
> >   	  PCIe controller uses the DesignWare core plus Qualcomm-specific
> > diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
> > index dc98ae63362db0422384b1879a2b9a7dc564d091..e577619d0f8ceddf0955139ae6b939842f8cb7be 100644
> > --- a/drivers/pci/controller/dwc/pcie-qcom.c
> > +++ b/drivers/pci/controller/dwc/pcie-qcom.c
> > @@ -34,6 +34,7 @@
> >   #include <linux/units.h>
> >   #include "../../pci.h"
> > +#include "../pci-host-common.h"
> >   #include "pcie-designware.h"
> >   #include "pcie-qcom-common.h"
> > @@ -55,6 +56,7 @@
> >   #define PARF_INT_ALL_STATUS			0x224
> >   #define PARF_INT_ALL_CLEAR			0x228
> >   #define PARF_INT_ALL_MASK			0x22c
> > +#define PARF_STATUS				0x230
> >   #define PARF_SID_OFFSET				0x234
> >   #define PARF_BDF_TRANSLATE_CFG			0x24c
> >   #define PARF_DBI_BASE_ADDR_V2			0x350
> > @@ -130,9 +132,14 @@
> >   /* PARF_LTSSM register fields */
> >   #define LTSSM_EN				BIT(8)
> > +#define SW_CLEAR_FLUSH_MODE			BIT(10)
> > +#define FLUSH_MODE				BIT(11)
> >   /* PARF_INT_ALL_{STATUS/CLEAR/MASK} register fields */
> > -#define PARF_INT_ALL_LINK_UP			BIT(13)
> > +#define INT_ALL_LINK_DOWN			1
> > +#define INT_ALL_LINK_UP				13
> > +#define PARF_INT_ALL_LINK_DOWN			BIT(INT_ALL_LINK_DOWN)
> > +#define PARF_INT_ALL_LINK_UP			BIT(INT_ALL_LINK_UP)
> >   #define PARF_INT_MSI_DEV_0_7			GENMASK(30, 23)
> >   /* PARF_NO_SNOOP_OVERRIDE register fields */
> > @@ -145,6 +152,9 @@
> >   /* PARF_BDF_TO_SID_CFG fields */
> >   #define BDF_TO_SID_BYPASS			BIT(0)
> > +/* PARF_STATUS fields */
> > +#define FLUSH_COMPLETED				BIT(8)
> > +
> >   /* ELBI_SYS_CTRL register fields */
> >   #define ELBI_SYS_CTRL_LT_ENABLE			BIT(0)
> > @@ -169,6 +179,7 @@
> >   						PCIE_CAP_SLOT_POWER_LIMIT_SCALE)
> >   #define PERST_DELAY_US				1000
> > +#define FLUSH_TIMEOUT_US			100
> >   #define QCOM_PCIE_CRC8_POLYNOMIAL		(BIT(2) | BIT(1) | BIT(0))
> > @@ -274,11 +285,14 @@ struct qcom_pcie {
> >   	struct icc_path *icc_cpu;
> >   	const struct qcom_pcie_cfg *cfg;
> >   	struct dentry *debugfs;
> > +	int global_irq;
> >   	bool suspended;
> >   	bool use_pm_opp;
> >   };
> >   #define to_qcom_pcie(x)		dev_get_drvdata((x)->dev)
> > +static int qcom_pcie_reset_slot(struct pci_host_bridge *bridge,
> > +				  struct pci_dev *pdev);
> >   static void qcom_ep_reset_assert(struct qcom_pcie *pcie)
> >   {
> > @@ -1263,6 +1277,8 @@ static int qcom_pcie_host_init(struct dw_pcie_rp *pp)
> >   			goto err_assert_reset;
> >   	}
> > +	pp->bridge->reset_slot = qcom_pcie_reset_slot;
> > +
> >   	return 0;
> >   err_assert_reset:
> > @@ -1517,6 +1533,74 @@ static void qcom_pcie_icc_opp_update(struct qcom_pcie *pcie)
> >   	}
> >   }
> > +static int qcom_pcie_reset_slot(struct pci_host_bridge *bridge,
> > +				  struct pci_dev *pdev)
> > +{
> > +	struct pci_bus *bus = bridge->bus;
> > +	struct dw_pcie_rp *pp = bus->sysdata;
> > +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > +	struct qcom_pcie *pcie = to_qcom_pcie(pci);
> > +	struct device *dev = pcie->pci->dev;
> > +	u32 val;
> > +	int ret;
> > +
> > +	/* Wait for the pending transactions to be completed */
> > +	ret = readl_relaxed_poll_timeout(pcie->parf + PARF_STATUS, val,
> > +					 val & FLUSH_COMPLETED, 10,
> > +					 FLUSH_TIMEOUT_US);
> > +	if (ret) {
> > +		dev_err(dev, "Flush completion failed: %d\n", ret);
> > +		goto err_host_deinit;
> > +	}
> > +
> > +	/* Clear the FLUSH_MODE to allow the core to be reset */
> > +	val = readl(pcie->parf + PARF_LTSSM);
> > +	val |= SW_CLEAR_FLUSH_MODE;
> > +	writel(val, pcie->parf + PARF_LTSSM);
> > +
> > +	/* Wait for the FLUSH_MODE to clear */
> > +	ret = readl_relaxed_poll_timeout(pcie->parf + PARF_LTSSM, val,
> > +					 !(val & FLUSH_MODE), 10,
> > +					 FLUSH_TIMEOUT_US);
> > +	if (ret) {
> > +		dev_err(dev, "Flush mode clear failed: %d\n", ret);
> > +		goto err_host_deinit;
> > +	}
> > +
> > +	qcom_pcie_host_deinit(pp);
> > +
> > +	ret = qcom_pcie_host_init(pp);
> > +	if (ret) {
> > +		dev_err(dev, "Host init failed\n");
> > +		return ret;
> > +	}
> > +
> > +	ret = dw_pcie_setup_rc(pp);
> > +	if (ret)
> > +		goto err_host_deinit;
> > +
> > +	/*
> > +	 * Re-enable global IRQ events as the PARF_INT_ALL_MASK register is
> > +	 * non-sticky.
> > +	 */
> > +	if (pcie->global_irq)
> > +		writel_relaxed(PARF_INT_ALL_LINK_UP | PARF_INT_ALL_LINK_DOWN |
> > +			       PARF_INT_MSI_DEV_0_7, pcie->parf + PARF_INT_ALL_MASK);
> > +
> > +	qcom_pcie_start_link(pci);
> > +	if (!dw_pcie_wait_for_link(pci))
> > +		qcom_pcie_icc_opp_update(pcie);
> This icc opp update can we removed as this can updated from the global
> IRQ.

Right. I forgot to remove it after keeping link up IRQ change. Removed it while
applying.

- Mani

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 1/5] PCI/ERR: Remove misleading TODO regarding kernel panic
  2025-05-08  7:10 ` [PATCH v4 1/5] PCI/ERR: Remove misleading TODO regarding kernel panic Manivannan Sadhasivam
  2025-05-09  3:04   ` Wilfred Mallawa
  2025-05-09  6:11   ` Ethan Zhao
@ 2025-05-14 16:39   ` Sathyanarayanan Kuppuswamy
  2 siblings, 0 replies; 28+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-05-14 16:39 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Mahesh J Salgaonkar, Oliver O'Halloran,
	Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Zhou Wang, Will Deacon, Robert Richter,
	Alyssa Rosenzweig, Marc Zyngier, Conor Dooley, Daire McNamara
  Cc: dingwei, cassel, Lukas Wunner, Krishna Chaitanya Chundru,
	linuxppc-dev, linux-pci, linux-kernel, linux-arm-msm,
	linux-arm-kernel, linux-riscv


On 5/8/25 12:10 AM, Manivannan Sadhasivam wrote:
> A PCI device is just another peripheral in a system. So failure to
> recover it, must not result in a kernel panic. So remove the TODO which
> is quite misleading.
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan 
<sathyanarayanan.kuppuswamy@linux.intel.com>

>   drivers/pci/pcie/err.c | 1 -
>   1 file changed, 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 31090770fffcc94e15ba6e89f649c6f84bfdf0d5..de6381c690f5c21f00021cdc7bde8d93a5c7db52 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -271,7 +271,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   
>   	pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT);
>   
> -	/* TODO: Should kernel panic here? */
>   	pci_info(bridge, "device recovery failed\n");
>   
>   	return status;
>
-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/5] PCI/ERR: Add support for resetting the slots in a platform specific way
  2025-05-08  7:10 ` [PATCH v4 2/5] PCI/ERR: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
                     ` (2 preceding siblings ...)
  2025-05-14  6:33   ` Krishna Chaitanya Chundru
@ 2025-05-14 16:42   ` Sathyanarayanan Kuppuswamy
  3 siblings, 0 replies; 28+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-05-14 16:42 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Mahesh J Salgaonkar, Oliver O'Halloran,
	Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Zhou Wang, Will Deacon, Robert Richter,
	Alyssa Rosenzweig, Marc Zyngier, Conor Dooley, Daire McNamara
  Cc: dingwei, cassel, Lukas Wunner, Krishna Chaitanya Chundru,
	linuxppc-dev, linux-pci, linux-kernel, linux-arm-msm,
	linux-arm-kernel, linux-riscv


On 5/8/25 12:10 AM, Manivannan Sadhasivam wrote:
> Some host bridge devices require resetting the slots in a platform specific
> way to recover them from error conditions such as Fatal AER errors, Link
> Down etc... So introduce pci_host_bridge::reset_slot callback and call it
> from pcibios_reset_secondary_bus() if available.
>
> The 'reset_slot' callback is responsible for resetting the given slot
> referenced by the 'pci_dev' pointer in a platform specific way and bring it
> back to the working state if possible. If any error occurs during the slot
> reset operation, relevant errno should be returned.
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan 
<sathyanarayanan.kuppuswamy@linux.intel.com>

>   drivers/pci/pci.c      | 12 ++++++++++++
>   drivers/pci/pcie/err.c |  5 -----
>   include/linux/pci.h    |  1 +
>   3 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 4d7c9f64ea24ec754a135a2585c99489cfa641a9..13709bb898a967968540826a2b7ee8ade6b7e082 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4982,7 +4982,19 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
>   
>   void __weak pcibios_reset_secondary_bus(struct pci_dev *dev)
>   {
> +	struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
> +	int ret;
> +
> +	if (host->reset_slot) {
> +		ret = host->reset_slot(host, dev);
> +		if (ret)
> +			pci_err(dev, "failed to reset slot: %d\n", ret);
> +
> +		return;
> +	}
> +
>   	pci_reset_secondary_bus(dev);
> +
>   }
>   
>   /**
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index de6381c690f5c21f00021cdc7bde8d93a5c7db52..b834fc0d705938540d3d7d3d8739770c09fe7cf1 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -234,11 +234,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   	}
>   
>   	if (status == PCI_ERS_RESULT_NEED_RESET) {
> -		/*
> -		 * TODO: Should call platform-specific
> -		 * functions to reset slot before calling
> -		 * drivers' slot_reset callbacks?
> -		 */
>   		status = PCI_ERS_RESULT_RECOVERED;
>   		pci_dbg(bridge, "broadcast slot_reset message\n");
>   		pci_walk_bridge(bridge, report_slot_reset, &status);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 0e8e3fd77e96713054388bdc82f439e51023c1bf..8d7d2a49b76cf64b4218b179cec495e0d69ddf6f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -599,6 +599,7 @@ struct pci_host_bridge {
>   	void (*release_fn)(struct pci_host_bridge *);
>   	int (*enable_device)(struct pci_host_bridge *bridge, struct pci_dev *dev);
>   	void (*disable_device)(struct pci_host_bridge *bridge, struct pci_dev *dev);
> +	int (*reset_slot)(struct pci_host_bridge *bridge, struct pci_dev *dev);
>   	void		*release_data;
>   	unsigned int	ignore_reset_delay:1;	/* For entire hierarchy */
>   	unsigned int	no_ext_tags:1;		/* No Extended Tags */
>
-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges
  2025-05-08  7:10 ` [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges Manivannan Sadhasivam
  2025-05-14  6:30   ` Krishna Chaitanya Chundru
@ 2025-05-28 22:35   ` Bjorn Helgaas
  2025-05-30  3:46     ` Manivannan Sadhasivam
  2025-06-02 21:19   ` Bjorn Helgaas
  2 siblings, 1 reply; 28+ messages in thread
From: Bjorn Helgaas @ 2025-05-28 22:35 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara, dingwei, cassel,
	Lukas Wunner, Krishna Chaitanya Chundru, linuxppc-dev, linux-pci,
	linux-kernel, linux-arm-msm, linux-arm-kernel, linux-riscv

On Thu, May 08, 2025 at 12:40:33PM +0530, Manivannan Sadhasivam wrote:
> The PCI link, when down, needs to be recovered to bring it back. But that
> cannot be done in a generic way as link recovery procedure is specific to
> host bridges. So add a new API pci_host_handle_link_down() that could be
> called by the host bridge drivers when the link goes down.
> 
> The API will iterate through all the slots and calls the pcie_do_recovery()
> function with 'pci_channel_io_frozen' as the state. This will result in the
> execution of the AER Fatal error handling code. Since the link down
> recovery is pretty much the same as AER Fatal error handling,
> pcie_do_recovery() helper is reused here. First the AER error_detected
> callback will be triggered for the bridge and the downstream devices. Then,
> pci_host_reset_slot() will be called for the slot, which will reset the
> slot using 'reset_slot' callback to recover the link. Once that's done,
> resume message will be broadcasted to the bridge and the downstream devices
> indicating successful link recovery.

Link down is an event for a single Root Port.  Why would we iterate
through all the Root Ports if the link went down for one of them?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges
  2025-05-28 22:35   ` Bjorn Helgaas
@ 2025-05-30  3:46     ` Manivannan Sadhasivam
  2025-05-30 11:34       ` Bjorn Helgaas
  0 siblings, 1 reply; 28+ messages in thread
From: Manivannan Sadhasivam @ 2025-05-30  3:46 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara, dingwei, cassel,
	Lukas Wunner, Krishna Chaitanya Chundru, linuxppc-dev, linux-pci,
	linux-kernel, linux-arm-msm, linux-arm-kernel, linux-riscv

On Wed, May 28, 2025 at 05:35:00PM -0500, Bjorn Helgaas wrote:
> On Thu, May 08, 2025 at 12:40:33PM +0530, Manivannan Sadhasivam wrote:
> > The PCI link, when down, needs to be recovered to bring it back. But that
> > cannot be done in a generic way as link recovery procedure is specific to
> > host bridges. So add a new API pci_host_handle_link_down() that could be
> > called by the host bridge drivers when the link goes down.
> > 
> > The API will iterate through all the slots and calls the pcie_do_recovery()
> > function with 'pci_channel_io_frozen' as the state. This will result in the
> > execution of the AER Fatal error handling code. Since the link down
> > recovery is pretty much the same as AER Fatal error handling,
> > pcie_do_recovery() helper is reused here. First the AER error_detected
> > callback will be triggered for the bridge and the downstream devices. Then,
> > pci_host_reset_slot() will be called for the slot, which will reset the
> > slot using 'reset_slot' callback to recover the link. Once that's done,
> > resume message will be broadcasted to the bridge and the downstream devices
> > indicating successful link recovery.
> 
> Link down is an event for a single Root Port.  Why would we iterate
> through all the Root Ports if the link went down for one of them?

Because on the reference platform (Qcom), link down notification is not
per-port, but per controller. So that's why we are iterating through all ports.
The callback is supposed to identify the ports that triggered the link down
event and recover them.

- Mani

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges
  2025-05-30  3:46     ` Manivannan Sadhasivam
@ 2025-05-30 11:34       ` Bjorn Helgaas
  2025-05-30 16:09         ` Manivannan Sadhasivam
  0 siblings, 1 reply; 28+ messages in thread
From: Bjorn Helgaas @ 2025-05-30 11:34 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara, dingwei, cassel,
	Lukas Wunner, Krishna Chaitanya Chundru, linuxppc-dev, linux-pci,
	linux-kernel, linux-arm-msm, linux-arm-kernel, linux-riscv

On Fri, May 30, 2025 at 09:16:59AM +0530, Manivannan Sadhasivam wrote:
> On Wed, May 28, 2025 at 05:35:00PM -0500, Bjorn Helgaas wrote:
> > On Thu, May 08, 2025 at 12:40:33PM +0530, Manivannan Sadhasivam wrote:
> > > The PCI link, when down, needs to be recovered to bring it back. But that
> > > cannot be done in a generic way as link recovery procedure is specific to
> > > host bridges. So add a new API pci_host_handle_link_down() that could be
> > > called by the host bridge drivers when the link goes down.
> > > 
> > > The API will iterate through all the slots and calls the pcie_do_recovery()
> > > function with 'pci_channel_io_frozen' as the state. This will result in the
> > > execution of the AER Fatal error handling code. Since the link down
> > > recovery is pretty much the same as AER Fatal error handling,
> > > pcie_do_recovery() helper is reused here. First the AER error_detected
> > > callback will be triggered for the bridge and the downstream devices. Then,
> > > pci_host_reset_slot() will be called for the slot, which will reset the
> > > slot using 'reset_slot' callback to recover the link. Once that's done,
> > > resume message will be broadcasted to the bridge and the downstream devices
> > > indicating successful link recovery.
> > 
> > Link down is an event for a single Root Port.  Why would we iterate
> > through all the Root Ports if the link went down for one of them?
> 
> Because on the reference platform (Qcom), link down notification is
> not per-port, but per controller. So that's why we are iterating
> through all ports.  The callback is supposed to identify the ports
> that triggered the link down event and recover them.

Maybe I'm missing something.  Which callback identifies the port(s)
that triggered the link down event?  I see that
pci_host_handle_link_down() is called by
rockchip_pcie_rc_sys_irq_thread() and qcom_pcie_global_irq_thread(),
but I don't see the logic that identifies a particular Root Port.

Per-controller notification of per-port events is a controller
deficiency, not something inherent to PCIe.  I don't think we should
build common infrastructure that resets all the Root Ports just
because one of them had an issue.

I think pci_host_handle_link_down() should take a Root Port, not a
host bridge, and the controller driver should figure out which port
needs to be recovered, or the controller driver can have its own loop
to recover all of them if it can't figure out which one needs it.

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges
  2025-05-30 11:34       ` Bjorn Helgaas
@ 2025-05-30 16:09         ` Manivannan Sadhasivam
  2025-06-02 21:13           ` Bjorn Helgaas
  2025-07-07 11:05           ` Niklas Cassel
  0 siblings, 2 replies; 28+ messages in thread
From: Manivannan Sadhasivam @ 2025-05-30 16:09 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara, dingwei, cassel,
	Lukas Wunner, Krishna Chaitanya Chundru, linuxppc-dev, linux-pci,
	linux-kernel, linux-arm-msm, linux-arm-kernel, linux-riscv

On Fri, May 30, 2025 at 06:34:04AM -0500, Bjorn Helgaas wrote:
> On Fri, May 30, 2025 at 09:16:59AM +0530, Manivannan Sadhasivam wrote:
> > On Wed, May 28, 2025 at 05:35:00PM -0500, Bjorn Helgaas wrote:
> > > On Thu, May 08, 2025 at 12:40:33PM +0530, Manivannan Sadhasivam wrote:
> > > > The PCI link, when down, needs to be recovered to bring it back. But that
> > > > cannot be done in a generic way as link recovery procedure is specific to
> > > > host bridges. So add a new API pci_host_handle_link_down() that could be
> > > > called by the host bridge drivers when the link goes down.
> > > > 
> > > > The API will iterate through all the slots and calls the pcie_do_recovery()
> > > > function with 'pci_channel_io_frozen' as the state. This will result in the
> > > > execution of the AER Fatal error handling code. Since the link down
> > > > recovery is pretty much the same as AER Fatal error handling,
> > > > pcie_do_recovery() helper is reused here. First the AER error_detected
> > > > callback will be triggered for the bridge and the downstream devices. Then,
> > > > pci_host_reset_slot() will be called for the slot, which will reset the
> > > > slot using 'reset_slot' callback to recover the link. Once that's done,
> > > > resume message will be broadcasted to the bridge and the downstream devices
> > > > indicating successful link recovery.
> > > 
> > > Link down is an event for a single Root Port.  Why would we iterate
> > > through all the Root Ports if the link went down for one of them?
> > 
> > Because on the reference platform (Qcom), link down notification is
> > not per-port, but per controller. So that's why we are iterating
> > through all ports.  The callback is supposed to identify the ports
> > that triggered the link down event and recover them.
> 
> Maybe I'm missing something.  Which callback identifies the port(s)
> that triggered the link down event?

I was referring to the host_bridge::reset_root_port() callback that resets the
root ports.

>  I see that
> pci_host_handle_link_down() is called by
> rockchip_pcie_rc_sys_irq_thread() and qcom_pcie_global_irq_thread(),
> but I don't see the logic that identifies a particular Root Port.
> 
> Per-controller notification of per-port events is a controller
> deficiency, not something inherent to PCIe.  I don't think we should
> build common infrastructure that resets all the Root Ports just
> because one of them had an issue.
> 

Hmm, fair enough.

> I think pci_host_handle_link_down() should take a Root Port, not a
> host bridge, and the controller driver should figure out which port
> needs to be recovered, or the controller driver can have its own loop
> to recover all of them if it can't figure out which one needs it.
> 

This should also work. Feel free to drop the relevant commits for v6.16, I can
resubmit them (including dw-rockchip after -rc1).

- Mani

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges
  2025-05-30 16:09         ` Manivannan Sadhasivam
@ 2025-06-02 21:13           ` Bjorn Helgaas
  2025-07-07 11:05           ` Niklas Cassel
  1 sibling, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2025-06-02 21:13 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara, dingwei, cassel,
	Lukas Wunner, Krishna Chaitanya Chundru, linuxppc-dev, linux-pci,
	linux-kernel, linux-arm-msm, linux-arm-kernel, linux-riscv

On Fri, May 30, 2025 at 09:39:28PM +0530, Manivannan Sadhasivam wrote:
> On Fri, May 30, 2025 at 06:34:04AM -0500, Bjorn Helgaas wrote:
> > On Fri, May 30, 2025 at 09:16:59AM +0530, Manivannan Sadhasivam wrote:
> > > On Wed, May 28, 2025 at 05:35:00PM -0500, Bjorn Helgaas wrote:
> > > > On Thu, May 08, 2025 at 12:40:33PM +0530, Manivannan Sadhasivam wrote:
> > > > > The PCI link, when down, needs to be recovered to bring it back. But that
> > > > > cannot be done in a generic way as link recovery procedure is specific to
> > > > > host bridges. So add a new API pci_host_handle_link_down() that could be
> > > > > called by the host bridge drivers when the link goes down.
> > > > > 
> > > > > The API will iterate through all the slots and calls the pcie_do_recovery()
> > > > > function with 'pci_channel_io_frozen' as the state. This will result in the
> > > > > execution of the AER Fatal error handling code. Since the link down
> > > > > recovery is pretty much the same as AER Fatal error handling,
> > > > > pcie_do_recovery() helper is reused here. First the AER error_detected
> > > > > callback will be triggered for the bridge and the downstream devices. Then,
> > > > > pci_host_reset_slot() will be called for the slot, which will reset the
> > > > > slot using 'reset_slot' callback to recover the link. Once that's done,
> > > > > resume message will be broadcasted to the bridge and the downstream devices
> > > > > indicating successful link recovery.
> > > > 
> > > > Link down is an event for a single Root Port.  Why would we iterate
> > > > through all the Root Ports if the link went down for one of them?
> > > 
> > > Because on the reference platform (Qcom), link down notification is
> > > not per-port, but per controller. So that's why we are iterating
> > > through all ports.  The callback is supposed to identify the ports
> > > that triggered the link down event and recover them.
> > 
> > Maybe I'm missing something.  Which callback identifies the port(s)
> > that triggered the link down event?
> 
> I was referring to the host_bridge::reset_root_port() callback that resets the
> root ports.
> 
> >  I see that
> > pci_host_handle_link_down() is called by
> > rockchip_pcie_rc_sys_irq_thread() and qcom_pcie_global_irq_thread(),
> > but I don't see the logic that identifies a particular Root Port.
> > 
> > Per-controller notification of per-port events is a controller
> > deficiency, not something inherent to PCIe.  I don't think we should
> > build common infrastructure that resets all the Root Ports just
> > because one of them had an issue.
> 
> Hmm, fair enough.
> 
> > I think pci_host_handle_link_down() should take a Root Port, not a
> > host bridge, and the controller driver should figure out which port
> > needs to be recovered, or the controller driver can have its own loop
> > to recover all of them if it can't figure out which one needs it.
> 
> This should also work. Feel free to drop the relevant commits for
> v6.16, I can resubmit them (including dw-rockchip after -rc1).

OK, I kept "PCI: host-common: Make the driver as a common library for
host controller drivers" (renamed to "PCI: host-common: Convert to
library for host controller drivers") on pci/controller/dw-rockchip
for v6.16 and deferred the rest until later.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges
  2025-05-08  7:10 ` [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges Manivannan Sadhasivam
  2025-05-14  6:30   ` Krishna Chaitanya Chundru
  2025-05-28 22:35   ` Bjorn Helgaas
@ 2025-06-02 21:19   ` Bjorn Helgaas
  2 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2025-06-02 21:19 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Zhou Wang, Will Deacon, Robert Richter, Alyssa Rosenzweig,
	Marc Zyngier, Conor Dooley, Daire McNamara, dingwei, cassel,
	Lukas Wunner, Krishna Chaitanya Chundru, linuxppc-dev, linux-pci,
	linux-kernel, linux-arm-msm, linux-arm-kernel, linux-riscv

On Thu, May 08, 2025 at 12:40:33PM +0530, Manivannan Sadhasivam wrote:
> The PCI link, when down, needs to be recovered to bring it back. But that
> cannot be done in a generic way as link recovery procedure is specific to
> host bridges. So add a new API pci_host_handle_link_down() that could be
> called by the host bridge drivers when the link goes down.

IIUC you plumbed this into the reset path so the standard entries
(pci_reset_function() and the sysfs "reset" files) work can now work
for Root Ports on DT systems just like they do for ACPI systems
(assuming the ACPI systems supply an _RST method for the ports).  That
all sounds good.

> The API will iterate through all the slots and calls the pcie_do_recovery()
> function with 'pci_channel_io_frozen' as the state. This will result in the
> execution of the AER Fatal error handling code. Since the link down
> recovery is pretty much the same as AER Fatal error handling,
> pcie_do_recovery() helper is reused here. First the AER error_detected
> callback will be triggered for the bridge and the downstream devices. Then,
> pci_host_reset_slot() will be called for the slot, which will reset the
> slot using 'reset_slot' callback to recover the link. Once that's done,
> resume message will be broadcasted to the bridge and the downstream devices
> indicating successful link recovery.

We have standard PCIe mechanisms to learn about "link down" events,
e.g., AER Surprise Down error reporting and the Data Link Layer State
Changed events for hot-plug capable ports.

How does this controller-specific "link down" notification relate to
those?  Is this for controllers that don't support those AER or
hotplug mechanisms?  Or is this a different scenario that wouldn't be
covered by them?

If AER is enabled, do we get both the AER interrupt and the controller
"link down" interrupt?

> In case if the AER support is not enabled in the kernel, only
> pci_bus_error_reset() will be called for each slots as there is no way we
> could inform the drivers about link recovery.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges
  2025-05-30 16:09         ` Manivannan Sadhasivam
  2025-06-02 21:13           ` Bjorn Helgaas
@ 2025-07-07 11:05           ` Niklas Cassel
  2025-07-07 11:29             ` Niklas Cassel
  1 sibling, 1 reply; 28+ messages in thread
From: Niklas Cassel @ 2025-07-07 11:05 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Zhou Wang, Will Deacon, Robert Richter,
	Alyssa Rosenzweig, Marc Zyngier, Conor Dooley, Daire McNamara,
	dingwei, Lukas Wunner, Krishna Chaitanya Chundru, linuxppc-dev,
	linux-pci, linux-kernel, linux-arm-msm, linux-arm-kernel,
	linux-riscv

Hello Mani,

On Fri, May 30, 2025 at 09:39:28PM +0530, Manivannan Sadhasivam wrote:
> On Fri, May 30, 2025 at 06:34:04AM -0500, Bjorn Helgaas wrote:
> 
> > I think pci_host_handle_link_down() should take a Root Port, not a
> > host bridge, and the controller driver should figure out which port
> > needs to be recovered, or the controller driver can have its own loop
> > to recover all of them if it can't figure out which one needs it.
> > 
> 
> This should also work. Feel free to drop the relevant commits for v6.16, I can
> resubmit them (including dw-rockchip after -rc1).

What is the current status of this?

I assume that there is not much time left before 6.17 cut-off.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges
  2025-07-07 11:05           ` Niklas Cassel
@ 2025-07-07 11:29             ` Niklas Cassel
  2025-07-09 12:32               ` Manivannan Sadhasivam
  0 siblings, 1 reply; 28+ messages in thread
From: Niklas Cassel @ 2025-07-07 11:29 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Zhou Wang, Will Deacon, Robert Richter,
	Alyssa Rosenzweig, Marc Zyngier, Conor Dooley, Daire McNamara,
	dingwei, Lukas Wunner, Krishna Chaitanya Chundru, linuxppc-dev,
	linux-pci, linux-kernel, linux-arm-msm, linux-arm-kernel,
	linux-riscv

+ Mani's kernel.org email.

On Mon, Jul 07, 2025 at 01:05:39PM +0200, Niklas Cassel wrote:
> Hello Mani,
> 
> On Fri, May 30, 2025 at 09:39:28PM +0530, Manivannan Sadhasivam wrote:
> > On Fri, May 30, 2025 at 06:34:04AM -0500, Bjorn Helgaas wrote:
> > 
> > > I think pci_host_handle_link_down() should take a Root Port, not a
> > > host bridge, and the controller driver should figure out which port
> > > needs to be recovered, or the controller driver can have its own loop
> > > to recover all of them if it can't figure out which one needs it.
> > > 
> > 
> > This should also work. Feel free to drop the relevant commits for v6.16, I can
> > resubmit them (including dw-rockchip after -rc1).
> 
> What is the current status of this?
> 
> I assume that there is not much time left before 6.17 cut-off.
> 
> 
> Kind regards,
> Niklas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges
  2025-07-07 11:29             ` Niklas Cassel
@ 2025-07-09 12:32               ` Manivannan Sadhasivam
  0 siblings, 0 replies; 28+ messages in thread
From: Manivannan Sadhasivam @ 2025-07-09 12:32 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Zhou Wang, Will Deacon, Robert Richter,
	Alyssa Rosenzweig, Marc Zyngier, Conor Dooley, Daire McNamara,
	dingwei, Lukas Wunner, Krishna Chaitanya Chundru, linuxppc-dev,
	linux-pci, linux-kernel, linux-arm-msm, linux-arm-kernel,
	linux-riscv

On Mon, Jul 07, 2025 at 01:29:22PM GMT, Niklas Cassel wrote:
> + Mani's kernel.org email.
> 
> On Mon, Jul 07, 2025 at 01:05:39PM +0200, Niklas Cassel wrote:
> > Hello Mani,
> > 
> > On Fri, May 30, 2025 at 09:39:28PM +0530, Manivannan Sadhasivam wrote:
> > > On Fri, May 30, 2025 at 06:34:04AM -0500, Bjorn Helgaas wrote:
> > > 
> > > > I think pci_host_handle_link_down() should take a Root Port, not a
> > > > host bridge, and the controller driver should figure out which port
> > > > needs to be recovered, or the controller driver can have its own loop
> > > > to recover all of them if it can't figure out which one needs it.
> > > > 
> > > 
> > > This should also work. Feel free to drop the relevant commits for v6.16, I can
> > > resubmit them (including dw-rockchip after -rc1).
> > 
> > What is the current status of this?
> > 

Thanks for the nudge!

I couldn't respin the series as I lost access to the hardware I was testing on
due to job change. I'll figure out a way to test it and respin it asap.

> > I assume that there is not much time left before 6.17 cut-off.

Mid of -rc5 is not that bad ;) Let's see how long it takes.

- Mani

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2025-07-09 12:32 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-08  7:10 [PATCH v4 0/5] PCI: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
2025-05-08  7:10 ` [PATCH v4 1/5] PCI/ERR: Remove misleading TODO regarding kernel panic Manivannan Sadhasivam
2025-05-09  3:04   ` Wilfred Mallawa
2025-05-09  6:11   ` Ethan Zhao
2025-05-09  6:51     ` Manivannan Sadhasivam
2025-05-14 16:39   ` Sathyanarayanan Kuppuswamy
2025-05-08  7:10 ` [PATCH v4 2/5] PCI/ERR: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam
2025-05-08  7:18   ` Lukas Wunner
2025-05-09  2:59   ` Wilfred Mallawa
2025-05-14  6:33   ` Krishna Chaitanya Chundru
2025-05-14 16:42   ` Sathyanarayanan Kuppuswamy
2025-05-08  7:10 ` [PATCH v4 3/5] PCI: host-common: Make the driver as a common library for host controller drivers Manivannan Sadhasivam
2025-05-08  7:10 ` [PATCH v4 4/5] PCI: host-common: Add link down handling for host bridges Manivannan Sadhasivam
2025-05-14  6:30   ` Krishna Chaitanya Chundru
2025-05-14 16:33     ` Manivannan Sadhasivam
2025-05-28 22:35   ` Bjorn Helgaas
2025-05-30  3:46     ` Manivannan Sadhasivam
2025-05-30 11:34       ` Bjorn Helgaas
2025-05-30 16:09         ` Manivannan Sadhasivam
2025-06-02 21:13           ` Bjorn Helgaas
2025-07-07 11:05           ` Niklas Cassel
2025-07-07 11:29             ` Niklas Cassel
2025-07-09 12:32               ` Manivannan Sadhasivam
2025-06-02 21:19   ` Bjorn Helgaas
2025-05-08  7:10 ` [PATCH v4 5/5] PCI: qcom: Add support for resetting the slot due to link down event Manivannan Sadhasivam
2025-05-14  6:22   ` Krishna Chaitanya Chundru
2025-05-14 16:35     ` Manivannan Sadhasivam
2025-05-14 16:32 ` [PATCH v4 0/5] PCI: Add support for resetting the slots in a platform specific way Manivannan Sadhasivam

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).