public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/4] PCIe hotplug interrupt related fixes
@ 2025-02-24  3:44 Feng Tang
  2025-02-24  3:44 ` [PATCH v3 1/4] PCI: portdrv: pciehp: Move PCIe hotplug command waiting logic to port driver Feng Tang
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Feng Tang @ 2025-02-24  3:44 UTC (permalink / raw)
  To: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
	Liguang Zhang, Guanghui Feng, rafael
  Cc: Markus Elfring, lkp, Jonathan Cameron, ilpo.jarvinen, linux-pci,
	linux-kernel, Feng Tang

Hi all,

This patchset tries to address 2 PCIe hotplug interrupt related problems
we met recently:

1. Firmware developers reported that they received two PCIe hotplug commands
   in very short intervals on an ARM server, which doesn't comply with PCIe
   spec, and broke their state machine and work flow.
2. An irq storm bug found when testing "pci=nomsi" case, and the root
   cause is: 'nomsi' will disable MSI and let devices and root ports use
   legacy INTX interrupt, and likely make several devices/ports share one
   interrupt. In the failure case, BIOS doesn't disable the pcie hotplug
   interrupts, and actually asserts the command-complete interrupt.

More details could be found in commit log of patch 2/4 and 4/4. Basically:
    Patch 0001 moves the PCIe hotplug command waiting funtion from pciehp
               driver to PCIe port driver for code reuse.
    Patch 0002 adds the necessary wait for PCIe hotplug command
    Patch 0003 loose the condition check for interrupt disabling
    Patch 0004 for msi disabled case, disable PCIe hotplug interrupt in
               early boot phase 

Please help to review, thanks!

- Feng

Changelog:

  since v2:
    * Add patch 0001, which move the waiting logic of pcie_poll_cmd from pciehp
      driver to PCIe port driver for code reuse (Bjorn Helgaas)
    * Separate Lucas' suggestion out as patch 0003 (Bjorn and Sathyanarayanan)  
    * Avoid hotplug command waiting for HW without command-complete
      event support (Bjorn Helgaas)
    * Fix spell issue in commit log (Bjorn and Markus)
    * Add cover-letter for whole patchset (Markus Elfring)
    * Handle a set-but-unused build warning (0Day lkp bot)

  since v1:
    * Add the Originally-by for Liguang for patch 0002. The issue was found on
      a 5.10 kernel, then 6.6. I was initially given a 5.10 kernel tar ball
      without git info to debug the issue, and made the patch. Thanks to Guanghui
      who recently pointed me to tree https://gitee.com/anolis/cloud-kernel which
      show the wait logic in 5.10 was originally from Liguang, and never hit
      mainline.
    * Make the irq disabling not dependent on wthether pciehp service driver
      will be loaded (Lukas Wunner) 
    * Use read_poll_timeout() API to simply the waiting logic (Sathyanarayanan
      Kuppuswamy)
    * Fix wrong email address (Markus Elfring)
    * Add logic to skip irq disabling if it is already disabled.


Feng Tang (4):
  PCI: portdrv: pciehp: Move PCIe hotplug command waiting logic to port
    driver
  PCI/portdrv: Add necessary wait for disabling hotplug events
  PCI/portdrv: Loose the condition check for disabling hotplug
    interrupts
  PCI: Disable PCIe hotplug interrupts early when msi is disabled

 drivers/pci/hotplug/pciehp_hpc.c | 38 ++++++------------------
 drivers/pci/pci.h                |  7 +++++
 drivers/pci/pcie/portdrv.c       | 50 ++++++++++++++++++++++++++++----
 drivers/pci/probe.c              |  9 ++++++
 4 files changed, 70 insertions(+), 34 deletions(-)

-- 
2.43.5


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3 1/4] PCI: portdrv: pciehp: Move PCIe hotplug command waiting logic to port driver
  2025-02-24  3:44 [PATCH v3 0/4] PCIe hotplug interrupt related fixes Feng Tang
@ 2025-02-24  3:44 ` Feng Tang
  2025-02-24 15:06   ` Ilpo Järvinen
  2025-02-24  3:44 ` [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events Feng Tang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Feng Tang @ 2025-02-24  3:44 UTC (permalink / raw)
  To: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
	Liguang Zhang, Guanghui Feng, rafael
  Cc: Markus Elfring, lkp, Jonathan Cameron, ilpo.jarvinen, linux-pci,
	linux-kernel, Feng Tang

According to PCIe spec, after sending a hotplug command, software should
wait some time for the command completion. Currently the waiting logic
is implemented in pciehp driver, as the same logic will be reused by
PCIe port driver, move it to port driver, which complies with the logic
of CONFIG_HOTPLUG_PCI_PCIE depending on CONFIG_PCIEPORTBUS.

Also convert the loop wait logic to helper read_poll_timeout() as
suggested by Sathyanarayanan Kuppuswamy.

Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
 drivers/pci/hotplug/pciehp_hpc.c | 38 ++++++++------------------------
 drivers/pci/pci.h                |  5 +++++
 drivers/pci/pcie/portdrv.c       | 25 +++++++++++++++++++++
 3 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index bb5a8d9f03ad..24e346f558db 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -83,32 +83,6 @@ static inline void pciehp_free_irq(struct controller *ctrl)
 		free_irq(ctrl->pcie->irq, ctrl);
 }
 
-static int pcie_poll_cmd(struct controller *ctrl, int timeout)
-{
-	struct pci_dev *pdev = ctrl_dev(ctrl);
-	u16 slot_status;
-
-	do {
-		pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
-		if (PCI_POSSIBLE_ERROR(slot_status)) {
-			ctrl_info(ctrl, "%s: no response from device\n",
-				  __func__);
-			return 0;
-		}
-
-		if (slot_status & PCI_EXP_SLTSTA_CC) {
-			pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
-						   PCI_EXP_SLTSTA_CC);
-			ctrl->cmd_busy = 0;
-			smp_mb();
-			return 1;
-		}
-		msleep(10);
-		timeout -= 10;
-	} while (timeout >= 0);
-	return 0;	/* timeout */
-}
-
 static void pcie_wait_cmd(struct controller *ctrl)
 {
 	unsigned int msecs = pciehp_poll_mode ? 2500 : 1000;
@@ -138,10 +112,16 @@ static void pcie_wait_cmd(struct controller *ctrl)
 		timeout = cmd_timeout - now;
 
 	if (ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE &&
-	    ctrl->slot_ctrl & PCI_EXP_SLTCTL_CCIE)
+	    ctrl->slot_ctrl & PCI_EXP_SLTCTL_CCIE) {
 		rc = wait_event_timeout(ctrl->queue, !ctrl->cmd_busy, timeout);
-	else
-		rc = pcie_poll_cmd(ctrl, jiffies_to_msecs(timeout));
+	} else {
+		rc = pcie_poll_sltctl_cmd(ctrl_dev(ctrl), jiffies_to_msecs(timeout));
+		if (!rc) {
+			ctrl->cmd_busy = 0;
+			smp_mb();
+			rc = 1;
+		}
+	}
 
 	if (!rc)
 		ctrl_info(ctrl, "Timeout on hotplug command %#06x (issued %u msec ago)\n",
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 01e51db8d285..4c94a589de4a 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -759,12 +759,17 @@ static inline void pcie_ecrc_get_policy(char *str) { }
 #ifdef CONFIG_PCIEPORTBUS
 void pcie_reset_lbms_count(struct pci_dev *port);
 int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
+int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms);
 #else
 static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
 static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
 {
 	return -EOPNOTSUPP;
 }
+static inline int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
+{
+	return 0;
+}
 #endif
 
 struct pci_dev_reset_methods {
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 02e73099bad0..bb00ba45ee51 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -18,6 +18,7 @@
 #include <linux/string.h>
 #include <linux/slab.h>
 #include <linux/aer.h>
+#include <linux/iopoll.h>
 
 #include "../pci.h"
 #include "portdrv.h"
@@ -205,6 +206,30 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
 	return 0;
 }
 
+/* Return 0 on command completed on time, otherwise return -ETIMEOUT */
+int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
+{
+	u16 slot_status = 0;
+	u32 slot_cap;
+	int ret = 0;
+	int __maybe_unused ret1;
+
+	/* Don't wait if the command complete event is not well supported */
+	pcie_capability_read_dword(dev, PCI_EXP_SLTCAP, &slot_cap);
+	if (!(slot_cap & PCI_EXP_SLTCAP_HPC) || slot_cap & PCI_EXP_SLTCAP_NCCS)
+		return ret;
+
+	ret = read_poll_timeout(pcie_capability_read_word, ret1,
+				(slot_status & PCI_EXP_SLTSTA_CC), 10000,
+				timeout_ms * 1000, true, dev, PCI_EXP_SLTSTA,
+				&slot_status);
+	if (!ret)
+		pcie_capability_write_word(dev, PCI_EXP_SLTSTA,
+						PCI_EXP_SLTSTA_CC);
+
+	return  ret;
+}
+
 /**
  * get_port_device_capability - discover capabilities of a PCI Express port
  * @dev: PCI Express port to examine
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
  2025-02-24  3:44 [PATCH v3 0/4] PCIe hotplug interrupt related fixes Feng Tang
  2025-02-24  3:44 ` [PATCH v3 1/4] PCI: portdrv: pciehp: Move PCIe hotplug command waiting logic to port driver Feng Tang
@ 2025-02-24  3:44 ` Feng Tang
  2025-02-24  9:42   ` Markus Elfring
                     ` (2 more replies)
  2025-02-24  3:44 ` [PATCH v3 3/4] PCI/portdrv: Loose the condition check for disabling hotplug interrupts Feng Tang
  2025-02-24  3:45 ` [PATCH v3 4/4] PCI: Disable PCIe hotplug interrupts early when msi is disabled Feng Tang
  3 siblings, 3 replies; 21+ messages in thread
From: Feng Tang @ 2025-02-24  3:44 UTC (permalink / raw)
  To: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
	Liguang Zhang, Guanghui Feng, rafael
  Cc: Markus Elfring, lkp, Jonathan Cameron, ilpo.jarvinen, linux-pci,
	linux-kernel, Feng Tang

There was problem reported by firmware developers that they received
two PCIe hotplug commands in very short intervals on an ARM server,
which doesn't comply with PCIe spec, and broke their state machine and
work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
to wait at least 1 second for the command-complete event, before
resending the command or sending a new command.

In the failure case, the first PCIe hotplug command firmware received
is from get_port_device_capability(), which sends command to disable
PCIe hotplug interrupts without waiting for its completion, and the
second command comes from pcie_enable_notification() of pciehp driver,
which enables hotplug interrupts again.

Fix it by adding the necessary wait to comply with PCIe spec.

Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
Originally-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
 drivers/pci/pci.h          |  2 ++
 drivers/pci/pcie/portdrv.c | 19 +++++++++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4c94a589de4a..a1138ebc2689 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -760,6 +760,7 @@ static inline void pcie_ecrc_get_policy(char *str) { }
 void pcie_reset_lbms_count(struct pci_dev *port);
 int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
 int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms);
+void pcie_disable_hp_interrupts_early(struct pci_dev *dev);
 #else
 static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
 static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
@@ -770,6 +771,7 @@ static inline int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
 {
 	return 0;
 }
+static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {}
 #endif
 
 struct pci_dev_reset_methods {
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index bb00ba45ee51..ca4f21dff486 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -230,6 +230,22 @@ int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
 	return  ret;
 }
 
+void pcie_disable_hp_interrupts_early(struct pci_dev *dev)
+{
+	u16 slot_ctrl = 0;
+
+	pcie_capability_read_word(dev, PCI_EXP_SLTCTL, &slot_ctrl);
+	/* Bail out early if it is already disabled */
+	if (!(slot_ctrl & (PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE)))
+		return;
+
+	pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
+		  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
+
+	if (pcie_poll_sltctl_cmd(dev, 1000))
+		pci_info(dev, "Timeout on disabling PCIe hot-plug interrupt\n");
+}
+
 /**
  * get_port_device_capability - discover capabilities of a PCI Express port
  * @dev: PCI Express port to examine
@@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
 		 * Disable hot-plug interrupts in case they have been enabled
 		 * by the BIOS and the hot-plug service driver is not loaded.
 		 */
-		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
-			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
+		pcie_disable_hp_interrupts_early(dev);
 	}
 
 #ifdef CONFIG_PCIEAER
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 3/4] PCI/portdrv: Loose the condition check for disabling hotplug interrupts
  2025-02-24  3:44 [PATCH v3 0/4] PCIe hotplug interrupt related fixes Feng Tang
  2025-02-24  3:44 ` [PATCH v3 1/4] PCI: portdrv: pciehp: Move PCIe hotplug command waiting logic to port driver Feng Tang
  2025-02-24  3:44 ` [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events Feng Tang
@ 2025-02-24  3:44 ` Feng Tang
  2025-02-24  9:47   ` Markus Elfring
  2025-02-25  4:09   ` Lukas Wunner
  2025-02-24  3:45 ` [PATCH v3 4/4] PCI: Disable PCIe hotplug interrupts early when msi is disabled Feng Tang
  3 siblings, 2 replies; 21+ messages in thread
From: Feng Tang @ 2025-02-24  3:44 UTC (permalink / raw)
  To: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
	Liguang Zhang, Guanghui Feng, rafael
  Cc: Markus Elfring, lkp, Jonathan Cameron, ilpo.jarvinen, linux-pci,
	linux-kernel, Feng Tang

Currently when 'pcie_ports_native' and host's 'native_pcie_hotplug' are
both false, kernel will not disable PCIe hotplug interrupts. But as
those could be affected by software setup like kernel cmdline parameter,
remove the depency over them.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
 drivers/pci/pcie/portdrv.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index ca4f21dff486..619d06b1b3e8 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -263,9 +263,9 @@ static int get_port_device_capability(struct pci_dev *dev)
 
 	if (dev->is_hotplug_bridge &&
 	    (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
-	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM) &&
-	    (pcie_ports_native || host->native_pcie_hotplug)) {
-		services |= PCIE_PORT_SERVICE_HP;
+	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
+		if  (pcie_ports_native || host->native_pcie_hotplug)
+			services |= PCIE_PORT_SERVICE_HP;
 
 		/*
 		 * Disable hot-plug interrupts in case they have been enabled
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 4/4] PCI: Disable PCIe hotplug interrupts early when msi is disabled
  2025-02-24  3:44 [PATCH v3 0/4] PCIe hotplug interrupt related fixes Feng Tang
                   ` (2 preceding siblings ...)
  2025-02-24  3:44 ` [PATCH v3 3/4] PCI/portdrv: Loose the condition check for disabling hotplug interrupts Feng Tang
@ 2025-02-24  3:45 ` Feng Tang
  3 siblings, 0 replies; 21+ messages in thread
From: Feng Tang @ 2025-02-24  3:45 UTC (permalink / raw)
  To: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
	Liguang Zhang, Guanghui Feng, rafael
  Cc: Markus Elfring, lkp, Jonathan Cameron, ilpo.jarvinen, linux-pci,
	linux-kernel, Feng Tang

There was an irq storm bug when testing "pci=nomsi" case, and the root
cause is: 'nomsi' will disable MSI and let devices and root ports use
legacy INTX interrupt, and likely make several devices/ports share one
interrupt. In the failure case, BIOS doesn't disable the PCIe hotplug
interrupts, and the command-complete interrupt was actually asserted.

So the timeline is:
1. pciehp's CCIE/HPIE enabled and command-complete interrupts asserted
2. the interrupt is shared by PCIe root port and nvme/nic device
3. nvme/nic driver's probe function enables the interrupt line
4. pciehp driver is loaded later or never loaded

And the "nobody cared irq storm" happens between 3 and 4. This is not
an issue for normal MSI case, as each interrupt is controlled by its own
driver. When the driver is not loaded, the interrupt won't get fired
to kernel even if it is physically asserted.

So disable the PCIe hotplug CCIE/HPIE interrupt in early boot phase when
MSI is not enabled.

Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
 drivers/pci/probe.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 246744d8d268..ffea7851366a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1665,6 +1665,15 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev)
 	pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &reg32);
 	if (reg32 & PCI_EXP_SLTCAP_HPC)
 		pdev->is_hotplug_bridge = 1;
+
+	/*
+	 * When MSI is disabled, root port will use legacy INTX, and likely
+	 * share INTX interrupt line with other devices like NIC/NVME. There
+	 * was real world issue that the CCIE IRQ is asserted afer boot, but
+	 * will not be handled well and cause IRQ storm. So disable it early.
+	 */
+	if (!pci_msi_enabled())
+		pcie_disable_hp_interrupts_early(pdev);
 }
 
 static void set_pcie_thunderbolt(struct pci_dev *dev)
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
  2025-02-24  3:44 ` [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events Feng Tang
@ 2025-02-24  9:42   ` Markus Elfring
  2025-02-24 15:00   ` Ilpo Järvinen
  2025-02-24 18:12   ` Lukas Wunner
  2 siblings, 0 replies; 21+ messages in thread
From: Markus Elfring @ 2025-02-24  9:42 UTC (permalink / raw)
  To: Feng Tang, linux-pci, Bjorn Helgaas, Guanghui Feng, Liguang Zhang,
	Lukas Wunner, Rafael J. Wysocki, Sathyanarayanan Kuppuswamy
  Cc: lkp, LKML, Ilpo Järvinen, Jonathan Cameron

…
> work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
…

                                   specification?

Regards,
Markus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/4] PCI/portdrv: Loose the condition check for disabling hotplug interrupts
  2025-02-24  3:44 ` [PATCH v3 3/4] PCI/portdrv: Loose the condition check for disabling hotplug interrupts Feng Tang
@ 2025-02-24  9:47   ` Markus Elfring
  2025-02-25  4:09   ` Lukas Wunner
  1 sibling, 0 replies; 21+ messages in thread
From: Markus Elfring @ 2025-02-24  9:47 UTC (permalink / raw)
  To: Feng Tang, linux-pci, Bjorn Helgaas, Guanghui Feng, Liguang Zhang,
	Lukas Wunner, Rafael J. Wysocki, Sathyanarayanan Kuppuswamy
  Cc: lkp, LKML, Ilpo Järvinen, Jonathan Cameron

…
> remove the depency over them.

             dependency?

Regards,
Markus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
  2025-02-24  3:44 ` [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events Feng Tang
  2025-02-24  9:42   ` Markus Elfring
@ 2025-02-24 15:00   ` Ilpo Järvinen
  2025-02-25  5:51     ` Feng Tang
  2025-02-24 18:12   ` Lukas Wunner
  2 siblings, 1 reply; 21+ messages in thread
From: Ilpo Järvinen @ 2025-02-24 15:00 UTC (permalink / raw)
  To: Feng Tang
  Cc: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
	Liguang Zhang, Guanghui Feng, Rafael J. Wysocki, Markus Elfring,
	lkp, Jonathan Cameron, linux-pci, LKML

On Mon, 24 Feb 2025, Feng Tang wrote:

> There was problem reported by firmware developers that they received
> two PCIe hotplug commands in very short intervals on an ARM server,
> which doesn't comply with PCIe spec, and broke their state machine and
> work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
> to wait at least 1 second for the command-complete event, before
> resending the command or sending a new command.
> 
> In the failure case, the first PCIe hotplug command firmware received
> is from get_port_device_capability(), which sends command to disable
> PCIe hotplug interrupts without waiting for its completion, and the
> second command comes from pcie_enable_notification() of pciehp driver,
> which enables hotplug interrupts again.
> 
> Fix it by adding the necessary wait to comply with PCIe spec.
> 
> Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
> Originally-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> ---
>  drivers/pci/pci.h          |  2 ++
>  drivers/pci/pcie/portdrv.c | 19 +++++++++++++++++--
>  2 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 4c94a589de4a..a1138ebc2689 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -760,6 +760,7 @@ static inline void pcie_ecrc_get_policy(char *str) { }
>  void pcie_reset_lbms_count(struct pci_dev *port);
>  int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
>  int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms);
> +void pcie_disable_hp_interrupts_early(struct pci_dev *dev);
>  #else
>  static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
>  static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
> @@ -770,6 +771,7 @@ static inline int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
>  {
>  	return 0;
>  }
> +static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {}
>  #endif
>  
>  struct pci_dev_reset_methods {
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index bb00ba45ee51..ca4f21dff486 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -230,6 +230,22 @@ int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
>  	return  ret;
>  }
>  
> +void pcie_disable_hp_interrupts_early(struct pci_dev *dev)
> +{
> +	u16 slot_ctrl = 0;

Unnecessary initialization

> +
> +	pcie_capability_read_word(dev, PCI_EXP_SLTCTL, &slot_ctrl);
> +	/* Bail out early if it is already disabled */
> +	if (!(slot_ctrl & (PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE)))
> +		return;
> +
> +	pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> +		  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);

Align to (. You might need to put the bits to own lines.

> +
> +	if (pcie_poll_sltctl_cmd(dev, 1000))
> +		pci_info(dev, "Timeout on disabling PCIe hot-plug interrupt\n");
> +}
> +
>  /**
>   * get_port_device_capability - discover capabilities of a PCI Express port
>   * @dev: PCI Express port to examine
> @@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
>  		 * Disable hot-plug interrupts in case they have been enabled
>  		 * by the BIOS and the hot-plug service driver is not loaded.
>  		 */
> -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> +		pcie_disable_hp_interrupts_early(dev);

Doesn't calling this here delay setup for all portdrv services, not just 
hotplug? And the delay can be relatively long.

>  	}
>  
>  #ifdef CONFIG_PCIEAER
> 

-- 
 i.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 1/4] PCI: portdrv: pciehp: Move PCIe hotplug command waiting logic to port driver
  2025-02-24  3:44 ` [PATCH v3 1/4] PCI: portdrv: pciehp: Move PCIe hotplug command waiting logic to port driver Feng Tang
@ 2025-02-24 15:06   ` Ilpo Järvinen
  2025-02-25  8:18     ` Feng Tang
  0 siblings, 1 reply; 21+ messages in thread
From: Ilpo Järvinen @ 2025-02-24 15:06 UTC (permalink / raw)
  To: Feng Tang
  Cc: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
	Liguang Zhang, Guanghui Feng, Rafael J. Wysocki, Markus Elfring,
	lkp, Jonathan Cameron, linux-pci, LKML

On Mon, 24 Feb 2025, Feng Tang wrote:

> According to PCIe spec, after sending a hotplug command, software should
> wait some time for the command completion. Currently the waiting logic

Where is it in the spec, please put a more precise reference.

> is implemented in pciehp driver, as the same logic will be reused by
> PCIe port driver, move it to port driver, which complies with the logic
> of CONFIG_HOTPLUG_PCI_PCIE depending on CONFIG_PCIEPORTBUS.
> 
> Also convert the loop wait logic to helper read_poll_timeout() as
> suggested by Sathyanarayanan Kuppuswamy.

You could express the second part of this with a tag:

Suggested-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com> # Use to read_poll_timeout()

> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> ---
>  drivers/pci/hotplug/pciehp_hpc.c | 38 ++++++++------------------------
>  drivers/pci/pci.h                |  5 +++++
>  drivers/pci/pcie/portdrv.c       | 25 +++++++++++++++++++++
>  3 files changed, 39 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index bb5a8d9f03ad..24e346f558db 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -83,32 +83,6 @@ static inline void pciehp_free_irq(struct controller *ctrl)
>  		free_irq(ctrl->pcie->irq, ctrl);
>  }
>  
> -static int pcie_poll_cmd(struct controller *ctrl, int timeout)
> -{
> -	struct pci_dev *pdev = ctrl_dev(ctrl);
> -	u16 slot_status;
> -
> -	do {
> -		pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
> -		if (PCI_POSSIBLE_ERROR(slot_status)) {
> -			ctrl_info(ctrl, "%s: no response from device\n",
> -				  __func__);
> -			return 0;
> -		}
> -
> -		if (slot_status & PCI_EXP_SLTSTA_CC) {
> -			pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> -						   PCI_EXP_SLTSTA_CC);
> -			ctrl->cmd_busy = 0;
> -			smp_mb();
> -			return 1;
> -		}
> -		msleep(10);
> -		timeout -= 10;
> -	} while (timeout >= 0);
> -	return 0;	/* timeout */
> -}
> -
>  static void pcie_wait_cmd(struct controller *ctrl)
>  {
>  	unsigned int msecs = pciehp_poll_mode ? 2500 : 1000;
> @@ -138,10 +112,16 @@ static void pcie_wait_cmd(struct controller *ctrl)
>  		timeout = cmd_timeout - now;
>  
>  	if (ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE &&
> -	    ctrl->slot_ctrl & PCI_EXP_SLTCTL_CCIE)
> +	    ctrl->slot_ctrl & PCI_EXP_SLTCTL_CCIE) {
>  		rc = wait_event_timeout(ctrl->queue, !ctrl->cmd_busy, timeout);
> -	else
> -		rc = pcie_poll_cmd(ctrl, jiffies_to_msecs(timeout));
> +	} else {
> +		rc = pcie_poll_sltctl_cmd(ctrl_dev(ctrl), jiffies_to_msecs(timeout));
> +		if (!rc) {
> +			ctrl->cmd_busy = 0;
> +			smp_mb();
> +			rc = 1;
> +		}
> +	}
>  
>  	if (!rc)
>  		ctrl_info(ctrl, "Timeout on hotplug command %#06x (issued %u msec ago)\n",
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 01e51db8d285..4c94a589de4a 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -759,12 +759,17 @@ static inline void pcie_ecrc_get_policy(char *str) { }
>  #ifdef CONFIG_PCIEPORTBUS
>  void pcie_reset_lbms_count(struct pci_dev *port);
>  int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
> +int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms);
>  #else
>  static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
>  static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
>  {
>  	return -EOPNOTSUPP;
>  }
> +static inline int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
> +{
> +	return 0;
> +}
>  #endif
>  
>  struct pci_dev_reset_methods {
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 02e73099bad0..bb00ba45ee51 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -18,6 +18,7 @@
>  #include <linux/string.h>
>  #include <linux/slab.h>
>  #include <linux/aer.h>
> +#include <linux/iopoll.h>
>  
>  #include "../pci.h"
>  #include "portdrv.h"
> @@ -205,6 +206,30 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
>  	return 0;
>  }
>  
> +/* Return 0 on command completed on time, otherwise return -ETIMEOUT */

Since you're making this visible outside of the file, please document this 
properly using a kerneldoc compliant comment.

> +int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
> +{
> +	u16 slot_status = 0;
> +	u32 slot_cap;
> +	int ret = 0;

Unnecessary initialization.

> +	int __maybe_unused ret1;
> +
> +	/* Don't wait if the command complete event is not well supported */
> +	pcie_capability_read_dword(dev, PCI_EXP_SLTCAP, &slot_cap);
> +	if (!(slot_cap & PCI_EXP_SLTCAP_HPC) || slot_cap & PCI_EXP_SLTCAP_NCCS)
> +		return ret;
> +
> +	ret = read_poll_timeout(pcie_capability_read_word, ret1,
> +				(slot_status & PCI_EXP_SLTSTA_CC), 10000,
> +				timeout_ms * 1000, true, dev, PCI_EXP_SLTSTA,

Replace:
        10000 -> 10 * USEC_PER_MSEC
        timeout_ms * 1000 -> USEC_PER_SEC (the variable can be dropped)

Please also check you have linux/units.h included for those defines.

> +				&slot_status);
> +	if (!ret)

Use the normal error handling logic by reversing the condition.

> +		pcie_capability_write_word(dev, PCI_EXP_SLTSTA,
> +						PCI_EXP_SLTSTA_CC);
> +
> +	return  ret;

Remove extra space but this will become return 0; once the error handling 
is done with the usual pattern.

> +}
> +
>  /**
>   * get_port_device_capability - discover capabilities of a PCI Express port
>   * @dev: PCI Express port to examine
> 

-- 
 i.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
  2025-02-24  3:44 ` [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events Feng Tang
  2025-02-24  9:42   ` Markus Elfring
  2025-02-24 15:00   ` Ilpo Järvinen
@ 2025-02-24 18:12   ` Lukas Wunner
  2025-02-25  3:06     ` Feng Tang
  2 siblings, 1 reply; 21+ messages in thread
From: Lukas Wunner @ 2025-02-24 18:12 UTC (permalink / raw)
  To: Feng Tang
  Cc: Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Liguang Zhang,
	Guanghui Feng, rafael, Markus Elfring, lkp, Jonathan Cameron,
	ilpo.jarvinen, linux-pci, linux-kernel

On Mon, Feb 24, 2025 at 11:44:58AM +0800, Feng Tang wrote:
> @@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
>  		 * Disable hot-plug interrupts in case they have been enabled
>  		 * by the BIOS and the hot-plug service driver is not loaded.
>  		 */
> -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> +		pcie_disable_hp_interrupts_early(dev);
>  	}

Moving the Slot Control code from pciehp to portdrv (as is done in
patch 1 of this series) is hackish.  It should be avoided if at all
possible.

As I've already said before...

https://lore.kernel.org/all/Z6HYuBDP6uvE1Sf4@wunner.de/

...it seems clearing those interrupts is only necessary
in the CONFIG_HOTPLUG_PCI_PCIE=n case.  And in that case,
there is no second Slot Control write, so there is no delay
to be observed.

Hence the proper solution is to make the clearing of the
interrupts conditional on: !IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE)

You keep sending new versions of these patches that do not
incorporate the feedback I provided in the above-linked e-mail.

Please re-read that e-mail and verify if the solution that
I proposed solves the problem.  That solution does not
require moving the Slot Control code from pciehp to portdrv.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
  2025-02-24 18:12   ` Lukas Wunner
@ 2025-02-25  3:06     ` Feng Tang
  2025-02-25  4:01       ` Lukas Wunner
  0 siblings, 1 reply; 21+ messages in thread
From: Feng Tang @ 2025-02-25  3:06 UTC (permalink / raw)
  To: Lukas Wunner, rafael
  Cc: Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Liguang Zhang,
	Guanghui Feng, rafael, Markus Elfring, lkp, Jonathan Cameron,
	ilpo.jarvinen, linux-pci, linux-kernel

Hi Lukas,

On Mon, Feb 24, 2025 at 07:12:11PM +0100, Lukas Wunner wrote:
> On Mon, Feb 24, 2025 at 11:44:58AM +0800, Feng Tang wrote:
> > @@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
> >  		 * Disable hot-plug interrupts in case they have been enabled
> >  		 * by the BIOS and the hot-plug service driver is not loaded.
> >  		 */
> > -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> > +		pcie_disable_hp_interrupts_early(dev);
> >  	}
> 
> Moving the Slot Control code from pciehp to portdrv (as is done in
> patch 1 of this series) is hackish.  It should be avoided if at all
> possible.

I tried to remove the code duplication of 2 waiting function, according
to Bjorn's comment in https://lore.kernel.org/lkml/20250218223354.GA196886@bhelgaas/.
Maybe I didn't git it right. Any suggestion?

> 
> As I've already said before...
> 
> https://lore.kernel.org/all/Z6HYuBDP6uvE1Sf4@wunner.de/
> 
> ...it seems clearing those interrupts is only necessary
> in the CONFIG_HOTPLUG_PCI_PCIE=n case.  And in that case,
> there is no second Slot Control write, so there is no delay
> to be observed.
> 
> Hence the proper solution is to make the clearing of the
> interrupts conditional on: !IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE)
> 
> You keep sending new versions of these patches that do not
> incorporate the feedback I provided in the above-linked e-mail.
> 
> Please re-read that e-mail and verify if the solution that
> I proposed solves the problem.  That solution does not
> require moving the Slot Control code from pciehp to portdrv.

There might be some misunderstaning here :), I responded in
https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/
that your suggestion could solve our issue.

And the reason I didn't take it is I was afraid that it might hurt
the problem what commit 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port
services during port initialization") tried to solve.

As you mentioned, the comment for 2bd50dd800b5 is "a bit confusing",
and I tried to guess in my previous reply: 
"
I'm not sure what problem this piece of code try to avoid, maybe
something simliar to the irq storm isseu as mentioned in the 2/2 patch?
The code comments could be about the small time window between this
point and the loading of pciehp driver, which will request_irq and
enable hotplug interrupt again.
"

The code comment from 2bd50dd800b5 is:

	/*
	 * Disable hot-plug interrupts in case they have been
	 * enabled by the BIOS and the hot-plug service driver
	 * is not loaded.
	 */

The "is not loaded" has 2 possible meanings:
1. the pciehp driver is not loaded yet at this point inside
   get_port_device_capability(), and will be loaded later
2. the pciehp will never be loaded, i.e. CONFIG_HOTPLUG_PCI_PCIE=n 

If it's case 2, your suggestion can solve it nicely, but for case 1,
we may have to keep the interrupt disabling.

Thanks,
Feng

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
  2025-02-25  3:06     ` Feng Tang
@ 2025-02-25  4:01       ` Lukas Wunner
  2025-02-25  4:42         ` Feng Tang
  0 siblings, 1 reply; 21+ messages in thread
From: Lukas Wunner @ 2025-02-25  4:01 UTC (permalink / raw)
  To: Feng Tang
  Cc: rafael, Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Liguang Zhang,
	Guanghui Feng, Markus Elfring, lkp, Jonathan Cameron,
	ilpo.jarvinen, linux-pci, linux-kernel

On Tue, Feb 25, 2025 at 11:06:50AM +0800, Feng Tang wrote:
> On Mon, Feb 24, 2025 at 07:12:11PM +0100, Lukas Wunner wrote:
> > On Mon, Feb 24, 2025 at 11:44:58AM +0800, Feng Tang wrote:
> > > @@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
> > >  		 * Disable hot-plug interrupts in case they have been enabled
> > >  		 * by the BIOS and the hot-plug service driver is not loaded.
> > >  		 */
> > > -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > > -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> > > +		pcie_disable_hp_interrupts_early(dev);
> > >  	}
> > 
> > Moving the Slot Control code from pciehp to portdrv (as is done in
> > patch 1 of this series) is hackish.  It should be avoided if at all
> > possible.
> 
> I tried to remove the code duplication of 2 waiting function, according
> to Bjorn's comment in https://lore.kernel.org/lkml/20250218223354.GA196886@bhelgaas/.
> Maybe I didn't git it right. Any suggestion?

My point is just that you may not need to move the code from pciehp to
portdrv at all if you follow my suggestion.


> There might be some misunderstaning here :), I responded in
> https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/
> that your suggestion could solve our issue.

Well, could you test it please?

A small change like the one I proposed is definitely preferable to
moving dozens of lines of code around.


> And the reason I didn't take it is I was afraid that it might hurt
> the problem what commit 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port
> services during port initialization") tried to solve.
> 
> As you mentioned, the comment for 2bd50dd800b5 is "a bit confusing",
> and I tried to guess in my previous reply: 
> "
> I'm not sure what problem this piece of code try to avoid, maybe
> something simliar to the irq storm isseu as mentioned in the 2/2 patch?
> The code comments could be about the small time window between this
> point and the loading of pciehp driver, which will request_irq and
> enable hotplug interrupt again.
> "
> 
> The code comment from 2bd50dd800b5 is:
> 
> 	/*
> 	 * Disable hot-plug interrupts in case they have been
> 	 * enabled by the BIOS and the hot-plug service driver
> 	 * is not loaded.
> 	 */
> 
> The "is not loaded" has 2 possible meanings:
> 1. the pciehp driver is not loaded yet at this point inside
>    get_port_device_capability(), and will be loaded later
> 2. the pciehp will never be loaded, i.e. CONFIG_HOTPLUG_PCI_PCIE=n 
> 
> If it's case 2, your suggestion can solve it nicely, but for case 1,
> we may have to keep the interrupt disabling.

The pciehp driver cannot be bound to the PCIe port when
get_port_device_capability() is running.  Because at that point,
portdrv is still figuring out which capabilities the port has and
it will subsequently instantiate the port service devices to which
the drivers (such as pciehp) will bind.

So in that sense, case 1 cannot be what the code comment is
referring to.

My point is that if CONFIG_HOTPLUG_PCI_PCIE=y, there may indeed be
another write to the Slot Control register before the command written
by portdrv has been completed.  Because pciehp will write to the
register on probe.  But in this case, there shouldn't be a need for
portdrv to quiesce the interrupt because pciehp will do that anyway
shortly afterwards.

And in the CONFIG_HOTPLUG_PCI_PCIE=n case, pciehp will not quiesce
the interrupt, so portdrv has to do that.  I believe that's what
the code comment is referring to.  It should be safe to just write
to the Slot Control register without waiting for the command to
complete because there shouldn't be another Slot Control write
afterwards (not by pciehp anyway).

If making the Slot Control write in portdrv conditional on
CONFIG_HOTPLUG_PCI_PCIE=n does unexpectedly *not* solve the issue,
please try to find out where the second Slot Control write is
coming from.  E.g. you could amend pcie_capability_write_word()
with something like:

	if (pos == PCI_EXP_SLTCTL) {
		pci_info(dev, "Writing %04hx SltCtl\n", val);
		dump_stack();
	}

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/4] PCI/portdrv: Loose the condition check for disabling hotplug interrupts
  2025-02-24  3:44 ` [PATCH v3 3/4] PCI/portdrv: Loose the condition check for disabling hotplug interrupts Feng Tang
  2025-02-24  9:47   ` Markus Elfring
@ 2025-02-25  4:09   ` Lukas Wunner
  2025-02-26  2:54     ` Feng Tang
  1 sibling, 1 reply; 21+ messages in thread
From: Lukas Wunner @ 2025-02-25  4:09 UTC (permalink / raw)
  To: Feng Tang
  Cc: Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Liguang Zhang,
	Guanghui Feng, rafael, Markus Elfring, lkp, Jonathan Cameron,
	ilpo.jarvinen, linux-pci, linux-kernel

On Mon, Feb 24, 2025 at 11:44:59AM +0800, Feng Tang wrote:
> Currently when 'pcie_ports_native' and host's 'native_pcie_hotplug' are
> both false, kernel will not disable PCIe hotplug interrupts. But as
> those could be affected by software setup like kernel cmdline parameter,
> remove the depency over them.
[...]
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -263,9 +263,9 @@ static int get_port_device_capability(struct pci_dev *dev)
>  
>  	if (dev->is_hotplug_bridge &&
>  	    (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> -	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM) &&
> -	    (pcie_ports_native || host->native_pcie_hotplug)) {
> -		services |= PCIE_PORT_SERVICE_HP;
> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> +		if  (pcie_ports_native || host->native_pcie_hotplug)
> +			services |= PCIE_PORT_SERVICE_HP;
>  
>  		/*
>  		 * Disable hot-plug interrupts in case they have been enabled

If the platform doesn't grant hotplug control to OSPM, OSPM isn't allowed
to disable interrupts behind the platform's back.

So this change doesn't seem safe and we should focus on finding out
why the platform isn't granting hotplug control in the pci=nomsi case.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
  2025-02-25  4:01       ` Lukas Wunner
@ 2025-02-25  4:42         ` Feng Tang
  2025-02-28  6:29           ` Feng Tang
  0 siblings, 1 reply; 21+ messages in thread
From: Feng Tang @ 2025-02-25  4:42 UTC (permalink / raw)
  To: Lukas Wunner, rafael
  Cc: rafael, Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Liguang Zhang,
	Guanghui Feng, Markus Elfring, lkp, Jonathan Cameron,
	ilpo.jarvinen, linux-pci, linux-kernel

Hi Lukas,

On Tue, Feb 25, 2025 at 05:01:43AM +0100, Lukas Wunner wrote:
> On Tue, Feb 25, 2025 at 11:06:50AM +0800, Feng Tang wrote:
> > On Mon, Feb 24, 2025 at 07:12:11PM +0100, Lukas Wunner wrote:
> > > On Mon, Feb 24, 2025 at 11:44:58AM +0800, Feng Tang wrote:
> > > > @@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
> > > >  		 * Disable hot-plug interrupts in case they have been enabled
> > > >  		 * by the BIOS and the hot-plug service driver is not loaded.
> > > >  		 */
> > > > -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > > > -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> > > > +		pcie_disable_hp_interrupts_early(dev);
> > > >  	}
> > > 
> > > Moving the Slot Control code from pciehp to portdrv (as is done in
> > > patch 1 of this series) is hackish.  It should be avoided if at all
> > > possible.
> > 
> > I tried to remove the code duplication of 2 waiting function, according
> > to Bjorn's comment in https://lore.kernel.org/lkml/20250218223354.GA196886@bhelgaas/.
> > Maybe I didn't git it right. Any suggestion?
> 
> My point is just that you may not need to move the code from pciehp to
> portdrv at all if you follow my suggestion.
 
I see.

> 
> > There might be some misunderstaning here :), I responded in
> > https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/
> > that your suggestion could solve our issue.
> 
> Well, could you test it please?

I don't have the hardware right now, will try to get from firmware
developers. But from code logic, your suggestion can surely solve the
issue unless I still miss something. From bug report (also commit log),
the first PCIe hotplug command issued is here, and the second command
comes from pciehp driver. In our kernel config, CONFIG_HOTPLUG_PCI_PCIE=y,
so the first command won't happen, and all following commands come
from pciehp driver, which setup its own waiting for command logic.

> A small change like the one I proposed is definitely preferable to
> moving dozens of lines of code around.

Agree.

> 
> > And the reason I didn't take it is I was afraid that it might hurt
> > the problem what commit 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port
> > services during port initialization") tried to solve.
> > 
> > As you mentioned, the comment for 2bd50dd800b5 is "a bit confusing",
> > and I tried to guess in my previous reply: 
> > "
> > I'm not sure what problem this piece of code try to avoid, maybe
> > something simliar to the irq storm isseu as mentioned in the 2/2 patch?
> > The code comments could be about the small time window between this
> > point and the loading of pciehp driver, which will request_irq and
> > enable hotplug interrupt again.
> > "
> > 
> > The code comment from 2bd50dd800b5 is:
> > 
> > 	/*
> > 	 * Disable hot-plug interrupts in case they have been
> > 	 * enabled by the BIOS and the hot-plug service driver
> > 	 * is not loaded.
> > 	 */
> > 
> > The "is not loaded" has 2 possible meanings:
> > 1. the pciehp driver is not loaded yet at this point inside
> >    get_port_device_capability(), and will be loaded later
> > 2. the pciehp will never be loaded, i.e. CONFIG_HOTPLUG_PCI_PCIE=n 
> > 
> > If it's case 2, your suggestion can solve it nicely, but for case 1,
> > we may have to keep the interrupt disabling.
> 
> The pciehp driver cannot be bound to the PCIe port when
> get_port_device_capability() is running.  Because at that point,
> portdrv is still figuring out which capabilities the port has and
> it will subsequently instantiate the port service devices to which
> the drivers (such as pciehp) will bind.
 
Yes, the time window between here and the following initialization of
pciehp service driver is very small, and your suggestion sounds quite
safe to me.

> So in that sense, case 1 cannot be what the code comment is
> referring to.

Hi Rafel,

Could you help to confirm this?

> 
> My point is that if CONFIG_HOTPLUG_PCI_PCIE=y, there may indeed be
> another write to the Slot Control register before the command written
> by portdrv has been completed.  Because pciehp will write to the
> register on probe.  But in this case, there shouldn't be a need for
> portdrv to quiesce the interrupt because pciehp will do that anyway
> shortly afterwards.
> 
> And in the CONFIG_HOTPLUG_PCI_PCIE=n case, pciehp will not quiesce
> the interrupt, so portdrv has to do that.  I believe that's what
> the code comment is referring to.  It should be safe to just write
> to the Slot Control register without waiting for the command to
> complete because there shouldn't be another Slot Control write
> afterwards (not by pciehp anyway).
> 
> If making the Slot Control write in portdrv conditional on
> CONFIG_HOTPLUG_PCI_PCIE=n does unexpectedly *not* solve the issue,
> please try to find out where the second Slot Control write is
> coming from.  E.g. you could amend pcie_capability_write_word()
> with something like:
> 
> 	if (pos == PCI_EXP_SLTCTL) {
> 		pci_info(dev, "Writing %04hx SltCtl\n", val);
> 		dump_stack();
> 	}

Thanks for the suggestion, and we added similar debug to figure
out them.

Thanks,
Feng

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
  2025-02-24 15:00   ` Ilpo Järvinen
@ 2025-02-25  5:51     ` Feng Tang
  0 siblings, 0 replies; 21+ messages in thread
From: Feng Tang @ 2025-02-25  5:51 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
	Liguang Zhang, Guanghui Feng, Rafael J. Wysocki, Markus Elfring,
	lkp, Jonathan Cameron, linux-pci, LKML

Hi Ilpo, 

On Mon, Feb 24, 2025 at 05:00:03PM +0200, Ilpo Järvinen wrote:
> On Mon, 24 Feb 2025, Feng Tang wrote:
> 
> > There was problem reported by firmware developers that they received
> > two PCIe hotplug commands in very short intervals on an ARM server,
> > which doesn't comply with PCIe spec, and broke their state machine and
> > work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
> > to wait at least 1 second for the command-complete event, before
> > resending the command or sending a new command.
> > 
> > In the failure case, the first PCIe hotplug command firmware received
> > is from get_port_device_capability(), which sends command to disable
> > PCIe hotplug interrupts without waiting for its completion, and the
> > second command comes from pcie_enable_notification() of pciehp driver,
> > which enables hotplug interrupts again.
> > 
> > Fix it by adding the necessary wait to comply with PCIe spec.
> > 
> > Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
> > Originally-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
> > Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> > ---
> >  drivers/pci/pci.h          |  2 ++
> >  drivers/pci/pcie/portdrv.c | 19 +++++++++++++++++--
> >  2 files changed, 19 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 4c94a589de4a..a1138ebc2689 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -760,6 +760,7 @@ static inline void pcie_ecrc_get_policy(char *str) { }
> >  void pcie_reset_lbms_count(struct pci_dev *port);
> >  int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
> >  int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms);
> > +void pcie_disable_hp_interrupts_early(struct pci_dev *dev);
> >  #else
> >  static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
> >  static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
> > @@ -770,6 +771,7 @@ static inline int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
> >  {
> >  	return 0;
> >  }
> > +static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {}
> >  #endif
> >  
> >  struct pci_dev_reset_methods {
> > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> > index bb00ba45ee51..ca4f21dff486 100644
> > --- a/drivers/pci/pcie/portdrv.c
> > +++ b/drivers/pci/pcie/portdrv.c
> > @@ -230,6 +230,22 @@ int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
> >  	return  ret;
> >  }
> >  
> > +void pcie_disable_hp_interrupts_early(struct pci_dev *dev)
> > +{
> > +	u16 slot_ctrl = 0;
> 
> Unnecessary initialization
 
The reason I put it to 0 is, very unlikely, the pcie_capability_read_word()
might fail, and 0 can avoid the early return.

> > +
> > +	pcie_capability_read_word(dev, PCI_EXP_SLTCTL, &slot_ctrl);
> > +	/* Bail out early if it is already disabled */
> > +	if (!(slot_ctrl & (PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE)))
> > +		return;
> > +
> > +	pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > +		  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> 
> Align to (. You might need to put the bits to own lines.

OK.

> > +
> > +	if (pcie_poll_sltctl_cmd(dev, 1000))
> > +		pci_info(dev, "Timeout on disabling PCIe hot-plug interrupt\n");
> > +}
> > +
> >  /**
> >   * get_port_device_capability - discover capabilities of a PCI Express port
> >   * @dev: PCI Express port to examine
> > @@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
> >  		 * Disable hot-plug interrupts in case they have been enabled
> >  		 * by the BIOS and the hot-plug service driver is not loaded.
> >  		 */
> > -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> > +		pcie_disable_hp_interrupts_early(dev);
> 
> Doesn't calling this here delay setup for all portdrv services, not just 
> hotplug? And the delay can be relatively long.

I don't quite follow, physically there is only one root port, the code
from commit 2bd50dd800b5 just does it once. Also the 1 second is just the
maxim waiting time, the wait will end once the command completed event
bit is set. The exact time depends on platform (x86 and ARM) and the
firmeware implementation, and it is much smaller than 1 second AFAIK.

Thanks,
Feng

> >  	}
> >  
> >  #ifdef CONFIG_PCIEAER
> > 
> 
> -- 
>  i.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 1/4] PCI: portdrv: pciehp: Move PCIe hotplug command waiting logic to port driver
  2025-02-24 15:06   ` Ilpo Järvinen
@ 2025-02-25  8:18     ` Feng Tang
  0 siblings, 0 replies; 21+ messages in thread
From: Feng Tang @ 2025-02-25  8:18 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Bjorn Helgaas, Lukas Wunner, Sathyanarayanan Kuppuswamy,
	Liguang Zhang, Guanghui Feng, Rafael J. Wysocki, Markus Elfring,
	lkp, Jonathan Cameron, linux-pci, LKML

Hi Ilpo,

On Mon, Feb 24, 2025 at 05:06:26PM +0200, Ilpo Järvinen wrote:
> On Mon, 24 Feb 2025, Feng Tang wrote:
> 
> > According to PCIe spec, after sending a hotplug command, software should
> > wait some time for the command completion. Currently the waiting logic
> 
> Where is it in the spec, please put a more precise reference.

Will do, if the patch is kept.
> 
> > is implemented in pciehp driver, as the same logic will be reused by
> > PCIe port driver, move it to port driver, which complies with the logic
> > of CONFIG_HOTPLUG_PCI_PCIE depending on CONFIG_PCIEPORTBUS.
> > 
> > Also convert the loop wait logic to helper read_poll_timeout() as
> > suggested by Sathyanarayanan Kuppuswamy.
> 
> You could express the second part of this with a tag:
> 
> Suggested-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com> # Use to read_poll_timeout()

Thanks for the suggestion!

> 
> > Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> > ---
> >  drivers/pci/hotplug/pciehp_hpc.c | 38 ++++++++------------------------
> >  drivers/pci/pci.h                |  5 +++++
> >  drivers/pci/pcie/portdrv.c       | 25 +++++++++++++++++++++
> >  3 files changed, 39 insertions(+), 29 deletions(-)
[...]

> >  
> > +/* Return 0 on command completed on time, otherwise return -ETIMEOUT */
> 
> Since you're making this visible outside of the file, please document this 
> properly using a kerneldoc compliant comment.

OK.

> 
> > +int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
> > +{
> > +	u16 slot_status = 0;
> > +	u32 slot_cap;
> > +	int ret = 0;
> 
> Unnecessary initialization.

The initialization is actually used below.
> 
> > +	int __maybe_unused ret1;
> > +
> > +	/* Don't wait if the command complete event is not well supported */
> > +	pcie_capability_read_dword(dev, PCI_EXP_SLTCAP, &slot_cap);
> > +	if (!(slot_cap & PCI_EXP_SLTCAP_HPC) || slot_cap & PCI_EXP_SLTCAP_NCCS)
> > +		return ret;

Used here to return early if the event is not supported.

> > +
> > +	ret = read_poll_timeout(pcie_capability_read_word, ret1,
> > +				(slot_status & PCI_EXP_SLTSTA_CC), 10000,
> > +				timeout_ms * 1000, true, dev, PCI_EXP_SLTSTA,
> 
> Replace:
>         10000 -> 10 * USEC_PER_MSEC
>         timeout_ms * 1000 -> USEC_PER_SEC (the variable can be dropped)
> 
> Please also check you have linux/units.h included for those defines.

Will use the macros, which are indeed self-describing. 

If you are referring 'timeout_ms', I don't think it can be dropped as
it is needed in the logic of pcie_poll_cmd().

> > +				&slot_status);
> > +	if (!ret)
> 
> Use the normal error handling logic by reversing the condition.
> 
> > +		pcie_capability_write_word(dev, PCI_EXP_SLTSTA,
> > +						PCI_EXP_SLTSTA_CC);
> > +
> > +	return  ret;
> 
> Remove extra space but this will become return 0; once the error handling 
> is done with the usual pattern.

This is not error handling, but rather "normal" and "expected", that
the command-completed bit is set and will be cleared here.

Thanks,
Feng

> 
> > +}
> > +
> >  /**
> >   * get_port_device_capability - discover capabilities of a PCI Express port
> >   * @dev: PCI Express port to examine
> > 
> 
> -- 
>  i.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/4] PCI/portdrv: Loose the condition check for disabling hotplug interrupts
  2025-02-25  4:09   ` Lukas Wunner
@ 2025-02-26  2:54     ` Feng Tang
  0 siblings, 0 replies; 21+ messages in thread
From: Feng Tang @ 2025-02-26  2:54 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Liguang Zhang,
	Guanghui Feng, rafael, Markus Elfring, lkp, Jonathan Cameron,
	ilpo.jarvinen, linux-pci, linux-kernel

Hi Lukas,

On Tue, Feb 25, 2025 at 05:09:11AM +0100, Lukas Wunner wrote:
> On Mon, Feb 24, 2025 at 11:44:59AM +0800, Feng Tang wrote:
> > Currently when 'pcie_ports_native' and host's 'native_pcie_hotplug' are
> > both false, kernel will not disable PCIe hotplug interrupts. But as
> > those could be affected by software setup like kernel cmdline parameter,
> > remove the depency over them.
> [...]
> > --- a/drivers/pci/pcie/portdrv.c
> > +++ b/drivers/pci/pcie/portdrv.c
> > @@ -263,9 +263,9 @@ static int get_port_device_capability(struct pci_dev *dev)
> >  
> >  	if (dev->is_hotplug_bridge &&
> >  	    (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > -	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM) &&
> > -	    (pcie_ports_native || host->native_pcie_hotplug)) {
> > -		services |= PCIE_PORT_SERVICE_HP;
> > +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> > +		if  (pcie_ports_native || host->native_pcie_hotplug)
> > +			services |= PCIE_PORT_SERVICE_HP;
> >  
> >  		/*
> >  		 * Disable hot-plug interrupts in case they have been enabled
> 
> If the platform doesn't grant hotplug control to OSPM, OSPM isn't allowed
> to disable interrupts behind the platform's back.
> 
> So this change doesn't seem safe and we should focus on finding out
> why the platform isn't granting hotplug control in the pci=nomsi case.

Yes. As discussed in https://lore.kernel.org/lkml/Z6ycYOKUeOECrcgb@U-2FWC9VHC-2323.local/
the last state is:
"
I tried to remove OSC_PCI_MSI_SUPPORT from ACPI_PCIE_REQ_SUPPORT, but
after negotiate_os_control(), the 'PCIeHotplug' control is still
disabled in the control capability after ACPI query_osc, run_osc
routines (I haven't figured out why yet), thus the pciehp severvice
driver can't be loader.
"

After talking with some firmware developer, the root cause is the
parameter OS passed to OSC control shows it doesn't support MSI, then
firmware doesn't grant hotplug control. The hardware here is a ARM
server. As different OEM/vendor (X86/ARM) may have different
implementaions of firmware, other firmware may make different decision. 

Thanks,
Feng

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
  2025-02-25  4:42         ` Feng Tang
@ 2025-02-28  6:29           ` Feng Tang
  2025-02-28  7:14             ` Lukas Wunner
  0 siblings, 1 reply; 21+ messages in thread
From: Feng Tang @ 2025-02-28  6:29 UTC (permalink / raw)
  To: Lukas Wunner, rafael, Bjorn Helgaas
  Cc: Sathyanarayanan Kuppuswamy, Liguang Zhang, Guanghui Feng,
	Markus Elfring, lkp, Jonathan Cameron, ilpo.jarvinen, linux-pci,
	linux-kernel

On Tue, Feb 25, 2025 at 12:42:04PM +0800, Feng Tang wrote:
 
> > 
> > > There might be some misunderstaning here :), I responded in
> > > https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/
> > > that your suggestion could solve our issue.
> > 
> > Well, could you test it please?
> 
> I don't have the hardware right now, will try to get from firmware
> developers. But from code logic, your suggestion can surely solve the
> issue unless I still miss something. From bug report (also commit log),
> the first PCIe hotplug command issued is here, and the second command
> comes from pciehp driver. In our kernel config, CONFIG_HOTPLUG_PCI_PCIE=y,
> so the first command won't happen, and all following commands come
> from pciehp driver, which setup its own waiting for command logic.
> 

Hi Lucas,

We just tried the patch on the hardware and initial 5.10 kernel, and
the problem cannot be reproduced, as the first PCIe hotplug command
of disabling CCIE and HPIE was not issued. 

Should I post a new version patch with your suggestion? You analysis
in previous sounded sane to me. Also for the original context, if the
BIOS has enabled the hotplug interrupt, it has been there since OS
boot for quite some time, this solution just affects a very small
time window from here to the loading of pciehp driver.  

Also I would like to separate this patch from the patch dealing the
nomsi irq storm issue. How do you think?

Thanks,
Feng

> > > 
> > > The code comment from 2bd50dd800b5 is:
> > > 
> > > 	/*
> > > 	 * Disable hot-plug interrupts in case they have been
> > > 	 * enabled by the BIOS and the hot-plug service driver
> > > 	 * is not loaded.
> > > 	 */
> > > 
> > > The "is not loaded" has 2 possible meanings:
> > > 1. the pciehp driver is not loaded yet at this point inside
> > >    get_port_device_capability(), and will be loaded later
> > > 2. the pciehp will never be loaded, i.e. CONFIG_HOTPLUG_PCI_PCIE=n 
> > > 
> > > If it's case 2, your suggestion can solve it nicely, but for case 1,
> > > we may have to keep the interrupt disabling.
> > 
> > The pciehp driver cannot be bound to the PCIe port when
> > get_port_device_capability() is running.  Because at that point,
> > portdrv is still figuring out which capabilities the port has and
> > it will subsequently instantiate the port service devices to which
> > the drivers (such as pciehp) will bind.
>  
> Yes, the time window between here and the following initialization of
> pciehp service driver is very small, and your suggestion sounds quite
> safe to me.
> 
> > So in that sense, case 1 cannot be what the code comment is
> > referring to.
> 
> Hi Rafel,
> 
> Could you help to confirm this?
> 
> > 
> > My point is that if CONFIG_HOTPLUG_PCI_PCIE=y, there may indeed be
> > another write to the Slot Control register before the command written
> > by portdrv has been completed.  Because pciehp will write to the
> > register on probe.  But in this case, there shouldn't be a need for
> > portdrv to quiesce the interrupt because pciehp will do that anyway
> > shortly afterwards.
> > 
> > And in the CONFIG_HOTPLUG_PCI_PCIE=n case, pciehp will not quiesce
> > the interrupt, so portdrv has to do that.  I believe that's what
> > the code comment is referring to.  It should be safe to just write
> > to the Slot Control register without waiting for the command to
> > complete because there shouldn't be another Slot Control write
> > afterwards (not by pciehp anyway).
> > 
> > If making the Slot Control write in portdrv conditional on
> > CONFIG_HOTPLUG_PCI_PCIE=n does unexpectedly *not* solve the issue,
> > please try to find out where the second Slot Control write is
> > coming from.  E.g. you could amend pcie_capability_write_word()
> > with something like:
> > 
> > 	if (pos == PCI_EXP_SLTCTL) {
> > 		pci_info(dev, "Writing %04hx SltCtl\n", val);
> > 		dump_stack();
> > 	}
> 
> Thanks for the suggestion, and we added similar debug to figure
> out them.
> 
> Thanks,
> Feng

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
  2025-02-28  6:29           ` Feng Tang
@ 2025-02-28  7:14             ` Lukas Wunner
  2025-02-28  9:33               ` Feng Tang
  0 siblings, 1 reply; 21+ messages in thread
From: Lukas Wunner @ 2025-02-28  7:14 UTC (permalink / raw)
  To: Feng Tang
  Cc: rafael, Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Liguang Zhang,
	Guanghui Feng, Markus Elfring, lkp, Jonathan Cameron,
	ilpo.jarvinen, linux-pci, linux-kernel

On Fri, Feb 28, 2025 at 02:29:29PM +0800, Feng Tang wrote:
> On Tue, Feb 25, 2025 at 12:42:04PM +0800, Feng Tang wrote:
> > > > There might be some misunderstaning here :), I responded in
> > > > https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/
> > > > that your suggestion could solve our issue.
> > > 
> > > Well, could you test it please?
> > 
> We just tried the patch on the hardware and initial 5.10 kernel, and
> the problem cannot be reproduced, as the first PCIe hotplug command
> of disabling CCIE and HPIE was not issued.

Good!

> Should I post a new version patch with your suggestion?

Yes, please.

> Also I would like to separate this patch from the patch dealing the
> nomsi irq storm issue. How do you think?

Makes sense to me.

The problem with the nomsi irq storm is really that if the platform
(i.e. BIOS) doesn't grant OSPM control of hotplug, OSPM (i.e. the kernel)
cannot modify hotplug registers because the assumption is that the
platform controls them.  If the platform doesn't actually handle
hotplug, but keeps the interrupts enabled, that's basically a bug
of the specific platform.

I think the kernel community's stance in such situations is that the
BIOS vendor should provide an update with a fix.  In some cases
that's not posible because the product is no longer supported,
or the vendor doesn't care about Linux issues because it only
supports Windows or macOS.  In those cases, we deal with these
problems with a quirk.  E.g. on x86 we often use a DMI quirk to
recognize affected hardware and the quirk would then disable the
hotplug interrupts.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
  2025-02-28  7:14             ` Lukas Wunner
@ 2025-02-28  9:33               ` Feng Tang
  2025-02-28 10:01                 ` Feng Tang
  0 siblings, 1 reply; 21+ messages in thread
From: Feng Tang @ 2025-02-28  9:33 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: rafael, Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Liguang Zhang,
	Guanghui Feng, Markus Elfring, lkp, Jonathan Cameron,
	ilpo.jarvinen, linux-pci, linux-kernel

Hi Lukas,

On Fri, Feb 28, 2025 at 08:14:04AM +0100, Lukas Wunner wrote:
> On Fri, Feb 28, 2025 at 02:29:29PM +0800, Feng Tang wrote:
> > On Tue, Feb 25, 2025 at 12:42:04PM +0800, Feng Tang wrote:
> > > > > There might be some misunderstaning here :), I responded in
> > > > > https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/
> > > > > that your suggestion could solve our issue.
> > > > 
> > > > Well, could you test it please?
> > > 
> > We just tried the patch on the hardware and initial 5.10 kernel, and
> > the problem cannot be reproduced, as the first PCIe hotplug command
> > of disabling CCIE and HPIE was not issued.
> 
> Good!
> 
> > Should I post a new version patch with your suggestion?
> 
> Yes, please.

Will do, thanks

> 
> > Also I would like to separate this patch from the patch dealing the
> > nomsi irq storm issue. How do you think?
> 
> Makes sense to me.
> 
> The problem with the nomsi irq storm is really that if the platform
> (i.e. BIOS) doesn't grant OSPM control of hotplug, OSPM (i.e. the kernel)
> cannot modify hotplug registers because the assumption is that the
> platform controls them. 

Yes, very reasonable. I also talked with some firmware engineer, who
shared there is working sample that on some old x86 platform, the
firmware itself is really capable of handling the hotplug stuff when
MSI is disabled.

> If the platform doesn't actually handle
> hotplug, but keeps the interrupts enabled, that's basically a bug
> of the specific platform.

That's what happened in our case :)

> I think the kernel community's stance in such situations is that the
> BIOS vendor should provide an update with a fix.  In some cases
> that's not posible because the product is no longer supported,
> or the vendor doesn't care about Linux issues because it only
> supports Windows or macOS.  In those cases, we deal with these
> problems with a quirk.  E.g. on x86 we often use a DMI quirk to
> recognize affected hardware and the quirk would then disable the
> hotplug interrupts.

I see.

As you dug out the history in https://lore.kernel.org/lkml/Z6RU-681eXl7hcp6@wunner.de/

Our previous debug could go through the OSC check in nomsi case,
only after below patch:

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 84030804a763..e7d9328cba45 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -38,8 +38,7 @@ static int acpi_pci_root_scan_dependent(struct acpi_device *adev)
 
 #define ACPI_PCIE_REQ_SUPPORT (OSC_PCI_EXT_CONFIG_SUPPORT \
 				| OSC_PCI_ASPM_SUPPORT \
-				| OSC_PCI_CLOCK_PM_SUPPORT \
-				| OSC_PCI_MSI_SUPPORT)
+				| OSC_PCI_CLOCK_PM_SUPPORT)

Otherwise, the OSC function won't be executed, but kernel will simply
disable PCIe hotplug, which breaks the working sample I mentioned above.
We'd better also include take this change?

Thanks,
Feng

> 
> Thanks,
> 
> Lukas

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
  2025-02-28  9:33               ` Feng Tang
@ 2025-02-28 10:01                 ` Feng Tang
  0 siblings, 0 replies; 21+ messages in thread
From: Feng Tang @ 2025-02-28 10:01 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: rafael, Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Liguang Zhang,
	Guanghui Feng, Markus Elfring, lkp, Jonathan Cameron,
	ilpo.jarvinen, linux-pci, linux-kernel

On Fri, Feb 28, 2025 at 05:33:43PM +0800, Feng Tang wrote:
> > The problem with the nomsi irq storm is really that if the platform
> > (i.e. BIOS) doesn't grant OSPM control of hotplug, OSPM (i.e. the kernel)
> > cannot modify hotplug registers because the assumption is that the
> > platform controls them. 
> 
> Yes, very reasonable. I also talked with some firmware engineer, who
> shared there is working sample that on some old x86 platform, the
> firmware itself is really capable of handling the hotplug stuff when
> MSI is disabled.
> 
> > If the platform doesn't actually handle
> > hotplug, but keeps the interrupts enabled, that's basically a bug
> > of the specific platform.
> 
> That's what happened in our case :)
> 
> > I think the kernel community's stance in such situations is that the
> > BIOS vendor should provide an update with a fix.  In some cases
> > that's not posible because the product is no longer supported,
> > or the vendor doesn't care about Linux issues because it only
> > supports Windows or macOS.  In those cases, we deal with these
> > problems with a quirk.  E.g. on x86 we often use a DMI quirk to
> > recognize affected hardware and the quirk would then disable the
> > hotplug interrupts.
> 
> I see.
> 
> As you dug out the history in https://lore.kernel.org/lkml/Z6RU-681eXl7hcp6@wunner.de/
> 
> Our previous debug could go through the OSC check in nomsi case,
> only after below patch:
> 
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 84030804a763..e7d9328cba45 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -38,8 +38,7 @@ static int acpi_pci_root_scan_dependent(struct acpi_device *adev)
>  
>  #define ACPI_PCIE_REQ_SUPPORT (OSC_PCI_EXT_CONFIG_SUPPORT \
>  				| OSC_PCI_ASPM_SUPPORT \
> -				| OSC_PCI_CLOCK_PM_SUPPORT \
> -				| OSC_PCI_MSI_SUPPORT)
> +				| OSC_PCI_CLOCK_PM_SUPPORT)
> 
> Otherwise, the OSC function won't be executed, but kernel will simply
> disable PCIe hotplug,

> which breaks the working sample I mentioned above.
Sorry, this saying is wrong, the patch doesn't change anything for cases
that firwmare would handle the PCIe hotplug.

But still, it is reasonable to let firmware decide if it would grant
the control to OSPM.

Thanks,
Feng

> We'd better also include take this change?
> 
> Thanks,
> Feng
> 
> > 
> > Thanks,
> > 
> > Lukas

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2025-02-28 10:01 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-24  3:44 [PATCH v3 0/4] PCIe hotplug interrupt related fixes Feng Tang
2025-02-24  3:44 ` [PATCH v3 1/4] PCI: portdrv: pciehp: Move PCIe hotplug command waiting logic to port driver Feng Tang
2025-02-24 15:06   ` Ilpo Järvinen
2025-02-25  8:18     ` Feng Tang
2025-02-24  3:44 ` [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events Feng Tang
2025-02-24  9:42   ` Markus Elfring
2025-02-24 15:00   ` Ilpo Järvinen
2025-02-25  5:51     ` Feng Tang
2025-02-24 18:12   ` Lukas Wunner
2025-02-25  3:06     ` Feng Tang
2025-02-25  4:01       ` Lukas Wunner
2025-02-25  4:42         ` Feng Tang
2025-02-28  6:29           ` Feng Tang
2025-02-28  7:14             ` Lukas Wunner
2025-02-28  9:33               ` Feng Tang
2025-02-28 10:01                 ` Feng Tang
2025-02-24  3:44 ` [PATCH v3 3/4] PCI/portdrv: Loose the condition check for disabling hotplug interrupts Feng Tang
2025-02-24  9:47   ` Markus Elfring
2025-02-25  4:09   ` Lukas Wunner
2025-02-26  2:54     ` Feng Tang
2025-02-24  3:45 ` [PATCH v3 4/4] PCI: Disable PCIe hotplug interrupts early when msi is disabled Feng Tang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox