* [PATCH v2] PCI: pciehp: Fix hotplug on Catlow Lake with unreliable PME status
From: Kuppuswamy Sathyanarayanan @ 2026-02-13 23:14 UTC
  To: Bjorn Helgaas; +Cc: Lukas Wunner, linux-pci, linux-kernel

On Intel Catlow Lake platforms, PCH PCIe root ports do not reliably
update PME status registers (PME Status and PME Requester_ID in the
Root Status register) during D3hot to D0 transitions, even though PME
interrupts are delivered correctly.

This issue manifests during PCIe hotplug operations as follows:

1. After a hot-remove event, the PCIe port transitions to D3hot and
   the Hot-Plug Interrupt Enable (HPIE) bit is cleared as the port
   enters the low-power state.

2. When a hot-add occurs while the port is in D3hot, a PME interrupt
   fires as expected to wake the port.

3. However, the PME interrupt handler finds the PME Status and
   PME Requester_ID fields unpopulated, so it cannot identify which
   device triggered the PME. The handler returns IRQ_NONE, leaving
   the port in D3hot.

4. Because the port remains in D3hot with HPIE disabled, the hotplug
   driver ignores the hot-add event, resulting in the newly inserted
   device not being recognized.

The PME interrupt delivery mechanism itself works correctly;
interrupts arrive reliably. The problem is purely the missing status
register updates. Verification via IOSF-SideBand (IOSF-SB) backdoor
reads confirms that these registers remain empty when the PME
interrupt fires. Neither the BIOS nor the kernel clears these
registers.

This issue is present in all steppings of Catlow Lake PCH and affects
customers in production deployments. A public hardware errata document
is not yet available.

Work around this issue by disabling runtime PM for affected ports,
keeping them in D0 during runtime operation. This ensures hotplug
events are handled via direct interrupts rather than relying on
unreliable PME-based wakeup.
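
The failure sequence and the workaround described above can be sketched
as a small userspace model. This is only an illustration under stated
assumptions, not kernel code: struct port_model, the model_*() helpers
and the enum names are all hypothetical, and the model reduces the port
to the handful of bits the commit message talks about.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED. */
enum irq_result { IRQ_RESULT_NONE, IRQ_RESULT_HANDLED };
enum power_state { PORT_D0, PORT_D3HOT };

/* Toy model of a hotplug-capable root port (all fields are assumptions). */
struct port_model {
	enum power_state state;
	bool hpie;               /* Hot-Plug Interrupt Enable */
	bool pme_status;         /* PME Status in Root Status */
	bool pme_status_broken;  /* true: status is never latched */
	bool runtime_pm_allowed; /* false: the workaround is applied */
	bool device_recognized;
};

static void model_init(struct port_model *p, bool broken, bool runtime_pm)
{
	p->state = PORT_D0;
	p->hpie = true;
	p->pme_status = false;
	p->pme_status_broken = broken;
	p->runtime_pm_allowed = runtime_pm;
	p->device_recognized = true;
}

/* Step 1: after hot-remove the port may runtime suspend to D3hot. */
static void model_hot_remove(struct port_model *p)
{
	p->device_recognized = false;
	if (p->runtime_pm_allowed) {
		p->state = PORT_D3HOT;
		p->hpie = false;
	}
}

/* Step 3: the PME handler gives up when PME Status is not set. */
static enum irq_result model_pme_irq(struct port_model *p)
{
	if (!p->pme_status)
		return IRQ_RESULT_NONE;
	p->pme_status = false;
	p->state = PORT_D0;
	p->hpie = true;
	return IRQ_RESULT_HANDLED;
}

/* Steps 2 and 4: hot-add raises a PME if the port is in D3hot. */
static void model_hot_add(struct port_model *p)
{
	if (p->state == PORT_D3HOT) {
		if (!p->pme_status_broken)
			p->pme_status = true;
		model_pme_irq(p);
	}
	/* The hotplug event is only seen in D0 with HPIE set. */
	if (p->state == PORT_D0 && p->hpie)
		p->device_recognized = true;
}
```

In this model, a port with broken PME status and runtime PM allowed
stays in D3hot after hot-add and never recognizes the device, while the
same port with runtime PM disallowed never leaves D0 and handles the
hot-add directly.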

During system suspend/resume, PCIe ports are resumed unconditionally
when coming out of system sleep because pcie_portdrv_probe() sets
DPM_FLAG_SMART_SUSPEND, and pciehp re-enables interrupts and checks
slot occupancy during resume.

The quirk is applied only to Catlow PCH PCIe root ports (device IDs
0x7a30 through 0x7a4b). Catlow CPU PCIe ports are not affected as
they are not hotplug-capable.
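
As a cross-check of the device ID range above, here is a minimal
userspace sketch of the match the quirk table expresses. The macro and
function names are hypothetical and chosen for illustration. Only the
vendor ID (0x8086, PCI_VENDOR_ID_INTEL in the kernel) and the range
bounds come from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values are the ones used in this patch. */
#define INTEL_VENDOR_ID      0x8086u /* PCI_VENDOR_ID_INTEL */
#define CATLOW_PCH_RP_FIRST  0x7a30u
#define CATLOW_PCH_RP_LAST   0x7a4bu

/* True if the vendor/device pair falls inside the quirked range. */
static bool is_catlow_pch_root_port(uint16_t vendor, uint16_t device)
{
	return vendor == INTEL_VENDOR_ID &&
	       device >= CATLOW_PCH_RP_FIRST &&
	       device <= CATLOW_PCH_RP_LAST;
}

/* Number of device IDs in the inclusive range. */
static unsigned int catlow_pch_root_port_count(void)
{
	return CATLOW_PCH_RP_LAST - CATLOW_PCH_RP_FIRST + 1;
}
```

The inclusive range spans 28 device IDs, which matches the 28
DECLARE_PCI_FIXUP_FINAL() entries in the diff.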

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v1:
 * Removed hack in hotplug driver and disabled runtime PM on affected ports.
 * Fixed the commit log and comments accordingly.

 drivers/pci/quirks.c | 49 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 280cd50d693b..779cd65b1a8a 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6340,3 +6340,52 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
 #endif
+
+/*
+ * Intel Catlow Lake PCH PCIe root ports have a hardware issue where
+ * PME status registers (PME Status and PME Requester_ID in Root Status)
+ * are not reliably updated during D3hot to D0 transitions, even though
+ * PME interrupts are delivered correctly.
+ *
+ * When a hotplug event occurs while the port is in D3hot, the PME
+ * interrupt fires but the status registers remain empty. This prevents
+ * the PME handler from identifying the event source, leaving the port
+ * in D3hot and causing the hotplug driver to miss the event.
+ *
+ * Disable runtime PM to keep these ports in D0, ensuring hotplug events
+ * are handled via direct interrupts.
+ */
+static void quirk_intel_catlow_pcie_no_pme_wakeup(struct pci_dev *dev)
+{
+	pm_runtime_disable(&dev->dev);
+	pci_info(dev, "Catlow PCH port: PME status unreliable, disabling runtime PM\n");
+}
+/* Apply quirk to Catlow Lake PCH root ports (0x7a30 - 0x7a4b) */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a30, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a31, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a32, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a33, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a34, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a35, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a36, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a37, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a38, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a39, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3a, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3b, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3c, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3d, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3e, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3f, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a40, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a41, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a42, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a43, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a44, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a45, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a46, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a47, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a48, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a49, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a4a, quirk_intel_catlow_pcie_no_pme_wakeup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a4b, quirk_intel_catlow_pcie_no_pme_wakeup);
-- 
2.43.0

