From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
Lukas Wunner <lukas@wunner.de>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-pm@vger.kernel.org
Subject: Re: [PATCH v2] PCI: pciehp: Fix hotplug on Catlow Lake with unreliable PME status
Date: Wed, 18 Feb 2026 08:27:34 -0800
Message-ID: <7b4dd756-2ab7-4331-b560-268f9cff0887@linux.intel.com>
In-Reply-To: <CAJZ5v0iaKU6QJ7sxYCS21H0fv99DBNny-_bXzKH4g8RXgFuN6w@mail.gmail.com>
On 2/17/2026 10:08 AM, Rafael J. Wysocki wrote:
> On Tue, Feb 17, 2026 at 5:54 PM Kuppuswamy Sathyanarayanan
> <sathyanarayanan.kuppuswamy@linux.intel.com> wrote:
>>
>> Hi Rafael,
>>
>> On 2/13/2026 3:14 PM, Kuppuswamy Sathyanarayanan wrote:
>>> On Intel Catlow Lake platforms, PCH PCIe root ports do not reliably
>>> update PME status registers (PME Status and PME Requester_ID in the
>>> Root Status register) during D3hot to D0 transitions, even though PME
>>> interrupts are delivered correctly.
>>>
>>> This issue manifests during PCIe hotplug operations as follows:
>>>
>>> 1. After a hot-remove event, the PCIe port transitions to D3hot and
>>> the hotplug interrupt enable (HPIE) flag is disabled as the port
>>> enters low power state.
>>>
>>> 2. When a hot-add occurs while the port is in D3hot, a PME interrupt
>>> fires as expected to wake the port.
>>>
>>> 3. However, the PME interrupt handler finds the PME_Status and
>>> PME_Requester_ID registers unpopulated, preventing identification
>>> of which device triggered the PME. The handler returns IRQ_NONE,
>>> leaving the port in D3hot.
>
> I think that you mean the
>
> if (PCI_POSSIBLE_ERROR(rtsta) || !(rtsta & PCI_EXP_RTSTA_PME))
>
> check in pcie_pme_irq(). Or do you mean something else?
Yes, I was referring to the above check.
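
For reference, here is a minimal userspace sketch of that condition. It only models the early-exit test; the function name pme_irq_would_be_handled() and the RTSTA_ERROR_RESPONSE macro are made up for illustration, and the all-ones comparison is a simplified stand-in for PCI_POSSIBLE_ERROR(). The PCI_EXP_RTSTA_PME value is the one from pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* PME Status bit in the Root Status register (value as in pci_regs.h). */
#define PCI_EXP_RTSTA_PME	0x00010000

/* Simplified stand-in for PCI_POSSIBLE_ERROR(): an all-ones read. */
#define RTSTA_ERROR_RESPONSE	0xffffffffU

/*
 * Hypothetical model of the check quoted above from pcie_pme_irq().
 * Returns false where the real handler would bail out with IRQ_NONE,
 * true where it would go on to process the PME.
 */
static bool pme_irq_would_be_handled(uint32_t rtsta)
{
	if (rtsta == RTSTA_ERROR_RESPONSE || !(rtsta & PCI_EXP_RTSTA_PME))
		return false;

	return true;
}
```

On the affected ports the interrupt arrives while Root Status still reads as if no PME were pending, so in this model the call is pme_irq_would_be_handled(0), which returns false and corresponds to the IRQ_NONE path.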
>
> An alternative workaround might be to add a (new) "always poll PME"
> flag for the port in question that will cause it to go to pci_pme_list
> in pci_pme_active() every time wakeup is enabled (essentially, an
> override for pme_poll clearing).
I will check whether this approach works. In particular, I want to
confirm that the poll path eventually invokes the hotplug handler so
that slot state changes are detected.
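
To check that I understand the suggestion, here is a rough userspace model of the decision it implies. Nothing here is existing kernel code: struct pme_dev_model, the pme_poll_always flag and pme_should_poll() are hypothetical names, and the real pci_pme_active() does more than this (list management, scheduling the poll work). The only point is that the new flag would keep the device eligible for polling even after pme_poll has been cleared.

```c
#include <stdbool.h>

/* Hypothetical per-device state; not the kernel's struct pci_dev. */
struct pme_dev_model {
	bool pme_poll;		/* existing idea: poll until PME is seen to work */
	bool pme_poll_always;	/* hypothetical quirk flag for affected ports */
};

/*
 * Model of the "should this device go on the PME poll list?" decision
 * taken when wakeup is enabled or disabled. With pme_poll_always set,
 * a cleared pme_poll no longer takes the device out of polling.
 */
static bool pme_should_poll(const struct pme_dev_model *dev, bool enable)
{
	return enable && (dev->pme_poll || dev->pme_poll_always);
}
```

Whether the poll work then reaches the hotplug handler is a separate question that this model does not answer.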
But if you see no power-related concern with keeping these ports in D0,
we can go with the pm_runtime_disable() approach, which looks clean and
simple to me.

What's your preference?
>
>>> 4. Because the port remains in D3hot with HPIE disabled, the hotplug
>>> driver ignores the hot-add event, resulting in the newly inserted
>>> device not being recognized.
>>>
>>> The PME interrupt delivery mechanism itself works correctly;
>>> interrupts arrive reliably. The problem is purely the missing status
>>> register updates. Verification via IOSF-SideBand (IOSF-SB) backdoor
>>> reads confirms that these registers remain empty when the PME
>>> interrupt fires. Neither BIOS nor kernel code is clearing these
>>> registers.
>>>
>>> This issue is present in all steppings of Catlow Lake PCH and affects
>>> customers in production deployments. A public hardware errata document
>>> is not yet available.
>>>
>>> Work around this issue by disabling runtime PM for affected ports,
>>> keeping them in D0 during runtime operation. This ensures hotplug
>>> events are handled via direct interrupts rather than relying on
>>> unreliable PME-based wakeup.
>>>
>>> During system suspend/resume, PCIe ports are resumed unconditionally
>>> when coming out of system sleep due to DPM_FLAG_SMART_SUSPEND set by
>>> pcie_portdrv_probe(), and pciehp re-enables interrupts and checks slot
>>> occupation status during resume.
>>>
>>> The quirk is applied only to Catlow PCH PCIe root ports (device IDs
>>> 0x7a30 through 0x7a4b). Catlow CPU PCIe ports are not affected as
>>> they are not hotplug-capable.
>>>
>>> Suggested-by: Lukas Wunner <lukas@wunner.de>
>>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>> ---
>>
>> Could you please review this patch and let us know if calling
>> pm_runtime_disable() from a PCI quirk is acceptable?
>>
>> The quirk keeps specific Catlow Lake PCH PCIe root ports in D0 to
>> work around a hardware bug where PME status registers are not reliably
>> updated during D3hot to D0 transitions, causing hotplug events to be
>> missed.
>>
>> System suspend/resume is unaffected as DPM_FLAG_SMART_SUSPEND ensures
>> ports are resumed unconditionally and pciehp checks slot occupation
>> on resume.
>>
>>
>>>
>>> Changes since v1:
>>> * Removed hack in hotplug driver and disabled runtime PM on affected ports.
>>> * Fixed the commit log and comments accordingly.
>>>
>>> drivers/pci/quirks.c | 49 ++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 49 insertions(+)
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index 280cd50d693b..779cd65b1a8a 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -6340,3 +6340,52 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
>>> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
>>> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
>>> #endif
>>> +
>>> +/*
>>> + * Intel Catlow Lake PCH PCIe root ports have a hardware issue where
>>> + * PME status registers (PME Status and PME Requester_ID in Root Status)
>>> + * are not reliably updated during D3hot to D0 transitions, even though
>>> + * PME interrupts are delivered correctly.
>>> + *
>>> + * When a hotplug event occurs while the port is in D3hot, the PME
>>> + * interrupt fires but the status registers remain empty. This prevents
>>> + * the PME handler from identifying the event source, leaving the port
>>> + * in D3hot and causing the hotplug driver to miss the event.
>>> + *
>>> + * Disable runtime PM to keep these ports in D0, ensuring hotplug events
>>> + * are handled via direct interrupts.
>>> + */
>>> +static void quirk_intel_catlow_pcie_no_pme_wakeup(struct pci_dev *dev)
>>> +{
>>> +	pm_runtime_disable(&dev->dev);
>
> Personally, I would use pm_runtime_get_sync() here instead which would
> really mean "never suspend".
>
>>> +	pci_info(dev, "Catlow PCH port: PME status unreliable, disabling runtime PM\n");
>>> +}
>>> +/* Apply quirk to Catlow Lake PCH root ports (0x7a30 - 0x7a4b) */
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a30, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a31, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a32, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a33, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a34, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a35, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a36, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a37, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a38, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a39, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3a, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3b, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3c, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3d, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3e, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3f, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a40, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a41, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a42, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a43, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a44, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a45, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a46, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a47, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a48, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a49, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a4a, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a4b, quirk_intel_catlow_pcie_no_pme_wakeup);
>>
>> --
>> Sathyanarayanan Kuppuswamy
>> Linux Kernel Developer
>>
>
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer