From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	Lukas Wunner <lukas@wunner.de>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pm@vger.kernel.org
Subject: Re: [PATCH v2] PCI: pciehp: Fix hotplug on Catlow Lake with unreliable PME status
Date: Wed, 18 Feb 2026 08:27:34 -0800
Message-ID: <7b4dd756-2ab7-4331-b560-268f9cff0887@linux.intel.com>
In-Reply-To: <CAJZ5v0iaKU6QJ7sxYCS21H0fv99DBNny-_bXzKH4g8RXgFuN6w@mail.gmail.com>



On 2/17/2026 10:08 AM, Rafael J. Wysocki wrote:
> On Tue, Feb 17, 2026 at 5:54 PM Kuppuswamy Sathyanarayanan
> <sathyanarayanan.kuppuswamy@linux.intel.com> wrote:
>>
>> Hi Rafael,
>>
>> On 2/13/2026 3:14 PM, Kuppuswamy Sathyanarayanan wrote:
>>> On Intel Catlow Lake platforms, PCH PCIe root ports do not reliably
>>> update PME status registers (PME Status and PME Requester_ID in the
>>> Root Status register) during D3hot to D0 transitions, even though PME
>>> interrupts are delivered correctly.
>>>
>>> This issue manifests during PCIe hotplug operations as follows:
>>>
>>> 1. After a hot-remove event, the PCIe port transitions to D3hot and
>>>    the hotplug interrupt enable (HPIE) flag is disabled as the port
>>>    enters low power state.
>>>
>>> 2. When a hot-add occurs while the port is in D3hot, a PME interrupt
>>>    fires as expected to wake the port.
>>>
>>> 3. However, the PME interrupt handler finds the PME_Status and
>>>    PME_Requester_ID registers unpopulated, preventing identification
>>>    of which device triggered the PME. The handler returns IRQ_NONE,
>>>    leaving the port in D3hot.
> 
> I think that you mean the
> 
> if (PCI_POSSIBLE_ERROR(rtsta) || !(rtsta & PCI_EXP_RTSTA_PME))
> 
> check in pcie_pme_irq().  Or do you mean something else?

Yes, I was referring to the above check.
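
To make the failure mode concrete, here is a minimal user-space sketch of
how that check behaves for different Root Status values. It is only an
illustration, not the kernel code: root_status_has_pme() is a hypothetical
helper, and the two macros are simplified local copies that I am assuming
match the kernel definitions (PME Status at bit 16 of Root Status, and an
all-ones read treated as a possible error).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified local copies; assumed to match the kernel definitions. */
#define PCI_EXP_RTSTA_PME	0x00010000	/* PME Status bit in Root Status */
#define PCI_POSSIBLE_ERROR(val)	((val) == (uint32_t)~0)	/* all-ones read */

static_assert(PCI_EXP_RTSTA_PME == (1u << 16),
	      "PME Status is expected at bit 16 of Root Status");

/*
 * Hypothetical helper: returns true if a Root Status value would pass the
 * check quoted above, false if the handler would give up (IRQ_NONE).
 */
static inline bool root_status_has_pme(uint32_t rtsta)
{
	if (PCI_POSSIBLE_ERROR(rtsta) || !(rtsta & PCI_EXP_RTSTA_PME))
		return false;
	return true;
}
```

On the affected ports the register reads back with PME Status clear when
the interrupt fires, so the helper above returns false and the handler
has nothing to act on.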

> 
> An alternative workaround might be to add a (new) "always poll PME"
> flag for the port in question that will cause it to go to pci_pme_list
> in pci_pme_active() every time wakeup is enabled (essentially, an
> override for pme_poll clearing).

I will check whether this approach works. I want to make sure the poll
logic eventually triggers the hotplug handler to detect slot state
changes.

But if you think there is no power-related issue with keeping these ports
in D0, then we can adopt the pm_runtime_disable() approach. I think this
approach looks clean and simple.

What's your preference?

> 
>>> 4. Because the port remains in D3hot with HPIE disabled, the hotplug
>>>    driver ignores the hot-add event, resulting in the newly inserted
>>>    device not being recognized.
>>>
>>> The PME interrupt delivery mechanism itself works correctly;
>>> interrupts arrive reliably. The problem is purely the missing status
>>> register updates. Verification via IOSF-SideBand (IOSF-SB) backdoor
>>> reads confirms that these registers remain empty when the PME
>>> interrupt fires. Neither BIOS nor kernel code is clearing these
>>> registers.
>>>
>>> This issue is present in all steppings of Catlow Lake PCH and affects
>>> customers in production deployments. A public hardware errata document
>>> is not yet available.
>>>
>>> Work around this issue by disabling runtime PM for affected ports,
>>> keeping them in D0 during runtime operation. This ensures hotplug
>>> events are handled via direct interrupts rather than relying on
>>> unreliable PME-based wakeup.
>>>
>>> During system suspend/resume, PCIe ports are resumed unconditionally
>>> when coming out of system sleep due to DPM_FLAG_SMART_SUSPEND set by
>>> pcie_portdrv_probe(), and pciehp re-enables interrupts and checks slot
>>> occupation status during resume.
>>>
>>> The quirk is applied only to Catlow PCH PCIe root ports (device IDs
>>> 0x7a30 through 0x7a4b). Catlow CPU PCIe ports are not affected as
>>> they are not hotplug-capable.
>>>
>>> Suggested-by: Lukas Wunner <lukas@wunner.de>
>>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>> ---
>>
>> Could you please review this patch and let us know if calling
>> pm_runtime_disable() from a PCI quirk is acceptable?
>>
>> The quirk keeps specific Catlow Lake PCH PCIe root ports in D0 to
>> work around a hardware bug where PME status registers are not reliably
>> updated during D3hot to D0 transitions, causing hotplug events to be
>> missed.
>>
>> System suspend/resume is unaffected as DPM_FLAG_SMART_SUSPEND ensures
>> ports are resumed unconditionally and pciehp checks slot occupation
>> on resume.
>>
>>
>>>
>>> Changes since v1:
>>>  * Removed hack in hotplug driver and disabled runtime PM on affected ports.
>>>  * Fixed the commit log and comments accordingly.
>>>
>>>  drivers/pci/quirks.c | 49 ++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 49 insertions(+)
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index 280cd50d693b..779cd65b1a8a 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -6340,3 +6340,52 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
>>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
>>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
>>>  #endif
>>> +
>>> +/*
>>> + * Intel Catlow Lake PCH PCIe root ports have a hardware issue where
>>> + * PME status registers (PME Status and PME Requester_ID in Root Status)
>>> + * are not reliably updated during D3hot to D0 transitions, even though
>>> + * PME interrupts are delivered correctly.
>>> + *
>>> + * When a hotplug event occurs while the port is in D3hot, the PME
>>> + * interrupt fires but the status registers remain empty. This prevents
>>> + * the PME handler from identifying the event source, leaving the port
>>> + * in D3hot and causing the hotplug driver to miss the event.
>>> + *
>>> + * Disable runtime PM to keep these ports in D0, ensuring hotplug events
>>> + * are handled via direct interrupts.
>>> + */
>>> +static void quirk_intel_catlow_pcie_no_pme_wakeup(struct pci_dev *dev)
>>> +{
>>> +     pm_runtime_disable(&dev->dev);
> 
> Personally, I would use pm_runtime_get_sync() here instead which would
> really mean "never suspend".
> 
>>> +     pci_info(dev, "Catlow PCH port: PME status unreliable, disabling runtime PM\n");
>>> +}
>>> +/* Apply quirk to Catlow Lake PCH root ports (0x7a30 - 0x7a4b) */
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a30, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a31, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a32, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a33, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a34, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a35, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a36, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a37, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a38, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a39, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3a, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3b, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3c, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3d, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3e, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3f, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a40, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a41, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a42, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a43, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a44, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a45, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a46, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a47, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a48, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a49, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a4a, quirk_intel_catlow_pcie_no_pme_wakeup);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a4b, quirk_intel_catlow_pcie_no_pme_wakeup);
>>
>> --
>> Sathyanarayanan Kuppuswamy
>> Linux Kernel Developer
>>
> 
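
As an aside on the ID list quoted above: the 28 fixup entries cover one
contiguous range (0x7a30 through 0x7a4b), so the match can also be
expressed as a range check. Below is a minimal user-space sketch of that
idea. The macro and function names are hypothetical and are not part of
the patch or of the kernel; how such a check would be wired into a fixup
is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Range taken from the patch: Catlow Lake PCH PCIe root ports. */
#define CATLOW_PCH_RP_FIRST	0x7a30
#define CATLOW_PCH_RP_LAST	0x7a4b

/* The range must cover exactly the 28 device IDs listed in the patch. */
static_assert(CATLOW_PCH_RP_LAST - CATLOW_PCH_RP_FIRST + 1 == 28,
	      "range must match the 28 fixup entries");

/* Hypothetical helper: true if device_id falls inside the quirked range. */
static inline bool is_catlow_pch_root_port(uint16_t device_id)
{
	return device_id >= CATLOW_PCH_RP_FIRST &&
	       device_id <= CATLOW_PCH_RP_LAST;
}
```

The explicit per-ID list in the patch has the advantage of being easy to
grep for, so this is a matter of taste rather than a requested change.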

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

