Re: [PATCH v3] PCI: pciehp: Fix hotplug on Catlow Lake with unreliable PME status

public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Lukas Wunner <lukas@wunner.de>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	Mika Westerberg <mika.westerberg@linux.intel.com>
Subject: Re: [PATCH v3] PCI: pciehp: Fix hotplug on Catlow Lake with unreliable PME status
Date: Tue, 24 Mar 2026 14:45:25 -0700	[thread overview]
Message-ID: <ca15feda-5d63-405d-a4c4-00cbb9991fed@linux.intel.com> (raw)
In-Reply-To: <20260323232437.GA1085990@bhelgaas>

Hi Bjorn,

Thanks for the review.

On 3/23/2026 4:24 PM, Bjorn Helgaas wrote:
> [+cc Mika, author of eb34da60edee]
> 
> On Mon, Mar 16, 2026 at 03:08:06PM -0700, Kuppuswamy Sathyanarayanan wrote:
>> On Intel Catlow Lake platforms, PCH PCIe root ports do not reliably
>> update PME status registers (PME Status and PME Requester_ID in the
>> Root Status register) during D3hot to D0 transitions, even though PME
>> interrupts are delivered correctly.
> 
> IIUC, the Root Status update should happen on receipt of a PME Message
> and is not directly related to a D3hot to D0 transition (PCIe r7.0,
> sec 6.1.6).


You're correct. PME status register updates occur on PME Message
reception, not during power state transitions.

I used "D3hot to D0 transitions" as shorthand for the full sequence: port
in D3hot detects hotplug event -> generates PME Message -> should update
Root Status -> PME interrupt -> port wakes to D0.

On Catlow Lake, the PME interrupt is delivered but the Root Status
registers (PME_Status and PME_Requester_ID) are not updated when the port
generates these PME Messages. This causes pcie_pme_irq() to return
IRQ_NONE, leaving the port in D3hot.

I'll clarify the commit message to avoid this misleading shorthand.
Something like below:

On Intel Catlow Lake platforms, PCH PCIe root ports do not reliably
update PME status registers (PME Status and PME Requester_ID in the
Root Status register) when receiving PME Messages while in D3hot state,
even though PME interrupts are delivered correctly

> 
>> This issue manifests during PCIe hotplug operations as follows:
>>
>> 1. After a hot-remove event, the PCIe port runtime suspends to D3hot.
>>    pciehp_suspend() disables hotplug interrupts (HPIE) to rely on
>>    PME-based wakeup.
> 
> Didn't we rely on PME interrupts anyway, independent of HPIE?

I don't think they are independent. If HPIE is enabled, hotplug interrupt
will be triggered as direct interrupt and we don't have to reply on PME
for wakeup.

> 
> eb34da60edee ("PCI: pciehp: Disable hotplug interrupt during suspend")
> cleared PCI_EXP_SLTCTL_HPIE so that when the link goes down, we
> wouldn't get a PCI_EXP_SLTSTA_DLLSC interrupt and wake the system.
> 
> I don't know the details of why the PCI_EXP_SLTSTA_DLLSC would cause
> that wakeup.  I would think pciehp should field that, and it should be
> able to figure out whether to bring the port out of D3hot.
> 
> Anyway, with this patch it looks like we'll leave PCI_EXP_SLTCTL_HPIE
> set, and potentially get that PCI_EXP_SLTSTA_DLLSC interrupt again?

I have tested this patch on Catlow Lake. Enabling HPIE does not result in
spurious wakeups as mentioned in Mika's patch.

Mika, any comments?

> 
>> 2. When a hot-add occurs while the port is in D3hot, a PME interrupt
>>    fires as expected to wake the port.
> 
> Why is a PME interrupt expected here?  I would expect a hot-add to
> cause a PCI_EXP_SLTSTA_PDC or PCI_EXP_SLTSTA_DLLSC interrupt.  Sec
> 6.1.6 suggests PME interrupts are only from Root Ports.
> 
>> 3. However, the PME interrupt handler finds the PME_Status and
>>    PME_Requester_ID registers unpopulated, preventing identification
>>    of which device triggered the PME. The handler returns IRQ_NONE,
>>    leaving the port in D3hot.
> 
> I guess this is pcie_pme_irq(), and it finds PCI_EXP_RTSTA_PME clear
> because of this Catlow defect?  It looks like it returns without even
> looking at PME_Requester_ID.

Yes, it returns after checking for PCI_EXP_RTSTA_PME. I tried modifying
it to proceed without the status check, but it still failed when reading
PME_Requester_ID (also not updated). So I mentioned both registers.

> 
> Sec 5.3.3.1 suggests that the purpose of PME_Requester_ID is to
> facilitate quicker PME service and shorter resume time.  So maybe the
> lack of PME_Requester_ID should only be a performance issue, not a
> functional problem?
> 
> If we know we got a PME interrupt, and we can wake up (maybe more
> slowly without a Requester ID), why can't we just do the wakeup
> independent of PCI_EXP_RTSTA_PME and PCI_EXP_RTSTA_PME_RQ_ID?  Are
> spurious PME interrupts a problem?
> 

Yes, I think we can call pcie_pme_walk_bus() even when PCI_EXP_RTSTA_PME
is clear for ports with the quirk. This would work but be slower without
the Requester_ID hint.

I can verify this alternative approach if Mika confirms that keeping HPIE
enabled is problematic. In our testing on Catlow Lake, enabling HPIE did not
cause any spurious wakeup issues.

>> 4. Because the port remains in D3hot with HPIE disabled, the hotplug
>>    event is lost and the newly inserted device is not recognized.
>>
>> The PME interrupt delivery mechanism itself works correctly;
>> interrupts arrive reliably. The problem is purely the missing status
>> register updates. Verification via IOSF-SideBand (IOSF-SB) backdoor
>> reads confirms that these registers remain empty when the PME
>> interrupt fires. Neither BIOS nor kernel code is clearing these
>> registers.
>>
>> This issue is present in all steppings of Catlow Lake PCH and affects
>> customers in production deployments. A public hardware errata document
>> is not yet available.
>>
>> Work around this issue by introducing a PCI_DEV_FLAGS_PME_UNRELIABLE
>> flag for affected ports. When this flag is set, pciehp keeps hotplug
>> interrupts (HPIE) enabled during D3hot instead of disabling them and
>> relying on PME. This allows hotplug events to be delivered via direct
>> interrupts rather than through the broken PME status mechanism.
>>
>> The port still enters D3hot for power savings during runtime suspend,
>> avoiding the power regression that would occur with pm_runtime_disable().
>> Testing confirms this approach does not impact PC6/PC10 package C-state
>> residency.
>>
>> During system suspend/resume, the behavior is unchanged. Ports are
>> resumed unconditionally when coming out of system sleep due to
>> DPM_FLAG_SMART_SUSPEND set by pcie_portdrv_probe(), and pciehp
>> re-enables interrupts and checks slot occupation status during resume.
>>
>> The quirk is applied only to Catlow PCH PCIe root ports (device IDs
>> 0x7a30 through 0x7a4b). Catlow CPU PCIe ports are not affected as
>> they are not hotplug-capable.
>>
>> Suggested-by: Lukas Wunner <lukas@wunner.de>
>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> ---
>>
>> Changes since v2:
>>  * Switched from pm_runtime_disable() to PCI_DEV_FLAGS_PME_UNRELIABLE
>>    flag approach to avoid power regression (feedback from Rafael and Lukas)
>>  * Keep hotplug interrupts (HPIE) enabled during D3hot instead of
>>    preventing D3hot entry entirely
>>  * Port still enters D3hot for power savings; testing confirms no impact
>>    on PC6 package C-state residency
>>  * Modified pciehp to check pme_is_broken() before disabling/enabling
>>    hotplug interrupts during suspend/resume
>>  * Made quirk comment generic to cover both PME notification and status
>>    update issues, with Catlow Lake specifics documented separately
>>
>> Changes since v1:
>>  * Removed hack in hotplug driver and disabled runtime PM on affected ports.
>>  * Fixed the commit log and comments accordingly.
>>
>>  drivers/pci/hotplug/pciehp_core.c | 11 ++++--
>>  drivers/pci/quirks.c              | 60 +++++++++++++++++++++++++++++++
>>  include/linux/pci.h               |  2 ++
>>  3 files changed, 71 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
>> index 1e9158d7bac7..f854ef9551c3 100644
>> --- a/drivers/pci/hotplug/pciehp_core.c
>> +++ b/drivers/pci/hotplug/pciehp_core.c
>> @@ -260,13 +260,20 @@ static bool pme_is_native(struct pcie_device *dev)
>>  	return pcie_ports_native || host->native_pme;
>>  }
>>  
>> +static bool pme_is_broken(struct pcie_device *pcie)
>> +{
>> +	struct pci_dev *pdev = pcie->port;
>> +
>> +	return !!(pdev->dev_flags & PCI_DEV_FLAGS_PME_UNRELIABLE);
>> +}
>> +
>>  static void pciehp_disable_interrupt(struct pcie_device *dev)
>>  {
>>  	/*
>>  	 * Disable hotplug interrupt so that it does not trigger
>>  	 * immediately when the downstream link goes down.
>>  	 */
>> -	if (pme_is_native(dev))
>> +	if (pme_is_native(dev) && !pme_is_broken(dev))
>>  		pcie_disable_interrupt(get_service_data(dev));
>>  }
>>  
>> @@ -318,7 +325,7 @@ static int pciehp_resume(struct pcie_device *dev)
>>  {
>>  	struct controller *ctrl = get_service_data(dev);
>>  
>> -	if (pme_is_native(dev))
>> +	if (pme_is_native(dev) && !pme_is_broken(dev))
>>  		pcie_enable_interrupt(ctrl);
>>  
>>  	pciehp_check_presence(ctrl);
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 48946cca4be7..bfb52735c4e3 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -6380,3 +6380,63 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
>>  #endif
>> +
>> +/*
>> + * Some PCIe root ports have a hardware issue where PME-based wakeup
>> + * from D3hot is unreliable. This can manifest as either PME interrupts
>> + * not being delivered, or PME status registers (PME Status and PME
>> + * Requester_ID in Root Status) not being reliably updated even when
>> + * interrupts are delivered.
>> + *
>> + * When a hotplug event occurs while the port is in D3hot, the system
>> + * relies on PME to wake the port back to D0. If PME notification or
>> + * status updates are unreliable, the PME handler either doesn't get
>> + * invoked or cannot identify the event source. This leaves the port in
>> + * D3hot with hotplug interrupts disabled, causing hotplug events to be
>> + * missed.
>> + *
>> + * Mark affected ports with PCI_DEV_FLAGS_PME_UNRELIABLE to keep
>> + * hotplug interrupts (HPIE) enabled during D3hot instead of relying on
>> + * PME-based wakeup. This allows hotplug events to be delivered via
>> + * direct interrupts while still permitting the port to enter D3hot for
>> + * power savings.
>> + *
>> + * Known affected hardware:
>> + * - Intel Catlow Lake PCH PCIe root ports: PME status registers are
>> + *   not updated during D3hot to D0 transitions, even though PME
>> + *   interrupts are delivered correctly.
>> + */
>> +static void quirk_pcie_pme_unreliable(struct pci_dev *dev)
>> +{
>> +	dev->dev_flags |= PCI_DEV_FLAGS_PME_UNRELIABLE;
>> +	pci_info(dev, "PME status unreliable, keeping hotplug interrupts enabled in D3hot\n");
>> +}
>> +/* Apply quirk to Catlow Lake PCH root ports (0x7a30 - 0x7a4b) */
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a30, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a31, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a32, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a33, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a34, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a35, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a36, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a37, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a38, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a39, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3a, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3b, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3c, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3d, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3e, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a3f, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a40, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a41, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a42, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a43, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a44, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a45, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a46, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a47, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a48, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a49, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a4a, quirk_pcie_pme_unreliable);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x7a4b, quirk_pcie_pme_unreliable);
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 1c270f1d5123..9761351c5d70 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -253,6 +253,8 @@ enum pci_dev_flags {
>>  	 * integrated with the downstream devices and doesn't use real PCI.
>>  	 */
>>  	PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIAS = (__force pci_dev_flags_t) (1 << 14),
>> +	/* Device PME is broken or unreliable */
>> +	PCI_DEV_FLAGS_PME_UNRELIABLE = (__force pci_dev_flags_t) (1 << 15),
>>  };
>>  
>>  enum pci_irq_reroute_variant {
>> -- 
>> 2.43.0
>>
> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

next prev parent reply	other threads:[~2026-03-24 21:45 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-16 22:08 [PATCH v3] PCI: pciehp: Fix hotplug on Catlow Lake with unreliable PME status Kuppuswamy Sathyanarayanan
2026-03-23 12:53 ` Lukas Wunner
2026-03-23 23:24 ` Bjorn Helgaas
2026-03-24 21:45   ` Kuppuswamy Sathyanarayanan [this message]
2026-03-24 23:46     ` Bjorn Helgaas
2026-03-25  5:56     ` Lukas Wunner
2026-03-25 23:21       ` Bjorn Helgaas
2026-03-25  6:11     ` Mika Westerberg
2026-03-25 21:12       ` Kuppuswamy Sathyanarayanan
2026-03-26  6:12         ` Mika Westerberg
2026-03-26 21:23           ` Kuppuswamy Sathyanarayanan
2026-03-27 11:16             ` Mika Westerberg
2026-04-03 19:37               ` Kuppuswamy Sathyanarayanan
2026-04-07  7:08                 ` Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ca15feda-5d63-405d-a4c4-00cbb9991fed@linux.intel.com \
    --to=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=helgaas@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mika.westerberg@linux.intel.com \
    --cc=rafael@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox