public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* Re: Patch "PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset" has been added to the 6.15-stable tree
       [not found] <20250610121606.1556304-1-sashal@kernel.org>
@ 2025-06-13  3:55 ` Lukas Wunner
  2025-06-14 16:11   ` Sasha Levin
  0 siblings, 1 reply; 2+ messages in thread
From: Lukas Wunner @ 2025-06-13  3:55 UTC (permalink / raw)
  To: stable; +Cc: stable-commits, Bjorn Helgaas, Joel Mathew Thomas

[cc += Joel Mathew Thomas]

On Tue, Jun 10, 2025 at 08:16:05AM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
> 
>     PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
> 
> to the 6.15-stable tree which can be found at:
>     http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
> 
> The filename of the patch is:
>      pci-pciehp-ignore-link-down-up-caused-by-secondary-b.patch
> and it can be found in the queue-6.15 subdirectory.
> 
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable@vger.kernel.org> know about it.

Hi Sasha, thanks for selecting the above (which is 2af781a9edc4 upstream)
as a 6.15 backport.

A small feature request, could you amend the stable tooling to cc
people tagged as Reported-by and Tested-by?  I think they're the
ones most interested in seeing something backported.

Thanks!

Lukas

> commit 161a7237de69f65ccfe68da318343f3719149480
> Author: Lukas Wunner <lukas@wunner.de>
> Date:   Thu Apr 10 17:27:12 2025 +0200
> 
>     PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
>     
>     [ Upstream commit 2af781a9edc4ef5f6684c0710cc3542d9be48b31 ]
>     
>     When a Secondary Bus Reset is issued at a hotplug port, it causes a Data
>     Link Layer State Changed event as a side effect.  On hotplug ports using
>     in-band presence detect, it additionally causes a Presence Detect Changed
>     event.
>     
>     These spurious events should not result in teardown and re-enumeration of
>     the device in the slot.  Hence commit 2e35afaefe64 ("PCI: pciehp: Add
>     reset_slot() method") masked the Presence Detect Changed Enable bit in the
>     Slot Control register during a Secondary Bus Reset.  Commit 06a8d89af551
>     ("PCI: pciehp: Disable link notification across slot reset") additionally
>     masked the Data Link Layer State Changed Enable bit.
>     
>     However masking those bits only disables interrupt generation (PCIe r6.2
>     sec 6.7.3.1).  The events are still visible in the Slot Status register
>     and picked up by the IRQ handler if it runs during a Secondary Bus Reset.
>     This can happen if the interrupt is shared or if an unmasked hotplug event
>     occurs, e.g. Attention Button Pressed or Power Fault Detected.
>     
>     The likelihood of this happening used to be small, so it wasn't much of a
>     problem in practice.  That has changed with the recent introduction of
>     bandwidth control in v6.13-rc1 with commit 665745f27487 ("PCI/bwctrl:
>     Re-add BW notification portdrv as PCIe BW controller"):
>     
>     Bandwidth control shares the interrupt with PCIe hotplug.  A Secondary Bus
>     Reset causes a Link Bandwidth Notification, so the hotplug IRQ handler
>     runs, picks up the masked events and tears down the device in the slot.
>     
>     As a result, Joel reports VFIO passthrough failure of a GPU, which Ilpo
>     root-caused to the incorrect handling of masked hotplug events.
>     
>     Clearly, a more reliable way is needed to ignore spurious hotplug events.
>     
>     For Downstream Port Containment, a new ignore mechanism was introduced by
>     commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").
>     It has been working reliably for the past four years.
>     
>     Adapt it for Secondary Bus Resets.
>     
>     Introduce two helpers to annotate code sections which cause spurious link
>     changes:  pci_hp_ignore_link_change() and pci_hp_unignore_link_change()
>     Use those helpers in lieu of masking interrupts in the Slot Control
>     register.
>     
>     Introduce a helper to check whether such a code section is executing
>     concurrently and if so, await it:  pci_hp_spurious_link_change()
>     Invoke the helper in the hotplug IRQ thread pciehp_ist().  Re-use the
>     IRQ thread's existing code which ignores DPC-induced link changes unless
>     the link is unexpectedly down after reset recovery or the device was
>     replaced during the bus reset.
>     
>     That code block in pciehp_ist() was previously only executed if a Data
>     Link Layer State Changed event has occurred.  Additionally execute it for
>     Presence Detect Changed events.  That's necessary for compatibility with
>     PCIe r1.0 hotplug ports because Data Link Layer State Changed didn't exist
>     before PCIe r1.1.  DPC was added with PCIe r3.1 and thus DPC-capable
>     hotplug ports always support Data Link Layer State Changed events.
>     But the same cannot be assumed for Secondary Bus Reset, which already
>     existed in PCIe r1.0.
>     
>     Secondary Bus Reset is only one of many causes of spurious link changes.
>     Others include runtime suspend to D3cold, firmware updates or FPGA
>     reconfiguration.  The new pci_hp_{,un}ignore_link_change() helpers may be
>     used by all kinds of drivers to annotate such code sections, hence their
>     declarations are publicly visible in <linux/pci.h>.  A case in point is
>     the Mellanox Ethernet driver which disables a firmware reset feature if
>     the Ethernet card is attached to a hotplug port, see commit 3d7a3f2612d7
>     ("net/mlx5: Nack sync reset request when HotPlug is enabled").  Going
>     forward, PCIe hotplug will be able to cope gracefully with all such use
>     cases once the code sections are properly annotated.
>     
>     The new helpers internally use two bits in struct pci_dev's priv_flags as
>     well as a wait_queue.  This mirrors what was done for DPC by commit
>     a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").  That may
>     be insufficient if spurious link changes are caused by multiple sources
>     simultaneously.  An example might be a Secondary Bus Reset issued by AER
>     during FPGA reconfiguration.  If this turns out to happen in real life,
>     support for it can easily be added by replacing the PCI_LINK_CHANGING flag
>     with an atomic_t counter incremented by pci_hp_ignore_link_change() and
>     decremented by pci_hp_unignore_link_change().  Instead of awaiting a zero
>     PCI_LINK_CHANGING flag, the pci_hp_spurious_link_change() helper would
>     then simply await a zero counter.
>     
>     Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
>     Reported-by: Joel Mathew Thomas <proxy0@tutamail.com>
>     Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765
>     Signed-off-by: Lukas Wunner <lukas@wunner.de>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>     Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>
>     Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>     Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>     Link: https://patch.msgid.link/d04deaf49d634a2edf42bf3c06ed81b4ca54d17b.1744298239.git.lukas@wunner.de
>     Signed-off-by: Sasha Levin <sashal@kernel.org>

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Patch "PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset" has been added to the 6.15-stable tree
  2025-06-13  3:55 ` Patch "PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset" has been added to the 6.15-stable tree Lukas Wunner
@ 2025-06-14 16:11   ` Sasha Levin
  0 siblings, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2025-06-14 16:11 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: stable, stable-commits, Bjorn Helgaas, Joel Mathew Thomas

On Fri, Jun 13, 2025 at 05:55:19AM +0200, Lukas Wunner wrote:
>[cc += Joel Mathew Thomas]
>
>On Tue, Jun 10, 2025 at 08:16:05AM -0400, Sasha Levin wrote:
>> This is a note to let you know that I've just added the patch titled
>>
>>     PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
>>
>> to the 6.15-stable tree which can be found at:
>>     http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
>>
>> The filename of the patch is:
>>      pci-pciehp-ignore-link-down-up-caused-by-secondary-b.patch
>> and it can be found in the queue-6.15 subdirectory.
>>
>> If you, or anyone else, feels it should not be added to the stable tree,
>> please let <stable@vger.kernel.org> know about it.
>
>Hi Sasha, thanks for selecting the above (which is 2af781a9edc4 upstream)
>as a 6.15 backport.
>
>A small feature request, could you amend the stable tooling to cc
>people tagged as Reported-by and Tested-by?  I think they're the
>ones most interested in seeing something backported.

I'll see if git-send-email supports something like that...

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2025-06-14 16:11 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20250610121606.1556304-1-sashal@kernel.org>
2025-06-13  3:55 ` Patch "PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset" has been added to the 6.15-stable tree Lukas Wunner
2025-06-14 16:11   ` Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox