linux-fpga.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Lukas Wunner <lukas@wunner.de>, Bjorn Helgaas <helgaas@kernel.org>
Cc: Sathyanarayanan Kuppuswamy
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	 Keith Busch <kbusch@kernel.org>,
	Yicong Yang <yangyicong@hisilicon.com>,
	 linux-pci@vger.kernel.org,
	Stuart Hayes <stuart.w.hayes@gmail.com>,
	 Mika Westerberg <mika.westerberg@linux.intel.com>,
	 Joel Mathew Thomas <proxy0@tutamail.com>,
	 Russ Weight <russ.weight@linux.dev>,
	 Matthew Gerlach <matthew.gerlach@altera.com>,
	 Yilun Xu <yilun.xu@intel.com>,
	linux-fpga@vger.kernel.org,  Moshe Shemesh <moshe@nvidia.com>,
	Shay Drory <shayd@nvidia.com>,
	 Saeed Mahameed <saeedm@nvidia.com>,
	 Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
Date: Wed, 16 Apr 2025 11:00:43 +0300 (EEST)	[thread overview]
Message-ID: <e42a19a8-4934-69bb-0a79-6748062043ec@linux.intel.com> (raw)
In-Reply-To: <d04deaf49d634a2edf42bf3c06ed81b4ca54d17b.1744298239.git.lukas@wunner.de>

[-- Attachment #1: Type: text/plain, Size: 7697 bytes --]

On Thu, 10 Apr 2025, Lukas Wunner wrote:

> When a Secondary Bus Reset is issued at a hotplug port, it causes a Data
> Link Layer State Changed event as a side effect.  On hotplug ports using
> in-band presence detect, it additionally causes a Presence Detect Changed
> event.
> 
> These spurious events should not result in teardown and re-enumeration of
> the device in the slot.  Hence commit 2e35afaefe64 ("PCI: pciehp: Add
> reset_slot() method") masked the Presence Detect Changed Enable bit in the
> Slot Control register during a Secondary Bus Reset.  Commit 06a8d89af551
> ("PCI: pciehp: Disable link notification across slot reset") additionally
> masked the Data Link Layer State Changed Enable bit.
> 
> However masking those bits only disables interrupt generation (PCIe r6.2
> sec 6.7.3.1).  The events are still visible in the Slot Status register
> and picked up by the IRQ handler if it runs during a Secondary Bus Reset.
> This can happen if the interrupt is shared or if an unmasked hotplug event
> occurs, e.g. Attention Button Pressed or Power Fault Detected.
> 
> The likelihood of this happening used to be small, so it wasn't much of a
> problem in practice.  That has changed with the recent introduction of
> bandwidth control in v6.13-rc1 with commit 665745f27487 ("PCI/bwctrl:
> Re-add BW notification portdrv as PCIe BW controller"):
> 
> Bandwidth control shares the interrupt with PCIe hotplug.  A Secondary Bus
> Reset causes a Link Bandwidth Notification, so the hotplug IRQ handler
> runs, picks up the masked events and tears down the device in the slot.
> 
> As a result, Joel reports VFIO passthrough failure of a GPU, which Ilpo
> root-caused to the incorrect handling of masked hotplug events.
> 
> Clearly, a more reliable way is needed to ignore spurious hotplug events.
> 
> For Downstream Port Containment, a new ignore mechanism was introduced by
> commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").
> It has been working reliably for the past four years.
> 
> Adapt it for Secondary Bus Resets.
> 
> Introduce two helpers to annotate code sections which cause spurious link
> changes:  pci_hp_ignore_link_change() and pci_hp_unignore_link_change()
> Use those helpers in lieu of masking interrupts in the Slot Control
> register.
> 
> Introduce a helper to check whether such a code section is executing
> concurrently and if so, await it:  pci_hp_spurious_link_change()
> Invoke the helper in the hotplug IRQ thread pciehp_ist().  Re-use the
> IRQ thread's existing code which ignores DPC-induced link changes unless
> the link is unexpectedly down after reset recovery or the device was
> replaced during the bus reset.
> 
> That code block in pciehp_ist() was previously only executed if a Data
> Link Layer State Changed event has occurred.  Additionally execute it for
> Presence Detect Changed events.  That's necessary for compatibility with
> PCIe r1.0 hotplug ports because Data Link Layer State Changed didn't exist
> before PCIe r1.1.  DPC was added with PCIe r3.1 and thus DPC-capable
> hotplug ports always support Data Link Layer State Changed events.
> But the same cannot be assumed for Secondary Bus Reset, which already
> existed in PCIe r1.0.
> 
> Secondary Bus Reset is only one of many causes of spurious link changes.
> Others include runtime suspend to D3cold, firmware updates or FPGA
> reconfiguration.  The new pci_hp_{,un}ignore_link_change() helpers may be
> used by all kinds of drivers to annotate such code sections, hence their
> declarations are publicly visible in <linux/pci.h>.  A case in point is
> the Mellanox Ethernet driver which disables a firmware reset feature if
> the Ethernet card is attached to a hotplug port, see commit 3d7a3f2612d7
> ("net/mlx5: Nack sync reset request when HotPlug is enabled").  Going
> forward, PCIe hotplug will be able to cope gracefully with all such use
> cases once the code sections are properly annotated.
> 
> The new helpers internally use two bits in struct pci_dev's priv_flags as
> well as a wait_queue.  This mirrors what was done for DPC by commit
> a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").  That may
> be insufficient if spurious link changes are caused by multiple sources
> simultaneously.  An example might be a Secondary Bus Reset issued by AER
> during FPGA reconfiguration.  If this turns out to happen in real life,
> support for it can easily be added by replacing the PCI_LINK_CHANGING flag
> with an atomic_t counter incremented by pci_hp_ignore_link_change() and
> decremented by pci_hp_unignore_link_change().  Instead of awaiting a zero
> PCI_LINK_CHANGING flag, the pci_hp_spurious_link_change() helper would
> then simply await a zero counter.
> 
> Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
> Reported-by: Joel Mathew Thomas <proxy0@tutamail.com>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765
> Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>  drivers/pci/hotplug/pci_hotplug_core.c | 69 ++++++++++++++++++++++++++++++++++
>  drivers/pci/hotplug/pciehp_hpc.c       | 35 ++++++-----------
>  drivers/pci/pci.h                      |  3 ++
>  include/linux/pci.h                    |  8 ++++
>  4 files changed, 92 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pci_hotplug_core.c b/drivers/pci/hotplug/pci_hotplug_core.c
> index d30f131..d8c5856 100644
> --- a/drivers/pci/hotplug/pci_hotplug_core.c
> +++ b/drivers/pci/hotplug/pci_hotplug_core.c
> @@ -492,6 +492,75 @@ void pci_hp_destroy(struct hotplug_slot *slot)
>  }
>  EXPORT_SYMBOL_GPL(pci_hp_destroy);
>  
> +static DECLARE_WAIT_QUEUE_HEAD(pci_hp_link_change_wq);
> +
> +/**
> + * pci_hp_ignore_link_change - begin code section causing spurious link changes
> + * @pdev: PCI hotplug bridge
> + *
> + * Mark the beginning of a code section causing spurious link changes on the
> + * Secondary Bus of @pdev, e.g. as a side effect of a Secondary Bus Reset,
> + * D3cold transition, firmware update or FPGA reconfiguration.
> + *
> + * Hotplug drivers can thus check whether such a code section is executing
> + * concurrently, await it with pci_hp_spurious_link_change() and ignore the
> + * resulting link change events.
> + *
> + * Must be paired with pci_hp_unignore_link_change().  May be called both
> + * from the PCI core and from Endpoint drivers.  May be called for bridges
> + * which are not hotplug-capable, in which case it has no effect because
> + * no hotplug driver is bound to the bridge.
> + */
> +void pci_hp_ignore_link_change(struct pci_dev *pdev)
> +{
> +	set_bit(PCI_LINK_CHANGING, &pdev->priv_flags);
> +	smp_mb__after_atomic(); /* pairs with implied barrier of wait_event() */
> +}
> +
> +/**
> + * pci_hp_unignore_link_change - end code section causing spurious link changes
> + * @pdev: PCI hotplug bridge
> + *
> + * Mark the end of a code section causing spurious link changes on the
> + * Secondary Bus of @pdev.  Must be paired with pci_hp_ignore_link_change().
> + */
> +void pci_hp_unignore_link_change(struct pci_dev *pdev)
> +{
> +	set_bit(PCI_LINK_CHANGED, &pdev->priv_flags);
> +	mb(); /* ensure pci_hp_spurious_link_change() sees either bit set */
> +	clear_bit(PCI_LINK_CHANGING, &pdev->priv_flags);
> +	wake_up_all(&pci_hp_link_change_wq);

This change should have added these:

#include <linux/bitops.h>
#include <linux/wait.h>

#include <asm/barrier.h>

--
 i.

  parent reply	other threads:[~2025-04-16  8:00 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-10 15:27 [PATCH 0/2] Ignore spurious PCIe hotplug events Lukas Wunner
2025-04-10 15:27 ` [PATCH 1/2] PCI: pciehp: Ignore Presence Detect Changed caused by DPC Lukas Wunner
2025-04-11  2:34   ` Sathyanarayanan Kuppuswamy
2025-04-11  8:58     ` Lukas Wunner
2025-04-14 13:33   ` Ilpo Järvinen
2025-04-10 15:27 ` [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset Lukas Wunner
2025-04-11 22:28   ` Sathyanarayanan Kuppuswamy
2025-04-12  3:36     ` Lukas Wunner
2025-04-13 17:21       ` Sathyanarayanan Kuppuswamy
2025-04-13 17:22   ` Sathyanarayanan Kuppuswamy
2025-04-14 13:32   ` Ilpo Järvinen
2025-04-16  8:00   ` Ilpo Järvinen [this message]
2025-04-10 22:19 ` [PATCH 0/2] Ignore spurious PCIe hotplug events Bjorn Helgaas
2025-04-15 20:51 ` Keith Busch
2025-04-16 15:06   ` Lukas Wunner
2025-04-18  1:26     ` Keith Busch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e42a19a8-4934-69bb-0a79-6748062043ec@linux.intel.com \
    --to=ilpo.jarvinen@linux.intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=helgaas@kernel.org \
    --cc=kbusch@kernel.org \
    --cc=linux-fpga@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=matthew.gerlach@altera.com \
    --cc=mika.westerberg@linux.intel.com \
    --cc=moshe@nvidia.com \
    --cc=proxy0@tutamail.com \
    --cc=russ.weight@linux.dev \
    --cc=saeedm@nvidia.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=shayd@nvidia.com \
    --cc=stuart.w.hayes@gmail.com \
    --cc=yangyicong@hisilicon.com \
    --cc=yilun.xu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).