linux-fpga.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Ignore spurious PCIe hotplug events
@ 2025-04-10 15:27 Lukas Wunner
  2025-04-10 15:27 ` [PATCH 1/2] PCI: pciehp: Ignore Presence Detect Changed caused by DPC Lukas Wunner
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Lukas Wunner @ 2025-04-10 15:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Sathyanarayanan Kuppuswamy, Keith Busch, Yicong Yang, linux-pci,
	Stuart Hayes, Mika Westerberg, Ilpo Jarvinen, Joel Mathew Thomas,
	Russ Weight, Matthew Gerlach, Yilun Xu, linux-fpga, Moshe Shemesh,
	Shay Drory, Saeed Mahameed, Alex Williamson

Trying to kill several birds with one stone here:

First of all, PCIe hotplug is deliberately ignoring link events occurring
as a side effect of Downstream Port Containment.  But it's not yet ignoring
Presence Detect Changed events.  These can happen if a hotplug bridge uses
in-band presence detect.  Reported by Keith Busch, patch [1/2] seeks to
fix it.

Second, PCIe hotplug is deliberately ignoring link events and Presence
Detect Changed events occurring as a side effect of a Secondary Bus Reset.
But that's no longer working properly since the introduction of bandwidth
control in v6.13-rc1.  Actually it never worked properly, but bandwidth
control is now mercilessly exposing the issue.  VFIO is thus broken,
it resets the device on passthrough.  Reported by Joel Mathew Thomas.

Third, link or presence events can not only occur as a side effect of DPC
or SBR, but also because of suspend to D3cold, a firmware update or FPGA
reconfiguration.  In particular, Altera engineers report that the link
goes down as a side effect of FPGA reconfiguration and the PCIe hotplug
driver responds by disabling slot power.  Obviously not what you'd want
while the FPGA is being reconfigured!

This leads me to believe that we need a generic mechanism to tell hotplug
drivers that spurious link changes are ongoing which need to be ignored.
Patch [2/2] introduces an API for it and the first user is SBR handling
in PCIe hotplug.  This fixes the issue exposed by bandwidth control.
It also aligns DPC and SBR handling in the PCIe hotplug driver such that
they use the same code path.

The API pci_hp_ignore_link_change() / pci_hp_unignore_link_change() is
initially not exported.  It can be once the first modular user shows up.

Although these are technically fixes, they're slightly intrusive, so it
would be good to let them simmer in linux-next for a while.  One option
would be to apply for v6.16 and let Greg & Sasha do the backporting.
Another would be to apply to the for-linus branch for v6.15 but wait
maybe 4 weeks before a pull request is sent.

Please review and test.  Thanks!

Lukas Wunner (2):
  PCI: pciehp: Ignore Presence Detect Changed caused by DPC
  PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset

 drivers/pci/hotplug/pci_hotplug_core.c | 69 ++++++++++++++++++++++++++++++
 drivers/pci/hotplug/pciehp.h           |  1 +
 drivers/pci/hotplug/pciehp_core.c      | 29 -------------
 drivers/pci/hotplug/pciehp_hpc.c       | 78 ++++++++++++++++++++++------------
 drivers/pci/pci.h                      |  3 ++
 include/linux/pci.h                    |  8 ++++
 6 files changed, 132 insertions(+), 56 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] PCI: pciehp: Ignore Presence Detect Changed caused by DPC
  2025-04-10 15:27 [PATCH 0/2] Ignore spurious PCIe hotplug events Lukas Wunner
@ 2025-04-10 15:27 ` Lukas Wunner
  2025-04-11  2:34   ` Sathyanarayanan Kuppuswamy
  2025-04-14 13:33   ` Ilpo Järvinen
  2025-04-10 15:27 ` [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset Lukas Wunner
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 16+ messages in thread
From: Lukas Wunner @ 2025-04-10 15:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Sathyanarayanan Kuppuswamy, Keith Busch, Yicong Yang, linux-pci,
	Stuart Hayes, Mika Westerberg, Ilpo Jarvinen, Joel Mathew Thomas,
	Russ Weight, Matthew Gerlach, Yilun Xu, linux-fpga, Moshe Shemesh,
	Shay Drory, Saeed Mahameed, Alex Williamson

Commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
amended PCIe hotplug to not bring down the slot upon Data Link Layer State
Changed events caused by Downstream Port Containment.

However Keith reports off-list that if the slot uses in-band presence
detect (i.e. Presence Detect State is derived from Data Link Layer Link
Active), DPC also causes a spurious Presence Detect Changed event.

This needs to be ignored as well.

Unfortunately there's no register indicating that in-band presence detect
is used.  PCIe r5.0 sec 7.5.3.10 introduced the In-Band PD Disable bit in
the Slot Control Register.  The PCIe hotplug driver sets this bit on
ports supporting it.  But older ports may still use in-band presence
detect.

If in-band presence detect can be disabled, Presence Detect Changed events
occurring during DPC must not be ignored because they signal device
replacement.  On all other ports, device replacement cannot be detected
reliably because the Presence Detect Changed event could be a side effect
of DPC.  On those (older) ports, perform a best-effort device replacement
check by comparing the Vendor ID, Device ID and other data in Config Space
with the values cached in struct pci_dev.  Use the existing helper
pciehp_device_replaced() to accomplish this.  It is currently #ifdef'ed to
CONFIG_PM_SLEEP in pciehp_core.c, so move it to pciehp_hpc.c where most
other functions accessing config space reside.

Reported-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h      |  1 +
 drivers/pci/hotplug/pciehp_core.c | 29 -----------------------
 drivers/pci/hotplug/pciehp_hpc.c  | 49 +++++++++++++++++++++++++++++++++------
 3 files changed, 43 insertions(+), 36 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 273dd8c..debc79b0 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -187,6 +187,7 @@ struct controller {
 int pciehp_card_present_or_link_active(struct controller *ctrl);
 int pciehp_check_link_status(struct controller *ctrl);
 int pciehp_check_link_active(struct controller *ctrl);
+bool pciehp_device_replaced(struct controller *ctrl);
 void pciehp_release_ctrl(struct controller *ctrl);
 
 int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 997841c..f59baa9 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -284,35 +284,6 @@ static int pciehp_suspend(struct pcie_device *dev)
 	return 0;
 }
 
-static bool pciehp_device_replaced(struct controller *ctrl)
-{
-	struct pci_dev *pdev __free(pci_dev_put) = NULL;
-	u32 reg;
-
-	if (pci_dev_is_disconnected(ctrl->pcie->port))
-		return false;
-
-	pdev = pci_get_slot(ctrl->pcie->port->subordinate, PCI_DEVFN(0, 0));
-	if (!pdev)
-		return true;
-
-	if (pci_read_config_dword(pdev, PCI_VENDOR_ID, &reg) ||
-	    reg != (pdev->vendor | (pdev->device << 16)) ||
-	    pci_read_config_dword(pdev, PCI_CLASS_REVISION, &reg) ||
-	    reg != (pdev->revision | (pdev->class << 8)))
-		return true;
-
-	if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
-	    (pci_read_config_dword(pdev, PCI_SUBSYSTEM_VENDOR_ID, &reg) ||
-	     reg != (pdev->subsystem_vendor | (pdev->subsystem_device << 16))))
-		return true;
-
-	if (pci_get_dsn(pdev) != ctrl->dsn)
-		return true;
-
-	return false;
-}
-
 static int pciehp_resume_noirq(struct pcie_device *dev)
 {
 	struct controller *ctrl = get_service_data(dev);
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 8a09fb6..388fbed 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -563,18 +563,48 @@ void pciehp_power_off_slot(struct controller *ctrl)
 		 PCI_EXP_SLTCTL_PWR_OFF);
 }
 
+bool pciehp_device_replaced(struct controller *ctrl)
+{
+	struct pci_dev *pdev __free(pci_dev_put) = NULL;
+	u32 reg;
+
+	if (pci_dev_is_disconnected(ctrl->pcie->port))
+		return false;
+
+	pdev = pci_get_slot(ctrl->pcie->port->subordinate, PCI_DEVFN(0, 0));
+	if (!pdev)
+		return true;
+
+	if (pci_read_config_dword(pdev, PCI_VENDOR_ID, &reg) ||
+	    reg != (pdev->vendor | (pdev->device << 16)) ||
+	    pci_read_config_dword(pdev, PCI_CLASS_REVISION, &reg) ||
+	    reg != (pdev->revision | (pdev->class << 8)))
+		return true;
+
+	if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
+	    (pci_read_config_dword(pdev, PCI_SUBSYSTEM_VENDOR_ID, &reg) ||
+	     reg != (pdev->subsystem_vendor | (pdev->subsystem_device << 16))))
+		return true;
+
+	if (pci_get_dsn(pdev) != ctrl->dsn)
+		return true;
+
+	return false;
+}
+
 static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
-					  struct pci_dev *pdev, int irq)
+					  struct pci_dev *pdev, int irq,
+					  u16 ignored_events)
 {
 	/*
 	 * Ignore link changes which occurred while waiting for DPC recovery.
 	 * Could be several if DPC triggered multiple times consecutively.
 	 */
 	synchronize_hardirq(irq);
-	atomic_and(~PCI_EXP_SLTSTA_DLLSC, &ctrl->pending_events);
+	atomic_and(~ignored_events, &ctrl->pending_events);
 	if (pciehp_poll_mode)
 		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
-					   PCI_EXP_SLTSTA_DLLSC);
+					   ignored_events);
 	ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored (recovered by DPC)\n",
 		  slot_name(ctrl));
 
@@ -584,8 +614,8 @@ static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
 	 * Synthesize it to ensure that it is acted on.
 	 */
 	down_read_nested(&ctrl->reset_lock, ctrl->depth);
-	if (!pciehp_check_link_active(ctrl))
-		pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
+	if (!pciehp_check_link_active(ctrl) || pciehp_device_replaced(ctrl))
+		pciehp_request(ctrl, ignored_events);
 	up_read(&ctrl->reset_lock);
 }
 
@@ -736,8 +766,13 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 	 */
 	if ((events & PCI_EXP_SLTSTA_DLLSC) && pci_dpc_recovered(pdev) &&
 	    ctrl->state == ON_STATE) {
-		events &= ~PCI_EXP_SLTSTA_DLLSC;
-		pciehp_ignore_dpc_link_change(ctrl, pdev, irq);
+		u16 ignored_events = PCI_EXP_SLTSTA_DLLSC;
+
+		if (!ctrl->inband_presence_disabled)
+			ignored_events |= events & PCI_EXP_SLTSTA_PDC;
+
+		events &= ~ignored_events;
+		pciehp_ignore_dpc_link_change(ctrl, pdev, irq, ignored_events);
 	}
 
 	/*
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
  2025-04-10 15:27 [PATCH 0/2] Ignore spurious PCIe hotplug events Lukas Wunner
  2025-04-10 15:27 ` [PATCH 1/2] PCI: pciehp: Ignore Presence Detect Changed caused by DPC Lukas Wunner
@ 2025-04-10 15:27 ` Lukas Wunner
  2025-04-11 22:28   ` Sathyanarayanan Kuppuswamy
                     ` (3 more replies)
  2025-04-10 22:19 ` [PATCH 0/2] Ignore spurious PCIe hotplug events Bjorn Helgaas
  2025-04-15 20:51 ` Keith Busch
  3 siblings, 4 replies; 16+ messages in thread
From: Lukas Wunner @ 2025-04-10 15:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Sathyanarayanan Kuppuswamy, Keith Busch, Yicong Yang, linux-pci,
	Stuart Hayes, Mika Westerberg, Ilpo Jarvinen, Joel Mathew Thomas,
	Russ Weight, Matthew Gerlach, Yilun Xu, linux-fpga, Moshe Shemesh,
	Shay Drory, Saeed Mahameed, Alex Williamson

When a Secondary Bus Reset is issued at a hotplug port, it causes a Data
Link Layer State Changed event as a side effect.  On hotplug ports using
in-band presence detect, it additionally causes a Presence Detect Changed
event.

These spurious events should not result in teardown and re-enumeration of
the device in the slot.  Hence commit 2e35afaefe64 ("PCI: pciehp: Add
reset_slot() method") masked the Presence Detect Changed Enable bit in the
Slot Control register during a Secondary Bus Reset.  Commit 06a8d89af551
("PCI: pciehp: Disable link notification across slot reset") additionally
masked the Data Link Layer State Changed Enable bit.

However masking those bits only disables interrupt generation (PCIe r6.2
sec 6.7.3.1).  The events are still visible in the Slot Status register
and picked up by the IRQ handler if it runs during a Secondary Bus Reset.
This can happen if the interrupt is shared or if an unmasked hotplug event
occurs, e.g. Attention Button Pressed or Power Fault Detected.

The likelihood of this happening used to be small, so it wasn't much of a
problem in practice.  That has changed with the recent introduction of
bandwidth control in v6.13-rc1 with commit 665745f27487 ("PCI/bwctrl:
Re-add BW notification portdrv as PCIe BW controller"):

Bandwidth control shares the interrupt with PCIe hotplug.  A Secondary Bus
Reset causes a Link Bandwidth Notification, so the hotplug IRQ handler
runs, picks up the masked events and tears down the device in the slot.

As a result, Joel reports VFIO passthrough failure of a GPU, which Ilpo
root-caused to the incorrect handling of masked hotplug events.

Clearly, a more reliable way is needed to ignore spurious hotplug events.

For Downstream Port Containment, a new ignore mechanism was introduced by
commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").
It has been working reliably for the past four years.

Adapt it for Secondary Bus Resets.

Introduce two helpers to annotate code sections which cause spurious link
changes:  pci_hp_ignore_link_change() and pci_hp_unignore_link_change()
Use those helpers in lieu of masking interrupts in the Slot Control
register.

Introduce a helper to check whether such a code section is executing
concurrently and if so, await it:  pci_hp_spurious_link_change()
Invoke the helper in the hotplug IRQ thread pciehp_ist().  Re-use the
IRQ thread's existing code which ignores DPC-induced link changes unless
the link is unexpectedly down after reset recovery or the device was
replaced during the bus reset.

That code block in pciehp_ist() was previously only executed if a Data
Link Layer State Changed event has occurred.  Additionally execute it for
Presence Detect Changed events.  That's necessary for compatibility with
PCIe r1.0 hotplug ports because Data Link Layer State Changed didn't exist
before PCIe r1.1.  DPC was added with PCIe r3.1 and thus DPC-capable
hotplug ports always support Data Link Layer State Changed events.
But the same cannot be assumed for Secondary Bus Reset, which already
existed in PCIe r1.0.

Secondary Bus Reset is only one of many causes of spurious link changes.
Others include runtime suspend to D3cold, firmware updates or FPGA
reconfiguration.  The new pci_hp_{,un}ignore_link_change() helpers may be
used by all kinds of drivers to annotate such code sections, hence their
declarations are publicly visible in <linux/pci.h>.  A case in point is
the Mellanox Ethernet driver which disables a firmware reset feature if
the Ethernet card is attached to a hotplug port, see commit 3d7a3f2612d7
("net/mlx5: Nack sync reset request when HotPlug is enabled").  Going
forward, PCIe hotplug will be able to cope gracefully with all such use
cases once the code sections are properly annotated.

The new helpers internally use two bits in struct pci_dev's priv_flags as
well as a wait_queue.  This mirrors what was done for DPC by commit
a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").  That may
be insufficient if spurious link changes are caused by multiple sources
simultaneously.  An example might be a Secondary Bus Reset issued by AER
during FPGA reconfiguration.  If this turns out to happen in real life,
support for it can easily be added by replacing the PCI_LINK_CHANGING flag
with an atomic_t counter incremented by pci_hp_ignore_link_change() and
decremented by pci_hp_unignore_link_change().  Instead of awaiting a zero
PCI_LINK_CHANGING flag, the pci_hp_spurious_link_change() helper would
then simply await a zero counter.

Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
Reported-by: Joel Mathew Thomas <proxy0@tutamail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765
Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/hotplug/pci_hotplug_core.c | 69 ++++++++++++++++++++++++++++++++++
 drivers/pci/hotplug/pciehp_hpc.c       | 35 ++++++-----------
 drivers/pci/pci.h                      |  3 ++
 include/linux/pci.h                    |  8 ++++
 4 files changed, 92 insertions(+), 23 deletions(-)

diff --git a/drivers/pci/hotplug/pci_hotplug_core.c b/drivers/pci/hotplug/pci_hotplug_core.c
index d30f131..d8c5856 100644
--- a/drivers/pci/hotplug/pci_hotplug_core.c
+++ b/drivers/pci/hotplug/pci_hotplug_core.c
@@ -492,6 +492,75 @@ void pci_hp_destroy(struct hotplug_slot *slot)
 }
 EXPORT_SYMBOL_GPL(pci_hp_destroy);
 
+static DECLARE_WAIT_QUEUE_HEAD(pci_hp_link_change_wq);
+
+/**
+ * pci_hp_ignore_link_change - begin code section causing spurious link changes
+ * @pdev: PCI hotplug bridge
+ *
+ * Mark the beginning of a code section causing spurious link changes on the
+ * Secondary Bus of @pdev, e.g. as a side effect of a Secondary Bus Reset,
+ * D3cold transition, firmware update or FPGA reconfiguration.
+ *
+ * Hotplug drivers can thus check whether such a code section is executing
+ * concurrently, await it with pci_hp_spurious_link_change() and ignore the
+ * resulting link change events.
+ *
+ * Must be paired with pci_hp_unignore_link_change().  May be called both
+ * from the PCI core and from Endpoint drivers.  May be called for bridges
+ * which are not hotplug-capable, in which case it has no effect because
+ * no hotplug driver is bound to the bridge.
+ */
+void pci_hp_ignore_link_change(struct pci_dev *pdev)
+{
+	set_bit(PCI_LINK_CHANGING, &pdev->priv_flags);
+	smp_mb__after_atomic(); /* pairs with implied barrier of wait_event() */
+}
+
+/**
+ * pci_hp_unignore_link_change - end code section causing spurious link changes
+ * @pdev: PCI hotplug bridge
+ *
+ * Mark the end of a code section causing spurious link changes on the
+ * Secondary Bus of @pdev.  Must be paired with pci_hp_ignore_link_change().
+ */
+void pci_hp_unignore_link_change(struct pci_dev *pdev)
+{
+	set_bit(PCI_LINK_CHANGED, &pdev->priv_flags);
+	mb(); /* ensure pci_hp_spurious_link_change() sees either bit set */
+	clear_bit(PCI_LINK_CHANGING, &pdev->priv_flags);
+	wake_up_all(&pci_hp_link_change_wq);
+}
+
+/**
+ * pci_hp_spurious_link_change - check for spurious link changes
+ * @pdev: PCI hotplug bridge
+ *
+ * Check whether a code section is executing concurrently which is causing
+ * spurious link changes on the Secondary Bus of @pdev.  Await the end of the
+ * code section if so.
+ *
+ * Return %true if such a code section has been executing concurrently,
+ * otherwise %false.  Also return %true if such a code section has not been
+ * executing concurrently, but at least once since the last invocation of this
+ * function.
+ *
+ * May be called by hotplug drivers to check whether a link change is spurious
+ * and can be ignored.
+ *
+ * Because a genuine link change may have occurred in-between a spurious link
+ * change and the invocation of this function, hotplug drivers should perform
+ * sanity checks such as retrieving the current link state and bringing down
+ * the slot if the link is down.
+ */
+bool pci_hp_spurious_link_change(struct pci_dev *pdev)
+{
+	wait_event(pci_hp_link_change_wq,
+		   !test_bit(PCI_LINK_CHANGING, &pdev->priv_flags));
+
+	return test_and_clear_bit(PCI_LINK_CHANGED, &pdev->priv_flags);
+}
+
 static int __init pci_hotplug_init(void)
 {
 	int result;
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 388fbed..c98d310 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -592,21 +592,21 @@ bool pciehp_device_replaced(struct controller *ctrl)
 	return false;
 }
 
-static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
-					  struct pci_dev *pdev, int irq,
-					  u16 ignored_events)
+static void pciehp_ignore_link_change(struct controller *ctrl,
+				      struct pci_dev *pdev, int irq,
+				      u16 ignored_events)
 {
 	/*
 	 * Ignore link changes which occurred while waiting for DPC recovery.
 	 * Could be several if DPC triggered multiple times consecutively.
+	 * Also ignore link changes caused by Secondary Bus Reset etc.
 	 */
 	synchronize_hardirq(irq);
 	atomic_and(~ignored_events, &ctrl->pending_events);
 	if (pciehp_poll_mode)
 		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
 					   ignored_events);
-	ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored (recovered by DPC)\n",
-		  slot_name(ctrl));
+	ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored\n", slot_name(ctrl));
 
 	/*
 	 * If the link is unexpectedly down after successful recovery,
@@ -762,9 +762,11 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 
 	/*
 	 * Ignore Link Down/Up events caused by Downstream Port Containment
-	 * if recovery from the error succeeded.
+	 * if recovery succeeded, or caused by Secondary Bus Reset,
+	 * suspend to D3cold, firmware update, FPGA reconfiguration, etc.
 	 */
-	if ((events & PCI_EXP_SLTSTA_DLLSC) && pci_dpc_recovered(pdev) &&
+	if ((events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC)) &&
+	    (pci_dpc_recovered(pdev) || pci_hp_spurious_link_change(pdev)) &&
 	    ctrl->state == ON_STATE) {
 		u16 ignored_events = PCI_EXP_SLTSTA_DLLSC;
 
@@ -772,7 +774,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 			ignored_events |= events & PCI_EXP_SLTSTA_PDC;
 
 		events &= ~ignored_events;
-		pciehp_ignore_dpc_link_change(ctrl, pdev, irq, ignored_events);
+		pciehp_ignore_link_change(ctrl, pdev, irq, ignored_events);
 	}
 
 	/*
@@ -937,7 +939,6 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
 {
 	struct controller *ctrl = to_ctrl(hotplug_slot);
 	struct pci_dev *pdev = ctrl_dev(ctrl);
-	u16 stat_mask = 0, ctrl_mask = 0;
 	int rc;
 
 	if (probe)
@@ -945,23 +946,11 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
 
 	down_write_nested(&ctrl->reset_lock, ctrl->depth);
 
-	if (!ATTN_BUTTN(ctrl)) {
-		ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
-		stat_mask |= PCI_EXP_SLTSTA_PDC;
-	}
-	ctrl_mask |= PCI_EXP_SLTCTL_DLLSCE;
-	stat_mask |= PCI_EXP_SLTSTA_DLLSC;
-
-	pcie_write_cmd(ctrl, 0, ctrl_mask);
-	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
-		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
+	pci_hp_ignore_link_change(pdev);
 
 	rc = pci_bridge_secondary_bus_reset(ctrl->pcie->port);
 
-	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, stat_mask);
-	pcie_write_cmd_nowait(ctrl, ctrl_mask, ctrl_mask);
-	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
-		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
+	pci_hp_unignore_link_change(pdev);
 
 	up_write(&ctrl->reset_lock);
 	return rc;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index b81e99c..7db798b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -227,6 +227,7 @@ static inline bool pcie_downstream_port(const struct pci_dev *dev)
 
 /* Functions for PCI Hotplug drivers to use */
 int pci_hp_add_bridge(struct pci_dev *dev);
+bool pci_hp_spurious_link_change(struct pci_dev *pdev);
 
 #if defined(CONFIG_SYSFS) && defined(HAVE_PCI_LEGACY)
 void pci_create_legacy_files(struct pci_bus *bus);
@@ -557,6 +558,8 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
 #define PCI_DPC_RECOVERED 1
 #define PCI_DPC_RECOVERING 2
 #define PCI_DEV_REMOVED 3
+#define PCI_LINK_CHANGED 4
+#define PCI_LINK_CHANGING 5
 
 static inline void pci_dev_assign_added(struct pci_dev *dev)
 {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0e8e3fd..833b54f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1848,6 +1848,14 @@ static inline void pcie_no_aspm(void) { }
 static inline bool pcie_aspm_enabled(struct pci_dev *pdev) { return false; }
 #endif
 
+#ifdef CONFIG_HOTPLUG_PCI
+void pci_hp_ignore_link_change(struct pci_dev *pdev);
+void pci_hp_unignore_link_change(struct pci_dev *pdev);
+#else
+static inline void pci_hp_ignore_link_change(struct pci_dev *pdev) { }
+static inline void pci_hp_unignore_link_change(struct pci_dev *pdev) { }
+#endif
+
 #ifdef CONFIG_PCIEAER
 bool pci_aer_available(void);
 #else
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Ignore spurious PCIe hotplug events
  2025-04-10 15:27 [PATCH 0/2] Ignore spurious PCIe hotplug events Lukas Wunner
  2025-04-10 15:27 ` [PATCH 1/2] PCI: pciehp: Ignore Presence Detect Changed caused by DPC Lukas Wunner
  2025-04-10 15:27 ` [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset Lukas Wunner
@ 2025-04-10 22:19 ` Bjorn Helgaas
  2025-04-15 20:51 ` Keith Busch
  3 siblings, 0 replies; 16+ messages in thread
From: Bjorn Helgaas @ 2025-04-10 22:19 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Sathyanarayanan Kuppuswamy, Keith Busch, Yicong Yang, linux-pci,
	Stuart Hayes, Mika Westerberg, Ilpo Jarvinen, Joel Mathew Thomas,
	Russ Weight, Matthew Gerlach, Yilun Xu, linux-fpga, Moshe Shemesh,
	Shay Drory, Saeed Mahameed, Alex Williamson

On Thu, Apr 10, 2025 at 05:27:10PM +0200, Lukas Wunner wrote:
> Trying to kill several birds with one stone here:
> 
> First of all, PCIe hotplug is deliberately ignoring link events occurring
> as a side effect of Downstream Port Containment.  But it's not yet ignoring
> Presence Detect Changed events.  These can happen if a hotplug bridge uses
> in-band presence detect.  Reported by Keith Busch, patch [1/2] seeks to
> fix it.
> 
> Second, PCIe hotplug is deliberately ignoring link events and Presence
> Detect Changed events occurring as a side effect of a Secondary Bus Reset.
> But that's no longer working properly since the introduction of bandwidth
> control in v6.13-rc1.  Actually it never worked properly, but bandwidth
> control is now mercilessly exposing the issue.  VFIO is thus broken,
> it resets the device on passthrough.  Reported by Joel Mathew Thomas.
> 
> Third, link or presence events can not only occur as a side effect of DPC
> or SBR, but also because of suspend to D3cold, a firmware update or FPGA
> reconfiguration.  In particular, Altera engineers report that the link
> goes down as a side effect of FPGA reconfiguration and the PCIe hotplug
> driver responds by disabling slot power.  Obviously not what you'd want
> while the FPGA is being reconfigured!
> 
> This leads me to believe that we need a generic mechanism to tell hotplug
> drivers that spurious link changes are ongoing which need to be ignored.
> Patch [2/2] introduces an API for it and the first user is SBR handling
> in PCIe hotplug.  This fixes the issue exposed by bandwidth control.
> It also aligns DPC and SBR handling in the PCIe hotplug driver such that
> they use the same code path.
> 
> The API pci_hp_ignore_link_change() / pci_hp_unignore_link_change() is
> initially not exported.  It can be once the first modular user shows up.
> 
> Although these are technically fixes, they're slightly intrusive, so it
> would be good to let them simmer in linux-next for a while.  One option
> would be to apply for v6.16 and let Greg & Sasha do the backporting.
> Another would be to apply to the for-linus branch for v6.15 but wait
> maybe 4 weeks before a pull request is sent.
> 
> Please review and test.  Thanks!
> 
> Lukas Wunner (2):
>   PCI: pciehp: Ignore Presence Detect Changed caused by DPC
>   PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
> 
>  drivers/pci/hotplug/pci_hotplug_core.c | 69 ++++++++++++++++++++++++++++++
>  drivers/pci/hotplug/pciehp.h           |  1 +
>  drivers/pci/hotplug/pciehp_core.c      | 29 -------------
>  drivers/pci/hotplug/pciehp_hpc.c       | 78 ++++++++++++++++++++++------------
>  drivers/pci/pci.h                      |  3 ++
>  include/linux/pci.h                    |  8 ++++
>  6 files changed, 132 insertions(+), 56 deletions(-)

Applied to pci/hotplug for now, thanks!  We can get it into -next and
if all goes well easily move to for-linus for v6.15 if that's the
right thing.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] PCI: pciehp: Ignore Presence Detect Changed caused by DPC
  2025-04-10 15:27 ` [PATCH 1/2] PCI: pciehp: Ignore Presence Detect Changed caused by DPC Lukas Wunner
@ 2025-04-11  2:34   ` Sathyanarayanan Kuppuswamy
  2025-04-11  8:58     ` Lukas Wunner
  2025-04-14 13:33   ` Ilpo Järvinen
  1 sibling, 1 reply; 16+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-04-11  2:34 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Keith Busch, Yicong Yang, linux-pci, Stuart Hayes,
	Mika Westerberg, Ilpo Jarvinen, Joel Mathew Thomas, Russ Weight,
	Matthew Gerlach, Yilun Xu, linux-fpga, Moshe Shemesh, Shay Drory,
	Saeed Mahameed, Alex Williamson


On 4/10/25 8:27 AM, Lukas Wunner wrote:
> Commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
> amended PCIe hotplug to not bring down the slot upon Data Link Layer State
> Changed events caused by Downstream Port Containment.
>
> However Keith reports off-list that if the slot uses in-band presence
> detect (i.e. Presence Detect State is derived from Data Link Layer Link
> Active), DPC also causes a spurious Presence Detect Changed event.
>
> This needs to be ignored as well.
>
> Unfortunately there's no register indicating that in-band presence detect
> is used.  PCIe r5.0 sec 7.5.3.10 introduced the In-Band PD Disable bit in
> the Slot Control Register.  The PCIe hotplug driver sets this bit on
> ports supporting it.  But older ports may still use in-band presence
> detect.
>
> If in-band presence detect can be disabled, Presence Detect Changed events

It should be "in-band presence detect is disabled", right?

> occurring during DPC must not be ignored because they signal device
> replacement.  On all other ports, device replacement cannot be detected
> reliably because the Presence Detect Changed event could be a side effect
> of DPC.  On those (older) ports, perform a best-effort device replacement
> check by comparing the Vendor ID, Device ID and other data in Config Space
> with the values cached in struct pci_dev.  Use the existing helper
> pciehp_device_replaced() to accomplish this.  It is currently #ifdef'ed to
> CONFIG_PM_SLEEP in pciehp_core.c, so move it to pciehp_hpc.c where most
> other functions accessing config space reside.
>
> Reported-by: Keith Busch <kbusch@kernel.org>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---

Code looks fine to me

Reviewed-by: Kuppuswamy Sathyanarayanan 
<sathyanarayanan.kuppuswamy@linux.intel.com>

>   drivers/pci/hotplug/pciehp.h      |  1 +
>   drivers/pci/hotplug/pciehp_core.c | 29 -----------------------
>   drivers/pci/hotplug/pciehp_hpc.c  | 49 +++++++++++++++++++++++++++++++++------
>   3 files changed, 43 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
> index 273dd8c..debc79b0 100644
> --- a/drivers/pci/hotplug/pciehp.h
> +++ b/drivers/pci/hotplug/pciehp.h
> @@ -187,6 +187,7 @@ struct controller {
>   int pciehp_card_present_or_link_active(struct controller *ctrl);
>   int pciehp_check_link_status(struct controller *ctrl);
>   int pciehp_check_link_active(struct controller *ctrl);
> +bool pciehp_device_replaced(struct controller *ctrl);
>   void pciehp_release_ctrl(struct controller *ctrl);
>   
>   int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot);
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index 997841c..f59baa9 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -284,35 +284,6 @@ static int pciehp_suspend(struct pcie_device *dev)
>   	return 0;
>   }
>   
> -static bool pciehp_device_replaced(struct controller *ctrl)
> -{
> -	struct pci_dev *pdev __free(pci_dev_put) = NULL;
> -	u32 reg;
> -
> -	if (pci_dev_is_disconnected(ctrl->pcie->port))
> -		return false;
> -
> -	pdev = pci_get_slot(ctrl->pcie->port->subordinate, PCI_DEVFN(0, 0));
> -	if (!pdev)
> -		return true;
> -
> -	if (pci_read_config_dword(pdev, PCI_VENDOR_ID, &reg) ||
> -	    reg != (pdev->vendor | (pdev->device << 16)) ||
> -	    pci_read_config_dword(pdev, PCI_CLASS_REVISION, &reg) ||
> -	    reg != (pdev->revision | (pdev->class << 8)))
> -		return true;
> -
> -	if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
> -	    (pci_read_config_dword(pdev, PCI_SUBSYSTEM_VENDOR_ID, &reg) ||
> -	     reg != (pdev->subsystem_vendor | (pdev->subsystem_device << 16))))
> -		return true;
> -
> -	if (pci_get_dsn(pdev) != ctrl->dsn)
> -		return true;
> -
> -	return false;
> -}
> -
>   static int pciehp_resume_noirq(struct pcie_device *dev)
>   {
>   	struct controller *ctrl = get_service_data(dev);
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 8a09fb6..388fbed 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -563,18 +563,48 @@ void pciehp_power_off_slot(struct controller *ctrl)
>   		 PCI_EXP_SLTCTL_PWR_OFF);
>   }
>   
> +bool pciehp_device_replaced(struct controller *ctrl)
> +{
> +	struct pci_dev *pdev __free(pci_dev_put) = NULL;
> +	u32 reg;
> +
> +	if (pci_dev_is_disconnected(ctrl->pcie->port))
> +		return false;
> +
> +	pdev = pci_get_slot(ctrl->pcie->port->subordinate, PCI_DEVFN(0, 0));
> +	if (!pdev)
> +		return true;
> +
> +	if (pci_read_config_dword(pdev, PCI_VENDOR_ID, &reg) ||
> +	    reg != (pdev->vendor | (pdev->device << 16)) ||
> +	    pci_read_config_dword(pdev, PCI_CLASS_REVISION, &reg) ||
> +	    reg != (pdev->revision | (pdev->class << 8)))
> +		return true;
> +
> +	if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
> +	    (pci_read_config_dword(pdev, PCI_SUBSYSTEM_VENDOR_ID, &reg) ||
> +	     reg != (pdev->subsystem_vendor | (pdev->subsystem_device << 16))))
> +		return true;
> +
> +	if (pci_get_dsn(pdev) != ctrl->dsn)
> +		return true;
> +
> +	return false;
> +}
> +
>   static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
> -					  struct pci_dev *pdev, int irq)
> +					  struct pci_dev *pdev, int irq,
> +					  u16 ignored_events)
>   {
>   	/*
>   	 * Ignore link changes which occurred while waiting for DPC recovery.
>   	 * Could be several if DPC triggered multiple times consecutively.
>   	 */
>   	synchronize_hardirq(irq);
> -	atomic_and(~PCI_EXP_SLTSTA_DLLSC, &ctrl->pending_events);
> +	atomic_and(~ignored_events, &ctrl->pending_events);
>   	if (pciehp_poll_mode)
>   		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> -					   PCI_EXP_SLTSTA_DLLSC);
> +					   ignored_events);
>   	ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored (recovered by DPC)\n",
>   		  slot_name(ctrl));
>   
> @@ -584,8 +614,8 @@ static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
>   	 * Synthesize it to ensure that it is acted on.
>   	 */
>   	down_read_nested(&ctrl->reset_lock, ctrl->depth);
> -	if (!pciehp_check_link_active(ctrl))
> -		pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
> +	if (!pciehp_check_link_active(ctrl) || pciehp_device_replaced(ctrl))
> +		pciehp_request(ctrl, ignored_events);
>   	up_read(&ctrl->reset_lock);
>   }
>   
> @@ -736,8 +766,13 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>   	 */
>   	if ((events & PCI_EXP_SLTSTA_DLLSC) && pci_dpc_recovered(pdev) &&
>   	    ctrl->state == ON_STATE) {
> -		events &= ~PCI_EXP_SLTSTA_DLLSC;
> -		pciehp_ignore_dpc_link_change(ctrl, pdev, irq);
> +		u16 ignored_events = PCI_EXP_SLTSTA_DLLSC;
> +
> +		if (!ctrl->inband_presence_disabled)
> +			ignored_events |= events & PCI_EXP_SLTSTA_PDC;
> +
> +		events &= ~ignored_events;
> +		pciehp_ignore_dpc_link_change(ctrl, pdev, irq, ignored_events);
>   	}
>   
>   	/*

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] PCI: pciehp: Ignore Presence Detect Changed caused by DPC
  2025-04-11  2:34   ` Sathyanarayanan Kuppuswamy
@ 2025-04-11  8:58     ` Lukas Wunner
  0 siblings, 0 replies; 16+ messages in thread
From: Lukas Wunner @ 2025-04-11  8:58 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy
  Cc: Bjorn Helgaas, Keith Busch, Yicong Yang, linux-pci, Stuart Hayes,
	Mika Westerberg, Ilpo Jarvinen, Joel Mathew Thomas, Russ Weight,
	Matthew Gerlach, Yilun Xu, linux-fpga, Moshe Shemesh, Shay Drory,
	Saeed Mahameed, Alex Williamson

On Thu, Apr 10, 2025 at 07:34:41PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 4/10/25 8:27 AM, Lukas Wunner wrote:
> > Commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
> > amended PCIe hotplug to not bring down the slot upon Data Link Layer State
> > Changed events caused by Downstream Port Containment.
> > 
> > However Keith reports off-list that if the slot uses in-band presence
> > detect (i.e. Presence Detect State is derived from Data Link Layer Link
> > Active), DPC also causes a spurious Presence Detect Changed event.
> > 
> > This needs to be ignored as well.
> > 
> > Unfortunately there's no register indicating that in-band presence detect
> > is used.  PCIe r5.0 sec 7.5.3.10 introduced the In-Band PD Disable bit in
> > the Slot Control Register.  The PCIe hotplug driver sets this bit on
> > ports supporting it.  But older ports may still use in-band presence
> > detect.
> > 
> > If in-band presence detect can be disabled, Presence Detect Changed events
> 
> It should be "in-band presence detect is disabled", right?

Well, for all practical purposes it's the same because pciehp disables
in-band PD if it can be disabled.

> > occurring during DPC must not be ignored because they signal device
> > replacement.  On all other ports, device replacement cannot be detected
> > reliably because the Presence Detect Changed event could be a side effect
> > of DPC.  On those (older) ports, perform a best-effort device replacement
> > check by comparing the Vendor ID, Device ID and other data in Config Space
> > with the values cached in struct pci_dev.  Use the existing helper
> > pciehp_device_replaced() to accomplish this.  It is currently #ifdef'ed to
> > CONFIG_PM_SLEEP in pciehp_core.c, so move it to pciehp_hpc.c where most
> > other functions accessing config space reside.
> 
> Code looks fine to me
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan
> <sathyanarayanan.kuppuswamy@linux.intel.com>

Thanks for taking a look!

Lukas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
  2025-04-10 15:27 ` [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset Lukas Wunner
@ 2025-04-11 22:28   ` Sathyanarayanan Kuppuswamy
  2025-04-12  3:36     ` Lukas Wunner
  2025-04-13 17:22   ` Sathyanarayanan Kuppuswamy
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-04-11 22:28 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Keith Busch, Yicong Yang, linux-pci, Stuart Hayes,
	Mika Westerberg, Ilpo Jarvinen, Joel Mathew Thomas, Russ Weight,
	Matthew Gerlach, Yilun Xu, linux-fpga, Moshe Shemesh, Shay Drory,
	Saeed Mahameed, Alex Williamson


On 4/10/25 8:27 AM, Lukas Wunner wrote:
> When a Secondary Bus Reset is issued at a hotplug port, it causes a Data
> Link Layer State Changed event as a side effect.  On hotplug ports using
> in-band presence detect, it additionally causes a Presence Detect Changed
> event.
>
> These spurious events should not result in teardown and re-enumeration of
> the device in the slot.  Hence commit 2e35afaefe64 ("PCI: pciehp: Add
> reset_slot() method") masked the Presence Detect Changed Enable bit in the
> Slot Control register during a Secondary Bus Reset.  Commit 06a8d89af551
> ("PCI: pciehp: Disable link notification across slot reset") additionally
> masked the Data Link Layer State Changed Enable bit.
>
> However masking those bits only disables interrupt generation (PCIe r6.2
> sec 6.7.3.1).  The events are still visible in the Slot Status register
> and picked up by the IRQ handler if it runs during a Secondary Bus Reset.
> This can happen if the interrupt is shared or if an unmasked hotplug event
> occurs, e.g. Attention Button Pressed or Power Fault Detected.
>
> The likelihood of this happening used to be small, so it wasn't much of a
> problem in practice.  That has changed with the recent introduction of
> bandwidth control in v6.13-rc1 with commit 665745f27487 ("PCI/bwctrl:
> Re-add BW notification portdrv as PCIe BW controller"):
>
> Bandwidth control shares the interrupt with PCIe hotplug.  A Secondary Bus
> Reset causes a Link Bandwidth Notification, so the hotplug IRQ handler
> runs, picks up the masked events and tears down the device in the slot.
>
> As a result, Joel reports VFIO passthrough failure of a GPU, which Ilpo
> root-caused to the incorrect handling of masked hotplug events.
>
> Clearly, a more reliable way is needed to ignore spurious hotplug events.
>
> For Downstream Port Containment, a new ignore mechanism was introduced by
> commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").
> It has been working reliably for the past four years.
>
> Adapt it for Secondary Bus Resets.
>
> Introduce two helpers to annotate code sections which cause spurious link
> changes:  pci_hp_ignore_link_change() and pci_hp_unignore_link_change()
> Use those helpers in lieu of masking interrupts in the Slot Control
> register.
>
> Introduce a helper to check whether such a code section is executing
> concurrently and if so, await it:  pci_hp_spurious_link_change()
> Invoke the helper in the hotplug IRQ thread pciehp_ist().  Re-use the
> IRQ thread's existing code which ignores DPC-induced link changes unless
> the link is unexpectedly down after reset recovery or the device was
> replaced during the bus reset.
>
> That code block in pciehp_ist() was previously only executed if a Data
> Link Layer State Changed event has occurred.  Additionally execute it for
> Presence Detect Changed events.  That's necessary for compatibility with
> PCIe r1.0 hotplug ports because Data Link Layer State Changed didn't exist
> before PCIe r1.1.  DPC was added with PCIe r3.1 and thus DPC-capable
> hotplug ports always support Data Link Layer State Changed events.
> But the same cannot be assumed for Secondary Bus Reset, which already
> existed in PCIe r1.0.
>
> Secondary Bus Reset is only one of many causes of spurious link changes.
> Others include runtime suspend to D3cold, firmware updates or FPGA
> reconfiguration.  The new pci_hp_{,un}ignore_link_change() helpers may be
> used by all kinds of drivers to annotate such code sections, hence their
> declarations are publicly visible in <linux/pci.h>.  A case in point is
> the Mellanox Ethernet driver which disables a firmware reset feature if
> the Ethernet card is attached to a hotplug port, see commit 3d7a3f2612d7
> ("net/mlx5: Nack sync reset request when HotPlug is enabled").  Going
> forward, PCIe hotplug will be able to cope gracefully with all such use
> cases once the code sections are properly annotated.
>
> The new helpers internally use two bits in struct pci_dev's priv_flags as
> well as a wait_queue.  This mirrors what was done for DPC by commit
> a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").  That may
> be insufficient if spurious link changes are caused by multiple sources
> simultaneously.  An example might be a Secondary Bus Reset issued by AER
> during FPGA reconfiguration.  If this turns out to happen in real life,
> support for it can easily be added by replacing the PCI_LINK_CHANGING flag
> with an atomic_t counter incremented by pci_hp_ignore_link_change() and
> decremented by pci_hp_unignore_link_change().  Instead of awaiting a zero
> PCI_LINK_CHANGING flag, the pci_hp_spurious_link_change() helper would
> then simply await a zero counter.
>
> Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
> Reported-by: Joel Mathew Thomas <proxy0@tutamail.com>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765
> Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>   drivers/pci/hotplug/pci_hotplug_core.c | 69 ++++++++++++++++++++++++++++++++++
>   drivers/pci/hotplug/pciehp_hpc.c       | 35 ++++++-----------
>   drivers/pci/pci.h                      |  3 ++
>   include/linux/pci.h                    |  8 ++++
>   4 files changed, 92 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pci_hotplug_core.c b/drivers/pci/hotplug/pci_hotplug_core.c
> index d30f131..d8c5856 100644
> --- a/drivers/pci/hotplug/pci_hotplug_core.c
> +++ b/drivers/pci/hotplug/pci_hotplug_core.c
> @@ -492,6 +492,75 @@ void pci_hp_destroy(struct hotplug_slot *slot)
>   }
>   EXPORT_SYMBOL_GPL(pci_hp_destroy);
>   
> +static DECLARE_WAIT_QUEUE_HEAD(pci_hp_link_change_wq);
> +
> +/**
> + * pci_hp_ignore_link_change - begin code section causing spurious link changes
> + * @pdev: PCI hotplug bridge
> + *
> + * Mark the beginning of a code section causing spurious link changes on the
> + * Secondary Bus of @pdev, e.g. as a side effect of a Secondary Bus Reset,
> + * D3cold transition, firmware update or FPGA reconfiguration.
> + *
> + * Hotplug drivers can thus check whether such a code section is executing
> + * concurrently, await it with pci_hp_spurious_link_change() and ignore the
> + * resulting link change events.
> + *
> + * Must be paired with pci_hp_unignore_link_change().  May be called both
> + * from the PCI core and from Endpoint drivers.  May be called for bridges
> + * which are not hotplug-capable, in which case it has no effect because
> + * no hotplug driver is bound to the bridge.
> + */
> +void pci_hp_ignore_link_change(struct pci_dev *pdev)
> +{
> +	set_bit(PCI_LINK_CHANGING, &pdev->priv_flags);
> +	smp_mb__after_atomic(); /* pairs with implied barrier of wait_event() */
> +}
> +
> +/**
> + * pci_hp_unignore_link_change - end code section causing spurious link changes
> + * @pdev: PCI hotplug bridge
> + *
> + * Mark the end of a code section causing spurious link changes on the
> + * Secondary Bus of @pdev.  Must be paired with pci_hp_ignore_link_change().
> + */
> +void pci_hp_unignore_link_change(struct pci_dev *pdev)
> +{
> +	set_bit(PCI_LINK_CHANGED, &pdev->priv_flags);
> +	mb(); /* ensure pci_hp_spurious_link_change() sees either bit set */
> +	clear_bit(PCI_LINK_CHANGING, &pdev->priv_flags);
> +	wake_up_all(&pci_hp_link_change_wq);
> +}
> +
> +/**
> + * pci_hp_spurious_link_change - check for spurious link changes
> + * @pdev: PCI hotplug bridge
> + *
> + * Check whether a code section is executing concurrently which is causing
> + * spurious link changes on the Secondary Bus of @pdev.  Await the end of the
> + * code section if so.
> + *
> + * Return %true if such a code section has been executing concurrently,
> + * otherwise %false.  Also return %true if such a code section has not been
> + * executing concurrently, but at least once since the last invocation of this
> + * function.
> + *
> + * May be called by hotplug drivers to check whether a link change is spurious
> + * and can be ignored.
> + *
> + * Because a genuine link change may have occurred in-between a spurious link
> + * change and the invocation of this function, hotplug drivers should perform
> + * sanity checks such as retrieving the current link state and bringing down
> + * the slot if the link is down.
> + */
> +bool pci_hp_spurious_link_change(struct pci_dev *pdev)
> +{
> +	wait_event(pci_hp_link_change_wq,
> +		   !test_bit(PCI_LINK_CHANGING, &pdev->priv_flags));
> +
> +	return test_and_clear_bit(PCI_LINK_CHANGED, &pdev->priv_flags);
> +}
> +
>   static int __init pci_hotplug_init(void)
>   {
>   	int result;
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 388fbed..c98d310 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -592,21 +592,21 @@ bool pciehp_device_replaced(struct controller *ctrl)
>   	return false;
>   }
>   
> -static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
> -					  struct pci_dev *pdev, int irq,
> -					  u16 ignored_events)
> +static void pciehp_ignore_link_change(struct controller *ctrl,
> +				      struct pci_dev *pdev, int irq,
> +				      u16 ignored_events)
>   {
>   	/*
>   	 * Ignore link changes which occurred while waiting for DPC recovery.
>   	 * Could be several if DPC triggered multiple times consecutively.
> +	 * Also ignore link changes caused by Secondary Bus Reset etc.
>   	 */
>   	synchronize_hardirq(irq);
>   	atomic_and(~ignored_events, &ctrl->pending_events);
>   	if (pciehp_poll_mode)
>   		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>   					   ignored_events);
> -	ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored (recovered by DPC)\n",
> -		  slot_name(ctrl));
> +	ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored\n", slot_name(ctrl));
>   
>   	/*
>   	 * If the link is unexpectedly down after successful recovery,
> @@ -762,9 +762,11 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>   
>   	/*
>   	 * Ignore Link Down/Up events caused by Downstream Port Containment
> -	 * if recovery from the error succeeded.
> +	 * if recovery succeeded, or caused by Secondary Bus Reset,
> +	 * suspend to D3cold, firmware update, FPGA reconfiguration, etc.
>   	 */
> -	if ((events & PCI_EXP_SLTSTA_DLLSC) && pci_dpc_recovered(pdev) &&
> +	if ((events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC)) &&
> +	    (pci_dpc_recovered(pdev) || pci_hp_spurious_link_change(pdev)) &&
>   	    ctrl->state == ON_STATE) {
>   		u16 ignored_events = PCI_EXP_SLTSTA_DLLSC;
>   
> @@ -772,7 +774,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>   			ignored_events |= events & PCI_EXP_SLTSTA_PDC;
>   
>   		events &= ~ignored_events;
> -		pciehp_ignore_dpc_link_change(ctrl, pdev, irq, ignored_events);
> +		pciehp_ignore_link_change(ctrl, pdev, irq, ignored_events);
>   	}
>   
>   	/*
> @@ -937,7 +939,6 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
>   {
>   	struct controller *ctrl = to_ctrl(hotplug_slot);
>   	struct pci_dev *pdev = ctrl_dev(ctrl);
> -	u16 stat_mask = 0, ctrl_mask = 0;
>   	int rc;
>   
>   	if (probe)
> @@ -945,23 +946,11 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
>   
>   	down_write_nested(&ctrl->reset_lock, ctrl->depth);
>   
> -	if (!ATTN_BUTTN(ctrl)) {
> -		ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
> -		stat_mask |= PCI_EXP_SLTSTA_PDC;
> -	}
> -	ctrl_mask |= PCI_EXP_SLTCTL_DLLSCE;
> -	stat_mask |= PCI_EXP_SLTSTA_DLLSC;
> -
> -	pcie_write_cmd(ctrl, 0, ctrl_mask);
> -	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
> -		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
> +	pci_hp_ignore_link_change(pdev);
>   
>   	rc = pci_bridge_secondary_bus_reset(ctrl->pcie->port);
>   
> -	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, stat_mask);
> -	pcie_write_cmd_nowait(ctrl, ctrl_mask, ctrl_mask);
> -	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
> -		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
> +	pci_hp_unignore_link_change(pdev);
>   

Since most of the changes in this patch is related to adding framework to
ignore spurious hotplug event, why not split it in to a separate patch?

Is this fix tested in any platform?

>   	up_write(&ctrl->reset_lock);
>   	return rc;
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index b81e99c..7db798b 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -227,6 +227,7 @@ static inline bool pcie_downstream_port(const struct pci_dev *dev)
>   
>   /* Functions for PCI Hotplug drivers to use */
>   int pci_hp_add_bridge(struct pci_dev *dev);
> +bool pci_hp_spurious_link_change(struct pci_dev *pdev);

Do you expect this function used outside hotplug code? If not why not 
leave the
declaration in pciehp.h?

>   
>   #if defined(CONFIG_SYSFS) && defined(HAVE_PCI_LEGACY)
>   void pci_create_legacy_files(struct pci_bus *bus);
> @@ -557,6 +558,8 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
>   #define PCI_DPC_RECOVERED 1
>   #define PCI_DPC_RECOVERING 2
>   #define PCI_DEV_REMOVED 3
> +#define PCI_LINK_CHANGED 4
> +#define PCI_LINK_CHANGING 5
>   
>   static inline void pci_dev_assign_added(struct pci_dev *dev)
>   {
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 0e8e3fd..833b54f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1848,6 +1848,14 @@ static inline void pcie_no_aspm(void) { }
>   static inline bool pcie_aspm_enabled(struct pci_dev *pdev) { return false; }
>   #endif
>   
> +#ifdef CONFIG_HOTPLUG_PCI
> +void pci_hp_ignore_link_change(struct pci_dev *pdev);
> +void pci_hp_unignore_link_change(struct pci_dev *pdev);
> +#else
> +static inline void pci_hp_ignore_link_change(struct pci_dev *pdev) { }
> +static inline void pci_hp_unignore_link_change(struct pci_dev *pdev) { }
> +#endif
> +

Generally we expose APIs when we really need it.  Since you have already
identified some use cases where you will use it in other drivers, why not
include one such change as an example?

>   #ifdef CONFIG_PCIEAER
>   bool pci_aer_available(void);
>   #else

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
  2025-04-11 22:28   ` Sathyanarayanan Kuppuswamy
@ 2025-04-12  3:36     ` Lukas Wunner
  2025-04-13 17:21       ` Sathyanarayanan Kuppuswamy
  0 siblings, 1 reply; 16+ messages in thread
From: Lukas Wunner @ 2025-04-12  3:36 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy
  Cc: Bjorn Helgaas, Keith Busch, Yicong Yang, linux-pci, Stuart Hayes,
	Mika Westerberg, Ilpo Jarvinen, Joel Mathew Thomas, Russ Weight,
	Matthew Gerlach, Yilun Xu, linux-fpga, Moshe Shemesh, Shay Drory,
	Saeed Mahameed, Alex Williamson

On Fri, Apr 11, 2025 at 03:28:15PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 4/10/25 8:27 AM, Lukas Wunner wrote:
> > Introduce two helpers to annotate code sections which cause spurious link
> > changes:  pci_hp_ignore_link_change() and pci_hp_unignore_link_change()
> > Use those helpers in lieu of masking interrupts in the Slot Control
> > register.
> > 
> > Introduce a helper to check whether such a code section is executing
> > concurrently and if so, await it:  pci_hp_spurious_link_change()
> > Invoke the helper in the hotplug IRQ thread pciehp_ist().  Re-use the
> > IRQ thread's existing code which ignores DPC-induced link changes unless
> > the link is unexpectedly down after reset recovery or the device was
> > replaced during the bus reset.
> 
> Since most of the changes in this patch is related to adding framework to
> ignore spurious hotplug event, why not split it in to a separate patch?

The idea is to ease backporting.  The issue fixes VFIO passthrough on
v6.13-rc1 and newer, so the patch will have to be backported at least
to v6.13, v6.14, v6.15.


> Is this fix tested in any platform?

Yes, Joel Mathew Thomas successfully tested it:

https://bugzilla.kernel.org/show_bug.cgi?id=219765

That's an Asus TUF FA507NU dual-GPU laptop (AMD CPU + Nvidia discrete GPU).
The Nvidia GPU is passed through to a VM.


> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -227,6 +227,7 @@ static inline bool pcie_downstream_port(const struct pci_dev *dev)
> >   /* Functions for PCI Hotplug drivers to use */
> >   int pci_hp_add_bridge(struct pci_dev *dev);
> > +bool pci_hp_spurious_link_change(struct pci_dev *pdev);
> 
> Do you expect this function used outside hotplug code? If not why not leave
> the declaration in pciehp.h?

Any hotplug driver may call this.  In particular, there are two other drivers
implementing the ->reset_slot() callback: pnv_php.c and s390_pci_hpc.c

pnv_php.c does a similar dance as pciehp_hpc.c (before this patch):
It disables the interrupt, performs a Secondary Bus Reset, clears spurious
events and re-enables the interrupt.  I think it can be converted to use
the newly introduced API.  That should make it more robust against removal
or replacement of the device during the Secondary Bus Reset.

Also, to cope with runtime suspend to D3cold, there's an ignore_hotplug
bit in struct pci_dev.  The bit is set by drivers for discrete GPUs and
is honored by acpiphp and pciehp.  I'd like to remove the bit in favor
of the new mechanism introduced here and that means acpiphp will have to
be converted to call pci_hp_spurious_link_change().

One thing that's problematic about the current behavior is that hotplug
events are ignored wholesale for GPUs which runtime suspend to D3cold.
That works for discrete GPUs in laptops which are soldered down on the
mainboard, but doesn't work for GPUs which are plugged into an actual
hotplug port, e.g. data center GPUs.  The new API will allow detecting
and ignoring spurious events in a more surgical manner.  I envision
that __pci_set_power_state() will call pci_hp_ignore_link_change()
if the target power state is D3cold.  Also vga_switcheroo.c will have
to call that.  But none of the GPU drivers will have to call
pci_ignore_hotplug() anymore.

To summarize, there are at least two other hotplug drivers besides pciehp
which I expect will call pci_hp_spurious_link_change() in the foreseeable
future, acpiphp and pnv_php, hence the declaration is not in pciehp.h
but in drivers/pci/pci.h.


> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -1848,6 +1848,14 @@ static inline void pcie_no_aspm(void) { }
> >   static inline bool pcie_aspm_enabled(struct pci_dev *pdev) { return false; }
> >   #endif
> > +#ifdef CONFIG_HOTPLUG_PCI
> > +void pci_hp_ignore_link_change(struct pci_dev *pdev);
> > +void pci_hp_unignore_link_change(struct pci_dev *pdev);
> > +#else
> > +static inline void pci_hp_ignore_link_change(struct pci_dev *pdev) { }
> > +static inline void pci_hp_unignore_link_change(struct pci_dev *pdev) { }
> > +#endif
> > +
> 
> Generally we expose APIs when we really need it.  Since you have already
> identified some use cases where you will use it in other drivers, why not
> include one such change as an example?

Mostly because I wanted to get this fix out the door quickly before more
people come across the regression it addresses.

Converting the Mellanox Ethernet driver to use the API would require an ack
from its maintainers.  Since it's not urgent, I was planning to do that in
a subsequent cycle.  Same for the conversion of D3cold handling.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
  2025-04-12  3:36     ` Lukas Wunner
@ 2025-04-13 17:21       ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 16+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-04-13 17:21 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Keith Busch, Yicong Yang, linux-pci, Stuart Hayes,
	Mika Westerberg, Ilpo Jarvinen, Joel Mathew Thomas, Russ Weight,
	Matthew Gerlach, Yilun Xu, linux-fpga, Moshe Shemesh, Shay Drory,
	Saeed Mahameed, Alex Williamson


On 4/11/25 8:36 PM, Lukas Wunner wrote:
> On Fri, Apr 11, 2025 at 03:28:15PM -0700, Sathyanarayanan Kuppuswamy wrote:
>> On 4/10/25 8:27 AM, Lukas Wunner wrote:
>>> Introduce two helpers to annotate code sections which cause spurious link
>>> changes:  pci_hp_ignore_link_change() and pci_hp_unignore_link_change()
>>> Use those helpers in lieu of masking interrupts in the Slot Control
>>> register.
>>>
>>> Introduce a helper to check whether such a code section is executing
>>> concurrently and if so, await it:  pci_hp_spurious_link_change()
>>> Invoke the helper in the hotplug IRQ thread pciehp_ist().  Re-use the
>>> IRQ thread's existing code which ignores DPC-induced link changes unless
>>> the link is unexpectedly down after reset recovery or the device was
>>> replaced during the bus reset.
>> Since most of the changes in this patch is related to adding framework to
>> ignore spurious hotplug event, why not split it in to a separate patch?
> The idea is to ease backporting.  The issue fixes VFIO passthrough on
> v6.13-rc1 and newer, so the patch will have to be backported at least
> to v6.13, v6.14, v6.15.

Makes sense.

>
>> Is this fix tested in any platform?
> Yes, Joel Mathew Thomas successfully tested it:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=219765
>
> That's an Asus TUF FA507NU dual-GPU laptop (AMD CPU + Nvidia discrete GPU).
> The Nvidia GPU is passed through to a VM.
>
>
>>> --- a/drivers/pci/pci.h
>>> +++ b/drivers/pci/pci.h
>>> @@ -227,6 +227,7 @@ static inline bool pcie_downstream_port(const struct pci_dev *dev)
>>>    /* Functions for PCI Hotplug drivers to use */
>>>    int pci_hp_add_bridge(struct pci_dev *dev);
>>> +bool pci_hp_spurious_link_change(struct pci_dev *pdev);
>> Do you expect this function used outside hotplug code? If not why not leave
>> the declaration in pciehp.h?
> Any hotplug driver may call this.  In particular, there are two other drivers
> implementing the ->reset_slot() callback: pnv_php.c and s390_pci_hpc.c
>
> pnv_php.c does a similar dance as pciehp_hpc.c (before this patch):
> It disables the interrupt, performs a Secondary Bus Reset, clears spurious
> events and re-enables the interrupt.  I think it can be converted to use
> the newly introduced API.  That should make it more robust against removal
> or replacement of the device during the Secondary Bus Reset.
>
> Also, to cope with runtime suspend to D3cold, there's an ignore_hotplug
> bit in struct pci_dev.  The bit is set by drivers for discrete GPUs and
> is honored by acpiphp and pciehp.  I'd like to remove the bit in favor
> of the new mechanism introduced here and that means acpiphp will have to
> be converted to call pci_hp_spurious_link_change().
>
> One thing that's problematic about the current behavior is that hotplug
> events are ignored wholesale for GPUs which runtime suspend to D3cold.
> That works for discrete GPUs in laptops which are soldered down on the
> mainboard, but doesn't work for GPUs which are plugged into an actual
> hotplug port, e.g. data center GPUs.  The new API will allow detecting
> and ignoring spurious events in a more surgical manner.  I envision
> that __pci_set_power_state() will call pci_hp_ignore_link_change()
> if the target power state is D3cold.  Also vga_switcheroo.c will have
> to call that.  But none of the GPU drivers will have to call
> pci_ignore_hotplug() anymore.
>
> To summarize, there are at least two other hotplug drivers besides pciehp
> which I expect will call pci_hp_spurious_link_change() in the foreseeable
> future, acpiphp and pnv_php, hence the declaration is not in pciehp.h
> but in drivers/pci/pci.h.

Thanks for sharing the details.

>
>>> --- a/include/linux/pci.h
>>> +++ b/include/linux/pci.h
>>> @@ -1848,6 +1848,14 @@ static inline void pcie_no_aspm(void) { }
>>>    static inline bool pcie_aspm_enabled(struct pci_dev *pdev) { return false; }
>>>    #endif
>>> +#ifdef CONFIG_HOTPLUG_PCI
>>> +void pci_hp_ignore_link_change(struct pci_dev *pdev);
>>> +void pci_hp_unignore_link_change(struct pci_dev *pdev);
>>> +#else
>>> +static inline void pci_hp_ignore_link_change(struct pci_dev *pdev) { }
>>> +static inline void pci_hp_unignore_link_change(struct pci_dev *pdev) { }
>>> +#endif
>>> +
>> Generally we expose APIs when we really need it.  Since you have already
>> identified some use cases where you will use it in other drivers, why not
>> include one such change as an example?
> Mostly because I wanted to get this fix out the door quickly before more
> people come across the regression it addresses.
>
> Converting the Mellanox Ethernet driver to use the API would require an ack
> from its maintainers.  Since it's not urgent, I was planning to do that in
> a subsequent cycle.  Same for the conversion of D3cold handling.


Got it.


>
> Thanks,
>
> Lukas
>
-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
  2025-04-10 15:27 ` [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset Lukas Wunner
  2025-04-11 22:28   ` Sathyanarayanan Kuppuswamy
@ 2025-04-13 17:22   ` Sathyanarayanan Kuppuswamy
  2025-04-14 13:32   ` Ilpo Järvinen
  2025-04-16  8:00   ` Ilpo Järvinen
  3 siblings, 0 replies; 16+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-04-13 17:22 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Keith Busch, Yicong Yang, linux-pci, Stuart Hayes,
	Mika Westerberg, Ilpo Jarvinen, Joel Mathew Thomas, Russ Weight,
	Matthew Gerlach, Yilun Xu, linux-fpga, Moshe Shemesh, Shay Drory,
	Saeed Mahameed, Alex Williamson


On 4/10/25 8:27 AM, Lukas Wunner wrote:
> When a Secondary Bus Reset is issued at a hotplug port, it causes a Data
> Link Layer State Changed event as a side effect.  On hotplug ports using
> in-band presence detect, it additionally causes a Presence Detect Changed
> event.
>
> These spurious events should not result in teardown and re-enumeration of
> the device in the slot.  Hence commit 2e35afaefe64 ("PCI: pciehp: Add
> reset_slot() method") masked the Presence Detect Changed Enable bit in the
> Slot Control register during a Secondary Bus Reset.  Commit 06a8d89af551
> ("PCI: pciehp: Disable link notification across slot reset") additionally
> masked the Data Link Layer State Changed Enable bit.
>
> However masking those bits only disables interrupt generation (PCIe r6.2
> sec 6.7.3.1).  The events are still visible in the Slot Status register
> and picked up by the IRQ handler if it runs during a Secondary Bus Reset.
> This can happen if the interrupt is shared or if an unmasked hotplug event
> occurs, e.g. Attention Button Pressed or Power Fault Detected.
>
> The likelihood of this happening used to be small, so it wasn't much of a
> problem in practice.  That has changed with the recent introduction of
> bandwidth control in v6.13-rc1 with commit 665745f27487 ("PCI/bwctrl:
> Re-add BW notification portdrv as PCIe BW controller"):
>
> Bandwidth control shares the interrupt with PCIe hotplug.  A Secondary Bus
> Reset causes a Link Bandwidth Notification, so the hotplug IRQ handler
> runs, picks up the masked events and tears down the device in the slot.
>
> As a result, Joel reports VFIO passthrough failure of a GPU, which Ilpo
> root-caused to the incorrect handling of masked hotplug events.
>
> Clearly, a more reliable way is needed to ignore spurious hotplug events.
>
> For Downstream Port Containment, a new ignore mechanism was introduced by
> commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").
> It has been working reliably for the past four years.
>
> Adapt it for Secondary Bus Resets.
>
> Introduce two helpers to annotate code sections which cause spurious link
> changes:  pci_hp_ignore_link_change() and pci_hp_unignore_link_change()
> Use those helpers in lieu of masking interrupts in the Slot Control
> register.
>
> Introduce a helper to check whether such a code section is executing
> concurrently and if so, await it:  pci_hp_spurious_link_change()
> Invoke the helper in the hotplug IRQ thread pciehp_ist().  Re-use the
> IRQ thread's existing code which ignores DPC-induced link changes unless
> the link is unexpectedly down after reset recovery or the device was
> replaced during the bus reset.
>
> That code block in pciehp_ist() was previously only executed if a Data
> Link Layer State Changed event has occurred.  Additionally execute it for
> Presence Detect Changed events.  That's necessary for compatibility with
> PCIe r1.0 hotplug ports because Data Link Layer State Changed didn't exist
> before PCIe r1.1.  DPC was added with PCIe r3.1 and thus DPC-capable
> hotplug ports always support Data Link Layer State Changed events.
> But the same cannot be assumed for Secondary Bus Reset, which already
> existed in PCIe r1.0.
>
> Secondary Bus Reset is only one of many causes of spurious link changes.
> Others include runtime suspend to D3cold, firmware updates or FPGA
> reconfiguration.  The new pci_hp_{,un}ignore_link_change() helpers may be
> used by all kinds of drivers to annotate such code sections, hence their
> declarations are publicly visible in <linux/pci.h>.  A case in point is
> the Mellanox Ethernet driver which disables a firmware reset feature if
> the Ethernet card is attached to a hotplug port, see commit 3d7a3f2612d7
> ("net/mlx5: Nack sync reset request when HotPlug is enabled").  Going
> forward, PCIe hotplug will be able to cope gracefully with all such use
> cases once the code sections are properly annotated.
>
> The new helpers internally use two bits in struct pci_dev's priv_flags as
> well as a wait_queue.  This mirrors what was done for DPC by commit
> a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").  That may
> be insufficient if spurious link changes are caused by multiple sources
> simultaneously.  An example might be a Secondary Bus Reset issued by AER
> during FPGA reconfiguration.  If this turns out to happen in real life,
> support for it can easily be added by replacing the PCI_LINK_CHANGING flag
> with an atomic_t counter incremented by pci_hp_ignore_link_change() and
> decremented by pci_hp_unignore_link_change().  Instead of awaiting a zero
> PCI_LINK_CHANGING flag, the pci_hp_spurious_link_change() helper would
> then simply await a zero counter.
>
> Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
> Reported-by: Joel Mathew Thomas <proxy0@tutamail.com>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765
> Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

Looks good to me

Reviewed-by: Kuppuswamy Sathyanarayanan 
<sathyanarayanan.kuppuswamy@linux.intel.com>

> ---
>   drivers/pci/hotplug/pci_hotplug_core.c | 69 ++++++++++++++++++++++++++++++++++
>   drivers/pci/hotplug/pciehp_hpc.c       | 35 ++++++-----------
>   drivers/pci/pci.h                      |  3 ++
>   include/linux/pci.h                    |  8 ++++
>   4 files changed, 92 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pci_hotplug_core.c b/drivers/pci/hotplug/pci_hotplug_core.c
> index d30f131..d8c5856 100644
> --- a/drivers/pci/hotplug/pci_hotplug_core.c
> +++ b/drivers/pci/hotplug/pci_hotplug_core.c
> @@ -492,6 +492,75 @@ void pci_hp_destroy(struct hotplug_slot *slot)
>   }
>   EXPORT_SYMBOL_GPL(pci_hp_destroy);
>   
> +static DECLARE_WAIT_QUEUE_HEAD(pci_hp_link_change_wq);
> +
> +/**
> + * pci_hp_ignore_link_change - begin code section causing spurious link changes
> + * @pdev: PCI hotplug bridge
> + *
> + * Mark the beginning of a code section causing spurious link changes on the
> + * Secondary Bus of @pdev, e.g. as a side effect of a Secondary Bus Reset,
> + * D3cold transition, firmware update or FPGA reconfiguration.
> + *
> + * Hotplug drivers can thus check whether such a code section is executing
> + * concurrently, await it with pci_hp_spurious_link_change() and ignore the
> + * resulting link change events.
> + *
> + * Must be paired with pci_hp_unignore_link_change().  May be called both
> + * from the PCI core and from Endpoint drivers.  May be called for bridges
> + * which are not hotplug-capable, in which case it has no effect because
> + * no hotplug driver is bound to the bridge.
> + */
> +void pci_hp_ignore_link_change(struct pci_dev *pdev)
> +{
> +	set_bit(PCI_LINK_CHANGING, &pdev->priv_flags);
> +	smp_mb__after_atomic(); /* pairs with implied barrier of wait_event() */
> +}
> +
> +/**
> + * pci_hp_unignore_link_change - end code section causing spurious link changes
> + * @pdev: PCI hotplug bridge
> + *
> + * Mark the end of a code section causing spurious link changes on the
> + * Secondary Bus of @pdev.  Must be paired with pci_hp_ignore_link_change().
> + */
> +void pci_hp_unignore_link_change(struct pci_dev *pdev)
> +{
> +	set_bit(PCI_LINK_CHANGED, &pdev->priv_flags);
> +	mb(); /* ensure pci_hp_spurious_link_change() sees either bit set */
> +	clear_bit(PCI_LINK_CHANGING, &pdev->priv_flags);
> +	wake_up_all(&pci_hp_link_change_wq);
> +}
> +
> +/**
> + * pci_hp_spurious_link_change - check for spurious link changes
> + * @pdev: PCI hotplug bridge
> + *
> + * Check whether a code section is executing concurrently which is causing
> + * spurious link changes on the Secondary Bus of @pdev.  Await the end of the
> + * code section if so.
> + *
> + * Return %true if such a code section has been executing concurrently,
> + * otherwise %false.  Also return %true if such a code section has not been
> + * executing concurrently, but at least once since the last invocation of this
> + * function.
> + *
> + * May be called by hotplug drivers to check whether a link change is spurious
> + * and can be ignored.
> + *
> + * Because a genuine link change may have occurred in-between a spurious link
> + * change and the invocation of this function, hotplug drivers should perform
> + * sanity checks such as retrieving the current link state and bringing down
> + * the slot if the link is down.
> + */
> +bool pci_hp_spurious_link_change(struct pci_dev *pdev)
> +{
> +	wait_event(pci_hp_link_change_wq,
> +		   !test_bit(PCI_LINK_CHANGING, &pdev->priv_flags));
> +
> +	return test_and_clear_bit(PCI_LINK_CHANGED, &pdev->priv_flags);
> +}
> +
>   static int __init pci_hotplug_init(void)
>   {
>   	int result;
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 388fbed..c98d310 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -592,21 +592,21 @@ bool pciehp_device_replaced(struct controller *ctrl)
>   	return false;
>   }
>   
> -static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
> -					  struct pci_dev *pdev, int irq,
> -					  u16 ignored_events)
> +static void pciehp_ignore_link_change(struct controller *ctrl,
> +				      struct pci_dev *pdev, int irq,
> +				      u16 ignored_events)
>   {
>   	/*
>   	 * Ignore link changes which occurred while waiting for DPC recovery.
>   	 * Could be several if DPC triggered multiple times consecutively.
> +	 * Also ignore link changes caused by Secondary Bus Reset etc.
>   	 */
>   	synchronize_hardirq(irq);
>   	atomic_and(~ignored_events, &ctrl->pending_events);
>   	if (pciehp_poll_mode)
>   		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>   					   ignored_events);
> -	ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored (recovered by DPC)\n",
> -		  slot_name(ctrl));
> +	ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored\n", slot_name(ctrl));
>   
>   	/*
>   	 * If the link is unexpectedly down after successful recovery,
> @@ -762,9 +762,11 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>   
>   	/*
>   	 * Ignore Link Down/Up events caused by Downstream Port Containment
> -	 * if recovery from the error succeeded.
> +	 * if recovery succeeded, or caused by Secondary Bus Reset,
> +	 * suspend to D3cold, firmware update, FPGA reconfiguration, etc.
>   	 */
> -	if ((events & PCI_EXP_SLTSTA_DLLSC) && pci_dpc_recovered(pdev) &&
> +	if ((events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC)) &&
> +	    (pci_dpc_recovered(pdev) || pci_hp_spurious_link_change(pdev)) &&
>   	    ctrl->state == ON_STATE) {
>   		u16 ignored_events = PCI_EXP_SLTSTA_DLLSC;
>   
> @@ -772,7 +774,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>   			ignored_events |= events & PCI_EXP_SLTSTA_PDC;
>   
>   		events &= ~ignored_events;
> -		pciehp_ignore_dpc_link_change(ctrl, pdev, irq, ignored_events);
> +		pciehp_ignore_link_change(ctrl, pdev, irq, ignored_events);
>   	}
>   
>   	/*
> @@ -937,7 +939,6 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
>   {
>   	struct controller *ctrl = to_ctrl(hotplug_slot);
>   	struct pci_dev *pdev = ctrl_dev(ctrl);
> -	u16 stat_mask = 0, ctrl_mask = 0;
>   	int rc;
>   
>   	if (probe)
> @@ -945,23 +946,11 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
>   
>   	down_write_nested(&ctrl->reset_lock, ctrl->depth);
>   
> -	if (!ATTN_BUTTN(ctrl)) {
> -		ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
> -		stat_mask |= PCI_EXP_SLTSTA_PDC;
> -	}
> -	ctrl_mask |= PCI_EXP_SLTCTL_DLLSCE;
> -	stat_mask |= PCI_EXP_SLTSTA_DLLSC;
> -
> -	pcie_write_cmd(ctrl, 0, ctrl_mask);
> -	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
> -		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
> +	pci_hp_ignore_link_change(pdev);
>   
>   	rc = pci_bridge_secondary_bus_reset(ctrl->pcie->port);
>   
> -	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, stat_mask);
> -	pcie_write_cmd_nowait(ctrl, ctrl_mask, ctrl_mask);
> -	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
> -		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
> +	pci_hp_unignore_link_change(pdev);
>   
>   	up_write(&ctrl->reset_lock);
>   	return rc;
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index b81e99c..7db798b 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -227,6 +227,7 @@ static inline bool pcie_downstream_port(const struct pci_dev *dev)
>   
>   /* Functions for PCI Hotplug drivers to use */
>   int pci_hp_add_bridge(struct pci_dev *dev);
> +bool pci_hp_spurious_link_change(struct pci_dev *pdev);
>   
>   #if defined(CONFIG_SYSFS) && defined(HAVE_PCI_LEGACY)
>   void pci_create_legacy_files(struct pci_bus *bus);
> @@ -557,6 +558,8 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
>   #define PCI_DPC_RECOVERED 1
>   #define PCI_DPC_RECOVERING 2
>   #define PCI_DEV_REMOVED 3
> +#define PCI_LINK_CHANGED 4
> +#define PCI_LINK_CHANGING 5
>   
>   static inline void pci_dev_assign_added(struct pci_dev *dev)
>   {
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 0e8e3fd..833b54f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1848,6 +1848,14 @@ static inline void pcie_no_aspm(void) { }
>   static inline bool pcie_aspm_enabled(struct pci_dev *pdev) { return false; }
>   #endif
>   
> +#ifdef CONFIG_HOTPLUG_PCI
> +void pci_hp_ignore_link_change(struct pci_dev *pdev);
> +void pci_hp_unignore_link_change(struct pci_dev *pdev);
> +#else
> +static inline void pci_hp_ignore_link_change(struct pci_dev *pdev) { }
> +static inline void pci_hp_unignore_link_change(struct pci_dev *pdev) { }
> +#endif
> +
>   #ifdef CONFIG_PCIEAER
>   bool pci_aer_available(void);
>   #else

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
  2025-04-10 15:27 ` [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset Lukas Wunner
  2025-04-11 22:28   ` Sathyanarayanan Kuppuswamy
  2025-04-13 17:22   ` Sathyanarayanan Kuppuswamy
@ 2025-04-14 13:32   ` Ilpo Järvinen
  2025-04-16  8:00   ` Ilpo Järvinen
  3 siblings, 0 replies; 16+ messages in thread
From: Ilpo Järvinen @ 2025-04-14 13:32 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Keith Busch,
	Yicong Yang, linux-pci, Stuart Hayes, Mika Westerberg,
	Joel Mathew Thomas, Russ Weight, Matthew Gerlach, Yilun Xu,
	linux-fpga, Moshe Shemesh, Shay Drory, Saeed Mahameed,
	Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 14191 bytes --]

On Thu, 10 Apr 2025, Lukas Wunner wrote:

> When a Secondary Bus Reset is issued at a hotplug port, it causes a Data
> Link Layer State Changed event as a side effect.  On hotplug ports using
> in-band presence detect, it additionally causes a Presence Detect Changed
> event.
> 
> These spurious events should not result in teardown and re-enumeration of
> the device in the slot.  Hence commit 2e35afaefe64 ("PCI: pciehp: Add
> reset_slot() method") masked the Presence Detect Changed Enable bit in the
> Slot Control register during a Secondary Bus Reset.  Commit 06a8d89af551
> ("PCI: pciehp: Disable link notification across slot reset") additionally
> masked the Data Link Layer State Changed Enable bit.
> 
> However masking those bits only disables interrupt generation (PCIe r6.2
> sec 6.7.3.1).  The events are still visible in the Slot Status register
> and picked up by the IRQ handler if it runs during a Secondary Bus Reset.
> This can happen if the interrupt is shared or if an unmasked hotplug event
> occurs, e.g. Attention Button Pressed or Power Fault Detected.
> 
> The likelihood of this happening used to be small, so it wasn't much of a
> problem in practice.  That has changed with the recent introduction of
> bandwidth control in v6.13-rc1 with commit 665745f27487 ("PCI/bwctrl:
> Re-add BW notification portdrv as PCIe BW controller"):
> 
> Bandwidth control shares the interrupt with PCIe hotplug.  A Secondary Bus
> Reset causes a Link Bandwidth Notification, so the hotplug IRQ handler
> runs, picks up the masked events and tears down the device in the slot.
> 
> As a result, Joel reports VFIO passthrough failure of a GPU, which Ilpo
> root-caused to the incorrect handling of masked hotplug events.
> 
> Clearly, a more reliable way is needed to ignore spurious hotplug events.
> 
> For Downstream Port Containment, a new ignore mechanism was introduced by
> commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").
> It has been working reliably for the past four years.
> 
> Adapt it for Secondary Bus Resets.
> 
> Introduce two helpers to annotate code sections which cause spurious link
> changes:  pci_hp_ignore_link_change() and pci_hp_unignore_link_change()
> Use those helpers in lieu of masking interrupts in the Slot Control
> register.
> 
> Introduce a helper to check whether such a code section is executing
> concurrently and if so, await it:  pci_hp_spurious_link_change()
> Invoke the helper in the hotplug IRQ thread pciehp_ist().  Re-use the
> IRQ thread's existing code which ignores DPC-induced link changes unless
> the link is unexpectedly down after reset recovery or the device was
> replaced during the bus reset.
> 
> That code block in pciehp_ist() was previously only executed if a Data
> Link Layer State Changed event has occurred.  Additionally execute it for
> Presence Detect Changed events.  That's necessary for compatibility with
> PCIe r1.0 hotplug ports because Data Link Layer State Changed didn't exist
> before PCIe r1.1.  DPC was added with PCIe r3.1 and thus DPC-capable
> hotplug ports always support Data Link Layer State Changed events.
> But the same cannot be assumed for Secondary Bus Reset, which already
> existed in PCIe r1.0.
> 
> Secondary Bus Reset is only one of many causes of spurious link changes.
> Others include runtime suspend to D3cold, firmware updates or FPGA
> reconfiguration.  The new pci_hp_{,un}ignore_link_change() helpers may be
> used by all kinds of drivers to annotate such code sections, hence their
> declarations are publicly visible in <linux/pci.h>.  A case in point is
> the Mellanox Ethernet driver which disables a firmware reset feature if
> the Ethernet card is attached to a hotplug port, see commit 3d7a3f2612d7
> ("net/mlx5: Nack sync reset request when HotPlug is enabled").  Going
> forward, PCIe hotplug will be able to cope gracefully with all such use
> cases once the code sections are properly annotated.
> 
> The new helpers internally use two bits in struct pci_dev's priv_flags as
> well as a wait_queue.  This mirrors what was done for DPC by commit
> a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").  That may
> be insufficient if spurious link changes are caused by multiple sources
> simultaneously.  An example might be a Secondary Bus Reset issued by AER
> during FPGA reconfiguration.  If this turns out to happen in real life,
> support for it can easily be added by replacing the PCI_LINK_CHANGING flag
> with an atomic_t counter incremented by pci_hp_ignore_link_change() and
> decremented by pci_hp_unignore_link_change().  Instead of awaiting a zero
> PCI_LINK_CHANGING flag, the pci_hp_spurious_link_change() helper would
> then simply await a zero counter.
> 
> Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
> Reported-by: Joel Mathew Thomas <proxy0@tutamail.com>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765
> Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>  drivers/pci/hotplug/pci_hotplug_core.c | 69 ++++++++++++++++++++++++++++++++++
>  drivers/pci/hotplug/pciehp_hpc.c       | 35 ++++++-----------
>  drivers/pci/pci.h                      |  3 ++
>  include/linux/pci.h                    |  8 ++++
>  4 files changed, 92 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pci_hotplug_core.c b/drivers/pci/hotplug/pci_hotplug_core.c
> index d30f131..d8c5856 100644
> --- a/drivers/pci/hotplug/pci_hotplug_core.c
> +++ b/drivers/pci/hotplug/pci_hotplug_core.c
> @@ -492,6 +492,75 @@ void pci_hp_destroy(struct hotplug_slot *slot)
>  }
>  EXPORT_SYMBOL_GPL(pci_hp_destroy);
>  
> +static DECLARE_WAIT_QUEUE_HEAD(pci_hp_link_change_wq);
> +
> +/**
> + * pci_hp_ignore_link_change - begin code section causing spurious link changes
> + * @pdev: PCI hotplug bridge
> + *
> + * Mark the beginning of a code section causing spurious link changes on the
> + * Secondary Bus of @pdev, e.g. as a side effect of a Secondary Bus Reset,
> + * D3cold transition, firmware update or FPGA reconfiguration.
> + *
> + * Hotplug drivers can thus check whether such a code section is executing
> + * concurrently, await it with pci_hp_spurious_link_change() and ignore the
> + * resulting link change events.
> + *
> + * Must be paired with pci_hp_unignore_link_change().  May be called both
> + * from the PCI core and from Endpoint drivers.  May be called for bridges
> + * which are not hotplug-capable, in which case it has no effect because
> + * no hotplug driver is bound to the bridge.
> + */
> +void pci_hp_ignore_link_change(struct pci_dev *pdev)
> +{
> +	set_bit(PCI_LINK_CHANGING, &pdev->priv_flags);
> +	smp_mb__after_atomic(); /* pairs with implied barrier of wait_event() */
> +}
> +
> +/**
> + * pci_hp_unignore_link_change - end code section causing spurious link changes
> + * @pdev: PCI hotplug bridge
> + *
> + * Mark the end of a code section causing spurious link changes on the
> + * Secondary Bus of @pdev.  Must be paired with pci_hp_ignore_link_change().
> + */
> +void pci_hp_unignore_link_change(struct pci_dev *pdev)
> +{
> +	set_bit(PCI_LINK_CHANGED, &pdev->priv_flags);
> +	mb(); /* ensure pci_hp_spurious_link_change() sees either bit set */
> +	clear_bit(PCI_LINK_CHANGING, &pdev->priv_flags);
> +	wake_up_all(&pci_hp_link_change_wq);
> +}
> +
> +/**
> + * pci_hp_spurious_link_change - check for spurious link changes
> + * @pdev: PCI hotplug bridge
> + *
> + * Check whether a code section is executing concurrently which is causing
> + * spurious link changes on the Secondary Bus of @pdev.  Await the end of the
> + * code section if so.
> + *
> + * Return %true if such a code section has been executing concurrently,

Return:

And I think it's assumed to be the last, not in the middle of the 
function's description.

> + * otherwise %false.  Also return %true if such a code section has not been
> + * executing concurrently, but at least once since the last invocation of this
> + * function.
> + *
> + * May be called by hotplug drivers to check whether a link change is spurious
> + * and can be ignored.
> + *
> + * Because a genuine link change may have occurred in-between a spurious link
> + * change and the invocation of this function, hotplug drivers should perform
> + * sanity checks such as retrieving the current link state and bringing down
> + * the slot if the link is down.
> + */
> +bool pci_hp_spurious_link_change(struct pci_dev *pdev)
> +{
> +	wait_event(pci_hp_link_change_wq,
> +		   !test_bit(PCI_LINK_CHANGING, &pdev->priv_flags));
> +
> +	return test_and_clear_bit(PCI_LINK_CHANGED, &pdev->priv_flags);
> +}
> +
>  static int __init pci_hotplug_init(void)
>  {
>  	int result;
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 388fbed..c98d310 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -592,21 +592,21 @@ bool pciehp_device_replaced(struct controller *ctrl)
>  	return false;
>  }
>  
> -static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
> -					  struct pci_dev *pdev, int irq,
> -					  u16 ignored_events)
> +static void pciehp_ignore_link_change(struct controller *ctrl,
> +				      struct pci_dev *pdev, int irq,
> +				      u16 ignored_events)
>  {
>  	/*
>  	 * Ignore link changes which occurred while waiting for DPC recovery.
>  	 * Could be several if DPC triggered multiple times consecutively.
> +	 * Also ignore link changes caused by Secondary Bus Reset etc.

Comma before etc.

>  	 */
>  	synchronize_hardirq(irq);
>  	atomic_and(~ignored_events, &ctrl->pending_events);
>  	if (pciehp_poll_mode)
>  		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>  					   ignored_events);
> -	ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored (recovered by DPC)\n",
> -		  slot_name(ctrl));
> +	ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored\n", slot_name(ctrl));
>  
>  	/*
>  	 * If the link is unexpectedly down after successful recovery,
> @@ -762,9 +762,11 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>  
>  	/*
>  	 * Ignore Link Down/Up events caused by Downstream Port Containment
> -	 * if recovery from the error succeeded.
> +	 * if recovery succeeded, or caused by Secondary Bus Reset,
> +	 * suspend to D3cold, firmware update, FPGA reconfiguration, etc.
>  	 */
> -	if ((events & PCI_EXP_SLTSTA_DLLSC) && pci_dpc_recovered(pdev) &&
> +	if ((events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC)) &&
> +	    (pci_dpc_recovered(pdev) || pci_hp_spurious_link_change(pdev)) &&
>  	    ctrl->state == ON_STATE) {
>  		u16 ignored_events = PCI_EXP_SLTSTA_DLLSC;
>  
> @@ -772,7 +774,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>  			ignored_events |= events & PCI_EXP_SLTSTA_PDC;
>  
>  		events &= ~ignored_events;
> -		pciehp_ignore_dpc_link_change(ctrl, pdev, irq, ignored_events);
> +		pciehp_ignore_link_change(ctrl, pdev, irq, ignored_events);
>  	}
>  
>  	/*
> @@ -937,7 +939,6 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
>  {
>  	struct controller *ctrl = to_ctrl(hotplug_slot);
>  	struct pci_dev *pdev = ctrl_dev(ctrl);
> -	u16 stat_mask = 0, ctrl_mask = 0;
>  	int rc;
>  
>  	if (probe)
> @@ -945,23 +946,11 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
>  
>  	down_write_nested(&ctrl->reset_lock, ctrl->depth);
>  
> -	if (!ATTN_BUTTN(ctrl)) {
> -		ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
> -		stat_mask |= PCI_EXP_SLTSTA_PDC;
> -	}
> -	ctrl_mask |= PCI_EXP_SLTCTL_DLLSCE;
> -	stat_mask |= PCI_EXP_SLTSTA_DLLSC;
> -
> -	pcie_write_cmd(ctrl, 0, ctrl_mask);
> -	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
> -		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
> +	pci_hp_ignore_link_change(pdev);
>  
>  	rc = pci_bridge_secondary_bus_reset(ctrl->pcie->port);
>  
> -	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, stat_mask);
> -	pcie_write_cmd_nowait(ctrl, ctrl_mask, ctrl_mask);
> -	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
> -		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
> +	pci_hp_unignore_link_change(pdev);
>  
>  	up_write(&ctrl->reset_lock);
>  	return rc;
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index b81e99c..7db798b 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -227,6 +227,7 @@ static inline bool pcie_downstream_port(const struct pci_dev *dev)
>  
>  /* Functions for PCI Hotplug drivers to use */
>  int pci_hp_add_bridge(struct pci_dev *dev);
> +bool pci_hp_spurious_link_change(struct pci_dev *pdev);
>  
>  #if defined(CONFIG_SYSFS) && defined(HAVE_PCI_LEGACY)
>  void pci_create_legacy_files(struct pci_bus *bus);
> @@ -557,6 +558,8 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
>  #define PCI_DPC_RECOVERED 1
>  #define PCI_DPC_RECOVERING 2
>  #define PCI_DEV_REMOVED 3
> +#define PCI_LINK_CHANGED 4
> +#define PCI_LINK_CHANGING 5
>  
>  static inline void pci_dev_assign_added(struct pci_dev *dev)
>  {
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 0e8e3fd..833b54f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1848,6 +1848,14 @@ static inline void pcie_no_aspm(void) { }
>  static inline bool pcie_aspm_enabled(struct pci_dev *pdev) { return false; }
>  #endif
>  
> +#ifdef CONFIG_HOTPLUG_PCI
> +void pci_hp_ignore_link_change(struct pci_dev *pdev);
> +void pci_hp_unignore_link_change(struct pci_dev *pdev);
> +#else
> +static inline void pci_hp_ignore_link_change(struct pci_dev *pdev) { }
> +static inline void pci_hp_unignore_link_change(struct pci_dev *pdev) { }
> +#endif
> +
>  #ifdef CONFIG_PCIEAER
>  bool pci_aer_available(void);
>  #else
> 

Just nits above.

Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

-- 
 i.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] PCI: pciehp: Ignore Presence Detect Changed caused by DPC
  2025-04-10 15:27 ` [PATCH 1/2] PCI: pciehp: Ignore Presence Detect Changed caused by DPC Lukas Wunner
  2025-04-11  2:34   ` Sathyanarayanan Kuppuswamy
@ 2025-04-14 13:33   ` Ilpo Järvinen
  1 sibling, 0 replies; 16+ messages in thread
From: Ilpo Järvinen @ 2025-04-14 13:33 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Keith Busch,
	Yicong Yang, linux-pci, Stuart Hayes, Mika Westerberg,
	Joel Mathew Thomas, Russ Weight, Matthew Gerlach, Yilun Xu,
	linux-fpga, Moshe Shemesh, Shay Drory, Saeed Mahameed,
	Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 6966 bytes --]

On Thu, 10 Apr 2025, Lukas Wunner wrote:

> Commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
> amended PCIe hotplug to not bring down the slot upon Data Link Layer State
> Changed events caused by Downstream Port Containment.
> 
> However Keith reports off-list that if the slot uses in-band presence
> detect (i.e. Presence Detect State is derived from Data Link Layer Link
> Active), DPC also causes a spurious Presence Detect Changed event.
> 
> This needs to be ignored as well.
> 
> Unfortunately there's no register indicating that in-band presence detect
> is used.  PCIe r5.0 sec 7.5.3.10 introduced the In-Band PD Disable bit in
> the Slot Control Register.  The PCIe hotplug driver sets this bit on
> ports supporting it.  But older ports may still use in-band presence
> detect.
> 
> If in-band presence detect can be disabled, Presence Detect Changed events
> occurring during DPC must not be ignored because they signal device
> replacement.  On all other ports, device replacement cannot be detected
> reliably because the Presence Detect Changed event could be a side effect
> of DPC.  On those (older) ports, perform a best-effort device replacement
> check by comparing the Vendor ID, Device ID and other data in Config Space
> with the values cached in struct pci_dev.  Use the existing helper
> pciehp_device_replaced() to accomplish this.  It is currently #ifdef'ed to
> CONFIG_PM_SLEEP in pciehp_core.c, so move it to pciehp_hpc.c where most
> other functions accessing config space reside.
> 
> Reported-by: Keith Busch <kbusch@kernel.org>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

--
 i.

> ---
>  drivers/pci/hotplug/pciehp.h      |  1 +
>  drivers/pci/hotplug/pciehp_core.c | 29 -----------------------
>  drivers/pci/hotplug/pciehp_hpc.c  | 49 +++++++++++++++++++++++++++++++++------
>  3 files changed, 43 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
> index 273dd8c..debc79b0 100644
> --- a/drivers/pci/hotplug/pciehp.h
> +++ b/drivers/pci/hotplug/pciehp.h
> @@ -187,6 +187,7 @@ struct controller {
>  int pciehp_card_present_or_link_active(struct controller *ctrl);
>  int pciehp_check_link_status(struct controller *ctrl);
>  int pciehp_check_link_active(struct controller *ctrl);
> +bool pciehp_device_replaced(struct controller *ctrl);
>  void pciehp_release_ctrl(struct controller *ctrl);
>  
>  int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot);
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index 997841c..f59baa9 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -284,35 +284,6 @@ static int pciehp_suspend(struct pcie_device *dev)
>  	return 0;
>  }
>  
> -static bool pciehp_device_replaced(struct controller *ctrl)
> -{
> -	struct pci_dev *pdev __free(pci_dev_put) = NULL;
> -	u32 reg;
> -
> -	if (pci_dev_is_disconnected(ctrl->pcie->port))
> -		return false;
> -
> -	pdev = pci_get_slot(ctrl->pcie->port->subordinate, PCI_DEVFN(0, 0));
> -	if (!pdev)
> -		return true;
> -
> -	if (pci_read_config_dword(pdev, PCI_VENDOR_ID, &reg) ||
> -	    reg != (pdev->vendor | (pdev->device << 16)) ||
> -	    pci_read_config_dword(pdev, PCI_CLASS_REVISION, &reg) ||
> -	    reg != (pdev->revision | (pdev->class << 8)))
> -		return true;
> -
> -	if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
> -	    (pci_read_config_dword(pdev, PCI_SUBSYSTEM_VENDOR_ID, &reg) ||
> -	     reg != (pdev->subsystem_vendor | (pdev->subsystem_device << 16))))
> -		return true;
> -
> -	if (pci_get_dsn(pdev) != ctrl->dsn)
> -		return true;
> -
> -	return false;
> -}
> -
>  static int pciehp_resume_noirq(struct pcie_device *dev)
>  {
>  	struct controller *ctrl = get_service_data(dev);
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 8a09fb6..388fbed 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -563,18 +563,48 @@ void pciehp_power_off_slot(struct controller *ctrl)
>  		 PCI_EXP_SLTCTL_PWR_OFF);
>  }
>  
> +bool pciehp_device_replaced(struct controller *ctrl)
> +{
> +	struct pci_dev *pdev __free(pci_dev_put) = NULL;
> +	u32 reg;
> +
> +	if (pci_dev_is_disconnected(ctrl->pcie->port))
> +		return false;
> +
> +	pdev = pci_get_slot(ctrl->pcie->port->subordinate, PCI_DEVFN(0, 0));
> +	if (!pdev)
> +		return true;
> +
> +	if (pci_read_config_dword(pdev, PCI_VENDOR_ID, &reg) ||
> +	    reg != (pdev->vendor | (pdev->device << 16)) ||
> +	    pci_read_config_dword(pdev, PCI_CLASS_REVISION, &reg) ||
> +	    reg != (pdev->revision | (pdev->class << 8)))
> +		return true;
> +
> +	if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
> +	    (pci_read_config_dword(pdev, PCI_SUBSYSTEM_VENDOR_ID, &reg) ||
> +	     reg != (pdev->subsystem_vendor | (pdev->subsystem_device << 16))))
> +		return true;
> +
> +	if (pci_get_dsn(pdev) != ctrl->dsn)
> +		return true;
> +
> +	return false;
> +}
> +
>  static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
> -					  struct pci_dev *pdev, int irq)
> +					  struct pci_dev *pdev, int irq,
> +					  u16 ignored_events)
>  {
>  	/*
>  	 * Ignore link changes which occurred while waiting for DPC recovery.
>  	 * Could be several if DPC triggered multiple times consecutively.
>  	 */
>  	synchronize_hardirq(irq);
> -	atomic_and(~PCI_EXP_SLTSTA_DLLSC, &ctrl->pending_events);
> +	atomic_and(~ignored_events, &ctrl->pending_events);
>  	if (pciehp_poll_mode)
>  		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> -					   PCI_EXP_SLTSTA_DLLSC);
> +					   ignored_events);
>  	ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored (recovered by DPC)\n",
>  		  slot_name(ctrl));
>  
> @@ -584,8 +614,8 @@ static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
>  	 * Synthesize it to ensure that it is acted on.
>  	 */
>  	down_read_nested(&ctrl->reset_lock, ctrl->depth);
> -	if (!pciehp_check_link_active(ctrl))
> -		pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
> +	if (!pciehp_check_link_active(ctrl) || pciehp_device_replaced(ctrl))
> +		pciehp_request(ctrl, ignored_events);
>  	up_read(&ctrl->reset_lock);
>  }
>  
> @@ -736,8 +766,13 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>  	 */
>  	if ((events & PCI_EXP_SLTSTA_DLLSC) && pci_dpc_recovered(pdev) &&
>  	    ctrl->state == ON_STATE) {
> -		events &= ~PCI_EXP_SLTSTA_DLLSC;
> -		pciehp_ignore_dpc_link_change(ctrl, pdev, irq);
> +		u16 ignored_events = PCI_EXP_SLTSTA_DLLSC;
> +
> +		if (!ctrl->inband_presence_disabled)
> +			ignored_events |= events & PCI_EXP_SLTSTA_PDC;
> +
> +		events &= ~ignored_events;
> +		pciehp_ignore_dpc_link_change(ctrl, pdev, irq, ignored_events);
>  	}
>  
>  	/*
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Ignore spurious PCIe hotplug events
  2025-04-10 15:27 [PATCH 0/2] Ignore spurious PCIe hotplug events Lukas Wunner
                   ` (2 preceding siblings ...)
  2025-04-10 22:19 ` [PATCH 0/2] Ignore spurious PCIe hotplug events Bjorn Helgaas
@ 2025-04-15 20:51 ` Keith Busch
  2025-04-16 15:06   ` Lukas Wunner
  3 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2025-04-15 20:51 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Yicong Yang, linux-pci,
	Stuart Hayes, Mika Westerberg, Ilpo Jarvinen, Joel Mathew Thomas,
	Russ Weight, Matthew Gerlach, Yilun Xu, linux-fpga, Moshe Shemesh,
	Shay Drory, Saeed Mahameed, Alex Williamson

On Thu, Apr 10, 2025 at 05:27:10PM +0200, Lukas Wunner wrote:
> Trying to kill several birds with one stone here:
> 
> First of all, PCIe hotplug is deliberately ignoring link events occurring
> as a side effect of Downstream Port Containment.  But it's not yet ignoring
> Presence Detect Changed events.  These can happen if a hotplug bridge uses
> in-band presence detect.  Reported by Keith Busch, patch [1/2] seeks to
> fix it.
> 
> Second, PCIe hotplug is deliberately ignoring link events and Presence
> Detect Changed events occurring as a side effect of a Secondary Bus Reset.
> But that's no longer working properly since the introduction of bandwidth
> control in v6.13-rc1.  Actually it never worked properly, but bandwidth
> control is now mercilessly exposing the issue.  VFIO is thus broken,
> it resets the device on passthrough.  Reported by Joel Mathew Thomas.
> 
> Third, link or presence events can not only occur as a side effect of DPC
> or SBR, but also because of suspend to D3cold, a firmware update or FPGA
> reconfiguration.  In particular, Altera engineers report that the link
> goes down as a side effect of FPGA reconfiguration and the PCIe hotplug
> driver responds by disabling slot power.  Obviously not what you'd want
> while the FPGA is being reconfigured!
> 
> This leads me to believe that we need a generic mechanism to tell hotplug
> drivers that spurious link changes are ongoing which need to be ignored.
> Patch [2/2] introduces an API for it and the first user is SBR handling
> in PCIe hotplug.  This fixes the issue exposed by bandwidth control.
> It also aligns DPC and SBR handling in the PCIe hotplug driver such that
> they use the same code path.
> 
> The API pci_hp_ignore_link_change() / pci_hp_unignore_link_change() is
> initially not exported.  It can be once the first modular user shows up.
> 
> Although these are technically fixes, they're slightly intrusive, so it
> would be good to let them simmer in linux-next for a while.  One option
> would be to apply for v6.16 and let Greg & Sasha do the backporting.
> Another would be to apply to the for-linus branch for v6.15 but wait
> maybe 4 weeks before a pull request is sent.

I'm a bit conflicted on this because it does appear to help. But it is
ignoring a PDC and there are times where it is legit telling the host
the device presence really has changed. There are switches that let you
toggle downstream connections to change what's attached and it causes a
DPC event, swapping out the downstream device at the same time. So this
change has the pci driver resume with the wrong device if you happen to
be in such a situation. I don't have such switches anymore, so I'd hate
to stand in the way over some theoretical issue when this patch helps a
more immediate one.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
  2025-04-10 15:27 ` [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset Lukas Wunner
                     ` (2 preceding siblings ...)
  2025-04-14 13:32   ` Ilpo Järvinen
@ 2025-04-16  8:00   ` Ilpo Järvinen
  3 siblings, 0 replies; 16+ messages in thread
From: Ilpo Järvinen @ 2025-04-16  8:00 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas
  Cc: Sathyanarayanan Kuppuswamy, Keith Busch, Yicong Yang, linux-pci,
	Stuart Hayes, Mika Westerberg, Joel Mathew Thomas, Russ Weight,
	Matthew Gerlach, Yilun Xu, linux-fpga, Moshe Shemesh, Shay Drory,
	Saeed Mahameed, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 7697 bytes --]

On Thu, 10 Apr 2025, Lukas Wunner wrote:

> When a Secondary Bus Reset is issued at a hotplug port, it causes a Data
> Link Layer State Changed event as a side effect.  On hotplug ports using
> in-band presence detect, it additionally causes a Presence Detect Changed
> event.
> 
> These spurious events should not result in teardown and re-enumeration of
> the device in the slot.  Hence commit 2e35afaefe64 ("PCI: pciehp: Add
> reset_slot() method") masked the Presence Detect Changed Enable bit in the
> Slot Control register during a Secondary Bus Reset.  Commit 06a8d89af551
> ("PCI: pciehp: Disable link notification across slot reset") additionally
> masked the Data Link Layer State Changed Enable bit.
> 
> However masking those bits only disables interrupt generation (PCIe r6.2
> sec 6.7.3.1).  The events are still visible in the Slot Status register
> and picked up by the IRQ handler if it runs during a Secondary Bus Reset.
> This can happen if the interrupt is shared or if an unmasked hotplug event
> occurs, e.g. Attention Button Pressed or Power Fault Detected.
> 
> The likelihood of this happening used to be small, so it wasn't much of a
> problem in practice.  That has changed with the recent introduction of
> bandwidth control in v6.13-rc1 with commit 665745f27487 ("PCI/bwctrl:
> Re-add BW notification portdrv as PCIe BW controller"):
> 
> Bandwidth control shares the interrupt with PCIe hotplug.  A Secondary Bus
> Reset causes a Link Bandwidth Notification, so the hotplug IRQ handler
> runs, picks up the masked events and tears down the device in the slot.
> 
> As a result, Joel reports VFIO passthrough failure of a GPU, which Ilpo
> root-caused to the incorrect handling of masked hotplug events.
> 
> Clearly, a more reliable way is needed to ignore spurious hotplug events.
> 
> For Downstream Port Containment, a new ignore mechanism was introduced by
> commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").
> It has been working reliably for the past four years.
> 
> Adapt it for Secondary Bus Resets.
> 
> Introduce two helpers to annotate code sections which cause spurious link
> changes:  pci_hp_ignore_link_change() and pci_hp_unignore_link_change()
> Use those helpers in lieu of masking interrupts in the Slot Control
> register.
> 
> Introduce a helper to check whether such a code section is executing
> concurrently and if so, await it:  pci_hp_spurious_link_change()
> Invoke the helper in the hotplug IRQ thread pciehp_ist().  Re-use the
> IRQ thread's existing code which ignores DPC-induced link changes unless
> the link is unexpectedly down after reset recovery or the device was
> replaced during the bus reset.
> 
> That code block in pciehp_ist() was previously only executed if a Data
> Link Layer State Changed event has occurred.  Additionally execute it for
> Presence Detect Changed events.  That's necessary for compatibility with
> PCIe r1.0 hotplug ports because Data Link Layer State Changed didn't exist
> before PCIe r1.1.  DPC was added with PCIe r3.1 and thus DPC-capable
> hotplug ports always support Data Link Layer State Changed events.
> But the same cannot be assumed for Secondary Bus Reset, which already
> existed in PCIe r1.0.
> 
> Secondary Bus Reset is only one of many causes of spurious link changes.
> Others include runtime suspend to D3cold, firmware updates or FPGA
> reconfiguration.  The new pci_hp_{,un}ignore_link_change() helpers may be
> used by all kinds of drivers to annotate such code sections, hence their
> declarations are publicly visible in <linux/pci.h>.  A case in point is
> the Mellanox Ethernet driver which disables a firmware reset feature if
> the Ethernet card is attached to a hotplug port, see commit 3d7a3f2612d7
> ("net/mlx5: Nack sync reset request when HotPlug is enabled").  Going
> forward, PCIe hotplug will be able to cope gracefully with all such use
> cases once the code sections are properly annotated.
> 
> The new helpers internally use two bits in struct pci_dev's priv_flags as
> well as a wait_queue.  This mirrors what was done for DPC by commit
> a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").  That may
> be insufficient if spurious link changes are caused by multiple sources
> simultaneously.  An example might be a Secondary Bus Reset issued by AER
> during FPGA reconfiguration.  If this turns out to happen in real life,
> support for it can easily be added by replacing the PCI_LINK_CHANGING flag
> with an atomic_t counter incremented by pci_hp_ignore_link_change() and
> decremented by pci_hp_unignore_link_change().  Instead of awaiting a zero
> PCI_LINK_CHANGING flag, the pci_hp_spurious_link_change() helper would
> then simply await a zero counter.
> 
> Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
> Reported-by: Joel Mathew Thomas <proxy0@tutamail.com>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765
> Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>  drivers/pci/hotplug/pci_hotplug_core.c | 69 ++++++++++++++++++++++++++++++++++
>  drivers/pci/hotplug/pciehp_hpc.c       | 35 ++++++-----------
>  drivers/pci/pci.h                      |  3 ++
>  include/linux/pci.h                    |  8 ++++
>  4 files changed, 92 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pci_hotplug_core.c b/drivers/pci/hotplug/pci_hotplug_core.c
> index d30f131..d8c5856 100644
> --- a/drivers/pci/hotplug/pci_hotplug_core.c
> +++ b/drivers/pci/hotplug/pci_hotplug_core.c
> @@ -492,6 +492,75 @@ void pci_hp_destroy(struct hotplug_slot *slot)
>  }
>  EXPORT_SYMBOL_GPL(pci_hp_destroy);
>  
> +static DECLARE_WAIT_QUEUE_HEAD(pci_hp_link_change_wq);
> +
> +/**
> + * pci_hp_ignore_link_change - begin code section causing spurious link changes
> + * @pdev: PCI hotplug bridge
> + *
> + * Mark the beginning of a code section causing spurious link changes on the
> + * Secondary Bus of @pdev, e.g. as a side effect of a Secondary Bus Reset,
> + * D3cold transition, firmware update or FPGA reconfiguration.
> + *
> + * Hotplug drivers can thus check whether such a code section is executing
> + * concurrently, await it with pci_hp_spurious_link_change() and ignore the
> + * resulting link change events.
> + *
> + * Must be paired with pci_hp_unignore_link_change().  May be called both
> + * from the PCI core and from Endpoint drivers.  May be called for bridges
> + * which are not hotplug-capable, in which case it has no effect because
> + * no hotplug driver is bound to the bridge.
> + */
> +void pci_hp_ignore_link_change(struct pci_dev *pdev)
> +{
> +	set_bit(PCI_LINK_CHANGING, &pdev->priv_flags);
> +	smp_mb__after_atomic(); /* pairs with implied barrier of wait_event() */
> +}
> +
> +/**
> + * pci_hp_unignore_link_change - end code section causing spurious link changes
> + * @pdev: PCI hotplug bridge
> + *
> + * Mark the end of a code section causing spurious link changes on the
> + * Secondary Bus of @pdev.  Must be paired with pci_hp_ignore_link_change().
> + */
> +void pci_hp_unignore_link_change(struct pci_dev *pdev)
> +{
> +	set_bit(PCI_LINK_CHANGED, &pdev->priv_flags);
> +	mb(); /* ensure pci_hp_spurious_link_change() sees either bit set */
> +	clear_bit(PCI_LINK_CHANGING, &pdev->priv_flags);
> +	wake_up_all(&pci_hp_link_change_wq);

This change should have added these:

#include <linux/bitops.h>
#include <linux/wait.h>

#include <asm/barrier.h>

--
 i.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Ignore spurious PCIe hotplug events
  2025-04-15 20:51 ` Keith Busch
@ 2025-04-16 15:06   ` Lukas Wunner
  2025-04-18  1:26     ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Lukas Wunner @ 2025-04-16 15:06 UTC (permalink / raw)
  To: Keith Busch
  Cc: Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Yicong Yang, linux-pci,
	Stuart Hayes, Mika Westerberg, Ilpo Jarvinen, Joel Mathew Thomas,
	Russ Weight, Matthew Gerlach, Yilun Xu, linux-fpga, Moshe Shemesh,
	Shay Drory, Saeed Mahameed, Alex Williamson

On Tue, Apr 15, 2025 at 02:51:42PM -0600, Keith Busch wrote:
> On Thu, Apr 10, 2025 at 05:27:10PM +0200, Lukas Wunner wrote:
> > First of all, PCIe hotplug is deliberately ignoring link events occurring
> > as a side effect of Downstream Port Containment.  But it's not yet ignoring
> > Presence Detect Changed events.  These can happen if a hotplug bridge uses
> > in-band presence detect.  Reported by Keith Busch, patch [1/2] seeks to
> > fix it.
> 
> There are switches that let you
> toggle downstream connections to change what's attached and it causes a
> DPC event, swapping out the downstream device at the same time. So this
> change has the pci driver resume with the wrong device if you happen to
> be in such a situation. I don't have such switches anymore

What's the error type causing the DPC event?  Surprise Down?

Since commit 2ae8fbbe1cd4 ("PCI/DPC: Ignore Surprise Down error on hot
removal"), which went into v6.9, the DPC driver handles Surprise Down
silently and it tells the hotplug driver *not* to ignore the hotplug
event.  It does that by unconditionally clearing the PCI_DPC_RECOVERED
flag at the end of dpc_handle_surprise_removal().

Hence in the situation you're describing, the hotplug driver should
always bring down the slot and bring it back up with the new device.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Ignore spurious PCIe hotplug events
  2025-04-16 15:06   ` Lukas Wunner
@ 2025-04-18  1:26     ` Keith Busch
  0 siblings, 0 replies; 16+ messages in thread
From: Keith Busch @ 2025-04-18  1:26 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Sathyanarayanan Kuppuswamy, Yicong Yang, linux-pci,
	Stuart Hayes, Mika Westerberg, Ilpo Jarvinen, Joel Mathew Thomas,
	Russ Weight, Matthew Gerlach, Yilun Xu, linux-fpga, Moshe Shemesh,
	Shay Drory, Saeed Mahameed, Alex Williamson

On Wed, Apr 16, 2025 at 05:06:23PM +0200, Lukas Wunner wrote:
> On Tue, Apr 15, 2025 at 02:51:42PM -0600, Keith Busch wrote:
> > On Thu, Apr 10, 2025 at 05:27:10PM +0200, Lukas Wunner wrote:
> > > First of all, PCIe hotplug is deliberately ignoring link events occurring
> > > as a side effect of Downstream Port Containment.  But it's not yet ignoring
> > > Presence Detect Changed events.  These can happen if a hotplug bridge uses
> > > in-band presence detect.  Reported by Keith Busch, patch [1/2] seeks to
> > > fix it.
> > 
> > There are switches that let you
> > toggle downstream connections to change what's attached and it causes a
> > DPC event, swapping out the downstream device at the same time. So this
> > change has the pci driver resume with the wrong device if you happen to
> > be in such a situation. I don't have such switches anymore
> 
> What's the error type causing the DPC event?  Surprise Down?
> 
> Since commit 2ae8fbbe1cd4 ("PCI/DPC: Ignore Surprise Down error on hot
> removal"), which went into v6.9, the DPC driver handles Surprise Down
> silently and it tells the hotplug driver *not* to ignore the hotplug
> event.  It does that by unconditionally clearing the PCI_DPC_RECOVERED
> flag at the end of dpc_handle_surprise_removal().
> 
> Hence in the situation you're describing, the hotplug driver should
> always bring down the slot and bring it back up with the new device.

I think you're right, thank you for the pointers.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-04-18  1:26 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-10 15:27 [PATCH 0/2] Ignore spurious PCIe hotplug events Lukas Wunner
2025-04-10 15:27 ` [PATCH 1/2] PCI: pciehp: Ignore Presence Detect Changed caused by DPC Lukas Wunner
2025-04-11  2:34   ` Sathyanarayanan Kuppuswamy
2025-04-11  8:58     ` Lukas Wunner
2025-04-14 13:33   ` Ilpo Järvinen
2025-04-10 15:27 ` [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset Lukas Wunner
2025-04-11 22:28   ` Sathyanarayanan Kuppuswamy
2025-04-12  3:36     ` Lukas Wunner
2025-04-13 17:21       ` Sathyanarayanan Kuppuswamy
2025-04-13 17:22   ` Sathyanarayanan Kuppuswamy
2025-04-14 13:32   ` Ilpo Järvinen
2025-04-16  8:00   ` Ilpo Järvinen
2025-04-10 22:19 ` [PATCH 0/2] Ignore spurious PCIe hotplug events Bjorn Helgaas
2025-04-15 20:51 ` Keith Busch
2025-04-16 15:06   ` Lukas Wunner
2025-04-18  1:26     ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).