linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] pciehp: sync interrupts for bus resets
@ 2025-08-27 22:45 Keith Busch
  2025-08-27 22:48 ` Keith Busch
  2025-08-31 13:43 ` Lukas Wunner
  0 siblings, 2 replies; 6+ messages in thread
From: Keith Busch @ 2025-08-27 22:45 UTC (permalink / raw)
  To: linux-pci; +Cc: Keith Busch, Lukas Wunner

From: Keith Busch <kbusch@kernel.org>

Synchronize the interrupt to ensure the reset isn't going to disrupt a
previously pending handler from igoring the reset's link flap. Back to
back secondary bus resets create a window when the previous reset
proceeds with DLLLA, waking the pending pciehp interrupt thread, but the
subsequent reset tears it down while the irq thread tries to confirm the
link is active, triggering unexpected re-enumeration.

Fixes: bbf10cd686835d5 ("PCI: pciehp: Ignore belated Presence Detect Changed caused by DPC")
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/pci/hotplug/pciehp_hpc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index bcc51b26d03d5..f27ff20a3c34c 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -946,6 +946,7 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
 
 	down_write_nested(&ctrl->reset_lock, ctrl->depth);
 
+	synchronize_irq(ctrl->pcie->irq);
 	pci_hp_ignore_link_change(pdev);
 
 	rc = pci_bridge_secondary_bus_reset(ctrl->pcie->port);
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] pciehp: sync interrupts for bus resets
  2025-08-27 22:45 [PATCH] pciehp: sync interrupts for bus resets Keith Busch
@ 2025-08-27 22:48 ` Keith Busch
  2025-08-31 13:43 ` Lukas Wunner
  1 sibling, 0 replies; 6+ messages in thread
From: Keith Busch @ 2025-08-27 22:48 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-pci, Lukas Wunner

On Wed, Aug 27, 2025 at 03:45:14PM -0700, Keith Busch wrote:
> 
> Fixes: bbf10cd686835d5 ("PCI: pciehp: Ignore belated Presence Detect Changed caused by DPC")

Ugh, I taged the wrong one. Should be:

Fixes: 2af781a9edc4ef ("PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset")

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pciehp: sync interrupts for bus resets
  2025-08-27 22:45 [PATCH] pciehp: sync interrupts for bus resets Keith Busch
  2025-08-27 22:48 ` Keith Busch
@ 2025-08-31 13:43 ` Lukas Wunner
  2025-09-02 17:59   ` Keith Busch
  1 sibling, 1 reply; 6+ messages in thread
From: Lukas Wunner @ 2025-08-31 13:43 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-pci, Keith Busch

On Wed, Aug 27, 2025 at 03:45:14PM -0700, Keith Busch wrote:
> Synchronize the interrupt to ensure the reset isn't going to disrupt a
> previously pending handler from igoring the reset's link flap. Back to
> back secondary bus resets create a window when the previous reset
> proceeds with DLLLA, waking the pending pciehp interrupt thread, but the
> subsequent reset tears it down while the irq thread tries to confirm the
> link is active, triggering unexpected re-enumeration.

Help me understand this:

I think what you mean is that pciehp_reset_slot() runs and the
Secondary Bus Reset causes a spurious link change. So pciehp_ist() runs,
waits for the reset to finish with pci_hp_spurious_link_change(),
then calls pciehp_ignore_link_change(), which tests whether the link
is active again by calling pciehp_check_link_active().

And you're saying that at the same time, pciehp_reset_slot() runs,
performs a Secondary Bus Reset, thus brings down the link,
confusing the concurrent pciehp_check_link_active().
Did I understand that correctly?

I don't quite see how this can happen, given pciehp_reset_slot()
acquires ctrl->reset_lock for writing and the same lock is held
for reading for the call to pciehp_check_link_active().

Moreover pciehp_ist() ignores the link flap in the first iteration
(it clears the flags in the local variable "events") and if
pciehp_check_link_active() would indeed fail, then the bits would
only be set in ctrl->pending_events and a rerun of pciehp_ist()
would be triggered.  That second iteration of pciehp_ist() would
then find the PCI_LINK_CHANGED flag set and ignore the link change
(again).

So this should all work fine.

> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -946,6 +946,7 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
>  
>  	down_write_nested(&ctrl->reset_lock, ctrl->depth);
>  
> +	synchronize_irq(ctrl->pcie->irq);
>  	pci_hp_ignore_link_change(pdev);
>  
>  	rc = pci_bridge_secondary_bus_reset(ctrl->pcie->port);

This will deadlock because it waits for pciehp_ist() to finish
but pciehp_ist() may wait for ctrl->reset_lock, which is held here.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pciehp: sync interrupts for bus resets
  2025-08-31 13:43 ` Lukas Wunner
@ 2025-09-02 17:59   ` Keith Busch
  2025-09-03  8:21     ` Lukas Wunner
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2025-09-02 17:59 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Keith Busch, linux-pci

On Sun, Aug 31, 2025 at 03:43:35PM +0200, Lukas Wunner wrote:
> On Wed, Aug 27, 2025 at 03:45:14PM -0700, Keith Busch wrote:
> > Synchronize the interrupt to ensure the reset isn't going to disrupt a
> > previously pending handler from igoring the reset's link flap. Back to
> > back secondary bus resets create a window when the previous reset
> > proceeds with DLLLA, waking the pending pciehp interrupt thread, but the
> > subsequent reset tears it down while the irq thread tries to confirm the
> > link is active, triggering unexpected re-enumeration.
> 
> Help me understand this:
> 
> I think what you mean is that pciehp_reset_slot() runs and the
> Secondary Bus Reset causes a spurious link change. So pciehp_ist() runs,
> waits for the reset to finish with pci_hp_spurious_link_change(),
> then calls pciehp_ignore_link_change(), which tests whether the link
> is active again by calling pciehp_check_link_active().
> 
> And you're saying that at the same time, pciehp_reset_slot() runs,
> performs a Secondary Bus Reset, thus brings down the link,
> confusing the concurrent pciehp_check_link_active().
> Did I understand that correctly?
> 
> I don't quite see how this can happen, given pciehp_reset_slot()
> acquires ctrl->reset_lock for writing and the same lock is held
> for reading for the call to pciehp_check_link_active().
> 
> Moreover pciehp_ist() ignores the link flap in the first iteration
> (it clears the flags in the local variable "events") and if
> pciehp_check_link_active() would indeed fail, then the bits would
> only be set in ctrl->pending_events and a rerun of pciehp_ist()
> would be triggered.  That second iteration of pciehp_ist() would
> then find the PCI_LINK_CHANGED flag set and ignore the link change
> (again).
> 
> So this should all work fine.

Hm, I think you're right. We are definitely seeing pciehp requeue itself
with the link/presence events that we want to be ignored, so we're
getting re-enumeration when we didn't expect it. I thought the
back-to-back resets that we're causing vfio to initiate was the problem,
but maybe not. I think the switch and/or end device we're using have
some unusual link timings that defeats the pciehp ignore logic.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pciehp: sync interrupts for bus resets
  2025-09-02 17:59   ` Keith Busch
@ 2025-09-03  8:21     ` Lukas Wunner
  2025-09-03 16:19       ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Lukas Wunner @ 2025-09-03  8:21 UTC (permalink / raw)
  To: Keith Busch; +Cc: Keith Busch, linux-pci

On Tue, Sep 02, 2025 at 11:59:11AM -0600, Keith Busch wrote:
> Hm, I think you're right. We are definitely seeing pciehp requeue itself
> with the link/presence events that we want to be ignored, so we're
> getting re-enumeration when we didn't expect it. I thought the
> back-to-back resets that we're causing vfio to initiate was the problem,
> but maybe not. I think the switch and/or end device we're using have
> some unusual link timings that defeats the pciehp ignore logic.

pci_bridge_secondary_bus_reset() calls pci_bridge_wait_for_secondary_bus()
to await Link Up.  So unless the link flaps afterwards, this should be
fine.

Another possibility is that the pciehp_device_replaced() check triggers,
e.g. because the Endpoint's Device Serial Number or other data in Config
Space changed after the second reset.

Maybe you can instrument the code with a few printk()'s to see what's
going on.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pciehp: sync interrupts for bus resets
  2025-09-03  8:21     ` Lukas Wunner
@ 2025-09-03 16:19       ` Keith Busch
  0 siblings, 0 replies; 6+ messages in thread
From: Keith Busch @ 2025-09-03 16:19 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Keith Busch, linux-pci

On Wed, Sep 03, 2025 at 10:21:34AM +0200, Lukas Wunner wrote:
> On Tue, Sep 02, 2025 at 11:59:11AM -0600, Keith Busch wrote:
> > Hm, I think you're right. We are definitely seeing pciehp requeue itself
> > with the link/presence events that we want to be ignored, so we're
> > getting re-enumeration when we didn't expect it. I thought the
> > back-to-back resets that we're causing vfio to initiate was the problem,
> > but maybe not. I think the switch and/or end device we're using have
> > some unusual link timings that defeats the pciehp ignore logic.
> 
> pci_bridge_secondary_bus_reset() calls pci_bridge_wait_for_secondary_bus()
> to await Link Up.  So unless the link flaps afterwards, this should be
> fine.
> 
> Another possibility is that the pciehp_device_replaced() check triggers,
> e.g. because the Endpoint's Device Serial Number or other data in Config
> Space changed after the second reset.

That can happen because we're using switches that insert a fake
"placeholder" device when a link is down.
 
> Maybe you can instrument the code with a few printk()'s to see what's
> going on.

But it looks like we're more frequently seeing the link not active.
Here's the existing messages printed:


[ 7904.749658] vfio-pci 0000:05:00.0: disabling bus mastering
[ 7904.756595] vfio-pci 0000:05:00.0: reset via bus
[ 7904.759975] pcieport 0000:02:02.0: waiting 100 ms for downstream link, after activation
[ 7905.908987] vfio-pci 0000:05:00.0: ready 0ms after bus reset
[ 7905.909003] pcieport 0000:02:02.0: pciehp: Slot(314): Link Down/Up ignored
[ 7906.847973] vfio-pci 0000:05:00.0: resetting
[ 7906.856312] vfio-pci 0000:05:00.0: reset via bus
[ 7906.862967] pcieport 0000:02:02.0: waiting 100 ms for downstream link, after activation
[ 7909.915925] pcieport 0000:02:02.0: Data Link Layer Link Active not set in 100 msec
[ 7909.915953] pcieport 0000:02:02.0: pciehp: Slot(314): Link Down/Up ignored
[ 7909.915977] pcieport 0000:02:02.0: pciehp: Slot(314): Link Down
[ 7909.915978] pcieport 0000:02:02.0: pciehp: Slot(314): Card not present
[ 7909.918934] pcieport 0000:02:02.0: waiting 100 ms for downstream link, after activation
[ 7911.923899] vfio-pci 0000:05:00.0: disconnected; not waiting
[ 7911.923905] vfio-pci 0000:05:00.0: bus failed with -25

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-09-03 16:19 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-27 22:45 [PATCH] pciehp: sync interrupts for bus resets Keith Busch
2025-08-27 22:48 ` Keith Busch
2025-08-31 13:43 ` Lukas Wunner
2025-09-02 17:59   ` Keith Busch
2025-09-03  8:21     ` Lukas Wunner
2025-09-03 16:19       ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).