Linux PCI subsystem development
 help / color / mirror / Atom feed
* [PATCH] PCI: pciehp: Avoid unnecessary device replacement check
@ 2025-03-11  6:27 Lukas Wunner
  2025-03-12 15:55 ` Mika Westerberg
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Lukas Wunner @ 2025-03-11  6:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Kenneth Crudup, Chia-Lin Kao (AceLan), Mika Westerberg,
	Ricky Wu

Hot-removal of nested PCI hotplug ports suffers from a long-standing
race condition which can lead to a deadlock:  A parent hotplug port
acquires pci_lock_rescan_remove(), then waits for pciehp to unbind
from a child hotplug port.  Meanwhile that child hotplug port tries to
acquire pci_lock_rescan_remove() as well in order to remove its own
children.

The deadlock only occurs if the parent acquires pci_lock_rescan_remove()
first, not if the child happens to acquire it first.

Several workarounds to avoid the issue have been proposed and discarded
over the years, e.g.:

https://lore.kernel.org/r/4c882e25194ba8282b78fe963fec8faae7cf23eb.1529173804.git.lukas@wunner.de/

A proper fix is being worked on, but needs more time as it is nontrivial
and necessarily intrusive.

Recent commit 9d573d19547b ("PCI: pciehp: Detect device replacement
during system sleep") provokes more frequent occurrence of the deadlock
when removing more than one Thunderbolt device during system sleep.
The commit sought to detect device replacement, but also triggered on
device removal.  Differentiating reliably between replacement and
removal is impossible because pci_get_dsn() returns 0 both if the device
was removed, as well as if it was replaced with one lacking a Device
Serial Number.

Avoid the more frequent occurrence of the deadlock by checking whether
the hotplug port itself was hot-removed.  If so, there's no sense in
checking whether its child device was replaced.

This works because the ->resume_noirq() callback is invoked in top-down
order for the entire hierarchy:  A parent hotplug port detecting device
replacement (or removal) marks all children as removed using
pci_dev_set_disconnected() and a child hotplug port can then reliably
detect being removed.

Fixes: 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep")
Reported-by: Kenneth Crudup <kenny@panix.com>
Closes: https://lore.kernel.org/r/83d9302a-f743-43e4-9de2-2dd66d91ab5b@panix.com/
Reported-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Closes: https://lore.kernel.org/r/20240926125909.2362244-1-acelan.kao@canonical.com/
Tested-by: Kenneth Crudup <kenny@panix.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v6.11+
---
 drivers/pci/hotplug/pciehp_core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ff458e6..997841c 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -286,9 +286,12 @@ static int pciehp_suspend(struct pcie_device *dev)
 
 static bool pciehp_device_replaced(struct controller *ctrl)
 {
-	struct pci_dev *pdev __free(pci_dev_put);
+	struct pci_dev *pdev __free(pci_dev_put) = NULL;
 	u32 reg;
 
+	if (pci_dev_is_disconnected(ctrl->pcie->port))
+		return false;
+
 	pdev = pci_get_slot(ctrl->pcie->port->subordinate, PCI_DEVFN(0, 0));
 	if (!pdev)
 		return true;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-03-14 14:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-11  6:27 [PATCH] PCI: pciehp: Avoid unnecessary device replacement check Lukas Wunner
2025-03-12 15:55 ` Mika Westerberg
2025-03-12 16:29 ` Bjorn Helgaas
2025-03-14 14:32 ` Sathyanarayanan Kuppuswamy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox