public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] pci: remove slot specific lock/unlock and save/restore
@ 2026-01-29 16:09 Keith Busch
  2026-01-29 20:27 ` Bjorn Helgaas
  0 siblings, 1 reply; 3+ messages in thread
From: Keith Busch @ 2026-01-29 16:09 UTC (permalink / raw)
  To: linux-pci, helgaas
  Cc: alex, lukas, dan.j.williams, guojinhui.liam, ilpo.jarvinen,
	Keith Busch

From: Keith Busch <kbusch@kernel.org>

The Linux pci driver resolves a "slot" to the "D" in the B:D.f (see
PCI_SLOT()). A pcie "slot reset" is a secondary bus reset, which affects
every function on every "D", not just the ones with a matching "slot".
The slot lock/unlock and save/restore functions, however, are only
handling a subset of the functions, breaking the rest.

ARI devices with more than 8 functions fail because their state is not
properly handled, nor is the attached driver notified of the reset. In
the best case, the device will appear unresponsive to the driver,
resulting in unexpected errors. A worse possiblility may panic the
kernel if in flight transactions trigger hardware reported errors like
this real observation:

  vfio-pci 0000:01:00.0: resetting
  vfio-pci 0000:01:00.0: reset done
  {1}[Hardware Error]:  Error 1, type: fatal
  {1}[Hardware Error]:   section_type: PCIe error
  {1}[Hardware Error]:   port_type: 0, PCIe end point
  {1}[Hardware Error]:   version: 0.2
  {1}[Hardware Error]:   command: 0x0140, status: 0x0010
  {1}[Hardware Error]:   device_id: 0000:01:01.0
  {1}[Hardware Error]:   slot: 0
  {1}[Hardware Error]:   secondary_bus: 0x00
  {1}[Hardware Error]:   vendor_id: 0x1d9b, device_id: 0x0207
  {1}[Hardware Error]:   class_code: 020000
  {1}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
  {1}[Hardware Error]:   aer_cor_status: 0x00008000, aer_cor_mask: 0x00002000
  {1}[Hardware Error]:   aer_uncor_status: 0x00010000, aer_uncor_mask: 0x00100000
  {1}[Hardware Error]:   aer_uncor_severity: 0x006f6030
  {1}[Hardware Error]:   TLP Header: 0a412800 00192080 60000004 00000004
  GHES: Fatal hardware error but panic disabled
  Kernel panic - not syncing: GHES: Fatal hardware error

Fix this by properly locking and notifying the entire affected bus
topology, not just specific matching slots. For architectures that
support "slot" specific resets, this patch potentially introduces an
insignificant amount of overhead, but is otherwise harmless.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
This patch is based on v6.19-rc1 as requested. Note, there will be a
minor merge conflict if this other patch fixing the "trylock" handling
is already included:

  https://lore.kernel.org/linux-pci/20260128225344.GA422463@bhelgaas/

But I don't see it applied in any the pci git tree.

 drivers/pci/pci.c | 113 ++++------------------------------------------
 1 file changed, 9 insertions(+), 104 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 13dbb405dc31f..8c19939d633a0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5307,67 +5307,6 @@ static bool pci_slot_resettable(struct pci_slot *slot)
 	return true;
 }
 
-/* Lock devices from the top of the tree down */
-static void pci_slot_lock(struct pci_slot *slot)
-{
-	struct pci_dev *dev;
-
-	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
-		if (!dev->slot || dev->slot != slot)
-			continue;
-		if (dev->subordinate)
-			pci_bus_lock(dev->subordinate);
-		else
-			pci_dev_lock(dev);
-	}
-}
-
-/* Unlock devices from the bottom of the tree up */
-static void pci_slot_unlock(struct pci_slot *slot)
-{
-	struct pci_dev *dev;
-
-	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
-		if (!dev->slot || dev->slot != slot)
-			continue;
-		if (dev->subordinate)
-			pci_bus_unlock(dev->subordinate);
-		else
-			pci_dev_unlock(dev);
-	}
-}
-
-/* Return 1 on successful lock, 0 on contention */
-static int pci_slot_trylock(struct pci_slot *slot)
-{
-	struct pci_dev *dev;
-
-	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
-		if (!dev->slot || dev->slot != slot)
-			continue;
-		if (dev->subordinate) {
-			if (!pci_bus_trylock(dev->subordinate)) {
-				pci_dev_unlock(dev);
-				goto unlock;
-			}
-		} else if (!pci_dev_trylock(dev))
-			goto unlock;
-	}
-	return 1;
-
-unlock:
-	list_for_each_entry_continue_reverse(dev,
-					     &slot->bus->devices, bus_list) {
-		if (!dev->slot || dev->slot != slot)
-			continue;
-		if (dev->subordinate)
-			pci_bus_unlock(dev->subordinate);
-		else
-			pci_dev_unlock(dev);
-	}
-	return 0;
-}
-
 /*
  * Save and disable devices from the top of the tree down while holding
  * the @dev mutex lock for the entire tree.
@@ -5401,59 +5340,24 @@ static void pci_bus_restore_locked(struct pci_bus *bus)
 	}
 }
 
-/*
- * Save and disable devices from the top of the tree down while holding
- * the @dev mutex lock for the entire tree.
- */
-static void pci_slot_save_and_disable_locked(struct pci_slot *slot)
-{
-	struct pci_dev *dev;
-
-	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
-		if (!dev->slot || dev->slot != slot)
-			continue;
-		pci_dev_save_and_disable(dev);
-		if (dev->subordinate)
-			pci_bus_save_and_disable_locked(dev->subordinate);
-	}
-}
-
-/*
- * Restore devices from top of the tree down while holding @dev mutex lock
- * for the entire tree.  Parent bridges need to be restored before we can
- * get to subordinate devices.
- */
-static void pci_slot_restore_locked(struct pci_slot *slot)
-{
-	struct pci_dev *dev;
-
-	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
-		if (!dev->slot || dev->slot != slot)
-			continue;
-		pci_dev_restore(dev);
-		if (dev->subordinate) {
-			pci_bridge_wait_for_secondary_bus(dev, "slot reset");
-			pci_bus_restore_locked(dev->subordinate);
-		}
-	}
-}
-
 static int pci_slot_reset(struct pci_slot *slot, bool probe)
 {
+	struct pci_bus *bus;
 	int rc;
 
 	if (!slot || !pci_slot_resettable(slot))
 		return -ENOTTY;
 
+	bus = slot->bus;
 	if (!probe)
-		pci_slot_lock(slot);
+		pci_bus_lock(bus);
 
 	might_sleep();
 
 	rc = pci_reset_hotplug_slot(slot->hotplug, probe);
 
 	if (!probe)
-		pci_slot_unlock(slot);
+		pci_bus_unlock(bus);
 
 	return rc;
 }
@@ -5487,18 +5391,19 @@ EXPORT_SYMBOL_GPL(pci_probe_reset_slot);
  */
 static int __pci_reset_slot(struct pci_slot *slot)
 {
+	struct pci_bus *bus = slot->bus;
 	int rc;
 
 	rc = pci_slot_reset(slot, PCI_RESET_PROBE);
 	if (rc)
 		return rc;
 
-	if (pci_slot_trylock(slot)) {
-		pci_slot_save_and_disable_locked(slot);
+	if (pci_bus_trylock(bus)) {
+		pci_bus_save_and_disable_locked(bus);
 		might_sleep();
 		rc = pci_reset_hotplug_slot(slot->hotplug, PCI_RESET_DO_RESET);
-		pci_slot_restore_locked(slot);
-		pci_slot_unlock(slot);
+		pci_bus_restore_locked(bus);
+		pci_bus_unlock(bus);
 	} else
 		rc = -EAGAIN;
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] pci: remove slot specific lock/unlock and save/restore
  2026-01-29 16:09 [PATCH] pci: remove slot specific lock/unlock and save/restore Keith Busch
@ 2026-01-29 20:27 ` Bjorn Helgaas
  2026-01-29 21:16   ` Keith Busch
  0 siblings, 1 reply; 3+ messages in thread
From: Bjorn Helgaas @ 2026-01-29 20:27 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-pci, alex, lukas, dan.j.williams, guojinhui.liam,
	ilpo.jarvinen, Keith Busch

On Thu, Jan 29, 2026 at 08:09:55AM -0800, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> The Linux pci driver resolves a "slot" to the "D" in the B:D.f (see
> PCI_SLOT()). A pcie "slot reset" is a secondary bus reset, which affects
> every function on every "D", not just the ones with a matching "slot".
> The slot lock/unlock and save/restore functions, however, are only
> handling a subset of the functions, breaking the rest.
> 
> ARI devices with more than 8 functions fail because their state is not
> properly handled, nor is the attached driver notified of the reset. In
> the best case, the device will appear unresponsive to the driver,
> resulting in unexpected errors. A worse possiblility may panic the
> kernel if in flight transactions trigger hardware reported errors like
> this real observation:
> 
>   vfio-pci 0000:01:00.0: resetting
>   vfio-pci 0000:01:00.0: reset done
>   {1}[Hardware Error]:  Error 1, type: fatal
>   {1}[Hardware Error]:   section_type: PCIe error
>   {1}[Hardware Error]:   port_type: 0, PCIe end point
>   {1}[Hardware Error]:   version: 0.2
>   {1}[Hardware Error]:   command: 0x0140, status: 0x0010
>   {1}[Hardware Error]:   device_id: 0000:01:01.0
>   {1}[Hardware Error]:   slot: 0
>   {1}[Hardware Error]:   secondary_bus: 0x00
>   {1}[Hardware Error]:   vendor_id: 0x1d9b, device_id: 0x0207
>   {1}[Hardware Error]:   class_code: 020000
>   {1}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
>   {1}[Hardware Error]:   aer_cor_status: 0x00008000, aer_cor_mask: 0x00002000
>   {1}[Hardware Error]:   aer_uncor_status: 0x00010000, aer_uncor_mask: 0x00100000
>   {1}[Hardware Error]:   aer_uncor_severity: 0x006f6030
>   {1}[Hardware Error]:   TLP Header: 0a412800 00192080 60000004 00000004
>   GHES: Fatal hardware error but panic disabled
>   Kernel panic - not syncing: GHES: Fatal hardware error
> 
> Fix this by properly locking and notifying the entire affected bus
> topology, not just specific matching slots. For architectures that
> support "slot" specific resets, this patch potentially introduces an
> insignificant amount of overhead, but is otherwise harmless.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---

This seems reasonable, but is a much larger change than just fixing
the slot locking, so it would really be nice to get some review
besides just me.

> This patch is based on v6.19-rc1 as requested. Note, there will be a
> minor merge conflict if this other patch fixing the "trylock" handling
> is already included:
> 
>   https://lore.kernel.org/linux-pci/20260128225344.GA422463@bhelgaas/

The patch below removes pci_slot_trylock() completely, which renders
Jinhui's fix to that function moot, so if we apply this one, there
would be no reason to keep Jinhui's patch.

An alternate path would be to:

  - keep Jinhui's patch that fixes the pci_slot_trylock() error path
    (same as your 1/1 patch)

  - apply your 2/2 patch [1] to fix pci_slot_lock(),
    pci_slot_unlock(), and pci_slot_trylock() to lock the bridge
    (after accounting for the mistaken merge)

  - finally, rebase *this* patch on top to remove the pci_slot_lock(),
    pci_slot_unlock(), and pci_slot_trylock() that have just been
    fixed

The advantage would be that if there's any issue or objection to
completely removing pci_slot_lock() et al, we could drop or defer that
while still fixing the locking issues.

[1] https://lore.kernel.org/r/20260116184150.3013258-2-kbusch@meta.com

> But I don't see it applied in any the pci git tree.
> 
>  drivers/pci/pci.c | 113 ++++------------------------------------------
>  1 file changed, 9 insertions(+), 104 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 13dbb405dc31f..8c19939d633a0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5307,67 +5307,6 @@ static bool pci_slot_resettable(struct pci_slot *slot)
>  	return true;
>  }
>  
> -/* Lock devices from the top of the tree down */
> -static void pci_slot_lock(struct pci_slot *slot)
> -{
> -	struct pci_dev *dev;
> -
> -	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> -		if (!dev->slot || dev->slot != slot)
> -			continue;
> -		if (dev->subordinate)
> -			pci_bus_lock(dev->subordinate);
> -		else
> -			pci_dev_lock(dev);
> -	}
> -}
> -
> -/* Unlock devices from the bottom of the tree up */
> -static void pci_slot_unlock(struct pci_slot *slot)
> -{
> -	struct pci_dev *dev;
> -
> -	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> -		if (!dev->slot || dev->slot != slot)
> -			continue;
> -		if (dev->subordinate)
> -			pci_bus_unlock(dev->subordinate);
> -		else
> -			pci_dev_unlock(dev);
> -	}
> -}
> -
> -/* Return 1 on successful lock, 0 on contention */
> -static int pci_slot_trylock(struct pci_slot *slot)
> -{
> -	struct pci_dev *dev;
> -
> -	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> -		if (!dev->slot || dev->slot != slot)
> -			continue;
> -		if (dev->subordinate) {
> -			if (!pci_bus_trylock(dev->subordinate)) {
> -				pci_dev_unlock(dev);
> -				goto unlock;
> -			}
> -		} else if (!pci_dev_trylock(dev))
> -			goto unlock;
> -	}
> -	return 1;
> -
> -unlock:
> -	list_for_each_entry_continue_reverse(dev,
> -					     &slot->bus->devices, bus_list) {
> -		if (!dev->slot || dev->slot != slot)
> -			continue;
> -		if (dev->subordinate)
> -			pci_bus_unlock(dev->subordinate);
> -		else
> -			pci_dev_unlock(dev);
> -	}
> -	return 0;
> -}
> -
>  /*
>   * Save and disable devices from the top of the tree down while holding
>   * the @dev mutex lock for the entire tree.
> @@ -5401,59 +5340,24 @@ static void pci_bus_restore_locked(struct pci_bus *bus)
>  	}
>  }
>  
> -/*
> - * Save and disable devices from the top of the tree down while holding
> - * the @dev mutex lock for the entire tree.
> - */
> -static void pci_slot_save_and_disable_locked(struct pci_slot *slot)
> -{
> -	struct pci_dev *dev;
> -
> -	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> -		if (!dev->slot || dev->slot != slot)
> -			continue;
> -		pci_dev_save_and_disable(dev);
> -		if (dev->subordinate)
> -			pci_bus_save_and_disable_locked(dev->subordinate);
> -	}
> -}
> -
> -/*
> - * Restore devices from top of the tree down while holding @dev mutex lock
> - * for the entire tree.  Parent bridges need to be restored before we can
> - * get to subordinate devices.
> - */
> -static void pci_slot_restore_locked(struct pci_slot *slot)
> -{
> -	struct pci_dev *dev;
> -
> -	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> -		if (!dev->slot || dev->slot != slot)
> -			continue;
> -		pci_dev_restore(dev);
> -		if (dev->subordinate) {
> -			pci_bridge_wait_for_secondary_bus(dev, "slot reset");
> -			pci_bus_restore_locked(dev->subordinate);
> -		}
> -	}
> -}
> -
>  static int pci_slot_reset(struct pci_slot *slot, bool probe)
>  {
> +	struct pci_bus *bus;
>  	int rc;
>  
>  	if (!slot || !pci_slot_resettable(slot))
>  		return -ENOTTY;
>  
> +	bus = slot->bus;
>  	if (!probe)
> -		pci_slot_lock(slot);
> +		pci_bus_lock(bus);
>  
>  	might_sleep();
>  
>  	rc = pci_reset_hotplug_slot(slot->hotplug, probe);
>  
>  	if (!probe)
> -		pci_slot_unlock(slot);
> +		pci_bus_unlock(bus);
>  
>  	return rc;
>  }
> @@ -5487,18 +5391,19 @@ EXPORT_SYMBOL_GPL(pci_probe_reset_slot);
>   */
>  static int __pci_reset_slot(struct pci_slot *slot)
>  {
> +	struct pci_bus *bus = slot->bus;
>  	int rc;
>  
>  	rc = pci_slot_reset(slot, PCI_RESET_PROBE);
>  	if (rc)
>  		return rc;
>  
> -	if (pci_slot_trylock(slot)) {
> -		pci_slot_save_and_disable_locked(slot);
> +	if (pci_bus_trylock(bus)) {
> +		pci_bus_save_and_disable_locked(bus);
>  		might_sleep();
>  		rc = pci_reset_hotplug_slot(slot->hotplug, PCI_RESET_DO_RESET);
> -		pci_slot_restore_locked(slot);
> -		pci_slot_unlock(slot);
> +		pci_bus_restore_locked(bus);
> +		pci_bus_unlock(bus);
>  	} else
>  		rc = -EAGAIN;
>  
> -- 
> 2.47.3
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] pci: remove slot specific lock/unlock and save/restore
  2026-01-29 20:27 ` Bjorn Helgaas
@ 2026-01-29 21:16   ` Keith Busch
  0 siblings, 0 replies; 3+ messages in thread
From: Keith Busch @ 2026-01-29 21:16 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Keith Busch, linux-pci, alex, lukas, dan.j.williams,
	guojinhui.liam, ilpo.jarvinen

On Thu, Jan 29, 2026 at 02:27:50PM -0600, Bjorn Helgaas wrote:
> An alternate path would be to:
> 
>   - keep Jinhui's patch that fixes the pci_slot_trylock() error path
>     (same as your 1/1 patch)
> 
>   - apply your 2/2 patch [1] to fix pci_slot_lock(),
>     pci_slot_unlock(), and pci_slot_trylock() to lock the bridge
>     (after accounting for the mistaken merge)
> 
>   - finally, rebase *this* patch on top to remove the pci_slot_lock(),
>     pci_slot_unlock(), and pci_slot_trylock() that have just been
>     fixed
> 
> The advantage would be that if there's any issue or objection to
> completely removing pci_slot_lock() et al, we could drop or defer that
> while still fixing the locking issues.

Sounds good. Another bonus is we can be more cautious for stable
backports since the first two are clear isolated bug fixes. I'll prepare
a rebased version with the proposed sequence.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-01-29 21:16 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-29 16:09 [PATCH] pci: remove slot specific lock/unlock and save/restore Keith Busch
2026-01-29 20:27 ` Bjorn Helgaas
2026-01-29 21:16   ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox