public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Keith Busch <kbusch@meta.com>
Cc: linux-pci@vger.kernel.org, alex@shazbot.org, lukas@wunner.de,
	dan.j.williams@intel.com, guojinhui.liam@bytedance.com,
	ilpo.jarvinen@linux.intel.com, Keith Busch <kbusch@kernel.org>
Subject: Re: [PATCH] pci: remove slot specific lock/unlock and save/restore
Date: Thu, 29 Jan 2026 14:27:50 -0600	[thread overview]
Message-ID: <20260129202750.GA476192@bhelgaas> (raw)
In-Reply-To: <20260129160955.2637158-1-kbusch@meta.com>

On Thu, Jan 29, 2026 at 08:09:55AM -0800, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> The Linux pci driver resolves a "slot" to the "D" in the B:D.f (see
> PCI_SLOT()). A pcie "slot reset" is a secondary bus reset, which affects
> every function on every "D", not just the ones with a matching "slot".
> The slot lock/unlock and save/restore functions, however, are only
> handling a subset of the functions, breaking the rest.
> 
> ARI devices with more than 8 functions fail because their state is not
> properly handled, nor is the attached driver notified of the reset. In
> the best case, the device will appear unresponsive to the driver,
> resulting in unexpected errors. A worse possiblility may panic the
> kernel if in flight transactions trigger hardware reported errors like
> this real observation:
> 
>   vfio-pci 0000:01:00.0: resetting
>   vfio-pci 0000:01:00.0: reset done
>   {1}[Hardware Error]:  Error 1, type: fatal
>   {1}[Hardware Error]:   section_type: PCIe error
>   {1}[Hardware Error]:   port_type: 0, PCIe end point
>   {1}[Hardware Error]:   version: 0.2
>   {1}[Hardware Error]:   command: 0x0140, status: 0x0010
>   {1}[Hardware Error]:   device_id: 0000:01:01.0
>   {1}[Hardware Error]:   slot: 0
>   {1}[Hardware Error]:   secondary_bus: 0x00
>   {1}[Hardware Error]:   vendor_id: 0x1d9b, device_id: 0x0207
>   {1}[Hardware Error]:   class_code: 020000
>   {1}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
>   {1}[Hardware Error]:   aer_cor_status: 0x00008000, aer_cor_mask: 0x00002000
>   {1}[Hardware Error]:   aer_uncor_status: 0x00010000, aer_uncor_mask: 0x00100000
>   {1}[Hardware Error]:   aer_uncor_severity: 0x006f6030
>   {1}[Hardware Error]:   TLP Header: 0a412800 00192080 60000004 00000004
>   GHES: Fatal hardware error but panic disabled
>   Kernel panic - not syncing: GHES: Fatal hardware error
> 
> Fix this by properly locking and notifying the entire affected bus
> topology, not just specific matching slots. For architectures that
> support "slot" specific resets, this patch potentially introduces an
> insignificant amount of overhead, but is otherwise harmless.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---

This seems reasonable, but is a much larger change than just fixing
the slot locking, so it would really be nice to get some review
besides just me.

> This patch is based on v6.19-rc1 as requested. Note, there will be a
> minor merge conflict if this other patch fixing the "trylock" handling
> is already included:
> 
>   https://lore.kernel.org/linux-pci/20260128225344.GA422463@bhelgaas/

The patch below removes pci_slot_trylock() completely, which renders
Jinhui's fix to that function moot, so if we apply this one, there
would be no reason to keep Jinhui's patch.

An alternate path would be to:

  - keep Jinhui's patch that fixes the pci_slot_trylock() error path
    (same as your 1/1 patch)

  - apply your 2/2 patch [1] to fix pci_slot_lock(),
    pci_slot_unlock(), and pci_slot_trylock() to lock the bridge
    (after accounting for the mistaken merge)

  - finally, rebase *this* patch on top to remove the pci_slot_lock(),
    pci_slot_unlock(), and pci_slot_trylock() that have just been
    fixed

The advantage would be that if there's any issue or objection to
completely removing pci_slot_lock() et al, we could drop or defer that
while still fixing the locking issues.

[1] https://lore.kernel.org/r/20260116184150.3013258-2-kbusch@meta.com

> But I don't see it applied in any the pci git tree.
> 
>  drivers/pci/pci.c | 113 ++++------------------------------------------
>  1 file changed, 9 insertions(+), 104 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 13dbb405dc31f..8c19939d633a0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5307,67 +5307,6 @@ static bool pci_slot_resettable(struct pci_slot *slot)
>  	return true;
>  }
>  
> -/* Lock devices from the top of the tree down */
> -static void pci_slot_lock(struct pci_slot *slot)
> -{
> -	struct pci_dev *dev;
> -
> -	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> -		if (!dev->slot || dev->slot != slot)
> -			continue;
> -		if (dev->subordinate)
> -			pci_bus_lock(dev->subordinate);
> -		else
> -			pci_dev_lock(dev);
> -	}
> -}
> -
> -/* Unlock devices from the bottom of the tree up */
> -static void pci_slot_unlock(struct pci_slot *slot)
> -{
> -	struct pci_dev *dev;
> -
> -	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> -		if (!dev->slot || dev->slot != slot)
> -			continue;
> -		if (dev->subordinate)
> -			pci_bus_unlock(dev->subordinate);
> -		else
> -			pci_dev_unlock(dev);
> -	}
> -}
> -
> -/* Return 1 on successful lock, 0 on contention */
> -static int pci_slot_trylock(struct pci_slot *slot)
> -{
> -	struct pci_dev *dev;
> -
> -	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> -		if (!dev->slot || dev->slot != slot)
> -			continue;
> -		if (dev->subordinate) {
> -			if (!pci_bus_trylock(dev->subordinate)) {
> -				pci_dev_unlock(dev);
> -				goto unlock;
> -			}
> -		} else if (!pci_dev_trylock(dev))
> -			goto unlock;
> -	}
> -	return 1;
> -
> -unlock:
> -	list_for_each_entry_continue_reverse(dev,
> -					     &slot->bus->devices, bus_list) {
> -		if (!dev->slot || dev->slot != slot)
> -			continue;
> -		if (dev->subordinate)
> -			pci_bus_unlock(dev->subordinate);
> -		else
> -			pci_dev_unlock(dev);
> -	}
> -	return 0;
> -}
> -
>  /*
>   * Save and disable devices from the top of the tree down while holding
>   * the @dev mutex lock for the entire tree.
> @@ -5401,59 +5340,24 @@ static void pci_bus_restore_locked(struct pci_bus *bus)
>  	}
>  }
>  
> -/*
> - * Save and disable devices from the top of the tree down while holding
> - * the @dev mutex lock for the entire tree.
> - */
> -static void pci_slot_save_and_disable_locked(struct pci_slot *slot)
> -{
> -	struct pci_dev *dev;
> -
> -	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> -		if (!dev->slot || dev->slot != slot)
> -			continue;
> -		pci_dev_save_and_disable(dev);
> -		if (dev->subordinate)
> -			pci_bus_save_and_disable_locked(dev->subordinate);
> -	}
> -}
> -
> -/*
> - * Restore devices from top of the tree down while holding @dev mutex lock
> - * for the entire tree.  Parent bridges need to be restored before we can
> - * get to subordinate devices.
> - */
> -static void pci_slot_restore_locked(struct pci_slot *slot)
> -{
> -	struct pci_dev *dev;
> -
> -	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> -		if (!dev->slot || dev->slot != slot)
> -			continue;
> -		pci_dev_restore(dev);
> -		if (dev->subordinate) {
> -			pci_bridge_wait_for_secondary_bus(dev, "slot reset");
> -			pci_bus_restore_locked(dev->subordinate);
> -		}
> -	}
> -}
> -
>  static int pci_slot_reset(struct pci_slot *slot, bool probe)
>  {
> +	struct pci_bus *bus;
>  	int rc;
>  
>  	if (!slot || !pci_slot_resettable(slot))
>  		return -ENOTTY;
>  
> +	bus = slot->bus;
>  	if (!probe)
> -		pci_slot_lock(slot);
> +		pci_bus_lock(bus);
>  
>  	might_sleep();
>  
>  	rc = pci_reset_hotplug_slot(slot->hotplug, probe);
>  
>  	if (!probe)
> -		pci_slot_unlock(slot);
> +		pci_bus_unlock(bus);
>  
>  	return rc;
>  }
> @@ -5487,18 +5391,19 @@ EXPORT_SYMBOL_GPL(pci_probe_reset_slot);
>   */
>  static int __pci_reset_slot(struct pci_slot *slot)
>  {
> +	struct pci_bus *bus = slot->bus;
>  	int rc;
>  
>  	rc = pci_slot_reset(slot, PCI_RESET_PROBE);
>  	if (rc)
>  		return rc;
>  
> -	if (pci_slot_trylock(slot)) {
> -		pci_slot_save_and_disable_locked(slot);
> +	if (pci_bus_trylock(bus)) {
> +		pci_bus_save_and_disable_locked(bus);
>  		might_sleep();
>  		rc = pci_reset_hotplug_slot(slot->hotplug, PCI_RESET_DO_RESET);
> -		pci_slot_restore_locked(slot);
> -		pci_slot_unlock(slot);
> +		pci_bus_restore_locked(bus);
> +		pci_bus_unlock(bus);
>  	} else
>  		rc = -EAGAIN;
>  
> -- 
> 2.47.3
> 

  reply	other threads:[~2026-01-29 20:27 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-29 16:09 [PATCH] pci: remove slot specific lock/unlock and save/restore Keith Busch
2026-01-29 20:27 ` Bjorn Helgaas [this message]
2026-01-29 21:16   ` Keith Busch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260129202750.GA476192@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=alex@shazbot.org \
    --cc=dan.j.williams@intel.com \
    --cc=guojinhui.liam@bytedance.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=kbusch@kernel.org \
    --cc=kbusch@meta.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox