public inbox for linux-pci@vger.kernel.org
From: <dan.j.williams@intel.com>
To: Keith Busch <kbusch@meta.com>, <linux-pci@vger.kernel.org>,
	<helgaas@kernel.org>
Cc: <alex@shazbot.org>, <lukas@wunner.de>, <dan.j.williams@intel.com>,
	<guojinhui.liam@bytedance.com>, <ilpo.jarvinen@linux.intel.com>,
	Keith Busch <kbusch@kernel.org>
Subject: Re: [PATCHv3 3/4] pci: remove slot specific lock/unlock and save/restore
Date: Tue, 10 Feb 2026 14:03:54 -0800
Message-ID: <698bab4a7cf16_2e57100bc@dwillia2-mobl4.notmuch>
In-Reply-To: <20260205212533.1512153-4-kbusch@meta.com>

Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> The Linux pci driver resolves a "slot" to the "D" in the B:D.f (see
> PCI_SLOT()). A pcie "slot reset" is a secondary bus reset, which affects

Maybe change "pci" and "pcie" to "pciehp" above to make it clear this problem is
specific to the native PCIe hotplug driver?

> every function on every "D", not just the ones with a matching "slot".
> The slot lock/unlock and save/restore functions, however, are only
> handling a subset of the functions, breaking the rest.
> 
> ARI devices with more than 8 functions fail because their state is not
> properly handled, nor is the attached driver notified of the reset. In
> the best case, the device will appear unresponsive to the driver,
> resulting in unexpected errors. A worse possibility may panic the kernel
> if in flight transactions trigger hardware reported errors like this
> real observation:
> 
>   vfio-pci 0000:01:00.0: resetting
>   vfio-pci 0000:01:00.0: reset done
>   {1}[Hardware Error]:  Error 1, type: fatal
>   {1}[Hardware Error]:   section_type: PCIe error
>   {1}[Hardware Error]:   port_type: 0, PCIe end point
>   {1}[Hardware Error]:   version: 0.2
>   {1}[Hardware Error]:   command: 0x0140, status: 0x0010
>   {1}[Hardware Error]:   device_id: 0000:01:01.0
>   {1}[Hardware Error]:   slot: 0
>   {1}[Hardware Error]:   secondary_bus: 0x00
>   {1}[Hardware Error]:   vendor_id: 0x1d9b, device_id: 0x0207
>   {1}[Hardware Error]:   class_code: 020000
>   {1}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
>   {1}[Hardware Error]:   aer_cor_status: 0x00008000, aer_cor_mask: 0x00002000
>   {1}[Hardware Error]:   aer_uncor_status: 0x00010000, aer_uncor_mask: 0x00100000
>   {1}[Hardware Error]:   aer_uncor_severity: 0x006f6030
>   {1}[Hardware Error]:   TLP Header: 0a412800 00192080 60000004 00000004
>   GHES: Fatal hardware error but panic disabled
>   Kernel panic - not syncing: GHES: Fatal hardware error
> 
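
For anyone following along, the 0000:01:01.0 in that log is consistent with
an ARI function number of 8 or above: with ARI the 8-bit devfn is one
function number, but PCI_SLOT() still splits it into "D" and "f". A
userspace sketch of that, not part of the patch. The macros mirror
include/uapi/linux/pci.h, and slot_filter_covers() is a hypothetical name
that only shows the shape of a per-"slot" match:

```c
#include <stdbool.h>

/* Same bit layout as PCI_SLOT()/PCI_FUNC() in include/uapi/linux/pci.h. */
#define PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)	((devfn) & 0x07)

/*
 * Hypothetical helper, not a kernel function: would a filter that only
 * matches PCI_SLOT(devfn) == slot_nr cover this ARI function number?
 */
static bool slot_filter_covers(unsigned int ari_fn, unsigned int slot_nr)
{
	return PCI_SLOT(ari_fn) == slot_nr;
}
```

Function numbers 0-7 decode to "D" 0, while 8 decodes to 01.0, so a filter
keyed on slot 0 skips it even though the secondary bus reset still hits it.
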
> Fix this by properly locking and notifying the entire affected bus
> topology, not just specific matching slots. For architectures that
> support "slot" specific resets, this patch potentially introduces an
> insignificant amount of overhead, but is otherwise harmless.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/pci/pci.c | 147 ++++------------------------------------------
>  1 file changed, 11 insertions(+), 136 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e00af20ea7376..df9ed73dad416 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
[..]
> @@ -5489,25 +5363,26 @@ EXPORT_SYMBOL_GPL(pci_probe_reset_slot);
>   * wrap the bus reset to avoid spurious slot related events such as hotplug.
>   * Generally a slot reset should be attempted before a bus reset.  All of the
>   * function of the slot and any subordinate buses behind the slot are reset
> - * through this function.  PCI config space of all devices in the slot and
> - * behind the slot is saved before and restored after reset.
> + * through this function.  PCI config space of all devices below the slot bus
> + * are saved before and restored after reset.
>   *
>   * Same as above except return -EAGAIN if the slot cannot be locked
>   */
>  static int pci_try_reset_slot(struct pci_slot *slot)
>  {
> +	struct pci_bus *bus = slot->bus;

Might ->bus be NULL here?
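
If it can be, something of the following shape would cover it. This is only
a standalone sketch with stand-in types; slot_bus_or_err() is a hypothetical
name and the -ENOTTY return value is an assumption, not something taken from
the patch:

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in types for illustration; the real ones live in <linux/pci.h>. */
struct pci_bus { int number; };
struct pci_slot { struct pci_bus *bus; };

/*
 * Hypothetical guard: hand back the slot's bus, or fail early when either
 * the slot or its bus pointer is missing. The error code is an assumption.
 */
static int slot_bus_or_err(const struct pci_slot *slot, struct pci_bus **bus)
{
	if (!slot || !slot->bus)
		return -ENOTTY;

	*bus = slot->bus;
	return 0;
}
```

If ->bus is guaranteed to be set for the lifetime of the slot, then a comment
saying so would be enough.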

With that clarified / handled you can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

