* [PATCHv5 0/3] pci slot reset handling fixes
@ 2026-02-17 16:08 Keith Busch
2026-02-17 16:08 ` [PATCHv5 1/3] PCI: rename __pci_bus_reset and __pci_slot_reset Keith Busch
` (4 more replies)
0 siblings, 5 replies; 8+ messages in thread
From: Keith Busch @ 2026-02-17 16:08 UTC (permalink / raw)
To: linux-pci, helgaas, dan.j.williams; +Cc: alex, ilpo.jarvinen, Keith Busch
From: Keith Busch <kbusch@kernel.org>
Previous version:
https://lore.kernel.org/linux-pci/20260212224112.1913980-1-kbusch@meta.com/
Changes from previous version:
* Fix subject line to match convenction (s/pci/PCI)
* Added example kernel panic logs for fatal hardware errors observed
without the slot handling fixes
* Rearraged some code in the last patch to avoid the need for a
forward declaration.
* Expanded on the comment explaing the difference between the
restore values for the bridge reset
Keith Busch (3):
PCI: rename __pci_bus_reset and __pci_slot_reset
PCI: allow all bus devices to use the same slot
PCI: make reset_subordinate hotplug safe
drivers/pci/hotplug/pciehp_core.c | 3 +-
drivers/pci/pci-sysfs.c | 3 +-
drivers/pci/pci.c | 97 ++++++++++++++++++++-----------
drivers/pci/pci.h | 2 +-
drivers/pci/slot.c | 27 +++++++--
include/linux/pci.h | 8 ++-
6 files changed, 97 insertions(+), 43 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCHv5 1/3] PCI: rename __pci_bus_reset and __pci_slot_reset
2026-02-17 16:08 [PATCHv5 0/3] pci slot reset handling fixes Keith Busch
@ 2026-02-17 16:08 ` Keith Busch
2026-02-17 16:08 ` [PATCHv5 2/3] PCI: allow all bus devices to use the same slot Keith Busch
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2026-02-17 16:08 UTC (permalink / raw)
To: linux-pci, helgaas, dan.j.williams; +Cc: alex, ilpo.jarvinen, Keith Busch
From: Keith Busch <kbusch@kernel.org>
Make the code a little easier to navigate with more descriptive
function names. The two renamed functions here "try" to do to a reset,
so make that clear in the name to distinguish them from other similarly
named functions.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/pci-sysfs.c | 2 +-
drivers/pci/pci.c | 10 +++++-----
drivers/pci/pci.h | 2 +-
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 363187ba4f56c..b44e884fd5372 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -563,7 +563,7 @@ static ssize_t reset_subordinate_store(struct device *dev,
return -EINVAL;
if (val) {
- int ret = __pci_reset_bus(bus);
+ int ret = pci_try_reset_bus(bus);
if (ret)
return ret;
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 06bb01b3bd6c4..3bd0a830872c4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5541,7 +5541,7 @@ int pci_probe_reset_slot(struct pci_slot *slot)
EXPORT_SYMBOL_GPL(pci_probe_reset_slot);
/**
- * __pci_reset_slot - Try to reset a PCI slot
+ * pci_try_reset_slot - Try to reset a PCI slot
* @slot: PCI slot to reset
*
* A PCI bus may host multiple slots, each slot may support a reset mechanism
@@ -5555,7 +5555,7 @@ EXPORT_SYMBOL_GPL(pci_probe_reset_slot);
*
* Same as above except return -EAGAIN if the slot cannot be locked
*/
-static int __pci_reset_slot(struct pci_slot *slot)
+static int pci_try_reset_slot(struct pci_slot *slot)
{
int rc;
@@ -5644,12 +5644,12 @@ int pci_probe_reset_bus(struct pci_bus *bus)
EXPORT_SYMBOL_GPL(pci_probe_reset_bus);
/**
- * __pci_reset_bus - Try to reset a PCI bus
+ * pci_try_reset_bus - Try to reset a PCI bus
* @bus: top level PCI bus to reset
*
* Same as above except return -EAGAIN if the bus cannot be locked
*/
-int __pci_reset_bus(struct pci_bus *bus)
+int pci_try_reset_bus(struct pci_bus *bus)
{
int rc;
@@ -5678,7 +5678,7 @@ int __pci_reset_bus(struct pci_bus *bus)
int pci_reset_bus(struct pci_dev *pdev)
{
return (!pci_probe_reset_slot(pdev->slot)) ?
- __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
+ pci_try_reset_slot(pdev->slot) : pci_try_reset_bus(pdev->bus);
}
EXPORT_SYMBOL_GPL(pci_reset_bus);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c8a0522e2e1f5..d1350d54b932d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -231,7 +231,7 @@ bool pci_reset_supported(struct pci_dev *dev);
void pci_init_reset_methods(struct pci_dev *dev);
int pci_bridge_secondary_bus_reset(struct pci_dev *dev);
int pci_bus_error_reset(struct pci_dev *dev);
-int __pci_reset_bus(struct pci_bus *bus);
+int pci_try_reset_bus(struct pci_bus *bus);
struct pci_cap_saved_data {
u16 cap_nr;
--
2.47.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCHv5 2/3] PCI: allow all bus devices to use the same slot
2026-02-17 16:08 [PATCHv5 0/3] pci slot reset handling fixes Keith Busch
2026-02-17 16:08 ` [PATCHv5 1/3] PCI: rename __pci_bus_reset and __pci_slot_reset Keith Busch
@ 2026-02-17 16:08 ` Keith Busch
2026-03-09 20:50 ` Bjorn Helgaas
2026-02-17 16:08 ` [PATCHv5 3/3] PCI: make reset_subordinate hotplug safe Keith Busch
` (2 subsequent siblings)
4 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2026-02-17 16:08 UTC (permalink / raw)
To: linux-pci, helgaas, dan.j.williams; +Cc: alex, ilpo.jarvinen, Keith Busch
From: Keith Busch <kbusch@kernel.org>
A pcie hotplug slot applies to the entire subordinate bus. Thus, pciehp
only allocates a single hotplug_slot for the bridge to that bus. The pci
slot, though, would only match to functions on device 0, meaning all
device beyond that are not matched to any slot even though they share
it. A slot reset will break all the missing devices because the handling
skips them.
For example, ARI devices with more than 8 functions fail because their state is
not properly handled, nor is the attached driver notified of the reset. In the
best case, the device will appear unresponsive to the driver, resulting in
unexpected errors. A worse possibility may panic the kernel if in flight
transactions trigger hardware reported errors like this real observation:
vfio-pci 0000:01:00.0: resetting
vfio-pci 0000:01:00.0: reset done
{1}[Hardware Error]: Error 1, type: fatal
{1}[Hardware Error]: section_type: PCIe error
{1}[Hardware Error]: port_type: 0, PCIe end point
{1}[Hardware Error]: version: 0.2
{1}[Hardware Error]: command: 0x0140, status: 0x0010
{1}[Hardware Error]: device_id: 0000:01:01.0
{1}[Hardware Error]: slot: 0
{1}[Hardware Error]: secondary_bus: 0x00
{1}[Hardware Error]: vendor_id: 0x1d9b, device_id: 0x0207
{1}[Hardware Error]: class_code: 020000
{1}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0000
{1}[Hardware Error]: aer_cor_status: 0x00008000, aer_cor_mask: 0x00002000
{1}[Hardware Error]: aer_uncor_status: 0x00010000, aer_uncor_mask: 0x00100000
{1}[Hardware Error]: aer_uncor_severity: 0x006f6030
{1}[Hardware Error]: TLP Header: 0a412800 00192080 60000004 00000004
GHES: Fatal hardware error but panic disabled
Kernel panic - not syncing: GHES: Fatal hardware error
Allow a slot to be created to claim all devices on a bus, not just a
matching device. This is done by introducing a sentinel value, named
PCI_SLOT_ALL_DEVICES, which then has the pci slot match to any device on
the bus. This fixes slot resets for pciehp.
Since 0xff already has special meaning, the chosen value for this new
feature is 0xfe. This will not clash with any actual slot number since
they are limited to 5 bits.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/hotplug/pciehp_core.c | 3 ++-
drivers/pci/slot.c | 27 +++++++++++++++++++++++----
include/linux/pci.h | 8 +++++++-
3 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index f59baa9129709..d80346d567049 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -79,7 +79,8 @@ static int init_slot(struct controller *ctrl)
snprintf(name, SLOT_NAME_SIZE, "%u", PSN(ctrl));
retval = pci_hp_initialize(&ctrl->hotplug_slot,
- ctrl->pcie->port->subordinate, 0, name);
+ ctrl->pcie->port->subordinate,
+ PCI_SLOT_ALL_DEVICES, name);
if (retval) {
ctrl_err(ctrl, "pci_hp_initialize failed: error %d\n", retval);
kfree(ops);
diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index 50fb3eb595fe6..bf6f265454a84 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -42,6 +42,15 @@ static ssize_t address_read_file(struct pci_slot *slot, char *buf)
pci_domain_nr(slot->bus),
slot->bus->number);
+ /*
+ * Preserve legacy ABI expectations that hotplug drivers that manage
+ * multiple devices per slot emit 0 for the device number.
+ */
+ if (slot->number == PCI_SLOT_ALL_DEVICES)
+ return sysfs_emit(buf, "%04x:%02x:00\n",
+ pci_domain_nr(slot->bus),
+ slot->bus->number);
+
return sysfs_emit(buf, "%04x:%02x:%02x\n",
pci_domain_nr(slot->bus),
slot->bus->number,
@@ -73,7 +82,8 @@ static void pci_slot_release(struct kobject *kobj)
down_read(&pci_bus_sem);
list_for_each_entry(dev, &slot->bus->devices, bus_list)
- if (PCI_SLOT(dev->devfn) == slot->number)
+ if (slot->number == PCI_SLOT_ALL_DEVICES ||
+ PCI_SLOT(dev->devfn) == slot->number)
dev->slot = NULL;
up_read(&pci_bus_sem);
@@ -166,7 +176,8 @@ void pci_dev_assign_slot(struct pci_dev *dev)
mutex_lock(&pci_slot_mutex);
list_for_each_entry(slot, &dev->bus->slots, list)
- if (PCI_SLOT(dev->devfn) == slot->number)
+ if (slot->number == PCI_SLOT_ALL_DEVICES ||
+ PCI_SLOT(dev->devfn) == slot->number)
dev->slot = slot;
mutex_unlock(&pci_slot_mutex);
}
@@ -188,7 +199,7 @@ static struct pci_slot *get_slot(struct pci_bus *parent, int slot_nr)
/**
* pci_create_slot - create or increment refcount for physical PCI slot
* @parent: struct pci_bus of parent bridge
- * @slot_nr: PCI_SLOT(pci_dev->devfn) or -1 for placeholder
+ * @slot_nr: PCI_SLOT(pci_dev->devfn), -1 for placeholder, or PCI_SLOT_ALL_DEVICES
* @name: user visible string presented in /sys/bus/pci/slots/<name>
* @hotplug: set if caller is hotplug driver, NULL otherwise
*
@@ -222,6 +233,13 @@ static struct pci_slot *get_slot(struct pci_bus *parent, int slot_nr)
* consist solely of a dddd:bb tuple, where dddd is the PCI domain of the
* %struct pci_bus and bb is the bus number. In other words, the devfn of
* the 'placeholder' slot will not be displayed.
+ *
+ * Bus-wide slots:
+ * For PCIe hotplug, the physical slot encompasses the entire subordinate
+ * bus, not just a single device number. Pass @slot_nr == PCI_SLOT_ALL_DEVICES
+ * to create a slot that matches all devices on the bus. Unlike placeholder
+ * slots, bus-wide slots go through normal slot lookup and reuse existing
+ * slots if present.
*/
struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
const char *name,
@@ -285,7 +303,8 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
down_read(&pci_bus_sem);
list_for_each_entry(dev, &parent->devices, bus_list)
- if (PCI_SLOT(dev->devfn) == slot_nr)
+ if (slot_nr == PCI_SLOT_ALL_DEVICES ||
+ PCI_SLOT(dev->devfn) == slot_nr)
dev->slot = slot;
up_read(&pci_bus_sem);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index edf792a79193f..7073519bcc1ad 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -72,12 +72,18 @@
/* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
#define PCI_BUS_NUM(x) (((x) >> 8) & 0xff)
+/*
+ * PCI_SLOT_ALL_DEVICES indicates a slot that covers all devices on the bus.
+ * Used for PCIe hotplug where the physical slot is the entire subordinate bus.
+ */
+#define PCI_SLOT_ALL_DEVICES 0xfe
+
/* pci_slot represents a physical slot */
struct pci_slot {
struct pci_bus *bus; /* Bus this slot is on */
struct list_head list; /* Node in list of slots */
struct hotplug_slot *hotplug; /* Hotplug info (move here) */
- unsigned char number; /* PCI_SLOT(pci_dev->devfn) */
+ unsigned char number; /* Device nr, or PCI_SLOT_ALL_DEVICES */
struct kobject kobj;
};
--
2.47.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCHv5 3/3] PCI: make reset_subordinate hotplug safe
2026-02-17 16:08 [PATCHv5 0/3] pci slot reset handling fixes Keith Busch
2026-02-17 16:08 ` [PATCHv5 1/3] PCI: rename __pci_bus_reset and __pci_slot_reset Keith Busch
2026-02-17 16:08 ` [PATCHv5 2/3] PCI: allow all bus devices to use the same slot Keith Busch
@ 2026-02-17 16:08 ` Keith Busch
2026-02-26 22:41 ` [PATCHv5 0/3] pci slot reset handling fixes Keith Busch
2026-03-05 23:17 ` Bjorn Helgaas
4 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2026-02-17 16:08 UTC (permalink / raw)
To: linux-pci, helgaas, dan.j.williams; +Cc: alex, ilpo.jarvinen, Keith Busch
From: Keith Busch <kbusch@kernel.org>
Use the slot reset method when resetting the bridge if the bus contains
hot plug slots. This fixes spurious hot plug events that are triggered
by the secondary bus reset that bypasses the slot's detection disabling.
Resetting a bridge's subordinate bus can be done like this:
# echo 1 > /sys/bus/pci/devices/0000:50:01.0/reset_subordinate
Prior to this patch, an example kernel message may show something like:
pcieport 0000:50:01.0: pciehp: Slot(40): Link Down
With this change, the pciehp driver ignores the link event during the
reset, so may show this message instead:
pcieport 0000:50:01.0: pciehp: Slot(40): Link Down/Up ignored
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/pci-sysfs.c | 3 +-
drivers/pci/pci.c | 91 +++++++++++++++++++++++++++--------------
drivers/pci/pci.h | 2 +-
3 files changed, 62 insertions(+), 34 deletions(-)
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index b44e884fd5372..6187b0f3e2833 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -553,7 +553,6 @@ static ssize_t reset_subordinate_store(struct device *dev,
const char *buf, size_t count)
{
struct pci_dev *pdev = to_pci_dev(dev);
- struct pci_bus *bus = pdev->subordinate;
unsigned long val;
if (!capable(CAP_SYS_ADMIN))
@@ -563,7 +562,7 @@ static ssize_t reset_subordinate_store(struct device *dev,
return -EINVAL;
if (val) {
- int ret = pci_try_reset_bus(bus);
+ int ret = pci_try_reset_bridge(pdev);
if (ret)
return ret;
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 3bd0a830872c4..747f35bad60a6 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5597,14 +5597,44 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe)
}
/**
- * pci_bus_error_reset - reset the bridge's subordinate bus
- * @bridge: The parent device that connects to the bus to reset
+ * pci_try_reset_bus - Try to reset a PCI bus
+ * @bus: top level PCI bus to reset
+ *
+ * Same as above except return -EAGAIN if the bus cannot be locked
+ */
+static int pci_try_reset_bus(struct pci_bus *bus)
+{
+ int rc;
+
+ rc = pci_bus_reset(bus, PCI_RESET_PROBE);
+ if (rc)
+ return rc;
+
+ if (pci_bus_trylock(bus)) {
+ pci_bus_save_and_disable_locked(bus);
+ might_sleep();
+ rc = pci_bridge_secondary_bus_reset(bus->self);
+ pci_bus_restore_locked(bus);
+ pci_bus_unlock(bus);
+ } else
+ rc = -EAGAIN;
+
+ return rc;
+}
+
+#define PCI_RESET_RESTORE true
+#define PCI_RESET_NO_RESTORE false
+/**
+ * pci_reset_bridge - reset a bridge's subordinate bus
+ * @bridge: bridge that connects to the bus to reset
+ * @restore: when true use a reset method that invokes pci_dev_restore() post
+ * reset for affected devices
*
* This function will first try to reset the slots on this bus if the method is
* available. If slot reset fails or is not available, this will fall back to a
* secondary bus reset.
*/
-int pci_bus_error_reset(struct pci_dev *bridge)
+static int pci_reset_bridge(struct pci_dev *bridge, bool restore)
{
struct pci_bus *bus = bridge->subordinate;
struct pci_slot *slot;
@@ -5620,17 +5650,42 @@ int pci_bus_error_reset(struct pci_dev *bridge)
if (pci_probe_reset_slot(slot))
goto bus_reset;
- list_for_each_entry(slot, &bus->slots, list)
- if (pci_slot_reset(slot, PCI_RESET_DO_RESET))
+ list_for_each_entry(slot, &bus->slots, list) {
+ int ret;
+
+ if (restore)
+ ret = pci_try_reset_slot(slot);
+ else
+ ret = pci_slot_reset(slot, PCI_RESET_DO_RESET);
+
+ if (ret)
goto bus_reset;
+ }
mutex_unlock(&pci_slot_mutex);
return 0;
bus_reset:
mutex_unlock(&pci_slot_mutex);
+
+ if (restore)
+ return pci_try_reset_bus(bus);
return pci_bus_reset(bridge->subordinate, PCI_RESET_DO_RESET);
}
+/**
+ * pci_bus_error_reset - reset the bridge's subordinate bus
+ * @bridge: The parent device that connects to the bus to reset
+ */
+int pci_bus_error_reset(struct pci_dev *bridge)
+{
+ return pci_reset_bridge(bridge, PCI_RESET_NO_RESTORE);
+}
+
+int pci_try_reset_bridge(struct pci_dev *bridge)
+{
+ return pci_reset_bridge(bridge, PCI_RESET_RESTORE);
+}
+
/**
* pci_probe_reset_bus - probe whether a PCI bus can be reset
* @bus: PCI bus to probe
@@ -5643,32 +5698,6 @@ int pci_probe_reset_bus(struct pci_bus *bus)
}
EXPORT_SYMBOL_GPL(pci_probe_reset_bus);
-/**
- * pci_try_reset_bus - Try to reset a PCI bus
- * @bus: top level PCI bus to reset
- *
- * Same as above except return -EAGAIN if the bus cannot be locked
- */
-int pci_try_reset_bus(struct pci_bus *bus)
-{
- int rc;
-
- rc = pci_bus_reset(bus, PCI_RESET_PROBE);
- if (rc)
- return rc;
-
- if (pci_bus_trylock(bus)) {
- pci_bus_save_and_disable_locked(bus);
- might_sleep();
- rc = pci_bridge_secondary_bus_reset(bus->self);
- pci_bus_restore_locked(bus);
- pci_bus_unlock(bus);
- } else
- rc = -EAGAIN;
-
- return rc;
-}
-
/**
* pci_reset_bus - Try to reset a PCI bus
* @pdev: top level PCI device to reset via slot/bus
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d1350d54b932d..9e363ad22e161 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -231,7 +231,7 @@ bool pci_reset_supported(struct pci_dev *dev);
void pci_init_reset_methods(struct pci_dev *dev);
int pci_bridge_secondary_bus_reset(struct pci_dev *dev);
int pci_bus_error_reset(struct pci_dev *dev);
-int pci_try_reset_bus(struct pci_bus *bus);
+int pci_try_reset_bridge(struct pci_dev *bridge);
struct pci_cap_saved_data {
u16 cap_nr;
--
2.47.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCHv5 0/3] pci slot reset handling fixes
2026-02-17 16:08 [PATCHv5 0/3] pci slot reset handling fixes Keith Busch
` (2 preceding siblings ...)
2026-02-17 16:08 ` [PATCHv5 3/3] PCI: make reset_subordinate hotplug safe Keith Busch
@ 2026-02-26 22:41 ` Keith Busch
2026-03-05 23:17 ` Bjorn Helgaas
4 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2026-02-26 22:41 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-pci, helgaas, dan.j.williams, alex, ilpo.jarvinen
Hi Bjorn,
It's pretty quiet on this version. Can we consider this one for
inclusion?
Thanks
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCHv5 0/3] pci slot reset handling fixes
2026-02-17 16:08 [PATCHv5 0/3] pci slot reset handling fixes Keith Busch
` (3 preceding siblings ...)
2026-02-26 22:41 ` [PATCHv5 0/3] pci slot reset handling fixes Keith Busch
@ 2026-03-05 23:17 ` Bjorn Helgaas
4 siblings, 0 replies; 8+ messages in thread
From: Bjorn Helgaas @ 2026-03-05 23:17 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-pci, dan.j.williams, alex, ilpo.jarvinen, Keith Busch
On Tue, Feb 17, 2026 at 08:08:33AM -0800, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> Previous version:
>
> https://lore.kernel.org/linux-pci/20260212224112.1913980-1-kbusch@meta.com/
>
> Changes from previous version:
>
> * Fix subject line to match convenction (s/pci/PCI)
>
> * Added example kernel panic logs for fatal hardware errors observed
> without the slot handling fixes
>
> * Rearraged some code in the last patch to avoid the need for a
> forward declaration.
>
> * Expanded on the comment explaing the difference between the
> restore values for the bridge reset
>
> Keith Busch (3):
> PCI: rename __pci_bus_reset and __pci_slot_reset
> PCI: allow all bus devices to use the same slot
> PCI: make reset_subordinate hotplug safe
>
> drivers/pci/hotplug/pciehp_core.c | 3 +-
> drivers/pci/pci-sysfs.c | 3 +-
> drivers/pci/pci.c | 97 ++++++++++++++++++++-----------
> drivers/pci/pci.h | 2 +-
> drivers/pci/slot.c | 27 +++++++--
> include/linux/pci.h | 8 ++-
> 6 files changed, 97 insertions(+), 43 deletions(-)
Applied to pci/reset for v7.1, thanks!
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCHv5 2/3] PCI: allow all bus devices to use the same slot
2026-02-17 16:08 ` [PATCHv5 2/3] PCI: allow all bus devices to use the same slot Keith Busch
@ 2026-03-09 20:50 ` Bjorn Helgaas
2026-03-11 19:35 ` Dan Williams
0 siblings, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2026-03-09 20:50 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-pci, dan.j.williams, alex, ilpo.jarvinen, Keith Busch
On Tue, Feb 17, 2026 at 08:08:35AM -0800, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> A pcie hotplug slot applies to the entire subordinate bus. Thus, pciehp
> only allocates a single hotplug_slot for the bridge to that bus. The pci
> slot, though, would only match to functions on device 0, meaning all
> device beyond that are not matched to any slot even though they share
> it. A slot reset will break all the missing devices because the handling
> skips them.
I tweaked this a bit to (a) refer to the "secondary" bus instead of
"subordinate" to avoid confusion with the PCI "subordinate" term
("pci_dev->subordinate" already has this confusion but is harder to
fix), and (b) mention ARI, which I think is what causes the problem
you're solving:
+ A PCIe hotplug slot applies to the entire secondary bus. Thus, pciehp only
+ allocates a single hotplug_slot for the bridge to that bus. The existing
+ PCI slot, though, would only match to functions on device 0, meaning any
+ devices beyond that, e.g., ARI functions, are not matched to any slot even
+ though they share it.
> For example, ARI devices with more than 8 functions fail because their state is
> not properly handled, nor is the attached driver notified of the reset. In the
> best case, the device will appear unresponsive to the driver, resulting in
> unexpected errors. A worse possibility may panic the kernel if in flight
> transactions trigger hardware reported errors like this real observation:
> @@ -222,6 +233,13 @@ static struct pci_slot *get_slot(struct pci_bus *parent, int slot_nr)
> * consist solely of a dddd:bb tuple, where dddd is the PCI domain of the
> * %struct pci_bus and bb is the bus number. In other words, the devfn of
> * the 'placeholder' slot will not be displayed.
> + *
> + * Bus-wide slots:
> + * For PCIe hotplug, the physical slot encompasses the entire subordinate
> + * bus, not just a single device number. Pass @slot_nr == PCI_SLOT_ALL_DEVICES
> + * to create a slot that matches all devices on the bus. Unlike placeholder
> + * slots, bus-wide slots go through normal slot lookup and reuse existing
> + * slots if present.
+ * For PCIe hotplug, the physical slot encompasses the entire secondary
+ * bus, not just a single device number. If the device supports ARI and ARI
+ * Forwarding is enabled in the upstream bridge, a multi-function device
+ * may include functions that appear to have several different device
+ * numbers, i.e., PCI_SLOT() values. Pass @slot_nr == PCI_SLOT_ALL_DEVICES
> */
> struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
> const char *name,
> +++ b/include/linux/pci.h
> @@ -72,12 +72,18 @@
> /* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
> #define PCI_BUS_NUM(x) (((x) >> 8) & 0xff)
>
> +/*
> + * PCI_SLOT_ALL_DEVICES indicates a slot that covers all devices on the bus.
> + * Used for PCIe hotplug where the physical slot is the entire subordinate bus.
+ * Used for PCIe hotplug where the physical slot is the entire secondary bus,
+ * and, if ARI Forwarding is enabled, functions may appear to be on multiple
+ * devices.
Let me know if I got any of this wrong.
Bjorn
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCHv5 2/3] PCI: allow all bus devices to use the same slot
2026-03-09 20:50 ` Bjorn Helgaas
@ 2026-03-11 19:35 ` Dan Williams
0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2026-03-11 19:35 UTC (permalink / raw)
To: Bjorn Helgaas, Keith Busch
Cc: linux-pci, dan.j.williams, alex, ilpo.jarvinen, Keith Busch
Bjorn Helgaas wrote:
> On Tue, Feb 17, 2026 at 08:08:35AM -0800, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> >
> > A pcie hotplug slot applies to the entire subordinate bus. Thus, pciehp
> > only allocates a single hotplug_slot for the bridge to that bus. The pci
> > slot, though, would only match to functions on device 0, meaning all
> > device beyond that are not matched to any slot even though they share
> > it. A slot reset will break all the missing devices because the handling
> > skips them.
>
> I tweaked this a bit to (a) refer to the "secondary" bus instead of
> "subordinate" to avoid confusion with the PCI "subordinate" term
> ("pci_dev->subordinate" already has this confusion but is harder to
> fix), and (b) mention ARI, which I think is what causes the problem
> you're solving:
[..]
> Let me know if I got any of this wrong.
Looks good to me and matches my understanding of what Keith is fixing.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-03-11 19:35 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-17 16:08 [PATCHv5 0/3] pci slot reset handling fixes Keith Busch
2026-02-17 16:08 ` [PATCHv5 1/3] PCI: rename __pci_bus_reset and __pci_slot_reset Keith Busch
2026-02-17 16:08 ` [PATCHv5 2/3] PCI: allow all bus devices to use the same slot Keith Busch
2026-03-09 20:50 ` Bjorn Helgaas
2026-03-11 19:35 ` Dan Williams
2026-02-17 16:08 ` [PATCHv5 3/3] PCI: make reset_subordinate hotplug safe Keith Busch
2026-02-26 22:41 ` [PATCHv5 0/3] pci slot reset handling fixes Keith Busch
2026-03-05 23:17 ` Bjorn Helgaas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox