* [PATCH AUTOSEL 7.0-5.15] PCI: Allow all bus devices to use the same slot
[not found] <20260420132314.1023554-1-sashal@kernel.org>
@ 2026-04-20 13:18 ` Sasha Levin
2026-04-20 13:18 ` [PATCH AUTOSEL 7.0-5.10] PCI: Avoid FLR for AMD NPU device Sasha Levin
` (3 subsequent siblings)
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2026-04-20 13:18 UTC (permalink / raw)
To: patches, stable
Cc: Keith Busch, Bjorn Helgaas, Dan Williams, Sasha Levin, linux-pci,
linux-kernel
From: Keith Busch <kbusch@kernel.org>
[ Upstream commit 102c8b26b54e363f85c4c86099ca049a0a76bb58 ]
A PCIe hotplug slot applies to the entire secondary bus. Thus, pciehp only
allocates a single hotplug_slot for the bridge to that bus. The existing
PCI slot, though, would only match to functions on device 0, meaning any
devices beyond that, e.g., ARI functions, are not matched to any slot even
though they share it. A slot reset will break all the missing devices
because the handling skips them.
For example, ARI devices with more than 8 functions fail because their
state is not properly handled, nor is the attached driver notified of the
reset. In the best case, the device will appear unresponsive to the driver,
resulting in unexpected errors. A worse possibility may panic the kernel if
in-flight transactions trigger hardware reported errors like this real
observation:
vfio-pci 0000:01:00.0: resetting
vfio-pci 0000:01:00.0: reset done
{1}[Hardware Error]: Error 1, type: fatal
{1}[Hardware Error]: section_type: PCIe error
{1}[Hardware Error]: port_type: 0, PCIe end point
{1}[Hardware Error]: version: 0.2
{1}[Hardware Error]: command: 0x0140, status: 0x0010
{1}[Hardware Error]: device_id: 0000:01:01.0
{1}[Hardware Error]: slot: 0
{1}[Hardware Error]: secondary_bus: 0x00
{1}[Hardware Error]: vendor_id: 0x1d9b, device_id: 0x0207
{1}[Hardware Error]: class_code: 020000
{1}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0000
{1}[Hardware Error]: aer_cor_status: 0x00008000, aer_cor_mask: 0x00002000
{1}[Hardware Error]: aer_uncor_status: 0x00010000, aer_uncor_mask: 0x00100000
{1}[Hardware Error]: aer_uncor_severity: 0x006f6030
{1}[Hardware Error]: TLP Header: 0a412800 00192080 60000004 00000004
GHES: Fatal hardware error but panic disabled
Kernel panic - not syncing: GHES: Fatal hardware error
Allow a slot to be created to claim all devices on a bus, not just a
matching device. This is done by introducing a sentinel value, named
PCI_SLOT_ALL_DEVICES, which then has the PCI slot match to any device on
the bus. This fixes slot resets for pciehp.
Since 0xff already has special meaning, the chosen value for this new
feature is 0xfe. This will not clash with any actual slot number since they
are limited to 5 bits.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20260217160836.2709885-3-kbusch@meta.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have a comprehensive understanding. Let me compile my final
analysis.
---
## PHASE 1: COMMIT MESSAGE FORENSICS
**Step 1.1: Subject Line**
- Subsystem: PCI (pciehp hotplug)
- Action verb: "Allow" - describes enabling a new behavior
- Summary: Enables PCIe hotplug slot to match ALL bus devices, not just
device 0
- Record: [PCI/pciehp] [Allow] [bus-wide slot matching for ARI device
reset handling]
**Step 1.2: Tags**
- Signed-off-by: Keith Busch <kbusch@kernel.org> (author, prolific PCI
contributor)
- Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> (PCI subsystem
maintainer)
- Reviewed-by: Dan Williams <dan.j.williams@intel.com> (Intel PCI/CXL
expert)
- Link:
https://patch.msgid.link/20260217160836.2709885-3-kbusch@meta.com
(patch 3 of a series)
- No Fixes: tag, no Cc: stable, no Reported-by - all expected for
autosel candidates
- Record: Strong author+reviewer pedigree. Accepted through PCI
maintainer tree.
**Step 1.3: Commit Body**
- Bug: pciehp allocates a single hotplug_slot for the bridge, but only
matches device 0. ARI devices with >8 functions have functions
appearing to be on different PCI_SLOT() values. These are not matched
to the slot.
- Symptom: Slot reset skips unmatched devices - drivers not notified,
state not saved/restored. This causes hardware errors, device
unresponsiveness, and **kernel panic** from fatal PCIe AER errors.
- Real observed failure: Hardware Error with `Kernel panic - not
syncing: GHES: Fatal hardware error` shown for a vfio-pci device at
0000:01:01.0 during reset of 0000:01:00.0.
- Record: Bug is kernel panic during slot reset of ARI devices. Root
cause is slot matching only covers device 0.
**Step 1.4: Hidden Bug Fix Detection**
- Despite the "Allow" verb (sounds feature-like), this fixes a concrete
kernel panic. The commit includes a full hardware error trace showing
the panic.
- Record: This IS a bug fix, clearly demonstrated by the panic trace.
## PHASE 2: DIFF ANALYSIS
**Step 2.1: Inventory**
- `include/linux/pci.h`: +8 lines (define + comment), 1 line changed
(struct comment)
- `drivers/pci/hotplug/pciehp_core.c`: 2 lines changed (0 ->
PCI_SLOT_ALL_DEVICES)
- `drivers/pci/slot.c`: +8 lines (new sysfs case), 3 conditionals
changed, ~12 lines doc updates
- Total: ~20-25 functional lines, rest documentation. 3 files changed.
- Functions modified: `init_slot()`, `address_read_file()`,
`pci_slot_release()`, `pci_dev_assign_slot()`, `pci_create_slot()`
- Record: Small-to-medium scope, well-contained to slot management code.
**Step 2.2: Code Flow Changes**
1. `pci.h`: Adds `PCI_SLOT_ALL_DEVICES 0xfe` sentinel constant
2. `pciehp_core.c init_slot()`: Changes slot number from `0` to
`PCI_SLOT_ALL_DEVICES`
3. `slot.c pci_slot_release()`: Adds `slot->number ==
PCI_SLOT_ALL_DEVICES ||` check - ensures ALL bus devices get
`dev->slot = NULL` on release
4. `slot.c pci_dev_assign_slot()`: Same pattern - ensures ALL bus
devices get `dev->slot = slot` during assignment
5. `slot.c pci_create_slot()`: Same pattern - ensures ALL devices on bus
get `dev->slot` at creation
6. `slot.c address_read_file()`: New sysfs case for PCI_SLOT_ALL_DEVICES
emitting `0` for device number (backward compatible)
**Step 2.3: Bug Mechanism**
- Category: Logic/correctness bug in slot matching
- What was broken: `PCI_SLOT(dev->devfn) == slot->number` only matches
device 0. ARI functions >=8 have `PCI_SLOT(devfn) != 0`.
- How the fix works: The sentinel `PCI_SLOT_ALL_DEVICES` makes all
comparisons match any device on the bus.
- Impact chain: `pci_dev_assign_slot()` skips ARI devices -> `dev->slot
== NULL` -> `pci_slot_lock/save/restore` skips them (checks
`!dev->slot || dev->slot != slot`) -> state not saved during reset ->
hardware errors -> kernel panic
**Step 2.4: Fix Quality**
- Obviously correct: Yes - the matching logic is straightforward
- Minimal: Mostly - introduces a new constant as mechanism, but the
functional changes are small
- Regression risk: Very low - the new code path only triggers when
`slot->number == PCI_SLOT_ALL_DEVICES`, which only pciehp sets
- Record: High quality fix, well-reviewed, low regression risk.
## PHASE 3: GIT HISTORY
**Step 3.1: Blame**
- `slot.c` line 76 (`PCI_SLOT(dev->devfn) == slot->number`): from commit
cef354db0d7a72 (2008)
- `slot.c` line 169 (pci_dev_assign_slot): from cef354db0d7a72 (2008)
- `pciehp_core.c` line 82 (passing `0`): from 774d446b0f9222 (2018),
originally from even earlier
- The buggy slot matching logic has been present since 2008 - it exists
in ALL stable trees.
- Record: Buggy code from 2008, present in all active stable trees.
**Step 3.2: No Fixes: tag to follow** - Expected for autosel candidates.
**Step 3.3: File History**
- Recent changes to slot.c include treewide allocator changes (non-
conflicting) and minor hotplug cleanups
- The pci_slot_lock() fix (1f5e57c622b4d) is already in 7.0, which was a
prerequisite from the same author
- Record: File has low recent churn. Prerequisite slot lock fix already
present.
**Step 3.4: Author Assessment**
- Keith Busch is a major PCI/NVMe subsystem contributor at Meta
- Has 30+ commits to `drivers/pci/` in this tree, focusing on error
recovery and reset handling
- Record: Author is a key subsystem expert.
**Step 3.5: Dependencies**
- This is patch 3 of a series (msgid ...-3). The v2 series had 4
patches. Patches 1+2 from v2 are in 7.0 (trylock fix + slot lock fix).
Patches 3+4 from v2 were NOT applied (they took a different approach:
removing slot-specific lock/unlock).
- The Feb 17 series (v3?) evolved the approach. This patch introduces
PCI_SLOT_ALL_DEVICES instead of removing slot-specific functions.
- This patch is self-contained: it only modifies slot matching in slot.c
and the init call in pciehp_core.c. The existing pci.c code
(`dev->slot != slot` checks) works automatically once `dev->slot` is
correctly assigned.
- Record: Self-contained, no additional dependencies beyond what's
already in 7.0.
## PHASE 4: MAILING LIST RESEARCH
**Step 4.1: Original Patch Discussion**
- b4 dig found the v2 series at
https://patch.msgid.link/20260130165953.751063-3-kbusch@meta.com
- The v2 thread shows this was a 4-patch series: trylock fix, slot lock
fix, remove slot-specific functions, make reset_subordinate hotplug
safe
- The "Allow all bus devices" commit (Feb 17) is from a later revision
that changed approach
- Dan Williams reviewed patches 3 and 4 of v2, providing feedback that
led to the evolved approach
- Record: Series went through multiple revisions with review feedback
incorporated.
**Step 4.2: Reviewers**
- Dan Williams (Intel, CXL/PCI expert) - Reviewed-by
- Bjorn Helgaas (PCI maintainer) - accepted and signed off
- Record: Key PCI maintainers reviewed and approved.
**Step 4.3: Bug Report**
- No external bug report link, but the commit message contains a real
panic trace from production hardware
- The error shows vfio-pci resetting 0000:01:00.0, then device
0000:01:01.0 (ARI function) triggering fatal hardware error
- Record: Real-world production failure documented in commit message.
**Step 4.4-4.5**: Lore site behind Anubis protection, but b4 dig
provided series information.
## PHASE 5: CODE SEMANTIC ANALYSIS
**Step 5.1: Key Functions**
Modified: `init_slot()`, `address_read_file()`, `pci_slot_release()`,
`pci_dev_assign_slot()`, `pci_create_slot()`
**Step 5.2: Callers**
- `pci_dev_assign_slot()` called from `pci_setup_device()` in probe.c -
called during EVERY device enumeration
- `pci_create_slot()` called from pciehp init and other hotplug drivers
- The reset path uses `dev->slot` pointer in: `pci_slot_resettable()`,
`pci_slot_lock()`, `pci_slot_unlock()`, `pci_slot_trylock()`,
`pci_slot_save_and_disable_locked()`, `pci_slot_restore_locked()`
- `pci_reset_bus()` is called from VFIO, error recovery, and other reset
paths
- Record: Highly reachable code paths. Reset triggered from userspace
via VFIO.
**Step 5.3-5.5**: The fix ensures all devices on a pciehp bus get
`dev->slot` assigned, which propagates to all existing slot iteration
functions without any changes to pci.c.
## PHASE 6: STABLE TREE ANALYSIS
**Step 6.1: Buggy Code in Stable**
- The slot matching code (`PCI_SLOT(dev->devfn) == slot->number`) exists
since 2008 - present in ALL active stable trees
- pciehp passing `0` as slot_nr exists since at least 2018 (commit
774d446b0f9222)
- Record: Bug exists in all stable trees from 5.x through 7.0.
**Step 6.2: Backport Complications**
- The 7.0 tree has some treewide refactoring (kmalloc -> kmalloc_obj)
that might cause minor context conflicts
- The core changes should apply cleanly with minor adjustments
- Record: Minor conflicts possible, but straightforward to resolve.
**Step 6.3: No related fixes already in stable for this specific
issue.**
## PHASE 7: SUBSYSTEM CONTEXT
**Step 7.1: Subsystem**
- PCI core / pciehp - CORE subsystem
- Affects all systems with PCIe hotplug and ARI-capable devices
- Common in: VFIO/SR-IOV virtualization, datacenter hardware
- Record: [PCI/hotplug] [CORE/IMPORTANT]
## PHASE 8: IMPACT AND RISK
**Step 8.1: Affected Users**
- Users with PCIe ARI devices (multi-function devices with >8 functions)
undergoing slot reset
- Common in VFIO/SR-IOV scenarios in datacenters and virtualization
- Record: Important user population (datacenter, virtualization)
**Step 8.2: Trigger Conditions**
- Triggered by slot reset (VFIO reset, error recovery, hotplug events)
- Can be triggered from userspace via VFIO
- The VFIO use case is common in production
- Record: Realistic trigger, reachable from userspace
**Step 8.3: Failure Mode**
- Kernel panic from fatal hardware error (GHES)
- Device unresponsiveness, unexpected driver errors
- Record: CRITICAL severity - kernel panic
**Step 8.4: Risk-Benefit**
- BENEFIT: Very high - prevents kernel panic in production VFIO/ARI
scenarios
- RISK: Low - changes are well-contained, only affect slots with
PCI_SLOT_ALL_DEVICES, backward compatible sysfs output
- Record: Strong benefit/risk ratio
## PHASE 9: FINAL SYNTHESIS
**Step 9.1: Evidence Summary**
FOR backporting:
- Fixes documented kernel panic with real hardware error trace
- Affects production VFIO/SR-IOV users with ARI devices
- Author is key PCI contributor, reviewed by Dan Williams, accepted by
Bjorn Helgaas
- Self-contained fix, no dependencies on other unmerged patches
- Low regression risk - new code path only for PCI_SLOT_ALL_DEVICES
slots
- Buggy code present since 2008 - affects all stable trees
- The sysfs change preserves backward compatibility
AGAINST backporting:
- Introduces new `PCI_SLOT_ALL_DEVICES` constant (new kernel-internal
API)
- Adds new semantics to `pci_create_slot()` (bus-wide slots)
- Part of a multi-patch series (patch 3), though self-contained
- Not purely surgical - includes documentation and sysfs behavior
changes
- No Fixes: tag (expected for autosel)
**Step 9.2: Stable Rules Checklist**
1. Obviously correct and tested? YES - reviewed by Dan Williams,
accepted by Bjorn Helgaas
2. Fixes a real bug? YES - kernel panic during ARI device slot reset
3. Important issue? YES - kernel panic (CRITICAL severity)
4. Small and contained? YES (borderline) - ~25 functional lines, 3
files, well-contained scope
5. No new features/APIs? BORDERLINE - `PCI_SLOT_ALL_DEVICES` is new but
is mechanism for bug fix, not a user-facing feature
6. Can apply to stable? YES with possible minor context adjustments
**Step 9.3: Exception Categories**
- This is a hardware workaround/fix category - fixing slot reset to
properly handle ARI devices
**Step 9.4: Decision**
The kernel panic severity, production impact (VFIO/ARI), strong
authorship/review, self-contained nature, and backward compatibility
outweigh the concern about the new internal constant. The
`PCI_SLOT_ALL_DEVICES` define is the minimal mechanism needed to fix
this bug correctly.
## Verification
- [Phase 1] Parsed tags: Keith Busch (author), Bjorn Helgaas (maintainer
SOB), Dan Williams (Reviewed-by), Link to v3 series patch 3
- [Phase 2] Diff analysis: ~25 functional lines across 3 files. New
PCI_SLOT_ALL_DEVICES constant, 3 conditional changes in slot.c, 1 init
change in pciehp_core.c, sysfs backward-compatible change
- [Phase 3] git blame: Buggy slot matching in slot.c from commit
cef354db0d7a72 (2008), pciehp passing 0 from commit 774d446b0f9222
(2018)
- [Phase 3] git log author: Keith Busch has 30+ PCI commits including
error recovery and reset handling
- [Phase 3] git log: Slot lock fix (1f5e57c622b4d) already in 7.0; v2
patches 3+4 not applied (different approach evolved)
- [Phase 4] b4 dig -c 1f5e57c622b4d: Found v2 series [PATCHv2 0/4] pci:
slot handling fixes at lore
- [Phase 4] b4 dig -a: Series went through at least v2 (Jan 30) and v3+
(Feb 17) revisions
- [Phase 4] Thread review: Dan Williams reviewed the series, patches 1+2
merged, 3+4 replaced by this new approach
- [Phase 5] pci_dev_assign_slot() called from pci_setup_device() during
every device probe
- [Phase 5] pci_reset_bus() reachable from VFIO userspace and error
recovery - confirms user reachability
- [Phase 5] pci.c slot iteration uses `dev->slot != slot` pointer
comparison - works automatically once dev->slot correctly assigned
- [Phase 6] Buggy code exists in all stable trees (code from 2008/2018)
- [Phase 6] Minor context conflicts possible from treewide refactoring
- [Phase 8] Failure mode: kernel panic from GHES fatal hardware error -
severity CRITICAL
- UNVERIFIED: Could not access lore.kernel.org directly due to Anubis
protection for detailed discussion review
- UNVERIFIED: Exact series composition for the v3 (Feb 17) submission
beyond patch 3
**YES**
drivers/pci/hotplug/pciehp_core.c | 3 ++-
drivers/pci/slot.c | 31 +++++++++++++++++++++++++++----
include/linux/pci.h | 10 +++++++++-
3 files changed, 38 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 1e9158d7bac75..2cafd3b26f344 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -79,7 +79,8 @@ static int init_slot(struct controller *ctrl)
snprintf(name, SLOT_NAME_SIZE, "%u", PSN(ctrl));
retval = pci_hp_initialize(&ctrl->hotplug_slot,
- ctrl->pcie->port->subordinate, 0, name);
+ ctrl->pcie->port->subordinate,
+ PCI_SLOT_ALL_DEVICES, name);
if (retval) {
ctrl_err(ctrl, "pci_hp_initialize failed: error %d\n", retval);
kfree(ops);
diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index 787311614e5b6..e0b7fb43423c9 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -42,6 +42,15 @@ static ssize_t address_read_file(struct pci_slot *slot, char *buf)
pci_domain_nr(slot->bus),
slot->bus->number);
+ /*
+ * Preserve legacy ABI expectations that hotplug drivers that manage
+ * multiple devices per slot emit 0 for the device number.
+ */
+ if (slot->number == PCI_SLOT_ALL_DEVICES)
+ return sysfs_emit(buf, "%04x:%02x:00\n",
+ pci_domain_nr(slot->bus),
+ slot->bus->number);
+
return sysfs_emit(buf, "%04x:%02x:%02x\n",
pci_domain_nr(slot->bus),
slot->bus->number,
@@ -73,7 +82,8 @@ static void pci_slot_release(struct kobject *kobj)
down_read(&pci_bus_sem);
list_for_each_entry(dev, &slot->bus->devices, bus_list)
- if (PCI_SLOT(dev->devfn) == slot->number)
+ if (slot->number == PCI_SLOT_ALL_DEVICES ||
+ PCI_SLOT(dev->devfn) == slot->number)
dev->slot = NULL;
up_read(&pci_bus_sem);
@@ -166,7 +176,8 @@ void pci_dev_assign_slot(struct pci_dev *dev)
mutex_lock(&pci_slot_mutex);
list_for_each_entry(slot, &dev->bus->slots, list)
- if (PCI_SLOT(dev->devfn) == slot->number)
+ if (slot->number == PCI_SLOT_ALL_DEVICES ||
+ PCI_SLOT(dev->devfn) == slot->number)
dev->slot = slot;
mutex_unlock(&pci_slot_mutex);
}
@@ -188,7 +199,8 @@ static struct pci_slot *get_slot(struct pci_bus *parent, int slot_nr)
/**
* pci_create_slot - create or increment refcount for physical PCI slot
* @parent: struct pci_bus of parent bridge
- * @slot_nr: PCI_SLOT(pci_dev->devfn) or -1 for placeholder
+ * @slot_nr: PCI_SLOT(pci_dev->devfn), -1 for placeholder, or
+ * PCI_SLOT_ALL_DEVICES
* @name: user visible string presented in /sys/bus/pci/slots/<name>
* @hotplug: set if caller is hotplug driver, NULL otherwise
*
@@ -222,6 +234,16 @@ static struct pci_slot *get_slot(struct pci_bus *parent, int slot_nr)
* consist solely of a dddd:bb tuple, where dddd is the PCI domain of the
* %struct pci_bus and bb is the bus number. In other words, the devfn of
* the 'placeholder' slot will not be displayed.
+ *
+ * Bus-wide slots:
+ * For PCIe hotplug, the physical slot encompasses the entire secondary
+ * bus, not just a single device number. If the device supports ARI and ARI
+ * Forwarding is enabled in the upstream bridge, a multi-function device
+ * may include functions that appear to have several different device
+ * numbers, i.e., PCI_SLOT() values. Pass @slot_nr == PCI_SLOT_ALL_DEVICES
+ * to create a slot that matches all devices on the bus. Unlike placeholder
+ * slots, bus-wide slots go through normal slot lookup and reuse existing
+ * slots if present.
*/
struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
const char *name,
@@ -285,7 +307,8 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
down_read(&pci_bus_sem);
list_for_each_entry(dev, &parent->devices, bus_list)
- if (PCI_SLOT(dev->devfn) == slot_nr)
+ if (slot_nr == PCI_SLOT_ALL_DEVICES ||
+ PCI_SLOT(dev->devfn) == slot_nr)
dev->slot = slot;
up_read(&pci_bus_sem);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1c270f1d51230..5ae2dfdb2d6f3 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -72,12 +72,20 @@
/* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
#define PCI_BUS_NUM(x) (((x) >> 8) & 0xff)
+/*
+ * PCI_SLOT_ALL_DEVICES indicates a slot that covers all devices on the bus.
+ * Used for PCIe hotplug where the physical slot is the entire secondary bus,
+ * and, if ARI Forwarding is enabled, functions may appear to be on multiple
+ * devices.
+ */
+#define PCI_SLOT_ALL_DEVICES 0xfe
+
/* pci_slot represents a physical slot */
struct pci_slot {
struct pci_bus *bus; /* Bus this slot is on */
struct list_head list; /* Node in list of slots */
struct hotplug_slot *hotplug; /* Hotplug info (move here) */
- unsigned char number; /* PCI_SLOT(pci_dev->devfn) */
+ unsigned char number; /* Device nr, or PCI_SLOT_ALL_DEVICES */
struct kobject kobj;
};
--
2.53.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH AUTOSEL 7.0-5.10] PCI: Avoid FLR for AMD NPU device
[not found] <20260420132314.1023554-1-sashal@kernel.org>
2026-04-20 13:18 ` [PATCH AUTOSEL 7.0-5.15] PCI: Allow all bus devices to use the same slot Sasha Levin
@ 2026-04-20 13:18 ` Sasha Levin
2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-6.12] PCI/DPC: Hold pci_dev reference during error recovery Sasha Levin
` (2 subsequent siblings)
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2026-04-20 13:18 UTC (permalink / raw)
To: patches, stable
Cc: Lizhi Hou, Bjorn Helgaas, Sasha Levin, linux-pci, linux-kernel
From: Lizhi Hou <lizhi.hou@amd.com>
[ Upstream commit 806140e9a33218f22188fe5019c7874aa78d81f8 ]
The AMD NPU device (PCI Device IDs 0x1502 and 0x17f0) advertises FLR
support. However, triggering an FLR causes the device to hang.
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260226182545.3057330-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Here is the complete analysis:
---
## PHASE 1: COMMIT MESSAGE FORENSICS
**Step 1.1: Subject Line**
- Subsystem: `PCI`
- Action verb: "Avoid" (indicating a workaround for broken hardware
behavior)
- Summary: Prevent FLR (Function Level Reset) on AMD NPU device IDs
0x1502 and 0x17f0 because it causes device hang.
Record: [PCI] [Avoid] [Disable FLR on AMD NPU that hangs when FLR
triggered]
**Step 1.2: Tags**
- `Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>` — author, AMD developer
working on AMD XDNA/NPU accelerator driver
- `Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>` — PCI subsystem
maintainer applied the patch
- `Link:
https://patch.msgid.link/20260226182545.3057330-1-lizhi.hou@amd.com` —
original submission
- No Fixes: tag (expected — this is a new quirk addition, not a code
fix)
- No Cc: stable tag (expected — that's why we're reviewing)
Record: Author is AMD NPU developer. PCI subsystem maintainer (Bjorn
Helgaas) applied it personally. No syzbot involvement (this is a
hardware bug, not a software bug).
**Step 1.3: Commit Body**
The message is concise: AMD NPU devices with PCI Device IDs 0x1502 and
0x17f0 advertise FLR support, but triggering FLR causes the device to
hang. This is a hardware defect — the device's FLR capability
advertisement is incorrect.
Record: Bug = device hang on FLR. Symptom = device becomes unresponsive.
Root cause = hardware defect in AMD NPU silicon.
**Step 1.4: Hidden Bug Fix Detection**
This is not a "hidden" fix — it's an explicit hardware workaround. The
commit clearly states the problem (device hang) and the solution (avoid
FLR).
Record: Not a hidden bug fix; explicit hardware quirk/workaround.
---
## PHASE 2: DIFF ANALYSIS
**Step 2.1: Inventory**
- Single file modified: `drivers/pci/quirks.c`
- 3 lines added:
1. Comment line in the block comment listing affected devices: `* AMD
Neural Processing Unit 0x1502 0x17f0`
2. `DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1502, quirk_no_flr);`
3. `DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x17f0, quirk_no_flr);`
- 0 lines removed
- Function affected: none modified; new entries use existing
`quirk_no_flr()` function
Record: +3 lines, 0 removals, 1 file. Scope: trivial device ID addition
to existing quirk table.
**Step 2.2: Code Flow Change**
Before: AMD NPU devices (0x1502, 0x17f0) are not in the quirk table, so
`quirk_no_flr` is not called for them. When FLR is attempted via
`pcie_reset_flr()` or `pci_af_flr()`, the `PCI_DEV_FLAGS_NO_FLR_RESET`
flag is not set, and FLR proceeds — causing device hang.
After: During PCI early fixups, `quirk_no_flr()` sets the
`PCI_DEV_FLAGS_NO_FLR_RESET` flag on these devices. When FLR is later
attempted, `pcie_reset_flr()` (line 4375) and `pci_af_flr()` (line 4398)
check this flag and return `-ENOTTY` instead, preventing the hang.
Record: [Before: FLR triggers and hangs device] → [After: FLR is
blocked, device remains functional]
**Step 2.3: Bug Mechanism**
Category: Hardware workaround (PCI quirk).
The `DECLARE_PCI_FIXUP_EARLY` macro registers a callback that runs
during PCI enumeration for the specific vendor/device ID pair. The
callback sets a flag that is checked in the FLR code paths. This is an
extremely well-established pattern used for many other devices.
Record: [Hardware workaround] [Device advertises broken FLR capability;
quirk prevents FLR from being used]
**Step 2.4: Fix Quality**
- Obviously correct: follows identical pattern of 7+ existing entries in
the same quirk table
- Minimal/surgical: 3 lines, no logic changes
- Regression risk: essentially zero — the flag only affects these
specific PCI device IDs; no impact on any other devices
- No red flags
Record: Fix quality: excellent. Regression risk: near-zero.
---
## PHASE 3: GIT HISTORY INVESTIGATION
**Step 3.1: Blame**
The `quirk_no_flr` function was introduced in commit `f65fd1aa4f9881`
(2017-04-03, v4.12-rc1) for Intel 82579 NICs. It has been stable
infrastructure since v4.12. Additional device entries were added over
the years (AMD Matisse in 2020, AMD FCH AHCI in 2023, SolidRun SNET DPU,
Mediatek MT7922 in 2025).
Record: `quirk_no_flr` infrastructure exists since v4.12 (2017). Present
in ALL active stable kernel trees.
**Step 3.2: Fixes Tag**
No Fixes: tag — not applicable (this is a new device ID addition, not a
code fix).
**Step 3.3: File History**
`drivers/pci/quirks.c` is an actively maintained file with frequent
device-specific additions. The pattern of adding
`DECLARE_PCI_FIXUP_EARLY` entries for broken FLR is well-established
with many precedents.
Record: File is active. Pattern is established. This is standalone — no
prerequisites.
**Step 3.4: Author**
Lizhi Hou is an AMD developer who works on the `accel/amdxdna` (AMD NPU)
driver. Their commits show deep familiarity with the AMD NPU hardware.
They are the right person to identify this hardware defect.
Record: Author is the AMD NPU subsystem developer — credible source for
this hardware bug report.
**Step 3.5: Dependencies**
None. The `quirk_no_flr` function, `DECLARE_PCI_FIXUP_EARLY` macro, and
`PCI_VENDOR_ID_AMD` constant all exist in every stable tree back to
v4.12. This patch is completely self-contained.
Record: No dependencies. Will apply cleanly to any stable tree with the
quirk infrastructure (all of them).
---
## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH
**Step 4.1-4.5:** Lore and patch.msgid.link are currently returning
anti-bot challenges. However, the commit was applied by Bjorn Helgaas
(PCI subsystem maintainer), which is the strongest possible endorsement
for a PCI patch. The Link: tag confirms it was submitted and reviewed
through the standard kernel mailing list process.
Record: Could not access lore discussion due to anti-bot protection.
However, maintainer (Bjorn Helgaas) applied the patch directly, which is
strong validation.
---
## PHASE 5: CODE SEMANTIC ANALYSIS
**Step 5.1: Functions**
No functions modified. Two new `DECLARE_PCI_FIXUP_EARLY` entries added
that call existing `quirk_no_flr()`.
**Step 5.2: Callers of quirk_no_flr**
Called by the PCI subsystem during device enumeration (early fixup
phase). This runs for every PCI device matching the vendor/device ID
pair.
**Step 5.3-5.4: Impact path**
`quirk_no_flr()` → sets `PCI_DEV_FLAGS_NO_FLR_RESET` → checked by
`pcie_reset_flr()` and `pci_af_flr()` in `drivers/pci/pci.c`. FLR is
triggered during VFIO device passthrough, device reset operations, and
error recovery. The flag causes these functions to return `-ENOTTY`,
which makes the PCI reset machinery use an alternative reset method.
Record: FLR is triggered in VFIO passthrough and device reset. The quirk
prevents hang in these common operations.
**Step 5.5: Similar patterns**
7 existing entries in the same quirk table for AMD, Intel, Mediatek, and
SolidRun devices. This is a very common pattern.
---
## PHASE 6: STABLE TREE ANALYSIS
**Step 6.1: Buggy code in stable?**
The "buggy code" is the PCI FLR path itself which doesn't know about
these AMD NPU devices. The `quirk_no_flr` infrastructure has existed
since v4.12. However, the AMD NPU hardware itself may only appear in
relatively recent systems.
Record: The quirk infrastructure exists in all stable trees. The fix
will apply cleanly.
**Step 6.2: Backport complications**
None expected. The patch adds entries at the end of an existing table.
The exact ordering might differ slightly across stable trees, but the
macro entries are independent and the patch should apply with at most
trivial context adjustments.
Record: Expected backport difficulty: clean apply or trivial context
adjustment.
**Step 6.3: Related fixes in stable**
No related fixes for AMD NPU FLR already in stable.
---
## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT
**Step 7.1:** PCI subsystem (`drivers/pci/`) — CORE infrastructure,
affects all PCI device users.
**Step 7.2:** PCI quirks.c is actively maintained with frequent device
additions.
Record: [PCI subsystem] [CORE criticality] [Active development]
---
## PHASE 8: IMPACT AND RISK ASSESSMENT
**Step 8.1: Who is affected**
Users with AMD NPU hardware (PCI IDs 0x1502, 0x17f0). This includes
users doing VFIO passthrough to VMs, users experiencing error recovery
that triggers FLR, and potentially users of the amdxdna accelerator
driver.
Record: Affected users: AMD NPU hardware users.
**Step 8.2: Trigger conditions**
FLR is triggered during: VFIO device attach/detach, device error
recovery, and explicit reset operations. For VFIO users, this is
triggered during normal VM operations.
Record: Trigger is FLR attempt — common in VFIO and device reset
scenarios.
**Step 8.3: Failure mode**
Device hang — the NPU becomes unresponsive and cannot be used. This is a
HIGH severity issue.
Record: Severity: HIGH (device hang, device becomes unusable)
**Step 8.4: Risk-Benefit**
- Benefit: HIGH — prevents device hang on real hardware for real users
- Risk: VERY LOW — 3 lines, uses existing well-tested infrastructure,
affects only the specific device IDs, no logic changes
- Ratio: Excellent benefit-to-risk ratio
---
## PHASE 9: FINAL SYNTHESIS
**Step 9.1: Evidence Summary**
FOR backporting:
- Classic PCI quirk addition — explicitly listed as an exception
category for stable
- Only 3 lines added, using existing infrastructure unchanged since
v4.12
- Prevents device hang (HIGH severity)
- Applied by PCI subsystem maintainer (Bjorn Helgaas)
- Author is the AMD NPU hardware developer (credible bug report)
- Zero regression risk — only affects specific PCI device IDs
- Pattern has 7+ precedents in the same file
- Will apply cleanly to all stable trees
AGAINST backporting:
- None identified
**Step 9.2: Stable Rules Checklist**
1. Obviously correct and tested? **YES** — identical pattern to 7 other
entries, applied by subsystem maintainer
2. Fixes a real bug? **YES** — device hang on FLR
3. Important issue? **YES** — device hang is HIGH severity
4. Small and contained? **YES** — 3 lines, single file
5. No new features or APIs? **YES** — just device IDs for existing quirk
6. Can apply to stable trees? **YES** — infrastructure exists since
v4.12
**Step 9.3: Exception Category**
This is a **hardware quirk/workaround** — explicitly called out as
automatic YES for stable.
---
## Verification
- [Phase 1] Parsed tags: Signed-off-by from AMD developer (Lizhi Hou)
and PCI maintainer (Bjorn Helgaas)
- [Phase 2] Diff analysis: +3 lines (1 comment, 2
DECLARE_PCI_FIXUP_EARLY macros) in `drivers/pci/quirks.c`
- [Phase 3] git blame: `quirk_no_flr` introduced in commit
`f65fd1aa4f9881` (v4.12-rc1, 2017), present in all stable trees
- [Phase 3] git show `0d14f06cd6657`: confirmed AMD Matisse FLR quirk
precedent from 2020
- [Phase 3] git log --author: Lizhi Hou confirmed as AMD XDNA/NPU
accelerator developer
- [Phase 4] Lore/patch.msgid.link blocked by anti-bot; maintainer SOB
confirms standard review process
- [Phase 5] Grep for `PCI_DEV_FLAGS_NO_FLR_RESET`: checked in
`pcie_reset_flr()` (line 4375) and `pci_af_flr()` (line 4398) in
`pci.c`
- [Phase 6] `quirk_no_flr` infrastructure exists since v4.12 — present
in ALL active stable trees
- [Phase 6] No backport complications: trivial table additions
- [Phase 8] Failure mode: device hang on FLR, severity HIGH
- UNVERIFIED: Could not access lore discussion to check for reviewer
feedback or stable nominations (anti-bot protection)
This is a textbook hardware quirk addition: minimal (3 lines), using
long-established infrastructure, preventing a device hang on real AMD
NPU hardware, applied by the PCI subsystem maintainer, with zero
regression risk.
**YES**
drivers/pci/quirks.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 48946cca4be72..757a296eae411 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5603,6 +5603,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
* AMD Starship/Matisse HD Audio Controller 0x1487
* AMD Starship USB 3.0 Host Controller 0x148c
* AMD Matisse USB 3.0 Host Controller 0x149c
+ * AMD Neural Processing Unit 0x1502 0x17f0
* Intel 82579LM Gigabit Ethernet Controller 0x1502
* Intel 82579V Gigabit Ethernet Controller 0x1503
* Mediatek MT7922 802.11ax PCI Express Wireless Network Adapter
@@ -5615,6 +5616,8 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x7901, quirk_no_flr);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1502, quirk_no_flr);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x17f0, quirk_no_flr);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_no_flr);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_no_flr);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_MEDIATEK, 0x0616, quirk_no_flr);
--
2.53.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH AUTOSEL 7.0-6.12] PCI/DPC: Hold pci_dev reference during error recovery
[not found] <20260420132314.1023554-1-sashal@kernel.org>
2026-04-20 13:18 ` [PATCH AUTOSEL 7.0-5.15] PCI: Allow all bus devices to use the same slot Sasha Levin
2026-04-20 13:18 ` [PATCH AUTOSEL 7.0-5.10] PCI: Avoid FLR for AMD NPU device Sasha Levin
@ 2026-04-20 13:19 ` Sasha Levin
2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-5.10] PCI/VGA: Pass vga_get_uninterruptible() errors to userspace Sasha Levin
2026-04-20 13:21 ` [PATCH AUTOSEL 6.18] PCI: hv: Set default NUMA node to 0 for devices without affinity info Sasha Levin
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2026-04-20 13:19 UTC (permalink / raw)
To: patches, stable
Cc: Sizhe Liu, Bjorn Helgaas, Sasha Levin, mahesh, linuxppc-dev,
linux-pci, linux-kernel
From: Sizhe Liu <liusizhe5@huawei.com>
[ Upstream commit a1ed752bc7cb77b740cee671567d9508ae74becd ]
The AER and EDR error handling paths hold a reference on the pci_dev during
recovery. Hold a reference during the DPC recovery path as well.
Signed-off-by: Sizhe Liu <liusizhe5@huawei.com>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
https://patch.msgid.link/20260214081130.1878424-1-liusizhe5@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have all the information needed. Let me write the complete
analysis.
---
## Complete Analysis: PCI/DPC: Hold pci_dev reference during error
recovery
### PHASE 1: COMMIT MESSAGE FORENSICS
**Step 1.1: Subject Line**
- Subsystem: `PCI/DPC` (PCI Express Downstream Port Containment)
- Action verb: "Hold" - implies adding a missing safety measure
- Summary: Hold a pci_dev reference during error recovery to prevent
potential use-after-free
- Record: [PCI/DPC] [Hold] [Add missing reference counting for pci_dev
during DPC recovery path]
**Step 1.2: Tags**
- `Signed-off-by: Sizhe Liu <liusizhe5@huawei.com>` - Original author
- `[bhelgaas: split to separate patch]` - Bjorn Helgaas (PCI subsystem
maintainer) split this from a larger patch
- `Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>` - Committed by
the PCI subsystem maintainer
- `Link: https://patch.msgid.link/20260214081130.1878424-1-
liusizhe5@huawei.com` - Original submission
No Fixes: tag, no Reported-by, no Cc: stable (expected for a candidate).
Record: Signed by author and PCI maintainer. Split from a larger patch
by Bjorn Helgaas. No Fixes: tag.
**Step 1.3: Commit Body**
The message states: "The AER and EDR error handling paths hold a
reference on the pci_dev during recovery. Hold a reference during the
DPC recovery path as well." This is a clear statement of a reference
counting inconsistency between the three error recovery paths.
Record: Bug = missing pci_dev reference during DPC error recovery.
Symptom = potential use-after-free if device is freed during recovery.
Root cause = reference counting inconsistency between DPC, AER, and EDR
paths.
**Step 1.4: Hidden Bug Fix Detection**
Yes - this is a reference counting bug fix. The word "Hold" implies
adding a missing reference, aligning DPC with AER and EDR behavior. This
fixes a potential use-after-free.
### PHASE 2: DIFF ANALYSIS
**Step 2.1: Inventory**
- 1 file modified: `drivers/pci/pcie/dpc.c`
- +2 lines added, 0 removed
- Function modified: `dpc_handler()`
- Classification: single-file surgical fix (2 lines)
**Step 2.2: Code Flow Change**
Before: `dpc_handler()` uses `pdev` throughout `dpc_process_error()` and
`pcie_do_recovery()` without holding a reference.
After: `pci_dev_get(pdev)` before processing, `pci_dev_put(pdev)` after
recovery completes. Ensures the `pci_dev` object lives through the
entire recovery.
**Step 2.3: Bug Mechanism**
Category: **Reference counting fix**
- `pci_dev_get()` is ADDED (this fixes a potential use-after-free /
missing reference hold)
- AER path: uses `pci_dev_get()` in `add_error_device()` at `aer.c:992`
and `pci_dev_put()` in `handle_error_source()` at `aer.c:1202`
- AER APEI path: uses `pci_get_domain_bus_and_slot()` (returns with ref)
at `aer.c:1226` and `pci_dev_put()` at `aer.c:1253`
- EDR path: uses `pci_dev_get()` in `acpi_dpc_port_get()` at
`edr.c:89,94` and `pci_dev_put()` at `edr.c:218`
- DPC path: **no reference held** before this fix
**Step 2.4: Fix Quality**
- Obviously correct: balanced get/put pair wrapping the usage
- Minimal/surgical: exactly 2 lines
- No regression risk: adding reference counting cannot cause deadlock or
data corruption
- No red flags
### PHASE 3: GIT HISTORY INVESTIGATION
**Step 3.1: Blame**
The `dpc_handler()` function was authored by Kuppuswamy Sathyanarayanan
in commit `aea47413e7ceec` (2020-03-23), present since v5.15. The core
structure with `dpc_process_error()` + `pcie_do_recovery()` has been
stable since then. The surprise removal check was added later by
`2ae8fbbe1cd42` (2024, first in v6.12).
**Step 3.2: Fixes Tag Context**
The original v3 patch had `Fixes: a57f2bfb4a58` ("PCI/AER: Ratelimit
correctable and non-fatal error logging") which was first in v6.16-rc1.
However, the reference counting bug existed earlier - the DPC path has
been missing the reference since `aea47413e7ceec` (2020, v5.15+). Bjorn
split the reference counting part into its own patch.
**Step 3.3: File History**
Recent commits to `dpc.c` are mostly cleanup/improvement (FIELD_GET,
defines, TLP log). No other reference counting fixes.
**Step 3.4: Author Context**
Sizhe Liu (Huawei) identified the issue. Bjorn Helgaas (PCI subsystem
maintainer) reviewed, suggested the reference counting addition, and
committed the fix.
**Step 3.5: Dependencies**
This commit is fully standalone. It adds `pci_dev_get()`/`pci_dev_put()`
around existing code. No new functions, no API changes, no dependencies.
### PHASE 4: MAILING LIST RESEARCH
The complete discussion was found. Key findings:
1. Sizhe Liu submitted v1/v2/v3 of "PCI/AER: Fix missing AER logs in DPC
and EDR paths"
2. On v2 review, **Bjorn Helgaas himself identified** the missing
reference: "I don't see a similar pci_dev_get() anywhere in the DPC
path ... holding that reference on the device is important."
3. Sizhe Liu agreed and added it in v3
4. Bjorn then split the v3 patch into two separate patches: the AER log
fix and this reference counting fix
5. Shiju Jose reviewed v3 with minor formatting comments
6. The patch was applied by Bjorn Helgaas
### PHASE 5: CODE SEMANTIC ANALYSIS
**Callers of affected code:**
- `dpc_handler()` is a threaded IRQ handler registered in `dpc_probe()`
via `devm_request_threaded_irq()`
- Triggered by `dpc_irq()` (hardirq handler) returning `IRQ_WAKE_THREAD`
- `pcie_do_recovery()` is a long-running function that walks the PCI
bus, calls driver error handlers, resets links, and waits for
secondary bus readiness
**Call chain:** Hardware DPC trigger -> `dpc_irq()` -> `dpc_handler()`
-> `dpc_process_error()` + `pcie_do_recovery()` -> `dpc_reset_link()` ->
`pcie_wait_for_link()` + `pci_bridge_wait_for_secondary_bus()`
The recovery process can take seconds (waiting for links, bus resets).
During this time, the `pci_dev` must remain valid.
### PHASE 6: STABLE TREE ANALYSIS
The buggy code (`dpc_handler()` without reference) exists in all stable
trees from v5.15 onwards. The function was introduced in
`aea47413e7ceec` (v5.15). For trees older than v6.12, the surprise
removal block won't be present, but the patch context for the
`pci_dev_get`/`pci_dev_put` addition is around the `dpc_process_error()`
+ `pcie_do_recovery()` calls which are present in all trees. Minor
conflicts may be needed for trees without the surprise removal check.
### PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT
- Subsystem: PCI Express (drivers/pci/pcie/) - IMPORTANT subsystem
affecting all PCIe users
- DPC is a standard PCIe feature for error containment
- Maintainer: Bjorn Helgaas - he personally identified and committed
this fix
- Criticality: IMPORTANT - affects all systems with PCIe DPC support
### PHASE 8: IMPACT AND RISK ASSESSMENT
**Who is affected:** All systems with PCIe DPC support (most modern x86
systems, some ARM)
**Trigger conditions:** A DPC containment event (PCIe fatal error)
concurrent with device removal. While the specific race may be hard to
trigger, DPC events are not uncommon (hardware errors, NVMe removal
under load).
**Failure mode:** Potential use-after-free - accessing freed `pci_dev`
during recovery. Severity: **HIGH** (crash, potential security issue)
**Benefit:** Fixes a reference counting correctness bug, aligns DPC with
AER/EDR behavior, prevents potential UAF during error recovery.
**Risk:** **Very low** - 2 lines, balanced get/put, obviously correct,
no behavioral change.
### PHASE 9: FINAL SYNTHESIS
**Evidence FOR backporting:**
1. Fixes a reference counting bug (missing `pci_dev_get` in DPC handler)
2. Potential use-after-free during error recovery
3. Only 2 lines added - minimal risk
4. PCI subsystem maintainer (Bjorn Helgaas) personally identified the
issue and committed the fix
5. Consistent with how AER and EDR paths already work
6. `pcie_do_recovery()` is a long-running function making the window
non-trivial
7. Buggy code present since v5.15
8. Standalone fix with no dependencies
**Evidence AGAINST backporting:**
1. No concrete crash report or syzbot finding (theoretical)
2. Race requires specific timing (DPC + hot-removal)
**Stable Rules Checklist:**
1. Obviously correct? **YES** - balanced get/put, trivially verifiable
2. Fixes a real bug? **YES** - reference counting inconsistency,
potential UAF
3. Important issue? **YES** - potential crash/UAF during error recovery
4. Small and contained? **YES** - 2 lines in 1 file
5. No new features/APIs? **YES** - pure bug fix
6. Can apply to stable? **YES** - standalone, clean apply expected for
v7.0
### Verification
- [Phase 1] Parsed tags: Signed-off-by Sizhe Liu + Bjorn Helgaas, Link
to lore
- [Phase 2] Diff analysis: +2 lines in `dpc_handler()`, adds
`pci_dev_get()`/`pci_dev_put()` around error recovery
- [Phase 3] git blame: `dpc_handler()` introduced in commit
`aea47413e7ceec` (2020), present since v5.15
- [Phase 3] git show a57f2bfb4a58: confirmed the ratelimit commit is in
v6.16-rc1, first in v7.0 tree
- [Phase 4] Found full thread via yhbt.net mirror of lore: v1/v2/v3
progression, Bjorn requested ref counting in v2 review
- [Phase 4] Bjorn Helgaas (PCI maintainer) explicitly said "holding that
reference on the device is important"
- [Phase 4] Bjorn split original 3-file patch into separate patches,
this is the DPC ref-counting portion
- [Phase 5] `pcie_do_recovery()` walks bus, resets links, waits for
secondary bus - long-running operation
- [Phase 5] Verified AER (aer.c:992/1202), AER APEI (aer.c:1226/1253),
EDR (edr.c:89/218) all hold references
- [Phase 5] Verified DPC path has NO reference holding in current 7.0
tree
- [Phase 6] Buggy code exists since v5.15 (aea47413e7ceec present in
v5.15, v6.1, v6.6, v6.12, v7.0)
- [Phase 6] Surprise removal block (2ae8fbbe1cd42) is in v6.12+ only, so
older trees may need context adjustment
- [Phase 7] PCI/DPC is an important subsystem; fix committed by
subsystem maintainer
- [Phase 8] Failure mode: potential UAF during DPC recovery, severity
HIGH
- UNVERIFIED: Whether the `devm_request_threaded_irq` + `free_irq`
synchronization fully prevents the race (it synchronizes handler
completion but doesn't prevent refcount drops during handler
execution)
The fix is a 2-line, obviously correct reference counting addition that
aligns the DPC error recovery path with the AER and EDR paths. The PCI
subsystem maintainer identified and committed it. The potential
consequence of the missing reference is a use-after-free, and the fix
carries essentially zero regression risk.
**YES**
drivers/pci/pcie/dpc.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index fc18349614d7c..f028bc795f197 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -372,11 +372,13 @@ static irqreturn_t dpc_handler(int irq, void *context)
return IRQ_HANDLED;
}
+ pci_dev_get(pdev);
dpc_process_error(pdev);
/* We configure DPC so it only triggers on ERR_FATAL */
pcie_do_recovery(pdev, pci_channel_io_frozen, dpc_reset_link);
+ pci_dev_put(pdev);
return IRQ_HANDLED;
}
--
2.53.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH AUTOSEL 7.0-5.10] PCI/VGA: Pass vga_get_uninterruptible() errors to userspace
[not found] <20260420132314.1023554-1-sashal@kernel.org>
` (2 preceding siblings ...)
2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-6.12] PCI/DPC: Hold pci_dev reference during error recovery Sasha Levin
@ 2026-04-20 13:19 ` Sasha Levin
2026-04-20 13:21 ` [PATCH AUTOSEL 6.18] PCI: hv: Set default NUMA node to 0 for devices without affinity info Sasha Levin
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2026-04-20 13:19 UTC (permalink / raw)
To: patches, stable
Cc: Simon Richter, Bjorn Helgaas, Sasha Levin, linux-pci,
linux-kernel
From: Simon Richter <Simon.Richter@hogyros.de>
[ Upstream commit 2a93c9851b2bb38614fadd84aa674b7a5c8181c6 ]
If VGA routing cannot be established, vga_get_uninterruptible() returns an
error and does not increment the lock count. Return the error to the
caller.
Return before incrementing uc->io_cnt/mem_cnt so vga_arb_release() won't
call vga_put() when userspace closes the handle.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260307173538.763188-2-Simon.Richter@hogyros.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have all the information needed for a thorough analysis. Let me
compile the findings.
---
## PHASE 1: COMMIT MESSAGE FORENSICS
### Step 1.1: Subject Line
- **Subsystem**: PCI/VGA
- **Action verb**: "Pass" (implies a fix for missing error propagation)
- **Summary**: Pass error return from `vga_get_uninterruptible()` back
to userspace instead of silently ignoring it.
- Record: [PCI/VGA] [pass/fix] [Propagate vga_get_uninterruptible()
error to userspace callers]
### Step 1.2: Tags
- `Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>` — patch
author
- `[bhelgaas: commit log]` — Bjorn Helgaas (PCI subsystem maintainer)
edited the log
- `Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>` — committed by
the PCI subsystem maintainer
- `Link: https://patch.msgid.link/20260307173538.763188-2-
Simon.Richter@hogyros.de`
- No Fixes: tag, no Cc: stable, no Reported-by. This is expected.
- Record: Maintainer-committed. Message-id `-2-` suggests this may be
part of a series.
### Step 1.3: Body Text
The message clearly states two bugs:
1. When `vga_get_uninterruptible()` fails, the error is silently dropped
— userspace isn't told the lock failed.
2. Even on failure, `uc->io_cnt/mem_cnt` are incremented, causing
spurious `vga_put()` calls in `vga_arb_release()`.
Record: Bug = ignored error return leads to false success report to
userspace AND unbalanced lock counters. Failure mode = userspace
believes it holds VGA locks it doesn't actually hold.
### Step 1.4: Hidden Bug Fix?
This is an explicit bug fix — the commit describes two concrete problems
and the fix for each.
---
## PHASE 2: DIFF ANALYSIS
### Step 2.1: Inventory
- **Files changed**: 1 (`drivers/pci/vgaarb.c`)
- **Lines added**: ~5 (new `int err;` variable, error check, and early
return)
- **Functions modified**: `vga_arb_write()` (the "lock" command handler)
- **Scope**: Single-file, single-function surgical fix.
### Step 2.2: Code Flow Change
**Before**: `vga_get_uninterruptible()` is called, return value
discarded, code unconditionally increments user counters and returns
`count` (success).
**After**: Return value is captured in `err`. If non-zero, the function
returns the error code to userspace via `goto done`, skipping the
counter increments.
### Step 2.3: Bug Mechanism
This is a **logic/correctness fix** combined with a **reference counting
fix**:
- `vga_get()` returns `-ENODEV` when the VGA device is removed from the
list (e.g., hot-unplug between target check and lock attempt)
- Ignoring this error means userspace thinks it holds a VGA lock when it
doesn't
- The unbalanced counters cause `vga_arb_release()` (lines 1424-1427) to
call `vga_put()` for locks never acquired (though `__vga_put()`'s `>0`
guards prevent underflow)
### Step 2.4: Fix Quality
- Obviously correct: standard error-checking pattern
- Minimal/surgical: 5 lines, single hunk in the "lock" handler
- No regression risk: the only change is to return error early rather
than continuing
- No API or behavior change: callers of `vga_arb_write()` already handle
error returns
---
## PHASE 3: GIT HISTORY INVESTIGATION
### Step 3.1: Blame
The buggy line (`vga_get_uninterruptible(pdev, io_state);` at line 1168)
was introduced in commit `deb2d2ecd43dfc` by Benjamin Herrenschmidt on
2009-08-11, the original VGA arbitration implementation. This is kernel
v2.6.32-rc1 era code, present in **all** active stable trees.
### Step 3.2: Fixes Tag
No Fixes: tag present (expected for a manually reviewed candidate).
### Step 3.3: File History
Recent changes to `vgaarb.c` are mostly refactoring/cleanup (VGA device
selection, typo fixes, simplifications). None touch the "lock" command
handler in `vga_arb_write()`.
### Step 3.4: Author
Simon Richter has other PCI-related commits (BAR resize fixes). Bjorn
Helgaas (PCI maintainer) committed this, which indicates it was reviewed
and accepted through the proper PCI maintainer tree.
### Step 3.5: Dependencies
The message-id `-2-` suggests this is patch 2 in a series, but the fix
is entirely self-contained. It only adds error checking to an existing
call. No structural dependencies on other patches.
---
## PHASE 4: MAILING LIST RESEARCH
### Step 4.1-4.2: Original Patch
b4 dig could not find a match by the message-id directly. Lore was
inaccessible. However, the commit was signed off by Bjorn Helgaas (PCI
maintainer), indicating proper review.
### Step 4.3-4.5: Bug Report / Stable Discussion
No bug report links. No syzbot reference. This appears to be a code-
review-discovered issue.
---
## PHASE 5: CODE SEMANTIC ANALYSIS
### Step 5.1: Functions Modified
`vga_arb_write()` — the write handler for `/dev/vga_arbiter`.
### Step 5.2: Callers
`vga_arb_write()` is registered as the `.write` callback in
`vga_arb_device_fops` (line 1498). It's called when userspace writes to
`/dev/vga_arbiter`. This is the main userspace interface for the VGA
arbiter, used by X servers and display managers on multi-GPU systems.
### Step 5.3-5.4: Call Chain
Userspace → `write(fd, "lock io+mem", ...)` → `vga_arb_write()` →
`vga_get_uninterruptible()` → `vga_get()`. The error path is directly
reachable from userspace.
### Step 5.5: Similar Patterns
The i915 driver also ignores `vga_get_uninterruptible()` return values
(lines 68, 93 in `intel_vga.c`), but those are kernel-internal callers
where the consequences differ.
---
## PHASE 6: STABLE TREE ANALYSIS
### Step 6.1: Buggy Code Existence
The buggy code has been present since v2.6.32 (2009). It exists in
**all** active stable trees (5.4.y, 5.10.y, 5.15.y, 6.1.y, 6.6.y,
6.12.y).
### Step 6.2: Backport Complications
The surrounding code in the "lock" handler has not changed significantly
since the original commit. The patch should apply cleanly. The only
nearby changes are comment fixes (`cc64ca4b62f50` in 2023).
### Step 6.3: Related Fixes
No other fix for this issue exists in any stable tree.
---
## PHASE 7: SUBSYSTEM CONTEXT
### Step 7.1: Subsystem Criticality
PCI/VGA arbiter — **IMPORTANT** level. Used by all systems with VGA
devices and userspace display servers (X11, Wayland compositors). Multi-
GPU systems (common in desktop/workstation use) depend on correct VGA
arbitration.
### Step 7.2: Subsystem Activity
Moderately active — periodic cleanups and improvements, but core logic
is mature.
---
## PHASE 8: IMPACT AND RISK ASSESSMENT
### Step 8.1: Affected Users
Users with multi-GPU systems or systems where VGA device hot-unplug can
occur. Includes desktop/workstation users running X servers or other VGA
arbitration clients.
### Step 8.2: Trigger Conditions
- VGA device hot-unplug while userspace holds a reference via
`/dev/vga_arbiter`
- The window is a race between target check and `vga_get()`, so it's
timing-dependent
- Unprivileged user with access to `/dev/vga_arbiter` can trigger this
### Step 8.3: Failure Mode Severity
- **Userspace is told lock succeeded when it didn't**: MEDIUM — could
lead to concurrent VGA resource access, potentially causing display
corruption or GPU conflicts
- **Unbalanced counters**: LOW — `__vga_put()` has `>0` guards
preventing counter underflow, though spurious wake-ups may occur
- Overall severity: **MEDIUM**
### Step 8.4: Risk-Benefit
- **Benefit**: Correct error propagation to userspace; prevents VGA
resource conflicts on multi-GPU systems
- **Risk**: Very low — 5-line change adding standard error checking
- **Ratio**: Favorable for backport
---
## PHASE 9: FINAL SYNTHESIS
### Step 9.1: Evidence
**FOR backporting:**
- Fixes a real correctness bug: userspace is lied to about VGA lock
acquisition
- Fixes unbalanced reference counting on the cleanup path
- Small, surgical fix (5 lines, single function)
- Obviously correct: standard error-checking pattern
- Committed by PCI subsystem maintainer (Bjorn Helgaas)
- Bug present since v2.6.32 — affects all stable trees
- Self-contained, no dependencies on other patches
- Should apply cleanly to all stable trees
**AGAINST backporting:**
- The trigger window is a race condition (timing-dependent)
- No reported real-world incidents (code-review-found issue)
- The consequences are somewhat mitigated by `__vga_put()`'s `>0` guards
### Step 9.2: Stable Rules Checklist
1. Obviously correct and tested? **YES** — trivial error-checking
pattern, committed by maintainer
2. Fixes a real bug? **YES** — incorrect error reporting to userspace +
unbalanced counters
3. Important issue? **MEDIUM** — resource conflict possible but
mitigated in practice
4. Small and contained? **YES** — 5 lines in one function
5. No new features/APIs? **YES** — no new features
6. Can apply to stable? **YES** — code unchanged since 2009
### Step 9.3: Exception Categories
Not an exception category; evaluated on its own merit as a bug fix.
### Step 9.4: Decision
This is a clear correctness fix: error values from
`vga_get_uninterruptible()` must be propagated to userspace rather than
silently swallowed. The fix is minimal, obviously correct, committed by
the subsystem maintainer, and applies to all stable trees. Despite the
trigger being race-dependent, the fix prevents both misleading userspace
and unbalanced lock accounting.
---
## Verification
- [Phase 1] Parsed tags: Signed-off-by from author and PCI maintainer,
Link to lore, no Fixes/Reported-by/Cc:stable (expected)
- [Phase 2] Diff analysis: 5 lines added in the "lock" handler of
`vga_arb_write()` — captures return value of
`vga_get_uninterruptible()`, returns error before incrementing
counters
- [Phase 3] git blame: buggy code introduced in commit deb2d2ecd43dfc
(v2.6.32-rc1, 2009), present in all stable trees
- [Phase 3] git describe: `v2.6.32-rc1~670^2~52` confirms the original
commit is in all active stable trees
- [Phase 3] git log: no recent changes to the "lock" handler code path,
ensuring clean backport
- [Phase 4] b4 dig -c deb2d2ecd43dfc: found original VGA arbitration
patch from 2009
- [Phase 4] lore inaccessible due to anti-scraping protection
- [Phase 5] `vga_get_uninterruptible` returns `int` (verified in
include/linux/vgaarb.h lines 92-96), wrapping `vga_get()` which
returns -ENODEV on failure
- [Phase 5] `vga_arb_release()` (lines 1418-1428): confirms `vga_put()`
is called for each `io_cnt`/`mem_cnt`, creating unbalanced puts if
counters were wrongly incremented
- [Phase 5] `__vga_put()` (lines 378-381): confirmed `>0` guards exist,
preventing counter underflow
- [Phase 5] `vga_get()` (lines 434-483): confirmed `-ENODEV` is the only
error possible for uninterruptible calls
- [Phase 6] Code unchanged in the "lock" handler since v2.6.32; should
apply cleanly to all stable trees
- [Phase 7] PCI/VGA arbiter is used by display servers on multi-GPU
systems
- UNVERIFIED: Could not access lore to check full patch series context
or reviewer feedback
**YES**
drivers/pci/vgaarb.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index d9383bf541e7c..22b2b6ebdefdb 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -1134,6 +1134,7 @@ static ssize_t vga_arb_write(struct file *file, const char __user *buf,
char kbuf[64], *curr_pos;
size_t remaining = count;
+ int err;
int ret_val;
int i;
@@ -1165,7 +1166,11 @@ static ssize_t vga_arb_write(struct file *file, const char __user *buf,
goto done;
}
- vga_get_uninterruptible(pdev, io_state);
+ err = vga_get_uninterruptible(pdev, io_state);
+ if (err) {
+ ret_val = err;
+ goto done;
+ }
/* Update the client's locks lists */
for (i = 0; i < MAX_USER_CARDS; i++) {
--
2.53.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH AUTOSEL 6.18] PCI: hv: Set default NUMA node to 0 for devices without affinity info
[not found] <20260420132314.1023554-1-sashal@kernel.org>
` (3 preceding siblings ...)
2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-5.10] PCI/VGA: Pass vga_get_uninterruptible() errors to userspace Sasha Levin
@ 2026-04-20 13:21 ` Sasha Levin
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
To: patches, stable
Cc: Long Li, Michael Kelley, Wei Liu, Sasha Levin, kys, haiyangz,
decui, lpieralisi, kwilczynski, mani, bhelgaas, mikelley,
linux-hyperv, linux-pci, linux-kernel
From: Long Li <longli@microsoft.com>
[ Upstream commit 7b3b1e5a87b2f5e35c52b5386d7c327be869454f ]
When hv_pci_assign_numa_node() processes a device that does not have
HV_PCI_DEVICE_FLAG_NUMA_AFFINITY set or has an out-of-range
virtual_numa_node, the device NUMA node is left unset. On x86_64,
the uninitialized default happens to be 0, but on ARM64 it is
NUMA_NO_NODE (-1).
Tests show that when no NUMA information is available from the Hyper-V
host, devices perform best when assigned to node 0. With NUMA_NO_NODE
the kernel may spread work across NUMA nodes, which degrades
performance on Hyper-V, particularly for high-throughput devices like
MANA.
Always set the device NUMA node to 0 before the conditional NUMA
affinity check, so that devices get a performant default when the host
provides no NUMA information, and behavior is consistent on both
x86_64 and ARM64.
Fixes: 999dd956d838 ("PCI: hv: Add support for protocol 1.3 and support PCI_BUS_RELATIONS2")
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Error: Failed to generate final synthesis
drivers/pci/controller/pci-hyperv.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 146b43981b278..28b1572974879 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2486,6 +2486,14 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
if (!hv_dev)
continue;
+ /*
+ * If the Hyper-V host doesn't provide a NUMA node for the
+ * device, default to node 0. With NUMA_NO_NODE the kernel
+ * may spread work across NUMA nodes, which degrades
+ * performance on Hyper-V.
+ */
+ set_dev_node(&dev->dev, 0);
+
if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY &&
hv_dev->desc.virtual_numa_node < num_possible_nodes())
/*
--
2.53.0
^ permalink raw reply related [flat|nested] 5+ messages in thread