* [PATCH v21 0/4] [PCI] Error recovery for vfio-pci devices on s390x
@ 2026-06-30 16:48 Farhan Ali
2026-06-30 16:48 ` [PATCH v21 1/4] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Farhan Ali @ 2026-06-30 16:48 UTC (permalink / raw)
To: linux-s390, linux-kernel, linux-pci
Cc: helgaas, alex, alifm, schnelle, mjrosato
Hi Bjorn,
This patch set includes only the PCI patches of the original series for
error recovery for vfio-pci devices on s390x [1]. Breaking up the patch
series into PCI and VFIO only patches to make merging easier based on
discussion with Alex [2].
Thanks
Farhan
[1] https://lore.kernel.org/all/20260520171113.1111-1-alifm@linux.ibm.com/
[2] https://lore.kernel.org/all/20260602163344.1eda12d2@shazbot.org/
ChangeLog
---------
v20 https://lore.kernel.org/all/20260622171840.1618-1-alifm@linux.ibm.com/
v20 -> v21
- Amend commit message to include Fixes tag and cc stable (patch 4).
- Rebase on 7.2-rc1.
v19 https://lore.kernel.org/all/20260615183524.2880-1-alifm@linux.ibm.com/
v19 -> v20
- Unconditionally enable Memory bit while restoring MSI-X (patch 4).
Fixes an issue found with sashiko.
v18 https://lore.kernel.org/all/20260603181647.2215-1-alifm@linux.ibm.com/
v18 -> v19
- Move config space accessible check to pcie_flr() function (based on
discussion of Sashiko review)
- Fix a gap in MSI-X restoration (patch 4).
- Rebase on 7.1-rc7
v17 -> v18
- Rebase on 7.1-rc6.
Farhan Ali (4):
PCI: Allow per function PCI slots to fix slot reset on s390
PCI: Avoid saving config space state if inaccessible
PCI: Fail FLR when config space is inaccessible
PCI/MSI: Enable memory decoding before restoring MSI-X messages
drivers/pci/hotplug/rpaphp_slot.c | 2 +-
drivers/pci/msi/msi.c | 10 ++++++++++
drivers/pci/pci.c | 32 ++++++++++++++++++++++++++++--
drivers/pci/slot.c | 33 +++++++++++++++++++++++--------
include/linux/pci.h | 8 ++++++--
5 files changed, 72 insertions(+), 13 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH v21 1/4] PCI: Allow per function PCI slots to fix slot reset on s390 2026-06-30 16:48 [PATCH v21 0/4] [PCI] Error recovery for vfio-pci devices on s390x Farhan Ali @ 2026-06-30 16:48 ` Farhan Ali 2026-06-30 17:10 ` sashiko-bot 2026-06-30 16:48 ` [PATCH v21 2/4] PCI: Avoid saving config space state if inaccessible Farhan Ali ` (2 subsequent siblings) 3 siblings, 1 reply; 9+ messages in thread From: Farhan Ali @ 2026-06-30 16:48 UTC (permalink / raw) To: linux-s390, linux-kernel, linux-pci Cc: helgaas, alex, alifm, schnelle, mjrosato, stable On s390 systems, which use a machine level hypervisor, PCI devices are always accessed through a form of PCI pass-through which fundamentally operates on a per PCI function granularity. This is also reflected in the s390 PCI hotplug driver which creates hotplug slots for individual PCI functions. Its reset_slot() function, which is a wrapper for zpci_hot_reset_device(), thus also resets individual functions. Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object to multifunction devices. This approach worked fine on s390 systems that only exposed virtual functions as individual PCI domains to the operating system. Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions") s390 supports exposing the topology of multifunction PCI devices by grouping them in a shared PCI domain. This creates a problem when resetting a function through the hotplug driver's slot_reset() interface. When attempting to reset a function through the hotplug driver, the shared slot assignment causes the wrong function to be reset instead of the intended one. It also leaks memory as we do create a pci_slot object for the function, but don't correctly free it in pci_slot_release(). Add a flag for struct pci_slot to allow per function PCI slots for functions managed through a hypervisor, which exposes individual PCI functions while retaining the topology. Since we can use all 8 bits for slot 'number' (for ARI devices), change slot 'number' u16 to account for special values -1 and PCI_SLOT_ALL_DEVICES. Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions") Cc: stable@vger.kernel.org Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Farhan Ali <alifm@linux.ibm.com> --- drivers/pci/hotplug/rpaphp_slot.c | 2 +- drivers/pci/pci.c | 5 +++-- drivers/pci/slot.c | 33 +++++++++++++++++++++++-------- include/linux/pci.h | 8 ++++++-- 4 files changed, 35 insertions(+), 13 deletions(-) diff --git a/drivers/pci/hotplug/rpaphp_slot.c b/drivers/pci/hotplug/rpaphp_slot.c index 67362e5b9971..92eabf5f61b9 100644 --- a/drivers/pci/hotplug/rpaphp_slot.c +++ b/drivers/pci/hotplug/rpaphp_slot.c @@ -84,7 +84,7 @@ int rpaphp_register_slot(struct slot *slot) struct hotplug_slot *php_slot = &slot->hotplug_slot; u32 my_index; int retval; - int slotno = -1; + int slotno = PCI_SLOT_PLACEHOLDER; dbg("%s registering slot:path[%pOF] index[%x], name[%s] pdomain[%x] type[%d]\n", __func__, slot->dn, slot->index, slot->name, diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 77b17b13ee61..350bae907ebf 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4897,8 +4897,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe) static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe) { - if (dev->multifunction || dev->subordinate || !dev->slot || - dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET) + if (dev->subordinate || !dev->slot || + dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET || + (dev->multifunction && !dev->slot->per_func_slot)) return -ENOTTY; return pci_reset_hotplug_slot(dev->slot->hotplug, probe); diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c index 6d5cd37bfb1e..894d6213ed30 100644 --- a/drivers/pci/slot.c +++ b/drivers/pci/slot.c @@ -37,7 +37,7 @@ static const struct sysfs_ops pci_slot_sysfs_ops = { static ssize_t address_read_file(struct pci_slot *slot, char *buf) { - if (slot->number == 0xff) + if (slot->number == (u16)PCI_SLOT_PLACEHOLDER) return sysfs_emit(buf, "%04x:%02x\n", pci_domain_nr(slot->bus), slot->bus->number); @@ -72,6 +72,23 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf) return bus_speed_read(slot->bus->cur_bus_speed, buf); } +static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot) +{ + if (slot->per_func_slot) + return dev->devfn == slot->number; + + return slot->number == PCI_SLOT_ALL_DEVICES || + PCI_SLOT(dev->devfn) == slot->number; +} + +static bool pci_slot_enabled_per_func(void) +{ + if (IS_ENABLED(CONFIG_S390)) + return true; + + return false; +} + static void pci_slot_release(struct kobject *kobj) { struct pci_dev *dev; @@ -82,8 +99,7 @@ static void pci_slot_release(struct kobject *kobj) down_read(&pci_bus_sem); list_for_each_entry(dev, &slot->bus->devices, bus_list) - if (slot->number == PCI_SLOT_ALL_DEVICES || - PCI_SLOT(dev->devfn) == slot->number) + if (pci_dev_matches_slot(dev, slot)) dev->slot = NULL; up_read(&pci_bus_sem); @@ -187,8 +203,7 @@ void pci_dev_assign_slot(struct pci_dev *dev) mutex_lock(&pci_slot_mutex); list_for_each_entry(slot, &dev->bus->slots, list) - if (slot->number == PCI_SLOT_ALL_DEVICES || - PCI_SLOT(dev->devfn) == slot->number) + if (pci_dev_matches_slot(dev, slot)) dev->slot = slot; mutex_unlock(&pci_slot_mutex); } @@ -267,7 +282,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr, mutex_lock(&pci_slot_mutex); - if (slot_nr == -1) + if (slot_nr == PCI_SLOT_PLACEHOLDER) goto placeholder; /* @@ -298,6 +313,9 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr, slot->bus = pci_bus_get(parent); slot->number = slot_nr; + if (pci_slot_enabled_per_func()) + slot->per_func_slot = 1; + slot->kobj.kset = pci_slots_kset; slot_name = make_slot_name(name); @@ -318,8 +336,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr, down_read(&pci_bus_sem); list_for_each_entry(dev, &parent->devices, bus_list) - if (slot_nr == PCI_SLOT_ALL_DEVICES || - PCI_SLOT(dev->devfn) == slot_nr) + if (pci_dev_matches_slot(dev, slot)) dev->slot = slot; up_read(&pci_bus_sem); diff --git a/include/linux/pci.h b/include/linux/pci.h index ebb5b9d76360..b6e20616e17f 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -79,14 +79,18 @@ * and, if ARI Forwarding is enabled, functions may appear to be on multiple * devices. */ -#define PCI_SLOT_ALL_DEVICES 0xfe +#define PCI_SLOT_ALL_DEVICES 0xfeff + +/* Used to identify a slot as a placeholder */ +#define PCI_SLOT_PLACEHOLDER -1 /* pci_slot represents a physical slot */ struct pci_slot { struct pci_bus *bus; /* Bus this slot is on */ struct list_head list; /* Node in list of slots */ struct hotplug_slot *hotplug; /* Hotplug info (move here) */ - unsigned char number; /* Device nr, or PCI_SLOT_ALL_DEVICES */ + u16 number; /* Device nr, or PCI_SLOT_ALL_DEVICES */ + unsigned int per_func_slot:1; /* Allow per function slot */ struct kobject kobj; }; -- 2.43.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v21 1/4] PCI: Allow per function PCI slots to fix slot reset on s390 2026-06-30 16:48 ` [PATCH v21 1/4] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali @ 2026-06-30 17:10 ` sashiko-bot 0 siblings, 0 replies; 9+ messages in thread From: sashiko-bot @ 2026-06-30 17:10 UTC (permalink / raw) To: Farhan Ali Cc: Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-s390, linux-pci Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider: New issues: - [High] Unconditional enablement of `per_func_slot` on S390 breaks standard PCI hotplug (e.g., pciehp, shpchp) slot matching and resets. Pre-existing issues: - [High] Lockless access to `dev->slot` in `pci_dev_reset_slot_function` can lead to Use-After-Free if a hotplug driver is concurrently unbound. -- commit e76ff052218a061a873ef34162524387e5e10c78 Author: Farhan Ali <alifm@linux.ibm.com> Subject: PCI: Allow per function PCI slots to fix slot reset on s390 This commit addresses an issue on s390 systems where the hotplug driver creates slots for individual PCI functions. It introduces a per_func_slot flag to struct pci_slot to allow per-function PCI slots and changes the slot number to u16 to accommodate special placeholder and all-devices values. Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions") Cc: stable@vger.kernel.org > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 77b17b13ee615..350bae907ebf8 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4897,8 +4897,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe) > > static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe) > { > - if (dev->multifunction || dev->subordinate || !dev->slot || > - dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET) > + if (dev->subordinate || !dev->slot || > + dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET || > + (dev->multifunction && !dev->slot->per_func_slot)) [Severity: High] This is a pre-existing issue, but does this lockless access to dev->slot risk a use-after-free if a hotplug driver is concurrently unbound? The patch expands this access by adding the check for !dev->slot->per_func_slot. If a concurrent sysfs operation unbinds the pciehp service driver from the parent bridge, it takes the device lock on the bridge and executes: pci_hp_deregister() -> pci_destroy_slot() This invokes pci_slot_release(), clearing dev->slot = NULL under down_read(&pci_bus_sem) and freeing the slot memory. Since pci_dev_reset_slot_function() does not hold pci_bus_sem or pci_slot_mutex, could it check !dev->slot, get preempted, and then dereference freed memory upon resuming to evaluate dev->slot->per_func_slot? > return -ENOTTY; > > return pci_reset_hotplug_slot(dev->slot->hotplug, probe); [ ... ] > diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c > index 6d5cd37bfb1e7..894d6213ed306 100644 > --- a/drivers/pci/slot.c > +++ b/drivers/pci/slot.c [ ... ] > @@ -72,6 +72,23 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf) > return bus_speed_read(slot->bus->cur_bus_speed, buf); > } > > +static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot) > +{ > + if (slot->per_func_slot) > + return dev->devfn == slot->number; [Severity: High] Does enabling per_func_slot unconditionally on S390 break standard PCI hotplug slot matching? When standard PCI hotplug drivers (like pciehp) create a slot on an S390 system, pci_slot_enabled_per_func() globally forces per_func_slot to 1. This preempts the wildcard checking logic in pci_dev_matches_slot(). For drivers creating bus-wide slots using PCI_SLOT_ALL_DEVICES (0xfeff), this rigid comparison forces dev->devfn == 0xfeff, which will always fail since devfn is an 8-bit value. Additionally, for drivers passing a standard slot number N, pci_dev_matches_slot() now compares it directly against devfn instead of calculating PCI_SLOT(dev->devfn). A slot number of 1 will match Device 0, Function 1 instead of Device 1, Functions 0-7. Should per_func_slot be an explicitly opted-in property provided by the specific hotplug driver, rather than an architecture-wide global override? > + > + return slot->number == PCI_SLOT_ALL_DEVICES || > + PCI_SLOT(dev->devfn) == slot->number; > +} > + > +static bool pci_slot_enabled_per_func(void) > +{ > + if (IS_ENABLED(CONFIG_S390)) > + return true; > + > + return false; > +} > + -- Sashiko AI review · https://sashiko.dev/#/patchset/20260630164807.643-1-alifm@linux.ibm.com?part=1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v21 2/4] PCI: Avoid saving config space state if inaccessible 2026-06-30 16:48 [PATCH v21 0/4] [PCI] Error recovery for vfio-pci devices on s390x Farhan Ali 2026-06-30 16:48 ` [PATCH v21 1/4] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali @ 2026-06-30 16:48 ` Farhan Ali 2026-06-30 17:07 ` sashiko-bot 2026-06-30 16:48 ` [PATCH v21 3/4] PCI: Fail FLR when config space is inaccessible Farhan Ali 2026-06-30 16:48 ` [PATCH v21 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages Farhan Ali 3 siblings, 1 reply; 9+ messages in thread From: Farhan Ali @ 2026-06-30 16:48 UTC (permalink / raw) To: linux-s390, linux-kernel, linux-pci Cc: helgaas, alex, alifm, schnelle, mjrosato, Bjorn Helgaas The current reset process saves the device's config space state before reset and restores it afterward. However errors may occur unexpectedly and it may then be impossible to save config space because the device may be inaccessible (e.g. DPC). This results in saving invalid values that get written back to the device during state restoration. With a reset we want to recover/restore the device into a functional state. So avoid saving the state of the config space when the device config space is inaccessible. Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Farhan Ali <alifm@linux.ibm.com> --- drivers/pci/pci.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 350bae907ebf..e8d7de77241a 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -713,6 +713,27 @@ u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec) } EXPORT_SYMBOL_GPL(pci_find_dvsec_capability); +static bool pci_dev_config_accessible(struct pci_dev *dev, char *msg) +{ + u32 val; + + /* + * If device's config space is inaccessible it can return ~0 for + * any reads. Since VFs can also return ~0 for Device and Vendor ID + * check Command and Status registers. Note that this is racy + * because the device may become inaccessible partway through + * next access. + */ + pci_read_config_dword(dev, PCI_COMMAND, &val); + if (PCI_POSSIBLE_ERROR(val)) { + pci_warn(dev, "Device config space inaccessible; unable to %s\n", + msg); + return false; + } + + return true; +} + /** * pci_find_parent_resource - return resource region of parent bus of given * region @@ -5059,6 +5080,9 @@ static void pci_dev_save_and_disable(struct pci_dev *dev) */ pci_set_power_state(dev, PCI_D0); + if (!pci_dev_config_accessible(dev, "save state")) + return; + pci_save_state(dev); /* * Disable the device by clearing the Command register, except for -- 2.43.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v21 2/4] PCI: Avoid saving config space state if inaccessible 2026-06-30 16:48 ` [PATCH v21 2/4] PCI: Avoid saving config space state if inaccessible Farhan Ali @ 2026-06-30 17:07 ` sashiko-bot 0 siblings, 0 replies; 9+ messages in thread From: sashiko-bot @ 2026-06-30 17:07 UTC (permalink / raw) To: Farhan Ali Cc: Vasily Gorbik, linux-pci, Alexander Gordeev, Heiko Carstens, linux-s390 Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider: - [High] pci_dev_save_and_disable() skips disabling the device if config space is momentarily inaccessible, potentially leaving DMA and interrupts enabled. - [Low] String literal passed to a non-const `char *` pointer in `pci_dev_config_accessible()`. -- commit 067c10f20c7b3c97acf8c9886f6586c07bea7142 Author: Farhan Ali <alifm@linux.ibm.com> PCI: Avoid saving config space state if inaccessible This commit avoids saving the device's config space state during a reset if the device config space is temporarily inaccessible, preventing invalid values from being written back later. > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -713,6 +713,27 @@ u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec) > } > EXPORT_SYMBOL_GPL(pci_find_dvsec_capability); > > +static bool pci_dev_config_accessible(struct pci_dev *dev, char *msg) [Severity: Low] This isn't a bug, but should the msg parameter be marked const? Since a string literal ("save state") is passed to this function, keeping it as a non-const pointer violates const-correctness and might trigger compiler warnings. > +{ > + u32 val; > + [ ... ] > @@ -5059,6 +5080,9 @@ static void pci_dev_save_and_disable(struct pci_dev *dev) > */ > pci_set_power_state(dev, PCI_D0); > > + if (!pci_dev_config_accessible(dev, "save state")) > + return; > + [Severity: High] Does this early return skip disabling the device entirely if the config space is momentarily inaccessible? If the device is only transiently inaccessible and recovers immediately after the pci_dev_config_accessible() check, this early return skips the subsequent pci_write_config_word() that clears the Command register. Could this leave the device with Bus Master enabled, potentially allowing it to generate unexpected DMA requests or interrupt storms during the reset sequence? > pci_save_state(dev); > /* > * Disable the device by clearing the Command register, except for -- Sashiko AI review · https://sashiko.dev/#/patchset/20260630164807.643-1-alifm@linux.ibm.com?part=2 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v21 3/4] PCI: Fail FLR when config space is inaccessible 2026-06-30 16:48 [PATCH v21 0/4] [PCI] Error recovery for vfio-pci devices on s390x Farhan Ali 2026-06-30 16:48 ` [PATCH v21 1/4] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali 2026-06-30 16:48 ` [PATCH v21 2/4] PCI: Avoid saving config space state if inaccessible Farhan Ali @ 2026-06-30 16:48 ` Farhan Ali 2026-06-30 17:13 ` sashiko-bot 2026-06-30 16:48 ` [PATCH v21 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages Farhan Ali 3 siblings, 1 reply; 9+ messages in thread From: Farhan Ali @ 2026-06-30 16:48 UTC (permalink / raw) To: linux-s390, linux-kernel, linux-pci Cc: helgaas, alex, alifm, schnelle, mjrosato, Benjamin Block If a device is in an error state, then it's config space may not be accssible. Add additional check to validate if a device's config space is accessible before doing an FLR reset. Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Farhan Ali <alifm@linux.ibm.com> --- drivers/pci/pci.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index e8d7de77241a..9a9d021301c4 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4384,6 +4384,9 @@ int pcie_flr(struct pci_dev *dev) { int ret; + if (!pci_dev_config_accessible(dev, "FLR")) + return -ENOTTY; + if (!pci_wait_for_pending_transaction(dev)) pci_err(dev, "timed out waiting for pending transaction; performing function level reset anyway\n"); -- 2.43.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v21 3/4] PCI: Fail FLR when config space is inaccessible 2026-06-30 16:48 ` [PATCH v21 3/4] PCI: Fail FLR when config space is inaccessible Farhan Ali @ 2026-06-30 17:13 ` sashiko-bot 0 siblings, 0 replies; 9+ messages in thread From: sashiko-bot @ 2026-06-30 17:13 UTC (permalink / raw) To: Farhan Ali Cc: Vasily Gorbik, Alexander Gordeev, linux-pci, linux-s390, Heiko Carstens Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider: - [High] Un-ratelimited pci_warn() in pci_dev_config_accessible() allows an attacker to flood the host kernel log. - [High] Early bailout in pcie_flr() when config space is inaccessible skips the Function Level Reset and the subsequent wait, allowing device assignment to continue without reset, leaking state. -- commit ce2804b7f0c0409f73465726275859a70fa16be3 Author: Farhan Ali <alifm@linux.ibm.com> PCI: Fail FLR when config space is inaccessible This commit adds a check to validate if a device's config space is accessible before attempting a Function Level Reset (FLR). If the device is in an error state and inaccessible, the reset returns early. > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index e8d7de77241a6..9a9d021301c4e 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4384,6 +4384,9 @@ int pcie_flr(struct pci_dev *dev) > { > int ret; > > + if (!pci_dev_config_accessible(dev, "FLR")) > + return -ENOTTY; > + [Severity: High] If a device is momentarily unresponsive and this check returns -ENOTTY immediately, doesn't this skip the FLR entirely? Since VFIO's vfio_pci_core_enable() allows assignment to continue if a reset returns -ENOTTY (by setting vdev->reset_works = false), could a malicious guest intentionally put the device into an unresponsive state to bypass the reset and leak device state to the next user? The previous code path waited before issuing the FLR, but this early return removes the grace period. [Severity: High] Because this early return bypasses the previous delay loop, can a userspace process spam the VFIO_DEVICE_RESET ioctl in a tight loop? If the device's config space reads ~0, pci_dev_config_accessible() prints a warning: drivers/pci/pci.c:pci_dev_config_accessible() { ... if (PCI_POSSIBLE_ERROR(val)) { pci_warn(dev, "Device config space inaccessible; unable to %s\n", msg); return false; } ... } Since this warning is unconditional and not rate-limited, could this tight loop allow an attacker to trivially flood the host's kernel log? > if (!pci_wait_for_pending_transaction(dev)) > pci_err(dev, "timed out waiting for pending transaction; performing function level reset anyway\n"); > -- Sashiko AI review · https://sashiko.dev/#/patchset/20260630164807.643-1-alifm@linux.ibm.com?part=3 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v21 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages 2026-06-30 16:48 [PATCH v21 0/4] [PCI] Error recovery for vfio-pci devices on s390x Farhan Ali ` (2 preceding siblings ...) 2026-06-30 16:48 ` [PATCH v21 3/4] PCI: Fail FLR when config space is inaccessible Farhan Ali @ 2026-06-30 16:48 ` Farhan Ali 2026-06-30 17:12 ` sashiko-bot 3 siblings, 1 reply; 9+ messages in thread From: Farhan Ali @ 2026-06-30 16:48 UTC (permalink / raw) To: linux-s390, linux-kernel, linux-pci Cc: helgaas, alex, alifm, schnelle, mjrosato, stable, Thomas Gleixner The current MSI-X restoration path assumes the Command register Memory bit is enabled when writing MSI-X messages. But it's possible the last saved and restored state of a device may not have the Memory bit enabled, even if a device driver later enables Memory bit and MSI-X. Attempting to access Memory space without Memory bit enabled can lead to Unsupported Request (UR) from the device. Fix this by enabling Memory bit and restore it afterwards. Fixes: 41017f0cac92 ("[PATCH] PCI: MSI(X) save/restore for suspend/resume") Cc: stable@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Farhan Ali <alifm@linux.ibm.com> --- drivers/pci/msi/msi.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c index 209373c92e9e..79c7e84d314b 100644 --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -870,6 +870,7 @@ void __pci_restore_msix_state(struct pci_dev *dev) { struct msi_desc *entry; bool write_msg; + u16 cmd; if (!dev->msix_enabled) return; @@ -879,6 +880,14 @@ void __pci_restore_msix_state(struct pci_dev *dev) pci_msix_clear_and_set_ctrl(dev, 0, PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL); + /* + * The restored device state may not have Memory decoding enabled + * in the Command register. Since the MSI-X was enabled for the + * device, enable Memory decoding before restoring MSI-X. + */ + pci_read_config_word(dev, PCI_COMMAND, &cmd); + pci_write_config_word(dev, PCI_COMMAND, cmd | PCI_COMMAND_MEMORY); + write_msg = arch_restore_msi_irqs(dev); scoped_guard (msi_descs_lock, &dev->dev) { @@ -889,6 +898,7 @@ void __pci_restore_msix_state(struct pci_dev *dev) } } + pci_write_config_word(dev, PCI_COMMAND, cmd); pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); } -- 2.43.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v21 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages 2026-06-30 16:48 ` [PATCH v21 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages Farhan Ali @ 2026-06-30 17:12 ` sashiko-bot 0 siblings, 0 replies; 9+ messages in thread From: sashiko-bot @ 2026-06-30 17:12 UTC (permalink / raw) To: Farhan Ali Cc: Alexander Gordeev, linux-s390, linux-pci, Vasily Gorbik, Heiko Carstens Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: - [Medium] Blindly restoring a stale PCI_COMMAND value introduces a TOCTOU race window that can erase concurrent modifications to the register. -- commit ff043ecd9c4375492e235cd366b469432d15d4f3 Author: Farhan Ali <alifm@linux.ibm.com> PCI/MSI: Enable memory decoding before restoring MSI-X messages This commit modifies the MSI-X restoration path to temporarily enable the Command register Memory bit before writing MSI-X messages. This prevents Unsupported Request (UR) errors if the saved device state did not have memory decoding enabled. > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c > index 209373c92e9e1..79c7e84d314b7 100644 > --- a/drivers/pci/msi/msi.c > +++ b/drivers/pci/msi/msi.c > @@ -870,6 +870,7 @@ void __pci_restore_msix_state(struct pci_dev *dev) > { > struct msi_desc *entry; > bool write_msg; > + u16 cmd; > > if (!dev->msix_enabled) > return; [ ... ] > @@ -879,6 +880,14 @@ void __pci_restore_msix_state(struct pci_dev *dev) > pci_msix_clear_and_set_ctrl(dev, 0, > PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL); > > + /* > + * The restored device state may not have Memory decoding enabled > + * in the Command register. Since the MSI-X was enabled for the > + * device, enable Memory decoding before restoring MSI-X. > + */ > + pci_read_config_word(dev, PCI_COMMAND, &cmd); > + pci_write_config_word(dev, PCI_COMMAND, cmd | PCI_COMMAND_MEMORY); > + > write_msg = arch_restore_msi_irqs(dev); > > scoped_guard (msi_descs_lock, &dev->dev) { > @@ -889,6 +898,7 @@ void __pci_restore_msix_state(struct pci_dev *dev) > } > } > > + pci_write_config_word(dev, PCI_COMMAND, cmd); [Severity: Medium] Does unconditionally restoring the stale cmd value here introduce a race condition? The function reads the PCI_COMMAND register into cmd, enables the memory bit, and then enters a potentially sleeping section by acquiring msi_descs_lock. If another thread or subsystem modifies PCI_COMMAND (such as changing Bus Mastering or INTx disable) while this lock is held, writing the stale cmd value back will silently overwrite and lose those concurrent modifications. Additionally, if PCI_COMMAND_MEMORY was already enabled initially, does the code needlessly expose the device to this race instead of skipping the register modification entirely? > pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); > } -- Sashiko AI review · https://sashiko.dev/#/patchset/20260630164807.643-1-alifm@linux.ibm.com?part=4 ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-06-30 17:13 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-30 16:48 [PATCH v21 0/4] [PCI] Error recovery for vfio-pci devices on s390x Farhan Ali 2026-06-30 16:48 ` [PATCH v21 1/4] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali 2026-06-30 17:10 ` sashiko-bot 2026-06-30 16:48 ` [PATCH v21 2/4] PCI: Avoid saving config space state if inaccessible Farhan Ali 2026-06-30 17:07 ` sashiko-bot 2026-06-30 16:48 ` [PATCH v21 3/4] PCI: Fail FLR when config space is inaccessible Farhan Ali 2026-06-30 17:13 ` sashiko-bot 2026-06-30 16:48 ` [PATCH v21 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages Farhan Ali 2026-06-30 17:12 ` sashiko-bot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox