* [PATCH v19 0/4] [PCI] Error recovery for vfio-pci devices on s390x
@ 2026-06-15 18:35 Farhan Ali
2026-06-15 18:35 ` [PATCH v19 1/4] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Farhan Ali @ 2026-06-15 18:35 UTC (permalink / raw)
To: linux-s390, linux-kernel, linux-pci
Cc: helgaas, alex, alifm, schnelle, mjrosato
Hi Bjorn,
This patch set includes only the PCI patches of the original series for
error recovery for vfio-pci devices on s390x [1]. Breaking up the patch
series into PCI and VFIO only patches to make merging easier based on
discussion with Alex [2].
Thanks
Farhan
[1] https://lore.kernel.org/all/20260520171113.1111-1-alifm@linux.ibm.com/
[2] https://lore.kernel.org/all/20260602163344.1eda12d2@shazbot.org/
ChangeLog
---------
v18 https://lore.kernel.org/all/20260603181647.2215-1-alifm@linux.ibm.com/
v18 -> v19
- Move config space accessible check to pcie_flr() function (based on
discussion of Sashiko review)
- Fix a gap in MSI-X restoration (patch 4).
- Rebase on 7.1-rc7
v17 -> v18
- Rebase on 7.1-rc6.
Farhan Ali (4):
PCI: Allow per function PCI slots to fix slot reset on s390
PCI: Avoid saving config space state if inaccessible
PCI: Fail FLR when config space is inaccessible
PCI/MSI: Enable memory decoding before restoring MSI-X messages
drivers/pci/hotplug/rpaphp_slot.c | 2 +-
drivers/pci/msi/msi.c | 9 +++++++++
drivers/pci/pci.c | 32 ++++++++++++++++++++++++++++--
drivers/pci/slot.c | 33 +++++++++++++++++++++++--------
include/linux/pci.h | 8 ++++++--
5 files changed, 71 insertions(+), 13 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v19 1/4] PCI: Allow per function PCI slots to fix slot reset on s390
2026-06-15 18:35 [PATCH v19 0/4] [PCI] Error recovery for vfio-pci devices on s390x Farhan Ali
@ 2026-06-15 18:35 ` Farhan Ali
2026-06-15 18:50 ` sashiko-bot
2026-06-15 18:35 ` [PATCH v19 2/4] PCI: Avoid saving config space state if inaccessible Farhan Ali
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Farhan Ali @ 2026-06-15 18:35 UTC (permalink / raw)
To: linux-s390, linux-kernel, linux-pci
Cc: helgaas, alex, alifm, schnelle, mjrosato, stable
On s390 systems, which use a machine level hypervisor, PCI devices are
always accessed through a form of PCI pass-through which fundamentally
operates on a per PCI function granularity. This is also reflected in the
s390 PCI hotplug driver which creates hotplug slots for individual PCI
functions. Its reset_slot() function, which is a wrapper for
zpci_hot_reset_device(), thus also resets individual functions.
Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
to multifunction devices. This approach worked fine on s390 systems that
only exposed virtual functions as individual PCI domains to the operating
system. Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
s390 supports exposing the topology of multifunction PCI devices by
grouping them in a shared PCI domain. This creates a problem when resetting
a function through the hotplug driver's slot_reset() interface.
When attempting to reset a function through the hotplug driver, the shared
slot assignment causes the wrong function to be reset instead of the
intended one. It also leaks memory as we do create a pci_slot object for
the function, but don't correctly free it in pci_slot_release().
Add a flag for struct pci_slot to allow per function PCI slots for
functions managed through a hypervisor, which exposes individual PCI
functions while retaining the topology. Since we can use all 8 bits
for slot 'number' (for ARI devices), change slot 'number' u16 to
account for special values -1 and PCI_SLOT_ALL_DEVICES.
Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
Cc: stable@vger.kernel.org
Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
drivers/pci/hotplug/rpaphp_slot.c | 2 +-
drivers/pci/pci.c | 5 +++--
drivers/pci/slot.c | 33 +++++++++++++++++++++++--------
include/linux/pci.h | 8 ++++++--
4 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/drivers/pci/hotplug/rpaphp_slot.c b/drivers/pci/hotplug/rpaphp_slot.c
index 67362e5b9971..92eabf5f61b9 100644
--- a/drivers/pci/hotplug/rpaphp_slot.c
+++ b/drivers/pci/hotplug/rpaphp_slot.c
@@ -84,7 +84,7 @@ int rpaphp_register_slot(struct slot *slot)
struct hotplug_slot *php_slot = &slot->hotplug_slot;
u32 my_index;
int retval;
- int slotno = -1;
+ int slotno = PCI_SLOT_PLACEHOLDER;
dbg("%s registering slot:path[%pOF] index[%x], name[%s] pdomain[%x] type[%d]\n",
__func__, slot->dn, slot->index, slot->name,
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d34266651ad0..f5f8291482b0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4865,8 +4865,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
{
- if (dev->multifunction || dev->subordinate || !dev->slot ||
- dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
+ if (dev->subordinate || !dev->slot ||
+ dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
+ (dev->multifunction && !dev->slot->per_func_slot))
return -ENOTTY;
return pci_reset_hotplug_slot(dev->slot->hotplug, probe);
diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index 6d5cd37bfb1e..894d6213ed30 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -37,7 +37,7 @@ static const struct sysfs_ops pci_slot_sysfs_ops = {
static ssize_t address_read_file(struct pci_slot *slot, char *buf)
{
- if (slot->number == 0xff)
+ if (slot->number == (u16)PCI_SLOT_PLACEHOLDER)
return sysfs_emit(buf, "%04x:%02x\n",
pci_domain_nr(slot->bus),
slot->bus->number);
@@ -72,6 +72,23 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf)
return bus_speed_read(slot->bus->cur_bus_speed, buf);
}
+static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot)
+{
+ if (slot->per_func_slot)
+ return dev->devfn == slot->number;
+
+ return slot->number == PCI_SLOT_ALL_DEVICES ||
+ PCI_SLOT(dev->devfn) == slot->number;
+}
+
+static bool pci_slot_enabled_per_func(void)
+{
+ if (IS_ENABLED(CONFIG_S390))
+ return true;
+
+ return false;
+}
+
static void pci_slot_release(struct kobject *kobj)
{
struct pci_dev *dev;
@@ -82,8 +99,7 @@ static void pci_slot_release(struct kobject *kobj)
down_read(&pci_bus_sem);
list_for_each_entry(dev, &slot->bus->devices, bus_list)
- if (slot->number == PCI_SLOT_ALL_DEVICES ||
- PCI_SLOT(dev->devfn) == slot->number)
+ if (pci_dev_matches_slot(dev, slot))
dev->slot = NULL;
up_read(&pci_bus_sem);
@@ -187,8 +203,7 @@ void pci_dev_assign_slot(struct pci_dev *dev)
mutex_lock(&pci_slot_mutex);
list_for_each_entry(slot, &dev->bus->slots, list)
- if (slot->number == PCI_SLOT_ALL_DEVICES ||
- PCI_SLOT(dev->devfn) == slot->number)
+ if (pci_dev_matches_slot(dev, slot))
dev->slot = slot;
mutex_unlock(&pci_slot_mutex);
}
@@ -267,7 +282,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
mutex_lock(&pci_slot_mutex);
- if (slot_nr == -1)
+ if (slot_nr == PCI_SLOT_PLACEHOLDER)
goto placeholder;
/*
@@ -298,6 +313,9 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
slot->bus = pci_bus_get(parent);
slot->number = slot_nr;
+ if (pci_slot_enabled_per_func())
+ slot->per_func_slot = 1;
+
slot->kobj.kset = pci_slots_kset;
slot_name = make_slot_name(name);
@@ -318,8 +336,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
down_read(&pci_bus_sem);
list_for_each_entry(dev, &parent->devices, bus_list)
- if (slot_nr == PCI_SLOT_ALL_DEVICES ||
- PCI_SLOT(dev->devfn) == slot_nr)
+ if (pci_dev_matches_slot(dev, slot))
dev->slot = slot;
up_read(&pci_bus_sem);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2c4454583c11..d58982aa8730 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -78,14 +78,18 @@
* and, if ARI Forwarding is enabled, functions may appear to be on multiple
* devices.
*/
-#define PCI_SLOT_ALL_DEVICES 0xfe
+#define PCI_SLOT_ALL_DEVICES 0xfeff
+
+/* Used to identify a slot as a placeholder */
+#define PCI_SLOT_PLACEHOLDER -1
/* pci_slot represents a physical slot */
struct pci_slot {
struct pci_bus *bus; /* Bus this slot is on */
struct list_head list; /* Node in list of slots */
struct hotplug_slot *hotplug; /* Hotplug info (move here) */
- unsigned char number; /* Device nr, or PCI_SLOT_ALL_DEVICES */
+ u16 number; /* Device nr, or PCI_SLOT_ALL_DEVICES */
+ unsigned int per_func_slot:1; /* Allow per function slot */
struct kobject kobj;
};
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v19 2/4] PCI: Avoid saving config space state if inaccessible
2026-06-15 18:35 [PATCH v19 0/4] [PCI] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-06-15 18:35 ` [PATCH v19 1/4] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
@ 2026-06-15 18:35 ` Farhan Ali
2026-06-15 18:49 ` sashiko-bot
2026-06-15 18:35 ` [PATCH v19 3/4] PCI: Fail FLR when config space is inaccessible Farhan Ali
2026-06-15 18:35 ` [PATCH v19 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages Farhan Ali
3 siblings, 1 reply; 11+ messages in thread
From: Farhan Ali @ 2026-06-15 18:35 UTC (permalink / raw)
To: linux-s390, linux-kernel, linux-pci
Cc: helgaas, alex, alifm, schnelle, mjrosato, Bjorn Helgaas
The current reset process saves the device's config space state before
reset and restores it afterward. However errors may occur unexpectedly and
it may then be impossible to save config space because the device may be
inaccessible (e.g. DPC). This results in saving invalid values that get
written back to the device during state restoration.
With a reset we want to recover/restore the device into a functional state.
So avoid saving the state of the config space when the device config space
is inaccessible.
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
drivers/pci/pci.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f5f8291482b0..973d23e41c48 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -722,6 +722,27 @@ u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec)
}
EXPORT_SYMBOL_GPL(pci_find_dvsec_capability);
+static bool pci_dev_config_accessible(struct pci_dev *dev, char *msg)
+{
+ u32 val;
+
+ /*
+ * If device's config space is inaccessible it can return ~0 for
+ * any reads. Since VFs can also return ~0 for Device and Vendor ID
+ * check Command and Status registers. Note that this is racy
+ * because the device may become inaccessible partway through
+ * next access.
+ */
+ pci_read_config_dword(dev, PCI_COMMAND, &val);
+ if (PCI_POSSIBLE_ERROR(val)) {
+ pci_warn(dev, "Device config space inaccessible; unable to %s\n",
+ msg);
+ return false;
+ }
+
+ return true;
+}
+
/**
* pci_find_parent_resource - return resource region of parent bus of given
* region
@@ -5027,6 +5048,9 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
*/
pci_set_power_state(dev, PCI_D0);
+ if (!pci_dev_config_accessible(dev, "save state"))
+ return;
+
pci_save_state(dev);
/*
* Disable the device by clearing the Command register, except for
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v19 3/4] PCI: Fail FLR when config space is inaccessible
2026-06-15 18:35 [PATCH v19 0/4] [PCI] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-06-15 18:35 ` [PATCH v19 1/4] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
2026-06-15 18:35 ` [PATCH v19 2/4] PCI: Avoid saving config space state if inaccessible Farhan Ali
@ 2026-06-15 18:35 ` Farhan Ali
2026-06-15 18:43 ` sashiko-bot
2026-06-15 18:35 ` [PATCH v19 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages Farhan Ali
3 siblings, 1 reply; 11+ messages in thread
From: Farhan Ali @ 2026-06-15 18:35 UTC (permalink / raw)
To: linux-s390, linux-kernel, linux-pci
Cc: helgaas, alex, alifm, schnelle, mjrosato, Benjamin Block
If a device is in an error state, then it's config space may not be
accssible. Add additional check to validate if a device's config space is
accessible before doing an FLR reset.
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
drivers/pci/pci.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 973d23e41c48..521a4bb189d7 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4352,6 +4352,9 @@ int pcie_flr(struct pci_dev *dev)
{
int ret;
+ if (!pci_dev_config_accessible(dev, "FLR"))
+ return -ENOTTY;
+
if (!pci_wait_for_pending_transaction(dev))
pci_err(dev, "timed out waiting for pending transaction; performing function level reset anyway\n");
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v19 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages
2026-06-15 18:35 [PATCH v19 0/4] [PCI] Error recovery for vfio-pci devices on s390x Farhan Ali
` (2 preceding siblings ...)
2026-06-15 18:35 ` [PATCH v19 3/4] PCI: Fail FLR when config space is inaccessible Farhan Ali
@ 2026-06-15 18:35 ` Farhan Ali
2026-06-15 18:48 ` sashiko-bot
2026-06-15 18:59 ` Niklas Schnelle
3 siblings, 2 replies; 11+ messages in thread
From: Farhan Ali @ 2026-06-15 18:35 UTC (permalink / raw)
To: linux-s390, linux-kernel, linux-pci
Cc: helgaas, alex, alifm, schnelle, mjrosato
The current MSI-X restoration path assumes the Command register Memory bit
is enabled when writing MSI-X messages. But its possible the last saved and
restored state of device may not have the Memory bit enabled, even if a
device driver later enables Memory bit and MSI-X. Attempting to access
Memory space without Memory bit enabled can lead to Unsupported Request
(UR) from the device. Fix this by enabling Memory bit if we write MSI-X
messages, and restore it afterwards.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
drivers/pci/msi/msi.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 81d24a270a79..d8d3c8a911ac 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -874,6 +874,7 @@ void __pci_restore_msix_state(struct pci_dev *dev)
{
struct msi_desc *entry;
bool write_msg;
+ u16 cmd;
if (!dev->msix_enabled)
return;
@@ -884,6 +885,11 @@ void __pci_restore_msix_state(struct pci_dev *dev)
PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
write_msg = arch_restore_msi_irqs(dev);
+ if (write_msg) {
+ pci_read_config_word(dev, PCI_COMMAND, &cmd);
+ pci_write_config_word(dev, PCI_COMMAND,
+ cmd | PCI_COMMAND_MEMORY);
+ }
scoped_guard (msi_descs_lock, &dev->dev) {
msi_for_each_desc(entry, &dev->dev, MSI_DESC_ALL) {
@@ -893,6 +899,9 @@ void __pci_restore_msix_state(struct pci_dev *dev)
}
}
+ if (write_msg)
+ pci_write_config_word(dev, PCI_COMMAND, cmd);
+
pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v19 3/4] PCI: Fail FLR when config space is inaccessible
2026-06-15 18:35 ` [PATCH v19 3/4] PCI: Fail FLR when config space is inaccessible Farhan Ali
@ 2026-06-15 18:43 ` sashiko-bot
0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-06-15 18:43 UTC (permalink / raw)
To: Farhan Ali
Cc: linux-pci, linux-s390, Vasily Gorbik, Alexander Gordeev,
Heiko Carstens
> If a device is in an error state, then it's config space may not be
> accssible. Add additional check to validate if a device's config space is
> accessible before doing an FLR reset.
>
> Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Sashiko has reviewed this patch and found no issues. It looks great!
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260615183524.2880-1-alifm@linux.ibm.com?part=3
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v19 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages
2026-06-15 18:35 ` [PATCH v19 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages Farhan Ali
@ 2026-06-15 18:48 ` sashiko-bot
2026-06-15 18:59 ` Niklas Schnelle
1 sibling, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-06-15 18:48 UTC (permalink / raw)
To: Farhan Ali
Cc: linux-s390, Alexander Gordeev, Vasily Gorbik, linux-pci,
Heiko Carstens
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [High] Unvalidated PCI_COMMAND read can result in writing 0xFFFF back to the device, enabling fatal error reporting.
- [High] Blind restoration of PCI_COMMAND register clobbers concurrent modifications across the locked section (TOCTOU race).
Pre-existing issues:
- [High] Conditional enablement of PCI_COMMAND_MEMORY leaves pci_msix_write_vector_ctrl() exposed to Unsupported Request (UR) faults.
--
commit 9bf79a22431c8153c5aed81ed2a19f276bfd3940
Author: Farhan Ali <alifm@linux.ibm.com>
PCI/MSI: Enable memory decoding before restoring MSI-X messages
This commit attempts to fix Unsupported Request (UR) faults during MSI-X
restoration by temporarily enabling the Memory bit in the Command register
when writing MSI-X messages. The bit is enabled prior to restoration and
restored to its previous state afterwards.
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 81d24a270a795..d8d3c8a911ac0 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -874,6 +874,7 @@ void __pci_restore_msix_state(struct pci_dev *dev)
> {
> struct msi_desc *entry;
> bool write_msg;
> + u16 cmd;
>
> if (!dev->msix_enabled)
> return;
[ ... ]
> @@ -884,6 +885,11 @@ void __pci_restore_msix_state(struct pci_dev *dev)
> PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
>
> write_msg = arch_restore_msi_irqs(dev);
> + if (write_msg) {
> + pci_read_config_word(dev, PCI_COMMAND, &cmd);
> + pci_write_config_word(dev, PCI_COMMAND,
> + cmd | PCI_COMMAND_MEMORY);
> + }
[Severity: High]
Does pci_read_config_word() need its return value checked here?
If the read fails (e.g., due to a transient bus error or recovery state), the
PCI subsystem populates cmd with 0xFFFF (PCI_ERROR_RESPONSE). Computing
cmd | PCI_COMMAND_MEMORY would then write 0xFFFF back to the Command Register,
enabling all writable capabilities at once (including SERR# Enable), which
could trigger a fatal NMI or system panic.
> scoped_guard (msi_descs_lock, &dev->dev) {
> msi_for_each_desc(entry, &dev->dev, MSI_DESC_ALL) {
> if (write_msg)
> __pci_write_msi_msg(entry, &entry->msg);
> pci_msix_write_vector_ctrl(entry, entry->pci.msix_ctrl);
[Severity: High]
This is a pre-existing issue, but does this code leave the MSI-X table
exposed to the very Unsupported Request (UR) faults this patch tries to fix?
Since pci_msix_write_vector_ctrl() performs an MMIO write to the MSI-X
Vector Control register unconditionally for every MSI descriptor, what
happens if write_msg is false? The memory decoding bit won't be enabled
above, meaning this unconditional MMIO access could still trigger a UR fault.
> }
> }
>
> + if (write_msg)
> + pci_write_config_word(dev, PCI_COMMAND, cmd);
> +
[Severity: High]
Can this blindly overwrite the PCI_COMMAND register with a stale snapshot?
The cmd value was read before the scoped_guard() block. If concurrent
operations modify PCI_COMMAND (such as changing PCI_COMMAND_MASTER or
PCI_COMMAND_INTX_DISABLE via sysfs or other subsystem operations) while
the lock is held, rewriting the old cmd value here would wipe out those
concurrent changes.
Should we re-read the register here, mask out PCI_COMMAND_MEMORY if we
enabled it, and only restore that specific bit?
> pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260615183524.2880-1-alifm@linux.ibm.com?part=4
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v19 2/4] PCI: Avoid saving config space state if inaccessible
2026-06-15 18:35 ` [PATCH v19 2/4] PCI: Avoid saving config space state if inaccessible Farhan Ali
@ 2026-06-15 18:49 ` sashiko-bot
0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-06-15 18:49 UTC (permalink / raw)
To: Farhan Ali
Cc: linux-s390, Alexander Gordeev, Vasily Gorbik, Heiko Carstens,
linux-pci
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] The `msg` parameter in `pci_dev_config_accessible` should be declared as `const char *msg`.
--
commit a6a510b639eb621fca4f9673a8ce4e45c59a4efe
Author: Farhan Ali <alifm@linux.ibm.com>
PCI: Avoid saving config space state if inaccessible
The current reset process saves the device's config space state before
reset and restores it afterward. However errors may occur unexpectedly and
it may then be impossible to save config space because the device may be
inaccessible (e.g. DPC). This results in saving invalid values that get
written back to the device during state restoration.
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index d0c9f0166af5c..e71b814153921 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
[ ... ]
> @@ -722,6 +722,27 @@ u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec)
> }
> EXPORT_SYMBOL_GPL(pci_find_dvsec_capability);
>
> +static bool pci_dev_config_accessible(struct pci_dev *dev, char *msg)
[Severity: Low]
This isn't a bug, but should the msg parameter be declared as const char *?
The function is called with the string literal "save state" later in this
patch. Using a non-const pointer for string literals can trigger compiler
warnings.
> +{
> + u32 val;
> +
[ ... ]
> @@ -5027,6 +5048,9 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> */
> pci_set_power_state(dev, PCI_D0);
>
> + if (!pci_dev_config_accessible(dev, "save state"))
> + return;
> +
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260615183524.2880-1-alifm@linux.ibm.com?part=2
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v19 1/4] PCI: Allow per function PCI slots to fix slot reset on s390
2026-06-15 18:35 ` [PATCH v19 1/4] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
@ 2026-06-15 18:50 ` sashiko-bot
0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-06-15 18:50 UTC (permalink / raw)
To: Farhan Ali
Cc: linux-s390, Alexander Gordeev, Vasily Gorbik, linux-pci,
Heiko Carstens
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [High] Global enforcement of per_func_slot on S390 breaks standard generic PCI hotplug (e.g., pciehp) slot matching.
Pre-existing issues:
- [High] Pre-existing Race Conditions and Use-After-Free in PCI slot lifecycle and dev->slot management.
--
commit 39801ae4eb8c8847b623fb1ef810dd413b455ecc
Author: Farhan Ali <alifm@linux.ibm.com>
Subject: PCI: Allow per function PCI slots to fix slot reset on s390
This commit adds a per_func_slot flag to struct pci_slot to support
resetting individual PCI functions on s390 systems. It modifies the slot
matching and reset logic to account for this new flag, and adjusts the
slot number type to handle placeholder values.
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 8f7cfcc000901..d0c9f0166af5c 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4865,8 +4865,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
>
> static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
> {
> - if (dev->multifunction || dev->subordinate || !dev->slot ||
> - dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
> + if (dev->subordinate || !dev->slot ||
> + dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
> + (dev->multifunction && !dev->slot->per_func_slot))
[Severity: High]
This is a pre-existing issue, but does reading dev->slot->per_func_slot here
expose a use-after-free vulnerability?
In pci_dev_reset_slot_function(), dev->slot is dereferenced locklessly without
holding pci_bus_sem. If a concurrent hotplug removal triggers
pci_slot_release(), the slot could be freed.
Furthermore, pci_slot_release() iterates over bus->devices to clear dev->slot,
but this can miss devices currently undergoing probing that haven't been added
to the bus list yet, leaving a dangling pointer.
> return -ENOTTY;
>
> return pci_reset_hotplug_slot(dev->slot->hotplug, probe);
[ ... ]
> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
> index 6d5cd37bfb1e7..894d6213ed306 100644
> --- a/drivers/pci/slot.c
> +++ b/drivers/pci/slot.c
> @@ -72,6 +72,23 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf)
> return bus_speed_read(slot->bus->cur_bus_speed, buf);
> }
>
> +static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot)
> +{
> + if (slot->per_func_slot)
> + return dev->devfn == slot->number;
> +
[Severity: High]
Does this logic break generic hotplug drivers like pciehp on s390 systems?
Because pci_slot_enabled_per_func() unconditionally returns true for
CONFIG_S390, per_func_slot is enabled globally. When generic hotplug drivers
create slots using PCI_SLOT_ALL_DEVICES (0xfeff), slot->number is 0xfeff.
If slot->per_func_slot is true, this evaluates dev->devfn == 0xfeff before
checking for the PCI_SLOT_ALL_DEVICES wildcard. Since devfn is an 8-bit value,
this always evaluates to false, preventing downstream devices from matching
their hotplug slots.
> + return slot->number == PCI_SLOT_ALL_DEVICES ||
> + PCI_SLOT(dev->devfn) == slot->number;
> +}
> +
> +static bool pci_slot_enabled_per_func(void)
> +{
> + if (IS_ENABLED(CONFIG_S390))
> + return true;
> +
> + return false;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260615183524.2880-1-alifm@linux.ibm.com?part=1
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v19 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages
2026-06-15 18:35 ` [PATCH v19 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages Farhan Ali
2026-06-15 18:48 ` sashiko-bot
@ 2026-06-15 18:59 ` Niklas Schnelle
2026-06-15 22:17 ` Farhan Ali
1 sibling, 1 reply; 11+ messages in thread
From: Niklas Schnelle @ 2026-06-15 18:59 UTC (permalink / raw)
To: Farhan Ali, linux-s390, linux-kernel, linux-pci; +Cc: helgaas, alex, mjrosato
On Mon, 2026-06-15 at 11:35 -0700, Farhan Ali wrote:
> The current MSI-X restoration path assumes the Command register Memory bit
> is enabled when writing MSI-X messages. But its possible the last saved and
^ it's
> restored state of device may not have the Memory bit enabled, even if a
^a
> device driver later enables Memory bit and MSI-X. Attempting to access
> Memory space without Memory bit enabled can lead to Unsupported Request
> (UR) from the device. Fix this by enabling Memory bit if we write MSI-X
> messages, and restore it afterwards.
Don't we have the same issue in __pci_restore_msi_state()?
>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
> drivers/pci/msi/msi.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 81d24a270a79..d8d3c8a911ac 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -874,6 +874,7 @@ void __pci_restore_msix_state(struct pci_dev *dev)
> {
> struct msi_desc *entry;
> bool write_msg;
> + u16 cmd;
>
> if (!dev->msix_enabled)
> return;
> @@ -884,6 +885,11 @@ void __pci_restore_msix_state(struct pci_dev *dev)
> PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
>
> write_msg = arch_restore_msi_irqs(dev);
> + if (write_msg) {
> + pci_read_config_word(dev, PCI_COMMAND, &cmd);
Sashiko complains that the read cmd may be all 0xFF…F in case of
transient errors. Though I don't see how we'd handle that here without
a larger refactor.
> + pci_write_config_word(dev, PCI_COMMAND,
> + cmd | PCI_COMMAND_MEMORY);
> + }
>
> scoped_guard (msi_descs_lock, &dev->dev) {
> msi_for_each_desc(entry, &dev->dev, MSI_DESC_ALL) {
> @@ -893,6 +899,9 @@ void __pci_restore_msix_state(struct pci_dev *dev)
> }
> }
>
> + if (write_msg)
> + pci_write_config_word(dev, PCI_COMMAND, cmd);
I wonder if it may be safer to have the Command register reads/writes
within the msic_descs_lock critical section in case this gets executed
concurrently and so the restore above from one thread could slip
between the Memory bit enable and the beginning of the critical
section.
> +
> pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> }
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v19 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages
2026-06-15 18:59 ` Niklas Schnelle
@ 2026-06-15 22:17 ` Farhan Ali
0 siblings, 0 replies; 11+ messages in thread
From: Farhan Ali @ 2026-06-15 22:17 UTC (permalink / raw)
To: Niklas Schnelle, linux-s390, linux-kernel, linux-pci
Cc: helgaas, alex, mjrosato
On 6/15/2026 11:59 AM, Niklas Schnelle wrote:
> On Mon, 2026-06-15 at 11:35 -0700, Farhan Ali wrote:
>> The current MSI-X restoration path assumes the Command register Memory bit
>> is enabled when writing MSI-X messages. But its possible the last saved and
> ^ it's
>
>> restored state of device may not have the Memory bit enabled, even if a
> ^a
>
>> device driver later enables Memory bit and MSI-X. Attempting to access
>> Memory space without Memory bit enabled can lead to Unsupported Request
>> (UR) from the device. Fix this by enabling Memory bit if we write MSI-X
>> messages, and restore it afterwards.
> Don't we have the same issue in __pci_restore_msi_state()?
I think for MSI it's done through config space registers and so it
doesn't need to access MMIO.
>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>> drivers/pci/msi/msi.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
>> index 81d24a270a79..d8d3c8a911ac 100644
>> --- a/drivers/pci/msi/msi.c
>> +++ b/drivers/pci/msi/msi.c
>> @@ -874,6 +874,7 @@ void __pci_restore_msix_state(struct pci_dev *dev)
>> {
>> struct msi_desc *entry;
>> bool write_msg;
>> + u16 cmd;
>>
>> if (!dev->msix_enabled)
>> return;
>> @@ -884,6 +885,11 @@ void __pci_restore_msix_state(struct pci_dev *dev)
>> PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
>>
>> write_msg = arch_restore_msi_irqs(dev);
>> + if (write_msg) {
>> + pci_read_config_word(dev, PCI_COMMAND, &cmd);
> Sashiko complains that the read cmd may be all 0xFF…F in case of
> transient errors. Though I don't see how we'd handle that here without
> a larger refactor.
>
>> + pci_write_config_word(dev, PCI_COMMAND,
>> + cmd | PCI_COMMAND_MEMORY);
>> + }
>>
>> scoped_guard (msi_descs_lock, &dev->dev) {
>> msi_for_each_desc(entry, &dev->dev, MSI_DESC_ALL) {
>> @@ -893,6 +899,9 @@ void __pci_restore_msix_state(struct pci_dev *dev)
>> }
>> }
>>
>> + if (write_msg)
>> + pci_write_config_word(dev, PCI_COMMAND, cmd);
> I wonder if it may be safer to have the Command register reads/writes
> within the msic_descs_lock critical section in case this gets executed
> concurrently and so the restore above from one thread could slip
> between the Memory bit enable and the beginning of the critical
> section.
Hmm I am not sure if its necessary, AFAIU the msi_desc_lock is used to
protect the descriptor list. Other config space changes in this function
is also done outside of this critical section.
OTOH Sashiko does mention correctly pci_msix_write_vector_ctrl() can
happen even when write_msg is false and which I missed. But this got me
curious, as to why would we write the vector control for an MSI-X entry
if we are not writing the message (Address and Data). AFAICT the change
to write message based on arch_restore_msi_irq() was originally
introduced with 76ccc297018d ("x86/PCI: Expand the x86_msi_ops to have a
restore MSIs."). Prior to that we always restored the message and vector
control mask for an entry. So I am not sure if write_msg is false and we
don't write a message, why would we want to write the vector control
mask? If we have a valid reason to write the vector control, even though
we don't write the message then we should we enable the Memory bit
unconditionally? Thanks
Farhan
>
>> +
>> pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
>> }
>>
>>
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-06-15 22:17 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-15 18:35 [PATCH v19 0/4] [PCI] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-06-15 18:35 ` [PATCH v19 1/4] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
2026-06-15 18:50 ` sashiko-bot
2026-06-15 18:35 ` [PATCH v19 2/4] PCI: Avoid saving config space state if inaccessible Farhan Ali
2026-06-15 18:49 ` sashiko-bot
2026-06-15 18:35 ` [PATCH v19 3/4] PCI: Fail FLR when config space is inaccessible Farhan Ali
2026-06-15 18:43 ` sashiko-bot
2026-06-15 18:35 ` [PATCH v19 4/4] PCI/MSI: Enable memory decoding before restoring MSI-X messages Farhan Ali
2026-06-15 18:48 ` sashiko-bot
2026-06-15 18:59 ` Niklas Schnelle
2026-06-15 22:17 ` Farhan Ali
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.