* [PATCH v12 1/7] PCI: Allow per function PCI slots to fix slot reset on s390
2026-03-30 17:40 [PATCH v12 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
@ 2026-03-30 17:40 ` Farhan Ali
2026-03-30 17:40 ` [PATCH v12 2/7] PCI: Avoid saving config space state if inaccessible Farhan Ali
` (6 subsequent siblings)
7 siblings, 0 replies; 24+ messages in thread
From: Farhan Ali @ 2026-03-30 17:40 UTC (permalink / raw)
To: linux-s390, linux-kernel, linux-pci
Cc: helgaas, lukas, alex, clg, kbusch, alifm, schnelle, mjrosato,
stable
On s390 systems, which use a machine level hypervisor, PCI devices are
always accessed through a form of PCI pass-through which fundamentally
operates on a per PCI function granularity. This is also reflected in the
s390 PCI hotplug driver which creates hotplug slots for individual PCI
functions. Its reset_slot() function, which is a wrapper for
zpci_hot_reset_device(), thus also resets individual functions.
Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
to multifunction devices. This approach worked fine on s390 systems that
only exposed virtual functions as individual PCI domains to the operating
system. Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
s390 supports exposing the topology of multifunction PCI devices by
grouping them in a shared PCI domain. This creates a problem when resetting
a function through the hotplug driver's slot_reset() interface.
When attempting to reset a function through the hotplug driver, the shared
slot assignment causes the wrong function to be reset instead of the
intended one. It also leaks memory as we do create a pci_slot object for
the function, but don't correctly free it in pci_slot_release().
Add a flag for struct pci_slot to allow per function PCI slots for
functions managed through a hypervisor, which exposes individual PCI
functions while retaining the topology. Since we can use all 8 bits
for slot 'number' (for ARI devices), change slot 'number' u16 to
account for special values -1 and PCI_SLOT_ALL_DEVICES.
Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
Cc: stable@vger.kernel.org
Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
drivers/pci/hotplug/rpaphp_slot.c | 2 +-
drivers/pci/pci.c | 5 +++--
drivers/pci/slot.c | 33 +++++++++++++++++++++++--------
include/linux/pci.h | 8 ++++++--
4 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/drivers/pci/hotplug/rpaphp_slot.c b/drivers/pci/hotplug/rpaphp_slot.c
index 67362e5b9971..92eabf5f61b9 100644
--- a/drivers/pci/hotplug/rpaphp_slot.c
+++ b/drivers/pci/hotplug/rpaphp_slot.c
@@ -84,7 +84,7 @@ int rpaphp_register_slot(struct slot *slot)
struct hotplug_slot *php_slot = &slot->hotplug_slot;
u32 my_index;
int retval;
- int slotno = -1;
+ int slotno = PCI_SLOT_PLACEHOLDER;
dbg("%s registering slot:path[%pOF] index[%x], name[%s] pdomain[%x] type[%d]\n",
__func__, slot->dn, slot->index, slot->name,
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b4b99fb32731..70162f15a72c 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4867,8 +4867,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
{
- if (dev->multifunction || dev->subordinate || !dev->slot ||
- dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
+ if (dev->subordinate || !dev->slot ||
+ dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
+ (dev->multifunction && !dev->slot->per_func_slot))
return -ENOTTY;
return pci_reset_hotplug_slot(dev->slot->hotplug, probe);
diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index e0b7fb43423c..3f6e5dce27a0 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -37,7 +37,7 @@ static const struct sysfs_ops pci_slot_sysfs_ops = {
static ssize_t address_read_file(struct pci_slot *slot, char *buf)
{
- if (slot->number == 0xff)
+ if (slot->number == (u16)PCI_SLOT_PLACEHOLDER)
return sysfs_emit(buf, "%04x:%02x\n",
pci_domain_nr(slot->bus),
slot->bus->number);
@@ -72,6 +72,23 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf)
return bus_speed_read(slot->bus->cur_bus_speed, buf);
}
+static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot)
+{
+ if (slot->per_func_slot)
+ return dev->devfn == slot->number;
+
+ return slot->number == PCI_SLOT_ALL_DEVICES ||
+ PCI_SLOT(dev->devfn) == slot->number;
+}
+
+static bool pci_slot_enabled_per_func(void)
+{
+ if (IS_ENABLED(CONFIG_S390))
+ return true;
+
+ return false;
+}
+
static void pci_slot_release(struct kobject *kobj)
{
struct pci_dev *dev;
@@ -82,8 +99,7 @@ static void pci_slot_release(struct kobject *kobj)
down_read(&pci_bus_sem);
list_for_each_entry(dev, &slot->bus->devices, bus_list)
- if (slot->number == PCI_SLOT_ALL_DEVICES ||
- PCI_SLOT(dev->devfn) == slot->number)
+ if (pci_dev_matches_slot(dev, slot))
dev->slot = NULL;
up_read(&pci_bus_sem);
@@ -176,8 +192,7 @@ void pci_dev_assign_slot(struct pci_dev *dev)
mutex_lock(&pci_slot_mutex);
list_for_each_entry(slot, &dev->bus->slots, list)
- if (slot->number == PCI_SLOT_ALL_DEVICES ||
- PCI_SLOT(dev->devfn) == slot->number)
+ if (pci_dev_matches_slot(dev, slot))
dev->slot = slot;
mutex_unlock(&pci_slot_mutex);
}
@@ -256,7 +271,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
mutex_lock(&pci_slot_mutex);
- if (slot_nr == -1)
+ if (slot_nr == PCI_SLOT_PLACEHOLDER)
goto placeholder;
/*
@@ -287,6 +302,9 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
slot->bus = pci_bus_get(parent);
slot->number = slot_nr;
+ if (pci_slot_enabled_per_func())
+ slot->per_func_slot = 1;
+
slot->kobj.kset = pci_slots_kset;
slot_name = make_slot_name(name);
@@ -307,8 +325,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
down_read(&pci_bus_sem);
list_for_each_entry(dev, &parent->devices, bus_list)
- if (slot_nr == PCI_SLOT_ALL_DEVICES ||
- PCI_SLOT(dev->devfn) == slot_nr)
+ if (pci_dev_matches_slot(dev, slot))
dev->slot = slot;
up_read(&pci_bus_sem);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8861eeb4381d..ce0f7bff080a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -78,14 +78,18 @@
* and, if ARI Forwarding is enabled, functions may appear to be on multiple
* devices.
*/
-#define PCI_SLOT_ALL_DEVICES 0xfe
+#define PCI_SLOT_ALL_DEVICES 0xfeff
+
+/* Used to identify a slot as a placeholder */
+#define PCI_SLOT_PLACEHOLDER -1
/* pci_slot represents a physical slot */
struct pci_slot {
struct pci_bus *bus; /* Bus this slot is on */
struct list_head list; /* Node in list of slots */
struct hotplug_slot *hotplug; /* Hotplug info (move here) */
- unsigned char number; /* Device nr, or PCI_SLOT_ALL_DEVICES */
+ u16 number; /* Device nr, or PCI_SLOT_ALL_DEVICES */
+ unsigned int per_func_slot:1; /* Allow per function slot */
struct kobject kobj;
};
--
2.43.0
^ permalink raw reply related [flat|nested] 24+ messages in thread* [PATCH v12 2/7] PCI: Avoid saving config space state if inaccessible
2026-03-30 17:40 [PATCH v12 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-03-30 17:40 ` [PATCH v12 1/7] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
@ 2026-03-30 17:40 ` Farhan Ali
2026-04-07 19:56 ` Bjorn Helgaas
2026-03-30 17:40 ` [PATCH v12 3/7] PCI: Fail FLR when config space is inaccessible Farhan Ali
` (5 subsequent siblings)
7 siblings, 1 reply; 24+ messages in thread
From: Farhan Ali @ 2026-03-30 17:40 UTC (permalink / raw)
To: linux-s390, linux-kernel, linux-pci
Cc: helgaas, lukas, alex, clg, kbusch, alifm, schnelle, mjrosato,
Bjorn Helgaas
The current reset process saves the device's config space state before
reset and restores it afterward. However errors may occur unexpectedly and
it may then be impossible to save config space because the device may be
inaccessible (e.g. DPC). This results in saving invalid values that get
written back to the device during state restoration.
With a reset we want to recover/restore the device into a functional state.
So avoid saving the state of the config space when the device config space
is inaccessible.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
---
drivers/pci/pci.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 70162f15a72c..b36263834289 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -722,6 +722,27 @@ u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec)
}
EXPORT_SYMBOL_GPL(pci_find_dvsec_capability);
+static bool pci_dev_config_accessible(struct pci_dev *dev, char *msg)
+{
+ u32 val;
+
+ /*
+ * If device's config space is inaccessible it can return ~0 for
+ * any reads. Since VFs can also return ~0 for Device and Vendor ID
+ * check Command and Status registers. Note that this is racy
+ * because the device may become inaccessible partway through
+ * next access.
+ */
+ pci_read_config_dword(dev, PCI_COMMAND, &val);
+ if (PCI_POSSIBLE_ERROR(val)) {
+ pci_warn(dev, "Device config space inaccessible; unable to %s\n",
+ msg);
+ return false;
+ }
+
+ return true;
+}
+
/**
* pci_find_parent_resource - return resource region of parent bus of given
* region
@@ -5029,6 +5050,9 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
*/
pci_set_power_state(dev, PCI_D0);
+ if (!pci_dev_config_accessible(dev, "save state"))
+ return;
+
pci_save_state(dev);
/*
* Disable the device by clearing the Command register, except for
--
2.43.0
^ permalink raw reply related [flat|nested] 24+ messages in thread* Re: [PATCH v12 2/7] PCI: Avoid saving config space state if inaccessible
2026-03-30 17:40 ` [PATCH v12 2/7] PCI: Avoid saving config space state if inaccessible Farhan Ali
@ 2026-04-07 19:56 ` Bjorn Helgaas
2026-04-07 21:17 ` Farhan Ali
0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2026-04-07 19:56 UTC (permalink / raw)
To: Farhan Ali
Cc: linux-s390, linux-kernel, linux-pci, lukas, alex, clg, kbusch,
schnelle, mjrosato, Bjorn Helgaas
On Mon, Mar 30, 2026 at 10:40:06AM -0700, Farhan Ali wrote:
> The current reset process saves the device's config space state before
> reset and restores it afterward. However errors may occur unexpectedly and
> it may then be impossible to save config space because the device may be
> inaccessible (e.g. DPC). This results in saving invalid values that get
> written back to the device during state restoration.
>
> With a reset we want to recover/restore the device into a functional state.
> So avoid saving the state of the config space when the device config space
> is inaccessible.
>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
> drivers/pci/pci.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 70162f15a72c..b36263834289 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -722,6 +722,27 @@ u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec)
> }
> EXPORT_SYMBOL_GPL(pci_find_dvsec_capability);
>
> +static bool pci_dev_config_accessible(struct pci_dev *dev, char *msg)
> +{
> + u32 val;
> +
> + /*
> + * If device's config space is inaccessible it can return ~0 for
> + * any reads. Since VFs can also return ~0 for Device and Vendor ID
> + * check Command and Status registers. Note that this is racy
> + * because the device may become inaccessible partway through
> + * next access.
> + */
> + pci_read_config_dword(dev, PCI_COMMAND, &val);
> + if (PCI_POSSIBLE_ERROR(val)) {
> + pci_warn(dev, "Device config space inaccessible; unable to %s\n",
> + msg);
> + return false;
I wonder if it's feasible to do the pci_save_state() into a temporary
buffer, check the buffer for PCI_POSSIBLE_ERROR(), and copy the temp
buffer into the real buffer if we think the save was successful. I
know this is a lot more work, but I would like to avoid the raciness
if possible.
> + }
> +
> + return true;
> +}
> +
> /**
> * pci_find_parent_resource - return resource region of parent bus of given
> * region
> @@ -5029,6 +5050,9 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> */
> pci_set_power_state(dev, PCI_D0);
>
> + if (!pci_dev_config_accessible(dev, "save state"))
> + return;
> +
> pci_save_state(dev);
> /*
> * Disable the device by clearing the Command register, except for
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH v12 2/7] PCI: Avoid saving config space state if inaccessible
2026-04-07 19:56 ` Bjorn Helgaas
@ 2026-04-07 21:17 ` Farhan Ali
0 siblings, 0 replies; 24+ messages in thread
From: Farhan Ali @ 2026-04-07 21:17 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-s390, linux-kernel, linux-pci, lukas, alex, clg, kbusch,
schnelle, mjrosato, Bjorn Helgaas
On 4/7/2026 12:56 PM, Bjorn Helgaas wrote:
> On Mon, Mar 30, 2026 at 10:40:06AM -0700, Farhan Ali wrote:
>> The current reset process saves the device's config space state before
>> reset and restores it afterward. However errors may occur unexpectedly and
>> it may then be impossible to save config space because the device may be
>> inaccessible (e.g. DPC). This results in saving invalid values that get
>> written back to the device during state restoration.
>>
>> With a reset we want to recover/restore the device into a functional state.
>> So avoid saving the state of the config space when the device config space
>> is inaccessible.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
>> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
>> ---
>> drivers/pci/pci.c | 24 ++++++++++++++++++++++++
>> 1 file changed, 24 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 70162f15a72c..b36263834289 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -722,6 +722,27 @@ u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec)
>> }
>> EXPORT_SYMBOL_GPL(pci_find_dvsec_capability);
>>
>> +static bool pci_dev_config_accessible(struct pci_dev *dev, char *msg)
>> +{
>> + u32 val;
>> +
>> + /*
>> + * If device's config space is inaccessible it can return ~0 for
>> + * any reads. Since VFs can also return ~0 for Device and Vendor ID
>> + * check Command and Status registers. Note that this is racy
>> + * because the device may become inaccessible partway through
>> + * next access.
>> + */
>> + pci_read_config_dword(dev, PCI_COMMAND, &val);
>> + if (PCI_POSSIBLE_ERROR(val)) {
>> + pci_warn(dev, "Device config space inaccessible; unable to %s\n",
>> + msg);
>> + return false;
> I wonder if it's feasible to do the pci_save_state() into a temporary
> buffer, check the buffer for PCI_POSSIBLE_ERROR(), and copy the temp
> buffer into the real buffer if we think the save was successful. I
> know this is a lot more work, but I would like to avoid the raciness
> if possible.
I just want to clarify, are you suggesting we do that in
pci_save_state() function? If not then we need to do something similar
to pci_save_state() and then check for errors. At that point wouldn't it
just make sense to add the check in places where we save the bits the
kernel wants? Please correct me if I misunderstood you.
Thanks
Farhan
>> + }
>> +
>> + return true;
>> +}
>> +
>> /**
>> * pci_find_parent_resource - return resource region of parent bus of given
>> * region
>> @@ -5029,6 +5050,9 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
>> */
>> pci_set_power_state(dev, PCI_D0);
>>
>> + if (!pci_dev_config_accessible(dev, "save state"))
>> + return;
>> +
>> pci_save_state(dev);
>> /*
>> * Disable the device by clearing the Command register, except for
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v12 3/7] PCI: Fail FLR when config space is inaccessible
2026-03-30 17:40 [PATCH v12 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-03-30 17:40 ` [PATCH v12 1/7] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
2026-03-30 17:40 ` [PATCH v12 2/7] PCI: Avoid saving config space state if inaccessible Farhan Ali
@ 2026-03-30 17:40 ` Farhan Ali
2026-03-30 17:40 ` [PATCH v12 4/7] s390/pci: Store PCI error information for passthrough devices Farhan Ali
` (4 subsequent siblings)
7 siblings, 0 replies; 24+ messages in thread
From: Farhan Ali @ 2026-03-30 17:40 UTC (permalink / raw)
To: linux-s390, linux-kernel, linux-pci
Cc: helgaas, lukas, alex, clg, kbusch, alifm, schnelle, mjrosato,
Benjamin Block
If a device is in an error state, then it's config space may not be
accssible. Add additional check to validate if a device's config space is
accessible before doing an FLR reset.
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
drivers/pci/pci.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b36263834289..4795736193fb 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4398,6 +4398,9 @@ int pcie_reset_flr(struct pci_dev *dev, bool probe)
if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
return -ENOTTY;
+ if (!pci_dev_config_accessible(dev, "FLR"))
+ return -ENOTTY;
+
if (probe)
return 0;
--
2.43.0
^ permalink raw reply related [flat|nested] 24+ messages in thread* [PATCH v12 4/7] s390/pci: Store PCI error information for passthrough devices
2026-03-30 17:40 [PATCH v12 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
` (2 preceding siblings ...)
2026-03-30 17:40 ` [PATCH v12 3/7] PCI: Fail FLR when config space is inaccessible Farhan Ali
@ 2026-03-30 17:40 ` Farhan Ali
2026-03-31 17:41 ` Matthew Rosato
2026-04-07 15:38 ` Alex Williamson
2026-03-30 17:40 ` [PATCH v12 5/7] vfio-pci/zdev: Add a device feature for error information Farhan Ali
` (3 subsequent siblings)
7 siblings, 2 replies; 24+ messages in thread
From: Farhan Ali @ 2026-03-30 17:40 UTC (permalink / raw)
To: linux-s390, linux-kernel, linux-pci
Cc: helgaas, lukas, alex, clg, kbusch, alifm, schnelle, mjrosato
For a passthrough device we need co-operation from user space to recover
the device. This would require to bubble up any error information to user
space. Let's store this error information for passthrough devices, so it
can be retrieved later.
We can now have userspace drivers (vfio-pci based) on s390x. The userspace
drivers will not have any KVM fd and so no kzdev associated with them. So
we need to update the logic for detecting passthrough devices to not depend
on struct kvm_zdev.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
arch/s390/include/asm/pci.h | 30 ++++++++
arch/s390/pci/pci.c | 1 +
arch/s390/pci/pci_event.c | 113 ++++++++++++++++++-------------
drivers/vfio/pci/vfio_pci_zdev.c | 9 ++-
4 files changed, 105 insertions(+), 48 deletions(-)
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index c0ff19dab580..7fb9a80b0175 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -118,6 +118,31 @@ struct zpci_bus {
enum pci_bus_speed max_bus_speed;
};
+/* Content Code Description for PCI Function Error */
+struct zpci_ccdf_err {
+ u32 reserved1;
+ u32 fh; /* function handle */
+ u32 fid; /* function id */
+ u32 ett : 4; /* expected table type */
+ u32 mvn : 12; /* MSI vector number */
+ u32 dmaas : 8; /* DMA address space */
+ u32 reserved2 : 6;
+ u32 q : 1; /* event qualifier */
+ u32 rw : 1; /* read/write */
+ u64 faddr; /* failing address */
+ u32 reserved3;
+ u16 reserved4;
+ u16 pec; /* PCI event code */
+} __packed;
+
+#define ZPCI_ERR_PENDING_MAX 4
+struct zpci_ccdf_pending {
+ u8 count;
+ u8 head;
+ u8 tail;
+ struct zpci_ccdf_err err[ZPCI_ERR_PENDING_MAX];
+};
+
/* Private data per function */
struct zpci_dev {
struct zpci_bus *zbus;
@@ -171,6 +196,7 @@ struct zpci_dev {
char res_name[16];
bool mio_capable;
+ bool mediated_recovery;
struct zpci_bar_struct bars[PCI_STD_NUM_BARS];
u64 start_dma; /* Start of available DMA addresses */
@@ -192,6 +218,8 @@ struct zpci_dev {
struct iommu_domain *s390_domain; /* attached IOMMU domain */
struct kvm_zdev *kzdev;
struct mutex kzdev_lock;
+ struct zpci_ccdf_pending pending_errs;
+ struct mutex pending_errs_lock;
spinlock_t dom_lock; /* protect s390_domain change */
};
@@ -330,6 +358,8 @@ void zpci_debug_exit_device(struct zpci_dev *);
int zpci_report_error(struct pci_dev *, struct zpci_report_error_header *);
int zpci_clear_error_state(struct zpci_dev *zdev);
int zpci_reset_load_store_blocked(struct zpci_dev *zdev);
+void zpci_start_mediated_recovery(struct zpci_dev *zdev);
+void zpci_stop_mediated_recovery(struct zpci_dev *zdev);
#ifdef CONFIG_NUMA
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 2a430722cbe4..d64d370d86cf 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -841,6 +841,7 @@ struct zpci_dev *zpci_create_device(u32 fid, u32 fh, enum zpci_state state)
mutex_init(&zdev->state_lock);
mutex_init(&zdev->fmb_lock);
mutex_init(&zdev->kzdev_lock);
+ mutex_init(&zdev->pending_errs_lock);
return zdev;
diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
index 839bd91c056e..9cda09da8b43 100644
--- a/arch/s390/pci/pci_event.c
+++ b/arch/s390/pci/pci_event.c
@@ -17,23 +17,6 @@
#include "pci_bus.h"
#include "pci_report.h"
-/* Content Code Description for PCI Function Error */
-struct zpci_ccdf_err {
- u32 reserved1;
- u32 fh; /* function handle */
- u32 fid; /* function id */
- u32 ett : 4; /* expected table type */
- u32 mvn : 12; /* MSI vector number */
- u32 dmaas : 8; /* DMA address space */
- u32 : 6;
- u32 q : 1; /* event qualifier */
- u32 rw : 1; /* read/write */
- u64 faddr; /* failing address */
- u32 reserved3;
- u16 reserved4;
- u16 pec; /* PCI event code */
-} __packed;
-
/* Content Code Description for PCI Function Availability */
struct zpci_ccdf_avail {
u32 reserved1;
@@ -60,16 +43,11 @@ static inline bool ers_result_indicates_abort(pci_ers_result_t ers_res)
}
}
-static bool is_passed_through(struct pci_dev *pdev)
+static bool needs_mediated_recovery(struct pci_dev *pdev)
{
struct zpci_dev *zdev = to_zpci(pdev);
- bool ret;
-
- mutex_lock(&zdev->kzdev_lock);
- ret = !!zdev->kzdev;
- mutex_unlock(&zdev->kzdev_lock);
-
- return ret;
+ guard(mutex)(&zdev->pending_errs_lock);
+ return zdev->mediated_recovery;
}
static bool is_driver_supported(struct pci_driver *driver)
@@ -81,6 +59,47 @@ static bool is_driver_supported(struct pci_driver *driver)
return true;
}
+static void zpci_store_pci_error(struct pci_dev *pdev,
+ struct zpci_ccdf_err *ccdf)
+{
+ struct zpci_dev *zdev = to_zpci(pdev);
+ int i;
+
+ guard(mutex)(&zdev->pending_errs_lock);
+ if (zdev->pending_errs.count >= ZPCI_ERR_PENDING_MAX) {
+ pr_err("%s: Maximum number (%d) of pending error events queued",
+ pci_name(pdev), ZPCI_ERR_PENDING_MAX);
+ return;
+ }
+
+ i = zdev->pending_errs.tail % ZPCI_ERR_PENDING_MAX;
+ memcpy(&zdev->pending_errs.err[i], ccdf, sizeof(struct zpci_ccdf_err));
+ zdev->pending_errs.tail++;
+ zdev->pending_errs.count++;
+}
+
+void zpci_start_mediated_recovery(struct zpci_dev *zdev)
+{
+ guard(mutex)(&zdev->pending_errs_lock);
+ zdev->mediated_recovery = true;
+}
+EXPORT_SYMBOL_GPL(zpci_start_mediated_recovery);
+
+void zpci_stop_mediated_recovery(struct zpci_dev *zdev)
+{
+ struct pci_dev *pdev = NULL;
+
+ guard(mutex)(&zdev->pending_errs_lock);
+ zdev->mediated_recovery = false;
+ pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
+ if (zdev->pending_errs.count)
+ pr_info("%s: Unhandled PCI error events count=%d",
+ pci_name(pdev), zdev->pending_errs.count);
+ memset(&zdev->pending_errs, 0, sizeof(struct zpci_ccdf_pending));
+ pci_dev_put(pdev);
+}
+EXPORT_SYMBOL_GPL(zpci_stop_mediated_recovery);
+
static pci_ers_result_t zpci_event_notify_error_detected(struct pci_dev *pdev,
struct pci_driver *driver)
{
@@ -175,7 +194,8 @@ static pci_ers_result_t zpci_event_do_reset(struct pci_dev *pdev,
* and the platform determines which functions are affected for
* multi-function devices.
*/
-static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
+static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev,
+ struct zpci_ccdf_err *ccdf)
{
pci_ers_result_t ers_res = PCI_ERS_RESULT_DISCONNECT;
struct zpci_dev *zdev = to_zpci(pdev);
@@ -194,13 +214,6 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
}
pdev->error_state = pci_channel_io_frozen;
- if (is_passed_through(pdev)) {
- pr_info("%s: Cannot be recovered in the host because it is a pass-through device\n",
- pci_name(pdev));
- status_str = "failed (pass-through)";
- goto out_unlock;
- }
-
driver = to_pci_driver(pdev->dev.driver);
if (!is_driver_supported(driver)) {
if (!driver) {
@@ -216,12 +229,23 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
goto out_unlock;
}
+ if (needs_mediated_recovery(pdev))
+ zpci_store_pci_error(pdev, ccdf);
+
ers_res = zpci_event_notify_error_detected(pdev, driver);
if (ers_result_indicates_abort(ers_res)) {
status_str = "failed (abort on detection)";
goto out_unlock;
}
+ if (needs_mediated_recovery(pdev)) {
+ pr_info("%s: Leaving recovery of pass-through device to user-space\n",
+ pci_name(pdev));
+ ers_res = PCI_ERS_RESULT_RECOVERED;
+ status_str = "in progress";
+ goto out_unlock;
+ }
+
if (ers_res != PCI_ERS_RESULT_NEED_RESET) {
ers_res = zpci_event_do_error_state_clear(pdev, driver);
if (ers_result_indicates_abort(ers_res)) {
@@ -266,25 +290,20 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
* @pdev: PCI function for which to report
* @es: PCI channel failure state to report
*/
-static void zpci_event_io_failure(struct pci_dev *pdev, pci_channel_state_t es)
+static void zpci_event_io_failure(struct pci_dev *pdev, pci_channel_state_t es,
+ struct zpci_ccdf_err *ccdf)
{
struct pci_driver *driver;
pci_dev_lock(pdev);
pdev->error_state = es;
- /**
- * While vfio-pci's error_detected callback notifies user-space QEMU
- * reacts to this by freezing the guest. In an s390 environment PCI
- * errors are rarely fatal so this is overkill. Instead in the future
- * we will inject the error event and let the guest recover the device
- * itself.
- */
- if (is_passed_through(pdev))
- goto out;
+
+ if (needs_mediated_recovery(pdev))
+ zpci_store_pci_error(pdev, ccdf);
driver = to_pci_driver(pdev->dev.driver);
if (driver && driver->err_handler && driver->err_handler->error_detected)
driver->err_handler->error_detected(pdev, pdev->error_state);
-out:
+
pci_dev_unlock(pdev);
}
@@ -330,12 +349,12 @@ static void __zpci_event_error(struct zpci_ccdf_err *ccdf)
break;
case 0x0040: /* Service Action or Error Recovery Failed */
case 0x003b:
- zpci_event_io_failure(pdev, pci_channel_io_perm_failure);
+ zpci_event_io_failure(pdev, pci_channel_io_perm_failure, ccdf);
break;
default: /* PCI function left in the error state attempt to recover */
- ers_res = zpci_event_attempt_error_recovery(pdev);
+ ers_res = zpci_event_attempt_error_recovery(pdev, ccdf);
if (ers_res != PCI_ERS_RESULT_RECOVERED)
- zpci_event_io_failure(pdev, pci_channel_io_perm_failure);
+ zpci_event_io_failure(pdev, pci_channel_io_perm_failure, ccdf);
break;
}
pci_dev_put(pdev);
diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index 0990fdb146b7..0658095ac5b1 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -148,6 +148,8 @@ int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
if (!zdev)
return -ENODEV;
+ zpci_start_mediated_recovery(zdev);
+
if (!vdev->vdev.kvm)
return 0;
@@ -161,7 +163,12 @@ void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
{
struct zpci_dev *zdev = to_zpci(vdev->pdev);
- if (!zdev || !vdev->vdev.kvm)
+ if (!zdev)
+ return;
+
+ zpci_stop_mediated_recovery(zdev);
+
+ if (!vdev->vdev.kvm)
return;
if (zpci_kvm_hook.kvm_unregister)
--
2.43.0
^ permalink raw reply related [flat|nested] 24+ messages in thread* Re: [PATCH v12 4/7] s390/pci: Store PCI error information for passthrough devices
2026-03-30 17:40 ` [PATCH v12 4/7] s390/pci: Store PCI error information for passthrough devices Farhan Ali
@ 2026-03-31 17:41 ` Matthew Rosato
2026-03-31 19:23 ` Farhan Ali
2026-04-07 15:38 ` Alex Williamson
1 sibling, 1 reply; 24+ messages in thread
From: Matthew Rosato @ 2026-03-31 17:41 UTC (permalink / raw)
To: Farhan Ali, linux-s390, linux-kernel, linux-pci
Cc: helgaas, lukas, alex, clg, kbusch, schnelle
On 3/30/26 1:40 PM, Farhan Ali wrote:
> For a passthrough device we need co-operation from user space to recover
> the device. This would require to bubble up any error information to user
> space. Let's store this error information for passthrough devices, so it
> can be retrieved later.
>
> We can now have userspace drivers (vfio-pci based) on s390x. The userspace
> drivers will not have any KVM fd and so no kzdev associated with them. So
> we need to update the logic for detecting passthrough devices to not depend
> on struct kvm_zdev.
>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
But a suggestion below:
> ---
> arch/s390/include/asm/pci.h | 30 ++++++++
> arch/s390/pci/pci.c | 1 +
> arch/s390/pci/pci_event.c | 113 ++++++++++++++++++-------------
> drivers/vfio/pci/vfio_pci_zdev.c | 9 ++-
> 4 files changed, 105 insertions(+), 48 deletions(-)
>
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index c0ff19dab580..7fb9a80b0175 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -118,6 +118,31 @@ struct zpci_bus {
> enum pci_bus_speed max_bus_speed;
> };
>
> +/* Content Code Description for PCI Function Error */
> +struct zpci_ccdf_err {
> + u32 reserved1;
> + u32 fh; /* function handle */
> + u32 fid; /* function id */
> + u32 ett : 4; /* expected table type */
> + u32 mvn : 12; /* MSI vector number */
> + u32 dmaas : 8; /* DMA address space */
> + u32 reserved2 : 6;
> + u32 q : 1; /* event qualifier */
> + u32 rw : 1; /* read/write */
> + u64 faddr; /* failing address */
> + u32 reserved3;
> + u16 reserved4;
> + u16 pec; /* PCI event code */
> +} __packed;
> +
> +#define ZPCI_ERR_PENDING_MAX 4
> +struct zpci_ccdf_pending {
> + u8 count;
> + u8 head;
> + u8 tail;
> + struct zpci_ccdf_err err[ZPCI_ERR_PENDING_MAX];
> +};
> +
> /* Private data per function */
> struct zpci_dev {
> struct zpci_bus *zbus;
> @@ -171,6 +196,7 @@ struct zpci_dev {
>
> char res_name[16];
> bool mio_capable;
> + bool mediated_recovery;
> struct zpci_bar_struct bars[PCI_STD_NUM_BARS];
>
> u64 start_dma; /* Start of available DMA addresses */
> @@ -192,6 +218,8 @@ struct zpci_dev {
> struct iommu_domain *s390_domain; /* attached IOMMU domain */
> struct kvm_zdev *kzdev;
> struct mutex kzdev_lock;
> + struct zpci_ccdf_pending pending_errs;
> + struct mutex pending_errs_lock;
This mutex now protects the pending_errs struct (obvious) as well as the
mediated_recovery bool (not obvious). Add a comment here and/or as a
block before the needs/start/stop functions?
Actually.. Could mediated_recovery just be part of the
zpci_ccdf_pending stucture? AFAICT the bit basically controls whether
or not the zpci_ccdf_pending structure is used / has meaning. You even
turn the bit off at the same time you memset(&zdev->pending_errs, 0) in
zpci_stop_mediated_recovery() -- though an explicit setting of
mediated_devices = false would still be nice for code clarity.
Then the spinlock continues to protect only this structure.
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH v12 4/7] s390/pci: Store PCI error information for passthrough devices
2026-03-31 17:41 ` Matthew Rosato
@ 2026-03-31 19:23 ` Farhan Ali
2026-03-31 19:29 ` Matthew Rosato
0 siblings, 1 reply; 24+ messages in thread
From: Farhan Ali @ 2026-03-31 19:23 UTC (permalink / raw)
To: Matthew Rosato, linux-s390, linux-kernel, linux-pci
Cc: helgaas, lukas, alex, clg, kbusch, schnelle
Hey Matt,
On 3/31/2026 10:41 AM, Matthew Rosato wrote:
> On 3/30/26 1:40 PM, Farhan Ali wrote:
>> For a passthrough device we need co-operation from user space to recover
>> the device. This would require to bubble up any error information to user
>> space. Let's store this error information for passthrough devices, so it
>> can be retrieved later.
>>
>> We can now have userspace drivers (vfio-pci based) on s390x. The userspace
>> drivers will not have any KVM fd and so no kzdev associated with them. So
>> we need to update the logic for detecting passthrough devices to not depend
>> on struct kvm_zdev.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Thanks for reviewing!
> But a suggestion below:
>
>> ---
>> arch/s390/include/asm/pci.h | 30 ++++++++
>> arch/s390/pci/pci.c | 1 +
>> arch/s390/pci/pci_event.c | 113 ++++++++++++++++++-------------
>> drivers/vfio/pci/vfio_pci_zdev.c | 9 ++-
>> 4 files changed, 105 insertions(+), 48 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
>> index c0ff19dab580..7fb9a80b0175 100644
>> --- a/arch/s390/include/asm/pci.h
>> +++ b/arch/s390/include/asm/pci.h
>> @@ -118,6 +118,31 @@ struct zpci_bus {
>> enum pci_bus_speed max_bus_speed;
>> };
>>
>> +/* Content Code Description for PCI Function Error */
>> +struct zpci_ccdf_err {
>> + u32 reserved1;
>> + u32 fh; /* function handle */
>> + u32 fid; /* function id */
>> + u32 ett : 4; /* expected table type */
>> + u32 mvn : 12; /* MSI vector number */
>> + u32 dmaas : 8; /* DMA address space */
>> + u32 reserved2 : 6;
>> + u32 q : 1; /* event qualifier */
>> + u32 rw : 1; /* read/write */
>> + u64 faddr; /* failing address */
>> + u32 reserved3;
>> + u16 reserved4;
>> + u16 pec; /* PCI event code */
>> +} __packed;
>> +
>> +#define ZPCI_ERR_PENDING_MAX 4
>> +struct zpci_ccdf_pending {
>> + u8 count;
>> + u8 head;
>> + u8 tail;
>> + struct zpci_ccdf_err err[ZPCI_ERR_PENDING_MAX];
>> +};
>> +
>> /* Private data per function */
>> struct zpci_dev {
>> struct zpci_bus *zbus;
>> @@ -171,6 +196,7 @@ struct zpci_dev {
>>
>> char res_name[16];
>> bool mio_capable;
>> + bool mediated_recovery;
>> struct zpci_bar_struct bars[PCI_STD_NUM_BARS];
>>
>> u64 start_dma; /* Start of available DMA addresses */
>> @@ -192,6 +218,8 @@ struct zpci_dev {
>> struct iommu_domain *s390_domain; /* attached IOMMU domain */
>> struct kvm_zdev *kzdev;
>> struct mutex kzdev_lock;
>> + struct zpci_ccdf_pending pending_errs;
>> + struct mutex pending_errs_lock;
> This mutex now protects the pending_errs struct (obvious) as well as the
> mediated_recovery bool (not obvious). Add a comment here and/or as a
> block before the needs/start/stop functions?
I can add a comment here. I was in two minds about renaming the lock,
but at the end decided against to keep the diff minimum for the patches.
>
> Actually.. Could mediated_recovery just be part of the
> zpci_ccdf_pending stucture? AFAICT the bit basically controls whether
> or not the zpci_ccdf_pending structure is used / has meaning. You even
> turn the bit off at the same time you memset(&zdev->pending_errs, 0) in
> zpci_stop_mediated_recovery() -- though an explicit setting of
> mediated_devices = false would still be nice for code clarity.
>
> Then the spinlock continues to protect only this structure.
You are right, the mediated_recovery flag drives the use of the
zpci_ccdf_pending struct. IMHO keeping the flag helped me with code
clarity and keeping the struct simple and not be tied with the flag. But
if you (or anyone) feels strongly about it then I can add the flag to
the zpci_ccdf_pending struct.
Thanks
Farhan
>
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH v12 4/7] s390/pci: Store PCI error information for passthrough devices
2026-03-31 19:23 ` Farhan Ali
@ 2026-03-31 19:29 ` Matthew Rosato
0 siblings, 0 replies; 24+ messages in thread
From: Matthew Rosato @ 2026-03-31 19:29 UTC (permalink / raw)
To: Farhan Ali, linux-s390, linux-kernel, linux-pci
Cc: helgaas, lukas, alex, clg, kbusch, schnelle
On 3/31/26 3:23 PM, Farhan Ali wrote:
>> Actually.. Could mediated_recovery just be part of the
>> zpci_ccdf_pending stucture? AFAICT the bit basically controls whether
>> or not the zpci_ccdf_pending structure is used / has meaning. You even
>> turn the bit off at the same time you memset(&zdev->pending_errs, 0) in
>> zpci_stop_mediated_recovery() -- though an explicit setting of
>> mediated_devices = false would still be nice for code clarity.
>>
>> Then the spinlock continues to protect only this structure.
>
> You are right, the mediated_recovery flag drives the use of the
> zpci_ccdf_pending struct. IMHO keeping the flag helped me with code
> clarity and keeping the struct simple and not be tied with the flag. But
> if you (or anyone) feels strongly about it then I can add the flag to
> the zpci_ccdf_pending struct.
>
FWIW I am fine with leaving it as-is as long as there is a clear comment
tying the bool to the mutex.
Thanks,
Matt
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v12 4/7] s390/pci: Store PCI error information for passthrough devices
2026-03-30 17:40 ` [PATCH v12 4/7] s390/pci: Store PCI error information for passthrough devices Farhan Ali
2026-03-31 17:41 ` Matthew Rosato
@ 2026-04-07 15:38 ` Alex Williamson
2026-04-07 18:00 ` Farhan Ali
1 sibling, 1 reply; 24+ messages in thread
From: Alex Williamson @ 2026-04-07 15:38 UTC (permalink / raw)
To: Farhan Ali
Cc: linux-s390, linux-kernel, linux-pci, helgaas, lukas, clg, kbusch,
schnelle, mjrosato, alex
On Mon, 30 Mar 2026 10:40:08 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:
> For a passthrough device we need co-operation from user space to recover
> the device. This would require to bubble up any error information to user
> space. Let's store this error information for passthrough devices, so it
> can be retrieved later.
>
> We can now have userspace drivers (vfio-pci based) on s390x. The userspace
> drivers will not have any KVM fd and so no kzdev associated with them. So
> we need to update the logic for detecting passthrough devices to not depend
> on struct kvm_zdev.
>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
> arch/s390/include/asm/pci.h | 30 ++++++++
> arch/s390/pci/pci.c | 1 +
> arch/s390/pci/pci_event.c | 113 ++++++++++++++++++-------------
> drivers/vfio/pci/vfio_pci_zdev.c | 9 ++-
> 4 files changed, 105 insertions(+), 48 deletions(-)
>
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index c0ff19dab580..7fb9a80b0175 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -118,6 +118,31 @@ struct zpci_bus {
> enum pci_bus_speed max_bus_speed;
> };
>
> +/* Content Code Description for PCI Function Error */
> +struct zpci_ccdf_err {
> + u32 reserved1;
> + u32 fh; /* function handle */
> + u32 fid; /* function id */
> + u32 ett : 4; /* expected table type */
> + u32 mvn : 12; /* MSI vector number */
> + u32 dmaas : 8; /* DMA address space */
> + u32 reserved2 : 6;
> + u32 q : 1; /* event qualifier */
> + u32 rw : 1; /* read/write */
> + u64 faddr; /* failing address */
> + u32 reserved3;
> + u16 reserved4;
> + u16 pec; /* PCI event code */
> +} __packed;
> +
> +#define ZPCI_ERR_PENDING_MAX 4
> +struct zpci_ccdf_pending {
> + u8 count;
> + u8 head;
> + u8 tail;
> + struct zpci_ccdf_err err[ZPCI_ERR_PENDING_MAX];
> +};
> +
> /* Private data per function */
> struct zpci_dev {
> struct zpci_bus *zbus;
> @@ -171,6 +196,7 @@ struct zpci_dev {
>
> char res_name[16];
> bool mio_capable;
> + bool mediated_recovery;
> struct zpci_bar_struct bars[PCI_STD_NUM_BARS];
>
> u64 start_dma; /* Start of available DMA addresses */
> @@ -192,6 +218,8 @@ struct zpci_dev {
> struct iommu_domain *s390_domain; /* attached IOMMU domain */
> struct kvm_zdev *kzdev;
> struct mutex kzdev_lock;
> + struct zpci_ccdf_pending pending_errs;
> + struct mutex pending_errs_lock;
> spinlock_t dom_lock; /* protect s390_domain change */
> };
>
> @@ -330,6 +358,8 @@ void zpci_debug_exit_device(struct zpci_dev *);
> int zpci_report_error(struct pci_dev *, struct zpci_report_error_header *);
> int zpci_clear_error_state(struct zpci_dev *zdev);
> int zpci_reset_load_store_blocked(struct zpci_dev *zdev);
> +void zpci_start_mediated_recovery(struct zpci_dev *zdev);
> +void zpci_stop_mediated_recovery(struct zpci_dev *zdev);
>
> #ifdef CONFIG_NUMA
>
> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index 2a430722cbe4..d64d370d86cf 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
> @@ -841,6 +841,7 @@ struct zpci_dev *zpci_create_device(u32 fid, u32 fh, enum zpci_state state)
> mutex_init(&zdev->state_lock);
> mutex_init(&zdev->fmb_lock);
> mutex_init(&zdev->kzdev_lock);
> + mutex_init(&zdev->pending_errs_lock);
>
> return zdev;
>
> diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
> index 839bd91c056e..9cda09da8b43 100644
> --- a/arch/s390/pci/pci_event.c
> +++ b/arch/s390/pci/pci_event.c
> @@ -17,23 +17,6 @@
> #include "pci_bus.h"
> #include "pci_report.h"
>
> -/* Content Code Description for PCI Function Error */
> -struct zpci_ccdf_err {
> - u32 reserved1;
> - u32 fh; /* function handle */
> - u32 fid; /* function id */
> - u32 ett : 4; /* expected table type */
> - u32 mvn : 12; /* MSI vector number */
> - u32 dmaas : 8; /* DMA address space */
> - u32 : 6;
> - u32 q : 1; /* event qualifier */
> - u32 rw : 1; /* read/write */
> - u64 faddr; /* failing address */
> - u32 reserved3;
> - u16 reserved4;
> - u16 pec; /* PCI event code */
> -} __packed;
> -
> /* Content Code Description for PCI Function Availability */
> struct zpci_ccdf_avail {
> u32 reserved1;
> @@ -60,16 +43,11 @@ static inline bool ers_result_indicates_abort(pci_ers_result_t ers_res)
> }
> }
>
> -static bool is_passed_through(struct pci_dev *pdev)
> +static bool needs_mediated_recovery(struct pci_dev *pdev)
> {
> struct zpci_dev *zdev = to_zpci(pdev);
> - bool ret;
> -
> - mutex_lock(&zdev->kzdev_lock);
> - ret = !!zdev->kzdev;
> - mutex_unlock(&zdev->kzdev_lock);
> -
> - return ret;
> + guard(mutex)(&zdev->pending_errs_lock);
> + return zdev->mediated_recovery;
> }
This immediately raises a red flag, the caller gets a snapshot of the
value under mutex, but nothing guarantees the value doesn't immediately
change before the caller can take action using the (now stale) value.
The pending_errs_lock is attempting to be used to protect the recovery
state, in kernel vs mediated, but it only provides an instantaneous
snapshot.
>
> static bool is_driver_supported(struct pci_driver *driver)
> @@ -81,6 +59,47 @@ static bool is_driver_supported(struct pci_driver *driver)
> return true;
> }
>
> +static void zpci_store_pci_error(struct pci_dev *pdev,
> + struct zpci_ccdf_err *ccdf)
> +{
> + struct zpci_dev *zdev = to_zpci(pdev);
> + int i;
> +
> + guard(mutex)(&zdev->pending_errs_lock);
> + if (zdev->pending_errs.count >= ZPCI_ERR_PENDING_MAX) {
> + pr_err("%s: Maximum number (%d) of pending error events queued",
> + pci_name(pdev), ZPCI_ERR_PENDING_MAX);
> + return;
> + }
> +
> + i = zdev->pending_errs.tail % ZPCI_ERR_PENDING_MAX;
> + memcpy(&zdev->pending_errs.err[i], ccdf, sizeof(struct zpci_ccdf_err));
> + zdev->pending_errs.tail++;
> + zdev->pending_errs.count++;
> +}
Here we're using the mutex for something different, serializing access
to the buffer.
> +
> +void zpci_start_mediated_recovery(struct zpci_dev *zdev)
> +{
> + guard(mutex)(&zdev->pending_errs_lock);
> + zdev->mediated_recovery = true;
> +}
> +EXPORT_SYMBOL_GPL(zpci_start_mediated_recovery);
An now back to protecting the recovery mode, but not really.
> +
> +void zpci_stop_mediated_recovery(struct zpci_dev *zdev)
> +{
> + struct pci_dev *pdev = NULL;
> +
> + guard(mutex)(&zdev->pending_errs_lock);
> + zdev->mediated_recovery = false;
> + pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
> + if (zdev->pending_errs.count)
> + pr_info("%s: Unhandled PCI error events count=%d",
> + pci_name(pdev), zdev->pending_errs.count);
> + memset(&zdev->pending_errs, 0, sizeof(struct zpci_ccdf_pending));
> + pci_dev_put(pdev);
> +}
> +EXPORT_SYMBOL_GPL(zpci_stop_mediated_recovery);
This brings the use cases together, fair.
> +
> static pci_ers_result_t zpci_event_notify_error_detected(struct pci_dev *pdev,
> struct pci_driver *driver)
> {
> @@ -175,7 +194,8 @@ static pci_ers_result_t zpci_event_do_reset(struct pci_dev *pdev,
> * and the platform determines which functions are affected for
> * multi-function devices.
> */
> -static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
> +static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev,
> + struct zpci_ccdf_err *ccdf)
> {
> pci_ers_result_t ers_res = PCI_ERS_RESULT_DISCONNECT;
> struct zpci_dev *zdev = to_zpci(pdev);
> @@ -194,13 +214,6 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
> }
> pdev->error_state = pci_channel_io_frozen;
>
> - if (is_passed_through(pdev)) {
> - pr_info("%s: Cannot be recovered in the host because it is a pass-through device\n",
> - pci_name(pdev));
> - status_str = "failed (pass-through)";
> - goto out_unlock;
> - }
> -
> driver = to_pci_driver(pdev->dev.driver);
> if (!is_driver_supported(driver)) {
> if (!driver) {
> @@ -216,12 +229,23 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
> goto out_unlock;
> }
>
> + if (needs_mediated_recovery(pdev))
The test result is invalid here. Why not call zpci_store_pci_error()
unconditionally here and wrap both in the same guard?
needs_mediated_recovery() should have a lockdep_assert.
> + zpci_store_pci_error(pdev, ccdf);
> +
> ers_res = zpci_event_notify_error_detected(pdev, driver);
> if (ers_result_indicates_abort(ers_res)) {
> status_str = "failed (abort on detection)";
> goto out_unlock;
> }
>
> + if (needs_mediated_recovery(pdev)) {
> + pr_info("%s: Leaving recovery of pass-through device to user-space\n",
> + pci_name(pdev));
> + ers_res = PCI_ERS_RESULT_RECOVERED;
> + status_str = "in progress";
> + goto out_unlock;
> + }
Does the mutex guard need to extend to here to make sure this is
consistent with the previous test? What prevents an open/close
across these discrete tests?
> +
> if (ers_res != PCI_ERS_RESULT_NEED_RESET) {
> ers_res = zpci_event_do_error_state_clear(pdev, driver);
> if (ers_result_indicates_abort(ers_res)) {
> @@ -266,25 +290,20 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
> * @pdev: PCI function for which to report
> * @es: PCI channel failure state to report
> */
> -static void zpci_event_io_failure(struct pci_dev *pdev, pci_channel_state_t es)
> +static void zpci_event_io_failure(struct pci_dev *pdev, pci_channel_state_t es,
> + struct zpci_ccdf_err *ccdf)
> {
> struct pci_driver *driver;
>
> pci_dev_lock(pdev);
> pdev->error_state = es;
> - /**
> - * While vfio-pci's error_detected callback notifies user-space QEMU
> - * reacts to this by freezing the guest. In an s390 environment PCI
> - * errors are rarely fatal so this is overkill. Instead in the future
> - * we will inject the error event and let the guest recover the device
> - * itself.
> - */
> - if (is_passed_through(pdev))
> - goto out;
> +
> + if (needs_mediated_recovery(pdev))
> + zpci_store_pci_error(pdev, ccdf);
Same. Unless something else prevents this from changing, the mutex is
not effective between test and use. Thanks,
Alex
> driver = to_pci_driver(pdev->dev.driver);
> if (driver && driver->err_handler && driver->err_handler->error_detected)
> driver->err_handler->error_detected(pdev, pdev->error_state);
> -out:
> +
> pci_dev_unlock(pdev);
> }
>
> @@ -330,12 +349,12 @@ static void __zpci_event_error(struct zpci_ccdf_err *ccdf)
> break;
> case 0x0040: /* Service Action or Error Recovery Failed */
> case 0x003b:
> - zpci_event_io_failure(pdev, pci_channel_io_perm_failure);
> + zpci_event_io_failure(pdev, pci_channel_io_perm_failure, ccdf);
> break;
> default: /* PCI function left in the error state attempt to recover */
> - ers_res = zpci_event_attempt_error_recovery(pdev);
> + ers_res = zpci_event_attempt_error_recovery(pdev, ccdf);
> if (ers_res != PCI_ERS_RESULT_RECOVERED)
> - zpci_event_io_failure(pdev, pci_channel_io_perm_failure);
> + zpci_event_io_failure(pdev, pci_channel_io_perm_failure, ccdf);
> break;
> }
> pci_dev_put(pdev);
> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
> index 0990fdb146b7..0658095ac5b1 100644
> --- a/drivers/vfio/pci/vfio_pci_zdev.c
> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
> @@ -148,6 +148,8 @@ int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
> if (!zdev)
> return -ENODEV;
>
> + zpci_start_mediated_recovery(zdev);
> +
> if (!vdev->vdev.kvm)
> return 0;
>
> @@ -161,7 +163,12 @@ void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
> {
> struct zpci_dev *zdev = to_zpci(vdev->pdev);
>
> - if (!zdev || !vdev->vdev.kvm)
> + if (!zdev)
> + return;
> +
> + zpci_stop_mediated_recovery(zdev);
> +
> + if (!vdev->vdev.kvm)
> return;
>
> if (zpci_kvm_hook.kvm_unregister)
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH v12 4/7] s390/pci: Store PCI error information for passthrough devices
2026-04-07 15:38 ` Alex Williamson
@ 2026-04-07 18:00 ` Farhan Ali
2026-04-07 18:23 ` Alex Williamson
0 siblings, 1 reply; 24+ messages in thread
From: Farhan Ali @ 2026-04-07 18:00 UTC (permalink / raw)
To: Alex Williamson
Cc: linux-s390, linux-kernel, linux-pci, helgaas, lukas, clg, kbusch,
schnelle, mjrosato
On 4/7/2026 8:38 AM, Alex Williamson wrote:
> On Mon, 30 Mar 2026 10:40:08 -0700
> Farhan Ali <alifm@linux.ibm.com> wrote:
>
>> For a passthrough device we need co-operation from user space to recover
>> the device. This would require to bubble up any error information to user
>> space. Let's store this error information for passthrough devices, so it
>> can be retrieved later.
>>
>> We can now have userspace drivers (vfio-pci based) on s390x. The userspace
>> drivers will not have any KVM fd and so no kzdev associated with them. So
>> we need to update the logic for detecting passthrough devices to not depend
>> on struct kvm_zdev.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>> arch/s390/include/asm/pci.h | 30 ++++++++
>> arch/s390/pci/pci.c | 1 +
>> arch/s390/pci/pci_event.c | 113 ++++++++++++++++++-------------
>> drivers/vfio/pci/vfio_pci_zdev.c | 9 ++-
>> 4 files changed, 105 insertions(+), 48 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
>> index c0ff19dab580..7fb9a80b0175 100644
>> --- a/arch/s390/include/asm/pci.h
>> +++ b/arch/s390/include/asm/pci.h
>> @@ -118,6 +118,31 @@ struct zpci_bus {
>> enum pci_bus_speed max_bus_speed;
>> };
>>
>> +/* Content Code Description for PCI Function Error */
>> +struct zpci_ccdf_err {
>> + u32 reserved1;
>> + u32 fh; /* function handle */
>> + u32 fid; /* function id */
>> + u32 ett : 4; /* expected table type */
>> + u32 mvn : 12; /* MSI vector number */
>> + u32 dmaas : 8; /* DMA address space */
>> + u32 reserved2 : 6;
>> + u32 q : 1; /* event qualifier */
>> + u32 rw : 1; /* read/write */
>> + u64 faddr; /* failing address */
>> + u32 reserved3;
>> + u16 reserved4;
>> + u16 pec; /* PCI event code */
>> +} __packed;
>> +
>> +#define ZPCI_ERR_PENDING_MAX 4
>> +struct zpci_ccdf_pending {
>> + u8 count;
>> + u8 head;
>> + u8 tail;
>> + struct zpci_ccdf_err err[ZPCI_ERR_PENDING_MAX];
>> +};
>> +
>> /* Private data per function */
>> struct zpci_dev {
>> struct zpci_bus *zbus;
>> @@ -171,6 +196,7 @@ struct zpci_dev {
>>
>> char res_name[16];
>> bool mio_capable;
>> + bool mediated_recovery;
>> struct zpci_bar_struct bars[PCI_STD_NUM_BARS];
>>
>> u64 start_dma; /* Start of available DMA addresses */
>> @@ -192,6 +218,8 @@ struct zpci_dev {
>> struct iommu_domain *s390_domain; /* attached IOMMU domain */
>> struct kvm_zdev *kzdev;
>> struct mutex kzdev_lock;
>> + struct zpci_ccdf_pending pending_errs;
>> + struct mutex pending_errs_lock;
>> spinlock_t dom_lock; /* protect s390_domain change */
>> };
>>
>> @@ -330,6 +358,8 @@ void zpci_debug_exit_device(struct zpci_dev *);
>> int zpci_report_error(struct pci_dev *, struct zpci_report_error_header *);
>> int zpci_clear_error_state(struct zpci_dev *zdev);
>> int zpci_reset_load_store_blocked(struct zpci_dev *zdev);
>> +void zpci_start_mediated_recovery(struct zpci_dev *zdev);
>> +void zpci_stop_mediated_recovery(struct zpci_dev *zdev);
>>
>> #ifdef CONFIG_NUMA
>>
>> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
>> index 2a430722cbe4..d64d370d86cf 100644
>> --- a/arch/s390/pci/pci.c
>> +++ b/arch/s390/pci/pci.c
>> @@ -841,6 +841,7 @@ struct zpci_dev *zpci_create_device(u32 fid, u32 fh, enum zpci_state state)
>> mutex_init(&zdev->state_lock);
>> mutex_init(&zdev->fmb_lock);
>> mutex_init(&zdev->kzdev_lock);
>> + mutex_init(&zdev->pending_errs_lock);
>>
>> return zdev;
>>
>> diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
>> index 839bd91c056e..9cda09da8b43 100644
>> --- a/arch/s390/pci/pci_event.c
>> +++ b/arch/s390/pci/pci_event.c
>> @@ -17,23 +17,6 @@
>> #include "pci_bus.h"
>> #include "pci_report.h"
>>
>> -/* Content Code Description for PCI Function Error */
>> -struct zpci_ccdf_err {
>> - u32 reserved1;
>> - u32 fh; /* function handle */
>> - u32 fid; /* function id */
>> - u32 ett : 4; /* expected table type */
>> - u32 mvn : 12; /* MSI vector number */
>> - u32 dmaas : 8; /* DMA address space */
>> - u32 : 6;
>> - u32 q : 1; /* event qualifier */
>> - u32 rw : 1; /* read/write */
>> - u64 faddr; /* failing address */
>> - u32 reserved3;
>> - u16 reserved4;
>> - u16 pec; /* PCI event code */
>> -} __packed;
>> -
>> /* Content Code Description for PCI Function Availability */
>> struct zpci_ccdf_avail {
>> u32 reserved1;
>> @@ -60,16 +43,11 @@ static inline bool ers_result_indicates_abort(pci_ers_result_t ers_res)
>> }
>> }
>>
>> -static bool is_passed_through(struct pci_dev *pdev)
>> +static bool needs_mediated_recovery(struct pci_dev *pdev)
>> {
>> struct zpci_dev *zdev = to_zpci(pdev);
>> - bool ret;
>> -
>> - mutex_lock(&zdev->kzdev_lock);
>> - ret = !!zdev->kzdev;
>> - mutex_unlock(&zdev->kzdev_lock);
>> -
>> - return ret;
>> + guard(mutex)(&zdev->pending_errs_lock);
>> + return zdev->mediated_recovery;
>> }
> This immediately raises a red flag, the caller gets a snapshot of the
> value under mutex, but nothing guarantees the value doesn't immediately
> change before the caller can take action using the (now stale) value.
>
> The pending_errs_lock is attempting to be used to protect the recovery
> state, in kernel vs mediated, but it only provides an instantaneous
> snapshot.
>
>>
>> static bool is_driver_supported(struct pci_driver *driver)
>> @@ -81,6 +59,47 @@ static bool is_driver_supported(struct pci_driver *driver)
>> return true;
>> }
>>
>> +static void zpci_store_pci_error(struct pci_dev *pdev,
>> + struct zpci_ccdf_err *ccdf)
>> +{
>> + struct zpci_dev *zdev = to_zpci(pdev);
>> + int i;
>> +
>> + guard(mutex)(&zdev->pending_errs_lock);
>> + if (zdev->pending_errs.count >= ZPCI_ERR_PENDING_MAX) {
>> + pr_err("%s: Maximum number (%d) of pending error events queued",
>> + pci_name(pdev), ZPCI_ERR_PENDING_MAX);
>> + return;
>> + }
>> +
>> + i = zdev->pending_errs.tail % ZPCI_ERR_PENDING_MAX;
>> + memcpy(&zdev->pending_errs.err[i], ccdf, sizeof(struct zpci_ccdf_err));
>> + zdev->pending_errs.tail++;
>> + zdev->pending_errs.count++;
>> +}
> Here we're using the mutex for something different, serializing access
> to the buffer.
Maybe I misunderstood, but I trying to implement this based on your
suggestion [1] from previous version of using the pending_errs_lock to
protect the mediated_recovery flag. I think it might be better to rename
the lock or maybe even have 2 separate locks?
>
>> +
>> +void zpci_start_mediated_recovery(struct zpci_dev *zdev)
>> +{
>> + guard(mutex)(&zdev->pending_errs_lock);
>> + zdev->mediated_recovery = true;
>> +}
>> +EXPORT_SYMBOL_GPL(zpci_start_mediated_recovery);
> An now back to protecting the recovery mode, but not really.
>
>> +
>> +void zpci_stop_mediated_recovery(struct zpci_dev *zdev)
>> +{
>> + struct pci_dev *pdev = NULL;
>> +
>> + guard(mutex)(&zdev->pending_errs_lock);
>> + zdev->mediated_recovery = false;
>> + pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
>> + if (zdev->pending_errs.count)
>> + pr_info("%s: Unhandled PCI error events count=%d",
>> + pci_name(pdev), zdev->pending_errs.count);
>> + memset(&zdev->pending_errs, 0, sizeof(struct zpci_ccdf_pending));
>> + pci_dev_put(pdev);
>> +}
>> +EXPORT_SYMBOL_GPL(zpci_stop_mediated_recovery);
> This brings the use cases together, fair.
>
>> +
>> static pci_ers_result_t zpci_event_notify_error_detected(struct pci_dev *pdev,
>> struct pci_driver *driver)
>> {
>> @@ -175,7 +194,8 @@ static pci_ers_result_t zpci_event_do_reset(struct pci_dev *pdev,
>> * and the platform determines which functions are affected for
>> * multi-function devices.
>> */
>> -static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
>> +static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev,
>> + struct zpci_ccdf_err *ccdf)
>> {
>> pci_ers_result_t ers_res = PCI_ERS_RESULT_DISCONNECT;
>> struct zpci_dev *zdev = to_zpci(pdev);
>> @@ -194,13 +214,6 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
>> }
>> pdev->error_state = pci_channel_io_frozen;
>>
>> - if (is_passed_through(pdev)) {
>> - pr_info("%s: Cannot be recovered in the host because it is a pass-through device\n",
>> - pci_name(pdev));
>> - status_str = "failed (pass-through)";
>> - goto out_unlock;
>> - }
>> -
>> driver = to_pci_driver(pdev->dev.driver);
>> if (!is_driver_supported(driver)) {
>> if (!driver) {
>> @@ -216,12 +229,23 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
>> goto out_unlock;
>> }
>>
>> + if (needs_mediated_recovery(pdev))
> The test result is invalid here. Why not call zpci_store_pci_error()
> unconditionally here and wrap both in the same guard?
> needs_mediated_recovery() should have a lockdep_assert.
I think storing the error via zpci_store_pci_error() only made sense if
the error needed to be handled outside the kernel. Thinking more now I
think it makes sense to have separate locks, as the mediated_recovery
flag really provides information on if the vfio device is still opened
or closed.
>
>> + zpci_store_pci_error(pdev, ccdf);
>> +
>> ers_res = zpci_event_notify_error_detected(pdev, driver);
>> if (ers_result_indicates_abort(ers_res)) {
>> status_str = "failed (abort on detection)";
>> goto out_unlock;
>> }
>>
>> + if (needs_mediated_recovery(pdev)) {
>> + pr_info("%s: Leaving recovery of pass-through device to user-space\n",
>> + pci_name(pdev));
>> + ers_res = PCI_ERS_RESULT_RECOVERED;
>> + status_str = "in progress";
>> + goto out_unlock;
>> + }
> Does the mutex guard need to extend to here to make sure this is
> consistent with the previous test? What prevents an open/close
> across these discrete tests?
Yeah nothing prevents the device from being open/closed between the
test. Will fix this.
>
>> +
>> if (ers_res != PCI_ERS_RESULT_NEED_RESET) {
>> ers_res = zpci_event_do_error_state_clear(pdev, driver);
>> if (ers_result_indicates_abort(ers_res)) {
>> @@ -266,25 +290,20 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
>> * @pdev: PCI function for which to report
>> * @es: PCI channel failure state to report
>> */
>> -static void zpci_event_io_failure(struct pci_dev *pdev, pci_channel_state_t es)
>> +static void zpci_event_io_failure(struct pci_dev *pdev, pci_channel_state_t es,
>> + struct zpci_ccdf_err *ccdf)
>> {
>> struct pci_driver *driver;
>>
>> pci_dev_lock(pdev);
>> pdev->error_state = es;
>> - /**
>> - * While vfio-pci's error_detected callback notifies user-space QEMU
>> - * reacts to this by freezing the guest. In an s390 environment PCI
>> - * errors are rarely fatal so this is overkill. Instead in the future
>> - * we will inject the error event and let the guest recover the device
>> - * itself.
>> - */
>> - if (is_passed_through(pdev))
>> - goto out;
>> +
>> + if (needs_mediated_recovery(pdev))
>> + zpci_store_pci_error(pdev, ccdf);
> Same. Unless something else prevents this from changing, the mutex is
> not effective between test and use. Thanks,
Yes, will fix this.
Thanks
Farhan
>
> Alex
>
>> driver = to_pci_driver(pdev->dev.driver);
>> if (driver && driver->err_handler && driver->err_handler->error_detected)
>> driver->err_handler->error_detected(pdev, pdev->error_state);
>> -out:
>> +
>> pci_dev_unlock(pdev);
>> }
>>
>> @@ -330,12 +349,12 @@ static void __zpci_event_error(struct zpci_ccdf_err *ccdf)
>> break;
>> case 0x0040: /* Service Action or Error Recovery Failed */
>> case 0x003b:
>> - zpci_event_io_failure(pdev, pci_channel_io_perm_failure);
>> + zpci_event_io_failure(pdev, pci_channel_io_perm_failure, ccdf);
>> break;
>> default: /* PCI function left in the error state attempt to recover */
>> - ers_res = zpci_event_attempt_error_recovery(pdev);
>> + ers_res = zpci_event_attempt_error_recovery(pdev, ccdf);
>> if (ers_res != PCI_ERS_RESULT_RECOVERED)
>> - zpci_event_io_failure(pdev, pci_channel_io_perm_failure);
>> + zpci_event_io_failure(pdev, pci_channel_io_perm_failure, ccdf);
>> break;
>> }
>> pci_dev_put(pdev);
>> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
>> index 0990fdb146b7..0658095ac5b1 100644
>> --- a/drivers/vfio/pci/vfio_pci_zdev.c
>> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
>> @@ -148,6 +148,8 @@ int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>> if (!zdev)
>> return -ENODEV;
>>
>> + zpci_start_mediated_recovery(zdev);
>> +
>> if (!vdev->vdev.kvm)
>> return 0;
>>
>> @@ -161,7 +163,12 @@ void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
>> {
>> struct zpci_dev *zdev = to_zpci(vdev->pdev);
>>
>> - if (!zdev || !vdev->vdev.kvm)
>> + if (!zdev)
>> + return;
>> +
>> + zpci_stop_mediated_recovery(zdev);
>> +
>> + if (!vdev->vdev.kvm)
>> return;
>>
>> if (zpci_kvm_hook.kvm_unregister)
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH v12 4/7] s390/pci: Store PCI error information for passthrough devices
2026-04-07 18:00 ` Farhan Ali
@ 2026-04-07 18:23 ` Alex Williamson
0 siblings, 0 replies; 24+ messages in thread
From: Alex Williamson @ 2026-04-07 18:23 UTC (permalink / raw)
To: Farhan Ali
Cc: linux-s390, linux-kernel, linux-pci, helgaas, lukas, clg, kbusch,
schnelle, mjrosato, alex
On Tue, 7 Apr 2026 11:00:05 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:
> On 4/7/2026 8:38 AM, Alex Williamson wrote:
> > On Mon, 30 Mar 2026 10:40:08 -0700
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> >> @@ -194,13 +214,6 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
> >> }
> >> pdev->error_state = pci_channel_io_frozen;
> >>
> >> - if (is_passed_through(pdev)) {
> >> - pr_info("%s: Cannot be recovered in the host because it is a pass-through device\n",
> >> - pci_name(pdev));
> >> - status_str = "failed (pass-through)";
> >> - goto out_unlock;
> >> - }
> >> -
> >> driver = to_pci_driver(pdev->dev.driver);
> >> if (!is_driver_supported(driver)) {
> >> if (!driver) {
> >> @@ -216,12 +229,23 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
> >> goto out_unlock;
> >> }
> >>
> >> + if (needs_mediated_recovery(pdev))
> > The test result is invalid here. Why not call zpci_store_pci_error()
> > unconditionally here and wrap both in the same guard?
> > needs_mediated_recovery() should have a lockdep_assert.
>
> I think storing the error via zpci_store_pci_error() only made sense if
> the error needed to be handled outside the kernel. Thinking more now I
> think it makes sense to have separate locks, as the mediated_recovery
> flag really provides information on if the vfio device is still opened
> or closed.
I don't think separate locks help the situation, the might conflate it
further. I think the problem here is that we're using the lock as if
it protects two separate things, the mediated_recovery flag vs the
pending_errs structure, but in fact they're related. We only record
errors to the structure when we're in mediated_recovery and we clear
the structure when exiting mediated recovery. One lock makes sense to
me, but it needs to protect both the state of the flag and the use of
the structure together, not independently. Thanks,
Alex
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v12 5/7] vfio-pci/zdev: Add a device feature for error information
2026-03-30 17:40 [PATCH v12 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
` (3 preceding siblings ...)
2026-03-30 17:40 ` [PATCH v12 4/7] s390/pci: Store PCI error information for passthrough devices Farhan Ali
@ 2026-03-30 17:40 ` Farhan Ali
2026-03-31 17:42 ` Matthew Rosato
2026-04-07 15:53 ` Alex Williamson
2026-03-30 17:40 ` [PATCH v12 6/7] vfio/pci: Add a reset_done callback for vfio-pci driver Farhan Ali
` (2 subsequent siblings)
7 siblings, 2 replies; 24+ messages in thread
From: Farhan Ali @ 2026-03-30 17:40 UTC (permalink / raw)
To: linux-s390, linux-kernel, linux-pci
Cc: helgaas, lukas, alex, clg, kbusch, alifm, schnelle, mjrosato
For zPCI devices, we have platform specific error information. The platform
firmware provides this error information to the operating system in an
architecture specific mechanism. To enable recovery from userspace for
these devices, we want to expose this error information to userspace. Add a
new device feature to expose this information.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
drivers/vfio/pci/vfio_pci_core.c | 2 ++
drivers/vfio/pci/vfio_pci_priv.h | 9 ++++++++
drivers/vfio/pci/vfio_pci_zdev.c | 36 ++++++++++++++++++++++++++++++++
include/uapi/linux/vfio.h | 18 ++++++++++++++++
4 files changed, 65 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index d43745fe4c84..bbdb625e35ef 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1534,6 +1534,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_DMA_BUF:
return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
+ case VFIO_DEVICE_FEATURE_ZPCI_ERROR:
+ return vfio_pci_zdev_feature_err(device, flags, arg, argsz);
default:
return -ENOTTY;
}
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 27ac280f00b9..647a9e8f348d 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -89,6 +89,8 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
struct vfio_info_cap *caps);
int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
+int vfio_pci_zdev_feature_err(struct vfio_device *device, u32 flags,
+ void __user *arg, size_t argsz);
#else
static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
struct vfio_info_cap *caps)
@@ -103,6 +105,13 @@ static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
{}
+
+static inline int vfio_pci_zdev_feature_err(struct vfio_device *device,
+ u32 flags, void __user *arg,
+ size_t argsz)
+{
+ return -ENOTTY;
+}
#endif
static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index 0658095ac5b1..0a8663879eee 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -141,6 +141,42 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
return ret;
}
+int vfio_pci_zdev_feature_err(struct vfio_device *device, u32 flags,
+ void __user *arg, size_t argsz)
+{
+ struct vfio_device_feature_zpci_err err = {};
+ struct vfio_pci_core_device *vdev;
+ struct zpci_dev *zdev;
+ int head = 0;
+ int ret;
+
+ vdev = container_of(device, struct vfio_pci_core_device, vdev);
+ zdev = to_zpci(vdev->pdev);
+ if (!zdev)
+ return -ENODEV;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
+ sizeof(err));
+ if (ret != 1)
+ return ret;
+
+ mutex_lock(&zdev->pending_errs_lock);
+ if (zdev->pending_errs.count) {
+ head = zdev->pending_errs.head % ZPCI_ERR_PENDING_MAX;
+ err.pec = zdev->pending_errs.err[head].pec;
+ zdev->pending_errs.head++;
+ zdev->pending_errs.count--;
+ err.pending_errors = zdev->pending_errs.count;
+ }
+ mutex_unlock(&zdev->pending_errs_lock);
+
+ err.version = 1;
+ if (copy_to_user(arg, &err, sizeof(err)))
+ return -EFAULT;
+
+ return 0;
+}
+
int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
{
struct zpci_dev *zdev = to_zpci(vdev->pdev);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index bb7b89330d35..2552eef42000 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1510,6 +1510,24 @@ struct vfio_device_feature_dma_buf {
struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
};
+/**
+ * VFIO_DEVICE_FEATURE_ZPCI_ERROR feature provides PCI error information to
+ * userspace for vfio-pci devices on s390x. On s390x PCI error recovery
+ * involves platform firmware and notification to operating system is done
+ * by architecture specific mechanism. Exposing this information to
+ * userspace allows userspace to take appropriate actions to handle an
+ * error on the device. The pending_errors provide any additional errors
+ * pending for the device, and userspace should read until zero.
+ */
+
+struct vfio_device_feature_zpci_err {
+ __u8 version;
+ __u8 pending_errors;
+ __u16 pec;
+};
+
+#define VFIO_DEVICE_FEATURE_ZPCI_ERROR 12
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.43.0
^ permalink raw reply related [flat|nested] 24+ messages in thread* Re: [PATCH v12 5/7] vfio-pci/zdev: Add a device feature for error information
2026-03-30 17:40 ` [PATCH v12 5/7] vfio-pci/zdev: Add a device feature for error information Farhan Ali
@ 2026-03-31 17:42 ` Matthew Rosato
2026-03-31 19:27 ` Farhan Ali
2026-04-07 15:53 ` Alex Williamson
1 sibling, 1 reply; 24+ messages in thread
From: Matthew Rosato @ 2026-03-31 17:42 UTC (permalink / raw)
To: Farhan Ali, linux-s390, linux-kernel, linux-pci
Cc: helgaas, lukas, alex, clg, kbusch, schnelle
On 3/30/26 1:40 PM, Farhan Ali wrote:
> For zPCI devices, we have platform specific error information. The platform
> firmware provides this error information to the operating system in an
> architecture specific mechanism. To enable recovery from userspace for
> these devices, we want to expose this error information to userspace. Add a
> new device feature to expose this information.
>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Nits and a questions below
> +int vfio_pci_zdev_feature_err(struct vfio_device *device, u32 flags,
> + void __user *arg, size_t argsz)
> +{
> + struct vfio_device_feature_zpci_err err = {};
> + struct vfio_pci_core_device *vdev;
> + struct zpci_dev *zdev;
> + int head = 0;
> + int ret;
> +
> + vdev = container_of(device, struct vfio_pci_core_device, vdev);
> + zdev = to_zpci(vdev->pdev);
> + if (!zdev)
> + return -ENODEV;
> +
> + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
> + sizeof(err));
> + if (ret != 1)
> + return ret;
> +
> + mutex_lock(&zdev->pending_errs_lock);
> + if (zdev->pending_errs.count) {
> + head = zdev->pending_errs.head % ZPCI_ERR_PENDING_MAX;
> + err.pec = zdev->pending_errs.err[head].pec;
> + zdev->pending_errs.head++;
> + zdev->pending_errs.count--;
> + err.pending_errors = zdev->pending_errs.count;
> + }
> + mutex_unlock(&zdev->pending_errs_lock);
> +
Question: intentionally dropping so we don't hold over the copy_to_user?
That makes sense to me; but if not then may as well use guard(mutex)
here too. Does not affect my review tag.
> + err.version = 1;
> + if (copy_to_user(arg, &err, sizeof(err)))
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
> {
> struct zpci_dev *zdev = to_zpci(vdev->pdev);
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index bb7b89330d35..2552eef42000 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1510,6 +1510,24 @@ struct vfio_device_feature_dma_buf {
> struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
> };
>
> +/**
> + * VFIO_DEVICE_FEATURE_ZPCI_ERROR feature provides PCI error information to
> + * userspace for vfio-pci devices on s390x. On s390x PCI error recovery
'On s390x, PCI error...' (suggest comma)
> + * involves platform firmware and notification to operating system is done
> + * by architecture specific mechanism. Exposing this information to
> + * userspace allows userspace to take appropriate actions to handle an
'userspace allows it to take...' (reduce repetition of the word userspace)
> + * error on the device. The pending_errors provide any additional errors
> + * pending for the device, and userspace should read until zero.
> + */
> +
> +struct vfio_device_feature_zpci_err {
> + __u8 version;
> + __u8 pending_errors;
> + __u16 pec;
> +};
> +
> +#define VFIO_DEVICE_FEATURE_ZPCI_ERROR 12
> +
> /* -------- API for Type1 VFIO IOMMU -------- */
>
> /**
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH v12 5/7] vfio-pci/zdev: Add a device feature for error information
2026-03-31 17:42 ` Matthew Rosato
@ 2026-03-31 19:27 ` Farhan Ali
0 siblings, 0 replies; 24+ messages in thread
From: Farhan Ali @ 2026-03-31 19:27 UTC (permalink / raw)
To: Matthew Rosato, linux-s390, linux-kernel, linux-pci
Cc: helgaas, lukas, alex, clg, kbusch, schnelle
On 3/31/2026 10:42 AM, Matthew Rosato wrote:
> On 3/30/26 1:40 PM, Farhan Ali wrote:
>> For zPCI devices, we have platform specific error information. The platform
>> firmware provides this error information to the operating system in an
>> architecture specific mechanism. To enable recovery from userspace for
>> these devices, we want to expose this error information to userspace. Add a
>> new device feature to expose this information.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
>
> Nits and a questions below
>
>
>> +int vfio_pci_zdev_feature_err(struct vfio_device *device, u32 flags,
>> + void __user *arg, size_t argsz)
>> +{
>> + struct vfio_device_feature_zpci_err err = {};
>> + struct vfio_pci_core_device *vdev;
>> + struct zpci_dev *zdev;
>> + int head = 0;
>> + int ret;
>> +
>> + vdev = container_of(device, struct vfio_pci_core_device, vdev);
>> + zdev = to_zpci(vdev->pdev);
>> + if (!zdev)
>> + return -ENODEV;
>> +
>> + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
>> + sizeof(err));
>> + if (ret != 1)
>> + return ret;
>> +
>> + mutex_lock(&zdev->pending_errs_lock);
>> + if (zdev->pending_errs.count) {
>> + head = zdev->pending_errs.head % ZPCI_ERR_PENDING_MAX;
>> + err.pec = zdev->pending_errs.err[head].pec;
>> + zdev->pending_errs.head++;
>> + zdev->pending_errs.count--;
>> + err.pending_errors = zdev->pending_errs.count;
>> + }
>> + mutex_unlock(&zdev->pending_errs_lock);
>> +
> Question: intentionally dropping so we don't hold over the copy_to_user?
> That makes sense to me; but if not then may as well use guard(mutex)
> here too. Does not affect my review tag.
Yes, exactly. I didn't see a need to hold the lock over copy_to_user().
Thanks
Farhan
>
>> + err.version = 1;
>> + if (copy_to_user(arg, &err, sizeof(err)))
>> + return -EFAULT;
>> +
>> + return 0;
>> +}
>> +
>> int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>> {
>> struct zpci_dev *zdev = to_zpci(vdev->pdev);
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index bb7b89330d35..2552eef42000 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -1510,6 +1510,24 @@ struct vfio_device_feature_dma_buf {
>> struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
>> };
>>
>> +/**
>> + * VFIO_DEVICE_FEATURE_ZPCI_ERROR feature provides PCI error information to
>> + * userspace for vfio-pci devices on s390x. On s390x PCI error recovery
> 'On s390x, PCI error...' (suggest comma)
>
>> + * involves platform firmware and notification to operating system is done
>> + * by architecture specific mechanism. Exposing this information to
>> + * userspace allows userspace to take appropriate actions to handle an
> 'userspace allows it to take...' (reduce repetition of the word userspace)
>
>> + * error on the device. The pending_errors provide any additional errors
>> + * pending for the device, and userspace should read until zero.
>> + */
>> +
>> +struct vfio_device_feature_zpci_err {
>> + __u8 version;
>> + __u8 pending_errors;
>> + __u16 pec;
>> +};
>> +
>> +#define VFIO_DEVICE_FEATURE_ZPCI_ERROR 12
>> +
>> /* -------- API for Type1 VFIO IOMMU -------- */
>>
>> /**
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v12 5/7] vfio-pci/zdev: Add a device feature for error information
2026-03-30 17:40 ` [PATCH v12 5/7] vfio-pci/zdev: Add a device feature for error information Farhan Ali
2026-03-31 17:42 ` Matthew Rosato
@ 2026-04-07 15:53 ` Alex Williamson
2026-04-07 18:13 ` Farhan Ali
1 sibling, 1 reply; 24+ messages in thread
From: Alex Williamson @ 2026-04-07 15:53 UTC (permalink / raw)
To: Farhan Ali
Cc: linux-s390, linux-kernel, linux-pci, helgaas, lukas, clg, kbusch,
schnelle, mjrosato, alex
On Mon, 30 Mar 2026 10:40:09 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:
> For zPCI devices, we have platform specific error information. The platform
> firmware provides this error information to the operating system in an
> architecture specific mechanism. To enable recovery from userspace for
> these devices, we want to expose this error information to userspace. Add a
> new device feature to expose this information.
>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 2 ++
> drivers/vfio/pci/vfio_pci_priv.h | 9 ++++++++
> drivers/vfio/pci/vfio_pci_zdev.c | 36 ++++++++++++++++++++++++++++++++
> include/uapi/linux/vfio.h | 18 ++++++++++++++++
> 4 files changed, 65 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index d43745fe4c84..bbdb625e35ef 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1534,6 +1534,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
> return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
> case VFIO_DEVICE_FEATURE_DMA_BUF:
> return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
> + case VFIO_DEVICE_FEATURE_ZPCI_ERROR:
> + return vfio_pci_zdev_feature_err(device, flags, arg, argsz);
> default:
> return -ENOTTY;
> }
> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
> index 27ac280f00b9..647a9e8f348d 100644
> --- a/drivers/vfio/pci/vfio_pci_priv.h
> +++ b/drivers/vfio/pci/vfio_pci_priv.h
> @@ -89,6 +89,8 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
> struct vfio_info_cap *caps);
> int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
> void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
> +int vfio_pci_zdev_feature_err(struct vfio_device *device, u32 flags,
> + void __user *arg, size_t argsz);
> #else
> static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
> struct vfio_info_cap *caps)
> @@ -103,6 +105,13 @@ static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>
> static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
> {}
> +
> +static inline int vfio_pci_zdev_feature_err(struct vfio_device *device,
> + u32 flags, void __user *arg,
> + size_t argsz)
> +{
> + return -ENOTTY;
> +}
> #endif
>
> static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
> index 0658095ac5b1..0a8663879eee 100644
> --- a/drivers/vfio/pci/vfio_pci_zdev.c
> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
> @@ -141,6 +141,42 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
> return ret;
> }
>
> +int vfio_pci_zdev_feature_err(struct vfio_device *device, u32 flags,
> + void __user *arg, size_t argsz)
> +{
> + struct vfio_device_feature_zpci_err err = {};
> + struct vfio_pci_core_device *vdev;
> + struct zpci_dev *zdev;
> + int head = 0;
> + int ret;
> +
> + vdev = container_of(device, struct vfio_pci_core_device, vdev);
> + zdev = to_zpci(vdev->pdev);
> + if (!zdev)
> + return -ENODEV;
> +
> + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
> + sizeof(err));
> + if (ret != 1)
> + return ret;
> +
> + mutex_lock(&zdev->pending_errs_lock);
> + if (zdev->pending_errs.count) {
> + head = zdev->pending_errs.head % ZPCI_ERR_PENDING_MAX;
> + err.pec = zdev->pending_errs.err[head].pec;
> + zdev->pending_errs.head++;
> + zdev->pending_errs.count--;
> + err.pending_errors = zdev->pending_errs.count;
> + }
> + mutex_unlock(&zdev->pending_errs_lock);
Inconsistent that this isn't a helper exported from the previous patch.
What's the meaning of err.pec = 0? Could this be interpreted as an
error itself?
> +
> + err.version = 1;
> + if (copy_to_user(arg, &err, sizeof(err)))
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
> {
> struct zpci_dev *zdev = to_zpci(vdev->pdev);
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index bb7b89330d35..2552eef42000 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1510,6 +1510,24 @@ struct vfio_device_feature_dma_buf {
> struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
> };
>
> +/**
> + * VFIO_DEVICE_FEATURE_ZPCI_ERROR feature provides PCI error information to
> + * userspace for vfio-pci devices on s390x. On s390x PCI error recovery
> + * involves platform firmware and notification to operating system is done
> + * by architecture specific mechanism. Exposing this information to
> + * userspace allows userspace to take appropriate actions to handle an
> + * error on the device. The pending_errors provide any additional errors
> + * pending for the device, and userspace should read until zero.
> + */
> +
> +struct vfio_device_feature_zpci_err {
> + __u8 version;
> + __u8 pending_errors;
> + __u16 pec;
> +};
> +
> +#define VFIO_DEVICE_FEATURE_ZPCI_ERROR 12
12 is used in linux-next by the PRECOPY_INFOv2 feature. Thanks,
Alex
> +
> /* -------- API for Type1 VFIO IOMMU -------- */
>
> /**
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH v12 5/7] vfio-pci/zdev: Add a device feature for error information
2026-04-07 15:53 ` Alex Williamson
@ 2026-04-07 18:13 ` Farhan Ali
2026-04-07 18:27 ` Alex Williamson
0 siblings, 1 reply; 24+ messages in thread
From: Farhan Ali @ 2026-04-07 18:13 UTC (permalink / raw)
To: Alex Williamson
Cc: linux-s390, linux-kernel, linux-pci, helgaas, lukas, clg, kbusch,
schnelle, mjrosato
On 4/7/2026 8:53 AM, Alex Williamson wrote:
> On Mon, 30 Mar 2026 10:40:09 -0700
> Farhan Ali <alifm@linux.ibm.com> wrote:
>
>> For zPCI devices, we have platform specific error information. The platform
>> firmware provides this error information to the operating system in an
>> architecture specific mechanism. To enable recovery from userspace for
>> these devices, we want to expose this error information to userspace. Add a
>> new device feature to expose this information.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>> drivers/vfio/pci/vfio_pci_core.c | 2 ++
>> drivers/vfio/pci/vfio_pci_priv.h | 9 ++++++++
>> drivers/vfio/pci/vfio_pci_zdev.c | 36 ++++++++++++++++++++++++++++++++
>> include/uapi/linux/vfio.h | 18 ++++++++++++++++
>> 4 files changed, 65 insertions(+)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index d43745fe4c84..bbdb625e35ef 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -1534,6 +1534,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
>> return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
>> case VFIO_DEVICE_FEATURE_DMA_BUF:
>> return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
>> + case VFIO_DEVICE_FEATURE_ZPCI_ERROR:
>> + return vfio_pci_zdev_feature_err(device, flags, arg, argsz);
>> default:
>> return -ENOTTY;
>> }
>> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
>> index 27ac280f00b9..647a9e8f348d 100644
>> --- a/drivers/vfio/pci/vfio_pci_priv.h
>> +++ b/drivers/vfio/pci/vfio_pci_priv.h
>> @@ -89,6 +89,8 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
>> struct vfio_info_cap *caps);
>> int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
>> void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
>> +int vfio_pci_zdev_feature_err(struct vfio_device *device, u32 flags,
>> + void __user *arg, size_t argsz);
>> #else
>> static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
>> struct vfio_info_cap *caps)
>> @@ -103,6 +105,13 @@ static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>>
>> static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
>> {}
>> +
>> +static inline int vfio_pci_zdev_feature_err(struct vfio_device *device,
>> + u32 flags, void __user *arg,
>> + size_t argsz)
>> +{
>> + return -ENOTTY;
>> +}
>> #endif
>>
>> static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
>> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
>> index 0658095ac5b1..0a8663879eee 100644
>> --- a/drivers/vfio/pci/vfio_pci_zdev.c
>> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
>> @@ -141,6 +141,42 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
>> return ret;
>> }
>>
>> +int vfio_pci_zdev_feature_err(struct vfio_device *device, u32 flags,
>> + void __user *arg, size_t argsz)
>> +{
>> + struct vfio_device_feature_zpci_err err = {};
>> + struct vfio_pci_core_device *vdev;
>> + struct zpci_dev *zdev;
>> + int head = 0;
>> + int ret;
>> +
>> + vdev = container_of(device, struct vfio_pci_core_device, vdev);
>> + zdev = to_zpci(vdev->pdev);
>> + if (!zdev)
>> + return -ENODEV;
>> +
>> + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
>> + sizeof(err));
>> + if (ret != 1)
>> + return ret;
>> +
>> + mutex_lock(&zdev->pending_errs_lock);
>> + if (zdev->pending_errs.count) {
>> + head = zdev->pending_errs.head % ZPCI_ERR_PENDING_MAX;
>> + err.pec = zdev->pending_errs.err[head].pec;
>> + zdev->pending_errs.head++;
>> + zdev->pending_errs.count--;
>> + err.pending_errors = zdev->pending_errs.count;
>> + }
>> + mutex_unlock(&zdev->pending_errs_lock);
> Inconsistent that this isn't a helper exported from the previous patch.
Do you prefer it to be in a helper function? I can move it to a helper
function.
>
> What's the meaning of err.pec = 0? Could this be interpreted as an
> error itself?
An err.pec = 0 would indicate there are no pending errors. I don't think
anything would prevent userspace from doing a VFIO_DEVICE_FEATURE_GET
even if not nudged by an eventfd?
>
>> +
>> + err.version = 1;
>> + if (copy_to_user(arg, &err, sizeof(err)))
>> + return -EFAULT;
>> +
>> + return 0;
>> +}
>> +
>> int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>> {
>> struct zpci_dev *zdev = to_zpci(vdev->pdev);
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index bb7b89330d35..2552eef42000 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -1510,6 +1510,24 @@ struct vfio_device_feature_dma_buf {
>> struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
>> };
>>
>> +/**
>> + * VFIO_DEVICE_FEATURE_ZPCI_ERROR feature provides PCI error information to
>> + * userspace for vfio-pci devices on s390x. On s390x PCI error recovery
>> + * involves platform firmware and notification to operating system is done
>> + * by architecture specific mechanism. Exposing this information to
>> + * userspace allows userspace to take appropriate actions to handle an
>> + * error on the device. The pending_errors provide any additional errors
>> + * pending for the device, and userspace should read until zero.
>> + */
>> +
>> +struct vfio_device_feature_zpci_err {
>> + __u8 version;
>> + __u8 pending_errors;
>> + __u16 pec;
>> +};
>> +
>> +#define VFIO_DEVICE_FEATURE_ZPCI_ERROR 12
> 12 is used in linux-next by the PRECOPY_INFOv2 feature. Thanks,
>
> Alex
Ah okay, will update this. I think my next rebase would be on the new
kernel (7.1) and that should have the conflicting PCI patches.
Thanks
Farhan
>> +
>> /* -------- API for Type1 VFIO IOMMU -------- */
>>
>> /**
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH v12 5/7] vfio-pci/zdev: Add a device feature for error information
2026-04-07 18:13 ` Farhan Ali
@ 2026-04-07 18:27 ` Alex Williamson
0 siblings, 0 replies; 24+ messages in thread
From: Alex Williamson @ 2026-04-07 18:27 UTC (permalink / raw)
To: Farhan Ali
Cc: linux-s390, linux-kernel, linux-pci, helgaas, lukas, clg, kbusch,
schnelle, mjrosato, alex
On Tue, 7 Apr 2026 11:13:53 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:
> On 4/7/2026 8:53 AM, Alex Williamson wrote:
> > On Mon, 30 Mar 2026 10:40:09 -0700
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> >> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
> >> index 0658095ac5b1..0a8663879eee 100644
> >> --- a/drivers/vfio/pci/vfio_pci_zdev.c
> >> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
> >> @@ -141,6 +141,42 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
> >> return ret;
> >> }
> >>
> >> +int vfio_pci_zdev_feature_err(struct vfio_device *device, u32 flags,
> >> + void __user *arg, size_t argsz)
> >> +{
> >> + struct vfio_device_feature_zpci_err err = {};
> >> + struct vfio_pci_core_device *vdev;
> >> + struct zpci_dev *zdev;
> >> + int head = 0;
> >> + int ret;
> >> +
> >> + vdev = container_of(device, struct vfio_pci_core_device, vdev);
> >> + zdev = to_zpci(vdev->pdev);
> >> + if (!zdev)
> >> + return -ENODEV;
> >> +
> >> + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
> >> + sizeof(err));
> >> + if (ret != 1)
> >> + return ret;
> >> +
> >> + mutex_lock(&zdev->pending_errs_lock);
> >> + if (zdev->pending_errs.count) {
> >> + head = zdev->pending_errs.head % ZPCI_ERR_PENDING_MAX;
> >> + err.pec = zdev->pending_errs.err[head].pec;
> >> + zdev->pending_errs.head++;
> >> + zdev->pending_errs.count--;
> >> + err.pending_errors = zdev->pending_errs.count;
> >> + }
> >> + mutex_unlock(&zdev->pending_errs_lock);
> > Inconsistent that this isn't a helper exported from the previous patch.
>
> Do you prefer it to be in a helper function? I can move it to a helper
> function.
Yes, a helper that dequeues and returns the error pec with an arg of the
remaining count would make sense here. Call it unconditionally and
specify 0/0 = no error and none pending. All the locking and buffer
manipulation code is localized in s390 code.
> >
> > What's the meaning of err.pec = 0? Could this be interpreted as an
> > error itself?
>
> An err.pec = 0 would indicate there are no pending errors. I don't think
> anything would prevent userspace from doing a VFIO_DEVICE_FEATURE_GET
> even if not nudged by an eventfd?
Maybe worth specifying in the feature description. Thanks,
Alex
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v12 6/7] vfio/pci: Add a reset_done callback for vfio-pci driver
2026-03-30 17:40 [PATCH v12 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
` (4 preceding siblings ...)
2026-03-30 17:40 ` [PATCH v12 5/7] vfio-pci/zdev: Add a device feature for error information Farhan Ali
@ 2026-03-30 17:40 ` Farhan Ali
2026-03-31 17:43 ` Matthew Rosato
2026-03-30 17:40 ` [PATCH v12 7/7] vfio/pci: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX Farhan Ali
2026-04-06 17:23 ` [PATCH v12 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
7 siblings, 1 reply; 24+ messages in thread
From: Farhan Ali @ 2026-03-30 17:40 UTC (permalink / raw)
To: linux-s390, linux-kernel, linux-pci
Cc: helgaas, lukas, alex, clg, kbusch, alifm, schnelle, mjrosato,
Julian Ruess
On error recovery for a PCI device bound to vfio-pci driver, we want to
recover the state of the device to its last known saved state. The callback
restores the state of the device to its initial saved state.
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Acked-by: Alex Williamson <alex@shazbot.org>
---
drivers/vfio/pci/vfio_pci_core.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index bbdb625e35ef..f1bd1266b88f 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2257,6 +2257,17 @@ pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
}
EXPORT_SYMBOL_GPL(vfio_pci_core_aer_err_detected);
+static void vfio_pci_core_aer_reset_done(struct pci_dev *pdev)
+{
+ struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);
+
+ if (!vdev->pci_saved_state)
+ return;
+
+ pci_load_saved_state(pdev, vdev->pci_saved_state);
+ pci_restore_state(pdev);
+}
+
int vfio_pci_core_sriov_configure(struct vfio_pci_core_device *vdev,
int nr_virtfn)
{
@@ -2321,6 +2332,7 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_sriov_configure);
const struct pci_error_handlers vfio_pci_core_err_handlers = {
.error_detected = vfio_pci_core_aer_err_detected,
+ .reset_done = vfio_pci_core_aer_reset_done,
};
EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);
--
2.43.0
^ permalink raw reply related [flat|nested] 24+ messages in thread* Re: [PATCH v12 6/7] vfio/pci: Add a reset_done callback for vfio-pci driver
2026-03-30 17:40 ` [PATCH v12 6/7] vfio/pci: Add a reset_done callback for vfio-pci driver Farhan Ali
@ 2026-03-31 17:43 ` Matthew Rosato
0 siblings, 0 replies; 24+ messages in thread
From: Matthew Rosato @ 2026-03-31 17:43 UTC (permalink / raw)
To: Farhan Ali, linux-s390, linux-kernel, linux-pci
Cc: helgaas, lukas, alex, clg, kbusch, schnelle, Julian Ruess
On 3/30/26 1:40 PM, Farhan Ali wrote:
> On error recovery for a PCI device bound to vfio-pci driver, we want to
> recover the state of the device to its last known saved state. The callback
> restores the state of the device to its initial saved state.
>
> Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> Acked-by: Alex Williamson <alex@shazbot.org>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v12 7/7] vfio/pci: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX
2026-03-30 17:40 [PATCH v12 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
` (5 preceding siblings ...)
2026-03-30 17:40 ` [PATCH v12 6/7] vfio/pci: Add a reset_done callback for vfio-pci driver Farhan Ali
@ 2026-03-30 17:40 ` Farhan Ali
2026-04-07 15:58 ` Alex Williamson
2026-04-06 17:23 ` [PATCH v12 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
7 siblings, 1 reply; 24+ messages in thread
From: Farhan Ali @ 2026-03-30 17:40 UTC (permalink / raw)
To: linux-s390, linux-kernel, linux-pci
Cc: helgaas, lukas, alex, clg, kbusch, alifm, schnelle, mjrosato,
Julian Ruess
The error signaling is configured for the vast majority of devices and it's
extremely rare that it fires anyway. Removing the pcie check will allow
userspace to be notified on errors for legacy PCI devices. The Internal
Shared Memory (ISM) device on s390 is one such device. For PCI devices on
IBM s390 error recovery involves platform firmware and notification to
operating system is done by architecture specific way. So the ISM device
can still be recovered when notified of an error.
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
drivers/vfio/pci/vfio_pci_core.c | 8 ++------
drivers/vfio/pci/vfio_pci_intrs.c | 3 +--
2 files changed, 3 insertions(+), 8 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index f1bd1266b88f..cfd9a51cd194 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -786,8 +786,7 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ
return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
}
} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
- if (pci_is_pcie(vdev->pdev))
- return 1;
+ return 1;
} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
return 1;
}
@@ -1163,11 +1162,8 @@ static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev,
switch (info.index) {
case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
case VFIO_PCI_REQ_IRQ_INDEX:
- break;
case VFIO_PCI_ERR_IRQ_INDEX:
- if (pci_is_pcie(vdev->pdev))
- break;
- fallthrough;
+ break;
default:
return -EINVAL;
}
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 33944d4d9dc4..64f80f64ff57 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -859,8 +859,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
case VFIO_PCI_ERR_IRQ_INDEX:
switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
case VFIO_IRQ_SET_ACTION_TRIGGER:
- if (pci_is_pcie(vdev->pdev))
- func = vfio_pci_set_err_trigger;
+ func = vfio_pci_set_err_trigger;
break;
}
break;
--
2.43.0
^ permalink raw reply related [flat|nested] 24+ messages in thread* Re: [PATCH v12 7/7] vfio/pci: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX
2026-03-30 17:40 ` [PATCH v12 7/7] vfio/pci: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX Farhan Ali
@ 2026-04-07 15:58 ` Alex Williamson
0 siblings, 0 replies; 24+ messages in thread
From: Alex Williamson @ 2026-04-07 15:58 UTC (permalink / raw)
To: Farhan Ali
Cc: linux-s390, linux-kernel, linux-pci, helgaas, lukas, clg, kbusch,
schnelle, mjrosato, Julian Ruess, alex
On Mon, 30 Mar 2026 10:40:11 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:
> The error signaling is configured for the vast majority of devices and it's
> extremely rare that it fires anyway. Removing the pcie check will allow
> userspace to be notified on errors for legacy PCI devices. The Internal
> Shared Memory (ISM) device on s390 is one such device. For PCI devices on
> IBM s390 error recovery involves platform firmware and notification to
> operating system is done by architecture specific way. So the ISM device
> can still be recovered when notified of an error.
>
> Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 8 ++------
> drivers/vfio/pci/vfio_pci_intrs.c | 3 +--
> 2 files changed, 3 insertions(+), 8 deletions(-)
Reviewed-by: Alex Williamson <alex@shazbot.org>
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index f1bd1266b88f..cfd9a51cd194 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -786,8 +786,7 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ
> return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
> }
> } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
> - if (pci_is_pcie(vdev->pdev))
> - return 1;
> + return 1;
> } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
> return 1;
> }
> @@ -1163,11 +1162,8 @@ static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev,
> switch (info.index) {
> case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
> case VFIO_PCI_REQ_IRQ_INDEX:
> - break;
> case VFIO_PCI_ERR_IRQ_INDEX:
> - if (pci_is_pcie(vdev->pdev))
> - break;
> - fallthrough;
> + break;
> default:
> return -EINVAL;
> }
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 33944d4d9dc4..64f80f64ff57 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -859,8 +859,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
> case VFIO_PCI_ERR_IRQ_INDEX:
> switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> case VFIO_IRQ_SET_ACTION_TRIGGER:
> - if (pci_is_pcie(vdev->pdev))
> - func = vfio_pci_set_err_trigger;
> + func = vfio_pci_set_err_trigger;
> break;
> }
> break;
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v12 0/7] Error recovery for vfio-pci devices on s390x
2026-03-30 17:40 [PATCH v12 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
` (6 preceding siblings ...)
2026-03-30 17:40 ` [PATCH v12 7/7] vfio/pci: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX Farhan Ali
@ 2026-04-06 17:23 ` Farhan Ali
7 siblings, 0 replies; 24+ messages in thread
From: Farhan Ali @ 2026-04-06 17:23 UTC (permalink / raw)
To: linux-s390, linux-kernel, linux-pci, Bjorn Helgaas,
Alex Williamson
Cc: lukas, clg, kbusch, schnelle, mjrosato
Gentle ping on this series. Hope I have been able address the concerns
from last revision.
Thanks
Farhan
On 3/30/2026 10:40 AM, Farhan Ali wrote:
> Hi,
>
> This Linux kernel patch series introduces support for error recovery for
> passthrough PCI devices on System Z (s390x).
>
> Background
> ----------
> For PCI devices on s390x an operating system receives platform specific
> error events from firmware rather than through AER.Today for
> passthrough/userspace devices, we don't attempt any error recovery and
> ignore any error events for the devices. The passthrough/userspace devices
> are managed by the vfio-pci driver. The driver does register error handling
> callbacks (error_detected), and on an error trigger an eventfd to
> userspace. But we need a mechanism to notify userspace
> (QEMU/guest/userspace drivers) about the error event.
>
> Proposal
> --------
> We can expose this error information (currently only the PCI Error Code)
> via a device feature. Userspace can then obtain the error information
> via VFIO_DEVICE_FEATURE ioctl and take appropriate actions such as driving
> a device reset.
>
> This is how a typical flow for passthrough devices to a VM would work:
> For passthrough devices to a VM, the driver bound to the device on the host
> is vfio-pci. vfio-pci driver does support the error_detected() callback
> (vfio_pci_core_aer_err_detected()), and on an PCI error s390x recovery
> code on the host will call the vfio-pci error_detected() callback. The
> vfio-pci error_detected() callback will notify userspace/QEMU via an
> eventfd, and return PCI_ERS_RESULT_CAN_RECOVER. At this point the s390x
> error recovery on the host will skip any further action(see patch 4) and
> let userspace drive the error recovery.
>
> Once userspace/QEMU is notified, it then injects this error into the VM
> so device drivers in the VM can take recovery actions. For example for a
> passthrough NVMe device, the VM's OS NVMe driver will access the device.
> At this point the VM's NVMe driver's error_detected() will drive the
> recovery by returning PCI_ERS_RESULT_NEED_RESET, and the s390x error
> recovery in the VM's OS will try to do a reset. Resets are privileged
> operations and so the VM will need intervention from QEMU to perform the
> reset. QEMU will invoke the VFIO_DEVICE_RESET ioctl to now notify the
> host that the VM is requesting a reset of the device. The vfio-pci driver
> on the host will then perform the reset on the device to recover it.
>
>
> Thanks
> Farhan
>
> ChangeLog
> ---------
> v11 series https://lore.kernel.org/all/20260316191544.2279-1-alifm@linux.ibm.com/
> - Address Bjorn's comments from v11 (patches 1-3).
>
> - Create a common function to check config space accessibility
> (patch 2).
>
> - Address Alex's comments from v11 (patches 4, 5, 7).
>
> - Protect the mediated_recovery flag with the pending_errs_lock.
> Doing that it made sense to squash patches 5 and 6 from v11
> (current patch 4). Even though the code didn't change significantly
> I have dropped R-b tags for it. Would appreciate another look at the
> patch (current patch 4).
>
> - Dropped arch specific pcibios_resource_to_bus and
> pcibios_bus_to_resource as its not needed for this series. Will address
> the issue as a standalone patch separate from this series.
>
> - Rebased on pci/next, with head at f8a1c947ccc6 ("Merge branch 'pci/misc'")
>
>
> v10 series https://lore.kernel.org/all/20260302203325.3826-1-alifm@linux.ibm.com/
> v10 -> v11
> - Rebase on pci/next to handle merge conflicts with patch 1.
>
> - Typo fixup in commit message (patch 4) and use guard() for mutex
> (patch 6).
>
> v9 series https://lore.kernel.org/all/20260217182257.1582-1-alifm@linux.ibm.com/
> v9 -> v10
> - Change pci_slot number to u16 (patch 1).
>
> - Avoid saving invalid config space state if config space is
> inaccessible in the device reset path. It uses the same patch as in v8
> with R-b from Niklas.
>
> - Rebase on 7.0.0-rc2
>
>
> v8 series https://lore.kernel.org/all/20260122194437.1903-1-alifm@linux.ibm.com/
> v8 -> v9
> - Avoid saving PCI config space state in reset path (patch 3) (suggested by Bjorn)
>
> - Add explicit version to struct vfio_device_feature_zpci_err (patch 7).
>
> - Rebase on 6.19
>
>
> v7 series https://lore.kernel.org/all/20260107183217.1365-1-alifm@linux.ibm.com/
> v7 -> v8
> - Rebase on 6.19-rc4
>
> - Address feedback from Niklas and Julien.
>
>
> v6 series https://lore.kernel.org/all/2c609e61-1861-4bf3-b019-a11c137d26a5@linux.ibm.com/
> v6 -> v7
> - Rebase on 6.19-rc4
>
> - Update commit message based on Niklas's suggestion (patch 3).
>
> v5 series https://lore.kernel.org/all/20251113183502.2388-1-alifm@linux.ibm.com/
> v5 -> v6
> - Rebase on 6.18 + Lukas's PCI: Universal error recoverability of
> devices series (https://lore.kernel.org/all/cover.1763483367.git.lukas@wunner.de/)
>
> - Re-work config space accessibility check to pci_dev_save_and_disable() (patch 3).
> This avoids saving the config space, in the reset path, if the device's config space is
> corrupted or inaccessible.
>
> v4 series https://lore.kernel.org/all/20250924171628.826-1-alifm@linux.ibm.com/
> v4 -> v5
> - Rebase on 6.18-rc5
>
> - Move bug fixes to the beginning of the series (patch 1 and 2). These patches
> were posted as a separate fixes series
> https://lore.kernel.org/all/a14936ac-47d6-461b-816f-0fd66f869b0f@linux.ibm.com/
>
> - Add matching pci_put_dev() for pci_get_slot() (patch 6).
>
> v3 series https://lore.kernel.org/all/20250911183307.1910-1-alifm@linux.ibm.com/
> v3 -> v4
> - Remove warn messages for each PCI capability not restored (patch 1)
>
> - Check PCI_COMMAND and PCI_STATUS register for error value instead of device id
> (patch 1)
>
> - Fix kernel crash in patch 3
>
> - Added reviewed by tags
>
> - Address comments from Niklas's (patches 4, 5, 7)
>
> - Fix compilation error non s390x system (patch 8)
>
> - Explicitly align struct vfio_device_feature_zpci_err (patch 8)
>
>
> v2 series https://lore.kernel.org/all/20250825171226.1602-1-alifm@linux.ibm.com/
> v2 -> v3
> - Patch 1 avoids saving any config space state if the device is in error
> (suggested by Alex)
>
> - Patch 2 adds additional check only for FLR reset to try other function
> reset method (suggested by Alex).
>
> - Patch 3 fixes a bug in s390 for resetting PCI devices with multiple
> functions. Creates a new flag pci_slot to allow per function slot.
>
> - Patch 4 fixes a bug in s390 for resource to bus address translation.
>
> - Rebase on 6.17-rc5
>
>
> v1 series https://lore.kernel.org/all/20250813170821.1115-1-alifm@linux.ibm.com/
> v1 - > v2
> - Patches 1 and 2 adds some additional checks for FLR/PM reset to
> try other function reset method (suggested by Alex).
>
> - Patch 3 fixes a bug in s390 for resetting PCI devices with multiple
> functions.
>
> - Patch 7 adds a new device feature for zPCI devices for the VFIO_DEVICE_FEATURE
> ioctl. The ioctl is used by userspace to retriece any PCI error
> information for the device (suggested by Alex).
>
> - Patch 8 adds a reset_done() callback for the vfio-pci driver, to
> restore the state of the device after a reset.
>
> - Patch 9 removes the pcie check for triggering VFIO_PCI_ERR_IRQ_INDEX.
>
>
> Farhan Ali (7):
> PCI: Allow per function PCI slots to fix slot reset on s390
> PCI: Avoid saving config space state if inaccessible
> PCI: Fail FLR when config space is inaccessible
> s390/pci: Store PCI error information for passthrough devices
> vfio-pci/zdev: Add a device feature for error information
> vfio/pci: Add a reset_done callback for vfio-pci driver
> vfio/pci: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX
>
> arch/s390/include/asm/pci.h | 30 ++++++++
> arch/s390/pci/pci.c | 1 +
> arch/s390/pci/pci_event.c | 113 +++++++++++++++++-------------
> drivers/pci/hotplug/rpaphp_slot.c | 2 +-
> drivers/pci/pci.c | 32 ++++++++-
> drivers/pci/slot.c | 33 ++++++---
> drivers/vfio/pci/vfio_pci_core.c | 22 ++++--
> drivers/vfio/pci/vfio_pci_intrs.c | 3 +-
> drivers/vfio/pci/vfio_pci_priv.h | 9 +++
> drivers/vfio/pci/vfio_pci_zdev.c | 45 +++++++++++-
> include/linux/pci.h | 8 ++-
> include/uapi/linux/vfio.h | 18 +++++
> 12 files changed, 247 insertions(+), 69 deletions(-)
>
^ permalink raw reply [flat|nested] 24+ messages in thread