* [PATCH v2 0/4] Error recovery for zPCI passthrough devices
@ 2025-08-25 21:24 Farhan Ali
2025-08-25 21:24 ` [PATCH v2 1/4] [NOTFORMERGE] linux-headers: Update for zpci vfio device Farhan Ali
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Farhan Ali @ 2025-08-25 21:24 UTC (permalink / raw)
To: qemu-devel, qemu-s390x; +Cc: alifm, mjrosato, thuth, alex.williamson, clg
Hi,
This patch series introduces support for error recovery for passthrough
PCI devices on System Z (s390x). This is the user space component for the Linux
kernel patches [1]. For QEMU on eventfd notification for PCI error from vfio-pci
driver we call the vfio error handler. We can use an architecture specific error
handler to override the default vfio error handler.
For s390x specific error handler, we retrieve the architecture specific PCI error
information and inject the information into the guest. Once the guest receives
the error information, the guest drivers will drive the error recovery.
Typically recovery involves a device reset which translate to CLP
disable/enable cycle for the device.
I would appreciate some feedback on this patch series.
Thanks
Farhan
[1] https://lore.kernel.org/all/20250825171226.1602-1-alifm@linux.ibm.com/
ChangeLog
---------
v1 https://lore.kernel.org/qemu-devel/20250813174152.1238-1-alifm@linux.ibm.com/
v1 -> v2
- Use VFIO_DEVICE_FEATURE ioctl to get device error information.
(Based on Alex's feedback on kernel series)
Farhan Ali (4):
[NOTFORMERGE] linux-headers: Update for zpci vfio device
vfio/pci: Add an architecture specific error handler
s390x/pci: Add PCI error handling for vfio pci devices
s390x/pci: Reset a device in error state
hw/s390x/s390-pci-bus.c | 12 +++++
hw/s390x/s390-pci-vfio.c | 82 ++++++++++++++++++++++++++++++++
hw/vfio/pci.c | 5 ++
hw/vfio/pci.h | 1 +
include/hw/s390x/s390-pci-bus.h | 1 +
include/hw/s390x/s390-pci-vfio.h | 4 ++
linux-headers/linux/vfio.h | 14 ++++++
7 files changed, 119 insertions(+)
--
2.43.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 1/4] [NOTFORMERGE] linux-headers: Update for zpci vfio device
2025-08-25 21:24 [PATCH v2 0/4] Error recovery for zPCI passthrough devices Farhan Ali
@ 2025-08-25 21:24 ` Farhan Ali
2025-08-25 21:24 ` [PATCH v2 2/4] vfio/pci: Add an architecture specific error handler Farhan Ali
` (2 subsequent siblings)
3 siblings, 0 replies; 14+ messages in thread
From: Farhan Ali @ 2025-08-25 21:24 UTC (permalink / raw)
To: qemu-devel, qemu-s390x; +Cc: alifm, mjrosato, thuth, alex.williamson, clg
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
| 14 ++++++++++++++
1 file changed, 14 insertions(+)
--git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 79bf8c0cc5..ef2cae92f5 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -1468,6 +1468,20 @@ struct vfio_device_feature_bus_master {
};
#define VFIO_DEVICE_FEATURE_BUS_MASTER 10
+/**
+ * VFIO_DEVICE_FEATURE_ZPCI_ERROR feature provides PCI error information to
+ * userspace for vfio-pci devices on s390x. On s390x PCI error recovery involves
+ * platform firmware and notification to operating system is done by
+ * architecture specific mechanism. Exposing this information to userspace
+ * allows userspace to take appropriate actions to handle an error on the
+ * device.
+ */
+struct vfio_device_feature_zpci_err {
+ __u16 pec;
+ int pending_errors;
+};
+#define VFIO_DEVICE_FEATURE_ZPCI_ERROR 11
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 2/4] vfio/pci: Add an architecture specific error handler
2025-08-25 21:24 [PATCH v2 0/4] Error recovery for zPCI passthrough devices Farhan Ali
2025-08-25 21:24 ` [PATCH v2 1/4] [NOTFORMERGE] linux-headers: Update for zpci vfio device Farhan Ali
@ 2025-08-25 21:24 ` Farhan Ali
2025-09-01 11:28 ` Cédric Le Goater
2025-08-25 21:24 ` [PATCH v2 3/4] s390x/pci: Add PCI error handling for vfio pci devices Farhan Ali
2025-08-25 21:24 ` [PATCH v2 4/4] s390x/pci: Reset a device in error state Farhan Ali
3 siblings, 1 reply; 14+ messages in thread
From: Farhan Ali @ 2025-08-25 21:24 UTC (permalink / raw)
To: qemu-devel, qemu-s390x; +Cc: alifm, mjrosato, thuth, alex.williamson, clg
Provide a architecture specific error handling callback, that can be used
by platforms to handle PCI errors for passthrough devices.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
hw/vfio/pci.c | 5 +++++
hw/vfio/pci.h | 1 +
2 files changed, 6 insertions(+)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 07257d0fa0..3c71d19306 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3026,6 +3026,11 @@ static void vfio_err_notifier_handler(void *opaque)
return;
}
+ if (vdev->arch_err_handler) {
+ vdev->arch_err_handler(vdev);
+ return;
+ }
+
/*
* TBD. Retrieve the error details and decide what action
* needs to be taken. One of the actions could be to pass
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 810a842f4a..45d4405e47 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -145,6 +145,7 @@ struct VFIOPCIDevice {
EventNotifier err_notifier;
EventNotifier req_notifier;
int (*resetfn)(struct VFIOPCIDevice *);
+ void (*arch_err_handler)(struct VFIOPCIDevice *);
uint32_t vendor_id;
uint32_t device_id;
uint32_t sub_vendor_id;
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 3/4] s390x/pci: Add PCI error handling for vfio pci devices
2025-08-25 21:24 [PATCH v2 0/4] Error recovery for zPCI passthrough devices Farhan Ali
2025-08-25 21:24 ` [PATCH v2 1/4] [NOTFORMERGE] linux-headers: Update for zpci vfio device Farhan Ali
2025-08-25 21:24 ` [PATCH v2 2/4] vfio/pci: Add an architecture specific error handler Farhan Ali
@ 2025-08-25 21:24 ` Farhan Ali
2025-09-01 11:25 ` Cédric Le Goater
2025-08-25 21:24 ` [PATCH v2 4/4] s390x/pci: Reset a device in error state Farhan Ali
3 siblings, 1 reply; 14+ messages in thread
From: Farhan Ali @ 2025-08-25 21:24 UTC (permalink / raw)
To: qemu-devel, qemu-s390x; +Cc: alifm, mjrosato, thuth, alex.williamson, clg
Add an s390x specific callback for vfio error handling. For s390x pci devices,
we have platform specific error information. We need to retrieve this error
information for passthrough devices. This is done via a memory region which
exposes that information.
Once this error information is retrieved we can then inject an error into
the guest, and let the guest drive the recovery.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
hw/s390x/s390-pci-bus.c | 5 +++
hw/s390x/s390-pci-vfio.c | 76 ++++++++++++++++++++++++++++++++
include/hw/s390x/s390-pci-bus.h | 1 +
include/hw/s390x/s390-pci-vfio.h | 2 +
4 files changed, 84 insertions(+)
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index f87d2748b6..af42eb9938 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -158,6 +158,8 @@ static void s390_pci_perform_unplug(S390PCIBusDevice *pbdev)
{
HotplugHandler *hotplug_ctrl;
+ qemu_mutex_destroy(&pbdev->err_handler_lock);
+
if (pbdev->pft == ZPCI_PFT_ISM) {
notifier_remove(&pbdev->shutdown_notifier);
}
@@ -1140,6 +1142,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
pbdev->iommu->pbdev = pbdev;
pbdev->state = ZPCI_FS_DISABLED;
set_pbdev_info(pbdev);
+ qemu_mutex_init(&pbdev->err_handler_lock);
if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
/*
@@ -1164,6 +1167,8 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
/* Fill in CLP information passed via the vfio region */
s390_pci_get_clp_info(pbdev);
+ /* Setup error handler for error recovery */
+ s390_pci_setup_err_handler(pbdev);
if (!pbdev->interp) {
/* Do vfio passthrough but intercept for I/O */
pbdev->fh |= FH_SHM_VFIO;
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index aaf91319b4..87ecd06a81 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -10,6 +10,7 @@
*/
#include "qemu/osdep.h"
+#include "qemu/error-report.h"
#include <sys/ioctl.h>
#include <linux/vfio.h>
@@ -103,6 +104,60 @@ void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt)
}
}
+static int s390_pci_get_feature_err(VFIOPCIDevice *vfio_pci,
+ struct vfio_device_feature_zpci_err *err)
+{
+ int ret;
+ uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
+ sizeof(struct vfio_device_feature_zpci_err),
+ sizeof(uint64_t))] = {};
+ struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+
+ feature->argsz = sizeof(buf);
+ feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_ZPCI_ERROR;
+ ret = vfio_pci->vbasedev.io_ops->device_feature(&vfio_pci->vbasedev,
+ feature);
+
+ if (ret) {
+ error_report("Failed feature get VFIO_DEVICE_FEATURE_ZPCI_ERROR"
+ " (rc=%d)", ret);
+ return ret;
+ }
+
+ memcpy(err, (struct vfio_device_feature_zpci_err *) feature->data,
+ sizeof(struct vfio_device_feature_zpci_err));
+ return 0;
+}
+
+static void s390_pci_err_handler(VFIOPCIDevice *vfio_pci)
+{
+ S390PCIBusDevice *pbdev;
+ struct vfio_device_feature_zpci_err err;
+ int ret;
+
+ pbdev = s390_pci_find_dev_by_target(s390_get_phb(),
+ DEVICE(&vfio_pci->pdev)->id);
+
+ QEMU_LOCK_GUARD(&pbdev->err_handler_lock);
+
+ ret = s390_pci_get_feature_err(vfio_pci, &err);
+ if (ret) {
+ return;
+ }
+
+ pbdev->state = ZPCI_FS_ERROR;
+ s390_pci_generate_error_event(err.pec, pbdev->fh, pbdev->fid, 0, 0);
+
+ while (err.pending_errors) {
+ ret = s390_pci_get_feature_err(vfio_pci, &err);
+ if (ret) {
+ return;
+ }
+ s390_pci_generate_error_event(err.pec, pbdev->fh, pbdev->fid, 0, 0);
+ }
+ return;
+}
+
static void s390_pci_read_base(S390PCIBusDevice *pbdev,
struct vfio_device_info *info)
{
@@ -369,3 +424,24 @@ void s390_pci_get_clp_info(S390PCIBusDevice *pbdev)
s390_pci_read_util(pbdev, info);
s390_pci_read_pfip(pbdev, info);
}
+
+void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev)
+{
+ int ret;
+ VFIOPCIDevice *vfio_pci = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+ uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
+ sizeof(uint64_t))] = {};
+ struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+
+ feature->argsz = sizeof(buf);
+ feature->flags = VFIO_DEVICE_FEATURE_PROBE | VFIO_DEVICE_FEATURE_ZPCI_ERROR;
+
+ ret = vfio_pci->vbasedev.io_ops->device_feature(&vfio_pci->vbasedev,
+ feature);
+
+ if (ret) {
+ info_report("Automated error recovery not available for passthrough device");
+ return;
+ }
+ vfio_pci->arch_err_handler = s390_pci_err_handler;
+}
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 04944d4fed..3795e0bbfc 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -364,6 +364,7 @@ struct S390PCIBusDevice {
bool forwarding_assist;
bool aif;
bool rtr_avail;
+ QemuMutex err_handler_lock;
QTAILQ_ENTRY(S390PCIBusDevice) link;
};
diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
index ae1b126ff7..66b274293c 100644
--- a/include/hw/s390x/s390-pci-vfio.h
+++ b/include/hw/s390x/s390-pci-vfio.h
@@ -22,6 +22,7 @@ S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt);
bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh);
void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
+void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev);
#else
static inline bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
{
@@ -39,6 +40,7 @@ static inline bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh)
return false;
}
static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
+static inline void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev) { }
#endif
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 4/4] s390x/pci: Reset a device in error state
2025-08-25 21:24 [PATCH v2 0/4] Error recovery for zPCI passthrough devices Farhan Ali
` (2 preceding siblings ...)
2025-08-25 21:24 ` [PATCH v2 3/4] s390x/pci: Add PCI error handling for vfio pci devices Farhan Ali
@ 2025-08-25 21:24 ` Farhan Ali
2025-09-01 11:17 ` Cédric Le Goater
3 siblings, 1 reply; 14+ messages in thread
From: Farhan Ali @ 2025-08-25 21:24 UTC (permalink / raw)
To: qemu-devel, qemu-s390x; +Cc: alifm, mjrosato, thuth, alex.williamson, clg
For passthrough devices in error state, for a guest driven reset of the
device we can attempt a reset to recover the device. A reset of the device
will trigger a CLP disable/enable cycle on the host to bring the device
into a recovered state.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
hw/s390x/s390-pci-bus.c | 7 +++++++
hw/s390x/s390-pci-vfio.c | 6 ++++++
include/hw/s390x/s390-pci-vfio.h | 2 ++
3 files changed, 15 insertions(+)
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index af42eb9938..c9c2d775f0 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -1493,6 +1493,8 @@ static void s390_pci_device_reset(DeviceState *dev)
return;
case ZPCI_FS_STANDBY:
break;
+ case ZPCI_FS_ERROR:
+ break;
default:
pbdev->fh &= ~FH_MASK_ENABLE;
pbdev->state = ZPCI_FS_DISABLED;
@@ -1505,6 +1507,11 @@ static void s390_pci_device_reset(DeviceState *dev)
} else if (pbdev->summary_ind) {
pci_dereg_irqs(pbdev);
}
+
+ if (pbdev->state == ZPCI_FS_ERROR) {
+ s390_pci_reset(pbdev);
+ }
+
if (pbdev->iommu->enabled) {
pci_dereg_ioat(pbdev->iommu);
}
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 87ecd06a81..a11ec770a7 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -158,6 +158,12 @@ static void s390_pci_err_handler(VFIOPCIDevice *vfio_pci)
return;
}
+void s390_pci_reset(S390PCIBusDevice *pbdev)
+{
+ VFIOPCIDevice *vfio_pci = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+ ioctl(vfio_pci->vbasedev.fd, VFIO_DEVICE_RESET);
+}
+
static void s390_pci_read_base(S390PCIBusDevice *pbdev,
struct vfio_device_info *info)
{
diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
index 66b274293c..c28dafeed8 100644
--- a/include/hw/s390x/s390-pci-vfio.h
+++ b/include/hw/s390x/s390-pci-vfio.h
@@ -23,6 +23,7 @@ void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt);
bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh);
void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev);
+void s390_pci_reset(S390PCIBusDevice *pbdev);
#else
static inline bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
{
@@ -41,6 +42,7 @@ static inline bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh)
}
static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
static inline void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev) { }
+void s390_pci_reset(S390PCIBusDevice *pbdev) { }
#endif
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2 4/4] s390x/pci: Reset a device in error state
2025-08-25 21:24 ` [PATCH v2 4/4] s390x/pci: Reset a device in error state Farhan Ali
@ 2025-09-01 11:17 ` Cédric Le Goater
2025-09-03 17:13 ` Farhan Ali
0 siblings, 1 reply; 14+ messages in thread
From: Cédric Le Goater @ 2025-09-01 11:17 UTC (permalink / raw)
To: Farhan Ali, qemu-devel, qemu-s390x; +Cc: mjrosato, thuth, alex.williamson
On 8/25/25 23:24, Farhan Ali wrote:
> For passthrough devices in error state, for a guest driven reset of the
> device we can attempt a reset to recover the device. A reset of the device
> will trigger a CLP disable/enable cycle on the host to bring the device
> into a recovered state.
>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
> hw/s390x/s390-pci-bus.c | 7 +++++++
> hw/s390x/s390-pci-vfio.c | 6 ++++++
> include/hw/s390x/s390-pci-vfio.h | 2 ++
> 3 files changed, 15 insertions(+)
>
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index af42eb9938..c9c2d775f0 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -1493,6 +1493,8 @@ static void s390_pci_device_reset(DeviceState *dev)
> return;
> case ZPCI_FS_STANDBY:
> break;
> + case ZPCI_FS_ERROR:
> + break;
> default:
> pbdev->fh &= ~FH_MASK_ENABLE;
> pbdev->state = ZPCI_FS_DISABLED;
> @@ -1505,6 +1507,11 @@ static void s390_pci_device_reset(DeviceState *dev)
> } else if (pbdev->summary_ind) {
> pci_dereg_irqs(pbdev);
> }
> +
> + if (pbdev->state == ZPCI_FS_ERROR) {
> + s390_pci_reset(pbdev);
> + }
> +
> if (pbdev->iommu->enabled) {
> pci_dereg_ioat(pbdev->iommu);
> }
> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
> index 87ecd06a81..a11ec770a7 100644
> --- a/hw/s390x/s390-pci-vfio.c
> +++ b/hw/s390x/s390-pci-vfio.c
> @@ -158,6 +158,12 @@ static void s390_pci_err_handler(VFIOPCIDevice *vfio_pci)
> return;
> }
>
> +void s390_pci_reset(S390PCIBusDevice *pbdev)
> +{
> + VFIOPCIDevice *vfio_pci = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
> + ioctl(vfio_pci->vbasedev.fd, VFIO_DEVICE_RESET);
> +}
> +
> static void s390_pci_read_base(S390PCIBusDevice *pbdev,
> struct vfio_device_info *info)
> {
> diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
> index 66b274293c..c28dafeed8 100644
> --- a/include/hw/s390x/s390-pci-vfio.h
> +++ b/include/hw/s390x/s390-pci-vfio.h
> @@ -23,6 +23,7 @@ void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt);
> bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh);
> void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
> void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev);
> +void s390_pci_reset(S390PCIBusDevice *pbdev);
> #else
> static inline bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
> {
> @@ -41,6 +42,7 @@ static inline bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh)
> }
> static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
> static inline void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev) { }
> +void s390_pci_reset(S390PCIBusDevice *pbdev) { }
static inline void ... ^
Thanks,
C.
> #endif
>
> #endif
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/4] s390x/pci: Add PCI error handling for vfio pci devices
2025-08-25 21:24 ` [PATCH v2 3/4] s390x/pci: Add PCI error handling for vfio pci devices Farhan Ali
@ 2025-09-01 11:25 ` Cédric Le Goater
2025-09-03 17:12 ` Farhan Ali
0 siblings, 1 reply; 14+ messages in thread
From: Cédric Le Goater @ 2025-09-01 11:25 UTC (permalink / raw)
To: Farhan Ali, qemu-devel, qemu-s390x; +Cc: mjrosato, thuth, alex.williamson
On 8/25/25 23:24, Farhan Ali wrote:
> Add an s390x specific callback for vfio error handling. For s390x pci devices,
> we have platform specific error information. We need to retrieve this error
> information for passthrough devices. This is done via a memory region which
> exposes that information.
>
> Once this error information is retrieved we can then inject an error into
> the guest, and let the guest drive the recovery.
>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
> hw/s390x/s390-pci-bus.c | 5 +++
> hw/s390x/s390-pci-vfio.c | 76 ++++++++++++++++++++++++++++++++
> include/hw/s390x/s390-pci-bus.h | 1 +
> include/hw/s390x/s390-pci-vfio.h | 2 +
> 4 files changed, 84 insertions(+)
>
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index f87d2748b6..af42eb9938 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -158,6 +158,8 @@ static void s390_pci_perform_unplug(S390PCIBusDevice *pbdev)
> {
> HotplugHandler *hotplug_ctrl;
>
> + qemu_mutex_destroy(&pbdev->err_handler_lock);
> +
> if (pbdev->pft == ZPCI_PFT_ISM) {
> notifier_remove(&pbdev->shutdown_notifier);
> }
> @@ -1140,6 +1142,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> pbdev->iommu->pbdev = pbdev;
> pbdev->state = ZPCI_FS_DISABLED;
> set_pbdev_info(pbdev);
> + qemu_mutex_init(&pbdev->err_handler_lock);
>
> if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
> /*
> @@ -1164,6 +1167,8 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
> /* Fill in CLP information passed via the vfio region */
> s390_pci_get_clp_info(pbdev);
> + /* Setup error handler for error recovery */
> + s390_pci_setup_err_handler(pbdev);
This can fail. Please add an 'Error **' parameter and change the returned
value to bool.
> if (!pbdev->interp) {
> /* Do vfio passthrough but intercept for I/O */
> pbdev->fh |= FH_SHM_VFIO;
> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
> index aaf91319b4..87ecd06a81 100644
> --- a/hw/s390x/s390-pci-vfio.c
> +++ b/hw/s390x/s390-pci-vfio.c
> @@ -10,6 +10,7 @@
> */
>
> #include "qemu/osdep.h"
> +#include "qemu/error-report.h"
>
> #include <sys/ioctl.h>
> #include <linux/vfio.h>
> @@ -103,6 +104,60 @@ void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt)
> }
> }
>
> +static int s390_pci_get_feature_err(VFIOPCIDevice *vfio_pci,
> + struct vfio_device_feature_zpci_err *err)
Please add an 'Error **' parameter and change the returned value to bool.
> +{
> + int ret;
> + uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
> + sizeof(struct vfio_device_feature_zpci_err),
> + sizeof(uint64_t))] = {};
> + struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
> +
> + feature->argsz = sizeof(buf);
> + feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_ZPCI_ERROR;
> + ret = vfio_pci->vbasedev.io_ops->device_feature(&vfio_pci->vbasedev,
> + feature);
Please introduce vfio helpers to hide the internal indirection :
->vbasedev.io_ops->device_feature(...)
> +
> + if (ret) {
> + error_report("Failed feature get VFIO_DEVICE_FEATURE_ZPCI_ERROR"
> + " (rc=%d)", ret);
> + return ret;
> + }
> +
> + memcpy(err, (struct vfio_device_feature_zpci_err *) feature->data,
> + sizeof(struct vfio_device_feature_zpci_err));
> + return 0;
> +}
> +
> +static void s390_pci_err_handler(VFIOPCIDevice *vfio_pci)
> +{
> + S390PCIBusDevice *pbdev;
> + struct vfio_device_feature_zpci_err err;
> + int ret;
> +
> + pbdev = s390_pci_find_dev_by_target(s390_get_phb(),
> + DEVICE(&vfio_pci->pdev)->id);
> +
> + QEMU_LOCK_GUARD(&pbdev->err_handler_lock);
> +
> + ret = s390_pci_get_feature_err(vfio_pci, &err);
> + if (ret) {
> + return;
> + }
> +
> + pbdev->state = ZPCI_FS_ERROR;
> + s390_pci_generate_error_event(err.pec, pbdev->fh, pbdev->fid, 0, 0);
> +
> + while (err.pending_errors) {
> + ret = s390_pci_get_feature_err(vfio_pci, &err);
> + if (ret) {
> + return;
> + }
> + s390_pci_generate_error_event(err.pec, pbdev->fh, pbdev->fid, 0, 0);
> + }
> + return;
> +}
> +
> static void s390_pci_read_base(S390PCIBusDevice *pbdev,
> struct vfio_device_info *info)
> {
> @@ -369,3 +424,24 @@ void s390_pci_get_clp_info(S390PCIBusDevice *pbdev)
> s390_pci_read_util(pbdev, info);
> s390_pci_read_pfip(pbdev, info);
> }
> +
> +void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev)
> +{
> + int ret;
> + VFIOPCIDevice *vfio_pci = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
> + uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
> + sizeof(uint64_t))] = {};
> + struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
> +
> + feature->argsz = sizeof(buf);
> + feature->flags = VFIO_DEVICE_FEATURE_PROBE | VFIO_DEVICE_FEATURE_ZPCI_ERROR;
> +
> + ret = vfio_pci->vbasedev.io_ops->device_feature(&vfio_pci->vbasedev,
> + feature);
Please introduce vfio helpers to hide the internal indirection :
->vbasedev.io_ops->device_feature(...)
> +
> + if (ret) {
Shouldn't we test the return value to decide if the error is
an unimplemented feature or an unexpected error ?
Thanks,
C.
> + info_report("Automated error recovery not available for passthrough device");
> + return;
> + }
> + vfio_pci->arch_err_handler = s390_pci_err_handler;
> +}
> diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
> index 04944d4fed..3795e0bbfc 100644
> --- a/include/hw/s390x/s390-pci-bus.h
> +++ b/include/hw/s390x/s390-pci-bus.h
> @@ -364,6 +364,7 @@ struct S390PCIBusDevice {
> bool forwarding_assist;
> bool aif;
> bool rtr_avail;
> + QemuMutex err_handler_lock;
> QTAILQ_ENTRY(S390PCIBusDevice) link;
> };
>
> diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
> index ae1b126ff7..66b274293c 100644
> --- a/include/hw/s390x/s390-pci-vfio.h
> +++ b/include/hw/s390x/s390-pci-vfio.h
> @@ -22,6 +22,7 @@ S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
> void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt);
> bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh);
> void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
> +void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev);
> #else
> static inline bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
> {
> @@ -39,6 +40,7 @@ static inline bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh)
> return false;
> }
> static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
> +static inline void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev) { }
> #endif
>
> #endif
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/4] vfio/pci: Add an architecture specific error handler
2025-08-25 21:24 ` [PATCH v2 2/4] vfio/pci: Add an architecture specific error handler Farhan Ali
@ 2025-09-01 11:28 ` Cédric Le Goater
2025-09-03 16:49 ` Farhan Ali
0 siblings, 1 reply; 14+ messages in thread
From: Cédric Le Goater @ 2025-09-01 11:28 UTC (permalink / raw)
To: Farhan Ali, qemu-devel, qemu-s390x; +Cc: mjrosato, thuth, alex.williamson
On 8/25/25 23:24, Farhan Ali wrote:
> Provide a architecture specific error handling callback, that can be used
> by platforms to handle PCI errors for passthrough devices.
>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
> hw/vfio/pci.c | 5 +++++
> hw/vfio/pci.h | 1 +
> 2 files changed, 6 insertions(+)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 07257d0fa0..3c71d19306 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3026,6 +3026,11 @@ static void vfio_err_notifier_handler(void *opaque)
> return;
> }
>
> + if (vdev->arch_err_handler) {
> + vdev->arch_err_handler(vdev);
I am not sure that the "architecture specific error handling"
will be implemented this way but we need to check for potential
errors.
So, please make the handler return a bool and add an extra
'Error **' parameter.
Thanks,
C.
> + return;
> + }
> +
> /*
> * TBD. Retrieve the error details and decide what action
> * needs to be taken. One of the actions could be to pass
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 810a842f4a..45d4405e47 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -145,6 +145,7 @@ struct VFIOPCIDevice {
> EventNotifier err_notifier;
> EventNotifier req_notifier;
> int (*resetfn)(struct VFIOPCIDevice *);
> + void (*arch_err_handler)(struct VFIOPCIDevice *);
> uint32_t vendor_id;
> uint32_t device_id;
> uint32_t sub_vendor_id;
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/4] vfio/pci: Add an architecture specific error handler
2025-09-01 11:28 ` Cédric Le Goater
@ 2025-09-03 16:49 ` Farhan Ali
2025-09-09 20:56 ` Cédric Le Goater
0 siblings, 1 reply; 14+ messages in thread
From: Farhan Ali @ 2025-09-03 16:49 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel, qemu-s390x
Cc: mjrosato, thuth, alex.williamson
On 9/1/2025 4:28 AM, Cédric Le Goater wrote:
> On 8/25/25 23:24, Farhan Ali wrote:
>> Provide a architecture specific error handling callback, that can be
>> used
>> by platforms to handle PCI errors for passthrough devices.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>> hw/vfio/pci.c | 5 +++++
>> hw/vfio/pci.h | 1 +
>> 2 files changed, 6 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 07257d0fa0..3c71d19306 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3026,6 +3026,11 @@ static void vfio_err_notifier_handler(void
>> *opaque)
>> return;
>> }
>> + if (vdev->arch_err_handler) {
>> + vdev->arch_err_handler(vdev);
>
>
> I am not sure that the "architecture specific error handling"
> will be implemented this way but we need to check for potential
> errors.
Thanks for reviewing, do you have any suggestions on how to implement
the architecture/device specific error handling?
>
> So, please make the handler return a bool and add an extra
> 'Error **' parameter.
>
>
Sure, I can change that. Are you suggesting the change to bool return
for the handler to report any failures in the handler (through errp)?
Thanks
Farhan
>
> Thanks,
>
> C.
>
>
>
>
>> + return;
>> + }
>> +
>> /*
>> * TBD. Retrieve the error details and decide what action
>> * needs to be taken. One of the actions could be to pass
>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>> index 810a842f4a..45d4405e47 100644
>> --- a/hw/vfio/pci.h
>> +++ b/hw/vfio/pci.h
>> @@ -145,6 +145,7 @@ struct VFIOPCIDevice {
>> EventNotifier err_notifier;
>> EventNotifier req_notifier;
>> int (*resetfn)(struct VFIOPCIDevice *);
>> + void (*arch_err_handler)(struct VFIOPCIDevice *);
>
>> uint32_t vendor_id;
>> uint32_t device_id;
>> uint32_t sub_vendor_id;
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/4] s390x/pci: Add PCI error handling for vfio pci devices
2025-09-01 11:25 ` Cédric Le Goater
@ 2025-09-03 17:12 ` Farhan Ali
2025-09-03 17:49 ` Matthew Rosato
0 siblings, 1 reply; 14+ messages in thread
From: Farhan Ali @ 2025-09-03 17:12 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel, qemu-s390x
Cc: mjrosato, thuth, alex.williamson
On 9/1/2025 4:25 AM, Cédric Le Goater wrote:
> On 8/25/25 23:24, Farhan Ali wrote:
>> Add an s390x specific callback for vfio error handling. For s390x pci
>> devices,
>> we have platform specific error information. We need to retrieve this
>> error
>> information for passthrough devices. This is done via a memory region
>> which
>> exposes that information.
>>
>> Once this error information is retrieved we can then inject an error
>> into
>> the guest, and let the guest drive the recovery.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>> hw/s390x/s390-pci-bus.c | 5 +++
>> hw/s390x/s390-pci-vfio.c | 76 ++++++++++++++++++++++++++++++++
>> include/hw/s390x/s390-pci-bus.h | 1 +
>> include/hw/s390x/s390-pci-vfio.h | 2 +
>> 4 files changed, 84 insertions(+)
>>
>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>> index f87d2748b6..af42eb9938 100644
>> --- a/hw/s390x/s390-pci-bus.c
>> +++ b/hw/s390x/s390-pci-bus.c
>> @@ -158,6 +158,8 @@ static void
>> s390_pci_perform_unplug(S390PCIBusDevice *pbdev)
>> {
>> HotplugHandler *hotplug_ctrl;
>> + qemu_mutex_destroy(&pbdev->err_handler_lock);
>> +
>> if (pbdev->pft == ZPCI_PFT_ISM) {
>> notifier_remove(&pbdev->shutdown_notifier);
>> }
>> @@ -1140,6 +1142,7 @@ static void s390_pcihost_plug(HotplugHandler
>> *hotplug_dev, DeviceState *dev,
>> pbdev->iommu->pbdev = pbdev;
>> pbdev->state = ZPCI_FS_DISABLED;
>> set_pbdev_info(pbdev);
>> + qemu_mutex_init(&pbdev->err_handler_lock);
>> if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
>> /*
>> @@ -1164,6 +1167,8 @@ static void s390_pcihost_plug(HotplugHandler
>> *hotplug_dev, DeviceState *dev,
>> pbdev->iommu->dma_limit = s390_pci_start_dma_count(s,
>> pbdev);
>> /* Fill in CLP information passed via the vfio region */
>> s390_pci_get_clp_info(pbdev);
>> + /* Setup error handler for error recovery */
>> + s390_pci_setup_err_handler(pbdev);
>
> This can fail. Please add an 'Error **' parameter and change the returned
> value to bool.
>
I wanted to avoid hard failing here as we can have mismatch in kernel
and QEMU support for the feature. For example we can have a newer QEMU
version with the feature running on an older kernel. So wanted to treat
any error in setting up the error handler would be more of an info/warn
message.
>
>
>> if (!pbdev->interp) {
>> /* Do vfio passthrough but intercept for I/O */
>> pbdev->fh |= FH_SHM_VFIO;
>> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
>> index aaf91319b4..87ecd06a81 100644
>> --- a/hw/s390x/s390-pci-vfio.c
>> +++ b/hw/s390x/s390-pci-vfio.c
>> @@ -10,6 +10,7 @@
>> */
>> #include "qemu/osdep.h"
>> +#include "qemu/error-report.h"
>> #include <sys/ioctl.h>
>> #include <linux/vfio.h>
>> @@ -103,6 +104,60 @@ void s390_pci_end_dma_count(S390pciState *s,
>> S390PCIDMACount *cnt)
>> }
>> }
>> +static int s390_pci_get_feature_err(VFIOPCIDevice *vfio_pci,
>> + struct
>> vfio_device_feature_zpci_err *err)
>
> Please add an 'Error **' parameter and change the returned value to bool.
Ack, will change this.
>
>> +{
>> + int ret;
>> + uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
>> + sizeof(struct
>> vfio_device_feature_zpci_err),
>> + sizeof(uint64_t))] = {};
>> + struct vfio_device_feature *feature = (struct
>> vfio_device_feature *)buf;
>> +
>> + feature->argsz = sizeof(buf);
>> + feature->flags = VFIO_DEVICE_FEATURE_GET |
>> VFIO_DEVICE_FEATURE_ZPCI_ERROR;
>> + ret = vfio_pci->vbasedev.io_ops->device_feature(&vfio_pci->vbasedev,
>> + feature);
>
>
> Please introduce vfio helpers to hide the internal indirection :
>
> ->vbasedev.io_ops->device_feature(...)
>
Should we define the helpers in include/hw/vfio/vfio-device.h and should
we define a generic helper like vfio_device_get_feature(VFIODevice
*vdev, struct vfio_device_feature *feat) ?
>
>> +
>> + if (ret) {
>> + error_report("Failed feature get
>> VFIO_DEVICE_FEATURE_ZPCI_ERROR"
>> + " (rc=%d)", ret);
>> + return ret;
>> + }
>> +
>> + memcpy(err, (struct vfio_device_feature_zpci_err *) feature->data,
>> + sizeof(struct vfio_device_feature_zpci_err));
>> + return 0;
>> +}
>> +
>> +static void s390_pci_err_handler(VFIOPCIDevice *vfio_pci)
>> +{
>> + S390PCIBusDevice *pbdev;
>> + struct vfio_device_feature_zpci_err err;
>> + int ret;
>> +
>> + pbdev = s390_pci_find_dev_by_target(s390_get_phb(),
>> + DEVICE(&vfio_pci->pdev)->id);
>> +
>> + QEMU_LOCK_GUARD(&pbdev->err_handler_lock);
>> +
>> + ret = s390_pci_get_feature_err(vfio_pci, &err);
>> + if (ret) {
>> + return;
>> + }
>> +
>> + pbdev->state = ZPCI_FS_ERROR;
>> + s390_pci_generate_error_event(err.pec, pbdev->fh, pbdev->fid, 0,
>> 0);
>> +
>> + while (err.pending_errors) {
>> + ret = s390_pci_get_feature_err(vfio_pci, &err);
>> + if (ret) {
>> + return;
>> + }
>> + s390_pci_generate_error_event(err.pec, pbdev->fh,
>> pbdev->fid, 0, 0);
>> + }
>> + return;
>> +}
>> +
>> static void s390_pci_read_base(S390PCIBusDevice *pbdev,
>> struct vfio_device_info *info)
>> {
>> @@ -369,3 +424,24 @@ void s390_pci_get_clp_info(S390PCIBusDevice *pbdev)
>> s390_pci_read_util(pbdev, info);
>> s390_pci_read_pfip(pbdev, info);
>> }
>> +
>> +void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev)
>> +{
>> + int ret;
>> + VFIOPCIDevice *vfio_pci = container_of(pbdev->pdev,
>> VFIOPCIDevice, pdev);
>> + uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
>> + sizeof(uint64_t))] = {};
>> + struct vfio_device_feature *feature = (struct
>> vfio_device_feature *)buf;
>> +
>> + feature->argsz = sizeof(buf);
>> + feature->flags = VFIO_DEVICE_FEATURE_PROBE |
>> VFIO_DEVICE_FEATURE_ZPCI_ERROR;
>> +
>> + ret = vfio_pci->vbasedev.io_ops->device_feature(&vfio_pci->vbasedev,
>> + feature);
>
> Please introduce vfio helpers to hide the internal indirection :
>
> ->vbasedev.io_ops->device_feature(...)
>
>> +
>> + if (ret) {
>
> Shouldn't we test the return value to decide if the error is
> an unimplemented feature or an unexpected error ?
Yeah, I think it makes sense separate out error for unimplemented
feature (ENOTTY) vs any other unexpected error. Will change this.
>
> Thanks,
>
> C.
>
>
>
>> + info_report("Automated error recovery not available for
>> passthrough device");
>> + return;
>> + }
>> + vfio_pci->arch_err_handler = s390_pci_err_handler;
>> +}
>> diff --git a/include/hw/s390x/s390-pci-bus.h
>> b/include/hw/s390x/s390-pci-bus.h
>> index 04944d4fed..3795e0bbfc 100644
>> --- a/include/hw/s390x/s390-pci-bus.h
>> +++ b/include/hw/s390x/s390-pci-bus.h
>> @@ -364,6 +364,7 @@ struct S390PCIBusDevice {
>> bool forwarding_assist;
>> bool aif;
>> bool rtr_avail;
>> + QemuMutex err_handler_lock;
>> QTAILQ_ENTRY(S390PCIBusDevice) link;
>> };
>> diff --git a/include/hw/s390x/s390-pci-vfio.h
>> b/include/hw/s390x/s390-pci-vfio.h
>> index ae1b126ff7..66b274293c 100644
>> --- a/include/hw/s390x/s390-pci-vfio.h
>> +++ b/include/hw/s390x/s390-pci-vfio.h
>> @@ -22,6 +22,7 @@ S390PCIDMACount
>> *s390_pci_start_dma_count(S390pciState *s,
>> void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt);
>> bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh);
>> void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
>> +void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev);
>> #else
>> static inline bool s390_pci_update_dma_avail(int fd, unsigned int
>> *avail)
>> {
>> @@ -39,6 +40,7 @@ static inline bool
>> s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh)
>> return false;
>> }
>> static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
>> +static inline void s390_pci_setup_err_handler(S390PCIBusDevice
>> *pbdev) { }
>> #endif
>> #endif
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 4/4] s390x/pci: Reset a device in error state
2025-09-01 11:17 ` Cédric Le Goater
@ 2025-09-03 17:13 ` Farhan Ali
0 siblings, 0 replies; 14+ messages in thread
From: Farhan Ali @ 2025-09-03 17:13 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel, qemu-s390x
Cc: mjrosato, thuth, alex.williamson
On 9/1/2025 4:17 AM, Cédric Le Goater wrote:
> On 8/25/25 23:24, Farhan Ali wrote:
>> For passthrough devices in error state, for a guest driven reset of the
>> device we can attempt a reset to recover the device. A reset of the
>> device
>> will trigger a CLP disable/enable cycle on the host to bring the device
>> into a recovered state.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>> hw/s390x/s390-pci-bus.c | 7 +++++++
>> hw/s390x/s390-pci-vfio.c | 6 ++++++
>> include/hw/s390x/s390-pci-vfio.h | 2 ++
>> 3 files changed, 15 insertions(+)
>>
>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>> index af42eb9938..c9c2d775f0 100644
>> --- a/hw/s390x/s390-pci-bus.c
>> +++ b/hw/s390x/s390-pci-bus.c
>> @@ -1493,6 +1493,8 @@ static void s390_pci_device_reset(DeviceState
>> *dev)
>> return;
>> case ZPCI_FS_STANDBY:
>> break;
>> + case ZPCI_FS_ERROR:
>> + break;
>> default:
>> pbdev->fh &= ~FH_MASK_ENABLE;
>> pbdev->state = ZPCI_FS_DISABLED;
>> @@ -1505,6 +1507,11 @@ static void s390_pci_device_reset(DeviceState
>> *dev)
>> } else if (pbdev->summary_ind) {
>> pci_dereg_irqs(pbdev);
>> }
>> +
>> + if (pbdev->state == ZPCI_FS_ERROR) {
>> + s390_pci_reset(pbdev);
>> + }
>> +
>> if (pbdev->iommu->enabled) {
>> pci_dereg_ioat(pbdev->iommu);
>> }
>> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
>> index 87ecd06a81..a11ec770a7 100644
>> --- a/hw/s390x/s390-pci-vfio.c
>> +++ b/hw/s390x/s390-pci-vfio.c
>> @@ -158,6 +158,12 @@ static void s390_pci_err_handler(VFIOPCIDevice
>> *vfio_pci)
>> return;
>> }
>> +void s390_pci_reset(S390PCIBusDevice *pbdev)
>> +{
>> + VFIOPCIDevice *vfio_pci = container_of(pbdev->pdev,
>> VFIOPCIDevice, pdev);
>> + ioctl(vfio_pci->vbasedev.fd, VFIO_DEVICE_RESET);
>> +}
>> +
>> static void s390_pci_read_base(S390PCIBusDevice *pbdev,
>> struct vfio_device_info *info)
>> {
>> diff --git a/include/hw/s390x/s390-pci-vfio.h
>> b/include/hw/s390x/s390-pci-vfio.h
>> index 66b274293c..c28dafeed8 100644
>> --- a/include/hw/s390x/s390-pci-vfio.h
>> +++ b/include/hw/s390x/s390-pci-vfio.h
>> @@ -23,6 +23,7 @@ void s390_pci_end_dma_count(S390pciState *s,
>> S390PCIDMACount *cnt);
>> bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh);
>> void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
>> void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev);
>> +void s390_pci_reset(S390PCIBusDevice *pbdev);
>> #else
>> static inline bool s390_pci_update_dma_avail(int fd, unsigned int
>> *avail)
>> {
>> @@ -41,6 +42,7 @@ static inline bool
>> s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh)
>> }
>> static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
>> static inline void s390_pci_setup_err_handler(S390PCIBusDevice
>> *pbdev) { }
>> +void s390_pci_reset(S390PCIBusDevice *pbdev) { }
>
> static inline void ... ^
>
>
> Thanks,
>
> C.
Ack, will change.
Thanks
Farhan
>
>
>
>> #endif
>> #endif
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/4] s390x/pci: Add PCI error handling for vfio pci devices
2025-09-03 17:12 ` Farhan Ali
@ 2025-09-03 17:49 ` Matthew Rosato
2025-09-09 7:59 ` Cédric Le Goater
0 siblings, 1 reply; 14+ messages in thread
From: Matthew Rosato @ 2025-09-03 17:49 UTC (permalink / raw)
To: Farhan Ali, Cédric Le Goater, qemu-devel, qemu-s390x
Cc: thuth, alex.williamson
On 9/3/25 1:12 PM, Farhan Ali wrote:
>
> On 9/1/2025 4:25 AM, Cédric Le Goater wrote:
>> On 8/25/25 23:24, Farhan Ali wrote:
>>> Add an s390x specific callback for vfio error handling. For s390x pci devices,
>>> we have platform specific error information. We need to retrieve this error
>>> information for passthrough devices. This is done via a memory region which
>>> exposes that information.
>>>
>>> Once this error information is retrieved we can then inject an error into
>>> the guest, and let the guest drive the recovery.
>>>
>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>> ---
>>> hw/s390x/s390-pci-bus.c | 5 +++
>>> hw/s390x/s390-pci-vfio.c | 76 ++++++++++++++++++++++++++++++++
>>> include/hw/s390x/s390-pci-bus.h | 1 +
>>> include/hw/s390x/s390-pci-vfio.h | 2 +
>>> 4 files changed, 84 insertions(+)
>>>
>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>> index f87d2748b6..af42eb9938 100644
>>> --- a/hw/s390x/s390-pci-bus.c
>>> +++ b/hw/s390x/s390-pci-bus.c
>>> @@ -158,6 +158,8 @@ static void s390_pci_perform_unplug(S390PCIBusDevice *pbdev)
>>> {
>>> HotplugHandler *hotplug_ctrl;
>>> + qemu_mutex_destroy(&pbdev->err_handler_lock);
>>> +
>>> if (pbdev->pft == ZPCI_PFT_ISM) {
>>> notifier_remove(&pbdev->shutdown_notifier);
>>> }
>>> @@ -1140,6 +1142,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>> pbdev->iommu->pbdev = pbdev;
>>> pbdev->state = ZPCI_FS_DISABLED;
>>> set_pbdev_info(pbdev);
>>> + qemu_mutex_init(&pbdev->err_handler_lock);
>>> if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
>>> /*
>>> @@ -1164,6 +1167,8 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>> pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
>>> /* Fill in CLP information passed via the vfio region */
>>> s390_pci_get_clp_info(pbdev);
>>> + /* Setup error handler for error recovery */
>>> + s390_pci_setup_err_handler(pbdev);
>>
>> This can fail. Please add an 'Error **' parameter and change the returned
>> value to bool.
>>
> I wanted to avoid hard failing here as we can have mismatch in kernel and QEMU support for the feature. For example we can have a newer QEMU version with the feature running on an older kernel. So wanted to treat any error in setting up the error handler would be more of an info/warn message.
+1, please do not cause a hard failure if the underlying host kernel is simply missing support...
>>> +void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev)
>>> +{
>>> + int ret;
>>> + VFIOPCIDevice *vfio_pci = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
>>> + uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
>>> + sizeof(uint64_t))] = {};
>>> + struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
>>> +
>>> + feature->argsz = sizeof(buf);
>>> + feature->flags = VFIO_DEVICE_FEATURE_PROBE | VFIO_DEVICE_FEATURE_ZPCI_ERROR;
>>> +
>>> + ret = vfio_pci->vbasedev.io_ops->device_feature(&vfio_pci->vbasedev,
>>> + feature);
>>
>> Please introduce vfio helpers to hide the internal indirection :
>>
>> ->vbasedev.io_ops->device_feature(...)
>>
>>> +
>>> + if (ret) {
>>
>> Shouldn't we test the return value to decide if the error is
>> an unimplemented feature or an unexpected error ?
>
> Yeah, I think it makes sense separate out error for unimplemented feature (ENOTTY) vs any other unexpected error. Will change this.
>
... But if you add differentiation here between the 2 types of errors then I would be fine with hard-fail for unexpected cases and info/warn for missing host kernel support.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/4] s390x/pci: Add PCI error handling for vfio pci devices
2025-09-03 17:49 ` Matthew Rosato
@ 2025-09-09 7:59 ` Cédric Le Goater
0 siblings, 0 replies; 14+ messages in thread
From: Cédric Le Goater @ 2025-09-09 7:59 UTC (permalink / raw)
To: Matthew Rosato, Farhan Ali, qemu-devel, qemu-s390x; +Cc: thuth, alex.williamson
On 9/3/25 19:49, Matthew Rosato wrote:
> On 9/3/25 1:12 PM, Farhan Ali wrote:
>>
>> On 9/1/2025 4:25 AM, Cédric Le Goater wrote:
>>> On 8/25/25 23:24, Farhan Ali wrote:
>>>> Add an s390x specific callback for vfio error handling. For s390x pci devices,
>>>> we have platform specific error information. We need to retrieve this error
>>>> information for passthrough devices. This is done via a memory region which
>>>> exposes that information.
>>>>
>>>> Once this error information is retrieved we can then inject an error into
>>>> the guest, and let the guest drive the recovery.
>>>>
>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>> ---
>>>> hw/s390x/s390-pci-bus.c | 5 +++
>>>> hw/s390x/s390-pci-vfio.c | 76 ++++++++++++++++++++++++++++++++
>>>> include/hw/s390x/s390-pci-bus.h | 1 +
>>>> include/hw/s390x/s390-pci-vfio.h | 2 +
>>>> 4 files changed, 84 insertions(+)
>>>>
>>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>>> index f87d2748b6..af42eb9938 100644
>>>> --- a/hw/s390x/s390-pci-bus.c
>>>> +++ b/hw/s390x/s390-pci-bus.c
>>>> @@ -158,6 +158,8 @@ static void s390_pci_perform_unplug(S390PCIBusDevice *pbdev)
>>>> {
>>>> HotplugHandler *hotplug_ctrl;
>>>> + qemu_mutex_destroy(&pbdev->err_handler_lock);
>>>> +
>>>> if (pbdev->pft == ZPCI_PFT_ISM) {
>>>> notifier_remove(&pbdev->shutdown_notifier);
>>>> }
>>>> @@ -1140,6 +1142,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>> pbdev->iommu->pbdev = pbdev;
>>>> pbdev->state = ZPCI_FS_DISABLED;
>>>> set_pbdev_info(pbdev);
>>>> + qemu_mutex_init(&pbdev->err_handler_lock);
>>>> if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
>>>> /*
>>>> @@ -1164,6 +1167,8 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>> pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
>>>> /* Fill in CLP information passed via the vfio region */
>>>> s390_pci_get_clp_info(pbdev);
>>>> + /* Setup error handler for error recovery */
>>>> + s390_pci_setup_err_handler(pbdev);
>>>
>>> This can fail. Please add an 'Error **' parameter and change the returned
>>> value to bool.
>>>
>> I wanted to avoid hard failing here as we can have mismatch in kernel and QEMU support for the feature. For example we can have a newer QEMU version with the feature running on an older kernel. So wanted to treat any error in setting up the error handler would be more of an info/warn message.
>
> +1, please do not cause a hard failure if the underlying host kernel is simply missing support...
It doesn't have to be a hard failure. The code could issue a
warn_report_err(). Similarly to what is done for the vfio_ap
device when IRQ notifying support is not available on the host.
This is minor.
C.
>
>>>> +void s390_pci_setup_err_handler(S390PCIBusDevice *pbdev)
>>>> +{
>>>> + int ret;
>>>> + VFIOPCIDevice *vfio_pci = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
>>>> + uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
>>>> + sizeof(uint64_t))] = {};
>>>> + struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
>>>> +
>>>> + feature->argsz = sizeof(buf);
>>>> + feature->flags = VFIO_DEVICE_FEATURE_PROBE | VFIO_DEVICE_FEATURE_ZPCI_ERROR;
>>>> +
>>>> + ret = vfio_pci->vbasedev.io_ops->device_feature(&vfio_pci->vbasedev,
>>>> + feature);
>>>
>>> Please introduce vfio helpers to hide the internal indirection :
>>>
>>> ->vbasedev.io_ops->device_feature(...)
>>>
>>>> +
>>>> + if (ret) {
>>>
>>> Shouldn't we test the return value to decide if the error is
>>> an unimplemented feature or an unexpected error ?
>>
>> Yeah, I think it makes sense separate out error for unimplemented feature (ENOTTY) vs any other unexpected error. Will change this.
>>
>
> ... But if you add differentiation here between the 2 types of errors then I would be fine with hard-fail for unexpected cases and info/warn for missing host kernel support.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/4] vfio/pci: Add an architecture specific error handler
2025-09-03 16:49 ` Farhan Ali
@ 2025-09-09 20:56 ` Cédric Le Goater
0 siblings, 0 replies; 14+ messages in thread
From: Cédric Le Goater @ 2025-09-09 20:56 UTC (permalink / raw)
To: Farhan Ali, qemu-devel, qemu-s390x; +Cc: mjrosato, thuth, alex.williamson
On 9/3/25 18:49, Farhan Ali wrote:
>
> On 9/1/2025 4:28 AM, Cédric Le Goater wrote:
>> On 8/25/25 23:24, Farhan Ali wrote:
>>> Provide a architecture specific error handling callback, that can be used
>>> by platforms to handle PCI errors for passthrough devices.
>>>
>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>> ---
>>> hw/vfio/pci.c | 5 +++++
>>> hw/vfio/pci.h | 1 +
>>> 2 files changed, 6 insertions(+)
>>>
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index 07257d0fa0..3c71d19306 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -3026,6 +3026,11 @@ static void vfio_err_notifier_handler(void *opaque)
>>> return;
>>> }
>>> + if (vdev->arch_err_handler) {
>>> + vdev->arch_err_handler(vdev);
>>
>>
>> I am not sure that the "architecture specific error handling"
>> will be implemented this way but we need to check for potential
>> errors.
>
> Thanks for reviewing, do you have any suggestions on how to implement the architecture/device specific error handling?
The nature of VFIOPCIDevice is special, it's both a PCIDevice and
a VFIODevice and it's difficult to use QOM to customize some of
its behavior with device class handlers. I don't see a better
way for now.
I would drop the 'arch_' prefix from the name.
>> So, please make the handler return a bool and add an extra
>> 'Error **' parameter.
>>
>>
> Sure, I can change that. Are you suggesting the change to bool return for the handler to report any failures in the handler (through errp)?
yes. The error reported before the vm_stop() in vfio_err_notifier_handler()
should be updated too.
Thanks,
C.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-09-09 20:58 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-25 21:24 [PATCH v2 0/4] Error recovery for zPCI passthrough devices Farhan Ali
2025-08-25 21:24 ` [PATCH v2 1/4] [NOTFORMERGE] linux-headers: Update for zpci vfio device Farhan Ali
2025-08-25 21:24 ` [PATCH v2 2/4] vfio/pci: Add an architecture specific error handler Farhan Ali
2025-09-01 11:28 ` Cédric Le Goater
2025-09-03 16:49 ` Farhan Ali
2025-09-09 20:56 ` Cédric Le Goater
2025-08-25 21:24 ` [PATCH v2 3/4] s390x/pci: Add PCI error handling for vfio pci devices Farhan Ali
2025-09-01 11:25 ` Cédric Le Goater
2025-09-03 17:12 ` Farhan Ali
2025-09-03 17:49 ` Matthew Rosato
2025-09-09 7:59 ` Cédric Le Goater
2025-08-25 21:24 ` [PATCH v2 4/4] s390x/pci: Reset a device in error state Farhan Ali
2025-09-01 11:17 ` Cédric Le Goater
2025-09-03 17:13 ` Farhan Ali
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).