linux-s390.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x
@ 2025-08-13 17:08 Farhan Ali
  2025-08-13 17:08 ` [PATCH v1 1/6] s390/pci: Restore airq unconditionally for the zPCI device Farhan Ali
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Farhan Ali @ 2025-08-13 17:08 UTC (permalink / raw)
  To: linux-s390, kvm, linux-kernel; +Cc: schnelle, mjrosato, alifm, alex.williamson

Hi,

This Linux kernel patch series introduces support for error recovery for
passthrough PCI devices on System Z (s390x). 

Background
----------
For PCI devices on s390x an operating system receives platform specific 
error events from firmware rather than through AER.Today for
passthrough/userspace devices, we don't attempt any error recovery
and ignore any error events for the devices. The passthrough/userspace devices are 
managed by the vfio-pci driver. The driver does register error handling 
callbacks (error_detected), and on an error trigger an eventfd to userspace. 
But we need a mechanism to notify userspace (QEMU/guest/userspace drivers) about
the error event. 

Proposal
--------
We can expose this error information (currently only the PCI Error Code) via a 
device specific memory region for s390 vfio pci devices. Userspace can then read 
the memory region to obtain the error information and take appropriate actions
such as driving a device reset. The memory region provides some flexibility in 
providing more information in the future if required.

I would appreciate some feedback on this approach.

Thanks
Farhan

Farhan Ali (6):
  s390/pci: Restore airq unconditionally for the zPCI device
  s390/pci: Update the logic for detecting passthrough device
  s390/pci: Store PCI error information for passthrough devices
  vfio-pci/zdev: Setup a zpci memory region for error information
  vfio-pci/zdev: Perform platform specific function reset for zPCI
  vfio: Allow error notification and recovery for ISM device

 arch/s390/include/asm/pci.h       |  29 +++++++
 arch/s390/pci/pci.c               |   2 +
 arch/s390/pci/pci_event.c         | 107 ++++++++++++++-----------
 arch/s390/pci/pci_irq.c           |   3 +-
 drivers/vfio/pci/vfio_pci_core.c  |  22 +++++-
 drivers/vfio/pci/vfio_pci_intrs.c |   2 +-
 drivers/vfio/pci/vfio_pci_priv.h  |   8 ++
 drivers/vfio/pci/vfio_pci_zdev.c  | 126 +++++++++++++++++++++++++++++-
 include/uapi/linux/vfio.h         |   2 +
 include/uapi/linux/vfio_zdev.h    |   5 ++
 10 files changed, 253 insertions(+), 53 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v1 1/6] s390/pci: Restore airq unconditionally for the zPCI device
  2025-08-13 17:08 [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x Farhan Ali
@ 2025-08-13 17:08 ` Farhan Ali
  2025-08-14 11:32   ` Niklas Schnelle
  2025-08-13 17:08 ` [PATCH v1 2/6] s390/pci: Update the logic for detecting passthrough device Farhan Ali
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Farhan Ali @ 2025-08-13 17:08 UTC (permalink / raw)
  To: linux-s390, kvm, linux-kernel; +Cc: schnelle, mjrosato, alifm, alex.williamson

Commit c1e18c17bda6 ("s390/pci: add zpci_set_irq()/zpci_clear_irq()"),
introduced the zpci_set_irq() and zpci_clear_irq(), to be used while
resetting a zPCI device.

Commit da995d538d3a ("s390/pci: implement reset_slot for hotplug slot"),
mentions zpci_clear_irq() being called in the path for zpci_hot_reset_device().
But that is not the case anymore and these functions are not called
outside of this file.

However after a CLP disable/enable reset (zpci_hot_reset_device),
the airq setup of the device will need to be restored. Since we
are no longer calling zpci_clear_airq() in the reset path, we should
restore the airq for device unconditionally.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 arch/s390/pci/pci_irq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 84482a921332..8b5493f0dee0 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -427,8 +427,7 @@ bool arch_restore_msi_irqs(struct pci_dev *pdev)
 {
 	struct zpci_dev *zdev = to_zpci(pdev);
 
-	if (!zdev->irqs_registered)
-		zpci_set_irq(zdev);
+	zpci_set_irq(zdev);
 	return true;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v1 2/6] s390/pci: Update the logic for detecting passthrough device
  2025-08-13 17:08 [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x Farhan Ali
  2025-08-13 17:08 ` [PATCH v1 1/6] s390/pci: Restore airq unconditionally for the zPCI device Farhan Ali
@ 2025-08-13 17:08 ` Farhan Ali
  2025-08-13 17:08 ` [PATCH v1 3/6] s390/pci: Store PCI error information for passthrough devices Farhan Ali
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Farhan Ali @ 2025-08-13 17:08 UTC (permalink / raw)
  To: linux-s390, kvm, linux-kernel; +Cc: schnelle, mjrosato, alifm, alex.williamson

We can now have userspace drivers (vfio-pci based)
on s390x. The userspace drivers will not have any KVM fd and so no
kzdev associated with them. So we need to update the logic for
detecting passthrough devices to not depend on struct kvm_zdev.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 arch/s390/include/asm/pci.h      |  1 +
 arch/s390/pci/pci_event.c        | 14 ++++----------
 drivers/vfio/pci/vfio_pci_zdev.c |  9 ++++++++-
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 41f900f693d9..0705a2f52263 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -170,6 +170,7 @@ struct zpci_dev {
 
 	char res_name[16];
 	bool mio_capable;
+	bool mediated_recovery;
 	struct zpci_bar_struct bars[PCI_STD_NUM_BARS];
 
 	u64		start_dma;	/* Start of available DMA addresses */
diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
index d930416d4c90..541d536be052 100644
--- a/arch/s390/pci/pci_event.c
+++ b/arch/s390/pci/pci_event.c
@@ -61,16 +61,10 @@ static inline bool ers_result_indicates_abort(pci_ers_result_t ers_res)
 	}
 }
 
-static bool is_passed_through(struct pci_dev *pdev)
+static bool needs_mediated_recovery(struct pci_dev *pdev)
 {
 	struct zpci_dev *zdev = to_zpci(pdev);
-	bool ret;
-
-	mutex_lock(&zdev->kzdev_lock);
-	ret = !!zdev->kzdev;
-	mutex_unlock(&zdev->kzdev_lock);
-
-	return ret;
+	return zdev->mediated_recovery;
 }
 
 static bool is_driver_supported(struct pci_driver *driver)
@@ -194,7 +188,7 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
 	}
 	pdev->error_state = pci_channel_io_frozen;
 
-	if (is_passed_through(pdev)) {
+	if (needs_mediated_recovery(pdev)) {
 		pr_info("%s: Cannot be recovered in the host because it is a pass-through device\n",
 			pci_name(pdev));
 		status_str = "failed (pass-through)";
@@ -277,7 +271,7 @@ static void zpci_event_io_failure(struct pci_dev *pdev, pci_channel_state_t es)
 	 * we will inject the error event and let the guest recover the device
 	 * itself.
 	 */
-	if (is_passed_through(pdev))
+	if (needs_mediated_recovery(pdev))
 		goto out;
 	driver = to_pci_driver(pdev->dev.driver);
 	if (driver && driver->err_handler && driver->err_handler->error_detected)
diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index 0990fdb146b7..a7bc23ce8483 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -148,6 +148,8 @@ int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
 	if (!zdev)
 		return -ENODEV;
 
+	zdev->mediated_recovery = true;
+
 	if (!vdev->vdev.kvm)
 		return 0;
 
@@ -161,7 +163,12 @@ void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
 {
 	struct zpci_dev *zdev = to_zpci(vdev->pdev);
 
-	if (!zdev || !vdev->vdev.kvm)
+	if (!zdev)
+		return;
+
+	zdev->mediated_recovery = false;
+
+	if (!vdev->vdev.kvm)
 		return;
 
 	if (zpci_kvm_hook.kvm_unregister)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v1 3/6] s390/pci: Store PCI error information for passthrough devices
  2025-08-13 17:08 [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x Farhan Ali
  2025-08-13 17:08 ` [PATCH v1 1/6] s390/pci: Restore airq unconditionally for the zPCI device Farhan Ali
  2025-08-13 17:08 ` [PATCH v1 2/6] s390/pci: Update the logic for detecting passthrough device Farhan Ali
@ 2025-08-13 17:08 ` Farhan Ali
  2025-08-13 17:08 ` [PATCH v1 4/6] vfio-pci/zdev: Setup a zpci memory region for error information Farhan Ali
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Farhan Ali @ 2025-08-13 17:08 UTC (permalink / raw)
  To: linux-s390, kvm, linux-kernel; +Cc: schnelle, mjrosato, alifm, alex.williamson

For a passthrough device we need co-operation from user space
to recover the device. This would require to bubble up any error
information to user space. Let's store this error information for
passthrough devices, so it can be retrieved later.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 arch/s390/include/asm/pci.h      | 28 ++++++++++
 arch/s390/pci/pci.c              |  1 +
 arch/s390/pci/pci_event.c        | 95 +++++++++++++++++++-------------
 drivers/vfio/pci/vfio_pci_zdev.c |  2 +
 4 files changed, 88 insertions(+), 38 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 0705a2f52263..9b4abbd806f2 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -116,6 +116,31 @@ struct zpci_bus {
 	enum pci_bus_speed	max_bus_speed;
 };
 
+/* Content Code Description for PCI Function Error */
+struct zpci_ccdf_err {
+	u32 reserved1;
+	u32 fh;                         /* function handle */
+	u32 fid;                        /* function id */
+	u32 ett         :  4;           /* expected table type */
+	u32 mvn         : 12;           /* MSI vector number */
+	u32 dmaas       :  8;           /* DMA address space */
+	u32 reserved2   :  6;
+	u32 q           :  1;           /* event qualifier */
+	u32 rw          :  1;           /* read/write */
+	u64 faddr;                      /* failing address */
+	u32 reserved3;
+	u16 reserved4;
+	u16 pec;                        /* PCI event code */
+} __packed;
+
+#define ZPCI_ERR_PENDING_MAX 16
+struct zpci_ccdf_pending {
+	size_t count;
+	int head;
+	int tail;
+	struct zpci_ccdf_err err[ZPCI_ERR_PENDING_MAX];
+};
+
 /* Private data per function */
 struct zpci_dev {
 	struct zpci_bus *zbus;
@@ -192,6 +217,8 @@ struct zpci_dev {
 	struct iommu_domain *s390_domain; /* attached IOMMU domain */
 	struct kvm_zdev *kzdev;
 	struct mutex kzdev_lock;
+	struct zpci_ccdf_pending pending_errs;
+	struct mutex pending_errs_lock;
 	spinlock_t dom_lock;		/* protect s390_domain change */
 };
 
@@ -317,6 +344,7 @@ void zpci_debug_exit_device(struct zpci_dev *);
 int zpci_report_error(struct pci_dev *, struct zpci_report_error_header *);
 int zpci_clear_error_state(struct zpci_dev *zdev);
 int zpci_reset_load_store_blocked(struct zpci_dev *zdev);
+void zpci_cleanup_pending_errors(struct zpci_dev *zdev);
 
 #ifdef CONFIG_NUMA
 
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index cd6676c2d602..f795e05b5001 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -823,6 +823,7 @@ struct zpci_dev *zpci_create_device(u32 fid, u32 fh, enum zpci_state state)
 	mutex_init(&zdev->state_lock);
 	mutex_init(&zdev->fmb_lock);
 	mutex_init(&zdev->kzdev_lock);
+	mutex_init(&zdev->pending_errs_lock);
 
 	return zdev;
 
diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
index 541d536be052..ac527410812b 100644
--- a/arch/s390/pci/pci_event.c
+++ b/arch/s390/pci/pci_event.c
@@ -18,23 +18,6 @@
 #include "pci_bus.h"
 #include "pci_report.h"
 
-/* Content Code Description for PCI Function Error */
-struct zpci_ccdf_err {
-	u32 reserved1;
-	u32 fh;				/* function handle */
-	u32 fid;			/* function id */
-	u32 ett		:  4;		/* expected table type */
-	u32 mvn		: 12;		/* MSI vector number */
-	u32 dmaas	:  8;		/* DMA address space */
-	u32		:  6;
-	u32 q		:  1;		/* event qualifier */
-	u32 rw		:  1;		/* read/write */
-	u64 faddr;			/* failing address */
-	u32 reserved3;
-	u16 reserved4;
-	u16 pec;			/* PCI event code */
-} __packed;
-
 /* Content Code Description for PCI Function Availability */
 struct zpci_ccdf_avail {
 	u32 reserved1;
@@ -76,6 +59,41 @@ static bool is_driver_supported(struct pci_driver *driver)
 	return true;
 }
 
+static void zpci_store_pci_error(struct pci_dev *pdev,
+				 struct zpci_ccdf_err *ccdf)
+{
+	struct zpci_dev *zdev = to_zpci(pdev);
+	int i;
+
+	mutex_lock(&zdev->pending_errs_lock);
+	if (zdev->pending_errs.count >= ZPCI_ERR_PENDING_MAX) {
+		pr_err("%s: Cannot store PCI error info for device",
+				pci_name(pdev));
+		mutex_unlock(&zdev->pending_errs_lock);
+		return;
+	}
+
+	i = zdev->pending_errs.tail % ZPCI_ERR_PENDING_MAX;
+	memcpy(&zdev->pending_errs.err[i], ccdf, sizeof(struct zpci_ccdf_err));
+	zdev->pending_errs.tail++;
+	zdev->pending_errs.count++;
+	mutex_unlock(&zdev->pending_errs_lock);
+}
+
+void zpci_cleanup_pending_errors(struct zpci_dev *zdev)
+{
+	struct pci_dev *pdev = NULL;
+
+	mutex_lock(&zdev->pending_errs_lock);
+	pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
+	if (zdev->pending_errs.count)
+		pr_err("%s: Unhandled PCI error events count=%zu",
+				pci_name(pdev), zdev->pending_errs.count);
+	memset(&zdev->pending_errs, 0, sizeof(struct zpci_ccdf_pending));
+	mutex_unlock(&zdev->pending_errs_lock);
+}
+EXPORT_SYMBOL_GPL(zpci_cleanup_pending_errors);
+
 static pci_ers_result_t zpci_event_notify_error_detected(struct pci_dev *pdev,
 							 struct pci_driver *driver)
 {
@@ -169,7 +187,8 @@ static pci_ers_result_t zpci_event_do_reset(struct pci_dev *pdev,
  * and the platform determines which functions are affected for
  * multi-function devices.
  */
-static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
+static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev,
+							  struct zpci_ccdf_err *ccdf)
 {
 	pci_ers_result_t ers_res = PCI_ERS_RESULT_DISCONNECT;
 	struct zpci_dev *zdev = to_zpci(pdev);
@@ -188,13 +207,6 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
 	}
 	pdev->error_state = pci_channel_io_frozen;
 
-	if (needs_mediated_recovery(pdev)) {
-		pr_info("%s: Cannot be recovered in the host because it is a pass-through device\n",
-			pci_name(pdev));
-		status_str = "failed (pass-through)";
-		goto out_unlock;
-	}
-
 	driver = to_pci_driver(pdev->dev.driver);
 	if (!is_driver_supported(driver)) {
 		if (!driver) {
@@ -210,12 +222,22 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
 		goto out_unlock;
 	}
 
+	if (needs_mediated_recovery(pdev))
+		zpci_store_pci_error(pdev, ccdf);
+
 	ers_res = zpci_event_notify_error_detected(pdev, driver);
 	if (ers_result_indicates_abort(ers_res)) {
 		status_str = "failed (abort on detection)";
 		goto out_unlock;
 	}
 
+	if (needs_mediated_recovery(pdev)) {
+		pr_info("%s: Recovering passthrough device\n", pci_name(pdev));
+		ers_res = PCI_ERS_RESULT_RECOVERED;
+		status_str = "in progress";
+		goto out_unlock;
+	}
+
 	if (ers_res != PCI_ERS_RESULT_NEED_RESET) {
 		ers_res = zpci_event_do_error_state_clear(pdev, driver);
 		if (ers_result_indicates_abort(ers_res)) {
@@ -258,25 +280,20 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
  * @pdev: PCI function for which to report
  * @es: PCI channel failure state to report
  */
-static void zpci_event_io_failure(struct pci_dev *pdev, pci_channel_state_t es)
+static void zpci_event_io_failure(struct pci_dev *pdev, pci_channel_state_t es,
+				  struct zpci_ccdf_err *ccdf)
 {
 	struct pci_driver *driver;
 
 	pci_dev_lock(pdev);
 	pdev->error_state = es;
-	/**
-	 * While vfio-pci's error_detected callback notifies user-space QEMU
-	 * reacts to this by freezing the guest. In an s390 environment PCI
-	 * errors are rarely fatal so this is overkill. Instead in the future
-	 * we will inject the error event and let the guest recover the device
-	 * itself.
-	 */
+
 	if (needs_mediated_recovery(pdev))
-		goto out;
+		zpci_store_pci_error(pdev, ccdf);
 	driver = to_pci_driver(pdev->dev.driver);
 	if (driver && driver->err_handler && driver->err_handler->error_detected)
 		driver->err_handler->error_detected(pdev, pdev->error_state);
-out:
+
 	pci_dev_unlock(pdev);
 }
 
@@ -312,6 +329,7 @@ static void __zpci_event_error(struct zpci_ccdf_err *ccdf)
 	pr_err("%s: Event 0x%x reports an error for PCI function 0x%x\n",
 	       pdev ? pci_name(pdev) : "n/a", ccdf->pec, ccdf->fid);
 
+
 	if (!pdev)
 		goto no_pdev;
 
@@ -322,12 +340,13 @@ static void __zpci_event_error(struct zpci_ccdf_err *ccdf)
 		break;
 	case 0x0040: /* Service Action or Error Recovery Failed */
 	case 0x003b:
-		zpci_event_io_failure(pdev, pci_channel_io_perm_failure);
+		zpci_event_io_failure(pdev, pci_channel_io_perm_failure, ccdf);
 		break;
 	default: /* PCI function left in the error state attempt to recover */
-		ers_res = zpci_event_attempt_error_recovery(pdev);
+		ers_res = zpci_event_attempt_error_recovery(pdev, ccdf);
 		if (ers_res != PCI_ERS_RESULT_RECOVERED)
-			zpci_event_io_failure(pdev, pci_channel_io_perm_failure);
+			zpci_event_io_failure(pdev, pci_channel_io_perm_failure,
+					ccdf);
 		break;
 	}
 	pci_dev_put(pdev);
diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index a7bc23ce8483..2be37eab9279 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -168,6 +168,8 @@ void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
 
 	zdev->mediated_recovery = false;
 
+	zpci_cleanup_pending_errors(zdev);
+
 	if (!vdev->vdev.kvm)
 		return;
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v1 4/6] vfio-pci/zdev: Setup a zpci memory region for error information
  2025-08-13 17:08 [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x Farhan Ali
                   ` (2 preceding siblings ...)
  2025-08-13 17:08 ` [PATCH v1 3/6] s390/pci: Store PCI error information for passthrough devices Farhan Ali
@ 2025-08-13 17:08 ` Farhan Ali
  2025-08-13 20:30   ` Alex Williamson
  2025-08-13 17:08 ` [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI Farhan Ali
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Farhan Ali @ 2025-08-13 17:08 UTC (permalink / raw)
  To: linux-s390, kvm, linux-kernel; +Cc: schnelle, mjrosato, alifm, alex.williamson

For zpci devices, we have platform specific error information.
To enable recovery for passthrough devices, we want to expose
this error information to user space. We can expose this information
via a device specific (read only) memory region.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/vfio/pci/vfio_pci_zdev.c | 76 ++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h        |  2 +
 include/uapi/linux/vfio_zdev.h   |  5 +++
 3 files changed, 83 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index 2be37eab9279..818235b28caa 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -141,15 +141,91 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
 	return ret;
 }
 
+static ssize_t vfio_pci_zdev_read_err_region(struct vfio_pci_core_device *vdev,
+					    char __user *buf, size_t count,
+					    loff_t *ppos, bool iswrite)
+{
+	struct zpci_dev *zdev = to_zpci(vdev->pdev);
+	struct zpci_ccdf_err err;
+	struct vfio_device_zpci_err_region *region;
+	unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
+	int head = 0;
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+
+	region = vdev->region[i].data;
+
+	if (!zdev)
+		return -ENODEV;
+
+	if (pos + count > vdev->region[i].size || iswrite)
+		return -EINVAL;
+
+	mutex_lock(&zdev->pending_errs_lock);
+	if (zdev->pending_errs.count) {
+		head = zdev->pending_errs.head % ZPCI_ERR_PENDING_MAX;
+		err = zdev->pending_errs.err[head];
+		region->pec = err.pec;
+		zdev->pending_errs.head++;
+		zdev->pending_errs.count--;
+		region->pending_errors = zdev->pending_errs.count;
+	}
+	mutex_unlock(&zdev->pending_errs_lock);
+
+	if (copy_to_user(buf, (void *)region + pos, count))
+		count = -EFAULT;
+
+	return count;
+}
+
+static void vfio_pci_zdev_release_err_region(struct vfio_pci_core_device *vdev,
+					     struct vfio_pci_region *region)
+{
+	struct vfio_device_zpci_err_region *err_region = region->data;
+
+	kfree(err_region);
+}
+
+static const struct vfio_pci_regops vfio_pci_zdev_err_regops = {
+	.rw = vfio_pci_zdev_read_err_region,
+	.release = vfio_pci_zdev_release_err_region
+};
+
+static int vfio_pci_zdev_setup_err_region(struct vfio_pci_core_device *vdev)
+{
+	struct vfio_device_zpci_err_region *region;
+	int ret = 0;
+
+	region = kzalloc(sizeof(*region), GFP_KERNEL);
+	if (!region)
+		return -ENOMEM;
+
+	ret = vfio_pci_core_register_dev_region(vdev,
+		PCI_VENDOR_ID_IBM | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
+		VFIO_REGION_SUBTYPE_IBM_ZPCI_ERROR_REGION,
+		&vfio_pci_zdev_err_regops,
+		sizeof(*region), VFIO_REGION_INFO_FLAG_READ, region);
+
+	if (ret)
+		kfree(region);
+
+
+	return ret;
+}
+
 int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
 {
 	struct zpci_dev *zdev = to_zpci(vdev->pdev);
+	int ret;
 
 	if (!zdev)
 		return -ENODEV;
 
 	zdev->mediated_recovery = true;
 
+	ret = vfio_pci_zdev_setup_err_region(vdev);
+	if (ret)
+		return ret;
+
 	if (!vdev->vdev.kvm)
 		return 0;
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 75100bf009ba..452b87f3672e 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -369,6 +369,8 @@ struct vfio_region_info_cap_type {
  */
 #define VFIO_REGION_SUBTYPE_IBM_NVLINK2_ATSD	(1)
 
+#define VFIO_REGION_SUBTYPE_IBM_ZPCI_ERROR_REGION (2)
+
 /* sub-types for VFIO_REGION_TYPE_GFX */
 #define VFIO_REGION_SUBTYPE_GFX_EDID            (1)
 
diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
index 77f2aff1f27e..bcd06f334a42 100644
--- a/include/uapi/linux/vfio_zdev.h
+++ b/include/uapi/linux/vfio_zdev.h
@@ -82,4 +82,9 @@ struct vfio_device_info_cap_zpci_pfip {
 	__u8 pfip[];
 };
 
+struct vfio_device_zpci_err_region {
+	__u16 pec;
+	int pending_errors;
+};
+
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
  2025-08-13 17:08 [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x Farhan Ali
                   ` (3 preceding siblings ...)
  2025-08-13 17:08 ` [PATCH v1 4/6] vfio-pci/zdev: Setup a zpci memory region for error information Farhan Ali
@ 2025-08-13 17:08 ` Farhan Ali
  2025-08-13 20:30   ` Alex Williamson
                     ` (2 more replies)
  2025-08-13 17:08 ` [PATCH v1 6/6] vfio: Allow error notification and recovery for ISM device Farhan Ali
  2025-08-13 17:45 ` [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x Farhan Ali
  6 siblings, 3 replies; 27+ messages in thread
From: Farhan Ali @ 2025-08-13 17:08 UTC (permalink / raw)
  To: linux-s390, kvm, linux-kernel; +Cc: schnelle, mjrosato, alifm, alex.williamson

For zPCI devices we should drive a platform specific function reset
as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
in error state.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 arch/s390/pci/pci.c              |  1 +
 drivers/vfio/pci/vfio_pci_core.c |  4 ++++
 drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
 drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
 4 files changed, 49 insertions(+)

diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index f795e05b5001..860a64993b58 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -788,6 +788,7 @@ int zpci_hot_reset_device(struct zpci_dev *zdev)
 
 	return rc;
 }
+EXPORT_SYMBOL_GPL(zpci_hot_reset_device);
 
 /**
  * zpci_create_device() - Create a new zpci_dev and add it to the zbus
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 7dcf5439dedc..7220a22135a9 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1227,6 +1227,10 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
 	 */
 	vfio_pci_set_power_state(vdev, PCI_D0);
 
+	ret = vfio_pci_zdev_reset(vdev);
+	if (ret && ret != -ENODEV)
+		return ret;
+
 	ret = pci_try_reset_function(vdev->pdev);
 	up_write(&vdev->memory_lock);
 
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index a9972eacb293..5288577b3170 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -86,6 +86,7 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
 				struct vfio_info_cap *caps);
 int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
 void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
+int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev);
 #else
 static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
 					      struct vfio_info_cap *caps)
@@ -100,6 +101,10 @@ static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
 
 static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
 {}
+int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
+{
+	return -ENODEV;
+}
 #endif
 
 static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index 818235b28caa..dd1919ccb3be 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -212,6 +212,45 @@ static int vfio_pci_zdev_setup_err_region(struct vfio_pci_core_device *vdev)
 	return ret;
 }
 
+int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
+{
+	struct zpci_dev *zdev = to_zpci(vdev->pdev);
+	int rc = -EIO;
+
+	if (!zdev)
+		return -ENODEV;
+
+	/*
+	 * If we can't get the zdev->state_lock the device state is
+	 * currently undergoing a transition and we bail out - just
+	 * the same as if the device's state is not configured at all.
+	 */
+	if (!mutex_trylock(&zdev->state_lock))
+		return rc;
+
+	/* We can reset only if the function is configured */
+	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
+		goto out;
+
+	rc = zpci_hot_reset_device(zdev);
+	if (rc != 0)
+		goto out;
+
+	if (!vdev->pci_saved_state) {
+		pci_err(vdev->pdev, "No saved available for the device");
+		rc = -EIO;
+		goto out;
+	}
+
+	pci_dev_lock(vdev->pdev);
+	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
+	pci_restore_state(vdev->pdev);
+	pci_dev_unlock(vdev->pdev);
+out:
+	mutex_unlock(&zdev->state_lock);
+	return rc;
+}
+
 int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
 {
 	struct zpci_dev *zdev = to_zpci(vdev->pdev);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v1 6/6] vfio: Allow error notification and recovery for ISM device
  2025-08-13 17:08 [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x Farhan Ali
                   ` (4 preceding siblings ...)
  2025-08-13 17:08 ` [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI Farhan Ali
@ 2025-08-13 17:08 ` Farhan Ali
  2025-08-14 20:48   ` Bjorn Helgaas
  2025-08-13 17:45 ` [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x Farhan Ali
  6 siblings, 1 reply; 27+ messages in thread
From: Farhan Ali @ 2025-08-13 17:08 UTC (permalink / raw)
  To: linux-s390, kvm, linux-kernel; +Cc: schnelle, mjrosato, alifm, alex.williamson

VFIO allows error recovery and notification for devices that
are PCIe (and thus AER) capable. But for PCI devices on IBM
s390 error recovery involves platform firmware and
notification to operating system is done by architecture
specific way. The Internal Shared Memory(ISM) device is a legacy
PCI device (so not PCIe capable), but can still be recovered
when notified of an error.

Relax the PCIe only requirement for ISM devices, so passthrough
ISM devices can be notified and recovered on error.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/vfio/pci/vfio_pci_core.c  | 18 ++++++++++++++++--
 drivers/vfio/pci/vfio_pci_intrs.c |  2 +-
 drivers/vfio/pci/vfio_pci_priv.h  |  3 +++
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 7220a22135a9..1faab80139c6 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -723,6 +723,20 @@ void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev)
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_finish_enable);
 
+bool vfio_pci_device_can_recover(struct vfio_pci_core_device *vdev)
+{
+	struct pci_dev *pdev = vdev->pdev;
+
+	if (pci_is_pcie(pdev))
+		return true;
+
+	if (pdev->vendor == PCI_VENDOR_ID_IBM &&
+			pdev->device == PCI_DEVICE_ID_IBM_ISM)
+		return true;
+
+	return false;
+}
+
 static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_type)
 {
 	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
@@ -749,7 +763,7 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ
 			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
 		}
 	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
-		if (pci_is_pcie(vdev->pdev))
+		if (vfio_pci_device_can_recover(vdev))
 			return 1;
 	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
 		return 1;
@@ -1150,7 +1164,7 @@ static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev,
 	case VFIO_PCI_REQ_IRQ_INDEX:
 		break;
 	case VFIO_PCI_ERR_IRQ_INDEX:
-		if (pci_is_pcie(vdev->pdev))
+		if (vfio_pci_device_can_recover(vdev))
 			break;
 		fallthrough;
 	default:
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 123298a4dc8f..f5384086ac45 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -838,7 +838,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
 	case VFIO_PCI_ERR_IRQ_INDEX:
 		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
 		case VFIO_IRQ_SET_ACTION_TRIGGER:
-			if (pci_is_pcie(vdev->pdev))
+			if (vfio_pci_device_can_recover(vdev))
 				func = vfio_pci_set_err_trigger;
 			break;
 		}
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 5288577b3170..93c1e29fbbbb 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -36,6 +36,9 @@ ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 			size_t count, loff_t *ppos, bool iswrite);
 
+bool vfio_pci_device_can_recover(struct vfio_pci_core_device *vdev);
+
+
 #ifdef CONFIG_VFIO_PCI_VGA
 ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 			size_t count, loff_t *ppos, bool iswrite);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x
  2025-08-13 17:08 [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x Farhan Ali
                   ` (5 preceding siblings ...)
  2025-08-13 17:08 ` [PATCH v1 6/6] vfio: Allow error notification and recovery for ISM device Farhan Ali
@ 2025-08-13 17:45 ` Farhan Ali
  6 siblings, 0 replies; 27+ messages in thread
From: Farhan Ali @ 2025-08-13 17:45 UTC (permalink / raw)
  To: linux-s390, kvm, linux-kernel; +Cc: schnelle, mjrosato, alex.williamson

Also posted a QEMU series utilizing these kernel patches
https://lore.kernel.org/qemu-devel/20250813174152.1238-1-alifm@linux.ibm.com/

Thanks
Farhan

On 8/13/2025 10:08 AM, Farhan Ali wrote:
> Hi,
>
> This Linux kernel patch series introduces support for error recovery for
> passthrough PCI devices on System Z (s390x).
>
> Background
> ----------
> For PCI devices on s390x an operating system receives platform specific
> error events from firmware rather than through AER.Today for
> passthrough/userspace devices, we don't attempt any error recovery
> and ignore any error events for the devices. The passthrough/userspace devices are
> managed by the vfio-pci driver. The driver does register error handling
> callbacks (error_detected), and on an error trigger an eventfd to userspace.
> But we need a mechanism to notify userspace (QEMU/guest/userspace drivers) about
> the error event.
>
> Proposal
> --------
> We can expose this error information (currently only the PCI Error Code) via a
> device specific memory region for s390 vfio pci devices. Userspace can then read
> the memory region to obtain the error information and take appropriate actions
> such as driving a device reset. The memory region provides some flexibility in
> providing more information in the future if required.
>
> I would appreciate some feedback on this approach.
>
> Thanks
> Farhan
>
> Farhan Ali (6):
>    s390/pci: Restore airq unconditionally for the zPCI device
>    s390/pci: Update the logic for detecting passthrough device
>    s390/pci: Store PCI error information for passthrough devices
>    vfio-pci/zdev: Setup a zpci memory region for error information
>    vfio-pci/zdev: Perform platform specific function reset for zPCI
>    vfio: Allow error notification and recovery for ISM device
>
>   arch/s390/include/asm/pci.h       |  29 +++++++
>   arch/s390/pci/pci.c               |   2 +
>   arch/s390/pci/pci_event.c         | 107 ++++++++++++++-----------
>   arch/s390/pci/pci_irq.c           |   3 +-
>   drivers/vfio/pci/vfio_pci_core.c  |  22 +++++-
>   drivers/vfio/pci/vfio_pci_intrs.c |   2 +-
>   drivers/vfio/pci/vfio_pci_priv.h  |   8 ++
>   drivers/vfio/pci/vfio_pci_zdev.c  | 126 +++++++++++++++++++++++++++++-
>   include/uapi/linux/vfio.h         |   2 +
>   include/uapi/linux/vfio_zdev.h    |   5 ++
>   10 files changed, 253 insertions(+), 53 deletions(-)
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 4/6] vfio-pci/zdev: Setup a zpci memory region for error information
  2025-08-13 17:08 ` [PATCH v1 4/6] vfio-pci/zdev: Setup a zpci memory region for error information Farhan Ali
@ 2025-08-13 20:30   ` Alex Williamson
  2025-08-13 21:25     ` Farhan Ali
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Williamson @ 2025-08-13 20:30 UTC (permalink / raw)
  To: Farhan Ali; +Cc: linux-s390, kvm, linux-kernel, schnelle, mjrosato

On Wed, 13 Aug 2025 10:08:18 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:
> diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
> index 77f2aff1f27e..bcd06f334a42 100644
> --- a/include/uapi/linux/vfio_zdev.h
> +++ b/include/uapi/linux/vfio_zdev.h
> @@ -82,4 +82,9 @@ struct vfio_device_info_cap_zpci_pfip {
>  	__u8 pfip[];
>  };
>  
> +struct vfio_device_zpci_err_region {
> +	__u16 pec;
> +	int pending_errors;
> +};
> +
>  #endif

If this is uapi it would hopefully include some description, but if
this is the extent of what can be read from the device specific region,
why not just return it via a DEVICE_FEATURE ioctl?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
  2025-08-13 17:08 ` [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI Farhan Ali
@ 2025-08-13 20:30   ` Alex Williamson
  2025-08-13 21:52     ` Farhan Ali
  2025-08-14  5:22   ` kernel test robot
  2025-08-14  7:42   ` kernel test robot
  2 siblings, 1 reply; 27+ messages in thread
From: Alex Williamson @ 2025-08-13 20:30 UTC (permalink / raw)
  To: Farhan Ali; +Cc: linux-s390, kvm, linux-kernel, schnelle, mjrosato

On Wed, 13 Aug 2025 10:08:19 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:

> For zPCI devices we should drive a platform specific function reset
> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
> in error state.
> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  arch/s390/pci/pci.c              |  1 +
>  drivers/vfio/pci/vfio_pci_core.c |  4 ++++
>  drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
>  drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
>  4 files changed, 49 insertions(+)
> 
> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index f795e05b5001..860a64993b58 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
> @@ -788,6 +788,7 @@ int zpci_hot_reset_device(struct zpci_dev *zdev)
>  
>  	return rc;
>  }
> +EXPORT_SYMBOL_GPL(zpci_hot_reset_device);
>  
>  /**
>   * zpci_create_device() - Create a new zpci_dev and add it to the zbus
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 7dcf5439dedc..7220a22135a9 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1227,6 +1227,10 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
>  	 */
>  	vfio_pci_set_power_state(vdev, PCI_D0);
>  
> +	ret = vfio_pci_zdev_reset(vdev);
> +	if (ret && ret != -ENODEV)
> +		return ret;
> +
>  	ret = pci_try_reset_function(vdev->pdev);
>  	up_write(&vdev->memory_lock);

You're going to be very unhappy if this lock isn't released.

>  
> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
> index a9972eacb293..5288577b3170 100644
> --- a/drivers/vfio/pci/vfio_pci_priv.h
> +++ b/drivers/vfio/pci/vfio_pci_priv.h
> @@ -86,6 +86,7 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
>  				struct vfio_info_cap *caps);
>  int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
>  void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev);
>  #else
>  static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
>  					      struct vfio_info_cap *caps)
> @@ -100,6 +101,10 @@ static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>  
>  static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
>  {}
> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
> +{
> +	return -ENODEV;
> +}
>  #endif
>  
>  static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
> index 818235b28caa..dd1919ccb3be 100644
> --- a/drivers/vfio/pci/vfio_pci_zdev.c
> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
> @@ -212,6 +212,45 @@ static int vfio_pci_zdev_setup_err_region(struct vfio_pci_core_device *vdev)
>  	return ret;
>  }
>  
> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
> +{
> +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
> +	int rc = -EIO;
> +
> +	if (!zdev)
> +		return -ENODEV;
> +
> +	/*
> +	 * If we can't get the zdev->state_lock the device state is
> +	 * currently undergoing a transition and we bail out - just
> +	 * the same as if the device's state is not configured at all.
> +	 */
> +	if (!mutex_trylock(&zdev->state_lock))
> +		return rc;
> +
> +	/* We can reset only if the function is configured */
> +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
> +		goto out;
> +
> +	rc = zpci_hot_reset_device(zdev);
> +	if (rc != 0)
> +		goto out;
> +
> +	if (!vdev->pci_saved_state) {
> +		pci_err(vdev->pdev, "No saved available for the device");
> +		rc = -EIO;
> +		goto out;
> +	}
> +
> +	pci_dev_lock(vdev->pdev);
> +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
> +	pci_restore_state(vdev->pdev);
> +	pci_dev_unlock(vdev->pdev);
> +out:
> +	mutex_unlock(&zdev->state_lock);
> +	return rc;
> +}

This looks like it should be a device or arch specific reset
implemented in drivers/pci, not vfio.  Thanks,

Alex

> +
>  int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>  {
>  	struct zpci_dev *zdev = to_zpci(vdev->pdev);


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 4/6] vfio-pci/zdev: Setup a zpci memory region for error information
  2025-08-13 20:30   ` Alex Williamson
@ 2025-08-13 21:25     ` Farhan Ali
  2025-08-13 21:42       ` Alex Williamson
  0 siblings, 1 reply; 27+ messages in thread
From: Farhan Ali @ 2025-08-13 21:25 UTC (permalink / raw)
  To: Alex Williamson; +Cc: linux-s390, kvm, linux-kernel, schnelle, mjrosato


On 8/13/2025 1:30 PM, Alex Williamson wrote:
> On Wed, 13 Aug 2025 10:08:18 -0700
> Farhan Ali <alifm@linux.ibm.com> wrote:
>> diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
>> index 77f2aff1f27e..bcd06f334a42 100644
>> --- a/include/uapi/linux/vfio_zdev.h
>> +++ b/include/uapi/linux/vfio_zdev.h
>> @@ -82,4 +82,9 @@ struct vfio_device_info_cap_zpci_pfip {
>>   	__u8 pfip[];
>>   };
>>   
>> +struct vfio_device_zpci_err_region {
>> +	__u16 pec;
>> +	int pending_errors;
>> +};
>> +
>>   #endif
> If this is uapi it would hopefully include some description, but if
> this is the extent of what can be read from the device specific region,
> why not just return it via a DEVICE_FEATURE ioctl?  Thanks,
>
> Alex
>
Yes, will add more details about the uapi. My thinking was based on how 
we expose some other vfio device information on s390x, such as SCHIB for 
vfio-ccw device.

I didn't think about the DEVICE_FEATURE ioctl. But looking into it, it 
looks like we would have to define a device feature (for eg: 
VFIO_DEVICE_FEATURE_ZPCI_ERROR), and expose this information via 
GET_FEATURE? If the preference is to use the DEVICE_FEATURE ioctl I can 
try that. Curious, any specific reason you prefer the DEVICE_FEATURE 
ioctl to the memory region?

Thanks
Farhan



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 4/6] vfio-pci/zdev: Setup a zpci memory region for error information
  2025-08-13 21:25     ` Farhan Ali
@ 2025-08-13 21:42       ` Alex Williamson
  0 siblings, 0 replies; 27+ messages in thread
From: Alex Williamson @ 2025-08-13 21:42 UTC (permalink / raw)
  To: Farhan Ali; +Cc: linux-s390, kvm, linux-kernel, schnelle, mjrosato

On Wed, 13 Aug 2025 14:25:59 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 8/13/2025 1:30 PM, Alex Williamson wrote:
> > On Wed, 13 Aug 2025 10:08:18 -0700
> > Farhan Ali <alifm@linux.ibm.com> wrote:  
> >> diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
> >> index 77f2aff1f27e..bcd06f334a42 100644
> >> --- a/include/uapi/linux/vfio_zdev.h
> >> +++ b/include/uapi/linux/vfio_zdev.h
> >> @@ -82,4 +82,9 @@ struct vfio_device_info_cap_zpci_pfip {
> >>   	__u8 pfip[];
> >>   };
> >>   
> >> +struct vfio_device_zpci_err_region {
> >> +	__u16 pec;
> >> +	int pending_errors;
> >> +};
> >> +
> >>   #endif  
> > If this is uapi it would hopefully include some description, but if
> > this is the extent of what can be read from the device specific region,
> > why not just return it via a DEVICE_FEATURE ioctl?  Thanks,
> >
> > Alex
> >  
> Yes, will add more details about the uapi. My thinking was based on how 
> we expose some other vfio device information on s390x, such as SCHIB for 
> vfio-ccw device.
> 
> I didn't think about the DEVICE_FEATURE ioctl. But looking into it, it 
> looks like we would have to define a device feature (for eg: 
> VFIO_DEVICE_FEATURE_ZPCI_ERROR), and expose this information via 
> GET_FEATURE?

Yes, and there's a probe capability built-in to determine support.

> If the preference is to use the DEVICE_FEATURE ioctl I can 
> try that. Curious, any specific reason you prefer the DEVICE_FEATURE 
> ioctl to the memory region?

Given our current segmenting of the vfio device fd, we're using 40-bits
of address space for a 6-byte structure.  We're returning structured
data that has no requirement to be read at arbitrary offsets and
lengths.  For example, does this series really even handle a short
read?  We adjust counters for any read, it's more prone to those sorts
of errors.

Maybe if you actually wanted to allow the user to mmap the error array
buffer and move the head as they read data while the kernel
asynchronously fills from the tail, it might make sense to use a
region, but as used here I don't think it's the right interface.
Thanks,

Alex


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
  2025-08-13 20:30   ` Alex Williamson
@ 2025-08-13 21:52     ` Farhan Ali
  2025-08-13 22:56       ` Alex Williamson
  0 siblings, 1 reply; 27+ messages in thread
From: Farhan Ali @ 2025-08-13 21:52 UTC (permalink / raw)
  To: Alex Williamson; +Cc: linux-s390, kvm, linux-kernel, schnelle, mjrosato


On 8/13/2025 1:30 PM, Alex Williamson wrote:
> On Wed, 13 Aug 2025 10:08:19 -0700
> Farhan Ali <alifm@linux.ibm.com> wrote:
>
>> For zPCI devices we should drive a platform specific function reset
>> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
>> in error state.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>   arch/s390/pci/pci.c              |  1 +
>>   drivers/vfio/pci/vfio_pci_core.c |  4 ++++
>>   drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
>>   drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
>>   4 files changed, 49 insertions(+)
>>
>> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
>> index f795e05b5001..860a64993b58 100644
>> --- a/arch/s390/pci/pci.c
>> +++ b/arch/s390/pci/pci.c
>> @@ -788,6 +788,7 @@ int zpci_hot_reset_device(struct zpci_dev *zdev)
>>   
>>   	return rc;
>>   }
>> +EXPORT_SYMBOL_GPL(zpci_hot_reset_device);
>>   
>>   /**
>>    * zpci_create_device() - Create a new zpci_dev and add it to the zbus
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 7dcf5439dedc..7220a22135a9 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -1227,6 +1227,10 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
>>   	 */
>>   	vfio_pci_set_power_state(vdev, PCI_D0);
>>   
>> +	ret = vfio_pci_zdev_reset(vdev);
>> +	if (ret && ret != -ENODEV)
>> +		return ret;
>> +
>>   	ret = pci_try_reset_function(vdev->pdev);
>>   	up_write(&vdev->memory_lock);
> You're going to be very unhappy if this lock isn't released.
>
Ah yes, thanks for catching that. Will fix this.

>
>>   
>> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
>> index a9972eacb293..5288577b3170 100644
>> --- a/drivers/vfio/pci/vfio_pci_priv.h
>> +++ b/drivers/vfio/pci/vfio_pci_priv.h
>> @@ -86,6 +86,7 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
>>   				struct vfio_info_cap *caps);
>>   int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
>>   void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev);
>>   #else
>>   static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
>>   					      struct vfio_info_cap *caps)
>> @@ -100,6 +101,10 @@ static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>>   
>>   static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
>>   {}
>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
>> +{
>> +	return -ENODEV;
>> +}
>>   #endif
>>   
>>   static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
>> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
>> index 818235b28caa..dd1919ccb3be 100644
>> --- a/drivers/vfio/pci/vfio_pci_zdev.c
>> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
>> @@ -212,6 +212,45 @@ static int vfio_pci_zdev_setup_err_region(struct vfio_pci_core_device *vdev)
>>   	return ret;
>>   }
>>   
>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
>> +{
>> +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
>> +	int rc = -EIO;
>> +
>> +	if (!zdev)
>> +		return -ENODEV;
>> +
>> +	/*
>> +	 * If we can't get the zdev->state_lock the device state is
>> +	 * currently undergoing a transition and we bail out - just
>> +	 * the same as if the device's state is not configured at all.
>> +	 */
>> +	if (!mutex_trylock(&zdev->state_lock))
>> +		return rc;
>> +
>> +	/* We can reset only if the function is configured */
>> +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
>> +		goto out;
>> +
>> +	rc = zpci_hot_reset_device(zdev);
>> +	if (rc != 0)
>> +		goto out;
>> +
>> +	if (!vdev->pci_saved_state) {
>> +		pci_err(vdev->pdev, "No saved available for the device");
>> +		rc = -EIO;
>> +		goto out;
>> +	}
>> +
>> +	pci_dev_lock(vdev->pdev);
>> +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
>> +	pci_restore_state(vdev->pdev);
>> +	pci_dev_unlock(vdev->pdev);
>> +out:
>> +	mutex_unlock(&zdev->state_lock);
>> +	return rc;
>> +}
> This looks like it should be a device or arch specific reset
> implemented in drivers/pci, not vfio.  Thanks,
>
> Alex

Are you suggesting to move this to an arch specific function? One thing 
we need to do after the zpci_hot_reset_device, is to correctly restore 
the config space of the device. And for vfio-pci bound devices we want 
to restore the state of the device to when it was initially opened.

Thanks
Farhan


>
>> +
>>   int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>>   {
>>   	struct zpci_dev *zdev = to_zpci(vdev->pdev);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
  2025-08-13 21:52     ` Farhan Ali
@ 2025-08-13 22:56       ` Alex Williamson
  2025-08-14 13:12         ` Niklas Schnelle
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Williamson @ 2025-08-13 22:56 UTC (permalink / raw)
  To: Farhan Ali; +Cc: linux-s390, kvm, linux-kernel, schnelle, mjrosato

On Wed, 13 Aug 2025 14:52:24 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 8/13/2025 1:30 PM, Alex Williamson wrote:
> > On Wed, 13 Aug 2025 10:08:19 -0700
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> >  
> >> For zPCI devices we should drive a platform specific function reset
> >> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
> >> in error state.
> >>
> >> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >> ---
> >>   arch/s390/pci/pci.c              |  1 +
> >>   drivers/vfio/pci/vfio_pci_core.c |  4 ++++
> >>   drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
> >>   drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
> >>   4 files changed, 49 insertions(+)
> >>
> >> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> >> index f795e05b5001..860a64993b58 100644
> >> --- a/arch/s390/pci/pci.c
> >> +++ b/arch/s390/pci/pci.c
> >> @@ -788,6 +788,7 @@ int zpci_hot_reset_device(struct zpci_dev *zdev)
> >>   
> >>   	return rc;
> >>   }
> >> +EXPORT_SYMBOL_GPL(zpci_hot_reset_device);
> >>   
> >>   /**
> >>    * zpci_create_device() - Create a new zpci_dev and add it to the zbus
> >> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> >> index 7dcf5439dedc..7220a22135a9 100644
> >> --- a/drivers/vfio/pci/vfio_pci_core.c
> >> +++ b/drivers/vfio/pci/vfio_pci_core.c
> >> @@ -1227,6 +1227,10 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
> >>   	 */
> >>   	vfio_pci_set_power_state(vdev, PCI_D0);
> >>   
> >> +	ret = vfio_pci_zdev_reset(vdev);
> >> +	if (ret && ret != -ENODEV)
> >> +		return ret;
> >> +
> >>   	ret = pci_try_reset_function(vdev->pdev);
> >>   	up_write(&vdev->memory_lock);  
> > You're going to be very unhappy if this lock isn't released.
> >  
> Ah yes, thanks for catching that. Will fix this.
> 
> >  
> >>   
> >> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
> >> index a9972eacb293..5288577b3170 100644
> >> --- a/drivers/vfio/pci/vfio_pci_priv.h
> >> +++ b/drivers/vfio/pci/vfio_pci_priv.h
> >> @@ -86,6 +86,7 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
> >>   				struct vfio_info_cap *caps);
> >>   int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
> >>   void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
> >> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev);
> >>   #else
> >>   static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
> >>   					      struct vfio_info_cap *caps)
> >> @@ -100,6 +101,10 @@ static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
> >>   
> >>   static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
> >>   {}
> >> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
> >> +{
> >> +	return -ENODEV;
> >> +}
> >>   #endif
> >>   
> >>   static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
> >> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
> >> index 818235b28caa..dd1919ccb3be 100644
> >> --- a/drivers/vfio/pci/vfio_pci_zdev.c
> >> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
> >> @@ -212,6 +212,45 @@ static int vfio_pci_zdev_setup_err_region(struct vfio_pci_core_device *vdev)
> >>   	return ret;
> >>   }
> >>   
> >> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
> >> +{
> >> +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
> >> +	int rc = -EIO;
> >> +
> >> +	if (!zdev)
> >> +		return -ENODEV;
> >> +
> >> +	/*
> >> +	 * If we can't get the zdev->state_lock the device state is
> >> +	 * currently undergoing a transition and we bail out - just
> >> +	 * the same as if the device's state is not configured at all.
> >> +	 */
> >> +	if (!mutex_trylock(&zdev->state_lock))
> >> +		return rc;
> >> +
> >> +	/* We can reset only if the function is configured */
> >> +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
> >> +		goto out;
> >> +
> >> +	rc = zpci_hot_reset_device(zdev);
> >> +	if (rc != 0)
> >> +		goto out;
> >> +
> >> +	if (!vdev->pci_saved_state) {
> >> +		pci_err(vdev->pdev, "No saved available for the device");
> >> +		rc = -EIO;
> >> +		goto out;
> >> +	}
> >> +
> >> +	pci_dev_lock(vdev->pdev);
> >> +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
> >> +	pci_restore_state(vdev->pdev);
> >> +	pci_dev_unlock(vdev->pdev);
> >> +out:
> >> +	mutex_unlock(&zdev->state_lock);
> >> +	return rc;
> >> +}  
> > This looks like it should be a device or arch specific reset
> > implemented in drivers/pci, not vfio.  Thanks,
> >
> > Alex  
> 
> Are you suggesting to move this to an arch specific function? One thing 
> we need to do after the zpci_hot_reset_device, is to correctly restore 
> the config space of the device. And for vfio-pci bound devices we want 
> to restore the state of the device to when it was initially opened.

We generally rely on the abstraction of pci_reset_function() to select
the correct type of reset for a function scope reset.  We've gone to
quite a bit of effort to implement all device specific resets and
quirks in the PCI core to be re-used across the kernel.

Calling zpci_hot_reset_device() directly seems contradictory to those
efforts.  Should pci_reset_function() call this universally on s390x
rather than providing access to FLR/PM/SBR reset?  Why is it
universally correct here given the ioctl previously made use of
standard reset mechanisms?

The DEVICE_RESET ioctl is simply an in-place reset of the device,
without restoring the original device state.  So we're also subtly
changing that behavior here, presumably because we're targeting the
specific error recovery case.  Have you considered how this might
break non-error-recovery use cases?

I wonder if we want a different reset mechanism for this use case
rather than these subtle semantic changes.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
  2025-08-13 17:08 ` [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI Farhan Ali
  2025-08-13 20:30   ` Alex Williamson
@ 2025-08-14  5:22   ` kernel test robot
  2025-08-14  7:42   ` kernel test robot
  2 siblings, 0 replies; 27+ messages in thread
From: kernel test robot @ 2025-08-14  5:22 UTC (permalink / raw)
  To: Farhan Ali, linux-s390, kvm, linux-kernel
  Cc: llvm, oe-kbuild-all, schnelle, mjrosato, alifm, alex.williamson

Hi Farhan,

kernel test robot noticed the following build warnings:

[auto build test WARNING on awilliam-vfio/next]
[also build test WARNING on s390/features linus/master v6.17-rc1 next-20250813]
[cannot apply to kvms390/next awilliam-vfio/for-linus]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Farhan-Ali/s390-pci-Restore-airq-unconditionally-for-the-zPCI-device/20250814-012243
base:   https://github.com/awilliam/linux-vfio.git next
patch link:    https://lore.kernel.org/r/20250813170821.1115-6-alifm%40linux.ibm.com
patch subject: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
config: i386-randconfig-005-20250814 (https://download.01.org/0day-ci/archive/20250814/202508141314.OUnmuib7-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250814/202508141314.OUnmuib7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508141314.OUnmuib7-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/vfio/pci/vfio_pci_intrs.c:23:
>> drivers/vfio/pci/vfio_pci_priv.h:104:5: warning: no previous prototype for function 'vfio_pci_zdev_reset' [-Wmissing-prototypes]
     104 | int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
         |     ^
   drivers/vfio/pci/vfio_pci_priv.h:104:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     104 | int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
         | ^
         | static 
   1 warning generated.


vim +/vfio_pci_zdev_reset +104 drivers/vfio/pci/vfio_pci_priv.h

   101	
   102	static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
   103	{}
 > 104	int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
   105	{
   106		return -ENODEV;
   107	}
   108	#endif
   109	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
  2025-08-13 17:08 ` [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI Farhan Ali
  2025-08-13 20:30   ` Alex Williamson
  2025-08-14  5:22   ` kernel test robot
@ 2025-08-14  7:42   ` kernel test robot
  2 siblings, 0 replies; 27+ messages in thread
From: kernel test robot @ 2025-08-14  7:42 UTC (permalink / raw)
  To: Farhan Ali, linux-s390, kvm, linux-kernel
  Cc: oe-kbuild-all, schnelle, mjrosato, alifm, alex.williamson

Hi Farhan,

kernel test robot noticed the following build errors:

[auto build test ERROR on awilliam-vfio/next]
[also build test ERROR on s390/features linus/master v6.17-rc1 next-20250814]
[cannot apply to kvms390/next awilliam-vfio/for-linus]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Farhan-Ali/s390-pci-Restore-airq-unconditionally-for-the-zPCI-device/20250814-012243
base:   https://github.com/awilliam/linux-vfio.git next
patch link:    https://lore.kernel.org/r/20250813170821.1115-6-alifm%40linux.ibm.com
patch subject: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
config: csky-randconfig-002-20250814 (https://download.01.org/0day-ci/archive/20250814/202508141518.Z82dHhVu-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250814/202508141518.Z82dHhVu-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508141518.Z82dHhVu-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from drivers/vfio/pci/vfio_pci_core.c:36:
>> drivers/vfio/pci/vfio_pci_priv.h:104:5: warning: no previous prototype for 'vfio_pci_zdev_reset' [-Wmissing-prototypes]
     104 | int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
         |     ^~~~~~~~~~~~~~~~~~~
--
   csky-linux-ld: drivers/vfio/pci/vfio_pci_intrs.o: in function `vfio_pci_zdev_reset':
>> drivers/vfio/pci/vfio_pci_priv.h:105: multiple definition of `vfio_pci_zdev_reset'; drivers/vfio/pci/vfio_pci_core.o:(.text+0x1a6c): first defined here
   csky-linux-ld: drivers/vfio/pci/vfio_pci_rdwr.o: in function `vfio_pci_zdev_reset':
>> drivers/vfio/pci/vfio_pci_priv.h:105: multiple definition of `vfio_pci_zdev_reset'; drivers/vfio/pci/vfio_pci_core.o:(.text+0x1a6c): first defined here
   csky-linux-ld: drivers/vfio/pci/vfio_pci_config.o: in function `vfio_pci_zdev_reset':
   (.text+0x1964): multiple definition of `vfio_pci_zdev_reset'; drivers/vfio/pci/vfio_pci_core.o:(.text+0x1a6c): first defined here


vim +105 drivers/vfio/pci/vfio_pci_priv.h

   101	
   102	static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
   103	{}
 > 104	int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
 > 105	{
   106		return -ENODEV;
   107	}
   108	#endif
   109	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 1/6] s390/pci: Restore airq unconditionally for the zPCI device
  2025-08-13 17:08 ` [PATCH v1 1/6] s390/pci: Restore airq unconditionally for the zPCI device Farhan Ali
@ 2025-08-14 11:32   ` Niklas Schnelle
  2025-08-14 16:42     ` Farhan Ali
  0 siblings, 1 reply; 27+ messages in thread
From: Niklas Schnelle @ 2025-08-14 11:32 UTC (permalink / raw)
  To: Farhan Ali, linux-s390, kvm, linux-kernel; +Cc: mjrosato, alex.williamson

On Wed, 2025-08-13 at 10:08 -0700, Farhan Ali wrote:
> Commit c1e18c17bda6 ("s390/pci: add zpci_set_irq()/zpci_clear_irq()"),
> introduced the zpci_set_irq() and zpci_clear_irq(), to be used while
> resetting a zPCI device.
> 
> Commit da995d538d3a ("s390/pci: implement reset_slot for hotplug slot"),
> mentions zpci_clear_irq() being called in the path for zpci_hot_reset_device().
> But that is not the case anymore and these functions are not called
> outside of this file.
> 
> However after a CLP disable/enable reset (zpci_hot_reset_device),
> the airq setup of the device will need to be restored. Since we
> are no longer calling zpci_clear_airq() in the reset path, we should
> restore the airq for device unconditionally.
> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  arch/s390/pci/pci_irq.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
> index 84482a921332..8b5493f0dee0 100644
> --- a/arch/s390/pci/pci_irq.c
> +++ b/arch/s390/pci/pci_irq.c
> @@ -427,8 +427,7 @@ bool arch_restore_msi_irqs(struct pci_dev *pdev)
>  {
>  	struct zpci_dev *zdev = to_zpci(pdev);
>  
> -	if (!zdev->irqs_registered)
> -		zpci_set_irq(zdev);
> +	zpci_set_irq(zdev);
>  	return true;
>  }
>  

This would make zdev->irqs_registered effectively without function so
the patch should remove that field from struct zpci_dev and
zpci_set_irq()/zpci_clear_irq(). Alternatively you could also clear
zdev->irqs_registed in zpci_disable_device(). I think the former is
cleaner though.

Thanks,
Niklas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
  2025-08-13 22:56       ` Alex Williamson
@ 2025-08-14 13:12         ` Niklas Schnelle
  2025-08-14 16:33           ` Farhan Ali
  0 siblings, 1 reply; 27+ messages in thread
From: Niklas Schnelle @ 2025-08-14 13:12 UTC (permalink / raw)
  To: Alex Williamson, Farhan Ali, Bjorn Helgaas
  Cc: linux-s390, kvm, linux-kernel, mjrosato

On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:
> On Wed, 13 Aug 2025 14:52:24 -0700
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
> > On 8/13/2025 1:30 PM, Alex Williamson wrote:
> > > On Wed, 13 Aug 2025 10:08:19 -0700
> > > Farhan Ali <alifm@linux.ibm.com> wrote:
> > >  
> > > > For zPCI devices we should drive a platform specific function reset
> > > > as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
> > > > in error state.
> > > > 
> > > > Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> > > > ---
> > > >   arch/s390/pci/pci.c              |  1 +
> > > >   drivers/vfio/pci/vfio_pci_core.c |  4 ++++
> > > >   drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
> > > >   drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
> > > >   4 files changed, 49 insertions(+)
--- snip ---
> > > >   
> > > > +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
> > > > +{
> > > > +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
> > > > +	int rc = -EIO;
> > > > +
> > > > +	if (!zdev)
> > > > +		return -ENODEV;
> > > > +
> > > > +	/*
> > > > +	 * If we can't get the zdev->state_lock the device state is
> > > > +	 * currently undergoing a transition and we bail out - just
> > > > +	 * the same as if the device's state is not configured at all.
> > > > +	 */
> > > > +	if (!mutex_trylock(&zdev->state_lock))
> > > > +		return rc;
> > > > +
> > > > +	/* We can reset only if the function is configured */
> > > > +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
> > > > +		goto out;
> > > > +
> > > > +	rc = zpci_hot_reset_device(zdev);
> > > > +	if (rc != 0)
> > > > +		goto out;
> > > > +
> > > > +	if (!vdev->pci_saved_state) {
> > > > +		pci_err(vdev->pdev, "No saved available for the device");
> > > > +		rc = -EIO;
> > > > +		goto out;
> > > > +	}
> > > > +
> > > > +	pci_dev_lock(vdev->pdev);
> > > > +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
> > > > +	pci_restore_state(vdev->pdev);
> > > > +	pci_dev_unlock(vdev->pdev);
> > > > +out:
> > > > +	mutex_unlock(&zdev->state_lock);
> > > > +	return rc;
> > > > +}  
> > > This looks like it should be a device or arch specific reset
> > > implemented in drivers/pci, not vfio.  Thanks,
> > > 
> > > Alex  
> > 
> > Are you suggesting to move this to an arch specific function? One thing 
> > we need to do after the zpci_hot_reset_device, is to correctly restore 
> > the config space of the device. And for vfio-pci bound devices we want 
> > to restore the state of the device to when it was initially opened.
> 
> We generally rely on the abstraction of pci_reset_function() to select
> the correct type of reset for a function scope reset.  We've gone to
> quite a bit of effort to implement all device specific resets and
> quirks in the PCI core to be re-used across the kernel.
> 
> Calling zpci_hot_reset_device() directly seems contradictory to those
> efforts.  Should pci_reset_function() call this universally on s390x
> rather than providing access to FLR/PM/SBR reset? 
> 

I agree with you Alex. Still trying to figure out what's needed for
this. We already do zpci_hot_reset_device() in reset_slot() from the
s390_pci_hpc.c hotplug slot driver and that does get called via
pci_reset_hotplug_slot() and pci_reset_function(). There are a few
problems though that meant it didn't work for Farhan but I agree maybe
we can fix them for the general case. For one pci_reset_function()
via DEVICE_RESET first tries FLR but that won't work with the device in
the error state and MMIO blocked. Sadly __pci_reset_function_locked()
then concludes that other resets also won't work. So that's something
we might want to improve in general, for example maybe we need
something more like pci_dev_acpi_reset() with higher priority than FLR.

Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm not
sure why that won't work as is. @Farhan do you know?

>  Why is it
> universally correct here given the ioctl previously made use of
> standard reset mechanisms?
> 
> The DEVICE_RESET ioctl is simply an in-place reset of the device,
> without restoring the original device state.  So we're also subtly
> changing that behavior here, presumably because we're targeting the
> specific error recovery case.  Have you considered how this might
> break non-error-recovery use cases?
> 
> I wonder if we want a different reset mechanism for this use case
> rather than these subtle semantic changes. 

I think an alternative to that, which Farhan actually had in the
previous internal version, is to implement
pci_error_handlers::reset_done() and do the pci_load_saved_state()
there. That would only affect the error recovery case leaving other
cases alone.

Thanks,
Niklas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
  2025-08-14 13:12         ` Niklas Schnelle
@ 2025-08-14 16:33           ` Farhan Ali
  2025-08-14 19:55             ` Niklas Schnelle
  2025-08-14 20:57             ` Alex Williamson
  0 siblings, 2 replies; 27+ messages in thread
From: Farhan Ali @ 2025-08-14 16:33 UTC (permalink / raw)
  To: Niklas Schnelle, Alex Williamson, Bjorn Helgaas
  Cc: linux-s390, kvm, linux-kernel, mjrosato


On 8/14/2025 6:12 AM, Niklas Schnelle wrote:
> On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:
>> On Wed, 13 Aug 2025 14:52:24 -0700
>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>
>>> On 8/13/2025 1:30 PM, Alex Williamson wrote:
>>>> On Wed, 13 Aug 2025 10:08:19 -0700
>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>>   
>>>>> For zPCI devices we should drive a platform specific function reset
>>>>> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
>>>>> in error state.
>>>>>
>>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>>> ---
>>>>>    arch/s390/pci/pci.c              |  1 +
>>>>>    drivers/vfio/pci/vfio_pci_core.c |  4 ++++
>>>>>    drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
>>>>>    drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
>>>>>    4 files changed, 49 insertions(+)
> --- snip ---
>>>>>    
>>>>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
>>>>> +{
>>>>> +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
>>>>> +	int rc = -EIO;
>>>>> +
>>>>> +	if (!zdev)
>>>>> +		return -ENODEV;
>>>>> +
>>>>> +	/*
>>>>> +	 * If we can't get the zdev->state_lock the device state is
>>>>> +	 * currently undergoing a transition and we bail out - just
>>>>> +	 * the same as if the device's state is not configured at all.
>>>>> +	 */
>>>>> +	if (!mutex_trylock(&zdev->state_lock))
>>>>> +		return rc;
>>>>> +
>>>>> +	/* We can reset only if the function is configured */
>>>>> +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
>>>>> +		goto out;
>>>>> +
>>>>> +	rc = zpci_hot_reset_device(zdev);
>>>>> +	if (rc != 0)
>>>>> +		goto out;
>>>>> +
>>>>> +	if (!vdev->pci_saved_state) {
>>>>> +		pci_err(vdev->pdev, "No saved available for the device");
>>>>> +		rc = -EIO;
>>>>> +		goto out;
>>>>> +	}
>>>>> +
>>>>> +	pci_dev_lock(vdev->pdev);
>>>>> +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
>>>>> +	pci_restore_state(vdev->pdev);
>>>>> +	pci_dev_unlock(vdev->pdev);
>>>>> +out:
>>>>> +	mutex_unlock(&zdev->state_lock);
>>>>> +	return rc;
>>>>> +}
>>>> This looks like it should be a device or arch specific reset
>>>> implemented in drivers/pci, not vfio.  Thanks,
>>>>
>>>> Alex
>>> Are you suggesting to move this to an arch specific function? One thing
>>> we need to do after the zpci_hot_reset_device, is to correctly restore
>>> the config space of the device. And for vfio-pci bound devices we want
>>> to restore the state of the device to when it was initially opened.
>> We generally rely on the abstraction of pci_reset_function() to select
>> the correct type of reset for a function scope reset.  We've gone to
>> quite a bit of effort to implement all device specific resets and
>> quirks in the PCI core to be re-used across the kernel.
>>
>> Calling zpci_hot_reset_device() directly seems contradictory to those
>> efforts.  Should pci_reset_function() call this universally on s390x
>> rather than providing access to FLR/PM/SBR reset?
>>
> I agree with you Alex. Still trying to figure out what's needed for
> this. We already do zpci_hot_reset_device() in reset_slot() from the
> s390_pci_hpc.c hotplug slot driver and that does get called via
> pci_reset_hotplug_slot() and pci_reset_function(). There are a few
> problems though that meant it didn't work for Farhan but I agree maybe
> we can fix them for the general case. For one pci_reset_function()
> via DEVICE_RESET first tries FLR but that won't work with the device in
> the error state and MMIO blocked. Sadly __pci_reset_function_locked()
> then concludes that other resets also won't work. So that's something
> we might want to improve in general, for example maybe we need
> something more like pci_dev_acpi_reset() with higher priority than FLR.

Yeah I did think of adding something like s390x CLP reset as part of the 
reset methods. AFAIU the s390x CLP reset is similar to ACPI _RST. But 
that would introduce s390x specific code in pci core common code.

>
> Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm not
> sure why that won't work as is. @Farhan do you know?

VFIO_DEVICE_PCI_HOT_RESET would have been sufficient interface for 
majority of PCI devices on s390x as that would drive a bus reset. It was 
sufficient as most devices were single bus devices. But in the latest 
generation of machines (z17) we expose true SR-IOV and an OS can have 
access to both PF and VFs and so these are on the same bus and can have 
different ownership based on what is bound to vfio-pci.

My thinking for extending VFIO_DEVICE_RESET is because AFAIU its a per 
function reset mechanism, which maps well with what our architecture 
provides. On s390x we can drive a per function reset (via firmware) 
through the CLP instruction driven by the zpci_hot_reset_device(). And 
doing it as vfio zpci specific function would confine the s390x logic.

>>   Why is it
>> universally correct here given the ioctl previously made use of
>> standard reset mechanisms?
>>
>> The DEVICE_RESET ioctl is simply an in-place reset of the device,
>> without restoring the original device state.  So we're also subtly
>> changing that behavior here, presumably because we're targeting the
>> specific error recovery case.  Have you considered how this might
>> break non-error-recovery use cases?
>>
>> I wonder if we want a different reset mechanism for this use case
>> rather than these subtle semantic changes.
> I think an alternative to that, which Farhan actually had in the
> previous internal version, is to implement
> pci_error_handlers::reset_done() and do the pci_load_saved_state()
> there. That would only affect the error recovery case leaving other
> cases alone.
>
>
> Thanks,
> Niklas

The reason I abandoned reset_done() callback idea is because its not 
sufficient to recover the device correctly. Today before driving a reset 
we save the state of the device. When a device is in error state, any 
pci load/store (on s390x they are actual instructions :)) to config 
space would return an error value (0xffffffff). We don't have any checks 
in pci_save_state to prevent storing error values. And after a reset 
when we try to restore the config space (pci_dev_restore) we try to 
write the error value and this can be problematic. By the time the 
reset_done() callback is invoked, its already too late.

@Alex,
I am open to ideas/suggestions on this. Do we think we need a separate 
VFIO ioctl to drive this or a new reset mechanism as Niklas suggested?

Thanks
Farhan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 1/6] s390/pci: Restore airq unconditionally for the zPCI device
  2025-08-14 11:32   ` Niklas Schnelle
@ 2025-08-14 16:42     ` Farhan Ali
  0 siblings, 0 replies; 27+ messages in thread
From: Farhan Ali @ 2025-08-14 16:42 UTC (permalink / raw)
  To: Niklas Schnelle, linux-s390, kvm, linux-kernel; +Cc: mjrosato, alex.williamson


On 8/14/2025 4:32 AM, Niklas Schnelle wrote:
> On Wed, 2025-08-13 at 10:08 -0700, Farhan Ali wrote:
>> Commit c1e18c17bda6 ("s390/pci: add zpci_set_irq()/zpci_clear_irq()"),
>> introduced the zpci_set_irq() and zpci_clear_irq(), to be used while
>> resetting a zPCI device.
>>
>> Commit da995d538d3a ("s390/pci: implement reset_slot for hotplug slot"),
>> mentions zpci_clear_irq() being called in the path for zpci_hot_reset_device().
>> But that is not the case anymore and these functions are not called
>> outside of this file.
>>
>> However after a CLP disable/enable reset (zpci_hot_reset_device),
>> the airq setup of the device will need to be restored. Since we
>> are no longer calling zpci_clear_airq() in the reset path, we should
>> restore the airq for device unconditionally.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>   arch/s390/pci/pci_irq.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
>> index 84482a921332..8b5493f0dee0 100644
>> --- a/arch/s390/pci/pci_irq.c
>> +++ b/arch/s390/pci/pci_irq.c
>> @@ -427,8 +427,7 @@ bool arch_restore_msi_irqs(struct pci_dev *pdev)
>>   {
>>   	struct zpci_dev *zdev = to_zpci(pdev);
>>   
>> -	if (!zdev->irqs_registered)
>> -		zpci_set_irq(zdev);
>> +	zpci_set_irq(zdev);
>>   	return true;
>>   }
>>   
> This would make zdev->irqs_registered effectively without function so
> the patch should remove that field from struct zpci_dev and
> zpci_set_irq()/zpci_clear_irq(). Alternatively you could also clear
> zdev->irqs_registed in zpci_disable_device(). I think the former is
> cleaner though.
>
> Thanks,
> Niklas
Yeah agreed, will remove the irqs_registered from zdev as its not needed 
anymore.

Thanks
Farhan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
  2025-08-14 16:33           ` Farhan Ali
@ 2025-08-14 19:55             ` Niklas Schnelle
  2025-08-14 20:57             ` Alex Williamson
  1 sibling, 0 replies; 27+ messages in thread
From: Niklas Schnelle @ 2025-08-14 19:55 UTC (permalink / raw)
  To: Farhan Ali, Alex Williamson, Bjorn Helgaas
  Cc: linux-s390, kvm, linux-kernel, mjrosato

On Thu, 2025-08-14 at 09:33 -0700, Farhan Ali wrote:
> On 8/14/2025 6:12 AM, Niklas Schnelle wrote:
> > On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:
> > > On Wed, 13 Aug 2025 14:52:24 -0700
> > > Farhan Ali <alifm@linux.ibm.com> wrote:
> > > 
> > > > On 8/13/2025 1:30 PM, Alex Williamson wrote:
> > > > > On Wed, 13 Aug 2025 10:08:19 -0700
> > > > > Farhan Ali <alifm@linux.ibm.com> wrote:
> > > > >   
> > > > > > For zPCI devices we should drive a platform specific function reset
> > > > > > as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
> > > > > > in error state.
> > > > > > 
> > > > > > Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> > > > > > ---
> > > > > >    arch/s390/pci/pci.c              |  1 +
> > > > > >    drivers/vfio/pci/vfio_pci_core.c |  4 ++++
> > > > > >    drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
> > > > > >    drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
> > > > > >    4 files changed, 49 insertions(+)
> 
--- snip ---
> > Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm not
> > sure why that won't work as is. @Farhan do you know?
> 
> VFIO_DEVICE_PCI_HOT_RESET would have been sufficient interface for 
> majority of PCI devices on s390x as that would drive a bus reset. It was 
> sufficient as most devices were single bus devices. But in the latest 
> generation of machines (z17) we expose true SR-IOV and an OS can have 
> access to both PF and VFs and so these are on the same bus and can have 
> different ownership based on what is bound to vfio-pci.
> 

Talked to Farhan a bit off list. I think the problem boils down to
this. The s390 PCI support, due to there always being a hypervisor,
does hot and cold plug on a per PCI function basis. And so the hotplug
driver's reset_slot() is just a wrapper around zpci_hot_reset_device()
and also does per PCI function reset. 

Now when doing a VFIO_PCI_HOT_RESET this still doesn't give us a usable
per function reset because vfio_pci_ioctl_get_pci_hot_reset_info()'s
grouping assumes, quite naturally, that as a slot reset this will reset
an entire slot and so the ownership checks fail when we use this reset
on e.g. an SR-IOV capable device with PFs + VFs where s390 exposes
their PCI topology even while retaining per function hotplug. 

As some more background, we've had the virtual PCI topology for SR-IOV
capable devices since commit 44510d6fa0c0 ("s390/pci: Handling
multifunctions") added in v6.9 but only with z17 will they be generally
available.

Thanks,
Niklas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 6/6] vfio: Allow error notification and recovery for ISM device
  2025-08-13 17:08 ` [PATCH v1 6/6] vfio: Allow error notification and recovery for ISM device Farhan Ali
@ 2025-08-14 20:48   ` Bjorn Helgaas
  2025-08-14 21:02     ` Farhan Ali
  0 siblings, 1 reply; 27+ messages in thread
From: Bjorn Helgaas @ 2025-08-14 20:48 UTC (permalink / raw)
  To: Farhan Ali
  Cc: linux-s390, kvm, linux-kernel, schnelle, mjrosato,
	alex.williamson

On Wed, Aug 13, 2025 at 10:08:20AM -0700, Farhan Ali wrote:
> VFIO allows error recovery and notification for devices that
> are PCIe (and thus AER) capable. But for PCI devices on IBM
> s390 error recovery involves platform firmware and
> notification to operating system is done by architecture
> specific way. The Internal Shared Memory(ISM) device is a legacy
> PCI device (so not PCIe capable), but can still be recovered
> when notified of an error.

"PCIe (and thus AER) capable" reads as though AER is required for all
PCIe devices, but AER is optional.

I don't know the details of VFIO and why it tests for PCIe instead of
AER.  Maybe AER is not relevant here and you don't need to mention
AER above at all?

> Relax the PCIe only requirement for ISM devices, so passthrough
> ISM devices can be notified and recovered on error.

Nit: it looks like all your commit logs could be rewrapped to fill
about 75 columns (to leave space for "git log" to indent them and
still fit in 80 columns).  IMHO not much value in using a smaller
width than that.

> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c  | 18 ++++++++++++++++--
>  drivers/vfio/pci/vfio_pci_intrs.c |  2 +-
>  drivers/vfio/pci/vfio_pci_priv.h  |  3 +++
>  3 files changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 7220a22135a9..1faab80139c6 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -723,6 +723,20 @@ void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev)
>  }
>  EXPORT_SYMBOL_GPL(vfio_pci_core_finish_enable);
>  
> +bool vfio_pci_device_can_recover(struct vfio_pci_core_device *vdev)
> +{
> +	struct pci_dev *pdev = vdev->pdev;
> +
> +	if (pci_is_pcie(pdev))
> +		return true;
> +
> +	if (pdev->vendor == PCI_VENDOR_ID_IBM &&
> +			pdev->device == PCI_DEVICE_ID_IBM_ISM)
> +		return true;
> +
> +	return false;
> +}
> +
>  static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_type)
>  {
>  	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
> @@ -749,7 +763,7 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ
>  			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
>  		}
>  	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
> -		if (pci_is_pcie(vdev->pdev))
> +		if (vfio_pci_device_can_recover(vdev))
>  			return 1;
>  	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
>  		return 1;
> @@ -1150,7 +1164,7 @@ static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev,
>  	case VFIO_PCI_REQ_IRQ_INDEX:
>  		break;
>  	case VFIO_PCI_ERR_IRQ_INDEX:
> -		if (pci_is_pcie(vdev->pdev))
> +		if (vfio_pci_device_can_recover(vdev))
>  			break;
>  		fallthrough;
>  	default:
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 123298a4dc8f..f5384086ac45 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -838,7 +838,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
>  	case VFIO_PCI_ERR_IRQ_INDEX:
>  		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>  		case VFIO_IRQ_SET_ACTION_TRIGGER:
> -			if (pci_is_pcie(vdev->pdev))
> +			if (vfio_pci_device_can_recover(vdev))
>  				func = vfio_pci_set_err_trigger;
>  			break;
>  		}
> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
> index 5288577b3170..93c1e29fbbbb 100644
> --- a/drivers/vfio/pci/vfio_pci_priv.h
> +++ b/drivers/vfio/pci/vfio_pci_priv.h
> @@ -36,6 +36,9 @@ ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>  ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>  			size_t count, loff_t *ppos, bool iswrite);
>  
> +bool vfio_pci_device_can_recover(struct vfio_pci_core_device *vdev);
> +
> +
>  #ifdef CONFIG_VFIO_PCI_VGA
>  ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>  			size_t count, loff_t *ppos, bool iswrite);
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
  2025-08-14 16:33           ` Farhan Ali
  2025-08-14 19:55             ` Niklas Schnelle
@ 2025-08-14 20:57             ` Alex Williamson
  2025-08-14 22:33               ` Farhan Ali
  1 sibling, 1 reply; 27+ messages in thread
From: Alex Williamson @ 2025-08-14 20:57 UTC (permalink / raw)
  To: Farhan Ali
  Cc: Niklas Schnelle, Bjorn Helgaas, linux-s390, kvm, linux-kernel,
	mjrosato

On Thu, 14 Aug 2025 09:33:47 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 8/14/2025 6:12 AM, Niklas Schnelle wrote:
> > On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:  
> >> On Wed, 13 Aug 2025 14:52:24 -0700
> >> Farhan Ali <alifm@linux.ibm.com> wrote:
> >>  
> >>> On 8/13/2025 1:30 PM, Alex Williamson wrote:  
> >>>> On Wed, 13 Aug 2025 10:08:19 -0700
> >>>> Farhan Ali <alifm@linux.ibm.com> wrote:
> >>>>     
> >>>>> For zPCI devices we should drive a platform specific function reset
> >>>>> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
> >>>>> in error state.
> >>>>>
> >>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >>>>> ---
> >>>>>    arch/s390/pci/pci.c              |  1 +
> >>>>>    drivers/vfio/pci/vfio_pci_core.c |  4 ++++
> >>>>>    drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
> >>>>>    drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
> >>>>>    4 files changed, 49 insertions(+)  
> > --- snip ---  
> >>>>>    
> >>>>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
> >>>>> +{
> >>>>> +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
> >>>>> +	int rc = -EIO;
> >>>>> +
> >>>>> +	if (!zdev)
> >>>>> +		return -ENODEV;
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * If we can't get the zdev->state_lock the device state is
> >>>>> +	 * currently undergoing a transition and we bail out - just
> >>>>> +	 * the same as if the device's state is not configured at all.
> >>>>> +	 */
> >>>>> +	if (!mutex_trylock(&zdev->state_lock))
> >>>>> +		return rc;
> >>>>> +
> >>>>> +	/* We can reset only if the function is configured */
> >>>>> +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
> >>>>> +		goto out;
> >>>>> +
> >>>>> +	rc = zpci_hot_reset_device(zdev);
> >>>>> +	if (rc != 0)
> >>>>> +		goto out;
> >>>>> +
> >>>>> +	if (!vdev->pci_saved_state) {
> >>>>> +		pci_err(vdev->pdev, "No saved available for the device");
> >>>>> +		rc = -EIO;
> >>>>> +		goto out;
> >>>>> +	}
> >>>>> +
> >>>>> +	pci_dev_lock(vdev->pdev);
> >>>>> +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
> >>>>> +	pci_restore_state(vdev->pdev);
> >>>>> +	pci_dev_unlock(vdev->pdev);
> >>>>> +out:
> >>>>> +	mutex_unlock(&zdev->state_lock);
> >>>>> +	return rc;
> >>>>> +}  
> >>>> This looks like it should be a device or arch specific reset
> >>>> implemented in drivers/pci, not vfio.  Thanks,
> >>>>
> >>>> Alex  
> >>> Are you suggesting to move this to an arch specific function? One thing
> >>> we need to do after the zpci_hot_reset_device, is to correctly restore
> >>> the config space of the device. And for vfio-pci bound devices we want
> >>> to restore the state of the device to when it was initially opened.  
> >> We generally rely on the abstraction of pci_reset_function() to select
> >> the correct type of reset for a function scope reset.  We've gone to
> >> quite a bit of effort to implement all device specific resets and
> >> quirks in the PCI core to be re-used across the kernel.
> >>
> >> Calling zpci_hot_reset_device() directly seems contradictory to those
> >> efforts.  Should pci_reset_function() call this universally on s390x
> >> rather than providing access to FLR/PM/SBR reset?
> >>  
> > I agree with you Alex. Still trying to figure out what's needed for
> > this. We already do zpci_hot_reset_device() in reset_slot() from the
> > s390_pci_hpc.c hotplug slot driver and that does get called via
> > pci_reset_hotplug_slot() and pci_reset_function(). There are a few
> > problems though that meant it didn't work for Farhan but I agree maybe
> > we can fix them for the general case. For one pci_reset_function()
> > via DEVICE_RESET first tries FLR but that won't work with the device in
> > the error state and MMIO blocked. Sadly __pci_reset_function_locked()
> > then concludes that other resets also won't work. So that's something
> > we might want to improve in general, for example maybe we need
> > something more like pci_dev_acpi_reset() with higher priority than FLR.  
> 
> Yeah I did think of adding something like s390x CLP reset as part of the 
> reset methods. AFAIU the s390x CLP reset is similar to ACPI _RST. But 
> that would introduce s390x specific code in pci core common code.
> 
> >
> > Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm not
> > sure why that won't work as is. @Farhan do you know?  
> 
> VFIO_DEVICE_PCI_HOT_RESET would have been sufficient interface for 
> majority of PCI devices on s390x as that would drive a bus reset. It was 
> sufficient as most devices were single bus devices. But in the latest 
> generation of machines (z17) we expose true SR-IOV and an OS can have 
> access to both PF and VFs and so these are on the same bus and can have 
> different ownership based on what is bound to vfio-pci.
> 
> My thinking for extending VFIO_DEVICE_RESET is because AFAIU its a per 
> function reset mechanism, which maps well with what our architecture 
> provides. On s390x we can drive a per function reset (via firmware) 
> through the CLP instruction driven by the zpci_hot_reset_device(). And 
> doing it as vfio zpci specific function would confine the s390x logic.
> 
> >>   Why is it
> >> universally correct here given the ioctl previously made use of
> >> standard reset mechanisms?
> >>
> >> The DEVICE_RESET ioctl is simply an in-place reset of the device,
> >> without restoring the original device state.  So we're also subtly
> >> changing that behavior here, presumably because we're targeting the
> >> specific error recovery case.  Have you considered how this might
> >> break non-error-recovery use cases?
> >>
> >> I wonder if we want a different reset mechanism for this use case
> >> rather than these subtle semantic changes.  
> > I think an alternative to that, which Farhan actually had in the
> > previous internal version, is to implement
> > pci_error_handlers::reset_done() and do the pci_load_saved_state()
> > there. That would only affect the error recovery case leaving other
> > cases alone.
> >
> >
> > Thanks,
> > Niklas  
> 
> The reason I abandoned reset_done() callback idea is because its not 
> sufficient to recover the device correctly. Today before driving a reset 
> we save the state of the device. When a device is in error state, any 
> pci load/store (on s390x they are actual instructions :)) to config 
> space would return an error value (0xffffffff). We don't have any checks 
> in pci_save_state to prevent storing error values. And after a reset 
> when we try to restore the config space (pci_dev_restore) we try to 
> write the error value and this can be problematic. By the time the 
> reset_done() callback is invoked, its already too late.

It's too late because we've re-written the error value back to config
space and as a result the device is broken?  What if
pci_restore_state() were a little smarter to detect that it has bad
read data from pci_save_state() and only restores state based on kernel
data?  Would that leave the device in a functional state that
reset_done() could restore the original saved state and push it out to
the device?

> @Alex,
> I am open to ideas/suggestions on this. Do we think we need a separate 
> VFIO ioctl to drive this or a new reset mechanism as Niklas suggested?

Unfortunately I was short sighted on VFIO_DEVICE_RESET and it's the one
ioctl that doesn't have any flags, so it's not very extensible.

Can we do more of the above, ie. enlighten the FLR/PM reset callbacks to
return -ENOTTY if the device is in an error state and config space is
returning -1 such that we fall through to a slot reset that doesn't
care how broken the device is and you auto-magically get the zpci
function you want?  Follow-up with pushing the original state in
reset_done()?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 6/6] vfio: Allow error notification and recovery for ISM device
  2025-08-14 20:48   ` Bjorn Helgaas
@ 2025-08-14 21:02     ` Farhan Ali
  2025-08-15 20:48       ` Alex Williamson
  0 siblings, 1 reply; 27+ messages in thread
From: Farhan Ali @ 2025-08-14 21:02 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-s390, kvm, linux-kernel, schnelle, mjrosato,
	alex.williamson


On 8/14/2025 1:48 PM, Bjorn Helgaas wrote:
> On Wed, Aug 13, 2025 at 10:08:20AM -0700, Farhan Ali wrote:
>> VFIO allows error recovery and notification for devices that
>> are PCIe (and thus AER) capable. But for PCI devices on IBM
>> s390 error recovery involves platform firmware and
>> notification to operating system is done by architecture
>> specific way. The Internal Shared Memory(ISM) device is a legacy
>> PCI device (so not PCIe capable), but can still be recovered
>> when notified of an error.
> "PCIe (and thus AER) capable" reads as though AER is required for all
> PCIe devices, but AER is optional.
>
> I don't know the details of VFIO and why it tests for PCIe instead of
> AER.  Maybe AER is not relevant here and you don't need to mention
> AER above at all?

The original change that introduced this commit dad9f89 "VFIO-AER: 
Vfio-pci driver changes for supporting AER" was adding the support for 
AER for vfio. My assumption is the author thought if the device is AER 
capable the pcie check should be sufficient? I can remove the AER 
references in commit message. Thanks Farhan



>
>> Relax the PCIe only requirement for ISM devices, so passthrough
>> ISM devices can be notified and recovered on error.
> Nit: it looks like all your commit logs could be rewrapped to fill
> about 75 columns (to leave space for "git log" to indent them and
> still fit in 80 columns).  IMHO not much value in using a smaller
> width than that.
>
Sure, will fix this.
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>   drivers/vfio/pci/vfio_pci_core.c  | 18 ++++++++++++++++--
>>   drivers/vfio/pci/vfio_pci_intrs.c |  2 +-
>>   drivers/vfio/pci/vfio_pci_priv.h  |  3 +++
>>   3 files changed, 20 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 7220a22135a9..1faab80139c6 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -723,6 +723,20 @@ void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev)
>>   }
>>   EXPORT_SYMBOL_GPL(vfio_pci_core_finish_enable);
>>   
>> +bool vfio_pci_device_can_recover(struct vfio_pci_core_device *vdev)
>> +{
>> +	struct pci_dev *pdev = vdev->pdev;
>> +
>> +	if (pci_is_pcie(pdev))
>> +		return true;
>> +
>> +	if (pdev->vendor == PCI_VENDOR_ID_IBM &&
>> +			pdev->device == PCI_DEVICE_ID_IBM_ISM)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>   static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_type)
>>   {
>>   	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
>> @@ -749,7 +763,7 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ
>>   			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
>>   		}
>>   	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
>> -		if (pci_is_pcie(vdev->pdev))
>> +		if (vfio_pci_device_can_recover(vdev))
>>   			return 1;
>>   	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
>>   		return 1;
>> @@ -1150,7 +1164,7 @@ static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev,
>>   	case VFIO_PCI_REQ_IRQ_INDEX:
>>   		break;
>>   	case VFIO_PCI_ERR_IRQ_INDEX:
>> -		if (pci_is_pcie(vdev->pdev))
>> +		if (vfio_pci_device_can_recover(vdev))
>>   			break;
>>   		fallthrough;
>>   	default:
>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
>> index 123298a4dc8f..f5384086ac45 100644
>> --- a/drivers/vfio/pci/vfio_pci_intrs.c
>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
>> @@ -838,7 +838,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
>>   	case VFIO_PCI_ERR_IRQ_INDEX:
>>   		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>>   		case VFIO_IRQ_SET_ACTION_TRIGGER:
>> -			if (pci_is_pcie(vdev->pdev))
>> +			if (vfio_pci_device_can_recover(vdev))
>>   				func = vfio_pci_set_err_trigger;
>>   			break;
>>   		}
>> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
>> index 5288577b3170..93c1e29fbbbb 100644
>> --- a/drivers/vfio/pci/vfio_pci_priv.h
>> +++ b/drivers/vfio/pci/vfio_pci_priv.h
>> @@ -36,6 +36,9 @@ ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>>   ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>>   			size_t count, loff_t *ppos, bool iswrite);
>>   
>> +bool vfio_pci_device_can_recover(struct vfio_pci_core_device *vdev);
>> +
>> +
>>   #ifdef CONFIG_VFIO_PCI_VGA
>>   ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>>   			size_t count, loff_t *ppos, bool iswrite);
>> -- 
>> 2.43.0
>>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
  2025-08-14 20:57             ` Alex Williamson
@ 2025-08-14 22:33               ` Farhan Ali
  0 siblings, 0 replies; 27+ messages in thread
From: Farhan Ali @ 2025-08-14 22:33 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Niklas Schnelle, Bjorn Helgaas, linux-s390, kvm, linux-kernel,
	mjrosato


On 8/14/2025 1:57 PM, Alex Williamson wrote:
> On Thu, 14 Aug 2025 09:33:47 -0700
> Farhan Ali <alifm@linux.ibm.com> wrote:
>
>> On 8/14/2025 6:12 AM, Niklas Schnelle wrote:
>>> On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:
>>>> On Wed, 13 Aug 2025 14:52:24 -0700
>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>>   
>>>>> On 8/13/2025 1:30 PM, Alex Williamson wrote:
>>>>>> On Wed, 13 Aug 2025 10:08:19 -0700
>>>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>>>>      
>>>>>>> For zPCI devices we should drive a platform specific function reset
>>>>>>> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
>>>>>>> in error state.
>>>>>>>
>>>>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>>>>> ---
>>>>>>>     arch/s390/pci/pci.c              |  1 +
>>>>>>>     drivers/vfio/pci/vfio_pci_core.c |  4 ++++
>>>>>>>     drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
>>>>>>>     drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
>>>>>>>     4 files changed, 49 insertions(+)
>>> --- snip ---
>>>>>>>     
>>>>>>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
>>>>>>> +{
>>>>>>> +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
>>>>>>> +	int rc = -EIO;
>>>>>>> +
>>>>>>> +	if (!zdev)
>>>>>>> +		return -ENODEV;
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * If we can't get the zdev->state_lock the device state is
>>>>>>> +	 * currently undergoing a transition and we bail out - just
>>>>>>> +	 * the same as if the device's state is not configured at all.
>>>>>>> +	 */
>>>>>>> +	if (!mutex_trylock(&zdev->state_lock))
>>>>>>> +		return rc;
>>>>>>> +
>>>>>>> +	/* We can reset only if the function is configured */
>>>>>>> +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
>>>>>>> +		goto out;
>>>>>>> +
>>>>>>> +	rc = zpci_hot_reset_device(zdev);
>>>>>>> +	if (rc != 0)
>>>>>>> +		goto out;
>>>>>>> +
>>>>>>> +	if (!vdev->pci_saved_state) {
>>>>>>> +		pci_err(vdev->pdev, "No saved available for the device");
>>>>>>> +		rc = -EIO;
>>>>>>> +		goto out;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	pci_dev_lock(vdev->pdev);
>>>>>>> +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
>>>>>>> +	pci_restore_state(vdev->pdev);
>>>>>>> +	pci_dev_unlock(vdev->pdev);
>>>>>>> +out:
>>>>>>> +	mutex_unlock(&zdev->state_lock);
>>>>>>> +	return rc;
>>>>>>> +}
>>>>>> This looks like it should be a device or arch specific reset
>>>>>> implemented in drivers/pci, not vfio.  Thanks,
>>>>>>
>>>>>> Alex
>>>>> Are you suggesting to move this to an arch specific function? One thing
>>>>> we need to do after the zpci_hot_reset_device, is to correctly restore
>>>>> the config space of the device. And for vfio-pci bound devices we want
>>>>> to restore the state of the device to when it was initially opened.
>>>> We generally rely on the abstraction of pci_reset_function() to select
>>>> the correct type of reset for a function scope reset.  We've gone to
>>>> quite a bit of effort to implement all device specific resets and
>>>> quirks in the PCI core to be re-used across the kernel.
>>>>
>>>> Calling zpci_hot_reset_device() directly seems contradictory to those
>>>> efforts.  Should pci_reset_function() call this universally on s390x
>>>> rather than providing access to FLR/PM/SBR reset?
>>>>   
>>> I agree with you Alex. Still trying to figure out what's needed for
>>> this. We already do zpci_hot_reset_device() in reset_slot() from the
>>> s390_pci_hpc.c hotplug slot driver and that does get called via
>>> pci_reset_hotplug_slot() and pci_reset_function(). There are a few
>>> problems though that meant it didn't work for Farhan but I agree maybe
>>> we can fix them for the general case. For one pci_reset_function()
>>> via DEVICE_RESET first tries FLR but that won't work with the device in
>>> the error state and MMIO blocked. Sadly __pci_reset_function_locked()
>>> then concludes that other resets also won't work. So that's something
>>> we might want to improve in general, for example maybe we need
>>> something more like pci_dev_acpi_reset() with higher priority than FLR.
>> Yeah I did think of adding something like s390x CLP reset as part of the
>> reset methods. AFAIU the s390x CLP reset is similar to ACPI _RST. But
>> that would introduce s390x specific code in pci core common code.
>>
>>> Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm not
>>> sure why that won't work as is. @Farhan do you know?
>> VFIO_DEVICE_PCI_HOT_RESET would have been sufficient interface for
>> majority of PCI devices on s390x as that would drive a bus reset. It was
>> sufficient as most devices were single bus devices. But in the latest
>> generation of machines (z17) we expose true SR-IOV and an OS can have
>> access to both PF and VFs and so these are on the same bus and can have
>> different ownership based on what is bound to vfio-pci.
>>
>> My thinking for extending VFIO_DEVICE_RESET is because AFAIU its a per
>> function reset mechanism, which maps well with what our architecture
>> provides. On s390x we can drive a per function reset (via firmware)
>> through the CLP instruction driven by the zpci_hot_reset_device(). And
>> doing it as vfio zpci specific function would confine the s390x logic.
>>
>>>>    Why is it
>>>> universally correct here given the ioctl previously made use of
>>>> standard reset mechanisms?
>>>>
>>>> The DEVICE_RESET ioctl is simply an in-place reset of the device,
>>>> without restoring the original device state.  So we're also subtly
>>>> changing that behavior here, presumably because we're targeting the
>>>> specific error recovery case.  Have you considered how this might
>>>> break non-error-recovery use cases?
>>>>
>>>> I wonder if we want a different reset mechanism for this use case
>>>> rather than these subtle semantic changes.
>>> I think an alternative to that, which Farhan actually had in the
>>> previous internal version, is to implement
>>> pci_error_handlers::reset_done() and do the pci_load_saved_state()
>>> there. That would only affect the error recovery case leaving other
>>> cases alone.
>>>
>>>
>>> Thanks,
>>> Niklas
>> The reason I abandoned reset_done() callback idea is because its not
>> sufficient to recover the device correctly. Today before driving a reset
>> we save the state of the device. When a device is in error state, any
>> pci load/store (on s390x they are actual instructions :)) to config
>> space would return an error value (0xffffffff). We don't have any checks
>> in pci_save_state to prevent storing error values. And after a reset
>> when we try to restore the config space (pci_dev_restore) we try to
>> write the error value and this can be problematic. By the time the
>> reset_done() callback is invoked, its already too late.
> It's too late because we've re-written the error value back to config
> space and as a result the device is broken?
>
>
Yes, exactly.

>   What if
> pci_restore_state() were a little smarter to detect that it has bad
> read data from pci_save_state() and only restores state based on kernel
> data?  Would that leave the device in a functional state that
> reset_done() could restore the original saved state and push it out to
> the device?

Yeah I think this could work. I can try something like this.

>> @Alex,
>> I am open to ideas/suggestions on this. Do we think we need a separate
>> VFIO ioctl to drive this or a new reset mechanism as Niklas suggested?
> Unfortunately I was short sighted on VFIO_DEVICE_RESET and it's the one
> ioctl that doesn't have any flags, so it's not very extensible.
>
> Can we do more of the above, ie. enlighten the FLR/PM reset callbacks to
> return -ENOTTY if the device is in an error state and config space is
> returning -1 such that we fall through to a slot reset that doesn't
> care how broken the device is and you auto-magically get the zpci
> function you want?  Follow-up with pushing the original state in
> reset_done()?  Thanks,
>
> Alex
>
Yeah I can do that. I think adding some validation checks to the FLR/PM 
callbacks wouldn't be a bad idea if its acceptable for PCI maintainers.

If you are okay with a reset_done() callback, I will try to incorporate 
your suggestions and spin a new series.

Thanks
Farhan



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 6/6] vfio: Allow error notification and recovery for ISM device
  2025-08-14 21:02     ` Farhan Ali
@ 2025-08-15 20:48       ` Alex Williamson
  2025-08-15 21:36         ` Farhan Ali
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Williamson @ 2025-08-15 20:48 UTC (permalink / raw)
  To: Farhan Ali
  Cc: Bjorn Helgaas, linux-s390, kvm, linux-kernel, schnelle, mjrosato

On Thu, 14 Aug 2025 14:02:05 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 8/14/2025 1:48 PM, Bjorn Helgaas wrote:
> > On Wed, Aug 13, 2025 at 10:08:20AM -0700, Farhan Ali wrote:  
> >> VFIO allows error recovery and notification for devices that
> >> are PCIe (and thus AER) capable. But for PCI devices on IBM
> >> s390 error recovery involves platform firmware and
> >> notification to operating system is done by architecture
> >> specific way. The Internal Shared Memory(ISM) device is a legacy
> >> PCI device (so not PCIe capable), but can still be recovered
> >> when notified of an error.  
> > "PCIe (and thus AER) capable" reads as though AER is required for all
> > PCIe devices, but AER is optional.
> >
> > I don't know the details of VFIO and why it tests for PCIe instead of
> > AER.  Maybe AER is not relevant here and you don't need to mention
> > AER above at all?  
> 
> The original change that introduced this commit dad9f89 "VFIO-AER: 
> Vfio-pci driver changes for supporting AER" was adding the support for 
> AER for vfio. My assumption is the author thought if the device is AER 
> capable the pcie check should be sufficient? I can remove the AER 
> references in commit message. Thanks Farhan

I've looked back through discussions when this went in and can't find
any specific reasoning about why we chose pci_is_pcie() here.  Maybe
we were trying to avoid setting up an error signal on devices that
cannot have AER, but then why didn't we check specifically for AER.
Maybe some version used PCIe specific calls in the handler that we
didn't want to check runtime, but I don't spot such a dependency now.

Possibly we should just remove the check.  We're configuring the error
signaling on the vast majority of devices, it's extremely rare that it
fires anyway, reporting it on a device where it cannot trigger seems
relatively negligible and avoids extra ugly code.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1 6/6] vfio: Allow error notification and recovery for ISM device
  2025-08-15 20:48       ` Alex Williamson
@ 2025-08-15 21:36         ` Farhan Ali
  0 siblings, 0 replies; 27+ messages in thread
From: Farhan Ali @ 2025-08-15 21:36 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Bjorn Helgaas, linux-s390, kvm, linux-kernel, schnelle, mjrosato


On 8/15/2025 1:48 PM, Alex Williamson wrote:
> On Thu, 14 Aug 2025 14:02:05 -0700
> Farhan Ali <alifm@linux.ibm.com> wrote:
>
>> On 8/14/2025 1:48 PM, Bjorn Helgaas wrote:
>>> On Wed, Aug 13, 2025 at 10:08:20AM -0700, Farhan Ali wrote:
>>>> VFIO allows error recovery and notification for devices that
>>>> are PCIe (and thus AER) capable. But for PCI devices on IBM
>>>> s390 error recovery involves platform firmware and
>>>> notification to operating system is done by architecture
>>>> specific way. The Internal Shared Memory(ISM) device is a legacy
>>>> PCI device (so not PCIe capable), but can still be recovered
>>>> when notified of an error.
>>> "PCIe (and thus AER) capable" reads as though AER is required for all
>>> PCIe devices, but AER is optional.
>>>
>>> I don't know the details of VFIO and why it tests for PCIe instead of
>>> AER.  Maybe AER is not relevant here and you don't need to mention
>>> AER above at all?
>> The original change that introduced this commit dad9f89 "VFIO-AER:
>> Vfio-pci driver changes for supporting AER" was adding the support for
>> AER for vfio. My assumption is the author thought if the device is AER
>> capable the pcie check should be sufficient? I can remove the AER
>> references in commit message. Thanks Farhan
> I've looked back through discussions when this went in and can't find
> any specific reasoning about why we chose pci_is_pcie() here.  Maybe
> we were trying to avoid setting up an error signal on devices that
> cannot have AER, but then why didn't we check specifically for AER.
> Maybe some version used PCIe specific calls in the handler that we
> didn't want to check runtime, but I don't spot such a dependency now.
>
> Possibly we should just remove the check.  We're configuring the error
> signaling on the vast majority of devices, it's extremely rare that it
> fires anyway, reporting it on a device where it cannot trigger seems
> relatively negligible and avoids extra ugly code.  Thanks,
>
> Alex

Okay will remove the check and re-word the commit message.

Thanks
Farhan


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2025-08-15 21:36 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-13 17:08 [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x Farhan Ali
2025-08-13 17:08 ` [PATCH v1 1/6] s390/pci: Restore airq unconditionally for the zPCI device Farhan Ali
2025-08-14 11:32   ` Niklas Schnelle
2025-08-14 16:42     ` Farhan Ali
2025-08-13 17:08 ` [PATCH v1 2/6] s390/pci: Update the logic for detecting passthrough device Farhan Ali
2025-08-13 17:08 ` [PATCH v1 3/6] s390/pci: Store PCI error information for passthrough devices Farhan Ali
2025-08-13 17:08 ` [PATCH v1 4/6] vfio-pci/zdev: Setup a zpci memory region for error information Farhan Ali
2025-08-13 20:30   ` Alex Williamson
2025-08-13 21:25     ` Farhan Ali
2025-08-13 21:42       ` Alex Williamson
2025-08-13 17:08 ` [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI Farhan Ali
2025-08-13 20:30   ` Alex Williamson
2025-08-13 21:52     ` Farhan Ali
2025-08-13 22:56       ` Alex Williamson
2025-08-14 13:12         ` Niklas Schnelle
2025-08-14 16:33           ` Farhan Ali
2025-08-14 19:55             ` Niklas Schnelle
2025-08-14 20:57             ` Alex Williamson
2025-08-14 22:33               ` Farhan Ali
2025-08-14  5:22   ` kernel test robot
2025-08-14  7:42   ` kernel test robot
2025-08-13 17:08 ` [PATCH v1 6/6] vfio: Allow error notification and recovery for ISM device Farhan Ali
2025-08-14 20:48   ` Bjorn Helgaas
2025-08-14 21:02     ` Farhan Ali
2025-08-15 20:48       ` Alex Williamson
2025-08-15 21:36         ` Farhan Ali
2025-08-13 17:45 ` [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x Farhan Ali

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).