* [PATCH v5 0/4] vfio-pci/zdev: Improved zPCI Function Measurement Support
From: Omar Elghoul @ 2026-06-26 17:55 UTC
  To: linux-s390, linux-kernel, kvm
  Cc: oelghoul, hca, gor, agordeev, borntraeger, svens, schnelle,
	mjrosato, alifm, farman, gbayer, alex

Hi,

This patch series improves support for function measurement for zPCI
passthrough devices on s390x.

Changelog
=========
v4 -> v5:
* Fix a typo in the cover letter
* Swap the ordering of patches 3/4 and 4/4 to ease merging (i.e., to
  ensure the three s390 patches are ordered before the VFIO patch)
* Patch 2/4:
  - Drop the refactor of zpci_fmb_enable_device() and the separation of
    zpci_fmb_clear_iommu_ctrs() and zpci_fmb_do_enable()
  - Allocate a new buffer in zpci_fmb_reenable_device() rather than
    reusing the same buffer to avoid firmware edge cases

* Patch 3/4 (previously 4/4):
  - Avoid reading from userspace while holding kzdev_lock unnecessarily

* Patch 4/4 (previously 3/4):
  - Drop allowing usercopy of the FMB when initializing the kmem_cache
  - Avoid copying to userspace while holding fmb_lock unnecessarily
    - Restore the FMB bounce buffer to achieve this
  - Clarify uAPI documentation and ensure it accurately describes the
    behavior of the VFIO features

v3 -> v4:
* Patch 2/4:
  - Replace mutex_lock/unlock in zpci_reenable_device() with a guard

* Patch 3/4:
  - Allow usercopy of the FMB when initializing its kmem_cache
  - Move the guard in vfio_pci_zdev_feature_fmb_enable() lower to only
    protect the FMB
  - Ensure vfio_pci_zdev_feature_fmb_enable() fails on double-enable for
    consistency with the documentation
  - Eliminate the bounce buffer in vfio_pci_zdev_feature_fmb_read()
  - Replace the void pointer with __aligned_u64 in the FMB read uAPI
    structure

v2 -> v3:
* Patch 1/4 (new patch):
  - Fix race conditions in pcibios_enable/disable_device() with regard to
    the FMB enable/disable
  - Assert that fmb_lock is held within zpci_fmb_enable_device() and
    zpci_fmb_disable_device()

* Patch 2/4 (previously 1/3):
  - Move the FMB enable logic into a static function zpci_fmb_do_enable()
    to reduce code duplication between zpci_fmb_enable_device() and
    zpci_fmb_reenable_device()
  - Reword commit message to use the imperative voice more consistently

* Patch 3/4 (previously 2/3):
  - Split the previous VFIO feature into a SET-only and a GET-only feature
    for enabling/disabling and reading the FMB respectively
  - Remove FMB definitions from the VFIO uAPI and instead treat it as an
    opaque structure

* Patch 4/4 (previously 3/3):
  - Clarify goto label name to reduce misunderstandings

v1 -> v2:
* Patch 1/3:
  - Address a possible race condition in zpci_reenable_device() caused by
    calling zpci_fmb_reenable_device() without holding fmb_lock
  - Assert that fmb_lock is held within zpci_fmb_reenable_device()

* Patch 3/3:
  - Address a possible race condition in pci_perf_seq_write() caused by
    consuming zdev->kzdev without holding kzdev_lock

Motivation
==========
The firmware on s390x machines allows for tracking a variety of statistics
relating to zPCI devices in a function measurement block (FMB). However,
the kernel currently lacks a structured mechanism for sharing this
information with userspace, beyond /sys/kernel/debug/pci/ID/statistics.
This can lead to shortcomings when running a guest on KVM with PCI
passthrough devices, as QEMU is unable to provide an accurate FMB snapshot
to the guest.

Proposal
========
We propose adding a new VFIO device feature to zPCI passthrough devices,
allowing userspace programs to read the latest FMB snapshot as it is
written by the firmware. We ensure that function measurement enablement is
preserved across device resets on the host. Furthermore, we guard against
host tampering with the FMB via debugfs while the zPCI device is in
passthrough, to protect the VM's state.
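
As a rough illustration of how a userspace program might build requests for
the two features, here is a sketch under stated assumptions. The feature
numbers 13 and 14 and the payload layouts are taken from patch 4/4 of this
series and exist in no released header. The SK_* macros and sk_* structures
are local, hypothetical mirrors of VFIO_DEVICE_FEATURE_GET/SET and of the
fixed part of struct vfio_device_feature; real code would include
<linux/vfio.h> instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_FEATURE_GET		(1u << 16)	/* mirrors VFIO_DEVICE_FEATURE_GET */
#define SK_FEATURE_SET		(1u << 17)	/* mirrors VFIO_DEVICE_FEATURE_SET */
#define SK_ZPCI_FMB_ENABLE	13u		/* value proposed in patch 4/4 */
#define SK_ZPCI_FMB_READ	14u		/* value proposed in patch 4/4 */

/* Mirrors the fixed part of struct vfio_device_feature. */
struct sk_feature_hdr {
	uint32_t argsz;
	uint32_t flags;
};

/* Header followed directly by the payload proposed in patch 4/4. */
struct sk_fmb_enable_req {
	struct sk_feature_hdr hdr;
	uint8_t enabled;
};

struct sk_fmb_read_req {
	struct sk_feature_hdr hdr;
	uint64_t data;	/* user pointer to at least fmb_length bytes */
};

static struct sk_fmb_enable_req sk_make_enable(int on)
{
	struct sk_fmb_enable_req req;

	memset(&req, 0, sizeof(req));
	req.hdr.argsz = sizeof(req);	/* covers header plus payload */
	req.hdr.flags = SK_FEATURE_SET | SK_ZPCI_FMB_ENABLE;
	req.enabled = on ? 1 : 0;
	return req;
}

static struct sk_fmb_read_req sk_make_read(void *buf)
{
	struct sk_fmb_read_req req;

	memset(&req, 0, sizeof(req));
	req.hdr.argsz = sizeof(req);
	req.hdr.flags = SK_FEATURE_GET | SK_ZPCI_FMB_READ;
	req.data = (uint64_t)(uintptr_t)buf;
	return req;
}
```

A caller would then pass a pointer to the request to
ioctl(device_fd, VFIO_DEVICE_FEATURE, &req). According to the uAPI comments
in patch 4/4, a read fails with errno set to ENOMSG while the FMB is not
enabled.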

I'd appreciate some feedback on these patches.

Thanks in advance.

Omar Elghoul (4):
  s390/pci: Hold fmb_lock when enabling or disabling PCI devices
  s390/pci: Preserve FMB state in device re-enablement
  s390/pci: Fence FMB enable/disable via debugfs for passthrough devices
  vfio-pci/zdev: Add VFIO FMB device features

 arch/s390/include/asm/pci.h      |  1 +
 arch/s390/pci/pci.c              | 42 +++++++++++++++++++++-
 arch/s390/pci/pci_debug.c        |  9 +++++
 drivers/vfio/pci/vfio_pci_core.c |  4 +++
 drivers/vfio/pci/vfio_pci_priv.h | 18 ++++++++++
 drivers/vfio/pci/vfio_pci_zdev.c | 60 ++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h        | 29 +++++++++++++++
 7 files changed, 162 insertions(+), 1 deletion(-)

-- 
2.54.0



* [PATCH v5 1/4] s390/pci: Hold fmb_lock when enabling or disabling PCI devices
From: Omar Elghoul @ 2026-06-26 17:55 UTC
  To: linux-s390, linux-kernel, kvm
  Cc: oelghoul, hca, gor, agordeev, borntraeger, svens, schnelle,
	mjrosato, alifm, farman, gbayer, alex, stable

Ensure that fmb_lock is held by pcibios_enable_device() and
pcibios_disable_device() when calling zpci_fmb_enable_device() or
zpci_fmb_disable_device(), respectively. Additionally, assert that the
fmb_lock is held within the latter two functions to prevent future race
conditions regarding new callers.
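
The pattern described above (the caller takes the lock, the callee asserts
that it is held) can be sketched in plain userspace C. This is an
illustrative model only: struct sk_zdev and the sk_* helpers are
hypothetical, a boolean flag stands in for both the mutex and lockdep's
bookkeeping, and assert() plays the role of lockdep_assert_held(). Unlike
pcibios_enable_device() in the patch, the caller here returns the callee's
result so that it can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct zpci_dev. */
struct sk_zdev {
	bool fmb_lock_held;	/* models fmb_lock plus its "held" state */
	bool fmb_enabled;
};

static void sk_fmb_lock(struct sk_zdev *z)
{
	assert(!z->fmb_lock_held);
	z->fmb_lock_held = true;
}

static void sk_fmb_unlock(struct sk_zdev *z)
{
	assert(z->fmb_lock_held);
	z->fmb_lock_held = false;
}

/* Callee: states its locking precondition instead of taking the lock. */
static int sk_fmb_enable(struct sk_zdev *z)
{
	assert(z->fmb_lock_held);	/* role of lockdep_assert_held() */
	if (z->fmb_enabled)
		return -EINVAL;
	z->fmb_enabled = true;
	return 0;
}

/* Caller: brackets the call with the lock. */
static int sk_enable_device(struct sk_zdev *z)
{
	int rc;

	sk_fmb_lock(z);
	rc = sk_fmb_enable(z);
	sk_fmb_unlock(z);
	return rc;
}
```

A new caller that forgets the lock trips the assertion on its first call,
which is the property the lockdep annotations are meant to provide.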

Fixes: af0a8a8453f7 ("s390/pci: implement pcibios_add_device")
Fixes: 944239c59e93 ("s390/pci: implement pcibios_release_device")
Cc: stable@vger.kernel.org
Signed-off-by: Omar Elghoul <oelghoul@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 arch/s390/pci/pci.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 39bd2adfc240..2910d4038d39 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -173,6 +173,8 @@ int zpci_fmb_enable_device(struct zpci_dev *zdev)
 	unsigned long flags;
 	u8 cc, status;
 
+	lockdep_assert_held(&zdev->fmb_lock);
+
 	if (zdev->fmb || sizeof(*zdev->fmb) < zdev->fmb_length)
 		return -EINVAL;
 
@@ -211,6 +213,8 @@ int zpci_fmb_disable_device(struct zpci_dev *zdev)
 	struct zpci_fib fib = {0};
 	u8 cc, status;
 
+	lockdep_assert_held(&zdev->fmb_lock);
+
 	if (!zdev->fmb)
 		return -EINVAL;
 
@@ -639,7 +643,9 @@ int pcibios_enable_device(struct pci_dev *pdev, int mask)
 	struct zpci_dev *zdev = to_zpci(pdev);
 
 	zpci_debug_init_device(zdev, dev_name(&pdev->dev));
+	mutex_lock(&zdev->fmb_lock);
 	zpci_fmb_enable_device(zdev);
+	mutex_unlock(&zdev->fmb_lock);
 
 	return pci_enable_resources(pdev, mask);
 }
@@ -648,7 +654,9 @@ void pcibios_disable_device(struct pci_dev *pdev)
 {
 	struct zpci_dev *zdev = to_zpci(pdev);
 
+	mutex_lock(&zdev->fmb_lock);
 	zpci_fmb_disable_device(zdev);
+	mutex_unlock(&zdev->fmb_lock);
 	zpci_debug_exit_device(zdev);
 }
 
-- 
2.54.0



* [PATCH v5 2/4] s390/pci: Preserve FMB state in device re-enablement
From: Omar Elghoul @ 2026-06-26 17:55 UTC
  To: linux-s390, linux-kernel, kvm
  Cc: oelghoul, hca, gor, agordeev, borntraeger, svens, schnelle,
	mjrosato, alifm, farman, gbayer, alex

Introduce a function zpci_fmb_reenable_device() that checks the state of
the FMB and ensures it is enabled. Reset the counters to zero, disable, and
re-enable the FMB if it was already enabled. Call this function during a
zPCI device re-enablement, which in turn implicitly ensures that the FMB is
enabled for host devices during their KVM registration.
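
The control flow described above can be modeled in userspace as follows.
This is a sketch under assumptions, not the kernel code: struct sk_zdev,
struct sk_fmb and the sk_* functions are hypothetical, calloc()/free() stand
in for the kmem_cache, and the fw_fail_disable field is a test hook that
simulates a failing firmware call.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct sk_fmb {
	unsigned long counters[4];
};

/* Hypothetical stand-in for struct zpci_dev. */
struct sk_zdev {
	struct sk_fmb *fmb;	/* NULL while measurement is disabled */
	int fw_fail_disable;	/* test hook: simulated firmware failure */
};

static int sk_fmb_enable(struct sk_zdev *z)
{
	if (z->fmb)
		return -EINVAL;
	z->fmb = calloc(1, sizeof(*z->fmb));	/* fresh, zeroed buffer */
	return z->fmb ? 0 : -ENOMEM;
}

static int sk_fmb_reenable(struct sk_zdev *z)
{
	/* Not enabled yet: a plain enable is all that is needed. */
	if (!z->fmb)
		return sk_fmb_enable(z);

	/* Disable first; on failure the old buffer stays in place. */
	if (z->fw_fail_disable)
		return -EIO;

	/* Drop the old buffer, then enable again with a new one. */
	free(z->fmb);
	z->fmb = NULL;
	return sk_fmb_enable(z);
}
```

Whether the allocator hands back the same address or a different one, the
counters observed after a successful re-enable start from zero in this
model.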

Signed-off-by: Omar Elghoul <oelghoul@linux.ibm.com>
---
 arch/s390/include/asm/pci.h |  1 +
 arch/s390/pci/pci.c         | 34 +++++++++++++++++++++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 5dcf35f0f325..65014e52d559 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -323,6 +323,7 @@ void zpci_remove_parent_msi_domain(struct zpci_bus *zbus);
 /* FMB */
 int zpci_fmb_enable_device(struct zpci_dev *);
 int zpci_fmb_disable_device(struct zpci_dev *);
+int zpci_fmb_reenable_device(struct zpci_dev *zdev);
 
 /* Debug */
 int zpci_debug_init(void);
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 2910d4038d39..1eb6aa772eb3 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -231,6 +231,34 @@ int zpci_fmb_disable_device(struct zpci_dev *zdev)
 	}
 	return cc ? -EIO : 0;
 }
+EXPORT_SYMBOL_GPL(zpci_fmb_disable_device);
+
+int zpci_fmb_reenable_device(struct zpci_dev *zdev)
+{
+	u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_SET_MEASURE);
+	struct zpci_fib fib = {0};
+	u8 cc, status;
+
+	lockdep_assert_held(&zdev->fmb_lock);
+
+	if (!zdev->fmb)
+		return zpci_fmb_enable_device(zdev);
+
+	fib.gd = zdev->gisa;
+	cc = zpci_mod_fc(req, &fib, &status); /* Disable function measurement */
+
+	/* Unlike in zpci_fmb_disable_device(), cc == 3 is not a valid state here
+	 * because we are re-enabling function measurement for the same function
+	 * handle.
+	 */
+	if (cc)
+		return -EIO;
+
+	kmem_cache_free(zdev_fmb_cache, zdev->fmb);
+	zdev->fmb = NULL;
+	return zpci_fmb_enable_device(zdev);
+}
+EXPORT_SYMBOL_GPL(zpci_fmb_reenable_device);
 
 static int zpci_cfg_load(struct zpci_dev *zdev, int offset, u32 *val, u8 len)
 {
@@ -737,9 +765,13 @@ int zpci_reenable_device(struct zpci_dev *zdev)
 	}
 
 	rc = zpci_iommu_register_ioat(zdev, &status);
-	if (rc)
+	if (rc) {
 		zpci_disable_device(zdev);
+		return rc;
+	}
 
+	guard(mutex)(&zdev->fmb_lock);
+	zpci_fmb_reenable_device(zdev);
 	return rc;
 }
 EXPORT_SYMBOL_GPL(zpci_reenable_device);
-- 
2.54.0



* [PATCH v5 3/4] s390/pci: Fence FMB enable/disable via debugfs for passthrough devices
From: Omar Elghoul @ 2026-06-26 17:55 UTC
  To: linux-s390, linux-kernel, kvm
  Cc: oelghoul, hca, gor, agordeev, borntraeger, svens, schnelle,
	mjrosato, alifm, farman, gbayer, alex

Fence enabling or disabling the FMB via debugfs while the zPCI device is
associated with a KVM. This prevents processes on the host from tampering
with the FMB while the guest is still using it. Such tampering could cause
partial counter resets and inconsistent reads that have no parallel in the
architecture.

For VFIO devices that are not associated with a KVM (i.e., for userspace
drivers other than QEMU), this fence does not take effect.
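
The fence can be pictured with the following userspace model of the debugfs
write handler. It is a sketch only: struct sk_zdev and sk_perf_write() are
hypothetical, boolean flags stand in for kzdev_lock and fmb_lock, and the
enable/disable helper is reduced to toggling a flag. The shape to note is
the lock order (kzdev_lock outside, fmb_lock inside) and the early exit with
-EPERM through a single unlock label.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct zpci_dev. */
struct sk_zdev {
	bool kzdev_locked;	/* models kzdev_lock */
	bool fmb_locked;	/* models fmb_lock */
	void *kzdev;		/* non-NULL while associated with a KVM */
	bool fmb_enabled;
};

static int sk_fmb_set(struct sk_zdev *z, bool on)
{
	assert(z->fmb_locked);
	if (z->fmb_enabled == on)
		return -EINVAL;
	z->fmb_enabled = on;
	return 0;
}

/* Models the write handler: val 0 disables, val 1 enables. */
static long sk_perf_write(struct sk_zdev *z, unsigned long val, long count)
{
	int rc = 0;

	z->kzdev_locked = true;
	if (z->kzdev) {
		rc = -EPERM;	/* device is fenced while passed through */
		goto out_unlock_kzdev;
	}

	z->fmb_locked = true;
	switch (val) {
	case 0:
		rc = sk_fmb_set(z, false);
		break;
	case 1:
		rc = sk_fmb_set(z, true);
		break;
	}
	z->fmb_locked = false;

out_unlock_kzdev:
	z->kzdev_locked = false;
	return rc ? rc : count;
}
```

Holding the outer lock across the whole operation means the KVM association
cannot change between the check and the FMB update.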

Signed-off-by: Omar Elghoul <oelghoul@linux.ibm.com>
---
 arch/s390/pci/pci_debug.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/s390/pci/pci_debug.c b/arch/s390/pci/pci_debug.c
index c7ed7bf254b5..23eb7e72c870 100644
--- a/arch/s390/pci/pci_debug.c
+++ b/arch/s390/pci/pci_debug.c
@@ -153,6 +153,12 @@ static ssize_t pci_perf_seq_write(struct file *file, const char __user *ubuf,
 	if (rc)
 		return rc;
 
+	mutex_lock(&zdev->kzdev_lock);
+	if (zdev->kzdev) {
+		rc = -EPERM;
+		goto out_unlock_kzdev;
+	}
+
 	mutex_lock(&zdev->fmb_lock);
 	switch (val) {
 	case 0:
@@ -163,6 +169,9 @@ static ssize_t pci_perf_seq_write(struct file *file, const char __user *ubuf,
 		break;
 	}
 	mutex_unlock(&zdev->fmb_lock);
+
+out_unlock_kzdev:
+	mutex_unlock(&zdev->kzdev_lock);
 	return rc ? rc : count;
 }
 
-- 
2.54.0



* [PATCH v5 4/4] vfio-pci/zdev: Add VFIO FMB device features
From: Omar Elghoul @ 2026-06-26 17:55 UTC
  To: linux-s390, linux-kernel, kvm
  Cc: oelghoul, hca, gor, agordeev, borntraeger, svens, schnelle,
	mjrosato, alifm, farman, gbayer, alex

Introduce new VFIO features for zPCI devices to provide FMB passthrough to
userspace.

Allow the user to enable or disable the FMB using the SET-only feature
VFIO_DEVICE_FEATURE_ZPCI_FMB_ENABLE. Likewise, allow the user to read the
latest FMB using the GET-only feature VFIO_DEVICE_FEATURE_ZPCI_FMB_READ
while the FMB is enabled.

Signed-off-by: Omar Elghoul <oelghoul@linux.ibm.com>
---
 drivers/vfio/pci/vfio_pci_core.c |  4 +++
 drivers/vfio/pci/vfio_pci_priv.h | 18 ++++++++++
 drivers/vfio/pci/vfio_pci_zdev.c | 60 ++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h        | 29 +++++++++++++++
 4 files changed, 111 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a28f1e99362c..a4b0717ba8d6 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1572,6 +1572,10 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 		return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
 	case VFIO_DEVICE_FEATURE_DMA_BUF:
 		return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
+	case VFIO_DEVICE_FEATURE_ZPCI_FMB_ENABLE:
+		return vfio_pci_zdev_feature_fmb_enable(vdev, flags, arg, argsz);
+	case VFIO_DEVICE_FEATURE_ZPCI_FMB_READ:
+		return vfio_pci_zdev_feature_fmb_read(vdev, flags, arg, argsz);
 	default:
 		return -ENOTTY;
 	}
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index fca9d0dfac90..b7db064a6a95 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -93,6 +93,10 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
 				struct vfio_info_cap *caps);
 int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
 void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
+int vfio_pci_zdev_feature_fmb_enable(struct vfio_pci_core_device *vdev, u32 flags,
+				     void __user *arg, size_t argsz);
+int vfio_pci_zdev_feature_fmb_read(struct vfio_pci_core_device *vdev, u32 flags,
+				   void __user *arg, size_t argsz);
 #else
 static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
 					      struct vfio_info_cap *caps)
@@ -107,6 +111,20 @@ static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
 
 static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
 {}
+
+static inline int vfio_pci_zdev_feature_fmb_enable(struct vfio_pci_core_device *vdev,
+						   u32 flags, void __user *arg,
+						   size_t argsz)
+{
+	return -ENOTTY;
+}
+
+static inline int vfio_pci_zdev_feature_fmb_read(struct vfio_pci_core_device *vdev,
+						 u32 flags, void __user *arg,
+						 size_t argsz)
+{
+	return -ENOTTY;
+}
 #endif
 
 static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index 0990fdb146b7..1bd359ad6e4a 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -167,3 +167,63 @@ void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
 	if (zpci_kvm_hook.kvm_unregister)
 		zpci_kvm_hook.kvm_unregister(zdev);
 }
+
+int vfio_pci_zdev_feature_fmb_enable(struct vfio_pci_core_device *vdev, u32 flags,
+				     void __user *arg, size_t argsz)
+{
+	struct zpci_dev *zdev;
+	struct vfio_device_feature_zpci_fmb_enable fmb_enable;
+	int ret;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, sizeof(fmb_enable));
+	if (ret != 1)
+		return ret;
+
+	zdev = to_zpci(vdev->pdev);
+	if (!zdev)
+		return -ENODEV;
+
+	if (copy_from_user(&fmb_enable, arg, sizeof(fmb_enable)))
+		return -EFAULT;
+
+	guard(mutex)(&zdev->fmb_lock);
+
+	if (fmb_enable.enabled)
+		return zpci_fmb_reenable_device(zdev);
+	return zpci_fmb_disable_device(zdev);
+}
+
+int vfio_pci_zdev_feature_fmb_read(struct vfio_pci_core_device *vdev, u32 flags,
+				   void __user *arg, size_t argsz)
+{
+	struct zpci_dev *zdev;
+	struct vfio_device_feature_zpci_fmb_read fmb_read;
+	struct zpci_fmb fmb_bounce;
+	int ret;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET, sizeof(fmb_read));
+	if (ret != 1)
+		return ret;
+
+	zdev = to_zpci(vdev->pdev);
+	if (!zdev)
+		return -ENODEV;
+
+	if (copy_from_user(&fmb_read, arg, sizeof(fmb_read)))
+		return -EFAULT;
+	if (!fmb_read.data)
+		return -EINVAL;
+
+	mutex_lock(&zdev->fmb_lock);
+	if (!zdev->fmb) {
+		mutex_unlock(&zdev->fmb_lock);
+		return -ENOMSG;
+	}
+
+	memcpy(&fmb_bounce, zdev->fmb, zdev->fmb_length);
+	mutex_unlock(&zdev->fmb_lock);
+
+	if (copy_to_user(u64_to_user_ptr(fmb_read.data), &fmb_bounce, zdev->fmb_length))
+		return -EFAULT;
+	return 0;
+}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5de618a3a5ee..2b1b66eeef12 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1534,6 +1534,35 @@ struct vfio_device_feature_dma_buf {
  */
 #define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2  12
 
+/**
+ * Upon VFIO_DEVICE_FEATURE_SET, enable or disable FMB for the VFIO zPCI device.
+ *
+ * enabled is treated as a bool, so any non-zero value evaluates to true. Nested
+ * enabling resets the FMB counters to zero, and nested disabling fails.
+ *
+ * Returns: 0 on success, -1 and errno set appropriately on error.
+ */
+#define VFIO_DEVICE_FEATURE_ZPCI_FMB_ENABLE 13
+
+struct vfio_device_feature_zpci_fmb_enable {
+	__u8 enabled;
+};
+
+/**
+ * Upon VFIO_DEVICE_FEATURE_GET, provide FMB passthrough for VFIO zPCI devices.
+ *
+ * The user-provided buffer must be at least fmb_length large, where fmb_length
+ * is reported in VFIO_DEVICE_INFO_CAP_ZPCI_BASE.
+ *
+ * Returns: 0 on success, -1 and errno set appropriately on error. errno==ENOMSG
+ * when the FMB is not enabled.
+ */
+#define VFIO_DEVICE_FEATURE_ZPCI_FMB_READ 14
+
+struct vfio_device_feature_zpci_fmb_read {
+	__aligned_u64 data;
+};
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
-- 
2.54.0



* Re: [PATCH v5 2/4] s390/pci: Preserve FMB state in device re-enablement
From: sashiko-bot @ 2026-06-26 18:06 UTC
  To: Omar Elghoul
  Cc: Vasily Gorbik, kvm, linux-s390, Heiko Carstens, Alexander Gordeev

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:

Pre-existing issues:
- [High] This problem wasn't introduced by this patch, but there is a missing resource cleanup in `zpci_reenable_device()` on the `zpci_set_irq()` error path. If `zpci_set_irq()` fails, the function returns immediately without calling `zpci_disable_device()`, unlike the subsequent IOMMU registration error path. This leaves the device enabled in an inconsistent state.
--

commit 94bfdcea02e7e7dca9c322a0dc40fac61a9fb275
Author: Omar Elghoul <oelghoul@linux.ibm.com>

s390/pci: Preserve FMB state in device re-enablement

This commit introduces zpci_fmb_reenable_device() to check the FMB state and
ensure it is enabled. It resets the counters and handles re-enabling if the
FMB was already active.

> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index 2910d4038d39c..1eb6aa772eb3f 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
[ ... ]
> @@ -737,9 +765,13 @@ int zpci_reenable_device(struct zpci_dev *zdev)
>  	}
>  
>  	rc = zpci_iommu_register_ioat(zdev, &status);
> -	if (rc)
> +	if (rc) {
>  		zpci_disable_device(zdev);
> +		return rc;
> +	}

[Severity: High]
This isn't a bug introduced by this patch, but there is a pre-existing issue
with resource cleanup in zpci_reenable_device().

If zpci_set_irq() fails earlier in this function:

        if (zdev->msi_nr_irqs > 0) {
                rc = zpci_set_irq(zdev);
                if (rc)
                        return rc;
        }

It returns immediately without calling zpci_disable_device(), unlike the
subsequent zpci_iommu_register_ioat() error path. Does this leave the device
enabled in the firmware but failed in Linux, rendering it unusable and
leaking its enabled state?

>  
> +	guard(mutex)(&zdev->fmb_lock);
> +	zpci_fmb_reenable_device(zdev);
>  	return rc;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260626175525.37370-1-oelghoul@linux.ibm.com?part=2


* Re: [PATCH v5 1/4] s390/pci: Hold fmb_lock when enabling or disabling PCI devices
From: sashiko-bot @ 2026-06-26 18:12 UTC
  To: Omar Elghoul
  Cc: Vasily Gorbik, Alexander Gordeev, linux-s390, Heiko Carstens, kvm

Thank you for your contribution! Sashiko AI review found 2 potential issues to consider:

Pre-existing issues:
- [High] Missing cleanup in pcibios_enable_device() on pci_enable_resources() failure leads to FMB memory leak and Use-After-Free.
- [Medium] Race condition between pcibios_disable_device() and pci_perf_seq_write() allows FMB to be re-enabled on a disabled device, causing a memory leak.
--

commit 2e2fca3d1e089622d5f93c6cb6a886a708c7c22d
Author: Omar Elghoul <oelghoul@linux.ibm.com>

s390/pci: Hold fmb_lock when enabling or disabling PCI devices

This commit ensures fmb_lock is held in pcibios_enable_device() and
pcibios_disable_device() before calling the zpci fmb enable/disable
functions. It also adds lockdep assertions to prevent future races.

Cc: stable@vger.kernel.org

> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index 39bd2adfc240f..2910d4038d39c 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c

[ ... ]

> @@ -639,7 +643,9 @@ int pcibios_enable_device(struct pci_dev *pdev, int mask)
>  	struct zpci_dev *zdev = to_zpci(pdev);
>  
>  	zpci_debug_init_device(zdev, dev_name(&pdev->dev));
> +	mutex_lock(&zdev->fmb_lock);
>  	zpci_fmb_enable_device(zdev);
> +	mutex_unlock(&zdev->fmb_lock);
>  
>  	return pci_enable_resources(pdev, mask);
>  }

[Severity: High]
This is a pre-existing issue, but does this lack of cleanup on failure lead
to a use-after-free?

If pci_enable_resources() fails, the function returns an error without rolling
back the debugfs initialization or the FMB allocations. Because the PCI core
considers the device disabled, pcibios_disable_device() is never called to
clean these up.

When the device is eventually released, pcibios_release_device() will drop
the zdev references without destroying the debugfs entries:

arch/s390/pci/pci.c:pcibios_release_device() {
    ...
    zpci_unmap_resources(pdev);
    zpci_zdev_put(zdev);
}

Can a subsequent userspace access to the orphaned debugfs files dereference
the freed zdev pointer and trigger a use-after-free?

> @@ -648,7 +654,9 @@ void pcibios_disable_device(struct pci_dev *pdev)
>  {
>  	struct zpci_dev *zdev = to_zpci(pdev);
>  
> +	mutex_lock(&zdev->fmb_lock);
>  	zpci_fmb_disable_device(zdev);
> +	mutex_unlock(&zdev->fmb_lock);
>  	zpci_debug_exit_device(zdev);
>  }

[Severity: Medium]
This isn't a bug introduced by this patch, but does unlocking fmb_lock before
zpci_debug_exit_device() create a race condition with debugfs writes?

A concurrent write to the debugfs statistics file could potentially execute in
the window between mutex_unlock() and zpci_debug_exit_device() removing the
files:

arch/s390/pci/pci_debug.c:pci_perf_seq_write() {
    ...
    mutex_lock(&zdev->fmb_lock);
    switch (val) {
    ...
    case 1:
        rc = zpci_fmb_enable_device(zdev);
        break;
    }
    mutex_unlock(&zdev->fmb_lock);
}

If pci_perf_seq_write() acquires the lock in this window, could it successfully
re-enable the FMB in hardware right before zpci_debug_exit_device() destroys
the debug entries, leaving the FMB enabled but permanently leaking the memory
upon device release?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260626175525.37370-1-oelghoul@linux.ibm.com?part=1


* Re: [PATCH v5 2/4] s390/pci: Preserve FMB state in device re-enablement
From: Niklas Schnelle @ 2026-07-01  8:34 UTC
  To: Omar Elghoul, linux-s390, linux-kernel, kvm
  Cc: hca, gor, agordeev, borntraeger, svens, mjrosato, alifm, farman,
	gbayer, alex

On Fri, 2026-06-26 at 13:55 -0400, Omar Elghoul wrote:
> Introduce a function zpci_fmb_reenable_device() that checks the state of
> the FMB and ensures it is enabled. Reset the counters to zero, disable, and
> re-enable the FMB if it was already enabled. Call this function during a
> zPCI device re-enablement, which in turn implicitly ensures that the FMB is
> enabled for host devices during their KVM registration.
> 
> Signed-off-by: Omar Elghoul <oelghoul@linux.ibm.com>
> ---

Just to keep the list up to date: we're still discussing some details
about this internally, mostly how this may interact with platform
behavior in some edge cases that are likely not possible with current
implementations but would be covered by the architecture.

Personally, I actually liked reusing the buffer as in v4 better.
Depending on the kmem_cache implementation, specifically on whether it
may hand back the buffer we just freed in the immediately following
allocation, this could end up reusing the same buffer anyway. Hope to
get the details sorted soon.

Thanks,
Niklas

