* [PATCH v1 1/3] s390/pci: Preserve FMB state in device re-enablement
2026-05-01 19:25 [PATCH v1 0/3] vfio-pci/zdev: Improved zPCI Function Measurement Support Omar Elghoul
@ 2026-05-01 19:25 ` Omar Elghoul
2026-05-01 19:25 ` [PATCH v1 2/3] vfio-pci/zdev: Add VFIO FMB device feature Omar Elghoul
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Omar Elghoul @ 2026-05-01 19:25 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: oelghoul, hca, gor, agordeev, borntraeger, svens, schnelle,
mjrosato, alifm, farman, gbayer, alex
Introduce zpci_fmb_reenable_device(), which checks whether an FMB is
already allocated for the device and, if so, reuses the same buffer. If the
FMB was not previously enabled, the function enables it for the device.
Call it during zPCI device re-enablement, which in turn implicitly ensures
that the FMB is enabled for host devices during their KVM registration.

The function also clears the software counters, so that a program resetting
an FMB sees all of its counters restart from zero, as expected. The code
that clears the software counters is factored out into a static helper,
zpci_fmb_clear_iommu_ctrs(), because it is now used by both
zpci_fmb_enable_device() and zpci_fmb_reenable_device().
Signed-off-by: Omar Elghoul <oelghoul@linux.ibm.com>
---
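Note (illustrative only, not part of the patch): the control flow described
in the commit message can be modeled in plain userspace C. Everything in
this sketch is a hypothetical stand-in. struct fake_zdev, fake_set_measure()
and fake_fmb_reenable() only mimic the decision logic of
zpci_fmb_reenable_device(); the firmware request is replaced by a stub that
always succeeds, and the buffer allocation is replaced by a buffer supplied
by the caller.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the struct zpci_dev fields used here. */
struct fake_zdev {
	void *fmb;              /* NULL means function measurement is off */
	uint64_t mapped_pages;  /* stands in for the software (IOMMU) counters */
	bool fw_measuring;      /* what the stubbed "firmware" currently believes */
};

/*
 * Stub for the Set Measure request: a NULL address disables measurement,
 * a non-NULL address enables it. Returns a condition code, 0 on success.
 * This stub never fails.
 */
static int fake_set_measure(struct fake_zdev *zdev, void *fmb_addr)
{
	zdev->fw_measuring = (fmb_addr != NULL);
	return 0;
}

/* Model of the re-enable flow; fresh_buf is only used on the first enable. */
static int fake_fmb_reenable(struct fake_zdev *zdev, void *fresh_buf)
{
	if (!zdev->fmb) {
		/* Not enabled before: plain enable with a new buffer. */
		zdev->mapped_pages = 0;
		if (fake_set_measure(zdev, fresh_buf))
			return -EIO;
		zdev->fmb = fresh_buf;
		return 0;
	}

	/* Already enabled: disable, reset software counters, re-enable. */
	if (fake_set_measure(zdev, NULL))
		return -EIO;
	zdev->mapped_pages = 0;
	if (fake_set_measure(zdev, zdev->fmb)) {
		zdev->fmb = NULL;	/* measurement stays off after a failed re-enable */
		return -EIO;
	}
	return 0;
}
```

In this model, a second call keeps the buffer from the first call and only
resets the counter, which is the behavior the commit message describes for
an already-enabled FMB.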
arch/s390/include/asm/pci.h | 1 +
arch/s390/pci/pci.c | 71 ++++++++++++++++++++++++++++++-------
2 files changed, 59 insertions(+), 13 deletions(-)
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 5dcf35f0f325..65014e52d559 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -323,6 +323,7 @@ void zpci_remove_parent_msi_domain(struct zpci_bus *zbus);
/* FMB */
int zpci_fmb_enable_device(struct zpci_dev *);
int zpci_fmb_disable_device(struct zpci_dev *);
+int zpci_fmb_reenable_device(struct zpci_dev *zdev);
/* Debug */
int zpci_debug_init(void);
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 39bd2adfc240..9bc38e041130 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -164,6 +164,24 @@ int zpci_unregister_ioat(struct zpci_dev *zdev, u8 dmaas)
return cc;
}
+static void zpci_fmb_clear_iommu_ctrs(struct zpci_dev *zdev)
+{
+ struct zpci_iommu_ctrs *ctrs;
+ unsigned long flags = 0;
+
+ /* reset software counters */
+ spin_lock_irqsave(&zdev->dom_lock, flags);
+ ctrs = zpci_get_iommu_ctrs(zdev);
+ if (ctrs) {
+ atomic64_set(&ctrs->mapped_pages, 0);
+ atomic64_set(&ctrs->unmapped_pages, 0);
+ atomic64_set(&ctrs->global_rpcits, 0);
+ atomic64_set(&ctrs->sync_map_rpcits, 0);
+ atomic64_set(&ctrs->sync_rpcits, 0);
+ }
+ spin_unlock_irqrestore(&zdev->dom_lock, flags);
+}
+
/* Modify PCI: Set PCI function measurement parameters */
int zpci_fmb_enable_device(struct zpci_dev *zdev)
{
@@ -181,18 +199,7 @@ int zpci_fmb_enable_device(struct zpci_dev *zdev)
return -ENOMEM;
WARN_ON((u64) zdev->fmb & 0xf);
- /* reset software counters */
- spin_lock_irqsave(&zdev->dom_lock, flags);
- ctrs = zpci_get_iommu_ctrs(zdev);
- if (ctrs) {
- atomic64_set(&ctrs->mapped_pages, 0);
- atomic64_set(&ctrs->unmapped_pages, 0);
- atomic64_set(&ctrs->global_rpcits, 0);
- atomic64_set(&ctrs->sync_map_rpcits, 0);
- atomic64_set(&ctrs->sync_rpcits, 0);
- }
- spin_unlock_irqrestore(&zdev->dom_lock, flags);
-
+ zpci_fmb_clear_iommu_ctrs(zdev);
fib.fmb_addr = virt_to_phys(zdev->fmb);
fib.gd = zdev->gisa;
@@ -227,6 +234,41 @@ int zpci_fmb_disable_device(struct zpci_dev *zdev)
}
return cc ? -EIO : 0;
}
+EXPORT_SYMBOL_GPL(zpci_fmb_disable_device);
+
+int zpci_fmb_reenable_device(struct zpci_dev *zdev)
+{
+ u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_SET_MEASURE);
+ struct zpci_iommu_ctrs *ctrs;
+ struct zpci_fib fib = {0};
+ unsigned long flags;
+ u8 cc, status;
+
+ if (!zdev->fmb)
+ return zpci_fmb_enable_device(zdev);
+
+ fib.gd = zdev->gisa;
+ cc = zpci_mod_fc(req, &fib, &status); /* Disable function measurement */
+
+ /* Unlike in zpci_fmb_disable_device(), cc == 3 is not a valid state here
+ * because we are re-enabling function measurement for the same function
+ * handle.
+ */
+ if (cc)
+ return -EIO;
+
+ zpci_fmb_clear_iommu_ctrs(zdev);
+
+ fib.fmb_addr = virt_to_phys(zdev->fmb);
+ cc = zpci_mod_fc(req, &fib, &status); /* Re-enable function measurement */
+ if (cc) {
+ kmem_cache_free(zdev_fmb_cache, zdev->fmb);
+ zdev->fmb = NULL;
+ return -EIO;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(zpci_fmb_reenable_device);
static int zpci_cfg_load(struct zpci_dev *zdev, int offset, u32 *val, u8 len)
{
@@ -729,9 +771,12 @@ int zpci_reenable_device(struct zpci_dev *zdev)
}
rc = zpci_iommu_register_ioat(zdev, &status);
- if (rc)
+ if (rc) {
zpci_disable_device(zdev);
+ return rc;
+ }
+ zpci_fmb_reenable_device(zdev);
return rc;
}
EXPORT_SYMBOL_GPL(zpci_reenable_device);
--
2.52.0
* [PATCH v1 2/3] vfio-pci/zdev: Add VFIO FMB device feature
From: Omar Elghoul @ 2026-05-01 19:25 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: oelghoul, hca, gor, agordeev, borntraeger, svens, schnelle,
mjrosato, alifm, farman, gbayer, alex
Add a new VFIO device feature, VFIO_DEVICE_FEATURE_ZPCI_FMB, that shares
the latest FMB snapshot of a zPCI device with userspace. The feature
supports the same four FMB formats (0 through 3) that the kernel already
supports.

With VFIO_DEVICE_FEATURE_GET, the userspace driver can read the latest FMB
snapshot and query whether the FMB is currently enabled on the function;
the snapshot is valid only while the FMB is enabled. With
VFIO_DEVICE_FEATURE_SET, the userspace driver can enable or disable the
FMB.
Signed-off-by: Omar Elghoul <oelghoul@linux.ibm.com>
---
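Note (illustrative only, not part of the patch): a userspace driver might
call the feature roughly as sketched below. The feature number, flag and
payload structure are copied from this patch and are not in released kernel
headers. The helper names zpci_fmb_feature(), zpci_fmb_get(),
zpci_fmb_set_enabled() and zpci_fmb_snapshot_valid() are hypothetical, and
device_fd is assumed to be an already opened VFIO device file descriptor.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/vfio.h>

/* Copied from this patch; guarded in case the headers already provide it. */
#ifndef VFIO_DEVICE_FEATURE_ZPCI_FMB
#define VFIO_DEVICE_FEATURE_ZPCI_FMB 13
#define VFIO_DEVICE_FEATURE_ZPCI_FMB_FLAGS_ENABLED 0x1

struct vfio_device_feature_zpci_fmb {
	__u64 flags;
	__u32 format: 8;
	__u32 fmt_ind: 24;
	__u32 samples;
	__u64 last_update;
	__u64 ld_ops;
	__u64 st_ops;
	__u64 stb_ops;
	__u64 rpcit_ops;
	union {
		struct { __u64 dma_rbytes, dma_wbytes; } fmt0;
		struct { __u64 rx_bytes, rx_packets, tx_bytes, tx_packets; } fmt1;
		struct { __u64 consumed_work_units, max_work_units; } fmt2;
		struct { __u64 tx_bytes; } fmt3;
	};
	__u64 reserved[16];
};
#endif

/* Hypothetical helper: issue VFIO_DEVICE_FEATURE with an FMB payload. */
static int zpci_fmb_feature(int device_fd, __u32 op,
			    struct vfio_device_feature_zpci_fmb *fmb)
{
	/* argsz covers the generic feature header plus the payload. */
	size_t argsz = sizeof(struct vfio_device_feature) + sizeof(*fmb);
	struct vfio_device_feature *feat = calloc(1, argsz);
	int rc = 0;

	if (!feat)
		return -ENOMEM;
	feat->argsz = argsz;
	feat->flags = op | VFIO_DEVICE_FEATURE_ZPCI_FMB;
	memcpy(feat->data, fmb, sizeof(*fmb));
	if (ioctl(device_fd, VFIO_DEVICE_FEATURE, feat) < 0)
		rc = -errno;
	else if (op == VFIO_DEVICE_FEATURE_GET)
		memcpy(fmb, feat->data, sizeof(*fmb));
	free(feat);
	return rc;
}

/* Read the latest snapshot; returns 0 or a negative errno value. */
static int zpci_fmb_get(int device_fd, struct vfio_device_feature_zpci_fmb *out)
{
	/* Zeroed first, so ENABLED reads as 0 unless the kernel sets it. */
	memset(out, 0, sizeof(*out));
	return zpci_fmb_feature(device_fd, VFIO_DEVICE_FEATURE_GET, out);
}

/* Enable or disable the FMB; only the flags field is read on SET. */
static int zpci_fmb_set_enabled(int device_fd, int enable)
{
	struct vfio_device_feature_zpci_fmb fmb;

	memset(&fmb, 0, sizeof(fmb));	/* reserved fields must be zero */
	if (enable)
		fmb.flags = VFIO_DEVICE_FEATURE_ZPCI_FMB_FLAGS_ENABLED;
	return zpci_fmb_feature(device_fd, VFIO_DEVICE_FEATURE_SET, &fmb);
}

/* The counters are meaningful only while the ENABLED flag is set. */
static int zpci_fmb_snapshot_valid(const struct vfio_device_feature_zpci_fmb *fmb)
{
	return (fmb->flags & VFIO_DEVICE_FEATURE_ZPCI_FMB_FLAGS_ENABLED) != 0;
}
```

A caller would check zpci_fmb_snapshot_valid() after a successful
zpci_fmb_get() and only then pick the union member that matches the format
field.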
drivers/vfio/pci/vfio_pci_core.c | 2 +
drivers/vfio/pci/vfio_pci_priv.h | 9 ++++
drivers/vfio/pci/vfio_pci_zdev.c | 77 ++++++++++++++++++++++++++++++++
include/uapi/linux/vfio.h | 43 ++++++++++++++++++
4 files changed, 131 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 3f8d093aacf8..63e80b6fa0dc 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1534,6 +1534,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_DMA_BUF:
return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
+ case VFIO_DEVICE_FEATURE_ZPCI_FMB:
+ return vfio_pci_zdev_feature_fmb(vdev, flags, arg, argsz);
default:
return -ENOTTY;
}
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index fca9d0dfac90..208e05942b48 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -93,6 +93,8 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
struct vfio_info_cap *caps);
int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
+int vfio_pci_zdev_feature_fmb(struct vfio_pci_core_device *vdev, u32 flags,
+ void __user *arg, size_t argsz);
#else
static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
struct vfio_info_cap *caps)
@@ -107,6 +109,13 @@ static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
{}
+
+static inline int vfio_pci_zdev_feature_fmb(struct vfio_pci_core_device *vdev,
+ u32 flags, void __user *arg,
+ size_t argsz)
+{
+ return -ENOTTY;
+}
#endif
static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index 0990fdb146b7..1e9efe2bee69 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -167,3 +167,80 @@ void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
if (zpci_kvm_hook.kvm_unregister)
zpci_kvm_hook.kvm_unregister(zdev);
}
+
+int vfio_pci_zdev_feature_fmb(struct vfio_pci_core_device *vdev, u32 flags,
+ void __user *arg, size_t argsz)
+{
+ struct zpci_dev *zdev;
+ struct vfio_device_feature_zpci_fmb fmb = {0};
+ u32 ops = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_SET;
+ int ret;
+
+ ret = vfio_check_feature(flags, argsz, ops, sizeof(fmb));
+ if (ret != 1)
+ return ret;
+
+ zdev = to_zpci(vdev->pdev);
+ if (!zdev)
+ return -ENODEV;
+
+ mutex_lock(&zdev->fmb_lock);
+ if (flags & VFIO_DEVICE_FEATURE_SET) {
+ if (copy_from_user(&fmb, arg, sizeof(fmb))) {
+ ret = -EFAULT;
+ goto release_lock;
+ }
+
+ if (fmb.flags & VFIO_DEVICE_FEATURE_ZPCI_FMB_FLAGS_ENABLED)
+ ret = zpci_fmb_reenable_device(zdev);
+ else
+ ret = zpci_fmb_disable_device(zdev);
+ goto release_lock;
+ }
+
+ ret = 0;
+ if (zdev->fmb) {
+ fmb.flags |= VFIO_DEVICE_FEATURE_ZPCI_FMB_FLAGS_ENABLED;
+ } else {
+ fmb.flags &= ~VFIO_DEVICE_FEATURE_ZPCI_FMB_FLAGS_ENABLED;
+ goto release_lock;
+ }
+
+ fmb.format = zdev->fmb->format;
+ fmb.fmt_ind = zdev->fmb->fmt_ind;
+ fmb.samples = zdev->fmb->samples;
+ fmb.last_update = zdev->fmb->last_update;
+ fmb.ld_ops = zdev->fmb->ld_ops;
+ fmb.st_ops = zdev->fmb->st_ops;
+ fmb.stb_ops = zdev->fmb->stb_ops;
+ fmb.rpcit_ops = zdev->fmb->rpcit_ops;
+
+ switch (zdev->fmb->format) {
+ case 0:
+ if (zdev->fmb->fmt_ind & ZPCI_FMB_DMA_COUNTER_VALID) {
+ fmb.fmt0.dma_rbytes = zdev->fmb->fmt0.dma_rbytes;
+ fmb.fmt0.dma_wbytes = zdev->fmb->fmt0.dma_wbytes;
+ }
+ break;
+ case 1:
+ fmb.fmt1.rx_bytes = zdev->fmb->fmt1.rx_bytes;
+ fmb.fmt1.rx_packets = zdev->fmb->fmt1.rx_packets;
+ fmb.fmt1.tx_bytes = zdev->fmb->fmt1.tx_bytes;
+ fmb.fmt1.tx_packets = zdev->fmb->fmt1.tx_packets;
+ break;
+ case 2:
+ fmb.fmt2.consumed_work_units = zdev->fmb->fmt2.consumed_work_units;
+ fmb.fmt2.max_work_units = zdev->fmb->fmt2.max_work_units;
+ break;
+ case 3:
+ fmb.fmt3.tx_bytes = zdev->fmb->fmt3.tx_bytes;
+ break;
+ }
+
+ if (copy_to_user(arg, &fmb, sizeof(fmb)))
+ ret = -EFAULT;
+
+release_lock:
+ mutex_unlock(&zdev->fmb_lock);
+ return ret;
+}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5de618a3a5ee..6cbc34ff063e 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1534,6 +1534,49 @@ struct vfio_device_feature_dma_buf {
*/
#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
+/**
+ * Upon VFIO_DEVICE_FEATURE_GET, provide FMB passthrough for VFIO zPCI devices.
+ *
+ * Upon VFIO_DEVICE_FEATURE_SET, only the flags field is read while the
+ * remainder of the structure is ignored. This allows the driver to enable or
+ * disable the FMB while also leaving reserved bits for future flag expansion.
+ * All reserved fields should be zero for future compatibility.
+ */
+#define VFIO_DEVICE_FEATURE_ZPCI_FMB 13
+#define VFIO_DEVICE_FEATURE_ZPCI_FMB_FLAGS_ENABLED 0x1
+
+struct vfio_device_feature_zpci_fmb {
+ __u64 flags;
+ __u32 format: 8;
+ __u32 fmt_ind: 24;
+ __u32 samples;
+ __u64 last_update;
+ __u64 ld_ops;
+ __u64 st_ops;
+ __u64 stb_ops;
+ __u64 rpcit_ops;
+ union {
+ struct {
+ __u64 dma_rbytes;
+ __u64 dma_wbytes;
+ } fmt0;
+ struct {
+ __u64 rx_bytes;
+ __u64 rx_packets;
+ __u64 tx_bytes;
+ __u64 tx_packets;
+ } fmt1;
+ struct {
+ __u64 consumed_work_units;
+ __u64 max_work_units;
+ } fmt2;
+ struct {
+ __u64 tx_bytes;
+ } fmt3;
+ };
+ __u64 reserved[16];
+};
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.52.0
* [PATCH v1 3/3] s390/pci: Fence FMB enable/disable via sysfs for passthrough devices
From: Omar Elghoul @ 2026-05-01 19:25 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: oelghoul, hca, gor, agordeev, borntraeger, svens, schnelle,
mjrosato, alifm, farman, gbayer, alex
Fence off enabling or disabling the FMB via sysfs while the zPCI device is
associated with a KVM guest. This lets a KVM guest use FMB passthrough and
avoids the edge case where the host disables the FMB while the guest is
still using it, which could cause partial counter resets and inconsistent
reads that have no parallel in the architecture.

With this patch, the userspace driver, most likely QEMU, can still enable
or disable the FMB through the VFIO device feature introduced in the
previous patch. This keeps control of the FMB with the VM state and
isolates it from other processes on the host.

For VFIO devices that are not associated with a KVM guest (i.e., for
userspace drivers other than QEMU), the fence does not take effect.
Signed-off-by: Omar Elghoul <oelghoul@linux.ibm.com>
---
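Note (illustrative only, not part of the patch): from the host side, the
fence would show up as a failed write to the statistics file. The sketch
below uses a hypothetical helper, zpci_stats_set(), and assumes that
writing "1" enables and "0" disables the FMB. The diff only shows that the
value is parsed as a decimal number, so that mapping is an assumption. The
path is supplied by the caller, for example
/sys/kernel/debug/pci/ID/statistics.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "1" or "0" to a statistics file.
 * Returns 0 on success or a negative errno value. With this patch applied,
 * -EPERM is the expected result while the device is associated with a
 * KVM guest.
 */
static int zpci_stats_set(const char *path, int enable)
{
	const char *val = enable ? "1" : "0";
	int fd, rc = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	if (write(fd, val, 1) < 0)
		rc = -errno;
	close(fd);
	return rc;
}
```

A management tool could treat -EPERM from this helper as "the FMB is owned
by a guest" and leave the device alone, rather than reporting an error.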
arch/s390/pci/pci_debug.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/s390/pci/pci_debug.c b/arch/s390/pci/pci_debug.c
index c7ed7bf254b5..b28923395d03 100644
--- a/arch/s390/pci/pci_debug.c
+++ b/arch/s390/pci/pci_debug.c
@@ -149,6 +149,9 @@ static ssize_t pci_perf_seq_write(struct file *file, const char __user *ubuf,
if (!zdev)
return 0;
+ if (zdev->kzdev)
+ return -EPERM;
+
rc = kstrtoul_from_user(ubuf, count, 10, &val);
if (rc)
return rc;
--
2.52.0
* Re: [PATCH v1 0/3] vfio-pci/zdev: Improved zPCI Function Measurement Support
From: Omar Elghoul @ 2026-05-01 20:17 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: hca, gor, agordeev, borntraeger, svens, schnelle, mjrosato, alifm,
farman, gbayer, alex, oelghoul
As an example of usage, the QEMU RFC linked below proposes using the
feature introduced in this patch series:
https://lore.kernel.org/all/20260501200026.22784-1-oelghoul@linux.ibm.com/
On 5/1/26 3:25 PM, Omar Elghoul wrote:
> Hi,
>
> This patch series improves support for function measurement for zPCI
> passthrough devices on s390x.
>
> Motivation
> ==========
> The firmware on s390x machines tracks a variety of statistics relating to
> zPCI devices in a function measurement block (FMB). However, the kernel
> currently lacks a structured mechanism for sharing this information with
> userspace, beyond /sys/kernel/debug/pci/ID/statistics. This is a
> shortcoming when running a KVM guest with PCI passthrough devices, because
> QEMU is unable to provide an accurate FMB snapshot to the guest.
>
> Proposal
> ========
> We propose adding a new VFIO device feature to zPCI passthrough devices,
> allowing userspace programs to read the latest FMB snapshot as it is
> written by the firmware. We ensure that function measurement enablement is
> preserved across device resets on the host. Furthermore, we guard against
> host tampering with the FMB via sysfs when the zPCI device is in
> passthrough to protect the VM's state.
>
> I'd appreciate some feedback on these patches.
>
> Thanks in advance.
>
> Omar Elghoul (3):
> s390/pci: Preserve FMB state in device re-enablement
> vfio-pci/zdev: Add VFIO FMB device feature
> s390/pci: Fence FMB enable/disable via sysfs for passthrough devices
>
> arch/s390/include/asm/pci.h | 1 +
> arch/s390/pci/pci.c | 71 +++++++++++++++++++++++------
> arch/s390/pci/pci_debug.c | 3 ++
> drivers/vfio/pci/vfio_pci_core.c | 2 +
> drivers/vfio/pci/vfio_pci_priv.h | 9 ++++
> drivers/vfio/pci/vfio_pci_zdev.c | 77 ++++++++++++++++++++++++++++++++
> include/uapi/linux/vfio.h | 43 ++++++++++++++++++
> 7 files changed, 193 insertions(+), 13 deletions(-)
>