* [PATCH v4 1/4] s390/pci: Hold fmb_lock when enabling or disabling PCI devices
2026-06-12 18:10 [PATCH v4 0/4] vfio-pci/zdev: Improved zPCI Function Measurement Support Omar Elghoul
@ 2026-06-12 18:10 ` Omar Elghoul
2026-06-12 18:28 ` sashiko-bot
2026-06-12 18:10 ` [PATCH v4 2/4] s390/pci: Preserve FMB state in device re-enablement Omar Elghoul
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Omar Elghoul @ 2026-06-12 18:10 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: oelghoul, hca, gor, agordeev, borntraeger, svens, schnelle,
mjrosato, alifm, farman, gbayer, alex, stable
Ensure that fmb_lock is held by pcibios_enable_device() and
pcibios_disable_device() when calling zpci_fmb_enable_device() or
zpci_fmb_disable_device(), respectively. Additionally, assert that the
fmb_lock is held within the latter two functions to prevent future race
conditions regarding new callers.
Fixes: af0a8a8453f7 ("s390/pci: implement pcibios_add_device")
Fixes: 944239c59e93 ("s390/pci: implement pcibios_release_device")
Cc: stable@vger.kernel.org
Signed-off-by: Omar Elghoul <oelghoul@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
arch/s390/pci/pci.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 39bd2adfc240..2910d4038d39 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -173,6 +173,8 @@ int zpci_fmb_enable_device(struct zpci_dev *zdev)
unsigned long flags;
u8 cc, status;
+ lockdep_assert_held(&zdev->fmb_lock);
+
if (zdev->fmb || sizeof(*zdev->fmb) < zdev->fmb_length)
return -EINVAL;
@@ -211,6 +213,8 @@ int zpci_fmb_disable_device(struct zpci_dev *zdev)
struct zpci_fib fib = {0};
u8 cc, status;
+ lockdep_assert_held(&zdev->fmb_lock);
+
if (!zdev->fmb)
return -EINVAL;
@@ -639,7 +643,9 @@ int pcibios_enable_device(struct pci_dev *pdev, int mask)
struct zpci_dev *zdev = to_zpci(pdev);
zpci_debug_init_device(zdev, dev_name(&pdev->dev));
+ mutex_lock(&zdev->fmb_lock);
zpci_fmb_enable_device(zdev);
+ mutex_unlock(&zdev->fmb_lock);
return pci_enable_resources(pdev, mask);
}
@@ -648,7 +654,9 @@ void pcibios_disable_device(struct pci_dev *pdev)
{
struct zpci_dev *zdev = to_zpci(pdev);
+ mutex_lock(&zdev->fmb_lock);
zpci_fmb_disable_device(zdev);
+ mutex_unlock(&zdev->fmb_lock);
zpci_debug_exit_device(zdev);
}
--
2.54.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH v4 1/4] s390/pci: Hold fmb_lock when enabling or disabling PCI devices
2026-06-12 18:10 ` [PATCH v4 1/4] s390/pci: Hold fmb_lock when enabling or disabling PCI devices Omar Elghoul
@ 2026-06-12 18:28 ` sashiko-bot
0 siblings, 0 replies; 9+ messages in thread
From: sashiko-bot @ 2026-06-12 18:28 UTC (permalink / raw)
To: Omar Elghoul
Cc: kvm, Vasily Gorbik, Heiko Carstens, Alexander Gordeev, linux-s390
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
Pre-existing issues:
- [Critical] Missing cleanup in pcibios_enable_device() error path causes memory leak and a critical Use-After-Free via leaked debugfs files.
- [Critical] Dangling debugfs pointers and FMB leak if device is removed while enabled.
- [High] Race condition between pcibios_disable_device() and debugfs write causes a permanent FMB memory leak.
--
commit 073e998db2b23cb7423c91c7e1c6e999dbe5b90a
Author: Omar Elghoul <oelghoul@linux.ibm.com>
s390/pci: Hold fmb_lock when enabling or disabling PCI devices
Ensure that fmb_lock is held by pcibios_enable_device() and
pcibios_disable_device() when calling zpci_fmb_enable_device() or
zpci_fmb_disable_device(), respectively. Additionally, assert that the
fmb_lock is held within the latter two functions to prevent future race
conditions regarding new callers.
> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index 39bd2adfc240f..2910d4038d39c 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
[ ... ]
> @@ -639,7 +643,9 @@ int pcibios_enable_device(struct pci_dev *pdev, int mask)
> struct zpci_dev *zdev = to_zpci(pdev);
>
> zpci_debug_init_device(zdev, dev_name(&pdev->dev));
> + mutex_lock(&zdev->fmb_lock);
> zpci_fmb_enable_device(zdev);
> + mutex_unlock(&zdev->fmb_lock);
>
> return pci_enable_resources(pdev, mask);
[Severity: Critical]
This is a pre-existing issue, but does this leak resources and potentially
cause a use-after-free if pci_enable_resources() fails?
If pci_enable_resources() returns an error, pcibios_enable_device() returns
the error directly without rolling back the debugfs creation or freeing the
FMB. Since pcibios_disable_device() is only invoked by the PCI core if
enablement succeeds, these resources appear to be permanently leaked.
Later, when the device is unregistered, zpci_release_device() frees the zdev
structure via kfree_rcu(). Because the error path bypassed cleanup, the
debugfs files remain active. If userspace then reads or writes to these
orphaned debugfs files, such as in arch/s390/pci/pci_debug.c:pci_perf_show():
struct zpci_dev *zdev = m->private;
if (!zdev)
return 0;
mutex_lock(&zdev->fmb_lock);
won't it dereference the freed zdev pointer?
> }
> @@ -648,7 +654,9 @@ void pcibios_disable_device(struct pci_dev *pdev)
> {
> struct zpci_dev *zdev = to_zpci(pdev);
>
> + mutex_lock(&zdev->fmb_lock);
> zpci_fmb_disable_device(zdev);
> + mutex_unlock(&zdev->fmb_lock);
> zpci_debug_exit_device(zdev);
[Severity: High]
This isn't a bug introduced by this patch, but does releasing the lock
before destroying the debugfs endpoints create a race window?
If a concurrent userspace write to the debugfs statistics file in
arch/s390/pci/pci_debug.c:pci_perf_seq_write() is blocked on fmb_lock:
mutex_lock(&zdev->fmb_lock);
switch (val) {
case 0:
rc = zpci_fmb_disable_device(zdev);
break;
case 1:
rc = zpci_fmb_enable_device(zdev);
break;
}
mutex_unlock(&zdev->fmb_lock);
it will wake up immediately after pcibios_disable_device() drops the lock.
It will then execute zpci_fmb_enable_device(), which allocates a new
zdev->fmb buffer and re-enables hardware statistics.
After pci_perf_seq_write() releases the lock, pcibios_disable_device()
proceeds to call zpci_debug_exit_device() and permanently deletes the
debugfs files.
Could this race leave the hardware FMB silently active and the zdev->fmb
memory permanently leaked because userspace can no longer disable it?
[Severity: Critical]
This is also a pre-existing issue, but what happens if a zPCI device is
hot-removed while its enable_cnt is greater than 0?
If the device is manually enabled via the PCI sysfs enable attribute while
unbound, or if a bound driver fails to call pci_disable_device(), the PCI
subsystem's teardown path (pci_stop_and_remove_bus_device) does not
automatically invoke pci_disable_device().
As a result, pcibios_disable_device() is never executed. When the device is
subsequently unregistered, zpci_release_device() invokes kfree_rcu() to
free the zdev structure:
arch/s390/pci/pci.c:zpci_release_device() {
...
zpci_bus_device_unregister(zdev);
zpci_destroy_iommu(zdev);
zpci_dbg(3, "rem fid:%x\n", zdev->fid);
kfree_rcu(zdev, rcu);
...
}
Because the debugfs teardown step in pcibios_disable_device() was bypassed,
will the active debugfs endpoints remain registered with pointers to the
freed zdev structure, leading to a use-after-free upon access?
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260612181048.91548-1-oelghoul@linux.ibm.com?part=1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v4 2/4] s390/pci: Preserve FMB state in device re-enablement
2026-06-12 18:10 [PATCH v4 0/4] vfio-pci/zdev: Improved zPCI Function Measurement Support Omar Elghoul
2026-06-12 18:10 ` [PATCH v4 1/4] s390/pci: Hold fmb_lock when enabling or disabling PCI devices Omar Elghoul
@ 2026-06-12 18:10 ` Omar Elghoul
2026-06-12 18:26 ` sashiko-bot
2026-06-12 18:10 ` [PATCH v4 3/4] vfio-pci/zdev: Add VFIO FMB device features Omar Elghoul
2026-06-12 18:10 ` [PATCH v4 4/4] s390/pci: Fence FMB enable/disable via sysfs for passthrough devices Omar Elghoul
3 siblings, 1 reply; 9+ messages in thread
From: Omar Elghoul @ 2026-06-12 18:10 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: oelghoul, hca, gor, agordeev, borntraeger, svens, schnelle,
mjrosato, alifm, farman, gbayer, alex
Introduce a function zpci_fmb_reenable_device() that checks for the state
of the FMB and reuses the same buffer where appropriate. If the FMB was not
previously enabled, enable it for the device. Call this function during a
zPCI device re-enablement, which in turn implicitly ensures that the FMB is
is enabled for host devices during their KVM registration.
Besides re-enabling the FMB itself in zpci_fmb_reenable_device() also clear
out the software counters, such that a program resetting an FMB sees all
counters start from zero as expected. Separate this clearing of software
counters out into zpci_fmb_clear_iommu_ctrs() and reuse it in
zpci_fmb_enable_device() and zpci_fmb_reenable_device(). Likewise separate
the FMB enable logic into zpci_fmb_do_enable() to be reused in the same two
functions.
Signed-off-by: Omar Elghoul <oelghoul@linux.ibm.com>
---
arch/s390/include/asm/pci.h | 1 +
arch/s390/pci/pci.c | 95 +++++++++++++++++++++++++++++--------
2 files changed, 77 insertions(+), 19 deletions(-)
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 5dcf35f0f325..65014e52d559 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -323,6 +323,7 @@ void zpci_remove_parent_msi_domain(struct zpci_bus *zbus);
/* FMB */
int zpci_fmb_enable_device(struct zpci_dev *);
int zpci_fmb_disable_device(struct zpci_dev *);
+int zpci_fmb_reenable_device(struct zpci_dev *zdev);
/* Debug */
int zpci_debug_init(void);
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 2910d4038d39..21d3fccac789 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -164,24 +164,10 @@ int zpci_unregister_ioat(struct zpci_dev *zdev, u8 dmaas)
return cc;
}
-/* Modify PCI: Set PCI function measurement parameters */
-int zpci_fmb_enable_device(struct zpci_dev *zdev)
+static void zpci_fmb_clear_iommu_ctrs(struct zpci_dev *zdev)
{
- u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_SET_MEASURE);
struct zpci_iommu_ctrs *ctrs;
- struct zpci_fib fib = {0};
- unsigned long flags;
- u8 cc, status;
-
- lockdep_assert_held(&zdev->fmb_lock);
-
- if (zdev->fmb || sizeof(*zdev->fmb) < zdev->fmb_length)
- return -EINVAL;
-
- zdev->fmb = kmem_cache_zalloc(zdev_fmb_cache, GFP_KERNEL);
- if (!zdev->fmb)
- return -ENOMEM;
- WARN_ON((u64) zdev->fmb & 0xf);
+ unsigned long flags = 0;
/* reset software counters */
spin_lock_irqsave(&zdev->dom_lock, flags);
@@ -194,17 +180,49 @@ int zpci_fmb_enable_device(struct zpci_dev *zdev)
atomic64_set(&ctrs->sync_rpcits, 0);
}
spin_unlock_irqrestore(&zdev->dom_lock, flags);
+}
+static int zpci_fmb_do_enable(struct zpci_dev *zdev)
+{
+ /* This helper assumes that zdev->fmb is already allocated and thus only
+ * takes care of the actual enablement.
+ */
+ u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_SET_MEASURE);
+ struct zpci_fib fib = {0};
+ u8 cc, status;
fib.fmb_addr = virt_to_phys(zdev->fmb);
fib.gd = zdev->gisa;
cc = zpci_mod_fc(req, &fib, &status);
- if (cc) {
+
+ return cc ? -EIO : 0;
+}
+
+/* Modify PCI: Set PCI function measurement parameters */
+int zpci_fmb_enable_device(struct zpci_dev *zdev)
+{
+ int rc;
+
+ lockdep_assert_held(&zdev->fmb_lock);
+
+ if (zdev->fmb || sizeof(*zdev->fmb) < zdev->fmb_length)
+ return -EINVAL;
+
+ zdev->fmb = kmem_cache_zalloc(zdev_fmb_cache, GFP_KERNEL);
+ if (!zdev->fmb)
+ return -ENOMEM;
+ WARN_ON((u64) zdev->fmb & 0xf);
+
+ zpci_fmb_clear_iommu_ctrs(zdev);
+
+ rc = zpci_fmb_do_enable(zdev);
+ if (rc) {
kmem_cache_free(zdev_fmb_cache, zdev->fmb);
zdev->fmb = NULL;
}
- return cc ? -EIO : 0;
+ return rc;
}
+EXPORT_SYMBOL_GPL(zpci_fmb_enable_device);
/* Modify PCI: Disable PCI function measurement */
int zpci_fmb_disable_device(struct zpci_dev *zdev)
@@ -231,6 +249,41 @@ int zpci_fmb_disable_device(struct zpci_dev *zdev)
}
return cc ? -EIO : 0;
}
+EXPORT_SYMBOL_GPL(zpci_fmb_disable_device);
+
+int zpci_fmb_reenable_device(struct zpci_dev *zdev)
+{
+ u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_SET_MEASURE);
+ struct zpci_fib fib = {0};
+ u8 cc, status;
+ int rc;
+
+ lockdep_assert_held(&zdev->fmb_lock);
+
+ if (!zdev->fmb)
+ return zpci_fmb_enable_device(zdev);
+
+ fib.gd = zdev->gisa;
+ cc = zpci_mod_fc(req, &fib, &status); /* Disable function measurement */
+
+ /* Unlike in zpci_fmb_disable_device(), cc == 3 is not a valid state here
+ * because we are re-enabling function measurement for the same function
+ * handle.
+ */
+ if (cc)
+ return -EIO;
+
+ zpci_fmb_clear_iommu_ctrs(zdev);
+
+ rc = zpci_fmb_do_enable(zdev);
+ if (rc) {
+ kmem_cache_free(zdev_fmb_cache, zdev->fmb);
+ zdev->fmb = NULL;
+ }
+
+ return rc;
+}
+EXPORT_SYMBOL_GPL(zpci_fmb_reenable_device);
static int zpci_cfg_load(struct zpci_dev *zdev, int offset, u32 *val, u8 len)
{
@@ -737,9 +790,13 @@ int zpci_reenable_device(struct zpci_dev *zdev)
}
rc = zpci_iommu_register_ioat(zdev, &status);
- if (rc)
+ if (rc) {
zpci_disable_device(zdev);
+ return rc;
+ }
+ guard(mutex)(&zdev->fmb_lock);
+ zpci_fmb_reenable_device(zdev);
return rc;
}
EXPORT_SYMBOL_GPL(zpci_reenable_device);
--
2.54.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH v4 2/4] s390/pci: Preserve FMB state in device re-enablement
2026-06-12 18:10 ` [PATCH v4 2/4] s390/pci: Preserve FMB state in device re-enablement Omar Elghoul
@ 2026-06-12 18:26 ` sashiko-bot
0 siblings, 0 replies; 9+ messages in thread
From: sashiko-bot @ 2026-06-12 18:26 UTC (permalink / raw)
To: Omar Elghoul
Cc: kvm, linux-s390, Vasily Gorbik, Alexander Gordeev, Heiko Carstens
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] zpci_fmb_reenable_device() reuses the existing FMB hardware buffer without zeroing it, exposing stale hardware counter values until the first periodic hardware update occurs.
--
commit a81b1b0a067d09df54d661ddbb8985e9220d9a88
Author: Omar Elghoul <oelghoul@linux.ibm.com>
s390/pci: Preserve FMB state in device re-enablement
This commit introduces zpci_fmb_reenable_device() to reuse the existing FMB
buffer during device re-enablement instead of allocating a new one. It also
refactors the FMB enablement and software counter clearing logic into shared
helpers.
> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index 2910d4038d39c..21d3fccac7894 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
[ ... ]
> +int zpci_fmb_reenable_device(struct zpci_dev *zdev)
> +{
> + u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_SET_MEASURE);
> + struct zpci_fib fib = {0};
> + u8 cc, status;
> + int rc;
> +
> + lockdep_assert_held(&zdev->fmb_lock);
> +
> + if (!zdev->fmb)
> + return zpci_fmb_enable_device(zdev);
> +
> + fib.gd = zdev->gisa;
> + cc = zpci_mod_fc(req, &fib, &status); /* Disable function measurement */
> +
> + /* Unlike in zpci_fmb_disable_device(), cc == 3 is not a valid state here
> + * because we are re-enabling function measurement for the same function
> + * handle.
> + */
> + if (cc)
> + return -EIO;
> +
> + zpci_fmb_clear_iommu_ctrs(zdev);
> +
[Severity: Medium]
Does this code need to zero the reused zdev->fmb buffer before enabling it?
The commit message states that a program resetting an FMB should see all
counters start from zero. While zpci_fmb_clear_iommu_ctrs() clears the
software-maintained IOMMU counters, the hardware-maintained FMB buffer
(zdev->fmb) is reused here without being cleared.
Could this expose stale hardware counter values to userspace until the
first asynchronous hardware update occurs? The previous behavior in
zpci_fmb_enable_device() allocated a fresh, zero-initialized buffer via
kmem_cache_zalloc().
> + rc = zpci_fmb_do_enable(zdev);
> + if (rc) {
> + kmem_cache_free(zdev_fmb_cache, zdev->fmb);
> + zdev->fmb = NULL;
> + }
> +
> + return rc;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260612181048.91548-1-oelghoul@linux.ibm.com?part=2
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v4 3/4] vfio-pci/zdev: Add VFIO FMB device features
2026-06-12 18:10 [PATCH v4 0/4] vfio-pci/zdev: Improved zPCI Function Measurement Support Omar Elghoul
2026-06-12 18:10 ` [PATCH v4 1/4] s390/pci: Hold fmb_lock when enabling or disabling PCI devices Omar Elghoul
2026-06-12 18:10 ` [PATCH v4 2/4] s390/pci: Preserve FMB state in device re-enablement Omar Elghoul
@ 2026-06-12 18:10 ` Omar Elghoul
2026-06-12 18:28 ` sashiko-bot
2026-06-12 18:10 ` [PATCH v4 4/4] s390/pci: Fence FMB enable/disable via sysfs for passthrough devices Omar Elghoul
3 siblings, 1 reply; 9+ messages in thread
From: Omar Elghoul @ 2026-06-12 18:10 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: oelghoul, hca, gor, agordeev, borntraeger, svens, schnelle,
mjrosato, alifm, farman, gbayer, alex
Introduce new VFIO features for zPCI devices to provide FMB passthrough to
userspace.
Allow the user to enable or disable the FMB using the SET-only feature
VFIO_DEVICE_FEATURE_ZPCI_FMB_ENABLE. Likewise allow the user to read the
latest FMB using the GET-only feature VFIO_DEVICE_FEATURE_ZPCI_FMB_READ
in the case where the FMB is enabled.
Additionally, when initializing the FMB kmem_cache, allow copy to userspace
within the FMB's structure range.
Signed-off-by: Omar Elghoul <oelghoul@linux.ibm.com>
---
arch/s390/pci/pci.c | 8 ++++-
drivers/vfio/pci/vfio_pci_core.c | 4 +++
drivers/vfio/pci/vfio_pci_priv.h | 18 +++++++++++
drivers/vfio/pci/vfio_pci_zdev.c | 55 ++++++++++++++++++++++++++++++++
include/uapi/linux/vfio.h | 29 +++++++++++++++++
5 files changed, 113 insertions(+), 1 deletion(-)
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 21d3fccac789..104a924f8e4b 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -1134,8 +1134,14 @@ static int zpci_mem_init(void)
BUILD_BUG_ON((CONFIG_ILLEGAL_POINTER_VALUE + 0x10000 > ZPCI_IOMAP_ADDR_BASE) &&
(CONFIG_ILLEGAL_POINTER_VALUE <= ZPCI_IOMAP_ADDR_MAX));
+ struct kmem_cache_args fmb_cache_args = {
+ .align = __alignof__(struct zpci_fmb),
+ .useroffset = 0,
+ .usersize = sizeof(struct zpci_fmb)
+ };
+
zdev_fmb_cache = kmem_cache_create("PCI_FMB_cache", sizeof(struct zpci_fmb),
- __alignof__(struct zpci_fmb), 0, NULL);
+ &fmb_cache_args, 0);
if (!zdev_fmb_cache)
goto error_fmb;
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 050e7542952e..44e8e5526eae 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1569,6 +1569,10 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_DMA_BUF:
return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
+ case VFIO_DEVICE_FEATURE_ZPCI_FMB_ENABLE:
+ return vfio_pci_zdev_feature_fmb_enable(vdev, flags, arg, argsz);
+ case VFIO_DEVICE_FEATURE_ZPCI_FMB_READ:
+ return vfio_pci_zdev_feature_fmb_read(vdev, flags, arg, argsz);
default:
return -ENOTTY;
}
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index fca9d0dfac90..b7db064a6a95 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -93,6 +93,10 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
struct vfio_info_cap *caps);
int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
+int vfio_pci_zdev_feature_fmb_enable(struct vfio_pci_core_device *vdev, u32 flags,
+ void __user *arg, size_t argsz);
+int vfio_pci_zdev_feature_fmb_read(struct vfio_pci_core_device *vdev, u32 flags,
+ void __user *arg, size_t argsz);
#else
static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
struct vfio_info_cap *caps)
@@ -107,6 +111,20 @@ static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
{}
+
+static inline int vfio_pci_zdev_feature_fmb_enable(struct vfio_pci_core_device *vdev,
+ u32 flags, void __user *arg,
+ size_t argsz)
+{
+ return -ENOTTY;
+}
+
+static inline int vfio_pci_zdev_feature_fmb_read(struct vfio_pci_core_device *vdev,
+ u32 flags, void __user *arg,
+ size_t argsz)
+{
+ return -ENOTTY;
+}
#endif
static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index 0990fdb146b7..ad1bcaf52ff2 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -167,3 +167,58 @@ void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
if (zpci_kvm_hook.kvm_unregister)
zpci_kvm_hook.kvm_unregister(zdev);
}
+
+int vfio_pci_zdev_feature_fmb_enable(struct vfio_pci_core_device *vdev, u32 flags,
+ void __user *arg, size_t argsz)
+{
+ struct zpci_dev *zdev;
+ struct vfio_device_feature_zpci_fmb_enable fmb_enable;
+ int ret;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, sizeof(fmb_enable));
+ if (ret != 1)
+ return ret;
+
+ zdev = to_zpci(vdev->pdev);
+ if (!zdev)
+ return -ENODEV;
+
+ if (copy_from_user(&fmb_enable, arg, sizeof(fmb_enable)))
+ return -EFAULT;
+
+ guard(mutex)(&zdev->fmb_lock);
+
+ if (fmb_enable.enabled)
+ return zpci_fmb_enable_device(zdev);
+ return zpci_fmb_disable_device(zdev);
+}
+
+int vfio_pci_zdev_feature_fmb_read(struct vfio_pci_core_device *vdev, u32 flags,
+ void __user *arg, size_t argsz)
+{
+ struct zpci_dev *zdev;
+ struct vfio_device_feature_zpci_fmb_read fmb_read;
+ int ret;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET, sizeof(fmb_read));
+ if (ret != 1)
+ return ret;
+
+ zdev = to_zpci(vdev->pdev);
+ if (!zdev)
+ return -ENODEV;
+
+ guard(mutex)(&zdev->fmb_lock);
+
+ if (!zdev->fmb)
+ return -ENOMSG;
+ if (copy_from_user(&fmb_read, arg, sizeof(fmb_read)))
+ return -EFAULT;
+ if (!fmb_read.data)
+ return -EINVAL;
+
+ if (copy_to_user((struct zpci_fmb __user *) fmb_read.data, zdev->fmb, zdev->fmb_length))
+ return -EFAULT;
+
+ return 0;
+}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5de618a3a5ee..97e0f857fe4f 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1534,6 +1534,35 @@ struct vfio_device_feature_dma_buf {
*/
#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
+/**
+ * Upon VFIO_DEVICE_FEATURE_SET, enable or disable FMB for the VFIO zPCI device.
+ *
+ * enabled is treated as a bool, so any non-zero value evaluates to true. This
+ * feature fails on attempt to double enable/disable.
+ *
+ * Returns: 0 on success, -1 and errno set appropriately on error.
+ */
+#define VFIO_DEVICE_FEATURE_ZPCI_FMB_ENABLE 13
+
+struct vfio_device_feature_zpci_fmb_enable {
+ __u8 enabled;
+};
+
+/**
+ * Upon VFIO_DEVICE_FEATURE_GET, provide FMB passthrough for VFIO zPCI devices.
+ *
+ * The user-provided buffer must be at least fmb_length large, where fmb_length
+ * is reported in VFIO_DEVICE_INFO_CAP_ZPCI_BASE.
+ *
+ * Returns: 0 on success, -1 and errno set appropriately on error. errno==ENOMSG
+ * when the FMB is not enabled.
+ */
+#define VFIO_DEVICE_FEATURE_ZPCI_FMB_READ 14
+
+struct vfio_device_feature_zpci_fmb_read {
+ __aligned_u64 data;
+};
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.54.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH v4 3/4] vfio-pci/zdev: Add VFIO FMB device features
2026-06-12 18:10 ` [PATCH v4 3/4] vfio-pci/zdev: Add VFIO FMB device features Omar Elghoul
@ 2026-06-12 18:28 ` sashiko-bot
0 siblings, 0 replies; 9+ messages in thread
From: sashiko-bot @ 2026-06-12 18:28 UTC (permalink / raw)
To: Omar Elghoul
Cc: kvm, linux-s390, Vasily Gorbik, Heiko Carstens, Alexander Gordeev
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Holding `zdev->fmb_lock` across user memory accesses allows a malicious userspace process to stall host device management.
- [Medium] Enabling the FMB feature via `VFIO_DEVICE_FEATURE_ZPCI_FMB_ENABLE` will unconditionally fail with `-EINVAL` on a newly opened device.
--
commit 67b6ef61b40cb8ad6b8fa03e4587c893d168c03d
Author: Omar Elghoul <oelghoul@linux.ibm.com>
vfio-pci/zdev: Add VFIO FMB device features
Introduce new VFIO features for zPCI devices to provide FMB passthrough to
userspace. Allow the user to enable or disable the FMB using the SET-only
feature VFIO_DEVICE_FEATURE_ZPCI_FMB_ENABLE and read it using the GET-only
feature VFIO_DEVICE_FEATURE_ZPCI_FMB_READ.
[ ... ]
> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
> index 0990fdb146b78..ad1bcaf52ff24 100644
> --- a/drivers/vfio/pci/vfio_pci_zdev.c
> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
> @@ -167,3 +167,58 @@ void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
> if (zpci_kvm_hook.kvm_unregister)
> zpci_kvm_hook.kvm_unregister(zdev);
> }
> +
> +int vfio_pci_zdev_feature_fmb_enable(struct vfio_pci_core_device *vdev, u32 flags,
> + void __user *arg, size_t argsz)
> +{
> + struct zpci_dev *zdev;
> + struct vfio_device_feature_zpci_fmb_enable fmb_enable;
> + int ret;
> +
> + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, sizeof(fmb_enable));
> + if (ret != 1)
> + return ret;
> +
> + zdev = to_zpci(vdev->pdev);
> + if (!zdev)
> + return -ENODEV;
> +
> + if (copy_from_user(&fmb_enable, arg, sizeof(fmb_enable)))
> + return -EFAULT;
> +
> + guard(mutex)(&zdev->fmb_lock);
> +
> + if (fmb_enable.enabled)
> + return zpci_fmb_enable_device(zdev);
[Severity: Medium]
Will enabling this feature fail with -EINVAL on a newly opened device?
When userspace opens a VFIO device, vfio_pci_core_enable() automatically
enables the physical device via pci_enable_device(). On s390,
pcibios_enable_device() unconditionally calls zpci_fmb_enable_device(),
which successfully allocates zdev->fmb.
If userspace subsequently attempts to enable the FMB feature via this ioctl,
it calls into zpci_fmb_enable_device() again:
arch/s390/pci/pci.c:zpci_fmb_enable_device() {
...
if (zdev->fmb || sizeof(*zdev->fmb) < zdev->fmb_length)
return -EINVAL;
...
}
Does this make it impossible for userspace to explicitly enable the feature
without first disabling it?
> + return zpci_fmb_disable_device(zdev);
> +}
> +
> +int vfio_pci_zdev_feature_fmb_read(struct vfio_pci_core_device *vdev, u32 flags,
> + void __user *arg, size_t argsz)
> +{
> + struct zpci_dev *zdev;
> + struct vfio_device_feature_zpci_fmb_read fmb_read;
> + int ret;
> +
> + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET, sizeof(fmb_read));
> + if (ret != 1)
> + return ret;
> +
> + zdev = to_zpci(vdev->pdev);
> + if (!zdev)
> + return -ENODEV;
> +
> + guard(mutex)(&zdev->fmb_lock);
> +
> + if (!zdev->fmb)
> + return -ENOMSG;
> + if (copy_from_user(&fmb_read, arg, sizeof(fmb_read)))
> + return -EFAULT;
> + if (!fmb_read.data)
> + return -EINVAL;
> +
> + if (copy_to_user((struct zpci_fmb __user *) fmb_read.data, zdev->fmb, zdev->fmb_length))
> + return -EFAULT;
[Severity: High]
Can holding this lock across user memory accesses cause a denial of service?
A malicious userspace process could provide a userfaultfd-backed memory
region for arg or fmb_read.data to intentionally stall the thread
indefinitely while holding zdev->fmb_lock.
Since pcibios_disable_device() also requires this lock during device
teardown, would this stall permanently block the host from hot-unplugging
the PCI device or unbinding the VFIO driver?
> +
> + return 0;
> +}
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260612181048.91548-1-oelghoul@linux.ibm.com?part=3
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v4 4/4] s390/pci: Fence FMB enable/disable via sysfs for passthrough devices
2026-06-12 18:10 [PATCH v4 0/4] vfio-pci/zdev: Improved zPCI Function Measurement Support Omar Elghoul
` (2 preceding siblings ...)
2026-06-12 18:10 ` [PATCH v4 3/4] vfio-pci/zdev: Add VFIO FMB device features Omar Elghoul
@ 2026-06-12 18:10 ` Omar Elghoul
2026-06-12 18:22 ` sashiko-bot
3 siblings, 1 reply; 9+ messages in thread
From: Omar Elghoul @ 2026-06-12 18:10 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: oelghoul, hca, gor, agordeev, borntraeger, svens, schnelle,
mjrosato, alifm, farman, gbayer, alex
Introduce a fence over enabling or disabling FMB via sysfs when the zPCI
device is associated with a KVM. This will allow a KVM guest to use FMB
passthrough and avoid the edge-case where the host disables FMB while the
guest is still using it, which may cause partial counter resets and
inconsistent reads which have no parallel in the architecture.
With this patch, the userspace driver, likely QEMU, is still able to enable
or disable the FMB using the VFIO device feature introduced in the previous
patch, effectively securing what is associated with the VM state and
isolating it from other processes on the host.
For VFIO devices that are not associated with a KVM (i.e., for userspace
drivers other than QEMU), this fence does not take effect.
Signed-off-by: Omar Elghoul <oelghoul@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
arch/s390/pci/pci_debug.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/s390/pci/pci_debug.c b/arch/s390/pci/pci_debug.c
index c7ed7bf254b5..a2dc79418c21 100644
--- a/arch/s390/pci/pci_debug.c
+++ b/arch/s390/pci/pci_debug.c
@@ -149,9 +149,15 @@ static ssize_t pci_perf_seq_write(struct file *file, const char __user *ubuf,
if (!zdev)
return 0;
+ mutex_lock(&zdev->kzdev_lock);
+ if (zdev->kzdev) {
+ rc = -EPERM;
+ goto out_unlock_kzdev;
+ }
+
rc = kstrtoul_from_user(ubuf, count, 10, &val);
if (rc)
- return rc;
+ goto out_unlock_kzdev;
mutex_lock(&zdev->fmb_lock);
switch (val) {
@@ -163,6 +169,9 @@ static ssize_t pci_perf_seq_write(struct file *file, const char __user *ubuf,
break;
}
mutex_unlock(&zdev->fmb_lock);
+
+out_unlock_kzdev:
+ mutex_unlock(&zdev->kzdev_lock);
return rc ? rc : count;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH v4 4/4] s390/pci: Fence FMB enable/disable via sysfs for passthrough devices
2026-06-12 18:10 ` [PATCH v4 4/4] s390/pci: Fence FMB enable/disable via sysfs for passthrough devices Omar Elghoul
@ 2026-06-12 18:22 ` sashiko-bot
0 siblings, 0 replies; 9+ messages in thread
From: sashiko-bot @ 2026-06-12 18:22 UTC (permalink / raw)
To: Omar Elghoul
Cc: linux-s390, kvm, Vasily Gorbik, Heiko Carstens, Alexander Gordeev
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The patch explicitly wraps a sleepable user-space memory copy (`kstrtoul_from_user()`) inside `mutex_lock(&zdev->kzdev_lock)`, introducing a lock-holding anti-pattern that can block system paths.
--
commit 0ba1ddb8db696b6bb98d1631644717f3b52b557c
Author: Omar Elghoul <oelghoul@linux.ibm.com>
s390/pci: Fence FMB enable/disable via sysfs for passthrough devices
This commit introduces a fence over enabling or disabling FMB via sysfs when
the zPCI device is associated with a KVM guest. This prevents the host from
disabling FMB while the guest is using it, securing the VM state.
> diff --git a/arch/s390/pci/pci_debug.c b/arch/s390/pci/pci_debug.c
> --- a/arch/s390/pci/pci_debug.c
> +++ b/arch/s390/pci/pci_debug.c
> @@ -149,9 +149,15 @@ static ssize_t pci_perf_seq_write(struct file *file, const char __user *ubuf,
> if (!zdev)
> return 0;
>
> + mutex_lock(&zdev->kzdev_lock);
> + if (zdev->kzdev) {
> + rc = -EPERM;
> + goto out_unlock_kzdev;
> + }
> +
> rc = kstrtoul_from_user(ubuf, count, 10, &val);
[Severity: Medium]
Is it necessary to hold kzdev_lock while copying from user-space?
Calling kstrtoul_from_user() here means we are holding the kzdev_lock mutex
during a potentially blocking memory copy. If the user-space read blocks, it
will indefinitely hold kzdev_lock, stalling any other operations
synchronizing on this device's lock.
Since kstrtoul_from_user() only parses data into a local stack variable and
does not depend on the device state, does it make sense to move this call
before acquiring kzdev_lock?
> if (rc)
> - return rc;
> + goto out_unlock_kzdev;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260612181048.91548-1-oelghoul@linux.ibm.com?part=4
^ permalink raw reply [flat|nested] 9+ messages in thread