* [PATCH v2 0/5] vfio/pci: Latch module params, fix bitfield use and VGA unwind
@ 2026-06-11 21:35 Alex Williamson
2026-06-11 21:35 ` [PATCH v2 1/5] vfio/pci: Latch disable_idle_d3 per device Alex Williamson
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Alex Williamson @ 2026-06-11 21:35 UTC (permalink / raw)
To: kvm, Alex Williamson
Cc: Alex Williamson, linux-kernel, Jason Gunthorpe, Kevin Tian, kanie
The vfio-pci module pushes module parameters into vfio-pci-core at
initialization that change global policies for all devices, whether
bound to vfio-pci itself or other variant drivers. Not only is this
generally bad practice, but in the case of runtime PM, policy changes
can affect initialized devices, even while they're in use, resulting in
unbalanced operations.
This series first addresses the runtime PM case as a separate fix for
stable, latching the runtime PM support into the device.
Then, in preparing for moving disable_vga to per device, it's discovered
that we lack proper unwind support for a registration failure, leaving
the VGA arbiter client registered. This is resolved before it turns into
an issue, also marked for stable.
Next, we're again reminded of the risky bitfield use in both the
vfio-pci-core and mlx5 variant driver, therefore before expanding use
for more per-device flags, perform the binary split to categorize
flags as either setup/release or runtime to avoid RMW hazards of the
bitfields. Also for stable.
Finally, close the series by pulling vfio-pci module parameter policy
out of the core, latching each into the device structure.
This latter change also has the effect of restoring the mutability of
two of the module parameters, where that functionality was lost with the
vfio-pci-core split. With this series, the current parameter values are
latched into the device at probe time.
Please review. Thanks,
Alex
Alex Williamson (5):
vfio/pci: Latch disable_idle_d3 per device
vfio/pci: Release the VGA arbiter client on register_device() failure
vfio/pci: Fix racy bitfields and tighten struct layout
vfio/mlx5: Fix racy bitfields and tighten struct layout
vfio/pci: Latch all module parameters per device
drivers/vfio/pci/mlx5/cmd.h | 15 +++++----
drivers/vfio/pci/vfio_pci.c | 30 +++++++++++++-----
drivers/vfio/pci/vfio_pci_core.c | 53 +++++++++++++++-----------------
include/linux/vfio_pci_core.h | 18 ++++++-----
4 files changed, 67 insertions(+), 49 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/5] vfio/pci: Latch disable_idle_d3 per device
2026-06-11 21:35 [PATCH v2 0/5] vfio/pci: Latch module params, fix bitfield use and VGA unwind Alex Williamson
@ 2026-06-11 21:35 ` Alex Williamson
2026-06-11 21:35 ` [PATCH v2 2/5] vfio/pci: Release the VGA arbiter client on register_device() failure Alex Williamson
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2026-06-11 21:35 UTC (permalink / raw)
To: kvm, Alex Williamson
Cc: Alex Williamson, linux-kernel, Jason Gunthorpe, Kevin Tian, kanie,
stable
When disable_idle_d3 was introduced in vfio-pci, it directly manipulated
the device power state with pci_set_power_state(). There were no
refcounts to maintain or balanced operations, we could unconditionally
bring the device to D0 and conditionally move it to D3hot. Therefore
the module parameter was made writable.
Later, in commit c61302aa48f7 ("vfio/pci: Move module parameters to
vfio_pci.c"), as part of the vfio-pci-core split, the writable aspect
of the module parameter was nullified. The parameter value could still
be changed through sysfs, but the vfio-pci driver latched the values
into vfio-pci-core globals at module init. Loading the vfio-pci module,
or unloading and reloading, with non-default or different values could
change the globals relative to existing devices bound to vfio-pci
variant drivers.
Runtime PM was introduced in commit 7ab5e10eda02 ("vfio/pci: Move the
unused device into low power state with runtime PM"), which marks the
point where power states became refcounted. PM get and put operations
need to be balanced, but the same module operations noted above can
change the global variables relative to those devices already bound to
vfio-pci variant drivers. This introduces a window where PM operations
can now become unbalanced.
To resolve this with a narrow footprint for stable backports, the
disable_idle_d3 flag is latched into the vfio_pci_core_device at the
time of initialization, such that the device always operates with a
consistent value.
Fixes: 7ab5e10eda02 ("vfio/pci: Move the unused device into low power state with runtime PM")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
---
drivers/vfio/pci/vfio_pci_core.c | 30 ++++++++++++++++++++----------
include/linux/vfio_pci_core.h | 1 +
2 files changed, 21 insertions(+), 10 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a28f1e99362c..9f71eae0cc94 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -538,7 +538,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
u16 cmd;
u8 msix_pos;
- if (!disable_idle_d3) {
+ if (!vdev->disable_idle_d3) {
ret = pm_runtime_resume_and_get(&pdev->dev);
if (ret < 0)
return ret;
@@ -617,7 +617,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
out_disable_device:
pci_disable_device(pdev);
out_power:
- if (!disable_idle_d3)
+ if (!vdev->disable_idle_d3)
pm_runtime_put(&pdev->dev);
return ret;
}
@@ -753,7 +753,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
vfio_pci_dev_set_try_reset(vdev->vdev.dev_set);
/* Put the pm-runtime usage counter acquired during enable */
- if (!disable_idle_d3)
+ if (!vdev->disable_idle_d3)
pm_runtime_put(&pdev->dev);
}
EXPORT_SYMBOL_GPL(vfio_pci_core_disable);
@@ -2144,6 +2144,8 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
init_rwsem(&vdev->memory_lock);
xa_init(&vdev->ctx);
+ vdev->disable_idle_d3 = disable_idle_d3;
+
return 0;
}
EXPORT_SYMBOL_GPL(vfio_pci_core_init_dev);
@@ -2239,7 +2241,7 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
dev->driver->pm = &vfio_pci_core_pm_ops;
pm_runtime_allow(dev);
- if (!disable_idle_d3)
+ if (!vdev->disable_idle_d3)
pm_runtime_put(dev);
ret = vfio_register_group_dev(&vdev->vdev);
@@ -2248,7 +2250,7 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
return 0;
out_power:
- if (!disable_idle_d3)
+ if (!vdev->disable_idle_d3)
pm_runtime_get_noresume(dev);
pm_runtime_forbid(dev);
@@ -2267,7 +2269,7 @@ void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev)
vfio_pci_vf_uninit(vdev);
vfio_pci_vga_uninit(vdev);
- if (!disable_idle_d3)
+ if (!vdev->disable_idle_d3)
pm_runtime_get_noresume(&vdev->pdev->dev);
pm_runtime_forbid(&vdev->pdev->dev);
@@ -2429,6 +2431,8 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
int ret;
list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
+ if (cur->disable_idle_d3)
+ continue;
ret = pm_runtime_resume_and_get(&cur->pdev->dev);
if (ret)
goto unwind;
@@ -2438,8 +2442,11 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
unwind:
list_for_each_entry_continue_reverse(cur, &dev_set->device_list,
- vdev.dev_set_list)
+ vdev.dev_set_list) {
+ if (cur->disable_idle_d3)
+ continue;
pm_runtime_put(&cur->pdev->dev);
+ }
return ret;
}
@@ -2552,8 +2559,11 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
up_write(&vdev->memory_lock);
}
- list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list)
+ list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list) {
+ if (vdev->disable_idle_d3)
+ continue;
pm_runtime_put(&vdev->pdev->dev);
+ }
err_unlock:
mutex_unlock(&dev_set->lock);
@@ -2599,7 +2609,7 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
* state. Increment the usage count for all the devices in the dev_set
* before reset and decrement the same after reset.
*/
- if (!disable_idle_d3 && vfio_pci_dev_set_pm_runtime_get(dev_set))
+ if (vfio_pci_dev_set_pm_runtime_get(dev_set))
return;
if (!pci_reset_bus(pdev))
@@ -2609,7 +2619,7 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
if (reset_done)
cur->needs_reset = false;
- if (!disable_idle_d3)
+ if (!cur->disable_idle_d3)
pm_runtime_put(&cur->pdev->dev);
}
}
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 5fc6ce4dd786..27aab3fdbb91 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -127,6 +127,7 @@ struct vfio_pci_core_device {
bool needs_pm_restore:1;
bool pm_intx_masked:1;
bool pm_runtime_engaged:1;
+ bool disable_idle_d3:1;
bool sriov_active;
struct pci_saved_state *pci_saved_state;
struct pci_saved_state *pm_save;
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 2/5] vfio/pci: Release the VGA arbiter client on register_device() failure
2026-06-11 21:35 [PATCH v2 0/5] vfio/pci: Latch module params, fix bitfield use and VGA unwind Alex Williamson
2026-06-11 21:35 ` [PATCH v2 1/5] vfio/pci: Latch disable_idle_d3 per device Alex Williamson
@ 2026-06-11 21:35 ` Alex Williamson
2026-06-11 21:35 ` [PATCH v2 3/5] vfio/pci: Fix racy bitfields and tighten struct layout Alex Williamson
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2026-06-11 21:35 UTC (permalink / raw)
To: kvm, Alex Williamson
Cc: Alex Williamson, linux-kernel, Jason Gunthorpe, Kevin Tian, kanie,
stable
The re-order in the Fixes commit below displaced vfio_pci_vga_init() as
the last failure point of what is now vfio_pci_core_register_device()
without introducing an unwind for the VGA arbiter registration.
In current kernels this is mostly benign because vfio_pci_set_decode()
only uses pci_dev state, but the original failure path could leave a
callback with a freed vdev cookie. The stale registration also becomes
unsafe again once the callback follows drvdata to the vfio device.
Add the required VGA unwind callout.
Fixes: 4aeec3984ddc ("vfio/pci: Re-order vfio_pci_probe()")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
---
drivers/vfio/pci/vfio_pci_core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 9f71eae0cc94..d2d3fddec610 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2254,6 +2254,7 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
pm_runtime_get_noresume(dev);
pm_runtime_forbid(dev);
+ vfio_pci_vga_uninit(vdev);
out_vf:
vfio_pci_vf_uninit(vdev);
return ret;
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 3/5] vfio/pci: Fix racy bitfields and tighten struct layout
2026-06-11 21:35 [PATCH v2 0/5] vfio/pci: Latch module params, fix bitfield use and VGA unwind Alex Williamson
2026-06-11 21:35 ` [PATCH v2 1/5] vfio/pci: Latch disable_idle_d3 per device Alex Williamson
2026-06-11 21:35 ` [PATCH v2 2/5] vfio/pci: Release the VGA arbiter client on register_device() failure Alex Williamson
@ 2026-06-11 21:35 ` Alex Williamson
2026-06-11 21:35 ` [PATCH v2 4/5] vfio/mlx5: " Alex Williamson
2026-06-11 21:35 ` [PATCH v2 5/5] vfio/pci: Latch all module parameters per device Alex Williamson
4 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2026-06-11 21:35 UTC (permalink / raw)
To: kvm, Alex Williamson
Cc: Alex Williamson, linux-kernel, Jason Gunthorpe, Kevin Tian, kanie,
stable
Bitfield operations are not atomic, they use a read-modify-write
pattern, therefore we should be careful not to pack bitfields that
can be concurrently updated into the same storage unit.
This split takes a binary approach: flags that are only modified
pre/post open/close remain bitfields, flags modified from user
action, including actions that reach across to another device (ex.
reset) use dedicated storage units.
Note that the virq_disabled and bardirty flags are relocated to fill
an existing hole in the structure.
Bitfield justifications:
has_dyn_msix: written only in vfio_pci_core_enable()
pci_2_3: written only in vfio_pci_core_enable()
reset_works: written only in vfio_pci_core_enable()
extended_caps: written only in vfio_cap_len() under vfio_config_init()
has_vga: written only in vfio_pci_core_enable()
nointx: written only in vfio_pci_core_enable()
needs_pm_restore: written only in vfio_pci_probe_power_state()
disable_idle_d3: written only at .init in vfio_pci_core_init_dev()
Dedicated storage units:
virq_disabled: written by guest INTx command writes in
vfio_basic_config_write() while the device is open
bardirty: written by guest BAR writes in vfio_basic_config_write()
while the device is open
pm_intx_masked: written in the runtime-PM suspend path.
pm_runtime_engaged: written by low-power feature entry/exit paths
needs_reset: set in vfio_pci_core_disable() and cleared for devices in
the set by vfio_pci_dev_set_try_reset()
sriov_active: written by vfio_pci_core_sriov_configure() via sysfs
sriov_numvfs while bound.
Fixes: 9cd0f6d5cbb6 ("vfio/pci: Use bitfield for struct vfio_pci_core_device flags")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Link: https://lore.kernel.org/r/20260511221609.3837652-2-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
---
include/linux/vfio_pci_core.h | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 27aab3fdbb91..985b8af5a04b 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -101,6 +101,9 @@ struct vfio_pci_core_device {
const struct vfio_pci_device_ops *pci_ops;
void __iomem *barmap[PCI_STD_NUM_BARS];
bool bar_mmap_supported[PCI_STD_NUM_BARS];
+ /* Flags modified at runtime - dedicated storage unit */
+ bool virq_disabled;
+ bool bardirty;
u8 *pci_config_map;
u8 *vconfig;
struct perm_bits *msi_perm;
@@ -115,19 +118,19 @@ struct vfio_pci_core_device {
u16 msix_size;
u32 msix_offset;
u32 rbar[7];
+ /* Flags only modified on setup/release - bitfield ok */
bool has_dyn_msix:1;
bool pci_2_3:1;
- bool virq_disabled:1;
bool reset_works:1;
bool extended_caps:1;
- bool bardirty:1;
bool has_vga:1;
- bool needs_reset:1;
bool nointx:1;
bool needs_pm_restore:1;
- bool pm_intx_masked:1;
- bool pm_runtime_engaged:1;
bool disable_idle_d3:1;
+ /* Flags modified at runtime - dedicated storage unit */
+ bool needs_reset;
+ bool pm_intx_masked;
+ bool pm_runtime_engaged;
bool sriov_active;
struct pci_saved_state *pci_saved_state;
struct pci_saved_state *pm_save;
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 4/5] vfio/mlx5: Fix racy bitfields and tighten struct layout
2026-06-11 21:35 [PATCH v2 0/5] vfio/pci: Latch module params, fix bitfield use and VGA unwind Alex Williamson
` (2 preceding siblings ...)
2026-06-11 21:35 ` [PATCH v2 3/5] vfio/pci: Fix racy bitfields and tighten struct layout Alex Williamson
@ 2026-06-11 21:35 ` Alex Williamson
2026-06-11 21:35 ` [PATCH v2 5/5] vfio/pci: Latch all module parameters per device Alex Williamson
4 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2026-06-11 21:35 UTC (permalink / raw)
To: kvm, Alex Williamson
Cc: Alex Williamson, linux-kernel, Jason Gunthorpe, Kevin Tian, kanie,
stable
Bitfield operations are not atomic, they use a read-modify-write
pattern, therefore we should be careful not to pack bitfields that
can be concurrently updated into the same storage unit.
This split takes a binary approach: flags that are only modified
pre/post open/close remain bitfields, flags modified from user
action, including actions that reach across to another device (ex.
reset) use dedicated storage units.
Note mlx5_vhca_page_tracker.status is relocated to fill the alignment
hole this split exposes.
Bitfield justifications:
migrate_cap: written only in mlx5vf_cmd_set_migratable() at probe
chunk_mode: written only in mlx5vf_cmd_set_migratable() at probe
mig_state_cap: written only in mlx5vf_cmd_set_migratable() at probe
Dedicated storage units:
mdev_detach: written in the VF attach/detach event notifier
mlx5fv_vf_event() at runtime
log_active: written in mlx5vf_start_page_tracker()/
mlx5vf_stop_page_tracker() during runtime dirty tracking
deferred_reset: written in mlx5vf_state_mutex_unlock()/
mlx5vf_pci_aer_reset_done() during runtime reset handling
is_err: set by tracker error handling and dirty-log polling at runtime
object_changed: set by tracker event handling and cleared by dirty-log
polling at runtime
Fixes: 61a2f1460fd0 ("vfio/mlx5: Manage the VF attach/detach callback from the PF")
Fixes: 79c3cf279926 ("vfio/mlx5: Init QP based resources for dirty tracking")
Fixes: f886473071d6 ("vfio/mlx5: Add support for tracker object change event")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Link: https://lore.kernel.org/r/20260511221609.3837652-3-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
---
drivers/vfio/pci/mlx5/cmd.h | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index deed0f132f39..c86d8b243a52 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -158,26 +158,29 @@ struct mlx5_vhca_qp {
struct mlx5_vhca_page_tracker {
u32 id;
u32 pdn;
- u8 is_err:1;
- u8 object_changed:1;
+ /* Flags modified at runtime - dedicated storage unit */
+ u8 is_err;
+ u8 object_changed;
+ int status;
struct mlx5_uars_page *uar;
struct mlx5_vhca_cq cq;
struct mlx5_vhca_qp *host_qp;
struct mlx5_vhca_qp *fw_qp;
struct mlx5_nb nb;
- int status;
};
struct mlx5vf_pci_core_device {
struct vfio_pci_core_device core_device;
int vf_id;
u16 vhca_id;
+ /* Flags only modified on setup/release - bitfield ok */
u8 migrate_cap:1;
- u8 deferred_reset:1;
- u8 mdev_detach:1;
- u8 log_active:1;
u8 chunk_mode:1;
u8 mig_state_cap:1;
+ /* Flags modified at runtime - dedicated storage unit */
+ u8 mdev_detach;
+ u8 log_active;
+ u8 deferred_reset;
struct completion tracker_comp;
/* protect migration state */
struct mutex state_mutex;
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 5/5] vfio/pci: Latch all module parameters per device
2026-06-11 21:35 [PATCH v2 0/5] vfio/pci: Latch module params, fix bitfield use and VGA unwind Alex Williamson
` (3 preceding siblings ...)
2026-06-11 21:35 ` [PATCH v2 4/5] vfio/mlx5: " Alex Williamson
@ 2026-06-11 21:35 ` Alex Williamson
2026-06-12 6:25 ` Guixin Liu
2026-06-12 8:40 ` fengchengwen
4 siblings, 2 replies; 10+ messages in thread
From: Alex Williamson @ 2026-06-11 21:35 UTC (permalink / raw)
To: kvm, Alex Williamson
Cc: Alex Williamson, linux-kernel, Jason Gunthorpe, Kevin Tian, kanie
The vfio-pci module parameters of disable_idle_d3, nointxmask, and
disable_vga latch vfio-pci policy into vfio-pci-core globals each time
the vfio-pci module is initialized. The disable_idle_d3 parameter has
already migrated to a per-device flag in order to provide consistency
for refcounted PM operations for the lifetime of the device
registration.
Pull the remaining vfio-pci module-parameter policy out of vfio-pci-core
into per-device flags set at device initialization.
This also restores the mutable aspect of the disable_idle_d3 and
nointxmask module parameters for vfio-pci, with the caveat that the
parameters are latched into the device at probe.
A notable change for variant drivers is that their devices are no longer
affected by vfio-pci module parameters and those drivers may need to
adopt similar module parameters if any devices have a hidden dependency
on vfio-pci setting non-default policy.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
---
drivers/vfio/pci/vfio_pci.c | 30 ++++++++++++++++++++++--------
drivers/vfio/pci/vfio_pci_core.c | 26 ++++++--------------------
include/linux/vfio_pci_core.h | 4 ++--
3 files changed, 30 insertions(+), 30 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 0c771064c0b8..830369ff878d 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -125,9 +125,30 @@ static int vfio_pci_open_device(struct vfio_device *core_vdev)
return 0;
}
+static int vfio_pci_init_dev(struct vfio_device *core_vdev)
+{
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);
+
+ /*
+ * These behaviors originated in vfio-pci and moved into
+ * vfio-pci-core when the driver was split; vfio-pci remains the
+ * only driver that toggles them. Latch our module parameters per
+ * device at init time so that later parameter changes do not
+ * affect already-initialized devices.
+ */
+ vdev->nointxmask = nointxmask;
+ vdev->disable_idle_d3 = disable_idle_d3;
+#ifdef CONFIG_VFIO_PCI_VGA
+ vdev->disable_vga = disable_vga;
+#endif
+
+ return vfio_pci_core_init_dev(core_vdev);
+}
+
static const struct vfio_device_ops vfio_pci_ops = {
.name = "vfio-pci",
- .init = vfio_pci_core_init_dev,
+ .init = vfio_pci_init_dev,
.release = vfio_pci_core_release_dev,
.open_device = vfio_pci_open_device,
.close_device = vfio_pci_core_close_device,
@@ -256,13 +277,6 @@ static void __init vfio_pci_fill_ids(void)
static int __init vfio_pci_init(void)
{
int ret;
- bool is_disable_vga = true;
-
-#ifdef CONFIG_VFIO_PCI_VGA
- is_disable_vga = disable_vga;
-#endif
-
- vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
/* Register and scan for devices */
ret = pci_register_driver(&vfio_pci_driver);
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index d2d3fddec610..76725c0e7dda 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -38,10 +38,6 @@
#define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
#define DRIVER_DESC "core driver for VFIO based PCI devices"
-static bool nointxmask;
-static bool disable_vga;
-static bool disable_idle_d3;
-
static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
{
struct vfio_pci_eventfd *eventfd =
@@ -92,10 +88,10 @@ struct vfio_pci_vf_token {
int users;
};
-static inline bool vfio_vga_disabled(void)
+static inline bool vfio_vga_disabled(struct vfio_pci_core_device *vdev)
{
#ifdef CONFIG_VFIO_PCI_VGA
- return disable_vga;
+ return vdev->disable_vga;
#else
return true;
#endif
@@ -111,11 +107,12 @@ static inline bool vfio_vga_disabled(void)
*/
static unsigned int vfio_pci_set_decode(struct pci_dev *pdev, bool single_vga)
{
+ struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);
struct pci_dev *tmp = NULL;
unsigned char max_busnr;
unsigned int decodes;
- if (single_vga || !vfio_vga_disabled() || pci_is_root_bus(pdev->bus))
+ if (single_vga || !vfio_vga_disabled(vdev) || pci_is_root_bus(pdev->bus))
return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
@@ -562,7 +559,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
if (!vdev->pci_saved_state)
pci_dbg(pdev, "%s: Couldn't store saved state\n", __func__);
- if (likely(!nointxmask)) {
+ if (likely(!vdev->nointxmask)) {
if (vfio_pci_nointx(pdev)) {
pci_info(pdev, "Masking broken INTx support\n");
vdev->nointx = true;
@@ -602,7 +599,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
vdev->has_dyn_msix = false;
}
- if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
+ if (!vfio_vga_disabled(vdev) && vfio_pci_is_vga(pdev))
vdev->has_vga = true;
vfio_pci_core_map_bars(vdev);
@@ -2144,8 +2141,6 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
init_rwsem(&vdev->memory_lock);
xa_init(&vdev->ctx);
- vdev->disable_idle_d3 = disable_idle_d3;
-
return 0;
}
EXPORT_SYMBOL_GPL(vfio_pci_core_init_dev);
@@ -2625,15 +2620,6 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
}
}
-void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,
- bool is_disable_idle_d3)
-{
- nointxmask = is_nointxmask;
- disable_vga = is_disable_vga;
- disable_idle_d3 = is_disable_idle_d3;
-}
-EXPORT_SYMBOL_GPL(vfio_pci_core_set_params);
-
static void vfio_pci_core_cleanup(void)
{
vfio_pci_uninit_perm_bits();
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 985b8af5a04b..9a1674c152aa 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -127,6 +127,8 @@ struct vfio_pci_core_device {
bool nointx:1;
bool needs_pm_restore:1;
bool disable_idle_d3:1;
+ bool nointxmask:1;
+ bool disable_vga:1;
/* Flags modified at runtime - dedicated storage unit */
bool needs_reset;
bool pm_intx_masked;
@@ -161,8 +163,6 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
unsigned int type, unsigned int subtype,
const struct vfio_pci_regops *ops,
size_t size, u32 flags, void *data);
-void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga,
- bool is_disable_idle_d3);
void vfio_pci_core_close_device(struct vfio_device *core_vdev);
int vfio_pci_core_init_dev(struct vfio_device *core_vdev);
void vfio_pci_core_release_dev(struct vfio_device *core_vdev);
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 5/5] vfio/pci: Latch all module parameters per device
2026-06-11 21:35 ` [PATCH v2 5/5] vfio/pci: Latch all module parameters per device Alex Williamson
@ 2026-06-12 6:25 ` Guixin Liu
2026-06-12 15:05 ` Alex Williamson
2026-06-12 8:40 ` fengchengwen
1 sibling, 1 reply; 10+ messages in thread
From: Guixin Liu @ 2026-06-12 6:25 UTC (permalink / raw)
To: Alex Williamson, kvm, Alex Williamson
Cc: linux-kernel, Jason Gunthorpe, Kevin Tian
Hi Alex,
In many cases, when users modify a module parameter, they expect
the change to take effect immediately, but that's not at all what
happens here. So I think on the one hand, we should make the module
parameter read-only; on the other hand, we could also expose this
information in the PCI core device's sysfs directory — read-only as well
for now,
or perhaps there's an even better approach?
Best Regards,
Guixin Liu
在 2026/6/12 05:35, Alex Williamson 写道:
> The vfio-pci module parameters of disable_idle_d3, nointxmask, and
> disable_vga latch vfio-pci policy into vfio-pci-core globals each time
> the vfio-pci module is initialized. The disable_idle_d3 parameter has
> already migrated to a per-device flag in order to provide consistency
> for refcounted PM operations for the lifetime of the device
> registration.
>
> Pull the remaining vfio-pci module-parameter policy out of vfio-pci-core
> into per-device flags set at device initialization.
>
> This also restores the mutable aspect of the disable_idle_d3 and
> nointxmask module parameters for vfio-pci, with the caveat that the
> parameters are latched into the device at probe.
>
> A notable change for variant drivers is that their devices are no longer
> affected by vfio-pci module parameters and those drivers may need to
> adopt similar module parameters if any devices have a hidden dependency
> on vfio-pci setting non-default policy.
>
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
> ---
> drivers/vfio/pci/vfio_pci.c | 30 ++++++++++++++++++++++--------
> drivers/vfio/pci/vfio_pci_core.c | 26 ++++++--------------------
> include/linux/vfio_pci_core.h | 4 ++--
> 3 files changed, 30 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 0c771064c0b8..830369ff878d 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -125,9 +125,30 @@ static int vfio_pci_open_device(struct vfio_device *core_vdev)
> return 0;
> }
>
> +static int vfio_pci_init_dev(struct vfio_device *core_vdev)
> +{
> + struct vfio_pci_core_device *vdev =
> + container_of(core_vdev, struct vfio_pci_core_device, vdev);
> +
> + /*
> + * These behaviors originated in vfio-pci and moved into
> + * vfio-pci-core when the driver was split; vfio-pci remains the
> + * only driver that toggles them. Latch our module parameters per
> + * device at init time so that later parameter changes do not
> + * affect already-initialized devices.
> + */
> + vdev->nointxmask = nointxmask;
> + vdev->disable_idle_d3 = disable_idle_d3;
> +#ifdef CONFIG_VFIO_PCI_VGA
> + vdev->disable_vga = disable_vga;
> +#endif
> +
> + return vfio_pci_core_init_dev(core_vdev);
> +}
> +
> static const struct vfio_device_ops vfio_pci_ops = {
> .name = "vfio-pci",
> - .init = vfio_pci_core_init_dev,
> + .init = vfio_pci_init_dev,
> .release = vfio_pci_core_release_dev,
> .open_device = vfio_pci_open_device,
> .close_device = vfio_pci_core_close_device,
> @@ -256,13 +277,6 @@ static void __init vfio_pci_fill_ids(void)
> static int __init vfio_pci_init(void)
> {
> int ret;
> - bool is_disable_vga = true;
> -
> -#ifdef CONFIG_VFIO_PCI_VGA
> - is_disable_vga = disable_vga;
> -#endif
> -
> - vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
>
> /* Register and scan for devices */
> ret = pci_register_driver(&vfio_pci_driver);
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index d2d3fddec610..76725c0e7dda 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -38,10 +38,6 @@
> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
> #define DRIVER_DESC "core driver for VFIO based PCI devices"
>
> -static bool nointxmask;
> -static bool disable_vga;
> -static bool disable_idle_d3;
> -
> static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
> {
> struct vfio_pci_eventfd *eventfd =
> @@ -92,10 +88,10 @@ struct vfio_pci_vf_token {
> int users;
> };
>
> -static inline bool vfio_vga_disabled(void)
> +static inline bool vfio_vga_disabled(struct vfio_pci_core_device *vdev)
> {
> #ifdef CONFIG_VFIO_PCI_VGA
> - return disable_vga;
> + return vdev->disable_vga;
> #else
> return true;
> #endif
> @@ -111,11 +107,12 @@ static inline bool vfio_vga_disabled(void)
> */
> static unsigned int vfio_pci_set_decode(struct pci_dev *pdev, bool single_vga)
> {
> + struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);
> struct pci_dev *tmp = NULL;
> unsigned char max_busnr;
> unsigned int decodes;
>
> - if (single_vga || !vfio_vga_disabled() || pci_is_root_bus(pdev->bus))
> + if (single_vga || !vfio_vga_disabled(vdev) || pci_is_root_bus(pdev->bus))
> return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
> VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
>
> @@ -562,7 +559,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
> if (!vdev->pci_saved_state)
> pci_dbg(pdev, "%s: Couldn't store saved state\n", __func__);
>
> - if (likely(!nointxmask)) {
> + if (likely(!vdev->nointxmask)) {
> if (vfio_pci_nointx(pdev)) {
> pci_info(pdev, "Masking broken INTx support\n");
> vdev->nointx = true;
> @@ -602,7 +599,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
> vdev->has_dyn_msix = false;
> }
>
> - if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
> + if (!vfio_vga_disabled(vdev) && vfio_pci_is_vga(pdev))
> vdev->has_vga = true;
>
> vfio_pci_core_map_bars(vdev);
> @@ -2144,8 +2141,6 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
> init_rwsem(&vdev->memory_lock);
> xa_init(&vdev->ctx);
>
> - vdev->disable_idle_d3 = disable_idle_d3;
> -
> return 0;
> }
> EXPORT_SYMBOL_GPL(vfio_pci_core_init_dev);
> @@ -2625,15 +2620,6 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
> }
> }
>
> -void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,
> - bool is_disable_idle_d3)
> -{
> - nointxmask = is_nointxmask;
> - disable_vga = is_disable_vga;
> - disable_idle_d3 = is_disable_idle_d3;
> -}
> -EXPORT_SYMBOL_GPL(vfio_pci_core_set_params);
> -
> static void vfio_pci_core_cleanup(void)
> {
> vfio_pci_uninit_perm_bits();
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index 985b8af5a04b..9a1674c152aa 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -127,6 +127,8 @@ struct vfio_pci_core_device {
> bool nointx:1;
> bool needs_pm_restore:1;
> bool disable_idle_d3:1;
> + bool nointxmask:1;
> + bool disable_vga:1;
> /* Flags modified at runtime - dedicated storage unit */
> bool needs_reset;
> bool pm_intx_masked;
> @@ -161,8 +163,6 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
> unsigned int type, unsigned int subtype,
> const struct vfio_pci_regops *ops,
> size_t size, u32 flags, void *data);
> -void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga,
> - bool is_disable_idle_d3);
> void vfio_pci_core_close_device(struct vfio_device *core_vdev);
> int vfio_pci_core_init_dev(struct vfio_device *core_vdev);
> void vfio_pci_core_release_dev(struct vfio_device *core_vdev);
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 5/5] vfio/pci: Latch all module parameters per device
2026-06-11 21:35 ` [PATCH v2 5/5] vfio/pci: Latch all module parameters per device Alex Williamson
2026-06-12 6:25 ` Guixin Liu
@ 2026-06-12 8:40 ` fengchengwen
2026-06-12 14:11 ` Alex Williamson
1 sibling, 1 reply; 10+ messages in thread
From: fengchengwen @ 2026-06-12 8:40 UTC (permalink / raw)
To: Alex Williamson, kvm, Alex Williamson
Cc: linux-kernel, Jason Gunthorpe, Kevin Tian, kanie
On 6/12/2026 5:35 AM, Alex Williamson wrote:
> The vfio-pci module parameters of disable_idle_d3, nointxmask, and
> disable_vga latch vfio-pci policy into vfio-pci-core globals each time
> the vfio-pci module is initialized. The disable_idle_d3 parameter has
> already migrated to a per-device flag in order to provide consistency
> for refcounted PM operations for the lifetime of the device
> registration.
>
> Pull the remaining vfio-pci module-parameter policy out of vfio-pci-core
> into per-device flags set at device initialization.
>
> This also restores the mutable aspect of the disable_idle_d3 and
> nointxmask module parameters for vfio-pci, with the caveat that the
> parameters are latched into the device at probe.
>
> A notable change for variant drivers is that their devices are no longer
> affected by vfio-pci module parameters and those drivers may need to
> adopt similar module parameters if any devices have a hidden dependency
> on vfio-pci setting non-default policy.
>
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
> ---
> drivers/vfio/pci/vfio_pci.c | 30 ++++++++++++++++++++++--------
> drivers/vfio/pci/vfio_pci_core.c | 26 ++++++--------------------
> include/linux/vfio_pci_core.h | 4 ++--
> 3 files changed, 30 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 0c771064c0b8..830369ff878d 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -125,9 +125,30 @@ static int vfio_pci_open_device(struct vfio_device *core_vdev)
> return 0;
> }
>
> +static int vfio_pci_init_dev(struct vfio_device *core_vdev)
> +{
> + struct vfio_pci_core_device *vdev =
> + container_of(core_vdev, struct vfio_pci_core_device, vdev);
> +
> + /*
> + * These behaviors originated in vfio-pci and moved into
> + * vfio-pci-core when the driver was split; vfio-pci remains the
> + * only driver that toggles them. Latch our module parameters per
> + * device at init time so that later parameter changes do not
> + * affect already-initialized devices.
> + */
> + vdev->nointxmask = nointxmask;
> + vdev->disable_idle_d3 = disable_idle_d3;
> +#ifdef CONFIG_VFIO_PCI_VGA
> + vdev->disable_vga = disable_vga;
Since nointxmask and disable_idle_d3 already have S_IWUSR and share the same
"latch per device at init" behavior, it would be better to also add S_IWUSR
to disable_vga for permission and behavior consistency.
Runtime writes will only affect future devices, existing devices keep their
latched value, so no extra risk.
...
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 5/5] vfio/pci: Latch all module parameters per device
2026-06-12 8:40 ` fengchengwen
@ 2026-06-12 14:11 ` Alex Williamson
0 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2026-06-12 14:11 UTC (permalink / raw)
To: fengchengwen
Cc: kvm, Alex Williamson, linux-kernel, Jason Gunthorpe, Kevin Tian,
kanie
On Fri, 12 Jun 2026 16:40:37 +0800
fengchengwen <fengchengwen@huawei.com> wrote:
>
> Since nointxmask and disable_idle_d3 already have S_IWUSR and share the same
> "latch per device at init" behavior, it would be better to also add S_IWUSR
> to disable_vga for permission and behavior consistency.
>
> Runtime writes will only affect future devices, existing devices keep their
> latched value, so no extra risk.
The difference is that notintxmask and disable_idle_d3 are typically
diagnostic measures while disable_vga is a feature mask. disable_vga
has never supported runtime modification. So, while I agree it's
possible, this would expand the scope of this series from bug fixes and
behavior restoration into feature enablement, which should have some
actual use case in mind. Therefore I don't intend to pursue it in this
series. Thanks,
Alex
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 5/5] vfio/pci: Latch all module parameters per device
2026-06-12 6:25 ` Guixin Liu
@ 2026-06-12 15:05 ` Alex Williamson
0 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2026-06-12 15:05 UTC (permalink / raw)
To: Guixin Liu
Cc: kvm, Alex Williamson, linux-kernel, Jason Gunthorpe, Kevin Tian
On Fri, 12 Jun 2026 14:25:16 +0800
Guixin Liu <kanie@linux.alibaba.com> wrote:
> Hi Alex,
>
> In many cases, when users modify a module parameter, they expect
> the change to take effect immediately, but that's not at all what
> happens here. So I think on the one hand, we should make the module
> parameter read-only; on the other hand, we could also expose this
> information in the PCI core device's sysfs directory — read-only as well
> for now,
> or perhaps there's an even better approach?
nointxmask never took effect immediately, it was at best setup when the
device is opened. We could support that behavior again, but it's
actually more limiting, all devices supporting INTx would latch
nointxmask on open and therefore require an exclusive interrupt. I've
encountered this in debugging, where it's more useful to isolate
nointxmask to a single device rather than modify the VM config to
remove devices or unload host drivers to free IRQ lines.
disable_idle_d3 manages the power state of unused devices, where
setting disable_idle_d3 never woke devices that were already bound and
idle, it only affected future use of low power states. I do still see
value in directing this behavior to specific devices, it's a runtime
diagnostic versus directing users to set quirk flags and recompile
their kernel.
Therefore, neither of these are really improved by making the module
parameter itself read-only.
Given these are largely diagnostic, debugfs would be the more natural
way to expose it versus sysfs. We already have some support for
debugfs for exposing migration features and state. This might be a
good idea, but I'd also argue it's more of a feature scope, whereas I'd
really like to get the Fixes: and behavior restoration scoped change in
tree asap. Thanks,
Alex
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-06-12 15:05 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-11 21:35 [PATCH v2 0/5] vfio/pci: Latch module params, fix bitfield use and VGA unwind Alex Williamson
2026-06-11 21:35 ` [PATCH v2 1/5] vfio/pci: Latch disable_idle_d3 per device Alex Williamson
2026-06-11 21:35 ` [PATCH v2 2/5] vfio/pci: Release the VGA arbiter client on register_device() failure Alex Williamson
2026-06-11 21:35 ` [PATCH v2 3/5] vfio/pci: Fix racy bitfields and tighten struct layout Alex Williamson
2026-06-11 21:35 ` [PATCH v2 4/5] vfio/mlx5: " Alex Williamson
2026-06-11 21:35 ` [PATCH v2 5/5] vfio/pci: Latch all module parameters per device Alex Williamson
2026-06-12 6:25 ` Guixin Liu
2026-06-12 15:05 ` Alex Williamson
2026-06-12 8:40 ` fengchengwen
2026-06-12 14:11 ` Alex Williamson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox