* [PATCH V6 00/21] Live update: vfio and iommufd
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
NOTE: this V6 series depends on the patch
"vfio-user: do not register vfio-user container with cpr",
which is in vfio-next.
Support vfio and iommufd devices with the cpr-transfer live migration mode.
Devices that do not support live migration can still support cpr-transfer,
allowing live update to a new version of QEMU on the same host, with no loss
of guest connectivity.
No user-visible interfaces are added.
For legacy containers:
Pass vfio device descriptors to new QEMU. In new QEMU, during vfio_realize,
skip the ioctls that configure the device, because it is already configured.
Use VFIO_DMA_UNMAP_FLAG_VADDR to abandon the old VA's for DMA mapped
regions, and use VFIO_DMA_MAP_FLAG_VADDR to register the new VA in new
QEMU and update the locked memory accounting. The physical pages remain
pinned, because the descriptor of the device that locked them remains open,
so DMA to those pages continues without interruption. Mediated devices are
not supported, however, because they require the VA to always be valid, and
there is a brief window where no VA is registered.
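As a rough illustration of the VADDR handoff described above, the sketch below only builds the two type1 ioctl arguments and wraps the ioctl calls. The `sketch_*` names are hypothetical and are not functions from this series; the flags and structs come from `<linux/vfio.h>` (Linux 5.12 or later is assumed), and `container_fd` is assumed to be an open legacy container descriptor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Old QEMU: drop the VA of every mapping in the container; pages stay pinned. */
struct vfio_iommu_type1_dma_unmap sketch_vaddr_unmap_all(void)
{
    struct vfio_iommu_type1_dma_unmap unmap;

    memset(&unmap, 0, sizeof(unmap));
    unmap.argsz = sizeof(unmap);
    unmap.flags = VFIO_DMA_UNMAP_FLAG_VADDR | VFIO_DMA_UNMAP_FLAG_ALL;
    /* iova and size stay 0, as required when FLAG_ALL is set */
    return unmap;
}

/* New QEMU: register the new VA for an iova range that is already mapped. */
struct vfio_iommu_type1_dma_map sketch_vaddr_remap(uint64_t iova, uint64_t size,
                                                   void *new_va)
{
    struct vfio_iommu_type1_dma_map map;

    assert(new_va != NULL);
    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_VADDR;   /* READ/WRITE flags must be clear */
    map.vaddr = (uint64_t)(uintptr_t)new_va;
    map.iova = iova;                       /* must match the original mapping */
    map.size = size;
    return map;
}

/* Thin wrappers that issue the requests on a legacy container descriptor. */
int sketch_issue_unmap(int container_fd,
                       struct vfio_iommu_type1_dma_unmap *unmap)
{
    return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, unmap);
}

int sketch_issue_map(int container_fd, struct vfio_iommu_type1_dma_map *map)
{
    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, map);
}
```

Between the unmap in old QEMU and the map in new QEMU no VA is registered, which is the window that rules out mediated devices.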
Save the MSI message area as part of vfio-pci vmstate, and pass the interrupt
and notifier eventfd's to new QEMU. New QEMU loads the MSI data, then the
vfio-pci post_load handler finds the eventfds in CPR state, rebuilds vector
data structures, and attaches the interrupts to the new KVM instance. This
logic also applies to iommufd containers.
For iommufd containers:
Use IOMMU_IOAS_MAP_FILE to register memory regions for DMA when they are
backed by a file (including a memfd), so DMA mappings do not depend on VA,
which can differ after live update. This allows mediated devices to be
supported.
Pass the iommufd and vfio device descriptors from old to new QEMU. In new
QEMU, during vfio_realize, skip the ioctls that configure the device, because
it is already configured.
In new QEMU, call ioctl(IOMMU_IOAS_CHANGE_PROCESS) to update mm ownership and
locked memory accounting.
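The reason a file-backed mapping does not depend on VA can be shown without iommufd at all. The sketch below is only an illustration, not code from this series: it maps one memfd at two different addresses and checks that both see the same pages. `sketch_same_pages_at_two_vas` is a hypothetical name, and it does not call IOMMU_IOAS_MAP_FILE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one memfd twice. Returns 1 if a write through the first mapping is
 * visible through the second one at a different VA, 0 if not, -1 on error.
 */
int sketch_same_pages_at_two_vas(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    char *old_va, *new_va;
    int fd, ok;

    assert(page > 0);
    len = (size_t)page;

    fd = memfd_create("sketch-guest-ram", 0);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    /* "old QEMU" and "new QEMU" views of the same file offset */
    old_va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    new_va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (old_va == MAP_FAILED || new_va == MAP_FAILED) {
        if (old_va != MAP_FAILED) {
            munmap(old_va, len);
        }
        if (new_va != MAP_FAILED) {
            munmap(new_va, len);
        }
        close(fd);
        return -1;
    }

    strcpy(old_va, "dma payload");
    ok = old_va != new_va && strcmp(new_va, "dma payload") == 0;

    munmap(old_va, len);
    munmap(new_va, len);
    close(fd);
    return ok ? 1 : 0;
}
```

A mapping keyed by (fd, offset) therefore stays valid when the VA changes across live update, whereas a VA-keyed mapping has to be re-registered.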
Patches 3 to 8 are specific to legacy containers.
Patches 21 to 36 are specific to iommufd containers.
The remainder apply to both.
Changes from previous versions:
* V1 of this series contains minor changes from the "Live update: vfio" and
"Live update: iommufd" series, mainly bug fixes and refactored patches.
Changes in V2:
* refactored various vfio code snippets into new cpr helpers
* refactored vfio struct members into cpr-specific structures
* refactored various small changes into their own patches
* split complex patches. Notably:
- split "refactor for cpr" into 5 patches
- split "reconstruct device" into 4 patches
* refactored vfio_connect_container using helpers and made its
error recovery more robust.
* moved vfio pci msi/vector/intx cpr functions to cpr.c
* renamed "reused" to cpr_reused and cpr.reused
* squashed vfio_cpr_[un]register_container to their call sites
* simplified iommu_type setting after cpr
* added cpr_open_fd and cpr_is_incoming helpers
* removed changes from vfio_legacy_dma_map, and instead temporarily
override dma_map and dma_unmap ops.
* deleted error_report and returned Error to callers where possible.
* simplified the memory_get_xlat_addr interface
* fixed flags passed to iommufd_backend_alloc_hwpt
* defined MIG_PRI_UNINITIALIZED
* added maintainers
Changes in V3:
* removed cleanup patches that were already pulled
* rebased to latest master
Changes in V4:
* added SPDX-License-Identifier
* patch "vfio/container: preserve descriptors"
- rewrote search loop in vfio_container_connect
- do not return pfd from vfio_cpr_container_match
- add helper for VFIO_GROUP_GET_DEVICE_FD
* deleted patch "export vfio_legacy_dma_map"
* patch "vfio/container: restore DMA vaddr"
- deleted redundant error_report from vfio_legacy_cpr_dma_map
- save old dma_map function
* patch "vfio-pci: skip reset during cpr"
- use cpr_is_incoming instead of cpr_reused
* renamed err -> local_err in all new code
* patch "export MSI functions"
- renamed with vfio_pci prefix, and defined wrappers for low level
routines instead of exporting them.
* patch "close kvm after cpr"
- fixed build error for !CONFIG_KVM
* added the cpr_resave_fd helper
* dropped patch "pass ramblock to vfio_container_dma_map", relying on
"pass MemoryRegion" from the vfio-user series instead.
* deleted "reused" variables, replaced with cpr_is_incoming()
* renamed cpr_needed_for_reuse -> cpr_incoming_needed
* rewrote patch "pci: skip reset during cpr"
* rebased to latest master
for iommufd:
* deleted redundant error_report from iommufd_backend_map_file_dma
* added interface doc for dma_map_file
* check return value of cpr_open_fd
* deleted "export iommufd_cdev_get_info_iova_range"
* deleted "reconstruct device"
* deleted "reconstruct hw_caps"
* deleted "define hwpt constructors"
* separated cpr registration for iommufd be and vfio container
* correctly attach to multiple containers per iommufd using ioas_id
* simplified "reconstruct hwpt" by matching against hwpt_id.
* added patch "add vfio_device_free_name"
Changes in V5:
* dropped: vfio/pci: vfio_pci_put_device on failure
* added: "vfio: doc changes for cpr"
* deleted unnecessary include of vfio-cpr.h
* fixed compilation for !CONFIG_VFIO and !CONFIG_IOMMUFD
* misc minor changes
* Added RB's, rebased to master
Changes in V6:
* dropped already-pulled patches
* converted remaining g_free in "add vfio_device_free_name"
* fixed iommufd_backend_disconnect in "preserve descriptors"
* tweaked vfio_cpr_load_device in "preserve descriptors"
* added trace_vfio_cpr_find_device in "cpr state"
* rewrote vfio_notifier_init and vfio_msix_vector_use
* rewrote the notifier in "close kvm after cpr"
* Added RB's, rebased to master
Steve Sistare (21):
vfio-pci: preserve MSI
vfio-pci: preserve INTx
migration: close kvm after cpr
migration: cpr_get_fd_param helper
backends/iommufd: iommufd_backend_map_file_dma
backends/iommufd: change process ioctl
physmem: qemu_ram_get_fd_offset
vfio/iommufd: use IOMMU_IOAS_MAP_FILE
vfio/iommufd: invariant device name
vfio/iommufd: add vfio_device_free_name
vfio/iommufd: device name blocker
vfio/iommufd: register container for cpr
migration: vfio cpr state hook
vfio/iommufd: cpr state
vfio/iommufd: preserve descriptors
vfio/iommufd: reconstruct device
vfio/iommufd: reconstruct hwpt
vfio/iommufd: change process
iommufd: preserve DMA mappings
vfio/container: delete old cpr register
vfio: doc changes for cpr
docs/devel/migration/CPR.rst | 5 +-
qapi/migration.json | 6 +-
hw/vfio/pci.h | 2 +
include/exec/cpu-common.h | 1 +
include/hw/vfio/vfio-container-base.h | 15 +++
include/hw/vfio/vfio-cpr.h | 29 ++++-
include/hw/vfio/vfio-device.h | 3 +
include/migration/cpr.h | 14 +++
include/system/iommufd.h | 7 ++
include/system/kvm.h | 1 +
accel/kvm/kvm-all.c | 32 +++++
backends/iommufd.c | 107 +++++++++++++++-
hw/vfio/ap.c | 4 +-
hw/vfio/ccw.c | 4 +-
hw/vfio/container-base.c | 9 ++
hw/vfio/cpr-iommufd.c | 224 ++++++++++++++++++++++++++++++++++
hw/vfio/cpr-legacy.c | 2 +
hw/vfio/cpr.c | 144 ++++++++++++++++++++--
hw/vfio/device.c | 40 ++++--
hw/vfio/helpers.c | 11 ++
hw/vfio/iommufd-stubs.c | 18 +++
hw/vfio/iommufd.c | 81 ++++++++++--
hw/vfio/pci.c | 109 ++++++++++++++++-
hw/vfio/platform.c | 2 +-
migration/cpr.c | 52 ++++++--
system/physmem.c | 5 +
backends/trace-events | 2 +
hw/vfio/meson.build | 2 +
hw/vfio/trace-events | 3 +
29 files changed, 871 insertions(+), 63 deletions(-)
create mode 100644 hw/vfio/cpr-iommufd.c
create mode 100644 hw/vfio/iommufd-stubs.c
--
1.8.3.1
* [PATCH V6 01/21] vfio-pci: preserve MSI
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Save the MSI message area as part of vfio-pci vmstate, and preserve the
interrupt and notifier eventfd's. migrate_incoming loads the MSI data,
then the vfio-pci post_load handler finds the eventfds in CPR state,
rebuilds vector data structures, and attaches the interrupts to the new
KVM instance.
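The save/find/delete scheme for vector descriptors can be sketched as a small table keyed by a "<device>_<kind>" name plus a vector number. This is a standalone illustration with hypothetical `sketch_*` names; it only mimics the shape of the `cpr_save_fd`, `cpr_find_fd` and `cpr_delete_fd` calls used in the diff below, not their implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct SketchFd {
    char name[96];              /* "<device>_<kind>", e.g. "dev0_interrupt" */
    int id;                     /* vector number */
    int fd;
    struct SketchFd *next;
} SketchFd;

static SketchFd *sketch_fds;

static void sketch_key(char *key, size_t len, const char *dev,
                       const char *kind)
{
    snprintf(key, len, "%s_%s", dev, kind);
}

/* Remember fd under (device, kind, vector number). */
void sketch_save_vector_fd(const char *dev, const char *kind, int nr, int fd)
{
    SketchFd *e = calloc(1, sizeof(*e));

    assert(e != NULL);
    sketch_key(e->name, sizeof(e->name), dev, kind);
    e->id = nr;
    e->fd = fd;
    e->next = sketch_fds;
    sketch_fds = e;
}

/* Return the saved fd, or -1 if nothing was saved under this key. */
int sketch_load_vector_fd(const char *dev, const char *kind, int nr)
{
    char key[96];
    SketchFd *e;

    sketch_key(key, sizeof(key), dev, kind);
    for (e = sketch_fds; e; e = e->next) {
        if (e->id == nr && strcmp(e->name, key) == 0) {
            return e->fd;
        }
    }
    return -1;
}

/* Forget the entry; the descriptor itself is not closed here. */
void sketch_delete_vector_fd(const char *dev, const char *kind, int nr)
{
    char key[96];
    SketchFd **p;

    sketch_key(key, sizeof(key), dev, kind);
    for (p = &sketch_fds; *p; p = &(*p)->next) {
        if ((*p)->id == nr && strcmp((*p)->name, key) == 0) {
            SketchFd *e = *p;
            *p = e->next;
            free(e);
            return;
        }
    }
}
```

A lookup that returns -1 tells the caller to create a fresh eventfd, which is the branch vfio_notifier_init takes outside of CPR.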
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/vfio/pci.h | 2 +
include/hw/vfio/vfio-cpr.h | 8 ++++
hw/vfio/cpr.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++
hw/vfio/pci.c | 52 ++++++++++++++++++++++++-
4 files changed, 157 insertions(+), 2 deletions(-)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 5ba7330..495fae7 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -218,6 +218,8 @@ void vfio_pci_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
void vfio_pci_prepare_kvm_msi_virq_batch(VFIOPCIDevice *vdev);
void vfio_pci_commit_kvm_msi_virq_batch(VFIOPCIDevice *vdev);
bool vfio_pci_intx_enable(VFIOPCIDevice *vdev, Error **errp);
+void vfio_pci_msix_set_notifiers(VFIOPCIDevice *vdev);
+void vfio_pci_msi_set_handler(VFIOPCIDevice *vdev, int nr);
uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
void vfio_pci_write_config(PCIDevice *pdev,
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index 8bf85b9..25e74ee 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -15,6 +15,7 @@
struct VFIOContainer;
struct VFIOContainerBase;
struct VFIOGroup;
+struct VFIOPCIDevice;
typedef struct VFIOContainerCPR {
Error *blocker;
@@ -52,6 +53,13 @@ void vfio_cpr_giommu_remap(struct VFIOContainerBase *bcontainer,
bool vfio_cpr_ram_discard_register_listener(
struct VFIOContainerBase *bcontainer, MemoryRegionSection *section);
+void vfio_cpr_save_vector_fd(struct VFIOPCIDevice *vdev, const char *name,
+ int nr, int fd);
+int vfio_cpr_load_vector_fd(struct VFIOPCIDevice *vdev, const char *name,
+ int nr);
+void vfio_cpr_delete_vector_fd(struct VFIOPCIDevice *vdev, const char *name,
+ int nr);
+
extern const VMStateDescription vfio_cpr_pci_vmstate;
#endif /* HW_VFIO_VFIO_CPR_H */
diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
index fdbb58e..e467373 100644
--- a/hw/vfio/cpr.c
+++ b/hw/vfio/cpr.c
@@ -9,6 +9,8 @@
#include "hw/vfio/vfio-device.h"
#include "hw/vfio/vfio-cpr.h"
#include "hw/vfio/pci.h"
+#include "hw/pci/msix.h"
+#include "hw/pci/msi.h"
#include "migration/cpr.h"
#include "qapi/error.h"
#include "system/runstate.h"
@@ -40,6 +42,69 @@ void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer)
migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
}
+#define STRDUP_VECTOR_FD_NAME(vdev, name) \
+ g_strdup_printf("%s_%s", (vdev)->vbasedev.name, (name))
+
+void vfio_cpr_save_vector_fd(VFIOPCIDevice *vdev, const char *name, int nr,
+ int fd)
+{
+ g_autofree char *fdname = STRDUP_VECTOR_FD_NAME(vdev, name);
+ cpr_save_fd(fdname, nr, fd);
+}
+
+int vfio_cpr_load_vector_fd(VFIOPCIDevice *vdev, const char *name, int nr)
+{
+ g_autofree char *fdname = STRDUP_VECTOR_FD_NAME(vdev, name);
+ return cpr_find_fd(fdname, nr);
+}
+
+void vfio_cpr_delete_vector_fd(VFIOPCIDevice *vdev, const char *name, int nr)
+{
+ g_autofree char *fdname = STRDUP_VECTOR_FD_NAME(vdev, name);
+ cpr_delete_fd(fdname, nr);
+}
+
+static void vfio_cpr_claim_vectors(VFIOPCIDevice *vdev, int nr_vectors,
+ bool msix)
+{
+ int i, fd;
+ bool pending = false;
+ PCIDevice *pdev = &vdev->pdev;
+
+ vdev->nr_vectors = nr_vectors;
+ vdev->msi_vectors = g_new0(VFIOMSIVector, nr_vectors);
+ vdev->interrupt = msix ? VFIO_INT_MSIX : VFIO_INT_MSI;
+
+ vfio_pci_prepare_kvm_msi_virq_batch(vdev);
+
+ for (i = 0; i < nr_vectors; i++) {
+ VFIOMSIVector *vector = &vdev->msi_vectors[i];
+
+ fd = vfio_cpr_load_vector_fd(vdev, "interrupt", i);
+ if (fd >= 0) {
+ vfio_pci_vector_init(vdev, i);
+ vfio_pci_msi_set_handler(vdev, i);
+ }
+
+ if (vfio_cpr_load_vector_fd(vdev, "kvm_interrupt", i) >= 0) {
+ vfio_pci_add_kvm_msi_virq(vdev, vector, i, msix);
+ } else {
+ vdev->msi_vectors[i].virq = -1;
+ }
+
+ if (msix && msix_is_pending(pdev, i) && msix_is_masked(pdev, i)) {
+ set_bit(i, vdev->msix->pending);
+ pending = true;
+ }
+ }
+
+ vfio_pci_commit_kvm_msi_virq_batch(vdev);
+
+ if (msix) {
+ memory_region_set_enabled(&pdev->msix_pba_mmio, pending);
+ }
+}
+
/*
* The kernel may change non-emulated config bits. Exclude them from the
* changed-bits check in get_pci_config_device.
@@ -58,13 +123,45 @@ static int vfio_cpr_pci_pre_load(void *opaque)
return 0;
}
+static int vfio_cpr_pci_post_load(void *opaque, int version_id)
+{
+ VFIOPCIDevice *vdev = opaque;
+ PCIDevice *pdev = &vdev->pdev;
+ int nr_vectors;
+
+ if (msix_enabled(pdev)) {
+ vfio_pci_msix_set_notifiers(vdev);
+ nr_vectors = vdev->msix->entries;
+ vfio_cpr_claim_vectors(vdev, nr_vectors, true);
+
+ } else if (msi_enabled(pdev)) {
+ nr_vectors = msi_nr_vectors_allocated(pdev);
+ vfio_cpr_claim_vectors(vdev, nr_vectors, false);
+
+ } else if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
+ g_assert_not_reached(); /* completed in a subsequent patch */
+ }
+
+ return 0;
+}
+
+static bool pci_msix_present(void *opaque, int version_id)
+{
+ PCIDevice *pdev = opaque;
+
+ return msix_present(pdev);
+}
+
const VMStateDescription vfio_cpr_pci_vmstate = {
.name = "vfio-cpr-pci",
.version_id = 0,
.minimum_version_id = 0,
.pre_load = vfio_cpr_pci_pre_load,
+ .post_load = vfio_cpr_pci_post_load,
.needed = cpr_incoming_needed,
.fields = (VMStateField[]) {
+ VMSTATE_PCI_DEVICE(pdev, VFIOPCIDevice),
+ VMSTATE_MSIX_TEST(pdev, VFIOPCIDevice, pci_msix_present),
VMSTATE_END_OF_LIST()
}
};
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index fa25bde..5f9f264 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -29,6 +29,7 @@
#include "hw/pci/pci_bridge.h"
#include "hw/qdev-properties.h"
#include "hw/qdev-properties-system.h"
+#include "hw/vfio/vfio-cpr.h"
#include "migration/vmstate.h"
#include "migration/cpr.h"
#include "qobject/qdict.h"
@@ -57,20 +58,33 @@ static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
static void vfio_msi_disable_common(VFIOPCIDevice *vdev);
+/* Create new or reuse existing eventfd */
static bool vfio_notifier_init(VFIOPCIDevice *vdev, EventNotifier *e,
const char *name, int nr, Error **errp)
{
- int ret = event_notifier_init(e, 0);
+ int fd, ret;
+ fd = vfio_cpr_load_vector_fd(vdev, name, nr);
+ if (fd >= 0) {
+ event_notifier_init_fd(e, fd);
+ return true;
+ }
+
+ ret = event_notifier_init(e, 0);
if (ret) {
error_setg_errno(errp, -ret, "vfio_notifier_init %s failed", name);
+ return false;
}
- return !ret;
+
+ fd = event_notifier_get_fd(e);
+ vfio_cpr_save_vector_fd(vdev, name, nr, fd);
+ return true;
}
static void vfio_notifier_cleanup(VFIOPCIDevice *vdev, EventNotifier *e,
const char *name, int nr)
{
+ vfio_cpr_delete_vector_fd(vdev, name, nr);
event_notifier_cleanup(e);
}
@@ -394,6 +408,14 @@ static void vfio_msi_interrupt(void *opaque)
notify(&vdev->pdev, nr);
}
+void vfio_pci_msi_set_handler(VFIOPCIDevice *vdev, int nr)
+{
+ VFIOMSIVector *vector = &vdev->msi_vectors[nr];
+ int fd = event_notifier_get_fd(&vector->interrupt);
+
+ qemu_set_fd_handler(fd, vfio_msi_interrupt, NULL, vector);
+}
+
/*
* Get MSI-X enabled, but no vector enabled, by setting vector 0 with an invalid
* fd to kernel.
@@ -656,6 +678,15 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
static int vfio_msix_vector_use(PCIDevice *pdev,
unsigned int nr, MSIMessage msg)
{
+ /*
+ * Ignore the callback from msix_set_vector_notifiers during resume.
+ * The necessary subset of these actions is called from
+ * vfio_cpr_claim_vectors during post load.
+ */
+ if (cpr_is_incoming()) {
+ return 0;
+ }
+
return vfio_msix_vector_do_use(pdev, nr, &msg, vfio_msi_interrupt);
}
@@ -686,6 +717,12 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
}
}
+void vfio_pci_msix_set_notifiers(VFIOPCIDevice *vdev)
+{
+ msix_set_vector_notifiers(&vdev->pdev, vfio_msix_vector_use,
+ vfio_msix_vector_release, NULL);
+}
+
void vfio_pci_prepare_kvm_msi_virq_batch(VFIOPCIDevice *vdev)
{
assert(!vdev->defer_kvm_irq_routing);
@@ -2965,6 +3002,11 @@ void vfio_pci_register_err_notifier(VFIOPCIDevice *vdev)
fd = event_notifier_get_fd(&vdev->err_notifier);
qemu_set_fd_handler(fd, vfio_err_notifier_handler, NULL, vdev);
+ /* Do not alter irq_signaling during vfio_realize for cpr */
+ if (cpr_is_incoming()) {
+ return;
+ }
+
if (!vfio_device_irq_set_signaling(&vdev->vbasedev, VFIO_PCI_ERR_IRQ_INDEX, 0,
VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
@@ -3032,6 +3074,12 @@ void vfio_pci_register_req_notifier(VFIOPCIDevice *vdev)
fd = event_notifier_get_fd(&vdev->req_notifier);
qemu_set_fd_handler(fd, vfio_req_notifier_handler, NULL, vdev);
+ /* Do not alter irq_signaling during vfio_realize for cpr */
+ if (cpr_is_incoming()) {
+ vdev->req_enabled = true;
+ return;
+ }
+
if (!vfio_device_irq_set_signaling(&vdev->vbasedev, VFIO_PCI_REQ_IRQ_INDEX, 0,
VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
--
1.8.3.1
* [PATCH V6 02/21] vfio-pci: preserve INTx
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Preserve vfio INTx state across cpr-transfer. Preserve VFIOINTx fields as
follows:
pin : Recover this from the vfio config in kernel space
interrupt : Preserve its eventfd descriptor across exec.
unmask : Ditto
route.irq : This could perhaps be recovered in vfio_pci_post_load by
calling pci_device_route_intx_to_irq(pin), whose implementation reads
config space for a bridge device such as ich9. However, there is no
guarantee that the bridge vmstate is read before vfio vmstate. Rather
than fiddling with MigrationPriority for vmstate handlers, explicitly
save route.irq in vfio vmstate.
pending : save in vfio vmstate.
mmap_timeout, mmap_timer : Re-initialize
kvm_accel : Re-initialize
In vfio_realize, defer calling vfio_intx_enable until the vmstate
is available, in vfio_pci_post_load. Modify vfio_intx_enable and
vfio_intx_kvm_enable to skip vfio initialization, but still perform
kvm initialization.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/vfio/cpr.c | 27 ++++++++++++++++++++++++++-
hw/vfio/pci.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 79 insertions(+), 3 deletions(-)
diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
index e467373..f5555ca 100644
--- a/hw/vfio/cpr.c
+++ b/hw/vfio/cpr.c
@@ -139,7 +139,11 @@ static int vfio_cpr_pci_post_load(void *opaque, int version_id)
vfio_cpr_claim_vectors(vdev, nr_vectors, false);
} else if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
- g_assert_not_reached(); /* completed in a subsequent patch */
+ Error *local_err = NULL;
+ if (!vfio_pci_intx_enable(vdev, &local_err)) {
+ error_report_err(local_err);
+ return -1;
+ }
}
return 0;
@@ -152,6 +156,26 @@ static bool pci_msix_present(void *opaque, int version_id)
return msix_present(pdev);
}
+static const VMStateDescription vfio_intx_vmstate = {
+ .name = "vfio-cpr-intx",
+ .version_id = 0,
+ .minimum_version_id = 0,
+ .fields = (VMStateField[]) {
+ VMSTATE_BOOL(pending, VFIOINTx),
+ VMSTATE_UINT32(route.mode, VFIOINTx),
+ VMSTATE_INT32(route.irq, VFIOINTx),
+ VMSTATE_END_OF_LIST()
+ }
+};
+
+#define VMSTATE_VFIO_INTX(_field, _state) { \
+ .name = (stringify(_field)), \
+ .size = sizeof(VFIOINTx), \
+ .vmsd = &vfio_intx_vmstate, \
+ .flags = VMS_STRUCT, \
+ .offset = vmstate_offset_value(_state, _field, VFIOINTx), \
+}
+
const VMStateDescription vfio_cpr_pci_vmstate = {
.name = "vfio-cpr-pci",
.version_id = 0,
@@ -162,6 +186,7 @@ const VMStateDescription vfio_cpr_pci_vmstate = {
.fields = (VMStateField[]) {
VMSTATE_PCI_DEVICE(pdev, VFIOPCIDevice),
VMSTATE_MSIX_TEST(pdev, VFIOPCIDevice, pci_msix_present),
+ VMSTATE_VFIO_INTX(intx, VFIOPCIDevice),
VMSTATE_END_OF_LIST()
}
};
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5f9f264..dd0b2a0 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -210,6 +210,36 @@ fail:
#endif
}
+static bool vfio_cpr_intx_enable_kvm(VFIOPCIDevice *vdev, Error **errp)
+{
+#ifdef CONFIG_KVM
+ if (vdev->no_kvm_intx || !kvm_irqfds_enabled() ||
+ vdev->intx.route.mode != PCI_INTX_ENABLED ||
+ !kvm_resamplefds_enabled()) {
+ return true;
+ }
+
+ if (!vfio_notifier_init(vdev, &vdev->intx.unmask, "intx-unmask", 0, errp)) {
+ return false;
+ }
+
+ if (kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+ &vdev->intx.interrupt,
+ &vdev->intx.unmask,
+ vdev->intx.route.irq)) {
+ error_setg_errno(errp, errno, "failed to setup resample irqfd");
+ vfio_notifier_cleanup(vdev, &vdev->intx.unmask, "intx-unmask", 0);
+ return false;
+ }
+
+ vdev->intx.kvm_accel = true;
+ trace_vfio_intx_enable_kvm(vdev->vbasedev.name);
+ return true;
+#else
+ return true;
+#endif
+}
+
static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
{
#ifdef CONFIG_KVM
@@ -305,7 +335,13 @@ static bool vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
return true;
}
- vfio_disable_interrupts(vdev);
+ /*
+ * Do not alter interrupt state during vfio_realize and cpr load.
+ * The incoming state is cleared thereafter.
+ */
+ if (!cpr_is_incoming()) {
+ vfio_disable_interrupts(vdev);
+ }
vdev->intx.pin = pin - 1; /* Pin A (1) -> irq[0] */
pci_config_set_interrupt_pin(vdev->pdev.config, pin);
@@ -328,6 +364,14 @@ static bool vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
fd = event_notifier_get_fd(&vdev->intx.interrupt);
qemu_set_fd_handler(fd, vfio_intx_interrupt, NULL, vdev);
+
+ if (cpr_is_incoming()) {
+ if (!vfio_cpr_intx_enable_kvm(vdev, &err)) {
+ warn_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+ }
+ goto skip_signaling;
+ }
+
if (!vfio_device_irq_set_signaling(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX, 0,
VFIO_IRQ_SET_ACTION_TRIGGER, fd, errp)) {
qemu_set_fd_handler(fd, NULL, NULL, vdev);
@@ -339,6 +383,7 @@ static bool vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
warn_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
}
+skip_signaling:
vdev->interrupt = VFIO_INT_INTx;
trace_vfio_intx_enable(vdev->vbasedev.name);
@@ -3237,7 +3282,13 @@ bool vfio_pci_interrupt_setup(VFIOPCIDevice *vdev, Error **errp)
vfio_intx_routing_notifier);
vdev->irqchip_change_notifier.notify = vfio_irqchip_change;
kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
- if (!vfio_intx_enable(vdev, errp)) {
+
+ /*
+ * During CPR, do not call vfio_intx_enable at this time. Instead,
+ * call it from vfio_pci_post_load after the intx routing data has
+ * been loaded from vmstate.
+ */
+ if (!cpr_is_incoming() && !vfio_intx_enable(vdev, errp)) {
timer_free(vdev->intx.mmap_timer);
pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
--
1.8.3.1
* [PATCH V6 03/21] migration: close kvm after cpr
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
cpr-transfer breaks vfio network connectivity to and from the guest, and
the host system log shows:
irq bypass consumer (token 00000000a03c32e5) registration fails: -16
which is EBUSY. This occurs because KVM descriptors are still open in
the old QEMU process. Close them.
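The close-and-invalidate pattern behind this fix can be sketched in isolation. The struct and function names below are hypothetical stand-ins, not QEMU types; the point is only that each descriptor is closed once, set to -1, and that a second call is harmless.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define SKETCH_NR_VCPUS 2

typedef struct SketchKvm {
    int sys_fd;                       /* stands in for the /dev/kvm fd */
    int vm_fd;                        /* stands in for the VM fd */
    int vcpu_fd[SKETCH_NR_VCPUS];     /* stand in for per-vCPU fds */
} SketchKvm;

/* Close *fd if it is open and mark it invalid so a repeat call is a no-op. */
void sketch_close_fd(int *fd)
{
    assert(fd != NULL);
    if (*fd != -1) {
        close(*fd);
        *fd = -1;
    }
}

/* Release vCPU descriptors first, then the VM, then the system descriptor. */
void sketch_kvm_close(SketchKvm *s)
{
    int i;

    for (i = 0; i < SKETCH_NR_VCPUS; i++) {
        sketch_close_fd(&s->vcpu_fd[i]);
    }
    sketch_close_fd(&s->vm_fd);
    sketch_close_fd(&s->sys_fd);
}
```

Once old QEMU holds no KVM descriptors, the kernel can drop the old VM, which is presumably what lets new QEMU register its irq bypass consumer without hitting EBUSY.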
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
include/hw/vfio/vfio-cpr.h | 2 ++
include/hw/vfio/vfio-device.h | 2 ++
include/system/kvm.h | 1 +
accel/kvm/kvm-all.c | 32 ++++++++++++++++++++++++++++++++
hw/vfio/cpr-legacy.c | 2 ++
hw/vfio/cpr.c | 21 +++++++++++++++++++++
hw/vfio/helpers.c | 11 +++++++++++
7 files changed, 71 insertions(+)
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index 25e74ee..099d54f 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -62,4 +62,6 @@ void vfio_cpr_delete_vector_fd(struct VFIOPCIDevice *vdev, const char *name,
extern const VMStateDescription vfio_cpr_pci_vmstate;
+void vfio_cpr_add_kvm_notifier(void);
+
#endif /* HW_VFIO_VFIO_CPR_H */
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index c616652..f503837 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -283,4 +283,6 @@ void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
DeviceState *dev, bool ram_discard);
int vfio_device_get_aw_bits(VFIODevice *vdev);
+
+void vfio_kvm_device_close(void);
#endif /* HW_VFIO_VFIO_COMMON_H */
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 7cc60d2..4896a3c 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -195,6 +195,7 @@ bool kvm_has_sync_mmu(void);
int kvm_has_vcpu_events(void);
int kvm_max_nested_state_length(void);
int kvm_has_gsi_routing(void);
+void kvm_close(void);
/**
* kvm_arm_supports_user_irq
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index d095d1b..8141854 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -515,16 +515,23 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
goto err;
}
+ /* If I am the CPU that created coalesced_mmio_ring, then discard it */
+ if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
+ s->coalesced_mmio_ring = NULL;
+ }
+
ret = munmap(cpu->kvm_run, mmap_size);
if (ret < 0) {
goto err;
}
+ cpu->kvm_run = NULL;
if (cpu->kvm_dirty_gfns) {
ret = munmap(cpu->kvm_dirty_gfns, s->kvm_dirty_ring_bytes);
if (ret < 0) {
goto err;
}
+ cpu->kvm_dirty_gfns = NULL;
}
kvm_park_vcpu(cpu);
@@ -608,6 +615,31 @@ err:
return ret;
}
+void kvm_close(void)
+{
+ CPUState *cpu;
+
+ if (!kvm_state || kvm_state->fd == -1) {
+ return;
+ }
+
+ CPU_FOREACH(cpu) {
+ cpu_remove_sync(cpu);
+ close(cpu->kvm_fd);
+ cpu->kvm_fd = -1;
+ close(cpu->kvm_vcpu_stats_fd);
+ cpu->kvm_vcpu_stats_fd = -1;
+ }
+
+ if (kvm_state && kvm_state->fd != -1) {
+ close(kvm_state->vmfd);
+ kvm_state->vmfd = -1;
+ close(kvm_state->fd);
+ kvm_state->fd = -1;
+ }
+ kvm_state = NULL;
+}
+
/*
* dirty pages logging control
*/
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
index a84c324..daa3523 100644
--- a/hw/vfio/cpr-legacy.c
+++ b/hw/vfio/cpr-legacy.c
@@ -177,6 +177,8 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
MIG_MODE_CPR_TRANSFER, -1) == 0;
}
+ vfio_cpr_add_kvm_notifier();
+
vmstate_register(NULL, -1, &vfio_container_vmstate, container);
/* During incoming CPR, divert calls to dma_map. */
diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
index f5555ca..0e903cd 100644
--- a/hw/vfio/cpr.c
+++ b/hw/vfio/cpr.c
@@ -190,3 +190,24 @@ const VMStateDescription vfio_cpr_pci_vmstate = {
VMSTATE_END_OF_LIST()
}
};
+
+static NotifierWithReturn kvm_close_notifier;
+
+static int vfio_cpr_kvm_close_notifier(NotifierWithReturn *notifier,
+ MigrationEvent *e,
+ Error **errp)
+{
+ if (e->type == MIG_EVENT_PRECOPY_DONE) {
+ vfio_kvm_device_close();
+ }
+ return 0;
+}
+
+void vfio_cpr_add_kvm_notifier(void)
+{
+ if (!kvm_close_notifier.notify) {
+ migration_add_notifier_mode(&kvm_close_notifier,
+ vfio_cpr_kvm_close_notifier,
+ MIG_MODE_CPR_TRANSFER);
+ }
+}
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index d0dbab1..9a5f621 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -117,6 +117,17 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
int vfio_kvm_device_fd = -1;
#endif
+void vfio_kvm_device_close(void)
+{
+#ifdef CONFIG_KVM
+ kvm_close();
+ if (vfio_kvm_device_fd != -1) {
+ close(vfio_kvm_device_fd);
+ vfio_kvm_device_fd = -1;
+ }
+#endif
+}
+
int vfio_kvm_device_add_fd(int fd, Error **errp)
{
#ifdef CONFIG_KVM
--
1.8.3.1
* [PATCH V6 04/21] migration: cpr_get_fd_param helper
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Add the helper function cpr_get_fd_param, to use when preserving
a file descriptor that is opened externally and passed to QEMU.
cpr_get_fd_param returns a descriptor number either from a QEMU
command-line parameter, from a getfd command, or from CPR state.
When a descriptor is passed to new QEMU via SCM_RIGHTS, its number
changes. Hence, during CPR, the command-line parameter is ignored
in new QEMU, and overridden by the value found in CPR state.
Similarly, if the descriptor was originally specified by a getfd
command in old QEMU, the fd number is not known outside of QEMU,
and it changes when sent to new QEMU via SCM_RIGHTS. Hence the
user cannot send getfd to new QEMU, but when the user sends a
hotplug command that references the fd, cpr_get_fd_param finds
its value in CPR state.
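The renumbering described above is easy to observe with plain POSIX calls. The sketch below is an illustration only: `sketch_send_fd` and `sketch_recv_fd` are hypothetical helpers, not part of QEMU. It passes a descriptor over a UNIX socket with SCM_RIGHTS; the receiver gets a new descriptor number that refers to the same open file.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send fd over a connected UNIX socket. Returns 0 on success, -1 on error. */
int sketch_send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    struct cmsghdr *c;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd. Returns the new descriptor number, or -1 on error. */
int sketch_recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    struct cmsghdr *c;
    int fd;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```

Because the number seen by the receiver is chosen by the kernel, any fd number supplied on the new QEMU command line cannot be trusted during CPR, so the lookup goes through the saved name and index instead.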
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
include/migration/cpr.h | 2 ++
migration/cpr.c | 37 +++++++++++++++++++++++++++++++++++++
2 files changed, 39 insertions(+)
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index 07858e9..eb27a93 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -32,6 +32,8 @@ void cpr_state_close(void);
struct QIOChannel *cpr_state_ioc(void);
bool cpr_incoming_needed(void *opaque);
+int cpr_get_fd_param(const char *name, const char *fdname, int index,
+ Error **errp);
QEMUFile *cpr_transfer_output(MigrationChannel *channel, Error **errp);
QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp);
diff --git a/migration/cpr.c b/migration/cpr.c
index a50a57e..535d587 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -13,6 +13,7 @@
#include "migration/qemu-file.h"
#include "migration/savevm.h"
#include "migration/vmstate.h"
+#include "monitor/monitor.h"
#include "system/runstate.h"
#include "trace.h"
@@ -264,3 +265,39 @@ bool cpr_incoming_needed(void *opaque)
MigMode mode = migrate_mode();
return mode == MIG_MODE_CPR_TRANSFER;
}
+
+/*
+ * cpr_get_fd_param: find a descriptor and return its value.
+ *
+ * @name: CPR name for the descriptor
+ * @fdname: An integer-valued string, or a name passed to a getfd command
+ * @index: CPR index of the descriptor
+ * @errp: returned error message
+ *
+ * If CPR is not being performed, then use @fdname to find the fd.
+ * If CPR is being performed, then ignore @fdname, and look for @name
+ * and @index in CPR state.
+ *
+ * On success returns the fd value, else returns -1.
+ */
+int cpr_get_fd_param(const char *name, const char *fdname, int index,
+                     Error **errp)
+{
+    ERRP_GUARD();
+    int fd;
+
+    if (cpr_is_incoming()) {
+        fd = cpr_find_fd(name, index);
+        if (fd < 0) {
+            error_setg(errp, "cannot find saved value for fd %s", fdname);
+        }
+    } else {
+        fd = monitor_fd_param(monitor_cur(), fdname, errp);
+        if (fd >= 0) {
+            cpr_save_fd(name, index, fd);
+        } else {
+            error_prepend(errp, "Could not parse object fd %s:", fdname);
+        }
+    }
+    return fd;
+}
--
1.8.3.1
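To illustrate the lookup rule described in the commit message of
patch 04/21, here is a minimal standalone sketch. It is not QEMU code:
the sketch_* names, the fixed-size table, and the integer-only parsing
of fdname are assumptions made for illustration, and names passed to a
getfd command are not modeled.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one (name, index) -> fd entry in CPR state. */
struct sketch_cpr_fd {
    const char *name;
    int index;
    int fd;
};

#define SKETCH_MAX_FDS 8
static struct sketch_cpr_fd sketch_state[SKETCH_MAX_FDS];
static int sketch_count;

/* Record an fd under (name, index). Silently drops entries when full. */
static void sketch_save_fd(const char *name, int index, int fd)
{
    if (sketch_count < SKETCH_MAX_FDS) {
        sketch_state[sketch_count++] =
            (struct sketch_cpr_fd){ name, index, fd };
    }
}

/* Return the saved fd for (name, index), or -1 if there is none. */
static int sketch_find_fd(const char *name, int index)
{
    for (int i = 0; i < sketch_count; i++) {
        if (sketch_state[i].index == index &&
            strcmp(sketch_state[i].name, name) == 0) {
            return sketch_state[i].fd;
        }
    }
    return -1;
}

/*
 * Model of the rule: when CPR state is being loaded ("incoming"), ignore
 * fdname and use the saved value. Otherwise parse fdname and save the
 * result so that a later update can find it.
 */
static int sketch_get_fd_param(bool incoming, const char *name,
                               const char *fdname, int index)
{
    char *end;
    long value;

    if (incoming) {
        return sketch_find_fd(name, index);
    }

    value = strtol(fdname, &end, 10);
    if (end == fdname || *end != '\0' || value < 0 || value > INT_MAX) {
        return -1;
    }
    sketch_save_fd(name, index, (int)value);
    return (int)value;
}
```

In this model, if the saved state holds fd 23 for ("vfio0", 0), an
incoming lookup with fdname "17" yields 23, which mirrors the
command-line value being overridden by CPR state.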
* [PATCH V6 05/21] backends/iommufd: iommufd_backend_map_file_dma
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (3 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 04/21] migration: cpr_get_fd_param helper Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-02 21:58 ` [PATCH V6 06/21] backends/iommufd: change process ioctl Steve Sistare
` (16 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Define iommufd_backend_map_file_dma to implement IOMMU_IOAS_MAP_FILE.
This will be called as a substitute for iommufd_backend_map_dma, so
the error conditions for BARs are copied as-is from that function.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/system/iommufd.h | 3 +++
backends/iommufd.c | 34 ++++++++++++++++++++++++++++++++++
backends/trace-events | 1 +
3 files changed, 38 insertions(+)
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index 283861b..2d24d93 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -43,6 +43,9 @@ void iommufd_backend_disconnect(IOMMUFDBackend *be);
bool iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
Error **errp);
void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id);
+int iommufd_backend_map_file_dma(IOMMUFDBackend *be, uint32_t ioas_id,
+ hwaddr iova, ram_addr_t size, int fd,
+ unsigned long start, bool readonly);
int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
ram_addr_t size, void *vaddr, bool readonly);
int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
diff --git a/backends/iommufd.c b/backends/iommufd.c
index c2c47ab..3a2ecc7 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -172,6 +172,40 @@ int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
return ret;
}
+int iommufd_backend_map_file_dma(IOMMUFDBackend *be, uint32_t ioas_id,
+                                 hwaddr iova, ram_addr_t size,
+                                 int mfd, unsigned long start, bool readonly)
+{
+    int ret, fd = be->fd;
+    struct iommu_ioas_map_file map = {
+        .size = sizeof(map),
+        .flags = IOMMU_IOAS_MAP_READABLE |
+                 IOMMU_IOAS_MAP_FIXED_IOVA,
+        .ioas_id = ioas_id,
+        .fd = mfd,
+        .start = start,
+        .iova = iova,
+        .length = size,
+    };
+
+    if (!readonly) {
+        map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
+    }
+
+    ret = ioctl(fd, IOMMU_IOAS_MAP_FILE, &map);
+    trace_iommufd_backend_map_file_dma(fd, ioas_id, iova, size, mfd, start,
+                                       readonly, ret);
+    if (ret) {
+        ret = -errno;
+
+        /* TODO: Not support mapping hardware PCI BAR region for now. */
+        if (errno == EFAULT) {
+            warn_report("IOMMU_IOAS_MAP_FILE failed: %m, PCI BAR?");
+        }
+    }
+    return ret;
+}
+
int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
hwaddr iova, ram_addr_t size)
{
diff --git a/backends/trace-events b/backends/trace-events
index 7278214..e5f3e70 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -11,6 +11,7 @@ iommufd_backend_connect(int fd, bool owned, uint32_t users) "fd=%d owned=%d user
iommufd_backend_disconnect(int fd, uint32_t users) "fd=%d users=%d"
iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
+iommufd_backend_map_file_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int fd, unsigned long start, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" fd=%d start=%ld readonly=%d (%d)"
iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
--
1.8.3.1
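As a small illustration of how the map flags in patch 05/21 are
derived from the readonly argument, here is a standalone sketch. The
SKETCH_MAP_* constants are hypothetical stand-ins for the kernel's
IOMMU_IOAS_MAP_* flags, and their bit positions are an assumption of
this sketch; real code should take the values from linux/iommufd.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the IOMMU_IOAS_MAP_* flag bits. */
enum {
    SKETCH_MAP_FIXED_IOVA = 1u << 0,
    SKETCH_MAP_WRITEABLE  = 1u << 1,
    SKETCH_MAP_READABLE   = 1u << 2,
};

/*
 * Every mapping is readable and placed at a caller-chosen IOVA.
 * Write access is added unless the caller asks for a read-only mapping.
 */
static uint32_t sketch_map_flags(bool readonly)
{
    uint32_t flags = SKETCH_MAP_READABLE | SKETCH_MAP_FIXED_IOVA;

    if (!readonly) {
        flags |= SKETCH_MAP_WRITEABLE;
    }
    return flags;
}
```

The same derivation applies whether the mapping is described by a
virtual address or by a file descriptor and offset; only the ioctl
argument structure differs.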
* [PATCH V6 06/21] backends/iommufd: change process ioctl
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (4 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 05/21] backends/iommufd: iommufd_backend_map_file_dma Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-02 21:58 ` [PATCH V6 07/21] physmem: qemu_ram_get_fd_offset Steve Sistare
` (15 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
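Define wrappers for the IOMMU_IOAS_CHANGE_PROCESS ioctl:
iommufd_change_process_capable, which probes for kernel support, and
iommufd_change_process, which performs the change.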
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/system/iommufd.h | 3 +++
backends/iommufd.c | 24 ++++++++++++++++++++++++
backends/trace-events | 1 +
3 files changed, 28 insertions(+)
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index 2d24d93..db5f2c7 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -69,6 +69,9 @@ bool iommufd_backend_invalidate_cache(IOMMUFDBackend *be, uint32_t id,
uint32_t *entry_num, void *data,
Error **errp);
+bool iommufd_change_process_capable(IOMMUFDBackend *be);
+bool iommufd_change_process(IOMMUFDBackend *be, Error **errp);
+
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
OBJECT_DECLARE_TYPE(HostIOMMUDeviceIOMMUFD, HostIOMMUDeviceIOMMUFDClass,
HOST_IOMMU_DEVICE_IOMMUFD)
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 3a2ecc7..87f81a0 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -73,6 +73,30 @@ static void iommufd_backend_class_init(ObjectClass *oc, const void *data)
object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
}
+bool iommufd_change_process_capable(IOMMUFDBackend *be)
+{
+    struct iommu_ioas_change_process args = {.size = sizeof(args)};
+
+    /*
+     * Call IOMMU_IOAS_CHANGE_PROCESS to verify it is a recognized ioctl.
+     * This is a no-op if the process has not changed since DMA was mapped.
+     */
+    return !ioctl(be->fd, IOMMU_IOAS_CHANGE_PROCESS, &args);
+}
+
+bool iommufd_change_process(IOMMUFDBackend *be, Error **errp)
+{
+    struct iommu_ioas_change_process args = {.size = sizeof(args)};
+    bool ret = !ioctl(be->fd, IOMMU_IOAS_CHANGE_PROCESS, &args);
+
+    if (!ret) {
+        error_setg_errno(errp, errno, "IOMMU_IOAS_CHANGE_PROCESS fd %d failed",
+                         be->fd);
+    }
+    trace_iommufd_change_process(be->fd, ret);
+    return ret;
+}
+
bool iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
{
int fd;
diff --git a/backends/trace-events b/backends/trace-events
index e5f3e70..56132d3 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -7,6 +7,7 @@ dbus_vmstate_loading(const char *id) "id: %s"
dbus_vmstate_saving(const char *id) "id: %s"
# iommufd.c
+iommufd_change_process(int fd, bool ret) "fd=%d (%d)"
iommufd_backend_connect(int fd, bool owned, uint32_t users) "fd=%d owned=%d users=%d"
iommufd_backend_disconnect(int fd, uint32_t users) "fd=%d users=%d"
iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
--
1.8.3.1
* [PATCH V6 07/21] physmem: qemu_ram_get_fd_offset
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (5 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 06/21] backends/iommufd: change process ioctl Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-02 21:58 ` [PATCH V6 08/21] vfio/iommufd: use IOMMU_IOAS_MAP_FILE Steve Sistare
` (14 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Define qemu_ram_get_fd_offset, so CPR can map a memory region using
IOMMU_IOAS_MAP_FILE in a subsequent patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/exec/cpu-common.h | 1 +
system/physmem.c | 5 +++++
2 files changed, 6 insertions(+)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index a684855..9b658a3 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -85,6 +85,7 @@ void qemu_ram_unset_idstr(RAMBlock *block);
const char *qemu_ram_get_idstr(RAMBlock *rb);
void *qemu_ram_get_host_addr(RAMBlock *rb);
ram_addr_t qemu_ram_get_offset(RAMBlock *rb);
+ram_addr_t qemu_ram_get_fd_offset(RAMBlock *rb);
ram_addr_t qemu_ram_get_used_length(RAMBlock *rb);
ram_addr_t qemu_ram_get_max_length(RAMBlock *rb);
bool qemu_ram_is_shared(RAMBlock *rb);
diff --git a/system/physmem.c b/system/physmem.c
index ff0ca40..130c148 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1593,6 +1593,11 @@ ram_addr_t qemu_ram_get_offset(RAMBlock *rb)
return rb->offset;
}
+ram_addr_t qemu_ram_get_fd_offset(RAMBlock *rb)
+{
+    return rb->fd_offset;
+}
+
ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
{
return rb->used_length;
--
1.8.3.1
* [PATCH V6 08/21] vfio/iommufd: use IOMMU_IOAS_MAP_FILE
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (6 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 07/21] physmem: qemu_ram_get_fd_offset Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-02 21:58 ` [PATCH V6 09/21] vfio/iommufd: invariant device name Steve Sistare
` (13 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Use IOMMU_IOAS_MAP_FILE when the mapped region is backed by a file.
Such a mapping can be preserved without modification during CPR,
because it depends on the file's address space, which does not change,
rather than on the process's address space, which does change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/hw/vfio/vfio-container-base.h | 15 +++++++++++++++
hw/vfio/container-base.c | 9 +++++++++
hw/vfio/iommufd.c | 13 +++++++++++++
3 files changed, 37 insertions(+)
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 3cd86ec..bded6e9 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -168,6 +168,21 @@ struct VFIOIOMMUClass {
hwaddr iova, ram_addr_t size,
void *vaddr, bool readonly, MemoryRegion *mr);
/**
+ * @dma_map_file
+ *
+ * Map a file range for the container.
+ *
+ * @bcontainer: #VFIOContainerBase to use for map
+ * @iova: start address to map
+ * @size: size of the range to map
+ * @fd: descriptor of the file to map
+ * @start: starting file offset of the range to map
+ * @readonly: map read only if true
+ */
+ int (*dma_map_file)(const VFIOContainerBase *bcontainer,
+ hwaddr iova, ram_addr_t size,
+ int fd, unsigned long start, bool readonly);
+ /**
* @dma_unmap
*
* Unmap an address range from the container.
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index d834bd4..5630497 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -78,7 +78,16 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
void *vaddr, bool readonly, MemoryRegion *mr)
{
VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
+ RAMBlock *rb = mr->ram_block;
+ int mfd = rb ? qemu_ram_get_fd(rb) : -1;
+ if (mfd >= 0 && vioc->dma_map_file) {
+ unsigned long start = vaddr - qemu_ram_get_host_addr(rb);
+ unsigned long offset = qemu_ram_get_fd_offset(rb);
+
+ return vioc->dma_map_file(bcontainer, iova, size, mfd, start + offset,
+ readonly);
+ }
g_assert(vioc->dma_map);
return vioc->dma_map(bcontainer, iova, size, vaddr, readonly, mr);
}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index d3efef7..962a1e2 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -45,6 +45,18 @@ static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
iova, size, vaddr, readonly);
}
+static int iommufd_cdev_map_file(const VFIOContainerBase *bcontainer,
+                                 hwaddr iova, ram_addr_t size,
+                                 int fd, unsigned long start, bool readonly)
+{
+    const VFIOIOMMUFDContainer *container =
+        container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+
+    return iommufd_backend_map_file_dma(container->be,
+                                        container->ioas_id,
+                                        iova, size, fd, start, readonly);
+}
+
static int iommufd_cdev_unmap(const VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
IOMMUTLBEntry *iotlb, bool unmap_all)
@@ -807,6 +819,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, const void *data)
VFIOIOMMUClass *vioc = VFIO_IOMMU_CLASS(klass);
vioc->dma_map = iommufd_cdev_map;
+ vioc->dma_map_file = iommufd_cdev_map_file;
vioc->dma_unmap = iommufd_cdev_unmap;
vioc->attach_device = iommufd_cdev_attach;
vioc->detach_device = iommufd_cdev_detach;
--
1.8.3.1
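The reason a file-based mapping survives CPR, as stated in patch 08/21,
can be shown with a short standalone sketch of the offset arithmetic
used in vfio_container_dma_map. The sketch_ram_block type and the
helper names are hypothetical simplifications, not QEMU's RAMBlock API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a file-backed RAM block. */
struct sketch_ram_block {
    uint8_t *host;       /* where the block is mapped in this process */
    uint64_t fd_offset;  /* offset of the block within its backing file */
};

/*
 * File offset of vaddr: its distance from the start of the block, plus
 * the block's own offset within the backing file.
 */
static uint64_t sketch_file_offset(const struct sketch_ram_block *rb,
                                   const uint8_t *vaddr)
{
    return (uint64_t)(vaddr - rb->host) + rb->fd_offset;
}

/*
 * Two buffers stand in for the same block mapped at different virtual
 * addresses in old and new QEMU. The file offset is the same in both.
 */
static bool sketch_offset_demo(void)
{
    static uint8_t old_map[0x4000], new_map[0x4000];
    struct sketch_ram_block old_rb = { old_map, 0x1000 };
    struct sketch_ram_block new_rb = { new_map, 0x1000 };

    uint64_t a = sketch_file_offset(&old_rb, old_map + 0x2000);
    uint64_t b = sketch_file_offset(&new_rb, new_map + 0x2000);

    return a == 0x3000 && a == b;
}
```

Because the result depends only on the position within the file, the
kernel mapping does not need to be updated when the virtual address
changes in the new process.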
* [PATCH V6 09/21] vfio/iommufd: invariant device name
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (7 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 08/21] vfio/iommufd: use IOMMU_IOAS_MAP_FILE Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-02 21:58 ` [PATCH V6 10/21] vfio/iommufd: add vfio_device_free_name Steve Sistare
` (12 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
cpr-transfer will use the device name as a key to find the value
of the device descriptor in new QEMU. However, if the descriptor
number is specified by a command-line fd parameter, then
vfio_device_get_name creates a name that includes the fd number.
This causes a chicken-and-egg problem: new QEMU must know the fd
number to construct a name to find the fd number.
To fix, create an invariant name based on the id command-line parameter,
if id is defined. The user will need to provide such an id to use CPR.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/vfio/device.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index d91c695..3cd365f 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -316,12 +316,17 @@ bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
error_setg(errp, "Use FD passing only with iommufd backend");
return false;
}
- /*
- * Give a name with fd so any function printing out vbasedev->name
- * will not break.
- */
if (!vbasedev->name) {
- vbasedev->name = g_strdup_printf("VFIO_FD%d", vbasedev->fd);
+
+ if (vbasedev->dev->id) {
+ vbasedev->name = g_strdup(vbasedev->dev->id);
+ return true;
+ } else {
+ /*
+ * Assign a name so any function printing it will not break.
+ */
+ vbasedev->name = g_strdup_printf("VFIO_FD%d", vbasedev->fd);
+ }
}
}
--
1.8.3.1
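The chicken-and-egg problem described in patch 09/21 can be shown with
a standalone sketch of the naming rule. The function names and the
caller-supplied buffer are assumptions of this sketch; QEMU itself
allocates the name with g_strdup and g_strdup_printf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the device name into buf and report whether the name is
 * invariant across processes. id is NULL when no id was provided.
 */
static bool sketch_device_name(const char *id, int fd, char *buf, size_t len)
{
    if (id) {
        snprintf(buf, len, "%s", id);
        return true;
    }
    snprintf(buf, len, "VFIO_FD%d", fd);
    return false;
}

/* Compare the names built by "old QEMU" (fd 17) and "new QEMU" (fd 23). */
static bool sketch_name_demo(void)
{
    char old_name[32], new_name[32];
    bool old_inv, new_inv;

    /* With an id, both sides construct the same lookup key. */
    old_inv = sketch_device_name("vfio0", 17, old_name, sizeof(old_name));
    new_inv = sketch_device_name("vfio0", 23, new_name, sizeof(new_name));
    if (!old_inv || !new_inv || strcmp(old_name, new_name) != 0) {
        return false;
    }

    /* Without an id, the key embeds the fd number, so the keys differ. */
    old_inv = sketch_device_name(NULL, 17, old_name, sizeof(old_name));
    new_inv = sketch_device_name(NULL, 23, new_name, sizeof(new_name));
    return !old_inv && !new_inv && strcmp(old_name, new_name) != 0;
}
```

The boolean result corresponds to the case a later patch in the series
turns into a CPR blocker: a name built from the fd number cannot serve
as a key in new QEMU.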
* [PATCH V6 10/21] vfio/iommufd: add vfio_device_free_name
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (8 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 09/21] vfio/iommufd: invariant device name Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-02 21:58 ` [PATCH V6 11/21] vfio/iommufd: device name blocker Steve Sistare
` (11 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Define vfio_device_free_name to free the name created by
vfio_device_get_name. A subsequent patch will do more there.
No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/hw/vfio/vfio-device.h | 1 +
hw/vfio/ap.c | 4 ++--
hw/vfio/ccw.c | 4 ++--
hw/vfio/device.c | 5 +++++
hw/vfio/pci.c | 2 +-
hw/vfio/platform.c | 2 +-
6 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index f503837..1901a35 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -279,6 +279,7 @@ int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
/* Returns 0 on success, or a negative errno. */
bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
+void vfio_device_free_name(VFIODevice *vbasedev);
void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
DeviceState *dev, bool ram_discard);
diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 1df4438..7719f24 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -265,7 +265,7 @@ static void vfio_ap_realize(DeviceState *dev, Error **errp)
error:
error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->name);
- g_free(vbasedev->name);
+ vfio_device_free_name(vbasedev);
}
static void vfio_ap_unrealize(DeviceState *dev)
@@ -275,7 +275,7 @@ static void vfio_ap_unrealize(DeviceState *dev)
vfio_ap_unregister_irq_notifier(vapdev, VFIO_AP_REQ_IRQ_INDEX);
vfio_ap_unregister_irq_notifier(vapdev, VFIO_AP_CFG_CHG_IRQ_INDEX);
vfio_device_detach(&vapdev->vdev);
- g_free(vapdev->vdev.name);
+ vfio_device_free_name(&vapdev->vdev);
}
static const Property vfio_ap_properties[] = {
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index cea9d6e..9560b8d 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -619,7 +619,7 @@ out_io_notifier_err:
out_region_err:
vfio_device_detach(vbasedev);
out_attach_dev_err:
- g_free(vbasedev->name);
+ vfio_device_free_name(vbasedev);
out_unrealize:
if (cdc->unrealize) {
cdc->unrealize(cdev);
@@ -637,7 +637,7 @@ static void vfio_ccw_unrealize(DeviceState *dev)
vfio_ccw_unregister_irq_notifier(vcdev, VFIO_CCW_IO_IRQ_INDEX);
vfio_ccw_put_region(vcdev);
vfio_device_detach(&vcdev->vdev);
- g_free(vcdev->vdev.name);
+ vfio_device_free_name(&vcdev->vdev);
if (cdc->unrealize) {
cdc->unrealize(cdev);
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 3cd365f..97eddd0 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -333,6 +333,11 @@ bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
return true;
}
+void vfio_device_free_name(VFIODevice *vbasedev)
+{
+    g_clear_pointer(&vbasedev->name, g_free);
+}
+
void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
{
ERRP_GUARD();
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index dd0b2a0..1093b28 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2996,7 +2996,7 @@ void vfio_pci_put_device(VFIOPCIDevice *vdev)
vfio_device_detach(&vdev->vbasedev);
- g_free(vdev->vbasedev.name);
+ vfio_device_free_name(&vdev->vbasedev);
g_free(vdev->msix);
}
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 9a21f2e..5c1795a 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -530,7 +530,7 @@ static bool vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
{
/* @fd takes precedence over @sysfsdev which takes precedence over @host */
if (vbasedev->fd < 0 && vbasedev->sysfsdev) {
- g_free(vbasedev->name);
+ vfio_device_free_name(vbasedev);
vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
} else if (vbasedev->fd < 0) {
if (!vbasedev->name || strchr(vbasedev->name, '/')) {
--
1.8.3.1
* [PATCH V6 11/21] vfio/iommufd: device name blocker
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (9 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 10/21] vfio/iommufd: add vfio_device_free_name Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-02 21:58 ` [PATCH V6 12/21] vfio/iommufd: register container for cpr Steve Sistare
` (10 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
If an invariant device name cannot be created, block CPR.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/hw/vfio/vfio-cpr.h | 1 +
hw/vfio/device.c | 11 +++++++++++
2 files changed, 12 insertions(+)
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index 099d54f..76eafc0 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -29,6 +29,7 @@ typedef struct VFIOContainerCPR {
typedef struct VFIODeviceCPR {
Error *mdev_blocker;
+ Error *id_blocker;
} VFIODeviceCPR;
bool vfio_legacy_cpr_register_container(struct VFIOContainer *container,
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 97eddd0..0ae3f3c 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -28,6 +28,8 @@
#include "qapi/error.h"
#include "qemu/error-report.h"
#include "qemu/units.h"
+#include "migration/cpr.h"
+#include "migration/blocker.h"
#include "monitor/monitor.h"
#include "vfio-helpers.h"
@@ -324,8 +326,16 @@ bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
} else {
/*
* Assign a name so any function printing it will not break.
+ * The fd number changes across processes, so this cannot be
+ * used as an invariant name for CPR.
*/
vbasedev->name = g_strdup_printf("VFIO_FD%d", vbasedev->fd);
+ error_setg(&vbasedev->cpr.id_blocker,
+ "vfio device with fd=%d needs an id property",
+ vbasedev->fd);
+ return migrate_add_blocker_modes(&vbasedev->cpr.id_blocker,
+ errp, MIG_MODE_CPR_TRANSFER,
+ -1) == 0;
}
}
}
@@ -336,6 +346,7 @@ bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
void vfio_device_free_name(VFIODevice *vbasedev)
{
g_clear_pointer(&vbasedev->name, g_free);
+ migrate_del_blocker(&vbasedev->cpr.id_blocker);
}
void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
--
1.8.3.1
* [PATCH V6 12/21] vfio/iommufd: register container for cpr
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (10 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 11/21] vfio/iommufd: device name blocker Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-03 2:42 ` Duan, Zhenzhong
2025-07-02 21:58 ` [PATCH V6 13/21] migration: vfio cpr state hook Steve Sistare
` (9 subsequent siblings)
21 siblings, 1 reply; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Register a vfio iommufd container and device for CPR, replacing the generic
CPR register call with a more specific iommufd register call. Add a
blocker if the kernel does not support IOMMU_IOAS_CHANGE_PROCESS.
This is mostly boilerplate. The fields to be saved and restored are added
in subsequent patches.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
include/hw/vfio/vfio-cpr.h | 12 +++++++
include/system/iommufd.h | 1 +
backends/iommufd.c | 10 ++++++
hw/vfio/cpr-iommufd.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++
hw/vfio/iommufd.c | 6 ++--
hw/vfio/meson.build | 1 +
6 files changed, 114 insertions(+), 2 deletions(-)
create mode 100644 hw/vfio/cpr-iommufd.c
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index 76eafc0..e0e3ee2 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -15,7 +15,10 @@
struct VFIOContainer;
struct VFIOContainerBase;
struct VFIOGroup;
+struct VFIODevice;
struct VFIOPCIDevice;
+struct VFIOIOMMUFDContainer;
+struct IOMMUFDBackend;
typedef struct VFIOContainerCPR {
Error *blocker;
@@ -43,6 +46,15 @@ bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
Error **errp);
void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
+bool vfio_iommufd_cpr_register_container(struct VFIOIOMMUFDContainer *container,
+ Error **errp);
+void vfio_iommufd_cpr_unregister_container(
+ struct VFIOIOMMUFDContainer *container);
+bool vfio_iommufd_cpr_register_iommufd(struct IOMMUFDBackend *be, Error **errp);
+void vfio_iommufd_cpr_unregister_iommufd(struct IOMMUFDBackend *be);
+void vfio_iommufd_cpr_register_device(struct VFIODevice *vbasedev);
+void vfio_iommufd_cpr_unregister_device(struct VFIODevice *vbasedev);
+
int vfio_cpr_group_get_device_fd(int d, const char *name);
bool vfio_cpr_container_match(struct VFIOContainer *container,
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index db5f2c7..c9c72ff 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -32,6 +32,7 @@ struct IOMMUFDBackend {
/*< protected >*/
int fd; /* /dev/iommu file descriptor */
bool owned; /* is the /dev/iommu opened internally */
+ Error *cpr_blocker;/* set if be does not support CPR */
uint32_t users;
/*< public >*/
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 87f81a0..c554ce5 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -108,6 +108,13 @@ bool iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
}
be->fd = fd;
}
+ if (!be->users && !vfio_iommufd_cpr_register_iommufd(be, errp)) {
+ if (be->owned) {
+ close(be->fd);
+ be->fd = -1;
+ }
+ return false;
+ }
be->users++;
trace_iommufd_backend_connect(be->fd, be->owned, be->users);
@@ -125,6 +132,9 @@ void iommufd_backend_disconnect(IOMMUFDBackend *be)
be->fd = -1;
}
out:
+ if (!be->users) {
+ vfio_iommufd_cpr_unregister_iommufd(be);
+ }
trace_iommufd_backend_disconnect(be->fd, be->users);
}
diff --git a/hw/vfio/cpr-iommufd.c b/hw/vfio/cpr-iommufd.c
new file mode 100644
index 0000000..2f58b43
--- /dev/null
+++ b/hw/vfio/cpr-iommufd.c
@@ -0,0 +1,86 @@
+/*
+ * Copyright (c) 2024-2025 Oracle and/or its affiliates.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/vfio/vfio-cpr.h"
+#include "migration/blocker.h"
+#include "migration/cpr.h"
+#include "migration/migration.h"
+#include "migration/vmstate.h"
+#include "system/iommufd.h"
+#include "vfio-iommufd.h"
+
+static bool vfio_cpr_supported(IOMMUFDBackend *be, Error **errp)
+{
+    if (!iommufd_change_process_capable(be)) {
+        if (errp) {
+            error_setg(errp, "vfio iommufd backend does not support "
+                       "IOMMU_IOAS_CHANGE_PROCESS");
+        }
+        return false;
+    }
+    return true;
+}
+
+static const VMStateDescription iommufd_cpr_vmstate = {
+    .name = "iommufd",
+    .version_id = 0,
+    .minimum_version_id = 0,
+    .needed = cpr_incoming_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+bool vfio_iommufd_cpr_register_iommufd(IOMMUFDBackend *be, Error **errp)
+{
+    Error **cpr_blocker = &be->cpr_blocker;
+
+    if (!vfio_cpr_supported(be, cpr_blocker)) {
+        return migrate_add_blocker_modes(cpr_blocker, errp,
+                                         MIG_MODE_CPR_TRANSFER, -1) == 0;
+    }
+
+    vmstate_register(NULL, -1, &iommufd_cpr_vmstate, be);
+
+    return true;
+}
+
+void vfio_iommufd_cpr_unregister_iommufd(IOMMUFDBackend *be)
+{
+    vmstate_unregister(NULL, &iommufd_cpr_vmstate, be);
+    migrate_del_blocker(&be->cpr_blocker);
+}
+
+bool vfio_iommufd_cpr_register_container(VFIOIOMMUFDContainer *container,
+                                         Error **errp)
+{
+    VFIOContainerBase *bcontainer = &container->bcontainer;
+
+    migration_add_notifier_mode(&bcontainer->cpr_reboot_notifier,
+                                vfio_cpr_reboot_notifier,
+                                MIG_MODE_CPR_REBOOT);
+
+    vfio_cpr_add_kvm_notifier();
+
+    return true;
+}
+
+void vfio_iommufd_cpr_unregister_container(VFIOIOMMUFDContainer *container)
+{
+    VFIOContainerBase *bcontainer = &container->bcontainer;
+
+    migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
+}
+
+void vfio_iommufd_cpr_register_device(VFIODevice *vbasedev)
+{
+}
+
+void vfio_iommufd_cpr_unregister_device(VFIODevice *vbasedev)
+{
+}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 962a1e2..ff291be 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -446,7 +446,7 @@ static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer *container)
if (!QLIST_EMPTY(&bcontainer->device_list)) {
return;
}
- vfio_cpr_unregister_container(bcontainer);
+ vfio_iommufd_cpr_unregister_container(container);
vfio_listener_unregister(bcontainer);
iommufd_backend_free_id(container->be, container->ioas_id);
object_unref(container);
@@ -592,7 +592,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
goto err_listener_register;
}
- if (!vfio_cpr_register_container(bcontainer, errp)) {
+ if (!vfio_iommufd_cpr_register_container(container, errp)) {
goto err_listener_register;
}
@@ -623,6 +623,7 @@ found_container:
}
vfio_device_prepare(vbasedev, bcontainer, &dev_info);
+ vfio_iommufd_cpr_register_device(vbasedev);
trace_iommufd_cdev_device_info(vbasedev->name, devfd, vbasedev->num_irqs,
vbasedev->num_regions, vbasedev->flags);
@@ -660,6 +661,7 @@ static void iommufd_cdev_detach(VFIODevice *vbasedev)
iommufd_cdev_container_destroy(container);
vfio_address_space_put(space);
+ vfio_iommufd_cpr_unregister_device(vbasedev);
iommufd_cdev_unbind_and_disconnect(vbasedev);
close(vbasedev->fd);
}
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index 63ea393..7a88174 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -31,6 +31,7 @@ system_ss.add(when: 'CONFIG_VFIO', if_true: files(
))
system_ss.add(when: ['CONFIG_VFIO', 'CONFIG_IOMMUFD'], if_true: files(
'iommufd.c',
+ 'cpr-iommufd.c',
))
system_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
'display.c',
--
1.8.3.1
* [PATCH V6 13/21] migration: vfio cpr state hook
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (11 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 12/21] vfio/iommufd: register container for cpr Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-03 2:44 ` Duan, Zhenzhong
2025-07-02 21:58 ` [PATCH V6 14/21] vfio/iommufd: cpr state Steve Sistare
` (8 subsequent siblings)
21 siblings, 1 reply; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Define a list of vfio devices in CPR state, in a subsection so that
older QEMU can be live updated to this version. However, new QEMU
will not be live updateable to old QEMU. This is acceptable because
CPR is not yet commonly used, and updates to older versions are unusual.
The contents of each device object will be defined by the vfio subsystem
in a subsequent patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
include/hw/vfio/vfio-cpr.h | 1 +
include/migration/cpr.h | 12 ++++++++++++
hw/vfio/cpr-iommufd.c | 2 ++
hw/vfio/iommufd-stubs.c | 18 ++++++++++++++++++
migration/cpr.c | 15 ++++++---------
hw/vfio/meson.build | 1 +
6 files changed, 40 insertions(+), 9 deletions(-)
create mode 100644 hw/vfio/iommufd-stubs.c
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index e0e3ee2..c94d5e0 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -74,6 +74,7 @@ void vfio_cpr_delete_vector_fd(struct VFIOPCIDevice *vdev, const char *name,
int nr);
extern const VMStateDescription vfio_cpr_pci_vmstate;
+extern const VMStateDescription vmstate_cpr_vfio_devices;
void vfio_cpr_add_kvm_notifier(void);
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index eb27a93..3fc19a7 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -9,11 +9,23 @@
#define MIGRATION_CPR_H
#include "qapi/qapi-types-migration.h"
+#include "qemu/queue.h"
#define MIG_MODE_NONE -1
#define QEMU_CPR_FILE_MAGIC 0x51435052
#define QEMU_CPR_FILE_VERSION 0x00000001
+#define CPR_STATE "CprState"
+
+typedef QLIST_HEAD(CprFdList, CprFd) CprFdList;
+typedef QLIST_HEAD(CprVFIODeviceList, CprVFIODevice) CprVFIODeviceList;
+
+typedef struct CprState {
+ CprFdList fds;
+ CprVFIODeviceList vfio_devices;
+} CprState;
+
+extern CprState cpr_state;
void cpr_save_fd(const char *name, int id, int fd);
void cpr_delete_fd(const char *name, int id);
diff --git a/hw/vfio/cpr-iommufd.c b/hw/vfio/cpr-iommufd.c
index 2f58b43..f95773b 100644
--- a/hw/vfio/cpr-iommufd.c
+++ b/hw/vfio/cpr-iommufd.c
@@ -14,6 +14,8 @@
#include "system/iommufd.h"
#include "vfio-iommufd.h"
+const VMStateDescription vmstate_cpr_vfio_devices; /* TBD in a later patch */
+
static bool vfio_cpr_supported(IOMMUFDBackend *be, Error **errp)
{
if (!iommufd_change_process_capable(be)) {
diff --git a/hw/vfio/iommufd-stubs.c b/hw/vfio/iommufd-stubs.c
new file mode 100644
index 0000000..0be5276
--- /dev/null
+++ b/hw/vfio/iommufd-stubs.c
@@ -0,0 +1,18 @@
+/*
+ * Copyright (c) 2025 Oracle and/or its affiliates.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "migration/cpr.h"
+#include "migration/vmstate.h"
+
+const VMStateDescription vmstate_cpr_vfio_devices = {
+ .name = CPR_STATE "/vfio devices",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .fields = (const VMStateField[]){
+ VMSTATE_END_OF_LIST()
+ }
+};
diff --git a/migration/cpr.c b/migration/cpr.c
index 535d587..42ad0b0 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -7,6 +7,7 @@
#include "qemu/osdep.h"
#include "qapi/error.h"
+#include "hw/vfio/vfio-device.h"
#include "migration/cpr.h"
#include "migration/misc.h"
#include "migration/options.h"
@@ -20,13 +21,7 @@
/*************************************************************************/
/* cpr state container for all information to be saved. */
-typedef QLIST_HEAD(CprFdList, CprFd) CprFdList;
-
-typedef struct CprState {
- CprFdList fds;
-} CprState;
-
-static CprState cpr_state;
+CprState cpr_state;
/****************************************************************************/
@@ -127,8 +122,6 @@ int cpr_open_fd(const char *path, int flags, const char *name, int id,
}
/*************************************************************************/
-#define CPR_STATE "CprState"
-
static const VMStateDescription vmstate_cpr_state = {
.name = CPR_STATE,
.version_id = 1,
@@ -136,6 +129,10 @@ static const VMStateDescription vmstate_cpr_state = {
.fields = (VMStateField[]) {
VMSTATE_QLIST_V(fds, CprState, 1, vmstate_cpr_fd, CprFd, next),
VMSTATE_END_OF_LIST()
+ },
+ .subsections = (const VMStateDescription * const []) {
+ &vmstate_cpr_vfio_devices,
+ NULL
}
};
/*************************************************************************/
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index 7a88174..bfaf6be 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -33,6 +33,7 @@ system_ss.add(when: ['CONFIG_VFIO', 'CONFIG_IOMMUFD'], if_true: files(
'iommufd.c',
'cpr-iommufd.c',
))
+system_ss.add(when: 'CONFIG_IOMMUFD', if_false: files('iommufd-stubs.c'))
system_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
'display.c',
))
--
1.8.3.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH V6 14/21] vfio/iommufd: cpr state
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (12 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 13/21] migration: vfio cpr state hook Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-02 21:58 ` [PATCH V6 15/21] vfio/iommufd: preserve descriptors Steve Sistare
` (7 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
VFIO iommufd devices will need access to ioas_id, devid, and hwpt_id in
new QEMU at realize time, so add them to CPR state. Define CprVFIODevice
as the object which holds the state and is serialized to the vmstate file.
Define accessors to copy state between VFIODevice and CprVFIODevice.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/hw/vfio/vfio-cpr.h | 3 ++
hw/vfio/cpr-iommufd.c | 98 +++++++++++++++++++++++++++++++++++++++++++++-
hw/vfio/iommufd.c | 2 +
hw/vfio/trace-events | 3 ++
4 files changed, 105 insertions(+), 1 deletion(-)
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index c94d5e0..4c17cb3 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -33,6 +33,8 @@ typedef struct VFIOContainerCPR {
typedef struct VFIODeviceCPR {
Error *mdev_blocker;
Error *id_blocker;
+ uint32_t hwpt_id;
+ uint32_t ioas_id;
} VFIODeviceCPR;
bool vfio_legacy_cpr_register_container(struct VFIOContainer *container,
@@ -54,6 +56,7 @@ bool vfio_iommufd_cpr_register_iommufd(struct IOMMUFDBackend *be, Error **errp);
void vfio_iommufd_cpr_unregister_iommufd(struct IOMMUFDBackend *be);
void vfio_iommufd_cpr_register_device(struct VFIODevice *vbasedev);
void vfio_iommufd_cpr_unregister_device(struct VFIODevice *vbasedev);
+void vfio_cpr_load_device(struct VFIODevice *vbasedev);
int vfio_cpr_group_get_device_fd(int d, const char *name);
diff --git a/hw/vfio/cpr-iommufd.c b/hw/vfio/cpr-iommufd.c
index f95773b..4166201 100644
--- a/hw/vfio/cpr-iommufd.c
+++ b/hw/vfio/cpr-iommufd.c
@@ -7,14 +7,98 @@
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "hw/vfio/vfio-cpr.h"
+#include "hw/vfio/vfio-device.h"
#include "migration/blocker.h"
#include "migration/cpr.h"
#include "migration/migration.h"
#include "migration/vmstate.h"
#include "system/iommufd.h"
#include "vfio-iommufd.h"
+#include "trace.h"
-const VMStateDescription vmstate_cpr_vfio_devices; /* TBD in a later patch */
+typedef struct CprVFIODevice {
+ char *name;
+ unsigned int namelen;
+ uint32_t ioas_id;
+ int devid;
+ uint32_t hwpt_id;
+ QLIST_ENTRY(CprVFIODevice) next;
+} CprVFIODevice;
+
+static const VMStateDescription vmstate_cpr_vfio_device = {
+ .name = "cpr vfio device",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .fields = (VMStateField[]) {
+ VMSTATE_UINT32(namelen, CprVFIODevice),
+ VMSTATE_VBUFFER_ALLOC_UINT32(name, CprVFIODevice, 0, NULL, namelen),
+ VMSTATE_INT32(devid, CprVFIODevice),
+ VMSTATE_UINT32(ioas_id, CprVFIODevice),
+ VMSTATE_UINT32(hwpt_id, CprVFIODevice),
+ VMSTATE_END_OF_LIST()
+ }
+};
+
+const VMStateDescription vmstate_cpr_vfio_devices = {
+ .name = CPR_STATE "/vfio devices",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .fields = (const VMStateField[]){
+ VMSTATE_QLIST_V(vfio_devices, CprState, 1, vmstate_cpr_vfio_device,
+ CprVFIODevice, next),
+ VMSTATE_END_OF_LIST()
+ }
+};
+
+static void vfio_cpr_save_device(VFIODevice *vbasedev)
+{
+ CprVFIODevice *elem = g_new0(CprVFIODevice, 1);
+
+ elem->name = g_strdup(vbasedev->name);
+ elem->namelen = strlen(vbasedev->name) + 1;
+ elem->ioas_id = vbasedev->cpr.ioas_id;
+ elem->devid = vbasedev->devid;
+ elem->hwpt_id = vbasedev->cpr.hwpt_id;
+ QLIST_INSERT_HEAD(&cpr_state.vfio_devices, elem, next);
+}
+
+static CprVFIODevice *find_device(const char *name)
+{
+ CprVFIODeviceList *head = &cpr_state.vfio_devices;
+ CprVFIODevice *elem;
+
+ QLIST_FOREACH(elem, head, next) {
+ if (!strcmp(elem->name, name)) {
+ return elem;
+ }
+ }
+ return NULL;
+}
+
+static void vfio_cpr_delete_device(const char *name)
+{
+ CprVFIODevice *elem = find_device(name);
+
+ if (elem) {
+ QLIST_REMOVE(elem, next);
+ g_free(elem->name);
+ g_free(elem);
+ }
+}
+
+static bool vfio_cpr_find_device(VFIODevice *vbasedev)
+{
+ CprVFIODevice *elem = find_device(vbasedev->name);
+
+ if (elem) {
+ vbasedev->cpr.ioas_id = elem->ioas_id;
+ vbasedev->devid = elem->devid;
+ vbasedev->cpr.hwpt_id = elem->hwpt_id;
+ trace_vfio_cpr_find_device(elem->ioas_id, elem->devid, elem->hwpt_id);
+ return true;
+ }
+ return false;
+}
static bool vfio_cpr_supported(IOMMUFDBackend *be, Error **errp)
{
@@ -81,8 +165,20 @@ void vfio_iommufd_cpr_unregister_container(VFIOIOMMUFDContainer *container)
void vfio_iommufd_cpr_register_device(VFIODevice *vbasedev)
{
+ if (!cpr_is_incoming()) {
+ vfio_cpr_save_device(vbasedev);
+ }
}
void vfio_iommufd_cpr_unregister_device(VFIODevice *vbasedev)
{
+ vfio_cpr_delete_device(vbasedev->name);
+}
+
+void vfio_cpr_load_device(VFIODevice *vbasedev)
+{
+ if (cpr_is_incoming()) {
+ bool ret = vfio_cpr_find_device(vbasedev);
+ g_assert(ret);
+ }
}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index ff291be..f0d57ea 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -515,6 +515,8 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
const VFIOIOMMUClass *iommufd_vioc =
VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
+ vfio_cpr_load_device(vbasedev);
+
if (vbasedev->fd < 0) {
devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
if (devfd < 0) {
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index e1728c4..8ec0ad0 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -197,6 +197,9 @@ iommufd_cdev_alloc_ioas(int iommufd, int ioas_id) " [iommufd=%d] new IOMMUFD con
iommufd_cdev_device_info(char *name, int devfd, int num_irqs, int num_regions, int flags) " %s (%d) num_irqs=%d num_regions=%d flags=%d"
iommufd_cdev_pci_hot_reset_dep_devices(int domain, int bus, int slot, int function, int dev_id) "\t%04x:%02x:%02x.%x devid %d"
+# cpr-iommufd.c
+vfio_cpr_find_device(uint32_t ioas_id, int devid, uint32_t hwpt_id) "ioas_id %u, devid %d, hwpt_id %u"
+
# device.c
vfio_device_get_region_info_type(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%08x"
vfio_device_reset_handler(void) ""
--
1.8.3.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
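The per-device bookkeeping that patch 14 adds (save on register, look up by name on load, delete on unregister) can be sketched outside QEMU as a small name-keyed list. This is only an illustration under stated assumptions: DevRecord, dev_save, dev_find, and dev_delete are hypothetical names, plain libc allocation stands in for glib, and a hand-rolled singly linked list stands in for QLIST. It is not the code in cpr-iommufd.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CprVFIODevice; not the QEMU definition. */
typedef struct DevRecord {
    char *name;
    uint32_t ioas_id;
    int devid;
    uint32_t hwpt_id;
    struct DevRecord *next;
} DevRecord;

/* Head of the list, playing the role of cpr_state.vfio_devices. */
static DevRecord *dev_list;

/* Return the record whose name matches, or NULL if there is none. */
DevRecord *dev_find(const char *name)
{
    assert(name != NULL);
    for (DevRecord *r = dev_list; r != NULL; r = r->next) {
        if (strcmp(r->name, name) == 0) {
            return r;
        }
    }
    return NULL;
}

/* Copy the ids into a new record and push it on the list head. */
int dev_save(const char *name, uint32_t ioas_id, int devid, uint32_t hwpt_id)
{
    size_t len = strlen(name) + 1;
    DevRecord *r = calloc(1, sizeof(*r));

    if (r == NULL) {
        return -1;
    }
    r->name = malloc(len);
    if (r->name == NULL) {
        free(r);
        return -1;
    }
    memcpy(r->name, name, len);
    r->ioas_id = ioas_id;
    r->devid = devid;
    r->hwpt_id = hwpt_id;
    r->next = dev_list;
    dev_list = r;
    return 0;
}

/* Unlink and free the record with the given name, if present. */
void dev_delete(const char *name)
{
    DevRecord **link = &dev_list;

    while (*link != NULL) {
        DevRecord *r = *link;
        if (strcmp(r->name, name) == 0) {
            *link = r->next;
            free(r->name);
            free(r);
            return;
        }
        link = &r->next;
    }
}
```

In the patch, the old QEMU populates the list at register time and the new QEMU only reads it, which is why the load path asserts that the lookup succeeds.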
* [PATCH V6 15/21] vfio/iommufd: preserve descriptors
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (13 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 14/21] vfio/iommufd: cpr state Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-02 21:58 ` [PATCH V6 16/21] vfio/iommufd: reconstruct device Steve Sistare
` (6 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Save the iommufd and vfio device fds in CPR state when they are created.
After CPR, the fd numbers are found in CPR state and reused.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
backends/iommufd.c | 35 +++++++++++++++++++++++++++++------
hw/vfio/cpr-iommufd.c | 10 ++++++++++
hw/vfio/device.c | 9 +--------
3 files changed, 40 insertions(+), 14 deletions(-)
diff --git a/backends/iommufd.c b/backends/iommufd.c
index c554ce5..e091792 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -16,12 +16,18 @@
#include "qemu/module.h"
#include "qom/object_interfaces.h"
#include "qemu/error-report.h"
+#include "migration/cpr.h"
#include "monitor/monitor.h"
#include "trace.h"
#include "hw/vfio/vfio-device.h"
#include <sys/ioctl.h>
#include <linux/iommufd.h>
+static const char *iommufd_fd_name(IOMMUFDBackend *be)
+{
+ return object_get_canonical_path_component(OBJECT(be));
+}
+
static void iommufd_backend_init(Object *obj)
{
IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
@@ -64,11 +70,27 @@ static bool iommufd_backend_can_be_deleted(UserCreatable *uc)
return !be->users;
}
+static void iommufd_backend_complete(UserCreatable *uc, Error **errp)
+{
+ IOMMUFDBackend *be = IOMMUFD_BACKEND(uc);
+ const char *name = iommufd_fd_name(be);
+
+ if (!be->owned) {
+ /* fd came from the command line. Fetch updated value from cpr state. */
+ if (cpr_is_incoming()) {
+ be->fd = cpr_find_fd(name, 0);
+ } else {
+ cpr_save_fd(name, 0, be->fd);
+ }
+ }
+}
+
static void iommufd_backend_class_init(ObjectClass *oc, const void *data)
{
UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
ucc->can_be_deleted = iommufd_backend_can_be_deleted;
+ ucc->complete = iommufd_backend_complete;
object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
}
@@ -102,7 +124,7 @@ bool iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
int fd;
if (be->owned && !be->users) {
- fd = qemu_open("/dev/iommu", O_RDWR, errp);
+ fd = cpr_open_fd("/dev/iommu", O_RDWR, iommufd_fd_name(be), 0, errp);
if (fd < 0) {
return false;
}
@@ -127,14 +149,15 @@ void iommufd_backend_disconnect(IOMMUFDBackend *be)
goto out;
}
be->users--;
- if (!be->users && be->owned) {
- close(be->fd);
- be->fd = -1;
- }
-out:
if (!be->users) {
vfio_iommufd_cpr_unregister_iommufd(be);
+ if (be->owned) {
+ cpr_delete_fd(iommufd_fd_name(be), 0);
+ close(be->fd);
+ be->fd = -1;
+ }
}
+out:
trace_iommufd_backend_disconnect(be->fd, be->users);
}
diff --git a/hw/vfio/cpr-iommufd.c b/hw/vfio/cpr-iommufd.c
index 4166201..a72b68d 100644
--- a/hw/vfio/cpr-iommufd.c
+++ b/hw/vfio/cpr-iommufd.c
@@ -166,12 +166,18 @@ void vfio_iommufd_cpr_unregister_container(VFIOIOMMUFDContainer *container)
void vfio_iommufd_cpr_register_device(VFIODevice *vbasedev)
{
if (!cpr_is_incoming()) {
+ /*
+ * Beware fd may have already been saved by vfio_device_set_fd,
+ * so call resave to avoid a duplicate entry.
+ */
+ cpr_resave_fd(vbasedev->name, 0, vbasedev->fd);
vfio_cpr_save_device(vbasedev);
}
}
void vfio_iommufd_cpr_unregister_device(VFIODevice *vbasedev)
{
+ cpr_delete_fd(vbasedev->name, 0);
vfio_cpr_delete_device(vbasedev->name);
}
@@ -180,5 +186,9 @@ void vfio_cpr_load_device(VFIODevice *vbasedev)
if (cpr_is_incoming()) {
bool ret = vfio_cpr_find_device(vbasedev);
g_assert(ret);
+
+ if (vbasedev->fd < 0) {
+ vbasedev->fd = cpr_find_fd(vbasedev->name, 0);
+ }
}
}
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 0ae3f3c..96cf214 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -351,14 +351,7 @@ void vfio_device_free_name(VFIODevice *vbasedev)
void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
{
- ERRP_GUARD();
- int fd = monitor_fd_param(monitor_cur(), str, errp);
-
- if (fd < 0) {
- error_prepend(errp, "Could not parse remote object fd %s:", str);
- return;
- }
- vbasedev->fd = fd;
+ vbasedev->fd = cpr_get_fd_param(vbasedev->dev->id, str, 0, errp);
}
static VFIODeviceIOOps vfio_device_io_ops_ioctl;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
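Patch 15 keys each preserved descriptor by a (name, id) pair and uses a "resave" call where the descriptor may already have been recorded. A minimal sketch of that idea follows, assuming a fixed-size table and hypothetical helpers fd_save, fd_find, fd_resave, and fd_delete. These are not the QEMU cpr_*_fd functions, and the overwrite-on-resave policy here is an assumption of the sketch; the real helper may handle a mismatch differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FDS  16
#define NAME_LEN 64

/* One preserved descriptor, identified by (name, id). */
typedef struct FdEntry {
    int used;
    char name[NAME_LEN];
    int id;
    int fd;
} FdEntry;

static FdEntry fd_table[MAX_FDS];

/* Internal lookup shared by the helpers below. */
static FdEntry *fd_lookup(const char *name, int id)
{
    assert(name != NULL);
    for (int i = 0; i < MAX_FDS; i++) {
        FdEntry *e = &fd_table[i];
        if (e->used && e->id == id && strcmp(e->name, name) == 0) {
            return e;
        }
    }
    return NULL;
}

/* Return the saved fd, or -1 if nothing was saved under (name, id). */
int fd_find(const char *name, int id)
{
    FdEntry *e = fd_lookup(name, id);
    return e ? e->fd : -1;
}

/* Record a new entry; returns -1 if the table is full. */
int fd_save(const char *name, int id, int fd)
{
    for (int i = 0; i < MAX_FDS; i++) {
        FdEntry *e = &fd_table[i];
        if (!e->used) {
            snprintf(e->name, sizeof(e->name), "%s", name);
            e->id = id;
            e->fd = fd;
            e->used = 1;
            return 0;
        }
    }
    return -1;
}

/* Save without creating a duplicate: update in place if already present. */
int fd_resave(const char *name, int id, int fd)
{
    FdEntry *e = fd_lookup(name, id);
    if (e != NULL) {
        e->fd = fd;
        return 0;
    }
    return fd_save(name, id, fd);
}

/* Forget the entry, if any. The caller still owns and closes the fd. */
void fd_delete(const char *name, int id)
{
    FdEntry *e = fd_lookup(name, id);
    if (e != NULL) {
        e->used = 0;
    }
}
```

The point of the resave variant is that two code paths may record the same descriptor (for example, once when it is passed in by the user and once at device registration) without leaving two entries behind.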
* [PATCH V6 16/21] vfio/iommufd: reconstruct device
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (14 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 15/21] vfio/iommufd: preserve descriptors Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-02 21:58 ` [PATCH V6 17/21] vfio/iommufd: reconstruct hwpt Steve Sistare
` (5 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Reconstruct userland device state after CPR. During vfio_realize, skip all
ioctls that configure the device, as it was already configured in old QEMU.
Skip bind, and use the devid from CPR state.
Skip allocation of, and attachment to, ioas_id. Recover ioas_id from CPR
state, and use it to find a matching container, if any, before creating a
new one.
This reconstruction is not complete. hwpt_id is handled in a subsequent
patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/vfio/iommufd.c | 30 ++++++++++++++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index f0d57ea..a650517 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -25,6 +25,7 @@
#include "system/reset.h"
#include "qemu/cutils.h"
#include "qemu/chardev_open.h"
+#include "migration/cpr.h"
#include "pci.h"
#include "vfio-iommufd.h"
#include "vfio-helpers.h"
@@ -121,6 +122,10 @@ static bool iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
goto err_kvm_device_add;
}
+ if (cpr_is_incoming()) {
+ goto skip_bind;
+ }
+
/* Bind device to iommufd */
bind.iommufd = iommufd->fd;
if (ioctl(vbasedev->fd, VFIO_DEVICE_BIND_IOMMUFD, &bind)) {
@@ -132,6 +137,8 @@ static bool iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
vbasedev->devid = bind.out_devid;
trace_iommufd_cdev_connect_and_bind(bind.iommufd, vbasedev->name,
vbasedev->fd, vbasedev->devid);
+
+skip_bind:
return true;
err_bind:
iommufd_cdev_kvm_device_del(vbasedev);
@@ -421,7 +428,9 @@ static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
return iommufd_cdev_autodomains_get(vbasedev, container, errp);
}
- return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
+ /* If CPR, we are already attached to ioas_id. */
+ return cpr_is_incoming() ||
+ !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
}
static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
@@ -510,6 +519,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
VFIOAddressSpace *space;
struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
int ret, devfd;
+ bool res;
uint32_t ioas_id;
Error *err = NULL;
const VFIOIOMMUClass *iommufd_vioc =
@@ -540,7 +550,16 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
vbasedev->iommufd != container->be) {
continue;
}
- if (!iommufd_cdev_attach_container(vbasedev, container, &err)) {
+
+ if (!cpr_is_incoming()) {
+ res = iommufd_cdev_attach_container(vbasedev, container, &err);
+ } else if (vbasedev->cpr.ioas_id == container->ioas_id) {
+ res = true;
+ } else {
+ continue;
+ }
+
+ if (!res) {
const char *msg = error_get_pretty(err);
trace_iommufd_cdev_fail_attach_existing_container(msg);
@@ -557,6 +576,11 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
}
}
+ if (cpr_is_incoming()) {
+ ioas_id = vbasedev->cpr.ioas_id;
+ goto skip_ioas_alloc;
+ }
+
/* Need to allocate a new dedicated container */
if (!iommufd_backend_alloc_ioas(vbasedev->iommufd, &ioas_id, errp)) {
goto err_alloc_ioas;
@@ -564,10 +588,12 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd->fd, ioas_id);
+skip_ioas_alloc:
container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
container->be = vbasedev->iommufd;
container->ioas_id = ioas_id;
QLIST_INIT(&container->hwpt_list);
+ vbasedev->cpr.ioas_id = ioas_id;
bcontainer = &container->bcontainer;
vfio_address_space_insert(space, bcontainer);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
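The container search in patch 16 has two modes: on a normal start it tries to attach to each candidate container, while on incoming CPR it skips the attach and accepts only the container whose ioas_id matches the value recovered from CPR state. The following sketch isolates that decision. Container, AttachFn, pick_container, and the two attach stubs are hypothetical names introduced for illustration, and the array stands in for the address space's container list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for an iommufd container. */
typedef struct Container {
    uint32_t ioas_id;
} Container;

/* Hypothetical attach callback: returns true if the attach succeeded. */
typedef bool (*AttachFn)(const Container *c);

/* Stub attach callbacks, useful for exercising both outcomes. */
bool attach_always(const Container *c) { (void)c; return true; }
bool attach_never(const Container *c)  { (void)c; return false; }

/*
 * Pick an existing container for a device.
 *   - Not incoming: the first container that accepts the attach wins.
 *   - Incoming CPR: no attach is attempted; only a container whose
 *     ioas_id equals the saved id is accepted.
 * Returns NULL if no container qualifies, in which case the caller
 * would go on to create a new one.
 */
Container *pick_container(Container *list, size_t n, bool incoming,
                          uint32_t saved_ioas_id, AttachFn attach)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!incoming) {
            if (attach(&list[i])) {
                return &list[i];
            }
        } else if (list[i].ioas_id == saved_ioas_id) {
            return &list[i];
        }
    }
    return NULL;
}
```

When pick_container returns NULL during incoming CPR, the patch reuses the saved ioas_id for the new container rather than allocating a fresh one, since the kernel object already exists.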
* [PATCH V6 17/21] vfio/iommufd: reconstruct hwpt
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (15 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 16/21] vfio/iommufd: reconstruct device Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-02 21:58 ` [PATCH V6 18/21] vfio/iommufd: change process Steve Sistare
` (4 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Skip allocation of, and attachment to, hwpt_id. Recover it from CPR state.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/vfio/iommufd.c | 30 ++++++++++++++++++++++--------
1 file changed, 22 insertions(+), 8 deletions(-)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index a650517..48c590b 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -332,7 +332,14 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
/* Try to find a domain */
QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
- ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
+ if (!cpr_is_incoming()) {
+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
+ } else if (vbasedev->cpr.hwpt_id == hwpt->hwpt_id) {
+ ret = 0;
+ } else {
+ continue;
+ }
+
if (ret) {
/* -EINVAL means the domain is incompatible with the device. */
if (ret == -EINVAL) {
@@ -349,6 +356,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
return false;
} else {
vbasedev->hwpt = hwpt;
+ vbasedev->cpr.hwpt_id = hwpt->hwpt_id;
QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
vbasedev->iommu_dirty_tracking = iommufd_hwpt_dirty_tracking(hwpt);
return true;
@@ -371,6 +379,11 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
}
+ if (cpr_is_incoming()) {
+ hwpt_id = vbasedev->cpr.hwpt_id;
+ goto skip_alloc;
+ }
+
if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
container->ioas_id, flags,
IOMMU_HWPT_DATA_NONE, 0, NULL,
@@ -378,19 +391,20 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
return false;
}
+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt_id, errp);
+ if (ret) {
+ iommufd_backend_free_id(container->be, hwpt_id);
+ return false;
+ }
+
+skip_alloc:
hwpt = g_malloc0(sizeof(*hwpt));
hwpt->hwpt_id = hwpt_id;
hwpt->hwpt_flags = flags;
QLIST_INIT(&hwpt->device_list);
- ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
- if (ret) {
- iommufd_backend_free_id(container->be, hwpt->hwpt_id);
- g_free(hwpt);
- return false;
- }
-
vbasedev->hwpt = hwpt;
+ vbasedev->cpr.hwpt_id = hwpt->hwpt_id;
vbasedev->iommu_dirty_tracking = iommufd_hwpt_dirty_tracking(hwpt);
QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH V6 18/21] vfio/iommufd: change process
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (16 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 17/21] vfio/iommufd: reconstruct hwpt Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-02 21:58 ` [PATCH V6 19/21] iommufd: preserve DMA mappings Steve Sistare
` (3 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Finish CPR by changing the owning process of the iommufd device in
post load.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/vfio/cpr-iommufd.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/hw/vfio/cpr-iommufd.c b/hw/vfio/cpr-iommufd.c
index a72b68d..cbeab57 100644
--- a/hw/vfio/cpr-iommufd.c
+++ b/hw/vfio/cpr-iommufd.c
@@ -112,10 +112,40 @@ static bool vfio_cpr_supported(IOMMUFDBackend *be, Error **errp)
return true;
}
+static int iommufd_cpr_pre_save(void *opaque)
+{
+ IOMMUFDBackend *be = opaque;
+
+ /*
+ * The process has not changed yet, but proactively try the ioctl,
+ * and it will fail if any DMA mappings are not supported.
+ */
+ if (!iommufd_change_process_capable(be)) {
+ error_report("some memory regions do not support "
+ "IOMMU_IOAS_CHANGE_PROCESS");
+ return -1;
+ }
+ return 0;
+}
+
+static int iommufd_cpr_post_load(void *opaque, int version_id)
+{
+ IOMMUFDBackend *be = opaque;
+ Error *local_err = NULL;
+
+ if (!iommufd_change_process(be, &local_err)) {
+ error_report_err(local_err);
+ return -1;
+ }
+ return 0;
+}
+
static const VMStateDescription iommufd_cpr_vmstate = {
.name = "iommufd",
.version_id = 0,
.minimum_version_id = 0,
+ .pre_save = iommufd_cpr_pre_save,
+ .post_load = iommufd_cpr_post_load,
.needed = cpr_incoming_needed,
.fields = (VMStateField[]) {
VMSTATE_END_OF_LIST()
--
1.8.3.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH V6 19/21] iommufd: preserve DMA mappings
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (17 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 18/21] vfio/iommufd: change process Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-02 21:58 ` [PATCH V6 20/21] vfio/container: delete old cpr register Steve Sistare
` (2 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
During cpr-transfer load in new QEMU, the vfio_memory_listener causes
spurious calls to map and unmap DMA regions, as devices are created and
the address space is built. This memory was already mapped by the
device in old QEMU, so suppress the map and unmap callbacks during incoming
CPR.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
backends/iommufd.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/backends/iommufd.c b/backends/iommufd.c
index e091792..2a33c7a 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -245,6 +245,10 @@ int iommufd_backend_map_file_dma(IOMMUFDBackend *be, uint32_t ioas_id,
.length = size,
};
+ if (cpr_is_incoming()) {
+ return 0;
+ }
+
if (!readonly) {
map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
}
@@ -274,6 +278,10 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
.length = size,
};
+ if (cpr_is_incoming()) {
+ return 0;
+ }
+
ret = ioctl(fd, IOMMU_IOAS_UNMAP, &unmap);
/*
* IOMMUFD takes mapping as some kind of object, unmapping
--
1.8.3.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
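The suppression in patch 19 amounts to an early return placed before the kernel call. A reduced sketch of the pattern follows, assuming a hypothetical global flag in place of cpr_is_incoming() and a counter in place of the real ioctl; map_dma, unmap_dma, set_incoming, and kernel_call_count are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpr_is_incoming(). */
static bool incoming;

/* Counts how many times the (simulated) kernel call would be issued. */
static int kernel_calls;

void set_incoming(bool value) { incoming = value; }
int kernel_call_count(void)   { return kernel_calls; }

/* Map a region; a no-op that reports success during incoming CPR. */
int map_dma(uint64_t iova, uint64_t size)
{
    (void)iova;
    assert(size > 0);
    if (incoming) {
        return 0;   /* mapping was preserved from the old process */
    }
    kernel_calls++; /* the real code would issue the map ioctl here */
    return 0;
}

/* Unmap a region; likewise suppressed during incoming CPR. */
int unmap_dma(uint64_t iova, uint64_t size)
{
    (void)iova;
    assert(size > 0);
    if (incoming) {
        return 0;
    }
    kernel_calls++; /* the real code would issue the unmap ioctl here */
    return 0;
}
```

Returning success rather than an error matters here: the memory listener treats the call as done, so address-space construction in new QEMU proceeds as usual while the kernel mappings stay untouched.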
* [PATCH V6 20/21] vfio/container: delete old cpr register
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (18 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 19/21] iommufd: preserve DMA mappings Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-02 21:58 ` [PATCH V6 21/21] vfio: doc changes for cpr Steve Sistare
2025-07-03 6:22 ` [PATCH V6 00/21] Live update: vfio and iommufd Cédric Le Goater
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
vfio_cpr_[un]register_container are no longer used, since they were
subsumed by container type-specific registration. Delete them.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
include/hw/vfio/vfio-cpr.h | 4 ----
hw/vfio/cpr.c | 13 -------------
2 files changed, 17 deletions(-)
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index 4c17cb3..e9cd9b2 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -44,10 +44,6 @@ void vfio_legacy_cpr_unregister_container(struct VFIOContainer *container);
int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier, MigrationEvent *e,
Error **errp);
-bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
- Error **errp);
-void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
-
bool vfio_iommufd_cpr_register_container(struct VFIOIOMMUFDContainer *container,
Error **errp);
void vfio_iommufd_cpr_unregister_container(
diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
index 0e903cd..af0f12a 100644
--- a/hw/vfio/cpr.c
+++ b/hw/vfio/cpr.c
@@ -29,19 +29,6 @@ int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
return 0;
}
-bool vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp)
-{
- migration_add_notifier_mode(&bcontainer->cpr_reboot_notifier,
- vfio_cpr_reboot_notifier,
- MIG_MODE_CPR_REBOOT);
- return true;
-}
-
-void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer)
-{
- migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
-}
-
#define STRDUP_VECTOR_FD_NAME(vdev, name) \
g_strdup_printf("%s_%s", (vdev)->vbasedev.name, (name))
--
1.8.3.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH V6 21/21] vfio: doc changes for cpr
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (19 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 20/21] vfio/container: delete old cpr register Steve Sistare
@ 2025-07-02 21:58 ` Steve Sistare
2025-07-03 6:22 ` [PATCH V6 00/21] Live update: vfio and iommufd Cédric Le Goater
21 siblings, 0 replies; 31+ messages in thread
From: Steve Sistare @ 2025-07-02 21:58 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Update documentation to say that cpr-transfer supports vfio and iommufd.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
docs/devel/migration/CPR.rst | 5 ++---
qapi/migration.json | 6 ++++--
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/docs/devel/migration/CPR.rst b/docs/devel/migration/CPR.rst
index 7897873..0a0fd4f 100644
--- a/docs/devel/migration/CPR.rst
+++ b/docs/devel/migration/CPR.rst
@@ -152,8 +152,7 @@ cpr-transfer mode
This mode allows the user to transfer a guest to a new QEMU instance
on the same host with minimal guest pause time, by preserving guest
RAM in place, albeit with new virtual addresses in new QEMU. Devices
-and their pinned memory pages will also be preserved in a future QEMU
-release.
+and their pinned memory pages are also preserved for VFIO and IOMMUFD.
The user starts new QEMU on the same host as old QEMU, with command-
line arguments to create the same machine, plus the ``-incoming``
@@ -322,6 +321,6 @@ Futures
cpr-transfer mode is based on a capability to transfer open file
descriptors from old to new QEMU. In the future, descriptors for
-vfio, iommufd, vhost, and char devices could be transferred,
+vhost, and char devices could be transferred,
preserving those devices and their kernel state without interruption,
even if they do not explicitly support live migration.
diff --git a/qapi/migration.json b/qapi/migration.json
index 4963f6c..e8a7d3b 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -620,8 +620,10 @@
#
# @cpr-transfer: This mode allows the user to transfer a guest to a
# new QEMU instance on the same host with minimal guest pause
-# time by preserving guest RAM in place. Devices and their pinned
-# pages will also be preserved in a future QEMU release.
+# time by preserving guest RAM in place.
+#
+# Devices and their pinned pages are also preserved for VFIO and
+# IOMMUFD. (since 10.1)
#
# The user starts new QEMU on the same host as old QEMU, with
# command-line arguments to create the same machine, plus the
--
1.8.3.1
* Re: [PATCH V6 03/21] migration: close kvm after cpr
2025-07-02 21:58 ` [PATCH V6 03/21] migration: close kvm after cpr Steve Sistare
@ 2025-07-02 22:05 ` Steven Sistare
2025-07-04 9:50 ` Duan, Zhenzhong
0 siblings, 1 reply; 31+ messages in thread
From: Steven Sistare @ 2025-07-02 22:05 UTC (permalink / raw)
To: qemu-devel, Paolo Bonzini
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas
cc Paolo.
After incorporating Peter's feedback, IMO this version reads well:
* kvm exports kvm_close
* vfio exports vfio_kvm_device_close
* vfio-cpr registers a notifier that calls vfio_kvm_device_close
- Steve
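For readers following the thread, the three-layer arrangement listed above can be sketched in isolation. This is a minimal illustration, not the QEMU code: every sketch_* name is hypothetical, and the small notifier chain only stands in for QEMU's migration notifier list. It shows the register-once guard and the event filter that the patch quoted in this message relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's MigrationEvent and NotifierWithReturn. */
enum sketch_event { SKETCH_PRECOPY_SETUP, SKETCH_PRECOPY_DONE };

struct sketch_notifier {
    int (*notify)(struct sketch_notifier *n, enum sketch_event e);
    struct sketch_notifier *next;
};

static struct sketch_notifier *sketch_chain;   /* registered notifiers */
static int sketch_close_calls;                 /* times the close layer ran */

/* Bottom layer: stands in for kvm_close() plus closing the vfio kvm fd. */
static void sketch_kvm_device_close(void)
{
    sketch_close_calls++;
}

/* Middle layer: the callback reacts to one event only and never fails. */
static int sketch_close_notifier_cb(struct sketch_notifier *n,
                                    enum sketch_event e)
{
    (void)n;
    if (e == SKETCH_PRECOPY_DONE) {
        sketch_kvm_device_close();
    }
    return 0;
}

static struct sketch_notifier sketch_close_notifier;

/* Top layer: callable once per container; only the first call registers. */
void sketch_add_kvm_notifier(void)
{
    if (!sketch_close_notifier.notify) {
        sketch_close_notifier.notify = sketch_close_notifier_cb;
        sketch_close_notifier.next = sketch_chain;
        sketch_chain = &sketch_close_notifier;
    }
}

/* Deliver an event to every registered notifier; stop at the first error. */
int sketch_notify_all(enum sketch_event e)
{
    for (struct sketch_notifier *n = sketch_chain; n; n = n->next) {
        assert(n->notify);
        int ret = n->notify(n, e);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

int sketch_close_count(void)
{
    return sketch_close_calls;
}
```

Without the guard on the notify pointer, a second registration of the same static notifier would link the node to itself in this toy chain, which is why a single global notifier is paired with a once-only check.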
On 7/2/2025 5:58 PM, Steve Sistare wrote:
> cpr-transfer breaks vfio network connectivity to and from the guest, and
> the host system log shows:
> irq bypass consumer (token 00000000a03c32e5) registration fails: -16
> which is EBUSY. This occurs because KVM descriptors are still open in
> the old QEMU process. Close them.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> ---
> include/hw/vfio/vfio-cpr.h | 2 ++
> include/hw/vfio/vfio-device.h | 2 ++
> include/system/kvm.h | 1 +
> accel/kvm/kvm-all.c | 32 ++++++++++++++++++++++++++++++++
> hw/vfio/cpr-legacy.c | 2 ++
> hw/vfio/cpr.c | 21 +++++++++++++++++++++
> hw/vfio/helpers.c | 11 +++++++++++
> 7 files changed, 71 insertions(+)
>
> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
> index 25e74ee..099d54f 100644
> --- a/include/hw/vfio/vfio-cpr.h
> +++ b/include/hw/vfio/vfio-cpr.h
> @@ -62,4 +62,6 @@ void vfio_cpr_delete_vector_fd(struct VFIOPCIDevice *vdev, const char *name,
>
> extern const VMStateDescription vfio_cpr_pci_vmstate;
>
> +void vfio_cpr_add_kvm_notifier(void);
> +
> #endif /* HW_VFIO_VFIO_CPR_H */
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index c616652..f503837 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -283,4 +283,6 @@ void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
> void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
> DeviceState *dev, bool ram_discard);
> int vfio_device_get_aw_bits(VFIODevice *vdev);
> +
> +void vfio_kvm_device_close(void);
> #endif /* HW_VFIO_VFIO_COMMON_H */
> diff --git a/include/system/kvm.h b/include/system/kvm.h
> index 7cc60d2..4896a3c 100644
> --- a/include/system/kvm.h
> +++ b/include/system/kvm.h
> @@ -195,6 +195,7 @@ bool kvm_has_sync_mmu(void);
> int kvm_has_vcpu_events(void);
> int kvm_max_nested_state_length(void);
> int kvm_has_gsi_routing(void);
> +void kvm_close(void);
>
> /**
> * kvm_arm_supports_user_irq
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index d095d1b..8141854 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -515,16 +515,23 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
> goto err;
> }
>
> + /* If I am the CPU that created coalesced_mmio_ring, then discard it */
> + if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
> + s->coalesced_mmio_ring = NULL;
> + }
> +
> ret = munmap(cpu->kvm_run, mmap_size);
> if (ret < 0) {
> goto err;
> }
> + cpu->kvm_run = NULL;
>
> if (cpu->kvm_dirty_gfns) {
> ret = munmap(cpu->kvm_dirty_gfns, s->kvm_dirty_ring_bytes);
> if (ret < 0) {
> goto err;
> }
> + cpu->kvm_dirty_gfns = NULL;
> }
>
> kvm_park_vcpu(cpu);
> @@ -608,6 +615,31 @@ err:
> return ret;
> }
>
> +void kvm_close(void)
> +{
> + CPUState *cpu;
> +
> + if (!kvm_state || kvm_state->fd == -1) {
> + return;
> + }
> +
> + CPU_FOREACH(cpu) {
> + cpu_remove_sync(cpu);
> + close(cpu->kvm_fd);
> + cpu->kvm_fd = -1;
> + close(cpu->kvm_vcpu_stats_fd);
> + cpu->kvm_vcpu_stats_fd = -1;
> + }
> +
> + if (kvm_state && kvm_state->fd != -1) {
> + close(kvm_state->vmfd);
> + kvm_state->vmfd = -1;
> + close(kvm_state->fd);
> + kvm_state->fd = -1;
> + }
> + kvm_state = NULL;
> +}
> +
> /*
> * dirty pages logging control
> */
> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
> index a84c324..daa3523 100644
> --- a/hw/vfio/cpr-legacy.c
> +++ b/hw/vfio/cpr-legacy.c
> @@ -177,6 +177,8 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
> MIG_MODE_CPR_TRANSFER, -1) == 0;
> }
>
> + vfio_cpr_add_kvm_notifier();
> +
> vmstate_register(NULL, -1, &vfio_container_vmstate, container);
>
> /* During incoming CPR, divert calls to dma_map. */
> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
> index f5555ca..0e903cd 100644
> --- a/hw/vfio/cpr.c
> +++ b/hw/vfio/cpr.c
> @@ -190,3 +190,24 @@ const VMStateDescription vfio_cpr_pci_vmstate = {
> VMSTATE_END_OF_LIST()
> }
> };
> +
> +static NotifierWithReturn kvm_close_notifier;
> +
> +static int vfio_cpr_kvm_close_notifier(NotifierWithReturn *notifier,
> + MigrationEvent *e,
> + Error **errp)
> +{
> + if (e->type == MIG_EVENT_PRECOPY_DONE) {
> + vfio_kvm_device_close();
> + }
> + return 0;
> +}
> +
> +void vfio_cpr_add_kvm_notifier(void)
> +{
> + if (!kvm_close_notifier.notify) {
> + migration_add_notifier_mode(&kvm_close_notifier,
> + vfio_cpr_kvm_close_notifier,
> + MIG_MODE_CPR_TRANSFER);
> + }
> +}
> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> index d0dbab1..9a5f621 100644
> --- a/hw/vfio/helpers.c
> +++ b/hw/vfio/helpers.c
> @@ -117,6 +117,17 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
> int vfio_kvm_device_fd = -1;
> #endif
>
> +void vfio_kvm_device_close(void)
> +{
> +#ifdef CONFIG_KVM
> + kvm_close();
> + if (vfio_kvm_device_fd != -1) {
> + close(vfio_kvm_device_fd);
> + vfio_kvm_device_fd = -1;
> + }
> +#endif
> +}
> +
> int vfio_kvm_device_add_fd(int fd, Error **errp)
> {
> #ifdef CONFIG_KVM
* RE: [PATCH V6 12/21] vfio/iommufd: register container for cpr
2025-07-02 21:58 ` [PATCH V6 12/21] vfio/iommufd: register container for cpr Steve Sistare
@ 2025-07-03 2:42 ` Duan, Zhenzhong
0 siblings, 0 replies; 31+ messages in thread
From: Duan, Zhenzhong @ 2025-07-03 2:42 UTC (permalink / raw)
To: Steve Sistare, qemu-devel@nongnu.org
Cc: Alex Williamson, Cedric Le Goater, Liu, Yi L, Eric Auger,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
>-----Original Message-----
>From: Steve Sistare <steven.sistare@oracle.com>
>Subject: [PATCH V6 12/21] vfio/iommufd: register container for cpr
>
>Register a vfio iommufd container and device for CPR, replacing the generic
>CPR register call with a more specific iommufd register call. Add a
>blocker if the kernel does not support IOMMU_IOAS_CHANGE_PROCESS.
>
>This is mostly boilerplate. The fields to be saved and restored are added
>in subsequent patches.
>
>Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Thanks
Zhenzhong
* RE: [PATCH V6 13/21] migration: vfio cpr state hook
2025-07-02 21:58 ` [PATCH V6 13/21] migration: vfio cpr state hook Steve Sistare
@ 2025-07-03 2:44 ` Duan, Zhenzhong
0 siblings, 0 replies; 31+ messages in thread
From: Duan, Zhenzhong @ 2025-07-03 2:44 UTC (permalink / raw)
To: Steve Sistare, qemu-devel@nongnu.org
Cc: Alex Williamson, Cedric Le Goater, Liu, Yi L, Eric Auger,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
>-----Original Message-----
>From: Steve Sistare <steven.sistare@oracle.com>
>Subject: [PATCH V6 13/21] migration: vfio cpr state hook
>
>Define a list of vfio devices in CPR state, in a subsection so that
>older QEMU can be live updated to this version. However, new QEMU
>will not be live updateable to old QEMU. This is acceptable because
>CPR is not yet commonly used, and updates to older versions are unusual.
>
>The contents of each device object will be defined by the vfio subsystem
>in a subsequent patch.
>
>Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Thanks
Zhenzhong
* Re: [PATCH V6 01/21] vfio-pci: preserve MSI
2025-07-02 21:58 ` [PATCH V6 01/21] vfio-pci: preserve MSI Steve Sistare
@ 2025-07-03 6:13 ` Cédric Le Goater
0 siblings, 0 replies; 31+ messages in thread
From: Cédric Le Goater @ 2025-07-03 6:13 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 7/2/25 23:58, Steve Sistare wrote:
> Save the MSI message area as part of vfio-pci vmstate, and preserve the
> interrupt and notifier eventfd's. migrate_incoming loads the MSI data,
> then the vfio-pci post_load handler finds the eventfds in CPR state,
> rebuilds vector data structures, and attaches the interrupts to the new
> KVM instance.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/vfio/pci.h | 2 +
> include/hw/vfio/vfio-cpr.h | 8 ++++
> hw/vfio/cpr.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++
> hw/vfio/pci.c | 52 ++++++++++++++++++++++++-
> 4 files changed, 157 insertions(+), 2 deletions(-)
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
* Re: [PATCH V6 02/21] vfio-pci: preserve INTx
2025-07-02 21:58 ` [PATCH V6 02/21] vfio-pci: preserve INTx Steve Sistare
@ 2025-07-03 6:13 ` Cédric Le Goater
0 siblings, 0 replies; 31+ messages in thread
From: Cédric Le Goater @ 2025-07-03 6:13 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 7/2/25 23:58, Steve Sistare wrote:
> Preserve vfio INTx state across cpr-transfer. Preserve VFIOINTx fields as
> follows:
> pin : Recover this from the vfio config in kernel space
> interrupt : Preserve its eventfd descriptor across exec.
> unmask : Ditto
> route.irq : This could perhaps be recovered in vfio_pci_post_load by
> calling pci_device_route_intx_to_irq(pin), whose implementation reads
> config space for a bridge device such as ich9. However, there is no
> guarantee that the bridge vmstate is read before vfio vmstate. Rather
> than fiddling with MigrationPriority for vmstate handlers, explicitly
> save route.irq in vfio vmstate.
> pending : save in vfio vmstate.
> mmap_timeout, mmap_timer : Re-initialize
> bool kvm_accel : Re-initialize
>
> In vfio_realize, defer calling vfio_intx_enable until the vmstate
> is available, in vfio_pci_post_load. Modify vfio_intx_enable and
> vfio_intx_kvm_enable to skip vfio initialization, but still perform
> kvm initialization.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/vfio/cpr.c | 27 ++++++++++++++++++++++++++-
> hw/vfio/pci.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 79 insertions(+), 3 deletions(-)
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
* Re: [PATCH V6 00/21] Live update: vfio and iommufd
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
` (20 preceding siblings ...)
2025-07-02 21:58 ` [PATCH V6 21/21] vfio: doc changes for cpr Steve Sistare
@ 2025-07-03 6:22 ` Cédric Le Goater
21 siblings, 0 replies; 31+ messages in thread
From: Cédric Le Goater @ 2025-07-03 6:22 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 7/2/25 23:58, Steve Sistare wrote:
> NOTE: this V6 series depends on the patch
> vfio-user: do not register vfio-user container with cpr
> which is in vfio-next.
>
> Support vfio and iommufd devices with the cpr-transfer live migration mode.
> Devices that do not support live migration can still support cpr-transfer,
> allowing live update to a new version of QEMU on the same host, with no loss
> of guest connectivity.
>
> No user-visible interfaces are added.
>
> For legacy containers:
>
> Pass vfio device descriptors to new QEMU. In new QEMU, during vfio_realize,
> skip the ioctls that configure the device, because it is already configured.
>
> Use VFIO_DMA_UNMAP_FLAG_VADDR to abandon the old VA's for DMA mapped
> regions, and use VFIO_DMA_MAP_FLAG_VADDR to register the new VA in new
> QEMU and update the locked memory accounting. The physical pages remain
> pinned, because the descriptor of the device that locked them remains open,
> so DMA to those pages continues without interruption. Mediated devices are
> not supported, however, because they require the VA to always be valid, and
> there is a brief window where no VA is registered.
>
> Save the MSI message area as part of vfio-pci vmstate, and pass the interrupt
> and notifier eventfd's to new QEMU. New QEMU loads the MSI data, then the
> vfio-pci post_load handler finds the eventfds in CPR state, rebuilds vector
> data structures, and attaches the interrupts to the new KVM instance. This
> logic also applies to iommufd containers.
>
> For iommufd containers:
>
> Use IOMMU_IOAS_MAP_FILE to register memory regions for DMA when they are
> backed by a file (including a memfd), so DMA mappings do not depend on VA,
> which can differ after live update. This allows mediated devices to be
> supported.
>
> Pass the iommufd and vfio device descriptors from old to new QEMU. In new
> QEMU, during vfio_realize, skip the ioctls that configure the device, because
> it is already configured.
>
> In new QEMU, call ioctl(IOMMU_IOAS_CHANGE_PROCESS) to update mm ownership and
> locked memory accounting.
>
> Patches 3 to 8 are specific to legacy containers.
> Patches 21 to 36 are specific to iommufd containers.
> The remainder apply to both.
>
> Changes from previous versions:
> * V1 of this series contains minor changes from the "Live update: vfio" and
> "Live update: iommufd" series, mainly bug fixes and refactored patches.
>
> Changes in V2:
> * refactored various vfio code snippets into new cpr helpers
> * refactored vfio struct members into cpr-specific structures
> * refactored various small changes into their own patches
> * split complex patches. Notably:
> - split "refactor for cpr" into 5 patches
> - split "reconstruct device" into 4 patches
> * refactored vfio_connect_container using helpers and made its
> error recovery more robust.
> * moved vfio pci msi/vector/intx cpr functions to cpr.c
> * renamed "reused" to cpr_reused and cpr.reused
> * squashed vfio_cpr_[un]register_container to their call sites
> * simplified iommu_type setting after cpr
> * added cpr_open_fd and cpr_is_incoming helpers
> * removed changes from vfio_legacy_dma_map, and instead temporarily
> override dma_map and dma_unmap ops.
> * deleted error_report and returned Error to callers where possible.
> * simplified the memory_get_xlat_addr interface
> * fixed flags passed to iommufd_backend_alloc_hwpt
> * defined MIG_PRI_UNINITIALIZED
> * added maintainers
>
> Changes in V3:
> * removed cleanup patches that were already pulled
> * rebased to latest master
>
> Changes in V4:
> * added SPDX-License-Identifier
> * patch "vfio/container: preserve descriptors"
> - rewrote search loop in vfio_container_connect
> - do not return pfd from vfio_cpr_container_match
> - add helper for VFIO_GROUP_GET_DEVICE_FD
> * deleted patch "export vfio_legacy_dma_map"
> * patch "vfio/container: restore DMA vaddr"
> - deleted redundant error_report from vfio_legacy_cpr_dma_map
> - save old dma_map function
> * patch "vfio-pci: skip reset during cpr"
> - use cpr_is_incoming instead of cpr_reused
> * renamed err -> local_err in all new code
> * patch "export MSI functions"
> - renamed with vfio_pci prefix, and defined wrappers for low level
> routines instead of exporting them.
> * patch "close kvm after cpr"
> - fixed build error for !CONFIG_KVM
> * added the cpr_resave_fd helper
> * dropped patch "pass ramblock to vfio_container_dma_map", relying on
> "pass MemoryRegion" from the vfio-user series instead.
> * deleted "reused" variables, replaced with cpr_is_incoming()
> * renamed cpr_needed_for_reuse -> cpr_incoming_needed
> * rewrote patch "pci: skip reset during cpr"
> * rebased to latest master
>
> for iommufd:
> * deleted redundant error_report from iommufd_backend_map_file_dma
> * added interface doc for dma_map_file
> * check return value of cpr_open_fd
> * deleted "export iommufd_cdev_get_info_iova_range"
> * deleted "reconstruct device"
> * deleted "reconstruct hw_caps"
> * deleted "define hwpt constructors"
> * separated cpr registration for iommufd be and vfio container
> * correctly attach to multiple containers per iommufd using ioas_id
> * simplified "reconstruct hwpt" by matching against hwpt_id.
> * added patch "add vfio_device_free_name"
>
> Changes in V5:
> * dropped: vfio/pci: vfio_pci_put_device on failure
> * added: "vfio: doc changes for cpr"
> * deleted unnecessary include of vfio-cpr.h
> * fixed compilation for !CONFIG_VFIO and !CONFIG_IOMMUFD
> * misc minor changes
> * Added RB's, rebased to master
>
> Changes in V6:
> * dropped already-pulled patches
> * converted remaining g_free in "add vfio_device_free_name"
> * fixed iommufd_backend_disconnect in "preserve descriptors"
> * tweaked vfio_cpr_load_device in "preserve descriptors"
> * added trace_vfio_cpr_find_device in "cpr state"
> * rewrote vfio_notifier_init and vfio_msix_vector_use
> * rewrote the notifier in "close kvm after cpr"
> * Added RB's, rebased to master
>
>
> Steve Sistare (21):
> vfio-pci: preserve MSI
> vfio-pci: preserve INTx
> migration: close kvm after cpr
> migration: cpr_get_fd_param helper
> backends/iommufd: iommufd_backend_map_file_dma
> backends/iommufd: change process ioctl
> physmem: qemu_ram_get_fd_offset
> vfio/iommufd: use IOMMU_IOAS_MAP_FILE
> vfio/iommufd: invariant device name
> vfio/iommufd: add vfio_device_free_name
> vfio/iommufd: device name blocker
> vfio/iommufd: register container for cpr
> migration: vfio cpr state hook
> vfio/iommufd: cpr state
> vfio/iommufd: preserve descriptors
> vfio/iommufd: reconstruct device
> vfio/iommufd: reconstruct hwpt
> vfio/iommufd: change process
> iommufd: preserve DMA mappings
> vfio/container: delete old cpr register
> vfio: doc changes for cpr
>
> docs/devel/migration/CPR.rst | 5 +-
> qapi/migration.json | 6 +-
> hw/vfio/pci.h | 2 +
> include/exec/cpu-common.h | 1 +
> include/hw/vfio/vfio-container-base.h | 15 +++
> include/hw/vfio/vfio-cpr.h | 29 ++++-
> include/hw/vfio/vfio-device.h | 3 +
> include/migration/cpr.h | 14 +++
> include/system/iommufd.h | 7 ++
> include/system/kvm.h | 1 +
> accel/kvm/kvm-all.c | 32 +++++
> backends/iommufd.c | 107 +++++++++++++++-
> hw/vfio/ap.c | 4 +-
> hw/vfio/ccw.c | 4 +-
> hw/vfio/container-base.c | 9 ++
> hw/vfio/cpr-iommufd.c | 224 ++++++++++++++++++++++++++++++++++
> hw/vfio/cpr-legacy.c | 2 +
> hw/vfio/cpr.c | 144 ++++++++++++++++++++--
> hw/vfio/device.c | 40 ++++--
> hw/vfio/helpers.c | 11 ++
> hw/vfio/iommufd-stubs.c | 18 +++
> hw/vfio/iommufd.c | 81 ++++++++++--
> hw/vfio/pci.c | 109 ++++++++++++++++-
> hw/vfio/platform.c | 2 +-
> migration/cpr.c | 52 ++++++--
> system/physmem.c | 5 +
> backends/trace-events | 2 +
> hw/vfio/meson.build | 2 +
> hw/vfio/trace-events | 3 +
> 29 files changed, 871 insertions(+), 63 deletions(-)
> create mode 100644 hw/vfio/cpr-iommufd.c
> create mode 100644 hw/vfio/iommufd-stubs.c
>
Applied to vfio-next.
Thanks,
C.
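For reference, the legacy-container virtual address handoff described in the quoted cover letter can be sketched as two ioctl calls on the container descriptor. This is a rough illustration under stated assumptions, not the code from the series: the sketch_* function names are hypothetical, error handling is reduced to returning -errno, and the Linux uapi headers are assumed to be recent enough to define VFIO_DMA_UNMAP_FLAG_VADDR, VFIO_DMA_UNMAP_FLAG_ALL and VFIO_DMA_MAP_FLAG_VADDR.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Old QEMU side: abandon the virtual address of every DMA mapping in the
 * container. The mappings and their pinned pages stay in place; only the
 * VA is invalidated. FLAG_ALL requires iova and size to be zero.
 */
int sketch_invalidate_all_vaddr(int container_fd)
{
    struct vfio_iommu_type1_dma_unmap unmap = {
        .argsz = sizeof(unmap),
        .flags = VFIO_DMA_UNMAP_FLAG_VADDR | VFIO_DMA_UNMAP_FLAG_ALL,
        .iova = 0,
        .size = 0,
    };

    return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap) ? -errno : 0;
}

/*
 * New QEMU side: register the new virtual address for an existing mapping.
 * iova and size are expected to match a mapping created by old QEMU, and
 * no read/write flags are passed together with FLAG_VADDR.
 */
int sketch_set_vaddr(int container_fd, uint64_t iova, uint64_t size,
                     void *vaddr)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_VADDR,
        .vaddr = (uint64_t)(uintptr_t)vaddr,
        .iova = iova,
        .size = size,
    };

    assert(size > 0);
    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) ? -errno : 0;
}
```

Between the two calls no VA is registered for the affected ranges, which is the brief window the cover letter cites as the reason mediated devices are not supported with legacy containers.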
* RE: [PATCH V6 03/21] migration: close kvm after cpr
2025-07-02 22:05 ` Steven Sistare
@ 2025-07-04 9:50 ` Duan, Zhenzhong
2025-07-07 13:07 ` Steven Sistare
0 siblings, 1 reply; 31+ messages in thread
From: Duan, Zhenzhong @ 2025-07-04 9:50 UTC (permalink / raw)
To: Steven Sistare, qemu-devel@nongnu.org, Paolo Bonzini
Cc: Alex Williamson, Cedric Le Goater, Liu, Yi L, Eric Auger,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
>-----Original Message-----
>From: Steven Sistare <steven.sistare@oracle.com>
>Subject: Re: [PATCH V6 03/21] migration: close kvm after cpr
>
>cc Paolo.
>
>After incorporating Peter's feedback, IMO this version reads well:
> * kvm exports kvm_close
> * vfio exports vfio_kvm_device_close
> * vfio-cpr registers a notifier that calls vfio_kvm_device_close
>
>- Steve
>
>On 7/2/2025 5:58 PM, Steve Sistare wrote:
>> cpr-transfer breaks vfio network connectivity to and from the guest, and
>> the host system log shows:
>> irq bypass consumer (token 00000000a03c32e5) registration fails: -16
>> which is EBUSY. This occurs because KVM descriptors are still open in
>> the old QEMU process. Close them.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> Reviewed-by: Fabiano Rosas <farosas@suse.de>
>> ---
>> include/hw/vfio/vfio-cpr.h | 2 ++
>> include/hw/vfio/vfio-device.h | 2 ++
>> include/system/kvm.h | 1 +
>> accel/kvm/kvm-all.c | 32 ++++++++++++++++++++++++++++++++
>> hw/vfio/cpr-legacy.c | 2 ++
>> hw/vfio/cpr.c | 21 +++++++++++++++++++++
>> hw/vfio/helpers.c | 11 +++++++++++
>> 7 files changed, 71 insertions(+)
>>
>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>> index 25e74ee..099d54f 100644
>> --- a/include/hw/vfio/vfio-cpr.h
>> +++ b/include/hw/vfio/vfio-cpr.h
>> @@ -62,4 +62,6 @@ void vfio_cpr_delete_vector_fd(struct VFIOPCIDevice *vdev, const char *name,
>>
>> extern const VMStateDescription vfio_cpr_pci_vmstate;
>>
>> +void vfio_cpr_add_kvm_notifier(void);
>> +
>> #endif /* HW_VFIO_VFIO_CPR_H */
>> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
>> index c616652..f503837 100644
>> --- a/include/hw/vfio/vfio-device.h
>> +++ b/include/hw/vfio/vfio-device.h
>> @@ -283,4 +283,6 @@ void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
>> void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
>> DeviceState *dev, bool ram_discard);
>> int vfio_device_get_aw_bits(VFIODevice *vdev);
>> +
>> +void vfio_kvm_device_close(void);
>> #endif /* HW_VFIO_VFIO_COMMON_H */
>> diff --git a/include/system/kvm.h b/include/system/kvm.h
>> index 7cc60d2..4896a3c 100644
>> --- a/include/system/kvm.h
>> +++ b/include/system/kvm.h
>> @@ -195,6 +195,7 @@ bool kvm_has_sync_mmu(void);
>> int kvm_has_vcpu_events(void);
>> int kvm_max_nested_state_length(void);
>> int kvm_has_gsi_routing(void);
>> +void kvm_close(void);
>>
>> /**
>> * kvm_arm_supports_user_irq
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index d095d1b..8141854 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -515,16 +515,23 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
>> goto err;
>> }
>>
>> + /* If I am the CPU that created coalesced_mmio_ring, then discard it */
>> + if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
>> + s->coalesced_mmio_ring = NULL;
>> + }
>> +
>> ret = munmap(cpu->kvm_run, mmap_size);
>> if (ret < 0) {
>> goto err;
>> }
>> + cpu->kvm_run = NULL;
>>
>> if (cpu->kvm_dirty_gfns) {
>> ret = munmap(cpu->kvm_dirty_gfns, s->kvm_dirty_ring_bytes);
>> if (ret < 0) {
>> goto err;
>> }
>> + cpu->kvm_dirty_gfns = NULL;
>> }
>>
>> kvm_park_vcpu(cpu);
>> @@ -608,6 +615,31 @@ err:
>> return ret;
>> }
>>
>> +void kvm_close(void)
>> +{
>> + CPUState *cpu;
>> +
>> + if (!kvm_state || kvm_state->fd == -1) {
>> + return;
>> + }
>> +
>> + CPU_FOREACH(cpu) {
>> + cpu_remove_sync(cpu);
>> + close(cpu->kvm_fd);
>> + cpu->kvm_fd = -1;
>> + close(cpu->kvm_vcpu_stats_fd);
>> + cpu->kvm_vcpu_stats_fd = -1;
>> + }
>> +
>> + if (kvm_state && kvm_state->fd != -1) {
>> + close(kvm_state->vmfd);
>> + kvm_state->vmfd = -1;
>> + close(kvm_state->fd);
>> + kvm_state->fd = -1;
>> + }
>> + kvm_state = NULL;
>> +}
>> +
>> /*
>> * dirty pages logging control
>> */
>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>> index a84c324..daa3523 100644
>> --- a/hw/vfio/cpr-legacy.c
>> +++ b/hw/vfio/cpr-legacy.c
>> @@ -177,6 +177,8 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>> MIG_MODE_CPR_TRANSFER, -1) == 0;
>> }
>>
>> + vfio_cpr_add_kvm_notifier();
Hi Steven, I just noticed this. Do we need to do the same for iommufd?
Do we need to delete the notifier when all VFIO devices are hot-unplugged?
I see Cedric has just sent a PR. If I'm right, maybe a follow-up patch could address it?
Thanks
Zhenzhong
* Re: [PATCH V6 03/21] migration: close kvm after cpr
2025-07-04 9:50 ` Duan, Zhenzhong
@ 2025-07-07 13:07 ` Steven Sistare
2025-07-08 3:04 ` Duan, Zhenzhong
0 siblings, 1 reply; 31+ messages in thread
From: Steven Sistare @ 2025-07-07 13:07 UTC (permalink / raw)
To: Duan, Zhenzhong
Cc: Alex Williamson, Cedric Le Goater, Liu, Yi L, Eric Auger,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas,
Paolo Bonzini, qemu-devel@nongnu.org
On 7/4/2025 5:50 AM, Duan, Zhenzhong wrote:
>> -----Original Message-----
>> From: Steven Sistare <steven.sistare@oracle.com>
>> Subject: Re: [PATCH V6 03/21] migration: close kvm after cpr
>>
>> cc Paolo.
>>
>> After incorporating Peter's feedback, IMO this version reads well:
>> * kvm exports kvm_close
>> * vfio exports vfio_kvm_device_close
>> * vfio-cpr registers a notifier that calls vfio_kvm_device_close
>>
>> - Steve
>>
>> On 7/2/2025 5:58 PM, Steve Sistare wrote:
>>> cpr-transfer breaks vfio network connectivity to and from the guest, and
>>> the host system log shows:
>>> irq bypass consumer (token 00000000a03c32e5) registration fails: -16
>>> which is EBUSY. This occurs because KVM descriptors are still open in
>>> the old QEMU process. Close them.
>>>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> Reviewed-by: Fabiano Rosas <farosas@suse.de>
>>> ---
>>> include/hw/vfio/vfio-cpr.h | 2 ++
>>> include/hw/vfio/vfio-device.h | 2 ++
>>> include/system/kvm.h | 1 +
>>> accel/kvm/kvm-all.c | 32 ++++++++++++++++++++++++++++++++
>>> hw/vfio/cpr-legacy.c | 2 ++
>>> hw/vfio/cpr.c | 21 +++++++++++++++++++++
>>> hw/vfio/helpers.c | 11 +++++++++++
>>> 7 files changed, 71 insertions(+)
>>>
>>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>>> index 25e74ee..099d54f 100644
>>> --- a/include/hw/vfio/vfio-cpr.h
>>> +++ b/include/hw/vfio/vfio-cpr.h
>>> @@ -62,4 +62,6 @@ void vfio_cpr_delete_vector_fd(struct VFIOPCIDevice *vdev, const char *name,
>>>
>>> extern const VMStateDescription vfio_cpr_pci_vmstate;
>>>
>>> +void vfio_cpr_add_kvm_notifier(void);
>>> +
>>> #endif /* HW_VFIO_VFIO_CPR_H */
>>> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
>>> index c616652..f503837 100644
>>> --- a/include/hw/vfio/vfio-device.h
>>> +++ b/include/hw/vfio/vfio-device.h
>>> @@ -283,4 +283,6 @@ void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
>>> void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
>>> DeviceState *dev, bool ram_discard);
>>> int vfio_device_get_aw_bits(VFIODevice *vdev);
>>> +
>>> +void vfio_kvm_device_close(void);
>>> #endif /* HW_VFIO_VFIO_COMMON_H */
>>> diff --git a/include/system/kvm.h b/include/system/kvm.h
>>> index 7cc60d2..4896a3c 100644
>>> --- a/include/system/kvm.h
>>> +++ b/include/system/kvm.h
>>> @@ -195,6 +195,7 @@ bool kvm_has_sync_mmu(void);
>>> int kvm_has_vcpu_events(void);
>>> int kvm_max_nested_state_length(void);
>>> int kvm_has_gsi_routing(void);
>>> +void kvm_close(void);
>>>
>>> /**
>>> * kvm_arm_supports_user_irq
>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>> index d095d1b..8141854 100644
>>> --- a/accel/kvm/kvm-all.c
>>> +++ b/accel/kvm/kvm-all.c
>>> @@ -515,16 +515,23 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
>>> goto err;
>>> }
>>>
>>> + /* If I am the CPU that created coalesced_mmio_ring, then discard it */
>>> + if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
>>> + s->coalesced_mmio_ring = NULL;
>>> + }
>>> +
>>> ret = munmap(cpu->kvm_run, mmap_size);
>>> if (ret < 0) {
>>> goto err;
>>> }
>>> + cpu->kvm_run = NULL;
>>>
>>> if (cpu->kvm_dirty_gfns) {
>>> ret = munmap(cpu->kvm_dirty_gfns, s->kvm_dirty_ring_bytes);
>>> if (ret < 0) {
>>> goto err;
>>> }
>>> + cpu->kvm_dirty_gfns = NULL;
>>> }
>>>
>>> kvm_park_vcpu(cpu);
>>> @@ -608,6 +615,31 @@ err:
>>> return ret;
>>> }
>>>
>>> +void kvm_close(void)
>>> +{
>>> + CPUState *cpu;
>>> +
>>> + if (!kvm_state || kvm_state->fd == -1) {
>>> + return;
>>> + }
>>> +
>>> + CPU_FOREACH(cpu) {
>>> + cpu_remove_sync(cpu);
>>> + close(cpu->kvm_fd);
>>> + cpu->kvm_fd = -1;
>>> + close(cpu->kvm_vcpu_stats_fd);
>>> + cpu->kvm_vcpu_stats_fd = -1;
>>> + }
>>> +
>>> + if (kvm_state && kvm_state->fd != -1) {
>>> + close(kvm_state->vmfd);
>>> + kvm_state->vmfd = -1;
>>> + close(kvm_state->fd);
>>> + kvm_state->fd = -1;
>>> + }
>>> + kvm_state = NULL;
>>> +}
>>> +
>>> /*
>>> * dirty pages logging control
>>> */
>>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>>> index a84c324..daa3523 100644
>>> --- a/hw/vfio/cpr-legacy.c
>>> +++ b/hw/vfio/cpr-legacy.c
>>> @@ -177,6 +177,8 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>>> MIG_MODE_CPR_TRANSFER, -1) == 0;
>>> }
>>>
>>> + vfio_cpr_add_kvm_notifier();
>
> Hi Steven, I just noticed this. Do we need to do the same for iommufd?
Yes, and that call is added in patch
"vfio/iommufd: register container for cpr"
> Do we need to delete the notifier when all VFIO devices are hot-unplugged?
No need. The notifier will still be called, and it will close the kvm
descriptors and vfio_kvm_device_fd. That is not strictly necessary if vfio
devices are no longer present, but it is not harmful either.
- Steve
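The reason a leftover notifier is harmless can be illustrated with a small sketch of the guarded-close idiom that vfio_kvm_device_close uses in the quoted patch. The sketch_* names below are hypothetical and the code is only an illustration: a descriptor variable uses -1 as its "not open" sentinel, so a close routine can run any number of times, including when nothing was ever opened.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical stand-in for a global such as vfio_kvm_device_fd. */
static int sketch_device_fd = -1;

void sketch_set_device_fd(int fd)
{
    assert(fd >= 0);
    sketch_device_fd = fd;
}

int sketch_get_device_fd(void)
{
    return sketch_device_fd;
}

/*
 * Close the descriptor if one is open and reset the sentinel.
 * Returns 1 if a descriptor was closed, 0 if there was nothing to do.
 */
int sketch_device_close(void)
{
    if (sketch_device_fd == -1) {
        return 0;
    }
    close(sketch_device_fd);
    sketch_device_fd = -1;
    return 1;
}
```

Because the sentinel is reset after the first close, a later invocation from a notifier that outlived its devices falls through the early return and touches no descriptor.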
* RE: [PATCH V6 03/21] migration: close kvm after cpr
2025-07-07 13:07 ` Steven Sistare
@ 2025-07-08 3:04 ` Duan, Zhenzhong
0 siblings, 0 replies; 31+ messages in thread
From: Duan, Zhenzhong @ 2025-07-08 3:04 UTC (permalink / raw)
To: Steven Sistare
Cc: Alex Williamson, Cedric Le Goater, Liu, Yi L, Eric Auger,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas,
Paolo Bonzini, qemu-devel@nongnu.org
>-----Original Message-----
>From: Steven Sistare <steven.sistare@oracle.com>
>Subject: Re: [PATCH V6 03/21] migration: close kvm after cpr
>
>On 7/4/2025 5:50 AM, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Steven Sistare <steven.sistare@oracle.com>
>>> Subject: Re: [PATCH V6 03/21] migration: close kvm after cpr
>>>
>>> cc Paolo.
>>>
>>> After incorporating Peter's feedback, IMO this version reads well:
>>> * kvm exports kvm_close
>>> * vfio exports vfio_kvm_device_close
>>> * vfio-cpr registers a notifier that calls vfio_kvm_device_close
>>>
>>> - Steve
>>>
>>> On 7/2/2025 5:58 PM, Steve Sistare wrote:
>>>> cpr-transfer breaks vfio network connectivity to and from the guest, and
>>>> the host system log shows:
>>>> irq bypass consumer (token 00000000a03c32e5) registration fails: -16
>>>> which is EBUSY. This occurs because KVM descriptors are still open in
>>>> the old QEMU process. Close them.
>>>>
>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>> Reviewed-by: Fabiano Rosas <farosas@suse.de>
>>>> ---
>>>> include/hw/vfio/vfio-cpr.h | 2 ++
>>>> include/hw/vfio/vfio-device.h | 2 ++
>>>> include/system/kvm.h | 1 +
>>>> accel/kvm/kvm-all.c | 32 ++++++++++++++++++++++++++++++++
>>>> hw/vfio/cpr-legacy.c | 2 ++
>>>> hw/vfio/cpr.c | 21 +++++++++++++++++++++
>>>> hw/vfio/helpers.c | 11 +++++++++++
>>>> 7 files changed, 71 insertions(+)
>>>>
>>>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>>>> index 25e74ee..099d54f 100644
>>>> --- a/include/hw/vfio/vfio-cpr.h
>>>> +++ b/include/hw/vfio/vfio-cpr.h
>>>> @@ -62,4 +62,6 @@ void vfio_cpr_delete_vector_fd(struct VFIOPCIDevice *vdev, const char *name,
>>>>
>>>> extern const VMStateDescription vfio_cpr_pci_vmstate;
>>>>
>>>> +void vfio_cpr_add_kvm_notifier(void);
>>>> +
>>>> #endif /* HW_VFIO_VFIO_CPR_H */
>>>> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
>>>> index c616652..f503837 100644
>>>> --- a/include/hw/vfio/vfio-device.h
>>>> +++ b/include/hw/vfio/vfio-device.h
>>>> @@ -283,4 +283,6 @@ void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
>>>> void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
>>>> DeviceState *dev, bool ram_discard);
>>>> int vfio_device_get_aw_bits(VFIODevice *vdev);
>>>> +
>>>> +void vfio_kvm_device_close(void);
>>>> #endif /* HW_VFIO_VFIO_COMMON_H */
>>>> diff --git a/include/system/kvm.h b/include/system/kvm.h
>>>> index 7cc60d2..4896a3c 100644
>>>> --- a/include/system/kvm.h
>>>> +++ b/include/system/kvm.h
>>>> @@ -195,6 +195,7 @@ bool kvm_has_sync_mmu(void);
>>>> int kvm_has_vcpu_events(void);
>>>> int kvm_max_nested_state_length(void);
>>>> int kvm_has_gsi_routing(void);
>>>> +void kvm_close(void);
>>>>
>>>> /**
>>>> * kvm_arm_supports_user_irq
>>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>>> index d095d1b..8141854 100644
>>>> --- a/accel/kvm/kvm-all.c
>>>> +++ b/accel/kvm/kvm-all.c
>>>> @@ -515,16 +515,23 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
>>>> goto err;
>>>> }
>>>>
>>>> + /* If I am the CPU that created coalesced_mmio_ring, then discard it */
>>>> + if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
>>>> + s->coalesced_mmio_ring = NULL;
>>>> + }
>>>> +
>>>> ret = munmap(cpu->kvm_run, mmap_size);
>>>> if (ret < 0) {
>>>> goto err;
>>>> }
>>>> + cpu->kvm_run = NULL;
>>>>
>>>> if (cpu->kvm_dirty_gfns) {
>>>> ret = munmap(cpu->kvm_dirty_gfns, s->kvm_dirty_ring_bytes);
>>>> if (ret < 0) {
>>>> goto err;
>>>> }
>>>> + cpu->kvm_dirty_gfns = NULL;
>>>> }
>>>>
>>>> kvm_park_vcpu(cpu);
>>>> @@ -608,6 +615,31 @@ err:
>>>> return ret;
>>>> }
>>>>
>>>> +void kvm_close(void)
>>>> +{
>>>> + CPUState *cpu;
>>>> +
>>>> + if (!kvm_state || kvm_state->fd == -1) {
>>>> + return;
>>>> + }
>>>> +
>>>> + CPU_FOREACH(cpu) {
>>>> + cpu_remove_sync(cpu);
>>>> + close(cpu->kvm_fd);
>>>> + cpu->kvm_fd = -1;
>>>> + close(cpu->kvm_vcpu_stats_fd);
>>>> + cpu->kvm_vcpu_stats_fd = -1;
>>>> + }
>>>> +
>>>> + if (kvm_state && kvm_state->fd != -1) {
>>>> + close(kvm_state->vmfd);
>>>> + kvm_state->vmfd = -1;
>>>> + close(kvm_state->fd);
>>>> + kvm_state->fd = -1;
>>>> + }
>>>> + kvm_state = NULL;
>>>> +}
>>>> +
>>>> /*
>>>> * dirty pages logging control
>>>> */
>>>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>>>> index a84c324..daa3523 100644
>>>> --- a/hw/vfio/cpr-legacy.c
>>>> +++ b/hw/vfio/cpr-legacy.c
>>>> @@ -177,6 +177,8 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>>>> MIG_MODE_CPR_TRANSFER, -1) == 0;
>>>> }
>>>>
>>>> + vfio_cpr_add_kvm_notifier();
>>
>> Hi Steven, I just noticed this, do we need to do the same for iommufd?
>
>Yes, that call is added in the patch
> "vfio/iommufd: register container for cpr"
>
>> Do we need to delete the notifier when all VFIO devices are hot-unplugged?
>
>No need. The notifier will still be called, and it will close the kvm
>descriptors and vfio_kvm_device_fd. That is not strictly necessary if vfio
>devices are no longer present, but it is not harmful either.

Clear, no problem.
Thanks
Zhenzhong
end of thread, other threads:[~2025-07-08 20:52 UTC | newest]
Thread overview: 31+ messages
2025-07-02 21:58 [PATCH V6 00/21] Live update: vfio and iommufd Steve Sistare
2025-07-02 21:58 ` [PATCH V6 01/21] vfio-pci: preserve MSI Steve Sistare
2025-07-03 6:13 ` Cédric Le Goater
2025-07-02 21:58 ` [PATCH V6 02/21] vfio-pci: preserve INTx Steve Sistare
2025-07-03 6:13 ` Cédric Le Goater
2025-07-02 21:58 ` [PATCH V6 03/21] migration: close kvm after cpr Steve Sistare
2025-07-02 22:05 ` Steven Sistare
2025-07-04 9:50 ` Duan, Zhenzhong
2025-07-07 13:07 ` Steven Sistare
2025-07-08 3:04 ` Duan, Zhenzhong
2025-07-02 21:58 ` [PATCH V6 04/21] migration: cpr_get_fd_param helper Steve Sistare
2025-07-02 21:58 ` [PATCH V6 05/21] backends/iommufd: iommufd_backend_map_file_dma Steve Sistare
2025-07-02 21:58 ` [PATCH V6 06/21] backends/iommufd: change process ioctl Steve Sistare
2025-07-02 21:58 ` [PATCH V6 07/21] physmem: qemu_ram_get_fd_offset Steve Sistare
2025-07-02 21:58 ` [PATCH V6 08/21] vfio/iommufd: use IOMMU_IOAS_MAP_FILE Steve Sistare
2025-07-02 21:58 ` [PATCH V6 09/21] vfio/iommufd: invariant device name Steve Sistare
2025-07-02 21:58 ` [PATCH V6 10/21] vfio/iommufd: add vfio_device_free_name Steve Sistare
2025-07-02 21:58 ` [PATCH V6 11/21] vfio/iommufd: device name blocker Steve Sistare
2025-07-02 21:58 ` [PATCH V6 12/21] vfio/iommufd: register container for cpr Steve Sistare
2025-07-03 2:42 ` Duan, Zhenzhong
2025-07-02 21:58 ` [PATCH V6 13/21] migration: vfio cpr state hook Steve Sistare
2025-07-03 2:44 ` Duan, Zhenzhong
2025-07-02 21:58 ` [PATCH V6 14/21] vfio/iommufd: cpr state Steve Sistare
2025-07-02 21:58 ` [PATCH V6 15/21] vfio/iommufd: preserve descriptors Steve Sistare
2025-07-02 21:58 ` [PATCH V6 16/21] vfio/iommufd: reconstruct device Steve Sistare
2025-07-02 21:58 ` [PATCH V6 17/21] vfio/iommufd: reconstruct hwpt Steve Sistare
2025-07-02 21:58 ` [PATCH V6 18/21] vfio/iommufd: change process Steve Sistare
2025-07-02 21:58 ` [PATCH V6 19/21] iommufd: preserve DMA mappings Steve Sistare
2025-07-02 21:58 ` [PATCH V6 20/21] vfio/container: delete old cpr register Steve Sistare
2025-07-02 21:58 ` [PATCH V6 21/21] vfio: doc changes for cpr Steve Sistare
2025-07-03 6:22 ` [PATCH V6 00/21] Live update: vfio and iommufd Cédric Le Goater