* [RFC V1 00/12] Live update: iommufd
@ 2024-07-20 19:15 Steve Sistare
2024-07-20 19:15 ` [RFC V1 01/12] vfio: move cpr_exec_notifier Steve Sistare
` (11 more replies)
0 siblings, 12 replies; 18+ messages in thread
From: Steve Sistare @ 2024-07-20 19:15 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand, Steve Sistare
Support iommufd devices with the cpr-exec live migration mode.
No user-visible interfaces are added.
Pass the iommufd and vfio device descriptors from old to new QEMU. In new
QEMU, during vfio_realize, skip the ioctls that configure the device, because
it is already configured.
In new QEMU, call ioctl(IOMMU_IOAS_CHANGE_PROCESS) to update the mm ownership,
locked memory accounting, and virtual address of all DMA mappings. The old
virtual address of each memory region is needed to identify the existing
mapping, so pass the host address of each RAMBlock in the migration data
stream.
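The address update described above can be pictured as a table of old/new virtual address pairs, one per RAMBlock. What follows is a minimal sketch and not code from this series: `struct ram_block_info`, `struct umap_entry`, and `build_umap()` are hypothetical names, and `struct umap_entry` only mirrors the three fields of the preliminary `struct iommu_ioas_userspace_map` defined in patch 07.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-RAMBlock record. The old host address would arrive in
 * the migration stream; the new one is where new QEMU mapped the block.
 */
struct ram_block_info {
    uint64_t old_host;
    uint64_t new_host;
    uint64_t length;    /* bytes */
};

/* Local mirror of the fields in the preliminary kernel structure. */
struct umap_entry {
    uint64_t addr_old;
    uint64_t addr_new;
    uint64_t size;      /* bytes */
};

/*
 * Fill 'out' with one entry per block. Returns the number of entries
 * written, or 0 if 'out' cannot hold all of them.
 */
size_t build_umap(const struct ram_block_info *blocks, size_t n_blocks,
                  struct umap_entry *out, size_t out_len)
{
    assert(blocks != NULL && out != NULL);

    if (n_blocks > out_len) {
        return 0;
    }
    for (size_t i = 0; i < n_blocks; i++) {
        out[i].addr_old = blocks[i].old_host;
        out[i].addr_new = blocks[i].new_host;
        out[i].size = blocks[i].length;
    }
    return n_blocks;
}
```

In this sketch the old address acts as the lookup key for an existing mapping and the new address is the value to install, which is why the old host address of each RAMBlock has to travel with the migration data.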
Block CPR if the iommufd container has any vfio mdevs (mediated devices).
IOMMU_IOAS_CHANGE_PROCESS can be used as-is to support mdevs, but it requires
extra work in userland at CPR time so that kernel threads have access to the
old mappings until the mappings are updated in new QEMU. I have prototyped
those changes but they need more work before posting.
This series depends on the following QEMU series:
[PATCH V1 00/08] Live update: vfio
https://lore.kernel.org/qemu-devel/1720558737-451106-1-git-send-email-steven.sistare@oracle.com/
This series depends on the IOMMU_IOAS_CHANGE_PROCESS kernel interface which
is a work in progress:
iommufd live update
https://lore.kernel.org/linux-iommu/1721501805-86928-1-git-send-email-steven.sistare@oracle.com
Steve Sistare (12):
vfio: move cpr_exec_notifier
iommufd: no DMA to BARs
iommufd: pass name to connect
migration: cpr_find_fd_any
iommufd: preserve device fd
iommufd: export iommufd_cdev_get_info_iova_range
iommufd: change_process kernel interface
vfio/iommufd: register container for cpr
vfio/iommufd: rebuild device
migration/ram: old host address
iommufd: update DMA virtual addresses
vfio: mdev blocker
backends/iommufd.c | 113 ++++++++++++++++++++++++++++++++--
hw/core/machine.c | 6 ++
hw/vfio/common.c | 3 +-
hw/vfio/cpr-iommufd.c | 84 +++++++++++++++++++++++++
hw/vfio/cpr-legacy.c | 10 +--
hw/vfio/helpers.c | 1 +
hw/vfio/iommufd.c | 43 ++++++++++---
hw/vfio/meson.build | 1 +
hw/vfio/pci.c | 10 +++
include/exec/memory.h | 1 +
include/exec/ramblock.h | 1 +
include/hw/vfio/vfio-common.h | 7 ++-
include/hw/vfio/vfio-container-base.h | 3 +
include/migration/cpr.h | 1 +
include/sysemu/iommufd.h | 7 ++-
linux-headers/linux/iommufd.h | 19 ++++++
migration/cpr.c | 15 +++++
migration/migration.h | 2 +
migration/options.c | 2 +
migration/ram.c | 7 +++
20 files changed, 316 insertions(+), 20 deletions(-)
create mode 100644 hw/vfio/cpr-iommufd.c
--
1.8.3.1
* [RFC V1 01/12] vfio: move cpr_exec_notifier
2024-07-20 19:15 [RFC V1 00/12] Live update: iommufd Steve Sistare
@ 2024-07-20 19:15 ` Steve Sistare
2024-07-20 19:15 ` [RFC V1 02/12] iommufd: no DMA to BARs Steve Sistare
` (10 subsequent siblings)
11 siblings, 0 replies; 18+ messages in thread
From: Steve Sistare @ 2024-07-20 19:15 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand, Steve Sistare
Move the cpr notifier to the base container. This change will be squashed
into the "live update: vfio" series.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/vfio/cpr-legacy.c | 10 +++++-----
include/hw/vfio/vfio-common.h | 1 -
include/hw/vfio/vfio-container-base.h | 1 +
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
index 8f6224e..91be762 100644
--- a/hw/vfio/cpr-legacy.c
+++ b/hw/vfio/cpr-legacy.c
@@ -107,9 +107,9 @@ static const VMStateDescription vfio_container_vmstate = {
static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
MigrationEvent *e, Error **errp)
{
- VFIOContainer *container =
- container_of(notifier, VFIOContainer, cpr_exec_notifier);
- VFIOContainerBase *bcontainer = &container->bcontainer;
+ VFIOContainerBase *bcontainer =
+ container_of(notifier, VFIOContainerBase, cpr_exec_notifier);
+ VFIOContainer *container = VFIO_CONTAINER(bcontainer);
if (e->type != MIG_EVENT_PRECOPY_FAILED) {
return 0;
@@ -147,7 +147,7 @@ bool vfio_legacy_cpr_register_container(VFIOContainerBase *bcontainer,
vmstate_register(NULL, -1, &vfio_container_vmstate, container);
- migration_add_notifier_mode(&container->cpr_exec_notifier,
+ migration_add_notifier_mode(&bcontainer->cpr_exec_notifier,
vfio_cpr_fail_notifier,
MIG_MODE_CPR_EXEC);
return true;
@@ -158,5 +158,5 @@ void vfio_legacy_cpr_unregister_container(VFIOContainerBase *bcontainer)
VFIOContainer *container = VFIO_CONTAINER(bcontainer);
vmstate_unregister(NULL, &vfio_container_vmstate, container);
- migration_remove_notifier(&container->cpr_exec_notifier);
+ migration_remove_notifier(&bcontainer->cpr_exec_notifier);
}
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 1902c8f..9512a0c 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -81,7 +81,6 @@ typedef struct VFIOContainer {
VFIOContainerBase bcontainer;
int fd; /* /dev/vfio/vfio, empowered by the attached groups */
unsigned iommu_type;
- NotifierWithReturn cpr_exec_notifier;
bool vaddr_unmapped;
QLIST_HEAD(, VFIOGroup) group_list;
} VFIOContainer;
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 3d30365..f8b7b26 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -52,6 +52,7 @@ typedef struct VFIOContainerBase {
QLIST_HEAD(, VFIODevice) device_list;
GList *iova_ranges;
NotifierWithReturn cpr_reboot_notifier;
+ NotifierWithReturn cpr_exec_notifier;
Error *cpr_blocker;
} VFIOContainerBase;
--
1.8.3.1
* [RFC V1 02/12] iommufd: no DMA to BARs
2024-07-20 19:15 [RFC V1 00/12] Live update: iommufd Steve Sistare
2024-07-20 19:15 ` [RFC V1 01/12] vfio: move cpr_exec_notifier Steve Sistare
@ 2024-07-20 19:15 ` Steve Sistare
2024-08-12 22:05 ` Alex Williamson
2024-08-13 1:39 ` Yi Liu
2024-07-20 19:15 ` [RFC V1 03/12] iommufd: pass name to connect Steve Sistare
` (9 subsequent siblings)
11 siblings, 2 replies; 18+ messages in thread
From: Steve Sistare @ 2024-07-20 19:15 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand, Steve Sistare
Do not map VFIO PCI BARs for DMA. This stops a raft of warnings of the
following form at QEMU start time when using -object iommufd:
qemu-kvm: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
qemu-kvm: vfio_container_dma_map(0x555558282db0, 0x8800010000, 0x4000, 0x7ffff7ff0000) = -14 (Bad address)
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/vfio/common.c | 3 ++-
hw/vfio/helpers.c | 1 +
include/exec/memory.h | 1 +
3 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index da2e0ec..403d45a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -248,7 +248,8 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
* are never accessed by the CPU and beyond the address width of
* some IOMMU hardware. TODO: VFIO should tell us the IOMMU width.
*/
- section->offset_within_address_space & (1ULL << 63);
+ section->offset_within_address_space & (1ULL << 63) ||
+ section->mr->no_dma;
}
/* Called with rcu_read_lock held. */
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index b14edd4..e4cfdd2 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -435,6 +435,7 @@ int vfio_region_mmap(VFIORegion *region)
memory_region_owner(region->mem),
name, region->mmaps[i].size,
region->mmaps[i].mmap);
+ region->mmaps[i].mem.no_dma = true;
g_free(name);
memory_region_add_subregion(region->mem, region->mmaps[i].offset,
&region->mmaps[i].mem);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index ea03ef2..850cc8c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -794,6 +794,7 @@ struct MemoryRegion {
bool unmergeable;
uint8_t dirty_log_mask;
bool is_iommu;
+ bool no_dma;
RAMBlock *ram_block;
Object *owner;
/* owner as TYPE_DEVICE. Used for re-entrancy checks in MR access hotpath */
--
1.8.3.1
* [RFC V1 03/12] iommufd: pass name to connect
2024-07-20 19:15 [RFC V1 00/12] Live update: iommufd Steve Sistare
2024-07-20 19:15 ` [RFC V1 01/12] vfio: move cpr_exec_notifier Steve Sistare
2024-07-20 19:15 ` [RFC V1 02/12] iommufd: no DMA to BARs Steve Sistare
@ 2024-07-20 19:15 ` Steve Sistare
2024-07-20 19:15 ` [RFC V1 04/12] migration: cpr_find_fd_any Steve Sistare
` (8 subsequent siblings)
11 siblings, 0 replies; 18+ messages in thread
From: Steve Sistare @ 2024-07-20 19:15 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand, Steve Sistare
Pass device name to iommufd_backend_connect and iommufd_backend_disconnect,
for use by CPR in a subsequent patch. No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
backends/iommufd.c | 4 ++--
hw/vfio/iommufd.c | 6 +++---
include/sysemu/iommufd.h | 5 +++--
3 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 84fefbc..fc37386 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -72,7 +72,7 @@ static void iommufd_backend_class_init(ObjectClass *oc, void *data)
object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
}
-bool iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
+bool iommufd_backend_connect(IOMMUFDBackend *be, const char *name, Error **errp)
{
int fd;
@@ -90,7 +90,7 @@ bool iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
return true;
}
-void iommufd_backend_disconnect(IOMMUFDBackend *be)
+void iommufd_backend_disconnect(IOMMUFDBackend *be, const char *name)
{
if (!be->users) {
goto out;
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index c2f158e..255966a 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -71,7 +71,7 @@ static bool iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
.flags = 0,
};
- if (!iommufd_backend_connect(iommufd, errp)) {
+ if (!iommufd_backend_connect(iommufd, vbasedev->name, errp)) {
return false;
}
@@ -99,7 +99,7 @@ static bool iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
err_bind:
iommufd_cdev_kvm_device_del(vbasedev);
err_kvm_device_add:
- iommufd_backend_disconnect(iommufd);
+ iommufd_backend_disconnect(iommufd, vbasedev->name);
return false;
}
@@ -107,7 +107,7 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
{
/* Unbind is automatically conducted when device fd is closed */
iommufd_cdev_kvm_device_del(vbasedev);
- iommufd_backend_disconnect(vbasedev->iommufd);
+ iommufd_backend_disconnect(vbasedev->iommufd, vbasedev->name);
}
static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 9edfec6..aa195d1 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -37,8 +37,9 @@ struct IOMMUFDBackend {
/*< public >*/
};
-bool iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
-void iommufd_backend_disconnect(IOMMUFDBackend *be);
+bool iommufd_backend_connect(IOMMUFDBackend *be, const char *name,
+ Error **errp);
+void iommufd_backend_disconnect(IOMMUFDBackend *be, const char *name);
bool iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
Error **errp);
--
1.8.3.1
* [RFC V1 04/12] migration: cpr_find_fd_any
2024-07-20 19:15 [RFC V1 00/12] Live update: iommufd Steve Sistare
` (2 preceding siblings ...)
2024-07-20 19:15 ` [RFC V1 03/12] iommufd: pass name to connect Steve Sistare
@ 2024-07-20 19:15 ` Steve Sistare
2024-07-20 19:15 ` [RFC V1 05/12] iommufd: preserve device fd Steve Sistare
` (7 subsequent siblings)
11 siblings, 0 replies; 18+ messages in thread
From: Steve Sistare @ 2024-07-20 19:15 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand, Steve Sistare
Add a function for finding a CPR fd by name, for any value of id, and
return the id.
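To illustrate the difference between the exact lookup and the name-only lookup, here is a minimal standalone sketch. It does not use QEMU's QLIST or CprFd types: `struct fd_entry`, `find_fd()`, and `find_fd_any()` are hypothetical stand-ins built on a plain singly linked list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an element of the saved-fd list. */
struct fd_entry {
    const char *name;
    int id;
    int fd;
    struct fd_entry *next;
};

/* Exact lookup: name and id must both match. Returns -1 if absent. */
int find_fd(const struct fd_entry *head, const char *name, int id)
{
    for (const struct fd_entry *e = head; e != NULL; e = e->next) {
        if (!strcmp(e->name, name) && e->id == id) {
            return e->fd;
        }
    }
    return -1;
}

/*
 * Name-only lookup: the first entry with a matching name wins and its id
 * is stored through id_p. On a miss, -1 is returned and *id_p is untouched.
 */
int find_fd_any(const struct fd_entry *head, const char *name, int *id_p)
{
    assert(id_p != NULL);

    for (const struct fd_entry *e = head; e != NULL; e = e->next) {
        if (!strcmp(e->name, name)) {
            *id_p = e->id;
            return e->fd;
        }
    }
    return -1;
}
```

The name-only form is useful when the caller does not yet know the id under which the descriptor was saved and wants to recover both at once.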
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
include/migration/cpr.h | 1 +
migration/cpr.c | 15 +++++++++++++++
2 files changed, 16 insertions(+)
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index bfd9864..c9e6111 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -19,6 +19,7 @@ typedef int (*cpr_walk_fd_cb)(int fd);
void cpr_save_fd(const char *name, int id, int fd);
void cpr_delete_fd(const char *name, int id);
int cpr_find_fd(const char *name, int id);
+int cpr_find_fd_any(const char *name, int *id_p);
int cpr_walk_fd(cpr_walk_fd_cb cb);
void cpr_resave_fd(const char *name, int id, int fd);
diff --git a/migration/cpr.c b/migration/cpr.c
index 853d3a1..096e8d8 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -109,6 +109,21 @@ int cpr_find_fd(const char *name, int id)
return fd;
}
+int cpr_find_fd_any(const char *name, int *id_p)
+{
+ CprFd *elem;
+
+ QLIST_FOREACH(elem, &cpr_state.fds, next) {
+ if (!strcmp(elem->name, name)) {
+ trace_cpr_find_fd(name, elem->id, elem->fd);
+ *id_p = elem->id;
+ return elem->fd;
+ }
+ }
+ trace_cpr_find_fd(name, -1, -1);
+ return -1;
+}
+
int cpr_walk_fd(cpr_walk_fd_cb cb)
{
CprFd *elem;
--
1.8.3.1
* [RFC V1 05/12] iommufd: preserve device fd
2024-07-20 19:15 [RFC V1 00/12] Live update: iommufd Steve Sistare
` (3 preceding siblings ...)
2024-07-20 19:15 ` [RFC V1 04/12] migration: cpr_find_fd_any Steve Sistare
@ 2024-07-20 19:15 ` Steve Sistare
2024-07-20 19:15 ` [RFC V1 06/12] iommufd: export iommufd_cdev_get_info_iova_range Steve Sistare
` (6 subsequent siblings)
11 siblings, 0 replies; 18+ messages in thread
From: Steve Sistare @ 2024-07-20 19:15 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand, Steve Sistare
Save the iommufd and vfio device fds in CPR state when they are created, and
fetch them from that state after CPR. Save the devid as the id of the vfio
device fd. Remember that each fd was reused, for subsequent patches.
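The reuse-or-open decision can be shown in isolation. The sketch below is an assumption-laden simplification: `open_or_reuse()` is a hypothetical helper, the preserved descriptor is passed in directly rather than looked up in CPR state, and any openable path can stand in for the real device node.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return a descriptor for 'path'. If 'preserved_fd' is valid (>= 0) it is
 * returned unchanged and *reused is set to true; otherwise the path is
 * opened and *reused is set to false. Returns -1 with errno set if open()
 * fails.
 */
int open_or_reuse(int preserved_fd, const char *path, bool *reused)
{
    assert(path != NULL && reused != NULL);

    *reused = (preserved_fd >= 0);
    if (*reused) {
        return preserved_fd;
    }
    return open(path, O_RDWR);
}
```

Recording the outcome in a flag lets later setup steps decide whether to skip configuration that the previous process already performed on the same descriptor.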
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
backends/iommufd.c | 12 +++++++++++-
hw/vfio/iommufd.c | 17 ++++++++++++++++-
include/sysemu/iommufd.h | 1 +
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/backends/iommufd.c b/backends/iommufd.c
index fc37386..4bdbad2 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -16,6 +16,7 @@
#include "qemu/module.h"
#include "qom/object_interfaces.h"
#include "qemu/error-report.h"
+#include "migration/cpr.h"
#include "monitor/monitor.h"
#include "trace.h"
#include <sys/ioctl.h>
@@ -77,11 +78,17 @@ bool iommufd_backend_connect(IOMMUFDBackend *be, const char *name, Error **errp)
int fd;
if (be->owned && !be->users) {
- fd = qemu_open_old("/dev/iommu", O_RDWR);
+ g_autofree char *iname = g_strdup_printf("%s_iommu", name);
+ fd = cpr_find_fd(iname, 0);
+ be->reused = (fd >= 0);
+ if (!be->reused) {
+ fd = qemu_open_old("/dev/iommu", O_RDWR);
+ }
if (fd < 0) {
error_setg_errno(errp, errno, "/dev/iommu opening failed");
return false;
}
+ cpr_resave_fd(iname, 0, fd);
be->fd = fd;
}
be->users++;
@@ -92,6 +99,8 @@ bool iommufd_backend_connect(IOMMUFDBackend *be, const char *name, Error **errp)
void iommufd_backend_disconnect(IOMMUFDBackend *be, const char *name)
{
+ g_autofree char *iname = g_strdup_printf("%s_iommu", name);
+
if (!be->users) {
goto out;
}
@@ -101,6 +110,7 @@ void iommufd_backend_disconnect(IOMMUFDBackend *be, const char *name)
be->fd = -1;
}
out:
+ cpr_delete_fd(iname, 0);
trace_iommufd_backend_disconnect(be->fd, be->users);
}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 255966a..cefc9e0 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -24,6 +24,7 @@
#include "sysemu/reset.h"
#include "qemu/cutils.h"
#include "qemu/chardev_open.h"
+#include "migration/cpr.h"
#include "pci.h"
static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
@@ -84,6 +85,11 @@ static bool iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
goto err_kvm_device_add;
}
+ if (vbasedev->reused) {
+ /* Already bound, and devid was set in iommufd_cdev_attach */
+ goto skip_bind;
+ }
+
/* Bind device to iommufd */
bind.iommufd = iommufd->fd;
if (ioctl(vbasedev->fd, VFIO_DEVICE_BIND_IOMMUFD, &bind)) {
@@ -95,6 +101,8 @@ static bool iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
vbasedev->devid = bind.out_devid;
trace_iommufd_cdev_connect_and_bind(bind.iommufd, vbasedev->name,
vbasedev->fd, vbasedev->devid);
+
+skip_bind:
return true;
err_bind:
iommufd_cdev_kvm_device_del(vbasedev);
@@ -305,13 +313,18 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
if (vbasedev->fd < 0) {
- devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
+ devfd = cpr_find_fd_any(vbasedev->name, (int *)&vbasedev->devid);
+ vbasedev->reused = (devfd >= 0);
+ if (!vbasedev->reused) {
+ devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
+ }
if (devfd < 0) {
return false;
}
vbasedev->fd = devfd;
} else {
devfd = vbasedev->fd;
+ vbasedev->reused = false;
}
if (!iommufd_cdev_connect_and_bind(vbasedev, errp)) {
@@ -413,6 +426,7 @@ found_container:
vbasedev->bcontainer = bcontainer;
QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
+ cpr_resave_fd(vbasedev->name, vbasedev->devid, devfd);
trace_iommufd_cdev_device_info(vbasedev->name, devfd, vbasedev->num_irqs,
vbasedev->num_regions, vbasedev->flags);
@@ -452,6 +466,7 @@ static void iommufd_cdev_detach(VFIODevice *vbasedev)
iommufd_cdev_container_destroy(container);
vfio_put_address_space(space);
+ cpr_delete_fd(vbasedev->name, vbasedev->devid);
iommufd_cdev_unbind_and_disconnect(vbasedev);
close(vbasedev->fd);
}
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index aa195d1..6955ebd 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -32,6 +32,7 @@ struct IOMMUFDBackend {
/*< protected >*/
int fd; /* /dev/iommu file descriptor */
bool owned; /* is the /dev/iommu opened internally */
+ bool reused; /* fd is reused after CPR */
uint32_t users;
/*< public >*/
--
1.8.3.1
* [RFC V1 06/12] iommufd: export iommufd_cdev_get_info_iova_range
2024-07-20 19:15 [RFC V1 00/12] Live update: iommufd Steve Sistare
` (4 preceding siblings ...)
2024-07-20 19:15 ` [RFC V1 05/12] iommufd: preserve device fd Steve Sistare
@ 2024-07-20 19:15 ` Steve Sistare
2024-07-20 19:15 ` [RFC V1 07/12] iommufd: change_process kernel interface Steve Sistare
` (5 subsequent siblings)
11 siblings, 0 replies; 18+ messages in thread
From: Steve Sistare @ 2024-07-20 19:15 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand, Steve Sistare
Export iommufd_cdev_get_info_iova_range for use by CPR.
No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/vfio/iommufd.c | 4 ++--
include/hw/vfio/vfio-common.h | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index cefc9e0..6d77daa 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -257,8 +257,8 @@ static int iommufd_cdev_ram_block_discard_disable(bool state)
return ram_block_uncoordinated_discard_disable(state);
}
-static bool iommufd_cdev_get_info_iova_range(VFIOIOMMUFDContainer *container,
- uint32_t ioas_id, Error **errp)
+bool iommufd_cdev_get_info_iova_range(VFIOIOMMUFDContainer *container,
+ uint32_t ioas_id, Error **errp)
{
VFIOContainerBase *bcontainer = &container->bcontainer;
g_autofree struct iommu_ioas_iova_ranges *info = NULL;
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 9512a0c..ec5b7168 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -245,6 +245,8 @@ void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer);
bool vfio_legacy_cpr_register_container(VFIOContainerBase *bcontainer,
Error **errp);
void vfio_legacy_cpr_unregister_container(VFIOContainerBase *bcontainer);
+bool iommufd_cdev_get_info_iova_range(VFIOIOMMUFDContainer *container,
+ uint32_t ioas_id, Error **errp);
extern const MemoryRegionOps vfio_region_ops;
typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
--
1.8.3.1
* [RFC V1 07/12] iommufd: change_process kernel interface
2024-07-20 19:15 [RFC V1 00/12] Live update: iommufd Steve Sistare
` (5 preceding siblings ...)
2024-07-20 19:15 ` [RFC V1 06/12] iommufd: export iommufd_cdev_get_info_iova_range Steve Sistare
@ 2024-07-20 19:15 ` Steve Sistare
2024-07-20 19:15 ` [RFC V1 08/12] vfio/iommufd: register container for cpr Steve Sistare
` (4 subsequent siblings)
11 siblings, 0 replies; 18+ messages in thread
From: Steve Sistare @ 2024-07-20 19:15 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand, Steve Sistare
Define IOMMU_IOAS_CHANGE_PROCESS for use by CPR.
This interface is preliminary.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
linux-headers/linux/iommufd.h | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/linux-headers/linux/iommufd.h b/linux-headers/linux/iommufd.h
index 72e8f4b..568029a 100644
--- a/linux-headers/linux/iommufd.h
+++ b/linux-headers/linux/iommufd.h
@@ -50,6 +50,7 @@ enum {
IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING,
IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP,
IOMMUFD_CMD_HWPT_INVALIDATE,
+ IOMMUFD_CMD_IOAS_CHANGE_PROCESS,
};
/**
@@ -692,4 +693,22 @@ struct iommu_hwpt_invalidate {
__u32 __reserved;
};
#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
+
+struct iommu_ioas_userspace_map {
+ __u64 addr_old;
+ __u64 addr_new;
+ __u64 size; /* bytes */
+};
+
+struct iommu_ioas_change_process {
+ __u32 size;
+ __u32 flags; /* must be 0 */
+ __u32 n_umap;
+ __u32 __reserved; /* must be 0 */
+ __aligned_u64 umap;
+};
+
+#define IOMMU_IOAS_CHANGE_PROCESS \
+ _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_CHANGE_PROCESS)
+
#endif
--
1.8.3.1
* [RFC V1 08/12] vfio/iommufd: register container for cpr
2024-07-20 19:15 [RFC V1 00/12] Live update: iommufd Steve Sistare
` (6 preceding siblings ...)
2024-07-20 19:15 ` [RFC V1 07/12] iommufd: change_process kernel interface Steve Sistare
@ 2024-07-20 19:15 ` Steve Sistare
2024-07-20 19:15 ` [RFC V1 09/12] vfio/iommufd: rebuild device Steve Sistare
` (3 subsequent siblings)
11 siblings, 0 replies; 18+ messages in thread
From: Steve Sistare @ 2024-07-20 19:15 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand, Steve Sistare
Register a vfio iommufd container for CPR. Add a blocker if the kernel does
not support IOMMU_IOAS_CHANGE_PROCESS.
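The capability check relies on the convention that an ioctl command unknown to the driver fails with ENOTTY, while a known command called with bad arguments fails with some other error. A generic sketch of that probe follows; `ioctl_is_recognized()` and `path_recognizes_ioctl()` are hypothetical helpers, not functions from this series, and the sketch consults errno only after the call has actually failed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Report whether 'fd' recognizes 'request'. Success, or any failure other
 * than ENOTTY, is taken to mean the command exists.
 */
bool ioctl_is_recognized(int fd, unsigned long request, void *arg)
{
    assert(arg != NULL);

    if (ioctl(fd, request, arg) != -1) {
        return true;
    }
    return errno != ENOTTY;
}

/*
 * Convenience wrapper: open 'path', probe it, and close it again.
 * Returns false if the path cannot be opened.
 */
bool path_recognizes_ioctl(const char *path, unsigned long request, void *arg)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return false;
    }
    bool ok = ioctl_is_recognized(fd, request, arg);
    close(fd);
    return ok;
}
```

Checking the return value first avoids acting on a stale errno left over from an earlier, unrelated call.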
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
backends/iommufd.c | 8 ++++++
hw/vfio/cpr-iommufd.c | 60 +++++++++++++++++++++++++++++++++++++++++++
hw/vfio/iommufd.c | 2 ++
hw/vfio/meson.build | 1 +
include/hw/vfio/vfio-common.h | 3 +++
include/sysemu/iommufd.h | 1 +
6 files changed, 75 insertions(+)
create mode 100644 hw/vfio/cpr-iommufd.c
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 4bdbad2..243178e 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -73,6 +73,14 @@ static void iommufd_backend_class_init(ObjectClass *oc, void *data)
object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
}
+bool iommufd_change_process_capable(IOMMUFDBackend *be)
+{
+ struct iommu_ioas_change_process args = {.n_umap = -1};
+
+ ioctl(be->fd, IOMMU_IOAS_CHANGE_PROCESS, &args);
+ return (errno != ENOTTY);
+}
+
bool iommufd_backend_connect(IOMMUFDBackend *be, const char *name, Error **errp)
{
int fd;
diff --git a/hw/vfio/cpr-iommufd.c b/hw/vfio/cpr-iommufd.c
new file mode 100644
index 0000000..f2e34f4
--- /dev/null
+++ b/hw/vfio/cpr-iommufd.c
@@ -0,0 +1,60 @@
+/*
+ * Copyright (c) 2021-2024 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/vfio/vfio-common.h"
+#include "migration/blocker.h"
+#include "migration/cpr.h"
+#include "migration/migration.h"
+#include "migration/vmstate.h"
+#include "sysemu/iommufd.h"
+
+#define IOMMUFD_CONTAINER(base) \
+ container_of(base, VFIOIOMMUFDContainer, bcontainer)
+
+static bool vfio_can_cpr_exec(VFIOIOMMUFDContainer *container, Error **errp)
+{
+ if (!iommufd_change_process_capable(container->be)) {
+ error_setg(errp,
+ "VFIO container does not support IOMMU_IOAS_CHANGE_PROCESS");
+ return false;
+ }
+ return true;
+}
+
+static const VMStateDescription vfio_container_vmstate = {
+ .name = "vfio-iommufd-container",
+ .version_id = 0,
+ .minimum_version_id = 0,
+ .needed = cpr_needed_for_reuse,
+ .fields = (VMStateField[]) {
+ VMSTATE_END_OF_LIST()
+ }
+};
+
+bool vfio_iommufd_cpr_register_container(VFIOContainerBase *bcontainer,
+ Error **errp)
+{
+ VFIOIOMMUFDContainer *container = IOMMUFD_CONTAINER(bcontainer);
+
+ if (!vfio_can_cpr_exec(container, &bcontainer->cpr_blocker)) {
+ return migrate_add_blocker_modes(&bcontainer->cpr_blocker, errp,
+ MIG_MODE_CPR_EXEC, -1) == 0;
+ }
+
+ vmstate_register(NULL, -1, &vfio_container_vmstate, container);
+
+ return true;
+}
+
+void vfio_iommufd_cpr_unregister_container(VFIOContainerBase *bcontainer)
+{
+ VFIOIOMMUFDContainer *container = IOMMUFD_CONTAINER(bcontainer);
+
+ vmstate_unregister(NULL, &vfio_container_vmstate, container);
+}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 6d77daa..585bf09 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -632,6 +632,8 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
vioc->attach_device = iommufd_cdev_attach;
vioc->detach_device = iommufd_cdev_detach;
vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
+ vioc->cpr_register = vfio_iommufd_cpr_register_container;
+ vioc->cpr_unregister = vfio_iommufd_cpr_unregister_container;
};
static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index 5487815..998adb5 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -13,6 +13,7 @@ vfio_ss.add(when: 'CONFIG_IOMMUFD', if_true: files(
vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
'cpr.c',
'cpr-legacy.c',
+ 'cpr-iommufd.c',
'display.c',
'pci-quirks.c',
'pci.c',
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index ec5b7168..8aa02d4 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -247,6 +247,9 @@ bool vfio_legacy_cpr_register_container(VFIOContainerBase *bcontainer,
void vfio_legacy_cpr_unregister_container(VFIOContainerBase *bcontainer);
bool iommufd_cdev_get_info_iova_range(VFIOIOMMUFDContainer *container,
uint32_t ioas_id, Error **errp);
+bool vfio_iommufd_cpr_register_container(VFIOContainerBase *bcontainer,
+ Error **errp);
+void vfio_iommufd_cpr_unregister_container(VFIOContainerBase *bcontainer);
extern const MemoryRegionOps vfio_region_ops;
typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 6955ebd..f80b968 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -52,6 +52,7 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
Error **errp);
+bool iommufd_change_process_capable(IOMMUFDBackend *be);
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
#endif
--
1.8.3.1
* [RFC V1 09/12] vfio/iommufd: rebuild device
2024-07-20 19:15 [RFC V1 00/12] Live update: iommufd Steve Sistare
` (7 preceding siblings ...)
2024-07-20 19:15 ` [RFC V1 08/12] vfio/iommufd: register container for cpr Steve Sistare
@ 2024-07-20 19:15 ` Steve Sistare
2024-07-20 19:15 ` [RFC V1 10/12] migration/ram: old host address Steve Sistare
` (2 subsequent siblings)
11 siblings, 0 replies; 18+ messages in thread
From: Steve Sistare @ 2024-07-20 19:15 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand, Steve Sistare
Rebuild userland device state after CPR. During vfio_realize, skip all
ioctls that configure the device, as it was already configured in old
QEMU, and we preserved the device descriptor.
Preserve the ioas_id in vmstate. Because we skip the ioctls, it is not needed
at realize time. However, we do need to gather range info, so defer the
call to iommufd_cdev_get_info_iova_range to a post_load handler, at which
time the ioas_id is known.
Registering the vfio_memory_listener causes spurious calls to map and
unmap DMA, as devices are created and the address space is built. This
memory was already mapped by the device, so suppress map and unmap
during CPR, i.e. while the reused flag is set. Clear the reused flag in the
post_load handler.
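The suppression scheme amounts to a guard flag that turns map requests into no-ops until state loading has finished. A minimal standalone sketch, with the hypothetical `struct backend`, `backend_map()`, and `backend_post_load()` standing in for the real backend and with a counter in place of the actual ioctl:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backend state. */
struct backend {
    bool reused;    /* true while inherited mappings are still in force */
    int map_calls;  /* counts the map requests actually issued */
};

/* Issue a map request unless the mapping was inherited. Returns 0. */
int backend_map(struct backend *be)
{
    assert(be != NULL);

    if (be->reused) {
        return 0;   /* mapping already exists: report success, do nothing */
    }
    be->map_calls++;    /* stands in for the real map ioctl */
    return 0;
}

/* Called once incoming state is loaded: resume normal map behaviour. */
void backend_post_load(struct backend *be)
{
    assert(be != NULL);
    be->reused = false;
}
```

Returning success from the suppressed path keeps callers such as a memory listener unaware that anything was skipped, at the cost of relying on the flag being cleared at the right moment.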
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
backends/iommufd.c | 8 ++++++++
hw/vfio/cpr-iommufd.c | 24 ++++++++++++++++++++++++
hw/vfio/iommufd.c | 14 +++++++++++++-
3 files changed, 45 insertions(+), 1 deletion(-)
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 243178e..86fd9db 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -172,6 +172,10 @@ int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
.length = size,
};
+ if (be->reused) {
+ return 0;
+ }
+
if (!readonly) {
map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
}
@@ -203,6 +207,10 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
.length = size,
};
+ if (be->reused) {
+ return 0;
+ }
+
ret = ioctl(fd, IOMMU_IOAS_UNMAP, &unmap);
/*
* IOMMUFD takes mapping as some kind of object, unmapping
diff --git a/hw/vfio/cpr-iommufd.c b/hw/vfio/cpr-iommufd.c
index f2e34f4..c38485a 100644
--- a/hw/vfio/cpr-iommufd.c
+++ b/hw/vfio/cpr-iommufd.c
@@ -27,12 +27,36 @@ static bool vfio_can_cpr_exec(VFIOIOMMUFDContainer *container, Error **errp)
return true;
}
+static int vfio_container_post_load(void *opaque, int version_id)
+{
+ VFIOIOMMUFDContainer *container = opaque;
+ VFIOContainerBase *bcontainer = &container->bcontainer;
+ VFIODevice *vbasedev;
+ Error *err = NULL;
+ uint32_t ioas_id = container->ioas_id;
+
+ if (!iommufd_cdev_get_info_iova_range(container, ioas_id, &err)) {
+ error_report_err(err);
+ return -1;
+ }
+
+ bcontainer->reused = false;
+ QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
+ vbasedev->reused = false;
+ }
+ container->be->reused = false;
+
+ return 0;
+}
+
static const VMStateDescription vfio_container_vmstate = {
.name = "vfio-iommufd-container",
.version_id = 0,
.minimum_version_id = 0,
+ .post_load = vfio_container_post_load,
.needed = cpr_needed_for_reuse,
.fields = (VMStateField[]) {
+ VMSTATE_UINT32(ioas_id, VFIOIOMMUFDContainer),
VMSTATE_END_OF_LIST()
}
};
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 585bf09..186edc7 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -357,6 +357,11 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
}
}
+ if (vbasedev->reused) {
+ ioas_id = -1; /* ioas_id will be sent in vmstate */
+ goto skip_ioas_alloc;
+ }
+
/* Need to allocate a new dedicated container */
if (!iommufd_backend_alloc_ioas(vbasedev->iommufd, &ioas_id, errp)) {
goto err_alloc_ioas;
@@ -364,6 +369,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd->fd, ioas_id);
+skip_ioas_alloc:
container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
container->be = vbasedev->iommufd;
container->ioas_id = ioas_id;
@@ -371,7 +377,8 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
bcontainer = &container->bcontainer;
vfio_address_space_insert(space, bcontainer);
- if (!iommufd_cdev_attach_container(vbasedev, container, errp)) {
+ if (!vbasedev->reused &&
+ !iommufd_cdev_attach_container(vbasedev, container, errp)) {
goto err_attach_container;
}
@@ -380,6 +387,10 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
goto err_discard_disable;
}
+ if (vbasedev->reused) {
+ goto skip_info;
+ }
+
if (!iommufd_cdev_get_info_iova_range(container, ioas_id, &err)) {
error_append_hint(&err,
"Fallback to default 64bit IOVA range and 4K page size\n");
@@ -388,6 +399,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
bcontainer->pgsizes = qemu_real_host_page_size();
}
+skip_info:
bcontainer->listener = vfio_memory_listener;
memory_listener_register(&bcontainer->listener, bcontainer->space->as);
--
1.8.3.1
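The suppression described in this patch can be illustrated in isolation. The following is only a minimal sketch, not the QEMU code: demo_backend, demo_map_dma and demo_post_load are hypothetical stand-ins for IOMMUFDBackend, iommufd_backend_map_dma and the post_load handler, and a counter takes the place of the real ioctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for IOMMUFDBackend; only what the sketch needs. */
struct demo_backend {
    bool reused;        /* true while new QEMU rebuilds its state after CPR */
    unsigned n_ioctls;  /* counts requests that would reach the kernel */
};

/* Map request: a no-op while the mappings inherited from old QEMU are live. */
int demo_map_dma(struct demo_backend *be, uint64_t iova, uint64_t size)
{
    (void)iova;
    (void)size;
    if (be->reused) {
        return 0;       /* mapping already exists, report success */
    }
    be->n_ioctls++;     /* the real code would issue the map ioctl here */
    return 0;
}

/* Once the vmstate is loaded, later map requests must reach the kernel. */
void demo_post_load(struct demo_backend *be)
{
    be->reused = false;
}
```

Returning success instead of an error while the flag is set lets the memory listener finish building the address space without special cases in its callers.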
* [RFC V1 10/12] migration/ram: old host address
2024-07-20 19:15 [RFC V1 00/12] Live update: iommufd Steve Sistare
` (8 preceding siblings ...)
2024-07-20 19:15 ` [RFC V1 09/12] vfio/iommufd: rebuild device Steve Sistare
@ 2024-07-20 19:15 ` Steve Sistare
2024-08-16 17:57 ` Fabiano Rosas
2024-07-20 19:15 ` [RFC V1 11/12] iommufd: update DMA virtual addresses Steve Sistare
2024-07-20 19:15 ` [RFC V1 12/12] vfio: mdev blocker Steve Sistare
11 siblings, 1 reply; 18+ messages in thread
From: Steve Sistare @ 2024-07-20 19:15 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand, Steve Sistare
Remember the RAMBlock host address as host_old during migration, for use
by CPR. The iommufd interface to update the virtual address of DMA
mappings requires it.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/core/machine.c | 6 ++++++
include/exec/ramblock.h | 1 +
migration/migration.h | 2 ++
migration/options.c | 2 ++
migration/ram.c | 7 +++++++
5 files changed, 18 insertions(+)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 9676953..0ac16b8 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -35,6 +35,12 @@
#include "hw/virtio/virtio-iommu.h"
#include "audio/audio.h"
+/* TBD: register hw_compat_9_1 with machines */
+GlobalProperty hw_compat_9_1[] = {
+ { "migration", "send-host-old", "off"},
+};
+const size_t hw_compat_9_1_len = G_N_ELEMENTS(hw_compat_9_1);
+
GlobalProperty hw_compat_9_0[] = {
{"arm-cpu", "backcompat-cntfrq", "true" },
{"scsi-disk-base", "migrate-emulated-scsi-request", "false" },
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 64484cd..8f1c535 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -28,6 +28,7 @@ struct RAMBlock {
struct rcu_head rcu;
struct MemoryRegion *mr;
uint8_t *host;
+ uint64_t host_old;
uint8_t *colo_cache; /* For colo, VM's ram cache */
ram_addr_t offset;
ram_addr_t used_length;
diff --git a/migration/migration.h b/migration/migration.h
index 38aa140..b5e3151 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -442,6 +442,8 @@ struct MigrationState {
*/
uint8_t clear_bitmap_shift;
+ bool send_host_old;
+
/*
* This save hostname when out-going migration starts
*/
diff --git a/migration/options.c b/migration/options.c
index 7526f9f..197cb86 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -92,6 +92,8 @@ Property migration_properties[] = {
clear_bitmap_shift, CLEAR_BITMAP_SHIFT_DEFAULT),
DEFINE_PROP_BOOL("x-preempt-pre-7-2", MigrationState,
preempt_pre_7_2, false),
+ DEFINE_PROP_BOOL("send-host-old", MigrationState,
+ send_host_old, true),
/* Migration parameters */
DEFINE_PROP_UINT8("x-throttle-trigger-threshold", MigrationState,
diff --git a/migration/ram.c b/migration/ram.c
index 1e1e05e..8644917 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3030,6 +3030,9 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
qemu_put_byte(f, strlen(block->idstr));
qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
qemu_put_be64(f, block->used_length);
+ if (migrate_get_current()->send_host_old) {
+ qemu_put_be64(f, (uint64_t)block->host);
+ }
if (migrate_postcopy_ram() &&
block->page_size != max_hg_page_size) {
qemu_put_be64(f, block->page_size);
@@ -4021,6 +4024,10 @@ static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
assert(block);
+ if (migrate_get_current()->send_host_old) {
+ block->host_old = qemu_get_be64(f);
+ }
+
if (migrate_mapped_ram()) {
parse_ramblock_mapped_ram(f, block, length, &local_err);
if (local_err) {
--
1.8.3.1
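The stream change in this patch amounts to one optional big-endian 64-bit field after used_length, present only when both sides agree on send-host-old. A simplified sketch of that framing follows. It assumes a flat byte buffer in place of QEMUFile and omits the idstr fields; every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store v at buf[pos] in big-endian order; return the next position. */
static size_t demo_put_be64(uint8_t *buf, size_t pos, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        buf[pos + i] = (uint8_t)(v >> (56 - 8 * i));
    }
    return pos + 8;
}

/* Read a big-endian 64-bit value at *pos and advance *pos past it. */
static uint64_t demo_get_be64(const uint8_t *buf, size_t *pos)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | buf[*pos + i];
    }
    *pos += 8;
    return v;
}

/* Writer side: the host address is emitted only if send_host_old is set. */
size_t demo_save_block(uint8_t *buf, uint64_t used_length, uint64_t host,
                       bool send_host_old)
{
    size_t pos = demo_put_be64(buf, 0, used_length);

    if (send_host_old) {
        pos = demo_put_be64(buf, pos, host);
    }
    return pos;
}

/* Reader side: must use the same flag value, or the stream desynchronizes. */
size_t demo_load_block(const uint8_t *buf, bool send_host_old,
                       uint64_t *used_length, uint64_t *host_old)
{
    size_t pos = 0;

    *used_length = demo_get_be64(buf, &pos);
    *host_old = send_host_old ? demo_get_be64(buf, &pos) : 0;
    return pos;
}
```

Because the field is not self-describing, the reader cannot detect a mismatch on its own, which is why the patch ties the flag to a machine compat property.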
* [RFC V1 11/12] iommufd: update DMA virtual addresses
2024-07-20 19:15 [RFC V1 00/12] Live update: iommufd Steve Sistare
` (9 preceding siblings ...)
2024-07-20 19:15 ` [RFC V1 10/12] migration/ram: old host address Steve Sistare
@ 2024-07-20 19:15 ` Steve Sistare
2024-07-20 19:15 ` [RFC V1 12/12] vfio: mdev blocker Steve Sistare
11 siblings, 0 replies; 18+ messages in thread
From: Steve Sistare @ 2024-07-20 19:15 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand, Steve Sistare
Register a vmstate post_load handler to call IOMMU_IOAS_CHANGE_PROCESS and
update the virtual address of all DMA mappings after CPR.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
backends/iommufd.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 80 insertions(+), 1 deletion(-)
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 86fd9db..2e72b6f 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -17,7 +17,9 @@
#include "qom/object_interfaces.h"
#include "qemu/error-report.h"
#include "migration/cpr.h"
+#include "migration/vmstate.h"
#include "monitor/monitor.h"
+#include "exec/ramblock.h"
#include "trace.h"
#include <sys/ioctl.h>
#include <linux/iommufd.h>
@@ -81,6 +83,83 @@ bool iommufd_change_process_capable(IOMMUFDBackend *be)
return (errno != ENOTTY);
}
+static int iommufd_change_process(IOMMUFDBackend *be,
+ struct iommu_ioas_change_process *args)
+{
+ int ret, fd = be->fd;
+
+ ret = ioctl(fd, IOMMU_IOAS_CHANGE_PROCESS, args);
+ if (ret) {
+ ret = -errno;
+ error_report("IOMMU_IOAS_CHANGE_PROCESS failed: %m");
+ }
+ return ret;
+}
+
+static int count_umap(RAMBlock *rb, void *opaque)
+{
+ if (qemu_ram_is_migratable(rb)) {
+ (*(int *)opaque)++;
+ }
+ return 0;
+}
+
+static int fill_umap(RAMBlock *rb, void *opaque)
+{
+ if (qemu_ram_is_migratable(rb)) {
+ struct iommu_ioas_change_process *args = opaque;
+ struct iommu_ioas_userspace_map *umap = (void *)args->umap;
+ int i = args->n_umap++;
+
+ assert(rb->host_old && rb->host);
+ umap[i].addr_old = (__u64)rb->host_old;
+ umap[i].addr_new = (__u64)rb->host;
+ umap[i].size = rb->max_length;
+ }
+ return 0;
+}
+
+static int cmp_umap(const void *elem1, const void *elem2)
+{
+ const struct iommu_ioas_userspace_map *e1 = elem1;
+ const struct iommu_ioas_userspace_map *e2 = elem2;
+
+ return (e1->addr_old < e2->addr_old) ? -1 :
+ (e1->addr_old > e2->addr_old);
+}
+
+static int iommufd_cpr_post_load(void *opaque, int version_id)
+{
+ IOMMUFDBackend *be = opaque;
+ struct iommu_ioas_change_process args = {
+ .size = sizeof(args),
+ .flags = 0,
+ .n_umap = 0,
+ .umap = 0,
+ };
+ int n = 0;
+ g_autofree struct iommu_ioas_userspace_map *umap = NULL;
+
+ RCU_READ_LOCK_GUARD();
+ qemu_ram_foreach_block(count_umap, &n);
+ umap = g_malloc_n(n, sizeof(*umap));
+ args.umap = (__u64)umap;
+ qemu_ram_foreach_block(fill_umap, &args);
+ qsort(umap, args.n_umap, sizeof(*umap), cmp_umap);
+ return iommufd_change_process(be, &args);
+}
+
+static const VMStateDescription iommufd_cpr_vmstate = {
+ .name = "iommufd",
+ .version_id = 0,
+ .minimum_version_id = 0,
+ .post_load = iommufd_cpr_post_load,
+ .needed = cpr_needed_for_reuse,
+ .fields = (VMStateField[]) {
+ VMSTATE_END_OF_LIST()
+ }
+};
+
bool iommufd_backend_connect(IOMMUFDBackend *be, const char *name, Error **errp)
{
int fd;
@@ -100,7 +179,7 @@ bool iommufd_backend_connect(IOMMUFDBackend *be, const char *name, Error **errp)
be->fd = fd;
}
be->users++;
-
+ vmstate_register(NULL, -1, &iommufd_cpr_vmstate, be);
trace_iommufd_backend_connect(be->fd, be->owned, be->users);
return true;
}
--
1.8.3.1
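The post_load handler in this patch makes two passes over the RAM blocks (count, then fill) and sorts the result by old address. The sketch below shows the same shape on plain arrays. It is not the kernel or QEMU interface: demo_umap and demo_block are hypothetical structs that only mirror the fields used above, and calloc stands in for g_malloc_n.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of one old-to-new address translation entry. */
struct demo_umap {
    uint64_t addr_old;
    uint64_t addr_new;
    uint64_t size;
};

/* Hypothetical mirror of the RAMBlock fields the handler reads. */
struct demo_block {
    uint64_t host_old;
    uint64_t host;
    uint64_t max_length;
    int migratable;
};

/* Order entries by ascending old address, without overflow-prone subtraction. */
static int demo_cmp_umap(const void *elem1, const void *elem2)
{
    const struct demo_umap *e1 = elem1;
    const struct demo_umap *e2 = elem2;

    return (e1->addr_old > e2->addr_old) - (e1->addr_old < e2->addr_old);
}

/*
 * Build a sorted translation table for the migratable blocks.
 * Returns the entry count; *out receives a heap array the caller frees.
 */
size_t demo_build_umap(const struct demo_block *blocks, size_t n_blocks,
                       struct demo_umap **out)
{
    size_t n = 0, k = 0;
    struct demo_umap *umap;

    for (size_t i = 0; i < n_blocks; i++) {    /* pass 1: count */
        n += blocks[i].migratable ? 1 : 0;
    }
    umap = calloc(n ? n : 1, sizeof(*umap));
    if (!umap) {
        *out = NULL;
        return 0;
    }
    for (size_t i = 0; i < n_blocks; i++) {    /* pass 2: fill */
        if (blocks[i].migratable) {
            umap[k].addr_old = blocks[i].host_old;
            umap[k].addr_new = blocks[i].host;
            umap[k].size = blocks[i].max_length;
            k++;
        }
    }
    qsort(umap, k, sizeof(*umap), demo_cmp_umap);
    *out = umap;
    return k;
}
```

Counting first keeps the allocation exact. Both passes must apply the same filter, otherwise the fill pass could write past the end of the array.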
* [RFC V1 12/12] vfio: mdev blocker
2024-07-20 19:15 [RFC V1 00/12] Live update: iommufd Steve Sistare
` (10 preceding siblings ...)
2024-07-20 19:15 ` [RFC V1 11/12] iommufd: update DMA virtual addresses Steve Sistare
@ 2024-07-20 19:15 ` Steve Sistare
11 siblings, 0 replies; 18+ messages in thread
From: Steve Sistare @ 2024-07-20 19:15 UTC (permalink / raw)
To: qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand, Steve Sistare
Block CPR if the container has any mdevs (mediated devices). CPR is not
supported for legacy containers with mdevs. It will be supported for iommufd
containers with mdevs in a future patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/vfio/pci.c | 10 ++++++++++
include/hw/vfio/vfio-common.h | 1 +
include/hw/vfio/vfio-container-base.h | 2 ++
3 files changed, 13 insertions(+)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index b5e7592..872b07c 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3100,6 +3100,13 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
is_mdev = subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
free(subsys);
+ vbasedev->is_mdev = is_mdev;
+ if (is_mdev && !vbasedev->bcontainer->n_mdev++) {
+ error_setg(&vbasedev->bcontainer->cpr_mdev_blocker,
+ "CPR does not support vfio mdev");
+ migrate_add_blocker_modes(&vbasedev->bcontainer->cpr_mdev_blocker,
+ &error_fatal, MIG_MODE_CPR_EXEC, -1);
+ }
trace_vfio_mdev(vbasedev->name, is_mdev);
if (vbasedev->ram_block_discard_allowed && !is_mdev) {
@@ -3387,6 +3394,9 @@ static void vfio_exitfn(PCIDevice *pdev)
vfio_teardown_msi(vdev);
vfio_pci_disable_rp_atomics(vdev);
vfio_bars_exit(vdev);
+ if (vbasedev->is_mdev && !--vbasedev->bcontainer->n_mdev) {
+ migrate_del_blocker(&vbasedev->bcontainer->cpr_mdev_blocker);
+ }
vfio_migration_exit(vbasedev);
pci_device_unset_iommu_device(pdev);
}
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 8aa02d4..342c40f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -133,6 +133,7 @@ typedef struct VFIODevice {
OnOffAuto pre_copy_dirty_page_tracking;
bool dirty_pages_supported;
bool dirty_tracking;
+ bool is_mdev;
HostIOMMUDevice *hiod;
int devid;
IOMMUFDBackend *iommufd;
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index f8b7b26..e29cbb8 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -45,6 +45,7 @@ typedef struct VFIOContainerBase {
uint64_t max_dirty_bitmap_size;
unsigned long pgsizes;
unsigned int dma_max_mappings;
+ unsigned int n_mdev;
bool dirty_pages_supported;
QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
@@ -54,6 +55,7 @@ typedef struct VFIOContainerBase {
NotifierWithReturn cpr_reboot_notifier;
NotifierWithReturn cpr_exec_notifier;
Error *cpr_blocker;
+ Error *cpr_mdev_blocker;
} VFIOContainerBase;
typedef struct VFIOGuestIOMMU {
--
1.8.3.1
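The blocker in this patch is reference counted: the first mdev in a container installs it and the last one to exit removes it. A minimal sketch of that counting idiom follows, with a boolean standing in for the migration blocker. demo_container and both helpers are hypothetical names, not QEMU API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the container fields added by this patch. */
struct demo_container {
    unsigned int n_mdev;  /* number of mdevs currently attached */
    bool blocked;         /* stands in for the registered CPR blocker */
};

/* Install the blocker only on the 0 -> 1 transition. */
void demo_mdev_add(struct demo_container *c)
{
    if (!c->n_mdev++) {
        c->blocked = true;
    }
}

/* Remove the blocker only on the 1 -> 0 transition. */
void demo_mdev_del(struct demo_container *c)
{
    if (!--c->n_mdev) {
        c->blocked = false;
    }
}
```

The post-increment test `!n_mdev++` is true only when the count was zero before the increment, and `!--n_mdev` only when it reaches zero, so the blocker is added and deleted exactly once per container. The sketch assumes every delete is paired with an earlier add.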
* Re: [RFC V1 02/12] iommufd: no DMA to BARs
2024-07-20 19:15 ` [RFC V1 02/12] iommufd: no DMA to BARs Steve Sistare
@ 2024-08-12 22:05 ` Alex Williamson
2024-08-13 1:39 ` Yi Liu
1 sibling, 0 replies; 18+ messages in thread
From: Alex Williamson @ 2024-08-12 22:05 UTC (permalink / raw)
To: Steve Sistare
Cc: qemu-devel, Yi Liu, Eric Auger, Zhenzhong Duan, Cedric Le Goater,
Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand
On Sat, 20 Jul 2024 12:15:27 -0700
Steve Sistare <steven.sistare@oracle.com> wrote:
> Do not map VFIO PCI BARs for DMA. This stops a raft of warnings of the
> following form at QEMU start time when using -object iommufd:
>
> qemu-kvm: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
> qemu-kvm: vfio_container_dma_map(0x555558282db0, 0x8800010000, 0x4000, 0x7ffff7ff0000) = -14 (Bad address)
NAK. These mappings are required for P2P DMA between devices. This is
currently a gap in IOMMUFD support that it doesn't have parity to legacy
vfio containers for these mappings. Thanks,
Alex
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/vfio/common.c | 3 ++-
> hw/vfio/helpers.c | 1 +
> include/exec/memory.h | 1 +
> 3 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index da2e0ec..403d45a 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -248,7 +248,8 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> * are never accessed by the CPU and beyond the address width of
> * some IOMMU hardware. TODO: VFIO should tell us the IOMMU width.
> */
> - section->offset_within_address_space & (1ULL << 63);
> + section->offset_within_address_space & (1ULL << 63) ||
> + section->mr->no_dma;
> }
>
> /* Called with rcu_read_lock held. */
> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> index b14edd4..e4cfdd2 100644
> --- a/hw/vfio/helpers.c
> +++ b/hw/vfio/helpers.c
> @@ -435,6 +435,7 @@ int vfio_region_mmap(VFIORegion *region)
> memory_region_owner(region->mem),
> name, region->mmaps[i].size,
> region->mmaps[i].mmap);
> + region->mmaps[i].mem.no_dma = true;
> g_free(name);
> memory_region_add_subregion(region->mem, region->mmaps[i].offset,
> &region->mmaps[i].mem);
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index ea03ef2..850cc8c 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -794,6 +794,7 @@ struct MemoryRegion {
> bool unmergeable;
> uint8_t dirty_log_mask;
> bool is_iommu;
> + bool no_dma;
> RAMBlock *ram_block;
> Object *owner;
> /* owner as TYPE_DEVICE. Used for re-entrancy checks in MR access hotpath */
* Re: [RFC V1 02/12] iommufd: no DMA to BARs
2024-07-20 19:15 ` [RFC V1 02/12] iommufd: no DMA to BARs Steve Sistare
2024-08-12 22:05 ` Alex Williamson
@ 2024-08-13 1:39 ` Yi Liu
2024-08-13 14:53 ` Steven Sistare
1 sibling, 1 reply; 18+ messages in thread
From: Yi Liu @ 2024-08-13 1:39 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Eric Auger, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand
On 2024/7/21 03:15, Steve Sistare wrote:
> Do not map VFIO PCI BARs for DMA. This stops a raft of warnings of the
> following form at QEMU start time when using -object iommufd:
>
> qemu-kvm: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
> qemu-kvm: vfio_container_dma_map(0x555558282db0, 0x8800010000, 0x4000, 0x7ffff7ff0000) = -14 (Bad address)
It is required, as Alex pointed out, so there is no need to spend further
effort hiding this message. There have been efforts to make it work, but they
are not done yet. The links below may be helpful if you are interested in the
history.
[1] https://lore.kernel.org/kvm/14-v4-0de2f6c78ed0+9d1-iommufd_jgg@nvidia.com/
[2] https://lore.kernel.org/kvm/20240624141139.GH29266@unreal/
[3] https://lore.kernel.org/kvm/0-v1-9e6e1739ed95+5fa-vfio_dma_buf_jgg@nvidia.com/
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/vfio/common.c | 3 ++-
> hw/vfio/helpers.c | 1 +
> include/exec/memory.h | 1 +
> 3 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index da2e0ec..403d45a 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -248,7 +248,8 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> * are never accessed by the CPU and beyond the address width of
> * some IOMMU hardware. TODO: VFIO should tell us the IOMMU width.
> */
> - section->offset_within_address_space & (1ULL << 63);
> + section->offset_within_address_space & (1ULL << 63) ||
> + section->mr->no_dma;
> }
>
> /* Called with rcu_read_lock held. */
> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> index b14edd4..e4cfdd2 100644
> --- a/hw/vfio/helpers.c
> +++ b/hw/vfio/helpers.c
> @@ -435,6 +435,7 @@ int vfio_region_mmap(VFIORegion *region)
> memory_region_owner(region->mem),
> name, region->mmaps[i].size,
> region->mmaps[i].mmap);
> + region->mmaps[i].mem.no_dma = true;
> g_free(name);
> memory_region_add_subregion(region->mem, region->mmaps[i].offset,
> &region->mmaps[i].mem);
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index ea03ef2..850cc8c 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -794,6 +794,7 @@ struct MemoryRegion {
> bool unmergeable;
> uint8_t dirty_log_mask;
> bool is_iommu;
> + bool no_dma;
> RAMBlock *ram_block;
> Object *owner;
> /* owner as TYPE_DEVICE. Used for re-entrancy checks in MR access hotpath */
--
Regards,
Yi Liu
* Re: [RFC V1 02/12] iommufd: no DMA to BARs
2024-08-13 1:39 ` Yi Liu
@ 2024-08-13 14:53 ` Steven Sistare
0 siblings, 0 replies; 18+ messages in thread
From: Steven Sistare @ 2024-08-13 14:53 UTC (permalink / raw)
To: Yi Liu, qemu-devel
Cc: Eric Auger, Zhenzhong Duan, Alex Williamson, Cedric Le Goater,
Michael S. Tsirkin, Peter Xu, Fabiano Rosas,
Philippe Mathieu-Daude, David Hildenbrand
On 8/12/2024 9:39 PM, Yi Liu wrote:
> On 2024/7/21 03:15, Steve Sistare wrote:
>> Do not map VFIO PCI BARs for DMA. This stops a raft of warnings of the
>> following form at QEMU start time when using -object iommufd:
>>
>> qemu-kvm: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
>> qemu-kvm: vfio_container_dma_map(0x555558282db0, 0x8800010000, 0x4000, 0x7ffff7ff0000) = -14 (Bad address)
>
> It is required, as Alex pointed out, so there is no need to spend further
> effort hiding this message. There have been efforts to make it work, but they
> are not done yet. The links below may be helpful if you are interested in the
> history.
>
> [1] https://lore.kernel.org/kvm/14-v4-0de2f6c78ed0+9d1-iommufd_jgg@nvidia.com/
> [2] https://lore.kernel.org/kvm/20240624141139.GH29266@unreal/
> [3] https://lore.kernel.org/kvm/0-v1-9e6e1739ed95+5fa-vfio_dma_buf_jgg@nvidia.com/
Thanks for the pointers. Good to hear the problem is being worked on.
- Steve
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> hw/vfio/common.c | 3 ++-
>> hw/vfio/helpers.c | 1 +
>> include/exec/memory.h | 1 +
>> 3 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index da2e0ec..403d45a 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -248,7 +248,8 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
>> * are never accessed by the CPU and beyond the address width of
>> * some IOMMU hardware. TODO: VFIO should tell us the IOMMU width.
>> */
>> - section->offset_within_address_space & (1ULL << 63);
>> + section->offset_within_address_space & (1ULL << 63) ||
>> + section->mr->no_dma;
>> }
>> /* Called with rcu_read_lock held. */
>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>> index b14edd4..e4cfdd2 100644
>> --- a/hw/vfio/helpers.c
>> +++ b/hw/vfio/helpers.c
>> @@ -435,6 +435,7 @@ int vfio_region_mmap(VFIORegion *region)
>> memory_region_owner(region->mem),
>> name, region->mmaps[i].size,
>> region->mmaps[i].mmap);
>> + region->mmaps[i].mem.no_dma = true;
>> g_free(name);
>> memory_region_add_subregion(region->mem, region->mmaps[i].offset,
>> &region->mmaps[i].mem);
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index ea03ef2..850cc8c 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -794,6 +794,7 @@ struct MemoryRegion {
>> bool unmergeable;
>> uint8_t dirty_log_mask;
>> bool is_iommu;
>> + bool no_dma;
>> RAMBlock *ram_block;
>> Object *owner;
>> /* owner as TYPE_DEVICE. Used for re-entrancy checks in MR access hotpath */
>
* Re: [RFC V1 10/12] migration/ram: old host address
2024-07-20 19:15 ` [RFC V1 10/12] migration/ram: old host address Steve Sistare
@ 2024-08-16 17:57 ` Fabiano Rosas
2024-08-16 18:13 ` Steven Sistare
0 siblings, 1 reply; 18+ messages in thread
From: Fabiano Rosas @ 2024-08-16 17:57 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu,
Philippe Mathieu-Daude, David Hildenbrand, Steve Sistare
Steve Sistare <steven.sistare@oracle.com> writes:
> Remember the RAMBlock host address as host_old during migration, for use
> by CPR. The iommufd interface to update the virtual address of DMA
> mappings requires it.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/core/machine.c | 6 ++++++
> include/exec/ramblock.h | 1 +
> migration/migration.h | 2 ++
> migration/options.c | 2 ++
> migration/ram.c | 7 +++++++
> 5 files changed, 18 insertions(+)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 9676953..0ac16b8 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -35,6 +35,12 @@
> #include "hw/virtio/virtio-iommu.h"
> #include "audio/audio.h"
>
> +/* TBD: register hw_compat_9_1 with machines */
> +GlobalProperty hw_compat_9_1[] = {
> + { "migration", "send-host-old", "off"},
> +};
> +const size_t hw_compat_9_1_len = G_N_ELEMENTS(hw_compat_9_1);
> +
> GlobalProperty hw_compat_9_0[] = {
> {"arm-cpu", "backcompat-cntfrq", "true" },
> {"scsi-disk-base", "migrate-emulated-scsi-request", "false" },
> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> index 64484cd..8f1c535 100644
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -28,6 +28,7 @@ struct RAMBlock {
> struct rcu_head rcu;
> struct MemoryRegion *mr;
> uint8_t *host;
> + uint64_t host_old;
> uint8_t *colo_cache; /* For colo, VM's ram cache */
> ram_addr_t offset;
> ram_addr_t used_length;
> diff --git a/migration/migration.h b/migration/migration.h
> index 38aa140..b5e3151 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -442,6 +442,8 @@ struct MigrationState {
> */
> uint8_t clear_bitmap_shift;
>
> + bool send_host_old;
> +
> /*
> * This save hostname when out-going migration starts
> */
> diff --git a/migration/options.c b/migration/options.c
> index 7526f9f..197cb86 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -92,6 +92,8 @@ Property migration_properties[] = {
> clear_bitmap_shift, CLEAR_BITMAP_SHIFT_DEFAULT),
> DEFINE_PROP_BOOL("x-preempt-pre-7-2", MigrationState,
> preempt_pre_7_2, false),
> + DEFINE_PROP_BOOL("send-host-old", MigrationState,
> + send_host_old, true),
>
> /* Migration parameters */
> DEFINE_PROP_UINT8("x-throttle-trigger-threshold", MigrationState,
> diff --git a/migration/ram.c b/migration/ram.c
> index 1e1e05e..8644917 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3030,6 +3030,9 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
> qemu_put_byte(f, strlen(block->idstr));
> qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
> qemu_put_be64(f, block->used_length);
> + if (migrate_get_current()->send_host_old) {
> + qemu_put_be64(f, (uint64_t)block->host);
> + }
This requires an update of scripts/analyze-migration.py. Could be done
on the side.
> if (migrate_postcopy_ram() &&
> block->page_size != max_hg_page_size) {
> qemu_put_be64(f, block->page_size);
> @@ -4021,6 +4024,10 @@ static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
>
> assert(block);
>
> + if (migrate_get_current()->send_host_old) {
> + block->host_old = qemu_get_be64(f);
> + }
> +
> if (migrate_mapped_ram()) {
> parse_ramblock_mapped_ram(f, block, length, &local_err);
> if (local_err) {
* Re: [RFC V1 10/12] migration/ram: old host address
2024-08-16 17:57 ` Fabiano Rosas
@ 2024-08-16 18:13 ` Steven Sistare
0 siblings, 0 replies; 18+ messages in thread
From: Steven Sistare @ 2024-08-16 18:13 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel
Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
Cedric Le Goater, Michael S. Tsirkin, Peter Xu,
Philippe Mathieu-Daude, David Hildenbrand
On 8/16/2024 1:57 PM, Fabiano Rosas wrote:
> Steve Sistare <steven.sistare@oracle.com> writes:
>
>> Remember the RAMBlock host address as host_old during migration, for use
>> by CPR. The iommufd interface to update the virtual address of DMA
>> mappings requires it.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> hw/core/machine.c | 6 ++++++
>> include/exec/ramblock.h | 1 +
>> migration/migration.h | 2 ++
>> migration/options.c | 2 ++
>> migration/ram.c | 7 +++++++
>> 5 files changed, 18 insertions(+)
>>
>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>> index 9676953..0ac16b8 100644
>> --- a/hw/core/machine.c
>> +++ b/hw/core/machine.c
>> @@ -35,6 +35,12 @@
>> #include "hw/virtio/virtio-iommu.h"
>> #include "audio/audio.h"
>>
>> +/* TBD: register hw_compat_9_1 with machines */
>> +GlobalProperty hw_compat_9_1[] = {
>> + { "migration", "send-host-old", "off"},
>> +};
>> +const size_t hw_compat_9_1_len = G_N_ELEMENTS(hw_compat_9_1);
>> +
>> GlobalProperty hw_compat_9_0[] = {
>> {"arm-cpu", "backcompat-cntfrq", "true" },
>> {"scsi-disk-base", "migrate-emulated-scsi-request", "false" },
>> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
>> index 64484cd..8f1c535 100644
>> --- a/include/exec/ramblock.h
>> +++ b/include/exec/ramblock.h
>> @@ -28,6 +28,7 @@ struct RAMBlock {
>> struct rcu_head rcu;
>> struct MemoryRegion *mr;
>> uint8_t *host;
>> + uint64_t host_old;
>> uint8_t *colo_cache; /* For colo, VM's ram cache */
>> ram_addr_t offset;
>> ram_addr_t used_length;
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 38aa140..b5e3151 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -442,6 +442,8 @@ struct MigrationState {
>> */
>> uint8_t clear_bitmap_shift;
>>
>> + bool send_host_old;
>> +
>> /*
>> * This save hostname when out-going migration starts
>> */
>> diff --git a/migration/options.c b/migration/options.c
>> index 7526f9f..197cb86 100644
>> --- a/migration/options.c
>> +++ b/migration/options.c
>> @@ -92,6 +92,8 @@ Property migration_properties[] = {
>> clear_bitmap_shift, CLEAR_BITMAP_SHIFT_DEFAULT),
>> DEFINE_PROP_BOOL("x-preempt-pre-7-2", MigrationState,
>> preempt_pre_7_2, false),
>> + DEFINE_PROP_BOOL("send-host-old", MigrationState,
>> + send_host_old, true),
>>
>> /* Migration parameters */
>> DEFINE_PROP_UINT8("x-throttle-trigger-threshold", MigrationState,
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 1e1e05e..8644917 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -3030,6 +3030,9 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
>> qemu_put_byte(f, strlen(block->idstr));
>> qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
>> qemu_put_be64(f, block->used_length);
>> + if (migrate_get_current()->send_host_old) {
>> + qemu_put_be64(f, (uint64_t)block->host);
>> + }
>
> This requires an update of scripts/analyze-migration.py. Could be done
> on the side.
Indeed. Thanks.
- Steve
>> if (migrate_postcopy_ram() &&
>> block->page_size != max_hg_page_size) {
>> qemu_put_be64(f, block->page_size);
>> @@ -4021,6 +4024,10 @@ static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
>>
>> assert(block);
>>
>> + if (migrate_get_current()->send_host_old) {
>> + block->host_old = qemu_get_be64(f);
>> + }
>> +
>> if (migrate_mapped_ram()) {
>> parse_ramblock_mapped_ram(f, block, length, &local_err);
>> if (local_err) {