* [PATCH V3 01/42] MAINTAINERS: Add reviewer for CPR
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-15 7:36 ` Cédric Le Goater
2025-05-12 15:32 ` [PATCH V3 02/42] migration: cpr helpers Steve Sistare
` (41 subsequent siblings)
42 siblings, 1 reply; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
CPR is integrated with live migration, and has the same maintainers.
But, add a CPR section to add a reviewer.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
MAINTAINERS | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 6dacd6d..d54a532 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3019,6 +3019,15 @@ F: include/qemu/co-shared-resource.h
T: git https://gitlab.com/jsnow/qemu.git jobs
T: git https://gitlab.com/vsementsov/qemu.git block
+CheckPoint and Restart (CPR)
+R: Steve Sistare <steven.sistare@oracle.com>
+S: Supported
+F: hw/vfio/cpr*
+F: include/migration/cpr.h
+F: migration/cpr*
+F: tests/qtest/migration/cpr*
+F: docs/devel/migration/CPR.rst
+
Compute Express Link
M: Jonathan Cameron <jonathan.cameron@huawei.com>
R: Fan Ni <fan.ni@samsung.com>
--
1.8.3.1
^ permalink raw reply related [flat|nested] 157+ messages in thread
* Re: [PATCH V3 01/42] MAINTAINERS: Add reviewer for CPR
2025-05-12 15:32 ` [PATCH V3 01/42] MAINTAINERS: Add reviewer for CPR Steve Sistare
@ 2025-05-15 7:36 ` Cédric Le Goater
0 siblings, 0 replies; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-15 7:36 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/12/25 17:32, Steve Sistare wrote:
> CPR is integrated with live migration, and has the same maintainers.
> But, add a CPR section to add a reviewer.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> MAINTAINERS | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6dacd6d..d54a532 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3019,6 +3019,15 @@ F: include/qemu/co-shared-resource.h
> T: git https://gitlab.com/jsnow/qemu.git jobs
> T: git https://gitlab.com/vsementsov/qemu.git block
>
> +CheckPoint and Restart (CPR)
> +R: Steve Sistare <steven.sistare@oracle.com>
> +S: Supported
> +F: hw/vfio/cpr*
> +F: include/migration/cpr.h
> +F: migration/cpr*
> +F: tests/qtest/migration/cpr*
> +F: docs/devel/migration/CPR.rst
> +
> Compute Express Link
> M: Jonathan Cameron <jonathan.cameron@huawei.com>
> R: Fan Ni <fan.ni@samsung.com>
Please add:
include/hw/vfio/vfio-cpr.h
with that,
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
^ permalink raw reply [flat|nested] 157+ messages in thread
* [PATCH V3 02/42] migration: cpr helpers
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
2025-05-12 15:32 ` [PATCH V3 01/42] MAINTAINERS: Add reviewer for CPR Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-15 7:43 ` Cédric Le Goater
2025-05-12 15:32 ` [PATCH V3 03/42] migration: lower handler priority Steve Sistare
` (40 subsequent siblings)
42 siblings, 1 reply; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Add the cpr_needed_for_reuse and cpr_open_fd helpers, for use when adding cpr
support for vfio and iommufd.
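The find-or-open behaviour of cpr_open_fd can be illustrated outside QEMU with
a minimal standalone sketch. Everything named demo_* below is hypothetical: a
small fixed-size table stands in for the CPR fd list, plain open() stands in
for qemu_open(), and error reporting through Error ** is omitted.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_MAX_FDS 16

/* Hypothetical stand-in for the saved CPR fd list, keyed by (name, id). */
struct demo_fd_entry {
    const char *name;
    int id;
    int fd;
};

static struct demo_fd_entry demo_fds[DEMO_MAX_FDS];
static int demo_nfds;

/* Return the saved fd for (name, id), or -1 if none was saved. */
static int demo_find_fd(const char *name, int id)
{
    assert(name != NULL);
    for (int i = 0; i < demo_nfds; i++) {
        if (demo_fds[i].id == id && strcmp(demo_fds[i].name, name) == 0) {
            return demo_fds[i].fd;
        }
    }
    return -1;
}

/* Remember fd under (name, id); silently dropped if the table is full. */
static void demo_save_fd(const char *name, int id, int fd)
{
    if (demo_nfds < DEMO_MAX_FDS) {
        demo_fds[demo_nfds++] = (struct demo_fd_entry){ name, id, fd };
    }
}

/*
 * Reuse a previously saved fd if one exists, otherwise open the path and
 * save the new fd.  *reused (if non-NULL) reports which case occurred.
 */
static int demo_open_fd(const char *path, int flags, const char *name,
                        int id, bool *reused)
{
    int fd = demo_find_fd(name, id);

    if (reused) {
        *reused = (fd >= 0);
    }
    if (fd < 0) {
        fd = open(path, flags);
        if (fd >= 0) {
            demo_save_fd(name, id, fd);
        }
    }
    return fd;
}
```

A caller can use the reused flag to skip initialization that was already
performed on the descriptor before it was preserved.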
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
include/migration/cpr.h | 4 ++++
migration/cpr.c | 24 ++++++++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index 7561fc7..fc6aa33 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -18,6 +18,8 @@
void cpr_save_fd(const char *name, int id, int fd);
void cpr_delete_fd(const char *name, int id);
int cpr_find_fd(const char *name, int id);
+int cpr_open_fd(const char *path, int flags, const char *name, int id,
+ bool *reused, Error **errp);
MigMode cpr_get_incoming_mode(void);
void cpr_set_incoming_mode(MigMode mode);
@@ -28,6 +30,8 @@ int cpr_state_load(MigrationChannel *channel, Error **errp);
void cpr_state_close(void);
struct QIOChannel *cpr_state_ioc(void);
+bool cpr_needed_for_reuse(void *opaque);
+
QEMUFile *cpr_transfer_output(MigrationChannel *channel, Error **errp);
QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp);
diff --git a/migration/cpr.c b/migration/cpr.c
index 42c4656..0b01e25 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -95,6 +95,24 @@ int cpr_find_fd(const char *name, int id)
trace_cpr_find_fd(name, id, fd);
return fd;
}
+
+int cpr_open_fd(const char *path, int flags, const char *name, int id,
+ bool *reused, Error **errp)
+{
+ int fd = cpr_find_fd(name, id);
+
+ if (reused) {
+ *reused = (fd >= 0);
+ }
+ if (fd < 0) {
+ fd = qemu_open(path, flags, errp);
+ if (fd >= 0) {
+ cpr_save_fd(name, id, fd);
+ }
+ }
+ return fd;
+}
+
/*************************************************************************/
#define CPR_STATE "CprState"
@@ -228,3 +246,9 @@ void cpr_state_close(void)
cpr_state_file = NULL;
}
}
+
+bool cpr_needed_for_reuse(void *opaque)
+{
+ MigMode mode = migrate_mode();
+ return mode == MIG_MODE_CPR_TRANSFER;
+}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 157+ messages in thread
* Re: [PATCH V3 02/42] migration: cpr helpers
2025-05-12 15:32 ` [PATCH V3 02/42] migration: cpr helpers Steve Sistare
@ 2025-05-15 7:43 ` Cédric Le Goater
0 siblings, 0 replies; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-15 7:43 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/12/25 17:32, Steve Sistare wrote:
> Add the cpr_needed_for_reuse and cpr_open_fd helpers, for use when adding cpr
> support for vfio and iommufd.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> include/migration/cpr.h | 4 ++++
> migration/cpr.c | 24 ++++++++++++++++++++++++
> 2 files changed, 28 insertions(+)
>
> diff --git a/include/migration/cpr.h b/include/migration/cpr.h
> index 7561fc7..fc6aa33 100644
> --- a/include/migration/cpr.h
> +++ b/include/migration/cpr.h
> @@ -18,6 +18,8 @@
> void cpr_save_fd(const char *name, int id, int fd);
> void cpr_delete_fd(const char *name, int id);
> int cpr_find_fd(const char *name, int id);
> +int cpr_open_fd(const char *path, int flags, const char *name, int id,
> + bool *reused, Error **errp);
>
> MigMode cpr_get_incoming_mode(void);
> void cpr_set_incoming_mode(MigMode mode);
> @@ -28,6 +30,8 @@ int cpr_state_load(MigrationChannel *channel, Error **errp);
> void cpr_state_close(void);
> struct QIOChannel *cpr_state_ioc(void);
>
> +bool cpr_needed_for_reuse(void *opaque);
> +
> QEMUFile *cpr_transfer_output(MigrationChannel *channel, Error **errp);
> QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp);
>
> diff --git a/migration/cpr.c b/migration/cpr.c
> index 42c4656..0b01e25 100644
> --- a/migration/cpr.c
> +++ b/migration/cpr.c
> @@ -95,6 +95,24 @@ int cpr_find_fd(const char *name, int id)
> trace_cpr_find_fd(name, id, fd);
> return fd;
> }
> +
> +int cpr_open_fd(const char *path, int flags, const char *name, int id,
> + bool *reused, Error **errp)
> +{
> + int fd = cpr_find_fd(name, id);
> +
> + if (reused) {
> + *reused = (fd >= 0);
> + }
> + if (fd < 0) {
> + fd = qemu_open(path, flags, errp);
> + if (fd >= 0) {
> + cpr_save_fd(name, id, fd);
> + }
> + }
> + return fd;
> +}
> +
> /*************************************************************************/
> #define CPR_STATE "CprState"
>
> @@ -228,3 +246,9 @@ void cpr_state_close(void)
> cpr_state_file = NULL;
> }
> }
> +
> +bool cpr_needed_for_reuse(void *opaque)
> +{
> + MigMode mode = migrate_mode();
> + return mode == MIG_MODE_CPR_TRANSFER;
> +}
^ permalink raw reply [flat|nested] 157+ messages in thread
* [PATCH V3 03/42] migration: lower handler priority
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
2025-05-12 15:32 ` [PATCH V3 01/42] MAINTAINERS: Add reviewer for CPR Steve Sistare
2025-05-12 15:32 ` [PATCH V3 02/42] migration: cpr helpers Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-12 15:32 ` [PATCH V3 04/42] vfio: vfio_find_ram_discard_listener Steve Sistare
` (39 subsequent siblings)
42 siblings, 0 replies; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Define a vmstate priority that is lower than the default, so its handlers
run after all default priority handlers. Since 0 is no longer the default
priority, translate an uninitialized priority of 0 to MIG_PRI_DEFAULT.
CPR for vfio will use this to install handlers for containers that run
after handlers for the devices that they contain.
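The translation of an unset priority can be sketched in isolation. This is a
simplified model, not the QEMU code: the DEMO_* enum and demo_* helpers are
hypothetical, and it assumes, as the enum comments in the patch suggest, that
handlers with a higher priority value run first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the reordered priority enum. */
typedef enum {
    DEMO_PRI_UNINITIALIZED = 0, /* zero-initialized field, means "default" */
    DEMO_PRI_LOW,               /* runs after default-priority handlers */
    DEMO_PRI_DEFAULT,
    DEMO_PRI_IOMMU,             /* runs before default-priority handlers */
    DEMO_PRI_MAX,
} DemoPriority;

/* LOW only works if it sorts below DEFAULT. */
static_assert(DEMO_PRI_LOW < DEMO_PRI_DEFAULT, "low must sort below default");

struct demo_vmsd {
    const char *name;
    DemoPriority priority;
};

/* Map a missing description or an unset priority to the default. */
static DemoPriority demo_state_priority(const struct demo_vmsd *vmsd)
{
    if (vmsd && vmsd->priority) {
        return vmsd->priority;
    }
    return DEMO_PRI_DEFAULT;
}

/* Nonzero when a's handler runs before b's (higher priority first). */
static int demo_runs_before(const struct demo_vmsd *a,
                            const struct demo_vmsd *b)
{
    return demo_state_priority(a) > demo_state_priority(b);
}
```

With this mapping, a description that never sets its priority still sorts
ahead of one marked low, which is the ordering the container handlers rely on.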
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
include/migration/vmstate.h | 6 +++++-
migration/savevm.c | 4 ++--
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index a1dfab4..1ff7bd9 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -155,7 +155,11 @@ enum VMStateFlags {
};
typedef enum {
- MIG_PRI_DEFAULT = 0,
+ MIG_PRI_UNINITIALIZED = 0, /* An uninitialized priority field maps to */
+ /* MIG_PRI_DEFAULT in save_state_priority */
+
+ MIG_PRI_LOW, /* Must happen after default */
+ MIG_PRI_DEFAULT,
MIG_PRI_IOMMU, /* Must happen before PCI devices */
MIG_PRI_PCI_BUS, /* Must happen before IOMMU */
MIG_PRI_VIRTIO_MEM, /* Must happen before IOMMU */
diff --git a/migration/savevm.c b/migration/savevm.c
index 006514c..7e87815 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -266,7 +266,7 @@ typedef struct SaveState {
static SaveState savevm_state = {
.handlers = QTAILQ_HEAD_INITIALIZER(savevm_state.handlers),
- .handler_pri_head = { [MIG_PRI_DEFAULT ... MIG_PRI_MAX] = NULL },
+ .handler_pri_head = { [0 ... MIG_PRI_MAX] = NULL },
.global_section_id = 0,
};
@@ -737,7 +737,7 @@ static int calculate_compat_instance_id(const char *idstr)
static inline MigrationPriority save_state_priority(SaveStateEntry *se)
{
- if (se->vmsd) {
+ if (se->vmsd && se->vmsd->priority) {
return se->vmsd->priority;
}
return MIG_PRI_DEFAULT;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 157+ messages in thread
* [PATCH V3 04/42] vfio: vfio_find_ram_discard_listener
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
` (2 preceding siblings ...)
2025-05-12 15:32 ` [PATCH V3 03/42] migration: lower handler priority Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-12 15:32 ` [PATCH V3 05/42] vfio: move vfio-cpr.h Steve Sistare
` (38 subsequent siblings)
42 siblings, 0 replies; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Define vfio_find_ram_discard_listener as a subroutine so additional calls to
it may be added in a subsequent patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
hw/vfio/listener.c | 35 ++++++++++++++++++++++-------------
include/hw/vfio/vfio-container-base.h | 3 +++
2 files changed, 25 insertions(+), 13 deletions(-)
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index bfacb3d..5642d04 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -449,6 +449,26 @@ static void vfio_device_error_append(VFIODevice *vbasedev, Error **errp)
}
}
+VFIORamDiscardListener *vfio_find_ram_discard_listener(
+ VFIOContainerBase *bcontainer, MemoryRegionSection *section)
+{
+ VFIORamDiscardListener *vrdl = NULL;
+
+ QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
+ if (vrdl->mr == section->mr &&
+ vrdl->offset_within_address_space ==
+ section->offset_within_address_space) {
+ break;
+ }
+ }
+
+ if (!vrdl) {
+ hw_error("vfio: Trying to sync missing RAM discard listener");
+ /* does not return */
+ }
+ return vrdl;
+}
+
static void vfio_listener_region_add(MemoryListener *listener,
MemoryRegionSection *section)
{
@@ -1075,19 +1095,8 @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
MemoryRegionSection *section)
{
RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
- VFIORamDiscardListener *vrdl = NULL;
-
- QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
- if (vrdl->mr == section->mr &&
- vrdl->offset_within_address_space ==
- section->offset_within_address_space) {
- break;
- }
- }
-
- if (!vrdl) {
- hw_error("vfio: Trying to sync missing RAM discard listener");
- }
+ VFIORamDiscardListener *vrdl =
+ vfio_find_ram_discard_listener(bcontainer, section);
/*
* We only want/can synchronize the bitmap for actually mapped parts -
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 3d392b0..1dc760f 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -183,4 +183,7 @@ struct VFIOIOMMUClass {
void (*release)(VFIOContainerBase *bcontainer);
};
+VFIORamDiscardListener *vfio_find_ram_discard_listener(
+ VFIOContainerBase *bcontainer, MemoryRegionSection *section);
+
#endif /* HW_VFIO_VFIO_CONTAINER_BASE_H */
--
1.8.3.1
^ permalink raw reply related [flat|nested] 157+ messages in thread
* [PATCH V3 05/42] vfio: move vfio-cpr.h
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
` (3 preceding siblings ...)
2025-05-12 15:32 ` [PATCH V3 04/42] vfio: vfio_find_ram_discard_listener Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-15 7:46 ` Cédric Le Goater
2025-05-12 15:32 ` [PATCH V3 06/42] vfio/container: register container for cpr Steve Sistare
` (37 subsequent siblings)
42 siblings, 1 reply; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Move vfio-cpr.h to include/hw/vfio, because it will need to be included by
other files there.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
MAINTAINERS | 1 +
hw/vfio/container.c | 2 +-
hw/vfio/cpr.c | 2 +-
hw/vfio/iommufd.c | 2 +-
hw/vfio/vfio-cpr.h | 15 ---------------
include/hw/vfio/vfio-cpr.h | 18 ++++++++++++++++++
6 files changed, 22 insertions(+), 18 deletions(-)
delete mode 100644 hw/vfio/vfio-cpr.h
create mode 100644 include/hw/vfio/vfio-cpr.h
diff --git a/MAINTAINERS b/MAINTAINERS
index d54a532..9bee3cf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3023,6 +3023,7 @@ CheckPoint and Restart (CPR)
R: Steve Sistare <steven.sistare@oracle.com>
S: Supported
F: hw/vfio/cpr*
+F: include/hw/vfio/vfio-cpr.h
F: include/migration/cpr.h
F: migration/cpr*
F: tests/qtest/migration/cpr*
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index a9f0dba..eb56f00 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -33,8 +33,8 @@
#include "qapi/error.h"
#include "pci.h"
#include "hw/vfio/vfio-container.h"
+#include "hw/vfio/vfio-cpr.h"
#include "vfio-helpers.h"
-#include "vfio-cpr.h"
#include "vfio-listener.h"
#define TYPE_HOST_IOMMU_DEVICE_LEGACY_VFIO TYPE_HOST_IOMMU_DEVICE "-legacy-vfio"
diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
index 3214184..0210e76 100644
--- a/hw/vfio/cpr.c
+++ b/hw/vfio/cpr.c
@@ -8,9 +8,9 @@
#include "qemu/osdep.h"
#include "hw/vfio/vfio-device.h"
#include "migration/misc.h"
+#include "hw/vfio/vfio-cpr.h"
#include "qapi/error.h"
#include "system/runstate.h"
-#include "vfio-cpr.h"
static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
MigrationEvent *e, Error **errp)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index af1c7ab..167bda4 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -21,13 +21,13 @@
#include "qapi/error.h"
#include "system/iommufd.h"
#include "hw/qdev-core.h"
+#include "hw/vfio/vfio-cpr.h"
#include "system/reset.h"
#include "qemu/cutils.h"
#include "qemu/chardev_open.h"
#include "pci.h"
#include "vfio-iommufd.h"
#include "vfio-helpers.h"
-#include "vfio-cpr.h"
#include "vfio-listener.h"
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD_VFIO \
diff --git a/hw/vfio/vfio-cpr.h b/hw/vfio/vfio-cpr.h
deleted file mode 100644
index 134b83a..0000000
--- a/hw/vfio/vfio-cpr.h
+++ /dev/null
@@ -1,15 +0,0 @@
-/*
- * VFIO CPR
- *
- * Copyright (c) 2025 Oracle and/or its affiliates.
- *
- * SPDX-License-Identifier: GPL-2.0-or-later
- */
-
-#ifndef HW_VFIO_CPR_H
-#define HW_VFIO_CPR_H
-
-bool vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp);
-void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer);
-
-#endif /* HW_VFIO_CPR_H */
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
new file mode 100644
index 0000000..750ea5b
--- /dev/null
+++ b/include/hw/vfio/vfio-cpr.h
@@ -0,0 +1,18 @@
+/*
+ * VFIO CPR
+ *
+ * Copyright (c) 2025 Oracle and/or its affiliates.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_VFIO_VFIO_CPR_H
+#define HW_VFIO_VFIO_CPR_H
+
+struct VFIOContainerBase;
+
+bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
+ Error **errp);
+void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
+
+#endif /* HW_VFIO_VFIO_CPR_H */
--
1.8.3.1
^ permalink raw reply related [flat|nested] 157+ messages in thread
* Re: [PATCH V3 05/42] vfio: move vfio-cpr.h
2025-05-12 15:32 ` [PATCH V3 05/42] vfio: move vfio-cpr.h Steve Sistare
@ 2025-05-15 7:46 ` Cédric Le Goater
0 siblings, 0 replies; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-15 7:46 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/12/25 17:32, Steve Sistare wrote:
> Move vfio-cpr.h to include/hw/vfio, because it will need to be included by
> other files there.
So patch 1 is fine. Forget my comment.
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> MAINTAINERS | 1 +
> hw/vfio/container.c | 2 +-
> hw/vfio/cpr.c | 2 +-
> hw/vfio/iommufd.c | 2 +-
> hw/vfio/vfio-cpr.h | 15 ---------------
> include/hw/vfio/vfio-cpr.h | 18 ++++++++++++++++++
> 6 files changed, 22 insertions(+), 18 deletions(-)
> delete mode 100644 hw/vfio/vfio-cpr.h
> create mode 100644 include/hw/vfio/vfio-cpr.h
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d54a532..9bee3cf 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3023,6 +3023,7 @@ CheckPoint and Restart (CPR)
> R: Steve Sistare <steven.sistare@oracle.com>
> S: Supported
> F: hw/vfio/cpr*
> +F: include/hw/vfio/vfio-cpr.h
> F: include/migration/cpr.h
> F: migration/cpr*
> F: tests/qtest/migration/cpr*
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index a9f0dba..eb56f00 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -33,8 +33,8 @@
> #include "qapi/error.h"
> #include "pci.h"
> #include "hw/vfio/vfio-container.h"
> +#include "hw/vfio/vfio-cpr.h"
> #include "vfio-helpers.h"
> -#include "vfio-cpr.h"
> #include "vfio-listener.h"
>
> #define TYPE_HOST_IOMMU_DEVICE_LEGACY_VFIO TYPE_HOST_IOMMU_DEVICE "-legacy-vfio"
> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
> index 3214184..0210e76 100644
> --- a/hw/vfio/cpr.c
> +++ b/hw/vfio/cpr.c
> @@ -8,9 +8,9 @@
> #include "qemu/osdep.h"
> #include "hw/vfio/vfio-device.h"
> #include "migration/misc.h"
> +#include "hw/vfio/vfio-cpr.h"
> #include "qapi/error.h"
> #include "system/runstate.h"
> -#include "vfio-cpr.h"
>
> static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
> MigrationEvent *e, Error **errp)
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index af1c7ab..167bda4 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -21,13 +21,13 @@
> #include "qapi/error.h"
> #include "system/iommufd.h"
> #include "hw/qdev-core.h"
> +#include "hw/vfio/vfio-cpr.h"
> #include "system/reset.h"
> #include "qemu/cutils.h"
> #include "qemu/chardev_open.h"
> #include "pci.h"
> #include "vfio-iommufd.h"
> #include "vfio-helpers.h"
> -#include "vfio-cpr.h"
> #include "vfio-listener.h"
>
> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD_VFIO \
> diff --git a/hw/vfio/vfio-cpr.h b/hw/vfio/vfio-cpr.h
> deleted file mode 100644
> index 134b83a..0000000
> --- a/hw/vfio/vfio-cpr.h
> +++ /dev/null
> @@ -1,15 +0,0 @@
> -/*
> - * VFIO CPR
> - *
> - * Copyright (c) 2025 Oracle and/or its affiliates.
> - *
> - * SPDX-License-Identifier: GPL-2.0-or-later
> - */
> -
> -#ifndef HW_VFIO_CPR_H
> -#define HW_VFIO_CPR_H
> -
> -bool vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp);
> -void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer);
> -
> -#endif /* HW_VFIO_CPR_H */
> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
> new file mode 100644
> index 0000000..750ea5b
> --- /dev/null
> +++ b/include/hw/vfio/vfio-cpr.h
> @@ -0,0 +1,18 @@
> +/*
> + * VFIO CPR
> + *
> + * Copyright (c) 2025 Oracle and/or its affiliates.
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef HW_VFIO_VFIO_CPR_H
> +#define HW_VFIO_VFIO_CPR_H
> +
> +struct VFIOContainerBase;
> +
> +bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
> + Error **errp);
> +void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
> +
> +#endif /* HW_VFIO_VFIO_CPR_H */
^ permalink raw reply [flat|nested] 157+ messages in thread
* [PATCH V3 06/42] vfio/container: register container for cpr
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
` (4 preceding siblings ...)
2025-05-12 15:32 ` [PATCH V3 05/42] vfio: move vfio-cpr.h Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-15 7:54 ` Cédric Le Goater
2025-05-12 15:32 ` [PATCH V3 07/42] vfio/container: preserve descriptors Steve Sistare
` (36 subsequent siblings)
42 siblings, 1 reply; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Register a legacy container for cpr-transfer, replacing the generic CPR
register call with a more specific legacy container register call. Add a
blocker if the kernel does not support VFIO_UPDATE_VADDR or VFIO_UNMAP_ALL.
This is mostly boilerplate. The fields to be saved and restored are added
in subsequent patches.
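The register-or-block decision can be modelled with a short standalone sketch.
It is only an approximation under stated assumptions: the demo_* names are
hypothetical, two boolean flags stand in for the VFIO_CHECK_EXTENSION probes,
and recording a blocker is assumed to always succeed, whereas the real
migrate_add_blocker_modes call can fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a container; the flags model kernel features. */
struct demo_container {
    bool has_update_vaddr;
    bool has_unmap_all;
    const char *blocker;    /* why cpr-transfer is blocked, or NULL */
    bool state_registered;  /* true once save/restore state is registered */
};

/* Return the reason cpr-transfer cannot work, or NULL if it is supported. */
static const char *demo_unsupported_reason(const struct demo_container *c)
{
    if (!c->has_update_vaddr) {
        return "VFIO_UPDATE_VADDR not supported";
    }
    if (!c->has_unmap_all) {
        return "VFIO_UNMAP_ALL not supported";
    }
    return NULL;
}

/*
 * Register the container.  A missing kernel feature does not fail the
 * registration; it only records a blocker so cpr-transfer is refused later.
 */
static bool demo_register_container(struct demo_container *c)
{
    assert(c != NULL);
    c->blocker = demo_unsupported_reason(c);
    if (c->blocker) {
        return true;    /* container stays usable, transfer is blocked */
    }
    c->state_registered = true;
    return true;
}
```

The point of the design is that an older kernel degrades to "no cpr-transfer"
rather than to a container that cannot be used at all.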
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/vfio/container.c | 6 ++--
hw/vfio/cpr-legacy.c | 70 ++++++++++++++++++++++++++++++++++++++++
hw/vfio/cpr.c | 5 ++-
hw/vfio/meson.build | 1 +
include/hw/vfio/vfio-container.h | 2 ++
include/hw/vfio/vfio-cpr.h | 14 ++++++++
6 files changed, 92 insertions(+), 6 deletions(-)
create mode 100644 hw/vfio/cpr-legacy.c
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index eb56f00..85c76da 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -642,7 +642,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
new_container = true;
bcontainer = &container->bcontainer;
- if (!vfio_cpr_register_container(bcontainer, errp)) {
+ if (!vfio_legacy_cpr_register_container(container, errp)) {
goto fail;
}
@@ -678,7 +678,7 @@ fail:
vioc->release(bcontainer);
}
if (new_container) {
- vfio_cpr_unregister_container(bcontainer);
+ vfio_legacy_cpr_unregister_container(container);
object_unref(container);
}
if (fd >= 0) {
@@ -719,7 +719,7 @@ static void vfio_container_disconnect(VFIOGroup *group)
VFIOAddressSpace *space = bcontainer->space;
trace_vfio_container_disconnect(container->fd);
- vfio_cpr_unregister_container(bcontainer);
+ vfio_legacy_cpr_unregister_container(container);
close(container->fd);
object_unref(container);
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
new file mode 100644
index 0000000..fac323c
--- /dev/null
+++ b/hw/vfio/cpr-legacy.c
@@ -0,0 +1,70 @@
+/*
+ * Copyright (c) 2021-2025 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <sys/ioctl.h>
+#include <linux/vfio.h>
+#include "qemu/osdep.h"
+#include "hw/vfio/vfio-container.h"
+#include "hw/vfio/vfio-cpr.h"
+#include "migration/blocker.h"
+#include "migration/cpr.h"
+#include "migration/migration.h"
+#include "migration/vmstate.h"
+#include "qapi/error.h"
+
+static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
+{
+ if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) {
+ error_setg(errp, "VFIO container does not support VFIO_UPDATE_VADDR");
+ return false;
+
+ } else if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UNMAP_ALL)) {
+ error_setg(errp, "VFIO container does not support VFIO_UNMAP_ALL");
+ return false;
+
+ } else {
+ return true;
+ }
+}
+
+static const VMStateDescription vfio_container_vmstate = {
+ .name = "vfio-container",
+ .version_id = 0,
+ .minimum_version_id = 0,
+ .needed = cpr_needed_for_reuse,
+ .fields = (VMStateField[]) {
+ VMSTATE_END_OF_LIST()
+ }
+};
+
+bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
+{
+ VFIOContainerBase *bcontainer = &container->bcontainer;
+ Error **cpr_blocker = &container->cpr.blocker;
+
+ migration_add_notifier_mode(&bcontainer->cpr_reboot_notifier,
+ vfio_cpr_reboot_notifier,
+ MIG_MODE_CPR_REBOOT);
+
+ if (!vfio_cpr_supported(container, cpr_blocker)) {
+ return migrate_add_blocker_modes(cpr_blocker, errp,
+ MIG_MODE_CPR_TRANSFER, -1) == 0;
+ }
+
+ vmstate_register(NULL, -1, &vfio_container_vmstate, container);
+
+ return true;
+}
+
+void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
+{
+ VFIOContainerBase *bcontainer = &container->bcontainer;
+
+ migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
+ migrate_del_blocker(&container->cpr.blocker);
+ vmstate_unregister(NULL, &vfio_container_vmstate, container);
+}
diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
index 0210e76..0e59612 100644
--- a/hw/vfio/cpr.c
+++ b/hw/vfio/cpr.c
@@ -7,13 +7,12 @@
#include "qemu/osdep.h"
#include "hw/vfio/vfio-device.h"
-#include "migration/misc.h"
#include "hw/vfio/vfio-cpr.h"
#include "qapi/error.h"
#include "system/runstate.h"
-static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
- MigrationEvent *e, Error **errp)
+int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
+ MigrationEvent *e, Error **errp)
{
if (e->type == MIG_EVENT_PRECOPY_SETUP &&
!runstate_check(RUN_STATE_SUSPENDED) && !vm_get_suspended()) {
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index bccb050..73d29f9 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -21,6 +21,7 @@ system_ss.add(when: 'CONFIG_VFIO_XGMAC', if_true: files('calxeda-xgmac.c'))
system_ss.add(when: 'CONFIG_VFIO_AMD_XGBE', if_true: files('amd-xgbe.c'))
system_ss.add(when: 'CONFIG_VFIO', if_true: files(
'cpr.c',
+ 'cpr-legacy.c',
'device.c',
'migration.c',
'migration-multifd.c',
diff --git a/include/hw/vfio/vfio-container.h b/include/hw/vfio/vfio-container.h
index afc498d..21e5807 100644
--- a/include/hw/vfio/vfio-container.h
+++ b/include/hw/vfio/vfio-container.h
@@ -10,6 +10,7 @@
#define HW_VFIO_CONTAINER_H
#include "hw/vfio/vfio-container-base.h"
+#include "hw/vfio/vfio-cpr.h"
typedef struct VFIOContainer VFIOContainer;
typedef struct VFIODevice VFIODevice;
@@ -29,6 +30,7 @@ typedef struct VFIOContainer {
int fd; /* /dev/vfio/vfio, empowered by the attached groups */
unsigned iommu_type;
QLIST_HEAD(, VFIOGroup) group_list;
+ VFIOContainerCPR cpr;
} VFIOContainer;
OBJECT_DECLARE_SIMPLE_TYPE(VFIOContainer, VFIO_IOMMU_LEGACY);
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index 750ea5b..f864547 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -9,8 +9,22 @@
#ifndef HW_VFIO_VFIO_CPR_H
#define HW_VFIO_VFIO_CPR_H
+#include "migration/misc.h"
+
+typedef struct VFIOContainerCPR {
+ Error *blocker;
+} VFIOContainerCPR;
+
+struct VFIOContainer;
struct VFIOContainerBase;
+bool vfio_legacy_cpr_register_container(struct VFIOContainer *container,
+ Error **errp);
+void vfio_legacy_cpr_unregister_container(struct VFIOContainer *container);
+
+int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier, MigrationEvent *e,
+ Error **errp);
+
bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
Error **errp);
void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 157+ messages in thread
* Re: [PATCH V3 06/42] vfio/container: register container for cpr
2025-05-12 15:32 ` [PATCH V3 06/42] vfio/container: register container for cpr Steve Sistare
@ 2025-05-15 7:54 ` Cédric Le Goater
2025-05-15 19:06 ` Steven Sistare
0 siblings, 1 reply; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-15 7:54 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/12/25 17:32, Steve Sistare wrote:
> Register a legacy container for cpr-transfer, replacing the generic CPR
> register call with a more specific legacy container register call. Add a
> blocker if the kernel does not support VFIO_UPDATE_VADDR or VFIO_UNMAP_ALL.
>
> This is mostly boilerplate. The fields to be saved and restored are added
> in subsequent patches.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/vfio/container.c | 6 ++--
> hw/vfio/cpr-legacy.c | 70 ++++++++++++++++++++++++++++++++++++++++
> hw/vfio/cpr.c | 5 ++-
> hw/vfio/meson.build | 1 +
> include/hw/vfio/vfio-container.h | 2 ++
> include/hw/vfio/vfio-cpr.h | 14 ++++++++
> 6 files changed, 92 insertions(+), 6 deletions(-)
> create mode 100644 hw/vfio/cpr-legacy.c
>
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index eb56f00..85c76da 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -642,7 +642,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
> new_container = true;
> bcontainer = &container->bcontainer;
>
> - if (!vfio_cpr_register_container(bcontainer, errp)) {
> + if (!vfio_legacy_cpr_register_container(container, errp)) {
> goto fail;
> }
>
> @@ -678,7 +678,7 @@ fail:
> vioc->release(bcontainer);
> }
> if (new_container) {
> - vfio_cpr_unregister_container(bcontainer);
> + vfio_legacy_cpr_unregister_container(container);
> object_unref(container);
> }
> if (fd >= 0) {
> @@ -719,7 +719,7 @@ static void vfio_container_disconnect(VFIOGroup *group)
> VFIOAddressSpace *space = bcontainer->space;
>
> trace_vfio_container_disconnect(container->fd);
> - vfio_cpr_unregister_container(bcontainer);
> + vfio_legacy_cpr_unregister_container(container);
> close(container->fd);
> object_unref(container);
>
> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
> new file mode 100644
> index 0000000..fac323c
> --- /dev/null
> +++ b/hw/vfio/cpr-legacy.c
> @@ -0,0 +1,70 @@
> +/*
> + * Copyright (c) 2021-2025 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
Please add a SPDX-License-Identifier tag.
> + */
> +
> +#include <sys/ioctl.h>
> +#include <linux/vfio.h>
> +#include "qemu/osdep.h"
> +#include "hw/vfio/vfio-container.h"
> +#include "hw/vfio/vfio-cpr.h"
> +#include "migration/blocker.h"
> +#include "migration/cpr.h"
> +#include "migration/migration.h"
> +#include "migration/vmstate.h"
> +#include "qapi/error.h"
> +
> +static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
> +{
> + if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) {
> + error_setg(errp, "VFIO container does not support VFIO_UPDATE_VADDR");
> + return false;
> +
> + } else if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UNMAP_ALL)) {
> + error_setg(errp, "VFIO container does not support VFIO_UNMAP_ALL");
> + return false;
> +
> + } else {
> + return true;
> + }
> +}
> +
> +static const VMStateDescription vfio_container_vmstate = {
> + .name = "vfio-container",
> + .version_id = 0,
> + .minimum_version_id = 0,
> + .needed = cpr_needed_for_reuse,
> + .fields = (VMStateField[]) {
> + VMSTATE_END_OF_LIST()
> + }
> +};
> +
> +bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
> +{
> + VFIOContainerBase *bcontainer = &container->bcontainer;
> + Error **cpr_blocker = &container->cpr.blocker;
> +
> + migration_add_notifier_mode(&bcontainer->cpr_reboot_notifier,
> + vfio_cpr_reboot_notifier,
> + MIG_MODE_CPR_REBOOT);
> +
> + if (!vfio_cpr_supported(container, cpr_blocker)) {
> + return migrate_add_blocker_modes(cpr_blocker, errp,
> + MIG_MODE_CPR_TRANSFER, -1) == 0;
> + }
> +
> + vmstate_register(NULL, -1, &vfio_container_vmstate, container);
> +
> + return true;
> +}
> +
> +void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
> +{
> + VFIOContainerBase *bcontainer = &container->bcontainer;
> +
> + migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
> + migrate_del_blocker(&container->cpr.blocker);
> + vmstate_unregister(NULL, &vfio_container_vmstate, container);
> +}
> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
> index 0210e76..0e59612 100644
> --- a/hw/vfio/cpr.c
> +++ b/hw/vfio/cpr.c
> @@ -7,13 +7,12 @@
>
> #include "qemu/osdep.h"
> #include "hw/vfio/vfio-device.h"
> -#include "migration/misc.h"
> #include "hw/vfio/vfio-cpr.h"
> #include "qapi/error.h"
> #include "system/runstate.h"
>
> -static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
> - MigrationEvent *e, Error **errp)
> +int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
> + MigrationEvent *e, Error **errp)
> {
> if (e->type == MIG_EVENT_PRECOPY_SETUP &&
> !runstate_check(RUN_STATE_SUSPENDED) && !vm_get_suspended()) {
> diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
> index bccb050..73d29f9 100644
> --- a/hw/vfio/meson.build
> +++ b/hw/vfio/meson.build
> @@ -21,6 +21,7 @@ system_ss.add(when: 'CONFIG_VFIO_XGMAC', if_true: files('calxeda-xgmac.c'))
> system_ss.add(when: 'CONFIG_VFIO_AMD_XGBE', if_true: files('amd-xgbe.c'))
> system_ss.add(when: 'CONFIG_VFIO', if_true: files(
> 'cpr.c',
> + 'cpr-legacy.c',
> 'device.c',
> 'migration.c',
> 'migration-multifd.c',
> diff --git a/include/hw/vfio/vfio-container.h b/include/hw/vfio/vfio-container.h
> index afc498d..21e5807 100644
> --- a/include/hw/vfio/vfio-container.h
> +++ b/include/hw/vfio/vfio-container.h
> @@ -10,6 +10,7 @@
> #define HW_VFIO_CONTAINER_H
>
> #include "hw/vfio/vfio-container-base.h"
> +#include "hw/vfio/vfio-cpr.h"
>
> typedef struct VFIOContainer VFIOContainer;
> typedef struct VFIODevice VFIODevice;
> @@ -29,6 +30,7 @@ typedef struct VFIOContainer {
> int fd; /* /dev/vfio/vfio, empowered by the attached groups */
> unsigned iommu_type;
> QLIST_HEAD(, VFIOGroup) group_list;
> + VFIOContainerCPR cpr;
> } VFIOContainer;
>
> OBJECT_DECLARE_SIMPLE_TYPE(VFIOContainer, VFIO_IOMMU_LEGACY);
> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
> index 750ea5b..f864547 100644
> --- a/include/hw/vfio/vfio-cpr.h
> +++ b/include/hw/vfio/vfio-cpr.h
> @@ -9,8 +9,22 @@
> #ifndef HW_VFIO_VFIO_CPR_H
> #define HW_VFIO_VFIO_CPR_H
>
> +#include "migration/misc.h"
> +
> +typedef struct VFIOContainerCPR {
> + Error *blocker;
> +} VFIOContainerCPR;
> +
> +struct VFIOContainer;
> struct VFIOContainerBase;
>
> +bool vfio_legacy_cpr_register_container(struct VFIOContainer *container,
> + Error **errp);
> +void vfio_legacy_cpr_unregister_container(struct VFIOContainer *container);
> +
> +int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier, MigrationEvent *e,
> + Error **errp);
> +
> bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
> Error **errp);
> void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
what about vfio_cpr_un/register_container ? Shouldn't we remove them ?
Thanks,
C.
* Re: [PATCH V3 06/42] vfio/container: register container for cpr
2025-05-15 7:54 ` Cédric Le Goater
@ 2025-05-15 19:06 ` Steven Sistare
2025-05-16 16:20 ` Cédric Le Goater
0 siblings, 1 reply; 157+ messages in thread
From: Steven Sistare @ 2025-05-15 19:06 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/15/2025 3:54 AM, Cédric Le Goater wrote:
> On 5/12/25 17:32, Steve Sistare wrote:
>> Register a legacy container for cpr-transfer, replacing the generic CPR
>> register call with a more specific legacy container register call. Add a
>> blocker if the kernel does not support VFIO_UPDATE_VADDR or VFIO_UNMAP_ALL.
>>
>> This is mostly boilerplate. The fields to be saved and restored are added
>> in subsequent patches.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> hw/vfio/container.c | 6 ++--
>> hw/vfio/cpr-legacy.c | 70 ++++++++++++++++++++++++++++++++++++++++
>> hw/vfio/cpr.c | 5 ++-
>> hw/vfio/meson.build | 1 +
>> include/hw/vfio/vfio-container.h | 2 ++
>> include/hw/vfio/vfio-cpr.h | 14 ++++++++
>> 6 files changed, 92 insertions(+), 6 deletions(-)
>> create mode 100644 hw/vfio/cpr-legacy.c
>>
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index eb56f00..85c76da 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -642,7 +642,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>> new_container = true;
>> bcontainer = &container->bcontainer;
>> - if (!vfio_cpr_register_container(bcontainer, errp)) {
>> + if (!vfio_legacy_cpr_register_container(container, errp)) {
>> goto fail;
>> }
>> @@ -678,7 +678,7 @@ fail:
>> vioc->release(bcontainer);
>> }
>> if (new_container) {
>> - vfio_cpr_unregister_container(bcontainer);
>> + vfio_legacy_cpr_unregister_container(container);
>> object_unref(container);
>> }
>> if (fd >= 0) {
>> @@ -719,7 +719,7 @@ static void vfio_container_disconnect(VFIOGroup *group)
>> VFIOAddressSpace *space = bcontainer->space;
>> trace_vfio_container_disconnect(container->fd);
>> - vfio_cpr_unregister_container(bcontainer);
>> + vfio_legacy_cpr_unregister_container(container);
>> close(container->fd);
>> object_unref(container);
>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>> new file mode 100644
>> index 0000000..fac323c
>> --- /dev/null
>> +++ b/hw/vfio/cpr-legacy.c
>> @@ -0,0 +1,70 @@
>> +/*
>> + * Copyright (c) 2021-2025 Oracle and/or its affiliates.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>
> Please add an SPDX-License-Identifier tag.
Sure. I'll do the same for my other new files.
>> + */
>> +
>> +#include <sys/ioctl.h>
>> +#include <linux/vfio.h>
>> +#include "qemu/osdep.h"
>> +#include "hw/vfio/vfio-container.h"
>> +#include "hw/vfio/vfio-cpr.h"
>> +#include "migration/blocker.h"
>> +#include "migration/cpr.h"
>> +#include "migration/migration.h"
>> +#include "migration/vmstate.h"
>> +#include "qapi/error.h"
>> +
>> +static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
>> +{
>> + if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) {
>> + error_setg(errp, "VFIO container does not support VFIO_UPDATE_VADDR");
>> + return false;
>> +
>> + } else if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UNMAP_ALL)) {
>> + error_setg(errp, "VFIO container does not support VFIO_UNMAP_ALL");
>> + return false;
>> +
>> + } else {
>> + return true;
>> + }
>> +}
>> +
>> +static const VMStateDescription vfio_container_vmstate = {
>> + .name = "vfio-container",
>> + .version_id = 0,
>> + .minimum_version_id = 0,
>> + .needed = cpr_needed_for_reuse,
>> + .fields = (VMStateField[]) {
>> + VMSTATE_END_OF_LIST()
>> + }
>> +};
>> +
>> +bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>> +{
>> + VFIOContainerBase *bcontainer = &container->bcontainer;
>> + Error **cpr_blocker = &container->cpr.blocker;
>> +
>> + migration_add_notifier_mode(&bcontainer->cpr_reboot_notifier,
>> + vfio_cpr_reboot_notifier,
>> + MIG_MODE_CPR_REBOOT);
>> +
>> + if (!vfio_cpr_supported(container, cpr_blocker)) {
>> + return migrate_add_blocker_modes(cpr_blocker, errp,
>> + MIG_MODE_CPR_TRANSFER, -1) == 0;
>> + }
>> +
>> + vmstate_register(NULL, -1, &vfio_container_vmstate, container);
>> +
>> + return true;
>> +}
>> +
>> +void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
>> +{
>> + VFIOContainerBase *bcontainer = &container->bcontainer;
>> +
>> + migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
>> + migrate_del_blocker(&container->cpr.blocker);
>> + vmstate_unregister(NULL, &vfio_container_vmstate, container);
>> +}
>> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
>> index 0210e76..0e59612 100644
>> --- a/hw/vfio/cpr.c
>> +++ b/hw/vfio/cpr.c
>> @@ -7,13 +7,12 @@
>> #include "qemu/osdep.h"
>> #include "hw/vfio/vfio-device.h"
>> -#include "migration/misc.h"
>> #include "hw/vfio/vfio-cpr.h"
>> #include "qapi/error.h"
>> #include "system/runstate.h"
>> -static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
>> - MigrationEvent *e, Error **errp)
>> +int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
>> + MigrationEvent *e, Error **errp)
>> {
>> if (e->type == MIG_EVENT_PRECOPY_SETUP &&
>> !runstate_check(RUN_STATE_SUSPENDED) && !vm_get_suspended()) {
>> diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
>> index bccb050..73d29f9 100644
>> --- a/hw/vfio/meson.build
>> +++ b/hw/vfio/meson.build
>> @@ -21,6 +21,7 @@ system_ss.add(when: 'CONFIG_VFIO_XGMAC', if_true: files('calxeda-xgmac.c'))
>> system_ss.add(when: 'CONFIG_VFIO_AMD_XGBE', if_true: files('amd-xgbe.c'))
>> system_ss.add(when: 'CONFIG_VFIO', if_true: files(
>> 'cpr.c',
>> + 'cpr-legacy.c',
>> 'device.c',
>> 'migration.c',
>> 'migration-multifd.c',
>> diff --git a/include/hw/vfio/vfio-container.h b/include/hw/vfio/vfio-container.h
>> index afc498d..21e5807 100644
>> --- a/include/hw/vfio/vfio-container.h
>> +++ b/include/hw/vfio/vfio-container.h
>> @@ -10,6 +10,7 @@
>> #define HW_VFIO_CONTAINER_H
>> #include "hw/vfio/vfio-container-base.h"
>> +#include "hw/vfio/vfio-cpr.h"
>> typedef struct VFIOContainer VFIOContainer;
>> typedef struct VFIODevice VFIODevice;
>> @@ -29,6 +30,7 @@ typedef struct VFIOContainer {
>> int fd; /* /dev/vfio/vfio, empowered by the attached groups */
>> unsigned iommu_type;
>> QLIST_HEAD(, VFIOGroup) group_list;
>> + VFIOContainerCPR cpr;
>> } VFIOContainer;
>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOContainer, VFIO_IOMMU_LEGACY);
>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>> index 750ea5b..f864547 100644
>> --- a/include/hw/vfio/vfio-cpr.h
>> +++ b/include/hw/vfio/vfio-cpr.h
>> @@ -9,8 +9,22 @@
>> #ifndef HW_VFIO_VFIO_CPR_H
>> #define HW_VFIO_VFIO_CPR_H
>> +#include "migration/misc.h"
>> +
>> +typedef struct VFIOContainerCPR {
>> + Error *blocker;
>> +} VFIOContainerCPR;
>> +
>> +struct VFIOContainer;
>> struct VFIOContainerBase;
>> +bool vfio_legacy_cpr_register_container(struct VFIOContainer *container,
>> + Error **errp);
>> +void vfio_legacy_cpr_unregister_container(struct VFIOContainer *container);
>> +
>> +int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier, MigrationEvent *e,
>> + Error **errp);
>> +
>> bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
>> Error **errp);
>> void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
>
> what about vfio_cpr_un/register_container ? Shouldn't we remove them ?
At this patch in the series, those are still used by iommufd containers.
Those uses are removed in "vfio/iommufd: register container for cpr", and
vfio_cpr_un/register_container are deleted by the last patch in the series.
- Steve
* Re: [PATCH V3 06/42] vfio/container: register container for cpr
2025-05-15 19:06 ` Steven Sistare
@ 2025-05-16 16:20 ` Cédric Le Goater
2025-05-16 17:21 ` Steven Sistare
0 siblings, 1 reply; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-16 16:20 UTC (permalink / raw)
To: Steven Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/15/25 21:06, Steven Sistare wrote:
> On 5/15/2025 3:54 AM, Cédric Le Goater wrote:
>> On 5/12/25 17:32, Steve Sistare wrote:
>>> Register a legacy container for cpr-transfer, replacing the generic CPR
>>> register call with a more specific legacy container register call. Add a
>>> blocker if the kernel does not support VFIO_UPDATE_VADDR or VFIO_UNMAP_ALL.
>>>
>>> This is mostly boilerplate. The fields to be saved and restored are added
>>> in subsequent patches.
>>>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> ---
>>> hw/vfio/container.c | 6 ++--
>>> hw/vfio/cpr-legacy.c | 70 ++++++++++++++++++++++++++++++++++++++++
>>> hw/vfio/cpr.c | 5 ++-
>>> hw/vfio/meson.build | 1 +
>>> include/hw/vfio/vfio-container.h | 2 ++
>>> include/hw/vfio/vfio-cpr.h | 14 ++++++++
>>> 6 files changed, 92 insertions(+), 6 deletions(-)
>>> create mode 100644 hw/vfio/cpr-legacy.c
>>>
>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>> index eb56f00..85c76da 100644
>>> --- a/hw/vfio/container.c
>>> +++ b/hw/vfio/container.c
>>> @@ -642,7 +642,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>>> new_container = true;
>>> bcontainer = &container->bcontainer;
>>> - if (!vfio_cpr_register_container(bcontainer, errp)) {
>>> + if (!vfio_legacy_cpr_register_container(container, errp)) {
>>> goto fail;
>>> }
>>> @@ -678,7 +678,7 @@ fail:
>>> vioc->release(bcontainer);
>>> }
>>> if (new_container) {
>>> - vfio_cpr_unregister_container(bcontainer);
>>> + vfio_legacy_cpr_unregister_container(container);
>>> object_unref(container);
>>> }
>>> if (fd >= 0) {
>>> @@ -719,7 +719,7 @@ static void vfio_container_disconnect(VFIOGroup *group)
>>> VFIOAddressSpace *space = bcontainer->space;
>>> trace_vfio_container_disconnect(container->fd);
>>> - vfio_cpr_unregister_container(bcontainer);
>>> + vfio_legacy_cpr_unregister_container(container);
>>> close(container->fd);
>>> object_unref(container);
>>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>>> new file mode 100644
>>> index 0000000..fac323c
>>> --- /dev/null
>>> +++ b/hw/vfio/cpr-legacy.c
>>> @@ -0,0 +1,70 @@
>>> +/*
>>> + * Copyright (c) 2021-2025 Oracle and/or its affiliates.
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>> + * See the COPYING file in the top-level directory.
>>
>> Please add an SPDX-License-Identifier tag.
>
> Sure. I'll do the same for my other new files.
and remove the license boilerplate too, please.

A newer version of checkpatch will complain with:

ERROR: New file 'hw/vfio/cpr-legacy.c' requires 'SPDX-License-Identifier'
ERROR: New file 'hw/vfio/cpr-legacy.c' must not have license boilerplate header text unless this file is copied from existing code with such text already present.
WARNING: added, moved or deleted file(s):

  hw/vfio/cpr-legacy.c

Does MAINTAINERS need updating?

total: 2 errors, 1 warnings, 152 lines checked
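
For illustration, a file header that should satisfy both checks could look
like the sketch below. It assumes the existing "GNU GPL, version 2 or later"
terms correspond to the SPDX identifier GPL-2.0-or-later, and it keeps the
copyright line from the posted patch. The final form is the author's call.

```c
/*
 * Copyright (c) 2021-2025 Oracle and/or its affiliates.
 *
 * SPDX-License-Identifier: GPL-2.0-or-later
 */
```

The tag replaces the two "This work is licensed under..." lines rather than
being added alongside them.
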
Thanks,
C.
>
>>> + */
>>> +
>>> +#include <sys/ioctl.h>
>>> +#include <linux/vfio.h>
>>> +#include "qemu/osdep.h"
>>> +#include "hw/vfio/vfio-container.h"
>>> +#include "hw/vfio/vfio-cpr.h"
>>> +#include "migration/blocker.h"
>>> +#include "migration/cpr.h"
>>> +#include "migration/migration.h"
>>> +#include "migration/vmstate.h"
>>> +#include "qapi/error.h"
>>> +
>>> +static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
>>> +{
>>> + if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) {
>>> + error_setg(errp, "VFIO container does not support VFIO_UPDATE_VADDR");
>>> + return false;
>>> +
>>> + } else if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UNMAP_ALL)) {
>>> + error_setg(errp, "VFIO container does not support VFIO_UNMAP_ALL");
>>> + return false;
>>> +
>>> + } else {
>>> + return true;
>>> + }
>>> +}
>>> +
>>> +static const VMStateDescription vfio_container_vmstate = {
>>> + .name = "vfio-container",
>>> + .version_id = 0,
>>> + .minimum_version_id = 0,
>>> + .needed = cpr_needed_for_reuse,
>>> + .fields = (VMStateField[]) {
>>> + VMSTATE_END_OF_LIST()
>>> + }
>>> +};
>>> +
>>> +bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>>> +{
>>> + VFIOContainerBase *bcontainer = &container->bcontainer;
>>> + Error **cpr_blocker = &container->cpr.blocker;
>>> +
>>> + migration_add_notifier_mode(&bcontainer->cpr_reboot_notifier,
>>> + vfio_cpr_reboot_notifier,
>>> + MIG_MODE_CPR_REBOOT);
>>> +
>>> + if (!vfio_cpr_supported(container, cpr_blocker)) {
>>> + return migrate_add_blocker_modes(cpr_blocker, errp,
>>> + MIG_MODE_CPR_TRANSFER, -1) == 0;
>>> + }
>>> +
>>> + vmstate_register(NULL, -1, &vfio_container_vmstate, container);
>>> +
>>> + return true;
>>> +}
>>> +
>>> +void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
>>> +{
>>> + VFIOContainerBase *bcontainer = &container->bcontainer;
>>> +
>>> + migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
>>> + migrate_del_blocker(&container->cpr.blocker);
>>> + vmstate_unregister(NULL, &vfio_container_vmstate, container);
>>> +}
>>> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
>>> index 0210e76..0e59612 100644
>>> --- a/hw/vfio/cpr.c
>>> +++ b/hw/vfio/cpr.c
>>> @@ -7,13 +7,12 @@
>>> #include "qemu/osdep.h"
>>> #include "hw/vfio/vfio-device.h"
>>> -#include "migration/misc.h"
>>> #include "hw/vfio/vfio-cpr.h"
>>> #include "qapi/error.h"
>>> #include "system/runstate.h"
>>> -static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
>>> - MigrationEvent *e, Error **errp)
>>> +int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
>>> + MigrationEvent *e, Error **errp)
>>> {
>>> if (e->type == MIG_EVENT_PRECOPY_SETUP &&
>>> !runstate_check(RUN_STATE_SUSPENDED) && !vm_get_suspended()) {
>>> diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
>>> index bccb050..73d29f9 100644
>>> --- a/hw/vfio/meson.build
>>> +++ b/hw/vfio/meson.build
>>> @@ -21,6 +21,7 @@ system_ss.add(when: 'CONFIG_VFIO_XGMAC', if_true: files('calxeda-xgmac.c'))
>>> system_ss.add(when: 'CONFIG_VFIO_AMD_XGBE', if_true: files('amd-xgbe.c'))
>>> system_ss.add(when: 'CONFIG_VFIO', if_true: files(
>>> 'cpr.c',
>>> + 'cpr-legacy.c',
>>> 'device.c',
>>> 'migration.c',
>>> 'migration-multifd.c',
>>> diff --git a/include/hw/vfio/vfio-container.h b/include/hw/vfio/vfio-container.h
>>> index afc498d..21e5807 100644
>>> --- a/include/hw/vfio/vfio-container.h
>>> +++ b/include/hw/vfio/vfio-container.h
>>> @@ -10,6 +10,7 @@
>>> #define HW_VFIO_CONTAINER_H
>>> #include "hw/vfio/vfio-container-base.h"
>>> +#include "hw/vfio/vfio-cpr.h"
>>> typedef struct VFIOContainer VFIOContainer;
>>> typedef struct VFIODevice VFIODevice;
>>> @@ -29,6 +30,7 @@ typedef struct VFIOContainer {
>>> int fd; /* /dev/vfio/vfio, empowered by the attached groups */
>>> unsigned iommu_type;
>>> QLIST_HEAD(, VFIOGroup) group_list;
>>> + VFIOContainerCPR cpr;
>>> } VFIOContainer;
>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOContainer, VFIO_IOMMU_LEGACY);
>>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>>> index 750ea5b..f864547 100644
>>> --- a/include/hw/vfio/vfio-cpr.h
>>> +++ b/include/hw/vfio/vfio-cpr.h
>>> @@ -9,8 +9,22 @@
>>> #ifndef HW_VFIO_VFIO_CPR_H
>>> #define HW_VFIO_VFIO_CPR_H
>>> +#include "migration/misc.h"
>>> +
>>> +typedef struct VFIOContainerCPR {
>>> + Error *blocker;
>>> +} VFIOContainerCPR;
>>> +
>>> +struct VFIOContainer;
>>> struct VFIOContainerBase;
>>> +bool vfio_legacy_cpr_register_container(struct VFIOContainer *container,
>>> + Error **errp);
>>> +void vfio_legacy_cpr_unregister_container(struct VFIOContainer *container);
>>> +
>>> +int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier, MigrationEvent *e,
>>> + Error **errp);
>>> +
>>> bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
>>> Error **errp);
>>> void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
>>
>> what about vfio_cpr_un/register_container ? Shouldn't we remove them ?
>
> At this patch in the series, those are still used by iommufd containers.
> Those uses are removed in "vfio/iommufd: register container for cpr", and
> vfio_cpr_un/register_container are deleted by the last patch in the series.
>
> - Steve
>
* Re: [PATCH V3 06/42] vfio/container: register container for cpr
2025-05-16 16:20 ` Cédric Le Goater
@ 2025-05-16 17:21 ` Steven Sistare
0 siblings, 0 replies; 157+ messages in thread
From: Steven Sistare @ 2025-05-16 17:21 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/16/2025 12:20 PM, Cédric Le Goater wrote:
> On 5/15/25 21:06, Steven Sistare wrote:
>> On 5/15/2025 3:54 AM, Cédric Le Goater wrote:
>>> On 5/12/25 17:32, Steve Sistare wrote:
>>>> Register a legacy container for cpr-transfer, replacing the generic CPR
>>>> register call with a more specific legacy container register call. Add a
>>>> blocker if the kernel does not support VFIO_UPDATE_VADDR or VFIO_UNMAP_ALL.
>>>>
>>>> This is mostly boilerplate. The fields to be saved and restored are added
>>>> in subsequent patches.
>>>>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>> ---
>>>> hw/vfio/container.c | 6 ++--
>>>> hw/vfio/cpr-legacy.c | 70 ++++++++++++++++++++++++++++++++++++++++
>>>> hw/vfio/cpr.c | 5 ++-
>>>> hw/vfio/meson.build | 1 +
>>>> include/hw/vfio/vfio-container.h | 2 ++
>>>> include/hw/vfio/vfio-cpr.h | 14 ++++++++
>>>> 6 files changed, 92 insertions(+), 6 deletions(-)
>>>> create mode 100644 hw/vfio/cpr-legacy.c
>>>>
>>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>>> index eb56f00..85c76da 100644
>>>> --- a/hw/vfio/container.c
>>>> +++ b/hw/vfio/container.c
>>>> @@ -642,7 +642,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>>>> new_container = true;
>>>> bcontainer = &container->bcontainer;
>>>> - if (!vfio_cpr_register_container(bcontainer, errp)) {
>>>> + if (!vfio_legacy_cpr_register_container(container, errp)) {
>>>> goto fail;
>>>> }
>>>> @@ -678,7 +678,7 @@ fail:
>>>> vioc->release(bcontainer);
>>>> }
>>>> if (new_container) {
>>>> - vfio_cpr_unregister_container(bcontainer);
>>>> + vfio_legacy_cpr_unregister_container(container);
>>>> object_unref(container);
>>>> }
>>>> if (fd >= 0) {
>>>> @@ -719,7 +719,7 @@ static void vfio_container_disconnect(VFIOGroup *group)
>>>> VFIOAddressSpace *space = bcontainer->space;
>>>> trace_vfio_container_disconnect(container->fd);
>>>> - vfio_cpr_unregister_container(bcontainer);
>>>> + vfio_legacy_cpr_unregister_container(container);
>>>> close(container->fd);
>>>> object_unref(container);
>>>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>>>> new file mode 100644
>>>> index 0000000..fac323c
>>>> --- /dev/null
>>>> +++ b/hw/vfio/cpr-legacy.c
>>>> @@ -0,0 +1,70 @@
>>>> +/*
>>>> + * Copyright (c) 2021-2025 Oracle and/or its affiliates.
>>>> + *
>>>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>>> + * See the COPYING file in the top-level directory.
>>>
>>> Please add an SPDX-License-Identifier tag.
>>
>> Sure. I'll do the same for my other new files.
>
> and remove the License boiler plate too please.
Yes. I understood you wanted me to replace one with the other.
- Steve
>
> A newer version of checkpatch will complain with :
>
> ERROR: New file 'hw/vfio/cpr-legacy.c' requires 'SPDX-License-Identifier'
> ERROR: New file 'hw/vfio/cpr-legacy.c' must not have license boilerplate header text unless this file is copied from existing code with such text already present.
> WARNING: added, moved or deleted file(s):
>
> hw/vfio/cpr-legacy.c
>
> Does MAINTAINERS need updating?
>
> total: 2 errors, 1 warnings, 152 lines checked
>
>
> Thanks,
>
> C.
>
>
>>
>>>> + */
>>>> +
>>>> +#include <sys/ioctl.h>
>>>> +#include <linux/vfio.h>
>>>> +#include "qemu/osdep.h"
>>>> +#include "hw/vfio/vfio-container.h"
>>>> +#include "hw/vfio/vfio-cpr.h"
>>>> +#include "migration/blocker.h"
>>>> +#include "migration/cpr.h"
>>>> +#include "migration/migration.h"
>>>> +#include "migration/vmstate.h"
>>>> +#include "qapi/error.h"
>>>> +
>>>> +static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
>>>> +{
>>>> + if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) {
>>>> + error_setg(errp, "VFIO container does not support VFIO_UPDATE_VADDR");
>>>> + return false;
>>>> +
>>>> + } else if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UNMAP_ALL)) {
>>>> + error_setg(errp, "VFIO container does not support VFIO_UNMAP_ALL");
>>>> + return false;
>>>> +
>>>> + } else {
>>>> + return true;
>>>> + }
>>>> +}
>>>> +
>>>> +static const VMStateDescription vfio_container_vmstate = {
>>>> + .name = "vfio-container",
>>>> + .version_id = 0,
>>>> + .minimum_version_id = 0,
>>>> + .needed = cpr_needed_for_reuse,
>>>> + .fields = (VMStateField[]) {
>>>> + VMSTATE_END_OF_LIST()
>>>> + }
>>>> +};
>>>> +
>>>> +bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>>>> +{
>>>> + VFIOContainerBase *bcontainer = &container->bcontainer;
>>>> + Error **cpr_blocker = &container->cpr.blocker;
>>>> +
>>>> + migration_add_notifier_mode(&bcontainer->cpr_reboot_notifier,
>>>> + vfio_cpr_reboot_notifier,
>>>> + MIG_MODE_CPR_REBOOT);
>>>> +
>>>> + if (!vfio_cpr_supported(container, cpr_blocker)) {
>>>> + return migrate_add_blocker_modes(cpr_blocker, errp,
>>>> + MIG_MODE_CPR_TRANSFER, -1) == 0;
>>>> + }
>>>> +
>>>> + vmstate_register(NULL, -1, &vfio_container_vmstate, container);
>>>> +
>>>> + return true;
>>>> +}
>>>> +
>>>> +void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
>>>> +{
>>>> + VFIOContainerBase *bcontainer = &container->bcontainer;
>>>> +
>>>> + migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
>>>> + migrate_del_blocker(&container->cpr.blocker);
>>>> + vmstate_unregister(NULL, &vfio_container_vmstate, container);
>>>> +}
>>>> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
>>>> index 0210e76..0e59612 100644
>>>> --- a/hw/vfio/cpr.c
>>>> +++ b/hw/vfio/cpr.c
>>>> @@ -7,13 +7,12 @@
>>>> #include "qemu/osdep.h"
>>>> #include "hw/vfio/vfio-device.h"
>>>> -#include "migration/misc.h"
>>>> #include "hw/vfio/vfio-cpr.h"
>>>> #include "qapi/error.h"
>>>> #include "system/runstate.h"
>>>> -static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
>>>> - MigrationEvent *e, Error **errp)
>>>> +int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
>>>> + MigrationEvent *e, Error **errp)
>>>> {
>>>> if (e->type == MIG_EVENT_PRECOPY_SETUP &&
>>>> !runstate_check(RUN_STATE_SUSPENDED) && !vm_get_suspended()) {
>>>> diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
>>>> index bccb050..73d29f9 100644
>>>> --- a/hw/vfio/meson.build
>>>> +++ b/hw/vfio/meson.build
>>>> @@ -21,6 +21,7 @@ system_ss.add(when: 'CONFIG_VFIO_XGMAC', if_true: files('calxeda-xgmac.c'))
>>>> system_ss.add(when: 'CONFIG_VFIO_AMD_XGBE', if_true: files('amd-xgbe.c'))
>>>> system_ss.add(when: 'CONFIG_VFIO', if_true: files(
>>>> 'cpr.c',
>>>> + 'cpr-legacy.c',
>>>> 'device.c',
>>>> 'migration.c',
>>>> 'migration-multifd.c',
>>>> diff --git a/include/hw/vfio/vfio-container.h b/include/hw/vfio/vfio-container.h
>>>> index afc498d..21e5807 100644
>>>> --- a/include/hw/vfio/vfio-container.h
>>>> +++ b/include/hw/vfio/vfio-container.h
>>>> @@ -10,6 +10,7 @@
>>>> #define HW_VFIO_CONTAINER_H
>>>> #include "hw/vfio/vfio-container-base.h"
>>>> +#include "hw/vfio/vfio-cpr.h"
>>>> typedef struct VFIOContainer VFIOContainer;
>>>> typedef struct VFIODevice VFIODevice;
>>>> @@ -29,6 +30,7 @@ typedef struct VFIOContainer {
>>>> int fd; /* /dev/vfio/vfio, empowered by the attached groups */
>>>> unsigned iommu_type;
>>>> QLIST_HEAD(, VFIOGroup) group_list;
>>>> + VFIOContainerCPR cpr;
>>>> } VFIOContainer;
>>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOContainer, VFIO_IOMMU_LEGACY);
>>>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>>>> index 750ea5b..f864547 100644
>>>> --- a/include/hw/vfio/vfio-cpr.h
>>>> +++ b/include/hw/vfio/vfio-cpr.h
>>>> @@ -9,8 +9,22 @@
>>>> #ifndef HW_VFIO_VFIO_CPR_H
>>>> #define HW_VFIO_VFIO_CPR_H
>>>> +#include "migration/misc.h"
>>>> +
>>>> +typedef struct VFIOContainerCPR {
>>>> + Error *blocker;
>>>> +} VFIOContainerCPR;
>>>> +
>>>> +struct VFIOContainer;
>>>> struct VFIOContainerBase;
>>>> +bool vfio_legacy_cpr_register_container(struct VFIOContainer *container,
>>>> + Error **errp);
>>>> +void vfio_legacy_cpr_unregister_container(struct VFIOContainer *container);
>>>> +
>>>> +int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier, MigrationEvent *e,
>>>> + Error **errp);
>>>> +
>>>> bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
>>>> Error **errp);
>>>> void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
>>>
>>> what about vfio_cpr_un/register_container ? Shouldn't we remove them ?
>>
>> At this patch in the series, those are still used by iommufd containers.
>> Those uses are removed in "vfio/iommufd: register container for cpr", and
>> vfio_cpr_un/register_container are deleted by the last patch in the series.
>>
>> - Steve
>>
>
* [PATCH V3 07/42] vfio/container: preserve descriptors
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
` (5 preceding siblings ...)
2025-05-12 15:32 ` [PATCH V3 06/42] vfio/container: register container for cpr Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-15 12:59 ` Cédric Le Goater
2025-05-22 13:51 ` Cédric Le Goater
2025-05-12 15:32 ` [PATCH V3 08/42] vfio/container: export vfio_legacy_dma_map Steve Sistare
` (35 subsequent siblings)
42 siblings, 2 replies; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
At vfio creation time, save the values of the vfio container, group, and
device descriptors in CPR state. On QEMU restart, vfio_realize() finds and
uses the saved descriptors, and remembers the reused status for subsequent
patches. The reused status is cleared when vmstate load finishes.

During reuse, device and IOMMU state is already configured, so operations
in vfio_realize() that would modify the configuration, such as vfio ioctls,
are skipped. The result is that vfio_realize() constructs QEMU data
structures that reflect the current state of the device.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
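The save-then-reuse flow described above can be sketched with a toy model.
This is only an illustration, not the QEMU implementation: every toy_* name
is hypothetical, the table is a fixed-size array, and the freshly opened
descriptor is passed in as a parameter instead of being opened on demand.
The key string mirrors the one this patch passes to cpr_save_fd().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical names, not the QEMU CPR code. */
#define TOY_MAX_FDS 16

typedef struct ToyFdEntry {
    const char *name;   /* key part 1, e.g. "vfio_container_for_group" */
    int id;             /* key part 2, e.g. a group id */
    int fd;             /* the preserved descriptor value */
} ToyFdEntry;

static ToyFdEntry toy_fds[TOY_MAX_FDS];
static size_t toy_nfds;

/* Return the saved descriptor for (name, id), or -1 if none was saved. */
static int toy_find_fd(const char *name, int id)
{
    assert(name != NULL);
    for (size_t i = 0; i < toy_nfds; i++) {
        if (toy_fds[i].id == id && strcmp(toy_fds[i].name, name) == 0) {
            return toy_fds[i].fd;
        }
    }
    return -1;
}

/* Record a descriptor; returns false if the table is full. */
static bool toy_save_fd(const char *name, int id, int fd)
{
    assert(name != NULL);
    if (toy_nfds == TOY_MAX_FDS) {
        return false;
    }
    toy_fds[toy_nfds++] = (ToyFdEntry){ .name = name, .id = id, .fd = fd };
    return true;
}

/*
 * Get the container descriptor for a group: reuse a saved one if present,
 * otherwise take the freshly opened one and save it.  *reused tells the
 * caller whether kernel-side configuration must be skipped.
 */
static int toy_get_container_fd(int groupid, int fresh_fd, bool *reused)
{
    int fd = toy_find_fd("vfio_container_for_group", groupid);

    *reused = (fd >= 0);
    if (!*reused) {
        fd = fresh_fd;
        toy_save_fd("vfio_container_for_group", groupid, fd);
    }
    return fd;
}
```

On a first start the lookup misses, so the caller configures the container
and the descriptor is saved. After a restart that inherits the table, the
lookup hits and the caller sees reused == true, which is the signal to skip
the configuring ioctls.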
hw/vfio/container.c | 65 ++++++++++++++++++++++++++++++++++++-------
hw/vfio/cpr-legacy.c | 46 ++++++++++++++++++++++++++++++
include/hw/vfio/vfio-cpr.h | 9 ++++++
include/hw/vfio/vfio-device.h | 2 ++
4 files changed, 112 insertions(+), 10 deletions(-)
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 85c76da..278a220 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -31,6 +31,8 @@
#include "system/reset.h"
#include "trace.h"
#include "qapi/error.h"
+#include "migration/cpr.h"
+#include "migration/blocker.h"
#include "pci.h"
#include "hw/vfio/vfio-container.h"
#include "hw/vfio/vfio-cpr.h"
@@ -414,7 +416,7 @@ static bool vfio_set_iommu(int container_fd, int group_fd,
}
static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
- Error **errp)
+ bool cpr_reused, Error **errp)
{
int iommu_type;
const char *vioc_name;
@@ -425,7 +427,11 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
return NULL;
}
- if (!vfio_set_iommu(fd, group->fd, &iommu_type, errp)) {
+ /*
+ * If container is reused, just set its type and skip the ioctls, as the
+ * container and group are already configured in the kernel.
+ */
+ if (!cpr_reused && !vfio_set_iommu(fd, group->fd, &iommu_type, errp)) {
return NULL;
}
@@ -433,6 +439,7 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
container = VFIO_IOMMU_LEGACY(object_new(vioc_name));
container->fd = fd;
+ container->cpr.reused = cpr_reused;
container->iommu_type = iommu_type;
return container;
}
@@ -584,7 +591,7 @@ static bool vfio_container_attach_discard_disable(VFIOContainer *container,
}
static bool vfio_container_group_add(VFIOContainer *container, VFIOGroup *group,
- Error **errp)
+ bool cpr_reused, Error **errp)
{
if (!vfio_container_attach_discard_disable(container, group, errp)) {
return false;
@@ -592,6 +599,9 @@ static bool vfio_container_group_add(VFIOContainer *container, VFIOGroup *group,
group->container = container;
QLIST_INSERT_HEAD(&container->group_list, group, container_next);
vfio_group_add_kvm_device(group);
+ if (!cpr_reused) {
+ cpr_save_fd("vfio_container_for_group", group->groupid, container->fd);
+ }
return true;
}
@@ -601,6 +611,7 @@ static void vfio_container_group_del(VFIOContainer *container, VFIOGroup *group)
group->container = NULL;
vfio_group_del_kvm_device(group);
vfio_ram_block_discard_disable(container, false);
+ cpr_delete_fd("vfio_container_for_group", group->groupid);
}
static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
@@ -613,17 +624,37 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
VFIOIOMMUClass *vioc = NULL;
bool new_container = false;
bool group_was_added = false;
+ bool cpr_reused;
space = vfio_address_space_get(as);
+ fd = cpr_find_fd("vfio_container_for_group", group->groupid);
+ cpr_reused = (fd > 0);
+
+ /*
+ * If the container is reused, then the group is already attached in the
+ * kernel. If a container with matching fd is found, then update the
+ * userland group list and return. If not, then after the loop, create
+ * the container struct and group list.
+ */
QLIST_FOREACH(bcontainer, &space->containers, next) {
container = container_of(bcontainer, VFIOContainer, bcontainer);
- if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
- return vfio_container_group_add(container, group, errp);
+
+ if (cpr_reused) {
+ if (!vfio_cpr_container_match(container, group, &fd)) {
+ continue;
+ }
+ } else if (ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
+ continue;
}
+
+ return vfio_container_group_add(container, group, cpr_reused, errp);
+ }
+
+ if (!cpr_reused) {
+ fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
}
- fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
if (fd < 0) {
goto fail;
}
@@ -635,7 +666,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
goto fail;
}
- container = vfio_create_container(fd, group, errp);
+ container = vfio_create_container(fd, group, cpr_reused, errp);
if (!container) {
goto fail;
}
@@ -655,7 +686,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
vfio_address_space_insert(space, bcontainer);
- if (!vfio_container_group_add(container, group, errp)) {
+ if (!vfio_container_group_add(container, group, cpr_reused, errp)) {
goto fail;
}
group_was_added = true;
@@ -697,6 +728,7 @@ static void vfio_container_disconnect(VFIOGroup *group)
QLIST_REMOVE(group, container_next);
group->container = NULL;
+ cpr_delete_fd("vfio_container_for_group", group->groupid);
/*
* Explicitly release the listener first before unset container,
@@ -750,7 +782,7 @@ static VFIOGroup *vfio_group_get(int groupid, AddressSpace *as, Error **errp)
group = g_malloc0(sizeof(*group));
snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
- group->fd = qemu_open(path, O_RDWR, errp);
+ group->fd = cpr_open_fd(path, O_RDWR, "vfio_group", groupid, NULL, errp);
if (group->fd < 0) {
goto free_group_exit;
}
@@ -782,6 +814,7 @@ static VFIOGroup *vfio_group_get(int groupid, AddressSpace *as, Error **errp)
return group;
close_fd_exit:
+ cpr_delete_fd("vfio_group", groupid);
close(group->fd);
free_group_exit:
@@ -803,6 +836,7 @@ static void vfio_group_put(VFIOGroup *group)
vfio_container_disconnect(group);
QLIST_REMOVE(group, next);
trace_vfio_group_put(group->fd);
+ cpr_delete_fd("vfio_group", group->groupid);
close(group->fd);
g_free(group);
}
@@ -812,8 +846,14 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
{
g_autofree struct vfio_device_info *info = NULL;
int fd;
+ bool cpr_reused;
+
+ fd = cpr_find_fd(name, 0);
+ cpr_reused = (fd >= 0);
+ if (!cpr_reused) {
+ fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+ }
- fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
if (fd < 0) {
error_setg_errno(errp, errno, "error getting device from group %d",
group->groupid);
@@ -857,6 +897,10 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
vbasedev->group = group;
QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
+ vbasedev->cpr.reused = cpr_reused;
+ if (!cpr_reused) {
+ cpr_save_fd(name, 0, fd);
+ }
trace_vfio_device_get(name, info->flags, info->num_regions, info->num_irqs);
return true;
@@ -870,6 +914,7 @@ static void vfio_device_put(VFIODevice *vbasedev)
QLIST_REMOVE(vbasedev, next);
vbasedev->group = NULL;
trace_vfio_device_put(vbasedev->fd);
+ cpr_delete_fd(vbasedev->name, 0);
close(vbasedev->fd);
}
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
index fac323c..638a8e0 100644
--- a/hw/vfio/cpr-legacy.c
+++ b/hw/vfio/cpr-legacy.c
@@ -10,6 +10,7 @@
#include "qemu/osdep.h"
#include "hw/vfio/vfio-container.h"
#include "hw/vfio/vfio-cpr.h"
+#include "hw/vfio/vfio-device.h"
#include "migration/blocker.h"
#include "migration/cpr.h"
#include "migration/migration.h"
@@ -31,10 +32,27 @@ static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
}
}
+static int vfio_container_post_load(void *opaque, int version_id)
+{
+ VFIOContainer *container = opaque;
+ VFIOGroup *group;
+ VFIODevice *vbasedev;
+
+ container->cpr.reused = false;
+
+ QLIST_FOREACH(group, &container->group_list, container_next) {
+ QLIST_FOREACH(vbasedev, &group->device_list, next) {
+ vbasedev->cpr.reused = false;
+ }
+ }
+ return 0;
+}
+
static const VMStateDescription vfio_container_vmstate = {
.name = "vfio-container",
.version_id = 0,
.minimum_version_id = 0,
+ .post_load = vfio_container_post_load,
.needed = cpr_needed_for_reuse,
.fields = (VMStateField[]) {
VMSTATE_END_OF_LIST()
@@ -68,3 +86,31 @@ void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
migrate_del_blocker(&container->cpr.blocker);
vmstate_unregister(NULL, &vfio_container_vmstate, container);
}
+
+static bool same_device(int fd1, int fd2)
+{
+ struct stat st1, st2;
+
+ return !fstat(fd1, &st1) && !fstat(fd2, &st2) && st1.st_dev == st2.st_dev;
+}
+
+bool vfio_cpr_container_match(VFIOContainer *container, VFIOGroup *group,
+ int *pfd)
+{
+ if (container->fd == *pfd) {
+ return true;
+ }
+ if (!same_device(container->fd, *pfd)) {
+ return false;
+ }
+ /*
+ * Same device, different fd. This occurs when the container fd is
+ * cpr_save'd multiple times, once for each groupid, so SCM_RIGHTS
+ * produces duplicates. De-dup it.
+ */
+ cpr_delete_fd("vfio_container_for_group", group->groupid);
+ close(*pfd);
+ cpr_save_fd("vfio_container_for_group", group->groupid, container->fd);
+ *pfd = container->fd;
+ return true;
+}
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index f864547..1c4f070 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -13,10 +13,16 @@
typedef struct VFIOContainerCPR {
Error *blocker;
+ bool reused;
} VFIOContainerCPR;
+typedef struct VFIODeviceCPR {
+ bool reused;
+} VFIODeviceCPR;
+
struct VFIOContainer;
struct VFIOContainerBase;
+struct VFIOGroup;
bool vfio_legacy_cpr_register_container(struct VFIOContainer *container,
Error **errp);
@@ -29,4 +35,7 @@ bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
Error **errp);
void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
+bool vfio_cpr_container_match(struct VFIOContainer *container,
+ struct VFIOGroup *group, int *fd);
+
#endif /* HW_VFIO_VFIO_CPR_H */
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 8bcb3c1..4e4d0b6 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -28,6 +28,7 @@
#endif
#include "system/system.h"
#include "hw/vfio/vfio-container-base.h"
+#include "hw/vfio/vfio-cpr.h"
#include "system/host_iommu_device.h"
#include "system/iommufd.h"
@@ -84,6 +85,7 @@ typedef struct VFIODevice {
VFIOIOASHwpt *hwpt;
QLIST_ENTRY(VFIODevice) hwpt_next;
struct vfio_region_info **reginfo;
+ VFIODeviceCPR cpr;
} VFIODevice;
struct VFIODeviceOps {
--
1.8.3.1
^ permalink raw reply related [flat|nested] 157+ messages in thread
* Re: [PATCH V3 07/42] vfio/container: preserve descriptors
2025-05-12 15:32 ` [PATCH V3 07/42] vfio/container: preserve descriptors Steve Sistare
@ 2025-05-15 12:59 ` Cédric Le Goater
2025-05-15 19:08 ` Steven Sistare
2025-05-22 13:51 ` Cédric Le Goater
1 sibling, 1 reply; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-15 12:59 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/12/25 17:32, Steve Sistare wrote:
> At vfio creation time, save the value of vfio container, group, and device
> descriptors in CPR state. On qemu restart, vfio_realize() finds and uses
> the saved descriptors, and remembers the reused status for subsequent
> patches. The reused status is cleared when vmstate load finishes.
>
> During reuse, device and iommu state is already configured, so operations
> in vfio_realize that would modify the configuration, such as vfio ioctl's,
> are skipped. The result is that vfio_realize constructs qemu data
> structures that reflect the current state of the device.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/vfio/container.c | 65 ++++++++++++++++++++++++++++++++++++-------
> hw/vfio/cpr-legacy.c | 46 ++++++++++++++++++++++++++++++
> include/hw/vfio/vfio-cpr.h | 9 ++++++
> include/hw/vfio/vfio-device.h | 2 ++
> 4 files changed, 112 insertions(+), 10 deletions(-)
>
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 85c76da..278a220 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -31,6 +31,8 @@
> #include "system/reset.h"
> #include "trace.h"
> #include "qapi/error.h"
> +#include "migration/cpr.h"
> +#include "migration/blocker.h"
> #include "pci.h"
> #include "hw/vfio/vfio-container.h"
> #include "hw/vfio/vfio-cpr.h"
> @@ -414,7 +416,7 @@ static bool vfio_set_iommu(int container_fd, int group_fd,
> }
>
> static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
> - Error **errp)
> + bool cpr_reused, Error **errp)
> {
> int iommu_type;
> const char *vioc_name;
> @@ -425,7 +427,11 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
> return NULL;
> }
>
> - if (!vfio_set_iommu(fd, group->fd, &iommu_type, errp)) {
> + /*
> + * If container is reused, just set its type and skip the ioctls, as the
> + * container and group are already configured in the kernel.
> + */
> + if (!cpr_reused && !vfio_set_iommu(fd, group->fd, &iommu_type, errp)) {
> return NULL;
> }
>
> @@ -433,6 +439,7 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
>
> container = VFIO_IOMMU_LEGACY(object_new(vioc_name));
> container->fd = fd;
> + container->cpr.reused = cpr_reused;
> container->iommu_type = iommu_type;
> return container;
> }
> @@ -584,7 +591,7 @@ static bool vfio_container_attach_discard_disable(VFIOContainer *container,
> }
>
> static bool vfio_container_group_add(VFIOContainer *container, VFIOGroup *group,
> - Error **errp)
> + bool cpr_reused, Error **errp)
> {
> if (!vfio_container_attach_discard_disable(container, group, errp)) {
> return false;
> @@ -592,6 +599,9 @@ static bool vfio_container_group_add(VFIOContainer *container, VFIOGroup *group,
> group->container = container;
> QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> vfio_group_add_kvm_device(group);
> + if (!cpr_reused) {
> + cpr_save_fd("vfio_container_for_group", group->groupid, container->fd);
> + }
Could we avoid the test on cpr_reused and always call cpr_save_fd()?
> return true;
> }
>
> @@ -601,6 +611,7 @@ static void vfio_container_group_del(VFIOContainer *container, VFIOGroup *group)
> group->container = NULL;
> vfio_group_del_kvm_device(group);
> vfio_ram_block_discard_disable(container, false);
> + cpr_delete_fd("vfio_container_for_group", group->groupid);
> }
>
> static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
> @@ -613,17 +624,37 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
> VFIOIOMMUClass *vioc = NULL;
> bool new_container = false;
> bool group_was_added = false;
> + bool cpr_reused;
>
> space = vfio_address_space_get(as);
> + fd = cpr_find_fd("vfio_container_for_group", group->groupid);
> + cpr_reused = (fd > 0);
The code above is doing two things: it grabs a restored fd and
deduces from the fd value that the VM is doing a CPR
reboot.
Instead of adding this cpr_reused flag, I would prefer to duplicate
the code into something like:
    if (!cpr_reboot) {
        QLIST_FOREACH(bcontainer, &space->containers, next) {
            container = container_of(bcontainer, VFIOContainer, bcontainer);
            if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
                return vfio_container_group_add(container, group, errp);
            }
        }

        fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
        if (fd < 0) {
            goto fail;
        }

        ret = ioctl(fd, VFIO_GET_API_VERSION);
        if (ret != VFIO_API_VERSION) {
            error_setg(errp, "supported vfio version: %d, "
                       "reported version: %d", VFIO_API_VERSION, ret);
            goto fail;
        }

        container = vfio_create_container(fd, group, errp);
    } else {
        /* ... */
    }
> + /*
> + * If the container is reused, then the group is already attached in the
> + * kernel. If a container with matching fd is found, then update the
> + * userland group list and return. If not, then after the loop, create
> + * the container struct and group list.
> + */
>
> QLIST_FOREACH(bcontainer, &space->containers, next) {
> container = container_of(bcontainer, VFIOContainer, bcontainer);
> - if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> - return vfio_container_group_add(container, group, errp);
> +
> + if (cpr_reused) {
> + if (!vfio_cpr_container_match(container, group, &fd)) {
Why do we need to modify fd?
> + continue;
> + }
> + } else if (ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> + continue;
> }
> + return vfio_container_group_add(container, group, cpr_reused, errp);
> + }
> +
> + if (!cpr_reused) {
> + fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
> }
>
> - fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
> if (fd < 0) {
> goto fail;
> }
> @@ -635,7 +666,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
> goto fail;
> }
>
> - container = vfio_create_container(fd, group, errp);
> + container = vfio_create_container(fd, group, cpr_reused, errp);
> if (!container) {
> goto fail;
> }
> @@ -655,7 +686,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>
> vfio_address_space_insert(space, bcontainer);
>
> - if (!vfio_container_group_add(container, group, errp)) {
> + if (!vfio_container_group_add(container, group, cpr_reused, errp)) {
> goto fail;
> }
> group_was_added = true;
> @@ -697,6 +728,7 @@ static void vfio_container_disconnect(VFIOGroup *group)
>
> QLIST_REMOVE(group, container_next);
> group->container = NULL;
> + cpr_delete_fd("vfio_container_for_group", group->groupid);
>
> /*
> * Explicitly release the listener first before unset container,
> @@ -750,7 +782,7 @@ static VFIOGroup *vfio_group_get(int groupid, AddressSpace *as, Error **errp)
> group = g_malloc0(sizeof(*group));
>
> snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
> - group->fd = qemu_open(path, O_RDWR, errp);
> + group->fd = cpr_open_fd(path, O_RDWR, "vfio_group", groupid, NULL, errp);
> if (group->fd < 0) {
> goto free_group_exit;
> }
> @@ -782,6 +814,7 @@ static VFIOGroup *vfio_group_get(int groupid, AddressSpace *as, Error **errp)
> return group;
>
> close_fd_exit:
> + cpr_delete_fd("vfio_group", groupid);
> close(group->fd);
>
> free_group_exit:
> @@ -803,6 +836,7 @@ static void vfio_group_put(VFIOGroup *group)
> vfio_container_disconnect(group);
> QLIST_REMOVE(group, next);
> trace_vfio_group_put(group->fd);
> + cpr_delete_fd("vfio_group", group->groupid);
> close(group->fd);
> g_free(group);
> }
> @@ -812,8 +846,14 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
> {
> g_autofree struct vfio_device_info *info = NULL;
> int fd;
> + bool cpr_reused;
> +
> + fd = cpr_find_fd(name, 0);
> + cpr_reused = (fd >= 0);
> + if (!cpr_reused) {
> + fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
> + }
>
Could we introduce a helper routine to open this file, like we have
cpr_open_fd()?
> - fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
> if (fd < 0) {
> error_setg_errno(errp, errno, "error getting device from group %d",
> group->groupid);
> @@ -857,6 +897,10 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
> vbasedev->group = group;
> QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
>
> + vbasedev->cpr.reused = cpr_reused;
> + if (!cpr_reused) {
> + cpr_save_fd(name, 0, fd);
Could we avoid the test on cpr_reused and always call cpr_save_fd()?
> + }
> trace_vfio_device_get(name, info->flags, info->num_regions, info->num_irqs);
>
> return true;
> @@ -870,6 +914,7 @@ static void vfio_device_put(VFIODevice *vbasedev)
> QLIST_REMOVE(vbasedev, next);
> vbasedev->group = NULL;
> trace_vfio_device_put(vbasedev->fd);
> + cpr_delete_fd(vbasedev->name, 0);
> close(vbasedev->fd);
> }
>
> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
> index fac323c..638a8e0 100644
> --- a/hw/vfio/cpr-legacy.c
> +++ b/hw/vfio/cpr-legacy.c
> @@ -10,6 +10,7 @@
> #include "qemu/osdep.h"
> #include "hw/vfio/vfio-container.h"
> #include "hw/vfio/vfio-cpr.h"
> +#include "hw/vfio/vfio-device.h"
> #include "migration/blocker.h"
> #include "migration/cpr.h"
> #include "migration/migration.h"
> @@ -31,10 +32,27 @@ static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
> }
> }
>
> +static int vfio_container_post_load(void *opaque, int version_id)
> +{
> + VFIOContainer *container = opaque;
> + VFIOGroup *group;
> + VFIODevice *vbasedev;
> +
> + container->cpr.reused = false;
> +
> + QLIST_FOREACH(group, &container->group_list, container_next) {
> + QLIST_FOREACH(vbasedev, &group->device_list, next) {
> + vbasedev->cpr.reused = false;
> + }
> + }
> + return 0;
> +}
> +
> static const VMStateDescription vfio_container_vmstate = {
> .name = "vfio-container",
> .version_id = 0,
> .minimum_version_id = 0,
> + .post_load = vfio_container_post_load,
> .needed = cpr_needed_for_reuse,
> .fields = (VMStateField[]) {
> VMSTATE_END_OF_LIST()
> @@ -68,3 +86,31 @@ void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
> migrate_del_blocker(&container->cpr.blocker);
> vmstate_unregister(NULL, &vfio_container_vmstate, container);
> }
> +
> +static bool same_device(int fd1, int fd2)
> +{
> + struct stat st1, st2;
> +
> + return !fstat(fd1, &st1) && !fstat(fd2, &st2) && st1.st_dev == st2.st_dev;
> +}
> +
> +bool vfio_cpr_container_match(VFIOContainer *container, VFIOGroup *group,
> + int *pfd)
> +{
> + if (container->fd == *pfd) {
> + return true;
> + }
> + if (!same_device(container->fd, *pfd)) {
> + return false;
> + }
> + /*
> + * Same device, different fd. This occurs when the container fd is
> + * cpr_save'd multiple times, once for each groupid, so SCM_RIGHTS
> + * produces duplicates. De-dup it.
> + */
> + cpr_delete_fd("vfio_container_for_group", group->groupid);
> + close(*pfd);
> + cpr_save_fd("vfio_container_for_group", group->groupid, container->fd);
> + *pfd = container->fd;
I am not sure 'pfd' is used afterwards. Is it?
> + return true;
> +}
> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
> index f864547..1c4f070 100644
> --- a/include/hw/vfio/vfio-cpr.h
> +++ b/include/hw/vfio/vfio-cpr.h
> @@ -13,10 +13,16 @@
>
> typedef struct VFIOContainerCPR {
> Error *blocker;
> + bool reused;
> } VFIOContainerCPR;
>
> +typedef struct VFIODeviceCPR {
> + bool reused;
> +} VFIODeviceCPR;
> +
> struct VFIOContainer;
> struct VFIOContainerBase;
> +struct VFIOGroup;
>
> bool vfio_legacy_cpr_register_container(struct VFIOContainer *container,
> Error **errp);
> @@ -29,4 +35,7 @@ bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
> Error **errp);
> void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
>
> +bool vfio_cpr_container_match(struct VFIOContainer *container,
> + struct VFIOGroup *group, int *fd);
> +
> #endif /* HW_VFIO_VFIO_CPR_H */
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index 8bcb3c1..4e4d0b6 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -28,6 +28,7 @@
> #endif
> #include "system/system.h"
> #include "hw/vfio/vfio-container-base.h"
> +#include "hw/vfio/vfio-cpr.h"
> #include "system/host_iommu_device.h"
> #include "system/iommufd.h"
>
> @@ -84,6 +85,7 @@ typedef struct VFIODevice {
> VFIOIOASHwpt *hwpt;
> QLIST_ENTRY(VFIODevice) hwpt_next;
> struct vfio_region_info **reginfo;
> + VFIODeviceCPR cpr;
> } VFIODevice;
>
> struct VFIODeviceOps {
Thanks,
C.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: [PATCH V3 07/42] vfio/container: preserve descriptors
2025-05-15 12:59 ` Cédric Le Goater
@ 2025-05-15 19:08 ` Steven Sistare
2025-05-19 13:20 ` Cédric Le Goater
0 siblings, 1 reply; 157+ messages in thread
From: Steven Sistare @ 2025-05-15 19:08 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/15/2025 8:59 AM, Cédric Le Goater wrote:
> On 5/12/25 17:32, Steve Sistare wrote:
>> At vfio creation time, save the value of vfio container, group, and device
>> descriptors in CPR state. On qemu restart, vfio_realize() finds and uses
>> the saved descriptors, and remembers the reused status for subsequent
>> patches. The reused status is cleared when vmstate load finishes.
>>
>> During reuse, device and iommu state is already configured, so operations
>> in vfio_realize that would modify the configuration, such as vfio ioctl's,
>> are skipped. The result is that vfio_realize constructs qemu data
>> structures that reflect the current state of the device.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> hw/vfio/container.c | 65 ++++++++++++++++++++++++++++++++++++-------
>> hw/vfio/cpr-legacy.c | 46 ++++++++++++++++++++++++++++++
>> include/hw/vfio/vfio-cpr.h | 9 ++++++
>> include/hw/vfio/vfio-device.h | 2 ++
>> 4 files changed, 112 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index 85c76da..278a220 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -31,6 +31,8 @@
>> #include "system/reset.h"
>> #include "trace.h"
>> #include "qapi/error.h"
>> +#include "migration/cpr.h"
>> +#include "migration/blocker.h"
>> #include "pci.h"
>> #include "hw/vfio/vfio-container.h"
>> #include "hw/vfio/vfio-cpr.h"
>> @@ -414,7 +416,7 @@ static bool vfio_set_iommu(int container_fd, int group_fd,
>> }
>> static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
>> - Error **errp)
>> + bool cpr_reused, Error **errp)
>> {
>> int iommu_type;
>> const char *vioc_name;
>> @@ -425,7 +427,11 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
>> return NULL;
>> }
>> - if (!vfio_set_iommu(fd, group->fd, &iommu_type, errp)) {
>> + /*
>> + * If container is reused, just set its type and skip the ioctls, as the
>> + * container and group are already configured in the kernel.
>> + */
>> + if (!cpr_reused && !vfio_set_iommu(fd, group->fd, &iommu_type, errp)) {
>> return NULL;
>> }
>> @@ -433,6 +439,7 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
>> container = VFIO_IOMMU_LEGACY(object_new(vioc_name));
>> container->fd = fd;
>> + container->cpr.reused = cpr_reused;
>> container->iommu_type = iommu_type;
>> return container;
>> }
>> @@ -584,7 +591,7 @@ static bool vfio_container_attach_discard_disable(VFIOContainer *container,
>> }
>> static bool vfio_container_group_add(VFIOContainer *container, VFIOGroup *group,
>> - Error **errp)
>> + bool cpr_reused, Error **errp)
>> {
>> if (!vfio_container_attach_discard_disable(container, group, errp)) {
>> return false;
>> @@ -592,6 +599,9 @@ static bool vfio_container_group_add(VFIOContainer *container, VFIOGroup *group,
>> group->container = container;
>> QLIST_INSERT_HEAD(&container->group_list, group, container_next);
>> vfio_group_add_kvm_device(group);
>> + if (!cpr_reused) {
>> + cpr_save_fd("vfio_container_for_group", group->groupid, container->fd);
>> + }
>
> Could we avoid the test on cpr_reused and always call cpr_save_fd()?
No. If cpr_reused is true, then the fd is already on cpr's save list.
We don't want to save duplicates of the same entry.
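
To illustrate the duplicate concern, here is a minimal standalone sketch of a
keyed descriptor list. It is not QEMU's implementation: fd_list_save,
fd_list_find, fd_list_count and get_fd are hypothetical names, and the sketch
assumes that the save operation adds an entry unconditionally, without
checking whether the same (name, id) pair is already present.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a list of saved descriptors keyed by (name, id). */
typedef struct FdEntry {
    char *name;
    int id;
    int fd;
    struct FdEntry *next;
} FdEntry;

static FdEntry *fd_list;

/* Prepend unconditionally; no check for an existing (name, id) entry. */
static void fd_list_save(const char *name, int id, int fd)
{
    FdEntry *e = malloc(sizeof(*e));

    assert(e);
    e->name = malloc(strlen(name) + 1);
    assert(e->name);
    strcpy(e->name, name);
    e->id = id;
    e->fd = fd;
    e->next = fd_list;
    fd_list = e;
}

/* Return the saved fd for (name, id), or -1 if there is none. */
static int fd_list_find(const char *name, int id)
{
    for (FdEntry *e = fd_list; e; e = e->next) {
        if (e->id == id && !strcmp(e->name, name)) {
            return e->fd;
        }
    }
    return -1;
}

/* Count the entries that match (name, id). */
static int fd_list_count(const char *name, int id)
{
    int n = 0;

    for (FdEntry *e = fd_list; e; e = e->next) {
        if (e->id == id && !strcmp(e->name, name)) {
            n++;
        }
    }
    return n;
}

/*
 * Caller pattern: look for a saved fd first, and save only when the
 * descriptor was not reused, so the list keeps one entry per key.
 */
static int get_fd(const char *name, int id, int fresh_fd)
{
    int fd = fd_list_find(name, id);
    int reused = (fd >= 0);

    if (!reused) {
        fd = fresh_fd;
        fd_list_save(name, id, fd);
    }
    return fd;
}
```

With this pattern a second get_fd() call for the same key returns the saved
descriptor and leaves a single entry on the list, whereas an unconditional
fd_list_save() on the reuse path would add a second entry for that key.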
>> return true;
>> }
>> @@ -601,6 +611,7 @@ static void vfio_container_group_del(VFIOContainer *container, VFIOGroup *group)
>> group->container = NULL;
>> vfio_group_del_kvm_device(group);
>> vfio_ram_block_discard_disable(container, false);
>> + cpr_delete_fd("vfio_container_for_group", group->groupid);
>> }
>> static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>> @@ -613,17 +624,37 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>> VFIOIOMMUClass *vioc = NULL;
>> bool new_container = false;
>> bool group_was_added = false;
>> + bool cpr_reused;
>> space = vfio_address_space_get(as);
>> + fd = cpr_find_fd("vfio_container_for_group", group->groupid);
>> + cpr_reused = (fd > 0);
>
>
> The code above is doing two things: it grabs a restored fd and
> deduces from the fd value that the VM is doing a CPR
> reboot.
>
> Instead of adding this cpr_reused flag, I would prefer to duplicate
> the code into something like:
>
> if (!cpr_reboot) {
> QLIST_FOREACH(bcontainer, &space->containers, next) {
> container = container_of(bcontainer, VFIOContainer, bcontainer);
> if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> return vfio_container_group_add(container, group, errp);
> }
> }
>
> fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
> if (fd < 0) {
> goto fail;
> }
>
> ret = ioctl(fd, VFIO_GET_API_VERSION);
> if (ret != VFIO_API_VERSION) {
> error_setg(errp, "supported vfio version: %d, "
> "reported version: %d", VFIO_API_VERSION, ret);
> goto fail;
> }
>
> container = vfio_create_container(fd, group, errp);
> } else {
> /* ... */
> }
>
OK, but there is no sense in duplicating the identical code for
VFIO_GET_API_VERSION and vfio_create_container. If you want me to
simplify the loop, I suggest:
    if (!cpr_reused) {
        QLIST_FOREACH(bcontainer, &space->containers, next) {
            container = container_of(bcontainer, VFIOContainer, bcontainer);
            if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
                return vfio_container_group_add(container, group, false, errp);
            }
        }

        fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
        if (fd < 0) {
            goto fail;
        }
    } else {
        QLIST_FOREACH(bcontainer, &space->containers, next) {
            container = container_of(bcontainer, VFIOContainer, bcontainer);
            if (vfio_cpr_container_match(container, group, &fd)) {
                return vfio_container_group_add(container, group, true, errp);
            }
        }
    }

    ret = ioctl(fd, VFIO_GET_API_VERSION);
    ...
>> + /*
>> + * If the container is reused, then the group is already attached in the
>> + * kernel. If a container with matching fd is found, then update the
>> + * userland group list and return. If not, then after the loop, create
>> + * the container struct and group list.
>> + */
>> QLIST_FOREACH(bcontainer, &space->containers, next) {
>> container = container_of(bcontainer, VFIOContainer, bcontainer);
>> - if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>> - return vfio_container_group_add(container, group, errp);
>> +
>> + if (cpr_reused) {
>> + if (!vfio_cpr_container_match(container, group, &fd)) {
>
> why do we need to modify fd ?
That is explained by the comments inside vfio_cpr_container_match, where the
explanation is more easily understood.
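
As a standalone illustration of that situation: a descriptor that is received
more than once shows up as distinct fd numbers that refer to the same
underlying file. The sketch below is only an approximation under stated
assumptions. A caller can use dup() as a stand-in for receiving a duplicate
over SCM_RIGHTS, same_file and dedup_fd are hypothetical names, and the
comparison checks both st_dev and st_ino, which is stricter than the
st_dev-only test in same_device() in the patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if both descriptors refer to the same file (same device and inode). */
static bool same_file(int fd1, int fd2)
{
    struct stat st1, st2;

    return !fstat(fd1, &st1) && !fstat(fd2, &st2) &&
           st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}

/*
 * If *pfd is a duplicate of canonical_fd, close the duplicate and update
 * *pfd so the caller continues with the canonical descriptor.
 * Returns true when *pfd matches canonical_fd, false otherwise.
 */
static bool dedup_fd(int canonical_fd, int *pfd)
{
    if (*pfd == canonical_fd) {
        return true;
    }
    if (!same_file(canonical_fd, *pfd)) {
        return false;
    }
    close(*pfd);
    *pfd = canonical_fd;
    return true;
}
```

The pointer argument exists so that, after a successful match, the caller's
variable no longer names the descriptor that was just closed.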
>> + continue;
>> + }
>> + } else if (ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>> + continue;
>> }
>> + return vfio_container_group_add(container, group, cpr_reused, errp);
>> + }
>> +
>> + if (!cpr_reused) {
>> + fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
>> }
>> - fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
>> if (fd < 0) {
>> goto fail;
>> }
>> @@ -635,7 +666,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>> goto fail;
>> }
>> - container = vfio_create_container(fd, group, errp);
>> + container = vfio_create_container(fd, group, cpr_reused, errp);
>> if (!container) {
>> goto fail;
>> }
>> @@ -655,7 +686,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>> vfio_address_space_insert(space, bcontainer);
>> - if (!vfio_container_group_add(container, group, errp)) {
>> + if (!vfio_container_group_add(container, group, cpr_reused, errp)) {
>> goto fail;
>> }
>> group_was_added = true;
>> @@ -697,6 +728,7 @@ static void vfio_container_disconnect(VFIOGroup *group)
>> QLIST_REMOVE(group, container_next);
>> group->container = NULL;
>> + cpr_delete_fd("vfio_container_for_group", group->groupid);
>> /*
>> * Explicitly release the listener first before unset container,
>> @@ -750,7 +782,7 @@ static VFIOGroup *vfio_group_get(int groupid, AddressSpace *as, Error **errp)
>> group = g_malloc0(sizeof(*group));
>> snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
>> - group->fd = qemu_open(path, O_RDWR, errp);
>> + group->fd = cpr_open_fd(path, O_RDWR, "vfio_group", groupid, NULL, errp);
>> if (group->fd < 0) {
>> goto free_group_exit;
>> }
>> @@ -782,6 +814,7 @@ static VFIOGroup *vfio_group_get(int groupid, AddressSpace *as, Error **errp)
>> return group;
>> close_fd_exit:
>> + cpr_delete_fd("vfio_group", groupid);
>> close(group->fd);
>> free_group_exit:
>> @@ -803,6 +836,7 @@ static void vfio_group_put(VFIOGroup *group)
>> vfio_container_disconnect(group);
>> QLIST_REMOVE(group, next);
>> trace_vfio_group_put(group->fd);
>> + cpr_delete_fd("vfio_group", group->groupid);
>> close(group->fd);
>> g_free(group);
>> }
>> @@ -812,8 +846,14 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
>> {
>> g_autofree struct vfio_device_info *info = NULL;
>> int fd;
>> + bool cpr_reused;
>> +
>> + fd = cpr_find_fd(name, 0);
>> + cpr_reused = (fd >= 0);
>> + if (!cpr_reused) {
>> + fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
>> + }
>
> Could we introduce a helper routine to open this file, like we have
> cpr_open_fd()?
OK, but this would be the only use of the helper, and it would bury
generic vfio functionality (VFIO_GROUP_GET_DEVICE_FD) inside a
cpr-flavored helper. IMO not an improvement.
>> - fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
>> if (fd < 0) {
>> error_setg_errno(errp, errno, "error getting device from group %d",
>> group->groupid);
>> @@ -857,6 +897,10 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
>> vbasedev->group = group;
>> QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
>> + vbasedev->cpr.reused = cpr_reused;
>> + if (!cpr_reused) {
>> + cpr_save_fd(name, 0, fd);
>
> Could we avoid the test on cpr_reused and always call cpr_save_fd() ?
No. Must avoid adding duplicate entries.
>> + }
>> trace_vfio_device_get(name, info->flags, info->num_regions, info->num_irqs);
>> return true;
>> @@ -870,6 +914,7 @@ static void vfio_device_put(VFIODevice *vbasedev)
>> QLIST_REMOVE(vbasedev, next);
>> vbasedev->group = NULL;
>> trace_vfio_device_put(vbasedev->fd);
>> + cpr_delete_fd(vbasedev->name, 0);
>> close(vbasedev->fd);
>> }
>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>> index fac323c..638a8e0 100644
>> --- a/hw/vfio/cpr-legacy.c
>> +++ b/hw/vfio/cpr-legacy.c
>> @@ -10,6 +10,7 @@
>> #include "qemu/osdep.h"
>> #include "hw/vfio/vfio-container.h"
>> #include "hw/vfio/vfio-cpr.h"
>> +#include "hw/vfio/vfio-device.h"
>> #include "migration/blocker.h"
>> #include "migration/cpr.h"
>> #include "migration/migration.h"
>> @@ -31,10 +32,27 @@ static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
>> }
>> }
>> +static int vfio_container_post_load(void *opaque, int version_id)
>> +{
>> + VFIOContainer *container = opaque;
>> + VFIOGroup *group;
>> + VFIODevice *vbasedev;
>> +
>> + container->cpr.reused = false;
>> +
>> + QLIST_FOREACH(group, &container->group_list, container_next) {
>> + QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> + vbasedev->cpr.reused = false;
>> + }
>> + }
>> + return 0;
>> +}
>> +
>> static const VMStateDescription vfio_container_vmstate = {
>> .name = "vfio-container",
>> .version_id = 0,
>> .minimum_version_id = 0,
>> + .post_load = vfio_container_post_load,
>> .needed = cpr_needed_for_reuse,
>> .fields = (VMStateField[]) {
>> VMSTATE_END_OF_LIST()
>> @@ -68,3 +86,31 @@ void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
>> migrate_del_blocker(&container->cpr.blocker);
>> vmstate_unregister(NULL, &vfio_container_vmstate, container);
>> }
>> +
>> +static bool same_device(int fd1, int fd2)
>> +{
>> + struct stat st1, st2;
>> +
>> + return !fstat(fd1, &st1) && !fstat(fd2, &st2) && st1.st_dev == st2.st_dev;
>> +}
>> +
>> +bool vfio_cpr_container_match(VFIOContainer *container, VFIOGroup *group,
>> + int *pfd)
>> +{
>> + if (container->fd == *pfd) {
>> + return true;
>> + }
>> + if (!same_device(container->fd, *pfd)) {
>> + return false;
>> + }
>> + /*
>> + * Same device, different fd. This occurs when the container fd is
>> + * cpr_save'd multiple times, once for each groupid, so SCM_RIGHTS
>> + * produces duplicates. De-dup it.
>> + */
>> + cpr_delete_fd("vfio_container_for_group", group->groupid);
>> + close(*pfd);
>> + cpr_save_fd("vfio_container_for_group", group->groupid, container->fd);
>> + *pfd = container->fd;
>
> I am not sure 'pfd' is used afterwards. Is it ?
True, good eye. I will change it to "int fd" and stop returning the new value.
- Steve
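
A standalone sketch of the de-dup step discussed above, for illustration only.
It is not the code from the patch: same_file() and dedup_fd() are hypothetical
names, and unlike same_device() in the patch, this version also compares
st_ino, which is an assumption added here so that two different files on the
same filesystem are told apart.

```c
#include <assert.h>
#include <fcntl.h>      /* open(), for callers such as the usage below */
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Return true if both descriptors refer to the same file.
 * Assumption: comparing st_ino as well as st_dev; the patch under
 * review compares st_dev only.
 */
static bool same_file(int fd1, int fd2)
{
    struct stat st1, st2;

    return !fstat(fd1, &st1) && !fstat(fd2, &st2) &&
           st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}

/*
 * If fd is a second descriptor for the file behind keep_fd, close the
 * duplicate and return keep_fd.  Otherwise return fd unchanged.
 */
static int dedup_fd(int keep_fd, int fd)
{
    assert(keep_fd >= 0);
    if (fd != keep_fd && same_file(keep_fd, fd)) {
        close(fd);
        return keep_fd;
    }
    return fd;
}
```

For example, after "a = open("/dev/null", O_RDWR); b = dup(a);", a call to
dedup_fd(a, b) closes b and returns a, which mirrors what happens when
SCM_RIGHTS delivers several descriptors for one container.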
>
>> + return true;
>> +}
>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>> index f864547..1c4f070 100644
>> --- a/include/hw/vfio/vfio-cpr.h
>> +++ b/include/hw/vfio/vfio-cpr.h
>> @@ -13,10 +13,16 @@
>> typedef struct VFIOContainerCPR {
>> Error *blocker;
>> + bool reused;
>> } VFIOContainerCPR;
>> +typedef struct VFIODeviceCPR {
>> + bool reused;
>> +} VFIODeviceCPR;
>> +
>> struct VFIOContainer;
>> struct VFIOContainerBase;
>> +struct VFIOGroup;
>> bool vfio_legacy_cpr_register_container(struct VFIOContainer *container,
>> Error **errp);
>> @@ -29,4 +35,7 @@ bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
>> Error **errp);
>> void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
>> +bool vfio_cpr_container_match(struct VFIOContainer *container,
>> + struct VFIOGroup *group, int *fd);
>> +
>> #endif /* HW_VFIO_VFIO_CPR_H */
>> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
>> index 8bcb3c1..4e4d0b6 100644
>> --- a/include/hw/vfio/vfio-device.h
>> +++ b/include/hw/vfio/vfio-device.h
>> @@ -28,6 +28,7 @@
>> #endif
>> #include "system/system.h"
>> #include "hw/vfio/vfio-container-base.h"
>> +#include "hw/vfio/vfio-cpr.h"
>> #include "system/host_iommu_device.h"
>> #include "system/iommufd.h"
>> @@ -84,6 +85,7 @@ typedef struct VFIODevice {
>> VFIOIOASHwpt *hwpt;
>> QLIST_ENTRY(VFIODevice) hwpt_next;
>> struct vfio_region_info **reginfo;
>> + VFIODeviceCPR cpr;
>> } VFIODevice;
>> struct VFIODeviceOps {
>
>
> Thanks,
>
> C.
>
>
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: [PATCH V3 07/42] vfio/container: preserve descriptors
2025-05-15 19:08 ` Steven Sistare
@ 2025-05-19 13:20 ` Cédric Le Goater
2025-05-19 16:21 ` Steven Sistare
0 siblings, 1 reply; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-19 13:20 UTC (permalink / raw)
To: Steven Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/15/25 21:08, Steven Sistare wrote:
> On 5/15/2025 8:59 AM, Cédric Le Goater wrote:
>> On 5/12/25 17:32, Steve Sistare wrote:
>>> At vfio creation time, save the value of vfio container, group, and device
>>> descriptors in CPR state. On qemu restart, vfio_realize() finds and uses
>>> the saved descriptors, and remembers the reused status for subsequent
>>> patches. The reused status is cleared when vmstate load finishes.
>>>
>>> During reuse, device and iommu state is already configured, so operations
>>> in vfio_realize that would modify the configuration, such as vfio ioctl's,
>>> are skipped. The result is that vfio_realize constructs qemu data
>>> structures that reflect the current state of the device.
>>>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> ---
>>> hw/vfio/container.c | 65 ++++++++++++++++++++++++++++++++++++-------
>>> hw/vfio/cpr-legacy.c | 46 ++++++++++++++++++++++++++++++
>>> include/hw/vfio/vfio-cpr.h | 9 ++++++
>>> include/hw/vfio/vfio-device.h | 2 ++
>>> 4 files changed, 112 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>> index 85c76da..278a220 100644
>>> --- a/hw/vfio/container.c
>>> +++ b/hw/vfio/container.c
>>> @@ -31,6 +31,8 @@
>>> #include "system/reset.h"
>>> #include "trace.h"
>>> #include "qapi/error.h"
>>> +#include "migration/cpr.h"
>>> +#include "migration/blocker.h"
>>> #include "pci.h"
>>> #include "hw/vfio/vfio-container.h"
>>> #include "hw/vfio/vfio-cpr.h"
>>> @@ -414,7 +416,7 @@ static bool vfio_set_iommu(int container_fd, int group_fd,
>>> }
>>> static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
>>> - Error **errp)
>>> + bool cpr_reused, Error **errp)
>>> {
>>> int iommu_type;
>>> const char *vioc_name;
>>> @@ -425,7 +427,11 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
>>> return NULL;
>>> }
>>> - if (!vfio_set_iommu(fd, group->fd, &iommu_type, errp)) {
>>> + /*
>>> + * If container is reused, just set its type and skip the ioctls, as the
>>> + * container and group are already configured in the kernel.
>>> + */
>>> + if (!cpr_reused && !vfio_set_iommu(fd, group->fd, &iommu_type, errp)) {
>>> return NULL;
>>> }
>>> @@ -433,6 +439,7 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
>>> container = VFIO_IOMMU_LEGACY(object_new(vioc_name));
>>> container->fd = fd;
>>> + container->cpr.reused = cpr_reused;
>>> container->iommu_type = iommu_type;
>>> return container;
>>> }
>>> @@ -584,7 +591,7 @@ static bool vfio_container_attach_discard_disable(VFIOContainer *container,
>>> }
>>> static bool vfio_container_group_add(VFIOContainer *container, VFIOGroup *group,
>>> - Error **errp)
>>> + bool cpr_reused, Error **errp)
>>> {
>>> if (!vfio_container_attach_discard_disable(container, group, errp)) {
>>> return false;
>>> @@ -592,6 +599,9 @@ static bool vfio_container_group_add(VFIOContainer *container, VFIOGroup *group,
>>> group->container = container;
>>> QLIST_INSERT_HEAD(&container->group_list, group, container_next);
>>> vfio_group_add_kvm_device(group);
>>> + if (!cpr_reused) {
>>> + cpr_save_fd("vfio_container_for_group", group->groupid, container->fd);
>>> + }
>>
>> Could we avoid the test on cpr_reused and always call cpr_save_fd() ?
>
> No. If cpr_reused is true, then the fd is already on cpr's save list.
> We don't want to save duplicates of the same entry.
Can't we call cpr_find_fd() like in cpr_open_fd() ?
>
>>> return true;
>>> }
>>> @@ -601,6 +611,7 @@ static void vfio_container_group_del(VFIOContainer *container, VFIOGroup *group)
>>> group->container = NULL;
>>> vfio_group_del_kvm_device(group);
>>> vfio_ram_block_discard_disable(container, false);
>>> + cpr_delete_fd("vfio_container_for_group", group->groupid);
>>> }
>>> static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>>> @@ -613,17 +624,37 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>>> VFIOIOMMUClass *vioc = NULL;
>>> bool new_container = false;
>>> bool group_was_added = false;
>>> + bool cpr_reused;
>>> space = vfio_address_space_get(as);
>>> + fd = cpr_find_fd("vfio_container_for_group", group->groupid);
>>> + cpr_reused = (fd > 0);
>>
>>
>> The code above is doing 2 things: it grabs a restored fd and
>> deduces from the fd value that the VM is doing a CPR
>> reboot.
>>
>> Instead of adding this cpr_reused flag, I would prefer to duplicate
>> the code into something like:
>>
>> if (!cpr_reboot) {
>> QLIST_FOREACH(bcontainer, &space->containers, next) {
>> container = container_of(bcontainer, VFIOContainer, bcontainer);
>> if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>> return vfio_container_group_add(container, group, errp);
>> }
>> }
>>
>> fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
>> if (fd < 0) {
>> goto fail;
>> }
>>
>> ret = ioctl(fd, VFIO_GET_API_VERSION);
>> if (ret != VFIO_API_VERSION) {
>> error_setg(errp, "supported vfio version: %d, "
>> "reported version: %d", VFIO_API_VERSION, ret);
>> goto fail;
>> }
>>
>> container = vfio_create_container(fd, group, errp);
>> } else {
>> /* ... */
>> }
>>
>
> OK, but there is no sense in duplicating the identical code for
> VFIO_GET_API_VERSION and vfio_create_container. If you want me to
> simplify the loop, I suggest:
>
> if (!cpr_reused) {
> QLIST_FOREACH(bcontainer, &space->containers, next) {
> container = container_of(bcontainer, VFIOContainer, bcontainer);
> if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> return vfio_container_group_add(container, group, false, errp);
> }
> }
>
> fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
> if (fd < 0) {
> goto fail;
> }
> } else {
> QLIST_FOREACH(bcontainer, &space->containers, next) {
> container = container_of(bcontainer, VFIOContainer, bcontainer);
> if (vfio_cpr_container_match(container, group, &fd)) {
> return vfio_container_group_add(container, group, true, errp);
> }
> }
> }
>
> ret = ioctl(fd, VFIO_GET_API_VERSION);
> ...
OK. Let's do that. I find it easier to read.
>>> + /*
>>> + * If the container is reused, then the group is already attached in the
>>> + * kernel. If a container with matching fd is found, then update the
>>> + * userland group list and return. If not, then after the loop, create
>>> + * the container struct and group list.
>>> + */
>>> QLIST_FOREACH(bcontainer, &space->containers, next) {
>>> container = container_of(bcontainer, VFIOContainer, bcontainer);
>>> - if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>>> - return vfio_container_group_add(container, group, errp);
>>> +
>>> + if (cpr_reused) {
>>> + if (!vfio_cpr_container_match(container, group, &fd)) {
>>
>> why do we need to modify fd ?
>
> That is explained by the comments inside vfio_cpr_container_match, where the
> explanation is more easily understood.
I could not see earlier what the modified fd was useful for, because
we test cpr_reused in one place and !cpr_reused in another:
if (cpr_reused) {
if (!vfio_cpr_container_match(container, group, &fd)) {
continue;
}
and later
if (!cpr_reused) {
fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
}
I think I got it now. This was a bit confusing.
>
>>> + continue;
>>> + }
>>> + } else if (ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>>> + continue;
>>> }
>>> + return vfio_container_group_add(container, group, cpr_reused, errp);
>>> + }
>>> +
>>> + if (!cpr_reused) {
>>> + fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
>>> }
>>> - fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
>>> if (fd < 0) {
>>> goto fail;
>>> }
>>> @@ -635,7 +666,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>>> goto fail;
>>> }
>>> - container = vfio_create_container(fd, group, errp);
>>> + container = vfio_create_container(fd, group, cpr_reused, errp);
>>> if (!container) {
>>> goto fail;
>>> }
>>> @@ -655,7 +686,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>>> vfio_address_space_insert(space, bcontainer);
>>> - if (!vfio_container_group_add(container, group, errp)) {
>>> + if (!vfio_container_group_add(container, group, cpr_reused, errp)) {
>>> goto fail;
>>> }
>>> group_was_added = true;
>>> @@ -697,6 +728,7 @@ static void vfio_container_disconnect(VFIOGroup *group)
>>> QLIST_REMOVE(group, container_next);
>>> group->container = NULL;
>>> + cpr_delete_fd("vfio_container_for_group", group->groupid);
>>> /*
>>> * Explicitly release the listener first before unset container,
>>> @@ -750,7 +782,7 @@ static VFIOGroup *vfio_group_get(int groupid, AddressSpace *as, Error **errp)
>>> group = g_malloc0(sizeof(*group));
>>> snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
>>> - group->fd = qemu_open(path, O_RDWR, errp);
>>> + group->fd = cpr_open_fd(path, O_RDWR, "vfio_group", groupid, NULL, errp);
>>> if (group->fd < 0) {
>>> goto free_group_exit;
>>> }
>>> @@ -782,6 +814,7 @@ static VFIOGroup *vfio_group_get(int groupid, AddressSpace *as, Error **errp)
>>> return group;
>>> close_fd_exit:
>>> + cpr_delete_fd("vfio_group", groupid);
>>> close(group->fd);
>>> free_group_exit:
>>> @@ -803,6 +836,7 @@ static void vfio_group_put(VFIOGroup *group)
>>> vfio_container_disconnect(group);
>>> QLIST_REMOVE(group, next);
>>> trace_vfio_group_put(group->fd);
>>> + cpr_delete_fd("vfio_group", group->groupid);
>>> close(group->fd);
>>> g_free(group);
>>> }
>>> @@ -812,8 +846,14 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
>>> {
>>> g_autofree struct vfio_device_info *info = NULL;
>>> int fd;
>>> + bool cpr_reused;
>>> +
>>> + fd = cpr_find_fd(name, 0);
>>> + cpr_reused = (fd >= 0);
>>> + if (!cpr_reused) {
>>> + fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
>>> + }
>>
>> Could we introduce a helper routine to open this file, like we have
>> cpr_open_fd() ?
>
> OK, but this would be the only use of the helper, and it would bury
> generic vfio functionality -- VFIO_GROUP_GET_DEVICE_FD -- inside a cpr
> flavored helper. IMO not an improvement.
VFIO_GROUP_GET_DEVICE_FD would still be passed as a parameter and
so it won't be buried IMO. I don't dislike it that much.
However, I don't like the "if (cpr_reused)" statements scattered
throughout the code, so I'm looking for ways to bury them.
Thanks,
C.
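
One possible shape for a helper that buries the reuse test, as a minimal
sketch. Everything here is hypothetical and is not a QEMU API: the saved[]
table, fd_find_or_create(), the FdCreateFn callback, and create_dev_null()
are made up for illustration. The creation step (for vfio, the
VFIO_GROUP_GET_DEVICE_FD ioctl) stays visible at the call site because it is
passed in as a callback.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SAVED 16

/* Hypothetical saved-descriptor table, keyed by (name, id). */
typedef struct SavedFd {
    const char *name;   /* must outlive the entry, e.g. a string literal */
    int id;
    int fd;
    bool used;
} SavedFd;

static SavedFd saved[MAX_SAVED];

/* Callback that produces a fresh descriptor, or -1 on failure. */
typedef int (*FdCreateFn)(void *opaque);

static int saved_find(const char *name, int id)
{
    for (int i = 0; i < MAX_SAVED; i++) {
        if (saved[i].used && saved[i].id == id &&
            !strcmp(saved[i].name, name)) {
            return saved[i].fd;
        }
    }
    return -1;
}

static void saved_add(const char *name, int id, int fd)
{
    for (int i = 0; i < MAX_SAVED; i++) {
        if (!saved[i].used) {
            saved[i] = (SavedFd){ name, id, fd, true };
            return;
        }
    }
    assert(!"saved fd table is full");
}

/*
 * Return the saved descriptor for (name, id) if one exists.  Otherwise
 * call create() and save the result, so the caller never needs its own
 * "if (!reused) save" test.  *reused reports which path was taken.
 */
static int fd_find_or_create(const char *name, int id, FdCreateFn create,
                             void *opaque, bool *reused)
{
    int fd = saved_find(name, id);

    *reused = (fd >= 0);
    if (!*reused) {
        fd = create(opaque);
        if (fd >= 0) {
            saved_add(name, id, fd);
        }
    }
    return fd;
}

/* Example creator: opens /dev/null and counts how often it ran. */
static int create_dev_null(void *opaque)
{
    int *calls = opaque;

    (*calls)++;
    return open("/dev/null", O_RDWR);
}
```

Calling fd_find_or_create() twice with the same key runs the creator once;
the second call returns the saved descriptor with *reused set. In a real CPR
restart the saved entry would come from the previous process rather than from
an earlier call in the same one.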
>
>>> - fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
>>> if (fd < 0) {
>>> error_setg_errno(errp, errno, "error getting device from group %d",
>>> group->groupid);
>>> @@ -857,6 +897,10 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
>>> vbasedev->group = group;
>>> QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
>>> + vbasedev->cpr.reused = cpr_reused;
>>> + if (!cpr_reused) {
>>> + cpr_save_fd(name, 0, fd);
>>
>> Could we avoid the test on cpr_reused and always call cpr_save_fd() ?
>
> No. Must avoid adding duplicate entries.
>
>>> + }
>>> trace_vfio_device_get(name, info->flags, info->num_regions, info->num_irqs);
>>> return true;
>>> @@ -870,6 +914,7 @@ static void vfio_device_put(VFIODevice *vbasedev)
>>> QLIST_REMOVE(vbasedev, next);
>>> vbasedev->group = NULL;
>>> trace_vfio_device_put(vbasedev->fd);
>>> + cpr_delete_fd(vbasedev->name, 0);
>>> close(vbasedev->fd);
>>> }
>>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>>> index fac323c..638a8e0 100644
>>> --- a/hw/vfio/cpr-legacy.c
>>> +++ b/hw/vfio/cpr-legacy.c
>>> @@ -10,6 +10,7 @@
>>> #include "qemu/osdep.h"
>>> #include "hw/vfio/vfio-container.h"
>>> #include "hw/vfio/vfio-cpr.h"
>>> +#include "hw/vfio/vfio-device.h"
>>> #include "migration/blocker.h"
>>> #include "migration/cpr.h"
>>> #include "migration/migration.h"
>>> @@ -31,10 +32,27 @@ static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
>>> }
>>> }
>>> +static int vfio_container_post_load(void *opaque, int version_id)
>>> +{
>>> + VFIOContainer *container = opaque;
>>> + VFIOGroup *group;
>>> + VFIODevice *vbasedev;
>>> +
>>> + container->cpr.reused = false;
>>> +
>>> + QLIST_FOREACH(group, &container->group_list, container_next) {
>>> + QLIST_FOREACH(vbasedev, &group->device_list, next) {
>>> + vbasedev->cpr.reused = false;
>>> + }
>>> + }
>>> + return 0;
>>> +}
>>> +
>>> static const VMStateDescription vfio_container_vmstate = {
>>> .name = "vfio-container",
>>> .version_id = 0,
>>> .minimum_version_id = 0,
>>> + .post_load = vfio_container_post_load,
>>> .needed = cpr_needed_for_reuse,
>>> .fields = (VMStateField[]) {
>>> VMSTATE_END_OF_LIST()
>>> @@ -68,3 +86,31 @@ void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
>>> migrate_del_blocker(&container->cpr.blocker);
>>> vmstate_unregister(NULL, &vfio_container_vmstate, container);
>>> }
>>> +
>>> +static bool same_device(int fd1, int fd2)
>>> +{
>>> + struct stat st1, st2;
>>> +
>>> + return !fstat(fd1, &st1) && !fstat(fd2, &st2) && st1.st_dev == st2.st_dev;
>>> +}
>>> +
>>> +bool vfio_cpr_container_match(VFIOContainer *container, VFIOGroup *group,
>>> + int *pfd)
>>> +{
>>> + if (container->fd == *pfd) {
>>> + return true;
>>> + }
>>> + if (!same_device(container->fd, *pfd)) {
>>> + return false;
>>> + }
>>> + /*
>>> + * Same device, different fd. This occurs when the container fd is
>>> + * cpr_save'd multiple times, once for each groupid, so SCM_RIGHTS
>>> + * produces duplicates. De-dup it.
>>> + */
>>> + cpr_delete_fd("vfio_container_for_group", group->groupid);
>>> + close(*pfd);
>>> + cpr_save_fd("vfio_container_for_group", group->groupid, container->fd);
>>> + *pfd = container->fd;
>>
>> I am not sure 'pfd' is used afterwards. Is it ?
>
> True, good eye. I will change it to "int fd" and stop returning the new value.
>
> - Steve
>
>>
>>> + return true;
>>> +}
>>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>>> index f864547..1c4f070 100644
>>> --- a/include/hw/vfio/vfio-cpr.h
>>> +++ b/include/hw/vfio/vfio-cpr.h
>>> @@ -13,10 +13,16 @@
>>> typedef struct VFIOContainerCPR {
>>> Error *blocker;
>>> + bool reused;
>>> } VFIOContainerCPR;
>>> +typedef struct VFIODeviceCPR {
>>> + bool reused;
>>> +} VFIODeviceCPR;
>>> +
>>> struct VFIOContainer;
>>> struct VFIOContainerBase;
>>> +struct VFIOGroup;
>>> bool vfio_legacy_cpr_register_container(struct VFIOContainer *container,
>>> Error **errp);
>>> @@ -29,4 +35,7 @@ bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
>>> Error **errp);
>>> void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
>>> +bool vfio_cpr_container_match(struct VFIOContainer *container,
>>> + struct VFIOGroup *group, int *fd);
>>> +
>>> #endif /* HW_VFIO_VFIO_CPR_H */
>>> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
>>> index 8bcb3c1..4e4d0b6 100644
>>> --- a/include/hw/vfio/vfio-device.h
>>> +++ b/include/hw/vfio/vfio-device.h
>>> @@ -28,6 +28,7 @@
>>> #endif
>>> #include "system/system.h"
>>> #include "hw/vfio/vfio-container-base.h"
>>> +#include "hw/vfio/vfio-cpr.h"
>>> #include "system/host_iommu_device.h"
>>> #include "system/iommufd.h"
>>> @@ -84,6 +85,7 @@ typedef struct VFIODevice {
>>> VFIOIOASHwpt *hwpt;
>>> QLIST_ENTRY(VFIODevice) hwpt_next;
>>> struct vfio_region_info **reginfo;
>>> + VFIODeviceCPR cpr;
>>> } VFIODevice;
>>> struct VFIODeviceOps {
>>
>>
>> Thanks,
>>
>> C.
>>
>>
>
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: [PATCH V3 07/42] vfio/container: preserve descriptors
2025-05-19 13:20 ` Cédric Le Goater
@ 2025-05-19 16:21 ` Steven Sistare
0 siblings, 0 replies; 157+ messages in thread
From: Steven Sistare @ 2025-05-19 16:21 UTC (permalink / raw)
To: Cédric Le Goater, Peter Xu
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Fabiano Rosas, qemu-devel
On 5/19/2025 9:20 AM, Cédric Le Goater wrote:
> On 5/15/25 21:08, Steven Sistare wrote:
>> On 5/15/2025 8:59 AM, Cédric Le Goater wrote:
>>> On 5/12/25 17:32, Steve Sistare wrote:
>>>> At vfio creation time, save the value of vfio container, group, and device
>>>> descriptors in CPR state. On qemu restart, vfio_realize() finds and uses
>>>> the saved descriptors, and remembers the reused status for subsequent
>>>> patches. The reused status is cleared when vmstate load finishes.
>>>>
>>>> During reuse, device and iommu state is already configured, so operations
>>>> in vfio_realize that would modify the configuration, such as vfio ioctl's,
>>>> are skipped. The result is that vfio_realize constructs qemu data
>>>> structures that reflect the current state of the device.
>>>>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>> ---
>>>> hw/vfio/container.c | 65 ++++++++++++++++++++++++++++++++++++-------
>>>> hw/vfio/cpr-legacy.c | 46 ++++++++++++++++++++++++++++++
>>>> include/hw/vfio/vfio-cpr.h | 9 ++++++
>>>> include/hw/vfio/vfio-device.h | 2 ++
>>>> 4 files changed, 112 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>>> index 85c76da..278a220 100644
>>>> --- a/hw/vfio/container.c
>>>> +++ b/hw/vfio/container.c
>>>> @@ -31,6 +31,8 @@
>>>> #include "system/reset.h"
>>>> #include "trace.h"
>>>> #include "qapi/error.h"
>>>> +#include "migration/cpr.h"
>>>> +#include "migration/blocker.h"
>>>> #include "pci.h"
>>>> #include "hw/vfio/vfio-container.h"
>>>> #include "hw/vfio/vfio-cpr.h"
>>>> @@ -414,7 +416,7 @@ static bool vfio_set_iommu(int container_fd, int group_fd,
>>>> }
>>>> static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
>>>> - Error **errp)
>>>> + bool cpr_reused, Error **errp)
>>>> {
>>>> int iommu_type;
>>>> const char *vioc_name;
>>>> @@ -425,7 +427,11 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
>>>> return NULL;
>>>> }
>>>> - if (!vfio_set_iommu(fd, group->fd, &iommu_type, errp)) {
>>>> + /*
>>>> + * If container is reused, just set its type and skip the ioctls, as the
>>>> + * container and group are already configured in the kernel.
>>>> + */
>>>> + if (!cpr_reused && !vfio_set_iommu(fd, group->fd, &iommu_type, errp)) {
>>>> return NULL;
>>>> }
>>>> @@ -433,6 +439,7 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
>>>> container = VFIO_IOMMU_LEGACY(object_new(vioc_name));
>>>> container->fd = fd;
>>>> + container->cpr.reused = cpr_reused;
>>>> container->iommu_type = iommu_type;
>>>> return container;
>>>> }
>>>> @@ -584,7 +591,7 @@ static bool vfio_container_attach_discard_disable(VFIOContainer *container,
>>>> }
>>>> static bool vfio_container_group_add(VFIOContainer *container, VFIOGroup *group,
>>>> - Error **errp)
>>>> + bool cpr_reused, Error **errp)
>>>> {
>>>> if (!vfio_container_attach_discard_disable(container, group, errp)) {
>>>> return false;
>>>> @@ -592,6 +599,9 @@ static bool vfio_container_group_add(VFIOContainer *container, VFIOGroup *group,
>>>> group->container = container;
>>>> QLIST_INSERT_HEAD(&container->group_list, group, container_next);
>>>> vfio_group_add_kvm_device(group);
>>>> + if (!cpr_reused) {
>>>> + cpr_save_fd("vfio_container_for_group", group->groupid, container->fd);
>>>> + }
>>>
>>> Could we avoid the test on cpr_reused and always call cpr_save_fd() ?
>>
>> No. If cpr_reused is true, then the fd is already on cpr's save list.
>> We don't want to save duplicates of the same entry.
>
> Can't we call cpr_find_fd() like in cpr_open_fd() ?
I could indeed. You have re-invented the cpr_resave_fd() helper, which was
used here and elsewhere in V2, but Peter didn't like it.
Peter said:
If the caller knows the fd was created, then IIUC the caller shouldn't
invoke the call.
For the other case, could you give an example when the caller may have been
created, but maybe not?
I said:
It avoids the need to remember that an fd was reused, and test that fact before
calling cpr_save_fd. And sometimes those operations occur in different functions.
Thus resave saves a few lines of code.
Peter, can I bring back cpr_resave_fd() ?
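
To make the save versus resave distinction concrete, here is a minimal sketch
of a named fd list. It is not the CPR implementation: the fd_* names, the
list layout, and the choice to return -1 on a conflicting value are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One saved descriptor, keyed by (name, id). */
typedef struct FdEntry {
    char name[64];          /* longer names are truncated in this sketch */
    int id;
    int fd;
    struct FdEntry *next;
} FdEntry;

static FdEntry *fd_list;

static FdEntry *fd_lookup(const char *name, int id)
{
    for (FdEntry *e = fd_list; e; e = e->next) {
        if (e->id == id && !strcmp(e->name, name)) {
            return e;
        }
    }
    return NULL;
}

/* Return the saved fd for (name, id), or -1 if there is none. */
static int fd_find(const char *name, int id)
{
    FdEntry *e = fd_lookup(name, id);

    return e ? e->fd : -1;
}

/* Unconditionally add an entry; two calls with one key leave duplicates. */
static void fd_save(const char *name, int id, int fd)
{
    FdEntry *e = calloc(1, sizeof(*e));

    assert(e);
    snprintf(e->name, sizeof(e->name), "%s", name);
    e->id = id;
    e->fd = fd;
    e->next = fd_list;
    fd_list = e;
}

/* Remove the first entry matching (name, id), if any. */
static void fd_delete(const char *name, int id)
{
    for (FdEntry **p = &fd_list; *p; p = &(*p)->next) {
        FdEntry *e = *p;

        if (e->id == id && !strcmp(e->name, name)) {
            *p = e->next;
            free(e);
            return;
        }
    }
}

/*
 * Save only if the key is absent.  Return 0 if the entry was added or
 * already holds the same fd, -1 if it holds a different fd.
 */
static int fd_resave(const char *name, int id, int fd)
{
    int old_fd = fd_find(name, id);

    if (old_fd < 0) {
        fd_save(name, id, fd);
        return 0;
    }
    return old_fd == fd ? 0 : -1;
}

/* Count entries, to make duplicates visible. */
static int fd_count(void)
{
    int n = 0;

    for (FdEntry *e = fd_list; e; e = e->next) {
        n++;
    }
    return n;
}
```

With this shape, a caller that may or may not be running with a reused
descriptor can call fd_resave() unconditionally, whereas calling fd_save()
twice for the same key leaves two entries on the list.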
>>>> return true;
>>>> }
>>>> @@ -601,6 +611,7 @@ static void vfio_container_group_del(VFIOContainer *container, VFIOGroup *group)
>>>> group->container = NULL;
>>>> vfio_group_del_kvm_device(group);
>>>> vfio_ram_block_discard_disable(container, false);
>>>> + cpr_delete_fd("vfio_container_for_group", group->groupid);
>>>> }
>>>> static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>>>> @@ -613,17 +624,37 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>>>> VFIOIOMMUClass *vioc = NULL;
>>>> bool new_container = false;
>>>> bool group_was_added = false;
>>>> + bool cpr_reused;
>>>> space = vfio_address_space_get(as);
>>>> + fd = cpr_find_fd("vfio_container_for_group", group->groupid);
>>>> + cpr_reused = (fd > 0);
>>>
>>>
>>> The code above is doing 2 things: it grabs a restored fd and
>>> deduces from the fd value that the VM is doing a CPR
>>> reboot.
>>>
>>> Instead of adding this cpr_reused flag, I would prefer to duplicate
>>> the code into something like:
>>>
>>> if (!cpr_reboot) {
>>> QLIST_FOREACH(bcontainer, &space->containers, next) {
>>> container = container_of(bcontainer, VFIOContainer, bcontainer);
>>> if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>>> return vfio_container_group_add(container, group, errp);
>>> }
>>> }
>>>
>>> fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
>>> if (fd < 0) {
>>> goto fail;
>>> }
>>>
>>> ret = ioctl(fd, VFIO_GET_API_VERSION);
>>> if (ret != VFIO_API_VERSION) {
>>> error_setg(errp, "supported vfio version: %d, "
>>> "reported version: %d", VFIO_API_VERSION, ret);
>>> goto fail;
>>> }
>>>
>>> container = vfio_create_container(fd, group, errp);
>>> } else {
>>> /* ... */
>>> }
>>>
>>
>> OK, but there is no sense in duplicating the identical code for
>> VFIO_GET_API_VERSION and vfio_create_container. If you want me to
>> simplify the loop, I suggest:
>>
>> if (!cpr_reused) {
>> QLIST_FOREACH(bcontainer, &space->containers, next) {
>> container = container_of(bcontainer, VFIOContainer, bcontainer);
>> if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>> return vfio_container_group_add(container, group, false, errp);
>> }
>> }
>>
>> fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
>> if (fd < 0) {
>> goto fail;
>> }
>> } else {
>> QLIST_FOREACH(bcontainer, &space->containers, next) {
>> container = container_of(bcontainer, VFIOContainer, bcontainer);
>> if (vfio_cpr_container_match(container, group, &fd)) {
>> return vfio_container_group_add(container, group, true, errp);
>> }
>> }
>> }
>>
>> ret = ioctl(fd, VFIO_GET_API_VERSION);
>> ...
>
> OK. Let's do that. I find it easier to read.
will do.
>>>> + /*
>>>> + * If the container is reused, then the group is already attached in the
>>>> + * kernel. If a container with matching fd is found, then update the
>>>> + * userland group list and return. If not, then after the loop, create
>>>> + * the container struct and group list.
>>>> + */
>>>> QLIST_FOREACH(bcontainer, &space->containers, next) {
>>>> container = container_of(bcontainer, VFIOContainer, bcontainer);
>>>> - if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>>>> - return vfio_container_group_add(container, group, errp);
>>>> +
>>>> + if (cpr_reused) {
>>>> + if (!vfio_cpr_container_match(container, group, &fd)) {
>>>
>>> why do we need to modify fd ?
>>
>> That is explained by the comments inside vfio_cpr_container_match, where the
>> explanation is more easily understood.
>
> I could not see earlier what the modified fd was useful for, because
> we test cpr_reused in one place and !cpr_reused in another:
>
> if (cpr_reused) {
> if (!vfio_cpr_container_match(container, group, &fd)) {
> continue;
> }
>
> and later
>
> if (!cpr_reused) {
> fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
> }
>
> I think I got it now. This was a bit confusing.
>
>>
>>>> + continue;
>>>> + }
>>>> + } else if (ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>>>> + continue;
>>>> }
>>>> + return vfio_container_group_add(container, group, cpr_reused, errp);
>>>> + }
>>>> +
>>>> + if (!cpr_reused) {
>>>> + fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
>>>> }
>>>> - fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
>>>> if (fd < 0) {
>>>> goto fail;
>>>> }
>>>> @@ -635,7 +666,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>>>> goto fail;
>>>> }
>>>> - container = vfio_create_container(fd, group, errp);
>>>> + container = vfio_create_container(fd, group, cpr_reused, errp);
>>>> if (!container) {
>>>> goto fail;
>>>> }
>>>> @@ -655,7 +686,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>>>> vfio_address_space_insert(space, bcontainer);
>>>> - if (!vfio_container_group_add(container, group, errp)) {
>>>> + if (!vfio_container_group_add(container, group, cpr_reused, errp)) {
>>>> goto fail;
>>>> }
>>>> group_was_added = true;
>>>> @@ -697,6 +728,7 @@ static void vfio_container_disconnect(VFIOGroup *group)
>>>> QLIST_REMOVE(group, container_next);
>>>> group->container = NULL;
>>>> + cpr_delete_fd("vfio_container_for_group", group->groupid);
>>>> /*
>>>> * Explicitly release the listener first before unset container,
>>>> @@ -750,7 +782,7 @@ static VFIOGroup *vfio_group_get(int groupid, AddressSpace *as, Error **errp)
>>>> group = g_malloc0(sizeof(*group));
>>>> snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
>>>> - group->fd = qemu_open(path, O_RDWR, errp);
>>>> + group->fd = cpr_open_fd(path, O_RDWR, "vfio_group", groupid, NULL, errp);
>>>> if (group->fd < 0) {
>>>> goto free_group_exit;
>>>> }
>>>> @@ -782,6 +814,7 @@ static VFIOGroup *vfio_group_get(int groupid, AddressSpace *as, Error **errp)
>>>> return group;
>>>> close_fd_exit:
>>>> + cpr_delete_fd("vfio_group", groupid);
>>>> close(group->fd);
>>>> free_group_exit:
>>>> @@ -803,6 +836,7 @@ static void vfio_group_put(VFIOGroup *group)
>>>> vfio_container_disconnect(group);
>>>> QLIST_REMOVE(group, next);
>>>> trace_vfio_group_put(group->fd);
>>>> + cpr_delete_fd("vfio_group", group->groupid);
>>>> close(group->fd);
>>>> g_free(group);
>>>> }
>>>> @@ -812,8 +846,14 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
>>>> {
>>>> g_autofree struct vfio_device_info *info = NULL;
>>>> int fd;
>>>> + bool cpr_reused;
>>>> +
>>>> + fd = cpr_find_fd(name, 0);
>>>> + cpr_reused = (fd >= 0);
>>>> + if (!cpr_reused) {
>>>> + fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
>>>> + }
>>>
>>> Could we introduce a helper routine to open this file, like we have
>>> cpr_open_fd() ?
>>
>> OK, but this would be the only use of the helper, and it would bury
>> generic vfio functionality -- VFIO_GROUP_GET_DEVICE_FD -- inside a cpr
>> flavored helper. IMO not an improvement.
>
> VFIO_GROUP_GET_DEVICE_FD would still be passed as a parameter and
> so it won't be buried IMO. I don't dislike it that much.
OK.
> However, I don't like the "if (cpr_reused)" statements scattered
> throughout the code, so I'm looking for ways to bury them.
cpr_resave_fd will help.
- Steve
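For anyone following the "bury the if (cpr_reused) tests" point, here is a
minimal sketch of the idea, not QEMU code. It assumes that a "resave"
operation means "save unless an identical entry already exists", so the
caller can invoke it on both the fresh-start and the reuse path without
creating duplicates. The toy_* names, the fixed-size table, and the 63
character name limit are all made up for illustration; the real cpr_save_fd
and cpr_resave_fd may behave differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a saved-descriptor list; hypothetical, not QEMU's. */
#define TOY_MAX_FDS 16

typedef struct ToyFdEntry {
    char name[64];      /* names longer than 63 chars are truncated here */
    int id;
    int fd;
    bool used;
} ToyFdEntry;

static ToyFdEntry toy_fds[TOY_MAX_FDS];

/* Return the saved fd for (name, id), or -1 if none is saved. */
static int toy_find_fd(const char *name, int id)
{
    for (int i = 0; i < TOY_MAX_FDS; i++) {
        const ToyFdEntry *e = &toy_fds[i];

        if (e->used && e->id == id && !strcmp(e->name, name)) {
            return e->fd;
        }
    }
    return -1;
}

/* Add an entry unconditionally; calling it twice creates a duplicate. */
static bool toy_save_fd(const char *name, int id, int fd)
{
    assert(fd >= 0);    /* 0 is a valid descriptor */
    for (int i = 0; i < TOY_MAX_FDS; i++) {
        ToyFdEntry *e = &toy_fds[i];

        if (!e->used) {
            snprintf(e->name, sizeof(e->name), "%s", name);
            e->id = id;
            e->fd = fd;
            e->used = true;
            return true;
        }
    }
    return false;       /* table full */
}

/*
 * Save (name, id, fd) unless an entry for (name, id) already exists.
 * Returns false if an existing entry holds a different fd or the table
 * is full.
 */
static bool toy_resave_fd(const char *name, int id, int fd)
{
    int old = toy_find_fd(name, id);

    if (old >= 0) {
        return old == fd;
    }
    return toy_save_fd(name, id, fd);
}

/* Count entries matching (name, id); used to check for duplicates. */
static int toy_count(const char *name, int id)
{
    int n = 0;

    for (int i = 0; i < TOY_MAX_FDS; i++) {
        const ToyFdEntry *e = &toy_fds[i];

        if (e->used && e->id == id && !strcmp(e->name, name)) {
            n++;
        }
    }
    return n;
}
```

With a helper shaped like this, the tail of vfio_device_get() could call the
resave variant unconditionally instead of testing cpr_reused first, at the
cost of one extra list lookup.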
>>>> - fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
>>>> if (fd < 0) {
>>>> error_setg_errno(errp, errno, "error getting device from group %d",
>>>> group->groupid);
>>>> @@ -857,6 +897,10 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
>>>> vbasedev->group = group;
>>>> QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
>>>> + vbasedev->cpr.reused = cpr_reused;
>>>> + if (!cpr_reused) {
>>>> + cpr_save_fd(name, 0, fd);
>>>
>>> Could we avoid the test on cpr_reused and always call cpr_save_fd()?
>>
>> No. Must avoid adding duplicate entries.
>>
>>>> + }
>>>> trace_vfio_device_get(name, info->flags, info->num_regions, info->num_irqs);
>>>> return true;
>>>> @@ -870,6 +914,7 @@ static void vfio_device_put(VFIODevice *vbasedev)
>>>> QLIST_REMOVE(vbasedev, next);
>>>> vbasedev->group = NULL;
>>>> trace_vfio_device_put(vbasedev->fd);
>>>> + cpr_delete_fd(vbasedev->name, 0);
>>>> close(vbasedev->fd);
>>>> }
>>>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>>>> index fac323c..638a8e0 100644
>>>> --- a/hw/vfio/cpr-legacy.c
>>>> +++ b/hw/vfio/cpr-legacy.c
>>>> @@ -10,6 +10,7 @@
>>>> #include "qemu/osdep.h"
>>>> #include "hw/vfio/vfio-container.h"
>>>> #include "hw/vfio/vfio-cpr.h"
>>>> +#include "hw/vfio/vfio-device.h"
>>>> #include "migration/blocker.h"
>>>> #include "migration/cpr.h"
>>>> #include "migration/migration.h"
>>>> @@ -31,10 +32,27 @@ static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
>>>> }
>>>> }
>>>> +static int vfio_container_post_load(void *opaque, int version_id)
>>>> +{
>>>> + VFIOContainer *container = opaque;
>>>> + VFIOGroup *group;
>>>> + VFIODevice *vbasedev;
>>>> +
>>>> + container->cpr.reused = false;
>>>> +
>>>> + QLIST_FOREACH(group, &container->group_list, container_next) {
>>>> + QLIST_FOREACH(vbasedev, &group->device_list, next) {
>>>> + vbasedev->cpr.reused = false;
>>>> + }
>>>> + }
>>>> + return 0;
>>>> +}
>>>> +
>>>> static const VMStateDescription vfio_container_vmstate = {
>>>> .name = "vfio-container",
>>>> .version_id = 0,
>>>> .minimum_version_id = 0,
>>>> + .post_load = vfio_container_post_load,
>>>> .needed = cpr_needed_for_reuse,
>>>> .fields = (VMStateField[]) {
>>>> VMSTATE_END_OF_LIST()
>>>> @@ -68,3 +86,31 @@ void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
>>>> migrate_del_blocker(&container->cpr.blocker);
>>>> vmstate_unregister(NULL, &vfio_container_vmstate, container);
>>>> }
>>>> +
>>>> +static bool same_device(int fd1, int fd2)
>>>> +{
>>>> + struct stat st1, st2;
>>>> +
>>>> + return !fstat(fd1, &st1) && !fstat(fd2, &st2) && st1.st_dev == st2.st_dev;
>>>> +}
>>>> +
>>>> +bool vfio_cpr_container_match(VFIOContainer *container, VFIOGroup *group,
>>>> + int *pfd)
>>>> +{
>>>> + if (container->fd == *pfd) {
>>>> + return true;
>>>> + }
>>>> + if (!same_device(container->fd, *pfd)) {
>>>> + return false;
>>>> + }
>>>> + /*
>>>> + * Same device, different fd. This occurs when the container fd is
>>>> + * cpr_save'd multiple times, once for each groupid, so SCM_RIGHTS
>>>> + * produces duplicates. De-dup it.
>>>> + */
>>>> + cpr_delete_fd("vfio_container_for_group", group->groupid);
>>>> + close(*pfd);
>>>> + cpr_save_fd("vfio_container_for_group", group->groupid, container->fd);
>>>> + *pfd = container->fd;
>>>
>>> I am not sure 'pfd' is used afterwards. Is it?
>>
>> True, good eye. I will change it to "int fd" and stop returning the new value.
>>
>> - Steve
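As background for the de-dup comment in the hunk above, a small sketch (not
part of the patch) of comparing two descriptors with fstat(). same_device()
repeats the check from the quoted code. same_file() is a stricter variant
added here purely for illustration: in POSIX, st_dev identifies the device
that contains the file, and the pair (st_dev, st_ino) identifies the file
itself. compare_paths() and the /dev/null and /dev/zero paths are assumptions
made so the sketch can be exercised; whether the st_dev-only check is
sufficient for container descriptors is a question for the patch and is not
answered here.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Same check as the quoted hunk: both files reside on the same device. */
static bool same_device(int fd1, int fd2)
{
    struct stat st1, st2;

    return !fstat(fd1, &st1) && !fstat(fd2, &st2) && st1.st_dev == st2.st_dev;
}

/* Hypothetical stricter variant: same device and same inode. */
static bool same_file(int fd1, int fd2)
{
    struct stat st1, st2;

    return !fstat(fd1, &st1) && !fstat(fd2, &st2) &&
           st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}

/*
 * Open both paths read-only, apply the comparison cmp to the two
 * descriptors, and close them again. Returns false if either open fails.
 */
static bool compare_paths(const char *a, const char *b,
                          bool (*cmp)(int, int))
{
    int fa = open(a, O_RDONLY);
    int fb = open(b, O_RDONLY);
    bool ret = fa >= 0 && fb >= 0 && cmp(fa, fb);

    if (fa >= 0) {
        close(fa);
    }
    if (fb >= 0) {
        close(fb);
    }
    return ret;
}
```

Two separate opens of the same path yield different descriptor numbers that
both comparisons treat as matching, which is the "same device, different fd"
situation the comment describes for descriptors duplicated by SCM_RIGHTS.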
>>
>>>
>>>> + return true;
>>>> +}
>>>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>>>> index f864547..1c4f070 100644
>>>> --- a/include/hw/vfio/vfio-cpr.h
>>>> +++ b/include/hw/vfio/vfio-cpr.h
>>>> @@ -13,10 +13,16 @@
>>>> typedef struct VFIOContainerCPR {
>>>> Error *blocker;
>>>> + bool reused;
>>>> } VFIOContainerCPR;
>>>> +typedef struct VFIODeviceCPR {
>>>> + bool reused;
>>>> +} VFIODeviceCPR;
>>>> +
>>>> struct VFIOContainer;
>>>> struct VFIOContainerBase;
>>>> +struct VFIOGroup;
>>>> bool vfio_legacy_cpr_register_container(struct VFIOContainer *container,
>>>> Error **errp);
>>>> @@ -29,4 +35,7 @@ bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
>>>> Error **errp);
>>>> void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
>>>> +bool vfio_cpr_container_match(struct VFIOContainer *container,
>>>> + struct VFIOGroup *group, int *fd);
>>>> +
>>>> #endif /* HW_VFIO_VFIO_CPR_H */
>>>> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
>>>> index 8bcb3c1..4e4d0b6 100644
>>>> --- a/include/hw/vfio/vfio-device.h
>>>> +++ b/include/hw/vfio/vfio-device.h
>>>> @@ -28,6 +28,7 @@
>>>> #endif
>>>> #include "system/system.h"
>>>> #include "hw/vfio/vfio-container-base.h"
>>>> +#include "hw/vfio/vfio-cpr.h"
>>>> #include "system/host_iommu_device.h"
>>>> #include "system/iommufd.h"
>>>> @@ -84,6 +85,7 @@ typedef struct VFIODevice {
>>>> VFIOIOASHwpt *hwpt;
>>>> QLIST_ENTRY(VFIODevice) hwpt_next;
>>>> struct vfio_region_info **reginfo;
>>>> + VFIODeviceCPR cpr;
>>>> } VFIODevice;
>>>> struct VFIODeviceOps {
>>>
>>>
>>> Thanks,
>>>
>>> C.
>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: [PATCH V3 07/42] vfio/container: preserve descriptors
2025-05-12 15:32 ` [PATCH V3 07/42] vfio/container: preserve descriptors Steve Sistare
2025-05-15 12:59 ` Cédric Le Goater
@ 2025-05-22 13:51 ` Cédric Le Goater
2025-05-22 13:56 ` Steven Sistare
1 sibling, 1 reply; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-22 13:51 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/12/25 17:32, Steve Sistare wrote:
> At vfio creation time, save the value of vfio container, group, and device
> descriptors in CPR state. On qemu restart, vfio_realize() finds and uses
> the saved descriptors, and remembers the reused status for subsequent
> patches. The reused status is cleared when vmstate load finishes.
>
> During reuse, device and iommu state is already configured, so operations
> in vfio_realize that would modify the configuration, such as vfio ioctls,
> are skipped. The result is that vfio_realize constructs qemu data
> structures that reflect the current state of the device.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/vfio/container.c | 65 ++++++++++++++++++++++++++++++++++++-------
> hw/vfio/cpr-legacy.c | 46 ++++++++++++++++++++++++++++++
> include/hw/vfio/vfio-cpr.h | 9 ++++++
> include/hw/vfio/vfio-device.h | 2 ++
> 4 files changed, 112 insertions(+), 10 deletions(-)
>
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 85c76da..278a220 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -31,6 +31,8 @@
> #include "system/reset.h"
> #include "trace.h"
> #include "qapi/error.h"
> +#include "migration/cpr.h"
> +#include "migration/blocker.h"
> #include "pci.h"
> #include "hw/vfio/vfio-container.h"
> #include "hw/vfio/vfio-cpr.h"
> @@ -414,7 +416,7 @@ static bool vfio_set_iommu(int container_fd, int group_fd,
> }
>
> static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
> - Error **errp)
> + bool cpr_reused, Error **errp)
> {
> int iommu_type;
> const char *vioc_name;
> @@ -425,7 +427,11 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
> return NULL;
> }
>
> - if (!vfio_set_iommu(fd, group->fd, &iommu_type, errp)) {
> + /*
> + * If container is reused, just set its type and skip the ioctls, as the
> + * container and group are already configured in the kernel.
> + */
> + if (!cpr_reused && !vfio_set_iommu(fd, group->fd, &iommu_type, errp)) {
> return NULL;
> }
>
> @@ -433,6 +439,7 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
>
> container = VFIO_IOMMU_LEGACY(object_new(vioc_name));
> container->fd = fd;
> + container->cpr.reused = cpr_reused;
> container->iommu_type = iommu_type;
> return container;
> }
> @@ -584,7 +591,7 @@ static bool vfio_container_attach_discard_disable(VFIOContainer *container,
> }
>
> static bool vfio_container_group_add(VFIOContainer *container, VFIOGroup *group,
> - Error **errp)
> + bool cpr_reused, Error **errp)
> {
> if (!vfio_container_attach_discard_disable(container, group, errp)) {
> return false;
> @@ -592,6 +599,9 @@ static bool vfio_container_group_add(VFIOContainer *container, VFIOGroup *group,
> group->container = container;
> QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> vfio_group_add_kvm_device(group);
> + if (!cpr_reused) {
> + cpr_save_fd("vfio_container_for_group", group->groupid, container->fd);
> + }
> return true;
> }
>
> @@ -601,6 +611,7 @@ static void vfio_container_group_del(VFIOContainer *container, VFIOGroup *group)
> group->container = NULL;
> vfio_group_del_kvm_device(group);
> vfio_ram_block_discard_disable(container, false);
> + cpr_delete_fd("vfio_container_for_group", group->groupid);
> }
>
> static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
> @@ -613,17 +624,37 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
> VFIOIOMMUClass *vioc = NULL;
> bool new_container = false;
> bool group_was_added = false;
> + bool cpr_reused;
>
> space = vfio_address_space_get(as);
> + fd = cpr_find_fd("vfio_container_for_group", group->groupid);
> + cpr_reused = (fd > 0);
btw, 0 is a valid fd number.
Thanks,
C.
> +
> + /*
> + * If the container is reused, then the group is already attached in the
> + * kernel. If a container with matching fd is found, then update the
> + * userland group list and return. If not, then after the loop, create
> + * the container struct and group list.
> + */
>
> QLIST_FOREACH(bcontainer, &space->containers, next) {
> container = container_of(bcontainer, VFIOContainer, bcontainer);
> - if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> - return vfio_container_group_add(container, group, errp);
> +
> + if (cpr_reused) {
> + if (!vfio_cpr_container_match(container, group, &fd)) {
> + continue;
> + }
> + } else if (ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> + continue;
> }
> +
> + return vfio_container_group_add(container, group, cpr_reused, errp);
> + }
> +
> + if (!cpr_reused) {
> + fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
> }
>
> - fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
> if (fd < 0) {
> goto fail;
> }
> @@ -635,7 +666,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
> goto fail;
> }
>
> - container = vfio_create_container(fd, group, errp);
> + container = vfio_create_container(fd, group, cpr_reused, errp);
> if (!container) {
> goto fail;
> }
> @@ -655,7 +686,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>
> vfio_address_space_insert(space, bcontainer);
>
> - if (!vfio_container_group_add(container, group, errp)) {
> + if (!vfio_container_group_add(container, group, cpr_reused, errp)) {
> goto fail;
> }
> group_was_added = true;
> @@ -697,6 +728,7 @@ static void vfio_container_disconnect(VFIOGroup *group)
>
> QLIST_REMOVE(group, container_next);
> group->container = NULL;
> + cpr_delete_fd("vfio_container_for_group", group->groupid);
>
> /*
> * Explicitly release the listener first before unset container,
> @@ -750,7 +782,7 @@ static VFIOGroup *vfio_group_get(int groupid, AddressSpace *as, Error **errp)
> group = g_malloc0(sizeof(*group));
>
> snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
> - group->fd = qemu_open(path, O_RDWR, errp);
> + group->fd = cpr_open_fd(path, O_RDWR, "vfio_group", groupid, NULL, errp);
> if (group->fd < 0) {
> goto free_group_exit;
> }
> @@ -782,6 +814,7 @@ static VFIOGroup *vfio_group_get(int groupid, AddressSpace *as, Error **errp)
> return group;
>
> close_fd_exit:
> + cpr_delete_fd("vfio_group", groupid);
> close(group->fd);
>
> free_group_exit:
> @@ -803,6 +836,7 @@ static void vfio_group_put(VFIOGroup *group)
> vfio_container_disconnect(group);
> QLIST_REMOVE(group, next);
> trace_vfio_group_put(group->fd);
> + cpr_delete_fd("vfio_group", group->groupid);
> close(group->fd);
> g_free(group);
> }
> @@ -812,8 +846,14 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
> {
> g_autofree struct vfio_device_info *info = NULL;
> int fd;
> + bool cpr_reused;
> +
> + fd = cpr_find_fd(name, 0);
> + cpr_reused = (fd >= 0);
> + if (!cpr_reused) {
> + fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
> + }
>
> - fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
> if (fd < 0) {
> error_setg_errno(errp, errno, "error getting device from group %d",
> group->groupid);
> @@ -857,6 +897,10 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
> vbasedev->group = group;
> QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
>
> + vbasedev->cpr.reused = cpr_reused;
> + if (!cpr_reused) {
> + cpr_save_fd(name, 0, fd);
> + }
> trace_vfio_device_get(name, info->flags, info->num_regions, info->num_irqs);
>
> return true;
> @@ -870,6 +914,7 @@ static void vfio_device_put(VFIODevice *vbasedev)
> QLIST_REMOVE(vbasedev, next);
> vbasedev->group = NULL;
> trace_vfio_device_put(vbasedev->fd);
> + cpr_delete_fd(vbasedev->name, 0);
> close(vbasedev->fd);
> }
>
> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
> index fac323c..638a8e0 100644
> --- a/hw/vfio/cpr-legacy.c
> +++ b/hw/vfio/cpr-legacy.c
> @@ -10,6 +10,7 @@
> #include "qemu/osdep.h"
> #include "hw/vfio/vfio-container.h"
> #include "hw/vfio/vfio-cpr.h"
> +#include "hw/vfio/vfio-device.h"
> #include "migration/blocker.h"
> #include "migration/cpr.h"
> #include "migration/migration.h"
> @@ -31,10 +32,27 @@ static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
> }
> }
>
> +static int vfio_container_post_load(void *opaque, int version_id)
> +{
> + VFIOContainer *container = opaque;
> + VFIOGroup *group;
> + VFIODevice *vbasedev;
> +
> + container->cpr.reused = false;
> +
> + QLIST_FOREACH(group, &container->group_list, container_next) {
> + QLIST_FOREACH(vbasedev, &group->device_list, next) {
> + vbasedev->cpr.reused = false;
> + }
> + }
> + return 0;
> +}
> +
> static const VMStateDescription vfio_container_vmstate = {
> .name = "vfio-container",
> .version_id = 0,
> .minimum_version_id = 0,
> + .post_load = vfio_container_post_load,
> .needed = cpr_needed_for_reuse,
> .fields = (VMStateField[]) {
> VMSTATE_END_OF_LIST()
> @@ -68,3 +86,31 @@ void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
> migrate_del_blocker(&container->cpr.blocker);
> vmstate_unregister(NULL, &vfio_container_vmstate, container);
> }
> +
> +static bool same_device(int fd1, int fd2)
> +{
> + struct stat st1, st2;
> +
> + return !fstat(fd1, &st1) && !fstat(fd2, &st2) && st1.st_dev == st2.st_dev;
> +}
> +
> +bool vfio_cpr_container_match(VFIOContainer *container, VFIOGroup *group,
> + int *pfd)
> +{
> + if (container->fd == *pfd) {
> + return true;
> + }
> + if (!same_device(container->fd, *pfd)) {
> + return false;
> + }
> + /*
> + * Same device, different fd. This occurs when the container fd is
> + * cpr_save'd multiple times, once for each groupid, so SCM_RIGHTS
> + * produces duplicates. De-dup it.
> + */
> + cpr_delete_fd("vfio_container_for_group", group->groupid);
> + close(*pfd);
> + cpr_save_fd("vfio_container_for_group", group->groupid, container->fd);
> + *pfd = container->fd;
> + return true;
> +}
> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
> index f864547..1c4f070 100644
> --- a/include/hw/vfio/vfio-cpr.h
> +++ b/include/hw/vfio/vfio-cpr.h
> @@ -13,10 +13,16 @@
>
> typedef struct VFIOContainerCPR {
> Error *blocker;
> + bool reused;
> } VFIOContainerCPR;
>
> +typedef struct VFIODeviceCPR {
> + bool reused;
> +} VFIODeviceCPR;
> +
> struct VFIOContainer;
> struct VFIOContainerBase;
> +struct VFIOGroup;
>
> bool vfio_legacy_cpr_register_container(struct VFIOContainer *container,
> Error **errp);
> @@ -29,4 +35,7 @@ bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
> Error **errp);
> void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
>
> +bool vfio_cpr_container_match(struct VFIOContainer *container,
> + struct VFIOGroup *group, int *fd);
> +
> #endif /* HW_VFIO_VFIO_CPR_H */
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index 8bcb3c1..4e4d0b6 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -28,6 +28,7 @@
> #endif
> #include "system/system.h"
> #include "hw/vfio/vfio-container-base.h"
> +#include "hw/vfio/vfio-cpr.h"
> #include "system/host_iommu_device.h"
> #include "system/iommufd.h"
>
> @@ -84,6 +85,7 @@ typedef struct VFIODevice {
> VFIOIOASHwpt *hwpt;
> QLIST_ENTRY(VFIODevice) hwpt_next;
> struct vfio_region_info **reginfo;
> + VFIODeviceCPR cpr;
> } VFIODevice;
>
> struct VFIODeviceOps {
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: [PATCH V3 07/42] vfio/container: preserve descriptors
2025-05-22 13:51 ` Cédric Le Goater
@ 2025-05-22 13:56 ` Steven Sistare
0 siblings, 0 replies; 157+ messages in thread
From: Steven Sistare @ 2025-05-22 13:56 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/22/2025 9:51 AM, Cédric Le Goater wrote:
> On 5/12/25 17:32, Steve Sistare wrote:
>> At vfio creation time, save the value of vfio container, group, and device
>> descriptors in CPR state. On qemu restart, vfio_realize() finds and uses
>> the saved descriptors, and remembers the reused status for subsequent
>> patches. The reused status is cleared when vmstate load finishes.
>>
>> During reuse, device and iommu state is already configured, so operations
>> in vfio_realize that would modify the configuration, such as vfio ioctls,
>> are skipped. The result is that vfio_realize constructs qemu data
>> structures that reflect the current state of the device.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> hw/vfio/container.c | 65 ++++++++++++++++++++++++++++++++++++-------
>> hw/vfio/cpr-legacy.c | 46 ++++++++++++++++++++++++++++++
>> include/hw/vfio/vfio-cpr.h | 9 ++++++
>> include/hw/vfio/vfio-device.h | 2 ++
>> 4 files changed, 112 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index 85c76da..278a220 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -31,6 +31,8 @@
>> #include "system/reset.h"
>> #include "trace.h"
>> #include "qapi/error.h"
>> +#include "migration/cpr.h"
>> +#include "migration/blocker.h"
>> #include "pci.h"
>> #include "hw/vfio/vfio-container.h"
>> #include "hw/vfio/vfio-cpr.h"
>> @@ -414,7 +416,7 @@ static bool vfio_set_iommu(int container_fd, int group_fd,
>> }
>> static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
>> - Error **errp)
>> + bool cpr_reused, Error **errp)
>> {
>> int iommu_type;
>> const char *vioc_name;
>> @@ -425,7 +427,11 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
>> return NULL;
>> }
>> - if (!vfio_set_iommu(fd, group->fd, &iommu_type, errp)) {
>> + /*
>> + * If container is reused, just set its type and skip the ioctls, as the
>> + * container and group are already configured in the kernel.
>> + */
>> + if (!cpr_reused && !vfio_set_iommu(fd, group->fd, &iommu_type, errp)) {
>> return NULL;
>> }
>> @@ -433,6 +439,7 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
>> container = VFIO_IOMMU_LEGACY(object_new(vioc_name));
>> container->fd = fd;
>> + container->cpr.reused = cpr_reused;
>> container->iommu_type = iommu_type;
>> return container;
>> }
>> @@ -584,7 +591,7 @@ static bool vfio_container_attach_discard_disable(VFIOContainer *container,
>> }
>> static bool vfio_container_group_add(VFIOContainer *container, VFIOGroup *group,
>> - Error **errp)
>> + bool cpr_reused, Error **errp)
>> {
>> if (!vfio_container_attach_discard_disable(container, group, errp)) {
>> return false;
>> @@ -592,6 +599,9 @@ static bool vfio_container_group_add(VFIOContainer *container, VFIOGroup *group,
>> group->container = container;
>> QLIST_INSERT_HEAD(&container->group_list, group, container_next);
>> vfio_group_add_kvm_device(group);
>> + if (!cpr_reused) {
>> + cpr_save_fd("vfio_container_for_group", group->groupid, container->fd);
>> + }
>> return true;
>> }
>> @@ -601,6 +611,7 @@ static void vfio_container_group_del(VFIOContainer *container, VFIOGroup *group)
>> group->container = NULL;
>> vfio_group_del_kvm_device(group);
>> vfio_ram_block_discard_disable(container, false);
>> + cpr_delete_fd("vfio_container_for_group", group->groupid);
>> }
>> static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>> @@ -613,17 +624,37 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>> VFIOIOMMUClass *vioc = NULL;
>> bool new_container = false;
>> bool group_was_added = false;
>> + bool cpr_reused;
>> space = vfio_address_space_get(as);
>> + fd = cpr_find_fd("vfio_container_for_group", group->groupid);
>> + cpr_reused = (fd > 0);
>
> btw, 0 is a valid fd number.
That's a typo, but a bad one, thanks! That is the only broken one:
$ fgrep '(fd >' hw/vfio/*.c
container.c: cpr_reused = (fd > 0);
container.c: if (fd >= 0) {
cpr.c: if (fd >= 0) {
cpr-legacy.c: *reused = (fd >= 0);
cpr-legacy.c: if (fd >= 0) {
pci.c: if (fd >= 0) {
pci.c: if (fd >= 0) {
- Steve
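For readers following the fd > 0 fix, a minimal sketch, not QEMU code, of why
0 has to count as a found descriptor. POSIX open() returns the lowest-numbered
descriptor that is not currently open, so a process whose stdin is closed
(for example a daemonized one) can be handed 0. The function names are made
up for illustration, and /dev/null is only a convenient file to open.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* The check from the quoted hunk: wrongly treats descriptor 0 as absent. */
static bool fd_found_buggy(int fd)
{
    return fd > 0;
}

/* The corrected check: only negative values mean "no descriptor". */
static bool fd_found(int fd)
{
    return fd >= 0;
}

/*
 * Free descriptor 0, open a file, and report which descriptor number
 * open() handed out. stdin is restored before returning. Returns -1 if
 * the open fails.
 */
static int lowest_fd_after_closing_stdin(void)
{
    /* Keep a copy of stdin so it can be restored; -1 if already closed. */
    int saved_stdin = dup(STDIN_FILENO);
    int fd, result;

    if (saved_stdin >= 0) {
        close(STDIN_FILENO);
    }

    /* With 0 unused, open() returns the lowest free descriptor. */
    fd = open("/dev/null", O_RDONLY);
    result = fd;
    if (fd >= 0) {
        close(fd);
    }

    if (saved_stdin >= 0) {
        dup2(saved_stdin, STDIN_FILENO);
        close(saved_stdin);
    }
    return result;
}
```

If a saved container descriptor happened to be 0, the fd > 0 test would mark
it as not reused and the code would fall through to the fresh-start path,
which is why the >= 0 form used at the other call sites is the right one.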
>> +
>> + /*
>> + * If the container is reused, then the group is already attached in the
>> + * kernel. If a container with matching fd is found, then update the
>> + * userland group list and return. If not, then after the loop, create
>> + * the container struct and group list.
>> + */
>> QLIST_FOREACH(bcontainer, &space->containers, next) {
>> container = container_of(bcontainer, VFIOContainer, bcontainer);
>> - if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>> - return vfio_container_group_add(container, group, errp);
>> +
>> + if (cpr_reused) {
>> + if (!vfio_cpr_container_match(container, group, &fd)) {
>> + continue;
>> + }
>> + } else if (ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>> + continue;
>> }
>> +
>> + return vfio_container_group_add(container, group, cpr_reused, errp);
>> + }
>> +
>> + if (!cpr_reused) {
>> + fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
>> }
>> - fd = qemu_open("/dev/vfio/vfio", O_RDWR, errp);
>> if (fd < 0) {
>> goto fail;
>> }
>> @@ -635,7 +666,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>> goto fail;
>> }
>> - container = vfio_create_container(fd, group, errp);
>> + container = vfio_create_container(fd, group, cpr_reused, errp);
>> if (!container) {
>> goto fail;
>> }
>> @@ -655,7 +686,7 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>> vfio_address_space_insert(space, bcontainer);
>> - if (!vfio_container_group_add(container, group, errp)) {
>> + if (!vfio_container_group_add(container, group, cpr_reused, errp)) {
>> goto fail;
>> }
>> group_was_added = true;
>> @@ -697,6 +728,7 @@ static void vfio_container_disconnect(VFIOGroup *group)
>> QLIST_REMOVE(group, container_next);
>> group->container = NULL;
>> + cpr_delete_fd("vfio_container_for_group", group->groupid);
>> /*
>> * Explicitly release the listener first before unset container,
>> @@ -750,7 +782,7 @@ static VFIOGroup *vfio_group_get(int groupid, AddressSpace *as, Error **errp)
>> group = g_malloc0(sizeof(*group));
>> snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
>> - group->fd = qemu_open(path, O_RDWR, errp);
>> + group->fd = cpr_open_fd(path, O_RDWR, "vfio_group", groupid, NULL, errp);
>> if (group->fd < 0) {
>> goto free_group_exit;
>> }
>> @@ -782,6 +814,7 @@ static VFIOGroup *vfio_group_get(int groupid, AddressSpace *as, Error **errp)
>> return group;
>> close_fd_exit:
>> + cpr_delete_fd("vfio_group", groupid);
>> close(group->fd);
>> free_group_exit:
>> @@ -803,6 +836,7 @@ static void vfio_group_put(VFIOGroup *group)
>> vfio_container_disconnect(group);
>> QLIST_REMOVE(group, next);
>> trace_vfio_group_put(group->fd);
>> + cpr_delete_fd("vfio_group", group->groupid);
>> close(group->fd);
>> g_free(group);
>> }
>> @@ -812,8 +846,14 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
>> {
>> g_autofree struct vfio_device_info *info = NULL;
>> int fd;
>> + bool cpr_reused;
>> +
>> + fd = cpr_find_fd(name, 0);
>> + cpr_reused = (fd >= 0);
>> + if (!cpr_reused) {
>> + fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
>> + }
>> - fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
>> if (fd < 0) {
>> error_setg_errno(errp, errno, "error getting device from group %d",
>> group->groupid);
>> @@ -857,6 +897,10 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
>> vbasedev->group = group;
>> QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
>> + vbasedev->cpr.reused = cpr_reused;
>> + if (!cpr_reused) {
>> + cpr_save_fd(name, 0, fd);
>> + }
>> trace_vfio_device_get(name, info->flags, info->num_regions, info->num_irqs);
>> return true;
>> @@ -870,6 +914,7 @@ static void vfio_device_put(VFIODevice *vbasedev)
>> QLIST_REMOVE(vbasedev, next);
>> vbasedev->group = NULL;
>> trace_vfio_device_put(vbasedev->fd);
>> + cpr_delete_fd(vbasedev->name, 0);
>> close(vbasedev->fd);
>> }
>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>> index fac323c..638a8e0 100644
>> --- a/hw/vfio/cpr-legacy.c
>> +++ b/hw/vfio/cpr-legacy.c
>> @@ -10,6 +10,7 @@
>> #include "qemu/osdep.h"
>> #include "hw/vfio/vfio-container.h"
>> #include "hw/vfio/vfio-cpr.h"
>> +#include "hw/vfio/vfio-device.h"
>> #include "migration/blocker.h"
>> #include "migration/cpr.h"
>> #include "migration/migration.h"
>> @@ -31,10 +32,27 @@ static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
>> }
>> }
>> +static int vfio_container_post_load(void *opaque, int version_id)
>> +{
>> + VFIOContainer *container = opaque;
>> + VFIOGroup *group;
>> + VFIODevice *vbasedev;
>> +
>> + container->cpr.reused = false;
>> +
>> + QLIST_FOREACH(group, &container->group_list, container_next) {
>> + QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> + vbasedev->cpr.reused = false;
>> + }
>> + }
>> + return 0;
>> +}
>> +
>> static const VMStateDescription vfio_container_vmstate = {
>> .name = "vfio-container",
>> .version_id = 0,
>> .minimum_version_id = 0,
>> + .post_load = vfio_container_post_load,
>> .needed = cpr_needed_for_reuse,
>> .fields = (VMStateField[]) {
>> VMSTATE_END_OF_LIST()
>> @@ -68,3 +86,31 @@ void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
>> migrate_del_blocker(&container->cpr.blocker);
>> vmstate_unregister(NULL, &vfio_container_vmstate, container);
>> }
>> +
>> +static bool same_device(int fd1, int fd2)
>> +{
>> + struct stat st1, st2;
>> +
>> + return !fstat(fd1, &st1) && !fstat(fd2, &st2) && st1.st_dev == st2.st_dev;
>> +}
>> +
>> +bool vfio_cpr_container_match(VFIOContainer *container, VFIOGroup *group,
>> + int *pfd)
>> +{
>> + if (container->fd == *pfd) {
>> + return true;
>> + }
>> + if (!same_device(container->fd, *pfd)) {
>> + return false;
>> + }
>> + /*
>> + * Same device, different fd. This occurs when the container fd is
>> + * cpr_save'd multiple times, once for each groupid, so SCM_RIGHTS
>> + * produces duplicates. De-dup it.
>> + */
>> + cpr_delete_fd("vfio_container_for_group", group->groupid);
>> + close(*pfd);
>> + cpr_save_fd("vfio_container_for_group", group->groupid, container->fd);
>> + *pfd = container->fd;
>> + return true;
>> +}
>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>> index f864547..1c4f070 100644
>> --- a/include/hw/vfio/vfio-cpr.h
>> +++ b/include/hw/vfio/vfio-cpr.h
>> @@ -13,10 +13,16 @@
>> typedef struct VFIOContainerCPR {
>> Error *blocker;
>> + bool reused;
>> } VFIOContainerCPR;
>> +typedef struct VFIODeviceCPR {
>> + bool reused;
>> +} VFIODeviceCPR;
>> +
>> struct VFIOContainer;
>> struct VFIOContainerBase;
>> +struct VFIOGroup;
>> bool vfio_legacy_cpr_register_container(struct VFIOContainer *container,
>> Error **errp);
>> @@ -29,4 +35,7 @@ bool vfio_cpr_register_container(struct VFIOContainerBase *bcontainer,
>> Error **errp);
>> void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
>> +bool vfio_cpr_container_match(struct VFIOContainer *container,
>> + struct VFIOGroup *group, int *fd);
>> +
>> #endif /* HW_VFIO_VFIO_CPR_H */
>> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
>> index 8bcb3c1..4e4d0b6 100644
>> --- a/include/hw/vfio/vfio-device.h
>> +++ b/include/hw/vfio/vfio-device.h
>> @@ -28,6 +28,7 @@
>> #endif
>> #include "system/system.h"
>> #include "hw/vfio/vfio-container-base.h"
>> +#include "hw/vfio/vfio-cpr.h"
>> #include "system/host_iommu_device.h"
>> #include "system/iommufd.h"
>> @@ -84,6 +85,7 @@ typedef struct VFIODevice {
>> VFIOIOASHwpt *hwpt;
>> QLIST_ENTRY(VFIODevice) hwpt_next;
>> struct vfio_region_info **reginfo;
>> + VFIODeviceCPR cpr;
>> } VFIODevice;
>> struct VFIODeviceOps {
>
^ permalink raw reply [flat|nested] 157+ messages in thread
* [PATCH V3 08/42] vfio/container: export vfio_legacy_dma_map
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
` (6 preceding siblings ...)
2025-05-12 15:32 ` [PATCH V3 07/42] vfio/container: preserve descriptors Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-15 13:42 ` Cédric Le Goater
2025-05-12 15:32 ` [PATCH V3 09/42] vfio/container: discard old DMA vaddr Steve Sistare
` (34 subsequent siblings)
42 siblings, 1 reply; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Export vfio_legacy_dma_map so it may be referenced outside the file
in a subsequent patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/vfio/container.c | 4 ++--
include/hw/vfio/vfio-container-base.h | 3 +++
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 278a220..a554683 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -208,8 +208,8 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
return ret;
}
-static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
- ram_addr_t size, void *vaddr, bool readonly)
+int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
+ ram_addr_t size, void *vaddr, bool readonly)
{
const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
bcontainer);
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 1dc760f..a2f6c3a 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -186,4 +186,7 @@ struct VFIOIOMMUClass {
VFIORamDiscardListener *vfio_find_ram_discard_listener(
VFIOContainerBase *bcontainer, MemoryRegionSection *section);
+int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
+ ram_addr_t size, void *vaddr, bool readonly);
+
#endif /* HW_VFIO_VFIO_CONTAINER_BASE_H */
--
1.8.3.1
^ permalink raw reply related [flat|nested] 157+ messages in thread
* Re: [PATCH V3 08/42] vfio/container: export vfio_legacy_dma_map
2025-05-12 15:32 ` [PATCH V3 08/42] vfio/container: export vfio_legacy_dma_map Steve Sistare
@ 2025-05-15 13:42 ` Cédric Le Goater
2025-05-15 19:08 ` Steven Sistare
0 siblings, 1 reply; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-15 13:42 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/12/25 17:32, Steve Sistare wrote:
> Export vfio_legacy_dma_map so it may be referenced outside the file
> in a subsequent patch.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/vfio/container.c | 4 ++--
> include/hw/vfio/vfio-container-base.h | 3 +++
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 278a220..a554683 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -208,8 +208,8 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
> return ret;
> }
>
> -static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> - ram_addr_t size, void *vaddr, bool readonly)
> +int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> + ram_addr_t size, void *vaddr, bool readonly)
> {
> const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> bcontainer);
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 1dc760f..a2f6c3a 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -186,4 +186,7 @@ struct VFIOIOMMUClass {
> VFIORamDiscardListener *vfio_find_ram_discard_listener(
> VFIOContainerBase *bcontainer, MemoryRegionSection *section);
>
> +int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> + ram_addr_t size, void *vaddr, bool readonly);
> +
> #endif /* HW_VFIO_VFIO_CONTAINER_BASE_H */
I don't think this export is necessary. See comment on patch 10.
Thanks,
C.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: [PATCH V3 08/42] vfio/container: export vfio_legacy_dma_map
2025-05-15 13:42 ` Cédric Le Goater
@ 2025-05-15 19:08 ` Steven Sistare
0 siblings, 0 replies; 157+ messages in thread
From: Steven Sistare @ 2025-05-15 19:08 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/15/2025 9:42 AM, Cédric Le Goater wrote:
> On 5/12/25 17:32, Steve Sistare wrote:
>> Export vfio_legacy_dma_map so it may be referenced outside the file
>> in a subsequent patch.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> hw/vfio/container.c | 4 ++--
>> include/hw/vfio/vfio-container-base.h | 3 +++
>> 2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index 278a220..a554683 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -208,8 +208,8 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
>> return ret;
>> }
>> -static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>> - ram_addr_t size, void *vaddr, bool readonly)
>> +int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>> + ram_addr_t size, void *vaddr, bool readonly)
>> {
>> const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>> bcontainer);
>> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
>> index 1dc760f..a2f6c3a 100644
>> --- a/include/hw/vfio/vfio-container-base.h
>> +++ b/include/hw/vfio/vfio-container-base.h
>> @@ -186,4 +186,7 @@ struct VFIOIOMMUClass {
>> VFIORamDiscardListener *vfio_find_ram_discard_listener(
>> VFIOContainerBase *bcontainer, MemoryRegionSection *section);
>> +int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>> + ram_addr_t size, void *vaddr, bool readonly);
>> +
>> #endif /* HW_VFIO_VFIO_CONTAINER_BASE_H */
>
> I don't think this export is necessary. See comment on patch 10.
OK, I will drop this patch - steve
^ permalink raw reply [flat|nested] 157+ messages in thread
* [PATCH V3 09/42] vfio/container: discard old DMA vaddr
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
` (7 preceding siblings ...)
2025-05-12 15:32 ` [PATCH V3 08/42] vfio/container: export vfio_legacy_dma_map Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-15 13:30 ` Cédric Le Goater
2025-05-12 15:32 ` [PATCH V3 10/42] vfio/container: restore " Steve Sistare
` (33 subsequent siblings)
42 siblings, 1 reply; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
In the container pre_save handler, discard the virtual addresses in DMA
mappings with VFIO_DMA_UNMAP_FLAG_VADDR, because guest RAM will be
remapped at a different VA in new QEMU. DMA to already-mapped
pages continues.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/vfio/cpr-legacy.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
index 638a8e0..519d772 100644
--- a/hw/vfio/cpr-legacy.c
+++ b/hw/vfio/cpr-legacy.c
@@ -17,6 +17,22 @@
#include "migration/vmstate.h"
#include "qapi/error.h"
+static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
+{
+ struct vfio_iommu_type1_dma_unmap unmap = {
+ .argsz = sizeof(unmap),
+ .flags = VFIO_DMA_UNMAP_FLAG_VADDR | VFIO_DMA_UNMAP_FLAG_ALL,
+ .iova = 0,
+ .size = 0,
+ };
+ if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
+ error_setg_errno(errp, errno, "vfio_dma_unmap_vaddr_all");
+ return false;
+ }
+ return true;
+}
+
+
static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
{
if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) {
@@ -32,6 +48,18 @@ static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
}
}
+static int vfio_container_pre_save(void *opaque)
+{
+ VFIOContainer *container = opaque;
+ Error *err = NULL;
+
+ if (!vfio_dma_unmap_vaddr_all(container, &err)) {
+ error_report_err(err);
+ return -1;
+ }
+ return 0;
+}
+
static int vfio_container_post_load(void *opaque, int version_id)
{
VFIOContainer *container = opaque;
@@ -52,6 +80,7 @@ static const VMStateDescription vfio_container_vmstate = {
.name = "vfio-container",
.version_id = 0,
.minimum_version_id = 0,
+ .pre_save = vfio_container_pre_save,
.post_load = vfio_container_post_load,
.needed = cpr_needed_for_reuse,
.fields = (VMStateField[]) {
--
1.8.3.1
^ permalink raw reply related [flat|nested] 157+ messages in thread
* Re: [PATCH V3 09/42] vfio/container: discard old DMA vaddr
2025-05-12 15:32 ` [PATCH V3 09/42] vfio/container: discard old DMA vaddr Steve Sistare
@ 2025-05-15 13:30 ` Cédric Le Goater
0 siblings, 0 replies; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-15 13:30 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/12/25 17:32, Steve Sistare wrote:
> In the container pre_save handler, discard the virtual addresses in DMA
> mappings with VFIO_DMA_UNMAP_FLAG_VADDR, because guest RAM will be
> remapped at a different VA in new QEMU. DMA to already-mapped
> pages continues.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Looks OK. Too bad the pre_save() handler doesn't have an
'Error **' parameter.
It shouldn't be too complex to add in vmstate_save_state_v().
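
For illustration only, here is a minimal sketch of what a pre_save-style
handler with an error out-parameter could look like. It is not QEMU code:
SketchError, SketchVMState, sketch_save_state() and
sketch_container_pre_save() are hypothetical stand-ins, and nothing here
assumes how vmstate_save_state_v() would actually be changed. The point is
only that the handler fills in the error and the caller decides how to
report it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for QEMU's Error object: a fixed message buffer. */
typedef struct SketchError {
    char msg[128];
} SketchError;

/* Hypothetical handler type: true on success; on failure, fills *errp. */
typedef bool (*SketchPreSave)(void *opaque, SketchError *errp);

/* Hypothetical, heavily reduced description of a vmstate entry. */
typedef struct SketchVMState {
    const char *name;
    SketchPreSave pre_save;     /* may be NULL */
} SketchVMState;

/*
 * Caller side: run the handler and pass its error up unchanged,
 * so the handler itself never has to print anything.
 */
static bool sketch_save_state(const SketchVMState *vmsd, void *opaque,
                              SketchError *errp)
{
    assert(errp != NULL);
    errp->msg[0] = '\0';

    if (vmsd->pre_save && !vmsd->pre_save(opaque, errp)) {
        return false;
    }
    return true;
}

/* Example handler: fails when the (fake) container fd is negative. */
static bool sketch_container_pre_save(void *opaque, SketchError *errp)
{
    int fd = *(int *)opaque;

    if (fd < 0) {
        snprintf(errp->msg, sizeof(errp->msg),
                 "cannot discard vaddr: bad container fd %d", fd);
        return false;
    }
    return true;
}
```

With a signature like this, a handler such as the one in this patch would
not need its own local Error and error_report_err() call.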
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> hw/vfio/cpr-legacy.c | 29 +++++++++++++++++++++++++++++
> 1 file changed, 29 insertions(+)
>
> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
> index 638a8e0..519d772 100644
> --- a/hw/vfio/cpr-legacy.c
> +++ b/hw/vfio/cpr-legacy.c
> @@ -17,6 +17,22 @@
> #include "migration/vmstate.h"
> #include "qapi/error.h"
>
> +static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
> +{
> + struct vfio_iommu_type1_dma_unmap unmap = {
> + .argsz = sizeof(unmap),
> + .flags = VFIO_DMA_UNMAP_FLAG_VADDR | VFIO_DMA_UNMAP_FLAG_ALL,
> + .iova = 0,
> + .size = 0,
> + };
> + if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
> + error_setg_errno(errp, errno, "vfio_dma_unmap_vaddr_all");
> + return false;
> + }
> + return true;
> +}
> +
> +
> static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
> {
> if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) {
> @@ -32,6 +48,18 @@ static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
> }
> }
>
> +static int vfio_container_pre_save(void *opaque)
> +{
> + VFIOContainer *container = opaque;
> + Error *err = NULL;
> +
> + if (!vfio_dma_unmap_vaddr_all(container, &err)) {
> + error_report_err(err);
> + return -1;
> + }
> + return 0;
> +}
> +
> static int vfio_container_post_load(void *opaque, int version_id)
> {
> VFIOContainer *container = opaque;
> @@ -52,6 +80,7 @@ static const VMStateDescription vfio_container_vmstate = {
> .name = "vfio-container",
> .version_id = 0,
> .minimum_version_id = 0,
> + .pre_save = vfio_container_pre_save,
> .post_load = vfio_container_post_load,
> .needed = cpr_needed_for_reuse,
> .fields = (VMStateField[]) {
^ permalink raw reply [flat|nested] 157+ messages in thread
* [PATCH V3 10/42] vfio/container: restore DMA vaddr
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
` (8 preceding siblings ...)
2025-05-12 15:32 ` [PATCH V3 09/42] vfio/container: discard old DMA vaddr Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-15 13:42 ` Cédric Le Goater
2025-05-22 6:37 ` Cédric Le Goater
2025-05-12 15:32 ` [PATCH V3 11/42] vfio/container: mdev cpr blocker Steve Sistare
` (32 subsequent siblings)
42 siblings, 2 replies; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
In new QEMU, do not register the memory listener at device creation time.
Register it later, in the container post_load handler, after all vmstate
that may affect regions and mapping boundaries has been loaded. The
post_load registration will cause the listener to invoke its callback on
each flat section, and the calls will match the mappings remembered by the
kernel.
The listener calls a special dma_map handler that passes the new VA of each
section to the kernel using VFIO_DMA_MAP_FLAG_VADDR. Restore the normal
handler at the end.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/vfio/container.c | 15 +++++++++++++--
hw/vfio/cpr-legacy.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 61 insertions(+), 2 deletions(-)
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index a554683..0e02726 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -137,6 +137,8 @@ static int vfio_legacy_dma_unmap_one(const VFIOContainerBase *bcontainer,
int ret;
Error *local_err = NULL;
+ assert(!container->cpr.reused);
+
if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
bcontainer->dirty_pages_supported) {
@@ -691,8 +693,17 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
}
group_was_added = true;
- if (!vfio_listener_register(bcontainer, errp)) {
- goto fail;
+ /*
+ * If reused, register the listener later, after all state that may
+ * affect regions and mapping boundaries has been cpr load'ed. Later,
+ * the listener will invoke its callback on each flat section and call
+ * dma_map to supply the new vaddr, and the calls will match the mappings
+ * remembered by the kernel.
+ */
+ if (!cpr_reused) {
+ if (!vfio_listener_register(bcontainer, errp)) {
+ goto fail;
+ }
}
bcontainer->initialized = true;
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
index 519d772..bbcf71e 100644
--- a/hw/vfio/cpr-legacy.c
+++ b/hw/vfio/cpr-legacy.c
@@ -11,11 +11,13 @@
#include "hw/vfio/vfio-container.h"
#include "hw/vfio/vfio-cpr.h"
#include "hw/vfio/vfio-device.h"
+#include "hw/vfio/vfio-listener.h"
#include "migration/blocker.h"
#include "migration/cpr.h"
#include "migration/migration.h"
#include "migration/vmstate.h"
#include "qapi/error.h"
+#include "qemu/error-report.h"
static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
{
@@ -32,6 +34,34 @@ static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
return true;
}
+/*
+ * Set the new @vaddr for any mappings registered during cpr load.
+ * Reused is cleared thereafter.
+ */
+static int vfio_legacy_cpr_dma_map(const VFIOContainerBase *bcontainer,
+ hwaddr iova, ram_addr_t size, void *vaddr,
+ bool readonly)
+{
+ const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+ bcontainer);
+ struct vfio_iommu_type1_dma_map map = {
+ .argsz = sizeof(map),
+ .flags = VFIO_DMA_MAP_FLAG_VADDR,
+ .vaddr = (__u64)(uintptr_t)vaddr,
+ .iova = iova,
+ .size = size,
+ };
+
+ assert(container->cpr.reused);
+
+ if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)) {
+ error_report("vfio_legacy_cpr_dma_map (iova %lu, size %ld, va %p): %s",
+ iova, size, vaddr, strerror(errno));
+ return -errno;
+ }
+
+ return 0;
+}
static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
{
@@ -63,12 +93,24 @@ static int vfio_container_pre_save(void *opaque)
static int vfio_container_post_load(void *opaque, int version_id)
{
VFIOContainer *container = opaque;
+ VFIOContainerBase *bcontainer = &container->bcontainer;
VFIOGroup *group;
VFIODevice *vbasedev;
+ Error *err = NULL;
+
+ if (!vfio_listener_register(bcontainer, &err)) {
+ error_report_err(err);
+ return -1;
+ }
container->cpr.reused = false;
QLIST_FOREACH(group, &container->group_list, container_next) {
+ VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
+
+ /* Restore original dma_map function */
+ vioc->dma_map = vfio_legacy_dma_map;
+
QLIST_FOREACH(vbasedev, &group->device_list, next) {
vbasedev->cpr.reused = false;
}
@@ -80,6 +122,7 @@ static const VMStateDescription vfio_container_vmstate = {
.name = "vfio-container",
.version_id = 0,
.minimum_version_id = 0,
+ .priority = MIG_PRI_LOW, /* Must happen after devices and groups */
.pre_save = vfio_container_pre_save,
.post_load = vfio_container_post_load,
.needed = cpr_needed_for_reuse,
@@ -104,6 +147,11 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
vmstate_register(NULL, -1, &vfio_container_vmstate, container);
+ /* During incoming CPR, divert calls to dma_map. */
+ if (container->cpr.reused) {
+ VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
+ vioc->dma_map = vfio_legacy_cpr_dma_map;
+ }
return true;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 157+ messages in thread
* Re: [PATCH V3 10/42] vfio/container: restore DMA vaddr
2025-05-12 15:32 ` [PATCH V3 10/42] vfio/container: restore " Steve Sistare
@ 2025-05-15 13:42 ` Cédric Le Goater
2025-05-15 19:08 ` Steven Sistare
2025-05-22 6:37 ` Cédric Le Goater
1 sibling, 1 reply; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-15 13:42 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/12/25 17:32, Steve Sistare wrote:
> In new QEMU, do not register the memory listener at device creation time.
> Register it later, in the container post_load handler, after all vmstate
> that may affect regions and mapping boundaries has been loaded. The
> post_load registration will cause the listener to invoke its callback on
> each flat section, and the calls will match the mappings remembered by the
> kernel.
>
> The listener calls a special dma_map handler that passes the new VA of each
> section to the kernel using VFIO_DMA_MAP_FLAG_VADDR. Restore the normal
> handler at the end.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/vfio/container.c | 15 +++++++++++++--
> hw/vfio/cpr-legacy.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 61 insertions(+), 2 deletions(-)
>
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index a554683..0e02726 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -137,6 +137,8 @@ static int vfio_legacy_dma_unmap_one(const VFIOContainerBase *bcontainer,
> int ret;
> Error *local_err = NULL;
>
> + assert(!container->cpr.reused);
> +
> if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
> if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
> bcontainer->dirty_pages_supported) {
> @@ -691,8 +693,17 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
> }
> group_was_added = true;
>
> - if (!vfio_listener_register(bcontainer, errp)) {
> - goto fail;
> + /*
> + * If reused, register the listener later, after all state that may
> + * affect regions and mapping boundaries has been cpr load'ed. Later,
> + * the listener will invoke its callback on each flat section and call
> + * dma_map to supply the new vaddr, and the calls will match the mappings
> + * remembered by the kernel.
> + */
> + if (!cpr_reused) {
> + if (!vfio_listener_register(bcontainer, errp)) {
> + goto fail;
> + }
hmm, I am starting to think we should have a vfio_cpr_container_connect
routine too.
> }
>
> bcontainer->initialized = true;
> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
> index 519d772..bbcf71e 100644
> --- a/hw/vfio/cpr-legacy.c
> +++ b/hw/vfio/cpr-legacy.c
> @@ -11,11 +11,13 @@
> #include "hw/vfio/vfio-container.h"
> #include "hw/vfio/vfio-cpr.h"
> #include "hw/vfio/vfio-device.h"
> +#include "hw/vfio/vfio-listener.h"
> #include "migration/blocker.h"
> #include "migration/cpr.h"
> #include "migration/migration.h"
> #include "migration/vmstate.h"
> #include "qapi/error.h"
> +#include "qemu/error-report.h"
>
> static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
> {
> @@ -32,6 +34,34 @@ static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
> return true;
> }
>
> +/*
> + * Set the new @vaddr for any mappings registered during cpr load.
> + * Reused is cleared thereafter.
> + */
> +static int vfio_legacy_cpr_dma_map(const VFIOContainerBase *bcontainer,
> + hwaddr iova, ram_addr_t size, void *vaddr,
> + bool readonly)
> +{
> + const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> + bcontainer);
> + struct vfio_iommu_type1_dma_map map = {
> + .argsz = sizeof(map),
> + .flags = VFIO_DMA_MAP_FLAG_VADDR,
> + .vaddr = (__u64)(uintptr_t)vaddr,
> + .iova = iova,
> + .size = size,
> + };
> +
> + assert(container->cpr.reused);
> +
> + if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)) {
> + error_report("vfio_legacy_cpr_dma_map (iova %lu, size %ld, va %p): %s",
> + iova, size, vaddr, strerror(errno));
Callers should also report the error. No need to do it here.
> + return -errno;
> + }
> +
> + return 0;
> +}
>
> static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
> {
> @@ -63,12 +93,24 @@ static int vfio_container_pre_save(void *opaque)
> static int vfio_container_post_load(void *opaque, int version_id)
> {
> VFIOContainer *container = opaque;
> + VFIOContainerBase *bcontainer = &container->bcontainer;
> VFIOGroup *group;
> VFIODevice *vbasedev;
> + Error *err = NULL;
> +
> + if (!vfio_listener_register(bcontainer, &err)) {
> + error_report_err(err);
> + return -1;
> + }
>
> container->cpr.reused = false;
>
> QLIST_FOREACH(group, &container->group_list, container_next) {
> + VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
> +
> + /* Restore original dma_map function */
> + vioc->dma_map = vfio_legacy_dma_map;
> +
> QLIST_FOREACH(vbasedev, &group->device_list, next) {
> vbasedev->cpr.reused = false;
> }
> @@ -80,6 +122,7 @@ static const VMStateDescription vfio_container_vmstate = {
> .name = "vfio-container",
> .version_id = 0,
> .minimum_version_id = 0,
> + .priority = MIG_PRI_LOW, /* Must happen after devices and groups */
> .pre_save = vfio_container_pre_save,
> .post_load = vfio_container_post_load,
> .needed = cpr_needed_for_reuse,
> @@ -104,6 +147,11 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>
> vmstate_register(NULL, -1, &vfio_container_vmstate, container);
>
> + /* During incoming CPR, divert calls to dma_map. */
> + if (container->cpr.reused) {
> + VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
> + vioc->dma_map = vfio_legacy_cpr_dma_map;
You could back up the previous dma_map() handler in a static variable or,
better, under container->cpr.
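
For illustration only, a minimal sketch of the save-and-restore idea,
assuming a hypothetical saved_dma_map field under container->cpr. The types
below (SketchOps, SketchContainer) are reduced stand-ins, not the real
VFIOIOMMUClass or VFIOContainer, and the two handlers only return distinct
values so that the swap is observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct SketchContainer;

/* Hypothetical handler signature, loosely modelled on dma_map. */
typedef int (*SketchDmaMap)(struct SketchContainer *c, uint64_t iova,
                            uint64_t size, void *vaddr);

/* Hypothetical stand-in for the class that holds the handler. */
typedef struct SketchOps {
    SketchDmaMap dma_map;
} SketchOps;

typedef struct SketchContainer {
    SketchOps *ops;
    struct {
        int reused;
        SketchDmaMap saved_dma_map;   /* assumed field, not in the patch */
    } cpr;
} SketchContainer;

/* Placeholder for the normal mapping path. */
static int sketch_normal_dma_map(SketchContainer *c, uint64_t iova,
                                 uint64_t size, void *vaddr)
{
    return 1;
}

/* Placeholder for the CPR path that only updates the vaddr. */
static int sketch_cpr_dma_map(SketchContainer *c, uint64_t iova,
                              uint64_t size, void *vaddr)
{
    return 2;
}

/* Divert dma_map during incoming CPR, remembering the previous handler. */
static void sketch_cpr_divert(SketchContainer *c)
{
    assert(c->cpr.saved_dma_map == NULL);
    c->cpr.saved_dma_map = c->ops->dma_map;
    c->ops->dma_map = sketch_cpr_dma_map;
}

/* Restore whatever handler was installed before the diversion. */
static void sketch_cpr_restore(SketchContainer *c)
{
    assert(c->cpr.saved_dma_map != NULL);
    c->ops->dma_map = c->cpr.saved_dma_map;
    c->cpr.saved_dma_map = NULL;
}
```

Restoring from the saved pointer means the restore path does not need to
name the original handler.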
Thanks,
C.
> + }
> return true;
> }
>
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: [PATCH V3 10/42] vfio/container: restore DMA vaddr
2025-05-15 13:42 ` Cédric Le Goater
@ 2025-05-15 19:08 ` Steven Sistare
2025-05-19 13:32 ` Cédric Le Goater
0 siblings, 1 reply; 157+ messages in thread
From: Steven Sistare @ 2025-05-15 19:08 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/15/2025 9:42 AM, Cédric Le Goater wrote:
> On 5/12/25 17:32, Steve Sistare wrote:
>> In new QEMU, do not register the memory listener at device creation time.
>> Register it later, in the container post_load handler, after all vmstate
>> that may affect regions and mapping boundaries has been loaded. The
>> post_load registration will cause the listener to invoke its callback on
>> each flat section, and the calls will match the mappings remembered by the
>> kernel.
>>
>> The listener calls a special dma_map handler that passes the new VA of each
>> section to the kernel using VFIO_DMA_MAP_FLAG_VADDR. Restore the normal
>> handler at the end.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> hw/vfio/container.c | 15 +++++++++++++--
>> hw/vfio/cpr-legacy.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 61 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index a554683..0e02726 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -137,6 +137,8 @@ static int vfio_legacy_dma_unmap_one(const VFIOContainerBase *bcontainer,
>> int ret;
>> Error *local_err = NULL;
>> + assert(!container->cpr.reused);
>> +
>> if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
>> if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
>> bcontainer->dirty_pages_supported) {
>> @@ -691,8 +693,17 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>> }
>> group_was_added = true;
>> - if (!vfio_listener_register(bcontainer, errp)) {
>> - goto fail;
>> + /*
>> + * If reused, register the listener later, after all state that may
>> + * affect regions and mapping boundaries has been cpr load'ed. Later,
>> + * the listener will invoke its callback on each flat section and call
>> + * dma_map to supply the new vaddr, and the calls will match the mappings
>> + * remembered by the kernel.
>> + */
>> + if (!cpr_reused) {
>> + if (!vfio_listener_register(bcontainer, errp)) {
>> + goto fail;
>> + }
>
> hmm, I am starting to think we should have a vfio_cpr_container_connect
> routine too.
I think that would obscure rather than clarify the code, since the normal
non-cpr action of calling vfio_listener_register would be buried in a
cpr flavored function name.
>> }
>> bcontainer->initialized = true;
>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>> index 519d772..bbcf71e 100644
>> --- a/hw/vfio/cpr-legacy.c
>> +++ b/hw/vfio/cpr-legacy.c
>> @@ -11,11 +11,13 @@
>> #include "hw/vfio/vfio-container.h"
>> #include "hw/vfio/vfio-cpr.h"
>> #include "hw/vfio/vfio-device.h"
>> +#include "hw/vfio/vfio-listener.h"
>> #include "migration/blocker.h"
>> #include "migration/cpr.h"
>> #include "migration/migration.h"
>> #include "migration/vmstate.h"
>> #include "qapi/error.h"
>> +#include "qemu/error-report.h"
>> static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
>> {
>> @@ -32,6 +34,34 @@ static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
>> return true;
>> }
>> +/*
>> + * Set the new @vaddr for any mappings registered during cpr load.
>> + * Reused is cleared thereafter.
>> + */
>> +static int vfio_legacy_cpr_dma_map(const VFIOContainerBase *bcontainer,
>> + hwaddr iova, ram_addr_t size, void *vaddr,
>> + bool readonly)
>> +{
>> + const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>> + bcontainer);
>> + struct vfio_iommu_type1_dma_map map = {
>> + .argsz = sizeof(map),
>> + .flags = VFIO_DMA_MAP_FLAG_VADDR,
>> + .vaddr = (__u64)(uintptr_t)vaddr,
>> + .iova = iova,
>> + .size = size,
>> + };
>> +
>> + assert(container->cpr.reused);
>> +
>> + if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)) {
>> + error_report("vfio_legacy_cpr_dma_map (iova %lu, size %ld, va %p): %s",
>> + iova, size, vaddr, strerror(errno));
>
> Callers should also report the error. No need to do it here.
This function has the same signature as the dma_map class method,
which does not return an error message. Its existing implementations
use error_report.
>> + return -errno;
>> + }
>> +
>> + return 0;
>> +}
>> static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
>> {
>> @@ -63,12 +93,24 @@ static int vfio_container_pre_save(void *opaque)
>> static int vfio_container_post_load(void *opaque, int version_id)
>> {
>> VFIOContainer *container = opaque;
>> + VFIOContainerBase *bcontainer = &container->bcontainer;
>> VFIOGroup *group;
>> VFIODevice *vbasedev;
>> + Error *err = NULL;
>> +
>> + if (!vfio_listener_register(bcontainer, &err)) {
>> + error_report_err(err);
>> + return -1;
>> + }
>> container->cpr.reused = false;
>> QLIST_FOREACH(group, &container->group_list, container_next) {
>> + VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>> +
>> + /* Restore original dma_map function */
>> + vioc->dma_map = vfio_legacy_dma_map;
>> +
>> QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> vbasedev->cpr.reused = false;
>> }
>> @@ -80,6 +122,7 @@ static const VMStateDescription vfio_container_vmstate = {
>> .name = "vfio-container",
>> .version_id = 0,
>> .minimum_version_id = 0,
>> + .priority = MIG_PRI_LOW, /* Must happen after devices and groups */
>> .pre_save = vfio_container_pre_save,
>> .post_load = vfio_container_post_load,
>> .needed = cpr_needed_for_reuse,
>> @@ -104,6 +147,11 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>> vmstate_register(NULL, -1, &vfio_container_vmstate, container);
>> + /* During incoming CPR, divert calls to dma_map. */
>> + if (container->cpr.reused) {
>> + VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>> + vioc->dma_map = vfio_legacy_cpr_dma_map;
>
> You could back up the previous dma_map() handler in a static variable or,
> better, under container->cpr.
OK.
- Steve
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: [PATCH V3 10/42] vfio/container: restore DMA vaddr
2025-05-15 19:08 ` Steven Sistare
@ 2025-05-19 13:32 ` Cédric Le Goater
2025-05-19 16:33 ` Steven Sistare
0 siblings, 1 reply; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-19 13:32 UTC (permalink / raw)
To: Steven Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/15/25 21:08, Steven Sistare wrote:
> On 5/15/2025 9:42 AM, Cédric Le Goater wrote:
>> On 5/12/25 17:32, Steve Sistare wrote:
>>> In new QEMU, do not register the memory listener at device creation time.
>>> Register it later, in the container post_load handler, after all vmstate
>>> that may affect regions and mapping boundaries has been loaded. The
>>> post_load registration will cause the listener to invoke its callback on
>>> each flat section, and the calls will match the mappings remembered by the
>>> kernel.
>>>
>>> The listener calls a special dma_map handler that passes the new VA of each
>>> section to the kernel using VFIO_DMA_MAP_FLAG_VADDR. Restore the normal
>>> handler at the end.
>>>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> ---
>>> hw/vfio/container.c | 15 +++++++++++++--
>>> hw/vfio/cpr-legacy.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>>> 2 files changed, 61 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>> index a554683..0e02726 100644
>>> --- a/hw/vfio/container.c
>>> +++ b/hw/vfio/container.c
>>> @@ -137,6 +137,8 @@ static int vfio_legacy_dma_unmap_one(const VFIOContainerBase *bcontainer,
>>> int ret;
>>> Error *local_err = NULL;
>>> + assert(!container->cpr.reused);
>>> +
>>> if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
>>> if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
>>> bcontainer->dirty_pages_supported) {
>>> @@ -691,8 +693,17 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>>> }
>>> group_was_added = true;
>>> - if (!vfio_listener_register(bcontainer, errp)) {
>>> - goto fail;
>>> + /*
>>> + * If reused, register the listener later, after all state that may
>>> + * affect regions and mapping boundaries has been cpr load'ed. Later,
>>> + * the listener will invoke its callback on each flat section and call
>>> + * dma_map to supply the new vaddr, and the calls will match the mappings
>>> + * remembered by the kernel.
>>> + */
>>> + if (!cpr_reused) {
>>> + if (!vfio_listener_register(bcontainer, errp)) {
>>> + goto fail;
>>> + }
>>
>> hmm, I am starting to think we should have a vfio_cpr_container_connect
>> routine too.
>
> I think that would obscure rather than clarify the code, since the normal
> non-cpr action of calling vfio_listener_register would be buried in a
> cpr flavored function name.
>
>>> }
>>> bcontainer->initialized = true;
>>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>>> index 519d772..bbcf71e 100644
>>> --- a/hw/vfio/cpr-legacy.c
>>> +++ b/hw/vfio/cpr-legacy.c
>>> @@ -11,11 +11,13 @@
>>> #include "hw/vfio/vfio-container.h"
>>> #include "hw/vfio/vfio-cpr.h"
>>> #include "hw/vfio/vfio-device.h"
>>> +#include "hw/vfio/vfio-listener.h"
>>> #include "migration/blocker.h"
>>> #include "migration/cpr.h"
>>> #include "migration/migration.h"
>>> #include "migration/vmstate.h"
>>> #include "qapi/error.h"
>>> +#include "qemu/error-report.h"
>>> static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
>>> {
>>> @@ -32,6 +34,34 @@ static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
>>> return true;
>>> }
>>> +/*
>>> + * Set the new @vaddr for any mappings registered during cpr load.
>>> + * Reused is cleared thereafter.
>>> + */
>>> +static int vfio_legacy_cpr_dma_map(const VFIOContainerBase *bcontainer,
>>> + hwaddr iova, ram_addr_t size, void *vaddr,
>>> + bool readonly)
>>> +{
>>> + const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>>> + bcontainer);
>>> + struct vfio_iommu_type1_dma_map map = {
>>> + .argsz = sizeof(map),
>>> + .flags = VFIO_DMA_MAP_FLAG_VADDR,
>>> + .vaddr = (__u64)(uintptr_t)vaddr,
>>> + .iova = iova,
>>> + .size = size,
>>> + };
>>> +
>>> + assert(container->cpr.reused);
>>> +
>>> + if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)) {
>>> + error_report("vfio_legacy_cpr_dma_map (iova %lu, size %ld, va %p): %s",
>>> + iova, size, vaddr, strerror(errno));
>>
>> Callers should also report the error. No need to do it here.
>
> This function has the same signature as the dma_map class method,
> which does not return an error message. Its existing implementations
> use error_report.
The backend .dma_map handlers, vfio_legacy_dma_map() and
iommufd_backend_map_dma(), don't report errors, and neither does
vfio_container_dma_map(). Its callers, vfio_iommu_map_notify() and
vfio_listener_region_add(), are the ones that report errors.
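
For illustration only, a minimal sketch of that split: the backend handler
returns a negative errno value and stays silent, and the caller formats the
message. The function names and the simulate_errno parameter are
hypothetical; no real ioctl is issued and this is not the QEMU code path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical backend handler: returns 0 or a negative errno value and
 * never prints. A nonzero simulate_errno stands in for a failed ioctl().
 */
static int sketch_backend_dma_map(int simulate_errno)
{
    if (simulate_errno) {
        errno = simulate_errno;
        /* Read errno once, before any other library call can change it. */
        return -errno;
    }
    return 0;
}

/*
 * Hypothetical caller: the only place that turns the failure into text.
 * The message goes to a buffer here so that the result can be checked.
 */
static int sketch_region_add(int simulate_errno, char *buf, size_t len)
{
    int ret = sketch_backend_dma_map(simulate_errno);

    if (ret) {
        snprintf(buf, len, "dma_map failed: %s", strerror(-ret));
    }
    return ret;
}
```

Keeping the report in the caller avoids printing the same failure twice
when both layers would otherwise log it.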
Thanks,
C.
>
>>> + return -errno;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
>>> {
>>> @@ -63,12 +93,24 @@ static int vfio_container_pre_save(void *opaque)
>>> static int vfio_container_post_load(void *opaque, int version_id)
>>> {
>>> VFIOContainer *container = opaque;
>>> + VFIOContainerBase *bcontainer = &container->bcontainer;
>>> VFIOGroup *group;
>>> VFIODevice *vbasedev;
>>> + Error *err = NULL;
>>> +
>>> + if (!vfio_listener_register(bcontainer, &err)) {
>>> + error_report_err(err);
>>> + return -1;
>>> + }
>>> container->cpr.reused = false;
>>> QLIST_FOREACH(group, &container->group_list, container_next) {
>>> + VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>>> +
>>> + /* Restore original dma_map function */
>>> + vioc->dma_map = vfio_legacy_dma_map;
>>> +
>>> QLIST_FOREACH(vbasedev, &group->device_list, next) {
>>> vbasedev->cpr.reused = false;
>>> }
>>> @@ -80,6 +122,7 @@ static const VMStateDescription vfio_container_vmstate = {
>>> .name = "vfio-container",
>>> .version_id = 0,
>>> .minimum_version_id = 0,
>>> + .priority = MIG_PRI_LOW, /* Must happen after devices and groups */
>>> .pre_save = vfio_container_pre_save,
>>> .post_load = vfio_container_post_load,
>>> .needed = cpr_needed_for_reuse,
>>> @@ -104,6 +147,11 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>>> vmstate_register(NULL, -1, &vfio_container_vmstate, container);
>>> + /* During incoming CPR, divert calls to dma_map. */
>>> + if (container->cpr.reused) {
>>> + VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>>> + vioc->dma_map = vfio_legacy_cpr_dma_map;
>>
>> You could backup the previous dma_map() handler in a static variable or,
>> better, under container->cpr.
>
> OK.
>
> - Steve
* Re: [PATCH V3 10/42] vfio/container: restore DMA vaddr
2025-05-19 13:32 ` Cédric Le Goater
@ 2025-05-19 16:33 ` Steven Sistare
0 siblings, 0 replies; 157+ messages in thread
From: Steven Sistare @ 2025-05-19 16:33 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/19/2025 9:32 AM, Cédric Le Goater wrote:
> On 5/15/25 21:08, Steven Sistare wrote:
>> On 5/15/2025 9:42 AM, Cédric Le Goater wrote:
>>> On 5/12/25 17:32, Steve Sistare wrote:
>>>> In new QEMU, do not register the memory listener at device creation time.
>>>> Register it later, in the container post_load handler, after all vmstate
>>>> that may affect regions and mapping boundaries has been loaded. The
>>>> post_load registration will cause the listener to invoke its callback on
>>>> each flat section, and the calls will match the mappings remembered by the
>>>> kernel.
>>>>
>>>> The listener calls a special dma_map handler that passes the new VA of each
>>>> section to the kernel using VFIO_DMA_MAP_FLAG_VADDR. Restore the normal
>>>> handler at the end.
>>>>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>> ---
>>>> hw/vfio/container.c | 15 +++++++++++++--
>>>> hw/vfio/cpr-legacy.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>> 2 files changed, 61 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>>> index a554683..0e02726 100644
>>>> --- a/hw/vfio/container.c
>>>> +++ b/hw/vfio/container.c
>>>> @@ -137,6 +137,8 @@ static int vfio_legacy_dma_unmap_one(const VFIOContainerBase *bcontainer,
>>>> int ret;
>>>> Error *local_err = NULL;
>>>> + assert(!container->cpr.reused);
>>>> +
>>>> if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
>>>> if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
>>>> bcontainer->dirty_pages_supported) {
>>>> @@ -691,8 +693,17 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>>>> }
>>>> group_was_added = true;
>>>> - if (!vfio_listener_register(bcontainer, errp)) {
>>>> - goto fail;
>>>> + /*
>>>> + * If reused, register the listener later, after all state that may
>>>> + * affect regions and mapping boundaries has been cpr load'ed. Later,
>>>> + * the listener will invoke its callback on each flat section and call
>>>> + * dma_map to supply the new vaddr, and the calls will match the mappings
>>>> + * remembered by the kernel.
>>>> + */
>>>> + if (!cpr_reused) {
>>>> + if (!vfio_listener_register(bcontainer, errp)) {
>>>> + goto fail;
>>>> + }
>>>
>>> hmm, I am starting to think we should have a vfio_cpr_container_connect
>>> routine too.
>>
>> I think that would obscure rather than clarify the code, since the normal
>> non-cpr action of calling vfio_listener_register would be buried in a
>> cpr flavored function name.
>>
>>>> }
>>>> bcontainer->initialized = true;
>>>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>>>> index 519d772..bbcf71e 100644
>>>> --- a/hw/vfio/cpr-legacy.c
>>>> +++ b/hw/vfio/cpr-legacy.c
>>>> @@ -11,11 +11,13 @@
>>>> #include "hw/vfio/vfio-container.h"
>>>> #include "hw/vfio/vfio-cpr.h"
>>>> #include "hw/vfio/vfio-device.h"
>>>> +#include "hw/vfio/vfio-listener.h"
>>>> #include "migration/blocker.h"
>>>> #include "migration/cpr.h"
>>>> #include "migration/migration.h"
>>>> #include "migration/vmstate.h"
>>>> #include "qapi/error.h"
>>>> +#include "qemu/error-report.h"
>>>> static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
>>>> {
>>>> @@ -32,6 +34,34 @@ static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
>>>> return true;
>>>> }
>>>> +/*
>>>> + * Set the new @vaddr for any mappings registered during cpr load.
>>>> + * Reused is cleared thereafter.
>>>> + */
>>>> +static int vfio_legacy_cpr_dma_map(const VFIOContainerBase *bcontainer,
>>>> + hwaddr iova, ram_addr_t size, void *vaddr,
>>>> + bool readonly)
>>>> +{
>>>> + const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>>>> + bcontainer);
>>>> + struct vfio_iommu_type1_dma_map map = {
>>>> + .argsz = sizeof(map),
>>>> + .flags = VFIO_DMA_MAP_FLAG_VADDR,
>>>> + .vaddr = (__u64)(uintptr_t)vaddr,
>>>> + .iova = iova,
>>>> + .size = size,
>>>> + };
>>>> +
>>>> + assert(container->cpr.reused);
>>>> +
>>>> + if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)) {
>>>> + error_report("vfio_legacy_cpr_dma_map (iova %lu, size %ld, va %p): %s",
>>>> + iova, size, vaddr, strerror(errno));
>>>
>>> Callers should also report the error. No need to do it here.
>>
>> This function has the same signature as the dma_map class method,
>> which does not return an error message. Its existing implementations
>> use error_report.
>
> backends .dma_map handlers : vfio_legacy_dma_map(), iommufd_backend_map_dma()
> don't report errors. vfio_container_dma_map() doesn't either.
>
> callers of vfio_container_dma_map() : vfio_iommu_map_notify(),
> vfio_listener_region_add() report errors.
OK, I misunderstood your suggestion.
I will drop the error_report and just return the errno.
- Steve
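
The convention agreed above (the backend handler returns a negative errno
silently, and the caller reports it) can be sketched as follows. This is a
minimal standalone illustration, not QEMU code: sketch_backend_dma_map and
sketch_region_add are hypothetical stand-ins for a .dma_map handler and for a
caller such as vfio_listener_region_add(), and the failure is simulated
rather than coming from a real ioctl.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a backend .dma_map handler.
 * It returns 0 on success or a negative errno, and prints nothing.
 */
static int sketch_backend_dma_map(unsigned long long iova,
                                  unsigned long long size)
{
    (void)iova;
    if (size == 0) {
        /* Simulated failure; real code would return -errno after ioctl(). */
        return -EINVAL;
    }
    return 0;
}

/*
 * Hypothetical caller. It owns the error message, so the failure is
 * reported exactly once, with the caller's context.
 */
static int sketch_region_add(unsigned long long iova,
                             unsigned long long size)
{
    int ret = sketch_backend_dma_map(iova, size);

    if (ret) {
        fprintf(stderr, "sketch_region_add(0x%llx, 0x%llx) failed: %s\n",
                iova, size, strerror(-ret));
    }
    return ret;
}
```

Keeping the handler silent avoids a duplicate message when the caller also
reports, which is the situation the review comment points out.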
>>>> + return -errno;
>>>> + }
>>>> +
>>>> + return 0;
>>>> +}
>>>> static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
>>>> {
>>>> @@ -63,12 +93,24 @@ static int vfio_container_pre_save(void *opaque)
>>>> static int vfio_container_post_load(void *opaque, int version_id)
>>>> {
>>>> VFIOContainer *container = opaque;
>>>> + VFIOContainerBase *bcontainer = &container->bcontainer;
>>>> VFIOGroup *group;
>>>> VFIODevice *vbasedev;
>>>> + Error *err = NULL;
>>>> +
>>>> + if (!vfio_listener_register(bcontainer, &err)) {
>>>> + error_report_err(err);
>>>> + return -1;
>>>> + }
>>>> container->cpr.reused = false;
>>>> QLIST_FOREACH(group, &container->group_list, container_next) {
>>>> + VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>>>> +
>>>> + /* Restore original dma_map function */
>>>> + vioc->dma_map = vfio_legacy_dma_map;
>>>> +
>>>> QLIST_FOREACH(vbasedev, &group->device_list, next) {
>>>> vbasedev->cpr.reused = false;
>>>> }
>>>> @@ -80,6 +122,7 @@ static const VMStateDescription vfio_container_vmstate = {
>>>> .name = "vfio-container",
>>>> .version_id = 0,
>>>> .minimum_version_id = 0,
>>>> + .priority = MIG_PRI_LOW, /* Must happen after devices and groups */
>>>> .pre_save = vfio_container_pre_save,
>>>> .post_load = vfio_container_post_load,
>>>> .needed = cpr_needed_for_reuse,
>>>> @@ -104,6 +147,11 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>>>> vmstate_register(NULL, -1, &vfio_container_vmstate, container);
>>>> + /* During incoming CPR, divert calls to dma_map. */
>>>> + if (container->cpr.reused) {
>>>> + VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>>>> + vioc->dma_map = vfio_legacy_cpr_dma_map;
>>>
>>> You could backup the previous dma_map() handler in a static variable or,
>>> better, under container->cpr.
>>
>> OK.
>>
>> - Steve
>
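
The suggestion to back up the previous dma_map() handler under container->cpr
could look roughly like the sketch below. All names here are hypothetical:
struct sketch_class stands in for VFIOIOMMUClass, and saved_dma_map for a new
field that the actual patch would have to add. The point is only that the
restore path no longer hard-codes vfio_legacy_dma_map.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sketch_dma_map_fn)(void *opaque);

/* Hypothetical stand-in for the IOMMU class that holds the handler. */
struct sketch_class {
    sketch_dma_map_fn dma_map;
};

/* Hypothetical container; saved_dma_map plays the role of a cpr field. */
struct sketch_container {
    struct sketch_class *klass;
    sketch_dma_map_fn saved_dma_map;
};

static int sketch_normal_map(void *opaque) { (void)opaque; return 1; }
static int sketch_cpr_map(void *opaque) { (void)opaque; return 2; }

/* Divert dma_map for incoming CPR, remembering whatever was installed. */
static void sketch_divert(struct sketch_container *c)
{
    assert(c->saved_dma_map == NULL);
    c->saved_dma_map = c->klass->dma_map;
    c->klass->dma_map = sketch_cpr_map;
}

/* Restore the remembered handler instead of naming a specific function. */
static void sketch_restore(struct sketch_container *c)
{
    assert(c->saved_dma_map != NULL);
    c->klass->dma_map = c->saved_dma_map;
    c->saved_dma_map = NULL;
}
```

In this sketch the handler lives in a shared class object while the saved
pointer is per container, so it assumes only one container diverts the
handler at a time.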
* Re: [PATCH V3 10/42] vfio/container: restore DMA vaddr
2025-05-12 15:32 ` [PATCH V3 10/42] vfio/container: restore " Steve Sistare
2025-05-15 13:42 ` Cédric Le Goater
@ 2025-05-22 6:37 ` Cédric Le Goater
2025-05-22 14:00 ` Steven Sistare
1 sibling, 1 reply; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-22 6:37 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/12/25 17:32, Steve Sistare wrote:
> In new QEMU, do not register the memory listener at device creation time.
> Register it later, in the container post_load handler, after all vmstate
> that may affect regions and mapping boundaries has been loaded. The
> post_load registration will cause the listener to invoke its callback on
> each flat section, and the calls will match the mappings remembered by the
> kernel.
>
> The listener calls a special dma_map handler that passes the new VA of each
> section to the kernel using VFIO_DMA_MAP_FLAG_VADDR. Restore the normal
> handler at the end.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/vfio/container.c | 15 +++++++++++++--
> hw/vfio/cpr-legacy.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 61 insertions(+), 2 deletions(-)
>
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index a554683..0e02726 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -137,6 +137,8 @@ static int vfio_legacy_dma_unmap_one(const VFIOContainerBase *bcontainer,
> int ret;
> Error *local_err = NULL;
>
> + assert(!container->cpr.reused);
assert -> g_assert
this can be called at runtime, which would mean crashing QEMU in case
of error. Doing an error_report() call is more friendly.
Thanks,
C.
> +
> if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
> if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
> bcontainer->dirty_pages_supported) {
> @@ -691,8 +693,17 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
> }
> group_was_added = true;
>
> - if (!vfio_listener_register(bcontainer, errp)) {
> - goto fail;
> + /*
> + * If reused, register the listener later, after all state that may
> + * affect regions and mapping boundaries has been cpr load'ed. Later,
> + * the listener will invoke its callback on each flat section and call
> + * dma_map to supply the new vaddr, and the calls will match the mappings
> + * remembered by the kernel.
> + */
> + if (!cpr_reused) {
> + if (!vfio_listener_register(bcontainer, errp)) {
> + goto fail;
> + }
> }
>
> bcontainer->initialized = true;
> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
> index 519d772..bbcf71e 100644
> --- a/hw/vfio/cpr-legacy.c
> +++ b/hw/vfio/cpr-legacy.c
> @@ -11,11 +11,13 @@
> #include "hw/vfio/vfio-container.h"
> #include "hw/vfio/vfio-cpr.h"
> #include "hw/vfio/vfio-device.h"
> +#include "hw/vfio/vfio-listener.h"
> #include "migration/blocker.h"
> #include "migration/cpr.h"
> #include "migration/migration.h"
> #include "migration/vmstate.h"
> #include "qapi/error.h"
> +#include "qemu/error-report.h"
>
> static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
> {
> @@ -32,6 +34,34 @@ static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
> return true;
> }
>
> +/*
> + * Set the new @vaddr for any mappings registered during cpr load.
> + * Reused is cleared thereafter.
> + */
> +static int vfio_legacy_cpr_dma_map(const VFIOContainerBase *bcontainer,
> + hwaddr iova, ram_addr_t size, void *vaddr,
> + bool readonly)
> +{
> + const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> + bcontainer);
> + struct vfio_iommu_type1_dma_map map = {
> + .argsz = sizeof(map),
> + .flags = VFIO_DMA_MAP_FLAG_VADDR,
> + .vaddr = (__u64)(uintptr_t)vaddr,
> + .iova = iova,
> + .size = size,
> + };
> +
> + assert(container->cpr.reused);
> +
> + if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)) {
> + error_report("vfio_legacy_cpr_dma_map (iova %lu, size %ld, va %p): %s",
> + iova, size, vaddr, strerror(errno));
> + return -errno;
> + }
> +
> + return 0;
> +}
>
> static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
> {
> @@ -63,12 +93,24 @@ static int vfio_container_pre_save(void *opaque)
> static int vfio_container_post_load(void *opaque, int version_id)
> {
> VFIOContainer *container = opaque;
> + VFIOContainerBase *bcontainer = &container->bcontainer;
> VFIOGroup *group;
> VFIODevice *vbasedev;
> + Error *err = NULL;
> +
> + if (!vfio_listener_register(bcontainer, &err)) {
> + error_report_err(err);
> + return -1;
> + }
>
> container->cpr.reused = false;
>
> QLIST_FOREACH(group, &container->group_list, container_next) {
> + VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
> +
> + /* Restore original dma_map function */
> + vioc->dma_map = vfio_legacy_dma_map;
> +
> QLIST_FOREACH(vbasedev, &group->device_list, next) {
> vbasedev->cpr.reused = false;
> }
> @@ -80,6 +122,7 @@ static const VMStateDescription vfio_container_vmstate = {
> .name = "vfio-container",
> .version_id = 0,
> .minimum_version_id = 0,
> + .priority = MIG_PRI_LOW, /* Must happen after devices and groups */
> .pre_save = vfio_container_pre_save,
> .post_load = vfio_container_post_load,
> .needed = cpr_needed_for_reuse,
> @@ -104,6 +147,11 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>
> vmstate_register(NULL, -1, &vfio_container_vmstate, container);
>
> + /* During incoming CPR, divert calls to dma_map. */
> + if (container->cpr.reused) {
> + VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
> + vioc->dma_map = vfio_legacy_cpr_dma_map;
> + }
> return true;
> }
>
* Re: [PATCH V3 10/42] vfio/container: restore DMA vaddr
2025-05-22 6:37 ` Cédric Le Goater
@ 2025-05-22 14:00 ` Steven Sistare
0 siblings, 0 replies; 157+ messages in thread
From: Steven Sistare @ 2025-05-22 14:00 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/22/2025 2:37 AM, Cédric Le Goater wrote:
> On 5/12/25 17:32, Steve Sistare wrote:
>> In new QEMU, do not register the memory listener at device creation time.
>> Register it later, in the container post_load handler, after all vmstate
>> that may affect regions and mapping boundaries has been loaded. The
>> post_load registration will cause the listener to invoke its callback on
>> each flat section, and the calls will match the mappings remembered by the
>> kernel.
>>
>> The listener calls a special dma_map handler that passes the new VA of each
>> section to the kernel using VFIO_DMA_MAP_FLAG_VADDR. Restore the normal
>> handler at the end.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> hw/vfio/container.c | 15 +++++++++++++--
>> hw/vfio/cpr-legacy.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 61 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index a554683..0e02726 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -137,6 +137,8 @@ static int vfio_legacy_dma_unmap_one(const VFIOContainerBase *bcontainer,
>> int ret;
>> Error *local_err = NULL;
>> + assert(!container->cpr.reused);
>
> assert -> g_assert
will do.
> this can be called at runtime, which would mean crashing QEMU in case
> of error. Doing an error_report() call is more friendly.
It is an internal error if this assertion is hit, so the state of the system
cannot be trusted. Hence assert rather than error_report and attempt to recover.
- Steve
>> +
>> if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
>> if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
>> bcontainer->dirty_pages_supported) {
>> @@ -691,8 +693,17 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>> }
>> group_was_added = true;
>> - if (!vfio_listener_register(bcontainer, errp)) {
>> - goto fail;
>> + /*
>> + * If reused, register the listener later, after all state that may
>> + * affect regions and mapping boundaries has been cpr load'ed. Later,
>> + * the listener will invoke its callback on each flat section and call
>> + * dma_map to supply the new vaddr, and the calls will match the mappings
>> + * remembered by the kernel.
>> + */
>> + if (!cpr_reused) {
>> + if (!vfio_listener_register(bcontainer, errp)) {
>> + goto fail;
>> + }
>> }
>> bcontainer->initialized = true;
>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>> index 519d772..bbcf71e 100644
>> --- a/hw/vfio/cpr-legacy.c
>> +++ b/hw/vfio/cpr-legacy.c
>> @@ -11,11 +11,13 @@
>> #include "hw/vfio/vfio-container.h"
>> #include "hw/vfio/vfio-cpr.h"
>> #include "hw/vfio/vfio-device.h"
>> +#include "hw/vfio/vfio-listener.h"
>> #include "migration/blocker.h"
>> #include "migration/cpr.h"
>> #include "migration/migration.h"
>> #include "migration/vmstate.h"
>> #include "qapi/error.h"
>> +#include "qemu/error-report.h"
>> static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
>> {
>> @@ -32,6 +34,34 @@ static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
>> return true;
>> }
>> +/*
>> + * Set the new @vaddr for any mappings registered during cpr load.
>> + * Reused is cleared thereafter.
>> + */
>> +static int vfio_legacy_cpr_dma_map(const VFIOContainerBase *bcontainer,
>> + hwaddr iova, ram_addr_t size, void *vaddr,
>> + bool readonly)
>> +{
>> + const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>> + bcontainer);
>> + struct vfio_iommu_type1_dma_map map = {
>> + .argsz = sizeof(map),
>> + .flags = VFIO_DMA_MAP_FLAG_VADDR,
>> + .vaddr = (__u64)(uintptr_t)vaddr,
>> + .iova = iova,
>> + .size = size,
>> + };
>> +
>> + assert(container->cpr.reused);
>> +
>> + if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)) {
>> + error_report("vfio_legacy_cpr_dma_map (iova %lu, size %ld, va %p): %s",
>> + iova, size, vaddr, strerror(errno));
>> + return -errno;
>> + }
>> +
>> + return 0;
>> +}
>> static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
>> {
>> @@ -63,12 +93,24 @@ static int vfio_container_pre_save(void *opaque)
>> static int vfio_container_post_load(void *opaque, int version_id)
>> {
>> VFIOContainer *container = opaque;
>> + VFIOContainerBase *bcontainer = &container->bcontainer;
>> VFIOGroup *group;
>> VFIODevice *vbasedev;
>> + Error *err = NULL;
>> +
>> + if (!vfio_listener_register(bcontainer, &err)) {
>> + error_report_err(err);
>> + return -1;
>> + }
>> container->cpr.reused = false;
>> QLIST_FOREACH(group, &container->group_list, container_next) {
>> + VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>> +
>> + /* Restore original dma_map function */
>> + vioc->dma_map = vfio_legacy_dma_map;
>> +
>> QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> vbasedev->cpr.reused = false;
>> }
>> @@ -80,6 +122,7 @@ static const VMStateDescription vfio_container_vmstate = {
>> .name = "vfio-container",
>> .version_id = 0,
>> .minimum_version_id = 0,
>> + .priority = MIG_PRI_LOW, /* Must happen after devices and groups */
>> .pre_save = vfio_container_pre_save,
>> .post_load = vfio_container_post_load,
>> .needed = cpr_needed_for_reuse,
>> @@ -104,6 +147,11 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>> vmstate_register(NULL, -1, &vfio_container_vmstate, container);
>> + /* During incoming CPR, divert calls to dma_map. */
>> + if (container->cpr.reused) {
>> + VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>> + vioc->dma_map = vfio_legacy_cpr_dma_map;
>> + }
>> return true;
>> }
>
* [PATCH V3 11/42] vfio/container: mdev cpr blocker
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
` (9 preceding siblings ...)
2025-05-12 15:32 ` [PATCH V3 10/42] vfio/container: restore " Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-16 8:16 ` Cédric Le Goater
2025-05-12 15:32 ` [PATCH V3 12/42] vfio/container: recover from unmap-all-vaddr failure Steve Sistare
` (31 subsequent siblings)
42 siblings, 1 reply; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
During CPR, after VFIO_DMA_UNMAP_FLAG_VADDR, the vaddr is temporarily
invalid, so mediated devices cannot be supported. Add a blocker for them.
This restriction will not apply to iommufd containers when CPR is added
for them in a future patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/vfio/container.c | 8 ++++++++
include/hw/vfio/vfio-cpr.h | 1 +
2 files changed, 9 insertions(+)
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 0e02726..562e3bd 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -995,6 +995,13 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
goto device_put_exit;
}
+ if (vbasedev->mdev) {
+ error_setg(&vbasedev->cpr.mdev_blocker,
+ "CPR does not support vfio mdev %s", vbasedev->name);
+ migrate_add_blocker_modes(&vbasedev->cpr.mdev_blocker, &error_fatal,
+ MIG_MODE_CPR_TRANSFER, -1);
+ }
+
return true;
device_put_exit:
@@ -1012,6 +1019,7 @@ static void vfio_legacy_detach_device(VFIODevice *vbasedev)
vfio_device_unprepare(vbasedev);
+ migrate_del_blocker(&vbasedev->cpr.mdev_blocker);
object_unref(vbasedev->hiod);
vfio_device_put(vbasedev);
vfio_group_put(group);
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index 1c4f070..0fc7ab2 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -18,6 +18,7 @@ typedef struct VFIOContainerCPR {
typedef struct VFIODeviceCPR {
bool reused;
+ Error *mdev_blocker;
} VFIODeviceCPR;
struct VFIOContainer;
--
1.8.3.1
* Re: [PATCH V3 11/42] vfio/container: mdev cpr blocker
2025-05-12 15:32 ` [PATCH V3 11/42] vfio/container: mdev cpr blocker Steve Sistare
@ 2025-05-16 8:16 ` Cédric Le Goater
0 siblings, 0 replies; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-16 8:16 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/12/25 17:32, Steve Sistare wrote:
> During CPR, after VFIO_DMA_UNMAP_FLAG_VADDR, the vaddr is temporarily
> invalid, so mediated devices cannot be supported. Add a blocker for them.
> This restriction will not apply to iommufd containers when CPR is added
> for them in a future patch.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> hw/vfio/container.c | 8 ++++++++
> include/hw/vfio/vfio-cpr.h | 1 +
> 2 files changed, 9 insertions(+)
>
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 0e02726..562e3bd 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -995,6 +995,13 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
> goto device_put_exit;
> }
>
> + if (vbasedev->mdev) {
> + error_setg(&vbasedev->cpr.mdev_blocker,
> + "CPR does not support vfio mdev %s", vbasedev->name);
> + migrate_add_blocker_modes(&vbasedev->cpr.mdev_blocker, &error_fatal,
> + MIG_MODE_CPR_TRANSFER, -1);
> + }
> +
> return true;
>
> device_put_exit:
> @@ -1012,6 +1019,7 @@ static void vfio_legacy_detach_device(VFIODevice *vbasedev)
>
> vfio_device_unprepare(vbasedev);
>
> + migrate_del_blocker(&vbasedev->cpr.mdev_blocker);
> object_unref(vbasedev->hiod);
> vfio_device_put(vbasedev);
> vfio_group_put(group);
> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
> index 1c4f070..0fc7ab2 100644
> --- a/include/hw/vfio/vfio-cpr.h
> +++ b/include/hw/vfio/vfio-cpr.h
> @@ -18,6 +18,7 @@ typedef struct VFIOContainerCPR {
>
> typedef struct VFIODeviceCPR {
> bool reused;
> + Error *mdev_blocker;
> } VFIODeviceCPR;
>
> struct VFIOContainer;
* [PATCH V3 12/42] vfio/container: recover from unmap-all-vaddr failure
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
` (10 preceding siblings ...)
2025-05-12 15:32 ` [PATCH V3 11/42] vfio/container: mdev cpr blocker Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-20 6:29 ` Cédric Le Goater
2025-05-12 15:32 ` [PATCH V3 13/42] pci: export msix_is_pending Steve Sistare
` (30 subsequent siblings)
42 siblings, 1 reply; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
If there are multiple containers and unmap-all fails for some container, we
need to remap vaddr for the other containers for which unmap-all succeeded.
Recover by walking all address ranges of all containers to restore the vaddr
for each. Do so by invoking the vfio listener callback, and passing a new
"remap" flag that tells it to restore a mapping without re-allocating new
userland data structures.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/vfio/cpr-legacy.c | 91 +++++++++++++++++++++++++++++++++++
hw/vfio/listener.c | 19 +++++++-
include/hw/vfio/vfio-container-base.h | 3 ++
include/hw/vfio/vfio-cpr.h | 10 ++++
4 files changed, 122 insertions(+), 1 deletion(-)
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
index bbcf71e..f8ddf78 100644
--- a/hw/vfio/cpr-legacy.c
+++ b/hw/vfio/cpr-legacy.c
@@ -31,6 +31,7 @@ static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
error_setg_errno(errp, errno, "vfio_dma_unmap_vaddr_all");
return false;
}
+ container->cpr.vaddr_unmapped = true;
return true;
}
@@ -63,6 +64,14 @@ static int vfio_legacy_cpr_dma_map(const VFIOContainerBase *bcontainer,
return 0;
}
+static void vfio_region_remap(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+ VFIOContainer *container = container_of(listener, VFIOContainer,
+ cpr.remap_listener);
+ vfio_container_region_add(&container->bcontainer, section, true);
+}
+
static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
{
if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) {
@@ -131,6 +140,40 @@ static const VMStateDescription vfio_container_vmstate = {
}
};
+static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
+ MigrationEvent *e, Error **errp)
+{
+ VFIOContainer *container =
+ container_of(notifier, VFIOContainer, cpr.transfer_notifier);
+ VFIOContainerBase *bcontainer = &container->bcontainer;
+
+ if (e->type != MIG_EVENT_PRECOPY_FAILED) {
+ return 0;
+ }
+
+ if (container->cpr.vaddr_unmapped) {
+ /*
+ * Force a call to vfio_region_remap for each mapped section by
+ * temporarily registering a listener, and temporarily diverting
+ * dma_map to vfio_legacy_cpr_dma_map. The latter restores vaddr.
+ */
+
+ VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
+ vioc->dma_map = vfio_legacy_cpr_dma_map;
+
+ container->cpr.remap_listener = (MemoryListener) {
+ .name = "vfio cpr recover",
+ .region_add = vfio_region_remap
+ };
+ memory_listener_register(&container->cpr.remap_listener,
+ bcontainer->space->as);
+ memory_listener_unregister(&container->cpr.remap_listener);
+ container->cpr.vaddr_unmapped = false;
+ vioc->dma_map = vfio_legacy_dma_map;
+ }
+ return 0;
+}
+
bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
{
VFIOContainerBase *bcontainer = &container->bcontainer;
@@ -152,6 +195,10 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
vioc->dma_map = vfio_legacy_cpr_dma_map;
}
+
+ migration_add_notifier_mode(&container->cpr.transfer_notifier,
+ vfio_cpr_fail_notifier,
+ MIG_MODE_CPR_TRANSFER);
return true;
}
@@ -162,6 +209,50 @@ void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
migrate_del_blocker(&container->cpr.blocker);
vmstate_unregister(NULL, &vfio_container_vmstate, container);
+ migration_remove_notifier(&container->cpr.transfer_notifier);
+}
+
+/*
+ * In old QEMU, VFIO_DMA_UNMAP_FLAG_VADDR may fail on some mapping after
+ * succeeding for others, so the latter have lost their vaddr. Call this
+ * to restore vaddr for a section with a giommu.
+ *
+ * The giommu already exists. Find it and replay it, which calls
+ * vfio_legacy_cpr_dma_map further down the stack.
+ */
+void vfio_cpr_giommu_remap(VFIOContainerBase *bcontainer,
+ MemoryRegionSection *section)
+{
+ VFIOGuestIOMMU *giommu = NULL;
+ hwaddr as_offset = section->offset_within_address_space;
+ hwaddr iommu_offset = as_offset - section->offset_within_region;
+
+ QLIST_FOREACH(giommu, &bcontainer->giommu_list, giommu_next) {
+ if (giommu->iommu_mr == IOMMU_MEMORY_REGION(section->mr) &&
+ giommu->iommu_offset == iommu_offset) {
+ break;
+ }
+ }
+ g_assert(giommu);
+ memory_region_iommu_replay(giommu->iommu_mr, &giommu->n);
+}
+
+/*
+ * In old QEMU, VFIO_DMA_UNMAP_FLAG_VADDR may fail on some mapping after
+ * succeeding for others, so the latter have lost their vaddr. Call this
+ * to restore vaddr for a section with a RamDiscardManager.
+ *
+ * The ram discard listener already exists. Call its populate function
+ * directly, which calls vfio_legacy_cpr_dma_map.
+ */
+bool vfio_cpr_ram_discard_register_listener(VFIOContainerBase *bcontainer,
+ MemoryRegionSection *section)
+{
+ VFIORamDiscardListener *vrdl =
+ vfio_find_ram_discard_listener(bcontainer, section);
+
+ g_assert(vrdl);
+ return vrdl->listener.notify_populate(&vrdl->listener, section) == 0;
}
static bool same_device(int fd1, int fd2)
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index 5642d04..e86ffcf 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -474,6 +474,13 @@ static void vfio_listener_region_add(MemoryListener *listener,
{
VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
listener);
+ vfio_container_region_add(bcontainer, section, false);
+}
+
+void vfio_container_region_add(VFIOContainerBase *bcontainer,
+ MemoryRegionSection *section,
+ bool cpr_remap)
+{
hwaddr iova, end;
Int128 llend, llsize;
void *vaddr;
@@ -509,6 +516,11 @@ static void vfio_listener_region_add(MemoryListener *listener,
int iommu_idx;
trace_vfio_listener_region_add_iommu(section->mr->name, iova, end);
+
+ if (cpr_remap) {
+ vfio_cpr_giommu_remap(bcontainer, section);
+ }
+
/*
* FIXME: For VFIO iommu types which have KVM acceleration to
* avoid bouncing all map/unmaps through qemu this way, this
@@ -551,7 +563,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
* about changes.
*/
if (memory_region_has_ram_discard_manager(section->mr)) {
- vfio_ram_discard_register_listener(bcontainer, section);
+ if (!cpr_remap) {
+ vfio_ram_discard_register_listener(bcontainer, section);
+ } else if (!vfio_cpr_ram_discard_register_listener(bcontainer,
+ section)) {
+ goto fail;
+ }
return;
}
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index a2f6c3a..5776fd7 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -189,4 +189,7 @@ VFIORamDiscardListener *vfio_find_ram_discard_listener(
int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
ram_addr_t size, void *vaddr, bool readonly);
+void vfio_container_region_add(VFIOContainerBase *bcontainer,
+ MemoryRegionSection *section, bool cpr_remap);
+
#endif /* HW_VFIO_VFIO_CONTAINER_BASE_H */
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index 0fc7ab2..d6d22f2 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -10,10 +10,14 @@
#define HW_VFIO_VFIO_CPR_H
#include "migration/misc.h"
+#include "system/memory.h"
typedef struct VFIOContainerCPR {
Error *blocker;
bool reused;
+ bool vaddr_unmapped;
+ NotifierWithReturn transfer_notifier;
+ MemoryListener remap_listener;
} VFIOContainerCPR;
typedef struct VFIODeviceCPR {
@@ -39,4 +43,10 @@ void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
bool vfio_cpr_container_match(struct VFIOContainer *container,
struct VFIOGroup *group, int *fd);
+void vfio_cpr_giommu_remap(struct VFIOContainerBase *bcontainer,
+ MemoryRegionSection *section);
+
+bool vfio_cpr_ram_discard_register_listener(
+ struct VFIOContainerBase *bcontainer, MemoryRegionSection *section);
+
#endif /* HW_VFIO_VFIO_CPR_H */
--
1.8.3.1
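
Editorial sketch, not part of the patch: the recovery path above relies on
the fact that registering a memory listener invokes its region_add callback
for every section that is already mapped. The toy model below illustrates
only that replay idea. Every name in it (ToySection, ToySpace, ToyListener,
toy_recover, and so on) is hypothetical and none of it is QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are QEMU types. */
typedef struct ToySection {
    unsigned long iova;
    unsigned long size;
    bool vaddr_valid;       /* false after a simulated unmap-all-vaddr */
} ToySection;

typedef struct ToyListener ToyListener;
struct ToyListener {
    void (*region_add)(ToyListener *l, ToySection *s);
    int calls;              /* number of sections visited */
};

typedef struct ToySpace {
    ToySection *sections;
    size_t nr_sections;
} ToySpace;

/* Registering replays region_add for every section that already exists. */
static void toy_listener_register(ToyListener *l, ToySpace *as)
{
    assert(l->region_add != NULL);
    for (size_t i = 0; i < as->nr_sections; i++) {
        l->region_add(l, &as->sections[i]);
    }
}

/* Remap callback: restore the vaddr without allocating anything new. */
static void toy_region_remap(ToyListener *l, ToySection *s)
{
    s->vaddr_valid = true;
    l->calls++;
}

/* Recovery: walk every section once through a temporary listener. */
static int toy_recover(ToySpace *as)
{
    ToyListener tmp = { .region_add = toy_region_remap, .calls = 0 };

    toy_listener_register(&tmp, as);
    /* The patch unregisters its listener here; the toy has nothing to undo. */
    return tmp.calls;
}
```

The point of the temporary listener is that the caller does not need its own
list of mapped ranges: the address space already knows them, and the replay
on registration visits each one exactly once.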
* Re: [PATCH V3 12/42] vfio/container: recover from unmap-all-vaddr failure
2025-05-12 15:32 ` [PATCH V3 12/42] vfio/container: recover from unmap-all-vaddr failure Steve Sistare
@ 2025-05-20 6:29 ` Cédric Le Goater
2025-05-20 13:39 ` Steven Sistare
0 siblings, 1 reply; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-20 6:29 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/12/25 17:32, Steve Sistare wrote:
> If there are multiple containers and unmap-all fails for some container, we
> need to remap vaddr for the other containers for which unmap-all succeeded.
> Recover by walking all address ranges of all containers to restore the vaddr
> for each. Do so by invoking the vfio listener callback, and passing a new
> "remap" flag that tells it to restore a mapping without re-allocating new
> userland data structures.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/vfio/cpr-legacy.c | 91 +++++++++++++++++++++++++++++++++++
> hw/vfio/listener.c | 19 +++++++-
> include/hw/vfio/vfio-container-base.h | 3 ++
> include/hw/vfio/vfio-cpr.h | 10 ++++
> 4 files changed, 122 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
> index bbcf71e..f8ddf78 100644
> --- a/hw/vfio/cpr-legacy.c
> +++ b/hw/vfio/cpr-legacy.c
> @@ -31,6 +31,7 @@ static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
> error_setg_errno(errp, errno, "vfio_dma_unmap_vaddr_all");
> return false;
> }
> + container->cpr.vaddr_unmapped = true;
> return true;
> }
>
> @@ -63,6 +64,14 @@ static int vfio_legacy_cpr_dma_map(const VFIOContainerBase *bcontainer,
> return 0;
> }
>
> +static void vfio_region_remap(MemoryListener *listener,
> + MemoryRegionSection *section)
> +{
> + VFIOContainer *container = container_of(listener, VFIOContainer,
> + cpr.remap_listener);
> + vfio_container_region_add(&container->bcontainer, section, true);
> +}
> +
> static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
> {
> if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) {
> @@ -131,6 +140,40 @@ static const VMStateDescription vfio_container_vmstate = {
> }
> };
>
> +static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
> + MigrationEvent *e, Error **errp)
> +{
> + VFIOContainer *container =
> + container_of(notifier, VFIOContainer, cpr.transfer_notifier);
> + VFIOContainerBase *bcontainer = &container->bcontainer;
> +
> + if (e->type != MIG_EVENT_PRECOPY_FAILED) {
> + return 0;
> + }
> +
> + if (container->cpr.vaddr_unmapped) {
> + /*
> + * Force a call to vfio_region_remap for each mapped section by
> + * temporarily registering a listener, and temporarily diverting
> + * dma_map to vfio_legacy_cpr_dma_map. The latter restores vaddr.
> + */
> +
> + VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
> + vioc->dma_map = vfio_legacy_cpr_dma_map;
> +
> + container->cpr.remap_listener = (MemoryListener) {
> + .name = "vfio cpr recover",
> + .region_add = vfio_region_remap
> + };
> + memory_listener_register(&container->cpr.remap_listener,
> + bcontainer->space->as);
> + memory_listener_unregister(&container->cpr.remap_listener);
> + container->cpr.vaddr_unmapped = false;
> + vioc->dma_map = vfio_legacy_dma_map;
> + }
> + return 0;
> +}
> +
> bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
> {
> VFIOContainerBase *bcontainer = &container->bcontainer;
> @@ -152,6 +195,10 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
> VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
> vioc->dma_map = vfio_legacy_cpr_dma_map;
> }
> +
> + migration_add_notifier_mode(&container->cpr.transfer_notifier,
> + vfio_cpr_fail_notifier,
> + MIG_MODE_CPR_TRANSFER);
> return true;
> }
>
> @@ -162,6 +209,50 @@ void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
> migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
> migrate_del_blocker(&container->cpr.blocker);
> vmstate_unregister(NULL, &vfio_container_vmstate, container);
> + migration_remove_notifier(&container->cpr.transfer_notifier);
> +}
> +
> +/*
> + * In old QEMU, VFIO_DMA_UNMAP_FLAG_VADDR may fail on some mapping after
> + * succeeding for others, so the latter have lost their vaddr. Call this
> + * to restore vaddr for a section with a giommu.
> + *
> + * The giommu already exists. Find it and replay it, which calls
> + * vfio_legacy_cpr_dma_map further down the stack.
> + */
> +void vfio_cpr_giommu_remap(VFIOContainerBase *bcontainer,
> + MemoryRegionSection *section)
> +{
> + VFIOGuestIOMMU *giommu = NULL;
> + hwaddr as_offset = section->offset_within_address_space;
> + hwaddr iommu_offset = as_offset - section->offset_within_region;
> +
> + QLIST_FOREACH(giommu, &bcontainer->giommu_list, giommu_next) {
> + if (giommu->iommu_mr == IOMMU_MEMORY_REGION(section->mr) &&
> + giommu->iommu_offset == iommu_offset) {
> + break;
> + }
> + }
> + g_assert(giommu);
> + memory_region_iommu_replay(giommu->iommu_mr, &giommu->n);
> +}
> +
> +/*
> + * In old QEMU, VFIO_DMA_UNMAP_FLAG_VADDR may fail on some mapping after
> + * succeeding for others, so the latter have lost their vaddr. Call this
> + * to restore vaddr for a section with a RamDiscardManager.
> + *
> + * The ram discard listener already exists. Call its populate function
> + * directly, which calls vfio_legacy_cpr_dma_map.
> + */
> +bool vfio_cpr_ram_discard_register_listener(VFIOContainerBase *bcontainer,
> + MemoryRegionSection *section)
> +{
> + VFIORamDiscardListener *vrdl =
> + vfio_find_ram_discard_listener(bcontainer, section);
> +
> + g_assert(vrdl);
> + return vrdl->listener.notify_populate(&vrdl->listener, section) == 0;
> }
>
> static bool same_device(int fd1, int fd2)
> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
> index 5642d04..e86ffcf 100644
> --- a/hw/vfio/listener.c
> +++ b/hw/vfio/listener.c
> @@ -474,6 +474,13 @@ static void vfio_listener_region_add(MemoryListener *listener,
> {
> VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
> listener);
> + vfio_container_region_add(bcontainer, section, false);
> +}
> +
> +void vfio_container_region_add(VFIOContainerBase *bcontainer,
> + MemoryRegionSection *section,
> + bool cpr_remap)
> +{
> hwaddr iova, end;
> Int128 llend, llsize;
> void *vaddr;
> @@ -509,6 +516,11 @@ static void vfio_listener_region_add(MemoryListener *listener,
> int iommu_idx;
>
> trace_vfio_listener_region_add_iommu(section->mr->name, iova, end);
> +
> + if (cpr_remap) {
> + vfio_cpr_giommu_remap(bcontainer, section);
> + }
> +
> /*
> * FIXME: For VFIO iommu types which have KVM acceleration to
> * avoid bouncing all map/unmaps through qemu this way, this
> @@ -551,7 +563,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
> * about changes.
> */
> if (memory_region_has_ram_discard_manager(section->mr)) {
> - vfio_ram_discard_register_listener(bcontainer, section);
> + if (!cpr_remap) {
> + vfio_ram_discard_register_listener(bcontainer, section);
> + } else if (!vfio_cpr_ram_discard_register_listener(bcontainer,
> + section)) {
> + goto fail;
> + }
> return;
> }
>
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index a2f6c3a..5776fd7 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -189,4 +189,7 @@ VFIORamDiscardListener *vfio_find_ram_discard_listener(
> int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> ram_addr_t size, void *vaddr, bool readonly);
>
> +void vfio_container_region_add(VFIOContainerBase *bcontainer,
> + MemoryRegionSection *section, bool cpr_remap);
> +
> #endif /* HW_VFIO_VFIO_CONTAINER_BASE_H */
> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
> index 0fc7ab2..d6d22f2 100644
> --- a/include/hw/vfio/vfio-cpr.h
> +++ b/include/hw/vfio/vfio-cpr.h
> @@ -10,10 +10,14 @@
> #define HW_VFIO_VFIO_CPR_H
>
> #include "migration/misc.h"
> +#include "system/memory.h"
>
> typedef struct VFIOContainerCPR {
> Error *blocker;
> bool reused;
> + bool vaddr_unmapped;
> + NotifierWithReturn transfer_notifier;
> + MemoryListener remap_listener;
> } VFIOContainerCPR;
>
> typedef struct VFIODeviceCPR {
> @@ -39,4 +43,10 @@ void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
> bool vfio_cpr_container_match(struct VFIOContainer *container,
> struct VFIOGroup *group, int *fd);
>
> +void vfio_cpr_giommu_remap(struct VFIOContainerBase *bcontainer,
> + MemoryRegionSection *section);
> +
> +bool vfio_cpr_ram_discard_register_listener(
> + struct VFIOContainerBase *bcontainer, MemoryRegionSection *section);
> +
> #endif /* HW_VFIO_VFIO_CPR_H */
Please add to your .gitconfig:
[diff]
    orderFile = /path/to/qemu/scripts/git.orderfile
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
* Re: [PATCH V3 12/42] vfio/container: recover from unmap-all-vaddr failure
2025-05-20 6:29 ` Cédric Le Goater
@ 2025-05-20 13:39 ` Steven Sistare
0 siblings, 0 replies; 157+ messages in thread
From: Steven Sistare @ 2025-05-20 13:39 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/20/2025 2:29 AM, Cédric Le Goater wrote:
> On 5/12/25 17:32, Steve Sistare wrote:
>> If there are multiple containers and unmap-all fails for some container, we
>> need to remap vaddr for the other containers for which unmap-all succeeded.
>> Recover by walking all address ranges of all containers to restore the vaddr
>> for each. Do so by invoking the vfio listener callback, and passing a new
>> "remap" flag that tells it to restore a mapping without re-allocating new
>> userland data structures.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> hw/vfio/cpr-legacy.c | 91 +++++++++++++++++++++++++++++++++++
>> hw/vfio/listener.c | 19 +++++++-
>> include/hw/vfio/vfio-container-base.h | 3 ++
>> include/hw/vfio/vfio-cpr.h | 10 ++++
>> 4 files changed, 122 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>> index bbcf71e..f8ddf78 100644
>> --- a/hw/vfio/cpr-legacy.c
>> +++ b/hw/vfio/cpr-legacy.c
>> @@ -31,6 +31,7 @@ static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
>> error_setg_errno(errp, errno, "vfio_dma_unmap_vaddr_all");
>> return false;
>> }
>> + container->cpr.vaddr_unmapped = true;
>> return true;
>> }
>> @@ -63,6 +64,14 @@ static int vfio_legacy_cpr_dma_map(const VFIOContainerBase *bcontainer,
>> return 0;
>> }
>> +static void vfio_region_remap(MemoryListener *listener,
>> + MemoryRegionSection *section)
>> +{
>> + VFIOContainer *container = container_of(listener, VFIOContainer,
>> + cpr.remap_listener);
>> + vfio_container_region_add(&container->bcontainer, section, true);
>> +}
>> +
>> static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
>> {
>> if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) {
>> @@ -131,6 +140,40 @@ static const VMStateDescription vfio_container_vmstate = {
>> }
>> };
>> +static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
>> + MigrationEvent *e, Error **errp)
>> +{
>> + VFIOContainer *container =
>> + container_of(notifier, VFIOContainer, cpr.transfer_notifier);
>> + VFIOContainerBase *bcontainer = &container->bcontainer;
>> +
>> + if (e->type != MIG_EVENT_PRECOPY_FAILED) {
>> + return 0;
>> + }
>> +
>> + if (container->cpr.vaddr_unmapped) {
>> + /*
>> + * Force a call to vfio_region_remap for each mapped section by
>> + * temporarily registering a listener, and temporarily diverting
>> + * dma_map to vfio_legacy_cpr_dma_map. The latter restores vaddr.
>> + */
>> +
>> + VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>> + vioc->dma_map = vfio_legacy_cpr_dma_map;
>> +
>> + container->cpr.remap_listener = (MemoryListener) {
>> + .name = "vfio cpr recover",
>> + .region_add = vfio_region_remap
>> + };
>> + memory_listener_register(&container->cpr.remap_listener,
>> + bcontainer->space->as);
>> + memory_listener_unregister(&container->cpr.remap_listener);
>> + container->cpr.vaddr_unmapped = false;
>> + vioc->dma_map = vfio_legacy_dma_map;
>> + }
>> + return 0;
>> +}
>> +
>> bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>> {
>> VFIOContainerBase *bcontainer = &container->bcontainer;
>> @@ -152,6 +195,10 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>> VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>> vioc->dma_map = vfio_legacy_cpr_dma_map;
>> }
>> +
>> + migration_add_notifier_mode(&container->cpr.transfer_notifier,
>> + vfio_cpr_fail_notifier,
>> + MIG_MODE_CPR_TRANSFER);
>> return true;
>> }
>> @@ -162,6 +209,50 @@ void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
>> migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
>> migrate_del_blocker(&container->cpr.blocker);
>> vmstate_unregister(NULL, &vfio_container_vmstate, container);
>> + migration_remove_notifier(&container->cpr.transfer_notifier);
>> +}
>> +
>> +/*
>> + * In old QEMU, VFIO_DMA_UNMAP_FLAG_VADDR may fail on some mapping after
>> + * succeeding for others, so the latter have lost their vaddr. Call this
>> + * to restore vaddr for a section with a giommu.
>> + *
>> + * The giommu already exists. Find it and replay it, which calls
>> + * vfio_legacy_cpr_dma_map further down the stack.
>> + */
>> +void vfio_cpr_giommu_remap(VFIOContainerBase *bcontainer,
>> + MemoryRegionSection *section)
>> +{
>> + VFIOGuestIOMMU *giommu = NULL;
>> + hwaddr as_offset = section->offset_within_address_space;
>> + hwaddr iommu_offset = as_offset - section->offset_within_region;
>> +
>> + QLIST_FOREACH(giommu, &bcontainer->giommu_list, giommu_next) {
>> + if (giommu->iommu_mr == IOMMU_MEMORY_REGION(section->mr) &&
>> + giommu->iommu_offset == iommu_offset) {
>> + break;
>> + }
>> + }
>> + g_assert(giommu);
>> + memory_region_iommu_replay(giommu->iommu_mr, &giommu->n);
>> +}
>> +
>> +/*
>> + * In old QEMU, VFIO_DMA_UNMAP_FLAG_VADDR may fail on some mapping after
>> + * succeeding for others, so the latter have lost their vaddr. Call this
>> + * to restore vaddr for a section with a RamDiscardManager.
>> + *
>> + * The ram discard listener already exists. Call its populate function
>> + * directly, which calls vfio_legacy_cpr_dma_map.
>> + */
>> +bool vfio_cpr_ram_discard_register_listener(VFIOContainerBase *bcontainer,
>> + MemoryRegionSection *section)
>> +{
>> + VFIORamDiscardListener *vrdl =
>> + vfio_find_ram_discard_listener(bcontainer, section);
>> +
>> + g_assert(vrdl);
>> + return vrdl->listener.notify_populate(&vrdl->listener, section) == 0;
>> }
>> static bool same_device(int fd1, int fd2)
>> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
>> index 5642d04..e86ffcf 100644
>> --- a/hw/vfio/listener.c
>> +++ b/hw/vfio/listener.c
>> @@ -474,6 +474,13 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> {
>> VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
>> listener);
>> + vfio_container_region_add(bcontainer, section, false);
>> +}
>> +
>> +void vfio_container_region_add(VFIOContainerBase *bcontainer,
>> + MemoryRegionSection *section,
>> + bool cpr_remap)
>> +{
>> hwaddr iova, end;
>> Int128 llend, llsize;
>> void *vaddr;
>> @@ -509,6 +516,11 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> int iommu_idx;
>> trace_vfio_listener_region_add_iommu(section->mr->name, iova, end);
>> +
>> + if (cpr_remap) {
>> + vfio_cpr_giommu_remap(bcontainer, section);
>> + }
>> +
>> /*
>> * FIXME: For VFIO iommu types which have KVM acceleration to
>> * avoid bouncing all map/unmaps through qemu this way, this
>> @@ -551,7 +563,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> * about changes.
>> */
>> if (memory_region_has_ram_discard_manager(section->mr)) {
>> - vfio_ram_discard_register_listener(bcontainer, section);
>> + if (!cpr_remap) {
>> + vfio_ram_discard_register_listener(bcontainer, section);
>> + } else if (!vfio_cpr_ram_discard_register_listener(bcontainer,
>> + section)) {
>> + goto fail;
>> + }
>> return;
>> }
>> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
>> index a2f6c3a..5776fd7 100644
>> --- a/include/hw/vfio/vfio-container-base.h
>> +++ b/include/hw/vfio/vfio-container-base.h
>> @@ -189,4 +189,7 @@ VFIORamDiscardListener *vfio_find_ram_discard_listener(
>> int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>> ram_addr_t size, void *vaddr, bool readonly);
>> +void vfio_container_region_add(VFIOContainerBase *bcontainer,
>> + MemoryRegionSection *section, bool cpr_remap);
>> +
>> #endif /* HW_VFIO_VFIO_CONTAINER_BASE_H */
>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>> index 0fc7ab2..d6d22f2 100644
>> --- a/include/hw/vfio/vfio-cpr.h
>> +++ b/include/hw/vfio/vfio-cpr.h
>> @@ -10,10 +10,14 @@
>> #define HW_VFIO_VFIO_CPR_H
>> #include "migration/misc.h"
>> +#include "system/memory.h"
>> typedef struct VFIOContainerCPR {
>> Error *blocker;
>> bool reused;
>> + bool vaddr_unmapped;
>> + NotifierWithReturn transfer_notifier;
>> + MemoryListener remap_listener;
>> } VFIOContainerCPR;
>> typedef struct VFIODeviceCPR {
>> @@ -39,4 +43,10 @@ void vfio_cpr_unregister_container(struct VFIOContainerBase *bcontainer);
>> bool vfio_cpr_container_match(struct VFIOContainer *container,
>> struct VFIOGroup *group, int *fd);
>> +void vfio_cpr_giommu_remap(struct VFIOContainerBase *bcontainer,
>> + MemoryRegionSection *section);
>> +
>> +bool vfio_cpr_ram_discard_register_listener(
>> + struct VFIOContainerBase *bcontainer, MemoryRegionSection *section);
>> +
>> #endif /* HW_VFIO_VFIO_CPR_H */
>
> Please add to your .gitconfig:
>
> [diff]
>     orderFile = /path/to/qemu/scripts/git.orderfile
Cool, thanks for the tip - steve
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>
> Thanks,
>
> C.
* [PATCH V3 13/42] pci: export msix_is_pending
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
` (11 preceding siblings ...)
2025-05-12 15:32 ` [PATCH V3 12/42] vfio/container: recover from unmap-all-vaddr failure Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-12 15:32 ` [PATCH V3 14/42] pci: skip reset during cpr Steve Sistare
` (29 subsequent siblings)
42 siblings, 0 replies; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Export msix_is_pending for use by cpr. No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
hw/pci/msix.c | 2 +-
include/hw/pci/msix.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 66f27b9..8c7f670 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -72,7 +72,7 @@ static uint8_t *msix_pending_byte(PCIDevice *dev, int vector)
return dev->msix_pba + vector / 8;
}
-static int msix_is_pending(PCIDevice *dev, int vector)
+int msix_is_pending(PCIDevice *dev, unsigned int vector)
{
return *msix_pending_byte(dev, vector) & msix_pending_mask(vector);
}
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 0e6f257..11ef945 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -32,6 +32,7 @@ int msix_present(PCIDevice *dev);
bool msix_is_masked(PCIDevice *dev, unsigned vector);
void msix_set_pending(PCIDevice *dev, unsigned vector);
void msix_clr_pending(PCIDevice *dev, int vector);
+int msix_is_pending(PCIDevice *dev, unsigned vector);
void msix_vector_use(PCIDevice *dev, unsigned vector);
void msix_vector_unuse(PCIDevice *dev, unsigned vector);
--
1.8.3.1
* [PATCH V3 14/42] pci: skip reset during cpr
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
` (12 preceding siblings ...)
2025-05-12 15:32 ` [PATCH V3 13/42] pci: export msix_is_pending Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-16 8:19 ` Cédric Le Goater
2025-05-12 15:32 ` [PATCH V3 15/42] vfio-pci: " Steve Sistare
` (28 subsequent siblings)
42 siblings, 1 reply; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Do not reset a vfio-pci device during CPR.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/pci/pci.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index fe38c4c..2ba2e0f 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -32,6 +32,8 @@
#include "hw/pci/pci_host.h"
#include "hw/qdev-properties.h"
#include "hw/qdev-properties-system.h"
+#include "migration/cpr.h"
+#include "migration/misc.h"
#include "migration/qemu-file-types.h"
#include "migration/vmstate.h"
#include "net/net.h"
@@ -537,6 +539,17 @@ static void pci_reset_regions(PCIDevice *dev)
static void pci_do_device_reset(PCIDevice *dev)
{
+ /*
+ * A PCI device that is resuming for cpr is already configured, so do
+ * not reset it here when we are called from qemu_system_reset prior to
+ * cpr load, else interrupts may be lost for vfio-pci devices. It is
+ * safe to skip this reset for all PCI devices, because vmstate load will
+ * set all fields that would have been set here.
+ */
+ if (cpr_is_incoming()) {
+ return;
+ }
+
pci_device_deassert_intx(dev);
assert(dev->irq_state == 0);
--
1.8.3.1
* Re: [PATCH V3 14/42] pci: skip reset during cpr
2025-05-12 15:32 ` [PATCH V3 14/42] pci: skip reset during cpr Steve Sistare
@ 2025-05-16 8:19 ` Cédric Le Goater
2025-05-16 17:58 ` Steven Sistare
2025-05-24 9:34 ` Michael S. Tsirkin
0 siblings, 2 replies; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-16 8:19 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/12/25 17:32, Steve Sistare wrote:
> Do not reset a vfio-pci device during CPR.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/pci/pci.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index fe38c4c..2ba2e0f 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -32,6 +32,8 @@
> #include "hw/pci/pci_host.h"
> #include "hw/qdev-properties.h"
> #include "hw/qdev-properties-system.h"
> +#include "migration/cpr.h"
> +#include "migration/misc.h"
> #include "migration/qemu-file-types.h"
> #include "migration/vmstate.h"
> #include "net/net.h"
> @@ -537,6 +539,17 @@ static void pci_reset_regions(PCIDevice *dev)
>
> static void pci_do_device_reset(PCIDevice *dev)
> {
> + /*
> + * A PCI device that is resuming for cpr is already configured, so do
> + * not reset it here when we are called from qemu_system_reset prior to
> + * cpr load, else interrupts may be lost for vfio-pci devices. It is
> + * safe to skip this reset for all PCI devices, because vmstate load will
> + * set all fields that would have been set here.
> + */
> + if (cpr_is_incoming()) {
Why can't we use cpr_is_incoming() in vfio instead of using a heuristic
on saved fds?
Thanks,
C.
> + return;
> + }
> +
> pci_device_deassert_intx(dev);
> assert(dev->irq_state == 0);
>
* Re: [PATCH V3 14/42] pci: skip reset during cpr
2025-05-16 8:19 ` Cédric Le Goater
@ 2025-05-16 17:58 ` Steven Sistare
2025-05-24 9:34 ` Michael S. Tsirkin
1 sibling, 0 replies; 157+ messages in thread
From: Steven Sistare @ 2025-05-16 17:58 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/16/2025 4:19 AM, Cédric Le Goater wrote:
> On 5/12/25 17:32, Steve Sistare wrote:
>> Do not reset a vfio-pci device during CPR.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> hw/pci/pci.c | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index fe38c4c..2ba2e0f 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -32,6 +32,8 @@
>> #include "hw/pci/pci_host.h"
>> #include "hw/qdev-properties.h"
>> #include "hw/qdev-properties-system.h"
>> +#include "migration/cpr.h"
>> +#include "migration/misc.h"
>> #include "migration/qemu-file-types.h"
>> #include "migration/vmstate.h"
>> #include "net/net.h"
>> @@ -537,6 +539,17 @@ static void pci_reset_regions(PCIDevice *dev)
>> static void pci_do_device_reset(PCIDevice *dev)
>> {
>> + /*
>> + * A PCI device that is resuming for cpr is already configured, so do
>> + * not reset it here when we are called from qemu_system_reset prior to
>> + * cpr load, else interrupts may be lost for vfio-pci devices. It is
>> + * safe to skip this reset for all PCI devices, because vmstate load will
>> + * set all fields that would have been set here.
>> + */
>> + if (cpr_is_incoming()) {
>
> Why can't we use cpr_is_incoming() in vfio instead of using a heuristic
> on saved fds?
We could (and we had the same discussion in V1 or V2).
I thought it slightly more object-oriented to derive the cpr_reused
boolean where an fd is involved, and save/use that in the associated object,
rather than call a global function everywhere. I do not feel strongly about it,
but it is used a lot:
$ git grep -F cpr_reused -- hw/vfio | wc -l
19
$ git grep -F cpr.reused -- hw/vfio | wc -l
27
- Steve
* Re: [PATCH V3 14/42] pci: skip reset during cpr
2025-05-16 8:19 ` Cédric Le Goater
2025-05-16 17:58 ` Steven Sistare
@ 2025-05-24 9:34 ` Michael S. Tsirkin
2025-05-27 20:42 ` Steven Sistare
1 sibling, 1 reply; 157+ messages in thread
From: Michael S. Tsirkin @ 2025-05-24 9:34 UTC (permalink / raw)
To: Cédric Le Goater
Cc: Steve Sistare, qemu-devel, Alex Williamson, Yi Liu, Eric Auger,
Zhenzhong Duan, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On Fri, May 16, 2025 at 10:19:09AM +0200, Cédric Le Goater wrote:
> On 5/12/25 17:32, Steve Sistare wrote:
> > Do not reset a vfio-pci device during CPR.
> >
> > Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> > ---
> > hw/pci/pci.c | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index fe38c4c..2ba2e0f 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -32,6 +32,8 @@
> > #include "hw/pci/pci_host.h"
> > #include "hw/qdev-properties.h"
> > #include "hw/qdev-properties-system.h"
> > +#include "migration/cpr.h"
> > +#include "migration/misc.h"
> > #include "migration/qemu-file-types.h"
> > #include "migration/vmstate.h"
> > #include "net/net.h"
> > @@ -537,6 +539,17 @@ static void pci_reset_regions(PCIDevice *dev)
> > static void pci_do_device_reset(PCIDevice *dev)
> > {
> > + /*
> > + * A PCI device that is resuming for cpr is already configured, so do
> > + * not reset it here when we are called from qemu_system_reset prior to
> > + * cpr load, else interrupts may be lost for vfio-pci devices. It is
> > + * safe to skip this reset for all PCI devices, because vmstate load will
> > + * set all fields that would have been set here.
> > + */
> > + if (cpr_is_incoming()) {
>
> Why can't we use cpr_is_incoming() in vfio instead of using a heuristic
> on saved fds?
>
> Thanks,
>
> C.
Think I agree.
>
>
> > + return;
> > + }
> > +
> > pci_device_deassert_intx(dev);
> > assert(dev->irq_state == 0);
* Re: [PATCH V3 14/42] pci: skip reset during cpr
2025-05-24 9:34 ` Michael S. Tsirkin
@ 2025-05-27 20:42 ` Steven Sistare
2025-05-27 21:03 ` Michael S. Tsirkin
0 siblings, 1 reply; 157+ messages in thread
From: Steven Sistare @ 2025-05-27 20:42 UTC (permalink / raw)
To: Michael S. Tsirkin, Cédric Le Goater
Cc: qemu-devel, Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/24/2025 5:34 AM, Michael S. Tsirkin wrote:
> On Fri, May 16, 2025 at 10:19:09AM +0200, Cédric Le Goater wrote:
>> On 5/12/25 17:32, Steve Sistare wrote:
>>> Do not reset a vfio-pci device during CPR.
>>>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> ---
>>> hw/pci/pci.c | 13 +++++++++++++
>>> 1 file changed, 13 insertions(+)
>>>
>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>> index fe38c4c..2ba2e0f 100644
>>> --- a/hw/pci/pci.c
>>> +++ b/hw/pci/pci.c
>>> @@ -32,6 +32,8 @@
>>> #include "hw/pci/pci_host.h"
>>> #include "hw/qdev-properties.h"
>>> #include "hw/qdev-properties-system.h"
>>> +#include "migration/cpr.h"
>>> +#include "migration/misc.h"
>>> #include "migration/qemu-file-types.h"
>>> #include "migration/vmstate.h"
>>> #include "net/net.h"
>>> @@ -537,6 +539,17 @@ static void pci_reset_regions(PCIDevice *dev)
>>> static void pci_do_device_reset(PCIDevice *dev)
>>> {
>>> + /*
>>> + * A PCI device that is resuming for cpr is already configured, so do
>>> + * not reset it here when we are called from qemu_system_reset prior to
>>> + * cpr load, else interrupts may be lost for vfio-pci devices. It is
>>> + * safe to skip this reset for all PCI devices, because vmstate load will
>>> + * set all fields that would have been set here.
>>> + */
>>> + if (cpr_is_incoming()) {
>>
>> Why can't we use cpr_is_incoming() in vfio instead of using a heuristic
>> on saved fds?
>>
>> Thanks,
>>
>> C.
>
> Think I agree.
OK. I will delete the "reused" variable everywhere, and use cpr_is_incoming.
Michael, since I already use cpr_is_incoming in this pci patch, can I have
your RB or ack?
- Steve
>
>>
>>
>>> + return;
>>> + }
>>> +
>>> pci_device_deassert_intx(dev);
>>> assert(dev->irq_state == 0);
>
* Re: [PATCH V3 14/42] pci: skip reset during cpr
2025-05-27 20:42 ` Steven Sistare
@ 2025-05-27 21:03 ` Michael S. Tsirkin
2025-05-28 16:11 ` Steven Sistare
0 siblings, 1 reply; 157+ messages in thread
From: Michael S. Tsirkin @ 2025-05-27 21:03 UTC (permalink / raw)
To: Steven Sistare
Cc: Cédric Le Goater, qemu-devel, Alex Williamson, Yi Liu,
Eric Auger, Zhenzhong Duan, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas
On Tue, May 27, 2025 at 04:42:16PM -0400, Steven Sistare wrote:
>
>
> On 5/24/2025 5:34 AM, Michael S. Tsirkin wrote:
> > On Fri, May 16, 2025 at 10:19:09AM +0200, Cédric Le Goater wrote:
> > > On 5/12/25 17:32, Steve Sistare wrote:
> > > > Do not reset a vfio-pci device during CPR.
> > > >
> > > > Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> > > > ---
> > > > hw/pci/pci.c | 13 +++++++++++++
> > > > 1 file changed, 13 insertions(+)
> > > >
> > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > index fe38c4c..2ba2e0f 100644
> > > > --- a/hw/pci/pci.c
> > > > +++ b/hw/pci/pci.c
> > > > @@ -32,6 +32,8 @@
> > > > #include "hw/pci/pci_host.h"
> > > > #include "hw/qdev-properties.h"
> > > > #include "hw/qdev-properties-system.h"
> > > > +#include "migration/cpr.h"
> > > > +#include "migration/misc.h"
> > > > #include "migration/qemu-file-types.h"
> > > > #include "migration/vmstate.h"
> > > > #include "net/net.h"
> > > > @@ -537,6 +539,17 @@ static void pci_reset_regions(PCIDevice *dev)
> > > > static void pci_do_device_reset(PCIDevice *dev)
> > > > {
> > > > + /*
> > > > + * A PCI device that is resuming for cpr is already configured, so do
> > > > + * not reset it here when we are called from qemu_system_reset prior to
> > > > + * cpr load, else interrupts may be lost for vfio-pci devices. It is
> > > > + * safe to skip this reset for all PCI devices, because vmstate load will
> > > > + * set all fields that would have been set here.
> > > > + */
> > > > + if (cpr_is_incoming()) {
> > >
> > > Why can't we use cpr_is_incoming() in vfio instead of using a heuristic
> > > on saved fds?
> > >
> > > Thanks,
> > >
> > > C.
> >
> > Think I agree.
>
> OK. I will delete the "reused" variable everywhere, and use cpr_is_incoming.
>
> Michael, since I already use cpr_is_incoming in this pci patch, can I have
> your RB or ack?
>
> - Steve
My problem is not with cpr_is_incoming as such.

First, this comment is a very low-level thing to say in common PCI code.
VFIO will change, and we will not remember to keep this up to date.

Second, do we really know that vmstate load sets all fields for all
devices, as opposed to assuming that qemu_system_reset cleared them?
If not, this introduces an information leak.

It feels safer to just add a way for VFIO to opt out of
(all or part of) reset instead.
> >
> > >
> > >
> > > > + return;
> > > > + }
> > > > +
> > > > pci_device_deassert_intx(dev);
> > > > assert(dev->irq_state == 0);
> >
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: [PATCH V3 14/42] pci: skip reset during cpr
2025-05-27 21:03 ` Michael S. Tsirkin
@ 2025-05-28 16:11 ` Steven Sistare
0 siblings, 0 replies; 157+ messages in thread
From: Steven Sistare @ 2025-05-28 16:11 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Cédric Le Goater, qemu-devel, Alex Williamson, Yi Liu,
Eric Auger, Zhenzhong Duan, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas
On 5/27/2025 5:03 PM, Michael S. Tsirkin wrote:
> On Tue, May 27, 2025 at 04:42:16PM -0400, Steven Sistare wrote:
>> On 5/24/2025 5:34 AM, Michael S. Tsirkin wrote:
>>> On Fri, May 16, 2025 at 10:19:09AM +0200, Cédric Le Goater wrote:
>>>> On 5/12/25 17:32, Steve Sistare wrote:
>>>>> Do not reset a vfio-pci device during CPR.
>>>>>
>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>> ---
>>>>> hw/pci/pci.c | 13 +++++++++++++
>>>>> 1 file changed, 13 insertions(+)
>>>>>
>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>> index fe38c4c..2ba2e0f 100644
>>>>> --- a/hw/pci/pci.c
>>>>> +++ b/hw/pci/pci.c
>>>>> @@ -32,6 +32,8 @@
>>>>> #include "hw/pci/pci_host.h"
>>>>> #include "hw/qdev-properties.h"
>>>>> #include "hw/qdev-properties-system.h"
>>>>> +#include "migration/cpr.h"
>>>>> +#include "migration/misc.h"
>>>>> #include "migration/qemu-file-types.h"
>>>>> #include "migration/vmstate.h"
>>>>> #include "net/net.h"
>>>>> @@ -537,6 +539,17 @@ static void pci_reset_regions(PCIDevice *dev)
>>>>> static void pci_do_device_reset(PCIDevice *dev)
>>>>> {
>>>>> + /*
>>>>> + * A PCI device that is resuming for cpr is already configured, so do
>>>>> + * not reset it here when we are called from qemu_system_reset prior to
>>>>> + * cpr load, else interrupts may be lost for vfio-pci devices. It is
>>>>> + * safe to skip this reset for all PCI devices, because vmstate load will
>>>>> + * set all fields that would have been set here.
>>>>> + */
>>>>> + if (cpr_is_incoming()) {
>>>>
>>>> Why can't we use cpr_is_incoming() in vfio instead of using a heuristic
>>>> on saved fds?
>>>>
>>>> Thanks,
>>>>
>>>> C.
>>>
>>> Think I agree.
>>
>> OK. I will delete the "reused" variable everywhere, and use cpr_is_incoming.
>>
>> Michael, since I already use cpr_is_incoming in this pci patch, can I have
>> your RB or ack?
>>
>> - Steve
>
> My problem is not with cpr_is_incoming as such.
>
> First, this comment is a very low-level thing to say in common PCI code.
> VFIO will change, and we will not remember to keep this up to date.
>
> Second, do we really know that vmstate load sets all fields for all
> devices, as opposed to assuming that qemu_system_reset cleared them?
> If not, this introduces an information leak.
>
> It feels safer to just add a way for VFIO to opt out of
> (all or part of) reset instead.
Thanks very much for the feedback. How about:

hw/vfio/pci.c
    vfio_instance_init()
        /*
         * A device that is resuming for cpr is already configured, so do not
         * reset it during qemu_system_reset prior to cpr load, else interrupts
         * may be lost.
         */
        pci_dev->skip_reset_on_cpr = true;

hw/pci/pci.c
    pci_do_device_reset()
        if (dev->skip_reset_on_cpr && cpr_is_incoming()) {
            return;
        }

- Steve
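
For illustration only, the per-device opt-out proposed above could behave
roughly as in the following standalone sketch. It is not QEMU code: the
MockPCIDevice type, the mock_* functions, and the reset_count field are
hypothetical stand-ins, and only the skip_reset_on_cpr flag and the
cpr_is_incoming() check come from the proposal itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for PCIDevice; only the fields this sketch needs. */
typedef struct MockPCIDevice {
    const char *name;
    bool skip_reset_on_cpr;  /* opt-out flag from the proposal above */
    int irq_state;           /* stands in for state that a reset clears */
    int reset_count;         /* illustration only: resets actually performed */
} MockPCIDevice;

/* Stand-in for cpr_is_incoming(); the sketch uses a plain global. */
static bool mock_cpr_incoming;

static bool mock_cpr_is_incoming(void)
{
    return mock_cpr_incoming;
}

/* Mirrors the check proposed for pci_do_device_reset(). */
static void mock_pci_do_device_reset(MockPCIDevice *dev)
{
    assert(dev);

    /* Only devices that opted in skip the reset, and only for cpr load. */
    if (dev->skip_reset_on_cpr && mock_cpr_is_incoming()) {
        return;
    }

    dev->irq_state = 0;
    dev->reset_count++;
}
```

With this shape, common PCI code carries no vfio-specific reasoning, and
devices that do not set the flag are still reset on incoming cpr, which
sidesteps the question of whether vmstate load sets every field.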
^ permalink raw reply [flat|nested] 157+ messages in thread
* [PATCH V3 15/42] vfio-pci: skip reset during cpr
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
` (13 preceding siblings ...)
2025-05-12 15:32 ` [PATCH V3 14/42] pci: skip reset during cpr Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-20 6:48 ` Cédric Le Goater
2025-05-12 15:32 ` [PATCH V3 16/42] vfio/pci: vfio_vector_init Steve Sistare
` (27 subsequent siblings)
42 siblings, 1 reply; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Do not reset a vfio-pci device during CPR, and do not complain if the
kernel's PCI config space changes for non-emulated bits between the
vmstate save and load, which can happen due to ongoing interrupt activity.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
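As a rough illustration of why masking cmask silences the changed-bits
complaint, here is a standalone sketch. It is a simplified model, not the
code in hw/pci/pci.c: mock_config_conflict() and mock_restrict_cmask() are
hypothetical names, and the real check in get_pci_config_device may apply
further masks that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of the changed-bits check: the loaded config conflicts
 * with the current config if they differ in any bit selected by cmask.
 */
static bool mock_config_conflict(const uint8_t *cur, const uint8_t *loaded,
                                 const uint8_t *cmask, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if ((cur[i] ^ loaded[i]) & cmask[i]) {
            return true;
        }
    }
    return false;
}

/* Same masking step as vfio_cpr_pci_pre_load(): keep only emulated bits. */
static void mock_restrict_cmask(uint8_t *cmask, const uint8_t *emulated,
                                size_t size)
{
    assert(cmask && emulated);

    for (size_t i = 0; i < size; i++) {
        cmask[i] &= emulated[i];
    }
}
```

In this model, a byte whose emulated mask is 0 ends up with cmask 0, so a
difference the kernel introduced in that byte no longer counts as a
conflict, while fully emulated bytes are still compared.
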
hw/vfio/cpr.c | 31 +++++++++++++++++++++++++++++++
hw/vfio/pci.c | 6 ++++++
include/hw/vfio/vfio-cpr.h | 2 ++
3 files changed, 39 insertions(+)
diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
index 0e59612..6ea8e9f 100644
--- a/hw/vfio/cpr.c
+++ b/hw/vfio/cpr.c
@@ -8,6 +8,8 @@
#include "qemu/osdep.h"
#include "hw/vfio/vfio-device.h"
#include "hw/vfio/vfio-cpr.h"
+#include "hw/vfio/pci.h"
+#include "migration/cpr.h"
#include "qapi/error.h"
#include "system/runstate.h"
@@ -37,3 +39,32 @@ void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer)
{
migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
}
+
+/*
+ * The kernel may change non-emulated config bits. Exclude them from the
+ * changed-bits check in get_pci_config_device.
+ */
+static int vfio_cpr_pci_pre_load(void *opaque)
+{
+ VFIOPCIDevice *vdev = opaque;
+ PCIDevice *pdev = &vdev->pdev;
+ int size = MIN(pci_config_size(pdev), vdev->config_size);
+ int i;
+
+ for (i = 0; i < size; i++) {
+ pdev->cmask[i] &= vdev->emulated_config_bits[i];
+ }
+
+ return 0;
+}
+
+const VMStateDescription vfio_cpr_pci_vmstate = {
+ .name = "vfio-cpr-pci",
+ .version_id = 0,
+ .minimum_version_id = 0,
+ .pre_load = vfio_cpr_pci_pre_load,
+ .needed = cpr_needed_for_reuse,
+ .fields = (VMStateField[]) {
+ VMSTATE_END_OF_LIST()
+ }
+};
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a1bfdfe..4aa83b1 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3344,6 +3344,11 @@ static void vfio_pci_reset(DeviceState *dev)
{
VFIOPCIDevice *vdev = VFIO_PCI_BASE(dev);
+ /* Do not reset the device during qemu_system_reset prior to cpr load */
+ if (vdev->vbasedev.cpr.reused) {
+ return;
+ }
+
trace_vfio_pci_reset(vdev->vbasedev.name);
vfio_pci_pre_reset(vdev);
@@ -3513,6 +3518,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, const void *data)
#ifdef CONFIG_IOMMUFD
object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd);
#endif
+ dc->vmsd = &vfio_cpr_pci_vmstate;
dc->desc = "VFIO-based PCI device assignment";
pdc->realize = vfio_realize;
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index d6d22f2..e93600f 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -49,4 +49,6 @@ void vfio_cpr_giommu_remap(struct VFIOContainerBase *bcontainer,
bool vfio_cpr_ram_discard_register_listener(
struct VFIOContainerBase *bcontainer, MemoryRegionSection *section);
+extern const VMStateDescription vfio_cpr_pci_vmstate;
+
#endif /* HW_VFIO_VFIO_CPR_H */
--
1.8.3.1
^ permalink raw reply related [flat|nested] 157+ messages in thread
* Re: [PATCH V3 15/42] vfio-pci: skip reset during cpr
2025-05-12 15:32 ` [PATCH V3 15/42] vfio-pci: " Steve Sistare
@ 2025-05-20 6:48 ` Cédric Le Goater
2025-05-20 13:44 ` Steven Sistare
0 siblings, 1 reply; 157+ messages in thread
From: Cédric Le Goater @ 2025-05-20 6:48 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/12/25 17:32, Steve Sistare wrote:
> Do not reset a vfio-pci device during CPR, and do not complain if the
> kernel's PCI config space changes for non-emulated bits between the
> vmstate save and load, which can happen due to ongoing interrupt activity.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/vfio/cpr.c | 31 +++++++++++++++++++++++++++++++
> hw/vfio/pci.c | 6 ++++++
> include/hw/vfio/vfio-cpr.h | 2 ++
> 3 files changed, 39 insertions(+)
>
> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
> index 0e59612..6ea8e9f 100644
> --- a/hw/vfio/cpr.c
> +++ b/hw/vfio/cpr.c
> @@ -8,6 +8,8 @@
> #include "qemu/osdep.h"
> #include "hw/vfio/vfio-device.h"
> #include "hw/vfio/vfio-cpr.h"
> +#include "hw/vfio/pci.h"
> +#include "migration/cpr.h"
> #include "qapi/error.h"
> #include "system/runstate.h"
>
> @@ -37,3 +39,32 @@ void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer)
> {
> migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
> }
> +
> +/*
> + * The kernel may change non-emulated config bits. Exclude them from the
> + * changed-bits check in get_pci_config_device.
> + */
> +static int vfio_cpr_pci_pre_load(void *opaque)
> +{
> + VFIOPCIDevice *vdev = opaque;
> + PCIDevice *pdev = &vdev->pdev;
> + int size = MIN(pci_config_size(pdev), vdev->config_size);
> + int i;
> +
> + for (i = 0; i < size; i++) {
> + pdev->cmask[i] &= vdev->emulated_config_bits[i];
> + }
> +
> + return 0;
> +}
> +
> +const VMStateDescription vfio_cpr_pci_vmstate = {
> + .name = "vfio-cpr-pci",
> + .version_id = 0,
> + .minimum_version_id = 0,
> + .pre_load = vfio_cpr_pci_pre_load,
> + .needed = cpr_needed_for_reuse,
> + .fields = (VMStateField[]) {
> + VMSTATE_END_OF_LIST()
> + }
> +};
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index a1bfdfe..4aa83b1 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3344,6 +3344,11 @@ static void vfio_pci_reset(DeviceState *dev)
> {
> VFIOPCIDevice *vdev = VFIO_PCI_BASE(dev);
>
> + /* Do not reset the device during qemu_system_reset prior to cpr load */
> + if (vdev->vbasedev.cpr.reused) {
> + return;
> + }
> +
hw/pci/pci.c does:

    if (cpr_is_incoming()) {
        return;
    }

So, to be consistent, I think VFIO should do the same.

Thanks,

C.

> trace_vfio_pci_reset(vdev->vbasedev.name);
>
> vfio_pci_pre_reset(vdev);
> @@ -3513,6 +3518,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, const void *data)
> #ifdef CONFIG_IOMMUFD
> object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd);
> #endif
> + dc->vmsd = &vfio_cpr_pci_vmstate;
> dc->desc = "VFIO-based PCI device assignment";
> pdc->realize = vfio_realize;
>
> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
> index d6d22f2..e93600f 100644
> --- a/include/hw/vfio/vfio-cpr.h
> +++ b/include/hw/vfio/vfio-cpr.h
> @@ -49,4 +49,6 @@ void vfio_cpr_giommu_remap(struct VFIOContainerBase *bcontainer,
> bool vfio_cpr_ram_discard_register_listener(
> struct VFIOContainerBase *bcontainer, MemoryRegionSection *section);
>
> +extern const VMStateDescription vfio_cpr_pci_vmstate;
> +
> #endif /* HW_VFIO_VFIO_CPR_H */
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: [PATCH V3 15/42] vfio-pci: skip reset during cpr
2025-05-20 6:48 ` Cédric Le Goater
@ 2025-05-20 13:44 ` Steven Sistare
0 siblings, 0 replies; 157+ messages in thread
From: Steven Sistare @ 2025-05-20 13:44 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Yi Liu, Eric Auger, Zhenzhong Duan,
Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu, Fabiano Rosas
On 5/20/2025 2:48 AM, Cédric Le Goater wrote:
> On 5/12/25 17:32, Steve Sistare wrote:
>> Do not reset a vfio-pci device during CPR, and do not complain if the
>> kernel's PCI config space changes for non-emulated bits between the
>> vmstate save and load, which can happen due to ongoing interrupt activity.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> hw/vfio/cpr.c | 31 +++++++++++++++++++++++++++++++
>> hw/vfio/pci.c | 6 ++++++
>> include/hw/vfio/vfio-cpr.h | 2 ++
>> 3 files changed, 39 insertions(+)
>>
>> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
>> index 0e59612..6ea8e9f 100644
>> --- a/hw/vfio/cpr.c
>> +++ b/hw/vfio/cpr.c
>> @@ -8,6 +8,8 @@
>> #include "qemu/osdep.h"
>> #include "hw/vfio/vfio-device.h"
>> #include "hw/vfio/vfio-cpr.h"
>> +#include "hw/vfio/pci.h"
>> +#include "migration/cpr.h"
>> #include "qapi/error.h"
>> #include "system/runstate.h"
>> @@ -37,3 +39,32 @@ void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer)
>> {
>> migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
>> }
>> +
>> +/*
>> + * The kernel may change non-emulated config bits. Exclude them from the
>> + * changed-bits check in get_pci_config_device.
>> + */
>> +static int vfio_cpr_pci_pre_load(void *opaque)
>> +{
>> + VFIOPCIDevice *vdev = opaque;
>> + PCIDevice *pdev = &vdev->pdev;
>> + int size = MIN(pci_config_size(pdev), vdev->config_size);
>> + int i;
>> +
>> + for (i = 0; i < size; i++) {
>> + pdev->cmask[i] &= vdev->emulated_config_bits[i];
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +const VMStateDescription vfio_cpr_pci_vmstate = {
>> + .name = "vfio-cpr-pci",
>> + .version_id = 0,
>> + .minimum_version_id = 0,
>> + .pre_load = vfio_cpr_pci_pre_load,
>> + .needed = cpr_needed_for_reuse,
>> + .fields = (VMStateField[]) {
>> + VMSTATE_END_OF_LIST()
>> + }
>> +};
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index a1bfdfe..4aa83b1 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3344,6 +3344,11 @@ static void vfio_pci_reset(DeviceState *dev)
>> {
>> VFIOPCIDevice *vdev = VFIO_PCI_BASE(dev);
>> + /* Do not reset the device during qemu_system_reset prior to cpr load */
>> + if (vdev->vbasedev.cpr.reused) {
>> + return;
>> + }
>> +
>
> hw/pci/pci.c does :
>
> if (cpr_is_incoming()) {
> return;
> }
>
> So, to be consistent, I think VFIO should do the same.
>
will do - steve
>> trace_vfio_pci_reset(vdev->vbasedev.name);
>> vfio_pci_pre_reset(vdev);
>> @@ -3513,6 +3518,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, const void *data)
>> #ifdef CONFIG_IOMMUFD
>> object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd);
>> #endif
>> + dc->vmsd = &vfio_cpr_pci_vmstate;
>> dc->desc = "VFIO-based PCI device assignment";
>> pdc->realize = vfio_realize;
>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>> index d6d22f2..e93600f 100644
>> --- a/include/hw/vfio/vfio-cpr.h
>> +++ b/include/hw/vfio/vfio-cpr.h
>> @@ -49,4 +49,6 @@ void vfio_cpr_giommu_remap(struct VFIOContainerBase *bcontainer,
>> bool vfio_cpr_ram_discard_register_listener(
>> struct VFIOContainerBase *bcontainer, MemoryRegionSection *section);
>> +extern const VMStateDescription vfio_cpr_pci_vmstate;
>> +
>> #endif /* HW_VFIO_VFIO_CPR_H */
>
^ permalink raw reply [flat|nested] 157+ messages in thread
* [PATCH V3 16/42] vfio/pci: vfio_vector_init
2025-05-12 15:32 [PATCH V3 00/42] Live update: vfio and iommufd Steve Sistare
` (14 preceding siblings ...)
2025-05-12 15:32 ` [PATCH V3 15/42] vfio-pci: " Steve Sistare
@ 2025-05-12 15:32 ` Steve Sistare
2025-05-16 8:32 ` Cédric Le Goater
2025-05-12 15:32 ` [PATCH V3 17/42] vfio/pci: vfio_notifier_init Steve Sistare
` (26 subsequent siblings)
42 siblings, 1 reply; 157+ messages in thread
From: Steve Sistare @ 2025-05-12 15:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cedric Le Goater, Yi Liu, Eric Auger,
Zhenzhong Duan, Michael S. Tsirkin, Marcel Apfelbaum, Peter Xu,
Fabiano Rosas, Steve Sistare
Extract a subroutine vfio_vector_init. No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/vfio/pci.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 4aa83b1..b46c42e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -511,6 +511,22 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
kvm_irqchip_commit_routes(kvm_state);
}
+static void vfio_vector_init(VFIOPCIDevice *vdev, int nr)
+{
+ VFIOMSIVector *vector = &vdev->msi_vectors[nr];
+ PCIDevice *pdev = &vdev->pdev;
+
+ vector->vdev = vdev;
+ vector->virq = -1;
+ if (event_notifier_init(&vector->interrupt, 0)) {
+ error_report("vfio: Error: event_notifier_init failed");
+ }
+ vector->use = true;
+ if (vdev->interrupt == VFIO_INT_MSIX)