* [PATCH 0/3] vfio/migration: Make VFIO migration non-experimental
@ 2023-06-26 8:23 Avihai Horon
2023-06-26 8:23 ` [PATCH 1/3] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Avihai Horon
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Avihai Horon @ 2023-06-26 8:23 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cédric Le Goater, Juan Quintela, Peter Xu,
Leonardo Bras, Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe,
Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
Joao Martins
Hello,
The major parts of VFIO migration are supported today in QEMU. This
includes basic VFIO migration, device dirty page tracking and precopy
support. Thus, at this point in time, it seems appropriate to make VFIO
migration non-experimental.
This short series (which is based on the precopy series [1]) does that
and also adds a few improvements:
- Patch #1 moves the transition from STOP_COPY to STOP state to
vfio_save_cleanup(). Testing with a ConnectX-7 VFIO device showed
that this can reduce downtime by 6% with loaded devices.
- Patch #2 resets bytes_transferred counter properly.
- Patch #3 cleans up the VFIO migration realize flow and makes VFIO
migration non-experimental.
Note that Zhenzhong's series [2] fixes additional bugs and further
cleans the VFIO migration realize flow.
Thanks.
[1]
https://lore.kernel.org/qemu-devel/20230621111201.29729-1-avihaih@nvidia.com/
[2]
https://lore.kernel.org/qemu-devel/20230621080204.420723-1-zhenzhong.duan@intel.com/
Avihai Horon (3):
vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup()
vfio/migration: Reset bytes_transferred properly
vfio/migration: Make VFIO migration non-experimental
include/hw/vfio/vfio-common.h | 3 +-
migration/migration.h | 1 +
hw/vfio/migration.c | 54 ++++++++++++++++++++++-------------
hw/vfio/pci.c | 4 +--
migration/migration.c | 1 +
migration/savevm.c | 1 +
migration/target.c | 17 +++++++++--
7 files changed, 56 insertions(+), 25 deletions(-)
--
2.26.3
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/3] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup()
2023-06-26 8:23 [PATCH 0/3] vfio/migration: Make VFIO migration non-experimental Avihai Horon
@ 2023-06-26 8:23 ` Avihai Horon
2023-06-26 9:56 ` Cédric Le Goater
2023-06-26 8:23 ` [PATCH 2/3] vfio/migration: Reset bytes_transferred properly Avihai Horon
2023-06-26 8:23 ` [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental Avihai Horon
2 siblings, 1 reply; 22+ messages in thread
From: Avihai Horon @ 2023-06-26 8:23 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cédric Le Goater, Juan Quintela, Peter Xu,
Leonardo Bras, Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe,
Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
Joao Martins
Changing the device state from STOP_COPY to STOP can take time as the
device may need to free resources and do other operations as part of the
transition. Currently, this is done in vfio_save_complete_precopy() and
therefore it is counted in the migration downtime.
To avoid this, change the device state from STOP_COPY to STOP in
vfio_save_cleanup(), which is called after migration has completed and
thus is not part of migration downtime.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
hw/vfio/migration.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index acbf0bb7ab..a8bfbe4b89 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -383,6 +383,19 @@ static void vfio_save_cleanup(void *opaque)
VFIODevice *vbasedev = opaque;
VFIOMigration *migration = vbasedev->migration;
+ /*
+ * Changing device state from STOP_COPY to STOP can take time. Do it here,
+ * after migration has completed, so it won't increase downtime.
+ */
+ if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
+ /*
+ * If setting the device in STOP state fails, the device should be
+ * reset. To do so, use ERROR state as a recover state.
+ */
+ vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
+ VFIO_DEVICE_STATE_ERROR);
+ }
+
g_free(migration->data_buffer);
migration->data_buffer = NULL;
migration->precopy_init_size = 0;
@@ -508,12 +521,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
return ret;
}
- /*
- * If setting the device in STOP state fails, the device should be reset.
- * To do so, use ERROR state as a recover state.
- */
- ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
- VFIO_DEVICE_STATE_ERROR);
trace_vfio_save_complete_precopy(vbasedev->name, ret);
return ret;
--
2.26.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/3] vfio/migration: Reset bytes_transferred properly
2023-06-26 8:23 [PATCH 0/3] vfio/migration: Make VFIO migration non-experimental Avihai Horon
2023-06-26 8:23 ` [PATCH 1/3] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Avihai Horon
@ 2023-06-26 8:23 ` Avihai Horon
2023-06-26 9:52 ` Cédric Le Goater
2023-06-26 12:08 ` Avihai Horon
2023-06-26 8:23 ` [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental Avihai Horon
2 siblings, 2 replies; 22+ messages in thread
From: Avihai Horon @ 2023-06-26 8:23 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cédric Le Goater, Juan Quintela, Peter Xu,
Leonardo Bras, Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe,
Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
Joao Martins
Currently, VFIO bytes_transferred is not reset properly:
1. bytes_transferred is not reset after a VM snapshot (so a migration
following a snapshot will report incorrect value).
2. bytes_transferred is a single counter for all VFIO devices, however
upon migration failure it is reset multiple times, by each VFIO
device.
Fix it by introducing a new function vfio_reset_bytes_transferred() and
calling it during migration and snapshot start.
Remove existing bytes_transferred reset in VFIO migration state
notifier, which is not needed anymore.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
include/hw/vfio/vfio-common.h | 1 +
migration/migration.h | 1 +
hw/vfio/migration.c | 6 +++++-
migration/migration.c | 1 +
migration/savevm.c | 1 +
migration/target.c | 17 +++++++++++++++--
6 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 3dc5f2104c..b4c28f318f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -228,6 +228,7 @@ int vfio_block_multiple_devices_migration(Error **errp);
void vfio_unblock_multiple_devices_migration(void);
int vfio_block_giommu_migration(Error **errp);
int64_t vfio_mig_bytes_transferred(void);
+void vfio_reset_bytes_transferred(void);
#ifdef CONFIG_LINUX
int vfio_get_region_info(VFIODevice *vbasedev, int index,
diff --git a/migration/migration.h b/migration/migration.h
index c859a0d35e..a80b22b703 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -514,6 +514,7 @@ bool migration_rate_limit(void);
void migration_cancel(const Error *error);
void populate_vfio_info(MigrationInfo *info);
+void reset_vfio_bytes_transferred(void);
void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
#endif
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index a8bfbe4b89..79eb81dfd7 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -704,7 +704,6 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
case MIGRATION_STATUS_CANCELLING:
case MIGRATION_STATUS_CANCELLED:
case MIGRATION_STATUS_FAILED:
- bytes_transferred = 0;
/*
* If setting the device in RUNNING state fails, the device should
* be reset. To do so, use ERROR state as a recover state.
@@ -825,6 +824,11 @@ int64_t vfio_mig_bytes_transferred(void)
return bytes_transferred;
}
+void vfio_reset_bytes_transferred(void)
+{
+ bytes_transferred = 0;
+}
+
int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
{
int ret = -ENOTSUP;
diff --git a/migration/migration.c b/migration/migration.c
index 7653787f74..096e8191d1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1628,6 +1628,7 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
*/
memset(&mig_stats, 0, sizeof(mig_stats));
memset(&compression_counters, 0, sizeof(compression_counters));
+ reset_vfio_bytes_transferred();
return true;
}
diff --git a/migration/savevm.c b/migration/savevm.c
index cdf4793924..95c2abf47c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1622,6 +1622,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
migrate_init(ms);
memset(&mig_stats, 0, sizeof(mig_stats));
memset(&compression_counters, 0, sizeof(compression_counters));
+ reset_vfio_bytes_transferred();
ms->to_dst_file = f;
qemu_mutex_unlock_iothread();
diff --git a/migration/target.c b/migration/target.c
index 00ca007f97..f39c9a8d88 100644
--- a/migration/target.c
+++ b/migration/target.c
@@ -14,12 +14,25 @@
#include "hw/vfio/vfio-common.h"
#endif
+#ifdef CONFIG_VFIO
void populate_vfio_info(MigrationInfo *info)
{
-#ifdef CONFIG_VFIO
if (vfio_mig_active()) {
info->vfio = g_malloc0(sizeof(*info->vfio));
info->vfio->transferred = vfio_mig_bytes_transferred();
}
-#endif
}
+
+void reset_vfio_bytes_transferred(void)
+{
+ vfio_reset_bytes_transferred();
+}
+#else
+void populate_vfio_info(MigrationInfo *info)
+{
+}
+
+void reset_vfio_bytes_transferred(void)
+{
+}
+#endif
--
2.26.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
2023-06-26 8:23 [PATCH 0/3] vfio/migration: Make VFIO migration non-experimental Avihai Horon
2023-06-26 8:23 ` [PATCH 1/3] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Avihai Horon
2023-06-26 8:23 ` [PATCH 2/3] vfio/migration: Reset bytes_transferred properly Avihai Horon
@ 2023-06-26 8:23 ` Avihai Horon
2023-06-26 13:20 ` Cédric Le Goater
2 siblings, 1 reply; 22+ messages in thread
From: Avihai Horon @ 2023-06-26 8:23 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cédric Le Goater, Juan Quintela, Peter Xu,
Leonardo Bras, Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe,
Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
Joao Martins
The major parts of VFIO migration are supported today in QEMU. This
includes basic VFIO migration, device dirty page tracking and precopy
support.
Thus, at this point in time, it seems appropriate to make VFIO migration
non-experimental: remove the x prefix from enable_migration property,
change it to ON_OFF_AUTO and let the default value be AUTO.
In addition, make the following adjustments:
1. Require device dirty tracking support when enable_migration is AUTO
(i.e., not explicitly enabled). This is because device dirty tracking
is currently the only method to do dirty page tracking, which is
essential for migrating in a reasonable downtime. Setting
enable_migration to ON will not require device dirty tracking.
2. Make migration blocker messages more elaborate.
3. Remove error prints in vfio_migration_query_flags().
4. Remove a redundant assignment in vfio_migration_realize().
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
include/hw/vfio/vfio-common.h | 2 +-
hw/vfio/migration.c | 29 ++++++++++++++++-------------
hw/vfio/pci.c | 4 ++--
3 files changed, 19 insertions(+), 16 deletions(-)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b4c28f318f..387eabde60 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -139,7 +139,7 @@ typedef struct VFIODevice {
bool needs_reset;
bool no_mmap;
bool ram_block_discard_allowed;
- bool enable_migration;
+ OnOffAuto enable_migration;
VFIODeviceOps *ops;
unsigned int num_irqs;
unsigned int num_regions;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 79eb81dfd7..d8e0848635 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -731,14 +731,6 @@ static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags)
feature->argsz = sizeof(buf);
feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
- if (errno == ENOTTY) {
- error_report("%s: VFIO migration is not supported in kernel",
- vbasedev->name);
- } else {
- error_report("%s: Failed to query VFIO migration support, err: %s",
- vbasedev->name, strerror(errno));
- }
-
return -errno;
}
@@ -831,14 +823,28 @@ void vfio_reset_bytes_transferred(void)
int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
{
- int ret = -ENOTSUP;
+ int ret;
- if (!vbasedev->enable_migration) {
+ if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) {
+ error_setg(&vbasedev->migration_blocker,
+ "%s: Migration is disabled for VFIO device", vbasedev->name);
goto add_blocker;
}
ret = vfio_migration_init(vbasedev);
if (ret) {
+ error_setg(&vbasedev->migration_blocker,
+ "%s: Migration couldn't be initialized for VFIO device, "
+ "err: %d (%s)",
+ vbasedev->name, ret, strerror(-ret));
+ goto add_blocker;
+ }
+
+ if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO &&
+ !vbasedev->dirty_pages_supported) {
+ error_setg(&vbasedev->migration_blocker,
+ "%s: VFIO device doesn't support device dirty tracking",
+ vbasedev->name);
goto add_blocker;
}
@@ -856,9 +862,6 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
return 0;
add_blocker:
- error_setg(&vbasedev->migration_blocker,
- "VFIO device doesn't support migration");
-
ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
if (ret < 0) {
error_free(vbasedev->migration_blocker);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 73874a94de..48584e3b01 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3347,8 +3347,8 @@ static Property vfio_pci_dev_properties[] = {
VFIO_FEATURE_ENABLE_REQ_BIT, true),
DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
- DEFINE_PROP_BOOL("x-enable-migration", VFIOPCIDevice,
- vbasedev.enable_migration, false),
+ DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
+ vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
vbasedev.ram_block_discard_allowed, false),
--
2.26.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] vfio/migration: Reset bytes_transferred properly
2023-06-26 8:23 ` [PATCH 2/3] vfio/migration: Reset bytes_transferred properly Avihai Horon
@ 2023-06-26 9:52 ` Cédric Le Goater
2023-06-26 11:46 ` Avihai Horon
2023-06-26 12:08 ` Avihai Horon
1 sibling, 1 reply; 22+ messages in thread
From: Cédric Le Goater @ 2023-06-26 9:52 UTC (permalink / raw)
To: Avihai Horon, qemu-devel
Cc: Alex Williamson, Juan Quintela, Peter Xu, Leonardo Bras,
Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
Kirti Wankhede, Tarun Gupta, Joao Martins
Hello Avihai,
On 6/26/23 10:23, Avihai Horon wrote:
> Currently, VFIO bytes_transferred is not reset properly:
> 1. bytes_transferred is not reset after a VM snapshot (so a migration
> following a snapshot will report incorrect value).
> 2. bytes_transferred is a single counter for all VFIO devices, however
> upon migration failure it is reset multiple times, by each VFIO
> device.
>
> Fix it by introducing a new function vfio_reset_bytes_transferred() and
> calling it during migration and snapshot start.
>
> Remove existing bytes_transferred reset in VFIO migration state
> notifier, which is not needed anymore.
a Fixes: tag would be useful.
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
> include/hw/vfio/vfio-common.h | 1 +
> migration/migration.h | 1 +
> hw/vfio/migration.c | 6 +++++-
> migration/migration.c | 1 +
> migration/savevm.c | 1 +
> migration/target.c | 17 +++++++++++++++--
> 6 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 3dc5f2104c..b4c28f318f 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -228,6 +228,7 @@ int vfio_block_multiple_devices_migration(Error **errp);
> void vfio_unblock_multiple_devices_migration(void);
> int vfio_block_giommu_migration(Error **errp);
> int64_t vfio_mig_bytes_transferred(void);
> +void vfio_reset_bytes_transferred(void);
>
> #ifdef CONFIG_LINUX
> int vfio_get_region_info(VFIODevice *vbasedev, int index,
> diff --git a/migration/migration.h b/migration/migration.h
> index c859a0d35e..a80b22b703 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -514,6 +514,7 @@ bool migration_rate_limit(void);
> void migration_cancel(const Error *error);
>
> void populate_vfio_info(MigrationInfo *info);
> +void reset_vfio_bytes_transferred(void);
> void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
>
> #endif
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index a8bfbe4b89..79eb81dfd7 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -704,7 +704,6 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
> case MIGRATION_STATUS_CANCELLING:
> case MIGRATION_STATUS_CANCELLED:
> case MIGRATION_STATUS_FAILED:
> - bytes_transferred = 0;
> /*
> * If setting the device in RUNNING state fails, the device should
> * be reset. To do so, use ERROR state as a recover state.
> @@ -825,6 +824,11 @@ int64_t vfio_mig_bytes_transferred(void)
> return bytes_transferred;
> }
>
> +void vfio_reset_bytes_transferred(void)
> +{
> + bytes_transferred = 0;
> +}
> +
> int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> {
> int ret = -ENOTSUP;
> diff --git a/migration/migration.c b/migration/migration.c
> index 7653787f74..096e8191d1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1628,6 +1628,7 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
> */
> memset(&mig_stats, 0, sizeof(mig_stats));
> memset(&compression_counters, 0, sizeof(compression_counters));
> + reset_vfio_bytes_transferred();
>
> return true;
> }
> diff --git a/migration/savevm.c b/migration/savevm.c
> index cdf4793924..95c2abf47c 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1622,6 +1622,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
> migrate_init(ms);
> memset(&mig_stats, 0, sizeof(mig_stats));
> memset(&compression_counters, 0, sizeof(compression_counters));
> + reset_vfio_bytes_transferred();
> ms->to_dst_file = f;
>
> qemu_mutex_unlock_iothread();
> diff --git a/migration/target.c b/migration/target.c
> index 00ca007f97..f39c9a8d88 100644
> --- a/migration/target.c
> +++ b/migration/target.c
> @@ -14,12 +14,25 @@
> #include "hw/vfio/vfio-common.h"
> #endif
>
> +#ifdef CONFIG_VFIO
> void populate_vfio_info(MigrationInfo *info)
> {
> -#ifdef CONFIG_VFIO
> if (vfio_mig_active()) {
> info->vfio = g_malloc0(sizeof(*info->vfio));
> info->vfio->transferred = vfio_mig_bytes_transferred();
> }
> -#endif
> }
> +
> +void reset_vfio_bytes_transferred(void)
> +{
> + vfio_reset_bytes_transferred();
> +}
> +#else
> +void populate_vfio_info(MigrationInfo *info)
> +{
> +}
> +
> +void reset_vfio_bytes_transferred(void)
> +{
> +}
> +#endif
I would simply use static inline in migration.h if !CONFIG_VFIO.
Thanks,
C.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup()
2023-06-26 8:23 ` [PATCH 1/3] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Avihai Horon
@ 2023-06-26 9:56 ` Cédric Le Goater
2023-06-26 11:31 ` Avihai Horon
0 siblings, 1 reply; 22+ messages in thread
From: Cédric Le Goater @ 2023-06-26 9:56 UTC (permalink / raw)
To: Avihai Horon, qemu-devel
Cc: Alex Williamson, Juan Quintela, Peter Xu, Leonardo Bras,
Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
Kirti Wankhede, Tarun Gupta, Joao Martins
On 6/26/23 10:23, Avihai Horon wrote:
> Changing the device state from STOP_COPY to STOP can take time as the
> device may need to free resources and do other operations as part of the
> transition. Currently, this is done in vfio_save_complete_precopy() and
> therefore it is counted in the migration downtime.
>
> To avoid this, change the device state from STOP_COPY to STOP in
> vfio_save_cleanup(), which is called after migration has completed and
> thus is not part of migration downtime.
That's optimization and
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
> hw/vfio/migration.c | 19 +++++++++++++------
> 1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index acbf0bb7ab..a8bfbe4b89 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -383,6 +383,19 @@ static void vfio_save_cleanup(void *opaque)
> VFIODevice *vbasedev = opaque;
> VFIOMigration *migration = vbasedev->migration;
>
> + /*
> + * Changing device state from STOP_COPY to STOP can take time. Do it here,
> + * after migration has completed, so it won't increase downtime.
> + */
> + if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
> + /*
> + * If setting the device in STOP state fails, the device should be
> + * reset. To do so, use ERROR state as a recover state.
> + */
> + vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
> + VFIO_DEVICE_STATE_ERROR);
> + }
> +
> g_free(migration->data_buffer);
> migration->data_buffer = NULL;
> migration->precopy_init_size = 0;
> @@ -508,12 +521,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
> return ret;
> }
>
> - /*
> - * If setting the device in STOP state fails, the device should be reset.
> - * To do so, use ERROR state as a recover state.
> - */
> - ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
> - VFIO_DEVICE_STATE_ERROR);
we loose the possible returned error. Let's keep that change for later.
Thanks,
C.
> trace_vfio_save_complete_precopy(vbasedev->name, ret);
>
> return ret;
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup()
2023-06-26 9:56 ` Cédric Le Goater
@ 2023-06-26 11:31 ` Avihai Horon
0 siblings, 0 replies; 22+ messages in thread
From: Avihai Horon @ 2023-06-26 11:31 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Juan Quintela, Peter Xu, Leonardo Bras,
Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
Kirti Wankhede, Tarun Gupta, Joao Martins
On 26/06/2023 12:56, Cédric Le Goater wrote:
> External email: Use caution opening links or attachments
>
>
> On 6/26/23 10:23, Avihai Horon wrote:
>> Changing the device state from STOP_COPY to STOP can take time as the
>> device may need to free resources and do other operations as part of the
>> transition. Currently, this is done in vfio_save_complete_precopy() and
>> therefore it is counted in the migration downtime.
>>
>> To avoid this, change the device state from STOP_COPY to STOP in
>> vfio_save_cleanup(), which is called after migration has completed and
>> thus is not part of migration downtime.
>
> That's optimization and
>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>> hw/vfio/migration.c | 19 +++++++++++++------
>> 1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index acbf0bb7ab..a8bfbe4b89 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -383,6 +383,19 @@ static void vfio_save_cleanup(void *opaque)
>> VFIODevice *vbasedev = opaque;
>> VFIOMigration *migration = vbasedev->migration;
>>
>> + /*
>> + * Changing device state from STOP_COPY to STOP can take time.
>> Do it here,
>> + * after migration has completed, so it won't increase downtime.
>> + */
>> + if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
>> + /*
>> + * If setting the device in STOP state fails, the device
>> should be
>> + * reset. To do so, use ERROR state as a recover state.
>> + */
>> + vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
>> + VFIO_DEVICE_STATE_ERROR);
>> + }
>> +
>> g_free(migration->data_buffer);
>> migration->data_buffer = NULL;
>> migration->precopy_init_size = 0;
>> @@ -508,12 +521,6 @@ static int vfio_save_complete_precopy(QEMUFile
>> *f, void *opaque)
>> return ret;
>> }
>>
>> - /*
>> - * If setting the device in STOP state fails, the device should
>> be reset.
>> - * To do so, use ERROR state as a recover state.
>> - */
>> - ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
>> - VFIO_DEVICE_STATE_ERROR);
>
> we loose the possible returned error.
Right, but vfio_save_cleanup() is called as part of migration cleanup
flow after migration has finished, so I don't think returning an error
is useful here and the error is still reported in
vfio_migration_set_state() so we don't really loose it.
> Let's keep that change for later.
Sure.
Thanks!
>
>
>> trace_vfio_save_complete_precopy(vbasedev->name, ret);
>>
>> return ret;
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] vfio/migration: Reset bytes_transferred properly
2023-06-26 9:52 ` Cédric Le Goater
@ 2023-06-26 11:46 ` Avihai Horon
2023-06-26 12:01 ` Cédric Le Goater
0 siblings, 1 reply; 22+ messages in thread
From: Avihai Horon @ 2023-06-26 11:46 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Juan Quintela, Peter Xu, Leonardo Bras,
Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
Kirti Wankhede, Tarun Gupta, Joao Martins
On 26/06/2023 12:52, Cédric Le Goater wrote:
> External email: Use caution opening links or attachments
>
>
> Hello Avihai,
>
> On 6/26/23 10:23, Avihai Horon wrote:
>> Currently, VFIO bytes_transferred is not reset properly:
>> 1. bytes_transferred is not reset after a VM snapshot (so a migration
>> following a snapshot will report incorrect value).
>> 2. bytes_transferred is a single counter for all VFIO devices, however
>> upon migration failure it is reset multiple times, by each VFIO
>> device.
>>
>> Fix it by introducing a new function vfio_reset_bytes_transferred() and
>> calling it during migration and snapshot start.
>>
>> Remove existing bytes_transferred reset in VFIO migration state
>> notifier, which is not needed anymore.
>
> a Fixes: tag would be useful.
Sure, I will add.
>
>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>> include/hw/vfio/vfio-common.h | 1 +
>> migration/migration.h | 1 +
>> hw/vfio/migration.c | 6 +++++-
>> migration/migration.c | 1 +
>> migration/savevm.c | 1 +
>> migration/target.c | 17 +++++++++++++++--
>> 6 files changed, 24 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h
>> b/include/hw/vfio/vfio-common.h
>> index 3dc5f2104c..b4c28f318f 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -228,6 +228,7 @@ int vfio_block_multiple_devices_migration(Error
>> **errp);
>> void vfio_unblock_multiple_devices_migration(void);
>> int vfio_block_giommu_migration(Error **errp);
>> int64_t vfio_mig_bytes_transferred(void);
>> +void vfio_reset_bytes_transferred(void);
>>
>> #ifdef CONFIG_LINUX
>> int vfio_get_region_info(VFIODevice *vbasedev, int index,
>> diff --git a/migration/migration.h b/migration/migration.h
>> index c859a0d35e..a80b22b703 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -514,6 +514,7 @@ bool migration_rate_limit(void);
>> void migration_cancel(const Error *error);
>>
>> void populate_vfio_info(MigrationInfo *info);
>> +void reset_vfio_bytes_transferred(void);
>> void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
>>
>> #endif
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index a8bfbe4b89..79eb81dfd7 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -704,7 +704,6 @@ static void
>> vfio_migration_state_notifier(Notifier *notifier, void *data)
>> case MIGRATION_STATUS_CANCELLING:
>> case MIGRATION_STATUS_CANCELLED:
>> case MIGRATION_STATUS_FAILED:
>> - bytes_transferred = 0;
>> /*
>> * If setting the device in RUNNING state fails, the device
>> should
>> * be reset. To do so, use ERROR state as a recover state.
>> @@ -825,6 +824,11 @@ int64_t vfio_mig_bytes_transferred(void)
>> return bytes_transferred;
>> }
>>
>> +void vfio_reset_bytes_transferred(void)
>> +{
>> + bytes_transferred = 0;
>> +}
>> +
>> int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>> {
>> int ret = -ENOTSUP;
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 7653787f74..096e8191d1 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1628,6 +1628,7 @@ static bool migrate_prepare(MigrationState *s,
>> bool blk, bool blk_inc,
>> */
>> memset(&mig_stats, 0, sizeof(mig_stats));
>> memset(&compression_counters, 0, sizeof(compression_counters));
>> + reset_vfio_bytes_transferred();
>>
>> return true;
>> }
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index cdf4793924..95c2abf47c 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1622,6 +1622,7 @@ static int qemu_savevm_state(QEMUFile *f, Error
>> **errp)
>> migrate_init(ms);
>> memset(&mig_stats, 0, sizeof(mig_stats));
>> memset(&compression_counters, 0, sizeof(compression_counters));
>> + reset_vfio_bytes_transferred();
>> ms->to_dst_file = f;
>>
>> qemu_mutex_unlock_iothread();
>> diff --git a/migration/target.c b/migration/target.c
>> index 00ca007f97..f39c9a8d88 100644
>> --- a/migration/target.c
>> +++ b/migration/target.c
>> @@ -14,12 +14,25 @@
>> #include "hw/vfio/vfio-common.h"
>> #endif
>>
>> +#ifdef CONFIG_VFIO
>> void populate_vfio_info(MigrationInfo *info)
>> {
>> -#ifdef CONFIG_VFIO
>> if (vfio_mig_active()) {
>> info->vfio = g_malloc0(sizeof(*info->vfio));
>> info->vfio->transferred = vfio_mig_bytes_transferred();
>> }
>> -#endif
>> }
>> +
>> +void reset_vfio_bytes_transferred(void)
>> +{
>> + vfio_reset_bytes_transferred();
>> +}
>> +#else
>> +void populate_vfio_info(MigrationInfo *info)
>> +{
>> +}
>> +
>> +void reset_vfio_bytes_transferred(void)
>> +{
>> +}
>> +#endif
>
> I would simply use static inline in migration.h if !CONFIG_VFIO.
I'm not sure it's possible to use CONFIG_VFIO in migration.h. Got this
during compilation:
migration/migration.h:517:8: error: attempt to use poisoned "CONFIG_VFIO"
Plus, see 43bd0bf30fce ("migration: Move populate_vfio_info() into a
separate file") commit message:
The CONFIG_VFIO switch only works in target specific code. Since
migration/migration.c is common code, the #ifdef does not have
the intended behavior here. Move the related code to a separate
file now which gets compiled via specific_ss instead.
Or did I misunderstand you?
Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] vfio/migration: Reset bytes_transferred properly
2023-06-26 11:46 ` Avihai Horon
@ 2023-06-26 12:01 ` Cédric Le Goater
0 siblings, 0 replies; 22+ messages in thread
From: Cédric Le Goater @ 2023-06-26 12:01 UTC (permalink / raw)
To: Avihai Horon, qemu-devel
Cc: Alex Williamson, Juan Quintela, Peter Xu, Leonardo Bras,
Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
Kirti Wankhede, Tarun Gupta, Joao Martins
On 6/26/23 13:46, Avihai Horon wrote:
>
> On 26/06/2023 12:52, Cédric Le Goater wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello Avihai,
>>
>> On 6/26/23 10:23, Avihai Horon wrote:
>>> Currently, VFIO bytes_transferred is not reset properly:
>>> 1. bytes_transferred is not reset after a VM snapshot (so a migration
>>> following a snapshot will report incorrect value).
>>> 2. bytes_transferred is a single counter for all VFIO devices, however
>>> upon migration failure it is reset multiple times, by each VFIO
>>> device.
>>>
>>> Fix it by introducing a new function vfio_reset_bytes_transferred() and
>>> calling it during migration and snapshot start.
>>>
>>> Remove existing bytes_transferred reset in VFIO migration state
>>> notifier, which is not needed anymore.
>>
>> a Fixes: tag would be useful.
>
> Sure, I will add.
You could reply with the appropriate tag and the tooling will automatically
collect it. No need to resend for that.
>
>>
>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> ---
>>> include/hw/vfio/vfio-common.h | 1 +
>>> migration/migration.h | 1 +
>>> hw/vfio/migration.c | 6 +++++-
>>> migration/migration.c | 1 +
>>> migration/savevm.c | 1 +
>>> migration/target.c | 17 +++++++++++++++--
>>> 6 files changed, 24 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index 3dc5f2104c..b4c28f318f 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -228,6 +228,7 @@ int vfio_block_multiple_devices_migration(Error **errp);
>>> void vfio_unblock_multiple_devices_migration(void);
>>> int vfio_block_giommu_migration(Error **errp);
>>> int64_t vfio_mig_bytes_transferred(void);
>>> +void vfio_reset_bytes_transferred(void);
>>>
>>> #ifdef CONFIG_LINUX
>>> int vfio_get_region_info(VFIODevice *vbasedev, int index,
>>> diff --git a/migration/migration.h b/migration/migration.h
>>> index c859a0d35e..a80b22b703 100644
>>> --- a/migration/migration.h
>>> +++ b/migration/migration.h
>>> @@ -514,6 +514,7 @@ bool migration_rate_limit(void);
>>> void migration_cancel(const Error *error);
>>>
>>> void populate_vfio_info(MigrationInfo *info);
>>> +void reset_vfio_bytes_transferred(void);
>>> void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
>>>
>>> #endif
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index a8bfbe4b89..79eb81dfd7 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -704,7 +704,6 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
>>> case MIGRATION_STATUS_CANCELLING:
>>> case MIGRATION_STATUS_CANCELLED:
>>> case MIGRATION_STATUS_FAILED:
>>> - bytes_transferred = 0;
>>> /*
>>> * If setting the device in RUNNING state fails, the device should
>>> * be reset. To do so, use ERROR state as a recover state.
>>> @@ -825,6 +824,11 @@ int64_t vfio_mig_bytes_transferred(void)
>>> return bytes_transferred;
>>> }
>>>
>>> +void vfio_reset_bytes_transferred(void)
>>> +{
>>> + bytes_transferred = 0;
>>> +}
>>> +
>>> int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>>> {
>>> int ret = -ENOTSUP;
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 7653787f74..096e8191d1 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -1628,6 +1628,7 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
>>> */
>>> memset(&mig_stats, 0, sizeof(mig_stats));
>>> memset(&compression_counters, 0, sizeof(compression_counters));
>>> + reset_vfio_bytes_transferred();
>>>
>>> return true;
>>> }
>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>> index cdf4793924..95c2abf47c 100644
>>> --- a/migration/savevm.c
>>> +++ b/migration/savevm.c
>>> @@ -1622,6 +1622,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>>> migrate_init(ms);
>>> memset(&mig_stats, 0, sizeof(mig_stats));
>>> memset(&compression_counters, 0, sizeof(compression_counters));
>>> + reset_vfio_bytes_transferred();
>>> ms->to_dst_file = f;
>>>
>>> qemu_mutex_unlock_iothread();
>>> diff --git a/migration/target.c b/migration/target.c
>>> index 00ca007f97..f39c9a8d88 100644
>>> --- a/migration/target.c
>>> +++ b/migration/target.c
>>> @@ -14,12 +14,25 @@
>>> #include "hw/vfio/vfio-common.h"
>>> #endif
>>>
>>> +#ifdef CONFIG_VFIO
>>> void populate_vfio_info(MigrationInfo *info)
>>> {
>>> -#ifdef CONFIG_VFIO
>>> if (vfio_mig_active()) {
>>> info->vfio = g_malloc0(sizeof(*info->vfio));
>>> info->vfio->transferred = vfio_mig_bytes_transferred();
>>> }
>>> -#endif
>>> }
>>> +
>>> +void reset_vfio_bytes_transferred(void)
>>> +{
>>> + vfio_reset_bytes_transferred();
>>> +}
>>> +#else
>>> +void populate_vfio_info(MigrationInfo *info)
>>> +{
>>> +}
>>> +
>>> +void reset_vfio_bytes_transferred(void)
>>> +{
>>> +}
>>> +#endif
>>
>> I would simply use static inline in migration.h if !CONFIG_VFIO.
>
> I'm not sure it's possible to use CONFIG_VFIO in migration.h. Got this during compilation:
> migration/migration.h:517:8: error: attempt to use poisoned "CONFIG_VFIO"
>
> Plus, see 43bd0bf30fce ("migration: Move populate_vfio_info() into a separate file") commit message:
>
> The CONFIG_VFIO switch only works in target specific code. Since
> migration/migration.c is common code, the #ifdef does not have
> the intended behavior here. Move the related code to a separate
> file now which gets compiled via specific_ss instead.
>
> Or did I misunderstand you?
I didn't realize that was not possible. Leave it that way then.
Thanks,
C.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] vfio/migration: Reset bytes_transferred properly
2023-06-26 8:23 ` [PATCH 2/3] vfio/migration: Reset bytes_transferred properly Avihai Horon
2023-06-26 9:52 ` Cédric Le Goater
@ 2023-06-26 12:08 ` Avihai Horon
1 sibling, 0 replies; 22+ messages in thread
From: Avihai Horon @ 2023-06-26 12:08 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cédric Le Goater, Juan Quintela, Peter Xu,
Leonardo Bras, Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe,
Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Joao Martins
On 26/06/2023 11:23, Avihai Horon wrote:
> Currently, VFIO bytes_transferred is not reset properly:
> 1. bytes_transferred is not reset after a VM snapshot (so a migration
> following a snapshot will report incorrect value).
> 2. bytes_transferred is a single counter for all VFIO devices, however
> upon migration failure it is reset multiple times, by each VFIO
> device.
>
> Fix it by introducing a new function vfio_reset_bytes_transferred() and
> calling it during migration and snapshot start.
>
> Remove existing bytes_transferred reset in VFIO migration state
> notifier, which is not needed anymore.
Forgot fixes tag:
Fixes: 3710586caa5d ("qapi: Add VFIO devices migration stats in
Migration stats")
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
> include/hw/vfio/vfio-common.h | 1 +
> migration/migration.h | 1 +
> hw/vfio/migration.c | 6 +++++-
> migration/migration.c | 1 +
> migration/savevm.c | 1 +
> migration/target.c | 17 +++++++++++++++--
> 6 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 3dc5f2104c..b4c28f318f 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -228,6 +228,7 @@ int vfio_block_multiple_devices_migration(Error **errp);
> void vfio_unblock_multiple_devices_migration(void);
> int vfio_block_giommu_migration(Error **errp);
> int64_t vfio_mig_bytes_transferred(void);
> +void vfio_reset_bytes_transferred(void);
>
> #ifdef CONFIG_LINUX
> int vfio_get_region_info(VFIODevice *vbasedev, int index,
> diff --git a/migration/migration.h b/migration/migration.h
> index c859a0d35e..a80b22b703 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -514,6 +514,7 @@ bool migration_rate_limit(void);
> void migration_cancel(const Error *error);
>
> void populate_vfio_info(MigrationInfo *info);
> +void reset_vfio_bytes_transferred(void);
> void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
>
> #endif
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index a8bfbe4b89..79eb81dfd7 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -704,7 +704,6 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
> case MIGRATION_STATUS_CANCELLING:
> case MIGRATION_STATUS_CANCELLED:
> case MIGRATION_STATUS_FAILED:
> - bytes_transferred = 0;
> /*
> * If setting the device in RUNNING state fails, the device should
> * be reset. To do so, use ERROR state as a recover state.
> @@ -825,6 +824,11 @@ int64_t vfio_mig_bytes_transferred(void)
> return bytes_transferred;
> }
>
> +void vfio_reset_bytes_transferred(void)
> +{
> + bytes_transferred = 0;
> +}
> +
> int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> {
> int ret = -ENOTSUP;
> diff --git a/migration/migration.c b/migration/migration.c
> index 7653787f74..096e8191d1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1628,6 +1628,7 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
> */
> memset(&mig_stats, 0, sizeof(mig_stats));
> memset(&compression_counters, 0, sizeof(compression_counters));
> + reset_vfio_bytes_transferred();
>
> return true;
> }
> diff --git a/migration/savevm.c b/migration/savevm.c
> index cdf4793924..95c2abf47c 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1622,6 +1622,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
> migrate_init(ms);
> memset(&mig_stats, 0, sizeof(mig_stats));
> memset(&compression_counters, 0, sizeof(compression_counters));
> + reset_vfio_bytes_transferred();
> ms->to_dst_file = f;
>
> qemu_mutex_unlock_iothread();
> diff --git a/migration/target.c b/migration/target.c
> index 00ca007f97..f39c9a8d88 100644
> --- a/migration/target.c
> +++ b/migration/target.c
> @@ -14,12 +14,25 @@
> #include "hw/vfio/vfio-common.h"
> #endif
>
> +#ifdef CONFIG_VFIO
> void populate_vfio_info(MigrationInfo *info)
> {
> -#ifdef CONFIG_VFIO
> if (vfio_mig_active()) {
> info->vfio = g_malloc0(sizeof(*info->vfio));
> info->vfio->transferred = vfio_mig_bytes_transferred();
> }
> -#endif
> }
> +
> +void reset_vfio_bytes_transferred(void)
> +{
> + vfio_reset_bytes_transferred();
> +}
> +#else
> +void populate_vfio_info(MigrationInfo *info)
> +{
> +}
> +
> +void reset_vfio_bytes_transferred(void)
> +{
> +}
> +#endif
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
2023-06-26 8:23 ` [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental Avihai Horon
@ 2023-06-26 13:20 ` Cédric Le Goater
2023-06-26 13:40 ` Joao Martins
0 siblings, 1 reply; 22+ messages in thread
From: Cédric Le Goater @ 2023-06-26 13:20 UTC (permalink / raw)
To: Avihai Horon, qemu-devel
Cc: Alex Williamson, Juan Quintela, Peter Xu, Leonardo Bras,
Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
Kirti Wankhede, Tarun Gupta, Joao Martins
Hello Avihai,
On 6/26/23 10:23, Avihai Horon wrote:
> The major parts of VFIO migration are supported today in QEMU. This
> includes basic VFIO migration, device dirty page tracking and precopy
> support.
>
> Thus, at this point in time, it seems appropriate to make VFIO migration
> non-experimental: remove the x prefix from enable_migration property,
> change it to ON_OFF_AUTO and let the default value be AUTO.
>
> In addition, make the following adjustments:
> 1. Require device dirty tracking support when enable_migration is AUTO
> (i.e., not explicitly enabled). This is because device dirty tracking
> is currently the only method to do dirty page tracking, which is
> essential for migrating in a reasonable downtime.
hmm, I don't think QEMU should decide to disable a feature for all
devices supposedly because it could be slow for some. That's too
restrictive. What about devices with have small states ? for which
the downtime would be reasonable even without device dirty tracking
support.
> Setting
> enable_migration to ON will not require device dirty tracking.
> 2. Make migration blocker messages more elaborate.
> 3. Remove error prints in vfio_migration_query_flags().
> 4. Remove a redundant assignment in vfio_migration_realize().
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
> include/hw/vfio/vfio-common.h | 2 +-
> hw/vfio/migration.c | 29 ++++++++++++++++-------------
> hw/vfio/pci.c | 4 ++--
> 3 files changed, 19 insertions(+), 16 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index b4c28f318f..387eabde60 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -139,7 +139,7 @@ typedef struct VFIODevice {
> bool needs_reset;
> bool no_mmap;
> bool ram_block_discard_allowed;
> - bool enable_migration;
> + OnOffAuto enable_migration;
> VFIODeviceOps *ops;
> unsigned int num_irqs;
> unsigned int num_regions;
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 79eb81dfd7..d8e0848635 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -731,14 +731,6 @@ static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags)
> feature->argsz = sizeof(buf);
> feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
> if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> - if (errno == ENOTTY) {
> - error_report("%s: VFIO migration is not supported in kernel",
> - vbasedev->name);
> - } else {
> - error_report("%s: Failed to query VFIO migration support, err: %s",
> - vbasedev->name, strerror(errno));
> - }
> -
> return -errno;
> }
>
> @@ -831,14 +823,28 @@ void vfio_reset_bytes_transferred(void)
>
> int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> {
> - int ret = -ENOTSUP;
> + int ret;
>
> - if (!vbasedev->enable_migration) {
> + if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) {
> + error_setg(&vbasedev->migration_blocker,
> + "%s: Migration is disabled for VFIO device", vbasedev->name);
> goto add_blocker;
> }
>
> ret = vfio_migration_init(vbasedev);
> if (ret) {
It would be good to keep the message for 'errno == ENOTTY' as it was in
vfio_migration_query_flags(). When migration fails, it is an important
information to know that it is because the VFIO PCI host device driver
doesn't support the feature. The root cause could be deep below in FW or
how the VF was set up.
> + error_setg(&vbasedev->migration_blocker,
> + "%s: Migration couldn't be initialized for VFIO device, "
> + "err: %d (%s)",
> + vbasedev->name, ret, strerror(-ret));
> + goto add_blocker;
> + }
> +
> + if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO &&
> + !vbasedev->dirty_pages_supported) {
I don't agree with this test.
> + error_setg(&vbasedev->migration_blocker,
> + "%s: VFIO device doesn't support device dirty tracking",
> + vbasedev->name);
> goto add_blocker;
> }
I agree that with ON_OFF_AUTO_AUTO, errors at realize time should be recorded
in a migration blocker. What about the ON_OFF_AUTO_ON case ? If migration was
explicitly requested for the device and the conditions on the host are not met,
I think realize should fail and the machine abort.
Thanks,
C.
> @@ -856,9 +862,6 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> return 0;
>
> add_blocker:
> - error_setg(&vbasedev->migration_blocker,
> - "VFIO device doesn't support migration");
> -
> ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
> if (ret < 0) {
> error_free(vbasedev->migration_blocker);
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 73874a94de..48584e3b01 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3347,8 +3347,8 @@ static Property vfio_pci_dev_properties[] = {
> VFIO_FEATURE_ENABLE_REQ_BIT, true),
> DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
> VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
> - DEFINE_PROP_BOOL("x-enable-migration", VFIOPCIDevice,
> - vbasedev.enable_migration, false),
> + DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
> + vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
> DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
> DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
> vbasedev.ram_block_discard_allowed, false),
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
2023-06-26 13:20 ` Cédric Le Goater
@ 2023-06-26 13:40 ` Joao Martins
2023-06-26 15:26 ` Cédric Le Goater
0 siblings, 1 reply; 22+ messages in thread
From: Joao Martins @ 2023-06-26 13:40 UTC (permalink / raw)
To: Cédric Le Goater, Avihai Horon
Cc: Alex Williamson, Juan Quintela, Peter Xu, Leonardo Bras,
Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
Kirti Wankhede, Tarun Gupta, qemu-devel
On 26/06/2023 14:20, Cédric Le Goater wrote:
> Hello Avihai,
>
> On 6/26/23 10:23, Avihai Horon wrote:
>> The major parts of VFIO migration are supported today in QEMU. This
>> includes basic VFIO migration, device dirty page tracking and precopy
>> support.
>>
>> Thus, at this point in time, it seems appropriate to make VFIO migration
>> non-experimental: remove the x prefix from enable_migration property,
>> change it to ON_OFF_AUTO and let the default value be AUTO.
>>
>> In addition, make the following adjustments:
>> 1. Require device dirty tracking support when enable_migration is AUTO
>> (i.e., not explicitly enabled). This is because device dirty tracking
>> is currently the only method to do dirty page tracking, which is
>> essential for migrating in a reasonable downtime.
>
> hmm, I don't think QEMU should decide to disable a feature for all
> devices supposedly because it could be slow for some. That's too
> restrictive. What about devices with have small states ? for which
> the downtime would be reasonable even without device dirty tracking
> support.
>
device dirty tracking refers to the ability to tracking dirty IOVA used by the
device which will DMA into RAM. It is required because the
consequence/alternative is to transfer all RAM in stop copy phase. Device state
size at that point is the least of our problems downtime wise.
I can imagine that allowing without dirty tracking is useful for developer
testing of the suspend/device-state flows, but as real default (auto) is very
questionable to let it go through without dirty tracking. When we have IOMMUFD
dirty tracking that's when we can relieve this restriction as a default.
But then note that (...)
>
>> Setting
>> enable_migration to ON will not require device dirty tracking.
(...) this lets it ignore dirty tracking as you would like.
>> 2. Make migration blocker messages more elaborate.
>> 3. Remove error prints in vfio_migration_query_flags().
>> 4. Remove a redundant assignment in vfio_migration_realize().
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>> include/hw/vfio/vfio-common.h | 2 +-
>> hw/vfio/migration.c | 29 ++++++++++++++++-------------
>> hw/vfio/pci.c | 4 ++--
>> 3 files changed, 19 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index b4c28f318f..387eabde60 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -139,7 +139,7 @@ typedef struct VFIODevice {
>> bool needs_reset;
>> bool no_mmap;
>> bool ram_block_discard_allowed;
>> - bool enable_migration;
>> + OnOffAuto enable_migration;
>> VFIODeviceOps *ops;
>> unsigned int num_irqs;
>> unsigned int num_regions;
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 79eb81dfd7..d8e0848635 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -731,14 +731,6 @@ static int vfio_migration_query_flags(VFIODevice
>> *vbasedev, uint64_t *mig_flags)
>> feature->argsz = sizeof(buf);
>> feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
>> if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>> - if (errno == ENOTTY) {
>> - error_report("%s: VFIO migration is not supported in kernel",
>> - vbasedev->name);
>> - } else {
>> - error_report("%s: Failed to query VFIO migration support, err: %s",
>> - vbasedev->name, strerror(errno));
>> - }
>> -
>> return -errno;
>> }
>> @@ -831,14 +823,28 @@ void vfio_reset_bytes_transferred(void)
>> int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>> {
>> - int ret = -ENOTSUP;
>> + int ret;
>> - if (!vbasedev->enable_migration) {
>> + if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) {
>> + error_setg(&vbasedev->migration_blocker,
>> + "%s: Migration is disabled for VFIO device", vbasedev->name);
>> goto add_blocker;
>> }
>> ret = vfio_migration_init(vbasedev);
>> if (ret) {
>
> It would be good to keep the message for 'errno == ENOTTY' as it was in
> vfio_migration_query_flags(). When migration fails, it is an important
> information to know that it is because the VFIO PCI host device driver
> doesn't support the feature. The root cause could be deep below in FW or
> how the VF was set up.
>
+1 As I have been in this rabbit hole
>> + error_setg(&vbasedev->migration_blocker,
>> + "%s: Migration couldn't be initialized for VFIO device, "
>> + "err: %d (%s)",
>> + vbasedev->name, ret, strerror(-ret));
>> + goto add_blocker;
>> + }
>> +
>> + if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO &&
>> + !vbasedev->dirty_pages_supported) {
>
> I don't agree with this test.
>
The alternative right now is perceptual dirty tracking. How is that OK as a
default? It's not like we have even an option :(
Maybe perhaps you refer to avoid strongly enforcing *always* it to allow testing
of the non dirty tracking parts? Maybe when you 'force' enabling with
enable-migration=on is when you ignore the dirty tracking which is what this is
doing.
>> + error_setg(&vbasedev->migration_blocker,
>> + "%s: VFIO device doesn't support device dirty tracking",
>> + vbasedev->name);
>> goto add_blocker;
>> }
> I agree that with ON_OFF_AUTO_AUTO, errors at realize time should be recorded
> in a migration blocker. What about the ON_OFF_AUTO_ON case ? If migration was
> explicitly requested for the device and the conditions on the host are not met,
> I think realize should fail and the machine abort.
>
+1 Good point
> Thanks,
>
> C.
>
>
>
>> @@ -856,9 +862,6 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error
>> **errp)
>> return 0;
>> add_blocker:
>> - error_setg(&vbasedev->migration_blocker,
>> - "VFIO device doesn't support migration");
>> -
>> ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
>> if (ret < 0) {
>> error_free(vbasedev->migration_blocker);
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 73874a94de..48584e3b01 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3347,8 +3347,8 @@ static Property vfio_pci_dev_properties[] = {
>> VFIO_FEATURE_ENABLE_REQ_BIT, true),
>> DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
>> VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
>> - DEFINE_PROP_BOOL("x-enable-migration", VFIOPCIDevice,
>> - vbasedev.enable_migration, false),
>> + DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
>> + vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
>> DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
>> DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
>> vbasedev.ram_block_discard_allowed, false),
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
2023-06-26 13:40 ` Joao Martins
@ 2023-06-26 15:26 ` Cédric Le Goater
2023-06-26 16:19 ` Jason Gunthorpe
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Cédric Le Goater @ 2023-06-26 15:26 UTC (permalink / raw)
To: Joao Martins, Avihai Horon
Cc: Alex Williamson, Juan Quintela, Peter Xu, Leonardo Bras,
Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
Kirti Wankhede, Tarun Gupta, qemu-devel
On 6/26/23 15:40, Joao Martins wrote:
> On 26/06/2023 14:20, Cédric Le Goater wrote:
>> Hello Avihai,
>>
>> On 6/26/23 10:23, Avihai Horon wrote:
>>> The major parts of VFIO migration are supported today in QEMU. This
>>> includes basic VFIO migration, device dirty page tracking and precopy
>>> support.
>>>
>>> Thus, at this point in time, it seems appropriate to make VFIO migration
>>> non-experimental: remove the x prefix from enable_migration property,
>>> change it to ON_OFF_AUTO and let the default value be AUTO.
>>>
>>> In addition, make the following adjustments:
>>> 1. Require device dirty tracking support when enable_migration is AUTO
>>> (i.e., not explicitly enabled). This is because device dirty tracking
>>> is currently the only method to do dirty page tracking, which is
>>> essential for migrating in a reasonable downtime.
>>
>> hmm, I don't think QEMU should decide to disable a feature for all
>> devices supposedly because it could be slow for some. That's too
>> restrictive. What about devices with have small states ? for which
>> the downtime would be reasonable even without device dirty tracking
>> support.
>>
>
> device dirty tracking refers to the ability to tracking dirty IOVA used by the
> device which will DMA into RAM. It is required because the
> consequence/alternative is to transfer all RAM in stop copy phase. Device state
> size at that point is the least of our problems downtime wise.
Arg. thanks for reminding me. I tend to take this for granted ...
> I can imagine that allowing without dirty tracking is useful for developer
> testing of the suspend/device-state flows, but as real default (auto) is very
> questionable to let it go through without dirty tracking. When we have IOMMUFD
> dirty tracking that's when we can relieve this restriction as a default.
>
> But then note that (...)
>
>>
>>> Setting
>>> enable_migration to ON will not require device dirty tracking.
>
> (...) this lets it ignore dirty tracking as you would like.
>
>
>>> 2. Make migration blocker messages more elaborate.
>>> 3. Remove error prints in vfio_migration_query_flags().
>>> 4. Remove a redundant assignment in vfio_migration_realize().
>>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> ---
>>> include/hw/vfio/vfio-common.h | 2 +-
>>> hw/vfio/migration.c | 29 ++++++++++++++++-------------
>>> hw/vfio/pci.c | 4 ++--
>>> 3 files changed, 19 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index b4c28f318f..387eabde60 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -139,7 +139,7 @@ typedef struct VFIODevice {
>>> bool needs_reset;
>>> bool no_mmap;
>>> bool ram_block_discard_allowed;
>>> - bool enable_migration;
>>> + OnOffAuto enable_migration;
>>> VFIODeviceOps *ops;
>>> unsigned int num_irqs;
>>> unsigned int num_regions;
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index 79eb81dfd7..d8e0848635 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -731,14 +731,6 @@ static int vfio_migration_query_flags(VFIODevice
>>> *vbasedev, uint64_t *mig_flags)
>>> feature->argsz = sizeof(buf);
>>> feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
>>> if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>>> - if (errno == ENOTTY) {
>>> - error_report("%s: VFIO migration is not supported in kernel",
>>> - vbasedev->name);
>>> - } else {
>>> - error_report("%s: Failed to query VFIO migration support, err: %s",
>>> - vbasedev->name, strerror(errno));
>>> - }
>>> -
>>> return -errno;
>>> }
>>> @@ -831,14 +823,28 @@ void vfio_reset_bytes_transferred(void)
>>> int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>>> {
>>> - int ret = -ENOTSUP;
>>> + int ret;
>>> - if (!vbasedev->enable_migration) {
>>> + if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) {
>>> + error_setg(&vbasedev->migration_blocker,
>>> + "%s: Migration is disabled for VFIO device", vbasedev->name);
>>> goto add_blocker;
>>> }
>>> ret = vfio_migration_init(vbasedev);
>>> if (ret) {
>>
>> It would be good to keep the message for 'errno == ENOTTY' as it was in
>> vfio_migration_query_flags(). When migration fails, it is an important
>> information to know that it is because the VFIO PCI host device driver
>> doesn't support the feature. The root cause could be deep below in FW or
>> how the VF was set up.
>>
> +1 As I have been in this rabbit hole
>
>>> + error_setg(&vbasedev->migration_blocker,
>>> + "%s: Migration couldn't be initialized for VFIO device, "
>>> + "err: %d (%s)",
>>> + vbasedev->name, ret, strerror(-ret));
>>> + goto add_blocker;
>>> + }
>>> +
>>> + if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO &&
>>> + !vbasedev->dirty_pages_supported) {
>>
>> I don't agree with this test.
>>
>
> The alternative right now is perceptual dirty tracking. How is that OK as a
> default? It's not like we have even an option :(
>
> Maybe perhaps you refer to avoid strongly enforcing *always* it to allow testing
> of the non dirty tracking parts? Maybe when you 'force' enabling with
> enable-migration=on is when you ignore the dirty tracking which is what this is
> doing.
I see ON_OFF_AUTO_ON as a way to abort the machine startup while
ON_OFF_AUTO_AUTO would let it run but block migration. We would
need an extra property to relax the checks, else we are hijacking
some pre-existing option to fit our need.
Since dirty tracking is a must-have to implement migration support
for any existing and future VFIO PCI variant driver, anything else
would be experimental code and we are trying to remove the flag !
Please correct me if I am wrong.
So, the case !vbasedev->dirty_pages_supported is just an extra
information to report for why migration is not supported. Does
that sound reasonable ?
Thanks,
C.
>
>>> + error_setg(&vbasedev->migration_blocker,
>>> + "%s: VFIO device doesn't support device dirty tracking",
>>> + vbasedev->name);
>>> goto add_blocker;
>>> }
>> I agree that with ON_OFF_AUTO_AUTO, errors at realize time should be recorded
>> in a migration blocker. What about the ON_OFF_AUTO_ON case ? If migration was
>> explicitly requested for the device and the conditions on the host are not met,
>> I think realize should fail and the machine abort.
>>
> +1 Good point
>
>> Thanks,
>>
>> C.
>>
>>
>>
>>> @@ -856,9 +862,6 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error
>>> **errp)
>>> return 0;
>>> add_blocker:
>>> - error_setg(&vbasedev->migration_blocker,
>>> - "VFIO device doesn't support migration");
>>> -
>>> ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
>>> if (ret < 0) {
>>> error_free(vbasedev->migration_blocker);
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index 73874a94de..48584e3b01 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -3347,8 +3347,8 @@ static Property vfio_pci_dev_properties[] = {
>>> VFIO_FEATURE_ENABLE_REQ_BIT, true),
>>> DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
>>> VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
>>> - DEFINE_PROP_BOOL("x-enable-migration", VFIOPCIDevice,
>>> - vbasedev.enable_migration, false),
>>> + DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
>>> + vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
>>> DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
>>> DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
>>> vbasedev.ram_block_discard_allowed, false),
>>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
2023-06-26 15:26 ` Cédric Le Goater
@ 2023-06-26 16:19 ` Jason Gunthorpe
2023-06-26 16:39 ` Cédric Le Goater
2023-06-26 16:36 ` Joao Martins
2023-06-26 17:27 ` Alex Williamson
2 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2023-06-26 16:19 UTC (permalink / raw)
To: Cédric Le Goater
Cc: Joao Martins, Avihai Horon, Alex Williamson, Juan Quintela,
Peter Xu, Leonardo Bras, Zhenzhong Duan, Yishai Hadas,
Maor Gottlieb, Kirti Wankhede, Tarun Gupta, qemu-devel
On Mon, Jun 26, 2023 at 05:26:42PM +0200, Cédric Le Goater wrote:
> Since dirty tracking is a must-have to implement migration support
> for any existing and future VFIO PCI variant driver, anything else
> would be experimental code and we are trying to remove the flag !
> Please correct me if I am wrong.
I don't think we should call it experimental - it is like other qemu
options where we can choose to run qemu in a suboptimal way.
Jason
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
2023-06-26 15:26 ` Cédric Le Goater
2023-06-26 16:19 ` Jason Gunthorpe
@ 2023-06-26 16:36 ` Joao Martins
2023-06-26 17:27 ` Alex Williamson
2 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2023-06-26 16:36 UTC (permalink / raw)
To: Cédric Le Goater, Avihai Horon
Cc: Alex Williamson, Juan Quintela, Peter Xu, Leonardo Bras,
Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
Kirti Wankhede, Tarun Gupta, qemu-devel
On 26/06/2023 16:26, Cédric Le Goater wrote:
> On 6/26/23 15:40, Joao Martins wrote:
>> On 26/06/2023 14:20, Cédric Le Goater wrote:
>>> On 6/26/23 10:23, Avihai Horon wrote:
>>>> + error_setg(&vbasedev->migration_blocker,
>>>> + "%s: Migration couldn't be initialized for VFIO device, "
>>>> + "err: %d (%s)",
>>>> + vbasedev->name, ret, strerror(-ret));
>>>> + goto add_blocker;
>>>> + }
>>>> +
>>>> + if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO &&
>>>> + !vbasedev->dirty_pages_supported) {
>>>
>>> I don't agree with this test.
>>>
>>
>> The alternative right now is perceptual dirty tracking. How is that OK as a
>> default? It's not like we have even an option :(
>>
>> Maybe perhaps you refer to avoid strongly enforcing *always* it to allow testing
>> of the non dirty tracking parts? Maybe when you 'force' enabling with
>> enable-migration=on is when you ignore the dirty tracking which is what this is
>> doing.
>
>
> I see ON_OFF_AUTO_ON as a way to abort the machine startup while
> ON_OFF_AUTO_AUTO would let it run but block migration. We would
> need an extra property to relax the checks, else we are hijacking
> some pre-existing option to fit our need.
>
OK
> Since dirty tracking is a must-have to implement migration support
> for any existing and future VFIO PCI variant driver, anything else
> would be experimental code and we are trying to remove the flag !
> Please correct me if I am wrong.
>
My thinking was mainly requiring the default migration case without any relaxing
from the user to require dirty tracking by default. hisilicon driver is the
current vfio driver upstream that doesn't support dirty tracking (neither P2P).
> So, the case !vbasedev->dirty_pages_supported is just an extra
> information to report for why migration is not supported. Does
> that sound reasonable ?
>
We can always piggy back on the x-pre-copy-dirty-page-tracking (which already
exists too) as means to relax this option, and that should cover this other
case. My comment was more targetted at your suggestion of making it optional *by
default* to not use the dirty tracking.
Joao
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
2023-06-26 16:19 ` Jason Gunthorpe
@ 2023-06-26 16:39 ` Cédric Le Goater
0 siblings, 0 replies; 22+ messages in thread
From: Cédric Le Goater @ 2023-06-26 16:39 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Joao Martins, Avihai Horon, Alex Williamson, Juan Quintela,
Peter Xu, Leonardo Bras, Zhenzhong Duan, Yishai Hadas,
Maor Gottlieb, Kirti Wankhede, Tarun Gupta, qemu-devel
On 6/26/23 18:19, Jason Gunthorpe wrote:
> On Mon, Jun 26, 2023 at 05:26:42PM +0200, Cédric Le Goater wrote:
>
>> Since dirty tracking is a must-have to implement migration support
>> for any existing and future VFIO PCI variant driver, anything else
>> would be experimental code and we are trying to remove the flag !
>> Please correct me if I am wrong.
>
> I don't think we should call it experimental - it is like other qemu
> options where we can choose to run qemu in a suboptimal way.
In that case, whether the driver implements dirty tracking or not
is not a condition to enable migration, only the host kernel driver
support is.
C.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
2023-06-26 15:26 ` Cédric Le Goater
2023-06-26 16:19 ` Jason Gunthorpe
2023-06-26 16:36 ` Joao Martins
@ 2023-06-26 17:27 ` Alex Williamson
2023-06-27 8:00 ` Avihai Horon
2 siblings, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2023-06-26 17:27 UTC (permalink / raw)
To: Cédric Le Goater
Cc: Joao Martins, Avihai Horon, Juan Quintela, Peter Xu,
Leonardo Bras, Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe,
Maor Gottlieb, Kirti Wankhede, Tarun Gupta, qemu-devel
On Mon, 26 Jun 2023 17:26:42 +0200
Cédric Le Goater <clg@redhat.com> wrote:
> On 6/26/23 15:40, Joao Martins wrote:
> > On 26/06/2023 14:20, Cédric Le Goater wrote:
> >> Hello Avihai,
> >>
> >> On 6/26/23 10:23, Avihai Horon wrote:
> >>> The major parts of VFIO migration are supported today in QEMU. This
> >>> includes basic VFIO migration, device dirty page tracking and precopy
> >>> support.
> >>>
> >>> Thus, at this point in time, it seems appropriate to make VFIO migration
> >>> non-experimental: remove the x prefix from enable_migration property,
> >>> change it to ON_OFF_AUTO and let the default value be AUTO.
> >>>
> >>> In addition, make the following adjustments:
> >>> 1. Require device dirty tracking support when enable_migration is AUTO
> >>> (i.e., not explicitly enabled). This is because device dirty tracking
> >>> is currently the only method to do dirty page tracking, which is
> >>> essential for migrating in a reasonable downtime.
> >>
> >> hmm, I don't think QEMU should decide to disable a feature for all
> >> devices supposedly because it could be slow for some. That's too
> >> restrictive. What about devices with have small states ? for which
> >> the downtime would be reasonable even without device dirty tracking
> >> support.
> >>
> >
> > device dirty tracking refers to the ability to tracking dirty IOVA used by the
> > device which will DMA into RAM. It is required because the
> > consequence/alternative is to transfer all RAM in stop copy phase. Device state
> > size at that point is the least of our problems downtime wise.
>
> Arg. thanks for reminding me. I tend to take this for granted ...
My initial reaction was the same, for instance we could have a non-DMA
device (ex. PCI serial card) that supports migration w/o dirty
tracking, but QEMU cannot assume the device doesn't do DMA, nor is it
worth our time to monitor whether bus-master is ever enabled for the
device, so QEMU needs to assume all memory that's DMA accessible for
the device is perpetually dirty. Also, if such a corner case existed
for a non-DMA migratable device, the easier path is simply to require
dirty tracking stubs in the variant driver to report no memory dirtied.
> > I can imagine that allowing without dirty tracking is useful for developer
> > testing of the suspend/device-state flows, but as real default (auto) is very
> > questionable to let it go through without dirty tracking. When we have IOMMUFD
> > dirty tracking that's when we can relieve this restriction as a default.
> >
> > But then note that (...)
> >
> >>
> >>> Setting
> >>> enable_migration to ON will not require device dirty tracking.
> >
> > (...) this lets it ignore dirty tracking as you would like.
> >
> >
> >>> 2. Make migration blocker messages more elaborate.
> >>> 3. Remove error prints in vfio_migration_query_flags().
> >>> 4. Remove a redundant assignment in vfio_migration_realize().
> >>>
> >>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> >>> ---
> >>> include/hw/vfio/vfio-common.h | 2 +-
> >>> hw/vfio/migration.c | 29 ++++++++++++++++-------------
> >>> hw/vfio/pci.c | 4 ++--
> >>> 3 files changed, 19 insertions(+), 16 deletions(-)
> >>>
> >>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >>> index b4c28f318f..387eabde60 100644
> >>> --- a/include/hw/vfio/vfio-common.h
> >>> +++ b/include/hw/vfio/vfio-common.h
> >>> @@ -139,7 +139,7 @@ typedef struct VFIODevice {
> >>> bool needs_reset;
> >>> bool no_mmap;
> >>> bool ram_block_discard_allowed;
> >>> - bool enable_migration;
> >>> + OnOffAuto enable_migration;
> >>> VFIODeviceOps *ops;
> >>> unsigned int num_irqs;
> >>> unsigned int num_regions;
> >>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >>> index 79eb81dfd7..d8e0848635 100644
> >>> --- a/hw/vfio/migration.c
> >>> +++ b/hw/vfio/migration.c
> >>> @@ -731,14 +731,6 @@ static int vfio_migration_query_flags(VFIODevice
> >>> *vbasedev, uint64_t *mig_flags)
> >>> feature->argsz = sizeof(buf);
> >>> feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
> >>> if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> >>> - if (errno == ENOTTY) {
> >>> - error_report("%s: VFIO migration is not supported in kernel",
> >>> - vbasedev->name);
> >>> - } else {
> >>> - error_report("%s: Failed to query VFIO migration support, err: %s",
> >>> - vbasedev->name, strerror(errno));
> >>> - }
> >>> -
> >>> return -errno;
> >>> }
> >>> @@ -831,14 +823,28 @@ void vfio_reset_bytes_transferred(void)
> >>> int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> >>> {
> >>> - int ret = -ENOTSUP;
> >>> + int ret;
> >>> - if (!vbasedev->enable_migration) {
> >>> + if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) {
> >>> + error_setg(&vbasedev->migration_blocker,
> >>> + "%s: Migration is disabled for VFIO device", vbasedev->name);
> >>> goto add_blocker;
> >>> }
> >>> ret = vfio_migration_init(vbasedev);
> >>> if (ret) {
> >>
> >> It would be good to keep the message for 'errno == ENOTTY' as it was in
> >> vfio_migration_query_flags(). When migration fails, it is an important
> >> information to know that it is because the VFIO PCI host device driver
> >> doesn't support the feature. The root cause could be deep below in FW or
> >> how the VF was set up.
> >>
> > +1 As I have been in this rabbit hole
> >
> >>> + error_setg(&vbasedev->migration_blocker,
> >>> + "%s: Migration couldn't be initialized for VFIO device, "
> >>> + "err: %d (%s)",
> >>> + vbasedev->name, ret, strerror(-ret));
> >>> + goto add_blocker;
> >>> + }
> >>> +
> >>> + if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO &&
> >>> + !vbasedev->dirty_pages_supported) {
> >>
> >> I don't agree with this test.
> >>
> >
> > The alternative right now is perceptual dirty tracking. How is that OK as a
> > default? It's not like we have even an option :(
> >
> > Maybe perhaps you refer to avoid strongly enforcing *always* it to allow testing
> > of the non dirty tracking parts? Maybe when you 'force' enabling with
> > enable-migration=on is when you ignore the dirty tracking which is what this is
> > doing.
>
>
> I see ON_OFF_AUTO_ON as a way to abort the machine startup while
> ON_OFF_AUTO_AUTO would let it run but block migration.
Agreed. There's a little bit of redundancy between the device-level
"enable-migration=on" option and the global "-only-migratable" option
relative to preventing machine startup, but it also doesn't make sense
to me if the device-level option let realize complete successfully if
the device doesn't support or fails migration setup. So I think we'd
generally rely on using the -only-migratable option with the default
ON_OFF_AUTO_AUTO value, allow the ON_OFF_AUTO_ON value to enable
dis-recommended support, and live with the redundancy that it should
also cause the device realize to fail if migration is not supported.
Thanks,
Alex
> We would
> need an extra property to relax the checks, else we are hijacking
> some pre-existing option to fit our need.
>
> Since dirty tracking is a must-have to implement migration support
> for any existing and future VFIO PCI variant driver, anything else
> would be experimental code and we are trying to remove the flag !
> Please correct me if I am wrong.
>
> So, the case !vbasedev->dirty_pages_supported is just an extra
> information to report for why migration is not supported. Does
> that sound reasonable ?
>
> Thanks,
>
> C.
>
>
>
> >
> >>> + error_setg(&vbasedev->migration_blocker,
> >>> + "%s: VFIO device doesn't support device dirty tracking",
> >>> + vbasedev->name);
> >>> goto add_blocker;
> >>> }
> >> I agree that with ON_OFF_AUTO_AUTO, errors at realize time should be recorded
> >> in a migration blocker. What about the ON_OFF_AUTO_ON case ? If migration was
> >> explicitly requested for the device and the conditions on the host are not met,
> >> I think realize should fail and the machine abort.
> >>
> > +1 Good point
> >
> >> Thanks,
> >>
> >> C.
> >>
> >>
> >>
> >>> @@ -856,9 +862,6 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error
> >>> **errp)
> >>> return 0;
> >>> add_blocker:
> >>> - error_setg(&vbasedev->migration_blocker,
> >>> - "VFIO device doesn't support migration");
> >>> -
> >>> ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
> >>> if (ret < 0) {
> >>> error_free(vbasedev->migration_blocker);
> >>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >>> index 73874a94de..48584e3b01 100644
> >>> --- a/hw/vfio/pci.c
> >>> +++ b/hw/vfio/pci.c
> >>> @@ -3347,8 +3347,8 @@ static Property vfio_pci_dev_properties[] = {
> >>> VFIO_FEATURE_ENABLE_REQ_BIT, true),
> >>> DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
> >>> VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
> >>> - DEFINE_PROP_BOOL("x-enable-migration", VFIOPCIDevice,
> >>> - vbasedev.enable_migration, false),
> >>> + DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
> >>> + vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
> >>> DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
> >>> DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
> >>> vbasedev.ram_block_discard_allowed, false),
> >>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
2023-06-26 17:27 ` Alex Williamson
@ 2023-06-27 8:00 ` Avihai Horon
2023-06-27 12:21 ` Cédric Le Goater
2023-06-27 14:20 ` Alex Williamson
0 siblings, 2 replies; 22+ messages in thread
From: Avihai Horon @ 2023-06-27 8:00 UTC (permalink / raw)
To: Alex Williamson, Cédric Le Goater
Cc: Joao Martins, Juan Quintela, Peter Xu, Leonardo Bras,
Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
Kirti Wankhede, Tarun Gupta, qemu-devel
On 26/06/2023 20:27, Alex Williamson wrote:
> External email: Use caution opening links or attachments
>
>
> On Mon, 26 Jun 2023 17:26:42 +0200
> Cédric Le Goater <clg@redhat.com> wrote:
>
>> On 6/26/23 15:40, Joao Martins wrote:
>>> On 26/06/2023 14:20, Cédric Le Goater wrote:
>>>> Hello Avihai,
>>>>
>>>> On 6/26/23 10:23, Avihai Horon wrote:
>>>>> The major parts of VFIO migration are supported today in QEMU. This
>>>>> includes basic VFIO migration, device dirty page tracking and precopy
>>>>> support.
>>>>>
>>>>> Thus, at this point in time, it seems appropriate to make VFIO migration
>>>>> non-experimental: remove the x prefix from enable_migration property,
>>>>> change it to ON_OFF_AUTO and let the default value be AUTO.
>>>>>
>>>>> In addition, make the following adjustments:
>>>>> 1. Require device dirty tracking support when enable_migration is AUTO
>>>>> (i.e., not explicitly enabled). This is because device dirty tracking
>>>>> is currently the only method to do dirty page tracking, which is
>>>>> essential for migrating in a reasonable downtime.
>>>> hmm, I don't think QEMU should decide to disable a feature for all
>>>> devices supposedly because it could be slow for some. That's too
>>>> restrictive. What about devices with have small states ? for which
>>>> the downtime would be reasonable even without device dirty tracking
>>>> support.
>>>>
>>> device dirty tracking refers to the ability to tracking dirty IOVA used by the
>>> device which will DMA into RAM. It is required because the
>>> consequence/alternative is to transfer all RAM in stop copy phase. Device state
>>> size at that point is the least of our problems downtime wise.
>> Arg. thanks for reminding me. I tend to take this for granted ...
> My initial reaction was the same, for instance we could have a non-DMA
> device (ex. PCI serial card) that supports migration w/o dirty
> tracking, but QEMU cannot assume the device doesn't do DMA, nor is it
> worth our time to monitor whether bus-master is ever enabled for the
> device, so QEMU needs to assume all memory that's DMA accessible for
> the device is perpetually dirty. Also, if such a corner case existed
> for a non-DMA migratable device, the easier path is simply to require
> dirty tracking stubs in the variant driver to report no memory dirtied.
>
>>> I can imagine that allowing without dirty tracking is useful for developer
>>> testing of the suspend/device-state flows, but as real default (auto) is very
>>> questionable to let it go through without dirty tracking. When we have IOMMUFD
>>> dirty tracking that's when we can relieve this restriction as a default.
>>>
>>> But then note that (...)
>>>
>>>>> Setting
>>>>> enable_migration to ON will not require device dirty tracking.
>>> (...) this lets it ignore dirty tracking as you would like.
>>>
>>>
>>>>> 2. Make migration blocker messages more elaborate.
>>>>> 3. Remove error prints in vfio_migration_query_flags().
>>>>> 4. Remove a redundant assignment in vfio_migration_realize().
>>>>>
>>>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>>>> ---
>>>>> include/hw/vfio/vfio-common.h | 2 +-
>>>>> hw/vfio/migration.c | 29 ++++++++++++++++-------------
>>>>> hw/vfio/pci.c | 4 ++--
>>>>> 3 files changed, 19 insertions(+), 16 deletions(-)
>>>>>
>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>> index b4c28f318f..387eabde60 100644
>>>>> --- a/include/hw/vfio/vfio-common.h
>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>> @@ -139,7 +139,7 @@ typedef struct VFIODevice {
>>>>> bool needs_reset;
>>>>> bool no_mmap;
>>>>> bool ram_block_discard_allowed;
>>>>> - bool enable_migration;
>>>>> + OnOffAuto enable_migration;
>>>>> VFIODeviceOps *ops;
>>>>> unsigned int num_irqs;
>>>>> unsigned int num_regions;
>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>>> index 79eb81dfd7..d8e0848635 100644
>>>>> --- a/hw/vfio/migration.c
>>>>> +++ b/hw/vfio/migration.c
>>>>> @@ -731,14 +731,6 @@ static int vfio_migration_query_flags(VFIODevice
>>>>> *vbasedev, uint64_t *mig_flags)
>>>>> feature->argsz = sizeof(buf);
>>>>> feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
>>>>> if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>>>>> - if (errno == ENOTTY) {
>>>>> - error_report("%s: VFIO migration is not supported in kernel",
>>>>> - vbasedev->name);
>>>>> - } else {
>>>>> - error_report("%s: Failed to query VFIO migration support, err: %s",
>>>>> - vbasedev->name, strerror(errno));
>>>>> - }
>>>>> -
>>>>> return -errno;
>>>>> }
>>>>> @@ -831,14 +823,28 @@ void vfio_reset_bytes_transferred(void)
>>>>> int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>>>>> {
>>>>> - int ret = -ENOTSUP;
>>>>> + int ret;
>>>>> - if (!vbasedev->enable_migration) {
>>>>> + if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) {
>>>>> + error_setg(&vbasedev->migration_blocker,
>>>>> + "%s: Migration is disabled for VFIO device", vbasedev->name);
>>>>> goto add_blocker;
>>>>> }
>>>>> ret = vfio_migration_init(vbasedev);
>>>>> if (ret) {
>>>> It would be good to keep the message for 'errno == ENOTTY' as it was in
>>>> vfio_migration_query_flags(). When migration fails, it is an important
>>>> information to know that it is because the VFIO PCI host device driver
>>>> doesn't support the feature. The root cause could be deep below in FW or
>>>> how the VF was set up.
>>>>
>>> +1 As I have been in this rabbit hole
Sure, I will add it.
>>>
>>>>> + error_setg(&vbasedev->migration_blocker,
>>>>> + "%s: Migration couldn't be initialized for VFIO device, "
>>>>> + "err: %d (%s)",
>>>>> + vbasedev->name, ret, strerror(-ret));
>>>>> + goto add_blocker;
>>>>> + }
>>>>> +
>>>>> + if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO &&
>>>>> + !vbasedev->dirty_pages_supported) {
>>>> I don't agree with this test.
>>>>
>>> The alternative right now is perceptual dirty tracking. How is that OK as a
>>> default? It's not like we have even an option :(
>>>
>>> Maybe perhaps you refer to avoid strongly enforcing *always* it to allow testing
>>> of the non dirty tracking parts? Maybe when you 'force' enabling with
>>> enable-migration=on is when you ignore the dirty tracking which is what this is
>>> doing.
>>
>> I see ON_OFF_AUTO_ON as a way to abort the machine startup while
>> ON_OFF_AUTO_AUTO would let it run but block migration.
> Agreed. There's a little bit of redundancy between the device-level
> "enable-migration=on" option and the global "-only-migratable" option
> relative to preventing machine startup, but it also doesn't make sense
> to me if the device-level option let realize complete successfully if
> the device doesn't support or fails migration setup. So I think we'd
> generally rely on using the -only-migratable option with the default
> ON_OFF_AUTO_AUTO value, allow the ON_OFF_AUTO_ON value to enable
> dis-recommended support, and live with the redundancy that it should
> also cause the device realize to fail if migration is not supported.
> Thanks,
>
> Alex
OK.
When enable_migration=AUTO we allow blockers.
When enable_migration=ON we don't allow blockers and instead prevent
realization of VFIO device.
Regarding device dirty tracking, we keep current behavior, right?
That is:
When enable_migration=AUTO we block migration if device dirty tracking
is not supported.
When enable_migration=ON we allow migration even if device dirty
tracking is not supported (in which case DMA-able memory will be
perpetually dirtied).
Thanks.
>
>> We would
>> need an extra property to relax the checks, else we are hijacking
>> some pre-existing option to fit our need.
>>
>> Since dirty tracking is a must-have to implement migration support
>> for any existing and future VFIO PCI variant driver, anything else
>> would be experimental code and we are trying to remove the flag !
>> Please correct me if I am wrong.
>>
>> So, the case !vbasedev->dirty_pages_supported is just an extra
>> information to report for why migration is not supported. Does
>> that sound reasonable ?
>>
>> Thanks,
>>
>> C.
>>
>>
>>
>>>>> + error_setg(&vbasedev->migration_blocker,
>>>>> + "%s: VFIO device doesn't support device dirty tracking",
>>>>> + vbasedev->name);
>>>>> goto add_blocker;
>>>>> }
>>>> I agree that with ON_OFF_AUTO_AUTO, errors at realize time should be recorded
>>>> in a migration blocker. What about the ON_OFF_AUTO_ON case ? If migration was
>>>> explicitly requested for the device and the conditions on the host are not met,
>>>> I think realize should fail and the machine abort.
>>>>
>>> +1 Good point
>>>
>>>> Thanks,
>>>>
>>>> C.
>>>>
>>>>
>>>>
>>>>> @@ -856,9 +862,6 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error
>>>>> **errp)
>>>>> return 0;
>>>>> add_blocker:
>>>>> - error_setg(&vbasedev->migration_blocker,
>>>>> - "VFIO device doesn't support migration");
>>>>> -
>>>>> ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
>>>>> if (ret < 0) {
>>>>> error_free(vbasedev->migration_blocker);
>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>>> index 73874a94de..48584e3b01 100644
>>>>> --- a/hw/vfio/pci.c
>>>>> +++ b/hw/vfio/pci.c
>>>>> @@ -3347,8 +3347,8 @@ static Property vfio_pci_dev_properties[] = {
>>>>> VFIO_FEATURE_ENABLE_REQ_BIT, true),
>>>>> DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
>>>>> VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
>>>>> - DEFINE_PROP_BOOL("x-enable-migration", VFIOPCIDevice,
>>>>> - vbasedev.enable_migration, false),
>>>>> + DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
>>>>> + vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
>>>>> DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
>>>>> DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
>>>>> vbasedev.ram_block_discard_allowed, false),
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
2023-06-27 8:00 ` Avihai Horon
@ 2023-06-27 12:21 ` Cédric Le Goater
2023-06-27 12:30 ` Jason Gunthorpe
2023-06-27 14:20 ` Alex Williamson
1 sibling, 1 reply; 22+ messages in thread
From: Cédric Le Goater @ 2023-06-27 12:21 UTC (permalink / raw)
To: Avihai Horon, Alex Williamson
Cc: Joao Martins, Juan Quintela, Peter Xu, Leonardo Bras,
Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
Kirti Wankhede, Tarun Gupta, qemu-devel, Shameer Kolothum
[ Cc: Shameer ]
>>> I see ON_OFF_AUTO_ON as a way to abort the machine startup while
>>> ON_OFF_AUTO_AUTO would let it run but block migration.
>>
>> Agreed. There's a little bit of redundancy between the device-level
>> "enable-migration=on" option and the global "-only-migratable" option
>> relative to preventing machine startup, but it also doesn't make sense
>> to me if the device-level option let realize complete successfully if
>> the device doesn't support or fails migration setup. So I think we'd
>> generally rely on using the -only-migratable option with the default
>> ON_OFF_AUTO_AUTO value, allow the ON_OFF_AUTO_ON value to enable
>> dis-recommended support, and live with the redundancy that it should
>> also cause the device realize to fail if migration is not supported.
>> Thanks,
>>
>> Alex
>
> OK.
>
> When enable_migration=AUTO we allow blockers.
> When enable_migration=ON we don't allow blockers and instead prevent realization of VFIO device.
>
> Regarding device dirty tracking, we keep current behavior, right?
> That is:
> When enable_migration=AUTO we block migration if device dirty tracking is not supported.
> When enable_migration=ON we allow migration even if device dirty tracking is not supported > (in which case DMA-able memory will be perpetually dirtied).
Yes. That's how I understand it. This is what you initially proposed.
The default behavior is to allow migration only if the host kernel
driver supports dirty tracking.
We have a way to run and migrate a machine with a device not supporting
dirty tracking. Only Hisilicon is in that case today. May be there are
plans to add dirty tracking support ?
Thanks,
C.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
2023-06-27 12:21 ` Cédric Le Goater
@ 2023-06-27 12:30 ` Jason Gunthorpe
2023-06-28 3:09 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2023-06-27 12:30 UTC (permalink / raw)
To: Cédric Le Goater
Cc: Avihai Horon, Alex Williamson, Joao Martins, Juan Quintela,
Peter Xu, Leonardo Bras, Zhenzhong Duan, Yishai Hadas,
Maor Gottlieb, Kirti Wankhede, Tarun Gupta, qemu-devel,
Shameer Kolothum
On Tue, Jun 27, 2023 at 02:21:55PM +0200, Cédric Le Goater wrote:
> We have a way to run and migrate a machine with a device not supporting
> dirty tracking. Only Hisilicon is in that case today. May be there are
> plans to add dirty tracking support ?
Hisilicon will eventually use Joao's work for IOMMU based tracking,
this is what their HW was designed to do.
Jason
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
2023-06-27 8:00 ` Avihai Horon
2023-06-27 12:21 ` Cédric Le Goater
@ 2023-06-27 14:20 ` Alex Williamson
1 sibling, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2023-06-27 14:20 UTC (permalink / raw)
To: Avihai Horon
Cc: Cédric Le Goater, Joao Martins, Juan Quintela, Peter Xu,
Leonardo Bras, Zhenzhong Duan, Yishai Hadas, Jason Gunthorpe,
Maor Gottlieb, Kirti Wankhede, Tarun Gupta, qemu-devel
On Tue, 27 Jun 2023 11:00:46 +0300
Avihai Horon <avihaih@nvidia.com> wrote:
> On 26/06/2023 20:27, Alex Williamson wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Mon, 26 Jun 2023 17:26:42 +0200
> > Cédric Le Goater <clg@redhat.com> wrote:
> >
> >> On 6/26/23 15:40, Joao Martins wrote:
> >>> On 26/06/2023 14:20, Cédric Le Goater wrote:
> >>>> Hello Avihai,
> >>>>
> >>>> On 6/26/23 10:23, Avihai Horon wrote:
> >>>>> The major parts of VFIO migration are supported today in QEMU. This
> >>>>> includes basic VFIO migration, device dirty page tracking and precopy
> >>>>> support.
> >>>>>
> >>>>> Thus, at this point in time, it seems appropriate to make VFIO migration
> >>>>> non-experimental: remove the x prefix from enable_migration property,
> >>>>> change it to ON_OFF_AUTO and let the default value be AUTO.
> >>>>>
> >>>>> In addition, make the following adjustments:
> >>>>> 1. Require device dirty tracking support when enable_migration is AUTO
> >>>>> (i.e., not explicitly enabled). This is because device dirty tracking
> >>>>> is currently the only method to do dirty page tracking, which is
> >>>>> essential for migrating in a reasonable downtime.
> >>>> hmm, I don't think QEMU should decide to disable a feature for all
> >>>> devices supposedly because it could be slow for some. That's too
> >>>> restrictive. What about devices with have small states ? for which
> >>>> the downtime would be reasonable even without device dirty tracking
> >>>> support.
> >>>>
> >>> device dirty tracking refers to the ability to tracking dirty IOVA used by the
> >>> device which will DMA into RAM. It is required because the
> >>> consequence/alternative is to transfer all RAM in stop copy phase. Device state
> >>> size at that point is the least of our problems downtime wise.
> >> Arg. thanks for reminding me. I tend to take this for granted ...
> > My initial reaction was the same, for instance we could have a non-DMA
> > device (ex. PCI serial card) that supports migration w/o dirty
> > tracking, but QEMU cannot assume the device doesn't do DMA, nor is it
> > worth our time to monitor whether bus-master is ever enabled for the
> > device, so QEMU needs to assume all memory that's DMA accessible for
> > the device is perpetually dirty. Also, if such a corner case existed
> > for a non-DMA migratable device, the easier path is simply to require
> > dirty tracking stubs in the variant driver to report no memory dirtied.
> >
> >>> I can imagine that allowing without dirty tracking is useful for developer
> >>> testing of the suspend/device-state flows, but as real default (auto) is very
> >>> questionable to let it go through without dirty tracking. When we have IOMMUFD
> >>> dirty tracking that's when we can relieve this restriction as a default.
> >>>
> >>> But then note that (...)
> >>>
> >>>>> Setting
> >>>>> enable_migration to ON will not require device dirty tracking.
> >>> (...) this lets it ignore dirty tracking as you would like.
> >>>
> >>>
> >>>>> 2. Make migration blocker messages more elaborate.
> >>>>> 3. Remove error prints in vfio_migration_query_flags().
> >>>>> 4. Remove a redundant assignment in vfio_migration_realize().
> >>>>>
> >>>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> >>>>> ---
> >>>>> include/hw/vfio/vfio-common.h | 2 +-
> >>>>> hw/vfio/migration.c | 29 ++++++++++++++++-------------
> >>>>> hw/vfio/pci.c | 4 ++--
> >>>>> 3 files changed, 19 insertions(+), 16 deletions(-)
> >>>>>
> >>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >>>>> index b4c28f318f..387eabde60 100644
> >>>>> --- a/include/hw/vfio/vfio-common.h
> >>>>> +++ b/include/hw/vfio/vfio-common.h
> >>>>> @@ -139,7 +139,7 @@ typedef struct VFIODevice {
> >>>>> bool needs_reset;
> >>>>> bool no_mmap;
> >>>>> bool ram_block_discard_allowed;
> >>>>> - bool enable_migration;
> >>>>> + OnOffAuto enable_migration;
> >>>>> VFIODeviceOps *ops;
> >>>>> unsigned int num_irqs;
> >>>>> unsigned int num_regions;
> >>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >>>>> index 79eb81dfd7..d8e0848635 100644
> >>>>> --- a/hw/vfio/migration.c
> >>>>> +++ b/hw/vfio/migration.c
> >>>>> @@ -731,14 +731,6 @@ static int vfio_migration_query_flags(VFIODevice
> >>>>> *vbasedev, uint64_t *mig_flags)
> >>>>> feature->argsz = sizeof(buf);
> >>>>> feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
> >>>>> if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> >>>>> - if (errno == ENOTTY) {
> >>>>> - error_report("%s: VFIO migration is not supported in kernel",
> >>>>> - vbasedev->name);
> >>>>> - } else {
> >>>>> - error_report("%s: Failed to query VFIO migration support, err: %s",
> >>>>> - vbasedev->name, strerror(errno));
> >>>>> - }
> >>>>> -
> >>>>> return -errno;
> >>>>> }
> >>>>> @@ -831,14 +823,28 @@ void vfio_reset_bytes_transferred(void)
> >>>>> int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> >>>>> {
> >>>>> - int ret = -ENOTSUP;
> >>>>> + int ret;
> >>>>> - if (!vbasedev->enable_migration) {
> >>>>> + if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) {
> >>>>> + error_setg(&vbasedev->migration_blocker,
> >>>>> + "%s: Migration is disabled for VFIO device", vbasedev->name);
> >>>>> goto add_blocker;
> >>>>> }
> >>>>> ret = vfio_migration_init(vbasedev);
> >>>>> if (ret) {
> >>>> It would be good to keep the message for 'errno == ENOTTY' as it was in
> >>>> vfio_migration_query_flags(). When migration fails, it is an important
> >>>> information to know that it is because the VFIO PCI host device driver
> >>>> doesn't support the feature. The root cause could be deep below in FW or
> >>>> how the VF was set up.
> >>>>
> >>> +1 As I have been in this rabbit hole
>
> Sure, I will add it.
>
> >>>
> >>>>> + error_setg(&vbasedev->migration_blocker,
> >>>>> + "%s: Migration couldn't be initialized for VFIO device, "
> >>>>> + "err: %d (%s)",
> >>>>> + vbasedev->name, ret, strerror(-ret));
> >>>>> + goto add_blocker;
> >>>>> + }
> >>>>> +
> >>>>> + if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO &&
> >>>>> + !vbasedev->dirty_pages_supported) {
> >>>> I don't agree with this test.
> >>>>
> >>> The alternative right now is perceptual dirty tracking. How is that OK as a
> >>> default? It's not like we have even an option :(
> >>>
> >>> Maybe perhaps you refer to avoid strongly enforcing *always* it to allow testing
> >>> of the non dirty tracking parts? Maybe when you 'force' enabling with
> >>> enable-migration=on is when you ignore the dirty tracking which is what this is
> >>> doing.
> >>
> >> I see ON_OFF_AUTO_ON as a way to abort the machine startup while
> >> ON_OFF_AUTO_AUTO would let it run but block migration.
> > Agreed. There's a little bit of redundancy between the device-level
> > "enable-migration=on" option and the global "-only-migratable" option
> > relative to preventing machine startup, but it also doesn't make sense
> > to me if the device-level option let realize complete successfully if
> > the device doesn't support or fails migration setup. So I think we'd
> > generally rely on using the -only-migratable option with the default
> > ON_OFF_AUTO_AUTO value, allow the ON_OFF_AUTO_ON value to enable
> > dis-recommended support, and live with the redundancy that it should
> > also cause the device realize to fail if migration is not supported.
> > Thanks,
> >
> > Alex
>
> OK.
>
> When enable_migration=AUTO we allow blockers.
> When enable_migration=ON we don't allow blockers and instead prevent
> realization of VFIO device.
>
> Regarding device dirty tracking, we keep current behavior, right?
> That is:
> When enable_migration=AUTO we block migration if device dirty tracking
> is not supported.
> When enable_migration=ON we allow migration even if device dirty
> tracking is not supported (in which case DMA-able memory will be
> perpetually dirtied).
Yes, and I think we'd be justified in logging a warning when migration
is enabled without any dirty page tracking support. Thanks,
Alex
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
2023-06-27 12:30 ` Jason Gunthorpe
@ 2023-06-28 3:09 ` Shameerali Kolothum Thodi via
0 siblings, 0 replies; 22+ messages in thread
From: Shameerali Kolothum Thodi via @ 2023-06-28 3:09 UTC (permalink / raw)
To: Jason Gunthorpe, Cédric Le Goater
Cc: Avihai Horon, Alex Williamson, Joao Martins, Juan Quintela,
Peter Xu, Leonardo Bras, Zhenzhong Duan, Yishai Hadas,
Maor Gottlieb, Kirti Wankhede, Tarun Gupta, qemu-devel@nongnu.org
> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> Sent: 27 June 2023 13:30
> To: Cédric Le Goater <clg@redhat.com>
> Cc: Avihai Horon <avihaih@nvidia.com>; Alex Williamson
> <alex.williamson@redhat.com>; Joao Martins <joao.m.martins@oracle.com>;
> Juan Quintela <quintela@redhat.com>; Peter Xu <peterx@redhat.com>;
> Leonardo Bras <leobras@redhat.com>; Zhenzhong Duan
> <zhenzhong.duan@intel.com>; Yishai Hadas <yishaih@nvidia.com>; Maor
> Gottlieb <maorg@nvidia.com>; Kirti Wankhede <kwankhede@nvidia.com>;
> Tarun Gupta <targupta@nvidia.com>; qemu-devel@nongnu.org; Shameerali
> Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Subject: Re: [PATCH 3/3] vfio/migration: Make VFIO migration
> non-experimental
>
> On Tue, Jun 27, 2023 at 02:21:55PM +0200, Cédric Le Goater wrote:
>
> > We have a way to run and migrate a machine with a device not supporting
> > dirty tracking. Only Hisilicon is in that case today. May be there are
> > plans to add dirty tracking support ?
>
> Hisilicon will eventually use Joao's work for IOMMU based tracking,
> this is what their HW was designed to do.
Yes. The plan is to make use of SMMUv3 HTTU feature for dirty tracking based
on Joao's work here,
https://lore.kernel.org/kvm/20230518204650.14541-1-joao.m.martins@oracle.com/
Thanks,
Shameer
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2023-06-28 3:10 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-26 8:23 [PATCH 0/3] vfio/migration: Make VFIO migration non-experimental Avihai Horon
2023-06-26 8:23 ` [PATCH 1/3] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Avihai Horon
2023-06-26 9:56 ` Cédric Le Goater
2023-06-26 11:31 ` Avihai Horon
2023-06-26 8:23 ` [PATCH 2/3] vfio/migration: Reset bytes_transferred properly Avihai Horon
2023-06-26 9:52 ` Cédric Le Goater
2023-06-26 11:46 ` Avihai Horon
2023-06-26 12:01 ` Cédric Le Goater
2023-06-26 12:08 ` Avihai Horon
2023-06-26 8:23 ` [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental Avihai Horon
2023-06-26 13:20 ` Cédric Le Goater
2023-06-26 13:40 ` Joao Martins
2023-06-26 15:26 ` Cédric Le Goater
2023-06-26 16:19 ` Jason Gunthorpe
2023-06-26 16:39 ` Cédric Le Goater
2023-06-26 16:36 ` Joao Martins
2023-06-26 17:27 ` Alex Williamson
2023-06-27 8:00 ` Avihai Horon
2023-06-27 12:21 ` Cédric Le Goater
2023-06-27 12:30 ` Jason Gunthorpe
2023-06-28 3:09 ` Shameerali Kolothum Thodi via
2023-06-27 14:20 ` Alex Williamson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).