From: "Cédric Le Goater" <clg@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Richard Henderson" <richard.henderson@linaro.org>,
 "Alex Williamson" <alex.williamson@redhat.com>,
 "Avihai Horon" <avihaih@nvidia.com>,
 "Cédric Le Goater" <clg@redhat.com>,
 "YangHang Liu" <yanghliu@redhat.com>
Subject: [PULL 08/16] vfio/migration: Add support for switchover ack capability
Date: Fri, 30 Jun 2023 07:22:27 +0200
Message-ID: <20230630052235.1934154-9-clg@redhat.com>
In-Reply-To: <20230630052235.1934154-1-clg@redhat.com>

From: Avihai Horon <avihaih@nvidia.com>

Loading of a VFIO device's data can take a substantial amount of time as
the device may need to allocate resources, prepare internal data
structures, etc. This can increase migration downtime, especially for
VFIO devices with a lot of resources.

To solve this, the VFIO migration uAPI defines "initial bytes" as part of
its precopy data stream. Initial bytes can be used in various ways to
improve VFIO migration performance. For example, they can be used to
transfer device metadata to pre-allocate resources in the destination.
However, for this to work, we need to make sure that all initial bytes
are sent and loaded in the destination before the source VM is stopped.

Use the migration switchover ack capability to make sure a VFIO device's
initial bytes are sent and loaded in the destination before the source
stops the VM and attempts to complete the migration.

This can significantly reduce migration downtime for some devices.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Tested-by: YangHang Liu <yanghliu@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 docs/devel/vfio-migration.rst | 10 +++++++++
 include/hw/vfio/vfio-common.h |  1 +
 hw/vfio/migration.c           | 39 ++++++++++++++++++++++++++++++++++-
 3 files changed, 49 insertions(+), 1 deletion(-)
diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index e896b2a6734b..b433cb5bb2c8 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -16,6 +16,13 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy
 support by reporting the VFIO_MIGRATION_PRE_COPY flag in the
 VFIO_DEVICE_FEATURE_MIGRATION ioctl.
 
+When pre-copy is supported, it's possible to further reduce downtime by
+enabling "switchover-ack" migration capability.
+VFIO migration uAPI defines "initial bytes" as part of its pre-copy data stream
+and recommends that the initial bytes are sent and loaded in the destination
+before stopping the source VM. Enabling this migration capability will
+guarantee that and thus, can potentially reduce downtime even further.
+
 Note that currently VFIO migration is supported only for a single device. This
 is due to VFIO migration's lack of P2P support. However, P2P support is planned
 to be added later on.
@@ -45,6 +52,9 @@ VFIO implements the device hooks for the iterative approach as follows:
 * A ``save_live_iterate`` function that reads the VFIO device's data from the
   vendor driver during iterative pre-copy phase.
 
+* A ``switchover_ack_needed`` function that checks if the VFIO device uses
+  "switchover-ack" migration capability when this capability is enabled.
+
 * A ``save_state`` function to save the device config space if it is present.
 
 * A ``save_live_complete_precopy`` function that sets the VFIO device in
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 1db901c1941f..3dc5f2104c86 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -69,6 +69,7 @@ typedef struct VFIOMigration {
     uint64_t mig_flags;
     uint64_t precopy_init_size;
     uint64_t precopy_dirty_size;
+    bool initial_data_sent;
 } VFIOMigration;
 
 typedef struct VFIOAddressSpace {
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index d8f6a22ae14e..acbf0bb7ab3c 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -18,6 +18,8 @@
 #include "sysemu/runstate.h"
 #include "hw/vfio/vfio-common.h"
 #include "migration/migration.h"
+#include "migration/options.h"
+#include "migration/savevm.h"
 #include "migration/vmstate.h"
 #include "migration/qemu-file.h"
 #include "migration/register.h"
@@ -45,6 +47,7 @@
 #define VFIO_MIG_FLAG_DEV_CONFIG_STATE (0xffffffffef100002ULL)
 #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL)
 #define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL)
+#define VFIO_MIG_FLAG_DEV_INIT_DATA_SENT (0xffffffffef100005ULL)
 
 /*
  * This is an arbitrary size based on migration of mlx5 devices, where typically
@@ -384,6 +387,7 @@ static void vfio_save_cleanup(void *opaque)
     migration->data_buffer = NULL;
     migration->precopy_init_size = 0;
     migration->precopy_dirty_size = 0;
+    migration->initial_data_sent = false;
     vfio_migration_cleanup(vbasedev);
     trace_vfio_save_cleanup(vbasedev->name);
 }
@@ -457,10 +461,17 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
     if (data_size < 0) {
         return data_size;
     }
-    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
 
     vfio_update_estimated_pending_data(migration, data_size);
 
+    if (migrate_switchover_ack() && !migration->precopy_init_size &&
+        !migration->initial_data_sent) {
+        qemu_put_be64(f, VFIO_MIG_FLAG_DEV_INIT_DATA_SENT);
+        migration->initial_data_sent = true;
+    } else {
+        qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+    }
+
     trace_vfio_save_iterate(vbasedev->name, migration->precopy_init_size,
                             migration->precopy_dirty_size);
 
@@ -579,6 +590,24 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
             }
             break;
         }
+        case VFIO_MIG_FLAG_DEV_INIT_DATA_SENT:
+        {
+            if (!vfio_precopy_supported(vbasedev) ||
+                !migrate_switchover_ack()) {
+                error_report("%s: Received INIT_DATA_SENT but switchover ack "
+                             "is not used", vbasedev->name);
+                return -EINVAL;
+            }
+
+            ret = qemu_loadvm_approve_switchover();
+            if (ret) {
+                error_report(
+                    "%s: qemu_loadvm_approve_switchover failed, err=%d (%s)",
+                    vbasedev->name, ret, strerror(-ret));
+            }
+
+            return ret;
+        }
         default:
             error_report("%s: Unknown tag 0x%"PRIx64, vbasedev->name, data);
             return -EINVAL;
@@ -593,6 +622,13 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+static bool vfio_switchover_ack_needed(void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+
+    return vfio_precopy_supported(vbasedev);
+}
+
 static const SaveVMHandlers savevm_vfio_handlers = {
     .save_setup = vfio_save_setup,
     .save_cleanup = vfio_save_cleanup,
@@ -605,6 +641,7 @@ static const SaveVMHandlers savevm_vfio_handlers = {
     .load_setup = vfio_load_setup,
     .load_cleanup = vfio_load_cleanup,
     .load_state = vfio_load_state,
+    .switchover_ack_needed = vfio_switchover_ack_needed,
 };
 
 /* ---------------------------------------------------------------------- */
--
2.41.0
Thread overview: 19+ messages
2023-06-30 5:22 [PULL 00/16] vfio queue Cédric Le Goater
2023-06-30 5:22 ` [PULL 01/16] migration: Add switchover ack capability Cédric Le Goater
2023-06-30 5:22 ` [PULL 02/16] migration: Implement switchover ack logic Cédric Le Goater
2023-06-30 5:22 ` [PULL 03/16] migration: Enable switchover ack capability Cédric Le Goater
2023-06-30 5:22 ` [PULL 04/16] tests: Add migration switchover ack capability test Cédric Le Goater
2023-06-30 5:22 ` [PULL 05/16] vfio/migration: Refactor vfio_save_block() to return saved data size Cédric Le Goater
2023-06-30 5:22 ` [PULL 06/16] vfio/migration: Store VFIO migration flags in VFIOMigration Cédric Le Goater
2023-06-30 5:22 ` [PULL 07/16] vfio/migration: Add VFIO migration pre-copy support Cédric Le Goater
2023-06-30 5:22 ` Cédric Le Goater [this message]
2023-06-30 5:22 ` [PULL 09/16] vfio: Implement a common device info helper Cédric Le Goater
2023-06-30 5:22 ` [PULL 10/16] hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques Cédric Le Goater
2023-06-30 5:22 ` [PULL 11/16] vfio/pci: Call vfio_prepare_kvm_msi_virq_batch() in MSI retry path Cédric Le Goater
2023-06-30 15:59 ` Michael Tokarev
2023-06-30 5:22 ` [PULL 12/16] vfio/migration: Reset bytes_transferred properly Cédric Le Goater
2023-06-30 5:22 ` [PULL 13/16] vfio/migration: Make VFIO migration non-experimental Cédric Le Goater
2023-06-30 5:22 ` [PULL 14/16] MAINTAINERS: Promote Cédric to VFIO co-maintainer Cédric Le Goater
2023-06-30 5:22 ` [PULL 15/16] vfio/pci: Fix a segfault in vfio_realize Cédric Le Goater
2023-06-30 5:22 ` [PULL 16/16] vfio/pci: Free leaked timer in vfio_realize error path Cédric Le Goater
2023-06-30 9:55 ` [PULL 00/16] vfio queue Richard Henderson