From: "Cédric Le Goater" <clg@redhat.com>
To: Avihai Horon <avihaih@nvidia.com>, qemu-devel@nongnu.org
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
"Eduardo Habkost" <eduardo@habkost.net>,
"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Yanan Wang" <wangyanan55@huawei.com>,
"Juan Quintela" <quintela@redhat.com>,
"Peter Xu" <peterx@redhat.com>,
"Leonardo Bras" <leobras@redhat.com>,
"Eric Blake" <eblake@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Thomas Huth" <thuth@redhat.com>,
"Laurent Vivier" <lvivier@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Yishai Hadas" <yishaih@nvidia.com>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Maor Gottlieb" <maorg@nvidia.com>,
"Kirti Wankhede" <kwankhede@nvidia.com>,
"Tarun Gupta" <targupta@nvidia.com>,
"Joao Martins" <joao.m.martins@oracle.com>
Subject: Re: [PATCH v5 9/9] vfio/migration: Add support for switchover ack capability
Date: Tue, 30 May 2023 17:15:51 +0200
Message-ID: <8ca99fcc-af76-eb33-549e-e69e876ae8b9@redhat.com>
In-Reply-To: <20230530144821.1557-10-avihaih@nvidia.com>
On 5/30/23 16:48, Avihai Horon wrote:
> Loading of a VFIO device's data can take a substantial amount of time as
> the device may need to allocate resources, prepare internal data
> structures, etc. This can increase migration downtime, especially for
> VFIO devices with a lot of resources.
>
> To solve this, VFIO migration uAPI defines "initial bytes" as part of
> its precopy data stream. Initial bytes can be used in various ways to
> improve VFIO migration performance. For example, it can be used to
> transfer device metadata to pre-allocate resources in the destination.
> However, for this to work we need to make sure that all initial bytes
> are sent and loaded in the destination before the source VM is stopped.
>
> Use migration switchover ack capability to make sure a VFIO device's
> initial bytes are sent and loaded in the destination before the source
> stops the VM and attempts to complete the migration.
> This can significantly reduce migration downtime for some devices.
>
> As precopy support and precopy initial bytes support come together in
> VFIO migration, use x-allow-pre-copy device property to control usage of
> this feature as well.
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> docs/devel/vfio-migration.rst | 10 +++++++++
> include/hw/vfio/vfio-common.h | 1 +
> hw/vfio/migration.c | 39 ++++++++++++++++++++++++++++++++++-
> 3 files changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
> index e896b2a673..b433cb5bb2 100644
> --- a/docs/devel/vfio-migration.rst
> +++ b/docs/devel/vfio-migration.rst
> @@ -16,6 +16,13 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy
> support by reporting the VFIO_MIGRATION_PRE_COPY flag in the
> VFIO_DEVICE_FEATURE_MIGRATION ioctl.
>
> +When pre-copy is supported, it's possible to further reduce downtime by
> +enabling "switchover-ack" migration capability.
> +VFIO migration uAPI defines "initial bytes" as part of its pre-copy data stream
> +and recommends that the initial bytes are sent and loaded in the destination
> +before stopping the source VM. Enabling this migration capability will
> +guarantee that and thus, can potentially reduce downtime even further.
> +
> Note that currently VFIO migration is supported only for a single device. This
> is due to VFIO migration's lack of P2P support. However, P2P support is planned
> to be added later on.
> @@ -45,6 +52,9 @@ VFIO implements the device hooks for the iterative approach as follows:
>  * A ``save_live_iterate`` function that reads the VFIO device's data from the
>    vendor driver during iterative pre-copy phase.
>
> +* A ``switchover_ack_needed`` function that checks if the VFIO device uses
> +  "switchover-ack" migration capability when this capability is enabled.
> +
>  * A ``save_state`` function to save the device config space if it is present.
>
>  * A ``save_live_complete_precopy`` function that sets the VFIO device in
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index a53ecbe2e0..3677aba4f4 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -69,6 +69,7 @@ typedef struct VFIOMigration {
>      uint64_t mig_flags;
>      uint64_t precopy_init_size;
>      uint64_t precopy_dirty_size;
> +    bool initial_data_sent;
>  } VFIOMigration;
>
>  typedef struct VFIOAddressSpace {
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index cb6923ed3f..53f5787f0e 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -18,6 +18,8 @@
> #include "sysemu/runstate.h"
> #include "hw/vfio/vfio-common.h"
> #include "migration/migration.h"
> +#include "migration/options.h"
> +#include "migration/savevm.h"
> #include "migration/vmstate.h"
> #include "migration/qemu-file.h"
> #include "migration/register.h"
> @@ -45,6 +47,7 @@
> #define VFIO_MIG_FLAG_DEV_CONFIG_STATE (0xffffffffef100002ULL)
> #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL)
> #define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL)
> +#define VFIO_MIG_FLAG_DEV_INIT_DATA_SENT (0xffffffffef100005ULL)
>
> /*
> * This is an arbitrary size based on migration of mlx5 devices, where typically
> @@ -385,6 +388,7 @@ static void vfio_save_cleanup(void *opaque)
>      migration->data_buffer = NULL;
>      migration->precopy_init_size = 0;
>      migration->precopy_dirty_size = 0;
> +    migration->initial_data_sent = false;
>      vfio_migration_cleanup(vbasedev);
>      trace_vfio_save_cleanup(vbasedev->name);
>  }
> @@ -458,10 +462,17 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
>      if (data_size < 0) {
>          return data_size;
>      }
> -    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>
>      vfio_update_estimated_pending_data(migration, data_size);
>
> +    if (migrate_switchover_ack() && !migration->precopy_init_size &&
> +        !migration->initial_data_sent) {
> +        qemu_put_be64(f, VFIO_MIG_FLAG_DEV_INIT_DATA_SENT);
> +        migration->initial_data_sent = true;
> +    } else {
> +        qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> +    }
> +
>      trace_vfio_save_iterate(vbasedev->name, migration->precopy_init_size,
>                              migration->precopy_dirty_size);
>
> @@ -580,6 +591,24 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>              }
>              break;
>          }
> +        case VFIO_MIG_FLAG_DEV_INIT_DATA_SENT:
> +        {
> +            if (!vfio_precopy_supported(vbasedev) ||
> +                !migrate_switchover_ack()) {
> +                error_report("%s: Received INIT_DATA_SENT but switchover ack "
> +                             "is not used", vbasedev->name);
> +                return -EINVAL;
> +            }
> +
> +            ret = qemu_loadvm_approve_switchover();
> +            if (ret) {
> +                error_report(
> +                    "%s: qemu_loadvm_approve_switchover failed, err=%d (%s)",
> +                    vbasedev->name, ret, strerror(-ret));
> +            }
> +
> +            return ret;
> +        }
>          default:
>              error_report("%s: Unknown tag 0x%"PRIx64, vbasedev->name, data);
>              return -EINVAL;
> @@ -594,6 +623,13 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>      return ret;
>  }
>
> +static bool vfio_switchover_ack_needed(void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +
> +    return vfio_precopy_supported(vbasedev);
> +}
> +
> static const SaveVMHandlers savevm_vfio_handlers = {
>      .save_setup = vfio_save_setup,
>      .save_cleanup = vfio_save_cleanup,
> @@ -606,6 +642,7 @@ static const SaveVMHandlers savevm_vfio_handlers = {
>      .load_setup = vfio_load_setup,
>      .load_cleanup = vfio_load_cleanup,
>      .load_state = vfio_load_state,
> +    .switchover_ack_needed = vfio_switchover_ack_needed,
>  };
>
> /* ---------------------------------------------------------------------- */