From: Avihai Horon <avihaih@nvidia.com>
To: "Cédric Le Goater" <clg@redhat.com>, qemu-devel@nongnu.org
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
"Eduardo Habkost" <eduardo@habkost.net>,
"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Yanan Wang" <wangyanan55@huawei.com>,
"Juan Quintela" <quintela@redhat.com>,
"Peter Xu" <peterx@redhat.com>,
"Leonardo Bras" <leobras@redhat.com>,
"Eric Blake" <eblake@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Thomas Huth" <thuth@redhat.com>,
"Laurent Vivier" <lvivier@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Yishai Hadas" <yishaih@nvidia.com>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Maor Gottlieb" <maorg@nvidia.com>,
"Kirti Wankhede" <kwankhede@nvidia.com>,
"Tarun Gupta" <targupta@nvidia.com>,
"Joao Martins" <joao.m.martins@oracle.com>
Subject: Re: [PATCH v4 9/9] vfio/migration: Add support for switchover ack capability
Date: Tue, 30 May 2023 14:04:27 +0300
Message-ID: <bb7bb1f5-9dd4-b72f-1531-f59c1e28ebbb@nvidia.com>
In-Reply-To: <b3da19aa-37d0-16f2-4094-d1931dfa7601@redhat.com>
On 30/05/2023 12:58, Cédric Le Goater wrote:
> On 5/28/23 16:06, Avihai Horon wrote:
>> Loading of a VFIO device's data can take a substantial amount of time as
>> the device may need to allocate resources, prepare internal data
>> structures, etc. This can increase migration downtime, especially for
>> VFIO devices with a lot of resources.
>>
>> To solve this, VFIO migration uAPI defines "initial bytes" as part of
>> its precopy data stream. Initial bytes can be used in various ways to
>> improve VFIO migration performance. For example, they can be used to
>> transfer device metadata to pre-allocate resources in the destination.
>> However, for this to work we need to make sure that all initial bytes
>> are sent and loaded in the destination before the source VM is stopped.
>>
>> Use migration switchover ack capability to make sure a VFIO device's
>> initial bytes are sent and loaded in the destination before the source
>> stops the VM and attempts to complete the migration.
>> This can significantly reduce migration downtime for some devices.
>>
>> As precopy support and precopy initial bytes support come together in
>> VFIO migration, use x-allow-pre-copy device property to control usage of
>> this feature as well.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>> docs/devel/vfio-migration.rst | 10 ++++++++
>> include/hw/vfio/vfio-common.h | 2 ++
>> hw/vfio/migration.c | 48 ++++++++++++++++++++++++++++++++++-
>> 3 files changed, 59 insertions(+), 1 deletion(-)
>>
>> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
>> index e896b2a673..b433cb5bb2 100644
>> --- a/docs/devel/vfio-migration.rst
>> +++ b/docs/devel/vfio-migration.rst
>> @@ -16,6 +16,13 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy
>>  support by reporting the VFIO_MIGRATION_PRE_COPY flag in the
>>  VFIO_DEVICE_FEATURE_MIGRATION ioctl.
>>
>> +When pre-copy is supported, it's possible to further reduce downtime by
>> +enabling "switchover-ack" migration capability.
>> +VFIO migration uAPI defines "initial bytes" as part of its pre-copy data stream
>> +and recommends that the initial bytes are sent and loaded in the destination
>> +before stopping the source VM. Enabling this migration capability will
>> +guarantee that and thus, can potentially reduce downtime even further.
>> +
>>  Note that currently VFIO migration is supported only for a single device. This
>>  is due to VFIO migration's lack of P2P support. However, P2P support is planned
>>  to be added later on.
>> @@ -45,6 +52,9 @@ VFIO implements the device hooks for the iterative approach as follows:
>>  * A ``save_live_iterate`` function that reads the VFIO device's data from the
>>    vendor driver during iterative pre-copy phase.
>>
>> +* A ``switchover_ack_needed`` function that checks if the VFIO device uses
>> +  "switchover-ack" migration capability when this capability is enabled.
>> +
>>  * A ``save_state`` function to save the device config space if it is present.
>>
>>  * A ``save_live_complete_precopy`` function that sets the VFIO device in
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index a53ecbe2e0..ad0562c8b7 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -69,6 +69,8 @@ typedef struct VFIOMigration {
>>      uint64_t mig_flags;
>>      uint64_t precopy_init_size;
>>      uint64_t precopy_dirty_size;
>> +    bool switchover_ack_needed;
>
> Do we really need the 'switchover_ack_needed' bool ?
>
> It seems that each time it is used in a routine it could be computed
> locally with migrate_switchover_ack() and vfio_precopy_supported().
> This would simplify the code a bit more.
>
You are right.
I will drop it and send a v5 (will fix the superfluous " as well).
Thanks!
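A minimal sketch of the suggested simplification, assuming the value is simply derived from the capability and the device's pre-copy support wherever it is needed. The two helpers below are hypothetical stand-ins for migrate_switchover_ack() and vfio_precopy_supported(), driven by two hypothetical globals so the snippet compiles on its own; they are not the QEMU implementations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical knobs standing in for the capability and the device flags. */
static bool switchover_ack_cap_enabled;
static bool device_precopy_supported;

/* Hypothetical stand-in for migrate_switchover_ack(). */
static bool migrate_switchover_ack(void)
{
    return switchover_ack_cap_enabled;
}

/* Hypothetical stand-in for vfio_precopy_supported(); takes void * here. */
static bool vfio_precopy_supported(void *vbasedev)
{
    (void)vbasedev;
    return device_precopy_supported;
}

/* Computed on demand instead of being cached in VFIOMigration. */
static bool vfio_switchover_ack_needed(void *opaque)
{
    return migrate_switchover_ack() && vfio_precopy_supported(opaque);
}
```

With no cached field, there is nothing left to reset in vfio_migration_cleanup(), and the same expression could replace the field reads in vfio_save_iterate() and vfio_load_state().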
>
>
>
>> +    bool initial_data_sent;
>>  } VFIOMigration;
>>
>>  typedef struct VFIOAddressSpace {
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index cb6923ed3f..ede29ffb5c 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -18,6 +18,8 @@
>>  #include "sysemu/runstate.h"
>>  #include "hw/vfio/vfio-common.h"
>>  #include "migration/migration.h"
>> +#include "migration/options.h"
>> +#include "migration/savevm.h"
>>  #include "migration/vmstate.h"
>>  #include "migration/qemu-file.h"
>>  #include "migration/register.h"
>> @@ -45,6 +47,7 @@
>>  #define VFIO_MIG_FLAG_DEV_CONFIG_STATE (0xffffffffef100002ULL)
>>  #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL)
>>  #define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL)
>> +#define VFIO_MIG_FLAG_DEV_INIT_DATA_SENT (0xffffffffef100005ULL)
>>
>>  /*
>>   * This is an arbitrary size based on migration of mlx5 devices, where typically
>> @@ -218,6 +221,7 @@ static void vfio_migration_cleanup(VFIODevice *vbasedev)
>>
>>      close(migration->data_fd);
>>      migration->data_fd = -1;
>> +    migration->switchover_ack_needed = false;
>>  }
>>
>>  static int vfio_query_stop_copy_size(VFIODevice *vbasedev,
>> @@ -350,6 +354,10 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>>      if (vfio_precopy_supported(vbasedev)) {
>>          int ret;
>>
>> +        if (migrate_switchover_ack()) {
>> +            migration->switchover_ack_needed = true;
>> +        }
>> +
>>          switch (migration->device_state) {
>>          case VFIO_DEVICE_STATE_RUNNING:
>>              ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_PRE_COPY,
>> @@ -385,6 +393,7 @@ static void vfio_save_cleanup(void *opaque)
>>      migration->data_buffer = NULL;
>>      migration->precopy_init_size = 0;
>>      migration->precopy_dirty_size = 0;
>> +    migration->initial_data_sent = false;
>>      vfio_migration_cleanup(vbasedev);
>>      trace_vfio_save_cleanup(vbasedev->name);
>>  }
>> @@ -458,10 +467,17 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
>>      if (data_size < 0) {
>>          return data_size;
>>      }
>> -    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>>
>>      vfio_update_estimated_pending_data(migration, data_size);
>>
>> +    if (migration->switchover_ack_needed && !migration->precopy_init_size &&
>> +        !migration->initial_data_sent) {
>> +        qemu_put_be64(f, VFIO_MIG_FLAG_DEV_INIT_DATA_SENT);
>> +        migration->initial_data_sent = true;
>> +    } else {
>> +        qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>> +    }
>> +
>>      trace_vfio_save_iterate(vbasedev->name, migration->precopy_init_size,
>>                              migration->precopy_dirty_size);
>>
>> @@ -526,6 +542,10 @@ static int vfio_load_setup(QEMUFile *f, void *opaque)
>>  {
>>      VFIODevice *vbasedev = opaque;
>>
>> +    if (migrate_switchover_ack() && vfio_precopy_supported(vbasedev)) {
>> +        vbasedev->migration->switchover_ack_needed = true;
>> +    }
>> +
>>      return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
>>                                      vbasedev->migration->device_state);
>>  }
>> @@ -580,6 +600,23 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>>              }
>>              break;
>>          }
>> +        case VFIO_MIG_FLAG_DEV_INIT_DATA_SENT:
>> +        {
>> +            if (!vbasedev->migration->switchover_ack_needed) {
>> +                error_report("%s: Received INIT_DATA_SENT but switchover ack "
>> +                             "is not needed", vbasedev->name);
>> +                return -EINVAL;
>> +            }
>> +
>> +            ret = qemu_loadvm_approve_switchover();
>> +            if (ret) {
>> +                error_report(
>> +                    "%s: qemu_loadvm_approve_switchover failed, err=%d (%s)",
>> +                    vbasedev->name, ret, strerror(-ret));
>> +            }
>> +
>> +            return ret;
>> +        }
>>          default:
>>              error_report("%s: Unknown tag 0x%"PRIx64, vbasedev->name, data);
>>              return -EINVAL;
>> @@ -594,6 +631,14 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>>      return ret;
>>  }
>>
>> +static bool vfio_switchover_ack_needed(void *opaque)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +    VFIOMigration *migration = vbasedev->migration;
>> +
>> +    return migration->switchover_ack_needed;
>> +}
>> +
>>  static const SaveVMHandlers savevm_vfio_handlers = {
>>      .save_setup = vfio_save_setup,
>>      .save_cleanup = vfio_save_cleanup,
>> @@ -606,6 +651,7 @@ static const SaveVMHandlers savevm_vfio_handlers = {
>>      .load_setup = vfio_load_setup,
>>      .load_cleanup = vfio_load_cleanup,
>>      .load_state = vfio_load_state,
>> +    .switchover_ack_needed = vfio_switchover_ack_needed,
>>  };
>>
>>  /* ---------------------------------------------------------------------- */
>
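For readers following the protocol, here is a simplified, self-contained model of the tag exchange this patch adds. It assumes the destination's migration core counts the devices whose switchover_ack_needed hook returns true and acks the source once the last one approves (the logic from patch 2/9, reduced to a counter here). SrcDev, DstState, src_iterate_tag(), dst_approve() and dst_load_tag() are hypothetical names, and the value of VFIO_MIG_FLAG_END_OF_STATE is assumed, since it does not appear in the hunks above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* DEV_INIT_DATA_SENT is taken from the patch; END_OF_STATE is assumed. */
#define VFIO_MIG_FLAG_END_OF_STATE       (0xffffffffef100001ULL)
#define VFIO_MIG_FLAG_DEV_INIT_DATA_SENT (0xffffffffef100005ULL)

/* Hypothetical, trimmed-down source-side device state. */
typedef struct {
    uint64_t precopy_init_size;   /* initial bytes not yet read from the device */
    bool initial_data_sent;       /* INIT_DATA_SENT already emitted */
} SrcDev;

/* Tag written after the data of one save_live_iterate pass. */
static uint64_t src_iterate_tag(SrcDev *d, bool ack_needed)
{
    if (ack_needed && d->precopy_init_size == 0 && !d->initial_data_sent) {
        d->initial_data_sent = true;   /* the marker is sent only once */
        return VFIO_MIG_FLAG_DEV_INIT_DATA_SENT;
    }
    return VFIO_MIG_FLAG_END_OF_STATE;
}

/* Hypothetical model of the destination's pending-approval counter. */
typedef struct {
    unsigned pending;   /* devices that have not approved switchover yet */
    bool acked;         /* ack has been sent back to the source */
} DstState;

/* Models the approval step: ack the source once the last device approves. */
static int dst_approve(DstState *s)
{
    if (s->pending == 0) {
        return -EINVAL;   /* more approvals than devices that need one */
    }
    if (--s->pending == 0) {
        s->acked = true;
    }
    return 0;
}

/* Destination handling of a single end-of-pass tag for one device. */
static int dst_load_tag(DstState *s, uint64_t tag, bool ack_needed)
{
    switch (tag) {
    case VFIO_MIG_FLAG_END_OF_STATE:
        return 0;
    case VFIO_MIG_FLAG_DEV_INIT_DATA_SENT:
        if (!ack_needed) {
            return -EINVAL;   /* marker received but capability not in use */
        }
        return dst_approve(s);
    default:
        return -EINVAL;       /* unknown tag */
    }
}
```

In the patch itself, INIT_DATA_SENT takes the place of END_OF_STATE for that one pass, which is why vfio_load_state() returns right after approving instead of looping on to the next tag.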