qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Avihai Horon <avihaih@nvidia.com>
To: "Cédric Le Goater" <clg@redhat.com>, qemu-devel@nongnu.org
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Yanan Wang" <wangyanan55@huawei.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Leonardo Bras" <leobras@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Yishai Hadas" <yishaih@nvidia.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Maor Gottlieb" <maorg@nvidia.com>,
	"Kirti Wankhede" <kwankhede@nvidia.com>,
	"Tarun Gupta" <targupta@nvidia.com>,
	"Joao Martins" <joao.m.martins@oracle.com>
Subject: Re: [PATCH v4 9/9] vfio/migration: Add support for switchover ack capability
Date: Tue, 30 May 2023 14:04:27 +0300	[thread overview]
Message-ID: <bb7bb1f5-9dd4-b72f-1531-f59c1e28ebbb@nvidia.com> (raw)
In-Reply-To: <b3da19aa-37d0-16f2-4094-d1931dfa7601@redhat.com>


On 30/05/2023 12:58, Cédric Le Goater wrote:
> External email: Use caution opening links or attachments
>
>
> On 5/28/23 16:06, Avihai Horon wrote:
>> Loading of a VFIO device's data can take a substantial amount of time as
>> the device may need to allocate resources, prepare internal data
>> structures, etc. This can increase migration downtime, especially for
>> VFIO devices with a lot of resources.
>>
>> To solve this, VFIO migration uAPI defines "initial bytes" as part of
>> its precopy data stream. Initial bytes can be used in various ways to
>> improve VFIO migration performance. For example, it can be used to
>> transfer device metadata to pre-allocate resources in the destination.
>> However, for this to work we need to make sure that all initial bytes
>> are sent and loaded in the destination before the source VM is stopped.
>>
>> Use migration switchover ack capability to make sure a VFIO device's
>> initial bytes are sent and loaded in the destination before the source
>> stops the VM and attempts to complete the migration.
>> This can significantly reduce migration downtime for some devices.
>>
>> As precopy support and precopy initial bytes support come together in
>> VFIO migration, use x-allow-pre-copy device property to control usage of
>> this feature as well.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>>   docs/devel/vfio-migration.rst | 10 ++++++++
>>   include/hw/vfio/vfio-common.h |  2 ++
>>   hw/vfio/migration.c           | 48 ++++++++++++++++++++++++++++++++++-
>>   3 files changed, 59 insertions(+), 1 deletion(-)
>>
>> diff --git a/docs/devel/vfio-migration.rst 
>> b/docs/devel/vfio-migration.rst
>> index e896b2a673..b433cb5bb2 100644
>> --- a/docs/devel/vfio-migration.rst
>> +++ b/docs/devel/vfio-migration.rst
>> @@ -16,6 +16,13 @@ helps to reduce the total downtime of the VM. VFIO 
>> devices opt-in to pre-copy
>>   support by reporting the VFIO_MIGRATION_PRE_COPY flag in the
>>   VFIO_DEVICE_FEATURE_MIGRATION ioctl.
>>
>> +When pre-copy is supported, it's possible to further reduce downtime by
>> +enabling "switchover-ack" migration capability.
>> +VFIO migration uAPI defines "initial bytes" as part of its pre-copy 
>> data stream
>> +and recommends that the initial bytes are sent and loaded in the 
>> destination
>> +before stopping the source VM. Enabling this migration capability will
>> +guarantee that and thus, can potentially reduce downtime even further.
>> +
>>   Note that currently VFIO migration is supported only for a single 
>> device. This
>>   is due to VFIO migration's lack of P2P support. However, P2P 
>> support is planned
>>   to be added later on.
>> @@ -45,6 +52,9 @@ VFIO implements the device hooks for the iterative 
>> approach as follows:
>>   * A ``save_live_iterate`` function that reads the VFIO device's 
>> data from the
>>     vendor driver during iterative pre-copy phase.
>>
>> +* A ``switchover_ack_needed`` function that checks if the VFIO 
>> device uses
>> +  "switchover-ack" migration capability when this capability is 
>> enabled.
>> +
>>   * A ``save_state`` function to save the device config space if it 
>> is present.
>>
>>   * A ``save_live_complete_precopy`` function that sets the VFIO 
>> device in
>> diff --git a/include/hw/vfio/vfio-common.h 
>> b/include/hw/vfio/vfio-common.h
>> index a53ecbe2e0..ad0562c8b7 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -69,6 +69,8 @@ typedef struct VFIOMigration {
>>       uint64_t mig_flags;
>>       uint64_t precopy_init_size;
>>       uint64_t precopy_dirty_size;
>> +    bool switchover_ack_needed;
>
> Do we really need the 'switchover_ack_needed' bool ?
>
> It seems that each time it is used in a routine it could be computed
> locally with migrate_switchover_ack() and vfio_precopy_supported().
> This would simplify the code a bit more.
>
You are right.
I will drop it and send a v5 (will fix the superfluous " as well).

Thanks!

>
>
>
>> +    bool initial_data_sent;
>>   } VFIOMigration;
>>
>>   typedef struct VFIOAddressSpace {
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index cb6923ed3f..ede29ffb5c 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -18,6 +18,8 @@
>>   #include "sysemu/runstate.h"
>>   #include "hw/vfio/vfio-common.h"
>>   #include "migration/migration.h"
>> +#include "migration/options.h"
>> +#include "migration/savevm.h"
>>   #include "migration/vmstate.h"
>>   #include "migration/qemu-file.h"
>>   #include "migration/register.h"
>> @@ -45,6 +47,7 @@
>>   #define VFIO_MIG_FLAG_DEV_CONFIG_STATE (0xffffffffef100002ULL)
>>   #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL)
>>   #define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL)
>> +#define VFIO_MIG_FLAG_DEV_INIT_DATA_SENT (0xffffffffef100005ULL)
>>
>>   /*
>>    * This is an arbitrary size based on migration of mlx5 devices, 
>> where typically
>> @@ -218,6 +221,7 @@ static void vfio_migration_cleanup(VFIODevice 
>> *vbasedev)
>>
>>       close(migration->data_fd);
>>       migration->data_fd = -1;
>> +    migration->switchover_ack_needed = false;
>>   }
>>
>>   static int vfio_query_stop_copy_size(VFIODevice *vbasedev,
>> @@ -350,6 +354,10 @@ static int vfio_save_setup(QEMUFile *f, void 
>> *opaque)
>>       if (vfio_precopy_supported(vbasedev)) {
>>           int ret;
>>
>> +        if (migrate_switchover_ack()) {
>> +            migration->switchover_ack_needed = true;
>> +        }
>> +
>>           switch (migration->device_state) {
>>           case VFIO_DEVICE_STATE_RUNNING:
>>               ret = vfio_migration_set_state(vbasedev, 
>> VFIO_DEVICE_STATE_PRE_COPY,
>> @@ -385,6 +393,7 @@ static void vfio_save_cleanup(void *opaque)
>>       migration->data_buffer = NULL;
>>       migration->precopy_init_size = 0;
>>       migration->precopy_dirty_size = 0;
>> +    migration->initial_data_sent = false;
>>       vfio_migration_cleanup(vbasedev);
>>       trace_vfio_save_cleanup(vbasedev->name);
>>   }
>> @@ -458,10 +467,17 @@ static int vfio_save_iterate(QEMUFile *f, void 
>> *opaque)
>>       if (data_size < 0) {
>>           return data_size;
>>       }
>> -    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>>
>>       vfio_update_estimated_pending_data(migration, data_size);
>>
>> +    if (migration->switchover_ack_needed && 
>> !migration->precopy_init_size &&
>> +        !migration->initial_data_sent) {
>> +        qemu_put_be64(f, VFIO_MIG_FLAG_DEV_INIT_DATA_SENT);
>> +        migration->initial_data_sent = true;
>> +    } else {
>> +        qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>> +    }
>> +
>>       trace_vfio_save_iterate(vbasedev->name, 
>> migration->precopy_init_size,
>>                               migration->precopy_dirty_size);
>>
>> @@ -526,6 +542,10 @@ static int vfio_load_setup(QEMUFile *f, void 
>> *opaque)
>>   {
>>       VFIODevice *vbasedev = opaque;
>>
>> +    if (migrate_switchover_ack() && vfio_precopy_supported(vbasedev)) {
>> +        vbasedev->migration->switchover_ack_needed = true;
>> +    }
>> +
>>       return vfio_migration_set_state(vbasedev, 
>> VFIO_DEVICE_STATE_RESUMING,
>> vbasedev->migration->device_state);
>>   }
>> @@ -580,6 +600,23 @@ static int vfio_load_state(QEMUFile *f, void 
>> *opaque, int version_id)
>>               }
>>               break;
>>           }
>> +        case VFIO_MIG_FLAG_DEV_INIT_DATA_SENT:
>> +        {
>> +            if (!vbasedev->migration->switchover_ack_needed) {
>> +                error_report("%s: Received INIT_DATA_SENT but 
>> switchover ack "
>> +                             "is not needed", vbasedev->name);
>> +                return -EINVAL;
>> +            }
>> +
>> +            ret = qemu_loadvm_approve_switchover();
>> +            if (ret) {
>> +                error_report(
>> +                    "%s: qemu_loadvm_approve_switchover failed, 
>> err=%d (%s)",
>> +                    vbasedev->name, ret, strerror(-ret));
>> +            }
>> +
>> +            return ret;
>> +        }
>>           default:
>>               error_report("%s: Unknown tag 0x%"PRIx64, 
>> vbasedev->name, data);
>>               return -EINVAL;
>> @@ -594,6 +631,14 @@ static int vfio_load_state(QEMUFile *f, void 
>> *opaque, int version_id)
>>       return ret;
>>   }
>>
>> +static bool vfio_switchover_ack_needed(void *opaque)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +    VFIOMigration *migration = vbasedev->migration;
>> +
>> +    return migration->switchover_ack_needed;
>> +}
>> +
>>   static const SaveVMHandlers savevm_vfio_handlers = {
>>       .save_setup = vfio_save_setup,
>>       .save_cleanup = vfio_save_cleanup,
>> @@ -606,6 +651,7 @@ static const SaveVMHandlers savevm_vfio_handlers = {
>>       .load_setup = vfio_load_setup,
>>       .load_cleanup = vfio_load_cleanup,
>>       .load_state = vfio_load_state,
>> +    .switchover_ack_needed = vfio_switchover_ack_needed,
>>   };
>>
>>   /* 
>> ---------------------------------------------------------------------- 
>> */
>


      reply	other threads:[~2023-05-30 11:05 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-28 14:06 [PATCH v4 0/9] migration: Add switchover ack capability and VFIO precopy support Avihai Horon
2023-05-28 14:06 ` [PATCH v4 1/9] migration: Add switchover ack capability Avihai Horon
2023-05-28 14:06 ` [PATCH v4 2/9] migration: Implement switchover ack logic Avihai Horon
2023-05-29 15:01   ` Peter Xu
2023-05-28 14:06 ` [PATCH v4 3/9] migration: Enable switchover ack capability Avihai Horon
2023-05-28 14:06 ` [PATCH v4 4/9] tests: Add migration switchover ack capability test Avihai Horon
2023-05-28 14:06 ` [PATCH v4 5/9] vfio/migration: Refactor vfio_save_block() to return saved data size Avihai Horon
2023-05-28 14:06 ` [PATCH v4 6/9] vfio/migration: Store VFIO migration flags in VFIOMigration Avihai Horon
2023-05-30  9:08   ` Cédric Le Goater
2023-05-28 14:06 ` [PATCH v4 7/9] vfio/migration: Add VFIO migration pre-copy support Avihai Horon
2023-05-30  9:28   ` Cédric Le Goater
2023-05-30  9:55     ` Avihai Horon
2023-05-30 10:17       ` Cédric Le Goater
2023-05-30 11:06         ` Avihai Horon
2023-05-28 14:06 ` [PATCH v4 8/9] vfio/migration: Add x-allow-pre-copy VFIO device property Avihai Horon
2023-05-30  9:13   ` Cédric Le Goater
2023-05-28 14:06 ` [PATCH v4 9/9] vfio/migration: Add support for switchover ack capability Avihai Horon
2023-05-30  9:58   ` Cédric Le Goater
2023-05-30 11:04     ` Avihai Horon [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bb7bb1f5-9dd4-b72f-1531-f59c1e28ebbb@nvidia.com \
    --to=avihaih@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=armbru@redhat.com \
    --cc=clg@redhat.com \
    --cc=eblake@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=jgg@nvidia.com \
    --cc=joao.m.martins@oracle.com \
    --cc=kwankhede@nvidia.com \
    --cc=leobras@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=maorg@nvidia.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=targupta@nvidia.com \
    --cc=thuth@redhat.com \
    --cc=wangyanan55@huawei.com \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).