qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Cédric Le Goater" <clg@redhat.com>
To: Avihai Horon <avihaih@nvidia.com>, qemu-devel@nongnu.org
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Yanan Wang" <wangyanan55@huawei.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Leonardo Bras" <leobras@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Yishai Hadas" <yishaih@nvidia.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Maor Gottlieb" <maorg@nvidia.com>,
	"Kirti Wankhede" <kwankhede@nvidia.com>,
	"Tarun Gupta" <targupta@nvidia.com>,
	"Joao Martins" <joao.m.martins@oracle.com>
Subject: Re: [PATCH v3 7/7] vfio/migration: Add support for switchover ack capability
Date: Tue, 23 May 2023 17:09:01 +0200	[thread overview]
Message-ID: <d5fdbc86-2d73-03ca-23a0-407c01fc206e@redhat.com> (raw)
In-Reply-To: <20230521151808.24804-8-avihaih@nvidia.com>

On 5/21/23 17:18, Avihai Horon wrote:
> Loading of a VFIO device's data can take a substantial amount of time as
> the device may need to allocate resources, prepare internal data
> structures, etc. This can increase migration downtime, especially for
> VFIO devices with a lot of resources.
> 
> To solve this, VFIO migration uAPI defines "initial bytes" as part of
> its precopy data stream. Initial bytes can be used in various ways to
> improve VFIO migration performance. For example, it can be used to
> transfer device metadata to pre-allocate resources in the destination.
> However, for this to work we need to make sure that all initial bytes
> are sent and loaded in the destination before the source VM is stopped.
> 
> Use migration switchover ack capability to make sure a VFIO device's
> initial bytes are sent and loaded in the destination before the source
> stops the VM and attempts to complete the migration.
> This can significantly reduce migration downtime for some devices.
> 
> As precopy support and precopy initial bytes support come together in



> VFIO migration, use x-allow-pre-copy device property to control usage of
> this feature as well.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

> ---
>   docs/devel/vfio-migration.rst | 10 +++++++++
>   include/hw/vfio/vfio-common.h |  2 ++
>   hw/vfio/migration.c           | 42 ++++++++++++++++++++++++++++++++++-
>   3 files changed, 53 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
> index e896b2a673..e75793b76a 100644
> --- a/docs/devel/vfio-migration.rst
> +++ b/docs/devel/vfio-migration.rst
> @@ -16,6 +16,13 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy
>   support by reporting the VFIO_MIGRATION_PRE_COPY flag in the
>   VFIO_DEVICE_FEATURE_MIGRATION ioctl.
>   
> +When pre-copy is supported, it's possible to further reduce downtime by
> +enabling "switchover-ack" migration capability.
> +VFIO migration uAPI defines "initial bytes" as part of its pre-copy data stream
> +and recommends that the initial bytes are sent and loaded in the destination
> +before stopping the source VM. Enabling this migration capability will
> +guarantee that and thus, can potentially reduce downtime even further.
> +
>   Note that currently VFIO migration is supported only for a single device. This
>   is due to VFIO migration's lack of P2P support. However, P2P support is planned
>   to be added later on.
> @@ -45,6 +52,9 @@ VFIO implements the device hooks for the iterative approach as follows:
>   * A ``save_live_iterate`` function that reads the VFIO device's data from the
>     vendor driver during iterative pre-copy phase.
>   
> +* A ``switchover_ack_needed`` function that checks if the VFIO device uses
> +  "switchover-ack" migration capability when this capability is enabled.
> +
>   * A ``save_state`` function to save the device config space if it is present.
>   
>   * A ``save_live_complete_precopy`` function that sets the VFIO device in
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 5ce7a01d56..b4a7eb0f9a 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -69,6 +69,8 @@ typedef struct VFIOMigration {
>       uint64_t precopy_init_size;
>       uint64_t precopy_dirty_size;
>       uint64_t mig_flags;
> +    bool switchover_ack_needed;
> +    bool initial_data_sent;
>   } VFIOMigration;
>   
>   typedef struct VFIOAddressSpace {
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 418efed019..09c669c1d8 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -18,6 +18,7 @@
>   #include "sysemu/runstate.h"
>   #include "hw/vfio/vfio-common.h"
>   #include "migration/migration.h"
> +#include "migration/savevm.h"
>   #include "migration/vmstate.h"
>   #include "migration/qemu-file.h"
>   #include "migration/register.h"
> @@ -45,6 +46,7 @@
>   #define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
>   #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
>   #define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
> +#define VFIO_MIG_FLAG_DEV_INIT_DATA_SENT (0xffffffffef100005ULL)
>   
>   /*
>    * This is an arbitrary size based on migration of mlx5 devices, where typically
> @@ -380,6 +382,8 @@ static void vfio_save_cleanup(void *opaque)
>   
>       g_free(migration->data_buffer);
>       migration->data_buffer = NULL;
> +    migration->switchover_ack_needed = false;
> +    migration->initial_data_sent = false;
>       vfio_migration_cleanup(vbasedev);
>       trace_vfio_save_cleanup(vbasedev->name);
>   }
> @@ -455,10 +459,17 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
>       if (data_size < 0) {
>           return data_size;
>       }
> -    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>   
>       vfio_update_estimated_pending_data(migration, data_size);
>   
> +    if (migration->switchover_ack_needed && !migration->precopy_init_size &&
> +        !migration->initial_data_sent) {
> +        qemu_put_be64(f, VFIO_MIG_FLAG_DEV_INIT_DATA_SENT);
> +        migration->initial_data_sent = true;
> +    } else {
> +        qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> +    }
> +
>       trace_vfio_save_iterate(vbasedev->name);
>   
>       /*
> @@ -576,6 +587,24 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>               }
>               break;
>           }
> +        case VFIO_MIG_FLAG_DEV_INIT_DATA_SENT:
> +        {
> +            if (!vbasedev->migration->switchover_ack_needed) {
> +                error_report("%s: Received INIT_DATA_SENT but switchover ack "
> +                             "is not needed",
> +                             vbasedev->name);

This extra line could be avoided. This is minor.

> +                return -EINVAL;
> +            }
> +
> +            ret = qemu_loadvm_approve_switchover();
> +            if (ret) {
> +                error_report(
> +                    "%s: qemu_loadvm_approve_switchover failed, err=%d (%s)",
> +                    vbasedev->name, ret, strerror(-ret));
> +            }
> +
> +            return ret;
> +        }
>           default:
>               error_report("%s: Unknown tag 0x%"PRIx64, vbasedev->name, data);
>               return -EINVAL;
> @@ -590,6 +619,16 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>       return ret;
>   }
>   
> +static bool vfio_switchover_ack_needed(void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +    VFIOMigration *migration = vbasedev->migration;
> +
> +    migration->switchover_ack_needed = vfio_precopy_supported(vbasedev);
> +
> +    return migration->switchover_ack_needed;
> +}
> +
>   static const SaveVMHandlers savevm_vfio_handlers = {
>       .save_setup = vfio_save_setup,
>       .save_cleanup = vfio_save_cleanup,
> @@ -602,6 +641,7 @@ static const SaveVMHandlers savevm_vfio_handlers = {
>       .load_setup = vfio_load_setup,
>       .load_cleanup = vfio_load_cleanup,
>       .load_state = vfio_load_state,
> +    .switchover_ack_needed = vfio_switchover_ack_needed,
>   };
>   
>   /* ---------------------------------------------------------------------- */



  reply	other threads:[~2023-05-23 15:12 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-21 15:18 [PATCH v3 0/7] migration: Add switchover ack capability and VFIO precopy support Avihai Horon
2023-05-21 15:18 ` [PATCH v3 1/7] migration: Add switchover ack capability Avihai Horon
2023-05-24 19:24   ` Peter Xu
2023-05-25  9:33   ` Markus Armbruster
2023-05-21 15:18 ` [PATCH v3 2/7] migration: Implement switchover ack logic Avihai Horon
2023-05-24 19:32   ` Peter Xu
2023-05-25  9:51     ` Avihai Horon
2023-05-25 12:44       ` Peter Xu
2023-05-28 12:42         ` Avihai Horon
2023-05-21 15:18 ` [PATCH v3 3/7] migration: Enable switchover ack capability Avihai Horon
2023-05-21 15:18 ` [PATCH v3 4/7] tests: Add migration switchover ack capability test Avihai Horon
2023-05-21 15:18 ` [PATCH v3 5/7] vfio/migration: Refactor vfio_save_block() to return saved data size Avihai Horon
2023-05-21 15:18 ` [PATCH v3 6/7] vfio/migration: Add VFIO migration pre-copy support Avihai Horon
2023-05-23 14:56   ` Cédric Le Goater
2023-05-24 12:49     ` Avihai Horon
2023-05-24 15:38       ` Cédric Le Goater
2023-05-25  9:24         ` Avihai Horon
2023-05-21 15:18 ` [PATCH v3 7/7] vfio/migration: Add support for switchover ack capability Avihai Horon
2023-05-23 15:09   ` Cédric Le Goater [this message]
2023-05-24 12:52     ` Avihai Horon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d5fdbc86-2d73-03ca-23a0-407c01fc206e@redhat.com \
    --to=clg@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=armbru@redhat.com \
    --cc=avihaih@nvidia.com \
    --cc=eblake@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=jgg@nvidia.com \
    --cc=joao.m.martins@oracle.com \
    --cc=kwankhede@nvidia.com \
    --cc=leobras@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=maorg@nvidia.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=targupta@nvidia.com \
    --cc=thuth@redhat.com \
    --cc=wangyanan55@huawei.com \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).