qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Cédric Le Goater" <clg@redhat.com>
To: Avihai Horon <avihaih@nvidia.com>, qemu-devel@nongnu.org
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Yanan Wang" <wangyanan55@huawei.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Leonardo Bras" <leobras@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Yishai Hadas" <yishaih@nvidia.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Maor Gottlieb" <maorg@nvidia.com>,
	"Kirti Wankhede" <kwankhede@nvidia.com>,
	"Tarun Gupta" <targupta@nvidia.com>,
	"Joao Martins" <joao.m.martins@oracle.com>
Subject: Re: [PATCH v5 9/9] vfio/migration: Add support for switchover ack capability
Date: Tue, 30 May 2023 17:15:51 +0200	[thread overview]
Message-ID: <8ca99fcc-af76-eb33-549e-e69e876ae8b9@redhat.com> (raw)
In-Reply-To: <20230530144821.1557-10-avihaih@nvidia.com>

On 5/30/23 16:48, Avihai Horon wrote:
> Loading of a VFIO device's data can take a substantial amount of time as
> the device may need to allocate resources, prepare internal data
> structures, etc. This can increase migration downtime, especially for
> VFIO devices with a lot of resources.
> 
> To solve this, VFIO migration uAPI defines "initial bytes" as part of
> its precopy data stream. Initial bytes can be used in various ways to
> improve VFIO migration performance. For example, it can be used to
> transfer device metadata to pre-allocate resources in the destination.
> However, for this to work we need to make sure that all initial bytes
> are sent and loaded in the destination before the source VM is stopped.
> 
> Use migration switchover ack capability to make sure a VFIO device's
> initial bytes are sent and loaded in the destination before the source
> stops the VM and attempts to complete the migration.
> This can significantly reduce migration downtime for some devices.
> 
> As precopy support and precopy initial bytes support come together in
> VFIO migration, use x-allow-pre-copy device property to control usage of
> this feature as well.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   docs/devel/vfio-migration.rst | 10 +++++++++
>   include/hw/vfio/vfio-common.h |  1 +
>   hw/vfio/migration.c           | 39 ++++++++++++++++++++++++++++++++++-
>   3 files changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
> index e896b2a673..b433cb5bb2 100644
> --- a/docs/devel/vfio-migration.rst
> +++ b/docs/devel/vfio-migration.rst
> @@ -16,6 +16,13 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy
>   support by reporting the VFIO_MIGRATION_PRE_COPY flag in the
>   VFIO_DEVICE_FEATURE_MIGRATION ioctl.
>   
> +When pre-copy is supported, it's possible to further reduce downtime by
> +enabling "switchover-ack" migration capability.
> +VFIO migration uAPI defines "initial bytes" as part of its pre-copy data stream
> +and recommends that the initial bytes are sent and loaded in the destination
> +before stopping the source VM. Enabling this migration capability will
> +guarantee that and thus, can potentially reduce downtime even further.
> +
>   Note that currently VFIO migration is supported only for a single device. This
>   is due to VFIO migration's lack of P2P support. However, P2P support is planned
>   to be added later on.
> @@ -45,6 +52,9 @@ VFIO implements the device hooks for the iterative approach as follows:
>   * A ``save_live_iterate`` function that reads the VFIO device's data from the
>     vendor driver during iterative pre-copy phase.
>   
> +* A ``switchover_ack_needed`` function that checks if the VFIO device uses
> +  "switchover-ack" migration capability when this capability is enabled.
> +
>   * A ``save_state`` function to save the device config space if it is present.
>   
>   * A ``save_live_complete_precopy`` function that sets the VFIO device in
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index a53ecbe2e0..3677aba4f4 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -69,6 +69,7 @@ typedef struct VFIOMigration {
>       uint64_t mig_flags;
>       uint64_t precopy_init_size;
>       uint64_t precopy_dirty_size;
> +    bool initial_data_sent;
>   } VFIOMigration;
>   
>   typedef struct VFIOAddressSpace {
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index cb6923ed3f..53f5787f0e 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -18,6 +18,8 @@
>   #include "sysemu/runstate.h"
>   #include "hw/vfio/vfio-common.h"
>   #include "migration/migration.h"
> +#include "migration/options.h"
> +#include "migration/savevm.h"
>   #include "migration/vmstate.h"
>   #include "migration/qemu-file.h"
>   #include "migration/register.h"
> @@ -45,6 +47,7 @@
>   #define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
>   #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
>   #define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
> +#define VFIO_MIG_FLAG_DEV_INIT_DATA_SENT (0xffffffffef100005ULL)
>   
>   /*
>    * This is an arbitrary size based on migration of mlx5 devices, where typically
> @@ -385,6 +388,7 @@ static void vfio_save_cleanup(void *opaque)
>       migration->data_buffer = NULL;
>       migration->precopy_init_size = 0;
>       migration->precopy_dirty_size = 0;
> +    migration->initial_data_sent = false;
>       vfio_migration_cleanup(vbasedev);
>       trace_vfio_save_cleanup(vbasedev->name);
>   }
> @@ -458,10 +462,17 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
>       if (data_size < 0) {
>           return data_size;
>       }
> -    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>   
>       vfio_update_estimated_pending_data(migration, data_size);
>   
> +    if (migrate_switchover_ack() && !migration->precopy_init_size &&
> +        !migration->initial_data_sent) {
> +        qemu_put_be64(f, VFIO_MIG_FLAG_DEV_INIT_DATA_SENT);
> +        migration->initial_data_sent = true;
> +    } else {
> +        qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> +    }
> +
>       trace_vfio_save_iterate(vbasedev->name, migration->precopy_init_size,
>                               migration->precopy_dirty_size);
>   
> @@ -580,6 +591,24 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>               }
>               break;
>           }
> +        case VFIO_MIG_FLAG_DEV_INIT_DATA_SENT:
> +        {
> +            if (!vfio_precopy_supported(vbasedev) ||
> +                !migrate_switchover_ack()) {
> +                error_report("%s: Received INIT_DATA_SENT but switchover ack "
> +                             "is not used", vbasedev->name);
> +                return -EINVAL;
> +            }
> +
> +            ret = qemu_loadvm_approve_switchover();
> +            if (ret) {
> +                error_report(
> +                    "%s: qemu_loadvm_approve_switchover failed, err=%d (%s)",
> +                    vbasedev->name, ret, strerror(-ret));
> +            }
> +
> +            return ret;
> +        }
>           default:
>               error_report("%s: Unknown tag 0x%"PRIx64, vbasedev->name, data);
>               return -EINVAL;
> @@ -594,6 +623,13 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>       return ret;
>   }
>   
> +static bool vfio_switchover_ack_needed(void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +
> +    return vfio_precopy_supported(vbasedev);
> +}
> +
>   static const SaveVMHandlers savevm_vfio_handlers = {
>       .save_setup = vfio_save_setup,
>       .save_cleanup = vfio_save_cleanup,
> @@ -606,6 +642,7 @@ static const SaveVMHandlers savevm_vfio_handlers = {
>       .load_setup = vfio_load_setup,
>       .load_cleanup = vfio_load_cleanup,
>       .load_state = vfio_load_state,
> +    .switchover_ack_needed = vfio_switchover_ack_needed,
>   };
>   
>   /* ---------------------------------------------------------------------- */



  reply	other threads:[~2023-05-30 15:16 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-30 14:48 [PATCH v5 0/9] migration: Add switchover ack capability and VFIO precopy support Avihai Horon
2023-05-30 14:48 ` [PATCH v5 1/9] migration: Add switchover ack capability Avihai Horon
2023-06-15 12:38   ` YangHang Liu
2023-06-15 13:49     ` Cédric Le Goater
2023-06-19  9:37       ` Avihai Horon
2023-05-30 14:48 ` [PATCH v5 2/9] migration: Implement switchover ack logic Avihai Horon
2023-06-05 22:06   ` Alex Williamson
2023-06-06 12:12     ` Avihai Horon
2023-06-08 18:32       ` Alex Williamson
2023-06-11  7:45         ` Avihai Horon
2023-05-30 14:48 ` [PATCH v5 3/9] migration: Enable switchover ack capability Avihai Horon
2023-05-30 14:48 ` [PATCH v5 4/9] tests: Add migration switchover ack capability test Avihai Horon
2023-05-30 14:48 ` [PATCH v5 5/9] vfio/migration: Refactor vfio_save_block() to return saved data size Avihai Horon
2023-05-30 14:48 ` [PATCH v5 6/9] vfio/migration: Store VFIO migration flags in VFIOMigration Avihai Horon
2023-05-30 14:48 ` [PATCH v5 7/9] vfio/migration: Add VFIO migration pre-copy support Avihai Horon
2023-05-30 14:48 ` [PATCH v5 8/9] vfio/migration: Add x-allow-pre-copy VFIO device property Avihai Horon
2023-06-01 20:22   ` Alex Williamson
2023-06-04  9:33     ` Avihai Horon
2023-06-05 14:56       ` Alex Williamson
2023-06-06 11:59         ` Avihai Horon
2023-06-06 13:40           ` Cédric Le Goater
2023-06-07  7:41             ` Avihai Horon
2023-05-30 14:48 ` [PATCH v5 9/9] vfio/migration: Add support for switchover ack capability Avihai Horon
2023-05-30 15:15   ` Cédric Le Goater [this message]
2023-06-16  9:35 ` [PATCH v5 0/9] migration: Add switchover ack capability and VFIO precopy support YangHang Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8ca99fcc-af76-eb33-549e-e69e876ae8b9@redhat.com \
    --to=clg@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=armbru@redhat.com \
    --cc=avihaih@nvidia.com \
    --cc=eblake@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=jgg@nvidia.com \
    --cc=joao.m.martins@oracle.com \
    --cc=kwankhede@nvidia.com \
    --cc=leobras@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=maorg@nvidia.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=targupta@nvidia.com \
    --cc=thuth@redhat.com \
    --cc=wangyanan55@huawei.com \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).