From: Alex Williamson <alex.williamson@redhat.com>
To: Avihai Horon <avihaih@nvidia.com>
Cc: qemu-devel@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>,
"Peter Xu" <peterx@redhat.com>,
"Jason Wang" <jasowang@redhat.com>,
"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Eduardo Habkost" <eduardo@habkost.net>,
"David Hildenbrand" <david@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Yishai Hadas" <yishaih@nvidia.com>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Maor Gottlieb" <maorg@nvidia.com>,
"Kirti Wankhede" <kwankhede@nvidia.com>,
"Tarun Gupta" <targupta@nvidia.com>,
"Joao Martins" <joao.m.martins@oracle.com>
Subject: Re: [PATCH 01/18] vfio/migration: Add VFIO migration pre-copy support
Date: Thu, 26 Jan 2023 16:52:32 -0700
Message-ID: <20230126165232.0e7a2316.alex.williamson@redhat.com>
In-Reply-To: <20230126184948.10478-2-avihaih@nvidia.com>
On Thu, 26 Jan 2023 20:49:31 +0200
Avihai Horon <avihaih@nvidia.com> wrote:
> Pre-copy support allows the VFIO device data to be transferred while the
> VM is running. This helps to accommodate VFIO devices that have a large
> amount of data that needs to be transferred, and it can reduce migration
> downtime.
>
> Pre-copy support is optional in VFIO migration protocol v2.
> Implement pre-copy of VFIO migration protocol v2 and use it for devices
> that support it. Full description of it can be found here [1].
>
> [1]
> https://lore.kernel.org/kvm/20221206083438.37807-3-yishaih@nvidia.com/
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
> docs/devel/vfio-migration.rst | 29 ++++++---
> include/hw/vfio/vfio-common.h | 3 +
> hw/vfio/common.c | 8 ++-
> hw/vfio/migration.c | 112 ++++++++++++++++++++++++++++++++--
> hw/vfio/trace-events | 5 +-
> 5 files changed, 140 insertions(+), 17 deletions(-)
>
> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
> index 1d50c2fe5f..51f5e1a537 100644
> --- a/docs/devel/vfio-migration.rst
> +++ b/docs/devel/vfio-migration.rst
> @@ -7,12 +7,14 @@ the guest is running on source host and restoring this saved state on the
> destination host. This document details how saving and restoring of VFIO
> devices is done in QEMU.
>
> -Migration of VFIO devices currently consists of a single stop-and-copy phase.
> -During the stop-and-copy phase the guest is stopped and the entire VFIO device
> -data is transferred to the destination.
> -
> -The pre-copy phase of migration is currently not supported for VFIO devices.
> -Support for VFIO pre-copy will be added later on.
> +Migration of VFIO devices consists of two phases: the optional pre-copy phase,
> +and the stop-and-copy phase. The pre-copy phase is iterative and allows to
> +accommodate VFIO devices that have a large amount of data that needs to be
> +transferred. The iterative pre-copy phase of migration allows for the guest to
> +continue whilst the VFIO device state is transferred to the destination, this
> +helps to reduce the total downtime of the VM. VFIO devices can choose to skip
> +the pre-copy phase of migration by not reporting the VFIO_MIGRATION_PRE_COPY
> +flag in VFIO_DEVICE_FEATURE_MIGRATION ioctl.
>
> A detailed description of the UAPI for VFIO device migration can be found in
> the comment for the ``vfio_device_mig_state`` structure in the header file
> @@ -29,6 +31,12 @@ VFIO implements the device hooks for the iterative approach as follows:
> driver, which indicates the amount of data that the vendor driver has yet to
> save for the VFIO device.
>
> +* An ``is_active_iterate`` function that indicates ``save_live_iterate`` is
> + active only if the VFIO device is in pre-copy states.
> +
> +* A ``save_live_iterate`` function that reads the VFIO device's data from the
> + vendor driver during iterative phase.
> +
> * A ``save_state`` function to save the device config space if it is present.
>
> * A ``save_live_complete_precopy`` function that sets the VFIO device in
> @@ -91,8 +99,10 @@ Flow of state changes during Live migration
> ===========================================
>
> Below is the flow of state change during live migration.
> -The values in the brackets represent the VM state, the migration state, and
> +The values in the parentheses represent the VM state, the migration state, and
> the VFIO device state, respectively.
> +The text in the square brackets represents the flow if the VFIO device supports
> +pre-copy.
>
> Live migration save path
> ------------------------
> @@ -104,11 +114,12 @@ Live migration save path
> |
> migrate_init spawns migration_thread
> Migration thread then calls each device's .save_setup()
> - (RUNNING, _SETUP, _RUNNING)
> + (RUNNING, _SETUP, _RUNNING [_PRE_COPY])
> |
> - (RUNNING, _ACTIVE, _RUNNING)
> + (RUNNING, _ACTIVE, _RUNNING [_PRE_COPY])
> If device is active, get pending_bytes by .save_live_pending()
> If total pending_bytes >= threshold_size, call .save_live_iterate()
> + [Data of VFIO device for pre-copy phase is copied]
> Iterate till total pending bytes converge and are less than threshold
> |
> On migration completion, vCPU stops and calls .save_live_complete_precopy for
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 5f8e7a02fe..88c2194fb9 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -67,7 +67,10 @@ typedef struct VFIOMigration {
> int data_fd;
> void *data_buffer;
> size_t data_buffer_size;
> + uint64_t mig_flags;
> uint64_t stop_copy_size;
> + uint64_t precopy_init_size;
> + uint64_t precopy_dirty_size;
> } VFIOMigration;
>
> typedef struct VFIOAddressSpace {
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 9a0dbee6b4..93b18c5e3d 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -357,7 +357,9 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>
> if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
> (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> - migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
> + migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P ||
> + migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ||
> + migration->device_state == VFIO_DEVICE_STATE_PRE_COPY_P2P)) {
Should this just turn into a test that we're not in STOP_COPY?
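To make the question concrete, here is a minimal sketch comparing the two forms of the test. The enum is a local stand-in that only mirrors the uapi VFIO_DEVICE_STATE_* ordering, and both helper names are made up for illustration; neither exists in QEMU.

```c
#include <stdbool.h>

/* Local stand-ins for the VFIO_DEVICE_STATE_* values (illustration only). */
enum dev_state {
    DEV_STATE_ERROR,
    DEV_STATE_STOP,
    DEV_STATE_RUNNING,
    DEV_STATE_STOP_COPY,
    DEV_STATE_RESUMING,
    DEV_STATE_RUNNING_P2P,
    DEV_STATE_PRE_COPY,
    DEV_STATE_PRE_COPY_P2P,
};

/* Hypothetical helper: the explicit list of states, as in the hunk above. */
static bool state_in_running_list(enum dev_state s)
{
    return s == DEV_STATE_RUNNING || s == DEV_STATE_RUNNING_P2P ||
           s == DEV_STATE_PRE_COPY || s == DEV_STATE_PRE_COPY_P2P;
}

/* Hypothetical helper: the simpler negative test suggested above. */
static bool state_not_stop_copy(enum dev_state s)
{
    return s != DEV_STATE_STOP_COPY;
}
```

Note that the two are not strictly equivalent: the negative test also matches STOP, RESUMING and ERROR, so whether it is a safe replacement depends on which of those states this caller can actually observe.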
> return false;
> }
> }
> @@ -387,7 +389,9 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
> }
>
> if (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> - migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P) {
> + migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P ||
> + migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ||
> + migration->device_state == VFIO_DEVICE_STATE_PRE_COPY_P2P) {
> continue;
> } else {
> return false;
Hmm, this only seems to highlight that between this series and the
previous one, we're adding tests for states that we never actually use,
i.e. these _P2P states.

IIRC, the reason we have these _P2P states is so that we can transition
a set of devices, which may have active P2P DMA between them, to the
STOP, STOP_COPY, and even RUNNING states safely, without losing data,
given that we cannot transition all devices simultaneously. That
suggests that missing from both series is support for bringing all
devices to these _P2P states before we move any device to one of the
STOP, STOP_COPY, or RUNNING (in the case of RESUMING) states.
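The ordering I have in mind could be sketched roughly as below. This is a standalone illustration, not QEMU code: struct dev, dev_set_state() and stop_all_devices() are hypothetical names, and dev_set_state() merely records the state where a real implementation would ask the kernel to change it.

```c
#include <stddef.h>

/* Local stand-ins for the relevant device states (illustration only). */
enum dev_state {
    DEV_RUNNING,
    DEV_RUNNING_P2P,
    DEV_STOP,
};

/* Hypothetical per-device bookkeeping. */
struct dev {
    enum dev_state state;
};

/* Hypothetical stand-in for the real state change; always succeeds here. */
static int dev_set_state(struct dev *d, enum dev_state s)
{
    d->state = s;
    return 0;
}

/* Move every device to STOP without stopping any device early. */
static int stop_all_devices(struct dev *devs, size_t n)
{
    size_t i;
    int ret;

    /* Pass 1: bring every device to the _P2P state first. */
    for (i = 0; i < n; i++) {
        ret = dev_set_state(&devs[i], DEV_RUNNING_P2P);
        if (ret) {
            return ret;
        }
    }

    /* Pass 2: only once all devices are in _P2P, stop them one by one. */
    for (i = 0; i < n; i++) {
        ret = dev_set_state(&devs[i], DEV_STOP);
        if (ret) {
            return ret;
        }
    }

    return 0;
}
```

The point of the two passes is that no device reaches STOP while a peer is still in plain RUNNING. The same shape would apply in the other direction on the resume side.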
Also, I recall discussions that we need to enforce configuration
restrictions when not all devices support the _P2P states? For example,
adding a migration blocker when there are multiple vfio devices and at
least one of them does not support the _P2P migration states. Or
perhaps, initially, requiring support for the _P2P states.

I think what's implemented here, where we don't make use of the _P2P
states, would require adding a migration blocker whenever there are
multiple vfio devices, regardless of device support for _P2P.
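As a sketch of the two policies, assuming a per-device "supports _P2P" flag is available: both function names below are hypothetical, and the actual blocker would presumably be registered through the existing migration blocker mechanism rather than returned as a bool.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Policy matching what this series implements: the _P2P states are never
 * used, so any configuration with more than one vfio device is blocked.
 */
static bool need_blocker_no_p2p_use(size_t ndevs)
{
    return ndevs > 1;
}

/*
 * Policy once the _P2P states are actually used: block only when there are
 * multiple vfio devices and at least one of them lacks _P2P support.
 */
static bool need_blocker_with_p2p_use(const bool *supports_p2p, size_t ndevs)
{
    size_t i;

    if (ndevs <= 1) {
        return false;
    }

    for (i = 0; i < ndevs; i++) {
        if (!supports_p2p[i]) {
            return true;
        }
    }

    return false;
}
```

A single vfio device is never blocked under either policy, since there is no peer for it to have P2P DMA with.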
Thanks,
Alex