From: "Cédric Le Goater" <clg@redhat.com>
To: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>,
Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
"Eric Blake" <eblake@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Daniel P . Berrangé" <berrange@redhat.com>,
"Avihai Horon" <avihaih@nvidia.com>,
"Joao Martins" <joao.m.martins@oracle.com>,
qemu-devel@nongnu.org
Subject: Re: [PATCH v6 00/36] Multifd 🔀 device state transfer support with VFIO consumer
Date: Wed, 5 Mar 2025 10:29:44 +0100
Message-ID: <4ea12608-ec9d-4eed-a20c-75f3ac6a5d0d@redhat.com>
In-Reply-To: <cover.1741124640.git.maciej.szmigiero@oracle.com>
Hello,
On 3/4/25 23:03, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>
> This is an updated v6 patch series of the v5 series located here:
> https://lore.kernel.org/qemu-devel/cover.1739994627.git.maciej.szmigiero@oracle.com/
>
> What is this patch set about?
> Current live migration device state transfer is done via the main (single)
> migration channel, which reduces performance and severely impacts the
> migration downtime for VMs having large device state that needs to be
> transferred during the switchover phase.
>
> Example devices that have such large switchover phase device state are some
> types of VFIO SmartNICs and GPUs.
>
> This patch set allows parallelizing this transfer by using multifd channels
> for it.
> It also introduces new load and save threads per VFIO device for decoupling
> these operations from the main migration thread.
> These threads run on newly introduced generic (non-AIO) thread pools,
> instantiated by the migration core.
I think we are ready to apply 1-33. Avihai, please take a look!

7, 15 and 17 still need an Ack from Peter and/or Fabiano though.
34 can be reworked a bit before -rc0.
35 is for QEMU 10.1.
36 needs some massaging. I will do that.

This can go through the vfio tree if everyone agrees.

Thanks,

C.
> Changes from v5:
> * Add bql_locked() assertion to migration_incoming_state_destroy() with a
> comment describing why holding BQL there is necessary.
>
> * Add SPDX-License-Identifier to newly added files.
>
> * Move consistency of multifd transfer settings check to the patch adding
> x-migration-multifd-transfer property.
>
> * Change packet->idx == UINT32_MAX message to the suggested one.
>
> * Use WITH_QEMU_LOCK_GUARD() in vfio_load_state_buffer().
>
> * Add vfio_load_bufs_thread_{start,end} trace events.
>
> * Invert "ret" value computation logic in vfio_load_bufs_thread() and
> vfio_multifd_save_complete_precopy_thread() - initialize "ret" to false
> at definition, remove "ret = false" at every failure/early exit block and
> add "ret = true" just before the early exit jump label.
>
> * Make vfio_load_bufs_thread_load_config() return a bool and take an
> "Error **" parameter.
>
> * Make vfio_multifd_setup() (previously called vfio_multifd_transfer_setup())
> allocate struct VFIOMultifd if requested by "alloc_multifd" parameter.
>
> * Add vfio_multifd_cleanup() call to vfio_save_cleanup() (for consistency
> with the load code), with a comment describing that it is currently a NOP
> there.
>
> * Move vfio_multifd_cleanup() to migration-multifd.c.
>
> * Move general multifd migration description in docs/devel/migration/vfio.rst
> from the top section to new "Multifd" section at the bottom.
>
> * Add comment describing why x-migration-multifd-transfer needs to be
> a custom property above the variable containing that custom property type
> in register_vfio_pci_dev_type().
>
> * Add object_class_property_set_description() description for all 3 newly
> added parameters: x-migration-multifd-transfer,
> x-migration-load-config-after-iter and x-migration-max-queued-buffers.
>
> * Split out wiring vfio_multifd_setup() and vfio_multifd_cleanup() into
> general VFIO load/save setup and cleanup methods into a brand new
> patch/commit.
>
> * Squash the patch introducing VFIOStateBuffer(s) into the "received buffers
> queuing" commit to fix building the interim code form at the time of this
> patch with "-Werror".
>
> * Change device state packet "idstr" field to NULL-terminated and drop
> QEMU_NONSTRING marking from its definition.
>
> * Add vbasedev->name to VFIO error messages to know which device caused
> that error.
>
> * Move BQL lock ordering assert closer to the other lock in the lock order
> in vfio_load_state_buffer().
>
> * Drop orphan "QemuThread load_bufs_thread" VFIOMultifd member leftover
> from the days of the version 2 of this patch set.
>
> * Change "guint" into an "unsigned int" where it was present in this
> patch set.
>
> * Use g_autoptr() for QEMUFile also in vfio_load_bufs_thread_load_config().
>
> * Call multifd_abort_device_state_save_threads() if a migration error is
> already set in the save path to avoid needlessly waiting for the remaining
> threads to do all of their normal work.
>
> * Other minor changes that should not have functional impact, like:
> renamed functions/labels, moved code lines between patches contained
> in this patch set, added review tags, code formatting, rebased on top
> of the latest QEMU git master, etc.
>
> ========================================================================
>
> This patch set is targeting QEMU 10.0.
>
> It is also exported as a git tree:
> https://gitlab.com/maciejsszmigiero/qemu/-/commits/multifd-device-state-transfer-vfio
>
> ========================================================================
>
> Maciej S. Szmigiero (35):
> migration: Clarify that {load,save}_cleanup handlers can run without
> setup
> thread-pool: Remove thread_pool_submit() function
> thread-pool: Rename AIO pool functions to *_aio() and data types to
> *Aio
> thread-pool: Implement generic (non-AIO) pool support
> migration: Add MIG_CMD_SWITCHOVER_START and its load handler
> migration: Add qemu_loadvm_load_state_buffer() and its handler
> migration: postcopy_ram_listen_thread() should take BQL for some calls
> error: define g_autoptr() cleanup function for the Error type
> migration: Add thread pool of optional load threads
> migration/multifd: Split packet into header and RAM data
> migration/multifd: Device state transfer support - receive side
> migration/multifd: Make multifd_send() thread safe
> migration/multifd: Add an explicit MultiFDSendData destructor
> migration/multifd: Device state transfer support - send side
> migration/multifd: Add multifd_device_state_supported()
> migration: Add save_live_complete_precopy_thread handler
> vfio/migration: Add load_device_config_state_start trace event
> vfio/migration: Convert bytes_transferred counter to atomic
> vfio/migration: Add vfio_add_bytes_transferred()
> vfio/migration: Move migration channel flags to vfio-common.h header
> file
> vfio/migration: Multifd device state transfer support - basic types
> vfio/migration: Multifd device state transfer - add support checking
> function
> vfio/migration: Multifd setup/cleanup functions and associated
> VFIOMultifd
> vfio/migration: Setup and cleanup multifd transfer in these general
> methods
> vfio/migration: Multifd device state transfer support - received
> buffers queuing
> vfio/migration: Multifd device state transfer support - load thread
> migration/qemu-file: Define g_autoptr() cleanup function for QEMUFile
> vfio/migration: Multifd device state transfer support - config loading
> support
> vfio/migration: Multifd device state transfer support - send side
> vfio/migration: Add x-migration-multifd-transfer VFIO property
> vfio/migration: Make x-migration-multifd-transfer VFIO property
> mutable
> hw/core/machine: Add compat for x-migration-multifd-transfer VFIO
> property
> vfio/migration: Max in-flight VFIO device state buffer count limit
> vfio/migration: Add x-migration-load-config-after-iter VFIO property
> vfio/migration: Update VFIO migration documentation
>
> Peter Xu (1):
> migration/multifd: Make MultiFDSendData a struct
>
> docs/devel/migration/vfio.rst | 79 ++-
> hw/core/machine.c | 2 +
> hw/vfio/meson.build | 1 +
> hw/vfio/migration-multifd.c | 786 +++++++++++++++++++++++++++++
> hw/vfio/migration-multifd.h | 37 ++
> hw/vfio/migration.c | 111 ++--
> hw/vfio/pci.c | 40 ++
> hw/vfio/trace-events | 13 +-
> include/block/aio.h | 8 +-
> include/block/thread-pool.h | 62 ++-
> include/hw/vfio/vfio-common.h | 34 ++
> include/migration/client-options.h | 4 +
> include/migration/misc.h | 25 +
> include/migration/register.h | 52 +-
> include/qapi/error.h | 2 +
> include/qemu/typedefs.h | 5 +
> migration/colo.c | 3 +
> migration/meson.build | 1 +
> migration/migration-hmp-cmds.c | 2 +
> migration/migration.c | 20 +-
> migration/migration.h | 7 +
> migration/multifd-device-state.c | 212 ++++++++
> migration/multifd-nocomp.c | 30 +-
> migration/multifd.c | 248 +++++++--
> migration/multifd.h | 74 ++-
> migration/options.c | 9 +
> migration/qemu-file.h | 2 +
> migration/savevm.c | 201 +++++++-
> migration/savevm.h | 6 +-
> migration/trace-events | 1 +
> scripts/analyze-migration.py | 11 +
> tests/unit/test-thread-pool.c | 6 +-
> util/async.c | 6 +-
> util/thread-pool.c | 184 +++++--
> util/trace-events | 6 +-
> 35 files changed, 2125 insertions(+), 165 deletions(-)
> create mode 100644 hw/vfio/migration-multifd.c
> create mode 100644 hw/vfio/migration-multifd.h
> create mode 100644 migration/multifd-device-state.c
>