From: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
To: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
	"Cédric Le Goater" <clg@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>,
	"Avihai Horon" <avihaih@nvidia.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	qemu-devel@nongnu.org
Subject: [PATCH v6 00/36] Multifd 🔀 device state transfer support with VFIO consumer
Date: Tue,  4 Mar 2025 23:03:27 +0100
Message-ID: <cover.1741124640.git.maciej.szmigiero@oracle.com>

From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>

This is an updated v6 patch series of the v5 series located here:
https://lore.kernel.org/qemu-devel/cover.1739994627.git.maciej.szmigiero@oracle.com/

What is this patch set about?
Current live migration device state transfer is done via the main (single)
migration channel, which reduces performance and severely impacts the
migration downtime for VMs having large device state that needs to be
transferred during the switchover phase.

Examples of devices with such large switchover-phase device state are some
types of VFIO SmartNICs and GPUs.

This patch set allows parallelizing this transfer by using multifd channels
for it.
It also introduces new load and save threads per VFIO device for decoupling
these operations from the main migration thread.
These threads run on newly introduced generic (non-AIO) thread pools,
instantiated by the core migration code.
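
For illustration only, here is a minimal sketch of the "one save thread per
device" idea using plain POSIX threads. It is not code from this series and
does not use QEMU's thread pool API: all names (SketchDeviceJob,
sketch_device_save(), sketch_save_all(), SKETCH_MAX_DEVICES) are hypothetical,
and the "transfer" is reduced to adding the state size to a shared atomic
counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_MAX_DEVICES 8

/* Hypothetical per-device job; not a QEMU type. */
typedef struct {
    const char *name;        /* device name, for illustration only */
    size_t state_size;       /* bytes of device state to "transfer" */
    atomic_size_t *total;    /* shared counter updated by all threads */
} SketchDeviceJob;

/* Worker: stands in for one device's save thread. */
static void *sketch_device_save(void *opaque)
{
    SketchDeviceJob *job = opaque;

    /* Real code would read device state and queue it to multifd channels. */
    atomic_fetch_add(job->total, job->state_size);
    return NULL;
}

/*
 * Start one thread per device, then wait for all of them.
 * Returns 0 on success, -1 if n is too large or a thread failed to start.
 */
static int sketch_save_all(SketchDeviceJob *jobs, size_t n)
{
    pthread_t threads[SKETCH_MAX_DEVICES];
    size_t started = 0;
    int ret = 0;

    assert(jobs != NULL || n == 0);

    if (n > SKETCH_MAX_DEVICES) {
        return -1;
    }

    for (; started < n; started++) {
        if (pthread_create(&threads[started], NULL,
                           sketch_device_save, &jobs[started]) != 0) {
            ret = -1;
            break;
        }
    }

    /* Join whatever was started so no thread outlives the jobs array. */
    for (size_t i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }

    return ret;
}
```

A caller would fill an array of SketchDeviceJob entries that all point at the
same atomic_size_t counter, call sketch_save_all() and read the counter once
it returns. The point of the sketch is only that the per-device work runs
outside the calling (main migration) thread and is joined before the caller
proceeds.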

Changes from v5:
* Add bql_locked() assertion to migration_incoming_state_destroy() with a
comment describing why holding BQL there is necessary.

* Add SPDX-License-Identifier to newly added files.

* Move the multifd transfer settings consistency check to the patch adding
x-migration-multifd-transfer property.

* Change packet->idx == UINT32_MAX message to the suggested one.

* Use WITH_QEMU_LOCK_GUARD() in vfio_load_state_buffer().

* Add vfio_load_bufs_thread_{start,end} trace events.

* Invert "ret" value computation logic in vfio_load_bufs_thread() and
  vfio_multifd_save_complete_precopy_thread() - initialize "ret" to false
  at definition, remove "ret = false" at every failure/early exit block and
  add "ret = true" just before the early exit jump label.

* Make vfio_load_bufs_thread_load_config() return a bool and take an
  "Error **" parameter.

* Make vfio_multifd_setup() (previously called vfio_multifd_transfer_setup())
  allocate struct VFIOMultifd if requested by "alloc_multifd" parameter.

* Add vfio_multifd_cleanup() call to vfio_save_cleanup() (for consistency
  with the load code), with a comment describing that it is currently a NOP
  there.

* Move vfio_multifd_cleanup() to migration-multifd.c.

* Move general multifd migration description in docs/devel/migration/vfio.rst
  from the top section to new "Multifd" section at the bottom.

* Add comment describing why x-migration-multifd-transfer needs to be
  a custom property above the variable containing that custom property type
  in register_vfio_pci_dev_type().

* Add object_class_property_set_description() description for all 3 newly
  added parameters: x-migration-multifd-transfer,
  x-migration-load-config-after-iter and x-migration-max-queued-buffers.

* Split out wiring vfio_multifd_setup() and vfio_multifd_cleanup() into
  general VFIO load/save setup and cleanup methods into a brand new
  patch/commit.

* Squash the patch introducing VFIOStateBuffer(s) into the "received buffers
  queuing" commit so that the interim code at that point in the series builds
  with "-Werror".
  
* Change device state packet "idstr" field to NULL-terminated and drop
  QEMU_NONSTRING marking from its definition.

* Add vbasedev->name to VFIO error messages to know which device caused
  that error.

* Move BQL lock ordering assert closer to the other lock in the lock order
  in vfio_load_state_buffer().

* Drop orphan "QemuThread load_bufs_thread" VFIOMultifd member left over
  from version 2 of this patch set.

* Change "guint" into an "unsigned int" where it was present in this
  patch set.

* Use g_autoptr() for QEMUFile also in vfio_load_bufs_thread_load_config().

* Call multifd_abort_device_state_save_threads() if a migration error is
  already set in the save path to avoid needlessly waiting for the remaining
  threads to do all of their normal work.

* Other minor changes that should not have functional impact, like:
  renamed functions/labels, moved code lines between patches contained
  in this patch set, added review tags, code formatting, rebased on top
  of the latest QEMU git master, etc.
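
As an illustration of the inverted "ret" computation mentioned in the list
above, here is a minimal, self-contained sketch of that control-flow shape.
It is not the code from vfio_load_bufs_thread(): the function
sketch_load_bufs(), the helper sketch_step_ok() and the "fail_at" and
"cleanups" parameters are hypothetical and exist only to make the pattern
testable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an operation that may fail. */
static bool sketch_step_ok(int step, int fail_at)
{
    return step != fail_at;
}

/*
 * "ret" starts out false, so failure paths only need to jump to the
 * label. It is set to true in exactly one place, just before the label.
 */
static bool sketch_load_bufs(int fail_at, int *cleanups)
{
    bool ret = false;

    assert(cleanups != NULL);

    if (!sketch_step_ok(0, fail_at)) {
        goto thread_exit;
    }

    if (!sketch_step_ok(1, fail_at)) {
        goto thread_exit;
    }

    /* Only reached when every step above succeeded. */
    ret = true;

thread_exit:
    /* Common cleanup runs on both the success and the failure path. */
    (*cleanups)++;
    return ret;
}
```

With this shape a newly added failure branch cannot forget to set the
return value, since the only assignment after initialization is the one on
the success path.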

========================================================================

This patch set is targeting QEMU 10.0.

It is also exported as a git tree:
https://gitlab.com/maciejsszmigiero/qemu/-/commits/multifd-device-state-transfer-vfio

========================================================================

Maciej S. Szmigiero (35):
  migration: Clarify that {load,save}_cleanup handlers can run without
    setup
  thread-pool: Remove thread_pool_submit() function
  thread-pool: Rename AIO pool functions to *_aio() and data types to
    *Aio
  thread-pool: Implement generic (non-AIO) pool support
  migration: Add MIG_CMD_SWITCHOVER_START and its load handler
  migration: Add qemu_loadvm_load_state_buffer() and its handler
  migration: postcopy_ram_listen_thread() should take BQL for some calls
  error: define g_autoptr() cleanup function for the Error type
  migration: Add thread pool of optional load threads
  migration/multifd: Split packet into header and RAM data
  migration/multifd: Device state transfer support - receive side
  migration/multifd: Make multifd_send() thread safe
  migration/multifd: Add an explicit MultiFDSendData destructor
  migration/multifd: Device state transfer support - send side
  migration/multifd: Add multifd_device_state_supported()
  migration: Add save_live_complete_precopy_thread handler
  vfio/migration: Add load_device_config_state_start trace event
  vfio/migration: Convert bytes_transferred counter to atomic
  vfio/migration: Add vfio_add_bytes_transferred()
  vfio/migration: Move migration channel flags to vfio-common.h header
    file
  vfio/migration: Multifd device state transfer support - basic types
  vfio/migration: Multifd device state transfer - add support checking
    function
  vfio/migration: Multifd setup/cleanup functions and associated
    VFIOMultifd
  vfio/migration: Setup and cleanup multifd transfer in these general
    methods
  vfio/migration: Multifd device state transfer support - received
    buffers queuing
  vfio/migration: Multifd device state transfer support - load thread
  migration/qemu-file: Define g_autoptr() cleanup function for QEMUFile
  vfio/migration: Multifd device state transfer support - config loading
    support
  vfio/migration: Multifd device state transfer support - send side
  vfio/migration: Add x-migration-multifd-transfer VFIO property
  vfio/migration: Make x-migration-multifd-transfer VFIO property
    mutable
  hw/core/machine: Add compat for x-migration-multifd-transfer VFIO
    property
  vfio/migration: Max in-flight VFIO device state buffer count limit
  vfio/migration: Add x-migration-load-config-after-iter VFIO property
  vfio/migration: Update VFIO migration documentation

Peter Xu (1):
  migration/multifd: Make MultiFDSendData a struct

 docs/devel/migration/vfio.rst      |  79 ++-
 hw/core/machine.c                  |   2 +
 hw/vfio/meson.build                |   1 +
 hw/vfio/migration-multifd.c        | 786 +++++++++++++++++++++++++++++
 hw/vfio/migration-multifd.h        |  37 ++
 hw/vfio/migration.c                | 111 ++--
 hw/vfio/pci.c                      |  40 ++
 hw/vfio/trace-events               |  13 +-
 include/block/aio.h                |   8 +-
 include/block/thread-pool.h        |  62 ++-
 include/hw/vfio/vfio-common.h      |  34 ++
 include/migration/client-options.h |   4 +
 include/migration/misc.h           |  25 +
 include/migration/register.h       |  52 +-
 include/qapi/error.h               |   2 +
 include/qemu/typedefs.h            |   5 +
 migration/colo.c                   |   3 +
 migration/meson.build              |   1 +
 migration/migration-hmp-cmds.c     |   2 +
 migration/migration.c              |  20 +-
 migration/migration.h              |   7 +
 migration/multifd-device-state.c   | 212 ++++++++
 migration/multifd-nocomp.c         |  30 +-
 migration/multifd.c                | 248 +++++++--
 migration/multifd.h                |  74 ++-
 migration/options.c                |   9 +
 migration/qemu-file.h              |   2 +
 migration/savevm.c                 | 201 +++++++-
 migration/savevm.h                 |   6 +-
 migration/trace-events             |   1 +
 scripts/analyze-migration.py       |  11 +
 tests/unit/test-thread-pool.c      |   6 +-
 util/async.c                       |   6 +-
 util/thread-pool.c                 | 184 +++++--
 util/trace-events                  |   6 +-
 35 files changed, 2125 insertions(+), 165 deletions(-)
 create mode 100644 hw/vfio/migration-multifd.c
 create mode 100644 hw/vfio/migration-multifd.h
 create mode 100644 migration/multifd-device-state.c



Thread overview: 103+ messages
2025-03-04 22:03 Maciej S. Szmigiero [this message]
2025-03-04 22:03 ` [PATCH v6 01/36] migration: Clarify that {load,save}_cleanup handlers can run without setup Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 02/36] thread-pool: Remove thread_pool_submit() function Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 03/36] thread-pool: Rename AIO pool functions to *_aio() and data types to *Aio Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 04/36] thread-pool: Implement generic (non-AIO) pool support Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 05/36] migration: Add MIG_CMD_SWITCHOVER_START and its load handler Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 06/36] migration: Add qemu_loadvm_load_state_buffer() and its handler Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 07/36] migration: postcopy_ram_listen_thread() should take BQL for some calls Maciej S. Szmigiero
2025-03-05 12:34   ` Peter Xu
2025-03-05 15:11     ` Maciej S. Szmigiero
2025-03-05 16:15       ` Peter Xu
2025-03-05 16:37         ` Cédric Le Goater
2025-03-05 16:49           ` Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 08/36] error: define g_autoptr() cleanup function for the Error type Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 09/36] migration: Add thread pool of optional load threads Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 10/36] migration/multifd: Split packet into header and RAM data Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 11/36] migration/multifd: Device state transfer support - receive side Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 12/36] migration/multifd: Make multifd_send() thread safe Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 13/36] migration/multifd: Add an explicit MultiFDSendData destructor Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 14/36] migration/multifd: Device state transfer support - send side Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 15/36] migration/multifd: Make MultiFDSendData a struct Maciej S. Szmigiero
2025-03-05  9:00   ` Cédric Le Goater
2025-03-05 12:43   ` Fabiano Rosas
2025-03-04 22:03 ` [PATCH v6 16/36] migration/multifd: Add multifd_device_state_supported() Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 17/36] migration: Add save_live_complete_precopy_thread handler Maciej S. Szmigiero
2025-03-05 12:36   ` Peter Xu
2025-03-04 22:03 ` [PATCH v6 18/36] vfio/migration: Add load_device_config_state_start trace event Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 19/36] vfio/migration: Convert bytes_transferred counter to atomic Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 20/36] vfio/migration: Add vfio_add_bytes_transferred() Maciej S. Szmigiero
2025-03-05  7:44   ` Cédric Le Goater
2025-03-04 22:03 ` [PATCH v6 21/36] vfio/migration: Move migration channel flags to vfio-common.h header file Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 22/36] vfio/migration: Multifd device state transfer support - basic types Maciej S. Szmigiero
2025-03-05  7:44   ` Cédric Le Goater
2025-03-04 22:03 ` [PATCH v6 23/36] vfio/migration: Multifd device state transfer - add support checking function Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 24/36] vfio/migration: Multifd setup/cleanup functions and associated VFIOMultifd Maciej S. Szmigiero
2025-03-05  8:03   ` Cédric Le Goater
2025-03-04 22:03 ` [PATCH v6 25/36] vfio/migration: Setup and cleanup multifd transfer in these general methods Maciej S. Szmigiero
2025-03-05  8:30   ` Cédric Le Goater
2025-03-05 16:22   ` Peter Xu
2025-03-05 16:27     ` Maciej S. Szmigiero
2025-03-05 16:39       ` Peter Xu
2025-03-05 16:47         ` Cédric Le Goater
2025-03-05 16:48         ` Peter Xu
2025-03-04 22:03 ` [PATCH v6 26/36] vfio/migration: Multifd device state transfer support - received buffers queuing Maciej S. Szmigiero
2025-03-05  8:30   ` Cédric Le Goater
2025-03-04 22:03 ` [PATCH v6 27/36] vfio/migration: Multifd device state transfer support - load thread Maciej S. Szmigiero
2025-03-05  8:31   ` Cédric Le Goater
2025-03-04 22:03 ` [PATCH v6 28/36] migration/qemu-file: Define g_autoptr() cleanup function for QEMUFile Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 29/36] vfio/migration: Multifd device state transfer support - config loading support Maciej S. Szmigiero
2025-03-05  8:33   ` Cédric Le Goater
2025-03-04 22:03 ` [PATCH v6 30/36] vfio/migration: Multifd device state transfer support - send side Maciej S. Szmigiero
2025-03-05  8:38   ` Cédric Le Goater
2025-03-06  6:47   ` Avihai Horon
2025-03-06 10:15     ` Maciej S. Szmigiero
2025-03-06 10:32       ` Cédric Le Goater
2025-03-06 13:37         ` Avihai Horon
2025-03-06 14:13           ` Maciej S. Szmigiero
2025-03-06 14:23             ` Avihai Horon
2025-03-06 14:26               ` Cédric Le Goater
2025-03-07 10:59                 ` Maciej S. Szmigiero
2025-03-04 22:03 ` [PATCH v6 31/36] vfio/migration: Add x-migration-multifd-transfer VFIO property Maciej S. Szmigiero
2025-03-05  9:21   ` Cédric Le Goater
2025-03-04 22:03 ` [PATCH v6 32/36] vfio/migration: Make x-migration-multifd-transfer VFIO property mutable Maciej S. Szmigiero
2025-03-05  8:41   ` Cédric Le Goater
2025-03-04 22:04 ` [PATCH v6 33/36] hw/core/machine: Add compat for x-migration-multifd-transfer VFIO property Maciej S. Szmigiero
2025-03-04 22:04 ` [PATCH v6 34/36] vfio/migration: Max in-flight VFIO device state buffer count limit Maciej S. Szmigiero
2025-03-05  9:19   ` Cédric Le Goater
2025-03-05 15:11     ` Maciej S. Szmigiero
2025-03-05 16:39       ` Cédric Le Goater
2025-03-05 16:53         ` Maciej S. Szmigiero
2025-03-04 22:04 ` [PATCH v6 35/36] vfio/migration: Add x-migration-load-config-after-iter VFIO property Maciej S. Szmigiero
2025-03-04 22:04 ` [PATCH v6 36/36] vfio/migration: Update VFIO migration documentation Maciej S. Szmigiero
2025-03-05  8:53   ` Cédric Le Goater
2025-03-05  9:29 ` [PATCH v6 00/36] Multifd 🔀 device state transfer support with VFIO consumer Cédric Le Goater
2025-03-05  9:33   ` Avihai Horon
2025-03-05  9:35     ` Cédric Le Goater
2025-03-05  9:38       ` Avihai Horon
2025-03-05 17:45   ` Cédric Le Goater
2025-03-06  6:50     ` Avihai Horon
2025-03-05 16:49 ` [PATCH] migration: Always take BQL for migration_incoming_state_destroy() Maciej S. Szmigiero
2025-03-05 16:53   ` Cédric Le Goater
2025-03-05 16:55     ` Maciej S. Szmigiero
2025-03-07 10:57 ` [PATCH 1/2] vfio/migration: Add also max in-flight VFIO device state buffers size limit Maciej S. Szmigiero
2025-03-07 12:03   ` Cédric Le Goater
2025-03-07 13:45     ` Maciej S. Szmigiero
2025-03-11 13:04       ` Cédric Le Goater
2025-03-11 14:57         ` Avihai Horon
2025-03-11 15:45           ` Cédric Le Goater
2025-03-11 16:01             ` Avihai Horon
2025-03-11 16:05               ` Cédric Le Goater
2025-03-12  7:44                 ` Avihai Horon
2025-04-01 12:26         ` Maciej S. Szmigiero
2025-04-02  9:51           ` Cédric Le Goater
2025-04-02 12:40             ` Maciej S. Szmigiero
2025-04-02 13:13               ` Cédric Le Goater
2025-03-07 10:57 ` [PATCH 2/2] vfio/migration: Use BE byte order for device state wire packets Maciej S. Szmigiero
2025-03-10  7:30   ` Cédric Le Goater
2025-03-10  7:34   ` Cédric Le Goater
2025-03-10  8:17   ` Avihai Horon
2025-03-10  9:23     ` Cédric Le Goater
2025-03-10 12:53       ` Maciej S. Szmigiero
2025-03-10 13:39         ` Cédric Le Goater
2025-03-10 12:53     ` Maciej S. Szmigiero