qemu-devel.nongnu.org archive mirror
From: Steve Sistare <steven.sistare@oracle.com>
To: qemu-devel@nongnu.org
Cc: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Eduardo Habkost <eduardo@habkost.net>,
	Philippe Mathieu-Daude <philmd@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Daniel P. Berrange" <berrange@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	Steve Sistare <steven.sistare@oracle.com>
Subject: [PATCH V2 00/13] Live update: cpr-transfer
Date: Mon, 30 Sep 2024 12:40:31 -0700
Message-ID: <1727725244-105198-1-git-send-email-steven.sistare@oracle.com>

What?

This patch series adds the live migration cpr-transfer mode, which
allows the user to transfer a guest to a new QEMU instance on the same
host with minimal guest pause time, by preserving guest RAM in place,
albeit with new virtual addresses in new QEMU, and by preserving device
file descriptors.

The new user-visible interfaces are:
  * cpr-transfer (MigMode migration parameter)
  * cpr-uri (migration parameter)
  * -cpr-uri (command-line option)

The user sets the mode parameter before invoking the migrate command.
In this mode, the user starts new QEMU on the same host as old QEMU, with
the same arguments as old QEMU, plus the -incoming and the -cpr-uri options.
The user issues the migrate command to old QEMU, which stops the VM, saves
state to the migration channels, and enters the postmigrate state.  Execution
resumes in new QEMU.

Memory-backend objects must have the share=on attribute, and
memory-backend-epc is not supported.  The VM must be started
with the '-machine anon-alloc=memfd' option, which allows anonymous
memory to be transferred in place to the new process.

This mode requires a second migration channel, specified by the cpr-uri
migration parameter on the outgoing side, and by the -cpr-uri QEMU
command-line option on the incoming side.  The channel must be of a type
that supports SCM_RIGHTS, such as a unix socket.

Why?

This mode has less impact on the guest than any other method of updating
in place.  The pause time is much lower, because devices need not be torn
down and recreated, DMA does not need to be drained and quiesced, and minimal
state is copied to new QEMU.  Further, there are no constraints on the guest.
By contrast, cpr-reboot mode requires the guest to support S3 suspend-to-ram,
and suspending plus resuming vfio devices adds multiple seconds to the
guest pause time.  Lastly, there is no loss of connectivity to the guest,
because chardev descriptors remain open and connected.

These benefits all derive from the core design principle of this mode,
which is preserving open descriptors.  This approach is very general and
can be used to support a wide variety of devices that do not have hardware
support for live migration, including but not limited to: vfio, chardev,
vhost, vdpa, and iommufd.  Some devices need new kernel software interfaces
to allow a descriptor to be used in a process that did not originally open it.

How?

All memory that is mapped by the guest is preserved in place.  Indeed,
it must be, because it may be the target of DMA requests, which are not
quiesced during cpr-transfer.  All such memory must be mmap'able in new QEMU.
This is easy for named memory-backend objects, as long as they are mapped
shared, because they are visible in the file system in both old and new QEMU.
Anonymous memory must be allocated using memfd_create rather than MAP_ANON,
so the memfd's can be sent to new QEMU.  Pages that were locked in memory
for DMA in old QEMU remain locked in new QEMU, because the descriptor of
the device that locked them remains open.

cpr-transfer preserves descriptors by sending them to new QEMU via the
cpr-uri, which must support SCM_RIGHTS, and by sending the unique name
and value of each descriptor to new QEMU via CPR state.
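
The descriptor-passing step can be sketched as follows.  This is not the
series' wire format, only a minimal Python illustration of SCM_RIGHTS
over a unix socket: the JSON payload, the helper names, and the
"chardev0-in"/"chardev0-out" names are all assumptions made for the
example.  A socketpair stands in for the cpr-uri channel and a pipe
stands in for device descriptors.

```python
import array
import json
import os
import socket


def send_named_fds(sock, named_fds):
    """Send {name: fd}: names as a JSON payload, fds as SCM_RIGHTS data."""
    names = list(named_fds)
    fds = array.array("i", [named_fds[n] for n in names])
    payload = json.dumps(names).encode()
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_named_fds(sock, max_fds=16):
    """Receive names and descriptors; return {name: fd in this process}."""
    fds = array.array("i")
    payload, ancdata, _flags, _addr = sock.recvmsg(
        4096, socket.CMSG_LEN(max_fds * fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing partial integer in the ancillary buffer.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    return dict(zip(json.loads(payload), fds))


def demo():
    old_qemu, new_qemu = socket.socketpair()  # stand-in for cpr-uri
    r, w = os.pipe()                          # stand-in for device fds
    send_named_fds(old_qemu, {"chardev0-in": r, "chardev0-out": w})
    table = recv_named_fds(new_qemu)

    # The received descriptors refer to the same open pipe.
    os.write(table["chardev0-out"], b"ping")
    data = os.read(table["chardev0-in"], 4)

    for fd in (r, w, *table.values()):
        os.close(fd)
    old_qemu.close()
    new_qemu.close()
    return sorted(table), data
```

The receiver gets new descriptor numbers for the same open files, which
is why each descriptor travels with a name that new QEMU can look up.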

For device descriptors, new QEMU reuses the descriptor when creating the
device, rather than opening it again.  For memfd descriptors, new QEMU
mmap's the preserved memfd when a ramblock is created.
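
The reuse-instead-of-open logic can be sketched as below.  FdTable and
open_device are hypothetical names, not functions from this series, and
"vfio0" is a made-up descriptor name.  In the demo, the first call
models old QEMU opening the device and recording the descriptor, and the
second call models new QEMU finding it in the table built from CPR
state.

```python
import os
import tempfile


class FdTable:
    """Hypothetical name -> descriptor table populated from CPR state."""

    def __init__(self):
        self._fds = {}

    def save(self, name, fd):
        self._fds[name] = fd

    def find(self, name):
        # Return -1 when no descriptor was preserved under this name.
        return self._fds.get(name, -1)


def open_device(table, name, path):
    """Reuse a preserved descriptor if present, otherwise open the path."""
    fd = table.find(name)
    if fd >= 0:
        return fd, True           # reused, no second open
    fd = os.open(path, os.O_RDWR)
    table.save(name, fd)
    return fd, False              # freshly opened and recorded


def demo():
    table = FdTable()
    with tempfile.NamedTemporaryFile() as backing:
        first = open_device(table, "vfio0", backing.name)
        second = open_device(table, "vfio0", backing.name)
        os.close(first[0])
    # (was first reused?, was second reused?, same descriptor?)
    return first[1], second[1], first[0] == second[0]
```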

CPR state cannot be sent over the normal migration channel, because devices
and backends are created prior to reading the channel, so this mode sends
CPR state over a second migration channel, specified by cpr-uri.  New QEMU
reads the second channel prior to creating devices or backends.

Example:

In this example, we simply restart the same version of QEMU, but in
a real scenario one would use a new QEMU binary path in terminal 2.

  Terminal 1: start old QEMU
  # qemu-kvm -monitor stdio -object \
    memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
    -m 4G -machine anon-alloc=memfd ...

  Terminal 2: start new QEMU
  # qemu-kvm ... -incoming unix:vm.sock -cpr-uri unix:cpr.sock

  Terminal 1:
  QEMU 9.1.50 monitor - type 'help' for more information
  (qemu) info status
  VM status: running
  (qemu) migrate_set_parameter mode cpr-transfer
  (qemu) migrate_set_parameter cpr-uri unix:cpr.sock
  (qemu) migrate -d unix:vm.sock
  (qemu) info status
  VM status: paused (postmigrate)

  Terminal 2:
  QEMU 9.1.50 monitor - type 'help' for more information
  (qemu) info status
  VM status: running

This patch series implements a minimal version of cpr-transfer.  Additional
series are ready to be posted to deliver the complete vision described
above, including
  * vfio
  * chardev
  * vhost and tap
  * blockers
  * migration-test cases
  * cpr-exec mode

Works in progress include:
  * vdpa
  * iommufd

Changes in V2:
  * cpr-transfer is the first new mode proposed, and cpr-exec is deferred
  * anon-alloc does not apply to memory-backend-object
  * replaced hack with proper synchronization between source and target
  * defined QEMU_CPR_FILE_MAGIC
  * addressed misc review comments

The first 6 patches below are foundational and are needed for both cpr-transfer
mode and the proposed cpr-exec mode.  The last 7 patches are specific to
cpr-transfer and implement the mechanisms for sharing state across a socket
using SCM_RIGHTS.

Steve Sistare (13):
  machine: alloc-anon option
  migration: cpr-state
  migration: save cpr mode
  migration: stop vm earlier for cpr
  physmem: preserve ram blocks for cpr
  hostmem-memfd: preserve for cpr
  migration: SCM_RIGHTS for QEMUFile
  migration: VMSTATE_FD
  migration: cpr-transfer save and load
  migration: cpr-uri parameter
  migration: cpr-uri option
  migration: split qmp_migrate
  migration: cpr-transfer mode

 backends/hostmem-memfd.c       |  12 +-
 hw/core/machine.c              |  19 +++
 include/hw/boards.h            |   1 +
 include/migration/cpr.h        |  38 ++++++
 include/migration/vmstate.h    |   9 ++
 migration/cpr-transfer.c       |  81 +++++++++++++
 migration/cpr.c                | 269 +++++++++++++++++++++++++++++++++++++++++
 migration/meson.build          |   2 +
 migration/migration-hmp-cmds.c |  10 ++
 migration/migration.c          | 116 ++++++++++++++++--
 migration/migration.h          |   2 +
 migration/options.c            |  37 +++++-
 migration/options.h            |   1 +
 migration/qemu-file.c          |  83 ++++++++++++-
 migration/qemu-file.h          |   2 +
 migration/ram.c                |   2 +
 migration/trace-events         |   7 ++
 migration/vmstate-types.c      |  33 +++++
 qapi/machine.json              |  14 +++
 qapi/migration.json            |  45 ++++++-
 qemu-options.hx                |  19 +++
 stubs/vmstate.c                |   7 ++
 system/physmem.c               |  58 +++++++++
 system/trace-events            |   3 +
 system/vl.c                    |  10 ++
 25 files changed, 857 insertions(+), 23 deletions(-)
 create mode 100644 include/migration/cpr.h
 create mode 100644 migration/cpr-transfer.c
 create mode 100644 migration/cpr.c

-- 
1.8.3.1




Thread overview: 79+ messages
2024-09-30 19:40 Steve Sistare [this message]
2024-09-30 19:40 ` [PATCH V2 01/13] machine: alloc-anon option Steve Sistare
2024-10-03 16:14   ` Peter Xu
2024-10-04 10:14     ` David Hildenbrand
2024-10-04 12:33       ` Peter Xu
2024-10-04 12:54         ` David Hildenbrand
2024-10-04 13:24           ` Peter Xu
2024-10-07 16:23             ` David Hildenbrand
2024-10-07 19:05               ` Peter Xu
2024-10-07 15:36   ` Peter Xu
2024-10-07 19:30     ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 02/13] migration: cpr-state Steve Sistare
2024-10-07 14:14   ` Peter Xu
2024-10-07 19:30     ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 03/13] migration: save cpr mode Steve Sistare
2024-10-07 15:18   ` Peter Xu
2024-10-07 19:31     ` Steven Sistare
2024-10-07 20:10       ` Peter Xu
2024-10-08 15:57         ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 04/13] migration: stop vm earlier for cpr Steve Sistare
2024-10-07 15:27   ` Peter Xu
2024-10-07 20:52     ` Steven Sistare
2024-10-08 15:35       ` Peter Xu
2024-10-08 19:13         ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 05/13] physmem: preserve ram blocks " Steve Sistare
2024-10-07 15:49   ` Peter Xu
2024-10-07 16:28     ` Peter Xu
2024-10-08 15:17       ` Steven Sistare
2024-10-08 16:26         ` Peter Xu
2024-10-08 21:05           ` Steven Sistare
2024-10-08 21:32             ` Peter Xu
2024-10-31 20:32               ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 06/13] hostmem-memfd: preserve " Steve Sistare
2024-10-07 15:52   ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 07/13] migration: SCM_RIGHTS for QEMUFile Steve Sistare
2024-10-07 16:06   ` Peter Xu
2024-10-07 16:35     ` Daniel P. Berrangé
2024-10-07 18:12       ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 08/13] migration: VMSTATE_FD Steve Sistare
2024-10-07 16:36   ` Peter Xu
2024-10-07 19:31     ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 09/13] migration: cpr-transfer save and load Steve Sistare
2024-10-07 16:47   ` Peter Xu
2024-10-07 19:31     ` Steven Sistare
2024-10-08 15:36       ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 10/13] migration: cpr-uri parameter Steve Sistare
2024-10-07 16:49   ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 11/13] migration: cpr-uri option Steve Sistare
2024-10-07 16:50   ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 12/13] migration: split qmp_migrate Steve Sistare
2024-10-07 19:18   ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 13/13] migration: cpr-transfer mode Steve Sistare
2024-10-07 19:44   ` Peter Xu
2024-10-07 20:39     ` Steven Sistare
2024-10-08 15:45       ` Peter Xu
2024-10-08 19:12         ` Steven Sistare
2024-10-08 19:38           ` Peter Xu
2024-10-08 18:28       ` Fabiano Rosas
2024-10-08 18:47         ` Peter Xu
2024-10-08 19:11           ` Fabiano Rosas
2024-10-08 19:33             ` Steven Sistare
2024-10-08 19:48             ` Peter Xu
2024-10-09 18:43               ` Steven Sistare
2024-10-09 19:06                 ` Peter Xu
2024-10-09 19:59                   ` Peter Xu
2024-10-09 20:18                     ` Steven Sistare
2024-10-09 20:57                       ` Peter Xu
2024-10-09 22:08                         ` Fabiano Rosas
2024-10-10 20:05                           ` Steven Sistare
2024-10-09 20:09                   ` Steven Sistare
2024-10-09 20:36                     ` Peter Xu
2024-10-10 20:06                       ` Steven Sistare
2024-10-10 21:23                         ` Peter Xu
2024-10-24 21:12                           ` Steven Sistare
2024-10-25 13:55                             ` Peter Xu
2024-10-25 15:04                               ` Steven Sistare
2024-10-08 19:29           ` Steven Sistare
2024-10-08 14:33 ` [PATCH V2 00/13] Live update: cpr-transfer Vladimir Sementsov-Ogievskiy
2024-10-08 21:13   ` Steven Sistare
