qemu-devel.nongnu.org archive mirror
From: Steven Sistare <steven.sistare@oracle.com>
To: Fabiano Rosas <farosas@suse.de>, qemu-devel@nongnu.org
Cc: Peter Xu <peterx@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Eduardo Habkost <eduardo@habkost.net>,
	Philippe Mathieu-Daude <philmd@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Daniel P. Berrange" <berrange@redhat.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH V7 00/24] Live update: cpr-transfer
Date: Tue, 28 Jan 2025 16:20:39 -0500
Message-ID: <a7af45f7-cd65-497a-9b20-eae6a0dab361@oracle.com>
In-Reply-To: <87y0ywqna1.fsf@suse.de>

On 1/27/2025 10:39 AM, Fabiano Rosas wrote:
> Steve Sistare <steven.sistare@oracle.com> writes:
> 
>> What?
>>
>> This patch series adds the live migration cpr-transfer mode, which
>> allows the user to transfer a guest to a new QEMU instance on the same
>> host with minimal guest pause time, by preserving guest RAM in place,
>> albeit with new virtual addresses in new QEMU, and by preserving device
>> file descriptors.
>>
>> The new user-visible interfaces are:
>>    * cpr-transfer (MigMode migration parameter)
>>    * cpr (MigrationChannelType)
>>    * incoming MigrationChannel (command-line argument)
>>    * aux-ram-share (machine option)
>>
>> The user sets the mode parameter before invoking the migrate command.
>> In this mode, the user starts new QEMU on the same host as old QEMU, with
>> the same arguments as old QEMU, plus two -incoming options; one for the main
>> channel, and one for the CPR channel.  The user issues the migrate command to
>> old QEMU, which stops the VM, saves state to the migration channels, and
>> enters the postmigrate state.  Execution resumes in new QEMU.
>>
>> Memory-backend objects must have the share=on attribute, but memory-backend-epc
>> is not supported.  The VM must be started with the '-machine aux-ram-share=on'
>> option, which allows auxiliary guest memory to be transferred in place to the
>> new process.
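To illustrate why share=on matters: guest RAM can only be preserved in place if new QEMU's mapping sees the pages that old QEMU's mapping wrote. The sketch below is a Linux-only illustration, not QEMU code. It assumes glibc 2.27 or later for memfd_create(), and the function name write_visible_to_new_mapping() is hypothetical. It writes through a MAP_SHARED or MAP_PRIVATE mapping, then checks a second MAP_SHARED mapping of the same fd that stands in for new QEMU.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo, not QEMU code.  Writes through a mapping of a memfd
 * created with map_flags (MAP_SHARED or MAP_PRIVATE), then checks whether
 * a second MAP_SHARED mapping of the same fd, standing in for new QEMU,
 * sees the write.  Returns 1 if visible, 0 if not, -1 on error.
 */
static int write_visible_to_new_mapping(int map_flags)
{
    const size_t len = 4096;
    const char msg[] = "guest data";
    int visible = -1;

    int fd = memfd_create("ram0-demo", 0);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)len) == 0) {
        char *old_view = mmap(NULL, len, PROT_READ | PROT_WRITE, map_flags,
                              fd, 0);
        if (old_view != MAP_FAILED) {
            /* The "guest" writes through old QEMU's mapping. */
            memcpy(old_view, msg, sizeof(msg));

            /* New QEMU maps the same fd at an address of its own choosing. */
            char *new_view = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
            if (new_view != MAP_FAILED) {
                visible = memcmp(new_view, msg, sizeof(msg)) == 0;
                munmap(new_view, len);
            }
            munmap(old_view, len);
        }
    }
    close(fd);
    return visible;
}
```

With MAP_SHARED the function returns 1. With MAP_PRIVATE the write lands in a private copy-on-write page, so the second mapping still reads zeros and the function returns 0.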
>>
>> This mode requires a second migration channel of type "cpr", in the channel
>> arguments on the outgoing side, and in a second -incoming command-line
>> parameter on the incoming side.  This CPR channel must support file descriptor
>> transfer with SCM_RIGHTS, i.e. it must be a UNIX domain socket.
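A channel can be screened for SCM_RIGHTS support up front by checking its address family. This is a minimal sketch that assumes checking for AF_UNIX via getsockname() is sufficient; channel_supports_scm_rights() is a hypothetical name, and this is not necessarily how QEMU performs the check.

```c
#define _GNU_SOURCE
#include <sys/socket.h>

/*
 * Hypothetical helper: returns 1 if fd is a UNIX domain socket (and can
 * therefore carry SCM_RIGHTS ancillary data), 0 otherwise, including when
 * fd is not a socket at all or is invalid.
 */
static int channel_supports_scm_rights(int fd)
{
    struct sockaddr_storage addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        return 0;   /* ENOTSOCK, EBADF, ... */
    }
    return addr.ss_family == AF_UNIX;
}
```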
>>
>> Why?
>>
>> This mode has less impact on the guest than any other method of updating
>> in place.  The pause time is much lower, because devices need not be torn
>> down and recreated, DMA does not need to be drained and quiesced, and minimal
>> state is copied to new QEMU.  Further, there are no constraints on the guest.
>> By contrast, cpr-reboot mode requires the guest to support S3 suspend-to-ram,
>> and suspending plus resuming vfio devices adds multiple seconds to the
>> guest pause time.
>>
>> These benefits all derive from the core design principle of this mode,
>> which is preserving open descriptors.  This approach is very general and
>> can be used to support a wide variety of devices that do not have hardware
>> support for live migration, including but not limited to: vfio, chardev,
>> vhost, vdpa, and iommufd.  Some devices need new kernel software interfaces
>> to allow a descriptor to be used in a process that did not originally open it.
>>
>> How?
>>
>> All memory that is mapped by the guest is preserved in place.  Indeed,
>> it must be, because it may be the target of DMA requests, which are not
>> quiesced during cpr-transfer.  All such memory must be mmap'able in new QEMU.
>> This is easy for named memory-backend objects, as long as they are mapped
>> shared, because they are visible in the file system in both old and new QEMU.
>> Anonymous memory must be allocated using memfd_create rather than MAP_ANON,
>> so the memfds can be sent to new QEMU.  Pages that were locked in memory
>> for DMA in old QEMU remain locked in new QEMU, because the descriptor of
>> the device that locked them remains open.
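The practical difference between MAP_ANON and memfd_create() is that the latter gives the memory a descriptor that outlives any particular mapping. The single-process sketch below is Linux-only, and alloc_ram_with_fd() and contents_survive_remap() are hypothetical names. It allocates "anonymous" RAM through a memfd, drops the original mapping as old QEMU would on exit, and maps the fd again, typically at a different virtual address, to show that the contents survive. A real hand-off passes the fd to a different process over the CPR channel.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical: allocate "anonymous" RAM that nevertheless has an fd. */
static void *alloc_ram_with_fd(size_t len, int *fd_out)
{
    int fd = memfd_create("guest-ram", MFD_CLOEXEC);
    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return p;
}

/*
 * Tear down the original mapping, keep the fd, and map it again.
 * Returns 1 if the data written before the remap is still there.
 */
static int contents_survive_remap(void)
{
    const size_t len = 2 * 4096;
    int fd;
    char *old_va = alloc_ram_with_fd(len, &fd);
    if (old_va == NULL) {
        return -1;
    }
    strcpy(old_va + 4096, "page 1");   /* write into the second 4 KiB */
    munmap(old_va, len);               /* old QEMU's mapping goes away */

    char *new_va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                         /* the new mapping keeps pages alive */
    if (new_va == MAP_FAILED) {
        return -1;
    }
    int ok = strcmp(new_va + 4096, "page 1") == 0;
    munmap(new_va, len);
    return ok;
}
```

With MAP_ANON there is no descriptor to keep or send, so once the last mapping is gone the pages are gone too.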
>>
>> cpr-transfer preserves descriptors by sending them to new QEMU via the CPR
>> channel, which must support SCM_RIGHTS, and by sending the unique name of
>> each descriptor to new QEMU via CPR state.
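A descriptor and its name can travel together in one sendmsg() call: the name as ordinary payload bytes and the descriptor as SCM_RIGHTS ancillary data. This is a minimal sketch, not QEMU's QEMUFile code. send_named_fd() and recv_named_fd() are hypothetical, and the receiver assumes the whole name arrives in a single recvmsg(), which a production SOCK_STREAM reader could not assume.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send fd plus its NUL-terminated name.  Returns 0 on success, -1 on error. */
static int send_named_fd(int sock, const char *name, int fd)
{
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;          /* forces correct alignment */
    } ctrl;
    struct iovec iov = { .iov_base = (void *)name, .iov_len = strlen(name) + 1 };
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = ctrl.buf,
        .msg_controllen = sizeof(ctrl.buf),
    };

    memset(ctrl.buf, 0, sizeof(ctrl.buf));
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Receive one fd and its name (name_len >= 2).  Returns the new fd or -1. */
static int recv_named_fd(int sock, char *name, size_t name_len)
{
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = name, .iov_len = name_len - 1 };
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = ctrl.buf,
        .msg_controllen = sizeof(ctrl.buf),
    };

    ssize_t n = recvmsg(sock, &msg, 0);
    if (n <= 0 || (msg.msg_flags & MSG_CTRUNC)) {
        return -1;
    }
    name[n] = '\0';                    /* guarantee termination */

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;
}
```

The received descriptor usually has a different number than the sender's, but it refers to the same open file, so anything the kernel associates with that open file is unaffected by the transfer.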
>>
>> For device descriptors, new QEMU reuses the descriptor when creating the
>> device, rather than opening it again.  For memfd descriptors, new QEMU
>> mmap's the preserved memfd when a ramblock is created.
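The reuse rule can be sketched as a lookup by name before falling back to creating a new descriptor. The table and the functions find_preserved_fd(), save_preserved_fd(), and get_ram_fd() are hypothetical simplifications, not QEMU's CPR state API; in this sketch the table would be populated from the descriptors received over the CPR channel.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PRESERVED 16

/* Hypothetical table of named descriptors (e.g. received from old QEMU). */
struct preserved_fd {
    char name[64];
    int fd;
};
static struct preserved_fd preserved[MAX_PRESERVED];
static int n_preserved;

static int find_preserved_fd(const char *name)
{
    for (int i = 0; i < n_preserved; i++) {
        if (strcmp(preserved[i].name, name) == 0) {
            return preserved[i].fd;
        }
    }
    return -1;
}

static int save_preserved_fd(const char *name, int fd)
{
    if (n_preserved == MAX_PRESERVED ||
        strlen(name) >= sizeof(preserved[0].name)) {
        return -1;
    }
    strcpy(preserved[n_preserved].name, name);
    preserved[n_preserved].fd = fd;
    n_preserved++;
    return 0;
}

/*
 * Ramblock creation: reuse a preserved memfd if one exists under this
 * name (its contents are live guest RAM); otherwise create and record a
 * fresh one.  *reused tells the caller which case applied.
 */
static int get_ram_fd(const char *name, size_t size, int *reused)
{
    int fd = find_preserved_fd(name);
    if (fd >= 0) {
        *reused = 1;
        return fd;
    }
    *reused = 0;
    fd = memfd_create(name, 0);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)size) < 0 || save_preserved_fd(name, fd) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

In this sketch, freshly created descriptors are recorded too, on the assumption that they must be available if this QEMU is later updated in turn.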
>>
>> CPR state cannot be sent over the normal migration channel, because devices
>> and backends are created prior to reading the channel, so this mode sends
>> CPR state over a second "cpr" migration channel.  New QEMU reads the second
>> channel prior to creating devices or backends.
>>
>> Example:
>>
>> In this example, we simply restart the same version of QEMU, but in
>> a real scenario one would use a new QEMU binary path in terminal 2.
>>
>>    Terminal 1: start old QEMU
>>    # qemu-kvm -qmp stdio -object \
>>      memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>>      -m 4G -machine aux-ram-share=on ...
>>
>>    Terminal 2: start new QEMU
>>    # qemu-kvm -monitor stdio ... -incoming tcp:0:44444 \
>>      -incoming '{"channel-type": "cpr",
>>                  "addr": { "transport": "socket", "type": "unix",
>>                            "path": "cpr.sock"}}'
>>
>>    Terminal 1:
>>    {"execute":"qmp_capabilities"}
>>
>>    {"execute": "query-status"}
>>    {"return": {"status": "running",
>>                "running": true}}
>>
>>    {"execute":"migrate-set-parameters",
>>     "arguments":{"mode":"cpr-transfer"}}
>>
>>    {"execute": "migrate", "arguments": { "channels": [
>>      {"channel-type": "main",
>>       "addr": { "transport": "socket", "type": "inet",
>>                 "host": "0", "port": "44444" }},
>>      {"channel-type": "cpr",
>>       "addr": { "transport": "socket", "type": "unix",
>>                 "path": "cpr.sock" }}]}}
>>
>>    {"execute": "query-status"}
>>    {"return": {"status": "postmigrate",
>>                "running": false}}
>>
>>    Terminal 2:
>>    QEMU 10.0.50 monitor - type 'help' for more information
>>    (qemu) info status
>>    VM status: running
>>
>> This patch series implements a minimal version of cpr-transfer.  Additional
>> series are ready to be posted to deliver the complete vision described
>> above, including
>>    * vfio
>>    * chardev
>>    * vhost and tap
>>    * blockers
>>    * cpr-exec mode
>>    * iommufd
>>
>> Changes in V2:
>>    * cpr-transfer is the first new mode proposed, and cpr-exec is deferred
>>    * anon-alloc does not apply to memory-backend-object
>>    * replaced hack with proper synchronization between source and target
>>    * defined QEMU_CPR_FILE_MAGIC
>>    * addressed misc review comments
>>
>> Changes in V3:
>>    * added cpr-transfer to migration-test
>>    * documented cpr-transfer in CPR.rst
>>    * fix size_t trace format for 32-bit build
>>    * drop explicit fd value in VMSTATE_FD
>>    * defer cpr_walk_fd() and cpr_resave_fd() to later series
>>    * drop "migration: save cpr mode".
>>      delete mode from cpr state, and use cpr_uri to infer transfer mode.
>>    * drop "migration: stop vm earlier for cpr"
>>    * dropped cpr helpers, to be re-added later when needed
>>    * fixed an unreported bug for cpr-transfer and migrate cancel
>>    * documented cpr-transfer restrictions in qapi
>>    * added trace for cpr_state_save and cpr_state_load
>>    * added ftruncate to "preserve ram blocks"
>>
>> Changes in V4:
>>    * cleaned up qtest deferred connection code
>>    * renamed pass_fd -> can_pass_fd
>>    * squashed patch "split qmp_migrate"
>>    * deleted cpr-uri and its patches
>>    * added cpr channel and its patches
>>    * added patch "hostmem-shm: preserve for cpr"
>>    * added patch "fd-based shared memory"
>>    * added patch "factor out allocation of anonymous shared memory"
>>    * added RAM_PRIVATE and its patch
>>    * added aux-ram-share and its patch
>>
>> Changes in V5:
>>    * added patch 'enhance migrate_uri_parse'
>>    * supported dotted keys for -incoming channel,
>>      and rewrote incoming_option_parse
>>    * moved migrate_fd_cancel -> vm_resume to "stop vm earlier for cpr"
>>      in a future series.
>>    * updated command-line definition for aux-ram-share
>>    * added patch "resizable qemu_ram_alloc_from_fd"
>>    * rewrote patch "fd-based shared memory"
>>    * fixed error message in qemu_shm_alloc
>>    * added patch 'tests/qtest: optimize migrate_set_ports'
>>    * added patch 'tests/qtest: enhance migration channels'
>>    * added patch 'tests/qtest: assert qmp_ready'
>>    * modified patch 'migration-test: cpr-transfer'
>>    * polished the documentation in CPR.rst, qapi, and the
>>      cpr-transfer mode commit message
>>    * updated to master, and resolved massive context diffs for migration tests
>>
>> Changes in V6:
>>    * added RB's and Acks.
>>    * in patch "assert qmp_ready", deleted qmp_ready and checked qmp_fd instead.
>>      renamed patch to "assert qmp connected"
>>    * factored out fix into new patch
>>      "fix qemu_ram_alloc_from_fd size calculation"
>>    * deleted a redundant call to migrate_hup_delete
>>    * added commit message to "migration: cpr-transfer documentation"
>>    * polished the text of cpr-transfer mode in qapi
>>
>> Changes in V7:
>>    * fixed cpr-transfer test failure for s390
>>    * fixed machine_get_aux_ram_share compilation error for Windows
>>    * fixed size_t print format compilation error for misc architectures
>>    * fixed memory leaks in cpr_transfer_output, cpr_transfer_input, and
>>      qemu_file_get_fd
>>
>> The first 10 patches below are foundational and are needed for both cpr-transfer
>> mode and the proposed cpr-exec mode.  The next 6 patches are specific to
>> cpr-transfer and implement the mechanisms for sharing state across a socket
>> using SCM_RIGHTS.  The last 8 patches supply tests and documentation.
>>
>> Steve Sistare (24):
>>    backends/hostmem-shm: factor out allocation of "anonymous shared
>>      memory with an fd"
>>    physmem: fix qemu_ram_alloc_from_fd size calculation
>>    physmem: qemu_ram_alloc_from_fd extensions
>>    physmem: fd-based shared memory
>>    memory: add RAM_PRIVATE
>>    machine: aux-ram-share option
>>    migration: cpr-state
>>    physmem: preserve ram blocks for cpr
>>    hostmem-memfd: preserve for cpr
>>    hostmem-shm: preserve for cpr
>>    migration: enhance migrate_uri_parse
>>    migration: incoming channel
>>    migration: SCM_RIGHTS for QEMUFile
>>    migration: VMSTATE_FD
>>    migration: cpr-transfer save and load
>>    migration: cpr-transfer mode
>>    migration-test: memory_backend
>>    tests/qtest: optimize migrate_set_ports
>>    tests/qtest: defer connection
>>    migration-test: defer connection
>>    tests/qtest: enhance migration channels
>>    tests/qtest: assert qmp connected
>>    migration-test: cpr-transfer
>>    migration: cpr-transfer documentation
>>
>>   backends/hostmem-epc.c                 |   2 +-
>>   backends/hostmem-file.c                |   2 +-
>>   backends/hostmem-memfd.c               |  14 ++-
>>   backends/hostmem-ram.c                 |   2 +-
>>   backends/hostmem-shm.c                 |  51 ++------
>>   docs/devel/migration/CPR.rst           | 182 ++++++++++++++++++++++++++-
>>   hw/core/machine.c                      |  22 ++++
>>   include/exec/memory.h                  |  10 ++
>>   include/exec/ram_addr.h                |  13 +-
>>   include/hw/boards.h                    |   1 +
>>   include/migration/cpr.h                |  33 +++++
>>   include/migration/misc.h               |   7 ++
>>   include/migration/vmstate.h            |   9 ++
>>   include/qemu/osdep.h                   |   1 +
>>   meson.build                            |   8 +-
>>   migration/cpr-transfer.c               |  71 +++++++++++
>>   migration/cpr.c                        | 224 +++++++++++++++++++++++++++++++++
>>   migration/meson.build                  |   2 +
>>   migration/migration.c                  | 139 +++++++++++++++++++-
>>   migration/migration.h                  |   4 +-
>>   migration/options.c                    |   8 +-
>>   migration/qemu-file.c                  |  84 ++++++++++++-
>>   migration/qemu-file.h                  |   2 +
>>   migration/ram.c                        |   2 +
>>   migration/trace-events                 |  11 ++
>>   migration/vmstate-types.c              |  24 ++++
>>   qapi/migration.json                    |  44 ++++++-
>>   qemu-options.hx                        |  34 +++++
>>   stubs/vmstate.c                        |   7 ++
>>   system/memory.c                        |   4 +-
>>   system/physmem.c                       | 150 ++++++++++++++++++----
>>   system/trace-events                    |   1 +
>>   system/vl.c                            |  43 ++++++-
>>   tests/qtest/libqtest.c                 |  86 ++++++++-----
>>   tests/qtest/libqtest.h                 |  19 ++-
>>   tests/qtest/migration/cpr-tests.c      |  62 +++++++++
>>   tests/qtest/migration/framework.c      |  74 +++++++++--
>>   tests/qtest/migration/framework.h      |  11 ++
>>   tests/qtest/migration/migration-qmp.c  |  53 ++++++--
>>   tests/qtest/migration/migration-qmp.h  |  10 +-
>>   tests/qtest/migration/migration-util.c |  23 ++--
>>   tests/qtest/migration/misc-tests.c     |   9 +-
>>   tests/qtest/migration/precopy-tests.c  |   6 +-
>>   tests/qtest/virtio-net-failover.c      |   8 +-
>>   util/memfd.c                           |  16 ++-
>>   util/oslib-posix.c                     |  52 ++++++++
>>   util/oslib-win32.c                     |   6 +
>>   47 files changed, 1472 insertions(+), 174 deletions(-)
>>   create mode 100644 include/migration/cpr.h
>>   create mode 100644 migration/cpr-transfer.c
>>   create mode 100644 migration/cpr.c
>>
>> base-commit: e8aa7fdcddfc8589bdc7c973a052e76e8f999455
> 
> I'd like to merge this series by the end of the week if possible. Please
> take a look at some comments from Markus that were left behind in v5.

We discussed, and Markus agrees none are showstoppers.

- Steve




Thread overview: 44+ messages
2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
2025-01-15 19:00 ` [PATCH V7 01/24] backends/hostmem-shm: factor out allocation of "anonymous shared memory with an fd" Steve Sistare
2025-01-15 19:00 ` [PATCH V7 02/24] physmem: fix qemu_ram_alloc_from_fd size calculation Steve Sistare
2025-01-15 19:00 ` [PATCH V7 03/24] physmem: qemu_ram_alloc_from_fd extensions Steve Sistare
2025-01-15 19:00 ` [PATCH V7 04/24] physmem: fd-based shared memory Steve Sistare
2025-01-15 19:00 ` [PATCH V7 05/24] memory: add RAM_PRIVATE Steve Sistare
2025-01-15 19:00 ` [PATCH V7 06/24] machine: aux-ram-share option Steve Sistare
2025-01-15 19:00 ` [PATCH V7 07/24] migration: cpr-state Steve Sistare
2025-01-15 19:00 ` [PATCH V7 08/24] physmem: preserve ram blocks for cpr Steve Sistare
2025-01-15 19:00 ` [PATCH V7 09/24] hostmem-memfd: preserve " Steve Sistare
2025-01-15 19:00 ` [PATCH V7 10/24] hostmem-shm: " Steve Sistare
2025-01-15 19:00 ` [PATCH V7 11/24] migration: enhance migrate_uri_parse Steve Sistare
2025-01-15 19:00 ` [PATCH V7 12/24] migration: incoming channel Steve Sistare
2025-01-15 19:00 ` [PATCH V7 13/24] migration: SCM_RIGHTS for QEMUFile Steve Sistare
2025-01-15 19:00 ` [PATCH V7 14/24] migration: VMSTATE_FD Steve Sistare
2025-01-15 19:00 ` [PATCH V7 15/24] migration: cpr-transfer save and load Steve Sistare
2025-01-15 19:00 ` [PATCH V7 16/24] migration: cpr-transfer mode Steve Sistare
2025-01-29  6:23   ` Markus Armbruster
2025-01-15 19:00 ` [PATCH V7 17/24] migration-test: memory_backend Steve Sistare
2025-01-15 19:00 ` [PATCH V7 18/24] tests/qtest: optimize migrate_set_ports Steve Sistare
2025-01-15 19:00 ` [PATCH V7 19/24] tests/qtest: defer connection Steve Sistare
2025-01-15 19:00 ` [PATCH V7 20/24] migration-test: " Steve Sistare
2025-01-15 19:00 ` [PATCH V7 21/24] tests/qtest: enhance migration channels Steve Sistare
2025-01-15 19:00 ` [PATCH V7 22/24] tests/qtest: assert qmp connected Steve Sistare
2025-01-15 19:00 ` [PATCH V7 23/24] migration-test: cpr-transfer Steve Sistare
2025-01-16 19:06   ` Fabiano Rosas
2025-01-16 19:37     ` Steven Sistare
2025-01-16 20:02       ` Fabiano Rosas
2025-01-16 20:15         ` Steven Sistare
2025-01-15 19:00 ` [PATCH V7 24/24] migration: cpr-transfer documentation Steve Sistare
2025-01-17 14:42   ` Fabiano Rosas
2025-01-17 15:04     ` Steven Sistare
2025-01-17 15:29       ` Fabiano Rosas
2025-01-17 16:58         ` Steven Sistare
2025-01-17 19:06           ` Fabiano Rosas
2025-01-17 19:32             ` Steven Sistare
2025-01-17 20:04               ` Fabiano Rosas
2025-01-27 15:39 ` [PATCH V7 00/24] Live update: cpr-transfer Fabiano Rosas
2025-01-28 21:20   ` Steven Sistare [this message]
2025-01-29  6:24     ` Markus Armbruster
2025-04-09 16:22 ` Vladimir Sementsov-Ogievskiy
2025-04-09 17:48   ` Steven Sistare
2025-04-09 18:06     ` Vladimir Sementsov-Ogievskiy
2025-04-09 17:50   ` Vladimir Sementsov-Ogievskiy
