From: Steven Sistare <steven.sistare@oracle.com>
To: Fabiano Rosas <farosas@suse.de>, qemu-devel@nongnu.org
Cc: Peter Xu <peterx@redhat.com>,
David Hildenbrand <david@redhat.com>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Eduardo Habkost <eduardo@habkost.net>,
Philippe Mathieu-Daude <philmd@linaro.org>,
Paolo Bonzini <pbonzini@redhat.com>,
"Daniel P. Berrange" <berrange@redhat.com>,
Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH V7 00/24] Live update: cpr-transfer
Date: Tue, 28 Jan 2025 16:20:39 -0500
Message-ID: <a7af45f7-cd65-497a-9b20-eae6a0dab361@oracle.com>
In-Reply-To: <87y0ywqna1.fsf@suse.de>
On 1/27/2025 10:39 AM, Fabiano Rosas wrote:
> Steve Sistare <steven.sistare@oracle.com> writes:
>
>> What?
>>
>> This patch series adds the live migration cpr-transfer mode, which
>> allows the user to transfer a guest to a new QEMU instance on the same
>> host with minimal guest pause time, by preserving guest RAM in place,
>> albeit with new virtual addresses in new QEMU, and by preserving device
>> file descriptors.
>>
>> The new user-visible interfaces are:
>> * cpr-transfer (MigMode migration parameter)
>> * cpr (MigrationChannelType)
>> * incoming MigrationChannel (command-line argument)
>> * aux-ram-share (machine option)
>>
>> The user sets the mode parameter before invoking the migrate command.
>> In this mode, the user starts new QEMU on the same host as old QEMU, with
>> the same arguments as old QEMU, plus two -incoming options: one for the main
>> channel, and one for the CPR channel. The user issues the migrate command to
>> old QEMU, which stops the VM, saves state to the migration channels, and
>> enters the postmigrate state. Execution resumes in new QEMU.
>>
>> Memory-backend objects must have the share=on attribute, but memory-backend-epc
>> is not supported. The VM must be started with the '-machine aux-ram-share=on'
>> option, which allows auxiliary guest memory to be transferred in place to the
>> new process.
>>
>> This mode requires a second migration channel of type "cpr", in the channel
>> arguments on the outgoing side, and in a second -incoming command-line
>> parameter on the incoming side. This CPR channel must support file descriptor
>> transfer with SCM_RIGHTS, i.e. it must be a UNIX domain socket.
>>
>> Why?
>>
>> This mode has less impact on the guest than any other method of updating
>> in place. The pause time is much lower, because devices need not be torn
>> down and recreated, DMA does not need to be drained and quiesced, and minimal
>> state is copied to new QEMU. Further, there are no constraints on the guest.
>> By contrast, cpr-reboot mode requires the guest to support S3 suspend-to-ram,
>> and suspending plus resuming vfio devices adds multiple seconds to the
>> guest pause time.
>>
>> These benefits all derive from the core design principle of this mode,
>> which is preserving open descriptors. This approach is very general and
>> can be used to support a wide variety of devices that do not have hardware
>> support for live migration, including but not limited to: vfio, chardev,
>> vhost, vdpa, and iommufd. Some devices need new kernel software interfaces
>> to allow a descriptor to be used in a process that did not originally open it.
>>
>> How?
>>
>> All memory that is mapped by the guest is preserved in place. Indeed,
>> it must be, because it may be the target of DMA requests, which are not
>> quiesced during cpr-transfer. All such memory must be mmap'able in new QEMU.
>> This is easy for named memory-backend objects, as long as they are mapped
>> shared, because they are visible in the file system in both old and new QEMU.
>> Anonymous memory must be allocated using memfd_create rather than MAP_ANON,
>> so the memfds can be sent to new QEMU. Pages that were locked in memory
>> for DMA in old QEMU remain locked in new QEMU, because the descriptor of
>> the device that locked them remains open.
>>
>> cpr-transfer preserves descriptors by sending them to new QEMU via the CPR
>> channel, which must support SCM_RIGHTS, and by sending the unique name of
>> each descriptor to new QEMU via CPR state.
>>
>> For device descriptors, new QEMU reuses the descriptor when creating the
>> device, rather than opening it again. For memfd descriptors, new QEMU
>> mmap's the preserved memfd when a ramblock is created.
>>
>> CPR state cannot be sent over the normal migration channel, because devices
>> and backends are created prior to reading the channel, so this mode sends
>> CPR state over a second "cpr" migration channel. New QEMU reads the second
>> channel prior to creating devices or backends.
>>
>> Example:
>>
>> In this example, we simply restart the same version of QEMU, but in
>> a real scenario one would use a new QEMU binary path in terminal 2.
>>
>> Terminal 1: start old QEMU
>> # qemu-kvm -qmp stdio -object \
>>     memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>>     -m 4G -machine aux-ram-share=on ...
>>
>> Terminal 2: start new QEMU
>> # qemu-kvm -monitor stdio ... -incoming tcp:0:44444 \
>>     -incoming '{"channel-type": "cpr",
>>                 "addr": { "transport": "socket", "type": "unix",
>>                           "path": "cpr.sock"}}'
>>
>> Terminal 1:
>> {"execute":"qmp_capabilities"}
>>
>> {"execute": "query-status"}
>> {"return": {"status": "running",
>> "running": true}}
>>
>> {"execute":"migrate-set-parameters",
>> "arguments":{"mode":"cpr-transfer"}}
>>
>> {"execute": "migrate", "arguments": { "channels": [
>> {"channel-type": "main",
>> "addr": { "transport": "socket", "type": "inet",
>> "host": "0", "port": "44444" }},
>> {"channel-type": "cpr",
>> "addr": { "transport": "socket", "type": "unix",
>> "path": "cpr.sock" }}]}}
>>
>> {"execute": "query-status"}
>> {"return": {"status": "postmigrate",
>> "running": false}}
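
For scripting the Terminal 1 sequence, the two QMP commands can be generated rather than typed. The sketch below only builds the JSON text shown above; the function name and its parameters are hypothetical, and sending the lines to a QMP monitor is left out.

```python
import json


def cpr_transfer_commands(host, port, cpr_path):
    """Build the QMP commands that select cpr-transfer mode and start migration."""
    set_mode = {
        "execute": "migrate-set-parameters",
        "arguments": {"mode": "cpr-transfer"},
    }
    migrate = {
        "execute": "migrate",
        "arguments": {
            "channels": [
                # Main channel: carries the normal migration stream.
                {
                    "channel-type": "main",
                    "addr": {
                        "transport": "socket",
                        "type": "inet",
                        "host": host,
                        "port": str(port),
                    },
                },
                # CPR channel: a UNIX socket, so descriptors can be passed.
                {
                    "channel-type": "cpr",
                    "addr": {
                        "transport": "socket",
                        "type": "unix",
                        "path": cpr_path,
                    },
                },
            ]
        },
    }
    return [json.dumps(set_mode), json.dumps(migrate)]


commands = cpr_transfer_commands("0", 44444, "cpr.sock")
for line in commands:
    print(line)
```

The mode must be set before the migrate command is issued, so the list is ordered accordingly.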
>>
>> Terminal 2:
>> QEMU 10.0.50 monitor - type 'help' for more information
>> (qemu) info status
>> VM status: running
>>
>> This patch series implements a minimal version of cpr-transfer. Additional
>> series are ready to be posted to deliver the complete vision described
>> above, including
>> * vfio
>> * chardev
>> * vhost and tap
>> * blockers
>> * cpr-exec mode
>> * iommufd
>>
>> Changes in V2:
>> * cpr-transfer is the first new mode proposed, and cpr-exec is deferred
>> * anon-alloc does not apply to memory-backend-object
>> * replaced hack with proper synchronization between source and target
>> * defined QEMU_CPR_FILE_MAGIC
>> * addressed misc review comments
>>
>> Changes in V3:
>> * added cpr-transfer to migration-test
>> * documented cpr-transfer in CPR.rst
>> * fixed size_t trace format for 32-bit build
>> * dropped explicit fd value in VMSTATE_FD
>> * deferred cpr_walk_fd() and cpr_resave_fd() to later series
>> * dropped "migration: save cpr mode".
>>   deleted mode from cpr state, and use cpr_uri to infer transfer mode.
>> * dropped "migration: stop vm earlier for cpr"
>> * dropped cpr helpers, to be re-added later when needed
>> * fixed an unreported bug for cpr-transfer and migrate cancel
>> * documented cpr-transfer restrictions in qapi
>> * added trace for cpr_state_save and cpr_state_load
>> * added ftruncate to "preserve ram blocks"
>>
>> Changes in V4:
>> * cleaned up qtest deferred connection code
>> * renamed pass_fd -> can_pass_fd
>> * squashed patch "split qmp_migrate"
>> * deleted cpr-uri and its patches
>> * added cpr channel and its patches
>> * added patch "hostmem-shm: preserve for cpr"
>> * added patch "fd-based shared memory"
>> * added patch "factor out allocation of anonymous shared memory"
>> * added RAM_PRIVATE and its patch
>> * added aux-ram-share and its patch
>>
>> Changes in V5:
>> * added patch 'enhance migrate_uri_parse'
>> * supported dotted keys for -incoming channel,
>> and rewrote incoming_option_parse
>> * moved migrate_fd_cancel -> vm_resume to "stop vm earlier for cpr"
>> in a future series.
>> * updated command-line definition for aux-ram-share
>> * added patch "resizable qemu_ram_alloc_from_fd"
>> * rewrote patch "fd-based shared memory"
>> * fixed error message in qemu_shm_alloc
>> * added patch 'tests/qtest: optimize migrate_set_ports'
>> * added patch 'tests/qtest: enhance migration channels'
>> * added patch 'tests/qtest: assert qmp_ready'
>> * modified patch 'migration-test: cpr-transfer'
>> * polished the documentation in CPR.rst, qapi, and the
>> cpr-transfer mode commit message
>> * updated to master, and resolved massive context diffs for migration tests
>>
>> Changes in V6:
>> * added RB's and Acks.
>> * in patch "assert qmp_ready", deleted qmp_ready and checked qmp_fd instead.
>> renamed patch to "assert qmp connected"
>> * factored out fix into new patch
>> "fix qemu_ram_alloc_from_fd size calculation"
>> * deleted a redundant call to migrate_hup_delete
>> * added commit message to "migration: cpr-transfer documentation"
>> * polished the text of cpr-transfer mode in qapi
>>
>> Changes in V7:
>> * fixed cpr-transfer test failure for s390
>> * fixed machine_get_aux_ram_share compilation error for Windows
>> * fixed size_t print format compilation error for misc architectures
>> * fixed memory leaks in cpr_transfer_output, cpr_transfer_input, and
>> qemu_file_get_fd
>>
>> The first 10 patches below are foundational and are needed for both cpr-transfer
>> mode and the proposed cpr-exec mode. The next 6 patches are specific to
>> cpr-transfer and implement the mechanisms for sharing state across a socket
>> using SCM_RIGHTS. The last 8 patches supply tests and documentation.
>>
>> Steve Sistare (24):
>> backends/hostmem-shm: factor out allocation of "anonymous shared
>> memory with an fd"
>> physmem: fix qemu_ram_alloc_from_fd size calculation
>> physmem: qemu_ram_alloc_from_fd extensions
>> physmem: fd-based shared memory
>> memory: add RAM_PRIVATE
>> machine: aux-ram-share option
>> migration: cpr-state
>> physmem: preserve ram blocks for cpr
>> hostmem-memfd: preserve for cpr
>> hostmem-shm: preserve for cpr
>> migration: enhance migrate_uri_parse
>> migration: incoming channel
>> migration: SCM_RIGHTS for QEMUFile
>> migration: VMSTATE_FD
>> migration: cpr-transfer save and load
>> migration: cpr-transfer mode
>> migration-test: memory_backend
>> tests/qtest: optimize migrate_set_ports
>> tests/qtest: defer connection
>> migration-test: defer connection
>> tests/qtest: enhance migration channels
>> tests/qtest: assert qmp connected
>> migration-test: cpr-transfer
>> migration: cpr-transfer documentation
>>
>> backends/hostmem-epc.c | 2 +-
>> backends/hostmem-file.c | 2 +-
>> backends/hostmem-memfd.c | 14 ++-
>> backends/hostmem-ram.c | 2 +-
>> backends/hostmem-shm.c | 51 ++------
>> docs/devel/migration/CPR.rst | 182 ++++++++++++++++++++++++++-
>> hw/core/machine.c | 22 ++++
>> include/exec/memory.h | 10 ++
>> include/exec/ram_addr.h | 13 +-
>> include/hw/boards.h | 1 +
>> include/migration/cpr.h | 33 +++++
>> include/migration/misc.h | 7 ++
>> include/migration/vmstate.h | 9 ++
>> include/qemu/osdep.h | 1 +
>> meson.build | 8 +-
>> migration/cpr-transfer.c | 71 +++++++++++
>> migration/cpr.c | 224 +++++++++++++++++++++++++++++++++
>> migration/meson.build | 2 +
>> migration/migration.c | 139 +++++++++++++++++++-
>> migration/migration.h | 4 +-
>> migration/options.c | 8 +-
>> migration/qemu-file.c | 84 ++++++++++++-
>> migration/qemu-file.h | 2 +
>> migration/ram.c | 2 +
>> migration/trace-events | 11 ++
>> migration/vmstate-types.c | 24 ++++
>> qapi/migration.json | 44 ++++++-
>> qemu-options.hx | 34 +++++
>> stubs/vmstate.c | 7 ++
>> system/memory.c | 4 +-
>> system/physmem.c | 150 ++++++++++++++++++----
>> system/trace-events | 1 +
>> system/vl.c | 43 ++++++-
>> tests/qtest/libqtest.c | 86 ++++++++-----
>> tests/qtest/libqtest.h | 19 ++-
>> tests/qtest/migration/cpr-tests.c | 62 +++++++++
>> tests/qtest/migration/framework.c | 74 +++++++++--
>> tests/qtest/migration/framework.h | 11 ++
>> tests/qtest/migration/migration-qmp.c | 53 ++++++--
>> tests/qtest/migration/migration-qmp.h | 10 +-
>> tests/qtest/migration/migration-util.c | 23 ++--
>> tests/qtest/migration/misc-tests.c | 9 +-
>> tests/qtest/migration/precopy-tests.c | 6 +-
>> tests/qtest/virtio-net-failover.c | 8 +-
>> util/memfd.c | 16 ++-
>> util/oslib-posix.c | 52 ++++++++
>> util/oslib-win32.c | 6 +
>> 47 files changed, 1472 insertions(+), 174 deletions(-)
>> create mode 100644 include/migration/cpr.h
>> create mode 100644 migration/cpr-transfer.c
>> create mode 100644 migration/cpr.c
>>
>> base-commit: e8aa7fdcddfc8589bdc7c973a052e76e8f999455
>
> I'd like to merge this series by the end of the week if possible. Please
> take a look at some comments from Markus that were left behind in v5.
We discussed them, and Markus agrees that none are show stoppers.
- Steve
Thread overview: 44+ messages
2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
2025-01-15 19:00 ` [PATCH V7 01/24] backends/hostmem-shm: factor out allocation of "anonymous shared memory with an fd" Steve Sistare
2025-01-15 19:00 ` [PATCH V7 02/24] physmem: fix qemu_ram_alloc_from_fd size calculation Steve Sistare
2025-01-15 19:00 ` [PATCH V7 03/24] physmem: qemu_ram_alloc_from_fd extensions Steve Sistare
2025-01-15 19:00 ` [PATCH V7 04/24] physmem: fd-based shared memory Steve Sistare
2025-01-15 19:00 ` [PATCH V7 05/24] memory: add RAM_PRIVATE Steve Sistare
2025-01-15 19:00 ` [PATCH V7 06/24] machine: aux-ram-share option Steve Sistare
2025-01-15 19:00 ` [PATCH V7 07/24] migration: cpr-state Steve Sistare
2025-01-15 19:00 ` [PATCH V7 08/24] physmem: preserve ram blocks for cpr Steve Sistare
2025-01-15 19:00 ` [PATCH V7 09/24] hostmem-memfd: preserve " Steve Sistare
2025-01-15 19:00 ` [PATCH V7 10/24] hostmem-shm: " Steve Sistare
2025-01-15 19:00 ` [PATCH V7 11/24] migration: enhance migrate_uri_parse Steve Sistare
2025-01-15 19:00 ` [PATCH V7 12/24] migration: incoming channel Steve Sistare
2025-01-15 19:00 ` [PATCH V7 13/24] migration: SCM_RIGHTS for QEMUFile Steve Sistare
2025-01-15 19:00 ` [PATCH V7 14/24] migration: VMSTATE_FD Steve Sistare
2025-01-15 19:00 ` [PATCH V7 15/24] migration: cpr-transfer save and load Steve Sistare
2025-01-15 19:00 ` [PATCH V7 16/24] migration: cpr-transfer mode Steve Sistare
2025-01-29 6:23 ` Markus Armbruster
2025-01-15 19:00 ` [PATCH V7 17/24] migration-test: memory_backend Steve Sistare
2025-01-15 19:00 ` [PATCH V7 18/24] tests/qtest: optimize migrate_set_ports Steve Sistare
2025-01-15 19:00 ` [PATCH V7 19/24] tests/qtest: defer connection Steve Sistare
2025-01-15 19:00 ` [PATCH V7 20/24] migration-test: " Steve Sistare
2025-01-15 19:00 ` [PATCH V7 21/24] tests/qtest: enhance migration channels Steve Sistare
2025-01-15 19:00 ` [PATCH V7 22/24] tests/qtest: assert qmp connected Steve Sistare
2025-01-15 19:00 ` [PATCH V7 23/24] migration-test: cpr-transfer Steve Sistare
2025-01-16 19:06 ` Fabiano Rosas
2025-01-16 19:37 ` Steven Sistare
2025-01-16 20:02 ` Fabiano Rosas
2025-01-16 20:15 ` Steven Sistare
2025-01-15 19:00 ` [PATCH V7 24/24] migration: cpr-transfer documentation Steve Sistare
2025-01-17 14:42 ` Fabiano Rosas
2025-01-17 15:04 ` Steven Sistare
2025-01-17 15:29 ` Fabiano Rosas
2025-01-17 16:58 ` Steven Sistare
2025-01-17 19:06 ` Fabiano Rosas
2025-01-17 19:32 ` Steven Sistare
2025-01-17 20:04 ` Fabiano Rosas
2025-01-27 15:39 ` [PATCH V7 00/24] Live update: cpr-transfer Fabiano Rosas
2025-01-28 21:20 ` Steven Sistare [this message]
2025-01-29 6:24 ` Markus Armbruster
2025-04-09 16:22 ` Vladimir Sementsov-Ogievskiy
2025-04-09 17:48 ` Steven Sistare
2025-04-09 18:06 ` Vladimir Sementsov-Ogievskiy
2025-04-09 17:50 ` Vladimir Sementsov-Ogievskiy