qemu-devel.nongnu.org archive mirror
* [PATCH V7 00/24] Live update: cpr-transfer
@ 2025-01-15 19:00 Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 01/24] backends/hostmem-shm: factor out allocation of "anonymous shared memory with an fd" Steve Sistare
                   ` (25 more replies)
  0 siblings, 26 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

What?

This patch series adds the live migration cpr-transfer mode, which
allows the user to transfer a guest to a new QEMU instance on the same
host with minimal guest pause time, by preserving guest RAM in place,
albeit with new virtual addresses in new QEMU, and by preserving device
file descriptors.

The new user-visible interfaces are:
  * cpr-transfer (MigMode migration parameter)
  * cpr (MigrationChannelType)
  * incoming MigrationChannel (command-line argument)
  * aux-ram-share (machine option)

The user sets the mode parameter before invoking the migrate command.
In this mode, the user starts new QEMU on the same host as old QEMU, with
the same arguments as old QEMU, plus two -incoming options: one for the main
channel, and one for the CPR channel.  The user issues the migrate command to
old QEMU, which stops the VM, saves state to the migration channels, and
enters the postmigrate state.  Execution resumes in new QEMU.

Memory-backend objects must have the share=on attribute, but memory-backend-epc
is not supported.  The VM must be started with the '-machine aux-ram-share=on'
option, which allows auxiliary guest memory to be transferred in place to the
new process.

This mode requires a second migration channel of type "cpr", in the channel
arguments on the outgoing side, and in a second -incoming command-line
parameter on the incoming side.  This CPR channel must support file descriptor
transfer with SCM_RIGHTS, i.e. it must be a UNIX domain socket.
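As a rough illustration of the UNIX-domain-socket requirement, the sketch below checks whether an already-open socket could carry descriptors with SCM_RIGHTS by asking the kernel for its address family. The helper name channel_can_pass_fd is hypothetical and is not QEMU's implementation; the sketch assumes a POSIX system.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not QEMU's API): report whether an open socket can
 * carry file descriptors via SCM_RIGHTS, i.e. whether it is AF_UNIX.
 */
static bool channel_can_pass_fd(int sock)
{
    struct sockaddr_storage addr;
    socklen_t len = sizeof(addr);

    /* getsockname() fails with ENOTSOCK if sock is not a socket at all. */
    if (getsockname(sock, (struct sockaddr *)&addr, &len) < 0) {
        return false;
    }
    return addr.ss_family == AF_UNIX;
}
```

A TCP or other inet socket reports AF_INET or AF_INET6 here and is therefore rejected as a CPR channel.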

Why?

This mode has less impact on the guest than any other method of updating
in place.  The pause time is much lower, because devices need not be torn
down and recreated, DMA does not need to be drained and quiesced, and minimal
state is copied to new QEMU.  Further, there are no constraints on the guest.
By contrast, cpr-reboot mode requires the guest to support S3 suspend-to-ram,
and suspending plus resuming vfio devices adds multiple seconds to the
guest pause time.

These benefits all derive from the core design principle of this mode,
which is preserving open descriptors.  This approach is very general and
can be used to support a wide variety of devices that do not have hardware
support for live migration, including but not limited to: vfio, chardev,
vhost, vdpa, and iommufd.  Some devices need new kernel software interfaces
to allow a descriptor to be used in a process that did not originally open it.

How?

All memory that is mapped by the guest is preserved in place.  Indeed,
it must be, because it may be the target of DMA requests, which are not
quiesced during cpr-transfer.  All such memory must be mmap'able in new QEMU.
This is easy for named memory-backend objects, as long as they are mapped
shared, because they are visible in the file system in both old and new QEMU.
Anonymous memory must be allocated using memfd_create rather than MAP_ANON,
so the memfds can be sent to new QEMU.  Pages that were locked in memory
for DMA in old QEMU remain locked in new QEMU, because the descriptor of
the device that locked them remains open.
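A minimal sketch of why memfd_create matters, assuming Linux with glibc 2.27 or later: memory allocated this way is reachable through an fd, so a second mmap of the same fd (standing in for new QEMU after it receives the descriptor) sees the same pages at a different virtual address. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate fd-backed shared memory; returns the fd, or -1 on error. */
static int alloc_shared_ram(size_t size, void **host)
{
    int fd = memfd_create("guest-ram", 0);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    /* MAP_SHARED is essential: a MAP_PRIVATE mapping would diverge on write. */
    *host = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (*host == MAP_FAILED) {
        close(fd);
        return -1;
    }
    return fd;
}

/* What the receiving process does with the preserved fd: map it again. */
static void *remap_shared_ram(int fd, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Memory from MAP_ANON has no fd to hand over, which is why it cannot be preserved this way.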

cpr-transfer preserves descriptors by sending them to new QEMU via the CPR
channel, which must support SCM_RIGHTS, and by sending the unique name of
each descriptor to new QEMU via CPR state.
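For readers unfamiliar with SCM_RIGHTS, here is a generic POSIX sketch of passing one descriptor over a UNIX domain socket. It is not QEMU's QEMUFile code; send_fd and recv_fd are illustrative names. The receiver gets a new descriptor number that refers to the same open file, which is what lets new QEMU keep using a device or memfd without reopening it.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd, with a 1-byte payload (at least one data byte is required). */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;               /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(ctrl.buf, 0, sizeof(ctrl.buf));
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; the kernel installs it under a new descriptor number. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```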

For device descriptors, new QEMU reuses the descriptor when creating the
device, rather than opening it again.  For memfd descriptors, new QEMU
mmap's the preserved memfd when a ramblock is created.
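The "unique name" bookkeeping can be pictured as a small table keyed by (name, id). The sketch below is hypothetical: fdtab_save and fdtab_find are illustrative names, not the functions this series adds, and a fixed-size array stands in for whatever structure CPR state really uses. A lookup miss tells the caller to open or allocate the resource itself, as on a normal start.

```c
#include <assert.h>
#include <string.h>

#define MAX_FDTAB 64

/* Hypothetical record of one preserved descriptor. */
struct fdtab_entry {
    char name[64];
    int id;
    int fd;
};

static struct fdtab_entry fdtab[MAX_FDTAB];
static int fdtab_len;

/* Old process: remember fd under a unique (name, id) before sending it. */
static int fdtab_save(const char *name, int id, int fd)
{
    if (fdtab_len == MAX_FDTAB || strlen(name) >= sizeof(fdtab[0].name)) {
        return -1;
    }
    struct fdtab_entry *e = &fdtab[fdtab_len++];
    strcpy(e->name, name);
    e->id = id;
    e->fd = fd;
    return 0;
}

/* New process: reuse the preserved fd, or return -1 to open it afresh. */
static int fdtab_find(const char *name, int id)
{
    for (int i = 0; i < fdtab_len; i++) {
        if (fdtab[i].id == id && strcmp(fdtab[i].name, name) == 0) {
            return fdtab[i].fd;
        }
    }
    return -1;
}
```

Typical use when creating a RAM block: `int fd = fdtab_find("ram0", 0); if (fd < 0) { /* allocate a new memfd */ }`.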

CPR state cannot be sent over the normal migration channel, because devices
and backends are created prior to reading the channel, so this mode sends
CPR state over a second "cpr" migration channel.  New QEMU reads the second
channel prior to creating devices or backends.

Example:

In this example, we simply restart the same version of QEMU, but in
a real scenario one would use a new QEMU binary path in terminal 2.

  Terminal 1: start old QEMU
  # qemu-kvm -qmp stdio \
      -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
      -m 4G -machine aux-ram-share=on ...

  Terminal 2: start new QEMU
  # qemu-kvm -monitor stdio ... -incoming tcp:0:44444 \
      -incoming '{"channel-type": "cpr",
                  "addr": { "transport": "socket", "type": "unix",
                            "path": "cpr.sock"}}'

  Terminal 1:
  {"execute":"qmp_capabilities"}

  {"execute": "query-status"}
  {"return": {"status": "running",
              "running": true}}

  {"execute":"migrate-set-parameters",
   "arguments":{"mode":"cpr-transfer"}}

  {"execute": "migrate", "arguments": { "channels": [
    {"channel-type": "main",
     "addr": { "transport": "socket", "type": "inet",
               "host": "0", "port": "44444" }},
    {"channel-type": "cpr",
     "addr": { "transport": "socket", "type": "unix",
               "path": "cpr.sock" }}]}}

  {"execute": "query-status"}
  {"return": {"status": "postmigrate",
              "running": false}}

  Terminal 2:
  QEMU 10.0.50 monitor - type 'help' for more information
  (qemu) info status
  VM status: running

This patch series implements a minimal version of cpr-transfer.  Additional
series are ready to be posted to deliver the complete vision described
above, including
  * vfio
  * chardev
  * vhost and tap
  * blockers
  * cpr-exec mode
  * iommufd

Changes in V2:
  * cpr-transfer is the first new mode proposed, and cpr-exec is deferred
  * anon-alloc does not apply to memory-backend-object
  * replaced hack with proper synchronization between source and target
  * defined QEMU_CPR_FILE_MAGIC
  * addressed misc review comments

Changes in V3:
  * added cpr-transfer to migration-test
  * documented cpr-transfer in CPR.rst
  * fix size_t trace format for 32-bit build
  * drop explicit fd value in VMSTATE_FD
  * defer cpr_walk_fd() and cpr_resave_fd() to later series
  * drop "migration: save cpr mode".
    delete mode from cpr state, and use cpr_uri to infer transfer mode.
  * drop "migration: stop vm earlier for cpr"
  * dropped cpr helpers, to be re-added later when needed
  * fixed an unreported bug for cpr-transfer and migrate cancel
  * documented cpr-transfer restrictions in qapi
  * added trace for cpr_state_save and cpr_state_load
  * added ftruncate to "preserve ram blocks"

Changes in V4:
  * cleaned up qtest deferred connection code
  * renamed pass_fd -> can_pass_fd
  * squashed patch "split qmp_migrate"
  * deleted cpr-uri and its patches
  * added cpr channel and its patches
  * added patch "hostmem-shm: preserve for cpr"
  * added patch "fd-based shared memory"
  * added patch "factor out allocation of anonymous shared memory"
  * added RAM_PRIVATE and its patch
  * added aux-ram-share and its patch

Changes in V5:
  * added patch 'enhance migrate_uri_parse'
  * supported dotted keys for -incoming channel,
    and rewrote incoming_option_parse
  * moved migrate_fd_cancel -> vm_resume to "stop vm earlier for cpr"
    in a future series.
  * updated command-line definition for aux-ram-share
  * added patch "resizable qemu_ram_alloc_from_fd"
  * rewrote patch "fd-based shared memory"
  * fixed error message in qemu_shm_alloc
  * added patch 'tests/qtest: optimize migrate_set_ports'
  * added patch 'tests/qtest: enhance migration channels'
  * added patch 'tests/qtest: assert qmp_ready'
  * modified patch 'migration-test: cpr-transfer'
  * polished the documentation in CPR.rst, qapi, and the
    cpr-transfer mode commit message
  * updated to master, and resolved massive context diffs for migration tests

Changes in V6:
  * added RB's and Acks.
  * in patch "assert qmp_ready", deleted qmp_ready and checked qmp_fd instead.
    renamed patch to "assert qmp connected"
  * factored out fix into new patch
    "fix qemu_ram_alloc_from_fd size calculation"
  * deleted a redundant call to migrate_hup_delete
  * added commit message to "migration: cpr-transfer documentation"
  * polished the text of cpr-transfer mode in qapi

Changes in V7:
  * fixed cpr-transfer test failure for s390
  * fixed machine_get_aux_ram_share compilation error for Windows
  * fixed size_t print format compilation error for misc architectures
  * fixed memory leaks in cpr_transfer_output, cpr_transfer_input, and
    qemu_file_get_fd

The first 10 patches below are foundational and are needed for both cpr-transfer
mode and the proposed cpr-exec mode.  The next 6 patches are specific to
cpr-transfer and implement the mechanisms for sharing state across a socket
using SCM_RIGHTS.  The last 8 patches supply tests and documentation.

Steve Sistare (24):
  backends/hostmem-shm: factor out allocation of "anonymous shared
    memory with an fd"
  physmem: fix qemu_ram_alloc_from_fd size calculation
  physmem: qemu_ram_alloc_from_fd extensions
  physmem: fd-based shared memory
  memory: add RAM_PRIVATE
  machine: aux-ram-share option
  migration: cpr-state
  physmem: preserve ram blocks for cpr
  hostmem-memfd: preserve for cpr
  hostmem-shm: preserve for cpr
  migration: enhance migrate_uri_parse
  migration: incoming channel
  migration: SCM_RIGHTS for QEMUFile
  migration: VMSTATE_FD
  migration: cpr-transfer save and load
  migration: cpr-transfer mode
  migration-test: memory_backend
  tests/qtest: optimize migrate_set_ports
  tests/qtest: defer connection
  migration-test: defer connection
  tests/qtest: enhance migration channels
  tests/qtest: assert qmp connected
  migration-test: cpr-transfer
  migration: cpr-transfer documentation

 backends/hostmem-epc.c                 |   2 +-
 backends/hostmem-file.c                |   2 +-
 backends/hostmem-memfd.c               |  14 ++-
 backends/hostmem-ram.c                 |   2 +-
 backends/hostmem-shm.c                 |  51 ++------
 docs/devel/migration/CPR.rst           | 182 ++++++++++++++++++++++++++-
 hw/core/machine.c                      |  22 ++++
 include/exec/memory.h                  |  10 ++
 include/exec/ram_addr.h                |  13 +-
 include/hw/boards.h                    |   1 +
 include/migration/cpr.h                |  33 +++++
 include/migration/misc.h               |   7 ++
 include/migration/vmstate.h            |   9 ++
 include/qemu/osdep.h                   |   1 +
 meson.build                            |   8 +-
 migration/cpr-transfer.c               |  71 +++++++++++
 migration/cpr.c                        | 224 +++++++++++++++++++++++++++++++++
 migration/meson.build                  |   2 +
 migration/migration.c                  | 139 +++++++++++++++++++-
 migration/migration.h                  |   4 +-
 migration/options.c                    |   8 +-
 migration/qemu-file.c                  |  84 ++++++++++++-
 migration/qemu-file.h                  |   2 +
 migration/ram.c                        |   2 +
 migration/trace-events                 |  11 ++
 migration/vmstate-types.c              |  24 ++++
 qapi/migration.json                    |  44 ++++++-
 qemu-options.hx                        |  34 +++++
 stubs/vmstate.c                        |   7 ++
 system/memory.c                        |   4 +-
 system/physmem.c                       | 150 ++++++++++++++++++----
 system/trace-events                    |   1 +
 system/vl.c                            |  43 ++++++-
 tests/qtest/libqtest.c                 |  86 ++++++++-----
 tests/qtest/libqtest.h                 |  19 ++-
 tests/qtest/migration/cpr-tests.c      |  62 +++++++++
 tests/qtest/migration/framework.c      |  74 +++++++++--
 tests/qtest/migration/framework.h      |  11 ++
 tests/qtest/migration/migration-qmp.c  |  53 ++++++--
 tests/qtest/migration/migration-qmp.h  |  10 +-
 tests/qtest/migration/migration-util.c |  23 ++--
 tests/qtest/migration/misc-tests.c     |   9 +-
 tests/qtest/migration/precopy-tests.c  |   6 +-
 tests/qtest/virtio-net-failover.c      |   8 +-
 util/memfd.c                           |  16 ++-
 util/oslib-posix.c                     |  52 ++++++++
 util/oslib-win32.c                     |   6 +
 47 files changed, 1472 insertions(+), 174 deletions(-)
 create mode 100644 include/migration/cpr.h
 create mode 100644 migration/cpr-transfer.c
 create mode 100644 migration/cpr.c

base-commit: e8aa7fdcddfc8589bdc7c973a052e76e8f999455

-- 
1.8.3.1




* [PATCH V7 01/24] backends/hostmem-shm: factor out allocation of "anonymous shared memory with an fd"
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Let's factor it out so we can reuse it.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 backends/hostmem-shm.c | 45 ++++---------------------------------------
 include/qemu/osdep.h   |  1 +
 meson.build            |  8 ++++++--
 util/oslib-posix.c     | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++
 util/oslib-win32.c     |  6 ++++++
 5 files changed, 69 insertions(+), 43 deletions(-)

diff --git a/backends/hostmem-shm.c b/backends/hostmem-shm.c
index 5551ba7..fabee41 100644
--- a/backends/hostmem-shm.c
+++ b/backends/hostmem-shm.c
@@ -25,11 +25,9 @@ struct HostMemoryBackendShm {
 static bool
 shm_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 {
-    g_autoptr(GString) shm_name = g_string_new(NULL);
     g_autofree char *backend_name = NULL;
     uint32_t ram_flags;
-    int fd, oflag;
-    mode_t mode;
+    int fd;
 
     if (!backend->size) {
         error_setg(errp, "can't create shm backend with size 0");
@@ -41,48 +39,13 @@ shm_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
         return false;
     }
 
-    /*
-     * Let's use `mode = 0` because we don't want other processes to open our
-     * memory unless we share the file descriptor with them.
-     */
-    mode = 0;
-    oflag = O_RDWR | O_CREAT | O_EXCL;
-    backend_name = host_memory_backend_get_name(backend);
-
-    /*
-     * Some operating systems allow creating anonymous POSIX shared memory
-     * objects (e.g. FreeBSD provides the SHM_ANON constant), but this is not
-     * defined by POSIX, so let's create a unique name.
-     *
-     * From Linux's shm_open(3) man-page:
-     *   For  portable  use,  a shared  memory  object should be identified
-     *   by a name of the form /somename;"
-     */
-    g_string_printf(shm_name, "/qemu-" FMT_pid "-shm-%s", getpid(),
-                    backend_name);
-
-    fd = shm_open(shm_name->str, oflag, mode);
+    fd = qemu_shm_alloc(backend->size, errp);
     if (fd < 0) {
-        error_setg_errno(errp, errno,
-                         "failed to create POSIX shared memory");
-        return false;
-    }
-
-    /*
-     * We have the file descriptor, so we no longer need to expose the
-     * POSIX shared memory object. However it will remain allocated as long as
-     * there are file descriptors pointing to it.
-     */
-    shm_unlink(shm_name->str);
-
-    if (ftruncate(fd, backend->size) == -1) {
-        error_setg_errno(errp, errno,
-                         "failed to resize POSIX shared memory to %" PRIu64,
-                         backend->size);
-        close(fd);
         return false;
     }
 
+    /* Let's do the same as memory-backend-ram,share=on would do. */
+    backend_name = host_memory_backend_get_name(backend);
     ram_flags = RAM_SHARED;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index b94fb5f..112ebdf 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -509,6 +509,7 @@ int qemu_daemon(int nochdir, int noclose);
 void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared,
                           bool noreserve);
 void qemu_anon_ram_free(void *ptr, size_t size);
+int qemu_shm_alloc(size_t size, Error **errp);
 
 #ifdef _WIN32
 #define HAVE_CHARDEV_SERIAL 1
diff --git a/meson.build b/meson.build
index b715ea7..2e709c9 100644
--- a/meson.build
+++ b/meson.build
@@ -3706,9 +3706,13 @@ libqemuutil = static_library('qemuutil',
                              build_by_default: false,
                              sources: util_ss.sources() + stub_ss.sources() + genh,
                              dependencies: [util_ss.dependencies(), libm, threads, glib, socket, malloc])
+qemuutil_deps = [event_loop_base]
+if host_os != 'windows'
+  qemuutil_deps += [rt]
+endif
 qemuutil = declare_dependency(link_with: libqemuutil,
                               sources: genh + version_res,
-                              dependencies: [event_loop_base])
+                              dependencies: qemuutil_deps)
 
 if have_system or have_user
   decodetree = generator(find_program('scripts/decodetree.py'),
@@ -4362,7 +4366,7 @@ if have_tools
   subdir('contrib/elf2dmp')
 
   executable('qemu-edid', files('qemu-edid.c', 'hw/display/edid-generate.c'),
-             dependencies: qemuutil,
+             dependencies: [qemuutil, rt],
              install: true)
 
   if have_vhost_user
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 7a542cb..2bb34da 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -931,3 +931,55 @@ void qemu_close_all_open_fd(const int *skip, unsigned int nskip)
         qemu_close_all_open_fd_fallback(skip, nskip, open_max);
     }
 }
+
+int qemu_shm_alloc(size_t size, Error **errp)
+{
+    g_autoptr(GString) shm_name = g_string_new(NULL);
+    int fd, oflag, cur_sequence;
+    static int sequence;
+    mode_t mode;
+
+    cur_sequence = qatomic_fetch_inc(&sequence);
+
+    /*
+     * Let's use `mode = 0` because we don't want other processes to open our
+     * memory unless we share the file descriptor with them.
+     */
+    mode = 0;
+    oflag = O_RDWR | O_CREAT | O_EXCL;
+
+    /*
+     * Some operating systems allow creating anonymous POSIX shared memory
+     * objects (e.g. FreeBSD provides the SHM_ANON constant), but this is not
+     * defined by POSIX, so let's create a unique name.
+     *
+     * From Linux's shm_open(3) man-page:
+     *   For  portable  use,  a shared  memory  object should be identified
+     *   by a name of the form /somename;"
+     */
+    g_string_printf(shm_name, "/qemu-" FMT_pid "-shm-%d", getpid(),
+                    cur_sequence);
+
+    fd = shm_open(shm_name->str, oflag, mode);
+    if (fd < 0) {
+        error_setg_errno(errp, errno,
+                         "failed to create POSIX shared memory");
+        return -1;
+    }
+
+    /*
+     * We have the file descriptor, so we no longer need to expose the
+     * POSIX shared memory object. However it will remain allocated as long as
+     * there are file descriptors pointing to it.
+     */
+    shm_unlink(shm_name->str);
+
+    if (ftruncate(fd, size) == -1) {
+        error_setg_errno(errp, errno,
+                         "failed to resize POSIX shared memory to %zu", size);
+        close(fd);
+        return -1;
+    }
+
+    return fd;
+}
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index b623830..b735163 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -877,3 +877,9 @@ void qemu_win32_map_free(void *ptr, HANDLE h, Error **errp)
     }
     CloseHandle(h);
 }
+
+int qemu_shm_alloc(size_t size, Error **errp)
+{
+    error_setg(errp, "Shared memory is not supported.");
+    return -1;
+}
-- 
1.8.3.1




* [PATCH V7 02/24] physmem: fix qemu_ram_alloc_from_fd size calculation
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

qemu_ram_alloc_from_fd allocates space if file_size == 0.  If non-zero,
it uses the existing space and verifies it is large enough, but the
verification was broken when the offset parameter was introduced.  As
a result, a file smaller than offset passes the verification and causes
errors later.  Fix that, and update the error message to include offset.

Peter provides this concise reproducer:

  $ touch ramfile
  $ truncate -s 64M ramfile
  $ ./qemu-system-x86_64 -object memory-backend-file,mem-path=./ramfile,offset=128M,size=128M,id=mem1,prealloc=on
  qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address

With the fix, the error message is:
  qemu-system-x86_64: mem1 backing store size 0x4000000 is too small for 'size' option 0x8000000 plus 'offset' option 0x8000000
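To make the bug concrete, the two predicates below copy the before and after conditions from the diff into standalone functions (the function names are illustrative). With the reproducer's values, file_size = 64M, offset = 128M, size = 128M, the old condition is false because 64M is not greater than 128M, so the undersized file was accepted; the new condition is true and the allocation is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Condition before the fix: a file smaller than offset slips through. */
static bool old_size_check_fails(uint64_t file_size, uint64_t offset,
                                 uint64_t size)
{
    return file_size > offset && file_size < offset + size;
}

/* Condition after the fix: any non-empty file must cover offset + size. */
static bool new_size_check_fails(uint64_t file_size, uint64_t offset,
                                 uint64_t size)
{
    return file_size && file_size < offset + size;
}
```

An empty file (file_size == 0) still passes, because that case means "allocate the space".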

Cc: qemu-stable@nongnu.org
Fixes: 4b870dc4d0c0 ("hostmem-file: add offset option")
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 system/physmem.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index c76503a..792844d 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1970,10 +1970,12 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     size = REAL_HOST_PAGE_ALIGN(size);
 
     file_size = get_file_size(fd);
-    if (file_size > offset && file_size < (offset + size)) {
-        error_setg(errp, "backing store size 0x%" PRIx64
-                   " does not match 'size' option 0x" RAM_ADDR_FMT,
-                   file_size, size);
+    if (file_size && file_size < offset + size) {
+        error_setg(errp, "%s backing store size 0x%" PRIx64
+                   " is too small for 'size' option 0x" RAM_ADDR_FMT
+                   " plus 'offset' option 0x%" PRIx64,
+                   memory_region_name(mr), file_size, size,
+                   (uint64_t)offset);
         return NULL;
     }
 
-- 
1.8.3.1




* [PATCH V7 03/24] physmem: qemu_ram_alloc_from_fd extensions
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Extend qemu_ram_alloc_from_fd to support resizable ram, and define
qemu_ram_resize_cb to clean up the API.

Add a grow parameter to extend the file if necessary.  Even if grow is
false, a zero-sized file is always extended.
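The resulting size policy can be summarized as a three-way decision. This sketch restates the two conditions visible in the diff (the error check, and the flag passed to file_ram_alloc to request extension); the enum and function name are hypothetical, and page alignment of max_size is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum fd_size_action { FD_SIZE_ERROR, FD_SIZE_EXTEND, FD_SIZE_USE };

/*
 * The file must cover offset + max_size.  A short non-empty file is an
 * error unless grow is set; an empty file is always extended.
 */
static enum fd_size_action classify(uint64_t file_size, uint64_t offset,
                                    uint64_t max_size, bool grow)
{
    if (file_size && file_size < offset + max_size && !grow) {
        return FD_SIZE_ERROR;
    }
    return file_size < offset + max_size ? FD_SIZE_EXTEND : FD_SIZE_USE;
}
```

Note that the check uses max_size rather than size, so a resizable block reserves room for its largest size up front.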

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/exec/ram_addr.h | 13 +++++++++----
 system/memory.c         |  4 ++--
 system/physmem.c        | 35 ++++++++++++++++++++---------------
 3 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index ff157c1..94bb3cc 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -111,23 +111,30 @@ long qemu_maxrampagesize(void);
  *
  * Parameters:
  *  @size: the size in bytes of the ram block
+ *  @max_size: the maximum size of the block after resizing
  *  @mr: the memory region where the ram block is
+ *  @resized: callback after calls to qemu_ram_resize
  *  @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *              RAM_NORESERVE, RAM_PROTECTED, RAM_NAMED_FILE, RAM_READONLY,
  *              RAM_READONLY_FD, RAM_GUEST_MEMFD
  *  @mem_path or @fd: specify the backing file or device
  *  @offset: Offset into target file
+ *  @grow: extend file if necessary (but an empty file is always extended).
  *  @errp: pointer to Error*, to store an error if it happens
  *
  * Return:
  *  On success, return a pointer to the ram block.
  *  On failure, return NULL.
  */
+typedef void (*qemu_ram_resize_cb)(const char *, uint64_t length, void *host);
+
 RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
                                    uint32_t ram_flags, const char *mem_path,
                                    off_t offset, Error **errp);
-RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
+RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, ram_addr_t max_size,
+                                 qemu_ram_resize_cb resized, MemoryRegion *mr,
                                  uint32_t ram_flags, int fd, off_t offset,
+                                 bool grow,
                                  Error **errp);
 
 RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
@@ -135,9 +142,7 @@ RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
 RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags, MemoryRegion *mr,
                          Error **errp);
 RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t max_size,
-                                    void (*resized)(const char*,
-                                                    uint64_t length,
-                                                    void *host),
+                                    qemu_ram_resize_cb resized,
                                     MemoryRegion *mr, Error **errp);
 void qemu_ram_free(RAMBlock *block);
 
diff --git a/system/memory.c b/system/memory.c
index b17b553..4c82979 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -1680,8 +1680,8 @@ bool memory_region_init_ram_from_fd(MemoryRegion *mr,
     mr->readonly = !!(ram_flags & RAM_READONLY);
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
-    mr->ram_block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, offset,
-                                           &err);
+    mr->ram_block = qemu_ram_alloc_from_fd(size, size, NULL, mr, ram_flags, fd,
+                                           offset, false, &err);
     if (err) {
         mr->size = int128_zero();
         object_unparent(OBJECT(mr));
diff --git a/system/physmem.c b/system/physmem.c
index 792844d..4d13761 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1942,8 +1942,10 @@ out_free:
 }
 
 #ifdef CONFIG_POSIX
-RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
+RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, ram_addr_t max_size,
+                                 qemu_ram_resize_cb resized, MemoryRegion *mr,
                                  uint32_t ram_flags, int fd, off_t offset,
+                                 bool grow,
                                  Error **errp)
 {
     RAMBlock *new_block;
@@ -1953,7 +1955,9 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     /* Just support these ram flags by now. */
     assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE |
                           RAM_PROTECTED | RAM_NAMED_FILE | RAM_READONLY |
-                          RAM_READONLY_FD | RAM_GUEST_MEMFD)) == 0);
+                          RAM_READONLY_FD | RAM_GUEST_MEMFD |
+                          RAM_RESIZEABLE)) == 0);
+    assert(max_size >= size);
 
     if (xen_enabled()) {
         error_setg(errp, "-mem-path not supported with Xen");
@@ -1968,13 +1972,15 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 
     size = TARGET_PAGE_ALIGN(size);
     size = REAL_HOST_PAGE_ALIGN(size);
+    max_size = TARGET_PAGE_ALIGN(max_size);
+    max_size = REAL_HOST_PAGE_ALIGN(max_size);
 
     file_size = get_file_size(fd);
-    if (file_size && file_size < offset + size) {
+    if (file_size && file_size < offset + max_size && !grow) {
         error_setg(errp, "%s backing store size 0x%" PRIx64
                    " is too small for 'size' option 0x" RAM_ADDR_FMT
                    " plus 'offset' option 0x%" PRIx64,
-                   memory_region_name(mr), file_size, size,
+                   memory_region_name(mr), file_size, max_size,
                    (uint64_t)offset);
         return NULL;
     }
@@ -1990,11 +1996,13 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     new_block = g_malloc0(sizeof(*new_block));
     new_block->mr = mr;
     new_block->used_length = size;
-    new_block->max_length = size;
+    new_block->max_length = max_size;
+    new_block->resized = resized;
     new_block->flags = ram_flags;
     new_block->guest_memfd = -1;
-    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, offset,
-                                     errp);
+    new_block->host = file_ram_alloc(new_block, max_size, fd,
+                                     file_size < offset + max_size,
+                                     offset, errp);
     if (!new_block->host) {
         g_free(new_block);
         return NULL;
@@ -2046,7 +2054,8 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
         return NULL;
     }
 
-    block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, offset, errp);
+    block = qemu_ram_alloc_from_fd(size, size, NULL, mr, ram_flags, fd, offset,
+                                   false, errp);
     if (!block) {
         if (created) {
             unlink(mem_path);
@@ -2061,9 +2070,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
 
 static
 RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
-                                  void (*resized)(const char*,
-                                                  uint64_t length,
-                                                  void *host),
+                                  qemu_ram_resize_cb resized,
                                   void *host, uint32_t ram_flags,
                                   MemoryRegion *mr, Error **errp)
 {
@@ -2115,10 +2122,8 @@ RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags,
 }
 
 RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t maxsz,
-                                     void (*resized)(const char*,
-                                                     uint64_t length,
-                                                     void *host),
-                                     MemoryRegion *mr, Error **errp)
+                                    qemu_ram_resize_cb resized,
+                                    MemoryRegion *mr, Error **errp)
 {
     return qemu_ram_alloc_internal(size, maxsz, resized, NULL,
                                    RAM_RESIZEABLE, mr, errp);
-- 
1.8.3.1




* [PATCH V7 04/24] physmem: fd-based shared memory
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (2 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 03/24] physmem: qemu_ram_alloc_from_fd extensions Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 05/24] memory: add RAM_PRIVATE Steve Sistare
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Create MAP_SHARED RAMBlocks by mmap'ing a file descriptor rather than using
MAP_ANON, so the memory can be accessed in another process by passing and
mmap'ing the fd.  This will allow CPR to support memory-backend-ram and
memory-backend-shm objects, provided the user creates them with share=on.

Use memfd_create if available because it has no size limits.  If not, use
POSIX shm_open.  However, allocation on the opened fd may fail if the shm
mount size is too small, even if the system has free memory, so for backwards
compatibility fall back to qemu_anon_ram_alloc/MAP_ANON on failure.

For backwards compatibility on Windows, always use MAP_ANON.  share=on has
no purpose there, but the syntax is accepted, and must continue to work.

Lastly, quietly fall back to MAP_ANON if the system does not support
qemu_ram_alloc_from_fd.
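A minimal userspace sketch of the allocation order described above, assuming a POSIX host: try memfd_create (Linux only), else an unlinked POSIX shm object, size it with ftruncate, map it MAP_SHARED, and quietly fall back to an anonymous MAP_SHARED mapping if any step fails.  The names get_shared_fd and alloc_shared, and the "/sketch-<pid>" shm name, are hypothetical; this is not QEMU's qemu_ram_get_shared_fd or qemu_ram_alloc_from_fd.

```c
#define _GNU_SOURCE            /* for memfd_create on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return an fd for anonymous shared memory, or -1 on failure. */
static int get_shared_fd(const char *name)
{
    int fd = -1;
#ifdef __linux__
    fd = memfd_create(name, MFD_CLOEXEC);   /* preferred: no size limits */
    if (fd >= 0) {
        return fd;
    }
#endif
    /* POSIX shm fallback; the name is hypothetical and unlinked at once */
    char shm_name[64];
    snprintf(shm_name, sizeof(shm_name), "/sketch-%d", (int)getpid());
    fd = shm_open(shm_name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd >= 0) {
        shm_unlink(shm_name);                /* only the fd keeps it alive */
    }
    return fd;
}

/*
 * Map size bytes MAP_SHARED.  *fd_out receives the backing fd, or -1 when
 * the sketch fell back to an anonymous shared mapping.
 */
static void *alloc_shared(const char *name, size_t size, int *fd_out)
{
    int fd = get_shared_fd(name);
    void *p;

    assert(size > 0);
    *fd_out = -1;
    if (fd >= 0) {
        /* ftruncate can fail, e.g. if the shm mount is too small */
        if (ftruncate(fd, (off_t)size) == 0) {
            p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (p != MAP_FAILED) {
                *fd_out = fd;
                return p;
            }
        }
        close(fd);                           /* quietly fall back */
    }
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Because the fd-backed path returns a descriptor, a second mmap of that fd (in this process, or in another process that received it) sees the same bytes; the anonymous fallback provides no such handle, which is why CPR needs the fd path.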

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 system/physmem.c    | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 system/trace-events |  1 +
 util/memfd.c        | 16 ++++++++++++---
 3 files changed, 70 insertions(+), 4 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index 4d13761..e435564 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -48,6 +48,7 @@
 #include "qemu/qemu-print.h"
 #include "qemu/log.h"
 #include "qemu/memalign.h"
+#include "qemu/memfd.h"
 #include "exec/memory.h"
 #include "exec/ioport.h"
 #include "system/dma.h"
@@ -1948,6 +1949,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, ram_addr_t max_size,
                                  bool grow,
                                  Error **errp)
 {
+    ERRP_GUARD();
     RAMBlock *new_block;
     Error *local_err = NULL;
     int64_t file_size, file_align;
@@ -2068,6 +2070,25 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
 }
 #endif
 
+#ifdef CONFIG_POSIX
+/*
+ * Create MAP_SHARED RAMBlocks by mmap'ing a file descriptor, so it can be
+ * shared with another process if CPR is being used.  Use memfd if available
+ * because it has no size limits, else use POSIX shm.
+ */
+static int qemu_ram_get_shared_fd(const char *name, Error **errp)
+{
+    int fd;
+
+    if (qemu_memfd_check(0)) {
+        fd = qemu_memfd_create(name, 0, 0, 0, 0, errp);
+    } else {
+        fd = qemu_shm_alloc(0, errp);
+    }
+    return fd;
+}
+#endif
+
 static
 RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
                                   qemu_ram_resize_cb resized,
@@ -2081,6 +2102,41 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
                           RAM_NORESERVE | RAM_GUEST_MEMFD)) == 0);
     assert(!host ^ (ram_flags & RAM_PREALLOC));
+    assert(max_size >= size);
+
+#ifdef CONFIG_POSIX         /* ignore RAM_SHARED for Windows */
+    if (!host) {
+        if (ram_flags & RAM_SHARED) {
+            const char *name = memory_region_name(mr);
+            int fd = qemu_ram_get_shared_fd(name, errp);
+
+            if (fd < 0) {
+                return NULL;
+            }
+
+            /* Use same alignment as qemu_anon_ram_alloc */
+            mr->align = QEMU_VMALLOC_ALIGN;
+
+            /*
+             * This can fail if the shm mount size is too small, or alloc from
+             * fd is not supported, but previous QEMU versions that called
+             * qemu_anon_ram_alloc for anonymous shared memory could have
+             * succeeded.  Quietly fail and fall back.
+             */
+            new_block = qemu_ram_alloc_from_fd(size, max_size, resized, mr,
+                                               ram_flags, fd, 0, false, NULL);
+            if (new_block) {
+                trace_qemu_ram_alloc_shared(name, new_block->used_length,
+                                            new_block->max_length, fd,
+                                            new_block->host);
+                return new_block;
+            }
+
+            close(fd);
+            /* fall back to anon allocation */
+        }
+    }
+#endif
 
     align = qemu_real_host_page_size();
     align = MAX(align, TARGET_PAGE_SIZE);
@@ -2092,7 +2148,6 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     new_block->resized = resized;
     new_block->used_length = size;
     new_block->max_length = max_size;
-    assert(max_size >= size);
     new_block->fd = -1;
     new_block->guest_memfd = -1;
     new_block->page_size = qemu_real_host_page_size();
diff --git a/system/trace-events b/system/trace-events
index 5bbc3fb..be12ebf 100644
--- a/system/trace-events
+++ b/system/trace-events
@@ -33,6 +33,7 @@ address_space_map(void *as, uint64_t addr, uint64_t len, bool is_write, uint32_t
 find_ram_offset(uint64_t size, uint64_t offset) "size: 0x%" PRIx64 " @ 0x%" PRIx64
 find_ram_offset_loop(uint64_t size, uint64_t candidate, uint64_t offset, uint64_t next, uint64_t mingap) "trying size: 0x%" PRIx64 " @ 0x%" PRIx64 ", offset: 0x%" PRIx64" next: 0x%" PRIx64 " mingap: 0x%" PRIx64
 ram_block_discard_range(const char *rbname, void *hva, size_t length, bool need_madvise, bool need_fallocate, int ret) "%s@%p + 0x%zx: madvise: %d fallocate: %d ret: %d"
+qemu_ram_alloc_shared(const char *name, size_t size, size_t max_size, int fd, void *host) "%s size %zu max_size %zu fd %d host %p"
 
 # cpus.c
 vm_stop_flush_all(int ret) "ret %d"
diff --git a/util/memfd.c b/util/memfd.c
index 8a2e906..07beab1 100644
--- a/util/memfd.c
+++ b/util/memfd.c
@@ -194,17 +194,27 @@ bool qemu_memfd_alloc_check(void)
 /**
  * qemu_memfd_check():
  *
- * Check if host supports memfd.
+ * Check if host supports memfd.  Cache the answer for the common case flags=0.
  */
 bool qemu_memfd_check(unsigned int flags)
 {
 #ifdef CONFIG_LINUX
-    int mfd = memfd_create("test", flags | MFD_CLOEXEC);
+    int mfd;
+    static int memfd_check = MEMFD_TODO;
 
+    if (!flags && memfd_check != MEMFD_TODO) {
+        return memfd_check;
+    }
+
+    mfd = memfd_create("test", flags | MFD_CLOEXEC);
     if (mfd >= 0) {
         close(mfd);
-        return true;
     }
+    if (!flags) {
+        memfd_check = (mfd >= 0) ? MEMFD_OK : MEMFD_KO;
+    }
+    return (mfd >= 0);
+
 #endif
 
     return false;
-- 
1.8.3.1




* [PATCH V7 05/24] memory: add RAM_PRIVATE
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (3 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 04/24] physmem: fd-based shared memory Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 06/24] machine: aux-ram-share option Steve Sistare
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Define the RAM_PRIVATE flag.

In RAMBlock creation functions, if RAM_SHARED is 0 in the flags parameter,
a subsequent patch allows the implementation to still create a shared mapping
if other conditions require it.  Callers who specifically want a private
mapping, e.g. for objects specified by the user, must pass RAM_PRIVATE.

After RAMBlock creation, RAM_SHARED in the block's flags indicates whether
the block is shared or private, and RAM_PRIVATE is omitted.
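The flag rules above can be sketched as a small normalization step that mirrors the share_flags checks this patch adds to qemu_ram_alloc_from_fd and qemu_ram_alloc_internal.  RAM_PRIVATE's value (1 << 13) comes from the patch; the RAM_SHARED value and the function name normalize_ram_flags are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAM_SHARED  (1u << 1)    /* assumed value, for illustration only */
#define RAM_PRIVATE (1u << 13)   /* value defined by this patch */

/*
 * The caller may request RAM_SHARED, RAM_PRIVATE, or neither, never both.
 * RAM_PRIVATE is stripped, so afterwards only RAM_SHARED records whether
 * the block is shared.  *explicit_choice reports whether the caller pinned
 * the sharing mode (if not, later logic may still choose a shared mapping).
 */
static uint32_t normalize_ram_flags(uint32_t ram_flags, bool *explicit_choice)
{
    uint32_t share_flags = ram_flags & (RAM_SHARED | RAM_PRIVATE);

    assert(share_flags != (RAM_SHARED | RAM_PRIVATE));
    *explicit_choice = (share_flags != 0);
    return ram_flags & ~RAM_PRIVATE;
}
```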

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 backends/hostmem-epc.c   |  2 +-
 backends/hostmem-file.c  |  2 +-
 backends/hostmem-memfd.c |  2 +-
 backends/hostmem-ram.c   |  2 +-
 include/exec/memory.h    | 10 ++++++++++
 system/physmem.c         | 15 ++++++++++++---
 6 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/backends/hostmem-epc.c b/backends/hostmem-epc.c
index eb4b95d..1fa2d03 100644
--- a/backends/hostmem-epc.c
+++ b/backends/hostmem-epc.c
@@ -36,7 +36,7 @@ sgx_epc_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 
     backend->aligned = true;
     name = object_get_canonical_path(OBJECT(backend));
-    ram_flags = (backend->share ? RAM_SHARED : 0) | RAM_PROTECTED;
+    ram_flags = (backend->share ? RAM_SHARED : RAM_PRIVATE) | RAM_PROTECTED;
     return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
                                           backend->size, ram_flags, fd, 0, errp);
 }
diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 46321fd..691a827 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -82,7 +82,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 
     backend->aligned = true;
     name = host_memory_backend_get_name(backend);
-    ram_flags = backend->share ? RAM_SHARED : 0;
+    ram_flags = backend->share ? RAM_SHARED : RAM_PRIVATE;
     ram_flags |= fb->readonly ? RAM_READONLY_FD : 0;
     ram_flags |= fb->rom == ON_OFF_AUTO_ON ? RAM_READONLY : 0;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index d4d0620..1672da9 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -52,7 +52,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 
     backend->aligned = true;
     name = host_memory_backend_get_name(backend);
-    ram_flags = backend->share ? RAM_SHARED : 0;
+    ram_flags = backend->share ? RAM_SHARED : RAM_PRIVATE;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
     ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
     return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index 39aac6b..868ae6c 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -28,7 +28,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     }
 
     name = host_memory_backend_get_name(backend);
-    ram_flags = backend->share ? RAM_SHARED : 0;
+    ram_flags = backend->share ? RAM_SHARED : RAM_PRIVATE;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
     ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
     return memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend),
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 9458e28..0ac21cc 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -246,6 +246,16 @@ typedef struct IOMMUTLBEvent {
 /* RAM can be private that has kvm guest memfd backend */
 #define RAM_GUEST_MEMFD   (1 << 12)
 
+/*
+ * In RAMBlock creation functions, if RAM_SHARED is 0 in the flags parameter,
+ * the implementation may still create a shared mapping if other conditions
+ * require it.  Callers who specifically want a private mapping, e.g. objects
+ * specified by the user, must pass RAM_PRIVATE.
+ * After RAMBlock creation, RAM_SHARED in the block's flags indicates whether
+ * the block is shared or private, and RAM_PRIVATE is omitted.
+ */
+#define RAM_PRIVATE (1 << 13)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
diff --git a/system/physmem.c b/system/physmem.c
index e435564..03fac0a 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1952,7 +1952,11 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, ram_addr_t max_size,
     ERRP_GUARD();
     RAMBlock *new_block;
     Error *local_err = NULL;
-    int64_t file_size, file_align;
+    int64_t file_size, file_align, share_flags;
+
+    share_flags = ram_flags & (RAM_PRIVATE | RAM_SHARED);
+    assert(share_flags != (RAM_SHARED | RAM_PRIVATE));
+    ram_flags &= ~RAM_PRIVATE;
 
     /* Just support these ram flags by now. */
     assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE |
@@ -2097,7 +2101,11 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
 {
     RAMBlock *new_block;
     Error *local_err = NULL;
-    int align;
+    int align, share_flags;
+
+    share_flags = ram_flags & (RAM_PRIVATE | RAM_SHARED);
+    assert(share_flags != (RAM_SHARED | RAM_PRIVATE));
+    ram_flags &= ~RAM_PRIVATE;
 
     assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
                           RAM_NORESERVE | RAM_GUEST_MEMFD)) == 0);
@@ -2172,7 +2180,8 @@ RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
 RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags,
                          MemoryRegion *mr, Error **errp)
 {
-    assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE | RAM_GUEST_MEMFD)) == 0);
+    assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE | RAM_GUEST_MEMFD |
+                          RAM_PRIVATE)) == 0);
     return qemu_ram_alloc_internal(size, size, NULL, NULL, ram_flags, mr, errp);
 }
 
-- 
1.8.3.1




* [PATCH V7 06/24] machine: aux-ram-share option
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (4 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 05/24] memory: add RAM_PRIVATE Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 07/24] migration: cpr-state Steve Sistare
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Allocate auxiliary guest RAM as an anonymous file that is shareable
with an external process.  This option applies to memory allocated as
a side effect of creating various devices. It does not apply to
memory-backend-objects, whether explicitly specified on the command
line, or implicitly created by the -m command line option.
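A hedged sketch of how aux-ram-share (enabled with '-machine aux-ram-share=on') interacts with RAM_PRIVATE: the option only takes effect when the caller requested neither RAM_SHARED nor RAM_PRIVATE.  Because memory backends pass RAM_PRIVATE when share=off, they are unaffected, which matches the scope described above.  The helper want_shared is hypothetical, and the RAM_SHARED value is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAM_SHARED  (1u << 1)    /* assumed value, for illustration only */
#define RAM_PRIVATE (1u << 13)   /* value defined by the RAM_PRIVATE patch */

/*
 * Decide whether an allocation should be shared.  An explicit RAM_SHARED or
 * RAM_PRIVATE always wins; aux_ram_share only fills in the unspecified case.
 */
static bool want_shared(uint32_t ram_flags, bool aux_ram_share)
{
    uint32_t share_flags = ram_flags & (RAM_SHARED | RAM_PRIVATE);

    assert(share_flags != (RAM_SHARED | RAM_PRIVATE));
    if (!share_flags) {
        return aux_ram_share;    /* auxiliary RAM, e.g. device ROMs */
    }
    return share_flags == RAM_SHARED;
}
```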

This option is intended to support new migration modes, in which the
memory region can be transferred in place to a new QEMU process, by sending
the memfd file descriptor to the process.  Memory contents are preserved,
and if the mode also transfers device descriptors, then pages that are
locked in memory for DMA remain locked.  This behavior is a prerequisite
for supporting vfio, vdpa, and iommufd devices with the new modes.
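Sending a memfd to another process on the same host is typically done with SCM_RIGHTS ancillary data over an AF_UNIX socket.  The sketch below shows that general technique only; the transport CPR actually uses is defined per migration mode in later patches, and the helpers send_fd and recv_fd are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a connected AF_UNIX socket as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
    char byte = 0;                  /* at least one data byte must be sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                         /* properly aligned control buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; the kernel installs a new descriptor in this process. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd = -1;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS) {
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    }
    return fd;
}
```

The received descriptor is a new fd number in the receiver that refers to the same open file, so mapping a received memfd gives access to the same pages.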

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 hw/core/machine.c   | 22 ++++++++++++++++++++++
 include/hw/boards.h |  1 +
 qemu-options.hx     | 11 +++++++++++
 system/physmem.c    |  3 +++
 4 files changed, 37 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index c23b399..2b11bc4 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -457,6 +457,22 @@ static void machine_set_mem_merge(Object *obj, bool value, Error **errp)
     ms->mem_merge = value;
 }
 
+#ifdef CONFIG_POSIX
+static bool machine_get_aux_ram_share(Object *obj, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    return ms->aux_ram_share;
+}
+
+static void machine_set_aux_ram_share(Object *obj, bool value, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    ms->aux_ram_share = value;
+}
+#endif
+
 static bool machine_get_usb(Object *obj, Error **errp)
 {
     MachineState *ms = MACHINE(obj);
@@ -1162,6 +1178,12 @@ static void machine_class_init(ObjectClass *oc, void *data)
     object_class_property_set_description(oc, "mem-merge",
         "Enable/disable memory merge support");
 
+#ifdef CONFIG_POSIX
+    object_class_property_add_bool(oc, "aux-ram-share",
+                                   machine_get_aux_ram_share,
+                                   machine_set_aux_ram_share);
+#endif
+
     object_class_property_add_bool(oc, "usb",
         machine_get_usb, machine_set_usb);
     object_class_property_set_description(oc, "usb",
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 2ad711e..e1f41b2 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -410,6 +410,7 @@ struct MachineState {
     bool enable_graphics;
     ConfidentialGuestSupport *cgs;
     HostMemoryBackend *memdev;
+    bool aux_ram_share;
     /*
      * convenience alias to ram_memdev_id backend memory region
      * or to numa container memory region
diff --git a/qemu-options.hx b/qemu-options.hx
index 7090d59..90fad31 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -38,6 +38,9 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "                nvdimm=on|off controls NVDIMM support (default=off)\n"
     "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
     "                hmat=on|off controls ACPI HMAT support (default=off)\n"
+#ifdef CONFIG_POSIX
+    "                aux-ram-share=on|off allocate auxiliary guest RAM as shared (default: off)\n"
+#endif
     "                memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
     "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
     QEMU_ARCH_ALL)
@@ -101,6 +104,14 @@ SRST
         Enables or disables ACPI Heterogeneous Memory Attribute Table
         (HMAT) support. The default is off.
 
+    ``aux-ram-share=on|off``
+        Allocate auxiliary guest RAM as an anonymous file that is
+        shareable with an external process.  This option applies to
+        memory allocated as a side effect of creating various devices.
+        It does not apply to memory-backend-objects, whether explicitly
+        specified on the command line, or implicitly created by the -m
+        command line option.  The default is off.
+
     ``memory-backend='id'``
         An alternative to legacy ``-mem-path`` and ``mem-prealloc`` options.
         Allows to use a memory backend as main RAM.
diff --git a/system/physmem.c b/system/physmem.c
index 03fac0a..cb80ce3 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2114,6 +2114,9 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
 
 #ifdef CONFIG_POSIX         /* ignore RAM_SHARED for Windows */
     if (!host) {
+        if (!share_flags && current_machine->aux_ram_share) {
+            ram_flags |= RAM_SHARED;
+        }
         if (ram_flags & RAM_SHARED) {
             const char *name = memory_region_name(mr);
             int fd = qemu_ram_get_shared_fd(name, errp);
-- 
1.8.3.1




* [PATCH V7 07/24] migration: cpr-state
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (5 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 06/24] machine: aux-ram-share option Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 08/24] physmem: preserve ram blocks for cpr Steve Sistare
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

CPR must save state that is needed after QEMU is restarted, when devices
are realized.  Thus the extra state cannot be saved in the migration
channel, as objects must already exist before that channel can be loaded.
Instead, define auxiliary state structures and vmstate descriptions, not
associated with any registered object, and serialize the aux state to a
cpr-specific channel in cpr_state_save.  Deserialize in cpr_state_load
after QEMU restarts, before devices are realized.
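The cpr channel stream starts with a fixed header: QEMU_CPR_FILE_MAGIC followed by QEMU_CPR_FILE_VERSION, each written big-endian as with qemu_put_be32.  This sketch reproduces only that header framing and the two load-time checks in cpr_state_load (bad magic gives -EINVAL, unsupported version gives -ENOTSUP); the helper names are hypothetical and the vmstate payload that follows the header is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define QEMU_CPR_FILE_MAGIC     0x51435052   /* "QCPR" in ASCII */
#define QEMU_CPR_FILE_VERSION   0x00000001

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write the 8-byte header that precedes the serialized state. */
static size_t cpr_header_write(uint8_t *buf)
{
    put_be32(buf, QEMU_CPR_FILE_MAGIC);
    put_be32(buf + 4, QEMU_CPR_FILE_VERSION);
    return 8;
}

/* Validate the header before loading anything else. */
static int cpr_header_check(const uint8_t *buf, size_t len)
{
    if (len < 8 || get_be32(buf) != QEMU_CPR_FILE_MAGIC) {
        return -EINVAL;      /* not a cpr stream */
    }
    if (get_be32(buf + 4) != QEMU_CPR_FILE_VERSION) {
        return -ENOTSUP;     /* unsupported stream version */
    }
    return 0;
}
```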

Provide accessors for clients to register file descriptors for saving.
The mechanism for passing the fd's to the new process will be specific
to each migration mode, and added in subsequent patches.
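The accessors amount to a list keyed by (name, id).  This standalone sketch mirrors the semantics of cpr_save_fd, cpr_find_fd and cpr_delete_fd, with a lookup returning -1 when nothing is saved, but it uses a plain singly linked list instead of QEMU's QLIST macros; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One saved descriptor, keyed by (name, id). */
typedef struct SavedFd {
    char *name;
    int id;
    int fd;
    struct SavedFd *next;
} SavedFd;

static SavedFd *saved_fds;          /* list head; new entries go first */

static void save_fd(const char *name, int id, int fd)
{
    size_t len = strlen(name) + 1;
    SavedFd *elem = calloc(1, sizeof(*elem));

    if (!elem || !(elem->name = malloc(len))) {
        abort();                    /* this sketch treats OOM as fatal */
    }
    memcpy(elem->name, name, len);
    elem->id = id;
    elem->fd = fd;
    elem->next = saved_fds;
    saved_fds = elem;
}

/* Return the link that points at the matching entry, or NULL. */
static SavedFd **lookup(const char *name, int id)
{
    for (SavedFd **pp = &saved_fds; *pp; pp = &(*pp)->next) {
        if ((*pp)->id == id && strcmp((*pp)->name, name) == 0) {
            return pp;
        }
    }
    return NULL;
}

static int find_fd(const char *name, int id)
{
    SavedFd **pp = lookup(name, id);

    return pp ? (*pp)->fd : -1;
}

static void delete_fd(const char *name, int id)
{
    SavedFd **pp = lookup(name, id);

    if (pp) {
        SavedFd *elem = *pp;

        *pp = elem->next;           /* unlink, then free */
        free(elem->name);
        free(elem);
    }
}
```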

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 include/migration/cpr.h |  25 ++++++
 migration/cpr.c         | 198 ++++++++++++++++++++++++++++++++++++++++++++++++
 migration/meson.build   |   1 +
 migration/migration.c   |   1 +
 migration/trace-events  |   7 ++
 5 files changed, 232 insertions(+)
 create mode 100644 include/migration/cpr.h
 create mode 100644 migration/cpr.c

diff --git a/include/migration/cpr.h b/include/migration/cpr.h
new file mode 100644
index 0000000..d9364f7
--- /dev/null
+++ b/include/migration/cpr.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2021, 2024 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef MIGRATION_CPR_H
+#define MIGRATION_CPR_H
+
+#include "qapi/qapi-types-migration.h"
+
+#define QEMU_CPR_FILE_MAGIC     0x51435052
+#define QEMU_CPR_FILE_VERSION   0x00000001
+
+void cpr_save_fd(const char *name, int id, int fd);
+void cpr_delete_fd(const char *name, int id);
+int cpr_find_fd(const char *name, int id);
+
+int cpr_state_save(MigrationChannel *channel, Error **errp);
+int cpr_state_load(MigrationChannel *channel, Error **errp);
+void cpr_state_close(void);
+struct QIOChannel *cpr_state_ioc(void);
+
+#endif
diff --git a/migration/cpr.c b/migration/cpr.c
new file mode 100644
index 0000000..87bcfdb
--- /dev/null
+++ b/migration/cpr.c
@@ -0,0 +1,198 @@
+/*
+ * Copyright (c) 2021-2024 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "migration/cpr.h"
+#include "migration/misc.h"
+#include "migration/options.h"
+#include "migration/qemu-file.h"
+#include "migration/savevm.h"
+#include "migration/vmstate.h"
+#include "system/runstate.h"
+#include "trace.h"
+
+/*************************************************************************/
+/* cpr state container for all information to be saved. */
+
+typedef QLIST_HEAD(CprFdList, CprFd) CprFdList;
+
+typedef struct CprState {
+    CprFdList fds;
+} CprState;
+
+static CprState cpr_state;
+
+/****************************************************************************/
+
+typedef struct CprFd {
+    char *name;
+    unsigned int namelen;
+    int id;
+    int fd;
+    QLIST_ENTRY(CprFd) next;
+} CprFd;
+
+static const VMStateDescription vmstate_cpr_fd = {
+    .name = "cpr fd",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(namelen, CprFd),
+        VMSTATE_VBUFFER_ALLOC_UINT32(name, CprFd, 0, NULL, namelen),
+        VMSTATE_INT32(id, CprFd),
+        VMSTATE_INT32(fd, CprFd),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+void cpr_save_fd(const char *name, int id, int fd)
+{
+    CprFd *elem = g_new0(CprFd, 1);
+
+    trace_cpr_save_fd(name, id, fd);
+    elem->name = g_strdup(name);
+    elem->namelen = strlen(name) + 1;
+    elem->id = id;
+    elem->fd = fd;
+    QLIST_INSERT_HEAD(&cpr_state.fds, elem, next);
+}
+
+static CprFd *find_fd(CprFdList *head, const char *name, int id)
+{
+    CprFd *elem;
+
+    QLIST_FOREACH(elem, head, next) {
+        if (!strcmp(elem->name, name) && elem->id == id) {
+            return elem;
+        }
+    }
+    return NULL;
+}
+
+void cpr_delete_fd(const char *name, int id)
+{
+    CprFd *elem = find_fd(&cpr_state.fds, name, id);
+
+    if (elem) {
+        QLIST_REMOVE(elem, next);
+        g_free(elem->name);
+        g_free(elem);
+    }
+
+    trace_cpr_delete_fd(name, id);
+}
+
+int cpr_find_fd(const char *name, int id)
+{
+    CprFd *elem = find_fd(&cpr_state.fds, name, id);
+    int fd = elem ? elem->fd : -1;
+
+    trace_cpr_find_fd(name, id, fd);
+    return fd;
+}
+/*************************************************************************/
+#define CPR_STATE "CprState"
+
+static const VMStateDescription vmstate_cpr_state = {
+    .name = CPR_STATE,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_QLIST_V(fds, CprState, 1, vmstate_cpr_fd, CprFd, next),
+        VMSTATE_END_OF_LIST()
+    }
+};
+/*************************************************************************/
+
+static QEMUFile *cpr_state_file;
+
+QIOChannel *cpr_state_ioc(void)
+{
+    return qemu_file_get_ioc(cpr_state_file);
+}
+
+int cpr_state_save(MigrationChannel *channel, Error **errp)
+{
+    int ret;
+    QEMUFile *f;
+    MigMode mode = migrate_mode();
+
+    trace_cpr_state_save(MigMode_str(mode));
+
+    /* set f based on mode in a later patch in this series */
+    return 0;
+
+    qemu_put_be32(f, QEMU_CPR_FILE_MAGIC);
+    qemu_put_be32(f, QEMU_CPR_FILE_VERSION);
+
+    ret = vmstate_save_state(f, &vmstate_cpr_state, &cpr_state, 0);
+    if (ret) {
+        error_setg(errp, "vmstate_save_state error %d", ret);
+        qemu_fclose(f);
+        return ret;
+    }
+
+    /*
+     * Close the socket only partially so we can later detect when the other
+     * end closes by getting a HUP event.
+     */
+    qemu_fflush(f);
+    qio_channel_shutdown(qemu_file_get_ioc(f), QIO_CHANNEL_SHUTDOWN_WRITE,
+                         NULL);
+    cpr_state_file = f;
+    return 0;
+}
+
+int cpr_state_load(MigrationChannel *channel, Error **errp)
+{
+    int ret;
+    uint32_t v;
+    QEMUFile *f;
+    MigMode mode = 0;
+
+    /* set f and mode based on other parameters later in this patch series */
+    return 0;
+
+    trace_cpr_state_load(MigMode_str(mode));
+
+    v = qemu_get_be32(f);
+    if (v != QEMU_CPR_FILE_MAGIC) {
+        error_setg(errp, "Not a migration stream (bad magic %x)", v);
+        qemu_fclose(f);
+        return -EINVAL;
+    }
+    v = qemu_get_be32(f);
+    if (v != QEMU_CPR_FILE_VERSION) {
+        error_setg(errp, "Unsupported migration stream version %d", v);
+        qemu_fclose(f);
+        return -ENOTSUP;
+    }
+
+    ret = vmstate_load_state(f, &vmstate_cpr_state, &cpr_state, 1);
+    if (ret) {
+        error_setg(errp, "vmstate_load_state error %d", ret);
+        qemu_fclose(f);
+        return ret;
+    }
+
+    /*
+     * Let the caller decide when to close the socket (and generate a HUP event
+     * for the sending side).
+     */
+    cpr_state_file = f;
+
+    return ret;
+}
+
+void cpr_state_close(void)
+{
+    if (cpr_state_file) {
+        qemu_fclose(cpr_state_file);
+        cpr_state_file = NULL;
+    }
+}
diff --git a/migration/meson.build b/migration/meson.build
index dac687e..1eb8c96 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -14,6 +14,7 @@ system_ss.add(files(
   'block-active.c',
   'channel.c',
   'channel-block.c',
+  'cpr.c',
   'cpu-throttle.c',
   'dirtyrate.c',
   'exec.c',
diff --git a/migration/migration.c b/migration/migration.c
index 2d1da91..fce7b22 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -27,6 +27,7 @@
 #include "system/cpu-throttle.h"
 #include "rdma.h"
 #include "ram.h"
+#include "migration/cpr.h"
 #include "migration/global_state.h"
 #include "migration/misc.h"
 #include "migration.h"
diff --git a/migration/trace-events b/migration/trace-events
index b82a1c5..4e3061b 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -342,6 +342,13 @@ colo_receive_message(const char *msg) "Receive '%s' message"
 # colo-failover.c
 colo_failover_set_state(const char *new_state) "new state %s"
 
+# cpr.c
+cpr_save_fd(const char *name, int id, int fd) "%s, id %d, fd %d"
+cpr_delete_fd(const char *name, int id) "%s, id %d"
+cpr_find_fd(const char *name, int id, int fd) "%s, id %d returns %d"
+cpr_state_save(const char *mode) "%s mode"
+cpr_state_load(const char *mode) "%s mode"
+
 # block-dirty-bitmap.c
 send_bitmap_header_enter(void) ""
 send_bitmap_bits(uint32_t flags, uint64_t start_sector, uint32_t nr_sectors, uint64_t data_size) "flags: 0x%x, start_sector: %" PRIu64 ", nr_sectors: %" PRIu32 ", data_size: %" PRIu64
-- 
1.8.3.1




* [PATCH V7 08/24] physmem: preserve ram blocks for cpr
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (6 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 07/24] migration: cpr-state Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 09/24] hostmem-memfd: preserve " Steve Sistare
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Save the memfd for ramblocks in CPR state, along with a name that
uniquely identifies it.  The block's idstr is not yet set, so it
cannot be used for this purpose.  Find the saved memfd in new QEMU when
creating a block.  If the size of a resizable block is larger in new QEMU,
extend it via the file_ram_alloc truncate parameter, and the extra space
will be usable after a guest reset.
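A sketch of the naming and reuse logic under simplifying assumptions: a fixed-size array stands in for the CPR fd table, and a counter (next_fake_fd) stands in for memfd_create.  As in cpr_name(), the key is "<device path>/<region name>" when the region has an owning device, otherwise just the region name; make_cpr_name, get_or_create_fd and the example names are hypothetical.  In real use the second lookup would happen in new QEMU, after cpr_state_load has restored the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Stand-in for the CPR fd table: name -> fd (id fixed at 0). */
static struct { char name[128]; int fd; } table[MAX_ENTRIES];
static int n_entries;
static int next_fake_fd = 100;  /* hypothetical stand-in for memfd_create */

/* Build the unique key; dev_path is NULL when no device owns the region. */
static void make_cpr_name(char *buf, size_t len, const char *dev_path,
                          const char *mr_name)
{
    if (dev_path) {
        snprintf(buf, len, "%s/%s", dev_path, mr_name);
    } else {
        snprintf(buf, len, "%s", mr_name);
    }
}

/* Reuse a preserved fd if one was saved under this name, else create one. */
static int get_or_create_fd(const char *name, bool *reused)
{
    for (int i = 0; i < n_entries; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *reused = true;
            return table[i].fd;
        }
    }
    assert(n_entries < MAX_ENTRIES);
    int fd = next_fake_fd++;
    snprintf(table[n_entries].name, sizeof(table[n_entries].name), "%s", name);
    table[n_entries].fd = fd;
    n_entries++;
    *reused = false;
    return fd;
}
```

In the patch, reused is passed as the grow argument of qemu_ram_alloc_from_fd, which per the commit message lets a preserved region be extended when new QEMU asks for a larger max size.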

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 system/physmem.c | 44 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index cb80ce3..67c9db9 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -70,6 +70,7 @@
 
 #include "qemu/pmem.h"
 
+#include "migration/cpr.h"
 #include "migration/vmstate.h"
 
 #include "qemu/range.h"
@@ -1661,6 +1662,18 @@ void qemu_ram_unset_idstr(RAMBlock *block)
     }
 }
 
+static char *cpr_name(MemoryRegion *mr)
+{
+    const char *mr_name = memory_region_name(mr);
+    g_autofree char *id = mr->dev ? qdev_get_dev_path(mr->dev) : NULL;
+
+    if (id) {
+        return g_strdup_printf("%s/%s", id, mr_name);
+    } else {
+        return g_strdup(mr_name);
+    }
+}
+
 size_t qemu_ram_pagesize(RAMBlock *rb)
 {
     return rb->page_size;
@@ -2080,15 +2093,25 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
  * shared with another process if CPR is being used.  Use memfd if available
  * because it has no size limits, else use POSIX shm.
  */
-static int qemu_ram_get_shared_fd(const char *name, Error **errp)
+static int qemu_ram_get_shared_fd(const char *name, bool *reused, Error **errp)
 {
-    int fd;
+    int fd = cpr_find_fd(name, 0);
+
+    if (fd >= 0) {
+        *reused = true;
+        return fd;
+    }
 
     if (qemu_memfd_check(0)) {
         fd = qemu_memfd_create(name, 0, 0, 0, 0, errp);
     } else {
         fd = qemu_shm_alloc(0, errp);
     }
+
+    if (fd >= 0) {
+        cpr_save_fd(name, 0, fd);
+    }
+    *reused = false;
     return fd;
 }
 #endif
@@ -2118,8 +2141,9 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
             ram_flags |= RAM_SHARED;
         }
         if (ram_flags & RAM_SHARED) {
-            const char *name = memory_region_name(mr);
-            int fd = qemu_ram_get_shared_fd(name, errp);
+            bool reused;
+            g_autofree char *name = cpr_name(mr);
+            int fd = qemu_ram_get_shared_fd(name, &reused, errp);
 
             if (fd < 0) {
                 return NULL;
@@ -2133,9 +2157,14 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
              * fd is not supported, but previous QEMU versions that called
              * qemu_anon_ram_alloc for anonymous shared memory could have
              * succeeded.  Quietly fail and fall back.
+             *
+             * After cpr-transfer, new QEMU could create a memory region
+             * with a larger max size than old, so pass reused to grow the
+             * region if necessary.  The extra space will be usable after a
+             * guest reset.
              */
             new_block = qemu_ram_alloc_from_fd(size, max_size, resized, mr,
-                                               ram_flags, fd, 0, false, NULL);
+                                               ram_flags, fd, 0, reused, NULL);
             if (new_block) {
                 trace_qemu_ram_alloc_shared(name, new_block->used_length,
                                             new_block->max_length, fd,
@@ -2143,6 +2172,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
                 return new_block;
             }
 
+            cpr_delete_fd(name, 0);
             close(fd);
             /* fall back to anon allocation */
         }
@@ -2221,6 +2251,8 @@ static void reclaim_ramblock(RAMBlock *block)
 
 void qemu_ram_free(RAMBlock *block)
 {
+    g_autofree char *name = NULL;
+
     if (!block) {
         return;
     }
@@ -2231,6 +2263,8 @@ void qemu_ram_free(RAMBlock *block)
     }
 
     qemu_mutex_lock_ramlist();
+    name = cpr_name(block->mr);
+    cpr_delete_fd(name, 0);
     QLIST_REMOVE_RCU(block, next);
     ram_list.mru_block = NULL;
     /* Write list before version */
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH V7 09/24] hostmem-memfd: preserve for cpr
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (7 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 08/24] physmem: preserve ram blocks for cpr Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 10/24] hostmem-shm: " Steve Sistare
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Preserve memory-backend-memfd memory objects during cpr-transfer.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 backends/hostmem-memfd.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index 1672da9..85daa14 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -17,6 +17,7 @@
 #include "qemu/module.h"
 #include "qapi/error.h"
 #include "qom/object.h"
+#include "migration/cpr.h"
 
 OBJECT_DECLARE_SIMPLE_TYPE(HostMemoryBackendMemfd, MEMORY_BACKEND_MEMFD)
 
@@ -33,15 +34,19 @@ static bool
 memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 {
     HostMemoryBackendMemfd *m = MEMORY_BACKEND_MEMFD(backend);
-    g_autofree char *name = NULL;
+    g_autofree char *name = host_memory_backend_get_name(backend);
+    int fd = cpr_find_fd(name, 0);
     uint32_t ram_flags;
-    int fd;
 
     if (!backend->size) {
         error_setg(errp, "can't create backend with size 0");
         return false;
     }
 
+    if (fd >= 0) {
+        goto have_fd;
+    }
+
     fd = qemu_memfd_create(TYPE_MEMORY_BACKEND_MEMFD, backend->size,
                            m->hugetlb, m->hugetlbsize, m->seal ?
                            F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL : 0,
@@ -49,9 +54,10 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     if (fd == -1) {
         return false;
     }
+    cpr_save_fd(name, 0, fd);
 
+have_fd:
     backend->aligned = true;
-    name = host_memory_backend_get_name(backend);
     ram_flags = backend->share ? RAM_SHARED : RAM_PRIVATE;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
     ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
-- 
1.8.3.1




* [PATCH V7 10/24] hostmem-shm: preserve for cpr
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (8 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 09/24] hostmem-memfd: preserve " Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 11/24] migration: enhance migrate_uri_parse Steve Sistare
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Preserve memory-backend-shm memory objects during cpr-transfer.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 backends/hostmem-shm.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/backends/hostmem-shm.c b/backends/hostmem-shm.c
index fabee41..f67ad27 100644
--- a/backends/hostmem-shm.c
+++ b/backends/hostmem-shm.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "system/hostmem.h"
 #include "qapi/error.h"
+#include "migration/cpr.h"
 
 #define TYPE_MEMORY_BACKEND_SHM "memory-backend-shm"
 
@@ -25,9 +26,9 @@ struct HostMemoryBackendShm {
 static bool
 shm_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 {
-    g_autofree char *backend_name = NULL;
+    g_autofree char *backend_name = host_memory_backend_get_name(backend);
     uint32_t ram_flags;
-    int fd;
+    int fd = cpr_find_fd(backend_name, 0);
 
     if (!backend->size) {
         error_setg(errp, "can't create shm backend with size 0");
@@ -39,13 +40,18 @@ shm_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
         return false;
     }
 
+    if (fd >= 0) {
+        goto have_fd;
+    }
+
     fd = qemu_shm_alloc(backend->size, errp);
     if (fd < 0) {
         return false;
     }
+    cpr_save_fd(backend_name, 0, fd);
 
+have_fd:
     /* Let's do the same as memory-backend-ram,share=on would do. */
-    backend_name = host_memory_backend_get_name(backend);
     ram_flags = RAM_SHARED;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
 
-- 
1.8.3.1




* [PATCH V7 11/24] migration: enhance migrate_uri_parse
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (9 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 10/24] hostmem-shm: " Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 12/24] migration: incoming channel Steve Sistare
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Export migrate_uri_parse for use outside migration internals, and define
a function migrate_is_uri that indicates when migrate_uri_parse should
be used.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 include/migration/misc.h |  7 +++++++
 migration/migration.c    | 11 +++++++++++
 migration/migration.h    |  2 --
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 67f7ef7..c660be8 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -108,4 +108,11 @@ bool migration_in_bg_snapshot(void);
 bool migration_block_activate(Error **errp);
 bool migration_block_inactivate(void);
 
+/* True if @uri starts with a syntactically valid URI prefix */
+bool migrate_is_uri(const char *uri);
+
+/* Parse @uri and return @channel, returning true on success */
+bool migrate_uri_parse(const char *uri, MigrationChannel **channel,
+                       Error **errp);
+
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index fce7b22..b5ee98e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -14,6 +14,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/ctype.h"
 #include "qemu/cutils.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
@@ -587,6 +588,16 @@ void migrate_add_address(SocketAddress *address)
                       QAPI_CLONE(SocketAddress, address));
 }
 
+bool migrate_is_uri(const char *uri)
+{
+    while (*uri && *uri != ':') {
+        if (!qemu_isalpha(*uri++)) {
+            return false;
+        }
+    }
+    return *uri == ':';
+}
+
 bool migrate_uri_parse(const char *uri, MigrationChannel **channel,
                        Error **errp)
 {
diff --git a/migration/migration.h b/migration/migration.h
index 0df2a18..1d4d4e9 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -519,8 +519,6 @@ bool check_dirty_bitmap_mig_alias_map(const BitmapMigrationNodeAliasList *bbm,
                                       Error **errp);
 
 void migrate_add_address(SocketAddress *address);
-bool migrate_uri_parse(const char *uri, MigrationChannel **channel,
-                       Error **errp);
 int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
 
 #define qemu_ram_foreach_block \
-- 
1.8.3.1




* [PATCH V7 12/24] migration: incoming channel
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (10 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 11/24] migration: enhance migrate_uri_parse Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 13/24] migration: SCM_RIGHTS for QEMUFile Steve Sistare
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Extend the -incoming option to allow a MigrationChannel to be specified.
This allows channels other than 'main' to be described on the command
line, which CPR needs.
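This patch applies two different policies to channel lists. On the command line, a later -incoming for the same channel type replaces the earlier one (incoming_option_parse). qmp_migrate instead rejects a list with duplicate types and requires a 'main' entry. The sketch below models both with a table indexed by channel type. ChanType, Chan, cmdline_add and qmp_check are hypothetical simplifications of MigrationChannelType, MigrationChannel and the QEMU functions, with the address reduced to a string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplifications of MigrationChannelType and MigrationChannel. */
typedef enum { CHAN_MAIN, CHAN_CPR, CHAN__MAX } ChanType;
typedef struct {
    ChanType type;
    const char *addr;
} Chan;

/*
 * Command line (cf. incoming_option_parse): a later channel of the same
 * type replaces the earlier one.
 */
static void cmdline_add(const Chan *slots[CHAN__MAX], const Chan *c)
{
    assert(c->type < CHAN__MAX);
    slots[c->type] = c;
}

/*
 * QMP (cf. qmp_migrate): duplicate types are an error and a main entry is
 * required.  Returns the main channel, or NULL with *err set.
 */
static const Chan *qmp_check(const Chan *list, size_t n, const char **err)
{
    const Chan *slots[CHAN__MAX] = { NULL };

    for (size_t i = 0; i < n; i++) {
        if (slots[list[i].type]) {
            *err = "Channel list has more than one entry of the same type";
            return NULL;
        }
        slots[list[i].type] = &list[i];
    }
    /* Test the slot itself; dereferencing a NULL slot would crash. */
    if (!slots[CHAN_MAIN]) {
        *err = "Channel list has no main entry";
        return NULL;
    }
    return slots[CHAN_MAIN];
}
```

The command line gets a documented override rule because options are often appended by wrapper scripts, while a QMP list is a single request, so an ambiguous list is treated as an error.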

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 21 ++++++++++++++++-----
 qemu-options.hx       | 21 +++++++++++++++++++++
 system/vl.c           | 36 +++++++++++++++++++++++++++++++++---
 3 files changed, 70 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index b5ee98e..5f2540f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -695,7 +695,8 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
     if (channels) {
         /* To verify that Migrate channel list has only item */
         if (channels->next) {
-            error_setg(errp, "Channel list has more than one entries");
+            error_setg(errp, "Channel list must have only one entry, "
+                             "for type 'main'");
             return;
         }
         addr = channels->value->addr;
@@ -2054,6 +2055,7 @@ void qmp_migrate(const char *uri, bool has_channels,
     MigrationState *s = migrate_get_current();
     g_autoptr(MigrationChannel) channel = NULL;
     MigrationAddress *addr = NULL;
+    MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL };
 
     /*
      * Having preliminary checks for uri and channel
@@ -2064,12 +2066,21 @@ void qmp_migrate(const char *uri, bool has_channels,
     }
 
     if (channels) {
-        /* To verify that Migrate channel list has only item */
-        if (channels->next) {
-            error_setg(errp, "Channel list has more than one entries");
+        for ( ; channels; channels = channels->next) {
+            MigrationChannelType type = channels->value->channel_type;
+
+            if (channelv[type]) {
+                error_setg(errp, "Channel list has more than one %s entry",
+                           MigrationChannelType_str(type));
+                return;
+            }
+            channelv[type] = channels->value;
+        }
+        if (!channelv[MIGRATION_CHANNEL_TYPE_MAIN]) {
+            error_setg(errp, "Channel list has no main entry");
             return;
         }
-        addr = channels->value->addr;
+        addr = channelv[MIGRATION_CHANNEL_TYPE_MAIN]->addr;
     }
 
     if (uri) {
diff --git a/qemu-options.hx b/qemu-options.hx
index 90fad31..3d1af73 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4940,10 +4940,18 @@ DEF("incoming", HAS_ARG, QEMU_OPTION_incoming, \
     "-incoming exec:cmdline\n" \
     "                accept incoming migration on given file descriptor\n" \
     "                or from given external command\n" \
+    "-incoming <channel>\n" \
+    "                accept incoming migration on the migration channel\n" \
     "-incoming defer\n" \
     "                wait for the URI to be specified via migrate_incoming\n",
     QEMU_ARCH_ALL)
 SRST
+The -incoming option specifies the migration channel for an incoming
+migration.  It may be used multiple times to specify multiple
+migration channel types.  The channel type is specified in <channel>,
+or is 'main' for all other forms of -incoming.  If multiple -incoming
+options are specified for a channel type, the last one takes precedence.
+
 ``-incoming tcp:[host]:port[,to=maxport][,ipv4=on|off][,ipv6=on|off]``
   \ 
 ``-incoming rdma:host:port[,ipv4=on|off][,ipv6=on|off]``
@@ -4963,6 +4971,19 @@ SRST
     Accept incoming migration as an output from specified external
     command.
 
+``-incoming <channel>``
+    Accept incoming migration on the migration channel.  For the syntax
+    of <channel>, see the QAPI documentation of ``MigrationChannel``.
+    Examples:
+    ::
+
+        -incoming '{"channel-type": "main",
+                    "addr": { "transport": "socket",
+                              "type": "unix",
+                              "path": "my.sock" }}'
+
+        -incoming main,addr.transport=socket,addr.type=unix,addr.path=my.sock
+
 ``-incoming defer``
     Wait for the URI to be specified via migrate\_incoming. The monitor
     can be used to change settings (such as migration parameters) prior
diff --git a/system/vl.c b/system/vl.c
index 61285ad..251efa0 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -123,6 +123,7 @@
 #include "qapi/qapi-visit-block-core.h"
 #include "qapi/qapi-visit-compat.h"
 #include "qapi/qapi-visit-machine.h"
+#include "qapi/qapi-visit-migration.h"
 #include "qapi/qapi-visit-ui.h"
 #include "qapi/qapi-commands-block-core.h"
 #include "qapi/qapi-commands-migration.h"
@@ -159,6 +160,8 @@ typedef struct DeviceOption {
 static const char *cpu_option;
 static const char *mem_path;
 static const char *incoming;
+static const char *incoming_str[MIGRATION_CHANNEL_TYPE__MAX];
+static MigrationChannel *incoming_channels[MIGRATION_CHANNEL_TYPE__MAX];
 static const char *loadvm;
 static const char *accelerators;
 static bool have_custom_ram_size;
@@ -1822,6 +1825,30 @@ static void object_option_add_visitor(Visitor *v)
     QTAILQ_INSERT_TAIL(&object_opts, opt, next);
 }
 
+static void incoming_option_parse(const char *str)
+{
+    MigrationChannelType type = MIGRATION_CHANNEL_TYPE_MAIN;
+    MigrationChannel *channel;
+    Visitor *v;
+
+    if (!strcmp(str, "defer")) {
+        channel = NULL;
+    } else if (migrate_is_uri(str)) {
+        migrate_uri_parse(str, &channel, &error_fatal);
+    } else {
+        v = qobject_input_visitor_new_str(str, "channel-type", &error_fatal);
+        visit_type_MigrationChannel(v, NULL, &channel, &error_fatal);
+        visit_free(v);
+        type = channel->channel_type;
+    }
+
+    /* New incoming spec replaces the previous */
+    qapi_free_MigrationChannel(incoming_channels[type]);
+    incoming_channels[type] = channel;
+    incoming_str[type] = str;
+    incoming = incoming_str[MIGRATION_CHANNEL_TYPE_MAIN];
+}
+
 static void object_option_parse(const char *str)
 {
     QemuOpts *opts;
@@ -2753,8 +2780,11 @@ void qmp_x_exit_preconfig(Error **errp)
     if (incoming) {
         Error *local_err = NULL;
         if (strcmp(incoming, "defer") != 0) {
-            qmp_migrate_incoming(incoming, false, NULL, true, true,
-                                 &local_err);
+            g_autofree MigrationChannelList *channels =
+                g_new0(MigrationChannelList, 1);
+
+            channels->value = incoming_channels[MIGRATION_CHANNEL_TYPE_MAIN];
+            qmp_migrate_incoming(NULL, true, channels, true, true, &local_err);
             if (local_err) {
                 error_reportf_err(local_err, "-incoming %s: ", incoming);
                 exit(1);
@@ -3503,7 +3533,7 @@ void qemu_init(int argc, char **argv)
                 if (!incoming) {
                     runstate_set(RUN_STATE_INMIGRATE);
                 }
-                incoming = optarg;
+                incoming_option_parse(optarg);
                 break;
             case QEMU_OPTION_only_migratable:
                 only_migratable = 1;
-- 
1.8.3.1




* [PATCH V7 13/24] migration: SCM_RIGHTS for QEMUFile
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (11 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 12/24] migration: incoming channel Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 14/24] migration: VMSTATE_FD Steve Sistare
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Define functions to put/get file descriptors to/from a QEMUFile, for qio
channels that support SCM_RIGHTS.  Maintain ordering such that
  put(A), put(fd), put(B)
followed by
  get(A), get(fd), get(B)
always succeeds.  Other get orderings may succeed but are not guaranteed.
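The ordering guarantee follows from attaching each fd to exactly one dummy data byte, which gives the fd a definite position in the byte stream. The sketch below shows the same idea with raw POSIX sendmsg/recvmsg on a unix socketpair, independent of QIOChannel. send_fd and recv_fd are hypothetical names. The sketch assumes the receiver reads the fd's byte on its own with a control buffer; the patch instead reads with qio_channel_readv_full into its buffer and queues any received fds, so it needs no byte-at-a-time reads.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send @fd over unix socket @sock, attached to one dummy byte. */
static int send_fd(int sock, int fd)
{
    char dummy = ' ';
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        struct cmsghdr align;                 /* forces cmsg alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = { 0 };

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Read one byte; return the fd riding with it, or -1 if there is none. */
static int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = { 0 };
    int fd = -1;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
            memcpy(&fd, CMSG_DATA(c), sizeof(int));
        }
    }
    return fd;
}
```

On Linux, if the byte carrying the fd is consumed by a plain read() with no control buffer, the descriptor is discarded. That is why qemu_fill_buffer switches from qio_channel_read to qio_channel_readv_full whenever the channel supports fd passing.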

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 migration/qemu-file.c  | 84 +++++++++++++++++++++++++++++++++++++++++++++++---
 migration/qemu-file.h  |  2 ++
 migration/trace-events |  2 ++
 3 files changed, 84 insertions(+), 4 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index b6d2f58..1303a5b 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -37,6 +37,11 @@
 #define IO_BUF_SIZE 32768
 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
 
+typedef struct FdEntry {
+    QTAILQ_ENTRY(FdEntry) entry;
+    int fd;
+} FdEntry;
+
 struct QEMUFile {
     QIOChannel *ioc;
     bool is_writable;
@@ -51,6 +56,9 @@ struct QEMUFile {
 
     int last_error;
     Error *last_error_obj;
+
+    bool can_pass_fd;
+    QTAILQ_HEAD(, FdEntry) fds;
 };
 
 /*
@@ -109,6 +117,8 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
     object_ref(ioc);
     f->ioc = ioc;
     f->is_writable = is_writable;
+    f->can_pass_fd = qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS);
+    QTAILQ_INIT(&f->fds);
 
     return f;
 }
@@ -310,6 +320,10 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
     int len;
     int pending;
     Error *local_error = NULL;
+    g_autofree int *fds = NULL;
+    size_t nfd = 0;
+    int **pfds = f->can_pass_fd ? &fds : NULL;
+    size_t *pnfd = f->can_pass_fd ? &nfd : NULL;
 
     assert(!qemu_file_is_writable(f));
 
@@ -325,10 +339,9 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
     }
 
     do {
-        len = qio_channel_read(f->ioc,
-                               (char *)f->buf + pending,
-                               IO_BUF_SIZE - pending,
-                               &local_error);
+        struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
+        len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
+                                     &local_error);
         if (len == QIO_CHANNEL_ERR_BLOCK) {
             if (qemu_in_coroutine()) {
                 qio_channel_yield(f->ioc, G_IO_IN);
@@ -348,9 +361,66 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
         qemu_file_set_error_obj(f, len, local_error);
     }
 
+    for (int i = 0; i < nfd; i++) {
+        FdEntry *fde = g_new0(FdEntry, 1);
+        fde->fd = fds[i];
+        QTAILQ_INSERT_TAIL(&f->fds, fde, entry);
+    }
+
     return len;
 }
 
+int qemu_file_put_fd(QEMUFile *f, int fd)
+{
+    int ret = 0;
+    QIOChannel *ioc = qemu_file_get_ioc(f);
+    Error *err = NULL;
+    struct iovec iov = { (void *)" ", 1 };
+
+    /*
+     * Send a dummy byte so qemu_fill_buffer on the receiving side does not
+     * fail with a len=0 error.  Flush first to maintain ordering wrt other
+     * data.
+     */
+
+    qemu_fflush(f);
+    if (qio_channel_writev_full(ioc, &iov, 1, &fd, 1, 0, &err) < 1) {
+        error_report_err(error_copy(err));
+        qemu_file_set_error_obj(f, -EIO, err);
+        ret = -1;
+    }
+    trace_qemu_file_put_fd(f->ioc->name, fd, ret);
+    return ret;
+}
+
+int qemu_file_get_fd(QEMUFile *f)
+{
+    int fd = -1;
+    FdEntry *fde;
+
+    if (!f->can_pass_fd) {
+        Error *err = NULL;
+        error_setg(&err, "%s does not support fd passing", f->ioc->name);
+        error_report_err(error_copy(err));
+        qemu_file_set_error_obj(f, -EIO, err);
+        goto out;
+    }
+
+    /* Force the dummy byte and its fd passenger to appear. */
+    qemu_peek_byte(f, 0);
+
+    fde = QTAILQ_FIRST(&f->fds);
+    if (fde) {
+        qemu_get_byte(f);       /* Drop the dummy byte */
+        fd = fde->fd;
+        QTAILQ_REMOVE(&f->fds, fde, entry);
+        g_free(fde);
+    }
+out:
+    trace_qemu_file_get_fd(f->ioc->name, fd);
+    return fd;
+}
+
 /** Closes the file
  *
  * Returns negative error value if any error happened on previous operations or
@@ -361,11 +431,17 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
  */
 int qemu_fclose(QEMUFile *f)
 {
+    FdEntry *fde, *next;
     int ret = qemu_fflush(f);
     int ret2 = qio_channel_close(f->ioc, NULL);
     if (ret >= 0) {
         ret = ret2;
     }
+    QTAILQ_FOREACH_SAFE(fde, &f->fds, entry, next) {
+        warn_report("qemu_fclose: received fd %d was never claimed", fde->fd);
+        close(fde->fd);
+        g_free(fde);
+    }
     g_clear_pointer(&f->ioc, object_unref);
     error_free(f->last_error_obj);
     g_free(f);
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 11c2120..3e47a20 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -79,5 +79,7 @@ size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
                           off_t pos);
 
 QIOChannel *qemu_file_get_ioc(QEMUFile *file);
+int qemu_file_put_fd(QEMUFile *f, int fd);
+int qemu_file_get_fd(QEMUFile *f);
 
 #endif
diff --git a/migration/trace-events b/migration/trace-events
index 4e3061b..abd9cdf 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -88,6 +88,8 @@ put_qlist_end(const char *field_name, const char *vmsd_name) "%s(%s)"
 
 # qemu-file.c
 qemu_file_fclose(void) ""
+qemu_file_put_fd(const char *name, int fd, int ret) "ioc %s, fd %d -> status %d"
+qemu_file_get_fd(const char *name, int fd) "ioc %s -> fd %d"
 
 # ram.c
 get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
-- 
1.8.3.1




* [PATCH V7 14/24] migration: VMSTATE_FD
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (12 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 13/24] migration: SCM_RIGHTS for QEMUFile Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 15/24] migration: cpr-transfer save and load Steve Sistare
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Define VMSTATE_FD for declaring a file descriptor field in a
VMStateDescription.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 include/migration/vmstate.h |  9 +++++++++
 migration/vmstate-types.c   | 23 +++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index f313f2f..a1dfab4 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -230,6 +230,7 @@ extern const VMStateInfo vmstate_info_uint8;
 extern const VMStateInfo vmstate_info_uint16;
 extern const VMStateInfo vmstate_info_uint32;
 extern const VMStateInfo vmstate_info_uint64;
+extern const VMStateInfo vmstate_info_fd;
 
 /** Put this in the stream when migrating a null pointer.*/
 #define VMS_NULLPTR_MARKER (0x30U) /* '0' */
@@ -902,6 +903,9 @@ extern const VMStateInfo vmstate_info_qlist;
 #define VMSTATE_UINT64_V(_f, _s, _v)                                  \
     VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint64, uint64_t)
 
+#define VMSTATE_FD_V(_f, _s, _v)                                  \
+    VMSTATE_SINGLE(_f, _s, _v, vmstate_info_fd, int32_t)
+
 #ifdef CONFIG_LINUX
 
 #define VMSTATE_U8_V(_f, _s, _v)                                   \
@@ -936,6 +940,9 @@ extern const VMStateInfo vmstate_info_qlist;
 #define VMSTATE_UINT64(_f, _s)                                        \
     VMSTATE_UINT64_V(_f, _s, 0)
 
+#define VMSTATE_FD(_f, _s)                                            \
+    VMSTATE_FD_V(_f, _s, 0)
+
 #ifdef CONFIG_LINUX
 
 #define VMSTATE_U8(_f, _s)                                         \
@@ -1009,6 +1016,8 @@ extern const VMStateInfo vmstate_info_qlist;
 #define VMSTATE_UINT64_TEST(_f, _s, _t)                                  \
     VMSTATE_SINGLE_TEST(_f, _s, _t, 0, vmstate_info_uint64, uint64_t)
 
+#define VMSTATE_FD_TEST(_f, _s, _t)                                            \
+    VMSTATE_SINGLE_TEST(_f, _s, _t, 0, vmstate_info_fd, int32_t)
 
 #define VMSTATE_TIMER_PTR_TEST(_f, _s, _test)                             \
     VMSTATE_POINTER_TEST(_f, _s, _test, vmstate_info_timer, QEMUTimer *)
diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
index d70d573..0319c35 100644
--- a/migration/vmstate-types.c
+++ b/migration/vmstate-types.c
@@ -314,6 +314,29 @@ const VMStateInfo vmstate_info_uint64 = {
     .put  = put_uint64,
 };
 
+/* File descriptor communicated via SCM_RIGHTS */
+
+static int get_fd(QEMUFile *f, void *pv, size_t size,
+                  const VMStateField *field)
+{
+    int32_t *v = pv;
+    *v = qemu_file_get_fd(f);
+    return 0;
+}
+
+static int put_fd(QEMUFile *f, void *pv, size_t size,
+                  const VMStateField *field, JSONWriter *vmdesc)
+{
+    int32_t *v = pv;
+    return qemu_file_put_fd(f, *v);
+}
+
+const VMStateInfo vmstate_info_fd = {
+    .name = "fd",
+    .get  = get_fd,
+    .put  = put_fd,
+};
+
 static int get_nullptr(QEMUFile *f, void *pv, size_t size,
                        const VMStateField *field)
 
-- 
1.8.3.1




* [PATCH V7 15/24] migration: cpr-transfer save and load
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (13 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 14/24] migration: VMSTATE_FD Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 16/24] migration: cpr-transfer mode Steve Sistare
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Add functions that create a QEMUFile on a unix-domain socket described by
a MigrationChannel, for saving or loading.  cpr-transfer mode uses them to
preserve CPR state.
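cpr_transfer_output and cpr_transfer_input accept only a unix-domain socket address. The CPR channel must carry file descriptors, and SCM_RIGHTS works only on unix-domain sockets. The sketch below shows the two roles with plain POSIX sockets instead of QIOChannelSocket and QIONetListener. make_addr, listen_unix and connect_unix are hypothetical helpers, and unlinking a stale socket path before bind is a choice of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill a sockaddr_un for @path; returns -1 if the path does not fit. */
static int make_addr(struct sockaddr_un *sa, const char *path)
{
    memset(sa, 0, sizeof(*sa));
    sa->sun_family = AF_UNIX;
    if (strlen(path) >= sizeof(sa->sun_path)) {
        return -1;
    }
    strcpy(sa->sun_path, path);
    return 0;
}

/* Incoming side: listen with a backlog of 1, like the patch's listener. */
static int listen_unix(const char *path)
{
    struct sockaddr_un sa;
    int s;

    if (make_addr(&sa, path) < 0 || (s = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) {
        return -1;
    }
    unlink(path);   /* remove a stale socket file left by an earlier run */
    if (bind(s, (struct sockaddr *)&sa, sizeof(sa)) < 0 || listen(s, 1) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Outgoing side: connect synchronously. */
static int connect_unix(const char *path)
{
    struct sockaddr_un sa;
    int s;

    if (make_addr(&sa, path) < 0 || (s = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) {
        return -1;
    }
    if (connect(s, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

In the patch, the incoming side blocks in qio_net_listener_wait_client until a client connects, and the outgoing side connects synchronously. New QEMU's CPR listener must therefore already exist when the migrate command is issued to old QEMU.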

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 include/migration/cpr.h  |  3 ++
 migration/cpr-transfer.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++
 migration/meson.build    |  1 +
 migration/trace-events   |  2 ++
 4 files changed, 77 insertions(+)
 create mode 100644 migration/cpr-transfer.c

diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index d9364f7..c669b8b 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -22,4 +22,7 @@ int cpr_state_load(MigrationChannel *channel, Error **errp);
 void cpr_state_close(void);
 struct QIOChannel *cpr_state_ioc(void);
 
+QEMUFile *cpr_transfer_output(MigrationChannel *channel, Error **errp);
+QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp);
+
 #endif
diff --git a/migration/cpr-transfer.c b/migration/cpr-transfer.c
new file mode 100644
index 0000000..e1f1403
--- /dev/null
+++ b/migration/cpr-transfer.c
@@ -0,0 +1,71 @@
+/*
+ * Copyright (c) 2022, 2024 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "io/channel-file.h"
+#include "io/channel-socket.h"
+#include "io/net-listener.h"
+#include "migration/cpr.h"
+#include "migration/migration.h"
+#include "migration/savevm.h"
+#include "migration/qemu-file.h"
+#include "migration/vmstate.h"
+#include "trace.h"
+
+QEMUFile *cpr_transfer_output(MigrationChannel *channel, Error **errp)
+{
+    MigrationAddress *addr = channel->addr;
+
+    if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
+        addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX) {
+
+        g_autoptr(QIOChannelSocket) sioc = qio_channel_socket_new();
+        QIOChannel *ioc = QIO_CHANNEL(sioc);
+        SocketAddress *saddr = &addr->u.socket;
+
+        if (qio_channel_socket_connect_sync(sioc, saddr, errp) < 0) {
+            return NULL;
+        }
+        trace_cpr_transfer_output(addr->u.socket.u.q_unix.path);
+        qio_channel_set_name(ioc, "cpr-out");
+        return qemu_file_new_output(ioc);
+
+    } else {
+        error_setg(errp, "bad cpr channel address; must be unix");
+        return NULL;
+    }
+}
+
+QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp)
+{
+    MigrationAddress *addr = channel->addr;
+
+    if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
+        addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX) {
+
+        g_autoptr(QIOChannelSocket) sioc = NULL;
+        SocketAddress *saddr = &addr->u.socket;
+        g_autoptr(QIONetListener) listener = qio_net_listener_new();
+        QIOChannel *ioc;
+
+        qio_net_listener_set_name(listener, "cpr-socket-listener");
+        if (qio_net_listener_open_sync(listener, saddr, 1, errp) < 0) {
+            return NULL;
+        }
+
+        sioc = qio_net_listener_wait_client(listener);
+        ioc = QIO_CHANNEL(sioc);
+        trace_cpr_transfer_input(addr->u.socket.u.q_unix.path);
+        qio_channel_set_name(ioc, "cpr-in");
+        return qemu_file_new_input(ioc);
+
+    } else {
+        error_setg(errp, "bad cpr channel socket type; must be unix");
+        return NULL;
+    }
+}
diff --git a/migration/meson.build b/migration/meson.build
index 1eb8c96..d3bfe84 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -15,6 +15,7 @@ system_ss.add(files(
   'channel.c',
   'channel-block.c',
   'cpr.c',
+  'cpr-transfer.c',
   'cpu-throttle.c',
   'dirtyrate.c',
   'exec.c',
diff --git a/migration/trace-events b/migration/trace-events
index abd9cdf..e03a914 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -350,6 +350,8 @@ cpr_delete_fd(const char *name, int id) "%s, id %d"
 cpr_find_fd(const char *name, int id, int fd) "%s, id %d returns %d"
 cpr_state_save(const char *mode) "%s mode"
 cpr_state_load(const char *mode) "%s mode"
+cpr_transfer_input(const char *path) "%s"
+cpr_transfer_output(const char *path) "%s"
 
 # block-dirty-bitmap.c
 send_bitmap_header_enter(void) ""
-- 
1.8.3.1




* [PATCH V7 16/24] migration: cpr-transfer mode
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (14 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 15/24] migration: cpr-transfer save and load Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-29  6:23   ` Markus Armbruster
  2025-01-15 19:00 ` [PATCH V7 17/24] migration-test: memory_backend Steve Sistare
                   ` (9 subsequent siblings)
  25 siblings, 1 reply; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Add the cpr-transfer migration mode, which allows the user to transfer
a guest to a new QEMU instance on the same host with minimal guest pause
time, by preserving guest RAM in place, albeit with new virtual addresses
in new QEMU, and by preserving device file descriptors.  Pages that were
locked in memory for DMA in old QEMU remain locked in new QEMU, because the
descriptor of the device that locked them remains open.

cpr-transfer preserves memory and device descriptors by sending them to
new QEMU over a unix domain socket using SCM_RIGHTS.  Such CPR state cannot
be sent over the normal migration channel, because devices and backends
are created prior to reading the channel, so this mode sends CPR state
over a second "cpr" migration channel.  New QEMU reads the cpr channel
prior to creating devices or backends.  The user specifies the cpr channel
in the channel arguments on the outgoing side, and in a second -incoming
command-line parameter on the incoming side.
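
For reviewers less familiar with SCM_RIGHTS, the sketch below shows the
underlying POSIX mechanism that a UNIX-domain CPR channel relies on.  One
end sends a descriptor as ancillary data, and the kernel installs a new
descriptor referring to the same open file in the receiver.  It uses plain
sendmsg()/recvmsg() rather than QEMU's QIOChannel code, and the helper names
send_fd(), recv_fd() and pass_pipe_demo() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer sized and aligned for exactly one int (one fd). */
typedef union {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
} FdCmsgBuf;

/* Send fd over a UNIX-domain socket with a 1-byte payload; 0 on success. */
int send_fd(int sock, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    FdCmsgBuf u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns the newly installed descriptor, or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    FdCmsgBuf u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);
    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/*
 * Pass the write end of a pipe through a socketpair, write through the
 * received copy, and read the bytes back from the original read end.
 * Returns the number of bytes read into out, or -1 (error paths skip
 * cleanup for brevity).
 */
int pass_pipe_demo(char *out, size_t len)
{
    int sv[2], p[2], dup_fd;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(p) < 0) {
        return -1;
    }
    if (send_fd(sv[0], p[1]) < 0 || (dup_fd = recv_fd(sv[1])) < 0) {
        return -1;
    }
    close(p[1]);                 /* the received copy keeps the pipe open */
    if (write(dup_fd, "cpr", 3) != 3) {
        return -1;
    }
    close(dup_fd);
    n = read(p[0], out, len);
    close(p[0]);
    close(sv[0]);
    close(sv[1]);
    return (int)n;
}
```

In the real series both ends live in different processes, so the received
descriptor is the only way new QEMU can reach the same open file.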

The user must start old QEMU with the '-machine aux-ram-share=on' option,
which allows anonymous memory to be transferred in place to the new process
by transferring a memory descriptor for each ram block.  Memory-backend
objects must have the share=on attribute, but memory-backend-epc is not
supported.
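
To illustrate why RAM must be shared and fd-backed, the sketch below maps
one fd-backed region twice.  The first mapping stands in for old QEMU, and
the second stands in for the mapping new QEMU creates from the received
descriptor, at a different virtual address.  A MAP_PRIVATE anonymous
mapping has no descriptor to hand over.  The use of Linux memfd_create()
and the name shared_ram_demo() are assumptions for illustration only; this
is not QEMU's allocation code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one fd-backed shared region twice: old_va stands in for old QEMU's
 * mapping, new_va for the mapping new QEMU would create from the received
 * fd.  Returns 1 if data written via old_va is visible via a distinct
 * new_va after old_va and the original fd are gone, 0 if not, -1 on error.
 */
int shared_ram_demo(void)
{
    const size_t size = 4096;
    int fd = memfd_create("guest-ram-demo", 0);
    char *old_va, *new_va;
    int distinct, ok;

    if (fd < 0 || ftruncate(fd, (off_t)size) < 0) {
        return -1;
    }
    old_va = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (old_va == MAP_FAILED) {
        return -1;
    }
    strcpy(old_va, "guest data");       /* the guest writes its RAM */

    /* In cpr-transfer the fd would arrive over the CPR channel. */
    new_va = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (new_va == MAP_FAILED) {
        return -1;
    }
    distinct = (new_va != old_va);      /* same pages, new address */

    munmap(old_va, size);               /* old process goes away... */
    close(fd);                          /* ...but new_va keeps the pages */
    ok = distinct && strcmp(new_va, "guest data") == 0;
    munmap(new_va, size);
    return ok;
}
```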

The user starts new QEMU on the same host as old QEMU, with command-line
arguments to create the same machine, plus the -incoming option for the
main migration channel, like normal live migration.  In addition, the user
adds a second -incoming option with channel type "cpr".  This CPR channel
must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a
UNIX domain socket.

To initiate CPR, the user issues a migrate command to old QEMU, adding
a second migration channel of type "cpr" in the channels argument.
Old QEMU stops the VM, saves state to the migration channels, and enters
the postmigrate state.  New QEMU mmaps the received memory descriptors, and
execution resumes.

The implementation splits qmp_migrate into start and finish functions.
Start sends CPR state to new QEMU, which responds by closing the CPR
channel.  Old QEMU detects the HUP, then calls finish, which connects the
main migration channel.
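
The patch waits for the hang-up with a GLib watch created by
qio_channel_create_watch(ioc, G_IO_HUP).  As a rough, platform-level
illustration of the same condition, the sketch below uses plain poll() on a
socketpair.  It assumes Linux semantics, where a connected AF_UNIX stream
socket reports POLLHUP once its peer has closed.  The names
wait_for_peer_hup() and hup_demo() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for the peer to close its end of a connected
 * stream socket.  Returns 1 on hang-up, 0 on timeout, -1 on error.
 * POLLHUP is reported in revents even though no events are requested.
 */
int wait_for_peer_hup(int sock, int timeout_ms)
{
    struct pollfd pfd = { .fd = sock, .events = 0 };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0) {
        return -1;
    }
    return (rc > 0 && (pfd.revents & POLLHUP)) ? 1 : 0;
}

/*
 * The "new" side closes its end once it has consumed CPR state (not shown);
 * only then would the "old" side proceed to the finish step.
 */
int hup_demo(int *before, int *after)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    *before = wait_for_peer_hup(sv[0], 0);    /* peer still open: no HUP */
    close(sv[1]);                             /* new side closes channel */
    *after = wait_for_peer_hup(sv[0], 1000);
    close(sv[0]);
    return 0;
}
```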

In summary, the usage is:

  qemu-system-$arch -machine aux-ram-share=on ...

  start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"

  Issue commands to old QEMU:
    migrate_set_parameter mode cpr-transfer

    {"execute": "migrate", ...
        {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 include/migration/cpr.h   |   5 +++
 migration/cpr.c           |  36 +++++++++++++---
 migration/migration.c     | 106 +++++++++++++++++++++++++++++++++++++++++++++-
 migration/migration.h     |   2 +
 migration/options.c       |   8 +++-
 migration/ram.c           |   2 +
 migration/vmstate-types.c |   1 +
 qapi/migration.json       |  44 ++++++++++++++++++-
 qemu-options.hx           |   2 +
 stubs/vmstate.c           |   7 +++
 system/vl.c               |   7 +++
 11 files changed, 210 insertions(+), 10 deletions(-)

diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index c669b8b..3a6deb7 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -10,6 +10,8 @@
 
 #include "qapi/qapi-types-migration.h"
 
+#define MIG_MODE_NONE           -1
+
 #define QEMU_CPR_FILE_MAGIC     0x51435052
 #define QEMU_CPR_FILE_VERSION   0x00000001
 
@@ -17,6 +19,9 @@ void cpr_save_fd(const char *name, int id, int fd);
 void cpr_delete_fd(const char *name, int id);
 int cpr_find_fd(const char *name, int id);
 
+MigMode cpr_get_incoming_mode(void);
+void cpr_set_incoming_mode(MigMode mode);
+
 int cpr_state_save(MigrationChannel *channel, Error **errp);
 int cpr_state_load(MigrationChannel *channel, Error **errp);
 void cpr_state_close(void);
diff --git a/migration/cpr.c b/migration/cpr.c
index 87bcfdb..584b0b9 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -45,7 +45,7 @@ static const VMStateDescription vmstate_cpr_fd = {
         VMSTATE_UINT32(namelen, CprFd),
         VMSTATE_VBUFFER_ALLOC_UINT32(name, CprFd, 0, NULL, namelen),
         VMSTATE_INT32(id, CprFd),
-        VMSTATE_INT32(fd, CprFd),
+        VMSTATE_FD(fd, CprFd),
         VMSTATE_END_OF_LIST()
     }
 };
@@ -116,6 +116,18 @@ QIOChannel *cpr_state_ioc(void)
     return qemu_file_get_ioc(cpr_state_file);
 }
 
+static MigMode incoming_mode = MIG_MODE_NONE;
+
+MigMode cpr_get_incoming_mode(void)
+{
+    return incoming_mode;
+}
+
+void cpr_set_incoming_mode(MigMode mode)
+{
+    incoming_mode = mode;
+}
+
 int cpr_state_save(MigrationChannel *channel, Error **errp)
 {
     int ret;
@@ -124,8 +136,14 @@ int cpr_state_save(MigrationChannel *channel, Error **errp)
 
     trace_cpr_state_save(MigMode_str(mode));
 
-    /* set f based on mode in a later patch in this series */
-    return 0;
+    if (mode == MIG_MODE_CPR_TRANSFER) {
+        f = cpr_transfer_output(channel, errp);
+    } else {
+        return 0;
+    }
+    if (!f) {
+        return -1;
+    }
 
     qemu_put_be32(f, QEMU_CPR_FILE_MAGIC);
     qemu_put_be32(f, QEMU_CPR_FILE_VERSION);
@@ -155,8 +173,16 @@ int cpr_state_load(MigrationChannel *channel, Error **errp)
     QEMUFile *f;
     MigMode mode = 0;
 
-    /* set f and mode based on other parameters later in this patch series */
-    return 0;
+    if (channel) {
+        mode = MIG_MODE_CPR_TRANSFER;
+        cpr_set_incoming_mode(mode);
+        f = cpr_transfer_input(channel, errp);
+    } else {
+        return 0;
+    }
+    if (!f) {
+        return -1;
+    }
 
     trace_cpr_state_load(MigMode_str(mode));
 
diff --git a/migration/migration.c b/migration/migration.c
index 5f2540f..88b0991 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -77,6 +77,7 @@
 static NotifierWithReturnList migration_state_notifiers[] = {
     NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL),
     NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT),
+    NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_TRANSFER),
 };
 
 /* Messages sent on the return path from destination to source */
@@ -110,6 +111,7 @@ static int migration_maybe_pause(MigrationState *s,
 static void migrate_fd_cancel(MigrationState *s);
 static bool close_return_path_on_source(MigrationState *s);
 static void migration_completion_end(MigrationState *s);
+static void migrate_hup_delete(MigrationState *s);
 
 static void migration_downtime_start(MigrationState *s)
 {
@@ -220,6 +222,12 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
         return false;
     }
 
+    if (migrate_mode() == MIG_MODE_CPR_TRANSFER &&
+        addr->transport == MIGRATION_ADDRESS_TYPE_FILE) {
+        error_setg(errp, "Migration requires streamable transport (eg unix)");
+        return false;
+    }
+
     return true;
 }
 
@@ -435,6 +443,7 @@ void migration_incoming_state_destroy(void)
         mis->postcopy_qemufile_dst = NULL;
     }
 
+    cpr_set_incoming_mode(MIG_MODE_NONE);
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
@@ -747,6 +756,9 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
     } else {
         error_setg(errp, "unknown migration protocol: %s", uri);
     }
+
+    /* Close cpr socket to tell source that we are listening */
+    cpr_state_close();
 }
 
 static void process_incoming_migration_bh(void *opaque)
@@ -1423,6 +1435,8 @@ static void migrate_fd_cleanup(MigrationState *s)
     s->vmdesc = NULL;
 
     qemu_savevm_state_cleanup();
+    cpr_state_close();
+    migrate_hup_delete(s);
 
     close_return_path_on_source(s);
 
@@ -1534,6 +1548,7 @@ static void migrate_fd_error(MigrationState *s, const Error *error)
 static void migrate_fd_cancel(MigrationState *s)
 {
     int old_state ;
+    bool setup = (s->state == MIGRATION_STATUS_SETUP);
 
     trace_migrate_fd_cancel();
 
@@ -1568,6 +1583,17 @@ static void migrate_fd_cancel(MigrationState *s)
             }
         }
     }
+
+    /*
+     * If qmp_migrate_finish has not been called, then there is no path that
+     * will complete the cancellation.  Do it now.
+     */
+    if (setup && !s->to_dst_file) {
+        migrate_set_state(&s->state, MIGRATION_STATUS_CANCELLING,
+                          MIGRATION_STATUS_CANCELLED);
+        cpr_state_close();
+        migrate_hup_delete(s);
+    }
 }
 
 void migration_add_notifier_mode(NotifierWithReturn *notify,
@@ -1665,7 +1691,9 @@ bool migration_thread_is_self(void)
 
 bool migrate_mode_is_cpr(MigrationState *s)
 {
-    return s->parameters.mode == MIG_MODE_CPR_REBOOT;
+    MigMode mode = s->parameters.mode;
+    return mode == MIG_MODE_CPR_REBOOT ||
+           mode == MIG_MODE_CPR_TRANSFER;
 }
 
 int migrate_init(MigrationState *s, Error **errp)
@@ -2046,6 +2074,40 @@ static bool migrate_prepare(MigrationState *s, bool resume, Error **errp)
     return true;
 }
 
+static void qmp_migrate_finish(MigrationAddress *addr, bool resume_requested,
+                               Error **errp);
+
+static void migrate_hup_add(MigrationState *s, QIOChannel *ioc, GSourceFunc cb,
+                            void *opaque)
+{
+    s->hup_source = qio_channel_create_watch(ioc, G_IO_HUP);
+    g_source_set_callback(s->hup_source, cb, opaque, NULL);
+    g_source_attach(s->hup_source, NULL);
+}
+
+static void migrate_hup_delete(MigrationState *s)
+{
+    if (s->hup_source) {
+        g_source_destroy(s->hup_source);
+        g_source_unref(s->hup_source);
+        s->hup_source = NULL;
+    }
+}
+
+static gboolean qmp_migrate_finish_cb(QIOChannel *channel,
+                                      GIOCondition cond,
+                                      void *opaque)
+{
+    MigrationAddress *addr = opaque;
+
+    qmp_migrate_finish(addr, false, NULL);
+
+    cpr_state_close();
+    migrate_hup_delete(migrate_get_current());
+    qapi_free_MigrationAddress(addr);
+    return G_SOURCE_REMOVE;
+}
+
 void qmp_migrate(const char *uri, bool has_channels,
                  MigrationChannelList *channels, bool has_detach, bool detach,
                  bool has_resume, bool resume, Error **errp)
@@ -2056,6 +2118,7 @@ void qmp_migrate(const char *uri, bool has_channels,
     g_autoptr(MigrationChannel) channel = NULL;
     MigrationAddress *addr = NULL;
     MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL };
+    MigrationChannel *cpr_channel = NULL;
 
     /*
      * Having preliminary checks for uri and channel
@@ -2076,6 +2139,7 @@ void qmp_migrate(const char *uri, bool has_channels,
             }
             channelv[type] = channels->value;
         }
+        cpr_channel = channelv[MIGRATION_CHANNEL_TYPE_CPR];
         addr = channelv[MIGRATION_CHANNEL_TYPE_MAIN]->addr;
         if (!addr) {
             error_setg(errp, "Channel list has no main entry");
@@ -2096,12 +2160,52 @@ void qmp_migrate(const char *uri, bool has_channels,
         return;
     }
 
+    if (s->parameters.mode == MIG_MODE_CPR_TRANSFER && !cpr_channel) {
+        error_setg(errp, "missing 'cpr' migration channel");
+        return;
+    }
+
     resume_requested = has_resume && resume;
     if (!migrate_prepare(s, resume_requested, errp)) {
         /* Error detected, put into errp */
         return;
     }
 
+    if (cpr_state_save(cpr_channel, &local_err)) {
+        goto out;
+    }
+
+    /*
+     * For cpr-transfer, the target may not be listening yet on the migration
+     * channel, because first it must finish cpr_state_load.  The target tells
+     * us it is listening by closing the cpr-state socket.  Wait for that HUP
+     * event before connecting in qmp_migrate_finish.
+     *
+     * The HUP could occur because the target fails while reading CPR state,
+     * in which case the target will not listen for the incoming migration
+     * connection, so qmp_migrate_finish will fail to connect, and then recover.
+     */
+    if (s->parameters.mode == MIG_MODE_CPR_TRANSFER) {
+        migrate_hup_add(s, cpr_state_ioc(), (GSourceFunc)qmp_migrate_finish_cb,
+                        QAPI_CLONE(MigrationAddress, addr));
+
+    } else {
+        qmp_migrate_finish(addr, resume_requested, errp);
+    }
+
+out:
+    if (local_err) {
+        migrate_fd_error(s, local_err);
+        error_propagate(errp, local_err);
+    }
+}
+
+static void qmp_migrate_finish(MigrationAddress *addr, bool resume_requested,
+                               Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+    Error *local_err = NULL;
+
     if (!resume_requested) {
         if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
             return;
diff --git a/migration/migration.h b/migration/migration.h
index 1d4d4e9..fb1b8f9 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -468,6 +468,8 @@ struct MigrationState {
     bool switchover_acked;
     /* Is this a rdma migration */
     bool rdma_migration;
+
+    GSource *hup_source;
 };
 
 void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
diff --git a/migration/options.c b/migration/options.c
index b8d5300..1ad950e 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -22,6 +22,7 @@
 #include "qapi/qmp/qnull.h"
 #include "system/runstate.h"
 #include "migration/colo.h"
+#include "migration/cpr.h"
 #include "migration/misc.h"
 #include "migration.h"
 #include "migration-stats.h"
@@ -745,8 +746,11 @@ uint64_t migrate_max_postcopy_bandwidth(void)
 
 MigMode migrate_mode(void)
 {
-    MigrationState *s = migrate_get_current();
-    MigMode mode = s->parameters.mode;
+    MigMode mode = cpr_get_incoming_mode();
+
+    if (mode == MIG_MODE_NONE) {
+        mode = migrate_get_current()->parameters.mode;
+    }
 
     assert(mode >= 0 && mode < MIG_MODE__MAX);
     return mode;
diff --git a/migration/ram.c b/migration/ram.c
index ce28328..5aace00 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -195,7 +195,9 @@ static bool postcopy_preempt_active(void)
 
 bool migrate_ram_is_ignored(RAMBlock *block)
 {
+    MigMode mode = migrate_mode();
     return !qemu_ram_is_migratable(block) ||
+           mode == MIG_MODE_CPR_TRANSFER ||
            (migrate_ignore_shared() && qemu_ram_is_shared(block)
                                     && qemu_ram_is_named_file(block));
 }
diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
index 0319c35..741a588 100644
--- a/migration/vmstate-types.c
+++ b/migration/vmstate-types.c
@@ -15,6 +15,7 @@
 #include "qemu-file.h"
 #include "migration.h"
 #include "migration/vmstate.h"
+#include "migration/client-options.h"
 #include "qemu/error-report.h"
 #include "qemu/queue.h"
 #include "trace.h"
diff --git a/qapi/migration.json b/qapi/migration.json
index a605dc2..4679ce9 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -614,9 +614,48 @@
 #     or COLO.
 #
 #     (since 8.2)
+#
+# @cpr-transfer: This mode allows the user to transfer a guest to a
+#     new QEMU instance on the same host with minimal guest pause
+#     time by preserving guest RAM in place.  Devices and their pinned
+#     pages will also be preserved in a future QEMU release.
+#
+#     The user starts new QEMU on the same host as old QEMU, with
+#     command-line arguments to create the same machine, plus the
+#     -incoming option for the main migration channel, like normal
+#     live migration.  In addition, the user adds a second -incoming
+#     option with channel type "cpr".  This CPR channel must support
+#     file descriptor transfer with SCM_RIGHTS, i.e. it must be a
+#     UNIX domain socket.
+#
+#     To initiate CPR, the user issues a migrate command to old QEMU,
+#     adding a second migration channel of type "cpr" in the channels
+#     argument.  Old QEMU stops the VM, saves state to the migration
+#     channels, and enters the postmigrate state.  Execution resumes
+#     in new QEMU.
+#
+#     New QEMU reads the CPR channel before opening a monitor, hence
+#     the CPR channel cannot be specified in the list of channels for
+#     a migrate-incoming command.  It may only be specified on the
+#     command line.
+#
+#     The main channel address cannot be a file type, and for an
+#     inet socket, the port cannot be 0 (meaning dynamically choose
+#     a port).
+#
+#     Memory-backend objects must have the share=on attribute, but
+#     memory-backend-epc is not supported.  The VM must be started
+#     with the '-machine aux-ram-share=on' option.
+#
+#     When using -incoming defer, you must issue the migrate command
+#     to old QEMU before issuing any monitor commands to new QEMU.
+#     However, new QEMU does not open and read the migration stream
+#     until you issue the migrate-incoming command.
+#
+#     (since 10.0)
 ##
 { 'enum': 'MigMode',
-  'data': [ 'normal', 'cpr-reboot' ] }
+  'data': [ 'normal', 'cpr-reboot', 'cpr-transfer' ] }
 
 ##
 # @ZeroPageDetection:
@@ -1578,11 +1617,12 @@
 # The migration channel-type request options.
 #
 # @main: Main outbound migration channel.
+# @cpr: Checkpoint and restart state channel.
 #
 # Since: 8.1
 ##
 { 'enum': 'MigrationChannelType',
-  'data': [ 'main' ] }
+  'data': [ 'main', 'cpr' ] }
 
 ##
 # @MigrationChannel:
diff --git a/qemu-options.hx b/qemu-options.hx
index 3d1af73..d19bf53 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -112,6 +112,8 @@ SRST
         specified on the command line, or implicitly created by the -m
         command line option.  The default is off.
 
+        To use the cpr-transfer migration mode, you must set aux-ram-share=on.
+
     ``memory-backend='id'``
         An alternative to legacy ``-mem-path`` and ``mem-prealloc`` options.
         Allows to use a memory backend as main RAM.
diff --git a/stubs/vmstate.c b/stubs/vmstate.c
index 8513d92..c190762 100644
--- a/stubs/vmstate.c
+++ b/stubs/vmstate.c
@@ -1,5 +1,7 @@
 #include "qemu/osdep.h"
 #include "migration/vmstate.h"
+#include "qapi/qapi-types-migration.h"
+#include "migration/client-options.h"
 
 int vmstate_register_with_alias_id(VMStateIf *obj,
                                    uint32_t instance_id,
@@ -21,3 +23,8 @@ bool vmstate_check_only_migratable(const VMStateDescription *vmsd)
 {
     return true;
 }
+
+MigMode migrate_mode(void)
+{
+    return MIG_MODE_NORMAL;
+}
diff --git a/system/vl.c b/system/vl.c
index 251efa0..cbf3737 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -77,6 +77,7 @@
 #include "hw/block/block.h"
 #include "hw/i386/x86.h"
 #include "hw/i386/pc.h"
+#include "migration/cpr.h"
 #include "migration/misc.h"
 #include "migration/snapshot.h"
 #include "system/tpm.h"
@@ -3751,6 +3752,12 @@ void qemu_init(int argc, char **argv)
 
     qemu_create_machine(machine_opts_dict);
 
+    /*
+     * Load incoming CPR state before any devices are created, because it
+     * contains file descriptors that are needed in device initialization code.
+     */
+    cpr_state_load(incoming_channels[MIGRATION_CHANNEL_TYPE_CPR], &error_fatal);
+
     suspend_mux_open();
 
     qemu_disable_default_devices();
-- 
1.8.3.1




* [PATCH V7 17/24] migration-test: memory_backend
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (15 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 16/24] migration: cpr-transfer mode Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 18/24] tests/qtest: optimize migrate_set_ports Steve Sistare
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Allow each migration test to define its own memory backend, replacing
the standard "-m <size>" specification.
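
The selection logic is small.  The sketch below restates it in plain C with
snprintf() instead of g_strdup_printf(): a caller-supplied format with one
%s receives the size, and otherwise the "-m %s " default applies.  The
function name build_memory_opts() is hypothetical, and any backend format
string shown is only an illustrative value.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build the memory part of a QEMU command line.  backend_fmt, if non-NULL,
 * must contain exactly one %s, which receives memory_size; otherwise the
 * plain "-m <size> " form is used.  The caller frees the result.
 */
char *build_memory_opts(const char *backend_fmt, const char *memory_size)
{
    const char *fmt = backend_fmt ? backend_fmt : "-m %s ";
    int n = snprintf(NULL, 0, fmt, memory_size);
    char *out;

    if (n < 0 || !(out = malloc((size_t)n + 1))) {
        return NULL;
    }
    snprintf(out, (size_t)n + 1, fmt, memory_size);
    return out;
}
```

A test could, for example, pass a hypothetical format such as
"-object memory-backend-memfd,id=mem0,size=%s -machine memory-backend=mem0 ".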

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration/framework.c | 15 +++++++++++----
 tests/qtest/migration/framework.h |  5 +++++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index 47ce078..81a0e49 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -210,6 +210,7 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
     const char *machine_alias, *machine_opts = "";
     g_autofree char *machine = NULL;
     const char *bootpath;
+    g_autofree char *memory_backend = NULL;
 
     if (args->use_shmem) {
         if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) {
@@ -285,6 +286,12 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
             memory_size, shmem_path);
     }
 
+    if (args->memory_backend) {
+        memory_backend = g_strdup_printf(args->memory_backend, memory_size);
+    } else {
+        memory_backend = g_strdup_printf("-m %s ", memory_size);
+    }
+
     if (args->use_dirty_ring) {
         kvm_opts = ",dirty-ring-size=4096";
     }
@@ -303,12 +310,12 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
     cmd_source = g_strdup_printf("-accel kvm%s -accel tcg "
                                  "-machine %s,%s "
                                  "-name source,debug-threads=on "
-                                 "-m %s "
+                                 "%s "
                                  "-serial file:%s/src_serial "
                                  "%s %s %s %s",
                                  kvm_opts ? kvm_opts : "",
                                  machine, machine_opts,
-                                 memory_size, tmpfs,
+                                 memory_backend, tmpfs,
                                  arch_opts ? arch_opts : "",
                                  shmem_opts ? shmem_opts : "",
                                  args->opts_source ? args->opts_source : "",
@@ -323,13 +330,13 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
     cmd_target = g_strdup_printf("-accel kvm%s -accel tcg "
                                  "-machine %s,%s "
                                  "-name target,debug-threads=on "
-                                 "-m %s "
+                                 "%s "
                                  "-serial file:%s/dest_serial "
                                  "-incoming %s "
                                  "%s %s %s %s",
                                  kvm_opts ? kvm_opts : "",
                                  machine, machine_opts,
-                                 memory_size, tmpfs, uri,
+                                 memory_backend, tmpfs, uri,
                                  arch_opts ? arch_opts : "",
                                  shmem_opts ? shmem_opts : "",
                                  args->opts_target ? args->opts_target : "",
diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
index e9fc4ec..d368fcf 100644
--- a/tests/qtest/migration/framework.h
+++ b/tests/qtest/migration/framework.h
@@ -109,6 +109,11 @@ typedef struct {
     const char *opts_target;
     /* suspend the src before migrating to dest. */
     bool suspend_me;
+    /*
+     * Format string for the main memory backend, containing one %s where the
+     * size is plugged in.  If omitted, "-m %s" is used.
+     */
+    const char *memory_backend;
 } MigrateStart;
 
 typedef enum PostcopyRecoveryFailStage {
-- 
1.8.3.1




* [PATCH V7 18/24] tests/qtest: optimize migrate_set_ports
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (16 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 17/24] migration-test: memory_backend Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 19/24] tests/qtest: defer connection Steve Sistare
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Do not query connection parameters if all port numbers are known.  This is
more efficient, and also solves a problem for the cpr-transfer test.
At the point where cpr-transfer calls migrate_qmp and migrate_set_ports,
the monitor is not connected and queries are not allowed.  Port=0 is
never used for cpr-transfer.
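
The pattern is a lazy query: the destination is asked for its real port
only when a channel actually requests port "0", and then at most once.  The
sketch below shows that shape with plain strings instead of QDicts.
query_listening_port() is a hypothetical stand-in for
migrate_get_connect_qdict() that counts its calls, and the port "4444" is
an arbitrary illustrative value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int query_calls;

/* Hypothetical stand-in for querying the destination over the monitor. */
static const char *query_listening_port(void)
{
    query_calls++;
    return "4444";                    /* port the destination actually bound */
}

/*
 * Replace every "0" port with the real one, querying the destination only
 * if at least one "0" is present, and only once.
 */
void set_ports_lazily(const char **ports, size_t n)
{
    const char *actual = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!ports[i] || strcmp(ports[i], "0") != 0) {
            continue;                 /* port already known: no query */
        }
        if (!actual) {
            actual = query_listening_port();
        }
        ports[i] = actual;
    }
}
```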

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/migration/migration-util.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/tests/qtest/migration/migration-util.c b/tests/qtest/migration/migration-util.c
index 526bed7..0ce1413 100644
--- a/tests/qtest/migration/migration-util.c
+++ b/tests/qtest/migration/migration-util.c
@@ -135,25 +135,32 @@ migrate_get_connect_qdict(QTestState *who)
 
 void migrate_set_ports(QTestState *to, QList *channel_list)
 {
-    QDict *addr;
+    g_autoptr(QDict) addr = NULL;
     QListEntry *entry;
     const char *addr_port = NULL;
 
-    addr = migrate_get_connect_qdict(to);
-
     QLIST_FOREACH_ENTRY(channel_list, entry) {
         QDict *channel = qobject_to(QDict, qlist_entry_obj(entry));
         QDict *addrdict = qdict_get_qdict(channel, "addr");
 
-        if (qdict_haskey(addrdict, "port") &&
-            qdict_haskey(addr, "port") &&
-            (strcmp(qdict_get_str(addrdict, "port"), "0") == 0)) {
+        if (!qdict_haskey(addrdict, "port") ||
+            strcmp(qdict_get_str(addrdict, "port"), "0")) {
+            continue;
+        }
+
+        /*
+         * Fetch addr only if needed, so tests that are not yet connected to
+         * the monitor do not query it.  Such tests cannot use port=0.
+         */
+        if (!addr) {
+            addr = migrate_get_connect_qdict(to);
+        }
+
+        if (qdict_haskey(addr, "port")) {
             addr_port = qdict_get_str(addr, "port");
             qdict_put_str(addrdict, "port", addr_port);
         }
     }
-
-    qobject_unref(addr);
 }
 
 bool migrate_watch_for_events(QTestState *who, const char *name,
-- 
1.8.3.1




* [PATCH V7 19/24] tests/qtest: defer connection
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (17 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 18/24] tests/qtest: optimize migrate_set_ports Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 20/24] migration-test: " Steve Sistare
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Add an option to defer connecting to the monitor and qtest
sockets when calling qtest_init_with_env.  The client makes the connection
later by calling qtest_connect and qtest_qmp_handshake.
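
The split relies on a listening UNIX socket queueing a client's connection
until accept() is called, so the accept can safely happen later.  The
sketch below shows that shape in a single process.  DeferredConn,
deferred_init(), deferred_connect() and deferred_demo() are hypothetical,
simplified analogues of the sock/qmpsock and fd/qmp_fd fields, not the
libqtest code itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

typedef struct {
    int listen_fd;   /* listening socket, valid until connected */
    int fd;          /* accepted connection, -1 until connected */
} DeferredConn;

/* Accept the pending connection and drop the listening socket. */
void deferred_connect(DeferredConn *c)
{
    assert(c->listen_fd >= 0);
    c->fd = accept(c->listen_fd, NULL, NULL);
    close(c->listen_fd);
    c->listen_fd = -1;
}

/* Listen on path; accept immediately only if do_connect is true. */
int deferred_init(DeferredConn *c, const char *path, bool do_connect)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };

    snprintf(addr.sun_path, sizeof(addr.sun_path), "%s", path);
    unlink(path);
    c->fd = -1;
    c->listen_fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (c->listen_fd < 0 ||
        bind(c->listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(c->listen_fd, 1) < 0) {
        return -1;
    }
    if (do_connect) {
        deferred_connect(c);
    }
    return 0;
}

/*
 * Init without connecting, let a client connect, then accept later.
 * Returns 1 on success (error paths skip cleanup for brevity).
 */
int deferred_demo(const char *path, int *fd_before)
{
    DeferredConn c;
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    int cli, ok;

    if (deferred_init(&c, path, false) < 0) {
        return -1;
    }
    *fd_before = c.fd;                  /* still -1: nothing accepted yet */
    snprintf(addr.sun_path, sizeof(addr.sun_path), "%s", path);
    cli = socket(AF_UNIX, SOCK_STREAM, 0);
    if (cli < 0 || connect(cli, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        return -1;
    }
    deferred_connect(&c);               /* queued connection is accepted */
    ok = c.fd >= 0 && c.listen_fd == -1;
    close(c.fd);
    close(cli);
    unlink(path);
    return ok;
}
```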

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/libqtest.c            | 82 ++++++++++++++++++++++++---------------
 tests/qtest/libqtest.h            | 19 ++++++++-
 tests/qtest/migration/framework.c |  4 +-
 3 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 8de5f1f..b1e0df9 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -75,6 +75,8 @@ struct QTestState
 {
     int fd;
     int qmp_fd;
+    int sock;
+    int qmpsock;
     pid_t qemu_pid;  /* our child QEMU process */
     int wstatus;
 #ifdef _WIN32
@@ -458,18 +460,19 @@ static QTestState *G_GNUC_PRINTF(2, 3) qtest_spawn_qemu(const char *qemu_bin,
     return s;
 }
 
+static char *qtest_socket_path(const char *suffix)
+{
+    return g_strdup_printf("%s/qtest-%d.%s", g_get_tmp_dir(), getpid(), suffix);
+}
+
 static QTestState *qtest_init_internal(const char *qemu_bin,
-                                       const char *extra_args)
+                                       const char *extra_args,
+                                       bool do_connect)
 {
     QTestState *s;
     int sock, qmpsock, i;
-    gchar *socket_path;
-    gchar *qmp_socket_path;
-
-    socket_path = g_strdup_printf("%s/qtest-%d.sock",
-                                  g_get_tmp_dir(), getpid());
-    qmp_socket_path = g_strdup_printf("%s/qtest-%d.qmp",
-                                      g_get_tmp_dir(), getpid());
+    g_autofree gchar *socket_path = qtest_socket_path("sock");
+    g_autofree gchar *qmp_socket_path = qtest_socket_path("qmp");
 
     /*
      * It's possible that if an earlier test run crashed it might
@@ -501,22 +504,19 @@ static QTestState *qtest_init_internal(const char *qemu_bin,
     qtest_client_set_rx_handler(s, qtest_client_socket_recv_line);
     qtest_client_set_tx_handler(s, qtest_client_socket_send);
 
-    s->fd = socket_accept(sock);
-    if (s->fd >= 0) {
-        s->qmp_fd = socket_accept(qmpsock);
-    }
-    unlink(socket_path);
-    unlink(qmp_socket_path);
-    g_free(socket_path);
-    g_free(qmp_socket_path);
-
-    g_assert(s->fd >= 0 && s->qmp_fd >= 0);
-
     s->rx = g_string_new("");
     for (i = 0; i < MAX_IRQ; i++) {
         s->irq_level[i] = false;
     }
 
+    s->fd = -1;
+    s->qmp_fd = -1;
+    s->sock = sock;
+    s->qmpsock = qmpsock;
+    if (do_connect) {
+        qtest_connect(s);
+    }
+
     /*
      * Stopping QEMU for debugging is not supported on Windows.
      *
@@ -531,34 +531,54 @@ static QTestState *qtest_init_internal(const char *qemu_bin,
     }
 #endif
 
+    return s;
+}
+
+void qtest_connect(QTestState *s)
+{
+    g_autofree gchar *socket_path = qtest_socket_path("sock");
+    g_autofree gchar *qmp_socket_path = qtest_socket_path("qmp");
+
+    g_assert(s->sock >= 0 && s->qmpsock >= 0);
+    s->fd = socket_accept(s->sock);
+    if (s->fd >= 0) {
+        s->qmp_fd = socket_accept(s->qmpsock);
+    }
+    unlink(socket_path);
+    unlink(qmp_socket_path);
+    g_assert(s->fd >= 0 && s->qmp_fd >= 0);
+    s->sock = s->qmpsock = -1;
     /* ask endianness of the target */
-
     s->big_endian = qtest_query_target_endianness(s);
-
-   return s;
 }
 
 QTestState *qtest_init_without_qmp_handshake(const char *extra_args)
 {
-    return qtest_init_internal(qtest_qemu_binary(NULL), extra_args);
+    return qtest_init_internal(qtest_qemu_binary(NULL), extra_args, true);
 }
 
-QTestState *qtest_init_with_env(const char *var, const char *extra_args)
+void qtest_qmp_handshake(QTestState *s)
 {
-    QTestState *s = qtest_init_internal(qtest_qemu_binary(var), extra_args);
-    QDict *greeting;
-
     /* Read the QMP greeting and then do the handshake */
-    greeting = qtest_qmp_receive(s);
+    QDict *greeting = qtest_qmp_receive(s);
     qobject_unref(greeting);
     qobject_unref(qtest_qmp(s, "{ 'execute': 'qmp_capabilities' }"));
+}
 
+QTestState *qtest_init_with_env(const char *var, const char *extra_args,
+                                bool do_connect)
+{
+    QTestState *s = qtest_init_internal(qtest_qemu_binary(var), extra_args,
+                                        do_connect);
+    if (do_connect) {
+        qtest_qmp_handshake(s);
+    }
     return s;
 }
 
 QTestState *qtest_init(const char *extra_args)
 {
-    return qtest_init_with_env(NULL, extra_args);
+    return qtest_init_with_env(NULL, extra_args, true);
 }
 
 QTestState *qtest_vinitf(const char *fmt, va_list ap)
@@ -1539,7 +1559,7 @@ static struct MachInfo *qtest_get_machines(const char *var)
 
     silence_spawn_log = !g_test_verbose();
 
-    qts = qtest_init_with_env(qemu_var, "-machine none");
+    qts = qtest_init_with_env(qemu_var, "-machine none", true);
     response = qtest_qmp(qts, "{ 'execute': 'query-machines' }");
     g_assert(response);
     list = qdict_get_qlist(response, "return");
@@ -1594,7 +1614,7 @@ static struct CpuModel *qtest_get_cpu_models(void)
 
     silence_spawn_log = !g_test_verbose();
 
-    qts = qtest_init_with_env(NULL, "-machine none");
+    qts = qtest_init_with_env(NULL, "-machine none", true);
     response = qtest_qmp(qts, "{ 'execute': 'query-cpu-definitions' }");
     g_assert(response);
     list = qdict_get_qlist(response, "return");
diff --git a/tests/qtest/libqtest.h b/tests/qtest/libqtest.h
index f23d80e..71c94b3 100644
--- a/tests/qtest/libqtest.h
+++ b/tests/qtest/libqtest.h
@@ -60,13 +60,15 @@ QTestState *qtest_init(const char *extra_args);
  * @var: Environment variable from where to take the QEMU binary
  * @extra_args: Other arguments to pass to QEMU.  CAUTION: these
  * arguments are subject to word splitting and shell evaluation.
+ * @do_connect: connect to qemu monitor and qtest socket.
  *
  * Like qtest_init(), but use a different environment variable for the
  * QEMU binary.
  *
  * Returns: #QTestState instance.
  */
-QTestState *qtest_init_with_env(const char *var, const char *extra_args);
+QTestState *qtest_init_with_env(const char *var, const char *extra_args,
+                                bool do_connect);
 
 /**
  * qtest_init_without_qmp_handshake:
@@ -78,6 +80,21 @@ QTestState *qtest_init_with_env(const char *var, const char *extra_args);
 QTestState *qtest_init_without_qmp_handshake(const char *extra_args);
 
 /**
+ * qtest_connect:
+ * @s: #QTestState instance to connect
+ * Connect to qemu monitor and qtest socket, after skipping them in
+ * qtest_init_with_env.  Does not handshake with the monitor.
+ */
+void qtest_connect(QTestState *s);
+
+/**
+ * qtest_qmp_handshake:
+ * @s: #QTestState instance to operate on.
+ * Perform handshake after connecting to qemu monitor.
+ */
+void qtest_qmp_handshake(QTestState *s);
+
+/**
  * qtest_init_with_serial:
  * @extra_args: other arguments to pass to QEMU.  CAUTION: these
  * arguments are subject to word splitting and shell evaluation.
diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index 81a0e49..44ff901 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -321,7 +321,7 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
                                  args->opts_source ? args->opts_source : "",
                                  ignore_stderr);
     if (!args->only_target) {
-        *from = qtest_init_with_env(QEMU_ENV_SRC, cmd_source);
+        *from = qtest_init_with_env(QEMU_ENV_SRC, cmd_source, true);
         qtest_qmp_set_event_callback(*from,
                                      migrate_watch_for_events,
                                      &src_state);
@@ -341,7 +341,7 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
                                  shmem_opts ? shmem_opts : "",
                                  args->opts_target ? args->opts_target : "",
                                  ignore_stderr);
-    *to = qtest_init_with_env(QEMU_ENV_DST, cmd_target);
+    *to = qtest_init_with_env(QEMU_ENV_DST, cmd_target, true);
     qtest_qmp_set_event_callback(*to,
                                  migrate_watch_for_events,
                                  &dst_state);
-- 
1.8.3.1




* [PATCH V7 20/24] migration-test: defer connection
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (18 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 19/24] tests/qtest: defer connection Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 21/24] tests/qtest: enhance migration channels Steve Sistare
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Add an option to defer connection to the target monitor, needed by the
cpr-transfer test.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration/framework.c | 23 ++++++++++++++++++++---
 tests/qtest/migration/framework.h |  3 +++
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index 44ff901..03640e4 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -211,6 +211,7 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
     g_autofree char *machine = NULL;
     const char *bootpath;
     g_autofree char *memory_backend = NULL;
+    const char *events;
 
     if (args->use_shmem) {
         if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) {
@@ -327,21 +328,30 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
                                      &src_state);
     }
 
+    /*
+     * If the monitor connection is deferred, enable events on the command line
+     * so none are missed.  This is for testing only, do not set migration
+     * options like this in general.
+     */
+    events = args->defer_target_connect ? "-global migration.x-events=on" : "";
+
     cmd_target = g_strdup_printf("-accel kvm%s -accel tcg "
                                  "-machine %s,%s "
                                  "-name target,debug-threads=on "
                                  "%s "
                                  "-serial file:%s/dest_serial "
                                  "-incoming %s "
-                                 "%s %s %s %s",
+                                 "%s %s %s %s %s",
                                  kvm_opts ? kvm_opts : "",
                                  machine, machine_opts,
                                  memory_backend, tmpfs, uri,
+                                 events,
                                  arch_opts ? arch_opts : "",
                                  shmem_opts ? shmem_opts : "",
                                  args->opts_target ? args->opts_target : "",
                                  ignore_stderr);
-    *to = qtest_init_with_env(QEMU_ENV_DST, cmd_target, true);
+    *to = qtest_init_with_env(QEMU_ENV_DST, cmd_target,
+                              !args->defer_target_connect);
     qtest_qmp_set_event_callback(*to,
                                  migrate_watch_for_events,
                                  &dst_state);
@@ -359,7 +369,9 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
      * to mimic as closer as that.
      */
     migrate_set_capability(*from, "events", true);
-    migrate_set_capability(*to, "events", true);
+    if (!args->defer_target_connect) {
+        migrate_set_capability(*to, "events", true);
+    }
 
     return 0;
 }
@@ -713,6 +725,11 @@ void test_precopy_common(MigrateCommon *args)
 
     migrate_qmp(from, to, args->connect_uri, args->connect_channels, "{}");
 
+    if (args->start.defer_target_connect) {
+        qtest_connect(to);
+        qtest_qmp_handshake(to);
+    }
+
     if (args->result != MIG_TEST_SUCCEED) {
         bool allow_active = args->result == MIG_TEST_FAIL;
         wait_for_migration_fail(from, allow_active);
diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
index d368fcf..1341368 100644
--- a/tests/qtest/migration/framework.h
+++ b/tests/qtest/migration/framework.h
@@ -114,6 +114,9 @@ typedef struct {
      * size is plugged in.  If omitted, "-m %s" is used.
      */
     const char *memory_backend;
+
+    /* Do not connect to target monitor and qtest sockets in qtest_init */
+    bool defer_target_connect;
 } MigrateStart;
 
 typedef enum PostcopyRecoveryFailStage {
-- 
1.8.3.1




* [PATCH V7 21/24] tests/qtest: enhance migration channels
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (19 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 20/24] migration-test: " Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 22/24] tests/qtest: assert qmp connected Steve Sistare
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Change the migrate_qmp and migrate_qmp_fail channels argument to a QObject
type so the caller can manipulate the object before passing it to the
helper.  Define migrate_str_to_channel to aid such manipulation.
Add a channels argument to migrate_incoming_qmp.
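
migrate_str_to_channel accepts either JSON or dotted keys; the dotted-keys
form is parsed with "channel-type" as the implied key, so a leading bare
token such as "cpr" becomes the channel type.  To show what such a string
corresponds to, here is a toy converter.  demo_channel_to_json is a
hypothetical helper, not QEMU's keyval parser: it handles only one nesting
level, expects keys with a shared prefix to be contiguous, and ignores ","
escaping, so treat it as an illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy converter (hypothetical): turn one dotted-keys channel string into
 * JSON.  A bare token is treated as the implied "channel-type" key.
 * The caller frees the result.
 */
static char *demo_channel_to_json(const char *str)
{
    char *copy = strdup(str);
    char *save = NULL;
    char *buf = NULL;
    size_t size = 0;
    FILE *f = open_memstream(&buf, &size);
    char group[64] = "";            /* open nested object, "" if none */
    const char *sep = "";           /* separator before a top-level member */
    const char *inner_sep = "";     /* separator inside the open object */

    assert(copy && f);
    fputc('{', f);
    for (char *tok = strtok_r(copy, ",", &save); tok;
         tok = strtok_r(NULL, ",", &save)) {
        char *eq = strchr(tok, '=');
        char *dot;

        if (!eq) {                  /* bare token: the implied key */
            fprintf(f, "%s\"channel-type\": \"%s\"", sep, tok);
            sep = ", ";
            continue;
        }
        *eq = '\0';                 /* split key from value first, so dots */
        dot = strchr(tok, '.');     /* in the value (e.g. a path) are kept */
        if (dot) {                  /* "prefix.member=value" */
            *dot = '\0';
            if (strcmp(group, tok) != 0) {
                if (group[0]) {
                    fputc('}', f);
                }
                fprintf(f, "%s\"%s\": {", sep, tok);
                snprintf(group, sizeof(group), "%s", tok);
                sep = ", ";
                inner_sep = "";
            }
            fprintf(f, "%s\"%s\": \"%s\"", inner_sep, dot + 1, eq + 1);
            inner_sep = ", ";
        } else {                    /* plain "key=value" */
            if (group[0]) {
                fputc('}', f);
                group[0] = '\0';
            }
            fprintf(f, "%s\"%s\": \"%s\"", sep, tok, eq + 1);
            sep = ", ";
        }
    }
    if (group[0]) {
        fputc('}', f);
    }
    fputc('}', f);
    fclose(f);
    free(copy);
    return buf;
}
```

For example, "cpr,addr.transport=socket,addr.type=unix,addr.path=/tmp/cpr.sock"
yields {"channel-type": "cpr", "addr": {"transport": "socket", "type":
"unix", "path": "/tmp/cpr.sock"}}, the same shape as the JSON channels used
elsewhere in these tests.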

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/migration/framework.c     | 15 +++++++---
 tests/qtest/migration/migration-qmp.c | 53 +++++++++++++++++++++++++++++------
 tests/qtest/migration/migration-qmp.h | 10 ++++---
 tests/qtest/migration/misc-tests.c    |  9 +++++-
 tests/qtest/migration/precopy-tests.c |  6 ++--
 tests/qtest/virtio-net-failover.c     |  8 +++---
 6 files changed, 76 insertions(+), 25 deletions(-)

diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index 03640e4..8d34cb2 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -18,6 +18,8 @@
 #include "migration/migration-qmp.h"
 #include "migration/migration-util.h"
 #include "ppc-util.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qlist.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
@@ -686,6 +688,7 @@ void test_precopy_common(MigrateCommon *args)
 {
     QTestState *from, *to;
     void *data_hook = NULL;
+    QObject *out_channels = NULL;
 
     if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
         return;
@@ -718,12 +721,16 @@ void test_precopy_common(MigrateCommon *args)
         }
     }
 
+    if (args->connect_channels) {
+        out_channels = qobject_from_json(args->connect_channels, &error_abort);
+    }
+
     if (args->result == MIG_TEST_QMP_ERROR) {
-        migrate_qmp_fail(from, args->connect_uri, args->connect_channels, "{}");
+        migrate_qmp_fail(from, args->connect_uri, out_channels, "{}");
         goto finish;
     }
 
-    migrate_qmp(from, to, args->connect_uri, args->connect_channels, "{}");
+    migrate_qmp(from, to, args->connect_uri, out_channels, "{}");
 
     if (args->start.defer_target_connect) {
         qtest_connect(to);
@@ -873,7 +880,7 @@ void test_file_common(MigrateCommon *args, bool stop_src)
      * We need to wait for the source to finish before starting the
      * destination.
      */
-    migrate_incoming_qmp(to, args->connect_uri, "{}");
+    migrate_incoming_qmp(to, args->connect_uri, NULL, "{}");
     wait_for_migration_complete(to);
 
     if (stop_src) {
@@ -909,7 +916,7 @@ void *migrate_hook_start_precopy_tcp_multifd_common(QTestState *from,
     migrate_set_capability(to, "multifd", true);
 
     /* Start incoming migration from the 1st socket */
-    migrate_incoming_qmp(to, "tcp:127.0.0.1:0", "{}");
+    migrate_incoming_qmp(to, "tcp:127.0.0.1:0", NULL, "{}");
 
     return NULL;
 }
diff --git a/tests/qtest/migration/migration-qmp.c b/tests/qtest/migration/migration-qmp.c
index 71b14b5..f7a597d 100644
--- a/tests/qtest/migration/migration-qmp.c
+++ b/tests/qtest/migration/migration-qmp.c
@@ -15,9 +15,13 @@
 #include "migration-qmp.h"
 #include "migration-util.h"
 #include "qapi/error.h"
+#include "qapi/qapi-types-migration.h"
+#include "qapi/qapi-visit-migration.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qlist.h"
+#include "qapi/qobject-input-visitor.h"
+#include "qapi/qobject-output-visitor.h"
 
 /*
  * Number of seconds we wait when looking for migration
@@ -47,8 +51,33 @@ void migration_event_wait(QTestState *s, const char *target)
     } while (!found);
 }
 
+/*
+ * Convert a string representing a single channel to an object.
+ * @str may be in JSON or dotted keys format.
+ */
+QObject *migrate_str_to_channel(const char *str)
+{
+    Visitor *v;
+    MigrationChannel *channel;
+    QObject *obj;
+
+    /* Create the channel */
+    v = qobject_input_visitor_new_str(str, "channel-type", &error_abort);
+    visit_type_MigrationChannel(v, NULL, &channel, &error_abort);
+    visit_free(v);
+
+    /* Create the object */
+    v = qobject_output_visitor_new(&obj);
+    visit_type_MigrationChannel(v, NULL, &channel, &error_abort);
+    visit_complete(v, &obj);
+    visit_free(v);
+
+    qapi_free_MigrationChannel(channel);
+    return obj;
+}
+
 void migrate_qmp_fail(QTestState *who, const char *uri,
-                      const char *channels, const char *fmt, ...)
+                      QObject *channels, const char *fmt, ...)
 {
     va_list ap;
     QDict *args, *err;
@@ -64,8 +93,7 @@ void migrate_qmp_fail(QTestState *who, const char *uri,
 
     g_assert(!qdict_haskey(args, "channels"));
     if (channels) {
-        QObject *channels_obj = qobject_from_json(channels, &error_abort);
-        qdict_put_obj(args, "channels", channels_obj);
+        qdict_put_obj(args, "channels", channels);
     }
 
     err = qtest_qmp_assert_failure_ref(
@@ -82,7 +110,7 @@ void migrate_qmp_fail(QTestState *who, const char *uri,
  * qobject_from_jsonf_nofail()) with "uri": @uri spliced in.
  */
 void migrate_qmp(QTestState *who, QTestState *to, const char *uri,
-                 const char *channels, const char *fmt, ...)
+                 QObject *channels, const char *fmt, ...)
 {
     va_list ap;
     QDict *args;
@@ -102,10 +130,9 @@ void migrate_qmp(QTestState *who, QTestState *to, const char *uri,
 
     g_assert(!qdict_haskey(args, "channels"));
     if (channels) {
-        QObject *channels_obj = qobject_from_json(channels, &error_abort);
-        QList *channel_list = qobject_to(QList, channels_obj);
+        QList *channel_list = qobject_to(QList, channels);
         migrate_set_ports(to, channel_list);
-        qdict_put_obj(args, "channels", channels_obj);
+        qdict_put_obj(args, "channels", channels);
     }
 
     qtest_qmp_assert_success(who,
@@ -123,7 +150,8 @@ void migrate_set_capability(QTestState *who, const char *capability,
                              capability, value);
 }
 
-void migrate_incoming_qmp(QTestState *to, const char *uri, const char *fmt, ...)
+void migrate_incoming_qmp(QTestState *to, const char *uri, QObject *channels,
+                          const char *fmt, ...)
 {
     va_list ap;
     QDict *args, *rsp;
@@ -133,7 +161,14 @@ void migrate_incoming_qmp(QTestState *to, const char *uri, const char *fmt, ...)
     va_end(ap);
 
     g_assert(!qdict_haskey(args, "uri"));
-    qdict_put_str(args, "uri", uri);
+    if (uri) {
+        qdict_put_str(args, "uri", uri);
+    }
+
+    g_assert(!qdict_haskey(args, "channels"));
+    if (channels) {
+        qdict_put_obj(args, "channels", channels);
+    }
 
     /* This function relies on the event to work, make sure it's enabled */
     migrate_set_capability(to, "events", true);
diff --git a/tests/qtest/migration/migration-qmp.h b/tests/qtest/migration/migration-qmp.h
index caaa787..faa8181 100644
--- a/tests/qtest/migration/migration-qmp.h
+++ b/tests/qtest/migration/migration-qmp.h
@@ -4,17 +4,19 @@
 
 #include "migration-util.h"
 
+QObject *migrate_str_to_channel(const char *str);
+
 G_GNUC_PRINTF(4, 5)
 void migrate_qmp_fail(QTestState *who, const char *uri,
-                      const char *channels, const char *fmt, ...);
+                      QObject *channels, const char *fmt, ...);
 
 G_GNUC_PRINTF(5, 6)
 void migrate_qmp(QTestState *who, QTestState *to, const char *uri,
-                 const char *channels, const char *fmt, ...);
+                 QObject *channels, const char *fmt, ...);
 
-G_GNUC_PRINTF(3, 4)
+G_GNUC_PRINTF(4, 5)
 void migrate_incoming_qmp(QTestState *who, const char *uri,
-                          const char *fmt, ...);
+                          QObject *channels, const char *fmt, ...);
 
 void migration_event_wait(QTestState *s, const char *target);
 void migrate_set_capability(QTestState *who, const char *capability,
diff --git a/tests/qtest/migration/misc-tests.c b/tests/qtest/migration/misc-tests.c
index 6173430..dda3707 100644
--- a/tests/qtest/migration/misc-tests.c
+++ b/tests/qtest/migration/misc-tests.c
@@ -11,6 +11,8 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qjson.h"
 #include "libqtest.h"
 #include "migration/framework.h"
 #include "migration/migration-qmp.h"
@@ -205,6 +207,7 @@ static void test_validate_uuid_dst_not_set(void)
 static void do_test_validate_uri_channel(MigrateCommon *args)
 {
     QTestState *from, *to;
+    QObject *channels;
 
     if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
         return;
@@ -217,7 +220,11 @@ static void do_test_validate_uri_channel(MigrateCommon *args)
      * 'uri' and 'channels' validation is checked even before the migration
      * starts.
      */
-    migrate_qmp_fail(from, args->connect_uri, args->connect_channels, "{}");
+    channels = args->connect_channels ?
+               qobject_from_json(args->connect_channels, &error_abort) :
+               NULL;
+    migrate_qmp_fail(from, args->connect_uri, channels, "{}");
+
     migrate_end(from, to, false);
 }
 
diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index 23599b2..436dbd9 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -152,7 +152,7 @@ static void *migrate_hook_start_fd(QTestState *from,
     close(pair[0]);
 
     /* Start incoming migration from the 1st socket */
-    migrate_incoming_qmp(to, "fd:fd-mig", "{}");
+    migrate_incoming_qmp(to, "fd:fd-mig", NULL, "{}");
 
     /* Send the 2nd socket to the target */
     qtest_qmp_fds_assert_success(from, &pair[1], 1,
@@ -479,7 +479,7 @@ static void test_multifd_tcp_cancel(void)
     migrate_set_capability(to, "multifd", true);
 
     /* Start incoming migration from the 1st socket */
-    migrate_incoming_qmp(to, "tcp:127.0.0.1:0", "{}");
+    migrate_incoming_qmp(to, "tcp:127.0.0.1:0", NULL, "{}");
 
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
@@ -518,7 +518,7 @@ static void test_multifd_tcp_cancel(void)
     migrate_set_capability(to2, "multifd", true);
 
     /* Start incoming migration from the 1st socket */
-    migrate_incoming_qmp(to2, "tcp:127.0.0.1:0", "{}");
+    migrate_incoming_qmp(to2, "tcp:127.0.0.1:0", NULL, "{}");
 
     migrate_ensure_non_converge(from);
 
diff --git a/tests/qtest/virtio-net-failover.c b/tests/qtest/virtio-net-failover.c
index 08365ff..f04573f 100644
--- a/tests/qtest/virtio-net-failover.c
+++ b/tests/qtest/virtio-net-failover.c
@@ -773,7 +773,7 @@ static void test_migrate_in(gconstpointer opaque)
     check_one_card(qts, true, "standby0", MAC_STANDBY0);
     check_one_card(qts, false, "primary0", MAC_PRIMARY0);
 
-    migrate_incoming_qmp(qts, uri, "{}");
+    migrate_incoming_qmp(qts, uri, NULL, "{}");
 
     resp = get_failover_negociated_event(qts);
     g_assert_cmpstr(qdict_get_str(resp, "device-id"), ==, "standby0");
@@ -895,7 +895,7 @@ static void test_off_migrate_in(gconstpointer opaque)
     check_one_card(qts, true, "standby0", MAC_STANDBY0);
     check_one_card(qts, true, "primary0", MAC_PRIMARY0);
 
-    migrate_incoming_qmp(qts, uri, "{}");
+    migrate_incoming_qmp(qts, uri, NULL, "{}");
 
     check_one_card(qts, true, "standby0", MAC_STANDBY0);
     check_one_card(qts, true, "primary0", MAC_PRIMARY0);
@@ -1022,7 +1022,7 @@ static void test_guest_off_migrate_in(gconstpointer opaque)
     check_one_card(qts, true, "standby0", MAC_STANDBY0);
     check_one_card(qts, false, "primary0", MAC_PRIMARY0);
 
-    migrate_incoming_qmp(qts, uri, "{}");
+    migrate_incoming_qmp(qts, uri, NULL, "{}");
 
     check_one_card(qts, true, "standby0", MAC_STANDBY0);
     check_one_card(qts, false, "primary0", MAC_PRIMARY0);
@@ -1747,7 +1747,7 @@ static void test_multi_in(gconstpointer opaque)
     check_one_card(qts, true, "standby1", MAC_STANDBY1);
     check_one_card(qts, false, "primary1", MAC_PRIMARY1);
 
-    migrate_incoming_qmp(qts, uri, "{}");
+    migrate_incoming_qmp(qts, uri, NULL, "{}");
 
     resp = get_failover_negociated_event(qts);
     g_assert_cmpstr(qdict_get_str(resp, "device-id"), ==, "standby0");
-- 
1.8.3.1




* [PATCH V7 22/24] tests/qtest: assert qmp connected
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (20 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 21/24] tests/qtest: enhance migration channels Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-15 19:00 ` [PATCH V7 23/24] migration-test: cpr-transfer Steve Sistare
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Assert that qmp_fd is valid when we communicate with the monitor.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 tests/qtest/libqtest.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index b1e0df9..812b7e8 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -788,6 +788,7 @@ QDict *qtest_qmp_receive(QTestState *s)
 
 QDict *qtest_qmp_receive_dict(QTestState *s)
 {
+    g_assert(s->qmp_fd >= 0);
     return qmp_fd_receive(s->qmp_fd);
 }
 
@@ -815,12 +816,14 @@ int qtest_socket_server(const char *socket_path)
 void qtest_qmp_vsend_fds(QTestState *s, int *fds, size_t fds_num,
                          const char *fmt, va_list ap)
 {
+    g_assert(s->qmp_fd >= 0);
     qmp_fd_vsend_fds(s->qmp_fd, fds, fds_num, fmt, ap);
 }
 #endif
 
 void qtest_qmp_vsend(QTestState *s, const char *fmt, va_list ap)
 {
+    g_assert(s->qmp_fd >= 0);
     qmp_fd_vsend(s->qmp_fd, fmt, ap);
 }
 
@@ -881,6 +884,7 @@ void qtest_qmp_send_raw(QTestState *s, const char *fmt, ...)
 {
     va_list ap;
 
+    g_assert(s->qmp_fd >= 0);
     va_start(ap, fmt);
     qmp_fd_vsend_raw(s->qmp_fd, fmt, ap);
     va_end(ap);
-- 
1.8.3.1




* [PATCH V7 23/24] migration-test: cpr-transfer
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (21 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 22/24] tests/qtest: assert qmp connected Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-16 19:06   ` Fabiano Rosas
  2025-01-15 19:00 ` [PATCH V7 24/24] migration: cpr-transfer documentation Steve Sistare
                   ` (2 subsequent siblings)
  25 siblings, 1 reply; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Add a migration test for cpr-transfer mode.  Defer the connection to the
target monitor; otherwise the test hangs, because in cpr-transfer mode the
target QEMU does not listen for monitor connections until the migrate
command has been sent to the source QEMU.

To test -incoming defer, send a migrate-incoming command to the target
after sending the migrate command to the source, as cpr-transfer mode
requires.
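
The command ordering described above, together with the rule that the cpr
channel goes only to the source's migrate command and never to
migrate-incoming, can be sketched as follows.  demo_plan, demo_channel and
DemoCmd are hypothetical names; the sketch only records the QMP commands in
the single-quoted JSON style that libqtest accepts, and it omits other
commands the framework sends (such as capability settings).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_CMDS 4
#define CMD_LEN 1024

/* One QMP command and the QEMU instance that receives it. */
typedef struct {
    const char *who;        /* "source" or "target" */
    char json[CMD_LEN];
} DemoCmd;

/* Format one unix-socket migration channel as JSON. */
static void demo_channel(char *buf, size_t len, const char *type,
                         const char *path)
{
    snprintf(buf, len,
             "{'channel-type': '%s', 'addr': {'transport': 'socket', "
             "'type': 'unix', 'path': '%s'}}", type, path);
}

/* Record, in order, the commands the cpr-transfer test issues. */
static int demo_plan(DemoCmd *cmds, bool incoming_defer,
                     const char *mig_path, const char *cpr_path)
{
    char main_ch[256], cpr_ch[256];
    int n = 0;

    demo_channel(main_ch, sizeof(main_ch), "main", mig_path);
    demo_channel(cpr_ch, sizeof(cpr_ch), "cpr", cpr_path);

    /* Source: select the mode before migrating (the test's start hook). */
    cmds[n].who = "source";
    snprintf(cmds[n].json, CMD_LEN,
             "{'execute': 'migrate-set-parameters', "
             "'arguments': {'mode': 'cpr-transfer'}}");
    n++;

    /* Source: migrate over the main channel plus the cpr channel. */
    cmds[n].who = "source";
    snprintf(cmds[n].json, CMD_LEN,
             "{'execute': 'migrate', 'arguments': {'channels': [%s, %s]}}",
             main_ch, cpr_ch);
    n++;

    /* Target: its monitor is reachable only now; connect and handshake. */
    cmds[n].who = "target";
    snprintf(cmds[n].json, CMD_LEN, "{'execute': 'qmp_capabilities'}");
    n++;

    /* Target, -incoming defer only: main channel, never the cpr channel. */
    if (incoming_defer) {
        cmds[n].who = "target";
        snprintf(cmds[n].json, CMD_LEN,
                 "{'execute': 'migrate-incoming', "
                 "'arguments': {'channels': [%s]}}", main_ch);
        n++;
    }
    assert(n <= MAX_CMDS);
    return n;
}
```

Issuing the target's handshake any earlier than the source's migrate
command would block, which is why the test defers the target connection.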

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/migration/cpr-tests.c | 62 +++++++++++++++++++++++++++++++++++++++
 tests/qtest/migration/framework.c | 19 ++++++++++++
 tests/qtest/migration/framework.h |  3 ++
 3 files changed, 84 insertions(+)

diff --git a/tests/qtest/migration/cpr-tests.c b/tests/qtest/migration/cpr-tests.c
index 44ce89a..215b0df 100644
--- a/tests/qtest/migration/cpr-tests.c
+++ b/tests/qtest/migration/cpr-tests.c
@@ -44,6 +44,62 @@ static void test_mode_reboot(void)
     test_file_common(&args, true);
 }
 
+static void *test_mode_transfer_start(QTestState *from, QTestState *to)
+{
+    migrate_set_parameter_str(from, "mode", "cpr-transfer");
+    return NULL;
+}
+
+/*
+ * cpr-transfer mode cannot use the target monitor prior to starting the
+ * migration, and cannot connect synchronously to the monitor, so defer
+ * the target connection.
+ */
+static void test_mode_transfer_common(bool incoming_defer)
+{
+    g_autofree char *cpr_path = g_strdup_printf("%s/cpr.sock", tmpfs);
+    g_autofree char *mig_path = g_strdup_printf("%s/migsocket", tmpfs);
+    g_autofree char *uri = g_strdup_printf("unix:%s", mig_path);
+
+    const char *opts = "-machine aux-ram-share=on -nodefaults";
+    g_autofree const char *cpr_channel = g_strdup_printf(
+        "cpr,addr.transport=socket,addr.type=unix,addr.path=%s",
+        cpr_path);
+    g_autofree char *opts_target = g_strdup_printf("-incoming %s %s",
+                                                   cpr_channel, opts);
+
+    g_autofree char *connect_channels = g_strdup_printf(
+        "[ { 'channel-type': 'main',"
+        "    'addr': { 'transport': 'socket',"
+        "              'type': 'unix',"
+        "              'path': '%s' } } ]",
+        mig_path);
+
+    MigrateCommon args = {
+        .start.opts_source = opts,
+        .start.opts_target = opts_target,
+        .start.defer_target_connect = true,
+        .start.memory_backend = "-object memory-backend-memfd,id=pc.ram,size=%s"
+                                " -machine memory-backend=pc.ram",
+        .listen_uri = incoming_defer ? "defer" : uri,
+        .connect_channels = connect_channels,
+        .cpr_channel = cpr_channel,
+        .start_hook = test_mode_transfer_start,
+    };
+
+    test_precopy_common(&args);
+}
+
+static void test_mode_transfer(void)
+{
+    test_mode_transfer_common(false);
+}
+
+static void test_mode_transfer_defer(void)
+{
+    test_mode_transfer_common(true);
+}
+
 void migration_test_add_cpr(MigrationTestEnv *env)
 {
     tmpfs = env->tmpfs;
@@ -55,4 +111,10 @@ void migration_test_add_cpr(MigrationTestEnv *env)
     if (getenv("QEMU_TEST_FLAKY_TESTS")) {
         migration_test_add("/migration/mode/reboot", test_mode_reboot);
     }
+
+    if (env->has_kvm) {
+        migration_test_add("/migration/mode/transfer", test_mode_transfer);
+        migration_test_add("/migration/mode/transfer/defer",
+                           test_mode_transfer_defer);
+    }
 }
diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index 8d34cb2..699beda 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -407,6 +407,7 @@ void migrate_end(QTestState *from, QTestState *to, bool test_dest)
     qtest_quit(to);
 
     cleanup("migsocket");
+    cleanup("cpr.sock");
     cleanup("src_serial");
     cleanup("dest_serial");
     cleanup(FILE_TEST_FILENAME);
@@ -688,8 +689,11 @@ void test_precopy_common(MigrateCommon *args)
 {
     QTestState *from, *to;
     void *data_hook = NULL;
+    QObject *in_channels = NULL;
     QObject *out_channels = NULL;
 
+    g_assert(!args->cpr_channel || args->connect_channels);
+
     if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
         return;
     }
@@ -721,8 +725,20 @@ void test_precopy_common(MigrateCommon *args)
         }
     }
 
+    /*
+     * The cpr channel must be included in outgoing channels, but not in
+     * migrate-incoming channels.
+     */
     if (args->connect_channels) {
+        in_channels = qobject_from_json(args->connect_channels, &error_abort);
         out_channels = qobject_from_json(args->connect_channels, &error_abort);
+
+        if (args->cpr_channel) {
+            QList *channels_list = qobject_to(QList, out_channels);
+            QObject *obj = migrate_str_to_channel(args->cpr_channel);
+
+            qlist_append(channels_list, obj);
+        }
     }
 
     if (args->result == MIG_TEST_QMP_ERROR) {
@@ -735,6 +751,9 @@ void test_precopy_common(MigrateCommon *args)
     if (args->start.defer_target_connect) {
         qtest_connect(to);
         qtest_qmp_handshake(to);
+        if (!strcmp(args->listen_uri, "defer")) {
+            migrate_incoming_qmp(to, args->connect_uri, in_channels, "{}");
+        }
     }
 
     if (args->result != MIG_TEST_SUCCEED) {
diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
index 1341368..4678e2a 100644
--- a/tests/qtest/migration/framework.h
+++ b/tests/qtest/migration/framework.h
@@ -152,6 +152,9 @@ typedef struct {
      */
     const char *connect_channels;
 
+    /* Optional: the cpr migration channel, in JSON or dotted keys format */
+    const char *cpr_channel;
+
     /* Optional: callback to run at start to set migration parameters */
     TestMigrateStartHook start_hook;
     /* Optional: callback to run at finish to cleanup */
-- 
1.8.3.1




* [PATCH V7 24/24] migration: cpr-transfer documentation
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (22 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 23/24] migration-test: cpr-transfer Steve Sistare
@ 2025-01-15 19:00 ` Steve Sistare
  2025-01-17 14:42   ` Fabiano Rosas
  2025-01-27 15:39 ` [PATCH V7 00/24] Live update: cpr-transfer Fabiano Rosas
  2025-04-09 16:22 ` Vladimir Sementsov-Ogievskiy
  25 siblings, 1 reply; 44+ messages in thread
From: Steve Sistare @ 2025-01-15 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster, Steve Sistare

Add documentation for the cpr-transfer migration mode.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 docs/devel/migration/CPR.rst | 182 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 180 insertions(+), 2 deletions(-)

diff --git a/docs/devel/migration/CPR.rst b/docs/devel/migration/CPR.rst
index 63c3647..d6021d5 100644
--- a/docs/devel/migration/CPR.rst
+++ b/docs/devel/migration/CPR.rst
@@ -5,7 +5,7 @@ CPR is the umbrella name for a set of migration modes in which the
 VM is migrated to a new QEMU instance on the same host.  It is
 intended for use when the goal is to update host software components
 that run the VM, such as QEMU or even the host kernel.  At this time,
-cpr-reboot is the only available mode.
+the cpr-reboot and cpr-transfer modes are available.
 
 Because QEMU is restarted on the same host, with access to the same
 local devices, CPR is allowed in certain cases where normal migration
@@ -53,7 +53,7 @@ RAM is copied to the migration URI.
 Outgoing:
   * Set the migration mode parameter to ``cpr-reboot``.
   * Set the ``x-ignore-shared`` capability if desired.
-  * Issue the ``migrate`` command.  It is recommended the the URI be a
+  * Issue the ``migrate`` command.  It is recommended the URI be a
     ``file`` type, but one can use other types such as ``exec``,
     provided the command captures all the data from the outgoing side,
     and provides all the data to the incoming side.
@@ -145,3 +145,181 @@ Caveats
 
 cpr-reboot mode may not be used with postcopy, background-snapshot,
 or COLO.
+
+cpr-transfer mode
+-----------------
+
+This mode allows the user to transfer a guest to a new QEMU instance
+on the same host with minimal guest pause time, by preserving guest
+RAM in place, albeit with new virtual addresses in new QEMU.  Devices
+and their pinned memory pages will also be preserved in a future QEMU
+release.
+
+The user starts new QEMU on the same host as old QEMU, with
+command-line arguments to create the same machine, plus the
+``-incoming`` option for the main migration channel, as in normal
+live migration.  In addition, the user adds a second ``-incoming``
+option with channel type ``cpr``.  This CPR channel must support file
+descriptor transfer with SCM_RIGHTS, i.e. it must be a UNIX domain socket.
+
+To initiate CPR, the user issues a migrate command to old QEMU,
+adding a second migration channel of type ``cpr`` in the channels
+argument.  Old QEMU stops the VM, saves state to the migration
+channels, and enters the postmigrate state.  Execution resumes in
+new QEMU.
+
+Because new QEMU reads the CPR channel before opening its monitor,
+the CPR channel cannot be specified in the list of channels for a
+``migrate-incoming`` command.  It may only be specified on the
+command line.
+
+Usage
+^^^^^
+
+Memory backend objects must have the ``share=on`` attribute.
+
+The VM must be started with the ``-machine aux-ram-share=on``
+option.  This causes implicit RAM blocks (those not described by
+a memory-backend object) to be allocated by mmap'ing a memfd.
+Examples include VGA and ROM.
+
+Outgoing:
+  * Set the migration mode parameter to ``cpr-transfer``.
+  * Issue the ``migrate`` command, containing a main channel and
+    a cpr channel.
+
+Incoming:
+  * Start new QEMU with two ``-incoming`` options.
+  * If the VM was running when the outgoing ``migrate`` command was
+    issued, then QEMU automatically resumes VM execution.
+
+Caveats
+^^^^^^^
+
+cpr-transfer mode may not be used with postcopy, background-snapshot,
+or COLO.
+
+memory-backend-epc is not supported.
+
+The main incoming migration channel address cannot be a file type.
+
+If the main incoming channel address is an inet socket, then the port
+cannot be 0 (meaning dynamically choose a port).
+
+When using ``-incoming defer``, you must issue the migrate command to
+old QEMU before issuing any monitor commands to new QEMU, because new
+QEMU blocks waiting to read from the CPR channel before starting its
+monitor, and old QEMU does not write to the channel until the migrate
+command is issued.  However, new QEMU does not open and read the
+main migration channel until you issue the ``migrate-incoming`` command.
+
+Example 1: incoming channel
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In these examples, we simply restart the same version of QEMU, but
+in a real scenario one would start an updated QEMU on the incoming side.
+Note that new QEMU does not print the monitor prompt until old QEMU
+has issued the migrate command.  The outgoing side uses QMP because
+HMP cannot specify a CPR channel.  Some QMP responses are omitted for
+brevity.
+
+::
+
+  Outgoing:                             Incoming:
+
+  # qemu-kvm -qmp stdio
+  -object memory-backend-file,id=ram0,size=4G,
+  mem-path=/dev/shm/ram0,share=on -m 4G
+  -machine aux-ram-share=on
+  ...
+                                        # qemu-kvm -monitor stdio
+                                        -incoming tcp:0:44444
+                                        -incoming '{"channel-type": "cpr",
+                                          "addr": { "transport": "socket",
+                                          "type": "unix", "path": "cpr.sock"}}'
+                                        ...
+  {"execute":"qmp_capabilities"}
+
+  {"execute": "query-status"}
+  {"return": {"status": "running",
+              "running": true}}
+
+  {"execute":"migrate-set-parameters",
+   "arguments":{"mode":"cpr-transfer"}}
+
+  {"execute": "migrate", "arguments": { "channels": [
+    {"channel-type": "main",
+     "addr": { "transport": "socket", "type": "inet",
+               "host": "0", "port": "44444" }},
+    {"channel-type": "cpr",
+     "addr": { "transport": "socket", "type": "unix",
+               "path": "cpr.sock" }}]}}
+
+                                        QEMU 10.0.50 monitor
+                                        (qemu) info status
+                                        VM status: running
+
+  {"execute": "query-status"}
+  {"return": {"status": "postmigrate",
+              "running": false}}
+
+Example 2: incoming defer
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This example uses ``-incoming defer`` to hot plug a device before
+accepting the main migration channel.  Again note you must issue the
+migrate command to old QEMU before you can issue any monitor
+commands to new QEMU.
+
+
+::
+
+  Outgoing:                             Incoming:
+
+  # qemu-kvm -qmp stdio
+  -object memory-backend-file,id=ram0,size=4G,
+  mem-path=/dev/shm/ram0,share=on -m 4G
+  -machine aux-ram-share=on
+  ...
+                                        # qemu-kvm -monitor stdio
+                                        -incoming defer
+                                        -incoming '{"channel-type": "cpr",
+                                          "addr": { "transport": "socket",
+                                          "type": "unix", "path": "cpr.sock"}}'
+                                        ...
+  {"execute":"qmp_capabilities"}
+
+  {"execute": "device_add",
+   "arguments": {"driver": "pcie-root-port"}}
+
+  {"execute":"migrate-set-parameters",
+   "arguments":{"mode":"cpr-transfer"}}
+
+  {"execute": "migrate", "arguments": { "channels": [
+    {"channel-type": "main",
+     "addr": { "transport": "socket", "type": "inet",
+               "host": "0", "port": "44444" }},
+    {"channel-type": "cpr",
+     "addr": { "transport": "socket", "type": "unix",
+               "path": "cpr.sock" }}]}}
+
+                                        QEMU 10.0.50 monitor
+                                        (qemu) info status
+                                        VM status: paused (inmigrate)
+                                        (qemu) device_add pcie-root-port
+                                        (qemu) migrate_incoming tcp:0:44444
+                                        (qemu) info status
+                                        VM status: running
+
+  {"execute": "query-status"}
+  {"return": {"status": "postmigrate",
+              "running": false}}
+
+Futures
+^^^^^^^
+
+cpr-transfer mode is based on a capability to transfer open file
+descriptors from old to new QEMU.  In the future, descriptors for
+vfio, iommufd, vhost, and char devices could be transferred,
+preserving those devices and their kernel state without interruption,
+even if they do not explicitly support live migration.
-- 
1.8.3.1
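
The documentation above makes two requirements. First, the CPR channel must carry file descriptors with SCM_RIGHTS, so it must be a UNIX domain socket. Second, guest RAM is preserved in place, but new QEMU sees it at new virtual addresses. The Linux-only sketch below illustrates both, assuming glibc 2.27 or later for `memfd_create()`. It is not QEMU's implementation. `send_fd()`, `recv_fd()` and `demo_handoff()` are hypothetical helpers. For brevity, both ends of a `socketpair()` live in one process, whereas in cpr-transfer the two ends are old and new QEMU.

```c
#define _GNU_SOURCE /* for memfd_create() */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one descriptor as SCM_RIGHTS ancillary data on a UNIX socket. */
static int send_fd(int sock, int fd)
{
    char byte = 0; /* at least one byte of ordinary data must accompany it */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align; /* forces correct alignment of buf */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; the kernel installs it as a new descriptor. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    int fd;

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* "Old" side maps memfd RAM and writes; "new" side maps the passed fd. */
static int demo_handoff(void)
{
    const size_t len = 4096;
    const char pattern[] = "guest data";
    int sv[2], memfd = -1, new_fd = -1, ok;
    char *old_va, *new_va;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 ||
        (memfd = memfd_create("ram0", 0)) < 0 ||
        ftruncate(memfd, len) < 0) {
        return -1;
    }
    old_va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, memfd, 0);
    if (old_va == MAP_FAILED) {
        return -1;
    }
    memcpy(old_va, pattern, sizeof(pattern));

    if (send_fd(sv[0], memfd) < 0 || (new_fd = recv_fd(sv[1])) < 0) {
        return -1;
    }
    /* Same pages, new virtual address: the old mapping is still live. */
    new_va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, new_fd, 0);
    if (new_va == MAP_FAILED) {
        return -1;
    }
    ok = new_va != old_va && memcmp(new_va, pattern, sizeof(pattern)) == 0;
    new_va[0] = 'G'; /* a write through one mapping is seen by the other */
    ok = ok && old_va[0] == 'G';

    munmap(new_va, len);
    munmap(old_va, len);
    close(new_fd);
    close(memfd);
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

In `demo_handoff()`, `new_va` differs from `old_va` while both map the same memfd pages, and no guest data is copied. This mirrors what the patch describes: RAM is preserved in place, albeit with new virtual addresses in new QEMU.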




* Re: [PATCH V7 23/24] migration-test: cpr-transfer
  2025-01-15 19:00 ` [PATCH V7 23/24] migration-test: cpr-transfer Steve Sistare
@ 2025-01-16 19:06   ` Fabiano Rosas
  2025-01-16 19:37     ` Steven Sistare
  0 siblings, 1 reply; 44+ messages in thread
From: Fabiano Rosas @ 2025-01-16 19:06 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Peter Xu, David Hildenbrand, Marcel Apfelbaum, Eduardo Habkost,
	Philippe Mathieu-Daude, Paolo Bonzini, Daniel P. Berrange,
	Markus Armbruster, Steve Sistare

Steve Sistare <steven.sistare@oracle.com> writes:

> Add a migration test for cpr-transfer mode.  Defer the connection to the
> target monitor, else the test hangs because in cpr-transfer mode QEMU does
> not listen for monitor connections until we send the migrate command to
> source QEMU.
>
> To test -incoming defer, send a migrate incoming command to the target,
> after sending the migrate command to the source, as required by
> cpr-transfer mode.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> ---
>  tests/qtest/migration/cpr-tests.c | 62 +++++++++++++++++++++++++++++++++++++++
>  tests/qtest/migration/framework.c | 19 ++++++++++++
>  tests/qtest/migration/framework.h |  3 ++
>  3 files changed, 84 insertions(+)
>
> diff --git a/tests/qtest/migration/cpr-tests.c b/tests/qtest/migration/cpr-tests.c
> index 44ce89a..215b0df 100644
> --- a/tests/qtest/migration/cpr-tests.c
> +++ b/tests/qtest/migration/cpr-tests.c
> @@ -44,6 +44,62 @@ static void test_mode_reboot(void)
>      test_file_common(&args, true);
>  }
>  
> +static void *test_mode_transfer_start(QTestState *from, QTestState *to)
> +{
> +    migrate_set_parameter_str(from, "mode", "cpr-transfer");
> +    return NULL;
> +}
> +
> +/*
> + * cpr-transfer mode cannot use the target monitor prior to starting the
> + * migration, and cannot connect synchronously to the monitor, so defer
> + * the target connection.
> + */
> +static void test_mode_transfer_common(bool incoming_defer)
> +{
> +    g_autofree char *cpr_path = g_strdup_printf("%s/cpr.sock", tmpfs);
> +    g_autofree char *mig_path = g_strdup_printf("%s/migsocket", tmpfs);
> +    g_autofree char *uri = g_strdup_printf("unix:%s", mig_path);
> +
> +    const char *opts = "-machine aux-ram-share=on -nodefaults";
> +    g_autofree const char *cpr_channel = g_strdup_printf(
> +        "cpr,addr.transport=socket,addr.type=unix,addr.path=%s",
> +        cpr_path);
> +    g_autofree char *opts_target = g_strdup_printf("-incoming %s %s",
> +                                                   cpr_channel, opts);
> +
> +    g_autofree char *connect_channels = g_strdup_printf(
> +        "[ { 'channel-type': 'main',"
> +        "    'addr': { 'transport': 'socket',"
> +        "              'type': 'unix',"
> +        "              'path': '%s' } } ]",
> +        mig_path);
> +
> +    MigrateCommon args = {
> +        .start.opts_source = opts,
> +        .start.opts_target = opts_target,
> +        .start.defer_target_connect = true,
> +        .start.memory_backend = "-object memory-backend-memfd,id=pc.ram,size=%s"
> +                                " -machine memory-backend=pc.ram",
> +        .listen_uri = incoming_defer ? "defer" : uri,
> +        .connect_channels = connect_channels,
> +        .cpr_channel = cpr_channel,
> +        .start_hook = test_mode_transfer_start,
> +    };
> +
> +    test_precopy_common(&args);
> +}
> +
> +static void test_mode_transfer(void)
> +{
> +    test_mode_transfer_common(false);
> +}
> +
> +static void test_mode_transfer_defer(void)
> +{
> +    test_mode_transfer_common(true);
> +}
> +
>  void migration_test_add_cpr(MigrationTestEnv *env)
>  {
>      tmpfs = env->tmpfs;
> @@ -55,4 +111,10 @@ void migration_test_add_cpr(MigrationTestEnv *env)
>      if (getenv("QEMU_TEST_FLAKY_TESTS")) {
>          migration_test_add("/migration/mode/reboot", test_mode_reboot);
>      }
> +
> +    if (env->has_kvm) {
> +        migration_test_add("/migration/mode/transfer", test_mode_transfer);
> +        migration_test_add("/migration/mode/transfer/defer",
> +                           test_mode_transfer_defer);
> +    }
>  }
> diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
> index 8d34cb2..699beda 100644
> --- a/tests/qtest/migration/framework.c
> +++ b/tests/qtest/migration/framework.c
> @@ -407,6 +407,7 @@ void migrate_end(QTestState *from, QTestState *to, bool test_dest)
>      qtest_quit(to);
>  
>      cleanup("migsocket");
> +    cleanup("cpr.sock");
>      cleanup("src_serial");
>      cleanup("dest_serial");
>      cleanup(FILE_TEST_FILENAME);
> @@ -688,8 +689,11 @@ void test_precopy_common(MigrateCommon *args)
>  {
>      QTestState *from, *to;
>      void *data_hook = NULL;
> +    QObject *in_channels = NULL;
>      QObject *out_channels = NULL;
>  
> +    g_assert(!args->cpr_channel || args->connect_channels);
> +
>      if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
>          return;
>      }
> @@ -721,8 +725,20 @@ void test_precopy_common(MigrateCommon *args)
>          }
>      }
>  
> +    /*
> +     * The cpr channel must be included in outgoing channels, but not in
> +     * migrate-incoming channels.
> +     */
>      if (args->connect_channels) {
> +        in_channels = qobject_from_json(args->connect_channels, &error_abort);
>          out_channels = qobject_from_json(args->connect_channels, &error_abort);
> +
> +        if (args->cpr_channel) {
> +            QList *channels_list = qobject_to(QList, out_channels);
> +            QObject *obj = migrate_str_to_channel(args->cpr_channel);
> +
> +            qlist_append(channels_list, obj);
> +        }
>      }
>  
>      if (args->result == MIG_TEST_QMP_ERROR) {
> @@ -735,6 +751,9 @@ void test_precopy_common(MigrateCommon *args)
>      if (args->start.defer_target_connect) {
>          qtest_connect(to);
>          qtest_qmp_handshake(to);
> +        if (!strcmp(args->listen_uri, "defer")) {
> +            migrate_incoming_qmp(to, args->connect_uri, in_channels, "{}");
> +        }

Paths that don't call migrate_incoming_qmp() never free
in_channels. We'll need something like this, let me know if I can squash
it in or you want to do it differently:

-- >8 --
From 62d60c39b3e5d38cac20241e63b9d023bd296d2f Mon Sep 17 00:00:00 2001
From: Fabiano Rosas <farosas@suse.de>
Date: Thu, 16 Jan 2025 15:40:22 -0300
Subject: [PATCH] fixup! migration-test: cpr-transfer

---
 tests/qtest/migration/framework.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index 699bedae69..1d5918d922 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -753,9 +753,14 @@ void test_precopy_common(MigrateCommon *args)
         qtest_qmp_handshake(to);
         if (!strcmp(args->listen_uri, "defer")) {
             migrate_incoming_qmp(to, args->connect_uri, in_channels, "{}");
+            in_channels = NULL;
         }
     }
 
+    if (in_channels) {
+        qobject_unref(in_channels);
+    }
+
     if (args->result != MIG_TEST_SUCCEED) {
         bool allow_active = args->result == MIG_TEST_FAIL;
         wait_for_migration_fail(from, allow_active);
-- 
2.35.3



* Re: [PATCH V7 23/24] migration-test: cpr-transfer
  2025-01-16 19:06   ` Fabiano Rosas
@ 2025-01-16 19:37     ` Steven Sistare
  2025-01-16 20:02       ` Fabiano Rosas
  0 siblings, 1 reply; 44+ messages in thread
From: Steven Sistare @ 2025-01-16 19:37 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel
  Cc: Peter Xu, David Hildenbrand, Marcel Apfelbaum, Eduardo Habkost,
	Philippe Mathieu-Daude, Paolo Bonzini, Daniel P. Berrange,
	Markus Armbruster

On 1/16/2025 2:06 PM, Fabiano Rosas wrote:
> Steve Sistare <steven.sistare@oracle.com> writes:
> 
[...]
>> +    /*
>> +     * The cpr channel must be included in outgoing channels, but not in
>> +     * migrate-incoming channels.
>> +     */
>>       if (args->connect_channels) {
>> +        in_channels = qobject_from_json(args->connect_channels, &error_abort);
>>           out_channels = qobject_from_json(args->connect_channels, &error_abort);
>> +
>> +        if (args->cpr_channel) {
>> +            QList *channels_list = qobject_to(QList, out_channels);
>> +            QObject *obj = migrate_str_to_channel(args->cpr_channel);
>> +
>> +            qlist_append(channels_list, obj);
>> +        }
>>       }
>>   
>>       if (args->result == MIG_TEST_QMP_ERROR) {
>> @@ -735,6 +751,9 @@ void test_precopy_common(MigrateCommon *args)
>>       if (args->start.defer_target_connect) {
>>           qtest_connect(to);
>>           qtest_qmp_handshake(to);
>> +        if (!strcmp(args->listen_uri, "defer")) {
>> +            migrate_incoming_qmp(to, args->connect_uri, in_channels, "{}");
>> +        }
> 
> Paths that don't call migrate_incoming_qmp() never free
> in_channels. We'll need something like this, let me know if I can squash
> it in or you want to do it differently:
> 
> -- >8 --
>  From 62d60c39b3e5d38cac20241e63b9d023bd296d2f Mon Sep 17 00:00:00 2001
> From: Fabiano Rosas <farosas@suse.de>
> Date: Thu, 16 Jan 2025 15:40:22 -0300
> Subject: [PATCH] fixup! migration-test: cpr-transfer
> 
> ---
>   tests/qtest/migration/framework.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
> index 699bedae69..1d5918d922 100644
> --- a/tests/qtest/migration/framework.c
> +++ b/tests/qtest/migration/framework.c
> @@ -753,9 +753,14 @@ void test_precopy_common(MigrateCommon *args)
>           qtest_qmp_handshake(to);
>           if (!strcmp(args->listen_uri, "defer")) {
>               migrate_incoming_qmp(to, args->connect_uri, in_channels, "{}");
> +            in_channels = NULL;
>           }
>       }
>   
> +    if (in_channels) {
> +        qobject_unref(in_channels);
> +    }
> +
>       if (args->result != MIG_TEST_SUCCEED) {
>           bool allow_active = args->result == MIG_TEST_FAIL;
>           wait_for_migration_fail(from, allow_active);

Thank you, though it would be more direct to avoid creating in_channels when
not needed:

     if (args->connect_channels) {
         if (args->start.defer_target_connect) {
             in_channels = qobject_from_json(args->connect_channels,
                                             &error_abort);
         }
         out_channels = qobject_from_json(args->connect_channels, &error_abort);

- Steve




* Re: [PATCH V7 23/24] migration-test: cpr-transfer
  2025-01-16 19:37     ` Steven Sistare
@ 2025-01-16 20:02       ` Fabiano Rosas
  2025-01-16 20:15         ` Steven Sistare
  0 siblings, 1 reply; 44+ messages in thread
From: Fabiano Rosas @ 2025-01-16 20:02 UTC (permalink / raw)
  To: Steven Sistare, qemu-devel
  Cc: Peter Xu, David Hildenbrand, Marcel Apfelbaum, Eduardo Habkost,
	Philippe Mathieu-Daude, Paolo Bonzini, Daniel P. Berrange,
	Markus Armbruster

Steven Sistare <steven.sistare@oracle.com> writes:

> On 1/16/2025 2:06 PM, Fabiano Rosas wrote:
>> Steve Sistare <steven.sistare@oracle.com> writes:
>> 
> [...]
>>> +    /*
>>> +     * The cpr channel must be included in outgoing channels, but not in
>>> +     * migrate-incoming channels.
>>> +     */
>>>       if (args->connect_channels) {
>>> +        in_channels = qobject_from_json(args->connect_channels, &error_abort);
>>>           out_channels = qobject_from_json(args->connect_channels, &error_abort);
>>> +
>>> +        if (args->cpr_channel) {
>>> +            QList *channels_list = qobject_to(QList, out_channels);
>>> +            QObject *obj = migrate_str_to_channel(args->cpr_channel);
>>> +
>>> +            qlist_append(channels_list, obj);
>>> +        }
>>>       }
>>>   
>>>       if (args->result == MIG_TEST_QMP_ERROR) {
>>> @@ -735,6 +751,9 @@ void test_precopy_common(MigrateCommon *args)
>>>       if (args->start.defer_target_connect) {
>>>           qtest_connect(to);
>>>           qtest_qmp_handshake(to);
>>> +        if (!strcmp(args->listen_uri, "defer")) {
>>> +            migrate_incoming_qmp(to, args->connect_uri, in_channels, "{}");
>>> +        }
>> 
>> Paths that don't call migrate_incoming_qmp() never free
>> in_channels. We'll need something like this, let me know if I can squash
>> it in or you want to do it differently:
>> 
>> -- >8 --
>>  From 62d60c39b3e5d38cac20241e63b9d023bd296d2f Mon Sep 17 00:00:00 2001
>> From: Fabiano Rosas <farosas@suse.de>
>> Date: Thu, 16 Jan 2025 15:40:22 -0300
>> Subject: [PATCH] fixup! migration-test: cpr-transfer
>> 
>> ---
>>   tests/qtest/migration/framework.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>> 
>> diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
>> index 699bedae69..1d5918d922 100644
>> --- a/tests/qtest/migration/framework.c
>> +++ b/tests/qtest/migration/framework.c
>> @@ -753,9 +753,14 @@ void test_precopy_common(MigrateCommon *args)
>>           qtest_qmp_handshake(to);
>>           if (!strcmp(args->listen_uri, "defer")) {
>>               migrate_incoming_qmp(to, args->connect_uri, in_channels, "{}");
>> +            in_channels = NULL;
>>           }
>>       }
>>   
>> +    if (in_channels) {
>> +        qobject_unref(in_channels);
>> +    }
>> +
>>       if (args->result != MIG_TEST_SUCCEED) {
>>           bool allow_active = args->result == MIG_TEST_FAIL;
>>           wait_for_migration_fail(from, allow_active);
>
> Thank-you, though it would be more direct to avoid creating in_channels when
> not needed:
>
>      if (args->connect_channels) {
>          if (args->start.defer_target_connect) {
>              in_channels = qobject_from_json(args->connect_channels,
>                                              &error_abort);
>          }
>          out_channels = qobject_from_json(args->connect_channels, &error_abort);

That's better, but still needs one unref for the listen_uri != defer path.

>
> - Steve



* Re: [PATCH V7 23/24] migration-test: cpr-transfer
  2025-01-16 20:02       ` Fabiano Rosas
@ 2025-01-16 20:15         ` Steven Sistare
  0 siblings, 0 replies; 44+ messages in thread
From: Steven Sistare @ 2025-01-16 20:15 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel
  Cc: Peter Xu, David Hildenbrand, Marcel Apfelbaum, Eduardo Habkost,
	Philippe Mathieu-Daude, Paolo Bonzini, Daniel P. Berrange,
	Markus Armbruster

On 1/16/2025 3:02 PM, Fabiano Rosas wrote:
> Steven Sistare <steven.sistare@oracle.com> writes:
> 
>> On 1/16/2025 2:06 PM, Fabiano Rosas wrote:
>>> Steve Sistare <steven.sistare@oracle.com> writes:
>>>
>> [...]
>>>> +    /*
>>>> +     * The cpr channel must be included in outgoing channels, but not in
>>>> +     * migrate-incoming channels.
>>>> +     */
>>>>        if (args->connect_channels) {
>>>> +        in_channels = qobject_from_json(args->connect_channels, &error_abort);
>>>>            out_channels = qobject_from_json(args->connect_channels, &error_abort);
>>>> +
>>>> +        if (args->cpr_channel) {
>>>> +            QList *channels_list = qobject_to(QList, out_channels);
>>>> +            QObject *obj = migrate_str_to_channel(args->cpr_channel);
>>>> +
>>>> +            qlist_append(channels_list, obj);
>>>> +        }
>>>>        }
>>>>    
>>>>        if (args->result == MIG_TEST_QMP_ERROR) {
>>>> @@ -735,6 +751,9 @@ void test_precopy_common(MigrateCommon *args)
>>>>        if (args->start.defer_target_connect) {
>>>>            qtest_connect(to);
>>>>            qtest_qmp_handshake(to);
>>>> +        if (!strcmp(args->listen_uri, "defer")) {
>>>> +            migrate_incoming_qmp(to, args->connect_uri, in_channels, "{}");
>>>> +        }
>>>
>>> Paths that don't call migrate_incoming_qmp() never free
>>> in_channels. We'll need something like this, let me know if I can squash
>>> it in or you want to do it differently:
>>>
>>> -- >8 --
>>>   From 62d60c39b3e5d38cac20241e63b9d023bd296d2f Mon Sep 17 00:00:00 2001
>>> From: Fabiano Rosas <farosas@suse.de>
>>> Date: Thu, 16 Jan 2025 15:40:22 -0300
>>> Subject: [PATCH] fixup! migration-test: cpr-transfer
>>>
>>> ---
>>>    tests/qtest/migration/framework.c | 5 +++++
>>>    1 file changed, 5 insertions(+)
>>>
>>> diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
>>> index 699bedae69..1d5918d922 100644
>>> --- a/tests/qtest/migration/framework.c
>>> +++ b/tests/qtest/migration/framework.c
>>> @@ -753,9 +753,14 @@ void test_precopy_common(MigrateCommon *args)
>>>            qtest_qmp_handshake(to);
>>>            if (!strcmp(args->listen_uri, "defer")) {
>>>                migrate_incoming_qmp(to, args->connect_uri, in_channels, "{}");
>>> +            in_channels = NULL;
>>>            }
>>>        }
>>>    
>>> +    if (in_channels) {
>>> +        qobject_unref(in_channels);
>>> +    }
>>> +
>>>        if (args->result != MIG_TEST_SUCCEED) {
>>>            bool allow_active = args->result == MIG_TEST_FAIL;
>>>            wait_for_migration_fail(from, allow_active);
>>
>> Thank-you, though it would be more direct to avoid creating in_channels when
>> not needed:
>>
>>       if (args->connect_channels) {
>>           if (args->start.defer_target_connect) {
>>               in_channels = qobject_from_json(args->connect_channels,
>>                                               &error_abort);
>>           }
>>           out_channels = qobject_from_json(args->connect_channels, &error_abort);
> 
> That's better, but still needs one unref for the listen_uri != defer path.

OK, then:

     if (args->connect_channels) {
         if (args->start.defer_target_connect &&
             !strcmp(args->listen_uri, "defer")) {
             in_channels = qobject_from_json(args->connect_channels,
                                             &error_abort);
         }
         out_channels = qobject_from_json(args->connect_channels, &error_abort);

Or keep your fix.  I have no preference.

- Steve
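
The ownership question in this exchange can be modeled with a minimal standalone sketch. `Obj`, `obj_new()`, `obj_unref()` and `consume()` are hypothetical stand-ins, not QEMU's QObject API. `consume()` assumes what the discussion implies: `migrate_incoming_qmp()` takes over the caller's reference to its channels argument. `run_original()` builds `in_channels` whenever channels are given. `run_fixed()` follows the final suggestion and builds it only on the path that hands it off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted QObject; not QEMU's API. */
typedef struct Obj {
    int refcnt;
} Obj;

/* Number of objects allocated but not yet freed. */
static int live_objs;

static Obj *obj_new(void)
{
    Obj *o = calloc(1, sizeof(*o));
    assert(o != NULL);
    o->refcnt = 1;
    live_objs++;
    return o;
}

/* Drop one reference; accepts NULL, like qobject_unref(). */
static void obj_unref(Obj *o)
{
    if (o && --o->refcnt == 0) {
        free(o);
        live_objs--;
    }
}

/* Models a callee that takes ownership of the reference it is passed. */
static void consume(Obj *channels)
{
    obj_unref(channels);
}

/* As in the patch: in_channels is built whenever channels are given. */
static void run_original(bool have_channels, bool defer_connect,
                         bool listen_defer)
{
    Obj *in_channels = NULL;

    if (have_channels) {
        in_channels = obj_new();
    }
    if (defer_connect && listen_defer) {
        consume(in_channels);
    }
    /* Any other path returns without releasing in_channels: a leak. */
}

/* Final suggestion: build in_channels only where it will be consumed. */
static void run_fixed(bool have_channels, bool defer_connect,
                      bool listen_defer)
{
    Obj *in_channels = NULL;

    if (have_channels && defer_connect && listen_defer) {
        in_channels = obj_new();
    }
    if (defer_connect && listen_defer) {
        consume(in_channels); /* NULL when no channels were given */
    }
}
```

For example, `run_original(true, true, false)` models `defer_target_connect` set with a non-"defer" `listen_uri`. It leaves `live_objs` at 1, which is the leak Fabiano points out. `run_fixed()` returns it to 0 for every combination of arguments.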




* Re: [PATCH V7 24/24] migration: cpr-transfer documentation
  2025-01-15 19:00 ` [PATCH V7 24/24] migration: cpr-transfer documentation Steve Sistare
@ 2025-01-17 14:42   ` Fabiano Rosas
  2025-01-17 15:04     ` Steven Sistare
  0 siblings, 1 reply; 44+ messages in thread
From: Fabiano Rosas @ 2025-01-17 14:42 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Peter Xu, David Hildenbrand, Marcel Apfelbaum, Eduardo Habkost,
	Philippe Mathieu-Daude, Paolo Bonzini, Daniel P. Berrange,
	Markus Armbruster, Steve Sistare

Steve Sistare <steven.sistare@oracle.com> writes:

> Add documentation for the cpr-transfer migration mode.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> ---
>  docs/devel/migration/CPR.rst | 182 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 180 insertions(+), 2 deletions(-)
>
> diff --git a/docs/devel/migration/CPR.rst b/docs/devel/migration/CPR.rst
> index 63c3647..d6021d5 100644
> --- a/docs/devel/migration/CPR.rst
> +++ b/docs/devel/migration/CPR.rst
> @@ -5,7 +5,7 @@ CPR is the umbrella name for a set of migration modes in which the
>  VM is migrated to a new QEMU instance on the same host.  It is
>  intended for use when the goal is to update host software components
>  that run the VM, such as QEMU or even the host kernel.  At this time,
> -cpr-reboot is the only available mode.
> +the cpr-reboot and cpr-transfer modes are available.
>  
>  Because QEMU is restarted on the same host, with access to the same
>  local devices, CPR is allowed in certain cases where normal migration
> @@ -53,7 +53,7 @@ RAM is copied to the migration URI.
>  Outgoing:
>    * Set the migration mode parameter to ``cpr-reboot``.
>    * Set the ``x-ignore-shared`` capability if desired.
> -  * Issue the ``migrate`` command.  It is recommended the the URI be a
> +  * Issue the ``migrate`` command.  It is recommended the URI be a
>      ``file`` type, but one can use other types such as ``exec``,
>      provided the command captures all the data from the outgoing side,
>      and provides all the data to the incoming side.
> @@ -145,3 +145,181 @@ Caveats
>  
>  cpr-reboot mode may not be used with postcopy, background-snapshot,
>  or COLO.
> +
> +cpr-transfer mode
> +-----------------
> +
> +This mode allows the user to transfer a guest to a new QEMU instance
> +on the same host with minimal guest pause time, by preserving guest
> +RAM in place, albeit with new virtual addresses in new QEMU.  Devices
> +and their pinned memory pages will also be preserved in a future QEMU
> +release.
> +
> +The user starts new QEMU on the same host as old QEMU, with command-
> +line arguments to create the same machine, plus the ``-incoming``
> +option for the main migration channel, like normal live migration.
> +In addition, the user adds a second -incoming option with channel
> +type ``cpr``.  This CPR channel must support file descriptor transfer
> +with SCM_RIGHTS, i.e. it must be a UNIX domain socket.
> +
> +To initiate CPR, the user issues a migrate command to old QEMU,
> +adding a second migration channel of type ``cpr`` in the channels
> +argument.  Old QEMU stops the VM, saves state to the migration
> +channels, and enters the postmigrate state.  Execution resumes in
> +new QEMU.
> +
> +New QEMU reads the CPR channel before opening a monitor, hence
> +the CPR channel cannot be specified in the list of channels for a
> +migrate-incoming command.  It may only be specified on the command
> +line.
> +
> +Usage
> +^^^^^
> +
> +Memory backend objects must have the ``share=on`` attribute.
> +
> +The VM must be started with the ``-machine aux-ram-share=on``
> +option.  This causes implicit RAM blocks (those not described by
> +a memory-backend object) to be allocated by mmap'ing a memfd.
> +Examples include VGA and ROM.
> +
> +Outgoing:
> +  * Set the migration mode parameter to ``cpr-transfer``.
> +  * Issue the ``migrate`` command, containing a main channel and
> +    a cpr channel.
> +
> +Incoming:
> +  * Start new QEMU with two ``-incoming`` options.
> +  * If the VM was running when the outgoing ``migrate`` command was
> +    issued, then QEMU automatically resumes VM execution.
> +
> +Caveats
> +^^^^^^^
> +
> +cpr-transfer mode may not be used with postcopy, background-snapshot,
> +or COLO.
> +
> +memory-backend-epc is not supported.
> +
> +The main incoming migration channel address cannot be a file type.
> +
> +If the main incoming channel address is an inet socket, then the port
> +cannot be 0 (meaning dynamically choose a port).
> +
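The caveats above can be turned into a pre-flight check on a channel list shaped like the channels argument in the examples below. The incoming side listens on the same addresses, so the rules about the incoming main channel are checked against the main entry. check_cpr_transfer_channels() is a hypothetical helper, not part of QEMU, and it encodes only the rules stated in this section. The capability names are QEMU's migration capability names. QEMU itself remains the authority and may reject more.

```python
from typing import Any, Dict, List, Optional

# Capabilities the caveats rule out: postcopy, background-snapshot and COLO.
INCOMPATIBLE_CAPS = ("postcopy-ram", "background-snapshot", "x-colo")


def check_cpr_transfer_channels(channels: List[Dict[str, Any]],
                                caps: Optional[Dict[str, bool]] = None) -> List[str]:
    """Return the rule violations found; an empty list means none."""
    problems: List[str] = []
    addrs: Dict[str, List[Dict[str, Any]]] = {}
    for ch in channels:
        addrs.setdefault(ch.get("channel-type", ""), []).append(ch.get("addr", {}))

    mains, cprs = addrs.get("main", []), addrs.get("cpr", [])
    if len(mains) != 1:
        problems.append("need exactly one main channel")
    if len(cprs) != 1:
        problems.append("need exactly one cpr channel")

    for addr in cprs:
        # fd passing needs SCM_RIGHTS, i.e. a UNIX domain socket.
        if addr.get("transport") != "socket" or addr.get("type") != "unix":
            problems.append("cpr channel must be a UNIX domain socket")

    for addr in mains:
        if addr.get("transport") == "file":
            problems.append("main channel cannot be a file")
        if (addr.get("transport") == "socket" and addr.get("type") == "inet"
                and str(addr.get("port")) == "0"):
            problems.append("main inet port cannot be 0")

    for cap in INCOMPATIBLE_CAPS:
        if (caps or {}).get(cap):
            problems.append(f"capability {cap} is incompatible with cpr-transfer")
    return problems
```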
> +When using ``-incoming defer``, you must issue the migrate command to
> +old QEMU before issuing any monitor commands to new QEMU, because new
> +QEMU blocks waiting to read from the cpr channel before starting its
> +monitor, and old QEMU does not write to the channel until the migrate
> +command is issued.  However, new QEMU does not open and read the
> +main migration channel until you issue the ``migrate-incoming`` command.
> +
> +Example 1: incoming channel
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +In these examples, we simply restart the same version of QEMU, but
> +in a real scenario one would start new QEMU on the incoming side.
> +Note that new QEMU does not print the monitor prompt until old QEMU
> +has issued the migrate command.  The outgoing side uses QMP because
> +HMP cannot specify a CPR channel.  Some QMP responses are omitted for
> +brevity.
> +
> +::
> +
> +  Outgoing:                             Incoming:
> +
> +  # qemu-kvm -qmp stdio
> +  -object memory-backend-file,id=ram0,size=4G,
> +  mem-path=/dev/shm/ram0,share=on -m 4G
> +  -machine aux-ram-share=on
> +  ...
> +                                        # qemu-kvm -monitor stdio
> +                                        -incoming tcp:0:44444
> +                                        -incoming '{"channel-type": "cpr",
> +                                          "addr": { "transport": "socket",
> +                                          "type": "unix", "path": "cpr.sock"}}'
> +                                        ...

I'm attempting this and not having much success. Surely I'm missing
something:


$ qemu-system-x86_64 -cpu host -smp 16 -machine pc,accel=kvm \
  -drive id=drive0,if=none,format=qcow2,file=img.qcow2 \
  -device virtio-blk-pci,id=image1,drive=drive0,bootindex=0 \
  -qmp unix:./dst-qmp.sock,server,wait=off \
  -nographic -serial mon:stdio \
  -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
  -m 4G -machine aux-ram-share=on \
  -incoming tcp:0:44444 \
  -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}' \
  -trace loadvm_* -trace cpr_* -trace migration_* -trace migrate_* -trace qemu_loadvm_*

cpr_transfer_input cpr.sock
cpr_state_load cpr-transfer mode
cpr_find_fd pc.bios, id 0 returns 15
cpr_find_fd pc.rom, id 0 returns 14
cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 13
cpr_find_fd 0000:00:02.0/vga.rom, id 0 returns 12
cpr_find_fd 0000:00:03.0/e1000.rom, id 0 returns 11
cpr_find_fd /rom@etc/acpi/tables, id 0 returns 10
cpr_find_fd /rom@etc/table-loader, id 0 returns 8
cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 3
migrate_set_state new state setup
migration_socket_incoming_accepted 
migration_set_incoming_channel ioc=0x564dc31e7000 ioctype=qio-channel-socket
migrate_set_state new state active
loadvm_state_setup 
qemu_loadvm_state_section 1
qemu_loadvm_state_section_startfull 2(ram) 0 4
qemu_loadvm_state_section 3
qemu_loadvm_state_section_partend 2
qemu_loadvm_state_section 4
qemu_loadvm_state_section_startfull 0(timer) 0 2
qemu_loadvm_state_section 4
qemu_loadvm_state_section_startfull 1(slirp) 0 4
qemu_loadvm_state_section 4
qemu_loadvm_state_section_startfull 4(cpu_common) 0 1
qemu_loadvm_state_section 4
qemu_loadvm_state_section_startfull 5(cpu) 0 12
qemu_loadvm_state_section 4
qemu_loadvm_state_section_startfull 6(kvm-tpr-opt) 0 1
qemu-system-x86_64: error while loading state for instance 0x0 of device 'kvm-tpr-opt'
qemu_loadvm_state_post_main -1
migrate_set_state new state failed
migrate_error error=load of migration failed: Operation not permitted
loadvm_state_cleanup 
qemu-system-x86_64: load of migration failed: Operation not permitted
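When a load fails like this, the last qemu_loadvm_state_section_startfull trace line names the section that was being loaded. Below is a small Python sketch that extracts it from a captured trace. It assumes the line format seen in the log above (section_id(idstr) instance_id version_id). The helper name is made up for illustration.

```python
import re
from typing import Iterable, Optional, Tuple

# e.g. "qemu_loadvm_state_section_startfull 6(kvm-tpr-opt) 0 1"
STARTFULL = re.compile(
    r"qemu_loadvm_state_section_startfull (\d+)\((.+?)\) (\d+) (\d+)")


def last_section_started(lines: Iterable[str]) -> Optional[Tuple[int, str, int, int]]:
    """Return (section_id, idstr, instance_id, version_id) from the last
    startfull trace line, or None if the trace contains none."""
    last = None
    for line in lines:
        m = STARTFULL.search(line)
        if m:
            last = (int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4)))
    return last


if __name__ == "__main__":
    import sys
    print(last_section_started(sys.stdin))
```

Assuming the default trace backend, which writes to stderr, the script can be fed with something like "2>&1 |" on the destination command line.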

> +  {"execute":"qmp_capabilities"}
> +
> +  {"execute": "query-status"}
> +  {"return": {"status": "running",
> +              "running": true}}
> +
> +  {"execute":"migrate-set-parameters",
> +   "arguments":{"mode":"cpr-transfer"}}
> +
> +  {"execute": "migrate", "arguments": { "channels": [
> +    {"channel-type": "main",
> +     "addr": { "transport": "socket", "type": "inet",
> +               "host": "0", "port": "44444" }},
> +    {"channel-type": "cpr",
> +     "addr": { "transport": "socket", "type": "unix",
> +               "path": "cpr.sock" }}]}}
> +
> +                                        QEMU 10.0.50 monitor
> +                                        (qemu) info status
> +                                        VM status: running
> +
> +  {"execute": "query-status"}
> +  {"return": {"status": "postmigrate",
> +              "running": false}}
> +
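The outgoing QMP sequence in Example 1 can also be driven from a script instead of being typed at -qmp stdio. Below is a minimal Python sketch. It assumes old QEMU was started with a QMP UNIX socket (for example -qmp unix:./src-qmp.sock,server,wait=off, where the path is an assumption) and that QEMU sends its default one JSON object per line. It performs only the basic handshake (read the greeting, send qmp_capabilities) and skips asynchronous events while waiting for each reply. MiniQMP and start_cpr_transfer are hypothetical names. The qemu.qmp Python package is the fuller option.

```python
import json
import socket
from typing import Any, Dict, Optional


class MiniQMP:
    """Minimal QMP client: one JSON object per line in each direction."""

    def __init__(self, sock: socket.socket):
        self.sock = sock
        self.rfile = sock.makefile("r", encoding="utf-8")
        greeting = self._read()
        if "QMP" not in greeting:
            raise RuntimeError(f"unexpected greeting: {greeting}")
        self.execute("qmp_capabilities")  # leave capabilities negotiation mode

    def _read(self) -> Dict[str, Any]:
        line = self.rfile.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        return json.loads(line)

    def execute(self, cmd: str, args: Optional[Dict[str, Any]] = None) -> Any:
        msg: Dict[str, Any] = {"execute": cmd}
        if args is not None:
            msg["arguments"] = args
        self.sock.sendall((json.dumps(msg) + "\n").encode("utf-8"))
        while True:
            reply = self._read()
            if "event" in reply:  # asynchronous event such as STOP; not our reply
                continue
            if "error" in reply:
                raise RuntimeError(f"{cmd}: {reply['error'].get('desc')}")
            return reply["return"]


def start_cpr_transfer(qmp: MiniQMP, port: str = "44444",
                       cpr_path: str = "cpr.sock") -> None:
    """Issue the same two commands as the outgoing side of Example 1."""
    qmp.execute("migrate-set-parameters", {"mode": "cpr-transfer"})
    qmp.execute("migrate", {"channels": [
        {"channel-type": "main",
         "addr": {"transport": "socket", "type": "inet",
                  "host": "0", "port": port}},
        {"channel-type": "cpr",
         "addr": {"transport": "socket", "type": "unix",
                  "path": cpr_path}}]})
```

To use it, connect a socket.AF_UNIX socket to the QMP path, wrap it in MiniQMP and call start_cpr_transfer(). The migrate command returns once migration has started, so poll query-status until it reports postmigrate.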
> +Example 2: incoming defer
> +^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +This example uses ``-incoming defer`` to hot-plug a device before
> +accepting the main migration channel.  Again note you must issue the
> +migrate command to old QEMU before you can issue any monitor
> +commands to new QEMU.
> +
> +
> +::
> +
> +  Outgoing:                             Incoming:
> +
> +  # qemu-kvm -monitor stdio
> +  -object memory-backend-file,id=ram0,size=4G,
> +  mem-path=/dev/shm/ram0,share=on -m 4G
> +  -machine aux-ram-share=on
> +  ...
> +                                        # qemu-kvm -monitor stdio
> +                                        -incoming defer
> +                                        -incoming '{"channel-type": "cpr",
> +                                          "addr": { "transport": "socket",
> +                                          "type": "unix", "path": "cpr.sock"}}'
> +                                        ...
> +  {"execute":"qmp_capabilities"}
> +
> +  {"execute": "device_add",
> +   "arguments": {"driver": "pcie-root-port"}}
> +
> +  {"execute":"migrate-set-parameters",
> +   "arguments":{"mode":"cpr-transfer"}}
> +
> +  {"execute": "migrate", "arguments": { "channels": [
> +    {"channel-type": "main",
> +     "addr": { "transport": "socket", "type": "inet",
> +               "host": "0", "port": "44444" }},
> +    {"channel-type": "cpr",
> +     "addr": { "transport": "socket", "type": "unix",
> +               "path": "cpr.sock" }}]}}
> +
> +                                        QEMU 10.0.50 monitor
> +                                        (qemu) info status
> +                                        VM status: paused (inmigrate)
> +                                        (qemu) device_add pcie-root-port
> +                                        (qemu) migrate_incoming tcp:0:44444
> +                                        (qemu) info status
> +                                        VM status: running
> +
> +  {"execute": "query-status"}
> +  {"return": {"status": "postmigrate",
> +              "running": false}}
> +
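Quoting the JSON -incoming argument correctly in a shell is error-prone. Below is a Python sketch that builds the new-QEMU argument vector directly, so no shell quoting is involved. new_qemu_argv is a hypothetical helper. machine_args stands for whatever options recreate the same machine (an assumption here), and main="defer" matches Example 2.

```python
import json
from typing import List


def new_qemu_argv(qemu: str, machine_args: List[str], main: str = "defer",
                  cpr_path: str = "cpr.sock") -> List[str]:
    """argv for new QEMU: the same machine args plus two -incoming options."""
    cpr_channel = {
        "channel-type": "cpr",
        "addr": {"transport": "socket", "type": "unix", "path": cpr_path},
    }
    # json.dumps produces the channel argument; when the list is passed
    # straight to exec (e.g. subprocess.Popen without shell=True), each
    # element reaches QEMU unchanged.
    return [qemu, *machine_args, "-incoming", main,
            "-incoming", json.dumps(cpr_channel)]
```

With main="defer", once old QEMU has been sent the migrate command, the deferred channel can be opened over QMP with {"execute": "migrate-incoming", "arguments": {"uri": "tcp:0:44444"}}. This is the QMP counterpart of the HMP migrate_incoming shown above.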
> +Futures
> +^^^^^^^
> +
> +cpr-transfer mode is based on a capability to transfer open file
> +descriptors from old to new QEMU.  In the future, descriptors for
> +vfio, iommufd, vhost, and char devices could be transferred,
> +preserving those devices and their kernel state without interruption,
> +even if they do not explicitly support live migration.



* Re: [PATCH V7 24/24] migration: cpr-transfer documentation
  2025-01-17 14:42   ` Fabiano Rosas
@ 2025-01-17 15:04     ` Steven Sistare
  2025-01-17 15:29       ` Fabiano Rosas
  0 siblings, 1 reply; 44+ messages in thread
From: Steven Sistare @ 2025-01-17 15:04 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel
  Cc: Peter Xu, David Hildenbrand, Marcel Apfelbaum, Eduardo Habkost,
	Philippe Mathieu-Daude, Paolo Bonzini, Daniel P. Berrange,
	Markus Armbruster

On 1/17/2025 9:42 AM, Fabiano Rosas wrote:
> Steve Sistare <steven.sistare@oracle.com> writes:
> 
>> Add documentation for the cpr-transfer migration mode.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> ---
>>   docs/devel/migration/CPR.rst | 182 ++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 180 insertions(+), 2 deletions(-)
>>
>> diff --git a/docs/devel/migration/CPR.rst b/docs/devel/migration/CPR.rst
>> index 63c3647..d6021d5 100644
>> --- a/docs/devel/migration/CPR.rst
>> +++ b/docs/devel/migration/CPR.rst
>> @@ -5,7 +5,7 @@ CPR is the umbrella name for a set of migration modes in which the
>>   VM is migrated to a new QEMU instance on the same host.  It is
>>   intended for use when the goal is to update host software components
>>   that run the VM, such as QEMU or even the host kernel.  At this time,
>> -cpr-reboot is the only available mode.
>> +the cpr-reboot and cpr-transfer modes are available.
>>   
>>   Because QEMU is restarted on the same host, with access to the same
>>   local devices, CPR is allowed in certain cases where normal migration
>> @@ -53,7 +53,7 @@ RAM is copied to the migration URI.
>>   Outgoing:
>>     * Set the migration mode parameter to ``cpr-reboot``.
>>     * Set the ``x-ignore-shared`` capability if desired.
>> -  * Issue the ``migrate`` command.  It is recommended the the URI be a
>> +  * Issue the ``migrate`` command.  It is recommended the URI be a
>>       ``file`` type, but one can use other types such as ``exec``,
>>       provided the command captures all the data from the outgoing side,
>>       and provides all the data to the incoming side.
>> @@ -145,3 +145,181 @@ Caveats
>>   
>>   cpr-reboot mode may not be used with postcopy, background-snapshot,
>>   or COLO.
>> +
>> +cpr-transfer mode
>> +-----------------
>> +
>> +This mode allows the user to transfer a guest to a new QEMU instance
>> +on the same host with minimal guest pause time, by preserving guest
>> +RAM in place, albeit with new virtual addresses in new QEMU.  Devices
>> +and their pinned memory pages will also be preserved in a future QEMU
>> +release.
>> +
>> +The user starts new QEMU on the same host as old QEMU, with command-
>> +line arguments to create the same machine, plus the ``-incoming``
>> +option for the main migration channel, like normal live migration.
>> +In addition, the user adds a second ``-incoming`` option with channel
>> +type ``cpr``.  This CPR channel must support file descriptor transfer
>> +with SCM_RIGHTS, i.e. it must be a UNIX domain socket.
>> +
>> +To initiate CPR, the user issues a migrate command to old QEMU,
>> +adding a second migration channel of type ``cpr`` in the channels
>> +argument.  Old QEMU stops the VM, saves state to the migration
>> +channels, and enters the postmigrate state.  Execution resumes in
>> +new QEMU.
>> +
>> +New QEMU reads the CPR channel before opening a monitor, hence
>> +the CPR channel cannot be specified in the list of channels for a
>> +migrate-incoming command.  It may only be specified on the command
>> +line.
>> +
>> +Usage
>> +^^^^^
>> +
>> +Memory backend objects must have the ``share=on`` attribute.
>> +
>> +The VM must be started with the ``-machine aux-ram-share=on``
>> +option.  This causes implicit RAM blocks (those not described by
>> +a memory-backend object) to be allocated by mmap'ing a memfd.
>> +Examples include VGA and ROM.
>> +
>> +Outgoing:
>> +  * Set the migration mode parameter to ``cpr-transfer``.
>> +  * Issue the ``migrate`` command, containing a main channel and
>> +    a cpr channel.
>> +
>> +Incoming:
>> +  * Start new QEMU with two ``-incoming`` options.
>> +  * If the VM was running when the outgoing ``migrate`` command was
>> +    issued, then QEMU automatically resumes VM execution.
>> +
>> +Caveats
>> +^^^^^^^
>> +
>> +cpr-transfer mode may not be used with postcopy, background-snapshot,
>> +or COLO.
>> +
>> +memory-backend-epc is not supported.
>> +
>> +The main incoming migration channel address cannot be a file type.
>> +
>> +If the main incoming channel address is an inet socket, then the port
>> +cannot be 0 (meaning dynamically choose a port).
>> +
>> +When using ``-incoming defer``, you must issue the migrate command to
>> +old QEMU before issuing any monitor commands to new QEMU, because new
>> +QEMU blocks waiting to read from the cpr channel before starting its
>> +monitor, and old QEMU does not write to the channel until the migrate
>> +command is issued.  However, new QEMU does not open and read the
>> +main migration channel until you issue the ``migrate-incoming`` command.
>> +
>> +Example 1: incoming channel
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +In these examples, we simply restart the same version of QEMU, but
>> +in a real scenario one would start new QEMU on the incoming side.
>> +Note that new QEMU does not print the monitor prompt until old QEMU
>> +has issued the migrate command.  The outgoing side uses QMP because
>> +HMP cannot specify a CPR channel.  Some QMP responses are omitted for
>> +brevity.
>> +
>> +::
>> +
>> +  Outgoing:                             Incoming:
>> +
>> +  # qemu-kvm -qmp stdio
>> +  -object memory-backend-file,id=ram0,size=4G,
>> +  mem-path=/dev/shm/ram0,share=on -m 4G
>> +  -machine aux-ram-share=on
>> +  ...
>> +                                        # qemu-kvm -monitor stdio
>> +                                        -incoming tcp:0:44444
>> +                                        -incoming '{"channel-type": "cpr",
>> +                                          "addr": { "transport": "socket",
>> +                                          "type": "unix", "path": "cpr.sock"}}'
>> +                                        ...
> 
> I'm attempting this and not having much success. Surely I'm missing
> something:
> 
> 
> $ qemu-system-x86_64 -cpu host -smp 16 -machine pc,accel=kvm \
>    -drive id=drive0,if=none,format=qcow2,file=img.qcow2 \
>    -device virtio-blk-pci,id=image1,drive=drive0,bootindex=0 \
>    -qmp unix:./dst-qmp.sock,server,wait=off \
>    -nographic -serial mon:stdio \
>    -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>    -m 4G -machine aux-ram-share=on \
>    -incoming tcp:0:44444 \
>    -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}' \
>    -trace loadvm_* -trace cpr_* -trace migration_* -trace migrate_* -trace qemu_loadvm_*
> 
> cpr_transfer_input cpr.sock
> cpr_state_load cpr-transfer mode
> cpr_find_fd pc.bios, id 0 returns 15
> cpr_find_fd pc.rom, id 0 returns 14
> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 13
> cpr_find_fd 0000:00:02.0/vga.rom, id 0 returns 12
> cpr_find_fd 0000:00:03.0/e1000.rom, id 0 returns 11
> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 10
> cpr_find_fd /rom@etc/table-loader, id 0 returns 8
> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 3
> migrate_set_state new state setup
> migration_socket_incoming_accepted
> migration_set_incoming_channel ioc=0x564dc31e7000 ioctype=qio-channel-socket
> migrate_set_state new state active
> loadvm_state_setup
> qemu_loadvm_state_section 1
> qemu_loadvm_state_section_startfull 2(ram) 0 4
> qemu_loadvm_state_section 3
> qemu_loadvm_state_section_partend 2
> qemu_loadvm_state_section 4
> qemu_loadvm_state_section_startfull 0(timer) 0 2
> qemu_loadvm_state_section 4
> qemu_loadvm_state_section_startfull 1(slirp) 0 4
> qemu_loadvm_state_section 4
> qemu_loadvm_state_section_startfull 4(cpu_common) 0 1
> qemu_loadvm_state_section 4
> qemu_loadvm_state_section_startfull 5(cpu) 0 12
> qemu_loadvm_state_section 4
> qemu_loadvm_state_section_startfull 6(kvm-tpr-opt) 0 1
> qemu-system-x86_64: error while loading state for instance 0x0 of device 'kvm-tpr-opt'
> qemu_loadvm_state_post_main -1
> migrate_set_state new state failed
> migrate_error error=load of migration failed: Operation not permitted
> loadvm_state_cleanup
> qemu-system-x86_64: load of migration failed: Operation not permitted

Check for a mismatch between the qemu args on the source vs dest.  Maybe -cpu.

- Steve




* Re: [PATCH V7 24/24] migration: cpr-transfer documentation
  2025-01-17 15:04     ` Steven Sistare
@ 2025-01-17 15:29       ` Fabiano Rosas
  2025-01-17 16:58         ` Steven Sistare
  0 siblings, 1 reply; 44+ messages in thread
From: Fabiano Rosas @ 2025-01-17 15:29 UTC (permalink / raw)
  To: Steven Sistare, qemu-devel
  Cc: Peter Xu, David Hildenbrand, Marcel Apfelbaum, Eduardo Habkost,
	Philippe Mathieu-Daude, Paolo Bonzini, Daniel P. Berrange,
	Markus Armbruster

Steven Sistare <steven.sistare@oracle.com> writes:

> On 1/17/2025 9:42 AM, Fabiano Rosas wrote:
>> Steve Sistare <steven.sistare@oracle.com> writes:
>> 
>>> Add documentation for the cpr-transfer migration mode.
>>>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>> ---
>>>   docs/devel/migration/CPR.rst | 182 ++++++++++++++++++++++++++++++++++++++++++-
>>>   1 file changed, 180 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/docs/devel/migration/CPR.rst b/docs/devel/migration/CPR.rst
>>> index 63c3647..d6021d5 100644
>>> --- a/docs/devel/migration/CPR.rst
>>> +++ b/docs/devel/migration/CPR.rst
>>> @@ -5,7 +5,7 @@ CPR is the umbrella name for a set of migration modes in which the
>>>   VM is migrated to a new QEMU instance on the same host.  It is
>>>   intended for use when the goal is to update host software components
>>>   that run the VM, such as QEMU or even the host kernel.  At this time,
>>> -cpr-reboot is the only available mode.
>>> +the cpr-reboot and cpr-transfer modes are available.
>>>   
>>>   Because QEMU is restarted on the same host, with access to the same
>>>   local devices, CPR is allowed in certain cases where normal migration
>>> @@ -53,7 +53,7 @@ RAM is copied to the migration URI.
>>>   Outgoing:
>>>     * Set the migration mode parameter to ``cpr-reboot``.
>>>     * Set the ``x-ignore-shared`` capability if desired.
>>> -  * Issue the ``migrate`` command.  It is recommended the the URI be a
>>> +  * Issue the ``migrate`` command.  It is recommended the URI be a
>>>       ``file`` type, but one can use other types such as ``exec``,
>>>       provided the command captures all the data from the outgoing side,
>>>       and provides all the data to the incoming side.
>>> @@ -145,3 +145,181 @@ Caveats
>>>   
>>>   cpr-reboot mode may not be used with postcopy, background-snapshot,
>>>   or COLO.
>>> +
>>> +cpr-transfer mode
>>> +-----------------
>>> +
>>> +This mode allows the user to transfer a guest to a new QEMU instance
>>> +on the same host with minimal guest pause time, by preserving guest
>>> +RAM in place, albeit with new virtual addresses in new QEMU.  Devices
>>> +and their pinned memory pages will also be preserved in a future QEMU
>>> +release.
>>> +
>>> +The user starts new QEMU on the same host as old QEMU, with command-
>>> +line arguments to create the same machine, plus the ``-incoming``
>>> +option for the main migration channel, like normal live migration.
>>> +In addition, the user adds a second ``-incoming`` option with channel
>>> +type ``cpr``.  This CPR channel must support file descriptor transfer
>>> +with SCM_RIGHTS, i.e. it must be a UNIX domain socket.
>>> +
>>> +To initiate CPR, the user issues a migrate command to old QEMU,
>>> +adding a second migration channel of type ``cpr`` in the channels
>>> +argument.  Old QEMU stops the VM, saves state to the migration
>>> +channels, and enters the postmigrate state.  Execution resumes in
>>> +new QEMU.
>>> +
>>> +New QEMU reads the CPR channel before opening a monitor, hence
>>> +the CPR channel cannot be specified in the list of channels for a
>>> +migrate-incoming command.  It may only be specified on the command
>>> +line.
>>> +
>>> +Usage
>>> +^^^^^
>>> +
>>> +Memory backend objects must have the ``share=on`` attribute.
>>> +
>>> +The VM must be started with the ``-machine aux-ram-share=on``
>>> +option.  This causes implicit RAM blocks (those not described by
>>> +a memory-backend object) to be allocated by mmap'ing a memfd.
>>> +Examples include VGA and ROM.
>>> +
>>> +Outgoing:
>>> +  * Set the migration mode parameter to ``cpr-transfer``.
>>> +  * Issue the ``migrate`` command, containing a main channel and
>>> +    a cpr channel.
>>> +
>>> +Incoming:
>>> +  * Start new QEMU with two ``-incoming`` options.
>>> +  * If the VM was running when the outgoing ``migrate`` command was
>>> +    issued, then QEMU automatically resumes VM execution.
>>> +
>>> +Caveats
>>> +^^^^^^^
>>> +
>>> +cpr-transfer mode may not be used with postcopy, background-snapshot,
>>> +or COLO.
>>> +
>>> +memory-backend-epc is not supported.
>>> +
>>> +The main incoming migration channel address cannot be a file type.
>>> +
>>> +If the main incoming channel address is an inet socket, then the port
>>> +cannot be 0 (meaning dynamically choose a port).
>>> +
>>> +When using ``-incoming defer``, you must issue the migrate command to
>>> +old QEMU before issuing any monitor commands to new QEMU, because new
>>> +QEMU blocks waiting to read from the cpr channel before starting its
>>> +monitor, and old QEMU does not write to the channel until the migrate
>>> +command is issued.  However, new QEMU does not open and read the
>>> +main migration channel until you issue the ``migrate-incoming`` command.
>>> +
>>> +Example 1: incoming channel
>>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> +
>>> +In these examples, we simply restart the same version of QEMU, but
>>> +in a real scenario one would start new QEMU on the incoming side.
>>> +Note that new QEMU does not print the monitor prompt until old QEMU
>>> +has issued the migrate command.  The outgoing side uses QMP because
>>> +HMP cannot specify a CPR channel.  Some QMP responses are omitted for
>>> +brevity.
>>> +
>>> +::
>>> +
>>> +  Outgoing:                             Incoming:
>>> +
>>> +  # qemu-kvm -qmp stdio
>>> +  -object memory-backend-file,id=ram0,size=4G,
>>> +  mem-path=/dev/shm/ram0,share=on -m 4G
>>> +  -machine aux-ram-share=on
>>> +  ...
>>> +                                        # qemu-kvm -monitor stdio
>>> +                                        -incoming tcp:0:44444
>>> +                                        -incoming '{"channel-type": "cpr",
>>> +                                          "addr": { "transport": "socket",
>>> +                                          "type": "unix", "path": "cpr.sock"}}'
>>> +                                        ...
>> 
>> I'm attempting this and not having much success. Surely I'm missing
>> something:
>> 
>> 
>> $ qemu-system-x86_64 -cpu host -smp 16 -machine pc,accel=kvm \
>>    -drive id=drive0,if=none,format=qcow2,file=img.qcow2 \
>>    -device virtio-blk-pci,id=image1,drive=drive0,bootindex=0 \
>>    -qmp unix:./dst-qmp.sock,server,wait=off \
>>    -nographic -serial mon:stdio \
>>    -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>>    -m 4G -machine aux-ram-share=on \
>>    -incoming tcp:0:44444 \
>>    -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}' \
>>    -trace loadvm_* -trace cpr_* -trace migration_* -trace migrate_* -trace qemu_loadvm_*
>> 
>> cpr_transfer_input cpr.sock
>> cpr_state_load cpr-transfer mode
>> cpr_find_fd pc.bios, id 0 returns 15
>> cpr_find_fd pc.rom, id 0 returns 14
>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 13
>> cpr_find_fd 0000:00:02.0/vga.rom, id 0 returns 12
>> cpr_find_fd 0000:00:03.0/e1000.rom, id 0 returns 11
>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 10
>> cpr_find_fd /rom@etc/table-loader, id 0 returns 8
>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 3
>> migrate_set_state new state setup
>> migration_socket_incoming_accepted
>> migration_set_incoming_channel ioc=0x564dc31e7000 ioctype=qio-channel-socket
>> migrate_set_state new state active
>> loadvm_state_setup
>> qemu_loadvm_state_section 1
>> qemu_loadvm_state_section_startfull 2(ram) 0 4
>> qemu_loadvm_state_section 3
>> qemu_loadvm_state_section_partend 2
>> qemu_loadvm_state_section 4
>> qemu_loadvm_state_section_startfull 0(timer) 0 2
>> qemu_loadvm_state_section 4
>> qemu_loadvm_state_section_startfull 1(slirp) 0 4
>> qemu_loadvm_state_section 4
>> qemu_loadvm_state_section_startfull 4(cpu_common) 0 1
>> qemu_loadvm_state_section 4
>> qemu_loadvm_state_section_startfull 5(cpu) 0 12
>> qemu_loadvm_state_section 4
>> qemu_loadvm_state_section_startfull 6(kvm-tpr-opt) 0 1
>> qemu-system-x86_64: error while loading state for instance 0x0 of device 'kvm-tpr-opt'
>> qemu_loadvm_state_post_main -1
>> migrate_set_state new state failed
>> migrate_error error=load of migration failed: Operation not permitted
>> loadvm_state_cleanup
>> qemu-system-x86_64: load of migration failed: Operation not permitted
>
> Check for a mismatch between the qemu args on the source vs dest.
> Maybe -cpu.

No.. they're the same:

qemu-system-x86_64 -display none -cpu host -smp 4 -machine pc,accel=kvm \
  -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
  -m 4G -machine aux-ram-share=on -qmp stdio

qemu-system-x86_64 -display none -cpu host -smp 4 -machine pc,accel=kvm \
  -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
  -m 4G -machine aux-ram-share=on -incoming tcp:0:44444 \
  -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}' \
  -monitor stdio

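Steve's suggestion to check for a source/destination mismatch can be done mechanically. Below is a rough Python sketch, not a QEMU tool. It pairs each option with its value and compares the two command lines as multisets, after dropping options assumed not to describe the guest machine. The ignore list (-incoming, -qmp, -monitor, -trace) is an assumption, as is the heuristic that any token not starting with "-" belongs to the previous option. Ordering is ignored, even though device order can matter to QEMU, so an empty result is a hint rather than proof.

```python
import shlex
from collections import Counter
from typing import List, Tuple

# Options assumed NOT to describe the guest machine; adjust as needed.
IGNORED = {"-incoming", "-qmp", "-monitor", "-trace"}


def option_pairs(cmd: str) -> List[Tuple[str, str]]:
    """Split a command line into (option, value) pairs, skipping argv[0].

    Heuristic: a token not starting with '-' is the previous option's value;
    flags such as -nographic get the value ''.
    """
    argv = shlex.split(cmd.replace("\\\n", " "))  # join shell continuations
    pairs: List[Tuple[str, str]] = []
    i = 1
    while i < len(argv):
        opt, val = argv[i], ""
        if i + 1 < len(argv) and not argv[i + 1].startswith("-"):
            val = argv[i + 1]
            i += 1
        i += 1
        if opt not in IGNORED:
            pairs.append((opt, val))
    return pairs


def arg_mismatches(src_cmd: str, dst_cmd: str):
    """Return (only_in_source, only_in_destination) as sorted pair lists."""
    src, dst = Counter(option_pairs(src_cmd)), Counter(option_pairs(dst_cmd))
    return sorted((src - dst).elements()), sorted((dst - src).elements())
```

For the two command lines above it reports no difference, which agrees with the observation that they are the same.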

Here's the whole log, see if you spot something:

$ (sleep 5; echo "{ 'execute': 'qmp_capabilities' }
                 { 'execute': 'migrate-set-parameters','arguments':{ 'mode': 'cpr-transfer' } }
                 { 'execute': 'migrate', 'arguments': \
                   { 'channels': [ \
                     {'channel-type': 'main', 'addr': { 'transport': 'socket', 'type': 'inet', \
                                      'host': '0', 'port': '44444' }}, \
                     {'channel-type': 'cpr', 'addr': { 'transport': 'socket', 'type': 'unix', \
                                      'path': 'cpr.sock' }} \
                   ]} \
                 }") | /home/fabiano/qemu-system-x86_64 -display none \
                 -cpu host -smp 4 -machine pc,accel=kvm -qmp stdio \
                 -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
                 -m 4G -machine aux-ram-share=on -trace cpr_* \
                 -trace migration_* -trace migrate_* -trace qemu_savevm_* \
                 -trace savevm_*

{"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 9}, "package": "v9.2.0-987-gfd4129a8b9"}, "capabilities": ["oob"]}}
cpr_find_fd pc.bios, id 0 returns -1
cpr_save_fd pc.bios, id 0, fd 22
cpr_find_fd pc.rom, id 0 returns -1
cpr_save_fd pc.rom, id 0, fd 23
cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1
cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 24
cpr_find_fd 0000:00:02.0/vga.rom, id 0 returns -1
cpr_save_fd 0000:00:02.0/vga.rom, id 0, fd 26
cpr_find_fd 0000:00:03.0/e1000.rom, id 0 returns -1
cpr_save_fd 0000:00:03.0/e1000.rom, id 0, fd 27
cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1
cpr_save_fd /rom@etc/acpi/tables, id 0, fd 28
cpr_find_fd /rom@etc/table-loader, id 0 returns -1
cpr_save_fd /rom@etc/table-loader, id 0, fd 29
cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1
cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 30
migration_block_activation active-skipped
{"return": {}}
{"return": {}}
migrate_set_state new state setup
cpr_state_save cpr-transfer mode
cpr_transfer_output cpr.sock
{"return": {}}
migration_socket_outgoing_connected hostname=0
migration_set_outgoing_channel ioc=0x55748ba10270 ioctype=qio-channel-socket hostname=0 err=(nil)
{"timestamp": {"seconds": 1737127025, "microseconds": 606998}, "event": "STOP"}
migration_completion_vm_stop ret 0
migration_transferred_bytes qemu_file 224 multifd 0 RDMA 0
savevm_state_header
savevm_state_setup
migration_bitmap_sync_start
migration_bitmap_sync_end dirty_pages 0
migrate_set_state new state active
migration_thread_setup_complete
migration_transferred_bytes qemu_file 506 multifd 0 RDMA 0
migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)
migration_thread_low_pending 0
migrate_set_state new state device
migration_block_activation inactive
migration_precopy_complete
savevm_section_start ram, section_id 2
migration_bitmap_sync_start
migration_bitmap_sync_end dirty_pages 0
savevm_section_end ram, section_id 2 -> 0
savevm_section_start timer, section_id 0
savevm_section_end timer, section_id 0 -> 0
savevm_section_start slirp, section_id 1
savevm_section_end slirp, section_id 1 -> 0
savevm_section_start cpu_common, section_id 4
savevm_section_end cpu_common, section_id 4 -> 0
savevm_section_start cpu, section_id 5
savevm_section_end cpu, section_id 5 -> 0
savevm_section_start kvm-tpr-opt, section_id 6
savevm_section_end kvm-tpr-opt, section_id 6 -> 0
savevm_section_start apic, section_id 7
savevm_section_end apic, section_id 7 -> 0
savevm_section_start cpu_common, section_id 8
savevm_section_end cpu_common, section_id 8 -> 0
savevm_section_start cpu, section_id 9
savevm_section_end cpu, section_id 9 -> 0
savevm_section_start apic, section_id 10
savevm_section_end apic, section_id 10 -> 0
savevm_section_start cpu_common, section_id 11
savevm_section_end cpu_common, section_id 11 -> 0
savevm_section_start cpu, section_id 12
savevm_section_end cpu, section_id 12 -> 0
savevm_section_start apic, section_id 13
savevm_section_end apic, section_id 13 -> 0
savevm_section_start cpu_common, section_id 14
savevm_section_end cpu_common, section_id 14 -> 0
savevm_section_start cpu, section_id 15
savevm_section_end cpu, section_id 15 -> 0
savevm_section_start apic, section_id 16
savevm_section_end apic, section_id 16 -> 0
savevm_section_start kvmclock, section_id 17
savevm_section_end kvmclock, section_id 17 -> 0
savevm_section_start 0000:00:00.0/I440FX, section_id 18
savevm_section_end 0000:00:00.0/I440FX, section_id 18 -> 0
savevm_section_start PCIHost, section_id 19
savevm_section_end PCIHost, section_id 19 -> 0
savevm_section_start PCIBUS, section_id 20
savevm_section_end PCIBUS, section_id 20 -> 0
savevm_section_start fw_cfg, section_id 21
savevm_section_end fw_cfg, section_id 21 -> 0
savevm_section_start dma, section_id 22
savevm_section_end dma, section_id 22 -> 0
savevm_section_start dma, section_id 23
savevm_section_end dma, section_id 23 -> 0
savevm_section_start mc146818rtc, section_id 24
savevm_section_end mc146818rtc, section_id 24 -> 0
savevm_section_start 0000:00:01.1/ide, section_id 25
savevm_section_end 0000:00:01.1/ide, section_id 25 -> 0
savevm_section_start i2c_bus, section_id 26
savevm_section_end i2c_bus, section_id 26 -> 0
savevm_section_start 0000:00:01.3/piix4_pm, section_id 27
savevm_section_end 0000:00:01.3/piix4_pm, section_id 27 -> 0
savevm_section_start 0000:00:01.0/PIIX3, section_id 28
savevm_section_end 0000:00:01.0/PIIX3, section_id 28 -> 0
savevm_section_start i8259, section_id 29
savevm_section_end i8259, section_id 29 -> 0
savevm_section_start i8259, section_id 30
savevm_section_end i8259, section_id 30 -> 0
savevm_section_start ioapic, section_id 31
savevm_section_end ioapic, section_id 31 -> 0
savevm_section_start 0000:00:02.0/vga, section_id 32
savevm_section_end 0000:00:02.0/vga, section_id 32 -> 0
savevm_section_start hpet, section_id 33
savevm_section_end hpet, section_id 33 -> 0
savevm_section_start i8254, section_id 34
savevm_section_end i8254, section_id 34 -> 0
savevm_section_start pcspk, section_id 35
savevm_section_end pcspk, section_id 35 -> 0
savevm_section_start serial, section_id 36
savevm_section_end serial, section_id 36 -> 0
savevm_section_start parallel_isa, section_id 37
savevm_section_end parallel_isa, section_id 37 -> 0
savevm_section_start fdc, section_id 38
savevm_section_end fdc, section_id 38 -> 0
savevm_section_start ps2kbd, section_id 39
savevm_section_end ps2kbd, section_id 39 -> 0
savevm_section_start ps2mouse, section_id 40
savevm_section_end ps2mouse, section_id 40 -> 0
savevm_section_start pckbd, section_id 41
savevm_section_end pckbd, section_id 41 -> 0
savevm_section_start vmmouse, section_id 42
savevm_section_end vmmouse, section_id 42 -> 0
savevm_section_start port92, section_id 43
savevm_section_end port92, section_id 43 -> 0
savevm_section_start 0000:00:03.0/e1000, section_id 44
savevm_section_end 0000:00:03.0/e1000, section_id 44 -> 0
savevm_section_skip smbus-eeprom, section_id 45
savevm_section_skip smbus-eeprom, section_id 46
savevm_section_skip smbus-eeprom, section_id 47
savevm_section_skip smbus-eeprom, section_id 48
savevm_section_skip smbus-eeprom, section_id 49
savevm_section_skip smbus-eeprom, section_id 50
savevm_section_skip smbus-eeprom, section_id 51
savevm_section_skip smbus-eeprom, section_id 52
savevm_section_start acpi_build, section_id 53
savevm_section_end acpi_build, section_id 53 -> 0
savevm_section_start globalstate, section_id 54
migrate_global_state_pre_save saved state: running
savevm_section_end globalstate, section_id 54 -> 0
migrate_error error=Unable to write to socket: Connection reset by peer
migrate_set_state new state failed
migration_thread_after_loop
migration_block_activation active
{"timestamp": {"seconds": 1737127025, "microseconds": 625404}, "event": "RESUME"}
migrate_fd_cleanup
savevm_state_cleanup
qemu-system-x86_64: Unable to write to socket: Connection reset by peer


$ /home/fabiano/qemu-system-x86_64 -display none -cpu host -smp 4 \
    -machine pc,accel=kvm \
    -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
    -m 4G -machine aux-ram-share=on \
    -incoming tcp:0:44444 \
    -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}' \
    -trace loadvm_* -trace cpr_* -trace migration_* -trace migrate_* \
    -monitor stdio

cpr_transfer_input cpr.sock
cpr_state_load cpr-transfer mode
QEMU 9.2.50 monitor - type 'help' for more information
cpr_find_fd pc.bios, id 0 returns 15
cpr_find_fd pc.rom, id 0 returns 14
cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 13
cpr_find_fd 0000:00:02.0/vga.rom, id 0 returns 12
cpr_find_fd 0000:00:03.0/e1000.rom, id 0 returns 11
cpr_find_fd /rom@etc/acpi/tables, id 0 returns 10
cpr_find_fd /rom@etc/table-loader, id 0 returns 8
cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 3
migrate_set_state new state setup
(qemu) migration_socket_incoming_accepted
migration_set_incoming_channel ioc=0x5565cccb8e70 ioctype=qio-channel-socket
migrate_set_state new state active
loadvm_state_setup
qemu-system-x86_64: error while loading state for instance 0x0 of device 'kvm-tpr-opt'
migrate_set_state new state failed
migrate_error error=load of migration failed: Operation not permitted
loadvm_state_cleanup
qemu-system-x86_64: load of migration failed: Operation not permitted



* Re: [PATCH V7 24/24] migration: cpr-transfer documentation
  2025-01-17 15:29       ` Fabiano Rosas
@ 2025-01-17 16:58         ` Steven Sistare
  2025-01-17 19:06           ` Fabiano Rosas
  0 siblings, 1 reply; 44+ messages in thread
From: Steven Sistare @ 2025-01-17 16:58 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel
  Cc: Peter Xu, David Hildenbrand, Marcel Apfelbaum, Eduardo Habkost,
	Philippe Mathieu-Daude, Paolo Bonzini, Daniel P. Berrange,
	Markus Armbruster

On 1/17/2025 10:29 AM, Fabiano Rosas wrote:
> Steven Sistare <steven.sistare@oracle.com> writes:
> 
>> On 1/17/2025 9:42 AM, Fabiano Rosas wrote:
>>> Steve Sistare <steven.sistare@oracle.com> writes:
>>>
>>>> Add documentation for the cpr-transfer migration mode.
>>>>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>>> ---
>>>>    docs/devel/migration/CPR.rst | 182 ++++++++++++++++++++++++++++++++++++++++++-
>>>>    1 file changed, 180 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/docs/devel/migration/CPR.rst b/docs/devel/migration/CPR.rst
>>>> index 63c3647..d6021d5 100644
>>>> --- a/docs/devel/migration/CPR.rst
>>>> +++ b/docs/devel/migration/CPR.rst
>>>> @@ -5,7 +5,7 @@ CPR is the umbrella name for a set of migration modes in which the
>>>>    VM is migrated to a new QEMU instance on the same host.  It is
>>>>    intended for use when the goal is to update host software components
>>>>    that run the VM, such as QEMU or even the host kernel.  At this time,
>>>> -cpr-reboot is the only available mode.
>>>> +the cpr-reboot and cpr-transfer modes are available.
>>>>    
>>>>    Because QEMU is restarted on the same host, with access to the same
>>>>    local devices, CPR is allowed in certain cases where normal migration
>>>> @@ -53,7 +53,7 @@ RAM is copied to the migration URI.
>>>>    Outgoing:
>>>>      * Set the migration mode parameter to ``cpr-reboot``.
>>>>      * Set the ``x-ignore-shared`` capability if desired.
>>>> -  * Issue the ``migrate`` command.  It is recommended the the URI be a
>>>> +  * Issue the ``migrate`` command.  It is recommended the URI be a
>>>>        ``file`` type, but one can use other types such as ``exec``,
>>>>        provided the command captures all the data from the outgoing side,
>>>>        and provides all the data to the incoming side.
>>>> @@ -145,3 +145,181 @@ Caveats
>>>>    
>>>>    cpr-reboot mode may not be used with postcopy, background-snapshot,
>>>>    or COLO.
>>>> +
>>>> +cpr-transfer mode
>>>> +-----------------
>>>> +
>>>> +This mode allows the user to transfer a guest to a new QEMU instance
>>>> +on the same host with minimal guest pause time, by preserving guest
>>>> +RAM in place, albeit with new virtual addresses in new QEMU.  Devices
>>>> +and their pinned memory pages will also be preserved in a future QEMU
>>>> +release.
>>>> +
>>>> +The user starts new QEMU on the same host as old QEMU, with command-
>>>> +line arguments to create the same machine, plus the ``-incoming``
>>>> +option for the main migration channel, like normal live migration.
>>>> +In addition, the user adds a second -incoming option with channel
>>>> +type ``cpr``.  This CPR channel must support file descriptor transfer
>>>> +with SCM_RIGHTS, i.e. it must be a UNIX domain socket.
>>>> +
>>>> +To initiate CPR, the user issues a migrate command to old QEMU,
>>>> +adding a second migration channel of type ``cpr`` in the channels
>>>> +argument.  Old QEMU stops the VM, saves state to the migration
>>>> +channels, and enters the postmigrate state.  Execution resumes in
>>>> +new QEMU.
>>>> +
>>>> +New QEMU reads the CPR channel before opening a monitor, hence
>>>> +the CPR channel cannot be specified in the list of channels for a
>>>> +migrate-incoming command.  It may only be specified on the command
>>>> +line.
>>>> +
>>>> +Usage
>>>> +^^^^^
>>>> +
>>>> +Memory backend objects must have the ``share=on`` attribute.
>>>> +
>>>> +The VM must be started with the ``-machine aux-ram-share=on``
>>>> +option.  This causes implicit RAM blocks (those not described by
>>>> +a memory-backend object) to be allocated by mmap'ing a memfd.
>>>> +Examples include VGA and ROM.
>>>> +
>>>> +Outgoing:
>>>> +  * Set the migration mode parameter to ``cpr-transfer``.
>>>> +  * Issue the ``migrate`` command, containing a main channel and
>>>> +    a cpr channel.
>>>> +
>>>> +Incoming:
>>>> +  * Start new QEMU with two ``-incoming`` options.
>>>> +  * If the VM was running when the outgoing ``migrate`` command was
>>>> +    issued, then QEMU automatically resumes VM execution.
>>>> +
>>>> +Caveats
>>>> +^^^^^^^
>>>> +
>>>> +cpr-transfer mode may not be used with postcopy, background-snapshot,
>>>> +or COLO.
>>>> +
>>>> +memory-backend-epc is not supported.
>>>> +
>>>> +The main incoming migration channel address cannot be a file type.
>>>> +
>>>> +If the main incoming channel address is an inet socket, then the port
>>>> +cannot be 0 (meaning dynamically choose a port).
>>>> +
>>>> +When using ``-incoming defer``, you must issue the migrate command to
>>>> +old QEMU before issuing any monitor commands to new QEMU, because new
>>>> +QEMU blocks waiting to read from the cpr channel before starting its
>>>> +monitor, and old QEMU does not write to the channel until the migrate
>>>> +command is issued.  However, new QEMU does not open and read the
>>>> +main migration channel until you issue the migrate incoming command.
>>>> +
>>>> +Example 1: incoming channel
>>>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> +
>>>> +In these examples, we simply restart the same version of QEMU, but
>>>> +in a real scenario one would start new QEMU on the incoming side.
>>>> +Note that new QEMU does not print the monitor prompt until old QEMU
>>>> +has issued the migrate command.  The outgoing side uses QMP because
>>>> +HMP cannot specify a CPR channel.  Some QMP responses are omitted for
>>>> +brevity.
>>>> +
>>>> +::
>>>> +
>>>> +  Outgoing:                             Incoming:
>>>> +
>>>> +  # qemu-kvm -qmp stdio
>>>> +  -object memory-backend-file,id=ram0,size=4G,
>>>> +  mem-path=/dev/shm/ram0,share=on -m 4G
>>>> +  -machine aux-ram-share=on
>>>> +  ...
>>>> +                                        # qemu-kvm -monitor stdio
>>>> +                                        -incoming tcp:0:44444
>>>> +                                        -incoming '{"channel-type": "cpr",
>>>> +                                          "addr": { "transport": "socket",
>>>> +                                          "type": "unix", "path": "cpr.sock"}}'
>>>> +                                        ...
>>>
>>> I'm attempting this and not having much success. Surely I'm missing
>>> something:
>>>
>>>
>>> $ qemu-system-x86_64 -cpu host -smp 16 -machine pc,accel=kvm \
>>>     -drive id=drive0,if=none,format=qcow2,file=img.qcow2 \
>>>     -device virtio-blk-pci,id=image1,drive=drive0,bootindex=0 \
>>>     -qmp unix:./dst-qmp.sock,server,wait=off \
>>>     -nographic -serial mon:stdio \
>>>     -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>>>     -m 4G -machine aux-ram-share=on \
>>>     -incoming tcp:0:44444 \
>>>     -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}' \
>>>     -trace loadvm_* -trace cpr_* -trace migration_* -trace migrate_* -trace qemu_loadvm_*
>>>
>>> cpr_transfer_input cpr.sock
>>> cpr_state_load cpr-transfer mode
>>> cpr_find_fd pc.bios, id 0 returns 15
>>> cpr_find_fd pc.rom, id 0 returns 14
>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 13
>>> cpr_find_fd 0000:00:02.0/vga.rom, id 0 returns 12
>>> cpr_find_fd 0000:00:03.0/e1000.rom, id 0 returns 11
>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 10
>>> cpr_find_fd /rom@etc/table-loader, id 0 returns 8
>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 3
>>> migrate_set_state new state setup
>>> migration_socket_incoming_accepted
>>> migration_set_incoming_channel ioc=0x564dc31e7000 ioctype=qio-channel-socket
>>> migrate_set_state new state active
>>> loadvm_state_setup
>>> qemu_loadvm_state_section 1
>>> qemu_loadvm_state_section_startfull 2(ram) 0 4
>>> qemu_loadvm_state_section 3
>>> qemu_loadvm_state_section_partend 2
>>> qemu_loadvm_state_section 4
>>> qemu_loadvm_state_section_startfull 0(timer) 0 2
>>> qemu_loadvm_state_section 4
>>> qemu_loadvm_state_section_startfull 1(slirp) 0 4
>>> qemu_loadvm_state_section 4
>>> qemu_loadvm_state_section_startfull 4(cpu_common) 0 1
>>> qemu_loadvm_state_section 4
>>> qemu_loadvm_state_section_startfull 5(cpu) 0 12
>>> qemu_loadvm_state_section 4
>>> qemu_loadvm_state_section_startfull 6(kvm-tpr-opt) 0 1
>>> qemu-system-x86_64: error while loading state for instance 0x0 of device 'kvm-tpr-opt'
>>> qemu_loadvm_state_post_main -1
>>> migrate_set_state new state failed
>>> migrate_error error=load of migration failed: Operation not permitted
>>> loadvm_state_cleanup
>>> qemu-system-x86_64: load of migration failed: Operation not permitted
>>
>> Check for a mismatch between the qemu args on the source vs dest.
>> Maybe -cpu.
> 
> No.. they're the same:
> 
> qemu-system-x86_64 -display none -cpu host -smp 4 -machine pc,accel=kvm \
>   -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>   -m 4G -machine aux-ram-share=on -qmp stdio
> 
> qemu-system-x86_64 -display none -cpu host -smp 4 -machine pc,accel=kvm \
>   -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>   -m 4G -machine aux-ram-share=on -incoming tcp:0:44444 \
>   -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}' \
>   -monitor stdio
> 
> 
> Here's the whole log, see if you spot something:
> 
> $ (sleep 5; echo "{ 'execute': 'qmp_capabilities' }
>                   { 'execute': 'migrate-set-parameters','arguments':{ 'mode': 'cpr-transfer' } }
>                   { 'execute': 'migrate', 'arguments': \
>                     { 'channels': [ \
>                       {'channel-type': 'main', 'addr': { 'transport': 'socket', 'type': 'inet', \
>                                        'host': '0', 'port': '44444' }}, \
>                       {'channel-type': 'cpr', 'addr': { 'transport': 'socket', 'type': 'unix', \
>                                        'path': 'cpr.sock' }} \
>                     ]} \
>                   }") | /home/fabiano/qemu-system-x86_64 -display none \
>                   -cpu host -smp 4 -machine pc,accel=kvm -qmp stdio \
>                   -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>                   -m 4G -machine aux-ram-share=on -trace cpr_* -trace migration_* \
>                   -trace migrate_* -trace qemu_savevm_* -trace savevm_*
> 
> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 9}, "package": "v9.2.0-987-gfd4129a8b9"}, "capabilities": ["oob"]}}
> cpr_find_fd pc.bios, id 0 returns -1
> cpr_save_fd pc.bios, id 0, fd 22
> cpr_find_fd pc.rom, id 0 returns -1
> cpr_save_fd pc.rom, id 0, fd 23
> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1
> cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 24
> cpr_find_fd 0000:00:02.0/vga.rom, id 0 returns -1
> cpr_save_fd 0000:00:02.0/vga.rom, id 0, fd 26
> cpr_find_fd 0000:00:03.0/e1000.rom, id 0 returns -1
> cpr_save_fd 0000:00:03.0/e1000.rom, id 0, fd 27
> cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1
> cpr_save_fd /rom@etc/acpi/tables, id 0, fd 28
> cpr_find_fd /rom@etc/table-loader, id 0 returns -1
> cpr_save_fd /rom@etc/table-loader, id 0, fd 29
> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1
> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 30
> migration_block_activation active-skipped
> {"return": {}}
> {"return": {}}
> migrate_set_state new state setup
> cpr_state_save cpr-transfer mode
> cpr_transfer_output cpr.sock
> {"return": {}}
> migration_socket_outgoing_connected hostname=0
> migration_set_outgoing_channel ioc=0x55748ba10270 ioctype=qio-channel-socket hostname=0 err=(nil)
> {"timestamp": {"seconds": 1737127025, "microseconds": 606998}, "event": "STOP"}
> migration_completion_vm_stop ret 0
> migration_transferred_bytes qemu_file 224 multifd 0 RDMA 0
> savevm_state_header
> savevm_state_setup
> migration_bitmap_sync_start
> migration_bitmap_sync_end dirty_pages 0
> migrate_set_state new state active
> migration_thread_setup_complete
> migration_transferred_bytes qemu_file 506 multifd 0 RDMA 0
> migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)
> migration_thread_low_pending 0
> migrate_set_state new state device
> migration_block_activation inactive
> migration_precopy_complete
> savevm_section_start ram, section_id 2
> migration_bitmap_sync_start
> migration_bitmap_sync_end dirty_pages 0
> savevm_section_end ram, section_id 2 -> 0
> savevm_section_start timer, section_id 0
> savevm_section_end timer, section_id 0 -> 0
> savevm_section_start slirp, section_id 1
> savevm_section_end slirp, section_id 1 -> 0
> savevm_section_start cpu_common, section_id 4
> savevm_section_end cpu_common, section_id 4 -> 0
> savevm_section_start cpu, section_id 5
> savevm_section_end cpu, section_id 5 -> 0
> savevm_section_start kvm-tpr-opt, section_id 6
> savevm_section_end kvm-tpr-opt, section_id 6 -> 0
> savevm_section_start apic, section_id 7
> savevm_section_end apic, section_id 7 -> 0
> savevm_section_start cpu_common, section_id 8
> savevm_section_end cpu_common, section_id 8 -> 0
> savevm_section_start cpu, section_id 9
> savevm_section_end cpu, section_id 9 -> 0
> savevm_section_start apic, section_id 10
> savevm_section_end apic, section_id 10 -> 0
> savevm_section_start cpu_common, section_id 11
> savevm_section_end cpu_common, section_id 11 -> 0
> savevm_section_start cpu, section_id 12
> savevm_section_end cpu, section_id 12 -> 0
> savevm_section_start apic, section_id 13
> savevm_section_end apic, section_id 13 -> 0
> savevm_section_start cpu_common, section_id 14
> savevm_section_end cpu_common, section_id 14 -> 0
> savevm_section_start cpu, section_id 15
> savevm_section_end cpu, section_id 15 -> 0
> savevm_section_start apic, section_id 16
> savevm_section_end apic, section_id 16 -> 0
> savevm_section_start kvmclock, section_id 17
> savevm_section_end kvmclock, section_id 17 -> 0
> savevm_section_start 0000:00:00.0/I440FX, section_id 18
> savevm_section_end 0000:00:00.0/I440FX, section_id 18 -> 0
> savevm_section_start PCIHost, section_id 19
> savevm_section_end PCIHost, section_id 19 -> 0
> savevm_section_start PCIBUS, section_id 20
> savevm_section_end PCIBUS, section_id 20 -> 0
> savevm_section_start fw_cfg, section_id 21
> savevm_section_end fw_cfg, section_id 21 -> 0
> savevm_section_start dma, section_id 22
> savevm_section_end dma, section_id 22 -> 0
> savevm_section_start dma, section_id 23
> savevm_section_end dma, section_id 23 -> 0
> savevm_section_start mc146818rtc, section_id 24
> savevm_section_end mc146818rtc, section_id 24 -> 0
> savevm_section_start 0000:00:01.1/ide, section_id 25
> savevm_section_end 0000:00:01.1/ide, section_id 25 -> 0
> savevm_section_start i2c_bus, section_id 26
> savevm_section_end i2c_bus, section_id 26 -> 0
> savevm_section_start 0000:00:01.3/piix4_pm, section_id 27
> savevm_section_end 0000:00:01.3/piix4_pm, section_id 27 -> 0
> savevm_section_start 0000:00:01.0/PIIX3, section_id 28
> savevm_section_end 0000:00:01.0/PIIX3, section_id 28 -> 0
> savevm_section_start i8259, section_id 29
> savevm_section_end i8259, section_id 29 -> 0
> savevm_section_start i8259, section_id 30
> savevm_section_end i8259, section_id 30 -> 0
> savevm_section_start ioapic, section_id 31
> savevm_section_end ioapic, section_id 31 -> 0
> savevm_section_start 0000:00:02.0/vga, section_id 32
> savevm_section_end 0000:00:02.0/vga, section_id 32 -> 0
> savevm_section_start hpet, section_id 33
> savevm_section_end hpet, section_id 33 -> 0
> savevm_section_start i8254, section_id 34
> savevm_section_end i8254, section_id 34 -> 0
> savevm_section_start pcspk, section_id 35
> savevm_section_end pcspk, section_id 35 -> 0
> savevm_section_start serial, section_id 36
> savevm_section_end serial, section_id 36 -> 0
> savevm_section_start parallel_isa, section_id 37
> savevm_section_end parallel_isa, section_id 37 -> 0
> savevm_section_start fdc, section_id 38
> savevm_section_end fdc, section_id 38 -> 0
> savevm_section_start ps2kbd, section_id 39
> savevm_section_end ps2kbd, section_id 39 -> 0
> savevm_section_start ps2mouse, section_id 40
> savevm_section_end ps2mouse, section_id 40 -> 0
> savevm_section_start pckbd, section_id 41
> savevm_section_end pckbd, section_id 41 -> 0
> savevm_section_start vmmouse, section_id 42
> savevm_section_end vmmouse, section_id 42 -> 0
> savevm_section_start port92, section_id 43
> savevm_section_end port92, section_id 43 -> 0
> savevm_section_start 0000:00:03.0/e1000, section_id 44
> savevm_section_end 0000:00:03.0/e1000, section_id 44 -> 0
> savevm_section_skip smbus-eeprom, section_id 45
> savevm_section_skip smbus-eeprom, section_id 46
> savevm_section_skip smbus-eeprom, section_id 47
> savevm_section_skip smbus-eeprom, section_id 48
> savevm_section_skip smbus-eeprom, section_id 49
> savevm_section_skip smbus-eeprom, section_id 50
> savevm_section_skip smbus-eeprom, section_id 51
> savevm_section_skip smbus-eeprom, section_id 52
> savevm_section_start acpi_build, section_id 53
> savevm_section_end acpi_build, section_id 53 -> 0
> savevm_section_start globalstate, section_id 54
> migrate_global_state_pre_save saved state: running
> savevm_section_end globalstate, section_id 54 -> 0
> migrate_error error=Unable to write to socket: Connection reset by peer
> migrate_set_state new state failed
> migration_thread_after_loop
> migration_block_activation active
> {"timestamp": {"seconds": 1737127025, "microseconds": 625404}, "event": "RESUME"}
> migrate_fd_cleanup
> savevm_state_cleanup
> qemu-system-x86_64: Unable to write to socket: Connection reset by peer
> 
> 
> $ /home/fabiano/qemu-system-x86_64 -display none -cpu host -smp 4 \
>     -machine pc,accel=kvm \
>     -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>     -m 4G -machine aux-ram-share=on \
>     -incoming tcp:0:44444 \
>     -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}' \
>     -trace loadvm_* -trace cpr_* -trace migration_* -trace migrate_* \
>     -monitor stdio
> 
> cpr_transfer_input cpr.sock
> cpr_state_load cpr-transfer mode
> QEMU 9.2.50 monitor - type 'help' for more information
> cpr_find_fd pc.bios, id 0 returns 15
> cpr_find_fd pc.rom, id 0 returns 14
> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 13
> cpr_find_fd 0000:00:02.0/vga.rom, id 0 returns 12
> cpr_find_fd 0000:00:03.0/e1000.rom, id 0 returns 11
> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 10
> cpr_find_fd /rom@etc/table-loader, id 0 returns 8
> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 3
> migrate_set_state new state setup
> (qemu) migration_socket_incoming_accepted
> migration_set_incoming_channel ioc=0x5565cccb8e70 ioctype=qio-channel-socket
> migrate_set_state new state active
> loadvm_state_setup
> qemu-system-x86_64: error while loading state for instance 0x0 of device 'kvm-tpr-opt'
> migrate_set_state new state failed
> migrate_error error=load of migration failed: Operation not permitted
> loadvm_state_cleanup
> qemu-system-x86_64: load of migration failed: Operation not permitted

Thank you for the simple example.  I reproduced the failure.
To fix, add "-machine aux-ram-share=on -machine memory-backend=ram0"
(the previous, longer example had the former but lacked the latter).
Without memory-backend=ram0, guest RAM still comes from the volatile
pc.ram region rather than from the shared ram0 backend.
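
For illustration, a minimal bash sketch of the corrected invocations.
The array and function names and the QEMU override variable are
hypothetical helpers, not QEMU interfaces; the arguments are the ones
used in this thread plus the missing -machine memory-backend=ram0.
Keeping a single shared argument list is one simple way to avoid
mismatches between the old and new QEMU command lines:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QEMU) for the reproducer in this thread.
# Set QEMU=echo to print a command line instead of running it.

# Machine arguments that must be identical on old and new QEMU.
# memory-backend=ram0 makes the shared ram0 object the guest RAM, so guest
# memory is not left in a volatile default pc.ram region.
cpr_common_args=(
  -display none -cpu host -smp 4 -machine pc,accel=kvm
  -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on
  -m 4G
  -machine aux-ram-share=on
  -machine memory-backend=ram0
)

# Old QEMU: driven over QMP, since HMP cannot specify a cpr channel.
start_old_qemu() {
  "${QEMU:-qemu-system-x86_64}" "${cpr_common_args[@]}" -qmp stdio
}

# New QEMU: main channel on TCP port 44444, cpr channel on a UNIX socket.
start_new_qemu() {
  "${QEMU:-qemu-system-x86_64}" "${cpr_common_args[@]}" \
    -incoming tcp:0:44444 \
    -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}' \
    -monitor stdio
}
```

Start both processes (for example in two terminals), and send the
qmp_capabilities, migrate-set-parameters and migrate commands shown in
the log above to old QEMU only after new QEMU has started and is
waiting on cpr.sock.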

I have a patch that adds a migration blocker if volatile RAM is present,
which would have clearly diagnosed this problem.  I will submit it now.

- Steve




* Re: [PATCH V7 24/24] migration: cpr-transfer documentation
  2025-01-17 16:58         ` Steven Sistare
@ 2025-01-17 19:06           ` Fabiano Rosas
  2025-01-17 19:32             ` Steven Sistare
  0 siblings, 1 reply; 44+ messages in thread
From: Fabiano Rosas @ 2025-01-17 19:06 UTC (permalink / raw)
  To: Steven Sistare, qemu-devel
  Cc: Peter Xu, David Hildenbrand, Marcel Apfelbaum, Eduardo Habkost,
	Philippe Mathieu-Daude, Paolo Bonzini, Daniel P. Berrange,
	Markus Armbruster

Steven Sistare <steven.sistare@oracle.com> writes:

> On 1/17/2025 10:29 AM, Fabiano Rosas wrote:
>> Steven Sistare <steven.sistare@oracle.com> writes:
>> 
>>> On 1/17/2025 9:42 AM, Fabiano Rosas wrote:
>>>> Steve Sistare <steven.sistare@oracle.com> writes:
>>>>
>>>>> Add documentation for the cpr-transfer migration mode.
>>>>>
>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>>>> ---
>>>>>    docs/devel/migration/CPR.rst | 182 ++++++++++++++++++++++++++++++++++++++++++-
>>>>>    1 file changed, 180 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/docs/devel/migration/CPR.rst b/docs/devel/migration/CPR.rst
>>>>> index 63c3647..d6021d5 100644
>>>>> --- a/docs/devel/migration/CPR.rst
>>>>> +++ b/docs/devel/migration/CPR.rst
>>>>> @@ -5,7 +5,7 @@ CPR is the umbrella name for a set of migration modes in which the
>>>>>    VM is migrated to a new QEMU instance on the same host.  It is
>>>>>    intended for use when the goal is to update host software components
>>>>>    that run the VM, such as QEMU or even the host kernel.  At this time,
>>>>> -cpr-reboot is the only available mode.
>>>>> +the cpr-reboot and cpr-transfer modes are available.
>>>>>    
>>>>>    Because QEMU is restarted on the same host, with access to the same
>>>>>    local devices, CPR is allowed in certain cases where normal migration
>>>>> @@ -53,7 +53,7 @@ RAM is copied to the migration URI.
>>>>>    Outgoing:
>>>>>      * Set the migration mode parameter to ``cpr-reboot``.
>>>>>      * Set the ``x-ignore-shared`` capability if desired.
>>>>> -  * Issue the ``migrate`` command.  It is recommended the the URI be a
>>>>> +  * Issue the ``migrate`` command.  It is recommended the URI be a
>>>>>        ``file`` type, but one can use other types such as ``exec``,
>>>>>        provided the command captures all the data from the outgoing side,
>>>>>        and provides all the data to the incoming side.
>>>>> @@ -145,3 +145,181 @@ Caveats
>>>>>    
>>>>>    cpr-reboot mode may not be used with postcopy, background-snapshot,
>>>>>    or COLO.
>>>>> +
>>>>> +cpr-transfer mode
>>>>> +-----------------
>>>>> +
>>>>> +This mode allows the user to transfer a guest to a new QEMU instance
>>>>> +on the same host with minimal guest pause time, by preserving guest
>>>>> +RAM in place, albeit with new virtual addresses in new QEMU.  Devices
>>>>> +and their pinned memory pages will also be preserved in a future QEMU
>>>>> +release.
>>>>> +
>>>>> +The user starts new QEMU on the same host as old QEMU, with command-
>>>>> +line arguments to create the same machine, plus the ``-incoming``
>>>>> +option for the main migration channel, like normal live migration.
>>>>> +In addition, the user adds a second -incoming option with channel
>>>>> +type ``cpr``.  This CPR channel must support file descriptor transfer
>>>>> +with SCM_RIGHTS, i.e. it must be a UNIX domain socket.
>>>>> +
>>>>> +To initiate CPR, the user issues a migrate command to old QEMU,
>>>>> +adding a second migration channel of type ``cpr`` in the channels
>>>>> +argument.  Old QEMU stops the VM, saves state to the migration
>>>>> +channels, and enters the postmigrate state.  Execution resumes in
>>>>> +new QEMU.
>>>>> +
>>>>> +New QEMU reads the CPR channel before opening a monitor, hence
>>>>> +the CPR channel cannot be specified in the list of channels for a
>>>>> +migrate-incoming command.  It may only be specified on the command
>>>>> +line.
>>>>> +
>>>>> +Usage
>>>>> +^^^^^
>>>>> +
>>>>> +Memory backend objects must have the ``share=on`` attribute.
>>>>> +
>>>>> +The VM must be started with the ``-machine aux-ram-share=on``
>>>>> +option.  This causes implicit RAM blocks (those not described by
>>>>> +a memory-backend object) to be allocated by mmap'ing a memfd.
>>>>> +Examples include VGA and ROM.
>>>>> +
>>>>> +Outgoing:
>>>>> +  * Set the migration mode parameter to ``cpr-transfer``.
>>>>> +  * Issue the ``migrate`` command, containing a main channel and
>>>>> +    a cpr channel.
>>>>> +
>>>>> +Incoming:
>>>>> +  * Start new QEMU with two ``-incoming`` options.
>>>>> +  * If the VM was running when the outgoing ``migrate`` command was
>>>>> +    issued, then QEMU automatically resumes VM execution.
>>>>> +
>>>>> +Caveats
>>>>> +^^^^^^^
>>>>> +
>>>>> +cpr-transfer mode may not be used with postcopy, background-snapshot,
>>>>> +or COLO.
>>>>> +
>>>>> +memory-backend-epc is not supported.
>>>>> +
>>>>> +The main incoming migration channel address cannot be a file type.
>>>>> +
>>>>> +If the main incoming channel address is an inet socket, then the port
>>>>> +cannot be 0 (meaning dynamically choose a port).
>>>>> +
>>>>> +When using ``-incoming defer``, you must issue the migrate command to
>>>>> +old QEMU before issuing any monitor commands to new QEMU, because new
>>>>> +QEMU blocks waiting to read from the cpr channel before starting its
>>>>> +monitor, and old QEMU does not write to the channel until the migrate
>>>>> +command is issued.  However, new QEMU does not open and read the
>>>>> +main migration channel until you issue the migrate incoming command.
>>>>> +
>>>>> +Example 1: incoming channel
>>>>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> +
>>>>> +In these examples, we simply restart the same version of QEMU, but
>>>>> +in a real scenario one would start new QEMU on the incoming side.
>>>>> +Note that new QEMU does not print the monitor prompt until old QEMU
>>>>> +has issued the migrate command.  The outgoing side uses QMP because
>>>>> +HMP cannot specify a CPR channel.  Some QMP responses are omitted for
>>>>> +brevity.
>>>>> +
>>>>> +::
>>>>> +
>>>>> +  Outgoing:                             Incoming:
>>>>> +
>>>>> +  # qemu-kvm -qmp stdio
>>>>> +  -object memory-backend-file,id=ram0,size=4G,
>>>>> +  mem-path=/dev/shm/ram0,share=on -m 4G
>>>>> +  -machine aux-ram-share=on
>>>>> +  ...
>>>>> +                                        # qemu-kvm -monitor stdio
>>>>> +                                        -incoming tcp:0:44444
>>>>> +                                        -incoming '{"channel-type": "cpr",
>>>>> +                                          "addr": { "transport": "socket",
>>>>> +                                          "type": "unix", "path": "cpr.sock"}}'
>>>>> +                                        ...
>>>>
>>>> I'm attempting this and not having much success. Surely I'm missing
>>>> something:
>>>>
>>>>
>>>> $ qemu-system-x86_64 -cpu host -smp 16 -machine pc,accel=kvm \
>>>>     -drive id=drive0,if=none,format=qcow2,file=img.qcow2 \
>>>>     -device virtio-blk-pci,id=image1,drive=drive0,bootindex=0 \
>>>>     -qmp unix:./dst-qmp.sock,server,wait=off \
>>>>     -nographic -serial mon:stdio \
>>>>     -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>>>>     -m 4G -machine aux-ram-share=on \
>>>>     -incoming tcp:0:44444 \
>>>>     -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}' \
>>>>     -trace loadvm_* -trace cpr_* -trace migration_* -trace migrate_* -trace qemu_loadvm_*
>>>>
>>>> cpr_transfer_input cpr.sock
>>>> cpr_state_load cpr-transfer mode
>>>> cpr_find_fd pc.bios, id 0 returns 15
>>>> cpr_find_fd pc.rom, id 0 returns 14
>>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 13
>>>> cpr_find_fd 0000:00:02.0/vga.rom, id 0 returns 12
>>>> cpr_find_fd 0000:00:03.0/e1000.rom, id 0 returns 11
>>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 10
>>>> cpr_find_fd /rom@etc/table-loader, id 0 returns 8
>>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 3
>>>> migrate_set_state new state setup
>>>> migration_socket_incoming_accepted
>>>> migration_set_incoming_channel ioc=0x564dc31e7000 ioctype=qio-channel-socket
>>>> migrate_set_state new state active
>>>> loadvm_state_setup
>>>> qemu_loadvm_state_section 1
>>>> qemu_loadvm_state_section_startfull 2(ram) 0 4
>>>> qemu_loadvm_state_section 3
>>>> qemu_loadvm_state_section_partend 2
>>>> qemu_loadvm_state_section 4
>>>> qemu_loadvm_state_section_startfull 0(timer) 0 2
>>>> qemu_loadvm_state_section 4
>>>> qemu_loadvm_state_section_startfull 1(slirp) 0 4
>>>> qemu_loadvm_state_section 4
>>>> qemu_loadvm_state_section_startfull 4(cpu_common) 0 1
>>>> qemu_loadvm_state_section 4
>>>> qemu_loadvm_state_section_startfull 5(cpu) 0 12
>>>> qemu_loadvm_state_section 4
>>>> qemu_loadvm_state_section_startfull 6(kvm-tpr-opt) 0 1
>>>> qemu-system-x86_64: error while loading state for instance 0x0 of device 'kvm-tpr-opt'
>>>> qemu_loadvm_state_post_main -1
>>>> migrate_set_state new state failed
>>>> migrate_error error=load of migration failed: Operation not permitted
>>>> loadvm_state_cleanup
>>>> qemu-system-x86_64: load of migration failed: Operation not permitted
>>>
>>> Check for a mismatch between the qemu args on the source vs dest.
>>> Maybe -cpu.
>> 
>> No, they're the same:
>> 
>> qemu-system-x86_64 -display none -cpu host -smp 4 -machine pc,accel=kvm \
>>     -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>>     -m 4G -machine aux-ram-share=on -qmp stdio
>> 
>> qemu-system-x86_64 -display none -cpu host -smp 4 -machine pc,accel=kvm \
>>     -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>>     -m 4G -machine aux-ram-share=on -incoming tcp:0:44444 \
>>     -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}' \
>>     -monitor stdio
>> 
>> 
>> Here's the whole log; see if you spot something:
>> 
>> $ (sleep 5; echo "{ 'execute': 'qmp_capabilities' }
>>                   { 'execute': 'migrate-set-parameters','arguments':{ 'mode': 'cpr-transfer' } }
>>                   { 'execute': 'migrate', 'arguments': \
>>                     { 'channels': [ \
>>                       {'channel-type': 'main', 'addr': { 'transport': 'socket', 'type': 'inet', \
>>                                        'host': '0', 'port': '44444' }}, \
>>                       {'channel-type': 'cpr', 'addr': { 'transport': 'socket', 'type': 'unix', \
>>                                        'path': 'cpr.sock' }} \
>>                     ]} \
>>                   }") | /home/fabiano/qemu-system-x86_64 -display none \
>>                   -cpu host -smp 4 -machine pc,accel=kvm -qmp stdio \
>>                   -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>>                   -m 4G -machine aux-ram-share=on -trace cpr_* -trace migration_* \
>>                   -trace migrate_* -trace qemu_savevm_* -trace savevm_*
>> 
>> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 9}, "package": "v9.2.0-987-gfd4129a8b9"}, "capabilities": ["oob"]}}
>> cpr_find_fd pc.bios, id 0 returns -1
>> cpr_save_fd pc.bios, id 0, fd 22
>> cpr_find_fd pc.rom, id 0 returns -1
>> cpr_save_fd pc.rom, id 0, fd 23
>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1
>> cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 24
>> cpr_find_fd 0000:00:02.0/vga.rom, id 0 returns -1
>> cpr_save_fd 0000:00:02.0/vga.rom, id 0, fd 26
>> cpr_find_fd 0000:00:03.0/e1000.rom, id 0 returns -1
>> cpr_save_fd 0000:00:03.0/e1000.rom, id 0, fd 27
>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1
>> cpr_save_fd /rom@etc/acpi/tables, id 0, fd 28
>> cpr_find_fd /rom@etc/table-loader, id 0 returns -1
>> cpr_save_fd /rom@etc/table-loader, id 0, fd 29
>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1
>> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 30
>> migration_block_activation active-skipped
>> {"return": {}}
>> {"return": {}}
>> migrate_set_state new state setup
>> cpr_state_save cpr-transfer mode
>> cpr_transfer_output cpr.sock
>> {"return": {}}
>> migration_socket_outgoing_connected hostname=0
>> migration_set_outgoing_channel ioc=0x55748ba10270 ioctype=qio-channel-socket hostname=0 err=(nil)
>> {"timestamp": {"seconds": 1737127025, "microseconds": 606998}, "event": "STOP"}
>> migration_completion_vm_stop ret 0
>> migration_transferred_bytes qemu_file 224 multifd 0 RDMA 0
>> savevm_state_header
>> savevm_state_setup
>> migration_bitmap_sync_start
>> migration_bitmap_sync_end dirty_pages 0
>> migrate_set_state new state active
>> migration_thread_setup_complete
>> migration_transferred_bytes qemu_file 506 multifd 0 RDMA 0
>> migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)
>> migration_thread_low_pending 0
>> migrate_set_state new state device
>> migration_block_activation inactive
>> migration_precopy_complete
>> savevm_section_start ram, section_id 2
>> migration_bitmap_sync_start
>> migration_bitmap_sync_end dirty_pages 0
>> savevm_section_end ram, section_id 2 -> 0
>> savevm_section_start timer, section_id 0
>> savevm_section_end timer, section_id 0 -> 0
>> savevm_section_start slirp, section_id 1
>> savevm_section_end slirp, section_id 1 -> 0
>> savevm_section_start cpu_common, section_id 4
>> savevm_section_end cpu_common, section_id 4 -> 0
>> savevm_section_start cpu, section_id 5
>> savevm_section_end cpu, section_id 5 -> 0
>> savevm_section_start kvm-tpr-opt, section_id 6
>> savevm_section_end kvm-tpr-opt, section_id 6 -> 0
>> savevm_section_start apic, section_id 7
>> savevm_section_end apic, section_id 7 -> 0
>> savevm_section_start cpu_common, section_id 8
>> savevm_section_end cpu_common, section_id 8 -> 0
>> savevm_section_start cpu, section_id 9
>> savevm_section_end cpu, section_id 9 -> 0
>> savevm_section_start apic, section_id 10
>> savevm_section_end apic, section_id 10 -> 0
>> savevm_section_start cpu_common, section_id 11
>> savevm_section_end cpu_common, section_id 11 -> 0
>> savevm_section_start cpu, section_id 12
>> savevm_section_end cpu, section_id 12 -> 0
>> savevm_section_start apic, section_id 13
>> savevm_section_end apic, section_id 13 -> 0
>> savevm_section_start cpu_common, section_id 14
>> savevm_section_end cpu_common, section_id 14 -> 0
>> savevm_section_start cpu, section_id 15
>> savevm_section_end cpu, section_id 15 -> 0
>> savevm_section_start apic, section_id 16
>> savevm_section_end apic, section_id 16 -> 0
>> savevm_section_start kvmclock, section_id 17
>> savevm_section_end kvmclock, section_id 17 -> 0
>> savevm_section_start 0000:00:00.0/I440FX, section_id 18
>> savevm_section_end 0000:00:00.0/I440FX, section_id 18 -> 0
>> savevm_section_start PCIHost, section_id 19
>> savevm_section_end PCIHost, section_id 19 -> 0
>> savevm_section_start PCIBUS, section_id 20
>> savevm_section_end PCIBUS, section_id 20 -> 0
>> savevm_section_start fw_cfg, section_id 21
>> savevm_section_end fw_cfg, section_id 21 -> 0
>> savevm_section_start dma, section_id 22
>> savevm_section_end dma, section_id 22 -> 0
>> savevm_section_start dma, section_id 23
>> savevm_section_end dma, section_id 23 -> 0
>> savevm_section_start mc146818rtc, section_id 24
>> savevm_section_end mc146818rtc, section_id 24 -> 0
>> savevm_section_start 0000:00:01.1/ide, section_id 25
>> savevm_section_end 0000:00:01.1/ide, section_id 25 -> 0
>> savevm_section_start i2c_bus, section_id 26
>> savevm_section_end i2c_bus, section_id 26 -> 0
>> savevm_section_start 0000:00:01.3/piix4_pm, section_id 27
>> savevm_section_end 0000:00:01.3/piix4_pm, section_id 27 -> 0
>> savevm_section_start 0000:00:01.0/PIIX3, section_id 28
>> savevm_section_end 0000:00:01.0/PIIX3, section_id 28 -> 0
>> savevm_section_start i8259, section_id 29
>> savevm_section_end i8259, section_id 29 -> 0
>> savevm_section_start i8259, section_id 30
>> savevm_section_end i8259, section_id 30 -> 0
>> savevm_section_start ioapic, section_id 31
>> savevm_section_end ioapic, section_id 31 -> 0
>> savevm_section_start 0000:00:02.0/vga, section_id 32
>> savevm_section_end 0000:00:02.0/vga, section_id 32 -> 0
>> savevm_section_start hpet, section_id 33
>> savevm_section_end hpet, section_id 33 -> 0
>> savevm_section_start i8254, section_id 34
>> savevm_section_end i8254, section_id 34 -> 0
>> savevm_section_start pcspk, section_id 35
>> savevm_section_end pcspk, section_id 35 -> 0
>> savevm_section_start serial, section_id 36
>> savevm_section_end serial, section_id 36 -> 0
>> savevm_section_start parallel_isa, section_id 37
>> savevm_section_end parallel_isa, section_id 37 -> 0
>> savevm_section_start fdc, section_id 38
>> savevm_section_end fdc, section_id 38 -> 0
>> savevm_section_start ps2kbd, section_id 39
>> savevm_section_end ps2kbd, section_id 39 -> 0
>> savevm_section_start ps2mouse, section_id 40
>> savevm_section_end ps2mouse, section_id 40 -> 0
>> savevm_section_start pckbd, section_id 41
>> savevm_section_end pckbd, section_id 41 -> 0
>> savevm_section_start vmmouse, section_id 42
>> savevm_section_end vmmouse, section_id 42 -> 0
>> savevm_section_start port92, section_id 43
>> savevm_section_end port92, section_id 43 -> 0
>> savevm_section_start 0000:00:03.0/e1000, section_id 44
>> savevm_section_end 0000:00:03.0/e1000, section_id 44 -> 0
>> savevm_section_skip smbus-eeprom, section_id 45
>> savevm_section_skip smbus-eeprom, section_id 46
>> savevm_section_skip smbus-eeprom, section_id 47
>> savevm_section_skip smbus-eeprom, section_id 48
>> savevm_section_skip smbus-eeprom, section_id 49
>> savevm_section_skip smbus-eeprom, section_id 50
>> savevm_section_skip smbus-eeprom, section_id 51
>> savevm_section_skip smbus-eeprom, section_id 52
>> savevm_section_start acpi_build, section_id 53
>> savevm_section_end acpi_build, section_id 53 -> 0
>> savevm_section_start globalstate, section_id 54
>> migrate_global_state_pre_save saved state: running
>> savevm_section_end globalstate, section_id 54 -> 0
>> migrate_error error=Unable to write to socket: Connection reset by peer
>> migrate_set_state new state failed
>> migration_thread_after_loop
>> migration_block_activation active
>> {"timestamp": {"seconds": 1737127025, "microseconds": 625404}, "event": "RESUME"}
>> migrate_fd_cleanup
>> savevm_state_cleanup
>> qemu-system-x86_64: Unable to write to socket: Connection reset by peer
>> 
>> 
>> $ /home/fabiano/qemu-system-x86_64 -display none -cpu host -smp 4 \
>>     -machine pc,accel=kvm \
>>     -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>>     -m 4G -machine aux-ram-share=on -incoming tcp:0:44444 \
>>     -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}' \
>>     -trace loadvm_* -trace cpr_* -trace migration_* -trace migrate_* -monitor stdio
>> 
>> cpr_transfer_input cpr.sock
>> cpr_state_load cpr-transfer mode
>> QEMU 9.2.50 monitor - type 'help' for more information
>> cpr_find_fd pc.bios, id 0 returns 15
>> cpr_find_fd pc.rom, id 0 returns 14
>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 13
>> cpr_find_fd 0000:00:02.0/vga.rom, id 0 returns 12
>> cpr_find_fd 0000:00:03.0/e1000.rom, id 0 returns 11
>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 10
>> cpr_find_fd /rom@etc/table-loader, id 0 returns 8
>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 3
>> migrate_set_state new state setup
>> (qemu) migration_socket_incoming_accepted
>> migration_set_incoming_channel ioc=0x5565cccb8e70 ioctype=qio-channel-socket
>> migrate_set_state new state active
>> loadvm_state_setup
>> qemu-system-x86_64: error while loading state for instance 0x0 of device 'kvm-tpr-opt'
>> migrate_set_state new state failed
>> migrate_error error=load of migration failed: Operation not permitted
>> loadvm_state_cleanup
>> qemu-system-x86_64: load of migration failed: Operation not permitted
>
> Thank you for the simple example.  I reproduced the failure.
> To fix, add "-machine aux-ram-share=on -machine memory-backend=ram0"
> (The previous, longer example had the former but lacked the latter.)
> Without that, the volatile pc.ram region is still in the mix.

There you go; that kvm-tpr-opt message is almost always indicative of
user error, I think because it's the first vmstate to be loaded.

Nonetheless, we'd better update the documentation to:

-object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on -m 4G \
-machine memory-backend=ram0 \
-machine aux-ram-share=on

>
> I have a patch that adds a blocker if volatile ram is present, and would
> have clearly diagnosed this problem.  I will submit it now.

Yes, I just tested it and it's much better, although the message asks for
"share=on" and "aux-ram-share=on", which were already there. But there's
not much we can do; basic proficiency with the QEMU command line is a
prerequisite.

>
> - Steve


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V7 24/24] migration: cpr-transfer documentation
  2025-01-17 19:06           ` Fabiano Rosas
@ 2025-01-17 19:32             ` Steven Sistare
  2025-01-17 20:04               ` Fabiano Rosas
  0 siblings, 1 reply; 44+ messages in thread
From: Steven Sistare @ 2025-01-17 19:32 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel
  Cc: Peter Xu, David Hildenbrand, Marcel Apfelbaum, Eduardo Habkost,
	Philippe Mathieu-Daude, Paolo Bonzini, Daniel P. Berrange,
	Markus Armbruster

On 1/17/2025 2:06 PM, Fabiano Rosas wrote:
>   Steven Sistare <steven.sistare@oracle.com> writes:
> 
>> On 1/17/2025 10:29 AM, Fabiano Rosas wrote:
>>> Steven Sistare <steven.sistare@oracle.com> writes:
>>>
>>>> On 1/17/2025 9:42 AM, Fabiano Rosas wrote:
>>>>> Steve Sistare <steven.sistare@oracle.com> writes:
[...]
>>
>> Thank you for the simple example.  I reproduced the failure.
>> To fix, add "-machine aux-ram-share=on -machine memory-backend=ram0"
>> (The previous, longer example had the former but lacked the latter.)
>> Without that, the volatile pc.ram region is still in the mix.
> 
> There you go; that kvm-tpr-opt message is almost always indicative of
> user error, I think because it's the first vmstate to be loaded.
> 
> Nonetheless, we'd better update the documentation to:
> 
> -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on -m 4G \
> -machine memory-backend=ram0 \
> -machine aux-ram-share=on

Agreed.  Will you squash it into both examples?

- Steve



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V7 24/24] migration: cpr-transfer documentation
  2025-01-17 19:32             ` Steven Sistare
@ 2025-01-17 20:04               ` Fabiano Rosas
  0 siblings, 0 replies; 44+ messages in thread
From: Fabiano Rosas @ 2025-01-17 20:04 UTC (permalink / raw)
  To: Steven Sistare, qemu-devel
  Cc: Peter Xu, David Hildenbrand, Marcel Apfelbaum, Eduardo Habkost,
	Philippe Mathieu-Daude, Paolo Bonzini, Daniel P. Berrange,
	Markus Armbruster

Steven Sistare <steven.sistare@oracle.com> writes:

> On 1/17/2025 2:06 PM, Fabiano Rosas wrote:
>>   Steven Sistare <steven.sistare@oracle.com> writes:
>> 
>>> On 1/17/2025 10:29 AM, Fabiano Rosas wrote:
>>>> Steven Sistare <steven.sistare@oracle.com> writes:
>>>>
>>>>> On 1/17/2025 9:42 AM, Fabiano Rosas wrote:
>>>>>> Steve Sistare <steven.sistare@oracle.com> writes:
> [...]
>>>
>>> Thank you for the simple example.  I reproduced the failure.
>>> To fix, add "-machine aux-ram-share=on -machine memory-backend=ram0"
>>> (The previous, longer example had the former but lacked the latter.)
>>> Without that, the volatile pc.ram region is still in the mix.
>> 
>> There you go; that kvm-tpr-opt message is almost always indicative of
>> user error, I think because it's the first vmstate to be loaded.
>> 
>> Nonetheless, we'd better update the documentation to:
>> 
>> -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on -m 4G \
>> -machine memory-backend=ram0 \
>> -machine aux-ram-share=on
>
> Agreed.  Will you squash it into both examples?

Yep, no worries.

>
> - Steve


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V7 00/24] Live update: cpr-transfer
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (23 preceding siblings ...)
  2025-01-15 19:00 ` [PATCH V7 24/24] migration: cpr-transfer documentation Steve Sistare
@ 2025-01-27 15:39 ` Fabiano Rosas
  2025-01-28 21:20   ` Steven Sistare
  2025-04-09 16:22 ` Vladimir Sementsov-Ogievskiy
  25 siblings, 1 reply; 44+ messages in thread
From: Fabiano Rosas @ 2025-01-27 15:39 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Peter Xu, David Hildenbrand, Marcel Apfelbaum, Eduardo Habkost,
	Philippe Mathieu-Daude, Paolo Bonzini, Daniel P. Berrange,
	Markus Armbruster, Steve Sistare

Steve Sistare <steven.sistare@oracle.com> writes:

> What?
>
> This patch series adds the live migration cpr-transfer mode, which
> allows the user to transfer a guest to a new QEMU instance on the same
> host with minimal guest pause time, by preserving guest RAM in place,
> albeit with new virtual addresses in new QEMU, and by preserving device
> file descriptors.
>
> The new user-visible interfaces are:
>   * cpr-transfer (MigMode migration parameter)
>   * cpr (MigrationChannelType)
>   * incoming MigrationChannel (command-line argument)
>   * aux-ram-share (machine option)
>
> The user sets the mode parameter before invoking the migrate command.
> In this mode, the user starts new QEMU on the same host as old QEMU, with
> the same arguments as old QEMU, plus two -incoming options: one for the main
> channel, and one for the CPR channel.  The user issues the migrate command to
> old QEMU, which stops the VM, saves state to the migration channels, and
> enters the postmigrate state.  Execution resumes in new QEMU.
>
> Memory-backend objects must have the share=on attribute, but memory-backend-epc
> is not supported.  The VM must be started with the '-machine aux-ram-share=on'
> option, which allows auxiliary guest memory to be transferred in place to the
> new process.
>
> This mode requires a second migration channel of type "cpr", in the channel
> arguments on the outgoing side, and in a second -incoming command-line
> parameter on the incoming side.  This CPR channel must support file descriptor
> transfer with SCM_RIGHTS, i.e. it must be a UNIX domain socket.
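For readers unfamiliar with SCM_RIGHTS, here is a minimal sketch of passing one descriptor over a connected UNIX domain socket using only standard Linux socket calls. The helper names send_fd() and recv_fd() are hypothetical and are not QEMU functions; QEMU's own support is the "migration: SCM_RIGHTS for QEMUFile" patch in this series. The receiver gets a new descriptor number that refers to the same open file, which is consistent with the traces earlier in this thread (pc.bios is saved as fd 22 in old QEMU, and cpr_find_fd returns 15 for it in new QEMU).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one int (one descriptor). */
union fd_cmsg_buf {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one descriptor over a connected UNIX domain socket.
 * unix(7): on a stream socket, at least one byte of ordinary data
 * must accompany the SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg_buf u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new descriptor number, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg_buf u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    }
    return fd;
}
```

To try it in one process, create a connected pair with socketpair(AF_UNIX, SOCK_STREAM, 0, sv), call send_fd(sv[0], fd) and then recv_fd(sv[1]). In cpr-transfer the two ends are in different processes, connected through the socket path given in the "cpr" channel arguments.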
>
> Why?
>
> This mode has less impact on the guest than any other method of updating
> in place.  The pause time is much lower, because devices need not be torn
> down and recreated, DMA does not need to be drained and quiesced, and minimal
> state is copied to new QEMU.  Further, there are no constraints on the guest.
> By contrast, cpr-reboot mode requires the guest to support S3 suspend-to-ram,
> and suspending plus resuming vfio devices adds multiple seconds to the
> guest pause time.
>
> These benefits all derive from the core design principle of this mode,
> which is preserving open descriptors.  This approach is very general and
> can be used to support a wide variety of devices that do not have hardware
> support for live migration, including but not limited to: vfio, chardev,
> vhost, vdpa, and iommufd.  Some devices need new kernel software interfaces
> to allow a descriptor to be used in a process that did not originally open it.
>
> How?
>
> All memory that is mapped by the guest is preserved in place.  Indeed,
> it must be, because it may be the target of DMA requests, which are not
> quiesced during cpr-transfer.  All such memory must be mmap'able in new QEMU.
> This is easy for named memory-backend objects, as long as they are mapped
> shared, because they are visible in the file system in both old and new QEMU.
> Anonymous memory must be allocated using memfd_create rather than MAP_ANON,
> so the memfd's can be sent to new QEMU.  Pages that were locked in memory
> for DMA in old QEMU remain locked in new QEMU, because the descriptor of
> the device that locked them remains open.
>
> cpr-transfer preserves descriptors by sending them to new QEMU via the CPR
> channel, which must support SCM_RIGHTS, and by sending the unique name of
> each descriptor to new QEMU via CPR state.
>
> For device descriptors, new QEMU reuses the descriptor when creating the
> device, rather than opening it again.  For memfd descriptors, new QEMU
> mmaps the preserved memfd when a ramblock is created.
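The reuse-instead-of-reopen rule can be sketched as a lookup keyed by name and id, following the cpr_find_fd/cpr_save_fd pattern visible in the traces earlier in this thread: in old QEMU the lookup returns -1, so the descriptor is created and saved; in new QEMU the lookup returns the inherited descriptor and nothing is reopened. This is a hypothetical, single-process illustration; fd_table_find(), fd_table_save(), get_or_create_fd() and open_dev_null() are invented names, not QEMU's implementation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the descriptor list carried in CPR state. */
struct fd_entry {
    char *name;
    int id;
    int fd;
    struct fd_entry *next;
};

static struct fd_entry *fd_table;

/* Return the preserved descriptor for (name, id), or -1 if there is none. */
static int fd_table_find(const char *name, int id)
{
    for (struct fd_entry *e = fd_table; e; e = e->next) {
        if (e->id == id && strcmp(e->name, name) == 0) {
            return e->fd;
        }
    }
    return -1;
}

/* Record fd under (name, id) so it can be handed to the next process. */
static int fd_table_save(const char *name, int id, int fd)
{
    struct fd_entry *e = malloc(sizeof(*e));

    if (!e) {
        return -1;
    }
    e->name = strdup(name);
    if (!e->name) {
        free(e);
        return -1;
    }
    e->id = id;
    e->fd = fd;
    e->next = fd_table;
    fd_table = e;
    return 0;
}

/* Reuse a preserved descriptor if one exists; otherwise create and save it. */
static int get_or_create_fd(const char *name, int id, int (*create)(void))
{
    int fd = fd_table_find(name, id);

    if (fd >= 0) {
        return fd;              /* new QEMU: reuse, do not reopen */
    }
    fd = create();              /* old QEMU: first creation */
    if (fd >= 0 && fd_table_save(name, id, fd) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Demo creator that counts how often it is called. */
static int create_calls;

static int open_dev_null(void)
{
    create_calls++;
    return open("/dev/null", O_RDWR);
}
```

In a real transfer the table would be filled in old QEMU, sent together with its descriptors over the "cpr" channel, and rebuilt in new QEMU before any device or backend is created, so every lookup made during device creation finds its preserved descriptor.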
>
> CPR state cannot be sent over the normal migration channel, because devices
> and backends are created prior to reading the channel, so this mode sends
> CPR state over a second "cpr" migration channel.  New QEMU reads the second
> channel prior to creating devices or backends.
>
> Example:
>
> In this example, we simply restart the same version of QEMU, but in
> a real scenario one would use a new QEMU binary path in terminal 2.
>
>   Terminal 1: start old QEMU
>   # qemu-kvm -qmp stdio -object
>   memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on
>   -m 4G -machine aux-ram-share=on ...
>
>   Terminal 2: start new QEMU
>   # qemu-kvm -monitor stdio ... -incoming tcp:0:44444
>     -incoming '{"channel-type": "cpr",
>                 "addr": { "transport": "socket", "type": "unix",
>                           "path": "cpr.sock"}}'
>
>   Terminal 1:
>   {"execute":"qmp_capabilities"}
>
>   {"execute": "query-status"}
>   {"return": {"status": "running",
>               "running": true}}
>
>   {"execute":"migrate-set-parameters",
>    "arguments":{"mode":"cpr-transfer"}}
>
>   {"execute": "migrate", "arguments": { "channels": [
>     {"channel-type": "main",
>      "addr": { "transport": "socket", "type": "inet",
>                "host": "0", "port": "44444" }},
>     {"channel-type": "cpr",
>      "addr": { "transport": "socket", "type": "unix",
>                "path": "cpr.sock" }}]}}
>
>   {"execute": "query-status"}
>   {"return": {"status": "postmigrate",
>               "running": false}}
>
>   Terminal 2:
>   QEMU 10.0.50 monitor - type 'help' for more information
>   (qemu) info status
>   VM status: running
>
> This patch series implements a minimal version of cpr-transfer.  Additional
> series are ready to be posted to deliver the complete vision described
> above, including
>   * vfio
>   * chardev
>   * vhost and tap
>   * blockers
>   * cpr-exec mode
>   * iommufd
>
> Changes in V2:
>   * cpr-transfer is the first new mode proposed, and cpr-exec is deferred
>   * anon-alloc does not apply to memory-backend-object
>   * replaced hack with proper synchronization between source and target
>   * defined QEMU_CPR_FILE_MAGIC
>   * addressed misc review comments
>
> Changes in V3:
>   * added cpr-transfer to migration-test
>   * documented cpr-transfer in CPR.rst
>   * fixed size_t trace format for 32-bit build
>   * drop explicit fd value in VMSTATE_FD
>   * defer cpr_walk_fd() and cpr_resave_fd() to later series
>   * drop "migration: save cpr mode".
>     delete mode from cpr state, and use cpr_uri to infer transfer mode.
>   * drop "migration: stop vm earlier for cpr"
>   * dropped cpr helpers, to be re-added later when needed
>   * fixed an unreported bug for cpr-transfer and migrate cancel
>   * documented cpr-transfer restrictions in qapi
>   * added trace for cpr_state_save and cpr_state_load
>   * added ftruncate to "preserve ram blocks"
>
> Changes in V4:
>   * cleaned up qtest deferred connection code
>   * renamed pass_fd -> can_pass_fd
>   * squashed patch "split qmp_migrate"
>   * deleted cpr-uri and its patches
>   * added cpr channel and its patches
>   * added patch "hostmem-shm: preserve for cpr"
>   * added patch "fd-based shared memory"
>   * added patch "factor out allocation of anonymous shared memory"
>   * added RAM_PRIVATE and its patch
>   * added aux-ram-share and its patch
>
> Changes in V5:
>   * added patch 'enhance migrate_uri_parse'
>   * supported dotted keys for -incoming channel,
>     and rewrote incoming_option_parse
>   * moved migrate_fd_cancel -> vm_resume to "stop vm earlier for cpr"
>     in a future series.
>   * updated command-line definition for aux-ram-share
>   * added patch "resizable qemu_ram_alloc_from_fd"
>   * rewrote patch "fd-based shared memory"
>   * fixed error message in qemu_shm_alloc
>   * added patch 'tests/qtest: optimize migrate_set_ports'
>   * added patch 'tests/qtest: enhance migration channels'
>   * added patch 'tests/qtest: assert qmp_ready'
>   * modified patch 'migration-test: cpr-transfer'
>   * polished the documentation in CPR.rst, qapi, and the
>     cpr-transfer mode commit message
>   * updated to master, and resolved massive context diffs for migration tests
>
> Changes in V6:
>   * added RB's and Acks.
>   * in patch "assert qmp_ready", deleted qmp_ready and checked qmp_fd instead.
>     renamed patch to "assert qmp connected"
>   * factored out fix into new patch
>     "fix qemu_ram_alloc_from_fd size calculation"
>   * deleted a redundant call to migrate_hup_delete
>   * added commit message to "migration: cpr-transfer documentation"
>   * polished the text of cpr-transfer mode in qapi
>
> Changes in V7:
>   * fixed cpr-transfer test failure for s390
>   * fixed machine_get_aux_ram_share compilation error for Windows
>   * fixed size_t print format compilation error for misc architectures
>   * fixed memory leaks in cpr_transfer_output, cpr_transfer_input, and
>     qemu_file_get_fd
>
> The first 10 patches below are foundational and are needed for both cpr-transfer
> mode and the proposed cpr-exec mode.  The next 6 patches are specific to
> cpr-transfer and implement the mechanisms for sharing state across a socket
> using SCM_RIGHTS.  The last 8 patches supply tests and documentation.
>
> Steve Sistare (24):
>   backends/hostmem-shm: factor out allocation of "anonymous shared
>     memory with an fd"
>   physmem: fix qemu_ram_alloc_from_fd size calculation
>   physmem: qemu_ram_alloc_from_fd extensions
>   physmem: fd-based shared memory
>   memory: add RAM_PRIVATE
>   machine: aux-ram-share option
>   migration: cpr-state
>   physmem: preserve ram blocks for cpr
>   hostmem-memfd: preserve for cpr
>   hostmem-shm: preserve for cpr
>   migration: enhance migrate_uri_parse
>   migration: incoming channel
>   migration: SCM_RIGHTS for QEMUFile
>   migration: VMSTATE_FD
>   migration: cpr-transfer save and load
>   migration: cpr-transfer mode
>   migration-test: memory_backend
>   tests/qtest: optimize migrate_set_ports
>   tests/qtest: defer connection
>   migration-test: defer connection
>   tests/qtest: enhance migration channels
>   tests/qtest: assert qmp connected
>   migration-test: cpr-transfer
>   migration: cpr-transfer documentation
>
>  backends/hostmem-epc.c                 |   2 +-
>  backends/hostmem-file.c                |   2 +-
>  backends/hostmem-memfd.c               |  14 ++-
>  backends/hostmem-ram.c                 |   2 +-
>  backends/hostmem-shm.c                 |  51 ++------
>  docs/devel/migration/CPR.rst           | 182 ++++++++++++++++++++++++++-
>  hw/core/machine.c                      |  22 ++++
>  include/exec/memory.h                  |  10 ++
>  include/exec/ram_addr.h                |  13 +-
>  include/hw/boards.h                    |   1 +
>  include/migration/cpr.h                |  33 +++++
>  include/migration/misc.h               |   7 ++
>  include/migration/vmstate.h            |   9 ++
>  include/qemu/osdep.h                   |   1 +
>  meson.build                            |   8 +-
>  migration/cpr-transfer.c               |  71 +++++++++++
>  migration/cpr.c                        | 224 +++++++++++++++++++++++++++++++++
>  migration/meson.build                  |   2 +
>  migration/migration.c                  | 139 +++++++++++++++++++-
>  migration/migration.h                  |   4 +-
>  migration/options.c                    |   8 +-
>  migration/qemu-file.c                  |  84 ++++++++++++-
>  migration/qemu-file.h                  |   2 +
>  migration/ram.c                        |   2 +
>  migration/trace-events                 |  11 ++
>  migration/vmstate-types.c              |  24 ++++
>  qapi/migration.json                    |  44 ++++++-
>  qemu-options.hx                        |  34 +++++
>  stubs/vmstate.c                        |   7 ++
>  system/memory.c                        |   4 +-
>  system/physmem.c                       | 150 ++++++++++++++++++----
>  system/trace-events                    |   1 +
>  system/vl.c                            |  43 ++++++-
>  tests/qtest/libqtest.c                 |  86 ++++++++-----
>  tests/qtest/libqtest.h                 |  19 ++-
>  tests/qtest/migration/cpr-tests.c      |  62 +++++++++
>  tests/qtest/migration/framework.c      |  74 +++++++++--
>  tests/qtest/migration/framework.h      |  11 ++
>  tests/qtest/migration/migration-qmp.c  |  53 ++++++--
>  tests/qtest/migration/migration-qmp.h  |  10 +-
>  tests/qtest/migration/migration-util.c |  23 ++--
>  tests/qtest/migration/misc-tests.c     |   9 +-
>  tests/qtest/migration/precopy-tests.c  |   6 +-
>  tests/qtest/virtio-net-failover.c      |   8 +-
>  util/memfd.c                           |  16 ++-
>  util/oslib-posix.c                     |  52 ++++++++
>  util/oslib-win32.c                     |   6 +
>  47 files changed, 1472 insertions(+), 174 deletions(-)
>  create mode 100644 include/migration/cpr.h
>  create mode 100644 migration/cpr-transfer.c
>  create mode 100644 migration/cpr.c
>
> base-commit: e8aa7fdcddfc8589bdc7c973a052e76e8f999455

I'd like to merge this series by the end of the week if possible. Please
take a look at some comments from Markus that were left behind in v5.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V7 00/24] Live update: cpr-transfer
  2025-01-27 15:39 ` [PATCH V7 00/24] Live update: cpr-transfer Fabiano Rosas
@ 2025-01-28 21:20   ` Steven Sistare
  2025-01-29  6:24     ` Markus Armbruster
  0 siblings, 1 reply; 44+ messages in thread
From: Steven Sistare @ 2025-01-28 21:20 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel
  Cc: Peter Xu, David Hildenbrand, Marcel Apfelbaum, Eduardo Habkost,
	Philippe Mathieu-Daude, Paolo Bonzini, Daniel P. Berrange,
	Markus Armbruster

On 1/27/2025 10:39 AM, Fabiano Rosas wrote:
> Steve Sistare <steven.sistare@oracle.com> writes:
> 
>> What?
>>
>> This patch series adds the live migration cpr-transfer mode, which
>> allows the user to transfer a guest to a new QEMU instance on the same
>> host with minimal guest pause time, by preserving guest RAM in place,
>> albeit with new virtual addresses in new QEMU, and by preserving device
>> file descriptors.
>>
>> The new user-visible interfaces are:
>>    * cpr-transfer (MigMode migration parameter)
>>    * cpr (MigrationChannelType)
>>    * incoming MigrationChannel (command-line argument)
>>    * aux-ram-share (machine option)
>>
>> The user sets the mode parameter before invoking the migrate command.
>> In this mode, the user starts new QEMU on the same host as old QEMU, with
>> the same arguments as old QEMU, plus two -incoming options: one for the main
>> channel, and one for the CPR channel.  The user issues the migrate command to
>> old QEMU, which stops the VM, saves state to the migration channels, and
>> enters the postmigrate state.  Execution resumes in new QEMU.
>>
>> Memory-backend objects must have the share=on attribute, but memory-backend-epc
>> is not supported.  The VM must be started with the '-machine aux-ram-share=on'
>> option, which allows auxiliary guest memory to be transferred in place to the
>> new process.
>>
>> This mode requires a second migration channel of type "cpr", in the channel
>> arguments on the outgoing side, and in a second -incoming command-line
>> parameter on the incoming side.  This CPR channel must support file descriptor
>> transfer with SCM_RIGHTS, i.e. it must be a UNIX domain socket.
>>
>> Why?
>>
>> This mode has less impact on the guest than any other method of updating
>> in place.  The pause time is much lower, because devices need not be torn
>> down and recreated, DMA does not need to be drained and quiesced, and minimal
>> state is copied to new QEMU.  Further, there are no constraints on the guest.
>> By contrast, cpr-reboot mode requires the guest to support S3 suspend-to-ram,
>> and suspending plus resuming vfio devices adds multiple seconds to the
>> guest pause time.
>>
>> These benefits all derive from the core design principle of this mode,
>> which is preserving open descriptors.  This approach is very general and
>> can be used to support a wide variety of devices that do not have hardware
>> support for live migration, including but not limited to: vfio, chardev,
>> vhost, vdpa, and iommufd.  Some devices need new kernel software interfaces
>> to allow a descriptor to be used in a process that did not originally open it.
>>
>> How?
>>
>> All memory that is mapped by the guest is preserved in place.  Indeed,
>> it must be, because it may be the target of DMA requests, which are not
>> quiesced during cpr-transfer.  All such memory must be mmap'able in new QEMU.
>> This is easy for named memory-backend objects, as long as they are mapped
>> shared, because they are visible in the file system in both old and new QEMU.
>> Anonymous memory must be allocated using memfd_create rather than MAP_ANON,
>> so the memfds can be sent to new QEMU.  Pages that were locked in memory
>> for DMA in old QEMU remain locked in new QEMU, because the descriptor of
>> the device that locked them remains open.
>>
>> cpr-transfer preserves descriptors by sending them to new QEMU via the CPR
>> channel, which must support SCM_RIGHTS, and by sending the unique name of
>> each descriptor to new QEMU via CPR state.
>>
>> For device descriptors, new QEMU reuses the descriptor when creating the
>> device, rather than opening it again.  For memfd descriptors, new QEMU
>> mmap's the preserved memfd when a ramblock is created.
>>
>> CPR state cannot be sent over the normal migration channel, because devices
>> and backends are created prior to reading the channel, so this mode sends
>> CPR state over a second "cpr" migration channel.  New QEMU reads the second
>> channel prior to creating devices or backends.
>>
>> Example:
>>
>> In this example, we simply restart the same version of QEMU, but in
>> a real scenario one would use a new QEMU binary path in terminal 2.
>>
>>    Terminal 1: start old QEMU
>>    # qemu-kvm -qmp stdio -object \
>>      memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>>      -m 4G -machine aux-ram-share=on ...
>>
>>    Terminal 2: start new QEMU
>>    # qemu-kvm -monitor stdio ... -incoming tcp:0:44444 \
>>      -incoming '{"channel-type": "cpr",
>>                  "addr": { "transport": "socket", "type": "unix",
>>                            "path": "cpr.sock"}}'
>>
>>    Terminal 1:
>>    {"execute":"qmp_capabilities"}
>>
>>    {"execute": "query-status"}
>>    {"return": {"status": "running",
>>                "running": true}}
>>
>>    {"execute":"migrate-set-parameters",
>>     "arguments":{"mode":"cpr-transfer"}}
>>
>>    {"execute": "migrate", "arguments": { "channels": [
>>      {"channel-type": "main",
>>       "addr": { "transport": "socket", "type": "inet",
>>                 "host": "0", "port": "44444" }},
>>      {"channel-type": "cpr",
>>       "addr": { "transport": "socket", "type": "unix",
>>                 "path": "cpr.sock" }}]}}
>>
>>    {"execute": "query-status"}
>>    {"return": {"status": "postmigrate",
>>                "running": false}}
>>
>>    Terminal 2:
>>    QEMU 10.0.50 monitor - type 'help' for more information
>>    (qemu) info status
>>    VM status: running
>>
>> This patch series implements a minimal version of cpr-transfer.  Additional
>> series are ready to be posted to deliver the complete vision described
>> above, including
>>    * vfio
>>    * chardev
>>    * vhost and tap
>>    * blockers
>>    * cpr-exec mode
>>    * iommufd
>>
>> Changes in V2:
>>    * cpr-transfer is the first new mode proposed, and cpr-exec is deferred
>>    * anon-alloc does not apply to memory-backend-object
>>    * replaced hack with proper synchronization between source and target
>>    * defined QEMU_CPR_FILE_MAGIC
>>    * addressed misc review comments
>>
>> Changes in V3:
>>    * added cpr-transfer to migration-test
>>    * documented cpr-transfer in CPR.rst
>>    * fix size_t trace format for 32-bit build
>>    * drop explicit fd value in VMSTATE_FD
>>    * defer cpr_walk_fd() and cpr_resave_fd() to later series
>>    * drop "migration: save cpr mode".
>>      delete mode from cpr state, and use cpr_uri to infer transfer mode.
>>    * drop "migration: stop vm earlier for cpr"
>>    * dropped cpr helpers, to be re-added later when needed
>>    * fixed an unreported bug for cpr-transfer and migrate cancel
>>    * documented cpr-transfer restrictions in qapi
>>    * added trace for cpr_state_save and cpr_state_load
>>    * added ftruncate to "preserve ram blocks"
>>
>> Changes in V4:
>>    * cleaned up qtest deferred connection code
>>    * renamed pass_fd -> can_pass_fd
>>    * squashed patch "split qmp_migrate"
>>    * deleted cpr-uri and its patches
>>    * added cpr channel and its patches
>>    * added patch "hostmem-shm: preserve for cpr"
>>    * added patch "fd-based shared memory"
>>    * added patch "factor out allocation of anonymous shared memory"
>>    * added RAM_PRIVATE and its patch
>>    * added aux-ram-share and its patch
>>
>> Changes in V5:
>>    * added patch 'enhance migrate_uri_parse'
>>    * supported dotted keys for -incoming channel,
>>      and rewrote incoming_option_parse
>>    * moved migrate_fd_cancel -> vm_resume to "stop vm earlier for cpr"
>>      in a future series.
>>    * updated command-line definition for aux-ram-share
>>    * added patch "resizable qemu_ram_alloc_from_fd"
>>    * rewrote patch "fd-based shared memory"
>>    * fixed error message in qemu_shm_alloc
>>    * added patch 'tests/qtest: optimize migrate_set_ports'
>>    * added patch 'tests/qtest: enhance migration channels'
>>    * added patch 'tests/qtest: assert qmp_ready'
>>    * modified patch 'migration-test: cpr-transfer'
>>    * polished the documentation in CPR.rst, qapi, and the
>>      cpr-transfer mode commit message
>>    * updated to master, and resolved massive context diffs for migration tests
>>
>> Changes in V6:
>>    * added RB's and Acks.
>>    * in patch "assert qmp_ready", deleted qmp_ready and checked qmp_fd instead.
>>      renamed patch to "assert qmp connected"
>>    * factored out fix into new patch
>>      "fix qemu_ram_alloc_from_fd size calculation"
>>    * deleted a redundant call to migrate_hup_delete
>>    * added commit message to "migration: cpr-transfer documentation"
>>    * polished the text of cpr-transfer mode in qapi
>>
>> Changes in V7:
>>    * fixed cpr-transfer test failure for s390
>>    * fixed machine_get_aux_ram_share compilation error for Windows
>>    * fixed size_t print format compilation error for misc architectures
>>    * fixed memory leaks in cpr_transfer_output, cpr_transfer_input, and
>>      qemu_file_get_fd
>>
>> The first 10 patches below are foundational and are needed for both cpr-transfer
>> mode and the proposed cpr-exec mode.  The next 6 patches are specific to
>> cpr-transfer and implement the mechanisms for sharing state across a socket
>> using SCM_RIGHTS.  The last 8 patches supply tests and documentation.
>>
>> Steve Sistare (24):
>>    backends/hostmem-shm: factor out allocation of "anonymous shared
>>      memory with an fd"
>>    physmem: fix qemu_ram_alloc_from_fd size calculation
>>    physmem: qemu_ram_alloc_from_fd extensions
>>    physmem: fd-based shared memory
>>    memory: add RAM_PRIVATE
>>    machine: aux-ram-share option
>>    migration: cpr-state
>>    physmem: preserve ram blocks for cpr
>>    hostmem-memfd: preserve for cpr
>>    hostmem-shm: preserve for cpr
>>    migration: enhance migrate_uri_parse
>>    migration: incoming channel
>>    migration: SCM_RIGHTS for QEMUFile
>>    migration: VMSTATE_FD
>>    migration: cpr-transfer save and load
>>    migration: cpr-transfer mode
>>    migration-test: memory_backend
>>    tests/qtest: optimize migrate_set_ports
>>    tests/qtest: defer connection
>>    migration-test: defer connection
>>    tests/qtest: enhance migration channels
>>    tests/qtest: assert qmp connected
>>    migration-test: cpr-transfer
>>    migration: cpr-transfer documentation
>>
>>   backends/hostmem-epc.c                 |   2 +-
>>   backends/hostmem-file.c                |   2 +-
>>   backends/hostmem-memfd.c               |  14 ++-
>>   backends/hostmem-ram.c                 |   2 +-
>>   backends/hostmem-shm.c                 |  51 ++------
>>   docs/devel/migration/CPR.rst           | 182 ++++++++++++++++++++++++++-
>>   hw/core/machine.c                      |  22 ++++
>>   include/exec/memory.h                  |  10 ++
>>   include/exec/ram_addr.h                |  13 +-
>>   include/hw/boards.h                    |   1 +
>>   include/migration/cpr.h                |  33 +++++
>>   include/migration/misc.h               |   7 ++
>>   include/migration/vmstate.h            |   9 ++
>>   include/qemu/osdep.h                   |   1 +
>>   meson.build                            |   8 +-
>>   migration/cpr-transfer.c               |  71 +++++++++++
>>   migration/cpr.c                        | 224 +++++++++++++++++++++++++++++++++
>>   migration/meson.build                  |   2 +
>>   migration/migration.c                  | 139 +++++++++++++++++++-
>>   migration/migration.h                  |   4 +-
>>   migration/options.c                    |   8 +-
>>   migration/qemu-file.c                  |  84 ++++++++++++-
>>   migration/qemu-file.h                  |   2 +
>>   migration/ram.c                        |   2 +
>>   migration/trace-events                 |  11 ++
>>   migration/vmstate-types.c              |  24 ++++
>>   qapi/migration.json                    |  44 ++++++-
>>   qemu-options.hx                        |  34 +++++
>>   stubs/vmstate.c                        |   7 ++
>>   system/memory.c                        |   4 +-
>>   system/physmem.c                       | 150 ++++++++++++++++++----
>>   system/trace-events                    |   1 +
>>   system/vl.c                            |  43 ++++++-
>>   tests/qtest/libqtest.c                 |  86 ++++++++-----
>>   tests/qtest/libqtest.h                 |  19 ++-
>>   tests/qtest/migration/cpr-tests.c      |  62 +++++++++
>>   tests/qtest/migration/framework.c      |  74 +++++++++--
>>   tests/qtest/migration/framework.h      |  11 ++
>>   tests/qtest/migration/migration-qmp.c  |  53 ++++++--
>>   tests/qtest/migration/migration-qmp.h  |  10 +-
>>   tests/qtest/migration/migration-util.c |  23 ++--
>>   tests/qtest/migration/misc-tests.c     |   9 +-
>>   tests/qtest/migration/precopy-tests.c  |   6 +-
>>   tests/qtest/virtio-net-failover.c      |   8 +-
>>   util/memfd.c                           |  16 ++-
>>   util/oslib-posix.c                     |  52 ++++++++
>>   util/oslib-win32.c                     |   6 +
>>   47 files changed, 1472 insertions(+), 174 deletions(-)
>>   create mode 100644 include/migration/cpr.h
>>   create mode 100644 migration/cpr-transfer.c
>>   create mode 100644 migration/cpr.c
>>
>> base-commit: e8aa7fdcddfc8589bdc7c973a052e76e8f999455
> 
> I'd like to merge this series by the end of the week if possible. Please
> take a look at some comments from Markus that were left behind in v5.

We discussed, and Markus agrees none are show stoppers.

- Steve



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V7 16/24] migration: cpr-transfer mode
  2025-01-15 19:00 ` [PATCH V7 16/24] migration: cpr-transfer mode Steve Sistare
@ 2025-01-29  6:23   ` Markus Armbruster
  0 siblings, 0 replies; 44+ messages in thread
From: Markus Armbruster @ 2025-01-29  6:23 UTC (permalink / raw)
  To: Steve Sistare
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, David Hildenbrand,
	Marcel Apfelbaum, Eduardo Habkost, Philippe Mathieu-Daude,
	Paolo Bonzini, Daniel P. Berrange, Markus Armbruster

Steve Sistare <steven.sistare@oracle.com> writes:

> Add the cpr-transfer migration mode, which allows the user to transfer
> a guest to a new QEMU instance on the same host with minimal guest pause
> time, by preserving guest RAM in place, albeit with new virtual addresses
> in new QEMU, and by preserving device file descriptors.  Pages that were
> locked in memory for DMA in old QEMU remain locked in new QEMU, because the
> descriptor of the device that locked them remains open.
>
> cpr-transfer preserves memory and device descriptors by sending them to
> new QEMU over a unix domain socket using SCM_RIGHTS.  Such CPR state cannot
> be sent over the normal migration channel, because devices and backends
> are created prior to reading the channel, so this mode sends CPR state
> over a second "cpr" migration channel.  New QEMU reads the cpr channel
> prior to creating devices or backends.  The user specifies the cpr channel
> in the channel arguments on the outgoing side, and in a second -incoming
> command-line parameter on the incoming side.
>
> The user must start old QEMU with the '-machine aux-ram-share=on' option,
> which allows anonymous memory to be transferred in place to the new process
> by transferring a memory descriptor for each ram block.  Memory-backend
> objects must have the share=on attribute, but memory-backend-epc is not
> supported.
>
> The user starts new QEMU on the same host as old QEMU, with command-line
> arguments to create the same machine, plus the -incoming option for the
> main migration channel, like normal live migration.  In addition, the user
> adds a second -incoming option with channel type "cpr".  This CPR channel
> must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a
> UNIX domain socket.
>
> To initiate CPR, the user issues a migrate command to old QEMU, adding
> a second migration channel of type "cpr" in the channels argument.
> Old QEMU stops the VM, saves state to the migration channels, and enters
> the postmigrate state.  New QEMU mmap's memory descriptors, and execution
> resumes.
>
> The implementation splits qmp_migrate into start and finish functions.
> Start sends CPR state to new QEMU, which responds by closing the CPR
> channel.  Old QEMU detects the HUP then calls finish, which connects the
> main migration channel.
>
> In summary, the usage is:
>
>   qemu-system-$arch -machine aux-ram-share=on ...
>
>   start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"
>
>   Issue commands to old QEMU:
>     migrate_set_parameter mode cpr-transfer
>
>     {"execute": "migrate", ...
>         {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>

Acked-by: Markus Armbruster <armbru@redhat.com>



* Re: [PATCH V7 00/24] Live update: cpr-transfer
  2025-01-28 21:20   ` Steven Sistare
@ 2025-01-29  6:24     ` Markus Armbruster
  0 siblings, 0 replies; 44+ messages in thread
From: Markus Armbruster @ 2025-01-29  6:24 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Fabiano Rosas, qemu-devel, Peter Xu, David Hildenbrand,
	Marcel Apfelbaum, Eduardo Habkost, Philippe Mathieu-Daude,
	Paolo Bonzini, Daniel P. Berrange

Steven Sistare <steven.sistare@oracle.com> writes:

> On 1/27/2025 10:39 AM, Fabiano Rosas wrote:
>> Steve Sistare <steven.sistare@oracle.com> writes:

[...]

>> I'd like to merge this series by the end of the week if possible. Please
>> take a look at some comments from Markus that were left behind in v5.
>
> We discussed, and Markus agrees none are show stoppers.

I just sent my Acked-by to the QAPI part.  Thank you both for your
patience!



* Re: [PATCH V7 00/24] Live update: cpr-transfer
  2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
                   ` (24 preceding siblings ...)
  2025-01-27 15:39 ` [PATCH V7 00/24] Live update: cpr-transfer Fabiano Rosas
@ 2025-04-09 16:22 ` Vladimir Sementsov-Ogievskiy
  2025-04-09 17:48   ` Steven Sistare
  2025-04-09 17:50   ` Vladimir Sementsov-Ogievskiy
  25 siblings, 2 replies; 44+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-09 16:22 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster

[offlist]

On 15.01.25 22:00, Steve Sistare wrote:
> This patch series implements a minimal version of cpr-transfer.  Additional
> series are ready to be posted to deliver the complete vision described
> above, including
>    * vfio
>    * chardev
>    * vhost and tap
>    * blockers
>    * cpr-exec mode

Hi Steve. First, congratulations on finally landing cpr-transfer! I saw that the history of the Live Update series started more than five years ago.

I have some questions; I hope it's not too much trouble for you.

1. We are considering porting cpr-transfer plus the vfio part of your "Live update: vfio and iommufd" series to our downstream QEMU, which is based on v7.2. What do you think? I mean, maybe you can quickly answer "don't try, you'll have to bring in more than 100 commits from different series", or vice versa, "we have a downstream based on 7.2 too, so it's possible" (OK, the latter answer seems impossible, as the iommufd code is simply absent in v7.2).


2. About cpr-exec. Do you plan to resend it in the future? The solution is interesting for us, as it simplifies management a lot. I read the discussion on cpr-exec; it seems the main problem was the security constraint that we don't want to allow the exec call in the seccomp profile. Did you consider a variant that loads a library instead of calling exec?

I mean:

- turn the whole of QEMU into a library that can be dynamically loaded. Recently someone asked how to do that, and the answer contained an example patch: https://github.com/pbo-linaro/qemu/commit/fbb39cc64f77d4bf1e5e50795c75b62735bf5c5f

- and make a simple wrapper process for that library, which also is a container for migration state (including file descriptors), during live update.

Benefits:

- no execve; we just need to add a pattern for "qemu library" paths to the AppArmor profile

- we can probably load the new library _before_ starting the migration, reducing the migration freeze time; this is more like migration with two processes

-- 
Best regards,
Vladimir



* Re: [PATCH V7 00/24] Live update: cpr-transfer
  2025-04-09 16:22 ` Vladimir Sementsov-Ogievskiy
@ 2025-04-09 17:48   ` Steven Sistare
  2025-04-09 18:06     ` Vladimir Sementsov-Ogievskiy
  2025-04-09 17:50   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 44+ messages in thread
From: Steven Sistare @ 2025-04-09 17:48 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster

On 4/9/2025 12:22 PM, Vladimir Sementsov-Ogievskiy wrote:
> 
> On 15.01.25 22:00, Steve Sistare wrote:
>> This patch series implements a minimal version of cpr-transfer.  Additional
>> series are ready to be posted to deliver the complete vision described
>> above, including
>>    * vfio
>>    * chardev
>>    * vhost and tap
>>    * blockers
>>    * cpr-exec mode
> 
> Hi Steve. First, congratulations on finally landing cpr-transfer! I saw that the history of the Live Update series started more than five years ago.

Thanks!  It's been a marathon, not a sprint.

> I have some questions; I hope it's not too much trouble for you.
> 
> 1. We are considering porting cpr-transfer plus the vfio part of your "Live update: vfio and iommufd" series to our downstream QEMU, which is based on v7.2. What do you think? I mean, maybe you can quickly answer "don't try, you'll have to bring in more than 100 commits from different series", or vice versa, "we have a downstream based on 7.2 too, so it's possible" (OK, the latter answer seems impossible, as the iommufd code is simply absent in v7.2).

I have not tried it, but I think this is feasible if you omit the iommufd patches.
You will also need some of the cpr-reboot patches (like mode-specific migration
blockers) which did not appear until qemu 8.2.

> 2. About cpr-exec. Do you plan to resend it in the future? The solution is interesting for us, as it simplifies management a lot.

I agree!  I made that argument when I submitted it.  Perhaps your +1 will add
enough critical mass to get it accepted next time.  I do plan to resubmit it later.

> I read the discussion on cpr-exec; it seems the main problem was the security constraint that we don't want to allow the exec call in the seccomp profile. Did you consider a variant that loads a library instead of calling exec?
> 
> I mean:
> 
> - turn the whole of QEMU into a library that can be dynamically loaded. Recently someone asked how to do that, and the answer contained an example patch: https://github.com/pbo-linaro/qemu/commit/fbb39cc64f77d4bf1e5e50795c75b62735bf5c5f
> 
> - and make a simple wrapper process for that library, which also is a container for migration state (including file descriptors), during live update.
> 
> Benefits:
> 
> - no execve; we just need to add a pattern for "qemu library" paths to the AppArmor profile
> 
> - we can probably load the new library _before_ starting the migration, reducing the migration freeze time; this is more like migration with two processes

I have not considered that.  A colleague suggested something similar -- loading the
new qemu binary in memory and implementing exec in userland.   No doubt either
method would be a non-trivial amount of work, versus cpr-exec which already works :)

Personally I don't think that requiring exec is a show stopper. If qemu is deployed in
a container environment, then the potential targets of an exec can be limited by the
container walls.

- Steve



* Re: [PATCH V7 00/24] Live update: cpr-transfer
  2025-04-09 16:22 ` Vladimir Sementsov-Ogievskiy
  2025-04-09 17:48   ` Steven Sistare
@ 2025-04-09 17:50   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 44+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-09 17:50 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster

On 09.04.25 19:22, Vladimir Sementsov-Ogievskiy wrote:
> [offlist]

Hah, I decided to send this offlist, but forgot to clear CC. Sorry. Anyway, there's nothing secret here.

Moreover, I'm curious what everyone thinks about a cpr-exec variant that loads QEMU as a library instead of calling exec.

> 
> On 15.01.25 22:00, Steve Sistare wrote:
>> This patch series implements a minimal version of cpr-transfer.  Additional
>> series are ready to be posted to deliver the complete vision described
>> above, including
>>    * vfio
>>    * chardev
>>    * vhost and tap
>>    * blockers
>>    * cpr-exec mode
> 
> Hi Steve. First, congratulations on finally landing cpr-transfer! I saw that the history of the Live Update series started more than five years ago.
> 
> I have some questions; I hope it's not too much trouble for you.
> 
> 1. We are considering porting cpr-transfer plus the vfio part of your "Live update: vfio and iommufd" series to our downstream QEMU, which is based on v7.2. What do you think? I mean, maybe you can quickly answer "don't try, you'll have to bring in more than 100 commits from different series", or vice versa, "we have a downstream based on 7.2 too, so it's possible" (OK, the latter answer seems impossible, as the iommufd code is simply absent in v7.2).
> 
> 
> 2. About cpr-exec. Do you plan to resend it in the future? The solution is interesting for us, as it simplifies management a lot. I read the discussion on cpr-exec; it seems the main problem was the security constraint that we don't want to allow the exec call in the seccomp profile. Did you consider a variant that loads a library instead of calling exec?
> 
> I mean:
> 
> - turn the whole of QEMU into a library that can be dynamically loaded. Recently someone asked how to do that, and the answer contained an example patch: https://github.com/pbo-linaro/qemu/commit/fbb39cc64f77d4bf1e5e50795c75b62735bf5c5f
> 
> - and make a simple wrapper process for that library, which also is a container for migration state (including file descriptors), during live update.
> 
> Benefits:
> 
> - no execve; we just need to add a pattern for "qemu library" paths to the AppArmor profile
> 
> - we can probably load the new library _before_ starting the migration, reducing the migration freeze time; this is more like migration with two processes
> 

-- 
Best regards,
Vladimir



* Re: [PATCH V7 00/24] Live update: cpr-transfer
  2025-04-09 17:48   ` Steven Sistare
@ 2025-04-09 18:06     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 44+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-09 18:06 UTC (permalink / raw)
  To: Steven Sistare, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, David Hildenbrand, Marcel Apfelbaum,
	Eduardo Habkost, Philippe Mathieu-Daude, Paolo Bonzini,
	Daniel P. Berrange, Markus Armbruster

On 09.04.25 20:48, Steven Sistare wrote:
> On 4/9/2025 12:22 PM, Vladimir Sementsov-Ogievskiy wrote:
>>
>> On 15.01.25 22:00, Steve Sistare wrote:
>>> This patch series implements a minimal version of cpr-transfer.  Additional
>>> series are ready to be posted to deliver the complete vision described
>>> above, including
>>>    * vfio
>>>    * chardev
>>>    * vhost and tap
>>>    * blockers
>>>    * cpr-exec mode
>>
>> Hi Steve. First, congratulations on finally landing cpr-transfer! I saw that the history of the Live Update series started more than five years ago.
> 
> Thanks!  It's been a marathon, not a sprint.
> 
>> I have some questions; I hope it's not too much trouble for you.
>>
>> 1. We are considering porting cpr-transfer plus the vfio part of your "Live update: vfio and iommufd" series to our downstream QEMU, which is based on v7.2. What do you think? I mean, maybe you can quickly answer "don't try, you'll have to bring in more than 100 commits from different series", or vice versa, "we have a downstream based on 7.2 too, so it's possible" (OK, the latter answer seems impossible, as the iommufd code is simply absent in v7.2).
> 
> I have not tried it, but I think this is feasible if you omit the iommufd patches.
> You will also need some of the cpr-reboot patches (like mode-specific migration
> blockers) which did not appear until qemu 8.2.

Well, I'll try, thanks!

> 
>> 2. About cpr-exec. Do you plan to resend it in the future? The solution is interesting for us, as it simplifies management a lot.
> 
> I agree!  I made that argument when I submitted it.  Perhaps your +1 will add
> enough critical mass to get it accepted next time.  I do plan to resubmit it later.

Great!

> 
>> I read the discussion on cpr-exec; it seems the main problem was the security constraint that we don't want to allow the exec call in the seccomp profile. Did you consider a variant that loads a library instead of calling exec?
>>
>> I mean:
>>
>> - turn the whole of QEMU into a library that can be dynamically loaded. Recently someone asked how to do that, and the answer contained an example patch: https://github.com/pbo-linaro/qemu/commit/fbb39cc64f77d4bf1e5e50795c75b62735bf5c5f
>>
>> - and make a simple wrapper process for that library, which also is a container for migration state (including file descriptors), during live update.
>>
>> Benefits:
>>
>> - no execve; we just need to add a pattern for "qemu library" paths to the AppArmor profile
>>
>> - we can probably load the new library _before_ starting the migration, reducing the migration freeze time; this is more like migration with two processes
> 
> I have not considered that.  A colleague suggested something similar -- loading the
> new qemu binary in memory and implementing exec in userland.   No doubt either
> method would be a non-trivial amount of work, versus cpr-exec which already works :)
> 
> Personally I don't think that requiring exec is a show stopper. If qemu is deployed in
> a container environment, then the potential targets of an exec can be limited by the
> container walls.
> 

> a container environment

That's not our case. Still, it's probably not a big deal to allow the exec call when we control the whole code base, which contains few exec calls.

-- 
Best regards,
Vladimir



end of thread, other threads:[~2025-04-09 18:08 UTC | newest]

Thread overview: 44+ messages
2025-01-15 19:00 [PATCH V7 00/24] Live update: cpr-transfer Steve Sistare
2025-01-15 19:00 ` [PATCH V7 01/24] backends/hostmem-shm: factor out allocation of "anonymous shared memory with an fd" Steve Sistare
2025-01-15 19:00 ` [PATCH V7 02/24] physmem: fix qemu_ram_alloc_from_fd size calculation Steve Sistare
2025-01-15 19:00 ` [PATCH V7 03/24] physmem: qemu_ram_alloc_from_fd extensions Steve Sistare
2025-01-15 19:00 ` [PATCH V7 04/24] physmem: fd-based shared memory Steve Sistare
2025-01-15 19:00 ` [PATCH V7 05/24] memory: add RAM_PRIVATE Steve Sistare
2025-01-15 19:00 ` [PATCH V7 06/24] machine: aux-ram-share option Steve Sistare
2025-01-15 19:00 ` [PATCH V7 07/24] migration: cpr-state Steve Sistare
2025-01-15 19:00 ` [PATCH V7 08/24] physmem: preserve ram blocks for cpr Steve Sistare
2025-01-15 19:00 ` [PATCH V7 09/24] hostmem-memfd: preserve " Steve Sistare
2025-01-15 19:00 ` [PATCH V7 10/24] hostmem-shm: " Steve Sistare
2025-01-15 19:00 ` [PATCH V7 11/24] migration: enhance migrate_uri_parse Steve Sistare
2025-01-15 19:00 ` [PATCH V7 12/24] migration: incoming channel Steve Sistare
2025-01-15 19:00 ` [PATCH V7 13/24] migration: SCM_RIGHTS for QEMUFile Steve Sistare
2025-01-15 19:00 ` [PATCH V7 14/24] migration: VMSTATE_FD Steve Sistare
2025-01-15 19:00 ` [PATCH V7 15/24] migration: cpr-transfer save and load Steve Sistare
2025-01-15 19:00 ` [PATCH V7 16/24] migration: cpr-transfer mode Steve Sistare
2025-01-29  6:23   ` Markus Armbruster
2025-01-15 19:00 ` [PATCH V7 17/24] migration-test: memory_backend Steve Sistare
2025-01-15 19:00 ` [PATCH V7 18/24] tests/qtest: optimize migrate_set_ports Steve Sistare
2025-01-15 19:00 ` [PATCH V7 19/24] tests/qtest: defer connection Steve Sistare
2025-01-15 19:00 ` [PATCH V7 20/24] migration-test: " Steve Sistare
2025-01-15 19:00 ` [PATCH V7 21/24] tests/qtest: enhance migration channels Steve Sistare
2025-01-15 19:00 ` [PATCH V7 22/24] tests/qtest: assert qmp connected Steve Sistare
2025-01-15 19:00 ` [PATCH V7 23/24] migration-test: cpr-transfer Steve Sistare
2025-01-16 19:06   ` Fabiano Rosas
2025-01-16 19:37     ` Steven Sistare
2025-01-16 20:02       ` Fabiano Rosas
2025-01-16 20:15         ` Steven Sistare
2025-01-15 19:00 ` [PATCH V7 24/24] migration: cpr-transfer documentation Steve Sistare
2025-01-17 14:42   ` Fabiano Rosas
2025-01-17 15:04     ` Steven Sistare
2025-01-17 15:29       ` Fabiano Rosas
2025-01-17 16:58         ` Steven Sistare
2025-01-17 19:06           ` Fabiano Rosas
2025-01-17 19:32             ` Steven Sistare
2025-01-17 20:04               ` Fabiano Rosas
2025-01-27 15:39 ` [PATCH V7 00/24] Live update: cpr-transfer Fabiano Rosas
2025-01-28 21:20   ` Steven Sistare
2025-01-29  6:24     ` Markus Armbruster
2025-04-09 16:22 ` Vladimir Sementsov-Ogievskiy
2025-04-09 17:48   ` Steven Sistare
2025-04-09 18:06     ` Vladimir Sementsov-Ogievskiy
2025-04-09 17:50   ` Vladimir Sementsov-Ogievskiy
