* [PULL 00/42] Migration patches for 2025-01-29
@ 2025-01-29 16:00 Fabiano Rosas
2025-01-29 16:00 ` [PULL 01/42] migration: fix -Werror=maybe-uninitialized Fabiano Rosas
` (42 more replies)
0 siblings, 43 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC
To: qemu-devel; +Cc: Peter Xu
The following changes since commit 7faf9d2f12ace4c1d04cf1a2b39334eef9a45f22:
Merge tag 'pull-aspeed-20250127' of https://github.com/legoater/qemu into staging (2025-01-27 11:20:35 -0500)
are available in the Git repository at:
https://gitlab.com/farosas/qemu.git tags/migration-20250129-pull-request
for you to fetch changes up to bc38dc2f5f350310724fd7d4f0a09f8c3a4811fa:
migration: refactor ram_save_target_page functions (2025-01-29 11:56:42 -0300)
----------------------------------------------------------------
Migration pull request
- Purge of ram_save_target_page_legacy
- Cleanups to postcopy, json writer, migration states
- New migration mode cpr-transfer
- Fix for a -Werror=maybe-uninitialized instance in savevm
----------------------------------------------------------------
Marc-André Lureau (1):
migration: fix -Werror=maybe-uninitialized
Peter Xu (16):
migration: Remove postcopy implications in should_send_vmdesc()
migration: Do not construct JSON description if suppressed
migration: Optimize postcopy on downtime by avoiding JSON writer
migration: Avoid two src-downtime-end tracepoints for postcopy
migration: Drop inactivate_disk param in qemu_savevm_state_complete*
migration: Synchronize all CPU states only for non-iterable dump
migration: Adjust postcopy bandwidth during switchover
migration: Adjust locking in migration_maybe_pause()
migration: Drop cached migration state in migration_maybe_pause()
migration: Take BQL slightly longer in postcopy_start()
migration: Notify COMPLETE once for postcopy
migration: Unwrap qemu_savevm_state_complete_precopy() in postcopy
migration: Cleanup qemu_savevm_state_complete_precopy()
migration: Always set DEVICE state
migration: Merge precopy/postcopy on switchover start
migration: Trivial cleanup on JSON writer of vmstate_save()
Prasad J Pandit (1):
migration: refactor ram_save_target_page functions
Steve Sistare (24):
backends/hostmem-shm: factor out allocation of "anonymous shared
memory with an fd"
physmem: fix qemu_ram_alloc_from_fd size calculation
physmem: qemu_ram_alloc_from_fd extensions
physmem: fd-based shared memory
memory: add RAM_PRIVATE
machine: aux-ram-share option
migration: cpr-state
physmem: preserve ram blocks for cpr
hostmem-memfd: preserve for cpr
hostmem-shm: preserve for cpr
migration: enhance migrate_uri_parse
migration: incoming channel
migration: SCM_RIGHTS for QEMUFile
migration: VMSTATE_FD
migration: cpr-transfer save and load
migration: cpr-transfer mode
migration-test: memory_backend
tests/qtest: optimize migrate_set_ports
tests/qtest: defer connection
migration-test: defer connection
tests/qtest: enhance migration channels
tests/qtest: assert qmp connected
migration-test: cpr-transfer
migration: cpr-transfer documentation
backends/hostmem-epc.c | 2 +-
backends/hostmem-file.c | 2 +-
backends/hostmem-memfd.c | 14 +-
backends/hostmem-ram.c | 2 +-
backends/hostmem-shm.c | 51 +---
docs/devel/migration/CPR.rst | 184 ++++++++++++-
hw/core/machine.c | 22 ++
include/exec/memory.h | 10 +
include/exec/ram_addr.h | 13 +-
include/hw/boards.h | 1 +
include/migration/cpr.h | 33 +++
include/migration/misc.h | 7 +
include/migration/vmstate.h | 9 +
include/qemu/osdep.h | 1 +
meson.build | 8 +-
migration/cpr-transfer.c | 71 +++++
migration/cpr.c | 224 ++++++++++++++++
migration/meson.build | 2 +
migration/migration.c | 348 +++++++++++++++++++------
migration/migration.h | 5 +-
migration/options.c | 8 +-
migration/qemu-file.c | 84 +++++-
migration/qemu-file.h | 2 +
migration/ram.c | 69 ++---
migration/savevm.c | 116 ++++-----
migration/savevm.h | 6 +-
migration/trace-events | 13 +-
migration/vmstate-types.c | 24 ++
migration/vmstate.c | 6 +-
qapi/migration.json | 51 +++-
qemu-options.hx | 34 +++
stubs/vmstate.c | 7 +
system/memory.c | 4 +-
system/physmem.c | 150 +++++++++--
system/trace-events | 1 +
system/vl.c | 43 ++-
tests/qemu-iotests/194.out | 1 +
tests/qemu-iotests/203.out | 1 +
tests/qemu-iotests/234.out | 2 +
tests/qemu-iotests/262.out | 1 +
tests/qemu-iotests/280.out | 1 +
tests/qtest/libqos/libqos.c | 3 +-
tests/qtest/libqtest.c | 103 +++++---
tests/qtest/libqtest.h | 24 +-
tests/qtest/migration/cpr-tests.c | 62 +++++
tests/qtest/migration/framework.c | 80 +++++-
tests/qtest/migration/framework.h | 11 +
tests/qtest/migration/migration-qmp.c | 53 +++-
tests/qtest/migration/migration-qmp.h | 10 +-
tests/qtest/migration/migration-util.c | 23 +-
tests/qtest/migration/misc-tests.c | 9 +-
tests/qtest/migration/precopy-tests.c | 6 +-
tests/qtest/virtio-net-failover.c | 8 +-
util/memfd.c | 16 +-
util/oslib-posix.c | 52 ++++
util/oslib-win32.c | 6 +
56 files changed, 1713 insertions(+), 386 deletions(-)
create mode 100644 include/migration/cpr.h
create mode 100644 migration/cpr-transfer.c
create mode 100644 migration/cpr.c
--
2.35.3
* [PULL 01/42] migration: fix -Werror=maybe-uninitialized
From: Fabiano Rosas @ 2025-01-29 16:00 UTC
To: qemu-devel; +Cc: Peter Xu, Marc-André Lureau, Philippe Mathieu-Daudé
From: Marc-André Lureau <marcandre.lureau@redhat.com>
../migration/savevm.c: In function ‘qemu_savevm_state_complete_precopy_non_iterable’:
../migration/savevm.c:1560:20: error: ‘ret’ may be used uninitialized [-Werror=maybe-uninitialized]
1560 | return ret;
| ^~~
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250114104811.2612846-1-marcandre.lureau@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/savevm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index c929da1ca5..6e56d4cf1d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1557,7 +1557,7 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
migrate_set_error(ms, local_err);
error_report_err(local_err);
qemu_file_set_error(f, -EFAULT);
- return ret;
+ return -1;
}
}
if (!in_postcopy) {
--
2.35.3
* [PULL 02/42] backends/hostmem-shm: factor out allocation of "anonymous shared memory with an fd"
From: Fabiano Rosas @ 2025-01-29 16:00 UTC
To: qemu-devel; +Cc: Peter Xu, Steve Sistare, David Hildenbrand
From: Steve Sistare <steven.sistare@oracle.com>
Let's factor it out so we can reuse it.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-2-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
backends/hostmem-shm.c | 45 ++++--------------------------------
include/qemu/osdep.h | 1 +
meson.build | 8 +++++--
util/oslib-posix.c | 52 ++++++++++++++++++++++++++++++++++++++++++
util/oslib-win32.c | 6 +++++
5 files changed, 69 insertions(+), 43 deletions(-)
diff --git a/backends/hostmem-shm.c b/backends/hostmem-shm.c
index 5551ba78a6..fabee41f2c 100644
--- a/backends/hostmem-shm.c
+++ b/backends/hostmem-shm.c
@@ -25,11 +25,9 @@ struct HostMemoryBackendShm {
static bool
shm_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
{
- g_autoptr(GString) shm_name = g_string_new(NULL);
g_autofree char *backend_name = NULL;
uint32_t ram_flags;
- int fd, oflag;
- mode_t mode;
+ int fd;
if (!backend->size) {
error_setg(errp, "can't create shm backend with size 0");
@@ -41,48 +39,13 @@ shm_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
return false;
}
- /*
- * Let's use `mode = 0` because we don't want other processes to open our
- * memory unless we share the file descriptor with them.
- */
- mode = 0;
- oflag = O_RDWR | O_CREAT | O_EXCL;
- backend_name = host_memory_backend_get_name(backend);
-
- /*
- * Some operating systems allow creating anonymous POSIX shared memory
- * objects (e.g. FreeBSD provides the SHM_ANON constant), but this is not
- * defined by POSIX, so let's create a unique name.
- *
- * From Linux's shm_open(3) man-page:
- * For portable use, a shared memory object should be identified
- * by a name of the form /somename;"
- */
- g_string_printf(shm_name, "/qemu-" FMT_pid "-shm-%s", getpid(),
- backend_name);
-
- fd = shm_open(shm_name->str, oflag, mode);
+ fd = qemu_shm_alloc(backend->size, errp);
if (fd < 0) {
- error_setg_errno(errp, errno,
- "failed to create POSIX shared memory");
- return false;
- }
-
- /*
- * We have the file descriptor, so we no longer need to expose the
- * POSIX shared memory object. However it will remain allocated as long as
- * there are file descriptors pointing to it.
- */
- shm_unlink(shm_name->str);
-
- if (ftruncate(fd, backend->size) == -1) {
- error_setg_errno(errp, errno,
- "failed to resize POSIX shared memory to %" PRIu64,
- backend->size);
- close(fd);
return false;
}
+ /* Let's do the same as memory-backend-ram,share=on would do. */
+ backend_name = host_memory_backend_get_name(backend);
ram_flags = RAM_SHARED;
ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index b94fb5fab8..112ebdff21 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -509,6 +509,7 @@ int qemu_daemon(int nochdir, int noclose);
void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared,
bool noreserve);
void qemu_anon_ram_free(void *ptr, size_t size);
+int qemu_shm_alloc(size_t size, Error **errp);
#ifdef _WIN32
#define HAVE_CHARDEV_SERIAL 1
diff --git a/meson.build b/meson.build
index 15a066043b..2c9ac9cfe1 100644
--- a/meson.build
+++ b/meson.build
@@ -3696,9 +3696,13 @@ libqemuutil = static_library('qemuutil',
build_by_default: false,
sources: util_ss.sources() + stub_ss.sources() + genh,
dependencies: [util_ss.dependencies(), libm, threads, glib, socket, malloc])
+qemuutil_deps = [event_loop_base]
+if host_os != 'windows'
+ qemuutil_deps += [rt]
+endif
qemuutil = declare_dependency(link_with: libqemuutil,
sources: genh + version_res,
- dependencies: [event_loop_base])
+ dependencies: qemuutil_deps)
if have_system or have_user
decodetree = generator(find_program('scripts/decodetree.py'),
@@ -4357,7 +4361,7 @@ if have_tools
subdir('contrib/elf2dmp')
executable('qemu-edid', files('qemu-edid.c', 'hw/display/edid-generate.c'),
- dependencies: qemuutil,
+ dependencies: [qemuutil, rt],
install: true)
if have_vhost_user
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 7a542cb50b..2bb34dade3 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -931,3 +931,55 @@ void qemu_close_all_open_fd(const int *skip, unsigned int nskip)
qemu_close_all_open_fd_fallback(skip, nskip, open_max);
}
}
+
+int qemu_shm_alloc(size_t size, Error **errp)
+{
+ g_autoptr(GString) shm_name = g_string_new(NULL);
+ int fd, oflag, cur_sequence;
+ static int sequence;
+ mode_t mode;
+
+ cur_sequence = qatomic_fetch_inc(&sequence);
+
+ /*
+ * Let's use `mode = 0` because we don't want other processes to open our
+ * memory unless we share the file descriptor with them.
+ */
+ mode = 0;
+ oflag = O_RDWR | O_CREAT | O_EXCL;
+
+ /*
+ * Some operating systems allow creating anonymous POSIX shared memory
+ * objects (e.g. FreeBSD provides the SHM_ANON constant), but this is not
+ * defined by POSIX, so let's create a unique name.
+ *
+ * From Linux's shm_open(3) man-page:
+ * For portable use, a shared memory object should be identified
+ * by a name of the form /somename;"
+ */
+ g_string_printf(shm_name, "/qemu-" FMT_pid "-shm-%d", getpid(),
+ cur_sequence);
+
+ fd = shm_open(shm_name->str, oflag, mode);
+ if (fd < 0) {
+ error_setg_errno(errp, errno,
+ "failed to create POSIX shared memory");
+ return -1;
+ }
+
+ /*
+ * We have the file descriptor, so we no longer need to expose the
+ * POSIX shared memory object. However it will remain allocated as long as
+ * there are file descriptors pointing to it.
+ */
+ shm_unlink(shm_name->str);
+
+ if (ftruncate(fd, size) == -1) {
+ error_setg_errno(errp, errno,
+ "failed to resize POSIX shared memory to %zu", size);
+ close(fd);
+ return -1;
+ }
+
+ return fd;
+}
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index b623830d62..b7351634ec 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -877,3 +877,9 @@ void qemu_win32_map_free(void *ptr, HANDLE h, Error **errp)
}
CloseHandle(h);
}
+
+int qemu_shm_alloc(size_t size, Error **errp)
+{
+ error_setg(errp, "Shared memory is not supported.");
+ return -1;
+}
--
2.35.3
* [PULL 03/42] physmem: fix qemu_ram_alloc_from_fd size calculation
From: Fabiano Rosas @ 2025-01-29 16:00 UTC
To: qemu-devel; +Cc: Peter Xu, Steve Sistare, qemu-stable, David Hildenbrand
From: Steve Sistare <steven.sistare@oracle.com>
qemu_ram_alloc_from_fd allocates space if file_size == 0. If non-zero,
it uses the existing space and verifies it is large enough, but the
verification was broken when the offset parameter was introduced. As
a result, a file smaller than offset passes the verification and causes
errors later. Fix that, and update the error message to include offset.
Peter provides this concise reproducer:
$ touch ramfile
$ truncate -s 64M ramfile
$ ./qemu-system-x86_64 -object memory-backend-file,mem-path=./ramfile,offset=128M,size=128M,id=mem1,prealloc=on
qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address
With the fix, the error message is:
qemu-system-x86_64: mem1 backing store size 0x4000000 is too small for 'size' option 0x8000000 plus 'offset' option 0x8000000
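The difference between the old and new checks can be sketched in isolation. This is a minimal illustration that mirrors the two conditions from the diff; the function names old_check_rejects and new_check_rejects are hypothetical and do not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Pre-fix condition: it only fires when the file extends past 'offset',
 * so a file shorter than 'offset' slips through.
 */
bool old_check_rejects(uint64_t file_size, uint64_t offset, uint64_t size)
{
    return file_size > offset && file_size < offset + size;
}

/*
 * Post-fix condition: any non-empty file shorter than offset + size is
 * rejected. An empty file is still accepted, because it gets allocated.
 */
bool new_check_rejects(uint64_t file_size, uint64_t offset, uint64_t size)
{
    return file_size && file_size < offset + size;
}
```

With the reproducer's values (a 64M file, offset=128M, size=128M), the old condition accepts the file and the new one rejects it.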
Cc: qemu-stable@nongnu.org
Fixes: 4b870dc4d0c0 ("hostmem-file: add offset option")
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
system/physmem.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/system/physmem.c b/system/physmem.c
index c76503aea8..792844d5a5 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1970,10 +1970,12 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
size = REAL_HOST_PAGE_ALIGN(size);
file_size = get_file_size(fd);
- if (file_size > offset && file_size < (offset + size)) {
- error_setg(errp, "backing store size 0x%" PRIx64
- " does not match 'size' option 0x" RAM_ADDR_FMT,
- file_size, size);
+ if (file_size && file_size < offset + size) {
+ error_setg(errp, "%s backing store size 0x%" PRIx64
+ " is too small for 'size' option 0x" RAM_ADDR_FMT
+ " plus 'offset' option 0x%" PRIx64,
+ memory_region_name(mr), file_size, size,
+ (uint64_t)offset);
return NULL;
}
--
2.35.3
* [PULL 04/42] physmem: qemu_ram_alloc_from_fd extensions
From: Fabiano Rosas @ 2025-01-29 16:00 UTC
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Extend qemu_ram_alloc_from_fd to support resizable ram, and define
qemu_ram_resize_cb to clean up the API.
Add a grow parameter to extend the file if necessary. A zero-sized file
is always extended, even if grow is false.
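The resulting behaviour can be summarised as a small decision function. This is a sketch that follows the two conditions in the physmem.c hunk of this patch; the enum and the name plan_backing_file are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum backing_plan {
    BACKING_REJECT,    /* non-empty file too small, and growth not allowed */
    BACKING_EXTEND,    /* file is empty or too small: extend it */
    BACKING_USE_AS_IS  /* file already covers offset + max_size */
};

enum backing_plan plan_backing_file(uint64_t file_size, uint64_t offset,
                                    uint64_t max_size, bool grow)
{
    uint64_t needed = offset + max_size;

    /* Same shape as the error check in qemu_ram_alloc_from_fd. */
    if (file_size && file_size < needed && !grow) {
        return BACKING_REJECT;
    }
    /* Same shape as the truncate argument passed to file_ram_alloc. */
    return file_size < needed ? BACKING_EXTEND : BACKING_USE_AS_IS;
}
```

Note that an empty file never reaches the reject branch, which is why it is extended regardless of grow.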
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1736967650-129648-4-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
include/exec/ram_addr.h | 13 +++++++++----
system/memory.c | 4 ++--
system/physmem.c | 35 ++++++++++++++++++++---------------
3 files changed, 31 insertions(+), 21 deletions(-)
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index ff157c1f42..94bb3ccbe4 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -111,23 +111,30 @@ long qemu_maxrampagesize(void);
*
* Parameters:
* @size: the size in bytes of the ram block
+ * @max_size: the maximum size of the block after resizing
* @mr: the memory region where the ram block is
+ * @resized: callback after calls to qemu_ram_resize
* @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
* RAM_NORESERVE, RAM_PROTECTED, RAM_NAMED_FILE, RAM_READONLY,
* RAM_READONLY_FD, RAM_GUEST_MEMFD
* @mem_path or @fd: specify the backing file or device
* @offset: Offset into target file
+ * @grow: extend file if necessary (but an empty file is always extended).
* @errp: pointer to Error*, to store an error if it happens
*
* Return:
* On success, return a pointer to the ram block.
* On failure, return NULL.
*/
+typedef void (*qemu_ram_resize_cb)(const char *, uint64_t length, void *host);
+
RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
uint32_t ram_flags, const char *mem_path,
off_t offset, Error **errp);
-RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
+RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, ram_addr_t max_size,
+ qemu_ram_resize_cb resized, MemoryRegion *mr,
uint32_t ram_flags, int fd, off_t offset,
+ bool grow,
Error **errp);
RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
@@ -135,9 +142,7 @@ RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags, MemoryRegion *mr,
Error **errp);
RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t max_size,
- void (*resized)(const char*,
- uint64_t length,
- void *host),
+ qemu_ram_resize_cb resized,
MemoryRegion *mr, Error **errp);
void qemu_ram_free(RAMBlock *block);
diff --git a/system/memory.c b/system/memory.c
index b17b5538ff..4c829793a0 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -1680,8 +1680,8 @@ bool memory_region_init_ram_from_fd(MemoryRegion *mr,
mr->readonly = !!(ram_flags & RAM_READONLY);
mr->terminates = true;
mr->destructor = memory_region_destructor_ram;
- mr->ram_block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, offset,
- &err);
+ mr->ram_block = qemu_ram_alloc_from_fd(size, size, NULL, mr, ram_flags, fd,
+ offset, false, &err);
if (err) {
mr->size = int128_zero();
object_unparent(OBJECT(mr));
diff --git a/system/physmem.c b/system/physmem.c
index 792844d5a5..4d13761329 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1942,8 +1942,10 @@ out_free:
}
#ifdef CONFIG_POSIX
-RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
+RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, ram_addr_t max_size,
+ qemu_ram_resize_cb resized, MemoryRegion *mr,
uint32_t ram_flags, int fd, off_t offset,
+ bool grow,
Error **errp)
{
RAMBlock *new_block;
@@ -1953,7 +1955,9 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
/* Just support these ram flags by now. */
assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE |
RAM_PROTECTED | RAM_NAMED_FILE | RAM_READONLY |
- RAM_READONLY_FD | RAM_GUEST_MEMFD)) == 0);
+ RAM_READONLY_FD | RAM_GUEST_MEMFD |
+ RAM_RESIZEABLE)) == 0);
+ assert(max_size >= size);
if (xen_enabled()) {
error_setg(errp, "-mem-path not supported with Xen");
@@ -1968,13 +1972,15 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
size = TARGET_PAGE_ALIGN(size);
size = REAL_HOST_PAGE_ALIGN(size);
+ max_size = TARGET_PAGE_ALIGN(max_size);
+ max_size = REAL_HOST_PAGE_ALIGN(max_size);
file_size = get_file_size(fd);
- if (file_size && file_size < offset + size) {
+ if (file_size && file_size < offset + max_size && !grow) {
error_setg(errp, "%s backing store size 0x%" PRIx64
" is too small for 'size' option 0x" RAM_ADDR_FMT
" plus 'offset' option 0x%" PRIx64,
- memory_region_name(mr), file_size, size,
+ memory_region_name(mr), file_size, max_size,
(uint64_t)offset);
return NULL;
}
@@ -1990,11 +1996,13 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
new_block = g_malloc0(sizeof(*new_block));
new_block->mr = mr;
new_block->used_length = size;
- new_block->max_length = size;
+ new_block->max_length = max_size;
+ new_block->resized = resized;
new_block->flags = ram_flags;
new_block->guest_memfd = -1;
- new_block->host = file_ram_alloc(new_block, size, fd, !file_size, offset,
- errp);
+ new_block->host = file_ram_alloc(new_block, max_size, fd,
+ file_size < offset + max_size,
+ offset, errp);
if (!new_block->host) {
g_free(new_block);
return NULL;
@@ -2046,7 +2054,8 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
return NULL;
}
- block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, offset, errp);
+ block = qemu_ram_alloc_from_fd(size, size, NULL, mr, ram_flags, fd, offset,
+ false, errp);
if (!block) {
if (created) {
unlink(mem_path);
@@ -2061,9 +2070,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
static
RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
- void (*resized)(const char*,
- uint64_t length,
- void *host),
+ qemu_ram_resize_cb resized,
void *host, uint32_t ram_flags,
MemoryRegion *mr, Error **errp)
{
@@ -2115,10 +2122,8 @@ RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags,
}
RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t maxsz,
- void (*resized)(const char*,
- uint64_t length,
- void *host),
- MemoryRegion *mr, Error **errp)
+ qemu_ram_resize_cb resized,
+ MemoryRegion *mr, Error **errp)
{
return qemu_ram_alloc_internal(size, maxsz, resized, NULL,
RAM_RESIZEABLE, mr, errp);
--
2.35.3
* [PULL 05/42] physmem: fd-based shared memory
From: Fabiano Rosas @ 2025-01-29 16:00 UTC
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Create MAP_SHARED RAMBlocks by mmap'ing a file descriptor rather than using
MAP_ANON, so the memory can be accessed in another process by passing and
mmap'ing the fd. This will allow CPR to support memory-backend-ram and
memory-backend-shm objects, provided the user creates them with share=on.
Use memfd_create if available because it has no size limits. If not, use
POSIX shm_open. However, allocation on the opened fd may fail if the shm
mount size is too small, even if the system has free memory, so for backwards
compatibility fall back to qemu_anon_ram_alloc/MAP_ANON on failure.
For backwards compatibility on Windows, always use MAP_ANON. share=on has
no purpose there, but the syntax is accepted, and must continue to work.
Lastly, quietly fall back to MAP_ANON if the system does not support
qemu_ram_alloc_from_fd.
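The idea of backing shared RAM with an fd, with a quiet MAP_ANON fallback, can be sketched outside QEMU as follows. This is a Linux-only illustration that assumes glibc's memfd_create wrapper is available; it leaves out the POSIX shm_open path, and the names alloc_shared and shared_fd_roundtrip are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Allocate 'size' bytes of shared memory. On success return the mapping
 * and store the backing fd in *fd_out. *fd_out is -1 when the anonymous
 * fallback was used. Return NULL if even the fallback fails.
 */
void *alloc_shared(size_t size, int *fd_out)
{
    void *ptr;
    int fd = memfd_create("demo-ram", MFD_CLOEXEC);

    if (fd >= 0 && ftruncate(fd, (off_t)size) == 0) {
        ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (ptr != MAP_FAILED) {
            *fd_out = fd;
            return ptr;
        }
    }
    if (fd >= 0) {
        close(fd);
    }

    /* Fallback: anonymous shared memory, not reachable through an fd. */
    *fd_out = -1;
    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return ptr == MAP_FAILED ? NULL : ptr;
}

/*
 * Map the same fd a second time, as a peer process would after receiving
 * the fd, and check that a write through one mapping is visible through
 * the other. Returns 1 on success, or when only the fallback was usable.
 */
int shared_fd_roundtrip(void)
{
    size_t size = 4096;
    int fd, ok = 1;
    char *a = alloc_shared(size, &fd);

    if (!a) {
        return 0;
    }
    if (fd >= 0) {
        char *b = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);

        if (b == MAP_FAILED) {
            ok = 0;
        } else {
            strcpy(a, "guest ram");
            ok = strcmp(b, "guest ram") == 0;
            munmap(b, size);
        }
        close(fd);
    }
    munmap(a, size);
    return ok;
}
```

The second mmap stands in for what another process would do with the passed fd. With the MAP_ANON fallback there is no fd to pass, so such a block cannot be shared that way.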
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-5-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
system/physmem.c | 57 ++++++++++++++++++++++++++++++++++++++++++++-
system/trace-events | 1 +
util/memfd.c | 16 ++++++++++---
3 files changed, 70 insertions(+), 4 deletions(-)
diff --git a/system/physmem.c b/system/physmem.c
index 4d13761329..e4355649e9 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -48,6 +48,7 @@
#include "qemu/qemu-print.h"
#include "qemu/log.h"
#include "qemu/memalign.h"
+#include "qemu/memfd.h"
#include "exec/memory.h"
#include "exec/ioport.h"
#include "system/dma.h"
@@ -1948,6 +1949,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, ram_addr_t max_size,
bool grow,
Error **errp)
{
+ ERRP_GUARD();
RAMBlock *new_block;
Error *local_err = NULL;
int64_t file_size, file_align;
@@ -2068,6 +2070,25 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
}
#endif
+#ifdef CONFIG_POSIX
+/*
+ * Create MAP_SHARED RAMBlocks by mmap'ing a file descriptor, so it can be
+ * shared with another process if CPR is being used. Use memfd if available
+ * because it has no size limits, else use POSIX shm.
+ */
+static int qemu_ram_get_shared_fd(const char *name, Error **errp)
+{
+ int fd;
+
+ if (qemu_memfd_check(0)) {
+ fd = qemu_memfd_create(name, 0, 0, 0, 0, errp);
+ } else {
+ fd = qemu_shm_alloc(0, errp);
+ }
+ return fd;
+}
+#endif
+
static
RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
qemu_ram_resize_cb resized,
@@ -2081,6 +2102,41 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
RAM_NORESERVE | RAM_GUEST_MEMFD)) == 0);
assert(!host ^ (ram_flags & RAM_PREALLOC));
+ assert(max_size >= size);
+
+#ifdef CONFIG_POSIX /* ignore RAM_SHARED for Windows */
+ if (!host) {
+ if (ram_flags & RAM_SHARED) {
+ const char *name = memory_region_name(mr);
+ int fd = qemu_ram_get_shared_fd(name, errp);
+
+ if (fd < 0) {
+ return NULL;
+ }
+
+ /* Use same alignment as qemu_anon_ram_alloc */
+ mr->align = QEMU_VMALLOC_ALIGN;
+
+ /*
+ * This can fail if the shm mount size is too small, or alloc from
+ * fd is not supported, but previous QEMU versions that called
+ * qemu_anon_ram_alloc for anonymous shared memory could have
+ * succeeded. Quietly fail and fall back.
+ */
+ new_block = qemu_ram_alloc_from_fd(size, max_size, resized, mr,
+ ram_flags, fd, 0, false, NULL);
+ if (new_block) {
+ trace_qemu_ram_alloc_shared(name, new_block->used_length,
+ new_block->max_length, fd,
+ new_block->host);
+ return new_block;
+ }
+
+ close(fd);
+ /* fall back to anon allocation */
+ }
+ }
+#endif
align = qemu_real_host_page_size();
align = MAX(align, TARGET_PAGE_SIZE);
@@ -2092,7 +2148,6 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
new_block->resized = resized;
new_block->used_length = size;
new_block->max_length = max_size;
- assert(max_size >= size);
new_block->fd = -1;
new_block->guest_memfd = -1;
new_block->page_size = qemu_real_host_page_size();
diff --git a/system/trace-events b/system/trace-events
index 5bbc3fbffa..be12ebfb41 100644
--- a/system/trace-events
+++ b/system/trace-events
@@ -33,6 +33,7 @@ address_space_map(void *as, uint64_t addr, uint64_t len, bool is_write, uint32_t
find_ram_offset(uint64_t size, uint64_t offset) "size: 0x%" PRIx64 " @ 0x%" PRIx64
find_ram_offset_loop(uint64_t size, uint64_t candidate, uint64_t offset, uint64_t next, uint64_t mingap) "trying size: 0x%" PRIx64 " @ 0x%" PRIx64 ", offset: 0x%" PRIx64" next: 0x%" PRIx64 " mingap: 0x%" PRIx64
ram_block_discard_range(const char *rbname, void *hva, size_t length, bool need_madvise, bool need_fallocate, int ret) "%s@%p + 0x%zx: madvise: %d fallocate: %d ret: %d"
+qemu_ram_alloc_shared(const char *name, size_t size, size_t max_size, int fd, void *host) "%s size %zu max_size %zu fd %d host %p"
# cpus.c
vm_stop_flush_all(int ret) "ret %d"
diff --git a/util/memfd.c b/util/memfd.c
index 8a2e906962..07beab174d 100644
--- a/util/memfd.c
+++ b/util/memfd.c
@@ -194,17 +194,27 @@ bool qemu_memfd_alloc_check(void)
/**
* qemu_memfd_check():
*
- * Check if host supports memfd.
+ * Check if host supports memfd. Cache the answer for the common case flags=0.
*/
bool qemu_memfd_check(unsigned int flags)
{
#ifdef CONFIG_LINUX
- int mfd = memfd_create("test", flags | MFD_CLOEXEC);
+ int mfd;
+ static int memfd_check = MEMFD_TODO;
+ if (!flags && memfd_check != MEMFD_TODO) {
+ return memfd_check;
+ }
+
+ mfd = memfd_create("test", flags | MFD_CLOEXEC);
if (mfd >= 0) {
close(mfd);
- return true;
}
+ if (!flags) {
+ memfd_check = (mfd >= 0) ? MEMFD_OK : MEMFD_KO;
+ }
+ return (mfd >= 0);
+
#endif
return false;
--
2.35.3
* [PULL 06/42] memory: add RAM_PRIVATE
From: Fabiano Rosas @ 2025-01-29 16:00 UTC
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Define the RAM_PRIVATE flag.
In RAMBlock creation functions, if MAP_SHARED is 0 in the flags parameter,
the implementation may (as of a subsequent patch) still create a shared
mapping if other conditions require it. Callers who specifically want a
private mapping, e.g. for objects specified by the user, must pass RAM_PRIVATE.
After RAMBlock creation, MAP_SHARED in the block's flags indicates whether
the block is shared or private, and MAP_PRIVATE is omitted.
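The three cases (explicitly shared, explicitly private, unspecified) can be sketched as a flag-resolution step. This is an illustration only: the DEMO_ flag bits are made-up values, resolve_block_flags is a hypothetical name, and prefer_shared stands in for the "other conditions" that a later patch in this series introduces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits, not QEMU's actual values. */
#define DEMO_RAM_SHARED  (1u << 0)
#define DEMO_RAM_PRIVATE (1u << 1)

/*
 * Turn the flags a caller requested into the flags stored on the block.
 * If the caller asked for neither shared nor private, the implementation
 * is free to pick shared when 'prefer_shared' is set.
 */
uint32_t resolve_block_flags(uint32_t requested, bool prefer_shared)
{
    uint32_t flags = requested;

    if (!(requested & (DEMO_RAM_SHARED | DEMO_RAM_PRIVATE)) && prefer_shared) {
        flags |= DEMO_RAM_SHARED;
    }
    /* The private request is consumed here and not kept on the block. */
    flags &= ~DEMO_RAM_PRIVATE;
    return flags;
}
```

After resolution only the shared bit carries meaning: set means shared, clear means private.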
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-6-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
backends/hostmem-epc.c | 2 +-
backends/hostmem-file.c | 2 +-
backends/hostmem-memfd.c | 2 +-
backends/hostmem-ram.c | 2 +-
include/exec/memory.h | 10 ++++++++++
system/physmem.c | 15 ++++++++++++---
6 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/backends/hostmem-epc.c b/backends/hostmem-epc.c
index eb4b95dfd7..1fa2d031e4 100644
--- a/backends/hostmem-epc.c
+++ b/backends/hostmem-epc.c
@@ -36,7 +36,7 @@ sgx_epc_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
backend->aligned = true;
name = object_get_canonical_path(OBJECT(backend));
- ram_flags = (backend->share ? RAM_SHARED : 0) | RAM_PROTECTED;
+ ram_flags = (backend->share ? RAM_SHARED : RAM_PRIVATE) | RAM_PROTECTED;
return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
backend->size, ram_flags, fd, 0, errp);
}
diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 46321fda84..691a827819 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -82,7 +82,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
backend->aligned = true;
name = host_memory_backend_get_name(backend);
- ram_flags = backend->share ? RAM_SHARED : 0;
+ ram_flags = backend->share ? RAM_SHARED : RAM_PRIVATE;
ram_flags |= fb->readonly ? RAM_READONLY_FD : 0;
ram_flags |= fb->rom == ON_OFF_AUTO_ON ? RAM_READONLY : 0;
ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index d4d0620e6c..1672da9e30 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -52,7 +52,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
backend->aligned = true;
name = host_memory_backend_get_name(backend);
- ram_flags = backend->share ? RAM_SHARED : 0;
+ ram_flags = backend->share ? RAM_SHARED : RAM_PRIVATE;
ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index 39aac6bf35..868ae6ca80 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -28,7 +28,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
}
name = host_memory_backend_get_name(backend);
- ram_flags = backend->share ? RAM_SHARED : 0;
+ ram_flags = backend->share ? RAM_SHARED : RAM_PRIVATE;
ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
return memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend),
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3ee1901b52..9f73b59867 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -246,6 +246,16 @@ typedef struct IOMMUTLBEvent {
/* RAM can be private that has kvm guest memfd backend */
#define RAM_GUEST_MEMFD (1 << 12)
+/*
+ * In RAMBlock creation functions, if MAP_SHARED is 0 in the flags parameter,
+ * the implementation may still create a shared mapping if other conditions
+ * require it. Callers who specifically want a private mapping, eg objects
+ * specified by the user, must pass RAM_PRIVATE.
+ * After RAMBlock creation, MAP_SHARED in the block's flags indicates whether
+ * the block is shared or private, and MAP_PRIVATE is omitted.
+ */
+#define RAM_PRIVATE (1 << 13)
+
static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
hwaddr start, hwaddr end,
diff --git a/system/physmem.c b/system/physmem.c
index e4355649e9..03fac0a64f 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1952,7 +1952,11 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, ram_addr_t max_size,
ERRP_GUARD();
RAMBlock *new_block;
Error *local_err = NULL;
- int64_t file_size, file_align;
+ int64_t file_size, file_align, share_flags;
+
+ share_flags = ram_flags & (RAM_PRIVATE | RAM_SHARED);
+ assert(share_flags != (RAM_SHARED | RAM_PRIVATE));
+ ram_flags &= ~RAM_PRIVATE;
/* Just support these ram flags by now. */
assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE |
@@ -2097,7 +2101,11 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
{
RAMBlock *new_block;
Error *local_err = NULL;
- int align;
+ int align, share_flags;
+
+ share_flags = ram_flags & (RAM_PRIVATE | RAM_SHARED);
+ assert(share_flags != (RAM_SHARED | RAM_PRIVATE));
+ ram_flags &= ~RAM_PRIVATE;
assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
RAM_NORESERVE | RAM_GUEST_MEMFD)) == 0);
@@ -2172,7 +2180,8 @@ RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags,
MemoryRegion *mr, Error **errp)
{
- assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE | RAM_GUEST_MEMFD)) == 0);
+ assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE | RAM_GUEST_MEMFD |
+ RAM_PRIVATE)) == 0);
return qemu_ram_alloc_internal(size, size, NULL, NULL, ram_flags, mr, errp);
}
--
2.35.3
* [PULL 07/42] machine: aux-ram-share option
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (5 preceding siblings ...)
2025-01-29 16:00 ` [PULL 06/42] memory: add RAM_PRIVATE Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 08/42] migration: cpr-state Fabiano Rosas
` (35 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Allocate auxiliary guest RAM as an anonymous file that is shareable
with an external process. This option applies to memory allocated as
a side effect of creating various devices. It does not apply to
memory-backend-objects, whether explicitly specified on the command
line, or implicitly created by the -m command line option.
This option is intended to support new migration modes, in which the
memory region can be transferred in place to a new QEMU process, by sending
the memfd file descriptor to the process. Memory contents are preserved,
and if the mode also transfers device descriptors, then pages that are
locked in memory for DMA remain locked. This behavior is a prerequisite
for supporting vfio, vdpa, and iommufd devices with the new modes.
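As a rough sketch of the policy this option introduces: auxiliary RAM becomes shared only when the caller expressed no sharing preference and the machine option is on. The function name aux_ram_effective_flags is hypothetical, and the RAM_SHARED bit value is assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RAM_PRIVATE matches the previous patch; RAM_SHARED's bit is assumed. */
#define RAM_SHARED  (1u << 1)
#define RAM_PRIVATE (1u << 13)

/*
 * Hypothetical helper: compute the flags a block would be created with.
 * An explicit RAM_SHARED or RAM_PRIVATE request always wins; the
 * aux-ram-share setting only fills in the unspecified case.
 */
static uint32_t aux_ram_effective_flags(uint32_t ram_flags, bool aux_ram_share)
{
    uint32_t share_flags = ram_flags & (RAM_PRIVATE | RAM_SHARED);

    assert(share_flags != (RAM_SHARED | RAM_PRIVATE));
    ram_flags &= ~RAM_PRIVATE;
    if (!share_flags && aux_ram_share) {
        ram_flags |= RAM_SHARED;
    }
    return ram_flags;
}
```

This is why the memory backends in the previous patch pass RAM_PRIVATE explicitly: a user-specified share=off must not be overridden by the machine-wide option.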
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-7-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
hw/core/machine.c | 22 ++++++++++++++++++++++
include/hw/boards.h | 1 +
qemu-options.hx | 11 +++++++++++
system/physmem.c | 3 +++
4 files changed, 37 insertions(+)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index c23b399496..2b11bc4f66 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -457,6 +457,22 @@ static void machine_set_mem_merge(Object *obj, bool value, Error **errp)
ms->mem_merge = value;
}
+#ifdef CONFIG_POSIX
+static bool machine_get_aux_ram_share(Object *obj, Error **errp)
+{
+ MachineState *ms = MACHINE(obj);
+
+ return ms->aux_ram_share;
+}
+
+static void machine_set_aux_ram_share(Object *obj, bool value, Error **errp)
+{
+ MachineState *ms = MACHINE(obj);
+
+ ms->aux_ram_share = value;
+}
+#endif
+
static bool machine_get_usb(Object *obj, Error **errp)
{
MachineState *ms = MACHINE(obj);
@@ -1162,6 +1178,12 @@ static void machine_class_init(ObjectClass *oc, void *data)
object_class_property_set_description(oc, "mem-merge",
"Enable/disable memory merge support");
+#ifdef CONFIG_POSIX
+ object_class_property_add_bool(oc, "aux-ram-share",
+ machine_get_aux_ram_share,
+ machine_set_aux_ram_share);
+#endif
+
object_class_property_add_bool(oc, "usb",
machine_get_usb, machine_set_usb);
object_class_property_set_description(oc, "usb",
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 2ad711e56d..e1f41b2a53 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -410,6 +410,7 @@ struct MachineState {
bool enable_graphics;
ConfidentialGuestSupport *cgs;
HostMemoryBackend *memdev;
+ bool aux_ram_share;
/*
* convenience alias to ram_memdev_id backend memory region
* or to numa container memory region
diff --git a/qemu-options.hx b/qemu-options.hx
index 7090d59f6f..90fad31590 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -38,6 +38,9 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
" nvdimm=on|off controls NVDIMM support (default=off)\n"
" memory-encryption=@var{} memory encryption object to use (default=none)\n"
" hmat=on|off controls ACPI HMAT support (default=off)\n"
+#ifdef CONFIG_POSIX
+ " aux-ram-share=on|off allocate auxiliary guest RAM as shared (default: off)\n"
+#endif
" memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
" cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
QEMU_ARCH_ALL)
@@ -101,6 +104,14 @@ SRST
Enables or disables ACPI Heterogeneous Memory Attribute Table
(HMAT) support. The default is off.
+ ``aux-ram-share=on|off``
+ Allocate auxiliary guest RAM as an anonymous file that is
+ shareable with an external process. This option applies to
+ memory allocated as a side effect of creating various devices.
+ It does not apply to memory-backend-objects, whether explicitly
+ specified on the command line, or implicitly created by the -m
+ command line option. The default is off.
+
``memory-backend='id'``
An alternative to legacy ``-mem-path`` and ``mem-prealloc`` options.
Allows to use a memory backend as main RAM.
diff --git a/system/physmem.c b/system/physmem.c
index 03fac0a64f..cb80ce3091 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2114,6 +2114,9 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
#ifdef CONFIG_POSIX /* ignore RAM_SHARED for Windows */
if (!host) {
+ if (!share_flags && current_machine->aux_ram_share) {
+ ram_flags |= RAM_SHARED;
+ }
if (ram_flags & RAM_SHARED) {
const char *name = memory_region_name(mr);
int fd = qemu_ram_get_shared_fd(name, errp);
--
2.35.3
* [PULL 08/42] migration: cpr-state
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (6 preceding siblings ...)
2025-01-29 16:00 ` [PULL 07/42] machine: aux-ram-share option Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 09/42] physmem: preserve ram blocks for cpr Fabiano Rosas
` (34 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
CPR must save state that is needed after QEMU is restarted, when devices
are realized. Thus the extra state cannot be saved in the migration
channel, as objects must already exist before that channel can be loaded.
Instead, define auxiliary state structures and vmstate descriptions, not
associated with any registered object, and serialize the aux state to a
cpr-specific channel in cpr_state_save. Deserialize in cpr_state_load
after QEMU restarts, before devices are realized.
Provide accessors for clients to register file descriptors for saving.
The mechanism for passing the fd's to the new process will be specific
to each migration mode, and added in subsequent patches.
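The fd accessors amount to a small registry keyed by (name, id). The following is a simplified stand-alone sketch of that idea using plain libc instead of QLIST and glib; the names FdEntry, fd_save, fd_find and fd_delete are hypothetical stand-ins, not the functions added by this patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One saved descriptor, identified by a (name, id) pair. */
typedef struct FdEntry {
    char *name;
    int id;
    int fd;
    struct FdEntry *next;
} FdEntry;

static FdEntry *fd_list;

/* Record fd under (name, id); newest entries go to the list head. */
static void fd_save(const char *name, int id, int fd)
{
    size_t len = strlen(name) + 1;
    FdEntry *e = calloc(1, sizeof(*e));

    assert(e != NULL);
    e->name = malloc(len);
    assert(e->name != NULL);
    memcpy(e->name, name, len);
    e->id = id;
    e->fd = fd;
    e->next = fd_list;
    fd_list = e;
}

/* Return the fd saved under (name, id), or -1 if there is none. */
static int fd_find(const char *name, int id)
{
    FdEntry *e;

    for (e = fd_list; e; e = e->next) {
        if (e->id == id && !strcmp(e->name, name)) {
            return e->fd;
        }
    }
    return -1;
}

/* Forget the entry for (name, id); the fd itself is not closed. */
static void fd_delete(const char *name, int id)
{
    FdEntry **pp;

    for (pp = &fd_list; *pp; pp = &(*pp)->next) {
        FdEntry *e = *pp;

        if (e->id == id && !strcmp(e->name, name)) {
            *pp = e->next;
            free(e->name);
            free(e);
            return;
        }
    }
}
```

Returning -1 for a missing entry lets callers use the usual "fd >= 0" test to decide between reusing a preserved descriptor and creating a fresh one.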
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-8-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
include/migration/cpr.h | 25 +++++
migration/cpr.c | 198 ++++++++++++++++++++++++++++++++++++++++
migration/meson.build | 1 +
migration/migration.c | 1 +
migration/trace-events | 7 ++
5 files changed, 232 insertions(+)
create mode 100644 include/migration/cpr.h
create mode 100644 migration/cpr.c
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
new file mode 100644
index 0000000000..d9364f7d1f
--- /dev/null
+++ b/include/migration/cpr.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2021, 2024 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef MIGRATION_CPR_H
+#define MIGRATION_CPR_H
+
+#include "qapi/qapi-types-migration.h"
+
+#define QEMU_CPR_FILE_MAGIC 0x51435052
+#define QEMU_CPR_FILE_VERSION 0x00000001
+
+void cpr_save_fd(const char *name, int id, int fd);
+void cpr_delete_fd(const char *name, int id);
+int cpr_find_fd(const char *name, int id);
+
+int cpr_state_save(MigrationChannel *channel, Error **errp);
+int cpr_state_load(MigrationChannel *channel, Error **errp);
+void cpr_state_close(void);
+struct QIOChannel *cpr_state_ioc(void);
+
+#endif
diff --git a/migration/cpr.c b/migration/cpr.c
new file mode 100644
index 0000000000..87bcfdb5ff
--- /dev/null
+++ b/migration/cpr.c
@@ -0,0 +1,198 @@
+/*
+ * Copyright (c) 2021-2024 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "migration/cpr.h"
+#include "migration/misc.h"
+#include "migration/options.h"
+#include "migration/qemu-file.h"
+#include "migration/savevm.h"
+#include "migration/vmstate.h"
+#include "system/runstate.h"
+#include "trace.h"
+
+/*************************************************************************/
+/* cpr state container for all information to be saved. */
+
+typedef QLIST_HEAD(CprFdList, CprFd) CprFdList;
+
+typedef struct CprState {
+ CprFdList fds;
+} CprState;
+
+static CprState cpr_state;
+
+/****************************************************************************/
+
+typedef struct CprFd {
+ char *name;
+ unsigned int namelen;
+ int id;
+ int fd;
+ QLIST_ENTRY(CprFd) next;
+} CprFd;
+
+static const VMStateDescription vmstate_cpr_fd = {
+ .name = "cpr fd",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .fields = (VMStateField[]) {
+ VMSTATE_UINT32(namelen, CprFd),
+ VMSTATE_VBUFFER_ALLOC_UINT32(name, CprFd, 0, NULL, namelen),
+ VMSTATE_INT32(id, CprFd),
+ VMSTATE_INT32(fd, CprFd),
+ VMSTATE_END_OF_LIST()
+ }
+};
+
+void cpr_save_fd(const char *name, int id, int fd)
+{
+ CprFd *elem = g_new0(CprFd, 1);
+
+ trace_cpr_save_fd(name, id, fd);
+ elem->name = g_strdup(name);
+ elem->namelen = strlen(name) + 1;
+ elem->id = id;
+ elem->fd = fd;
+ QLIST_INSERT_HEAD(&cpr_state.fds, elem, next);
+}
+
+static CprFd *find_fd(CprFdList *head, const char *name, int id)
+{
+ CprFd *elem;
+
+ QLIST_FOREACH(elem, head, next) {
+ if (!strcmp(elem->name, name) && elem->id == id) {
+ return elem;
+ }
+ }
+ return NULL;
+}
+
+void cpr_delete_fd(const char *name, int id)
+{
+ CprFd *elem = find_fd(&cpr_state.fds, name, id);
+
+ if (elem) {
+ QLIST_REMOVE(elem, next);
+ g_free(elem->name);
+ g_free(elem);
+ }
+
+ trace_cpr_delete_fd(name, id);
+}
+
+int cpr_find_fd(const char *name, int id)
+{
+ CprFd *elem = find_fd(&cpr_state.fds, name, id);
+ int fd = elem ? elem->fd : -1;
+
+ trace_cpr_find_fd(name, id, fd);
+ return fd;
+}
+/*************************************************************************/
+#define CPR_STATE "CprState"
+
+static const VMStateDescription vmstate_cpr_state = {
+ .name = CPR_STATE,
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .fields = (VMStateField[]) {
+ VMSTATE_QLIST_V(fds, CprState, 1, vmstate_cpr_fd, CprFd, next),
+ VMSTATE_END_OF_LIST()
+ }
+};
+/*************************************************************************/
+
+static QEMUFile *cpr_state_file;
+
+QIOChannel *cpr_state_ioc(void)
+{
+ return qemu_file_get_ioc(cpr_state_file);
+}
+
+int cpr_state_save(MigrationChannel *channel, Error **errp)
+{
+ int ret;
+ QEMUFile *f;
+ MigMode mode = migrate_mode();
+
+ trace_cpr_state_save(MigMode_str(mode));
+
+ /* set f based on mode in a later patch in this series */
+ return 0;
+
+ qemu_put_be32(f, QEMU_CPR_FILE_MAGIC);
+ qemu_put_be32(f, QEMU_CPR_FILE_VERSION);
+
+ ret = vmstate_save_state(f, &vmstate_cpr_state, &cpr_state, 0);
+ if (ret) {
+ error_setg(errp, "vmstate_save_state error %d", ret);
+ qemu_fclose(f);
+ return ret;
+ }
+
+ /*
+ * Close the socket only partially so we can later detect when the other
+ * end closes by getting a HUP event.
+ */
+ qemu_fflush(f);
+ qio_channel_shutdown(qemu_file_get_ioc(f), QIO_CHANNEL_SHUTDOWN_WRITE,
+ NULL);
+ cpr_state_file = f;
+ return 0;
+}
+
+int cpr_state_load(MigrationChannel *channel, Error **errp)
+{
+ int ret;
+ uint32_t v;
+ QEMUFile *f;
+ MigMode mode = 0;
+
+ /* set f and mode based on other parameters later in this patch series */
+ return 0;
+
+ trace_cpr_state_load(MigMode_str(mode));
+
+ v = qemu_get_be32(f);
+ if (v != QEMU_CPR_FILE_MAGIC) {
+ error_setg(errp, "Not a migration stream (bad magic %x)", v);
+ qemu_fclose(f);
+ return -EINVAL;
+ }
+ v = qemu_get_be32(f);
+ if (v != QEMU_CPR_FILE_VERSION) {
+ error_setg(errp, "Unsupported migration stream version %d", v);
+ qemu_fclose(f);
+ return -ENOTSUP;
+ }
+
+ ret = vmstate_load_state(f, &vmstate_cpr_state, &cpr_state, 1);
+ if (ret) {
+ error_setg(errp, "vmstate_load_state error %d", ret);
+ qemu_fclose(f);
+ return ret;
+ }
+
+ /*
+ * Let the caller decide when to close the socket (and generate a HUP event
+ * for the sending side).
+ */
+ cpr_state_file = f;
+
+ return ret;
+}
+
+void cpr_state_close(void)
+{
+ if (cpr_state_file) {
+ qemu_fclose(cpr_state_file);
+ cpr_state_file = NULL;
+ }
+}
diff --git a/migration/meson.build b/migration/meson.build
index dac687ee3a..1eb8c96d23 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -14,6 +14,7 @@ system_ss.add(files(
'block-active.c',
'channel.c',
'channel-block.c',
+ 'cpr.c',
'cpu-throttle.c',
'dirtyrate.c',
'exec.c',
diff --git a/migration/migration.c b/migration/migration.c
index 2d1da917c7..fce7b22ae8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -27,6 +27,7 @@
#include "system/cpu-throttle.h"
#include "rdma.h"
#include "ram.h"
+#include "migration/cpr.h"
#include "migration/global_state.h"
#include "migration/misc.h"
#include "migration.h"
diff --git a/migration/trace-events b/migration/trace-events
index b82a1c5e40..4e3061bc55 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -342,6 +342,13 @@ colo_receive_message(const char *msg) "Receive '%s' message"
# colo-failover.c
colo_failover_set_state(const char *new_state) "new state %s"
+# cpr.c
+cpr_save_fd(const char *name, int id, int fd) "%s, id %d, fd %d"
+cpr_delete_fd(const char *name, int id) "%s, id %d"
+cpr_find_fd(const char *name, int id, int fd) "%s, id %d returns %d"
+cpr_state_save(const char *mode) "%s mode"
+cpr_state_load(const char *mode) "%s mode"
+
# block-dirty-bitmap.c
send_bitmap_header_enter(void) ""
send_bitmap_bits(uint32_t flags, uint64_t start_sector, uint32_t nr_sectors, uint64_t data_size) "flags: 0x%x, start_sector: %" PRIu64 ", nr_sectors: %" PRIu32 ", data_size: %" PRIu64
--
2.35.3
* [PULL 09/42] physmem: preserve ram blocks for cpr
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (7 preceding siblings ...)
2025-01-29 16:00 ` [PULL 08/42] migration: cpr-state Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 10/42] hostmem-memfd: preserve " Fabiano Rosas
` (33 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Save the memfd for ramblocks in CPR state, along with a name that
uniquely identifies it. The block's idstr is not yet set, so it
cannot be used for this purpose. Find the saved memfd in the new QEMU when
creating a block. If the size of a resizable block is larger in the new QEMU,
extend it via the file_ram_alloc truncate parameter; the extra space
will be usable after a guest reset.
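The two ideas in this message, a stable name per block and a find-or-create lookup, can be sketched without any QEMU types. Everything here is illustrative: compose_block_name, get_shared_fd and demo_create_fd are hypothetical names, and demo_create_fd merely stands in for a real memfd or POSIX shm allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<dev_path>/<mr_name>" when the region belongs to a device,
 * else just "<mr_name>". Returns what snprintf returns.
 */
static int compose_block_name(char *buf, size_t len,
                              const char *dev_path, const char *mr_name)
{
    if (dev_path) {
        return snprintf(buf, len, "%s/%s", dev_path, mr_name);
    }
    return snprintf(buf, len, "%s", mr_name);
}

typedef int (*fd_create_fn)(void);

/* Stand-in for a real allocation; returns a fixed fake fd number. */
static int demo_create_fd(void)
{
    return 42;
}

/*
 * Reuse saved_fd if an earlier process left one (saved_fd >= 0),
 * otherwise create a new fd. *reused tells the caller whether the
 * backing file may need to be grown to the new maximum size.
 */
static int get_shared_fd(int saved_fd, fd_create_fn create, bool *reused)
{
    assert(reused != NULL);
    if (saved_fd >= 0) {
        *reused = true;
        return saved_fd;
    }
    *reused = false;
    return create();
}
```

Prefixing the region name with the device path is what keeps two devices of the same type, each with a region of the same name, from colliding in the saved state.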
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
system/physmem.c | 44 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 39 insertions(+), 5 deletions(-)
diff --git a/system/physmem.c b/system/physmem.c
index cb80ce3091..67c9db9daa 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -70,6 +70,7 @@
#include "qemu/pmem.h"
+#include "migration/cpr.h"
#include "migration/vmstate.h"
#include "qemu/range.h"
@@ -1661,6 +1662,18 @@ void qemu_ram_unset_idstr(RAMBlock *block)
}
}
+static char *cpr_name(MemoryRegion *mr)
+{
+ const char *mr_name = memory_region_name(mr);
+ g_autofree char *id = mr->dev ? qdev_get_dev_path(mr->dev) : NULL;
+
+ if (id) {
+ return g_strdup_printf("%s/%s", id, mr_name);
+ } else {
+ return g_strdup(mr_name);
+ }
+}
+
size_t qemu_ram_pagesize(RAMBlock *rb)
{
return rb->page_size;
@@ -2080,15 +2093,25 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
* shared with another process if CPR is being used. Use memfd if available
* because it has no size limits, else use POSIX shm.
*/
-static int qemu_ram_get_shared_fd(const char *name, Error **errp)
+static int qemu_ram_get_shared_fd(const char *name, bool *reused, Error **errp)
{
- int fd;
+ int fd = cpr_find_fd(name, 0);
+
+ if (fd >= 0) {
+ *reused = true;
+ return fd;
+ }
if (qemu_memfd_check(0)) {
fd = qemu_memfd_create(name, 0, 0, 0, 0, errp);
} else {
fd = qemu_shm_alloc(0, errp);
}
+
+ if (fd >= 0) {
+ cpr_save_fd(name, 0, fd);
+ }
+ *reused = false;
return fd;
}
#endif
@@ -2118,8 +2141,9 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
ram_flags |= RAM_SHARED;
}
if (ram_flags & RAM_SHARED) {
- const char *name = memory_region_name(mr);
- int fd = qemu_ram_get_shared_fd(name, errp);
+ bool reused;
+ g_autofree char *name = cpr_name(mr);
+ int fd = qemu_ram_get_shared_fd(name, &reused, errp);
if (fd < 0) {
return NULL;
@@ -2133,9 +2157,14 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
* fd is not supported, but previous QEMU versions that called
* qemu_anon_ram_alloc for anonymous shared memory could have
* succeeded. Quietly fail and fall back.
+ *
+ * After cpr-transfer, new QEMU could create a memory region
+ * with a larger max size than old, so pass reused to grow the
+ * region if necessary. The extra space will be usable after a
+ * guest reset.
*/
new_block = qemu_ram_alloc_from_fd(size, max_size, resized, mr,
- ram_flags, fd, 0, false, NULL);
+ ram_flags, fd, 0, reused, NULL);
if (new_block) {
trace_qemu_ram_alloc_shared(name, new_block->used_length,
new_block->max_length, fd,
@@ -2143,6 +2172,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
return new_block;
}
+ cpr_delete_fd(name, 0);
close(fd);
/* fall back to anon allocation */
}
@@ -2221,6 +2251,8 @@ static void reclaim_ramblock(RAMBlock *block)
void qemu_ram_free(RAMBlock *block)
{
+ g_autofree char *name = NULL;
+
if (!block) {
return;
}
@@ -2231,6 +2263,8 @@ void qemu_ram_free(RAMBlock *block)
}
qemu_mutex_lock_ramlist();
+ name = cpr_name(block->mr);
+ cpr_delete_fd(name, 0);
QLIST_REMOVE_RCU(block, next);
ram_list.mru_block = NULL;
/* Write list before version */
--
2.35.3
* [PULL 10/42] hostmem-memfd: preserve for cpr
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (8 preceding siblings ...)
2025-01-29 16:00 ` [PULL 09/42] physmem: preserve ram blocks for cpr Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 11/42] hostmem-shm: " Fabiano Rosas
` (32 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Preserve memory-backend-memfd memory objects during cpr-transfer.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-10-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
backends/hostmem-memfd.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index 1672da9e30..85daa1432c 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -17,6 +17,7 @@
#include "qemu/module.h"
#include "qapi/error.h"
#include "qom/object.h"
+#include "migration/cpr.h"
OBJECT_DECLARE_SIMPLE_TYPE(HostMemoryBackendMemfd, MEMORY_BACKEND_MEMFD)
@@ -33,15 +34,19 @@ static bool
memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
{
HostMemoryBackendMemfd *m = MEMORY_BACKEND_MEMFD(backend);
- g_autofree char *name = NULL;
+ g_autofree char *name = host_memory_backend_get_name(backend);
+ int fd = cpr_find_fd(name, 0);
uint32_t ram_flags;
- int fd;
if (!backend->size) {
error_setg(errp, "can't create backend with size 0");
return false;
}
+ if (fd >= 0) {
+ goto have_fd;
+ }
+
fd = qemu_memfd_create(TYPE_MEMORY_BACKEND_MEMFD, backend->size,
m->hugetlb, m->hugetlbsize, m->seal ?
F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL : 0,
@@ -49,9 +54,10 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
if (fd == -1) {
return false;
}
+ cpr_save_fd(name, 0, fd);
+have_fd:
backend->aligned = true;
- name = host_memory_backend_get_name(backend);
ram_flags = backend->share ? RAM_SHARED : RAM_PRIVATE;
ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
--
2.35.3
* [PULL 11/42] hostmem-shm: preserve for cpr
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (9 preceding siblings ...)
2025-01-29 16:00 ` [PULL 10/42] hostmem-memfd: preserve " Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 12/42] migration: enhance migrate_uri_parse Fabiano Rosas
` (31 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Preserve memory-backend-shm memory objects during cpr-transfer.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
backends/hostmem-shm.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/backends/hostmem-shm.c b/backends/hostmem-shm.c
index fabee41f2c..f67ad2740b 100644
--- a/backends/hostmem-shm.c
+++ b/backends/hostmem-shm.c
@@ -13,6 +13,7 @@
#include "qemu/osdep.h"
#include "system/hostmem.h"
#include "qapi/error.h"
+#include "migration/cpr.h"
#define TYPE_MEMORY_BACKEND_SHM "memory-backend-shm"
@@ -25,9 +26,9 @@ struct HostMemoryBackendShm {
static bool
shm_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
{
- g_autofree char *backend_name = NULL;
+ g_autofree char *backend_name = host_memory_backend_get_name(backend);
uint32_t ram_flags;
- int fd;
+ int fd = cpr_find_fd(backend_name, 0);
if (!backend->size) {
error_setg(errp, "can't create shm backend with size 0");
@@ -39,13 +40,18 @@ shm_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
return false;
}
+ if (fd >= 0) {
+ goto have_fd;
+ }
+
fd = qemu_shm_alloc(backend->size, errp);
if (fd < 0) {
return false;
}
+ cpr_save_fd(backend_name, 0, fd);
+have_fd:
/* Let's do the same as memory-backend-ram,share=on would do. */
- backend_name = host_memory_backend_get_name(backend);
ram_flags = RAM_SHARED;
ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
--
2.35.3
* [PULL 12/42] migration: enhance migrate_uri_parse
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (10 preceding siblings ...)
2025-01-29 16:00 ` [PULL 11/42] hostmem-shm: " Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 13/42] migration: incoming channel Fabiano Rosas
` (30 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Export migrate_uri_parse for use outside migration internals, and define
a method migrate_is_uri that indicates when migrate_uri_parse should
be used.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-12-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
include/migration/misc.h | 7 +++++++
migration/migration.c | 11 +++++++++++
migration/migration.h | 2 --
3 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 67f7ef7a0e..c660be8095 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -108,4 +108,11 @@ bool migration_in_bg_snapshot(void);
bool migration_block_activate(Error **errp);
bool migration_block_inactivate(void);
+/* True if @uri starts with a syntactically valid URI prefix */
+bool migrate_is_uri(const char *uri);
+
+/* Parse @uri and return @channel, returning true on success */
+bool migrate_uri_parse(const char *uri, MigrationChannel **channel,
+ Error **errp);
+
#endif
diff --git a/migration/migration.c b/migration/migration.c
index fce7b22ae8..b5ee98e691 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -14,6 +14,7 @@
*/
#include "qemu/osdep.h"
+#include "qemu/ctype.h"
#include "qemu/cutils.h"
#include "qemu/error-report.h"
#include "qemu/main-loop.h"
@@ -587,6 +588,16 @@ void migrate_add_address(SocketAddress *address)
QAPI_CLONE(SocketAddress, address));
}
+bool migrate_is_uri(const char *uri)
+{
+ while (*uri && *uri != ':') {
+ if (!qemu_isalpha(*uri++)) {
+ return false;
+ }
+ }
+ return *uri == ':';
+}
+
bool migrate_uri_parse(const char *uri, MigrationChannel **channel,
Error **errp)
{
diff --git a/migration/migration.h b/migration/migration.h
index 0df2a187af..1d4d4e910d 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -519,8 +519,6 @@ bool check_dirty_bitmap_mig_alias_map(const BitmapMigrationNodeAliasList *bbm,
Error **errp);
void migrate_add_address(SocketAddress *address);
-bool migrate_uri_parse(const char *uri, MigrationChannel **channel,
- Error **errp);
int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
#define qemu_ram_foreach_block \
--
2.35.3
* [PULL 13/42] migration: incoming channel
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (11 preceding siblings ...)
2025-01-29 16:00 ` [PULL 12/42] migration: enhance migrate_uri_parse Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 14/42] migration: SCM_RIGHTS for QEMUFile Fabiano Rosas
` (29 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Extend the -incoming option to allow a @MigrationChannel to be specified.
This allows channels other than 'main' to be described on the command
line, which will be needed for CPR.
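The per-type bookkeeping that the qmp_migrate hunk introduces (one slot per channel type, duplicates rejected, a main entry required) can be sketched as follows. The types ChanType and Chan and the function pick_main_addr are hypothetical simplifications, and the CHAN_CPR type is assumed only to have a second channel type to show. The sketch checks the main slot itself before reading its address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the QAPI channel types. */
typedef enum { CHAN_MAIN, CHAN_CPR, CHAN__MAX } ChanType;

typedef struct Chan {
    ChanType type;
    const char *addr;
    struct Chan *next;
} Chan;

/*
 * Index the list by channel type. Returns the main channel's address,
 * or NULL with *err set to a message if a type appears twice or no
 * main channel is present.
 */
static const char *pick_main_addr(const Chan *list, const char **err)
{
    const Chan *by_type[CHAN__MAX] = { NULL };

    assert(err != NULL);
    for (; list; list = list->next) {
        if (by_type[list->type]) {
            *err = "channel list has more than one entry of a type";
            return NULL;
        }
        by_type[list->type] = list;
    }
    if (!by_type[CHAN_MAIN]) {
        *err = "channel list has no main entry";
        return NULL;
    }
    *err = NULL;
    return by_type[CHAN_MAIN]->addr;
}
```

Indexing by type makes the order of the channels irrelevant, which matches the description of -incoming being usable several times, once per channel type.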
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-13-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 21 ++++++++++++++++-----
qemu-options.hx | 21 +++++++++++++++++++++
system/vl.c | 36 +++++++++++++++++++++++++++++++++---
3 files changed, 70 insertions(+), 8 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index b5ee98e691..5f2540fac3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -695,7 +695,8 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
if (channels) {
/* To verify that Migrate channel list has only item */
if (channels->next) {
- error_setg(errp, "Channel list has more than one entries");
+ error_setg(errp, "Channel list must have only one entry, "
+ "for type 'main'");
return;
}
addr = channels->value->addr;
@@ -2054,6 +2055,7 @@ void qmp_migrate(const char *uri, bool has_channels,
MigrationState *s = migrate_get_current();
g_autoptr(MigrationChannel) channel = NULL;
MigrationAddress *addr = NULL;
+ MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL };
/*
* Having preliminary checks for uri and channel
@@ -2064,12 +2066,21 @@ void qmp_migrate(const char *uri, bool has_channels,
}
if (channels) {
- /* To verify that Migrate channel list has only item */
- if (channels->next) {
- error_setg(errp, "Channel list has more than one entries");
+ for ( ; channels; channels = channels->next) {
+ MigrationChannelType type = channels->value->channel_type;
+
+ if (channelv[type]) {
+ error_setg(errp, "Channel list has more than one %s entry",
+ MigrationChannelType_str(type));
+ return;
+ }
+ channelv[type] = channels->value;
+ }
+ addr = channelv[MIGRATION_CHANNEL_TYPE_MAIN]->addr;
+ if (!addr) {
+ error_setg(errp, "Channel list has no main entry");
return;
}
- addr = channels->value->addr;
}
if (uri) {
diff --git a/qemu-options.hx b/qemu-options.hx
index 90fad31590..3d1af7325b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4940,10 +4940,18 @@ DEF("incoming", HAS_ARG, QEMU_OPTION_incoming, \
"-incoming exec:cmdline\n" \
" accept incoming migration on given file descriptor\n" \
" or from given external command\n" \
+ "-incoming <channel>\n" \
+ " accept incoming migration on the migration channel\n" \
"-incoming defer\n" \
" wait for the URI to be specified via migrate_incoming\n",
QEMU_ARCH_ALL)
SRST
+The -incoming option specifies the migration channel for an incoming
+migration. It may be used multiple times to specify multiple
+migration channel types. The channel type is specified in <channel>,
+or is 'main' for all other forms of -incoming. If multiple -incoming
+options are specified for a channel type, the last one takes precedence.
+
``-incoming tcp:[host]:port[,to=maxport][,ipv4=on|off][,ipv6=on|off]``
\
``-incoming rdma:host:port[,ipv4=on|off][,ipv6=on|off]``
@@ -4963,6 +4971,19 @@ SRST
Accept incoming migration as an output from specified external
command.
+``-incoming <channel>``
+ Accept incoming migration on the migration channel. For the syntax
+ of <channel>, see the QAPI documentation of ``MigrationChannel``.
+ Examples:
+ ::
+
+ -incoming '{"channel-type": "main",
+ "addr": { "transport": "socket",
+ "type": "unix",
+ "path": "my.sock" }}'
+
+ -incoming main,addr.transport=socket,addr.type=unix,addr.path=my.sock
+
``-incoming defer``
Wait for the URI to be specified via migrate\_incoming. The monitor
can be used to change settings (such as migration parameters) prior
diff --git a/system/vl.c b/system/vl.c
index c567826718..504f05b954 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -123,6 +123,7 @@
#include "qapi/qapi-visit-block-core.h"
#include "qapi/qapi-visit-compat.h"
#include "qapi/qapi-visit-machine.h"
+#include "qapi/qapi-visit-migration.h"
#include "qapi/qapi-visit-ui.h"
#include "qapi/qapi-commands-block-core.h"
#include "qapi/qapi-commands-migration.h"
@@ -159,6 +160,8 @@ typedef struct DeviceOption {
static const char *cpu_option;
static const char *mem_path;
static const char *incoming;
+static const char *incoming_str[MIGRATION_CHANNEL_TYPE__MAX];
+static MigrationChannel *incoming_channels[MIGRATION_CHANNEL_TYPE__MAX];
static const char *loadvm;
static const char *accelerators;
static bool have_custom_ram_size;
@@ -1813,6 +1816,30 @@ static void object_option_add_visitor(Visitor *v)
QTAILQ_INSERT_TAIL(&object_opts, opt, next);
}
+static void incoming_option_parse(const char *str)
+{
+ MigrationChannelType type = MIGRATION_CHANNEL_TYPE_MAIN;
+ MigrationChannel *channel;
+ Visitor *v;
+
+ if (!strcmp(str, "defer")) {
+ channel = NULL;
+ } else if (migrate_is_uri(str)) {
+ migrate_uri_parse(str, &channel, &error_fatal);
+ } else {
+ v = qobject_input_visitor_new_str(str, "channel-type", &error_fatal);
+ visit_type_MigrationChannel(v, NULL, &channel, &error_fatal);
+ visit_free(v);
+ type = channel->channel_type;
+ }
+
+ /* New incoming spec replaces the previous */
+ qapi_free_MigrationChannel(incoming_channels[type]);
+ incoming_channels[type] = channel;
+ incoming_str[type] = str;
+ incoming = incoming_str[MIGRATION_CHANNEL_TYPE_MAIN];
+}
+
static void object_option_parse(const char *str)
{
QemuOpts *opts;
@@ -2738,8 +2765,11 @@ void qmp_x_exit_preconfig(Error **errp)
if (incoming) {
Error *local_err = NULL;
if (strcmp(incoming, "defer") != 0) {
- qmp_migrate_incoming(incoming, false, NULL, true, true,
- &local_err);
+ g_autofree MigrationChannelList *channels =
+ g_new0(MigrationChannelList, 1);
+
+ channels->value = incoming_channels[MIGRATION_CHANNEL_TYPE_MAIN];
+ qmp_migrate_incoming(NULL, true, channels, true, true, &local_err);
if (local_err) {
error_reportf_err(local_err, "-incoming %s: ", incoming);
exit(1);
@@ -3458,7 +3488,7 @@ void qemu_init(int argc, char **argv)
if (!incoming) {
runstate_set(RUN_STATE_INMIGRATE);
}
- incoming = optarg;
+ incoming_option_parse(optarg);
break;
case QEMU_OPTION_only_migratable:
only_migratable = 1;
--
2.35.3
* [PULL 14/42] migration: SCM_RIGHTS for QEMUFile
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (12 preceding siblings ...)
2025-01-29 16:00 ` [PULL 13/42] migration: incoming channel Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 15/42] migration: VMSTATE_FD Fabiano Rosas
` (28 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Define functions to put/get file descriptors to/from a QEMUFile, for qio
channels that support SCM_RIGHTS. Maintain ordering such that
    put(A), put(fd), put(B)
followed by
    get(A), get(fd), get(B)
always succeeds. Other get orderings may succeed but are not guaranteed.
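
The underlying mechanism can be sketched outside QEMU with plain sockets. This is an illustration under stated assumptions (Linux, an AF_UNIX stream socket), not the code in this patch: `send_fd` and `recv_fd` are hypothetical helper names, and the single dummy byte mirrors the patch's approach of attaching the descriptor to one byte of payload so that the receiver never sees a zero-length read.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
typedef union {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
} FdControl;

/* Send one dummy byte with `fd` attached as SCM_RIGHTS ancillary data. */
int send_fd(int sock, int fd)
{
    char dummy = ' ';
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    FdControl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Consume the dummy byte; return the attached descriptor, or -1 if none. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    FdControl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    }
    return fd;
}
```

A caller that writes one byte A, calls `send_fd()`, then writes one byte B can read exactly one byte, call `recv_fd()`, and read one more byte on the other end, which corresponds to the put/get ordering described above.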
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-14-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/qemu-file.c | 84 ++++++++++++++++++++++++++++++++++++++++--
migration/qemu-file.h | 2 +
migration/trace-events | 2 +
3 files changed, 84 insertions(+), 4 deletions(-)
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index b6d2f588bd..1303a5bf58 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -37,6 +37,11 @@
#define IO_BUF_SIZE 32768
#define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
+typedef struct FdEntry {
+ QTAILQ_ENTRY(FdEntry) entry;
+ int fd;
+} FdEntry;
+
struct QEMUFile {
QIOChannel *ioc;
bool is_writable;
@@ -51,6 +56,9 @@ struct QEMUFile {
int last_error;
Error *last_error_obj;
+
+ bool can_pass_fd;
+ QTAILQ_HEAD(, FdEntry) fds;
};
/*
@@ -109,6 +117,8 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
object_ref(ioc);
f->ioc = ioc;
f->is_writable = is_writable;
+ f->can_pass_fd = qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS);
+ QTAILQ_INIT(&f->fds);
return f;
}
@@ -310,6 +320,10 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
int len;
int pending;
Error *local_error = NULL;
+ g_autofree int *fds = NULL;
+ size_t nfd = 0;
+ int **pfds = f->can_pass_fd ? &fds : NULL;
+ size_t *pnfd = f->can_pass_fd ? &nfd : NULL;
assert(!qemu_file_is_writable(f));
@@ -325,10 +339,9 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
}
do {
- len = qio_channel_read(f->ioc,
- (char *)f->buf + pending,
- IO_BUF_SIZE - pending,
- &local_error);
+ struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
+ len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
+ &local_error);
if (len == QIO_CHANNEL_ERR_BLOCK) {
if (qemu_in_coroutine()) {
qio_channel_yield(f->ioc, G_IO_IN);
@@ -348,9 +361,66 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
qemu_file_set_error_obj(f, len, local_error);
}
+ for (int i = 0; i < nfd; i++) {
+ FdEntry *fde = g_new0(FdEntry, 1);
+ fde->fd = fds[i];
+ QTAILQ_INSERT_TAIL(&f->fds, fde, entry);
+ }
+
return len;
}
+int qemu_file_put_fd(QEMUFile *f, int fd)
+{
+ int ret = 0;
+ QIOChannel *ioc = qemu_file_get_ioc(f);
+ Error *err = NULL;
+ struct iovec iov = { (void *)" ", 1 };
+
+ /*
+ * Send a dummy byte so qemu_fill_buffer on the receiving side does not
+ * fail with a len=0 error. Flush first to maintain ordering wrt other
+ * data.
+ */
+
+ qemu_fflush(f);
+ if (qio_channel_writev_full(ioc, &iov, 1, &fd, 1, 0, &err) < 1) {
+ error_report_err(error_copy(err));
+ qemu_file_set_error_obj(f, -EIO, err);
+ ret = -1;
+ }
+ trace_qemu_file_put_fd(f->ioc->name, fd, ret);
+ return ret;
+}
+
+int qemu_file_get_fd(QEMUFile *f)
+{
+ int fd = -1;
+ FdEntry *fde;
+
+ if (!f->can_pass_fd) {
+ Error *err = NULL;
+ error_setg(&err, "%s does not support fd passing", f->ioc->name);
+ error_report_err(error_copy(err));
+ qemu_file_set_error_obj(f, -EIO, err);
+ goto out;
+ }
+
+ /* Force the dummy byte and its fd passenger to appear. */
+ qemu_peek_byte(f, 0);
+
+ fde = QTAILQ_FIRST(&f->fds);
+ if (fde) {
+ qemu_get_byte(f); /* Drop the dummy byte */
+ fd = fde->fd;
+ QTAILQ_REMOVE(&f->fds, fde, entry);
+ g_free(fde);
+ }
+out:
+ trace_qemu_file_get_fd(f->ioc->name, fd);
+ return fd;
+}
+
/** Closes the file
*
* Returns negative error value if any error happened on previous operations or
@@ -361,11 +431,17 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
*/
int qemu_fclose(QEMUFile *f)
{
+ FdEntry *fde, *next;
int ret = qemu_fflush(f);
int ret2 = qio_channel_close(f->ioc, NULL);
if (ret >= 0) {
ret = ret2;
}
+ QTAILQ_FOREACH_SAFE(fde, &f->fds, entry, next) {
+ warn_report("qemu_fclose: received fd %d was never claimed", fde->fd);
+ close(fde->fd);
+ g_free(fde);
+ }
g_clear_pointer(&f->ioc, object_unref);
error_free(f->last_error_obj);
g_free(f);
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 11c2120edd..3e47a20621 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -79,5 +79,7 @@ size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
off_t pos);
QIOChannel *qemu_file_get_ioc(QEMUFile *file);
+int qemu_file_put_fd(QEMUFile *f, int fd);
+int qemu_file_get_fd(QEMUFile *f);
#endif
diff --git a/migration/trace-events b/migration/trace-events
index 4e3061bc55..abd9cdf2a1 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -88,6 +88,8 @@ put_qlist_end(const char *field_name, const char *vmsd_name) "%s(%s)"
# qemu-file.c
qemu_file_fclose(void) ""
+qemu_file_put_fd(const char *name, int fd, int ret) "ioc %s, fd %d -> status %d"
+qemu_file_get_fd(const char *name, int fd) "ioc %s -> fd %d"
# ram.c
get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
--
2.35.3
* [PULL 15/42] migration: VMSTATE_FD
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (13 preceding siblings ...)
2025-01-29 16:00 ` [PULL 14/42] migration: SCM_RIGHTS for QEMUFile Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 16/42] migration: cpr-transfer save and load Fabiano Rosas
` (27 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Define VMSTATE_FD for declaring a file descriptor field in a
VMStateDescription.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-15-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
include/migration/vmstate.h | 9 +++++++++
migration/vmstate-types.c | 23 +++++++++++++++++++++++
2 files changed, 32 insertions(+)
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index f313f2f408..a1dfab4460 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -230,6 +230,7 @@ extern const VMStateInfo vmstate_info_uint8;
extern const VMStateInfo vmstate_info_uint16;
extern const VMStateInfo vmstate_info_uint32;
extern const VMStateInfo vmstate_info_uint64;
+extern const VMStateInfo vmstate_info_fd;
/** Put this in the stream when migrating a null pointer.*/
#define VMS_NULLPTR_MARKER (0x30U) /* '0' */
@@ -902,6 +903,9 @@ extern const VMStateInfo vmstate_info_qlist;
#define VMSTATE_UINT64_V(_f, _s, _v) \
VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint64, uint64_t)
+#define VMSTATE_FD_V(_f, _s, _v) \
+ VMSTATE_SINGLE(_f, _s, _v, vmstate_info_fd, int32_t)
+
#ifdef CONFIG_LINUX
#define VMSTATE_U8_V(_f, _s, _v) \
@@ -936,6 +940,9 @@ extern const VMStateInfo vmstate_info_qlist;
#define VMSTATE_UINT64(_f, _s) \
VMSTATE_UINT64_V(_f, _s, 0)
+#define VMSTATE_FD(_f, _s) \
+ VMSTATE_FD_V(_f, _s, 0)
+
#ifdef CONFIG_LINUX
#define VMSTATE_U8(_f, _s) \
@@ -1009,6 +1016,8 @@ extern const VMStateInfo vmstate_info_qlist;
#define VMSTATE_UINT64_TEST(_f, _s, _t) \
VMSTATE_SINGLE_TEST(_f, _s, _t, 0, vmstate_info_uint64, uint64_t)
+#define VMSTATE_FD_TEST(_f, _s, _t) \
+ VMSTATE_SINGLE_TEST(_f, _s, _t, 0, vmstate_info_fd, int32_t)
#define VMSTATE_TIMER_PTR_TEST(_f, _s, _test) \
VMSTATE_POINTER_TEST(_f, _s, _test, vmstate_info_timer, QEMUTimer *)
diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
index d70d573dbd..0319c3568b 100644
--- a/migration/vmstate-types.c
+++ b/migration/vmstate-types.c
@@ -314,6 +314,29 @@ const VMStateInfo vmstate_info_uint64 = {
.put = put_uint64,
};
+/* File descriptor communicated via SCM_RIGHTS */
+
+static int get_fd(QEMUFile *f, void *pv, size_t size,
+ const VMStateField *field)
+{
+ int32_t *v = pv;
+ *v = qemu_file_get_fd(f);
+ return 0;
+}
+
+static int put_fd(QEMUFile *f, void *pv, size_t size,
+ const VMStateField *field, JSONWriter *vmdesc)
+{
+ int32_t *v = pv;
+ return qemu_file_put_fd(f, *v);
+}
+
+const VMStateInfo vmstate_info_fd = {
+ .name = "fd",
+ .get = get_fd,
+ .put = put_fd,
+};
+
static int get_nullptr(QEMUFile *f, void *pv, size_t size,
const VMStateField *field)
--
2.35.3
* [PULL 16/42] migration: cpr-transfer save and load
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (14 preceding siblings ...)
2025-01-29 16:00 ` [PULL 15/42] migration: VMSTATE_FD Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 17/42] migration: cpr-transfer mode Fabiano Rosas
` (26 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Add functions to create a QEMUFile based on a unix URI, for saving or
loading, for use by cpr-transfer mode to preserve CPR state.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-16-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
include/migration/cpr.h | 3 ++
migration/cpr-transfer.c | 71 ++++++++++++++++++++++++++++++++++++++++
migration/meson.build | 1 +
migration/trace-events | 2 ++
4 files changed, 77 insertions(+)
create mode 100644 migration/cpr-transfer.c
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index d9364f7d1f..c669b8b8a3 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -22,4 +22,7 @@ int cpr_state_load(MigrationChannel *channel, Error **errp);
void cpr_state_close(void);
struct QIOChannel *cpr_state_ioc(void);
+QEMUFile *cpr_transfer_output(MigrationChannel *channel, Error **errp);
+QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp);
+
#endif
diff --git a/migration/cpr-transfer.c b/migration/cpr-transfer.c
new file mode 100644
index 0000000000..e1f140359c
--- /dev/null
+++ b/migration/cpr-transfer.c
@@ -0,0 +1,71 @@
+/*
+ * Copyright (c) 2022, 2024 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "io/channel-file.h"
+#include "io/channel-socket.h"
+#include "io/net-listener.h"
+#include "migration/cpr.h"
+#include "migration/migration.h"
+#include "migration/savevm.h"
+#include "migration/qemu-file.h"
+#include "migration/vmstate.h"
+#include "trace.h"
+
+QEMUFile *cpr_transfer_output(MigrationChannel *channel, Error **errp)
+{
+ MigrationAddress *addr = channel->addr;
+
+ if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
+ addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX) {
+
+ g_autoptr(QIOChannelSocket) sioc = qio_channel_socket_new();
+ QIOChannel *ioc = QIO_CHANNEL(sioc);
+ SocketAddress *saddr = &addr->u.socket;
+
+ if (qio_channel_socket_connect_sync(sioc, saddr, errp) < 0) {
+ return NULL;
+ }
+ trace_cpr_transfer_output(addr->u.socket.u.q_unix.path);
+ qio_channel_set_name(ioc, "cpr-out");
+ return qemu_file_new_output(ioc);
+
+ } else {
+ error_setg(errp, "bad cpr channel address; must be unix");
+ return NULL;
+ }
+}
+
+QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp)
+{
+ MigrationAddress *addr = channel->addr;
+
+ if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
+ addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX) {
+
+ g_autoptr(QIOChannelSocket) sioc = NULL;
+ SocketAddress *saddr = &addr->u.socket;
+ g_autoptr(QIONetListener) listener = qio_net_listener_new();
+ QIOChannel *ioc;
+
+ qio_net_listener_set_name(listener, "cpr-socket-listener");
+ if (qio_net_listener_open_sync(listener, saddr, 1, errp) < 0) {
+ return NULL;
+ }
+
+ sioc = qio_net_listener_wait_client(listener);
+ ioc = QIO_CHANNEL(sioc);
+ trace_cpr_transfer_input(addr->u.socket.u.q_unix.path);
+ qio_channel_set_name(ioc, "cpr-in");
+ return qemu_file_new_input(ioc);
+
+ } else {
+ error_setg(errp, "bad cpr channel socket type; must be unix");
+ return NULL;
+ }
+}
diff --git a/migration/meson.build b/migration/meson.build
index 1eb8c96d23..d3bfe84d62 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -15,6 +15,7 @@ system_ss.add(files(
'channel.c',
'channel-block.c',
'cpr.c',
+ 'cpr-transfer.c',
'cpu-throttle.c',
'dirtyrate.c',
'exec.c',
diff --git a/migration/trace-events b/migration/trace-events
index abd9cdf2a1..e03a914afb 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -350,6 +350,8 @@ cpr_delete_fd(const char *name, int id) "%s, id %d"
cpr_find_fd(const char *name, int id, int fd) "%s, id %d returns %d"
cpr_state_save(const char *mode) "%s mode"
cpr_state_load(const char *mode) "%s mode"
+cpr_transfer_input(const char *path) "%s"
+cpr_transfer_output(const char *path) "%s"
# block-dirty-bitmap.c
send_bitmap_header_enter(void) ""
--
2.35.3
* [PULL 17/42] migration: cpr-transfer mode
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (15 preceding siblings ...)
2025-01-29 16:00 ` [PULL 16/42] migration: cpr-transfer save and load Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-02-04 13:40 ` Peter Maydell
2025-01-29 16:00 ` [PULL 18/42] migration-test: memory_backend Fabiano Rosas
` (25 subsequent siblings)
42 siblings, 1 reply; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare, Markus Armbruster
From: Steve Sistare <steven.sistare@oracle.com>
Add the cpr-transfer migration mode, which allows the user to transfer
a guest to a new QEMU instance on the same host with minimal guest pause
time, by preserving guest RAM in place, albeit with new virtual addresses
in new QEMU, and by preserving device file descriptors. Pages that were
locked in memory for DMA in old QEMU remain locked in new QEMU, because the
descriptor of the device that locked them remains open.
cpr-transfer preserves memory and device descriptors by sending them to
new QEMU over a unix domain socket using SCM_RIGHTS. Such CPR state cannot
be sent over the normal migration channel, because devices and backends
are created prior to reading the channel, so this mode sends CPR state
over a second "cpr" migration channel. New QEMU reads the cpr channel
prior to creating devices or backends. The user specifies the cpr channel
in the channel arguments on the outgoing side, and in a second -incoming
command-line parameter on the incoming side.
The user must start old QEMU with the '-machine aux-ram-share=on' option,
which allows anonymous memory to be transferred in place to the new process
by transferring a memory descriptor for each ram block. Memory-backend
objects must have the share=on attribute; memory-backend-epc is not
supported.
The user starts new QEMU on the same host as old QEMU, with command-line
arguments to create the same machine, plus the -incoming option for the
main migration channel, like normal live migration. In addition, the user
adds a second -incoming option with channel type "cpr". This CPR channel
must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a
UNIX domain socket.
To initiate CPR, the user issues a migrate command to old QEMU, adding
a second migration channel of type "cpr" in the channels argument.
Old QEMU stops the VM, saves state to the migration channels, and enters
the postmigrate state. New QEMU mmaps the memory descriptors, and execution
resumes.
The implementation splits qmp_migrate into start and finish functions.
Start sends CPR state to new QEMU, which responds by closing the CPR
channel. Old QEMU detects the HUP then calls finish, which connects the
main migration channel.
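
The handshake relies on the sender noticing that its peer closed the socket. A rough standalone sketch of that detection, assuming Linux semantics for AF_UNIX stream sockets and using poll() rather than the GLib watch the patch installs; `wait_for_peer_close` is a hypothetical helper name:

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Wait until the peer of `fd` hangs up.
 * Returns 1 if the peer closed, 0 on timeout, -1 on poll() failure.
 */
int wait_for_peer_close(int fd, int timeout_ms)
{
    /* POLLHUP and POLLERR are reported even when no events are requested. */
    struct pollfd pfd = { .fd = fd, .events = 0 };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0) {
        return -1;
    }
    if (n == 0) {
        return 0;
    }
    return (pfd.revents & (POLLHUP | POLLERR)) ? 1 : 0;
}
```

Because a hangup looks the same whether the peer closed deliberately or died, the sender cannot tell success from failure at this point; it learns the outcome only when it then tries to connect the main channel.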
In summary, the usage is:
  Start old QEMU:
    qemu-system-$arch -machine aux-ram-share=on ...
  Start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"
  Issue commands to old QEMU:
    migrate_set_parameter mode cpr-transfer
    {"execute": "migrate", ...
        {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-17-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
include/migration/cpr.h | 5 ++
migration/cpr.c | 36 +++++++++++--
migration/migration.c | 106 +++++++++++++++++++++++++++++++++++++-
migration/migration.h | 2 +
migration/options.c | 8 ++-
migration/ram.c | 2 +
migration/vmstate-types.c | 1 +
qapi/migration.json | 44 +++++++++++++++-
qemu-options.hx | 2 +
stubs/vmstate.c | 7 +++
system/vl.c | 7 +++
11 files changed, 210 insertions(+), 10 deletions(-)
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index c669b8b8a3..3a6deb7933 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -10,6 +10,8 @@
#include "qapi/qapi-types-migration.h"
+#define MIG_MODE_NONE -1
+
#define QEMU_CPR_FILE_MAGIC 0x51435052
#define QEMU_CPR_FILE_VERSION 0x00000001
@@ -17,6 +19,9 @@ void cpr_save_fd(const char *name, int id, int fd);
void cpr_delete_fd(const char *name, int id);
int cpr_find_fd(const char *name, int id);
+MigMode cpr_get_incoming_mode(void);
+void cpr_set_incoming_mode(MigMode mode);
+
int cpr_state_save(MigrationChannel *channel, Error **errp);
int cpr_state_load(MigrationChannel *channel, Error **errp);
void cpr_state_close(void);
diff --git a/migration/cpr.c b/migration/cpr.c
index 87bcfdb5ff..584b0b98f7 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -45,7 +45,7 @@ static const VMStateDescription vmstate_cpr_fd = {
VMSTATE_UINT32(namelen, CprFd),
VMSTATE_VBUFFER_ALLOC_UINT32(name, CprFd, 0, NULL, namelen),
VMSTATE_INT32(id, CprFd),
- VMSTATE_INT32(fd, CprFd),
+ VMSTATE_FD(fd, CprFd),
VMSTATE_END_OF_LIST()
}
};
@@ -116,6 +116,18 @@ QIOChannel *cpr_state_ioc(void)
return qemu_file_get_ioc(cpr_state_file);
}
+static MigMode incoming_mode = MIG_MODE_NONE;
+
+MigMode cpr_get_incoming_mode(void)
+{
+ return incoming_mode;
+}
+
+void cpr_set_incoming_mode(MigMode mode)
+{
+ incoming_mode = mode;
+}
+
int cpr_state_save(MigrationChannel *channel, Error **errp)
{
int ret;
@@ -124,8 +136,14 @@ int cpr_state_save(MigrationChannel *channel, Error **errp)
trace_cpr_state_save(MigMode_str(mode));
- /* set f based on mode in a later patch in this series */
- return 0;
+ if (mode == MIG_MODE_CPR_TRANSFER) {
+ f = cpr_transfer_output(channel, errp);
+ } else {
+ return 0;
+ }
+ if (!f) {
+ return -1;
+ }
qemu_put_be32(f, QEMU_CPR_FILE_MAGIC);
qemu_put_be32(f, QEMU_CPR_FILE_VERSION);
@@ -155,8 +173,16 @@ int cpr_state_load(MigrationChannel *channel, Error **errp)
QEMUFile *f;
MigMode mode = 0;
- /* set f and mode based on other parameters later in this patch series */
- return 0;
+ if (channel) {
+ mode = MIG_MODE_CPR_TRANSFER;
+ cpr_set_incoming_mode(mode);
+ f = cpr_transfer_input(channel, errp);
+ } else {
+ return 0;
+ }
+ if (!f) {
+ return -1;
+ }
trace_cpr_state_load(MigMode_str(mode));
diff --git a/migration/migration.c b/migration/migration.c
index 5f2540fac3..88b09914ec 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -77,6 +77,7 @@
static NotifierWithReturnList migration_state_notifiers[] = {
NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL),
NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT),
+ NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_TRANSFER),
};
/* Messages sent on the return path from destination to source */
@@ -110,6 +111,7 @@ static int migration_maybe_pause(MigrationState *s,
static void migrate_fd_cancel(MigrationState *s);
static bool close_return_path_on_source(MigrationState *s);
static void migration_completion_end(MigrationState *s);
+static void migrate_hup_delete(MigrationState *s);
static void migration_downtime_start(MigrationState *s)
{
@@ -220,6 +222,12 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
return false;
}
+ if (migrate_mode() == MIG_MODE_CPR_TRANSFER &&
+ addr->transport == MIGRATION_ADDRESS_TYPE_FILE) {
+ error_setg(errp, "Migration requires streamable transport (eg unix)");
+ return false;
+ }
+
return true;
}
@@ -435,6 +443,7 @@ void migration_incoming_state_destroy(void)
mis->postcopy_qemufile_dst = NULL;
}
+ cpr_set_incoming_mode(MIG_MODE_NONE);
yank_unregister_instance(MIGRATION_YANK_INSTANCE);
}
@@ -747,6 +756,9 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
} else {
error_setg(errp, "unknown migration protocol: %s", uri);
}
+
+ /* Close cpr socket to tell source that we are listening */
+ cpr_state_close();
}
static void process_incoming_migration_bh(void *opaque)
@@ -1423,6 +1435,8 @@ static void migrate_fd_cleanup(MigrationState *s)
s->vmdesc = NULL;
qemu_savevm_state_cleanup();
+ cpr_state_close();
+ migrate_hup_delete(s);
close_return_path_on_source(s);
@@ -1534,6 +1548,7 @@ static void migrate_fd_error(MigrationState *s, const Error *error)
static void migrate_fd_cancel(MigrationState *s)
{
int old_state ;
+ bool setup = (s->state == MIGRATION_STATUS_SETUP);
trace_migrate_fd_cancel();
@@ -1568,6 +1583,17 @@ static void migrate_fd_cancel(MigrationState *s)
}
}
}
+
+ /*
+ * If qmp_migrate_finish has not been called, then there is no path that
+ * will complete the cancellation. Do it now.
+ */
+ if (setup && !s->to_dst_file) {
+ migrate_set_state(&s->state, MIGRATION_STATUS_CANCELLING,
+ MIGRATION_STATUS_CANCELLED);
+ cpr_state_close();
+ migrate_hup_delete(s);
+ }
}
void migration_add_notifier_mode(NotifierWithReturn *notify,
@@ -1665,7 +1691,9 @@ bool migration_thread_is_self(void)
bool migrate_mode_is_cpr(MigrationState *s)
{
- return s->parameters.mode == MIG_MODE_CPR_REBOOT;
+ MigMode mode = s->parameters.mode;
+ return mode == MIG_MODE_CPR_REBOOT ||
+ mode == MIG_MODE_CPR_TRANSFER;
}
int migrate_init(MigrationState *s, Error **errp)
@@ -2046,6 +2074,40 @@ static bool migrate_prepare(MigrationState *s, bool resume, Error **errp)
return true;
}
+static void qmp_migrate_finish(MigrationAddress *addr, bool resume_requested,
+ Error **errp);
+
+static void migrate_hup_add(MigrationState *s, QIOChannel *ioc, GSourceFunc cb,
+ void *opaque)
+{
+ s->hup_source = qio_channel_create_watch(ioc, G_IO_HUP);
+ g_source_set_callback(s->hup_source, cb, opaque, NULL);
+ g_source_attach(s->hup_source, NULL);
+}
+
+static void migrate_hup_delete(MigrationState *s)
+{
+ if (s->hup_source) {
+ g_source_destroy(s->hup_source);
+ g_source_unref(s->hup_source);
+ s->hup_source = NULL;
+ }
+}
+
+static gboolean qmp_migrate_finish_cb(QIOChannel *channel,
+ GIOCondition cond,
+ void *opaque)
+{
+ MigrationAddress *addr = opaque;
+
+ qmp_migrate_finish(addr, false, NULL);
+
+ cpr_state_close();
+ migrate_hup_delete(migrate_get_current());
+ qapi_free_MigrationAddress(addr);
+ return G_SOURCE_REMOVE;
+}
+
void qmp_migrate(const char *uri, bool has_channels,
MigrationChannelList *channels, bool has_detach, bool detach,
bool has_resume, bool resume, Error **errp)
@@ -2056,6 +2118,7 @@ void qmp_migrate(const char *uri, bool has_channels,
g_autoptr(MigrationChannel) channel = NULL;
MigrationAddress *addr = NULL;
MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL };
+ MigrationChannel *cpr_channel = NULL;
/*
* Having preliminary checks for uri and channel
@@ -2076,6 +2139,7 @@ void qmp_migrate(const char *uri, bool has_channels,
}
channelv[type] = channels->value;
}
+ cpr_channel = channelv[MIGRATION_CHANNEL_TYPE_CPR];
addr = channelv[MIGRATION_CHANNEL_TYPE_MAIN]->addr;
if (!addr) {
error_setg(errp, "Channel list has no main entry");
@@ -2096,12 +2160,52 @@ void qmp_migrate(const char *uri, bool has_channels,
return;
}
+ if (s->parameters.mode == MIG_MODE_CPR_TRANSFER && !cpr_channel) {
+ error_setg(errp, "missing 'cpr' migration channel");
+ return;
+ }
+
resume_requested = has_resume && resume;
if (!migrate_prepare(s, resume_requested, errp)) {
/* Error detected, put into errp */
return;
}
+ if (cpr_state_save(cpr_channel, &local_err)) {
+ goto out;
+ }
+
+ /*
+ * For cpr-transfer, the target may not be listening yet on the migration
+ * channel, because first it must finish cpr_state_load. The target tells
+ * us it is listening by closing the cpr-state socket. Wait for that HUP
+ * event before connecting in qmp_migrate_finish.
+ *
+ * The HUP could occur because the target fails while reading CPR state,
+ * in which case the target will not listen for the incoming migration
+ * connection, so qmp_migrate_finish will fail to connect, and then recover.
+ */
+ if (s->parameters.mode == MIG_MODE_CPR_TRANSFER) {
+ migrate_hup_add(s, cpr_state_ioc(), (GSourceFunc)qmp_migrate_finish_cb,
+ QAPI_CLONE(MigrationAddress, addr));
+
+ } else {
+ qmp_migrate_finish(addr, resume_requested, errp);
+ }
+
+out:
+ if (local_err) {
+ migrate_fd_error(s, local_err);
+ error_propagate(errp, local_err);
+ }
+}
+
+static void qmp_migrate_finish(MigrationAddress *addr, bool resume_requested,
+ Error **errp)
+{
+ MigrationState *s = migrate_get_current();
+ Error *local_err = NULL;
+
if (!resume_requested) {
if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
return;
diff --git a/migration/migration.h b/migration/migration.h
index 1d4d4e910d..fb1b8f99d3 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -468,6 +468,8 @@ struct MigrationState {
bool switchover_acked;
/* Is this a rdma migration */
bool rdma_migration;
+
+ GSource *hup_source;
};
void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
diff --git a/migration/options.c b/migration/options.c
index b8d5300326..1ad950e397 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -22,6 +22,7 @@
#include "qapi/qmp/qnull.h"
#include "system/runstate.h"
#include "migration/colo.h"
+#include "migration/cpr.h"
#include "migration/misc.h"
#include "migration.h"
#include "migration-stats.h"
@@ -745,8 +746,11 @@ uint64_t migrate_max_postcopy_bandwidth(void)
MigMode migrate_mode(void)
{
- MigrationState *s = migrate_get_current();
- MigMode mode = s->parameters.mode;
+ MigMode mode = cpr_get_incoming_mode();
+
+ if (mode == MIG_MODE_NONE) {
+ mode = migrate_get_current()->parameters.mode;
+ }
assert(mode >= 0 && mode < MIG_MODE__MAX);
return mode;
diff --git a/migration/ram.c b/migration/ram.c
index ce28328141..5aace00bf1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -195,7 +195,9 @@ static bool postcopy_preempt_active(void)
bool migrate_ram_is_ignored(RAMBlock *block)
{
+ MigMode mode = migrate_mode();
return !qemu_ram_is_migratable(block) ||
+ mode == MIG_MODE_CPR_TRANSFER ||
(migrate_ignore_shared() && qemu_ram_is_shared(block)
&& qemu_ram_is_named_file(block));
}
diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
index 0319c3568b..741a588b7e 100644
--- a/migration/vmstate-types.c
+++ b/migration/vmstate-types.c
@@ -15,6 +15,7 @@
#include "qemu-file.h"
#include "migration.h"
#include "migration/vmstate.h"
+#include "migration/client-options.h"
#include "qemu/error-report.h"
#include "qemu/queue.h"
#include "trace.h"
diff --git a/qapi/migration.json b/qapi/migration.json
index a605dc26db..4679ce9f2a 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -614,9 +614,48 @@
# or COLO.
#
# (since 8.2)
+#
+# @cpr-transfer: This mode allows the user to transfer a guest to a
+# new QEMU instance on the same host with minimal guest pause
+# time by preserving guest RAM in place. Devices and their pinned
+# pages will also be preserved in a future QEMU release.
+#
+# The user starts new QEMU on the same host as old QEMU, with
+# command-line arguments to create the same machine, plus the
+# -incoming option for the main migration channel, like normal
+# live migration. In addition, the user adds a second -incoming
+# option with channel type "cpr". This CPR channel must support
+# file descriptor transfer with SCM_RIGHTS, i.e. it must be a
+# UNIX domain socket.
+#
+# To initiate CPR, the user issues a migrate command to old QEMU,
+# adding a second migration channel of type "cpr" in the channels
+# argument. Old QEMU stops the VM, saves state to the migration
+# channels, and enters the postmigrate state. Execution resumes
+# in new QEMU.
+#
+# New QEMU reads the CPR channel before opening a monitor, hence
+# the CPR channel cannot be specified in the list of channels for
+# a migrate-incoming command. It may only be specified on the
+# command line.
+#
+# The main channel address cannot be a file type, and for an
+# inet socket, the port cannot be 0 (meaning dynamically choose
+# a port).
+#
+# Memory-backend objects must have the share=on attribute, but
+# memory-backend-epc is not supported. The VM must be started
+# with the '-machine aux-ram-share=on' option.
+#
+# When using -incoming defer, you must issue the migrate command
+# to old QEMU before issuing any monitor commands to new QEMU.
+# However, new QEMU does not open and read the migration stream
+# until you issue the migrate incoming command.
+#
+# (since 10.0)
##
{ 'enum': 'MigMode',
- 'data': [ 'normal', 'cpr-reboot' ] }
+ 'data': [ 'normal', 'cpr-reboot', 'cpr-transfer' ] }
##
# @ZeroPageDetection:
@@ -1578,11 +1617,12 @@
# The migration channel-type request options.
#
# @main: Main outbound migration channel.
+# @cpr: Checkpoint and restart state channel.
#
# Since: 8.1
##
{ 'enum': 'MigrationChannelType',
- 'data': [ 'main' ] }
+ 'data': [ 'main', 'cpr' ] }
##
# @MigrationChannel:
diff --git a/qemu-options.hx b/qemu-options.hx
index 3d1af7325b..d19bf533d6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -112,6 +112,8 @@ SRST
specified on the command line, or implicitly created by the -m
command line option. The default is off.
+ To use the cpr-transfer migration mode, you must set aux-ram-share=on.
+
``memory-backend='id'``
An alternative to legacy ``-mem-path`` and ``mem-prealloc`` options.
Allows to use a memory backend as main RAM.
diff --git a/stubs/vmstate.c b/stubs/vmstate.c
index 8513d9204e..c190762d7c 100644
--- a/stubs/vmstate.c
+++ b/stubs/vmstate.c
@@ -1,5 +1,7 @@
#include "qemu/osdep.h"
#include "migration/vmstate.h"
+#include "qapi/qapi-types-migration.h"
+#include "migration/client-options.h"
int vmstate_register_with_alias_id(VMStateIf *obj,
uint32_t instance_id,
@@ -21,3 +23,8 @@ bool vmstate_check_only_migratable(const VMStateDescription *vmsd)
{
return true;
}
+
+MigMode migrate_mode(void)
+{
+ return MIG_MODE_NORMAL;
+}
diff --git a/system/vl.c b/system/vl.c
index 504f05b954..db8e604eba 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -77,6 +77,7 @@
#include "hw/block/block.h"
#include "hw/i386/x86.h"
#include "hw/i386/pc.h"
+#include "migration/cpr.h"
#include "migration/misc.h"
#include "migration/snapshot.h"
#include "system/tpm.h"
@@ -3706,6 +3707,12 @@ void qemu_init(int argc, char **argv)
qemu_create_machine(machine_opts_dict);
+ /*
+ * Load incoming CPR state before any devices are created, because it
+ * contains file descriptors that are needed in device initialization code.
+ */
+ cpr_state_load(incoming_channels[MIGRATION_CHANNEL_TYPE_CPR], &error_fatal);
+
suspend_mux_open();
qemu_disable_default_devices();
--
2.35.3
* [PULL 18/42] migration-test: memory_backend
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (16 preceding siblings ...)
2025-01-29 16:00 ` [PULL 17/42] migration: cpr-transfer mode Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 19/42] tests/qtest: optimize migrate_set_ports Fabiano Rosas
` (24 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Allow each migration test to define its own memory backend, replacing
the standard "-m <size>" specification.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/1736967650-129648-18-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
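For illustration only, a minimal sketch of the fallback this patch adds,
written outside the qtest framework. build_memory_arg is a hypothetical
helper, not part of the patch, and plain snprintf stands in for
g_strdup_printf:

```c
#include <stdio.h>

/*
 * Hypothetical helper (not in the patch): expand a caller-supplied format
 * string containing exactly one %s with the memory size, or fall back to
 * the standard "-m <size> " option when no format is given.
 * Returns the number of characters written, as snprintf does.
 */
static int build_memory_arg(char *out, size_t len,
                            const char *backend_fmt, const char *memory_size)
{
    const char *fmt = backend_fmt ? backend_fmt : "-m %s ";

    return snprintf(out, len, fmt, memory_size);
}
```

With a NULL format and a size of "150M" this produces "-m 150M ". A test
that wants its own backend would instead pass a format such as
"-object memory-backend-memfd,id=ram0,size=%s -machine memory-backend=ram0"
(an illustrative string, not taken from the series).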
tests/qtest/migration/framework.c | 15 +++++++++++----
tests/qtest/migration/framework.h | 5 +++++
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index 4550cda129..758e14abab 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -221,6 +221,7 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
g_autofree char *machine = NULL;
const char *bootpath;
g_autoptr(QList) capabilities = migrate_start_get_qmp_capabilities(args);
+ g_autofree char *memory_backend = NULL;
if (args->use_shmem) {
if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) {
@@ -296,6 +297,12 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
memory_size, shmem_path);
}
+ if (args->memory_backend) {
+ memory_backend = g_strdup_printf(args->memory_backend, memory_size);
+ } else {
+ memory_backend = g_strdup_printf("-m %s ", memory_size);
+ }
+
if (args->use_dirty_ring) {
kvm_opts = ",dirty-ring-size=4096";
}
@@ -314,12 +321,12 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
cmd_source = g_strdup_printf("-accel kvm%s -accel tcg "
"-machine %s,%s "
"-name source,debug-threads=on "
- "-m %s "
+ "%s "
"-serial file:%s/src_serial "
"%s %s %s %s",
kvm_opts ? kvm_opts : "",
machine, machine_opts,
- memory_size, tmpfs,
+ memory_backend, tmpfs,
arch_opts ? arch_opts : "",
shmem_opts ? shmem_opts : "",
args->opts_source ? args->opts_source : "",
@@ -335,13 +342,13 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
cmd_target = g_strdup_printf("-accel kvm%s -accel tcg "
"-machine %s,%s "
"-name target,debug-threads=on "
- "-m %s "
+ "%s "
"-serial file:%s/dest_serial "
"-incoming %s "
"%s %s %s %s",
kvm_opts ? kvm_opts : "",
machine, machine_opts,
- memory_size, tmpfs, uri,
+ memory_backend, tmpfs, uri,
arch_opts ? arch_opts : "",
shmem_opts ? shmem_opts : "",
args->opts_target ? args->opts_target : "",
diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
index 7991ee56b6..dd2db1c000 100644
--- a/tests/qtest/migration/framework.h
+++ b/tests/qtest/migration/framework.h
@@ -111,6 +111,11 @@ typedef struct {
bool suspend_me;
/* enable OOB QMP capability */
bool oob;
+ /*
+ * Format string for the main memory backend, containing one %s where the
+ * size is plugged in. If omitted, "-m %s" is used.
+ */
+ const char *memory_backend;
} MigrateStart;
typedef enum PostcopyRecoveryFailStage {
--
2.35.3
* [PULL 19/42] tests/qtest: optimize migrate_set_ports
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (17 preceding siblings ...)
2025-01-29 16:00 ` [PULL 18/42] migration-test: memory_backend Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 20/42] tests/qtest: defer connection Fabiano Rosas
` (23 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Do not query connection parameters if all port numbers are known. This is
more efficient, and also solves a problem for the cpr-transfer test.
At the point where cpr-transfer calls migrate_qmp and migrate_set_ports,
the monitor is not connected and queries are not allowed. Port=0 is
never used for cpr-transfer.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-19-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
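For illustration only, the lazy-query pattern can be sketched without QDict
or QTestState. Everything here is a stand-in: set_ports, port_query_fn and
query_stub are hypothetical names, and a plain array of strings replaces the
channel list. The query runs at most once, and only when some entry has
port "0":

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback standing in for a monitor query. */
typedef const char *(*port_query_fn)(void *opaque);

/* Stand-in query: simply hands back the port string passed as opaque. */
static const char *query_stub(void *opaque)
{
    return opaque;
}

/*
 * Replace every "0" entry in @ports with the queried port.
 * Returns how many queries were issued (0 if no entry needed one, else 1).
 */
static int set_ports(const char **ports, size_t n,
                     port_query_fn query, void *opaque)
{
    const char *resolved = NULL;
    bool fetched = false;
    int queries = 0;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(ports[i], "0") != 0) {
            continue;               /* port already known, nothing to do */
        }
        if (!fetched) {
            resolved = query(opaque);   /* fetch only when first needed */
            fetched = true;
            queries++;
        }
        if (resolved) {
            ports[i] = resolved;
        }
    }
    return queries;
}
```

A caller whose ports are all fixed never triggers the query, which is the
property the cpr-transfer test depends on.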
tests/qtest/migration/migration-util.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/tests/qtest/migration/migration-util.c b/tests/qtest/migration/migration-util.c
index 526bed74ea..0ce1413b6c 100644
--- a/tests/qtest/migration/migration-util.c
+++ b/tests/qtest/migration/migration-util.c
@@ -135,25 +135,32 @@ migrate_get_connect_qdict(QTestState *who)
void migrate_set_ports(QTestState *to, QList *channel_list)
{
- QDict *addr;
+ g_autoptr(QDict) addr = NULL;
QListEntry *entry;
const char *addr_port = NULL;
- addr = migrate_get_connect_qdict(to);
-
QLIST_FOREACH_ENTRY(channel_list, entry) {
QDict *channel = qobject_to(QDict, qlist_entry_obj(entry));
QDict *addrdict = qdict_get_qdict(channel, "addr");
- if (qdict_haskey(addrdict, "port") &&
- qdict_haskey(addr, "port") &&
- (strcmp(qdict_get_str(addrdict, "port"), "0") == 0)) {
+ if (!qdict_haskey(addrdict, "port") ||
+ strcmp(qdict_get_str(addrdict, "port"), "0")) {
+ continue;
+ }
+
+ /*
+ * Fetch addr only if needed, so tests that are not yet connected to
+ * the monitor do not query it. Such tests cannot use port=0.
+ */
+ if (!addr) {
+ addr = migrate_get_connect_qdict(to);
+ }
+
+ if (qdict_haskey(addr, "port")) {
addr_port = qdict_get_str(addr, "port");
qdict_put_str(addrdict, "port", addr_port);
}
}
-
- qobject_unref(addr);
}
bool migrate_watch_for_events(QTestState *who, const char *name,
--
2.35.3
* [PULL 20/42] tests/qtest: defer connection
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (18 preceding siblings ...)
2025-01-29 16:00 ` [PULL 19/42] tests/qtest: optimize migrate_set_ports Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 21/42] migration-test: " Fabiano Rosas
` (22 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Add an option to defer connecting to the monitor and qtest sockets when
calling qtest_init_with_env. The client makes the connection later by
calling qtest_connect and then qtest_qmp_handshake.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-20-git-send-email-steven.sistare@oracle.com
[plumb capabilities list into qtest_qmp_handshake]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
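For illustration only, a self-contained sketch of the socket behaviour that
deferral relies on: the harness listens first and calls accept() later, and
the peer's connect() can complete in between because the connection waits
in the listen backlog (as observed on Linux for UNIX stream sockets). This
is not libqtest code. DeferredConn, deferred_listen, peer_connect and
deferred_accept are hypothetical names that loosely mirror the sock/fd
split added to QTestState:

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical miniature of the state kept for a deferred connection. */
typedef struct {
    struct sockaddr_un addr;
    int sock;   /* listening socket, -1 once the connection is accepted */
    int fd;     /* accepted connection, -1 until deferred_accept() */
} DeferredConn;

/* Create the listening socket but do not accept yet. */
static int deferred_listen(DeferredConn *c, const char *path)
{
    c->fd = -1;
    c->addr = (struct sockaddr_un){ .sun_family = AF_UNIX };
    snprintf(c->addr.sun_path, sizeof(c->addr.sun_path), "%s", path);
    unlink(path);   /* remove a stale socket left by an earlier run */

    c->sock = socket(AF_UNIX, SOCK_STREAM, 0);
    if (c->sock < 0) {
        return -1;
    }
    if (bind(c->sock, (struct sockaddr *)&c->addr, sizeof(c->addr)) < 0 ||
        listen(c->sock, 1) < 0) {
        close(c->sock);
        c->sock = -1;
        return -1;
    }
    return 0;
}

/* What the peer does: connect to the listening socket. */
static int peer_connect(const DeferredConn *c)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd >= 0 && connect(fd, (const struct sockaddr *)&c->addr,
                           sizeof(c->addr)) < 0) {
        close(fd);
        fd = -1;
    }
    return fd;
}

/* The deferred step: accept, then release the listener and its path. */
static int deferred_accept(DeferredConn *c)
{
    c->fd = accept(c->sock, NULL, NULL);
    close(c->sock);
    unlink(c->addr.sun_path);
    c->sock = -1;
    return c->fd;
}
```

In the sketch, as in the patch, the listening descriptors are set to -1
once the connection has been accepted, and the socket paths are unlinked
at that point rather than at init time.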
tests/qtest/libqtest.c | 99 ++++++++++++++++++++-----------
tests/qtest/libqtest.h | 24 +++++++-
tests/qtest/migration/framework.c | 7 ++-
3 files changed, 90 insertions(+), 40 deletions(-)
diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index a1e105f27f..fbb51e3e55 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -75,6 +75,8 @@ struct QTestState
{
int fd;
int qmp_fd;
+ int sock;
+ int qmpsock;
pid_t qemu_pid; /* our child QEMU process */
int wstatus;
#ifdef _WIN32
@@ -458,18 +460,19 @@ static QTestState *G_GNUC_PRINTF(2, 3) qtest_spawn_qemu(const char *qemu_bin,
return s;
}
+static char *qtest_socket_path(const char *suffix)
+{
+ return g_strdup_printf("%s/qtest-%d.%s", g_get_tmp_dir(), getpid(), suffix);
+}
+
static QTestState *qtest_init_internal(const char *qemu_bin,
- const char *extra_args)
+ const char *extra_args,
+ bool do_connect)
{
QTestState *s;
int sock, qmpsock, i;
- gchar *socket_path;
- gchar *qmp_socket_path;
-
- socket_path = g_strdup_printf("%s/qtest-%d.sock",
- g_get_tmp_dir(), getpid());
- qmp_socket_path = g_strdup_printf("%s/qtest-%d.qmp",
- g_get_tmp_dir(), getpid());
+ g_autofree gchar *socket_path = qtest_socket_path("sock");
+ g_autofree gchar *qmp_socket_path = qtest_socket_path("qmp");
/*
* It's possible that if an earlier test run crashed it might
@@ -501,22 +504,19 @@ static QTestState *qtest_init_internal(const char *qemu_bin,
qtest_client_set_rx_handler(s, qtest_client_socket_recv_line);
qtest_client_set_tx_handler(s, qtest_client_socket_send);
- s->fd = socket_accept(sock);
- if (s->fd >= 0) {
- s->qmp_fd = socket_accept(qmpsock);
- }
- unlink(socket_path);
- unlink(qmp_socket_path);
- g_free(socket_path);
- g_free(qmp_socket_path);
-
- g_assert(s->fd >= 0 && s->qmp_fd >= 0);
-
s->rx = g_string_new("");
for (i = 0; i < MAX_IRQ; i++) {
s->irq_level[i] = false;
}
+ s->fd = -1;
+ s->qmp_fd = -1;
+ s->sock = sock;
+ s->qmpsock = qmpsock;
+ if (do_connect) {
+ qtest_connect(s);
+ }
+
/*
* Stopping QEMU for debugging is not supported on Windows.
*
@@ -531,28 +531,38 @@ static QTestState *qtest_init_internal(const char *qemu_bin,
}
#endif
+ return s;
+}
+
+void qtest_connect(QTestState *s)
+{
+ g_autofree gchar *socket_path = qtest_socket_path("sock");
+ g_autofree gchar *qmp_socket_path = qtest_socket_path("qmp");
+
+ g_assert(s->sock >= 0 && s->qmpsock >= 0);
+ s->fd = socket_accept(s->sock);
+ if (s->fd >= 0) {
+ s->qmp_fd = socket_accept(s->qmpsock);
+ }
+ unlink(socket_path);
+ unlink(qmp_socket_path);
+ g_assert(s->fd >= 0 && s->qmp_fd >= 0);
+ s->sock = s->qmpsock = -1;
/* ask endianness of the target */
-
s->big_endian = qtest_query_target_endianness(s);
-
- return s;
}
QTestState *qtest_init_without_qmp_handshake(const char *extra_args)
{
- return qtest_init_internal(qtest_qemu_binary(NULL), extra_args);
+ return qtest_init_internal(qtest_qemu_binary(NULL), extra_args, true);
}
-QTestState *qtest_init_with_env_and_capabilities(const char *var,
- const char *extra_args,
- QList *capabilities)
+void qtest_qmp_handshake(QTestState *s, QList *capabilities)
{
- QTestState *s = qtest_init_internal(qtest_qemu_binary(var), extra_args);
- QDict *greeting;
-
/* Read the QMP greeting and then do the handshake */
- greeting = qtest_qmp_receive(s);
+ QDict *greeting = qtest_qmp_receive(s);
qobject_unref(greeting);
+
if (capabilities) {
qtest_qmp_assert_success(s,
"{ 'execute': 'qmp_capabilities', "
@@ -561,18 +571,37 @@ QTestState *qtest_init_with_env_and_capabilities(const char *var,
} else {
qtest_qmp_assert_success(s, "{ 'execute': 'qmp_capabilities' }");
}
+}
+QTestState *qtest_init_with_env_and_capabilities(const char *var,
+ const char *extra_args,
+ QList *capabilities,
+ bool do_connect)
+{
+ QTestState *s = qtest_init_internal(qtest_qemu_binary(var), extra_args,
+ do_connect);
+
+ if (do_connect) {
+ qtest_qmp_handshake(s, capabilities);
+ } else {
+ /*
+ * If the connection is deferred, the capabilities must be passed later,
+ * in the qtest_qmp_handshake() call.
+ */
+ assert(!capabilities);
+ }
return s;
}
-QTestState *qtest_init_with_env(const char *var, const char *extra_args)
+QTestState *qtest_init_with_env(const char *var, const char *extra_args,
+ bool do_connect)
{
- return qtest_init_with_env_and_capabilities(var, extra_args, NULL);
+ return qtest_init_with_env_and_capabilities(var, extra_args, NULL, do_connect);
}
QTestState *qtest_init(const char *extra_args)
{
- return qtest_init_with_env(NULL, extra_args);
+ return qtest_init_with_env(NULL, extra_args, true);
}
QTestState *qtest_vinitf(const char *fmt, va_list ap)
@@ -1580,7 +1609,7 @@ static struct MachInfo *qtest_get_machines(const char *var)
silence_spawn_log = !g_test_verbose();
- qts = qtest_init_with_env(qemu_var, "-machine none");
+ qts = qtest_init_with_env(qemu_var, "-machine none", true);
response = qtest_qmp(qts, "{ 'execute': 'query-machines' }");
g_assert(response);
list = qdict_get_qlist(response, "return");
@@ -1635,7 +1664,7 @@ static struct CpuModel *qtest_get_cpu_models(void)
silence_spawn_log = !g_test_verbose();
- qts = qtest_init_with_env(NULL, "-machine none");
+ qts = qtest_init_with_env(NULL, "-machine none", true);
response = qtest_qmp(qts, "{ 'execute': 'query-cpu-definitions' }");
g_assert(response);
list = qdict_get_qlist(response, "return");
diff --git a/tests/qtest/libqtest.h b/tests/qtest/libqtest.h
index ce88d23eae..29f123e281 100644
--- a/tests/qtest/libqtest.h
+++ b/tests/qtest/libqtest.h
@@ -61,13 +61,15 @@ QTestState *qtest_init(const char *extra_args);
* @var: Environment variable from where to take the QEMU binary
* @extra_args: Other arguments to pass to QEMU. CAUTION: these
* arguments are subject to word splitting and shell evaluation.
+ * @do_connect: connect to qemu monitor and qtest socket.
*
* Like qtest_init(), but use a different environment variable for the
* QEMU binary.
*
* Returns: #QTestState instance.
*/
-QTestState *qtest_init_with_env(const char *var, const char *extra_args);
+QTestState *qtest_init_with_env(const char *var, const char *extra_args,
+ bool do_connect);
/**
* qtest_init_with_env_and_capabilities:
@@ -75,6 +77,7 @@ QTestState *qtest_init_with_env(const char *var, const char *extra_args);
* @extra_args: Other arguments to pass to QEMU. CAUTION: these
* arguments are subject to word splitting and shell evaluation.
* @capabilities: list of QMP capabilities (strings) to enable
+ * @do_connect: connect to qemu monitor and qtest socket.
*
* Like qtest_init_with_env(), but enable specified capabilities during
* hadshake.
@@ -83,7 +86,8 @@ QTestState *qtest_init_with_env(const char *var, const char *extra_args);
*/
QTestState *qtest_init_with_env_and_capabilities(const char *var,
const char *extra_args,
- QList *capabilities);
+ QList *capabilities,
+ bool do_connect);
/**
* qtest_init_without_qmp_handshake:
@@ -94,6 +98,22 @@ QTestState *qtest_init_with_env_and_capabilities(const char *var,
*/
QTestState *qtest_init_without_qmp_handshake(const char *extra_args);
+/**
+ * qtest_connect:
+ * @s: #QTestState instance to connect
+ * Connect to qemu monitor and qtest socket, after skipping them in
+ * qtest_init_with_env. Does not handshake with the monitor.
+ */
+void qtest_connect(QTestState *s);
+
+/**
+ * qtest_qmp_handshake:
+ * @s: #QTestState instance to operate on.
+ * @capabilities: list of QMP capabilities (strings) to enable
+ * Perform handshake after connecting to qemu monitor.
+ */
+void qtest_qmp_handshake(QTestState *s, QList *capabilities);
+
/**
* qtest_init_with_serial:
* @extra_args: other arguments to pass to QEMU. CAUTION: these
diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index 758e14abab..f7add75ed5 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -196,9 +196,10 @@ static void cleanup(const char *filename)
static QList *migrate_start_get_qmp_capabilities(const MigrateStart *args)
{
- QList *capabilities = qlist_new();
+ QList *capabilities = NULL;
if (args->oob) {
+ capabilities = qlist_new();
qlist_append_str(capabilities, "oob");
}
return capabilities;
@@ -333,7 +334,7 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
ignore_stderr);
if (!args->only_target) {
*from = qtest_init_with_env_and_capabilities(QEMU_ENV_SRC, cmd_source,
- capabilities);
+ capabilities, true);
qtest_qmp_set_event_callback(*from,
migrate_watch_for_events,
&src_state);
@@ -354,7 +355,7 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
args->opts_target ? args->opts_target : "",
ignore_stderr);
*to = qtest_init_with_env_and_capabilities(QEMU_ENV_DST, cmd_target,
- capabilities);
+ capabilities, true);
qtest_qmp_set_event_callback(*to,
migrate_watch_for_events,
&dst_state);
--
2.35.3
* [PULL 21/42] migration-test: defer connection
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (19 preceding siblings ...)
2025-01-29 16:00 ` [PULL 20/42] tests/qtest: defer connection Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 22/42] tests/qtest: enhance migration channels Fabiano Rosas
` (21 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Add an option to defer connection to the target monitor, needed by the
cpr-transfer test.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/1736967650-129648-21-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
tests/qtest/migration/framework.c | 22 +++++++++++++++++++---
tests/qtest/migration/framework.h | 3 +++
2 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index f7add75ed5..2611c31c1b 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -223,6 +223,7 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
const char *bootpath;
g_autoptr(QList) capabilities = migrate_start_get_qmp_capabilities(args);
g_autofree char *memory_backend = NULL;
+ const char *events;
if (args->use_shmem) {
if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) {
@@ -340,22 +341,30 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
&src_state);
}
+ /*
+ * If the monitor connection is deferred, enable events on the command line
+ * so none are missed. This is for testing only, do not set migration
+ * options like this in general.
+ */
+ events = args->defer_target_connect ? "-global migration.x-events=on" : "";
+
cmd_target = g_strdup_printf("-accel kvm%s -accel tcg "
"-machine %s,%s "
"-name target,debug-threads=on "
"%s "
"-serial file:%s/dest_serial "
"-incoming %s "
- "%s %s %s %s",
+ "%s %s %s %s %s",
kvm_opts ? kvm_opts : "",
machine, machine_opts,
memory_backend, tmpfs, uri,
+ events,
arch_opts ? arch_opts : "",
shmem_opts ? shmem_opts : "",
args->opts_target ? args->opts_target : "",
ignore_stderr);
*to = qtest_init_with_env_and_capabilities(QEMU_ENV_DST, cmd_target,
- capabilities, true);
+ capabilities, !args->defer_target_connect);
qtest_qmp_set_event_callback(*to,
migrate_watch_for_events,
&dst_state);
@@ -373,7 +382,9 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
* to mimic as closer as that.
*/
migrate_set_capability(*from, "events", true);
- migrate_set_capability(*to, "events", true);
+ if (!args->defer_target_connect) {
+ migrate_set_capability(*to, "events", true);
+ }
return 0;
}
@@ -733,6 +744,11 @@ void test_precopy_common(MigrateCommon *args)
migrate_qmp(from, to, args->connect_uri, args->connect_channels, "{}");
+ if (args->start.defer_target_connect) {
+ qtest_connect(to);
+ qtest_qmp_handshake(to, NULL);
+ }
+
if (args->result != MIG_TEST_SUCCEED) {
bool allow_active = args->result == MIG_TEST_FAIL;
wait_for_migration_fail(from, allow_active);
diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
index dd2db1c000..32f3a93632 100644
--- a/tests/qtest/migration/framework.h
+++ b/tests/qtest/migration/framework.h
@@ -116,6 +116,9 @@ typedef struct {
* size is plugged in. If omitted, "-m %s" is used.
*/
const char *memory_backend;
+
+ /* Do not connect to target monitor and qtest sockets in qtest_init */
+ bool defer_target_connect;
} MigrateStart;
typedef enum PostcopyRecoveryFailStage {
--
2.35.3
* [PULL 22/42] tests/qtest: enhance migration channels
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (20 preceding siblings ...)
2025-01-29 16:00 ` [PULL 21/42] migration-test: " Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 23/42] tests/qtest: assert qmp connected Fabiano Rosas
` (20 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Change the migrate_qmp and migrate_qmp_fail channels argument to a QObject
type so the caller can manipulate the object before passing it to the
helper. Define migrate_str_to_channel to aid such manipulation.
Add a channels argument to migrate_incoming_qmp.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-22-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
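For illustration only, a sketch of building a channel string in the dotted
keys form that migrate_str_to_channel is documented to accept. Because the
helper passes "channel-type" as the implied key, the leading bare value
("cpr" here) is assumed to be read as the channel type. format_cpr_channel
is a hypothetical helper and the socket path is an example value:

```c
#include <stdio.h>

/*
 * Hypothetical helper (not in the patch): format a single "cpr" channel
 * on a UNIX socket in dotted keys syntax. Returns the number of
 * characters written, as snprintf does.
 */
static int format_cpr_channel(char *out, size_t len, const char *sock_path)
{
    return snprintf(out, len,
                    "cpr,addr.transport=socket,addr.type=unix,addr.path=%s",
                    sock_path);
}
```

For a path of /tmp/cpr.sock the result is
"cpr,addr.transport=socket,addr.type=unix,addr.path=/tmp/cpr.sock", which
a test could convert with migrate_str_to_channel and append to the channel
list before calling migrate_qmp.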
tests/qtest/migration/framework.c | 15 ++++++--
tests/qtest/migration/migration-qmp.c | 53 ++++++++++++++++++++++-----
tests/qtest/migration/migration-qmp.h | 10 +++--
tests/qtest/migration/misc-tests.c | 9 ++++-
tests/qtest/migration/precopy-tests.c | 6 +--
tests/qtest/virtio-net-failover.c | 8 ++--
6 files changed, 76 insertions(+), 25 deletions(-)
diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index 2611c31c1b..1228bd5bca 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -18,6 +18,8 @@
#include "migration/migration-qmp.h"
#include "migration/migration-util.h"
#include "ppc-util.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qjson.h"
#include "qapi/qmp/qlist.h"
#include "qemu/module.h"
#include "qemu/option.h"
@@ -705,6 +707,7 @@ void test_precopy_common(MigrateCommon *args)
{
QTestState *from, *to;
void *data_hook = NULL;
+ QObject *out_channels = NULL;
if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
return;
@@ -737,12 +740,16 @@ void test_precopy_common(MigrateCommon *args)
}
}
+ if (args->connect_channels) {
+ out_channels = qobject_from_json(args->connect_channels, &error_abort);
+ }
+
if (args->result == MIG_TEST_QMP_ERROR) {
- migrate_qmp_fail(from, args->connect_uri, args->connect_channels, "{}");
+ migrate_qmp_fail(from, args->connect_uri, out_channels, "{}");
goto finish;
}
- migrate_qmp(from, to, args->connect_uri, args->connect_channels, "{}");
+ migrate_qmp(from, to, args->connect_uri, out_channels, "{}");
if (args->start.defer_target_connect) {
qtest_connect(to);
@@ -892,7 +899,7 @@ void test_file_common(MigrateCommon *args, bool stop_src)
* We need to wait for the source to finish before starting the
* destination.
*/
- migrate_incoming_qmp(to, args->connect_uri, "{}");
+ migrate_incoming_qmp(to, args->connect_uri, NULL, "{}");
wait_for_migration_complete(to);
if (stop_src) {
@@ -928,7 +935,7 @@ void *migrate_hook_start_precopy_tcp_multifd_common(QTestState *from,
migrate_set_capability(to, "multifd", true);
/* Start incoming migration from the 1st socket */
- migrate_incoming_qmp(to, "tcp:127.0.0.1:0", "{}");
+ migrate_incoming_qmp(to, "tcp:127.0.0.1:0", NULL, "{}");
return NULL;
}
diff --git a/tests/qtest/migration/migration-qmp.c b/tests/qtest/migration/migration-qmp.c
index 9431d2beda..5610f6d15d 100644
--- a/tests/qtest/migration/migration-qmp.c
+++ b/tests/qtest/migration/migration-qmp.c
@@ -15,9 +15,13 @@
#include "migration-qmp.h"
#include "migration-util.h"
#include "qapi/error.h"
+#include "qapi/qapi-types-migration.h"
+#include "qapi/qapi-visit-migration.h"
#include "qapi/qmp/qdict.h"
#include "qapi/qmp/qjson.h"
#include "qapi/qmp/qlist.h"
+#include "qapi/qobject-input-visitor.h"
+#include "qapi/qobject-output-visitor.h"
/*
* Number of seconds we wait when looking for migration
@@ -47,8 +51,33 @@ void migration_event_wait(QTestState *s, const char *target)
} while (!found);
}
+/*
+ * Convert a string representing a single channel to an object.
+ * @str may be in JSON or dotted keys format.
+ */
+QObject *migrate_str_to_channel(const char *str)
+{
+ Visitor *v;
+ MigrationChannel *channel;
+ QObject *obj;
+
+ /* Create the channel */
+ v = qobject_input_visitor_new_str(str, "channel-type", &error_abort);
+ visit_type_MigrationChannel(v, NULL, &channel, &error_abort);
+ visit_free(v);
+
+ /* Create the object */
+ v = qobject_output_visitor_new(&obj);
+ visit_type_MigrationChannel(v, NULL, &channel, &error_abort);
+ visit_complete(v, &obj);
+ visit_free(v);
+
+ qapi_free_MigrationChannel(channel);
+ return obj;
+}
+
void migrate_qmp_fail(QTestState *who, const char *uri,
- const char *channels, const char *fmt, ...)
+ QObject *channels, const char *fmt, ...)
{
va_list ap;
QDict *args, *err;
@@ -64,8 +93,7 @@ void migrate_qmp_fail(QTestState *who, const char *uri,
g_assert(!qdict_haskey(args, "channels"));
if (channels) {
- QObject *channels_obj = qobject_from_json(channels, &error_abort);
- qdict_put_obj(args, "channels", channels_obj);
+ qdict_put_obj(args, "channels", channels);
}
err = qtest_qmp_assert_failure_ref(
@@ -82,7 +110,7 @@ void migrate_qmp_fail(QTestState *who, const char *uri,
* qobject_from_jsonf_nofail()) with "uri": @uri spliced in.
*/
void migrate_qmp(QTestState *who, QTestState *to, const char *uri,
- const char *channels, const char *fmt, ...)
+ QObject *channels, const char *fmt, ...)
{
va_list ap;
QDict *args;
@@ -102,10 +130,9 @@ void migrate_qmp(QTestState *who, QTestState *to, const char *uri,
g_assert(!qdict_haskey(args, "channels"));
if (channels) {
- QObject *channels_obj = qobject_from_json(channels, &error_abort);
- QList *channel_list = qobject_to(QList, channels_obj);
+ QList *channel_list = qobject_to(QList, channels);
migrate_set_ports(to, channel_list);
- qdict_put_obj(args, "channels", channels_obj);
+ qdict_put_obj(args, "channels", channels);
}
qtest_qmp_assert_success(who,
@@ -123,7 +150,8 @@ void migrate_set_capability(QTestState *who, const char *capability,
capability, value);
}
-void migrate_incoming_qmp(QTestState *to, const char *uri, const char *fmt, ...)
+void migrate_incoming_qmp(QTestState *to, const char *uri, QObject *channels,
+ const char *fmt, ...)
{
va_list ap;
QDict *args, *rsp;
@@ -133,7 +161,14 @@ void migrate_incoming_qmp(QTestState *to, const char *uri, const char *fmt, ...)
va_end(ap);
g_assert(!qdict_haskey(args, "uri"));
- qdict_put_str(args, "uri", uri);
+ if (uri) {
+ qdict_put_str(args, "uri", uri);
+ }
+
+ g_assert(!qdict_haskey(args, "channels"));
+ if (channels) {
+ qdict_put_obj(args, "channels", channels);
+ }
/* This function relies on the event to work, make sure it's enabled */
migrate_set_capability(to, "events", true);
diff --git a/tests/qtest/migration/migration-qmp.h b/tests/qtest/migration/migration-qmp.h
index caaa78722a..faa8181d91 100644
--- a/tests/qtest/migration/migration-qmp.h
+++ b/tests/qtest/migration/migration-qmp.h
@@ -4,17 +4,19 @@
#include "migration-util.h"
+QObject *migrate_str_to_channel(const char *str);
+
G_GNUC_PRINTF(4, 5)
void migrate_qmp_fail(QTestState *who, const char *uri,
- const char *channels, const char *fmt, ...);
+ QObject *channels, const char *fmt, ...);
G_GNUC_PRINTF(5, 6)
void migrate_qmp(QTestState *who, QTestState *to, const char *uri,
- const char *channels, const char *fmt, ...);
+ QObject *channels, const char *fmt, ...);
-G_GNUC_PRINTF(3, 4)
+G_GNUC_PRINTF(4, 5)
void migrate_incoming_qmp(QTestState *who, const char *uri,
- const char *fmt, ...);
+ QObject *channels, const char *fmt, ...);
void migration_event_wait(QTestState *s, const char *target);
void migrate_set_capability(QTestState *who, const char *capability,
diff --git a/tests/qtest/migration/misc-tests.c b/tests/qtest/migration/misc-tests.c
index 6173430748..dda3707cf3 100644
--- a/tests/qtest/migration/misc-tests.c
+++ b/tests/qtest/migration/misc-tests.c
@@ -11,6 +11,8 @@
*/
#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qjson.h"
#include "libqtest.h"
#include "migration/framework.h"
#include "migration/migration-qmp.h"
@@ -205,6 +207,7 @@ static void test_validate_uuid_dst_not_set(void)
static void do_test_validate_uri_channel(MigrateCommon *args)
{
QTestState *from, *to;
+ QObject *channels;
if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
return;
@@ -217,7 +220,11 @@ static void do_test_validate_uri_channel(MigrateCommon *args)
* 'uri' and 'channels' validation is checked even before the migration
* starts.
*/
- migrate_qmp_fail(from, args->connect_uri, args->connect_channels, "{}");
+ channels = args->connect_channels ?
+ qobject_from_json(args->connect_channels, &error_abort) :
+ NULL;
+ migrate_qmp_fail(from, args->connect_uri, channels, "{}");
+
migrate_end(from, to, false);
}
diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index 23599b29ee..436dbd98e8 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -152,7 +152,7 @@ static void *migrate_hook_start_fd(QTestState *from,
close(pair[0]);
/* Start incoming migration from the 1st socket */
- migrate_incoming_qmp(to, "fd:fd-mig", "{}");
+ migrate_incoming_qmp(to, "fd:fd-mig", NULL, "{}");
/* Send the 2nd socket to the target */
qtest_qmp_fds_assert_success(from, &pair[1], 1,
@@ -479,7 +479,7 @@ static void test_multifd_tcp_cancel(void)
migrate_set_capability(to, "multifd", true);
/* Start incoming migration from the 1st socket */
- migrate_incoming_qmp(to, "tcp:127.0.0.1:0", "{}");
+ migrate_incoming_qmp(to, "tcp:127.0.0.1:0", NULL, "{}");
/* Wait for the first serial output from the source */
wait_for_serial("src_serial");
@@ -518,7 +518,7 @@ static void test_multifd_tcp_cancel(void)
migrate_set_capability(to2, "multifd", true);
/* Start incoming migration from the 1st socket */
- migrate_incoming_qmp(to2, "tcp:127.0.0.1:0", "{}");
+ migrate_incoming_qmp(to2, "tcp:127.0.0.1:0", NULL, "{}");
migrate_ensure_non_converge(from);
diff --git a/tests/qtest/virtio-net-failover.c b/tests/qtest/virtio-net-failover.c
index 08365ffa11..f04573f98c 100644
--- a/tests/qtest/virtio-net-failover.c
+++ b/tests/qtest/virtio-net-failover.c
@@ -773,7 +773,7 @@ static void test_migrate_in(gconstpointer opaque)
check_one_card(qts, true, "standby0", MAC_STANDBY0);
check_one_card(qts, false, "primary0", MAC_PRIMARY0);
- migrate_incoming_qmp(qts, uri, "{}");
+ migrate_incoming_qmp(qts, uri, NULL, "{}");
resp = get_failover_negociated_event(qts);
g_assert_cmpstr(qdict_get_str(resp, "device-id"), ==, "standby0");
@@ -895,7 +895,7 @@ static void test_off_migrate_in(gconstpointer opaque)
check_one_card(qts, true, "standby0", MAC_STANDBY0);
check_one_card(qts, true, "primary0", MAC_PRIMARY0);
- migrate_incoming_qmp(qts, uri, "{}");
+ migrate_incoming_qmp(qts, uri, NULL, "{}");
check_one_card(qts, true, "standby0", MAC_STANDBY0);
check_one_card(qts, true, "primary0", MAC_PRIMARY0);
@@ -1022,7 +1022,7 @@ static void test_guest_off_migrate_in(gconstpointer opaque)
check_one_card(qts, true, "standby0", MAC_STANDBY0);
check_one_card(qts, false, "primary0", MAC_PRIMARY0);
- migrate_incoming_qmp(qts, uri, "{}");
+ migrate_incoming_qmp(qts, uri, NULL, "{}");
check_one_card(qts, true, "standby0", MAC_STANDBY0);
check_one_card(qts, false, "primary0", MAC_PRIMARY0);
@@ -1747,7 +1747,7 @@ static void test_multi_in(gconstpointer opaque)
check_one_card(qts, true, "standby1", MAC_STANDBY1);
check_one_card(qts, false, "primary1", MAC_PRIMARY1);
- migrate_incoming_qmp(qts, uri, "{}");
+ migrate_incoming_qmp(qts, uri, NULL, "{}");
resp = get_failover_negociated_event(qts);
g_assert_cmpstr(qdict_get_str(resp, "device-id"), ==, "standby0");
--
2.35.3
* [PULL 23/42] tests/qtest: assert qmp connected
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Assert that qmp_fd is valid when we communicate with the monitor.
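A minimal sketch of the pattern, not the libqtest code itself: DemoState and
the demo_* helpers are hypothetical names, and -1 as the "not yet connected"
value is an assumption that matches the ">= 0" check in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for QTestState; only the monitor fd matters here. */
typedef struct {
    int qmp_fd;   /* assumed to be -1 until the monitor is connected */
} DemoState;

/* Start with the monitor connection deferred. */
static void demo_init_deferred(DemoState *s)
{
    s->qmp_fd = -1;
}

/* Record the connected monitor descriptor. */
static void demo_connect(DemoState *s, int fd)
{
    assert(fd >= 0);
    s->qmp_fd = fd;
}

static bool demo_qmp_connected(const DemoState *s)
{
    return s->qmp_fd >= 0;
}

/* Every user of the descriptor goes through this checked accessor. */
static int demo_qmp_fd(const DemoState *s)
{
    assert(s->qmp_fd >= 0);   /* fail loudly instead of using a bad fd */
    return s->qmp_fd;
}
```

With a deferred connection, a premature send then aborts at the assertion
rather than writing to an invalid descriptor.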
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1736967650-129648-23-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
tests/qtest/libqtest.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index fbb51e3e55..437b24fa2e 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -811,6 +811,7 @@ QDict *qtest_qmp_receive(QTestState *s)
QDict *qtest_qmp_receive_dict(QTestState *s)
{
+ g_assert(s->qmp_fd >= 0);
return qmp_fd_receive(s->qmp_fd);
}
@@ -838,12 +839,14 @@ int qtest_socket_server(const char *socket_path)
void qtest_qmp_vsend_fds(QTestState *s, int *fds, size_t fds_num,
const char *fmt, va_list ap)
{
+ g_assert(s->qmp_fd >= 0);
qmp_fd_vsend_fds(s->qmp_fd, fds, fds_num, fmt, ap);
}
#endif
void qtest_qmp_vsend(QTestState *s, const char *fmt, va_list ap)
{
+ g_assert(s->qmp_fd >= 0);
qmp_fd_vsend(s->qmp_fd, fmt, ap);
}
@@ -904,6 +907,7 @@ void qtest_qmp_send_raw(QTestState *s, const char *fmt, ...)
{
va_list ap;
+ g_assert(s->qmp_fd >= 0);
va_start(ap, fmt);
qmp_fd_vsend_raw(s->qmp_fd, fmt, ap);
va_end(ap);
--
2.35.3
* [PULL 24/42] migration-test: cpr-transfer
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Add a migration test for cpr-transfer mode. Defer the connection to the
target monitor, else the test hangs because in cpr-transfer mode QEMU does
not listen for monitor connections until we send the migrate command to
source QEMU.
To test -incoming defer, send a migrate-incoming command to the target
after sending the migrate command to the source, as required by
cpr-transfer mode.
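The ordering described above can be sketched as follows. This is only an
illustration under assumptions: DemoStep and demo_plan_cpr_transfer() are
hypothetical names, not part of the test framework, and the sketch models
just the order of operations, not the QMP traffic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical step names, in the order the test has to perform them. */
typedef enum {
    STEP_MIGRATE_ON_SOURCE,           /* unblocks the target's cpr channel */
    STEP_CONNECT_TARGET_MONITOR,      /* only possible after the step above */
    STEP_MIGRATE_INCOMING_ON_TARGET,  /* only needed for -incoming defer */
} DemoStep;

/* Fill steps[] (capacity 3) and return how many steps are required. */
static size_t demo_plan_cpr_transfer(const char *listen_uri, DemoStep steps[3])
{
    size_t n = 0;

    assert(listen_uri != NULL);
    steps[n++] = STEP_MIGRATE_ON_SOURCE;
    steps[n++] = STEP_CONNECT_TARGET_MONITOR;
    if (strcmp(listen_uri, "defer") == 0) {
        steps[n++] = STEP_MIGRATE_INCOMING_ON_TARGET;
    }
    return n;
}
```

The point of the design is that the target monitor connection always comes
after the source migrate command, and the extra migrate-incoming step exists
only in the defer case.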
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-24-git-send-email-steven.sistare@oracle.com
[only allocate in_channels when needed]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
tests/qtest/migration/cpr-tests.c | 62 +++++++++++++++++++++++++++++++
tests/qtest/migration/framework.c | 23 ++++++++++++
tests/qtest/migration/framework.h | 3 ++
3 files changed, 88 insertions(+)
diff --git a/tests/qtest/migration/cpr-tests.c b/tests/qtest/migration/cpr-tests.c
index 44ce89aa5b..215b0df8c0 100644
--- a/tests/qtest/migration/cpr-tests.c
+++ b/tests/qtest/migration/cpr-tests.c
@@ -44,6 +44,62 @@ static void test_mode_reboot(void)
test_file_common(&args, true);
}
+static void *test_mode_transfer_start(QTestState *from, QTestState *to)
+{
+ migrate_set_parameter_str(from, "mode", "cpr-transfer");
+ return NULL;
+}
+
+/*
+ * cpr-transfer mode cannot use the target monitor prior to starting the
+ * migration, and cannot connect synchronously to the monitor, so defer
+ * the target connection.
+ */
+static void test_mode_transfer_common(bool incoming_defer)
+{
+ g_autofree char *cpr_path = g_strdup_printf("%s/cpr.sock", tmpfs);
+ g_autofree char *mig_path = g_strdup_printf("%s/migsocket", tmpfs);
+ g_autofree char *uri = g_strdup_printf("unix:%s", mig_path);
+
+ const char *opts = "-machine aux-ram-share=on -nodefaults";
+ g_autofree const char *cpr_channel = g_strdup_printf(
+ "cpr,addr.transport=socket,addr.type=unix,addr.path=%s",
+ cpr_path);
+ g_autofree char *opts_target = g_strdup_printf("-incoming %s %s",
+ cpr_channel, opts);
+
+ g_autofree char *connect_channels = g_strdup_printf(
+ "[ { 'channel-type': 'main',"
+ " 'addr': { 'transport': 'socket',"
+ " 'type': 'unix',"
+ " 'path': '%s' } } ]",
+ mig_path);
+
+ MigrateCommon args = {
+ .start.opts_source = opts,
+ .start.opts_target = opts_target,
+ .start.defer_target_connect = true,
+ .start.memory_backend = "-object memory-backend-memfd,id=pc.ram,size=%s"
+ " -machine memory-backend=pc.ram",
+ .listen_uri = incoming_defer ? "defer" : uri,
+ .connect_channels = connect_channels,
+ .cpr_channel = cpr_channel,
+ .start_hook = test_mode_transfer_start,
+ };
+
+ test_precopy_common(&args);
+}
+
+static void test_mode_transfer(void)
+{
+ test_mode_transfer_common(false);
+}
+
+static void test_mode_transfer_defer(void)
+{
+ test_mode_transfer_common(true);
+}
+
void migration_test_add_cpr(MigrationTestEnv *env)
{
tmpfs = env->tmpfs;
@@ -55,4 +111,10 @@ void migration_test_add_cpr(MigrationTestEnv *env)
if (getenv("QEMU_TEST_FLAKY_TESTS")) {
migration_test_add("/migration/mode/reboot", test_mode_reboot);
}
+
+ if (env->has_kvm) {
+ migration_test_add("/migration/mode/transfer", test_mode_transfer);
+ migration_test_add("/migration/mode/transfer/defer",
+ test_mode_transfer_defer);
+ }
}
diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index 1228bd5bca..de65bfe40d 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -420,6 +420,7 @@ void migrate_end(QTestState *from, QTestState *to, bool test_dest)
qtest_quit(to);
cleanup("migsocket");
+ cleanup("cpr.sock");
cleanup("src_serial");
cleanup("dest_serial");
cleanup(FILE_TEST_FILENAME);
@@ -707,8 +708,11 @@ void test_precopy_common(MigrateCommon *args)
{
QTestState *from, *to;
void *data_hook = NULL;
+ QObject *in_channels = NULL;
QObject *out_channels = NULL;
+ g_assert(!args->cpr_channel || args->connect_channels);
+
if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
return;
}
@@ -740,8 +744,24 @@ void test_precopy_common(MigrateCommon *args)
}
}
+ /*
+ * The cpr channel must be included in outgoing channels, but not in
+ * migrate-incoming channels.
+ */
if (args->connect_channels) {
+ if (args->start.defer_target_connect &&
+ !strcmp(args->listen_uri, "defer")) {
+ in_channels = qobject_from_json(args->connect_channels,
+ &error_abort);
+ }
out_channels = qobject_from_json(args->connect_channels, &error_abort);
+
+ if (args->cpr_channel) {
+ QList *channels_list = qobject_to(QList, out_channels);
+ QObject *obj = migrate_str_to_channel(args->cpr_channel);
+
+ qlist_append(channels_list, obj);
+ }
}
if (args->result == MIG_TEST_QMP_ERROR) {
@@ -754,6 +774,9 @@ void test_precopy_common(MigrateCommon *args)
if (args->start.defer_target_connect) {
qtest_connect(to);
qtest_qmp_handshake(to, NULL);
+ if (!strcmp(args->listen_uri, "defer")) {
+ migrate_incoming_qmp(to, args->connect_uri, in_channels, "{}");
+ }
}
if (args->result != MIG_TEST_SUCCEED) {
diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
index 32f3a93632..cb4a984700 100644
--- a/tests/qtest/migration/framework.h
+++ b/tests/qtest/migration/framework.h
@@ -154,6 +154,9 @@ typedef struct {
*/
const char *connect_channels;
+ /* Optional: the cpr migration channel, in JSON or dotted keys format */
+ const char *cpr_channel;
+
/* Optional: callback to run at start to set migration parameters */
TestMigrateStartHook start_hook;
/* Optional: callback to run at finish to cleanup */
--
2.35.3
* [PULL 25/42] migration: cpr-transfer documentation
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Steve Sistare
From: Steve Sistare <steven.sistare@oracle.com>
Add documentation for the cpr-transfer migration mode.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-25-git-send-email-steven.sistare@oracle.com
[add -machine memory-backend=ram0]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
docs/devel/migration/CPR.rst | 184 ++++++++++++++++++++++++++++++++++-
1 file changed, 182 insertions(+), 2 deletions(-)
diff --git a/docs/devel/migration/CPR.rst b/docs/devel/migration/CPR.rst
index 63c36470cf..7897873c86 100644
--- a/docs/devel/migration/CPR.rst
+++ b/docs/devel/migration/CPR.rst
@@ -5,7 +5,7 @@ CPR is the umbrella name for a set of migration modes in which the
VM is migrated to a new QEMU instance on the same host. It is
intended for use when the goal is to update host software components
that run the VM, such as QEMU or even the host kernel. At this time,
-cpr-reboot is the only available mode.
+the cpr-reboot and cpr-transfer modes are available.
Because QEMU is restarted on the same host, with access to the same
local devices, CPR is allowed in certain cases where normal migration
@@ -53,7 +53,7 @@ RAM is copied to the migration URI.
Outgoing:
* Set the migration mode parameter to ``cpr-reboot``.
* Set the ``x-ignore-shared`` capability if desired.
- * Issue the ``migrate`` command. It is recommended the the URI be a
+ * Issue the ``migrate`` command. It is recommended the URI be a
``file`` type, but one can use other types such as ``exec``,
provided the command captures all the data from the outgoing side,
and provides all the data to the incoming side.
@@ -145,3 +145,183 @@ Caveats
cpr-reboot mode may not be used with postcopy, background-snapshot,
or COLO.
+
+cpr-transfer mode
+-----------------
+
+This mode allows the user to transfer a guest to a new QEMU instance
+on the same host with minimal guest pause time, by preserving guest
+RAM in place, albeit with new virtual addresses in new QEMU. Devices
+and their pinned memory pages will also be preserved in a future QEMU
+release.
+
+The user starts new QEMU on the same host as old QEMU, with command-
+line arguments to create the same machine, plus the ``-incoming``
+option for the main migration channel, like normal live migration.
+In addition, the user adds a second ``-incoming`` option with channel
+type ``cpr``. This CPR channel must support file descriptor transfer
+with SCM_RIGHTS, i.e. it must be a UNIX domain socket.
+
+To initiate CPR, the user issues a migrate command to old QEMU,
+adding a second migration channel of type ``cpr`` in the channels
+argument. Old QEMU stops the VM, saves state to the migration
+channels, and enters the postmigrate state. Execution resumes in
+new QEMU.
+
+New QEMU reads the CPR channel before opening a monitor, hence
+the CPR channel cannot be specified in the list of channels for a
+migrate-incoming command. It may only be specified on the command
+line.
+
+Usage
+^^^^^
+
+Memory backend objects must have the ``share=on`` attribute.
+
+The VM must be started with the ``-machine aux-ram-share=on``
+option. This causes implicit RAM blocks (those not described by
+a memory-backend object) to be allocated by mmap'ing a memfd.
+Examples include VGA and ROM.
+
+Outgoing:
+ * Set the migration mode parameter to ``cpr-transfer``.
+ * Issue the ``migrate`` command, containing a main channel and
+ a cpr channel.
+
+Incoming:
+ * Start new QEMU with two ``-incoming`` options.
+ * If the VM was running when the outgoing ``migrate`` command was
+ issued, then QEMU automatically resumes VM execution.
+
+Caveats
+^^^^^^^
+
+cpr-transfer mode may not be used with postcopy, background-snapshot,
+or COLO.
+
+memory-backend-epc is not supported.
+
+The main incoming migration channel address cannot be a file type.
+
+If the main incoming channel address is an inet socket, then the port
+cannot be 0 (meaning dynamically choose a port).
+
+When using ``-incoming defer``, you must issue the migrate command to
+old QEMU before issuing any monitor commands to new QEMU, because new
+QEMU blocks waiting to read from the cpr channel before starting its
+monitor, and old QEMU does not write to the channel until the migrate
+command is issued. However, new QEMU does not open and read the
+main migration channel until you issue the migrate incoming command.
+
+Example 1: incoming channel
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In these examples, we simply restart the same version of QEMU, but
+in a real scenario one would start new QEMU on the incoming side.
+Note that new QEMU does not print the monitor prompt until old QEMU
+has issued the migrate command. The outgoing side uses QMP because
+HMP cannot specify a CPR channel. Some QMP responses are omitted for
+brevity.
+
+::
+
+ Outgoing: Incoming:
+
+ # qemu-kvm -qmp stdio
+ -object memory-backend-file,id=ram0,size=4G,
+ mem-path=/dev/shm/ram0,share=on -m 4G
+ -machine memory-backend=ram0
+ -machine aux-ram-share=on
+ ...
+ # qemu-kvm -monitor stdio
+ -incoming tcp:0:44444
+ -incoming '{"channel-type": "cpr",
+ "addr": { "transport": "socket",
+ "type": "unix", "path": "cpr.sock"}}'
+ ...
+ {"execute":"qmp_capabilities"}
+
+ {"execute": "query-status"}
+ {"return": {"status": "running",
+ "running": true}}
+
+ {"execute":"migrate-set-parameters",
+ "arguments":{"mode":"cpr-transfer"}}
+
+ {"execute": "migrate", "arguments": { "channels": [
+ {"channel-type": "main",
+ "addr": { "transport": "socket", "type": "inet",
+ "host": "0", "port": "44444" }},
+ {"channel-type": "cpr",
+ "addr": { "transport": "socket", "type": "unix",
+ "path": "cpr.sock" }}]}}
+
+ QEMU 10.0.50 monitor
+ (qemu) info status
+ VM status: running
+
+ {"execute": "query-status"}
+ {"return": {"status": "postmigrate",
+ "running": false}}
+
+Example 2: incoming defer
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This example uses ``-incoming defer`` to hot plug a device before
+accepting the main migration channel. Again note you must issue the
+migrate command to old QEMU before you can issue any monitor
+commands to new QEMU.
+
+
+::
+
+ Outgoing: Incoming:
+
+ # qemu-kvm -monitor stdio
+ -object memory-backend-file,id=ram0,size=4G,
+ mem-path=/dev/shm/ram0,share=on -m 4G
+ -machine memory-backend=ram0
+ -machine aux-ram-share=on
+ ...
+ # qemu-kvm -monitor stdio
+ -incoming defer
+ -incoming '{"channel-type": "cpr",
+ "addr": { "transport": "socket",
+ "type": "unix", "path": "cpr.sock"}}'
+ ...
+ {"execute":"qmp_capabilities"}
+
+ {"execute": "device_add",
+ "arguments": {"driver": "pcie-root-port"}}
+
+ {"execute":"migrate-set-parameters",
+ "arguments":{"mode":"cpr-transfer"}}
+
+ {"execute": "migrate", "arguments": { "channels": [
+ {"channel-type": "main",
+ "addr": { "transport": "socket", "type": "inet",
+ "host": "0", "port": "44444" }},
+ {"channel-type": "cpr",
+ "addr": { "transport": "socket", "type": "unix",
+ "path": "cpr.sock" }}]}}
+
+ QEMU 10.0.50 monitor
+ (qemu) info status
+ VM status: paused (inmigrate)
+ (qemu) device_add pcie-root-port
+ (qemu) migrate_incoming tcp:0:44444
+ (qemu) info status
+ VM status: running
+
+ {"execute": "query-status"}
+ {"return": {"status": "postmigrate",
+ "running": false}}
+
+Futures
+^^^^^^^
+
+cpr-transfer mode is based on a capability to transfer open file
+descriptors from old to new QEMU. In the future, descriptors for
+vfio, iommufd, vhost, and char devices could be transferred,
+preserving those devices and their kernel state without interruption,
+even if they do not explicitly support live migration.
--
2.35.3
* [PULL 26/42] migration: Remove postcopy implications in should_send_vmdesc()
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
should_send_vmdesc() contains a hack that its name does not reflect: it
also checks the global postcopy state, and that affects the value it
returns.
It's easier to keep the helper simple by only checking the suppress-vmdesc
property. Then:
- On the sender side of its usage, there's already in_postcopy variable
that we can use: postcopy doesn't send vmdesc at all, so directly skip
everything for postcopy.
- On the recv side, vmdesc processing is only reached in precopy code,
hence the hack check never had any effect there anyway.
No functional change intended, except one trivial side effect: the QEMU
source no longer runs some JSON helpers in the postcopy path. That only
shortens the postcopy blackout window slightly and has no harmful side
effects.
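A simplified sketch of the resulting split of responsibilities, assuming
stand-in types: DemoMachine and the demo_* functions are hypothetical, and
only the decision logic is modelled, not the stream writes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the machine state; mirrors suppress-vmdesc. */
typedef struct {
    bool suppress_vmdesc;
} DemoMachine;

/* After this patch the helper consults only the machine property. */
static bool demo_should_send_vmdesc(const DemoMachine *m)
{
    return !m->suppress_vmdesc;
}

/*
 * Sender side: the caller already knows whether it is in postcopy, so it
 * skips the whole description block before the helper is even consulted.
 */
static bool demo_sender_emits_vmdesc(const DemoMachine *m, bool in_postcopy)
{
    if (in_postcopy) {
        return false;
    }
    return demo_should_send_vmdesc(m);
}
```

The combined result is the same truth table as before the patch; the
postcopy check simply moved from the helper to its one sender-side caller.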
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-2-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/savevm.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index 6e56d4cf1d..b8859d367f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1427,8 +1427,8 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
static bool should_send_vmdesc(void)
{
MachineState *machine = MACHINE(qdev_get_machine());
- bool in_postcopy = migration_in_postcopy();
- return !machine->suppress_vmdesc && !in_postcopy;
+
+ return !machine->suppress_vmdesc;
}
/*
@@ -1563,16 +1563,16 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
if (!in_postcopy) {
/* Postcopy stream will still be going */
qemu_put_byte(f, QEMU_VM_EOF);
- }
- json_writer_end_array(vmdesc);
- json_writer_end_object(vmdesc);
- vmdesc_len = strlen(json_writer_get(vmdesc));
+ json_writer_end_array(vmdesc);
+ json_writer_end_object(vmdesc);
+ vmdesc_len = strlen(json_writer_get(vmdesc));
- if (should_send_vmdesc()) {
- qemu_put_byte(f, QEMU_VM_VMDESCRIPTION);
- qemu_put_be32(f, vmdesc_len);
- qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
+ if (should_send_vmdesc()) {
+ qemu_put_byte(f, QEMU_VM_VMDESCRIPTION);
+ qemu_put_be32(f, vmdesc_len);
+ qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
+ }
}
/* Free it now to detect any inconsistencies. */
@@ -2965,6 +2965,7 @@ int qemu_loadvm_state(QEMUFile *f)
return ret;
}
+ /* When reaching here, it must be precopy */
if (ret == 0) {
ret = qemu_file_get_error(f);
}
--
2.35.3
* [PULL 27/42] migration: Do not construct JSON description if suppressed
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
The QEMU machine has a property "suppress-vmdesc". When it is enabled, QEMU
stops attaching the JSON VM description at the end of the precopy migration
stream (postcopy is never affected because postcopy never attaches it).
However, even if the user suppresses it, the source QEMU still constructs
the JSON description, which is a complete waste of CPU and memory
resources.
To avoid it, only create the JSON writer object if suppress-vmdesc is not
specified.
Luckily, vmstate_save() already supports vmdesc==NULL, so only a few spots
are left that need to be prepared for vmdesc being NULL.
While at it, move the init / destroy of the JSON writer object to the
start / end of the migration. The JSON writer object is a sub-struct of the
migration state, and it looks like the only object that was dynamically
allocated / destroyed within the migration process. Make it behave the
same as the rest of the objects that migration uses.
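The "NULL writer means skip" convention can be illustrated with a small
stand-alone sketch. DemoWriter and the demo_* functions are hypothetical
stand-ins, not QEMU's JSONWriter API; the counter merely makes the skipped
work observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the JSON writer; counts described devices. */
typedef struct {
    int fields_written;
} DemoWriter;

/* Allocate the writer once at migration start, only if it will be used. */
static DemoWriter *demo_writer_new_if_needed(bool suppress_vmdesc)
{
    if (suppress_vmdesc) {
        return NULL;
    }
    return calloc(1, sizeof(DemoWriter));   /* may be NULL on OOM */
}

/* Device state is always saved; the description only with a writer. */
static void demo_save_device(DemoWriter *w, int *devices_saved)
{
    (*devices_saved)++;
    if (w) {
        w->fields_written++;
    }
}
```

Because every caller tolerates a NULL writer, suppressing the description
costs nothing beyond one pointer check per call site.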
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-3-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 9 +++++---
migration/migration.h | 1 +
migration/savevm.c | 49 +++++++++++++++++++++++--------------------
3 files changed, 33 insertions(+), 26 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 88b09914ec..5c335cc30b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1431,8 +1431,8 @@ static void migrate_fd_cleanup(MigrationState *s)
g_free(s->hostname);
s->hostname = NULL;
- json_writer_free(s->vmdesc);
- s->vmdesc = NULL;
+
+ g_clear_pointer(&s->vmdesc, json_writer_free);
qemu_savevm_state_cleanup();
cpr_state_close();
@@ -1722,7 +1722,10 @@ int migrate_init(MigrationState *s, Error **errp)
s->migration_thread_running = false;
error_free(s->error);
s->error = NULL;
- s->vmdesc = NULL;
+
+ if (should_send_vmdesc()) {
+ s->vmdesc = json_writer_new(false);
+ }
migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
diff --git a/migration/migration.h b/migration/migration.h
index fb1b8f99d3..4c1fafc2b5 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -552,6 +552,7 @@ void migration_bitmap_sync_precopy(bool last_stage);
/* migration/block-dirty-bitmap.c */
void dirty_bitmap_mig_init(void);
+bool should_send_vmdesc(void);
/* migration/block-active.c */
void migration_block_active_setup(bool active);
diff --git a/migration/savevm.c b/migration/savevm.c
index b8859d367f..cfe9dfaf5c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1231,8 +1231,7 @@ void qemu_savevm_non_migratable_list(strList **reasons)
void qemu_savevm_state_header(QEMUFile *f)
{
MigrationState *s = migrate_get_current();
-
- s->vmdesc = json_writer_new(false);
+ JSONWriter *vmdesc = s->vmdesc;
trace_savevm_state_header();
qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
@@ -1241,16 +1240,21 @@ void qemu_savevm_state_header(QEMUFile *f)
if (s->send_configuration) {
qemu_put_byte(f, QEMU_VM_CONFIGURATION);
- /*
- * This starts the main json object and is paired with the
- * json_writer_end_object in
- * qemu_savevm_state_complete_precopy_non_iterable
- */
- json_writer_start_object(s->vmdesc, NULL);
+ if (vmdesc) {
+ /*
+ * This starts the main json object and is paired with the
+ * json_writer_end_object in
+ * qemu_savevm_state_complete_precopy_non_iterable
+ */
+ json_writer_start_object(vmdesc, NULL);
+ json_writer_start_object(vmdesc, "configuration");
+ }
- json_writer_start_object(s->vmdesc, "configuration");
- vmstate_save_state(f, &vmstate_configuration, &savevm_state, s->vmdesc);
- json_writer_end_object(s->vmdesc);
+ vmstate_save_state(f, &vmstate_configuration, &savevm_state, vmdesc);
+
+ if (vmdesc) {
+ json_writer_end_object(vmdesc);
+ }
}
}
@@ -1296,16 +1300,19 @@ int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
{
ERRP_GUARD();
MigrationState *ms = migrate_get_current();
+ JSONWriter *vmdesc = ms->vmdesc;
SaveStateEntry *se;
int ret = 0;
- json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
- json_writer_start_array(ms->vmdesc, "devices");
+ if (vmdesc) {
+ json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
+ json_writer_start_array(vmdesc, "devices");
+ }
trace_savevm_state_setup();
QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
if (se->vmsd && se->vmsd->early_setup) {
- ret = vmstate_save(f, se, ms->vmdesc, errp);
+ ret = vmstate_save(f, se, vmdesc, errp);
if (ret) {
migrate_set_error(ms, *errp);
qemu_file_set_error(f, ret);
@@ -1424,7 +1431,7 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
return all_finished;
}
-static bool should_send_vmdesc(void)
+bool should_send_vmdesc(void)
{
MachineState *machine = MACHINE(qdev_get_machine());
@@ -1564,21 +1571,17 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
/* Postcopy stream will still be going */
qemu_put_byte(f, QEMU_VM_EOF);
- json_writer_end_array(vmdesc);
- json_writer_end_object(vmdesc);
- vmdesc_len = strlen(json_writer_get(vmdesc));
+ if (vmdesc) {
+ json_writer_end_array(vmdesc);
+ json_writer_end_object(vmdesc);
+ vmdesc_len = strlen(json_writer_get(vmdesc));
- if (should_send_vmdesc()) {
qemu_put_byte(f, QEMU_VM_VMDESCRIPTION);
qemu_put_be32(f, vmdesc_len);
qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
}
}
- /* Free it now to detect any inconsistencies. */
- json_writer_free(vmdesc);
- ms->vmdesc = NULL;
-
trace_vmstate_downtime_checkpoint("src-non-iterable-saved");
return 0;
--
2.35.3
* [PULL 28/42] migration: Optimize postcopy on downtime by avoiding JSON writer
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
postcopy_start() is the entry point at which the switch to postcopy becomes
certain. It also means the QEMU source will not dump the VM description,
i.e. the JSON writer is garbage from this point on.
We could leave it to be cleaned up when migration completes. However, while
the JSON writer object is present, vmstate_save() will still construct the
JSON objects for the VM description, even though they will never be used
with postcopy.
To save those cycles, release the JSON writer earlier for postcopy. Later
vmstate_save() calls will then skip the JSON object construction
entirely. This can logically reduce downtime because all such JSON
construction happens during the postcopy blackout.
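As an illustration only, the effect of the early release can be modelled
with a small standalone sketch. All toy_* names and types below are
hypothetical stand-ins, not QEMU API: the writer is reduced to an opaque
allocation, and a counter stands in for the JSON work done per device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the JSON writer; its contents do not matter. */
typedef struct ToyWriter {
    int unused;
} ToyWriter;

/* Hypothetical stand-in for MigrationState, reduced to one field. */
typedef struct ToyMigState {
    ToyWriter *vmdesc;
} ToyMigState;

/* Counts how many JSON description steps were performed. */
static int toy_json_ops;

ToyWriter *toy_writer_new(void)
{
    return calloc(1, sizeof(ToyWriter));
}

/* Free the writer and reset the pointer, in the spirit of g_clear_pointer(). */
void toy_cleanup_json_writer(ToyMigState *s)
{
    free(s->vmdesc);
    s->vmdesc = NULL;
}

/* Save one device; describe it in JSON only while a writer is present. */
void toy_vmstate_save(ToyMigState *s)
{
    if (s->vmdesc) {
        toy_json_ops++;
    }
    /* The device state itself would be written to the stream here. */
}

/* Save n devices and return how many JSON steps that cost. */
int toy_save_devices(ToyMigState *s, int n)
{
    int before = toy_json_ops;

    assert(s != NULL);
    for (int i = 0; i < n; i++) {
        toy_vmstate_save(s);
    }
    return toy_json_ops - before;
}
```

Calling toy_cleanup_json_writer() a second time is harmless in this model,
which matches having both an early release and a final cleanup.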
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-4-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 5c335cc30b..a9fe9c2821 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1422,6 +1422,11 @@ void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
}
}
+static void migration_cleanup_json_writer(MigrationState *s)
+{
+ g_clear_pointer(&s->vmdesc, json_writer_free);
+}
+
static void migrate_fd_cleanup(MigrationState *s)
{
MigrationEventType type;
@@ -1429,11 +1434,11 @@ static void migrate_fd_cleanup(MigrationState *s)
trace_migrate_fd_cleanup();
+ migration_cleanup_json_writer(s);
+
g_free(s->hostname);
s->hostname = NULL;
- g_clear_pointer(&s->vmdesc, json_writer_free);
-
qemu_savevm_state_cleanup();
cpr_state_close();
migrate_hup_delete(s);
@@ -2628,6 +2633,14 @@ static int postcopy_start(MigrationState *ms, Error **errp)
uint64_t bandwidth = migrate_max_postcopy_bandwidth();
int cur_state = MIGRATION_STATUS_ACTIVE;
+ /*
+ * Now we're 100% sure to switch to postcopy, so JSON writer won't be
+ * useful anymore. Free the resources early if it is there. Clearing
+ * the vmdesc also means any follow up vmstate_save()s will start to
+ * skip all JSON operations, which can shrink postcopy downtime.
+ */
+ migration_cleanup_json_writer(ms);
+
if (migrate_postcopy_preempt()) {
migration_wait_main_channel(ms);
if (postcopy_preempt_establish_channel(ms)) {
--
2.35.3
* [PULL 29/42] migration: Avoid two src-downtime-end tracepoints for postcopy
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
Postcopy can trigger the src-downtime-end tracepoint twice, but only the
first one is valid. Avoid triggering the second one, just as we already
avoid recording the total downtime twice.
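A minimal sketch of the guard, assuming a downtime of 0 means "not recorded
yet". The struct and function names are hypothetical, and a counter stands
in for the tracepoint.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields migration_downtime_end() touches. */
typedef struct ToyMigState {
    int64_t downtime_start;
    int64_t downtime;     /* 0 means "not recorded yet" */
    int trace_count;      /* stands in for the src-downtime-end tracepoint */
} ToyMigState;

/* Record the downtime and emit the tracepoint on the first call only. */
void toy_downtime_end(ToyMigState *s, int64_t now)
{
    assert(now >= s->downtime_start);
    if (!s->downtime) {
        s->downtime = now - s->downtime_start;
        s->trace_count++;
    }
}
```

A second call leaves both the recorded downtime and the counter untouched.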
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-5-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index a9fe9c2821..07b6b730b7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -129,9 +129,8 @@ static void migration_downtime_end(MigrationState *s)
*/
if (!s->downtime) {
s->downtime = now - s->downtime_start;
+ trace_vmstate_downtime_checkpoint("src-downtime-end");
}
-
- trace_vmstate_downtime_checkpoint("src-downtime-end");
}
static bool migration_needs_multiple_sockets(void)
--
2.35.3
* [PULL 30/42] migration: Drop inactivate_disk param in qemu_savevm_state_complete*
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
The inactivate_disks parameter is only used by one caller, which is the
genuine precopy completion path (migration_completion_precopy()).
The parameter was introduced in a1fbe750fd ("migration: Fix race of image
locking between src and dst") to make sure the disks are inactivated
before EOF is sent, so that the destination is always able to activate
them properly. However, there is no limit on how early we inactivate the
disks. For the precopy completion path, we can do that at any point once
the VM is stopped.
Move the disk inactivation there, so that the inactivate_disks parameter
can be removed from the whole call stack: all the remaining callers always
pass in false.
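The resulting call shape can be sketched as follows. This is a simplified
model under stated assumptions: the toy_* names are hypothetical, the COLO
check and the block-layer result are plain booleans, and "sending EOF" is a
flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the source side at precopy completion. */
typedef struct ToySource {
    bool colo;              /* stands in for migrate_colo() */
    bool inactivate_works;  /* outcome of the disk inactivation */
    bool disks_inactive;
    bool eof_sent;
} ToySource;

/* Inactivate the disks; returns false on failure. */
bool toy_block_inactivate(ToySource *s)
{
    if (s->inactivate_works) {
        s->disks_inactive = true;
    }
    return s->inactivate_works;
}

/* The completion helper no longer knows about disks; it ends the stream. */
int toy_complete_precopy(ToySource *s)
{
    s->eof_sent = true;
    return 0;
}

/* The single caller that cares inactivates first, then completes. */
int toy_completion_precopy(ToySource *s)
{
    assert(!s->eof_sent);
    if (!s->colo) {
        if (!toy_block_inactivate(s)) {
            return -EFAULT;
        }
    }
    return toy_complete_precopy(s);
}
```

In this model a failed inactivation returns before EOF is sent, and COLO
skips the inactivation altogether.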
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-6-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 24 +++++++++++++++++-------
migration/savevm.c | 27 +++++----------------------
migration/savevm.h | 5 ++---
3 files changed, 24 insertions(+), 32 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 07b6b730b7..d8a6bc12e0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2682,7 +2682,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
* Cause any non-postcopiable, but iterative devices to
* send out their final data.
*/
- qemu_savevm_state_complete_precopy(ms->to_dst_file, true, false);
+ qemu_savevm_state_complete_precopy(ms->to_dst_file, true);
/*
* in Finish migrate and with the io-lock held everything should
@@ -2727,7 +2727,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
*/
qemu_savevm_send_postcopy_listen(fb);
- qemu_savevm_state_complete_precopy(fb, false, false);
+ qemu_savevm_state_complete_precopy(fb, false);
if (migrate_postcopy_ram()) {
qemu_savevm_send_ping(fb, 3);
}
@@ -2859,11 +2859,21 @@ static int migration_completion_precopy(MigrationState *s,
goto out_unlock;
}
- migration_rate_set(RATE_LIMIT_DISABLED);
-
/* Inactivate disks except in COLO */
- ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
- !migrate_colo());
+ if (!migrate_colo()) {
+ /*
+ * Inactivate before sending QEMU_VM_EOF so that the
+ * bdrv_activate_all() on the other end won't fail.
+ */
+ if (!migration_block_inactivate()) {
+ ret = -EFAULT;
+ goto out_unlock;
+ }
+ }
+
+ migration_rate_set(RATE_LIMIT_DISABLED);
+
+ ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false);
out_unlock:
bql_unlock();
return ret;
@@ -3744,7 +3754,7 @@ static void *bg_migration_thread(void *opaque)
* save their state to channel-buffer along with devices.
*/
cpu_synchronize_all_states();
- if (qemu_savevm_state_complete_precopy_non_iterable(fb, false, false)) {
+ if (qemu_savevm_state_complete_precopy_non_iterable(fb, false)) {
goto fail;
}
/*
diff --git a/migration/savevm.c b/migration/savevm.c
index cfe9dfaf5c..5e56a5d9fc 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1521,8 +1521,7 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
}
int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
- bool in_postcopy,
- bool inactivate_disks)
+ bool in_postcopy)
{
MigrationState *ms = migrate_get_current();
int64_t start_ts_each, end_ts_each;
@@ -1553,20 +1552,6 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
end_ts_each - start_ts_each);
}
- if (inactivate_disks) {
- /*
- * Inactivate before sending QEMU_VM_EOF so that the
- * bdrv_activate_all() on the other end won't fail.
- */
- if (!migration_block_inactivate()) {
- error_setg(&local_err, "%s: bdrv_inactivate_all() failed",
- __func__);
- migrate_set_error(ms, local_err);
- error_report_err(local_err);
- qemu_file_set_error(f, -EFAULT);
- return -1;
- }
- }
if (!in_postcopy) {
/* Postcopy stream will still be going */
qemu_put_byte(f, QEMU_VM_EOF);
@@ -1587,8 +1572,7 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
return 0;
}
-int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
- bool inactivate_disks)
+int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
{
int ret;
Error *local_err = NULL;
@@ -1613,8 +1597,7 @@ int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
goto flush;
}
- ret = qemu_savevm_state_complete_precopy_non_iterable(f, in_postcopy,
- inactivate_disks);
+ ret = qemu_savevm_state_complete_precopy_non_iterable(f, in_postcopy);
if (ret) {
return ret;
}
@@ -1717,7 +1700,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
ret = qemu_file_get_error(f);
if (ret == 0) {
- qemu_savevm_state_complete_precopy(f, false, false);
+ qemu_savevm_state_complete_precopy(f, false);
ret = qemu_file_get_error(f);
}
if (ret != 0) {
@@ -1743,7 +1726,7 @@ cleanup:
void qemu_savevm_live_state(QEMUFile *f)
{
/* save QEMU_VM_SECTION_END section */
- qemu_savevm_state_complete_precopy(f, true, false);
+ qemu_savevm_state_complete_precopy(f, true);
qemu_put_byte(f, QEMU_VM_EOF);
}
diff --git a/migration/savevm.h b/migration/savevm.h
index 9ec96a995c..c48a53e95e 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -39,8 +39,7 @@ void qemu_savevm_state_header(QEMUFile *f);
int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
void qemu_savevm_state_cleanup(void);
void qemu_savevm_state_complete_postcopy(QEMUFile *f);
-int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
- bool inactivate_disks);
+int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only);
void qemu_savevm_state_pending_exact(uint64_t *must_precopy,
uint64_t *can_postcopy);
void qemu_savevm_state_pending_estimate(uint64_t *must_precopy,
@@ -68,6 +67,6 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
int qemu_load_device_state(QEMUFile *f);
int qemu_loadvm_approve_switchover(void);
int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
- bool in_postcopy, bool inactivate_disks);
+ bool in_postcopy);
#endif
--
2.35.3
* [PULL 31/42] migration: Synchronize all CPU states only for non-iterable dump
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
Do a one-shot CPU sync in qemu_savevm_state_complete_precopy_non_iterable(),
instead of coding it separately in two places.
Note that in the context of qemu_savevm_state_complete_precopy(), this
patch is also an optimization for the postcopy path, in that we avoid
syncing the CPUs twice during switchover: before this patch, postcopy_start()
invokes qemu_savevm_state_complete_precopy() twice, and each call syncs
the CPU state. In reality, one sync is enough.
For background snapshot, there's no intended functional change.
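To make the counting argument concrete, here is a toy model of the control
flow after this patch. The toy_* names are hypothetical, and in_postcopy is
passed as a parameter here purely for simplicity (the real function queries
it internally).

```c
#include <stdbool.h>

/* Counts CPU synchronizations performed during one switchover. */
static int toy_sync_count;

void toy_cpu_sync(void)
{
    toy_sync_count++;
}

/* Flush the remaining iterable data; needs no CPU sync. */
void toy_complete_iterable(void)
{
}

/* After the patch, the sync lives with the non-iterable save. */
void toy_complete_non_iterable(void)
{
    toy_cpu_sync();
    /* Non-iterable device state would be saved here. */
}

/* Mirrors the branch structure of the precopy completion wrapper. */
void toy_complete_precopy(bool in_postcopy, bool iterable_only)
{
    if (!in_postcopy || iterable_only) {
        toy_complete_iterable();
    }
    if (iterable_only) {
        return;
    }
    toy_complete_non_iterable();
}

/* The two calls made during postcopy switchover; returns the sync count. */
int toy_postcopy_switchover_syncs(void)
{
    toy_sync_count = 0;
    toy_complete_precopy(false, true);  /* first call: iterable only */
    toy_complete_precopy(true, false);  /* second call: non-iterable */
    return toy_sync_count;
}
```

With the sync placed at the top of the wrapper instead, the same two calls
would have synchronized twice.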
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-7-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 6 +-----
migration/savevm.c | 5 +++--
2 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index d8a6bc12e0..46e30a4814 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3749,11 +3749,7 @@ static void *bg_migration_thread(void *opaque)
if (migration_stop_vm(s, RUN_STATE_PAUSED)) {
goto fail;
}
- /*
- * Put vCPUs in sync with shadow context structures, then
- * save their state to channel-buffer along with devices.
- */
- cpu_synchronize_all_states();
+
if (qemu_savevm_state_complete_precopy_non_iterable(fb, false)) {
goto fail;
}
diff --git a/migration/savevm.c b/migration/savevm.c
index 5e56a5d9fc..92e77ca92b 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1531,6 +1531,9 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
Error *local_err = NULL;
int ret;
+ /* Making sure cpu states are synchronized before saving non-iterable */
+ cpu_synchronize_all_states();
+
QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
if (se->vmsd && se->vmsd->early_setup) {
/* Already saved during qemu_savevm_state_setup(). */
@@ -1584,8 +1587,6 @@ int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
trace_savevm_state_complete_precopy();
- cpu_synchronize_all_states();
-
if (!in_postcopy || iterable_only) {
ret = qemu_savevm_state_complete_precopy_iterable(f, in_postcopy);
if (ret) {
--
2.35.3
* [PULL 32/42] migration: Adjust postcopy bandwidth during switchover
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
Precopy always uses unlimited bandwidth during switchover. That makes sense
because this phase is critical and no one wants to throttle bandwidth
during the VM blackout.
Postcopy, surprisingly, did not do that. One line in the middle of the
postcopy switchover applies the user-specified max-postcopy-bandwidth, and
its placement partway through the switchover is strange.
This patch makes both modes always use unlimited bandwidth for switchover,
and applies the postcopy max bandwidth only after the switchover has
completed.
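The intended ordering can be sketched with a toy channel that counts
switchover writes issued while a limit is in force. Everything named toy_*
or TOY_* is hypothetical; in particular the value chosen for "unlimited" is
an assumption of this sketch, not taken from QEMU.

```c
#include <stdint.h>

/* Assumed marker for "no limit"; stands in for RATE_LIMIT_DISABLED. */
#define TOY_RATE_UNLIMITED 0

/* Hypothetical channel with a rate limit and a throttling counter. */
typedef struct ToyChannel {
    uint64_t rate;          /* current limit, or TOY_RATE_UNLIMITED */
    int throttled_writes;   /* switchover writes sent under a limit */
} ToyChannel;

void toy_rate_set(ToyChannel *c, uint64_t rate)
{
    c->rate = rate;
}

/* One write issued during the blackout window. */
void toy_switchover_write(ToyChannel *c)
{
    if (c->rate != TOY_RATE_UNLIMITED) {
        c->throttled_writes++;
    }
}

/* Unlimited for the whole switchover, user limit only afterwards. */
uint64_t toy_postcopy_start(ToyChannel *c, uint64_t max_postcopy_bw)
{
    toy_rate_set(c, TOY_RATE_UNLIMITED);  /* switchover phase begins */
    toy_switchover_write(c);              /* final iterable data */
    toy_switchover_write(c);              /* packaged device state */
    toy_rate_set(c, max_postcopy_bw);     /* postcopy officially started */
    return c->rate;
}
```

No write in the blackout window is throttled, while the limit in force when
the function returns is the postcopy one.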
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-8-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 46e30a4814..03e3631d5b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2629,7 +2629,6 @@ static int postcopy_start(MigrationState *ms, Error **errp)
int ret;
QIOChannelBuffer *bioc;
QEMUFile *fb;
- uint64_t bandwidth = migrate_max_postcopy_bandwidth();
int cur_state = MIGRATION_STATUS_ACTIVE;
/*
@@ -2678,6 +2677,9 @@ static int postcopy_start(MigrationState *ms, Error **errp)
goto fail;
}
+ /* Switchover phase, switch to unlimited */
+ migration_rate_set(RATE_LIMIT_DISABLED);
+
/*
* Cause any non-postcopiable, but iterative devices to
* send out their final data.
@@ -2694,12 +2696,6 @@ static int postcopy_start(MigrationState *ms, Error **errp)
ram_postcopy_send_discard_bitmap(ms);
}
- /*
- * send rest of state - note things that are doing postcopy
- * will notice we're in POSTCOPY_ACTIVE and not actually
- * wrap their state up here
- */
- migration_rate_set(bandwidth);
if (migrate_postcopy_ram()) {
/* Ping just for debugging, helps line traces up */
qemu_savevm_send_ping(ms->to_dst_file, 2);
@@ -2783,6 +2779,12 @@ static int postcopy_start(MigrationState *ms, Error **errp)
}
trace_postcopy_preempt_enabled(migrate_postcopy_preempt());
+ /*
+ * Now postcopy officially started, switch to postcopy bandwidth that
+ * user specified.
+ */
+ migration_rate_set(migrate_max_postcopy_bandwidth());
+
return ret;
fail_closefb:
--
2.35.3
* [PULL 33/42] migration: Adjust locking in migration_maybe_pause()
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
In migration_maybe_pause() QEMU may yield the BQL before waiting for a
semaphore. However, it yields the BQL too early, which logically gives the
main thread a chance to quickly take the BQL and change the state to
CANCELLING.
To prevent this race condition entirely, always update the migration
state with the BQL held. That makes sure no concurrent cancellation can
happen during those updates.
With that, IIUC there's a chance we can remove the extra parameter in
migration_maybe_pause() that updates the active state, but that will be
done separately later.
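A single-threaded model of the new ordering is sketched below. It assumes
pause-before-switchover is enabled and replaces the BQL with a boolean so
that unlocked state updates can be counted. All toy_* names are
hypothetical; the semaphore wait is only a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum { TOY_ACTIVE, TOY_PRE_SWITCHOVER, TOY_DEVICE, TOY_CANCELLING };

/* Hypothetical migration state with a flag modelling BQL ownership. */
typedef struct ToyMigState {
    int state;
    bool bql_held;
    int unlocked_updates;   /* state updates attempted without the lock */
} ToyMigState;

void toy_bql_lock(ToyMigState *s)
{
    s->bql_held = true;
}

void toy_bql_unlock(ToyMigState *s)
{
    s->bql_held = false;
}

/* Compare-and-set: only moves the state if it still equals old_state. */
void toy_set_state(ToyMigState *s, int old_state, int new_state)
{
    if (!s->bql_held) {
        s->unlocked_updates++;
    }
    if (s->state == old_state) {
        s->state = new_state;
    }
}

/* Called with the lock held; drops it only around the wait. */
int toy_maybe_pause(ToyMigState *s, int new_state)
{
    assert(s->bql_held);
    if (s->state != TOY_CANCELLING) {
        toy_set_state(s, TOY_ACTIVE, TOY_PRE_SWITCHOVER);
        toy_bql_unlock(s);
        /* The wait on the pause semaphore would block here. */
        toy_bql_lock(s);
        toy_set_state(s, TOY_PRE_SWITCHOVER, new_state);
    }
    return s->state == new_state ? 0 : -EINVAL;
}
```

Both state transitions happen with the lock held, and the function returns
to its caller with the lock still held.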
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-9-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 03e3631d5b..4e4bf8ffed 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2828,14 +2828,14 @@ static int migration_maybe_pause(MigrationState *s,
* wait for the 'pause_sem' semaphore.
*/
if (s->state != MIGRATION_STATUS_CANCELLING) {
- bql_unlock();
migrate_set_state(&s->state, *current_active_state,
MIGRATION_STATUS_PRE_SWITCHOVER);
+ bql_unlock();
qemu_sem_wait(&s->pause_sem);
+ bql_lock();
migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER,
new_state);
*current_active_state = new_state;
- bql_lock();
}
return s->state == new_state ? 0 : -EINVAL;
--
2.35.3
* [PULL 34/42] migration: Drop cached migration state in migration_maybe_pause()
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
I can't see why we must still cache the state now that the possible CANCEL
race is avoided: that is the only thing I can think of that can modify the
migration state concurrently with the migration thread itself. Make all
the state updates happen unconditionally, so that we no longer need to
cache the state.
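The difference between a cached old state and the current state can be seen
in a small compare-and-set sketch. The helper below is a hypothetical,
non-atomic simplification that returns whether the transition happened.

```c
#include <stdbool.h>

enum { TOY_ACTIVE, TOY_POSTCOPY_ACTIVE, TOY_FAILED };

/*
 * Move *state to new_state only if it currently equals old_state.
 * Returns true when the transition was applied.
 */
bool toy_set_state(int *state, int old_state, int new_state)
{
    if (*state != old_state) {
        return false;
    }
    *state = new_state;
    return true;
}
```

Passing a stale cached value as old_state makes the update a no-op, whereas
passing the current value of the state makes it apply every time, which is
what dropping the cache relies on.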
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-10-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 27 ++++++++-------------------
1 file changed, 8 insertions(+), 19 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 4e4bf8ffed..5a3d0750ec 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -105,9 +105,7 @@ static MigrationIncomingState *current_incoming;
static GSList *migration_blockers[MIG_MODE__MAX];
static bool migration_object_check(MigrationState *ms, Error **errp);
-static int migration_maybe_pause(MigrationState *s,
- int *current_active_state,
- int new_state);
+static int migration_maybe_pause(MigrationState *s, int new_state);
static void migrate_fd_cancel(MigrationState *s);
static bool close_return_path_on_source(MigrationState *s);
static void migration_completion_end(MigrationState *s);
@@ -2629,7 +2627,6 @@ static int postcopy_start(MigrationState *ms, Error **errp)
int ret;
QIOChannelBuffer *bioc;
QEMUFile *fb;
- int cur_state = MIGRATION_STATUS_ACTIVE;
/*
* Now we're 100% sure to switch to postcopy, so JSON writer won't be
@@ -2664,8 +2661,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
goto fail;
}
- ret = migration_maybe_pause(ms, &cur_state,
- MIGRATION_STATUS_POSTCOPY_ACTIVE);
+ ret = migration_maybe_pause(ms, MIGRATION_STATUS_POSTCOPY_ACTIVE);
if (ret < 0) {
error_setg_errno(errp, -ret, "%s: Failed in migration_maybe_pause()",
__func__);
@@ -2803,9 +2799,7 @@ fail:
* migrate_pause_before_switchover called with the BQL locked
* Returns: 0 on success
*/
-static int migration_maybe_pause(MigrationState *s,
- int *current_active_state,
- int new_state)
+static int migration_maybe_pause(MigrationState *s, int new_state)
{
if (!migrate_pause_before_switchover()) {
return 0;
@@ -2828,21 +2822,19 @@ static int migration_maybe_pause(MigrationState *s,
* wait for the 'pause_sem' semaphore.
*/
if (s->state != MIGRATION_STATUS_CANCELLING) {
- migrate_set_state(&s->state, *current_active_state,
+ migrate_set_state(&s->state, s->state,
MIGRATION_STATUS_PRE_SWITCHOVER);
bql_unlock();
qemu_sem_wait(&s->pause_sem);
bql_lock();
migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER,
new_state);
- *current_active_state = new_state;
}
return s->state == new_state ? 0 : -EINVAL;
}
-static int migration_completion_precopy(MigrationState *s,
- int *current_active_state)
+static int migration_completion_precopy(MigrationState *s)
{
int ret;
@@ -2855,8 +2847,7 @@ static int migration_completion_precopy(MigrationState *s,
}
}
- ret = migration_maybe_pause(s, current_active_state,
- MIGRATION_STATUS_DEVICE);
+ ret = migration_maybe_pause(s, MIGRATION_STATUS_DEVICE);
if (ret < 0) {
goto out_unlock;
}
@@ -2909,11 +2900,10 @@ static void migration_completion_postcopy(MigrationState *s)
static void migration_completion(MigrationState *s)
{
int ret = 0;
- int current_active_state = s->state;
Error *local_err = NULL;
if (s->state == MIGRATION_STATUS_ACTIVE) {
- ret = migration_completion_precopy(s, &current_active_state);
+ ret = migration_completion_precopy(s);
} else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
migration_completion_postcopy(s);
} else {
@@ -2953,8 +2943,7 @@ fail:
error_free(local_err);
}
- migrate_set_state(&s->state, current_active_state,
- MIGRATION_STATUS_FAILED);
+ migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
}
/**
--
2.35.3
* [PULL 35/42] migration: Take BQL slightly longer in postcopy_start()
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
This paves the way for a follow-up patch to modify migration states at the
end of postcopy_start(), which is better done with the BQL held so that
concurrent cancellation is not possible.
So we'll do slightly more work with the BQL held, but it is really trivial;
hopefully nothing will really change with this.
A side benefit is that we can drop another explicit bql_lock() in the
failure path.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-11-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 5a3d0750ec..4ba6c8912a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2753,8 +2753,6 @@ static int postcopy_start(MigrationState *ms, Error **errp)
migration_downtime_end(ms);
- bql_unlock();
-
if (migrate_postcopy_ram()) {
/*
* Although this ping is just for debug, it could potentially be
@@ -2770,7 +2768,6 @@ static int postcopy_start(MigrationState *ms, Error **errp)
ret = qemu_file_get_error(ms->to_dst_file);
if (ret) {
error_setg_errno(errp, -ret, "postcopy_start: Migration stream error");
- bql_lock();
goto fail;
}
trace_postcopy_preempt_enabled(migrate_postcopy_preempt());
@@ -2781,6 +2778,8 @@ static int postcopy_start(MigrationState *ms, Error **errp)
*/
migration_rate_set(migrate_max_postcopy_bandwidth());
+ bql_unlock();
+
return ret;
fail_closefb:
--
2.35.3
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PULL 36/42] migration: Notify COMPLETE once for postcopy
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
` (34 preceding siblings ...)
2025-01-29 16:00 ` [PULL 35/42] migration: Take BQL slightly longer in postcopy_start() Fabiano Rosas
@ 2025-01-29 16:00 ` Fabiano Rosas
2025-01-29 16:00 ` [PULL 37/42] migration: Unwrap qemu_savevm_state_complete_precopy() in postcopy Fabiano Rosas
` (6 subsequent siblings)
42 siblings, 0 replies; 47+ messages in thread
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
Postcopy invokes qemu_savevm_state_complete_precopy() twice, which means
it sends the COMPLETE notification twice, and also fires the tracepoint
that marks precopy completion twice.
Move that notification (along with the tracepoint) out to the caller, so
that postcopy only notifies once, right at the start of the switchover
phase from precopy. While at it, rename the tracepoint to suit the file it
now lives in.
For precopy, there should be no functional change except that the
tracepoint has a new name.
For the other two users of qemu_savevm_state_complete_precopy(), namely
qemu_savevm_state() and qemu_savevm_live_state(), the notifier shouldn't
matter because they're not precopy at all. In these two contexts (aka
"savevm" and "colo") the precopy notifiers will sometimes still be
invoked, but that's outside the scope of this patch.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-12-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 15 +++++++++++++++
migration/savevm.c | 7 -------
migration/trace-events | 2 +-
3 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 4ba6c8912a..72802d6133 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -131,6 +131,17 @@ static void migration_downtime_end(MigrationState *s)
}
}
+static void precopy_notify_complete(void)
+{
+ Error *local_err = NULL;
+
+ if (precopy_notify(PRECOPY_NOTIFY_COMPLETE, &local_err)) {
+ error_report_err(local_err);
+ }
+
+ trace_migration_precopy_complete();
+}
+
static bool migration_needs_multiple_sockets(void)
{
return migrate_multifd() || migrate_postcopy_preempt();
@@ -2676,6 +2687,8 @@ static int postcopy_start(MigrationState *ms, Error **errp)
/* Switchover phase, switch to unlimited */
migration_rate_set(RATE_LIMIT_DISABLED);
+ precopy_notify_complete();
+
/*
* Cause any non-postcopiable, but iterative devices to
* send out their final data.
@@ -2865,6 +2878,8 @@ static int migration_completion_precopy(MigrationState *s)
migration_rate_set(RATE_LIMIT_DISABLED);
+ precopy_notify_complete();
+
ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false);
out_unlock:
bql_unlock();
diff --git a/migration/savevm.c b/migration/savevm.c
index 92e77ca92b..9aef2fa3c9 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1578,15 +1578,8 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
{
int ret;
- Error *local_err = NULL;
bool in_postcopy = migration_in_postcopy();
- if (precopy_notify(PRECOPY_NOTIFY_COMPLETE, &local_err)) {
- error_report_err(local_err);
- }
-
- trace_savevm_state_complete_precopy();
-
if (!in_postcopy || iterable_only) {
ret = qemu_savevm_state_complete_precopy_iterable(f, in_postcopy);
if (ret) {
diff --git a/migration/trace-events b/migration/trace-events
index e03a914afb..12b262f8ee 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -44,7 +44,6 @@ savevm_state_resume_prepare(void) ""
savevm_state_header(void) ""
savevm_state_iterate(void) ""
savevm_state_cleanup(void) ""
-savevm_state_complete_precopy(void) ""
vmstate_save(const char *idstr, const char *vmsd_name) "%s, %s"
vmstate_load(const char *idstr, const char *vmsd_name) "%s, %s"
vmstate_downtime_save(const char *type, const char *idstr, uint32_t instance_id, int64_t downtime) "type=%s idstr=%s instance_id=%d downtime=%"PRIi64
@@ -195,6 +194,7 @@ migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidt
process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
process_incoming_migration_co_postcopy_end_main(void) ""
postcopy_preempt_enabled(bool value) "%d"
+migration_precopy_complete(void) ""
# migration-stats
migration_transferred_bytes(uint64_t qemu_file, uint64_t multifd, uint64_t rdma) "qemu_file %" PRIu64 " multifd %" PRIu64 " RDMA %" PRIu64
--
2.35.3
* [PULL 37/42] migration: Unwrap qemu_savevm_state_complete_precopy() in postcopy
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
Postcopy has invoked qemu_savevm_state_complete_precopy() twice for a long
time, and that has caused far too much confusion. Let's clean this up and
make postcopy easier to read.
It's actually fairly straightforward: postcopy starts by saving the
non-postcopiable iterables, then later saves the non-iterable state
only. Moving these two calls out makes everything much easier to follow.
Otherwise it's very unclear what qemu_savevm_state_complete_precopy() did
in either of the calls.
No functional change intended.
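The unwrapped flow can be sketched as two explicit steps, each with its own
error report. The toy_* names and the ToyResult type are hypothetical, and
the failure flags exist only so that each path can be exercised.

```c
#include <stddef.h>

/* Hypothetical result: return code plus the error message, if any. */
typedef struct ToyResult {
    int ret;
    const char *err;
} ToyResult;

/* Step 1: flush non-postcopiable iterable data. */
int toy_save_iterable(int fail)
{
    return fail ? -1 : 0;
}

/* Step 2: save non-iterable device state into the packaged buffer. */
int toy_save_non_iterable(int fail)
{
    return fail ? -1 : 0;
}

/* Postcopy start calling the two steps directly, in order. */
ToyResult toy_postcopy_start(int fail_iterable, int fail_non_iterable)
{
    ToyResult r = { 0, NULL };

    r.ret = toy_save_iterable(fail_iterable);
    if (r.ret) {
        r.err = "Postcopy save non-postcopiable iterables failed";
        return r;
    }

    /* Discard bitmap and packaged-buffer setup would happen here. */

    r.ret = toy_save_non_iterable(fail_non_iterable);
    if (r.ret) {
        r.err = "Postcopy save non-iterable device states failed";
        return r;
    }
    return r;
}
```

Each call site now names the step it performs, and a failure in either step
is reported instead of being silently ignored.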
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-13-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 13 +++++++++++--
migration/savevm.c | 1 -
migration/savevm.h | 1 +
3 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 72802d6133..d29f7448bd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2693,7 +2693,11 @@ static int postcopy_start(MigrationState *ms, Error **errp)
* Cause any non-postcopiable, but iterative devices to
* send out their final data.
*/
- qemu_savevm_state_complete_precopy(ms->to_dst_file, true);
+ ret = qemu_savevm_state_complete_precopy_iterable(ms->to_dst_file, true);
+ if (ret) {
+ error_setg(errp, "Postcopy save non-postcopiable iterables failed");
+ goto fail;
+ }
/*
* in Finish migrate and with the io-lock held everything should
@@ -2732,7 +2736,12 @@ static int postcopy_start(MigrationState *ms, Error **errp)
*/
qemu_savevm_send_postcopy_listen(fb);
- qemu_savevm_state_complete_precopy(fb, false);
+ ret = qemu_savevm_state_complete_precopy_non_iterable(fb, true);
+ if (ret) {
+ error_setg(errp, "Postcopy save non-iterable device states failed");
+ goto fail_closefb;
+ }
+
if (migrate_postcopy_ram()) {
qemu_savevm_send_ping(fb, 3);
}
diff --git a/migration/savevm.c b/migration/savevm.c
index 9aef2fa3c9..0ddc4c8eb5 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1477,7 +1477,6 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
qemu_fflush(f);
}
-static
int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
{
int64_t start_ts_each, end_ts_each;
diff --git a/migration/savevm.h b/migration/savevm.h
index c48a53e95e..7957460062 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -44,6 +44,7 @@ void qemu_savevm_state_pending_exact(uint64_t *must_precopy,
uint64_t *can_postcopy);
void qemu_savevm_state_pending_estimate(uint64_t *must_precopy,
uint64_t *can_postcopy);
+int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy);
void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
void qemu_savevm_send_open_return_path(QEMUFile *f);
int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
--
2.35.3
* [PULL 38/42] migration: Cleanup qemu_savevm_state_complete_precopy()
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
Now that qemu_savevm_state_complete_precopy() is never used in postcopy,
clean it up, since in_postcopy is now unconditionally false.
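For illustration only (this is not part of the patch): a minimal,
self-contained sketch of the control flow that remains after the cleanup.
The stage functions, call_log and the injected return values are
hypothetical stand-ins for the real QEMU functions, which take a QEMUFile
and do the actual work.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical call recorder: 'I' = iterable, 'N' = non-iterable, 'F' = flush */
static char call_log[8];
static int call_count;

/* Return values injected by the caller to simulate failures */
static int iterable_ret;
static int non_iterable_ret;

static void log_call(char c)
{
    if (call_count < (int)sizeof(call_log) - 1) {
        call_log[call_count++] = c;
        call_log[call_count] = '\0';
    }
}

/* Stand-in for qemu_savevm_state_complete_precopy_iterable() */
static int complete_iterable(bool in_postcopy)
{
    (void)in_postcopy;
    log_call('I');
    return iterable_ret;
}

/* Stand-in for qemu_savevm_state_complete_precopy_non_iterable() */
static int complete_non_iterable(bool in_postcopy)
{
    (void)in_postcopy;
    log_call('N');
    return non_iterable_ret;
}

/* Stand-in for qemu_fflush() */
static int flush_stream(void)
{
    log_call('F');
    return 0;
}

/* Clears the recorder and sets the simulated return values */
static void reset_model(int it_ret, int non_it_ret)
{
    memset(call_log, 0, sizeof(call_log));
    call_count = 0;
    iterable_ret = it_ret;
    non_iterable_ret = non_it_ret;
}

/* Mirrors the simplified flow: in_postcopy is always false here */
static int complete_precopy_model(bool iterable_only)
{
    int ret;

    ret = complete_iterable(false);
    if (ret) {
        return ret;
    }

    if (!iterable_only) {
        ret = complete_non_iterable(false);
        if (ret) {
            return ret;
        }
    }

    return flush_stream();
}
```

In this model the iterable stage always runs first, the non-iterable
stage runs only when iterable_only is false, and an error from either
stage returns before the flush.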
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-14-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/savevm.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index 0ddc4c8eb5..bc375db282 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1577,25 +1577,19 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
{
int ret;
- bool in_postcopy = migration_in_postcopy();
- if (!in_postcopy || iterable_only) {
- ret = qemu_savevm_state_complete_precopy_iterable(f, in_postcopy);
+ ret = qemu_savevm_state_complete_precopy_iterable(f, false);
+ if (ret) {
+ return ret;
+ }
+
+ if (!iterable_only) {
+ ret = qemu_savevm_state_complete_precopy_non_iterable(f, false);
if (ret) {
return ret;
}
}
- if (iterable_only) {
- goto flush;
- }
-
- ret = qemu_savevm_state_complete_precopy_non_iterable(f, in_postcopy);
- if (ret) {
- return ret;
- }
-
-flush:
return qemu_fflush(f);
}
--
2.35.3
* [PULL 39/42] migration: Always set DEVICE state
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Xu, Jiri Denemark, Daniel P . Berrangé,
Dr . David Alan Gilbert, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
DEVICE state was introduced back in 2017:
https://lore.kernel.org/qemu-devel/20171020090556.18631-1-dgilbert@redhat.com/
Quoting from Dave's cover letter: when the pre-switchover capability is
enabled, the state transitions look like this:
The precopy flow is:
active->pre-switchover->device->completed
The postcopy flow is:
active->pre-switchover->postcopy-active->completed
To supplement the above, when the capability is not enabled:
The precopy flow is:
active->completed
The postcopy flow is:
active->postcopy-active->completed
It works for us, though we have some code just to special-case these state
transitions, so the DEVICE state is currently specific to precopy, and only
conditionally.
I had a quick discussion with the Libvirt developers, and it turns out that
this may not be necessary. IOW, it seems okay to make the DEVICE state
generic, so that we don't have over-complicated state machines. This not
only helps align all the migration state machines and clean up the code
paths, especially the pre-switchover handling (see the patch itself); as a
side benefit, we also unconditionally get a specific state that marks the
switchover phase, which might be helpful for debugging too.
This patch makes the DEVICE state always present, marking that the source
QEMU is switching over. The state machine will then always be as simple
as:
active -> [pre-switchover ->] device -> [postcopy-active ->] completed
After the change, no matter whether pre-switchover or postcopy is enabled,
we always have the DEVICE state showing the switchover phase. When
pre-switchover is enabled, we'll have an extra stage before that. When
postcopy is enabled, we'll have an extra stage after that.
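As a rough illustration (not part of the patch), the unified state flow
described above could be modelled as below. The helper
expected_state_flow() is hypothetical; it only builds the sequence of
status names a management application might expect to observe on the
source side, given the two optional stages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not a QEMU API: writes the expected source-side
 * state sequence into buf, with states separated by "->".
 * The longest possible result is 58 characters plus the terminator.
 */
static const char *expected_state_flow(bool pause_before_switchover,
                                       bool postcopy,
                                       char *buf, size_t len)
{
    assert(buf != NULL && len >= 64);

    /* "device" is always present; the other two stages are optional */
    snprintf(buf, len, "active%s->device%s->completed",
             pause_before_switchover ? "->pre-switchover" : "",
             postcopy ? "->postcopy-active" : "");
    return buf;
}
```

With both flags false the helper yields the shortest flow,
active->device->completed, which is the extra "device" event that the
iotest outputs touched by this patch gain.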
A few tests in the QEMU tree need touching up for this change:
- A few iotest outputs (194, 203, 234, 262, 280)
- Teach libqos's migrate() on "device" state
Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-15-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 82 +++++++++++++++++++++++--------------
qapi/migration.json | 7 +++-
tests/qemu-iotests/194.out | 1 +
tests/qemu-iotests/203.out | 1 +
tests/qemu-iotests/234.out | 2 +
tests/qemu-iotests/262.out | 1 +
tests/qemu-iotests/280.out | 1 +
tests/qtest/libqos/libqos.c | 3 +-
8 files changed, 64 insertions(+), 34 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index d29f7448bd..5302b7b91b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -105,7 +105,7 @@ static MigrationIncomingState *current_incoming;
static GSList *migration_blockers[MIG_MODE__MAX];
static bool migration_object_check(MigrationState *ms, Error **errp);
-static int migration_maybe_pause(MigrationState *s, int new_state);
+static bool migration_switchover_start(MigrationState *s);
static void migrate_fd_cancel(MigrationState *s);
static bool close_return_path_on_source(MigrationState *s);
static void migration_completion_end(MigrationState *s);
@@ -2657,11 +2657,6 @@ static int postcopy_start(MigrationState *ms, Error **errp)
}
}
- if (!migrate_pause_before_switchover()) {
- migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
- MIGRATION_STATUS_POSTCOPY_ACTIVE);
- }
-
trace_postcopy_start();
bql_lock();
trace_postcopy_start_set_run();
@@ -2672,10 +2667,8 @@ static int postcopy_start(MigrationState *ms, Error **errp)
goto fail;
}
- ret = migration_maybe_pause(ms, MIGRATION_STATUS_POSTCOPY_ACTIVE);
- if (ret < 0) {
- error_setg_errno(errp, -ret, "%s: Failed in migration_maybe_pause()",
- __func__);
+ if (!migration_switchover_start(ms)) {
+ error_setg(errp, "migration_switchover_start() failed");
goto fail;
}
@@ -2800,6 +2793,10 @@ static int postcopy_start(MigrationState *ms, Error **errp)
*/
migration_rate_set(migrate_max_postcopy_bandwidth());
+ /* Now, switchover looks all fine, switching to postcopy-active */
+ migrate_set_state(&ms->state, MIGRATION_STATUS_DEVICE,
+ MIGRATION_STATUS_POSTCOPY_ACTIVE);
+
bql_unlock();
return ret;
@@ -2816,14 +2813,39 @@ fail:
}
/**
- * migration_maybe_pause: Pause if required to by
- * migrate_pause_before_switchover called with the BQL locked
- * Returns: 0 on success
+ * @migration_switchover_start: Start VM switchover procedure
+ *
+ * @s: The migration state object pointer
+ *
+ * Prepares for the switchover, depending on "pause-before-switchover"
+ * capability.
+ *
+ * If cap set, state machine goes like:
+ * [postcopy-]active -> pre-switchover -> device
+ *
+ * If cap not set:
+ * [postcopy-]active -> device
+ *
+ * Returns: true on success, false on interruptions.
*/
-static int migration_maybe_pause(MigrationState *s, int new_state)
+static bool migration_switchover_start(MigrationState *s)
{
+ /* Concurrent cancellation? Quit */
+ if (s->state == MIGRATION_STATUS_CANCELLING) {
+ return false;
+ }
+
+ /*
+ * No matter precopy or postcopy, since we still hold BQL it must not
+ * change concurrently to CANCELLING, so it must be either ACTIVE or
+ * POSTCOPY_ACTIVE.
+ */
+ assert(migration_is_active());
+
+ /* If the pre stage not requested, directly switch to DEVICE */
if (!migrate_pause_before_switchover()) {
- return 0;
+ migrate_set_state(&s->state, s->state, MIGRATION_STATUS_DEVICE);
+ return true;
}
/* Since leaving this state is not atomic with posting the semaphore
@@ -2836,23 +2858,22 @@ static int migration_maybe_pause(MigrationState *s, int new_state)
/* This block intentionally left blank */
}
+ /* Update [POSTCOPY_]ACTIVE to PRE_SWITCHOVER */
+ migrate_set_state(&s->state, s->state, MIGRATION_STATUS_PRE_SWITCHOVER);
+ bql_unlock();
+
+ qemu_sem_wait(&s->pause_sem);
+
+ bql_lock();
/*
- * If the migration is cancelled when it is in the completion phase,
- * the migration state is set to MIGRATION_STATUS_CANCELLING.
- * So we don't need to wait a semaphore, otherwise we would always
- * wait for the 'pause_sem' semaphore.
+ * After BQL released and retaken, the state can be CANCELLING if it
+ * happened during sem_wait(). Only change the state if it's still
+ * pre-switchover.
*/
- if (s->state != MIGRATION_STATUS_CANCELLING) {
- migrate_set_state(&s->state, s->state,
- MIGRATION_STATUS_PRE_SWITCHOVER);
- bql_unlock();
- qemu_sem_wait(&s->pause_sem);
- bql_lock();
- migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER,
- new_state);
- }
+ migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER,
+ MIGRATION_STATUS_DEVICE);
- return s->state == new_state ? 0 : -EINVAL;
+ return s->state == MIGRATION_STATUS_DEVICE;
}
static int migration_completion_precopy(MigrationState *s)
@@ -2868,8 +2889,7 @@ static int migration_completion_precopy(MigrationState *s)
}
}
- ret = migration_maybe_pause(s, MIGRATION_STATUS_DEVICE);
- if (ret < 0) {
+ if (!migration_switchover_start(s)) {
goto out_unlock;
}
diff --git a/qapi/migration.json b/qapi/migration.json
index 4679ce9f2a..43babd1df4 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -158,8 +158,11 @@
#
# @pre-switchover: Paused before device serialisation. (since 2.11)
#
-# @device: During device serialisation when pause-before-switchover is
-# enabled (since 2.11)
+# @device: During device serialisation (also known as switchover phase).
+# Before 9.2, this is only used when (1) in precopy, and (2) when
+# pre-switchover capability is enabled. After 10.0, this state will
+# always be present for every migration procedure as the switchover
+# phase. (since 2.11)
#
# @wait-unplug: wait for device unplug request by guest OS to be
# completed. (since 4.2)
diff --git a/tests/qemu-iotests/194.out b/tests/qemu-iotests/194.out
index 376ed1d2e6..6940e809cd 100644
--- a/tests/qemu-iotests/194.out
+++ b/tests/qemu-iotests/194.out
@@ -14,6 +14,7 @@ Starting migration...
{"return": {}}
{"data": {"status": "setup"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "active"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"status": "device"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
Gracefully ending the `drive-mirror` job on source...
{"return": {}}
diff --git a/tests/qemu-iotests/203.out b/tests/qemu-iotests/203.out
index 9d4abba8c5..8e58705e51 100644
--- a/tests/qemu-iotests/203.out
+++ b/tests/qemu-iotests/203.out
@@ -8,4 +8,5 @@ Starting migration...
{"return": {}}
{"data": {"status": "setup"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "active"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"status": "device"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
diff --git a/tests/qemu-iotests/234.out b/tests/qemu-iotests/234.out
index ac8b64350c..be3e138b58 100644
--- a/tests/qemu-iotests/234.out
+++ b/tests/qemu-iotests/234.out
@@ -10,6 +10,7 @@ Starting migration to B...
{"return": {}}
{"data": {"status": "setup"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "active"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"status": "device"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "active"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
@@ -27,6 +28,7 @@ Starting migration back to A...
{"return": {}}
{"data": {"status": "setup"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "active"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"status": "device"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "active"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
diff --git a/tests/qemu-iotests/262.out b/tests/qemu-iotests/262.out
index b8a2d3598d..bd7706b84b 100644
--- a/tests/qemu-iotests/262.out
+++ b/tests/qemu-iotests/262.out
@@ -8,6 +8,7 @@ Starting migration to B...
{"return": {}}
{"data": {"status": "setup"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "active"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"status": "device"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "active"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
diff --git a/tests/qemu-iotests/280.out b/tests/qemu-iotests/280.out
index 546dbb4a68..37411144ca 100644
--- a/tests/qemu-iotests/280.out
+++ b/tests/qemu-iotests/280.out
@@ -7,6 +7,7 @@ Enabling migration QMP events on VM...
{"return": {}}
{"data": {"status": "setup"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "active"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"status": "device"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
VM is now stopped:
diff --git a/tests/qtest/libqos/libqos.c b/tests/qtest/libqos/libqos.c
index 5c0fa1f7c5..28a0901a0a 100644
--- a/tests/qtest/libqos/libqos.c
+++ b/tests/qtest/libqos/libqos.c
@@ -117,13 +117,14 @@ void migrate(QOSState *from, QOSState *to, const char *uri)
g_assert(qdict_haskey(sub, "status"));
st = qdict_get_str(sub, "status");
- /* "setup", "active", "completed", "failed", "cancelled" */
+ /* "setup", "active", "device", "completed", "failed", "cancelled" */
if (strcmp(st, "completed") == 0) {
qobject_unref(rsp);
break;
}
if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0)
+ || (strcmp(st, "device") == 0)
|| (strcmp(st, "wait-unplug") == 0)) {
qobject_unref(rsp);
g_usleep(5000);
--
2.35.3
* [PULL 40/42] migration: Merge precopy/postcopy on switchover start
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
Now after all the cleanups, finally we can merge the switchover startup
phase into one single function for precopy/postcopy.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-16-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 62 ++++++++++++++++++++++---------------------
1 file changed, 32 insertions(+), 30 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 5302b7b91b..74c50cc72c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -105,7 +105,7 @@ static MigrationIncomingState *current_incoming;
static GSList *migration_blockers[MIG_MODE__MAX];
static bool migration_object_check(MigrationState *ms, Error **errp);
-static bool migration_switchover_start(MigrationState *s);
+static bool migration_switchover_start(MigrationState *s, Error **errp);
static void migrate_fd_cancel(MigrationState *s);
static bool close_return_path_on_source(MigrationState *s);
static void migration_completion_end(MigrationState *s);
@@ -2667,21 +2667,10 @@ static int postcopy_start(MigrationState *ms, Error **errp)
goto fail;
}
- if (!migration_switchover_start(ms)) {
- error_setg(errp, "migration_switchover_start() failed");
+ if (!migration_switchover_start(ms, errp)) {
goto fail;
}
- if (!migration_block_inactivate()) {
- error_setg(errp, "%s: Failed in bdrv_inactivate_all()", __func__);
- goto fail;
- }
-
- /* Switchover phase, switch to unlimited */
- migration_rate_set(RATE_LIMIT_DISABLED);
-
- precopy_notify_complete();
-
/*
* Cause any non-postcopiable, but iterative devices to
* send out their final data.
@@ -2813,7 +2802,7 @@ fail:
}
/**
- * @migration_switchover_start: Start VM switchover procedure
+ * @migration_switchover_prepare: Start VM switchover procedure
*
* @s: The migration state object pointer
*
@@ -2828,7 +2817,7 @@ fail:
*
* Returns: true on success, false on interruptions.
*/
-static bool migration_switchover_start(MigrationState *s)
+static bool migration_switchover_prepare(MigrationState *s)
{
/* Concurrent cancellation? Quit */
if (s->state == MIGRATION_STATUS_CANCELLING) {
@@ -2876,21 +2865,13 @@ static bool migration_switchover_start(MigrationState *s)
return s->state == MIGRATION_STATUS_DEVICE;
}
-static int migration_completion_precopy(MigrationState *s)
+static bool migration_switchover_start(MigrationState *s, Error **errp)
{
- int ret;
+ ERRP_GUARD();
- bql_lock();
-
- if (!migrate_mode_is_cpr(s)) {
- ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
- if (ret < 0) {
- goto out_unlock;
- }
- }
-
- if (!migration_switchover_start(s)) {
- goto out_unlock;
+ if (!migration_switchover_prepare(s)) {
+ error_setg(errp, "Switchover is interrupted");
+ return false;
}
/* Inactivate disks except in COLO */
@@ -2900,8 +2881,8 @@ static int migration_completion_precopy(MigrationState *s)
* bdrv_activate_all() on the other end won't fail.
*/
if (!migration_block_inactivate()) {
- ret = -EFAULT;
- goto out_unlock;
+ error_setg(errp, "Block inactivate failed during switchover");
+ return false;
}
}
@@ -2909,6 +2890,27 @@ static int migration_completion_precopy(MigrationState *s)
precopy_notify_complete();
+ return true;
+}
+
+static int migration_completion_precopy(MigrationState *s)
+{
+ int ret;
+
+ bql_lock();
+
+ if (!migrate_mode_is_cpr(s)) {
+ ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
+ if (ret < 0) {
+ goto out_unlock;
+ }
+ }
+
+ if (!migration_switchover_start(s, NULL)) {
+ ret = -EFAULT;
+ goto out_unlock;
+ }
+
ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false);
out_unlock:
bql_unlock();
--
2.35.3
* [PULL 41/42] migration: Trivial cleanup on JSON writer of vmstate_save()
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Jiri Denemark, Juraj Marcin
From: Peter Xu <peterx@redhat.com>
Two small cleanups in the same section of vmstate_save():
- Check vmdesc before the "mixed null/non-null data in array" logic, to
be crystal clear that it's only about the JSON writer, not the vmstate on
its own in the migration stream.
- Since we have is_null variable now, use that to replace a check.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-17-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/vmstate.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/migration/vmstate.c b/migration/vmstate.c
index 82bd005a83..047a52af89 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -459,6 +459,8 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
}
/*
+ * This logic only matters when dumping VM Desc.
+ *
* Due to the fake nullptr handling above, if there's mixed
* null/non-null data, it doesn't make sense to emit a
* compressed array representation spanning the entire array
@@ -466,7 +468,7 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
* vs. nullptr). Search ahead for the next null/non-null element
* and start a new compressed array if found.
*/
- if (field->flags & VMS_ARRAY_OF_POINTER &&
+ if (vmdesc && (field->flags & VMS_ARRAY_OF_POINTER) &&
is_null != is_prev_null) {
is_prev_null = is_null;
@@ -504,7 +506,7 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
written_bytes);
/* If we used a fake temp field.. free it now */
- if (inner_field != field) {
+ if (is_null) {
g_clear_pointer((gpointer *)&inner_field, g_free);
}
--
2.35.3
* [PULL 42/42] migration: refactor ram_save_target_page functions
From: Fabiano Rosas @ 2025-01-29 16:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Prasad Pandit
From: Prasad Pandit <pjp@fedoraproject.org>
Merge the legacy and multifd variants of ram_save_target_page
into a single function. Besides simplifying the code, this
leaves the 'migration_ops' object unused, so it is
removed.
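For illustration only (not part of the patch): a small model of the
order in which the merged function picks a path for one page. The enum
names and choose_page_path() are hypothetical stand-ins; in particular,
zero_page_saved stands for "save_zero_page() returned non-zero" and
control_handled for "control_save_page() returned true".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the zero-page-detection setting */
enum zero_page_mode {
    ZERO_MODE_NONE,
    ZERO_MODE_LEGACY,
    ZERO_MODE_MULTIFD,
};

/* Which branch of the merged function ends up handling the page */
enum page_path {
    PATH_ZERO_PAGE,      /* sent as a zero page from the migration thread */
    PATH_MULTIFD,        /* queued to the multifd workers */
    PATH_CONTROL,        /* handled by the control path */
    PATH_RAM_SAVE_PAGE,  /* plain ram_save_page() */
};

static enum page_path choose_page_path(bool multifd,
                                       enum zero_page_mode mode,
                                       bool zero_page_saved,
                                       bool control_handled)
{
    /* Zero-page check runs without multifd, or with legacy detection */
    if (!multifd || mode == ZERO_MODE_LEGACY) {
        if (zero_page_saved) {
            return PATH_ZERO_PAGE;
        }
    }

    /* With multifd, everything else goes to the workers */
    if (multifd) {
        return PATH_MULTIFD;
    }

    if (control_handled) {
        return PATH_CONTROL;
    }

    return PATH_RAM_SAVE_PAGE;
}
```

One thing the model makes visible: in the non-multifd case the zero-page
check now comes before the control path, whereas the removed legacy
function tried control_save_page() first.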
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20250127120823.144949-3-ppandit@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/ram.c | 67 +++++++++++++------------------------------------
1 file changed, 17 insertions(+), 50 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 5aace00bf1..6f460fd22d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -448,13 +448,6 @@ void ram_transferred_add(uint64_t bytes)
}
}
-struct MigrationOps {
- int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss);
-};
-typedef struct MigrationOps MigrationOps;
-
-MigrationOps *migration_ops;
-
static int ram_save_host_page_urgent(PageSearchStatus *pss);
/* NOTE: page is the PFN not real ram_addr_t. */
@@ -1960,55 +1953,36 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
}
/**
- * ram_save_target_page_legacy: save one target page
- *
- * Returns the number of pages written
+ * ram_save_target_page: save one target page to the precopy thread
+ * OR to multifd workers.
*
* @rs: current RAM state
* @pss: data about the page we want to send
*/
-static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
+static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
{
ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
int res;
+ if (!migrate_multifd()
+ || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
+ if (save_zero_page(rs, pss, offset)) {
+ return 1;
+ }
+ }
+
+ if (migrate_multifd()) {
+ RAMBlock *block = pss->block;
+ return ram_save_multifd_page(block, offset);
+ }
+
if (control_save_page(pss, offset, &res)) {
return res;
}
- if (save_zero_page(rs, pss, offset)) {
- return 1;
- }
-
return ram_save_page(rs, pss);
}
-/**
- * ram_save_target_page_multifd: send one target page to multifd workers
- *
- * Returns 1 if the page was queued, -1 otherwise.
- *
- * @rs: current RAM state
- * @pss: data about the page we want to send
- */
-static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
-{
- RAMBlock *block = pss->block;
- ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
-
- /*
- * While using multifd live migration, we still need to handle zero
- * page checking on the migration main thread.
- */
- if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
- if (save_zero_page(rs, pss, offset)) {
- return 1;
- }
- }
-
- return ram_save_multifd_page(block, offset);
-}
-
/* Should be called before sending a host page */
static void pss_host_page_prepare(PageSearchStatus *pss)
{
@@ -2095,7 +2069,7 @@ static int ram_save_host_page_urgent(PageSearchStatus *pss)
if (page_dirty) {
/* Be strict to return code; it must be 1, or what else? */
- if (migration_ops->ram_save_target_page(rs, pss) != 1) {
+ if (ram_save_target_page(rs, pss) != 1) {
error_report_once("%s: ram_save_target_page failed", __func__);
ret = -1;
goto out;
@@ -2164,7 +2138,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
if (preempt_active) {
qemu_mutex_unlock(&rs->bitmap_mutex);
}
- tmppages = migration_ops->ram_save_target_page(rs, pss);
+ tmppages = ram_save_target_page(rs, pss);
if (tmppages >= 0) {
pages += tmppages;
/*
@@ -2362,8 +2336,6 @@ static void ram_save_cleanup(void *opaque)
xbzrle_cleanup();
multifd_ram_save_cleanup();
ram_state_cleanup(rsp);
- g_free(migration_ops);
- migration_ops = NULL;
}
static void ram_state_reset(RAMState *rs)
@@ -3029,13 +3001,8 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
return ret;
}
- migration_ops = g_malloc0(sizeof(MigrationOps));
-
if (migrate_multifd()) {
multifd_ram_save_setup();
- migration_ops->ram_save_target_page = ram_save_target_page_multifd;
- } else {
- migration_ops->ram_save_target_page = ram_save_target_page_legacy;
}
/*
--
2.35.3
* Re: [PULL 00/42] Migration patches for 2025-01-29
From: Stefan Hajnoczi @ 2025-02-01 3:03 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, Peter Xu
Applied, thanks.
Please update the changelog at https://wiki.qemu.org/ChangeLog/10.0 for any user-visible changes.
* Re: [PULL 17/42] migration: cpr-transfer mode
From: Peter Maydell @ 2025-02-04 13:40 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, Peter Xu, Steve Sistare, Markus Armbruster
On Wed, 29 Jan 2025 at 16:11, Fabiano Rosas <farosas@suse.de> wrote:
>
> From: Steve Sistare <steven.sistare@oracle.com>
>
> Add the cpr-transfer migration mode, which allows the user to transfer
> a guest to a new QEMU instance on the same host with minimal guest pause
> time, by preserving guest RAM in place, albeit with new virtual addresses
> in new QEMU, and by preserving device file descriptors. Pages that were
> locked in memory for DMA in old QEMU remain locked in new QEMU, because the
> descriptor of the device that locked them remains open.
>
> cpr-transfer preserves memory and device descriptors by sending them to
> new QEMU over a unix domain socket using SCM_RIGHTS. Such CPR state cannot
> be sent over the normal migration channel, because devices and backends
> are created prior to reading the channel, so this mode sends CPR state
> over a second "cpr" migration channel. New QEMU reads the cpr channel
> prior to creating devices or backends. The user specifies the cpr channel
> in the channel arguments on the outgoing side, and in a second -incoming
> command-line parameter on the incoming side.
>
> The user must start old QEMU with the '-machine aux-ram-share=on' option,
> which allows anonymous memory to be transferred in place to the new process
> by transferring a memory descriptor for each ram block. Memory-backend
> objects must have the share=on attribute, but memory-backend-epc is not
> supported.
>
> The user starts new QEMU on the same host as old QEMU, with command-line
> arguments to create the same machine, plus the -incoming option for the
> main migration channel, like normal live migration. In addition, the user
> adds a second -incoming option with channel type "cpr". This CPR channel
> must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a
> UNIX domain socket.
>
> To initiate CPR, the user issues a migrate command to old QEMU, adding
> a second migration channel of type "cpr" in the channels argument.
> Old QEMU stops the VM, saves state to the migration channels, and enters
> the postmigrate state. New QEMU mmap's memory descriptors, and execution
> resumes.
>
> The implementation splits qmp_migrate into start and finish functions.
> Start sends CPR state to new QEMU, which responds by closing the CPR
> channel. Old QEMU detects the HUP then calls finish, which connects the
> main migration channel.
>
> In summary, the usage is:
>
> qemu-system-$arch -machine aux-ram-share=on ...
>
> start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"
>
> Issue commands to old QEMU:
> migrate_set_parameter mode cpr-transfer
>
> {"execute": "migrate", ...
> {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Acked-by: Markus Armbruster <armbru@redhat.com>
> Link: https://lore.kernel.org/r/1736967650-129648-17-git-send-email-steven.sistare@oracle.com
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
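
For readers unfamiliar with the SCM_RIGHTS mechanism that the commit message above relies on, here is a minimal standalone sketch of passing one file descriptor across a UNIX domain socket. It is not QEMU code: send_fd(), recv_fd() and demo_roundtrip() are hypothetical names made up for illustration, and the round trip happens inside a single process over a socketpair rather than between an old and a new QEMU.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one descriptor over a connected UNIX socket. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';    /* one ordinary data byte travels with the descriptor */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;               /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(sock >= 0 && fd >= 0);
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;           /* payload is an array of fds */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor, or return -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;      /* a new descriptor number referring to the same open file */
}

/*
 * Pass the read end of a pipe through a socketpair, then read a byte
 * through the received descriptor. Returns 0 if the byte arrives intact.
 * Descriptors are closed only on the success path to keep the sketch short.
 */
static int demo_roundtrip(void)
{
    int sv[2], p[2], got;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(p) < 0) {
        return -1;
    }
    if (send_fd(sv[0], p[0]) < 0) {
        return -1;
    }
    got = recv_fd(sv[1]);
    if (got < 0) {
        return -1;
    }
    if (write(p[1], "x", 1) != 1 || read(got, &c, 1) != 1) {
        return -1;
    }
    close(sv[0]);
    close(sv[1]);
    close(p[0]);
    close(p[1]);
    close(got);
    return c == 'x' ? 0 : -1;
}
```

The received descriptor generally has a different number from the one sent, but it refers to the same open file description, which is why the cpr channel has to be a UNIX domain socket rather than, say, TCP.
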
Hi; this commit includes some code that has confused
Coverity (CID 1590980) and it also confused me, so maybe
it could be usefully made clearer?
> void qmp_migrate(const char *uri, bool has_channels,
> MigrationChannelList *channels, bool has_detach, bool detach,
> bool has_resume, bool resume, Error **errp)
> @@ -2056,6 +2118,7 @@ void qmp_migrate(const char *uri, bool has_channels,
> g_autoptr(MigrationChannel) channel = NULL;
> MigrationAddress *addr = NULL;
> MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL };
> + MigrationChannel *cpr_channel = NULL;
>
> /*
> * Having preliminary checks for uri and channel
> @@ -2076,6 +2139,7 @@ void qmp_migrate(const char *uri, bool has_channels,
> }
> channelv[type] = channels->value;
> }
> + cpr_channel = channelv[MIGRATION_CHANNEL_TYPE_CPR];
> addr = channelv[MIGRATION_CHANNEL_TYPE_MAIN]->addr;
> if (!addr) {
> error_setg(errp, "Channel list has no main entry");
> @@ -2096,12 +2160,52 @@ void qmp_migrate(const char *uri, bool has_channels,
> return;
> }
>
> + if (s->parameters.mode == MIG_MODE_CPR_TRANSFER && !cpr_channel) {
> + error_setg(errp, "missing 'cpr' migration channel");
> + return;
> + }
Here in qmp_migrate() we bail out if cpr_channel is NULL,
provided that s->parameters.mode is MIG_MODE_CPR_TRANSFER...
> +
> resume_requested = has_resume && resume;
> if (!migrate_prepare(s, resume_requested, errp)) {
> /* Error detected, put into errp */
> return;
> }
>
> + if (cpr_state_save(cpr_channel, &local_err)) {
...but in cpr_state_save() when we decide whether to dereference
cpr_channel or not, we aren't checking s->parameters.mode,
we call migrate_mode() and check the result of that.
And migrate_mode() isn't completely trivial: it calls
cpr_get_incoming_mode(), so it's not obvious that it's
necessarily going to be the same value as s->parameters.mode.
So Coverity complains that it sees a code path where we
might dereference cpr_channel even when it's NULL.
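
To make the path concrete, here is a toy model of the two checks. Every name in it is hypothetical and none of it is QEMU code: "configured" stands in for s->parameters.mode, and "derived" stands in for whatever migrate_mode() returns, under the assumption (the one the analyzer has to make) that the two are allowed to differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are not the QEMU types. */
typedef enum { MODE_NORMAL, MODE_CPR_TRANSFER } Mode;
typedef struct { const char *addr; } Channel;

/* Caller-side guard: a NULL channel is rejected only when the
 * configured mode is CPR transfer. */
bool caller_guard_passes(Mode configured, const Channel *ch)
{
    return !(configured == MODE_CPR_TRANSFER && ch == NULL);
}

/* Callee-side decision: the channel is dereferenced when the
 * derived mode is CPR transfer. */
bool callee_would_deref(Mode derived)
{
    return derived == MODE_CPR_TRANSFER;
}

/* True when the guard lets a NULL channel through and the callee
 * would still dereference it. */
bool null_deref_path(Mode configured, Mode derived, const Channel *ch)
{
    return ch == NULL &&
           caller_guard_passes(configured, ch) &&
           callee_would_deref(derived);
}

/* The three cases of interest. */
void demo(void)
{
    Channel cpr = { "unix:/tmp/cpr.sock" };  /* made-up address */

    /* Predicates disagree: the path the analyzer reports. */
    assert(null_deref_path(MODE_NORMAL, MODE_CPR_TRANSFER, NULL));
    /* Predicates agree: the guard has already rejected the call. */
    assert(!null_deref_path(MODE_CPR_TRANSFER, MODE_CPR_TRANSFER, NULL));
    /* Channel present: nothing to report. */
    assert(!null_deref_path(MODE_NORMAL, MODE_CPR_TRANSFER, &cpr));
}
```

Whether the first case can happen in practice depends on whether the derived mode can really differ from the configured one on the outgoing side; the model only shows why the code does not make that obvious.
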
Could this be made a bit clearer somehow, do you think?
thanks
-- PMM
* Re: [PULL 17/42] migration: cpr-transfer mode
2025-02-04 13:40 ` Peter Maydell
@ 2025-02-04 16:26 ` Peter Xu
2025-02-04 16:52 ` Steven Sistare
0 siblings, 1 reply; 47+ messages in thread
From: Peter Xu @ 2025-02-04 16:26 UTC (permalink / raw)
To: Peter Maydell, Steve Sistare
Cc: Fabiano Rosas, qemu-devel, Steve Sistare, Markus Armbruster
On Tue, Feb 04, 2025 at 01:40:34PM +0000, Peter Maydell wrote:
> On Wed, 29 Jan 2025 at 16:11, Fabiano Rosas <farosas@suse.de> wrote:
> >
> > From: Steve Sistare <steven.sistare@oracle.com>
> >
> > Add the cpr-transfer migration mode, which allows the user to transfer
> > a guest to a new QEMU instance on the same host with minimal guest pause
> > time, by preserving guest RAM in place, albeit with new virtual addresses
> > in new QEMU, and by preserving device file descriptors. Pages that were
> > locked in memory for DMA in old QEMU remain locked in new QEMU, because the
> > descriptor of the device that locked them remains open.
> >
> > cpr-transfer preserves memory and device descriptors by sending them to
> > new QEMU over a unix domain socket using SCM_RIGHTS. Such CPR state cannot
> > be sent over the normal migration channel, because devices and backends
> > are created prior to reading the channel, so this mode sends CPR state
> > over a second "cpr" migration channel. New QEMU reads the cpr channel
> > prior to creating devices or backends. The user specifies the cpr channel
> > in the channel arguments on the outgoing side, and in a second -incoming
> > command-line parameter on the incoming side.
> >
> > The user must start old QEMU with the '-machine aux-ram-share=on' option,
> > which allows anonymous memory to be transferred in place to the new process
> > by transferring a memory descriptor for each ram block. Memory-backend
> > objects must have the share=on attribute, but memory-backend-epc is not
> > supported.
> >
> > The user starts new QEMU on the same host as old QEMU, with command-line
> > arguments to create the same machine, plus the -incoming option for the
> > main migration channel, like normal live migration. In addition, the user
> > adds a second -incoming option with channel type "cpr". This CPR channel
> > must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a
> > UNIX domain socket.
> >
> > To initiate CPR, the user issues a migrate command to old QEMU, adding
> > a second migration channel of type "cpr" in the channels argument.
> > Old QEMU stops the VM, saves state to the migration channels, and enters
> > the postmigrate state. New QEMU mmap's memory descriptors, and execution
> > resumes.
> >
> > The implementation splits qmp_migrate into start and finish functions.
> > Start sends CPR state to new QEMU, which responds by closing the CPR
> > channel. Old QEMU detects the HUP then calls finish, which connects the
> > main migration channel.
> >
> > In summary, the usage is:
> >
> > qemu-system-$arch -machine aux-ram-share=on ...
> >
> > start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"
> >
> > Issue commands to old QEMU:
> > migrate_set_parameter mode cpr-transfer
> >
> > {"execute": "migrate", ...
> > {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }
> >
> > Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > Acked-by: Markus Armbruster <armbru@redhat.com>
> > Link: https://lore.kernel.org/r/1736967650-129648-17-git-send-email-steven.sistare@oracle.com
> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> Hi; this commit includes some code that has confused
> Coverity (CID 1590980) and it also confused me, so maybe
> it could be usefully made clearer?
>
>
> > void qmp_migrate(const char *uri, bool has_channels,
> > MigrationChannelList *channels, bool has_detach, bool detach,
> > bool has_resume, bool resume, Error **errp)
> > @@ -2056,6 +2118,7 @@ void qmp_migrate(const char *uri, bool has_channels,
> > g_autoptr(MigrationChannel) channel = NULL;
> > MigrationAddress *addr = NULL;
> > MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL };
> > + MigrationChannel *cpr_channel = NULL;
> >
> > /*
> > * Having preliminary checks for uri and channel
> > @@ -2076,6 +2139,7 @@ void qmp_migrate(const char *uri, bool has_channels,
> > }
> > channelv[type] = channels->value;
> > }
> > + cpr_channel = channelv[MIGRATION_CHANNEL_TYPE_CPR];
> > addr = channelv[MIGRATION_CHANNEL_TYPE_MAIN]->addr;
> > if (!addr) {
> > error_setg(errp, "Channel list has no main entry");
> > @@ -2096,12 +2160,52 @@ void qmp_migrate(const char *uri, bool has_channels,
> > return;
> > }
> >
> > + if (s->parameters.mode == MIG_MODE_CPR_TRANSFER && !cpr_channel) {
> > + error_setg(errp, "missing 'cpr' migration channel");
> > + return;
> > + }
>
> Here in qmp_migrate() we bail out if cpr_channel is NULL,
> provided that s->parameters.mode is MIG_MODE_CPR_TRANSFER...
>
> > +
> > resume_requested = has_resume && resume;
> > if (!migrate_prepare(s, resume_requested, errp)) {
> > /* Error detected, put into errp */
> > return;
> > }
> >
> > + if (cpr_state_save(cpr_channel, &local_err)) {
>
> ...but in cpr_state_save() when we decide whether to dereference
> cpr_channel or not, we aren't checking s->parameters.mode,
> we call migrate_mode() and check the result of that.
> And migrate_mode() isn't completely trivial: it calls
> cpr_get_incoming_mode(), so it's not obvious that it's
> necessarily going to be the same value as s->parameters.mode.
> So Coverity complains that it sees a code path where we
> might dereference cpr_channel even when it's NULL.
>
> Could this be made a bit clearer somehow, do you think?
migrate_mode() is indeed tricky. It should only be needed on the incoming
side, to work around the current limitation that the migration parameter
"mode" cannot be set as early as cpr_state_load() runs.
I think we could check s->parameters.mode here before calling
cpr_state_save(), which would also be more readable.
Steve, do you want to send a patch?
--
Peter Xu
* Re: [PULL 17/42] migration: cpr-transfer mode
2025-02-04 16:26 ` Peter Xu
@ 2025-02-04 16:52 ` Steven Sistare
0 siblings, 0 replies; 47+ messages in thread
From: Steven Sistare @ 2025-02-04 16:52 UTC (permalink / raw)
To: Peter Xu, Peter Maydell; +Cc: Fabiano Rosas, qemu-devel, Markus Armbruster
On 2/4/2025 11:26 AM, Peter Xu wrote:
> On Tue, Feb 04, 2025 at 01:40:34PM +0000, Peter Maydell wrote:
>> On Wed, 29 Jan 2025 at 16:11, Fabiano Rosas <farosas@suse.de> wrote:
>>>
>>> From: Steve Sistare <steven.sistare@oracle.com>
>>>
>>> Add the cpr-transfer migration mode, which allows the user to transfer
>>> a guest to a new QEMU instance on the same host with minimal guest pause
>>> time, by preserving guest RAM in place, albeit with new virtual addresses
>>> in new QEMU, and by preserving device file descriptors. Pages that were
>>> locked in memory for DMA in old QEMU remain locked in new QEMU, because the
>>> descriptor of the device that locked them remains open.
>>>
>>> cpr-transfer preserves memory and device descriptors by sending them to
>>> new QEMU over a unix domain socket using SCM_RIGHTS. Such CPR state cannot
>>> be sent over the normal migration channel, because devices and backends
>>> are created prior to reading the channel, so this mode sends CPR state
>>> over a second "cpr" migration channel. New QEMU reads the cpr channel
>>> prior to creating devices or backends. The user specifies the cpr channel
>>> in the channel arguments on the outgoing side, and in a second -incoming
>>> command-line parameter on the incoming side.
>>>
>>> The user must start old QEMU with the '-machine aux-ram-share=on' option,
>>> which allows anonymous memory to be transferred in place to the new process
>>> by transferring a memory descriptor for each ram block. Memory-backend
>>> objects must have the share=on attribute, but memory-backend-epc is not
>>> supported.
>>>
>>> The user starts new QEMU on the same host as old QEMU, with command-line
>>> arguments to create the same machine, plus the -incoming option for the
>>> main migration channel, like normal live migration. In addition, the user
>>> adds a second -incoming option with channel type "cpr". This CPR channel
>>> must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a
>>> UNIX domain socket.
>>>
>>> To initiate CPR, the user issues a migrate command to old QEMU, adding
>>> a second migration channel of type "cpr" in the channels argument.
>>> Old QEMU stops the VM, saves state to the migration channels, and enters
>>> the postmigrate state. New QEMU mmap's memory descriptors, and execution
>>> resumes.
>>>
>>> The implementation splits qmp_migrate into start and finish functions.
>>> Start sends CPR state to new QEMU, which responds by closing the CPR
>>> channel. Old QEMU detects the HUP then calls finish, which connects the
>>> main migration channel.
>>>
>>> In summary, the usage is:
>>>
>>> qemu-system-$arch -machine aux-ram-share=on ...
>>>
>>> start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"
>>>
>>> Issue commands to old QEMU:
>>> migrate_set_parameter mode cpr-transfer
>>>
>>> {"execute": "migrate", ...
>>> {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }
>>>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>> Acked-by: Markus Armbruster <armbru@redhat.com>
>>> Link: https://lore.kernel.org/r/1736967650-129648-17-git-send-email-steven.sistare@oracle.com
>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>
>> Hi; this commit includes some code that has confused
>> Coverity (CID 1590980) and it also confused me, so maybe
>> it could be usefully made clearer?
>>
>>
>>> void qmp_migrate(const char *uri, bool has_channels,
>>> MigrationChannelList *channels, bool has_detach, bool detach,
>>> bool has_resume, bool resume, Error **errp)
>>> @@ -2056,6 +2118,7 @@ void qmp_migrate(const char *uri, bool has_channels,
>>> g_autoptr(MigrationChannel) channel = NULL;
>>> MigrationAddress *addr = NULL;
>>> MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL };
>>> + MigrationChannel *cpr_channel = NULL;
>>>
>>> /*
>>> * Having preliminary checks for uri and channel
>>> @@ -2076,6 +2139,7 @@ void qmp_migrate(const char *uri, bool has_channels,
>>> }
>>> channelv[type] = channels->value;
>>> }
>>> + cpr_channel = channelv[MIGRATION_CHANNEL_TYPE_CPR];
>>> addr = channelv[MIGRATION_CHANNEL_TYPE_MAIN]->addr;
>>> if (!addr) {
>>> error_setg(errp, "Channel list has no main entry");
>>> @@ -2096,12 +2160,52 @@ void qmp_migrate(const char *uri, bool has_channels,
>>> return;
>>> }
>>>
>>> + if (s->parameters.mode == MIG_MODE_CPR_TRANSFER && !cpr_channel) {
>>> + error_setg(errp, "missing 'cpr' migration channel");
>>> + return;
>>> + }
>>
>> Here in qmp_migrate() we bail out if cpr_channel is NULL,
>> provided that s->parameters.mode is MIG_MODE_CPR_TRANSFER...
>>
>>> +
>>> resume_requested = has_resume && resume;
>>> if (!migrate_prepare(s, resume_requested, errp)) {
>>> /* Error detected, put into errp */
>>> return;
>>> }
>>>
>>> + if (cpr_state_save(cpr_channel, &local_err)) {
>>
>> ...but in cpr_state_save() when we decide whether to dereference
>> cpr_channel or not, we aren't checking s->parameters.mode,
>> we call migrate_mode() and check the result of that.
>> And migrate_mode() isn't completely trivial: it calls
>> cpr_get_incoming_mode(), so it's not obvious that it's
>> necessarily going to be the same value as s->parameters.mode.
>> So Coverity complains that it sees a code path where we
>> might dereference cpr_channel even when it's NULL.
>>
>> Could this be made a bit clearer somehow, do you think?
>
> migrate_mode() is indeed tricky. It should only be needed on the incoming
> side, to work around the current limitation that the migration parameter
> "mode" cannot be set as early as cpr_state_load() runs.
>
> I think we could check s->parameters.mode here before calling
> cpr_state_save(), which would also be more readable.
>
> Steve, do you want to send a patch?
I am busy today but I will submit a patch tomorrow. cpr_state_save
is only used on the outgoing side, so internally it can check
s->parameters.mode instead of migrate_mode().
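
Roughly, the shape I have in mind is sketched below. This is a standalone toy and not the actual patch: the types, the field name configured_mode, and the function state_save_sketch() are all hypothetical stand-ins, and the real cpr_state_save() does far more than return a status.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are not the QEMU types. */
typedef enum { MODE_NORMAL, MODE_CPR_TRANSFER } Mode;
typedef struct { const char *addr; } Channel;
typedef struct { Mode configured_mode; } State;  /* models s->parameters.mode */

/*
 * Outgoing-side save, keyed on the same configured mode that the
 * caller's guard tests, so both checks use one predicate.
 *
 * Returns 0 if there is nothing to save, 1 if state would be saved
 * to the channel, and -1 if the channel is missing.
 */
int state_save_sketch(const State *s, const Channel *ch)
{
    assert(s != NULL);

    if (s->configured_mode != MODE_CPR_TRANSFER) {
        return 0;       /* not a CPR transfer: ch is never touched */
    }
    if (ch == NULL || ch->addr == NULL) {
        return -1;      /* defensive: the caller should already reject this */
    }
    return 1;           /* real code would connect to ch->addr and save here */
}
```

With a single predicate on both sides, a NULL channel can only reach the dereference if the caller's guard is skipped, which is easier for both a reader and a static analyzer to follow.
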
- Steve
end of thread (~2025-02-04 16:53 UTC)
Thread overview: 47+ messages
2025-01-29 16:00 [PULL 00/42] Migration patches for 2025-01-29 Fabiano Rosas
2025-01-29 16:00 ` [PULL 01/42] migration: fix -Werror=maybe-uninitialized Fabiano Rosas
2025-01-29 16:00 ` [PULL 02/42] backends/hostmem-shm: factor out allocation of "anonymous shared memory with an fd" Fabiano Rosas
2025-01-29 16:00 ` [PULL 03/42] physmem: fix qemu_ram_alloc_from_fd size calculation Fabiano Rosas
2025-01-29 16:00 ` [PULL 04/42] physmem: qemu_ram_alloc_from_fd extensions Fabiano Rosas
2025-01-29 16:00 ` [PULL 05/42] physmem: fd-based shared memory Fabiano Rosas
2025-01-29 16:00 ` [PULL 06/42] memory: add RAM_PRIVATE Fabiano Rosas
2025-01-29 16:00 ` [PULL 07/42] machine: aux-ram-share option Fabiano Rosas
2025-01-29 16:00 ` [PULL 08/42] migration: cpr-state Fabiano Rosas
2025-01-29 16:00 ` [PULL 09/42] physmem: preserve ram blocks for cpr Fabiano Rosas
2025-01-29 16:00 ` [PULL 10/42] hostmem-memfd: preserve for cpr Fabiano Rosas
2025-01-29 16:00 ` [PULL 11/42] hostmem-shm: preserve for cpr Fabiano Rosas
2025-01-29 16:00 ` [PULL 12/42] migration: enhance migrate_uri_parse Fabiano Rosas
2025-01-29 16:00 ` [PULL 13/42] migration: incoming channel Fabiano Rosas
2025-01-29 16:00 ` [PULL 14/42] migration: SCM_RIGHTS for QEMUFile Fabiano Rosas
2025-01-29 16:00 ` [PULL 15/42] migration: VMSTATE_FD Fabiano Rosas
2025-01-29 16:00 ` [PULL 16/42] migration: cpr-transfer save and load Fabiano Rosas
2025-01-29 16:00 ` [PULL 17/42] migration: cpr-transfer mode Fabiano Rosas
2025-02-04 13:40 ` Peter Maydell
2025-02-04 16:26 ` Peter Xu
2025-02-04 16:52 ` Steven Sistare
2025-01-29 16:00 ` [PULL 18/42] migration-test: memory_backend Fabiano Rosas
2025-01-29 16:00 ` [PULL 19/42] tests/qtest: optimize migrate_set_ports Fabiano Rosas
2025-01-29 16:00 ` [PULL 20/42] tests/qtest: defer connection Fabiano Rosas
2025-01-29 16:00 ` [PULL 21/42] migration-test: defer connection Fabiano Rosas
2025-01-29 16:00 ` [PULL 22/42] tests/qtest: enhance migration channels Fabiano Rosas
2025-01-29 16:00 ` [PULL 23/42] tests/qtest: assert qmp connected Fabiano Rosas
2025-01-29 16:00 ` [PULL 24/42] migration-test: cpr-transfer Fabiano Rosas
2025-01-29 16:00 ` [PULL 25/42] migration: cpr-transfer documentation Fabiano Rosas
2025-01-29 16:00 ` [PULL 26/42] migration: Remove postcopy implications in should_send_vmdesc() Fabiano Rosas
2025-01-29 16:00 ` [PULL 27/42] migration: Do not construct JSON description if suppressed Fabiano Rosas
2025-01-29 16:00 ` [PULL 28/42] migration: Optimize postcopy on downtime by avoiding JSON writer Fabiano Rosas
2025-01-29 16:00 ` [PULL 29/42] migration: Avoid two src-downtime-end tracepoints for postcopy Fabiano Rosas
2025-01-29 16:00 ` [PULL 30/42] migration: Drop inactivate_disk param in qemu_savevm_state_complete* Fabiano Rosas
2025-01-29 16:00 ` [PULL 31/42] migration: Synchronize all CPU states only for non-iterable dump Fabiano Rosas
2025-01-29 16:00 ` [PULL 32/42] migration: Adjust postcopy bandwidth during switchover Fabiano Rosas
2025-01-29 16:00 ` [PULL 33/42] migration: Adjust locking in migration_maybe_pause() Fabiano Rosas
2025-01-29 16:00 ` [PULL 34/42] migration: Drop cached migration state in migration_maybe_pause() Fabiano Rosas
2025-01-29 16:00 ` [PULL 35/42] migration: Take BQL slightly longer in postcopy_start() Fabiano Rosas
2025-01-29 16:00 ` [PULL 36/42] migration: Notify COMPLETE once for postcopy Fabiano Rosas
2025-01-29 16:00 ` [PULL 37/42] migration: Unwrap qemu_savevm_state_complete_precopy() in postcopy Fabiano Rosas
2025-01-29 16:00 ` [PULL 38/42] migration: Cleanup qemu_savevm_state_complete_precopy() Fabiano Rosas
2025-01-29 16:00 ` [PULL 39/42] migration: Always set DEVICE state Fabiano Rosas
2025-01-29 16:00 ` [PULL 40/42] migration: Merge precopy/postcopy on switchover start Fabiano Rosas
2025-01-29 16:00 ` [PULL 41/42] migration: Trivial cleanup on JSON writer of vmstate_save() Fabiano Rosas
2025-01-29 16:00 ` [PULL 42/42] migration: refactor ram_save_target_page functions Fabiano Rosas
2025-02-01 3:03 ` [PULL 00/42] Migration patches for 2025-01-29 Stefan Hajnoczi