* [PULL 00/15] Migration 20240126 patches
@ 2024-01-26 4:17 peterx
2024-01-26 4:17 ` [PULL 01/15] userfaultfd: use 1ULL to build ioctl masks peterx
` (15 more replies)
0 siblings, 16 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx
From: Peter Xu <peterx@redhat.com>
The following changes since commit 5bab95dc74d43bbb28c6a96d24c810a664432057:
Merge tag 'pull-request-2024-01-24' of https://gitlab.com/thuth/qemu into staging (2024-01-25 12:33:42 +0000)
are available in the Git repository at:
https://gitlab.com/peterx/qemu.git tags/migration-20240126-pull-request
for you to fetch changes up to 24b0c2ec956ca225282f81470f7c26f5bb844885:
Make 'uri' optional for migrate QAPI (2024-01-26 11:06:13 +0800)
----------------------------------------------------------------
Migration pull for 9.0
- Fabiano's patchset to fix migration state references in BHs
- Fabiano's new 'n-1' migration test for CI
- Het's fix on making "uri" optional in QMP migrate cmd
- Markus's HMP leak fix reported by Coverity
- Paolo's cleanup on uffd to replace u64 usage
- Peter's small migration cleanup series all over the places
----------------------------------------------------------------
Fabiano Rosas (9):
tests/qtest/migration: Don't use -cpu max for aarch64
ci: Add a migration compatibility test job
ci: Disable migration compatibility tests for aarch64
migration/yank: Use channel features
migration: Fix use-after-free of migration state object
migration: Take reference to migration state around
bg_migration_vm_start_bh
migration: Reference migration state around
loadvm_postcopy_handle_run_bh
migration: Add a wrapper to qemu_bh_schedule
migration: Centralize BH creation and dispatch
Het Gala (1):
Make 'uri' optional for migrate QAPI
Markus Armbruster (1):
migration: Plug memory leak on HMP migrate error path
Paolo Bonzini (1):
userfaultfd: use 1ULL to build ioctl masks
Peter Xu (3):
migration: Make threshold_size an uint64_t
migration: Drop unnecessary check in ram's pending_exact()
analyze-migration.py: Remove trick on parsing ramblocks
qapi/migration.json | 2 +-
migration/migration.h | 7 +-
migration/migration-hmp-cmds.c | 4 +-
migration/migration.c | 82 +++++++++++++----------
migration/postcopy-ram.c | 16 ++---
migration/ram.c | 9 ++-
migration/savevm.c | 5 +-
migration/yank_functions.c | 6 +-
subprojects/libvhost-user/libvhost-user.c | 2 +-
tests/qtest/migration-test.c | 6 +-
.gitlab-ci.d/buildtest.yml | 64 ++++++++++++++++++
scripts/analyze-migration.py | 11 +--
12 files changed, 135 insertions(+), 79 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PULL 01/15] userfaultfd: use 1ULL to build ioctl masks
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 4:17 ` [PULL 02/15] migration: Plug memory leak on HMP migrate error path peterx
` (14 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel
Cc: Fabiano Rosas, peterx, Paolo Bonzini, Philippe Mathieu-Daudé
From: Paolo Bonzini <pbonzini@redhat.com>
There is no need to use the Linux-internal __u64 type; 1ULL is
guaranteed to be wide enough.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20240117160313.175609-1-pbonzini@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/postcopy-ram.c | 16 +++++++---------
subprojects/libvhost-user/libvhost-user.c | 2 +-
tests/qtest/migration-test.c | 4 ++--
3 files changed, 10 insertions(+), 12 deletions(-)
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 5408e028c6..893ec8fa89 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -102,11 +102,9 @@ void postcopy_thread_create(MigrationIncomingState *mis,
* are target OS specific.
*/
#if defined(__linux__)
-
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
-#include <asm/types.h> /* for __u64 */
#endif
#if defined(__linux__) && defined(__NR_userfaultfd) && defined(CONFIG_EVENTFD)
@@ -272,8 +270,8 @@ static bool request_ufd_features(int ufd, uint64_t features)
return false;
}
- ioctl_mask = (__u64)1 << _UFFDIO_REGISTER |
- (__u64)1 << _UFFDIO_UNREGISTER;
+ ioctl_mask = 1ULL << _UFFDIO_REGISTER |
+ 1ULL << _UFFDIO_UNREGISTER;
if ((api_struct.ioctls & ioctl_mask) != ioctl_mask) {
error_report("Missing userfault features: %" PRIx64,
(uint64_t)(~api_struct.ioctls & ioctl_mask));
@@ -462,9 +460,9 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState *mis, Error **errp)
goto out;
}
- feature_mask = (__u64)1 << _UFFDIO_WAKE |
- (__u64)1 << _UFFDIO_COPY |
- (__u64)1 << _UFFDIO_ZEROPAGE;
+ feature_mask = 1ULL << _UFFDIO_WAKE |
+ 1ULL << _UFFDIO_COPY |
+ 1ULL << _UFFDIO_ZEROPAGE;
if ((reg_struct.ioctls & feature_mask) != feature_mask) {
error_setg(errp, "Missing userfault map features: %" PRIx64,
(uint64_t)(~reg_struct.ioctls & feature_mask));
@@ -733,11 +731,11 @@ static int ram_block_enable_notify(RAMBlock *rb, void *opaque)
error_report("%s userfault register: %s", __func__, strerror(errno));
return -1;
}
- if (!(reg_struct.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
+ if (!(reg_struct.ioctls & (1ULL << _UFFDIO_COPY))) {
error_report("%s userfault: Region doesn't support COPY", __func__);
return -1;
}
- if (reg_struct.ioctls & ((__u64)1 << _UFFDIO_ZEROPAGE)) {
+ if (reg_struct.ioctls & (1ULL << _UFFDIO_ZEROPAGE)) {
qemu_ram_set_uf_zeroable(rb);
}
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 6684057370..a3b158c671 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -684,7 +684,7 @@ generate_faults(VuDev *dev) {
dev->postcopy_ufd, strerror(errno));
return false;
}
- if (!(reg_struct.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
+ if (!(reg_struct.ioctls & (1ULL << _UFFDIO_COPY))) {
vu_panic(dev, "%s Region (%d) doesn't support COPY",
__func__, i);
return false;
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index d3066e119f..7675519cfa 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -104,8 +104,8 @@ static bool ufd_version_check(void)
}
uffd_feature_thread_id = api_struct.features & UFFD_FEATURE_THREAD_ID;
- ioctl_mask = (__u64)1 << _UFFDIO_REGISTER |
- (__u64)1 << _UFFDIO_UNREGISTER;
+ ioctl_mask = 1ULL << _UFFDIO_REGISTER |
+ 1ULL << _UFFDIO_UNREGISTER;
if ((api_struct.ioctls & ioctl_mask) != ioctl_mask) {
g_test_message("Skipping test: Missing userfault feature");
return false;
--
2.43.0
* [PULL 02/15] migration: Plug memory leak on HMP migrate error path
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
2024-01-26 4:17 ` [PULL 01/15] userfaultfd: use 1ULL to build ioctl masks peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 4:17 ` [PULL 03/15] migration: Make threshold_size an uint64_t peterx
` (13 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx, Markus Armbruster
From: Markus Armbruster <armbru@redhat.com>
hmp_migrate() leaks @caps when qmp_migrate() fails. Plug the leak
with g_autoptr().
Fixes: 967f2de5c9ec (migration: Implement MigrateChannelList to hmp migration flow.) v8.2.0-rc0
Fixes: CID 1533125
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20240117140722.3979657-1-armbru@redhat.com
[peterx: fix CID number as reported by Peter Maydell]
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration-hmp-cmds.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 740a219aa4..99b49df5dd 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -764,7 +764,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
bool resume = qdict_get_try_bool(qdict, "resume", false);
const char *uri = qdict_get_str(qdict, "uri");
Error *err = NULL;
- MigrationChannelList *caps = NULL;
+ g_autoptr(MigrationChannelList) caps = NULL;
g_autoptr(MigrationChannel) channel = NULL;
if (inc) {
@@ -789,8 +789,6 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
return;
}
- qapi_free_MigrationChannelList(caps);
-
if (!detach) {
HMPMigrationStatus *status;
--
2.43.0
* [PULL 03/15] migration: Make threshold_size an uint64_t
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
2024-01-26 4:17 ` [PULL 01/15] userfaultfd: use 1ULL to build ioctl masks peterx
2024-01-26 4:17 ` [PULL 02/15] migration: Plug memory leak on HMP migrate error path peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 4:17 ` [PULL 04/15] migration: Drop unnecessary check in ram's pending_exact() peterx
` (12 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx, Philippe Mathieu-Daudé
From: Peter Xu <peterx@redhat.com>
It's always compared against another uint64_t. Make it a uint64_t as
well, so it's always clear that the value can never be negative.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240117075848.139045-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/migration/migration.h b/migration/migration.h
index 17972dac34..a589ae8650 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -296,7 +296,7 @@ struct MigrationState {
* this threshold; it's calculated from the requested downtime and
* measured bandwidth, or avail-switchover-bandwidth if specified.
*/
- int64_t threshold_size;
+ uint64_t threshold_size;
/* params from 'migrate-set-parameters' */
MigrationParameters parameters;
--
2.43.0
* [PULL 04/15] migration: Drop unnecessary check in ram's pending_exact()
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
` (2 preceding siblings ...)
2024-01-26 4:17 ` [PULL 03/15] migration: Make threshold_size an uint64_t peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 4:17 ` [PULL 05/15] analyze-migration.py: Remove trick on parsing ramblocks peterx
` (11 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx
From: Peter Xu <peterx@redhat.com>
When the migration framework fetches the exact pending sizes, it means
this check:
remaining_size < s->threshold_size
must have been done already, at migration_iteration_run():
if (must_precopy <= s->threshold_size) {
qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
That happens after one round of ram_state_pending_estimate(), which
makes the second check meaningless, so it can be dropped.
In other words, when we reach ->state_pending_exact(), we
unconditionally sync dirty bits for precopy.
Then we can drop migrate_get_current() there too.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240117075848.139045-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/ram.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index c0cdcccb75..d5b7cd5ac2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3213,21 +3213,20 @@ static void ram_state_pending_estimate(void *opaque, uint64_t *must_precopy,
static void ram_state_pending_exact(void *opaque, uint64_t *must_precopy,
uint64_t *can_postcopy)
{
- MigrationState *s = migrate_get_current();
RAMState **temp = opaque;
RAMState *rs = *temp;
+ uint64_t remaining_size;
- uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
-
- if (!migration_in_postcopy() && remaining_size < s->threshold_size) {
+ if (!migration_in_postcopy()) {
bql_lock();
WITH_RCU_READ_LOCK_GUARD() {
migration_bitmap_sync_precopy(rs, false);
}
bql_unlock();
- remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
}
+ remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
+
if (migrate_postcopy_ram()) {
/* We can do postcopy, and all the data is postcopiable */
*can_postcopy += remaining_size;
--
2.43.0
* [PULL 05/15] analyze-migration.py: Remove trick on parsing ramblocks
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
` (3 preceding siblings ...)
2024-01-26 4:17 ` [PULL 04/15] migration: Drop unnecessary check in ram's pending_exact() peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 4:17 ` [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64 peterx
` (10 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx
From: Peter Xu <peterx@redhat.com>
RAM_SAVE_FLAG_MEM_SIZE carries the total size of all ramblocks, which
can be used to know when scanning of the ramblock list is complete.
Rely on that and drop the trick of peeking at the next byte.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240117075848.139045-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
---
scripts/analyze-migration.py | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
index a39dfb8766..8a254a5b6a 100755
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -151,17 +151,12 @@ def read(self):
addr &= ~(self.TARGET_PAGE_SIZE - 1)
if flags & self.RAM_SAVE_FLAG_MEM_SIZE:
- while True:
+ total_length = addr
+ while total_length > 0:
namelen = self.file.read8()
- # We assume that no RAM chunk is big enough to ever
- # hit the first byte of the address, so when we see
- # a zero here we know it has to be an address, not the
- # length of the next block.
- if namelen == 0:
- self.file.file.seek(-1, 1)
- break
self.name = self.file.readstr(len = namelen)
len = self.file.read64()
+ total_length -= len
self.sizeinfo[self.name] = '0x%016x' % len
if self.write_memory:
print(self.name)
--
2.43.0
* [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
` (4 preceding siblings ...)
2024-01-26 4:17 ` [PULL 05/15] analyze-migration.py: Remove trick on parsing ramblocks peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 14:36 ` Fabiano Rosas
2024-01-26 4:17 ` [PULL 07/15] ci: Add a migration compatibility test job peterx
` (9 subsequent siblings)
15 siblings, 1 reply; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx, Peter Maydell
From: Fabiano Rosas <farosas@suse.de>
The 'max' cpu is not expected to be stable in terms of features across
QEMU versions, so it should not be expected to migrate.
While the tests currently all pass with -cpu max, that is only because
we're not testing across QEMU versions, which is the more common
use-case for migration.
We've recently introduced compatibility tests that use two different
QEMU versions and the tests are now failing for aarch64. The next
patch adds those tests to CI, so we cannot use the 'max' cpu
anymore. Replace it with the 'neoverse-n1', which has a fixed set of
features.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240118164951.30350-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tests/qtest/migration-test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 7675519cfa..15713f3666 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -820,7 +820,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
memory_size = "150M";
machine_alias = "virt";
machine_opts = "gic-version=max";
- arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath);
+ arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath);
start_address = ARM_TEST_MEM_START;
end_address = ARM_TEST_MEM_END;
} else {
--
2.43.0
* [PULL 07/15] ci: Add a migration compatibility test job
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
` (5 preceding siblings ...)
2024-01-26 4:17 ` [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64 peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 4:17 ` [PULL 08/15] ci: Disable migration compatibility tests for aarch64 peterx
` (8 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx
From: Fabiano Rosas <farosas@suse.de>
The migration tests have support for being passed two QEMU binaries to
test migration compatibility.
Add a CI job that builds the latest release of QEMU and another job
that uses that version plus an already present build of the current
version and runs the migration tests with the two, both as source and
destination. I.e.:
old QEMU (n-1) -> current QEMU (development tree)
current QEMU (development tree) -> old QEMU (n-1)
The purpose of this CI job is to ensure the code we're about to merge
will not cause a migration compatibility problem when migrating the
next release (which will contain that code) to/from the previous
release.
The version of migration-test used will be the one matching the older
QEMU. That way we can avoid special-casing new tests that wouldn't be
compatible with the older QEMU.
Note: for user forks, the version tags need to be pushed to gitlab
otherwise it won't be able to checkout a different version.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240118164951.30350-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
---
.gitlab-ci.d/buildtest.yml | 60 ++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index e1c7801598..f0b0edc634 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -167,6 +167,66 @@ build-system-centos:
x86_64-softmmu rx-softmmu sh4-softmmu nios2-softmmu
MAKE_CHECK_ARGS: check-build
+# Previous QEMU release. Used for cross-version migration tests.
+build-previous-qemu:
+ extends: .native_build_job_template
+ artifacts:
+ when: on_success
+ expire_in: 2 days
+ paths:
+ - build-previous
+ exclude:
+ - build-previous/**/*.p
+ - build-previous/**/*.a.p
+ - build-previous/**/*.fa.p
+ - build-previous/**/*.c.o
+ - build-previous/**/*.c.o.d
+ - build-previous/**/*.fa
+ needs:
+ job: amd64-opensuse-leap-container
+ variables:
+ IMAGE: opensuse-leap
+ TARGETS: x86_64-softmmu aarch64-softmmu
+ before_script:
+ - export QEMU_PREV_VERSION="$(sed 's/\([0-9.]*\)\.[0-9]*/v\1.0/' VERSION)"
+ - git checkout $QEMU_PREV_VERSION
+ after_script:
+ - mv build build-previous
+
+.migration-compat-common:
+ extends: .common_test_job_template
+ needs:
+ - job: build-previous-qemu
+ - job: build-system-opensuse
+ # The old QEMU could have bugs unrelated to migration that are
+ # already fixed in the current development branch, so this test
+ # might fail.
+ allow_failure: true
+ variables:
+ IMAGE: opensuse-leap
+ MAKE_CHECK_ARGS: check-build
+ script:
+ # Use the migration-tests from the older QEMU tree. This avoids
+ # testing an old QEMU against new features/tests that it is not
+ # compatible with.
+ - cd build-previous
+ # old to new
+ - QTEST_QEMU_BINARY_SRC=./qemu-system-${TARGET}
+ QTEST_QEMU_BINARY=../build/qemu-system-${TARGET} ./tests/qtest/migration-test
+ # new to old
+ - QTEST_QEMU_BINARY_DST=./qemu-system-${TARGET}
+ QTEST_QEMU_BINARY=../build/qemu-system-${TARGET} ./tests/qtest/migration-test
+
+migration-compat-aarch64:
+ extends: .migration-compat-common
+ variables:
+ TARGET: aarch64
+
+migration-compat-x86_64:
+ extends: .migration-compat-common
+ variables:
+ TARGET: x86_64
+
check-system-centos:
extends: .native_test_job_template
needs:
--
2.43.0
* [PULL 08/15] ci: Disable migration compatibility tests for aarch64
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
` (6 preceding siblings ...)
2024-01-26 4:17 ` [PULL 07/15] ci: Add a migration compatibility test job peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 4:17 ` [PULL 09/15] migration/yank: Use channel features peterx
` (7 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx
From: Fabiano Rosas <farosas@suse.de>
Until 9.0 is out, we need to keep the aarch64 job disabled because the
tests always use the n-1 version of migration-test. That happens to be
broken for aarch64 in 8.2. Once 9.0 is out, it will become the n-1
version and it will bring the fixed tests.
We can revert this patch when 9.0 releases.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240118164951.30350-4-farosas@suse.de
[peterx: use _SKIPPED rather than _OPTIONAL]
Signed-off-by: Peter Xu <peterx@redhat.com>
---
.gitlab-ci.d/buildtest.yml | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index f0b0edc634..79bbc8585b 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -217,10 +217,14 @@ build-previous-qemu:
- QTEST_QEMU_BINARY_DST=./qemu-system-${TARGET}
QTEST_QEMU_BINARY=../build/qemu-system-${TARGET} ./tests/qtest/migration-test
+# This job is disabled until we release 9.0. The existing
+# migration-test in 8.2 is broken on aarch64. The fix was already
+# committed, but it will only take effect once 9.0 is out.
migration-compat-aarch64:
extends: .migration-compat-common
variables:
TARGET: aarch64
+ QEMU_JOB_SKIPPED: 1
migration-compat-x86_64:
extends: .migration-compat-common
--
2.43.0
* [PULL 09/15] migration/yank: Use channel features
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
` (7 preceding siblings ...)
2024-01-26 4:17 ` [PULL 08/15] ci: Disable migration compatibility tests for aarch64 peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 4:17 ` [PULL 10/15] migration: Fix use-after-free of migration state object peterx
` (6 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx, Philippe Mathieu-Daudé
From: Fabiano Rosas <farosas@suse.de>
Stop using outside knowledge about the io channels when registering
yank functions. Query for features instead.
The yank method for all channels used with migration code currently is
to call the qio_channel_shutdown() function, so query for
QIO_CHANNEL_FEATURE_SHUTDOWN. We could add a separate feature in the
future for indicating whether a channel supports yanking, but that
seems overkill at the moment.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20230911171320.24372-9-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/yank_functions.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/migration/yank_functions.c b/migration/yank_functions.c
index d5a710a3f2..979e60c762 100644
--- a/migration/yank_functions.c
+++ b/migration/yank_functions.c
@@ -8,12 +8,9 @@
*/
#include "qemu/osdep.h"
-#include "qapi/error.h"
#include "io/channel.h"
#include "yank_functions.h"
#include "qemu/yank.h"
-#include "io/channel-socket.h"
-#include "io/channel-tls.h"
#include "qemu-file.h"
void migration_yank_iochannel(void *opaque)
@@ -26,8 +23,7 @@ void migration_yank_iochannel(void *opaque)
/* Return whether yank is supported on this ioc */
static bool migration_ioc_yank_supported(QIOChannel *ioc)
{
- return object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET) ||
- object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS);
+ return qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
}
void migration_ioc_register_yank(QIOChannel *ioc)
--
2.43.0
* [PULL 10/15] migration: Fix use-after-free of migration state object
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
` (8 preceding siblings ...)
2024-01-26 4:17 ` [PULL 09/15] migration/yank: Use channel features peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 4:17 ` [PULL 11/15] migration: Take reference to migration state around bg_migration_vm_start_bh peterx
` (5 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx
From: Fabiano Rosas <farosas@suse.de>
We're currently allowing the process_incoming_migration_bh bottom-half
to run without holding a reference to the 'current_migration' object,
which leads to a segmentation fault if the BH is still live after
migration_shutdown() has dropped the last reference to
current_migration.
In my system the bug manifests as migrate_multifd() returning true
when it shouldn't and multifd_load_shutdown() calling
multifd_recv_terminate_threads() which crashes due to an uninitialized
multifd_recv_state.
Fix the issue by holding a reference to the object when scheduling the
BH and dropping it before returning from the BH. The same is already
done for the cleanup_bh at migrate_fd_cleanup_schedule().
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1969
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/migration/migration.c b/migration/migration.c
index 219447dea1..cf17b68e57 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -648,6 +648,7 @@ static void process_incoming_migration_bh(void *opaque)
MIGRATION_STATUS_COMPLETED);
qemu_bh_delete(mis->bh);
migration_incoming_state_destroy();
+ object_unref(OBJECT(migrate_get_current()));
}
static void coroutine_fn
@@ -713,6 +714,7 @@ process_incoming_migration_co(void *opaque)
}
mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
+ object_ref(OBJECT(migrate_get_current()));
qemu_bh_schedule(mis->bh);
return;
fail:
--
2.43.0
* [PULL 11/15] migration: Take reference to migration state around bg_migration_vm_start_bh
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
` (9 preceding siblings ...)
2024-01-26 4:17 ` [PULL 10/15] migration: Fix use-after-free of migration state object peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 4:17 ` [PULL 12/15] migration: Reference migration state around loadvm_postcopy_handle_run_bh peterx
` (4 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx
From: Fabiano Rosas <farosas@suse.de>
We need to hold a reference to the current_migration object around
async calls to avoid it being freed while still in use.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/migration/migration.c b/migration/migration.c
index cf17b68e57..b1213b59ce 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3382,6 +3382,7 @@ static void bg_migration_vm_start_bh(void *opaque)
vm_resume(s->vm_old_state);
migration_downtime_end(s);
+ object_unref(OBJECT(s));
}
/**
@@ -3486,6 +3487,7 @@ static void *bg_migration_thread(void *opaque)
* writes to virtio VQs memory which is in write-protected region.
*/
s->vm_start_bh = qemu_bh_new(bg_migration_vm_start_bh, s);
+ object_ref(OBJECT(s));
qemu_bh_schedule(s->vm_start_bh);
bql_unlock();
--
2.43.0
* [PULL 12/15] migration: Reference migration state around loadvm_postcopy_handle_run_bh
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
` (10 preceding siblings ...)
2024-01-26 4:17 ` [PULL 11/15] migration: Take reference to migration state around bg_migration_vm_start_bh peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 4:17 ` [PULL 13/15] migration: Add a wrapper to qemu_bh_schedule peterx
` (3 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx
From: Fabiano Rosas <farosas@suse.de>
We need to hold a reference to the current_migration object around
async calls to avoid it being freed while still in use. Even in this
load-side function, we might still use the MigrationState, e.g. to
check for capabilities.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/savevm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/migration/savevm.c b/migration/savevm.c
index 6410705ebe..93387350c7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2174,6 +2174,7 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
qemu_bh_delete(mis->bh);
trace_vmstate_downtime_checkpoint("dst-postcopy-bh-vm-started");
+ object_unref(OBJECT(migration_get_current()));
}
/* After all discards we can start running and asking for pages */
@@ -2189,6 +2190,7 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
mis->bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, mis);
+ object_ref(OBJECT(migration_get_current()));
qemu_bh_schedule(mis->bh);
/* We need to finish reading the stream from the package
--
2.43.0
* [PULL 13/15] migration: Add a wrapper to qemu_bh_schedule
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
` (11 preceding siblings ...)
2024-01-26 4:17 ` [PULL 12/15] migration: Reference migration state around loadvm_postcopy_handle_run_bh peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 4:17 ` [PULL 14/15] migration: Centralize BH creation and dispatch peterx
` (2 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx
From: Fabiano Rosas <farosas@suse.de>
Wrap qemu_bh_schedule() to ensure we always hold a reference to the
current_migration object.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 31 ++++++++++++++-----------------
1 file changed, 14 insertions(+), 17 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index b1213b59ce..0e7f101d64 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -199,6 +199,16 @@ void migration_object_init(void)
dirty_bitmap_mig_init();
}
+static void migration_bh_schedule(MigrationState *s, QEMUBH *bh)
+{
+ /*
+ * Ref the state for bh, because it may be called when
+ * there're already no other refs
+ */
+ object_ref(OBJECT(s));
+ qemu_bh_schedule(bh);
+}
+
void migration_cancel(const Error *error)
{
if (error) {
@@ -714,8 +724,7 @@ process_incoming_migration_co(void *opaque)
}
mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
- object_ref(OBJECT(migrate_get_current()));
- qemu_bh_schedule(mis->bh);
+ migration_bh_schedule(migrate_get_current(), mis->bh);
return;
fail:
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
@@ -1332,16 +1341,6 @@ static void migrate_fd_cleanup(MigrationState *s)
yank_unregister_instance(MIGRATION_YANK_INSTANCE);
}
-static void migrate_fd_cleanup_schedule(MigrationState *s)
-{
- /*
- * Ref the state for bh, because it may be called when
- * there're already no other refs
- */
- object_ref(OBJECT(s));
- qemu_bh_schedule(s->cleanup_bh);
-}
-
static void migrate_fd_cleanup_bh(void *opaque)
{
MigrationState *s = opaque;
@@ -3140,7 +3139,7 @@ static void migration_iteration_finish(MigrationState *s)
error_report("%s: Unknown ending state %d", __func__, s->state);
break;
}
- migrate_fd_cleanup_schedule(s);
+ migration_bh_schedule(s, s->cleanup_bh);
bql_unlock();
}
@@ -3171,7 +3170,7 @@ static void bg_migration_iteration_finish(MigrationState *s)
break;
}
- migrate_fd_cleanup_schedule(s);
+ migration_bh_schedule(s, s->cleanup_bh);
bql_unlock();
}
@@ -3487,9 +3486,7 @@ static void *bg_migration_thread(void *opaque)
* writes to virtio VQs memory which is in write-protected region.
*/
s->vm_start_bh = qemu_bh_new(bg_migration_vm_start_bh, s);
- object_ref(OBJECT(s));
- qemu_bh_schedule(s->vm_start_bh);
-
+ migration_bh_schedule(s, s->vm_start_bh);
bql_unlock();
while (migration_is_active(s)) {
--
2.43.0
* [PULL 14/15] migration: Centralize BH creation and dispatch
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
` (12 preceding siblings ...)
2024-01-26 4:17 ` [PULL 13/15] migration: Add a wrapper to qemu_bh_schedule peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 4:17 ` [PULL 15/15] Make 'uri' optional for migrate QAPI peterx
2024-01-26 18:16 ` [PULL 00/15] Migration 20240126 patches Peter Maydell
15 siblings, 0 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx
From: Fabiano Rosas <farosas@suse.de>
Now that the migration state reference counting is correct, further
wrap the bottom half dispatch process to avoid future issues.
Move BH creation and scheduling together and wrap the dispatch with an
intermediary function that will ensure we always keep the ref/unref
balanced.
Also move the responsibility of deleting the BH into the wrapper and
remove the now unnecessary pointers.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-6-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.h | 5 +---
migration/migration.c | 65 +++++++++++++++++++++++++------------------
migration/savevm.c | 7 +----
3 files changed, 40 insertions(+), 37 deletions(-)
diff --git a/migration/migration.h b/migration/migration.h
index a589ae8650..f2c8b8f286 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -159,8 +159,6 @@ struct MigrationIncomingState {
/* PostCopyFD's for external userfaultfds & handlers of shared memory */
GArray *postcopy_remote_fds;
- QEMUBH *bh;
-
int state;
/*
@@ -255,8 +253,6 @@ struct MigrationState {
/*< public >*/
QemuThread thread;
- QEMUBH *vm_start_bh;
- QEMUBH *cleanup_bh;
/* Protected by qemu_file_lock */
QEMUFile *to_dst_file;
/* Postcopy specific transfer channel */
@@ -528,6 +524,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
void migration_make_urgent_request(void);
void migration_consume_urgent_request(void);
bool migration_rate_limit(void);
+void migration_bh_schedule(QEMUBHFunc *cb, void *opaque);
void migration_cancel(const Error *error);
void migration_populate_vfio_info(MigrationInfo *info);
diff --git a/migration/migration.c b/migration/migration.c
index 0e7f101d64..d5f705ceef 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -199,8 +199,39 @@ void migration_object_init(void)
dirty_bitmap_mig_init();
}
-static void migration_bh_schedule(MigrationState *s, QEMUBH *bh)
+typedef struct {
+ QEMUBH *bh;
+ QEMUBHFunc *cb;
+ void *opaque;
+} MigrationBH;
+
+static void migration_bh_dispatch_bh(void *opaque)
{
+ MigrationState *s = migrate_get_current();
+ MigrationBH *migbh = opaque;
+
+ /* cleanup this BH */
+ qemu_bh_delete(migbh->bh);
+ migbh->bh = NULL;
+
+ /* dispatch the other one */
+ migbh->cb(migbh->opaque);
+ object_unref(OBJECT(s));
+
+ g_free(migbh);
+}
+
+void migration_bh_schedule(QEMUBHFunc *cb, void *opaque)
+{
+ MigrationState *s = migrate_get_current();
+ MigrationBH *migbh = g_new0(MigrationBH, 1);
+ QEMUBH *bh = qemu_bh_new(migration_bh_dispatch_bh, migbh);
+
+ /* Store these to dispatch when the BH runs */
+ migbh->bh = bh;
+ migbh->cb = cb;
+ migbh->opaque = opaque;
+
/*
* Ref the state for bh, because it may be called when
* there're already no other refs
@@ -656,9 +687,7 @@ static void process_incoming_migration_bh(void *opaque)
*/
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_COMPLETED);
- qemu_bh_delete(mis->bh);
migration_incoming_state_destroy();
- object_unref(OBJECT(migrate_get_current()));
}
static void coroutine_fn
@@ -723,8 +752,7 @@ process_incoming_migration_co(void *opaque)
goto fail;
}
- mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
- migration_bh_schedule(migrate_get_current(), mis->bh);
+ migration_bh_schedule(process_incoming_migration_bh, mis);
return;
fail:
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
@@ -1285,9 +1313,6 @@ void migrate_set_state(int *state, int old_state, int new_state)
static void migrate_fd_cleanup(MigrationState *s)
{
- qemu_bh_delete(s->cleanup_bh);
- s->cleanup_bh = NULL;
-
g_free(s->hostname);
s->hostname = NULL;
json_writer_free(s->vmdesc);
@@ -1343,9 +1368,7 @@ static void migrate_fd_cleanup(MigrationState *s)
static void migrate_fd_cleanup_bh(void *opaque)
{
- MigrationState *s = opaque;
- migrate_fd_cleanup(s);
- object_unref(OBJECT(s));
+ migrate_fd_cleanup(opaque);
}
void migrate_set_error(MigrationState *s, const Error *error)
@@ -1568,8 +1591,6 @@ int migrate_init(MigrationState *s, Error **errp)
* parameters/capabilities that the user set, and
* locks.
*/
- s->cleanup_bh = 0;
- s->vm_start_bh = 0;
s->to_dst_file = NULL;
s->state = MIGRATION_STATUS_NONE;
s->rp_state.from_dst_file = NULL;
@@ -3139,7 +3160,8 @@ static void migration_iteration_finish(MigrationState *s)
error_report("%s: Unknown ending state %d", __func__, s->state);
break;
}
- migration_bh_schedule(s, s->cleanup_bh);
+
+ migration_bh_schedule(migrate_fd_cleanup_bh, s);
bql_unlock();
}
@@ -3170,7 +3192,7 @@ static void bg_migration_iteration_finish(MigrationState *s)
break;
}
- migration_bh_schedule(s, s->cleanup_bh);
+ migration_bh_schedule(migrate_fd_cleanup_bh, s);
bql_unlock();
}
@@ -3376,12 +3398,8 @@ static void bg_migration_vm_start_bh(void *opaque)
{
MigrationState *s = opaque;
- qemu_bh_delete(s->vm_start_bh);
- s->vm_start_bh = NULL;
-
vm_resume(s->vm_old_state);
migration_downtime_end(s);
- object_unref(OBJECT(s));
}
/**
@@ -3485,8 +3503,7 @@ static void *bg_migration_thread(void *opaque)
* calling VM state change notifiers from vm_start() would initiate
* writes to virtio VQs memory which is in write-protected region.
*/
- s->vm_start_bh = qemu_bh_new(bg_migration_vm_start_bh, s);
- migration_bh_schedule(s, s->vm_start_bh);
+ migration_bh_schedule(bg_migration_vm_start_bh, s);
bql_unlock();
while (migration_is_active(s)) {
@@ -3542,12 +3559,6 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
migrate_error_free(s);
s->expected_downtime = migrate_downtime_limit();
- if (resume) {
- assert(s->cleanup_bh);
- } else {
- assert(!s->cleanup_bh);
- s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup_bh, s);
- }
if (error_in) {
migrate_fd_error(s, error_in);
if (resume) {
diff --git a/migration/savevm.c b/migration/savevm.c
index 93387350c7..d612c8a902 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2171,10 +2171,7 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
runstate_set(RUN_STATE_PAUSED);
}
- qemu_bh_delete(mis->bh);
-
trace_vmstate_downtime_checkpoint("dst-postcopy-bh-vm-started");
- object_unref(OBJECT(migration_get_current()));
}
/* After all discards we can start running and asking for pages */
@@ -2189,9 +2186,7 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
}
postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
- mis->bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, mis);
- object_ref(OBJECT(migration_get_current()));
- qemu_bh_schedule(mis->bh);
+ migration_bh_schedule(loadvm_postcopy_handle_run_bh, mis);
/* We need to finish reading the stream from the package
* and also stop reading anything more from the stream that loaded the
--
2.43.0
* [PULL 15/15] Make 'uri' optional for migrate QAPI
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
` (13 preceding siblings ...)
2024-01-26 4:17 ` [PULL 14/15] migration: Centralize BH creation and dispatch peterx
@ 2024-01-26 4:17 ` peterx
2024-01-26 18:16 ` [PULL 00/15] Migration 20240126 patches Peter Maydell
15 siblings, 0 replies; 34+ messages in thread
From: peterx @ 2024-01-26 4:17 UTC (permalink / raw)
To: qemu-devel; +Cc: Fabiano Rosas, peterx, Het Gala
From: Het Gala <het.gala@nutanix.com>
The 'uri' argument should be optional, since the 'uri' and 'channels'
arguments are mutually exclusive.
Fixes: 074dbce5fcce (migration: New migrate and migrate-incoming argument 'channels')
Signed-off-by: Het Gala <het.gala@nutanix.com>
Link: https://lore.kernel.org/r/20240123064219.40514-1-het.gala@nutanix.com
Signed-off-by: Peter Xu <peterx@redhat.com>
---
qapi/migration.json | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/qapi/migration.json b/qapi/migration.json
index eb2f883513..197d3faa43 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1757,7 +1757,7 @@
#
##
{ 'command': 'migrate',
- 'data': {'uri': 'str',
+ 'data': {'*uri': 'str',
'*channels': [ 'MigrationChannel' ],
'*blk': { 'type': 'bool', 'features': [ 'deprecated' ] },
'*inc': { 'type': 'bool', 'features': [ 'deprecated' ] },
--
2.43.0
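With 'uri' made optional, a client can issue the channels-only form of the command. An illustrative QMP invocation (the address values are made up; the shape follows the MigrationChannel schema):

```json
{ "execute": "migrate",
  "arguments": {
    "channels": [
      { "channel-type": "main",
        "addr": { "transport": "socket", "type": "inet",
                  "host": "10.12.34.9", "port": "1050" } }
    ]
  }
}
```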
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-01-26 4:17 ` [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64 peterx
@ 2024-01-26 14:36 ` Fabiano Rosas
2024-01-26 14:39 ` Peter Maydell
0 siblings, 1 reply; 34+ messages in thread
From: Fabiano Rosas @ 2024-01-26 14:36 UTC (permalink / raw)
To: peterx, qemu-devel; +Cc: peterx, Peter Maydell
peterx@redhat.com writes:
> From: Fabiano Rosas <farosas@suse.de>
>
> The 'max' cpu is not expected to be stable in terms of features across
> QEMU versions, so it should not be expected to migrate.
>
> While the tests currently all pass with -cpu max, that is only because
> we're not testing across QEMU versions, which is the more common
> use-case for migration.
>
> We've recently introduced compatibility tests that use two different
> QEMU versions and the tests are now failing for aarch64. The next
> patch adds those tests to CI, so we cannot use the 'max' cpu
> anymore. Replace it with the 'neoverse-n1', which has a fixed set of
> features.
>
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> Link: https://lore.kernel.org/r/20240118164951.30350-2-farosas@suse.de
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> tests/qtest/migration-test.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 7675519cfa..15713f3666 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -820,7 +820,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
> memory_size = "150M";
> machine_alias = "virt";
> machine_opts = "gic-version=max";
> - arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath);
> + arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath);
> start_address = ARM_TEST_MEM_START;
> end_address = ARM_TEST_MEM_END;
> } else {
This breaks the tests on an arm host with KVM support. We could drop
this patch from the PR; it doesn't affect anything else.
Or squash in:
-->8--
From b8aa5d8a2b33dcc28e4cd4ce2c4f4eacc3a3b845 Mon Sep 17 00:00:00 2001
From: Fabiano Rosas <farosas@suse.de>
Date: Fri, 26 Jan 2024 11:33:15 -0300
Subject: [PATCH] fixup! tests/qtest/migration: Don't use -cpu max for aarch64
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
tests/qtest/migration-test.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 15713f3666..2ba9cab684 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -820,7 +820,9 @@ static int test_migrate_start(QTestState **from, QTestState **to,
memory_size = "150M";
machine_alias = "virt";
machine_opts = "gic-version=max";
- arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath);
+ arch_opts = g_strdup_printf("-cpu %s -kernel %s",
+ qtest_has_accel("kvm") ?
+ "host" : "neoverse-n1", bootpath);
start_address = ARM_TEST_MEM_START;
end_address = ARM_TEST_MEM_END;
} else {
--
2.35.3
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-01-26 14:36 ` Fabiano Rosas
@ 2024-01-26 14:39 ` Peter Maydell
2024-01-26 14:54 ` Fabiano Rosas
0 siblings, 1 reply; 34+ messages in thread
From: Peter Maydell @ 2024-01-26 14:39 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: peterx, qemu-devel
On Fri, 26 Jan 2024 at 14:36, Fabiano Rosas <farosas@suse.de> wrote:
>
> This breaks the tests on an arm host with KVM support. We could drop
> this patch from the PR, it doesn't affect anything else.
>
> Or squash in:
>
> -->8--
> From b8aa5d8a2b33dcc28e4cd4ce2c4f4eacc3a3b845 Mon Sep 17 00:00:00 2001
> From: Fabiano Rosas <farosas@suse.de>
> Date: Fri, 26 Jan 2024 11:33:15 -0300
> Subject: [PATCH] fixup! tests/qtest/migration: Don't use -cpu max for aarch64
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> tests/qtest/migration-test.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 15713f3666..2ba9cab684 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -820,7 +820,9 @@ static int test_migrate_start(QTestState **from, QTestState **to,
> memory_size = "150M";
> machine_alias = "virt";
> machine_opts = "gic-version=max";
> - arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath);
> + arch_opts = g_strdup_printf("-cpu %s -kernel %s",
> + qtest_has_accel("kvm") ?
> + "host" : "neoverse-n1", bootpath);
> start_address = ARM_TEST_MEM_START;
> end_address = ARM_TEST_MEM_END;
> } else {
If you want to do that then a comment explaining why would be
helpful for future readers, I think.
thanks
-- PMM
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-01-26 14:39 ` Peter Maydell
@ 2024-01-26 14:54 ` Fabiano Rosas
2024-01-29 2:51 ` Peter Xu
0 siblings, 1 reply; 34+ messages in thread
From: Fabiano Rosas @ 2024-01-26 14:54 UTC (permalink / raw)
To: Peter Maydell; +Cc: peterx, qemu-devel
Peter Maydell <peter.maydell@linaro.org> writes:
> On Fri, 26 Jan 2024 at 14:36, Fabiano Rosas <farosas@suse.de> wrote:
> If you want to do that then a comment explaining why would be
> helpful for future readers, I think.
Ok, let's drop this one then, I'll resend.
Thanks
* Re: [PULL 00/15] Migration 20240126 patches
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
` (14 preceding siblings ...)
2024-01-26 4:17 ` [PULL 15/15] Make 'uri' optional for migrate QAPI peterx
@ 2024-01-26 18:16 ` Peter Maydell
15 siblings, 0 replies; 34+ messages in thread
From: Peter Maydell @ 2024-01-26 18:16 UTC (permalink / raw)
To: peterx; +Cc: qemu-devel, Fabiano Rosas
On Fri, 26 Jan 2024 at 04:18, <peterx@redhat.com> wrote:
>
> From: Peter Xu <peterx@redhat.com>
>
> The following changes since commit 5bab95dc74d43bbb28c6a96d24c810a664432057:
>
> Merge tag 'pull-request-2024-01-24' of https://gitlab.com/thuth/qemu into staging (2024-01-25 12:33:42 +0000)
>
> are available in the Git repository at:
>
> https://gitlab.com/peterx/qemu.git tags/migration-20240126-pull-request
>
> for you to fetch changes up to 24b0c2ec956ca225282f81470f7c26f5bb844885:
>
> Make 'uri' optional for migrate QAPI (2024-01-26 11:06:13 +0800)
>
> ----------------------------------------------------------------
> Migration pull for 9.0
>
> - Fabiano's patchset to fix migration state references in BHs
> - Fabiano's new 'n-1' migration test for CI
> - Het's fix on making "uri" optional in QMP migrate cmd
> - Markus's HMP leak fix reported by Coverity
> - Paolo's cleanup on uffd to replace u64 usage
> - Peter's small migration cleanup series all over the places
>
> ----------------------------------------------------------------
This fails CI on the AArch64 host:
https://gitlab.com/qemu-project/qemu/-/jobs/6031311759
because it's trying to use the neoverse-n1 CPU with KVM.
thanks
-- PMM
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-01-26 14:54 ` Fabiano Rosas
@ 2024-01-29 2:51 ` Peter Xu
2024-01-29 12:14 ` Fabiano Rosas
0 siblings, 1 reply; 34+ messages in thread
From: Peter Xu @ 2024-01-29 2:51 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: Peter Maydell, qemu-devel
On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote:
> Ok, let's drop this one then, I'll resend.
I'll drop this one for now then, thanks.
Just to double check: Fabiano, you meant that "-cpu host" won't hit the
same issue as what "-cpu max" would have for the new "n-1" CI test, right?
I can also wait to read your patch if that will contain the explanations.
--
Peter Xu
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-01-29 2:51 ` Peter Xu
@ 2024-01-29 12:14 ` Fabiano Rosas
2024-01-29 23:30 ` Fabiano Rosas
0 siblings, 1 reply; 34+ messages in thread
From: Fabiano Rosas @ 2024-01-29 12:14 UTC (permalink / raw)
To: Peter Xu; +Cc: Peter Maydell, qemu-devel
Peter Xu <peterx@redhat.com> writes:
> I'll drop this one for now then, thanks.
>
> Just to double check: Fabiano, you meant that "-cpu host" won't hit the
> same issue as what "-cpu max" would have for the new "n-1" CI test, right?
Well, no. What we need here is a cpu that works with KVM. Currently
that's 'host'. If that breaks the n-1 test, then it's a regression.
We also need a cpu that works with TCG. Any of them would do, except
'max', which changes in incompatible ways (avoiding that was the
original patch's purpose).
The issue that occurs to me now is that '-cpu host' will not work with
TCG. We might actually need to go poking /dev/kvm for this to work.
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-01-29 12:14 ` Fabiano Rosas
@ 2024-01-29 23:30 ` Fabiano Rosas
2024-01-30 10:18 ` Peter Maydell
0 siblings, 1 reply; 34+ messages in thread
From: Fabiano Rosas @ 2024-01-29 23:30 UTC (permalink / raw)
To: Peter Xu; +Cc: Peter Maydell, qemu-devel
Fabiano Rosas <farosas@suse.de> writes:
> Peter Xu <peterx@redhat.com> writes:
>
>> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote:
>>> >> - arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath);
>>> >> + arch_opts = g_strdup_printf("-cpu %s -kernel %s",
>>> >> + qtest_has_accel("kvm") ?
>>> >> + "host" : "neoverse-n1", bootpath);
>>> >> start_address = ARM_TEST_MEM_START;
>>> >> end_address = ARM_TEST_MEM_END;
>>> >> } else {
>>> >
>>> > If you want to do that then a comment explaining why would be
>>> > helpful for future readers, I think.
>>>
>>> Ok, let's drop this one then, I'll resend.
>>
>> I'll drop this one for now then, thanks.
>>
>> Just to double check: Fabiano, you meant that "-cpu host" won't hit the
>> same issue as what "-cpu max" would have for the new "n-1" CI test, right?
>
> Well, no. What we need here is a cpu that works with KVM. Currently
> that's 'host'. If that breaks the n-1 test, then it's a regression.
>
> We also need a cpu that works with TCG. Any of them would do. Except max
> which changes in incompatible ways (that was the original patch's
> purpose).
>
> The issue that occurs to me now is that 'cpu host' will not work with
> TCG. We might actually need to go poking /dev/kvm for this to work.
Nevermind this last part. There's not going to be a scenario where we
build with CONFIG_KVM, but run in an environment that does not support
KVM.
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-01-29 23:30 ` Fabiano Rosas
@ 2024-01-30 10:18 ` Peter Maydell
2024-01-30 10:48 ` Peter Xu
0 siblings, 1 reply; 34+ messages in thread
From: Peter Maydell @ 2024-01-30 10:18 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: Peter Xu, qemu-devel
On Mon, 29 Jan 2024 at 23:31, Fabiano Rosas <farosas@suse.de> wrote:
>
> Fabiano Rosas <farosas@suse.de> writes:
>
> > Peter Xu <peterx@redhat.com> writes:
> >
> >> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote:
> > The issue that occurs to me now is that 'cpu host' will not work with
> > TCG. We might actually need to go poking /dev/kvm for this to work.
>
> Nevermind this last part. There's not going to be a scenario where we
> build with CONFIG_KVM, but run in an environment that does not support
> KVM.
Yes, there is. We'll build with CONFIG_KVM on any aarch64 host,
but that doesn't imply that the user running the build and
test has permissions for /dev/kvm.
thanks
-- PMM
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-01-30 10:18 ` Peter Maydell
@ 2024-01-30 10:48 ` Peter Xu
2024-01-30 21:23 ` Fabiano Rosas
0 siblings, 1 reply; 34+ messages in thread
From: Peter Xu @ 2024-01-30 10:48 UTC (permalink / raw)
To: Peter Maydell; +Cc: Fabiano Rosas, qemu-devel
On Tue, Jan 30, 2024 at 10:18:07AM +0000, Peter Maydell wrote:
> On Mon, 29 Jan 2024 at 23:31, Fabiano Rosas <farosas@suse.de> wrote:
> >
> > Fabiano Rosas <farosas@suse.de> writes:
> >
> > > Peter Xu <peterx@redhat.com> writes:
> > >
> > >> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote:
> > > The issue that occurs to me now is that 'cpu host' will not work with
> > > TCG. We might actually need to go poking /dev/kvm for this to work.
> >
> > Nevermind this last part. There's not going to be a scenario where we
> > build with CONFIG_KVM, but run in an environment that does not support
> > KVM.
>
> Yes, there is. We'll build with CONFIG_KVM on any aarch64 host,
> but that doesn't imply that the user running the build and
> test has permissions for /dev/kvm.
I'm actually pretty confused on why this would be a problem even for
neoverse-n1: can we just try to use KVM, if it fails then use TCG?
Something like:
(construct qemu cmdline)
..
#ifdef CONFIG_KVM
"-accel kvm "
#endif
"-accel tcg "
..
?
IIUC, if we specify two "-accel" options, we'll try the first, and if that fails, fall back to the second?
--
Peter Xu
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-01-30 10:48 ` Peter Xu
@ 2024-01-30 21:23 ` Fabiano Rosas
2024-01-31 4:04 ` Peter Xu
0 siblings, 1 reply; 34+ messages in thread
From: Fabiano Rosas @ 2024-01-30 21:23 UTC (permalink / raw)
To: Peter Xu, Peter Maydell; +Cc: qemu-devel
Peter Xu <peterx@redhat.com> writes:
> On Tue, Jan 30, 2024 at 10:18:07AM +0000, Peter Maydell wrote:
>> On Mon, 29 Jan 2024 at 23:31, Fabiano Rosas <farosas@suse.de> wrote:
>> >
>> > Fabiano Rosas <farosas@suse.de> writes:
>> >
>> > > Peter Xu <peterx@redhat.com> writes:
>> > >
>> > >> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote:
>> > > The issue that occurs to me now is that 'cpu host' will not work with
>> > > TCG. We might actually need to go poking /dev/kvm for this to work.
>> >
>> > Nevermind this last part. There's not going to be a scenario where we
>> > build with CONFIG_KVM, but run in an environment that does not support
>> > KVM.
>>
>> Yes, there is. We'll build with CONFIG_KVM on any aarch64 host,
>> but that doesn't imply that the user running the build and
>> test has permissions for /dev/kvm.
>
> I'm actually pretty confused on why this would be a problem even for
> neoverse-n1: can we just try to use KVM, if it fails then use TCG?
> Something like:
>
> (construct qemu cmdline)
> ..
> #ifdef CONFIG_KVM
> "-accel kvm "
> #endif
> "-accel tcg "
> ..
>
> ?
> IIUC if we specify two "-accel", we'll try the first, then if failed then
> the 2nd?
Aside from '-cpu max', there's no -accel and -cpu combination that works
on all of:
x86_64 host - TCG-only
aarch64 host - KVM & TCG
aarch64 host with --disable-tcg - KVM-only
aarch64 host without access to /dev/kvm - TCG-only
And the cpus are:
host - KVM-only
neoverse-n1 - TCG-only
We'll need something like:
/* covers aarch64 host with --disable-tcg */
if (qtest_has_accel("kvm") && !qtest_has_accel("tcg")) {
if (open("/dev/kvm", O_RDONLY) < 0) {
g_test_skip()
} else {
"-accel kvm -cpu host"
}
}
/* covers x86_64 host */
if (!qtest_has_accel("kvm") && qtest_has_accel("tcg")) {
"-accel tcg -cpu neoverse-n1"
}
/* covers aarch64 host */
if (qtest_has_accel("kvm") && qtest_has_accel("tcg")) {
if (open("/dev/kvm", O_RDONLY) < 0) {
"-accel tcg -cpu neoverse-n1"
} else {
"-accel kvm -cpu host"
}
}
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-01-30 21:23 ` Fabiano Rosas
@ 2024-01-31 4:04 ` Peter Xu
2024-01-31 13:09 ` Fabiano Rosas
0 siblings, 1 reply; 34+ messages in thread
From: Peter Xu @ 2024-01-31 4:04 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: Peter Maydell, qemu-devel
On Tue, Jan 30, 2024 at 06:23:10PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Tue, Jan 30, 2024 at 10:18:07AM +0000, Peter Maydell wrote:
> >> On Mon, 29 Jan 2024 at 23:31, Fabiano Rosas <farosas@suse.de> wrote:
> >> >
> >> > Fabiano Rosas <farosas@suse.de> writes:
> >> >
> >> > > Peter Xu <peterx@redhat.com> writes:
> >> > >
> >> > >> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote:
> >> > > The issue that occurs to me now is that 'cpu host' will not work with
> >> > > TCG. We might actually need to go poking /dev/kvm for this to work.
> >> >
> >> > Nevermind this last part. There's not going to be a scenario where we
> >> > build with CONFIG_KVM, but run in an environment that does not support
> >> > KVM.
> >>
> >> Yes, there is. We'll build with CONFIG_KVM on any aarch64 host,
> >> but that doesn't imply that the user running the build and
> >> test has permissions for /dev/kvm.
> >
> > I'm actually pretty confused on why this would be a problem even for
> > neoverse-n1: can we just try to use KVM, if it fails then use TCG?
> > Something like:
> >
> > (construct qemu cmdline)
> > ..
> > #ifdef CONFIG_KVM
>
> > "-accel kvm "
> > #endif
> > "-accel tcg "
> > ..
> >
> > ?
> > IIUC if we specify two "-accel", we'll try the first, then if failed then
> > the 2nd?
>
> Aside from '-cpu max', there's no -accel and -cpu combination that works
> on all of:
>
> x86_64 host - TCG-only
> aarch64 host - KVM & TCG
> aarch64 host with --disable-tcg - KVM-only
> aarch64 host without access to /dev/kvm - TCG-only
>
> And the cpus are:
> host - KVM-only
> neoverse-n1 - TCG-only
>
> We'll need something like:
>
> /* covers aarch64 host with --disable-tcg */
> if (qtest_has_accel("kvm") && !qtest_has_accel("tcg")) {
> if (open("/dev/kvm", O_RDONLY) < 0) {
> g_test_skip()
> } else {
> "-accel kvm -cpu host"
> }
> }
>
> /* covers x86_64 host */
> if (!qtest_has_accel("kvm") && qtest_has_accel("tcg")) {
> "-accel tcg -cpu neoverse-n1"
> }
>
> /* covers aarch64 host */
> if (qtest_has_accel("kvm") && qtest_has_accel("tcg")) {
> if (open("/dev/kvm", O_RDONLY) < 0) {
> "-accel tcg -cpu neoverse-n1"
> } else {
> "-accel kvm -cpu host"
> }
> }
The open("/dev/kvm") logic more or less duplicates what QEMU already does
when init accelerators:
if (!qemu_opts_foreach(qemu_find_opts("accel"),
do_configure_accelerator, &init_failed, &error_fatal)) {
if (!init_failed) {
error_report("no accelerator found");
}
exit(1);
}
If /dev/kvm is not accessible, I think it'll already fall back to tcg here, as do_configure_accelerator() for kvm will just silently fail for qtest. I hope we can still rely on that for /dev/kvm access issues.
Hmm, I just noticed that test_migrate_start() already has this later:
"-accel kvm%s -accel tcg "
So we're actually good in that regard, AFAIU.
Then did I understand it right that in the failure case KVM was properly initialized, but it crashed later because neoverse-n1 asked for TCG? So the logic in the accel code above didn't really work to do a real fallback?
A backtrace of such a crash would help, maybe; I tried to find it in the pipeline log but I can only see:
----------------------------------- stderr -----------------------------------
Broken pipe
../tests/qtest/libqtest.c:195: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
Or, is there some aarch64 cpu that will have a stable CPU ABI (not like
-max, which is unstable), meanwhile supports both TCG + KVM?
Another thing I noticed that we may need to be cautious about is that the gic is currently also using the max version:
machine_opts = "gic-version=max";
We may want to choose a sane version too, probably together with the patch?
--
Peter Xu
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-01-31 4:04 ` Peter Xu
@ 2024-01-31 13:09 ` Fabiano Rosas
2024-02-01 2:56 ` Peter Xu
0 siblings, 1 reply; 34+ messages in thread
From: Fabiano Rosas @ 2024-01-31 13:09 UTC (permalink / raw)
To: Peter Xu; +Cc: Peter Maydell, qemu-devel
Peter Xu <peterx@redhat.com> writes:
> On Tue, Jan 30, 2024 at 06:23:10PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > On Tue, Jan 30, 2024 at 10:18:07AM +0000, Peter Maydell wrote:
>> >> On Mon, 29 Jan 2024 at 23:31, Fabiano Rosas <farosas@suse.de> wrote:
>> >> >
>> >> > Fabiano Rosas <farosas@suse.de> writes:
>> >> >
>> >> > > Peter Xu <peterx@redhat.com> writes:
>> >> > >
>> >> > >> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote:
>> >> > > The issue that occurs to me now is that 'cpu host' will not work with
>> >> > > TCG. We might actually need to go poking /dev/kvm for this to work.
>> >> >
>> >> > Nevermind this last part. There's not going to be a scenario where we
>> >> > build with CONFIG_KVM, but run in an environment that does not support
>> >> > KVM.
>> >>
>> >> Yes, there is. We'll build with CONFIG_KVM on any aarch64 host,
>> >> but that doesn't imply that the user running the build and
>> >> test has permissions for /dev/kvm.
>> >
>> > I'm actually pretty confused on why this would be a problem even for
>> > neoverse-n1: can we just try to use KVM, if it fails then use TCG?
>> > Something like:
>> >
>> > (construct qemu cmdline)
>> > ..
>> > #ifdef CONFIG_KVM
>>
>> > "-accel kvm "
>> > #endif
>> > "-accel tcg "
>> > ..
>> >
>> > ?
>> > IIUC if we specify two "-accel", we'll try the first, then if failed then
>> > the 2nd?
>>
>> Aside from '-cpu max', there's no -accel and -cpu combination that works
>> on all of:
>>
>> x86_64 host - TCG-only
>> aarch64 host - KVM & TCG
>> aarch64 host with --disable-tcg - KVM-only
>> aarch64 host without access to /dev/kvm - TCG-only
>>
>> And the cpus are:
>> host - KVM-only
>> neoverse-n1 - TCG-only
>>
>> We'll need something like:
>>
>> /* covers aarch64 host with --disable-tcg */
>> if (qtest_has_accel("kvm") && !qtest_has_accel("tcg")) {
>> if (open("/dev/kvm", O_RDONLY) < 0) {
>> g_test_skip()
>> } else {
>> "-accel kvm -cpu host"
>> }
>> }
>>
>> /* covers x86_64 host */
>> if (!qtest_has_accel("kvm") && qtest_has_accel("tcg")) {
>> "-accel tcg -cpu neoverse-n1"
>> }
>>
>> /* covers aarch64 host */
>> if (qtest_has_accel("kvm") && qtest_has_accel("tcg")) {
>> if (open("/dev/kvm", O_RDONLY) < 0) {
>> "-accel tcg -cpu neoverse-n1"
>> } else {
>> "-accel kvm -cpu host"
>> }
>> }
>
> The open("/dev/kvm") logic more or less duplicates what QEMU already does
> when init accelerators:
>
> if (!qemu_opts_foreach(qemu_find_opts("accel"),
> do_configure_accelerator, &init_failed, &error_fatal)) {
> if (!init_failed) {
> error_report("no accelerator found");
> }
> exit(1);
> }
>
> If /dev/kvm not accessible I think it'll already fallback to tcg here, as
> do_configure_accelerator() for kvm will just silently fail for qtest. I
> hope we can still rely on that for /dev/kvm access issues.
If we ask for KVM and it falls back to TCG, we need a cpu that supports
both. We don't have that. I've put some command-line combinations at the
end of the email[1], take a look.
If we ask for KVM only and /dev/kvm is not accessible, the test will
fail and we can prevent that by checking beforehand. It's much simpler
to check first and do the right thing than to run the QEMU binary and
somehow work around the test failure in migration-test.
>
> Hmm, I just notice that test_migrate_start() already has this later:
>
> "-accel kvm%s -accel tcg "
>
> So we're actually good from that regard, AFAIU.
>
> Then did I understand it right that in the failure case KVM is properly
> initialized, however it crashed later in neoverse-n1 asking for TCG? So
It didn't crash. It simply does not accept the neoverse-n1 with KVM
because it's unsupported:
qemu-system-aarch64: KVM is not supported for this guest CPU type
qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Invalid argument
> the logic in the accel code above didn't really work to do a real fallback?
Yep, it didn't.
> A backtrace of such crash would help, maybe; I tried to find it in the
> pipeline log but I can only see:
>
> ----------------------------------- stderr -----------------------------------
> Broken pipe
> ../tests/qtest/libqtest.c:195: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
We need to fix the QTEST_LOG logic someday. It currently hides QEMU's stderr. But when we enable logging, it logs every single serial read, write, and query-migrate on the face of the earth, and it floods the logs.
>
> Or, is there some aarch64 cpu that will have a stable CPU ABI (not like
> -max, which is unstable), meanwhile supports both TCG + KVM?
Not as far as I know.
>
> Another thing I noticed that we may need to be caution is that currently
> gic is also using max version:
>
> machine_opts = "gic-version=max";
>
> We may want to choose a sane version too, probably altogether with the
> patch?
Good point.
====================
[1]
On x86_64:
==========
-cpu host
---------
$ ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm
qemu-system-aarch64: -accel kvm: invalid accelerator kvm
$ ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg
qemu-system-aarch64: unable to find CPU model 'host'
$ ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm -accel tcg
qemu-system-aarch64: -accel kvm: invalid accelerator kvm
qemu-system-aarch64: falling back to tcg
qemu-system-aarch64: unable to find CPU model 'host'
$ ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg -accel kvm
qemu-system-aarch64: unable to find CPU model 'host'
-cpu neoverse-n1
----------------
$ ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg
works
$ ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm
qemu-system-aarch64: -accel kvm: invalid accelerator kvm
$ ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm -accel tcg
qemu-system-aarch64: -accel kvm: invalid accelerator kvm
qemu-system-aarch64: falling back to tcg
works
$ ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg -accel kvm
works
On aarch64:
===========
-cpu host
---------
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm
works
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg
qemu-system-aarch64: The 'host' CPU type can only be used with KVM or HVF
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm -accel tcg
works
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg -accel kvm
qemu-system-aarch64: The 'host' CPU type can only be used with KVM or
HVF
-cpu neoverse-n1
----------------
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm
qemu-system-aarch64: KVM is not supported for this guest CPU type
qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Invalid argument
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg
works
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm -accel tcg
qemu-system-aarch64: KVM is not supported for this guest CPU type
qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Invalid argument
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg -accel kvm
works
On aarch64 --disable-tcg:
=========================
-cpu host
---------
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm
works
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg
qemu-system-aarch64: -accel tcg: invalid accelerator tcg
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm -accel tcg
works
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg -accel kvm
qemu-system-aarch64: -accel tcg: invalid accelerator tcg
qemu-system-aarch64: falling back to KVM
works
-cpu neoverse-n1
----------------
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm
qemu-system-aarch64: unable to find CPU model 'neoverse-n1'
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg
qemu-system-aarch64: -accel tcg: invalid accelerator tcg
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm -accel tcg
qemu-system-aarch64: unable to find CPU model 'neoverse-n1'
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg -accel kvm
qemu-system-aarch64: -accel tcg: invalid accelerator tcg
qemu-system-aarch64: falling back to KVM
qemu-system-aarch64: unable to find CPU model 'neoverse-n1'
On aarch64 without access to /dev/kvm:
======================================
-cpu host
---------
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm
Could not access KVM kernel module: No such file or directory
qemu-system-aarch64: -accel kvm: failed to initialize kvm: No such file
or directory
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg
qemu-system-aarch64: The 'host' CPU type can only be used with KVM or HVF
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm -accel tcg
Could not access KVM kernel module: No such file or directory
qemu-system-aarch64: -accel kvm: failed to initialize kvm: No such file or directory
qemu-system-aarch64: falling back to tcg
qemu-system-aarch64: The 'host' CPU type can only be used with KVM or
HVF
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg -accel kvm
qemu-system-aarch64: The 'host' CPU type can only be used with KVM or HVF
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-01-31 13:09 ` Fabiano Rosas
@ 2024-02-01 2:56 ` Peter Xu
[not found] ` <87y1c4ib03.fsf@suse.de>
0 siblings, 1 reply; 34+ messages in thread
From: Peter Xu @ 2024-02-01 2:56 UTC (permalink / raw)
To: Fabiano Rosas, Eric Auger, Peter Maydell; +Cc: qemu-devel
On Wed, Jan 31, 2024 at 10:09:16AM -0300, Fabiano Rosas wrote:
> If we ask for KVM and it falls back to TCG, we need a cpu that supports
> both. We don't have that. I've put some command-line combinations at the
> end of the email[1], take a look.
Thanks a lot, Fabiano. I think I have a better picture now.
Now the question is whether it's worthwhile for us (migration) to explicitly provide code to work around this issue in qtest, or to wait on the ARM side until we have a processor that is both stable and supports KVM+TCG.
I actually personally prefer to wait - it's not too bad after all, because
it only affects the new "n-1" migration test. Most of the migration
functionality will still be covered there in CI for ARM.
Meanwhile, AFAIU we do have a plan upstream to have, sooner or later, a stable aarch64 cpu model that at least supports KVM. If that also turns out to support TCG, then goal achieved. Or vice versa, if we are able to add KVM support to some stable TCG-only cores (like neoverse-n1).
Do we have a plan in this area? Copy both Peter & Eric.
If we can have that in 9.0 then that'll be perfect; we can already start to
switch migration tests to use the cpu model.
As of now, maybe we can (1) fix the gic-version in migration-test.c to be stable; this seems a separate issue, just to get prepared for when a new model comes; then (2) document the above decision in the migration-compat-aarch64 test in .gitlab-ci.d/, if we can reach consensus. Then we only rely on x86 for "n-1" migration tests until later.
--
Peter Xu
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
[not found] ` <87y1c4ib03.fsf@suse.de>
@ 2024-02-01 23:50 ` Peter Xu
2024-02-02 10:51 ` Peter Maydell
0 siblings, 1 reply; 34+ messages in thread
From: Peter Xu @ 2024-02-01 23:50 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, Peter Maydell, Eric Auger
Fabiano, I think you forgot to reply-to-all.. adding back the list and
people in the loop.
On Thu, Feb 01, 2024 at 10:12:44AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Wed, Jan 31, 2024 at 10:09:16AM -0300, Fabiano Rosas wrote:
> >> If we ask for KVM and it falls back to TCG, we need a cpu that supports
> >> both. We don't have that. I've put some command-line combinations at the
> >> end of the email[1], take a look.
> >
> > Thanks a lot, Fabiano. I think I have a better picture now.
> >
> > Now the question is whether it'll be worthwhile we (migration) explicitly
> > provide code to workaround such issue in qtest, or we wait for ARM side
> > until we have a processor that can be both stable and support KVM+TCG.
> >
> > I actually personally prefer to wait - it's not too bad after all, because
> > it only affects the new "n-1" migration test. Most of the migration
> > functionality will still be covered there in CI for ARM.
>
> That's fine with me. We just need to do something about the arm CI job
> which is currently disabled waiting for a fix. We could remove it or add
> some words somewhere explaining the situation. I can do that once we
> reach an agreement here.
Yes. IMHO we can keep the test (with SKIPPED=1) but amend the message, which would otherwise become inaccurate:
# This job is disabled until we release 9.0. The existing
# migration-test in 8.2 is broken on aarch64. The fix was already
# committed, but it will only take effect once 9.0 is out.
IMHO it then won't mean that 9.0 has it fixed; we'll simply wait for a cpu model that is ready for both kvm+tcg, and then replace "max".
For gic_version, I know 3 has been there for quite some time, so maybe we can start from that? Or we need suggestions from the ARM people.
--
Peter Xu
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-02-01 23:50 ` Peter Xu
@ 2024-02-02 10:51 ` Peter Maydell
2024-02-05 2:56 ` Peter Xu
2024-02-05 8:35 ` Eric Auger
0 siblings, 2 replies; 34+ messages in thread
From: Peter Maydell @ 2024-02-02 10:51 UTC (permalink / raw)
To: Peter Xu; +Cc: Fabiano Rosas, qemu-devel, Eric Auger
On Thu, 1 Feb 2024 at 23:50, Peter Xu <peterx@redhat.com> wrote:
>
> Fabiano, I think you forgot to reply-to-all.. adding back the list and
> people in the loop.
>
> On Thu, Feb 01, 2024 at 10:12:44AM -0300, Fabiano Rosas wrote:
> > Peter Xu <peterx@redhat.com> writes:
> >
> > > On Wed, Jan 31, 2024 at 10:09:16AM -0300, Fabiano Rosas wrote:
> > >> If we ask for KVM and it falls back to TCG, we need a cpu that supports
> > >> both. We don't have that. I've put some command-line combinations at the
> > >> end of the email[1], take a look.
> > >
> > > Thanks a lot, Fabiano. I think I have a better picture now.
> > >
> > > Now the question is whether it'll be worthwhile we (migration) explicitly
> > > provide code to workaround such issue in qtest, or we wait for ARM side
> > > until we have a processor that can be both stable and support KVM+TCG.
> > >
> > > I actually personally prefer to wait - it's not too bad after all, because
> > > it only affects the new "n-1" migration test. Most of the migration
> > > functionality will still be covered there in CI for ARM.
> >
> > That's fine with me. We just need to do something about the arm CI job
> > which is currently disabled waiting for a fix. We could remove it or add
> > some words somewhere explaining the situation. I can do that once we
> > reach an agreement here.
>
> Yes. IMHO we can keep the test (with SKIPPED=1) but amend the message,
> which will start to state inaccurately:
>
> # This job is disabled until we release 9.0. The existing
> # migration-test in 8.2 is broken on aarch64. The fix was already
> # committed, but it will only take effect once 9.0 is out.
>
> IMHO then it won't mean 9.0 will have it fixed, but we'll simply wait for a
> cpu model that is ready for both kvm+tcg, then we replace "max".
We already have a CPU model that works for both KVM and TCG: that
is "max". We're not going to add another one. The difference is
just that we provide different cross-version migration compatibility
support levels for the two cases. (Strictly speaking, I'm not sure we
strongly support migration compat for 'max' on KVM either --
for instance you probably need to be doing a migration on the
same host CPU type and the same host kernel version. It's just
that the definition of "max" on KVM is less QEMU-dependent and
more host-kernel-dependent, so in your particular situation running
the test cases you're less likely to see any possible breakage.)
-- PMM
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-02-02 10:51 ` Peter Maydell
@ 2024-02-05 2:56 ` Peter Xu
2024-02-12 18:29 ` Peter Maydell
2024-02-05 8:35 ` Eric Auger
1 sibling, 1 reply; 34+ messages in thread
From: Peter Xu @ 2024-02-05 2:56 UTC (permalink / raw)
To: Peter Maydell; +Cc: Fabiano Rosas, qemu-devel, Eric Auger
On Fri, Feb 02, 2024 at 10:51:36AM +0000, Peter Maydell wrote:
> On Thu, 1 Feb 2024 at 23:50, Peter Xu <peterx@redhat.com> wrote:
> >
> > Fabiano, I think you forgot to reply-to-all.. adding back the list and
> > people in the loop.
> >
> > On Thu, Feb 01, 2024 at 10:12:44AM -0300, Fabiano Rosas wrote:
> > > Peter Xu <peterx@redhat.com> writes:
> > >
> > > > On Wed, Jan 31, 2024 at 10:09:16AM -0300, Fabiano Rosas wrote:
> > > >> If we ask for KVM and it falls back to TCG, we need a cpu that supports
> > > >> both. We don't have that. I've put some command-line combinations at the
> > > >> end of the email[1], take a look.
> > > >
> > > > Thanks a lot, Fabiano. I think I have a better picture now.
> > > >
> > > > Now the question is whether it'll be worthwhile we (migration) explicitly
> > > > provide code to workaround such issue in qtest, or we wait for ARM side
> > > > until we have a processor that can be both stable and support KVM+TCG.
> > > >
> > > > I actually personally prefer to wait - it's not too bad after all, because
> > > > it only affects the new "n-1" migration test. Most of the migration
> > > > functionality will still be covered there in CI for ARM.
> > >
> > > That's fine with me. We just need to do something about the arm CI job
> > > which is currently disabled waiting for a fix. We could remove it or add
> > > some words somewhere explaining the situation. I can do that once we
> > > reach an agreement here.
> >
> > Yes. IMHO we can keep the test (with SKIPPED=1) but amend the message,
> > which will start to state inaccurately:
> >
> > # This job is disabled until we release 9.0. The existing
> > # migration-test in 8.2 is broken on aarch64. The fix was already
> > # committed, but it will only take effect once 9.0 is out.
> >
> > IMHO then it won't mean 9.0 will have it fixed, but we'll simply wait for a
> > cpu model that is ready for both kvm+tcg, then we replace "max".
>
> We already have a CPU model that works for both KVM and TCG: that
> is "max". We're not going to add another one.
Thanks, but then this is pretty sad. I'm surprised aarch64 doesn't have such a requirement, to allow some VM config to run across all kinds of hosts.
> The difference is just that we provide different cross-version migration
> compatibility support levels for the two cases. (Strictly speaking, I'm
> not sure we strongly support migration compat for 'max' on KVM either --
> for instance you probably need to be doing a migration on the same host
> CPU type and the same host kernel version. It's just that the definition
> of "max" on KVM is less QEMU-dependent and more host-kernel-dependent, so
> in your particular situation running the test cases you're less likely to
> see any possible breakage.)
Yes, we don't have an issue with KVM compatibility in the current CI, but the
QEMU version definitely matters.
Then we can either (1) add code in qtest, as Fabiano suggested, to choose a
different cpu model, or (2) simply not support aarch64 in the cross-binary
test, like most of the other arches, and only support x86 until an arch can
provide a stable CPU that supports all host configs (we can document this in
the CI file).
I'd vote for (2). Fabiano, do you have any preference?
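A minimal sketch of what option (2) could look like as a gating helper; the function name and the x86-only policy are illustrative assumptions, not actual qtest code:

```shell
# Hypothetical sketch of option (2): gate the cross-version ("n-1")
# migration test on host architecture, running it only on x86_64.
supports_cross_version_test() {
    case "$1" in
        x86_64)
            # x86 has CPU models that stay stable across host configs
            return 0 ;;
        *)
            # aarch64 and others: skip until a CPU model exists that is
            # stable and supported under both KVM and TCG
            return 1 ;;
    esac
}

for arch in x86_64 aarch64; do
    if supports_cross_version_test "$arch"; then
        echo "$arch: run cross-version migration test"
    else
        echo "$arch: SKIP cross-version migration test"
    fi
done
```

The same policy could equally live in the CI job definition rather than the test harness; the point is only that the skip is explicit and documented in one place.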
--
Peter Xu
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-02-02 10:51 ` Peter Maydell
2024-02-05 2:56 ` Peter Xu
@ 2024-02-05 8:35 ` Eric Auger
1 sibling, 0 replies; 34+ messages in thread
From: Eric Auger @ 2024-02-05 8:35 UTC (permalink / raw)
To: Peter Maydell, Peter Xu; +Cc: Fabiano Rosas, qemu-devel, Sebastian Ott
Hi,
On 2/2/24 11:51, Peter Maydell wrote:
> On Thu, 1 Feb 2024 at 23:50, Peter Xu <peterx@redhat.com> wrote:
>> Fabiano, I think you forgot to reply-to-all.. adding back the list and
>> people in the loop.
>>
>> On Thu, Feb 01, 2024 at 10:12:44AM -0300, Fabiano Rosas wrote:
>>> Peter Xu <peterx@redhat.com> writes:
>>>
>>>> On Wed, Jan 31, 2024 at 10:09:16AM -0300, Fabiano Rosas wrote:
>>>>> If we ask for KVM and it falls back to TCG, we need a cpu that supports
>>>>> both. We don't have that. I've put some command-line combinations at the
>>>>> end of the email[1], take a look.
>>>> Thanks a lot, Fabiano. I think I have a better picture now.
>>>>
>>>> Now the question is whether it's worthwhile for us (migration) to
>>>> explicitly provide code in qtest to work around this issue, or to wait
>>>> for the ARM side until there is a processor that is both stable and
>>>> supports KVM+TCG.
>>>>
>>>> I actually personally prefer to wait - it's not too bad after all, because
>>>> it only affects the new "n-1" migration test. Most of the migration
>>>> functionality will still be covered there in CI for ARM.
>>> That's fine with me. We just need to do something about the arm CI job
>>> which is currently disabled waiting for a fix. We could remove it or add
>>> some words somewhere explaining the situation. I can do that once we
>>> reach an agreement here.
>> Yes. IMHO we can keep the test (with SKIPPED=1) but amend the message,
>> which will otherwise become inaccurate:
>>
>> # This job is disabled until we release 9.0. The existing
>> # migration-test in 8.2 is broken on aarch64. The fix was already
>> # committed, but it will only take effect once 9.0 is out.
>>
>> IMHO it then won't imply that 9.0 will have it fixed; instead we'll simply
>> wait for a cpu model that is ready for both kvm+tcg, and then replace "max".
> We already have a CPU model that works for both KVM and TCG: that
> is "max". We're not going to add another one. The difference is
> just that we provide different cross-version migration compatibility
> support levels for the two cases. (Strictly speaking, I'm not sure we
> strongly support migration compat for 'max' on KVM either --
> for instance you probably need to be doing a migration on the
> same host CPU type and the same host kernel version. It's just
Yes, in general migrating across different kernels will fail, and the same
goes for different pCPU types.
We need CPU models to work around some of those limitations. Adding
Sebastian on CC; he is currently working on this.
Thanks
Eric
> that the definition of "max" on KVM is less QEMU-dependent and
> more host-kernel-dependent, so in your particular situation running
> the test cases you're less likely to see any possible breakage.)
>
> -- PMM
>
* Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
2024-02-05 2:56 ` Peter Xu
@ 2024-02-12 18:29 ` Peter Maydell
0 siblings, 0 replies; 34+ messages in thread
From: Peter Maydell @ 2024-02-12 18:29 UTC (permalink / raw)
To: Peter Xu; +Cc: Fabiano Rosas, qemu-devel, Eric Auger
On Mon, 5 Feb 2024 at 02:56, Peter Xu <peterx@redhat.com> wrote:
> Thanks, but then this is pretty sad. I'm surprised aarch64 doesn't have
> such a requirement, i.e. some VM config that is guaranteed to run across
> all kinds of hosts.
It just hasn't been anything that anybody so far has wanted
enough to put the necessary kernel-side work into. (There are
also some tricky issues surrounding errata workarounds that
the guest needs to do: you need to have some way of telling the
guest "the vCPU looks like it's type X but you need to do
errata workarounds ABC for CPU type Y, not the ones for X".)
> > The difference is just that we provide different cross-version migration
> > compatibility support levels for the two cases. (Strictly speaking, I'm
> > not sure we strongly support migration compat for 'max' on KVM either --
> > for instance you probably need to be doing a migration on the same host
> > CPU type and the same host kernel version. It's just that the definition
> > of "max" on KVM is less QEMU-dependent and more host-kernel-dependent, so
> > in your particular situation running the test cases you're less likely to
> > see any possible breakage.)
>
> Yes, we don't have an issue with KVM compatibility in the current CI, but
> the QEMU version definitely matters.
>
> Then we can either (1) add code in qtest, as Fabiano suggested, to choose a
> different cpu model, or (2) simply not support aarch64 in the cross-binary
> test, like most of the other arches, and only support x86 until an arch can
> provide a stable CPU that supports all host configs (we can document this
> in the CI file).
That seems a bit pessimistic. How about "always test with TCG only"?
That will still defend migration compatibility for all the device models
etc., which is the bit we're most likely to break by accident.
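A minimal sketch of what "always test with TCG only" could mean for the test invocation; the binary paths are hypothetical, but `-machine virt`, `-accel tcg` and `-cpu max` are real qemu-system-aarch64 options. With TCG pinned on both ends, the meaning of "max" is defined by QEMU itself rather than by the host CPU and kernel:

```shell
# Illustrative only: run both ends of the cross-version ("n-1") test
# under TCG, so "-cpu max" is QEMU-defined and host-independent.
# The binary paths below are hypothetical placeholders.
QEMU_OLD=/path/to/qemu-8.2/qemu-system-aarch64
QEMU_NEW=/path/to/qemu-9.0/qemu-system-aarch64
COMMON_ARGS="-machine virt -accel tcg -cpu max"

build_cmdline() {
    # $1: QEMU binary, $2: role label (src/dst)
    echo "$1 $COMMON_ARGS  # $2"
}

build_cmdline "$QEMU_OLD" src
build_cmdline "$QEMU_NEW" dst
```

The trade-off, as noted above, is that KVM-specific migration paths go untested, but the device-model state that changes between QEMU releases is still exercised.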
thanks
-- PMM
end of thread, other threads: [~2024-02-12 18:30 UTC | newest]
Thread overview: 34+ messages
-- links below jump to the message on this page --
2024-01-26 4:17 [PULL 00/15] Migration 20240126 patches peterx
2024-01-26 4:17 ` [PULL 01/15] userfaultfd: use 1ULL to build ioctl masks peterx
2024-01-26 4:17 ` [PULL 02/15] migration: Plug memory leak on HMP migrate error path peterx
2024-01-26 4:17 ` [PULL 03/15] migration: Make threshold_size an uint64_t peterx
2024-01-26 4:17 ` [PULL 04/15] migration: Drop unnecessary check in ram's pending_exact() peterx
2024-01-26 4:17 ` [PULL 05/15] analyze-migration.py: Remove trick on parsing ramblocks peterx
2024-01-26 4:17 ` [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64 peterx
2024-01-26 14:36 ` Fabiano Rosas
2024-01-26 14:39 ` Peter Maydell
2024-01-26 14:54 ` Fabiano Rosas
2024-01-29 2:51 ` Peter Xu
2024-01-29 12:14 ` Fabiano Rosas
2024-01-29 23:30 ` Fabiano Rosas
2024-01-30 10:18 ` Peter Maydell
2024-01-30 10:48 ` Peter Xu
2024-01-30 21:23 ` Fabiano Rosas
2024-01-31 4:04 ` Peter Xu
2024-01-31 13:09 ` Fabiano Rosas
2024-02-01 2:56 ` Peter Xu
[not found] ` <87y1c4ib03.fsf@suse.de>
2024-02-01 23:50 ` Peter Xu
2024-02-02 10:51 ` Peter Maydell
2024-02-05 2:56 ` Peter Xu
2024-02-12 18:29 ` Peter Maydell
2024-02-05 8:35 ` Eric Auger
2024-01-26 4:17 ` [PULL 07/15] ci: Add a migration compatibility test job peterx
2024-01-26 4:17 ` [PULL 08/15] ci: Disable migration compatibility tests for aarch64 peterx
2024-01-26 4:17 ` [PULL 09/15] migration/yank: Use channel features peterx
2024-01-26 4:17 ` [PULL 10/15] migration: Fix use-after-free of migration state object peterx
2024-01-26 4:17 ` [PULL 11/15] migration: Take reference to migration state around bg_migration_vm_start_bh peterx
2024-01-26 4:17 ` [PULL 12/15] migration: Reference migration state around loadvm_postcopy_handle_run_bh peterx
2024-01-26 4:17 ` [PULL 13/15] migration: Add a wrapper to qemu_bh_schedule peterx
2024-01-26 4:17 ` [PULL 14/15] migration: Centralize BH creation and dispatch peterx
2024-01-26 4:17 ` [PULL 15/15] Make 'uri' optional for migrate QAPI peterx
2024-01-26 18:16 ` [PULL 00/15] Migration 20240126 patches Peter Maydell