* [RFC PATCH 0/4] migration-test: Device migration smoke tests
@ 2024-05-23 20:19 Fabiano Rosas
2024-05-23 20:19 ` [RFC PATCH 1/4] tests/qtest/libqtest: Introduce another qtest_init version with no handshake Fabiano Rosas
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Fabiano Rosas @ 2024-05-23 20:19 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Xu, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala
We have discussed recently about two relatively cheap ways to catch
migration compatibility breakages across QEMU versions. This series
adds support for both.
1) vmstate-static-checker.py
This script has existed for a while and takes a dmup of vmstates from
two different QEMU versions and compares them.
The migration maintainer will run this before merges, but it's useless
for bugs that don't enter via the migration tree. I'm adding this test
to the CI for everyone.
Cons: the script can't handle renames and other compatible changes
that might happen to the vmstate structures without modification, so
these kinds of changes would fail the CI job during that release until
the script is fixed or the old QEMU version catches up. I think this
is passable because the CI job is already marked as allow_failure.
2) migration-tests with -device
We never ran the migration-tests with devices added to the QEMU
command line because the migration-tests run custom guest
code. However, just having the device in the command line already
causes it's state to be sent around and this has been shown to catch
bugs[1].
I'm adding support for running any migration-test with a list of
devices, either a hardcoded one provided by us, or a custom one
provided by whoever is experimenting with this code. This also give us
the ability to quickly check what happens when a new (to the tests)
device is added.
1- https://lore.kernel.org/r/87wmo5l58z.fsf@suse.de
Fabiano Rosas (4):
tests/qtest/libqtest: Introduce another qtest_init version with no
handshake
tests/qtest/migration: Add a test that runs vmstate-static-checker
tests/qtest/migration: Add support for simple device tests
ci: Add the new migration device tests
.gitlab-ci.d/buildtest.yml | 43 ++++++++++++---
tests/qtest/libqtest.c | 14 +++--
tests/qtest/libqtest.h | 13 +++++
tests/qtest/migration-test.c | 101 ++++++++++++++++++++++++++++++++++-
4 files changed, 157 insertions(+), 14 deletions(-)
base-commit: 01782d6b294f95bcde334386f0aaac593cd28c0d
--
2.35.3
^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC PATCH 1/4] tests/qtest/libqtest: Introduce another qtest_init version with no handshake
2024-05-23 20:19 [RFC PATCH 0/4] migration-test: Device migration smoke tests Fabiano Rosas
@ 2024-05-23 20:19 ` Fabiano Rosas
2024-05-23 20:19 ` [RFC PATCH 2/4] tests/qtest/migration: Add a test that runs vmstate-static-checker Fabiano Rosas
` (2 subsequent siblings)
3 siblings, 0 replies; 16+ messages in thread
From: Fabiano Rosas @ 2024-05-23 20:19 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Xu, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Laurent Vivier, Paolo Bonzini
Introduce a qtest_init version that does not go through the QMP
handshake, but does pass the test binary environment variables
forward. This is needed so we can run a simpler instance of QEMU with
-machine, but not much else.
The existing qtest_init_without_qmp_handshake() is not enough because
this time we want to pass along the special QTEST_QEMU_BINARY_SRC|DST
environment variables.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
tests/qtest/libqtest.c | 14 +++++++++-----
tests/qtest/libqtest.h | 13 +++++++++++++
2 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index d8f80d335e..911e45e189 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -513,11 +513,6 @@ static QTestState *qtest_init_internal(const char *qemu_bin,
kill(s->qemu_pid, SIGSTOP);
}
#endif
-
- /* ask endianness of the target */
-
- s->big_endian = qtest_query_target_endianness(s);
-
return s;
}
@@ -526,11 +521,20 @@ QTestState *qtest_init_without_qmp_handshake(const char *extra_args)
return qtest_init_internal(qtest_qemu_binary(NULL), extra_args);
}
+QTestState *qtest_init_with_env_no_handshake(const char *var,
+ const char *extra_args)
+{
+ return qtest_init_internal(qtest_qemu_binary(var), extra_args);
+}
+
QTestState *qtest_init_with_env(const char *var, const char *extra_args)
{
QTestState *s = qtest_init_internal(qtest_qemu_binary(var), extra_args);
QDict *greeting;
+ /* ask endianness of the target */
+ s->big_endian = qtest_query_target_endianness(s);
+
/* Read the QMP greeting and then do the handshake */
greeting = qtest_qmp_receive(s);
qobject_unref(greeting);
diff --git a/tests/qtest/libqtest.h b/tests/qtest/libqtest.h
index 6e3d3525bf..5e5554b5ad 100644
--- a/tests/qtest/libqtest.h
+++ b/tests/qtest/libqtest.h
@@ -68,6 +68,19 @@ QTestState *qtest_init(const char *extra_args);
*/
QTestState *qtest_init_with_env(const char *var, const char *extra_args);
+/**
+ * qtest_init_with_env_no_handshake:
+ * @var: Environment variable from where to take the QEMU binary
+ * @extra_args: Other arguments to pass to QEMU. CAUTION: these
+ * arguments are subject to word splitting and shell evaluation.
+ *
+ * Like qtest_init_with_env(), but skip the qmp handshake.
+ *
+ * Returns: #QTestState instance.
+ */
+QTestState *qtest_init_with_env_no_handshake(const char *var,
+ const char *extra_args);
+
/**
* qtest_init_without_qmp_handshake:
* @extra_args: other arguments to pass to QEMU. CAUTION: these
--
2.35.3
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH 2/4] tests/qtest/migration: Add a test that runs vmstate-static-checker
2024-05-23 20:19 [RFC PATCH 0/4] migration-test: Device migration smoke tests Fabiano Rosas
2024-05-23 20:19 ` [RFC PATCH 1/4] tests/qtest/libqtest: Introduce another qtest_init version with no handshake Fabiano Rosas
@ 2024-05-23 20:19 ` Fabiano Rosas
2024-05-27 21:06 ` Peter Xu
2024-05-23 20:19 ` [RFC PATCH 3/4] tests/qtest/migration: Add support for simple device tests Fabiano Rosas
2024-05-23 20:19 ` [RFC PATCH 4/4] ci: Add the new migration " Fabiano Rosas
3 siblings, 1 reply; 16+ messages in thread
From: Fabiano Rosas @ 2024-05-23 20:19 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Xu, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Laurent Vivier, Paolo Bonzini
We have the vmstate-static-checker script that takes the output of:
'$QEMU -M $machine -dump-vmstate' for two different QEMU versions and
compares them to check for compatibility breakages. This is just too
simple and useful for us to pass on it. Add a test that runs the
script.
Since this needs to use two different QEMU versions, the test is
skipped if only one QEMU is provided. The infrastructure for passing
more than one binary is already in place:
$ PYTHON=$(which python3.11) \
QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-x86_64 \
QTEST_QEMU_BINARY=./qemu-system-x86_64 \
./tests/qtest/migration-test -p /x86_64/migration/vmstate-checker-script
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
some code duplication for now, just so we can reason about this
without too much noise
---
tests/qtest/migration-test.c | 82 ++++++++++++++++++++++++++++++++++++
1 file changed, 82 insertions(+)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e8d3555f56..2253e0fc5b 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -63,6 +63,7 @@ static QTestMigrationState dst_state;
#define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */
#define ANALYZE_SCRIPT "scripts/analyze-migration.py"
+#define VMSTATE_CHECKER_SCRIPT "scripts/vmstate-static-checker.py"
#define QEMU_VM_FILE_MAGIC 0x5145564d
#define FILE_TEST_FILENAME "migfile"
@@ -1611,6 +1612,85 @@ static void test_analyze_script(void)
test_migrate_end(from, to, false);
cleanup("migfile");
}
+
+static void test_vmstate_checker_script(void)
+{
+ g_autofree gchar *cmd_src = NULL;
+ g_autofree gchar *cmd_dst = NULL;
+ g_autofree gchar *vmstate_src = NULL;
+ g_autofree gchar *vmstate_dst = NULL;
+ const char *machine_alias, *machine_opts = "";
+ g_autofree char *machine = NULL;
+ const char *arch = qtest_get_arch();
+ int pid, wstatus;
+ const char *python = g_getenv("PYTHON");
+
+ if (!getenv(QEMU_ENV_SRC) && !getenv(QEMU_ENV_DST)) {
+ g_test_skip("Test needs two different QEMU versions");
+ return;
+ }
+
+ if (!python) {
+ g_test_skip("PYTHON variable not set");
+ return;
+ }
+
+ if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
+ if (g_str_equal(arch, "i386")) {
+ machine_alias = "pc";
+ } else {
+ machine_alias = "q35";
+ }
+ } else if (g_str_equal(arch, "s390x")) {
+ machine_alias = "s390-ccw-virtio";
+ } else if (strcmp(arch, "ppc64") == 0) {
+ machine_alias = "pseries";
+ } else if (strcmp(arch, "aarch64") == 0) {
+ machine_alias = "virt";
+ } else {
+ g_assert_not_reached();
+ }
+
+ if (!qtest_has_machine(machine_alias)) {
+ g_autofree char *msg = g_strdup_printf("machine %s not supported", machine_alias);
+ g_test_skip(msg);
+ return;
+ }
+
+ machine = resolve_machine_version(machine_alias, QEMU_ENV_SRC,
+ QEMU_ENV_DST);
+
+ vmstate_src = g_strdup_printf("%s/vmstate-src", tmpfs);
+ vmstate_dst = g_strdup_printf("%s/vmstate-dst", tmpfs);
+
+ cmd_dst = g_strdup_printf("-machine %s,%s -dump-vmstate %s",
+ machine, machine_opts, vmstate_dst);
+ cmd_src = g_strdup_printf("-machine %s,%s -dump-vmstate %s",
+ machine, machine_opts, vmstate_src);
+
+ qtest_init_with_env_no_handshake(QEMU_ENV_SRC, cmd_src);
+ qtest_init_with_env_no_handshake(QEMU_ENV_DST, cmd_dst);
+
+ pid = fork();
+ if (!pid) {
+ close(1);
+ open("/dev/null", O_WRONLY);
+ execl(python, python, VMSTATE_CHECKER_SCRIPT,
+ "-s", vmstate_src,
+ "-d", vmstate_dst,
+ NULL);
+ g_assert_not_reached();
+ }
+
+ g_assert(waitpid(pid, &wstatus, 0) == pid);
+ if (!WIFEXITED(wstatus) || WEXITSTATUS(wstatus) != 0) {
+ g_test_message("Failed to run vmstate-static-checker.py");
+ g_test_fail();
+ }
+
+ cleanup("vmstate-src");
+ cleanup("vmstate-dst");
+}
#endif
static void test_precopy_common(MigrateCommon *args)
@@ -3495,6 +3575,8 @@ int main(int argc, char **argv)
#ifndef _WIN32
if (!g_str_equal(arch, "s390x")) {
migration_test_add("/migration/analyze-script", test_analyze_script);
+ migration_test_add("/migration/vmstate-checker-script",
+ test_vmstate_checker_script);
}
#endif
migration_test_add("/migration/precopy/unix/plain",
--
2.35.3
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH 3/4] tests/qtest/migration: Add support for simple device tests
2024-05-23 20:19 [RFC PATCH 0/4] migration-test: Device migration smoke tests Fabiano Rosas
2024-05-23 20:19 ` [RFC PATCH 1/4] tests/qtest/libqtest: Introduce another qtest_init version with no handshake Fabiano Rosas
2024-05-23 20:19 ` [RFC PATCH 2/4] tests/qtest/migration: Add a test that runs vmstate-static-checker Fabiano Rosas
@ 2024-05-23 20:19 ` Fabiano Rosas
2024-05-27 21:12 ` Peter Xu
2024-05-23 20:19 ` [RFC PATCH 4/4] ci: Add the new migration " Fabiano Rosas
3 siblings, 1 reply; 16+ messages in thread
From: Fabiano Rosas @ 2024-05-23 20:19 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Xu, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Laurent Vivier, Paolo Bonzini
The current migration-tests are almost entirely focused on catching
bugs on the migration code itself, not on the device migration
infrastructure (vmstate). That means we miss catching some low hanging
fruits that would show up immediately if only we had the device in
question present in the VM.
Add a list of devices to include by default in the migration-tests,
starting with one that recently had issues, virtio-gpu. Also add an
environment variable QTEST_DEVICE_OPTS to allow test users to
experiment with different devices or device options.
Do not run every migration test with the devices because that would
increase the complexity of the command lines and, as mentioned, the
migration-tests are mostly used to test the core migration code, not
the device migration. Add a special value QTEST_DEVICE_OPTS=all that
enables testing with devices.
Notes on usage:
For this new testing mode, it's not useful to run all the migration
tests, a single test would probably suffice to catch any issues, so
provide the -p option to migration-test and the test of your choice.
Like with the cross-version compatibility tests in CI and the recently
introduced vmstate-static-checker test, to be of any use, a test with
devices needs to be run against a different QEMU version, like so:
$ cd build
$ QTEST_DEVICE_OPTS=all \
QTEST_QEMU_BINARY=./qemu-system-x86_64 \
QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-x86_64 \
./tests/qtest/migration-test -p /x86_64/migration/precopy/tcp/plain
$ cd build
$ QTEST_DEVICE_OPTS='-device virtio-net' \
QTEST_QEMU_BINARY=./qemu-system-x86_64 \
QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-x86_64 \
./tests/qtest/migration-test -p /x86_64/migration/precopy/tcp/plain
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
tests/qtest/migration-test.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 2253e0fc5b..35bb224d18 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -71,6 +71,13 @@ static QTestMigrationState dst_state;
#define QEMU_ENV_SRC "QTEST_QEMU_BINARY_SRC"
#define QEMU_ENV_DST "QTEST_QEMU_BINARY_DST"
+/*
+ * The tests using DEFAULT_DEVICES need a special invocation and
+ * cannot be reached from make check, so don't bother with the
+ * --without-default-devices build.
+ */
+#define DEFAULT_DEVICES "-device virtio-gpu"
+
#if defined(__linux__)
#include <sys/syscall.h>
#include <sys/vfs.h>
@@ -701,6 +708,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
const char *memory_size;
const char *machine_alias, *machine_opts = "";
g_autofree char *machine = NULL;
+ g_autofree gchar *device_opts = NULL;
if (args->use_shmem) {
if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) {
@@ -793,12 +801,17 @@ static int test_migrate_start(QTestState **from, QTestState **to,
g_test_message("Using machine type: %s", machine);
+ device_opts = g_strdup(getenv("QTEST_DEVICE_OPTS"));
+ if (g_str_equal(device_opts, "all")) {
+ device_opts = g_strdup(DEFAULT_DEVICES);
+ }
+
cmd_source = g_strdup_printf("-accel kvm%s -accel tcg "
"-machine %s,%s "
"-name source,debug-threads=on "
"-m %s "
"-serial file:%s/src_serial "
- "%s %s %s %s %s",
+ "%s %s %s %s %s %s",
kvm_opts ? kvm_opts : "",
machine, machine_opts,
memory_size, tmpfs,
@@ -806,6 +819,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
arch_source ? arch_source : "",
shmem_opts ? shmem_opts : "",
args->opts_source ? args->opts_source : "",
+ device_opts ? device_opts : "",
ignore_stderr);
if (!args->only_target) {
*from = qtest_init_with_env(QEMU_ENV_SRC, cmd_source);
@@ -820,7 +834,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
"-m %s "
"-serial file:%s/dest_serial "
"-incoming %s "
- "%s %s %s %s %s",
+ "%s %s %s %s %s %s",
kvm_opts ? kvm_opts : "",
machine, machine_opts,
memory_size, tmpfs, uri,
@@ -828,6 +842,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
arch_target ? arch_target : "",
shmem_opts ? shmem_opts : "",
args->opts_target ? args->opts_target : "",
+ device_opts ? device_opts : "",
ignore_stderr);
*to = qtest_init_with_env(QEMU_ENV_DST, cmd_target);
qtest_qmp_set_event_callback(*to,
--
2.35.3
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH 4/4] ci: Add the new migration device tests
2024-05-23 20:19 [RFC PATCH 0/4] migration-test: Device migration smoke tests Fabiano Rosas
` (2 preceding siblings ...)
2024-05-23 20:19 ` [RFC PATCH 3/4] tests/qtest/migration: Add support for simple device tests Fabiano Rosas
@ 2024-05-23 20:19 ` Fabiano Rosas
2024-05-27 21:17 ` Peter Xu
3 siblings, 1 reply; 16+ messages in thread
From: Fabiano Rosas @ 2024-05-23 20:19 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Xu, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Alex Bennée, Philippe Mathieu-Daudé,
Wainer dos Santos Moschetta, Beraldo Leal
We have two new migration tests that check cross version
compatibility. One uses the vmstate-static-checker.py script to
compare the vmstate structures from two different QEMU versions. The
other runs a simple migration with a few devices present in the VM, to
catch obvious breakages.
Add both tests to the migration-compat-common job.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
.gitlab-ci.d/buildtest.yml | 43 +++++++++++++++++++++++++++++++-------
1 file changed, 36 insertions(+), 7 deletions(-)
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 91c57efded..bc7ac35983 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -202,18 +202,47 @@ build-previous-qemu:
needs:
- job: build-previous-qemu
- job: build-system-opensuse
- # The old QEMU could have bugs unrelated to migration that are
- # already fixed in the current development branch, so this test
- # might fail.
+ # This test is allowed to fail because:
+ #
+ # - The old QEMU could have bugs unrelated to migration that are
+ # already fixed in the current development branch.
+ #
+ # - The vmstate-static-checker script trips on renames and other
+ # backward-compatible changes to the vmstate structs.
allow_failure: true
variables:
IMAGE: opensuse-leap
MAKE_CHECK_ARGS: check-build
script:
- # Use the migration-tests from the older QEMU tree. This avoids
- # testing an old QEMU against new features/tests that it is not
- # compatible with.
- - cd build-previous
+ - cd build
+ # device state static test: Tests the vmstate structures for
+ # compatibility across QEMU versions. Uses the latest version of
+ # the tests.
+ # old to new
+ - PYTHON=pyvenv/bin/python3
+ QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-${TARGET}
+ QTEST_QEMU_BINARY=./qemu-system-${TARGET}
+ ./tests/qtest/migration-test -p /${TARGET}/migration/vmstate-checker-script
+ # new to old skipped because vmstate version bumps are always
+ # backward incompatible.
+
+ # device state runtime test: Performs a cross-version migration
+ # with a select list of devices (see DEFAULT_DEVICES in
+ # migration-test.c). Using the multifd tcp test here, but any will
+ # do.
+ # old to new
+ - QTEST_DEVICE_OPTS=all QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-${TARGET}
+ QTEST_QEMU_BINARY=./qemu-system-${TARGET} ./tests/qtest/migration-test
+ -p /${TARGET}/migration/multifd/tcp/channels/plain/none
+ # new to old
+ - QTEST_DEVICE_OPTS=all QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-${TARGET}
+ QTEST_QEMU_BINARY=./qemu-system-${TARGET} ./tests/qtest/migration-test
+ -p /${TARGET}/migration/multifd/tcp/channels/plain/none
+
+ # migration core tests: Use the migration-tests from the older
+ # QEMU tree. This avoids testing an old QEMU against new
+ # features/tests that it is not compatible with.
+ - cd ../build-previous
# old to new
- QTEST_QEMU_BINARY_SRC=./qemu-system-${TARGET}
QTEST_QEMU_BINARY=../build/qemu-system-${TARGET} ./tests/qtest/migration-test
--
2.35.3
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 2/4] tests/qtest/migration: Add a test that runs vmstate-static-checker
2024-05-23 20:19 ` [RFC PATCH 2/4] tests/qtest/migration: Add a test that runs vmstate-static-checker Fabiano Rosas
@ 2024-05-27 21:06 ` Peter Xu
2024-05-27 22:52 ` Fabiano Rosas
0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2024-05-27 21:06 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Laurent Vivier, Paolo Bonzini
On Thu, May 23, 2024 at 05:19:20PM -0300, Fabiano Rosas wrote:
> We have the vmstate-static-checker script that takes the output of:
> '$QEMU -M $machine -dump-vmstate' for two different QEMU versions and
> compares them to check for compatibility breakages. This is just too
> simple and useful for us to pass on it. Add a test that runs the
> script.
>
> Since this needs to use two different QEMU versions, the test is
> skipped if only one QEMU is provided. The infrastructure for passing
> more than one binary is already in place:
>
> $ PYTHON=$(which python3.11) \
> QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-x86_64 \
> QTEST_QEMU_BINARY=./qemu-system-x86_64 \
> ./tests/qtest/migration-test -p /x86_64/migration/vmstate-checker-script
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> some code duplication for now, just so we can reason about this
> without too much noise
> ---
> tests/qtest/migration-test.c | 82 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 82 insertions(+)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index e8d3555f56..2253e0fc5b 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -63,6 +63,7 @@ static QTestMigrationState dst_state;
> #define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */
>
> #define ANALYZE_SCRIPT "scripts/analyze-migration.py"
> +#define VMSTATE_CHECKER_SCRIPT "scripts/vmstate-static-checker.py"
>
> #define QEMU_VM_FILE_MAGIC 0x5145564d
> #define FILE_TEST_FILENAME "migfile"
> @@ -1611,6 +1612,85 @@ static void test_analyze_script(void)
> test_migrate_end(from, to, false);
> cleanup("migfile");
> }
> +
> +static void test_vmstate_checker_script(void)
> +{
> + g_autofree gchar *cmd_src = NULL;
> + g_autofree gchar *cmd_dst = NULL;
> + g_autofree gchar *vmstate_src = NULL;
> + g_autofree gchar *vmstate_dst = NULL;
> + const char *machine_alias, *machine_opts = "";
> + g_autofree char *machine = NULL;
> + const char *arch = qtest_get_arch();
> + int pid, wstatus;
> + const char *python = g_getenv("PYTHON");
> +
> + if (!getenv(QEMU_ENV_SRC) && !getenv(QEMU_ENV_DST)) {
> + g_test_skip("Test needs two different QEMU versions");
> + return;
> + }
> +
> + if (!python) {
> + g_test_skip("PYTHON variable not set");
> + return;
> + }
> +
> + if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
> + if (g_str_equal(arch, "i386")) {
> + machine_alias = "pc";
> + } else {
> + machine_alias = "q35";
> + }
> + } else if (g_str_equal(arch, "s390x")) {
> + machine_alias = "s390-ccw-virtio";
> + } else if (strcmp(arch, "ppc64") == 0) {
> + machine_alias = "pseries";
> + } else if (strcmp(arch, "aarch64") == 0) {
> + machine_alias = "virt";
> + } else {
> + g_assert_not_reached();
> + }
> +
> + if (!qtest_has_machine(machine_alias)) {
> + g_autofree char *msg = g_strdup_printf("machine %s not supported", machine_alias);
> + g_test_skip(msg);
> + return;
> + }
> +
> + machine = resolve_machine_version(machine_alias, QEMU_ENV_SRC,
> + QEMU_ENV_DST);
> +
> + vmstate_src = g_strdup_printf("%s/vmstate-src", tmpfs);
> + vmstate_dst = g_strdup_printf("%s/vmstate-dst", tmpfs);
> +
> + cmd_dst = g_strdup_printf("-machine %s,%s -dump-vmstate %s",
> + machine, machine_opts, vmstate_dst);
> + cmd_src = g_strdup_printf("-machine %s,%s -dump-vmstate %s",
> + machine, machine_opts, vmstate_src);
> +
> + qtest_init_with_env_no_handshake(QEMU_ENV_SRC, cmd_src);
> + qtest_init_with_env_no_handshake(QEMU_ENV_DST, cmd_dst);
> +
> + pid = fork();
> + if (!pid) {
> + close(1);
> + open("/dev/null", O_WRONLY);
> + execl(python, python, VMSTATE_CHECKER_SCRIPT,
> + "-s", vmstate_src,
> + "-d", vmstate_dst,
> + NULL);
> + g_assert_not_reached();
> + }
> +
> + g_assert(waitpid(pid, &wstatus, 0) == pid);
> + if (!WIFEXITED(wstatus) || WEXITSTATUS(wstatus) != 0) {
> + g_test_message("Failed to run vmstate-static-checker.py");
> + g_test_fail();
> + }
> +
> + cleanup("vmstate-src");
> + cleanup("vmstate-dst");
> +}
Did I ask before on whether this can be written without C?
I think this and also the analyze-script are more suitable to be written in
other ways, e.g., bash or python, no?
> #endif
>
> static void test_precopy_common(MigrateCommon *args)
> @@ -3495,6 +3575,8 @@ int main(int argc, char **argv)
> #ifndef _WIN32
> if (!g_str_equal(arch, "s390x")) {
> migration_test_add("/migration/analyze-script", test_analyze_script);
> + migration_test_add("/migration/vmstate-checker-script",
> + test_vmstate_checker_script);
> }
> #endif
> migration_test_add("/migration/precopy/unix/plain",
> --
> 2.35.3
>
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 3/4] tests/qtest/migration: Add support for simple device tests
2024-05-23 20:19 ` [RFC PATCH 3/4] tests/qtest/migration: Add support for simple device tests Fabiano Rosas
@ 2024-05-27 21:12 ` Peter Xu
2024-05-27 22:59 ` Fabiano Rosas
0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2024-05-27 21:12 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Laurent Vivier, Paolo Bonzini
On Thu, May 23, 2024 at 05:19:21PM -0300, Fabiano Rosas wrote:
> The current migration-tests are almost entirely focused on catching
> bugs on the migration code itself, not on the device migration
> infrastructure (vmstate). That means we miss catching some low hanging
> fruits that would show up immediately if only we had the device in
> question present in the VM.
>
> Add a list of devices to include by default in the migration-tests,
> starting with one that recently had issues, virtio-gpu. Also add an
> environment variable QTEST_DEVICE_OPTS to allow test users to
> experiment with different devices or device options.
>
> Do not run every migration test with the devices because that would
> increase the complexity of the command lines and, as mentioned, the
> migration-tests are mostly used to test the core migration code, not
> the device migration. Add a special value QTEST_DEVICE_OPTS=all that
> enables testing with devices.
>
> Notes on usage:
>
> For this new testing mode, it's not useful to run all the migration
> tests, a single test would probably suffice to catch any issues, so
> provide the -p option to migration-test and the test of your choice.
>
> Like with the cross-version compatibility tests in CI and the recently
> introduced vmstate-static-checker test, to be of any use, a test with
> devices needs to be run against a different QEMU version, like so:
>
> $ cd build
> $ QTEST_DEVICE_OPTS=all \
> QTEST_QEMU_BINARY=./qemu-system-x86_64 \
> QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-x86_64 \
> ./tests/qtest/migration-test -p /x86_64/migration/precopy/tcp/plain
>
> $ cd build
> $ QTEST_DEVICE_OPTS='-device virtio-net' \
> QTEST_QEMU_BINARY=./qemu-system-x86_64 \
> QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-x86_64 \
> ./tests/qtest/migration-test -p /x86_64/migration/precopy/tcp/plain
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> tests/qtest/migration-test.c | 19 +++++++++++++++++--
> 1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 2253e0fc5b..35bb224d18 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -71,6 +71,13 @@ static QTestMigrationState dst_state;
> #define QEMU_ENV_SRC "QTEST_QEMU_BINARY_SRC"
> #define QEMU_ENV_DST "QTEST_QEMU_BINARY_DST"
>
> +/*
> + * The tests using DEFAULT_DEVICES need a special invocation and
> + * cannot be reached from make check, so don't bother with the
> + * --without-default-devices build.
What's this "--without-default-devices"?
Other than this it looks all good.. thanks.
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 4/4] ci: Add the new migration device tests
2024-05-23 20:19 ` [RFC PATCH 4/4] ci: Add the new migration " Fabiano Rosas
@ 2024-05-27 21:17 ` Peter Xu
2024-05-27 23:59 ` Fabiano Rosas
0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2024-05-27 21:17 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Alex Bennée, Philippe Mathieu-Daudé,
Wainer dos Santos Moschetta, Beraldo Leal
On Thu, May 23, 2024 at 05:19:22PM -0300, Fabiano Rosas wrote:
> We have two new migration tests that check cross version
> compatibility. One uses the vmstate-static-checker.py script to
> compare the vmstate structures from two different QEMU versions. The
> other runs a simple migration with a few devices present in the VM, to
> catch obvious breakages.
>
> Add both tests to the migration-compat-common job.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> .gitlab-ci.d/buildtest.yml | 43 +++++++++++++++++++++++++++++++-------
> 1 file changed, 36 insertions(+), 7 deletions(-)
>
> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> index 91c57efded..bc7ac35983 100644
> --- a/.gitlab-ci.d/buildtest.yml
> +++ b/.gitlab-ci.d/buildtest.yml
> @@ -202,18 +202,47 @@ build-previous-qemu:
> needs:
> - job: build-previous-qemu
> - job: build-system-opensuse
> - # The old QEMU could have bugs unrelated to migration that are
> - # already fixed in the current development branch, so this test
> - # might fail.
> + # This test is allowed to fail because:
> + #
> + # - The old QEMU could have bugs unrelated to migration that are
> + # already fixed in the current development branch.
Did you ever hit a real failure with this? I'm wondering whether we can
remove this allow_failure thing.
> + #
> + # - The vmstate-static-checker script trips on renames and other
> + # backward-compatible changes to the vmstate structs.
I think I keep my preference per last time we talked on this. :)
I still think it's too early to involve a test that can report false
negative. I'd still keep running this before soft-freeze like I used to
do, throw issues to others and urge them to fix before release. Per my
previous experience that doesn't consume me a lot of time, and it's not
common to see issues either.
So I want people to really pay attention when someone sees a migration CI
test failed, rather than we help people form the habit in "oh migration CI
failed again? I think that's fine, it allows failing anyway".
So far I still don't see as much benefit to adding this if we need to pay
for the other false negative issue. I'll fully support it if e.g. we can
fix the tool to avoid reporting false negatives, but that may take effort
that I didn't check.
> allow_failure: true
> variables:
> IMAGE: opensuse-leap
> MAKE_CHECK_ARGS: check-build
> script:
> - # Use the migration-tests from the older QEMU tree. This avoids
> - # testing an old QEMU against new features/tests that it is not
> - # compatible with.
> - - cd build-previous
> + - cd build
> + # device state static test: Tests the vmstate structures for
> + # compatibility across QEMU versions. Uses the latest version of
> + # the tests.
> + # old to new
> + - PYTHON=pyvenv/bin/python3
> + QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-${TARGET}
> + QTEST_QEMU_BINARY=./qemu-system-${TARGET}
> + ./tests/qtest/migration-test -p /${TARGET}/migration/vmstate-checker-script
> + # new to old skipped because vmstate version bumps are always
> + # backward incompatible.
> +
> + # device state runtime test: Performs a cross-version migration
> + # with a select list of devices (see DEFAULT_DEVICES in
> + # migration-test.c). Using the multifd tcp test here, but any will
> + # do.
> + # old to new
> + - QTEST_DEVICE_OPTS=all QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-${TARGET}
> + QTEST_QEMU_BINARY=./qemu-system-${TARGET} ./tests/qtest/migration-test
> + -p /${TARGET}/migration/multifd/tcp/channels/plain/none
> + # new to old
> + - QTEST_DEVICE_OPTS=all QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-${TARGET}
> + QTEST_QEMU_BINARY=./qemu-system-${TARGET} ./tests/qtest/migration-test
> + -p /${TARGET}/migration/multifd/tcp/channels/plain/none
> +
> + # migration core tests: Use the migration-tests from the older
> + # QEMU tree. This avoids testing an old QEMU against new
> + # features/tests that it is not compatible with.
> + - cd ../build-previous
> # old to new
> - QTEST_QEMU_BINARY_SRC=./qemu-system-${TARGET}
> QTEST_QEMU_BINARY=../build/qemu-system-${TARGET} ./tests/qtest/migration-test
> --
> 2.35.3
>
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 2/4] tests/qtest/migration: Add a test that runs vmstate-static-checker
2024-05-27 21:06 ` Peter Xu
@ 2024-05-27 22:52 ` Fabiano Rosas
2024-05-28 15:52 ` Peter Xu
0 siblings, 1 reply; 16+ messages in thread
From: Fabiano Rosas @ 2024-05-27 22:52 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Laurent Vivier, Paolo Bonzini
Peter Xu <peterx@redhat.com> writes:
> On Thu, May 23, 2024 at 05:19:20PM -0300, Fabiano Rosas wrote:
>> We have the vmstate-static-checker script that takes the output of:
>> '$QEMU -M $machine -dump-vmstate' for two different QEMU versions and
>> compares them to check for compatibility breakages. This is just too
>> simple and useful for us to pass on it. Add a test that runs the
>> script.
>>
>> Since this needs to use two different QEMU versions, the test is
>> skipped if only one QEMU is provided. The infrastructure for passing
>> more than one binary is already in place:
>>
>> $ PYTHON=$(which python3.11) \
>> QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-x86_64 \
>> QTEST_QEMU_BINARY=./qemu-system-x86_64 \
>> ./tests/qtest/migration-test -p /x86_64/migration/vmstate-checker-script
>>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>> some code duplication for now, just so we can reason about this
>> without too much noise
>> ---
>> tests/qtest/migration-test.c | 82 ++++++++++++++++++++++++++++++++++++
>> 1 file changed, 82 insertions(+)
>>
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index e8d3555f56..2253e0fc5b 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -63,6 +63,7 @@ static QTestMigrationState dst_state;
>> #define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */
>>
>> #define ANALYZE_SCRIPT "scripts/analyze-migration.py"
>> +#define VMSTATE_CHECKER_SCRIPT "scripts/vmstate-static-checker.py"
>>
>> #define QEMU_VM_FILE_MAGIC 0x5145564d
>> #define FILE_TEST_FILENAME "migfile"
>> @@ -1611,6 +1612,85 @@ static void test_analyze_script(void)
>> test_migrate_end(from, to, false);
>> cleanup("migfile");
>> }
>> +
>> +static void test_vmstate_checker_script(void)
>> +{
>> + g_autofree gchar *cmd_src = NULL;
>> + g_autofree gchar *cmd_dst = NULL;
>> + g_autofree gchar *vmstate_src = NULL;
>> + g_autofree gchar *vmstate_dst = NULL;
>> + const char *machine_alias, *machine_opts = "";
>> + g_autofree char *machine = NULL;
>> + const char *arch = qtest_get_arch();
>> + int pid, wstatus;
>> + const char *python = g_getenv("PYTHON");
>> +
>> + if (!getenv(QEMU_ENV_SRC) && !getenv(QEMU_ENV_DST)) {
>> + g_test_skip("Test needs two different QEMU versions");
>> + return;
>> + }
>> +
>> + if (!python) {
>> + g_test_skip("PYTHON variable not set");
>> + return;
>> + }
>> +
>> + if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
>> + if (g_str_equal(arch, "i386")) {
>> + machine_alias = "pc";
>> + } else {
>> + machine_alias = "q35";
>> + }
>> + } else if (g_str_equal(arch, "s390x")) {
>> + machine_alias = "s390-ccw-virtio";
>> + } else if (strcmp(arch, "ppc64") == 0) {
>> + machine_alias = "pseries";
>> + } else if (strcmp(arch, "aarch64") == 0) {
>> + machine_alias = "virt";
>> + } else {
>> + g_assert_not_reached();
>> + }
>> +
>> + if (!qtest_has_machine(machine_alias)) {
>> + g_autofree char *msg = g_strdup_printf("machine %s not supported", machine_alias);
>> + g_test_skip(msg);
>> + return;
>> + }
>> +
>> + machine = resolve_machine_version(machine_alias, QEMU_ENV_SRC,
>> + QEMU_ENV_DST);
>> +
>> + vmstate_src = g_strdup_printf("%s/vmstate-src", tmpfs);
>> + vmstate_dst = g_strdup_printf("%s/vmstate-dst", tmpfs);
>> +
>> + cmd_dst = g_strdup_printf("-machine %s,%s -dump-vmstate %s",
>> + machine, machine_opts, vmstate_dst);
>> + cmd_src = g_strdup_printf("-machine %s,%s -dump-vmstate %s",
>> + machine, machine_opts, vmstate_src);
>> +
>> + qtest_init_with_env_no_handshake(QEMU_ENV_SRC, cmd_src);
>> + qtest_init_with_env_no_handshake(QEMU_ENV_DST, cmd_dst);
>> +
>> + pid = fork();
>> + if (!pid) {
>> + close(1);
>> + open("/dev/null", O_WRONLY);
>> + execl(python, python, VMSTATE_CHECKER_SCRIPT,
>> + "-s", vmstate_src,
>> + "-d", vmstate_dst,
>> + NULL);
>> + g_assert_not_reached();
>> + }
>> +
>> + g_assert(waitpid(pid, &wstatus, 0) == pid);
>> + if (!WIFEXITED(wstatus) || WEXITSTATUS(wstatus) != 0) {
>> + g_test_message("Failed to run vmstate-static-checker.py");
>> + g_test_fail();
>> + }
>> +
>> + cleanup("vmstate-src");
>> + cleanup("vmstate-dst");
>> +}
>
> Did I ask before on whether this can be written without C?
If you did I forgot about it, sorry.
> I think this and also the analyze-script are more suitable to be written in
> other ways, e.g., bash or python, no?
>
I would prefer not to fragment the test framework. There's a bunch of
infra already present in migration-test/libqtest that we would end up
having to rewrite in the other languages.
>> #endif
>>
>> static void test_precopy_common(MigrateCommon *args)
>> @@ -3495,6 +3575,8 @@ int main(int argc, char **argv)
>> #ifndef _WIN32
>> if (!g_str_equal(arch, "s390x")) {
>> migration_test_add("/migration/analyze-script", test_analyze_script);
>> + migration_test_add("/migration/vmstate-checker-script",
>> + test_vmstate_checker_script);
>> }
>> #endif
>> migration_test_add("/migration/precopy/unix/plain",
>> --
>> 2.35.3
>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 3/4] tests/qtest/migration: Add support for simple device tests
2024-05-27 21:12 ` Peter Xu
@ 2024-05-27 22:59 ` Fabiano Rosas
2024-05-28 15:35 ` Peter Xu
0 siblings, 1 reply; 16+ messages in thread
From: Fabiano Rosas @ 2024-05-27 22:59 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Laurent Vivier, Paolo Bonzini
Peter Xu <peterx@redhat.com> writes:
> On Thu, May 23, 2024 at 05:19:21PM -0300, Fabiano Rosas wrote:
>> The current migration-tests are almost entirely focused on catching
>> bugs on the migration code itself, not on the device migration
>> infrastructure (vmstate). That means we miss catching some low hanging
>> fruits that would show up immediately if only we had the device in
>> question present in the VM.
>>
>> Add a list of devices to include by default in the migration-tests,
>> starting with one that recently had issues, virtio-gpu. Also add an
>> environment variable QTEST_DEVICE_OPTS to allow test users to
>> experiment with different devices or device options.
>>
>> Do not run every migration test with the devices because that would
>> increase the complexity of the command lines and, as mentioned, the
>> migration-tests are mostly used to test the core migration code, not
>> the device migration. Add a special value QTEST_DEVICE_OPTS=all that
>> enables testing with devices.
>>
>> Notes on usage:
>>
>> For this new testing mode, it's not useful to run all the migration
>> tests, a single test would probably suffice to catch any issues, so
>> provide the -p option to migration-test and the test of your choice.
>>
>> Like with the cross-version compatibility tests in CI and the recently
>> introduced vmstate-static-checker test, to be of any use, a test with
>> devices needs to be run against a different QEMU version, like so:
>>
>> $ cd build
>> $ QTEST_DEVICE_OPTS=all \
>> QTEST_QEMU_BINARY=./qemu-system-x86_64 \
>> QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-x86_64 \
>> ./tests/qtest/migration-test -p /x86_64/migration/precopy/tcp/plain
>>
>> $ cd build
>> $ QTEST_DEVICE_OPTS='-device virtio-net' \
>> QTEST_QEMU_BINARY=./qemu-system-x86_64 \
>> QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-x86_64 \
>> ./tests/qtest/migration-test -p /x86_64/migration/precopy/tcp/plain
>>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>> tests/qtest/migration-test.c | 19 +++++++++++++++++--
>> 1 file changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index 2253e0fc5b..35bb224d18 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -71,6 +71,13 @@ static QTestMigrationState dst_state;
>> #define QEMU_ENV_SRC "QTEST_QEMU_BINARY_SRC"
>> #define QEMU_ENV_DST "QTEST_QEMU_BINARY_DST"
>>
>> +/*
>> + * The tests using DEFAULT_DEVICES need a special invocation and
>> + * cannot be reached from make check, so don't bother with the
>> + * --without-default-devices build.
>
> What's this "--without-default-devices"?
A configure option. It removes from the build any devices that are
marked as default. It's an endless source of bugs because it is supposed
to be paired with a config file that adds back some of the removed
devices, but there's nothing enforcing that so we always run it as is
and generate a broken QEMU binary.
So anything in the tests that refer to devices should first check if
that QEMU binary even has the device present. I'm saying here that we're
not going to do that because this test cannot be accidentally reached
via make check. Realistically, most people will consume this test
through the CI job only.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 4/4] ci: Add the new migration device tests
2024-05-27 21:17 ` Peter Xu
@ 2024-05-27 23:59 ` Fabiano Rosas
2024-05-28 15:48 ` Peter Xu
0 siblings, 1 reply; 16+ messages in thread
From: Fabiano Rosas @ 2024-05-27 23:59 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Alex Bennée, Philippe Mathieu-Daudé,
Wainer dos Santos Moschetta, Beraldo Leal
Peter Xu <peterx@redhat.com> writes:
> On Thu, May 23, 2024 at 05:19:22PM -0300, Fabiano Rosas wrote:
>> We have two new migration tests that check cross version
>> compatibility. One uses the vmstate-static-checker.py script to
>> compare the vmstate structures from two different QEMU versions. The
>> other runs a simple migration with a few devices present in the VM, to
>> catch obvious breakages.
>>
>> Add both tests to the migration-compat-common job.
>>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>> .gitlab-ci.d/buildtest.yml | 43 +++++++++++++++++++++++++++++++-------
>> 1 file changed, 36 insertions(+), 7 deletions(-)
>>
>> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
>> index 91c57efded..bc7ac35983 100644
>> --- a/.gitlab-ci.d/buildtest.yml
>> +++ b/.gitlab-ci.d/buildtest.yml
>> @@ -202,18 +202,47 @@ build-previous-qemu:
>> needs:
>> - job: build-previous-qemu
>> - job: build-system-opensuse
>> - # The old QEMU could have bugs unrelated to migration that are
>> - # already fixed in the current development branch, so this test
>> - # might fail.
>> + # This test is allowed to fail because:
>> + #
>> + # - The old QEMU could have bugs unrelated to migration that are
>> + # already fixed in the current development branch.
>
> Did you ever hit a real failure with this? I'm wondering whether we can
> remove this allow_failure thing.
>
I haven't. But when it fails we'll go through an entire release cycle
with this thing showing red for every person that runs the CI. Remember,
this is a CI failure to which there's no fix aside from waiting for the
release to happen. Even if we're quick to react and disable the job, I
feel it might create some confusion already.
>> + #
>> + # - The vmstate-static-checker script trips on renames and other
>> + # backward-compatible changes to the vmstate structs.
>
> I think I keep my preference per last time we talked on this. :)
Sorry, I'm not trying to force this in any way, I just wrote these to
use in the pull-request and thought I'd put it out there. At the very
least we can have your concerns documented. =)
> I still think it's too early to involve a test that can report false
> negative.
(1)
Well, we haven't seen any false negatives, we've seen fields being
renamed. If that happens, then we'll ask the person to update the
script. Is that not acceptable to you? Or are you thinking about other
sorts of issues?
> I'd still keep running this before soft-freeze like I used to
> do, throw issues to others and urge them to fix before release.
Having hidden procedures that maintainers run before a release is bad
IMHO, it just delays the catching of bugs and frustrates
contributors. Imagine working on a series, everything goes well with
reviews, CI passes, patch gets queued and merged and a month later you
get a ping about something you should have done to avoid breaking
migration. Right during freeze.
> Per my
> previous experience that doesn't consume me a lot of time, and it's not
> common to see issues either.
>
> So I want people to really pay attention when someone sees a migration CI
> test failed, rather than we help people form the habit in "oh migration CI
> failed again? I think that's fine, it allows failing anyway".
That's a good point. I don't think it applies here though. See my point
in (1).
> So far I still don't see as much benefit to adding this if we need to pay
> for the other false negative issue. I'll fully support it if e.g. we can
> fix the tool to avoid reporting false negatives, but that may take effort
> that I didn't check.
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 3/4] tests/qtest/migration: Add support for simple device tests
2024-05-27 22:59 ` Fabiano Rosas
@ 2024-05-28 15:35 ` Peter Xu
0 siblings, 0 replies; 16+ messages in thread
From: Peter Xu @ 2024-05-28 15:35 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Laurent Vivier, Paolo Bonzini
On Mon, May 27, 2024 at 07:59:50PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Thu, May 23, 2024 at 05:19:21PM -0300, Fabiano Rosas wrote:
> >> The current migration-tests are almost entirely focused on catching
> >> bugs on the migration code itself, not on the device migration
> >> infrastructure (vmstate). That means we miss catching some low hanging
> >> fruits that would show up immediately if only we had the device in
> >> question present in the VM.
> >>
> >> Add a list of devices to include by default in the migration-tests,
> >> starting with one that recently had issues, virtio-gpu. Also add an
> >> environment variable QTEST_DEVICE_OPTS to allow test users to
> >> experiment with different devices or device options.
> >>
> >> Do not run every migration test with the devices because that would
> >> increase the complexity of the command lines and, as mentioned, the
> >> migration-tests are mostly used to test the core migration code, not
> >> the device migration. Add a special value QTEST_DEVICE_OPTS=all that
> >> enables testing with devices.
> >>
> >> Notes on usage:
> >>
> >> For this new testing mode, it's not useful to run all the migration
> >> tests, a single test would probably suffice to catch any issues, so
> >> provide the -p option to migration-test and the test of your choice.
> >>
> >> Like with the cross-version compatibility tests in CI and the recently
> >> introduced vmstate-static-checker test, to be of any use, a test with
> >> devices needs to be run against a different QEMU version, like so:
> >>
> >> $ cd build
> >> $ QTEST_DEVICE_OPTS=all \
> >> QTEST_QEMU_BINARY=./qemu-system-x86_64 \
> >> QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-x86_64 \
> >> ./tests/qtest/migration-test -p /x86_64/migration/precopy/tcp/plain
> >>
> >> $ cd build
> >> $ QTEST_DEVICE_OPTS='-device virtio-net' \
> >> QTEST_QEMU_BINARY=./qemu-system-x86_64 \
> >> QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-x86_64 \
> >> ./tests/qtest/migration-test -p /x86_64/migration/precopy/tcp/plain
> >>
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> ---
> >> tests/qtest/migration-test.c | 19 +++++++++++++++++--
> >> 1 file changed, 17 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> >> index 2253e0fc5b..35bb224d18 100644
> >> --- a/tests/qtest/migration-test.c
> >> +++ b/tests/qtest/migration-test.c
> >> @@ -71,6 +71,13 @@ static QTestMigrationState dst_state;
> >> #define QEMU_ENV_SRC "QTEST_QEMU_BINARY_SRC"
> >> #define QEMU_ENV_DST "QTEST_QEMU_BINARY_DST"
> >>
> >> +/*
> >> + * The tests using DEFAULT_DEVICES need a special invocation and
> >> + * cannot be reached from make check, so don't bother with the
> >> + * --without-default-devices build.
> >
> > What's this "--without-default-devices"?
>
> A configure option. It removes from the build any devices that are
> marked as default. It's an endless source of bugs because it is supposed
> to be paired with a config file that adds back some of the removed
> devices, but there's nothing enforcing that so we always run it as is
> and generate a broken QEMU binary.
>
> So anything in the tests that refer to devices should first check if
> that QEMU binary even has the device present. I'm saying here that we're
> not going to do that because this test cannot be accidentally reached
> via make check. Realistically, most people will consume this test
> through the CI job only.
Ah I didn't expect that is an existing configure option.. then it is all
fine.
Reviewed-by: Peter Xu <peterx@redhat.com>
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 4/4] ci: Add the new migration device tests
2024-05-27 23:59 ` Fabiano Rosas
@ 2024-05-28 15:48 ` Peter Xu
2024-05-28 18:10 ` Fabiano Rosas
0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2024-05-28 15:48 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Alex Bennée, Philippe Mathieu-Daudé,
Wainer dos Santos Moschetta, Beraldo Leal
On Mon, May 27, 2024 at 08:59:00PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Thu, May 23, 2024 at 05:19:22PM -0300, Fabiano Rosas wrote:
> >> We have two new migration tests that check cross version
> >> compatibility. One uses the vmstate-static-checker.py script to
> >> compare the vmstate structures from two different QEMU versions. The
> >> other runs a simple migration with a few devices present in the VM, to
> >> catch obvious breakages.
> >>
> >> Add both tests to the migration-compat-common job.
> >>
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> ---
> >> .gitlab-ci.d/buildtest.yml | 43 +++++++++++++++++++++++++++++++-------
> >> 1 file changed, 36 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> >> index 91c57efded..bc7ac35983 100644
> >> --- a/.gitlab-ci.d/buildtest.yml
> >> +++ b/.gitlab-ci.d/buildtest.yml
> >> @@ -202,18 +202,47 @@ build-previous-qemu:
> >> needs:
> >> - job: build-previous-qemu
> >> - job: build-system-opensuse
> >> - # The old QEMU could have bugs unrelated to migration that are
> >> - # already fixed in the current development branch, so this test
> >> - # might fail.
> >> + # This test is allowed to fail because:
> >> + #
> >> + # - The old QEMU could have bugs unrelated to migration that are
> >> + # already fixed in the current development branch.
> >
> > Did you ever hit a real failure with this? I'm wondering whether we can
> > remove this allow_failure thing.
> >
>
> I haven't. But when it fails we'll go through an entire release cycle
> with this thing showing red for every person that runs the CI. Remember,
> this is a CI failure to which there's no fix aside from waiting for the
> release to happen. Even if we're quick to react and disable the job, I
> feel it might create some confusion already.
My imagination was if needed we'll get complains and we add that until
then for that broken release only, and remove in the next release again.
>
> >> + #
> >> + # - The vmstate-static-checker script trips on renames and other
> >> + # backward-compatible changes to the vmstate structs.
> >
> > I think I keep my preference per last time we talked on this. :)
>
> Sorry, I'm not trying to force this in any way, I just wrote these to
> use in the pull-request and thought I'd put it out there. At the very
> least we can have your concerns documented. =)
Yep that's fine. I think we should keep such discussion on the list,
especially we have different opinions, while none of us got convinced yet
so far. :)
>
> > I still think it's too early to involve a test that can report false
> > negative.
>
> (1)
> Well, we haven't seen any false negatives, we've seen fields being
> renamed. If that happens, then we'll ask the person to update the
> script. Is that not acceptable to you? Or are you thinking about other
> sorts of issues?
Then question is how to update the script. So far it's treated as failure
on rename, even if it's benign. Right now we have this:
print("Section \"" + sec + "\",", end=' ')
print("Description \"" + desc + "\":", end=' ')
print("expected field \"" + s_item["field"] + "\",", end=' ')
print("got \"" + d_item["field"] + "\"; skipping rest")
bump_taint()
break
Do you want to introduce a list of renamed vmsd fields in this script and
maintain that? IMHO it's an overkill and unnecessary burden to other
developers.
>
> > I'd still keep running this before soft-freeze like I used to
> > do, throw issues to others and urge them to fix before release.
>
> Having hidden procedures that maintainers run before a release is bad
> IMHO, it just delays the catching of bugs and frustrates
> contributors. Imagine working on a series, everything goes well with
> reviews, CI passes, patch gets queued and merged and a month later you
> get a ping about something you should have done to avoid breaking
> migration. Right during freeze.
I understand your point, however I don't yet see a way CI could cover
everything. CI won't cover performance test, and I still ran multifd tests
at least smoke it too before soft-freeze.
If there's something done wrong, we notify the sooner the better. Now it
looks to me the best trade-off is like that - we notify at soft-freeze once
per release considering that's pretty rare too, e.g. 9.0 has no such outlier.
Again I'm happy if we have a solution to not report false negatives; that's
the only concern I have.
>
> > Per my
> > previous experience that doesn't consume me a lot of time, and it's not
> > common to see issues either.
> >
> > So I want people to really pay attention when someone sees a migration CI
> > test failed, rather than we help people form the habit in "oh migration CI
> > failed again? I think that's fine, it allows failing anyway".
>
> That's a good point. I don't think it applies here though. See my point
> in (1).
Yes, if you have an answer to (1) I'm ok too. Maybe I misunderstood your
plan there.
Thanks,
>
> > So far I still don't see as much benefit to adding this if we need to pay
> > for the other false negative issue. I'll fully support it if e.g. we can
> > fix the tool to avoid reporting false negatives, but that may take effort
> > that I didn't check.
> >
>
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 2/4] tests/qtest/migration: Add a test that runs vmstate-static-checker
2024-05-27 22:52 ` Fabiano Rosas
@ 2024-05-28 15:52 ` Peter Xu
0 siblings, 0 replies; 16+ messages in thread
From: Peter Xu @ 2024-05-28 15:52 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Laurent Vivier, Paolo Bonzini
On Mon, May 27, 2024 at 07:52:28PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Thu, May 23, 2024 at 05:19:20PM -0300, Fabiano Rosas wrote:
> >> We have the vmstate-static-checker script that takes the output of:
> >> '$QEMU -M $machine -dump-vmstate' for two different QEMU versions and
> >> compares them to check for compatibility breakages. This is just too
> >> simple and useful for us to pass on it. Add a test that runs the
> >> script.
> >>
> >> Since this needs to use two different QEMU versions, the test is
> >> skipped if only one QEMU is provided. The infrastructure for passing
> >> more than one binary is already in place:
> >>
> >> $ PYTHON=$(which python3.11) \
> >> QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-x86_64 \
> >> QTEST_QEMU_BINARY=./qemu-system-x86_64 \
> >> ./tests/qtest/migration-test -p /x86_64/migration/vmstate-checker-script
> >>
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> ---
> >> some code duplication for now, just so we can reason about this
> >> without too much noise
> >> ---
> >> tests/qtest/migration-test.c | 82 ++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 82 insertions(+)
> >>
> >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> >> index e8d3555f56..2253e0fc5b 100644
> >> --- a/tests/qtest/migration-test.c
> >> +++ b/tests/qtest/migration-test.c
> >> @@ -63,6 +63,7 @@ static QTestMigrationState dst_state;
> >> #define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */
> >>
> >> #define ANALYZE_SCRIPT "scripts/analyze-migration.py"
> >> +#define VMSTATE_CHECKER_SCRIPT "scripts/vmstate-static-checker.py"
> >>
> >> #define QEMU_VM_FILE_MAGIC 0x5145564d
> >> #define FILE_TEST_FILENAME "migfile"
> >> @@ -1611,6 +1612,85 @@ static void test_analyze_script(void)
> >> test_migrate_end(from, to, false);
> >> cleanup("migfile");
> >> }
> >> +
> >> +static void test_vmstate_checker_script(void)
> >> +{
> >> + g_autofree gchar *cmd_src = NULL;
> >> + g_autofree gchar *cmd_dst = NULL;
> >> + g_autofree gchar *vmstate_src = NULL;
> >> + g_autofree gchar *vmstate_dst = NULL;
> >> + const char *machine_alias, *machine_opts = "";
> >> + g_autofree char *machine = NULL;
> >> + const char *arch = qtest_get_arch();
> >> + int pid, wstatus;
> >> + const char *python = g_getenv("PYTHON");
> >> +
> >> + if (!getenv(QEMU_ENV_SRC) && !getenv(QEMU_ENV_DST)) {
> >> + g_test_skip("Test needs two different QEMU versions");
> >> + return;
> >> + }
> >> +
> >> + if (!python) {
> >> + g_test_skip("PYTHON variable not set");
> >> + return;
> >> + }
> >> +
> >> + if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
> >> + if (g_str_equal(arch, "i386")) {
> >> + machine_alias = "pc";
> >> + } else {
> >> + machine_alias = "q35";
> >> + }
> >> + } else if (g_str_equal(arch, "s390x")) {
> >> + machine_alias = "s390-ccw-virtio";
> >> + } else if (strcmp(arch, "ppc64") == 0) {
> >> + machine_alias = "pseries";
> >> + } else if (strcmp(arch, "aarch64") == 0) {
> >> + machine_alias = "virt";
> >> + } else {
> >> + g_assert_not_reached();
> >> + }
> >> +
> >> + if (!qtest_has_machine(machine_alias)) {
> >> + g_autofree char *msg = g_strdup_printf("machine %s not supported", machine_alias);
> >> + g_test_skip(msg);
> >> + return;
> >> + }
> >> +
> >> + machine = resolve_machine_version(machine_alias, QEMU_ENV_SRC,
> >> + QEMU_ENV_DST);
> >> +
> >> + vmstate_src = g_strdup_printf("%s/vmstate-src", tmpfs);
> >> + vmstate_dst = g_strdup_printf("%s/vmstate-dst", tmpfs);
> >> +
> >> + cmd_dst = g_strdup_printf("-machine %s,%s -dump-vmstate %s",
> >> + machine, machine_opts, vmstate_dst);
> >> + cmd_src = g_strdup_printf("-machine %s,%s -dump-vmstate %s",
> >> + machine, machine_opts, vmstate_src);
> >> +
> >> + qtest_init_with_env_no_handshake(QEMU_ENV_SRC, cmd_src);
> >> + qtest_init_with_env_no_handshake(QEMU_ENV_DST, cmd_dst);
> >> +
> >> + pid = fork();
> >> + if (!pid) {
> >> + close(1);
> >> + open("/dev/null", O_WRONLY);
> >> + execl(python, python, VMSTATE_CHECKER_SCRIPT,
> >> + "-s", vmstate_src,
> >> + "-d", vmstate_dst,
> >> + NULL);
> >> + g_assert_not_reached();
> >> + }
> >> +
> >> + g_assert(waitpid(pid, &wstatus, 0) == pid);
> >> + if (!WIFEXITED(wstatus) || WEXITSTATUS(wstatus) != 0) {
> >> + g_test_message("Failed to run vmstate-static-checker.py");
> >> + g_test_fail();
> >> + }
> >> +
> >> + cleanup("vmstate-src");
> >> + cleanup("vmstate-dst");
> >> +}
> >
> > Did I ask before on whether this can be written without C?
>
> If you did I forgot about it, sorry.
Nah it's really about me forgetting things sometimes... nothing else.
>
> > I think this and also the analyze-script are more suitable to be written in
> > other ways, e.g., bash or python, no?
> >
>
> I would prefer not to fragment the test framework. There's a bunch of
> infra already present in migration-test/libqtest that we would end up
> having to rewrite in the other languages.
It's ok; I don't feel strongly on this one, but I want us to figure out
whether we should have such test enforced in CI. Let's discuss that, and
I'm ok with this in C at least for now.
Note that one worst case scenario (if you prefer running this in pulls for
our submaintainer CI) is we can always introduce this test but only kick it
with QEMU_CI_MIGRATION=1. As you said we lost the chance to fail exactly
on the pull of another maintainer as they won't normally set this bit, but
that goes back to the false negative issue, then we can cover this test
more frequently than "softfreeze-only", and no need for manual triggers.
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 4/4] ci: Add the new migration device tests
2024-05-28 15:48 ` Peter Xu
@ 2024-05-28 18:10 ` Fabiano Rosas
2024-05-28 18:52 ` Peter Xu
0 siblings, 1 reply; 16+ messages in thread
From: Fabiano Rosas @ 2024-05-28 18:10 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Alex Bennée, Philippe Mathieu-Daudé,
Wainer dos Santos Moschetta, Beraldo Leal
Peter Xu <peterx@redhat.com> writes:
> On Mon, May 27, 2024 at 08:59:00PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > On Thu, May 23, 2024 at 05:19:22PM -0300, Fabiano Rosas wrote:
>> >> We have two new migration tests that check cross version
>> >> compatibility. One uses the vmstate-static-checker.py script to
>> >> compare the vmstate structures from two different QEMU versions. The
>> >> other runs a simple migration with a few devices present in the VM, to
>> >> catch obvious breakages.
>> >>
>> >> Add both tests to the migration-compat-common job.
>> >>
>> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> >> ---
>> >> .gitlab-ci.d/buildtest.yml | 43 +++++++++++++++++++++++++++++++-------
>> >> 1 file changed, 36 insertions(+), 7 deletions(-)
>> >>
>> >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
>> >> index 91c57efded..bc7ac35983 100644
>> >> --- a/.gitlab-ci.d/buildtest.yml
>> >> +++ b/.gitlab-ci.d/buildtest.yml
>> >> @@ -202,18 +202,47 @@ build-previous-qemu:
>> >> needs:
>> >> - job: build-previous-qemu
>> >> - job: build-system-opensuse
>> >> - # The old QEMU could have bugs unrelated to migration that are
>> >> - # already fixed in the current development branch, so this test
>> >> - # might fail.
>> >> + # This test is allowed to fail because:
>> >> + #
>> >> + # - The old QEMU could have bugs unrelated to migration that are
>> >> + # already fixed in the current development branch.
>> >
>> > Did you ever hit a real failure with this? I'm wondering whether we can
>> > remove this allow_failure thing.
>> >
>>
>> I haven't. But when it fails we'll go through an entire release cycle
>> with this thing showing red for every person that runs the CI. Remember,
>> this is a CI failure to which there's no fix aside from waiting for the
>> release to happen. Even if we're quick to react and disable the job, I
>> feel it might create some confusion already.
>
> My imagination was if needed we'll get complains and we add that until
> then for that broken release only, and remove in the next release again.
>
>>
>> >> + #
>> >> + # - The vmstate-static-checker script trips on renames and other
>> >> + # backward-compatible changes to the vmstate structs.
>> >
>> > I think I keep my preference per last time we talked on this. :)
>>
>> Sorry, I'm not trying to force this in any way, I just wrote these to
>> use in the pull-request and thought I'd put it out there. At the very
>> least we can have your concerns documented. =)
>
> Yep that's fine. I think we should keep such discussion on the list,
> especially we have different opinions, while none of us got convinced yet
> so far. :)
>
>>
>> > I still think it's too early to involve a test that can report false
>> > negative.
>>
>> (1)
>> Well, we haven't seen any false negatives, we've seen fields being
>> renamed. If that happens, then we'll ask the person to update the
>> script. Is that not acceptable to you? Or are you thinking about other
>> sorts of issues?
>
> Then question is how to update the script. So far it's treated as failure
> on rename, even if it's benign. Right now we have this:
>
> print("Section \"" + sec + "\",", end=' ')
> print("Description \"" + desc + "\":", end=' ')
> print("expected field \"" + s_item["field"] + "\",", end=' ')
> print("got \"" + d_item["field"] + "\"; skipping rest")
> bump_taint()
> break
>
> Do you want to introduce a list of renamed vmsd fields in this script and
> maintain that? IMHO it's an overkill and unnecessary burden to other
> developers.
>
That's not _my_ idea, we already have that (see below). There's not much
reason to rename fields like that, the vmstate is obviously something
that should be kept stable, so having to do a rename in a script is way
better than having to figure out the fix for the compatibility break.
def check_fields_match(name, s_field, d_field):
if s_field == d_field:
return True
# Some fields changed names between qemu versions. This list
# is used to allow such changes in each section / description.
changed_names = {
'apic': ['timer', 'timer_expiry'],
'e1000': ['dev', 'parent_obj'],
'ehci': ['dev', 'pcidev'],
'I440FX': ['dev', 'parent_obj'],
'ich9_ahci': ['card', 'parent_obj'],
'ich9-ahci': ['ahci', 'ich9_ahci'],
'ioh3420': ['PCIDevice', 'PCIEDevice'],
'ioh-3240-express-root-port': ['port.br.dev',
'parent_obj.parent_obj.parent_obj',
'port.br.dev.exp.aer_log',
'parent_obj.parent_obj.parent_obj.exp.aer_log'],
'cirrus_vga': ['hw_cursor_x', 'vga.hw_cursor_x',
'hw_cursor_y', 'vga.hw_cursor_y'],
'lsiscsi': ['dev', 'parent_obj'],
'mch': ['d', 'parent_obj'],
'pci_bridge': ['bridge.dev', 'parent_obj', 'bridge.dev.shpc', 'shpc'],
'pcnet': ['pci_dev', 'parent_obj'],
'PIIX3': ['pci_irq_levels', 'pci_irq_levels_vmstate'],
'piix4_pm': ['dev', 'parent_obj', 'pci0_status',
'acpi_pci_hotplug.acpi_pcihp_pci_status[0x0]',
'pm1a.sts', 'ar.pm1.evt.sts', 'pm1a.en', 'ar.pm1.evt.en',
'pm1_cnt.cnt', 'ar.pm1.cnt.cnt',
'tmr.timer', 'ar.tmr.timer',
'tmr.overflow_time', 'ar.tmr.overflow_time',
'gpe', 'ar.gpe'],
'rtl8139': ['dev', 'parent_obj'],
'qxl': ['num_surfaces', 'ssd.num_surfaces'],
'usb-ccid': ['abProtocolDataStructure', 'abProtocolDataStructure.data'],
'usb-host': ['dev', 'parent_obj'],
'usb-mouse': ['usb-ptr-queue', 'HIDPointerEventQueue'],
'usb-tablet': ['usb-ptr-queue', 'HIDPointerEventQueue'],
'vmware_vga': ['card', 'parent_obj'],
'vmware_vga_internal': ['depth', 'new_depth'],
'xhci': ['pci_dev', 'parent_obj'],
'x3130-upstream': ['PCIDevice', 'PCIEDevice'],
'xio3130-express-downstream-port': ['port.br.dev',
'parent_obj.parent_obj.parent_obj',
'port.br.dev.exp.aer_log',
'parent_obj.parent_obj.parent_obj.exp.aer_log'],
'xio3130-downstream': ['PCIDevice', 'PCIEDevice'],
'xio3130-express-upstream-port': ['br.dev', 'parent_obj.parent_obj',
'br.dev.exp.aer_log',
'parent_obj.parent_obj.exp.aer_log'],
'spapr_pci': ['dma_liobn[0]', 'mig_liobn',
'mem_win_addr', 'mig_mem_win_addr',
'mem_win_size', 'mig_mem_win_size',
'io_win_addr', 'mig_io_win_addr',
'io_win_size', 'mig_io_win_size'],
}
if not name in changed_names:
return False
if s_field in changed_names[name] and d_field in changed_names[name]:
return True
return False
>>
>> > I'd still keep running this before soft-freeze like I used to
>> > do, throw issues to others and urge them to fix before release.
>>
>> Having hidden procedures that maintainers run before a release is bad
>> IMHO, it just delays the catching of bugs and frustrates
>> contributors. Imagine working on a series, everything goes well with
>> reviews, CI passes, patch gets queued and merged and a month later you
>> get a ping about something you should have done to avoid breaking
>> migration. Right during freeze.
>
> I understand your point, however I don't yet see a way CI could cover
> everything. CI won't cover performance test, and I still ran multifd tests
> at least smoke it too before soft-freeze.
>
> If there's something done wrong, we notify the sooner the better. Now it
> looks to me the best trade-off is like that - we notify at soft-freeze once
> per release considering that's pretty rare too, e.g. 9.0 has no such outlier.
We should notify before merging any code. If there is a tool that can
catch the bug, there's no point in wasting everyone's time carrying that
patch forward. That's the purpose of a CI setup. These tests are very
light in terms of resources.
Now, you're making the point that these issues are rare, which they may
very well be, I haven't ran this long enough to know. I also don't know
how to measure that, I believe our CI already has a large amount of
tests that never catch anything, so I'm not sure how much we should take
it in consideration when adding tests to the CI.
About the possibly false results, we'll have to hold merging this for a
few releases and gather more data, see how much of a bother it is to
keep updating these fields. I expect to have more bugs caught by the
script than any renames happening.
>
> Again I'm happy if we have a solution to not report false negatives; that's
> the only concern I have.
>>
>> > Per my
>> > previous experience that doesn't consume me a lot of time, and it's not
>> > common to see issues either.
>> >
>> > So I want people to really pay attention when someone sees a migration CI
>> > test failed, rather than we help people form the habit in "oh migration CI
>> > failed again? I think that's fine, it allows failing anyway".
>>
>> That's a good point. I don't think it applies here though. See my point
>> in (1).
>
> Yes, if you have an answer to (1) I'm ok too. Maybe I misunderstood your
> plan there.
>
> Thanks,
>
>>
>> > So far I still don't see as much benefit to adding this if we need to pay
>> > for the other false negative issue. I'll fully support it if e.g. we can
>> > fix the tool to avoid reporting false negatives, but that may take effort
>> > that I didn't check.
>> >
>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 4/4] ci: Add the new migration device tests
2024-05-28 18:10 ` Fabiano Rosas
@ 2024-05-28 18:52 ` Peter Xu
0 siblings, 0 replies; 16+ messages in thread
From: Peter Xu @ 2024-05-28 18:52 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Thomas Huth, Marc-André Lureau, Fiona Ebner,
Het Gala, Alex Bennée, Philippe Mathieu-Daudé,
Wainer dos Santos Moschetta, Beraldo Leal
On Tue, May 28, 2024 at 03:10:48PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Mon, May 27, 2024 at 08:59:00PM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >>
> >> > On Thu, May 23, 2024 at 05:19:22PM -0300, Fabiano Rosas wrote:
> >> >> We have two new migration tests that check cross version
> >> >> compatibility. One uses the vmstate-static-checker.py script to
> >> >> compare the vmstate structures from two different QEMU versions. The
> >> >> other runs a simple migration with a few devices present in the VM, to
> >> >> catch obvious breakages.
> >> >>
> >> >> Add both tests to the migration-compat-common job.
> >> >>
> >> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> >> ---
> >> >> .gitlab-ci.d/buildtest.yml | 43 +++++++++++++++++++++++++++++++-------
> >> >> 1 file changed, 36 insertions(+), 7 deletions(-)
> >> >>
> >> >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> >> >> index 91c57efded..bc7ac35983 100644
> >> >> --- a/.gitlab-ci.d/buildtest.yml
> >> >> +++ b/.gitlab-ci.d/buildtest.yml
> >> >> @@ -202,18 +202,47 @@ build-previous-qemu:
> >> >> needs:
> >> >> - job: build-previous-qemu
> >> >> - job: build-system-opensuse
> >> >> - # The old QEMU could have bugs unrelated to migration that are
> >> >> - # already fixed in the current development branch, so this test
> >> >> - # might fail.
> >> >> + # This test is allowed to fail because:
> >> >> + #
> >> >> + # - The old QEMU could have bugs unrelated to migration that are
> >> >> + # already fixed in the current development branch.
> >> >
> >> > Did you ever hit a real failure with this? I'm wondering whether we can
> >> > remove this allow_failure thing.
> >> >
> >>
> >> I haven't. But when it fails we'll go through an entire release cycle
> >> with this thing showing red for every person that runs the CI. Remember,
> >> this is a CI failure to which there's no fix aside from waiting for the
> >> release to happen. Even if we're quick to react and disable the job, I
> >> feel it might create some confusion already.
> >
> > My imagination was if needed we'll get complains and we add that until
> > then for that broken release only, and remove in the next release again.
> >
> >>
> >> >> + #
> >> >> + # - The vmstate-static-checker script trips on renames and other
> >> >> + # backward-compatible changes to the vmstate structs.
> >> >
> >> > I think I keep my preference per last time we talked on this. :)
> >>
> >> Sorry, I'm not trying to force this in any way, I just wrote these to
> >> use in the pull-request and thought I'd put it out there. At the very
> >> least we can have your concerns documented. =)
> >
> > Yep that's fine. I think we should keep such discussion on the list,
> > especially we have different opinions, while none of us got convinced yet
> > so far. :)
> >
> >>
> >> > I still think it's too early to involve a test that can report false
> >> > negative.
> >>
> >> (1)
> >> Well, we haven't seen any false negatives, we've seen fields being
> >> renamed. If that happens, then we'll ask the person to update the
> >> script. Is that not acceptable to you? Or are you thinking about other
> >> sorts of issues?
> >
> > Then question is how to update the script. So far it's treated as failure
> > on rename, even if it's benign. Right now we have this:
> >
> > print("Section \"" + sec + "\",", end=' ')
> > print("Description \"" + desc + "\":", end=' ')
> > print("expected field \"" + s_item["field"] + "\",", end=' ')
> > print("got \"" + d_item["field"] + "\"; skipping rest")
> > bump_taint()
> > break
> >
> > Do you want to introduce a list of renamed vmsd fields in this script and
> > maintain that? IMHO it's an overkill and unnecessary burden to other
> > developers.
> >
>
> That's not _my_ idea, we already have that (see below). There's not much
> reason to rename fields like that, the vmstate is obviously something
> that should be kept stable, so having to do a rename in a script is way
> better than having to figure out the fix for the compatibility break.
>
> def check_fields_match(name, s_field, d_field):
> if s_field == d_field:
> return True
>
> # Some fields changed names between qemu versions. This list
> # is used to allow such changes in each section / description.
> changed_names = {
> 'apic': ['timer', 'timer_expiry'],
> 'e1000': ['dev', 'parent_obj'],
> 'ehci': ['dev', 'pcidev'],
> 'I440FX': ['dev', 'parent_obj'],
> 'ich9_ahci': ['card', 'parent_obj'],
> 'ich9-ahci': ['ahci', 'ich9_ahci'],
> 'ioh3420': ['PCIDevice', 'PCIEDevice'],
> 'ioh-3240-express-root-port': ['port.br.dev',
> 'parent_obj.parent_obj.parent_obj',
> 'port.br.dev.exp.aer_log',
> 'parent_obj.parent_obj.parent_obj.exp.aer_log'],
> 'cirrus_vga': ['hw_cursor_x', 'vga.hw_cursor_x',
> 'hw_cursor_y', 'vga.hw_cursor_y'],
> 'lsiscsi': ['dev', 'parent_obj'],
> 'mch': ['d', 'parent_obj'],
> 'pci_bridge': ['bridge.dev', 'parent_obj', 'bridge.dev.shpc', 'shpc'],
> 'pcnet': ['pci_dev', 'parent_obj'],
> 'PIIX3': ['pci_irq_levels', 'pci_irq_levels_vmstate'],
> 'piix4_pm': ['dev', 'parent_obj', 'pci0_status',
> 'acpi_pci_hotplug.acpi_pcihp_pci_status[0x0]',
> 'pm1a.sts', 'ar.pm1.evt.sts', 'pm1a.en', 'ar.pm1.evt.en',
> 'pm1_cnt.cnt', 'ar.pm1.cnt.cnt',
> 'tmr.timer', 'ar.tmr.timer',
> 'tmr.overflow_time', 'ar.tmr.overflow_time',
> 'gpe', 'ar.gpe'],
> 'rtl8139': ['dev', 'parent_obj'],
> 'qxl': ['num_surfaces', 'ssd.num_surfaces'],
> 'usb-ccid': ['abProtocolDataStructure', 'abProtocolDataStructure.data'],
> 'usb-host': ['dev', 'parent_obj'],
> 'usb-mouse': ['usb-ptr-queue', 'HIDPointerEventQueue'],
> 'usb-tablet': ['usb-ptr-queue', 'HIDPointerEventQueue'],
> 'vmware_vga': ['card', 'parent_obj'],
> 'vmware_vga_internal': ['depth', 'new_depth'],
> 'xhci': ['pci_dev', 'parent_obj'],
> 'x3130-upstream': ['PCIDevice', 'PCIEDevice'],
> 'xio3130-express-downstream-port': ['port.br.dev',
> 'parent_obj.parent_obj.parent_obj',
> 'port.br.dev.exp.aer_log',
> 'parent_obj.parent_obj.parent_obj.exp.aer_log'],
> 'xio3130-downstream': ['PCIDevice', 'PCIEDevice'],
> 'xio3130-express-upstream-port': ['br.dev', 'parent_obj.parent_obj',
> 'br.dev.exp.aer_log',
> 'parent_obj.parent_obj.exp.aer_log'],
> 'spapr_pci': ['dma_liobn[0]', 'mig_liobn',
> 'mem_win_addr', 'mig_mem_win_addr',
> 'mem_win_size', 'mig_mem_win_size',
> 'io_win_addr', 'mig_io_win_addr',
> 'io_win_size', 'mig_io_win_size'],
> }
>
> if not name in changed_names:
> return False
>
> if s_field in changed_names[name] and d_field in changed_names[name]:
> return True
>
> return False
Ouch, I didn't notice that.. Ok that's a fair request then!
>
>
> >>
> >> > I'd still keep running this before soft-freeze like I used to
> >> > do, throw issues to others and urge them to fix before release.
> >>
> >> Having hidden procedures that maintainers run before a release is bad
> >> IMHO, it just delays the catching of bugs and frustrates
> >> contributors. Imagine working on a series, everything goes well with
> >> reviews, CI passes, patch gets queued and merged and a month later you
> >> get a ping about something you should have done to avoid breaking
> >> migration. Right during freeze.
> >
> > I understand your point, however I don't yet see a way CI could cover
> > everything. CI won't cover performance test, and I still ran multifd tests
> > at least smoke it too before soft-freeze.
> >
> > If there's something done wrong, we notify the sooner the better. Now it
> > looks to me the best trade-off is like that - we notify at soft-freeze once
> > per release considering that's pretty rare too, e.g. 9.0 has no such outlier.
>
> We should notify before merging any code. If there is a tool that can
> catch the bug, there's no point in wasting everyone's time carrying that
> patch forward. That's the purpose of a CI setup. These tests are very
> light in terms of resources.
>
> Now, you're making the point that these issues are rare, which they may
> very well be, I haven't ran this long enough to know. I also don't know
> how to measure that, I believe our CI already has a large amount of
> tests that never catch anything, so I'm not sure how much we should take
> it in consideration when adding tests to the CI.
>
> About the possibly false results, we'll have to hold merging this for a
> few releases and gather more data, see how much of a bother it is to
> keep updating these fields. I expect to have more bugs caught by the
> script than any renames happening.
When looking (trying to fill up the whilelist for 8.2 machine type on
"esp"), I found one more thing. See:
commit b46a43a2241476d13486e016a0809b690b65f90e
Author: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Date: Fri Jan 12 12:53:48 2024 +0000
esp.c: remove unused PDMA callback implementation
Note that this is a migration break for the q800 machine because the extra PDMA
information is no longer included.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Tested-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20240112125420.514425-57-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
It deliberately wants to break migration. That might be a problem now
since it means it'll reliably fail the vmstate checker CI test, then
people'll say "this is expected", then should the maintainer merge such
change? As long as merged, everything running the vmstate checker will
fail it always.
Maybe we'll need to add another whiltelist to ignore some devices? It
gets a bit hairy..
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2024-05-28 18:53 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-23 20:19 [RFC PATCH 0/4] migration-test: Device migration smoke tests Fabiano Rosas
2024-05-23 20:19 ` [RFC PATCH 1/4] tests/qtest/libqtest: Introduce another qtest_init version with no handshake Fabiano Rosas
2024-05-23 20:19 ` [RFC PATCH 2/4] tests/qtest/migration: Add a test that runs vmstate-static-checker Fabiano Rosas
2024-05-27 21:06 ` Peter Xu
2024-05-27 22:52 ` Fabiano Rosas
2024-05-28 15:52 ` Peter Xu
2024-05-23 20:19 ` [RFC PATCH 3/4] tests/qtest/migration: Add support for simple device tests Fabiano Rosas
2024-05-27 21:12 ` Peter Xu
2024-05-27 22:59 ` Fabiano Rosas
2024-05-28 15:35 ` Peter Xu
2024-05-23 20:19 ` [RFC PATCH 4/4] ci: Add the new migration " Fabiano Rosas
2024-05-27 21:17 ` Peter Xu
2024-05-27 23:59 ` Fabiano Rosas
2024-05-28 15:48 ` Peter Xu
2024-05-28 18:10 ` Fabiano Rosas
2024-05-28 18:52 ` Peter Xu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).