qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] migration: Fix aarch64 cpregs migration
@ 2025-07-30 20:52 Fabiano Rosas
  2025-07-30 20:52 ` [RFC PATCH 1/3] target/arm: Fix migration to QEMU 10.1 Fabiano Rosas
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Fabiano Rosas @ 2025-07-30 20:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Peter Maydell

Commit 655659a74a ("target/arm: Correct encoding of Debug
Communications Channel registers") removed one register and added two
more. This breaks TCG migration:

1) to 10.1 - older versions will have one unknown register

  { .name = "DBGDTRTX", .state = ARM_CP_STATE_AA32,
    .cp = 14, .crn = 0, .crm = 5, .opc1 = 3, .opc2 = 0 }

  kvmid: 0x40200000200e0298

2) from 10.1 - older versions will not accept more registers than
   their cpreg_vmstate_array_len (292 > 291).

I'm proposing a (RFC) stop-gap fix for (1) so 10.1 can be used as a
migration target until we figure out how to add some form of
versioning of the cpregs_indexes list and keep track of what has been
added/removed throughout the releases.

Even with infrastructure in place to do compatibility of the cpregs
list, there is still the need to avoid (or justify) guest-visible
changes resulting from some registers not being migrated.

Anyway, let's discuss.

PS: the extra patches are to enable cross-version testing on aarch64,
which would have flagged this early. My apologies as this is entirely
my fault because enabling these tests for arm has been on my list for
a long time.

Fabiano Rosas (3):
  target/arm: Fix migration to QEMU 10.1
  tests/qtest/migration: Only test aarch64 on TCG
  tests/qtest/migration: Change cpu for aarch64

 target/arm/machine.c              | 24 +++++++++++++++++++++++-
 tests/qtest/migration/framework.c | 19 ++++++++++++++-----
 2 files changed, 37 insertions(+), 6 deletions(-)

-- 
2.35.3



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC PATCH 1/3] target/arm: Fix migration to QEMU 10.1
  2025-07-30 20:52 [RFC PATCH 0/3] migration: Fix aarch64 cpregs migration Fabiano Rosas
@ 2025-07-30 20:52 ` Fabiano Rosas
  2025-07-30 20:52 ` [RFC PATCH 2/3] tests/qtest/migration: Only test aarch64 on TCG Fabiano Rosas
  2025-07-30 20:52 ` [RFC PATCH 3/3] tests/qtest/migration: Change cpu for aarch64 Fabiano Rosas
  2 siblings, 0 replies; 8+ messages in thread
From: Fabiano Rosas @ 2025-07-30 20:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Peter Maydell

The DBGDTRTX register definition was removed from cpu->cp_regs in the
10.1 cycle. This breaks migration from QEMU versions < 10.1 due to the
present of an extra, unknown (to 10.1) register in the migration
stream.

Change the cpu_post_load validation code to recognise that the
register has been removed and ignore it when present in the stream.

Keep a compatibility list with the registers that should be ignored
when sent from versions older than 10.1. The value of the cpregs
hashtable key is used because it can be derived on the destination
(where this patch applies) from the cpreg_vmstate_indexes array.

Note that this solution is *not* generic for other QEMU versions
moving forward, this is a stop gap to avoid machines being stuck in
QEMU < 10.1 without a migration path. A proper solution would include
versioning of the register list and recognizing any registers
removed/changed.

Fixes: 655659a74a ("target/arm: Correct encoding of Debug Communications Channel registers")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
Will an older guest using the register have issues after migration
once the register gets set to its default value?
---
 target/arm/machine.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/target/arm/machine.c b/target/arm/machine.c
index 6986915bee..2d4df53817 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -8,6 +8,7 @@
 #include "cpu-features.h"
 #include "migration/qemu-file-types.h"
 #include "migration/vmstate.h"
+#include "target/arm/cpregs.h"
 #include "target/arm/gtimer.h"
 
 static bool vfp_needed(void *opaque)
@@ -868,6 +869,14 @@ static const VMStateInfo vmstate_powered_off = {
     .put = put_power,
 };
 
+static uint64_t compat_cpreg_keys_virt_10_0[] = {
+    /*
+     * { .name = "DBGDTRTX", .state = ARM_CP_STATE_AA32,
+     * .cp = 14, .crn = 0, .crm = 5, .opc1 = 3, .opc2 = 0 }
+     */
+    ENCODE_CP_REG(14, 0, 1, 0, 5, 3, 0),
+};
+
 static int cpu_pre_save(void *opaque)
 {
     ARMCPU *cpu = opaque;
@@ -951,7 +960,7 @@ static int cpu_post_load(void *opaque, int version_id)
 {
     ARMCPU *cpu = opaque;
     CPUARMState *env = &cpu->env;
-    int i, v;
+    int i, j, v;
 
     /*
      * Handle migration compatibility from old QEMU which didn't
@@ -987,10 +996,23 @@ static int cpu_post_load(void *opaque, int version_id)
         }
         if (cpu->cpreg_vmstate_indexes[v] < cpu->cpreg_indexes[i]) {
             /* register in their list but not ours: fail migration */
+
+            for (j = 0; j < ARRAY_SIZE(compat_cpreg_keys_virt_10_0); j++) {
+                if (cpu->cpreg_vmstate_indexes[v] ==
+                    cpreg_to_kvm_id(compat_cpreg_keys_virt_10_0[j])) {
+                    /*
+                     * ...unless the extra register is being explicitly
+                     * ignored for migration compatibility purposes.
+                     */
+                    i--;
+                    goto next;
+                }
+            }
             return -1;
         }
         /* matching register, copy the value over */
         cpu->cpreg_values[i] = cpu->cpreg_vmstate_values[v];
+    next:
         v++;
     }
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH 2/3] tests/qtest/migration: Only test aarch64 on TCG
  2025-07-30 20:52 [RFC PATCH 0/3] migration: Fix aarch64 cpregs migration Fabiano Rosas
  2025-07-30 20:52 ` [RFC PATCH 1/3] target/arm: Fix migration to QEMU 10.1 Fabiano Rosas
@ 2025-07-30 20:52 ` Fabiano Rosas
  2025-07-31 13:37   ` Peter Maydell
  2025-07-30 20:52 ` [RFC PATCH 3/3] tests/qtest/migration: Change cpu for aarch64 Fabiano Rosas
  2 siblings, 1 reply; 8+ messages in thread
From: Fabiano Rosas @ 2025-07-30 20:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Peter Maydell, Laurent Vivier, Paolo Bonzini

Currently our aarch64 tests are only being run using identical QEMU
versions. When running the tests with different QEMU versions, which
is a common use-case for migration, the tests are broken due to the
current choice of the 'max' cpu, which is not stable and is prone to
breaking migration.

This means aarch64 tests are currently only testing about the same
situations as any other arch, i.e. no arm-specific testing is being
done.

To make the aarch64 tests more useful, -cpu max will be changed to
-cpu neoverse-n1 in the next patch. Before doing that, make sure
aarch64 tests only run with TCG, since KVM testing depends on usage of
the -cpu host and we currently don't have code to switch between cpus
according to test runtime environment.

Also, TCG alone should allow us to catch most issues with migration,
since there is no guarantee of a uniform environment as there is with
KVM.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration/framework.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index 407c9023c0..f09365d122 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -353,8 +353,17 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
         memory_backend = g_strdup_printf("-m %s ", memory_size);
     }
 
-    if (args->use_dirty_ring) {
-        kvm_opts = ",dirty-ring-size=4096";
+
+    if (g_str_equal(arch, "aarch64")) {
+        /*
+         * aarch64 is only tested with TCG because there is no single
+         * cpu that can be used for both KVM and TCG.
+         */
+        kvm_opts = NULL;
+    } else if (args->use_dirty_ring) {
+        kvm_opts = "-accel kvm,dirty-ring-size=4096";
+    } else {
+        kvm_opts = "-accel kvm";
     }
 
     if (!qtest_has_machine(machine_alias)) {
@@ -368,7 +377,7 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
 
     g_test_message("Using machine type: %s", machine);
 
-    cmd_source = g_strdup_printf("-accel kvm%s -accel tcg "
+    cmd_source = g_strdup_printf("%s -accel tcg "
                                  "-machine %s,%s "
                                  "-name source,debug-threads=on "
                                  "%s "
@@ -395,7 +404,7 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
      */
     events = args->defer_target_connect ? "-global migration.x-events=on" : "";
 
-    cmd_target = g_strdup_printf("-accel kvm%s -accel tcg "
+    cmd_target = g_strdup_printf("%s -accel tcg "
                                  "-machine %s,%s "
                                  "-name target,debug-threads=on "
                                  "%s "
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH 3/3] tests/qtest/migration: Change cpu for aarch64
  2025-07-30 20:52 [RFC PATCH 0/3] migration: Fix aarch64 cpregs migration Fabiano Rosas
  2025-07-30 20:52 ` [RFC PATCH 1/3] target/arm: Fix migration to QEMU 10.1 Fabiano Rosas
  2025-07-30 20:52 ` [RFC PATCH 2/3] tests/qtest/migration: Only test aarch64 on TCG Fabiano Rosas
@ 2025-07-30 20:52 ` Fabiano Rosas
  2025-07-31 10:03   ` Jonathan Cameron via
  2 siblings, 1 reply; 8+ messages in thread
From: Fabiano Rosas @ 2025-07-30 20:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Peter Maydell, Laurent Vivier, Paolo Bonzini

Don't use the 'max' cpu for migration testing of aarch64. That cpu
does not provide a stable set of features and is expected to break
migration from time to time.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration/framework.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index f09365d122..6d980b6b51 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -317,7 +317,7 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
         memory_size = "150M";
         machine_alias = "virt";
         machine_opts = "gic-version=3";
-        arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath);
+        arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath);
         start_address = ARM_TEST_MEM_START;
         end_address = ARM_TEST_MEM_END;
     } else {
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH 3/3] tests/qtest/migration: Change cpu for aarch64
  2025-07-30 20:52 ` [RFC PATCH 3/3] tests/qtest/migration: Change cpu for aarch64 Fabiano Rosas
@ 2025-07-31 10:03   ` Jonathan Cameron via
  2025-07-31 13:55     ` Fabiano Rosas
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Cameron via @ 2025-07-31 10:03 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Peter Xu, Peter Maydell, Laurent Vivier,
	Paolo Bonzini

On Wed, 30 Jul 2025 17:52:45 -0300
Fabiano Rosas <farosas@suse.de> wrote:

> Don't use the 'max' cpu for migration testing of aarch64. That cpu
> does not provide a stable set of features and is expected to break
> migration from time to time.

Whilst I can see the motivation, doesn't this leave us with a lack of
converage for new CPU features that are currently only in max?
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  tests/qtest/migration/framework.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
> index f09365d122..6d980b6b51 100644
> --- a/tests/qtest/migration/framework.c
> +++ b/tests/qtest/migration/framework.c
> @@ -317,7 +317,7 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
>          memory_size = "150M";
>          machine_alias = "virt";
>          machine_opts = "gic-version=3";
> -        arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath);
> +        arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath);
>          start_address = ARM_TEST_MEM_START;
>          end_address = ARM_TEST_MEM_END;
>      } else {



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH 2/3] tests/qtest/migration: Only test aarch64 on TCG
  2025-07-30 20:52 ` [RFC PATCH 2/3] tests/qtest/migration: Only test aarch64 on TCG Fabiano Rosas
@ 2025-07-31 13:37   ` Peter Maydell
  2025-07-31 14:58     ` Mohamed Mediouni
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Maydell @ 2025-07-31 13:37 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Peter Xu, Laurent Vivier, Paolo Bonzini

On Wed, 30 Jul 2025 at 21:52, Fabiano Rosas <farosas@suse.de> wrote:
>
> Currently our aarch64 tests are only being run using identical QEMU
> versions. When running the tests with different QEMU versions, which
> is a common use-case for migration, the tests are broken due to the
> current choice of the 'max' cpu, which is not stable and is prone to
> breaking migration.
>
> This means aarch64 tests are currently only testing about the same
> situations as any other arch, i.e. no arm-specific testing is being
> done.
>
> To make the aarch64 tests more useful, -cpu max will be changed to
> -cpu neoverse-n1 in the next patch. Before doing that, make sure
> aarch64 tests only run with TCG, since KVM testing depends on usage of
> the -cpu host and we currently don't have code to switch between cpus
> according to test runtime environment.
>
> Also, TCG alone should allow us to catch most issues with migration,
> since there is no guarantee of a uniform environment as there is with
> KVM.

The difficulty with only testing TCG migration is that now
we're testing the setup that most cross-versions migration users
don't care about. At least my assumption is that it's KVM
cross-version migration that is the real use case here.

For instance, this migration bug with the DBGDTR register
isn't a problem for KVM, because with KVM we use the kernel
to tell us what system registers are present, and whether
a register is defined with a cpreg in QEMU or not doesn't
affect what we put on the wire for migration. Conversely
there might be migration compat issues that show up only
with KVM and not TCG (though the most obvious source of those
would be host kernel changes, which is kind of out of scope
for us).

Though of course with our CI jobs we're probably not
doing AArch64 KVM cross-version testing anyway...

thanks
-- PMM


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH 3/3] tests/qtest/migration: Change cpu for aarch64
  2025-07-31 10:03   ` Jonathan Cameron via
@ 2025-07-31 13:55     ` Fabiano Rosas
  0 siblings, 0 replies; 8+ messages in thread
From: Fabiano Rosas @ 2025-07-31 13:55 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: qemu-devel, Peter Xu, Peter Maydell, Laurent Vivier,
	Paolo Bonzini

Jonathan Cameron via <qemu-devel@nongnu.org> writes:

> On Wed, 30 Jul 2025 17:52:45 -0300
> Fabiano Rosas <farosas@suse.de> wrote:
>
>> Don't use the 'max' cpu for migration testing of aarch64. That cpu
>> does not provide a stable set of features and is expected to break
>> migration from time to time.
>
> Whilst I can see the motivation, doesn't this leave us with a lack of
> converage for new CPU features that are currently only in max?

It does. That's an interesting aspect. It's better to make sure the new
features come with at least some basic migration support off the gate.

I'll add a patch to this series leaving one of the tests with -cpu
max. That should be enough to cover this scenario.

Thanks

>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  tests/qtest/migration/framework.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
>> index f09365d122..6d980b6b51 100644
>> --- a/tests/qtest/migration/framework.c
>> +++ b/tests/qtest/migration/framework.c
>> @@ -317,7 +317,7 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
>>          memory_size = "150M";
>>          machine_alias = "virt";
>>          machine_opts = "gic-version=3";
>> -        arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath);
>> +        arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath);
>>          start_address = ARM_TEST_MEM_START;
>>          end_address = ARM_TEST_MEM_END;
>>      } else {


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH 2/3] tests/qtest/migration: Only test aarch64 on TCG
  2025-07-31 13:37   ` Peter Maydell
@ 2025-07-31 14:58     ` Mohamed Mediouni
  0 siblings, 0 replies; 8+ messages in thread
From: Mohamed Mediouni @ 2025-07-31 14:58 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Fabiano Rosas, qemu-devel, Peter Xu, Laurent Vivier,
	Paolo Bonzini



> On 31. Jul 2025, at 15:37, Peter Maydell <peter.maydell@linaro.org> wrote:
> 
> On Wed, 30 Jul 2025 at 21:52, Fabiano Rosas <farosas@suse.de> wrote:
>> 
>> Currently our aarch64 tests are only being run using identical QEMU
>> versions. When running the tests with different QEMU versions, which
>> is a common use-case for migration, the tests are broken due to the
>> current choice of the 'max' cpu, which is not stable and is prone to
>> breaking migration.
>> 
>> This means aarch64 tests are currently only testing about the same
>> situations as any other arch, i.e. no arm-specific testing is being
>> done.
>> 
>> To make the aarch64 tests more useful, -cpu max will be changed to
>> -cpu neoverse-n1 in the next patch. Before doing that, make sure
>> aarch64 tests only run with TCG, since KVM testing depends on usage of
>> the -cpu host and we currently don't have code to switch between cpus
>> according to test runtime environment.
>> 
>> Also, TCG alone should allow us to catch most issues with migration,
>> since there is no guarantee of a uniform environment as there is with
>> KVM.
> 
> The difficulty with only testing TCG migration is that now
> we're testing the setup that most cross-versions migration users
> don't care about. At least my assumption is that it's KVM
> cross-version migration that is the real use case here.
> 
> For instance, this migration bug with the DBGDTR register
> isn't a problem for KVM, because with KVM we use the kernel
> to tell us what system registers are present, and whether
> a register is defined with a cpreg in QEMU or not doesn't
> affect what we put on the wire for migration. Conversely
> there might be migration compat issues that show up only
> with KVM and not TCG (though the most obvious source of those
> would be host kernel changes, which is kind of out of scope
> for us).
> 
> Though of course with our CI jobs we're probably not
> doing AArch64 KVM cross-version testing anyway...
> 
On the cloud provider side*, we do rely on having rollbacks work.

We rely on staged deployments with rolling back if things go wrong
as we observe progress.

Note that the set of MSRs KVM gives (at least on AArch64) does sometimes
vary between releases so for rolling back you’ll need to ignore some (new) 
sysregs in the vmm. With careful planning so that you deploy a VMM
release with a point-fix to ignore the new registers and then the kernel update.

So not dealing with that would make the cloud use case not usable without 
downstream patches.

*although we don’t rely on Qemu for Nitro System VMs

Thank you,
> thanks
> -- PMM
> 



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-07-31 16:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-30 20:52 [RFC PATCH 0/3] migration: Fix aarch64 cpregs migration Fabiano Rosas
2025-07-30 20:52 ` [RFC PATCH 1/3] target/arm: Fix migration to QEMU 10.1 Fabiano Rosas
2025-07-30 20:52 ` [RFC PATCH 2/3] tests/qtest/migration: Only test aarch64 on TCG Fabiano Rosas
2025-07-31 13:37   ` Peter Maydell
2025-07-31 14:58     ` Mohamed Mediouni
2025-07-30 20:52 ` [RFC PATCH 3/3] tests/qtest/migration: Change cpu for aarch64 Fabiano Rosas
2025-07-31 10:03   ` Jonathan Cameron via
2025-07-31 13:55     ` Fabiano Rosas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).