* [PATCH v6 01/11] hw/arm/virt: Rename arm_virt_compat into arm_virt_compat_defaults
2026-01-26 16:52 [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
@ 2026-01-26 16:53 ` Eric Auger
2026-02-06 14:34 ` Peter Maydell
2026-01-26 16:53 ` [PATCH v6 02/11] target/arm/machine: Improve traces on register mismatch during migration Eric Auger
` (12 subsequent siblings)
13 siblings, 1 reply; 32+ messages in thread
From: Eric Auger @ 2026-01-26 16:53 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
cohuck, maz, oliver.upton, sebott, gshan, ddutile, peterx, philmd,
pbonzini
Renaming arm_virt_compat into arm_virt_compat_defaults
makes it more obvious that those compats apply to all machine
types by default, if not overridden for specific ones. This also
matches the terminology used for pc-q35.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
hw/arm/virt.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4badc1a7348..baa4e31aac1 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -94,20 +94,21 @@
#include "hw/cxl/cxl_host.h"
#include "qemu/guest-random.h"
-static GlobalProperty arm_virt_compat[] = {
+static GlobalProperty arm_virt_compat_defaults[] = {
{ TYPE_VIRTIO_IOMMU_PCI, "aw-bits", "48" },
};
-static const size_t arm_virt_compat_len = G_N_ELEMENTS(arm_virt_compat);
+static const size_t arm_virt_compat_defaults_len =
+ G_N_ELEMENTS(arm_virt_compat_defaults);
/*
* This cannot be called from the virt_machine_class_init() because
* TYPE_VIRT_MACHINE is abstract and mc->compat_props g_ptr_array_new()
* only is called on virt non abstract class init.
*/
-static void arm_virt_compat_set(MachineClass *mc)
+static void arm_virt_compat_default_set(MachineClass *mc)
{
- compat_props_add(mc->compat_props, arm_virt_compat,
- arm_virt_compat_len);
+ compat_props_add(mc->compat_props, arm_virt_compat_defaults,
+ arm_virt_compat_defaults_len);
}
#define DEFINE_VIRT_MACHINE_IMPL(latest, ...) \
@@ -116,7 +117,7 @@ static void arm_virt_compat_set(MachineClass *mc)
const void *data) \
{ \
MachineClass *mc = MACHINE_CLASS(oc); \
- arm_virt_compat_set(mc); \
+ arm_virt_compat_default_set(mc); \
MACHINE_VER_SYM(options, virt, __VA_ARGS__)(mc); \
mc->desc = "QEMU " MACHINE_VER_STR(__VA_ARGS__) " ARM Virtual Machine"; \
MACHINE_VER_DEPRECATION(__VA_ARGS__); \
--
2.52.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v6 01/11] hw/arm/virt: Rename arm_virt_compat into arm_virt_compat_defaults
2026-01-26 16:53 ` [PATCH v6 01/11] hw/arm/virt: Rename arm_virt_compat into arm_virt_compat_defaults Eric Auger
@ 2026-02-06 14:34 ` Peter Maydell
0 siblings, 0 replies; 32+ messages in thread
From: Peter Maydell @ 2026-02-06 14:34 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, qemu-devel, qemu-arm, cohuck, maz, oliver.upton,
sebott, gshan, peterx, philmd, pbonzini
On Mon, 26 Jan 2026 at 16:55, Eric Auger <eric.auger@redhat.com> wrote:
>
> Renaming arm_virt_compat into arm_virt_compat_defaults
> makes it more obvious that those compats apply to all machine
> types by default, if not overridden for specific ones. This also
> matches the terminology used for pc-q35.
>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: Sebastian Ott <sebott@redhat.com>
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Having our function name match the pc version seems sensible,
and this patch isn't strongly related to the rest of the
series, so I've applied this one to target-arm.next.
thanks
-- PMM
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v6 02/11] target/arm/machine: Improve traces on register mismatch during migration
2026-01-26 16:52 [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
2026-01-26 16:53 ` [PATCH v6 01/11] hw/arm/virt: Rename arm_virt_compat into arm_virt_compat_defaults Eric Auger
@ 2026-01-26 16:53 ` Eric Auger
2026-02-06 14:31 ` Peter Maydell
2026-02-09 15:08 ` Alex Bennée
2026-01-26 16:53 ` [PATCH v6 03/11] target/arm/cpu: Allow registers to be hidden Eric Auger
` (11 subsequent siblings)
13 siblings, 2 replies; 32+ messages in thread
From: Eric Auger @ 2026-01-26 16:53 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
cohuck, maz, oliver.upton, sebott, gshan, ddutile, peterx, philmd,
pbonzini
Currently when the number of KVM registers exposed by the source is
larger than the one exposed on the destination, the migration fails
with: "failed to load cpu:cpreg_vmstate_array_len"
This gives no information about which registers are causing the trouble.
This patch reworks the target/arm/machine code so that it can
handle an input stream with a larger set of registers than
the destination and print useful information about which registers
are causing the trouble. The migration outcome is unchanged:
- unexpected registers will still fail the migration
- missing ones are printed but will not fail the migration, as done today.
The input stream can contain up to MAX_CPREG_VMSTATE_ANOMALIES (10)
more registers than exist on the destination.
If there are more extra registers than that, we will still hit the
previous "failed to load cpu:cpreg_vmstate_array_len" error.
At most MAX_CPREG_VMSTATE_ANOMALIES missing registers
and MAX_CPREG_VMSTATE_ANOMALIES unexpected registers are printed.
Example:
qemu-system-aarch64: kvm_arm_cpu_post_load Missing register in input stream: 0 0x6030000000160003 fw feat reg 3
qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 0 0x603000000013c103 op0:3 op1:0 crn:2 crm:0 op2:3
qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 1 0x603000000013c512 op0:3 op1:0 crn:10 crm:2 op2:2
qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 2 0x603000000013c513 op0:3 op1:0 crn:10 crm:2 op2:3
qemu-system-aarch64: error while loading state for instance 0x0 of device 'cpu'
qemu-system-aarch64: load of migration failed: Operation not permitted
With TCG, the faulting register indexes are not formatted in a
user-friendly way as they are with KVM. However, the two added trace
points help to identify the culprit indexes.
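For illustration only (this is not the patch code), the comparison
described above amounts to walking two index lists in lockstep. The
sketch below assumes both lists are sorted in ascending order; RegDiff,
reg_diff() and MAX_ANOMALIES are hypothetical names, with MAX_ANOMALIES
standing in for MAX_CPREG_VMSTATE_ANOMALIES.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ANOMALIES 10 /* stands in for MAX_CPREG_VMSTATE_ANOMALIES */

/* Hypothetical result container; not a QEMU type. */
typedef struct {
    uint64_t missing[MAX_ANOMALIES];    /* known here, absent from stream */
    size_t n_missing;
    uint64_t unexpected[MAX_ANOMALIES]; /* in stream, unknown here */
    size_t n_unexpected;
    size_t n_matched;
} RegDiff;

/*
 * Compare two ascending-sorted lists of register indexes.
 * "ours" is the destination's list, "incoming" is the stream's list.
 * At most MAX_ANOMALIES entries of each kind are recorded.
 */
static void reg_diff(const uint64_t *ours, size_t n_ours,
                     const uint64_t *incoming, size_t n_in, RegDiff *d)
{
    size_t i = 0, v = 0;

    assert(ours && incoming && d);
    d->n_missing = d->n_unexpected = d->n_matched = 0;

    while (i < n_ours && v < n_in) {
        if (incoming[v] > ours[i]) {
            /* register in our list but not incoming: record and skip */
            if (d->n_missing < MAX_ANOMALIES) {
                d->missing[d->n_missing++] = ours[i];
            }
            i++;
        } else if (incoming[v] < ours[i]) {
            /* register in their list but not ours: record it */
            if (d->n_unexpected < MAX_ANOMALIES) {
                d->unexpected[d->n_unexpected++] = incoming[v];
            }
            v++;
        } else {
            /* matching register: advance both cursors */
            d->n_matched++;
            i++;
            v++;
        }
    }
    /* leftovers in our list are missing from the stream */
    for (; i < n_ours; i++) {
        if (d->n_missing < MAX_ANOMALIES) {
            d->missing[d->n_missing++] = ours[i];
        }
    }
    /* leftovers in the stream are unexpected */
    for (; v < n_in; v++) {
        if (d->n_unexpected < MAX_ANOMALIES) {
            d->unexpected[d->n_unexpected++] = incoming[v];
        }
    }
}
```

A caller would fail the load when n_unexpected is non-zero and only
report the entries in missing[], matching the outcome listed above.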
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
v2 -> v3:
- fixed some extra typos (Connie)
- collected Connie's R-b
v1 -> v2:
- fixed some typos in the commit msg
---
target/arm/cpu.h | 6 +++++
target/arm/kvm.c | 23 ++++++++++++++++
target/arm/machine.c | 58 ++++++++++++++++++++++++++++++++++++-----
target/arm/trace-events | 7 +++++
4 files changed, 88 insertions(+), 6 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 1eaf5a3fddf..e900ef7937b 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -939,6 +939,12 @@ struct ArchCPU {
uint64_t *cpreg_vmstate_values;
int32_t cpreg_vmstate_array_len;
+ #define MAX_CPREG_VMSTATE_ANOMALIES 10
+ uint64_t cpreg_vmstate_missing_indexes[MAX_CPREG_VMSTATE_ANOMALIES];
+ int32_t cpreg_vmstate_missing_indexes_array_len;
+ uint64_t cpreg_vmstate_unexpected_indexes[MAX_CPREG_VMSTATE_ANOMALIES];
+ int32_t cpreg_vmstate_unexpected_indexes_array_len;
+
DynamicGDBFeatureInfo dyn_sysreg_feature;
DynamicGDBFeatureInfo dyn_svereg_feature;
DynamicGDBFeatureInfo dyn_smereg_feature;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 48f853fff80..c6f0d0fc4e1 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1024,6 +1024,29 @@ void kvm_arm_cpu_pre_save(ARMCPU *cpu)
bool kvm_arm_cpu_post_load(ARMCPU *cpu)
{
+ int i;
+
+ for (i = 0; i < cpu->cpreg_vmstate_missing_indexes_array_len; i++) {
+ gchar *name;
+
+ name = kvm_print_register_name(cpu->cpreg_vmstate_missing_indexes[i]);
+ trace_kvm_arm_cpu_post_load_missing_reg(name);
+ g_free(name);
+ }
+
+ for (i = 0; i < cpu->cpreg_vmstate_unexpected_indexes_array_len; i++) {
+ gchar *name;
+
+ name = kvm_print_register_name(cpu->cpreg_vmstate_unexpected_indexes[i]);
+ error_report("%s Unexpected register in input stream: %i 0x%"PRIx64" %s",
+ __func__, i, cpu->cpreg_vmstate_unexpected_indexes[i], name);
+ g_free(name);
+ }
+ /* Fail the migration if we detect unexpected registers */
+ if (cpu->cpreg_vmstate_unexpected_indexes_array_len) {
+ return false;
+ }
+
if (!write_list_to_kvmstate(cpu, KVM_PUT_FULL_STATE)) {
return false;
}
diff --git a/target/arm/machine.c b/target/arm/machine.c
index 0befdb0b28a..f06a920aba1 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -10,6 +10,7 @@
#include "migration/vmstate.h"
#include "target/arm/gtimer.h"
#include "hw/arm/machines-qom.h"
+#include "trace.h"
static bool vfp_needed(void *opaque)
{
@@ -990,7 +991,13 @@ static int cpu_pre_load(void *opaque)
{
ARMCPU *cpu = opaque;
CPUARMState *env = &cpu->env;
+ int arraylen = cpu->cpreg_vmstate_array_len + MAX_CPREG_VMSTATE_ANOMALIES;
+ cpu->cpreg_vmstate_indexes = g_renew(uint64_t, cpu->cpreg_vmstate_indexes,
+ arraylen);
+ cpu->cpreg_vmstate_values = g_renew(uint64_t, cpu->cpreg_vmstate_values,
+ arraylen);
+ cpu->cpreg_vmstate_array_len = arraylen;
/*
* In an inbound migration where on the source FPSCR/FPSR/FPCR are 0,
* there will be no fpcr_fpsr subsection so we won't call vfp_set_fpcr()
@@ -1023,7 +1030,7 @@ static int cpu_post_load(void *opaque, int version_id)
{
ARMCPU *cpu = opaque;
CPUARMState *env = &cpu->env;
- int i, v;
+ int i = 0, j = 0, k = 0, v = 0;
/*
* Handle migration compatibility from old QEMU which didn't
@@ -1051,27 +1058,66 @@ static int cpu_post_load(void *opaque, int version_id)
* entries with the right slots in our own values array.
*/
- for (i = 0, v = 0; i < cpu->cpreg_array_len
- && v < cpu->cpreg_vmstate_array_len; i++) {
+ trace_cpu_post_load_len(cpu->cpreg_array_len, cpu->cpreg_vmstate_array_len);
+ for (; i < cpu->cpreg_array_len && v < cpu->cpreg_vmstate_array_len;) {
+ trace_cpu_post_load(i, v , cpu->cpreg_indexes[i]);
if (cpu->cpreg_vmstate_indexes[v] > cpu->cpreg_indexes[i]) {
/* register in our list but not incoming : skip it */
+ trace_cpu_post_load_missing(i, cpu->cpreg_indexes[i], v);
+ if (j < MAX_CPREG_VMSTATE_ANOMALIES) {
+ cpu->cpreg_vmstate_missing_indexes[j++] = cpu->cpreg_indexes[i];
+ }
+ i++;
continue;
}
if (cpu->cpreg_vmstate_indexes[v] < cpu->cpreg_indexes[i]) {
- /* register in their list but not ours: fail migration */
- return -1;
+ /* register in their list but not ours: those will fail migration */
+ trace_cpu_post_load_unexpected(v, cpu->cpreg_vmstate_indexes[v], i);
+ if (k < MAX_CPREG_VMSTATE_ANOMALIES) {
+ cpu->cpreg_vmstate_unexpected_indexes[k++] =
+ cpu->cpreg_vmstate_indexes[v];
+ }
+ v++;
+ continue;
}
/* matching register, copy the value over */
cpu->cpreg_values[i] = cpu->cpreg_vmstate_values[v];
v++;
+ i++;
}
+ /*
+ * if we have reached the end of the incoming array but there are
+ * still regs in cpreg, continue parsing the regs which are missing
+ * in the input stream
+ */
+ for ( ; i < cpu->cpreg_array_len; i++) {
+ if (j < MAX_CPREG_VMSTATE_ANOMALIES) {
+ trace_cpu_post_load_missing(i, cpu->cpreg_indexes[i], v);
+ cpu->cpreg_vmstate_missing_indexes[j++] = cpu->cpreg_indexes[i];
+ }
+ }
+ /*
+ * if we have reached the end of the cpreg array but there are
+ * still regs in the input stream, continue parsing the vmstate array
+ */
+ for ( ; v < cpu->cpreg_vmstate_array_len; v++) {
+ if (k < MAX_CPREG_VMSTATE_ANOMALIES) {
+ trace_cpu_post_load_unexpected(v, cpu->cpreg_vmstate_indexes[v], i);
+ cpu->cpreg_vmstate_unexpected_indexes[k++] =
+ cpu->cpreg_vmstate_indexes[v];
+ }
+ }
+
+ cpu->cpreg_vmstate_missing_indexes_array_len = j;
+ cpu->cpreg_vmstate_unexpected_indexes_array_len = k;
if (kvm_enabled()) {
if (!kvm_arm_cpu_post_load(cpu)) {
return -1;
}
} else {
- if (!write_list_to_cpustate(cpu)) {
+ if (cpu->cpreg_vmstate_unexpected_indexes_array_len ||
+ !write_list_to_cpustate(cpu)) {
return -1;
}
}
diff --git a/target/arm/trace-events b/target/arm/trace-events
index 676d29fe516..0a5ed3e69d5 100644
--- a/target/arm/trace-events
+++ b/target/arm/trace-events
@@ -13,6 +13,7 @@ arm_gt_update_irq(int timer, int irqstate) "gt_update_irq: timer %d irqstate %d"
# kvm.c
kvm_arm_fixup_msi_route(uint64_t iova, uint64_t gpa) "MSI iova = 0x%"PRIx64" is translated into 0x%"PRIx64
+kvm_arm_cpu_post_load_missing_reg(char *name) "Missing register in input stream: %s"
# cpu.c
arm_cpu_reset(uint64_t mp_aff) "cpu %" PRIu64
@@ -26,3 +27,9 @@ arm_powerctl_reset_cpu(uint64_t mp_aff) "cpu %" PRIu64
# tcg/psci.c and hvf/hvf.c
arm_psci_call(uint64_t x0, uint64_t x1, uint64_t x2, uint64_t x3, uint32_t cpuid) "PSCI Call x0=0x%016"PRIx64" x1=0x%016"PRIx64" x2=0x%016"PRIx64" x3=0x%016"PRIx64" cpuid=0x%x"
+
+# machine.c
+cpu_post_load_len(int cpreg_array_len, int cpreg_vmstate_array_len) "cpreg_array_len=%d cpreg_vmstate_array_len=%d"
+cpu_post_load(int i, int v, uint64_t regidx) "i=%d v=%d regidx=0x%"PRIx64
+cpu_post_load_missing(int i, uint64_t regidx, int v) "missing register in input stream: i=%d index=0x%"PRIx64" (v=%d)"
+cpu_post_load_unexpected(int v, uint64_t regidx, int i) "unexpected register in input stream: v=%d index=0x%"PRIx64" (i=%d)"
--
2.52.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v6 02/11] target/arm/machine: Improve traces on register mismatch during migration
2026-01-26 16:53 ` [PATCH v6 02/11] target/arm/machine: Improve traces on register mismatch during migration Eric Auger
@ 2026-02-06 14:31 ` Peter Maydell
2026-02-09 12:51 ` Cornelia Huck
2026-02-09 13:42 ` Eric Auger
2026-02-09 15:08 ` Alex Bennée
1 sibling, 2 replies; 32+ messages in thread
From: Peter Maydell @ 2026-02-06 14:31 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, qemu-devel, qemu-arm, cohuck, maz, oliver.upton,
sebott, gshan, ddutile, peterx, philmd, pbonzini
On Mon, 26 Jan 2026 at 16:55, Eric Auger <eric.auger@redhat.com> wrote:
>
> Currently when the number of KVM registers exposed by the source is
> larger than the one exposed on the destination, the migration fails
> with: "failed to load cpu:cpreg_vmstate_array_len"
>
> This gives no information about which registers are causing the trouble.
>
> This patch reworks the target/arm/machine code so that it becomes
> able to handle an input stream with a larger set of registers than
> the destination and print useful information about which registers
> are causing the trouble. The migration outcome is unchanged:
> - unexpected registers still will fail the migration
> - missing ones are printed but will not fail the migration, as done today.
Improving the diagnostics here is a great idea.
> The input stream can contain MAX_CPREG_VMSTATE_ANOMALIES(10) extra
> registers compared to what exists on the target.
>
> If there are more registers we will still hit the previous
> "load cpu:cpreg_vmstate_array_len" error.
>
> At most, MAX_CPREG_VMSTATE_ANOMALIES missing registers
> and MAX_CPREG_VMSTATE_ANOMALIES unexpected registers are printed.
>
> Example:
>
> qemu-system-aarch64: kvm_arm_cpu_post_load Missing register in input stream: 0 0x6030000000160003 fw feat reg 3
> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 0 0x603000000013c103 op0:3 op1:0 crn:2 crm:0 op2:3
> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 1 0x603000000013c512 op0:3 op1:0 crn:10 crm:2 op2:2
> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 2 0x603000000013c513 op0:3 op1:0 crn:10 crm:2 op2:3
> qemu-system-aarch64: error while loading state for instance 0x0 of device 'cpu'
> qemu-system-aarch64: load of migration failed: Operation not permitted
>
> With TCG there is no user friendly formatting of the faulting
> register indexes as with KVM. However the 2 added trace points
> help to identify the culprit indexes.
Could we move kvm_print_register_name() out of kvm.c and into
somewhere that the TCG code can use it? (I did think when I
was reviewing the patch that added that that we might want it
for TCG too eventually.)
> @@ -990,7 +991,13 @@ static int cpu_pre_load(void *opaque)
> {
> ARMCPU *cpu = opaque;
> CPUARMState *env = &cpu->env;
> + int arraylen = cpu->cpreg_vmstate_array_len + MAX_CPREG_VMSTATE_ANOMALIES;
>
> + cpu->cpreg_vmstate_indexes = g_renew(uint64_t, cpu->cpreg_vmstate_indexes,
> + arraylen);
> + cpu->cpreg_vmstate_values = g_renew(uint64_t, cpu->cpreg_vmstate_values,
> + arraylen);
> + cpu->cpreg_vmstate_array_len = arraylen;
It seems a bit odd to extend these on cpu_pre_load, especially
since it means we'll do so on every cpu_pre_load call, which I
think can happen if you try an inbound migration, it fails, and
then you retry it.
I think it ought to be possible to both avoid this reallocation
and the problem noted in the commit message where more than 10
extra registers results in an unhelpful message, if we can
convert the vmstate fields from VMSTATE_VARRAY_INT32 to
VMSTATE_VARRAY_INT32_ALLOC. (That latter doesn't exist yet but
will be the INT32 equivalent of VMSTATE_VARRAY_UINT32_ALLOC.)
If I have read the code correctly, these should work by
having the inbound migration code allocate the buffer for the
array data instead of expecting it to be pre-allocated -- that
means our post_load function can look at all the data it got
without imposing a length limitation.
I think (but we should check :-)) that the data in the migration
stream is the same in both cases, so this will not be a compat break.
(Some existing code will need adjustment to avoid a memory leak,
e.g. g_free any existing array in pre_load.)
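To make the two points above concrete, here is a standalone sketch
(not QEMU code) contrasting growing a pre-allocated array on every
pre_load with letting the loader allocate what the stream announces.
VmstateArray, grow_in_pre_load(), reset_in_pre_load() and load_alloc()
are hypothetical names; realloc/malloc/free stand in for g_renew/g_free,
and whether the real *_ALLOC vmstate macros behave exactly like
load_alloc() is an assumption that still needs checking.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EXTRA_SLOTS 10 /* stands in for MAX_CPREG_VMSTATE_ANOMALIES */

/* Hypothetical stand-in for the cpreg_vmstate_* fields of ARMCPU. */
typedef struct {
    uint64_t *indexes;
    int32_t len;
} VmstateArray;

/* Patch approach: enlarge the pre-allocated array in pre_load. */
static int grow_in_pre_load(VmstateArray *a)
{
    int32_t newlen = a->len + EXTRA_SLOTS;
    uint64_t *p = realloc(a->indexes, (size_t)newlen * sizeof(*p));

    if (!p) {
        return -1;
    }
    a->indexes = p;
    a->len = newlen; /* grows again on every call, e.g. on a retry */
    return 0;
}

/* Alternative, step 1: drop any stale buffer so nothing leaks. */
static void reset_in_pre_load(VmstateArray *a)
{
    free(a->indexes);
    a->indexes = NULL;
    a->len = 0;
}

/* Alternative, step 2: the loader allocates exactly n entries. */
static int load_alloc(VmstateArray *a, const uint64_t *stream, int32_t n)
{
    assert(a->indexes == NULL && n >= 0);
    if (n > 0) {
        a->indexes = malloc((size_t)n * sizeof(*a->indexes));
        if (!a->indexes) {
            return -1;
        }
        memcpy(a->indexes, stream, (size_t)n * sizeof(*a->indexes));
    }
    a->len = n;
    return 0;
}
```

With the first approach, two load attempts on a 4-entry array leave
len at 24; with the second, len always equals the incoming count, so
post_load can inspect every incoming entry without a fixed cap.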
thanks
-- PMM
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 02/11] target/arm/machine: Improve traces on register mismatch during migration
2026-02-06 14:31 ` Peter Maydell
@ 2026-02-09 12:51 ` Cornelia Huck
2026-02-09 13:56 ` Eric Auger
2026-02-09 13:42 ` Eric Auger
1 sibling, 1 reply; 32+ messages in thread
From: Cornelia Huck @ 2026-02-09 12:51 UTC (permalink / raw)
To: Peter Maydell, Eric Auger
Cc: eric.auger.pro, qemu-devel, qemu-arm, maz, oliver.upton, sebott,
gshan, ddutile, peterx, philmd, pbonzini
On Fri, Feb 06 2026, Peter Maydell <peter.maydell@linaro.org> wrote:
> On Mon, 26 Jan 2026 at 16:55, Eric Auger <eric.auger@redhat.com> wrote:
>>
>> Currently when the number of KVM registers exposed by the source is
>> larger than the one exposed on the destination, the migration fails
>> with: "failed to load cpu:cpreg_vmstate_array_len"
>>
>> This gives no information about which registers are causing the trouble.
>>
>> This patch reworks the target/arm/machine code so that it becomes
>> able to handle an input stream with a larger set of registers than
>> the destination and print useful information about which registers
>> are causing the trouble. The migration outcome is unchanged:
>> - unexpected registers still will fail the migration
>> - missing ones are printed but will not fail the migration, as done today.
>
> Improving the diagnostics here is a great idea.
>
>> The input stream can contain MAX_CPREG_VMSTATE_ANOMALIES(10) extra
>> registers compared to what exists on the target.
>>
>> If there are more registers we will still hit the previous
>> "load cpu:cpreg_vmstate_array_len" error.
>>
>> At most, MAX_CPREG_VMSTATE_ANOMALIES missing registers
>> and MAX_CPREG_VMSTATE_ANOMALIES unexpected registers are printed.
>>
>> Example:
>>
>> qemu-system-aarch64: kvm_arm_cpu_post_load Missing register in input stream: 0 0x6030000000160003 fw feat reg 3
>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 0 0x603000000013c103 op0:3 op1:0 crn:2 crm:0 op2:3
>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 1 0x603000000013c512 op0:3 op1:0 crn:10 crm:2 op2:2
>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 2 0x603000000013c513 op0:3 op1:0 crn:10 crm:2 op2:3
>> qemu-system-aarch64: error while loading state for instance 0x0 of device 'cpu'
>> qemu-system-aarch64: load of migration failed: Operation not permitted
>>
>> With TCG there is no user friendly formatting of the faulting
>> register indexes as with KVM. However the 2 added trace points
>> help to identify the culprit indexes.
>
> Could we move kvm_print_register_name() out of kvm.c and into
> somewhere that the TCG code can use it? (I did think when I
> was reviewing the patch that added that that we might want it
> for TCG too eventually.)
I'm wondering which parts could/should be generalized -- the sysreg
encodings match with the CP_REG_ encodings, but I don't think much else?
Might be worth trying to split those regs off?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 02/11] target/arm/machine: Improve traces on register mismatch during migration
2026-02-09 12:51 ` Cornelia Huck
@ 2026-02-09 13:56 ` Eric Auger
0 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-02-09 13:56 UTC (permalink / raw)
To: Cornelia Huck, Peter Maydell
Cc: eric.auger.pro, qemu-devel, qemu-arm, maz, oliver.upton, sebott,
gshan, ddutile, peterx, philmd, pbonzini
On 2/9/26 1:51 PM, Cornelia Huck wrote:
> On Fri, Feb 06 2026, Peter Maydell <peter.maydell@linaro.org> wrote:
>
>> On Mon, 26 Jan 2026 at 16:55, Eric Auger <eric.auger@redhat.com> wrote:
>>> Currently when the number of KVM registers exposed by the source is
>>> larger than the one exposed on the destination, the migration fails
>>> with: "failed to load cpu:cpreg_vmstate_array_len"
>>>
>>> This gives no information about which registers are causing the trouble.
>>>
>>> This patch reworks the target/arm/machine code so that it becomes
>>> able to handle an input stream with a larger set of registers than
>>> the destination and print useful information about which registers
>>> are causing the trouble. The migration outcome is unchanged:
>>> - unexpected registers still will fail the migration
>>> - missing ones are printed but will not fail the migration, as done today.
>> Improving the diagnostics here is a great idea.
>>
>>> The input stream can contain MAX_CPREG_VMSTATE_ANOMALIES(10) extra
>>> registers compared to what exists on the target.
>>>
>>> If there are more registers we will still hit the previous
>>> "load cpu:cpreg_vmstate_array_len" error.
>>>
>>> At most, MAX_CPREG_VMSTATE_ANOMALIES missing registers
>>> and MAX_CPREG_VMSTATE_ANOMALIES unexpected registers are printed.
>>>
>>> Example:
>>>
>>> qemu-system-aarch64: kvm_arm_cpu_post_load Missing register in input stream: 0 0x6030000000160003 fw feat reg 3
>>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 0 0x603000000013c103 op0:3 op1:0 crn:2 crm:0 op2:3
>>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 1 0x603000000013c512 op0:3 op1:0 crn:10 crm:2 op2:2
>>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 2 0x603000000013c513 op0:3 op1:0 crn:10 crm:2 op2:3
>>> qemu-system-aarch64: error while loading state for instance 0x0 of device 'cpu'
>>> qemu-system-aarch64: load of migration failed: Operation not permitted
>>>
>>> With TCG there is no user friendly formatting of the faulting
>>> register indexes as with KVM. However the 2 added trace points
>>> help to identify the culprit indexes.
>> Could we move kvm_print_register_name() out of kvm.c and into
>> somewhere that the TCG code can use it? (I did think when I
>> was reviewing the patch that added that that we might want it
>> for TCG too eventually.)
> I'm wondering which parts could/should be generalized -- the sysreg
> encodings match with the CP_REG_ encodings, but I don't think much else?
> Might be worth trying to split those regs off?
>
In target/arm/cpregs.h there is cpreg_to_kvm_id(), which converts a
cpreg encoding into a KVM regidx. Those regidx values are what is
stored in cpu->cpreg_indexes. I noticed this because, when advertising
the TCG register
DBGDTRTX 0x40200000200e0298
as safe to ignore in the incoming stream, I used the KVM regidx in the
prop value.
cpreg_to_kvm_id() does not use any KVM header. I guess we could rewrite
kvm_print_register_name() along similar lines and use it for TCG as
well. What do you think?
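As a rough illustration of what a header-independent decoder could
look like (a sketch only, not the existing kvm_print_register_name()),
the AArch64 sysreg fields can be extracted with plain shifts and masks.
The constants below are copied from the Linux arm64 KVM uapi encoding
as I understand it; SysRegFields and decode_sysreg_id() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the KVM register-ID layout; no KVM header needed. */
#define REG_COPROC_MASK  0x000000000FFF0000ULL
#define REG_ARM64_SYSREG (0x0013ULL << 16)

/* Hypothetical holder for the decoded fields. */
typedef struct {
    unsigned op0, op1, crn, crm, op2;
} SysRegFields;

/*
 * Decode an AArch64 system register ID into its encoding fields.
 * Returns false if the ID is not in the sysreg class (for example
 * a firmware pseudo-register), leaving *f untouched.
 */
static bool decode_sysreg_id(uint64_t id, SysRegFields *f)
{
    assert(f);
    if ((id & REG_COPROC_MASK) != REG_ARM64_SYSREG) {
        return false;
    }
    f->op0 = (id >> 14) & 0x3;
    f->op1 = (id >> 11) & 0x7;
    f->crn = (id >> 7) & 0xf;
    f->crm = (id >> 3) & 0xf;
    f->op2 = id & 0x7;
    return true;
}
```

For 0x603000000013c512 from the example log this yields op0:3 op1:0
crn:10 crm:2 op2:2, while 0x6030000000160003 (the "fw feat reg") is
rejected as not being a sysreg.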
Thanks
Eric
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 02/11] target/arm/machine: Improve traces on register mismatch during migration
2026-02-06 14:31 ` Peter Maydell
2026-02-09 12:51 ` Cornelia Huck
@ 2026-02-09 13:42 ` Eric Auger
1 sibling, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-02-09 13:42 UTC (permalink / raw)
To: Peter Maydell
Cc: eric.auger.pro, qemu-devel, qemu-arm, cohuck, maz, oliver.upton,
sebott, gshan, ddutile, peterx, philmd, pbonzini
Hi Peter,
On 2/6/26 3:31 PM, Peter Maydell wrote:
> On Mon, 26 Jan 2026 at 16:55, Eric Auger <eric.auger@redhat.com> wrote:
>> Currently when the number of KVM registers exposed by the source is
>> larger than the one exposed on the destination, the migration fails
>> with: "failed to load cpu:cpreg_vmstate_array_len"
>>
>> This gives no information about which registers are causing the trouble.
>>
>> This patch reworks the target/arm/machine code so that it becomes
>> able to handle an input stream with a larger set of registers than
>> the destination and print useful information about which registers
>> are causing the trouble. The migration outcome is unchanged:
>> - unexpected registers still will fail the migration
>> - missing ones are printed but will not fail the migration, as done today.
> Improving the diagnostics here is a great idea.
>
>> The input stream can contain MAX_CPREG_VMSTATE_ANOMALIES(10) extra
>> registers compared to what exists on the target.
>>
>> If there are more registers we will still hit the previous
>> "load cpu:cpreg_vmstate_array_len" error.
>>
>> At most, MAX_CPREG_VMSTATE_ANOMALIES missing registers
>> and MAX_CPREG_VMSTATE_ANOMALIES unexpected registers are printed.
>>
>> Example:
>>
>> qemu-system-aarch64: kvm_arm_cpu_post_load Missing register in input stream: 0 0x6030000000160003 fw feat reg 3
>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 0 0x603000000013c103 op0:3 op1:0 crn:2 crm:0 op2:3
>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 1 0x603000000013c512 op0:3 op1:0 crn:10 crm:2 op2:2
>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 2 0x603000000013c513 op0:3 op1:0 crn:10 crm:2 op2:3
>> qemu-system-aarch64: error while loading state for instance 0x0 of device 'cpu'
>> qemu-system-aarch64: load of migration failed: Operation not permitted
>>
>> With TCG there is no user friendly formatting of the faulting
>> register indexes as with KVM. However the 2 added trace points
>> help to identify the culprit indexes.
> Could we move kvm_print_register_name() out of kvm.c and into
> somewhere that the TCG code can use it? (I did think when I
> was reviewing the patch that added that that we might want it
> for TCG too eventually.)
>
>> @@ -990,7 +991,13 @@ static int cpu_pre_load(void *opaque)
>> {
>> ARMCPU *cpu = opaque;
>> CPUARMState *env = &cpu->env;
>> + int arraylen = cpu->cpreg_vmstate_array_len + MAX_CPREG_VMSTATE_ANOMALIES;
>>
>> + cpu->cpreg_vmstate_indexes = g_renew(uint64_t, cpu->cpreg_vmstate_indexes,
>> + arraylen);
>> + cpu->cpreg_vmstate_values = g_renew(uint64_t, cpu->cpreg_vmstate_values,
>> + arraylen);
>> + cpu->cpreg_vmstate_array_len = arraylen;
> It seems a bit odd to extend these on cpu_pre_load, especially
> since it means we'll do so on every cpu_pre_load call, which I
> think can happen if you try an inbound migration, it fails, and
> then you retry it.
Yes, in that case g_renew() would be called several times, with
cpu->cpreg_vmstate_array_len increased each time, I guess. However,
given the way the cpreg_vmstate_indexes/values arrays are currently
statically allocated, I did not see any solution other than using
pre_load().
>
> I think it ought to be possible to both avoid this reallocation
> and the problem noted in the commit message where more than 10
> extra registers results in an unhelpful message, if we can
> convert the vmstate fields from VMSTATE_VARRAY_INT32 to
> VMSTATE_VARRAY_INT32_ALLOC. (That latter doesn't exist yet but
> will be the INT32 equivalent of VMSTATE_VARRAY_UINT32_ALLOC.)
VMSTATE_VARRAY_INT32_ALLOC sounds like a great idea indeed. I was not
aware of its existence. I will try this out.
>
> If I have read the code correctly, these should work by
> having the inbound migration code allocate the buffer for the
> array data instead of expecting it to be pre-allocated -- that
> means our post_load function can look at all the data it got
> without imposing a length limitation.
>
> I think (but we should check :-)) that the data in the migration
> stream is the same in both cases, so this will not be a compat break.
OK I will test this.
Thank you for the review!
Eric
>
> (Some existing code will need adjustment to avoid a memory leak,
> e.g. g_free any existing array in pre_load.)
>
> thanks
> -- PMM
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 02/11] target/arm/machine: Improve traces on register mismatch during migration
2026-01-26 16:53 ` [PATCH v6 02/11] target/arm/machine: Improve traces on register mismatch during migration Eric Auger
2026-02-06 14:31 ` Peter Maydell
@ 2026-02-09 15:08 ` Alex Bennée
2026-02-09 15:20 ` Peter Maydell
2026-02-09 16:04 ` Eric Auger
1 sibling, 2 replies; 32+ messages in thread
From: Alex Bennée @ 2026-02-09 15:08 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell, cohuck, maz,
oliver.upton, sebott, gshan, ddutile, peterx, philmd, pbonzini
Eric Auger <eric.auger@redhat.com> writes:
> Currently when the number of KVM registers exposed by the source is
> larger than the one exposed on the destination, the migration fails
> with: "failed to load cpu:cpreg_vmstate_array_len"
>
> This gives no information about which registers are causing the trouble.
>
> This patch reworks the target/arm/machine code so that it becomes
> able to handle an input stream with a larger set of registers than
> the destination and print useful information about which registers
> are causing the trouble. The migration outcome is unchanged:
> - unexpected registers still will fail the migration
> - missing ones are printed but will not fail the migration, as done today.
>
> The input stream can contain MAX_CPREG_VMSTATE_ANOMALIES(10) extra
> registers compared to what exists on the target.
>
> If there are more registers we will still hit the previous
> "load cpu:cpreg_vmstate_array_len" error.
>
> At most, MAX_CPREG_VMSTATE_ANOMALIES missing registers
> and MAX_CPREG_VMSTATE_ANOMALIES unexpected registers are printed.
>
> Example:
>
> qemu-system-aarch64: kvm_arm_cpu_post_load Missing register in input stream: 0 0x6030000000160003 fw feat reg 3
> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 0 0x603000000013c103 op0:3 op1:0 crn:2 crm:0 op2:3
> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 1 0x603000000013c512 op0:3 op1:0 crn:10 crm:2 op2:2
> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 2 0x603000000013c513 op0:3 op1:0 crn:10 crm:2 op2:3
> qemu-system-aarch64: error while loading state for instance 0x0 of device 'cpu'
> qemu-system-aarch64: load of migration failed: Operation not permitted
>
> With TCG there is no user friendly formatting of the faulting
> register indexes as with KVM. However the 2 added trace points
> help to identify the culprit indexes.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
>
> ---
>
> v2 -> v3:
> - some extra typos (Connie)
> - collected Connie's R-b
>
> v1 -> v2:
> - fixed some typos in the commit msg
> ---
> target/arm/cpu.h | 6 +++++
> target/arm/kvm.c | 23 ++++++++++++++++
> target/arm/machine.c | 58 ++++++++++++++++++++++++++++++++++++-----
> target/arm/trace-events | 7 +++++
> 4 files changed, 88 insertions(+), 6 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 1eaf5a3fddf..e900ef7937b 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -939,6 +939,12 @@ struct ArchCPU {
> uint64_t *cpreg_vmstate_values;
> int32_t cpreg_vmstate_array_len;
>
> + #define MAX_CPREG_VMSTATE_ANOMALIES 10
> + uint64_t cpreg_vmstate_missing_indexes[MAX_CPREG_VMSTATE_ANOMALIES];
> + int32_t cpreg_vmstate_missing_indexes_array_len;
> + uint64_t cpreg_vmstate_unexpected_indexes[MAX_CPREG_VMSTATE_ANOMALIES];
> + int32_t cpreg_vmstate_unexpected_indexes_array_len;
> +
This seems a bit old school when we have GArray.
> DynamicGDBFeatureInfo dyn_sysreg_feature;
> DynamicGDBFeatureInfo dyn_svereg_feature;
> DynamicGDBFeatureInfo dyn_smereg_feature;
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 48f853fff80..c6f0d0fc4e1 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -1024,6 +1024,29 @@ void kvm_arm_cpu_pre_save(ARMCPU *cpu)
>
> bool kvm_arm_cpu_post_load(ARMCPU *cpu)
> {
> + int i;
> +
> + for (i = 0; i < cpu->cpreg_vmstate_missing_indexes_array_len; i++) {
> + gchar *name;
> +
> + name = kvm_print_register_name(cpu->cpreg_vmstate_missing_indexes[i]);
> + trace_kvm_arm_cpu_post_load_missing_reg(name);
> + g_free(name);
> + }
> +
> + for (i = 0; i < cpu->cpreg_vmstate_unexpected_indexes_array_len; i++) {
> + gchar *name;
> +
> + name = kvm_print_register_name(cpu->cpreg_vmstate_unexpected_indexes[i]);
> + error_report("%s Unexpected register in input stream: %i 0x%"PRIx64" %s",
> + __func__, i, cpu->cpreg_vmstate_unexpected_indexes[i], name);
> + g_free(name);
> + }
> + /* Fail the migration if we detect unexpected registers */
> + if (cpu->cpreg_vmstate_unexpected_indexes_array_len) {
> + return false;
> + }
> +
> if (!write_list_to_kvmstate(cpu, KVM_PUT_FULL_STATE)) {
> return false;
> }
> diff --git a/target/arm/machine.c b/target/arm/machine.c
> index 0befdb0b28a..f06a920aba1 100644
> --- a/target/arm/machine.c
> +++ b/target/arm/machine.c
> @@ -10,6 +10,7 @@
> #include "migration/vmstate.h"
> #include "target/arm/gtimer.h"
> #include "hw/arm/machines-qom.h"
> +#include "trace.h"
>
> static bool vfp_needed(void *opaque)
> {
> @@ -990,7 +991,13 @@ static int cpu_pre_load(void *opaque)
> {
> ARMCPU *cpu = opaque;
> CPUARMState *env = &cpu->env;
> + int arraylen = cpu->cpreg_vmstate_array_len + MAX_CPREG_VMSTATE_ANOMALIES;
>
> + cpu->cpreg_vmstate_indexes = g_renew(uint64_t, cpu->cpreg_vmstate_indexes,
> + arraylen);
> + cpu->cpreg_vmstate_values = g_renew(uint64_t, cpu->cpreg_vmstate_values,
> + arraylen);
> + cpu->cpreg_vmstate_array_len = arraylen;
I wonder if these would be candidates for fixing up as well.
> /*
> * In an inbound migration where on the source FPSCR/FPSR/FPCR are 0,
> * there will be no fpcr_fpsr subsection so we won't call vfp_set_fpcr()
> @@ -1023,7 +1030,7 @@ static int cpu_post_load(void *opaque, int version_id)
> {
> ARMCPU *cpu = opaque;
> CPUARMState *env = &cpu->env;
> - int i, v;
> + int i = 0, j = 0, k = 0, v = 0;
>
> /*
> * Handle migration compatibility from old QEMU which didn't
> @@ -1051,27 +1058,66 @@ static int cpu_post_load(void *opaque, int version_id)
> * entries with the right slots in our own values array.
> */
>
> - for (i = 0, v = 0; i < cpu->cpreg_array_len
> - && v < cpu->cpreg_vmstate_array_len; i++) {
> + trace_cpu_post_load_len(cpu->cpreg_array_len, cpu->cpreg_vmstate_array_len);
> + for (; i < cpu->cpreg_array_len && v < cpu->cpreg_vmstate_array_len;) {
> + trace_cpu_post_load(i, v , cpu->cpreg_indexes[i]);
> if (cpu->cpreg_vmstate_indexes[v] > cpu->cpreg_indexes[i]) {
> /* register in our list but not incoming : skip it */
> + trace_cpu_post_load_missing(i, cpu->cpreg_indexes[i], v);
> + if (j < MAX_CPREG_VMSTATE_ANOMALIES) {
> + cpu->cpreg_vmstate_missing_indexes[j++] = cpu->cpreg_indexes[i];
> + }
> + i++;
> continue;
> }
> if (cpu->cpreg_vmstate_indexes[v] < cpu->cpreg_indexes[i]) {
> - /* register in their list but not ours: fail migration */
> - return -1;
> + /* register in their list but not ours: those will fail migration */
> + trace_cpu_post_load_unexpected(v, cpu->cpreg_vmstate_indexes[v], i);
> + if (k < MAX_CPREG_VMSTATE_ANOMALIES) {
> + cpu->cpreg_vmstate_unexpected_indexes[k++] =
> + cpu->cpreg_vmstate_indexes[v];
> + }
> + v++;
> + continue;
> }
> /* matching register, copy the value over */
> cpu->cpreg_values[i] = cpu->cpreg_vmstate_values[v];
> v++;
> + i++;
> }
> + /*
> + * if we have reached the end of the incoming array but there are
> + * still regs in cpreg, continue parsing the regs which are missing
> + * in the input stream
> + */
> + for ( ; i < cpu->cpreg_array_len; i++) {
> + if (j < MAX_CPREG_VMSTATE_ANOMALIES) {
> + trace_cpu_post_load_missing(i, cpu->cpreg_indexes[i], v);
> + cpu->cpreg_vmstate_missing_indexes[j++] = cpu->cpreg_indexes[i];
> + }
> + }
> + /*
> + * if we have reached the end of the cpreg array but there are
> + * still regs in the input stream, continue parsing the vmstate array
> + */
> + for ( ; v < cpu->cpreg_vmstate_array_len; v++) {
> + if (k < MAX_CPREG_VMSTATE_ANOMALIES) {
> + trace_cpu_post_load_unexpected(v, cpu->cpreg_vmstate_indexes[v], i);
> + cpu->cpreg_vmstate_unexpected_indexes[k++] =
> + cpu->cpreg_vmstate_indexes[v];
> + }
> + }
> +
> + cpu->cpreg_vmstate_missing_indexes_array_len = j;
> + cpu->cpreg_vmstate_unexpected_indexes_array_len = k;
>
> if (kvm_enabled()) {
> if (!kvm_arm_cpu_post_load(cpu)) {
> return -1;
> }
> } else {
> - if (!write_list_to_cpustate(cpu)) {
> + if (cpu->cpreg_vmstate_unexpected_indexes_array_len ||
> + !write_list_to_cpustate(cpu)) {
> return -1;
> }
> }
> diff --git a/target/arm/trace-events b/target/arm/trace-events
> index 676d29fe516..0a5ed3e69d5 100644
> --- a/target/arm/trace-events
> +++ b/target/arm/trace-events
> @@ -13,6 +13,7 @@ arm_gt_update_irq(int timer, int irqstate) "gt_update_irq: timer %d irqstate %d"
>
> # kvm.c
> kvm_arm_fixup_msi_route(uint64_t iova, uint64_t gpa) "MSI iova = 0x%"PRIx64" is translated into 0x%"PRIx64
> +kvm_arm_cpu_post_load_missing_reg(char *name) "Missing register in input stream: %s"
>
> # cpu.c
> arm_cpu_reset(uint64_t mp_aff) "cpu %" PRIu64
> @@ -26,3 +27,9 @@ arm_powerctl_reset_cpu(uint64_t mp_aff) "cpu %" PRIu64
>
> # tcg/psci.c and hvf/hvf.c
> arm_psci_call(uint64_t x0, uint64_t x1, uint64_t x2, uint64_t x3, uint32_t cpuid) "PSCI Call x0=0x%016"PRIx64" x1=0x%016"PRIx64" x2=0x%016"PRIx64" x3=0x%016"PRIx64" cpuid=0x%x"
> +
> +# machine.c
> +cpu_post_load_len(int cpreg_array_len, int cpreg_vmstate_array_len) "cpreg_array_len=%d cpreg_vmstate_array_len=%d"
> +cpu_post_load(int i, int v, uint64_t regidx) "i=%d v=%d regidx=0x%"PRIx64
> +cpu_post_load_missing(int i, uint64_t regidx, int v) "missing register in input stream: i=%d index=0x%"PRIx64" (v=%d)"
> +cpu_post_load_unexpected(int v, uint64_t regidx, int i) "unexpected register in input stream: v=%d index=0x%"PRIx64" (i=%d)"
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 02/11] target/arm/machine: Improve traces on register mismatch during migration
2026-02-09 15:08 ` Alex Bennée
@ 2026-02-09 15:20 ` Peter Maydell
2026-02-09 16:04 ` Eric Auger
1 sibling, 0 replies; 32+ messages in thread
From: Peter Maydell @ 2026-02-09 15:20 UTC (permalink / raw)
To: Alex Bennée
Cc: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, cohuck, maz,
oliver.upton, sebott, gshan, ddutile, peterx, philmd, pbonzini
On Mon, 9 Feb 2026 at 15:08, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Eric Auger <eric.auger@redhat.com> writes:
> > + #define MAX_CPREG_VMSTATE_ANOMALIES 10
> > + uint64_t cpreg_vmstate_missing_indexes[MAX_CPREG_VMSTATE_ANOMALIES];
> > + int32_t cpreg_vmstate_missing_indexes_array_len;
> > + uint64_t cpreg_vmstate_unexpected_indexes[MAX_CPREG_VMSTATE_ANOMALIES];
> > + int32_t cpreg_vmstate_unexpected_indexes_array_len;
> > +
>
> This seems a bit old school when we have GArray.
> > + int arraylen = cpu->cpreg_vmstate_array_len + MAX_CPREG_VMSTATE_ANOMALIES;
> >
> > + cpu->cpreg_vmstate_indexes = g_renew(uint64_t, cpu->cpreg_vmstate_indexes,
> > + arraylen);
> > + cpu->cpreg_vmstate_values = g_renew(uint64_t, cpu->cpreg_vmstate_values,
> > + arraylen);
> > + cpu->cpreg_vmstate_array_len = arraylen;
>
> I wonder if these would be candidates for fixing up as well.
These ones are the types they are because they need to match
what VMSTATE_VARRAY_INT32() expects -- these are the data
that goes out on the migration stream.
-- PMM
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 02/11] target/arm/machine: Improve traces on register mismatch during migration
2026-02-09 15:08 ` Alex Bennée
2026-02-09 15:20 ` Peter Maydell
@ 2026-02-09 16:04 ` Eric Auger
1 sibling, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-02-09 16:04 UTC (permalink / raw)
To: Alex Bennée
Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell, cohuck, maz,
oliver.upton, sebott, gshan, ddutile, peterx, philmd, pbonzini
Hi Alex,
On 2/9/26 4:08 PM, Alex Bennée wrote:
> Eric Auger <eric.auger@redhat.com> writes:
>
>> Currently when the number of KVM registers exposed by the source is
>> larger than the one exposed on the destination, the migration fails
>> with: "failed to load cpu:cpreg_vmstate_array_len"
>>
>> This gives no information about which registers are causing the trouble.
>>
>> This patch reworks the target/arm/machine code so that it becomes
>> able to handle an input stream with a larger set of registers than
>> the destination and print useful information about which registers
>> are causing the trouble. The migration outcome is unchanged:
>> - unexpected registers still will fail the migration
>> - missing ones are printed but will not fail the migration, as done today.
>>
>> The input stream can contain MAX_CPREG_VMSTATE_ANOMALIES(10) extra
>> registers compared to what exists on the target.
>>
>> If there are more registers we will still hit the previous
>> "load cpu:cpreg_vmstate_array_len" error.
>>
>> At most, MAX_CPREG_VMSTATE_ANOMALIES missing registers
>> and MAX_CPREG_VMSTATE_ANOMALIES unexpected registers are printed.
>>
>> Example:
>>
>> qemu-system-aarch64: kvm_arm_cpu_post_load Missing register in input stream: 0 0x6030000000160003 fw feat reg 3
>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 0 0x603000000013c103 op0:3 op1:0 crn:2 crm:0 op2:3
>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 1 0x603000000013c512 op0:3 op1:0 crn:10 crm:2 op2:2
>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input stream: 2 0x603000000013c513 op0:3 op1:0 crn:10 crm:2 op2:3
>> qemu-system-aarch64: error while loading state for instance 0x0 of device 'cpu'
>> qemu-system-aarch64: load of migration failed: Operation not permitted
>>
>> With TCG there is no user friendly formatting of the faulting
>> register indexes as with KVM. However the 2 added trace points
>> help to identify the culprit indexes.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
>>
>> ---
>>
>> v2 -> v3:
>> - some extra typos (Connie)
>> - collected Connie's R-b
>>
>> v1 -> v2:
>> - fixed some typo in the commit msg
>> ---
>> target/arm/cpu.h | 6 +++++
>> target/arm/kvm.c | 23 ++++++++++++++++
>> target/arm/machine.c | 58 ++++++++++++++++++++++++++++++++++++-----
>> target/arm/trace-events | 7 +++++
>> 4 files changed, 88 insertions(+), 6 deletions(-)
>>
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index 1eaf5a3fddf..e900ef7937b 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -939,6 +939,12 @@ struct ArchCPU {
>> uint64_t *cpreg_vmstate_values;
>> int32_t cpreg_vmstate_array_len;
>>
>> + #define MAX_CPREG_VMSTATE_ANOMALIES 10
>> + uint64_t cpreg_vmstate_missing_indexes[MAX_CPREG_VMSTATE_ANOMALIES];
>> + int32_t cpreg_vmstate_missing_indexes_array_len;
>> + uint64_t cpreg_vmstate_unexpected_indexes[MAX_CPREG_VMSTATE_ANOMALIES];
>> + int32_t cpreg_vmstate_unexpected_indexes_array_len;
>> +
> This seems a bit old school when we have GArray.
Thanks for jumping in. Agreed. If we manage to have a generic TCG/KVM
print_register_name(), I hope those can even be removed.
Thanks!
Eric
>
>> DynamicGDBFeatureInfo dyn_sysreg_feature;
>> DynamicGDBFeatureInfo dyn_svereg_feature;
>> DynamicGDBFeatureInfo dyn_smereg_feature;
>> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
>> index 48f853fff80..c6f0d0fc4e1 100644
>> --- a/target/arm/kvm.c
>> +++ b/target/arm/kvm.c
>> @@ -1024,6 +1024,29 @@ void kvm_arm_cpu_pre_save(ARMCPU *cpu)
>>
>> bool kvm_arm_cpu_post_load(ARMCPU *cpu)
>> {
>> + int i;
>> +
>> + for (i = 0; i < cpu->cpreg_vmstate_missing_indexes_array_len; i++) {
>> + gchar *name;
>> +
>> + name = kvm_print_register_name(cpu->cpreg_vmstate_missing_indexes[i]);
>> + trace_kvm_arm_cpu_post_load_missing_reg(name);
>> + g_free(name);
>> + }
>> +
>> + for (i = 0; i < cpu->cpreg_vmstate_unexpected_indexes_array_len; i++) {
>> + gchar *name;
>> +
>> + name = kvm_print_register_name(cpu->cpreg_vmstate_unexpected_indexes[i]);
>> + error_report("%s Unexpected register in input stream: %i 0x%"PRIx64" %s",
>> + __func__, i, cpu->cpreg_vmstate_unexpected_indexes[i], name);
>> + g_free(name);
>> + }
>> + /* Fail the migration if we detect unexpected registers */
>> + if (cpu->cpreg_vmstate_unexpected_indexes_array_len) {
>> + return false;
>> + }
>> +
>> if (!write_list_to_kvmstate(cpu, KVM_PUT_FULL_STATE)) {
>> return false;
>> }
>> diff --git a/target/arm/machine.c b/target/arm/machine.c
>> index 0befdb0b28a..f06a920aba1 100644
>> --- a/target/arm/machine.c
>> +++ b/target/arm/machine.c
>> @@ -10,6 +10,7 @@
>> #include "migration/vmstate.h"
>> #include "target/arm/gtimer.h"
>> #include "hw/arm/machines-qom.h"
>> +#include "trace.h"
>>
>> static bool vfp_needed(void *opaque)
>> {
>> @@ -990,7 +991,13 @@ static int cpu_pre_load(void *opaque)
>> {
>> ARMCPU *cpu = opaque;
>> CPUARMState *env = &cpu->env;
>> + int arraylen = cpu->cpreg_vmstate_array_len + MAX_CPREG_VMSTATE_ANOMALIES;
>>
>> + cpu->cpreg_vmstate_indexes = g_renew(uint64_t, cpu->cpreg_vmstate_indexes,
>> + arraylen);
>> + cpu->cpreg_vmstate_values = g_renew(uint64_t, cpu->cpreg_vmstate_values,
>> + arraylen);
>> + cpu->cpreg_vmstate_array_len = arraylen;
> I wonder if these would be candidates for fixing up as well.
>
>> /*
>> * In an inbound migration where on the source FPSCR/FPSR/FPCR are 0,
>> * there will be no fpcr_fpsr subsection so we won't call vfp_set_fpcr()
>> @@ -1023,7 +1030,7 @@ static int cpu_post_load(void *opaque, int version_id)
>> {
>> ARMCPU *cpu = opaque;
>> CPUARMState *env = &cpu->env;
>> - int i, v;
>> + int i = 0, j = 0, k = 0, v = 0;
>>
>> /*
>> * Handle migration compatibility from old QEMU which didn't
>> @@ -1051,27 +1058,66 @@ static int cpu_post_load(void *opaque, int version_id)
>> * entries with the right slots in our own values array.
>> */
>>
>> - for (i = 0, v = 0; i < cpu->cpreg_array_len
>> - && v < cpu->cpreg_vmstate_array_len; i++) {
>> + trace_cpu_post_load_len(cpu->cpreg_array_len, cpu->cpreg_vmstate_array_len);
>> + for (; i < cpu->cpreg_array_len && v < cpu->cpreg_vmstate_array_len;) {
>> + trace_cpu_post_load(i, v , cpu->cpreg_indexes[i]);
>> if (cpu->cpreg_vmstate_indexes[v] > cpu->cpreg_indexes[i]) {
>> /* register in our list but not incoming : skip it */
>> + trace_cpu_post_load_missing(i, cpu->cpreg_indexes[i], v);
>> + if (j < MAX_CPREG_VMSTATE_ANOMALIES) {
>> + cpu->cpreg_vmstate_missing_indexes[j++] = cpu->cpreg_indexes[i];
>> + }
>> + i++;
>> continue;
>> }
>> if (cpu->cpreg_vmstate_indexes[v] < cpu->cpreg_indexes[i]) {
>> - /* register in their list but not ours: fail migration */
>> - return -1;
>> + /* register in their list but not ours: those will fail migration */
>> + trace_cpu_post_load_unexpected(v, cpu->cpreg_vmstate_indexes[v], i);
>> + if (k < MAX_CPREG_VMSTATE_ANOMALIES) {
>> + cpu->cpreg_vmstate_unexpected_indexes[k++] =
>> + cpu->cpreg_vmstate_indexes[v];
>> + }
>> + v++;
>> + continue;
>> }
>> /* matching register, copy the value over */
>> cpu->cpreg_values[i] = cpu->cpreg_vmstate_values[v];
>> v++;
>> + i++;
>> }
>> + /*
>> + * if we have reached the end of the incoming array but there are
>> + * still regs in cpreg, continue parsing the regs which are missing
>> + * in the input stream
>> + */
>> + for ( ; i < cpu->cpreg_array_len; i++) {
>> + if (j < MAX_CPREG_VMSTATE_ANOMALIES) {
>> + trace_cpu_post_load_missing(i, cpu->cpreg_indexes[i], v);
>> + cpu->cpreg_vmstate_missing_indexes[j++] = cpu->cpreg_indexes[i];
>> + }
>> + }
>> + /*
>> + * if we have reached the end of the cpreg array but there are
>> + * still regs in the input stream, continue parsing the vmstate array
>> + */
>> + for ( ; v < cpu->cpreg_vmstate_array_len; v++) {
>> + if (k < MAX_CPREG_VMSTATE_ANOMALIES) {
>> + trace_cpu_post_load_unexpected(v, cpu->cpreg_vmstate_indexes[v], i);
>> + cpu->cpreg_vmstate_unexpected_indexes[k++] =
>> + cpu->cpreg_vmstate_indexes[v];
>> + }
>> + }
>> +
>> + cpu->cpreg_vmstate_missing_indexes_array_len = j;
>> + cpu->cpreg_vmstate_unexpected_indexes_array_len = k;
>>
>> if (kvm_enabled()) {
>> if (!kvm_arm_cpu_post_load(cpu)) {
>> return -1;
>> }
>> } else {
>> - if (!write_list_to_cpustate(cpu)) {
>> + if (cpu->cpreg_vmstate_unexpected_indexes_array_len ||
>> + !write_list_to_cpustate(cpu)) {
>> return -1;
>> }
>> }
>> diff --git a/target/arm/trace-events b/target/arm/trace-events
>> index 676d29fe516..0a5ed3e69d5 100644
>> --- a/target/arm/trace-events
>> +++ b/target/arm/trace-events
>> @@ -13,6 +13,7 @@ arm_gt_update_irq(int timer, int irqstate) "gt_update_irq: timer %d irqstate %d"
>>
>> # kvm.c
>> kvm_arm_fixup_msi_route(uint64_t iova, uint64_t gpa) "MSI iova = 0x%"PRIx64" is translated into 0x%"PRIx64
>> +kvm_arm_cpu_post_load_missing_reg(char *name) "Missing register in input stream: %s"
>>
>> # cpu.c
>> arm_cpu_reset(uint64_t mp_aff) "cpu %" PRIu64
>> @@ -26,3 +27,9 @@ arm_powerctl_reset_cpu(uint64_t mp_aff) "cpu %" PRIu64
>>
>> # tcg/psci.c and hvf/hvf.c
>> arm_psci_call(uint64_t x0, uint64_t x1, uint64_t x2, uint64_t x3, uint32_t cpuid) "PSCI Call x0=0x%016"PRIx64" x1=0x%016"PRIx64" x2=0x%016"PRIx64" x3=0x%016"PRIx64" cpuid=0x%x"
>> +
>> +# machine.c
>> +cpu_post_load_len(int cpreg_array_len, int cpreg_vmstate_array_len) "cpreg_array_len=%d cpreg_vmstate_array_len=%d"
>> +cpu_post_load(int i, int v, uint64_t regidx) "i=%d v=%d regidx=0x%"PRIx64
>> +cpu_post_load_missing(int i, uint64_t regidx, int v) "missing register in input stream: i=%d index=0x%"PRIx64" (v=%d)"
>> +cpu_post_load_unexpected(int v, uint64_t regidx, int i) "unexpected register in input stream: v=%d index=0x%"PRIx64" (i=%d)"
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v6 03/11] target/arm/cpu: Allow registers to be hidden
2026-01-26 16:52 [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
2026-01-26 16:53 ` [PATCH v6 01/11] hw/arm/virt: Rename arm_virt_compat into arm_virt_compat_defaults Eric Auger
2026-01-26 16:53 ` [PATCH v6 02/11] target/arm/machine: Improve traces on register mismatch during migration Eric Auger
@ 2026-01-26 16:53 ` Eric Auger
2026-01-27 17:09 ` Cornelia Huck
2026-02-04 12:46 ` Eric Auger
2026-01-26 16:53 ` [PATCH v6 04/11] target/arm/machine: Allow extra regs in the incoming stream Eric Auger
` (10 subsequent siblings)
13 siblings, 2 replies; 32+ messages in thread
From: Eric Auger @ 2026-01-26 16:53 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
cohuck, maz, oliver.upton, sebott, gshan, ddutile, peterx, philmd,
pbonzini
More recent kernels sometimes expose new registers in an
unconditional manner. This situation breaks backward migration
as qemu notices there are more registers in the input stream
than supported on the destination host. This leads to a
"failed to load cpu:cpreg_vmstate_array_len" error.
A good example is the introduction of KVM_REG_ARM_VENDOR_HYP_BMAP_2
pseudo FW register in v6.16 by commit C0000e58c74e (“KVM: arm64:
Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2”). Trying to do backward
migration from a host kernel that features the commit to a destination
host that doesn't fails with the above error.
Currently QEMU is not using that feature so ignoring it
is not a problem. An easy way to fix the migration issue is to teach
qemu we don't care about that register and we can simply ignore it
when syncing its state during migration.
This patch introduces an array of such hidden registers. Soon it will
be settable through an array property.
If hidden, the register is moved out of the array of cpreg which is
built in kvm_arm_init_cpreg_list(). That way their state won't be
synced.
To extend that functionality to TCG, do the same in add_cpreg_to_list()
and count_cpreg().
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
---
v2 -> v3:
- use kvm_regidx
v1 -> v2:
- Move the property in a separate patch
- improve the commit msg
- change the trace point to just print info in
kvm_arm_init_cpreg_list()
- improve comment in cpu.h (Connie)
target/arm/helper: Skip hidden registers
In case a cpreg is hidden, skip it when initializing the cpreg
list.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
target/arm/cpu.h | 20 ++++++++++++++++++++
target/arm/helper.c | 12 +++++++++++-
target/arm/kvm.c | 12 +++++++++++-
target/arm/trace-events | 2 ++
4 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index e900ef7937b..e87f222e7be 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1045,6 +1045,15 @@ struct ArchCPU {
/* KVM steal time */
OnOffAuto kvm_steal_time;
+ /*
+ * Array of register indexes that need to be hidden to allow migration
+ * in certain cases, i.e. when a register is exposed in KVM or defined
+ * in TCG but not actually used in QEMU. Indexes are described in Linux
+ * Documentation/virt/kvm/api.rst for both KVM and TCG.
+ */
+ uint64_t *hidden_regs;
+ uint32_t nr_hidden_regs;
+
/* Uniprocessor system with MP extensions */
bool mp_is_up;
@@ -1185,6 +1194,17 @@ struct ARMCPUClass {
ResettablePhases parent_phases;
};
+static inline bool
+arm_cpu_hidden_reg(ARMCPU *cpu, uint64_t regidx)
+{
+ for (int i = 0; i < cpu->nr_hidden_regs; i++) {
+ if (cpu->hidden_regs[i] == regidx) {
+ return true;
+ }
+ }
+ return false;
+}
+
/* Callback functions for the generic timer's timers. */
void arm_gt_ptimer_cb(void *opaque);
void arm_gt_vtimer_cb(void *opaque);
diff --git a/target/arm/helper.c b/target/arm/helper.c
index dce648b4824..8217517150b 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -235,9 +235,13 @@ static void add_cpreg_to_list(gpointer key, gpointer value, gpointer opaque)
ARMCPU *cpu = opaque;
uint32_t regidx = (uintptr_t)key;
const ARMCPRegInfo *ri = value;
+ uint64_t kvm_regidx = cpreg_to_kvm_id(regidx);
+ if (arm_cpu_hidden_reg(cpu, kvm_regidx)) {
+ return;
+ }
if (!(ri->type & (ARM_CP_NO_RAW | ARM_CP_ALIAS))) {
- cpu->cpreg_indexes[cpu->cpreg_array_len] = cpreg_to_kvm_id(regidx);
+ cpu->cpreg_indexes[cpu->cpreg_array_len] = kvm_regidx;
/* The value array need not be initialized at this point */
cpu->cpreg_array_len++;
}
@@ -247,6 +251,12 @@ static void count_cpreg(gpointer key, gpointer value, gpointer opaque)
{
ARMCPU *cpu = opaque;
const ARMCPRegInfo *ri = value;
+ uint32_t regidx = (uintptr_t)key;
+ uint64_t kvm_regidx = cpreg_to_kvm_id(regidx);
+
+ if (arm_cpu_hidden_reg(cpu, kvm_regidx)) {
+ return;
+ }
if (!(ri->type & (ARM_CP_NO_RAW | ARM_CP_ALIAS))) {
cpu->cpreg_array_len++;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index c6f0d0fc4e1..7e0a3680748 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -789,7 +789,10 @@ static int kvm_arm_init_cpreg_list(ARMCPU *cpu)
qsort(&rlp->reg, rlp->n, sizeof(rlp->reg[0]), compare_u64);
for (i = 0, arraylen = 0; i < rlp->n; i++) {
- if (!kvm_arm_reg_syncs_via_cpreg_list(rlp->reg[i])) {
+ uint64_t regidx = rlp->reg[i];
+
+ if (!kvm_arm_reg_syncs_via_cpreg_list(regidx) ||
+ arm_cpu_hidden_reg(cpu, regidx)) {
continue;
}
switch (rlp->reg[i] & KVM_REG_SIZE_MASK) {
@@ -805,6 +808,8 @@ static int kvm_arm_init_cpreg_list(ARMCPU *cpu)
arraylen++;
}
+ trace_kvm_arm_init_cpreg_list_arraylen(arraylen);
+
cpu->cpreg_indexes = g_renew(uint64_t, cpu->cpreg_indexes, arraylen);
cpu->cpreg_values = g_renew(uint64_t, cpu->cpreg_values, arraylen);
cpu->cpreg_vmstate_indexes = g_renew(uint64_t, cpu->cpreg_vmstate_indexes,
@@ -816,9 +821,14 @@ static int kvm_arm_init_cpreg_list(ARMCPU *cpu)
for (i = 0, arraylen = 0; i < rlp->n; i++) {
uint64_t regidx = rlp->reg[i];
+
if (!kvm_arm_reg_syncs_via_cpreg_list(regidx)) {
continue;
}
+ if (arm_cpu_hidden_reg(cpu, regidx)) {
+ trace_kvm_arm_init_cpreg_list_skip_hidden_reg(rlp->reg[i]);
+ continue;
+ }
cpu->cpreg_indexes[arraylen] = regidx;
arraylen++;
}
diff --git a/target/arm/trace-events b/target/arm/trace-events
index 0a5ed3e69d5..20f4b4f2cd0 100644
--- a/target/arm/trace-events
+++ b/target/arm/trace-events
@@ -14,6 +14,8 @@ arm_gt_update_irq(int timer, int irqstate) "gt_update_irq: timer %d irqstate %d"
# kvm.c
kvm_arm_fixup_msi_route(uint64_t iova, uint64_t gpa) "MSI iova = 0x%"PRIx64" is translated into 0x%"PRIx64
kvm_arm_cpu_post_load_missing_reg(char *name) "Missing register in input stream: %s"
+kvm_arm_init_cpreg_list_arraylen(uint32_t arraylen) "arraylen=%d"
+kvm_arm_init_cpreg_list_skip_hidden_reg(uint64_t regidx) "hidden 0x%"PRIx64" is skipped"
# cpu.c
arm_cpu_reset(uint64_t mp_aff) "cpu %" PRIu64
--
2.52.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v6 03/11] target/arm/cpu: Allow registers to be hidden
2026-01-26 16:53 ` [PATCH v6 03/11] target/arm/cpu: Allow registers to be hidden Eric Auger
@ 2026-01-27 17:09 ` Cornelia Huck
2026-01-29 10:17 ` Eric Auger
2026-02-04 12:46 ` Eric Auger
1 sibling, 1 reply; 32+ messages in thread
From: Cornelia Huck @ 2026-01-27 17:09 UTC (permalink / raw)
To: Eric Auger, eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
peter.maydell, maz, oliver.upton, sebott, gshan, ddutile, peterx,
philmd, pbonzini
On Mon, Jan 26 2026, Eric Auger <eric.auger@redhat.com> wrote:
> More recent kernels sometimes expose new registers in an
> unconditional manner. This situation breaks backward migration
> as qemu notices there are more registers in the input stream
> than supported on the destination host. This leads to a
> "failed to load cpu:cpreg_vmstate_array_len" error.
>
> A good example is the introduction of KVM_REG_ARM_VENDOR_HYP_BMAP_2
> pseudo FW register in v6.16 by commit C0000e58c74e (“KVM: arm64:
> Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2”). Trying to do backward
> migration from a host kernel that features the commit to a destination
> host that doesn't fails with the above error.
>
> Currently QEMU is not using that feature so ignoring it
> is not a problem. An easy way to fix the migration issue is to teach
> qemu we don't care about that register and we can simply ignore it
> when syncing its state during migration.
>
> This patch introduces an array of such hidden registers. Soon it will
> be settable through an array property.
>
> If hidden, the register is moved out of the array of cpreg which is
> built in kvm_arm_init_cpreg_list(). That way their state won't be
> synced.
>
> To extend that functionality to TCG, do the same in add_cpreg_to_list()
> and count_cpreg().
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: Sebastian Ott <sebott@redhat.com>
>
> ---
> v2 -> v3:
> - use kvm_regidx
>
> v1 -> v2:
> - Move the property in a separate patch
> - improve the commit msg
> - change the trace point to just print info in
> kvm_arm_init_cpreg_list()
> - improve comment in cpu.h (Connie)
>
> target/arm/helper: Skip hidden registers
>
> In case a cpreg is hidden, skip it when initializing the cpreg
> list.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
This looks a bit weird, probably due to patch folding... but I'm happy
to have my R-b apply to the whole patch :)
> ---
> target/arm/cpu.h | 20 ++++++++++++++++++++
> target/arm/helper.c | 12 +++++++++++-
> target/arm/kvm.c | 12 +++++++++++-
> target/arm/trace-events | 2 ++
> 4 files changed, 44 insertions(+), 2 deletions(-)
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 03/11] target/arm/cpu: Allow registers to be hidden
2026-01-27 17:09 ` Cornelia Huck
@ 2026-01-29 10:17 ` Eric Auger
0 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-01-29 10:17 UTC (permalink / raw)
To: Cornelia Huck, eric.auger.pro, qemu-devel, qemu-arm,
peter.maydell, maz, oliver.upton, sebott, gshan, ddutile, peterx,
philmd, pbonzini
On 1/27/26 6:09 PM, Cornelia Huck wrote:
> On Mon, Jan 26 2026, Eric Auger <eric.auger@redhat.com> wrote:
>
>> More recent kernels sometimes expose new registers in an
>> unconditional manner. This situation breaks backward migration
>> as qemu notices there are more registers in the input stream
>> than supported on the destination host. This leads to a
>> "failed to load cpu:cpreg_vmstate_array_len" error.
>>
>> A good example is the introduction of KVM_REG_ARM_VENDOR_HYP_BMAP_2
>> pseudo FW register in v6.16 by commit C0000e58c74e (“KVM: arm64:
>> Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2”). Trying to do backward
>> migration from a host kernel that features the commit to a destination
>> host that doesn't fails with the above error.
>>
>> Currently QEMU is not using that feature so ignoring it
>> is not a problem. An easy way to fix the migration issue is to teach
>> qemu we don't care about that register and we can simply ignore it
>> when syncing its state during migration.
>>
>> This patch introduces an array of such hidden registers. Soon it will
>> be settable through an array property.
>>
>> If hidden, the register is moved out of the array of cpreg which is
>> built in kvm_arm_init_cpreg_list(). That way their state won't be
>> synced.
>>
>> To extend that functionality to TCG, do the same in add_cpreg_to_list()
>> and count_cpreg().
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Reviewed-by: Sebastian Ott <sebott@redhat.com>
>>
>> ---
>> v2 -> v3:
>> - use kvm_regidx
>>
>> v1 -> v2:
>> - Move the property in a separate patch
>> - improve the commit msg
>> - change the trace point to just print info in
>> kvm_arm_init_cpreg_list()
>> - improve comment in cpu.h (Connie)
>>
>> target/arm/helper: Skip hidden registers
>>
>> In case a cpreg is hidden, skip it when initializing the cpreg
>> list.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> This looks a bit weird, probably due to patch folding... but I'm happy
> to have my R-b apply to the whole patch :)
Oh yes, I missed that. Nevertheless it is below the ---, so I guess it is
not much of an issue.
Eric
>
>> ---
>> target/arm/cpu.h | 20 ++++++++++++++++++++
>> target/arm/helper.c | 12 +++++++++++-
>> target/arm/kvm.c | 12 +++++++++++-
>> target/arm/trace-events | 2 ++
>> 4 files changed, 44 insertions(+), 2 deletions(-)
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 03/11] target/arm/cpu: Allow registers to be hidden
2026-01-26 16:53 ` [PATCH v6 03/11] target/arm/cpu: Allow registers to be hidden Eric Auger
2026-01-27 17:09 ` Cornelia Huck
@ 2026-02-04 12:46 ` Eric Auger
1 sibling, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-02-04 12:46 UTC (permalink / raw)
To: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell, cohuck, maz,
oliver.upton, sebott, gshan, ddutile, peterx, philmd, pbonzini,
Alex Bennée
Hi Alex,
On 1/26/26 5:53 PM, Eric Auger wrote:
> More recent kernels sometimes expose new registers in an
> unconditionnal manner. This situation breaks backward migration
> as qemu notices there are more registers in the input stream
> than supported on the destination host. This leads to a
> "failed to load cpu:cpreg_vmstate_array_len" error.
>
> A good example is the introduction of KVM_REG_ARM_VENDOR_HYP_BMAP_2
> pseudo FW register in v6.16 by commit C0000e58c74e (“KVM: arm64:
> Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2”). Trying to do backward
> migration from a host kernel that features the commit to a destination
> host that doesn't, fail with above error.
Please find below a follow-up to the question you raised about hidden
registers during yesterday's QEMU biweekly call.
The KVM_REG_ARM_VENDOR_HYP_BMAP_2 pseudo FW register was introduced by
[PATCH v8 3/6] KVM: arm64: Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2
https://lore.kernel.org/all/20250221140229.12588-4-shameerali.kolothum.thodi@huawei.com/
Its purpose is to let userspace retrieve and negotiate the available
Vendor Hyp services, so it allows QEMU to opt in to specific services
(the first one being the MIDR-based errata management). The guest only
reads this register and does not change it, so it sounds safe to skip
saving/restoring its state as long as QEMU does not touch the register.
QEMU does not get/set this register yet. Support may come later; Shameer
posted an RFC:
[RFC PATCH RESEND 3/4] target/arm/kvm: Handle KVM Target Imp CPU
hypercalls
https://lore.kernel.org/all/3bf24eb7-b1cd-793c-158f-7b6e2aeab026@redhat.com/
There you can see how the register is accessed. Extending your question
to a hidden register that is likely to be written by the guest, and whose
state would not be saved/restored because it is hidden, we could add a
check in QEMU on the migration source that compares the register value
against its default init value and fails the migration if they differ.
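A minimal sketch of the kind of source-side check suggested above. This is
not code from the series: the HiddenRegState type and the
hidden_regs_unchanged() helper are hypothetical, and how the reset and
current values would be obtained in QEMU is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one hidden register on the migration source. */
typedef struct {
    uint64_t regidx;      /* KVM register index */
    uint64_t reset_value; /* value recorded right after vCPU init */
    uint64_t current;     /* value sampled just before migration */
} HiddenRegState;

/*
 * Return true if every hidden register still holds its reset value, i.e.
 * dropping it from the stream loses no state. On failure, *bad (if not
 * NULL) receives the index of the first register that changed.
 */
static bool hidden_regs_unchanged(const HiddenRegState *regs, size_t n,
                                  uint64_t *bad)
{
    assert(regs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (regs[i].current != regs[i].reset_value) {
            if (bad) {
                *bad = regs[i].regidx;
            }
            return false;
        }
    }
    return true;
}
```

A caller would fail the outgoing migration when the helper returns false
and report the offending index.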
Thanks Eric
> Currently QEMU is not using that feature so ignoring this latter
> is not a problem. An easy way to fix the migration issue is to teach
> qemu we don't care about that register and we can simply ignore it
> when syncing its state during migration.
>
> This patch introduces an array of such hidden registers. Soon it will
> be settable through an array property.
>
> If hidden, the register is moved out of the array of cpreg which is
> built in kvm_arm_init_cpreg_list(). That way their state won't be
> synced.
>
> To extend that functionality to TCG, do the same in add_cpreg_to_list()
> and count_cpreg().
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: Sebastian Ott <sebott@redhat.com>
>
> ---
> v2 -> v3:
> - use kvm_regidx
>
> v1 -> v2:
> - Move the property in a separate patch
> - improve the commit msg
> - change the trace point to just print info in
> kvm_arm_init_cpreg_list()
> - improve comment in cpu.h (Connie)
>
> target/arm/helper: Skip hidden registers
>
> In case a cpreg is hidden, skip it when initialing the cpreg
> list.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> ---
> target/arm/cpu.h | 20 ++++++++++++++++++++
> target/arm/helper.c | 12 +++++++++++-
> target/arm/kvm.c | 12 +++++++++++-
> target/arm/trace-events | 2 ++
> 4 files changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index e900ef7937b..e87f222e7be 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1045,6 +1045,15 @@ struct ArchCPU {
> /* KVM steal time */
> OnOffAuto kvm_steal_time;
>
> + /*
> + * Array of register indexes that need to be hidden to allow migration
> + * in certain cases, i.e. when a register is exposed in KVM or defined
> + * in TCG but not actually used in QEMU. Indexes are described in Linux
> + * Documentation/virt/kvm/api.rst for both KVM and TCG.
> + */
> + uint64_t *hidden_regs;
> + uint32_t nr_hidden_regs;
> +
> /* Uniprocessor system with MP extensions */
> bool mp_is_up;
>
> @@ -1185,6 +1194,17 @@ struct ARMCPUClass {
> ResettablePhases parent_phases;
> };
>
> +static inline bool
> +arm_cpu_hidden_reg(ARMCPU *cpu, uint64_t regidx)
> +{
> + for (int i = 0; i < cpu->nr_hidden_regs; i++) {
> + if (cpu->hidden_regs[i] == regidx) {
> + return true;
> + }
> + }
> + return false;
> +}
> +
> /* Callback functions for the generic timer's timers. */
> void arm_gt_ptimer_cb(void *opaque);
> void arm_gt_vtimer_cb(void *opaque);
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index dce648b4824..8217517150b 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -235,9 +235,13 @@ static void add_cpreg_to_list(gpointer key, gpointer value, gpointer opaque)
> ARMCPU *cpu = opaque;
> uint32_t regidx = (uintptr_t)key;
> const ARMCPRegInfo *ri = value;
> + uint64_t kvm_regidx = cpreg_to_kvm_id(regidx);
>
> + if (arm_cpu_hidden_reg(cpu, kvm_regidx)) {
> + return;
> + }
> if (!(ri->type & (ARM_CP_NO_RAW | ARM_CP_ALIAS))) {
> - cpu->cpreg_indexes[cpu->cpreg_array_len] = cpreg_to_kvm_id(regidx);
> + cpu->cpreg_indexes[cpu->cpreg_array_len] = kvm_regidx;
> /* The value array need not be initialized at this point */
> cpu->cpreg_array_len++;
> }
> @@ -247,6 +251,12 @@ static void count_cpreg(gpointer key, gpointer value, gpointer opaque)
> {
> ARMCPU *cpu = opaque;
> const ARMCPRegInfo *ri = value;
> + uint32_t regidx = (uintptr_t)key;
> + uint64_t kvm_regidx = cpreg_to_kvm_id(regidx);
> +
> + if (arm_cpu_hidden_reg(cpu, kvm_regidx)) {
> + return;
> + }
>
> if (!(ri->type & (ARM_CP_NO_RAW | ARM_CP_ALIAS))) {
> cpu->cpreg_array_len++;
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index c6f0d0fc4e1..7e0a3680748 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -789,7 +789,10 @@ static int kvm_arm_init_cpreg_list(ARMCPU *cpu)
> qsort(&rlp->reg, rlp->n, sizeof(rlp->reg[0]), compare_u64);
>
> for (i = 0, arraylen = 0; i < rlp->n; i++) {
> - if (!kvm_arm_reg_syncs_via_cpreg_list(rlp->reg[i])) {
> + uint64_t regidx = rlp->reg[i];
> +
> + if (!kvm_arm_reg_syncs_via_cpreg_list(regidx) ||
> + arm_cpu_hidden_reg(cpu, regidx)) {
> continue;
> }
> switch (rlp->reg[i] & KVM_REG_SIZE_MASK) {
> @@ -805,6 +808,8 @@ static int kvm_arm_init_cpreg_list(ARMCPU *cpu)
> arraylen++;
> }
>
> + trace_kvm_arm_init_cpreg_list_arraylen(arraylen);
> +
> cpu->cpreg_indexes = g_renew(uint64_t, cpu->cpreg_indexes, arraylen);
> cpu->cpreg_values = g_renew(uint64_t, cpu->cpreg_values, arraylen);
> cpu->cpreg_vmstate_indexes = g_renew(uint64_t, cpu->cpreg_vmstate_indexes,
> @@ -816,9 +821,14 @@ static int kvm_arm_init_cpreg_list(ARMCPU *cpu)
>
> for (i = 0, arraylen = 0; i < rlp->n; i++) {
> uint64_t regidx = rlp->reg[i];
> +
> if (!kvm_arm_reg_syncs_via_cpreg_list(regidx)) {
> continue;
> }
> + if (arm_cpu_hidden_reg(cpu, regidx)) {
> + trace_kvm_arm_init_cpreg_list_skip_hidden_reg(rlp->reg[i]);
> + continue;
> + }
> cpu->cpreg_indexes[arraylen] = regidx;
> arraylen++;
> }
> diff --git a/target/arm/trace-events b/target/arm/trace-events
> index 0a5ed3e69d5..20f4b4f2cd0 100644
> --- a/target/arm/trace-events
> +++ b/target/arm/trace-events
> @@ -14,6 +14,8 @@ arm_gt_update_irq(int timer, int irqstate) "gt_update_irq: timer %d irqstate %d"
> # kvm.c
> kvm_arm_fixup_msi_route(uint64_t iova, uint64_t gpa) "MSI iova = 0x%"PRIx64" is translated into 0x%"PRIx64
> kvm_arm_cpu_post_load_missing_reg(char *name) "Missing register in input stream: %s"
> +kvm_arm_init_cpreg_list_arraylen(uint32_t arraylen) "arraylen=%d"
> +kvm_arm_init_cpreg_list_skip_hidden_reg(uint64_t regidx) "hidden 0x%"PRIx64" is skipped"
>
> # cpu.c
> arm_cpu_reset(uint64_t mp_aff) "cpu %" PRIu64
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v6 04/11] target/arm/machine: Allow extra regs in the incoming stream
2026-01-26 16:52 [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
` (2 preceding siblings ...)
2026-01-26 16:53 ` [PATCH v6 03/11] target/arm/cpu: Allow registers to be hidden Eric Auger
@ 2026-01-26 16:53 ` Eric Auger
2026-01-26 16:53 ` [PATCH v6 05/11] kvm-all: Enforce hidden regs are never accessed Eric Auger
` (9 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-01-26 16:53 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
cohuck, maz, oliver.upton, sebott, gshan, ddutile, peterx, philmd,
pbonzini
Newer kernels may revoke exposure of KVM regs to userspace. This can
happen when one notices that some registers were unconditionally
exposed whereas they should be conditionally exposed, for example.
An example of such a situation is: TCR2_EL1, PIRE0_EL1, PIR_EL1.
Associated kernel commits were:
0fcb4eea5345 KVM: arm64: Hide TCR2_EL1 from userspace when disabled for guests
a68cddbe47ef KVM: arm64: Hide S1PIE registers from userspace when disabled for guests
Those commits were actual fixes but the downside is that they break forward
migration on some HW. Indeed, when migrating from an old kernel that
does not feature those commits to a more recent one, the destination
qemu detects there are more KVM regs in the input migration stream than
exposed by the destination host and the migration fails with:
"failed to load cpu:cpreg_vmstate_array_len"
This patch adds the capability to define an array of register indexes
that may exist in the incoming migration stream but may not be
exposed by KVM on the destination.
We provision extra space in the cpreg_vmstate_* arrays during the preload
phase to allow the state to be saved without overflow, in case the
registers are only in the inbound data.
On postload we make sure to ignore them when analyzing potential
mismatches between registers. The actual cpreg array is never altered,
meaning those registers are never accessed nor saved.
The array will be populated with a dedicated array property.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
v2 -> v3:
- add a missing_as_expected trace point
v1 -> v2:
- get rid of the enforced/fake terminology
- remove the useless array of fake regs. Only the number of missing
regs is needed
RFC -> v1:
- improve comment in target/arm/cpu.h (Connie)
---
target/arm/cpu.h | 22 ++++++++++++++++++++++
target/arm/machine.c | 30 +++++++++++++++++++++---------
target/arm/trace-events | 1 +
3 files changed, 44 insertions(+), 9 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index e87f222e7be..7608fa88fbf 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1054,6 +1054,15 @@ struct ArchCPU {
uint64_t *hidden_regs;
uint32_t nr_hidden_regs;
+ /*
+ * Registers that are likely to be part of the migration
+ * incoming stream but not exposed on destination. If
+ * their indexes are stored in this array, it is OK to
+ * ignore those registers in the inbound data.
+ */
+ uint64_t *mig_safe_missing_regs;
+ uint32_t nr_mig_safe_missing_regs;
+
/* Uniprocessor system with MP extensions */
bool mp_is_up;
@@ -1205,6 +1214,19 @@ arm_cpu_hidden_reg(ARMCPU *cpu, uint64_t regidx)
return false;
}
+
+static inline bool
+arm_cpu_safe_missing_reg(ARMCPU *cpu, uint64_t regidx)
+{
+ for (int i = 0; i < cpu->nr_mig_safe_missing_regs; i++) {
+ if (regidx == cpu->mig_safe_missing_regs[i]) {
+ return true;
+ }
+ }
+ return false;
+}
+
+
/* Callback functions for the generic timer's timers. */
void arm_gt_ptimer_cb(void *opaque);
void arm_gt_vtimer_cb(void *opaque);
diff --git a/target/arm/machine.c b/target/arm/machine.c
index f06a920aba1..98851afc615 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -991,7 +991,8 @@ static int cpu_pre_load(void *opaque)
{
ARMCPU *cpu = opaque;
CPUARMState *env = &cpu->env;
- int arraylen = cpu->cpreg_vmstate_array_len + MAX_CPREG_VMSTATE_ANOMALIES;
+ int arraylen = cpu->cpreg_vmstate_array_len +
+ cpu->nr_mig_safe_missing_regs + MAX_CPREG_VMSTATE_ANOMALIES;
cpu->cpreg_vmstate_indexes = g_renew(uint64_t, cpu->cpreg_vmstate_indexes,
arraylen);
@@ -1058,6 +1059,10 @@ static int cpu_post_load(void *opaque, int version_id)
* entries with the right slots in our own values array.
*/
+ /*
+ * at this point cpu->cpreg_vmstate_array_len was migrated with the
+ * actual length saved on source
+ */
trace_cpu_post_load_len(cpu->cpreg_array_len, cpu->cpreg_vmstate_array_len);
for (; i < cpu->cpreg_array_len && v < cpu->cpreg_vmstate_array_len;) {
trace_cpu_post_load(i, v , cpu->cpreg_indexes[i]);
@@ -1072,10 +1077,15 @@ static int cpu_post_load(void *opaque, int version_id)
}
if (cpu->cpreg_vmstate_indexes[v] < cpu->cpreg_indexes[i]) {
/* register in their list but not ours: those will fail migration */
- trace_cpu_post_load_unexpected(v, cpu->cpreg_vmstate_indexes[v], i);
- if (k < MAX_CPREG_VMSTATE_ANOMALIES) {
- cpu->cpreg_vmstate_unexpected_indexes[k++] =
- cpu->cpreg_vmstate_indexes[v];
+ if (!arm_cpu_safe_missing_reg(cpu, cpu->cpreg_vmstate_indexes[v])) {
+ trace_cpu_post_load_unexpected(v, cpu->cpreg_vmstate_indexes[v], i);
+ if (k < MAX_CPREG_VMSTATE_ANOMALIES) {
+ cpu->cpreg_vmstate_unexpected_indexes[k++] =
+ cpu->cpreg_vmstate_indexes[v];
+ }
+ } else {
+ trace_cpu_post_load_missing_as_expected(v, cpu->cpreg_vmstate_indexes[v],
+ i);
}
v++;
continue;
@@ -1101,10 +1111,12 @@ static int cpu_post_load(void *opaque, int version_id)
* still regs in the input stream, continue parsing the vmstate array
*/
for ( ; v < cpu->cpreg_vmstate_array_len; v++) {
- if (k < MAX_CPREG_VMSTATE_ANOMALIES) {
- trace_cpu_post_load_unexpected(v, cpu->cpreg_vmstate_indexes[v], i);
- cpu->cpreg_vmstate_unexpected_indexes[k++] =
- cpu->cpreg_vmstate_indexes[v];
+ if (!arm_cpu_safe_missing_reg(cpu, cpu->cpreg_vmstate_indexes[v])) {
+ if (k < MAX_CPREG_VMSTATE_ANOMALIES) {
+ trace_cpu_post_load_unexpected(v, cpu->cpreg_vmstate_indexes[v], i);
+ cpu->cpreg_vmstate_unexpected_indexes[k++] =
+ cpu->cpreg_vmstate_indexes[v];
+ }
}
}
diff --git a/target/arm/trace-events b/target/arm/trace-events
index 20f4b4f2cd0..2e22012c692 100644
--- a/target/arm/trace-events
+++ b/target/arm/trace-events
@@ -35,3 +35,4 @@ cpu_post_load_len(int cpreg_array_len, int cpreg_vmstate_array_len) "cpreg_array
cpu_post_load(int i, int v, uint64_t regidx) "i=%d v=%d regidx=0x%"PRIx64
cpu_post_load_missing(int i, uint64_t regidx, int v) "missing register in input stream: i=%d index=0x%"PRIx64" (v=%d)"
cpu_post_load_unexpected(int v, uint64_t regidx, int i) "unexpected register in input stream: v=%d index=0x%"PRIx64" (i=%d)"
+cpu_post_load_missing_as_expected(int v, uint64_t regidx, int i) "register missing as expected in input stream: v=%d index=0x%"PRIx64" (i=%d)"
--
2.52.0
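
The post-load handling described in the commit message above can be
illustrated with a small standalone sketch. It is not the QEMU code: the
names demo_in_list() and demo_count_unexpected() are made up for the
example, and it only models the "register in the stream but not on the
destination" case, assuming both index lists are sorted in ascending order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Linear lookup, mirroring the style of arm_cpu_safe_missing_reg(). */
static bool demo_in_list(const uint64_t *list, size_t n, uint64_t regidx)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i] == regidx) {
            return true;
        }
    }
    return false;
}

/*
 * Walk two ascending lists of register indexes: 'ours' (destination) and
 * 'theirs' (incoming stream). Count the registers that appear only in
 * 'theirs' and are not declared safe to miss. Zero means no unexpected
 * register was found in the stream.
 */
static size_t demo_count_unexpected(const uint64_t *ours, size_t n_ours,
                                    const uint64_t *theirs, size_t n_theirs,
                                    const uint64_t *safe, size_t n_safe)
{
    size_t i = 0, v = 0, unexpected = 0;

    assert(ours != NULL || n_ours == 0);
    assert(theirs != NULL || n_theirs == 0);

    while (v < n_theirs) {
        if (i < n_ours && ours[i] < theirs[v]) {
            i++;                /* only on destination: not counted here */
        } else if (i < n_ours && ours[i] == theirs[v]) {
            i++;                /* present on both sides */
            v++;
        } else {
            /* only in the incoming stream */
            if (!demo_in_list(safe, n_safe, theirs[v])) {
                unexpected++;
            }
            v++;
        }
    }
    return unexpected;
}
```

With an empty safe list every extra incoming register counts as unexpected;
listing an index in the safe list makes the walk skip over it silently.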
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v6 05/11] kvm-all: Enforce hidden regs are never accessed
2026-01-26 16:52 [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
` (3 preceding siblings ...)
2026-01-26 16:53 ` [PATCH v6 04/11] target/arm/machine: Allow extra regs in the incoming stream Eric Auger
@ 2026-01-26 16:53 ` Eric Auger
2026-01-26 16:53 ` [PATCH v6 06/11] target/arm/cpu: Implement hide_reg callback() Eric Auger
` (8 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-01-26 16:53 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
cohuck, maz, oliver.upton, sebott, gshan, ddutile, peterx, philmd,
pbonzini
On ARM we want to be able to hide some registers which are exposed
by KVM. To mitigate some migration failures that occur when a new
register is exposed and does not exist on the destination, some
registers are tagged "hidden" and their state won't be saved. As the
state is not saved and they are expected not to be used, we want to
enforce they aren't. So let's check this. The new CPUClass hide_reg()
callback is optional and will be implemented on ARM in a subsequent
patch.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
---
v3 -> v4:
- don't use blacklist terminology (Connie & Sebastian) and reword
the commit title to something clearer
---
include/hw/core/cpu.h | 2 ++
accel/kvm/kvm-all.c | 12 ++++++++++++
2 files changed, 14 insertions(+)
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 61da2ea4331..6d714492714 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -142,6 +142,7 @@ struct SysemuCPUOps;
* the caller will not g_free() it.
* @disas_set_info: Setup architecture specific components of disassembly info
* @adjust_watchpoint_address: Perform a target-specific adjustment to an
* address before attempting to match it against watchpoints.
+ * @hide_reg: Check if a register must be hidden (optional)
* @deprecation_note: If this CPUClass is deprecated, this field provides
* related information.
@@ -170,6 +171,7 @@ struct CPUClass {
int (*gdb_read_register)(CPUState *cpu, GByteArray *buf, int reg);
int (*gdb_write_register)(CPUState *cpu, uint8_t *buf, int reg);
vaddr (*gdb_adjust_breakpoint)(CPUState *cpu, vaddr addr);
+ bool (*hide_reg)(CPUState *cpu, uint64_t regidx);
const char *gdb_core_xml_file;
const char * (*gdb_arch_name)(CPUState *cpu);
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 8301a512e7f..ec733896e0d 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3784,9 +3784,15 @@ bool kvm_device_supported(int vmfd, uint64_t type)
int kvm_set_one_reg(CPUState *cs, uint64_t id, void *source)
{
+ CPUClass *cc = CPU_GET_CLASS(cs);
struct kvm_one_reg reg;
int r;
+ if (cc->hide_reg && cc->hide_reg(cs, id)) {
+ error_report("%s reg 0x%"PRIx64" is hidden and shall never be accessed",
+ __func__, id);
+ g_assert_not_reached();
+ }
reg.id = id;
reg.addr = (uintptr_t) source;
r = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
@@ -3798,9 +3804,15 @@ int kvm_set_one_reg(CPUState *cs, uint64_t id, void *source)
int kvm_get_one_reg(CPUState *cs, uint64_t id, void *target)
{
+ CPUClass *cc = CPU_GET_CLASS(cs);
struct kvm_one_reg reg;
int r;
+ if (cc->hide_reg && cc->hide_reg(cs, id)) {
+ error_report("%s reg 0x%"PRIx64" is hidden and shall never be accessed",
+ __func__, id);
+ g_assert_not_reached();
+ }
reg.id = id;
reg.addr = (uintptr_t) target;
r = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
--
2.52.0
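
The optional-callback guard introduced by this patch can be sketched
outside of QEMU as follows. DemoCPU, DemoCPUClass and the demo_* helpers
are hypothetical stand-ins for CPUState, CPUClass and the real code; the
patch itself asserts on a hidden access, whereas this sketch only returns
a boolean so that it can be exercised safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CPUState, reduced to the hidden list. */
typedef struct DemoCPU {
    const uint64_t *hidden;
    size_t nr_hidden;
} DemoCPU;

/* Hypothetical stand-in for CPUClass with an optional callback. */
typedef struct DemoCPUClass {
    bool (*hide_reg)(DemoCPU *cpu, uint64_t regidx); /* may be NULL */
} DemoCPUClass;

/* Example callback: a register is hidden if it is in the CPU's list. */
static bool demo_hide_reg(DemoCPU *cpu, uint64_t regidx)
{
    for (size_t i = 0; i < cpu->nr_hidden; i++) {
        if (cpu->hidden[i] == regidx) {
            return true;
        }
    }
    return false;
}

/*
 * Access is allowed when the class has no callback at all, or when the
 * callback does not report the register as hidden.
 */
static bool demo_reg_access_allowed(const DemoCPUClass *cc, DemoCPU *cpu,
                                    uint64_t id)
{
    assert(cc != NULL);
    return !(cc->hide_reg && cc->hide_reg(cpu, id));
}
```

Checking the function pointer first keeps the behaviour unchanged for
architectures that never install the callback.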
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v6 06/11] target/arm/cpu: Implement hide_reg callback()
2026-01-26 16:52 [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
` (4 preceding siblings ...)
2026-01-26 16:53 ` [PATCH v6 05/11] kvm-all: Enforce hidden regs are never accessed Eric Auger
@ 2026-01-26 16:53 ` Eric Auger
2026-01-26 16:53 ` [PATCH v6 07/11] target/arm/cpu: Expose x-mig-hidden-regs and x-mig-safe-missing-regs properties Eric Auger
` (7 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-01-26 16:53 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
cohuck, maz, oliver.upton, sebott, gshan, ddutile, peterx, philmd,
pbonzini
Check if the register is hidden.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
target/arm/cpu.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 6e1cbf3d614..76d013e6c4b 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2360,6 +2360,11 @@ static const TCGCPUOps arm_tcg_ops = {
};
#endif /* CONFIG_TCG */
+static inline bool arm_cpu_hide_reg(CPUState *s, uint64_t regidx)
+{
+ return arm_cpu_hidden_reg(ARM_CPU(s), regidx);
+}
+
static void arm_cpu_class_init(ObjectClass *oc, const void *data)
{
ARMCPUClass *acc = ARM_CPU_CLASS(oc);
@@ -2389,6 +2394,7 @@ static void arm_cpu_class_init(ObjectClass *oc, const void *data)
cc->gdb_get_core_xml_file = arm_gdb_get_core_xml_file;
cc->gdb_stop_before_watchpoint = true;
cc->disas_set_info = arm_disas_set_info;
+ cc->hide_reg = arm_cpu_hide_reg;
#ifdef CONFIG_TCG
cc->tcg_ops = &arm_tcg_ops;
--
2.52.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v6 07/11] target/arm/cpu: Expose x-mig-hidden-regs and x-mig-safe-missing-regs properties
2026-01-26 16:52 [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
` (5 preceding siblings ...)
2026-01-26 16:53 ` [PATCH v6 06/11] target/arm/cpu: Implement hide_reg callback() Eric Auger
@ 2026-01-26 16:53 ` Eric Auger
2026-01-26 16:53 ` [PATCH v6 08/11] hw/arm/virt: Declare AArch32 DBGDTRTX as safe to ignore in incoming stream Eric Auger
` (6 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-01-26 16:53 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
cohuck, maz, oliver.upton, sebott, gshan, ddutile, peterx, philmd,
pbonzini
Allow both array properties to be set on arm cpus. Their "x-" prefix
is a reminder that they shall be used carefully, for distro specific use
cases, to enable cross kernel migration.
This allows defining compat machine props such as:
static GlobalProperty arm_virt_kernel_compat_10_1[] = {
/* KVM_REG_ARM_VENDOR_HYP_BMAP_2 */
{ TYPE_ARM_CPU, "x-mig-hidden-regs", "0x6030000000160003" },
{ TYPE_ARM_CPU, "x-mig-safe-missing-regs",
/* TCR2_EL1, PIRE0_EL1, PIR_EL1 */
"0x603000000013c103, 0x603000000013c512, 0x603000000013c513" },
};
The first one means KVM_REG_ARM_VENDOR_HYP_BMAP_2 shall always
be hidden for machine types older than 10.1. The second one means
that with the 10.1 machine type we may receive, in the incoming
migration stream, 3 registers that are unknown on the destination.
Obviously, using the reg index as defined in
linux/Documentation/virt/kvm/api.rst is not user friendly. However,
these options are supposed to be used to enable specific, rare cases,
and in general by people who configure distribution defaults and are
familiar with those specific cases.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
v3 -> v4:
- typo and rewording in the commit description (Connie)
---
target/arm/cpu.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 76d013e6c4b..c998fb5ba4d 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2239,6 +2239,11 @@ static const Property arm_cpu_properties[] = {
DEFINE_PROP_BOOL("backcompat-cntfrq", ARMCPU, backcompat_cntfrq, false),
DEFINE_PROP_BOOL("backcompat-pauth-default-use-qarma5", ARMCPU,
backcompat_pauth_default_use_qarma5, false),
+ DEFINE_PROP_ARRAY("x-mig-hidden-regs", ARMCPU,
+ nr_hidden_regs, hidden_regs, qdev_prop_uint64, uint64_t),
+ DEFINE_PROP_ARRAY("x-mig-safe-missing-regs", ARMCPU,
+ nr_mig_safe_missing_regs, mig_safe_missing_regs,
+ qdev_prop_uint64, uint64_t),
};
static const gchar *arm_gdb_arch_name(CPUState *cs)
--
2.52.0
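
Since the property values above are raw KVM register indexes, a small
decoder can help when reviewing such a compat entry. The sketch below is
not part of the series; the DEMO_* constants and demo_decode_sysreg() are
names made up for the example, with values following the arm64 system
register encoding described in the Linux KVM API documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout of a KVM register id (values as in the arm64 KVM uapi). */
#define DEMO_REG_ARCH_MASK     0xff00000000000000ULL
#define DEMO_REG_ARM64         0x6000000000000000ULL
#define DEMO_REG_COPROC_MASK   0x000000000fff0000ULL
#define DEMO_REG_ARM64_SYSREG  (0x0013ULL << 16)

/* Decoded AArch64 system register encoding. */
typedef struct {
    unsigned op0, op1, crn, crm, op2;
} DemoSysReg;

/*
 * Decode an arm64 system register id into its op0/op1/CRn/CRm/op2
 * fields. Returns false if the id is not an arm64 sysreg (for example
 * a firmware pseudo register or an AArch32 register).
 */
static bool demo_decode_sysreg(uint64_t id, DemoSysReg *out)
{
    assert(out != NULL);

    if ((id & DEMO_REG_ARCH_MASK) != DEMO_REG_ARM64 ||
        (id & DEMO_REG_COPROC_MASK) != DEMO_REG_ARM64_SYSREG) {
        return false;
    }
    out->op0 = (id >> 14) & 0x3;
    out->op1 = (id >> 11) & 0x7;
    out->crn = (id >> 7) & 0xf;
    out->crm = (id >> 3) & 0xf;
    out->op2 = id & 0x7;
    return true;
}
```

For instance 0x603000000013c103 decodes to op0=3, op1=0, CRn=2, CRm=0,
op2=3, while the firmware pseudo register 0x6030000000160003 is rejected
because it is not in the sysreg coprocessor space.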
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v6 08/11] hw/arm/virt: Declare AArch32 DBGDTRTX as safe to ignore in incoming stream
2026-01-26 16:52 [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
` (6 preceding siblings ...)
2026-01-26 16:53 ` [PATCH v6 07/11] target/arm/cpu: Expose x-mig-hidden-regs and x-mig-safe-missing-regs properties Eric Auger
@ 2026-01-26 16:53 ` Eric Auger
2026-01-27 17:19 ` Cornelia Huck
2026-01-26 16:53 ` [PATCH v6 09/11] Revert "target/arm: Reinstate bogus AArch32 DBGDTRTX register for migration compat" Eric Auger
` (5 subsequent siblings)
13 siblings, 1 reply; 32+ messages in thread
From: Eric Auger @ 2026-01-26 16:53 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
cohuck, maz, oliver.upton, sebott, gshan, ddutile, peterx, philmd,
pbonzini
With the new infrastructure in place it is now feasible to teach
qemu that it is safe to ignore a sysreg in the incoming migration
stream. So with the plan to revert commit 4f2b82f60431 ("target/arm:
Reinstate bogus AArch32 DBGDTRTX register for migration compat") from
qemu 11.0 onwards, let's add a compat in 10.2 machine options stating
that this reg is safe to ignore. From 11.0 onwards we will not need
that register anymore.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
v4 -> v5:
- rebased on top of latest machine types (Connie)
v3 -> v4:
- add a comment related to DBGDTRTX (Connie)
---
hw/arm/virt.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index baa4e31aac1..03d5af18f26 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -100,6 +100,15 @@ static GlobalProperty arm_virt_compat_defaults[] = {
static const size_t arm_virt_compat_defaults_len =
G_N_ELEMENTS(arm_virt_compat_defaults);
+/* Register erroneously exposed on 10.2 and earlier */
+#define DBGDTRTX 0x40200000200e0298
+
+static GlobalProperty arm_virt_compat_10_2[] = {
+ { TYPE_ARM_CPU, "x-mig-safe-missing-regs", stringify(DBGDTRTX)},
+};
+static const size_t arm_virt_compat_10_2_len =
+ G_N_ELEMENTS(arm_virt_compat_10_2);
+
/*
* This cannot be called from the virt_machine_class_init() because
* TYPE_VIRT_MACHINE is abstract and mc->compat_props g_ptr_array_new()
@@ -3552,6 +3561,7 @@ static void virt_machine_10_2_options(MachineClass *mc)
{
virt_machine_11_0_options(mc);
compat_props_add(mc->compat_props, hw_compat_10_2, hw_compat_10_2_len);
+ compat_props_add(mc->compat_props, arm_virt_compat_10_2, arm_virt_compat_10_2_len);
}
DEFINE_VIRT_MACHINE(10, 2)
--
2.52.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v6 08/11] hw/arm/virt: Declare AArch32 DBGDTRTX as safe to ignore in incoming stream
2026-01-26 16:53 ` [PATCH v6 08/11] hw/arm/virt: Declare AArch32 DBGDTRTX as safe to ignore in incoming stream Eric Auger
@ 2026-01-27 17:19 ` Cornelia Huck
2026-01-29 10:23 ` Eric Auger
0 siblings, 1 reply; 32+ messages in thread
From: Cornelia Huck @ 2026-01-27 17:19 UTC (permalink / raw)
To: Eric Auger, eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
peter.maydell, maz, oliver.upton, sebott, gshan, ddutile, peterx,
philmd, pbonzini
On Mon, Jan 26 2026, Eric Auger <eric.auger@redhat.com> wrote:
> With the new infrastructure in place it is now feasible to teach
> qemu that it is safe to ignore a sysreg in the incoming migration
> stream. So with the plan to revert commit 4f2b82f60431 ("target/arm:
> Reinstate bogus AArch32 DBGDTRTX register for migration compat") from
> qemu 11.0 onwards, let's add a compat in 10.2 machine options stating
> that this reg is safe to ignore. from 11.0 onwards we will not need
> that register anymore.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>
> ---
>
> v4 -> v5:
> - rebased on top of latest machine types (Connie)
>
> v3 -> v4:
> - add a comment related to DBGDTRTX (Connie)
> ---
> hw/arm/virt.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index baa4e31aac1..03d5af18f26 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -100,6 +100,15 @@ static GlobalProperty arm_virt_compat_defaults[] = {
> static const size_t arm_virt_compat_defaults_len =
> G_N_ELEMENTS(arm_virt_compat_defaults);
>
> +/* Register erronously exposed on 10.2 and earlier */
> +#define DBGDTRTX 0x40200000200e0298
> +
> +static GlobalProperty arm_virt_compat_10_2[] = {
> + { TYPE_ARM_CPU, "x-mig-safe-missing-regs", stringify(DBGDTRTX)},
> +};
> +static const size_t arm_virt_compat_10_2_len =
> + G_N_ELEMENTS(arm_virt_compat_10_2);
> +
Not objecting, but we had a discussion recently regarding where compat
values for arm cpus should live:
https://lore.kernel.org/qemu-devel/20260120122108.131708-1-thuth@redhat.com/ ff.
Could this become relevant for future other versioned machine types? I'd
assume that they just would skip the bogus reg from the start, though.
> /*
> * This cannot be called from the virt_machine_class_init() because
> * TYPE_VIRT_MACHINE is abstract and mc->compat_props g_ptr_array_new()
> @@ -3552,6 +3561,7 @@ static void virt_machine_10_2_options(MachineClass *mc)
> {
> virt_machine_11_0_options(mc);
> compat_props_add(mc->compat_props, hw_compat_10_2, hw_compat_10_2_len);
> + compat_props_add(mc->compat_props, arm_virt_compat_10_2, arm_virt_compat_10_2_len);
> }
> DEFINE_VIRT_MACHINE(10, 2)
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 08/11] hw/arm/virt: Declare AArch32 DBGDTRTX as safe to ignore in incoming stream
2026-01-27 17:19 ` Cornelia Huck
@ 2026-01-29 10:23 ` Eric Auger
0 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-01-29 10:23 UTC (permalink / raw)
To: Cornelia Huck, eric.auger.pro, qemu-devel, qemu-arm,
peter.maydell, maz, oliver.upton, sebott, gshan, ddutile, peterx,
philmd, pbonzini
Hi Connie,
On 1/27/26 6:19 PM, Cornelia Huck wrote:
> On Mon, Jan 26 2026, Eric Auger <eric.auger@redhat.com> wrote:
>
>> With the new infrastructure in place it is now feasible to teach
>> qemu that it is safe to ignore a sysreg in the incoming migration
>> stream. So with the plan to revert commit 4f2b82f60431 ("target/arm:
>> Reinstate bogus AArch32 DBGDTRTX register for migration compat") from
>> qemu 11.0 onwards, let's add a compat in 10.2 machine options stating
>> that this reg is safe to ignore. from 11.0 onwards we will not need
>> that register anymore.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> v4 -> v5:
>> - rebased on top of latest machine types (Connie)
>>
>> v3 -> v4:
>> - add a comment related to DBGDTRTX (Connie)
>> ---
>> hw/arm/virt.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index baa4e31aac1..03d5af18f26 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -100,6 +100,15 @@ static GlobalProperty arm_virt_compat_defaults[] = {
>> static const size_t arm_virt_compat_defaults_len =
>> G_N_ELEMENTS(arm_virt_compat_defaults);
>>
>> +/* Register erronously exposed on 10.2 and earlier */
>> +#define DBGDTRTX 0x40200000200e0298
>> +
>> +static GlobalProperty arm_virt_compat_10_2[] = {
>> + { TYPE_ARM_CPU, "x-mig-safe-missing-regs", stringify(DBGDTRTX)},
>> +};
>> +static const size_t arm_virt_compat_10_2_len =
>> + G_N_ELEMENTS(arm_virt_compat_10_2);
>> +
> Not objecting, but we had a discussion recently regarding where compat
> values for arm cpus should live:
> https://lore.kernel.org/qemu-devel/20260120122108.131708-1-thuth@redhat.com/ ff.
>
> Could this become relevant for future other versioned machine types? I'd
> assume that they just would skip the bogus reg from the start, though.
Thanks for the pointer. Indeed this could initially have lived
in hw/core/machine.c, but the problem specific to this array prop is that
in any case it is not something you can aggregate: it requires the
infra defined in 10/11, where you register safe missing regs and hidden
regs through a specific helper. Currently only one machine requires this,
so I would be inclined to leave it in the virt arm machine code. So this
patch is transient and gets replaced in 10/11.
Thanks
Eric
>
>> /*
>> * This cannot be called from the virt_machine_class_init() because
>> * TYPE_VIRT_MACHINE is abstract and mc->compat_props g_ptr_array_new()
>> @@ -3552,6 +3561,7 @@ static void virt_machine_10_2_options(MachineClass *mc)
>> {
>> virt_machine_11_0_options(mc);
>> compat_props_add(mc->compat_props, hw_compat_10_2, hw_compat_10_2_len);
>> + compat_props_add(mc->compat_props, arm_virt_compat_10_2, arm_virt_compat_10_2_len);
>> }
>> DEFINE_VIRT_MACHINE(10, 2)
>>
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v6 09/11] Revert "target/arm: Reinstate bogus AArch32 DBGDTRTX register for migration compat"
2026-01-26 16:52 [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
` (7 preceding siblings ...)
2026-01-26 16:53 ` [PATCH v6 08/11] hw/arm/virt: Declare AArch32 DBGDTRTX as safe to ignore in incoming stream Eric Auger
@ 2026-01-26 16:53 ` Eric Auger
2026-01-26 16:53 ` [PATCH v6 10/11] hw/arm/virt: Introduce framework to aggregate hidden-regs and safe-missing-regs Eric Auger
` (4 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-01-26 16:53 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
cohuck, maz, oliver.upton, sebott, gshan, ddutile, peterx, philmd,
pbonzini
This reverts commit 4f2b82f60431e4792ecfd86a4d6b824248ee4c21. We don't
need that commit anymore as the AArch32 DBGDTRTX register is declared to
be safe to ignore in the incoming migration stream using a compat
in arm virt machine.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
target/arm/debug_helper.c | 29 -----------------------------
1 file changed, 29 deletions(-)
diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
index 579516e1541..aee06d4d426 100644
--- a/target/arm/debug_helper.c
+++ b/target/arm/debug_helper.c
@@ -940,13 +940,6 @@ static void dbgclaimclr_write(CPUARMState *env, const ARMCPRegInfo *ri,
env->cp15.dbgclaim &= ~(value & 0xFF);
}
-static CPAccessResult access_bogus(CPUARMState *env, const ARMCPRegInfo *ri,
- bool isread)
-{
- /* Always UNDEF, as if this cpreg didn't exist */
- return CP_ACCESS_UNDEFINED;
-}
-
static const ARMCPRegInfo debug_cp_reginfo[] = {
/*
* DBGDRAR, DBGDSAR: always RAZ since we don't implement memory mapped
@@ -1009,28 +1002,6 @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
.opc0 = 2, .opc1 = 3, .crn = 0, .crm = 4, .opc2 = 0,
.access = PL0_RW, .accessfn = access_tdcc,
.type = ARM_CP_CONST, .resetvalue = 0 },
- /*
- * This is not a real AArch32 register. We used to incorrectly expose
- * this due to a QEMU bug; to avoid breaking migration compatibility we
- * need to continue to provide it so that we don't fail the inbound
- * migration when it tells us about a sysreg that we don't have.
- * We set an always-fails .accessfn, which means that the guest doesn't
- * actually see this register (it will always UNDEF, identically to if
- * there were no cpreg definition for it other than that we won't print
- * a LOG_UNIMP message about it), and we set the ARM_CP_NO_GDB flag so the
- * gdbstub won't see it either.
- * (We can't just set .access = 0, because add_cpreg_to_hashtable()
- * helpfully ignores cpregs which aren't accessible to the highest
- * implemented EL.)
- *
- * TODO: implement a system for being able to describe "this register
- * can be ignored if it appears in the inbound stream"; then we can
- * remove this temporary hack.
- */
- { .name = "BOGUS_DBGDTR_EL0", .state = ARM_CP_STATE_AA32,
- .cp = 14, .opc1 = 3, .crn = 0, .crm = 5, .opc2 = 0,
- .access = PL0_RW, .accessfn = access_bogus,
- .type = ARM_CP_CONST | ARM_CP_NO_GDB, .resetvalue = 0 },
/*
* OSECCR_EL1 provides a mechanism for an operating system
* to access the contents of EDECCR. EDECCR is not implemented though,
--
2.52.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v6 10/11] hw/arm/virt: Introduce framework to aggregate hidden-regs and safe-missing-regs
2026-01-26 16:52 [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
` (8 preceding siblings ...)
2026-01-26 16:53 ` [PATCH v6 09/11] Revert "target/arm: Reinstate bogus AArch32 DBGDTRTX register for migration compat" Eric Auger
@ 2026-01-26 16:53 ` Eric Auger
2026-01-26 16:53 ` [PATCH v6 11/11] hw/arm/virt: [DO NOT UPSTREAM] Enforce compatibility with older kernels Eric Auger
` (3 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-01-26 16:53 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
cohuck, maz, oliver.upton, sebott, gshan, ddutile, peterx, philmd,
pbonzini
Currently if a virt_machine_<n>_options() sets a TYPE_ARM_CPU
x-mig-hidden-regs or x-mig-safe-missing-regs array property, another
one cannot overwrite it or extend it. We end up with a core dump:
qemu-system-aarch64: can't apply global arm-cpu.x-mig-safe-missing-regs=0x603000000013c103, 0x603000000013c512, 0x603000000013c513: array size property x-mig-safe-missing-regs may not be set more than once
Aborted (core dumped)
In practice we would like an easy way to register regs that belong
to either of those categories and to allow their aggregation.
We introduce arm_virt_compat_register_safe_missing_reg() and
arm_virt_compat_register_hidden_reg() which populate QLists of
int64_t values. After all virt_machine_<n>_options() have been called
and have registered their regs, the QLists are converted into the
associated array property values and the GlobalProperties are set.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
---
v5 -> v6:
- move g_string_new after list length check
- collected Sebastian's R-b
v5:
- new patch
---
include/hw/arm/virt.h | 23 ++++++++++++++
hw/arm/virt.c | 73 ++++++++++++++++++++++++++++++++++++++-----
2 files changed, 89 insertions(+), 7 deletions(-)
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 5907d41dbb2..d83e6f00068 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -41,6 +41,7 @@
#include "system/kvm.h"
#include "hw/intc/arm_gicv3_common.h"
#include "qom/object.h"
+#include "qobject/qlist.h"
#define NUM_GICV2M_SPIS 64
#define NUM_VIRTIO_TRANSPORTS 32
@@ -131,6 +132,8 @@ struct VirtMachineClass {
bool no_tcg_lpa2;
bool no_ns_el2_virt_timer_irq;
bool no_nested_smmu;
+ QList *safe_missing_regs;
+ QList *hidden_regs;
};
struct VirtMachineState {
@@ -216,4 +219,24 @@ static inline int virt_gicv3_redist_region_count(VirtMachineState *vms)
vms->highmem_redists) ? 2 : 1;
}
+static inline void arm_virt_class_init(MachineClass *mc)
+{
+ VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
+ vmc->safe_missing_regs = qlist_new();
+ vmc->hidden_regs = qlist_new();
+}
+
+static inline void
+arm_virt_compat_register_safe_missing_reg(VirtMachineClass *vmc, int64_t regidx)
+{
+ qlist_append_int(vmc->safe_missing_regs, regidx);
+}
+
+static inline void
+arm_virt_compat_register_hidden_reg(VirtMachineClass *vmc, int64_t regidx)
+{
+ qlist_append_int(vmc->hidden_regs, regidx);
+}
+
#endif /* QEMU_ARM_VIRT_H */
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 03d5af18f26..a01dfb7fb79 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -93,6 +93,7 @@
#include "hw/cxl/cxl.h"
#include "hw/cxl/cxl_host.h"
#include "qemu/guest-random.h"
+#include "qobject/qnum.h"
static GlobalProperty arm_virt_compat_defaults[] = {
{ TYPE_VIRTIO_IOMMU_PCI, "aw-bits", "48" },
@@ -100,15 +101,18 @@ static GlobalProperty arm_virt_compat_defaults[] = {
static const size_t arm_virt_compat_defaults_len =
G_N_ELEMENTS(arm_virt_compat_defaults);
+/*
+ * Array made of x-mig-safe-missing-regs and x-mig-hidden-regs global
+ * properties. It is populated by arm_virt_aggregate_x_mig_props() that
+ * aggregates registrations respectively made with:
+ * - arm_virt_compat_register_safe_missing_reg() and
+ * - arm_virt_compat_register_hidden_reg()
+ */
+static GlobalProperty aggregated_x_mig_array_props[2];
+
/* Register erronously exposed on 10.2 and earlier */
#define DBGDTRTX 0x40200000200e0298
-static GlobalProperty arm_virt_compat_10_2[] = {
- { TYPE_ARM_CPU, "x-mig-safe-missing-regs", stringify(DBGDTRTX)},
-};
-static const size_t arm_virt_compat_10_2_len =
- G_N_ELEMENTS(arm_virt_compat_10_2);
-
/*
* This cannot be called from the virt_machine_class_init() because
* TYPE_VIRT_MACHINE is abstract and mc->compat_props g_ptr_array_new()
@@ -120,14 +124,67 @@ static void arm_virt_compat_default_set(MachineClass *mc)
arm_virt_compat_defaults_len);
}
+static char *get_prop_value_from_reg_qlist(QList *l)
+{
+ size_t size = qlist_size(l);
+ QListEntry *item;
+ GString *s;
+ int i = 0;
+ QNum *qi;
+
+ if (!size) {
+ return NULL;
+ }
+
+ s = g_string_new("");
+
+ QLIST_FOREACH_ENTRY(l, item) {
+ qi = qobject_to(QNum, qlist_entry_obj(item));
+ int64_t regidx;
+
+ qnum_get_try_int(qi, &regidx);
+ if (i++ > 0) {
+ g_string_append(s, ", ");
+ }
+ g_string_append_printf(s, "%" G_GINT64_FORMAT, regidx);
+ }
+ return g_string_free(s, false);
+}
+
+static void arm_virt_aggregate_x_mig_props(MachineClass *mc)
+{
+ VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+ const char *safe_missing_regs_prop_value =
+ get_prop_value_from_reg_qlist(vmc->safe_missing_regs);
+ const char *hidden_regs_prop_value =
+ get_prop_value_from_reg_qlist(vmc->hidden_regs);
+ int i = 0;
+
+ if (safe_missing_regs_prop_value) {
+ aggregated_x_mig_array_props[i].driver = TYPE_ARM_CPU;
+ aggregated_x_mig_array_props[i].property = "x-mig-safe-missing-regs";
+ aggregated_x_mig_array_props[i++].value = safe_missing_regs_prop_value;
+ }
+
+ if (hidden_regs_prop_value) {
+ aggregated_x_mig_array_props[i].driver = TYPE_ARM_CPU;
+ aggregated_x_mig_array_props[i].property = "x-mig-hidden-regs";
+ aggregated_x_mig_array_props[i++].value = hidden_regs_prop_value;
+ }
+
+ compat_props_add(mc->compat_props, aggregated_x_mig_array_props, i);
+}
+
#define DEFINE_VIRT_MACHINE_IMPL(latest, ...) \
static void MACHINE_VER_SYM(class_init, virt, __VA_ARGS__)( \
ObjectClass *oc, \
const void *data) \
{ \
MachineClass *mc = MACHINE_CLASS(oc); \
+ arm_virt_class_init(mc); \
arm_virt_compat_default_set(mc); \
MACHINE_VER_SYM(options, virt, __VA_ARGS__)(mc); \
+ arm_virt_aggregate_x_mig_props(mc); \
mc->desc = "QEMU " MACHINE_VER_STR(__VA_ARGS__) " ARM Virtual Machine"; \
MACHINE_VER_DEPRECATION(__VA_ARGS__); \
if (latest) { \
@@ -3559,9 +3616,11 @@ DEFINE_VIRT_MACHINE_AS_LATEST(11, 0)
static void virt_machine_10_2_options(MachineClass *mc)
{
+ VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
virt_machine_11_0_options(mc);
compat_props_add(mc->compat_props, hw_compat_10_2, hw_compat_10_2_len);
- compat_props_add(mc->compat_props, arm_virt_compat_10_2, arm_virt_compat_10_2_len);
+ arm_virt_compat_register_safe_missing_reg(vmc, DBGDTRTX);
}
DEFINE_VIRT_MACHINE(10, 2)
--
2.52.0
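For readers following the aggregation step in this patch, here is a
minimal standalone sketch of the list-to-string conversion, written
without the QEMU QList/GString helpers. The function name
regs_to_prop_value() is hypothetical and is not part of the patch. It
joins register indices into one comma-separated decimal string and
returns NULL for an empty list, which is the same convention
get_prop_value_from_reg_qlist() follows.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch): join register indices into an
 * "a, b, c" decimal string. Returns NULL when the list is empty or when
 * allocation fails; the caller frees the result.
 */
static char *regs_to_prop_value(const int64_t *regs, size_t n)
{
    /* A decimal int64_t needs at most 20 characters, the separator 2. */
    const size_t per_entry = 20 + 2;
    size_t cap, off = 0;
    char *s;

    if (n == 0) {
        return NULL;
    }
    assert(regs != NULL);

    cap = n * per_entry + 1;
    s = malloc(cap);
    if (!s) {
        return NULL;
    }

    for (size_t i = 0; i < n; i++) {
        int w;

        if (i > 0) {
            /* Separator goes before every entry except the first. */
            memcpy(s + off, ", ", 2);
            off += 2;
        }
        w = snprintf(s + off, cap - off, "%" PRId64, regs[i]);
        if (w < 0 || (size_t)w >= cap - off) {
            free(s);
            return NULL;
        }
        off += (size_t)w;
    }
    return s;
}
```

Building the string once, after every options function has run, is what
lets several machine versions contribute registers while the array
property itself is still set a single time.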
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v6 11/11] hw/arm/virt: [DO NOT UPSTREAM] Enforce compatibility with older kernels
2026-01-26 16:52 [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
` (9 preceding siblings ...)
2026-01-26 16:53 ` [PATCH v6 10/11] hw/arm/virt: Introduce framework to aggregate hidden-regs and safe-missing-regs Eric Auger
@ 2026-01-26 16:53 ` Eric Auger
2026-01-26 17:16 ` [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
` (2 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-01-26 16:53 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
cohuck, maz, oliver.upton, sebott, gshan, ddutile, peterx, philmd,
pbonzini
This is an example of how to use the new CPU options. This caters to
distributions that want machines to be migratable (forward and backward)
across different host kernel versions in case the KVM registers exposed
to qemu vary across kernels. This patch is not meant to be upstreamed
as it is really kernel dependent. The goal is to illustrate how this
would be used.
In this example, for 10_1 machine types and older we apply the following
host kernel related compats:
1) Make sure the KVM_REG_ARM_VENDOR_HYP_BMAP_2 exposed from v6.15 onwards
is ignored/hidden.
2) Make sure TCR2_EL1, PIRE0_EL1, PIR_EL1 are always seen by qemu
although not exposed by KVM. They were unconditionally exposed before
v6.13 while from v6.13 they are only exposed if supported by the guest.
This will allow 10_1 and older machine types to migrate
forward and backward between old downstream kernels that do not feature
those changes and newer kernels (>= v6.15).
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
hw/arm/virt.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index a01dfb7fb79..f20a8453cbd 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3626,9 +3626,16 @@ DEFINE_VIRT_MACHINE(10, 2)
static void virt_machine_10_1_options(MachineClass *mc)
{
+ VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
virt_machine_10_2_options(mc);
mc->smbios_memory_device_size = 2047 * TiB;
compat_props_add(mc->compat_props, hw_compat_10_1, hw_compat_10_1_len);
+ /* KVM_REG_ARM_VENDOR_HYP_BMAP_2 */
+ arm_virt_compat_register_hidden_reg(vmc, 0x6030000000160003);
+ arm_virt_compat_register_safe_missing_reg(vmc, 0x603000000013c103 /* TCR2_EL1 */);
+ arm_virt_compat_register_safe_missing_reg(vmc, 0x603000000013c512 /* PIRE0_EL1 */);
+ arm_virt_compat_register_safe_missing_reg(vmc, 0x603000000013c513 /* PIR_EL1 */);
}
DEFINE_VIRT_MACHINE(10, 1)
--
2.52.0
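The raw hex indices in the patch above are hard to read, so here is a
small standalone sketch that splits a 64-bit KVM arm64 system register
index into its op0/op1/CRn/CRm/op2 fields. The masks and shifts follow
the Linux arm64 KVM uapi encoding as I understand it; the SK_* macro
names, struct sysreg_enc and decode_kvm_sysreg() are local to this
sketch and exist neither in QEMU nor in the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch-local names; values assumed to match the arm64 KVM uapi. */
#define SK_REG_ARCH_MASK     0xff00000000000000ULL
#define SK_REG_ARM64         0x6000000000000000ULL
#define SK_REG_COPROC_MASK   0x000000000fff0000ULL
#define SK_REG_ARM64_SYSREG  0x0000000000130000ULL

struct sysreg_enc {
    unsigned op0, op1, crn, crm, op2;
};

/*
 * Fill *e and return true if idx looks like an arm64 sysreg index.
 * Return false for anything else (firmware pseudo registers, AArch32
 * coprocessor registers, core registers, ...).
 */
static bool decode_kvm_sysreg(uint64_t idx, struct sysreg_enc *e)
{
    assert(e != NULL);

    if ((idx & SK_REG_ARCH_MASK) != SK_REG_ARM64 ||
        (idx & SK_REG_COPROC_MASK) != SK_REG_ARM64_SYSREG) {
        return false;
    }
    e->op0 = (unsigned)((idx >> 14) & 0x3);
    e->op1 = (unsigned)((idx >> 11) & 0x7);
    e->crn = (unsigned)((idx >> 7) & 0xf);
    e->crm = (unsigned)((idx >> 3) & 0xf);
    e->op2 = (unsigned)(idx & 0x7);
    return true;
}
```

With this decoding, 0x603000000013c103 comes out as op0=3, op1=0, CRn=2,
CRm=0, op2=3, which as far as I can tell is the encoding of TCR2_EL1,
and 0x603000000013c512 as op0=3, op1=0, CRn=10, CRm=2, op2=2.
0x6030000000160003 is rejected because its coprocessor field is not the
sysreg one, which is consistent with it being a firmware pseudo
register.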
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures
2026-01-26 16:52 [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
` (10 preceding siblings ...)
2026-01-26 16:53 ` [PATCH v6 11/11] hw/arm/virt: [DO NOT UPSTREAM] Enforce compatibility with older kernels Eric Auger
@ 2026-01-26 17:16 ` Eric Auger
2026-01-27 16:52 ` Sebastian Ott
2026-02-06 14:15 ` Peter Maydell
13 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-01-26 17:16 UTC (permalink / raw)
To: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell, cohuck, maz,
oliver.upton, sebott, gshan, ddutile, peterx, philmd, pbonzini,
Richard Henderson
Hi Peter, Richard,
On 1/26/26 5:52 PM, Eric Auger wrote:
> When migrating ARM guests across same machines with different host
> kernels we are likely to encounter failures such as:
>
> "failed to load cpu:cpreg_vmstate_array_len"
>
> This is due to the fact KVM exposes a different number of registers
> to qemu on source and destination. When trying to migrate a bigger
> register set to a smaller one, qemu cannot save the CPU state.
>
> For example, recently we faced such kind of situations with:
> - unconditional exposure of KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW pseudo
> register from v6.16 onwards. Causes backward migration failure.
> - removal of unconditional exposure of TCR2_EL1, PIRE0_EL1, PIR_EL1
> from v6.13 onwards. Causes forward migration failure.
>
> This situation is really problematic for distributions which want to
> guarantee forward and backward migration of a given machine type
> between different releases.
>
> While the series mainly targets KVM acceleration, this problem
> also exists with TCG. For instance some registers may be exposed
> while they shouldn't. Then it is tricky to fix that situation
> without breaking forward migration. An example was provided by
> Peter: 4f2b82f60 ("target/arm: Reinstate bogus AArch32 DBGDTRTX
> register for migration compat").
>
> This series introduces 2 CPU array properties that list
> - the CPU registers to hide from the exposed sysregs (aims
> at removing registers from the destination)
> - The CPU registers that may not exist but which can be found
> in the incoming migration stream (aims at ignoring extra
> registers in the incoming state)
>
> An example is given to illustrate how those props
> could be used to apply compats for machine types supposed to "see" the
> same register set across various host kernels.
>
> Mitigation of DBGDTRTX issue would be achieved by setting
> x-mig-safe-missing-regs=0x40200000200e0298 which matches
> AArch32 DBGDTRTX register index.
>
> The first patch improves the tracing so that we can quickly detect
> which registers do not match between the incoming stream and the
> exposed sysregs
Most of the patches of the series have collected R-bs. Do you have
concerns with the approach?
This aims at solving real-life distro issues with cross-kernel migration
failures, and we would appreciate getting a generic solution within the
11.0 timeframe.
Also [PATCH v4 0/2] arm: add kvm-psci-version vcpu property
(https://lore.kernel.org/all/20251202160853.22560-3-sebott@redhat.com/)
is part of this initiative and has also collected R-bs/T-bs.
Looking forward to your feedback.
Eric
>
> ---
>
> Available at:
> https://github.com/eauger/qemu/tree/mitig-v6
>
> ---
>
> Tests:
> - migration with 10.2 machine with old qemu featuring DBGDTRTX
> and this one where it is removed. Forward migration works.
> Backward doesn't because the register is not present in the
> input migration stream and write_list_to_cpustate() fails
> while calling write_raw_cp_reg() and reading the value back.
> write_raw_cp_reg() seems to read an uninitialized value from
> cpu->cpreg_values[i]. The write has no effect since the type is
> ARM_CP_CONST, but read_raw_cp_reg() returns ri->resetvalue which
> differs from the uninitialized value.
> I would have expected the initial cpu->cpreg_values[i] to match
> the reset value, which is obviously not the case. Also the comment
> hints that it should be. So maybe another issue? Nevertheless I am
> not totally sure supporting backward migration for TCG is a must.
> This may be fixed separately if it is confirmed this is a bug.
>
> - migration with accel=kvm back and forth old host/qemu where
> host does not feature fixes for TCR2_EL1, PIRE0_EL1, PIR_EL1
> and recent KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW and more recent
> kernel/this qemu that feature them. Migration works forward
> and backward with 10.1 machine type.
>
> History:
>
> v5 -> v6:
> - move GString init and collected Sebastian's R-b
>
> v4 -> v5:
> - Fixed issue reported by Sebastian about aggregated array
> props. This led to the introduction of
> hw/arm/virt: Introduce framework to aggregate hidden-regs
> and safe-missing-regs
> - Collected additional hacks from Connie
>
> v3 -> v4:
> - Collected Connie's & Sebastian's R-bs
> - Squashed patches 3 and 5
> - various typos and rewording
>
> v2 -> v3:
> - revert target/arm: Reinstate bogus AArch32 DBGDTRTX register for migration compat
> - fix some typos and rework target/arm/cpu.h hidden_regs comment (Connie)
> - Even for TCG we use KVM index
>
> v1 -> v2:
> - fixed typos (Connie)
> - Make it less KVM specific (tentative hiding of TCG regs, not
> tested)
> - Tested DBGDTRTX TCG case reported by Peter
> - No change to the property format yet. Ran out of ideas. However
> I changed the name of the property with x-mig prefix
> - Changed the terminology, kept hiding but removed fake which was
> confusing
> - Simplified the logic for regs missing in the incoming stream and
> do not check anymore they are exposed on dest
>
>
> Eric Auger (11):
> hw/arm/virt: Rename arm_virt_compat into arm_virt_compat_defaults
> target/arm/machine: Improve traces on register mismatch during
> migration
> target/arm/cpu: Allow registers to be hidden
> target/arm/machine: Allow extra regs in the incoming stream
> kvm-all: Enforce hidden regs are never accessed
> target/arm/cpu: Implement hide_reg callback()
> target/arm/cpu: Expose x-mig-hidden-regs and x-mig-safe-missing-regs
> properties
> hw/arm/virt: Declare AArch32 DBGDTRTX as safe to ignore in incoming
> stream
> Revert "target/arm: Reinstate bogus AArch32 DBGDTRTX register for
> migration compat"
> hw/arm/virt: Introduce framework to aggregate hidden-regs and
> safe-missing-regs
> hw/arm/virt: [DO NOT UPSTREAM] Enforce compatibility with older
> kernels
>
> include/hw/arm/virt.h | 23 ++++++++++
> include/hw/core/cpu.h | 2 +
> target/arm/cpu.h | 48 +++++++++++++++++++++
> accel/kvm/kvm-all.c | 12 ++++++
> hw/arm/virt.c | 89 ++++++++++++++++++++++++++++++++++++---
> target/arm/cpu.c | 11 +++++
> target/arm/debug_helper.c | 29 -------------
> target/arm/helper.c | 12 +++++-
> target/arm/kvm.c | 35 ++++++++++++++-
> target/arm/machine.c | 70 +++++++++++++++++++++++++++---
> target/arm/trace-events | 10 +++++
> 11 files changed, 298 insertions(+), 43 deletions(-)
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures
2026-01-26 16:52 [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
` (11 preceding siblings ...)
2026-01-26 17:16 ` [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
@ 2026-01-27 16:52 ` Sebastian Ott
2026-01-29 10:23 ` Eric Auger
2026-02-06 14:15 ` Peter Maydell
13 siblings, 1 reply; 32+ messages in thread
From: Sebastian Ott @ 2026-01-27 16:52 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
Cornelia Huck, maz, oliver.upton, gshan, ddutile, peterx, philmd,
pbonzini
On Mon, 26 Jan 2026, Eric Auger wrote:
> When migrating ARM guests across same machines with different host
> kernels we are likely to encounter failures such as:
>
> "failed to load cpu:cpreg_vmstate_array_len"
>
> This is due to the fact KVM exposes a different number of registers
> to qemu on source and destination. When trying to migrate a bigger
> register set to a smaller one, qemu cannot save the CPU state.
>
> For example, recently we faced such kind of situations with:
> - unconditional exposure of KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW pseudo
> register from v6.16 onwards. Causes backward migration failure.
> - removal of unconditional exposure of TCR2_EL1, PIRE0_EL1, PIR_EL1
> from v6.13 onwards. Causes forward migration failure.
>
> This situation is really problematic for distributions which want to
> guarantee forward and backward migration of a given machine type
> between different releases.
>
> While the series mainly targets KVM acceleration, this problem
> also exists with TCG. For instance some registers may be exposed
> while they shouldn't. Then it is tricky to fix that situation
> without breaking forward migration. An example was provided by
> Peter: 4f2b82f60 ("target/arm: Reinstate bogus AArch32 DBGDTRTX
> register for migration compat").
>
> This series introduces 2 CPU array properties that list
> - the CPU registers to hide from the exposed sysregs (aims
> at removing registers from the destination)
> - The CPU registers that may not exist but which can be found
> in the incoming migration stream (aims at ignoring extra
> registers in the incoming state)
>
> An example is given to illustrate how those props
> could be used to apply compats for machine types supposed to "see" the
> same register set across various host kernels.
>
> Mitigation of DBGDTRTX issue would be achieved by setting
> x-mig-safe-missing-regs=0x40200000200e0298 which matches
> AArch32 DBGDTRTX register index.
>
> The first patch improves the tracing so that we can quickly detect
> which registers do not match between the incoming stream and the
> exposed sysregs
>
I gave these a spin - works as advertised, no issues found.
Tested-by: Sebastian Ott <sebott@redhat.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures
2026-01-27 16:52 ` Sebastian Ott
@ 2026-01-29 10:23 ` Eric Auger
0 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-01-29 10:23 UTC (permalink / raw)
To: Sebastian Ott
Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
Cornelia Huck, maz, oliver.upton, gshan, ddutile, peterx, philmd,
pbonzini
On 1/27/26 5:52 PM, Sebastian Ott wrote:
> On Mon, 26 Jan 2026, Eric Auger wrote:
>
>> When migrating ARM guests across same machines with different host
>> kernels we are likely to encounter failures such as:
>>
>> "failed to load cpu:cpreg_vmstate_array_len"
>>
>> This is due to the fact KVM exposes a different number of registers
>> to qemu on source and destination. When trying to migrate a bigger
>> register set to a smaller one, qemu cannot save the CPU state.
>>
>> For example, recently we faced such kind of situations with:
>> - unconditional exposure of KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW pseudo
>> register from v6.16 onwards. Causes backward migration failure.
>> - removal of unconditional exposure of TCR2_EL1, PIRE0_EL1, PIR_EL1
>> from v6.13 onwards. Causes forward migration failure.
>>
>> This situation is really problematic for distributions which want to
>> guarantee forward and backward migration of a given machine type
>> between different releases.
>>
>> While the series mainly targets KVM acceleration, this problem
>> also exists with TCG. For instance some registers may be exposed
>> while they shouldn't. Then it is tricky to fix that situation
>> without breaking forward migration. An example was provided by
>> Peter: 4f2b82f60 ("target/arm: Reinstate bogus AArch32 DBGDTRTX
>> register for migration compat").
>>
>> This series introduces 2 CPU array properties that list
>> - the CPU registers to hide from the exposed sysregs (aims
>> at removing registers from the destination)
>> - The CPU registers that may not exist but which can be found
>> in the incoming migration stream (aims at ignoring extra
>> registers in the incoming state)
>>
>> An example is given to illustrate how those props
>> could be used to apply compats for machine types supposed to "see" the
>> same register set across various host kernels.
>>
>> Mitigation of DBGDTRTX issue would be achieved by setting
>> x-mig-safe-missing-regs=0x40200000200e0298 which matches
>> AArch32 DBGDTRTX register index.
>>
>> The first patch improves the tracing so that we can quickly detect
>> which registers do not match between the incoming stream and the
>> exposed sysregs
>>
>
> I gave these a spin - works as advertised, no issues found.
> Tested-by: Sebastian Ott <sebott@redhat.com>
Thanks!
Eric
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures
2026-01-26 16:52 [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures Eric Auger
` (12 preceding siblings ...)
2026-01-27 16:52 ` Sebastian Ott
@ 2026-02-06 14:15 ` Peter Maydell
2026-02-09 14:59 ` Alex Bennée
2026-02-09 15:13 ` Eric Auger
13 siblings, 2 replies; 32+ messages in thread
From: Peter Maydell @ 2026-02-06 14:15 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, qemu-devel, qemu-arm, cohuck, maz, oliver.upton,
sebott, gshan, ddutile, peterx, philmd, pbonzini
On Mon, 26 Jan 2026 at 16:54, Eric Auger <eric.auger@redhat.com> wrote:
>
> When migrating ARM guests across same machines with different host
> kernels we are likely to encounter failures such as:
>
> "failed to load cpu:cpreg_vmstate_array_len"
>
> This is due to the fact KVM exposes a different number of registers
> to qemu on source and destination. When trying to migrate a bigger
> register set to a smaller one, qemu cannot save the CPU state.
>
> For example, recently we faced such kind of situations with:
> - unconditional exposure of KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW pseudo
> register from v6.16 onwards. Causes backward migration failure.
> - removal of unconditional exposure of TCR2_EL1, PIRE0_EL1, PIR_EL1
> from v6.13 onwards. Causes forward migration failure.
Hi; sorry I haven't given this series any attention before.
(1) Yes, this is definitely a problem we need to solve.
(2) What are the requirements we have for this?
This series sets up CPU properties controlling this, and then
sets them in the virt machine model based on the machine
type, but this seems awkward for two reasons:
* using properties confines us to using a "text string"
way of describing the behaviour; if we could implement
the handling in code and C data structures in target/arm
we could potentially do it in a more flexible and
readable way (e.g. being able to specify the register
via something other than a raw hex value)
* different host kernel versions isn't really related to
the QEMU version, so tying it to a versioned machine
type doesn't seem to fit
Q: Do we need the user to be able to control this (e.g. adding
extra registers to be ignored) on their command line, or
can we say "you need a newer QEMU that understands how to
deal with this register if you want to do migrations involving
this newer kernel version" ?
Q: This series adds a "hide this register" option which
stops the register appearing in the outbound migration data.
Do we need that, or would it be enough to have "ignore this
register in the inbound migration data" ? Assuming we're
not trying to migrate backwards to an older QEMU version
that's unaware of the new register, that seems to me like
it should be equivalent.
(3) Categories of sysreg that are causing problems:
a: "controls" -- like the PSCI_VERSION pseudoreg. Here the setting
controls what the kernel is exposing to the guest, and so we need
to be able to have the user tell QEMU to use a specific version
that's not the host kernel default if the default isn't one
that's valid for all older kernels. Sometimes the new kernel
default is the same as the old kernel's behaviour and in those
cases we also want handling of "if you see the control reg in
the incoming data and its value is the default then it's OK to
ignore it".
b: "things exposed that should not have been" -- where the old kernel
exposed a register but the new one does not because exposing the
register was wrong (i.e. a bug). The handling here can be
"ignore this in migration input if present". Examples are the
TCR2_EL1, PIRE0_EL1, PIR_EL1 regs that shouldn't exist if the
corresponding feature was disabled for the guest.
c: "things not exposed that should have been" -- where a new kernel
exposes a new register that the old one does not, and so migration
from a host with the new kernel to the old one fails. In most cases
it should be possible to handle this with "ignore in migration input
if present", or "fail migration if incoming value is not some safe
default, but if it is that default value then ignore".
Have I missed anything ?
(4) Mechanisms for handling them:
This series provides two mechanisms:
"safe missing reg" -- these registers are ignored if they appear
in the incoming migration data.
"hidden" -- the behaviour here is that we effectively entirely
ignore the register, so we do not read it from the kernel or write
it back, do not send it in outbound migration data, and do
not expect to see it in incoming migration data.
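To make the difference between the two mechanisms concrete, here is a
deliberately simplified model in plain C. It is only a sketch of the
behaviour described above, not QEMU's actual code: register lists are
plain arrays of 64-bit indices, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Linear membership test; fine for the short lists involved here. */
static bool reg_in_set(const uint64_t *set, size_t n, uint64_t idx)
{
    for (size_t i = 0; i < n; i++) {
        if (set[i] == idx) {
            return true;
        }
    }
    return false;
}

/*
 * "hidden": build the list QEMU works with by dropping hidden registers
 * from what the kernel reports. Hidden registers are then neither
 * synced, sent, nor expected. out must have room for n entries.
 * Returns the number of registers kept.
 */
static size_t filter_hidden_regs(const uint64_t *kernel_regs, size_t n,
                                 const uint64_t *hidden, size_t nh,
                                 uint64_t *out)
{
    size_t kept = 0;

    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!reg_in_set(hidden, nh, kernel_regs[i])) {
            out[kept++] = kernel_regs[i];
        }
    }
    return kept;
}

/*
 * "safe missing reg": an incoming register unknown to the destination
 * is tolerated only if it is listed as safe to ignore.
 */
static bool incoming_regs_acceptable(const uint64_t *incoming, size_t ni,
                                     const uint64_t *local, size_t nl,
                                     const uint64_t *safe_missing, size_t ns)
{
    for (size_t i = 0; i < ni; i++) {
        if (!reg_in_set(local, nl, incoming[i]) &&
            !reg_in_set(safe_missing, ns, incoming[i])) {
            return false;
        }
    }
    return true;
}
```

In this model "hidden" changes what the destination itself exposes,
while "safe missing reg" only relaxes the check on the inbound stream.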
The "arm: add kvm-psci-version vcpu property" series handles one
specific "control" register, with a specific user-facing cpu property.
If new "control" type registers are rare, this seems like a good
way to go, because it means we can give the user an interface that
is reasonably clear about what it does, and we can provide better
errors on the migration-destination side (e.g. pointing the user
at the need to specify the property on the source side to get a
VM they can migrate to this destination).
The only use of "hidden" so far is for KVM_REG_ARM_VENDOR_HYP_BMAP_2.
However, I'm not sure this is the right way to handle this register.
Judging from the documentation, this seems to be a "control" register:
it would let QEMU enable certain things to be visible to the guest.
It also is odd to treat this differently from the existing
KVM_REG_ARM_VENDOR_HYP_BMAP register, which has exactly the same
semantics.
I think that the right way to treat this register would be
"if this is present in the incoming migration stream and the
host kernel doesn't know about it, a value of zero is OK, but
any other value should fail migration".
In general I'm not convinced that "hidden" is a useful thing
to provide -- it should always be fine for QEMU to read and
write back to the same host kernel some sysreg it doesn't
know about, so what "hidden" is mostly doing is "don't put
this into outgoing migration data". Do we need to be able
to do that, or can we instead always use a "ignore in
incoming migration data" strategy?
(5) My preferences
I think that assuming that it meets the requirements, I would
prefer something like a mechanism where we use some kind of
C data structure / code in target/arm/machine.c to represent
"this register needs some special handling", where the special
handling might be:
- ignore if present in input
- if present in input, value must be X, otherwise fail
migration
- maybe some other things if we need them
and this is not tied to specific QEMU machine versions and
isn't something we expose via QOM properties.
I'd rather avoid the "hidden" register idea unless we
definitely need it in addition to "ignore in incoming data".
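As a rough illustration of the data-structure idea in (5), here is a
minimal sketch and not a proposal for the actual code. All type and
function names (MigRegPolicy, MigRegRule, mig_reg_can_drop) are
hypothetical, and the register indices are placeholders rather than
real KVM encodings. It only shows the two handling modes listed above:
"ignore if present" and "value must be X, otherwise fail".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy kinds; names invented for this sketch. */
typedef enum {
    MIG_REG_IGNORE_IF_PRESENT,  /* drop the incoming register */
    MIG_REG_REQUIRE_VALUE,      /* accept only one specific value */
} MigRegPolicy;

typedef struct {
    uint64_t regidx;        /* placeholder index, NOT a real KVM encoding */
    const char *name;
    MigRegPolicy policy;
    uint64_t required;      /* only used by MIG_REG_REQUIRE_VALUE */
} MigRegRule;

static const MigRegRule mig_reg_rules[] = {
    { 0x1001, "TCR2_EL1",  MIG_REG_IGNORE_IF_PRESENT, 0 },
    { 0x1002, "PIRE0_EL1", MIG_REG_IGNORE_IF_PRESENT, 0 },
    { 0x1003, "PIR_EL1",   MIG_REG_IGNORE_IF_PRESENT, 0 },
    { 0x2001, "KVM_REG_ARM_VENDOR_HYP_BMAP_2", MIG_REG_REQUIRE_VALUE, 0 },
};

/*
 * Called for an incoming register that the destination kernel does not
 * expose. Returns true if the register can be dropped and migration may
 * continue, false if migration must fail.
 */
bool mig_reg_can_drop(uint64_t regidx, uint64_t value)
{
    size_t n = sizeof(mig_reg_rules) / sizeof(mig_reg_rules[0]);

    for (size_t i = 0; i < n; i++) {
        const MigRegRule *r = &mig_reg_rules[i];

        if (r->regidx != regidx) {
            continue;
        }
        switch (r->policy) {
        case MIG_REG_IGNORE_IF_PRESENT:
            return true;
        case MIG_REG_REQUIRE_VALUE:
            return value == r->required;
        }
        assert(0 && "unknown policy in mig_reg_rules");
    }
    return false;   /* unknown register: keep the strict behaviour */
}
```

With such a table, a register that has no rule still fails migration,
so the default stays as strict as it is today.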
thanks
-- PMM
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures
2026-02-06 14:15 ` Peter Maydell
@ 2026-02-09 14:59 ` Alex Bennée
2026-02-09 16:13 ` Eric Auger
2026-02-09 15:13 ` Eric Auger
1 sibling, 1 reply; 32+ messages in thread
From: Alex Bennée @ 2026-02-09 14:59 UTC (permalink / raw)
To: Peter Maydell
Cc: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, cohuck, maz,
oliver.upton, sebott, gshan, ddutile, peterx, philmd, pbonzini
Peter Maydell <peter.maydell@linaro.org> writes:
> On Mon, 26 Jan 2026 at 16:54, Eric Auger <eric.auger@redhat.com> wrote:
>>
>> When migrating ARM guests across same machines with different host
>> kernels we are likely to encounter failures such as:
>>
>> "failed to load cpu:cpreg_vmstate_array_len"
>>
>> This is due to the fact KVM exposes a different number of registers
>> to qemu on source and destination. When trying to migrate a bigger
>> register set to a smaller one, qemu cannot save the CPU state.
>>
>> For example, recently we faced such kind of situations with:
>> - unconditional exposure of KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW pseudo
>> register from v6.16 onwards. Causes backward migration failure.
>> - removal of unconditional exposure of TCR2_EL1, PIRE0_EL1, PIR_EL1
>> from v6.13 onwards. Causes forward migration failure.
>
> Hi; sorry I haven't given this series any attention before.
>
> (1) Yes, this is definitely a problem we need to solve.
>
> (2) What are the requirements we have for this?
>
> This series sets up CPU properties controlling this, and then
> sets them in the virt machine model based on the machine
> type, but this seems awkward for two reasons:
>
> * using properties confines us to using a "text string"
> way of describing the behaviour; if we could implement
> the handling in code and C data structures in target/arm
> we could potentially do it in a more flexible and
> readable way (e.g. being able to specify the register
> via something other than a raw hex value)
> * different host kernel versions isn't really related to
> the QEMU version, so tying it to a versioned machine
> type doesn't seem to fit
>
> Q: Do we need the user to be able to control this (e.g. adding
> extra registers to be ignored) on their command line, or
> can we say "you need a newer QEMU that understands how to
> deal with this register if you want to do migrations involving
> this newer kernel version" ?
>
> Q: This series adds a "hide this register" option which
> stops the register appearing in the outbound migration data.
> Do we need that, or would it be enough to have "ignore this
> register in the inbound migration data" ? Assuming we're
> not trying to migrate backwards to an older QEMU version
> that's unaware of the new register, that seems to me like
> it should be equivalent.
As I understand it these signal to the guest what services the
hypervisor supplies. I assume the guest kernel only reads these once at
boot up rather than before invoking any particular service?
If this is the case then things would break if a new host couldn't
support the guest's request of the hypervisor service.
>
> (3) Categories of sysreg that are causing problems:
>
> a: "controls" -- like the PSCI_VERSION pseudoreg. Here the setting
> controls what the kernel is exposing to the guest, and so we need
> to be able to have the user tell QEMU to use a specific version
> that's not the host kernel default if the default isn't one
> that's valid for all older kernels. Sometimes the new kernel
> default is the same as the old kernel's behaviour and in those
> cases we also want handling of "if you see the control reg in
> the incoming data and its value is the default then it's OK to
> ignore it".
>
> b: "things exposed that should not have been" -- where the old kernel
> exposed a register but the new one does not because exposing the
> register was wrong (i.e. a bug). The handling here can be
> "ignore this in migration input if present". Examples are the
> TCR2_EL1, PIRE0_EL1, PIR_EL1 regs that shouldn't exist if the
> corresponding feature was disabled for the guest.
>
> c: "things not exposed that should have been" -- where a new kernel
> exposes a new register that the old one does not, and so migration
> from a host with the new kernel to the old one fails. In most cases
> it should be possible to handle this with "ignore in migration input
> if present", or "fail migration if incoming value is not some safe
> default, but if it is that default value then ignore".
Shame we don't know if the guest ever read the register. If the old host
provides features the new host doesn't but it never probed anyway then
neither the guest nor the new host needs to care about the register.
>
> Have I missed anything ?
>
> (4) Mechanisms for handling them:
>
> This series provides two mechanisms:
>
> "safe missing reg" -- these registers are ignored if they appear
> in the incoming migration data.
>
> "hidden" -- the behaviour here is that we effectively entirely
> ignore the register, so we do not read it from the kernel or write
> it back, do not send it in outbound migration data, and do
> not expect to see it in incoming migration data.
>
> The "arm: add kvm-psci-version vcpu property" series handles one
> specific "control" register, with a specific user-facing cpu property.
> If new "control" type registers are rare, this seems like a good
> way to go, because it means we can give the user an interface that
> is reasonably clear about what it does, and we can provide better
> errors on the migration-destination side (e.g. pointing the user
> at the need to specify the property on the source side to get a
> VM they can migrate to this destination).
>
> The only use of "hidden" so far is for KVM_REG_ARM_VENDOR_HYP_BMAP_2.
> However, I'm not sure this is the right way to handle this register.
> Judging from the documentation, this seems to be a "control" register:
> it would let QEMU enable certain things to be visible to the guest.
> It also is odd to treat this differently from the existing
> KVM_REG_ARM_VENDOR_HYP_BMAP register, which has exactly the same
> semantics.
>
> I think that the right way to treat this register would be
> "if this is present in the incoming migration system and the
> host kernel doesn't know about it, a value of zero is OK, but
> any other value should fail migration".
>
> In general I'm not convinced that "hidden" is a useful thing
> to provide -- it should always be fine for QEMU to read and
> write back to the same host kernel some sysreg it doesn't
> know about, so what "hidden" is mostly doing is "don't put
> this into outgoing migration data". Do we need to be able
> to do that, or can we instead always use a "ignore in
> incoming migration data" strategy?
>
> (5) My preferences
>
> I think that assuming that it meets the requirements, I would
> prefer something like a mechanism where we use some kind of
> C data structure / code in target/arm/machine.c to represent
> "this register needs some special handling", where the special
> handling might be:
> - ignore if present in input
> - if present in input, value must be X, otherwise fail
> migration
> - maybe some other things if we need them
>
> and this is not tied to specific QEMU machine versions and
> isn't something we expose via QOM properties.
>
> I'd rather avoid the "hidden" register idea unless we
> definitely need it in addition to "ignore in incoming data".
>
> thanks
> -- PMM
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures
2026-02-09 14:59 ` Alex Bennée
@ 2026-02-09 16:13 ` Eric Auger
0 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-02-09 16:13 UTC (permalink / raw)
To: Alex Bennée, Peter Maydell
Cc: eric.auger.pro, qemu-devel, qemu-arm, cohuck, maz, oliver.upton,
sebott, gshan, ddutile, peterx, philmd, pbonzini
Hi Alex,
On 2/9/26 3:59 PM, Alex Bennée wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:
>
>> On Mon, 26 Jan 2026 at 16:54, Eric Auger <eric.auger@redhat.com> wrote:
>>> When migrating ARM guests across same machines with different host
>>> kernels we are likely to encounter failures such as:
>>>
>>> "failed to load cpu:cpreg_vmstate_array_len"
>>>
>>> This is due to the fact KVM exposes a different number of registers
>>> to qemu on source and destination. When trying to migrate a bigger
>>> register set to a smaller one, qemu cannot save the CPU state.
>>>
>>> For example, recently we faced such kind of situations with:
>>> - unconditional exposure of KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW pseudo
>>> register from v6.16 onwards. Causes backward migration failure.
>>> - removal of unconditional exposure of TCR2_EL1, PIRE0_EL1, PIR_EL1
>>> from v6.13 onwards. Causes forward migration failure.
>> Hi; sorry I haven't given this series any attention before.
>>
>> (1) Yes, this is definitely a problem we need to solve.
>>
>> (2) What are the requirements we have for this?
>>
>> This series sets up CPU properties controlling this, and then
>> sets them in the virt machine model based on the machine
>> type, but this seems awkward for two reasons:
>>
>> * using properties confines us to using a "text string"
>> way of describing the behaviour; if we could implement
>> the handling in code and C data structures in target/arm
>> we could potentially do it in a more flexible and
>> readable way (e.g. being able to specify the register
>> via something other than a raw hex value)
>> * different host kernel versions isn't really related to
>> the QEMU version, so tying it to a versioned machine
>> type doesn't seem to fit
>>
>> Q: Do we need the user to be able to control this (e.g. adding
>> extra registers to be ignored) on their command line, or
>> can we say "you need a newer QEMU that understands how to
>> deal with this register if you want to do migrations involving
>> this newer kernel version" ?
>>
>> Q: This series adds a "hide this register" option which
>> stops the register appearing in the outbound migration data.
>> Do we need that, or would it be enough to have "ignore this
>> register in the inbound migration data" ? Assuming we're
>> not trying to migrate backwards to an older QEMU version
>> that's unaware of the new register, that seems to me like
>> it should be equivalent.
To me, we may indeed try to migrate backwards to an older QEMU version
that's unaware of this new reg.
> As I understand it these signal to the guest what services the
> hypervisor supplies. I assume the guest kernel only reads these once at
> boot up rather than before invoking any particular service?
>
> If this is the case then things would break if a new host couldn't
> support the guest's request of the hypervisor service.
Indeed, we need to make sure the reg value does not differ from the
reset value (meaning qemu has not enabled any feature on the source that
the destination is not able to run).
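A minimal sketch of that check, assuming the source side can see both
the reset value and the current value of each hidden register. The
struct and function names (HiddenRegState, hidden_regs_first_modified)
are hypothetical and are not part of the series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-register bookkeeping for this sketch. */
typedef struct {
    const char *name;
    uint64_t reset_value;    /* value right after vcpu reset */
    uint64_t current_value;  /* value at migration time */
} HiddenRegState;

/*
 * Returns NULL if every hidden register still holds its reset value,
 * otherwise the name of the first register that was modified. A
 * non-NULL result means hiding the register would silently drop state,
 * so the migration should be refused.
 */
const char *hidden_regs_first_modified(const HiddenRegState *regs, size_t n)
{
    assert(regs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (regs[i].current_value != regs[i].reset_value) {
            return regs[i].name;
        }
    }
    return NULL;
}
```

Returning the name rather than a plain boolean would let the error
message tell the user which register blocked the migration.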
Eric
>
>> (3) Categories of sysreg that are causing problems:
>>
>> a: "controls" -- like the PSCI_VERSION pseudoreg. Here the setting
>> controls what the kernel is exposing to the guest, and so we need
>> to be able to have the user tell QEMU to use a specific version
>> that's not the host kernel default if the default isn't one
>> that's valid for all older kernels. Sometimes the new kernel
>> default is the same as the old kernel's behaviour and in those
>> cases we also want handling of "if you see the control reg in
>> the incoming data and its value is the default then it's OK to
>> ignore it".
>>
>> b: "things exposed that should not have been" -- where the old kernel
>> exposed a register but the new one does not because exposing the
>> register was wrong (i.e. a bug). The handling here can be
>> "ignore this in migration input if present". Examples are the
>> TCR2_EL1, PIRE0_EL1, PIR_EL1 regs that shouldn't exist if the
>> corresponding feature was disabled for the guest.
>>
>> c: "things not exposed that should have been" -- where a new kernel
>> exposes a new register that the old one does not, and so migration
>> from a host with the new kernel to the old one fails. In most cases
>> it should be possible to handle this with "ignore in migration input
>> if present", or "fail migration if incoming value is not some safe
>> default, but if it is that default value then ignore".
> Shame we don't know if the guest ever read the register. If the old host
> provides features the new host doesn't but it never probed anyway then
> neither the guest nor the new host needs to care about the register.
>
>> Have I missed anything ?
>>
>> (4) Mechanisms for handling them:
>>
>> This series provides two mechanisms:
>>
>> "safe missing reg" -- these registers are ignored if they appear
>> in the incoming migration data.
>>
>> "hidden" -- the behaviour here is that we effectively entirely
>> ignore the register, so we do not read it from the kernel or write
>> it back, do not send it in outbound migration data, and do
>> not expect to see it in incoming migration data.
>>
>> The "arm: add kvm-psci-version vcpu property" series handles one
>> specific "control" register, with a specific user-facing cpu property.
>> If new "control" type registers are rare, this seems like a good
>> way to go, because it means we can give the user an interface that
>> is reasonably clear about what it does, and we can provide better
>> errors on the migration-destination side (e.g. pointing the user
>> at the need to specify the property on the source side to get a
>> VM they can migrate to this destination).
>>
>> The only use of "hidden" so far is for KVM_REG_ARM_VENDOR_HYP_BMAP_2.
>> However, I'm not sure this is the right way to handle this register.
>> Judging from the documentation, this seems to be a "control" register:
>> it would let QEMU enable certain things to be visible to the guest.
>> It also is odd to treat this differently from the existing
>> KVM_REG_ARM_VENDOR_HYP_BMAP register, which has exactly the same
>> semantics.
>>
>> I think that the right way to treat this register would be
>> "if this is present in the incoming migration system and the
>> host kernel doesn't know about it, a value of zero is OK, but
>> any other value should fail migration".
>>
>> In general I'm not convinced that "hidden" is a useful thing
>> to provide -- it should always be fine for QEMU to read and
>> write back to the same host kernel some sysreg it doesn't
>> know about, so what "hidden" is mostly doing is "don't put
>> this into outgoing migration data". Do we need to be able
>> to do that, or can we instead always use a "ignore in
>> incoming migration data" strategy?
>>
>> (5) My preferences
>>
>> I think that assuming that it meets the requirements, I would
>> prefer something like a mechanism where we use some kind of
>> C data structure / code in target/arm/machine.c to represent
>> "this register needs some special handling", where the special
>> handling might be:
>> - ignore if present in input
>> - if present in input, value must be X, otherwise fail
>> migration
>> - maybe some other things if we need them
>>
>> and this is not tied to specific QEMU machine versions and
>> isn't something we expose via QOM properties.
>>
>> I'd rather avoid the "hidden" register idea unless we
>> definitely need it in addition to "ignore in incoming data".
>>
>> thanks
>> -- PMM
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v6 00/11] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures
2026-02-06 14:15 ` Peter Maydell
2026-02-09 14:59 ` Alex Bennée
@ 2026-02-09 15:13 ` Eric Auger
1 sibling, 0 replies; 32+ messages in thread
From: Eric Auger @ 2026-02-09 15:13 UTC (permalink / raw)
To: Peter Maydell
Cc: eric.auger.pro, qemu-devel, qemu-arm, cohuck, maz, oliver.upton,
sebott, gshan, ddutile, peterx, philmd, pbonzini
Hi Peter,
On 2/6/26 3:15 PM, Peter Maydell wrote:
> On Mon, 26 Jan 2026 at 16:54, Eric Auger <eric.auger@redhat.com> wrote:
>> When migrating ARM guests across same machines with different host
>> kernels we are likely to encounter failures such as:
>>
>> "failed to load cpu:cpreg_vmstate_array_len"
>>
>> This is due to the fact KVM exposes a different number of registers
>> to qemu on source and destination. When trying to migrate a bigger
>> register set to a smaller one, qemu cannot save the CPU state.
>>
>> For example, recently we faced such kind of situations with:
>> - unconditional exposure of KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW pseudo
>> register from v6.16 onwards. Causes backward migration failure.
>> - removal of unconditional exposure of TCR2_EL1, PIRE0_EL1, PIR_EL1
>> from v6.13 onwards. Causes forward migration failure.
> Hi; sorry I haven't given this series any attention before.
>
> (1) Yes, this is definitely a problem we need to solve.
>
> (2) What are the requirements we have for this?
>
> This series sets up CPU properties controlling this, and then
> sets them in the virt machine model based on the machine
> type, but this seems awkward for two reasons:
>
> * using properties confines us to using a "text string"
> way of describing the behaviour; if we could implement
> the handling in code and C data structures in target/arm
> we could potentially do it in a more flexible and
> readable way (e.g. being able to specify the register
> via something other than a raw hex value)
> * different host kernel versions isn't really related to
> the QEMU version, so tying it to a versioned machine
> type doesn't seem to fit
Well, in distros, I think it is.
When Red Hat releases a new RHEL, a new qemu version, hopefully with a
new virt machine type (not always), comes along with a new kernel.
If you want to migrate between this new kernel and an older one, you
have to use the old machine type. The new qemu, when using the
old machine type, knows it needs to handle specific migration hurdles
that originate from the differences between the host kernels. So to me
the mitigation schemes really can be attached to a machine type.
As we tie a qemu version to a host kernel, it looks natural to use
compat props.
>
> Q: Do we need the user to be able to control this (e.g. adding
> extra registers to be ignored) on their command line, or
> can we say "you need a newer QEMU that understands how to
> deal with this register if you want to do migrations involving
> this newer kernel version" ?
I don't think we need users to play with that. Rather, we need compats
that apply to machine types.
>
> Q: This series adds a "hide this register" option which
> stops the register appearing in the outbound migration data.
> Do we need that, or would it be enough to have "ignore this
> register in the inbound migration data" ? Assuming we're
> not trying to migrate backwards to an older QEMU version
> that's unaware of the new register, that seems to me like
> it should be equivalent.
I think this is required.
Assume you have distro-n installed at all your customer's premises. Your
customer wants to move to distro-n+1 and migrates some VMs to n+1.
n+1 features a new qemu and a new kernel which exposes new features such
as a new KVM pseudo FW reg. For some reason, the customer discovers
there are some issues with n+1 and wants to migrate those VMs
back to distro-n machines. This won't work. It was confirmed that this
scenario has been useful on x86 in the past. You don't want to update
your qemu on distro-n to handle extra incoming regs: it is already
shipped on the customer premises as part of an old release, and the old
qemu is not ready to deal with extra regs in the incoming stream. That's
why I think we need both.
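To illustrate what "hidden" buys us on the source side, here is a
minimal sketch that assumes the outgoing state is a pair of parallel
index/value arrays. The function names and the flat-array layout are
assumptions made for this example and are not the code from the series.
The point is only that the filtered count is what the old destination
qemu would see as cpreg_vmstate_array_len:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Linear lookup is fine here: the hidden list is expected to be tiny. */
static bool reg_is_hidden(uint64_t regidx,
                          const uint64_t *hidden, size_t n_hidden)
{
    for (size_t i = 0; i < n_hidden; i++) {
        if (hidden[i] == regidx) {
            return true;
        }
    }
    return false;
}

/*
 * Copy every register that is not hidden into out_idx/out_val, keeping
 * the original order, and return how many were copied. The output
 * arrays must have room for n entries.
 */
size_t filter_outgoing_regs(const uint64_t *idx, const uint64_t *val,
                            size_t n,
                            const uint64_t *hidden, size_t n_hidden,
                            uint64_t *out_idx, uint64_t *out_val)
{
    size_t out = 0;

    assert(out_idx != NULL && out_val != NULL);

    for (size_t i = 0; i < n; i++) {
        if (reg_is_hidden(idx[i], hidden, n_hidden)) {
            continue;   /* never reaches the migration stream */
        }
        out_idx[out] = idx[i];
        out_val[out] = val[i];
        out++;
    }
    return out;
}
```

Because the filtering happens on the source, the old qemu on distro-n
receives exactly the register set it expects and needs no update.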
>
> (3) Categories of sysreg that are causing problems:
>
> a: "controls" -- like the PSCI_VERSION pseudoreg. Here the setting
> controls what the kernel is exposing to the guest, and so we need
> to be able to have the user tell QEMU to use a specific version
> that's not the host kernel default if the default isn't one
> that's valid for all older kernels. Sometimes the new kernel
> default is the same as the old kernel's behaviour and in those
> cases we also want handling of "if you see the control reg in
> the incoming data and its value is the default then it's OK to
> ignore it".
Indeed, we could have something telling qemu that if the
migration fails due to that given reg and because of this given value,
that's OK.
However, if you want to spawn VMs with a new release while keeping
in mind that we may at some point need to migrate those VMs back to an
older release in the event of any issue, it may be safer to directly set
the pseudo FW reg to the old default version. This looks safer to me than
starting the VM with PSCI_VERSION set to 1.3 initially and then
reverting to 1.1 on the dest without notice. I am not sufficiently
knowledgeable about that use case, but I am not even sure this wouldn't
break in general.
>
> b: "things exposed that should not have been" -- where the old kernel
> exposed a register but the new one does not because exposing the
> register was wrong (i.e. a bug). The handling here can be
> "ignore this in migration input if present". Examples are the
> TCR2_EL1, PIRE0_EL1, PIR_EL1 regs that shouldn't exist if the
> corresponding feature was disabled for the guest.
Yes, that's what I called safe-missing-regs.
>
> c: "things not exposed that should have been" -- where a new kernel
> exposes a new register that the old one does not, and so migration
> from a host with the new kernel to the old one fails. In most cases
> it should be possible to handle this with "ignore in migration input
> if present", or "fail migration if incoming value is not some safe
> default, but if it is that default value then ignore".
You would need to update the qemu on the old release, which is not what
we want to do. Old qemu is not equipped with that ignore-if-missing
feature.
>
> Have I missed anything ?
>
> (4) Mechanisms for handling them:
>
> This series provides two mechanisms:
>
> "safe missing reg" -- these registers are ignored if they appear
> in the incoming migration data.
>
> "hidden" -- the behaviour here is that we effectively entirely
> ignore the register, so we do not read it from the kernel or write
> it back, do not send it in outbound migration data, and do
> not expect to see it in incoming migration data.
On top of what I currently do and as pointed out by Alex during the
bi-weekly call, I think we need to make sure the guest has not changed
the init value.
>
> The "arm: add kvm-psci-version vcpu property" series handles one
> specific "control" register, with a specific user-facing cpu property.
> If new "control" type registers are rare, this seems like a good
> way to go, because it means we can give the user an interface that
> is reasonably clear about what it does, and we can provide better
> errors on the migration-destination side (e.g. pointing the user
> at the need to specify the property on the source side to get a
> VM they can migrate to this destination).
I think and hope this should be rare. This is an obvious compatibility
breakage. At VMM level we do our utmost to avoid this situation by
introducing quite a lot of compats already. Also, as mentioned earlier, I
think it is much safer to start the VM with a reg value that is likely
to be compatible with its migration destination. Again, only older
machine types will be started with PSCI version 1.1 while the new one is
started with 1.3.
>
> The only use of "hidden" so far is for KVM_REG_ARM_VENDOR_HYP_BMAP_2.
> However, I'm not sure this is the right way to handle this register.
> Judging from the documentation, this seems to be a "control" register:
> it would let QEMU enable certain things to be visible to the guest.
> It also is odd to treat this differently from the existing
> KVM_REG_ARM_VENDOR_HYP_BMAP register, which has exactly the same
> semantics.
Agreed, but the KVM_REG_ARM_VENDOR_HYP_BMAP reg does not break migration
anymore. BMAP_2 is a real-life case that breaks it. At the moment we
cannot introduce this feature without breaking compat, and the
problem is we will need that feature for the vcpu model at some point. So
this is a dead end. Any new KVM reg will break the migration. The
purpose of this series is to bring an infrastructure for distros to
handle such breakages while minimizing downstream-only code.
>
> I think that the right way to treat this register would be
> "if this is present in the incoming migration system and the
> host kernel doesn't know about it, a value of zero is OK, but
> any other value should fail migration".
This obliges us to upgrade qemu on the destination (the older installed
version) and I don't think we want that in general.
>
> In general I'm not convinced that "hidden" is a useful thing
> to provide -- it should always be fine for QEMU to read and
> write back to the same host kernel some sysreg it doesn't
> know about, so what "hidden" is mostly doing is "don't put
> this into outgoing migration data". Do we need to be able
> to do that, or can we instead always use a "ignore in
> incoming migration data" strategy?
>
> (5) My preferences
>
> I think that assuming that it meets the requirements, I would
> prefer something like a mechanism where we use some kind of
> C data structure / code in target/arm/machine.c to represent
> "this register needs some special handling", where the special
> handling might be:
> - ignore if present in input
> - if present in input, value must be X, otherwise fail
> migration
> - maybe some other things if we need them
>
> and this is not tied to specific QEMU machine versions and
> isn't something we expose via QOM properties.
So you wouldn't bother about specifying that a given migration issue can
only happen with a given machine type. It is indeed simpler but
less precise in general.
>
> I'd rather avoid the "hidden" register idea unless we
> definitely need it in addition to "ignore in incoming data".
I think we cannot afford to assume or rely on an upgrade of the old qemu.
Thank you for the technical exchange!
Eric
>
> thanks
> -- PMM
>
^ permalink raw reply [flat|nested] 32+ messages in thread