* Re: [PATCH v2] kern: perform NULL check in unregister paths (command/extcmd)
[not found] <mailman.870.1757509050.1197.grub-devel@gnu.org>
@ 2025-09-11 6:59 ` Avnish Chouhan
2025-09-11 8:09 ` Srish Srinivasan
2025-09-11 8:27 ` [RFC PATCH 1/2] target/i386: add compatibility property for arch_capabilities Avnish Chouhan
2025-09-11 10:39 ` [RFC PATCH 2/2] target/i386: add compatibility property for pdcm feature Avnish Chouhan
2 siblings, 1 reply; 5+ messages in thread
From: Avnish Chouhan @ 2025-09-11 6:59 UTC (permalink / raw)
To: ssrish
Cc: grub-devel, stefanb, sudhakar, daniel.kiper, phcoder, nayna,
sridharm
On 2025-09-10 18:27, grub-devel-request@gnu.org wrote:
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 9 Sep 2025 11:09:55 -0400
> From: Stefan Berger <stefanb@linux.ibm.com>
> To: Sudhakar Kuppusamy <sudhakar@linux.ibm.com>, Srish Srinivasan
> <ssrish@linux.ibm.com>
> Cc: grub-devel@gnu.org, daniel.kiper@oracle.com, phcoder@gmail.com,
> nayna@linux.ibm.com, sridharm@linux.ibm.com
> Subject: Re: [PATCH v2] kern: perform NULL check in unregister paths
> (command/extcmd)
> Message-ID: <57eced6f-6b12-4d95-953c-98329bce3b82@linux.ibm.com>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
>
>
> On 9/9/25 4:22 AM, Sudhakar Kuppusamy wrote:
>>
>>
>>> On 9 Sep 2025, at 12:25 PM, Srish Srinivasan <ssrish@linux.ibm.com>
>>> wrote:
>>>
>>> Many modules call grub_unregister_{command(), extcmd()} from
>>> GRUB_MOD_FINI() without checking for a failure in registration.
>>> This could lead to a NULL pointer dereference in unregistration.
>>>
>>> Perform explicit NULL check in both the unregister helpers.
>>>
>>> Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
>>
>> Reviewed-by: Sudhakar Kuppusamy <sudhakar@linux.ibm.com>
> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
>
Hi Srish,
Thank you so much for the patch!
A failed registration already returns NULL anyway, so re-checking it
might not make sense. Do you have a specific scenario in mind for this?
Thank you!
Regards,
Avnish Chouhan
_______________________________________________
Grub-devel mailing list
Grub-devel@gnu.org
https://lists.gnu.org/mailman/listinfo/grub-devel
* Re: [PATCH v2] kern: perform NULL check in unregister paths (command/extcmd)
2025-09-11 6:59 ` [PATCH v2] kern: perform NULL check in unregister paths (command/extcmd) Avnish Chouhan
@ 2025-09-11 8:09 ` Srish Srinivasan
0 siblings, 0 replies; 5+ messages in thread
From: Srish Srinivasan @ 2025-09-11 8:09 UTC (permalink / raw)
To: The development of GNU GRUB, Avnish Chouhan
Cc: stefanb, sudhakar, daniel.kiper, phcoder, nayna, sridharm
On 9/11/25 12:29 PM, Avnish Chouhan wrote:
> On 2025-09-10 18:27, grub-devel-request@gnu.org wrote:
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Tue, 9 Sep 2025 11:09:55 -0400
>> From: Stefan Berger <stefanb@linux.ibm.com>
>> To: Sudhakar Kuppusamy <sudhakar@linux.ibm.com>, Srish Srinivasan
>> <ssrish@linux.ibm.com>
>> Cc: grub-devel@gnu.org, daniel.kiper@oracle.com, phcoder@gmail.com,
>> nayna@linux.ibm.com, sridharm@linux.ibm.com
>> Subject: Re: [PATCH v2] kern: perform NULL check in unregister paths
>> (command/extcmd)
>> Message-ID: <57eced6f-6b12-4d95-953c-98329bce3b82@linux.ibm.com>
>> Content-Type: text/plain; charset=UTF-8; format=flowed
>>
>>
>>
>> On 9/9/25 4:22 AM, Sudhakar Kuppusamy wrote:
>>>
>>>
>>>> On 9 Sep 2025, at 12:25 PM, Srish Srinivasan <ssrish@linux.ibm.com>
>>>> wrote:
>>>>
>>>> Many modules call grub_unregister_{command(), extcmd()} from
>>>> GRUB_MOD_FINI() without checking for a failure in registration.
>>>> This could lead to a NULL pointer dereference in unregistration.
>>>>
>>>> Perform explicit NULL check in both the unregister helpers.
>>>>
>>>> Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
>>>
>>> Reviewed-by: Sudhakar Kuppusamy <sudhakar@linux.ibm.com>
>> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
>>
>
> Hi Srish,
> Thank you so much for the patch!
>
> A failed registration already returns NULL anyway, so re-checking it
> might not make sense. Do you have a specific scenario in mind for this?
> Thank you!
>
>
> Regards,
> Avnish Chouhan
Hi Avnish,
During module registration, if a memory allocation fails, NULL is
returned:
https://github.com/olafhering/grub/blob/master/grub-core/kern/command.c#L40
But during unregistration, not all modules check for this NULL.
Here is an example where NULL is checked:
https://github.com/olafhering/grub/blob/master/grub-core/disk/diskfilter.c#L1491
But in, for example,
https://github.com/olafhering/grub/blob/master/grub-core/commands/efi/efitextmode.c#L151
no explicit check is done before unregistration. There are many such
instances.
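To make the scenario concrete, here is a minimal, self-contained sketch of a registry whose unregister helper tolerates NULL. Every name in it (`demo_command`, `demo_register`, `demo_unregister`, `demo_fail_alloc`) is hypothetical and only mirrors the idea discussed here; it is not GRUB's actual code or API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a registered command; not GRUB's real type. */
struct demo_command {
    struct demo_command *next;
    const char *name;
};

static struct demo_command *demo_list; /* head of the registry */
static int demo_fail_alloc;            /* set to 1 to simulate out-of-memory */

/* Returns NULL when the allocation fails, like the registration path above. */
static struct demo_command *demo_register(const char *name)
{
    struct demo_command *cmd;

    assert(name != NULL);
    cmd = demo_fail_alloc ? NULL : malloc(sizeof(*cmd));
    if (cmd == NULL)
        return NULL;
    cmd->name = name;
    cmd->next = demo_list;
    demo_list = cmd;
    return cmd;
}

/* NULL-tolerant unregister: a failed registration becomes a harmless no-op. */
static void demo_unregister(struct demo_command *cmd)
{
    struct demo_command **p;

    if (cmd == NULL)
        return;
    for (p = &demo_list; *p != NULL; p = &(*p)->next) {
        if (*p == cmd) {
            *p = cmd->next; /* unlink from the registry */
            break;
        }
    }
    free(cmd);
}

/* Number of commands currently registered. */
static int demo_count(void)
{
    int n = 0;
    const struct demo_command *c;

    for (c = demo_list; c != NULL; c = c->next)
        n++;
    return n;
}
```

With the check inside the helper, a module's fini path can call `demo_unregister()` unconditionally, which is the design choice the patch argues for; the alternative is a NULL check at every call site.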
Thanks,
Srish
>
* Re: [RFC PATCH 1/2] target/i386: add compatibility property for arch_capabilities
[not found] <mailman.870.1757509050.1197.grub-devel@gnu.org>
2025-09-11 6:59 ` [PATCH v2] kern: perform NULL check in unregister paths (command/extcmd) Avnish Chouhan
@ 2025-09-11 8:27 ` Avnish Chouhan
2025-09-11 10:39 ` [RFC PATCH 2/2] target/i386: add compatibility property for pdcm feature Avnish Chouhan
2 siblings, 0 replies; 5+ messages in thread
From: Avnish Chouhan @ 2025-09-11 8:27 UTC (permalink / raw)
To: hector.cao; +Cc: grub-devel
On 2025-09-10 18:27, grub-devel-request@gnu.org wrote:
> Message: 2
> Date: Wed, 10 Sep 2025 10:24:31 +0200
> From: Hector Cao <hector.cao@canonical.com>
> To: grub-devel@gnu.org
> Subject: [RFC PATCH 1/2] target/i386: add compatibility property for
> arch_capabilities
> Message-ID: <20250910082432.14764-2-hector.cao@canonical.com>
>
> Prior to v10.1, if requested by the user, arch-capabilities is always on
> despite the fact that CPUID advertises it to be off/unavailable.
> This causes a migration issue for VMs that are run on a machine
> without arch-capabilities and expect this feature to be present
> on the destination host with QEMU 10.1.
>
> This commit adds a compatibility property to restore the legacy
> behavior for all machines with version prior to 10.1.
>
> Signed-off-by: Hector Cao <hector.cao@canonical.com>
> ---
> hw/core/machine.c | 1 +
> migration/migration.h | 12 ++++++++++++
> migration/options.c | 3 +++
> target/i386/kvm/kvm.c | 5 ++++-
> 4 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 38c949c4f2..8ad5d79cb3 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -45,6 +45,7 @@ GlobalProperty hw_compat_10_0[] = {
> { "vfio-pci", "x-migration-load-config-after-iter", "off" },
> { "ramfb", "use-legacy-x86-rom", "true"},
> { "vfio-pci-nohotplug", "use-legacy-x86-rom", "true" },
> + { "migration", "arch-cap-always-on", "true" },
> };
> const size_t hw_compat_10_0_len = G_N_ELEMENTS(hw_compat_10_0);
>
> diff --git a/migration/migration.h b/migration/migration.h
> index 01329bf824..5124ff3636 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -510,6 +510,18 @@ struct MigrationState {
> bool rdma_migration;
>
> GSource *hup_source;
> +
> + /*
> + * This variable allows to keep the backward compatibility with
> QEMU (<10.1)
> + * on the arch-capabilities detection.
> + * With the commit d3a2413 (since 10.1), the arch-capabilities
> feature is gated
> + * with the CPUID bit (CPUID_7_0_EDX_ARCH_CAPABILITIES) instead
> of being always
> + * enabled when user requests for it. this new behavior breaks
> migration of VMs
> + * created and run with older QEMU on machines without
> IA32_ARCH_CAPABILITIES MSR,
> + * those VMs might have arch-capabilities enabled and break when
> migrating
> + * to a host with QEMU 10.1 with error : missing feature
> arch-capabilities
> + */
> + bool arch_cap_always_on;
> };
>
> void migrate_set_state(MigrationStatus *state, MigrationStatus
> old_state,
> diff --git a/migration/options.c b/migration/options.c
> index 4e923a2e07..3a80dba9c5 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -203,6 +203,9 @@ const Property migration_properties[] = {
> MIGRATION_CAPABILITY_SWITCHOVER_ACK),
> DEFINE_PROP_MIG_CAP("x-dirty-limit",
> MIGRATION_CAPABILITY_DIRTY_LIMIT),
> DEFINE_PROP_MIG_CAP("mapped-ram",
> MIGRATION_CAPABILITY_MAPPED_RAM),
> +
> + DEFINE_PROP_BOOL("arch-cap-always-on", MigrationState,
Hi Hector,
Missing space before '('
> + arch_cap_always_on, false),
> };
> const size_t migration_properties_count =
> ARRAY_SIZE(migration_properties);
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 306430a052..e2ec4e6de5 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -42,6 +42,7 @@
> #include "xen-emu.h"
> #include "hyperv.h"
> #include "hyperv-proto.h"
> +#include "migration/migration.h"
>
> #include "gdbstub/enums.h"
> #include "qemu/host-utils.h"
> @@ -438,6 +439,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s,
> uint32_t function,
> uint32_t ret = 0;
> uint32_t cpuid_1_edx, unused;
> uint64_t bitmask;
> + MigrationState *ms = migrate_get_current();
>
> cpuid = get_supported_cpuid(s);
>
> @@ -508,7 +510,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s,
> uint32_t function,
> * mcahines at all, do not show the fake ARCH_CAPABILITIES MSR
> that
> * KVM sets up.
> */
> - if (!has_msr_arch_capabs || !(edx &
> CPUID_7_0_EDX_ARCH_CAPABILITIES)) {
> + if (!has_msr_arch_capabs
> + || (!(edx & CPUID_7_0_EDX_ARCH_CAPABILITIES) &&
> (!ms->arch_cap_always_on))) {
Please move '{' to the next line.
Thank you!
Regards,
Avnish Chouhan
> ret &= ~CPUID_7_0_EDX_ARCH_CAPABILITIES;
> }
> } else if (function == 7 && index == 1 && reg == R_EAX) {
> --
> 2.45.2
>
>
>
>
> ------------------------------
* Re: [RFC PATCH 2/2] target/i386: add compatibility property for pdcm feature
[not found] <mailman.870.1757509050.1197.grub-devel@gnu.org>
2025-09-11 6:59 ` [PATCH v2] kern: perform NULL check in unregister paths (command/extcmd) Avnish Chouhan
2025-09-11 8:27 ` [RFC PATCH 1/2] target/i386: add compatibility property for arch_capabilities Avnish Chouhan
@ 2025-09-11 10:39 ` Avnish Chouhan
2 siblings, 0 replies; 5+ messages in thread
From: Avnish Chouhan @ 2025-09-11 10:39 UTC (permalink / raw)
To: hector.cao; +Cc: grub-devel
On 2025-09-10 18:27, grub-devel-request@gnu.org wrote:
> Message: 3
> Date: Wed, 10 Sep 2025 10:24:32 +0200
> From: Hector Cao <hector.cao@canonical.com>
> To: grub-devel@gnu.org
> Subject: [RFC PATCH 2/2] target/i386: add compatibility property for
> pdcm feature
> Message-ID: <20250910082432.14764-3-hector.cao@canonical.com>
>
> The pdcm feature is supposed to be disabled when PMU is not
> available. Up until v10.1, pdcm feature is enabled even when PMU
> is off. This behavior has been fixed but this change breaks the
> migration of VMs that are run with QEMU < 10.1 and expect the pdcm
> feature to be enabled on the destination host.
>
> This commit restores the legacy behavior for machines with version
> prior to 10.1 to allow the migration from older QEMU to QEMU 10.1.
>
> Signed-off-by: Hector Cao <hector.cao@canonical.com>
> ---
> hw/core/machine.c | 1 +
> migration/migration.h | 11 +++++++++++
> migration/options.c | 3 +++
> target/i386/cpu.c | 17 ++++++++++++++---
> 4 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 8ad5d79cb3..535184c221 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -46,6 +46,7 @@ GlobalProperty hw_compat_10_0[] = {
> { "ramfb", "use-legacy-x86-rom", "true"},
> { "vfio-pci-nohotplug", "use-legacy-x86-rom", "true" },
> { "migration", "arch-cap-always-on", "true" },
> + { "migration", "pdcm-on-even-without-pmu", "true" },
> };
> const size_t hw_compat_10_0_len = G_N_ELEMENTS(hw_compat_10_0);
>
> diff --git a/migration/migration.h b/migration/migration.h
> index 5124ff3636..7d5b2aa042 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -522,6 +522,17 @@ struct MigrationState {
> * to a host with QEMU 10.1 with error : missing feature
> arch-capabilities
> */
> bool arch_cap_always_on;
> +
> + /*
> + * This variable allows to keep the backward compatibility with
> QEMU (<10.1)
> + * on the pdcm feature detection. The pdcm feature should be
> disabled when
> + * PMU is not available. Prior to 10.1, there is a bug and pdcm can
> still be
> + * enabled even if PMU is off. This behavior has been fixed by the
> commit
> + * e68ec29 (since 10.1).
> + * This new behavior breaks migration of VMs that expect, with the
> QEMU
> + * (since 10.1), pdcm to be disabled.
> + */
> + bool pdcm_on_even_without_pmu;
> };
>
> void migrate_set_state(MigrationStatus *state, MigrationStatus
> old_state,
> diff --git a/migration/options.c b/migration/options.c
> index 3a80dba9c5..a2a95dfcc4 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -206,6 +206,9 @@ const Property migration_properties[] = {
>
> DEFINE_PROP_BOOL("arch-cap-always-on", MigrationState,
> arch_cap_always_on, false),
> +
> + DEFINE_PROP_BOOL("pdcm-on-even-without-pmu", MigrationState,
Hi Hector,
Missing space before '('
> + pdcm_on_even_without_pmu, false),
> };
> const size_t migration_properties_count =
> ARRAY_SIZE(migration_properties);
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 6d85149e6e..1f0f2c8dbf 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -51,6 +51,8 @@
> #include "disas/capstone.h"
> #include "cpu-internal.h"
>
> +#include "migration/migration.h"
> +
> static void x86_cpu_realizefn(DeviceState *dev, Error **errp);
> static void x86_cpu_get_supported_cpuid(uint32_t func, uint32_t index,
> uint32_t *eax, uint32_t *ebx,
> @@ -7839,6 +7841,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t
> index, uint32_t count,
> uint32_t signature[3];
> X86CPUTopoInfo *topo_info = &env->topo_info;
> uint32_t threads_per_pkg;
> + MigrationState *ms = migrate_get_current();
>
> threads_per_pkg = x86_threads_per_pkg(topo_info);
>
> @@ -7894,6 +7897,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t
> index, uint32_t count,
> /* Fixup overflow: max value for bits 23-16 is 255. */
> *ebx |= MIN(num, 255) << 16;
> }
> + if (ms->pdcm_on_even_without_pmu) {
Please move '{' to the next line.
> + if (!cpu->enable_pmu) {
Same as above!
> + *ecx &= ~CPUID_EXT_PDCM;
> + }
> + }
> break;
> case 2: { /* cache info: needed for Pentium Pro compatibility */
> const CPUCaches *caches;
> @@ -8892,6 +8900,7 @@ void x86_cpu_expand_features(X86CPU *cpu, Error
> **errp)
> FeatureWord w;
> int i;
> GList *l;
> + MigrationState *ms = migrate_get_current();
>
> for (l = plus_features; l; l = l->next) {
> const char *prop = l->data;
> @@ -8944,9 +8953,11 @@ void x86_cpu_expand_features(X86CPU *cpu, Error
> **errp)
> }
> }
>
> - /* PDCM is fixed1 bit for TDX */
> - if (!cpu->enable_pmu && !is_tdx_vm()) {
> - env->features[FEAT_1_ECX] &= ~CPUID_EXT_PDCM;
> + if (!ms->pdcm_on_even_without_pmu) {
Same as above!
> + /* PDCM is fixed1 bit for TDX */
> + if (!cpu->enable_pmu && !is_tdx_vm()) {
Same as above!
Thank you!
Regards,
Avnish Chouhan
> + env->features[FEAT_1_ECX] &= ~CPUID_EXT_PDCM;
> + }
> }
>
> for (i = 0; i < ARRAY_SIZE(feature_dependencies); i++) {
> --
> 2.45.2
>
>
* Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
@ 2025-09-04 14:35 Hector Cao
2025-09-10 8:24 ` [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities Hector Cao
0 siblings, 1 reply; 5+ messages in thread
From: Hector Cao @ 2025-09-04 14:35 UTC (permalink / raw)
To: Christian Ehrhardt
Cc: Paolo Bonzini, Daniel P. Berrangé, Xiaoyao Li, Zhao Liu,
qemu-devel
Hello,
In addition to my previous mail describing the issue on different Ubuntu
releases, I went further and tested upstream QEMU directly at HEAD
(baa79455fa92984ff0f4b9ae94bed66823177a27).
As the starting version for the migration, I took the fairly recent
v10.0.x releases to keep the version gap small.
I can reproduce the following migration failures:
v10.0.2 -> HEAD:
error: operation failed: guest CPU doesn't match specification:
missing features: pdcm,arch-capabilities
v10.0.3 -> HEAD:
error: operation failed: guest CPU doesn't match specification:
missing features: pdcm
The arch-capabilities error is no longer present because v10.0.3, like
HEAD, also includes [2].
If I revert the two commits [1] and [2] in HEAD, the migration works fine:
v10.0.2 -> HEAD (+reverts):
OK
[1] Revert "i386/cpu: Move adjustment of CPUID_EXT_PDCM before
feature_dependencies[] check"
This reverts commit e68ec2980901c8e7f948f3305770962806c53f0b.
[2] Revert "target/i386: do not expose ARCH_CAPABILITIES on AMD CPU"
This reverts commit d3a24134e37d57abd3e7445842cda2717f49e96d.
Since this issue is blocking us for the Ubuntu 25.10 release, could you
please provide feedback on the best path forward?
On Wed, Sep 3, 2025 at 10:38 AM Christian Ehrhardt <
christian.ehrhardt@canonical.com> wrote:
> On Wed, Aug 20, 2025 at 7:11 AM Christian Ehrhardt
> <christian.ehrhardt@canonical.com> wrote:
> >
> > On Tue, Aug 19, 2025 at 4:51 PM Paolo Bonzini <pbonzini@redhat.com>
> wrote:
> > >
> > > On 8/6/25 21:18, Daniel P. Berrangé wrote:
> > > > On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
> > > >> On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <
> berrange@redhat.com> wrote:
> > > >>>
> > > >>> On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
> > > >>>> Hi,
> > > >>>> I was unsure if this would be better sent to libvirt or qemu - the
> > > >>>> issue is somewhere between libvirt modelling CPUs and qemu 10.1
> > > >>>> behaving differently. I did not want to double post and gladly
> most of
> > > >>>> the people are on both lists - since the switch in/out of the
> problem
> > > >>>> is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for
> not yet
> > > >>>> having all the answers, I'm sure I could find more with
> debugging, but
> > > >>>> I also wanted to report early for your awareness while we are
> still in
> > > >>>> the RC phase.
> > > >>>>
> > > >>>>
> > > >>>> # Problem
> > > >>>>
> > > >>>> What I found when testing migrations in Ubuntu with qemu 10.1-rc1
> was:
> > > >>>> error: operation failed: guest CPU doesn't match specification:
> > > >>>> missing features: pdcm
> > > >>>>
> > > >>>> This is behaving the same with libvirt 11.4 or the more recent
> 11.6.
> > > >>>> But switching back to qemu 10.0 confirmed that this behavior is
> new
> > > >>>> with qemu 10.1-rc.
> > > >>>
> > > >>>
> > > >>>> Without yet having any hard evidence against them I found a few
> pdcm
> > > >>>> related commits between 10.0 and 10.1-rc1:
> > > >>>> 7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
> > > >>>> 00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not
> available
> > > >>>> e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
> > > >>>> feature_dependencies[] check
> > > >>>> 0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
> > > >>>>
> > > >>>>
> > > >>>> # Caveat
> > > >>>>
> > > >>>> My test environment is in LXD system containers, that gives me
> issues
> > > >>>> in the power management detection
> > > >>>> libvirtd[406]: error from service:
> GDBus.Error:System.Error.EROFS:
> > > >>>> Read-only file system
> > > >>>> libvirtd[406]: Failed to get host power management capabilities
> > > >>>
> > > >>> That's harmless.
> > > >>
> > > >> Yeah, it always was for me - thanks for confirming.
> > > >>
> > > >>>> And the resulting host-model on a rather old test server will
> therefore have:
> > > >>>> <cpu mode='custom' match='exact' check='full'>
> > > >>>> <model fallback='forbid'>Haswell-noTSX-IBRS</model>
> > > >>>> <vendor>Intel</vendor>
> > > >>>> <feature policy='require' name='vmx'/>
> > > >>>> <feature policy='disable' name='pdcm'/>
> > > >>>> ...
> > > >>>>
> > > >>>> But that was fine in the past, and the behavior started to break
> > > >>>> save/restore or migrations just now with the new qemu 10.1-rc.
> > > >>>>
> > > >>>> # Next steps
> > > >>>>
> > > >>>> I'm soon overwhelmed by meetings for the rest of the day, but
> would be
> > > >>>> curious if one has a suggestion about what to look at next for
> > > >>>> debugging or a theory about what might go wrong. If nothing else
> comes
> > > >>>> up I'll try to set up a bisect run tomorrow.
> > > >>>
> > > >>> Yeah, git bisect is what I'd start with.
> > > >>
> > > >> Bisect complete, identified this commit
> > > >>
> > > >> commit 00268e00027459abede448662f8794d78eb4b0a4
> > > >> Author: Xiaoyao Li <xiaoyao.li@intel.com>
> > > >> Date: Tue Mar 4 00:24:50 2025 -0500
> > > >>
> > > >> i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> > > >>
> > > >> When user requests PDCM explicitly via "+pdcm" without PMU
> enabled, emit
> > > >> a warning to inform the user.
> > > >>
> > > >> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > > >> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> > > >> Link:
> https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao.li@intel.com
> > > >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > >>
> > > >> target/i386/cpu.c | 3 +++
> > > >> 1 file changed, 3 insertions(+)
> > > >>
> > > >>
> > > >>
> > > >> Which is odd as it should only add a warning right?
> > > >
> > > > No, that commit message is misleading.
> > > >
> > > > IIUC mark_unavailable_features() actively blocks usage of the
> feature,
> > > > so it is a functional change, not merely a emitting warning.
> > > >
> > > > It makes me wonder if that commit was actually intended to block the
> > > > feature or not, vs merely warning ? CC'ing those involved in the
> > > > commit.
> > > We can revert the commit. I'll send the revert to Stefan and let him
> > > decide whether to include it in 10.1-rc4 or delay to 10.2 and 10.1.1.
> >
> > Thanks Paolo for considering that.
> >
> > My steps to reproduce seemed really clear and are 100% reproducible
> > for me, but no one so far said "yeah they see it too", so I'm getting
> > unsure if it was not tried by anyone else or if there is more to it
> > than we yet know.
> > Further I tested more with the commit reverted, and found that at
> > least cross version migrations (9.2 -> 10.1) still have issues that
> > seem related - complaining about pdcm as missing feature.
> > But that was in a log of a test system that went away and ... you know
> > how these things can sometimes be, that new result is not yet very
> > reliable.
> >
> > I intended to check the following matrix more deeply again with and
> > without the reverted change and then come back to this thread:
> >
> > #1 Compare platforms
> > - Migrating between non containerized hosts to verify if they are
> > affected as well
> > - Power management explicitly switched off/on (vs the auto detect of
> > host-model) in the guest XML
> > #2 Retest the different Use-cases I've seen this pop up
> > - 10.1 managed save (broken unless reverting the commit that was
> identified)
> > - 9.2 -> 10.1 migration (seems broken even with the revert)
>
> I need to come back to this aspect of it - the cross release or cross
> qemu version migrations.
>
> Hector (on CC) helps me on that now - sadly we were able to confirm
> that migrations from older qemu versions no longer work.
> Yep 10.1 is released by now so it might end up as "The problem is what
> happens when we detect after we have done a release that something has
> gone wrong" from [2].
> But I still can't believe only we see this and therefore for now want
> to believe I messed up on our side when merging 10.1 :-)
>
> For now this is a call if others have also seen any older release
> migrating to 10.1 to throw:
> error: operation failed: guest CPU doesn't match specification:
> missing features: pdcm,arch-capabilities
>
> Hector will later today reply here with a summary of what we found so
> far, to provide you a more complete picture to think about, without
> having to read through all the messy interim steps in the Ubuntu bug.
>
> [1]: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2121787
> [2]:
> https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/compatibility.rst?plain=1#L322
>
> > The hope was that these will help to further identify what is going
> > on, but despite the urgency of the release being imminent I have not
> > yet managed to find the time in the last two days :-/
> >
> > > Sorry for the delay in answering (and thanks Daniel for bringing this
> to
> > > my attention).
> > >
> > > Thanks,
> > >
> > > Paolo
> > >
> >
> >
> > --
> > Christian Ehrhardt
> > Director of Engineering, Ubuntu Server
> > Canonical Ltd
>
>
>
> --
> Christian Ehrhardt
> Director of Engineering, Ubuntu Server
> Canonical Ltd
>
--
Hector CAO
Software Engineer – Partner Engineering Team
hector.cao@canonical.com
https://launchpad.net/~hectorcao
* [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities
2025-09-04 14:35 Issues with pdcm in qemu 10.1-rc on migration and save/restore Hector Cao
@ 2025-09-10 8:24 ` Hector Cao
2025-09-10 8:24 ` [RFC PATCH 2/2] target/i386: add compatibility property for pdcm feature Hector Cao
0 siblings, 1 reply; 5+ messages in thread
From: Hector Cao @ 2025-09-10 8:24 UTC (permalink / raw)
To: grub-devel
Hello,
Since it is a blocking issue for us, we went further and ended up with a solution along the lines of [1]
that allows us to get out of this situation.
The idea is to add compatibility properties that restore the legacy behaviors for machine types
from older versions of QEMU (<10.1). Two compatibility properties have been added, one for each
of the two missing features, each in a separate patch.
We know that 10.1 has been released and is final, but working on a solution towards 11.0
would allow everyone to settle on the fix and even consider backporting it where a release is still pending,
like Ubuntu 25.10 for us.
It is important to have upstream support going forward, in this or any other way,
so we are reaching out with this RFC to ask you to think about it with us.
[1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/compatibility.rst
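As a rough illustration of the compatibility-property idea, the sketch below resolves a property value from a per-version override table. The two property names come from the patches in this series; everything else (`demo_compat_prop`, `demo_compat_10_0`, `demo_prop_value`, the `legacy_machine` flag) is a hypothetical simplification and not how QEMU actually applies `hw_compat_*` arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical {driver, property, value} triple, shaped like a compat entry. */
struct demo_compat_prop {
    const char *driver;
    const char *property;
    const char *value;
};

/* Overrides that would apply to machine types older than 10.1. */
static const struct demo_compat_prop demo_compat_10_0[] = {
    { "migration", "arch-cap-always-on", "true" },
    { "migration", "pdcm-on-even-without-pmu", "true" },
};

/* Returns the override for a legacy machine type, else the default value. */
static const char *demo_prop_value(bool legacy_machine, const char *driver,
                                   const char *property,
                                   const char *default_value)
{
    size_t i;

    assert(driver != NULL && property != NULL);
    if (!legacy_machine) {
        return default_value;
    }
    for (i = 0; i < sizeof(demo_compat_10_0) / sizeof(demo_compat_10_0[0]); i++) {
        const struct demo_compat_prop *p = &demo_compat_10_0[i];

        if (strcmp(p->driver, driver) == 0
            && strcmp(p->property, property) == 0) {
            return p->value;
        }
    }
    return default_value;
}
```

The point of the table approach is that new machine types keep the fixed behavior by default, while only guests started with an older machine type get the legacy values.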
Hector Cao (2):
target/i386: add compatibility property for arch_capabilities
target/i386: add compatibility property for pdcm feature
hw/core/machine.c | 2 ++
migration/migration.h | 23 +++++++++++++++++++++++
migration/options.c | 6 ++++++
target/i386/cpu.c | 17 ++++++++++++++---
target/i386/kvm/kvm.c | 5 ++++-
5 files changed, 49 insertions(+), 4 deletions(-)
--
2.45.2
* [RFC PATCH 2/2] target/i386: add compatibility property for pdcm feature
2025-09-10 8:24 ` [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities Hector Cao
@ 2025-09-10 8:24 ` Hector Cao
0 siblings, 0 replies; 5+ messages in thread
From: Hector Cao @ 2025-09-10 8:24 UTC (permalink / raw)
To: grub-devel
The pdcm feature is supposed to be disabled when PMU is not
available. Up until v10.1, pdcm feature is enabled even when PMU
is off. This behavior has been fixed but this change breaks the
migration of VMs that are run with QEMU < 10.1 and expect the pdcm
feature to be enabled on the destination host.
This commit restores the legacy behavior for machines with version
prior to 10.1 to allow the migration from older QEMU to QEMU 10.1.
Signed-off-by: Hector Cao <hector.cao@canonical.com>
---
hw/core/machine.c | 1 +
migration/migration.h | 11 +++++++++++
migration/options.c | 3 +++
target/i386/cpu.c | 17 ++++++++++++++---
4 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 8ad5d79cb3..535184c221 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -46,6 +46,7 @@ GlobalProperty hw_compat_10_0[] = {
{ "ramfb", "use-legacy-x86-rom", "true"},
{ "vfio-pci-nohotplug", "use-legacy-x86-rom", "true" },
{ "migration", "arch-cap-always-on", "true" },
+ { "migration", "pdcm-on-even-without-pmu", "true" },
};
const size_t hw_compat_10_0_len = G_N_ELEMENTS(hw_compat_10_0);
diff --git a/migration/migration.h b/migration/migration.h
index 5124ff3636..7d5b2aa042 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -522,6 +522,17 @@ struct MigrationState {
* to a host with QEMU 10.1 with error : missing feature arch-capabilities
*/
bool arch_cap_always_on;
+
+ /*
+ * This variable allows to keep the backward compatibility with QEMU (<10.1)
+ * on the pdcm feature detection. The pdcm feature should be disabled when
+ * PMU is not available. Prior to 10.1, there is a bug and pdcm can still be
+ * enabled even if PMU is off. This behavior has been fixed by the commit
+ * e68ec29 (since 10.1).
+ * This new behavior breaks migration of VMs that expect, with the QEMU
+ * (since 10.1), pdcm to be disabled.
+ */
+ bool pdcm_on_even_without_pmu;
};
void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
diff --git a/migration/options.c b/migration/options.c
index 3a80dba9c5..a2a95dfcc4 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -206,6 +206,9 @@ const Property migration_properties[] = {
DEFINE_PROP_BOOL("arch-cap-always-on", MigrationState,
arch_cap_always_on, false),
+
+ DEFINE_PROP_BOOL("pdcm-on-even-without-pmu", MigrationState,
+ pdcm_on_even_without_pmu, false),
};
const size_t migration_properties_count = ARRAY_SIZE(migration_properties);
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6d85149e6e..1f0f2c8dbf 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -51,6 +51,8 @@
#include "disas/capstone.h"
#include "cpu-internal.h"
+#include "migration/migration.h"
+
static void x86_cpu_realizefn(DeviceState *dev, Error **errp);
static void x86_cpu_get_supported_cpuid(uint32_t func, uint32_t index,
uint32_t *eax, uint32_t *ebx,
@@ -7839,6 +7841,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
uint32_t signature[3];
X86CPUTopoInfo *topo_info = &env->topo_info;
uint32_t threads_per_pkg;
+ MigrationState *ms = migrate_get_current();
threads_per_pkg = x86_threads_per_pkg(topo_info);
@@ -7894,6 +7897,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
/* Fixup overflow: max value for bits 23-16 is 255. */
*ebx |= MIN(num, 255) << 16;
}
+ if (ms->pdcm_on_even_without_pmu) {
+ if (!cpu->enable_pmu) {
+ *ecx &= ~CPUID_EXT_PDCM;
+ }
+ }
break;
case 2: { /* cache info: needed for Pentium Pro compatibility */
const CPUCaches *caches;
@@ -8892,6 +8900,7 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
FeatureWord w;
int i;
GList *l;
+ MigrationState *ms = migrate_get_current();
for (l = plus_features; l; l = l->next) {
const char *prop = l->data;
@@ -8944,9 +8953,11 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
}
}
- /* PDCM is fixed1 bit for TDX */
- if (!cpu->enable_pmu && !is_tdx_vm()) {
- env->features[FEAT_1_ECX] &= ~CPUID_EXT_PDCM;
+ if (!ms->pdcm_on_even_without_pmu) {
+ /* PDCM is fixed1 bit for TDX */
+ if (!cpu->enable_pmu && !is_tdx_vm()) {
+ env->features[FEAT_1_ECX] &= ~CPUID_EXT_PDCM;
+ }
}
for (i = 0; i < ARRAY_SIZE(feature_dependencies); i++) {
--
2.45.2