From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
"Zhao Liu" <zhao1.liu@intel.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
qemu-devel <qemu-devel@nongnu.org>
Subject: Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
Date: Thu, 7 Aug 2025 16:09:02 +0800
Message-ID: <da824dc2-c241-4858-a233-6253b6b62926@intel.com>
In-Reply-To: <CAATJJ0Jpn8VMRDOFuk7VaV5jC3tj0V1817OiRa6tH3x1OtYFSQ@mail.gmail.com>
On 8/7/2025 2:37 PM, Christian Ehrhardt wrote:
> On Thu, Aug 7, 2025 at 5:38 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>
>> On 8/7/2025 3:18 AM, Daniel P. Berrangé wrote:
>>> On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
>>>> On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>>>>>
>>>>> On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
>>>>>> Hi,
>>>>>> I was unsure if this would be better sent to libvirt or qemu - the
>>>>>> issue is somewhere between libvirt modelling CPUs and qemu 10.1
>>>>>> behaving differently. I did not want to double post and gladly most of
>>>>>> the people are on both lists - since the switch in/out of the problem
>>>>>> is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for not yet
>>>>>> having all the answers, I'm sure I could find more with debugging, but
>>>>>> I also wanted to report early for your awareness while we are still in
>>>>>> the RC phase.
>>>>>>
>>>>>>
>>>>>> # Problem
>>>>>>
>>>>>> What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
>>>>>> error: operation failed: guest CPU doesn't match specification:
>>>>>> missing features: pdcm
>>>>>>
>>>>>> This is behaving the same with libvirt 11.4 or the more recent 11.6.
>>>>>> But switching back to qemu 10.0 confirmed that this behavior is new
>>>>>> with qemu 10.1-rc.
>>>>>
>>>>>
>>>>>> Without yet having any hard evidence against them I found a few pdcm
>>>>>> related commits between 10.0 and 10.1-rc1:
>>>>>> 7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
>>>>>> 00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
>>>>>> e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
>>>>>> feature_dependencies[] check
>>>>>> 0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
>>>>>>
>>>>>>
>>>>>> # Caveat
>>>>>>
>>>>>> My test environment is in LXD system containers, that gives me issues
>>>>>> in the power management detection
>>>>>> libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS:
>>>>>> Read-only file system
>>>>>> libvirtd[406]: Failed to get host power management capabilities
>>>>>
>>>>> That's harmless.
>>>>
>>>> Yeah, it always was for me - thanks for confirming.
>>>>
>>>>>> And the resulting host-model on a rather old test server will therefore have:
>>>>>> <cpu mode='custom' match='exact' check='full'>
>>>>>> <model fallback='forbid'>Haswell-noTSX-IBRS</model>
>>>>>> <vendor>Intel</vendor>
>>>>>> <feature policy='require' name='vmx'/>
>>>>>> <feature policy='disable' name='pdcm'/>
>>>>>> ...
>>>>>>
>>>>>> But that was fine in the past, and the behavior started to break
>>>>>> save/restore or migrations just now with the new qemu 10.1-rc.
>>>>>>
>>>>>> # Next steps
>>>>>>
>>>>>> I'm soon overwhelmed by meetings for the rest of the day, but would be
>>>>>> curious if one has a suggestion about what to look at next for
>>>>>> debugging or a theory about what might go wrong. If nothing else comes
>>>>>> up I'll try to set up a bisect run tomorrow.
>>>>>
>>>>> Yeah, git bisect is what I'd start with.
>>>>
>>>> Bisect complete, identified this commit
>>>>
>>>> commit 00268e00027459abede448662f8794d78eb4b0a4
>>>> Author: Xiaoyao Li <xiaoyao.li@intel.com>
>>>> Date: Tue Mar 4 00:24:50 2025 -0500
>>>>
>>>> i386/cpu: Warn about why CPUID_EXT_PDCM is not available
>>>>
>>>> When user requests PDCM explicitly via "+pdcm" without PMU enabled, emit
>>>> a warning to inform the user.
>>>>
>>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
>>>> Link: https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao.li@intel.com
>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>
>>>> target/i386/cpu.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>>
>>>>
>>>> Which is odd as it should only add a warning right?
>>>
>>> No, that commit message is misleading.
>>>
>>> IIUC mark_unavailable_features() actively blocks usage of the feature,
>>> so it is a functional change, not merely emitting a warning.
>>>
>>> It makes me wonder if that commit was actually intended to block the
>>> feature or not, vs merely warning ? CC'ing those involved in the
>>> commit.
>>
>> The intention was to print a warning to tell users PDCM cannot be
>> enabled if pmu is not enabled. However, mark_unavailable_features() also
>> has the effect of setting the bit in cpu->filtered_features[].
>>
>> But the feature is masked off anyway
>
> Right - it was disabled right from the beginning.
> As I reported libvirt detected it as not available and constructed the
> CPU as with it disabled.
> Which translated it into -cpu ...,pdcm=off,...
>
> The new and bad aspect we need to overcome is that in these conditions
> this now somehow breaks save/restore and migration operations.

Commit 00268e0002 makes a difference only for the case "-cpu
xxx,pdcm=on" without "pmu=on": there it emits a warning and sets PDCM
in cpu->filtered_features[].

So libvirt must first have requested "-cpu xxx,pdcm=on" without "pmu=on"
and got back the result that PDCM is filtered (i.e. set in
cpu->filtered_features[]).

This is indeed a behavior change. Before the commit, "-cpu
xxx,pdcm=on" without "pmu=on" produced neither a warning nor a PDCM bit
in cpu->filtered_features[]; PDCM was simply not set in the guest's CPUID.

What I still don't understand is how the warning, or PDCM being set in
cpu->filtered_features[], breaks save/restore and migration.

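
To make the before/after difference concrete, here is a minimal toy model in C. It is only a sketch of the behavior described above, not the code in target/i386/cpu.c: struct toy_cpu, its fields, and both adjust_pdcm_*() helpers are hypothetical names made up for illustration. The only fact taken from the architecture is that PDCM is CPUID.01H:ECX bit 15.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* CPUID.01H:ECX bit 15 is PDCM. */
#define CPUID_EXT_PDCM (1U << 15)

/* Hypothetical, simplified stand-in for the vCPU state discussed above. */
struct toy_cpu {
    bool enable_pmu;          /* models "pmu=on" / "pmu=off" */
    uint32_t user_1_ecx;      /* bits the user asked for explicitly (pdcm=on) */
    uint32_t features_1_ecx;  /* models env->features[FEAT_1_ECX] */
    uint32_t filtered_1_ecx;  /* models cpu->filtered_features[FEAT_1_ECX] */
};

/* Before the commit: PDCM is silently masked off when the PMU is disabled. */
static void adjust_pdcm_old(struct toy_cpu *cpu)
{
    if (!cpu->enable_pmu) {
        cpu->features_1_ecx &= ~CPUID_EXT_PDCM;
    }
}

/*
 * After the commit (as described in this thread): if PDCM was explicitly
 * requested without the PMU, warn and record the bit as filtered, then
 * mask it off exactly as before.
 */
static void adjust_pdcm_new(struct toy_cpu *cpu)
{
    if (!cpu->enable_pmu) {
        if (cpu->user_1_ecx & CPUID_EXT_PDCM) {
            fprintf(stderr, "warning: pdcm not available, PMU is disabled\n");
            cpu->filtered_1_ecx |= CPUID_EXT_PDCM;
        }
        cpu->features_1_ecx &= ~CPUID_EXT_PDCM;
    }
}
```

In this model, a vCPU with pdcm=on and pmu=off ends up with PDCM cleared in features_1_ecx under both helpers; only adjust_pdcm_new() additionally leaves the bit set in filtered_1_ecx. With pdcm=off, the two helpers behave identically.
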
> As a cross-check I reverted just and only 00268e0002 on top of
> 10.1-rc2 and these use cases work again.
>
>> even without the
>> mark_unavailable_features():
>>
>> env->features[FEAT_1_ECX] &= ~CPUID_EXT_PDCM;
>>
>> So is it that PDCM is set in cpu->filtered_features[] causing the problem?
>>
>>> With regards,
>>> Daniel
>>
>
>