qemu-devel.nongnu.org archive mirror
* Issues with pdcm in qemu 10.1-rc on migration and save/restore
@ 2025-08-06 11:52 Christian Ehrhardt
  2025-08-06 12:00 ` Daniel P. Berrangé
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Ehrhardt @ 2025-08-06 11:52 UTC (permalink / raw)
  To: qemu-devel

Hi,
I was unsure whether this would be better sent to libvirt or qemu - the
issue is somewhere between libvirt modelling CPUs and qemu 10.1
behaving differently. I did not want to double post, and fortunately
most of the people are on both lists. Since the problem appears and
disappears with the switch qemu 10.0 <-> 10.1, let me start here. I beg
your pardon for not yet having all the answers. I'm sure I could find
more with debugging, but I also wanted to report early for your
awareness while we are still in the RC phase.


# Problem

What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
  error: operation failed: guest CPU doesn't match specification:
missing features: pdcm

This is behaving the same with libvirt 11.4 or the more recent 11.6.
But switching back to qemu 10.0 confirmed that this behavior is new
with qemu 10.1-rc.

To allow you to have a look I isolated it from the test automation and
simplified it to use save/restore which allows you to see it on just
one machine.


# Steps to reproduce

$ cat testguest.xml
<domain type='kvm'>
<name>testguest</name>
<memory unit='KiB'>524288</memory>
<currentMemory unit='KiB'>524288</currentMemory>
<os>
<type arch='x86_64' machine='pc-q35-10.0'>hvm</type>
</os>
<cpu mode='host-model' check='partial'/>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
</devices>
</domain>

$ virsh define testguest.xml
Domain 'testguest' defined from testguest.xml

$ virsh start testguest
Domain 'testguest' started

$ virsh managedsave testguest
Domain 'testguest' state saved by libvirt

$ virsh start testguest
error: Failed to start domain 'testguest'
error: operation failed: guest CPU doesn't match specification:
missing features: pdcm

Without yet having any hard evidence against them I found a few pdcm
related commits between 10.0 and 10.1-rc1:
  7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
  00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
  e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
feature_dependencies[] check
  0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs


# Caveat

My test environment is in LXD system containers, which gives me issues
in the power management detection:
  libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS:
Read-only file system
  libvirtd[406]: Failed to get host power management capabilities

And the resulting host-model on a rather old test server will therefore have:
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Haswell-noTSX-IBRS</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='vmx'/>
    <feature policy='disable' name='pdcm'/>
     ...

But that was fine in the past, and the behavior started to break
save/restore or migrations just now with the new qemu 10.1-rc.


# Next steps

I'm soon overwhelmed by meetings for the rest of the day, but would be
curious if anyone has a suggestion about what to look at next for
debugging or a theory about what might go wrong. If nothing else comes
up I'll try to set up a bisect run tomorrow.

-- 
Christian Ehrhardt
Director of Engineering, Ubuntu Server
Canonical Ltd



* Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
  2025-08-06 11:52 Issues with pdcm in qemu 10.1-rc on migration and save/restore Christian Ehrhardt
@ 2025-08-06 12:00 ` Daniel P. Berrangé
  2025-08-06 17:57   ` Christian Ehrhardt
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel P. Berrangé @ 2025-08-06 12:00 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: qemu-devel

On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
> Hi,
> I was unsure if this would be better sent to libvirt or qemu - the
> issue is somewhere between libvirt modelling CPUs and qemu 10.1
> behaving differently. I did not want to double post and gladly most of
> the people are on both lists - since the switch in/out of the problem
> is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for not yet
> having all the answers, I'm sure I could find more with debugging, but
> I also wanted to report early for your awareness while we are still in
> the RC phase.
> 
> 
> # Problem
> 
> What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
>   error: operation failed: guest CPU doesn't match specification:
> missing features: pdcm
> 
> This is behaving the same with libvirt 11.4 or the more recent 11.6.
> But switching back to qemu 10.0 confirmed that this behavior is new
> with qemu 10.1-rc.


> Without yet having any hard evidence against them I found a few pdcm
> related commits between 10.0 and 10.1-rc1:
>   7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
>   00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
>   e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
> feature_dependencies[] check
>   0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
> 
> 
> # Caveat
> 
> My test environment is in LXD system containers, that gives me issues
> in the power management detection
>   libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS:
> Read-only file system
>   libvirtd[406]: Failed to get host power management capabilities

That's harmless.

> And the resulting host-model on a  rather old test server will therefore have:
>   <cpu mode='custom' match='exact' check='full'>
>     <model fallback='forbid'>Haswell-noTSX-IBRS</model>
>     <vendor>Intel</vendor>
>     <feature policy='require' name='vmx'/>
>     <feature policy='disable' name='pdcm'/>
>      ...
> 
> But that was fine in the past, and the behavior started to break
> save/restore or migrations just now with the new qemu 10.1-rc.
> 
> # Next steps
> 
> I'm soon overwhelmed by meetings for the rest of the day, but would be
> curious if one has a suggestion about what to look at next for
> debugging or a theory about what might go wrong. If nothing else comes
> up I'll try to set up a bisect run tomorrow.

Yeah, git bisect is what I'd start with.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
  2025-08-06 12:00 ` Daniel P. Berrangé
@ 2025-08-06 17:57   ` Christian Ehrhardt
  2025-08-06 19:18     ` Daniel P. Berrangé
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Ehrhardt @ 2025-08-06 17:57 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel

On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
> > Hi,
> > I was unsure if this would be better sent to libvirt or qemu - the
> > issue is somewhere between libvirt modelling CPUs and qemu 10.1
> > behaving differently. I did not want to double post and gladly most of
> > the people are on both lists - since the switch in/out of the problem
> > is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for not yet
> > having all the answers, I'm sure I could find more with debugging, but
> > I also wanted to report early for your awareness while we are still in
> > the RC phase.
> >
> >
> > # Problem
> >
> > What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
> >   error: operation failed: guest CPU doesn't match specification:
> > missing features: pdcm
> >
> > This is behaving the same with libvirt 11.4 or the more recent 11.6.
> > But switching back to qemu 10.0 confirmed that this behavior is new
> > with qemu 10.1-rc.
>
>
> > Without yet having any hard evidence against them I found a few pdcm
> > related commits between 10.0 and 10.1-rc1:
> >   7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
> >   00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> >   e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
> > feature_dependencies[] check
> >   0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
> >
> >
> > # Caveat
> >
> > My test environment is in LXD system containers, that gives me issues
> > in the power management detection
> >   libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS:
> > Read-only file system
> >   libvirtd[406]: Failed to get host power management capabilities
>
> That's harmless.

Yeah, it always was for me - thanks for confirming.

> > And the resulting host-model on a  rather old test server will therefore have:
> >   <cpu mode='custom' match='exact' check='full'>
> >     <model fallback='forbid'>Haswell-noTSX-IBRS</model>
> >     <vendor>Intel</vendor>
> >     <feature policy='require' name='vmx'/>
> >     <feature policy='disable' name='pdcm'/>
> >      ...
> >
> > But that was fine in the past, and the behavior started to break
> > save/restore or migrations just now with the new qemu 10.1-rc.
> >
> > # Next steps
> >
> > I'm soon overwhelmed by meetings for the rest of the day, but would be
> > curious if one has a suggestion about what to look at next for
> > debugging or a theory about what might go wrong. If nothing else comes
> > up I'll try to set up a bisect run tomorrow.
>
> Yeah, git bisect is what I'd start with.

Bisect complete, identified this commit

commit 00268e00027459abede448662f8794d78eb4b0a4
Author: Xiaoyao Li <xiaoyao.li@intel.com>
Date:   Tue Mar 4 00:24:50 2025 -0500

    i386/cpu: Warn about why CPUID_EXT_PDCM is not available

    When user requests PDCM explicitly via "+pdcm" without PMU enabled, emit
    a warning to inform the user.

    Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
    Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
    Link: https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao.li@intel.com
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

 target/i386/cpu.c | 3 +++
 1 file changed, 3 insertions(+)



Which is odd, as it should only add a warning, right?
But I checked the logs - the build on "e68ec29809 i386/cpu: Move
adjustment of CPUID_EXT_PDCM before feature_dependencies[] check"
passed the same use case.
I'll build both outside of the bisect run tomorrow to make sure this is
reproducible when I watch it more closely than a bisect script does.
Maybe this already helps to point your eyes and thoughts in the right direction.

> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>


-- 
Christian Ehrhardt
Director of Engineering, Ubuntu Server
Canonical Ltd



* Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
  2025-08-06 17:57   ` Christian Ehrhardt
@ 2025-08-06 19:18     ` Daniel P. Berrangé
  2025-08-07  3:38       ` Xiaoyao Li
  2025-08-19 14:51       ` Paolo Bonzini
  0 siblings, 2 replies; 25+ messages in thread
From: Daniel P. Berrangé @ 2025-08-06 19:18 UTC (permalink / raw)
  To: Christian Ehrhardt, Xiaoyao Li, Zhao Liu, Paolo Bonzini; +Cc: qemu-devel

On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
> On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
> > > Hi,
> > > I was unsure if this would be better sent to libvirt or qemu - the
> > > issue is somewhere between libvirt modelling CPUs and qemu 10.1
> > > behaving differently. I did not want to double post and gladly most of
> > > the people are on both lists - since the switch in/out of the problem
> > > is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for not yet
> > > having all the answers, I'm sure I could find more with debugging, but
> > > I also wanted to report early for your awareness while we are still in
> > > the RC phase.
> > >
> > >
> > > # Problem
> > >
> > > What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
> > >   error: operation failed: guest CPU doesn't match specification:
> > > missing features: pdcm
> > >
> > > This is behaving the same with libvirt 11.4 or the more recent 11.6.
> > > But switching back to qemu 10.0 confirmed that this behavior is new
> > > with qemu 10.1-rc.
> >
> >
> > > Without yet having any hard evidence against them I found a few pdcm
> > > related commits between 10.0 and 10.1-rc1:
> > >   7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
> > >   00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> > >   e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
> > > feature_dependencies[] check
> > >   0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
> > >
> > >
> > > # Caveat
> > >
> > > My test environment is in LXD system containers, that gives me issues
> > > in the power management detection
> > >   libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS:
> > > Read-only file system
> > >   libvirtd[406]: Failed to get host power management capabilities
> >
> > That's harmless.
> 
> Yeah, it always was for me - thanks for confirming.
> 
> > > And the resulting host-model on a  rather old test server will therefore have:
> > >   <cpu mode='custom' match='exact' check='full'>
> > >     <model fallback='forbid'>Haswell-noTSX-IBRS</model>
> > >     <vendor>Intel</vendor>
> > >     <feature policy='require' name='vmx'/>
> > >     <feature policy='disable' name='pdcm'/>
> > >      ...
> > >
> > > But that was fine in the past, and the behavior started to break
> > > save/restore or migrations just now with the new qemu 10.1-rc.
> > >
> > > # Next steps
> > >
> > > I'm soon overwhelmed by meetings for the rest of the day, but would be
> > > curious if one has a suggestion about what to look at next for
> > > debugging or a theory about what might go wrong. If nothing else comes
> > > up I'll try to set up a bisect run tomorrow.
> >
> > Yeah, git bisect is what I'd start with.
> 
> Bisect complete, identified this commit
> 
> commit 00268e00027459abede448662f8794d78eb4b0a4
> Author: Xiaoyao Li <xiaoyao.li@intel.com>
> Date:   Tue Mar 4 00:24:50 2025 -0500
> 
>     i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> 
>     When user requests PDCM explicitly via "+pdcm" without PMU enabled, emit
>     a warning to inform the user.
> 
>     Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>     Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
>     Link: https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao.li@intel.com
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
>  target/i386/cpu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> 
> 
> Which is odd as it should only add a warning right?

No, that commit message is misleading.

IIUC mark_unavailable_features() actively blocks usage of the feature,
so it is a functional change, not merely the emission of a warning.

It makes me wonder whether that commit was actually intended to block
the feature, or merely to warn. CC'ing those involved in the
commit.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
  2025-08-06 19:18     ` Daniel P. Berrangé
@ 2025-08-07  3:38       ` Xiaoyao Li
  2025-08-07  6:37         ` Christian Ehrhardt
  2025-08-19 14:51       ` Paolo Bonzini
  1 sibling, 1 reply; 25+ messages in thread
From: Xiaoyao Li @ 2025-08-07  3:38 UTC (permalink / raw)
  To: Daniel P. Berrangé, Christian Ehrhardt, Zhao Liu,
	Paolo Bonzini
  Cc: qemu-devel

On 8/7/2025 3:18 AM, Daniel P. Berrangé wrote:
> On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
>> On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>>>
>>> On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
>>>> Hi,
>>>> I was unsure if this would be better sent to libvirt or qemu - the
>>>> issue is somewhere between libvirt modelling CPUs and qemu 10.1
>>>> behaving differently. I did not want to double post and gladly most of
>>>> the people are on both lists - since the switch in/out of the problem
>>>> is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for not yet
>>>> having all the answers, I'm sure I could find more with debugging, but
>>>> I also wanted to report early for your awareness while we are still in
>>>> the RC phase.
>>>>
>>>>
>>>> # Problem
>>>>
>>>> What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
>>>>    error: operation failed: guest CPU doesn't match specification:
>>>> missing features: pdcm
>>>>
>>>> This is behaving the same with libvirt 11.4 or the more recent 11.6.
>>>> But switching back to qemu 10.0 confirmed that this behavior is new
>>>> with qemu 10.1-rc.
>>>
>>>
>>>> Without yet having any hard evidence against them I found a few pdcm
>>>> related commits between 10.0 and 10.1-rc1:
>>>>    7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
>>>>    00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
>>>>    e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
>>>> feature_dependencies[] check
>>>>    0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
>>>>
>>>>
>>>> # Caveat
>>>>
>>>> My test environment is in LXD system containers, that gives me issues
>>>> in the power management detection
>>>>    libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS:
>>>> Read-only file system
>>>>    libvirtd[406]: Failed to get host power management capabilities
>>>
>>> That's harmless.
>>
>> Yeah, it always was for me - thanks for confirming.
>>
>>>> And the resulting host-model on a  rather old test server will therefore have:
>>>>    <cpu mode='custom' match='exact' check='full'>
>>>>      <model fallback='forbid'>Haswell-noTSX-IBRS</model>
>>>>      <vendor>Intel</vendor>
>>>>      <feature policy='require' name='vmx'/>
>>>>      <feature policy='disable' name='pdcm'/>
>>>>       ...
>>>>
>>>> But that was fine in the past, and the behavior started to break
>>>> save/restore or migrations just now with the new qemu 10.1-rc.
>>>>
>>>> # Next steps
>>>>
>>>> I'm soon overwhelmed by meetings for the rest of the day, but would be
>>>> curious if one has a suggestion about what to look at next for
>>>> debugging or a theory about what might go wrong. If nothing else comes
>>>> up I'll try to set up a bisect run tomorrow.
>>>
>>> Yeah, git bisect is what I'd start with.
>>
>> Bisect complete, identified this commit
>>
>> commit 00268e00027459abede448662f8794d78eb4b0a4
>> Author: Xiaoyao Li <xiaoyao.li@intel.com>
>> Date:   Tue Mar 4 00:24:50 2025 -0500
>>
>>      i386/cpu: Warn about why CPUID_EXT_PDCM is not available
>>
>>      When user requests PDCM explicitly via "+pdcm" without PMU enabled, emit
>>      a warning to inform the user.
>>
>>      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>      Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
>>      Link: https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao.li@intel.com
>>      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>>   target/i386/cpu.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>>
>>
>> Which is odd as it should only add a warning right?
> 
> No, that commit message is misleading.
> 
> IIUC mark_unavailable_features() actively blocks usage of the feature,
> so it is a functional change, not merely a emitting warning.
> 
> It makes me wonder if that commit was actually intended to block the
> feature or not, vs merely warning ?  CC'ing those involved in the
> commit.

The intention was to print a warning to tell users that PDCM cannot be
enabled if pmu is not enabled. However, mark_unavailable_features() does
have the effect of setting the bit in cpu->filtered_features[].

But the feature is masked off anyway even without the 
mark_unavailable_features():

     env->features[FEAT_1_ECX] &= ~CPUID_EXT_PDCM;

So is it that PDCM is set in cpu->filtered_features[] causing the problem?

> With regards,
> Daniel




* Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
  2025-08-07  3:38       ` Xiaoyao Li
@ 2025-08-07  6:37         ` Christian Ehrhardt
  2025-08-07  8:09           ` Xiaoyao Li
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Ehrhardt @ 2025-08-07  6:37 UTC (permalink / raw)
  To: Xiaoyao Li; +Cc: Daniel P. Berrangé, Zhao Liu, Paolo Bonzini, qemu-devel

On Thu, Aug 7, 2025 at 5:38 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> On 8/7/2025 3:18 AM, Daniel P. Berrangé wrote:
> > On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
> >> On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >>>
> >>> On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
> >>>> Hi,
> >>>> I was unsure if this would be better sent to libvirt or qemu - the
> >>>> issue is somewhere between libvirt modelling CPUs and qemu 10.1
> >>>> behaving differently. I did not want to double post and gladly most of
> >>>> the people are on both lists - since the switch in/out of the problem
> >>>> is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for not yet
> >>>> having all the answers, I'm sure I could find more with debugging, but
> >>>> I also wanted to report early for your awareness while we are still in
> >>>> the RC phase.
> >>>>
> >>>>
> >>>> # Problem
> >>>>
> >>>> What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
> >>>>    error: operation failed: guest CPU doesn't match specification:
> >>>> missing features: pdcm
> >>>>
> >>>> This is behaving the same with libvirt 11.4 or the more recent 11.6.
> >>>> But switching back to qemu 10.0 confirmed that this behavior is new
> >>>> with qemu 10.1-rc.
> >>>
> >>>
> >>>> Without yet having any hard evidence against them I found a few pdcm
> >>>> related commits between 10.0 and 10.1-rc1:
> >>>>    7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
> >>>>    00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> >>>>    e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
> >>>> feature_dependencies[] check
> >>>>    0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
> >>>>
> >>>>
> >>>> # Caveat
> >>>>
> >>>> My test environment is in LXD system containers, that gives me issues
> >>>> in the power management detection
> >>>>    libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS:
> >>>> Read-only file system
> >>>>    libvirtd[406]: Failed to get host power management capabilities
> >>>
> >>> That's harmless.
> >>
> >> Yeah, it always was for me - thanks for confirming.
> >>
> >>>> And the resulting host-model on a  rather old test server will therefore have:
> >>>>    <cpu mode='custom' match='exact' check='full'>
> >>>>      <model fallback='forbid'>Haswell-noTSX-IBRS</model>
> >>>>      <vendor>Intel</vendor>
> >>>>      <feature policy='require' name='vmx'/>
> >>>>      <feature policy='disable' name='pdcm'/>
> >>>>       ...
> >>>>
> >>>> But that was fine in the past, and the behavior started to break
> >>>> save/restore or migrations just now with the new qemu 10.1-rc.
> >>>>
> >>>> # Next steps
> >>>>
> >>>> I'm soon overwhelmed by meetings for the rest of the day, but would be
> >>>> curious if one has a suggestion about what to look at next for
> >>>> debugging or a theory about what might go wrong. If nothing else comes
> >>>> up I'll try to set up a bisect run tomorrow.
> >>>
> >>> Yeah, git bisect is what I'd start with.
> >>
> >> Bisect complete, identified this commit
> >>
> >> commit 00268e00027459abede448662f8794d78eb4b0a4
> >> Author: Xiaoyao Li <xiaoyao.li@intel.com>
> >> Date:   Tue Mar 4 00:24:50 2025 -0500
> >>
> >>      i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> >>
> >>      When user requests PDCM explicitly via "+pdcm" without PMU enabled, emit
> >>      a warning to inform the user.
> >>
> >>      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> >>      Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> >>      Link: https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao.li@intel.com
> >>      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >>
> >>   target/i386/cpu.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >>
> >>
> >> Which is odd as it should only add a warning right?
> >
> > No, that commit message is misleading.
> >
> > IIUC mark_unavailable_features() actively blocks usage of the feature,
> > so it is a functional change, not merely a emitting warning.
> >
> > It makes me wonder if that commit was actually intended to block the
> > feature or not, vs merely warning ?  CC'ing those involved in the
> > commit.
>
> The intention was to print a warning to tell users PDCM cannot be
> enabled if pmu is not enabled. While mark_unavailable_features() does
> has the effect of setting the bit in cpu->filtered_features[].
>
> But the feature is masked off anyway

Right - it was disabled right from the beginning.
As I reported, libvirt detected it as not available and constructed the
CPU with it disabled, which translated into -cpu ...,pdcm=off,...

The new and bad aspect we need to overcome is that in these conditions
this now somehow breaks save/restore and migration operations.

As a cross-check I reverted only 00268e0002 on top of
10.1-rc2 and these use cases work again.

> even without the
> mark_unavailable_features():
>
>      env->features[FEAT_1_ECX] &= ~CPUID_EXT_PDCM;
>
> So is it that PDCM is set in cpu->filtered_features[] causing the problem?
>
> > With regards,
> > Daniel
>


-- 
Christian Ehrhardt
Director of Engineering, Ubuntu Server
Canonical Ltd



* Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
  2025-08-07  6:37         ` Christian Ehrhardt
@ 2025-08-07  8:09           ` Xiaoyao Li
  2025-08-10 13:07             ` Christian Ehrhardt
  0 siblings, 1 reply; 25+ messages in thread
From: Xiaoyao Li @ 2025-08-07  8:09 UTC (permalink / raw)
  To: Christian Ehrhardt
  Cc: Daniel P. Berrangé, Zhao Liu, Paolo Bonzini, qemu-devel

On 8/7/2025 2:37 PM, Christian Ehrhardt wrote:
> On Thu, Aug 7, 2025 at 5:38 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>
>> On 8/7/2025 3:18 AM, Daniel P. Berrangé wrote:
>>> On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
>>>> On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>>>>>
>>>>> On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
>>>>>> Hi,
>>>>>> I was unsure if this would be better sent to libvirt or qemu - the
>>>>>> issue is somewhere between libvirt modelling CPUs and qemu 10.1
>>>>>> behaving differently. I did not want to double post and gladly most of
>>>>>> the people are on both lists - since the switch in/out of the problem
>>>>>> is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for not yet
>>>>>> having all the answers, I'm sure I could find more with debugging, but
>>>>>> I also wanted to report early for your awareness while we are still in
>>>>>> the RC phase.
>>>>>>
>>>>>>
>>>>>> # Problem
>>>>>>
>>>>>> What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
>>>>>>     error: operation failed: guest CPU doesn't match specification:
>>>>>> missing features: pdcm
>>>>>>
>>>>>> This is behaving the same with libvirt 11.4 or the more recent 11.6.
>>>>>> But switching back to qemu 10.0 confirmed that this behavior is new
>>>>>> with qemu 10.1-rc.
>>>>>
>>>>>
>>>>>> Without yet having any hard evidence against them I found a few pdcm
>>>>>> related commits between 10.0 and 10.1-rc1:
>>>>>>     7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
>>>>>>     00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
>>>>>>     e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
>>>>>> feature_dependencies[] check
>>>>>>     0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
>>>>>>
>>>>>>
>>>>>> # Caveat
>>>>>>
>>>>>> My test environment is in LXD system containers, that gives me issues
>>>>>> in the power management detection
>>>>>>     libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS:
>>>>>> Read-only file system
>>>>>>     libvirtd[406]: Failed to get host power management capabilities
>>>>>
>>>>> That's harmless.
>>>>
>>>> Yeah, it always was for me - thanks for confirming.
>>>>
>>>>>> And the resulting host-model on a  rather old test server will therefore have:
>>>>>>     <cpu mode='custom' match='exact' check='full'>
>>>>>>       <model fallback='forbid'>Haswell-noTSX-IBRS</model>
>>>>>>       <vendor>Intel</vendor>
>>>>>>       <feature policy='require' name='vmx'/>
>>>>>>       <feature policy='disable' name='pdcm'/>
>>>>>>        ...
>>>>>>
>>>>>> But that was fine in the past, and the behavior started to break
>>>>>> save/restore or migrations just now with the new qemu 10.1-rc.
>>>>>>
>>>>>> # Next steps
>>>>>>
>>>>>> I'm soon overwhelmed by meetings for the rest of the day, but would be
>>>>>> curious if one has a suggestion about what to look at next for
>>>>>> debugging or a theory about what might go wrong. If nothing else comes
>>>>>> up I'll try to set up a bisect run tomorrow.
>>>>>
>>>>> Yeah, git bisect is what I'd start with.
>>>>
>>>> Bisect complete, identified this commit
>>>>
>>>> commit 00268e00027459abede448662f8794d78eb4b0a4
>>>> Author: Xiaoyao Li <xiaoyao.li@intel.com>
>>>> Date:   Tue Mar 4 00:24:50 2025 -0500
>>>>
>>>>       i386/cpu: Warn about why CPUID_EXT_PDCM is not available
>>>>
>>>>       When user requests PDCM explicitly via "+pdcm" without PMU enabled, emit
>>>>       a warning to inform the user.
>>>>
>>>>       Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>       Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
>>>>       Link: https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao.li@intel.com
>>>>       Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>
>>>>    target/i386/cpu.c | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>>
>>>>
>>>> Which is odd as it should only add a warning right?
>>>
>>> No, that commit message is misleading.
>>>
>>> IIUC mark_unavailable_features() actively blocks usage of the feature,
>>> so it is a functional change, not merely a emitting warning.
>>>
>>> It makes me wonder if that commit was actually intended to block the
>>> feature or not, vs merely warning ?  CC'ing those involved in the
>>> commit.
>>
>> The intention was to print a warning to tell users PDCM cannot be
>> enabled if pmu is not enabled. While mark_unavailable_features() does
>> has the effect of setting the bit in cpu->filtered_features[].
>>
>> But the feature is masked off anyway
> 
> Right - it was disabled right from the beginning.
> As I reported libvirt detected it as not available and constructed the
> CPU as with it disabled.
> Which translated it into -cpu ...,pdcm=off,...
> 
> The new and bad aspect we need to overcome is that in these conditions
> this now somehow breaks save/restore and migration operations.

Commit 00268e0002 makes a difference only for the case "-cpu
xxx,pdcm=on" without "pmu=on": there it emits a warning and sets PDCM
in cpu->filtered_features[].

So libvirt must first have requested "-cpu xxx,pdcm=on" without "pmu=on"
and got the result that PDCM is filtered (set in cpu->filtered_features[]).

This does introduce a behavior change: before the commit, "-cpu
xxx,pdcm=on" without "pmu=on" produced no warning and did not set PDCM in
cpu->filtered_features[]; PDCM was simply not set in the guest's CPUID.

I don't understand how the warning, or PDCM being set in
cpu->filtered_features[], breaks save/restore and migration.
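
To make the difference described above concrete, here is a toy Python model of the two behaviors. It is only a sketch of what this mail describes, not QEMU's code: `CpuResult`, `realize` and the `with_commit` switch are made-up names, and `filtered` merely stands in for cpu->filtered_features[].

```python
from dataclasses import dataclass, field

PDCM = "pdcm"  # CPUID[eax=01h].ECX bit 15


@dataclass
class CpuResult:
    features: set                                   # what the guest CPUID would expose
    filtered: set = field(default_factory=set)      # stand-in for cpu->filtered_features[]
    warnings: list = field(default_factory=list)    # warnings printed at startup


def realize(requested, pmu, with_commit):
    """Toy model of the PDCM handling described in this thread (hypothetical)."""
    res = CpuResult(features=set(requested))
    if PDCM in res.features and not pmu:
        if with_commit:
            # Behavior described for 00268e0002: warn and mark the bit as filtered.
            res.filtered.add(PDCM)
            res.warnings.append("pdcm is not available due to PMU being disabled")
        # With or without the commit, the bit is masked off in the guest CPUID.
        res.features.discard(PDCM)
    return res


before = realize({"vmx", "pdcm"}, pmu=False, with_commit=False)
after = realize({"vmx", "pdcm"}, pmu=False, with_commit=True)
print(before.features == after.features)   # guest-visible CPUID is identical
print(before.filtered, after.filtered)     # only the filtered set differs
```

In this model the guest-visible feature set is the same in both cases; the only thing a management layer could observe differently is the filtered set and the warning.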

> As a cross-check I reverted just and only 00268e0002 on top of
> 10.1-rc2 and these use cases work again.
> 
>> even without the
>> mark_unavailable_features():
>>
>>       env->features[FEAT_1_ECX] &= ~CPUID_EXT_PDCM;
>>
>> So is it that PDCM is set in cpu->filtered_features[] causing the problem?
>>
>>> With regards,
>>> Daniel
>>
> 
> 




* Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
  2025-08-07  8:09           ` Xiaoyao Li
@ 2025-08-10 13:07             ` Christian Ehrhardt
  0 siblings, 0 replies; 25+ messages in thread
From: Christian Ehrhardt @ 2025-08-10 13:07 UTC (permalink / raw)
  To: Xiaoyao Li; +Cc: Daniel P. Berrangé, Zhao Liu, Paolo Bonzini, qemu-devel

On Thu, Aug 7, 2025 at 10:09 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> On 8/7/2025 2:37 PM, Christian Ehrhardt wrote:
> > On Thu, Aug 7, 2025 at 5:38 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> >>
> >> On 8/7/2025 3:18 AM, Daniel P. Berrangé wrote:
> >>> On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
> >>>> On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >>>>>
> >>>>> On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
> >>>>>> Hi,
> >>>>>> I was unsure if this would be better sent to libvirt or qemu - the
> >>>>>> issue is somewhere between libvirt modelling CPUs and qemu 10.1
> >>>>>> behaving differently. I did not want to double post and gladly most of
> >>>>>> the people are on both lists - since the switch in/out of the problem
> >>>>>> is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for not yet
> >>>>>> having all the answers, I'm sure I could find more with debugging, but
> >>>>>> I also wanted to report early for your awareness while we are still in
> >>>>>> the RC phase.
> >>>>>>
> >>>>>>
> >>>>>> # Problem
> >>>>>>
> >>>>>> What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
> >>>>>>     error: operation failed: guest CPU doesn't match specification:
> >>>>>> missing features: pdcm
> >>>>>>
> >>>>>> This is behaving the same with libvirt 11.4 or the more recent 11.6.
> >>>>>> But switching back to qemu 10.0 confirmed that this behavior is new
> >>>>>> with qemu 10.1-rc.
> >>>>>
> >>>>>
> >>>>>> Without yet having any hard evidence against them I found a few pdcm
> >>>>>> related commits between 10.0 and 10.1-rc1:
> >>>>>>     7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
> >>>>>>     00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> >>>>>>     e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
> >>>>>> feature_dependencies[] check
> >>>>>>     0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
> >>>>>>
> >>>>>>
> >>>>>> # Caveat
> >>>>>>
> >>>>>> My test environment is in LXD system containers, that gives me issues
> >>>>>> in the power management detection
> >>>>>>     libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS:
> >>>>>> Read-only file system
> >>>>>>     libvirtd[406]: Failed to get host power management capabilities
> >>>>>
> >>>>> That's harmless.
> >>>>
> >>>> Yeah, it always was for me - thanks for confirming.
> >>>>
> >>>>>> And the resulting host-model on a  rather old test server will therefore have:
> >>>>>>     <cpu mode='custom' match='exact' check='full'>
> >>>>>>       <model fallback='forbid'>Haswell-noTSX-IBRS</model>
> >>>>>>       <vendor>Intel</vendor>
> >>>>>>       <feature policy='require' name='vmx'/>
> >>>>>>       <feature policy='disable' name='pdcm'/>
> >>>>>>        ...
> >>>>>>
> >>>>>> But that was fine in the past, and the behavior started to break
> >>>>>> save/restore or migrations just now with the new qemu 10.1-rc.
> >>>>>>
> >>>>>> # Next steps
> >>>>>>
> >>>>>> I'm soon overwhelmed by meetings for the rest of the day, but would be
> >>>>>> curious if one has a suggestion about what to look at next for
> >>>>>> debugging or a theory about what might go wrong. If nothing else comes
> >>>>>> up I'll try to set up a bisect run tomorrow.
> >>>>>
> >>>>> Yeah, git bisect is what I'd start with.
> >>>>
> >>>> Bisect complete, identified this commit
> >>>>
> >>>> commit 00268e00027459abede448662f8794d78eb4b0a4
> >>>> Author: Xiaoyao Li <xiaoyao.li@intel.com>
> >>>> Date:   Tue Mar 4 00:24:50 2025 -0500
> >>>>
> >>>>       i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> >>>>
> >>>>       When user requests PDCM explicitly via "+pdcm" without PMU enabled, emit
> >>>>       a warning to inform the user.
> >>>>
> >>>>       Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> >>>>       Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> >>>>       Link: https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao.li@intel.com
> >>>>       Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >>>>
> >>>>    target/i386/cpu.c | 3 +++
> >>>>    1 file changed, 3 insertions(+)
> >>>>
> >>>>
> >>>>
> >>>> Which is odd as it should only add a warning right?
> >>>
> >>> No, that commit message is misleading.
> >>>
> >>> IIUC mark_unavailable_features() actively blocks usage of the feature,
> >>> so it is a functional change, not merely a emitting warning.
> >>>
> >>> It makes me wonder if that commit was actually intended to block the
> >>> feature or not, vs merely warning ?  CC'ing those involved in the
> >>> commit.
> >>
> >> The intention was to print a warning to tell users PDCM cannot be
> >> enabled if pmu is not enabled. While mark_unavailable_features() does
> >> has the effect of setting the bit in cpu->filtered_features[].
> >>
> >> But the feature is masked off anyway
> >
> > Right - it was disabled right from the beginning.
> > As I reported libvirt detected it as not available and constructed the
> > CPU as with it disabled.
> > Which translated it into -cpu ...,pdcm=off,...
> >
> > The new and bad aspect we need to overcome is that in these conditions
> > this now somehow breaks save/restore and migration operations.
>
> The commit 00268e0002 makes a difference only for the case "-cpu
> xxx,pdcm=on" without "pmu=on", and it emits a warning and sets the PDCM
> in cpu->filtered_features[].

But this is `pdcm=off`, as I said above, and yet the change decides
whether the mentioned migration and save/restore break (they fail with
it and work without it).
Since you mentioned pmu: it does not appear in the qemu cmdline
arguments that libvirt used, and the base type is Haswell-noTSX-IBRS.

> So libvirt must first request with "-cpu xxx,pdcm=on" without "pmu=on"
> and gets the result that PDCM is filtered (set in cpu->filtered_features[]).
>
> This indeed introduces the behavior change that before the commit, "-cpu
> xxx,pdcm=on" without "pmu=on" doesn't get warning nor PDCM is set in
> cpu->filtered_features[], but PDCM is just not set in guest's CPUID.
>
> I couldn't understand how the warning or PDCM is set in
> cpu->filtered_features[] breaks save/restore and migration.
>
> > As a cross-check I reverted just and only 00268e0002 on top of
> > 10.1-rc2 and these use cases work again.
> >
> >> even without the
> >> mark_unavailable_features():
> >>
> >>       env->features[FEAT_1_ECX] &= ~CPUID_EXT_PDCM;
> >>
> >> So is it that PDCM is set in cpu->filtered_features[] causing the problem?
> >>
> >>> With regards,
> >>> Daniel
> >>
> >
> >
>


-- 
Christian Ehrhardt
Director of Engineering, Ubuntu Server
Canonical Ltd



* Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
  2025-08-06 19:18     ` Daniel P. Berrangé
  2025-08-07  3:38       ` Xiaoyao Li
@ 2025-08-19 14:51       ` Paolo Bonzini
  2025-08-20  5:11         ` Christian Ehrhardt
  1 sibling, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2025-08-19 14:51 UTC (permalink / raw)
  To: Daniel P. Berrangé, Christian Ehrhardt, Xiaoyao Li, Zhao Liu
  Cc: qemu-devel

On 8/6/25 21:18, Daniel P. Berrangé wrote:
> On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
>> On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>>>
>>> On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
>>>> Hi,
>>>> I was unsure if this would be better sent to libvirt or qemu - the
>>>> issue is somewhere between libvirt modelling CPUs and qemu 10.1
>>>> behaving differently. I did not want to double post and gladly most of
>>>> the people are on both lists - since the switch in/out of the problem
>>>> is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for not yet
>>>> having all the answers, I'm sure I could find more with debugging, but
>>>> I also wanted to report early for your awareness while we are still in
>>>> the RC phase.
>>>>
>>>>
>>>> # Problem
>>>>
>>>> What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
>>>>    error: operation failed: guest CPU doesn't match specification:
>>>> missing features: pdcm
>>>>
>>>> This is behaving the same with libvirt 11.4 or the more recent 11.6.
>>>> But switching back to qemu 10.0 confirmed that this behavior is new
>>>> with qemu 10.1-rc.
>>>
>>>
>>>> Without yet having any hard evidence against them I found a few pdcm
>>>> related commits between 10.0 and 10.1-rc1:
>>>>    7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
>>>>    00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
>>>>    e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
>>>> feature_dependencies[] check
>>>>    0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
>>>>
>>>>
>>>> # Caveat
>>>>
>>>> My test environment is in LXD system containers, that gives me issues
>>>> in the power management detection
>>>>    libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS:
>>>> Read-only file system
>>>>    libvirtd[406]: Failed to get host power management capabilities
>>>
>>> That's harmless.
>>
>> Yeah, it always was for me - thanks for confirming.
>>
>>>> And the resulting host-model on a  rather old test server will therefore have:
>>>>    <cpu mode='custom' match='exact' check='full'>
>>>>      <model fallback='forbid'>Haswell-noTSX-IBRS</model>
>>>>      <vendor>Intel</vendor>
>>>>      <feature policy='require' name='vmx'/>
>>>>      <feature policy='disable' name='pdcm'/>
>>>>       ...
>>>>
>>>> But that was fine in the past, and the behavior started to break
>>>> save/restore or migrations just now with the new qemu 10.1-rc.
>>>>
>>>> # Next steps
>>>>
>>>> I'm soon overwhelmed by meetings for the rest of the day, but would be
>>>> curious if one has a suggestion about what to look at next for
>>>> debugging or a theory about what might go wrong. If nothing else comes
>>>> up I'll try to set up a bisect run tomorrow.
>>>
>>> Yeah, git bisect is what I'd start with.
>>
>> Bisect complete, identified this commit
>>
>> commit 00268e00027459abede448662f8794d78eb4b0a4
>> Author: Xiaoyao Li <xiaoyao.li@intel.com>
>> Date:   Tue Mar 4 00:24:50 2025 -0500
>>
>>      i386/cpu: Warn about why CPUID_EXT_PDCM is not available
>>
>>      When user requests PDCM explicitly via "+pdcm" without PMU enabled, emit
>>      a warning to inform the user.
>>
>>      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>      Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
>>      Link: https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao.li@intel.com
>>      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>>   target/i386/cpu.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>>
>>
>> Which is odd as it should only add a warning right?
> 
> No, that commit message is misleading.
> 
> IIUC mark_unavailable_features() actively blocks usage of the feature,
> so it is a functional change, not merely a emitting warning.
> 
> It makes me wonder if that commit was actually intended to block the
> feature or not, vs merely warning ?  CC'ing those involved in the
> commit.
We can revert the commit.  I'll send the revert to Stefan and let him 
decide whether to include it in 10.1-rc4 or delay to 10.2 and 10.1.1.

Sorry for the delay in answering (and thanks Daniel for bringing this to 
my attention).

Thanks,

Paolo




* Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
  2025-08-19 14:51       ` Paolo Bonzini
@ 2025-08-20  5:11         ` Christian Ehrhardt
  2025-08-20  9:10           ` Christian Ehrhardt
  2025-09-03  8:38           ` Christian Ehrhardt
  0 siblings, 2 replies; 25+ messages in thread
From: Christian Ehrhardt @ 2025-08-20  5:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Daniel P. Berrangé, Xiaoyao Li, Zhao Liu, qemu-devel

On Tue, Aug 19, 2025 at 4:51 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 8/6/25 21:18, Daniel P. Berrangé wrote:
> > On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
> >> On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >>>
> >>> On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
> >>>> Hi,
> >>>> I was unsure if this would be better sent to libvirt or qemu - the
> >>>> issue is somewhere between libvirt modelling CPUs and qemu 10.1
> >>>> behaving differently. I did not want to double post and gladly most of
> >>>> the people are on both lists - since the switch in/out of the problem
> >>>> is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for not yet
> >>>> having all the answers, I'm sure I could find more with debugging, but
> >>>> I also wanted to report early for your awareness while we are still in
> >>>> the RC phase.
> >>>>
> >>>>
> >>>> # Problem
> >>>>
> >>>> What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
> >>>>    error: operation failed: guest CPU doesn't match specification:
> >>>> missing features: pdcm
> >>>>
> >>>> This is behaving the same with libvirt 11.4 or the more recent 11.6.
> >>>> But switching back to qemu 10.0 confirmed that this behavior is new
> >>>> with qemu 10.1-rc.
> >>>
> >>>
> >>>> Without yet having any hard evidence against them I found a few pdcm
> >>>> related commits between 10.0 and 10.1-rc1:
> >>>>    7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
> >>>>    00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> >>>>    e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
> >>>> feature_dependencies[] check
> >>>>    0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
> >>>>
> >>>>
> >>>> # Caveat
> >>>>
> >>>> My test environment is in LXD system containers, that gives me issues
> >>>> in the power management detection
> >>>>    libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS:
> >>>> Read-only file system
> >>>>    libvirtd[406]: Failed to get host power management capabilities
> >>>
> >>> That's harmless.
> >>
> >> Yeah, it always was for me - thanks for confirming.
> >>
> >>>> And the resulting host-model on a  rather old test server will therefore have:
> >>>>    <cpu mode='custom' match='exact' check='full'>
> >>>>      <model fallback='forbid'>Haswell-noTSX-IBRS</model>
> >>>>      <vendor>Intel</vendor>
> >>>>      <feature policy='require' name='vmx'/>
> >>>>      <feature policy='disable' name='pdcm'/>
> >>>>       ...
> >>>>
> >>>> But that was fine in the past, and the behavior started to break
> >>>> save/restore or migrations just now with the new qemu 10.1-rc.
> >>>>
> >>>> # Next steps
> >>>>
> >>>> I'm soon overwhelmed by meetings for the rest of the day, but would be
> >>>> curious if one has a suggestion about what to look at next for
> >>>> debugging or a theory about what might go wrong. If nothing else comes
> >>>> up I'll try to set up a bisect run tomorrow.
> >>>
> >>> Yeah, git bisect is what I'd start with.
> >>
> >> Bisect complete, identified this commit
> >>
> >> commit 00268e00027459abede448662f8794d78eb4b0a4
> >> Author: Xiaoyao Li <xiaoyao.li@intel.com>
> >> Date:   Tue Mar 4 00:24:50 2025 -0500
> >>
> >>      i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> >>
> >>      When user requests PDCM explicitly via "+pdcm" without PMU enabled, emit
> >>      a warning to inform the user.
> >>
> >>      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> >>      Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> >>      Link: https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao.li@intel.com
> >>      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >>
> >>   target/i386/cpu.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >>
> >>
> >> Which is odd as it should only add a warning right?
> >
> > No, that commit message is misleading.
> >
> > IIUC mark_unavailable_features() actively blocks usage of the feature,
> > so it is a functional change, not merely a emitting warning.
> >
> > It makes me wonder if that commit was actually intended to block the
> > feature or not, vs merely warning ?  CC'ing those involved in the
> > commit.
> We can revert the commit.  I'll send the revert to Stefan and let him
> decide whether to include it in 10.1-rc4 or delay to 10.2 and 10.1.1.

Thanks Paolo for considering that.

My steps to reproduce seemed really clear and are 100% reproducible
for me, but no one so far has said "yeah, I see it too", so I'm
unsure whether nobody else has tried them or whether there is more to
it than we know yet.
Further, I tested more with the commit reverted and found that at
least cross-version migrations (9.2 -> 10.1) still have issues that
seem related: they complain about pdcm as a missing feature.
But that was in a log of a test system that went away and ... you know
how these things can sometimes be, so that new result is not yet very
reliable.

I intended to check the following matrix more deeply again, with and
without the reverted change, and then come back to this thread:

#1 Compare platforms
- Migrating between non-containerized hosts to verify whether they are
affected as well
- Power management explicitly switched off/on (vs the auto detect of
host-model) in the guest XML
#2 Retest the different use cases where I've seen this pop up
- 10.1 managed save (broken unless reverting the commit that was identified)
- 9.2 -> 10.1 migration (seems broken even with the revert)

The hope was that these would help to further identify what is going
on, but despite the urgency of the imminent release I have not
yet managed to find the time in the last two days :-/

> Sorry for the delay in answering (and thanks Daniel for bringing this to
> my attention).
>
> Thanks,
>
> Paolo
>


-- 
Christian Ehrhardt
Director of Engineering, Ubuntu Server
Canonical Ltd



* Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
  2025-08-20  5:11         ` Christian Ehrhardt
@ 2025-08-20  9:10           ` Christian Ehrhardt
  2025-09-03  8:38           ` Christian Ehrhardt
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Ehrhardt @ 2025-08-20  9:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Daniel P. Berrangé, Xiaoyao Li, Zhao Liu, qemu-devel

On Wed, Aug 20, 2025 at 7:11 AM Christian Ehrhardt
<christian.ehrhardt@canonical.com> wrote:
>
> On Tue, Aug 19, 2025 at 4:51 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 8/6/25 21:18, Daniel P. Berrangé wrote:
> > > On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
> > >> On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >>>
> > >>> On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
> > >>>> Hi,
> > >>>> I was unsure if this would be better sent to libvirt or qemu - the
> > >>>> issue is somewhere between libvirt modelling CPUs and qemu 10.1
> > >>>> behaving differently. I did not want to double post and gladly most of
> > >>>> the people are on both lists - since the switch in/out of the problem
> > >>>> is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for not yet
> > >>>> having all the answers, I'm sure I could find more with debugging, but
> > >>>> I also wanted to report early for your awareness while we are still in
> > >>>> the RC phase.
> > >>>>
> > >>>>
> > >>>> # Problem
> > >>>>
> > >>>> What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
> > >>>>    error: operation failed: guest CPU doesn't match specification:
> > >>>> missing features: pdcm
> > >>>>
> > >>>> This is behaving the same with libvirt 11.4 or the more recent 11.6.
> > >>>> But switching back to qemu 10.0 confirmed that this behavior is new
> > >>>> with qemu 10.1-rc.
> > >>>
> > >>>
> > >>>> Without yet having any hard evidence against them I found a few pdcm
> > >>>> related commits between 10.0 and 10.1-rc1:
> > >>>>    7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
> > >>>>    00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> > >>>>    e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
> > >>>> feature_dependencies[] check
> > >>>>    0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
> > >>>>
> > >>>>
> > >>>> # Caveat
> > >>>>
> > >>>> My test environment is in LXD system containers, that gives me issues
> > >>>> in the power management detection
> > >>>>    libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS:
> > >>>> Read-only file system
> > >>>>    libvirtd[406]: Failed to get host power management capabilities
> > >>>
> > >>> That's harmless.
> > >>
> > >> Yeah, it always was for me - thanks for confirming.
> > >>
> > >>>> And the resulting host-model on a  rather old test server will therefore have:
> > >>>>    <cpu mode='custom' match='exact' check='full'>
> > >>>>      <model fallback='forbid'>Haswell-noTSX-IBRS</model>
> > >>>>      <vendor>Intel</vendor>
> > >>>>      <feature policy='require' name='vmx'/>
> > >>>>      <feature policy='disable' name='pdcm'/>
> > >>>>       ...
> > >>>>
> > >>>> But that was fine in the past, and the behavior started to break
> > >>>> save/restore or migrations just now with the new qemu 10.1-rc.
> > >>>>
> > >>>> # Next steps
> > >>>>
> > >>>> I'm soon overwhelmed by meetings for the rest of the day, but would be
> > >>>> curious if one has a suggestion about what to look at next for
> > >>>> debugging or a theory about what might go wrong. If nothing else comes
> > >>>> up I'll try to set up a bisect run tomorrow.
> > >>>
> > >>> Yeah, git bisect is what I'd start with.
> > >>
> > >> Bisect complete, identified this commit
> > >>
> > >> commit 00268e00027459abede448662f8794d78eb4b0a4
> > >> Author: Xiaoyao Li <xiaoyao.li@intel.com>
> > >> Date:   Tue Mar 4 00:24:50 2025 -0500
> > >>
> > >>      i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> > >>
> > >>      When user requests PDCM explicitly via "+pdcm" without PMU enabled, emit
> > >>      a warning to inform the user.
> > >>
> > >>      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > >>      Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> > >>      Link: https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao.li@intel.com
> > >>      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > >>
> > >>   target/i386/cpu.c | 3 +++
> > >>   1 file changed, 3 insertions(+)
> > >>
> > >>
> > >>
> > >> Which is odd as it should only add a warning right?
> > >
> > > No, that commit message is misleading.
> > >
> > > IIUC mark_unavailable_features() actively blocks usage of the feature,
> > > so it is a functional change, not merely a emitting warning.
> > >
> > > It makes me wonder if that commit was actually intended to block the
> > > feature or not, vs merely warning ?  CC'ing those involved in the
> > > commit.
> > We can revert the commit.  I'll send the revert to Stefan and let him
> > decide whether to include it in 10.1-rc4 or delay to 10.2 and 10.1.1.
>
> Thanks Paolo for considering that.
>
> My steps to reproduce seemed really clear and are 100% reproducible
> for me, but no one so far said "yeah they see it too", so I'm getting
> unsure if it was not tried by anyone else or if there is more to it
> than we yet know.
> Further I tested more with the commit reverted, and found that at
> least cross version migrations (9.2 -> 10.1) still have issues that
> seem related - complaining about pdcm as missing feature.
> But that was in a log of a test system that went away and ... you know
> how these things can sometimes be, that new result is not yet very
> reliable.
>
> I intended to check the following matrix more deeply again with and
> without the reverted change and then come back to this thread:
>
> #1 Compare platforms
> - Migrating between non containerized hosts to verify if they are
> affected as well
> - Power management explicitly switched off/on (vs the auto detect of
> host-model) in the guest XML
> #2 Retest the different Use-cases I've seen this pop up
> - 10.1 managed save (broken unless reverting the commit that was identified)
> - 9.2 -> 10.1 migration (seems broken even with the revert)

I'll report back as I can squeeze tests in between meetings (there is
also an issue with riscv emulation that I'm trying to identify at the
same time).

For now I managed to eliminate one of my concerns, namely that it
might be caused by my system container based setup.
I was testing different builds on bare metal x86 this morning and
it looks the same to me.

### ### ###

1:10.0.2+ds-1ubuntu2 (this is what is in the current Ubuntu, until I
started on 10.1)

=> On start maps host-model to:
  <model fallback='forbid'>Cascadelake-Server</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='pdcm'/>
...
=> Can do managedsave + start without problems

### ### ###

1:10.1.0~rc2+ds-1ubuntu1 (which has 00268e0 "i386/cpu: Warn about why
CPUID_EXT_PDCM is not available" reverted)

=> On start maps host-model to:
  <model fallback='forbid'>Cascadelake-Server</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='vmx'/>
  <feature policy='disable' name='pdcm'/>
               ^^ this detection is already different due to using
10.1 even with the revert
...
=> Can do managedsave + start without problems
               ^^ with the revert it works

### ### ###

1:10.1.0~rc3+ds-2ubuntu1~questingppa1 (which is WITHOUT the revert,
more like the normal RC3)

=> On start maps host-model to:
  <model fallback='forbid'>Cascadelake-Server</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='vmx'/>
  <feature policy='disable' name='pdcm'/>
                       ^^ same detection as with the revert, but
different than 10.0 formerly did
...
=> Fails to do save and restore just as I had it in my container based tests

$ virsh start testguest
error: Failed to start domain 'testguest'
error: operation failed: guest CPU doesn't match specification:
missing features: pdcm

### ### ###

I was concerned that my test environment might have been part of
this, but it seems to be generally applicable and a problem on bare
metal too.
This is on 6.16.0-13-generic on an Intel Xeon Gold 5222 with libvirt 11.6;
only qemu changes between the tests.
With that confirmed, I'd indeed suggest reverting it for now, even
more so if someone else can reproduce the same on other platforms and
builds.


On Thu, Aug 7, 2025 at 10:09 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> So libvirt must first request with "-cpu xxx,pdcm=on" without "pmu=on"
> and gets the result that PDCM is filtered (set in cpu->filtered_features[]).

I also wanted to follow up on that earlier comment by checking what
exactly the logs say about pdcm and pmu.
pmu is easy - it is never mentioned in the cmdline or configs, but pdcm
is more interesting than I expected.

This is with:
1:10.1.0~rc3+ds-2ubuntu1~questingppa1 (which is WITHOUT the revert,
more like the normal RC3)

The guest gets started at first with this full cpu definition

-cpu Cascadelake-Server,vmx=on,pdcm=on,hypervisor=on,ss=on,tsc-adjust=on,fdp-excptn-only=on,zero-fcs-fds=on,mpx=on,umip=on,pku=on,md-clear=on,stibp=on,flush-l1d=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,sbdr-ssdp-no=on,psdp-no=on,fb-clear=on,rfds-no=on,vmx-ins-outs=on,vmx-true-ctls=on,vmx-store-lma=on,vmx-activity-hlt=on,vmx-activity-wait-sipi=on,vmx-vmwrite-vmexit-fields=on,vmx-apicv-xapic=on,vmx-ept=on,vmx-desc-exit=on,vmx-rdtscp-exit=on,vmx-apicv-x2apic=on,vmx-vpid=on,vmx-wbinvd-exit=on,vmx-unrestricted-guest=on,vmx-apicv-register=on,vmx-apicv-vid=on,vmx-rdrand-exit=on,vmx-invpcid-exit=on,vmx-vmfunc=on,vmx-shadow-vmcs=on,vmx-rdseed-exit=on,vmx-pml=on,vmx-xsaves=on,vmx-tsc-scaling=on,vmx-ept-execonly=on,vmx-page-walk-4=on,vmx-ept-2mb=on,vmx-ept-1gb=on,vmx-invept=on,vmx-eptad=on,vmx-invept-single-context=on,vmx-invept-all-context=on,vmx-invvpid=on,vmx-invvpid-single-addr=on,vmx-invvpid-all-context=on,vmx-invept-single-context-noglobals=on,vmx-intr-exit=on,vmx-nmi-exit=on,vmx-vnmi=on,vmx-preemption-timer=on,vmx-posted-intr=on,vmx-vintr-pending=on,vmx-tsc-offset=on,vmx-hlt-exit=on,vmx-invlpg-exit=on,vmx-mwait-exit=on,vmx-rdpmc-exit=on,vmx-rdtsc-exit=on,vmx-cr3-load-noexit=on,vmx-cr3-store-noexit=on,vmx-cr8-load-exit=on,vmx-cr8-store-exit=on,vmx-flexpriority=on,vmx-vnmi-pending=on,vmx-movdr-exit=on,vmx-io-exit=on,vmx-io-bitmap=on,vmx-mtf=on,vmx-msr-bitmap=on,vmx-monitor-exit=on,vmx-pause-exit=on,vmx-secondary-ctls=on,vmx-exit-nosave-debugctl=on,vmx-exit-load-perf-global-ctrl=on,vmx-exit-ack-intr=on,vmx-exit-save-pat=on,vmx-exit-load-pat=on,vmx-exit-save-efer=on,vmx-exit-load-efer=on,vmx-exit-save-preemption-timer=on,vmx-exit-clear-bndcfgs=on,vmx-entry-noload-debugctl=on,vmx-entry-ia32e-mode=on,vmx-entry-load-perf-global-ctrl=on,vmx-entry-load-pat=on,vmx-entry-load-efer=on,vmx-entry-load-bndcfgs=on,vmx-eptp-switching=on,hle=off,rtm=off

which has "pdcm=on" but does not mention pmu either way.

But once qemu starts we see it emit

2025-08-20T08:53:00.514650Z qemu-system-x86_64: warning: This feature
is not available due to PMU being disabled: CPUID[eax=01h].ECX.pdcm
[bit 15]
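
For anyone who wants to scan their own qemu logs for this, here is a small Python sketch that pulls the feature out of such a warning. The pattern is modelled only on the single warning line quoted above, so treat it as an assumption that other messages or versions use the same wording; `parse_filtered_feature` is a made-up helper name.

```python
import re

# Modelled on the warning quoted above; other QEMU messages may be worded differently.
WARNING_RE = re.compile(
    r"CPUID\[eax=(?P<leaf>[0-9A-Fa-f]+)h\]\."
    r"(?P<reg>E[A-D]X)\.(?P<name>[\w.-]+)\s+\[bit (?P<bit>\d+)\]"
)


def parse_filtered_feature(log_text):
    """Return leaf/register/name/bit of the first CPUID warning, or None."""
    # Mail clients and log viewers may wrap the line, so collapse whitespace first.
    flat = " ".join(log_text.split())
    m = WARNING_RE.search(flat)
    if m is None:
        return None
    bit = int(m["bit"])
    return {
        "leaf": int(m["leaf"], 16),
        "register": m["reg"],
        "name": m["name"],
        "bit": bit,
        "mask": 1 << bit,
    }


log = """2025-08-20T08:53:00.514650Z qemu-system-x86_64: warning: This feature
is not available due to PMU being disabled: CPUID[eax=01h].ECX.pdcm
[bit 15]"""
print(parse_filtered_feature(log))
```

For the line above this yields leaf 1, register ECX, name pdcm and bit 15, i.e. mask 0x8000.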

When I do "virsh start" after "managedsave" I see this:

-cpu Cascadelake-Server,vmx=on,pdcm=off,hypervisor=on,ss=on,tsc-adjust=on,fdp-excptn-only=on,zero-fcs-fds=on,mpx=on,umip=on,pku=on,md-clear=on,stibp=on,flush-l1d=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,sbdr-ssdp-no=on,psdp-no=on,fb-clear=on,rfds-no=on,vmx-ins-outs=on,vmx-true-ctls=on,vmx-store-lma=on,vmx-activity-hlt=on,vmx-activity-wait-sipi=on,vmx-vmwrite-vmexit-fields=on,vmx-apicv-xapic=on,vmx-ept=on,vmx-desc-exit=on,vmx-rdtscp-exit=on,vmx-apicv-x2apic=on,vmx-vpid=on,vmx-wbinvd-exit=on,vmx-unrestricted-guest=on,vmx-apicv-register=on,vmx-apicv-vid=on,vmx-rdrand-exit=on,vmx-invpcid-exit=on,vmx-vmfunc=on,vmx-shadow-vmcs=on,vmx-rdseed-exit=on,vmx-pml=on,vmx-xsaves=on,vmx-tsc-scaling=on,vmx-ept-execonly=on,vmx-page-walk-4=on,vmx-ept-2mb=on,vmx-ept-1gb=on,vmx-invept=on,vmx-eptad=on,vmx-invept-single-context=on,vmx-invept-all-context=on,vmx-invvpid=on,vmx-invvpid-single-addr=on,vmx-invvpid-all-context=on,vmx-invept-single-context-noglobals=on,vmx-intr-exit=on,vmx-nmi-exit=on,vmx-vnmi=on,vmx-preemption-timer=on,vmx-posted-intr=on,vmx-vintr-pending=on,vmx-tsc-offset=on,vmx-hlt-exit=on,vmx-invlpg-exit=on,vmx-mwait-exit=on,vmx-rdpmc-exit=on,vmx-rdtsc-exit=on,vmx-cr3-load-noexit=on,vmx-cr3-store-noexit=on,vmx-cr8-load-exit=on,vmx-cr8-store-exit=on,vmx-flexpriority=on,vmx-vnmi-pending=on,vmx-movdr-exit=on,vmx-io-exit=on,vmx-io-bitmap=on,vmx-mtf=on,vmx-msr-bitmap=on,vmx-monitor-exit=on,vmx-pause-exit=on,vmx-secondary-ctls=on,vmx-exit-nosave-debugctl=on,vmx-exit-load-perf-global-ctrl=on,vmx-exit-ack-intr=on,vmx-exit-save-pat=on,vmx-exit-load-pat=on,vmx-exit-save-efer=on,vmx-exit-load-efer=on,vmx-exit-save-preemption-timer=on,vmx-exit-clear-bndcfgs=on,vmx-entry-noload-debugctl=on,vmx-entry-ia32e-mode=on,vmx-entry-load-perf-global-ctrl=on,vmx-entry-load-pat=on,vmx-entry-load-efer=on,vmx-entry-load-bndcfgs=on,vmx-eptp-switching=on,hle=off,rtm=off

I still get this warning
2025-08-20T08:54:54.477469Z qemu-system-x86_64: warning: This feature
is not available due to PMU being disabled: CPUID[eax=01h].ECX.pdcm
[bit 15]

But more interestingly - in this second start, the one with a managed
save present, we see libvirt call the guest with
   pdcm=off

So for a yet unknown reason this second start is setting pdcm=off and
therefore all that follows is expected.
And yep, then we will get
  error: operation failed: guest CPU doesn't match specification:
missing features: pdcm

So we are partially back to my original suspicion that it is somewhere
in between qemu and libvirt: neither one alone, but their interaction
is what breaks this AFAICS.
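
For anyone who wants to check their own logs for the same thing, here is
a minimal sketch (plain Python, not part of libvirt or qemu) of pulling a
single feature out of a -cpu argument. The helper name parse_cpu_arg is
made up for illustration, and it assumes every property has the simple
name=on/off form seen in the command line above.

```python
def parse_cpu_arg(cpu_arg):
    """Split a QEMU -cpu value into (model, {feature: bool}).

    Assumption: every property is 'name=on' or 'name=off'; anything
    other than 'on' is treated as disabled.
    """
    model, *props = cpu_arg.split(",")
    features = {}
    for prop in props:
        name, _, value = prop.partition("=")
        features[name] = (value == "on")
    return model, features


# Shortened prefix of the -cpu line from the second start shown above.
second_start = "Cascadelake-Server,vmx=on,pdcm=off,hypervisor=on,ss=on"
model, features = parse_cpu_arg(second_start)
print(model, "pdcm:", features.get("pdcm"))
```

Running the same helper on the command line of the first start and of the
start after managedsave makes the pdcm flip easy to spot.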

> The hope was that these will help to further identify what is going
> on, but despite the urgency of the release being imminent I have not
> yet managed to find the time in the last two days :-/
>
> > Sorry for the delay in answering (and thanks Daniel for bringing this to
> > my attention).
> >
> > Thanks,
> >
> > Paolo
> >
>
>
> --
> Christian Ehrhardt
> Director of Engineering, Ubuntu Server
> Canonical Ltd



-- 
Christian Ehrhardt
Director of Engineering, Ubuntu Server
Canonical Ltd


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
  2025-08-20  5:11         ` Christian Ehrhardt
  2025-08-20  9:10           ` Christian Ehrhardt
@ 2025-09-03  8:38           ` Christian Ehrhardt
  2025-09-03 11:26             ` Hector Cao
  2025-09-04 14:35             ` Hector Cao
  1 sibling, 2 replies; 25+ messages in thread
From: Christian Ehrhardt @ 2025-09-03  8:38 UTC (permalink / raw)
  To: Paolo Bonzini, Hector Cao
  Cc: Daniel P. Berrangé, Xiaoyao Li, Zhao Liu, qemu-devel

On Wed, Aug 20, 2025 at 7:11 AM Christian Ehrhardt
<christian.ehrhardt@canonical.com> wrote:
>
> On Tue, Aug 19, 2025 at 4:51 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 8/6/25 21:18, Daniel P. Berrangé wrote:
> > > On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
> > >> On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >>>
> > >>> On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
> > >>>> Hi,
> > >>>> I was unsure if this would be better sent to libvirt or qemu - the
> > >>>> issue is somewhere between libvirt modelling CPUs and qemu 10.1
> > >>>> behaving differently. I did not want to double post and gladly most of
> > >>>> the people are on both lists - since the switch in/out of the problem
> > >>>> is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for not yet
> > >>>> having all the answers, I'm sure I could find more with debugging, but
> > >>>> I also wanted to report early for your awareness while we are still in
> > >>>> the RC phase.
> > >>>>
> > >>>>
> > >>>> # Problem
> > >>>>
> > >>>> What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
> > >>>>    error: operation failed: guest CPU doesn't match specification:
> > >>>> missing features: pdcm
> > >>>>
> > >>>> This is behaving the same with libvirt 11.4 or the more recent 11.6.
> > >>>> But switching back to qemu 10.0 confirmed that this behavior is new
> > >>>> with qemu 10.1-rc.
> > >>>
> > >>>
> > >>>> Without yet having any hard evidence against them I found a few pdcm
> > >>>> related commits between 10.0 and 10.1-rc1:
> > >>>>    7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
> > >>>>    00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> > >>>>    e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
> > >>>> feature_dependencies[] check
> > >>>>    0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
> > >>>>
> > >>>>
> > >>>> # Caveat
> > >>>>
> > >>>> My test environment is in LXD system containers, that gives me issues
> > >>>> in the power management detection
> > >>>>    libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS:
> > >>>> Read-only file system
> > >>>>    libvirtd[406]: Failed to get host power management capabilities
> > >>>
> > >>> That's harmless.
> > >>
> > >> Yeah, it always was for me - thanks for confirming.
> > >>
> > >>>> And the resulting host-model on a  rather old test server will therefore have:
> > >>>>    <cpu mode='custom' match='exact' check='full'>
> > >>>>      <model fallback='forbid'>Haswell-noTSX-IBRS</model>
> > >>>>      <vendor>Intel</vendor>
> > >>>>      <feature policy='require' name='vmx'/>
> > >>>>      <feature policy='disable' name='pdcm'/>
> > >>>>       ...
> > >>>>
> > >>>> But that was fine in the past, and the behavior started to break
> > >>>> save/restore or migrations just now with the new qemu 10.1-rc.
> > >>>>
> > >>>> # Next steps
> > >>>>
> > >>>> I'm soon overwhelmed by meetings for the rest of the day, but would be
> > >>>> curious if one has a suggestion about what to look at next for
> > >>>> debugging or a theory about what might go wrong. If nothing else comes
> > >>>> up I'll try to set up a bisect run tomorrow.
> > >>>
> > >>> Yeah, git bisect is what I'd start with.
> > >>
> > >> Bisect complete, identified this commit
> > >>
> > >> commit 00268e00027459abede448662f8794d78eb4b0a4
> > >> Author: Xiaoyao Li <xiaoyao.li@intel.com>
> > >> Date:   Tue Mar 4 00:24:50 2025 -0500
> > >>
> > >>      i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> > >>
> > >>      When user requests PDCM explicitly via "+pdcm" without PMU enabled, emit
> > >>      a warning to inform the user.
> > >>
> > >>      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > >>      Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> > >>      Link: https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao.li@intel.com
> > >>      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > >>
> > >>   target/i386/cpu.c | 3 +++
> > >>   1 file changed, 3 insertions(+)
> > >>
> > >>
> > >>
> > >> Which is odd as it should only add a warning right?
> > >
> > > No, that commit message is misleading.
> > >
> > > IIUC mark_unavailable_features() actively blocks usage of the feature,
> > > so it is a functional change, not merely a emitting warning.
> > >
> > > It makes me wonder if that commit was actually intended to block the
> > > feature or not, vs merely warning ?  CC'ing those involved in the
> > > commit.
> > We can revert the commit.  I'll send the revert to Stefan and let him
> > decide whether to include it in 10.1-rc4 or delay to 10.2 and 10.1.1.
>
> Thanks Paolo for considering that.
>
> My steps to reproduce seemed really clear and are 100% reproducible
> for me, but no one so far said "yeah they see it too", so I'm getting
> unsure if it was not tried by anyone else or if there is more to it
> than we yet know.
> Further I tested more with the commit reverted, and found that at
> least cross version migrations (9.2 -> 10.1) still have issues that
> seem related - complaining about pdcm as missing feature.
> But that was in a log of a test system that went away and ... you know
> how these things can sometimes be, that new result is not yet very
> reliable.
>
> I intended to check the following matrix more deeply again with and
> without the reverted change and then come back to this thread:
>
> #1 Compare platforms
> - Migrating between non containerized hosts to verify if they are
> affected as well
> - Power management explicitly switched off/on (vs the auto detect of
> host-model) in the guest XML
> #2 Retest the different Use-cases I've seen this pop up
> - 10.1 managed save (broken unless reverting the commit that was identified)
> - 9.2 -> 10.1 migration (seems broken even with the revert)

I need to come back to this aspect of it - the cross release or cross
qemu version migrations.

Hector (on CC) is helping me with that now - sadly we were able to confirm
that migrations from older qemu versions no longer work.
Yes, 10.1 is released by now, so this might end up as a case of "The
problem is what happens when we detect after we have done a release that
something has gone wrong" from [2].
But I still can't believe we are the only ones seeing this, and therefore
for now I want to believe I messed up on our side when merging 10.1 :-)

For now this is a call to ask whether others have also seen a migration
from any older release to 10.1 fail with:
  error: operation failed: guest CPU doesn't match specification:
missing features: pdcm,arch-capabilities

Hector will reply here later today with a summary of what we found so
far, to give you a more complete picture to think about, without
having to read through all the messy interim steps in the Ubuntu bug [1].

[1]: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2121787
[2]: https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/compatibility.rst?plain=1#L322

> The hope was that these will help to further identify what is going
> on, but despite the urgency of the release being imminent I have not
> yet managed to find the time in the last two days :-/
>
> > Sorry for the delay in answering (and thanks Daniel for bringing this to
> > my attention).
> >
> > Thanks,
> >
> > Paolo
> >
>
>
> --
> Christian Ehrhardt
> Director of Engineering, Ubuntu Server
> Canonical Ltd



-- 
Christian Ehrhardt
Director of Engineering, Ubuntu Server
Canonical Ltd


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
  2025-09-03  8:38           ` Christian Ehrhardt
@ 2025-09-03 11:26             ` Hector Cao
  2025-09-04 14:35             ` Hector Cao
  1 sibling, 0 replies; 25+ messages in thread
From: Hector Cao @ 2025-09-03 11:26 UTC (permalink / raw)
  To: Christian Ehrhardt
  Cc: Paolo Bonzini, Daniel P. Berrangé, Xiaoyao Li, Zhao Liu,
	qemu-devel

[-- Attachment #1: Type: text/plain, Size: 12371 bytes --]

On Wed, Sep 3, 2025 at 10:38 AM Christian Ehrhardt <
christian.ehrhardt@canonical.com> wrote:

> On Wed, Aug 20, 2025 at 7:11 AM Christian Ehrhardt
> <christian.ehrhardt@canonical.com> wrote:
> >
> > On Tue, Aug 19, 2025 at 4:51 PM Paolo Bonzini <pbonzini@redhat.com>
> wrote:
> > >
> > > On 8/6/25 21:18, Daniel P. Berrangé wrote:
> > > > On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
> > > >> On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <
> berrange@redhat.com> wrote:
> > > >>>
> > > >>> On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
> > > >>>> Hi,
> > > >>>> I was unsure if this would be better sent to libvirt or qemu - the
> > > >>>> issue is somewhere between libvirt modelling CPUs and qemu 10.1
> > > >>>> behaving differently. I did not want to double post and gladly
> most of
> > > >>>> the people are on both lists - since the switch in/out of the
> problem
> > > >>>> is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for
> not yet
> > > >>>> having all the answers, I'm sure I could find more with
> debugging, but
> > > >>>> I also wanted to report early for your awareness while we are
> still in
> > > >>>> the RC phase.
> > > >>>>
> > > >>>>
> > > >>>> # Problem
> > > >>>>
> > > >>>> What I found when testing migrations in Ubuntu with qemu 10.1-rc1
> was:
> > > >>>>    error: operation failed: guest CPU doesn't match specification:
> > > >>>> missing features: pdcm
> > > >>>>
> > > >>>> This is behaving the same with libvirt 11.4 or the more recent
> 11.6.
> > > >>>> But switching back to qemu 10.0 confirmed that this behavior is
> new
> > > >>>> with qemu 10.1-rc.
> > > >>>
> > > >>>
> > > >>>> Without yet having any hard evidence against them I found a few
> pdcm
> > > >>>> related commits between 10.0 and 10.1-rc1:
> > > >>>>    7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
> > > >>>>    00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not
> available
> > > >>>>    e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before
> > > >>>> feature_dependencies[] check
> > > >>>>    0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
> > > >>>>
> > > >>>>
> > > >>>> # Caveat
> > > >>>>
> > > >>>> My test environment is in LXD system containers, that gives me
> issues
> > > >>>> in the power management detection
> > > >>>>    libvirtd[406]: error from service:
> GDBus.Error:System.Error.EROFS:
> > > >>>> Read-only file system
> > > >>>>    libvirtd[406]: Failed to get host power management capabilities
> > > >>>
> > > >>> That's harmless.
> > > >>
> > > >> Yeah, it always was for me - thanks for confirming.
> > > >>
> > > >>>> And the resulting host-model on a  rather old test server will
> therefore have:
> > > >>>>    <cpu mode='custom' match='exact' check='full'>
> > > >>>>      <model fallback='forbid'>Haswell-noTSX-IBRS</model>
> > > >>>>      <vendor>Intel</vendor>
> > > >>>>      <feature policy='require' name='vmx'/>
> > > >>>>      <feature policy='disable' name='pdcm'/>
> > > >>>>       ...
> > > >>>>
> > > >>>> But that was fine in the past, and the behavior started to break
> > > >>>> save/restore or migrations just now with the new qemu 10.1-rc.
> > > >>>>
> > > >>>> # Next steps
> > > >>>>
> > > >>>> I'm soon overwhelmed by meetings for the rest of the day, but
> would be
> > > >>>> curious if one has a suggestion about what to look at next for
> > > >>>> debugging or a theory about what might go wrong. If nothing else
> comes
> > > >>>> up I'll try to set up a bisect run tomorrow.
> > > >>>
> > > >>> Yeah, git bisect is what I'd start with.
> > > >>
> > > >> Bisect complete, identified this commit
> > > >>
> > > >> commit 00268e00027459abede448662f8794d78eb4b0a4
> > > >> Author: Xiaoyao Li <xiaoyao.li@intel.com>
> > > >> Date:   Tue Mar 4 00:24:50 2025 -0500
> > > >>
> > > >>      i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> > > >>
> > > >>      When user requests PDCM explicitly via "+pdcm" without PMU
> enabled, emit
> > > >>      a warning to inform the user.
> > > >>
> > > >>      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > > >>      Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> > > >>      Link:
> https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao.li@intel.com
> > > >>      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > >>
> > > >>   target/i386/cpu.c | 3 +++
> > > >>   1 file changed, 3 insertions(+)
> > > >>
> > > >>
> > > >>
> > > >> Which is odd as it should only add a warning right?
> > > >
> > > > No, that commit message is misleading.
> > > >
> > > > IIUC mark_unavailable_features() actively blocks usage of the
> feature,
> > > > so it is a functional change, not merely a emitting warning.
> > > >
> > > > It makes me wonder if that commit was actually intended to block the
> > > > feature or not, vs merely warning ?  CC'ing those involved in the
> > > > commit.
> > > We can revert the commit.  I'll send the revert to Stefan and let him
> > > decide whether to include it in 10.1-rc4 or delay to 10.2 and 10.1.1.
> >
> > Thanks Paolo for considering that.
> >
> > My steps to reproduce seemed really clear and are 100% reproducible
> > for me, but no one so far said "yeah they see it too", so I'm getting
> > unsure if it was not tried by anyone else or if there is more to it
> > than we yet know.
> > Further I tested more with the commit reverted, and found that at
> > least cross version migrations (9.2 -> 10.1) still have issues that
> > seem related - complaining about pdcm as missing feature.
> > But that was in a log of a test system that went away and ... you know
> > how these things can sometimes be, that new result is not yet very
> > reliable.
> >
> > I intended to check the following matrix more deeply again with and
> > without the reverted change and then come back to this thread:
> >
> > #1 Compare platforms
> > - Migrating between non containerized hosts to verify if they are
> > affected as well
> > - Power management explicitly switched off/on (vs the auto detect of
> > host-model) in the guest XML
> > #2 Retest the different Use-cases I've seen this pop up
> > - 10.1 managed save (broken unless reverting the commit that was
> identified)
> > - 9.2 -> 10.1 migration (seems broken even with the revert)
>
> I need to come back to this aspect of it - the cross release or cross
> qemu version migrations.
>
> Hector (on CC) helps me on that now - sadly we were able to confirm
> that migrations from older qemu versions no longer work.
> Yep 10.1 is released by now so it might end up as "The problem is what
> happens when we detect after we have done a release that something has
> gone wrong" from [2].
> But I still can't believe only we see this and therefore for now want
> to believe I messed up on our side when merging 10.1 :-)
>
> For now this is a call if others have also seen any older release
> migrating to 10.1 to throw:
>   error: operation failed: guest CPU doesn't match specification:
> missing features: pdcm,arch-capabilities
>
> Hector will later today reply here with a summary of what we found so
> far, to provide you a more complete picture to think about, without
> having to read through all the messy interim steps in the Ubuntu bug.
>
>
Indeed, we see this error when migrating from older QEMU versions to
QEMU 10.1:


$ virsh migrate --unsafe --live test-migration qemu+ssh://10.105.100.188/system
error: operation failed: guest CPU doesn't match specification:
missing features: pdcm,arch-capabilities

The domain definition used to reproduce this issue is quite simple:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>test-migration</name>
  <memory unit='GiB'>2</memory>
  <os>
    <type arch='x86_64' machine='q35'>hvm</type>
  </os>
  <cpu mode='host-model' check='partial'/>
</domain>

Here are the -machine and -cpu parts of the QEMU command lines for a
migration between QEMU 9.2 and QEMU 10.1 on an Intel Haswell CPU:

Origin: QEMU 9.2

...
-machine pc-q35-9.2,usb=off,dump-guest-core=off,memory-backend=pc.ram,acpi=off \
-cpu Haswell-noTSX-IBRS,vmx=on,pdcm=on,f16c=on,rdrand=on,
 hypervisor=on,vme=on,ss=on,arat=on,tsc-adjust=on,zero-fcs-fds=on,
 umip=on,md-clear=on,stibp=on,flush-l1d=on,arch-capabilities=on,
 ssbd=on,xsaveopt=on,abm=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,
 amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,gds-no=on,rfds-no=on,
<vmx-*>
...

<vmx-*> : the vmx-* block is removed for better clarity

Target: QEMU 10.1

...
-machine pc-q35-9.2,usb=off,dump-guest-core=off,memory-backend=pc.ram,acpi=off \
-cpu Haswell-noTSX-IBRS,vmx=on,pdcm=on,f16c=on,rdrand=on,
 hypervisor=on,vme=on,ss=on,arat=on,tsc-adjust=on,zero-fcs-fds=on,
 umip=on,md-clear=on,stibp=on,flush-l1d=on,arch-capabilities=on,
 ssbd=on,xsaveopt=on,abm=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,
 amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,gds-no=on,rfds-no=on,vmx-activity-wait-sipi=on,vmx-pml=on
...
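
To make the comparison of the two command lines less error-prone, here is
a small sketch (plain Python, helper name requested_features is made up)
that compares the requested features of two -cpu values while ignoring
the vmx-* sub-features. The strings are shortened excerpts of the lines
above, so treat this as an illustration of the method rather than a full
comparison.

```python
def requested_features(cpu_arg, skip_prefix="vmx-"):
    """Return (model, {feature: value}) for a -cpu value.

    Properties whose name starts with skip_prefix are ignored, because
    the vmx-* block is elided in one of the two command lines.
    """
    parts = [p.strip() for p in cpu_arg.split(",") if p.strip()]
    model, props = parts[0], parts[1:]
    feats = dict(p.split("=", 1) for p in props if not p.startswith(skip_prefix))
    return model, feats


# Shortened excerpts of the origin (9.2) and target (10.1) -cpu values.
origin = "Haswell-noTSX-IBRS,vmx=on,pdcm=on,f16c=on,rdrand=on,arch-capabilities=on"
target = ("Haswell-noTSX-IBRS,vmx=on,pdcm=on,f16c=on,rdrand=on,"
          "arch-capabilities=on,vmx-activity-wait-sipi=on,vmx-pml=on")

o_model, o_feats = requested_features(origin)
t_model, t_feats = requested_features(target)
# Symmetric difference: properties requested on only one side,
# or requested with different values.
diff = sorted(set(o_feats.items()) ^ set(t_feats.items()))
print("same model:", o_model == t_model, "differences outside vmx-*:", diff)
```

For these excerpts the difference is empty: both sides request pdcm=on and
arch-capabilities=on, so the mismatch comes from what the target QEMU
actually provides, not from what libvirt asks for.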

This migration failure can be broken down into two separate issues, one
per missing feature: pdcm and arch-capabilities.
To the best of our current understanding, QEMU's behavior for both of
these features changed in 10.1.

- arch-capabilities

  https://github.com/qemu/qemu/commit/d3a24134e37d57abd3e7445842cda2717f49e96d
  (target/i386: do not expose ARCH_CAPABILITIES on AMD CPU)

- pdcm
  https://github.com/qemu/qemu/commit/e68ec2980901c8e7f948f3305770962806c53f0b
  (i386/cpu: Move adjustment of CPUID_EXT_PDCM before
feature_dependencies[] check)

  This commit makes QEMU disable pdcm if the PMU is off. I think this
  was already the intended behavior in previous QEMU versions, but a
  bug kept it from taking effect, and this commit fixes that bug.
  When I enable the PMU in the guest definition:
    <features>
    <pmu state='on'/>
    </features>
  the missing pdcm feature error disappears.
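
The pdcm/PMU interaction described above can be written down as a toy
model. This is only a sketch of the behavior as we understand it, not
QEMU or libvirt code; the function names are made up, and the only rule
modeled is "pdcm is dropped when the PMU is off".

```python
def effective_features(requested, pmu_enabled):
    """Toy model: the features the guest ends up with.

    Assumption: the only adjustment is that pdcm is removed when the
    PMU is disabled, matching the warning QEMU prints.
    """
    provided = set(requested)
    if not pmu_enabled:
        provided.discard("pdcm")
    return provided


def missing_features(requested, provided):
    """Features that were requested but are not provided."""
    return sorted(set(requested) - set(provided))


requested = {"vmx", "pdcm"}
for pmu in (False, True):
    provided = effective_features(requested, pmu)
    print("pmu", "on" if pmu else "off",
          "-> missing:", missing_features(requested, provided))
```

With the PMU off the model reports pdcm as missing, which is the
condition libvirt complains about; with the PMU on nothing is missing.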

If we revert both of these commits, the migration works.

We are looking into potential solutions to this migration issue, and
according to the documentation [1], our failure might fall into the
4th case:
  $ qemu-7.2 -M pc-7.2 -> qemu-8.0 -M pc-7.2

Presumably, it is necessary to add some compatibility properties to make
the new behavior of pdcm and arch-capabilities compatible with older
QEMU versions, but as Christian said, 10.1 is already released, so it
might be more complex now.

## Other failed combinations

We looked into all the failing migration combinations we might have in
our supported releases.
We can confirm that the migration is also broken for other QEMU
versions we support in various Ubuntu releases:
(F = Focal, J = Jammy, N = Noble, P = Plucky, Q = Questing)

F-4.2 -> Q-10.1
J-6.2 -> Q-10.1
N-8.2 -> Q-10.1
P-9.2 -> Q-10.1

It may be worth noting that these combinations are covered by our
automated tests, which leave the cpu unspecified and let libvirt pick
safe defaults instead of using host-model as in the sample domain
definition above.

[1] https://www.qemu.org/docs/master/devel/migration/compatibility.html#how-to-mitigate-when-we-have-a-backward-compatibility-error



> [1]: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2121787
> [2]:
> https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/compatibility.rst?plain=1#L322
>
> > The hope was that these will help to further identify what is going
> > on, but despite the urgency of the release being imminent I have not
> > yet managed to find the time in the last two days :-/
> >
> > > Sorry for the delay in answering (and thanks Daniel for bringing this
> to
> > > my attention).
> > >
> > > Thanks,
> > >
> > > Paolo
> > >
> >
> >
> > --
> > Christian Ehrhardt
> > Director of Engineering, Ubuntu Server
> > Canonical Ltd
>
>
>
> --
> Christian Ehrhardt
> Director of Engineering, Ubuntu Server
> Canonical Ltd
>


-- 
Hector CAO
Software Engineer – Partner Engineering Team
hector.cao@canonical.com
https://launchpad.net/~hectorcao

[-- Attachment #2: Type: text/html, Size: 21302 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Issues with pdcm in qemu 10.1-rc on migration and save/restore
  2025-09-03  8:38           ` Christian Ehrhardt
  2025-09-03 11:26             ` Hector Cao
@ 2025-09-04 14:35             ` Hector Cao
  2025-09-10 11:57               ` [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities Hector Cao
  1 sibling, 1 reply; 25+ messages in thread
From: Hector Cao @ 2025-09-04 14:35 UTC (permalink / raw)
  To: Christian Ehrhardt
  Cc: Paolo Bonzini, Daniel P. Berrangé, Xiaoyao Li, Zhao Liu,
	qemu-devel

[-- Attachment #1: Type: text/plain, Size: 9654 bytes --]

Hello,

In addition to my previous mail describing the issue on different
Ubuntu releases, I went further and tested qemu upstream directly at HEAD
(baa79455fa92984ff0f4b9ae94bed66823177a27).

As the starting version for the migration, I took a fairly recent
release, v10.0.x, to keep the version gap small.

I can reproduce the following migration failures:

v10.0.2 -> HEAD:
error: operation failed: guest CPU doesn't match specification:
missing features: pdcm,arch-capabilities

v10.0.3 -> HEAD:
error: operation failed: guest CPU doesn't match specification:
missing features: pdcm
The arch-capabilities error is no longer present because v10.0.3, like
HEAD, already contains the commit referenced in [2].

If I revert the two commits [1] and [2] in HEAD, the migration works fine:

v10.0.2 -> HEAD (+reverts):
OK
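
To compare such runs mechanically, here is a minimal sketch (plain
Python, helper name parse_missing_features is made up) that extracts the
feature list from the libvirt error text and shows which feature only
fails when starting from v10.0.2. The messages are the ones reported
above, joined onto one line.

```python
import re


def parse_missing_features(message):
    """Return the feature names listed after 'missing features:' (or [])."""
    match = re.search(r"missing features:\s*(\S+)", message)
    return match.group(1).split(",") if match else []


# Error texts as reported above, keyed by migration direction.
results = {
    "v10.0.2 -> HEAD": "error: operation failed: guest CPU doesn't match "
                       "specification: missing features: pdcm,arch-capabilities",
    "v10.0.3 -> HEAD": "error: operation failed: guest CPU doesn't match "
                       "specification: missing features: pdcm",
}
missing = {run: set(parse_missing_features(msg)) for run, msg in results.items()}
only_from_10_0_2 = missing["v10.0.2 -> HEAD"] - missing["v10.0.3 -> HEAD"]
print(sorted(only_from_10_0_2))
```

The remaining difference is arch-capabilities, which is the feature whose
behavior is the same in v10.0.3 and HEAD.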

[1] Revert "i386/cpu: Move adjustment of CPUID_EXT_PDCM before
feature_dependencies[] check"
    This reverts commit e68ec2980901c8e7f948f3305770962806c53f0b.

[2] Revert "target/i386: do not expose ARCH_CAPABILITIES on AMD CPU"
    This reverts commit d3a24134e37d57abd3e7445842cda2717f49e96d.

Since this issue is blocking us for the Ubuntu 25.10 release, can you
please provide feedback on the best path forward?




-- 
Hector CAO
Software Engineer – Partner Engineering Team
hector.cao@canonical.com
https://launchpad.net/~hectorcao

[-- Attachment #2: Type: text/html, Size: 13864 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities
  2025-09-04 14:35             ` Hector Cao
@ 2025-09-10 11:57               ` Hector Cao
  2025-09-10 11:57                 ` [PATCH 1/2] target/i386: add compatibility property for arch_capabilities Hector Cao
                                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Hector Cao @ 2025-09-10 11:57 UTC (permalink / raw)
  To: qemu-devel

Hello,

Since it is a blocking issue for us, we went further and ended up with a solution along the lines of [1]
that allows us to get out of this situation.

The idea is to add compatibility properties that restore the legacy behaviors for machine types
of older QEMU versions (<10.1). Two compatibility properties have been added, one for each of
the two missing features, and each one is done in a separate patch.
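
As a rough illustration of the compatibility-property idea described above, the following minimal C sketch models a per-machine-version table that flips two flags back to their legacy values. It is not QEMU code: `CompatProp`, `CompatState` and `compat_state_for()` are hypothetical names invented for this sketch, and only the property names and the "default false, true for machine types before 10.1" behaviour are taken from the patches in this series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a (driver, property, value) compat entry. */
typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} CompatProp;

/* Hypothetical stand-in for the two flags this series adds. */
typedef struct {
    bool arch_cap_always_on;
    bool pdcm_on_even_without_pmu;
} CompatState;

/* Entries meant for machine types 10.0 and older (names from the patches). */
static const CompatProp compat_10_0[] = {
    { "migration", "arch-cap-always-on", "true" },
    { "migration", "pdcm-on-even-without-pmu", "true" },
};

/*
 * Return the flags a machine type of the given version would see.
 * Both flags default to false (the 10.1 behaviour); machine types older
 * than 10.1 get the legacy values from the table above.
 */
static CompatState compat_state_for(int major, int minor)
{
    CompatState st = { false, false };
    bool legacy = major < 10 || (major == 10 && minor < 1);
    size_t i;

    if (!legacy) {
        return st;
    }
    for (i = 0; i < sizeof(compat_10_0) / sizeof(compat_10_0[0]); i++) {
        const CompatProp *p = &compat_10_0[i];
        bool on = strcmp(p->value, "true") == 0;

        if (strcmp(p->driver, "migration") != 0) {
            continue;
        }
        if (strcmp(p->property, "arch-cap-always-on") == 0) {
            st.arch_cap_always_on = on;
        } else if (strcmp(p->property, "pdcm-on-even-without-pmu") == 0) {
            st.pdcm_on_even_without_pmu = on;
        }
    }
    return st;
}
```

With this model, `compat_state_for(10, 0)` has both flags set, while `compat_state_for(10, 1)` leaves both cleared, so only guests started with an older machine type keep the legacy CPUID behaviour.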

We know that 10.1 has been released and is final, but working on a solution towards 11.0
would allow everyone to settle on the fix and even consider backporting it to distributions
that are not yet released, like Ubuntu 25.10 for us.

It is important to have upstream support going forward, in this or any other way,
and we therefore reach out with this RFC to ask you to think about it with us.

[1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/compatibility.rst

Hector Cao (2):
  target/i386: add compatibility property for arch_capabilities
  target/i386: add compatibility property for pdcm feature

 hw/core/machine.c     |  2 ++
 migration/migration.h | 23 +++++++++++++++++++++++
 migration/options.c   |  6 ++++++
 target/i386/cpu.c     | 17 ++++++++++++++---
 target/i386/kvm/kvm.c |  5 ++++-
 5 files changed, 49 insertions(+), 4 deletions(-)

-- 
2.45.2



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/2] target/i386: add compatibility property for arch_capabilities
  2025-09-10 11:57               ` [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities Hector Cao
@ 2025-09-10 11:57                 ` Hector Cao
  2025-09-16  8:12                   ` Daniel P. Berrangé
  2025-09-10 11:57                 ` [PATCH 2/2] target/i386: add compatibility property for pdcm feature Hector Cao
  2025-09-23  7:53                 ` [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities Paolo Bonzini
  2 siblings, 1 reply; 25+ messages in thread
From: Hector Cao @ 2025-09-10 11:57 UTC (permalink / raw)
  To: qemu-devel

Prior to v10.1, arch-capabilities was always on when requested by the
user, even if CPUID advertised it as off/unavailable.
This causes a migration issue for VMs that were started on a machine
without arch-capabilities and expect this feature to be present
on the destination host with QEMU 10.1.

This commit adds a compatibility property to restore the legacy
behavior for all machine types with a version prior to 10.1.

Signed-off-by: Hector Cao <hector.cao@canonical.com>
---
 hw/core/machine.c     |  1 +
 migration/migration.h | 12 ++++++++++++
 migration/options.c   |  3 +++
 target/i386/kvm/kvm.c |  5 ++++-
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 38c949c4f2..8ad5d79cb3 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -45,6 +45,7 @@ GlobalProperty hw_compat_10_0[] = {
     { "vfio-pci", "x-migration-load-config-after-iter", "off" },
     { "ramfb", "use-legacy-x86-rom", "true"},
     { "vfio-pci-nohotplug", "use-legacy-x86-rom", "true" },
+    { "migration", "arch-cap-always-on", "true" },
 };
 const size_t hw_compat_10_0_len = G_N_ELEMENTS(hw_compat_10_0);
 
diff --git a/migration/migration.h b/migration/migration.h
index 01329bf824..5124ff3636 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -510,6 +510,18 @@ struct MigrationState {
     bool rdma_migration;
 
     GSource *hup_source;
+
+    /*
+     * This variable keeps backward compatibility with QEMU (<10.1)
+     * for the arch-capabilities detection.
+     * With commit d3a2413 (since 10.1), the arch-capabilities feature is gated
+     * by the CPUID bit (CPUID_7_0_EDX_ARCH_CAPABILITIES) instead of being always
+     * enabled when the user requests it. This new behavior breaks migration of VMs
+     * created and run with older QEMU on machines without IA32_ARCH_CAPABILITIES MSR;
+     * those VMs might have arch-capabilities enabled and break when migrating
+     * to a host with QEMU 10.1 with error : missing feature arch-capabilities
+     */
+    bool arch_cap_always_on;
 };
 
 void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
diff --git a/migration/options.c b/migration/options.c
index 4e923a2e07..3a80dba9c5 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -203,6 +203,9 @@ const Property migration_properties[] = {
                         MIGRATION_CAPABILITY_SWITCHOVER_ACK),
     DEFINE_PROP_MIG_CAP("x-dirty-limit", MIGRATION_CAPABILITY_DIRTY_LIMIT),
     DEFINE_PROP_MIG_CAP("mapped-ram", MIGRATION_CAPABILITY_MAPPED_RAM),
+
+    DEFINE_PROP_BOOL("arch-cap-always-on", MigrationState,
+                     arch_cap_always_on, false),
 };
 const size_t migration_properties_count = ARRAY_SIZE(migration_properties);
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 306430a052..e2ec4e6de5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -42,6 +42,7 @@
 #include "xen-emu.h"
 #include "hyperv.h"
 #include "hyperv-proto.h"
+#include "migration/migration.h"
 
 #include "gdbstub/enums.h"
 #include "qemu/host-utils.h"
@@ -438,6 +439,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
     uint32_t ret = 0;
     uint32_t cpuid_1_edx, unused;
     uint64_t bitmask;
+    MigrationState *ms = migrate_get_current();
 
     cpuid = get_supported_cpuid(s);
 
@@ -508,7 +510,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
          * mcahines at all, do not show the fake ARCH_CAPABILITIES MSR that
          * KVM sets up.
          */
-        if (!has_msr_arch_capabs || !(edx & CPUID_7_0_EDX_ARCH_CAPABILITIES)) {
+        if (!has_msr_arch_capabs
+            || (!(edx & CPUID_7_0_EDX_ARCH_CAPABILITIES) && (!ms->arch_cap_always_on))) {
             ret &= ~CPUID_7_0_EDX_ARCH_CAPABILITIES;
         }
     } else if (function == 7 && index == 1 && reg == R_EAX) {
-- 
2.45.2
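
To make the changed condition in the hunk above easier to follow, here is a small standalone C sketch of the decision it encodes. It is a simplified model, not the QEMU function: `guest_sees_arch_capabilities()` is a hypothetical name, the parameters mirror `has_msr_arch_capabs`, the host CPUID bit and the new `arch_cap_always_on` flag, and it assumes that KVM reported the bit as supported in the first place.

```c
#include <stdbool.h>

/*
 * Simplified model of the ARCH_CAPABILITIES gating after this patch.
 * Assumption: KVM already reports the bit as supported, so the only
 * question is whether the bit gets cleared again.
 */
static bool guest_sees_arch_capabilities(bool has_msr_arch_capabs,
                                         bool host_cpuid_bit,
                                         bool arch_cap_always_on)
{
    bool visible = true;

    /*
     * Clear the bit if the MSR is not available at all, or if the host
     * CPUID lacks the bit and the legacy compat flag is off.
     */
    if (!has_msr_arch_capabs ||
        (!host_cpuid_bit && !arch_cap_always_on)) {
        visible = false;
    }
    return visible;
}
```

In this model a host without the CPUID bit hides the feature under 10.1 semantics (`arch_cap_always_on == false`) but keeps it for older machine types (`arch_cap_always_on == true`), which is the case the migration error was about.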



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/2] target/i386: add compatibility property for pdcm feature
  2025-09-10 11:57               ` [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities Hector Cao
  2025-09-10 11:57                 ` [PATCH 1/2] target/i386: add compatibility property for arch_capabilities Hector Cao
@ 2025-09-10 11:57                 ` Hector Cao
  2025-09-23  7:53                 ` [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities Paolo Bonzini
  2 siblings, 0 replies; 25+ messages in thread
From: Hector Cao @ 2025-09-10 11:57 UTC (permalink / raw)
  To: qemu-devel

The pdcm feature is supposed to be disabled when the PMU is not
available. Prior to v10.1, the pdcm feature was enabled even when the
PMU was off. This behavior has been fixed, but the change breaks the
migration of VMs that were started with QEMU < 10.1 and expect the pdcm
feature to be enabled on the destination host.

This commit restores the legacy behavior for machine types with a version
prior to 10.1 to allow migration from older QEMU to QEMU 10.1.

Signed-off-by: Hector Cao <hector.cao@canonical.com>
---
 hw/core/machine.c     |  1 +
 migration/migration.h | 11 +++++++++++
 migration/options.c   |  3 +++
 target/i386/cpu.c     | 17 ++++++++++++++---
 4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 8ad5d79cb3..535184c221 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -46,6 +46,7 @@ GlobalProperty hw_compat_10_0[] = {
     { "ramfb", "use-legacy-x86-rom", "true"},
     { "vfio-pci-nohotplug", "use-legacy-x86-rom", "true" },
     { "migration", "arch-cap-always-on", "true" },
+    { "migration", "pdcm-on-even-without-pmu", "true" },
 };
 const size_t hw_compat_10_0_len = G_N_ELEMENTS(hw_compat_10_0);
 
diff --git a/migration/migration.h b/migration/migration.h
index 5124ff3636..7d5b2aa042 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -522,6 +522,17 @@ struct MigrationState {
      * to a host with QEMU 10.1 with error : missing feature arch-capabilities
      */
     bool arch_cap_always_on;
+
+    /*
+     * This variable keeps backward compatibility with QEMU (<10.1)
+     * for the pdcm feature detection. The pdcm feature should be disabled when
+     * the PMU is not available. Prior to 10.1, there was a bug and pdcm could
+     * still be enabled even if the PMU was off. This behavior has been fixed by
+     * commit e68ec29 (since 10.1).
+     * This new behavior breaks migration of VMs started with older QEMU, which
+     * expect pdcm to still be enabled on a destination running QEMU 10.1.
+     */
+    bool pdcm_on_even_without_pmu;
 };
 
 void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
diff --git a/migration/options.c b/migration/options.c
index 3a80dba9c5..a2a95dfcc4 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -206,6 +206,9 @@ const Property migration_properties[] = {
 
     DEFINE_PROP_BOOL("arch-cap-always-on", MigrationState,
                      arch_cap_always_on, false),
+
+    DEFINE_PROP_BOOL("pdcm-on-even-without-pmu", MigrationState,
+                     pdcm_on_even_without_pmu, false),
 };
 const size_t migration_properties_count = ARRAY_SIZE(migration_properties);
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6d85149e6e..1f0f2c8dbf 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -51,6 +51,8 @@
 #include "disas/capstone.h"
 #include "cpu-internal.h"
 
+#include "migration/migration.h"
+
 static void x86_cpu_realizefn(DeviceState *dev, Error **errp);
 static void x86_cpu_get_supported_cpuid(uint32_t func, uint32_t index,
                                         uint32_t *eax, uint32_t *ebx,
@@ -7839,6 +7841,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
     uint32_t signature[3];
     X86CPUTopoInfo *topo_info = &env->topo_info;
     uint32_t threads_per_pkg;
+    MigrationState *ms = migrate_get_current();
 
     threads_per_pkg = x86_threads_per_pkg(topo_info);
 
@@ -7894,6 +7897,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             /* Fixup overflow: max value for bits 23-16 is 255. */
             *ebx |= MIN(num, 255) << 16;
         }
+        if (ms->pdcm_on_even_without_pmu) {
+            if (!cpu->enable_pmu) {
+                *ecx &= ~CPUID_EXT_PDCM;
+            }
+        }
         break;
     case 2: { /* cache info: needed for Pentium Pro compatibility */
         const CPUCaches *caches;
@@ -8892,6 +8900,7 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
     FeatureWord w;
     int i;
     GList *l;
+    MigrationState *ms = migrate_get_current();
 
     for (l = plus_features; l; l = l->next) {
         const char *prop = l->data;
@@ -8944,9 +8953,11 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
         }
     }
 
-    /* PDCM is fixed1 bit for TDX */
-    if (!cpu->enable_pmu && !is_tdx_vm()) {
-        env->features[FEAT_1_ECX] &= ~CPUID_EXT_PDCM;
+    if (!ms->pdcm_on_even_without_pmu) {
+        /* PDCM is fixed1 bit for TDX */
+        if (!cpu->enable_pmu && !is_tdx_vm()) {
+            env->features[FEAT_1_ECX] &= ~CPUID_EXT_PDCM;
+        }
     }
 
     for (i = 0; i < ARRAY_SIZE(feature_dependencies); i++) {
-- 
2.45.2
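
The two hunks in target/i386/cpu.c act at different stages, which the following standalone C sketch tries to make explicit. It is a simplified model under stated assumptions: `PdcmView` and `pdcm_view()` are hypothetical names, and the model assumes that the CPUID leaf 1 ECX value is derived from the expanded feature word before the extra masking is applied.

```c
#include <stdbool.h>

/* Hypothetical result type: PDCM as seen in the feature word and in CPUID. */
typedef struct {
    bool feature_word_pdcm; /* what feature expansion leaves enabled */
    bool cpuid_pdcm;        /* what the guest reads from CPUID leaf 1 ECX */
} PdcmView;

/*
 * Simplified model of the PDCM handling after this patch.
 * compat corresponds to the pdcm-on-even-without-pmu property.
 */
static PdcmView pdcm_view(bool requested, bool enable_pmu,
                          bool is_tdx, bool compat)
{
    PdcmView v;

    /* Feature expansion: 10.1 semantics clear PDCM without a PMU. */
    v.feature_word_pdcm = requested;
    if (!compat && !enable_pmu && !is_tdx) {
        v.feature_word_pdcm = false;
    }

    /* CPUID time: legacy semantics only hide PDCM from the guest. */
    v.cpuid_pdcm = v.feature_word_pdcm;
    if (compat && !enable_pmu) {
        v.cpuid_pdcm = false;
    }
    return v;
}
```

Under this reading, a guest without a PMU never sees PDCM in CPUID, but with the compat flag set the feature word still carries it, which is what the "missing features: pdcm" check on the destination compares against.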



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities
@ 2025-09-12 13:35 Hector Cao
  0 siblings, 0 replies; 25+ messages in thread
From: Hector Cao @ 2025-09-12 13:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Zhao Liu, peterx, farosas

[-- Attachment #1: Type: text/plain, Size: 230 bytes --]

Thanks to Fiona Ebner for pointing out (in DM) that I did not CC the
relevant maintainers.
Let me CC the maintainers listed by the ./scripts/get_maintainer.pl
script for the files changed in this submission.

Kind regards,
Hector


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] target/i386: add compatibility property for arch_capabilities
  2025-09-10 11:57                 ` [PATCH 1/2] target/i386: add compatibility property for arch_capabilities Hector Cao
@ 2025-09-16  8:12                   ` Daniel P. Berrangé
  2025-09-16  8:28                     ` Hector Cao
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel P. Berrangé @ 2025-09-16  8:12 UTC (permalink / raw)
  To: Hector Cao; +Cc: qemu-devel, Paolo Bonzini

CC Paolo as maintainer

On Wed, Sep 10, 2025 at 01:57:32PM +0200, Hector Cao wrote:
> Prior to v10.1, arch-capabilities was always on when requested by the
> user, even if CPUID advertised it as off/unavailable.
> This causes a migration issue for VMs that were started on a machine
> without arch-capabilities and expect this feature to be present
> on the destination host with QEMU 10.1.
> 
> This commit adds a compatibility property to restore the legacy
> behavior for all machine types with a version prior to 10.1.
>

Can you add a 'Fixes: <hash>' line to refer to the original
commit in 10.1 that introduced the regression?

> Signed-off-by: Hector Cao <hector.cao@canonical.com>
> ---
>  hw/core/machine.c     |  1 +
>  migration/migration.h | 12 ++++++++++++
>  migration/options.c   |  3 +++
>  target/i386/kvm/kvm.c |  5 ++++-
>  4 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 38c949c4f2..8ad5d79cb3 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -45,6 +45,7 @@ GlobalProperty hw_compat_10_0[] = {
>      { "vfio-pci", "x-migration-load-config-after-iter", "off" },
>      { "ramfb", "use-legacy-x86-rom", "true"},
>      { "vfio-pci-nohotplug", "use-legacy-x86-rom", "true" },
> +    { "migration", "arch-cap-always-on", "true" },
>  };
>  const size_t hw_compat_10_0_len = G_N_ELEMENTS(hw_compat_10_0);
>  
> diff --git a/migration/migration.h b/migration/migration.h
> index 01329bf824..5124ff3636 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -510,6 +510,18 @@ struct MigrationState {
>      bool rdma_migration;
>  
>      GSource *hup_source;
> +
> +    /*
> +     * This variable allows to keep the backward compatibility with QEMU (<10.1)
> +     * on the arch-capabilities detection.
> +     * With the commit d3a2413 (since 10.1), the arch-capabilities feature is gated
> +     * with the CPUID bit (CPUID_7_0_EDX_ARCH_CAPABILITIES) instead of being always
> +     * enabled when user requests for it. this new behavior breaks migration of VMs
> +     * created and run with older QEMU on machines without IA32_ARCH_CAPABILITIES MSR,
> +     * those VMs might have arch-capabilities enabled and break when migrating
> +     * to a host with QEMU 10.1 with error : missing feature arch-capabilities
> +     */
> +    bool arch_cap_always_on;
>  };
>  
>  void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
> diff --git a/migration/options.c b/migration/options.c
> index 4e923a2e07..3a80dba9c5 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -203,6 +203,9 @@ const Property migration_properties[] = {
>                          MIGRATION_CAPABILITY_SWITCHOVER_ACK),
>      DEFINE_PROP_MIG_CAP("x-dirty-limit", MIGRATION_CAPABILITY_DIRTY_LIMIT),
>      DEFINE_PROP_MIG_CAP("mapped-ram", MIGRATION_CAPABILITY_MAPPED_RAM),
> +
> +    DEFINE_PROP_BOOL("arch-cap-always-on", MigrationState,
> +                     arch_cap_always_on, false),
>  };
>  const size_t migration_properties_count = ARRAY_SIZE(migration_properties);
>  
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 306430a052..e2ec4e6de5 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -42,6 +42,7 @@
>  #include "xen-emu.h"
>  #include "hyperv.h"
>  #include "hyperv-proto.h"
> +#include "migration/migration.h"
>  
>  #include "gdbstub/enums.h"
>  #include "qemu/host-utils.h"
> @@ -438,6 +439,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
>      uint32_t ret = 0;
>      uint32_t cpuid_1_edx, unused;
>      uint64_t bitmask;
> +    MigrationState *ms = migrate_get_current();
>  
>      cpuid = get_supported_cpuid(s);
>  
> @@ -508,7 +510,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
>           * mcahines at all, do not show the fake ARCH_CAPABILITIES MSR that
>           * KVM sets up.
>           */
> -        if (!has_msr_arch_capabs || !(edx & CPUID_7_0_EDX_ARCH_CAPABILITIES)) {
> +        if (!has_msr_arch_capabs
> +            || (!(edx & CPUID_7_0_EDX_ARCH_CAPABILITIES) && (!ms->arch_cap_always_on))) {
>              ret &= ~CPUID_7_0_EDX_ARCH_CAPABILITIES;
>          }
>      } else if (function == 7 && index == 1 && reg == R_EAX) {
> -- 
> 2.45.2
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] target/i386: add compatibility property for arch_capabilities
  2025-09-16  8:12                   ` Daniel P. Berrangé
@ 2025-09-16  8:28                     ` Hector Cao
  2025-09-23  7:25                       ` Christian Ehrhardt
  0 siblings, 1 reply; 25+ messages in thread
From: Hector Cao @ 2025-09-16  8:28 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 5483 bytes --]

On Tue, Sep 16, 2025 at 10:13 AM Daniel P. Berrangé <berrange@redhat.com>
wrote:

> CC Paolo as maintainer
>
> On Wed, Sep 10, 2025 at 01:57:32PM +0200, Hector Cao wrote:
> > Prior to v10.1, arch-capabilities was always on when requested by the
> > user, even if CPUID advertised it as off/unavailable.
> > This causes a migration issue for VMs that were started on a machine
> > without arch-capabilities and expect this feature to be present
> > on the destination host with QEMU 10.1.
> >
> > This commit adds a compatibility property to restore the legacy
> > behavior for all machine types with a version prior to 10.1.
> >
>
> Can you add a 'Fixes: <hash>' line to refer to the original
> commit in 10.1 that introduced the regression?
>

Thanks Daniel for the feedback.

Since this patch is a PoC at the moment, I will submit the final one later,
once I have enough feedback.

Here is the line I will add to this patch header:

Fixes: d3a2413


>
> > Signed-off-by: Hector Cao <hector.cao@canonical.com>
> > ---
> >  hw/core/machine.c     |  1 +
> >  migration/migration.h | 12 ++++++++++++
> >  migration/options.c   |  3 +++
> >  target/i386/kvm/kvm.c |  5 ++++-
> >  4 files changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > index 38c949c4f2..8ad5d79cb3 100644
> > --- a/hw/core/machine.c
> > +++ b/hw/core/machine.c
> > @@ -45,6 +45,7 @@ GlobalProperty hw_compat_10_0[] = {
> >      { "vfio-pci", "x-migration-load-config-after-iter", "off" },
> >      { "ramfb", "use-legacy-x86-rom", "true"},
> >      { "vfio-pci-nohotplug", "use-legacy-x86-rom", "true" },
> > +    { "migration", "arch-cap-always-on", "true" },
> >  };
> >  const size_t hw_compat_10_0_len = G_N_ELEMENTS(hw_compat_10_0);
> >
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 01329bf824..5124ff3636 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -510,6 +510,18 @@ struct MigrationState {
> >      bool rdma_migration;
> >
> >      GSource *hup_source;
> > +
> > +    /*
> > +     * This variable allows to keep the backward compatibility with
> QEMU (<10.1)
> > +     * on the arch-capabilities detection.
> > +     * With the commit d3a2413 (since 10.1), the arch-capabilities
> feature is gated
> > +     * with the CPUID bit (CPUID_7_0_EDX_ARCH_CAPABILITIES) instead of
> being always
> > +     * enabled when user requests for it. this new behavior breaks
> migration of VMs
> > +     * created and run with older QEMU on machines without
> IA32_ARCH_CAPABILITIES MSR,
> > +     * those VMs might have arch-capabilities enabled and break when
> migrating
> > +     * to a host with QEMU 10.1 with error : missing feature
> arch-capabilities
> > +     */
> > +    bool arch_cap_always_on;
> >  };
> >
> >  void migrate_set_state(MigrationStatus *state, MigrationStatus
> old_state,
> > diff --git a/migration/options.c b/migration/options.c
> > index 4e923a2e07..3a80dba9c5 100644
> > --- a/migration/options.c
> > +++ b/migration/options.c
> > @@ -203,6 +203,9 @@ const Property migration_properties[] = {
> >                          MIGRATION_CAPABILITY_SWITCHOVER_ACK),
> >      DEFINE_PROP_MIG_CAP("x-dirty-limit",
> MIGRATION_CAPABILITY_DIRTY_LIMIT),
> >      DEFINE_PROP_MIG_CAP("mapped-ram", MIGRATION_CAPABILITY_MAPPED_RAM),
> > +
> > +    DEFINE_PROP_BOOL("arch-cap-always-on", MigrationState,
> > +                     arch_cap_always_on, false),
> >  };
> >  const size_t migration_properties_count =
> ARRAY_SIZE(migration_properties);
> >
> > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > index 306430a052..e2ec4e6de5 100644
> > --- a/target/i386/kvm/kvm.c
> > +++ b/target/i386/kvm/kvm.c
> > @@ -42,6 +42,7 @@
> >  #include "xen-emu.h"
> >  #include "hyperv.h"
> >  #include "hyperv-proto.h"
> > +#include "migration/migration.h"
> >
> >  #include "gdbstub/enums.h"
> >  #include "qemu/host-utils.h"
> > @@ -438,6 +439,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s,
> uint32_t function,
> >      uint32_t ret = 0;
> >      uint32_t cpuid_1_edx, unused;
> >      uint64_t bitmask;
> > +    MigrationState *ms = migrate_get_current();
> >
> >      cpuid = get_supported_cpuid(s);
> >
> > @@ -508,7 +510,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s,
> uint32_t function,
> >           * mcahines at all, do not show the fake ARCH_CAPABILITIES MSR
> that
> >           * KVM sets up.
> >           */
> > -        if (!has_msr_arch_capabs || !(edx &
> CPUID_7_0_EDX_ARCH_CAPABILITIES)) {
> > +        if (!has_msr_arch_capabs
> > +            || (!(edx & CPUID_7_0_EDX_ARCH_CAPABILITIES) &&
> (!ms->arch_cap_always_on))) {
> >              ret &= ~CPUID_7_0_EDX_ARCH_CAPABILITIES;
> >          }
> >      } else if (function == 7 && index == 1 && reg == R_EAX) {
> > --
> > 2.45.2
> >
> >
>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-
> https://www.instagram.com/dberrange :|
>
>

-- 
Hector CAO
Software Engineer – Server Team / Virtualization
hector.cao@canonical.com
https://launchpad.net/~hectorcao


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] target/i386: add compatibility property for arch_capabilities
  2025-09-16  8:28                     ` Hector Cao
@ 2025-09-23  7:25                       ` Christian Ehrhardt
  0 siblings, 0 replies; 25+ messages in thread
From: Christian Ehrhardt @ 2025-09-23  7:25 UTC (permalink / raw)
  To: Hector Cao; +Cc: Daniel P. Berrangé, qemu-devel, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 7100 bytes --]

On Tue, Sep 16, 2025 at 10:29 AM Hector Cao <hector.cao@canonical.com>
wrote:

>
>
> On Tue, Sep 16, 2025 at 10:13 AM Daniel P. Berrangé <berrange@redhat.com>
> wrote:
>
>> CC Paolo as maintainer
>>
>> On Wed, Sep 10, 2025 at 01:57:32PM +0200, Hector Cao wrote:
>> > Prior to v10.1, arch-capabilities was always on when requested by the
>> > user, even if CPUID advertised it as off/unavailable.
>> > This causes a migration issue for VMs that were started on a machine
>> > without arch-capabilities and expect this feature to be present
>> > on the destination host with QEMU 10.1.
>> >
>> > This commit adds a compatibility property to restore the legacy
>> > behavior for all machine types with a version prior to 10.1.
>> >
>>
>> Can you add a 'Fixes: <hash>' line to refer to the original
>> commit in 10.1 that introduced the regression?
>>
>
> Thanks Daniel for the feedback.
>
> Since this patch is a PoC at the moment, I will submit the final one
> later, once I have enough feedback.
>
> Here is the line I will add to this patch header:
>
> Fixes: d3a2413
>

Hi Paolo, Daniel and qemu-devel folks,

Sorry to bother you with repeated pings on this.

To help you understand: the Ubuntu 25.10 release will happen in about
2.5 weeks. While it is already too late to fix 10.1 before its release,
this is our chance to still fix what we will ship with Ubuntu 25.10.

Tomorrow (24th Sept) is the day we need to decide whether we will pick the
proposed patch for Ubuntu 25.10, leave it broken (not being able to
migrate to it from older Ubuntu/QEMU releases), or take any other path forward.

Therefore, once more: if there could be some reaction, such as your judgement
of the overall situation or your assumptions about how this will eventually
conclude, any of that will help us to decide for Ubuntu.

We have been expecting more of a discussion. In fact, we mostly rushed to
get something out for discussion rather than spend a long time getting every
line into final form.

So far the feedback has been great, but it has mostly been useful small process
suggestions. As Hector said, these will be addressed in the non-RFC
submission. But maybe we misinterpreted the lack of discussion as the thread
being dormant, while you have actually been OK with it except for these
mechanical things?

I’ll ask Hector to resubmit as non-RFC today.



>
>
>>
>> > Signed-off-by: Hector Cao <hector.cao@canonical.com>
>> > ---
>> >  hw/core/machine.c     |  1 +
>> >  migration/migration.h | 12 ++++++++++++
>> >  migration/options.c   |  3 +++
>> >  target/i386/kvm/kvm.c |  5 ++++-
>> >  4 files changed, 20 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/hw/core/machine.c b/hw/core/machine.c
>> > index 38c949c4f2..8ad5d79cb3 100644
>> > --- a/hw/core/machine.c
>> > +++ b/hw/core/machine.c
>> > @@ -45,6 +45,7 @@ GlobalProperty hw_compat_10_0[] = {
>> >      { "vfio-pci", "x-migration-load-config-after-iter", "off" },
>> >      { "ramfb", "use-legacy-x86-rom", "true"},
>> >      { "vfio-pci-nohotplug", "use-legacy-x86-rom", "true" },
>> > +    { "migration", "arch-cap-always-on", "true" },
>> >  };
>> >  const size_t hw_compat_10_0_len = G_N_ELEMENTS(hw_compat_10_0);
>> >
>> > diff --git a/migration/migration.h b/migration/migration.h
>> > index 01329bf824..5124ff3636 100644
>> > --- a/migration/migration.h
>> > +++ b/migration/migration.h
>> > @@ -510,6 +510,18 @@ struct MigrationState {
>> >      bool rdma_migration;
>> >
>> >      GSource *hup_source;
>> > +
>> > +    /*
>> > +     * This variable allows to keep the backward compatibility with
>> QEMU (<10.1)
>> > +     * on the arch-capabilities detection.
>> > +     * With the commit d3a2413 (since 10.1), the arch-capabilities
>> feature is gated
>> > +     * with the CPUID bit (CPUID_7_0_EDX_ARCH_CAPABILITIES) instead of
>> being always
>> > +     * enabled when user requests for it. this new behavior breaks
>> migration of VMs
>> > +     * created and run with older QEMU on machines without
>> IA32_ARCH_CAPABILITIES MSR,
>> > +     * those VMs might have arch-capabilities enabled and break when
>> migrating
>> > +     * to a host with QEMU 10.1 with error : missing feature
>> arch-capabilities
>> > +     */
>> > +    bool arch_cap_always_on;
>> >  };
>> >
>> >  void migrate_set_state(MigrationStatus *state, MigrationStatus
>> old_state,
>> > diff --git a/migration/options.c b/migration/options.c
>> > index 4e923a2e07..3a80dba9c5 100644
>> > --- a/migration/options.c
>> > +++ b/migration/options.c
>> > @@ -203,6 +203,9 @@ const Property migration_properties[] = {
>> >                          MIGRATION_CAPABILITY_SWITCHOVER_ACK),
>> >      DEFINE_PROP_MIG_CAP("x-dirty-limit",
>> MIGRATION_CAPABILITY_DIRTY_LIMIT),
>> >      DEFINE_PROP_MIG_CAP("mapped-ram", MIGRATION_CAPABILITY_MAPPED_RAM),
>> > +
>> > +    DEFINE_PROP_BOOL("arch-cap-always-on", MigrationState,
>> > +                     arch_cap_always_on, false),
>> >  };
>> >  const size_t migration_properties_count =
>> ARRAY_SIZE(migration_properties);
>> >
>> > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> > index 306430a052..e2ec4e6de5 100644
>> > --- a/target/i386/kvm/kvm.c
>> > +++ b/target/i386/kvm/kvm.c
>> > @@ -42,6 +42,7 @@
>> >  #include "xen-emu.h"
>> >  #include "hyperv.h"
>> >  #include "hyperv-proto.h"
>> > +#include "migration/migration.h"
>> >
>> >  #include "gdbstub/enums.h"
>> >  #include "qemu/host-utils.h"
>> > @@ -438,6 +439,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s,
>> uint32_t function,
>> >      uint32_t ret = 0;
>> >      uint32_t cpuid_1_edx, unused;
>> >      uint64_t bitmask;
>> > +    MigrationState *ms = migrate_get_current();
>> >
>> >      cpuid = get_supported_cpuid(s);
>> >
>> > @@ -508,7 +510,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s,
>> uint32_t function,
>> >           * mcahines at all, do not show the fake ARCH_CAPABILITIES MSR
>> that
>> >           * KVM sets up.
>> >           */
>> > -        if (!has_msr_arch_capabs || !(edx &
>> CPUID_7_0_EDX_ARCH_CAPABILITIES)) {
>> > +        if (!has_msr_arch_capabs
>> > +            || (!(edx & CPUID_7_0_EDX_ARCH_CAPABILITIES) &&
>> (!ms->arch_cap_always_on))) {
>> >              ret &= ~CPUID_7_0_EDX_ARCH_CAPABILITIES;
>> >          }
>> >      } else if (function == 7 && index == 1 && reg == R_EAX) {
>> > --
>> > 2.45.2
>> >
>> >
>>
>> With regards,
>> Daniel
>> --
>> |: https://berrange.com      -o-
>> https://www.flickr.com/photos/dberrange :|
>> |: https://libvirt.org         -o-
>> https://fstop138.berrange.com :|
>> |: https://entangle-photo.org    -o-
>> https://www.instagram.com/dberrange :|
>>
>>
>
> --
> Hector CAO
> Software Engineer – Server Team / Virtualization
> hector.cao@canonical.com
> https://launchpad.net/~hectorcao
>


-- 
Christian Ehrhardt
Director of Engineering, Ubuntu Server
Canonical Ltd


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities
  2025-09-10 11:57               ` [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities Hector Cao
  2025-09-10 11:57                 ` [PATCH 1/2] target/i386: add compatibility property for arch_capabilities Hector Cao
  2025-09-10 11:57                 ` [PATCH 2/2] target/i386: add compatibility property for pdcm feature Hector Cao
@ 2025-09-23  7:53                 ` Paolo Bonzini
  2025-09-23 10:08                   ` Hector Cao
  2 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2025-09-23  7:53 UTC (permalink / raw)
  To: Hector Cao, qemu-devel

On 9/10/25 13:57, Hector Cao wrote:
> Hello,
> 
> Since it is a blocking issue for us, we went further and ended up with a solution along the lines of [1]
> that allows us to get out of this situation.
> 
> The idea is to add compatibility properties that restore the legacy behaviors for machine types
> of older QEMU versions (<10.1). Two compatibility properties have been added, one for each of
> the two missing features, and each one is done in a separate patch.
> 
> We know that 10.1 has been released and is final, but working on a solution towards 11.0
> would allow everyone to settle on the fix and even consider backporting it to distributions
> that are not yet released, like Ubuntu 25.10 for us.

Thanks, I have applied the patch.  It's better to have the fix in 10.1.1.

Sorry for the delay, I was on vacation for one week and working reduced 
hours the next.

Paolo

> It is important to have upstream support going forward in this or any other way,
> and we therefore reach out with this RFC to ask you to think about it with us.
> 
> [1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/compatibility.rst
> 
> Hector Cao (2):
>    target/i386: add compatibility property for arch_capabilities
>    target/i386: add compatibility property for pdcm feature
> 
>   hw/core/machine.c     |  2 ++
>   migration/migration.h | 23 +++++++++++++++++++++++
>   migration/options.c   |  6 ++++++
>   target/i386/cpu.c     | 17 ++++++++++++++---
>   target/i386/kvm/kvm.c |  5 ++++-
>   5 files changed, 49 insertions(+), 4 deletions(-)
> 



* Re: [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities
  2025-09-23  7:53                 ` [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities Paolo Bonzini
@ 2025-09-23 10:08                   ` Hector Cao
  2025-09-23 10:15                     ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Hector Cao @ 2025-09-23 10:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Thanks Paolo,

Is there still time for me to submit a v2 of this patch? I would like to
add 2 changes:
- add fixes:xxx line suggested by Daniel
- fix link error for qemu-user build (since it has no access to migration
code)

Best,
Hector

<https://launchpad.net/~hectorcao>

On Tue, Sep 23, 2025, 09:53, Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 9/10/25 13:57, Hector Cao wrote:
> > Hello,
> >
> > Since it is a blocking issue for us, we went further and ended up with a solution along [1]
> > that allows us to get out of this situation.
> >
> > The idea is to add compatibility properties to restore legacy behaviors for machine types
> > with older versions of QEMU (<10.1). 2 compatibility properties have been added to address
> > respectively the 2 missing features, each one is done in a separate patch.
> >
> > We know that 10.1 has been released and it's final, but working on a solution towards 11.0
> > would allow everyone to settle on the fix and even consider backporting where not yet released
> > like Ubuntu 25.10 for us.
>
> Thanks, I have applied the patch.  It's better to have the fix in 10.1.1.
>
> Sorry for the delay, I was on vacation for one week and working reduced
> hours the next.
>
> Paolo
>
> > It is important to have upstream support going forward in this or any other way,
> > and we therefore reach out with this RFC to ask you to think about it with us.
> >
> > [1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/compatibility.rst
> >
> > Hector Cao (2):
> >    target/i386: add compatibility property for arch_capabilities
> >    target/i386: add compatibility property for pdcm feature
> >
> >   hw/core/machine.c     |  2 ++
> >   migration/migration.h | 23 +++++++++++++++++++++++
> >   migration/options.c   |  6 ++++++
> >   target/i386/cpu.c     | 17 ++++++++++++++---
> >   target/i386/kvm/kvm.c |  5 ++++-
> >   5 files changed, 49 insertions(+), 4 deletions(-)
> >
>
>

* Re: [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities
  2025-09-23 10:08                   ` Hector Cao
@ 2025-09-23 10:15                     ` Paolo Bonzini
  2025-09-23 10:31                       ` Hector Cao
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2025-09-23 10:15 UTC (permalink / raw)
  To: Hector Cao; +Cc: qemu-devel

On Tue, Sep 23, 2025 at 12:08 PM Hector Cao <hector.cao@canonical.com> wrote:
>
> Thanks Paolo,
>
> Is there still time for me to submit a v2 of this patch? I would like to add 2 changes:
> - add fixes:xxx line suggested by Daniel
> - fix link error for qemu-user build (since it has no access to migration code)

I have since noticed the link error indeed, and I'll post a v2 myself
with the fix.

Next time, if you notice a problem with the patch you should post the
fixed version without waiting for input.

Paolo



* Re: [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities
  2025-09-23 10:15                     ` Paolo Bonzini
@ 2025-09-23 10:31                       ` Hector Cao
  0 siblings, 0 replies; 25+ messages in thread
From: Hector Cao @ 2025-09-23 10:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 23, 2025, 12:15, Paolo Bonzini <pbonzini@redhat.com> wrote:

> On Tue, Sep 23, 2025 at 12:08 PM Hector Cao <hector.cao@canonical.com> wrote:
> >
> > Thanks Paolo,
> >
> > Is there still time for me to submit a v2 of this patch? I would like to add 2 changes:
> > - add fixes:xxx line suggested by Daniel
> > - fix link error for qemu-user build (since it has no access to migration code)
>
> I have since noticed the link error indeed, and I'll post a v2 myself
> with the fix.
>
> Next time, if you notice a problem with the patch you should post the
> fixed version without waiting for input.
>

Lesson learnt, thanks!

Hector

>
> Paolo
>
>

Thread overview: 25+ messages
2025-08-06 11:52 Issues with pdcm in qemu 10.1-rc on migration and save/restore Christian Ehrhardt
2025-08-06 12:00 ` Daniel P. Berrangé
2025-08-06 17:57   ` Christian Ehrhardt
2025-08-06 19:18     ` Daniel P. Berrangé
2025-08-07  3:38       ` Xiaoyao Li
2025-08-07  6:37         ` Christian Ehrhardt
2025-08-07  8:09           ` Xiaoyao Li
2025-08-10 13:07             ` Christian Ehrhardt
2025-08-19 14:51       ` Paolo Bonzini
2025-08-20  5:11         ` Christian Ehrhardt
2025-08-20  9:10           ` Christian Ehrhardt
2025-09-03  8:38           ` Christian Ehrhardt
2025-09-03 11:26             ` Hector Cao
2025-09-04 14:35             ` Hector Cao
2025-09-10 11:57               ` [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities Hector Cao
2025-09-10 11:57                 ` [PATCH 1/2] target/i386: add compatibility property for arch_capabilities Hector Cao
2025-09-16  8:12                   ` Daniel P. Berrangé
2025-09-16  8:28                     ` Hector Cao
2025-09-23  7:25                       ` Christian Ehrhardt
2025-09-10 11:57                 ` [PATCH 2/2] target/i386: add compatibility property for pdcm feature Hector Cao
2025-09-23  7:53                 ` [RFC PATCH 0/2] Fix cross migration issue with missing features: pdcm, arch-capabilities Paolo Bonzini
2025-09-23 10:08                   ` Hector Cao
2025-09-23 10:15                     ` Paolo Bonzini
2025-09-23 10:31                       ` Hector Cao
  -- strict thread matches above, loose matches on Subject: below --
2025-09-12 13:35 Hector Cao
