qemu-devel.nongnu.org archive mirror
* Qemu setting "-cpu host" seems broken with Windows vms
@ 2023-12-28 17:45 xtec
  2023-12-29 13:10 ` Stefan Hajnoczi
  2024-01-12 18:32 ` Daniel P. Berrangé
  0 siblings, 2 replies; 5+ messages in thread
From: xtec @ 2023-12-28 17:45 UTC (permalink / raw)
  To: qemu-devel

I noticed something weird when using "-cpu host" with Windows VMs.
Note that I always use it along with ",hv_passthrough".

First, performance: for some years now, from before QEMU 6.2 up to the
latest 8.2, Win10 and Win11 VMs have always run slower than expected. This
could be noticed by comparing boot/startup times between a VM and a
bare-metal installation, but I measured it in particular when installing
Windows cumulative updates through Windows Update. On a VM, from
downloading to finishing the reboot it always took circa 1.5 hours,
while it took just 40 minutes on bare metal.

Second, and more recently, the newer Windows 11 23H2 seems to have a big
problem with "-cpu host".
When trying to update from 22H2 to 23H2 I got either a black screen or a
BSOD after rebooting.
The same happened when trying to install 23H2 from scratch.
This was on QEMU 7.1 and 8.2.
After a long search, I finally found the cause, which also solved the
problem for me:
https://forum.proxmox.com/threads/new-windows-11-vm-fails-boot-after-update.137543/
I found similar problems, and the same solution, in other forums as well.

In my case, the physical host CPU is an 11th-gen Intel Core; I tried
libvirt's "virsh capabilities" to see which QEMU CPU model matched best,
and for some reason it gave Broadwell instead of the newer Skylake...
Anyway, I tried "-cpu <Broadwell_model>,hv_passthrough", and this
solved *both* problems: performance finally matched bare metal in all
aspects, and the Windows 23H2 problem was gone.

On IRC, it was suggested to start from "-cpu host" and disable CPU bits
one by one until finding the culprit, but I don't know how to do this...

Could someone look into this?
Thanks.



* Re: Qemu setting "-cpu host" seems broken with Windows vms
  2023-12-28 17:45 Qemu setting "-cpu host" seems broken with Windows vms xtec
@ 2023-12-29 13:10 ` Stefan Hajnoczi
  2024-01-16 17:56   ` Paolo Bonzini
  2024-01-12 18:32 ` Daniel P. Berrangé
  1 sibling, 1 reply; 5+ messages in thread
From: Stefan Hajnoczi @ 2023-12-29 13:10 UTC (permalink / raw)
  To: xtec; +Cc: qemu-devel, Paolo Bonzini

On Thu, 28 Dec 2023 at 17:21, <xtec@trimaso.com.mx> wrote:

CCing Paolo, the general x86 maintainer.

Stefan

> I noticed something weird when using "-cpu host" with Windows VMs.
> Note that I always use it along with ",hv_passthrough".
>
> First, performance: for some years now, from before QEMU 6.2 up to the
> latest 8.2, Win10 and Win11 VMs have always run slower than expected. This
> could be noticed by comparing boot/startup times between a VM and a
> bare-metal installation, but I measured it in particular when installing
> Windows cumulative updates through Windows Update. On a VM, from
> downloading to finishing the reboot it always took circa 1.5 hours,
> while it took just 40 minutes on bare metal.
>
> Second, and more recently, the newer Windows 11 23H2 seems to have a big
> problem with "-cpu host".
> When trying to update from 22H2 to 23H2 I got either a black screen or a
> BSOD after rebooting.
> The same happened when trying to install 23H2 from scratch.
> This was on QEMU 7.1 and 8.2.
> After a long search, I finally found the cause, which also solved the
> problem for me:
> https://forum.proxmox.com/threads/new-windows-11-vm-fails-boot-after-update.137543/
> I found similar problems, and the same solution, in other forums as well.
>
> In my case, the physical host CPU is an 11th-gen Intel Core; I tried
> libvirt's "virsh capabilities" to see which QEMU CPU model matched best,
> and for some reason it gave Broadwell instead of the newer Skylake...
> Anyway, I tried "-cpu <Broadwell_model>,hv_passthrough", and this
> solved *both* problems: performance finally matched bare metal in all
> aspects, and the Windows 23H2 problem was gone.
>
> On IRC, it was suggested to start from "-cpu host" and disable CPU bits
> one by one until finding the culprit, but I don't know how to do this...
>
> Could someone look into this?
> Thanks.
>



* Re: Qemu setting "-cpu host" seems broken with Windows vms
  2023-12-28 17:45 Qemu setting "-cpu host" seems broken with Windows vms xtec
  2023-12-29 13:10 ` Stefan Hajnoczi
@ 2024-01-12 18:32 ` Daniel P. Berrangé
  1 sibling, 0 replies; 5+ messages in thread
From: Daniel P. Berrangé @ 2024-01-12 18:32 UTC (permalink / raw)
  To: xtec; +Cc: qemu-devel

On Thu, Dec 28, 2023 at 11:45:18AM -0600, xtec@trimaso.com.mx wrote:
> I noticed something weird when using "-cpu host" with Windows VMs.
> Note that I always use it along with ",hv_passthrough".
> 
> First, performance: for some years now, from before QEMU 6.2 up to the
> latest 8.2, Win10 and Win11 VMs have always run slower than expected. This
> could be noticed by comparing boot/startup times between a VM and a
> bare-metal installation, but I measured it in particular when installing
> Windows cumulative updates through Windows Update. On a VM, from
> downloading to finishing the reboot it always took circa 1.5 hours, while
> it took just 40 minutes on bare metal.
> 
> Second, and more recently, the newer Windows 11 23H2 seems to have a big
> problem with "-cpu host".
> When trying to update from 22H2 to 23H2 I got either a black screen or a
> BSOD after rebooting.
> The same happened when trying to install 23H2 from scratch.
> This was on QEMU 7.1 and 8.2.
> After a long search, I finally found the cause, which also solved the
> problem for me:
> https://forum.proxmox.com/threads/new-windows-11-vm-fails-boot-after-update.137543/
> I found similar problems, and the same solution, in other forums as well.
> 
> In my case, the physical host CPU is an 11th-gen Intel Core; I tried
> libvirt's "virsh capabilities" to see which QEMU CPU model matched best,
> and for some reason it gave Broadwell instead of the newer Skylake...

Intel has many different variants of each named CPU generation, and
QEMU's CPU model only reflects one particular variant.  So it is
possible that you have a Skylake variant that lacks one feature flag
that QEMU's Skylake model has. This in turn causes libvirt to pick the
next best named model whose flags are all available, and in your case
libvirt decided Broadwell was best.
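
As a quick check, "virsh domcapabilities" lists which named CPU models
libvirt considers fully usable on your host, for example (the exact XML
output format may vary between libvirt versions):

   virsh domcapabilities | grep 'model usable'

Models reported with usable='no' are missing at least one feature on
your host.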

> Anyway, I tried "-cpu <Broadwell_model>,hv_passthrough", and this solved
> *both* problems: performance finally matched bare metal in all aspects,
> and the Windows 23H2 problem was gone.
> 
> On IRC, it was suggested to start from "-cpu host" and disable CPU bits
> one by one until finding the culprit, but I don't know how to do this...

So you need to figure out which bits are different between 'Broadwell' and
'host' for your machine.

Assuming you have qemu.git checked out, you want to run

   ./scripts/qmp/qmp-shell-wrap -p /usr/bin/qemu-system-x86_64 -display none -accel kvm

In the QMP shell, now run

   query-cpu-model-expansion type=full model={'name':'Broadwell'}

and save the list of features it reports. Then run

   query-cpu-model-expansion type=full model={'name':'host'}

and save the list of features it reports too.

Now diff the two feature lists.
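
If it helps, here is a rough, untested Python sketch that automates the
comparison by driving QEMU directly over QMP on stdio (it assumes
qemu-system-x86_64 and KVM are available, as in the command above):

   #!/usr/bin/env python3
   # Expand two CPU models via QMP and print the features that differ.
   import json, subprocess

   def expand(model):
       proc = subprocess.Popen(
           ["qemu-system-x86_64", "-display", "none", "-accel", "kvm",
            "-qmp", "stdio"],
           stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
       proc.stdout.readline()                  # discard the QMP greeting

       def cmd(obj):
           proc.stdin.write(json.dumps(obj) + "\n")
           proc.stdin.flush()
           while True:                         # skip asynchronous events
               reply = json.loads(proc.stdout.readline())
               if "return" in reply or "error" in reply:
                   return reply

       cmd({"execute": "qmp_capabilities"})    # enter command mode
       reply = cmd({"execute": "query-cpu-model-expansion",
                    "arguments": {"type": "full",
                                  "model": {"name": model}}})
       proc.kill()
       return reply["return"]["model"]["props"]

   broadwell = expand("Broadwell")
   host = expand("host")
   for feat in sorted(set(broadwell) | set(host)):
       if broadwell.get(feat) != host.get(feat):
           print(feat, "Broadwell:", broadwell.get(feat),
                 "host:", host.get(feat))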

If the diff shows that 'sse4a' was missing in Broadwell but present in
host, then try

   -cpu Broadwell,hv_passthrough,sse4a

Keep appending more features to -cpu, and if you're lucky you might
hit the one that triggers the problem.
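
Alternatively, you can go the other way: start from the full host model
and subtract the differing features one by one until the problem
disappears (the feature names here are just illustrative):

   -cpu host,hv_passthrough,-vmx,-avx512f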

Not every difference can be controlled via -cpu flags, though, so it is
possible there's something inherently different about the 'host' model
that triggers this problem.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: Qemu setting "-cpu host" seems broken with Windows vms
  2023-12-29 13:10 ` Stefan Hajnoczi
@ 2024-01-16 17:56   ` Paolo Bonzini
  2024-01-19  0:13     ` xtec
  0 siblings, 1 reply; 5+ messages in thread
From: Paolo Bonzini @ 2024-01-16 17:56 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: xtec, qemu-devel

On Fri, Dec 29, 2023 at 2:10 PM Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > First, performance: for some years now, from before QEMU 6.2 up to the
> > latest 8.2, Win10 and Win11 VMs have always run slower than expected. This
> > could be noticed by comparing boot/startup times between a VM and a
> > bare-metal installation, but I measured it in particular when installing
> > Windows cumulative updates through Windows Update. On a VM, from
> > downloading to finishing the reboot it always took circa 1.5 hours,
> > while it took just 40 minutes on bare metal.

One possibility is that you have Hyper-V enabled with -cpu host but
not with other CPU models. That's because "-cpu host" enables nested
virtualization.

Try "-cpu host,-vmx" and it should be clear if that's the case.

Based on the pastie that you prepared, that's the main difference
between -cpu host and -cpu Broadwell-noTSX-IBRS. Nothing else (see
list below) should have any substantial performance impact; even less
so should they make things worse.

Paolo

               "avx512-vp2intersect": true,
               "avx512-vpopcntdq": true,
               "avx512bitalg": true,
               "avx512bw": true,
               "avx512cd": true,
               "avx512dq": true,
               "avx512f": true,
               "avx512ifma": true,
               "avx512vbmi": true,
               "avx512vbmi2": true,
               "avx512vl": true,
               "avx512vnni": true,
               "full-width-write": true,
               "gfni": true,
               "vaes": true,
               "vpclmulqdq": true,

               "clflushopt": true,
               "clwb": true,

               "fsrm": true,

               "host-cache-info": false,
               "host-phys-bits": true,

               "amd-ssbd": true,
               "amd-stibp": true,
               "arch-capabilities": true,
               "ibpb": true,
               "ibrs": true,
               "ibrs-all": true,
               "ssbd": true,
               "stibp": true,

               "kvm-pv-ipi": true,
               "kvm-pv-sched-yield": true,
               "kvm-pv-tlb-flush": true,
               "kvm-pv-unhalt": true,

               "lmce": true,
               "md-clear": true,
               "mds-no": true,
               "movdir64b": true,
               "movdiri": true,
               "pdcm": true,
               "pdpe1gb": true,

               "pdcm": false,
               "pdpe1gb": false,
               "pku": true,
               "pmu": true,
               "pschange-mc-no": true,
               "rdctl-no": true,
               "rdpid": true,
               "sha-ni": true,
               "ss": true,
               "tsc-adjust": true,
               "umip": true,
               "vmx": true,
               "xgetbv1": true,
               "xsavec": true,
               "xsaves": true,

(skipped everything vmx-related, since they don't matter with vmx
itself being false)




* Re: Qemu setting "-cpu host" seems broken with Windows vms
  2024-01-16 17:56   ` Paolo Bonzini
@ 2024-01-19  0:13     ` xtec
  0 siblings, 0 replies; 5+ messages in thread
From: xtec @ 2024-01-19  0:13 UTC (permalink / raw)
  To: Qemu Devel; +Cc: pbonzini

So I finally tested with this:
-cpu host,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,-vmx
The Hyper-V enlightenments used are the ones generally recommended for
Windows VMs.

Overall it really seemed to work: performance was like bare metal, and
the second problem (the BSOD) was also gone (to verify this I had to
install another Win11 23H2 VM from scratch and run Windows Update).

Also, unlike before, the Windows "suspend" functions appeared; and most
surprisingly, they actually worked.
I tried suspending, and it worked. I even tried enabling the infamous
"fast boot" and shutting down the VM. It took a little longer to shut
down, but when powering the VM on again, it did restore.
Though I did each test only once...

I did these last tests because in many QEMU/KVM guides around the
internet I had read that, at least with Windows VMs, it was very
important to disable fast boot because QEMU/KVM did not support it and
it led to ugly, buggy behavior.
So, did this change over time?

This apparently implies that the culprit was the "vmx" CPU bit, which,
as already explained, is the one enabling nested virtualization inside
the VM.
Overall, what do you think? Could this qualify as some kind of bug? Is
nested virtualization often used in QEMU/KVM VMs?
Could it be that Win11 23H2 has problems with this CPU bit?

Oh, and based on the results, I have a few additional questions:

If I wanted to do live migration, would it be a matter of just switching
"host" for "Skylake" or any other "fixed" QEMU CPU model, and then
checking the VM still boots correctly?

When trying "-cpu host,hv_passthrough", I did notice considerably better
overall performance than when using
"hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time"; yet it was still
noticeably short of bare metal. Why was this?

In another forum, I read that a guy had no problems updating from Win11
22H2 to 23H2 on QEMU/KVM, though he used libvirt.
Among his CPU settings, he did not use CPU passthrough but a named QEMU
CPU model; I cannot remember which one, except that it was a Xeon server
model. Moreover, among the CPU bits used, there was vmx=on.
If the culprit here was apparently this vmx bit, how is it that for
others it had no consequence? The only difference was using a "server"
CPU model instead of a "client" one. Though they did not talk about
performance...

Thanks.


On 2024-01-16 11:56, Paolo Bonzini wrote:
> One possibility is that you have Hyper-V enabled with -cpu host but
> not with other CPU models. That's because "-cpu host" enables nested
> virtualization.
> 
> Try "-cpu host,-vmx" and it should be clear if that's the case.
> 
> Based on the pastie that you prepared, that's the main difference
> between -cpu host and -cpu Broadwell-noTSX-IBRS. Nothing else (see
> list below) should have any substantial performance impact; even less
> so should they make things worse.
> 
> Paolo
> 
>                "avx512-vp2intersect": true,
>                "avx512-vpopcntdq": true,
>                "avx512bitalg": true,
>                "avx512bw": true,
>                "avx512cd": true,
>                "avx512dq": true,
>                "avx512f": true,
>                "avx512ifma": true,
>                "avx512vbmi": true,
>                "avx512vbmi2": true,
>                "avx512vl": true,
>                "avx512vnni": true,
>                "full-width-write": true,
>                "gfni": true,
>                "vaes": true,
>                "vpclmulqdq": true,
> 
>                "clflushopt": true,
>                "clwb": true,
> 
>                "fsrm": true,
> 
>                "host-cache-info": false,
>                "host-phys-bits": true,
> 
>                "amd-ssbd": true,
>                "amd-stibp": true,
>                "arch-capabilities": true,
>                "ibpb": true,
>                "ibrs": true,
>                "ibrs-all": true,
>                "ssbd": true,
>                "stibp": true,
> 
>                "kvm-pv-ipi": true,
>                "kvm-pv-sched-yield": true,
>                "kvm-pv-tlb-flush": true,
>                "kvm-pv-unhalt": true,
> 
>                "lmce": true,
>                "md-clear": true,
>                "mds-no": true,
>                "movdir64b": true,
>                "movdiri": true,
>                "pdcm": true,
>                "pdpe1gb": true,
> 
>                "pdcm": false,
>                "pdpe1gb": false,
>                "pku": true,
>                "pmu": true,
>                "pschange-mc-no": true,
>                "rdctl-no": true,
>                "rdpid": true,
>                "sha-ni": true,
>                "ss": true,
>                "tsc-adjust": true,
>                "umip": true,
>                "vmx": true,
>                "xgetbv1": true,
>                "xsavec": true,
>                "xsaves": true,
> 
> (skipped everything vmx-related, since they don't matter with vmx
> itself being false)


