* Qemu setting "-cpu host" seems broken with Windows vms @ 2023-12-28 17:45 xtec 2023-12-29 13:10 ` Stefan Hajnoczi 2024-01-12 18:32 ` Daniel P. Berrangé 0 siblings, 2 replies; 5+ messages in thread From: xtec @ 2023-12-28 17:45 UTC (permalink / raw) To: qemu-devel I noticed something weird when using "-cpu host" with Windows vms. First, I always use it along with ",hv_passthrough" as well. First, performance: since some years ago, since prior to qemu 6.2 until latest 8.2, win10 and win11 vms always worked slower than expected. This could be noticed by comparing booting/starting times between vm and a bare metal installation, but I particularly measured it when installing windows cumulative updates through windows update. On vm, from downloading to finishing rebooting it always took 1.5 circa 1.5 hours, while just 40 minutes on bare metal. Second, and more recently, newer windows 11 23h2 seems to have big problem with "-cpu host". When trying to update from 22h2 to 23h2 I got either black screen or bsod after trying to reboot. Also, same result when trying to install 23h2 from scratch. This on qemu 7.1 and 8.2. Did a long search, and finally found the cause which also solved the problem for me: https://forum.proxmox.com/threads/new-windows-11-vm-fails-boot-after-update.137543/ I found similar problems and similar solution in other forums as well. So in my case, physical host cpu is intel core 11th gen; tried using libvirt's "virsh capabilities" to see which qemu cpu model better matched, and for some reason it gave Broadwell instead of newer Skylake... Anyway, tried with "-cpu <Broadwell_model>,hv_passthrough", and this solved *both* problems: performance finally matched bare metal in all aspects, and the windows 23h2 problem was finally gone. On IRC, it was suggested to try "-cpu host" and "disabling CPU bits" one by one until finding the culprit. But I don't know how to do this... Could someone look into this? Thanks. ^ permalink raw reply [flat|nested] 5+ messages in thread
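[For reference: disabling CPU bits one by one uses QEMU's standard
feature-masking syntax, where a leading "-" (or an explicit "=off")
removes a single flag from whatever model "-cpu" starts from. A
minimal sketch follows; everything apart from the -cpu option (memory,
smp count, disk image name) is an illustrative placeholder, not taken
from this thread:]

  # Start from "host" and mask one suspect feature per run, e.g. vmx:
  qemu-system-x86_64 -accel kvm \
      -cpu host,hv_passthrough,-vmx \
      -m 8G -smp 4 \
      -drive file=win11.qcow2,if=virtio

  # Equivalent spelling of the same mask:
  #   -cpu host,hv_passthrough,vmx=off

[Repeating with a different "-<flag>" each run, until the slowdown or
BSOD disappears, identifies the culprit bit.]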
* Re: Qemu setting "-cpu host" seems broken with Windows vms

From: Stefan Hajnoczi @ 2023-12-29 13:10 UTC
To: xtec
Cc: qemu-devel, Paolo Bonzini

On Thu, 28 Dec 2023 at 17:21, <xtec@trimaso.com.mx> wrote:

CCing Paolo, the general x86 maintainer.

Stefan

> I noticed something weird when using "-cpu host" with Windows vms.
> Note that I always use it along with ",hv_passthrough".
>
> First, performance: for some years now, from before qemu 6.2 up to
> the latest 8.2, win10 and win11 vms have always run slower than
> expected. This could be noticed by comparing boot/startup times
> between a vm and a bare-metal installation, but I particularly
> measured it when installing Windows cumulative updates through
> Windows Update. On a vm, from downloading to finishing rebooting, it
> always took circa 1.5 hours, versus just 40 minutes on bare metal.
>
> Second, and more recently, the newer Windows 11 23h2 seems to have a
> big problem with "-cpu host". When trying to update from 22h2 to
> 23h2 I got either a black screen or a BSOD after trying to reboot.
> The same happened when trying to install 23h2 from scratch. This was
> on qemu 7.1 and 8.2. After a long search, I finally found the cause,
> which also gave me a workaround:
> https://forum.proxmox.com/threads/new-windows-11-vm-fails-boot-after-update.137543/
> I found similar problems, and a similar solution, in other forums as
> well.
>
> In my case, the physical host cpu is an Intel Core 11th gen. I tried
> libvirt's "virsh capabilities" to see which qemu cpu model matched
> best, and for some reason it gave Broadwell instead of the newer
> Skylake... Anyway, I tried "-cpu <Broadwell_model>,hv_passthrough",
> and this solved *both* problems: performance finally matched bare
> metal in all aspects, and the Windows 23h2 problem was gone.
>
> On IRC, it was suggested to start from "-cpu host" and disable CPU
> bits one by one until finding the culprit. But I don't know how to
> do this...
>
> Could someone look into this?
> Thanks.
* Re: Qemu setting "-cpu host" seems broken with Windows vms

From: Paolo Bonzini @ 2024-01-16 17:56 UTC
To: Stefan Hajnoczi
Cc: xtec, qemu-devel

On Fri, Dec 29, 2023 at 2:10 PM Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > First, performance: for some years now, from before qemu 6.2 up to
> > the latest 8.2, win10 and win11 vms have always run slower than
> > expected. This could be noticed by comparing boot/startup times
> > between a vm and a bare-metal installation, but I particularly
> > measured it when installing Windows cumulative updates through
> > Windows Update. On a vm, from downloading to finishing rebooting,
> > it always took circa 1.5 hours, versus just 40 minutes on bare
> > metal.

One possibility is that you have Hyper-V enabled inside the guest with
-cpu host but not with other CPU models. That's because "-cpu host"
enables nested virtualization.

Try "-cpu host,-vmx" and it should be clear if that's the case.

Based on the pastie that you prepared, that's the main difference
between -cpu host and -cpu Broadwell-noTSX-IBRS. Nothing else (see the
list below) should have any substantial performance impact, much less
make things worse.

Paolo

"avx512-vp2intersect": true,
"avx512-vpopcntdq": true,
"avx512bitalg": true,
"avx512bw": true,
"avx512cd": true,
"avx512dq": true,
"avx512f": true,
"avx512ifma": true,
"avx512vbmi": true,
"avx512vbmi2": true,
"avx512vl": true,
"avx512vnni": true,
"full-width-write": true,
"gfni": true,
"vaes": true,
"vpclmulqdq": true,

"clflushopt": true,
"clwb": true,

"fsrm": true,

"host-cache-info": false,
"host-phys-bits": true,

"amd-ssbd": true,
"amd-stibp": true,
"arch-capabilities": true,
"ibpb": true,
"ibrs": true,
"ibrs-all": true,
"ssbd": true,
"stibp": true,

"kvm-pv-ipi": true,
"kvm-pv-sched-yield": true,
"kvm-pv-tlb-flush": true,
"kvm-pv-unhalt": true,

"lmce": true,
"md-clear": true,
"mds-no": true,
"movdir64b": true,
"movdiri": true,
"pdcm": true,
"pdpe1gb": true,

"pdcm": false,
"pdpe1gb": false,
"pku": true,
"pmu": true,
"pschange-mc-no": true,
"rdctl-no": true,
"rdpid": true,
"sha-ni": true,
"ss": true,
"tsc-adjust": true,
"umip": true,
"vmx": true,
"xgetbv1": true,
"xsavec": true,
"xsaves": true,

(skipped everything vmx-related, since they don't matter with vmx
itself being false)
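[As a quick host-side check of Paolo's hypothesis, one can verify
whether the KVM module even allows nested virtualization; if it does,
"-cpu host" exposes the vmx bit to the guest and Windows may silently
turn on Hyper-V features such as virtualization-based security. A
sketch for an Intel host, using the standard kvm_intel sysfs
parameter:]

  # "Y" (or "1") means nested VMX is enabled, so a guest that sees the
  # vmx bit can run its own hypervisor (e.g. Windows VBS/Hyper-V):
  cat /sys/module/kvm_intel/parameters/nested

[Inside the guest, msinfo32 reports whether virtualization-based
security is actually running.]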
* Re: Qemu setting "-cpu host" seems broken with Windows vms

From: xtec @ 2024-01-19 0:13 UTC
To: Qemu Devel
Cc: pbonzini

So I finally tested with this:

  -cpu host,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,-vmx

The Hyper-V enlightenments used are the ones generally recommended for
Windows vms. Overall it really seemed to work: performance was like
bare metal, and the second problem (the BSOD) was also gone. To test
this I had to install another Win11 23h2 vm from scratch and run
Windows updates.

Also, unlike before, the Windows "suspend" functions appeared, and,
most surprising, they actually worked. I tried suspending; it worked.
I even tried enabling the infamous "fast boot" and shutting down the
vm. As a result, it took a little longer to shut down, but when
powering the vm on again, it did restore. Though I did each test only
once... I ran these last tests because many QEMU/KVM guides around the
internet say that, at least with Windows vms, it is very important to
disable fast boot because QEMU/KVM did not support it and it led to
ugly, buggy behavior. So, did this change over time?

This apparently implies that the culprit was the "vmx" CPU bit, which,
as already explained, is the one enabling nested virtualization inside
the vm.

Overall, what do you think? Could this qualify as a bug? Is nested
virtualization often used in QEMU/KVM vms? Could it be that Win11 23h2
has problems with this CPU bit?

Oh, and based on the results, I have a few additional questions:

If I wanted to do live migration, would it be a matter of just
switching "host" for "Skylake" or any other "fixed" QEMU CPU model,
then checking that the vm still boots correctly?

When trying "-cpu host,hv_passthrough", I did notice a considerable
improvement in overall performance over using
"hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time", yet it was still
noticeably not like bare metal. Why was this?

In another forum, I read that someone did not have problems updating
from Win11 22h2 to 23h2 on QEMU/KVM, though he used libvirt. Among his
CPU settings, he did not use CPU passthrough but a QEMU CPU model; I
cannot remember which one, except that it was a Xeon server model.
Moreover, among the CPU bits used, there was vmx=on. If the culprit
here was apparently this vmx bit, how is it that for others it had no
consequence? The only difference was using a "server" CPU model
instead of a "client" one. Though they did not talk about
performance...

Thanks.

On 2024-01-16 11:56, Paolo Bonzini wrote:
> One possibility is that you have Hyper-V enabled inside the guest
> with -cpu host but not with other CPU models. That's because "-cpu
> host" enables nested virtualization.
>
> Try "-cpu host,-vmx" and it should be clear if that's the case.
>
> Based on the pastie that you prepared, that's the main difference
> between -cpu host and -cpu Broadwell-noTSX-IBRS. Nothing else (see
> the list below) should have any substantial performance impact, much
> less make things worse.
>
> Paolo
>
> "avx512-vp2intersect": true,
> "avx512-vpopcntdq": true,
> "avx512bitalg": true,
> "avx512bw": true,
> "avx512cd": true,
> "avx512dq": true,
> "avx512f": true,
> "avx512ifma": true,
> "avx512vbmi": true,
> "avx512vbmi2": true,
> "avx512vl": true,
> "avx512vnni": true,
> "full-width-write": true,
> "gfni": true,
> "vaes": true,
> "vpclmulqdq": true,
>
> "clflushopt": true,
> "clwb": true,
>
> "fsrm": true,
>
> "host-cache-info": false,
> "host-phys-bits": true,
>
> "amd-ssbd": true,
> "amd-stibp": true,
> "arch-capabilities": true,
> "ibpb": true,
> "ibrs": true,
> "ibrs-all": true,
> "ssbd": true,
> "stibp": true,
>
> "kvm-pv-ipi": true,
> "kvm-pv-sched-yield": true,
> "kvm-pv-tlb-flush": true,
> "kvm-pv-unhalt": true,
>
> "lmce": true,
> "md-clear": true,
> "mds-no": true,
> "movdir64b": true,
> "movdiri": true,
> "pdcm": true,
> "pdpe1gb": true,
>
> "pdcm": false,
> "pdpe1gb": false,
> "pku": true,
> "pmu": true,
> "pschange-mc-no": true,
> "rdctl-no": true,
> "rdpid": true,
> "sha-ni": true,
> "ss": true,
> "tsc-adjust": true,
> "umip": true,
> "vmx": true,
> "xgetbv1": true,
> "xsavec": true,
> "xsaves": true,
>
> (skipped everything vmx-related, since they don't matter with vmx
> itself being false)
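[On the live-migration question above: the usual approach is indeed a
fixed named model plus explicit Hyper-V enlightenments, which keeps
the guest-visible CPUID identical on every host involved, as migration
requires. A sketch reusing the model name Paolo compared against; the
model is only an example and must be one all of your hosts support,
and the non-cpu options are placeholders:]

  # Fixed model + explicit enlightenments instead of "-cpu host":
  qemu-system-x86_64 -accel kvm \
      -cpu Broadwell-noTSX-IBRS,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time \
      -m 8G -smp 4 \
      -drive file=win11.qcow2,if=virtio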
* Re: Qemu setting "-cpu host" seems broken with Windows vms

From: Daniel P. Berrangé @ 2024-01-12 18:32 UTC
To: xtec
Cc: qemu-devel

On Thu, Dec 28, 2023 at 11:45:18AM -0600, xtec@trimaso.com.mx wrote:
> I noticed something weird when using "-cpu host" with Windows vms.
> Note that I always use it along with ",hv_passthrough".
>
> First, performance: for some years now, from before qemu 6.2 up to
> the latest 8.2, win10 and win11 vms have always run slower than
> expected. This could be noticed by comparing boot/startup times
> between a vm and a bare-metal installation, but I particularly
> measured it when installing Windows cumulative updates through
> Windows Update. On a vm, from downloading to finishing rebooting, it
> always took circa 1.5 hours, versus just 40 minutes on bare metal.
>
> Second, and more recently, the newer Windows 11 23h2 seems to have a
> big problem with "-cpu host". When trying to update from 22h2 to
> 23h2 I got either a black screen or a BSOD after trying to reboot.
> The same happened when trying to install 23h2 from scratch. This was
> on qemu 7.1 and 8.2. After a long search, I finally found the cause,
> which also gave me a workaround:
> https://forum.proxmox.com/threads/new-windows-11-vm-fails-boot-after-update.137543/
> I found similar problems, and a similar solution, in other forums as
> well.
>
> In my case, the physical host cpu is an Intel Core 11th gen. I tried
> libvirt's "virsh capabilities" to see which qemu cpu model matched
> best, and for some reason it gave Broadwell instead of the newer
> Skylake...

Intel has many different variants of each named CPU generation, and
QEMU's CPU model only reflects one particular variant. So it is
possible that you have a Skylake variant that lacks a feature flag
that QEMU's Skylake model has. This in turn causes libvirt to find the
next best named model with all flags available, and in your case
libvirt decided Broadwell was best.

> Anyway, I tried "-cpu <Broadwell_model>,hv_passthrough", and this
> solved *both* problems: performance finally matched bare metal in
> all aspects, and the Windows 23h2 problem was gone.
>
> On IRC, it was suggested to start from "-cpu host" and disable CPU
> bits one by one until finding the culprit. But I don't know how to
> do this...

So you need to figure out which bits are different between 'Broadwell'
and 'host' for your machine. Assuming you have qemu.git checked out,
you want to run:

  ./scripts/qmp/qmp-shell-wrap -p /usr/bin/qemu-system-x86_64 \
      -display none -accel kvm

In the QMP shell, now run:

  query-cpu-model-expansion type=full model={'name':'Broadwell'}

and save the list of features it reports. Then run:

  query-cpu-model-expansion type=full model={'name':'host'}

and save the list of features it reports too. Now diff the two feature
lists. If the diff shows that 'sse4a' was missing in Broadwell but
present in host, then try:

  -cpu Broadwell,hv_passthrough,sse4a

Keep appending more features to -cpu, and if you're lucky you might
hit one that triggers the problem. Not every difference can be
controlled via -cpu flags, though, so it is possible there's something
inherently different about the 'host' model that triggers this
problem.
With regards,
Daniel

-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org       -o-           https://fstop138.berrange.com    :|
|: https://entangle-photo.org   -o-   https://www.instagram.com/dberrange   :|
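[A sketch of the diffing step Daniel describes, assuming each QMP
reply was captured as plain JSON (for example by talking QMP directly
rather than through qmp-shell) and that jq is available; both the
filenames and the use of jq are assumptions, not part of the original
instructions:]

  # query-cpu-model-expansion replies look like
  #   {"return": {"model": {"name": ..., "props": {...}}}}
  # so extract each property map with sorted keys, then diff:
  jq -S '.return.model.props' broadwell.json > broadwell-props.json
  jq -S '.return.model.props' host.json > host-props.json
  diff broadwell-props.json host-props.json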