* [Bug 219085] kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0
2024-07-22 18:50 [Bug 219085] New: kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0 bugzilla-daemon
@ 2024-07-22 18:51 ` bugzilla-daemon
2024-07-22 19:13 ` bugzilla-daemon
` (5 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2024-07-22 18:51 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=219085
ununpta@mailto.plus changed:
What |Removed |Added
----------------------------------------------------------------------------
Kernel Version| |6.10.0
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
* [Bug 219085] kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0
2024-07-22 18:50 [Bug 219085] New: kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0 bugzilla-daemon
2024-07-22 18:51 ` [Bug 219085] " bugzilla-daemon
@ 2024-07-22 19:13 ` bugzilla-daemon
2024-07-22 23:21 ` Sean Christopherson
2024-07-22 23:21 ` bugzilla-daemon
` (4 subsequent siblings)
6 siblings, 1 reply; 10+ messages in thread
From: bugzilla-daemon @ 2024-07-22 19:13 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=219085
--- Comment #1 from ununpta@mailto.plus ---
Command I used on L0 AMD Ryzen:
qemu-system-x86_64.exe -m 4096 -machine q35 -accel whpx -smp 1 -cpu
Opteron_G5,check,+svm -hda c:\debian.qcow2
It's reproducible in 100% of cases.
* Re: [Bug 219085] kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0
2024-07-22 19:13 ` bugzilla-daemon
@ 2024-07-22 23:21 ` Sean Christopherson
0 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2024-07-22 23:21 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: kvm
On Mon, Jul 22, 2024, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219085
>
> --- Comment #1 from ununpta@mailto.plus ---
> Command I used on L0 AMD Ryzen:
> qemu-system-x86_64.exe -m 4096 -machine q35 -accel whpx -smp 1 -cpu
This is likely an issue in the L0 hypervisor, which in this case is Hyper-V. KVM
(L1) hits a #GP when trying to enable EFER.SVME, which leads to the #UD on VMSAVE
(SVM isn't enabled).
[ 355.714362] unchecked MSR access error: WRMSR to 0xc0000080 (tried to write 0x0000000000001d01) at rIP: 0xffffffff9228a274 (native_write_msr+0x4/0x20)
Do you see the same behavior on other kernel (L1) versions? Have you changed
any other components (especially in L0)?
> Opteron_G5,check,+svm -hda c:\debian.qcow2
>
> It's reproducible in 100% cases
* [Bug 219085] kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0
2024-07-22 18:50 [Bug 219085] New: kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0 bugzilla-daemon
2024-07-22 18:51 ` [Bug 219085] " bugzilla-daemon
2024-07-22 19:13 ` bugzilla-daemon
@ 2024-07-22 23:21 ` bugzilla-daemon
2024-07-23 18:53 ` bugzilla-daemon
` (3 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2024-07-22 23:21 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=219085
--- Comment #2 from Sean Christopherson (seanjc@google.com) ---
On Mon, Jul 22, 2024, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219085
>
> --- Comment #1 from ununpta@mailto.plus ---
> Command I used on L0 AMD Ryzen:
> qemu-system-x86_64.exe -m 4096 -machine q35 -accel whpx -smp 1 -cpu
This is likely an issue in the L0 hypervisor, which in this case is Hyper-V. KVM
(L1) hits a #GP when trying to enable EFER.SVME, which leads to the #UD on VMSAVE
(SVM isn't enabled).
[ 355.714362] unchecked MSR access error: WRMSR to 0xc0000080 (tried to write 0x0000000000001d01) at rIP: 0xffffffff9228a274 (native_write_msr+0x4/0x20)
Do you see the same behavior on other kernel (L1) versions? Have you changed
any other components (especially in L0)?
> Opteron_G5,check,+svm -hda c:\debian.qcow2
>
> It's reproducible in 100% cases
* [Bug 219085] kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0
2024-07-22 18:50 [Bug 219085] New: kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0 bugzilla-daemon
` (2 preceding siblings ...)
2024-07-22 23:21 ` bugzilla-daemon
@ 2024-07-23 18:53 ` bugzilla-daemon
2024-07-23 19:13 ` Sean Christopherson
2024-07-23 19:13 ` bugzilla-daemon
` (2 subsequent siblings)
6 siblings, 1 reply; 10+ messages in thread
From: bugzilla-daemon @ 2024-07-23 18:53 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=219085
--- Comment #3 from ununpta@mailto.plus ---
> Do you see the same behavior on other kernel (L1) versions? Have you
> changed any other components (especially in L0)?
Thank you for your help.
What I tried:
* Opened the Hyper-V Manager built into Windows and created the "Ubuntu 22.04
LTS" VM that it offers by default.
* Opened a PowerShell console and ran `Set-VMProcessor -VMName "Ubuntu 22.04 LTS"
-ExposeVirtualizationExtensions $true` to allow nested virtualization in
Hyper-V.
I should note, though, that even without `ExposeVirtualizationExtensions
$true`, KVM inside the Hyper-V VM didn't crash as it did under qemu. Bash just
printed a warning that nested virtualization is restricted.
* Booted into "Ubuntu 22.04 LTS", installed qemu, and ran `qemu-system-x86_64
-accel kvm` successfully: the BIOS screen came up.
The default kernel was vmlinuz-5.15.0-27-generic. After launching qemu, the only
kvm-related messages were:
[2.485820] kvm: Nested Virtualization enabled
[2.485822] SVM: kvm: Nested Paging enabled
[2.485823] SVM: kvm: Hyper-V enlightened NPT TLB flush enabled
[2.485824] SVM: kvm: Hyper-V Direct TLB flush enabled
[2.485828] SVM: Virtual VMLOAD VMSAVE supported
Then I compiled the latest kernel and installed it; the KVM-accelerated qemu
BIOS boot succeeded in the same way.
With vmlinuz-6.10.0, the only kvm-related messages after launching qemu are:
[1.701988] kvm_amd: TSC scaling supported
[1.701992] kvm_amd: Nested Virtualization enabled
[1.701993] kvm_amd: Nested Paging enabled
[1.701996] kvm_amd: kvm_amd: Hyper-V enlightened NPT TLB flush enabled
[1.701997] kvm_amd: kvm_amd: Hyper-V Direct TLB flush enabled
[1.701999] kvm_amd: Virtual VMLOAD VMSAVE supported
[1.702000] kvm_amd: PMU virtualization is disabled
I still have to work out how to get the equivalent of `Set-VMProcessor -VMName
"Ubuntu 22.04 LTS" -ExposeVirtualizationExtensions $true` for third-party
software, not only for machines created by Hyper-V Manager. Maybe qemu has to
be run with admin privileges as well.
I also saw a claim from Peter Maydell, qemu developer, who had said this about
qemu command line parameter `-cpu _processor_type_`:
> using a specific cpu type will only work with KVM if the host CPU really is
> that exact CPU type, otherwise, use "-cpu host" or "-cpu max".
> This is a restriction in the kernel's KVM handling, and not something that
> can be worked around in the QEMU side.
Per https://gitlab.com/qemu-project/qemu/-/issues/239
I was somewhat confused by this claim because
> --- Comment #1 from ununpta@mailto.plus ---
> Command I used on L0 AMD Ryzen:
> qemu-system-x86_64.exe -m 4096 -machine q35 -accel whpx -smp 1 -cpu
> Opteron_G5
Let me ask you a few questions.
Q1: Can one specify an older CPU model (one that still supports SVM) on the
qemu command line, rather than the actual bare-metal one, for nested
virtualization, or will KVM crash due to a restriction in the kernel's KVM
handling?
Q2: Is there a command on a bare kernel/KVM console to find out whether the
EFER.SVME bit is writable? If not,
Q3: Can you recommend a package that can find this out?
* Re: [Bug 219085] kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0
2024-07-23 18:53 ` bugzilla-daemon
@ 2024-07-23 19:13 ` Sean Christopherson
0 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2024-07-23 19:13 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: kvm
On Tue, Jul 23, 2024, bugzilla-daemon@kernel.org wrote:
> I also saw a claim from Peter Maydell, qemu developer, who had said this about
> qemu command line parameter `-cpu _processor_type_`:
> > using a specific cpu type will only work with KVM if the host CPU really is
> > that exact CPU type, otherwise, use "-cpu host" or "-cpu max".
This generally isn't true. KVM is very capable of running older vCPU models on
newer hardware. What won't work (at least, not well) is cross-vendor virtualization,
i.e. advertising AMD on Intel and vice versa, but that's not what you're doing.
> > This is a restriction in the kernel's KVM handling, and not something that
> > can be worked around in the QEMU side.
> Per https://gitlab.com/qemu-project/qemu/-/issues/239
>
> I was somewhat confused by this claim because
> > --- Comment #1 from ununpta@mailto.plus ---
> > Command I used on L0 AMD Ryzen:
> > qemu-system-x86_64.exe -m 4096 -machine q35 -accel whpx -smp 1 -cpu
> > Opteron_G5
>
> Let me ask you a few questions.
> Q1: Can one specify an older CPU model (one that still supports SVM) on the
> qemu command line, rather than the actual bare-metal one, for nested
> virtualization, or will KVM crash due to a restriction in the kernel's KVM
> handling?
Yes. There might be caveats, but AFAIK, QEMU's predefined vCPU models should
always work. If it doesn't work, and you have decent evidence that it's a KVM
problem, definitely feel free to file a KVM bug.
> Q2: Is there a command on a bare kernel/KVM console to find out whether the
> EFER.SVME bit is writable? If not,
grep -q svm /proc/cpuinfo
SVM can be disabled by firmware via MSR_VM_CR (0xc0010114) even if SVM is reported
in raw CPUID, but the kernel accounts for that and clears the "svm" flag from the
CPU data that's reported in /proc/cpuinfo.
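As a hedged illustration of the check above, here is a minimal Python sketch that looks for "svm" as a whole token on the "flags" line instead of as a substring anywhere in the file. The helper names `cpu_flags` and `has_svm` are made up for this example and are not part of any kernel or QEMU tooling.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_svm(cpuinfo_text: str) -> bool:
    """True if the kernel reports the exact 'svm' flag (hypothetical helper)."""
    return "svm" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("svm reported by kernel:", has_svm(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

Matching whole tokens means a related flag such as svm_lock does not count as "svm" by accident.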
> Q3: Can you recommend a package that can find this out?
Sorry, I don't follow this question.
* [Bug 219085] kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0
2024-07-22 18:50 [Bug 219085] New: kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0 bugzilla-daemon
` (3 preceding siblings ...)
2024-07-23 18:53 ` bugzilla-daemon
@ 2024-07-23 19:13 ` bugzilla-daemon
2024-07-24 19:15 ` bugzilla-daemon
2024-08-12 7:44 ` bugzilla-daemon
6 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2024-07-23 19:13 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=219085
--- Comment #4 from Sean Christopherson (seanjc@google.com) ---
On Tue, Jul 23, 2024, bugzilla-daemon@kernel.org wrote:
> I also saw a claim from Peter Maydell, qemu developer, who had said this about
> qemu command line parameter `-cpu _processor_type_`:
> > using a specific cpu type will only work with KVM if the host CPU really is
> > that exact CPU type, otherwise, use "-cpu host" or "-cpu max".
This generally isn't true. KVM is very capable of running older vCPU models on
newer hardware. What won't work (at least, not well) is cross-vendor virtualization,
i.e. advertising AMD on Intel and vice versa, but that's not what you're doing.
> > This is a restriction in the kernel's KVM handling, and not something that
> > can be worked around in the QEMU side.
> Per https://gitlab.com/qemu-project/qemu/-/issues/239
>
> I was somewhat confused by this claim because
> > --- Comment #1 from ununpta@mailto.plus ---
> > Command I used on L0 AMD Ryzen:
> > qemu-system-x86_64.exe -m 4096 -machine q35 -accel whpx -smp 1 -cpu
> > Opteron_G5
>
> Let me ask you a few questions.
> Q1: Can one specify an older CPU model (one that still supports SVM) on the
> qemu command line, rather than the actual bare-metal one, for nested
> virtualization, or will KVM crash due to a restriction in the kernel's KVM
> handling?
Yes. There might be caveats, but AFAIK, QEMU's predefined vCPU models should
always work. If it doesn't work, and you have decent evidence that it's a KVM
problem, definitely feel free to file a KVM bug.
> Q2: Is there a command on a bare kernel/KVM console to find out whether the
> EFER.SVME bit is writable? If not,
grep -q svm /proc/cpuinfo
SVM can be disabled by firmware via MSR_VM_CR (0xc0010114) even if SVM is reported
in raw CPUID, but the kernel accounts for that and clears the "svm" flag from the
CPU data that's reported in /proc/cpuinfo.
> Q3: Can you recommend a package that can find this out?
Sorry, I don't follow this question.
* [Bug 219085] kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0
2024-07-22 18:50 [Bug 219085] New: kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0 bugzilla-daemon
` (4 preceding siblings ...)
2024-07-23 19:13 ` bugzilla-daemon
@ 2024-07-24 19:15 ` bugzilla-daemon
2024-08-12 7:44 ` bugzilla-daemon
6 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2024-07-24 19:15 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=219085
--- Comment #5 from ununpta@mailto.plus ---
Sean, after looking at the AMD documentation quoted on
https://unix.stackexchange.com/questions/74376, I think it's clear why KVM in L1
crashes.
AMD says:
> Secure Virtual Machine Enable (SVME) Bit. Bit 12, read/write. Enables the SVM
> extensions. When this bit is zero, the SVM instructions cause #UD exceptions.
> EFER.SVME defaults to a reset value of zero.
> The effect of turning off EFER.SVME while a guest is running is undefined;
> therefore, the VMM should always prevent guests from writing EFER.
> SVM extensions can be disabled by setting VM_CR.SVME_DISABLE.
The command to read EFER (which contains SVME) is `sudo rdmsr 0xC0000080 #EFER`.
On both the non-working and the working machine this command returns d01, which
is 1101 0000 0001 in binary.
The crashing command from Comment #1 triggered `WRMSR to 0xc0000080 (tried to
write 0x0000000000001d01)`. 1d01 is 0001 1101 0000 0001 in binary; the leftmost
0001 is Bit 12.
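The bit arithmetic above can be double-checked with a short Python sketch. The bit positions are the architectural EFER ones (SCE 0, LME 8, LMA 10, NXE 11, SVME 12); the `EFER_BITS` table and `decode_efer` helper are names invented for this illustration, and only the bits relevant to this thread are listed.

```python
# Architectural EFER (MSR 0xC0000080) bits that matter in this thread.
EFER_BITS = {
    0: "SCE",    # SYSCALL/SYSRET enable
    8: "LME",    # long mode enable
    10: "LMA",   # long mode active
    11: "NXE",   # no-execute enable
    12: "SVME",  # SVM enable
}


def decode_efer(value: int) -> list[str]:
    """Return the names of the known EFER bits set in value, lowest bit first."""
    return [name for bit, name in sorted(EFER_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # "d01" is what rdmsr printed; "1d01" is what the WRMSR tried to write.
    for raw in ("d01", "1d01"):
        value = int(raw, 16)
        print(f"0x{value:04x} = {value:016b} -> {decode_efer(value)}")
```

The two values differ only in bit 12, so the failing write is exactly an attempt to turn on SVME on top of the current EFER contents.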
So the crashing command in L1 tries to set Bit 12 so that the SVM instructions
no longer raise #UD. A nested VM is impossible without Bit 12. Writing this bit
needs ring 0 privileges; guests cannot do this, but the VM manager can. The VM
manager hooks into the write operation, checks whether VM_CR.SVME_DISABLE == 0
and, if so, sets Bit 12 itself with L0 privileges, then returns success to the
guest.
This is what happens on Windows if KVM in L1 runs on top of the native Windows
Hyper-V manager as L0.
Qemu on Windows does not hook into the write, so the guest tries to write the
bit with user privileges, which of course fails.
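To make the interception described above concrete, here is a toy Python model of an L0 handler for a guest write to EFER. It is only a sketch of the reasoning in this comment, not the actual logic of Hyper-V, WHPX, qemu or KVM; `emulate_efer_write`, `GeneralProtectionFault` and the `nested_svm_exposed` parameter are hypothetical names. The bit positions (EFER.SVME is bit 12, the VM_CR SVM-disable bit is bit 4) are the architectural ones.

```python
EFER_SVME = 1 << 12          # EFER.SVME, bit 12
VM_CR_SVME_DISABLE = 1 << 4  # VM_CR SVM-disable bit, bit 4


class GeneralProtectionFault(Exception):
    """Stands in for the #GP the guest sees when the write is refused."""


def emulate_efer_write(new_value: int, vm_cr: int, nested_svm_exposed: bool) -> int:
    """Hypothetical L0 handler: return the EFER value the guest ends up with.

    Setting SVME is allowed only if L0 exposes SVM to this guest and the
    SVM-disable bit in VM_CR is clear; otherwise the guest gets a #GP.
    """
    if new_value & EFER_SVME:
        if not nested_svm_exposed or (vm_cr & VM_CR_SVME_DISABLE):
            raise GeneralProtectionFault("EFER.SVME is not available to this guest")
    return new_value


if __name__ == "__main__":
    # L0 exposes nested SVM: the write from the log goes through.
    print(hex(emulate_efer_write(0x1D01, vm_cr=0, nested_svm_exposed=True)))
    # L0 does not expose nested SVM: the same write faults.
    try:
        emulate_efer_write(0x1D01, vm_cr=0, nested_svm_exposed=False)
    except GeneralProtectionFault as exc:
        print("#GP:", exc)
```

In this model the second case corresponds to the "unchecked MSR access error" in the L1 log: the guest kernel's WRMSR is refused, SVME stays clear, and the later VMSAVE raises #UD.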
Questions are:
* How does the processor determine who is trying to write, L0 or L1?
* Does KVM's source code determine whether KVM itself is running on top of
Hyper-V or on top of another KVM?
* Should qemu hook into the `WRMSR to 0xc0000080 (tried to write
0x0000000000001d01)` coming from KVM when qemu is accelerated by Hyper-V in L0
and KVM is in L1?
> Sorry, I don't follow this question.
The commands I had been trying to describe turned out to be `sudo rdmsr
0xC0000080 #EFER` and `sudo rdmsr 0xC0010114 #VM_CR`. The package is called
msr-tools :)
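A small Python sketch for interpreting the VM_CR value that rdmsr prints, assuming rdmsr's default output of a bare hexadecimal string such as the "d01" seen for EFER. `msr_bit_set` and `svm_disabled_by_firmware` are hypothetical helper names; the VM_CR bit positions used (LOCK is bit 3, SVM disable is bit 4) are the architectural ones.

```python
VM_CR_LOCK_BIT = 3     # VM_CR.LOCK
VM_CR_SVMDIS_BIT = 4   # VM_CR SVM-disable bit


def msr_bit_set(rdmsr_output: str, bit: int) -> bool:
    """True if the given bit is set in the hex string printed by rdmsr."""
    return bool(int(rdmsr_output.strip(), 16) & (1 << bit))


def svm_disabled_by_firmware(vm_cr_output: str) -> bool:
    """Interpret `rdmsr 0xC0010114` output: is SVM switched off via VM_CR?"""
    return msr_bit_set(vm_cr_output, VM_CR_SVMDIS_BIT)


if __name__ == "__main__":
    # Example strings only; real values come from `sudo rdmsr 0xC0010114`.
    for sample in ("0", "8", "18"):
        print(sample, "-> SVM disabled:", svm_disabled_by_firmware(sample),
              "| locked:", msr_bit_set(sample, VM_CR_LOCK_BIT))
```

The same `msr_bit_set` helper can be pointed at the EFER output with bit 12 to see whether SVME is currently on.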
* [Bug 219085] kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0
2024-07-22 18:50 [Bug 219085] New: kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0 bugzilla-daemon
` (5 preceding siblings ...)
2024-07-24 19:15 ` bugzilla-daemon
@ 2024-08-12 7:44 ` bugzilla-daemon
6 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2024-08-12 7:44 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=219085
ununpta@mailto.plus changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |INVALID
--- Comment #6 from ununpta@mailto.plus ---
Closed as invalid since it is a qemu bug.