* [Qemu-devel] Bug Report: VM crashed for some kinds of vCPU in nested virtualization
  From: 李春奇 <Arthur Chunqi Li>
  Date: 2013-04-15 6:24 UTC
  To: qemu-devel
  Cc: Jan Kiszka

Hi all,

In a nested virtualization environment of QEMU+KVM, some emulated CPU models (such as core2duo) cause the L2 guest to crash after booting for a while. Here's my configuration:

Host:
  Linux 3.5.7
  QEMU is the latest version from the git repository.
  Emulated CPU: core2duo

L1 guest:
  Linux 3.5.7
  QEMU is the latest version from git.
  Emulated CPU: core2duo

L2 guest:
  Crashes at some specific point after running for some time.

Here's the command line and the crash dump:

qemu-system-x86_64 -net nic,vlan=0,macaddr=00:26:b9:fa:fe:31 -net tap,vlan=0 -vnc :1 -hda vm1.1.img -m 512 -machine pc,accel=kvm -cpu core2duo -cdrom ubuntu-12.04.2-server-amd64.iso
TUNSETIFF: Device or resource busy
qemu-system-x86_64: pci_add_option_rom: failed to find romfile "efi-e1000.rom"
KVM: entry failed, hardware error 0x7
RAX=000000000000000f RBX=ffff88001f60c740 RCX=000000000000038f RDX=0000000000000007
RSI=000000000000000f RDI=000000000000038f RBP=ffff88001e6ffaf0 RSP=ffff88001e6ffaf0
R8 =000000070000000f R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000001 R13=0000000000000001 R14=0000000000000000 R15=ffff88001f617384
RIP=ffffffff8103fe1a RFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 000fffff 00000000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 000fffff 00000000
FS =0000 0000000000000000 000fffff 00000000
GS =0000 ffff88001f600000 000fffff 00000000
LDT=0000 0000000000000000 000fffff 00000000
TR =0040 ffff88001f611580 00002087 00008b00 DPL=0 TSS64-busy
GDT=     ffff88001f604000 0000007f
IDT=     ffffffff81dd6000 00000fff
CR0=8005003b CR2=00000000ffffffff CR3=0000000001c0b000 CR4=000007f0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=20 89 f9 48 09 c8 5d c3 66 90 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c7 48 89 d0 48 c1 e0 20 89 f9 48 09 c8 5d c3

This bug also appears with Westmere, SandyBridge and Haswell; Nehalem, Penryn and Conroe run fine.

Is this really a bug, or a mistake in my configuration?

Thanks,
Arthur

--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
* Re: [Qemu-devel] Bug Report: VM crashed for some kinds of vCPU in nested virtualization
  From: Jan Kiszka
  Date: 2013-04-15 7:43 UTC
  To: 李春奇 <Arthur Chunqi Li>
  Cc: qemu-devel, kvm

On 2013-04-15 08:24, 李春奇 <Arthur Chunqi Li> wrote:
> Hi all,
> In a nested virtualization environment of QEMU+KVM, some emulated CPU
> models (such as core2duo) cause the L2 guest to crash after booting for
> a while. Here's my configuration:
>
> Host:
>   Linux 3.5.7

You should use the latest version from kvm.git [1], branch "next". Otherwise you risk re-triggering bugs that were fixed in the meantime.

>   QEMU is the latest version from the git repository.
>   Emulated CPU: core2duo
>
> L1 guest:
>   Linux 3.5.7
>   QEMU is the latest version from git.
>   Emulated CPU: core2duo
>
> L2 guest:
>   Crashes at some specific point after running for some time.
>
> Here's the command line and the crash dump:
>
> qemu-system-x86_64 -net nic,vlan=0,macaddr=00:26:b9:fa:fe:31 -net tap,vlan=0 -vnc :1 -hda vm1.1.img -m 512 -machine pc,accel=kvm -cpu core2duo -cdrom ubuntu-12.04.2-server-amd64.iso
> TUNSETIFF: Device or resource busy
> qemu-system-x86_64: pci_add_option_rom: failed to find romfile "efi-e1000.rom"
> KVM: entry failed, hardware error 0x7
                                    ^^^
As an exercise, you could try to track down what this number means. Hint: there will be two possibilities (unfortunately).

> RAX=000000000000000f RBX=ffff88001f60c740 RCX=000000000000038f RDX=0000000000000007
> [... rest of the register dump quoted from the first mail ...]
> EFER=0000000000000d01
> Code=20 89 f9 48 09 c8 5d c3 66 90 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c7 48 89 d0 48 c1 e0 20 89 f9 48 09 c8 5d c3
>
> This bug also appears with Westmere, SandyBridge and Haswell; Nehalem,
> Penryn and Conroe run fine.
>
> Is this really a bug, or a mistake in my configuration?

A bug, most probably. If you are able to reproduce it using the latest KVM, we would have to look into the details.

Jan

PS: KVM-related error reports for QEMU should also go to the KVM list. CC'ing it.

[1] https://git.kernel.org/cgit/virt/kvm/kvm.git/
* Re: [Qemu-devel] Bug Report: VM crashed for some kinds of vCPU in nested virtualization
  From: 李春奇 <Arthur Chunqi Li>
  Date: 2013-04-16 3:49 UTC
  To: Jan Kiszka
  Cc: qemu-devel, kvm

I changed to the latest version of the kvm kernel, but the bug still occurs.

When the L1 VM starts up on the host, the host's kern.log outputs:

Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458090] kvm [2808]: vcpu0 unhandled rdmsr: 0x345
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458166] kvm_set_msr_common: 22 callbacks suppressed
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458169] kvm [2808]: vcpu0 unhandled wrmsr: 0x40 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458176] kvm [2808]: vcpu0 unhandled wrmsr: 0x60 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458182] kvm [2808]: vcpu0 unhandled wrmsr: 0x41 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458188] kvm [2808]: vcpu0 unhandled wrmsr: 0x61 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458194] kvm [2808]: vcpu0 unhandled wrmsr: 0x42 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458200] kvm [2808]: vcpu0 unhandled wrmsr: 0x62 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458206] kvm [2808]: vcpu0 unhandled wrmsr: 0x43 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458211] kvm [2808]: vcpu0 unhandled wrmsr: 0x63 data 0
Apr 16 11:28:23 Blade1-02 kernel: [ 4908.471014] kvm [2808]: vcpu1 unhandled wrmsr: 0x40 data 0
Apr 16 11:28:23 Blade1-02 kernel: [ 4908.471024] kvm [2808]: vcpu1 unhandled wrmsr: 0x60 data 0

When the L2 VM starts (and later crashes), the L1 guest's kern.log outputs:

Apr 16 11:28:55 kvm1 kernel: [   33.590101] device tap0 entered promiscuous mode
Apr 16 11:28:55 kvm1 kernel: [   33.590140] br0: port 2(tap0) entered forwarding state
Apr 16 11:28:55 kvm1 kernel: [   33.590146] br0: port 2(tap0) entered forwarding state
Apr 16 11:29:04 kvm1 kernel: [   42.592103] br0: port 2(tap0) entered forwarding state
Apr 16 11:29:19 kvm1 kernel: [   57.752731] kvm [1673]: vcpu0 unhandled rdmsr: 0x345
Apr 16 11:29:19 kvm1 kernel: [   57.797261] kvm [1673]: vcpu0 unhandled wrmsr: 0x40 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797315] kvm [1673]: vcpu0 unhandled wrmsr: 0x60 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797366] kvm [1673]: vcpu0 unhandled wrmsr: 0x41 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797416] kvm [1673]: vcpu0 unhandled wrmsr: 0x61 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797466] kvm [1673]: vcpu0 unhandled wrmsr: 0x42 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797516] kvm [1673]: vcpu0 unhandled wrmsr: 0x62 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797566] kvm [1673]: vcpu0 unhandled wrmsr: 0x43 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797616] kvm [1673]: vcpu0 unhandled wrmsr: 0x63 data 0

The host outputs simultaneously:

Apr 16 11:29:20 Blade1-02 kernel: [ 4966.314742] nested_vmx_run: VMCS MSR_{LOAD,STORE} unsupported

And the crash dump displayed on the console is the same as in the previous mail.

Besides, the L1 and L2 guests sometimes crash and output nothing, while at other times they output the above.

So this indicates that the MSR controls may fail for the core2duo CPU model.

For Jan: I have traced the QEMU and KVM code and found the code behind the error message "KVM: entry failed, hardware error 0x7".
The relevant code is in the kernel, arch/x86/kvm/vmx.c, function vmx_handle_exit():

	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
		vcpu->run->fail_entry.hardware_entry_failure_reason
			= exit_reason;
		return 0;
	}

	if (unlikely(vmx->fail)) {
		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
		vcpu->run->fail_entry.hardware_entry_failure_reason
			= vmcs_read32(VM_INSTRUCTION_ERROR);
		return 0;
	}

The "entry failed" hardware error can come from either of these two points; both mean a failed VM entry. Since the macro VMX_EXIT_REASONS_FAILED_VMENTRY is 0x80000000 and the reported error number is 0x7, the error must come from the second branch.
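To summarize, here is how I read the two encodings (a minimal sketch for illustration, written by hand, not actual QEMU or KVM code):

	#include <stdint.h>
	#include <stdio.h>

	#define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000u

	/* Decode the value QEMU prints as "hardware error N". */
	static void decode_hardware_error(uint64_t reason)
	{
		if (reason & VMX_EXIT_REASONS_FAILED_VMENTRY)
			/* First branch: the VM entry itself failed; the
			 * low bits hold the basic exit reason. */
			printf("failed VM entry, basic exit reason %u\n",
			       (unsigned)(reason & 0xffff));
		else
			/* Second branch: VMLAUNCH/VMRESUME failed; the
			 * value is the VM-instruction error field of the
			 * VMCS. */
			printf("VM-instruction error %u\n",
			       (unsigned)reason);
	}

	/* decode_hardware_error(0x7) prints "VM-instruction error 7". */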
I'm not very clear on what the value returned by vmcs_read32(VM_INSTRUCTION_ERROR) refers to, though.

Thanks,
Arthur

On Mon, Apr 15, 2013 at 3:43 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
> On 2013-04-15 08:24, 李春奇 <Arthur Chunqi Li> wrote:
> > Hi all,
> > In a nested virtualization environment of QEMU+KVM, some emulated CPU
> > models (such as core2duo) cause the L2 guest to crash after booting
> > for a while.
> > [...]
>
> You should use the latest version from kvm.git [1], branch "next".
> Otherwise you risk re-triggering bugs that were fixed in the meantime.
>
> > [...]
> > KVM: entry failed, hardware error 0x7
>                                     ^^^
> As an exercise, you could try to track down what this number means.
> Hint: there will be two possibilities (unfortunately).
>
> > [... register dump ...]
> >
> > This bug also appears with Westmere, SandyBridge and Haswell; Nehalem,
> > Penryn and Conroe run fine.
> >
> > Is this really a bug, or a mistake in my configuration?
>
> A bug, most probably. If you are able to reproduce it using the latest
> KVM, we would have to look into the details.
>
> Jan
>
> PS: KVM-related error reports for QEMU should also go to the KVM list.
> CC'ing it.
>
> [1] https://git.kernel.org/cgit/virt/kvm/kvm.git/

--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
* Re: [Qemu-devel] Bug Report: VM crashed for some kinds of vCPU in nested virtualization
  From: Jan Kiszka
  Date: 2013-04-16 7:03 UTC
  To: 李春奇 <Arthur Chunqi Li>
  Cc: qemu-devel, kvm

On 2013-04-16 05:49, 李春奇 <Arthur Chunqi Li> wrote:
> I changed to the latest version of the kvm kernel, but the bug still
> occurs.
>
> When the L1 VM starts up on the host, the host's kern.log outputs:
>
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458090] kvm [2808]: vcpu0 unhandled rdmsr: 0x345
> [... unhandled wrmsr messages ...]
>
> When the L2 VM starts (and later crashes), the L1 guest's kern.log outputs:
>
> [... unhandled rdmsr/wrmsr messages ...]
>
> The host outputs simultaneously:
>
> Apr 16 11:29:20 Blade1-02 kernel: [ 4966.314742] nested_vmx_run: VMCS MSR_{LOAD,STORE} unsupported

That's an important piece of information. KVM is not yet implementing this feature, but L1 is using it - doomed to fail. This feature gap of nested VMX needs to be closed at some point.
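For reference, the check that emits this message looks roughly like the following (paraphrasing nested_vmx_run() in arch/x86/kvm/vmx.c of that era; a simplified sketch, not a verbatim quote):

	/* Nested VMX does not implement the VM-entry/VM-exit MSR-load and
	 * MSR-store areas yet, so refuse any vmcs12 that tries to use them. */
	if (vmcs12->vm_entry_msr_load_count > 0 ||
	    vmcs12->vm_exit_msr_load_count > 0 ||
	    vmcs12->vm_exit_msr_store_count > 0) {
		pr_warn_ratelimited("%s: VMCS MSR_{LOAD,STORE} unsupported\n",
				    __func__);
		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
		return 1;
	}

If I read this correctly, it also explains the number you saw: nested_vmx_failValid() makes L1's VMLAUNCH fail with VMXERR_ENTRY_INVALID_CONTROL_FIELD, which is VM-instruction error 7, i.e. exactly the "hardware error 0x7" that QEMU reports in L1.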
> And the crash dump displayed on the console is the same as in the
> previous mail.
>
> Besides, the L1 and L2 guests sometimes crash and output nothing, while
> at other times they output the above.
>
> So this indicates that the MSR controls may fail for the core2duo CPU
> model.

Maybe varying the CPU type (try e.g. -cpu kvm64,+vmx) reduces the likelihood of this scenario with KVM as a guest.
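For the L1 guest, that would be something like this (your command line from the first mail, with only the -cpu argument changed; untested):

  qemu-system-x86_64 -net nic,vlan=0,macaddr=00:26:b9:fa:fe:31 -net tap,vlan=0 \
      -vnc :1 -hda vm1.1.img -m 512 -machine pc,accel=kvm \
      -cpu kvm64,+vmx -cdrom ubuntu-12.04.2-server-amd64.iso

(kvm64 does not expose VMX by default, hence the explicit +vmx.)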
> For Jan: I have traced the QEMU and KVM code and found the code behind
> the error message "KVM: entry failed, hardware error 0x7".
>
> [... vmx_handle_exit() excerpt ...]
>
> Since the macro VMX_EXIT_REASONS_FAILED_VMENTRY is 0x80000000 and the
> reported error number is 0x7, the error must come from the second
> branch. I'm not very clear on what the value returned by
> vmcs_read32(VM_INSTRUCTION_ERROR) refers to, though.

Try to look it up in the Intel manual. It explains what instruction error 7 means. You will also find it when tracing down the error message from L0.

Jan

* Re: [Qemu-devel] Bug Report: VM crashed for some kinds of vCPU in nested virtualization
  From: 李春奇 <Arthur Chunqi Li>
  Date: 2013-04-16 10:19 UTC
  To: Jan Kiszka
  Cc: qemu-devel, kvm

I looked up the VM-instruction errors in the Intel manual. Error number 7 means "VM entry with invalid control field(s)", i.e., some control fields were not properly configured in the process of the VM switch.

I wonder why some emulated CPUs (e.g. Nehalem) can run properly without nested VMCS MSR support?

Besides, this bug has also been reported in the Red Hat community:
https://bugzilla.redhat.com/show_bug.cgi?id=892240
With some specific kernels (e.g. kernel 3.8.4-202.fc18.x86_64 for Fedora 18) it works well.

On Tue, Apr 16, 2013 at 3:03 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
> On 2013-04-16 05:49, 李春奇 <Arthur Chunqi Li> wrote:
> > I changed to the latest version of the kvm kernel, but the bug still
> > occurs.
> >
> > [... host and L1 kern.log output quoted above ...]
> >
> > The host outputs simultaneously:
> >
> > Apr 16 11:29:20 Blade1-02 kernel: [ 4966.314742] nested_vmx_run: VMCS
> > MSR_{LOAD,STORE} unsupported
>
> That's an important piece of information. KVM is not yet implementing
> this feature, but L1 is using it - doomed to fail. This feature gap of
> nested VMX needs to be closed at some point.
>
> > [...]
> > So this indicates that the MSR controls may fail for the core2duo CPU
> > model.
>
> Maybe varying the CPU type (try e.g. -cpu kvm64,+vmx) reduces the
> likelihood of this scenario with KVM as a guest.
>
> > For Jan: I have traced the QEMU and KVM code and found the code behind
> > the error message "KVM: entry failed, hardware error 0x7".
> > [...]
>
> Try to look it up in the Intel manual. It explains what instruction
> error 7 means. You will also find it when tracing down the error
> message from L0.
>
> Jan

--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
* Re: [Qemu-devel] Bug Report: VM crashed for some kinds of vCPU in nested virtualization
  From: Jan Kiszka
  Date: 2013-04-16 10:29 UTC
  To: 李春奇 <Arthur Chunqi Li>
  Cc: qemu-devel, kvm

On 2013-04-16 12:19, 李春奇 <Arthur Chunqi Li> wrote:
> I looked up the VM-instruction errors in the Intel manual. Error number
> 7 means "VM entry with invalid control field(s)", i.e., some control
> fields were not properly configured in the process of the VM switch.
>
> I wonder why some emulated CPUs (e.g. Nehalem) can run properly without
> nested VMCS MSR support?

MSRs are only switched between host (L0) and guest (L1/L2) if their values differ. That saves some cycles. Therefore, if the guest either does not use a specific MSR (due to differences in the virtual CPU feature set) or uses it in the same way as the host, there is no switching, and thus no risk of hitting this unimplemented feature.

Jan
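PS: One place where this decision is visible is the handling of the perf MSRs (paraphrasing atomic_switch_perf_msrs() in arch/x86/kvm/vmx.c; a simplified sketch, not a verbatim quote):

	static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
	{
		int i, nr_msrs;
		struct perf_guest_switch_msr *msrs;

		msrs = perf_guest_get_msrs(&nr_msrs);
		if (!msrs)
			return;
		for (i = 0; i < nr_msrs; i++)
			if (msrs[i].host == msrs[i].guest)
				/* same value on both sides: no switching */
				clear_atomic_switch_msr(vmx, msrs[i].msr);
			else
				/* values differ: switch via the VM-entry/
				 * VM-exit MSR-load area, which nested VMX
				 * currently rejects */
				add_atomic_switch_msr(vmx, msrs[i].msr,
						      msrs[i].guest,
						      msrs[i].host);
	}

Which MSRs show up in that list at all depends on the perfmon capabilities of the virtual CPU, which would explain why the crash depends on the CPU model you emulate.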