* [Qemu-devel] Bug Report: VM crashed for some kinds of vCPU in nested virtualization
  From: 李春奇 <Arthur Chunqi Li>
  Date: 2013-04-15 6:24 UTC
  To: qemu-devel
  Cc: Jan Kiszka

Hi all,

In a nested virtualization environment of QEMU+KVM, some emulated CPU models (such as core2duo) cause the L2 guest to crash after booting for a while. Here's my configuration:

Host:
  Linux 3.5.7
  QEMU is the latest version from the git repository.
  Emulated CPU: core2duo

L1 guest:
  Linux 3.5.7
  QEMU is the latest version from git.
  Emulated CPU: core2duo

L2 guest:
  Crashes at some specific point after running for some time.

Here's the command line and the crash dump:

qemu-system-x86_64 -net nic,vlan=0,macaddr=00:26:b9:fa:fe:31 -net tap,vlan=0 -vnc :1 -hda vm1.1.img -m 512 -machine pc,accel=kvm -cpu core2duo -cdrom ubuntu-12.04.2-server-amd64.iso
TUNSETIFF: Device or resource busy
qemu-system-x86_64: pci_add_option_rom: failed to find romfile "efi-e1000.rom"
KVM: entry failed, hardware error 0x7
RAX=000000000000000f RBX=ffff88001f60c740 RCX=000000000000038f RDX=0000000000000007
RSI=000000000000000f RDI=000000000000038f RBP=ffff88001e6ffaf0 RSP=ffff88001e6ffaf0
R8 =000000070000000f R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000001 R13=0000000000000001 R14=0000000000000000 R15=ffff88001f617384
RIP=ffffffff8103fe1a RFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 000fffff 00000000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 000fffff 00000000
FS =0000 0000000000000000 000fffff 00000000
GS =0000 ffff88001f600000 000fffff 00000000
LDT=0000 0000000000000000 000fffff 00000000
TR =0040 ffff88001f611580 00002087 00008b00 DPL=0 TSS64-busy
GDT=     ffff88001f604000 0000007f
IDT=     ffffffff81dd6000 00000fff
CR0=8005003b CR2=00000000ffffffff CR3=0000000001c0b000 CR4=000007f0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=20 89 f9 48 09 c8 5d c3 66 90 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c7 48 89 d0 48 c1 e0 20 89 f9 48 09 c8 5d c3

This bug also appears with Westmere, SandyBridge and Haswell; Nehalem, Penryn and Conroe run fine.

Is this really a bug, or a mistake in my configuration?

Thanks,
Arthur

--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
* Re: [Qemu-devel] Bug Report: VM crashed for some kinds of vCPU in nested virtualization
  From: Jan Kiszka
  Date: 2013-04-15 7:43 UTC
  To: 李春奇 <Arthur Chunqi Li>
  Cc: qemu-devel, kvm

On 2013-04-15 08:24, 李春奇 <Arthur Chunqi Li> wrote:
> Hi all,
> In a nested virtualization environment of QEMU+KVM, some emulated CPU
> models (such as core2duo) cause the L2 guest to crash after booting for
> a while. Here's my configuration:
>
> Host:
>   Linux 3.5.7

You should use the latest version from kvm.git [1], branch "next". Otherwise you risk re-triggering bugs that were fixed in the meantime.

>   QEMU is the latest version from the git repository.
>   Emulated CPU: core2duo
>
> L1 guest:
>   Linux 3.5.7
>   QEMU is the latest version from git.
>   Emulated CPU: core2duo
>
> L2 guest:
>   Crashes at some specific point after running for some time.
>
> Here's the command line and the crash dump:
>
> qemu-system-x86_64 -net nic,vlan=0,macaddr=00:26:b9:fa:fe:31 -net tap,vlan=0 -vnc :1 -hda vm1.1.img -m 512 -machine pc,accel=kvm -cpu core2duo -cdrom ubuntu-12.04.2-server-amd64.iso
> TUNSETIFF: Device or resource busy
> qemu-system-x86_64: pci_add_option_rom: failed to find romfile "efi-e1000.rom"
> KVM: entry failed, hardware error 0x7
                                    ^^^
As an exercise, you could try to track down what this number means. Hint: there will be two possibilities (unfortunately).

> RAX=000000000000000f RBX=ffff88001f60c740 RCX=000000000000038f RDX=0000000000000007
> [... rest of the register dump quoted from the first mail ...]
> EFER=0000000000000d01
> Code=20 89 f9 48 09 c8 5d c3 66 90 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c7 48 89 d0 48 c1 e0 20 89 f9 48 09 c8 5d c3
>
> This bug also appears with Westmere, SandyBridge and Haswell; Nehalem,
> Penryn and Conroe run fine.
>
> Is this really a bug, or a mistake in my configuration?

A bug, most probably. If you are able to reproduce it using the latest KVM, we would have to look into the details.

Jan

PS: KVM-related error reports for QEMU should also go to the KVM list. CC'ing it.

[1] https://git.kernel.org/cgit/virt/kvm/kvm.git/
* Re: [Qemu-devel] Bug Report: VM crashed for some kinds of vCPU in nested virtualization
  From: 李春奇 <Arthur Chunqi Li>
  Date: 2013-04-16 3:49 UTC
  To: Jan Kiszka
  Cc: qemu-devel, kvm

I changed to the latest version of the kvm kernel, but the bug still occurs.

When the L1 VM starts up on the host, the host's kern.log outputs:

Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458090] kvm [2808]: vcpu0 unhandled rdmsr: 0x345
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458166] kvm_set_msr_common: 22 callbacks suppressed
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458169] kvm [2808]: vcpu0 unhandled wrmsr: 0x40 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458176] kvm [2808]: vcpu0 unhandled wrmsr: 0x60 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458182] kvm [2808]: vcpu0 unhandled wrmsr: 0x41 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458188] kvm [2808]: vcpu0 unhandled wrmsr: 0x61 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458194] kvm [2808]: vcpu0 unhandled wrmsr: 0x42 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458200] kvm [2808]: vcpu0 unhandled wrmsr: 0x62 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458206] kvm [2808]: vcpu0 unhandled wrmsr: 0x43 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458211] kvm [2808]: vcpu0 unhandled wrmsr: 0x63 data 0
Apr 16 11:28:23 Blade1-02 kernel: [ 4908.471014] kvm [2808]: vcpu1 unhandled wrmsr: 0x40 data 0
Apr 16 11:28:23 Blade1-02 kernel: [ 4908.471024] kvm [2808]: vcpu1 unhandled wrmsr: 0x60 data 0

When the L2 VM starts (and later crashes), the L1 guest's kern.log outputs:

Apr 16 11:28:55 kvm1 kernel: [   33.590101] device tap0 entered promiscuous mode
Apr 16 11:28:55 kvm1 kernel: [   33.590140] br0: port 2(tap0) entered forwarding state
Apr 16 11:28:55 kvm1 kernel: [   33.590146] br0: port 2(tap0) entered forwarding state
Apr 16 11:29:04 kvm1 kernel: [   42.592103] br0: port 2(tap0) entered forwarding state
Apr 16 11:29:19 kvm1 kernel: [   57.752731] kvm [1673]: vcpu0 unhandled rdmsr: 0x345
Apr 16 11:29:19 kvm1 kernel: [   57.797261] kvm [1673]: vcpu0 unhandled wrmsr: 0x40 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797315] kvm [1673]: vcpu0 unhandled wrmsr: 0x60 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797366] kvm [1673]: vcpu0 unhandled wrmsr: 0x41 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797416] kvm [1673]: vcpu0 unhandled wrmsr: 0x61 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797466] kvm [1673]: vcpu0 unhandled wrmsr: 0x42 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797516] kvm [1673]: vcpu0 unhandled wrmsr: 0x62 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797566] kvm [1673]: vcpu0 unhandled wrmsr: 0x43 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797616] kvm [1673]: vcpu0 unhandled wrmsr: 0x63 data 0

The host outputs simultaneously:

Apr 16 11:29:20 Blade1-02 kernel: [ 4966.314742] nested_vmx_run: VMCS MSR_{LOAD,STORE} unsupported

And the crash dump displayed on the console is the same as in the previous mail.

Besides, the L1 and L2 guests sometimes crash and output nothing, while at other times they output the above.

So this indicates that the MSR controls may fail for the core2duo CPU model.

For Jan: I have traced the QEMU and KVM code and found the code behind the error message "KVM: entry failed, hardware error 0x7".
The relevant code is in the kernel, arch/x86/kvm/vmx.c, function vmx_handle_exit():

	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
		vcpu->run->fail_entry.hardware_entry_failure_reason
			= exit_reason;
		return 0;
	}

	if (unlikely(vmx->fail)) {
		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
		vcpu->run->fail_entry.hardware_entry_failure_reason
			= vmcs_read32(VM_INSTRUCTION_ERROR);
		return 0;
	}

The "entry failed" hardware error can come from either of these two points; both mean a failed VM entry. Since the macro VMX_EXIT_REASONS_FAILED_VMENTRY is 0x80000000 and the reported error number is 0x7, the error must come from the second branch.
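To summarize, here is how I read the two encodings (a minimal sketch for illustration, written by hand, not actual QEMU or KVM code):

	#include <stdint.h>
	#include <stdio.h>

	#define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000u

	/* Decode the value QEMU prints as "hardware error N". */
	static void decode_hardware_error(uint64_t reason)
	{
		if (reason & VMX_EXIT_REASONS_FAILED_VMENTRY)
			/* First branch: the VM entry itself failed; the
			 * low bits hold the basic exit reason. */
			printf("failed VM entry, basic exit reason %u\n",
			       (unsigned)(reason & 0xffff));
		else
			/* Second branch: VMLAUNCH/VMRESUME failed; the
			 * value is the VM-instruction error field of the
			 * VMCS. */
			printf("VM-instruction error %u\n",
			       (unsigned)reason);
	}

	/* decode_hardware_error(0x7) prints "VM-instruction error 7". */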
I'm not very clear on what the value returned by vmcs_read32(VM_INSTRUCTION_ERROR) refers to, though.

Thanks,
Arthur

On Mon, Apr 15, 2013 at 3:43 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
> On 2013-04-15 08:24, 李春奇 <Arthur Chunqi Li> wrote:
> > Hi all,
> > In a nested virtualization environment of QEMU+KVM, some emulated CPU
> > models (such as core2duo) cause the L2 guest to crash after booting
> > for a while.
> > [...]
>
> You should use the latest version from kvm.git [1], branch "next".
> Otherwise you risk re-triggering bugs that were fixed in the meantime.
>
> > [...]
> > KVM: entry failed, hardware error 0x7
>                                     ^^^
> As an exercise, you could try to track down what this number means.
> Hint: there will be two possibilities (unfortunately).
>
> > [... register dump ...]
> >
> > This bug also appears with Westmere, SandyBridge and Haswell; Nehalem,
> > Penryn and Conroe run fine.
> >
> > Is this really a bug, or a mistake in my configuration?
>
> A bug, most probably. If you are able to reproduce it using the latest
> KVM, we would have to look into the details.
>
> Jan
>
> PS: KVM-related error reports for QEMU should also go to the KVM list.
> CC'ing it.
>
> [1] https://git.kernel.org/cgit/virt/kvm/kvm.git/

--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
* Re: [Qemu-devel] Bug Report: VM crashed for some kinds of vCPU in nested virtualization
  From: Jan Kiszka
  Date: 2013-04-16 7:03 UTC
  To: 李春奇 <Arthur Chunqi Li>
  Cc: qemu-devel, kvm

On 2013-04-16 05:49, 李春奇 <Arthur Chunqi Li> wrote:
> I changed to the latest version of the kvm kernel, but the bug still
> occurs.
>
> When the L1 VM starts up on the host, the host's kern.log outputs:
>
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458090] kvm [2808]: vcpu0 unhandled rdmsr: 0x345
> [... unhandled wrmsr messages ...]
>
> When the L2 VM starts (and later crashes), the L1 guest's kern.log outputs:
>
> [... unhandled rdmsr/wrmsr messages ...]
>
> The host outputs simultaneously:
>
> Apr 16 11:29:20 Blade1-02 kernel: [ 4966.314742] nested_vmx_run: VMCS MSR_{LOAD,STORE} unsupported

That's an important piece of information. KVM is not yet implementing this feature, but L1 is using it - doomed to fail. This feature gap of nested VMX needs to be closed at some point.
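For reference, the check that emits this message looks roughly like the following (paraphrasing nested_vmx_run() in arch/x86/kvm/vmx.c of that era; a simplified sketch, not a verbatim quote):

	/* Nested VMX does not implement the VM-entry/VM-exit MSR-load and
	 * MSR-store areas yet, so refuse any vmcs12 that tries to use them. */
	if (vmcs12->vm_entry_msr_load_count > 0 ||
	    vmcs12->vm_exit_msr_load_count > 0 ||
	    vmcs12->vm_exit_msr_store_count > 0) {
		pr_warn_ratelimited("%s: VMCS MSR_{LOAD,STORE} unsupported\n",
				    __func__);
		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
		return 1;
	}

If I read this correctly, it also explains the number you saw: nested_vmx_failValid() makes L1's VMLAUNCH fail with VMXERR_ENTRY_INVALID_CONTROL_FIELD, which is VM-instruction error 7, i.e. exactly the "hardware error 0x7" that QEMU reports in L1.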
> And the crash dump displayed on the console is the same as in the
> previous mail.
>
> Besides, the L1 and L2 guests sometimes crash and output nothing, while
> at other times they output the above.
>
> So this indicates that the MSR controls may fail for the core2duo CPU
> model.

Maybe varying the CPU type (try e.g. -cpu kvm64,+vmx) reduces the likelihood of this scenario with KVM as a guest.
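For the L1 guest, that would be something like this (your command line from the first mail, with only the -cpu argument changed; untested):

  qemu-system-x86_64 -net nic,vlan=0,macaddr=00:26:b9:fa:fe:31 -net tap,vlan=0 \
      -vnc :1 -hda vm1.1.img -m 512 -machine pc,accel=kvm \
      -cpu kvm64,+vmx -cdrom ubuntu-12.04.2-server-amd64.iso

(kvm64 does not expose VMX by default, hence the explicit +vmx.)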
> For Jan: I have traced the QEMU and KVM code and found the code behind
> the error message "KVM: entry failed, hardware error 0x7".
>
> [... vmx_handle_exit() excerpt ...]
>
> Since the macro VMX_EXIT_REASONS_FAILED_VMENTRY is 0x80000000 and the
> reported error number is 0x7, the error must come from the second
> branch. I'm not very clear on what the value returned by
> vmcs_read32(VM_INSTRUCTION_ERROR) refers to, though.

Try to look it up in the Intel manual. It explains what instruction error 7 means. You will also find it when tracing down the error message from L0.

Jan

* Re: [Qemu-devel] Bug Report: VM crashed for some kinds of vCPU in nested virtualization
  From: 李春奇 <Arthur Chunqi Li>
  Date: 2013-04-16 10:19 UTC
  To: Jan Kiszka
  Cc: qemu-devel, kvm

I looked up the VM-instruction errors in the Intel manual. Error number 7 means "VM entry with invalid control field(s)", i.e., some control fields were not properly configured in the process of the VM switch.

I wonder why some emulated CPUs (e.g. Nehalem) can run properly without nested VMCS MSR support?

Besides, this bug has also been reported in the Red Hat community:
https://bugzilla.redhat.com/show_bug.cgi?id=892240
With some specific kernels (e.g. kernel 3.8.4-202.fc18.x86_64 for Fedora 18) it works well.

On Tue, Apr 16, 2013 at 3:03 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
> On 2013-04-16 05:49, 李春奇 <Arthur Chunqi Li> wrote:
> > I changed to the latest version of the kvm kernel, but the bug still
> > occurs.
> >
> > [... host and L1 kern.log output quoted above ...]
> >
> > The host outputs simultaneously:
> >
> > Apr 16 11:29:20 Blade1-02 kernel: [ 4966.314742] nested_vmx_run: VMCS
> > MSR_{LOAD,STORE} unsupported
>
> That's an important piece of information. KVM is not yet implementing
> this feature, but L1 is using it - doomed to fail. This feature gap of
> nested VMX needs to be closed at some point.
>
> > [...]
> > So this indicates that the MSR controls may fail for the core2duo CPU
> > model.
>
> Maybe varying the CPU type (try e.g. -cpu kvm64,+vmx) reduces the
> likelihood of this scenario with KVM as a guest.
>
> > For Jan: I have traced the QEMU and KVM code and found the code behind
> > the error message "KVM: entry failed, hardware error 0x7".
> > [...]
>
> Try to look it up in the Intel manual. It explains what instruction
> error 7 means. You will also find it when tracing down the error
> message from L0.
>
> Jan

--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
* Re: [Qemu-devel] Bug Report: VM crashed for some kinds of vCPU in nested virtualization
  From: Jan Kiszka
  Date: 2013-04-16 10:29 UTC
  To: 李春奇 <Arthur Chunqi Li>
  Cc: qemu-devel, kvm

On 2013-04-16 12:19, 李春奇 <Arthur Chunqi Li> wrote:
> I looked up the VM-instruction errors in the Intel manual. Error number
> 7 means "VM entry with invalid control field(s)", i.e., some control
> fields were not properly configured in the process of the VM switch.
>
> I wonder why some emulated CPUs (e.g. Nehalem) can run properly without
> nested VMCS MSR support?

MSRs are only switched between host (L0) and guest (L1/L2) if their values differ. That saves some cycles. Therefore, if the guest either does not use a specific MSR (due to differences in the virtual CPU feature set) or uses it in the same way as the host, there is no switching, and thus no risk of hitting this unimplemented feature.

Jan
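PS: One place where this decision is visible is the handling of the perf MSRs (paraphrasing atomic_switch_perf_msrs() in arch/x86/kvm/vmx.c; a simplified sketch, not a verbatim quote):

	static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
	{
		int i, nr_msrs;
		struct perf_guest_switch_msr *msrs;

		msrs = perf_guest_get_msrs(&nr_msrs);
		if (!msrs)
			return;
		for (i = 0; i < nr_msrs; i++)
			if (msrs[i].host == msrs[i].guest)
				/* same value on both sides: no switching */
				clear_atomic_switch_msr(vmx, msrs[i].msr);
			else
				/* values differ: switch via the VM-entry/
				 * VM-exit MSR-load area, which nested VMX
				 * currently rejects */
				add_atomic_switch_msr(vmx, msrs[i].msr,
						      msrs[i].guest,
						      msrs[i].host);
	}

Which MSRs show up in that list at all depends on the perfmon capabilities of the virtual CPU, which would explain why the crash depends on the CPU model you emulate.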