* nVMX regression v3.13+, bisected
@ 2014-02-26 19:43 Stefan Bader
From: Stefan Bader @ 2014-02-26 19:43 UTC
To: kvm; +Cc: Anthoine Bourgeois, Paolo Bonzini
Hi,
I was looking at a bug report[1] about a regression on nested VMX that started
with kernel v3.13 (same issue still existed with v3.14-rc4). The problem shows
up when running a v3.13 kernel in L0 and then trying to launch an L2 (L1 was
either a v3.2 kernel or v3.13, so it seemed to have no immediate influence). L2 is
trying to boot an ISO image and hangs before the isolinux boot loader displays
anything. A preinstalled hd image fails to boot, too.
I bisected this and ended up on the following commit which, when reverted, made
the launch work again:
Author: Anthoine Bourgeois <bourgeois@bertin.fr>
Date: Wed Nov 13 11:45:37 2013 +0100
kvm, vmx: Fix lazy FPU on nested guest
If a nested guest does a NM fault but its CR0 doesn't contain the TS
flag (because it was already cleared by the guest with L1 aid) then we
have to activate FPU ourselves in L0 and then continue to L2. If TS flag
is set then we fallback on the previous behavior, forward the fault to
L1 if it asked for.
Signed-off-by: Anthoine Bourgeois <bourgeois@bertin.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The condition to exit to L0 seems to match what the description says.
Could it be that the handling in L0 is doing something wrong?
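As an illustration of the behavior the commit message describes, here is a minimal C sketch of the decision taken when L2 raises a #NM fault. It is not the kernel code: `struct nested_state`, `nm_fault_goes_to_l1` and the field names are hypothetical, and only the CR0.TS bit position (bit 3) and the #NM vector number (7) are architectural facts.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_TS    (1u << 3)  /* CR0.TS (task switched) is bit 3 */
#define NM_VECTOR 7          /* #NM, device-not-available exception */

/* Hypothetical, simplified view of the state consulted on an L2 #NM. */
struct nested_state {
	uint32_t cr0;              /* CR0 value inspected for the TS flag */
	uint32_t exception_bitmap; /* exceptions L1 asked to intercept */
};

/*
 * Returns true if the #NM should be forwarded to L1, false if L0
 * should activate the FPU itself and resume L2.
 */
static bool nm_fault_goes_to_l1(const struct nested_state *s)
{
	/* TS clear: nothing for L1 to do, L0 handles the fault. */
	if (!(s->cr0 & CR0_TS))
		return false;

	/* TS set: previous behavior, forward only if L1 asked for #NM. */
	return (s->exception_bitmap & (1u << NM_VECTOR)) != 0;
}
```

Which CR0 value ends up in the `cr0` field is exactly what the rest of this thread is about, so the sketch deliberately leaves that choice open.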
-Stefan
* Re: nVMX regression v3.13+, bisected
@ 2014-02-26 20:25 Paolo Bonzini
From: Paolo Bonzini @ 2014-02-26 20:25 UTC
To: Stefan Bader, kvm; +Cc: Anthoine Bourgeois
On 26/02/2014 20:43, Stefan Bader wrote:
> [. . .]
> Could it be that the handling in L0 is doing something wrong?

Thanks, I'll look at it tomorrow or Friday.

Paolo
* Re: nVMX regression v3.13+, bisected
@ 2014-02-26 20:27 Stefan Bader
From: Stefan Bader @ 2014-02-26 20:27 UTC
To: Paolo Bonzini, kvm; +Cc: Anthoine Bourgeois
On 26.02.2014 21:25, Paolo Bonzini wrote:
> On 26/02/2014 20:43, Stefan Bader wrote:
>> [. . .]
>> Could it be that the handling in L0 is doing something wrong?
>
> Thanks, I'll look at it tomorrow or Friday.
>
> Paolo
>
Great, thanks. And maybe it helps if I actually add the link to the bug report as
I had intended... :-P

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1278531
* Re: nVMX regression v3.13+, bisected
@ 2014-02-26 20:44 Kashyap Chamarthy
From: Kashyap Chamarthy @ 2014-02-26 20:44 UTC
To: Stefan Bader; +Cc: Paolo Bonzini, kvm, Anthoine Bourgeois
On Wed, Feb 26, 2014 at 09:27:17PM +0100, Stefan Bader wrote:
> On 26.02.2014 21:25, Paolo Bonzini wrote:
[. . .]
> Great thanks. And maybe it helps if I actually add the link to the bug report as
> I had intended... :-P
>
> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1278531

Yes, I'm seeing something similar[*] in a consistent manner with minimal
Fedora installs on L0, L1 and L2, but couldn't find the time to do the
bisecting. I thought this would be my first bisecting exercise, but you
already beat me to it.

[*] https://bugzilla.kernel.org/show_bug.cgi?id=69491#c7

--
/kashyap
* Re: nVMX regression v3.13+, bisected
@ 2014-02-27 12:10 Kashyap Chamarthy
From: Kashyap Chamarthy @ 2014-02-27 12:10 UTC
To: Stefan Bader; +Cc: Paolo Bonzini, kvm, Anthoine Bourgeois
On Thu, Feb 27, 2014 at 02:14:23AM +0530, Kashyap Chamarthy wrote:
[. . .]
> Yes, I'm seeing something similar[*] in a consistent manner with minimal
> Fedora installs on L0, L1 and L2

OK, I just tried to debug an L2 guest (a libguestfs appliance) via gdb
following this method[1]. This is how far I got:

From a shell on L1, launch the libguestfs appliance (note: here libguestfs is
compiled with gdb debugging enabled, so QEMU won't start running the
appliance):

    $ ./run libguestfs-test-tool
    [. . .]
    checking modpath /lib/modules/3.14.0-0.rc2.git0.1.fc21.x86_64 is a directory
    picked kernel vmlinuz-3.14.0-0.rc2.git0.1.fc21.x86_64
    supermin helper [00000ms] finished creating kernel
    [. . .]
    libguestfs: warning: qemu debugging is enabled, connect gdb to tcp::1234 to begin
    [. . .]

From a different shell, I invoke gdb like this:

    (gdb) symbol-file /usr/lib/debug/lib/modules/3.14.0-0.rc4.git0.1.fc21.x86_64/vmlinux
    Reading symbols from /usr/lib/debug/lib/modules/3.14.0-0.rc4.git0.1.fc21.x86_64/vmlinux...done.
    (gdb) target remote tcp::1234
    Remote debugging using tcp::1234
    0x0000fff0 in ftrace_stack ()
    (gdb) bt
    #0  0x00000997 in irq_stack_union ()
    #1  0x00000000 in ?? ()
    (gdb)
    (gdb) c
    Continuing.

Again, back to libguestfs-test-tool, it's just hung attempting to boot
from ROM:

    [. . .]
    SGABIOS $Id: sgabios.S 8 2010-04-22 00:03:40Z nlaredo $ (mockbuild@) Wed Aug 14 23:57:08 UTC 2013
    Term: 80x24
    4 0
    SeaBIOS (version 1.7.4-20140106_154858-)
    Booting from ROM...

Back to gdb, to find out _what_ file the above function is being executed
from:

    (gdb) c
    Continuing.
    ^C
    Program received signal SIGINT, Interrupt.
    0x00000997 in irq_stack_union ()
    (gdb) bt
    #0  0x00000997 in irq_stack_union ()
    #1  0x00000000 in ?? ()
    (gdb) list
    1       /*
    2        * Copyright 2002, 2003 Andi Kleen, SuSE Labs.
    3        *
    4        * This file is subject to the terms and conditions of the GNU General Public
    5        * License. See the file COPYING in the main directory of this archive
    6        * for more details. No warranty for anything given at all.
    7        */
    8       #include <linux/linkage.h>
    9       #include <asm/dwarf2.h>
    10      #include <asm/errno.h>
    (gdb)
    [. . .]
    (gdb)
    241     ENDPROC(csum_partial_copy_generic)
    (gdb)
    Line number 242 out of range; arch/x86/lib/csum-copy_64.S has 241 lines.
    (gdb)

PS: Paolo, I'll try to test with your new patch soon.

Thanks.

[1] https://github.com/libguestfs/libguestfs/blob/master/src/launch-direct.c#L404

> [*] https://bugzilla.kernel.org/show_bug.cgi?id=69491#c7

--
/kashyap
* Re: nVMX regression v3.13+, bisected
@ 2014-02-27 15:55 Kashyap Chamarthy
From: Kashyap Chamarthy @ 2014-02-27 15:55 UTC
To: Stefan Bader; +Cc: Paolo Bonzini, kvm, Anthoine Bourgeois
On Thu, Feb 27, 2014 at 05:40:56PM +0530, Kashyap Chamarthy wrote:
[. . .]
> From a different shell, I invoke gdb like that:
>
>     (gdb) symbol-file /usr/lib/debug/lib/modules/3.14.0-0.rc4.git0.1.fc21.x86_64/vmlinux

Disregard me here. I loaded the wrong symbol file (thanks to David Gilbert for
spotting that on IRC) :-(

I just issued a Fedora Kernel test build[1] with Paolo's patch; I will see how
it goes.

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=6577135

--
/kashyap
* Re: nVMX regression v3.13+, bisected
@ 2014-02-27 10:51 Paolo Bonzini
From: Paolo Bonzini @ 2014-02-27 10:51 UTC
To: Stefan Bader, kvm; +Cc: Anthoine Bourgeois
On 26/02/2014 21:27, Stefan Bader wrote:
[. . .]
> Great thanks. And maybe it helps if I actually add the link to the bug report as
> I had intended... :-P

I don't have my usual test machine available, but here is a possible guess.
nested_read_cr0 is the CR0 as read by L2, but here we want to look at the
CR0 value reflecting L1's setup. This would suggest the following untested
patch:

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a06f101ef64b..0d90601a2681 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6688,7 +6688,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 	else if (is_page_fault(intr_info))
 		return enable_ept;
 	else if (is_no_device(intr_info) &&
-		 !(nested_read_cr0(vmcs12) & X86_CR0_TS))
+		 !(vmcs12->guest_cr0 & X86_CR0_TS))
 		return 0;
 	return vmcs12->exception_bitmap &
 		(1u << (intr_info & INTR_INFO_VECTOR_MASK));

Paolo
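To make the distinction in Paolo's guess concrete, the following C sketch compares the two CR0 views. It rests on an assumption not stated in the thread: that nested_read_cr0 merges guest_cr0 with the CR0 read shadow under the guest/host mask, as VMX CR0 shadowing normally works. `struct vmcs12_sketch` and the three helper functions are hypothetical stand-ins, not kernel definitions.

```c
#define X86_CR0_TS (1ul << 3) /* CR0.TS is bit 3 */

/* Hypothetical subset of the vmcs12 fields involved in the check. */
struct vmcs12_sketch {
	unsigned long guest_cr0;           /* CR0 that L1 set up for L2 */
	unsigned long cr0_read_shadow;     /* value L2 sees for L1-owned bits */
	unsigned long cr0_guest_host_mask; /* bits owned by L1 */
};

/* Assumed behavior of nested_read_cr0: the CR0 as L2 would read it. */
static unsigned long cr0_as_read_by_l2(const struct vmcs12_sketch *v)
{
	return (v->guest_cr0 & ~v->cr0_guest_host_mask) |
	       (v->cr0_read_shadow & v->cr0_guest_host_mask);
}

/* Check before the patch: 1 means L0 handles the #NM itself. */
static int l0_handles_nm_before(const struct vmcs12_sketch *v)
{
	return !(cr0_as_read_by_l2(v) & X86_CR0_TS);
}

/* Check after the patch: look at the CR0 reflecting L1's setup. */
static int l0_handles_nm_after(const struct vmcs12_sketch *v)
{
	return !(v->guest_cr0 & X86_CR0_TS);
}
```

With guest_cr0 having TS set, the read shadow having TS clear, and the mask covering TS, the two checks disagree: the old one keeps the fault in L0 while the new one falls through to the exception bitmap test. That combination is chosen purely for illustration; the thread does not confirm it is the exact state that triggered the hang.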
* Re: nVMX regression v3.13+, bisected
@ 2014-02-27 13:41 anthoine.bourgeois
From: anthoine.bourgeois @ 2014-02-27 13:41 UTC
To: Paolo Bonzini; +Cc: Anthoine Bourgeois, kvm, Stefan Bader
-----Paolo Bonzini <paolo.bonzini@gmail.com> wrote: -----
[. . .]
> I don't have my usual test machine available, but here is a possible guess.
> nested_read_cr0 is the CR0 as read by L2, but here we want to look at the
> CR0 value reflecting L1's setup. This would suggest the following untested
> patch:
[. . .]

Hi,

I am installing a new test machine and will test the patch.
I'll report back as soon as possible.

Regards,
Anthoine
* Re: nVMX regression v3.13+, bisected
@ 2014-02-27 17:01 anthoine.bourgeois
From: anthoine.bourgeois @ 2014-02-27 17:01 UTC
To: Paolo Bonzini; +Cc: Anthoine Bourgeois, kvm, Stefan Bader, Kashyap Chamarthy
-----Paolo Bonzini <paolo.bonzini@gmail.com> wrote: -----
>To: Stefan Bader <stefan.bader@canonical.com>, kvm@vger.kernel.org
>From: Paolo Bonzini
>Sent by: Paolo Bonzini
>Date: 27/02/2014 11:51
>Cc: Anthoine Bourgeois <bourgeois@bertin.fr>
>Subject: Re: nVMX regression v3.13+, bisected
>
>[. . .]
>I don't have my usual test machine available, but here is a possible
>guess.
>nested_read_cr0 is the CR0 as read by L2, but here we want to look at
>the
>CR0 value reflecting L1's setup. This would suggest the following
>untested
>patch:
>[. . .]

OK, so your patch works perfectly well with both of my test machines (an Ubuntu
guest or a ChorusOS guest).
I am attaching the patch; can you sign off on it?

Regards,
Anthoine

PS: Sorry for my bad Lotus Notes mailer behaviour :-/

[-- Attachment #2: 0001-kvm-vmx-Fix-a-nested-cr0-read-on-NM-fault.patch --]
From 7f3102b23a2eb8fc070345675a60faaa45d6dd7c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 27 Feb 2014 17:46:43 +0100
Subject: [PATCH] kvm,vmx: Fix a nested cr0 read on NM fault

nested_read_cr0 is the CR0 as read by L2, but here we want to look at
the CR0 value reflecting L1's setup.

Fix bugzilla ID#69491

Reported-by: Kashyap Chamarthy <kchamart@redhat.com>
Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr>
---
 arch/x86/kvm/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index da7837e..dcc4de3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6644,7 +6644,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 	else if (is_page_fault(intr_info))
 		return enable_ept;
 	else if (is_no_device(intr_info) &&
-		 !(nested_read_cr0(vmcs12) & X86_CR0_TS))
+		 !(vmcs12->guest_cr0 & X86_CR0_TS))
 		return 0;
 	return vmcs12->exception_bitmap &
 		(1u << (intr_info & INTR_INFO_VECTOR_MASK));
--
1.7.9.5
* Re: nVMX regression v3.13+, bisected
@ 2014-02-27 16:58 Paolo Bonzini
From: Paolo Bonzini @ 2014-02-27 16:58 UTC
To: anthoine.bourgeois
Cc: Anthoine Bourgeois, kvm, Stefan Bader, Kashyap Chamarthy
On 27/02/2014 18:01, anthoine.bourgeois@bertin.fr wrote:
> OK, so your patch works perfectly well with both of my test machines (a Ubuntu guest or
> a ChorusOS guest).
> I join the patch, can you signof it ?

I'll post it tonight or tomorrow, thanks.

Paolo
* Re: nVMX regression v3.13+, bisected
@ 2014-02-27 21:34 Kashyap Chamarthy
From: Kashyap Chamarthy @ 2014-02-27 21:34 UTC
To: Paolo Bonzini; +Cc: anthoine.bourgeois, Anthoine Bourgeois, kvm, Stefan Bader
On Thu, Feb 27, 2014 at 05:58:46PM +0100, Paolo Bonzini wrote:
> On 27/02/2014 18:01, anthoine.bourgeois@bertin.fr wrote:
> >OK, so your patch works perfectly well with both of my test machines (a Ubuntu guest or
> >a ChorusOS guest).
> >I join the patch, can you signof it ?
>
> I'll post it tonight or tomorrow, thanks.

I tested two cases:

1. Built a Fedora Kernel[a] and installed it on both L0 and L1, then booted
   an L2 guest successfully[b].

   - I tried two things as L2 guest: (1) Run 'libguestfs-test-tool', which
     boots a minimal Kernel/initrd via the 'libvirt' backend; (2) A regular
     Fedora-20 guest with the QEMU command line produced by libvirt.

2. Built KVM git with Paolo's patch, installed the Kernel on L0 and L1
   (and rebooted both):

     $ git log | head -1
     commit 404381c5839d67aa0c275ad1da96ef3d3928ca2c

   In this case, booting an L1 (guest hypervisor) itself results in the
   stack trace below for me:

[. . .]
[ 0.039000] task: ffff880216078000 ti: ffff880216056000 task.ti: ffff880216056000
[ 0.039000] RIP: 0010:[<ffffffff81ceab25>] [<ffffffff81ceab25>] intel_pmu_init+0x2d9/0x8e6
[ 0.039000] RSP: 0000:ffff880216057e40 EFLAGS: 00000202
[ 0.039000] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000345
[ 0.039000] RDX: 0000000000000003 RSI: 0000000000000730 RDI: 0000ffffffffffff
[ 0.039000] RBP: ffff880216057e48 R08: 0000000000000001 R09: 0000000000000007
[ 0.039000] R10: ffffffff81cc0c60 R11: ffff880216057c16 R12: ffffffff81ce9a4f
[ 0.039000] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 0.039000] FS: 0000000000000000(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
[ 0.039000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.039000] CR2: ffff88021ffff000 CR3: 0000000001c0b000 CR4: 00000000001406f0
[ 0.039000] Stack:
[ 0.039000] 0000000000000000 ffff880216057e98 ffffffff81ce9a88 0000000000000000
[ 0.039000] ffff88000009b000 ffff880216057e98 0000000000000000 ffffffff81ce9a4f
[ 0.039000] 0000000000000000 0000000000000000 0000000000000000 ffff880216057f08
[ 0.039000] Call Trace:
[ 0.039000] [<ffffffff81ce9a88>] init_hw_perf_events+0x39/0x51a
[ 0.039000] [<ffffffff81ce9a4f>] ? check_bugs+0x2d/0x2d
[ 0.039000] [<ffffffff81000332>] do_one_initcall+0xf2/0x140
[ 0.039000] [<ffffffff81cef22f>] ? native_smp_prepare_cpus+0x30c/0x32a
[ 0.039000] [<ffffffff81ce2f19>] kernel_init_freeable+0xc4/0x1ec
[ 0.039000] [<ffffffff817aa230>] ? rest_init+0x80/0x80
[ 0.039000] [<ffffffff817aa239>] kernel_init+0x9/0xf0
[ 0.039000] [<ffffffff817c307c>] ret_from_fork+0x7c/0xb0
[ 0.039000] [<ffffffff817aa230>] ? rest_init+0x80/0x80
[ 0.039000] Code: 61 fd ff 44 89 0d e4 61 fd ff 89 0d 9e 62 fd ff 7e 2b 83 e2 1f b8 03 00 00 00 b9 45 03 00 00 83 fa 02 0f 4f c2 89 05 ab 61 fd ff <0f> 32 48 c1 e2 20 89 c0 48 09 c2 48 89 15 49 62 fd ff e8 64 10
[ 0.039000] RIP [<ffffffff81ceab25>] intel_pmu_init+0x2d9/0x8e6
[ 0.039000] RSP <ffff880216057e40>
[ 0.039013] ---[ end trace 6a1e6dd839222f3e ]---
[ 0.040007] swapper/0 (1) used greatest stack depth: 5720 bytes left
[ 0.041008] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Thanks, Paolo, for the fix.

[a] http://koji.fedoraproject.org/koji/taskinfo?taskID=6577700
[b] http://kashyapc.fedorapeople.org/temp/stdout-libguestfs-test-tool-in-L1-28FEB2014.txt

--
/kashyap