From mboxrd@z Thu Jan  1 00:00:00 1970
From: Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: nVMX regression v3.13+, bisected
Date: Thu, 27 Feb 2014 11:51:08 +0100
Message-ID: <530F189C.5000200@redhat.com>
References: <530E43EC.7000600@canonical.com> <530E4DB9.5050001@redhat.com> <530E4E25.4050508@canonical.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Cc: Anthoine Bourgeois <bourgeois@bertin.fr>
To: Stefan Bader <stefan.bader@canonical.com>, kvm@vger.kernel.org
Return-path: <kvm-owner@vger.kernel.org>
Received: from mail-qa0-f52.google.com ([209.85.216.52]:55647 "EHLO
	mail-qa0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750993AbaB0KvM (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 27 Feb 2014 05:51:12 -0500
Received: by mail-qa0-f52.google.com with SMTP id j15so3882870qaq.11
        for <kvm@vger.kernel.org>; Thu, 27 Feb 2014 02:51:11 -0800 (PST)
In-Reply-To: <530E4E25.4050508@canonical.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 26/02/2014 21:27, Stefan Bader wrote:
> On 26.02.2014 21:25, Paolo Bonzini wrote:
>> On 26/02/2014 20:43, Stefan Bader wrote:
>>> Hi,
>>>
>>> I was looking at a bug report[1] about a regression on nested VMX that started
>>> with kernel v3.13 (the same issue still exists in v3.14-rc4). The problem shows
>>> up when running a v3.13 kernel in L0 and then trying to launch an L2 (L1 was
>>> either a v3.2 kernel or v3.13, so it seemed to have no immediate influence). L2
>>> tries to boot an ISO image and hangs before the isolinux boot loader displays
>>> anything. A preinstalled hard disk image fails to boot, too.
>>>
>>> I bisected this and ended up on the following commit which, when reverted,
>>> made the launch work again:
>>>
>>> Author: Anthoine Bourgeois <bourgeois@bertin.fr>
>>> Date:   Wed Nov 13 11:45:37 2013 +0100
>>>
>>>     kvm, vmx: Fix lazy FPU on nested guest
>>>
>>>     If a nested guest does a NM fault but its CR0 doesn't contain the TS
>>>     flag (because it was already cleared by the guest with L1 aid) then we
>>>     have to activate FPU ourselves in L0 and then continue to L2. If TS flag
>>>     is set then we fallback on the previous behavior, forward the fault to
>>>     L1 if it asked for.
>>>
>>>     Signed-off-by: Anthoine Bourgeois <bourgeois@bertin.fr>
>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>
>>> The condition to exit to L0 seems to match what the description says.
>>> Could it be that the handling in L0 is doing something wrong?
>>
>> Thanks, I'll look at it tomorrow or Friday.
>>
>> Paolo
>>
> Great, thanks. And maybe it helps if I actually add the link to the bug report as
> I had intended... :-P

I don't have my usual test machine available, but here is a guess.
nested_read_cr0() returns CR0 as L2 reads it, but here we want the CR0
value that reflects L1's setup, i.e. vmcs12->guest_cr0.  This suggests the
following untested patch:

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a06f101ef64b..0d90601a2681 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6688,7 +6688,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 		else if (is_page_fault(intr_info))
 			return enable_ept;
 		else if (is_no_device(intr_info) &&
-			 !(nested_read_cr0(vmcs12) & X86_CR0_TS))
+			 !(vmcs12->guest_cr0 & X86_CR0_TS))
 			return 0;
 		return vmcs12->exception_bitmap &
 				(1u << (intr_info & INTR_INFO_VECTOR_MASK));
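
To illustrate the guess behind the patch above, here is a minimal userspace
sketch of how the two checks can disagree.  It is not kernel code: struct
vmcs12_model, model_read_cr0() and the nm_handled_in_l0_*() helpers are
hypothetical names made up for this example, and the struct keeps only the
three CR0-related fields.  It assumes a case where L1 owns CR0.TS through
the guest/host mask, shows TS as clear in the read shadow, and keeps TS set
in the guest CR0.

```c
#include <stdbool.h>

#define X86_CR0_TS (1ul << 3)	/* CR0.TS is bit 3 */

/* Hypothetical, cut-down stand-in for the kernel's struct vmcs12. */
struct vmcs12_model {
	unsigned long guest_cr0;		/* CR0 that L1 programmed for L2 */
	unsigned long cr0_guest_host_mask;	/* CR0 bits owned by L1 */
	unsigned long cr0_read_shadow;		/* what L2 reads for owned bits */
};

/* CR0 as L2 reads it: read shadow for owned bits, guest_cr0 otherwise. */
unsigned long model_read_cr0(const struct vmcs12_model *v)
{
	return (v->guest_cr0 & ~v->cr0_guest_host_mask) |
	       (v->cr0_read_shadow & v->cr0_guest_host_mask);
}

/* Check before the patch: true means L0 handles the #NM itself. */
bool nm_handled_in_l0_old(const struct vmcs12_model *v)
{
	return !(model_read_cr0(v) & X86_CR0_TS);
}

/* Check after the patch: looks at the CR0 value L1 set up. */
bool nm_handled_in_l0_new(const struct vmcs12_model *v)
{
	return !(v->guest_cr0 & X86_CR0_TS);
}

/* Assumed scenario: L1 owns TS, hides it from L2, but keeps it set. */
struct vmcs12_model model_lazy_fpu_case(void)
{
	struct vmcs12_model v = {
		.guest_cr0 = X86_CR0_TS,
		.cr0_guest_host_mask = X86_CR0_TS,
		.cr0_read_shadow = 0,
	};
	return v;
}
```

In this assumed scenario the old check sees TS clear and keeps the #NM in
L0, while the new check sees TS set and falls through to the exception
bitmap test, so L1 gets the fault if it asked for it.  Whether this is
what actually happens in the reported hang is still untested.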


Paolo