public inbox for linux-kernel@vger.kernel.org
* [PATCH] KVM: nVMX: always use early vmcs check when EPT is disabled
@ 2019-04-15 14:05 Paolo Bonzini
  2019-04-15 17:35 ` Sean Christopherson
  0 siblings, 1 reply; 3+ messages in thread
From: Paolo Bonzini @ 2019-04-15 14:05 UTC
  To: linux-kernel, kvm; +Cc: Sean Christopherson

The remaining failures of vmx.flat when EPT is disabled are caused by
incorrectly reflecting VMfails to the L1 hypervisor.  What happens is
that nested_vmx_restore_host_state corrupts the guest CR3, reloading it
with the host's shadow CR3 instead, because it blindly loads GUEST_CR3
from the vmcs01.

For simplicity let's just always use hardware VMCS checks when EPT is
disabled.  This way, nested_vmx_restore_host_state is not reached at
all (or at least shouldn't be reached).
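
To make the failure mode concrete, here is a toy model in plain C.  It
is only a sketch and not KVM code: struct toy_vcpu, the toy_* helpers
and the three CR3 constants are all made up for illustration.  It
assumes that without EPT the vmcs01 GUEST_CR3 field holds KVM's shadow
root rather than L1's own CR3, as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values, chosen only so the three CR3s are distinguishable. */
#define L1_CR3     0x1000UL	/* CR3 as L1 programmed it */
#define SHADOW_CR3 0x9000UL	/* root of the shadow page tables */
#define L2_CR3     0x2000UL	/* CR3 that L1 set up for L2 */

/* Hypothetical stand-in for the two places CR3 lives. */
struct toy_vcpu {
	unsigned long arch_cr3;		/* software model (vcpu->arch.cr3) */
	unsigned long vmcs01_guest_cr3;	/* what hardware runs L1 with */
};

/* While L1 runs: with EPT the VMCS holds L1's CR3, without it the shadow root. */
static void toy_run_l1(struct toy_vcpu *v, bool ept)
{
	v->arch_cr3 = L1_CR3;
	v->vmcs01_guest_cr3 = ept ? L1_CR3 : SHADOW_CR3;
}

/*
 * "Late" VMfail: L2 state was already loaded into the software model,
 * so the restore path reloads CR3 blindly from vmcs01.
 */
static unsigned long toy_late_vmfail(struct toy_vcpu *v)
{
	v->arch_cr3 = L2_CR3;
	v->arch_cr3 = v->vmcs01_guest_cr3;
	return v->arch_cr3;
}

/* Early (hardware) check: VMfail is caught before any L2 state is loaded. */
static unsigned long toy_early_vmfail(struct toy_vcpu *v)
{
	return v->arch_cr3;
}
```

In this model, toy_late_vmfail() after toy_run_l1(&v, false) hands back
SHADOW_CR3, while toy_early_vmfail() leaves L1_CR3 in place, which is
the reason for forcing the early check when EPT is off.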
---
 arch/x86/include/uapi/asm/vmx.h |  1 +
 arch/x86/kvm/vmx/nested.c       | 22 ++++++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index f0b0c90dd398..d213ec5c3766 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -146,6 +146,7 @@
 
 #define VMX_ABORT_SAVE_GUEST_MSR_FAIL        1
 #define VMX_ABORT_LOAD_HOST_PDPTE_FAIL       2
+#define VMX_ABORT_VMCS_CORRUPTED             3
 #define VMX_ABORT_LOAD_HOST_MSR_FAIL         4
 
 #endif /* _UAPIVMX_H */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a22af5a85540..6401eb7ef19c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3796,8 +3796,18 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
 	vmx_set_cr4(vcpu, vmcs_readl(CR4_READ_SHADOW));
 
 	nested_ept_uninit_mmu_context(vcpu);
-	vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
-	__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
+
+	/*
+	 * This is only valid if EPT is in use, otherwise the vmcs01 GUEST_CR3
+	 * points to shadow pages!  Fortunately we only get here after a WARN_ON
+	 * if EPT is disabled, so a VMabort is perfectly fine.
+	 */
+	if (enable_ept) {
+		vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
+		__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
+	} else {
+		nested_vmx_abort(vcpu, VMX_ABORT_VMCS_CORRUPTED);
+	}
 
 	/*
 	 * Use ept_save_pdptrs(vcpu) to load the MMU's cached PDPTRs
@@ -5745,6 +5755,14 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
 {
 	int i;
 
+	/*
+	 * Without EPT it is not possible to restore L1's CR3 and PDPTR on
+	 * VMfail, because they are not available in vmcs01.  Just always
+	 * use hardware checks.
+	 */
+	if (!enable_ept)
+		nested_early_check = 1;
+
 	if (!cpu_has_vmx_shadow_vmcs())
 		enable_shadow_vmcs = 0;
 	if (enable_shadow_vmcs) {
-- 
1.8.3.1


* Re: [PATCH] KVM: nVMX: always use early vmcs check when EPT is disabled
  2019-04-15 14:05 [PATCH] KVM: nVMX: always use early vmcs check when EPT is disabled Paolo Bonzini
@ 2019-04-15 17:35 ` Sean Christopherson
  2019-04-15 17:39   ` Sean Christopherson
  0 siblings, 1 reply; 3+ messages in thread
From: Sean Christopherson @ 2019-04-15 17:35 UTC
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Mon, Apr 15, 2019 at 04:05:56PM +0200, Paolo Bonzini wrote:
> The remaining failures of vmx.flat when EPT is disabled are caused by
> incorrectly reflecting VMfails to the L1 hypervisor.  What happens is
> that nested_vmx_restore_host_state corrupts the guest CR3, reloading it
> with the host's shadow CR3 instead, because it blindly loads GUEST_CR3
> from the vmcs01.
> 
> For simplicity let's just always use hardware VMCS checks when EPT is
> disabled.  This way, nested_vmx_restore_host_state is not reached at
> all (or at least shouldn't be reached).

At the risk of getting too clever, we can handle this scenario by stashing
L1's CR3 in vmcs01.GUEST_CR3 immediately prior to loading L2's state.
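
Roughly, the idea can be modelled as below.  This is a toy sketch in
plain C, not the kernel code: struct toy_vcpu and both toy_* helpers are
hypothetical names, and the MMU reset is reduced to a single assignment
of a caller-supplied shadow root.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two places CR3 lives. */
struct toy_vcpu {
	unsigned long arch_cr3;		/* software model (vcpu->arch.cr3) */
	unsigned long vmcs01_guest_cr3;	/* vmcs01.GUEST_CR3 */
};

/* Nested entry: without EPT, park L1's CR3 in vmcs01 before loading L2 state. */
static void toy_nested_enter(struct toy_vcpu *v, bool ept, unsigned long l2_cr3)
{
	if (!ept)
		v->vmcs01_guest_cr3 = v->arch_cr3;	/* overwrites the shadow root */
	v->arch_cr3 = l2_cr3;
}

/*
 * VMfail restore: CR3 comes back from vmcs01 as before, then the MMU
 * reset (modelled as one store) puts a fresh shadow root into vmcs01.
 */
static void toy_restore_host_state(struct toy_vcpu *v, unsigned long new_shadow_root)
{
	v->arch_cr3 = v->vmcs01_guest_cr3;
	v->vmcs01_guest_cr3 = new_shadow_root;
}
```

The stashed value is never meant to be consumed by hardware in this
model; it only survives until the restore path rewrites the field.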

The attached patch passes vmx.flat with ept=0; I haven't tested it beyond
that.

Side topic: your patch was missing your Signed-off-by.

[-- Attachment #2: 0001-KVM-nVMX-Stash-L1-s-CR3-in-vmcs01.GUEST_CR3-on-neste.patch --]
[-- Type: text/x-diff, Size: 2656 bytes --]

From 32ee6be1ba6490c59db6843b2f88e18539a49961 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson@intel.com>
Date: Mon, 15 Apr 2019 10:06:22 -0700
Subject: [PATCH] KVM: nVMX: Stash L1's CR3 in vmcs01.GUEST_CR3 on nested entry
 w/o EPT

KVM does not have 100% coverage of VMX consistency checks, i.e. some
checks that cause VM-Fail may only be detected by hardware during a
nested VM-Entry.  In such a case, KVM must restore L1's state to the
pre-VM-Enter state as L2's state has already been loaded into KVM's
software model.

L1's CR3 and PDPTRs in particular are loaded from vmcs01.GUEST_*.  But
when EPT is disabled, the associated fields hold KVM's shadow values,
not L1's "real" values.  Fortunately, when EPT is disabled the PDPTRs
come from memory, i.e. are not cached in the VMCS.  Which leaves CR3
as the sole anomaly.  Handle CR3 by overwriting vmcs01.GUEST_CR3 with
L1's CR3 during the nested VM-Entry when EPT is disabled *and* nested
early checks are disabled, so that nested_vmx_restore_host_state() will
naturally restore the correct vcpu->arch.cr3 from vmcs01.GUEST_CR3.

Note, these shenanigans work because nested_vmx_restore_host_state()
does a full kvm_mmu_reset_context(), i.e. unloads the current MMU,
which guarantees vmcs01.GUEST_CR3 will be rewritten with a new shadow
CR3 prior to re-entering L1.  Writing vmcs01.GUEST_CR3 is done if and
only if nested early checks are disabled as "late" VM-Fail should never
happen in that case (KVM WARNs), and the conditional write avoids the
need to restore the correct GUEST_CR3 when nested_vmx_check_vmentry_hw()
fails.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e5418f78a249..b974d2116f9e 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2982,6 +2982,8 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
 	if (kvm_mpx_supported() &&
 		!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
 		vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
+	if (!enable_ept)
+		vmcs_writel(GUEST_CR3, vcpu->arch.cr3);
 
 	vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02);
 
@@ -3802,7 +3804,8 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
 	 * VMFail, like everything else we just need to ensure our
 	 * software model is up-to-date.
 	 */
-	ept_save_pdptrs(vcpu);
+	if (enable_ept)
+		ept_save_pdptrs(vcpu);
 
 	kvm_mmu_reset_context(vcpu);
 
-- 
2.21.0


* Re: [PATCH] KVM: nVMX: always use early vmcs check when EPT is disabled
  2019-04-15 17:35 ` Sean Christopherson
@ 2019-04-15 17:39   ` Sean Christopherson
  0 siblings, 0 replies; 3+ messages in thread
From: Sean Christopherson @ 2019-04-15 17:39 UTC
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Mon, Apr 15, 2019 at 10:35:13AM -0700, Sean Christopherson wrote:
> On Mon, Apr 15, 2019 at 04:05:56PM +0200, Paolo Bonzini wrote:
> > The remaining failures of vmx.flat when EPT is disabled are caused by
> > incorrectly reflecting VMfails to the L1 hypervisor.  What happens is
> > that nested_vmx_restore_host_state corrupts the guest CR3, reloading it
> > with the host's shadow CR3 instead, because it blindly loads GUEST_CR3
> > from the vmcs01.
> > 
> > For simplicity let's just always use hardware VMCS checks when EPT is
> > disabled.  This way, nested_vmx_restore_host_state is not reached at
> > all (or at least shouldn't be reached).
> 
> At the risk of getting too clever, we can handle this scenario by stashing
> L1's CR3 in vmcs01.GUEST_CR3 immediately prior to loading L2's state.
> 
> The attached patch passes vmx.flat with ept=0; I haven't tested it beyond
> that.

Gah, forgot to regenerate the patch, correct version attached...

[-- Attachment #2: 0001-KVM-nVMX-Stash-L1-s-CR3-in-vmcs01.GUEST_CR3-on-neste.patch --]
[-- Type: text/x-diff, Size: 2792 bytes --]

From 332242a1f340af2bd313e5a5622985bdafae340f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson@intel.com>
Date: Mon, 15 Apr 2019 10:06:22 -0700
Subject: [PATCH] KVM: nVMX: Stash L1's CR3 in vmcs01.GUEST_CR3 on nested entry
 w/o EPT

KVM does not have 100% coverage of VMX consistency checks, i.e. some
checks that cause VM-Fail may only be detected by hardware during a
nested VM-Entry.  In such a case, KVM must restore L1's state to the
pre-VM-Enter state as L2's state has already been loaded into KVM's
software model.

L1's CR3 and PDPTRs in particular are loaded from vmcs01.GUEST_*.  But
when EPT is disabled, the associated fields hold KVM's shadow values,
not L1's "real" values.  Fortunately, when EPT is disabled the PDPTRs
come from memory, i.e. are not cached in the VMCS.  Which leaves CR3
as the sole anomaly.  Handle CR3 by overwriting vmcs01.GUEST_CR3 with
L1's CR3 during the nested VM-Entry when EPT is disabled *and* nested
early checks are disabled, so that nested_vmx_restore_host_state() will
naturally restore the correct vcpu->arch.cr3 from vmcs01.GUEST_CR3.

Note, these shenanigans work because nested_vmx_restore_host_state()
does a full kvm_mmu_reset_context(), i.e. unloads the current MMU,
which guarantees vmcs01.GUEST_CR3 will be rewritten with a new shadow
CR3 prior to re-entering L1.  Writing vmcs01.GUEST_CR3 is done if and
only if nested early checks are disabled as "late" VM-Fail should never
happen in that case (KVM WARNs), and the conditional write avoids the
need to restore the correct GUEST_CR3 when nested_vmx_check_vmentry_hw()
fails.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: bd18bffca353 ("KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e5418f78a249..196f33c0e707 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2982,6 +2982,8 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
 	if (kvm_mpx_supported() &&
 		!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
 		vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
+	if (!enable_ept && !nested_early_check)
+		vmcs_writel(GUEST_CR3, vcpu->arch.cr3);
 
 	vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02);
 
@@ -3802,7 +3804,8 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
 	 * VMFail, like everything else we just need to ensure our
 	 * software model is up-to-date.
 	 */
-	ept_save_pdptrs(vcpu);
+	if (enable_ept)
+		ept_save_pdptrs(vcpu);
 
 	kvm_mmu_reset_context(vcpu);
 
-- 
2.21.0

