kvm.vger.kernel.org archive mirror
* [PATCH v7 00/15] Nested EPT
@ 2013-08-05  8:07 Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1 Gleb Natapov
                   ` (15 more replies)
  0 siblings, 16 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

Xiao's comment about checking the EPT pointer before flushing an individual
EPT context is addressed here.

Gleb Natapov (3):
  nEPT: make guest's A/D bits depends on guest's paging mode
  nEPT: Support shadow paging for guest paging without A/D bits
  nEPT: correctly check if remote tlb flush is needed for shadowed EPT
    tables

Nadav Har'El (10):
  nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  nEPT: Fix cr3 handling in nested exit and entry
  nEPT: Fix wrong test in kvm_set_cr3
  nEPT: Move common code to paging_tmpl.h
  nEPT: Add EPT tables support to paging_tmpl.h
  nEPT: MMU context for nested EPT
  nEPT: Nested INVEPT
  nEPT: Advertise EPT to L1
  nEPT: Some additional comments
  nEPT: Miscelleneous cleanups

Yang Zhang (2):
  nEPT: Redefine EPT-specific link_shadow_page()
  nEPT: Add nEPT violation/misconfigration support

 arch/x86/include/asm/kvm_host.h |    4 +
 arch/x86/include/asm/vmx.h      |    2 +
 arch/x86/include/uapi/asm/vmx.h |    1 +
 arch/x86/kvm/mmu.c              |  170 +++++++++++++++++-------------
 arch/x86/kvm/mmu.h              |    2 +
 arch/x86/kvm/paging_tmpl.h      |  176 +++++++++++++++++++++++++++----
 arch/x86/kvm/vmx.c              |  220 ++++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/x86.c              |   11 --
 8 files changed, 467 insertions(+), 119 deletions(-)

-- 
1.7.10.4



* [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-05 11:27   ` Arthur Chunqi Li
  2013-08-05  8:07 ` [PATCH v7 02/15] nEPT: Fix cr3 handling in nested exit and entry Gleb Natapov
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

From: Nadav Har'El <nyh@il.ibm.com>

Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577,
switches the EFER MSR when EPT is used and the host and guest have different
NX bits. So if we add support for nested EPT (an L1 guest using EPT to run L2)
and want to be able to run recent KVM as L1, we need to allow L1 to use this
EFER switching feature.

To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available,
and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds
support for the former (the latter is still unsupported).

Nested entry and exit emulation (prepare_vmcs02 and load_vmcs12_host_state,
respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all
that's left to do in this patch is to properly advertise this feature to L1.

Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using
vmx_set_efer (which itself sets one of several vmcs02 fields), so we always
support this feature, regardless of whether the host supports it.
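
As a rough illustration of what advertising these bits buys L1 (a standalone
sketch, not part of this patch: rdmsr64() and l1_can_load_efer_on_exit() are
made-up helper names; only the MSR index and the control bit are architectural):

#include <stdint.h>

#define MSR_IA32_VMX_EXIT_CTLS  0x483
#define VM_EXIT_LOAD_IA32_EFER  0x00200000

static uint64_t rdmsr64(uint32_t msr)
{
        uint32_t lo, hi;

        asm volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
        return ((uint64_t)hi << 32) | lo;
}

/* L1 checks the allowed-1 half of the capability MSR before it sets the
 * "load IA32_EFER" bit in its vmcs12 exit controls. */
static int l1_can_load_efer_on_exit(void)
{
        uint32_t allowed1 = rdmsr64(MSR_IA32_VMX_EXIT_CTLS) >> 32;

        return !!(allowed1 & VM_EXIT_LOAD_IA32_EFER);
}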

Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/vmx.c |   23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e999dc7..27efa6a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2198,7 +2198,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #else
 	nested_vmx_exit_ctls_high = 0;
 #endif
-	nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
+	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
+				      VM_EXIT_LOAD_IA32_EFER);
 
 	/* entry controls */
 	rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
@@ -2207,8 +2208,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 	nested_vmx_entry_ctls_low = VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
 	nested_vmx_entry_ctls_high &=
 		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_IA32E_MODE;
-	nested_vmx_entry_ctls_high |= VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
-
+	nested_vmx_entry_ctls_high |= (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR |
+				       VM_ENTRY_LOAD_IA32_EFER);
 	/* cpu-based controls */
 	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
 		nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high);
@@ -7529,10 +7530,18 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vcpu->arch.cr0_guest_owned_bits &= ~vmcs12->cr0_guest_host_mask;
 	vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);
 
-	/* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer below */
-	vmcs_write32(VM_EXIT_CONTROLS,
-		vmcs12->vm_exit_controls | vmcs_config.vmexit_ctrl);
-	vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls |
+	/* L2->L1 exit controls are emulated - the hardware exit is to L0 so
+	 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
+	 * bits are further modified by vmx_set_efer() below.
+	 */
+	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+
+	/* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
+	 * emulated by vmx_set_efer(), below.
+	 */
+	vmcs_write32(VM_ENTRY_CONTROLS,
+		(vmcs12->vm_entry_controls & ~VM_ENTRY_LOAD_IA32_EFER &
+			~VM_ENTRY_IA32E_MODE) |
 		(vmcs_config.vmentry_ctrl & ~VM_ENTRY_IA32E_MODE));
 
 	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT)
-- 
1.7.10.4



* [PATCH v7 02/15] nEPT: Fix cr3 handling in nested exit and entry
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1 Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 03/15] nEPT: Fix wrong test in kvm_set_cr3 Gleb Natapov
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

From: Nadav Har'El <nyh@il.ibm.com>

The existing code for handling cr3 and related VMCS fields during nested
exit and entry wasn't correct in all cases:

If L2 is allowed to control cr3 (and this is indeed the case in nested EPT),
during nested exit we must copy the modified cr3 from vmcs02 to vmcs12, and
we forgot to do so. This patch adds this copy.

If L0 isn't controlling cr3 when running L2 (i.e., L0 is using EPT), and
whoever does control cr3 (L1 or L2) is using PAE, the processor might have
saved PDPTEs and we should also save them in vmcs12 (and restore later).

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/vmx.c |   26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 27efa6a..4e98764 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7595,6 +7595,16 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	kvm_set_cr3(vcpu, vmcs12->guest_cr3);
 	kvm_mmu_reset_context(vcpu);
 
+	/*
+	 * L1 may access the L2's PDPTR, so save them to construct vmcs12
+	 */
+	if (enable_ept) {
+		vmcs_write64(GUEST_PDPTR0, vmcs12->guest_pdptr0);
+		vmcs_write64(GUEST_PDPTR1, vmcs12->guest_pdptr1);
+		vmcs_write64(GUEST_PDPTR2, vmcs12->guest_pdptr2);
+		vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
+	}
+
 	kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->guest_rsp);
 	kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12->guest_rip);
 }
@@ -7917,6 +7927,22 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vmcs12->guest_pending_dbg_exceptions =
 		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
 
+	/*
+	 * In some cases (usually, nested EPT), L2 is allowed to change its
+	 * own CR3 without exiting. If it has changed it, we must keep it.
+	 * Of course, if L0 is using shadow page tables, GUEST_CR3 was defined
+	 * by L0, not L1 or L2, so we mustn't unconditionally copy it to vmcs12.
+	 *
+	 * Additionally, restore L2's PDPTR to vmcs12.
+	 */
+	if (enable_ept) {
+		vmcs12->guest_cr3 = vmcs_read64(GUEST_CR3);
+		vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
+		vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
+		vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);
+		vmcs12->guest_pdptr3 = vmcs_read64(GUEST_PDPTR3);
+	}
+
 	vmcs12->vm_entry_controls =
 		(vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) |
 		(vmcs_read32(VM_ENTRY_CONTROLS) & VM_ENTRY_IA32E_MODE);
-- 
1.7.10.4



* [PATCH v7 03/15] nEPT: Fix wrong test in kvm_set_cr3
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1 Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 02/15] nEPT: Fix cr3 handling in nested exit and entry Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 04/15] nEPT: Move common code to paging_tmpl.h Gleb Natapov
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

From: Nadav Har'El <nyh@il.ibm.com>

kvm_set_cr3() attempts to check if the new cr3 is a valid guest physical
address. The problem is that with nested EPT, cr3 is an *L2* physical
address, not an L1 physical address as this test expects.

As the comment above this test explains, it isn't necessary, and doesn't
correspond to anything a real processor would do. So this patch removes it.

Note that this wrong test could have also theoretically caused problems
in nested NPT, not just in nested EPT. However, in practice, the problem
was avoided: nested_svm_vmexit()/vmrun() do not call kvm_set_cr3 in the
nested NPT case, and instead set the vmcb (and arch.cr3) directly, thus
circumventing the problem. Additional potential calls to the buggy function
are avoided in that we don't trap cr3 modifications when nested NPT is
enabled. However, because in nested VMX we did want to use kvm_set_cr3()
(as requested in Avi Kivity's review of the original nested VMX patches),
we can't avoid this problem and need to fix it.

Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/x86.c |   11 -----------
 1 file changed, 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d2caeb9..e2fef8b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -682,17 +682,6 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 		 */
 	}
 
-	/*
-	 * Does the new cr3 value map to physical memory? (Note, we
-	 * catch an invalid cr3 even in real-mode, because it would
-	 * cause trouble later on when we turn on paging anyway.)
-	 *
-	 * A real CPU would silently accept an invalid cr3 and would
-	 * attempt to use it - with largely undefined (and often hard
-	 * to debug) behavior on the guest side.
-	 */
-	if (unlikely(!gfn_to_memslot(vcpu->kvm, cr3 >> PAGE_SHIFT)))
-		return 1;
 	vcpu->arch.cr3 = cr3;
 	__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
 	vcpu->arch.mmu.new_cr3(vcpu);
-- 
1.7.10.4



* [PATCH v7 04/15] nEPT: Move common code to paging_tmpl.h
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
                   ` (2 preceding siblings ...)
  2013-08-05  8:07 ` [PATCH v7 03/15] nEPT: Fix wrong test in kvm_set_cr3 Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 05/15] nEPT: make guest's A/D bits depends on guest's paging mode Gleb Natapov
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

From: Nadav Har'El <nyh@il.ibm.com>

In preparation, move gpte_access(), prefetch_invalid_gpte(),
is_rsvd_bits_set(), protect_clean_gpte() and is_dirty_gpte() from mmu.c
to paging_tmpl.h.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/mmu.c         |   55 ------------------------------
 arch/x86/kvm/paging_tmpl.h |   80 +++++++++++++++++++++++++++++++++++++-------
 2 files changed, 68 insertions(+), 67 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3a9493a..4c4274d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -331,11 +331,6 @@ static int is_large_pte(u64 pte)
 	return pte & PT_PAGE_SIZE_MASK;
 }
 
-static int is_dirty_gpte(unsigned long pte)
-{
-	return pte & PT_DIRTY_MASK;
-}
-
 static int is_rmap_spte(u64 pte)
 {
 	return is_shadow_present_pte(pte);
@@ -2574,14 +2569,6 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
 	mmu_free_roots(vcpu);
 }
 
-static bool is_rsvd_bits_set(struct kvm_mmu *mmu, u64 gpte, int level)
-{
-	int bit7;
-
-	bit7 = (gpte >> 7) & 1;
-	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
-}
-
 static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
 				     bool no_dirty_log)
 {
@@ -2594,26 +2581,6 @@ static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
 	return gfn_to_pfn_memslot_atomic(slot, gfn);
 }
 
-static bool prefetch_invalid_gpte(struct kvm_vcpu *vcpu,
-				  struct kvm_mmu_page *sp, u64 *spte,
-				  u64 gpte)
-{
-	if (is_rsvd_bits_set(&vcpu->arch.mmu, gpte, PT_PAGE_TABLE_LEVEL))
-		goto no_present;
-
-	if (!is_present_gpte(gpte))
-		goto no_present;
-
-	if (!(gpte & PT_ACCESSED_MASK))
-		goto no_present;
-
-	return false;
-
-no_present:
-	drop_spte(vcpu->kvm, spte);
-	return true;
-}
-
 static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 				    struct kvm_mmu_page *sp,
 				    u64 *start, u64 *end)
@@ -3501,18 +3468,6 @@ static void paging_free(struct kvm_vcpu *vcpu)
 	nonpaging_free(vcpu);
 }
 
-static inline void protect_clean_gpte(unsigned *access, unsigned gpte)
-{
-	unsigned mask;
-
-	BUILD_BUG_ON(PT_WRITABLE_MASK != ACC_WRITE_MASK);
-
-	mask = (unsigned)~ACC_WRITE_MASK;
-	/* Allow write access to dirty gptes */
-	mask |= (gpte >> (PT_DIRTY_SHIFT - PT_WRITABLE_SHIFT)) & PT_WRITABLE_MASK;
-	*access &= mask;
-}
-
 static bool sync_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
 			   unsigned access, int *nr_present)
 {
@@ -3530,16 +3485,6 @@ static bool sync_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
 	return false;
 }
 
-static inline unsigned gpte_access(struct kvm_vcpu *vcpu, u64 gpte)
-{
-	unsigned access;
-
-	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
-	access &= ~(gpte >> PT64_NX_SHIFT);
-
-	return access;
-}
-
 static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gpte)
 {
 	unsigned index;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 7769699..fb26ca9 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -80,6 +80,31 @@ static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl)
 	return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT;
 }
 
+static inline void FNAME(protect_clean_gpte)(unsigned *access, unsigned gpte)
+{
+	unsigned mask;
+
+	BUILD_BUG_ON(PT_WRITABLE_MASK != ACC_WRITE_MASK);
+
+	mask = (unsigned)~ACC_WRITE_MASK;
+	/* Allow write access to dirty gptes */
+	mask |= (gpte >> (PT_DIRTY_SHIFT - PT_WRITABLE_SHIFT)) & PT_WRITABLE_MASK;
+	*access &= mask;
+}
+
+static bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte, int level)
+{
+	int bit7;
+
+	bit7 = (gpte >> 7) & 1;
+	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
+}
+
+static inline int FNAME(is_present_gpte)(unsigned long pte)
+{
+	return is_present_gpte(pte);
+}
+
 static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			       pt_element_t __user *ptep_user, unsigned index,
 			       pt_element_t orig_pte, pt_element_t new_pte)
@@ -103,6 +128,36 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return (ret != orig_pte);
 }
 
+static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
+				  struct kvm_mmu_page *sp, u64 *spte,
+				  u64 gpte)
+{
+	if (FNAME(is_rsvd_bits_set)(&vcpu->arch.mmu, gpte, PT_PAGE_TABLE_LEVEL))
+		goto no_present;
+
+	if (!FNAME(is_present_gpte)(gpte))
+		goto no_present;
+
+	if (!(gpte & PT_ACCESSED_MASK))
+		goto no_present;
+
+	return false;
+
+no_present:
+	drop_spte(vcpu->kvm, spte);
+	return true;
+}
+
+static inline unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, u64 gpte)
+{
+	unsigned access;
+
+	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
+	access &= ~(gpte >> PT64_NX_SHIFT);
+
+	return access;
+}
+
 static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 					     struct kvm_mmu *mmu,
 					     struct guest_walker *walker,
@@ -123,7 +178,8 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 			trace_kvm_mmu_set_accessed_bit(table_gfn, index, sizeof(pte));
 			pte |= PT_ACCESSED_MASK;
 		}
-		if (level == walker->level && write_fault && !is_dirty_gpte(pte)) {
+		if (level == walker->level && write_fault &&
+				!(pte & PT_DIRTY_MASK)) {
 			trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
 			pte |= PT_DIRTY_MASK;
 		}
@@ -170,7 +226,7 @@ retry_walk:
 	if (walker->level == PT32E_ROOT_LEVEL) {
 		pte = mmu->get_pdptr(vcpu, (addr >> 30) & 3);
 		trace_kvm_mmu_paging_element(pte, walker->level);
-		if (!is_present_gpte(pte))
+		if (!FNAME(is_present_gpte)(pte))
 			goto error;
 		--walker->level;
 	}
@@ -215,17 +271,17 @@ retry_walk:
 
 		trace_kvm_mmu_paging_element(pte, walker->level);
 
-		if (unlikely(!is_present_gpte(pte)))
+		if (unlikely(!FNAME(is_present_gpte)(pte)))
 			goto error;
 
-		if (unlikely(is_rsvd_bits_set(&vcpu->arch.mmu, pte,
+		if (unlikely(FNAME(is_rsvd_bits_set)(&vcpu->arch.mmu, pte,
 					      walker->level))) {
 			errcode |= PFERR_RSVD_MASK | PFERR_PRESENT_MASK;
 			goto error;
 		}
 
 		accessed_dirty &= pte;
-		pte_access = pt_access & gpte_access(vcpu, pte);
+		pte_access = pt_access & FNAME(gpte_access)(vcpu, pte);
 
 		walker->ptes[walker->level - 1] = pte;
 	} while (!is_last_gpte(mmu, walker->level, pte));
@@ -248,7 +304,7 @@ retry_walk:
 	walker->gfn = real_gpa >> PAGE_SHIFT;
 
 	if (!write_fault)
-		protect_clean_gpte(&pte_access, pte);
+		FNAME(protect_clean_gpte)(&pte_access, pte);
 	else
 		/*
 		 * On a write fault, fold the dirty bit into accessed_dirty by
@@ -309,14 +365,14 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	gfn_t gfn;
 	pfn_t pfn;
 
-	if (prefetch_invalid_gpte(vcpu, sp, spte, gpte))
+	if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
 		return false;
 
 	pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
 
 	gfn = gpte_to_gfn(gpte);
-	pte_access = sp->role.access & gpte_access(vcpu, gpte);
-	protect_clean_gpte(&pte_access, gpte);
+	pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+	FNAME(protect_clean_gpte)(&pte_access, gpte);
 	pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
 			no_dirty_log && (pte_access & ACC_WRITE_MASK));
 	if (is_error_pfn(pfn))
@@ -785,15 +841,15 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 					  sizeof(pt_element_t)))
 			return -EINVAL;
 
-		if (prefetch_invalid_gpte(vcpu, sp, &sp->spt[i], gpte)) {
+		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
 			vcpu->kvm->tlbs_dirty++;
 			continue;
 		}
 
 		gfn = gpte_to_gfn(gpte);
 		pte_access = sp->role.access;
-		pte_access &= gpte_access(vcpu, gpte);
-		protect_clean_gpte(&pte_access, gpte);
+		pte_access &= FNAME(gpte_access)(vcpu, gpte);
+		FNAME(protect_clean_gpte)(&pte_access, gpte);
 
 		if (sync_mmio_spte(vcpu->kvm, &sp->spt[i], gfn, pte_access,
 		      &nr_present))
-- 
1.7.10.4



* [PATCH v7 05/15] nEPT: make guest's A/D bits depends on guest's paging mode
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
                   ` (3 preceding siblings ...)
  2013-08-05  8:07 ` [PATCH v7 04/15] nEPT: Move common code to paging_tmpl.h Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 06/15] nEPT: Support shadow paging for guest paging without A/D bits Gleb Natapov
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

This patch makes the guest A/D bit definitions depend on the guest's paging
mode, so that when EPT support is added it will be possible to define them
differently.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/paging_tmpl.h |   30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index fb26ca9..7581395 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -32,6 +32,10 @@
 	#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
 	#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
 	#define PT_LEVEL_BITS PT64_LEVEL_BITS
+	#define PT_GUEST_ACCESSED_MASK PT_ACCESSED_MASK
+	#define PT_GUEST_DIRTY_MASK PT_DIRTY_MASK
+	#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
+	#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
 	#ifdef CONFIG_X86_64
 	#define PT_MAX_FULL_LEVELS 4
 	#define CMPXCHG cmpxchg
@@ -49,6 +53,10 @@
 	#define PT_INDEX(addr, level) PT32_INDEX(addr, level)
 	#define PT_LEVEL_BITS PT32_LEVEL_BITS
 	#define PT_MAX_FULL_LEVELS 2
+	#define PT_GUEST_ACCESSED_MASK PT_ACCESSED_MASK
+	#define PT_GUEST_DIRTY_MASK PT_DIRTY_MASK
+	#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
+	#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
 	#define CMPXCHG cmpxchg
 #else
 	#error Invalid PTTYPE value
@@ -88,7 +96,8 @@ static inline void FNAME(protect_clean_gpte)(unsigned *access, unsigned gpte)
 
 	mask = (unsigned)~ACC_WRITE_MASK;
 	/* Allow write access to dirty gptes */
-	mask |= (gpte >> (PT_DIRTY_SHIFT - PT_WRITABLE_SHIFT)) & PT_WRITABLE_MASK;
+	mask |= (gpte >> (PT_GUEST_DIRTY_SHIFT - PT_WRITABLE_SHIFT)) &
+		PT_WRITABLE_MASK;
 	*access &= mask;
 }
 
@@ -138,7 +147,7 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
 	if (!FNAME(is_present_gpte)(gpte))
 		goto no_present;
 
-	if (!(gpte & PT_ACCESSED_MASK))
+	if (!(gpte & PT_GUEST_ACCESSED_MASK))
 		goto no_present;
 
 	return false;
@@ -174,14 +183,14 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 		table_gfn = walker->table_gfn[level - 1];
 		ptep_user = walker->ptep_user[level - 1];
 		index = offset_in_page(ptep_user) / sizeof(pt_element_t);
-		if (!(pte & PT_ACCESSED_MASK)) {
+		if (!(pte & PT_GUEST_ACCESSED_MASK)) {
 			trace_kvm_mmu_set_accessed_bit(table_gfn, index, sizeof(pte));
-			pte |= PT_ACCESSED_MASK;
+			pte |= PT_GUEST_ACCESSED_MASK;
 		}
 		if (level == walker->level && write_fault &&
-				!(pte & PT_DIRTY_MASK)) {
+				!(pte & PT_GUEST_DIRTY_MASK)) {
 			trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
-			pte |= PT_DIRTY_MASK;
+			pte |= PT_GUEST_DIRTY_MASK;
 		}
 		if (pte == orig_pte)
 			continue;
@@ -235,7 +244,7 @@ retry_walk:
 	ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
 	       (mmu->get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
 
-	accessed_dirty = PT_ACCESSED_MASK;
+	accessed_dirty = PT_GUEST_ACCESSED_MASK;
 	pt_access = pte_access = ACC_ALL;
 	++walker->level;
 
@@ -310,7 +319,8 @@ retry_walk:
 		 * On a write fault, fold the dirty bit into accessed_dirty by
 		 * shifting it one place right.
 		 */
-		accessed_dirty &= pte >> (PT_DIRTY_SHIFT - PT_ACCESSED_SHIFT);
+		accessed_dirty &= pte >>
+			(PT_GUEST_DIRTY_SHIFT - PT_GUEST_ACCESSED_SHIFT);
 
 	if (unlikely(!accessed_dirty)) {
 		ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker, write_fault);
@@ -886,3 +896,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 #undef gpte_to_gfn
 #undef gpte_to_gfn_lvl
 #undef CMPXCHG
+#undef PT_GUEST_ACCESSED_MASK
+#undef PT_GUEST_DIRTY_MASK
+#undef PT_GUEST_DIRTY_SHIFT
+#undef PT_GUEST_ACCESSED_SHIFT
-- 
1.7.10.4



* [PATCH v7 06/15] nEPT: Support shadow paging for guest paging without A/D bits
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
                   ` (4 preceding siblings ...)
  2013-08-05  8:07 ` [PATCH v7 05/15] nEPT: make guest's A/D bits depends on guest's paging mode Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 07/15] nEPT: Add EPT tables support to paging_tmpl.h Gleb Natapov
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

Some guest paging modes do not support A/D bits. Add support for such
modes in the shadow paging code. For such modes, PT_GUEST_DIRTY_MASK,
PT_GUEST_ACCESSED_MASK, PT_GUEST_DIRTY_SHIFT and PT_GUEST_ACCESSED_SHIFT
should be set to zero.
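
A minimal standalone illustration of why a zero mask is enough to compile the
A/D handling away for such modes (toy code; update_dirty_bit() is a stand-in
for the real update_accessed_dirty_bits()/protect_clean_gpte() logic):

#define PT_GUEST_DIRTY_MASK 0   /* as for a paging mode without A/D bits */

static void update_dirty_bit(unsigned long *gpte)
{
        /* dirty bit is not supported, so no need to track it */
        if (!PT_GUEST_DIRTY_MASK)
                return;         /* constant-folded: the rest is dead code */

        *gpte |= PT_GUEST_DIRTY_MASK;
}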

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/paging_tmpl.h |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 7581395..2c2f635 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -92,6 +92,10 @@ static inline void FNAME(protect_clean_gpte)(unsigned *access, unsigned gpte)
 {
 	unsigned mask;
 
+	/* dirty bit is not supported, so no need to track it */
+	if (!PT_GUEST_DIRTY_MASK)
+		return;
+
 	BUILD_BUG_ON(PT_WRITABLE_MASK != ACC_WRITE_MASK);
 
 	mask = (unsigned)~ACC_WRITE_MASK;
@@ -147,7 +151,8 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
 	if (!FNAME(is_present_gpte)(gpte))
 		goto no_present;
 
-	if (!(gpte & PT_GUEST_ACCESSED_MASK))
+	/* if accessed bit is not supported prefetch non accessed gpte */
+	if (PT_GUEST_ACCESSED_MASK && !(gpte & PT_GUEST_ACCESSED_MASK))
 		goto no_present;
 
 	return false;
@@ -178,6 +183,10 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 	gfn_t table_gfn;
 	int ret;
 
+	/* dirty/accessed bits are not supported, so no need to update them */
+	if (!PT_GUEST_DIRTY_MASK)
+		return 0;
+
 	for (level = walker->max_level; level >= walker->level; --level) {
 		pte = orig_pte = walker->ptes[level - 1];
 		table_gfn = walker->table_gfn[level - 1];
@@ -316,8 +325,9 @@ retry_walk:
 		FNAME(protect_clean_gpte)(&pte_access, pte);
 	else
 		/*
-		 * On a write fault, fold the dirty bit into accessed_dirty by
-		 * shifting it one place right.
+		 * On a write fault, fold the dirty bit into accessed_dirty.
+		 * For modes without A/D bits support accessed_dirty will be
+		 * always clear.
 		 */
 		accessed_dirty &= pte >>
 			(PT_GUEST_DIRTY_SHIFT - PT_GUEST_ACCESSED_SHIFT);
-- 
1.7.10.4



* [PATCH v7 07/15] nEPT: Add EPT tables support to paging_tmpl.h
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
                   ` (5 preceding siblings ...)
  2013-08-05  8:07 ` [PATCH v7 06/15] nEPT: Support shadow paging for guest paging without A/D bits Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 08/15] nEPT: Redefine EPT-specific link_shadow_page() Gleb Natapov
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

From: Nadav Har'El <nyh@il.ibm.com>

This is the first patch in a series which adds nested EPT support to KVM's
nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use
EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest
to set its own cr3 and take its own page faults without either L0 or L1
getting involved. This often significantly improves L2's performance over the
previous two alternatives (shadow page tables over EPT, and shadow page
tables over shadow page tables).

This patch adds EPT support to paging_tmpl.h.

paging_tmpl.h contains the code for reading and writing page tables. The code
for 32-bit and 64-bit tables is very similar, but not identical, so
paging_tmpl.h is #include'd twice in mmu.c, once with PTTYPE=32 and once
with PTTYPE=64, and this generates the two sets of similar functions.

There are subtle but important differences between the format of EPT tables
and that of ordinary x86 64-bit page tables, so for nested EPT we need a
third set of functions to read the guest EPT table and to write the shadow
EPT table.

So this patch adds a third PTTYPE, PTTYPE_EPT, which creates functions
(prefixed with "ept_") that correctly read and write EPT tables.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/mmu.c         |    5 +++++
 arch/x86/kvm/paging_tmpl.h |   37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4c4274d..b5273c3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3494,6 +3494,11 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gp
 	return mmu->last_pte_bitmap & (1 << index);
 }
 
+#define PTTYPE_EPT 18 /* arbitrary */
+#define PTTYPE PTTYPE_EPT
+#include "paging_tmpl.h"
+#undef PTTYPE
+
 #define PTTYPE 64
 #include "paging_tmpl.h"
 #undef PTTYPE
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 2c2f635..762c904 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -23,6 +23,13 @@
  * so the code in this file is compiled twice, once per pte size.
  */
 
+/*
+ * This is used to catch non optimized PT_GUEST_(DIRTY|ACCESS)_SHIFT macro
+ * uses for EPT without A/D paging type.
+ */
+extern u64 __pure __using_nonexistent_pte_bit(void)
+	       __compiletime_error("wrong use of PT_GUEST_(DIRTY|ACCESS)_SHIFT");
+
 #if PTTYPE == 64
 	#define pt_element_t u64
 	#define guest_walker guest_walker64
@@ -58,6 +65,21 @@
 	#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
 	#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
 	#define CMPXCHG cmpxchg
+#elif PTTYPE == PTTYPE_EPT
+	#define pt_element_t u64
+	#define guest_walker guest_walkerEPT
+	#define FNAME(name) ept_##name
+	#define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
+	#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
+	#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
+	#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
+	#define PT_LEVEL_BITS PT64_LEVEL_BITS
+	#define PT_GUEST_ACCESSED_MASK 0
+	#define PT_GUEST_DIRTY_MASK 0
+	#define PT_GUEST_DIRTY_SHIFT __using_nonexistent_pte_bit()
+	#define PT_GUEST_ACCESSED_SHIFT __using_nonexistent_pte_bit()
+	#define CMPXCHG cmpxchg64
+	#define PT_MAX_FULL_LEVELS 4
 #else
 	#error Invalid PTTYPE value
 #endif
@@ -115,7 +137,11 @@ static bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte, int level)
 
 static inline int FNAME(is_present_gpte)(unsigned long pte)
 {
+#if PTTYPE != PTTYPE_EPT
 	return is_present_gpte(pte);
+#else
+	return pte & 7;
+#endif
 }
 
 static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
@@ -165,9 +191,14 @@ no_present:
 static inline unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, u64 gpte)
 {
 	unsigned access;
-
+#if PTTYPE == PTTYPE_EPT
+	access = ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) |
+		((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
+		ACC_USER_MASK;
+#else
 	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
 	access &= ~(gpte >> PT64_NX_SHIFT);
+#endif
 
 	return access;
 }
@@ -369,6 +400,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
 					access);
 }
 
+#if PTTYPE != PTTYPE_EPT
 static int FNAME(walk_addr_nested)(struct guest_walker *walker,
 				   struct kvm_vcpu *vcpu, gva_t addr,
 				   u32 access)
@@ -376,6 +408,7 @@ static int FNAME(walk_addr_nested)(struct guest_walker *walker,
 	return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.nested_mmu,
 					addr, access);
 }
+#endif
 
 static bool
 FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
@@ -803,6 +836,7 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access,
 	return gpa;
 }
 
+#if PTTYPE != PTTYPE_EPT
 static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr,
 				      u32 access,
 				      struct x86_exception *exception)
@@ -821,6 +855,7 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr,
 
 	return gpa;
 }
+#endif
 
 /*
  * Using the cached information from sp->gfns is safe because:
-- 
1.7.10.4



* [PATCH v7 08/15] nEPT: Redefine EPT-specific link_shadow_page()
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
                   ` (6 preceding siblings ...)
  2013-08-05  8:07 ` [PATCH v7 07/15] nEPT: Add EPT tables support to paging_tmpl.h Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 09/15] nEPT: correctly check if remote tlb flush is needed for shadowed EPT tables Gleb Natapov
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

From: Yang Zhang <yang.z.zhang@Intel.com>

Since nEPT doesn't support A/D bits, we should not set those bits when
building the shadow page tables.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/mmu.c         |   12 +++++++++---
 arch/x86/kvm/paging_tmpl.h |    4 ++--
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b5273c3..9e0f467 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2047,12 +2047,18 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
 	return __shadow_walk_next(iterator, *iterator->sptep);
 }
 
-static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
+static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp, bool accessed)
 {
 	u64 spte;
 
+	BUILD_BUG_ON(VMX_EPT_READABLE_MASK != PT_PRESENT_MASK ||
+			VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
+
 	spte = __pa(sp->spt) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
-	       shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+	       shadow_user_mask | shadow_x_mask;
+
+	if (accessed)
+		spte |= shadow_accessed_mask;
 
 	mmu_spte_set(sptep, spte);
 }
@@ -2677,7 +2683,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 					      iterator.level - 1,
 					      1, ACC_ALL, iterator.sptep);
 
-			link_shadow_page(iterator.sptep, sp);
+			link_shadow_page(iterator.sptep, sp, true);
 		}
 	}
 	return emulate;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 762c904..f8e5680 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -555,7 +555,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			goto out_gpte_changed;
 
 		if (sp)
-			link_shadow_page(it.sptep, sp);
+			link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK);
 	}
 
 	for (;
@@ -575,7 +575,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 
 		sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
 				      true, direct_access, it.sptep);
-		link_shadow_page(it.sptep, sp);
+		link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK);
 	}
 
 	clear_sp_write_flooding_count(it.sptep);
-- 
1.7.10.4



* [PATCH v7 09/15] nEPT: correctly check if remote tlb flush is needed for shadowed EPT tables
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
                   ` (7 preceding siblings ...)
  2013-08-05  8:07 ` [PATCH v7 08/15] nEPT: Redefine EPT-specific link_shadow_page() Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 10/15] nEPT: Add nEPT violation/misconfigration support Gleb Natapov
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

need_remote_flush() assumes that the shadow page is in PT64 format, but
with the addition of nested EPT this is no longer always true. Fix it by
using bit definitions that depend on the host shadow page type.

Reported-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/mmu.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9e0f467..a512ecf 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -132,8 +132,8 @@ module_param(dbg, bool, 0644);
 	(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
 					    * PT32_LEVEL_BITS))) - 1))
 
-#define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \
-			| PT64_NX_MASK)
+#define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
+			| shadow_x_mask | shadow_nx_mask)
 
 #define ACC_EXEC_MASK    1
 #define ACC_WRITE_MASK   PT_WRITABLE_MASK
@@ -3879,8 +3879,8 @@ static bool need_remote_flush(u64 old, u64 new)
 		return true;
 	if ((old ^ new) & PT64_BASE_ADDR_MASK)
 		return true;
-	old ^= PT64_NX_MASK;
-	new ^= PT64_NX_MASK;
+	old ^= shadow_nx_mask;
+	new ^= shadow_nx_mask;
 	return (old & ~new & PT64_PERM_MASK) != 0;
 }
 
-- 
1.7.10.4



* [PATCH v7 10/15] nEPT: Add nEPT violation/misconfigration support
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
                   ` (8 preceding siblings ...)
  2013-08-05  8:07 ` [PATCH v7 09/15] nEPT: correctly check if remote tlb flush is needed for shadowed EPT tables Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-06  8:01   ` Jan Kiszka
  2013-08-06  9:00   ` Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 11/15] nEPT: MMU context for nested EPT Gleb Natapov
                   ` (5 subsequent siblings)
  15 siblings, 2 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

From: Yang Zhang <yang.z.zhang@Intel.com>

Inject nEPT faults into the L1 guest. This patch originates from Xinhao Xu.
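
The heart of the misconfiguration check added below is a 64-entry bitmap
indexed by the low six bits of a guest EPT entry (XWR in bits 2:0, memory
type in bits 5:3). A standalone sketch of the idea (build_bad_mt_xwr() and
gpte_is_misconfigured() are made-up names that mirror, but are not, the
reset_rsvds_bits_mask_ept()/is_rsvd_bits_set() code in the diff):

#include <stdint.h>
#include <stdbool.h>

static uint64_t bad_mt_xwr;

static void build_bad_mt_xwr(bool execonly)
{
        int pte;

        for (pte = 0; pte < 64; pte++) {
                int xwr = pte & 7;
                int mt = pte >> 3;

                /* memory types 2, 3 and 7 are reserved; write-only and
                 * write+execute (without read) are illegal; execute-only
                 * is illegal unless the CPU supports execute-only EPT */
                if (mt == 2 || mt == 3 || mt == 7 ||
                    xwr == 2 || xwr == 6 || (xwr == 4 && !execonly))
                        bad_mt_xwr |= 1ull << pte;
        }
}

static bool gpte_is_misconfigured(uint64_t gpte)
{
        return bad_mt_xwr & (1ull << (gpte & 0x3f));
}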

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |    4 +++
 arch/x86/kvm/mmu.c              |   61 ++++++++++++++++++++++++++++++++-------
 arch/x86/kvm/paging_tmpl.h      |   25 ++++++++++++++--
 arch/x86/kvm/vmx.c              |   19 ++++++++++++
 4 files changed, 95 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 531f47c..58a17c0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -286,6 +286,7 @@ struct kvm_mmu {
 	u64 *pae_root;
 	u64 *lm_root;
 	u64 rsvd_bits_mask[2][4];
+	u64 bad_mt_xwr;
 
 	/*
 	 * Bitmap: bit set = last pte in walk
@@ -512,6 +513,9 @@ struct kvm_vcpu_arch {
 	 * instruction.
 	 */
 	bool write_fault_to_shadow_pgtable;
+
+	/* set at EPT violation at this point */
+	unsigned long exit_qualification;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a512ecf..f2d982d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3519,6 +3519,8 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 	int maxphyaddr = cpuid_maxphyaddr(vcpu);
 	u64 exb_bit_rsvd = 0;
 
+	context->bad_mt_xwr = 0;
+
 	if (!context->nx)
 		exb_bit_rsvd = rsvd_bits(63, 63);
 	switch (context->root_level) {
@@ -3574,7 +3576,40 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 	}
 }
 
-static void update_permission_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
+static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
+		struct kvm_mmu *context, bool execonly)
+{
+	int maxphyaddr = cpuid_maxphyaddr(vcpu);
+	int pte;
+
+	context->rsvd_bits_mask[0][3] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 7);
+	context->rsvd_bits_mask[0][2] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 6);
+	context->rsvd_bits_mask[0][1] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 6);
+	context->rsvd_bits_mask[0][0] = rsvd_bits(maxphyaddr, 51);
+
+	/* large page */
+	context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
+	context->rsvd_bits_mask[1][2] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 29);
+	context->rsvd_bits_mask[1][1] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 20);
+	context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
+	
+	for (pte = 0; pte < 64; pte++) {
+		int rwx_bits = pte & 7;
+		int mt = pte >> 3;
+		if (mt == 0x2 || mt == 0x3 || mt == 0x7 ||
+				rwx_bits == 0x2 || rwx_bits == 0x6 ||
+				(rwx_bits == 0x4 && !execonly))
+			context->bad_mt_xwr |= (1ull << pte);
+	}
+}
+
+static void update_permission_bitmask(struct kvm_vcpu *vcpu,
+		struct kvm_mmu *mmu, bool ept)
 {
 	unsigned bit, byte, pfec;
 	u8 map;
@@ -3592,12 +3627,16 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu
 			w = bit & ACC_WRITE_MASK;
 			u = bit & ACC_USER_MASK;
 
-			/* Not really needed: !nx will cause pte.nx to fault */
-			x |= !mmu->nx;
-			/* Allow supervisor writes if !cr0.wp */
-			w |= !is_write_protection(vcpu) && !uf;
-			/* Disallow supervisor fetches of user code if cr4.smep */
-			x &= !(smep && u && !uf);
+			if (!ept) {
+				/* Not really needed: !nx will cause pte.nx to fault */
+				x |= !mmu->nx;
+				/* Allow supervisor writes if !cr0.wp */
+				w |= !is_write_protection(vcpu) && !uf;
+				/* Disallow supervisor fetches of user code if cr4.smep */
+				x &= !(smep && u && !uf);
+			} else
+				/* Not really needed: no U/S accesses on ept  */
+				u = 1;
 
 			fault = (ff && !x) || (uf && !u) || (wf && !w);
 			map |= fault << bit;
@@ -3632,7 +3671,7 @@ static int paging64_init_context_common(struct kvm_vcpu *vcpu,
 	context->root_level = level;
 
 	reset_rsvds_bits_mask(vcpu, context);
-	update_permission_bitmask(vcpu, context);
+	update_permission_bitmask(vcpu, context, false);
 	update_last_pte_bitmap(vcpu, context);
 
 	ASSERT(is_pae(vcpu));
@@ -3662,7 +3701,7 @@ static int paging32_init_context(struct kvm_vcpu *vcpu,
 	context->root_level = PT32_ROOT_LEVEL;
 
 	reset_rsvds_bits_mask(vcpu, context);
-	update_permission_bitmask(vcpu, context);
+	update_permission_bitmask(vcpu, context, false);
 	update_last_pte_bitmap(vcpu, context);
 
 	context->new_cr3 = paging_new_cr3;
@@ -3724,7 +3763,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 		context->gva_to_gpa = paging32_gva_to_gpa;
 	}
 
-	update_permission_bitmask(vcpu, context);
+	update_permission_bitmask(vcpu, context, false);
 	update_last_pte_bitmap(vcpu, context);
 
 	return 0;
@@ -3803,7 +3842,7 @@ static int init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 		g_context->gva_to_gpa = paging32_gva_to_gpa_nested;
 	}
 
-	update_permission_bitmask(vcpu, g_context);
+	update_permission_bitmask(vcpu, g_context, false);
 	update_last_pte_bitmap(vcpu, g_context);
 
 	return 0;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index f8e5680..7f6af8e 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -129,10 +129,10 @@ static inline void FNAME(protect_clean_gpte)(unsigned *access, unsigned gpte)
 
 static bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte, int level)
 {
-	int bit7;
+	int bit7 = (gpte >> 7) & 1, low6 = gpte & 0x3f;
 
-	bit7 = (gpte >> 7) & 1;
-	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
+	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) |
+		((mmu->bad_mt_xwr & (1ull << low6)) != 0);
 }
 
 static inline int FNAME(is_present_gpte)(unsigned long pte)
@@ -386,6 +386,25 @@ error:
 	walker->fault.vector = PF_VECTOR;
 	walker->fault.error_code_valid = true;
 	walker->fault.error_code = errcode;
+
+#if PTTYPE == PTTYPE_EPT
+	/*
+	 * Use PFERR_RSVD_MASK in error_code to to tell if EPT
+	 * misconfiguration requires to be injected. The detection is
+	 * done by is_rsvd_bits_set() above.
+	 *
+	 * We set up the value of exit_qualification to inject:
+	 * [2:0] - Derive from [2:0] of real exit_qualification at EPT violation
+	 * [5:3] - Calculated by the page walk of the guest EPT page tables
+	 * [7:8] - Derived from [7:8] of real exit_qualification
+	 *
+	 * The other bits are set to 0.
+	 */
+	if (!(errcode & PFERR_RSVD_MASK)) {
+		vcpu->arch.exit_qualification &= 0x187;
+		vcpu->arch.exit_qualification |= ((pt_access & pte) & 0x7) << 3;
+	}
+#endif
 	walker->fault.address = addr;
 	walker->fault.nested_page_fault = mmu != vcpu->arch.walk_mmu;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4e98764..984f8d7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5317,9 +5317,13 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 
 	/* It is a write fault? */
 	error_code = exit_qualification & (1U << 1);
+	/* It is a fetch fault? */
+	error_code |= (exit_qualification & (1U << 2)) << 2;
 	/* ept page table is present? */
 	error_code |= (exit_qualification >> 3) & 0x1;
 
+	vcpu->arch.exit_qualification = exit_qualification;
+
 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
 
@@ -7348,6 +7352,21 @@ static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 		entry->ecx |= bit(X86_FEATURE_VMX);
 }
 
+static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
+		struct x86_exception *fault)
+{
+	struct vmcs12 *vmcs12;
+	nested_vmx_vmexit(vcpu);
+	vmcs12 = get_vmcs12(vcpu);
+
+	if (fault->error_code & PFERR_RSVD_MASK)
+		vmcs12->vm_exit_reason = EXIT_REASON_EPT_MISCONFIG;
+	else
+		vmcs12->vm_exit_reason = EXIT_REASON_EPT_VIOLATION;
+	vmcs12->exit_qualification = vcpu->arch.exit_qualification;
+	vmcs12->guest_physical_address = fault->address;
+}
+
 /*
  * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
  * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
-- 
1.7.10.4



* [PATCH v7 11/15] nEPT: MMU context for nested EPT
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
                   ` (9 preceding siblings ...)
  2013-08-05  8:07 ` [PATCH v7 10/15] nEPT: Add nEPT violation/misconfigration support Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 12/15] nEPT: Nested INVEPT Gleb Natapov
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

From: Nadav Har'El <nyh@il.ibm.com>

KVM's existing shadow MMU code already supports nested TDP. To use it, we
need to set up a new "MMU context" for nested EPT, and create a few callbacks
for it (nested_ept_*()). This context should also use the EPT versions of
the page table access functions (defined in the previous patch).
Then, we need to switch back and forth between this nested context and the
regular MMU context when switching between L1 and L2 (when L1 runs L2 with
EPT).
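
In short, while L1 runs L2 with EPT, the three MMU contexts divide the work
as follows (a rough summary of the code below, not a literal description of
struct kvm_vcpu_arch):

/*
 *   vcpu->arch.mmu        - shadow EPT context: EPT02, built by shadowing
 *                           L1's EPT12, translates L2 gpa -> host pa
 *   vcpu->arch.nested_mmu - walks L2's own page tables, gva -> L2 gpa
 *   vcpu->arch.walk_mmu   - selects which context handles guest virtual
 *                           address walks; switched to &nested_mmu on
 *                           nested entry and back to &mmu on nested exit
 */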

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/mmu.c |   27 +++++++++++++++++++++++++++
 arch/x86/kvm/mmu.h |    2 ++
 arch/x86/kvm/vmx.c |   41 ++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f2d982d..e3bfdde 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3795,6 +3795,33 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
+int kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
+		bool execonly)
+{
+	ASSERT(vcpu);
+	ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+
+	context->nx = true;
+	context->new_cr3 = paging_new_cr3;
+	context->page_fault = ept_page_fault;
+	context->gva_to_gpa = ept_gva_to_gpa;
+	context->sync_page = ept_sync_page;
+	context->invlpg = ept_invlpg;
+	context->update_pte = ept_update_pte;
+	context->free = paging_free;
+	context->root_level = context->shadow_root_level;
+	context->root_hpa = INVALID_PAGE;
+	context->direct_map = false;
+
+	update_permission_bitmask(vcpu, context, true);
+	reset_rsvds_bits_mask_ept(vcpu, context, execonly);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
+
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
 	int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 5b59c57..77e044a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -71,6 +71,8 @@ enum {
 
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+int kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
+		bool execonly);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 984f8d7..fbfabbe 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1046,6 +1046,11 @@ static inline bool nested_cpu_has_virtual_nmis(struct vmcs12 *vmcs12,
 	return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
+{
+	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
+}
+
 static inline bool is_exception(u32 intr_info)
 {
 	return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
@@ -7367,6 +7372,33 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	vmcs12->guest_physical_address = fault->address;
 }
 
+/* Callbacks for nested_ept_init_mmu_context: */
+
+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
+{
+	/* return the page table to be shadowed - in our case, EPT12 */
+	return get_vmcs12(vcpu)->ept_pointer;
+}
+
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+	int r = kvm_init_shadow_ept_mmu(vcpu, &vcpu->arch.mmu,
+			nested_vmx_ept_caps & VMX_EPT_EXECUTE_ONLY_BIT);
+
+	vcpu->arch.mmu.set_cr3           = vmx_set_cr3;
+	vcpu->arch.mmu.get_cr3           = nested_ept_get_cr3;
+	vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
+
+	vcpu->arch.walk_mmu              = &vcpu->arch.nested_mmu;
+
+	return r;
+}
+
+static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.walk_mmu = &vcpu->arch.mmu;
+}
+
 /*
  * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
  * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
@@ -7587,6 +7619,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 		vmx_flush_tlb(vcpu);
 	}
 
+	if (nested_cpu_has_ept(vmcs12)) {
+		kvm_mmu_unload(vcpu);
+		nested_ept_init_mmu_context(vcpu);
+	}
+
 	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
 		vcpu->arch.efer = vmcs12->guest_ia32_efer;
 	else if (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE)
@@ -8059,7 +8096,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK);
 	kvm_set_cr4(vcpu, vmcs12->host_cr4);
 
-	/* shadow page tables on either EPT or shadow page tables */
+	if (nested_cpu_has_ept(vmcs12))
+		nested_ept_uninit_mmu_context(vcpu);
+
 	kvm_set_cr3(vcpu, vmcs12->host_cr3);
 	kvm_mmu_reset_context(vcpu);
 
-- 
1.7.10.4



* [PATCH v7 12/15] nEPT: Nested INVEPT
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
                   ` (10 preceding siblings ...)
  2013-08-05  8:07 ` [PATCH v7 11/15] nEPT: MMU context for nested EPT Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-05  8:53   ` Xiao Guangrong
  2013-08-05  8:07 ` [PATCH v7 13/15] nEPT: Advertise EPT to L1 Gleb Natapov
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

From: Nadav Har'El <nyh@il.ibm.com>

If we let L1 use EPT, we should probably also support the INVEPT instruction.

In our current nested EPT implementation, when L1 changes its EPT table
for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in
the course of this modification already calls INVEPT. But if the last level
of a shadow page is unsync, not all of L1's changes to EPT12 are intercepted,
which means the roots need to be synced when L1 calls INVEPT. Global INVEPT
should not be handled differently, since the roots are synced by
kvm_mmu_load() each time EPTP02 changes.
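
For reference, this is roughly what the L1 side being emulated here does when
it issues INVEPT (standalone sketch: invept() and struct invept_desc are local
names; the descriptor layout follows the SDM, the extent values match
VMX_EPT_EXTENT_CONTEXT/GLOBAL, and an assembler that knows the invept
mnemonic is assumed):

#include <stdint.h>

struct invept_desc {
        uint64_t eptp;          /* EPTP to invalidate (single-context only) */
        uint64_t reserved;      /* must be zero */
};

static inline void invept(unsigned long extent, uint64_t eptp)
{
        struct invept_desc desc = { eptp, 0 };

        /* extent 1 = single-context, 2 = global */
        asm volatile("invept %0, %1"
                     : : "m"(desc), "r"(extent) : "cc", "memory");
}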

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/include/asm/vmx.h      |    2 ++
 arch/x86/include/uapi/asm/vmx.h |    1 +
 arch/x86/kvm/mmu.c              |    2 ++
 arch/x86/kvm/vmx.c              |   72 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index f3e01a2..966502d 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -387,6 +387,7 @@ enum vmcs_field {
 #define VMX_EPT_EXTENT_INDIVIDUAL_ADDR		0
 #define VMX_EPT_EXTENT_CONTEXT			1
 #define VMX_EPT_EXTENT_GLOBAL			2
+#define VMX_EPT_EXTENT_SHIFT			24
 
 #define VMX_EPT_EXECUTE_ONLY_BIT		(1ull)
 #define VMX_EPT_PAGE_WALK_4_BIT			(1ull << 6)
@@ -394,6 +395,7 @@ enum vmcs_field {
 #define VMX_EPTP_WB_BIT				(1ull << 14)
 #define VMX_EPT_2MB_PAGE_BIT			(1ull << 16)
 #define VMX_EPT_1GB_PAGE_BIT			(1ull << 17)
+#define VMX_EPT_INVEPT_BIT			(1ull << 20)
 #define VMX_EPT_AD_BIT				    (1ull << 21)
 #define VMX_EPT_EXTENT_CONTEXT_BIT		(1ull << 25)
 #define VMX_EPT_EXTENT_GLOBAL_BIT		(1ull << 26)
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index d651082..7a34e8f 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -65,6 +65,7 @@
 #define EXIT_REASON_EOI_INDUCED         45
 #define EXIT_REASON_EPT_VIOLATION       48
 #define EXIT_REASON_EPT_MISCONFIG       49
+#define EXIT_REASON_INVEPT              50
 #define EXIT_REASON_PREEMPTION_TIMER    52
 #define EXIT_REASON_WBINVD              54
 #define EXIT_REASON_XSETBV              55
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e3bfdde..c0b4e0f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3182,6 +3182,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 	mmu_sync_roots(vcpu);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_sync_roots);
 
 static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
 				  u32 access, struct x86_exception *exception)
@@ -3451,6 +3452,7 @@ void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
 	++vcpu->stat.tlb_flush;
 	kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_flush_tlb);
 
 static void paging_new_cr3(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fbfabbe..51c06c2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -712,6 +712,7 @@ static void nested_release_page_clean(struct page *page)
 	kvm_release_page_clean(page);
 }
 
+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu);
 static u64 construct_eptp(unsigned long root_hpa);
 static void kvm_cpu_vmxon(u64 addr);
 static void kvm_cpu_vmxoff(void);
@@ -2161,6 +2162,7 @@ static u32 nested_vmx_pinbased_ctls_low, nested_vmx_pinbased_ctls_high;
 static u32 nested_vmx_exit_ctls_low, nested_vmx_exit_ctls_high;
 static u32 nested_vmx_entry_ctls_low, nested_vmx_entry_ctls_high;
 static u32 nested_vmx_misc_low, nested_vmx_misc_high;
+static u32 nested_vmx_ept_caps;
 static __init void nested_vmx_setup_ctls_msrs(void)
 {
 	/*
@@ -6279,6 +6281,74 @@ static int handle_vmptrst(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+/* Emulate the INVEPT instruction */
+static int handle_invept(struct kvm_vcpu *vcpu)
+{
+	u32 vmx_instruction_info, types;
+	unsigned long type;
+	gva_t gva;
+	struct x86_exception e;
+	struct {
+		u64 eptp, gpa;
+	} operand;
+	u64 eptp_mask = ((1ull << 51) - 1) & PAGE_MASK;
+
+	if (!(nested_vmx_secondary_ctls_high & SECONDARY_EXEC_ENABLE_EPT) ||
+	    !(nested_vmx_ept_caps & VMX_EPT_INVEPT_BIT)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	if (!kvm_read_cr0_bits(vcpu, X86_CR0_PE)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+	type = kvm_register_read(vcpu, (vmx_instruction_info >> 28) & 0xf);
+
+	types = (nested_vmx_ept_caps >> VMX_EPT_EXTENT_SHIFT) & 6;
+
+	if (!(types & (1UL << type))) {
+		nested_vmx_failValid(vcpu,
+				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+		return 1;
+	}
+
+	/* According to the Intel VMX instruction reference, the memory
+	 * operand is read even if it isn't needed (e.g., for type==global)
+	 */
+	if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
+			vmx_instruction_info, &gva))
+		return 1;
+	if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &operand,
+				sizeof(operand), &e)) {
+		kvm_inject_page_fault(vcpu, &e);
+		return 1;
+	}
+
+	switch (type) {
+	case VMX_EPT_EXTENT_CONTEXT:
+		if ((operand.eptp & eptp_mask) !=
+				(nested_ept_get_cr3(vcpu) & eptp_mask))
+			break;
+	case VMX_EPT_EXTENT_GLOBAL:
+		kvm_mmu_sync_roots(vcpu);
+		kvm_mmu_flush_tlb(vcpu);
+		nested_vmx_succeed(vcpu);
+		break;
+	default:
+		BUG_ON(1);
+		break;
+	}
+
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -6323,6 +6393,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_PAUSE_INSTRUCTION]       = handle_pause,
 	[EXIT_REASON_MWAIT_INSTRUCTION]	      = handle_invalid_op,
 	[EXIT_REASON_MONITOR_INSTRUCTION]     = handle_invalid_op,
+	[EXIT_REASON_INVEPT]                  = handle_invept,
 };
 
 static const int kvm_vmx_max_exit_handlers =
@@ -6549,6 +6620,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 	case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD:
 	case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE:
 	case EXIT_REASON_VMOFF: case EXIT_REASON_VMON:
+	case EXIT_REASON_INVEPT:
 		/*
 		 * VMX instructions trap unconditionally. This allows L1 to
 		 * emulate them for its L2 guest, i.e., allows 3-level nesting!
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 13/15] nEPT: Advertise EPT to L1
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
                   ` (11 preceding siblings ...)
  2013-08-05  8:07 ` [PATCH v7 12/15] nEPT: Nested INVEPT Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-06  8:20   ` Jan Kiszka
  2013-08-06  9:02   ` Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 14/15] nEPT: Some additional comments Gleb Natapov
                   ` (2 subsequent siblings)
  15 siblings, 2 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

From: Nadav Har'El <nyh@il.ibm.com>

Advertise the support of EPT to the L1 guest, through the appropriate MSR.

This is the last patch of the basic Nested EPT feature, so as to allow
bisection through this patch series: The guest will not see EPT support until
this last patch, and will not attempt to use the half-applied feature.
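
For reference, a stand-alone sketch of how a guest hypervisor could
decode the advertised MSR_IA32_VMX_EPT_VPID_CAP value (an illustration
only: the bit definitions come from arch/x86/include/asm/vmx.h, while
the sample value and the checks an L1 hypervisor might apply are
assumptions, not taken from this patch):

#include <stdio.h>
#include <stdint.h>

#define VMX_EPT_PAGE_WALK_4_BIT		(1ull << 6)
#define VMX_EPT_INVEPT_BIT		(1ull << 20)
#define VMX_EPT_EXTENT_CONTEXT_BIT	(1ull << 25)
#define VMX_EPT_EXTENT_GLOBAL_BIT	(1ull << 26)

int main(void)
{
	/* Hypothetical value resembling what this patch exposes:
	 * 4-level walks plus INVEPT with both extents. */
	uint64_t caps = VMX_EPT_PAGE_WALK_4_BIT | VMX_EPT_INVEPT_BIT |
			VMX_EPT_EXTENT_GLOBAL_BIT | VMX_EPT_EXTENT_CONTEXT_BIT;

	printf("4-level page walk: %s\n",
	       (caps & VMX_EPT_PAGE_WALK_4_BIT) ? "yes" : "no");
	printf("INVEPT           : %s\n",
	       (caps & VMX_EPT_INVEPT_BIT) ? "yes" : "no");
	printf("  global extent  : %s\n",
	       (caps & VMX_EPT_EXTENT_GLOBAL_BIT) ? "yes" : "no");
	printf("  context extent : %s\n",
	       (caps & VMX_EPT_EXTENT_CONTEXT_BIT) ? "yes" : "no");
	return 0;
}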

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/vmx.c |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 51c06c2..c65a08a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2250,6 +2250,22 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
 		SECONDARY_EXEC_WBINVD_EXITING;
 
+	if (enable_ept) {
+		/* nested EPT: emulate EPT also to L1 */
+		nested_vmx_secondary_ctls_high |= SECONDARY_EXEC_ENABLE_EPT;
+		nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT;
+		nested_vmx_ept_caps |= VMX_EPT_INVEPT_BIT;
+		nested_vmx_ept_caps &= vmx_capability.ept;
+		/*
+		 * Since invept is completely emulated we support both global
+		 * and context invalidation independent of what host cpu
+		 * supports
+		 */
+	       	nested_vmx_ept_caps |= VMX_EPT_EXTENT_GLOBAL_BIT |
+			VMX_EPT_EXTENT_CONTEXT_BIT;
+	} else
+		nested_vmx_ept_caps = 0;
+
 	/* miscellaneous data */
 	rdmsr(MSR_IA32_VMX_MISC, nested_vmx_misc_low, nested_vmx_misc_high);
 	nested_vmx_misc_low &= VMX_MISC_PREEMPTION_TIMER_RATE_MASK |
@@ -2358,8 +2374,8 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 					nested_vmx_secondary_ctls_high);
 		break;
 	case MSR_IA32_VMX_EPT_VPID_CAP:
-		/* Currently, no nested ept or nested vpid */
-		*pdata = 0;
+		/* Currently, no nested vpid support */
+		*pdata = nested_vmx_ept_caps;
 		break;
 	default:
 		return 0;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 14/15] nEPT: Some additional comments
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
                   ` (12 preceding siblings ...)
  2013-08-05  8:07 ` [PATCH v7 13/15] nEPT: Advertise EPT to L1 Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-05  8:07 ` [PATCH v7 15/15] nEPT: Miscellaneous cleanups Gleb Natapov
  2013-08-07 14:07 ` [PATCH v7 00/15] Nested EPT Paolo Bonzini
  15 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

From: Nadav Har'El <nyh@il.ibm.com>

Some additional comments to preexisting code:
Explain who (L0 or L1) handles EPT violation and misconfiguration exits.
Don't mention "shadow on either EPT or shadow" as the only two options.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/vmx.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c65a08a..ed224bd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6669,7 +6669,20 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 		return nested_cpu_has2(vmcs12,
 			SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
 	case EXIT_REASON_EPT_VIOLATION:
+		/*
+		 * L0 always deals with the EPT violation. If nested EPT is
+		 * used, and the nested mmu code discovers that the address is
+		 * missing in the guest EPT table (EPT12), the EPT violation
+		 * will be injected with nested_ept_inject_page_fault()
+		 */
+		return 0;
 	case EXIT_REASON_EPT_MISCONFIG:
+		/*
+		 * L2 never directly uses L1's EPT, but rather L0's own EPT
+		 * table (shadow on EPT) or a merged EPT table that L0 built
+		 * (EPT on EPT). So any problems with the structure of the
+		 * table are L0's fault.
+		 */
 		return 0;
 	case EXIT_REASON_PREEMPTION_TIMER:
 		return vmcs12->pin_based_vm_exec_control &
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 15/15] nEPT: Miscellaneous cleanups
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
                   ` (13 preceding siblings ...)
  2013-08-05  8:07 ` [PATCH v7 14/15] nEPT: Some additional comments Gleb Natapov
@ 2013-08-05  8:07 ` Gleb Natapov
  2013-08-07 14:07 ` [PATCH v7 00/15] Nested EPT Paolo Bonzini
  15 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05  8:07 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

From: Nadav Har'El <nyh@il.ibm.com>

Some trivial code cleanups not really related to nested EPT.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/vmx.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ed224bd..01f2091 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -716,7 +716,6 @@ static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu);
 static u64 construct_eptp(unsigned long root_hpa);
 static void kvm_cpu_vmxon(u64 addr);
 static void kvm_cpu_vmxoff(void);
-static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
 static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr);
 static void vmx_set_segment(struct kvm_vcpu *vcpu,
 			    struct kvm_segment *var, int seg);
@@ -1041,8 +1040,7 @@ static inline bool nested_cpu_has2(struct vmcs12 *vmcs12, u32 bit)
 		(vmcs12->secondary_vm_exec_control & bit);
 }
 
-static inline bool nested_cpu_has_virtual_nmis(struct vmcs12 *vmcs12,
-	struct kvm_vcpu *vcpu)
+static inline bool nested_cpu_has_virtual_nmis(struct vmcs12 *vmcs12)
 {
 	return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
@@ -6770,7 +6768,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 
 	if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked &&
 	    !(is_guest_mode(vcpu) && nested_cpu_has_virtual_nmis(
-	                                get_vmcs12(vcpu), vcpu)))) {
+					get_vmcs12(vcpu))))) {
 		if (vmx_interrupt_allowed(vcpu)) {
 			vmx->soft_vnmi_blocked = 0;
 		} else if (vmx->vnmi_blocked_time > 1000000000LL &&
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 12/15] nEPT: Nested INVEPT
  2013-08-05  8:07 ` [PATCH v7 12/15] nEPT: Nested INVEPT Gleb Natapov
@ 2013-08-05  8:53   ` Xiao Guangrong
  0 siblings, 0 replies; 32+ messages in thread
From: Xiao Guangrong @ 2013-08-05  8:53 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Jun Nakajima, Yang Zhang, pbonzini

On 08/05/2013 04:07 PM, Gleb Natapov wrote:
> From: Nadav Har'El <nyh@il.ibm.com>
> 
> If we let L1 use EPT, we should probably also support the INVEPT instruction.
> 
> In our current nested EPT implementation, when L1 changes its EPT table
> for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in
> the course of this modification already calls INVEPT. But if last level
> of shadow page is unsync not all L1's changes to EPT12 are intercepted,
> which means roots need to be synced when L1 calls INVEPT. Global INVEPT
> should not be different since roots are synced by kvm_mmu_load() each
> time EPTP02 changes.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  2013-08-05  8:07 ` [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1 Gleb Natapov
@ 2013-08-05 11:27   ` Arthur Chunqi Li
  2013-08-05 11:40     ` Gleb Natapov
  2013-08-05 11:48     ` Paolo Bonzini
  0 siblings, 2 replies; 32+ messages in thread
From: Arthur Chunqi Li @ 2013-08-05 11:27 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Xiao Guangrong, Jun Nakajima, Yang Zhang, Paolo Bonzini

On Mon, Aug 5, 2013 at 4:07 PM, Gleb Natapov <gleb@redhat.com> wrote:
> From: Nadav Har'El <nyh@il.ibm.com>
>
> Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
> switch the EFER MSR when EPT is used and the host and guest have different
> NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2)
> and want to be able to run recent KVM as L1, we need to allow L1 to use this
> EFER switching feature.
>
> To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available,
> and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds
> support for the former (the latter is still unsupported).
>
> Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state,
> respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all
> that's left to do in this patch is to properly advertise this feature to L1.
>
> Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using
> vmx_set_efer (which itself sets one of several vmcs02 fields), so we always
> support this feature, regardless of whether the host supports it.
>
> Reviewed-by: Orit Wasserman <owasserm@redhat.com>
> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> ---
>  arch/x86/kvm/vmx.c |   23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index e999dc7..27efa6a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2198,7 +2198,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>  #else
>         nested_vmx_exit_ctls_high = 0;
>  #endif
> -       nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
> +       nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> +                                     VM_EXIT_LOAD_IA32_EFER);
Gleb, why we don't need to check whether host supports
VM_EXIT_LOAD_IA32_EFER here, as what you noted in my
VM_EXIT_LOAD_IA32_PAT patch?

Arthur
>
>         /* entry controls */
>         rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
> @@ -2207,8 +2208,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>         nested_vmx_entry_ctls_low = VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
>         nested_vmx_entry_ctls_high &=
>                 VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_IA32E_MODE;
> -       nested_vmx_entry_ctls_high |= VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
> -
> +       nested_vmx_entry_ctls_high |= (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR |
> +                                      VM_ENTRY_LOAD_IA32_EFER);
>         /* cpu-based controls */
>         rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
>                 nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high);
> @@ -7529,10 +7530,18 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>         vcpu->arch.cr0_guest_owned_bits &= ~vmcs12->cr0_guest_host_mask;
>         vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);
>
> -       /* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer below */
> -       vmcs_write32(VM_EXIT_CONTROLS,
> -               vmcs12->vm_exit_controls | vmcs_config.vmexit_ctrl);
> -       vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls |
> +       /* L2->L1 exit controls are emulated - the hardware exit is to L0 so
> +        * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
> +        * bits are further modified by vmx_set_efer() below.
> +        */
> +       vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
> +
> +       /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
> +        * emulated by vmx_set_efer(), below.
> +        */
> +       vmcs_write32(VM_ENTRY_CONTROLS,
> +               (vmcs12->vm_entry_controls & ~VM_ENTRY_LOAD_IA32_EFER &
> +                       ~VM_ENTRY_IA32E_MODE) |
>                 (vmcs_config.vmentry_ctrl & ~VM_ENTRY_IA32E_MODE));
>
>         if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT)
> --
> 1.7.10.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  2013-08-05 11:27   ` Arthur Chunqi Li
@ 2013-08-05 11:40     ` Gleb Natapov
  2013-08-06  7:47       ` Jan Kiszka
  2013-08-05 11:48     ` Paolo Bonzini
  1 sibling, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05 11:40 UTC (permalink / raw)
  To: Arthur Chunqi Li
  Cc: kvm, Xiao Guangrong, Jun Nakajima, Yang Zhang, Paolo Bonzini

On Mon, Aug 05, 2013 at 07:27:33PM +0800, Arthur Chunqi Li wrote:
> On Mon, Aug 5, 2013 at 4:07 PM, Gleb Natapov <gleb@redhat.com> wrote:
> > From: Nadav Har'El <nyh@il.ibm.com>
> >
> > Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
> > switch the EFER MSR when EPT is used and the host and guest have different
> > NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2)
> > and want to be able to run recent KVM as L1, we need to allow L1 to use this
> > EFER switching feature.
> >
> > To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available,
> > and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds
> > support for the former (the latter is still unsupported).
> >
> > Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state,
> > respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all
> > that's left to do in this patch is to properly advertise this feature to L1.
> >
> > Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using
> > vmx_set_efer (which itself sets one of several vmcs02 fields), so we always
> > support this feature, regardless of whether the host supports it.
> >
> > Reviewed-by: Orit Wasserman <owasserm@redhat.com>
> > Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
> > Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
> > Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
> > Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> > Signed-off-by: Gleb Natapov <gleb@redhat.com>
> > ---
> >  arch/x86/kvm/vmx.c |   23 ++++++++++++++++-------
> >  1 file changed, 16 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index e999dc7..27efa6a 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -2198,7 +2198,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
> >  #else
> >         nested_vmx_exit_ctls_high = 0;
> >  #endif
> > -       nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
> > +       nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> > +                                     VM_EXIT_LOAD_IA32_EFER);
> Gleb, why we don't need to check whether host supports
> VM_EXIT_LOAD_IA32_EFER here, as what you noted in my
> VM_EXIT_LOAD_IA32_PAT patch?
Nested VMX completely emulates the capability. It calls vmx_set_efer()
which is supposed to handle hosts without VM_EXIT_LOAD_IA32_EFER support.

--
			Gleb.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  2013-08-05 11:27   ` Arthur Chunqi Li
  2013-08-05 11:40     ` Gleb Natapov
@ 2013-08-05 11:48     ` Paolo Bonzini
  2013-08-05 12:05       ` Gleb Natapov
  1 sibling, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2013-08-05 11:48 UTC (permalink / raw)
  To: Arthur Chunqi Li
  Cc: Gleb Natapov, kvm, Xiao Guangrong, Jun Nakajima, Yang Zhang

> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index e999dc7..27efa6a 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -2198,7 +2198,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
> >  #else
> >         nested_vmx_exit_ctls_high = 0;
> >  #endif
> > -       nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
> > +       nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> > +                                     VM_EXIT_LOAD_IA32_EFER);
> 
> Gleb, why we don't need to check whether host supports
> VM_EXIT_LOAD_IA32_EFER here, as what you noted in my
> VM_EXIT_LOAD_IA32_PAT patch?

The host can also emulate VM_EXIT_LOAD_IA32_EFER using the
VM-exit MSR-load feature.  If neither EFER save/load nor
MSR save/load are available, I believe you are right and
the feature should not be available in nested VMX.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  2013-08-05 11:48     ` Paolo Bonzini
@ 2013-08-05 12:05       ` Gleb Natapov
  0 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-05 12:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Arthur Chunqi Li, kvm, Xiao Guangrong, Jun Nakajima, Yang Zhang

On Mon, Aug 05, 2013 at 07:48:54AM -0400, Paolo Bonzini wrote:
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > index e999dc7..27efa6a 100644
> > > --- a/arch/x86/kvm/vmx.c
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -2198,7 +2198,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
> > >  #else
> > >         nested_vmx_exit_ctls_high = 0;
> > >  #endif
> > > -       nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
> > > +       nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> > > +                                     VM_EXIT_LOAD_IA32_EFER);
> > 
> > Gleb, why we don't need to check whether host supports
> > VM_EXIT_LOAD_IA32_EFER here, as what you noted in my
> > VM_EXIT_LOAD_IA32_PAT patch?
> 
> The host can also emulate VM_EXIT_LOAD_IA32_EFER using the
> VM-exit MSR-load feature.  If neither EFER save/load nor
> MSR save/load are available, I believe you are right and
> the feature should not be available in nested VMX.
As far as I can tell MSR save/load is not optional.

--
			Gleb.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  2013-08-05 11:40     ` Gleb Natapov
@ 2013-08-06  7:47       ` Jan Kiszka
  2013-08-06  7:51         ` Gleb Natapov
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Kiszka @ 2013-08-06  7:47 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Arthur Chunqi Li, kvm, Xiao Guangrong, Jun Nakajima, Yang Zhang,
	Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 2960 bytes --]

On 2013-08-05 13:40, Gleb Natapov wrote:
> On Mon, Aug 05, 2013 at 07:27:33PM +0800, Arthur Chunqi Li wrote:
>> On Mon, Aug 5, 2013 at 4:07 PM, Gleb Natapov <gleb@redhat.com> wrote:
>>> From: Nadav Har'El <nyh@il.ibm.com>
>>>
>>> Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
>>> switch the EFER MSR when EPT is used and the host and guest have different
>>> NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2)
>>> and want to be able to run recent KVM as L1, we need to allow L1 to use this
>>> EFER switching feature.
>>>
>>> To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available,
>>> and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds
>>> support for the former (the latter is still unsupported).
>>>
>>> Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state,
>>> respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all
>>> that's left to do in this patch is to properly advertise this feature to L1.
>>>
>>> Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using
>>> vmx_set_efer (which itself sets one of several vmcs02 fields), so we always
>>> support this feature, regardless of whether the host supports it.
>>>
>>> Reviewed-by: Orit Wasserman <owasserm@redhat.com>
>>> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
>>> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
>>> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
>>> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
>>> Signed-off-by: Gleb Natapov <gleb@redhat.com>
>>> ---
>>>  arch/x86/kvm/vmx.c |   23 ++++++++++++++++-------
>>>  1 file changed, 16 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index e999dc7..27efa6a 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -2198,7 +2198,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>>>  #else
>>>         nested_vmx_exit_ctls_high = 0;
>>>  #endif
>>> -       nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
>>> +       nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
>>> +                                     VM_EXIT_LOAD_IA32_EFER);
>> Gleb, why we don't need to check whether host supports
>> VM_EXIT_LOAD_IA32_EFER here, as what you noted in my
>> VM_EXIT_LOAD_IA32_PAT patch?
> Nested VMX completely emulates the capability.

No, it doesn't. The values for host/guest are handed over via the
corresponding VMCS fields, physically, even though the actual loading is
then emulated. So we must not expose this feature unconditionally.
That's the same pattern as with PAT; Arthur is right.

I've a patch on top of v7 to fix this (and also the incorrect
VM_ENTRY_IA32E_MODE reporting on 32-bit). Will post it soon as part of
an updated version of my unrestricted guest mode series.

Jan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  2013-08-06  7:47       ` Jan Kiszka
@ 2013-08-06  7:51         ` Gleb Natapov
  2013-08-06  7:55           ` Jan Kiszka
  0 siblings, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-08-06  7:51 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Arthur Chunqi Li, kvm, Xiao Guangrong, Jun Nakajima, Yang Zhang,
	Paolo Bonzini

On Tue, Aug 06, 2013 at 09:47:23AM +0200, Jan Kiszka wrote:
> On 2013-08-05 13:40, Gleb Natapov wrote:
> > On Mon, Aug 05, 2013 at 07:27:33PM +0800, Arthur Chunqi Li wrote:
> >> On Mon, Aug 5, 2013 at 4:07 PM, Gleb Natapov <gleb@redhat.com> wrote:
> >>> From: Nadav Har'El <nyh@il.ibm.com>
> >>>
> >>> Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
> >>> switch the EFER MSR when EPT is used and the host and guest have different
> >>> NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2)
> >>> and want to be able to run recent KVM as L1, we need to allow L1 to use this
> >>> EFER switching feature.
> >>>
> >>> To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available,
> >>> and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds
> >>> support for the former (the latter is still unsupported).
> >>>
> >>> Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state,
> >>> respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all
> >>> that's left to do in this patch is to properly advertise this feature to L1.
> >>>
> >>> Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using
> >>> vmx_set_efer (which itself sets one of several vmcs02 fields), so we always
> >>> support this feature, regardless of whether the host supports it.
> >>>
> >>> Reviewed-by: Orit Wasserman <owasserm@redhat.com>
> >>> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
> >>> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
> >>> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
> >>> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> >>> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> >>> ---
> >>>  arch/x86/kvm/vmx.c |   23 ++++++++++++++++-------
> >>>  1 file changed, 16 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>> index e999dc7..27efa6a 100644
> >>> --- a/arch/x86/kvm/vmx.c
> >>> +++ b/arch/x86/kvm/vmx.c
> >>> @@ -2198,7 +2198,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
> >>>  #else
> >>>         nested_vmx_exit_ctls_high = 0;
> >>>  #endif
> >>> -       nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
> >>> +       nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> >>> +                                     VM_EXIT_LOAD_IA32_EFER);
> >> Gleb, why we don't need to check whether host supports
> >> VM_EXIT_LOAD_IA32_EFER here, as what you noted in my
> >> VM_EXIT_LOAD_IA32_PAT patch?
> > Nested VMX completely emulates the capability.
> 
> No, it doesn't. The values for host/guest are handled over via the
> corresponding VMCS fields, physically, even though the actual loading is
> emulated then. So we must not expose this feature unconditionally.
Can you show me the code where it happens?

> That's the same pattern as with PAT, Arthur is right.
> 
> I've a patch on top of v7 to fix this (and also the incorrect
> VM_ENTRY_IA32E_MODE reporting on 32-bit). Will post it soon as part of
> an updated version of my unrestricted guest mode series.
> 
> Jan
> 
> 



--
			Gleb.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  2013-08-06  7:51         ` Gleb Natapov
@ 2013-08-06  7:55           ` Jan Kiszka
  2013-08-06  8:00             ` Gleb Natapov
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Kiszka @ 2013-08-06  7:55 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Arthur Chunqi Li, kvm, Xiao Guangrong, Jun Nakajima, Yang Zhang,
	Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 3437 bytes --]

On 2013-08-06 09:51, Gleb Natapov wrote:
> On Tue, Aug 06, 2013 at 09:47:23AM +0200, Jan Kiszka wrote:
>> On 2013-08-05 13:40, Gleb Natapov wrote:
>>> On Mon, Aug 05, 2013 at 07:27:33PM +0800, Arthur Chunqi Li wrote:
>>>> On Mon, Aug 5, 2013 at 4:07 PM, Gleb Natapov <gleb@redhat.com> wrote:
>>>>> From: Nadav Har'El <nyh@il.ibm.com>
>>>>>
>>>>> Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
>>>>> switch the EFER MSR when EPT is used and the host and guest have different
>>>>> NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2)
>>>>> and want to be able to run recent KVM as L1, we need to allow L1 to use this
>>>>> EFER switching feature.
>>>>>
>>>>> To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available,
>>>>> and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds
>>>>> support for the former (the latter is still unsupported).
>>>>>
>>>>> Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state,
>>>>> respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all
>>>>> that's left to do in this patch is to properly advertise this feature to L1.
>>>>>
>>>>> Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using
>>>>> vmx_set_efer (which itself sets one of several vmcs02 fields), so we always
>>>>> support this feature, regardless of whether the host supports it.
>>>>>
>>>>> Reviewed-by: Orit Wasserman <owasserm@redhat.com>
>>>>> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
>>>>> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
>>>>> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
>>>>> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
>>>>> Signed-off-by: Gleb Natapov <gleb@redhat.com>
>>>>> ---
>>>>>  arch/x86/kvm/vmx.c |   23 ++++++++++++++++-------
>>>>>  1 file changed, 16 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>> index e999dc7..27efa6a 100644
>>>>> --- a/arch/x86/kvm/vmx.c
>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>> @@ -2198,7 +2198,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>>>>>  #else
>>>>>         nested_vmx_exit_ctls_high = 0;
>>>>>  #endif
>>>>> -       nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
>>>>> +       nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
>>>>> +                                     VM_EXIT_LOAD_IA32_EFER);
>>>> Gleb, why we don't need to check whether host supports
>>>> VM_EXIT_LOAD_IA32_EFER here, as what you noted in my
>>>> VM_EXIT_LOAD_IA32_PAT patch?
>>> Nested VMX completely emulates the capability.
>>
>> No, it doesn't. The values for host/guest are handled over via the
>> corresponding VMCS fields, physically, even though the actual loading is
>> emulated then. So we must not expose this feature unconditionally.
> Can you show me the code where it happens?

When the guest writes to HOST/GUEST_IA32_EFER, we will store this in the
vmcs that will then become the active one on next L1/L2 entry, no?

Jan

> 
>> That's the same pattern as with PAT, Arthur is right.
>>
>> I've a patch on top of v7 to fix this (and also the incorrect
>> VM_ENTRY_IA32E_MODE reporting on 32-bit). Will post it soon as part of
>> an updated version of my unrestricted guest mode series.
>>
>> Jan
>>
>>
> 
> 
> 
> --
> 			Gleb.
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  2013-08-06  7:55           ` Jan Kiszka
@ 2013-08-06  8:00             ` Gleb Natapov
  2013-08-06  8:18               ` Jan Kiszka
  0 siblings, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-08-06  8:00 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Arthur Chunqi Li, kvm, Xiao Guangrong, Jun Nakajima, Yang Zhang,
	Paolo Bonzini

On Tue, Aug 06, 2013 at 09:55:09AM +0200, Jan Kiszka wrote:
> On 2013-08-06 09:51, Gleb Natapov wrote:
> > On Tue, Aug 06, 2013 at 09:47:23AM +0200, Jan Kiszka wrote:
> >> On 2013-08-05 13:40, Gleb Natapov wrote:
> >>> On Mon, Aug 05, 2013 at 07:27:33PM +0800, Arthur Chunqi Li wrote:
> >>>> On Mon, Aug 5, 2013 at 4:07 PM, Gleb Natapov <gleb@redhat.com> wrote:
> >>>>> From: Nadav Har'El <nyh@il.ibm.com>
> >>>>>
> >>>>> Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
> >>>>> switch the EFER MSR when EPT is used and the host and guest have different
> >>>>> NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2)
> >>>>> and want to be able to run recent KVM as L1, we need to allow L1 to use this
> >>>>> EFER switching feature.
> >>>>>
> >>>>> To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available,
> >>>>> and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds
> >>>>> support for the former (the latter is still unsupported).
> >>>>>
> >>>>> Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state,
> >>>>> respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all
> >>>>> that's left to do in this patch is to properly advertise this feature to L1.
> >>>>>
> >>>>> Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using
> >>>>> vmx_set_efer (which itself sets one of several vmcs02 fields), so we always
> >>>>> support this feature, regardless of whether the host supports it.
> >>>>>
> >>>>> Reviewed-by: Orit Wasserman <owasserm@redhat.com>
> >>>>> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
> >>>>> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
> >>>>> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
> >>>>> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> >>>>> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> >>>>> ---
> >>>>>  arch/x86/kvm/vmx.c |   23 ++++++++++++++++-------
> >>>>>  1 file changed, 16 insertions(+), 7 deletions(-)
> >>>>>
> >>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>>> index e999dc7..27efa6a 100644
> >>>>> --- a/arch/x86/kvm/vmx.c
> >>>>> +++ b/arch/x86/kvm/vmx.c
> >>>>> @@ -2198,7 +2198,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
> >>>>>  #else
> >>>>>         nested_vmx_exit_ctls_high = 0;
> >>>>>  #endif
> >>>>> -       nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
> >>>>> +       nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> >>>>> +                                     VM_EXIT_LOAD_IA32_EFER);
> >>>> Gleb, why we don't need to check whether host supports
> >>>> VM_EXIT_LOAD_IA32_EFER here, as what you noted in my
> >>>> VM_EXIT_LOAD_IA32_PAT patch?
> >>> Nested VMX completely emulates the capability.
> >>
> >> No, it doesn't. The values for host/guest are handled over via the
> >> corresponding VMCS fields, physically, even though the actual loading is
> >> emulated then. So we must not expose this feature unconditionally.
> > Can you show me the code where it happens?
> 
> When the guest writes to HOST/GUEST_IA32_EFER, we will store this in the
> vmcs that will then become the active one on next L1/L2 entry, no?
> 
Guest writes are stored in vmcs12, which is not a HW vmcs, just a format
kvm uses internally (see vmcs12_write_any). During guest entry vmcs02
is constructed from vmcs12 (see prepare_vmcs02), and this function does
not access HOST/GUEST_IA32_EFER directly; it uses vmx_set_efer instead,
which takes care of things. The same function accesses GUEST_IA32_PAT
directly, though.
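
Roughly, as a heavily simplified model (not actual KVM code; the struct,
field and function names below only stand in for vmcs12,
prepare_vmcs02() and vmx_set_efer(); VM_ENTRY_LOAD_IA32_EFER is bit 15
of the entry controls per the SDM, and the EFER values are hypothetical):

#include <stdio.h>
#include <stdint.h>

#define VM_ENTRY_LOAD_IA32_EFER	(1u << 15)	/* bit 15 per the SDM */

struct sw_vmcs12 {			/* software image of L1's VMCS */
	uint32_t vm_entry_controls;
	uint64_t guest_ia32_efer;
};

static uint64_t vcpu_efer;		/* stands in for vcpu->arch.efer */

static void set_efer(uint64_t efer)	/* stands in for vmx_set_efer() */
{
	vcpu_efer = efer;		/* the "load" is done in software */
}

/* roughly what prepare_vmcs02() does for EFER on nested VM-entry */
static void enter_l2(const struct sw_vmcs12 *v12, uint64_t l1_efer)
{
	if (v12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
		set_efer(v12->guest_ia32_efer);
	else
		set_efer(l1_efer);	/* otherwise keep L1's EFER */
}

int main(void)
{
	struct sw_vmcs12 v12 = {
		.vm_entry_controls = VM_ENTRY_LOAD_IA32_EFER,
		.guest_ia32_efer   = 0xd01,	/* hypothetical L2 EFER */
	};

	enter_l2(&v12, 0x500);			/* hypothetical L1 EFER */
	printf("effective EFER: %#llx\n", (unsigned long long)vcpu_efer);
	return 0;
}

The point is that guest_ia32_efer only ever flows through the software
struct and the vmx_set_efer() path, so, per the explanation above, no
hardware support for the control is required for the emulation.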

--
			Gleb.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 10/15] nEPT: Add nEPT violation/misconfiguration support
  2013-08-05  8:07 ` [PATCH v7 10/15] nEPT: Add nEPT violation/misconfiguration support Gleb Natapov
@ 2013-08-06  8:01   ` Jan Kiszka
  2013-08-06  9:00   ` Gleb Natapov
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2013-08-06  8:01 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

[-- Attachment #1: Type: text/plain, Size: 2922 bytes --]

On 2013-08-05 10:07, Gleb Natapov wrote:
> From: Yang Zhang <yang.z.zhang@Intel.com>
> 
> Inject nEPT fault to L1 guest. This patch is original from Xinhao.
> 
> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    4 +++
>  arch/x86/kvm/mmu.c              |   61 ++++++++++++++++++++++++++++++++-------
>  arch/x86/kvm/paging_tmpl.h      |   25 ++++++++++++++--
>  arch/x86/kvm/vmx.c              |   19 ++++++++++++
>  4 files changed, 95 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 531f47c..58a17c0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -286,6 +286,7 @@ struct kvm_mmu {
>  	u64 *pae_root;
>  	u64 *lm_root;
>  	u64 rsvd_bits_mask[2][4];
> +	u64 bad_mt_xwr;
>  
>  	/*
>  	 * Bitmap: bit set = last pte in walk
> @@ -512,6 +513,9 @@ struct kvm_vcpu_arch {
>  	 * instruction.
>  	 */
>  	bool write_fault_to_shadow_pgtable;
> +
> +	/* set at EPT violation at this point */
> +	unsigned long exit_qualification;
>  };
>  
>  struct kvm_lpage_info {
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index a512ecf..f2d982d 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3519,6 +3519,8 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
>  	int maxphyaddr = cpuid_maxphyaddr(vcpu);
>  	u64 exb_bit_rsvd = 0;
>  
> +	context->bad_mt_xwr = 0;
> +
>  	if (!context->nx)
>  		exb_bit_rsvd = rsvd_bits(63, 63);
>  	switch (context->root_level) {
> @@ -3574,7 +3576,40 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
>  	}
>  }
>  
> -static void update_permission_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
> +static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
> +		struct kvm_mmu *context, bool execonly)
> +{
> +	int maxphyaddr = cpuid_maxphyaddr(vcpu);
> +	int pte;
> +
> +	context->rsvd_bits_mask[0][3] =
> +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 7);
> +	context->rsvd_bits_mask[0][2] =
> +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 6);
> +	context->rsvd_bits_mask[0][1] =
> +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 6);
> +	context->rsvd_bits_mask[0][0] = rsvd_bits(maxphyaddr, 51);
> +
> +	/* large page */
> +	context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
> +	context->rsvd_bits_mask[1][2] =
> +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 29);
> +	context->rsvd_bits_mask[1][1] =
> +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 20);
> +	context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
> +	

Here is a whitespace error (says git am).

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  2013-08-06  8:00             ` Gleb Natapov
@ 2013-08-06  8:18               ` Jan Kiszka
  2013-08-06  8:24                 ` Gleb Natapov
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Kiszka @ 2013-08-06  8:18 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Arthur Chunqi Li, kvm, Xiao Guangrong, Jun Nakajima, Yang Zhang,
	Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 3987 bytes --]

On 2013-08-06 10:00, Gleb Natapov wrote:
> On Tue, Aug 06, 2013 at 09:55:09AM +0200, Jan Kiszka wrote:
>> On 2013-08-06 09:51, Gleb Natapov wrote:
>>> On Tue, Aug 06, 2013 at 09:47:23AM +0200, Jan Kiszka wrote:
>>>> On 2013-08-05 13:40, Gleb Natapov wrote:
>>>>> On Mon, Aug 05, 2013 at 07:27:33PM +0800, Arthur Chunqi Li wrote:
>>>>>> On Mon, Aug 5, 2013 at 4:07 PM, Gleb Natapov <gleb@redhat.com> wrote:
>>>>>>> From: Nadav Har'El <nyh@il.ibm.com>
>>>>>>>
>>>>>>> Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
>>>>>>> switch the EFER MSR when EPT is used and the host and guest have different
>>>>>>> NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2)
>>>>>>> and want to be able to run recent KVM as L1, we need to allow L1 to use this
>>>>>>> EFER switching feature.
>>>>>>>
>>>>>>> To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available,
>>>>>>> and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds
>>>>>>> support for the former (the latter is still unsupported).
>>>>>>>
>>>>>>> Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state,
>>>>>>> respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all
>>>>>>> that's left to do in this patch is to properly advertise this feature to L1.
>>>>>>>
>>>>>>> Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using
>>>>>>> vmx_set_efer (which itself sets one of several vmcs02 fields), so we always
>>>>>>> support this feature, regardless of whether the host supports it.
>>>>>>>
>>>>>>> Reviewed-by: Orit Wasserman <owasserm@redhat.com>
>>>>>>> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
>>>>>>> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
>>>>>>> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
>>>>>>> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
>>>>>>> Signed-off-by: Gleb Natapov <gleb@redhat.com>
>>>>>>> ---
>>>>>>>  arch/x86/kvm/vmx.c |   23 ++++++++++++++++-------
>>>>>>>  1 file changed, 16 insertions(+), 7 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>>>> index e999dc7..27efa6a 100644
>>>>>>> --- a/arch/x86/kvm/vmx.c
>>>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>>>> @@ -2198,7 +2198,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>>>>>>>  #else
>>>>>>>         nested_vmx_exit_ctls_high = 0;
>>>>>>>  #endif
>>>>>>> -       nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
>>>>>>> +       nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
>>>>>>> +                                     VM_EXIT_LOAD_IA32_EFER);
>>>>>> Gleb, why we don't need to check whether host supports
>>>>>> VM_EXIT_LOAD_IA32_EFER here, as what you noted in my
>>>>>> VM_EXIT_LOAD_IA32_PAT patch?
>>>>> Nested VMX completely emulates the capability.
>>>>
>>>> No, it doesn't. The values for host/guest are handled over via the
>>>> corresponding VMCS fields, physically, even though the actual loading is
>>>> emulated then. So we must not expose this feature unconditionally.
>>> Can you show me the code where it happens?
>>
>> When the guest writes to HOST/GUEST_IA32_EFER, we will store this in the
>> vmcs that will then become the active one on next L1/L2 entry, no?
>>
> Guest writes are stored in vmcs12 which is not HW vmcs, just a format
> kvm uses internally (see vmcs12_write_any). During guest entry vmcs02 
> is constructed from vmcs12 (see prepare_vmcs02) and this function does
> not access HOST/GUEST_IA32_EFER directly, it uses vmx_set_efer instead
> which takes care of things. Same function access GUEST_IA32_PAT directly
> though.

OK, right.

Is it also safe to write to any field of a shadow VMCS? That's currently
hypothetical, but what if the host supports shadowing but not EFER
loading? I don't see a technical reason but also no clear statement in
the SDM regarding this scenario.

Jan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 13/15] nEPT: Advertise EPT to L1
  2013-08-05  8:07 ` [PATCH v7 13/15] nEPT: Advertise EPT to L1 Gleb Natapov
@ 2013-08-06  8:20   ` Jan Kiszka
  2013-08-06  9:02   ` Gleb Natapov
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2013-08-06  8:20 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

[-- Attachment #1: Type: text/plain, Size: 2382 bytes --]

On 2013-08-05 10:07, Gleb Natapov wrote:
> From: Nadav Har'El <nyh@il.ibm.com>
> 
> Advertise the support of EPT to the L1 guest, through the appropriate MSR.
> 
> This is the last patch of the basic Nested EPT feature, so as to allow
> bisection through this patch series: The guest will not see EPT support until
> this last patch, and will not attempt to use the half-applied feature.
> 
> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> ---
>  arch/x86/kvm/vmx.c |   20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 51c06c2..c65a08a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2250,6 +2250,22 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>  		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>  		SECONDARY_EXEC_WBINVD_EXITING;
>  
> +	if (enable_ept) {
> +		/* nested EPT: emulate EPT also to L1 */
> +		nested_vmx_secondary_ctls_high |= SECONDARY_EXEC_ENABLE_EPT;
> +		nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT;
> +		nested_vmx_ept_caps |= VMX_EPT_INVEPT_BIT;
> +		nested_vmx_ept_caps &= vmx_capability.ept;
> +		/*
> +		 * Since invept is completely emulated we support both global
> +		 * and context invalidation independent of what host cpu
> +		 * supports
> +		 */
> +	       	nested_vmx_ept_caps |= VMX_EPT_EXTENT_GLOBAL_BIT |

And another whitespace damage (space after tab).

Jan

> +			VMX_EPT_EXTENT_CONTEXT_BIT;
> +	} else
> +		nested_vmx_ept_caps = 0;
> +
>  	/* miscellaneous data */
>  	rdmsr(MSR_IA32_VMX_MISC, nested_vmx_misc_low, nested_vmx_misc_high);
>  	nested_vmx_misc_low &= VMX_MISC_PREEMPTION_TIMER_RATE_MASK |
> @@ -2358,8 +2374,8 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
>  					nested_vmx_secondary_ctls_high);
>  		break;
>  	case MSR_IA32_VMX_EPT_VPID_CAP:
> -		/* Currently, no nested ept or nested vpid */
> -		*pdata = 0;
> +		/* Currently, no nested vpid support */
> +		*pdata = nested_vmx_ept_caps;
>  		break;
>  	default:
>  		return 0;
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  2013-08-06  8:18               ` Jan Kiszka
@ 2013-08-06  8:24                 ` Gleb Natapov
  0 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-06  8:24 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Arthur Chunqi Li, kvm, Xiao Guangrong, Jun Nakajima, Yang Zhang,
	Paolo Bonzini

On Tue, Aug 06, 2013 at 10:18:17AM +0200, Jan Kiszka wrote:
> On 2013-08-06 10:00, Gleb Natapov wrote:
> > On Tue, Aug 06, 2013 at 09:55:09AM +0200, Jan Kiszka wrote:
> >> On 2013-08-06 09:51, Gleb Natapov wrote:
> >>> On Tue, Aug 06, 2013 at 09:47:23AM +0200, Jan Kiszka wrote:
> >>>> On 2013-08-05 13:40, Gleb Natapov wrote:
> >>>>> On Mon, Aug 05, 2013 at 07:27:33PM +0800, Arthur Chunqi Li wrote:
> >>>>>> On Mon, Aug 5, 2013 at 4:07 PM, Gleb Natapov <gleb@redhat.com> wrote:
> >>>>>>> From: Nadav Har'El <nyh@il.ibm.com>
> >>>>>>>
> >>>>>>> Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
> >>>>>>> switch the EFER MSR when EPT is used and the host and guest have different
> >>>>>>> NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2)
> >>>>>>> and want to be able to run recent KVM as L1, we need to allow L1 to use this
> >>>>>>> EFER switching feature.
> >>>>>>>
> >>>>>>> To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available,
> >>>>>>> and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds
> >>>>>>> support for the former (the latter is still unsupported).
> >>>>>>>
> >>>>>>> Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state,
> >>>>>>> respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all
> >>>>>>> that's left to do in this patch is to properly advertise this feature to L1.
> >>>>>>>
> >>>>>>> Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using
> >>>>>>> vmx_set_efer (which itself sets one of several vmcs02 fields), so we always
> >>>>>>> support this feature, regardless of whether the host supports it.
> >>>>>>>
> >>>>>>> Reviewed-by: Orit Wasserman <owasserm@redhat.com>
> >>>>>>> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
> >>>>>>> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
> >>>>>>> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
> >>>>>>> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> >>>>>>> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> >>>>>>> ---
> >>>>>>>  arch/x86/kvm/vmx.c |   23 ++++++++++++++++-------
> >>>>>>>  1 file changed, 16 insertions(+), 7 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>>>>> index e999dc7..27efa6a 100644
> >>>>>>> --- a/arch/x86/kvm/vmx.c
> >>>>>>> +++ b/arch/x86/kvm/vmx.c
> >>>>>>> @@ -2198,7 +2198,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
> >>>>>>>  #else
> >>>>>>>         nested_vmx_exit_ctls_high = 0;
> >>>>>>>  #endif
> >>>>>>> -       nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
> >>>>>>> +       nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> >>>>>>> +                                     VM_EXIT_LOAD_IA32_EFER);
> >>>>>> Gleb, why we don't need to check whether host supports
> >>>>>> VM_EXIT_LOAD_IA32_EFER here, as what you noted in my
> >>>>>> VM_EXIT_LOAD_IA32_PAT patch?
> >>>>> Nested VMX completely emulates the capability.
> >>>>
> >>>> No, it doesn't. The values for host/guest are handled over via the
> >>>> corresponding VMCS fields, physically, even though the actual loading is
> >>>> emulated then. So we must not expose this feature unconditionally.
> >>> Can you show me the code where it happens?
> >>
> >> When the guest writes to HOST/GUEST_IA32_EFER, we will store this in the
> >> vmcs that will then become the active one on next L1/L2 entry, no?
> >>
> > Guest writes are stored in vmcs12 which is not HW vmcs, just a format
> > kvm uses internally (see vmcs12_write_any). During guest entry vmcs02 
> > is constructed from vmcs12 (see prepare_vmcs02) and this function does
> > not access HOST/GUEST_IA32_EFER directly, it uses vmx_set_efer instead
> > which takes care of things. Same function access GUEST_IA32_PAT directly
> > though.
> 
> OK, right.
> 
Is it also safe to write to any field of a shadow VMCS? That's currently
hypothetical, but what if the host supports shadowing but not EFER
loading? I don't see a technical reason against it, but also no clear
statement in the SDM regarding this scenario.
> 
We do not shadow all VMCS fields, only some of them (see the
shadow_read_write_fields array). HOST/GUEST_IA32_EFER are not among the
fields we shadow.

--
			Gleb.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v7 10/15] nEPT: Add nEPT violation/misconfiguration support
  2013-08-05  8:07 ` [PATCH v7 10/15] nEPT: Add nEPT violation/misconfiguration support Gleb Natapov
  2013-08-06  8:01   ` Jan Kiszka
@ 2013-08-06  9:00   ` Gleb Natapov
  1 sibling, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-06  9:00 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

From: Yang Zhang <yang.z.zhang@Intel.com>

Inject nEPT faults into the L1 guest. This patch originally comes from Xinhao.
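
As an illustration of the bad_mt_xwr bitmap built by
reset_rsvds_bits_mask_ept() below, the same loop can be restated as a
stand-alone user-space program (illustration only): every combination of
an EPT PTE's low six bits (R/W/X in bits 0-2, memory type in bits 3-5)
is tested, and the architecturally illegal ones (write without read,
execute-only when unsupported, reserved memory types 2, 3 and 7) get
their bit set, so a single lookup of an entry's low six bits in the
bitmap later flags a misconfiguration:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	int execonly = 0;	/* execute-only mappings not supported */
	uint64_t bad_mt_xwr = 0;
	int pte;

	for (pte = 0; pte < 64; pte++) {
		int rwx_bits = pte & 7;		/* bits 0-2: R, W, X */
		int mt = pte >> 3;		/* bits 3-5: memory type */

		if (mt == 0x2 || mt == 0x3 || mt == 0x7 ||	/* reserved MT */
		    rwx_bits == 0x2 || rwx_bits == 0x6 ||	/* W without R */
		    (rwx_bits == 0x4 && !execonly))		/* X-only */
			bad_mt_xwr |= 1ull << pte;
	}

	printf("bad_mt_xwr = %#018llx\n", (unsigned long long)bad_mt_xwr);
	return 0;
}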

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 fix white space


 arch/x86/include/asm/kvm_host.h |    4 +++
 arch/x86/kvm/mmu.c              |   61 ++++++++++++++++++++++++++++++++-------
 arch/x86/kvm/paging_tmpl.h      |   25 ++++++++++++++--
 arch/x86/kvm/vmx.c              |   19 ++++++++++++
 4 files changed, 95 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 531f47c..58a17c0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -286,6 +286,7 @@ struct kvm_mmu {
 	u64 *pae_root;
 	u64 *lm_root;
 	u64 rsvd_bits_mask[2][4];
+	u64 bad_mt_xwr;
 
 	/*
 	 * Bitmap: bit set = last pte in walk
@@ -512,6 +513,9 @@ struct kvm_vcpu_arch {
 	 * instruction.
 	 */
 	bool write_fault_to_shadow_pgtable;
+
+	/* set at EPT violation at this point */
+	unsigned long exit_qualification;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a512ecf..7443f82 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3519,6 +3519,8 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 	int maxphyaddr = cpuid_maxphyaddr(vcpu);
 	u64 exb_bit_rsvd = 0;
 
+	context->bad_mt_xwr = 0;
+
 	if (!context->nx)
 		exb_bit_rsvd = rsvd_bits(63, 63);
 	switch (context->root_level) {
@@ -3574,7 +3576,40 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 	}
 }
 
-static void update_permission_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
+static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
+		struct kvm_mmu *context, bool execonly)
+{
+	int maxphyaddr = cpuid_maxphyaddr(vcpu);
+	int pte;
+
+	context->rsvd_bits_mask[0][3] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 7);
+	context->rsvd_bits_mask[0][2] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 6);
+	context->rsvd_bits_mask[0][1] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 6);
+	context->rsvd_bits_mask[0][0] = rsvd_bits(maxphyaddr, 51);
+
+	/* large page */
+	context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
+	context->rsvd_bits_mask[1][2] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 29);
+	context->rsvd_bits_mask[1][1] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 20);
+	context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
+
+	for (pte = 0; pte < 64; pte++) {
+		int rwx_bits = pte & 7;
+		int mt = pte >> 3;
+		if (mt == 0x2 || mt == 0x3 || mt == 0x7 ||
+				rwx_bits == 0x2 || rwx_bits == 0x6 ||
+				(rwx_bits == 0x4 && !execonly))
+			context->bad_mt_xwr |= (1ull << pte);
+	}
+}
+
+static void update_permission_bitmask(struct kvm_vcpu *vcpu,
+		struct kvm_mmu *mmu, bool ept)
 {
 	unsigned bit, byte, pfec;
 	u8 map;
@@ -3592,12 +3627,16 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu
 			w = bit & ACC_WRITE_MASK;
 			u = bit & ACC_USER_MASK;
 
-			/* Not really needed: !nx will cause pte.nx to fault */
-			x |= !mmu->nx;
-			/* Allow supervisor writes if !cr0.wp */
-			w |= !is_write_protection(vcpu) && !uf;
-			/* Disallow supervisor fetches of user code if cr4.smep */
-			x &= !(smep && u && !uf);
+			if (!ept) {
+				/* Not really needed: !nx will cause pte.nx to fault */
+				x |= !mmu->nx;
+				/* Allow supervisor writes if !cr0.wp */
+				w |= !is_write_protection(vcpu) && !uf;
+				/* Disallow supervisor fetches of user code if cr4.smep */
+				x &= !(smep && u && !uf);
+			} else
+				/* Not really needed: no U/S accesses on ept  */
+				u = 1;
 
 			fault = (ff && !x) || (uf && !u) || (wf && !w);
 			map |= fault << bit;
@@ -3632,7 +3671,7 @@ static int paging64_init_context_common(struct kvm_vcpu *vcpu,
 	context->root_level = level;
 
 	reset_rsvds_bits_mask(vcpu, context);
-	update_permission_bitmask(vcpu, context);
+	update_permission_bitmask(vcpu, context, false);
 	update_last_pte_bitmap(vcpu, context);
 
 	ASSERT(is_pae(vcpu));
@@ -3662,7 +3701,7 @@ static int paging32_init_context(struct kvm_vcpu *vcpu,
 	context->root_level = PT32_ROOT_LEVEL;
 
 	reset_rsvds_bits_mask(vcpu, context);
-	update_permission_bitmask(vcpu, context);
+	update_permission_bitmask(vcpu, context, false);
 	update_last_pte_bitmap(vcpu, context);
 
 	context->new_cr3 = paging_new_cr3;
@@ -3724,7 +3763,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 		context->gva_to_gpa = paging32_gva_to_gpa;
 	}
 
-	update_permission_bitmask(vcpu, context);
+	update_permission_bitmask(vcpu, context, false);
 	update_last_pte_bitmap(vcpu, context);
 
 	return 0;
@@ -3803,7 +3842,7 @@ static int init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 		g_context->gva_to_gpa = paging32_gva_to_gpa_nested;
 	}
 
-	update_permission_bitmask(vcpu, g_context);
+	update_permission_bitmask(vcpu, g_context, false);
 	update_last_pte_bitmap(vcpu, g_context);
 
 	return 0;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index f8e5680..7f6af8e 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -129,10 +129,10 @@ static inline void FNAME(protect_clean_gpte)(unsigned *access, unsigned gpte)
 
 static bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte, int level)
 {
-	int bit7;
+	int bit7 = (gpte >> 7) & 1, low6 = gpte & 0x3f;
 
-	bit7 = (gpte >> 7) & 1;
-	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
+	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) |
+		((mmu->bad_mt_xwr & (1ull << low6)) != 0);
 }
 
 static inline int FNAME(is_present_gpte)(unsigned long pte)
@@ -386,6 +386,25 @@ error:
 	walker->fault.vector = PF_VECTOR;
 	walker->fault.error_code_valid = true;
 	walker->fault.error_code = errcode;
+
+#if PTTYPE == PTTYPE_EPT
+	/*
+	 * Use PFERR_RSVD_MASK in error_code to tell whether an EPT
+	 * misconfiguration needs to be injected. The detection is
+	 * done by is_rsvd_bits_set() above.
+	 *
+	 * We set up the value of exit_qualification to inject:
+	 * [2:0] - Derive from [2:0] of real exit_qualification at EPT violation
+	 * [5:3] - Calculated by the page walk of the guest EPT page tables
+	 * [8:7] - Derived from [8:7] of real exit_qualification
+	 *
+	 * The other bits are set to 0.
+	 */
+	if (!(errcode & PFERR_RSVD_MASK)) {
+		vcpu->arch.exit_qualification &= 0x187;
+		vcpu->arch.exit_qualification |= ((pt_access & pte) & 0x7) << 3;
+	}
+#endif
 	walker->fault.address = addr;
 	walker->fault.nested_page_fault = mmu != vcpu->arch.walk_mmu;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4e98764..984f8d7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5317,9 +5317,13 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 
 	/* It is a write fault? */
 	error_code = exit_qualification & (1U << 1);
+	/* It is a fetch fault? */
+	error_code |= (exit_qualification & (1U << 2)) << 2;
 	/* ept page table is present? */
 	error_code |= (exit_qualification >> 3) & 0x1;
 
+	vcpu->arch.exit_qualification = exit_qualification;
+
 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
 
@@ -7348,6 +7352,21 @@ static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 		entry->ecx |= bit(X86_FEATURE_VMX);
 }
 
+static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
+		struct x86_exception *fault)
+{
+	struct vmcs12 *vmcs12;
+	nested_vmx_vmexit(vcpu);
+	vmcs12 = get_vmcs12(vcpu);
+
+	if (fault->error_code & PFERR_RSVD_MASK)
+		vmcs12->vm_exit_reason = EXIT_REASON_EPT_MISCONFIG;
+	else
+		vmcs12->vm_exit_reason = EXIT_REASON_EPT_VIOLATION;
+	vmcs12->exit_qualification = vcpu->arch.exit_qualification;
+	vmcs12->guest_physical_address = fault->address;
+}
+
 /*
  * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
  * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
-- 
1.7.10.4

--
			Gleb.
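
For readers decoding the bad_mt_xwr construction in this patch, here is a
small standalone sketch in plain userspace C (not KVM code; the helper
names and sample values are illustrative only) that rebuilds the same
bitmap from the loop above and classifies a few example EPT PTE low-6-bit
values:

#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

/*
 * One bit per possible value of an EPT PTE's low 6 bits (memory type in
 * bits 5:3, X/W/R in bits 2:0); a set bit means "this combination is a
 * misconfiguration".
 */
static uint64_t build_bad_mt_xwr(bool execonly)
{
	uint64_t bad_mt_xwr = 0;
	int pte;

	for (pte = 0; pte < 64; pte++) {
		int rwx_bits = pte & 7;
		int mt = pte >> 3;

		if (mt == 0x2 || mt == 0x3 || mt == 0x7 ||	/* reserved memory types */
		    rwx_bits == 0x2 || rwx_bits == 0x6 ||	/* write without read */
		    (rwx_bits == 0x4 && !execonly))		/* execute-only not supported */
			bad_mt_xwr |= 1ull << pte;
	}
	return bad_mt_xwr;
}

int main(void)
{
	uint64_t bad = build_bad_mt_xwr(false);
	/* sample combinations: UC+RW, write-only, WB+RWX, reserved memory type 2 */
	unsigned int samples[] = { 0x03, 0x02, 0x37, 0x17 };
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("pte low6=0x%02x -> %s\n", samples[i],
		       (bad & (1ull << samples[i])) ? "misconfig" : "ok");
	return 0;
}

With a single bitmap lookup, is_rsvd_bits_set() can then flag both
reserved-bit violations and EPT misconfigurations during the guest EPT
page walk.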

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 13/15] nEPT: Advertise EPT to L1
  2013-08-05  8:07 ` [PATCH v7 13/15] nEPT: Advertise EPT to L1 Gleb Natapov
  2013-08-06  8:20   ` Jan Kiszka
@ 2013-08-06  9:02   ` Gleb Natapov
  1 sibling, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-08-06  9:02 UTC (permalink / raw)
  To: kvm; +Cc: Xiao Guangrong, Jun Nakajima, Yang Zhang, pbonzini

From: Nadav Har'El <nyh@il.ibm.com>

Advertise EPT support to the L1 guest through the appropriate capability MSR (IA32_VMX_EPT_VPID_CAP).

This is the last patch of the basic Nested EPT feature, so as to allow
bisection through this patch series: The guest will not see EPT support until
this last patch, and will not attempt to use the half-applied feature.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
Fix white space error.

 arch/x86/kvm/vmx.c |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 51c06c2..e23fdf2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2250,6 +2250,22 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
 		SECONDARY_EXEC_WBINVD_EXITING;
 
+	if (enable_ept) {
+		/* nested EPT: emulate EPT also to L1 */
+		nested_vmx_secondary_ctls_high |= SECONDARY_EXEC_ENABLE_EPT;
+		nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT;
+		nested_vmx_ept_caps |= VMX_EPT_INVEPT_BIT;
+		nested_vmx_ept_caps &= vmx_capability.ept;
+		/*
+		 * Since INVEPT is completely emulated, we support both global
+		 * and context invalidation regardless of what the host CPU
+		 * supports.
+		 */
+		nested_vmx_ept_caps |= VMX_EPT_EXTENT_GLOBAL_BIT |
+			VMX_EPT_EXTENT_CONTEXT_BIT;
+	} else
+		nested_vmx_ept_caps = 0;
+
 	/* miscellaneous data */
 	rdmsr(MSR_IA32_VMX_MISC, nested_vmx_misc_low, nested_vmx_misc_high);
 	nested_vmx_misc_low &= VMX_MISC_PREEMPTION_TIMER_RATE_MASK |
@@ -2358,8 +2374,8 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 					nested_vmx_secondary_ctls_high);
 		break;
 	case MSR_IA32_VMX_EPT_VPID_CAP:
-		/* Currently, no nested ept or nested vpid */
-		*pdata = 0;
+		/* Currently, no nested vpid support */
+		*pdata = nested_vmx_ept_caps;
 		break;
 	default:
 		return 0;
-- 
1.7.10.4

--
			Gleb.
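
As a rough illustration of what this advertisement means for L1 (an
assumed sketch, not code from this series): after this patch an L1
hypervisor probing MSR_IA32_VMX_EPT_VPID_CAP sees EPT capability bits
reported instead of zero. The bit positions below follow the SDM's
definition of that MSR, and rdmsr() stands in for whatever MSR-read
primitive L1 uses:

#include <stdint.h>
#include <stdbool.h>

#define MSR_IA32_VMX_EPT_VPID_CAP	0x48c
#define EPT_CAP_PAGE_WALK_4		(1ull << 6)	/* 4-level EPT page walk       */
#define EPT_CAP_INVEPT			(1ull << 20)	/* INVEPT supported            */
#define EPT_CAP_INVEPT_SINGLE_CTX	(1ull << 25)	/* single-context INVEPT       */
#define EPT_CAP_INVEPT_ALL_CTX		(1ull << 26)	/* all-context (global) INVEPT */

extern uint64_t rdmsr(uint32_t msr);	/* assumed MSR-read helper */

/* Decide whether L1 can turn on EPT for its own (L2) guests. */
static bool l1_can_use_ept(void)
{
	uint64_t cap = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP);

	return (cap & EPT_CAP_PAGE_WALK_4) &&
	       (cap & EPT_CAP_INVEPT) &&
	       (cap & (EPT_CAP_INVEPT_SINGLE_CTX | EPT_CAP_INVEPT_ALL_CTX));
}

Since INVEPT is fully emulated, both extent bits are reported even when
the host CPU implements only one of them.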

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 00/15] Nested EPT
  2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
                   ` (14 preceding siblings ...)
  2013-08-05  8:07 ` [PATCH v7 15/15] nEPT: Miscelleneous cleanups Gleb Natapov
@ 2013-08-07 14:07 ` Paolo Bonzini
  15 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2013-08-07 14:07 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Xiao Guangrong, Jun Nakajima, Yang Zhang

On 08/05/2013 10:07 AM, Gleb Natapov wrote:
> Xiao comment about checking ept pointer before flushing individual ept context
> is addressed here.
>
> Gleb Natapov (3):
>    nEPT: make guest's A/D bits depends on guest's paging mode
>    nEPT: Support shadow paging for guest paging without A/D bits
>    nEPT: correctly check if remote tlb flush is needed for shadowed EPT
>      tables
>
> Nadav Har'El (10):
>    nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
>    nEPT: Fix cr3 handling in nested exit and entry
>    nEPT: Fix wrong test in kvm_set_cr3
>    nEPT: Move common code to paging_tmpl.h
>    nEPT: Add EPT tables support to paging_tmpl.h
>    nEPT: MMU context for nested EPT
>    nEPT: Nested INVEPT
>    nEPT: Advertise EPT to L1
>    nEPT: Some additional comments
>    nEPT: Miscelleneous cleanups
>
> Yang Zhang (2):
>    nEPT: Redefine EPT-specific link_shadow_page()
>    nEPT: Add nEPT violation/misconfigration support
>
>   arch/x86/include/asm/kvm_host.h |    4 +
>   arch/x86/include/asm/vmx.h      |    2 +
>   arch/x86/include/uapi/asm/vmx.h |    1 +
>   arch/x86/kvm/mmu.c              |  170 +++++++++++++++++-------------
>   arch/x86/kvm/mmu.h              |    2 +
>   arch/x86/kvm/paging_tmpl.h      |  176 +++++++++++++++++++++++++++----
>   arch/x86/kvm/vmx.c              |  220 ++++++++++++++++++++++++++++++++++++---
>   arch/x86/kvm/x86.c              |   11 --
>   8 files changed, 467 insertions(+), 119 deletions(-)
>

Applied, thanks (rebased on top of Xiao's walk_addr_generic fix).

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads: [~2013-08-07 14:07 UTC | newest]

Thread overview: 32+ messages
2013-08-05  8:07 [PATCH v7 00/15] Nested EPT Gleb Natapov
2013-08-05  8:07 ` [PATCH v7 01/15] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1 Gleb Natapov
2013-08-05 11:27   ` Arthur Chunqi Li
2013-08-05 11:40     ` Gleb Natapov
2013-08-06  7:47       ` Jan Kiszka
2013-08-06  7:51         ` Gleb Natapov
2013-08-06  7:55           ` Jan Kiszka
2013-08-06  8:00             ` Gleb Natapov
2013-08-06  8:18               ` Jan Kiszka
2013-08-06  8:24                 ` Gleb Natapov
2013-08-05 11:48     ` Paolo Bonzini
2013-08-05 12:05       ` Gleb Natapov
2013-08-05  8:07 ` [PATCH v7 02/15] nEPT: Fix cr3 handling in nested exit and entry Gleb Natapov
2013-08-05  8:07 ` [PATCH v7 03/15] nEPT: Fix wrong test in kvm_set_cr3 Gleb Natapov
2013-08-05  8:07 ` [PATCH v7 04/15] nEPT: Move common code to paging_tmpl.h Gleb Natapov
2013-08-05  8:07 ` [PATCH v7 05/15] nEPT: make guest's A/D bits depends on guest's paging mode Gleb Natapov
2013-08-05  8:07 ` [PATCH v7 06/15] nEPT: Support shadow paging for guest paging without A/D bits Gleb Natapov
2013-08-05  8:07 ` [PATCH v7 07/15] nEPT: Add EPT tables support to paging_tmpl.h Gleb Natapov
2013-08-05  8:07 ` [PATCH v7 08/15] nEPT: Redefine EPT-specific link_shadow_page() Gleb Natapov
2013-08-05  8:07 ` [PATCH v7 09/15] nEPT: correctly check if remote tlb flush is needed for shadowed EPT tables Gleb Natapov
2013-08-05  8:07 ` [PATCH v7 10/15] nEPT: Add nEPT violation/misconfigration support Gleb Natapov
2013-08-06  8:01   ` Jan Kiszka
2013-08-06  9:00   ` Gleb Natapov
2013-08-05  8:07 ` [PATCH v7 11/15] nEPT: MMU context for nested EPT Gleb Natapov
2013-08-05  8:07 ` [PATCH v7 12/15] nEPT: Nested INVEPT Gleb Natapov
2013-08-05  8:53   ` Xiao Guangrong
2013-08-05  8:07 ` [PATCH v7 13/15] nEPT: Advertise EPT to L1 Gleb Natapov
2013-08-06  8:20   ` Jan Kiszka
2013-08-06  9:02   ` Gleb Natapov
2013-08-05  8:07 ` [PATCH v7 14/15] nEPT: Some additional comments Gleb Natapov
2013-08-05  8:07 ` [PATCH v7 15/15] nEPT: Miscelleneous cleanups Gleb Natapov
2013-08-07 14:07 ` [PATCH v7 00/15] Nested EPT Paolo Bonzini
