All of lore.kernel.org
 help / color / mirror / Atom feed
From: Orit Wasserman <owasserm@redhat.com>
To: "Nadav Har'El" <nyh@math.technion.ac.il>
Cc: Avi Kivity <avi@redhat.com>,
	kvm@vger.kernel.org, "Roedel, Joerg" <Joerg.Roedel@amd.com>,
	abelg@il.ibm.com
Subject: Re: [PATCH 02/10] nEPT: MMU context for nested EPT
Date: Sun, 13 Nov 2011 13:30:15 +0200	[thread overview]
Message-ID: <4EBFAA47.406@redhat.com> (raw)
In-Reply-To: <20111112213747.GA741@fermat.math.technion.ac.il>

[-- Attachment #1: Type: text/plain, Size: 4692 bytes --]

On 11/12/2011 11:37 PM, Nadav Har'El wrote:
> On Sat, Nov 12, 2011, Avi Kivity wrote about "Re: [PATCH 02/10] nEPT: MMU context for nested EPT":
>> host may write-protect a page.  Second, the shadow and guest ptes may be
>> in different formats (ept vs ia32).
> 
> I'm afraid I've lost you here... The shadow table and the to-be-shadowed 
> table are both ia32 (this is the normal shadow table code), or both ept
> (the nested tdp code). When are they supposed to be in different
> formats (ept vs ia32)?
> 
> I'm also puzzled in what situation will the host will write-protect an EPT02
> (shadow EPT) page?
> 
>> In fact that happens to accidentally work, no?  Intermediate ptes are
>> always present/write/user, which translates to read/write/execute in EPT.
> 
> It didn't work because it also used to set the "accessed" bit, bit 5,
> which on EPT is reserved and caused EPT misconfiguration. So I had to
> fix link_shadow_page, or nested EPT would not work at all.
> 
>> Don't optimize for least changes, optimize for best result afterwards.
> 
> As I'm sure you remember, two years ago, in September 6 2009, you wrote in
> your blog about the newly contributed nested VMX patch set, and in
> particular its nested EPT (which predated the nested NPT contribution).
> 
> Nested EPT was, for some workloads, a huge performance improvement, but
> you (if I understand correctly) did not want that code in KVM because
> it, basically, optimized for getting the job done, in the most correct
> and most efficient manner - but without regard of how cleanly this fit with
> other types of shadowing (normal shadow page tables, and nested NPT),
> or how much of the code was being duplicated or circumvented.
> 
> So this time around, I couldn't really "not optimize for least changes".
> This time, the nested EPT had to fit (like a square peg in a round hole
> ;-)), into the preexisting MMU and NPT shadowing. I couldn't really just write
> the most correct and most efficient code (which Orit Wasserman already
> did, two years earlier). This time I needed to figure out the least obtrusive
> way of changing the existing code. The hardest thing about doing this
> was trying to understand all the complexities and subtleties of the existing
> MMU code in KVM, which already does 101 different cases in one
> overloaded piece of code, which is not commented or documented.
> And of course, add to that all the complexities (some might even say "cruft")
> which the underlying x86 architecture itself has acrued over the years.
> So it's not surprising I've missed some of the important subtleties which
> didn't have any effect in the typical case I've tried. Like I said, in my
> tests nested EPT *did* work. And even getting to that point was hard enough :-)
> 
>> We need a third variant of walk_addr_generic that parses EPT format
>> PTEs.  Whether that's best done by writing paging_ept.h or modifying
>> paging_tmpl.h, I don't know.
> 
> Thanks. I'll think about everything you've said in this thread (I'm still
> not convinced I understood all your points, so just understanding them
> will be the first step). I'll see what I can do to improve the patch.
> 
> But I have to be honest - I'm not sure how quickly I can finish this.
> I really appreciate all your comments about nested VMX in the last two
> years - most of them have been spot-on, 100% correct, and really helpful
> for making me understand things which I had previously misunderstood.
> However, since you are (of course) extremely familiar with every nook and
> cranny of KVM, what normally happens is that every comment which took you
> 5 minutes to figure out, takes me 5 days to fully understand, and to actually
> write, debug and test the fixed code. Every review that takes you two days
> to go through (and is very much appreciated!) takes me several months to fix
> each and every thing you asked for.
> 
> Don't get me wrong, I *am* planning to continue working (part-time) on nested
> VMX, and nested EPT in particular. But if you want it to pick up the pace,
> I could use some help with actual coding from people who have much more
> intimate knowledge of the non-nested-VMX parts of KVM than I have.
> 
> In the meantime, if anybody wants to experiment with a much faster
> Nested VMX than we had before, you can try my current patch. It may not
> be perfect, but in many ways it is better than the old shadow-on-ept code.
> And in simple (64 bit, 4k page) kvm-over-kvm configurations like I tried, it
> works well.
> 
> Nadav.
> 

Maybe this patch can help, this is roughly what Avi wants (I hope) done very quickly.
I'm sorry I don't have setup to run nested VMX at the moment so i can't test it.

Orit

[-- Attachment #2: 0001-Add-EPT_walk_addr_genric.patch --]
[-- Type: text/plain, Size: 5158 bytes --]

>From 032ba0221ea690f61ecc4f6d946090d18d6f4dfb Mon Sep 17 00:00:00 2001
From: Orit Wasserman <owasserm@redhard.com>
Date: Sun, 13 Nov 2011 12:57:17 +0200
Subject: [PATCH] Add EPT_walk_addr_genric

---
 arch/x86/kvm/mmu.c         |   13 +++++++++++++
 arch/x86/kvm/mmu.h         |    5 +++++
 arch/x86/kvm/paging_tmpl.h |   44 ++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9335e1b..bbe212f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3180,6 +3180,10 @@ static bool sync_mmio_spte(u64 *sptep, gfn_t gfn, unsigned access,
 #include "paging_tmpl.h"
 #undef PTTYPE
 
+#define PTTYPE EPT
+#include "paging_tmpl.h"
+#undef PTTYPE
+
 #define PTTYPE 32
 #include "paging_tmpl.h"
 #undef PTTYPE
@@ -3381,6 +3385,15 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	return r;
 }
 
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+	int r = kvm_init_shadow_mmu(vcpu, &vcpu->arch.mmu);
+
+	vcpu->arch.nested_mmu.gva_to_gpa = EPT_gva_to_gpa_nested;
+
+	return r;
+}
+
 static int init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *g_context = &vcpu->arch.nested_mmu;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e374db9..545e2ff 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,6 +48,11 @@
 #define PFERR_RSVD_MASK (1U << 3)
 #define PFERR_FETCH_MASK (1U << 4)
 
+#define EPT_WRITABLE_MASK 2
+#define EPT_EXEC_MASK 4
+
+#define EPT 18
+
 int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 507e2b8..70d4cfd 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -39,6 +39,21 @@
 	#define CMPXCHG cmpxchg64
 	#define PT_MAX_FULL_LEVELS 2
 	#endif
+#elif PTTYPE == EPT
+	#define pt_element_t u64
+	#define FNAME(name) EPT_##name
+	#define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
+	#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
+	#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
+	#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
+	#define PT_LEVEL_BITS PT64_LEVEL_BITS
+	#ifdef CONFIG_X86_64
+	#define PT_MAX_FULL_LEVELS 4
+	#define CMPXCHG cmpxchg
+	#else
+	#define CMPXCHG cmpxchg64
+	#define PT_MAX_FULL_LEVELS 2
+	#endif
 #elif PTTYPE == 32
 	#define pt_element_t u32
 	#define guest_walker guest_walker32
@@ -106,14 +121,19 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte,
 {
 	unsigned access;
 
+#if PTTYPE == EPT       
 	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
+#else
+	access = (gpte & EPT_WRITABLE_MASK) | EPT_EXEC_MASK;
 	if (last && !is_dirty_gpte(gpte))
 		access &= ~ACC_WRITE_MASK;
+#endif
 
 #if PTTYPE == 64
 	if (vcpu->arch.mmu.nx)
 		access &= ~(gpte >> PT64_NX_SHIFT);
-#endif
+#endif 
+
 	return access;
 }
 
@@ -149,9 +169,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	gpa_t pte_gpa;
 	bool eperm;
 	int offset;
+#if PTTYPE == EPT
+	const int write_fault = access & EPT_WRITABLE_MASK;
+	const int user_fault  = 0;
+	const int fetch_fault = 0;
+#else
 	const int write_fault = access & PFERR_WRITE_MASK;
 	const int user_fault  = access & PFERR_USER_MASK;
 	const int fetch_fault = access & PFERR_FETCH_MASK;
+#endif
 	u16 errcode = 0;
 
 	trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
@@ -174,6 +200,9 @@ retry_walk:
 	       (mmu->get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
 
 	pt_access = ACC_ALL;
+#if PTTYPE == EPT 
+	pt_access =  PT_PRESENT_MASK | EPT_WRITABLE_MASK | EPT_EXEC_MASK;
+#endif
 
 	for (;;) {
 		gfn_t real_gfn;
@@ -186,9 +215,14 @@ retry_walk:
 		pte_gpa   = gfn_to_gpa(table_gfn) + offset;
 		walker->table_gfn[walker->level - 1] = table_gfn;
 		walker->pte_gpa[walker->level - 1] = pte_gpa;
-
+#if PTTYPE == EPT 
+		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
+					      EPT_WRITABLE_MASK);
+#else
 		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
 					      PFERR_USER_MASK|PFERR_WRITE_MASK);
+#endif
+
 		if (unlikely(real_gfn == UNMAPPED_GVA))
 			goto error;
 		real_gfn = gpa_to_gfn(real_gfn);
@@ -221,6 +255,7 @@ retry_walk:
 			eperm = true;
 #endif
 
+#if PTTYPE != EPT
 		if (!eperm && unlikely(!(pte & PT_ACCESSED_MASK))) {
 			int ret;
 			trace_kvm_mmu_set_accessed_bit(table_gfn, index,
@@ -235,7 +270,7 @@ retry_walk:
 			mark_page_dirty(vcpu->kvm, table_gfn);
 			pte |= PT_ACCESSED_MASK;
 		}
-
+#endif
 		walker->ptes[walker->level - 1] = pte;
 
 		if (FNAME(is_last_gpte)(walker, vcpu, mmu, pte)) {
@@ -244,12 +279,13 @@ retry_walk:
 			gfn_t gfn;
 			u32 ac;
 
+#if PTTYPE != EPT
 			/* check if the kernel is fetching from user page */
 			if (unlikely(pte_access & PT_USER_MASK) &&
 			    kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
 				if (fetch_fault && !user_fault)
 					eperm = true;
-
+#endif
 			gfn = gpte_to_gfn_lvl(pte, lvl);
 			gfn += (addr & PT_LVL_OFFSET_MASK(lvl)) >> PAGE_SHIFT;
 
-- 
1.7.6.4


  parent reply	other threads:[~2011-11-13 11:32 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-10  9:57 [PATCH 0/10] nEPT: Nested EPT support for Nested VMX Nadav Har'El
2011-11-10  9:58 ` [PATCH 01/10] nEPT: Module option Nadav Har'El
2011-11-10 12:23   ` Avi Kivity
2011-11-10 14:21     ` Nadav Har'El
2011-11-10 14:38       ` Avi Kivity
2011-11-10 15:14         ` Nadav Har'El
2011-11-10 15:21           ` Avi Kivity
2011-11-10  9:58 ` [PATCH 02/10] nEPT: MMU context for nested EPT Nadav Har'El
2011-11-10 10:31   ` Avi Kivity
2011-11-10 12:49   ` Avi Kivity
2011-11-10 14:40     ` Nadav Har'El
2011-11-10 15:19       ` Avi Kivity
2011-11-10 20:05         ` Nadav Har'El
2011-11-12 10:39           ` Avi Kivity
2011-11-12 21:37             ` Nadav Har'El
2011-11-13  9:10               ` Avi Kivity
2011-11-13 11:30               ` Orit Wasserman [this message]
2011-11-13 14:32                 ` Avi Kivity
2011-11-13 18:26                   ` Orit Wasserman
2011-11-14  8:25                     ` Avi Kivity
2011-12-08 15:21                       ` Nadav Har'El
2011-12-06 12:40                   ` Nadav Har'El
2011-12-06 13:07                     ` Avi Kivity
2011-11-23 15:06                 ` Nadav Har'El
2011-11-23 15:44                   ` Nadav Har'El
2011-11-24 13:36                     ` Avi Kivity
2011-12-07  9:06                 ` Nadav Har'El
2011-12-07 10:10                   ` Avi Kivity
2011-11-10  9:59 ` [PATCH 03/10] nEPT: Fix cr3 handling in nested exit and entry Nadav Har'El
2011-11-10  9:59 ` [PATCH 04/10] nEPT: Fix page table format in nested EPT Nadav Har'El
2011-11-10 10:37   ` Avi Kivity
2011-11-10 11:03     ` Nadav Har'El
2011-11-10 12:21       ` Avi Kivity
2011-11-10 12:50         ` Avi Kivity
2011-11-10 13:07   ` Orit Wasserman
2011-11-10 10:00 ` [PATCH 05/10] nEPT: Fix wrong test in kvm_set_cr3 Nadav Har'El
2011-11-10 10:00 ` [PATCH 06/10] nEPT: Some additional comments Nadav Har'El
2011-11-10 10:01 ` [PATCH 07/10] nEPT: Advertise EPT to L1 Nadav Har'El
2011-11-10 10:01 ` [PATCH 08/10] nEPT: Nested INVEPT Nadav Har'El
2011-11-10 12:17   ` Avi Kivity
2011-12-11 14:24     ` Nadav Har'El
2011-12-11 14:37       ` Avi Kivity
2011-11-10 10:02 ` [PATCH 09/10] nEPT: Documentation Nadav Har'El
2011-11-10 10:02 ` [PATCH 10/10] nEPT: Miscelleneous cleanups Nadav Har'El
2011-11-10 12:26 ` [PATCH 0/10] nEPT: Nested EPT support for Nested VMX Avi Kivity
2011-11-13  8:52   ` Nadav Har'El
2011-11-13  9:21     ` Avi Kivity
2011-12-12 11:37       ` Nadav Har'El
2011-12-12 13:04         ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4EBFAA47.406@redhat.com \
    --to=owasserm@redhat.com \
    --cc=Joerg.Roedel@amd.com \
    --cc=abelg@il.ibm.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=nyh@math.technion.ac.il \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.