* [PATCH v2 0/9] Optimize page table walk
@ 2012-09-16 12:07 Avi Kivity
  2012-09-16 12:07 ` [PATCH v2 1/9] KVM: MMU: Push clean gpte write protection out of gpte_access() Avi Kivity
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Avi Kivity @ 2012-09-16 12:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Xiao Guangrong

The page table walk has gotten crufty over the years and is threatening to become
even more crufty when SMAP is introduced.  Clean it up (and optimize it) somewhat.

v2:
  fix SMEP false positive by moving checks to the end of the walk
  fix last_pte_bitmap documentation
  fix incorrect SMEP fault permission checks
  introduce helper for accessing the permission bitmap

Avi Kivity (9):
  KVM: MMU: Push clean gpte write protection out of gpte_access()
  KVM: MMU: Optimize gpte_access() slightly
  KVM: MMU: Move gpte_access() out of paging_tmpl.h
  KVM: MMU: Update accessed and dirty bits after guest pagetable walk
  KVM: MMU: Optimize pte permission checks
  KVM: MMU: Simplify walk_addr_generic() loop
  KVM: MMU: Optimize is_last_gpte()
  KVM: MMU: Eliminate eperm temporary
  KVM: MMU: Avoid access/dirty update loop if all is well

 arch/x86/include/asm/kvm_host.h |  14 +++
 arch/x86/kvm/mmu.c              |  91 +++++++++++++++++++
 arch/x86/kvm/mmu.h              |  25 +++---
 arch/x86/kvm/paging_tmpl.h      | 190 +++++++++++++++++-----------------------
 arch/x86/kvm/x86.c              |  11 +--
 5 files changed, 202 insertions(+), 129 deletions(-)

-- 
1.7.12



* [PATCH v2 1/9] KVM: MMU: Push clean gpte write protection out of gpte_access()
  2012-09-16 12:07 [PATCH v2 0/9] Optimize page table walk Avi Kivity
@ 2012-09-16 12:07 ` Avi Kivity
  2012-09-16 12:07 ` [PATCH v2 2/9] KVM: MMU: Optimize gpte_access() slightly Avi Kivity
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2012-09-16 12:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Xiao Guangrong

gpte_access() computes the access permissions of a guest pte and also
write-protects clean gptes.  This is wrong when we are servicing a
write fault (since we'll be setting the dirty bit momentarily) but
correct when instantiating a speculative spte, or when servicing a
read fault (since we'll want to trap a following write in order to
set the dirty bit).

It doesn't seem to hurt in practice, but in order to make the code
readable, push the write protection out of gpte_access() and into
a new protect_clean_gpte() which is called explicitly when needed.
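
For illustration, a minimal standalone sketch of the new helper's effect
(constants simplified and a main() added for this example; the real helper
is the mmu.c hunk below):

#include <stdio.h>

#define PT_WRITABLE_SHIFT 1
#define PT_DIRTY_SHIFT    6
#define PT_WRITABLE_MASK  (1u << PT_WRITABLE_SHIFT)
#define PT_DIRTY_MASK     (1u << PT_DIRTY_SHIFT)
#define ACC_WRITE_MASK    PT_WRITABLE_MASK	/* same value; the real code asserts this */

static void protect_clean_gpte(unsigned *access, unsigned gpte)
{
	unsigned mask = ~ACC_WRITE_MASK;

	/* Re-allow write access only if the gpte is already dirty. */
	mask |= (gpte >> (PT_DIRTY_SHIFT - PT_WRITABLE_SHIFT)) & PT_WRITABLE_MASK;
	*access &= mask;
}

int main(void)
{
	unsigned clean = 0x7, dirty = 0x7;	/* both start as ACC_ALL */

	protect_clean_gpte(&clean, PT_WRITABLE_MASK);			/* D=0 */
	protect_clean_gpte(&dirty, PT_WRITABLE_MASK | PT_DIRTY_MASK);	/* D=1 */
	printf("clean gpte -> %#x, dirty gpte -> %#x\n", clean, dirty);	/* 0x5, 0x7 */
	return 0;
}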

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/mmu.c         | 12 ++++++++++++
 arch/x86/kvm/mmu.h         |  3 ++-
 arch/x86/kvm/paging_tmpl.h | 24 ++++++++++++------------
 3 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index aa0b469..54c9cb4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3408,6 +3408,18 @@ static bool is_rsvd_bits_set(struct kvm_mmu *mmu, u64 gpte, int level)
 	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
 }
 
+static inline void protect_clean_gpte(unsigned *access, unsigned gpte)
+{
+	unsigned mask;
+
+	BUILD_BUG_ON(PT_WRITABLE_MASK != ACC_WRITE_MASK);
+
+	mask = (unsigned)~ACC_WRITE_MASK;
+	/* Allow write access to dirty gptes */
+	mask |= (gpte >> (PT_DIRTY_SHIFT - PT_WRITABLE_SHIFT)) & PT_WRITABLE_MASK;
+	*access &= mask;
+}
+
 static bool sync_mmio_spte(u64 *sptep, gfn_t gfn, unsigned access,
 			   int *nr_present)
 {
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e374db9..2832081 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -18,7 +18,8 @@
 #define PT_PCD_MASK (1ULL << 4)
 #define PT_ACCESSED_SHIFT 5
 #define PT_ACCESSED_MASK (1ULL << PT_ACCESSED_SHIFT)
-#define PT_DIRTY_MASK (1ULL << 6)
+#define PT_DIRTY_SHIFT 6
+#define PT_DIRTY_MASK (1ULL << PT_DIRTY_SHIFT)
 #define PT_PAGE_SIZE_MASK (1ULL << 7)
 #define PT_PAT_MASK (1ULL << 7)
 #define PT_GLOBAL_MASK (1ULL << 8)
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index bf8c42b..bf7b4ff 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -101,14 +101,11 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return (ret != orig_pte);
 }
 
-static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte,
-				   bool last)
+static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
 {
 	unsigned access;
 
 	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
-	if (last && !is_dirty_gpte(gpte))
-		access &= ~ACC_WRITE_MASK;
 
 #if PTTYPE == 64
 	if (vcpu->arch.mmu.nx)
@@ -222,8 +219,7 @@ retry_walk:
 
 		last_gpte = FNAME(is_last_gpte)(walker, vcpu, mmu, pte);
 		if (last_gpte) {
-			pte_access = pt_access &
-				     FNAME(gpte_access)(vcpu, pte, true);
+			pte_access = pt_access & FNAME(gpte_access)(vcpu, pte);
 			/* check if the kernel is fetching from user page */
 			if (unlikely(pte_access & PT_USER_MASK) &&
 			    kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
@@ -274,7 +270,7 @@ retry_walk:
 			break;
 		}
 
-		pt_access &= FNAME(gpte_access)(vcpu, pte, false);
+		pt_access &= FNAME(gpte_access)(vcpu, pte);
 		--walker->level;
 	}
 
@@ -283,7 +279,9 @@ retry_walk:
 		goto error;
 	}
 
-	if (write_fault && unlikely(!is_dirty_gpte(pte))) {
+	if (!write_fault)
+		protect_clean_gpte(&pte_access, pte);
+	else if (unlikely(!is_dirty_gpte(pte))) {
 		int ret;
 
 		trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
@@ -368,7 +366,8 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		return;
 
 	pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
-	pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte, true);
+	pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+	protect_clean_gpte(&pte_access, gpte);
 	pfn = gfn_to_pfn_atomic(vcpu->kvm, gpte_to_gfn(gpte));
 	if (mmu_invalid_pfn(pfn))
 		return;
@@ -441,8 +440,8 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
 			continue;
 
-		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte,
-								  true);
+		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+		protect_clean_gpte(&pte_access, gpte);
 		gfn = gpte_to_gfn(gpte);
 		pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
 				      pte_access & ACC_WRITE_MASK);
@@ -794,7 +793,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 
 		gfn = gpte_to_gfn(gpte);
 		pte_access = sp->role.access;
-		pte_access &= FNAME(gpte_access)(vcpu, gpte, true);
+		pte_access &= FNAME(gpte_access)(vcpu, gpte);
+		protect_clean_gpte(&pte_access, gpte);
 
 		if (sync_mmio_spte(&sp->spt[i], gfn, pte_access, &nr_present))
 			continue;
-- 
1.7.12



* [PATCH v2 2/9] KVM: MMU: Optimize gpte_access() slightly
  2012-09-16 12:07 [PATCH v2 0/9] Optimize page table walk Avi Kivity
  2012-09-16 12:07 ` [PATCH v2 1/9] KVM: MMU: Push clean gpte write protection out of gpte_access() Avi Kivity
@ 2012-09-16 12:07 ` Avi Kivity
  2012-09-16 12:07 ` [PATCH v2 3/9] KVM: MMU: Move gpte_access() out of paging_tmpl.h Avi Kivity
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2012-09-16 12:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Xiao Guangrong

If nx is disabled, then if gpte[63] is set we will hit a reserved
bit fault before checking permissions, so we can ignore the
setting of efer.nxe.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/paging_tmpl.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index bf7b4ff..064bcb3 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -106,10 +106,8 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
 	unsigned access;
 
 	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
-
 #if PTTYPE == 64
-	if (vcpu->arch.mmu.nx)
-		access &= ~(gpte >> PT64_NX_SHIFT);
+	access &= ~(gpte >> PT64_NX_SHIFT);
 #endif
 	return access;
 }
-- 
1.7.12



* [PATCH v2 3/9] KVM: MMU: Move gpte_access() out of paging_tmpl.h
  2012-09-16 12:07 [PATCH v2 0/9] Optimize page table walk Avi Kivity
  2012-09-16 12:07 ` [PATCH v2 1/9] KVM: MMU: Push clean gpte write protection out of gpte_access() Avi Kivity
  2012-09-16 12:07 ` [PATCH v2 2/9] KVM: MMU: Optimize gpte_access() slightly Avi Kivity
@ 2012-09-16 12:07 ` Avi Kivity
  2012-09-16 12:07 ` [PATCH v2 4/9] KVM: MMU: Update accessed and dirty bits after guest pagetable walk Avi Kivity
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2012-09-16 12:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Xiao Guangrong

We no longer rely on paging_tmpl.h defines, so we can move the function
to mmu.c.

Rely on zero extension to 64 bits to get the correct nx behaviour.
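
To see why zero extension is enough, a standalone sketch (simplified
constants, illustrative only): a 32-bit gpte zero-extended to 64 bits
always has bit 63 clear, so the shift strips nothing, while a 64-bit gpte
with NX set strips the exec bit.

#include <stdio.h>
#include <stdint.h>

#define PT_WRITABLE_MASK (1ull << 1)
#define PT_USER_MASK     (1ull << 2)
#define PT64_NX_SHIFT    63
#define ACC_EXEC_MASK    1ull	/* exec permission lives in bit 0 of the ACC_* format */

static unsigned gpte_access(uint64_t gpte)
{
	unsigned access;

	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
	/* Bit 63 (NX), shifted down to bit 0, clears ACC_EXEC_MASK when set. */
	access &= ~(gpte >> PT64_NX_SHIFT);
	return access;
}

int main(void)
{
	uint64_t gpte32 = 0x7;				/* zero-extended 32-bit gpte */
	uint64_t gpte64_nx = 0x7ull | (1ull << PT64_NX_SHIFT);

	printf("32-bit gpte: %#x\n", gpte_access(gpte32));	/* exec kept:     0x7 */
	printf("64-bit NX=1: %#x\n", gpte_access(gpte64_nx));	/* exec stripped: 0x6 */
	return 0;
}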

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/mmu.c         | 10 ++++++++++
 arch/x86/kvm/paging_tmpl.h | 21 +++++----------------
 2 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 54c9cb4..f297a2c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3437,6 +3437,16 @@ static bool sync_mmio_spte(u64 *sptep, gfn_t gfn, unsigned access,
 	return false;
 }
 
+static inline unsigned gpte_access(struct kvm_vcpu *vcpu, u64 gpte)
+{
+	unsigned access;
+
+	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
+	access &= ~(gpte >> PT64_NX_SHIFT);
+
+	return access;
+}
+
 #define PTTYPE 64
 #include "paging_tmpl.h"
 #undef PTTYPE
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 064bcb3..1cbf576 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -101,17 +101,6 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return (ret != orig_pte);
 }
 
-static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
-{
-	unsigned access;
-
-	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
-#if PTTYPE == 64
-	access &= ~(gpte >> PT64_NX_SHIFT);
-#endif
-	return access;
-}
-
 static bool FNAME(is_last_gpte)(struct guest_walker *walker,
 				struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 				pt_element_t gpte)
@@ -217,7 +206,7 @@ retry_walk:
 
 		last_gpte = FNAME(is_last_gpte)(walker, vcpu, mmu, pte);
 		if (last_gpte) {
-			pte_access = pt_access & FNAME(gpte_access)(vcpu, pte);
+			pte_access = pt_access & gpte_access(vcpu, pte);
 			/* check if the kernel is fetching from user page */
 			if (unlikely(pte_access & PT_USER_MASK) &&
 			    kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
@@ -268,7 +257,7 @@ retry_walk:
 			break;
 		}
 
-		pt_access &= FNAME(gpte_access)(vcpu, pte);
+		pt_access &= gpte_access(vcpu, pte);
 		--walker->level;
 	}
 
@@ -364,7 +353,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		return;
 
 	pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
-	pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+	pte_access = sp->role.access & gpte_access(vcpu, gpte);
 	protect_clean_gpte(&pte_access, gpte);
 	pfn = gfn_to_pfn_atomic(vcpu->kvm, gpte_to_gfn(gpte));
 	if (mmu_invalid_pfn(pfn))
@@ -438,7 +427,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
 			continue;
 
-		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+		pte_access = sp->role.access & gpte_access(vcpu, gpte);
 		protect_clean_gpte(&pte_access, gpte);
 		gfn = gpte_to_gfn(gpte);
 		pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
@@ -791,7 +780,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 
 		gfn = gpte_to_gfn(gpte);
 		pte_access = sp->role.access;
-		pte_access &= FNAME(gpte_access)(vcpu, gpte);
+		pte_access &= gpte_access(vcpu, gpte);
 		protect_clean_gpte(&pte_access, gpte);
 
 		if (sync_mmio_spte(&sp->spt[i], gfn, pte_access, &nr_present))
-- 
1.7.12



* [PATCH v2 4/9] KVM: MMU: Update accessed and dirty bits after guest pagetable walk
  2012-09-16 12:07 [PATCH v2 0/9] Optimize page table walk Avi Kivity
                   ` (2 preceding siblings ...)
  2012-09-16 12:07 ` [PATCH v2 3/9] KVM: MMU: Move gpte_access() out of paging_tmpl.h Avi Kivity
@ 2012-09-16 12:07 ` Avi Kivity
  2012-09-18  6:37   ` Xiao Guangrong
  2012-09-16 12:07 ` [PATCH v2 5/9] KVM: MMU: Optimize pte permission checks Avi Kivity
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2012-09-16 12:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Xiao Guangrong

While unspecified, the behaviour of Intel processors is to first
perform the page table walk, then, if the walk was successful, to
atomically update the accessed and dirty bits of walked paging elements.

While we are not required to follow this exactly, doing so will allow us
to perform the access permissions check after the walk is complete, rather
than after each walk step.

(The tricky case is SMEP: a zero in any pte's U bit makes the referenced
page a supervisor page, so we cannot decide to fault on a U=1 bit seen
during the walk itself.)
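
A toy model of that ordering, purely for illustration (an invented
two-level "page table" in a flat array, not the kernel code): collect the
ptes during the walk, and only after the whole walk succeeds go back and
set the accessed bits, plus the dirty bit on the leaf for a write fault.

#include <stdio.h>

#define PT_PRESENT_MASK  (1u << 0)
#define PT_ACCESSED_MASK (1u << 5)
#define PT_DIRTY_MASK    (1u << 6)

int main(void)
{
	unsigned ptes[2] = { PT_PRESENT_MASK, PT_PRESENT_MASK };  /* levels 2, 1; A=D=0 */
	int level, ok = 1, write_fault = 1;

	/* Pass 1: walk without writing back to the guest page tables. */
	for (level = 0; level < 2; level++)
		if (!(ptes[level] & PT_PRESENT_MASK))
			ok = 0;

	/* Pass 2: only if the walk (and permission check) succeeded, set A on
	 * every level and D on the leaf for write faults. */
	if (ok) {
		for (level = 0; level < 2; level++)
			ptes[level] |= PT_ACCESSED_MASK;
		if (write_fault)
			ptes[1] |= PT_DIRTY_MASK;
	}

	printf("pte[0]=%#x pte[1]=%#x\n", ptes[0], ptes[1]);	/* 0x21 0x61 */
	return 0;
}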

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/paging_tmpl.h | 76 ++++++++++++++++++++++++++++------------------
 1 file changed, 47 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 1cbf576..35a05dd 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -63,10 +63,12 @@
  */
 struct guest_walker {
 	int level;
+	unsigned max_level;
 	gfn_t table_gfn[PT_MAX_FULL_LEVELS];
 	pt_element_t ptes[PT_MAX_FULL_LEVELS];
 	pt_element_t prefetch_ptes[PTE_PREFETCH_NUM];
 	gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
+	pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
 	unsigned pt_access;
 	unsigned pte_access;
 	gfn_t gfn;
@@ -119,6 +121,43 @@ static bool FNAME(is_last_gpte)(struct guest_walker *walker,
 	return false;
 }
 
+static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
+					     struct kvm_mmu *mmu,
+					     struct guest_walker *walker,
+					     int write_fault)
+{
+	unsigned level, index;
+	pt_element_t pte, orig_pte;
+	pt_element_t __user *ptep_user;
+	gfn_t table_gfn;
+	int ret;
+
+	for (level = walker->max_level; level >= walker->level; --level) {
+		pte = orig_pte = walker->ptes[level - 1];
+		table_gfn = walker->table_gfn[level - 1];
+		ptep_user = walker->ptep_user[level - 1];
+		index = offset_in_page(ptep_user) / sizeof(pt_element_t);
+		if (!(pte & PT_ACCESSED_MASK)) {
+			trace_kvm_mmu_set_accessed_bit(table_gfn, index, sizeof(pte));
+			pte |= PT_ACCESSED_MASK;
+		}
+		if (level == walker->level && write_fault && !is_dirty_gpte(pte)) {
+			trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
+			pte |= PT_DIRTY_MASK;
+		}
+		if (pte == orig_pte)
+			continue;
+
+		ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index, orig_pte, pte);
+		if (ret)
+			return ret;
+
+		mark_page_dirty(vcpu->kvm, table_gfn);
+		walker->ptes[level - 1] = pte;
+	}
+	return 0;
+}
+
 /*
  * Fetch a guest pte for a guest virtual address
  */
@@ -126,6 +165,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 				    struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 				    gva_t addr, u32 access)
 {
+	int ret;
 	pt_element_t pte;
 	pt_element_t __user *uninitialized_var(ptep_user);
 	gfn_t table_gfn;
@@ -153,6 +193,7 @@ retry_walk:
 		--walker->level;
 	}
 #endif
+	walker->max_level = walker->level;
 	ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
 	       (mmu->get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
 
@@ -183,6 +224,7 @@ retry_walk:
 		ptep_user = (pt_element_t __user *)((void *)host_addr + offset);
 		if (unlikely(__copy_from_user(&pte, ptep_user, sizeof(pte))))
 			goto error;
+		walker->ptep_user[walker->level - 1] = ptep_user;
 
 		trace_kvm_mmu_paging_element(pte, walker->level);
 
@@ -214,21 +256,6 @@ retry_walk:
 					eperm = true;
 		}
 
-		if (!eperm && unlikely(!(pte & PT_ACCESSED_MASK))) {
-			int ret;
-			trace_kvm_mmu_set_accessed_bit(table_gfn, index,
-						       sizeof(pte));
-			ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index,
-						  pte, pte|PT_ACCESSED_MASK);
-			if (unlikely(ret < 0))
-				goto error;
-			else if (ret)
-				goto retry_walk;
-
-			mark_page_dirty(vcpu->kvm, table_gfn);
-			pte |= PT_ACCESSED_MASK;
-		}
-
 		walker->ptes[walker->level - 1] = pte;
 
 		if (last_gpte) {
@@ -268,21 +295,12 @@ retry_walk:
 
 	if (!write_fault)
 		protect_clean_gpte(&pte_access, pte);
-	else if (unlikely(!is_dirty_gpte(pte))) {
-		int ret;
 
-		trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
-		ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index,
-					  pte, pte|PT_DIRTY_MASK);
-		if (unlikely(ret < 0))
-			goto error;
-		else if (ret)
-			goto retry_walk;
-
-		mark_page_dirty(vcpu->kvm, table_gfn);
-		pte |= PT_DIRTY_MASK;
-		walker->ptes[walker->level - 1] = pte;
-	}
+	ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker, write_fault);
+	if (unlikely(ret < 0))
+		goto error;
+	else if (ret)
+		goto retry_walk;
 
 	walker->pt_access = pt_access;
 	walker->pte_access = pte_access;
-- 
1.7.12



* [PATCH v2 5/9] KVM: MMU: Optimize pte permission checks
  2012-09-16 12:07 [PATCH v2 0/9] Optimize page table walk Avi Kivity
                   ` (3 preceding siblings ...)
  2012-09-16 12:07 ` [PATCH v2 4/9] KVM: MMU: Update accessed and dirty bits after guest pagetable walk Avi Kivity
@ 2012-09-16 12:07 ` Avi Kivity
  2012-09-18  6:45   ` Xiao Guangrong
  2012-09-16 12:07 ` [PATCH v2 6/9] KVM: MMU: Simplify walk_addr_generic() loop Avi Kivity
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2012-09-16 12:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Xiao Guangrong

The walk_addr_generic() permission checks are a maze of branchy code that is
executed four times per lookup.  The outcome depends on the type of access,
efer.nxe, cr0.wp, cr4.smep, and, in the near future, cr4.smap.

Optimize this away by precalculating all variants and storing them in a
bitmap.  The bitmap is recalculated when rarely-changing variables change
(cr0, cr4) and is indexed by the often-changing variables (page fault error
code, pte access permissions).

The permission check is moved to the end of the walk; otherwise an SMEP
fault could be reported as a false positive when PDE.U=1 but PTE.U=0.
Noted by Xiao Guangrong.

The result is short, branch-free code.
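
A standalone sketch of the idea (only the write/user/fetch logic; the
cr0.wp, smep and nx special cases are omitted and names are simplified
for illustration):

#include <stdio.h>

#define ACC_EXEC_MASK  1u
#define ACC_WRITE_MASK 2u
#define ACC_USER_MASK  4u

#define PFERR_WRITE_MASK (1u << 1)
#define PFERR_USER_MASK  (1u << 2)
#define PFERR_FETCH_MASK (1u << 4)

static unsigned char permissions[16];

static void update_permission_bitmap(void)
{
	unsigned byte, bit;

	for (byte = 0; byte < 16; ++byte) {
		unsigned pfec = byte << 1;
		int wf = !!(pfec & PFERR_WRITE_MASK);
		int uf = !!(pfec & PFERR_USER_MASK);
		int ff = !!(pfec & PFERR_FETCH_MASK);
		unsigned char map = 0;

		for (bit = 0; bit < 8; ++bit) {
			int x = !!(bit & ACC_EXEC_MASK);
			int w = !!(bit & ACC_WRITE_MASK);
			int u = !!(bit & ACC_USER_MASK);
			int fault = (ff && !x) || (uf && !u) || (wf && !w);

			map |= fault << bit;
		}
		permissions[byte] = map;
	}
}

static int permission_fault(unsigned pte_access, unsigned pfec)
{
	/* Byte index: pfec[4:1]; bit index: pte permissions in ACC_* format. */
	return (permissions[pfec >> 1] >> pte_access) & 1;
}

int main(void)
{
	update_permission_bitmap();
	/* user write to a supervisor-only, read-only pte: fault (1) */
	printf("%d\n", permission_fault(ACC_EXEC_MASK,
					PFERR_USER_MASK | PFERR_WRITE_MASK));
	/* supervisor read of a writable supervisor pte: no fault (0) */
	printf("%d\n", permission_fault(ACC_EXEC_MASK | ACC_WRITE_MASK, 0));
	return 0;
}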

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  7 +++++++
 arch/x86/kvm/mmu.c              | 38 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu.h              | 19 ++++++++-----------
 arch/x86/kvm/paging_tmpl.h      | 22 ++++------------------
 arch/x86/kvm/x86.c              | 11 ++++-------
 5 files changed, 61 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 64adb61..3318bde 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -287,6 +287,13 @@ struct kvm_mmu {
 	union kvm_mmu_page_role base_role;
 	bool direct_map;
 
+	/*
+	 * Bitmap; bit set = permission fault
+	 * Byte index: page fault error code [4:1]
+	 * Bit index: pte permissions in ACC_* format
+	 */
+	u8 permissions[16];
+
 	u64 *pae_root;
 	u64 *lm_root;
 	u64 rsvd_bits_mask[2][4];
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f297a2c..9c61889 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3516,6 +3516,38 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 	}
 }
 
+static void update_permission_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
+{
+	unsigned bit, byte, pfec;
+	u8 map;
+	bool fault, x, w, u, wf, uf, ff, smep;
+
+	smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP);
+	for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) {
+		pfec = byte << 1;
+		map = 0;
+		wf = pfec & PFERR_WRITE_MASK;
+		uf = pfec & PFERR_USER_MASK;
+		ff = pfec & PFERR_FETCH_MASK;
+		for (bit = 0; bit < 8; ++bit) {
+			x = bit & ACC_EXEC_MASK;
+			w = bit & ACC_WRITE_MASK;
+			u = bit & ACC_USER_MASK;
+
+			/* Not really needed: !nx will cause pte.nx to fault */
+			x |= !mmu->nx;
+			/* Allow supervisor writes if !cr0.wp */
+			w |= !is_write_protection(vcpu) && !uf;
+			/* Disallow supervisor fetches of user code if cr4.smep */
+			x &= !(smep && u && !uf);
+
+			fault = (ff && !x) || (uf && !u) || (wf && !w);
+			map |= fault << bit;
+		}
+		mmu->permissions[byte] = map;
+	}
+}
+
 static int paging64_init_context_common(struct kvm_vcpu *vcpu,
 					struct kvm_mmu *context,
 					int level)
@@ -3524,6 +3556,7 @@ static int paging64_init_context_common(struct kvm_vcpu *vcpu,
 	context->root_level = level;
 
 	reset_rsvds_bits_mask(vcpu, context);
+	update_permission_bitmask(vcpu, context);
 
 	ASSERT(is_pae(vcpu));
 	context->new_cr3 = paging_new_cr3;
@@ -3552,6 +3585,7 @@ static int paging32_init_context(struct kvm_vcpu *vcpu,
 	context->root_level = PT32_ROOT_LEVEL;
 
 	reset_rsvds_bits_mask(vcpu, context);
+	update_permission_bitmask(vcpu, context);
 
 	context->new_cr3 = paging_new_cr3;
 	context->page_fault = paging32_page_fault;
@@ -3612,6 +3646,8 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 		context->gva_to_gpa = paging32_gva_to_gpa;
 	}
 
+	update_permission_bitmask(vcpu, context);
+
 	return 0;
 }
 
@@ -3687,6 +3723,8 @@ static int init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 		g_context->gva_to_gpa = paging32_gva_to_gpa_nested;
 	}
 
+	update_permission_bitmask(vcpu, g_context);
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2832081..5846607 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -89,17 +89,14 @@ static inline bool is_write_protection(struct kvm_vcpu *vcpu)
 	return kvm_read_cr0_bits(vcpu, X86_CR0_WP);
 }
 
-static inline bool check_write_user_access(struct kvm_vcpu *vcpu,
-					   bool write_fault, bool user_fault,
-					   unsigned long pte)
+/*
+ * Will a fault with a given page-fault error code (pfec) cause a permission
+ * fault with the given access (in ACC_* format)?
+ */
+static inline bool permission_fault(struct kvm_mmu *mmu, unsigned pte_access,
+				    unsigned pfec)
 {
-	if (unlikely(write_fault && !is_writable_pte(pte)
-	      && (user_fault || is_write_protection(vcpu))))
-		return false;
-
-	if (unlikely(user_fault && !(pte & PT_USER_MASK)))
-		return false;
-
-	return true;
+	return (mmu->permissions[pfec >> 1] >> pte_access) & 1;
 }
+
 #endif
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 35a05dd..8f6c59f 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -169,7 +169,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	pt_element_t pte;
 	pt_element_t __user *uninitialized_var(ptep_user);
 	gfn_t table_gfn;
-	unsigned index, pt_access, uninitialized_var(pte_access);
+	unsigned index, pt_access, pte_access;
 	gpa_t pte_gpa;
 	bool eperm, last_gpte;
 	int offset;
@@ -237,24 +237,9 @@ retry_walk:
 			goto error;
 		}
 
-		if (!check_write_user_access(vcpu, write_fault, user_fault,
-					  pte))
-			eperm = true;
-
-#if PTTYPE == 64
-		if (unlikely(fetch_fault && (pte & PT64_NX_MASK)))
-			eperm = true;
-#endif
+		pte_access = pt_access & gpte_access(vcpu, pte);
 
 		last_gpte = FNAME(is_last_gpte)(walker, vcpu, mmu, pte);
-		if (last_gpte) {
-			pte_access = pt_access & gpte_access(vcpu, pte);
-			/* check if the kernel is fetching from user page */
-			if (unlikely(pte_access & PT_USER_MASK) &&
-			    kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
-				if (fetch_fault && !user_fault)
-					eperm = true;
-		}
 
 		walker->ptes[walker->level - 1] = pte;
 
@@ -284,10 +269,11 @@ retry_walk:
 			break;
 		}
 
-		pt_access &= gpte_access(vcpu, pte);
+		pt_access &= pte_access;
 		--walker->level;
 	}
 
+	eperm |= permission_fault(mmu, pte_access, access);
 	if (unlikely(eperm)) {
 		errcode |= PFERR_PRESENT_MASK;
 		goto error;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c4d451e..8367ccc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3672,20 +3672,17 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
 				gpa_t *gpa, struct x86_exception *exception,
 				bool write)
 {
-	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+	u32 access = ((kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0)
+		| (write ? PFERR_WRITE_MASK : 0);
 
-	if (vcpu_match_mmio_gva(vcpu, gva) &&
-		  check_write_user_access(vcpu, write, access,
-		  vcpu->arch.access)) {
+	if (vcpu_match_mmio_gva(vcpu, gva)
+	    && permission_fault(vcpu->arch.walk_mmu, vcpu->arch.access, access)) {
 		*gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
 					(gva & (PAGE_SIZE - 1));
 		trace_vcpu_match_mmio(gva, *gpa, write, false);
 		return 1;
 	}
 
-	if (write)
-		access |= PFERR_WRITE_MASK;
-
 	*gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
 
 	if (*gpa == UNMAPPED_GVA)
-- 
1.7.12



* [PATCH v2 6/9] KVM: MMU: Simplify walk_addr_generic() loop
  2012-09-16 12:07 [PATCH v2 0/9] Optimize page table walk Avi Kivity
                   ` (4 preceding siblings ...)
  2012-09-16 12:07 ` [PATCH v2 5/9] KVM: MMU: Optimize pte permission checks Avi Kivity
@ 2012-09-16 12:07 ` Avi Kivity
  2012-09-18  6:53   ` Xiao Guangrong
  2012-09-16 12:07 ` [PATCH v2 7/9] KVM: MMU: Optimize is_last_gpte() Avi Kivity
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2012-09-16 12:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Xiao Guangrong

The page table walk is coded as an infinite loop, with a special
case on the last pte.

Code it as an ordinary loop with a termination condition on the last
pte (large page or walk length exhausted), and put the last pte handling
code after the loop where it belongs.
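
The resulting loop shape, as a runnable toy (invented data, illustration
only): the level is pre-incremented so the decrement at the top of the
body restores it on the first pass, and the leaf handling sits after the
do/while.

#include <stdio.h>

int main(void)
{
	/* indexed by (root_level - level); 1 marks the "last gpte" (leaf) */
	int is_last[5] = { 0, 0, 1, 0, 0 };	/* leaf at level 2 */
	int level = 4;				/* root_level */

	++level;			/* pre-increment, as in the patch */
	do {
		--level;
		printf("walk step at level %d\n", level);
	} while (!is_last[4 - level]);

	printf("leaf handling at level %d, after the loop\n", level);
	return 0;
}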

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/paging_tmpl.h | 60 +++++++++++++++++++---------------------------
 1 file changed, 25 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 8f6c59f..1b4c14d 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -171,12 +171,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	gfn_t table_gfn;
 	unsigned index, pt_access, pte_access;
 	gpa_t pte_gpa;
-	bool eperm, last_gpte;
+	bool eperm;
 	int offset;
 	const int write_fault = access & PFERR_WRITE_MASK;
 	const int user_fault  = access & PFERR_USER_MASK;
 	const int fetch_fault = access & PFERR_FETCH_MASK;
 	u16 errcode = 0;
+	gpa_t real_gpa;
+	gfn_t gfn;
+	u32 ac;
 
 	trace_kvm_mmu_pagetable_walk(addr, access);
 retry_walk:
@@ -197,12 +200,16 @@ retry_walk:
 	ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
 	       (mmu->get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
 
-	pt_access = ACC_ALL;
+	pt_access = pte_access = ACC_ALL;
+	++walker->level;
 
-	for (;;) {
+	do {
 		gfn_t real_gfn;
 		unsigned long host_addr;
 
+		pt_access &= pte_access;
+		--walker->level;
+
 		index = PT_INDEX(addr, walker->level);
 
 		table_gfn = gpte_to_gfn(pte);
@@ -239,39 +246,8 @@ retry_walk:
 
 		pte_access = pt_access & gpte_access(vcpu, pte);
 
-		last_gpte = FNAME(is_last_gpte)(walker, vcpu, mmu, pte);
-
 		walker->ptes[walker->level - 1] = pte;
-
-		if (last_gpte) {
-			int lvl = walker->level;
-			gpa_t real_gpa;
-			gfn_t gfn;
-			u32 ac;
-
-			gfn = gpte_to_gfn_lvl(pte, lvl);
-			gfn += (addr & PT_LVL_OFFSET_MASK(lvl)) >> PAGE_SHIFT;
-
-			if (PTTYPE == 32 &&
-			    walker->level == PT_DIRECTORY_LEVEL &&
-			    is_cpuid_PSE36())
-				gfn += pse36_gfn_delta(pte);
-
-			ac = write_fault | fetch_fault | user_fault;
-
-			real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn),
-						      ac);
-			if (real_gpa == UNMAPPED_GVA)
-				return 0;
-
-			walker->gfn = real_gpa >> PAGE_SHIFT;
-
-			break;
-		}
-
-		pt_access &= pte_access;
-		--walker->level;
-	}
+	} while (!FNAME(is_last_gpte)(walker, vcpu, mmu, pte));
 
 	eperm |= permission_fault(mmu, pte_access, access);
 	if (unlikely(eperm)) {
@@ -279,6 +255,20 @@ retry_walk:
 		goto error;
 	}
 
+	gfn = gpte_to_gfn_lvl(pte, walker->level);
+	gfn += (addr & PT_LVL_OFFSET_MASK(walker->level)) >> PAGE_SHIFT;
+
+	if (PTTYPE == 32 && walker->level == PT_DIRECTORY_LEVEL && is_cpuid_PSE36())
+		gfn += pse36_gfn_delta(pte);
+
+	ac = write_fault | fetch_fault | user_fault;
+
+	real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn), ac);
+	if (real_gpa == UNMAPPED_GVA)
+		return 0;
+
+	walker->gfn = real_gpa >> PAGE_SHIFT;
+
 	if (!write_fault)
 		protect_clean_gpte(&pte_access, pte);
 
-- 
1.7.12



* [PATCH v2 7/9] KVM: MMU: Optimize is_last_gpte()
  2012-09-16 12:07 [PATCH v2 0/9] Optimize page table walk Avi Kivity
                   ` (5 preceding siblings ...)
  2012-09-16 12:07 ` [PATCH v2 6/9] KVM: MMU: Simplify walk_addr_generic() loop Avi Kivity
@ 2012-09-16 12:07 ` Avi Kivity
  2012-09-18  7:36   ` Xiao Guangrong
  2012-09-16 12:07 ` [PATCH v2 8/9] KVM: MMU: Eliminate eperm temporary Avi Kivity
  2012-09-16 12:07 ` [PATCH v2 9/9] KVM: MMU: Avoid access/dirty update loop if all is well Avi Kivity
  8 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2012-09-16 12:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Xiao Guangrong

Instead of branchy code depending on level, gpte.ps, and mmu configuration,
prepare everything in a bitmap during mode changes and look it up during
runtime.
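
A standalone sketch of the lookup (the bitmap is hardcoded here for a
64-bit guest with large pages enabled; names and setup are simplified for
illustration):

#include <stdio.h>
#include <stdint.h>

#define PT_PAGE_SIZE_SHIFT 7
#define PT_PAGE_SIZE_MASK  (1ull << PT_PAGE_SIZE_SHIFT)

/* bit set = last pte; index[0:1] = level - 1, index[2] = pte.ps */
static uint8_t last_pte_bitmap = 1		/* level 1, PS=0 */
			       | (1 << 4)	/* level 1, PS=1 */
			       | (1 << 5)	/* level 2, PS=1 (2MB page) */
			       | (1 << 6);	/* level 3, PS=1 (1GB page) */

static int is_last_gpte(unsigned level, uint64_t gpte)
{
	unsigned index = (level - 1)
		| ((gpte & PT_PAGE_SIZE_MASK) >> (PT_PAGE_SIZE_SHIFT - 2));

	return (last_pte_bitmap >> index) & 1;
}

int main(void)
{
	printf("level 2, PS=1: %d\n", is_last_gpte(2, PT_PAGE_SIZE_MASK));	/* 1 */
	printf("level 2, PS=0: %d\n", is_last_gpte(2, 0));			/* 0 */
	printf("level 4, PS=0: %d\n", is_last_gpte(4, 0));			/* 0 */
	return 0;
}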

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  7 +++++++
 arch/x86/kvm/mmu.c              | 31 +++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu.h              |  3 ++-
 arch/x86/kvm/paging_tmpl.h      | 20 +-------------------
 4 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3318bde..43aeb94 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -298,6 +298,13 @@ struct kvm_mmu {
 	u64 *lm_root;
 	u64 rsvd_bits_mask[2][4];
 
+	/*
+	 * Bitmap: bit set = last pte in walk
+	 * index[0:1]: level (zero-based)
+	 * index[2]: pte.ps
+	 */
+	u8 last_pte_bitmap;
+
 	bool nx;
 
 	u64 pdptrs[4]; /* pae */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9c61889..d289fee 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3447,6 +3447,15 @@ static inline unsigned gpte_access(struct kvm_vcpu *vcpu, u64 gpte)
 	return access;
 }
 
+static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gpte)
+{
+	unsigned index;
+
+	index = level - 1;
+	index |= (gpte & PT_PAGE_SIZE_MASK) >> (PT_PAGE_SIZE_SHIFT - 2);
+	return mmu->last_pte_bitmap & (1 << index);
+}
+
 #define PTTYPE 64
 #include "paging_tmpl.h"
 #undef PTTYPE
@@ -3548,6 +3557,24 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu
 	}
 }
 
+static void update_last_pte_bitmap(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
+{
+	u8 map;
+	unsigned level, root_level = mmu->root_level;
+	const unsigned ps_set_index = 1 << 2;  /* bit 2 of index: ps */
+
+	if (root_level == PT32E_ROOT_LEVEL)
+		--root_level;
+	/* PT_PAGE_TABLE_LEVEL always terminates */
+	map = 1 | (1 << ps_set_index);
+	for (level = PT_DIRECTORY_LEVEL; level <= root_level; ++level) {
+		if (level <= PT_PDPE_LEVEL
+		    && (mmu->root_level >= PT32E_ROOT_LEVEL || is_pse(vcpu)))
+			map |= 1 << (ps_set_index | (level - 1));
+	}
+	mmu->last_pte_bitmap = map;
+}
+
 static int paging64_init_context_common(struct kvm_vcpu *vcpu,
 					struct kvm_mmu *context,
 					int level)
@@ -3557,6 +3584,7 @@ static int paging64_init_context_common(struct kvm_vcpu *vcpu,
 
 	reset_rsvds_bits_mask(vcpu, context);
 	update_permission_bitmask(vcpu, context);
+	update_last_pte_bitmap(vcpu, context);
 
 	ASSERT(is_pae(vcpu));
 	context->new_cr3 = paging_new_cr3;
@@ -3586,6 +3614,7 @@ static int paging32_init_context(struct kvm_vcpu *vcpu,
 
 	reset_rsvds_bits_mask(vcpu, context);
 	update_permission_bitmask(vcpu, context);
+	update_last_pte_bitmap(vcpu, context);
 
 	context->new_cr3 = paging_new_cr3;
 	context->page_fault = paging32_page_fault;
@@ -3647,6 +3676,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	}
 
 	update_permission_bitmask(vcpu, context);
+	update_last_pte_bitmap(vcpu, context);
 
 	return 0;
 }
@@ -3724,6 +3754,7 @@ static int init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 	}
 
 	update_permission_bitmask(vcpu, g_context);
+	update_last_pte_bitmap(vcpu, g_context);
 
 	return 0;
 }
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 5846607..6987108 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -20,7 +20,8 @@
 #define PT_ACCESSED_MASK (1ULL << PT_ACCESSED_SHIFT)
 #define PT_DIRTY_SHIFT 6
 #define PT_DIRTY_MASK (1ULL << PT_DIRTY_SHIFT)
-#define PT_PAGE_SIZE_MASK (1ULL << 7)
+#define PT_PAGE_SIZE_SHIFT 7
+#define PT_PAGE_SIZE_MASK (1ULL << PT_PAGE_SIZE_SHIFT)
 #define PT_PAT_MASK (1ULL << 7)
 #define PT_GLOBAL_MASK (1ULL << 8)
 #define PT64_NX_SHIFT 63
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 1b4c14d..134ea7b 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -103,24 +103,6 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return (ret != orig_pte);
 }
 
-static bool FNAME(is_last_gpte)(struct guest_walker *walker,
-				struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
-				pt_element_t gpte)
-{
-	if (walker->level == PT_PAGE_TABLE_LEVEL)
-		return true;
-
-	if ((walker->level == PT_DIRECTORY_LEVEL) && is_large_pte(gpte) &&
-	    (PTTYPE == 64 || is_pse(vcpu)))
-		return true;
-
-	if ((walker->level == PT_PDPE_LEVEL) && is_large_pte(gpte) &&
-	    (mmu->root_level == PT64_ROOT_LEVEL))
-		return true;
-
-	return false;
-}
-
 static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 					     struct kvm_mmu *mmu,
 					     struct guest_walker *walker,
@@ -247,7 +229,7 @@ retry_walk:
 		pte_access = pt_access & gpte_access(vcpu, pte);
 
 		walker->ptes[walker->level - 1] = pte;
-	} while (!FNAME(is_last_gpte)(walker, vcpu, mmu, pte));
+	} while (!is_last_gpte(mmu, walker->level, pte));
 
 	eperm |= permission_fault(mmu, pte_access, access);
 	if (unlikely(eperm)) {
-- 
1.7.12



* [PATCH v2 8/9] KVM: MMU: Eliminate eperm temporary
  2012-09-16 12:07 [PATCH v2 0/9] Optimize page table walk Avi Kivity
                   ` (6 preceding siblings ...)
  2012-09-16 12:07 ` [PATCH v2 7/9] KVM: MMU: Optimize is_last_gpte() Avi Kivity
@ 2012-09-16 12:07 ` Avi Kivity
  2012-09-18  7:00   ` Xiao Guangrong
  2012-09-16 12:07 ` [PATCH v2 9/9] KVM: MMU: Avoid access/dirty update loop if all is well Avi Kivity
  8 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2012-09-16 12:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Xiao Guangrong

'eperm' is no longer used in the walker loop, so we can eliminate it.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/paging_tmpl.h | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 134ea7b..95a64d1 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -153,7 +153,6 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	gfn_t table_gfn;
 	unsigned index, pt_access, pte_access;
 	gpa_t pte_gpa;
-	bool eperm;
 	int offset;
 	const int write_fault = access & PFERR_WRITE_MASK;
 	const int user_fault  = access & PFERR_USER_MASK;
@@ -165,7 +164,6 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 
 	trace_kvm_mmu_pagetable_walk(addr, access);
 retry_walk:
-	eperm = false;
 	walker->level = mmu->root_level;
 	pte           = mmu->get_cr3(vcpu);
 
@@ -231,8 +229,7 @@ retry_walk:
 		walker->ptes[walker->level - 1] = pte;
 	} while (!is_last_gpte(mmu, walker->level, pte));
 
-	eperm |= permission_fault(mmu, pte_access, access);
-	if (unlikely(eperm)) {
+	if (unlikely(permission_fault(mmu, pte_access, access))) {
 		errcode |= PFERR_PRESENT_MASK;
 		goto error;
 	}
-- 
1.7.12



* [PATCH v2 9/9] KVM: MMU: Avoid access/dirty update loop if all is well
  2012-09-16 12:07 [PATCH v2 0/9] Optimize page table walk Avi Kivity
                   ` (7 preceding siblings ...)
  2012-09-16 12:07 ` [PATCH v2 8/9] KVM: MMU: Eliminate eperm temporary Avi Kivity
@ 2012-09-16 12:07 ` Avi Kivity
  2012-09-18  7:21   ` Xiao Guangrong
  8 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2012-09-16 12:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Xiao Guangrong

Keep track of accessed/dirty bits; if they are all set, do not
enter the accessed/dirty update loop.
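
A worked example of the folding trick (simplified standalone version; the
masks mirror the hardware A/D bit positions): write_fault >>
ilog2(PFERR_WRITE_MASK) is 1 for a write fault and 0 otherwise, and
shifting the pte right by 1 moves the dirty bit (bit 6) onto the accessed
position (bit 5).

#include <stdio.h>

#define PT_ACCESSED_MASK (1u << 5)
#define PT_DIRTY_MASK    (1u << 6)
#define PFERR_WRITE_MASK (1u << 1)

int main(void)
{
	unsigned pte = PT_ACCESSED_MASK;		/* leaf gpte: A=1, D=0 */
	unsigned write_fault = PFERR_WRITE_MASK;	/* servicing a write fault */
	unsigned accessed_dirty = PT_ACCESSED_MASK;

	accessed_dirty &= pte;			/* per-level step: A still set */

	/* Fold D into the accessed position on writes (shift by 1), no-op on reads. */
	accessed_dirty &= pte >> (write_fault >> 1);	/* ilog2(PFERR_WRITE_MASK) == 1 */

	printf("accessed_dirty = %#x -> %s\n", accessed_dirty,
	       accessed_dirty ? "skip the update loop" : "enter the update loop");
	return 0;
}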

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/paging_tmpl.h | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 95a64d1..5bb3d4f 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -151,7 +151,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	pt_element_t pte;
 	pt_element_t __user *uninitialized_var(ptep_user);
 	gfn_t table_gfn;
-	unsigned index, pt_access, pte_access;
+	unsigned index, pt_access, pte_access, accessed_dirty;
 	gpa_t pte_gpa;
 	int offset;
 	const int write_fault = access & PFERR_WRITE_MASK;
@@ -180,6 +180,7 @@ retry_walk:
 	ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
 	       (mmu->get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
 
+	accessed_dirty = PT_ACCESSED_MASK;
 	pt_access = pte_access = ACC_ALL;
 	++walker->level;
 
@@ -224,6 +225,7 @@ retry_walk:
 			goto error;
 		}
 
+		accessed_dirty &= pte;
 		pte_access = pt_access & gpte_access(vcpu, pte);
 
 		walker->ptes[walker->level - 1] = pte;
@@ -251,11 +253,21 @@ retry_walk:
 	if (!write_fault)
 		protect_clean_gpte(&pte_access, pte);
 
-	ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker, write_fault);
-	if (unlikely(ret < 0))
-		goto error;
-	else if (ret)
-		goto retry_walk;
+	/*
+	 * On a write fault, fold the dirty bit into accessed_dirty by shifting it one
+	 * place right.
+	 *
+	 * On a read fault, do nothing.
+	 */
+	accessed_dirty &= pte >> (write_fault >> ilog2(PFERR_WRITE_MASK));
+
+	if (unlikely(!accessed_dirty)) {
+		ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker, write_fault);
+		if (unlikely(ret < 0))
+			goto error;
+		else if (ret)
+			goto retry_walk;
+	}
 
 	walker->pt_access = pt_access;
 	walker->pte_access = pte_access;
-- 
1.7.12



* Re: [PATCH v2 4/9] KVM: MMU: Update accessed and dirty bits after guest pagetable walk
  2012-09-16 12:07 ` [PATCH v2 4/9] KVM: MMU: Update accessed and dirty bits after guest pagetable walk Avi Kivity
@ 2012-09-18  6:37   ` Xiao Guangrong
  0 siblings, 0 replies; 21+ messages in thread
From: Xiao Guangrong @ 2012-09-18  6:37 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm

On 09/16/2012 08:07 PM, Avi Kivity wrote:
> While unspecified, the behaviour of Intel processors is to first
> perform the page table walk, then, if the walk was successful, to
> atomically update the accessed and dirty bits of walked paging elements.
> 
> While we are not required to follow this exactly, doing so will allow us
> to perform the access permissions check after the walk is complete, rather
> than after each walk step.
> 
> (The tricky case is SMEP: a zero in any pte's U bit makes the referenced
> page a supervisor page, so we cannot decide to fault on a U=1 bit seen
> during the walk itself.)

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>



* Re: [PATCH v2 5/9] KVM: MMU: Optimize pte permission checks
  2012-09-16 12:07 ` [PATCH v2 5/9] KVM: MMU: Optimize pte permission checks Avi Kivity
@ 2012-09-18  6:45   ` Xiao Guangrong
  2012-09-19 16:06     ` Avi Kivity
  0 siblings, 1 reply; 21+ messages in thread
From: Xiao Guangrong @ 2012-09-18  6:45 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm

On 09/16/2012 08:07 PM, Avi Kivity wrote:

> @@ -3672,20 +3672,17 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
>  				gpa_t *gpa, struct x86_exception *exception,
>  				bool write)
>  {
> -	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
> +	u32 access = ((kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0)
> +		| (write ? PFERR_WRITE_MASK : 0);
> 
> -	if (vcpu_match_mmio_gva(vcpu, gva) &&
> -		  check_write_user_access(vcpu, write, access,
> -		  vcpu->arch.access)) {
> +	if (vcpu_match_mmio_gva(vcpu, gva)
> +	    && permission_fault(vcpu->arch.walk_mmu, vcpu->arch.access, access)) {

I think a '!' is missing; should it be !permission_fault()?

Otherwise looks good to me!

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>



* Re: [PATCH v2 6/9] KVM: MMU: Simplify walk_addr_generic() loop
  2012-09-16 12:07 ` [PATCH v2 6/9] KVM: MMU: Simplify walk_addr_generic() loop Avi Kivity
@ 2012-09-18  6:53   ` Xiao Guangrong
  2012-09-19 16:29     ` Avi Kivity
  0 siblings, 1 reply; 21+ messages in thread
From: Xiao Guangrong @ 2012-09-18  6:53 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm

On 09/16/2012 08:07 PM, Avi Kivity wrote:

> 
> -	pt_access = ACC_ALL;
> +	pt_access = pte_access = ACC_ALL;
> +	++walker->level;
> 
> -	for (;;) {
> +	do {
>  		gfn_t real_gfn;
>  		unsigned long host_addr;
> 
> +		pt_access &= pte_access;
> +		--walker->level;

Any reason to increase walker->level before the loop and decrease it here?
Why not use the original style? :)

> +	gfn = gpte_to_gfn_lvl(pte, walker->level);
> +	gfn += (addr & PT_LVL_OFFSET_MASK(walker->level)) >> PAGE_SHIFT;
> +
> +	if (PTTYPE == 32 && walker->level == PT_DIRECTORY_LEVEL && is_cpuid_PSE36())
> +		gfn += pse36_gfn_delta(pte);
> +
> +	ac = write_fault | fetch_fault | user_fault;

Can use 'access' instead.

Otherwise looks good to me.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>




* Re: [PATCH v2 8/9] KVM: MMU: Eliminate eperm temporary
  2012-09-16 12:07 ` [PATCH v2 8/9] KVM: MMU: Eliminate eperm temporary Avi Kivity
@ 2012-09-18  7:00   ` Xiao Guangrong
  0 siblings, 0 replies; 21+ messages in thread
From: Xiao Guangrong @ 2012-09-18  7:00 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm

On 09/16/2012 08:07 PM, Avi Kivity wrote:
> 'eperm' is no longer used in the walker loop, so we can eliminate it.
> 

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>



* Re: [PATCH v2 9/9] KVM: MMU: Avoid access/dirty update loop if all is well
  2012-09-16 12:07 ` [PATCH v2 9/9] KVM: MMU: Avoid access/dirty update loop if all is well Avi Kivity
@ 2012-09-18  7:21   ` Xiao Guangrong
  2012-09-19 16:12     ` Avi Kivity
  0 siblings, 1 reply; 21+ messages in thread
From: Xiao Guangrong @ 2012-09-18  7:21 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm

On 09/16/2012 08:07 PM, Avi Kivity wrote:

> +	/*
> +	 * On a write fault, fold the dirty bit into accessed_dirty by shifting it one
> +	 * place right.
> +	 *
> +	 * On a read fault, do nothing.
> +	 */
> +	accessed_dirty &= pte >> (write_fault >> ilog2(PFERR_WRITE_MASK));

It is too tricky: it depends on DIRTY_SHIFT = ACCESSED_SHIFT + 1. How about changing it
to:
  accessed_dirty &= pte >> (!!write_fault >> (DIRTY_SHIFT - ACCESSED_SHIFT)) ?


Otherwise looks good to me.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>



* Re: [PATCH v2 7/9] KVM: MMU: Optimize is_last_gpte()
  2012-09-16 12:07 ` [PATCH v2 7/9] KVM: MMU: Optimize is_last_gpte() Avi Kivity
@ 2012-09-18  7:36   ` Xiao Guangrong
  2012-09-19 17:17     ` Avi Kivity
  0 siblings, 1 reply; 21+ messages in thread
From: Xiao Guangrong @ 2012-09-18  7:36 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm, Xiao Guangrong

On 09/16/2012 08:07 PM, Avi Kivity wrote:
> Instead of branchy code depending on level, gpte.ps, and mmu configuration,
> prepare everything in a bitmap during mode changes and look it up during
> runtime.
> 

Avi,

Can we introduce an ignore_bits_mask[] (like rsvd_bits_mask)? If pse is not supported,
we can set PTE_PSE in this ignore_bits_mask; then the function can be simplified to

	return (level == 1) || gpte & ~vcpu->ignore_bits_mask[level] & PTE_PSE

I think it is more readable.



* Re: [PATCH v2 5/9] KVM: MMU: Optimize pte permission checks
  2012-09-18  6:45   ` Xiao Guangrong
@ 2012-09-19 16:06     ` Avi Kivity
  0 siblings, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2012-09-19 16:06 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, kvm

On 09/18/2012 09:45 AM, Xiao Guangrong wrote:
> On 09/16/2012 08:07 PM, Avi Kivity wrote:
> 
>> @@ -3672,20 +3672,17 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
>>  				gpa_t *gpa, struct x86_exception *exception,
>>  				bool write)
>>  {
>> -	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
>> +	u32 access = ((kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0)
>> +		| (write ? PFERR_WRITE_MASK : 0);
>> 
>> -	if (vcpu_match_mmio_gva(vcpu, gva) &&
>> -		  check_write_user_access(vcpu, write, access,
>> -		  vcpu->arch.access)) {
>> +	if (vcpu_match_mmio_gva(vcpu, gva)
>> +	    && permission_fault(vcpu->arch.walk_mmu, vcpu->arch.access, access)) {
> 
> I think a '!' is missing; should it be !permission_fault()?
> 
> Otherwise looks good to me!

Whoops, fixed.


-- 
error compiling committee.c: too many arguments to function


* Re: [PATCH v2 9/9] KVM: MMU: Avoid access/dirty update loop if all is well
  2012-09-18  7:21   ` Xiao Guangrong
@ 2012-09-19 16:12     ` Avi Kivity
  0 siblings, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2012-09-19 16:12 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, kvm

On 09/18/2012 10:21 AM, Xiao Guangrong wrote:
> On 09/16/2012 08:07 PM, Avi Kivity wrote:
> 
>> +	/*
>> +	 * On a write fault, fold the dirty bit into accessed_dirty by shifting it one
>> +	 * place right.
>> +	 *
>> +	 * On a read fault, do nothing.
>> +	 */
>> +	accessed_dirty &= pte >> (write_fault >> ilog2(PFERR_WRITE_MASK));
> 
> It is too tricky: it depends on DIRTY_SHIFT = ACCESSED_SHIFT + 1. How about changing it
> to:
>   accessed_dirty &= pte >> (!!write_fault >> (DIRTY_SHIFT - ACCESSED_SHIFT)) ?

!! forces a branch, unless gcc is really clever.  So I changed it to

	shift = write_fault >> ilog2(PFERR_WRITE_MASK);
	shift *= PT_DIRTY_SHIFT - PT_ACCESSED_SHIFT;
	accessed_dirty &= pte >> shift;

which should result in the same code.

-- 
error compiling committee.c: too many arguments to function


* Re: [PATCH v2 6/9] KVM: MMU: Simplify walk_addr_generic() loop
  2012-09-18  6:53   ` Xiao Guangrong
@ 2012-09-19 16:29     ` Avi Kivity
  0 siblings, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2012-09-19 16:29 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, kvm

On 09/18/2012 09:53 AM, Xiao Guangrong wrote:
> On 09/16/2012 08:07 PM, Avi Kivity wrote:
> 
>> 
>> -	pt_access = ACC_ALL;
>> +	pt_access = pte_access = ACC_ALL;
>> +	++walker->level;
>> 
>> -	for (;;) {
>> +	do {
>>  		gfn_t real_gfn;
>>  		unsigned long host_addr;
>> 
>> +		pt_access &= pte_access;
>> +		--walker->level;
> 
> Any reason to increase walker->level before the loop and decrease it here?
> Why not use the original style? :)

The original code had

      if (last_gpte) {
         ...
         break;
      }
      --walker->level
   }

Since my change moves the check to the last '}', it would include an
extra decrement of walker->level.

> 
>> +	gfn = gpte_to_gfn_lvl(pte, walker->level);
>> +	gfn += (addr & PT_LVL_OFFSET_MASK(walker->level)) >> PAGE_SHIFT;
>> +
>> +	if (PTTYPE == 32 && walker->level == PT_DIRECTORY_LEVEL && is_cpuid_PSE36())
>> +		gfn += pse36_gfn_delta(pte);
>> +
>> +	ac = write_fault | fetch_fault | user_fault;
> 
> Can use 'access' instead.
> 

'access' also has other bits set; we need to check whether we have to mask
them off.  Will add a separate patch for this.

-- 
error compiling committee.c: too many arguments to function


* Re: [PATCH v2 7/9] KVM: MMU: Optimize is_last_gpte()
  2012-09-18  7:36   ` Xiao Guangrong
@ 2012-09-19 17:17     ` Avi Kivity
  2012-09-19 17:18       ` Avi Kivity
  0 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2012-09-19 17:17 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, kvm

On 09/18/2012 10:36 AM, Xiao Guangrong wrote:
> On 09/16/2012 08:07 PM, Avi Kivity wrote:
>> Instead of branchy code depending on level, gpte.ps, and mmu configuration,
>> prepare everything in a bitmap during mode changes and look it up during
>> runtime.
>> 
> 
> Avi,
> 
> Can we introduce an ignore_bits_mask[] (like rsvd_bits_mask)? If pse is not supported,
> we can set PTE_PSE in this ignore_bits_mask; then the function can be simplified to
> 
> 	return (level == 1) || gpte & ~vcpu->ignore_bits_mask[level] & PTE_PSE
> 
> I think it is more readable.

It's more readable, yes, but those loads do show up in profiles.  So I'd
like to avoid them unless they save a lot of code.

-- 
error compiling committee.c: too many arguments to function


* Re: [PATCH v2 7/9] KVM: MMU: Optimize is_last_gpte()
  2012-09-19 17:17     ` Avi Kivity
@ 2012-09-19 17:18       ` Avi Kivity
  0 siblings, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2012-09-19 17:18 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, kvm

On 09/19/2012 08:17 PM, Avi Kivity wrote:
> On 09/18/2012 10:36 AM, Xiao Guangrong wrote:
>> On 09/16/2012 08:07 PM, Avi Kivity wrote:
>>> Instead of branchy code depending on level, gpte.ps, and mmu configuration,
>>> prepare everything in a bitmap during mode changes and look it up during
>>> runtime.
>>> 
>> 
>> Avi,
>> 
>> Can we introduce an ignore_bits_mask[] (like rsvd_bits_mask)? If pse is not supported,
>> we can set PTE_PSE in this ignore_bits_mask; then the function can be simplified to
>> 
>> 	return (level == 1) || gpte & ~vcpu->ignore_bits_mask[level] & PTE_PSE
>> 
>> I think it is more readable.
> 
> It's more readable, yes, but those loads do show up in profiles.  So I'd
> like to avoid them unless they save a lot of code.

Oops, this is replacing one load with another.  Still, the proposed code is
more branchy.


-- 
error compiling committee.c: too many arguments to function

