public inbox for kvm@vger.kernel.org
* [patch 00/13] out of sync shadow v3
@ 2008-09-22 19:16 Marcelo Tosatti
  2008-09-22 19:16 ` [patch 01/13] KVM: MMU: flush remote TLBs on large->normal entry overwrite Marcelo Tosatti
                   ` (13 more replies)
  0 siblings, 14 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2008-09-22 19:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

Addressing v2 comments, and adding an option to disable oos.

Windows 2003 DDK build (4-way guest) results:
mainline  oos
04:37     03:42    (~20% reduction in build time)

kernel builds on 4-way guest improve by 10%.

-- 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 01/13] KVM: MMU: flush remote TLBs on large->normal entry overwrite
  2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
@ 2008-09-22 19:16 ` Marcelo Tosatti
  2008-09-22 19:16 ` [patch 02/13] KVM: MMU: split mmu_set_spte Marcelo Tosatti
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2008-09-22 19:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti

[-- Attachment #1: flush-tlb-on-lpage-overwrite --]
[-- Type: text/plain, Size: 775 bytes --]

It is necessary to flush all TLBs when a large spte entry is
overwritten with a normal page directory pointer.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -310,8 +310,11 @@ static int FNAME(shadow_walk_entry)(stru
 	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
 		return 0;
 
-	if (is_large_pte(*sptep))
+	if (is_large_pte(*sptep)) {
+		set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
+		kvm_flush_remote_tlbs(vcpu->kvm);
 		rmap_remove(vcpu->kvm, sptep);
+	}
 
 	if (level == PT_DIRECTORY_LEVEL && gw->level == PT_DIRECTORY_LEVEL) {
 		metaphysical = 1;

-- 



* [patch 02/13] KVM: MMU: split mmu_set_spte
  2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
  2008-09-22 19:16 ` [patch 01/13] KVM: MMU: flush remote TLBs on large->normal entry overwrite Marcelo Tosatti
@ 2008-09-22 19:16 ` Marcelo Tosatti
  2008-09-22 19:16 ` [patch 03/13] KVM: MMU: move local TLB flush to mmu_set_spte Marcelo Tosatti
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2008-09-22 19:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti

[-- Attachment #1: mmu-set-spte --]
[-- Type: text/plain, Size: 4394 bytes --]

Split the spte entry creation code into a new set_spte function.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1148,44 +1148,13 @@ struct page *gva_to_page(struct kvm_vcpu
 	return page;
 }
 
-static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
-			 unsigned pt_access, unsigned pte_access,
-			 int user_fault, int write_fault, int dirty,
-			 int *ptwrite, int largepage, gfn_t gfn,
-			 pfn_t pfn, bool speculative)
+static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
+		    unsigned pte_access, int user_fault,
+		    int write_fault, int dirty, int largepage,
+		    gfn_t gfn, pfn_t pfn, bool speculative)
 {
 	u64 spte;
-	int was_rmapped = 0;
-	int was_writeble = is_writeble_pte(*shadow_pte);
-
-	pgprintk("%s: spte %llx access %x write_fault %d"
-		 " user_fault %d gfn %lx\n",
-		 __func__, *shadow_pte, pt_access,
-		 write_fault, user_fault, gfn);
-
-	if (is_rmap_pte(*shadow_pte)) {
-		/*
-		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
-		 * the parent of the now unreachable PTE.
-		 */
-		if (largepage && !is_large_pte(*shadow_pte)) {
-			struct kvm_mmu_page *child;
-			u64 pte = *shadow_pte;
-
-			child = page_header(pte & PT64_BASE_ADDR_MASK);
-			mmu_page_remove_parent_pte(child, shadow_pte);
-		} else if (pfn != spte_to_pfn(*shadow_pte)) {
-			pgprintk("hfn old %lx new %lx\n",
-				 spte_to_pfn(*shadow_pte), pfn);
-			rmap_remove(vcpu->kvm, shadow_pte);
-		} else {
-			if (largepage)
-				was_rmapped = is_large_pte(*shadow_pte);
-			else
-				was_rmapped = 1;
-		}
-	}
-
+	int ret = 0;
 	/*
 	 * We don't set the accessed bit, since we sometimes want to see
 	 * whether the guest actually used the pte (in order to detect
@@ -1218,26 +1187,70 @@ static void mmu_set_spte(struct kvm_vcpu
 		   (largepage && has_wrprotected_page(vcpu->kvm, gfn))) {
 			pgprintk("%s: found shadow page for %lx, marking ro\n",
 				 __func__, gfn);
+			ret = 1;
 			pte_access &= ~ACC_WRITE_MASK;
 			if (is_writeble_pte(spte)) {
 				spte &= ~PT_WRITABLE_MASK;
 				kvm_x86_ops->tlb_flush(vcpu);
 			}
-			if (write_fault)
-				*ptwrite = 1;
 		}
 	}
 
 	if (pte_access & ACC_WRITE_MASK)
 		mark_page_dirty(vcpu->kvm, gfn);
 
-	pgprintk("%s: setting spte %llx\n", __func__, spte);
-	pgprintk("instantiating %s PTE (%s) at %ld (%llx) addr %p\n",
-		 (spte&PT_PAGE_SIZE_MASK)? "2MB" : "4kB",
-		 (spte&PT_WRITABLE_MASK)?"RW":"R", gfn, spte, shadow_pte);
 	set_shadow_pte(shadow_pte, spte);
-	if (!was_rmapped && (spte & PT_PAGE_SIZE_MASK)
-	    && (spte & PT_PRESENT_MASK))
+	return ret;
+}
+
+
+static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
+			 unsigned pt_access, unsigned pte_access,
+			 int user_fault, int write_fault, int dirty,
+			 int *ptwrite, int largepage, gfn_t gfn,
+			 pfn_t pfn, bool speculative)
+{
+	int was_rmapped = 0;
+	int was_writeble = is_writeble_pte(*shadow_pte);
+
+	pgprintk("%s: spte %llx access %x write_fault %d"
+		 " user_fault %d gfn %lx\n",
+		 __func__, *shadow_pte, pt_access,
+		 write_fault, user_fault, gfn);
+
+	if (is_rmap_pte(*shadow_pte)) {
+		/*
+		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
+		 * the parent of the now unreachable PTE.
+		 */
+		if (largepage && !is_large_pte(*shadow_pte)) {
+			struct kvm_mmu_page *child;
+			u64 pte = *shadow_pte;
+
+			child = page_header(pte & PT64_BASE_ADDR_MASK);
+			mmu_page_remove_parent_pte(child, shadow_pte);
+		} else if (pfn != spte_to_pfn(*shadow_pte)) {
+			pgprintk("hfn old %lx new %lx\n",
+				 spte_to_pfn(*shadow_pte), pfn);
+			rmap_remove(vcpu->kvm, shadow_pte);
+		} else {
+			if (largepage)
+				was_rmapped = is_large_pte(*shadow_pte);
+			else
+				was_rmapped = 1;
+		}
+	}
+	if (set_spte(vcpu, shadow_pte, pte_access, user_fault, write_fault,
+		      dirty, largepage, gfn, pfn, speculative))
+		if (write_fault)
+			*ptwrite = 1;
+
+	pgprintk("%s: setting spte %llx\n", __func__, *shadow_pte);
+	pgprintk("instantiating %s PTE (%s) at %ld (%llx) addr %p\n",
+		 is_large_pte(*shadow_pte)? "2MB" : "4kB",
+		 is_present_pte(*shadow_pte)?"RW":"R", gfn,
+		 *shadow_pte, shadow_pte);
+	if (!was_rmapped && is_large_pte(*shadow_pte))
 		++vcpu->kvm->stat.lpages;
 
 	page_header_update_slot(vcpu->kvm, shadow_pte, gfn);

-- 



* [patch 03/13] KVM: MMU: move local TLB flush to mmu_set_spte
  2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
  2008-09-22 19:16 ` [patch 01/13] KVM: MMU: flush remote TLBs on large->normal entry overwrite Marcelo Tosatti
  2008-09-22 19:16 ` [patch 02/13] KVM: MMU: split mmu_set_spte Marcelo Tosatti
@ 2008-09-22 19:16 ` Marcelo Tosatti
  2008-09-22 19:16 ` [patch 04/13] KVM: MMU: do not write-protect large mappings Marcelo Tosatti
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2008-09-22 19:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti

[-- Attachment #1: mmu-set-spte-tlb-flush --]
[-- Type: text/plain, Size: 997 bytes --]

Move the local TLB flush to mmu_set_spte, since the sync page path
can then collapse multiple flushes into one.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1189,10 +1189,8 @@ static int set_spte(struct kvm_vcpu *vcp
 				 __func__, gfn);
 			ret = 1;
 			pte_access &= ~ACC_WRITE_MASK;
-			if (is_writeble_pte(spte)) {
+			if (is_writeble_pte(spte))
 				spte &= ~PT_WRITABLE_MASK;
-				kvm_x86_ops->tlb_flush(vcpu);
-			}
 		}
 	}
 
@@ -1241,9 +1239,11 @@ static void mmu_set_spte(struct kvm_vcpu
 		}
 	}
 	if (set_spte(vcpu, shadow_pte, pte_access, user_fault, write_fault,
-		      dirty, largepage, gfn, pfn, speculative))
+		      dirty, largepage, gfn, pfn, speculative)) {
 		if (write_fault)
 			*ptwrite = 1;
+		kvm_x86_ops->tlb_flush(vcpu);
+	}
 
 	pgprintk("%s: setting spte %llx\n", __func__, *shadow_pte);
 	pgprintk("instantiating %s PTE (%s) at %ld (%llx) addr %p\n",

-- 



* [patch 04/13] KVM: MMU: do not write-protect large mappings
  2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
                   ` (2 preceding siblings ...)
  2008-09-22 19:16 ` [patch 03/13] KVM: MMU: move local TLB flush to mmu_set_spte Marcelo Tosatti
@ 2008-09-22 19:16 ` Marcelo Tosatti
  2008-09-22 19:16 ` [patch 05/13] KVM: MMU: mode specific sync_page Marcelo Tosatti
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2008-09-22 19:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti

[-- Attachment #1: no-wrprotect-large-parge --]
[-- Type: text/plain, Size: 1329 bytes --]

There is not much point in write-protecting large mappings. This
can only happen when a page is shadowed during the window between
is_largepage_backed and mmu_lock acquisition. Zap the entry instead, so
the next page fault will find a shadowed page via is_largepage_backed and
fall back to 4k translations.

Simplifies out of sync shadow.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1180,11 +1180,16 @@ static int set_spte(struct kvm_vcpu *vcp
 	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
 		struct kvm_mmu_page *shadow;
 
+		if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) {
+			ret = 1;
+			spte = shadow_trap_nonpresent_pte;
+			goto set_pte;
+		}
+
 		spte |= PT_WRITABLE_MASK;
 
 		shadow = kvm_mmu_lookup_page(vcpu->kvm, gfn);
-		if (shadow ||
-		   (largepage && has_wrprotected_page(vcpu->kvm, gfn))) {
+		if (shadow) {
 			pgprintk("%s: found shadow page for %lx, marking ro\n",
 				 __func__, gfn);
 			ret = 1;
@@ -1197,6 +1202,7 @@ static int set_spte(struct kvm_vcpu *vcp
 	if (pte_access & ACC_WRITE_MASK)
 		mark_page_dirty(vcpu->kvm, gfn);
 
+set_pte:
 	set_shadow_pte(shadow_pte, spte);
 	return ret;
 }

-- 



* [patch 05/13] KVM: MMU: mode specific sync_page
  2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
                   ` (3 preceding siblings ...)
  2008-09-22 19:16 ` [patch 04/13] KVM: MMU: do not write-protect large mappings Marcelo Tosatti
@ 2008-09-22 19:16 ` Marcelo Tosatti
  2008-09-22 19:16 ` [patch 06/13] KVM: MMU: sync roots on mmu reload Marcelo Tosatti
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2008-09-22 19:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti

[-- Attachment #1: kvm-oos-sync-page --]
[-- Type: text/plain, Size: 4334 bytes --]

Examine the guest pagetable and bring the shadow back in sync. The caller
is responsible for a local TLB flush before re-entering guest mode.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -871,6 +871,12 @@ static void nonpaging_prefetch_page(stru
 		sp->spt[i] = shadow_trap_nonpresent_pte;
 }
 
+static int nonpaging_sync_page(struct kvm_vcpu *vcpu,
+			       struct kvm_mmu_page *sp)
+{
+	return 1;
+}
+
 static struct kvm_mmu_page *kvm_mmu_lookup_page(struct kvm *kvm, gfn_t gfn)
 {
 	unsigned index;
@@ -1547,6 +1553,7 @@ static int nonpaging_init_context(struct
 	context->gva_to_gpa = nonpaging_gva_to_gpa;
 	context->free = nonpaging_free;
 	context->prefetch_page = nonpaging_prefetch_page;
+	context->sync_page = nonpaging_sync_page;
 	context->root_level = 0;
 	context->shadow_root_level = PT32E_ROOT_LEVEL;
 	context->root_hpa = INVALID_PAGE;
@@ -1594,6 +1601,7 @@ static int paging64_init_context_common(
 	context->page_fault = paging64_page_fault;
 	context->gva_to_gpa = paging64_gva_to_gpa;
 	context->prefetch_page = paging64_prefetch_page;
+	context->sync_page = paging64_sync_page;
 	context->free = paging_free;
 	context->root_level = level;
 	context->shadow_root_level = level;
@@ -1615,6 +1623,7 @@ static int paging32_init_context(struct 
 	context->gva_to_gpa = paging32_gva_to_gpa;
 	context->free = paging_free;
 	context->prefetch_page = paging32_prefetch_page;
+	context->sync_page = paging32_sync_page;
 	context->root_level = PT32_ROOT_LEVEL;
 	context->shadow_root_level = PT32E_ROOT_LEVEL;
 	context->root_hpa = INVALID_PAGE;
@@ -1634,6 +1643,7 @@ static int init_kvm_tdp_mmu(struct kvm_v
 	context->page_fault = tdp_page_fault;
 	context->free = nonpaging_free;
 	context->prefetch_page = nonpaging_prefetch_page;
+	context->sync_page = nonpaging_sync_page;
 	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
 	context->root_hpa = INVALID_PAGE;
 
Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -507,6 +507,60 @@ static void FNAME(prefetch_page)(struct 
 	}
 }
 
+/*
+ * Using the cached information from sp->gfns is safe because:
+ * - The spte has a reference to the struct page, so the pfn for a given gfn
+ *   can't change unless all sptes pointing to it are nuked first.
+ * - Alias changes zap the entire shadow cache.
+ */
+static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	int i, offset, nr_present;
+
+	offset = nr_present = 0;
+
+	if (PTTYPE == 32)
+		offset = sp->role.quadrant << PT64_LEVEL_BITS;
+
+	for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
+		unsigned pte_access;
+		pt_element_t gpte;
+		gpa_t pte_gpa;
+		gfn_t gfn = sp->gfns[i];
+
+		if (!is_shadow_present_pte(sp->spt[i]))
+			continue;
+
+		pte_gpa = gfn_to_gpa(sp->gfn);
+		pte_gpa += (i+offset) * sizeof(pt_element_t);
+
+		if (kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &gpte,
+					  sizeof(pt_element_t)))
+			return -EINVAL;
+
+		if (gpte_to_gfn(gpte) != gfn || !is_present_pte(gpte) ||
+		    !(gpte & PT_ACCESSED_MASK)) {
+			u64 nonpresent;
+
+			rmap_remove(vcpu->kvm, &sp->spt[i]);
+			if (is_present_pte(gpte))
+				nonpresent = shadow_trap_nonpresent_pte;
+			else
+				nonpresent = shadow_notrap_nonpresent_pte;
+			set_shadow_pte(&sp->spt[i], nonpresent);
+			continue;
+		}
+
+		nr_present++;
+		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+		set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
+			 is_dirty_pte(gpte), 0, gfn,
+			 spte_to_pfn(sp->spt[i]), true);
+	}
+
+	return !nr_present;
+}
+
 #undef pt_element_t
 #undef guest_walker
 #undef shadow_walker
Index: kvm/include/asm-x86/kvm_host.h
===================================================================
--- kvm.orig/include/asm-x86/kvm_host.h
+++ kvm/include/asm-x86/kvm_host.h
@@ -220,6 +220,8 @@ struct kvm_mmu {
 	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva);
 	void (*prefetch_page)(struct kvm_vcpu *vcpu,
 			      struct kvm_mmu_page *page);
+	int (*sync_page)(struct kvm_vcpu *vcpu,
+			 struct kvm_mmu_page *sp);
 	hpa_t root_hpa;
 	int root_level;
 	int shadow_root_level;

-- 



* [patch 06/13] KVM: MMU: sync roots on mmu reload
  2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
                   ` (4 preceding siblings ...)
  2008-09-22 19:16 ` [patch 05/13] KVM: MMU: mode specific sync_page Marcelo Tosatti
@ 2008-09-22 19:16 ` Marcelo Tosatti
  2008-09-22 19:16 ` [patch 07/13] KVM: x86: trap invlpg Marcelo Tosatti
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2008-09-22 19:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti

[-- Attachment #1: mmu-sync-roots --]
[-- Type: text/plain, Size: 2381 bytes --]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1471,6 +1471,41 @@ static void mmu_alloc_roots(struct kvm_v
 	vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
 }
 
+static void mmu_sync_children(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+}
+
+static void mmu_sync_roots(struct kvm_vcpu *vcpu)
+{
+	int i;
+	struct kvm_mmu_page *sp;
+
+	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
+		return;
+	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+		hpa_t root = vcpu->arch.mmu.root_hpa;
+		sp = page_header(root);
+		mmu_sync_children(vcpu, sp);
+		return;
+	}
+	for (i = 0; i < 4; ++i) {
+		hpa_t root = vcpu->arch.mmu.pae_root[i];
+
+		if (root) {
+			root &= PT64_BASE_ADDR_MASK;
+			sp = page_header(root);
+			mmu_sync_children(vcpu, sp);
+		}
+	}
+}
+
+void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
+{
+	spin_lock(&vcpu->kvm->mmu_lock);
+	mmu_sync_roots(vcpu);
+	spin_unlock(&vcpu->kvm->mmu_lock);
+}
+
 static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr)
 {
 	return vaddr;
@@ -1715,6 +1750,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	spin_lock(&vcpu->kvm->mmu_lock);
 	kvm_mmu_free_some_pages(vcpu);
 	mmu_alloc_roots(vcpu);
+	mmu_sync_roots(vcpu);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 	kvm_x86_ops->set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
 	kvm_mmu_flush_tlb(vcpu);
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -594,6 +594,7 @@ EXPORT_SYMBOL_GPL(kvm_set_cr4);
 void kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	if (cr3 == vcpu->arch.cr3 && !pdptrs_changed(vcpu)) {
+		kvm_mmu_sync_roots(vcpu);
 		kvm_mmu_flush_tlb(vcpu);
 		return;
 	}
Index: kvm/include/asm-x86/kvm_host.h
===================================================================
--- kvm.orig/include/asm-x86/kvm_host.h
+++ kvm/include/asm-x86/kvm_host.h
@@ -584,6 +584,7 @@ int kvm_mmu_unprotect_page_virt(struct k
 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
 int kvm_mmu_load(struct kvm_vcpu *vcpu);
 void kvm_mmu_unload(struct kvm_vcpu *vcpu);
+void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 

-- 



* [patch 07/13] KVM: x86: trap invlpg
  2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
                   ` (5 preceding siblings ...)
  2008-09-22 19:16 ` [patch 06/13] KVM: MMU: sync roots on mmu reload Marcelo Tosatti
@ 2008-09-22 19:16 ` Marcelo Tosatti
  2008-09-22 19:16 ` [patch 08/13] KVM: MMU: mmu_parent_walk Marcelo Tosatti
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2008-09-22 19:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti

[-- Attachment #1: mmu-invlpg --]
[-- Type: text/plain, Size: 8142 bytes --]

With pages out of sync, invlpg needs to be trapped. For now, simply
nuke the entry.

Untested on AMD.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -877,6 +877,10 @@ static int nonpaging_sync_page(struct kv
 	return 1;
 }
 
+static void nonpaging_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
+{
+}
+
 static struct kvm_mmu_page *kvm_mmu_lookup_page(struct kvm *kvm, gfn_t gfn)
 {
 	unsigned index;
@@ -1589,6 +1593,7 @@ static int nonpaging_init_context(struct
 	context->free = nonpaging_free;
 	context->prefetch_page = nonpaging_prefetch_page;
 	context->sync_page = nonpaging_sync_page;
+	context->invlpg = nonpaging_invlpg;
 	context->root_level = 0;
 	context->shadow_root_level = PT32E_ROOT_LEVEL;
 	context->root_hpa = INVALID_PAGE;
@@ -1637,6 +1642,7 @@ static int paging64_init_context_common(
 	context->gva_to_gpa = paging64_gva_to_gpa;
 	context->prefetch_page = paging64_prefetch_page;
 	context->sync_page = paging64_sync_page;
+	context->invlpg = paging64_invlpg;
 	context->free = paging_free;
 	context->root_level = level;
 	context->shadow_root_level = level;
@@ -1659,6 +1665,7 @@ static int paging32_init_context(struct 
 	context->free = paging_free;
 	context->prefetch_page = paging32_prefetch_page;
 	context->sync_page = paging32_sync_page;
+	context->invlpg = paging32_invlpg;
 	context->root_level = PT32_ROOT_LEVEL;
 	context->shadow_root_level = PT32E_ROOT_LEVEL;
 	context->root_hpa = INVALID_PAGE;
@@ -1679,6 +1686,7 @@ static int init_kvm_tdp_mmu(struct kvm_v
 	context->free = nonpaging_free;
 	context->prefetch_page = nonpaging_prefetch_page;
 	context->sync_page = nonpaging_sync_page;
+	context->invlpg = nonpaging_invlpg;
 	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
 	context->root_hpa = INVALID_PAGE;
 
@@ -2071,6 +2079,16 @@ out:
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
 
+void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
+{
+	spin_lock(&vcpu->kvm->mmu_lock);
+	vcpu->arch.mmu.invlpg(vcpu, gva);
+	spin_unlock(&vcpu->kvm->mmu_lock);
+	kvm_mmu_flush_tlb(vcpu);
+	++vcpu->stat.invlpg;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
+
 void kvm_enable_tdp(void)
 {
 	tdp_enabled = true;
Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -461,6 +461,31 @@ out_unlock:
 	return 0;
 }
 
+static int FNAME(shadow_invlpg_entry)(struct kvm_shadow_walk *_sw,
+				      struct kvm_vcpu *vcpu, u64 addr,
+				      u64 *sptep, int level)
+{
+
+	if (level == PT_PAGE_TABLE_LEVEL) {
+		if (is_shadow_present_pte(*sptep))
+			rmap_remove(vcpu->kvm, sptep);
+		set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
+		return 1;
+	}
+	if (!is_shadow_present_pte(*sptep))
+		return 1;
+	return 0;
+}
+
+static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
+{
+	struct shadow_walker walker = {
+		.walker = { .entry = FNAME(shadow_invlpg_entry), },
+	};
+
+	walk_shadow(&walker.walker, vcpu, gva);
+}
+
 static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr)
 {
 	struct guest_walker walker;
Index: kvm/arch/x86/kvm/svm.c
===================================================================
--- kvm.orig/arch/x86/kvm/svm.c
+++ kvm/arch/x86/kvm/svm.c
@@ -525,6 +525,7 @@ static void init_vmcb(struct vcpu_svm *s
 				(1ULL << INTERCEPT_CPUID) |
 				(1ULL << INTERCEPT_INVD) |
 				(1ULL << INTERCEPT_HLT) |
+				(1ULL << INTERCEPT_INVLPG) |
 				(1ULL << INTERCEPT_INVLPGA) |
 				(1ULL << INTERCEPT_IOIO_PROT) |
 				(1ULL << INTERCEPT_MSR_PROT) |
@@ -589,7 +590,8 @@ static void init_vmcb(struct vcpu_svm *s
 	if (npt_enabled) {
 		/* Setup VMCB for Nested Paging */
 		control->nested_ctl = 1;
-		control->intercept &= ~(1ULL << INTERCEPT_TASK_SWITCH);
+		control->intercept &= ~((1ULL << INTERCEPT_TASK_SWITCH) |
+					(1ULL << INTERCEPT_INVLPG));
 		control->intercept_exceptions &= ~(1 << PF_VECTOR);
 		control->intercept_cr_read &= ~(INTERCEPT_CR0_MASK|
 						INTERCEPT_CR3_MASK);
@@ -1164,6 +1166,13 @@ static int cpuid_interception(struct vcp
 	return 1;
 }
 
+static int invlpg_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	if (emulate_instruction(&svm->vcpu, kvm_run, 0, 0, 0) != EMULATE_DONE)
+		pr_unimpl(&svm->vcpu, "%s: failed\n", __func__);
+	return 1;
+}
+
 static int emulate_on_interception(struct vcpu_svm *svm,
 				   struct kvm_run *kvm_run)
 {
@@ -1417,7 +1426,7 @@ static int (*svm_exit_handlers[])(struct
 	[SVM_EXIT_CPUID]			= cpuid_interception,
 	[SVM_EXIT_INVD]                         = emulate_on_interception,
 	[SVM_EXIT_HLT]				= halt_interception,
-	[SVM_EXIT_INVLPG]			= emulate_on_interception,
+	[SVM_EXIT_INVLPG]			= invlpg_interception,
 	[SVM_EXIT_INVLPGA]			= invalid_op_interception,
 	[SVM_EXIT_IOIO] 		  	= io_interception,
 	[SVM_EXIT_MSR]				= msr_interception,
Index: kvm/arch/x86/kvm/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -1130,7 +1130,8 @@ static __init int setup_vmcs_config(stru
 	      CPU_BASED_CR3_STORE_EXITING |
 	      CPU_BASED_USE_IO_BITMAPS |
 	      CPU_BASED_MOV_DR_EXITING |
-	      CPU_BASED_USE_TSC_OFFSETING;
+	      CPU_BASED_USE_TSC_OFFSETING |
+	      CPU_BASED_INVLPG_EXITING;
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
 	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
@@ -1159,9 +1160,11 @@ static __init int setup_vmcs_config(stru
 		_cpu_based_exec_control &= ~CPU_BASED_TPR_SHADOW;
 #endif
 	if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
-		/* CR3 accesses don't need to cause VM Exits when EPT enabled */
+		/* CR3 accesses and invlpg don't need to cause VM Exits when EPT
+		   enabled */
 		min &= ~(CPU_BASED_CR3_LOAD_EXITING |
-			 CPU_BASED_CR3_STORE_EXITING);
+			 CPU_BASED_CR3_STORE_EXITING |
+			 CPU_BASED_INVLPG_EXITING);
 		if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS,
 					&_cpu_based_exec_control) < 0)
 			return -EIO;
@@ -2790,6 +2793,15 @@ static int handle_vmcall(struct kvm_vcpu
 	return 1;
 }
 
+static int handle_invlpg(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+{
+	u64 exit_qualification = vmcs_read64(EXIT_QUALIFICATION);
+
+	kvm_mmu_invlpg(vcpu, exit_qualification);
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
 static int handle_wbinvd(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
 	skip_emulated_instruction(vcpu);
@@ -2958,6 +2970,7 @@ static int (*kvm_vmx_exit_handlers[])(st
 	[EXIT_REASON_MSR_WRITE]               = handle_wrmsr,
 	[EXIT_REASON_PENDING_INTERRUPT]       = handle_interrupt_window,
 	[EXIT_REASON_HLT]                     = handle_halt,
+	[EXIT_REASON_INVLPG]		      = handle_invlpg,
 	[EXIT_REASON_VMCALL]                  = handle_vmcall,
 	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
 	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
Index: kvm/include/asm-x86/kvm_host.h
===================================================================
--- kvm.orig/include/asm-x86/kvm_host.h
+++ kvm/include/asm-x86/kvm_host.h
@@ -222,6 +222,7 @@ struct kvm_mmu {
 			      struct kvm_mmu_page *page);
 	int (*sync_page)(struct kvm_vcpu *vcpu,
 			 struct kvm_mmu_page *sp);
+	void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva);
 	hpa_t root_hpa;
 	int root_level;
 	int shadow_root_level;
@@ -591,6 +592,7 @@ int kvm_emulate_hypercall(struct kvm_vcp
 int kvm_fix_hypercall(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code);
+void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
 
 void kvm_enable_tdp(void);
 void kvm_disable_tdp(void);
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -2341,6 +2341,7 @@ static unsigned long get_segment_base(st
 
 int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address)
 {
+	kvm_mmu_invlpg(vcpu, address);
 	return X86EMUL_CONTINUE;
 }
 

-- 



* [patch 08/13] KVM: MMU: mmu_parent_walk
  2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
                   ` (6 preceding siblings ...)
  2008-09-22 19:16 ` [patch 07/13] KVM: x86: trap invlpg Marcelo Tosatti
@ 2008-09-22 19:16 ` Marcelo Tosatti
  2008-09-22 19:16 ` [patch 09/13] KVM: MMU: awareness of new kvm_mmu_zap_page behaviour Marcelo Tosatti
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2008-09-22 19:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti

[-- Attachment #1: walk-parent --]
[-- Type: text/plain, Size: 2128 bytes --]

Introduce a function to walk all parents of a given page, invoking a handler.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>


Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -147,6 +147,8 @@ struct kvm_shadow_walk {
 		     u64 addr, u64 *spte, int level);
 };
 
+typedef int (*mmu_parent_walk_fn) (struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
+
 static struct kmem_cache *pte_chain_cache;
 static struct kmem_cache *rmap_desc_cache;
 static struct kmem_cache *mmu_page_header_cache;
@@ -862,6 +864,65 @@ static void mmu_page_remove_parent_pte(s
 	BUG();
 }
 
+struct mmu_parent_walk {
+	struct hlist_node *node;
+	int i;
+};
+
+static struct kvm_mmu_page *mmu_parent_next(struct kvm_mmu_page *sp,
+					    struct mmu_parent_walk *walk)
+{
+	struct kvm_pte_chain *pte_chain;
+	struct hlist_head *h;
+
+	if (!walk->node) {
+		if (!sp || !sp->parent_pte)
+			return NULL;
+		if (!sp->multimapped)
+			return page_header(__pa(sp->parent_pte));
+		h = &sp->parent_ptes;
+		walk->node = h->first;
+		walk->i = 0;
+	}
+
+	while (walk->node) {
+		pte_chain = hlist_entry(walk->node, struct kvm_pte_chain, link);
+		while (walk->i < NR_PTE_CHAIN_ENTRIES) {
+			int i = walk->i++;
+			if (!pte_chain->parent_ptes[i])
+				break;
+			return page_header(__pa(pte_chain->parent_ptes[i]));
+		}
+		walk->node = walk->node->next;
+		walk->i = 0;
+	}
+
+	return NULL;
+}
+
+static void mmu_parent_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+			    mmu_parent_walk_fn fn)
+{
+	int level, start_level;
+	struct mmu_parent_walk walk[PT64_ROOT_LEVEL];
+
+	memset(&walk, 0, sizeof(walk));
+	level = start_level = sp->role.level;
+
+	do {
+		sp = mmu_parent_next(sp, &walk[level-1]);
+		if (sp) {
+			if (sp->role.level > start_level)
+				fn(vcpu, sp);
+			if (level != sp->role.level)
+				++level;
+			WARN_ON (level > PT64_ROOT_LEVEL);
+			continue;
+		}
+		--level;
+	} while (level > start_level-1);
+}
+
 static void nonpaging_prefetch_page(struct kvm_vcpu *vcpu,
 				    struct kvm_mmu_page *sp)
 {

-- 



* [patch 09/13] KVM: MMU: awareness of new kvm_mmu_zap_page behaviour
  2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
                   ` (7 preceding siblings ...)
  2008-09-22 19:16 ` [patch 08/13] KVM: MMU: mmu_parent_walk Marcelo Tosatti
@ 2008-09-22 19:16 ` Marcelo Tosatti
  2008-09-22 19:16 ` [patch 10/13] KVM: MMU: mmu_convert_notrap helper Marcelo Tosatti
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2008-09-22 19:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti

[-- Attachment #1: mmu-zap-ret --]
[-- Type: text/plain, Size: 1777 bytes --]

kvm_mmu_zap_page will soon zap the unsynced children of a page. Restart
the list walk in that case, since the saved next pointer may be stale.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1112,7 +1112,7 @@ static void kvm_mmu_unlink_parents(struc
 	}
 }
 
-static void kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	++kvm->stat.mmu_shadow_zapped;
 	kvm_mmu_page_unlink_children(kvm, sp);
@@ -1129,6 +1129,7 @@ static void kvm_mmu_zap_page(struct kvm 
 		kvm_reload_remote_mmus(kvm);
 	}
 	kvm_mmu_reset_last_pte_updated(kvm);
+	return 0;
 }
 
 /*
@@ -1181,8 +1182,9 @@ static int kvm_mmu_unprotect_page(struct
 		if (sp->gfn == gfn && !sp->role.metaphysical) {
 			pgprintk("%s: gfn %lx role %x\n", __func__, gfn,
 				 sp->role.word);
-			kvm_mmu_zap_page(kvm, sp);
 			r = 1;
+			if (kvm_mmu_zap_page(kvm, sp))
+				n = bucket->first;
 		}
 	return r;
 }
@@ -2026,7 +2028,8 @@ void kvm_mmu_pte_write(struct kvm_vcpu *
 			 */
 			pgprintk("misaligned: gpa %llx bytes %d role %x\n",
 				 gpa, bytes, sp->role.word);
-			kvm_mmu_zap_page(vcpu->kvm, sp);
+			if (kvm_mmu_zap_page(vcpu->kvm, sp))
+				n = bucket->first;
 			++vcpu->kvm->stat.mmu_flooded;
 			continue;
 		}
@@ -2260,7 +2263,9 @@ void kvm_mmu_zap_all(struct kvm *kvm)
 
 	spin_lock(&kvm->mmu_lock);
 	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link)
-		kvm_mmu_zap_page(kvm, sp);
+		if (kvm_mmu_zap_page(kvm, sp))
+			node = container_of(kvm->arch.active_mmu_pages.next,
+					    struct kvm_mmu_page, link);
 	spin_unlock(&kvm->mmu_lock);
 
 	kvm_flush_remote_tlbs(kvm);

-- 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 10/13] KVM: MMU: mmu_convert_notrap helper
  2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
                   ` (8 preceding siblings ...)
  2008-09-22 19:16 ` [patch 09/13] KVM: MMU: awareness of new kvm_mmu_zap_page behaviour Marcelo Tosatti
@ 2008-09-22 19:16 ` Marcelo Tosatti
  2008-09-22 19:16 ` [patch 11/13] KVM: MMU: out of sync shadow core v2 Marcelo Tosatti
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2008-09-22 19:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti

[-- Attachment #1: kvm-oos-trap-convert --]
[-- Type: text/plain, Size: 823 bytes --]

Convert shadow_notrap_nonpresent ptes to shadow_trap_nonpresent when
unsyncing pages, so that accesses through them trap into the host
instead of being reflected directly to the guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1207,6 +1207,20 @@ static void page_header_update_slot(stru
 	__set_bit(slot, &sp->slot_bitmap);
 }
 
+static void mmu_convert_notrap(struct kvm_mmu_page *sp)
+{
+	int i;
+	u64 *pt = sp->spt;
+
+	if (shadow_trap_nonpresent_pte == shadow_notrap_nonpresent_pte)
+		return;
+
+	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
+		if (pt[i] == shadow_notrap_nonpresent_pte)
+			set_shadow_pte(&pt[i], shadow_trap_nonpresent_pte);
+	}
+}
+
 struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t gva)
 {
 	struct page *page;

-- 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 11/13] KVM: MMU: out of sync shadow core v2
  2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
                   ` (9 preceding siblings ...)
  2008-09-22 19:16 ` [patch 10/13] KVM: MMU: mmu_convert_notrap helper Marcelo Tosatti
@ 2008-09-22 19:16 ` Marcelo Tosatti
  2008-09-22 19:16 ` [patch 12/13] KVM: MMU: speed up mmu_unsync_walk Marcelo Tosatti
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2008-09-22 19:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti

[-- Attachment #1: kvm-oos-core --]
[-- Type: text/plain, Size: 10788 bytes --]

Allow guest pagetables to go out of sync instead of keeping them
write-protected, resynchronizing the shadow pages on demand.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -147,6 +147,10 @@ struct kvm_shadow_walk {
 		     u64 addr, u64 *spte, int level);
 };
 
+struct kvm_unsync_walk {
+	int (*entry) (struct kvm_mmu_page *sp, struct kvm_unsync_walk *walk);
+};
+
 typedef int (*mmu_parent_walk_fn) (struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
 
 static struct kmem_cache *pte_chain_cache;
@@ -942,6 +946,39 @@ static void nonpaging_invlpg(struct kvm_
 {
 }
 
+static int mmu_unsync_walk(struct kvm_mmu_page *parent,
+			   struct kvm_unsync_walk *walker)
+{
+	int i, ret;
+	struct kvm_mmu_page *sp = parent;
+
+	while (parent->unsync_children) {
+		for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
+			u64 ent = sp->spt[i];
+
+			if (is_shadow_present_pte(ent)) {
+				struct kvm_mmu_page *child;
+				child = page_header(ent & PT64_BASE_ADDR_MASK);
+
+				if (child->unsync_children) {
+					sp = child;
+					break;
+				}
+				if (child->unsync) {
+					ret = walker->entry(child, walker);
+					if (ret)
+						return ret;
+				}
+			}
+		}
+		if (i == PT64_ENT_PER_PAGE) {
+			sp->unsync_children = 0;
+			sp = parent;
+		}
+	}
+	return 0;
+}
+
 static struct kvm_mmu_page *kvm_mmu_lookup_page(struct kvm *kvm, gfn_t gfn)
 {
 	unsigned index;
@@ -962,6 +999,59 @@ static struct kvm_mmu_page *kvm_mmu_look
 	return NULL;
 }
 
+static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	WARN_ON(!sp->unsync);
+	sp->unsync = 0;
+	--kvm->stat.mmu_unsync;
+}
+
+static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+
+static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	if (sp->role.glevels != vcpu->arch.mmu.root_level) {
+		kvm_mmu_zap_page(vcpu->kvm, sp);
+		return 1;
+	}
+
+	rmap_write_protect(vcpu->kvm, sp->gfn);
+	if (vcpu->arch.mmu.sync_page(vcpu, sp)) {
+		kvm_mmu_zap_page(vcpu->kvm, sp);
+		return 1;
+	}
+
+	kvm_mmu_flush_tlb(vcpu);
+	kvm_unlink_unsync_page(vcpu->kvm, sp);
+	return 0;
+}
+
+struct sync_walker {
+	struct kvm_vcpu *vcpu;
+	struct kvm_unsync_walk walker;
+};
+
+static int mmu_sync_fn(struct kvm_mmu_page *sp, struct kvm_unsync_walk *walk)
+{
+	struct sync_walker *sync_walk = container_of(walk, struct sync_walker,
+						     walker);
+	struct kvm_vcpu *vcpu = sync_walk->vcpu;
+
+	kvm_sync_page(vcpu, sp);
+	return (need_resched() || spin_needbreak(&vcpu->kvm->mmu_lock));
+}
+
+static void mmu_sync_children(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	struct sync_walker walker = {
+		.walker = { .entry = mmu_sync_fn, },
+		.vcpu = vcpu,
+	};
+
+	while (mmu_unsync_walk(sp, &walker.walker))
+		cond_resched_lock(&vcpu->kvm->mmu_lock);
+}
+
 static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 					     gfn_t gfn,
 					     gva_t gaddr,
@@ -975,7 +1065,7 @@ static struct kvm_mmu_page *kvm_mmu_get_
 	unsigned quadrant;
 	struct hlist_head *bucket;
 	struct kvm_mmu_page *sp;
-	struct hlist_node *node;
+	struct hlist_node *node, *tmp;
 
 	role.word = 0;
 	role.glevels = vcpu->arch.mmu.root_level;
@@ -991,8 +1081,18 @@ static struct kvm_mmu_page *kvm_mmu_get_
 		 gfn, role.word);
 	index = kvm_page_table_hashfn(gfn);
 	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
-	hlist_for_each_entry(sp, node, bucket, hash_link)
-		if (sp->gfn == gfn && sp->role.word == role.word) {
+	hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
+		if (sp->gfn == gfn) {
+			if (sp->unsync)
+				if (kvm_sync_page(vcpu, sp))
+					continue;
+
+			if (sp->role.word != role.word)
+				continue;
+
+			if (sp->unsync_children)
+				set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
+
 			mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 			pgprintk("%s: found\n", __func__);
 			return sp;
@@ -1112,14 +1212,47 @@ static void kvm_mmu_unlink_parents(struc
 	}
 }
 
+struct zap_walker {
+	struct kvm_unsync_walk walker;
+	struct kvm *kvm;
+	int zapped;
+};
+
+static int mmu_zap_fn(struct kvm_mmu_page *sp, struct kvm_unsync_walk *walk)
+{
+	struct zap_walker *zap_walk = container_of(walk, struct zap_walker,
+						     walker);
+	kvm_mmu_zap_page(zap_walk->kvm, sp);
+	zap_walk->zapped = 1;
+	return 0;
+}
+
+static int mmu_zap_unsync_children(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	struct zap_walker walker = {
+		.walker = { .entry = mmu_zap_fn, },
+		.kvm = kvm,
+		.zapped = 0,
+	};
+
+	if (sp->role.level == PT_PAGE_TABLE_LEVEL)
+		return 0;
+	mmu_unsync_walk(sp, &walker.walker);
+	return walker.zapped;
+}
+
 static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
+	int ret;
 	++kvm->stat.mmu_shadow_zapped;
+	ret = mmu_zap_unsync_children(kvm, sp);
 	kvm_mmu_page_unlink_children(kvm, sp);
 	kvm_mmu_unlink_parents(kvm, sp);
 	kvm_flush_remote_tlbs(kvm);
 	if (!sp->role.invalid && !sp->role.metaphysical)
 		unaccount_shadowed(kvm, sp->gfn);
+	if (sp->unsync)
+		kvm_unlink_unsync_page(kvm, sp);
 	if (!sp->root_count) {
 		hlist_del(&sp->hash_link);
 		kvm_mmu_free_page(kvm, sp);
@@ -1129,7 +1262,7 @@ static int kvm_mmu_zap_page(struct kvm *
 		kvm_reload_remote_mmus(kvm);
 	}
 	kvm_mmu_reset_last_pte_updated(kvm);
-	return 0;
+	return ret;
 }
 
 /*
@@ -1235,10 +1368,58 @@ struct page *gva_to_page(struct kvm_vcpu
 	return page;
 }
 
+static int unsync_walk_fn(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	sp->unsync_children = 1;
+	return 1;
+}
+
+static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	unsigned index;
+	struct hlist_head *bucket;
+	struct kvm_mmu_page *s;
+	struct hlist_node *node, *n;
+
+	index = kvm_page_table_hashfn(sp->gfn);
+	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
+	/* don't unsync if pagetable is shadowed with multiple roles */
+	hlist_for_each_entry_safe(s, node, n, bucket, hash_link) {
+		if (s->gfn != sp->gfn || s->role.metaphysical)
+			continue;
+		if (s->role.word != sp->role.word)
+			return 1;
+	}
+	mmu_parent_walk(vcpu, sp, unsync_walk_fn);
+	++vcpu->kvm->stat.mmu_unsync;
+	sp->unsync = 1;
+	mmu_convert_notrap(sp);
+	return 0;
+}
+
+static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
+				  bool can_unsync)
+{
+	struct kvm_mmu_page *shadow;
+
+	shadow = kvm_mmu_lookup_page(vcpu->kvm, gfn);
+	if (shadow) {
+		if (shadow->role.level != PT_PAGE_TABLE_LEVEL)
+			return 1;
+		if (shadow->unsync)
+			return 0;
+		if (can_unsync)
+			return kvm_unsync_page(vcpu, shadow);
+		return 1;
+	}
+	return 0;
+}
+
 static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 		    unsigned pte_access, int user_fault,
 		    int write_fault, int dirty, int largepage,
-		    gfn_t gfn, pfn_t pfn, bool speculative)
+		    gfn_t gfn, pfn_t pfn, bool speculative,
+		    bool can_unsync)
 {
 	u64 spte;
 	int ret = 0;
@@ -1265,7 +1446,6 @@ static int set_spte(struct kvm_vcpu *vcp
 
 	if ((pte_access & ACC_WRITE_MASK)
 	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
-		struct kvm_mmu_page *shadow;
 
 		if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) {
 			ret = 1;
@@ -1275,8 +1455,7 @@ static int set_spte(struct kvm_vcpu *vcp
 
 		spte |= PT_WRITABLE_MASK;
 
-		shadow = kvm_mmu_lookup_page(vcpu->kvm, gfn);
-		if (shadow) {
+		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 			pgprintk("%s: found shadow page for %lx, marking ro\n",
 				 __func__, gfn);
 			ret = 1;
@@ -1294,7 +1473,6 @@ set_pte:
 	return ret;
 }
 
-
 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 			 unsigned pt_access, unsigned pte_access,
 			 int user_fault, int write_fault, int dirty,
@@ -1332,7 +1510,7 @@ static void mmu_set_spte(struct kvm_vcpu
 		}
 	}
 	if (set_spte(vcpu, shadow_pte, pte_access, user_fault, write_fault,
-		      dirty, largepage, gfn, pfn, speculative)) {
+		      dirty, largepage, gfn, pfn, speculative, true)) {
 		if (write_fault)
 			*ptwrite = 1;
 		kvm_x86_ops->tlb_flush(vcpu);
@@ -1552,10 +1730,6 @@ static void mmu_alloc_roots(struct kvm_v
 	vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
 }
 
-static void mmu_sync_children(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
-{
-}
-
 static void mmu_sync_roots(struct kvm_vcpu *vcpu)
 {
 	int i;
Index: kvm/include/asm-x86/kvm_host.h
===================================================================
--- kvm.orig/include/asm-x86/kvm_host.h
+++ kvm/include/asm-x86/kvm_host.h
@@ -195,6 +195,8 @@ struct kvm_mmu_page {
 				    */
 	int multimapped;         /* More than one parent_pte? */
 	int root_count;          /* Currently serving as active root */
+	bool unsync;
+	bool unsync_children;
 	union {
 		u64 *parent_pte;               /* !multimapped */
 		struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */
@@ -371,6 +373,7 @@ struct kvm_vm_stat {
 	u32 mmu_flooded;
 	u32 mmu_recycled;
 	u32 mmu_cache_miss;
+	u32 mmu_unsync;
 	u32 remote_tlb_flush;
 	u32 lpages;
 };
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -101,6 +101,7 @@ struct kvm_stats_debugfs_item debugfs_en
 	{ "mmu_flooded", VM_STAT(mmu_flooded) },
 	{ "mmu_recycled", VM_STAT(mmu_recycled) },
 	{ "mmu_cache_miss", VM_STAT(mmu_cache_miss) },
+	{ "mmu_unsync", VM_STAT(mmu_unsync) },
 	{ "remote_tlb_flush", VM_STAT(remote_tlb_flush) },
 	{ "largepages", VM_STAT(lpages) },
 	{ NULL }
@@ -3120,6 +3121,8 @@ static int vcpu_enter_guest(struct kvm_v
 	if (vcpu->requests) {
 		if (test_and_clear_bit(KVM_REQ_MIGRATE_TIMER, &vcpu->requests))
 			__kvm_migrate_timers(vcpu);
+		if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
+			kvm_mmu_sync_roots(vcpu);
 		if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
 			kvm_x86_ops->tlb_flush(vcpu);
 		if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS,
Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -580,7 +580,7 @@ static int FNAME(sync_page)(struct kvm_v
 		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
 		set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
 			 is_dirty_pte(gpte), 0, gfn,
-			 spte_to_pfn(sp->spt[i]), true);
+			 spte_to_pfn(sp->spt[i]), true, false);
 	}
 
 	return !nr_present;
Index: kvm/include/linux/kvm_host.h
===================================================================
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -35,6 +35,7 @@
 #define KVM_REQ_TRIPLE_FAULT       4
 #define KVM_REQ_PENDING_TIMER      5
 #define KVM_REQ_UNHALT             6
+#define KVM_REQ_MMU_SYNC           7
 
 struct kvm_vcpu;
 extern struct kmem_cache *kvm_vcpu_cache;

-- 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 12/13] KVM: MMU: speed up mmu_unsync_walk
  2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
                   ` (10 preceding siblings ...)
  2008-09-22 19:16 ` [patch 11/13] KVM: MMU: out of sync shadow core v2 Marcelo Tosatti
@ 2008-09-22 19:16 ` Marcelo Tosatti
  2008-09-22 19:16 ` [patch 13/13] KVM: MMU: add "oos_shadow" parameter to disable oos Marcelo Tosatti
  2008-10-14 18:37 ` [patch 00/13] out of sync shadow v3 Alex Williamson
  13 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2008-09-22 19:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti

[-- Attachment #1: kvm-oos-speed-walk --]
[-- Type: text/plain, Size: 4697 bytes --]

Cache the unsynced children information in a per-page bitmap.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -927,6 +927,52 @@ static void mmu_parent_walk(struct kvm_v
 	} while (level > start_level-1);
 }
 
+static void kvm_mmu_update_unsync_bitmap(u64 *spte)
+{
+	unsigned int index;
+	struct kvm_mmu_page *sp = page_header(__pa(spte));
+
+	index = spte - sp->spt;
+	__set_bit(index, sp->unsync_child_bitmap);
+	sp->unsync_children = 1;
+}
+
+static void kvm_mmu_update_parents_unsync(struct kvm_mmu_page *sp)
+{
+	struct kvm_pte_chain *pte_chain;
+	struct hlist_node *node;
+	int i;
+
+	if (!sp->parent_pte)
+		return;
+
+	if (!sp->multimapped) {
+		kvm_mmu_update_unsync_bitmap(sp->parent_pte);
+		return;
+	}
+
+	hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
+		for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
+			if (!pte_chain->parent_ptes[i])
+				break;
+			kvm_mmu_update_unsync_bitmap(pte_chain->parent_ptes[i]);
+		}
+}
+
+static int unsync_walk_fn(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	sp->unsync_children = 1;
+	kvm_mmu_update_parents_unsync(sp);
+	return 1;
+}
+
+static void kvm_mmu_mark_parents_unsync(struct kvm_vcpu *vcpu,
+					struct kvm_mmu_page *sp)
+{
+	mmu_parent_walk(vcpu, sp, unsync_walk_fn);
+	kvm_mmu_update_parents_unsync(sp);
+}
+
 static void nonpaging_prefetch_page(struct kvm_vcpu *vcpu,
 				    struct kvm_mmu_page *sp)
 {
@@ -949,33 +995,57 @@ static void nonpaging_invlpg(struct kvm_
 static int mmu_unsync_walk(struct kvm_mmu_page *parent,
 			   struct kvm_unsync_walk *walker)
 {
-	int i, ret;
-	struct kvm_mmu_page *sp = parent;
+	int ret, level, i;
+	u64 ent;
+	struct kvm_mmu_page *sp, *child;
+	struct walk {
+		struct kvm_mmu_page *sp;
+		int pos;
+	} walk[PT64_ROOT_LEVEL];
 
-	while (parent->unsync_children) {
-		for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
-			u64 ent = sp->spt[i];
+	WARN_ON(parent->role.level == PT_PAGE_TABLE_LEVEL);
+
+	if (!parent->unsync_children)
+		return 0;
+
+	memset(&walk, 0, sizeof(walk));
+	level = parent->role.level;
+	walk[level-1].sp = parent;
+
+	do {
+		sp = walk[level-1].sp;
+		i = find_next_bit(sp->unsync_child_bitmap, 512, walk[level-1].pos);
+		if (i < 512) {
+			walk[level-1].pos = i+1;
+			ent = sp->spt[i];
 
 			if (is_shadow_present_pte(ent)) {
-				struct kvm_mmu_page *child;
 				child = page_header(ent & PT64_BASE_ADDR_MASK);
 
 				if (child->unsync_children) {
-					sp = child;
-					break;
+					--level;
+					walk[level-1].sp = child;
+					walk[level-1].pos = 0;
+					continue;
 				}
 				if (child->unsync) {
 					ret = walker->entry(child, walker);
+					__clear_bit(i, sp->unsync_child_bitmap);
 					if (ret)
 						return ret;
 				}
 			}
+			__clear_bit(i, sp->unsync_child_bitmap);
+		} else {
+			++level;
+			if (find_first_bit(sp->unsync_child_bitmap, 512) == 512) {
+				sp->unsync_children = 0;
+				if (level-1 < PT64_ROOT_LEVEL)
+					walk[level-1].pos = 0;
+			}
 		}
-		if (i == PT64_ENT_PER_PAGE) {
-			sp->unsync_children = 0;
-			sp = parent;
-		}
-	}
+	} while (level <= parent->role.level);
+
 	return 0;
 }
 
@@ -1090,10 +1160,11 @@ static struct kvm_mmu_page *kvm_mmu_get_
 			if (sp->role.word != role.word)
 				continue;
 
-			if (sp->unsync_children)
-				set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
-
 			mmu_page_add_parent_pte(vcpu, sp, parent_pte);
+			if (sp->unsync_children) {
+				set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
+				kvm_mmu_mark_parents_unsync(vcpu, sp);
+			}
 			pgprintk("%s: found\n", __func__);
 			return sp;
 		}
@@ -1368,12 +1439,6 @@ struct page *gva_to_page(struct kvm_vcpu
 	return page;
 }
 
-static int unsync_walk_fn(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
-{
-	sp->unsync_children = 1;
-	return 1;
-}
-
 static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
 	unsigned index;
@@ -1390,7 +1455,7 @@ static int kvm_unsync_page(struct kvm_vc
 		if (s->role.word != sp->role.word)
 			return 1;
 	}
-	mmu_parent_walk(vcpu, sp, unsync_walk_fn);
+	kvm_mmu_mark_parents_unsync(vcpu, sp);
 	++vcpu->kvm->stat.mmu_unsync;
 	sp->unsync = 1;
 	mmu_convert_notrap(sp);
Index: kvm/include/asm-x86/kvm_host.h
===================================================================
--- kvm.orig/include/asm-x86/kvm_host.h
+++ kvm/include/asm-x86/kvm_host.h
@@ -201,6 +201,7 @@ struct kvm_mmu_page {
 		u64 *parent_pte;               /* !multimapped */
 		struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */
 	};
+	DECLARE_BITMAP(unsync_child_bitmap, 512);
 };
 
 struct kvm_pv_mmu_op_buffer {

-- 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 13/13] KVM: MMU: add "oos_shadow" parameter to disable oos
  2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
                   ` (11 preceding siblings ...)
  2008-09-22 19:16 ` [patch 12/13] KVM: MMU: speed up mmu_unsync_walk Marcelo Tosatti
@ 2008-09-22 19:16 ` Marcelo Tosatti
  2008-10-14 18:37 ` [patch 00/13] out of sync shadow v3 Alex Williamson
  13 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2008-09-22 19:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti

[-- Attachment #1: kvm-oos-disable --]
[-- Type: text/plain, Size: 675 bytes --]

Subject says it all.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -70,6 +70,9 @@ static int dbg = 0;
 module_param(dbg, bool, 0644);
 #endif
 
+static int oos_shadow = 1;
+module_param(oos_shadow, bool, 0644);
+
 #ifndef MMU_DEBUG
 #define ASSERT(x) do { } while (0)
 #else
@@ -1473,7 +1476,7 @@ static int mmu_need_write_protect(struct
 			return 1;
 		if (shadow->unsync)
 			return 0;
-		if (can_unsync)
+		if (can_unsync && oos_shadow)
 			return kvm_unsync_page(vcpu, shadow);
 		return 1;
 	}

-- 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 00/13] out of sync shadow v3
  2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
                   ` (12 preceding siblings ...)
  2008-09-22 19:16 ` [patch 13/13] KVM: MMU: add "oos_shadow" parameter to disable oos Marcelo Tosatti
@ 2008-10-14 18:37 ` Alex Williamson
  2008-10-14 22:08   ` Alex Williamson
  13 siblings, 1 reply; 18+ messages in thread
From: Alex Williamson @ 2008-10-14 18:37 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, kvm

On Mon, 2008-09-22 at 16:16 -0300, Marcelo Tosatti wrote:
> Addressing v2 comments, and with an option to disable oos.
> 
> Windows 2003 DDK build (4-way guest) results:
> mainline  oos
> 04:37     03:42    ( ~= -20% )
> 
> kernel builds on 4-way guest improve by 10%.

I'm seeing problems with this using a Debian Lenny guest, it gets glibc
realloc errors in the init scripts/initramfs.  Using -no-kvm or setting
oos_shadow=0 works around the problem.  I also confirmed it was
introduced in commit 641fb03992b20aa640781a245f6b7136f0b845e4.  It
should be reproducible using an install from this media:

http://cdimage.debian.org/cdimage/lenny_di_beta2/i386/iso-cd/debian-LennyBeta2-i386-businesscard.iso

Host system is running 2.6.27 x86_64 on Intel Xeon E5450.  After
install, the guest fails to get to a login prompt w/o the above
workarounds.  Running the guest as:

/usr/local/kvm/bin/qemu-system-x86_64 -L /usr/local/kvm/share/qemu
-drive file=/dev/VM/Debian,if=ide,index=0 -m 512 -net nic -net user
-vnc :1

Thanks,
Alex

-- 
Alex Williamson                             HP Open Source & Linux Org.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 00/13] out of sync shadow v3
  2008-10-14 18:37 ` [patch 00/13] out of sync shadow v3 Alex Williamson
@ 2008-10-14 22:08   ` Alex Williamson
  2008-10-15  2:37     ` Marcelo Tosatti
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Williamson @ 2008-10-14 22:08 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, kvm

On Tue, 2008-10-14 at 12:37 -0600, Alex Williamson wrote:
> I'm seeing problems with this using a Debian Lenny guest, it gets glibc
> realloc errors in the init scripts/initramfs.  Using -no-kvm or setting
> oos_shadow=0 works around the problem.  I also confirmed it was
> introduced in commit 641fb03992b20aa640781a245f6b7136f0b845e4.  It
> should be reproducible using an install from this media:
> 
> http://cdimage.debian.org/cdimage/lenny_di_beta2/i386/iso-cd/debian-LennyBeta2-i386-businesscard.iso
> 
> Host system is running 2.6.27 x86_64 on Intel Xeon E5450.  After
> install, the guest fails to get to a login prompt w/o the above
> workarounds.  Running the guest as:
> 
> /usr/local/kvm/bin/qemu-system-x86_64 -L /usr/local/kvm/share/qemu
> -drive file=/dev/VM/Debian,if=ide,index=0 -m 512 -net nic -net user
> -vnc :1

I forgot to mention that this only happens with the Debian 2.6.26
kernel, if you end up with the 2.6.24 kernel after install, try an
'apt-get update && apt-get dist-upgrade' and reboot to get the newer
kernel.

Alex

-- 
Alex Williamson                             HP Open Source & Linux Org.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 00/13] out of sync shadow v3
  2008-10-14 22:08   ` Alex Williamson
@ 2008-10-15  2:37     ` Marcelo Tosatti
  2008-10-15  3:56       ` Alex Williamson
  0 siblings, 1 reply; 18+ messages in thread
From: Marcelo Tosatti @ 2008-10-15  2:37 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Avi Kivity, kvm

On Tue, Oct 14, 2008 at 04:08:10PM -0600, Alex Williamson wrote:
> On Tue, 2008-10-14 at 12:37 -0600, Alex Williamson wrote:
> > I'm seeing problems with this using a Debian Lenny guest, it gets glibc
> > realloc errors in the init scripts/initramfs.  Using -no-kvm or setting
> > oos_shadow=0 works around the problem.  I also confirmed it was
> > introduced in commit 641fb03992b20aa640781a245f6b7136f0b845e4.  It
> > should be reproducible using an install from this media:
> > 
> > http://cdimage.debian.org/cdimage/lenny_di_beta2/i386/iso-cd/debian-LennyBeta2-i386-businesscard.iso
> > 
> > Host system is running 2.6.27 x86_64 on Intel Xeon E5450.  After
> > install, the guest fails to get to a login prompt w/o the above
> > workarounds.  Running the guest as:
> > 
> > /usr/local/kvm/bin/qemu-system-x86_64 -L /usr/local/kvm/share/qemu
> > -drive file=/dev/VM/Debian,if=ide,index=0 -m 512 -net nic -net user
> > -vnc :1
> 
> I forgot to mention that this only happens with the Debian 2.6.26
> kernel, if you end up with the 2.6.24 kernel after install, try an
> 'apt-get update && apt-get dist-upgrade' and reboot to get the newer
> kernel.

Hi Alex,

Can you try
http://article.gmane.org/gmane.comp.emulators.kvm.devel/23044 ?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 00/13] out of sync shadow v3
  2008-10-15  2:37     ` Marcelo Tosatti
@ 2008-10-15  3:56       ` Alex Williamson
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Williamson @ 2008-10-15  3:56 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, kvm

On Wed, 2008-10-15 at 00:37 -0200, Marcelo Tosatti wrote: 
> > I forgot to mention that this only happens with the Debian 2.6.26
> > kernel, if you end up with the 2.6.24 kernel after install, try an
> > 'apt-get update && apt-get dist-upgrade' and reboot to get the newer
> > kernel.
> 
> Hi Alex,
> 
> Can you try
> http://article.gmane.org/gmane.comp.emulators.kvm.devel/23044 

Hi Marcelo,

Much better, that fixes it.  Thanks for the pointer,

Alex

-- 
Alex Williamson                             HP Open Source & Linux Org.


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2008-10-15  3:56 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-22 19:16 [patch 00/13] out of sync shadow v3 Marcelo Tosatti
2008-09-22 19:16 ` [patch 01/13] KVM: MMU: flush remote TLBs on large->normal entry overwrite Marcelo Tosatti
2008-09-22 19:16 ` [patch 02/13] KVM: MMU: split mmu_set_spte Marcelo Tosatti
2008-09-22 19:16 ` [patch 03/13] KVM: MMU: move local TLB flush to mmu_set_spte Marcelo Tosatti
2008-09-22 19:16 ` [patch 04/13] KVM: MMU: do not write-protect large mappings Marcelo Tosatti
2008-09-22 19:16 ` [patch 05/13] KVM: MMU: mode specific sync_page Marcelo Tosatti
2008-09-22 19:16 ` [patch 06/13] KVM: MMU: sync roots on mmu reload Marcelo Tosatti
2008-09-22 19:16 ` [patch 07/13] KVM: x86: trap invlpg Marcelo Tosatti
2008-09-22 19:16 ` [patch 08/13] KVM: MMU: mmu_parent_walk Marcelo Tosatti
2008-09-22 19:16 ` [patch 09/13] KVM: MMU: awareness of new kvm_mmu_zap_page behaviour Marcelo Tosatti
2008-09-22 19:16 ` [patch 10/13] KVM: MMU: mmu_convert_notrap helper Marcelo Tosatti
2008-09-22 19:16 ` [patch 11/13] KVM: MMU: out of sync shadow core v2 Marcelo Tosatti
2008-09-22 19:16 ` [patch 12/13] KVM: MMU: speed up mmu_unsync_walk Marcelo Tosatti
2008-09-22 19:16 ` [patch 13/13] KVM: MMU: add "oos_shadow" parameter to disable oos Marcelo Tosatti
2008-10-14 18:37 ` [patch 00/13] out of sync shadow v3 Alex Williamson
2008-10-14 22:08   ` Alex Williamson
2008-10-15  2:37     ` Marcelo Tosatti
2008-10-15  3:56       ` Alex Williamson
