public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [patch 00/13] RFC: out of sync shadow
@ 2008-09-06 18:48 Marcelo Tosatti
  2008-09-06 18:48 ` [patch 01/13] x86/mm: get_user_pages_fast_atomic Marcelo Tosatti
                   ` (14 more replies)
  0 siblings, 15 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-06 18:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

Keep shadow pages temporarily out of sync, allowing more efficient guest
PTE updates in comparison to trap-emulate + unprotect heuristics. Stolen
from Xen :)

This version only allows leaf pagetables to go out of sync, for
simplicity, but can be enhanced.

VMX "bypass_guest_pf" feature on prefetch_page breaks it (since new
PTE writes need no TLB flush, I assume). Not sure if it's worthwhile to
convert notrap_nonpresent -> trap_nonpresent on unshadow or just go
for unconditional nonpaging_prefetch_page.

* Kernel builds on 4-way 64-bit guest improve 10% (+ 3.7% for
get_user_pages_fast). 

* lmbench's "lat_proc fork" microbenchmark latency is 40% lower (a
shadow worst-case scenario test).

* The RHEL3 highpte kscand hangs go from 5+ seconds to < 1 second.

* Windows 2003 Server, 32-bit PAE, DDK build (build -cPzM 3):

Windows 2003 Checked 64 Bit Build Environment, 256M RAM
1-vcpu:
vanilla + gup_fast:         oos
0:04:37.375                 0:03:28.047     (- 25%)

2-vcpus:
vanilla + gup_fast          oos
0:02:32.000                 0:01:56.031     (- 23%)


Windows 2003 Checked Build Environment, 1GB RAM
2-vcpus:
vanilla + fast_gup         oos
0:02:26.078                0:01:50.110      (- 24%)

4-vcpus:
vanilla + fast_gup         oos
0:01:59.266                0:01:29.625      (- 25%)

And I think other optimizations are possible now, for example the guest
can be responsible for remote TLB flushing on kvm_mmu_pte_write().

Please review.


-- 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 01/13] x86/mm: get_user_pages_fast_atomic
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
@ 2008-09-06 18:48 ` Marcelo Tosatti
  2008-09-07  8:42   ` Avi Kivity
  2008-09-06 18:48 ` [patch 02/13] KVM: MMU: switch to get_user_pages_fast Marcelo Tosatti
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-06 18:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

[-- Attachment #1: gupfast-atomic --]
[-- Type: text/plain, Size: 3035 bytes --]

From: Nick Piggin <nickpiggin@yahoo.com.au>

Provide a lockless pagetable walk function without fallback to mmap_sem
on error.

Index: kvm/arch/x86/mm/gup.c
===================================================================
--- kvm.orig/arch/x86/mm/gup.c
+++ kvm/arch/x86/mm/gup.c
@@ -8,6 +8,7 @@
 #include <linux/mm.h>
 #include <linux/vmstat.h>
 #include <linux/highmem.h>
+#include <linux/module.h>
 
 #include <asm/pgtable.h>
 
@@ -219,7 +220,7 @@ static int gup_pud_range(pgd_t pgd, unsi
 	return 1;
 }
 
-int get_user_pages_fast(unsigned long start, int nr_pages, int write,
+int get_user_pages_fast_atomic(unsigned long start, int nr_pages, int write,
 			struct page **pages)
 {
 	struct mm_struct *mm = current->mm;
@@ -234,7 +235,7 @@ int get_user_pages_fast(unsigned long st
 	end = start + len;
 	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
 					start, len)))
-		goto slow_irqon;
+		return -EFAULT;
 
 	/*
 	 * XXX: batch / limit 'nr', to avoid large irq off latency
@@ -261,38 +262,49 @@ int get_user_pages_fast(unsigned long st
 
 		next = pgd_addr_end(addr, end);
 		if (pgd_none(pgd))
-			goto slow;
+			goto out_short;
 		if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
-			goto slow;
+			goto out_short;
 	} while (pgdp++, addr = next, addr != end);
-	local_irq_enable();
 
 	VM_BUG_ON(nr != (end - start) >> PAGE_SHIFT);
+out_short:
+	local_irq_enable();
 	return nr;
+}
 
-	{
-		int ret;
-
-slow:
-		local_irq_enable();
-slow_irqon:
-		/* Try to get the remaining pages with get_user_pages */
-		start += nr << PAGE_SHIFT;
-		pages += nr;
-
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(current, mm, start,
-			(end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
-		up_read(&mm->mmap_sem);
-
-		/* Have to be a bit careful with return values */
-		if (nr > 0) {
-			if (ret < 0)
-				ret = nr;
-			else
-				ret += nr;
-		}
+int get_user_pages_fast(unsigned long start, int nr_pages, int write,
+			struct page **pages)
+{
+	struct mm_struct *mm = current->mm;
+	int nr = 0;
+	int ret;
 
-		return ret;
+	nr = get_user_pages_fast_atomic(start, nr_pages, write, pages);
+	if (likely(nr == nr_pages))
+		return nr;
+
+	if (unlikely(nr < 0))
+		return nr;
+
+	start += nr << PAGE_SHIFT;
+	pages += nr;
+	nr_pages -= nr;
+
+	down_read(&mm->mmap_sem);
+	ret = get_user_pages(current, mm, start,
+		nr_pages, write, 0, pages, NULL);
+	up_read(&mm->mmap_sem);
+
+	/* Have to be a bit careful with return values */
+	if (nr > 0) {
+		if (ret < 0)
+			ret = nr;
+		else
+			ret += nr;
 	}
+
+	return ret;
 }
+
+EXPORT_SYMBOL_GPL(get_user_pages_fast_atomic);
Index: kvm/include/asm-x86/uaccess.h
===================================================================
--- kvm.orig/include/asm-x86/uaccess.h
+++ kvm/include/asm-x86/uaccess.h
@@ -443,6 +443,9 @@ extern struct movsl_mask {
 
 #define ARCH_HAS_NOCACHE_UACCESS 1
 
+int get_user_pages_fast_atomic(unsigned long start, int nr_pages, int write,
+			struct page **pages);
+
 #ifdef CONFIG_X86_32
 # include "uaccess_32.h"
 #else

-- 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 02/13] KVM: MMU: switch to get_user_pages_fast
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
  2008-09-06 18:48 ` [patch 01/13] x86/mm: get_user_pages_fast_atomic Marcelo Tosatti
@ 2008-09-06 18:48 ` Marcelo Tosatti
  2008-09-07  8:45   ` Avi Kivity
  2008-09-06 18:48 ` [patch 03/13] KVM: MMU: gfn_to_page_atomic Marcelo Tosatti
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-06 18:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Andrea Arcangeli

[-- Attachment #1: kvm-use-fast-gup --]
[-- Type: text/plain, Size: 7604 bytes --]

Avoid mmap_sem / pt lock acquisition if the pagetables are present. The
improvement for hugepage backed guests is more significant, since pte
walk + page grab for such mappings is serialized by mm->page_table_lock.

CC: Andrea Arcangeli <andrea@qumranet.com>

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -405,16 +405,19 @@ static int host_largepage_backed(struct 
 {
 	struct vm_area_struct *vma;
 	unsigned long addr;
+	int ret = 0;
 
 	addr = gfn_to_hva(kvm, gfn);
 	if (kvm_is_error_hva(addr))
-		return 0;
+		return ret;
 
+	down_read(&current->mm->mmap_sem);
 	vma = find_vma(current->mm, addr);
 	if (vma && is_vm_hugetlb_page(vma))
-		return 1;
+		ret = 1;
+	up_read(&current->mm->mmap_sem);
 
-	return 0;
+	return ret;
 }
 
 static int is_largepage_backed(struct kvm_vcpu *vcpu, gfn_t large_gfn)
@@ -1136,9 +1139,7 @@ struct page *gva_to_page(struct kvm_vcpu
 	if (gpa == UNMAPPED_GVA)
 		return NULL;
 
-	down_read(&current->mm->mmap_sem);
 	page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
-	up_read(&current->mm->mmap_sem);
 
 	return page;
 }
@@ -1326,16 +1327,14 @@ static int nonpaging_map(struct kvm_vcpu
 	pfn_t pfn;
 	unsigned long mmu_seq;
 
-	down_read(&current->mm->mmap_sem);
 	if (is_largepage_backed(vcpu, gfn & ~(KVM_PAGES_PER_HPAGE-1))) {
 		gfn &= ~(KVM_PAGES_PER_HPAGE-1);
 		largepage = 1;
 	}
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
-	/* implicit mb(), we'll read before PT lock is unlocked */
+	smp_rmb();
 	pfn = gfn_to_pfn(vcpu->kvm, gfn);
-	up_read(&current->mm->mmap_sem);
 
 	/* mmio */
 	if (is_error_pfn(pfn)) {
@@ -1484,15 +1483,13 @@ static int tdp_page_fault(struct kvm_vcp
 	if (r)
 		return r;
 
-	down_read(&current->mm->mmap_sem);
 	if (is_largepage_backed(vcpu, gfn & ~(KVM_PAGES_PER_HPAGE-1))) {
 		gfn &= ~(KVM_PAGES_PER_HPAGE-1);
 		largepage = 1;
 	}
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
-	/* implicit mb(), we'll read before PT lock is unlocked */
+	smp_rmb();
 	pfn = gfn_to_pfn(vcpu->kvm, gfn);
-	up_read(&current->mm->mmap_sem);
 	if (is_error_pfn(pfn)) {
 		kvm_release_pfn_clean(pfn);
 		return 1;
@@ -1805,15 +1802,13 @@ static void mmu_guess_page_from_pte_writ
 		return;
 	gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
 
-	down_read(&current->mm->mmap_sem);
 	if (is_large_pte(gpte) && is_largepage_backed(vcpu, gfn)) {
 		gfn &= ~(KVM_PAGES_PER_HPAGE-1);
 		vcpu->arch.update_pte.largepage = 1;
 	}
 	vcpu->arch.update_pte.mmu_seq = vcpu->kvm->mmu_notifier_seq;
-	/* implicit mb(), we'll read before PT lock is unlocked */
+	smp_rmb();
 	pfn = gfn_to_pfn(vcpu->kvm, gfn);
-	up_read(&current->mm->mmap_sem);
 
 	if (is_error_pfn(pfn)) {
 		kvm_release_pfn_clean(pfn);
Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -102,14 +102,10 @@ static bool FNAME(cmpxchg_gpte)(struct k
 	pt_element_t *table;
 	struct page *page;
 
-	down_read(&current->mm->mmap_sem);
 	page = gfn_to_page(kvm, table_gfn);
-	up_read(&current->mm->mmap_sem);
 
 	table = kmap_atomic(page, KM_USER0);
-
 	ret = CMPXCHG(&table[index], orig_pte, new_pte);
-
 	kunmap_atomic(table, KM_USER0);
 
 	kvm_release_page_dirty(page);
@@ -418,7 +414,6 @@ static int FNAME(page_fault)(struct kvm_
 		return 0;
 	}
 
-	down_read(&current->mm->mmap_sem);
 	if (walker.level == PT_DIRECTORY_LEVEL) {
 		gfn_t large_gfn;
 		large_gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE-1);
@@ -428,9 +423,8 @@ static int FNAME(page_fault)(struct kvm_
 		}
 	}
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
-	/* implicit mb(), we'll read before PT lock is unlocked */
+	smp_rmb();
 	pfn = gfn_to_pfn(vcpu->kvm, walker.gfn);
-	up_read(&current->mm->mmap_sem);
 
 	/* mmio */
 	if (is_error_pfn(pfn)) {
Index: kvm/arch/x86/kvm/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -2010,9 +2010,7 @@ static int alloc_apic_access_page(struct
 	if (r)
 		goto out;
 
-	down_read(&current->mm->mmap_sem);
 	kvm->arch.apic_access_page = gfn_to_page(kvm, 0xfee00);
-	up_read(&current->mm->mmap_sem);
 out:
 	up_write(&kvm->slots_lock);
 	return r;
@@ -2034,10 +2032,8 @@ static int alloc_identity_pagetable(stru
 	if (r)
 		goto out;
 
-	down_read(&current->mm->mmap_sem);
 	kvm->arch.ept_identity_pagetable = gfn_to_page(kvm,
 			VMX_EPT_IDENTITY_PAGETABLE_ADDR >> PAGE_SHIFT);
-	up_read(&current->mm->mmap_sem);
 out:
 	up_write(&kvm->slots_lock);
 	return r;
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -932,10 +932,8 @@ int kvm_set_msr_common(struct kvm_vcpu *
 		/* ...but clean it before doing the actual write */
 		vcpu->arch.time_offset = data & ~(PAGE_MASK | 1);
 
-		down_read(&current->mm->mmap_sem);
 		vcpu->arch.time_page =
 				gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
-		up_read(&current->mm->mmap_sem);
 
 		if (is_error_page(vcpu->arch.time_page)) {
 			kvm_release_page_clean(vcpu->arch.time_page);
@@ -2305,9 +2303,7 @@ static int emulator_cmpxchg_emulated(uns
 
 		val = *(u64 *)new;
 
-		down_read(&current->mm->mmap_sem);
 		page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
-		up_read(&current->mm->mmap_sem);
 
 		kaddr = kmap_atomic(page, KM_USER0);
 		set_64bit((u64 *)(kaddr + offset_in_page(gpa)), val);
@@ -3077,9 +3073,7 @@ static void vapic_enter(struct kvm_vcpu 
 	if (!apic || !apic->vapic_addr)
 		return;
 
-	down_read(&current->mm->mmap_sem);
 	page = gfn_to_page(vcpu->kvm, apic->vapic_addr >> PAGE_SHIFT);
-	up_read(&current->mm->mmap_sem);
 
 	vcpu->arch.apic->vapic_page = page;
 }
Index: kvm/virt/kvm/kvm_main.c
===================================================================
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -716,9 +716,6 @@ unsigned long gfn_to_hva(struct kvm *kvm
 }
 EXPORT_SYMBOL_GPL(gfn_to_hva);
 
-/*
- * Requires current->mm->mmap_sem to be held
- */
 pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
 {
 	struct page *page[1];
@@ -734,20 +731,23 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t 
 		return page_to_pfn(bad_page);
 	}
 
-	npages = get_user_pages(current, current->mm, addr, 1, 1, 0, page,
-				NULL);
+	npages = get_user_pages_fast(addr, 1, 1, page);
 
 	if (unlikely(npages != 1)) {
 		struct vm_area_struct *vma;
 
+		down_read(&current->mm->mmap_sem);
 		vma = find_vma(current->mm, addr);
+
 		if (vma == NULL || addr < vma->vm_start ||
 		    !(vma->vm_flags & VM_PFNMAP)) {
+			up_read(&current->mm->mmap_sem);
 			get_page(bad_page);
 			return page_to_pfn(bad_page);
 		}
 
 		pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+		up_read(&current->mm->mmap_sem);
 		BUG_ON(!is_mmio_pfn(pfn));
 	} else
 		pfn = page_to_pfn(page[0]);
@@ -1387,17 +1387,22 @@ out:
 
 static int kvm_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
+	struct page *page[1];
+	unsigned long addr;
+	int npages;
+	gfn_t gfn = vmf->pgoff;
 	struct kvm *kvm = vma->vm_file->private_data;
-	struct page *page;
 
-	if (!kvm_is_visible_gfn(kvm, vmf->pgoff))
+	addr = gfn_to_hva(kvm, gfn);
+	if (kvm_is_error_hva(addr))
 		return VM_FAULT_SIGBUS;
-	page = gfn_to_page(kvm, vmf->pgoff);
-	if (is_error_page(page)) {
-		kvm_release_page_clean(page);
+
+	npages = get_user_pages(current, current->mm, addr, 1, 1, 0, page,
+				NULL);
+	if (unlikely(npages != 1))
 		return VM_FAULT_SIGBUS;
-	}
-	vmf->page = page;
+
+	vmf->page = page[0];
 	return 0;
 }
 

-- 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 03/13] KVM: MMU: gfn_to_page_atomic
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
  2008-09-06 18:48 ` [patch 01/13] x86/mm: get_user_pages_fast_atomic Marcelo Tosatti
  2008-09-06 18:48 ` [patch 02/13] KVM: MMU: switch to get_user_pages_fast Marcelo Tosatti
@ 2008-09-06 18:48 ` Marcelo Tosatti
  2008-09-06 18:48 ` [patch 04/13] KVM: MMU: switch prefetch_page to gfn_to_page_atomic Marcelo Tosatti
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-06 18:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

[-- Attachment #1: kvm-gfn-to-page-atomic --]
[-- Type: text/plain, Size: 1451 bytes --]

KVM wrapper around get_user_pages_fast_atomic.


Index: kvm/include/linux/kvm_host.h
===================================================================
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -177,6 +177,7 @@ int kvm_arch_set_memory_region(struct kv
 void kvm_arch_flush_shadow(struct kvm *kvm);
 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn);
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+struct page *gfn_to_page_atomic(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 void kvm_release_page_clean(struct page *page);
 void kvm_release_page_dirty(struct page *page);
Index: kvm/virt/kvm/kvm_main.c
===================================================================
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -773,6 +773,30 @@ struct page *gfn_to_page(struct kvm *kvm
 
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
+struct page *gfn_to_page_atomic(struct kvm *kvm, gfn_t gfn)
+{
+	struct page *page[1];
+	unsigned long addr;
+	int npages;
+
+	addr = gfn_to_hva(kvm, gfn);
+	if (kvm_is_error_hva(addr)) {
+		get_page(bad_page);
+		return bad_page;
+	}
+
+	npages = get_user_pages_fast_atomic(addr, 1, 1, page);
+
+	if (unlikely(npages != 1)) {
+		get_page(bad_page);
+		return bad_page;
+	}
+
+	return page[0];
+}
+
+EXPORT_SYMBOL_GPL(gfn_to_page_atomic);
+
 void kvm_release_page_clean(struct page *page)
 {
 	kvm_release_pfn_clean(page_to_pfn(page));

-- 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 04/13] KVM: MMU: switch prefetch_page to gfn_to_page_atomic
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
                   ` (2 preceding siblings ...)
  2008-09-06 18:48 ` [patch 03/13] KVM: MMU: gfn_to_page_atomic Marcelo Tosatti
@ 2008-09-06 18:48 ` Marcelo Tosatti
  2008-09-06 18:48 ` [patch 05/13] KVM: MMU: do not write-protect large mappings Marcelo Tosatti
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-06 18:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

[-- Attachment #1: prefetch-page-use-gfn-to-page-atomic --]
[-- Type: text/plain, Size: 1796 bytes --]

Close to 50% improvement for prefetch_page.

Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -477,9 +477,10 @@ static gpa_t FNAME(gva_to_gpa)(struct kv
 static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
 				 struct kvm_mmu_page *sp)
 {
-	int i, j, offset, r;
-	pt_element_t pt[256 / sizeof(pt_element_t)];
-	gpa_t pte_gpa;
+	int i;
+	struct page *page;
+	pt_element_t *pt;
+	void *gpte_kaddr;
 
 	if (sp->role.metaphysical
 	    || (PTTYPE == 32 && sp->role.level > PT_PAGE_TABLE_LEVEL)) {
@@ -487,21 +488,26 @@ static void FNAME(prefetch_page)(struct 
 		return;
 	}
 
-	pte_gpa = gfn_to_gpa(sp->gfn);
-	if (PTTYPE == 32) {
-		offset = sp->role.quadrant << PT64_LEVEL_BITS;
-		pte_gpa += offset * sizeof(pt_element_t);
+	page = gfn_to_page_atomic(vcpu->kvm, sp->gfn);
+	if (is_error_page(page)) {
+		nonpaging_prefetch_page(vcpu, sp);
+		kvm_release_page_clean(page);
+		return;
 	}
+	gpte_kaddr = pt = kmap_atomic(page, KM_USER0);
+
+	if (PTTYPE == 32)
+		pt += sp->role.quadrant << PT64_LEVEL_BITS;
 
-	for (i = 0; i < PT64_ENT_PER_PAGE; i += ARRAY_SIZE(pt)) {
-		r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, pt, sizeof pt);
-		pte_gpa += ARRAY_SIZE(pt) * sizeof(pt_element_t);
-		for (j = 0; j < ARRAY_SIZE(pt); ++j)
-			if (r || is_present_pte(pt[j]))
-				sp->spt[i+j] = shadow_trap_nonpresent_pte;
-			else
-				sp->spt[i+j] = shadow_notrap_nonpresent_pte;
+	for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
+		if (is_present_pte(*pt))
+			sp->spt[i] = shadow_trap_nonpresent_pte;
+		else
+			sp->spt[i] = shadow_notrap_nonpresent_pte;
+		pt++;
 	}
+	kunmap_atomic(gpte_kaddr, KM_USER0);
+	kvm_release_page_clean(page);
 }
 
 #undef pt_element_t

-- 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 05/13] KVM: MMU: do not write-protect large mappings
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
                   ` (3 preceding siblings ...)
  2008-09-06 18:48 ` [patch 04/13] KVM: MMU: switch prefetch_page to gfn_to_page_atomic Marcelo Tosatti
@ 2008-09-06 18:48 ` Marcelo Tosatti
  2008-09-07  9:04   ` Avi Kivity
  2008-09-06 18:48 ` [patch 06/13] KVM: MMU: global page keeping Marcelo Tosatti
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-06 18:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

[-- Attachment #1: no-wprotected-lpage --]
[-- Type: text/plain, Size: 1787 bytes --]

There is not much point in write protecting large mappings. This
can only happen when a page is shadowed during the window between
is_largepage_backed and mmu_lock acquisition. Zap the entry instead, so
the next pagefault will find a shadowed page via is_largepage_backed and
fall back to 4k translations.

Simplifies out of sync shadow.

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1210,8 +1210,7 @@ static void mmu_set_spte(struct kvm_vcpu
 		spte |= PT_WRITABLE_MASK;
 
 		shadow = kvm_mmu_lookup_page(vcpu->kvm, gfn);
-		if (shadow ||
-		   (largepage && has_wrprotected_page(vcpu->kvm, gfn))) {
+		if (shadow) {
 			pgprintk("%s: found shadow page for %lx, marking ro\n",
 				 __func__, gfn);
 			pte_access &= ~ACC_WRITE_MASK;
@@ -1222,6 +1221,14 @@ static void mmu_set_spte(struct kvm_vcpu
 			if (write_fault)
 				*ptwrite = 1;
 		}
+		/*
+ 		 * Do not create write protected large translations.
+ 		 */
+		if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) {
+			spte = shadow_trap_nonpresent_pte;
+			was_writeble = 0;
+			*ptwrite = 0;
+		}
 	}
 
 	if (pte_access & ACC_WRITE_MASK)
Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -307,11 +307,10 @@ static int FNAME(shadow_walk_entry)(stru
 		return 1;
 	}
 
-	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
+	if (is_shadow_present_pte(*sptep))
 		return 0;
 
-	if (is_large_pte(*sptep))
-		rmap_remove(vcpu->kvm, sptep);
+	WARN_ON (is_large_pte(*sptep));
 
 	if (level == PT_DIRECTORY_LEVEL && gw->level == PT_DIRECTORY_LEVEL) {
 		metaphysical = 1;

-- 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 06/13] KVM: MMU: global page keeping
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
                   ` (4 preceding siblings ...)
  2008-09-06 18:48 ` [patch 05/13] KVM: MMU: do not write-protect large mappings Marcelo Tosatti
@ 2008-09-06 18:48 ` Marcelo Tosatti
  2008-09-07  9:16   ` Avi Kivity
  2008-09-06 18:48 ` [patch 07/13] KVM: MMU: mode specific sync_page Marcelo Tosatti
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-06 18:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

[-- Attachment #1: kvm-oos-global-keeping --]
[-- Type: text/plain, Size: 3848 bytes --]

Account pages that only have global entries. Such pages need to be
synced on cr4 writes, but not cr3 writes.

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -778,6 +778,7 @@ static struct kvm_mmu_page *kvm_mmu_allo
 	ASSERT(is_empty_shadow_page(sp->spt));
 	sp->slot_bitmap = 0;
 	sp->multimapped = 0;
+	sp->flags = (1 << KVM_PG_global);
 	sp->parent_pte = parent_pte;
 	--vcpu->kvm->arch.n_free_mmu_pages;
 	return sp;
@@ -1147,18 +1148,22 @@ struct page *gva_to_page(struct kvm_vcpu
 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 			 unsigned pt_access, unsigned pte_access,
 			 int user_fault, int write_fault, int dirty,
-			 int *ptwrite, int largepage, gfn_t gfn,
-			 pfn_t pfn, bool speculative)
+			 int global, int *ptwrite, int largepage,
+			 gfn_t gfn, pfn_t pfn, bool speculative)
 {
 	u64 spte;
 	int was_rmapped = 0;
 	int was_writeble = is_writeble_pte(*shadow_pte);
+	struct kvm_mmu_page *sp = page_header(__pa(shadow_pte));
 
 	pgprintk("%s: spte %llx access %x write_fault %d"
 		 " user_fault %d gfn %lx\n",
 		 __func__, *shadow_pte, pt_access,
 		 write_fault, user_fault, gfn);
 
+	if (!global)
+		kvm_clear_pg_global(sp);
+
 	if (is_rmap_pte(*shadow_pte)) {
 		/*
 		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
@@ -1285,7 +1290,7 @@ static int direct_map_entry(struct kvm_s
 	if (level == PT_PAGE_TABLE_LEVEL
 	    || (walk->largepage && level == PT_DIRECTORY_LEVEL)) {
 		mmu_set_spte(vcpu, sptep, ACC_ALL, ACC_ALL,
-			     0, walk->write, 1, &walk->pt_write,
+			     0, walk->write, 1, 0, &walk->pt_write,
 			     walk->largepage, gfn, walk->pfn, false);
 		++vcpu->stat.pf_fixed;
 		return 1;
Index: kvm/include/asm-x86/kvm_host.h
===================================================================
--- kvm.orig/include/asm-x86/kvm_host.h
+++ kvm/include/asm-x86/kvm_host.h
@@ -199,6 +199,7 @@ struct kvm_mmu_page {
 		u64 *parent_pte;               /* !multimapped */
 		struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */
 	};
+	unsigned long flags;
 };
 
 struct kvm_pv_mmu_op_buffer {
@@ -757,4 +758,20 @@ asmlinkage void kvm_handle_fault_on_rebo
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
 int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 
+enum kvm_page_flags {
+	KVM_PG_global,
+};
+
+#define KVMPGFLAG(name)							\
+static inline int kvm_page_##name(struct kvm_mmu_page *sp)		\
+	{ return test_bit(KVM_PG_##name, &sp->flags); }			\
+static inline void kvm_set_pg_##name(struct kvm_mmu_page *sp)		\
+	{ set_bit(KVM_PG_##name, &sp->flags); }				\
+static inline void kvm_clear_pg_##name(struct kvm_mmu_page *sp)		\
+	{ clear_bit(KVM_PG_##name, &sp->flags); }			\
+static inline int kvm_test_clear_pg_##name(struct kvm_mmu_page *sp)	\
+	{ return test_and_clear_bit(KVM_PG_##name, &sp->flags); }
+
+KVMPGFLAG(global);
+
 #endif
Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -274,8 +274,8 @@ static void FNAME(update_pte)(struct kvm
 		return;
 	kvm_get_pfn(pfn);
 	mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0,
-		     gpte & PT_DIRTY_MASK, NULL, largepage, gpte_to_gfn(gpte),
-		     pfn, true);
+		     gpte & PT_DIRTY_MASK, gpte & PT_GLOBAL_MASK, NULL,
+		     largepage, gpte_to_gfn(gpte), pfn, true);
 }
 
 /*
@@ -301,6 +301,7 @@ static int FNAME(shadow_walk_entry)(stru
 		mmu_set_spte(vcpu, sptep, access, gw->pte_access & access,
 			     sw->user_fault, sw->write_fault,
 			     gw->ptes[gw->level-1] & PT_DIRTY_MASK,
+			     gw->ptes[gw->level-1] & PT_GLOBAL_MASK,
 			     sw->ptwrite, sw->largepage, gw->gfn, sw->pfn,
 			     false);
 		sw->sptep = sptep;

-- 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 07/13] KVM: MMU: mode specific sync_page
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
                   ` (5 preceding siblings ...)
  2008-09-06 18:48 ` [patch 06/13] KVM: MMU: global page keeping Marcelo Tosatti
@ 2008-09-06 18:48 ` Marcelo Tosatti
  2008-09-07  9:52   ` Avi Kivity
  2008-09-06 18:48 ` [patch 08/13] KVM: MMU: record guest root level on struct guest_walker Marcelo Tosatti
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-06 18:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

[-- Attachment #1: kvm-oos-sync-page --]
[-- Type: text/plain, Size: 6527 bytes --]

Examine guest pagetable and bring the shadow back in sync. At the moment
sync_page is simplistic and only cares about shadow present entries
whose gfn remains unchanged.

It might be worthwhile to prepopulate the shadow in advance.

FIXME: the RW->RO transition needs a local TLB flush.

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -868,6 +868,14 @@ static void nonpaging_prefetch_page(stru
 		sp->spt[i] = shadow_trap_nonpresent_pte;
 }
 
+static int nonpaging_sync_page(struct kvm_vcpu *vcpu,
+			       struct kvm_mmu_page *sp)
+{
+	/* should never happen */
+	WARN_ON(1);
+	return 1;
+}
+
 static struct kvm_mmu_page *kvm_mmu_lookup_page(struct kvm *kvm, gfn_t gfn)
 {
 	unsigned index;
@@ -888,6 +896,43 @@ static struct kvm_mmu_page *kvm_mmu_look
 	return NULL;
 }
 
+static void kvm_sync_writeble(struct kvm_vcpu *vcpu, u64 *spte, int gpte_rw,
+			      gfn_t gfn)
+{
+	if (is_writeble_pte(*spte) == gpte_rw)
+		return;
+	if (is_writeble_pte(*spte))
+		*spte &= ~PT_WRITABLE_MASK;
+	else {
+		if (kvm_mmu_lookup_page(vcpu->kvm, gfn)) {
+			pgprintk("%s: found shadow page for %lx, keeping ro\n",
+			 __func__, gfn);
+		} else
+			*spte |= PT_WRITABLE_MASK;
+	}
+	return;
+}
+
+static void kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+
+static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	int ret;
+
+	rmap_write_protect(vcpu->kvm, sp->gfn);
+	ret = vcpu->arch.mmu.sync_page(vcpu, sp);
+	if (ret <= 0)
+		/* possible optimization: unprotect all
+ 		 * mappings (only originally writeble ones
+ 		 * of course).
+ 		 */
+		kvm_mmu_zap_page(vcpu->kvm, sp);
+	else
+		kvm_clear_pg_unsync(sp);
+
+	return ret;
+}
+
 static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 					     gfn_t gfn,
 					     gva_t gaddr,
@@ -1536,6 +1581,7 @@ static int nonpaging_init_context(struct
 	context->gva_to_gpa = nonpaging_gva_to_gpa;
 	context->free = nonpaging_free;
 	context->prefetch_page = nonpaging_prefetch_page;
+	context->sync_page = nonpaging_sync_page;
 	context->root_level = 0;
 	context->shadow_root_level = PT32E_ROOT_LEVEL;
 	context->root_hpa = INVALID_PAGE;
@@ -1583,6 +1629,7 @@ static int paging64_init_context_common(
 	context->page_fault = paging64_page_fault;
 	context->gva_to_gpa = paging64_gva_to_gpa;
 	context->prefetch_page = paging64_prefetch_page;
+	context->sync_page = paging64_sync_page;
 	context->free = paging_free;
 	context->root_level = level;
 	context->shadow_root_level = level;
@@ -1604,6 +1651,7 @@ static int paging32_init_context(struct 
 	context->gva_to_gpa = paging32_gva_to_gpa;
 	context->free = paging_free;
 	context->prefetch_page = paging32_prefetch_page;
+	context->sync_page = paging32_sync_page;
 	context->root_level = PT32_ROOT_LEVEL;
 	context->shadow_root_level = PT32E_ROOT_LEVEL;
 	context->root_hpa = INVALID_PAGE;
@@ -1623,6 +1671,7 @@ static int init_kvm_tdp_mmu(struct kvm_v
 	context->page_fault = tdp_page_fault;
 	context->free = nonpaging_free;
 	context->prefetch_page = nonpaging_prefetch_page;
+	context->sync_page = nonpaging_sync_page;
 	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
 	context->root_hpa = INVALID_PAGE;
 
Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -510,6 +510,85 @@ static void FNAME(prefetch_page)(struct 
 	kvm_release_page_clean(page);
 }
 
+static int FNAME(sync_page)(struct kvm_vcpu *vcpu,
+			    struct kvm_mmu_page *sp)
+{
+	int i, nr_present = 0;
+	struct page *pt_page;
+	pt_element_t *pt;
+	void *gpte_kaddr;
+
+	pt_page = gfn_to_page_atomic(vcpu->kvm, sp->gfn);
+	if (is_error_page(pt_page)) {
+		kvm_release_page_clean(pt_page);
+		return -EFAULT;
+	}
+
+	gpte_kaddr = pt = kmap_atomic(pt_page, KM_USER0);
+
+	if (PTTYPE == 32)
+		pt += sp->role.quadrant << PT64_LEVEL_BITS;
+
+	for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
+		if (is_shadow_present_pte(sp->spt[i])) {
+			struct page *page;
+			u64 spte;
+			unsigned pte_access;
+
+			if (!is_present_pte(*pt)) {
+				rmap_remove(vcpu->kvm, &sp->spt[i]);
+				sp->spt[i] = shadow_notrap_nonpresent_pte;
+				pt++;
+				continue;
+			}
+
+			page = gfn_to_page_atomic(vcpu->kvm, gpte_to_gfn(*pt));
+			if (is_error_page(page) ||
+			    spte_to_pfn(sp->spt[i]) != page_to_pfn(page)) {
+				rmap_remove(vcpu->kvm, &sp->spt[i]);
+				sp->spt[i] = shadow_trap_nonpresent_pte;
+				kvm_release_page_clean(page);
+				pt++;
+				continue;
+			}
+			kvm_release_page_clean(page);
+			nr_present++;
+			spte = sp->spt[i];
+			pte_access = sp->role.access & FNAME(gpte_access)(vcpu, *pt);
+			/* user */
+			if (pte_access & ACC_USER_MASK)
+				spte |= shadow_user_mask;
+			/* nx */
+			if (pte_access & ACC_EXEC_MASK)
+				spte |= shadow_x_mask;
+			else
+				spte |= shadow_nx_mask;
+			/* writeble */
+			kvm_sync_writeble(vcpu, &spte, is_writeble_pte(*pt),
+					  gpte_to_gfn(*pt));
+			/* clear writable to catch dirtyness */
+			if (!is_dirty_pte(*pt))
+				spte &= ~PT_WRITABLE_MASK;
+			/* guest->shadow accessed sync */
+			if (!(*pt & PT_ACCESSED_MASK))
+				spte &= ~PT_ACCESSED_MASK;
+			/* shadow->guest accessed sync */
+			if (spte & PT_ACCESSED_MASK)
+				set_bit(PT_ACCESSED_SHIFT, (unsigned long *)pt);
+			/* global */
+			if (!(*pt & PT_GLOBAL_MASK))
+				kvm_clear_pg_global(sp);
+			set_shadow_pte(&sp->spt[i], spte);
+		}
+		pt++;
+	}
+
+	kunmap_atomic(pt_page, KM_USER0);
+	kvm_release_page_dirty(pt_page);
+
+	return nr_present;
+}
+
 #undef pt_element_t
 #undef guest_walker
 #undef shadow_walker
Index: kvm/include/asm-x86/kvm_host.h
===================================================================
--- kvm.orig/include/asm-x86/kvm_host.h
+++ kvm/include/asm-x86/kvm_host.h
@@ -221,6 +221,8 @@ struct kvm_mmu {
 	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva);
 	void (*prefetch_page)(struct kvm_vcpu *vcpu,
 			      struct kvm_mmu_page *page);
+	int (*sync_page)(struct kvm_vcpu *vcpu,
+			  struct kvm_mmu_page *sp);
 	hpa_t root_hpa;
 	int root_level;
 	int shadow_root_level;
@@ -760,6 +762,7 @@ int kvm_age_hva(struct kvm *kvm, unsigne
 
 enum kvm_page_flags {
 	KVM_PG_global,
+	KVM_PG_unsync,
 };
 
 #define KVMPGFLAG(name)							\
@@ -773,5 +776,6 @@ static inline int kvm_test_clear_pg_##na
 	{ return test_and_clear_bit(KVM_PG_##name, &sp->flags); }
 
 KVMPGFLAG(global);
+KVMPGFLAG(unsync);
 
 #endif

-- 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 08/13] KVM: MMU: record guest root level on struct guest_walker
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
                   ` (6 preceding siblings ...)
  2008-09-06 18:48 ` [patch 07/13] KVM: MMU: mode specific sync_page Marcelo Tosatti
@ 2008-09-06 18:48 ` Marcelo Tosatti
  2008-09-06 18:48 ` [patch 09/13] KVM: MMU: out of sync shadow core Marcelo Tosatti
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-06 18:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

[-- Attachment #1: kvm-oos-record-root-level-on-guest-walker --]
[-- Type: text/plain, Size: 2612 bytes --]

For future use.

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -934,6 +934,7 @@ static int kvm_sync_page(struct kvm_vcpu
 }
 
 static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
+					     gfn_t root_gfn,
 					     gfn_t gfn,
 					     gva_t gaddr,
 					     unsigned level,
@@ -1343,8 +1344,8 @@ static int direct_map_entry(struct kvm_s
 
 	if (*sptep == shadow_trap_nonpresent_pte) {
 		pseudo_gfn = (addr & PT64_DIR_BASE_ADDR_MASK) >> PAGE_SHIFT;
-		sp = kvm_mmu_get_page(vcpu, pseudo_gfn, (gva_t)addr, level - 1,
-				      1, ACC_ALL, sptep);
+		sp = kvm_mmu_get_page(vcpu, -1, pseudo_gfn, (gva_t)addr,
+				      level - 1, 1, ACC_ALL, sptep);
 		if (!sp) {
 			pgprintk("nonpaging_map: ENOMEM\n");
 			kvm_release_pfn_clean(walk->pfn);
@@ -1466,7 +1467,7 @@ static void mmu_alloc_roots(struct kvm_v
 		ASSERT(!VALID_PAGE(root));
 		if (tdp_enabled)
 			metaphysical = 1;
-		sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
+		sp = kvm_mmu_get_page(vcpu, -1, root_gfn, 0,
 				      PT64_ROOT_LEVEL, metaphysical,
 				      ACC_ALL, NULL);
 		root = __pa(sp->spt);
@@ -1489,7 +1490,7 @@ static void mmu_alloc_roots(struct kvm_v
 			root_gfn = vcpu->arch.pdptrs[i] >> PAGE_SHIFT;
 		} else if (vcpu->arch.mmu.root_level == 0)
 			root_gfn = 0;
-		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
+		sp = kvm_mmu_get_page(vcpu, -1, root_gfn, i << 30,
 				      PT32_ROOT_LEVEL, metaphysical,
 				      ACC_ALL, NULL);
 		root = __pa(sp->spt);
Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -64,6 +64,7 @@
  */
 struct guest_walker {
 	int level;
+	int root_level;
 	gfn_t table_gfn[PT_MAX_FULL_LEVELS];
 	pt_element_t ptes[PT_MAX_FULL_LEVELS];
 	gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
@@ -149,6 +150,7 @@ walk:
 		--walker->level;
 	}
 #endif
+	walker->root_level = walker->level;
 	ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
 	       (vcpu->arch.cr3 & CR3_NONPAE_RESERVED_BITS) == 0);
 
@@ -322,7 +324,8 @@ static int FNAME(shadow_walk_entry)(stru
 		metaphysical = 0;
 		table_gfn = gw->table_gfn[level - 2];
 	}
-	shadow_page = kvm_mmu_get_page(vcpu, table_gfn, (gva_t)addr, level-1,
+	shadow_page = kvm_mmu_get_page(vcpu, gw->table_gfn[gw->root_level - 1],
+				       table_gfn, (gva_t)addr, level-1,
 				       metaphysical, access, sptep);
 	if (!metaphysical) {
 		r = kvm_read_guest_atomic(vcpu->kvm, gw->pte_gpa[level - 2],

-- 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 09/13] KVM: MMU: out of sync shadow core
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
                   ` (7 preceding siblings ...)
  2008-09-06 18:48 ` [patch 08/13] KVM: MMU: record guest root level on struct guest_walker Marcelo Tosatti
@ 2008-09-06 18:48 ` Marcelo Tosatti
  2008-09-07 11:01   ` Avi Kivity
  2008-09-06 18:48 ` [patch 10/13] KVM: MMU: sync roots on mmu reload Marcelo Tosatti
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-06 18:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

[-- Attachment #1: kvm-oos-core --]
[-- Type: text/plain, Size: 15796 bytes --]

Allow global and single-root, single-role-per-gfn leaf shadowed
pagetables to be unsynced.

Global unsync pages are saved into a per-vm array, synced on cr4/cr0 writes.

Non-global unsync pages are linked off their root shadow page, synced 
on cr3/cr4/cr0 writes.

Some of this logic is simplistic and could be smarter (page_multimapped and
the full root sync on higher level pagetable sharing).

Also unsyncing of non-leaf nodes might be interesting (but more complicated).


Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -64,6 +64,17 @@ static void kvm_mmu_audit(struct kvm_vcp
 #define rmap_printk(x...) do { } while (0)
 
 #endif
+#define OOS_DEBUG
+#if defined (OOS_DEBUG)
+#define OOS_ASSERT(x)							\
+	if (!(x)) {							\
+		printk(KERN_WARNING "assertion failed %s:%d: %s\n",	\
+		       __FILE__, __LINE__, #x);				\
+		dump_stack();						\
+	}
+#else
+#define OOS_ASSERT(x) do { } while (0)
+#endif
 
 #if defined(MMU_DEBUG) || defined(AUDIT)
 static int dbg = 0;
@@ -773,6 +784,8 @@ static struct kvm_mmu_page *kvm_mmu_allo
 	sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof *sp);
 	sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
 	sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
+	INIT_LIST_HEAD(&sp->oos_link);
+	INIT_LIST_HEAD(&sp->unsync_pages);
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
 	ASSERT(is_empty_shadow_page(sp->spt));
@@ -896,6 +909,31 @@ static struct kvm_mmu_page *kvm_mmu_look
 	return NULL;
 }
 
+static struct kvm_mmu_page *kvm_mmu_lookup_page_root(struct kvm_vcpu *vcpu,
+						     gfn_t gfn)
+{
+	unsigned index;
+	struct hlist_head *bucket;
+	struct kvm_mmu_page *sp;
+	struct hlist_node *node;
+	struct kvm *kvm = vcpu->kvm;
+	int level = vcpu->arch.mmu.root_level;
+	if (!is_long_mode(vcpu) && is_pae(vcpu))
+		level--;
+
+	pgprintk("%s: looking for gfn %lx\n", __func__, gfn);
+	index = kvm_page_table_hashfn(gfn);
+	bucket = &kvm->arch.mmu_page_hash[index];
+	hlist_for_each_entry(sp, node, bucket, hash_link)
+		if (sp->gfn == gfn && !sp->role.metaphysical
+		    && !sp->role.invalid && sp->role.level == level) {
+			pgprintk("%s: found role %x\n",
+				 __func__, sp->role.word);
+			return sp;
+		}
+	return NULL;
+}
+
 static void kvm_sync_writeble(struct kvm_vcpu *vcpu, u64 *spte, int gpte_rw,
 			      gfn_t gfn)
 {
@@ -913,12 +951,48 @@ static void kvm_sync_writeble(struct kvm
 	return;
 }
 
-static void kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	if (kvm_page_global(sp)) {
+		int i;
+		/* FIXME: save index into sp->flags */
+		for (i = 0; i < ARRAY_SIZE(kvm->arch.oos_global_pages); i++)
+			if (kvm->arch.oos_global_pages[i] == sp) {
+				kvm->arch.oos_global_pages[i] = NULL;
+				break;
+			}
+		OOS_ASSERT(i < ARRAY_SIZE(kvm->arch.oos_global_pages));
+		--kvm->stat.mmu_unsync_global;
+	} else {
+		list_del(&sp->oos_link);
+		--kvm->stat.mmu_unsync;
+	}
+}
+
+static void kvm_mmu_page_unlink_children(struct kvm *kvm,
+					 struct kvm_mmu_page *sp);
+static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+
+static void kvm_mmu_zap_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	if (unlikely(kvm_page_inuse(sp))) {
+		kvm_mmu_page_unlink_children(kvm, sp);
+		kvm_flush_remote_tlbs(kvm);
+		kvm_unlink_unsync_page(kvm, sp);
+		kvm_clear_pg_unsync(sp);
+	} else
+		kvm_mmu_zap_page(kvm, sp);
+}
 
 static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
 	int ret;
 
+	if (sp->role.glevels != vcpu->arch.mmu.root_level) {
+		kvm_mmu_zap_unsync_page(vcpu->kvm, sp);
+		return -EINVAL;
+	}
+
 	rmap_write_protect(vcpu->kvm, sp->gfn);
 	ret = vcpu->arch.mmu.sync_page(vcpu, sp);
 	if (ret <= 0)
@@ -926,13 +1000,87 @@ static int kvm_sync_page(struct kvm_vcpu
  		 * mappings (only originally writeble ones
  		 * of course).
  		 */
-		kvm_mmu_zap_page(vcpu->kvm, sp);
-	else
+		kvm_mmu_zap_unsync_page(vcpu->kvm, sp);
+	else {
+		kvm_unlink_unsync_page(vcpu->kvm, sp);
 		kvm_clear_pg_unsync(sp);
+	}
 
 	return ret;
 }
 
+static int mmu_unsync_global_page(struct kvm_vcpu *vcpu,
+				  struct kvm_mmu_page *sp)
+
+{
+	struct kvm *kvm = vcpu->kvm;
+	unsigned idx = kvm->arch.oos_global_idx;
+
+	if (kvm->arch.oos_global_pages[idx])
+		kvm_sync_page(vcpu, kvm->arch.oos_global_pages[idx]);
+
+	kvm_set_pg_unsync(sp);
+	kvm->arch.oos_global_pages[idx] = sp;
+
+	kvm->arch.oos_global_idx++;
+	if (kvm->arch.oos_global_idx >= ARRAY_SIZE(kvm->arch.oos_global_pages))
+		kvm->arch.oos_global_idx = 0;
+
+	++kvm->stat.mmu_unsync_global;
+	return 0;
+}
+
+static int page_multimapped(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	while (!sp->multimapped) {
+		if (!sp->parent_pte)
+			return 0;
+		sp = page_header(__pa(sp->parent_pte));
+	}
+	return 1;
+}
+
+static int mmu_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	struct kvm_mmu_page *root_sp;
+
+	if (page_multimapped(vcpu, sp))
+		return 1;
+
+	root_sp = kvm_mmu_lookup_page_root(vcpu, sp->root_gfn);
+	if (!root_sp)
+		return 1;
+
+	kvm_set_pg_unsync(sp);
+	list_add(&sp->oos_link, &root_sp->unsync_pages);
+	++vcpu->kvm->stat.mmu_unsync;
+	return 0;
+}
+
+static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	if (kvm_page_global(sp))
+		return mmu_unsync_global_page(vcpu, sp);
+	else
+		return mmu_unsync_page(vcpu, sp);
+}
+
+static int set_shared_mmu_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	struct kvm_mmu_page *root_sp;
+	int ret = 0;
+
+	if (sp->role.level == PT_PAGE_TABLE_LEVEL) {
+		if (kvm_page_unsync(sp))
+			kvm_sync_page(vcpu, sp);
+	} else if (sp->root_gfn != -1) {
+		root_sp = kvm_mmu_lookup_page_root(vcpu, sp->root_gfn);
+	}
+
+	sp->root_gfn = -1;
+	return ret;
+}
+
 static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 					     gfn_t root_gfn,
 					     gfn_t gfn,
@@ -947,7 +1095,8 @@ static struct kvm_mmu_page *kvm_mmu_get_
 	unsigned quadrant;
 	struct hlist_head *bucket;
 	struct kvm_mmu_page *sp;
-	struct hlist_node *node;
+	struct hlist_node *node, *tmp;
+	int unsyncable = 1;
 
 	role.word = 0;
 	role.glevels = vcpu->arch.mmu.root_level;
@@ -963,8 +1112,24 @@ static struct kvm_mmu_page *kvm_mmu_get_
 		 gfn, role.word);
 	index = kvm_page_table_hashfn(gfn);
 	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
-	hlist_for_each_entry(sp, node, bucket, hash_link)
-		if (sp->gfn == gfn && sp->role.word == role.word) {
+	hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
+		if (sp->gfn == gfn) {
+			/*
+ 			 * If a pagetable becomes referenced by more than one
+ 			 * root, or has multiple roles, unsync it and disable
+ 			 * oos. For higher level pgtables the entire tree
+ 			 * has to be synced.
+ 			 */
+			if (sp->root_gfn != root_gfn) {
+				kvm_set_pg_inuse(sp);
+				if (set_shared_mmu_page(vcpu, sp))
+					tmp = bucket->first;
+				kvm_clear_pg_inuse(sp);
+				unsyncable = 0;
+			}
+			if (sp->role.word != role.word)
+				continue;
+
 			mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 			pgprintk("%s: found\n", __func__);
 			return sp;
@@ -975,6 +1140,9 @@ static struct kvm_mmu_page *kvm_mmu_get_
 		return sp;
 	pgprintk("%s: adding gfn %lx role %x\n", __func__, gfn, role.word);
 	sp->gfn = gfn;
+	if (!unsyncable)
+		root_gfn = -1;
+	sp->root_gfn = root_gfn;
 	sp->role = role;
 	hlist_add_head(&sp->hash_link, bucket);
 	if (!metaphysical)
@@ -1084,14 +1252,35 @@ static void kvm_mmu_unlink_parents(struc
 	}
 }
 
-static void kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+static void kvm_drop_unsync_children(struct kvm *kvm,
+				     struct kvm_mmu_page *root_sp)
+{
+	struct kvm_mmu_page *sp, *n;
+
+	list_for_each_entry_safe(sp, n, &root_sp->unsync_pages, oos_link) {
+		OOS_ASSERT(kvm_page_unsync(sp));
+		OOS_ASSERT(!kvm_page_global(sp));
+		OOS_ASSERT(sp->role.level == PT_PAGE_TABLE_LEVEL);
+		OOS_ASSERT(list_empty(&sp->unsync_pages));
+		kvm_mmu_zap_page(kvm, sp);
+	}
+}
+
+static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
+	int ret = 0;
 	++kvm->stat.mmu_shadow_zapped;
 	kvm_mmu_page_unlink_children(kvm, sp);
 	kvm_mmu_unlink_parents(kvm, sp);
 	kvm_flush_remote_tlbs(kvm);
 	if (!sp->role.invalid && !sp->role.metaphysical)
 		unaccount_shadowed(kvm, sp->gfn);
+	if (kvm_test_clear_pg_unsync(sp))
+		kvm_unlink_unsync_page(kvm, sp);
+	if (!list_empty(&sp->unsync_pages)) {
+		kvm_drop_unsync_children(kvm, sp);
+		ret = 1;
+	}
 	if (!sp->root_count) {
 		hlist_del(&sp->hash_link);
 		kvm_mmu_free_page(kvm, sp);
@@ -1101,6 +1290,7 @@ static void kvm_mmu_zap_page(struct kvm 
 		kvm_reload_remote_mmus(kvm);
 	}
 	kvm_mmu_reset_last_pte_updated(kvm);
+	return ret;
 }
 
 /*
@@ -1153,8 +1343,9 @@ static int kvm_mmu_unprotect_page(struct
 		if (sp->gfn == gfn && !sp->role.metaphysical) {
 			pgprintk("%s: gfn %lx role %x\n", __func__, gfn,
 				 sp->role.word);
-			kvm_mmu_zap_page(kvm, sp);
 			r = 1;
+			if (kvm_mmu_zap_page(kvm, sp))
+				n = bucket->first;
 		}
 	return r;
 }
@@ -1191,6 +1382,25 @@ struct page *gva_to_page(struct kvm_vcpu
 	return page;
 }
 
+static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
+				  int write_fault)
+{
+	struct kvm_mmu_page *shadow;
+
+	shadow = kvm_mmu_lookup_page(vcpu->kvm, gfn);
+	if (shadow) {
+		if (!write_fault)
+			return 1;
+		if (shadow->role.level != PT_PAGE_TABLE_LEVEL)
+			return 1;
+		if (shadow->root_gfn == -1)
+			return 1;
+		if (!kvm_page_unsync(shadow))
+			return kvm_unsync_page(vcpu, shadow);
+	}
+	return 0;
+}
+
 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 			 unsigned pt_access, unsigned pte_access,
 			 int user_fault, int write_fault, int dirty,
@@ -1207,8 +1417,13 @@ static void mmu_set_spte(struct kvm_vcpu
 		 __func__, *shadow_pte, pt_access,
 		 write_fault, user_fault, gfn);
 
-	if (!global)
+	kvm_set_pg_inuse(sp);
+
+	if (!global && kvm_page_global(sp)) {
+		if (kvm_page_unsync(sp))
+			kvm_sync_page(vcpu, sp);
 		kvm_clear_pg_global(sp);
+	}
 
 	if (is_rmap_pte(*shadow_pte)) {
 		/*
@@ -1256,12 +1471,19 @@ static void mmu_set_spte(struct kvm_vcpu
 
 	if ((pte_access & ACC_WRITE_MASK)
 	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
-		struct kvm_mmu_page *shadow;
+		/*
+ 		 * Do not create write protected large translations.
+ 		 */
+		if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) {
+			spte = shadow_trap_nonpresent_pte;
+			was_writeble = 0;
+			*ptwrite = 0;
+			goto set_shadow;
+		}
 
 		spte |= PT_WRITABLE_MASK;
 
-		shadow = kvm_mmu_lookup_page(vcpu->kvm, gfn);
-		if (shadow) {
+		if (mmu_need_write_protect(vcpu, gfn, write_fault)) {
 			pgprintk("%s: found shadow page for %lx, marking ro\n",
 				 __func__, gfn);
 			pte_access &= ~ACC_WRITE_MASK;
@@ -1272,16 +1494,9 @@ static void mmu_set_spte(struct kvm_vcpu
 			if (write_fault)
 				*ptwrite = 1;
 		}
-		/*
- 		 * Do not create write protected large translations.
- 		 */
-		if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) {
-			spte = shadow_trap_nonpresent_pte;
-			was_writeble = 0;
-			*ptwrite = 0;
-		}
 	}
 
+set_shadow:
 	if (pte_access & ACC_WRITE_MASK)
 		mark_page_dirty(vcpu->kvm, gfn);
 
@@ -1309,6 +1524,7 @@ static void mmu_set_spte(struct kvm_vcpu
 		vcpu->arch.last_pte_updated = shadow_pte;
 		vcpu->arch.last_pte_gfn = gfn;
 	}
+	kvm_clear_pg_inuse(sp);
 }
 
 static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
@@ -1950,7 +2166,8 @@ void kvm_mmu_pte_write(struct kvm_vcpu *
 			 */
 			pgprintk("misaligned: gpa %llx bytes %d role %x\n",
 				 gpa, bytes, sp->role.word);
-			kvm_mmu_zap_page(vcpu->kvm, sp);
+			if (kvm_mmu_zap_page(vcpu->kvm, sp))
+				n = bucket->first;
 			++vcpu->kvm->stat.mmu_flooded;
 			continue;
 		}
@@ -2174,7 +2391,9 @@ void kvm_mmu_zap_all(struct kvm *kvm)
 
 	spin_lock(&kvm->mmu_lock);
 	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link)
-		kvm_mmu_zap_page(kvm, sp);
+		if (kvm_mmu_zap_page(kvm, sp))
+			node = container_of(kvm->arch.active_mmu_pages.next,
+					    struct kvm_mmu_page, link);
 	spin_unlock(&kvm->mmu_lock);
 
 	kvm_flush_remote_tlbs(kvm);
Index: kvm/include/asm-x86/kvm_host.h
===================================================================
--- kvm.orig/include/asm-x86/kvm_host.h
+++ kvm/include/asm-x86/kvm_host.h
@@ -179,6 +179,10 @@ union kvm_mmu_page_role {
 struct kvm_mmu_page {
 	struct list_head link;
 	struct hlist_node hash_link;
+	/* FIXME: one list_head is enough */
+	struct list_head unsync_pages;
+	struct list_head oos_link;
+	gfn_t root_gfn; /* root this pagetable belongs to, -1 if multimapped */
 
 	/*
 	 * The following two entries are used to key the shadow page in the
@@ -362,6 +366,8 @@ struct kvm_arch{
 	unsigned int n_requested_mmu_pages;
 	unsigned int n_alloc_mmu_pages;
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
+	struct kvm_mmu_page *oos_global_pages[7];
+	unsigned oos_global_idx;
 	/*
 	 * Hash table of struct kvm_mmu_page.
 	 */
@@ -390,6 +396,8 @@ struct kvm_vm_stat {
 	u32 mmu_flooded;
 	u32 mmu_recycled;
 	u32 mmu_cache_miss;
+	u32 mmu_unsync;
+	u32 mmu_unsync_global;
 	u32 remote_tlb_flush;
 	u32 lpages;
 };
@@ -763,6 +771,7 @@ int kvm_age_hva(struct kvm *kvm, unsigne
 enum kvm_page_flags {
 	KVM_PG_global,
 	KVM_PG_unsync,
+	KVM_PG_inuse,
 };
 
 #define KVMPGFLAG(name)							\
@@ -777,5 +786,6 @@ static inline int kvm_test_clear_pg_##na
 
 KVMPGFLAG(global);
 KVMPGFLAG(unsync);
+KVMPGFLAG(inuse);
 
 #endif
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -100,6 +100,8 @@ struct kvm_stats_debugfs_item debugfs_en
 	{ "mmu_flooded", VM_STAT(mmu_flooded) },
 	{ "mmu_recycled", VM_STAT(mmu_recycled) },
 	{ "mmu_cache_miss", VM_STAT(mmu_cache_miss) },
+	{ "mmu_unsync", VM_STAT(mmu_unsync) },
+	{ "mmu_unsync_global", VM_STAT(mmu_unsync_global) },
 	{ "remote_tlb_flush", VM_STAT(remote_tlb_flush) },
 	{ "largepages", VM_STAT(lpages) },
 	{ NULL }
Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -81,6 +81,7 @@ struct shadow_walker {
 	int write_fault;
 	int largepage;
 	int *ptwrite;
+	int multiroot;
 	pfn_t pfn;
 	u64 *sptep;
 };
@@ -294,7 +295,7 @@ static int FNAME(shadow_walk_entry)(stru
 	struct kvm_mmu_page *shadow_page;
 	u64 spte;
 	int metaphysical;
-	gfn_t table_gfn;
+	gfn_t table_gfn, root_gfn;
 	int r;
 	pt_element_t curr_pte;
 
@@ -324,9 +325,11 @@ static int FNAME(shadow_walk_entry)(stru
 		metaphysical = 0;
 		table_gfn = gw->table_gfn[level - 2];
 	}
-	shadow_page = kvm_mmu_get_page(vcpu, gw->table_gfn[gw->root_level - 1],
-				       table_gfn, (gva_t)addr, level-1,
-				       metaphysical, access, sptep);
+	root_gfn = gw->table_gfn[gw->root_level - 1];
+	if (sw->multiroot)
+		root_gfn = -1;
+	shadow_page = kvm_mmu_get_page(vcpu, root_gfn, table_gfn, (gva_t)addr,
+				       level-1, metaphysical, access, sptep);
 	if (!metaphysical) {
 		r = kvm_read_guest_atomic(vcpu->kvm, gw->pte_gpa[level - 2],
 					  &curr_pte, sizeof(curr_pte));
@@ -336,6 +339,8 @@ static int FNAME(shadow_walk_entry)(stru
 			return 1;
 		}
 	}
+	if (shadow_page->root_gfn == -1)
+		sw->multiroot = 1;
 
 	spte = __pa(shadow_page->spt) | PT_PRESENT_MASK | PT_ACCESSED_MASK
 		| PT_WRITABLE_MASK | PT_USER_MASK;
@@ -355,6 +360,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu
 		.write_fault = write_fault,
 		.largepage = largepage,
 		.ptwrite = ptwrite,
+		.multiroot = 0,
 		.pfn = pfn,
 	};
 

-- 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 10/13] KVM: MMU: sync roots on mmu reload
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
                   ` (8 preceding siblings ...)
  2008-09-06 18:48 ` [patch 09/13] KVM: MMU: out of sync shadow core Marcelo Tosatti
@ 2008-09-06 18:48 ` Marcelo Tosatti
  2008-09-06 18:48 ` [patch 11/13] KVM: MMU: sync global pages on cr0/cr4 writes Marcelo Tosatti
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-06 18:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

[-- Attachment #1: kvm-oos-hook-cr3 --]
[-- Type: text/plain, Size: 3099 bytes --]

Bring the pages for the root being switched to back in sync.

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1065,6 +1065,20 @@ static int kvm_unsync_page(struct kvm_vc
 		return mmu_unsync_page(vcpu, sp);
 }
 
+static int mmu_sync_root(struct kvm_vcpu *vcpu, struct kvm_mmu_page *root_sp)
+{
+	struct kvm_mmu_page *sp, *n;
+	int ret = 0;
+
+	list_for_each_entry_safe(sp, n, &root_sp->unsync_pages, oos_link) {
+		ret = 1;
+		kvm_sync_page(vcpu, sp);
+	}
+
+	OOS_ASSERT(list_empty(&root_sp->unsync_pages));
+	return ret;
+}
+
 static int set_shared_mmu_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
 	struct kvm_mmu_page *root_sp;
@@ -1075,6 +1089,8 @@ static int set_shared_mmu_page(struct kv
 			kvm_sync_page(vcpu, sp);
 	} else if (sp->root_gfn != -1) {
 		root_sp = kvm_mmu_lookup_page_root(vcpu, sp->root_gfn);
+		if (root_sp)
+			ret = mmu_sync_root(vcpu, root_sp);
 	}
 
 	sp->root_gfn = -1;
@@ -1716,6 +1732,37 @@ static void mmu_alloc_roots(struct kvm_v
 	vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
 }
 
+static void mmu_sync_roots(struct kvm_vcpu *vcpu)
+{
+	int i;
+	struct kvm_mmu_page *sp;
+
+	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
+		return;
+	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+		hpa_t root = vcpu->arch.mmu.root_hpa;
+		sp = page_header(root);
+		mmu_sync_root(vcpu, sp);
+		return;
+	}
+	for (i = 0; i < 4; ++i) {
+		hpa_t root = vcpu->arch.mmu.pae_root[i];
+
+		if (root) {
+			root &= PT64_BASE_ADDR_MASK;
+			sp = page_header(root);
+			mmu_sync_root(vcpu, sp);
+		}
+	}
+}
+
+void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
+{
+	spin_lock(&vcpu->kvm->mmu_lock);
+	mmu_sync_roots(vcpu);
+	spin_unlock(&vcpu->kvm->mmu_lock);
+}
+
 static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr)
 {
 	return vaddr;
@@ -1960,6 +2007,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	spin_lock(&vcpu->kvm->mmu_lock);
 	kvm_mmu_free_some_pages(vcpu);
 	mmu_alloc_roots(vcpu);
+	mmu_sync_roots(vcpu);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 	kvm_x86_ops->set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
 	kvm_mmu_flush_tlb(vcpu);
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -582,6 +582,7 @@ EXPORT_SYMBOL_GPL(kvm_set_cr4);
 void kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	if (cr3 == vcpu->arch.cr3 && !pdptrs_changed(vcpu)) {
+		kvm_mmu_sync_roots(vcpu);
 		kvm_mmu_flush_tlb(vcpu);
 		return;
 	}
Index: kvm/include/asm-x86/kvm_host.h
===================================================================
--- kvm.orig/include/asm-x86/kvm_host.h
+++ kvm/include/asm-x86/kvm_host.h
@@ -610,6 +610,7 @@ int kvm_mmu_unprotect_page_virt(struct k
 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
 int kvm_mmu_load(struct kvm_vcpu *vcpu);
 void kvm_mmu_unload(struct kvm_vcpu *vcpu);
+void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 

-- 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 11/13] KVM: MMU: sync global pages on cr0/cr4 writes
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
                   ` (9 preceding siblings ...)
  2008-09-06 18:48 ` [patch 10/13] KVM: MMU: sync roots on mmu reload Marcelo Tosatti
@ 2008-09-06 18:48 ` Marcelo Tosatti
  2008-09-06 18:48 ` [patch 12/13] KVM: x86: trap invlpg Marcelo Tosatti
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-06 18:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

[-- Attachment #1: kvm-oos-cr4-sync-global --]
[-- Type: text/plain, Size: 1749 bytes --]



Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1097,6 +1097,20 @@ static int set_shared_mmu_page(struct kv
 	return ret;
 }
 
+void kvm_mmu_sync_global(struct kvm_vcpu *vcpu)
+{
+	int i;
+	struct kvm *kvm = vcpu->kvm;
+
+	spin_lock(&kvm->mmu_lock);
+	for (i = 0; i < ARRAY_SIZE(kvm->arch.oos_global_pages); i++) {
+		struct kvm_mmu_page *sp = kvm->arch.oos_global_pages[i];
+		if (sp)
+			kvm_sync_page(vcpu, sp);
+	}
+	spin_unlock(&kvm->mmu_lock);
+}
+
 static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 					     gfn_t root_gfn,
 					     gfn_t gfn,
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -532,6 +532,7 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, 
 	kvm_x86_ops->set_cr0(vcpu, cr0);
 	vcpu->arch.cr0 = cr0;
 
+	kvm_mmu_sync_global(vcpu);
 	kvm_mmu_reset_context(vcpu);
 	return;
 }
@@ -575,6 +576,7 @@ void kvm_set_cr4(struct kvm_vcpu *vcpu, 
 	}
 	kvm_x86_ops->set_cr4(vcpu, cr4);
 	vcpu->arch.cr4 = cr4;
+	kvm_mmu_sync_global(vcpu);
 	kvm_mmu_reset_context(vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_set_cr4);
Index: kvm/include/asm-x86/kvm_host.h
===================================================================
--- kvm.orig/include/asm-x86/kvm_host.h
+++ kvm/include/asm-x86/kvm_host.h
@@ -611,6 +611,7 @@ void __kvm_mmu_free_some_pages(struct kv
 int kvm_mmu_load(struct kvm_vcpu *vcpu);
 void kvm_mmu_unload(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
+void kvm_mmu_sync_global(struct kvm_vcpu *vcpu);
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 

-- 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 12/13] KVM: x86: trap invlpg
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
                   ` (10 preceding siblings ...)
  2008-09-06 18:48 ` [patch 11/13] KVM: MMU: sync global pages on cr0/cr4 writes Marcelo Tosatti
@ 2008-09-06 18:48 ` Marcelo Tosatti
  2008-09-07 11:14   ` Avi Kivity
  2008-09-06 18:48 ` [patch 13/13] KVM: MMU: ignore multiroot when unsyncing global pages Marcelo Tosatti
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-06 18:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

[-- Attachment #1: kvm-oos-invlpg --]
[-- Type: text/plain, Size: 6669 bytes --]

With pages out of sync, invlpg needs to be trapped. For now, simply nuke
the corresponding shadow entry.

The SVM code is untested and probably broken.

Index: kvm/arch/x86/kvm/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -1130,6 +1130,7 @@ static __init int setup_vmcs_config(stru
 	      CPU_BASED_CR3_STORE_EXITING |
 	      CPU_BASED_USE_IO_BITMAPS |
 	      CPU_BASED_MOV_DR_EXITING |
+	      CPU_BASED_INVLPG_EXITING |
 	      CPU_BASED_USE_TSC_OFFSETING;
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
@@ -2790,6 +2791,15 @@ static int handle_vmcall(struct kvm_vcpu
 	return 1;
 }
 
+static int handle_invlpg(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+{
+	u64 exit_qualification = vmcs_read64(EXIT_QUALIFICATION);
+
+	kvm_mmu_invlpg(vcpu, exit_qualification);
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
 static int handle_wbinvd(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
 	skip_emulated_instruction(vcpu);
@@ -2958,6 +2968,7 @@ static int (*kvm_vmx_exit_handlers[])(st
 	[EXIT_REASON_MSR_WRITE]               = handle_wrmsr,
 	[EXIT_REASON_PENDING_INTERRUPT]       = handle_interrupt_window,
 	[EXIT_REASON_HLT]                     = handle_halt,
+	[EXIT_REASON_INVLPG]		      = handle_invlpg,
 	[EXIT_REASON_VMCALL]                  = handle_vmcall,
 	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
 	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -889,6 +889,12 @@ static int nonpaging_sync_page(struct kv
 	return 1;
 }
 
+static void nonpaging_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
+{
+	/* should never happen */
+	WARN_ON(1);
+}
+
 static struct kvm_mmu_page *kvm_mmu_lookup_page(struct kvm *kvm, gfn_t gfn)
 {
 	unsigned index;
@@ -1860,6 +1866,7 @@ static int nonpaging_init_context(struct
 	context->free = nonpaging_free;
 	context->prefetch_page = nonpaging_prefetch_page;
 	context->sync_page = nonpaging_sync_page;
+	context->invlpg = nonpaging_invlpg;
 	context->root_level = 0;
 	context->shadow_root_level = PT32E_ROOT_LEVEL;
 	context->root_hpa = INVALID_PAGE;
@@ -1908,6 +1915,7 @@ static int paging64_init_context_common(
 	context->gva_to_gpa = paging64_gva_to_gpa;
 	context->prefetch_page = paging64_prefetch_page;
 	context->sync_page = paging64_sync_page;
+	context->invlpg = paging64_invlpg;
 	context->free = paging_free;
 	context->root_level = level;
 	context->shadow_root_level = level;
@@ -1930,6 +1938,7 @@ static int paging32_init_context(struct 
 	context->free = paging_free;
 	context->prefetch_page = paging32_prefetch_page;
 	context->sync_page = paging32_sync_page;
+	context->invlpg = paging32_invlpg;
 	context->root_level = PT32_ROOT_LEVEL;
 	context->shadow_root_level = PT32E_ROOT_LEVEL;
 	context->root_hpa = INVALID_PAGE;
@@ -1950,6 +1959,7 @@ static int init_kvm_tdp_mmu(struct kvm_v
 	context->free = nonpaging_free;
 	context->prefetch_page = nonpaging_prefetch_page;
 	context->sync_page = nonpaging_sync_page;
+	context->invlpg = nonpaging_invlpg;
 	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
 	context->root_hpa = INVALID_PAGE;
 
@@ -2343,6 +2353,14 @@ out:
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
 
+void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
+{
+	spin_lock(&vcpu->kvm->mmu_lock);
+	vcpu->arch.mmu.invlpg(vcpu, gva);
+	spin_unlock(&vcpu->kvm->mmu_lock);
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
+
 void kvm_enable_tdp(void)
 {
 	tdp_enabled = true;
Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -467,6 +467,32 @@ out_unlock:
 	return 0;
 }
 
+
+static int FNAME(shadow_invlpg_entry)(struct kvm_shadow_walk *_sw,
+				    struct kvm_vcpu *vcpu, u64 addr,
+				    u64 *sptep, int level)
+{
+
+	if (level == PT_PAGE_TABLE_LEVEL) {
+		if (is_shadow_present_pte(*sptep))
+			rmap_remove(vcpu->kvm, sptep);
+		set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
+		return 1;
+	}
+	if (!is_shadow_present_pte(*sptep))
+		return 1;
+	return 0;
+}
+
+static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
+{
+	struct shadow_walker walker = {
+		.walker = { .entry = FNAME(shadow_invlpg_entry), },
+	};
+
+	walk_shadow(&walker.walker, vcpu, gva);
+}
+
 static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr)
 {
 	struct guest_walker walker;
Index: kvm/include/asm-x86/kvm_host.h
===================================================================
--- kvm.orig/include/asm-x86/kvm_host.h
+++ kvm/include/asm-x86/kvm_host.h
@@ -227,6 +227,7 @@ struct kvm_mmu {
 			      struct kvm_mmu_page *page);
 	int (*sync_page)(struct kvm_vcpu *vcpu,
 			  struct kvm_mmu_page *sp);
+	void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva);
 	hpa_t root_hpa;
 	int root_level;
 	int shadow_root_level;
@@ -618,6 +619,7 @@ int kvm_emulate_hypercall(struct kvm_vcp
 int kvm_fix_hypercall(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code);
+void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
 
 void kvm_enable_tdp(void);
 void kvm_disable_tdp(void);
Index: kvm/arch/x86/kvm/svm.c
===================================================================
--- kvm.orig/arch/x86/kvm/svm.c
+++ kvm/arch/x86/kvm/svm.c
@@ -525,6 +525,7 @@ static void init_vmcb(struct vcpu_svm *s
 				(1ULL << INTERCEPT_CPUID) |
 				(1ULL << INTERCEPT_INVD) |
 				(1ULL << INTERCEPT_HLT) |
+				(1ULL << INTERCEPT_INVLPG) |
 				(1ULL << INTERCEPT_INVLPGA) |
 				(1ULL << INTERCEPT_IOIO_PROT) |
 				(1ULL << INTERCEPT_MSR_PROT) |
@@ -1160,6 +1161,15 @@ static int cpuid_interception(struct vcp
 	return 1;
 }
 
+static int invlpg_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	/* FIXME: does this make any sense? */
+	u64 vaddr = svm->vmcb->control.exit_info_1;
+	kvm_mmu_invlpg(&svm->vcpu, vaddr);
+	skip_emulated_instruction(&svm->vcpu);
+	return 1;
+}
+
 static int emulate_on_interception(struct vcpu_svm *svm,
 				   struct kvm_run *kvm_run)
 {
@@ -1413,7 +1423,7 @@ static int (*svm_exit_handlers[])(struct
 	[SVM_EXIT_CPUID]			= cpuid_interception,
 	[SVM_EXIT_INVD]                         = emulate_on_interception,
 	[SVM_EXIT_HLT]				= halt_interception,
-	[SVM_EXIT_INVLPG]			= emulate_on_interception,
+	[SVM_EXIT_INVLPG]			= invlpg_interception,
 	[SVM_EXIT_INVLPGA]			= invalid_op_interception,
 	[SVM_EXIT_IOIO] 		  	= io_interception,
 	[SVM_EXIT_MSR]				= msr_interception,

-- 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 13/13] KVM: MMU: ignore multiroot when unsyncing global pages
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
                   ` (11 preceding siblings ...)
  2008-09-06 18:48 ` [patch 12/13] KVM: x86: trap invlpg Marcelo Tosatti
@ 2008-09-06 18:48 ` Marcelo Tosatti
  2008-09-07 11:22 ` [patch 00/13] RFC: out of sync shadow Avi Kivity
  2008-09-12  4:05 ` David S. Ahern
  14 siblings, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-06 18:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

[-- Attachment #1: kvm-oos-global-share --]
[-- Type: text/plain, Size: 556 bytes --]

Since global pages are not linked in terms of TLB flushing to any root.

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1429,7 +1429,7 @@ static int mmu_need_write_protect(struct
 			return 1;
 		if (shadow->role.level != PT_PAGE_TABLE_LEVEL)
 			return 1;
-		if (shadow->root_gfn == -1)
+		if (shadow->root_gfn == -1 && !kvm_page_global(shadow))
 			return 1;
 		if (!kvm_page_unsync(shadow))
 			return kvm_unsync_page(vcpu, shadow);

-- 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 01/13] x86/mm: get_user_pages_fast_atomic
  2008-09-06 18:48 ` [patch 01/13] x86/mm: get_user_pages_fast_atomic Marcelo Tosatti
@ 2008-09-07  8:42   ` Avi Kivity
  2008-09-08  6:10     ` Marcelo Tosatti
  0 siblings, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2008-09-07  8:42 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm

Marcelo Tosatti wrote:
> From: Nick Piggin <nickpiggin@yahoo.com.au>
>
> Provide a lockless pagetable walk function without fallback to mmap_sem
> on error.
>   

I would like to avoid this if possible.  Not only is this a change to 
the core (with backporting headaches), if we resync in atomic context 
this can mean a long time spent with preemption disabled.

We might get around the need by dropping the lock when we resync, fetch 
the gfns without the lock, and after reacquiring it check whether we can 
proceed or whether we need to abort and let the guest retry.  We can 
probably proceed unless one of two things have happened: an mmu page was 
zapped, or our page was oos'ed while we were resyncing it.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 02/13] KVM: MMU: switch to get_user_pages_fast
  2008-09-06 18:48 ` [patch 02/13] KVM: MMU: switch to get_user_pages_fast Marcelo Tosatti
@ 2008-09-07  8:45   ` Avi Kivity
  2008-09-07 20:44     ` Marcelo Tosatti
  2008-09-09 12:21     ` Andrea Arcangeli
  0 siblings, 2 replies; 42+ messages in thread
From: Avi Kivity @ 2008-09-07  8:45 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Andrea Arcangeli

Marcelo Tosatti wrote:
> Avoid mmap_sem / pt lock acquision if the pagetables are present. The
> improvement for hugepage backed guests is more significant, since pte
> walk + page grab for such mappings is serialized by mm->page_table_lock.
>
> CC: Andrea Arcangeli <andrea@qumranet.com>
>
>   

I'd like to apply this.  Andrea, can you verify that mmu notifiers 
interaction is okay?

>  
>  static int kvm_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  {
> +	struct page *page[1];
> +	unsigned long addr;
> +	int npages;
> +	gfn_t gfn = vmf->pgoff;
>  	struct kvm *kvm = vma->vm_file->private_data;
> -	struct page *page;
>  
> -	if (!kvm_is_visible_gfn(kvm, vmf->pgoff))
> +	addr = gfn_to_hva(kvm, gfn);
> +	if (kvm_is_error_hva(addr))
>  		return VM_FAULT_SIGBUS;
> -	page = gfn_to_page(kvm, vmf->pgoff);
> -	if (is_error_page(page)) {
> -		kvm_release_page_clean(page);
> +
> +	npages = get_user_pages(current, current->mm, addr, 1, 1, 0, page,
> +				NULL);
> +	if (unlikely(npages != 1))
>  		return VM_FAULT_SIGBUS;
> -	}
> -	vmf->page = page;
> +
> +	vmf->page = page[0];
>  	return 0;
>  }
>  
>   

Why this change?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 05/13] KVM: MMU: do not write-protect large mappings
  2008-09-06 18:48 ` [patch 05/13] KVM: MMU: do not write-protect large mappings Marcelo Tosatti
@ 2008-09-07  9:04   ` Avi Kivity
  2008-09-07 20:54     ` Marcelo Tosatti
  0 siblings, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2008-09-07  9:04 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm

Marcelo Tosatti wrote:
> There is not much point in write protecting large mappings. This
> can only happen when a page is shadowed during the window between
> is_largepage_backed and mmu_lock acquision. Zap the entry instead, so
> the next pagefault will find a shadowed page via is_largepage_backed and
> fallback to 4k translations.
>
> Simplifies out of sync shadow.
>
> @@ -1222,6 +1221,14 @@ static void mmu_set_spte(struct kvm_vcpu
>  			if (write_fault)
>  				*ptwrite = 1;
>  		}
> +		/*
> + 		 * Do not create write protected large translations.
> + 		 */
> +		if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) {
> +			spte = shadow_trap_nonpresent_pte;
> +			was_writeble = 0;
> +			*ptwrite = 0;
> +		}
>  	}
>  
>   

Why are you clearing was_writable?


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 06/13] KVM: MMU: global page keeping
  2008-09-06 18:48 ` [patch 06/13] KVM: MMU: global page keeping Marcelo Tosatti
@ 2008-09-07  9:16   ` Avi Kivity
  0 siblings, 0 replies; 42+ messages in thread
From: Avi Kivity @ 2008-09-07  9:16 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm

Marcelo Tosatti wrote:
> Account pages that only have global entries. Such pages need to be
> synced on cr4 writes, but not cr3 writes.
>
> @@ -1147,18 +1148,22 @@ struct page *gva_to_page(struct kvm_vcpu
>  static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
>  			 unsigned pt_access, unsigned pte_access,
>  			 int user_fault, int write_fault, int dirty,
> -			 int *ptwrite, int largepage, gfn_t gfn,
> -			 pfn_t pfn, bool speculative)
> +			 int global, int *ptwrite, int largepage,
> +			 gfn_t gfn, pfn_t pfn, bool speculative)
>   

bool global

One option to stop the ever-increasing m_s_s() parameter list is to put 
it into pte_access.  Not sure if it's a good idea.

> ===================================================================
> --- kvm.orig/include/asm-x86/kvm_host.h
> +++ kvm/include/asm-x86/kvm_host.h
> @@ -199,6 +199,7 @@ struct kvm_mmu_page {
>  		u64 *parent_pte;               /* !multimapped */
>  		struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */
>  	};
> +	unsigned long flags;
>  };
>   

bool global;

(or: bool global : 1; if we run out of space)

>  int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
>  int kvm_age_hva(struct kvm *kvm, unsigned long hva);
>  
> +enum kvm_page_flags {
> +	KVM_PG_global,
> +};
> +
> +#define KVMPGFLAG(name)							\
> +static inline int kvm_page_##name(struct kvm_mmu_page *sp)		\
> +	{ return test_bit(KVM_PG_##name, &sp->flags); }			\
> +static inline void kvm_set_pg_##name(struct kvm_mmu_page *sp)		\
> +	{ set_bit(KVM_PG_##name, &sp->flags); }				\
> +static inline void kvm_clear_pg_##name(struct kvm_mmu_page *sp)		\
> +	{ clear_bit(KVM_PG_##name, &sp->flags); }			\
> +static inline int kvm_test_clear_pg_##name(struct kvm_mmu_page *sp)	\
> +	{ return test_and_clear_bit(KVM_PG_##name, &sp->flags); }
> +
> +KVMPGFLAG(global);
> +
>   

Ugh.  Note you're using locked operations even though the structure is 
protected by the mmu lock.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 07/13] KVM: MMU: mode specific sync_page
  2008-09-06 18:48 ` [patch 07/13] KVM: MMU: mode specific sync_page Marcelo Tosatti
@ 2008-09-07  9:52   ` Avi Kivity
  2008-09-08  6:03     ` Marcelo Tosatti
  0 siblings, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2008-09-07  9:52 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: KVM list

Marcelo Tosatti wrote:
> Examine guest pagetable and bring the shadow back in sync. At the moment
> sync_page is simplistic and only cares about shadow present entries
> whose gfn remains unchanged.
>
> It might be worthwhile to prepopulate the shadow in advance.
>
> FIXME: the RW->RO transition needs a local TLB flush.
>
>   

Yes!

>  static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  					     gfn_t gfn,
>  					     gva_t gaddr,
> @@ -1536,6 +1581,7 @@ static int nonpaging_init_context(struct
>  	context->gva_to_gpa = nonpaging_gva_to_gpa;
>  	context->free = nonpaging_free;
>  	context->prefetch_page = nonpaging_prefetch_page;
> +	context->sync_page = nonpaging_sync_page;
>  	context->root_level = 0;
>  	context->shadow_root_level = PT32E_ROOT_LEVEL;
>  	context->root_hpa = INVALID_PAGE;
> @@ -1583,6 +1629,7 @@ static int paging64_init_context_common(
>  	context->page_fault = paging64_page_fault;
>  	context->gva_to_gpa = paging64_gva_to_gpa;
>  	context->prefetch_page = paging64_prefetch_page;
> +	context->sync_page = paging64_sync_page;
>  	context->free = paging_free;
>  	context->root_level = level;
>  	context->shadow_root_level = level;
> @@ -1604,6 +1651,7 @@ static int paging32_init_context(struct 
>  	context->gva_to_gpa = paging32_gva_to_gpa;
>  	context->free = paging_free;
>  	context->prefetch_page = paging32_prefetch_page;
> +	context->sync_page = paging32_sync_page;
>  	context->root_level = PT32_ROOT_LEVEL;
>  	context->shadow_root_level = PT32E_ROOT_LEVEL;
>  	context->root_hpa = INVALID_PAGE;
> @@ -1623,6 +1671,7 @@ static int init_kvm_tdp_mmu(struct kvm_v
>  	context->page_fault = tdp_page_fault;
>  	context->free = nonpaging_free;
>  	context->prefetch_page = nonpaging_prefetch_page;
> +	context->sync_page = nonpaging_sync_page;
>  	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
>  	context->root_hpa = INVALID_PAGE;
>   

Not sure this is right.

What if vcpu0 is in mode X, while vcpu1 is in mode Y.  vcpu0 writes to 
some pagetable, causing both mode X and mode Y shadows to become 
unsynced, so on the next resync (either by vcpu0 or vcpu1) we need to 
sync both modes.

Same problem with kvm_mmu_pte_write(), which right now hacks around it.

Maybe we need a ->ops member.

>  
>  
> +static int FNAME(sync_page)(struct kvm_vcpu *vcpu,
> +			    struct kvm_mmu_page *sp)
> +{
> +	int i, nr_present = 0;
> +	struct page *pt_page;
> +	pt_element_t *pt;
> +	void *gpte_kaddr;
> +
> +	pt_page = gfn_to_page_atomic(vcpu->kvm, sp->gfn);
> +	if (is_error_page(pt_page)) {
> +		kvm_release_page_clean(pt_page);
> +		return -EFAULT;
> +	}
> +
> +	gpte_kaddr = pt = kmap_atomic(pt_page, KM_USER0);
> +
> +	if (PTTYPE == 32)
> +		pt += sp->role.quadrant << PT64_LEVEL_BITS;
>   

Only works for level 1 pages (which is okay).

> +
> +	for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
> +		if (is_shadow_present_pte(sp->spt[i])) {
>   

Helper function needed for contents of inner loop.

> +			struct page *page;
> +			u64 spte;
> +			unsigned pte_access;
> +
> +			if (!is_present_pte(*pt)) {
> +				rmap_remove(vcpu->kvm, &sp->spt[i]);
> +				sp->spt[i] = shadow_notrap_nonpresent_pte;
> +				pt++;
> +				continue;
> +			}
>   

Are we missing a tlb flush?  Or will the caller take care of it?

> +
> +			pte_access = sp->role.access & FNAME(gpte_access)(vcpu, *pt);
> +			/* user */
> +			if (pte_access & ACC_USER_MASK)
> +				spte |= shadow_user_mask;
>   

There are some special cases involving cr0.wp=0 and the user mask.  so 
spte.u is not correlated exactly with gpte.u.

> +			/* guest->shadow accessed sync */
> +			if (!(*pt & PT_ACCESSED_MASK))
> +				spte &= ~PT_ACCESSED_MASK;
>   

spte shouldn't be accessible at all if gpte is not accessed, so we can 
set gpte.a on the next access (similar to spte not being writeable if 
gpte is not dirty).

> +			/* shadow->guest accessed sync */
> +			if (spte & PT_ACCESSED_MASK)
> +				set_bit(PT_ACCESSED_SHIFT, (unsigned long *)pt);
>   

host accessed and guest accessed are very different.  We shouldn't set 
host accessed unless we're sure the guest will access the page very soon.


> +			set_shadow_pte(&sp->spt[i], spte);
>   

What if permissions are reduced?

You can use PT_* instead of shadow_* as this will never be called when 
ept is active.

I'm worried about the duplication with kvm_mmu_set_pte().  Perhaps that 
can be refactored instead to be the inner loop.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 09/13] KVM: MMU: out of sync shadow core
  2008-09-06 18:48 ` [patch 09/13] KVM: MMU: out of sync shadow core Marcelo Tosatti
@ 2008-09-07 11:01   ` Avi Kivity
  2008-09-08  7:19     ` Marcelo Tosatti
  2008-09-11 13:15     ` Marcelo Tosatti
  0 siblings, 2 replies; 42+ messages in thread
From: Avi Kivity @ 2008-09-07 11:01 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm

Marcelo Tosatti wrote:
> Allow global and single-root, single-role-per-gfn leaf shadowed
> pagetables to be unsynced.
>
> Global unsync pages are saved into a per-vm array, synced on cr4/cr0 writes.
>
>   

Why not a list?

> Non-global unsync pages are linked off their root shadow page, synced 
> on cr3/cr4/cr0 writes.
>
> Some of this logic is simplistic and could be smarter (page_multimapped and
> the full root sync on higher level pagetable sharing).
>
> Also unsyncing of non-leaf nodes might be interesting (but more complicated).
>
>
>  
> +static struct kvm_mmu_page *kvm_mmu_lookup_page_root(struct kvm_vcpu *vcpu,
> +						     gfn_t gfn)
> +{
> +	unsigned index;
> +	struct hlist_head *bucket;
> +	struct kvm_mmu_page *sp;
> +	struct hlist_node *node;
> +	struct kvm *kvm = vcpu->kvm;
> +	int level = vcpu->arch.mmu.root_level;
> +	if (!is_long_mode(vcpu) && is_pae(vcpu))
> +		level--;
> +
> +	pgprintk("%s: looking for gfn %lx\n", __func__, gfn);
> +	index = kvm_page_table_hashfn(gfn);
> +	bucket = &kvm->arch.mmu_page_hash[index];
> +	hlist_for_each_entry(sp, node, bucket, hash_link)
> +		if (sp->gfn == gfn && !sp->role.metaphysical
> +		    && !sp->role.invalid && sp->role.level == level) {
> +			pgprintk("%s: found role %x\n",
> +				 __func__, sp->role.word);
> +			return sp;
> +		}
> +	return NULL;
> +}
>   

I'm worried about the complexity this (and the rest) introduces.

A possible alternative is:

- for non-leaf pages, including roots, add a 'unsync_children' flag.
- when marking a page unsync, set the flag recursively on all parents
- when switching cr3, recursively descend to locate unsynced leaves, 
clearing flags along the way
- to speed this up, put a bitmap with 1 bit per pte in the pages (512 
bits = 64 bytes)
- the bitmap can be externally allocated to save space, or not

This means we no longer have to worry about multiple roots, when a page 
acquires another root while it is unsynced, etc.


> @@ -963,8 +1112,24 @@ static struct kvm_mmu_page *kvm_mmu_get_
>  		 gfn, role.word);
>  	index = kvm_page_table_hashfn(gfn);
>  	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
> -	hlist_for_each_entry(sp, node, bucket, hash_link)
> -		if (sp->gfn == gfn && sp->role.word == role.word) {
> +	hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
> +		if (sp->gfn == gfn) {
> +			/*
> + 			 * If a pagetable becomes referenced by more than one
> + 			 * root, or has multiple roles, unsync it and disable
> + 			 * oos. For higher level pgtables the entire tree
> + 			 * has to be synced.
> + 			 */
> +			if (sp->root_gfn != root_gfn) {
> +				kvm_set_pg_inuse(sp);
>   

What does inuse mean exactly?

> +				if (set_shared_mmu_page(vcpu, sp))
> +					tmp = bucket->first;
> +				kvm_clear_pg_inuse(sp);
>   

Cleared here?

> +				unsyncable = 0;
> +			}
> +			if (sp->role.word != role.word)
> +				continue;
> +
>  			mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>  			pgprintk("%s: found\n", __func__);
>  			return sp;
> --- kvm.orig/include/asm-x86/kvm_host.h
> +++ kvm/include/asm-x86/kvm_host.h
> @@ -179,6 +179,10 @@ union kvm_mmu_page_role {
>  struct kvm_mmu_page {
>  	struct list_head link;
>  	struct hlist_node hash_link;
> +	/* FIXME: one list_head is enough */
> +	struct list_head unsync_pages;
> +	struct list_head oos_link;
>   

That's okay, we may allow OOS roots one day.

> +	gfn_t root_gfn; /* root this pagetable belongs to, -1 if multimapped */
>  
>  	/*
>  	 * The following two entries are used to key the shadow page in the
> @@ -362,6 +366,8 @@ struct kvm_arch{
>  	unsigned int n_requested_mmu_pages;
>  	unsigned int n_alloc_mmu_pages;
>  	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> +	struct kvm_mmu_page *oos_global_pages[7];
> +	unsigned oos_global_idx;
>   

What does the index mean?  An lru pointer?

I became a little unsynced myself reading the patch.  It's very complex.

We need to change something to keep it maintainable.  Either a better 
data structure, or disallowing a parent to be zapped while any of its 
children are alive.

What happens when a sp->global changes its value while a page is oos?

Can we even detect a nonglobal->global change?  Maybe instead of a flag, 
add a counter for active ptes and global ptes, and a page is global if 
the counters match.  However most likely it doesn't matter at all.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 12/13] KVM: x86: trap invlpg
  2008-09-06 18:48 ` [patch 12/13] KVM: x86: trap invlpg Marcelo Tosatti
@ 2008-09-07 11:14   ` Avi Kivity
  0 siblings, 0 replies; 42+ messages in thread
From: Avi Kivity @ 2008-09-07 11:14 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm

Marcelo Tosatti wrote:
> With pages out of sync invlpg needs to be trapped. For now simply nuke
> the entry.
>
>   

We could kvm_mmu_pte_write() it, with some modifications.  Probably 
worth it for the guest breaking cow.


>  	[EXIT_REASON_MSR_WRITE]               = handle_wrmsr,
>  	[EXIT_REASON_PENDING_INTERRUPT]       = handle_interrupt_window,
>  	[EXIT_REASON_HLT]                     = handle_halt,
> +	[EXIT_REASON_INVLPG]		      = handle_invlpg,
>  	[EXIT_REASON_VMCALL]                  = handle_vmcall,
>  	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
>  	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,

We shouldn't do that if ept or npt is enabled.

> Index: kvm/arch/x86/kvm/mmu.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/mmu.c
> +++ kvm/arch/x86/kvm/mmu.c
> @@ -889,6 +889,12 @@ static int nonpaging_sync_page(struct kv
>  	return 1;
>  }
>  
> +static void nonpaging_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
> +{
> +	/* should never happen */
> +	WARN_ON(1);
> +}
>   

Nevertheless, invlpg is legal in real mode.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 00/13] RFC: out of sync shadow
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
                   ` (12 preceding siblings ...)
  2008-09-06 18:48 ` [patch 13/13] KVM: MMU: ignore multiroot when unsyncing global pages Marcelo Tosatti
@ 2008-09-07 11:22 ` Avi Kivity
  2008-09-08  7:23   ` Marcelo Tosatti
  2008-09-12  4:05 ` David S. Ahern
  14 siblings, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2008-09-07 11:22 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm

Marcelo Tosatti wrote:
> Keep shadow pages temporarily out of sync, allowing more efficient guest
> PTE updates in comparison to trap-emulate + unprotect heuristics. Stolen
> from Xen :)
>
> This version only allows leaf pagetables to go out of sync, for
> simplicity, but can be enhanced.
>
> VMX "bypass_guest_pf" feature on prefetch_page breaks it (since new
> PTE writes need no TLB flush, I assume). Not sure if its worthwhile to
> convert notrap_nonpresent -> trap_nonpresent on unshadow or just go 
> for unconditional nonpaging_prefetch_page.
>
>   

Doesn't it kill bypass_guest_pf completely?  As soon as we unsync a 
page, we can't have nontrapping nonpresent ptes in it.

We can try conversion on unsync; it does speed up demand paging.

> * Kernel builds on 4-way 64-bit guest improve 10% (+ 3.7% for
> get_user_pages_fast). 
>
> * lmbench's "lat_proc fork" microbenchmark latency is 40% lower (a
> shadow worst scenario test).
>
> * The RHEL3 highpte kscand hangs go from 5+ seconds to < 1 second.
>
> * Windows 2003 Server, 32-bit PAE, DDK build (build -cPzM 3):
>
> Windows 2003 Checked 64 Bit Build Environment, 256M RAM
> 1-vcpu:
> vanilla + gup_fast:         oos
> 0:04:37.375                 0:03:28.047     (- 25%)
>
> 2-vcpus:
> vanilla + gup_fast          oos
> 0:02:32.000                 0:01:56.031     (- 23%)
>
>
> Windows 2003 Checked Build Environment, 1GB RAM
> 2-vcpus:
> vanilla + fast_gup         oos
> 0:02:26.078                0:01:50.110      (- 24%)
>
> 4-vcpus:
> vanilla + fast_gup         oos
> 0:01:59.266                0:01:29.625      (- 25%)
>
>   

Impressive results.

> And I think other optimizations are possible now, for example the guest
> can be responsible for remote TLB flushing on kvm_mmu_pte_write().
>   

But kvm_mmu_pte_write() is no longer called, since we unsync?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 02/13] KVM: MMU: switch to get_user_pages_fast
  2008-09-07  8:45   ` Avi Kivity
@ 2008-09-07 20:44     ` Marcelo Tosatti
  2008-09-08 14:53       ` Avi Kivity
  2008-09-09 12:21     ` Andrea Arcangeli
  1 sibling, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-07 20:44 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Andrea Arcangeli

On Sun, Sep 07, 2008 at 11:45:41AM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Avoid mmap_sem / pt lock acquision if the pagetables are present. The
>> improvement for hugepage backed guests is more significant, since pte
>> walk + page grab for such mappings is serialized by mm->page_table_lock.
>>
>> CC: Andrea Arcangeli <andrea@qumranet.com>
>>
>>   
>
> I'd like to apply this.  Andrea, can you verify that mmu notifiers  
> interaction is okay?
>
>>   static int kvm_vm_fault(struct vm_area_struct *vma, struct vm_fault 
>> *vmf)
>>  {
>> +	struct page *page[1];
>> +	unsigned long addr;
>> +	int npages;
>> +	gfn_t gfn = vmf->pgoff;
>>  	struct kvm *kvm = vma->vm_file->private_data;
>> -	struct page *page;
>>  -	if (!kvm_is_visible_gfn(kvm, vmf->pgoff))
>> +	addr = gfn_to_hva(kvm, gfn);
>> +	if (kvm_is_error_hva(addr))
>>  		return VM_FAULT_SIGBUS;
>> -	page = gfn_to_page(kvm, vmf->pgoff);
>> -	if (is_error_page(page)) {
>> -		kvm_release_page_clean(page);
>> +
>> +	npages = get_user_pages(current, current->mm, addr, 1, 1, 0, page,
>> +				NULL);
>> +	if (unlikely(npages != 1))
>>  		return VM_FAULT_SIGBUS;
>> -	}
>> -	vmf->page = page;
>> +
>> +	vmf->page = page[0];
>>  	return 0;
>>  }
>>    
>
> Why this change?

Because get_user_pages_fast grabs mmap_sem if necessary, but ->vm_fault
already holds it.

Deadlock:

CPU0                        CPU1    

down_read(mmap_sem)
kvm_vm_fault
                            down_write(mmap_sem)
gfn_to_page
get_user_pages_fast
down_read(mmap_sem)



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 05/13] KVM: MMU: do not write-protect large mappings
  2008-09-07  9:04   ` Avi Kivity
@ 2008-09-07 20:54     ` Marcelo Tosatti
  0 siblings, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-07 20:54 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

On Sun, Sep 07, 2008 at 12:04:59PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> There is not much point in write protecting large mappings. This
>> can only happen when a page is shadowed during the window between
>> is_largepage_backed and mmu_lock acquision. Zap the entry instead, so
>> the next pagefault will find a shadowed page via is_largepage_backed and
>> fallback to 4k translations.
>>
>> Simplifies out of sync shadow.
>>
>> @@ -1222,6 +1221,14 @@ static void mmu_set_spte(struct kvm_vcpu
>>  			if (write_fault)
>>  				*ptwrite = 1;
>>  		}
>> +		/*
>> + 		 * Do not create write protected large translations.
>> + 		 */
>> +		if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) {
>> +			spte = shadow_trap_nonpresent_pte;
>> +			was_writeble = 0;
>> +			*ptwrite = 0;
>> +		}
>>  	}
>>    
>
> Why are you clearing was_writable?

No idea. Its wrong.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 07/13] KVM: MMU: mode specific sync_page
  2008-09-07  9:52   ` Avi Kivity
@ 2008-09-08  6:03     ` Marcelo Tosatti
  2008-09-08  9:50       ` Avi Kivity
  0 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-08  6:03 UTC (permalink / raw)
  To: Avi Kivity; +Cc: KVM list

On Sun, Sep 07, 2008 at 12:52:21PM +0300, Avi Kivity wrote:
> What if vcpu0 is in mode X, while vcpu1 is in mode Y.  vcpu0 writes to  
> some pagetable, causing both mode X and mode Y shadows to become  
> unsynced, so on the next resync (either by vcpu0 or vcpu1) we need to  
> sync both modes.

>From the oos core patch:

-       hlist_for_each_entry(sp, node, bucket, hash_link)
-               if (sp->gfn == gfn && sp->role.word == role.word) {
+       hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
+               if (sp->gfn == gfn) {
+                       /*
+                        * If a pagetable becomes referenced by more than one
+                        * root, or has multiple roles, unsync it and disable
+                        * oos. For higher level pgtables the entire tree
+                        * has to be synced.
+                        */
+                       if (sp->root_gfn != root_gfn) {
+                               kvm_set_pg_inuse(sp);
+                               if (set_shared_mmu_page(vcpu, sp))
+                                       tmp = bucket->first;
+                               kvm_clear_pg_inuse(sp);
+                               unsyncable = 0;
+                       }

So as soon as a pagetable is shadowed with different modes, it's resynced 
and unsyncing is disabled.

> Same problem with kvm_mmu_pte_write(), which right now hacks around it.
>
> Maybe we need a ->ops member.

>> +			if (!is_present_pte(*pt)) {
>> +				rmap_remove(vcpu->kvm, &sp->spt[i]);
>> +				sp->spt[i] = shadow_notrap_nonpresent_pte;
>> +				pt++;
>> +				continue;
>> +			}
>>   
>
> Are we missing a tlb flush?  Or will the caller take care of it?

Yes, there's a local TLB flush missing, which can be collapsed into a
single kvm_x86_ops->tlb_flush in the caller.

>> +
>> +			pte_access = sp->role.access & FNAME(gpte_access)(vcpu, *pt);
>> +			/* user */
>> +			if (pte_access & ACC_USER_MASK)
>> +				spte |= shadow_user_mask;
>>   
>
> There are some special cases involving cr0.wp=0 and the user mask.  so  
> spte.u is not correlated exactly with gpte.u.

How come?

>> +			/* guest->shadow accessed sync */
>> +			if (!(*pt & PT_ACCESSED_MASK))
>> +				spte &= ~PT_ACCESSED_MASK;
>>   
>
> spte shouldn't be accessible at all if gpte is not accessed, so we can  
> set gpte.a on the next access (similar to spte not being writeable if  
> gpte is not dirty).

Right. Perhaps accessed bit synchronization to guest could be performed
lazily somehow, so as to avoid a vmexit on every first page access.

>> +			/* shadow->guest accessed sync */
>> +			if (spte & PT_ACCESSED_MASK)
>> +				set_bit(PT_ACCESSED_SHIFT, (unsigned long *)pt);
>>   
>
> host accessed and guest accessed are very different.  We shouldn't set  
> host accessed unless we're sure the guest will access the page very soon.
>
>> +			set_shadow_pte(&sp->spt[i], spte);
>>   
>
> What if permissions are reduced?

Then a local TLB flush is needed. Flushing the TLB's of remote vcpus
should be done by the guest AFAICS.

> You can use PT_* instead of shadow_* as this will never be called when  
> ept is active.
>
> I'm worried about the duplication with kvm_mmu_set_pte().  Perhaps that  
> can be refactored instead to be the inner loop.

Will look into that.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 01/13] x86/mm: get_user_pages_fast_atomic
  2008-09-07  8:42   ` Avi Kivity
@ 2008-09-08  6:10     ` Marcelo Tosatti
  2008-09-08 14:20       ` Avi Kivity
  0 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-08  6:10 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

On Sun, Sep 07, 2008 at 11:42:18AM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> From: Nick Piggin <nickpiggin@yahoo.com.au>
>>
>> Provide a lockless pagetable walk function without fallback to mmap_sem
>> on error.
>>   
>
> I would like to avoid this if possible.  Not only is this a change to  
> the core (with backporting headaches),

Chris mentioned that the backport could use down_read_trylock(mmap_sem), 
and zap the page on failure. It's a simple solution and it should be rare
for mmap_sem to be acquired in write mode.

> if we resync in atomic context  
> this can mean a long time spent with preemption disabled.

The resync time for a single page is comparable to prefetch_page (note
that prefetch_page with direct access via gfn_to_page_atomic is about
50% faster than the current one) plus the gfn->pfn pagetable walks.

It could simply resched based on need_resched after each page synced.
Would that cover your concern?

BTW, it might be interesting to spin_needbreak after resyncing a certain
number of pages.

> We might get around the need by dropping the lock when we resync, fetch  
> the gfns without the lock, and after reacquiring it check whether we can  
> proceed or whether we need to abort and let the guest retry.  We can  
> probably proceed unless one of two things have happened: an mmu page was  
> zapped, or out page was oos'ed while we were resyncing it.

This sounds more complicated. First you have to grab the lock twice for
each page synced. Secondly, the abort case due to oos'ed while resyncing
means the page has to be zapped.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 09/13] KVM: MMU: out of sync shadow core
  2008-09-07 11:01   ` Avi Kivity
@ 2008-09-08  7:19     ` Marcelo Tosatti
  2008-09-08 14:51       ` Avi Kivity
  2008-09-11 13:15     ` Marcelo Tosatti
  1 sibling, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-08  7:19 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

On Sun, Sep 07, 2008 at 02:01:42PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Allow global and single-root, single-role-per-gfn leaf shadowed
>> pagetables to be unsynced.
>>
>> Global unsync pages are saved into a per-vm array, synced on cr4/cr0 writes.
>>
>>   
>
> Why not a list?

Could be.

> I'm worried about the complexity this (and the rest) introduces.
>
> A possible alternative is:
>
> - for non-leaf pages, including roots, add a 'unsync_children' flag.
> - when marking a page unsync, set the flag recursively on all parents
> - when switching cr3, recursively descend to locate unsynced leaves,  
> clearing flags along the way
> - to speed this up, put a bitmap with 1 bit per pte in the pages (512  
> bits = 64 bytes)
> - the bitmap can be externally allocated to save space, or not
>
> This means we no longer have to worry about multiple roots, when a page  
> acquires another root while it is unsynced, etc.

I thought about that when you first mentioned it, but it seems more
complex than the current structure. Remember you have to clean the
unsynced flag on resync, which means walking up the parents verifying if
this is the last unsynced children.

Other than the bitmap space.

And see comments about multiple roles below.

>> @@ -963,8 +1112,24 @@ static struct kvm_mmu_page *kvm_mmu_get_
>>  		 gfn, role.word);
>>  	index = kvm_page_table_hashfn(gfn);
>>  	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
>> -	hlist_for_each_entry(sp, node, bucket, hash_link)
>> -		if (sp->gfn == gfn && sp->role.word == role.word) {
>> +	hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
>> +		if (sp->gfn == gfn) {
>> +			/*
>> + 			 * If a pagetable becomes referenced by more than one
>> + 			 * root, or has multiple roles, unsync it and disable
>> + 			 * oos. For higher level pgtables the entire tree
>> + 			 * has to be synced.
>> + 			 */
>> +			if (sp->root_gfn != root_gfn) {
>> +				kvm_set_pg_inuse(sp);
>>   
>
> What does inuse mean exactly?

That we're going to access struct kvm_mmu_page, so kvm_sync_page won't
free it (also used for global->nonglobal resync).

>> +				if (set_shared_mmu_page(vcpu, sp))
>> +					tmp = bucket->first;
>> +				kvm_clear_pg_inuse(sp);
>>   
>
> Cleared here?

Can't be zapped after this point.

>
>> +				unsyncable = 0;
>> +			}
>> +			if (sp->role.word != role.word)
>> +				continue;
>> +
>>  			mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>>  			pgprintk("%s: found\n", __func__);
>>  			return sp;
>> --- kvm.orig/include/asm-x86/kvm_host.h
>> +++ kvm/include/asm-x86/kvm_host.h
>> @@ -179,6 +179,10 @@ union kvm_mmu_page_role {
>>  struct kvm_mmu_page {
>>  	struct list_head link;
>>  	struct hlist_node hash_link;
>> +	/* FIXME: one list_head is enough */
>> +	struct list_head unsync_pages;
>> +	struct list_head oos_link;
>>   
>
> That's okay, we may allow OOS roots one day.
>
>> +	gfn_t root_gfn; /* root this pagetable belongs to, -1 if multimapped */
>>   	/*
>>  	 * The following two entries are used to key the shadow page in the
>> @@ -362,6 +366,8 @@ struct kvm_arch{
>>  	unsigned int n_requested_mmu_pages;
>>  	unsigned int n_alloc_mmu_pages;
>>  	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
>> +	struct kvm_mmu_page *oos_global_pages[7];
>> +	unsigned oos_global_idx;
>>   
>
> What does the index mean?  An lru pointer?

Yes, last saved index into array.

> I became a little unsynced myself reading the patch.  It's very complex.

Can you go into detail? Worrying about multiple roots is more about
code change (passing root_gfn down to mmu_get_page etc) than structural
complexity I think. It boils down to

                    if (sp->root_gfn != root_gfn) {
                        kvm_set_pg_inuse(sp);
                        if (set_shared_mmu_page(vcpu, sp))
                            tmp = bucket->first;
                        kvm_clear_pg_inuse(sp);
                    }

And this also deals with the pagetable with shadows in different
modes/roles case. You'd still have to deal with that by keeping unsync
information all the way up to root.

> We need to change something to keep it maintainable.  Either a better  
> data structure, 

> or disallowing a parent to be zapped while any of its  
> children are alive.

What is the problem with that? And what the alternative would be, 
to zap all children first?

So more details please, what exactly is annoying you:

- Awareness of multiple roots in the current form ? I agree its
  not very elegant.
- The fact that hash table bucket and active_mmu_page
  for_each_entry_safe walks are unsafe because several list
  entries (the unsynced leafs) can be deleted ?

> What happens when a sp->global changes its value while a page is oos?

Its resynced on global->nonglobal change.

> Can we even detect a nonglobal->global change?  Maybe instead of a flag,  
> add a counter for active ptes and global ptes, and a page is global if  
> the counters match.  However most likely it doesn't matter at all.

Yeah, it seems nonglobal->global change is quite uncommon.

Oh, and another argument in favour of atomic resync is that you can
do it from mmu_get_page (for multiple role case), and from within
mmu_set_spte (for global->nonglobal change).

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 00/13] RFC: out of sync shadow
  2008-09-07 11:22 ` [patch 00/13] RFC: out of sync shadow Avi Kivity
@ 2008-09-08  7:23   ` Marcelo Tosatti
  2008-09-08 14:56     ` Avi Kivity
  0 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-08  7:23 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

On Sun, Sep 07, 2008 at 02:22:47PM +0300, Avi Kivity wrote:
>> VMX "bypass_guest_pf" feature on prefetch_page breaks it (since new
>> PTE writes need no TLB flush, I assume). Not sure if its worthwhile to
>> convert notrap_nonpresent -> trap_nonpresent on unshadow or just go  
>> for unconditional nonpaging_prefetch_page.
>>
>>   
>
> Doesn't it kill bypass_guest_pf completely?  As soon as we unsync a  
> page, we can't have nontrapping nonpresent ptes in it.

Yes, that's what I meant.

> We can try convertion on unsync, it does speed up demand paging.
>
>> And I think other optimizations are possible now, for example the guest
>> can be responsible for remote TLB flushing on kvm_mmu_pte_write().
>>   
>
> But kvm_mmu_pte_write() is no longer called, since we unsync?

On higher level pagetables it still is. The IPI functions show high up 
in profiling (probably more due to write protection, but still).

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 07/13] KVM: MMU: mode specific sync_page
  2008-09-08  6:03     ` Marcelo Tosatti
@ 2008-09-08  9:50       ` Avi Kivity
  0 siblings, 0 replies; 42+ messages in thread
From: Avi Kivity @ 2008-09-08  9:50 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: KVM list

Marcelo Tosatti wrote:
> On Sun, Sep 07, 2008 at 12:52:21PM +0300, Avi Kivity wrote:
>   
>> What if vcpu0 is in mode X, while vcpu1 is in mode Y.  vcpu0 writes to  
>> some pagetable, causing both mode X and mode Y shadows to become  
>> unsynced, so on the next resync (either by vcpu0 or vcpu1) we need to  
>> sync both modes.
>>     
>
> From the oos core patch:
>
> -       hlist_for_each_entry(sp, node, bucket, hash_link)
> -               if (sp->gfn == gfn && sp->role.word == role.word) {
> +       hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
> +               if (sp->gfn == gfn) {
> +                       /*
> +                        * If a pagetable becomes referenced by more than one
> +                        * root, or has multiple roles, unsync it and disable
> +                        * oos. For higher level pgtables the entire tree
> +                        * has to be synced.
> +                        */
> +                       if (sp->root_gfn != root_gfn) {
> +                               kvm_set_pg_inuse(sp);
> +                               if (set_shared_mmu_page(vcpu, sp))
> +                                       tmp = bucket->first;
> +                               kvm_clear_pg_inuse(sp);
> +                               unsyncable = 0;
> +                       }
>
> So as soon as a pagetable is shadowed with different modes, its resynced 
> and unsyncing is disabled.
>
>   

Okay.  But the complexity here, esp. with rarely used cases like 
multiple mode shadows, is frightening.

>>> +
>>> +			pte_access = sp->role.access & FNAME(gpte_access)(vcpu, *pt);
>>> +			/* user */
>>> +			if (pte_access & ACC_USER_MASK)
>>> +				spte |= shadow_user_mask;
>>>   
>>>       
>> There are some special cases involving cr0.wp=0 and the user mask.  so  
>> spte.u is not correlated exactly with gpte.u.
>>     
>
> How come?
>
>   

When cr0.wp=0, the cpu ignores pte.w for cpl=0 accesses.  kvm requires 
cr0.wp=1 (since we need to write protect pages, for many reasons, like 
emulating pte.dirty).  This is how we handle pte.u=1 + pte.w=0:

- for cpl 0 accesses, we set spte.w=1 (to allow the write) and spte.u=0 
(to forbid cpl>0 accesses)
- for cpl>0 accesses, we set spte.w=0 (to forbid userspace write 
accesses) and spte.u=1 (to allow cpl>0 read accesses)

this works well except if the accesses keep alternating between 
userspace and kernel.

>>> +			/* guest->shadow accessed sync */
>>> +			if (!(*pt & PT_ACCESSED_MASK))
>>> +				spte &= ~PT_ACCESSED_MASK;
>>>   
>>>       
>> spte shouldn't be accessible at all if gpte is not accessed, so we can  
>> set gpte.a on the next access (similar to spte not being writeable if  
>> gpte is not dirty).
>>     
>
> Right. Perhaps accessed bit synchronization to guest could be performed
> lazily somehow, so as to avoid a vmexit on every first page access.
>   

I don't think this is doable (well you can do it if you make the guest 
page table not present, but then even reading the accessed bit faults).

>>> +			set_shadow_pte(&sp->spt[i], spte);
>>>   
>>>       
>> What if permissions are reduced?
>>     
>
> Then a local TLB flush is needed. Flushing the TLB's of remote vcpus
> should be done by the guest AFAICS.
>
>   

hm.  It depends on why they are reduced.  If the page became shadowed, 
then we are responsible.

Don't think this is the case here, so local flush is likely sufficient.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 01/13] x86/mm: get_user_pages_fast_atomic
  2008-09-08  6:10     ` Marcelo Tosatti
@ 2008-09-08 14:20       ` Avi Kivity
  0 siblings, 0 replies; 42+ messages in thread
From: Avi Kivity @ 2008-09-08 14:20 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm

Marcelo Tosatti wrote:
> On Sun, Sep 07, 2008 at 11:42:18AM +0300, Avi Kivity wrote:
>   
>> Marcelo Tosatti wrote:
>>     
>>> From: Nick Piggin <nickpiggin@yahoo.com.au>
>>>
>>> Provide a lockless pagetable walk function without fallback to mmap_sem
>>> on error.
>>>   
>>>       
>> I would like to avoid this if possible.  Not only is this a change to  
>> the core (with backporting headaches),
>>     
>
> Chris mentioned that the backport could use down_read_trylock(mmap_sem), 
> and zap the page on failure. It's a simple solution and it should be rare
> for mmap_sem to be acquired in write mode.
>
>   

Yes. Clever.

>> if we resync in atomic context  
>> this can mean a long time spent with preemption disabled.
>>     
>
> The resync time for a single page is comparable to prefetch_page (note
> that prefetch_page with direct access via gfn_to_page_atomic is about
> 50% faster than the current one) plus the gfn->pfn pagetable walks.
>   

These could be very expensive as the gfn->pfn mapping is essentially 
random and too big to be cached. 512 cache misses is a lot of time - 
perhaps upwards of 50 microseconds.

Snapshotting is a must here IMO -- and it avoids the need for a walk 
completely.

> It could simply resched based on need_resched after each page synced.
> Would that cover your concern?
>
>   

I guess it's better than nothing.

> BTW, it might be interesting to spin_needbreak after resyncing a certain
> number of pages.
>
>   
>> We might get around the need by dropping the lock when we resync, fetch  
>> the gfns without the lock, and after reacquiring it check whether we can  
>> proceed or whether we need to abort and let the guest retry.  We can  
>> probably proceed unless one of two things have happened: an mmu page was  
>>> zapped, or our page was oos'ed while we were resyncing it.
>>     
>
> This sounds more complicated. First you have to grab the lock twice for
> each page synced. Secondly, the abort case due to oos'ed while resyncing
> means the page has to be zapped.
>   

It is complicated, yes.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 09/13] KVM: MMU: out of sync shadow core
  2008-09-08  7:19     ` Marcelo Tosatti
@ 2008-09-08 14:51       ` Avi Kivity
  2008-09-11  8:19         ` Marcelo Tosatti
  0 siblings, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2008-09-08 14:51 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm

Marcelo Tosatti wrote:
>> I'm worried about the complexity this (and the rest) introduces.
>>
>> A possible alternative is:
>>
>> - for non-leaf pages, including roots, add a 'unsync_children' flag.
>> - when marking a page unsync, set the flag recursively on all parents
>> - when switching cr3, recursively descend to locate unsynced leaves,  
>> clearing flags along the way
>> - to speed this up, put a bitmap with 1 bit per pte in the pages (512  
>> bits = 64 bytes)
>> - the bitmap can be externally allocated to save space, or not
>>
>> This means we no longer have to worry about multiple roots, when a page  
>> acquires another root while it is unsynced, etc.
>>     
>
> I thought about that when you first mentioned it, but it seems more
> complex than the current structure. Remember you have to clean the
> unsynced flag on resync, which means walking up the parents verifying if
> this is the last unsynced children.
>   

No, if you have a false positive you can simply ignore it.

> Other than the bitmap space.
>   

The bitmap space could be stored in a separate structure.  
Alternatively, put a few u16s with indexes into the page header.  Would 
be faster to walk as well, though less general.

> And see comments about multiple roles below.
>
>   
>>> @@ -963,8 +1112,24 @@ static struct kvm_mmu_page *kvm_mmu_get_
>>>  		 gfn, role.word);
>>>  	index = kvm_page_table_hashfn(gfn);
>>>  	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
>>> -	hlist_for_each_entry(sp, node, bucket, hash_link)
>>> -		if (sp->gfn == gfn && sp->role.word == role.word) {
>>> +	hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
>>> +		if (sp->gfn == gfn) {
>>> +			/*
>>> + 			 * If a pagetable becomes referenced by more than one
>>> + 			 * root, or has multiple roles, unsync it and disable
>>> + 			 * oos. For higher level pgtables the entire tree
>>> + 			 * has to be synced.
>>> + 			 */
>>> +			if (sp->root_gfn != root_gfn) {
>>> +				kvm_set_pg_inuse(sp);
>>>   
>>>       
>> What does inuse mean exactly?
>>     
>
> That we're going to access struct kvm_mmu_page, so kvm_sync_page won't
> free it (also used for global->nonglobal resync).
>
>   

Couldn't it be passed as a parameter?

>> I became a little unsynced myself reading the patch.  It's very complex.
>>     
>
> Can you go into detail? Worrying about multiple roots is more about
> code change (passing root_gfn down to mmu_get_page etc) than structural
> complexity I think. It boils down to
>
>                     if (sp->root_gfn != root_gfn) {
>                         kvm_set_pg_inuse(sp);
>                         if (set_shared_mmu_page(vcpu, sp))
>                             tmp = bucket->first;
>                         kvm_clear_pg_inuse(sp);
>                     }
>
> And this also deals with the pagetable with shadows in different
> modes/roles case. You'd still have to deal with that by keeping unsync
> information all the way up to root.
>
>   

I'm worried about the amount of state we add.  Whether a page is 
single-root or multi-root, if it's in the same mode or multiple modes.  
The problems with the large amount of state is that the number of 
possible state transitions increases rapidly.

So far we treat each page completely independently of other pages (apart 
from the connectivity pointers), so we avoid the combinatorial 
explosion.  The tree walk approach keeps that (at the expense of some 
efficiency, unfortunately).

>> or disallowing a parent to be zapped while any of its  
>> children are alive.
>>     
>
> What is the problem with that? 

It reduces the mmu flexibility.  If we (say) introduce an lru algorithm, 
it is orthogonal to everything else in the mmu.  If we have a root/child 
dependency, the lru has to know.

> And what the alternative would be, 
> to zap all children first?
>   

That has the disadvantage of allowing very bad corner cases if we are 
forced to zap a root.

I'd really like to avoid bad worst cases.

> So more details please, what exactly is annoying you:
>
> - Awareness of multiple roots in the current form ? I agree its
>   not very elegant.
>   

Yes.

> - The fact that hash table bucket and active_mmu_page
>   for_each_entry_safe walks are unsafe because several list
>   entries (the unsynced leafs) can be deleted ?
>
>   

Hadn't even considered that...

What worries me most is that everything is interconnected: multiple 
modes, cr3 switch, out-of-sync, zapping via the inuse flag.  It's very 
difficult for me to understand, what about someone new?

We need to make this fit better.  We need to morph some mmu 
infrastructure to something else, but we can't keep adding complexity.

> Oh, and another argument in favour of atomic resync is that you can
> do it from mmu_get_page (for multiple role case), and from within
> mmu_set_spte (for global->nonglobal change).
>   

I'll be more comfortable with atomic resync if we have snapshots as a 
means not to require so many walks in an atomic context.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 02/13] KVM: MMU: switch to get_user_pages_fast
  2008-09-07 20:44     ` Marcelo Tosatti
@ 2008-09-08 14:53       ` Avi Kivity
  0 siblings, 0 replies; 42+ messages in thread
From: Avi Kivity @ 2008-09-08 14:53 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Andrea Arcangeli

Marcelo Tosatti wrote:
>>
>>>   static int kvm_vm_fault(struct vm_area_struct *vma, struct vm_fault 
>>> *vmf)
>>>  {
>>> +	struct page *page[1];
>>> +	unsigned long addr;
>>> +	int npages;
>>> +	gfn_t gfn = vmf->pgoff;
>>>  	struct kvm *kvm = vma->vm_file->private_data;
>>> -	struct page *page;
>>>  -	if (!kvm_is_visible_gfn(kvm, vmf->pgoff))
>>> +	addr = gfn_to_hva(kvm, gfn);
>>> +	if (kvm_is_error_hva(addr))
>>>  		return VM_FAULT_SIGBUS;
>>> -	page = gfn_to_page(kvm, vmf->pgoff);
>>> -	if (is_error_page(page)) {
>>> -		kvm_release_page_clean(page);
>>> +
>>> +	npages = get_user_pages(current, current->mm, addr, 1, 1, 0, page,
>>> +				NULL);
>>> +	if (unlikely(npages != 1))
>>>  		return VM_FAULT_SIGBUS;
>>> -	}
>>> -	vmf->page = page;
>>> +
>>> +	vmf->page = page[0];
>>>  	return 0;
>>>  }
>>>    
>>>       
>> Why this change?
>>     
>
> Because get_user_pages_fast grabs mmap_sem if necessary, but ->vm_fault
> already holds it.
>
>   

Right.  I think it merits its own patch (and changelog entry) since it 
differs in spirit from the rest.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 00/13] RFC: out of sync shadow
  2008-09-08  7:23   ` Marcelo Tosatti
@ 2008-09-08 14:56     ` Avi Kivity
  0 siblings, 0 replies; 42+ messages in thread
From: Avi Kivity @ 2008-09-08 14:56 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm

Marcelo Tosatti wrote:
> On Sun, Sep 07, 2008 at 02:22:47PM +0300, Avi Kivity wrote:
>   
>>> VMX "bypass_guest_pf" feature on prefetch_page breaks it (since new
>>> PTE writes need no TLB flush, I assume). Not sure if its worthwhile to
>>> convert notrap_nonpresent -> trap_nonpresent on unshadow or just go  
>>> for unconditional nonpaging_prefetch_page.
>>>
>>>   
>>>       
>> Doesn't it kill bypass_guest_pf completely?  As soon as we unsync a  
>> page, we can't have nontrapping nonpresent ptes in it.
>>     
>
> Yes, that's what I meant.
>
>   

We can convert those ptes during unsync, and back during resync.  That's 
a fairly cheap operation.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 02/13] KVM: MMU: switch to get_user_pages_fast
  2008-09-07  8:45   ` Avi Kivity
  2008-09-07 20:44     ` Marcelo Tosatti
@ 2008-09-09 12:21     ` Andrea Arcangeli
  2008-09-09 13:57       ` Avi Kivity
  1 sibling, 1 reply; 42+ messages in thread
From: Andrea Arcangeli @ 2008-09-09 12:21 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm

On Sun, Sep 07, 2008 at 11:45:41AM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Avoid mmap_sem / pt lock acquisition if the pagetables are present. The
>> improvement for hugepage backed guests is more significant, since pte
>> walk + page grab for such mappings is serialized by mm->page_table_lock.
>>
>> CC: Andrea Arcangeli <andrea@qumranet.com>
>>
>>   
>
> I'd like to apply this.  Andrea, can you verify that mmu notifiers 
> interaction is okay?

Looks great to me. Thanks!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 02/13] KVM: MMU: switch to get_user_pages_fast
  2008-09-09 12:21     ` Andrea Arcangeli
@ 2008-09-09 13:57       ` Avi Kivity
  0 siblings, 0 replies; 42+ messages in thread
From: Avi Kivity @ 2008-09-09 13:57 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Andrea Arcangeli, kvm

Andrea Arcangeli wrote:
> On Sun, Sep 07, 2008 at 11:45:41AM +0300, Avi Kivity wrote:
>   
>> Marcelo Tosatti wrote:
>>     
>>> Avoid mmap_sem / pt lock acquisition if the pagetables are present. The
>>> improvement for hugepage backed guests is more significant, since pte
>>> walk + page grab for such mappings is serialized by mm->page_table_lock.
>>>
>>> CC: Andrea Arcangeli <andrea@qumranet.com>
>>>
>>>   
>>>       
>> I'd like to apply this.  Andrea, can you verify that mmu notifiers 
>> interaction is okay?
>>     
>
> Looks great to me. Thanks!
>   

Thanks.

Marcelo, can you split the patch into two and repost?  We can get this 
in quickly.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 09/13] KVM: MMU: out of sync shadow core
  2008-09-08 14:51       ` Avi Kivity
@ 2008-09-11  8:19         ` Marcelo Tosatti
  0 siblings, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-11  8:19 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

On Mon, Sep 08, 2008 at 05:51:26PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>>> I'm worried about the complexity this (and the rest) introduces.
>>>
>>> A possible alternative is:
>>>
>>> - for non-leaf pages, including roots, add a 'unsync_children' flag.
>>> - when marking a page unsync, set the flag recursively on all parents
>>> - when switching cr3, recursively descend to locate unsynced leaves,  
>>> clearing flags along the way
>>> - to speed this up, put a bitmap with 1 bit per pte in the pages (512 
>>>  bits = 64 bytes)
>>> - the bitmap can be externally allocated to save space, or not
>>>
>>> This means we no longer have to worry about multiple roots, when a 
>>> page  acquires another root while it is unsynced, etc.
>>>     
>>
>> I thought about that when you first mentioned it, but it seems more
>> complex than the current structure. Remember you have to clean the
>> unsynced flag on resync, which means walking up the parents verifying if
>> this is the last unsynced children.
>>   
>
> No, if you have a false positive you can simply ignore it.
>
>> Other than the bitmap space.
>>   
>
> The bitmap space could be stored in a separate structure.   
> Alternatively, put a few u16s with indexes into the page header.  Would  
> be faster to walk as well, though less general.
>
>> And see comments about multiple roles below.
>>
>>   
>>>> @@ -963,8 +1112,24 @@ static struct kvm_mmu_page *kvm_mmu_get_
>>>>  		 gfn, role.word);
>>>>  	index = kvm_page_table_hashfn(gfn);
>>>>  	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
>>>> -	hlist_for_each_entry(sp, node, bucket, hash_link)
>>>> -		if (sp->gfn == gfn && sp->role.word == role.word) {
>>>> +	hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
>>>> +		if (sp->gfn == gfn) {
>>>> +			/*
>>>> + 			 * If a pagetable becomes referenced by more than one
>>>> + 			 * root, or has multiple roles, unsync it and disable
>>>> + 			 * oos. For higher level pgtables the entire tree
>>>> + 			 * has to be synced.
>>>> + 			 */
>>>> +			if (sp->root_gfn != root_gfn) {
>>>> +				kvm_set_pg_inuse(sp);
>>>>         
>>> What does inuse mean exactly?
>>>     
>>
>> That we're going to access struct kvm_mmu_page, so kvm_sync_page won't
>> free it (also used for global->nonglobal resync).
>>
>>   
>
> Couldn't it be passed as a parameter?

Yes it could.

>>> I became a little unsynced myself reading the patch.  It's very complex.
>>>     
>>
>> Can you go into detail? Worrying about multiple roots is more about
>> code change (passing root_gfn down to mmu_get_page etc) than structural
>> complexity I think. It boils down to
>>
>>                     if (sp->root_gfn != root_gfn) {
>>                         kvm_set_pg_inuse(sp);
>>                         if (set_shared_mmu_page(vcpu, sp))
>>                             tmp = bucket->first;
>>                         kvm_clear_pg_inuse(sp);
>>                     }
>>
>> And this also deals with the pagetable with shadows in different
>> modes/roles case. You'd still have to deal with that by keeping unsync
>> information all the way up to root.
>>
>>   
>
> I'm worried about the amount of state we add.  Whether a page is  
> single-root or multi-root, if it's in the same mode or multiple modes.   

You need similar complexity to handle pagetables with multiple roles
under unsync-info-on-tree approach:

@@ -993,8 +1049,15 @@ static struct kvm_mmu_page *kvm_mmu_get_
                 gfn, role.word);
        index = kvm_page_table_hashfn(gfn);
        bucket = &vcpu->kvm->arch.mmu_page_hash[index];
-       hlist_for_each_entry(sp, node, bucket, hash_link)
-               if (sp->gfn == gfn && sp->role.word == role.word) {
+       hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
+               if (sp->gfn == gfn) {
+                       if (sp->role.word != role.word) {
+                               if (kvm_page_unsync(sp))
+                                       mmu_invalidate_unsync_page(vcpu, sp);
+                               unsyncable = 1;
+                               continue;
+                       }
+
                        mmu_page_add_parent_pte(vcpu, sp, parent_pte);
                        pgprintk("%s: found\n", __func__);
                        return sp;

And such pagetables with multiple shadows will have to be marked as
"unsyncable". Which is pretty similar to what the current patchset does,
except that it accounts for multi-root other than multiple shadows.

The other option, which is to keep the multiple shadow unsync, and have
them synced by their respective users, seems much more complex than
this.

> The problems with the large amount of state is that the number of  
> possible state transitions increases rapidly.
>
> So far we treat each page completely independently of other pages (apart  
> from the connectivity pointers), so we avoid the combinatorial  
> explosion.  The tree walk approach keeps that (at the expense of some  
> efficiency, unfortunately).
>
>>> or disallowing a parent to be zapped while any of its  children are 
>>> alive.
>>>     
>>
>> What is the problem with that? 
>
> It reduces the mmu flexibility.  If we (say) introduce an lru algorithm,  
> it is orthogonal to everything else in the mmu.  If we have a root/child  
> dependency, the lru has to know.
>
>> And what the alternative would be, to zap all children first?
>>   
>
> That has the disadvantage of allowing very bad corner cases if we are  
> forced to zap a root.
>
> I'd really like to avoid bad worst cases.
>
>> So more details please, what exactly is annoying you:
>>
>> - Awareness of multiple roots in the current form ? I agree its
>>   not very elegant.
>>   
>
> Yes.
>
>> - The fact that hash table bucket and active_mmu_page
>>   for_each_entry_safe walks are unsafe because several list
>>   entries (the unsynced leafs) can be deleted ?
>>
>>   
>
> Hadn't even considered that...
>
> What worries me most is that everything is interconnected: multiple  
> modes, cr3 switch, out-of-sync, zapping via the inuse flag.  It's very  
> difficult for me to understand, what about someone new?
>
> We need to make this fit better.  We need to morph some mmu  
> infrastructure to something else, but we can't keep adding complexity.

Another point that came to mind is that, with the unsync-info-on-tree
approach, whenever a higher level pagetable is zapped all of its
unsynced children need to be zapped.

Since at the moment resync on atomic context is not an option, the only
option is to zap the children, potentially wasting a lot of shadowed
entries. This is not an issue with the current approach since unsync
children are tied to roots, not every parents all the way to root.

I'll measure how often that happens, but seems a bad side effect.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 09/13] KVM: MMU: out of sync shadow core
  2008-09-07 11:01   ` Avi Kivity
  2008-09-08  7:19     ` Marcelo Tosatti
@ 2008-09-11 13:15     ` Marcelo Tosatti
  1 sibling, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-11 13:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

On Sun, Sep 07, 2008 at 02:01:42PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Allow global and single-root, single-role-per-gfn leaf shadowed
>> pagetables to be unsynced.
>>
>> Global unsync pages are saved into a per-vm array, synced on cr4/cr0 writes.
>>
>>   
>
> Why not a list?
>
>> Non-global unsync pages are linked off their root shadow page, synced  
>> on cr3/cr4/cr0 writes.
>>
>> Some of this logic is simplistic and could be smarter (page_multimapped and
>> the full root sync on higher level pagetable sharing).
>>
>> Also unsyncing of non-leaf nodes might be interesting (but more complicated).
>>
>>
>>  +static struct kvm_mmu_page *kvm_mmu_lookup_page_root(struct kvm_vcpu 
>> *vcpu,
>> +						     gfn_t gfn)
>> +{
>> +	unsigned index;
>> +	struct hlist_head *bucket;
>> +	struct kvm_mmu_page *sp;
>> +	struct hlist_node *node;
>> +	struct kvm *kvm = vcpu->kvm;
>> +	int level = vcpu->arch.mmu.root_level;
>> +	if (!is_long_mode(vcpu) && is_pae(vcpu))
>> +		level--;
>> +
>> +	pgprintk("%s: looking for gfn %lx\n", __func__, gfn);
>> +	index = kvm_page_table_hashfn(gfn);
>> +	bucket = &kvm->arch.mmu_page_hash[index];
>> +	hlist_for_each_entry(sp, node, bucket, hash_link)
>> +		if (sp->gfn == gfn && !sp->role.metaphysical
>> +		    && !sp->role.invalid && sp->role.level == level) {
>> +			pgprintk("%s: found role %x\n",
>> +				 __func__, sp->role.word);
>> +			return sp;
>> +		}
>> +	return NULL;
>> +}
>>   
>
> I'm worried about the complexity this (and the rest) introduces.
>
> A possible alternative is:
>
> - for non-leaf pages, including roots, add a 'unsync_children' flag.
> - when marking a page unsync, set the flag recursively on all parents
> - when switching cr3, recursively descend to locate unsynced leaves,  
> clearing flags along the way
> - to speed this up, put a bitmap with 1 bit per pte in the pages (512  
> bits = 64 bytes)
> - the bitmap can be externally allocated to save space, or not
>
> This means we no longer have to worry about multiple roots, when a page  
> acquires another root while it is unsynced, etc.

While trying to implement it, found a pretty nasty issue with this. A
process can switch to a shadow root which has no present entries, whose
invisible (by cr3 switch time) children may be unsynced.

The Xen implementation gets away with that by syncing all oos pages on
cr3 switch (which are a few).

The obvious way to fix it is to resync at kvm_mmu_get_page time,
whenever grabbing an unsynced page. Which greatly reduces the
out-of-sync time of a pagetable, breaking the ability to populate via
pagefault without resync.

The singleroot scheme avoids such "invisible" unsynced pages,
all out-of-sync pages in the root in question are reachable via
sp->unsynced_pages.

Any ideas on how to get around that situation?




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 00/13] RFC: out of sync shadow
  2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
                   ` (13 preceding siblings ...)
  2008-09-07 11:22 ` [patch 00/13] RFC: out of sync shadow Avi Kivity
@ 2008-09-12  4:05 ` David S. Ahern
  2008-09-12 11:51   ` Marcelo Tosatti
  14 siblings, 1 reply; 42+ messages in thread
From: David S. Ahern @ 2008-09-12  4:05 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm

Hi Marcelo:

This patchset causes my RHEL3 guest to hang during boot at one of the
early sym53c8xx messages:

sym53c8xx: at PCI bus 0, device 5, functions 0

Using ide instead of scsi the guest proceeds farther, but inevitably
hangs as well. I've tried dropping the amount of ram to 1024 and varied
the number of vcpus as well (including 1 vcpu).

When it hangs kvm on the host is spinning on one of the cpus, and
kvm/qemu appears to be 1 thread short. For the kvm process I expect to
see 2 + Nvcpus threads (ps -C kvm -L). With this patchset I see 2 +
Nvcpus - 1. (e.g., I usually run with 4 vcpus, so there should be 6
threads. I see only 5).

I'm using kvm-git tip from a couple of days ago + this patch set. kvm
userspace comes from kvm-75. Resetting to kvm-git and the guest starts
up just fine.

david


Marcelo Tosatti wrote:
> Keep shadow pages temporarily out of sync, allowing more efficient guest
> PTE updates in comparison to trap-emulate + unprotect heuristics. Stolen
> from Xen :)
> 
> This version only allows leaf pagetables to go out of sync, for
> simplicity, but can be enhanced.
> 
> VMX "bypass_guest_pf" feature on prefetch_page breaks it (since new
> PTE writes need no TLB flush, I assume). Not sure if its worthwhile to
> convert notrap_nonpresent -> trap_nonpresent on unshadow or just go 
> for unconditional nonpaging_prefetch_page.
> 
> * Kernel builds on 4-way 64-bit guest improve 10% (+ 3.7% for
> get_user_pages_fast). 
> 
> * lmbench's "lat_proc fork" microbenchmark latency is 40% lower (a
> shadow worst scenario test).
> 
> * The RHEL3 highpte kscand hangs go from 5+ seconds to < 1 second.
> 
> * Windows 2003 Server, 32-bit PAE, DDK build (build -cPzM 3):
> 
> Windows 2003 Checked 64 Bit Build Environment, 256M RAM
> 1-vcpu:
> vanilla + gup_fast:         oos
> 0:04:37.375                 0:03:28.047     (- 25%)
> 
> 2-vcpus:
> vanilla + gup_fast          oos
> 0:02:32.000                 0:01:56.031     (- 23%)
> 
> 
> Windows 2003 Checked Build Environment, 1GB RAM
> 2-vcpus:
> vanilla + fast_gup         oos
> 0:02:26.078                0:01:50.110      (- 24%)
> 
> 4-vcpus:
> vanilla + fast_gup         oos
> 0:01:59.266                0:01:29.625      (- 25%)
> 
> And I think other optimizations are possible now, for example the guest
> can be responsible for remote TLB flushing on kvm_mmu_pte_write().
> 
> Please review.
> 
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 00/13] RFC: out of sync shadow
  2008-09-12  4:05 ` David S. Ahern
@ 2008-09-12 11:51   ` Marcelo Tosatti
  2008-09-12 15:12     ` David S. Ahern
  0 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-12 11:51 UTC (permalink / raw)
  To: David S. Ahern; +Cc: kvm

On Thu, Sep 11, 2008 at 10:05:12PM -0600, David S. Ahern wrote:
> Hi Marcelo:
> 
> This patchset causes my RHEL3 guest to hang during boot at one of the
> early sym53c8xx messages:
> 
> sym53c8xx: at PCI bus 0, device 5, functions 0
> 
> Using ide instead of scsi the guest proceeds farther, but inevitably
> hangs as well. I've tried dropping the amount of ram to 1024 and varied
> the number of vcpus as well (including 1 vcpu).
> 
> When it hangs kvm on the host is spinning on one of the cpus, and
> kvm/qemu appears to be 1 thread short. For the kvm process I expect to
> see 2 + Nvcpus threads (ps -C kvm -L). With this patchset I see 2 +
> Nvcpus - 1. (e.g., I usually run with 4 vcpus, so there should be 6
> threads. I see only 5).
> 
> I'm using kvm-git tip from a couple of days ago + this patch set. kvm
> userspace comes from kvm-75. Resetting to kvm-git and the guest starts
> up just fine.

David,

Please reload the kvm-intel module with "bypass_guest_pf=0" option.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 00/13] RFC: out of sync shadow
  2008-09-12 11:51   ` Marcelo Tosatti
@ 2008-09-12 15:12     ` David S. Ahern
  2008-09-12 18:09       ` Marcelo Tosatti
  0 siblings, 1 reply; 42+ messages in thread
From: David S. Ahern @ 2008-09-12 15:12 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm



Marcelo Tosatti wrote:
> On Thu, Sep 11, 2008 at 10:05:12PM -0600, David S. Ahern wrote:
> 
> David,
> 
> Please reload the kvm-intel module with "bypass_guest_pf=0" option.
> 
> 

DOH. You mentioned that in your description, and I forgot to disable it.
 Works fine now.

Guest behavior is amazing. After 30 min of uptime, kscand shows only
17secs of cpu usage. Before, kscand was hitting over a minute after
about 10 minutes of uptime.

david

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 00/13] RFC: out of sync shadow
  2008-09-12 15:12     ` David S. Ahern
@ 2008-09-12 18:09       ` Marcelo Tosatti
  2008-09-12 18:19         ` David S. Ahern
  0 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2008-09-12 18:09 UTC (permalink / raw)
  To: David S. Ahern; +Cc: kvm

On Fri, Sep 12, 2008 at 09:12:03AM -0600, David S. Ahern wrote:
> 
> 
> Marcelo Tosatti wrote:
> > On Thu, Sep 11, 2008 at 10:05:12PM -0600, David S. Ahern wrote:
> > 
> > David,
> > 
> > Please reload the kvm-intel module with "bypass_guest_pf=0" option.
> > 
> > 
> 
> DOH. You mentioned that in your description, and I forgot to disable it.
>  Works fine now.
> 
> Guest behavior is amazing. After 30 min of uptime, kscand shows only
> 17secs of cpu usage. Before, kscand was hitting over a minute after
> about 10 minutes of uptime.

Great. Note that it's not fully optimized for the RHEL3 highpte kscand
case, which calls invlpg for present pagetable entries.

The current patchset simply invalidates the shadow on invlpg, meaning
that the next access will cause a pagefault exit.

It can instead read the guest pte and prefault, saving one exit per
test-and-clear-accessed operation.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 00/13] RFC: out of sync shadow
  2008-09-12 18:09       ` Marcelo Tosatti
@ 2008-09-12 18:19         ` David S. Ahern
  0 siblings, 0 replies; 42+ messages in thread
From: David S. Ahern @ 2008-09-12 18:19 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm



Marcelo Tosatti wrote:
> On Fri, Sep 12, 2008 at 09:12:03AM -0600, David S. Ahern wrote:
>>
>> Marcelo Tosatti wrote:
>>> On Thu, Sep 11, 2008 at 10:05:12PM -0600, David S. Ahern wrote:
>>>
>>> David,
>>>
>>> Please reload the kvm-intel module with "bypass_guest_pf=0" option.
>>>
>>>
>> DOH. You mentioned that in your description, and I forgot to disable it.
>>  Works fine now.
>>
>> Guest behavior is amazing. After 30 min of uptime, kscand shows only
>> 17secs of cpu usage. Before, kscand was hitting over a minute after
>> about 10 minutes of uptime.
> 
> Great. Note that its not fully optimized for the RHEL3 highpte kscand
> case, which calls invlpg for present pagetable entries.
> 
> The current patchset simply invalidates the shadow on invlpg, meaning
> that the next access will cause a pagefault exit.
> 
> It can instead read the guest pte and prefault, saving one exit per
> test-and-clear-accessed operation.
> 

Meaning performance will get even better? Sweet.

david

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2008-09-12 18:19 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-06 18:48 [patch 00/13] RFC: out of sync shadow Marcelo Tosatti
2008-09-06 18:48 ` [patch 01/13] x86/mm: get_user_pages_fast_atomic Marcelo Tosatti
2008-09-07  8:42   ` Avi Kivity
2008-09-08  6:10     ` Marcelo Tosatti
2008-09-08 14:20       ` Avi Kivity
2008-09-06 18:48 ` [patch 02/13] KVM: MMU: switch to get_user_pages_fast Marcelo Tosatti
2008-09-07  8:45   ` Avi Kivity
2008-09-07 20:44     ` Marcelo Tosatti
2008-09-08 14:53       ` Avi Kivity
2008-09-09 12:21     ` Andrea Arcangeli
2008-09-09 13:57       ` Avi Kivity
2008-09-06 18:48 ` [patch 03/13] KVM: MMU: gfn_to_page_atomic Marcelo Tosatti
2008-09-06 18:48 ` [patch 04/13] KVM: MMU: switch prefetch_page to gfn_to_page_atomic Marcelo Tosatti
2008-09-06 18:48 ` [patch 05/13] KVM: MMU: do not write-protect large mappings Marcelo Tosatti
2008-09-07  9:04   ` Avi Kivity
2008-09-07 20:54     ` Marcelo Tosatti
2008-09-06 18:48 ` [patch 06/13] KVM: MMU: global page keeping Marcelo Tosatti
2008-09-07  9:16   ` Avi Kivity
2008-09-06 18:48 ` [patch 07/13] KVM: MMU: mode specific sync_page Marcelo Tosatti
2008-09-07  9:52   ` Avi Kivity
2008-09-08  6:03     ` Marcelo Tosatti
2008-09-08  9:50       ` Avi Kivity
2008-09-06 18:48 ` [patch 08/13] KVM: MMU: record guest root level on struct guest_walker Marcelo Tosatti
2008-09-06 18:48 ` [patch 09/13] KVM: MMU: out of sync shadow core Marcelo Tosatti
2008-09-07 11:01   ` Avi Kivity
2008-09-08  7:19     ` Marcelo Tosatti
2008-09-08 14:51       ` Avi Kivity
2008-09-11  8:19         ` Marcelo Tosatti
2008-09-11 13:15     ` Marcelo Tosatti
2008-09-06 18:48 ` [patch 10/13] KVM: MMU: sync roots on mmu reload Marcelo Tosatti
2008-09-06 18:48 ` [patch 11/13] KVM: MMU: sync global pages on cr0/cr4 writes Marcelo Tosatti
2008-09-06 18:48 ` [patch 12/13] KVM: x86: trap invlpg Marcelo Tosatti
2008-09-07 11:14   ` Avi Kivity
2008-09-06 18:48 ` [patch 13/13] KVM: MMU: ignore multiroot when unsyncing global pages Marcelo Tosatti
2008-09-07 11:22 ` [patch 00/13] RFC: out of sync shadow Avi Kivity
2008-09-08  7:23   ` Marcelo Tosatti
2008-09-08 14:56     ` Avi Kivity
2008-09-12  4:05 ` David S. Ahern
2008-09-12 11:51   ` Marcelo Tosatti
2008-09-12 15:12     ` David S. Ahern
2008-09-12 18:09       ` Marcelo Tosatti
2008-09-12 18:19         ` David S. Ahern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox