* [PATCH 1/6] KVM: MMU: fix gfn got in kvm_mmu_page_get_gfn()
  [not found] <4C16E6ED.7020009@cn.fujitsu.com>
@ 2010-06-15  2:46 ` Xiao Guangrong
  [not found] ` <4C16E75F.6020003@cn.fujitsu.com>
  1 sibling, 0 replies; 23+ messages in thread
From: Xiao Guangrong @ 2010-06-15  2:46 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM list

kvm_mmu_page_get_gfn() should return the unaliased gfn, not the mapping gfn.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 arch/x86/kvm/mmu.c |   19 ++++++++++++-------
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 56cbe45..734b106 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -397,18 +397,23 @@ static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
 	kmem_cache_free(rmap_desc_cache, rd);
 }
 
-static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
+static gfn_t kvm_mmu_page_get_gfn(struct kvm *kvm, struct kvm_mmu_page *sp,
+				  int index)
 {
+	gfn_t gfn;
+
 	if (!sp->role.direct)
 		return sp->gfns[index];
+	gfn = sp->gfn + (index << ((sp->role.level - 1) * PT64_LEVEL_BITS));
 
-	return sp->gfn + (index << ((sp->role.level - 1) * PT64_LEVEL_BITS));
+	return unalias_gfn(kvm, gfn);
 }
 
-static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn)
+static void kvm_mmu_page_set_gfn(struct kvm *kvm, struct kvm_mmu_page *sp,
+				 int index, gfn_t gfn)
 {
 	if (sp->role.direct)
-		BUG_ON(gfn != kvm_mmu_page_get_gfn(sp, index));
+		BUG_ON(gfn != kvm_mmu_page_get_gfn(kvm, sp, index));
 	else
 		sp->gfns[index] = gfn;
 }
@@ -563,7 +568,7 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 		return count;
 	gfn = unalias_gfn(vcpu->kvm, gfn);
 	sp = page_header(__pa(spte));
-	kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
+	kvm_mmu_page_set_gfn(vcpu->kvm, sp, spte - sp->spt, gfn);
 	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
 	if (!*rmapp) {
 		rmap_printk("rmap_add: %p %llx 0->1\n", spte, *spte);
@@ -633,7 +638,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 		kvm_set_pfn_accessed(pfn);
 	if (is_writable_pte(*spte))
 		kvm_set_pfn_dirty(pfn);
-	gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt);
+	gfn = kvm_mmu_page_get_gfn(kvm, sp, spte - sp->spt);
 	rmapp = gfn_to_rmap(kvm, gfn, sp->role.level);
 	if (!*rmapp) {
 		printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
@@ -3460,7 +3465,7 @@ void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
 
 	if (is_writable_pte(*sptep)) {
 		rev_sp = page_header(__pa(sptep));
-		gfn = kvm_mmu_page_get_gfn(rev_sp, sptep - rev_sp->spt);
+		gfn = kvm_mmu_page_get_gfn(kvm, rev_sp, sptep - rev_sp->spt);
 
 		if (!gfn_to_memslot(kvm, gfn)) {
 			if (!printk_ratelimit())
-- 
1.6.1.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread
* [PATCH 2/6] KVM: MMU: fix conflict access permissions in direct sp
  [not found] ` <4C16E75F.6020003@cn.fujitsu.com>
@ 2010-06-15  2:46   ` Xiao Guangrong
  [not found]   ` <4C16E7AD.1060101@cn.fujitsu.com>
  1 sibling, 0 replies; 23+ messages in thread
From: Xiao Guangrong @ 2010-06-15  2:46 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM list

When a guest large page is backed by small host pages ("no-direct" mapping),
we mark the sp as 'direct', but its access is encoded entirely from the upper
page-structure entries and does not include the last-level mapping; this can
cause an access conflict.

For example, with this mapping:

              [W]
             / PDE1 -> |----|
       P[W]            |    | LPA
             \ PDE2 -> |----|
              [R]

P has two children, PDE1 and PDE2, and both PDE1 and PDE2 map the same large
page (LPA). P's access is W, PDE1's access is W, and PDE2's access is RO
(considering only read/write permissions here).

When the guest accesses through PDE1, we create a direct sp for LPA; the sp's
access comes from P, which is W, so we mark the ptes in this sp as W. Then,
when the guest accesses through PDE2, we find LPA's shadow page - the same
one as PDE1's - and mark the ptes RO. So, when the guest accesses through
PDE1 again, an incorrect #PF occurs.

Fixed by also encoding the access of the last mapping into the direct shadow
page.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 arch/x86/kvm/paging_tmpl.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 3ebc5a0..eb47148 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -339,6 +339,8 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			direct = 1;
 			if (!is_dirty_gpte(gw->ptes[level - delta]))
 				access &= ~ACC_WRITE_MASK;
+			access &= gw->pte_access;
+
 			/*
 			 * It is a large guest pages backed by small host pages,
 			 * So we set @direct(@sp->role.direct)=1, and set
-- 
1.6.1.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread
* [PATCH 3/6] KVM: MMU: introduce gfn_to_page_atomic() and gfn_to_pfn_atomic()
  [not found]   ` <4C16E7AD.1060101@cn.fujitsu.com>
@ 2010-06-15  2:46     ` Xiao Guangrong
  2010-06-15 11:22       ` Avi Kivity
  [not found]       ` <4C16E7F4.5060801@cn.fujitsu.com>
  1 sibling, 1 reply; 23+ messages in thread
From: Xiao Guangrong @ 2010-06-15  2:46 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM list

Introduce gfn_to_page_atomic() and gfn_to_pfn_atomic(). These functions are
fast paths and can be used in atomic context; a later patch will use them.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 arch/x86/mm/gup.c        |    2 +
 include/linux/kvm_host.h |    2 +
 virt/kvm/kvm_main.c      |   50 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index 738e659..0c9034b 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -6,6 +6,7 @@
  */
 #include <linux/sched.h>
 #include <linux/mm.h>
+#include <linux/module.h>
 #include <linux/vmstat.h>
 #include <linux/highmem.h>
 
@@ -274,6 +275,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 
 	return nr;
 }
+EXPORT_SYMBOL_GPL(__get_user_pages_fast);
 
 /**
  * get_user_pages_fast() - pin user pages in memory
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2d96555..98c3e00 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -289,6 +289,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm);
 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn);
 gfn_t unalias_gfn_instantiation(struct kvm *kvm, gfn_t gfn);
 
+struct page *gfn_to_page_atomic(struct kvm *kvm, gfn_t gfn);
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 void kvm_release_page_clean(struct page *page);
@@ -296,6 +297,7 @@ void kvm_release_page_dirty(struct page *page);
 void kvm_set_page_dirty(struct page *page);
 void kvm_set_page_accessed(struct page *page);
 
+pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn);
 pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
 pfn_t gfn_to_pfn_memslot(struct kvm *kvm,
 			 struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 84a0906..b806f29 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -942,6 +942,41 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(gfn_to_hva);
 
+static pfn_t hva_to_pfn_atomic(struct kvm *kvm, unsigned long addr)
+{
+	struct page *page[1];
+	int npages;
+	pfn_t pfn;
+
+	npages = __get_user_pages_fast(addr, 1, 1, page);
+
+	if (unlikely(npages != 1)) {
+		if (is_hwpoison_address(addr)) {
+			get_page(hwpoison_page);
+			return page_to_pfn(hwpoison_page);
+		}
+		get_page(bad_page);
+		return page_to_pfn(bad_page);
+	} else
+		pfn = page_to_pfn(page[0]);
+
+	return pfn;
+}
+
+pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn)
+{
+	unsigned long addr;
+
+	addr = gfn_to_hva(kvm, gfn);
+	if (kvm_is_error_hva(addr)) {
+		get_page(bad_page);
+		return page_to_pfn(bad_page);
+	}
+
+	return hva_to_pfn_atomic(kvm, addr);
+}
+EXPORT_SYMBOL_GPL(gfn_to_pfn_atomic);
+
 static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr)
 {
 	struct page *page[1];
@@ -1000,6 +1035,21 @@ pfn_t gfn_to_pfn_memslot(struct kvm *kvm,
 	return hva_to_pfn(kvm, addr);
 }
 
+struct page *gfn_to_page_atomic(struct kvm *kvm, gfn_t gfn)
+{
+	pfn_t pfn;
+
+	pfn = gfn_to_pfn_atomic(kvm, gfn);
+	if (!kvm_is_mmio_pfn(pfn))
+		return pfn_to_page(pfn);
+
+	WARN_ON(kvm_is_mmio_pfn(pfn));
+
+	get_page(bad_page);
+	return bad_page;
+}
+EXPORT_SYMBOL_GPL(gfn_to_page_atomic);
+
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {
 	pfn_t pfn;
-- 
1.6.1.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread
* Re: [PATCH 3/6] KVM: MMU: introduce gfn_to_page_atomic() and gfn_to_pfn_atomic() 2010-06-15 2:46 ` [PATCH 3/6] KVM: MMU: introduce gfn_to_page_atomic() and gfn_to_pfn_atomic() Xiao Guangrong @ 2010-06-15 11:22 ` Avi Kivity 2010-06-16 7:59 ` Andi Kleen ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Avi Kivity @ 2010-06-15 11:22 UTC (permalink / raw) To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM list, Andi Kleen, Huang Ying On 06/15/2010 05:46 AM, Xiao Guangrong wrote: > Introduce gfn_to_page_atomic() and gfn_to_pfn_atomic(), those > functions is fast path and can used in atomic context, the later > patch will use those > > > @@ -942,6 +942,41 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn) > } > EXPORT_SYMBOL_GPL(gfn_to_hva); > > +static pfn_t hva_to_pfn_atomic(struct kvm *kvm, unsigned long addr) > +{ > + struct page *page[1]; > + int npages; > + pfn_t pfn; > + > + npages = __get_user_pages_fast(addr, 1, 1, page); > + > + if (unlikely(npages != 1)) { > + if (is_hwpoison_address(addr)) { > + get_page(hwpoison_page); > + return page_to_pfn(hwpoison_page); > + } > + get_page(bad_page); > + return page_to_pfn(bad_page); > + } else > + pfn = page_to_pfn(page[0]); > + > + return pfn; > +} > Too much duplication. How about putting the tail end of the function in a common helper (with an inatomic flag)? btw, is_hwpoison_address() is racy. While it looks up the address, some other task can unmap the page tables under us. Andi/Huang? One way of fixing it is get_user_pages_ptes_fast(), which also returns the pte, also atomically. I want it for other reasons as well (respond to a read fault by gupping the page for read, but allowing write access if the pte indicates it is writeable). -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/6] KVM: MMU: introduce gfn_to_page_atomic() and gfn_to_pfn_atomic() 2010-06-15 11:22 ` Avi Kivity @ 2010-06-16 7:59 ` Andi Kleen 2010-06-16 8:05 ` Avi Kivity 2010-06-16 10:51 ` huang ying 2010-06-17 6:46 ` Xiao Guangrong 2 siblings, 1 reply; 23+ messages in thread From: Andi Kleen @ 2010-06-16 7:59 UTC (permalink / raw) To: Avi Kivity Cc: Xiao Guangrong, Marcelo Tosatti, LKML, KVM list, Andi Kleen, Huang Ying On Tue, Jun 15, 2010 at 02:22:06PM +0300, Avi Kivity wrote: > Too much duplication. How about putting the tail end of the function in a > common helper (with an inatomic flag)? > > btw, is_hwpoison_address() is racy. While it looks up the address, some > other task can unmap the page tables under us. Where is is_hwpoison_address() coming from? I can't find it anywhere. Anyways hwpoison will not remove the page as long as there is a reference to it, so as long as you get the reference race free against another task you're ok. Of course there might be always an error in between and the hardware may poison the data, but we don't try to handle all kernel errors. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/6] KVM: MMU: introduce gfn_to_page_atomic() and gfn_to_pfn_atomic() 2010-06-16 7:59 ` Andi Kleen @ 2010-06-16 8:05 ` Avi Kivity 2010-06-16 8:49 ` Andi Kleen 0 siblings, 1 reply; 23+ messages in thread From: Avi Kivity @ 2010-06-16 8:05 UTC (permalink / raw) To: Andi Kleen; +Cc: Xiao Guangrong, Marcelo Tosatti, LKML, KVM list, Huang Ying On 06/16/2010 10:59 AM, Andi Kleen wrote: > On Tue, Jun 15, 2010 at 02:22:06PM +0300, Avi Kivity wrote: > >> Too much duplication. How about putting the tail end of the function in a >> common helper (with an inatomic flag)? >> >> btw, is_hwpoison_address() is racy. While it looks up the address, some >> other task can unmap the page tables under us. >> > Where is is_hwpoison_address() coming from? I can't find it anywhere. > > kvm.git master mm/memory-failure.c (19564281fe). > Anyways hwpoison will not remove the page as long as there > is a reference to it, so as long as you get the reference > race free against another task you're ok. Of course there might > be always an error in between and the hardware may poison > the data, but we don't try to handle all kernel errors. > The page is fine, the page tables are not. Another task can munmap() the thing while is_hwpoison_address() is running. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/6] KVM: MMU: introduce gfn_to_page_atomic() and gfn_to_pfn_atomic() 2010-06-16 8:05 ` Avi Kivity @ 2010-06-16 8:49 ` Andi Kleen 2010-06-16 9:40 ` Avi Kivity 0 siblings, 1 reply; 23+ messages in thread From: Andi Kleen @ 2010-06-16 8:49 UTC (permalink / raw) To: Avi Kivity Cc: Andi Kleen, Xiao Guangrong, Marcelo Tosatti, LKML, KVM list, Huang Ying > The page is fine, the page tables are not. Another task can munmap() the > thing while is_hwpoison_address() is running. Ok that boils down to me not seeing that source. If it accesses the page tables yes then it's racy. But whoever looked up the page tables in the first place should have returned an -EFAULT. There's no useful address attached to poison. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/6] KVM: MMU: introduce gfn_to_page_atomic() and gfn_to_pfn_atomic() 2010-06-16 8:49 ` Andi Kleen @ 2010-06-16 9:40 ` Avi Kivity 0 siblings, 0 replies; 23+ messages in thread From: Avi Kivity @ 2010-06-16 9:40 UTC (permalink / raw) To: Andi Kleen; +Cc: Xiao Guangrong, Marcelo Tosatti, LKML, KVM list, Huang Ying On 06/16/2010 11:49 AM, Andi Kleen wrote: >> The page is fine, the page tables are not. Another task can munmap() the >> thing while is_hwpoison_address() is running. >> > Ok that boils down to me not seeing that source. > > If it accesses the page tables yes then it's racy. But whoever > looked up the page tables in the first place should have > returned an -EFAULT. There's no useful address attached > to poison. > We need to distinguish between genuine -EFAULT and poisoned address. That's why I suggested get_user_pages_ptes_fast. You can return page = NULL (-EFAULT) and the pte in the same go. No race, and useful for other cases. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/6] KVM: MMU: introduce gfn_to_page_atomic() and gfn_to_pfn_atomic() 2010-06-15 11:22 ` Avi Kivity 2010-06-16 7:59 ` Andi Kleen @ 2010-06-16 10:51 ` huang ying 2010-06-16 11:08 ` Avi Kivity 2010-06-17 6:46 ` Xiao Guangrong 2 siblings, 1 reply; 23+ messages in thread From: huang ying @ 2010-06-16 10:51 UTC (permalink / raw) To: Avi Kivity Cc: Xiao Guangrong, Marcelo Tosatti, LKML, KVM list, Andi Kleen, Huang Ying On Tue, Jun 15, 2010 at 7:22 PM, Avi Kivity <avi@redhat.com> wrote: > btw, is_hwpoison_address() is racy. While it looks up the address, some > other task can unmap the page tables under us. > > Andi/Huang? > > One way of fixing it is get_user_pages_ptes_fast(), which also returns the > pte, also atomically. I want it for other reasons as well (respond to a > read fault by gupping the page for read, but allowing write access if the > pte indicates it is writeable). Yes. is_hwpoison_address() is racy. But I think it is not absolutely necessary to call is_hwpoison_address() in hva_to_pfn_atomic(), is it? For is_hwpoison_address() in hva_to_pfn(), we can protect it with mmap_sem. Best Regards, Huang Ying ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/6] KVM: MMU: introduce gfn_to_page_atomic() and gfn_to_pfn_atomic() 2010-06-16 10:51 ` huang ying @ 2010-06-16 11:08 ` Avi Kivity 0 siblings, 0 replies; 23+ messages in thread From: Avi Kivity @ 2010-06-16 11:08 UTC (permalink / raw) To: huang ying Cc: Xiao Guangrong, Marcelo Tosatti, LKML, KVM list, Andi Kleen, Huang Ying On 06/16/2010 01:51 PM, huang ying wrote: > On Tue, Jun 15, 2010 at 7:22 PM, Avi Kivity<avi@redhat.com> wrote: > >> btw, is_hwpoison_address() is racy. While it looks up the address, some >> other task can unmap the page tables under us. >> >> Andi/Huang? >> >> One way of fixing it is get_user_pages_ptes_fast(), which also returns the >> pte, also atomically. I want it for other reasons as well (respond to a >> read fault by gupping the page for read, but allowing write access if the >> pte indicates it is writeable). >> > Yes. is_hwpoison_address() is racy. But I think it is not absolutely > necessary to call is_hwpoison_address() in hva_to_pfn_atomic(), is it? > We can probably ignore it, yes. > For is_hwpoison_address() in hva_to_pfn(), we can protect it with mmap_sem. > Not very appealing, but should work. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/6] KVM: MMU: introduce gfn_to_page_atomic() and gfn_to_pfn_atomic() 2010-06-15 11:22 ` Avi Kivity 2010-06-16 7:59 ` Andi Kleen 2010-06-16 10:51 ` huang ying @ 2010-06-17 6:46 ` Xiao Guangrong 2 siblings, 0 replies; 23+ messages in thread From: Xiao Guangrong @ 2010-06-17 6:46 UTC (permalink / raw) To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM list, Andi Kleen, Huang Ying Avi Kivity wrote: > On 06/15/2010 05:46 AM, Xiao Guangrong wrote: >> Introduce gfn_to_page_atomic() and gfn_to_pfn_atomic(), those >> functions is fast path and can used in atomic context, the later >> patch will use those >> >> >> @@ -942,6 +942,41 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn) >> } >> EXPORT_SYMBOL_GPL(gfn_to_hva); >> >> +static pfn_t hva_to_pfn_atomic(struct kvm *kvm, unsigned long addr) >> +{ >> + struct page *page[1]; >> + int npages; >> + pfn_t pfn; >> + >> + npages = __get_user_pages_fast(addr, 1, 1, page); >> + >> + if (unlikely(npages != 1)) { >> + if (is_hwpoison_address(addr)) { >> + get_page(hwpoison_page); >> + return page_to_pfn(hwpoison_page); >> + } >> + get_page(bad_page); >> + return page_to_pfn(bad_page); >> + } else >> + pfn = page_to_pfn(page[0]); >> + >> + return pfn; >> +} >> > > Too much duplication. How about putting the tail end of the function in > a common helper (with an inatomic flag)? > OK, will cleanup it in the next version. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 4/6] KVM: MMU: introduce mmu_topup_memory_cache_atomic()
  [not found]     ` <4C16E7F4.5060801@cn.fujitsu.com>
@ 2010-06-15  2:46       ` Xiao Guangrong
  [not found]       ` <4C16E82E.5010306@cn.fujitsu.com>
  1 sibling, 0 replies; 23+ messages in thread
From: Xiao Guangrong @ 2010-06-15  2:46 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM list

Introduce mmu_topup_memory_cache_atomic(), which supports topping up a memory
cache in atomic context. When we prefetch ptes, we need to top up the rmap's
cache.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 arch/x86/kvm/mmu.c |   29 +++++++++++++++++++++++++----
 1 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 734b106..92ff099 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -290,15 +290,16 @@ static void __set_spte(u64 *sptep, u64 spte)
 #endif
 }
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
-				  struct kmem_cache *base_cache, int min)
+static int __mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+				    struct kmem_cache *base_cache, int min,
+				    int max, gfp_t flags)
 {
 	void *obj;
 
 	if (cache->nobjs >= min)
 		return 0;
-	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-		obj = kmem_cache_zalloc(base_cache, GFP_KERNEL);
+	while (cache->nobjs < max) {
+		obj = kmem_cache_zalloc(base_cache, flags);
 		if (!obj)
 			return -ENOMEM;
 		cache->objects[cache->nobjs++] = obj;
@@ -306,6 +307,26 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
 	return 0;
 }
 
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+				  struct kmem_cache *base_cache, int min)
+{
+	return __mmu_topup_memory_cache(cache, base_cache, min,
+					ARRAY_SIZE(cache->objects), GFP_KERNEL);
+}
+
+static int mmu_topup_memory_cache_atomic(struct kvm_mmu_memory_cache *cache,
+					 struct kmem_cache *base_cache, int min)
+{
+	return __mmu_topup_memory_cache(cache, base_cache, min, min,
+					GFP_ATOMIC);
+}
+
+static int pte_prefetch_topup_memory_cache(struct kvm_vcpu *vcpu)
+{
+	return mmu_topup_memory_cache_atomic(&vcpu->arch.mmu_rmap_desc_cache,
+					     rmap_desc_cache, 1);
+}
+
 static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc,
 				  struct kmem_cache *cache)
 {
-- 
1.6.1.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread
* [PATCH 5/6] KVM: MMU: prefetch ptes when intercepted guest #PF
  [not found]       ` <4C16E82E.5010306@cn.fujitsu.com>
@ 2010-06-15  2:47         ` Xiao Guangrong
  2010-06-15 11:41           ` Avi Kivity
  2010-06-16  3:55           ` Marcelo Tosatti
  [not found]           ` <4C16E867.8040606@cn.fujitsu.com>
  1 sibling, 2 replies; 23+ messages in thread
From: Xiao Guangrong @ 2010-06-15  2:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM list

Support prefetching ptes when intercepting a guest #PF, to avoid #PFs on
later accesses.

If we meet any failure in the prefetch path, we exit it and do not try the
remaining ptes, to avoid turning it into a heavy path.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 arch/x86/kvm/mmu.c         |   36 +++++++++++++++++++++
 arch/x86/kvm/paging_tmpl.h |   76 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 112 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 92ff099..941c86b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -89,6 +89,8 @@ module_param(oos_shadow, bool, 0644);
 }
 #endif
 
+#define PTE_PREFETCH_NUM 16
+
 #define PT_FIRST_AVAIL_BITS_SHIFT 9
 #define PT64_SECOND_AVAIL_BITS_SHIFT 52
 
@@ -2041,6 +2043,39 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
 {
 }
 
+static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
+{
+	struct kvm_mmu_page *sp;
+	int index, i;
+
+	sp = page_header(__pa(sptep));
+	WARN_ON(!sp->role.direct);
+	index = sptep - sp->spt;
+
+	for (i = index + 1; i < min(PT64_ENT_PER_PAGE,
+				    index + PTE_PREFETCH_NUM); i++) {
+		gfn_t gfn;
+		pfn_t pfn;
+		u64 *spte = sp->spt + i;
+
+		if (*spte != shadow_trap_nonpresent_pte)
+			continue;
+
+		gfn = sp->gfn + (i << ((sp->role.level - 1) * PT64_LEVEL_BITS));
+
+		pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn);
+		if (is_error_pfn(pfn)) {
+			kvm_release_pfn_clean(pfn);
+			break;
+		}
+		if (pte_prefetch_topup_memory_cache(vcpu))
+			break;
+
+		mmu_set_spte(vcpu, spte, ACC_ALL, ACC_ALL, 0, 0, 1, NULL,
+			     sp->role.level, gfn, pfn, true, false);
+	}
+}
+
 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 			int level, gfn_t gfn, pfn_t pfn)
 {
@@ -2055,6 +2090,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 				     0, write, 1, &pt_write,
 				     level, gfn, pfn, false, true);
 			++vcpu->stat.pf_fixed;
+			direct_pte_prefetch(vcpu, iterator.sptep);
 			break;
 		}
 
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index eb47148..af4e041 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -291,6 +291,81 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		     gpte_to_gfn(gpte), pfn, true, true);
 }
 
+static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, u64 *sptep)
+{
+	struct kvm_mmu_page *sp;
+	pt_element_t *table = NULL;
+	int offset = 0, shift, index, i;
+
+	sp = page_header(__pa(sptep));
+	index = sptep - sp->spt;
+
+	if (PTTYPE == 32) {
+		shift = PAGE_SHIFT - (PT_LEVEL_BITS -
+				      PT64_LEVEL_BITS) * sp->role.level;
+		offset = sp->role.quadrant << shift;
+	}
+
+	for (i = index + 1; i < min(PT64_ENT_PER_PAGE,
+				    index + PTE_PREFETCH_NUM); i++) {
+		struct page *page;
+		pt_element_t gpte;
+		unsigned pte_access;
+		u64 *spte = sp->spt + i;
+		gfn_t gfn;
+		pfn_t pfn;
+		int dirty;
+
+		if (*spte != shadow_trap_nonpresent_pte)
+			continue;
+
+		pte_access = sp->role.access;
+		if (sp->role.direct) {
+			dirty = 1;
+			gfn = sp->gfn + (i << ((sp->role.level - 1) *
+					       PT64_LEVEL_BITS));
+			goto gfn_mapping;
+		}
+
+		if (!table) {
+			page = gfn_to_page_atomic(vcpu->kvm, sp->gfn);
+			if (is_error_page(page)) {
+				kvm_release_page_clean(page);
+				break;
+			}
+			table = kmap_atomic(page, KM_USER0);
+			table = (pt_element_t *)((char *)table + offset);
+		}
+
+		gpte = table[i];
+		if (!(gpte & PT_ACCESSED_MASK))
+			continue;
+
+		if (!is_present_gpte(gpte)) {
+			if (!sp->unsync)
+				*spte = shadow_notrap_nonpresent_pte;
+			continue;
+		}
+		dirty = is_dirty_gpte(gpte);
+		gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
+		pte_access = pte_access & FNAME(gpte_access)(vcpu, gpte);
+gfn_mapping:
+		pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn);
+		if (is_error_pfn(pfn)) {
+			kvm_release_pfn_clean(pfn);
+			break;
+		}
+
+		if (pte_prefetch_topup_memory_cache(vcpu))
+			break;
+		mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0,
+			     dirty, NULL, sp->role.level, gfn, pfn,
+			     true, false);
+	}
+	if (table)
+		kunmap_atomic((char *)table - offset, KM_USER0);
+}
+
 /*
  * Fetch a shadow pte for a specific level in the paging hierarchy.
  */
@@ -322,6 +397,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 				     is_dirty_gpte(gw->ptes[gw->level-1]),
 				     ptwrite, level,
 				     gw->gfn, pfn, false, true);
+			FNAME(pte_prefetch)(vcpu, sptep);
 			break;
 		}
 
-- 
1.6.1.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread
* Re: [PATCH 5/6] KVM: MMU: prefetch ptes when intercepted guest #PF 2010-06-15 2:47 ` [PATCH 5/6] KVM: MMU: prefetch ptes when intercepted guest #PF Xiao Guangrong @ 2010-06-15 11:41 ` Avi Kivity 2010-06-17 7:25 ` Xiao Guangrong 2010-06-16 3:55 ` Marcelo Tosatti 1 sibling, 1 reply; 23+ messages in thread From: Avi Kivity @ 2010-06-15 11:41 UTC (permalink / raw) To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM list On 06/15/2010 05:47 AM, Xiao Guangrong wrote: > Support prefetch ptes when intercept guest #PF, avoid to #PF by later > access > > If we meet any failure in the prefetch path, we will exit it and > not try other ptes to avoid become heavy path > > > > +#define PTE_PREFETCH_NUM 16 > + > #define PT_FIRST_AVAIL_BITS_SHIFT 9 > #define PT64_SECOND_AVAIL_BITS_SHIFT 52 > > @@ -2041,6 +2043,39 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu) > { > } > > +static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) > +{ > + struct kvm_mmu_page *sp; > + int index, i; > + > + sp = page_header(__pa(sptep)); > + WARN_ON(!sp->role.direct); > + index = sptep - sp->spt; > + > + for (i = index + 1; i< min(PT64_ENT_PER_PAGE, > + index + PTE_PREFETCH_NUM); i++) { > + gfn_t gfn; > + pfn_t pfn; > + u64 *spte = sp->spt + i; > + > + if (*spte != shadow_trap_nonpresent_pte) > + continue; > + > + gfn = sp->gfn + (i<< ((sp->role.level - 1) * PT64_LEVEL_BITS)); > Can calculate outside the loop and use +=. Can this in fact work for level != PT_PAGE_TABLE_LEVEL? We might start at PT_PAGE_DIRECTORY_LEVEL but get 4k pages while iterating. > + > + pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn); > + if (is_error_pfn(pfn)) { > + kvm_release_pfn_clean(pfn); > + break; > + } > + if (pte_prefetch_topup_memory_cache(vcpu)) > + break; > + > + mmu_set_spte(vcpu, spte, ACC_ALL, ACC_ALL, 0, 0, 1, NULL, > + sp->role.level, gfn, pfn, true, false); > + } > +} > Nice. Direct prefetch should usually succeed. Can later augment to call get_users_pages_fast(..., PTE_PREFETCH_NUM, ...) 
to reduce gup overhead. > > +static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, u64 *sptep) > +{ > + struct kvm_mmu_page *sp; > + pt_element_t *table = NULL; > + int offset = 0, shift, index, i; > + > + sp = page_header(__pa(sptep)); > + index = sptep - sp->spt; > + > + if (PTTYPE == 32) { > + shift = PAGE_SHIFT - (PT_LEVEL_BITS - > + PT64_LEVEL_BITS) * sp->role.level; > + offset = sp->role.quadrant<< shift; > + } > + > + for (i = index + 1; i< min(PT64_ENT_PER_PAGE, > + index + PTE_PREFETCH_NUM); i++) { > + struct page *page; > + pt_element_t gpte; > + unsigned pte_access; > + u64 *spte = sp->spt + i; > + gfn_t gfn; > + pfn_t pfn; > + int dirty; > + > + if (*spte != shadow_trap_nonpresent_pte) > + continue; > + > + pte_access = sp->role.access; > + if (sp->role.direct) { > + dirty = 1; > + gfn = sp->gfn + (i<< ((sp->role.level - 1) * > + PT64_LEVEL_BITS)); > + goto gfn_mapping; > + } > Should just call direct_pte_prefetch. > + > + if (!table) { > + page = gfn_to_page_atomic(vcpu->kvm, sp->gfn); > + if (is_error_page(page)) { > + kvm_release_page_clean(page); > + break; > + } > + table = kmap_atomic(page, KM_USER0); > + table = (pt_element_t *)((char *)table + offset); > + } > Why not kvm_read_guest_atomic()? Can do it outside the loop. > + > + gpte = table[i]; > + if (!(gpte& PT_ACCESSED_MASK)) > + continue; > + > + if (!is_present_gpte(gpte)) { > + if (!sp->unsync) > + *spte = shadow_notrap_nonpresent_pte; > Need __set_spte(). 
> + continue; > + } > + dirty = is_dirty_gpte(gpte); > + gfn = (gpte& PT64_BASE_ADDR_MASK)>> PAGE_SHIFT; > + pte_access = pte_access& FNAME(gpte_access)(vcpu, gpte); > +gfn_mapping: > + pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn); > + if (is_error_pfn(pfn)) { > + kvm_release_pfn_clean(pfn); > + break; > + } > + > + if (pte_prefetch_topup_memory_cache(vcpu)) > + break; > + mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0, > + dirty, NULL, sp->role.level, gfn, pfn, > + true, false); > + } > + if (table) > + kunmap_atomic((char *)table - offset, KM_USER0); > +} > I think lot of code can be shared with the pte prefetch in invlpg. > + > /* > * Fetch a shadow pte for a specific level in the paging hierarchy. > */ > @@ -322,6 +397,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, > is_dirty_gpte(gw->ptes[gw->level-1]), > ptwrite, level, > gw->gfn, pfn, false, true); > + FNAME(pte_prefetch)(vcpu, sptep); > break; > } > > -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 5/6] KVM: MMU: prefetch ptes when intercepted guest #PF 2010-06-15 11:41 ` Avi Kivity @ 2010-06-17 7:25 ` Xiao Guangrong 2010-06-17 8:07 ` Avi Kivity 0 siblings, 1 reply; 23+ messages in thread From: Xiao Guangrong @ 2010-06-17 7:25 UTC (permalink / raw) To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM list Avi Kivity wrote: >> + if (*spte != shadow_trap_nonpresent_pte) >> + continue; >> + >> + gfn = sp->gfn + (i<< ((sp->role.level - 1) * PT64_LEVEL_BITS)); >> > Avi, Thanks for your comment. > Can calculate outside the loop and use +=. > It's nice, will do it in the next version. > Can this in fact work for level != PT_PAGE_TABLE_LEVEL? We might start > at PT_PAGE_DIRECTORY_LEVEL but get 4k pages while iterating. Ah, i forgot it. We can't assume that the host also support huge page for next gfn, as Marcelo's suggestion, we should "only map with level > 1 if the host page matches the size". Um, the problem is, when we get host page size, we should hold 'mm->mmap_sem', it can't used in atomic context and it's also a slow path, we hope pte prefetch path is fast. How about only allow prefetch for sp.leve = 1 now? i'll improve it in the future, i think it need more time :-) > >> + >> + pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn); >> + if (is_error_pfn(pfn)) { >> + kvm_release_pfn_clean(pfn); >> + break; >> + } >> + if (pte_prefetch_topup_memory_cache(vcpu)) >> + break; >> + >> + mmu_set_spte(vcpu, spte, ACC_ALL, ACC_ALL, 0, 0, 1, NULL, >> + sp->role.level, gfn, pfn, true, false); >> + } >> +} >> > > Nice. Direct prefetch should usually succeed. > > Can later augment to call get_users_pages_fast(..., PTE_PREFETCH_NUM, > ...) to reduce gup overhead. But we can't assume the gfn's hva is consecutive, for example, gfn and gfn+1 maybe in the different slots. 
> >> >> +static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, u64 *sptep) >> +{ >> + struct kvm_mmu_page *sp; >> + pt_element_t *table = NULL; >> + int offset = 0, shift, index, i; >> + >> + sp = page_header(__pa(sptep)); >> + index = sptep - sp->spt; >> + >> + if (PTTYPE == 32) { >> + shift = PAGE_SHIFT - (PT_LEVEL_BITS - >> + PT64_LEVEL_BITS) * sp->role.level; >> + offset = sp->role.quadrant<< shift; >> + } >> + >> + for (i = index + 1; i< min(PT64_ENT_PER_PAGE, >> + index + PTE_PREFETCH_NUM); i++) { >> + struct page *page; >> + pt_element_t gpte; >> + unsigned pte_access; >> + u64 *spte = sp->spt + i; >> + gfn_t gfn; >> + pfn_t pfn; >> + int dirty; >> + >> + if (*spte != shadow_trap_nonpresent_pte) >> + continue; >> + >> + pte_access = sp->role.access; >> + if (sp->role.direct) { >> + dirty = 1; >> + gfn = sp->gfn + (i<< ((sp->role.level - 1) * >> + PT64_LEVEL_BITS)); >> + goto gfn_mapping; >> + } >> > > Should just call direct_pte_prefetch. > OK, will fix it. >> + >> + if (!table) { >> + page = gfn_to_page_atomic(vcpu->kvm, sp->gfn); >> + if (is_error_page(page)) { >> + kvm_release_page_clean(page); >> + break; >> + } >> + table = kmap_atomic(page, KM_USER0); >> + table = (pt_element_t *)((char *)table + offset); >> + } >> > > Why not kvm_read_guest_atomic()? Can do it outside the loop. Do you mean reading all the prefetched sptes at one time? If prefetching one spte fails, the later sptes that we read are wasted, so I chose to read the next spte only when the current spte has been prefetched successfully. But I don't have a strong opinion on it, since it's fast to read all the sptes at one time; in the worst case we only need to read 16 * 8 = 128 bytes. > >> + >> + gpte = table[i]; >> + if (!(gpte& PT_ACCESSED_MASK)) >> + continue; >> + >> + if (!is_present_gpte(gpte)) { >> + if (!sp->unsync) >> + *spte = shadow_notrap_nonpresent_pte; >> > > Need __set_spte(). Oops, will fix it. 
> >> + continue; >> + } >> + dirty = is_dirty_gpte(gpte); >> + gfn = (gpte& PT64_BASE_ADDR_MASK)>> PAGE_SHIFT; >> + pte_access = pte_access& FNAME(gpte_access)(vcpu, gpte); >> +gfn_mapping: >> + pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn); >> + if (is_error_pfn(pfn)) { >> + kvm_release_pfn_clean(pfn); >> + break; >> + } >> + >> + if (pte_prefetch_topup_memory_cache(vcpu)) >> + break; >> + mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0, >> + dirty, NULL, sp->role.level, gfn, pfn, >> + true, false); >> + } >> + if (table) >> + kunmap_atomic((char *)table - offset, KM_USER0); >> +} >> > > I think lot of code can be shared with the pte prefetch in invlpg. > Yes, please allow me to clean up that code after my future patchset: > > [PATCH v4 9/9] KVM MMU: optimize sync/update unsync-page > > it's the last part of the 'allow multiple shadow pages' patchset. ^ permalink raw reply [flat|nested] 23+ messages in thread
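Avi's suggestion above ("calculate outside the loop and use +=") amounts to computing the per-entry gfn stride once and then advancing by addition. A rough user-space sketch of that arithmetic (not the kernel code; the harness and helper names are ours, only PT64_LEVEL_BITS and PTE_PREFETCH_NUM mirror the patch):

```c
#include <assert.h>

#define PT64_LEVEL_BITS   9
#define PTE_PREFETCH_NUM  16

typedef unsigned long long gfn_t;

/* For a direct sp, entry i maps sp->gfn + i * (pages covered per entry).
 * Compute the gfn of the first prefetched entry once... */
static gfn_t first_prefetch_gfn(gfn_t sp_gfn, int index, int level)
{
    return sp_gfn + ((gfn_t)index << ((level - 1) * PT64_LEVEL_BITS));
}

/* ...and the per-entry stride once, instead of redoing the shift for
 * every index inside the loop. */
static gfn_t gfn_stride(int level)
{
    return (gfn_t)1 << ((level - 1) * PT64_LEVEL_BITS);
}
```

With these, the direct prefetch loop can advance with `gfn += gfn_stride(level)` on each iteration rather than recomputing the shifted expression per index.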
* Re: [PATCH 5/6] KVM: MMU: prefetch ptes when intercepted guest #PF 2010-06-17 7:25 ` Xiao Guangrong @ 2010-06-17 8:07 ` Avi Kivity 2010-06-17 9:04 ` Xiao Guangrong 0 siblings, 1 reply; 23+ messages in thread From: Avi Kivity @ 2010-06-17 8:07 UTC (permalink / raw) To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM list On 06/17/2010 10:25 AM, Xiao Guangrong wrote: > > >> Can this in fact work for level != PT_PAGE_TABLE_LEVEL? We might start >> at PT_PAGE_DIRECTORY_LEVEL but get 4k pages while iterating. >> > Ah, i forgot it. We can't assume that the host also support huge page for > next gfn, as Marcelo's suggestion, we should "only map with level> 1 if > the host page matches the size". > > Um, the problem is, when we get host page size, we should hold 'mm->mmap_sem', > it can't used in atomic context and it's also a slow path, we hope pte prefetch > path is fast. > > How about only allow prefetch for sp.leve = 1 now? i'll improve it in the future, > i think it need more time :-) > I don't think prefetch for level > 1 is worthwhile. One fault per 2MB is already very good, no need to optimize it further. >>> + >>> + pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn); >>> + if (is_error_pfn(pfn)) { >>> + kvm_release_pfn_clean(pfn); >>> + break; >>> + } >>> + if (pte_prefetch_topup_memory_cache(vcpu)) >>> + break; >>> + >>> + mmu_set_spte(vcpu, spte, ACC_ALL, ACC_ALL, 0, 0, 1, NULL, >>> + sp->role.level, gfn, pfn, true, false); >>> + } >>> +} >>> >>> >> Nice. Direct prefetch should usually succeed. >> >> Can later augment to call get_users_pages_fast(..., PTE_PREFETCH_NUM, >> ...) to reduce gup overhead. >> > But we can't assume the gfn's hva is consecutive, for example, gfn and gfn+1 > maybe in the different slots. > Right. We could limit it to one slot then for simplicity. 
> >>> + >>> + if (!table) { >>> + page = gfn_to_page_atomic(vcpu->kvm, sp->gfn); >>> + if (is_error_page(page)) { >>> + kvm_release_page_clean(page); >>> + break; >>> + } >>> + table = kmap_atomic(page, KM_USER0); >>> + table = (pt_element_t *)((char *)table + offset); >>> + } >>> >>> >> Why not kvm_read_guest_atomic()? Can do it outside the loop. >> > Do you mean that read all prefetched sptes at one time? > Yes. > If prefetch one spte fail, the later sptes that we read is waste, so i > choose read next spte only when current spte is prefetched successful. > > But i not have strong opinion on it since it's fast to read all sptes at > one time, at the worst case, only 16 * 8 = 128 bytes we need to read. > In general batching is worthwhile; the cost of the extra bytes is low compared to the cost of bringing in the cacheline and doing the error checking. By the way, you could align the prefetch to a 16-pte boundary. That would improve performance for memory that is scanned backwards. So we can change the fault path to always fault 16 ptes, aligned on a 16-pte boundary, with only the needed pte mapped with speculative=false. >> I think lot of code can be shared with the pte prefetch in invlpg. >> >> > Yes, please allow me to cleanup those code after my future patchset: > > [PATCH v4 9/9] KVM MMU: optimize sync/update unsync-page > > it's the last part in the 'allow multiple shadow pages' patchset. > Sure. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 23+ messages in thread
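The batched shape Avi is arguing for can be sketched in user space like this (hypothetical stand-ins: memcpy() plays the role of kvm_read_guest_atomic(), and a present-bit test plays the role of is_present_gpte(); a real version would also clamp the window to a single memslot, as discussed earlier in the thread):

```c
#include <assert.h>
#include <string.h>

#define PTE_PREFETCH_NUM 16
#define PT_PRESENT_MASK  1ULL

typedef unsigned long long pt_element_t;

/* Copy the whole 16-entry gpte window in one read (16 * 8 = 128 bytes,
 * at most a couple of cachelines), then decide per entry from the local
 * copy.  Returns how many entries in the window are usable. */
static int prefetch_window(const pt_element_t *guest_table, int start,
                           pt_element_t *gptes)
{
    int i, usable = 0;

    memcpy(gptes, guest_table + start, PTE_PREFETCH_NUM * sizeof(*gptes));
    for (i = 0; i < PTE_PREFETCH_NUM; i++)
        if (gptes[i] & PT_PRESENT_MASK)
            usable++;   /* a non-present entry only wastes 8 fetched bytes */
    return usable;
}
```

This is the trade-off being discussed: one guest access for the whole window, and a failed entry wastes only bytes that were already fetched, not an extra guest access.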
* Re: [PATCH 5/6] KVM: MMU: prefetch ptes when intercepted guest #PF 2010-06-17 8:07 ` Avi Kivity @ 2010-06-17 9:04 ` Xiao Guangrong 2010-06-17 9:20 ` Avi Kivity 0 siblings, 1 reply; 23+ messages in thread From: Xiao Guangrong @ 2010-06-17 9:04 UTC (permalink / raw) To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM list Avi Kivity wrote: > On 06/17/2010 10:25 AM, Xiao Guangrong wrote: >> >> >>> Can this in fact work for level != PT_PAGE_TABLE_LEVEL? We might start >>> at PT_PAGE_DIRECTORY_LEVEL but get 4k pages while iterating. >>> >> Ah, i forgot it. We can't assume that the host also support huge page for >> next gfn, as Marcelo's suggestion, we should "only map with level> 1 if >> the host page matches the size". >> >> Um, the problem is, when we get host page size, we should hold >> 'mm->mmap_sem', >> it can't used in atomic context and it's also a slow path, we hope pte >> prefetch >> path is fast. >> >> How about only allow prefetch for sp.leve = 1 now? i'll improve it in >> the future, >> i think it need more time :-) >> > > I don't think prefetch for level > 1 is worthwhile. One fault per 2MB > is already very good, no need to optimize it further. > OK >>>> + >>>> + pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn); >>>> + if (is_error_pfn(pfn)) { >>>> + kvm_release_pfn_clean(pfn); >>>> + break; >>>> + } >>>> + if (pte_prefetch_topup_memory_cache(vcpu)) >>>> + break; >>>> + >>>> + mmu_set_spte(vcpu, spte, ACC_ALL, ACC_ALL, 0, 0, 1, NULL, >>>> + sp->role.level, gfn, pfn, true, false); >>>> + } >>>> +} >>>> >>>> >>> Nice. Direct prefetch should usually succeed. >>> >>> Can later augment to call get_users_pages_fast(..., PTE_PREFETCH_NUM, >>> ...) to reduce gup overhead. >>> >> But we can't assume the gfn's hva is consecutive, for example, gfn and >> gfn+1 >> maybe in the different slots. >> > > Right. We could limit it to one slot then for simplicity. OK, i'll do it. 
> >> >>>> + >>>> + if (!table) { >>>> + page = gfn_to_page_atomic(vcpu->kvm, sp->gfn); >>>> + if (is_error_page(page)) { >>>> + kvm_release_page_clean(page); >>>> + break; >>>> + } >>>> + table = kmap_atomic(page, KM_USER0); >>>> + table = (pt_element_t *)((char *)table + offset); >>>> + } >>>> >>>> >>> Why not kvm_read_guest_atomic()? Can do it outside the loop. >>> >> Do you mean that read all prefetched sptes at one time? >> > > Yes. > >> If prefetch one spte fail, the later sptes that we read is waste, so i >> choose read next spte only when current spte is prefetched successful. >> >> But i not have strong opinion on it since it's fast to read all sptes at >> one time, at the worst case, only 16 * 8 = 128 bytes we need to read. >> > > In general batching is worthwhile, the cost of the extra bytes is low > compared to the cost of bringing in the cacheline and error checking. > Agreed. > btw, you could align the prefetch to 16 pte boundary. That would > improve performance for memory that is scanned backwards. > Yeah, good idea. > So we can change the fault path to always fault 16 ptes, aligned on 16 > pte boundary, with the needed pte called with specualtive=false. Avi, I don't understand that clearly. Could you please explain it? :-( ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 5/6] KVM: MMU: prefetch ptes when intercepted guest #PF 2010-06-17 9:04 ` Xiao Guangrong @ 2010-06-17 9:20 ` Avi Kivity 2010-06-17 9:29 ` Xiao Guangrong 0 siblings, 1 reply; 23+ messages in thread From: Avi Kivity @ 2010-06-17 9:20 UTC (permalink / raw) To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM list On 06/17/2010 12:04 PM, Xiao Guangrong wrote: > > >> So we can change the fault path to always fault 16 ptes, aligned on 16 >> pte boundary, with the needed pte called with specualtive=false. >> > Avi, i not understand it clearly, Could you please explain it? :-( > Right now if the fault is in spte i, you prefetch ptes (i+1)..(i+MAX_PREFETCH-1). I'd like to prefetch ptes (i & ~(MAX_PREFETCH-1))..(i | (MAX_PREFETCH - 1)). Read all those gptes, and map them one by one with speculative = false only for spte i. Perhaps we need to integrate it into walk_addr, there's no reason to read the gptes twice. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 23+ messages in thread
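Avi's window arithmetic from the message above, as a small self-contained sketch (the helper name and harness are ours; the patch's PTE_PREFETCH_NUM plays the role of MAX_PREFETCH here, and it must be a power of two for the mask trick to work):

```c
#include <assert.h>

#define PTE_PREFETCH_NUM 16   /* power of two, so the masks below work */

/* Aligned window around faulting index i:
 *   start = i & ~(PTE_PREFETCH_NUM - 1)
 *   end   = i |  (PTE_PREFETCH_NUM - 1)
 * Every window is exactly 16 entries on a 16-entry boundary and always
 * contains i, so backward scans hit prefetched entries too. */
static void prefetch_bounds(int i, int *start, int *end)
{
    *start = i & ~(PTE_PREFETCH_NUM - 1);
    *end   = i | (PTE_PREFETCH_NUM - 1);
}
```

So a fault on spte 37 would read gptes 32..47 and map them all speculatively, except entry 37 itself, which is installed with speculative = false.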
* Re: [PATCH 5/6] KVM: MMU: prefetch ptes when intercepted guest #PF 2010-06-17 9:20 ` Avi Kivity @ 2010-06-17 9:29 ` Xiao Guangrong 0 siblings, 0 replies; 23+ messages in thread From: Xiao Guangrong @ 2010-06-17 9:29 UTC (permalink / raw) To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM list Avi Kivity wrote: > On 06/17/2010 12:04 PM, Xiao Guangrong wrote: >> >> >>> So we can change the fault path to always fault 16 ptes, aligned on 16 >>> pte boundary, with the needed pte called with specualtive=false. >>> >> Avi, i not understand it clearly, Could you please explain it? :-( >> > > Right now if the fault is in spte i, you prefetch ptes > (i+1)..(i+MAX_PREFETCH-1). I'd like to prefetch ptes (i & > ~(MAX_PREFETCH-1))..(i | (MAX_PREFETCH - 1)). Read all those gptes, and > map them one by one with speculative = false only for spte i. > Thanks for your explanation; I see. > Perhaps we need to integrate it into walk_addr, there's no reason to > read the gptes twice. > OK, will do it. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 5/6] KVM: MMU: prefetch ptes when intercepted guest #PF 2010-06-15 2:47 ` [PATCH 5/6] KVM: MMU: prefetch ptes when intercepted guest #PF Xiao Guangrong 2010-06-15 11:41 ` Avi Kivity @ 2010-06-16 3:55 ` Marcelo Tosatti 1 sibling, 0 replies; 23+ messages in thread From: Marcelo Tosatti @ 2010-06-16 3:55 UTC (permalink / raw) To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM list On Tue, Jun 15, 2010 at 10:47:04AM +0800, Xiao Guangrong wrote: > Support prefetch ptes when intercept guest #PF, avoid to #PF by later > access > > If we meet any failure in the prefetch path, we will exit it and > not try other ptes to avoid become heavy path > > Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> > --- > arch/x86/kvm/mmu.c | 36 +++++++++++++++++++++ > arch/x86/kvm/paging_tmpl.h | 76 ++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 112 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index 92ff099..941c86b 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -89,6 +89,8 @@ module_param(oos_shadow, bool, 0644); > } > #endif > > +#define PTE_PREFETCH_NUM 16 > + > #define PT_FIRST_AVAIL_BITS_SHIFT 9 > #define PT64_SECOND_AVAIL_BITS_SHIFT 52 > > @@ -2041,6 +2043,39 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu) > { > } > > +static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) > +{ > + struct kvm_mmu_page *sp; > + int index, i; > + > + sp = page_header(__pa(sptep)); > + WARN_ON(!sp->role.direct); > + index = sptep - sp->spt; > + > + for (i = index + 1; i < min(PT64_ENT_PER_PAGE, > + index + PTE_PREFETCH_NUM); i++) { > + gfn_t gfn; > + pfn_t pfn; > + u64 *spte = sp->spt + i; > + > + if (*spte != shadow_trap_nonpresent_pte) > + continue; > + > + gfn = sp->gfn + (i << ((sp->role.level - 1) * PT64_LEVEL_BITS)); > + > + pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn); > + if (is_error_pfn(pfn)) { > + kvm_release_pfn_clean(pfn); > + break; > + } > + if 
(pte_prefetch_topup_memory_cache(vcpu)) > + break; > + > + mmu_set_spte(vcpu, spte, ACC_ALL, ACC_ALL, 0, 0, 1, NULL, > + sp->role.level, gfn, pfn, true, false); Can only map with level > 1 if the host page matches the size. ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <4C16E867.8040606@cn.fujitsu.com>]
* [PATCH 6/6] KVM: MMU: trace pte prefetch [not found] ` <4C16E867.8040606@cn.fujitsu.com> @ 2010-06-15 2:47 ` Xiao Guangrong 2010-06-15 12:09 ` Avi Kivity 0 siblings, 1 reply; 23+ messages in thread From: Xiao Guangrong @ 2010-06-15 2:47 UTC (permalink / raw) To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM list Trace pte prefetch so that we can see what trouble we meet; it can help us to improve the prefetch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> --- arch/x86/kvm/mmu.c | 12 +++++++++++- arch/x86/kvm/mmutrace.h | 26 ++++++++++++++++++++++++++ arch/x86/kvm/paging_tmpl.h | 9 ++++++++- 3 files changed, 45 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 941c86b..0aaa18d 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -91,6 +91,11 @@ module_param(oos_shadow, bool, 0644); #define PTE_PREFETCH_NUM 16 +#define PREFETCH_SUCCESS 0 +#define PREFETCH_ERR_GFN2PFN 1 +#define PREFETCH_ERR_ALLOC_MEM 2 +#define PREFETCH_ERR_READ_GPTE 3 + #define PT_FIRST_AVAIL_BITS_SHIFT 9 #define PT64_SECOND_AVAIL_BITS_SHIFT 52 @@ -2066,11 +2071,16 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn); if (is_error_pfn(pfn)) { kvm_release_pfn_clean(pfn); + trace_pte_prefetch(true, PREFETCH_ERR_GFN2PFN); break; } - if (pte_prefetch_topup_memory_cache(vcpu)) + + if (pte_prefetch_topup_memory_cache(vcpu)) { + trace_pte_prefetch(true, PREFETCH_ERR_ALLOC_MEM); break; + } + trace_pte_prefetch(true, PREFETCH_SUCCESS); mmu_set_spte(vcpu, spte, ACC_ALL, ACC_ALL, 0, 0, 1, NULL, sp->role.level, gfn, pfn, true, false); } diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h index 3aab0f0..1c3e84e 100644 --- a/arch/x86/kvm/mmutrace.h +++ b/arch/x86/kvm/mmutrace.h @@ -195,6 +195,32 @@ DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_prepare_zap_page, TP_ARGS(sp) ); + +#define pte_prefetch_err \ + {PREFETCH_SUCCESS, "SUCCESS" }, \ + {PREFETCH_ERR_GFN2PFN, "ERR_GFN2PFN" }, \ + 
{PREFETCH_ERR_ALLOC_MEM, "ERR_ALLOC_MEM" }, \ + {PREFETCH_ERR_READ_GPTE, "ERR_READ_GPTE" } + +TRACE_EVENT( + pte_prefetch, + TP_PROTO(bool direct, int err_code), + TP_ARGS(direct, err_code), + + TP_STRUCT__entry( + __field(bool, direct) + __field(int, err_code) + ), + + TP_fast_assign( + __entry->direct = direct; + __entry->err_code = err_code; + ), + + TP_printk("%s %s", __entry->direct ? "direct" : "no-direct", + __print_symbolic(__entry->err_code, pte_prefetch_err)) +); + #endif /* _TRACE_KVMMMU_H */ #undef TRACE_INCLUDE_PATH diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index af4e041..64f2acb 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -331,6 +331,8 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, u64 *sptep) page = gfn_to_page_atomic(vcpu->kvm, sp->gfn); if (is_error_page(page)) { kvm_release_page_clean(page); + trace_pte_prefetch(false, + PREFETCH_ERR_READ_GPTE); break; } table = kmap_atomic(page, KM_USER0); @@ -353,11 +355,16 @@ gfn_mapping: pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn); if (is_error_pfn(pfn)) { kvm_release_pfn_clean(pfn); + trace_pte_prefetch(false, PREFETCH_ERR_GFN2PFN); break; } - if (pte_prefetch_topup_memory_cache(vcpu)) + if (pte_prefetch_topup_memory_cache(vcpu)) { + trace_pte_prefetch(false, PREFETCH_ERR_ALLOC_MEM); break; + } + + trace_pte_prefetch(false, PREFETCH_SUCCESS); mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0, dirty, NULL, sp->role.level, gfn, pfn, true, false); -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 6/6] KVM: MMU: trace pte prefetch 2010-06-15 2:47 ` [PATCH 6/6] KVM: MMU: trace pte prefetch Xiao Guangrong @ 2010-06-15 12:09 ` Avi Kivity 2010-06-17 7:27 ` Xiao Guangrong 0 siblings, 1 reply; 23+ messages in thread From: Avi Kivity @ 2010-06-15 12:09 UTC (permalink / raw) To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM list On 06/15/2010 05:47 AM, Xiao Guangrong wrote: > Trace pte prefetch to see what trouble we meet, if can help us to > improve the prefetch > > Signed-off-by: Xiao Guangrong<xiaoguangrong@cn.fujitsu.com> > --- > arch/x86/kvm/mmu.c | 12 +++++++++++- > arch/x86/kvm/mmutrace.h | 26 ++++++++++++++++++++++++++ > arch/x86/kvm/paging_tmpl.h | 9 ++++++++- > 3 files changed, 45 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index 941c86b..0aaa18d 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -91,6 +91,11 @@ module_param(oos_shadow, bool, 0644); > > #define PTE_PREFETCH_NUM 16 > > +#define PREFETCH_SUCCESS 0 > +#define PREFETCH_ERR_GFN2PFN 1 > +#define PREFETCH_ERR_ALLOC_MEM 2 > +#define PREFETCH_ERR_READ_GPTE 3 > + > #define PT_FIRST_AVAIL_BITS_SHIFT 9 > #define PT64_SECOND_AVAIL_BITS_SHIFT 52 > > @@ -2066,11 +2071,16 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) > pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn); > if (is_error_pfn(pfn)) { > kvm_release_pfn_clean(pfn); > + trace_pte_prefetch(true, PREFETCH_ERR_GFN2PFN); > break; > } > What about virtual addresses and gptes? They can be helpful too. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 6/6] KVM: MMU: trace pte prefetch 2010-06-15 12:09 ` Avi Kivity @ 2010-06-17 7:27 ` Xiao Guangrong 0 siblings, 0 replies; 23+ messages in thread From: Xiao Guangrong @ 2010-06-17 7:27 UTC (permalink / raw) To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM list Avi Kivity wrote: >> @@ -2066,11 +2071,16 @@ static void direct_pte_prefetch(struct >> kvm_vcpu *vcpu, u64 *sptep) >> pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn); >> if (is_error_pfn(pfn)) { >> kvm_release_pfn_clean(pfn); >> + trace_pte_prefetch(true, PREFETCH_ERR_GFN2PFN); >> break; >> } >> > > What about virtual addresses and gptes? They can be helpful too. > OK, will do it in the next version. ^ permalink raw reply [flat|nested] 23+ messages in thread
Thread overview: 23+ messages
[not found] <4C16E6ED.7020009@cn.fujitsu.com>
2010-06-15 2:46 ` [PATCH 1/6] KVM: MMU: fix gfn got in kvm_mmu_page_get_gfn() Xiao Guangrong
[not found] ` <4C16E75F.6020003@cn.fujitsu.com>
2010-06-15 2:46 ` [PATCH 2/6] KVM: MMU: fix conflict access permissions in direct sp Xiao Guangrong
[not found] ` <4C16E7AD.1060101@cn.fujitsu.com>
2010-06-15 2:46 ` [PATCH 3/6] KVM: MMU: introduce gfn_to_page_atomic() and gfn_to_pfn_atomic() Xiao Guangrong
2010-06-15 11:22 ` Avi Kivity
2010-06-16 7:59 ` Andi Kleen
2010-06-16 8:05 ` Avi Kivity
2010-06-16 8:49 ` Andi Kleen
2010-06-16 9:40 ` Avi Kivity
2010-06-16 10:51 ` huang ying
2010-06-16 11:08 ` Avi Kivity
2010-06-17 6:46 ` Xiao Guangrong
[not found] ` <4C16E7F4.5060801@cn.fujitsu.com>
2010-06-15 2:46 ` [PATCH 4/6] KVM: MMU: introduce mmu_topup_memory_cache_atomic() Xiao Guangrong
[not found] ` <4C16E82E.5010306@cn.fujitsu.com>
2010-06-15 2:47 ` [PATCH 5/6] KVM: MMU: prefetch ptes when intercepted guest #PF Xiao Guangrong
2010-06-15 11:41 ` Avi Kivity
2010-06-17 7:25 ` Xiao Guangrong
2010-06-17 8:07 ` Avi Kivity
2010-06-17 9:04 ` Xiao Guangrong
2010-06-17 9:20 ` Avi Kivity
2010-06-17 9:29 ` Xiao Guangrong
2010-06-16 3:55 ` Marcelo Tosatti
[not found] ` <4C16E867.8040606@cn.fujitsu.com>
2010-06-15 2:47 ` [PATCH 6/6] KVM: MMU: trace pte prefetch Xiao Guangrong
2010-06-15 12:09 ` Avi Kivity
2010-06-17 7:27 ` Xiao Guangrong