From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [patch 09/13] KVM: MMU: out of sync shadow core
Date: Sun, 07 Sep 2008 14:01:42 +0300
Message-ID: <48C3B496.20905@qumranet.com>
References: <20080906184822.560099087@localhost.localdomain>
 <20080906192431.211131067@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Marcelo Tosatti
Return-path:
Received: from il.qumranet.com ([212.179.150.194]:33879 "EHLO il.qumranet.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753238AbYIGLBp
 (ORCPT ); Sun, 7 Sep 2008 07:01:45 -0400
In-Reply-To: <20080906192431.211131067@localhost.localdomain>
Sender: kvm-owner@vger.kernel.org
List-ID:

Marcelo Tosatti wrote:
> Allow global and single-root, single-role-per-gfn leaf shadowed
> pagetables to be unsynced.
>
> Global unsync pages are saved into a per-vm array, synced on cr4/cr0
> writes.
>

Why not a list?

> Non-global unsync pages are linked off their root shadow page, synced
> on cr3/cr4/cr0 writes.
>
> Some of this logic is simplistic and could be smarter (page_multimapped
> and the full root sync on higher level pagetable sharing).
>
> Also unsyncing of non-leaf nodes might be interesting (but more
> complicated).
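To make sure I'm reading the lifecycle right, here is a user-space toy of the
resync policy the changelog describes: global unsync pages sit in a small
per-vm array and are resynced on cr0/cr4 writes, while non-global ones hang
off their root and are resynced on cr3 writes as well.  All names below are
mine, not the patch's, and the toy simply overwrites the oldest array slot
when the array fills, where the real code would have to sync the evicted
page first:

```c
/* Toy model of the unsync bookkeeping described above; every name here
 * is invented for illustration, none is taken from the patch. */
#include <assert.h>
#include <stddef.h>

#define OOS_GLOBAL_MAX 7	/* mirrors the per-vm array size */

struct shadow_page {
	int unsync;			/* out of sync with the guest page? */
	int global;			/* mapped with the PGE global bit? */
	struct shadow_page *next;	/* link on the root's unsync list */
};

struct vm {
	struct shadow_page *oos_global[OOS_GLOBAL_MAX];
	unsigned oos_global_idx;
	struct shadow_page *root_unsync;	/* non-global unsync pages */
};

static void mark_unsync(struct vm *vm, struct shadow_page *sp)
{
	sp->unsync = 1;
	if (sp->global) {
		/* toy: overwrite the oldest slot; real code would have
		 * to sync the evicted page first */
		vm->oos_global[vm->oos_global_idx++ % OOS_GLOBAL_MAX] = sp;
	} else {
		sp->next = vm->root_unsync;
		vm->root_unsync = sp;
	}
}

static void sync_global(struct vm *vm)
{
	for (unsigned i = 0; i < OOS_GLOBAL_MAX; i++)
		if (vm->oos_global[i]) {
			vm->oos_global[i]->unsync = 0;
			vm->oos_global[i] = NULL;
		}
}

static void sync_root(struct vm *vm)
{
	for (struct shadow_page *sp = vm->root_unsync; sp; sp = sp->next)
		sp->unsync = 0;
	vm->root_unsync = NULL;
}

/* cr3 writes resync only the non-global pages;
 * cr0/cr4 writes resync the global ones as well */
static void write_cr3(struct vm *vm) { sync_root(vm); }
static void write_cr4(struct vm *vm) { sync_root(vm); sync_global(vm); }
```

If I have that right, the cr3 path never needs to look at the global array
at all.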
>
> +static struct kvm_mmu_page *kvm_mmu_lookup_page_root(struct kvm_vcpu *vcpu,
> +						      gfn_t gfn)
> +{
> +	unsigned index;
> +	struct hlist_head *bucket;
> +	struct kvm_mmu_page *sp;
> +	struct hlist_node *node;
> +	struct kvm *kvm = vcpu->kvm;
> +	int level = vcpu->arch.mmu.root_level;
> +	if (!is_long_mode(vcpu) && is_pae(vcpu))
> +		level--;
> +
> +	pgprintk("%s: looking for gfn %lx\n", __func__, gfn);
> +	index = kvm_page_table_hashfn(gfn);
> +	bucket = &kvm->arch.mmu_page_hash[index];
> +	hlist_for_each_entry(sp, node, bucket, hash_link)
> +		if (sp->gfn == gfn && !sp->role.metaphysical
> +		    && !sp->role.invalid && sp->role.level == level) {
> +			pgprintk("%s: found role %x\n",
> +				 __func__, sp->role.word);
> +			return sp;
> +		}
> +	return NULL;
> +}
>

I'm worried about the complexity this (and the rest) introduces.  A
possible alternative is:

- for non-leaf pages, including roots, add an 'unsync_children' flag
- when marking a page unsync, set the flag recursively on all parents
- when switching cr3, recursively descend to locate unsynced leaves,
  clearing the flags along the way
- to speed this up, put a bitmap with one bit per pte in the pages
  (512 bits = 64 bytes)
- the bitmap can be externally allocated to save space, or not

This means we no longer have to worry about multiple roots, about a page
acquiring another root while it is unsynced, etc.

> @@ -963,8 +1112,24 @@ static struct kvm_mmu_page *kvm_mmu_get_
>  		 gfn, role.word);
>  	index = kvm_page_table_hashfn(gfn);
>  	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
> -	hlist_for_each_entry(sp, node, bucket, hash_link)
> -		if (sp->gfn == gfn && sp->role.word == role.word) {
> +	hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
> +		if (sp->gfn == gfn) {
> +			/*
> +			 * If a pagetable becomes referenced by more than one
> +			 * root, or has multiple roles, unsync it and disable
> +			 * oos.  For higher level pgtables the entire tree
> +			 * has to be synced.
> +			 */
> +			if (sp->root_gfn != root_gfn) {
> +				kvm_set_pg_inuse(sp);

What does inuse mean exactly?

> +				if (set_shared_mmu_page(vcpu, sp))
> +					tmp = bucket->first;
> +				kvm_clear_pg_inuse(sp);

Cleared here?

> +				unsyncable = 0;
> +			}
> +			if (sp->role.word != role.word)
> +				continue;
> +
>  			mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>  			pgprintk("%s: found\n", __func__);
>  			return sp;

> --- kvm.orig/include/asm-x86/kvm_host.h
> +++ kvm/include/asm-x86/kvm_host.h
> @@ -179,6 +179,10 @@ union kvm_mmu_page_role {
>  struct kvm_mmu_page {
>  	struct list_head link;
>  	struct hlist_node hash_link;
> +	/* FIXME: one list_head is enough */
> +	struct list_head unsync_pages;
> +	struct list_head oos_link;

That's okay, we may allow OOS roots one day.

> +	gfn_t root_gfn; /* root this pagetable belongs to, -1 if multimapped */
>
>  	/*
>  	 * The following two entries are used to key the shadow page in the
> @@ -362,6 +366,8 @@ struct kvm_arch{
>  	unsigned int n_requested_mmu_pages;
>  	unsigned int n_alloc_mmu_pages;
>  	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> +	struct kvm_mmu_page *oos_global_pages[7];
> +	unsigned oos_global_idx;

What does the index mean?  An lru pointer?

I became a little unsynced myself reading the patch.  It's very complex.
We need to change something to keep it maintainable: either a better data
structure, or disallowing a parent to be zapped while any of its children
are alive.

What happens when sp->global changes its value while a page is oos?  Can
we even detect a nonglobal->global change?  Maybe, instead of a flag, add
one counter for active ptes and one for global ptes; a page is global if
the counters match.  However, most likely it doesn't matter at all.

--
error compiling committee.c: too many arguments to function