From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [patch 09/13] KVM: MMU: out of sync shadow core
Date: Sun, 07 Sep 2008 14:01:42 +0300
Message-ID: <48C3B496.20905@qumranet.com>
References: <20080906184822.560099087@localhost.localdomain>
 <20080906192431.211131067@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Marcelo Tosatti
Return-path:
Received: from il.qumranet.com ([212.179.150.194]:33879 "EHLO il.qumranet.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753238AbYIGLBp
 (ORCPT ); Sun, 7 Sep 2008 07:01:45 -0400
In-Reply-To: <20080906192431.211131067@localhost.localdomain>
Sender: kvm-owner@vger.kernel.org
List-ID:

Marcelo Tosatti wrote:
> Allow global and single-root, single-role-per-gfn leaf shadowed
> pagetables to be unsynced.
>
> Global unsync pages are saved into a per-vm array, synced on cr4/cr0
> writes.
>

Why not a list?

> Non-global unsync pages are linked off their root shadow page, synced
> on cr3/cr4/cr0 writes.
>
> Some of this logic is simplistic and could be smarter (page_multimapped
> and the full root sync on higher level pagetable sharing).
>
> Also unsyncing of non-leaf nodes might be interesting (but more
> complicated).
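To make sure I'm reading the lifecycle right, here is a user-space toy of the
resync policy the changelog describes: global unsync pages sit in a small
per-vm array and are resynced on cr0/cr4 writes, while non-global ones hang
off their root and are resynced on cr3 writes as well.  All names below are
mine, not the patch's, and the toy simply overwrites the oldest array slot
when the array fills, where the real code would have to sync the evicted
page first:

```c
/* Toy model of the unsync bookkeeping described above; every name here
 * is invented for illustration, none is taken from the patch. */
#include <assert.h>
#include <stddef.h>

#define OOS_GLOBAL_MAX 7	/* mirrors the per-vm array size */

struct shadow_page {
	int unsync;			/* out of sync with the guest page? */
	int global;			/* mapped with the PGE global bit? */
	struct shadow_page *next;	/* link on the root's unsync list */
};

struct vm {
	struct shadow_page *oos_global[OOS_GLOBAL_MAX];
	unsigned oos_global_idx;
	struct shadow_page *root_unsync;	/* non-global unsync pages */
};

static void mark_unsync(struct vm *vm, struct shadow_page *sp)
{
	sp->unsync = 1;
	if (sp->global) {
		/* toy: overwrite the oldest slot; real code would have
		 * to sync the evicted page first */
		vm->oos_global[vm->oos_global_idx++ % OOS_GLOBAL_MAX] = sp;
	} else {
		sp->next = vm->root_unsync;
		vm->root_unsync = sp;
	}
}

static void sync_global(struct vm *vm)
{
	for (unsigned i = 0; i < OOS_GLOBAL_MAX; i++)
		if (vm->oos_global[i]) {
			vm->oos_global[i]->unsync = 0;
			vm->oos_global[i] = NULL;
		}
}

static void sync_root(struct vm *vm)
{
	for (struct shadow_page *sp = vm->root_unsync; sp; sp = sp->next)
		sp->unsync = 0;
	vm->root_unsync = NULL;
}

/* cr3 writes resync only the non-global pages;
 * cr0/cr4 writes resync the global ones as well */
static void write_cr3(struct vm *vm) { sync_root(vm); }
static void write_cr4(struct vm *vm) { sync_root(vm); sync_global(vm); }
```

If I have that right, the cr3 path never needs to look at the global array
at all.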
>
> +static struct kvm_mmu_page *kvm_mmu_lookup_page_root(struct kvm_vcpu *vcpu,
> +						      gfn_t gfn)
> +{
> +	unsigned index;
> +	struct hlist_head *bucket;
> +	struct kvm_mmu_page *sp;
> +	struct hlist_node *node;
> +	struct kvm *kvm = vcpu->kvm;
> +	int level = vcpu->arch.mmu.root_level;
> +	if (!is_long_mode(vcpu) && is_pae(vcpu))
> +		level--;
> +
> +	pgprintk("%s: looking for gfn %lx\n", __func__, gfn);
> +	index = kvm_page_table_hashfn(gfn);
> +	bucket = &kvm->arch.mmu_page_hash[index];
> +	hlist_for_each_entry(sp, node, bucket, hash_link)
> +		if (sp->gfn == gfn && !sp->role.metaphysical
> +		    && !sp->role.invalid && sp->role.level == level) {
> +			pgprintk("%s: found role %x\n",
> +				 __func__, sp->role.word);
> +			return sp;
> +		}
> +	return NULL;
> +}
>

I'm worried about the complexity this (and the rest) introduces.  A
possible alternative is:

- for non-leaf pages, including roots, add an 'unsync_children' flag
- when marking a page unsync, set the flag recursively on all parents
- when switching cr3, recursively descend to locate unsynced leaves,
  clearing the flags along the way
- to speed this up, put a bitmap with one bit per pte in the pages
  (512 bits = 64 bytes)
- the bitmap can be externally allocated to save space, or not

This means we no longer have to worry about multiple roots, about a page
acquiring another root while it is unsynced, etc.

> @@ -963,8 +1112,24 @@ static struct kvm_mmu_page *kvm_mmu_get_
>  		 gfn, role.word);
>  	index = kvm_page_table_hashfn(gfn);
>  	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
> -	hlist_for_each_entry(sp, node, bucket, hash_link)
> -		if (sp->gfn == gfn && sp->role.word == role.word) {
> +	hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
> +		if (sp->gfn == gfn) {
> +			/*
> +			 * If a pagetable becomes referenced by more than one
> +			 * root, or has multiple roles, unsync it and disable
> +			 * oos.  For higher level pgtables the entire tree
> +			 * has to be synced.
> +			 */
> +			if (sp->root_gfn != root_gfn) {
> +				kvm_set_pg_inuse(sp);

What does inuse mean exactly?

> +				if (set_shared_mmu_page(vcpu, sp))
> +					tmp = bucket->first;
> +				kvm_clear_pg_inuse(sp);

Cleared here?

> +				unsyncable = 0;
> +			}
> +			if (sp->role.word != role.word)
> +				continue;
> +
>  			mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>  			pgprintk("%s: found\n", __func__);
>  			return sp;

> --- kvm.orig/include/asm-x86/kvm_host.h
> +++ kvm/include/asm-x86/kvm_host.h
> @@ -179,6 +179,10 @@ union kvm_mmu_page_role {
>  struct kvm_mmu_page {
>  	struct list_head link;
>  	struct hlist_node hash_link;
> +	/* FIXME: one list_head is enough */
> +	struct list_head unsync_pages;
> +	struct list_head oos_link;

That's okay, we may allow OOS roots one day.

> +	gfn_t root_gfn; /* root this pagetable belongs to, -1 if multimapped */
>
>  	/*
>  	 * The following two entries are used to key the shadow page in the
> @@ -362,6 +366,8 @@ struct kvm_arch{
>  	unsigned int n_requested_mmu_pages;
>  	unsigned int n_alloc_mmu_pages;
>  	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> +	struct kvm_mmu_page *oos_global_pages[7];
> +	unsigned oos_global_idx;

What does the index mean?  An lru pointer?

I became a little unsynced myself reading the patch.  It's very complex.
We need to change something to keep it maintainable: either a better data
structure, or disallowing a parent to be zapped while any of its children
are alive.

What happens when sp->global changes its value while a page is oos?  Can
we even detect a nonglobal->global change?  Maybe, instead of a flag, add
one counter for active ptes and one for global ptes; a page is global if
the counters match.  However, most likely it doesn't matter at all.

--
error compiling committee.c: too many arguments to function