From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marcelo Tosatti
Subject: Re: [patch 09/13] KVM: MMU: out of sync shadow core
Date: Mon, 8 Sep 2008 04:19:33 -0300
Message-ID: <20080908071933.GC1014@dmt.cnet>
References: <20080906184822.560099087@localhost.localdomain> <20080906192431.211131067@localhost.localdomain> <48C3B496.20905@qumranet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: kvm@vger.kernel.org
To: Avi Kivity
Return-path:
Received: from mx1.redhat.com ([66.187.233.31]:42719 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752168AbYIHHUi (ORCPT ); Mon, 8 Sep 2008 03:20:38 -0400
Content-Disposition: inline
In-Reply-To: <48C3B496.20905@qumranet.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Sun, Sep 07, 2008 at 02:01:42PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Allow global and single-root, single-role-per-gfn leaf shadowed
>> pagetables to be unsynced.
>>
>> Global unsync pages are saved into a per-vm array, synced on cr4/cr0
>> writes.
>>
>
> Why not a list?

Could be.

> I'm worried about the complexity this (and the rest) introduces.
>
> A possible alternative is:
>
> - for non-leaf pages, including roots, add a 'unsync_children' flag.
> - when marking a page unsync, set the flag recursively on all parents
> - when switching cr3, recursively descend to locate unsynced leaves,
>   clearing flags along the way
> - to speed this up, put a bitmap with 1 bit per pte in the pages (512
>   bits = 64 bytes)
> - the bitmap can be externally allocated to save space, or not
>
> This means we no longer have to worry about multiple roots, when a page
> acquires another root while it is unsynced, etc.

I thought about that when you first mentioned it, but it seems more
complex than the current structure. Remember you have to clear the
unsync_children flag on resync, which means walking up the parents and
checking whether this was the last unsynced child. That, and the bitmap
space.
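To make the cost concrete, here is a toy sketch of that scheme in plain C. This is not kernel code: struct shadow_page, mark_unsync(), resync() and the fixed-size child array are all made up for illustration, and a real shadow page can have multiple parents.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CHILDREN 4

struct shadow_page {
	struct shadow_page *parent;		/* single parent, for simplicity */
	struct shadow_page *child[MAX_CHILDREN];
	bool unsync;				/* leaf: out of sync with guest */
	bool unsync_children;			/* non-leaf: some descendant is unsync */
};

/* Marking a leaf unsync sets unsync_children on every ancestor. */
static void mark_unsync(struct shadow_page *sp)
{
	sp->unsync = true;
	for (struct shadow_page *p = sp->parent; p; p = p->parent)
		p->unsync_children = true;
}

static bool any_child_unsync(const struct shadow_page *sp)
{
	for (int i = 0; i < MAX_CHILDREN; i++) {
		const struct shadow_page *c = sp->child[i];
		if (c && (c->unsync || c->unsync_children))
			return true;
	}
	return false;
}

/*
 * The resync side is where the extra work shows up: an ancestor's
 * unsync_children flag may only be cleared if no other unsynced
 * subtree remains below it, so each resync walks up the parents.
 */
static void resync(struct shadow_page *sp)
{
	sp->unsync = false;
	for (struct shadow_page *p = sp->parent; p; p = p->parent) {
		if (any_child_unsync(p))
			break;		/* another unsynced child remains */
		p->unsync_children = false;
	}
}
```

With two unsynced leaves under one parent, resyncing the first leaf leaves unsync_children set on the parent and root; only resyncing the second clears them.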
And see the comments about multiple roles below.

>> @@ -963,8 +1112,24 @@ static struct kvm_mmu_page *kvm_mmu_get_
>>  		 gfn, role.word);
>>  	index = kvm_page_table_hashfn(gfn);
>>  	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
>> -	hlist_for_each_entry(sp, node, bucket, hash_link)
>> -		if (sp->gfn == gfn && sp->role.word == role.word) {
>> +	hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
>> +		if (sp->gfn == gfn) {
>> +			/*
>> +			 * If a pagetable becomes referenced by more than one
>> +			 * root, or has multiple roles, unsync it and disable
>> +			 * oos. For higher level pgtables the entire tree
>> +			 * has to be synced.
>> +			 */
>> +			if (sp->root_gfn != root_gfn) {
>> +				kvm_set_pg_inuse(sp);
>
> What does inuse mean exactly?

That we're going to access struct kvm_mmu_page, so kvm_sync_page won't
free it (also used for global->nonglobal resync).

>> +				if (set_shared_mmu_page(vcpu, sp))
>> +					tmp = bucket->first;
>> +				kvm_clear_pg_inuse(sp);
>
> Cleared here?

Can't be zapped after this point.

>> +				unsyncable = 0;
>> +			}
>> +			if (sp->role.word != role.word)
>> +				continue;
>> +
>>  			mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>>  			pgprintk("%s: found\n", __func__);
>>  			return sp;
>> --- kvm.orig/include/asm-x86/kvm_host.h
>> +++ kvm/include/asm-x86/kvm_host.h
>> @@ -179,6 +179,10 @@ union kvm_mmu_page_role {
>>  struct kvm_mmu_page {
>>  	struct list_head link;
>>  	struct hlist_node hash_link;
>> +	/* FIXME: one list_head is enough */
>> +	struct list_head unsync_pages;
>> +	struct list_head oos_link;
>
> That's okay, we may allow OOS roots one day.
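The inuse pin semantics can be reduced to a toy model (not the real kvm_mmu_page; toy_page and toy_try_zap are invented names): while the pin is held the page may be synced but not freed, so the caller's pointer stays valid; once cleared, reclaim is free to zap it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the "inuse" pin: set while a caller is still
 * dereferencing the shadow page across a resync, cleared once the
 * caller is done with the pointer.
 */
struct toy_page {
	bool inuse;	/* pinned: a caller still holds the pointer */
	bool zapped;	/* freed / removed from the hash bucket */
};

/* What a kvm_sync_page-style reclaim path must respect. */
static bool toy_try_zap(struct toy_page *p)
{
	if (p->inuse)
		return false;	/* syncing is allowed, freeing is not */
	p->zapped = true;
	return true;
}
```

The `tmp = bucket->first` in the quoted hunk reflects the other half of this: syncing may delete other entries in the same hash bucket, so the safe walk restarts from the bucket head afterwards.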
>
>> +	gfn_t root_gfn; /* root this pagetable belongs to, -1 if multimapped */
>>  	/*
>>  	 * The following two entries are used to key the shadow page in the
>> @@ -362,6 +366,8 @@ struct kvm_arch{
>>  	unsigned int n_requested_mmu_pages;
>>  	unsigned int n_alloc_mmu_pages;
>>  	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
>> +	struct kvm_mmu_page *oos_global_pages[7];
>> +	unsigned oos_global_idx;
>
> What does the index mean? An lru pointer?

Yes, the last saved index into the array.

> I became a little unsynced myself reading the patch. It's very complex.

Can you go into detail? Worrying about multiple roots is more about code
change (passing root_gfn down to mmu_get_page etc) than structural
complexity, I think. It boils down to:

	if (sp->root_gfn != root_gfn) {
		kvm_set_pg_inuse(sp);
		if (set_shared_mmu_page(vcpu, sp))
			tmp = bucket->first;
		kvm_clear_pg_inuse(sp);
	}

And this also deals with the case of a pagetable shadowed in different
modes/roles. You'd still have to deal with that by keeping unsync
information all the way up to the root.

> We need to change something to keep it maintainable. Either a better
> data structure, or disallowing a parent to be zapped while any of its
> children are alive.

What is the problem with that? And what would the alternative be, to zap
all children first?

So more details please, what exactly is annoying you:

- Awareness of multiple roots in the current form? I agree it's not very
  elegant.
- The fact that the hash table bucket and active_mmu_page
  for_each_entry_safe walks are unsafe because several list entries (the
  unsynced leaves) can be deleted?

> What happens when a sp->global changes its value while a page is oos?

It's resynced on the global->nonglobal change.

> Can we even detect a nonglobal->global change? Maybe instead of a flag,
> add a counter for active ptes and global ptes, and a page is global if
> the counters match. However most likely it doesn't matter at all.

Yeah, it seems a nonglobal->global change is quite uncommon.
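For the record, the intended use of the oos_global_pages[7] array and the last-saved index can be sketched like this. Only the field names come from the patch; everything else (oos_global_save, oos_global_sync_all, the eviction-by-resync on wraparound) is a simplified illustration, not the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OOS_GLOBAL_PAGES 7

struct mmu_page {
	bool unsync;
};

struct toy_arch {
	struct mmu_page *oos_global_pages[OOS_GLOBAL_PAGES];
	unsigned oos_global_idx;	/* last saved slot */
};

/*
 * Save a global unsync page at the next slot; if the slot is
 * occupied, the old entry is resynced first (evicted).
 */
static void oos_global_save(struct toy_arch *a, struct mmu_page *sp)
{
	unsigned i = a->oos_global_idx = (a->oos_global_idx + 1) % OOS_GLOBAL_PAGES;

	if (a->oos_global_pages[i])
		a->oos_global_pages[i]->unsync = false;	/* resync evicted page */
	a->oos_global_pages[i] = sp;
	sp->unsync = true;
}

/* On a cr0/cr4 write, resync and drop every saved global page. */
static void oos_global_sync_all(struct toy_arch *a)
{
	for (int i = 0; i < OOS_GLOBAL_PAGES; i++) {
		if (a->oos_global_pages[i]) {
			a->oos_global_pages[i]->unsync = false;
			a->oos_global_pages[i] = NULL;
		}
	}
}
```

So the index is indeed just an lru-style rotating pointer: the eighth save wraps around and forces the oldest entry back in sync.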
Oh, and another argument in favour of atomic resync is that you can do
it from mmu_get_page (for the multiple-role case), and from within
mmu_set_spte (for the global->nonglobal change).