Re: [patch 09/13] KVM: MMU: out of sync shadow core

From: Marcelo Tosatti <mtosatti@redhat.com>
To: Avi Kivity <avi@qumranet.com>
Cc: kvm@vger.kernel.org
Subject: Re: [patch 09/13] KVM: MMU: out of sync shadow core
Date: Thu, 11 Sep 2008 05:19:52 -0300
Message-ID: <20080911081952.GA18128@dmt.cnet>
In-Reply-To: <48C53BEE.4050402@qumranet.com>

On Mon, Sep 08, 2008 at 05:51:26PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>>> I'm worried about the complexity this (and the rest) introduces.
>>>
>>> A possible alternative is:
>>>
>>> - for non-leaf pages, including roots, add an 'unsync_children' flag.
>>> - when marking a page unsync, set the flag recursively on all parents
>>> - when switching cr3, recursively descend to locate unsynced leaves,
>>>   clearing flags along the way
>>> - to speed this up, put a bitmap with 1 bit per pte in the pages (512
>>>   bits = 64 bytes)
>>> - the bitmap can be externally allocated to save space, or not
>>>
>>> This means we no longer have to worry about multiple roots, when a
>>> page acquires another root while it is unsynced, etc.
>>>     
>>
>> I thought about that when you first mentioned it, but it seems more
>> complex than the current structure. Remember you have to clean the
>> unsynced flag on resync, which means walking up the parents to verify
>> whether this is the last unsynced child.
>>   
>
> No, if you have a false positive you can simply ignore it.
>
>> Other than the bitmap space.
>>   
>
> The bitmap space could be stored in a separate structure.   
> Alternatively, put a few u16s with indexes into the page header.  Would  
> be faster to walk as well, though less general.
>
>> And see comments about multiple roles below.
>>
>>   
>>>> @@ -963,8 +1112,24 @@ static struct kvm_mmu_page *kvm_mmu_get_
>>>>  		 gfn, role.word);
>>>>  	index = kvm_page_table_hashfn(gfn);
>>>>  	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
>>>> -	hlist_for_each_entry(sp, node, bucket, hash_link)
>>>> -		if (sp->gfn == gfn && sp->role.word == role.word) {
>>>> +	hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
>>>> +		if (sp->gfn == gfn) {
>>>> +			/*
>>>> + 			 * If a pagetable becomes referenced by more than one
>>>> + 			 * root, or has multiple roles, unsync it and disable
>>>> + 			 * oos. For higher level pgtables the entire tree
>>>> + 			 * has to be synced.
>>>> + 			 */
>>>> +			if (sp->root_gfn != root_gfn) {
>>>> +				kvm_set_pg_inuse(sp);
>>>>         
>>> What does inuse mean exactly?
>>>     
>>
>> That we're going to access struct kvm_mmu_page, so kvm_sync_page won't
>> free it (also used for global->nonglobal resync).
>>
>>   
>
> Couldn't it be passed as a parameter?

Yes it could.

>>> I became a little unsynced myself reading the patch.  It's very complex.
>>>     
>>
>> Can you go into detail? Worrying about multiple roots is more about
>> code change (passing root_gfn down to mmu_get_page etc) than structural
>> complexity I think. It boils down to
>>
>>                     if (sp->root_gfn != root_gfn) {
>>                         kvm_set_pg_inuse(sp);
>>                         if (set_shared_mmu_page(vcpu, sp))
>>                             tmp = bucket->first;
>>                         kvm_clear_pg_inuse(sp);
>>                     }
>>
>> And this also deals with the pagetable with shadows in different
>> modes/roles case. You'd still have to deal with that by keeping unsync
>> information all the way up to root.
>>
>>   
>
> I'm worried about the amount of state we add.  Whether a page is  
> single-root or multi-root, if it's in the same mode or multiple modes.   

You need similar complexity to handle pagetables with multiple roles
under the unsync-info-on-tree approach:

@@ -993,8 +1049,15 @@ static struct kvm_mmu_page *kvm_mmu_get_
                 gfn, role.word);
        index = kvm_page_table_hashfn(gfn);
        bucket = &vcpu->kvm->arch.mmu_page_hash[index];
-       hlist_for_each_entry(sp, node, bucket, hash_link)
-               if (sp->gfn == gfn && sp->role.word == role.word) {
+       hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
+               if (sp->gfn == gfn) {
+                       if (sp->role.word != role.word) {
+                               if (kvm_page_unsync(sp))
+                                       mmu_invalidate_unsync_page(vcpu, sp);
+                               unsyncable = 1;
+                               continue;
+                       }
+
                        mmu_page_add_parent_pte(vcpu, sp, parent_pte);
                        pgprintk("%s: found\n", __func__);
                        return sp;

And such pagetables with multiple shadows will have to be marked as
"unsyncable". That is pretty similar to what the current patchset does,
except that the patchset also accounts for multiple roots, not just
multiple shadows.

The other option, which is to keep the multiple shadows unsynced and have
them synced by their respective users, seems much more complex than
this.
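
For illustration only, the lookup behaviour in the hunk above can be modelled
in a few lines of userspace C. This is a minimal sketch, not code from the
patchset: struct toy_page, struct toy_lookup and toy_get_page() are
hypothetical names, the hash bucket is reduced to a singly linked list, and
"invalidating" an unsynced page is assumed to amount to clearing its flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a shadow page; not struct kvm_mmu_page. */
struct toy_page {
	uint64_t gfn;
	uint32_t role;
	bool unsync;
	struct toy_page *next;	/* next entry in the same hash bucket */
};

struct toy_lookup {
	struct toy_page *found;	/* shadow matching both gfn and role, or NULL */
	bool unsyncable;	/* another role shadows this gfn: keep it in sync */
	int invalidated;	/* unsynced shadows dropped during the walk */
};

/*
 * Walk one bucket. A shadow of the same gfn under a different role is
 * skipped, but its existence marks the gfn "unsyncable", and it is
 * invalidated first if it was out of sync.
 */
static struct toy_lookup toy_get_page(struct toy_page *bucket,
				      uint64_t gfn, uint32_t role)
{
	struct toy_lookup res = { NULL, false, 0 };
	struct toy_page *sp;

	for (sp = bucket; sp; sp = sp->next) {
		if (sp->gfn != gfn)
			continue;
		if (sp->role != role) {
			if (sp->unsync) {
				sp->unsync = false;	/* assumed invalidation */
				res.invalidated++;
			}
			res.unsyncable = true;
			continue;
		}
		res.found = sp;
		break;
	}
	/* Sanity check: a returned page always shadows the requested gfn. */
	assert(res.found == NULL || res.found->gfn == gfn);
	return res;
}
```

As in the hunk, the walk stops at the first exact match, so entries later in
the bucket are not examined in that case.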

> The problems with the large amount of state is that the number of  
> possible state transitions increases rapidly.
>
> So far we treat each page completely independently of other pages (apart  
> from the connectivity pointers), so we avoid the combinatorial  
> explosion.  The tree walk approach keeps that (at the expense of some  
> efficiency, unfortunately).
>
>>> or disallowing a parent to be zapped while any of its  children are 
>>> alive.
>>>     
>>
>> What is the problem with that? 
>
> It reduces the mmu flexibility.  If we (say) introduce an lru algorithm,  
> it is orthogonal to everything else in the mmu.  If we have a root/child  
> dependency, the lru has to know.
>
>> And what the alternative would be, to zap all children first?
>>   
>
> That has the disadvantage of allowing very bad corner cases if we are  
> forced to zap a root.
>
> I'd really like to avoid bad worst cases.
>
>> So more details please, what exactly is annoying you:
>>
>> - Awareness of multiple roots in the current form? I agree it's
>>   not very elegant.
>>   
>
> Yes.
>
>> - The fact that hash table bucket and active_mmu_page
>>   for_each_entry_safe walks are unsafe because several list
>>   entries (the unsynced leaves) can be deleted?
>>
>>   
>
> Hadn't even considered that...
>
> What worries me most is that everything is interconnected: multiple  
> modes, cr3 switch, out-of-sync, zapping via the inuse flag.  It's very  
> difficult for me to understand, what about someone new?
>
> We need to make this fit better.  We need to morph some mmu  
> infrastructure to something else, but we can't keep adding complexity.

Another point that came to mind: with the unsync-info-on-tree approach,
whenever a higher-level pagetable is zapped, all of its unsynced children
need to be zapped as well.

Since resync in atomic context is not an option at the moment, the only
choice is to zap the children, potentially wasting a lot of shadowed
entries. This is not an issue with the current approach, since unsynced
children are tied to roots, not to every parent all the way up to the root.

I'll measure how often that happens, but it seems like a bad side effect.
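
To make the trade-off concrete, here is a rough userspace model of the
unsync-info-on-tree idea. It is a sketch under stated assumptions, not an
implementation proposal: struct toy_sp and the toy_*() helpers are
hypothetical, each page has a single parent (real shadow pages can have
several parent ptes), the per-page bitmap is left out, and "zapping" is
reduced to unlinking unsynced leaves and counting them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PTES 512

/* Hypothetical toy shadow page; not struct kvm_mmu_page. */
struct toy_sp {
	struct toy_sp *parent;		/* single parent, for simplicity */
	struct toy_sp *child[TOY_PTES];	/* NULL where nothing is mapped */
	bool leaf;
	bool unsync;			/* meaningful for leaves only */
	bool unsync_children;		/* hint only: may be a false positive */
};

static void toy_link(struct toy_sp *parent, unsigned int idx,
		     struct toy_sp *child)
{
	assert(idx < TOY_PTES);
	parent->child[idx] = child;
	child->parent = parent;
}

/* Mark a leaf unsync and set the hint on every ancestor up to the root. */
static void toy_mark_unsync(struct toy_sp *leaf)
{
	struct toy_sp *sp;

	assert(leaf->leaf);
	leaf->unsync = true;
	for (sp = leaf->parent; sp; sp = sp->parent)
		sp->unsync_children = true;
}

/*
 * cr3 switch: descend only where the hint is set, resync unsynced leaves
 * and clear hints on the way. Returns the number of leaves resynced; a
 * stale hint simply yields 0 for that subtree.
 */
static int toy_sync_tree(struct toy_sp *sp)
{
	int i, n = 0;

	if (sp->leaf) {
		if (!sp->unsync)
			return 0;
		sp->unsync = false;
		return 1;
	}
	if (!sp->unsync_children)
		return 0;
	sp->unsync_children = false;
	for (i = 0; i < TOY_PTES; i++)
		if (sp->child[i])
			n += toy_sync_tree(sp->child[i]);
	return n;
}

/*
 * Zapping a higher-level page without being able to resync: every
 * unsynced leaf below it is unlinked. Returns how many were lost.
 */
static int toy_zap_unsync_children(struct toy_sp *sp)
{
	int i, n = 0;

	for (i = 0; i < TOY_PTES; i++) {
		struct toy_sp *c = sp->child[i];

		if (!c)
			continue;
		if (!c->leaf) {
			if (c->unsync_children)
				n += toy_zap_unsync_children(c);
			continue;
		}
		if (c->unsync) {
			c->unsync = false;
			c->parent = NULL;
			sp->child[i] = NULL;
			n++;
		}
	}
	sp->unsync_children = false;
	return n;
}
```

In this model the return value of toy_zap_unsync_children() is the number of
shadowed leaves thrown away by zapping one higher-level page, which is the
quantity worth measuring.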
