From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [patch 07/13] KVM: MMU: mode specific sync_page
Date: Mon, 08 Sep 2008 12:50:00 +0300
Message-ID: <48C4F548.3060909@qumranet.com>
References: <20080906184822.560099087@localhost.localdomain> <20080906192431.043506161@localhost.localdomain> <48C3A455.5080100@qumranet.com> <20080908060354.GA1014@dmt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: KVM list
To: Marcelo Tosatti
Return-path:
Received: from il.qumranet.com ([212.179.150.194]:35063 "EHLO il.qumranet.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751504AbYIHJuD (ORCPT ); Mon, 8 Sep 2008 05:50:03 -0400
In-Reply-To: <20080908060354.GA1014@dmt.cnet>
Sender: kvm-owner@vger.kernel.org
List-ID:

Marcelo Tosatti wrote:
> On Sun, Sep 07, 2008 at 12:52:21PM +0300, Avi Kivity wrote:
>
>> What if vcpu0 is in mode X, while vcpu1 is in mode Y. vcpu0 writes to
>> some pagetable, causing both mode X and mode Y shadows to become
>> unsynced, so on the next resync (either by vcpu0 or vcpu1) we need to
>> sync both modes.
>>
>
> From the oos core patch:
>
> -	hlist_for_each_entry(sp, node, bucket, hash_link)
> -		if (sp->gfn == gfn && sp->role.word == role.word) {
> +	hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
> +		if (sp->gfn == gfn) {
> +			/*
> +			 * If a pagetable becomes referenced by more than one
> +			 * root, or has multiple roles, unsync it and disable
> +			 * oos. For higher level pgtables the entire tree
> +			 * has to be synced.
> +			 */
> +			if (sp->root_gfn != root_gfn) {
> +				kvm_set_pg_inuse(sp);
> +				if (set_shared_mmu_page(vcpu, sp))
> +					tmp = bucket->first;
> +				kvm_clear_pg_inuse(sp);
> +				unsyncable = 0;
> +			}
>
> So as soon as a pagetable is shadowed with different modes, it is
> resynced and unsyncing is disabled.

Okay. But the complexity here, especially with rarely used cases like
multiple-mode shadows, is frightening.
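The scenario above (a guest write leaving both the mode X and mode Y
shadows unsynced, so a later resync by either vcpu must pick up both)
can be sketched with a toy model. None of these names exist in KVM; it
only illustrates why the resync walk must not filter on role:

```c
/*
 * Toy model (not the real KVM code) of multi-mode shadow resync.
 * A guest page at gfn may be shadowed once per paging mode ("role").
 * A guest write leaves every shadow of that gfn unsynced, so resync
 * must sync all shadows of the gfn, whatever their role.
 */
#include <assert.h>

#define MAX_SHADOWS 8

struct shadow_page {
	unsigned long gfn;	/* guest frame this page shadows */
	int role;		/* paging mode, e.g. 32-bit vs PAE */
	int unsync;		/* 1 if the guest wrote since last sync */
};

static struct shadow_page shadows[MAX_SHADOWS];
static int nr_shadows;

static void add_shadow(unsigned long gfn, int role)
{
	shadows[nr_shadows].gfn = gfn;
	shadows[nr_shadows].role = role;
	shadows[nr_shadows].unsync = 0;
	nr_shadows++;
}

/* A guest write to gfn dirties every shadow of it, in every mode. */
static void guest_write(unsigned long gfn)
{
	for (int i = 0; i < nr_shadows; i++)
		if (shadows[i].gfn == gfn)
			shadows[i].unsync = 1;
}

/* Resync must not filter on role: both mode X and mode Y copies sync. */
static int resync(unsigned long gfn)
{
	int synced = 0;

	for (int i = 0; i < nr_shadows; i++)
		if (shadows[i].gfn == gfn && shadows[i].unsync) {
			shadows[i].unsync = 0;
			synced++;
		}
	return synced;
}
```

If resync filtered on `role.word` the way the pre-patch hash lookup
does, one mode's shadow would stay stale after the other vcpu's resync.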
>>> +
>>> +		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, *pt);
>>> +		/* user */
>>> +		if (pte_access & ACC_USER_MASK)
>>> +			spte |= shadow_user_mask;
>>>
>>
>> There are some special cases involving cr0.wp=0 and the user mask, so
>> spte.u is not correlated exactly with gpte.u.
>>
>
> How come?
>

When cr0.wp=0, the cpu ignores pte.w for cpl=0 accesses. kvm requires
cr0.wp=1 (since we need to write protect pages, for many reasons, like
emulating pte.dirty). This is how we handle pte.u=1 + pte.w=0:

- for cpl 0 accesses, we set spte.w=1 (to allow the write) and
  spte.u=0 (to forbid cpl>0 accesses)
- for cpl>0 accesses, we set spte.w=0 (to forbid userspace write
  accesses) and spte.u=1 (to allow cpl>0 read accesses)

This works well except when the accesses keep alternating between
userspace and kernel.

>>> +		/* guest->shadow accessed sync */
>>> +		if (!(*pt & PT_ACCESSED_MASK))
>>> +			spte &= ~PT_ACCESSED_MASK;
>>>
>>
>> spte shouldn't be accessible at all if gpte is not accessed, so we can
>> set gpte.a on the next access (similar to spte not being writeable if
>> gpte is not dirty).
>>
>
> Right. Perhaps accessed bit synchronization to the guest could be
> performed lazily somehow, so as to avoid a vmexit on every first page
> access.
>

I don't think this is doable (well, you can do it if you make the guest
page table not present, but then even reading the accessed bit faults).

>>> +		set_shadow_pte(&sp->spt[i], spte);
>>>
>>
>> What if permissions are reduced?
>>
>
> Then a local TLB flush is needed. Flushing the TLBs of remote vcpus
> should be done by the guest AFAICS.
>

Hm. It depends on why they are reduced. If the page became shadowed,
then we are responsible. I don't think this is the case here, so a
local flush is likely sufficient.

-- 
error compiling committee.c: too many arguments to function