From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [patch 07/13] KVM: MMU: mode specific sync_page
Date: Mon, 08 Sep 2008 12:50:00 +0300
Message-ID: <48C4F548.3060909@qumranet.com>
References: <20080906184822.560099087@localhost.localdomain> <20080906192431.043506161@localhost.localdomain> <48C3A455.5080100@qumranet.com> <20080908060354.GA1014@dmt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: KVM list
To: Marcelo Tosatti
Return-path:
Received: from il.qumranet.com ([212.179.150.194]:35063 "EHLO il.qumranet.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751504AbYIHJuD (ORCPT ); Mon, 8 Sep 2008 05:50:03 -0400
In-Reply-To: <20080908060354.GA1014@dmt.cnet>
Sender: kvm-owner@vger.kernel.org
List-ID:

Marcelo Tosatti wrote:
> On Sun, Sep 07, 2008 at 12:52:21PM +0300, Avi Kivity wrote:
>
>> What if vcpu0 is in mode X, while vcpu1 is in mode Y. vcpu0 writes to
>> some pagetable, causing both mode X and mode Y shadows to become
>> unsynced, so on the next resync (either by vcpu0 or vcpu1) we need to
>> sync both modes.
>>
>
> From the oos core patch:
>
> -	hlist_for_each_entry(sp, node, bucket, hash_link)
> -		if (sp->gfn == gfn && sp->role.word == role.word) {
> +	hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
> +		if (sp->gfn == gfn) {
> +			/*
> +			 * If a pagetable becomes referenced by more than one
> +			 * root, or has multiple roles, unsync it and disable
> +			 * oos. For higher level pgtables the entire tree
> +			 * has to be synced.
> +			 */
> +			if (sp->root_gfn != root_gfn) {
> +				kvm_set_pg_inuse(sp);
> +				if (set_shared_mmu_page(vcpu, sp))
> +					tmp = bucket->first;
> +				kvm_clear_pg_inuse(sp);
> +				unsyncable = 0;
> +			}
>
> So as soon as a pagetable is shadowed with different modes, it is
> resynced and unsyncing is disabled.

Okay. But the complexity here, especially with rarely used cases like
multiple-mode shadows, is frightening.
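The scenario above (a guest write leaving both the mode X and mode Y
shadows unsynced, so a later resync by either vcpu must pick up both)
can be sketched with a toy model. None of these names exist in KVM; it
only illustrates why the resync walk must not filter on role:

```c
/*
 * Toy model (not the real KVM code) of multi-mode shadow resync.
 * A guest page at gfn may be shadowed once per paging mode ("role").
 * A guest write leaves every shadow of that gfn unsynced, so resync
 * must sync all shadows of the gfn, whatever their role.
 */
#include <assert.h>

#define MAX_SHADOWS 8

struct shadow_page {
	unsigned long gfn;	/* guest frame this page shadows */
	int role;		/* paging mode, e.g. 32-bit vs PAE */
	int unsync;		/* 1 if the guest wrote since last sync */
};

static struct shadow_page shadows[MAX_SHADOWS];
static int nr_shadows;

static void add_shadow(unsigned long gfn, int role)
{
	shadows[nr_shadows].gfn = gfn;
	shadows[nr_shadows].role = role;
	shadows[nr_shadows].unsync = 0;
	nr_shadows++;
}

/* A guest write to gfn dirties every shadow of it, in every mode. */
static void guest_write(unsigned long gfn)
{
	for (int i = 0; i < nr_shadows; i++)
		if (shadows[i].gfn == gfn)
			shadows[i].unsync = 1;
}

/* Resync must not filter on role: both mode X and mode Y copies sync. */
static int resync(unsigned long gfn)
{
	int synced = 0;

	for (int i = 0; i < nr_shadows; i++)
		if (shadows[i].gfn == gfn && shadows[i].unsync) {
			shadows[i].unsync = 0;
			synced++;
		}
	return synced;
}
```

If resync filtered on `role.word` the way the pre-patch hash lookup
does, one mode's shadow would stay stale after the other vcpu's resync.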
>>> +
>>> +		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, *pt);
>>> +		/* user */
>>> +		if (pte_access & ACC_USER_MASK)
>>> +			spte |= shadow_user_mask;
>>>
>>
>> There are some special cases involving cr0.wp=0 and the user mask, so
>> spte.u is not correlated exactly with gpte.u.
>>
>
> How come?
>

When cr0.wp=0, the cpu ignores pte.w for cpl=0 accesses. kvm requires
cr0.wp=1 (since we need to write protect pages, for many reasons, like
emulating pte.dirty). This is how we handle pte.u=1 + pte.w=0:

- for cpl 0 accesses, we set spte.w=1 (to allow the write) and
  spte.u=0 (to forbid cpl>0 accesses)
- for cpl>0 accesses, we set spte.w=0 (to forbid userspace write
  accesses) and spte.u=1 (to allow cpl>0 read accesses)

This works well except when the accesses keep alternating between
userspace and kernel.

>>> +		/* guest->shadow accessed sync */
>>> +		if (!(*pt & PT_ACCESSED_MASK))
>>> +			spte &= ~PT_ACCESSED_MASK;
>>>
>>
>> spte shouldn't be accessible at all if gpte is not accessed, so we can
>> set gpte.a on the next access (similar to spte not being writeable if
>> gpte is not dirty).
>>
>
> Right. Perhaps accessed bit synchronization to the guest could be
> performed lazily somehow, so as to avoid a vmexit on every first page
> access.
>

I don't think this is doable (well, you can do it if you make the guest
page table not present, but then even reading the accessed bit faults).

>>> +		set_shadow_pte(&sp->spt[i], spte);
>>>
>>
>> What if permissions are reduced?
>>
>
> Then a local TLB flush is needed. Flushing the TLBs of remote vcpus
> should be done by the guest AFAICS.
>

Hm. It depends on why they are reduced. If the page became shadowed,
then we are responsible. I don't think this is the case here, so a
local flush is likely sufficient.

-- 
error compiling committee.c: too many arguments to function