From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrea Arcangeli Subject: Re: [patch] kvm with mmu notifier v18 Date: Thu, 12 Jun 2008 03:33:54 +0200 Message-ID: <20080612013354.GD15233@duo> References: <20080605002626.GA15502@duo.random> <48480C36.6050309@qumranet.com> <20080605164717.GH15502@duo.random> <20080610204130.GA13798@dmt.cnet> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Avi Kivity , kvm@vger.kernel.org To: Marcelo Tosatti Return-path: Received: from host36-195-149-62.serverdedicati.aruba.it ([62.149.195.36]:49307 "EHLO mx.cpushare.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752261AbYFLBeE (ORCPT ); Wed, 11 Jun 2008 21:34:04 -0400 Content-Disposition: inline In-Reply-To: <20080610204130.GA13798@dmt.cnet> Sender: kvm-owner@vger.kernel.org List-ID: On Tue, Jun 10, 2008 at 05:41:30PM -0300, Marcelo Tosatti wrote: > Hi Andrea, > > On Thu, Jun 05, 2008 at 06:47:17PM +0200, Andrea Arcangeli wrote: > > latest version, I removed ->release already before posting the last > > one because by the time vm destroy runs no more guest mode can run, so > > sptes are irrelevant and no cpu can follow the secondary tlb anymore > > because no cpu can be in guest mode for the 'mm', even if sptes are > > actually destroyed later. The previous patch was taking a kvm mutex in > > release under mmu_lock and that's forbidden, so it's simpler to remove > > the release debug knob for now (you suggested to use > > kvm_reload_remote_mmus in the future that shouldn't take sleeping > > locks). The only reason for having a real ->release would be to avoid > > any risk w.r.t. to tlb speculative accesses to gart alias with > > different cache protocol (I doubt this is a realistic worry but anyway > > it's not big deal to implement a ->release). > > > > Signed-off-by: Andrea Arcangeli > > > +static int mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > > + const u8 *new, int bytes, > > + gfn_t *_gfn, pfn_t *_pfn, > > + int *_mmu_seq, int *_largepage) > > { > > gfn_t gfn; > > int r; > > u64 gpte = 0; > > pfn_t pfn; > > - > > - vcpu->arch.update_pte.largepage = 0; > > + int mmu_seq; > > + int largepage; > > > > if (bytes != 4 && bytes != 8) > > - return; > > + return 0; > > > > /* > > * Assume that the pte write on a page table of the same type > > @@ -1650,7 +1770,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > > if ((bytes == 4) && (gpa % 4 == 0)) { > > r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8); > > if (r) > > - return; > > + return 0; > > memcpy((void *)&gpte + (gpa % 8), new, 4); > > } else if ((bytes == 8) && (gpa % 8 == 0)) { > > memcpy((void *)&gpte, new, 8); > > @@ -1660,23 +1780,30 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > > memcpy((void *)&gpte, new, 4); > > } > > if (!is_present_pte(gpte)) > > - return; > > + return 0; > > gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; > > > > + largepage = 0; > > down_read(¤t->mm->mmap_sem); > > if (is_large_pte(gpte) && is_largepage_backed(vcpu, gfn)) { > > gfn &= ~(KVM_PAGES_PER_HPAGE-1); > > - vcpu->arch.update_pte.largepage = 1; > > + largepage = 1; > > } > > + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); > > + /* implicit mb(), we'll read before PT lock is unlocked */ > > pfn = gfn_to_pfn(vcpu->kvm, gfn); > > up_read(¤t->mm->mmap_sem); > > > > - if (is_error_pfn(pfn)) { > > + if (unlikely(is_error_pfn(pfn))) { > > kvm_release_pfn_clean(pfn); > > - return; > > + return 0; > > } > > - vcpu->arch.update_pte.gfn = gfn; > > - vcpu->arch.update_pte.pfn = pfn; > > + > > + *_gfn = gfn; > > + *_pfn = pfn; > > + *_mmu_seq = mmu_seq; > > + *_largepage = largepage; > > + return 1; > > } > > > > static void kvm_mmu_access_page(struct kvm_vcpu *vcpu, gfn_t gfn) > > @@ -1711,9 +1838,24 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > > int npte; > > int r; > > > > + int update_pte; > > + gfn_t gpte_gfn; > > + pfn_t pfn; > > + int mmu_seq; > > + int largepage; > > + > > pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes); > > - mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes); > > + update_pte = mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes, > > + &gpte_gfn, &pfn, > > + &mmu_seq, &largepage); > > spin_lock(&vcpu->kvm->mmu_lock); > > + if (update_pte) { > > + BUG_ON(!is_error_pfn(vcpu->arch.update_pte.pfn)); > > + vcpu->arch.update_pte.gfn = gpte_gfn; > > + vcpu->arch.update_pte.pfn = pfn; > > + vcpu->arch.update_pte.mmu_seq = mmu_seq; > > + vcpu->arch.update_pte.largepage = largepage; > > + } > > I don't get this. mmu_lock protects the shadow page tables, reverse > mappings and associated lists. vcpu->arch.update_pte is a per-vcpu > structure, so it does not need locking by itself. Ok, I wasn't sure myself if this is needed. The question is if two physical cpus could ever access vcpu->arch.update_pte structure at the same time... I guess answer is no in which case I can freely undo the above.