From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marcelo Tosatti Subject: Re: [patch] kvm with mmu notifier v18 Date: Tue, 10 Jun 2008 17:41:30 -0300 Message-ID: <20080610204130.GA13798@dmt.cnet> References: <20080605002626.GA15502@duo.random> <48480C36.6050309@qumranet.com> <20080605164717.GH15502@duo.random> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Avi Kivity , kvm@vger.kernel.org To: Andrea Arcangeli Return-path: Received: from mx1.redhat.com ([66.187.233.31]:47254 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751955AbYFJUlu (ORCPT ); Tue, 10 Jun 2008 16:41:50 -0400 Content-Disposition: inline In-Reply-To: <20080605164717.GH15502@duo.random> Sender: kvm-owner@vger.kernel.org List-ID: Hi Andrea, On Thu, Jun 05, 2008 at 06:47:17PM +0200, Andrea Arcangeli wrote: > latest version, I removed ->release already before posting the last > one because by the time vm destroy runs no more guest mode can run, so > sptes are irrelevant and no cpu can follow the secondary tlb anymore > because no cpu can be in guest mode for the 'mm', even if sptes are > actually destroyed later. The previous patch was taking a kvm mutex in > release under mmu_lock and that's forbidden, so it's simpler to remove > the release debug knob for now (you suggested to use > kvm_reload_remote_mmus in the future that shouldn't take sleeping > locks). The only reason for having a real ->release would be to avoid > any risk w.r.t. to tlb speculative accesses to gart alias with > different cache protocol (I doubt this is a realistic worry but anyway > it's not big deal to implement a ->release). > > Signed-off-by: Andrea Arcangeli > +static int mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > + const u8 *new, int bytes, > + gfn_t *_gfn, pfn_t *_pfn, > + int *_mmu_seq, int *_largepage) > { > gfn_t gfn; > int r; > u64 gpte = 0; > pfn_t pfn; > - > - vcpu->arch.update_pte.largepage = 0; > + int mmu_seq; > + int largepage; > > if (bytes != 4 && bytes != 8) > - return; > + return 0; > > /* > * Assume that the pte write on a page table of the same type > @@ -1650,7 +1770,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > if ((bytes == 4) && (gpa % 4 == 0)) { > r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8); > if (r) > - return; > + return 0; > memcpy((void *)&gpte + (gpa % 8), new, 4); > } else if ((bytes == 8) && (gpa % 8 == 0)) { > memcpy((void *)&gpte, new, 8); > @@ -1660,23 +1780,30 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > memcpy((void *)&gpte, new, 4); > } > if (!is_present_pte(gpte)) > - return; > + return 0; > gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; > > + largepage = 0; > down_read(¤t->mm->mmap_sem); > if (is_large_pte(gpte) && is_largepage_backed(vcpu, gfn)) { > gfn &= ~(KVM_PAGES_PER_HPAGE-1); > - vcpu->arch.update_pte.largepage = 1; > + largepage = 1; > } > + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); > + /* implicit mb(), we'll read before PT lock is unlocked */ > pfn = gfn_to_pfn(vcpu->kvm, gfn); > up_read(¤t->mm->mmap_sem); > > - if (is_error_pfn(pfn)) { > + if (unlikely(is_error_pfn(pfn))) { > kvm_release_pfn_clean(pfn); > - return; > + return 0; > } > - vcpu->arch.update_pte.gfn = gfn; > - vcpu->arch.update_pte.pfn = pfn; > + > + *_gfn = gfn; > + *_pfn = pfn; > + *_mmu_seq = mmu_seq; > + *_largepage = largepage; > + return 1; > } > > static void kvm_mmu_access_page(struct kvm_vcpu *vcpu, gfn_t gfn) > @@ -1711,9 +1838,24 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > int npte; > int r; > > + int update_pte; > + gfn_t gpte_gfn; > + pfn_t pfn; > + int mmu_seq; > + int largepage; > + > pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes); > - mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes); > + update_pte = mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes, > + &gpte_gfn, &pfn, > + &mmu_seq, &largepage); > spin_lock(&vcpu->kvm->mmu_lock); > + if (update_pte) { > + BUG_ON(!is_error_pfn(vcpu->arch.update_pte.pfn)); > + vcpu->arch.update_pte.gfn = gpte_gfn; > + vcpu->arch.update_pte.pfn = pfn; > + vcpu->arch.update_pte.mmu_seq = mmu_seq; > + vcpu->arch.update_pte.largepage = largepage; > + } I don't get this. mmu_lock protects the shadow page tables, reverse mappings and associated lists. vcpu->arch.update_pte is a per-vcpu structure, so it does not need locking by itself. The memslots are protected by memslots_lock, which is always taken when mmu_guess_page_from_pte_write() is reached. mmap_sem protects QEMU's virtual mmaping from changing during find_vma / get_user_pages. Can you explain please?