From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marcelo Tosatti
Subject: Re: [PATCH 1/6] kvm.git mmu notifier patch
Date: Mon, 28 Jul 2008 18:11:04 -0300
Message-ID: <20080728211104.GA8342@dmt.cnet>
References: <20080725142452.GA8217@duo.random>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Avi Kivity, Marcelo Tosatti, kvm@vger.kernel.org
To: Andrea Arcangeli
Return-path:
Received: from mx1.redhat.com ([66.187.233.31]:42600 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1760876AbYG1VNu (ORCPT ); Mon, 28 Jul 2008 17:13:50 -0400
Content-Disposition: inline
In-Reply-To: <20080725142452.GA8217@duo.random>
Sender: kvm-owner@vger.kernel.org
List-ID:

Hi Andrea,

On Fri, Jul 25, 2008 at 04:24:52PM +0200, Andrea Arcangeli wrote:
> Hello,
>
> This is the latest version of the kvm mmu notifier patch.
>
> This allows largepages to be invalidated too, and a separate patch
> takes care of alias locking (2/5) and memslot locking (3/5).
>
> While in the long term we should remove the alias to microoptimize the
> fast paths and avoid the need for unalias_gfn calls, I decided not to
> remove the alias immediately: done best, that patch would have removed
> the alias API as a whole from userland; done badly, it would have kept
> the alias API intact while implementing the low-level kernel details
> with memslots instead of the alias array. The problem is that the
> memslot array is already fragmented with the private entries, so adding
> the alias entries before or after the private memslot entries resulted
> in different issues, often requiring a new loop over the alias range or
> the private range depending on where I put the alias slots (before or
> after the private ones).
>
> So I think that to do it right the alias ioctl should be removed as a
> whole, and that was too large a change to do right now, especially
> given we're concentrating on stability, and fixing the alias with the
> mmu_lock was a 2-liner change.
>
> Marcelo please double check this code, I didn't effectively test it
> myself with the largepages yet, but it looks like rmap_next works the
> same and the rmap_pde is just a rmapp with the only difference of the
> PSE bit being set. I didn't hide the rmap_pde details with gfn_to_rmap
> to avoid an entirely superfluous gfn_to_memslot.
>
> +		retval |= handler(kvm,
> +				  &memslot->lpage_info[
> +				  gfn_offset / KVM_PAGES_PER_HPAGE].rmap_pde);
>
> Signed-off-by: Andrea Arcangeli
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 8d45fab..ce3251c 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -21,6 +21,7 @@ config KVM
>  	tristate "Kernel-based Virtual Machine (KVM) support"
>  	depends on HAVE_KVM
>  	select PREEMPT_NOTIFIERS
> +	select MMU_NOTIFIER
>  	select ANON_INODES
>  	---help---
>  	  Support hosting fully virtualized guest machines using hardware
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 2b60b7d..9bc31fc 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -653,6 +653,84 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn)
>  	account_shadowed(kvm, gfn);
>  }
>  
> +static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
> +{
> +	u64 *spte;
> +	int need_tlb_flush = 0;
> +
> +	while ((spte = rmap_next(kvm, rmapp, NULL))) {
> +		BUG_ON(!(*spte & PT_PRESENT_MASK));
> +		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte);
> +		rmap_remove(kvm, spte);
> +		set_shadow_pte(spte, shadow_trap_nonpresent_pte);
> +		need_tlb_flush = 1;
> +	}

Is rmap_remove() safe during the rmap_next() iteration? Don't you need to
restart the rmap walk from the beginning after each removal? Remember the
discussion around commit 6597ca09e6c0e5aec7ffd2b8ab48c671d3c28414.
> +	return need_tlb_flush;
> +}
> +
> +static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
> +			  int (*handler)(struct kvm *kvm, unsigned long *rmapp))
> +{
> +	int i;
> +	int retval = 0;
> +
> +	/*
> +	 * If mmap_sem isn't taken, we can look the memslots with only
> +	 * the mmu_lock by skipping over the slots with userspace_addr == 0.
> +	 */
> +	for (i = 0; i < kvm->nmemslots; i++) {
> +		struct kvm_memory_slot *memslot = &kvm->memslots[i];
> +		unsigned long start = memslot->userspace_addr;
> +		unsigned long end;
> +
> +		/* mmu_lock protects userspace_addr */
> +		if (!start)
> +			continue;
> +
> +		end = start + (memslot->npages << PAGE_SHIFT);
> +		if (hva >= start && hva < end) {
> +			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
> +			retval |= handler(kvm, &memslot->rmap[gfn_offset]);
> +			retval |= handler(kvm,
> +					  &memslot->lpage_info[
> +					  gfn_offset /
> +					  KVM_PAGES_PER_HPAGE].rmap_pde);

Other than that looks good.