From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marcelo Tosatti
Subject: Re: [PATCH 1/6] kvm.git mmu notifier patch
Date: Mon, 28 Jul 2008 18:11:04 -0300
Message-ID: <20080728211104.GA8342@dmt.cnet>
References: <20080725142452.GA8217@duo.random>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Avi Kivity, Marcelo Tosatti, kvm@vger.kernel.org
To: Andrea Arcangeli
Return-path:
Received: from mx1.redhat.com ([66.187.233.31]:42600 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1760876AbYG1VNu (ORCPT ); Mon, 28 Jul 2008 17:13:50 -0400
Content-Disposition: inline
In-Reply-To: <20080725142452.GA8217@duo.random>
Sender: kvm-owner@vger.kernel.org
List-ID:

Hi Andrea,

On Fri, Jul 25, 2008 at 04:24:52PM +0200, Andrea Arcangeli wrote:
> Hello,
>
> This is the latest version of the kvm mmu notifier patch.
>
> This allows largepages to be invalidated too, and a separate patch
> takes care of alias locking (2/5) and memslot locking (3/5).
>
> While in the long term we should remove the alias to microoptimize the
> fast paths and avoid the need for unalias_gfn calls, I decided not to
> remove the alias immediately: done best, that patch would have removed
> the alias API as a whole from userland; done badly, it would have kept
> the alias API intact while implementing the low-level kernel details
> with memslots instead of the alias array. The problem is that the
> memslot array is already fragmented with the private entries, so adding
> the alias entries before or after the private memslot entries resulted
> in different issues, often requiring a new loop over the alias range or
> the private range depending on where I put the alias slots (before or
> after the private ones).
>
> So I think that to do it right the alias ioctl should be removed as a
> whole, and that was too large a change to do right now, especially
> given we're concentrating on stability, and fixing the alias with the
> mmu_lock was a 2-liner change.
>
> Marcelo please double check this code, I didn't effectively test it
> myself with the largepages yet, but it looks like rmap_next works the
> same and the rmap_pde is just a rmapp with the only difference of the
> PSE bit being set. I didn't hide the rmap_pde details with gfn_to_rmap
> to avoid an entirely superfluous gfn_to_memslot.
>
> +		retval |= handler(kvm,
> +				  &memslot->lpage_info[
> +				  gfn_offset / KVM_PAGES_PER_HPAGE].rmap_pde);
>
> Signed-off-by: Andrea Arcangeli
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 8d45fab..ce3251c 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -21,6 +21,7 @@ config KVM
>  	tristate "Kernel-based Virtual Machine (KVM) support"
>  	depends on HAVE_KVM
>  	select PREEMPT_NOTIFIERS
> +	select MMU_NOTIFIER
>  	select ANON_INODES
>  	---help---
>  	  Support hosting fully virtualized guest machines using hardware
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 2b60b7d..9bc31fc 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -653,6 +653,84 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn)
>  	account_shadowed(kvm, gfn);
>  }
>  
> +static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
> +{
> +	u64 *spte;
> +	int need_tlb_flush = 0;
> +
> +	while ((spte = rmap_next(kvm, rmapp, NULL))) {
> +		BUG_ON(!(*spte & PT_PRESENT_MASK));
> +		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte);
> +		rmap_remove(kvm, spte);
> +		set_shadow_pte(spte, shadow_trap_nonpresent_pte);
> +		need_tlb_flush = 1;
> +	}

Is rmap_remove() safe during the rmap_next() iteration? Don't you need to
restart the rmap walk from the beginning after each removal? Remember the
discussion around commit 6597ca09e6c0e5aec7ffd2b8ab48c671d3c28414.
> +	return need_tlb_flush;
> +}
> +
> +static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
> +			  int (*handler)(struct kvm *kvm, unsigned long *rmapp))
> +{
> +	int i;
> +	int retval = 0;
> +
> +	/*
> +	 * If mmap_sem isn't taken, we can look the memslots with only
> +	 * the mmu_lock by skipping over the slots with userspace_addr == 0.
> +	 */
> +	for (i = 0; i < kvm->nmemslots; i++) {
> +		struct kvm_memory_slot *memslot = &kvm->memslots[i];
> +		unsigned long start = memslot->userspace_addr;
> +		unsigned long end;
> +
> +		/* mmu_lock protects userspace_addr */
> +		if (!start)
> +			continue;
> +
> +		end = start + (memslot->npages << PAGE_SHIFT);
> +		if (hva >= start && hva < end) {
> +			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
> +			retval |= handler(kvm, &memslot->rmap[gfn_offset]);
> +			retval |= handler(kvm,
> +					  &memslot->lpage_info[
> +					  gfn_offset /
> +					  KVM_PAGES_PER_HPAGE].rmap_pde);

Other than that looks good.