From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jeremy Fitzhardinge <jeremy@goop.org>
Subject: Re: xen.git branch reorg / success with 2.6.30-rc3	pv_ops
	dom0
Date: Wed, 22 Jul 2009 11:16:25 -0700
Message-ID: <4A675779.1030403@goop.org>
References: <20090522080655.GA24960@edu.joroinen.fi>	
	<20090604202656.GR24960@edu.joroinen.fi>	
	<1244197217.27370.146.camel@zakaz.uk.xensource.com>	
	<20090605112347.GY24960@edu.joroinen.fi>	
	<1244201864.27370.172.camel@zakaz.uk.xensource.com>	
	<20090605133850.GA24960@edu.joroinen.fi>	
	<1244209979.27370.188.camel@zakaz.uk.xensource.com>	
	<20090605154130.GB24960@edu.joroinen.fi>	
	<1244217948.27370.213.camel@zakaz.uk.xensource.com>	
	<1244218353.27370.216.camel@zakaz.uk.xensource.com>	
	<20090605181925.GC24960@edu.joroinen.fi>	
	<1244475935.27370.309.camel@zakaz.uk.xensource.com>	
	<1244476858.27370.325.camel@zakaz.uk.xensource.com>	
	<4A2E9BC3.4060507@goop.org>	
	<1244710938.27370.502.camel@zakaz.uk.xensource.com>
	<1244711914.27370.511.camel@zakaz.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <1244711914.27370.511.camel@zakaz.uk.xensource.com>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
List-Id: xen-devel@lists.xenproject.org

On 06/11/09 02:18, Ian Campbell wrote:
> Pasi, to validate the theory that you are seeing races between unpinning
> and kmap_atomic_pte can you give this biguglystick approach to solving
> it a go.
>   

I gave up trying to solve this in any kind of clever way and just
decided to go with a slightly cleaned-up version of this patch. 
Unfortunately, I don't think it actually solves the problem because it
doesn't prevent unpin from happening while the page is still kmapped; it
just ends up using the spinlock as a barrier to move the race around to
some timing which is presumably mostly avoids the problem.

In principle the fix is to take the lock in xen_kmap_atomic and release
it in xen_kunmap_atomic.  Unfortunately this is pretty ugly and complex
because kmaps are 1) inherently per-cpu, and 2) there can be 2 levels of
kmapping at once.  This means that we'd need 2 per-cpu locks to allow us
to hold these locks for the mapping duration without introducing
deadlocks.  However unpin (and pin, in principle) need to take *all*
these locks to exclude kmap on all cpus.

This is total overkill, since we only really care about excluding kmap
and unpin from a given pagetable, which suggests that the locks should
be part of the mm rather than global.  Unfortunately k(un)map_atomic_pte
don't get the mm of the pagetable they're trying to pin, and I don't
think we can assume its the current mm.

Another (pretty hideous) approach might be to make unpin check the state
of the KMAP_PTE[01] slots in each CPU's kmap fixmaps and see if a
mapping exists for a page that its currently unpinning.  This also has
the problem of being inherently racy; if we unpin the page, there's
going to be a little window after the unpin and before the kmap pte
update (even if they're back-to-back batched hypercalls).

I guess we could have a global rw spinlock; kmap/unmap takes it for
reading, and so can be concurrent with all other kmaps, but pin/unpin
takes it for writing to exclude them.  That would work, but runs the
risk of pin/unpin from being livelocked out (I don't think rwspins will
block new readers if there's a pending writer).  Ugly, but its the only
thing I can think of which actually solves the problem.

Oh, crap, we don't have a kunmap_atomic_pte hook.

Thoughts?  Am I overthinking this and missing something obvious?

(PS: avoid CONFIG_HIGHPTE)

    J

> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 1729178..beeb8e8 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1145,9 +1145,12 @@ static int xen_unpin_page(struct mm_struct *mm, struct page *page,
>  	return 0;		/* never need to flush on unpin */
>  }
>  
> +static DEFINE_SPINLOCK(hack_lock); /* Hack to sync unpin against kmap_atomic_pte */
> +
>  /* Release a pagetables pages back as normal RW */
>  static void __xen_pgd_unpin(struct mm_struct *mm, pgd_t *pgd)
>  {
> +	spin_lock(&hack_lock);
>  	xen_mc_batch();
>  
>  	xen_do_pin(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
> @@ -1173,6 +1176,7 @@ static void __xen_pgd_unpin(struct mm_struct *mm, pgd_t *pgd)
>  	__xen_pgd_walk(mm, pgd, xen_unpin_page, USER_LIMIT);
>  
>  	xen_mc_issue(0);
> +	spin_unlock(&hack_lock);
>  }
>  
>  static void xen_pgd_unpin(struct mm_struct *mm)
> @@ -1521,6 +1525,9 @@ static void xen_pgd_free(struct mm_struct *mm, pgd_t *pgd)
>  static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
>  {
>  	pgprot_t prot = PAGE_KERNEL;
> +	void *ret;
> +
> +	spin_lock(&hack_lock);
>  
>  	if (PagePinned(page))
>  		prot = PAGE_KERNEL_RO;
> @@ -1530,7 +1537,11 @@ static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
>  		       page_to_pfn(page), type,
>  		       (unsigned long)pgprot_val(prot) & _PAGE_RW ? "WRITE" : "READ");
>  
> -	return kmap_atomic_prot(page, type, prot);
> +	ret = kmap_atomic_prot(page, type, prot);
> +
> +	spin_unlock(&hack_lock);
> +
> +	return ret;
>  }
>  #endif
>  
>
>
>