linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] vmalloc: eagerly clear ptes on vunmap
       [not found] ` <20101127103656.GA6884@amd>
@ 2010-11-29 20:32   ` Jeremy Fitzhardinge
  2010-11-30 12:42     ` Nick Piggin
  2010-12-01  0:29     ` Andrew Morton
  0 siblings, 2 replies; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2010-11-29 20:32 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Xen-devel@lists.xensource.com, Linux Kernel Mailing List,
	Andrew Morton, Linux Memory Management List, Trond Myklebust,
	Bryan Schumaker, Konrad Rzeszutek Wilk

When unmapping a region in the vmalloc space, clear the ptes immediately.
There's no point in deferring this because there's no amortization
benefit.

The TLBs are left dirty, and they are flushed lazily to amortize the
cost of the IPIs.

This specific motivation for this patch is a regression since 2.6.36 when
using NFS under Xen, triggered by the NFS client's use of vm_map_ram()
introduced in 56e4ebf877b6043c289bda32a5a7385b80c17dee.  XFS also uses
vm_map_ram() and could cause similar problems.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Nick Piggin <npiggin@kernel.dk>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a3d66b3..9960644 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -566,7 +566,6 @@ static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
 			if (va->va_end > *end)
 				*end = va->va_end;
 			nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
-			unmap_vmap_area(va);
 			list_add_tail(&va->purge_list, &valist);
 			va->flags |= VM_LAZY_FREEING;
 			va->flags &= ~VM_LAZY_FREE;
@@ -616,6 +615,8 @@ static void purge_vmap_area_lazy(void)
  */
 static void free_unmap_vmap_area_noflush(struct vmap_area *va)
 {
+	unmap_vmap_area(va);
+
 	va->flags |= VM_LAZY_FREE;
 	atomic_add((va->va_end - va->va_start) >> PAGE_SHIFT, &vmap_lazy_nr);
 	if (unlikely(atomic_read(&vmap_lazy_nr) > lazy_max_pages()))
@@ -944,8 +945,10 @@ static void vb_free(const void *addr, unsigned long size)
 		BUG_ON(vb->free);
 		spin_unlock(&vb->lock);
 		free_vmap_block(vb);
-	} else
+	} else {
 		spin_unlock(&vb->lock);
+		vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
+	}
 }
 
 /**
@@ -988,7 +991,6 @@ void vm_unmap_aliases(void)
 
 				s = vb->va->va_start + (i << PAGE_SHIFT);
 				e = vb->va->va_start + (j << PAGE_SHIFT);
-				vunmap_page_range(s, e);
 				flush = 1;
 
 				if (s < start)


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC] vmalloc: eagerly clear ptes on vunmap
  2010-11-29 20:32   ` [PATCH RFC] vmalloc: eagerly clear ptes on vunmap Jeremy Fitzhardinge
@ 2010-11-30 12:42     ` Nick Piggin
  2010-11-30 17:45       ` Jeremy Fitzhardinge
  2010-12-01  0:29     ` Andrew Morton
  1 sibling, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2010-11-30 12:42 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, Xen-devel@lists.xensource.com,
	Linux Kernel Mailing List, Andrew Morton,
	Linux Memory Management List, Trond Myklebust, Bryan Schumaker,
	Konrad Rzeszutek Wilk

On Mon, Nov 29, 2010 at 12:32:11PM -0800, Jeremy Fitzhardinge wrote:
> When unmapping a region in the vmalloc space, clear the ptes immediately.
> There's no point in deferring this because there's no amortization
> benefit.
> 
> The TLBs are left dirty, and they are flushed lazily to amortize the
> cost of the IPIs.
> 
> This specific motivation for this patch is a regression since 2.6.36 when
> using NFS under Xen, triggered by the NFS client's use of vm_map_ram()
> introduced in 56e4ebf877b6043c289bda32a5a7385b80c17dee.  XFS also uses
> vm_map_ram() and could cause similar problems.

I do wonder whether there are cache benefits from batching page table
updates, especially the batched per cpu maps (and in your version they
get double-cleared as well).  I think this patch is good, but I think
perhaps making it configurable would be nice.

So... main question, does it allow Xen to use lazy flushing and avoid
vm_unmap_aliases() calls?


> 
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Cc: Nick Piggin <npiggin@kernel.dk>
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a3d66b3..9960644 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -566,7 +566,6 @@ static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
>  			if (va->va_end > *end)
>  				*end = va->va_end;
>  			nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
> -			unmap_vmap_area(va);
>  			list_add_tail(&va->purge_list, &valist);
>  			va->flags |= VM_LAZY_FREEING;
>  			va->flags &= ~VM_LAZY_FREE;
> @@ -616,6 +615,8 @@ static void purge_vmap_area_lazy(void)
>   */
>  static void free_unmap_vmap_area_noflush(struct vmap_area *va)
>  {
> +	unmap_vmap_area(va);
> +
>  	va->flags |= VM_LAZY_FREE;
>  	atomic_add((va->va_end - va->va_start) >> PAGE_SHIFT, &vmap_lazy_nr);
>  	if (unlikely(atomic_read(&vmap_lazy_nr) > lazy_max_pages()))
> @@ -944,8 +945,10 @@ static void vb_free(const void *addr, unsigned long size)
>  		BUG_ON(vb->free);
>  		spin_unlock(&vb->lock);
>  		free_vmap_block(vb);
> -	} else
> +	} else {
>  		spin_unlock(&vb->lock);
> +		vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
> +	}
>  }
>  
>  /**
> @@ -988,7 +991,6 @@ void vm_unmap_aliases(void)
>  
>  				s = vb->va->va_start + (i << PAGE_SHIFT);
>  				e = vb->va->va_start + (j << PAGE_SHIFT);
> -				vunmap_page_range(s, e);
>  				flush = 1;
>  
>  				if (s < start)
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC] vmalloc: eagerly clear ptes on vunmap
  2010-11-30 12:42     ` Nick Piggin
@ 2010-11-30 17:45       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2010-11-30 17:45 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Xen-devel@lists.xensource.com, Linux Kernel Mailing List,
	Andrew Morton, Linux Memory Management List, Trond Myklebust,
	Bryan Schumaker, Konrad Rzeszutek Wilk

On 11/30/2010 04:42 AM, Nick Piggin wrote:
> On Mon, Nov 29, 2010 at 12:32:11PM -0800, Jeremy Fitzhardinge wrote:
>> When unmapping a region in the vmalloc space, clear the ptes immediately.
>> There's no point in deferring this because there's no amortization
>> benefit.
>>
>> The TLBs are left dirty, and they are flushed lazily to amortize the
>> cost of the IPIs.
>>
>> This specific motivation for this patch is a regression since 2.6.36 when
>> using NFS under Xen, triggered by the NFS client's use of vm_map_ram()
>> introduced in 56e4ebf877b6043c289bda32a5a7385b80c17dee.  XFS also uses
>> vm_map_ram() and could cause similar problems.
> I do wonder whether there are cache benefits from batching page table
> updates, especially the batched per cpu maps

Perhaps.  But perhaps there are cache benefits in clearing early because
the ptes are still in cache from when they were set?
>  (and in your version they
> get double-cleared as well).

I thought I'd avoided that.  Oh, right, in both vb_free(), and again -
eventually - in free_vmap_block->free_unmap_vmap_area_noflush.

Delta patch below.

>   I think this patch is good, but I think
> perhaps making it configurable would be nice.

I'd rather not unless there's a strong reason to do so.

It occurs to me that once you remove the lazily mapped ptes, then all
that code is doing is keeping track of ranges of addresses with dirty
tlb entries.  But on x86 at least, any kernel tlb flush is a global one,
so keeping track of fine-grain address information is overkill.  I
wonder if the overall code can be simplified as a result?

On a more concrete level, vmap_page_range_noflush() and
vunmap_page_range() could be implemented with apply_to_page_range()
which removes a chunk of boilerplate code (however, it would result in a
callback per pte rather than one per pte page - but I'll fix that now).

> So... main question, does it allow Xen to use lazy flushing and avoid
> vm_unmap_aliases() calls?

Yes, it seems to.

Thanks,
    J

Subject: [PATCH] vmalloc: avoid double-unmapping percpu blocks

The area has always been unmapped by the time free_vmap_block() is
called, so there's no need to unmap it again.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9551316..ade3302 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -520,13 +520,12 @@ static void purge_vmap_area_lazy(void)
 }
 
 /*
- * Free and unmap a vmap area, caller ensuring flush_cache_vunmap had been
- * called for the correct range previously.
+ * Free a vmap area, caller ensuring that the area has been unmapped
+ * and flush_cache_vunmap had been called for the correct range
+ * previously.
  */
-static void free_unmap_vmap_area_noflush(struct vmap_area *va)
+static void free_vmap_area_noflush(struct vmap_area *va)
 {
-	unmap_vmap_area(va);
-
 	va->flags |= VM_LAZY_FREE;
 	atomic_add((va->va_end - va->va_start) >> PAGE_SHIFT, &vmap_lazy_nr);
 	if (unlikely(atomic_read(&vmap_lazy_nr) > lazy_max_pages()))
@@ -534,6 +533,16 @@ static void free_unmap_vmap_area_noflush(struct vmap_area *va)
 }
 
 /*
+ * Free and unmap a vmap area, caller ensuring flush_cache_vunmap had been
+ * called for the correct range previously.
+ */
+static void free_unmap_vmap_area_noflush(struct vmap_area *va)
+{
+	unmap_vmap_area(va);
+	free_vmap_area_noflush(va);
+}
+
+/*
  * Free and unmap a vmap area
  */
 static void free_unmap_vmap_area(struct vmap_area *va)
@@ -709,7 +718,7 @@ static void free_vmap_block(struct vmap_block *vb)
 	spin_unlock(&vmap_block_tree_lock);
 	BUG_ON(tmp != vb);
 
-	free_unmap_vmap_area_noflush(vb->va);
+	free_vmap_area_noflush(vb->va);
 	call_rcu(&vb->rcu_head, rcu_free_vb);
 }
 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC] vmalloc: eagerly clear ptes on vunmap
  2010-11-29 20:32   ` [PATCH RFC] vmalloc: eagerly clear ptes on vunmap Jeremy Fitzhardinge
  2010-11-30 12:42     ` Nick Piggin
@ 2010-12-01  0:29     ` Andrew Morton
  2010-12-01  3:09       ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2010-12-01  0:29 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, Xen-devel@lists.xensource.com,
	Linux Kernel Mailing List, Linux Memory Management List,
	Trond Myklebust, Bryan Schumaker, Konrad Rzeszutek Wilk

On Mon, 29 Nov 2010 12:32:11 -0800
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> When unmapping a region in the vmalloc space, clear the ptes immediately.
> There's no point in deferring this because there's no amortization
> benefit.
> 
> The TLBs are left dirty, and they are flushed lazily to amortize the
> cost of the IPIs.
> 
> This specific motivation for this patch is a regression since 2.6.36 when
> using NFS under Xen, triggered by the NFS client's use of vm_map_ram()
> introduced in 56e4ebf877b6043c289bda32a5a7385b80c17dee.  XFS also uses
> vm_map_ram() and could cause similar problems.
> 

Do we have any quantitative info on that regression?  The patch fixed
it, I assume?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC] vmalloc: eagerly clear ptes on vunmap
  2010-12-01  0:29     ` Andrew Morton
@ 2010-12-01  3:09       ` Jeremy Fitzhardinge
  2010-12-01  3:23         ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-01  3:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, Xen-devel@lists.xensource.com,
	Linux Kernel Mailing List, Linux Memory Management List,
	Trond Myklebust, Bryan Schumaker, Konrad Rzeszutek Wilk

On 11/30/2010 04:29 PM, Andrew Morton wrote:
> On Mon, 29 Nov 2010 12:32:11 -0800
> Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>> When unmapping a region in the vmalloc space, clear the ptes immediately.
>> There's no point in deferring this because there's no amortization
>> benefit.
>>
>> The TLBs are left dirty, and they are flushed lazily to amortize the
>> cost of the IPIs.
>>
>> This specific motivation for this patch is a regression since 2.6.36 when
>> using NFS under Xen, triggered by the NFS client's use of vm_map_ram()
>> introduced in 56e4ebf877b6043c289bda32a5a7385b80c17dee.  XFS also uses
>> vm_map_ram() and could cause similar problems.
>>
> Do we have any quantitative info on that regression?

It's pretty easy to reproduce - you get oopses very quickly while using
NFS.  I haven't got any lying around right now, but I could easily
generate one if you want to decorate the changelog a bit.

>   The patch fixed
> it, I assume?

Yes, the patch fixes it, and I think it is just luck that xfs doesn't
also trigger the same problem.  Here's a followup patch to disable the
previous hack.

    J

Subject: [PATCH] vmalloc: remove vmap_lazy_unmap flag

Now that vmunmap no longer leaves stray ptes lying around, we don't need
the vmap_lazy_unmap flag any more.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 21ed8d7..0e4ecac 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2358,8 +2358,6 @@ void __init xen_init_mmu_ops(void)
 	x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
 	pv_mmu_ops = xen_mmu_ops;
 
-	vmap_lazy_unmap = false;
-
 	memset(dummy_mapping, 0xff, PAGE_SIZE);
 }
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index a03dcf6..44b54f6 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -7,8 +7,6 @@
 
 struct vm_area_struct;		/* vma defining user mapping in mm_types.h */
 
-extern bool vmap_lazy_unmap;
-
 /* bits in flags of vmalloc's vm_struct below */
 #define VM_IOREMAP	0x00000001	/* ioremap() and friends */
 #define VM_ALLOC	0x00000002	/* vmalloc() */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ffefe70..828d95e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -31,8 +31,6 @@
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
-bool vmap_lazy_unmap __read_mostly = true;
-
 /*** Page table manipulation functions ***/
 
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -503,9 +501,6 @@ static unsigned long lazy_max_pages(void)
 {
 	unsigned int log;
 
-	if (!vmap_lazy_unmap)
-		return 0;
-
 	log = fls(num_online_cpus());
 
 	return log * (32UL * 1024 * 1024 / PAGE_SIZE);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC] vmalloc: eagerly clear ptes on vunmap
  2010-12-01  3:09       ` Jeremy Fitzhardinge
@ 2010-12-01  3:23         ` Andrew Morton
  2010-12-01  8:16           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2010-12-01  3:23 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, Xen-devel@lists.xensource.com,
	Linux Kernel Mailing List, Linux Memory Management List,
	Trond Myklebust, Bryan Schumaker, Konrad Rzeszutek Wilk

On Tue, 30 Nov 2010 19:09:43 -0800 Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> On 11/30/2010 04:29 PM, Andrew Morton wrote:
> > On Mon, 29 Nov 2010 12:32:11 -0800
> > Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> >
> >> When unmapping a region in the vmalloc space, clear the ptes immediately.
> >> There's no point in deferring this because there's no amortization
> >> benefit.
> >>
> >> The TLBs are left dirty, and they are flushed lazily to amortize the
> >> cost of the IPIs.
> >>
> >> This specific motivation for this patch is a regression since 2.6.36 when
> >> using NFS under Xen, triggered by the NFS client's use of vm_map_ram()
> >> introduced in 56e4ebf877b6043c289bda32a5a7385b80c17dee.  XFS also uses
> >> vm_map_ram() and could cause similar problems.
> >>
> > Do we have any quantitative info on that regression?
> 
> It's pretty easy to reproduce - you get oopses very quickly while using
> NFS.

Bah.  I'd assumed that it was a performance regression and had vaguely
queued it for 2.6.37.

> I haven't got any lying around right now, but I could easily
> generate one if you want to decorate the changelog a bit.

You owe me that much ;)


Here's the current rollup of the three patches.  Please check.



From: Jeremy Fitzhardinge <jeremy@goop.org>

When unmapping a region in the vmalloc space, clear the ptes immediately. 
There's no point in deferring this because there's no amortization
benefit.

The TLBs are left dirty, and they are flushed lazily to amortize the cost
of the IPIs.

This specific motivation for this patch is an oops-causing regression
since 2.6.36 when using NFS under Xen, triggered by the NFS client's use
of vm_map_ram() introduced in 56e4ebf877b60 ("NFS: readdir with vmapped
pages") .  XFS also uses vm_map_ram() and could cause similar problems.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/xen/mmu.c      |    2 --
 include/linux/vmalloc.h |    2 --
 mm/vmalloc.c            |   30 ++++++++++++++++++------------
 3 files changed, 18 insertions(+), 16 deletions(-)

diff -puN mm/vmalloc.c~vmalloc-eagerly-clear-ptes-on-vunmap mm/vmalloc.c
--- a/mm/vmalloc.c~vmalloc-eagerly-clear-ptes-on-vunmap
+++ a/mm/vmalloc.c
@@ -31,8 +31,6 @@
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
-bool vmap_lazy_unmap __read_mostly = true;
-
 /*** Page table manipulation functions ***/
 
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -503,9 +501,6 @@ static unsigned long lazy_max_pages(void
 {
 	unsigned int log;
 
-	if (!vmap_lazy_unmap)
-		return 0;
-
 	log = fls(num_online_cpus());
 
 	return log * (32UL * 1024 * 1024 / PAGE_SIZE);
@@ -566,7 +561,6 @@ static void __purge_vmap_area_lazy(unsig
 			if (va->va_end > *end)
 				*end = va->va_end;
 			nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
-			unmap_vmap_area(va);
 			list_add_tail(&va->purge_list, &valist);
 			va->flags |= VM_LAZY_FREEING;
 			va->flags &= ~VM_LAZY_FREE;
@@ -611,10 +605,11 @@ static void purge_vmap_area_lazy(void)
 }
 
 /*
- * Free and unmap a vmap area, caller ensuring flush_cache_vunmap had been
- * called for the correct range previously.
+ * Free a vmap area, caller ensuring that the area has been unmapped
+ * and flush_cache_vunmap had been called for the correct range
+ * previously.
  */
-static void free_unmap_vmap_area_noflush(struct vmap_area *va)
+static void free_vmap_area_noflush(struct vmap_area *va)
 {
 	va->flags |= VM_LAZY_FREE;
 	atomic_add((va->va_end - va->va_start) >> PAGE_SHIFT, &vmap_lazy_nr);
@@ -623,6 +618,16 @@ static void free_unmap_vmap_area_noflush
 }
 
 /*
+ * Free and unmap a vmap area, caller ensuring flush_cache_vunmap had been
+ * called for the correct range previously.
+ */
+static void free_unmap_vmap_area_noflush(struct vmap_area *va)
+{
+	unmap_vmap_area(va);
+	free_vmap_area_noflush(va);
+}
+
+/*
  * Free and unmap a vmap area
  */
 static void free_unmap_vmap_area(struct vmap_area *va)
@@ -798,7 +803,7 @@ static void free_vmap_block(struct vmap_
 	spin_unlock(&vmap_block_tree_lock);
 	BUG_ON(tmp != vb);
 
-	free_unmap_vmap_area_noflush(vb->va);
+	free_vmap_area_noflush(vb->va);
 	call_rcu(&vb->rcu_head, rcu_free_vb);
 }
 
@@ -944,8 +949,10 @@ static void vb_free(const void *addr, un
 		BUG_ON(vb->free);
 		spin_unlock(&vb->lock);
 		free_vmap_block(vb);
-	} else
+	} else {
 		spin_unlock(&vb->lock);
+		vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
+	}
 }
 
 /**
@@ -988,7 +995,6 @@ void vm_unmap_aliases(void)
 
 				s = vb->va->va_start + (i << PAGE_SHIFT);
 				e = vb->va->va_start + (j << PAGE_SHIFT);
-				vunmap_page_range(s, e);
 				flush = 1;
 
 				if (s < start)
diff -puN arch/x86/xen/mmu.c~vmalloc-eagerly-clear-ptes-on-vunmap arch/x86/xen/mmu.c
--- a/arch/x86/xen/mmu.c~vmalloc-eagerly-clear-ptes-on-vunmap
+++ a/arch/x86/xen/mmu.c
@@ -2415,8 +2415,6 @@ void __init xen_init_mmu_ops(void)
 	x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
 	pv_mmu_ops = xen_mmu_ops;
 
-	vmap_lazy_unmap = false;
-
 	memset(dummy_mapping, 0xff, PAGE_SIZE);
 }
 
diff -puN include/linux/vmalloc.h~vmalloc-eagerly-clear-ptes-on-vunmap include/linux/vmalloc.h
--- a/include/linux/vmalloc.h~vmalloc-eagerly-clear-ptes-on-vunmap
+++ a/include/linux/vmalloc.h
@@ -7,8 +7,6 @@
 
 struct vm_area_struct;		/* vma defining user mapping in mm_types.h */
 
-extern bool vmap_lazy_unmap;
-
 /* bits in flags of vmalloc's vm_struct below */
 #define VM_IOREMAP	0x00000001	/* ioremap() and friends */
 #define VM_ALLOC	0x00000002	/* vmalloc() */
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC] vmalloc: eagerly clear ptes on vunmap
  2010-12-01  3:23         ` Andrew Morton
@ 2010-12-01  8:16           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-01  8:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, Xen-devel@lists.xensource.com,
	Linux Kernel Mailing List, Linux Memory Management List,
	Trond Myklebust, Bryan Schumaker, Konrad Rzeszutek Wilk

On 11/30/2010 07:23 PM, Andrew Morton wrote:
> On Tue, 30 Nov 2010 19:09:43 -0800 Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>> On 11/30/2010 04:29 PM, Andrew Morton wrote:
>>> On Mon, 29 Nov 2010 12:32:11 -0800
>>> Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>>
>>>> When unmapping a region in the vmalloc space, clear the ptes immediately.
>>>> There's no point in deferring this because there's no amortization
>>>> benefit.
>>>>
>>>> The TLBs are left dirty, and they are flushed lazily to amortize the
>>>> cost of the IPIs.
>>>>
>>>> This specific motivation for this patch is a regression since 2.6.36 when
>>>> using NFS under Xen, triggered by the NFS client's use of vm_map_ram()
>>>> introduced in 56e4ebf877b6043c289bda32a5a7385b80c17dee.  XFS also uses
>>>> vm_map_ram() and could cause similar problems.
>>>>
>>> Do we have any quantitative info on that regression?
>> It's pretty easy to reproduce - you get oopses very quickly while using
>> NFS.
> Bah.  I'd assumed that it was a performance regression and had vaguely
> queued it for 2.6.37.

I have a bunch of other cleanups for mm/vmalloc.c which I'll propose for
.37.

>> I haven't got any lying around right now, but I could easily
>> generate one if you want to decorate the changelog a bit.
> You owe me that much ;)

On stock 2.6.37-rc4, running:

# mount lilith:/export /mnt/lilith
# find  /mnt/lilith/ -type f -print0 | xargs -0 file

crashes the machine fairly quickly under Xen.  Often it results in oops
messages, but the couple of times I tried just now, it just hung quietly
and made Xen print some rude messages:

    (XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp
    3000000000000000) for mfn 1d7058 (pfn 18fa7)
    (XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms
    (XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp
    1000000000000000) for mfn 1d2e04 (pfn 1d1fb)
    (XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04

Which means the domain tried to map a pagetable page RW, which would
allow it to map arbitrary memory, so Xen stopped it.  This is because
vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had
finished with them, and those pages got recycled as pagetable pages
while still having these RW aliases.

Removing those mappings immediately removes the Xen-visible aliases, and
so it has no problem with those pages being reused as pagetable pages. 
Deferring the TLB flush doesn't upset Xen because it can flush the TLB
itself as needed to maintain its invariants.

> Here's the current rollup of the three patches.  Please check.

It looks like "vmalloc: avoid double-unmapping percpu blocks" wasn't
complete.  Fix below.

Thanks,
    J


From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Tue, 30 Nov 2010 23:53:28 -0800
Subject: [PATCH] vmalloc: always unmap in vb_free()

free_vmap_block() doesn't unmap anything, so just unconditionally unmap
the region.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ffefe70..a582491 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -946,6 +946,8 @@ static void vb_free(const void *addr, unsigned long size)
 	rcu_read_unlock();
 	BUG_ON(!vb);
 
+	vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
+
 	spin_lock(&vb->lock);
 	BUG_ON(bitmap_allocate_region(vb->dirty_map, offset >> PAGE_SHIFT, order));
 
@@ -954,10 +956,8 @@ static void vb_free(const void *addr, unsigned long size)
 		BUG_ON(vb->free);
 		spin_unlock(&vb->lock);
 		free_vmap_block(vb);
-	} else {
+	} else
 		spin_unlock(&vb->lock);
-		vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
-	}
 }
 
 /**


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-12-01  8:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <4CEF6B8B.8080206@goop.org>
     [not found] ` <20101127103656.GA6884@amd>
2010-11-29 20:32   ` [PATCH RFC] vmalloc: eagerly clear ptes on vunmap Jeremy Fitzhardinge
2010-11-30 12:42     ` Nick Piggin
2010-11-30 17:45       ` Jeremy Fitzhardinge
2010-12-01  0:29     ` Andrew Morton
2010-12-01  3:09       ` Jeremy Fitzhardinge
2010-12-01  3:23         ` Andrew Morton
2010-12-01  8:16           ` Jeremy Fitzhardinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).