[PATCH 1/2] break out page allocation warning code

public inbox for linux-mm@kvack.org

* [PATCH 1/2] break out page allocation warning code
@ 2011-04-08 20:22 Dave Hansen
  2011-04-08 20:22 ` [PATCH 2/2] print vmalloc() state after allocation failures Dave Hansen
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Dave Hansen @ 2011-04-08 20:22 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Johannes Weiner, Dave Hansen


This originally started as a simple patch to give vmalloc()
some more verbose output on failure on top of the plain
page allocator messages.  Johannes suggested that it might
be nicer to lead with the vmalloc() info _before_ the page
allocator messages.

But, I do think there's a lot of value in what
__alloc_pages_slowpath() does with its filtering and so
forth.

This patch creates a new function which other allocators
can call instead of relying on the internal page allocator
warnings.  It also gives this function private rate-limiting
which separates it from other printk_ratelimit() users.

---

 linux-2.6.git-dave/include/linux/mm.h |    2 +
 linux-2.6.git-dave/mm/page_alloc.c    |   65 +++++++++++++++++++++++-----------
 2 files changed, 46 insertions(+), 21 deletions(-)

diff -puN include/linux/mm.h~break-out-alloc-failure-messages include/linux/mm.h
--- linux-2.6.git/include/linux/mm.h~break-out-alloc-failure-messages	2011-04-08 13:07:18.978332687 -0700
+++ linux-2.6.git-dave/include/linux/mm.h	2011-04-08 13:07:18.990332675 -0700
@@ -1365,6 +1365,8 @@ extern void si_meminfo(struct sysinfo * 
 extern void si_meminfo_node(struct sysinfo *val, int nid);
 extern int after_bootmem;
 
+extern void nopage_warning(gfp_t gfp_mask, int order, const char *fmt, ...);
+
 extern void setup_per_cpu_pageset(void);
 
 extern void zone_pcp_update(struct zone *zone);
diff -puN mm/page_alloc.c~break-out-alloc-failure-messages mm/page_alloc.c
--- linux-2.6.git/mm/page_alloc.c~break-out-alloc-failure-messages	2011-04-08 13:07:18.982332683 -0700
+++ linux-2.6.git-dave/mm/page_alloc.c	2011-04-08 13:07:18.990332675 -0700
@@ -54,6 +54,7 @@
 #include <trace/events/kmem.h>
 #include <linux/ftrace_event.h>
 #include <linux/memcontrol.h>
+#include <linux/ratelimit.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -1734,6 +1735,48 @@ static inline bool should_suppress_show_
 	return ret;
 }
 
+static DEFINE_RATELIMIT_STATE(nopage_rs,
+		DEFAULT_RATELIMIT_INTERVAL,
+		DEFAULT_RATELIMIT_BURST);
+
+void nopage_warning(gfp_t gfp_mask, int order, const char *fmt, ...)
+{
+	va_list args;
+	int r;
+	unsigned int filter = SHOW_MEM_FILTER_NODES;
+	const gfp_t wait = gfp_mask & __GFP_WAIT;
+
+	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
+		return;
+
+	/*
+	 * This documents exceptions given to allocations in certain
+	 * contexts that are allowed to allocate outside current's set
+	 * of allowed nodes.
+	 */
+	if (!(gfp_mask & __GFP_NOMEMALLOC))
+		if (test_thread_flag(TIF_MEMDIE) ||
+		    (current->flags & (PF_MEMALLOC | PF_EXITING)))
+			filter &= ~SHOW_MEM_FILTER_NODES;
+	if (in_interrupt() || !wait)
+		filter &= ~SHOW_MEM_FILTER_NODES;
+
+	if (fmt) {
+		printk(KERN_WARNING);
+		va_start(args, fmt);
+		r = vprintk(fmt, args);
+		va_end(args);
+	}
+
+	printk(KERN_WARNING);
+	printk("%s: page allocation failure: order:%d, mode:0x%x\n",
+			current->comm, order, gfp_mask);
+
+	dump_stack();
+	if (!should_suppress_show_mem())
+		show_mem(filter);
+}
+
 static inline int
 should_alloc_retry(gfp_t gfp_mask, unsigned int order,
 				unsigned long pages_reclaimed)
@@ -2176,27 +2219,7 @@ rebalance:
 	}
 
 nopage:
-	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
-		unsigned int filter = SHOW_MEM_FILTER_NODES;
-
-		/*
-		 * This documents exceptions given to allocations in certain
-		 * contexts that are allowed to allocate outside current's set
-		 * of allowed nodes.
-		 */
-		if (!(gfp_mask & __GFP_NOMEMALLOC))
-			if (test_thread_flag(TIF_MEMDIE) ||
-			    (current->flags & (PF_MEMALLOC | PF_EXITING)))
-				filter &= ~SHOW_MEM_FILTER_NODES;
-		if (in_interrupt() || !wait)
-			filter &= ~SHOW_MEM_FILTER_NODES;
-
-		pr_warning("%s: page allocation failure. order:%d, mode:0x%x\n",
-			current->comm, order, gfp_mask);
-		dump_stack();
-		if (!should_suppress_show_mem())
-			show_mem(filter);
-	}
+	nopage_warning(gfp_mask, order, NULL);
 	return page;
 got_pg:
 	if (kmemcheck_enabled)
_
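
As an illustration of the "private rate-limiting" point in the changelog, here is a minimal userspace sketch. It is not kernel code. It assumes a simple fixed-window limiter, and `struct ratelimit` and `ratelimit_ok()` are hypothetical names that only stand in for the kernel's ratelimit state and __ratelimit(). The idea it shows is that a caller with its own state keeps its own message budget, instead of competing with every other printk_ratelimit() user for a shared one.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for a per-caller rate-limit state. */
struct ratelimit {
	long interval;	/* window length, in caller-defined ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages already emitted in this window */
	bool started;	/* false until the first call */
};

/* Returns true if the owner of 'rs' may print a message at tick 'now'. */
bool ratelimit_ok(struct ratelimit *rs, long now)
{
	/* Open a fresh window on first use or once the old one has expired. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;	/* budget for this window is used up */
}
```

Two independent states do not affect each other: exhausting one leaves the other's budget untouched, which a single shared state cannot offer.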

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [PATCH 2/2] print vmalloc() state after allocation failures
  2011-04-08 20:22 [PATCH 1/2] break out page allocation warning code Dave Hansen
@ 2011-04-08 20:22 ` Dave Hansen
  2011-04-08 20:39   ` David Rientjes
  2011-04-08 20:37 ` [PATCH 1/2] break out page allocation warning code David Rientjes
  2011-04-08 20:54 ` Michał Nazarewicz
  2 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2011-04-08 20:22 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Johannes Weiner, Dave Hansen


I was tracking down a page allocation failure that ended up in vmalloc().
Since vmalloc() uses 0-order pages, if somebody asks for an insane amount
of memory, we'll still get a warning with "order:0" in it.  That's not
very useful.

During recovery, vmalloc() also nicely frees all of the memory that it
got up to the point of the failure.  That is wonderful, but it also
quickly hides any issues.  We have a much different situation if vmalloc()
repeatedly fails 10GB in to:

	vmalloc(100UL << 30);

versus repeatedly failing 4096 bytes in to a:

	vmalloc(8192);

This patch will print out messages that look like this:

[   30.040774] bash: vmalloc failure allocating after 0 / 73728 bytes

As a side issue, I also noticed that ctl_ioctl() does vmalloc() based
solely on an unverified value passed in from userspace.  Granted, it's
under CAP_SYS_ADMIN, but it still frightens me a bit.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/mm/vmalloc.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff -puN mm/vmalloc.c~vmalloc-warn mm/vmalloc.c
--- linux-2.6.git/mm/vmalloc.c~vmalloc-warn	2011-04-08 09:36:05.877020199 -0700
+++ linux-2.6.git-dave/mm/vmalloc.c	2011-04-08 09:38:00.373093593 -0700
@@ -1534,6 +1534,7 @@ static void *__vmalloc_node(unsigned lon
 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, int node, void *caller)
 {
+	int order = 0;
 	struct page **pages;
 	unsigned int nr_pages, array_size, i;
 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
@@ -1560,11 +1561,12 @@ static void *__vmalloc_area_node(struct 
 
 	for (i = 0; i < area->nr_pages; i++) {
 		struct page *page;
+		gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;
 
 		if (node < 0)
-			page = alloc_page(gfp_mask);
+			page = alloc_page(tmp_mask);
 		else
-			page = alloc_pages_node(node, gfp_mask, 0);
+			page = alloc_pages_node(node, tmp_mask, order);
 
 		if (unlikely(!page)) {
 			/* Successfully allocated i pages, free them in __vunmap() */
@@ -1579,6 +1581,9 @@ static void *__vmalloc_area_node(struct 
 	return area->addr;
 
 fail:
+	nopage_warning(gfp_mask, order, "vmalloc: allocation failure, "
+			"allocated %ld of %ld bytes\n",
+			(area->nr_pages*PAGE_SIZE), area->size);
 	vfree(area->addr);
 	return NULL;
 }
_
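
To make the motivation concrete, here is a rough userspace model of a page-by-page allocator. It is only a sketch: `alloc_area()`, its `avail` parameter (a pretend limit on how many pages the system can supply) and `SKETCH_PAGE_SIZE` are all made up for the example, and malloc() stands in for the page allocator. Every individual request is for a single page, so only the caller that loops over them knows how far the whole request got.

```c
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for this sketch */

/*
 * Try to obtain 'nr_pages' pages one at a time, pretending that the
 * system runs dry after 'avail' pages.  On failure, describe the
 * progress in 'msg'.  All pages are released again before returning,
 * since this is only a model.  Returns the number of pages obtained.
 */
size_t alloc_area(size_t nr_pages, size_t avail, char *msg, size_t msg_len)
{
	void **pages;
	size_t i, got;

	if (msg_len)
		msg[0] = '\0';

	pages = calloc(nr_pages ? nr_pages : 1, sizeof(*pages));
	if (!pages)
		return 0;

	for (i = 0; i < nr_pages; i++) {
		/* Each request is a single page, whatever the total size. */
		pages[i] = (i < avail) ? malloc(SKETCH_PAGE_SIZE) : NULL;
		if (!pages[i])
			break;
	}
	got = i;

	/* Only this level knows both the progress and the total. */
	if (got < nr_pages)
		snprintf(msg, msg_len,
			 "vmalloc: allocation failure, allocated %lu of %lu bytes\n",
			 (unsigned long)(got * SKETCH_PAGE_SIZE),
			 (unsigned long)(nr_pages * SKETCH_PAGE_SIZE));

	for (i = 0; i < got; i++)
		free(pages[i]);
	free(pages);
	return got;
}
```

A request that fails after one page of two and a request that fails after millions of pages would both look like a failed single-page allocation from below; the message built here is what tells them apart.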


* Re: [PATCH 1/2] break out page allocation warning code
  2011-04-08 20:22 [PATCH 1/2] break out page allocation warning code Dave Hansen
  2011-04-08 20:22 ` [PATCH 2/2] print vmalloc() state after allocation failures Dave Hansen
@ 2011-04-08 20:37 ` David Rientjes
  2011-04-08 20:43   ` Dave Hansen
  2011-04-08 20:54 ` Michał Nazarewicz
  2 siblings, 1 reply; 15+ messages in thread
From: David Rientjes @ 2011-04-08 20:37 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, Johannes Weiner

On Fri, 8 Apr 2011, Dave Hansen wrote:

> 
> This originally started as a simple patch to give vmalloc()
> some more verbose output on failure on top of the plain
> page allocator messages.  Johannes suggested that it might
> be nicer to lead with the vmalloc() info _before_ the page
> allocator messages.
> 
> But, I do think there's a lot of value in what
> __alloc_pages_slowpath() does with its filtering and so
> forth.
> 
> This patch creates a new function which other allocators
> can call instead of relying on the internal page allocator
> warnings.  It also gives this function private rate-limiting
> which separates it from other printk_ratelimit() users.
> 
> ---
> 
>  linux-2.6.git-dave/include/linux/mm.h |    2 +
>  linux-2.6.git-dave/mm/page_alloc.c    |   65 +++++++++++++++++++++++-----------
>  2 files changed, 46 insertions(+), 21 deletions(-)
> 
> diff -puN include/linux/mm.h~break-out-alloc-failure-messages include/linux/mm.h
> --- linux-2.6.git/include/linux/mm.h~break-out-alloc-failure-messages	2011-04-08 13:07:18.978332687 -0700
> +++ linux-2.6.git-dave/include/linux/mm.h	2011-04-08 13:07:18.990332675 -0700
> @@ -1365,6 +1365,8 @@ extern void si_meminfo(struct sysinfo * 
>  extern void si_meminfo_node(struct sysinfo *val, int nid);
>  extern int after_bootmem;
>  
> +extern void nopage_warning(gfp_t gfp_mask, int order, const char *fmt, ...);
> +
>  extern void setup_per_cpu_pageset(void);
>  
>  extern void zone_pcp_update(struct zone *zone);
> diff -puN mm/page_alloc.c~break-out-alloc-failure-messages mm/page_alloc.c
> --- linux-2.6.git/mm/page_alloc.c~break-out-alloc-failure-messages	2011-04-08 13:07:18.982332683 -0700
> +++ linux-2.6.git-dave/mm/page_alloc.c	2011-04-08 13:07:18.990332675 -0700
> @@ -54,6 +54,7 @@
>  #include <trace/events/kmem.h>
>  #include <linux/ftrace_event.h>
>  #include <linux/memcontrol.h>
> +#include <linux/ratelimit.h>
>  
>  #include <asm/tlbflush.h>
>  #include <asm/div64.h>
> @@ -1734,6 +1735,48 @@ static inline bool should_suppress_show_
>  	return ret;
>  }
>  
> +static DEFINE_RATELIMIT_STATE(nopage_rs,
> +		DEFAULT_RATELIMIT_INTERVAL,
> +		DEFAULT_RATELIMIT_BURST);
> +
> +void nopage_warning(gfp_t gfp_mask, int order, const char *fmt, ...)

I suggest a different name for this, something like warn_alloc_failure() 
or such.

I guess this isn't general enough where it could be used in the oom killer 
as well?

> +{
> +	va_list args;
> +	int r;
> +	unsigned int filter = SHOW_MEM_FILTER_NODES;
> +	const gfp_t wait = gfp_mask & __GFP_WAIT;
> +
> +	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
> +		return;
> +
> +	/*
> +	 * This documents exceptions given to allocations in certain
> +	 * contexts that are allowed to allocate outside current's set
> +	 * of allowed nodes.
> +	 */
> +	if (!(gfp_mask & __GFP_NOMEMALLOC))
> +		if (test_thread_flag(TIF_MEMDIE) ||
> +		    (current->flags & (PF_MEMALLOC | PF_EXITING)))
> +			filter &= ~SHOW_MEM_FILTER_NODES;
> +	if (in_interrupt() || !wait)
> +		filter &= ~SHOW_MEM_FILTER_NODES;
> +
> +	if (fmt) {
> +		printk(KERN_WARNING);
> +		va_start(args, fmt);
> +		r = vprintk(fmt, args);
> +		va_end(args);
> +	}
> +
> +	printk(KERN_WARNING);
> +	printk("%s: page allocation failure: order:%d, mode:0x%x\n",
> +			current->comm, order, gfp_mask);

This shouldn't be here, it should have been printed already.

> +
> +	dump_stack();
> +	if (!should_suppress_show_mem())
> +		show_mem(filter);
> +}
> +
>  static inline int
>  should_alloc_retry(gfp_t gfp_mask, unsigned int order,
>  				unsigned long pages_reclaimed)
> @@ -2176,27 +2219,7 @@ rebalance:
>  	}
>  
>  nopage:
> -	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
> -		unsigned int filter = SHOW_MEM_FILTER_NODES;
> -
> -		/*
> -		 * This documents exceptions given to allocations in certain
> -		 * contexts that are allowed to allocate outside current's set
> -		 * of allowed nodes.
> -		 */
> -		if (!(gfp_mask & __GFP_NOMEMALLOC))
> -			if (test_thread_flag(TIF_MEMDIE) ||
> -			    (current->flags & (PF_MEMALLOC | PF_EXITING)))
> -				filter &= ~SHOW_MEM_FILTER_NODES;
> -		if (in_interrupt() || !wait)
> -			filter &= ~SHOW_MEM_FILTER_NODES;
> -
> -		pr_warning("%s: page allocation failure. order:%d, mode:0x%x\n",
> -			current->comm, order, gfp_mask);
> -		dump_stack();
> -		if (!should_suppress_show_mem())
> -			show_mem(filter);
> -	}
> +	nopage_warning(gfp_mask, order, NULL);
>  	return page;
>  got_pg:
>  	if (kmemcheck_enabled)


* Re: [PATCH 2/2] print vmalloc() state after allocation failures
  2011-04-08 20:22 ` [PATCH 2/2] print vmalloc() state after allocation failures Dave Hansen
@ 2011-04-08 20:39   ` David Rientjes
  2011-04-08 20:47     ` Dave Hansen
  0 siblings, 1 reply; 15+ messages in thread
From: David Rientjes @ 2011-04-08 20:39 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, Johannes Weiner

On Fri, 8 Apr 2011, Dave Hansen wrote:

> 
> I was tracking down a page allocation failure that ended up in vmalloc().
> Since vmalloc() uses 0-order pages, if somebody asks for an insane amount
> of memory, we'll still get a warning with "order:0" in it.  That's not
> very useful.
> 
> During recovery, vmalloc() also nicely frees all of the memory that it
> got up to the point of the failure.  That is wonderful, but it also
> quickly hides any issues.  We have a much different situation if vmalloc()
> repeatedly fails 10GB in to:
> 
> 	vmalloc(100UL << 30);
> 
> versus repeatedly failing 4096 bytes in to a:
> 
> 	vmalloc(8192);
> 
> This patch will print out messages that look like this:
> 
> [   30.040774] bash: vmalloc failure allocating after 0 / 73728 bytes
> 

Either the changelog or the patch is still wrong because the format of 
this string is inconsistent.

> As a side issue, I also noticed that ctl_ioctl() does vmalloc() based
> solely on an unverified value passed in from userspace.  Granted, it's
> under CAP_SYS_ADMIN, but it still frightens me a bit.
> 
> Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
> ---
> 
>  linux-2.6.git-dave/mm/vmalloc.c |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff -puN mm/vmalloc.c~vmalloc-warn mm/vmalloc.c
> --- linux-2.6.git/mm/vmalloc.c~vmalloc-warn	2011-04-08 09:36:05.877020199 -0700
> +++ linux-2.6.git-dave/mm/vmalloc.c	2011-04-08 09:38:00.373093593 -0700
> @@ -1534,6 +1534,7 @@ static void *__vmalloc_node(unsigned lon
>  static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  				 pgprot_t prot, int node, void *caller)
>  {
> +	int order = 0;

Unnecessary, we can continue to hardcode the 0, vmalloc isn't going to use 
higher order allocs (it's there to avoid such things!).

>  	struct page **pages;
>  	unsigned int nr_pages, array_size, i;
>  	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> @@ -1560,11 +1561,12 @@ static void *__vmalloc_area_node(struct 
>  
>  	for (i = 0; i < area->nr_pages; i++) {
>  		struct page *page;
> +		gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;

I think it would be better to just do away with this as well and just 
hardwire the __GFP_NOWARN directly into the two allocation calls.

>  
>  		if (node < 0)
> -			page = alloc_page(gfp_mask);
> +			page = alloc_page(tmp_mask);
>  		else
> -			page = alloc_pages_node(node, gfp_mask, 0);
> +			page = alloc_pages_node(node, tmp_mask, order);
>  
>  		if (unlikely(!page)) {
>  			/* Successfully allocated i pages, free them in __vunmap() */
> @@ -1579,6 +1581,9 @@ static void *__vmalloc_area_node(struct 
>  	return area->addr;
>  
>  fail:
> +	nopage_warning(gfp_mask, order, "vmalloc: allocation failure, "
> +			"allocated %ld of %ld bytes\n",
> +			(area->nr_pages*PAGE_SIZE), area->size);
>  	vfree(area->addr);
>  	return NULL;
>  }


* Re: [PATCH 1/2] break out page allocation warning code
  2011-04-08 20:37 ` [PATCH 1/2] break out page allocation warning code David Rientjes
@ 2011-04-08 20:43   ` Dave Hansen
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2011-04-08 20:43 UTC (permalink / raw)
  To: David Rientjes; +Cc: linux-mm, linux-kernel, Johannes Weiner

On Fri, 2011-04-08 at 13:37 -0700, David Rientjes wrote:
> > +static DEFINE_RATELIMIT_STATE(nopage_rs,
> > +		DEFAULT_RATELIMIT_INTERVAL,
> > +		DEFAULT_RATELIMIT_BURST);
> > +
> > +void nopage_warning(gfp_t gfp_mask, int order, const char *fmt, ...)
> 
> I suggest a different name for this, something like warn_alloc_failure() 
> or such.

That works for me.

> I guess this isn't general enough where it could be used in the oom killer 
> as well?

Nope, don't think so.  I took a look at it, but it isn't horribly close
to this.

> > +{
> > +	va_list args;
> > +	int r;
> > +	unsigned int filter = SHOW_MEM_FILTER_NODES;
> > +	const gfp_t wait = gfp_mask & __GFP_WAIT;
> > +
> > +	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
> > +		return;
> > +
> > +	/*
> > +	 * This documents exceptions given to allocations in certain
> > +	 * contexts that are allowed to allocate outside current's set
> > +	 * of allowed nodes.
> > +	 */
> > +	if (!(gfp_mask & __GFP_NOMEMALLOC))
> > +		if (test_thread_flag(TIF_MEMDIE) ||
> > +		    (current->flags & (PF_MEMALLOC | PF_EXITING)))
> > +			filter &= ~SHOW_MEM_FILTER_NODES;
> > +	if (in_interrupt() || !wait)
> > +		filter &= ~SHOW_MEM_FILTER_NODES;
> > +
> > +	if (fmt) {
> > +		printk(KERN_WARNING);
> > +		va_start(args, fmt);
> > +		r = vprintk(fmt, args);
> > +		va_end(args);
> > +	}
> > +
> > +	printk(KERN_WARNING);
> > +	printk("%s: page allocation failure: order:%d, mode:0x%x\n",
> > +			current->comm, order, gfp_mask);
> 
> This shouldn't be here, it should have been printed already.

The "page allocation failure" might have been, if it was specified (it
isn't from the allocator), but order and mode haven't been.  My thought
here is that _all_ allocator failures will want to output order and mode,
so it might as well be common code instead of making everybody specify
it.

-- Dave
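
The "common code" argument can be sketched in userspace roughly as follows. This is an illustration under assumptions, not the patch itself: `format_alloc_warning()` is a hypothetical name, and it writes into a caller-supplied buffer instead of calling printk(). The caller's optional message comes first, and the order/mode line is appended by the helper so that no caller has to spell it out.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Write the caller's optional message (if 'fmt' is non-NULL) followed by
 * the common "page allocation failure" line into 'buf'.  Returns the
 * length the full text needs, as snprintf() does, or a negative value
 * on a formatting error.
 */
int format_alloc_warning(char *buf, size_t len, const char *comm,
			 int order, unsigned int mode, const char *fmt, ...)
{
	int n = 0;

	if (len)
		buf[0] = '\0';

	if (fmt) {
		va_list args;

		va_start(args, fmt);
		n = vsnprintf(buf, len, fmt, args);
		va_end(args);

		/* Stop on error or if the caller's message was truncated. */
		if (n < 0 || (size_t)n >= len)
			return n;
	}

	/* Common trailer: every caller gets order and mode for free. */
	return n + snprintf(buf + n, len - (size_t)n,
			    "%s: page allocation failure: order:%d, mode:0x%x\n",
			    comm, order, mode);
}
```

Passing NULL for `fmt` gives just the common line, which corresponds to how the page allocator itself calls the helper in the patch.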


* Re: [PATCH 2/2] print vmalloc() state after allocation failures
  2011-04-08 20:39   ` David Rientjes
@ 2011-04-08 20:47     ` Dave Hansen
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2011-04-08 20:47 UTC (permalink / raw)
  To: David Rientjes; +Cc: linux-mm, linux-kernel, Johannes Weiner

On Fri, 2011-04-08 at 13:39 -0700, David Rientjes wrote:
> On Fri, 8 Apr 2011, Dave Hansen wrote:
> > This patch will print out messages that look like this:
> > 
> > [   30.040774] bash: vmalloc failure allocating after 0 / 73728 bytes
> > 
> 
> Either the changelog or the patch is still wrong because the format of 
> this string is inconsistent.

Yeah, ya caught me. :)
> > diff -puN mm/vmalloc.c~vmalloc-warn mm/vmalloc.c
> > --- linux-2.6.git/mm/vmalloc.c~vmalloc-warn	2011-04-08 09:36:05.877020199 -0700
> > +++ linux-2.6.git-dave/mm/vmalloc.c	2011-04-08 09:38:00.373093593 -0700
> > @@ -1534,6 +1534,7 @@ static void *__vmalloc_node(unsigned lon
> >  static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >  				 pgprot_t prot, int node, void *caller)
> >  {
> > +	int order = 0;
> 
> Unnecessary, we can continue to hardcode the 0, vmalloc isn't going to use 
> higher order allocs (it's there to avoid such things!).

The only reason I did that was to keep the printk from looking like
this:

> > +	nopage_warning(gfp_mask, 0,  "vmalloc: allocation failure, "
> > +			"allocated %ld of %ld bytes\n",
> > +			(area->nr_pages*PAGE_SIZE), area->size);

The order is pretty darn obvious in the direct allocator calls, but I
liked having it named where it wasn't as obvious.

> >  	struct page **pages;
> >  	unsigned int nr_pages, array_size, i;
> >  	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> > @@ -1560,11 +1561,12 @@ static void *__vmalloc_area_node(struct 
> >  
> >  	for (i = 0; i < area->nr_pages; i++) {
> >  		struct page *page;
> > +		gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;
> 
> I think it would be better to just do away with this as well and just 
> hardwire the __GFP_NOWARN directly into the two allocation calls.

I did it because hard-wiring it takes the alloc_pages_node() one over 80
columns.  I figured if I was going to add a line, I might as well keep
it pretty.

-- Dave


* Re: [PATCH 1/2] break out page allocation warning code
  2011-04-08 20:22 [PATCH 1/2] break out page allocation warning code Dave Hansen
  2011-04-08 20:22 ` [PATCH 2/2] print vmalloc() state after allocation failures Dave Hansen
  2011-04-08 20:37 ` [PATCH 1/2] break out page allocation warning code David Rientjes
@ 2011-04-08 20:54 ` Michał Nazarewicz
  2011-04-08 21:02   ` Dave Hansen
  2 siblings, 1 reply; 15+ messages in thread
From: Michał Nazarewicz @ 2011-04-08 20:54 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, Johannes Weiner, linux-kernel

On Apr 8, 2011 10:23 PM, "Dave Hansen" <dave@linux.vnet.ibm.com> wrote:
> +       if (fmt) {
> +               printk(KERN_WARNING);
> +               va_start(args, fmt);
> +               r = vprintk(fmt, args);
> +               va_end(args);
> +       }

Could we make the "printk(KERN_WARNING);" go away and require caller to
specify level?

> +       printk(KERN_WARNING);
> +       printk("%s: page allocation failure: order:%d, mode:0x%x\n",
> +                       current->comm, order, gfp_mask);

Even more so here. Why not pr_warning instead of two non-atomic calls to
printk?


* Re: [PATCH 1/2] break out page allocation warning code
  2011-04-08 20:54 ` Michał Nazarewicz
@ 2011-04-08 21:02   ` Dave Hansen
  2011-04-11 10:20     ` Michal Nazarewicz
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2011-04-08 21:02 UTC (permalink / raw)
  To: Michał Nazarewicz; +Cc: linux-mm, Johannes Weiner, linux-kernel

On Fri, 2011-04-08 at 22:54 +0200, Michał Nazarewicz wrote:
> On Apr 8, 2011 10:23 PM, "Dave Hansen" <dave@linux.vnet.ibm.com> wrote:
> > +       if (fmt) {
> > +               printk(KERN_WARNING);
> > +               va_start(args, fmt);
> > +               r = vprintk(fmt, args);
> > +               va_end(args);
> > +       }
> 
> Could we make the "printk(KERN_WARNING);" go away and require caller
> to specify level?  

The core problem is this: I want two lines of output: one for the
order/mode gunk, and one for the user-specified message.

If we have the user pass in a string for the printk() level, we're stuck
doing what I have here.  If we have them _prepend_ it to the "fmt"
string, then it's harder to figure out below.  I guess we could fish in
the string for it.

> > +       printk(KERN_WARNING);
> > +       printk("%s: page allocation failure: order:%d, mode:0x%x\n",
> > +                       current->comm, order, gfp_mask);
> 
> Even more so here. Why not pr_warning instead of two non-atomic calls
> to printk?

It's a relic of an hour ago when I tried passing in the printk() level
to the function as a string.  It can go away now. :)

-- Dave


* Re: [PATCH 1/2] break out page allocation warning code
  2011-04-08 21:02   ` Dave Hansen
@ 2011-04-11 10:20     ` Michal Nazarewicz
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Nazarewicz @ 2011-04-11 10:20 UTC (permalink / raw)
  To: Michał Nazarewicz, Dave Hansen
  Cc: linux-mm, Johannes Weiner, linux-kernel

>> On Apr 8, 2011 10:23 PM, "Dave Hansen" <dave@linux.vnet.ibm.com> wrote:
>>> +       if (fmt) {
>>> +               printk(KERN_WARNING);
>>> +               va_start(args, fmt);
>>> +               r = vprintk(fmt, args);
>>> +               va_end(args);
>>> +       }

> On Fri, 2011-04-08 at 22:54 +0200, Michał Nazarewicz wrote:
>> Could we make the "printk(KERN_WARNING);" go away and require caller
>> to specify level?

On Fri, 08 Apr 2011 23:02:02 +0200, Dave Hansen wrote:
> The core problem is this: I want two lines of output: one for the
> order/mode gunk, and one for the user-specified message.
>
> If we have the user pass in a string for the printk() level, we're stuck
> doing what I have here.  If we have them _prepend_ it to the "fmt"
> string, then it's harder to figure out below.  I guess we could fish in
> the string for it.

This is a bit unfortunate, but that's what I was worried about anyway.  I
guess creating a macro which automatically prepends KERN_WARNING to the
format would solve the issue, but that's probably not the most elegant
solution.
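
For what it's worth, such a macro could look roughly like the userspace sketch below. The names `warn_fmt()`, `demo_warn()` and `SKETCH_WARNING` are hypothetical, the "<4>" value is only assumed to match what KERN_WARNING expands to, and snprintf() into a buffer stands in for printk(). It relies on adjacent string literals being concatenated by the compiler.

```c
#include <stdio.h>

#define SKETCH_WARNING "<4>"	/* stand-in for KERN_WARNING; value assumed */

/*
 * Prepend the level to the format.  The first variadic argument must be
 * a string literal, because the prefix is glued on by literal
 * concatenation at compile time.
 */
#define warn_fmt(buf, len, ...) \
	snprintf((buf), (len), SKETCH_WARNING __VA_ARGS__)

/* Example caller: the level never appears at the call site. */
int demo_warn(char *buf, size_t len, long done, long total)
{
	return warn_fmt(buf, len, "vmalloc: allocated %ld of %ld bytes\n",
			done, total);
}
```

The restriction to literal formats is the inelegant part: a format held in a variable cannot be passed through this macro.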

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michal "mina86" Nazarewicz    (o o)
ooo +-----<email/xmpp: mnazarewicz@google.com>-----ooO--(_)--Ooo--


* [PATCH 2/2] print vmalloc() state after allocation failures
  2011-04-15 17:04 Dave Hansen
@ 2011-04-15 17:04 ` Dave Hansen
  2011-04-15 17:20   ` Michal Nazarewicz
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2011-04-15 17:04 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Johannes Weiner, David Rientjes, Michal Nazarewicz,
	akpm, Dave Hansen


I was tracking down a page allocation failure that ended up in vmalloc().
Since vmalloc() uses 0-order pages, if somebody asks for an insane amount
of memory, we'll still get a warning with "order:0" in it.  That's not
very useful.

During recovery, vmalloc() also nicely frees all of the memory that it
got up to the point of the failure.  That is wonderful, but it also
quickly hides any issues.  We have a much different situation if vmalloc()
repeatedly fails 10GB in to:

	vmalloc(100UL << 30);

versus repeatedly failing 4096 bytes in to a:

	vmalloc(8192);

This patch will print out messages that look like this:

[   68.123503] vmalloc: allocation failure, allocated 6680576 of 13426688 bytes
[   68.124218] bash: page allocation failure: order:0, mode:0xd2
[   68.124811] Pid: 3770, comm: bash Not tainted 2.6.39-rc3-00082-g85f2e68-dirty #333
[   68.125579] Call Trace:
[   68.125853]  [<ffffffff810f6da6>] warn_alloc_failed+0x146/0x170
[   68.126464]  [<ffffffff8107e05c>] ? printk+0x6c/0x70
[   68.126791]  [<ffffffff8112b5d4>] ? alloc_pages_current+0x94/0xe0
[   68.127661]  [<ffffffff8111ed37>] __vmalloc_node_range+0x237/0x290
...

The 'order' variable is added for clarity when calling
warn_alloc_failed() to avoid having an unexplained '0' as an argument.
The 'tmp_mask' is there to keep the alloc_pages_node() looking sane.
Adding __GFP_NOWARN is done because we now have our own, full error
message in vmalloc code.

As a side issue, I also noticed that ctl_ioctl() does vmalloc() based
solely on an unverified value passed in from userspace.  Granted, it's
under CAP_SYS_ADMIN, but it still frightens me a bit.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/mm/vmalloc.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff -puN mm/vmalloc.c~vmalloc-warn mm/vmalloc.c
--- linux-2.6.git/mm/vmalloc.c~vmalloc-warn	2011-04-15 08:49:06.823306620 -0700
+++ linux-2.6.git-dave/mm/vmalloc.c	2011-04-15 09:20:17.926460283 -0700
@@ -1534,6 +1534,7 @@ static void *__vmalloc_node(unsigned lon
 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, int node, void *caller)
 {
+	int order = 0;
 	struct page **pages;
 	unsigned int nr_pages, array_size, i;
 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
@@ -1560,11 +1561,12 @@ static void *__vmalloc_area_node(struct 
 
 	for (i = 0; i < area->nr_pages; i++) {
 		struct page *page;
+		gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;
 
 		if (node < 0)
-			page = alloc_page(gfp_mask);
+			page = alloc_page(tmp_mask);
 		else
-			page = alloc_pages_node(node, gfp_mask, 0);
+			page = alloc_pages_node(node, tmp_mask, order);
 
 		if (unlikely(!page)) {
 			/* Successfully allocated i pages, free them in __vunmap() */
@@ -1579,6 +1581,9 @@ static void *__vmalloc_area_node(struct 
 	return area->addr;
 
 fail:
+	warn_alloc_failed(gfp_mask, order, "vmalloc: allocation failure, "
+			  "allocated %ld of %ld bytes\n",
+			  (area->nr_pages*PAGE_SIZE), area->size);
 	vfree(area->addr);
 	return NULL;
 }
_


* Re: [PATCH 2/2] print vmalloc() state after allocation failures
  2011-04-15 17:04 ` [PATCH 2/2] print vmalloc() state after allocation failures Dave Hansen
@ 2011-04-15 17:20   ` Michal Nazarewicz
  2011-04-15 17:44     ` Dave Hansen
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Nazarewicz @ 2011-04-15 17:20 UTC (permalink / raw)
  To: linux-mm, Dave Hansen; +Cc: linux-kernel, Johannes Weiner, David Rientjes, akpm

On Fri, 15 Apr 2011 19:04:38 +0200, Dave Hansen wrote:
> diff -puN mm/vmalloc.c~vmalloc-warn mm/vmalloc.c
> --- linux-2.6.git/mm/vmalloc.c~vmalloc-warn	2011-04-15 08:49:06.823306620 -0700
> +++ linux-2.6.git-dave/mm/vmalloc.c	2011-04-15 09:20:17.926460283 -0700
> @@ -1534,6 +1534,7 @@ static void *__vmalloc_node(unsigned lon
>  static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  				 pgprot_t prot, int node, void *caller)
>  {
> +	int order = 0;

Could we make that const?

>  	struct page **pages;
>  	unsigned int nr_pages, array_size, i;
>  	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> @@ -1560,11 +1561,12 @@ static void *__vmalloc_area_node(struct
> 	for (i = 0; i < area->nr_pages; i++) {
>  		struct page *page;
> +		gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;
> 		if (node < 0)
> -			page = alloc_page(gfp_mask);
> +			page = alloc_page(tmp_mask);
>  		else
> -			page = alloc_pages_node(node, gfp_mask, 0);
> +			page = alloc_pages_node(node, tmp_mask, order);

so it'll be more visible that we are passing 0 here.

> 		if (unlikely(!page)) {
>  			/* Successfully allocated i pages, free them in __vunmap() */
> @@ -1579,6 +1581,9 @@ static void *__vmalloc_area_node(struct
>  	return area->addr;
> fail:
> +	warn_alloc_failed(gfp_mask, order, "vmalloc: allocation failure, "
> +			  "allocated %ld of %ld bytes\n",
> +			  (area->nr_pages*PAGE_SIZE), area->size);
>  	vfree(area->addr);
>  	return NULL;
>  }
> _


-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michal "mina86" Nazarewicz    (o o)
ooo +-----<email/xmpp: mnazarewicz@google.com>-----ooO--(_)--Ooo--


* [PATCH 2/2] print vmalloc() state after allocation failures
  2011-04-15 17:20   ` Michal Nazarewicz
@ 2011-04-15 17:44     ` Dave Hansen
  2011-04-17  0:03       ` David Rientjes
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2011-04-15 17:44 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: linux-mm, linux-kernel, Johannes Weiner, David Rientjes, akpm

On Fri, 2011-04-15 at 19:20 +0200, Michal Nazarewicz wrote:
> On Fri, 15 Apr 2011 19:04:38 +0200, Dave Hansen wrote:
> > diff -puN mm/vmalloc.c~vmalloc-warn mm/vmalloc.c
> > --- linux-2.6.git/mm/vmalloc.c~vmalloc-warn	2011-04-15 08:49:06.823306620 -0700
> > +++ linux-2.6.git-dave/mm/vmalloc.c	2011-04-15 09:20:17.926460283 -0700
> > @@ -1534,6 +1534,7 @@ static void *__vmalloc_node(unsigned lon
> >  static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >  				 pgprot_t prot, int node, void *caller)
> >  {
> > +	int order = 0;
> 
> Could we make that const?

Sure.  Here's a replacement patch.  Compiles and boots for me.

--

I was tracking down a page allocation failure that ended up in vmalloc().
Since vmalloc() uses 0-order pages, if somebody asks for an insane amount
of memory, we'll still get a warning with "order:0" in it.  That's not
very useful.

During recovery, vmalloc() also nicely frees all of the memory that it
got up to the point of the failure.  That is wonderful, but it also
quickly hides any issues.  We have a much different situation if vmalloc()
repeatedly fails 10GB into:

	vmalloc(100 * 1<<30);

versus repeatedly failing 4096 bytes into a:

	vmalloc(8192);

This patch will print out messages that look like this:

[   68.123503] vmalloc: allocation failure, allocated 6680576 of 13426688 bytes
[   68.124218] bash: page allocation failure: order:0, mode:0xd2
[   68.124811] Pid: 3770, comm: bash Not tainted 2.6.39-rc3-00082-g85f2e68-dirty #333
[   68.125579] Call Trace:
[   68.125853]  [<ffffffff810f6da6>] warn_alloc_failed+0x146/0x170
[   68.126464]  [<ffffffff8107e05c>] ? printk+0x6c/0x70
[   68.126791]  [<ffffffff8112b5d4>] ? alloc_pages_current+0x94/0xe0
[   68.127661]  [<ffffffff8111ed37>] __vmalloc_node_range+0x237/0x290
...

The 'order' variable is added for clarity when calling
warn_alloc_failed() to avoid having an unexplained '0' as an argument.
The 'tmp_mask' is there to keep the alloc_pages_node() looking sane.
Adding __GFP_NOWARN is done because we now have our own, full error
message in vmalloc code.
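
For illustration only, here is a minimal userspace sketch of the
reporting idea: count pages as they are obtained one at a time and, on
failure, say how far the request got.  Nothing here is kernel code;
sketch_area_alloc(), SKETCH_PAGE_SIZE and the 4096-byte page size are
assumptions made for the example.  Under that assumption, the sample
message above corresponds to 1631 pages obtained out of 3278.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, illustration only */

/*
 * Hypothetical stand-in for a page-at-a-time allocation loop.
 * Pretends to obtain total_pages pages; the request for page index
 * fail_at fails (pass fail_at >= total_pages for no failure).
 * On failure the report is formatted into buf; otherwise buf is left
 * empty.  Returns the number of bytes obtained.
 */
static unsigned long sketch_area_alloc(unsigned long total_pages,
				       unsigned long fail_at,
				       char *buf, size_t buflen)
{
	unsigned long nr_pages = 0;

	memset(buf, 0, buflen);

	while (nr_pages < total_pages) {
		if (nr_pages == fail_at) {
			/* Report progress so far against the full request. */
			snprintf(buf, buflen,
				 "vmalloc: allocation failure, "
				 "allocated %lu of %lu bytes\n",
				 nr_pages * SKETCH_PAGE_SIZE,
				 total_pages * SKETCH_PAGE_SIZE);
			break;
		}
		nr_pages++;
	}

	return nr_pages * SKETCH_PAGE_SIZE;
}
```

Calling sketch_area_alloc(3278, 1631, buf, sizeof(buf)) fills buf with
the same first line as the sample output, which is the piece of
information an "order:0" warning alone cannot give.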

As a side issue, I also noticed that ctl_ioctl() does vmalloc() based
solely on an unverified value passed in from userspace.  Granted, it's
under CAP_SYS_ADMIN, but it still frightens me a bit.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/mm/vmalloc.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff -puN mm/vmalloc.c~vmalloc-warn mm/vmalloc.c
--- linux-2.6.git/mm/vmalloc.c~vmalloc-warn	2011-04-15 10:39:05.928793559 -0700
+++ linux-2.6.git-dave/mm/vmalloc.c	2011-04-15 10:39:18.716789177 -0700
@@ -1534,6 +1534,7 @@ static void *__vmalloc_node(unsigned lon
 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, int node, void *caller)
 {
+	const int order = 0;
 	struct page **pages;
 	unsigned int nr_pages, array_size, i;
 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
@@ -1560,11 +1561,12 @@ static void *__vmalloc_area_node(struct 
 
 	for (i = 0; i < area->nr_pages; i++) {
 		struct page *page;
+		gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;
 
 		if (node < 0)
-			page = alloc_page(gfp_mask);
+			page = alloc_page(tmp_mask);
 		else
-			page = alloc_pages_node(node, gfp_mask, 0);
+			page = alloc_pages_node(node, tmp_mask, order);
 
 		if (unlikely(!page)) {
 			/* Successfully allocated i pages, free them in __vunmap() */
@@ -1579,6 +1581,9 @@ static void *__vmalloc_area_node(struct 
 	return area->addr;
 
 fail:
+	warn_alloc_failed(gfp_mask, order, "vmalloc: allocation failure, "
+			  "allocated %ld of %ld bytes\n",
+			  (area->nr_pages*PAGE_SIZE), area->size);
 	vfree(area->addr);
 	return NULL;
 }
_


-- Dave


* Re: [PATCH 2/2] print vmalloc() state after allocation failures
  2011-04-15 17:44     ` Dave Hansen
@ 2011-04-17  0:03       ` David Rientjes
  2011-04-18 15:21         ` Dave Hansen
  0 siblings, 1 reply; 15+ messages in thread
From: David Rientjes @ 2011-04-17  0:03 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Michal Nazarewicz, linux-mm, linux-kernel, Johannes Weiner,
	Andrew Morton

On Fri, 15 Apr 2011, Dave Hansen wrote:

> diff -puN mm/vmalloc.c~vmalloc-warn mm/vmalloc.c
> --- linux-2.6.git/mm/vmalloc.c~vmalloc-warn	2011-04-15 10:39:05.928793559 -0700
> +++ linux-2.6.git-dave/mm/vmalloc.c	2011-04-15 10:39:18.716789177 -0700
> @@ -1534,6 +1534,7 @@ static void *__vmalloc_node(unsigned lon
>  static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  				 pgprot_t prot, int node, void *caller)
>  {
> +	const int order = 0;
>  	struct page **pages;
>  	unsigned int nr_pages, array_size, i;
>  	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> @@ -1560,11 +1561,12 @@ static void *__vmalloc_area_node(struct 
>  
>  	for (i = 0; i < area->nr_pages; i++) {
>  		struct page *page;
> +		gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;
>  
>  		if (node < 0)
> -			page = alloc_page(gfp_mask);
> +			page = alloc_page(tmp_mask);
>  		else
> -			page = alloc_pages_node(node, gfp_mask, 0);
> +			page = alloc_pages_node(node, tmp_mask, order);
>  
>  		if (unlikely(!page)) {
>  			/* Successfully allocated i pages, free them in __vunmap() */
> @@ -1579,6 +1581,9 @@ static void *__vmalloc_area_node(struct 
>  	return area->addr;
>  
>  fail:
> +	warn_alloc_failed(gfp_mask, order, "vmalloc: allocation failure, "
> +			  "allocated %ld of %ld bytes\n",
> +			  (area->nr_pages*PAGE_SIZE), area->size);
>  	vfree(area->addr);
>  	return NULL;
>  }

Sorry, I still don't understand why this isn't just a three-liner patch to 
call warn_alloc_failed().  I don't see the benefit of the "order" or 
"tmp_mask" variables at all, they'll just be removed next time someone 
goes down the mm/* directory and looks for variables that are used only 
once or are unchanged as a cleanup.


* Re: [PATCH 2/2] print vmalloc() state after allocation failures
  2011-04-17  0:03       ` David Rientjes
@ 2011-04-18 15:21         ` Dave Hansen
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2011-04-18 15:21 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Nazarewicz, linux-mm, linux-kernel, Johannes Weiner,
	Andrew Morton

On Sat, 2011-04-16 at 17:03 -0700, David Rientjes wrote:
> >  fail:
> > +     warn_alloc_failed(gfp_mask, order, "vmalloc: allocation failure, "
> > +                       "allocated %ld of %ld bytes\n",
> > +                       (area->nr_pages*PAGE_SIZE), area->size);
> >       vfree(area->addr);
> >       return NULL;
> >  }
> 
> Sorry, I still don't understand why this isn't just a three-liner patch to 
> call warn_alloc_failed().  I don't see the benefit of the "order" or 
> "tmp_mask" variables at all, they'll just be removed next time someone 
> goes down the mm/* directory and looks for variables that are used only 
> once or are unchanged as a cleanup. 

Without the "order" variable, we have:

	warn_alloc_failed(gfp_mask, 0, "vmalloc: allocation failure, "
		"allocated %ld of %ld bytes\n",
		(area->nr_pages*PAGE_SIZE), area->size);

I *HATE* those with a passion.  What is the '0' _doing_?  Is it for "0
pages", "do not print", "_do_ print"?  There's no way to tell without
going and finding warn_alloc_failed()'s definition.

With 'order' in there, the code self-documents, at least from the
caller's side.  It makes it 100% clear that the "0" being passed to the
allocators is the same as the one passed to the warning; it draws a
link between the allocations and the allocation error message:

	warn_alloc_failed(gfp_mask, order, "vmalloc: allocation failure, "
		"allocated %ld of %ld bytes\n",
		(area->nr_pages*PAGE_SIZE), area->size);

As for the 'tmp_mask' business, right now we have:

        for (i = 0; i < area->nr_pages; i++) {
                struct page *page;
+               gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;

                if (node < 0)
-                       page = alloc_page(gfp_mask);
+                       page = alloc_page(tmp_mask);
                else
-                       page = alloc_pages_node(node, gfp_mask, 0);
+                       page = alloc_pages_node(node, tmp_mask, order);

The alternative is this:

        for (i = 0; i < area->nr_pages; i++) {
                struct page *page;

                if (node < 0)
-                       page = alloc_page(gfp_mask);
+                       page = alloc_page(gfp_mask | __GFP_NOWARN);
                else
-                       page = alloc_pages_node(node, gfp_mask, 0);
+                       page = alloc_pages_node(node, gfp_mask | __GFP_NOWARN,
+						order);

I can go look, but I bet both versions compile down to the same thing.
Plus, they're the same number of lines in the end.  I know which one
appeals to me visually.
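
As a rough userspace check of the "same thing" claim, the sketch below
puts the two call-site styles side by side and records the mask a stub
allocator receives.  Everything in it is hypothetical: sketch_gfp_t,
SKETCH_GFP_NOWARN (an arbitrary bit value) and sketch_alloc_page() are
stand-ins, not kernel definitions.  It only shows that both styles pass
the same value; it says nothing about the generated code.

```c
#include <stdint.h>

typedef uint32_t sketch_gfp_t;		/* stand-in for gfp_t */
#define SKETCH_GFP_NOWARN 0x200u	/* hypothetical bit, illustration only */

static sketch_gfp_t sketch_last_mask;	/* mask the stub allocator last saw */

/* Stub allocator: only remembers the mask it was called with. */
static void sketch_alloc_page(sketch_gfp_t mask)
{
	sketch_last_mask = mask;
}

/* Style used in the patch: OR the flag into a local once. */
static sketch_gfp_t with_tmp_mask(sketch_gfp_t gfp_mask)
{
	sketch_gfp_t tmp_mask = gfp_mask | SKETCH_GFP_NOWARN;

	sketch_alloc_page(tmp_mask);
	return sketch_last_mask;
}

/* Alternative style: open-code the OR at the call site. */
static sketch_gfp_t open_coded(sketch_gfp_t gfp_mask)
{
	sketch_alloc_page(gfp_mask | SKETCH_GFP_NOWARN);
	return sketch_last_mask;
}
```

For any input mask, with_tmp_mask() and open_coded() return the same
value, so the choice between them is about readability and line length
rather than behavior.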

I think we're pretty deep in personal preference territory here.  If I
hear a consensus that folks like it one way over another, I'm happy to
change it.

-- Dave


* [PATCH 2/2] print vmalloc() state after allocation failures
  2011-04-19 16:21 [PATCH 1/2] break out page allocation warning code Dave Hansen
@ 2011-04-19 16:21 ` Dave Hansen
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2011-04-19 16:21 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Johannes Weiner, David Rientjes, Michal Nazarewicz,
	akpm, Dave Hansen


New in this version:
- updated description to clarify why I added local variables

--

I was tracking down a page allocation failure that ended up in vmalloc().
Since vmalloc() uses 0-order pages, if somebody asks for an insane amount
of memory, we'll still get a warning with "order:0" in it.  That's not
very useful.

During recovery, vmalloc() also nicely frees all of the memory that it
got up to the point of the failure.  That is wonderful, but it also
quickly hides any issues.  We have a much different situation if vmalloc()
repeatedly fails 10GB into:

	vmalloc(100 * 1<<30);

versus repeatedly failing 4096 bytes into a:

	vmalloc(8192);
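
A hedged aside on the arithmetic in the first call: read literally as
C, `100 * 1<<30` groups as `(100 * 1) << 30` and is evaluated in `int`,
so the 100GB value does not fit where `int` is 32 bits.  The expression
above is shorthand for "100GB", not something to paste into code.  A
minimal sketch of computing such a size in a 64-bit type follows;
gib_to_bytes() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Hypothetical helper: convert a size in GiB to bytes using 64-bit math. */
static inline uint64_t gib_to_bytes(uint64_t gib)
{
	/* 1 GiB is 2^30 bytes; the shift is done in a 64-bit type. */
	return gib << 30;
}
```

With this helper, gib_to_bytes(100) is 107374182400 and
gib_to_bytes(10) is 10737418240, the two sizes the example contrasts.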

This patch will print out messages that look like this:

[   68.123503] vmalloc: allocation failure, allocated 6680576 of 13426688 bytes
[   68.124218] bash: page allocation failure: order:0, mode:0xd2
[   68.124811] Pid: 3770, comm: bash Not tainted 2.6.39-rc3-00082-g85f2e68-dirty #333
[   68.125579] Call Trace:
[   68.125853]  [<ffffffff810f6da6>] warn_alloc_failed+0x146/0x170
[   68.126464]  [<ffffffff8107e05c>] ? printk+0x6c/0x70
[   68.126791]  [<ffffffff8112b5d4>] ? alloc_pages_current+0x94/0xe0
[   68.127661]  [<ffffffff8111ed37>] __vmalloc_node_range+0x237/0x290
...

The 'order' variable is added for clarity when calling
warn_alloc_failed() to avoid having an unexplained '0' as an argument.

The 'tmp_mask' is because adding an open-coded '| __GFP_NOWARN' would
take us over 80 columns for the alloc_pages_node() call.  If we are
going to add a line, it might as well be one that makes the sucker
easier to read.

As a side issue, I also noticed that ctl_ioctl() does vmalloc() based
solely on an unverified value passed in from userspace.  Granted, it's
under CAP_SYS_ADMIN, but it still frightens me a bit.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/mm/vmalloc.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff -puN mm/vmalloc.c~vmalloc-warn mm/vmalloc.c
--- linux-2.6.git/mm/vmalloc.c~vmalloc-warn	2011-04-18 15:03:35.658506887 -0700
+++ linux-2.6.git-dave/mm/vmalloc.c	2011-04-18 15:04:48.762499842 -0700
@@ -1534,6 +1534,7 @@ static void *__vmalloc_node(unsigned lon
 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, int node, void *caller)
 {
+	const int order = 0;
 	struct page **pages;
 	unsigned int nr_pages, array_size, i;
 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
@@ -1560,11 +1561,12 @@ static void *__vmalloc_area_node(struct 
 
 	for (i = 0; i < area->nr_pages; i++) {
 		struct page *page;
+		gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;
 
 		if (node < 0)
-			page = alloc_page(gfp_mask);
+			page = alloc_page(tmp_mask);
 		else
-			page = alloc_pages_node(node, gfp_mask, 0);
+			page = alloc_pages_node(node, tmp_mask, order);
 
 		if (unlikely(!page)) {
 			/* Successfully allocated i pages, free them in __vunmap() */
@@ -1579,6 +1581,9 @@ static void *__vmalloc_area_node(struct 
 	return area->addr;
 
 fail:
+	warn_alloc_failed(gfp_mask, order, "vmalloc: allocation failure, "
+			  "allocated %ld of %ld bytes\n",
+			  (area->nr_pages*PAGE_SIZE), area->size);
 	vfree(area->addr);
 	return NULL;
 }
_


end of thread, other threads:[~2011-04-19 16:21 UTC | newest]

Thread overview: 15+ messages
2011-04-08 20:22 [PATCH 1/2] break out page allocation warning code Dave Hansen
2011-04-08 20:22 ` [PATCH 2/2] print vmalloc() state after allocation failures Dave Hansen
2011-04-08 20:39   ` David Rientjes
2011-04-08 20:47     ` Dave Hansen
2011-04-08 20:37 ` [PATCH 1/2] break out page allocation warning code David Rientjes
2011-04-08 20:43   ` Dave Hansen
2011-04-08 20:54 ` Michał Nazarewicz
2011-04-08 21:02   ` Dave Hansen
2011-04-11 10:20     ` Michal Nazarewicz
  -- strict thread matches above, loose matches on Subject: below --
2011-04-15 17:04 Dave Hansen
2011-04-15 17:04 ` [PATCH 2/2] print vmalloc() state after allocation failures Dave Hansen
2011-04-15 17:20   ` Michal Nazarewicz
2011-04-15 17:44     ` Dave Hansen
2011-04-17  0:03       ` David Rientjes
2011-04-18 15:21         ` Dave Hansen
2011-04-19 16:21 [PATCH 1/2] break out page allocation warning code Dave Hansen
2011-04-19 16:21 ` [PATCH 2/2] print vmalloc() state after allocation failures Dave Hansen
