[PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE

* [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE
@ 2011-04-22  1:34 John Stultz
  2011-04-22 16:02 ` Dave Hansen
  2011-04-26  7:34 ` Mel Gorman
  0 siblings, 2 replies; 9+ messages in thread
From: John Stultz @ 2011-04-22  1:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Arve Hjønnevåg, Dave Hansen, Mel Gorman, Andrew Morton,
	John Stultz

From: Arve Hjønnevåg <arve@android.com>

This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
though it only had a few free pages. This in turn caused no contiguous memory
to be reserved and frequent kswapd wakeups that emptied the caches to get more
contiguous memory.

CC: Dave Hansen <dave@linux.vnet.ibm.com>
CC: Mel Gorman <mgorman@suse.de>
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>

[This patch was submitted and acked a little over a year ago
(see: http://lkml.org/lkml/2010/4/6/172 ), but never seemingly
made it upstream. Resending for comments. -jstultz]

Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 mm/page_alloc.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ed87f3b..209d9bf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3288,6 +3288,20 @@ static inline unsigned long wait_table_bits(unsigned long size)
 #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
 
 /*
+ * Check if a pageblock contains reserved pages
+ */
+static int pageblock_is_reserved(unsigned long start_pfn)
+{
+	unsigned long end_pfn = start_pfn + pageblock_nr_pages;
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++)
+		if (PageReserved(pfn_to_page(pfn)))
+			return 1;
+	return 0;
+}
+
+/*
  * Mark a number of pageblocks as MIGRATE_RESERVE. The number
  * of blocks reserved is based on min_wmark_pages(zone). The memory within
  * the reserve will tend to store contiguous free pages. Setting min_free_kbytes
@@ -3326,7 +3340,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 			continue;
 
 		/* Blocks with reserved pages will never free, skip them. */
-		if (PageReserved(page))
+		if (pageblock_is_reserved(pfn))
 			continue;
 
 		block_migratetype = get_pageblock_migratetype(page);
-- 
1.7.3.2.146.gca209



* Re: [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE
  2011-04-22  1:34 [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE John Stultz
@ 2011-04-22 16:02 ` Dave Hansen
  2011-04-26  7:34 ` Mel Gorman
  1 sibling, 0 replies; 9+ messages in thread
From: Dave Hansen @ 2011-04-22 16:02 UTC (permalink / raw)
  To: John Stultz
  Cc: linux-kernel, Arve Hjønnevåg, Mel Gorman, Andrew Morton

On Thu, 2011-04-21 at 18:34 -0700, John Stultz wrote:
> This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
> though it only had a few free pages. This in turn caused no contiguous memory
> to be reserved and frequent kswapd wakeups that emptied the caches to get more
> contiguous memory.

This looks sane to me.

Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> 

-- Dave



* Re: [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE
  2011-04-22  1:34 [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE John Stultz
  2011-04-22 16:02 ` Dave Hansen
@ 2011-04-26  7:34 ` Mel Gorman
  2011-04-26  9:49   ` KOSAKI Motohiro
  1 sibling, 1 reply; 9+ messages in thread
From: Mel Gorman @ 2011-04-26  7:34 UTC (permalink / raw)
  To: John Stultz; +Cc: linux-kernel, Arve Hjønnevåg, Dave Hansen, Andrew Morton

On Thu, Apr 21, 2011 at 06:34:03PM -0700, John Stultz wrote:
> From: Arve Hjønnevåg <arve@android.com>
> 
> This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
> though it only had a few free pages. This in turn caused no contiguous memory
> to be reserved and frequent kswapd wakeups that emptied the caches to get more
> contiguous memory.
> 
> CC: Dave Hansen <dave@linux.vnet.ibm.com>
> CC: Mel Gorman <mgorman@suse.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Arve Hjønnevåg <arve@android.com>
> Acked-by: Mel Gorman <mel@csn.ul.ie>
> 
> [This patch was submitted and acked a little over a year ago
> (see: http://lkml.org/lkml/2010/4/6/172 ), but never seemingly
> made it upstream. Resending for comments. -jstultz]
> 
> Signed-off-by: John Stultz <john.stultz@linaro.org>

Whoops, should have spotted it slipped through. FWIW, I'm still happy
with my Ack being stuck onto it.

Thanks.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE
  2011-04-26  7:34 ` Mel Gorman
@ 2011-04-26  9:49   ` KOSAKI Motohiro
  2011-04-26 10:13     ` Mel Gorman
  2011-04-26 17:51     ` John Stultz
  0 siblings, 2 replies; 9+ messages in thread
From: KOSAKI Motohiro @ 2011-04-26  9:49 UTC (permalink / raw)
  To: Mel Gorman
  Cc: kosaki.motohiro, John Stultz, linux-kernel, Arve Hjønnevåg,
	Dave Hansen, Andrew Morton

> On Thu, Apr 21, 2011 at 06:34:03PM -0700, John Stultz wrote:
> > From: Arve Hjønnevåg <arve@android.com>
> > 
> > This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
> > though it only had a few free pages. This in turn caused no contiguous memory
> > to be reserved and frequent kswapd wakeups that emptied the caches to get more
> > contiguous memory.
> > 
> > CC: Dave Hansen <dave@linux.vnet.ibm.com>
> > CC: Mel Gorman <mgorman@suse.de>
> > CC: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Arve Hjønnevåg <arve@android.com>
> > Acked-by: Mel Gorman <mel@csn.ul.ie>
> > 
> > [This patch was submitted and acked a little over a year ago
> > (see: http://lkml.org/lkml/2010/4/6/172 ), but never seemingly
> > made it upstream. Resending for comments. -jstultz]
> > 
> > Signed-off-by: John Stultz <john.stultz@linaro.org>
> 
> Whoops, should have spotted it slipped through. FWIW, I'm still happy
> with my Ack being stuck onto it.

Hehe, No.

You acked a different patch last year, and John picked up the old one. Sigh.
Look, the correct one has pfn_valid_within().
	http://lkml.org/lkml/2010/4/6/172

Also, Minchan suggested adding more explanation to the description. So I think
the following is the desirable one.



Subject: [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE
From: Arve Hjonnevag <arve@android.com>

This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
though it only had a few free pages. E.g., on the current ARM port, the kernel starts
at offset 0x8000 to leave room for boot parameters, and the memory is freed later.

This in turn caused no contiguous memory to be reserved and frequent kswapd
wakeups that emptied the caches to get more contiguous memory.

Unfortunately, ARM needs an order-2 allocation for the pgd (see arm/mm/pgd.c#pgd_alloc()).
Therefore the issue is neither minor nor easily avoidable.

CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Arve Hjonnevag <arve@android.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [added some
explanation]
---
 mm/page_alloc.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1d5c189..10d9fa7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3282,6 +3282,20 @@ static inline unsigned long wait_table_bits(unsigned long size)
 #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
 
 /*
+ * Check if a pageblock contains reserved pages
+ */
+static int pageblock_is_reserved(unsigned long start_pfn)
+{
+	unsigned long end_pfn = start_pfn + pageblock_nr_pages;
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++)
+		if (!pfn_valid_within(pfn) || PageReserved(pfn_to_page(pfn)))
+			return 1;
+	return 0;
+}
+
+/*
  * Mark a number of pageblocks as MIGRATE_RESERVE. The number
  * of blocks reserved is based on min_wmark_pages(zone). The memory within
  * the reserve will tend to store contiguous free pages. Setting min_free_kbytes
@@ -3320,7 +3334,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 			continue;
 
 		/* Blocks with reserved pages will never free, skip them. */
-		if (PageReserved(page))
+		if (pageblock_is_reserved(pfn))
 			continue;
 
 		block_migratetype = get_pageblock_migratetype(page);
-- 
1.7.3.1





* Re: [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE
  2011-04-26  9:49   ` KOSAKI Motohiro
@ 2011-04-26 10:13     ` Mel Gorman
  2011-04-26 17:51     ` John Stultz
  1 sibling, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2011-04-26 10:13 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: John Stultz, linux-kernel, Arve Hjønnevåg, Dave Hansen,
	Andrew Morton

On Tue, Apr 26, 2011 at 06:49:39PM +0900, KOSAKI Motohiro wrote:
> > On Thu, Apr 21, 2011 at 06:34:03PM -0700, John Stultz wrote:
> > > From: Arve Hjønnevåg <arve@android.com>
> > > 
> > > This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
> > > though it only had a few free pages. This in turn caused no contiguous memory
> > > to be reserved and frequent kswapd wakeups that emptied the caches to get more
> > > contiguous memory.
> > > 
> > > CC: Dave Hansen <dave@linux.vnet.ibm.com>
> > > CC: Mel Gorman <mgorman@suse.de>
> > > CC: Andrew Morton <akpm@linux-foundation.org>
> > > Signed-off-by: Arve Hjønnevåg <arve@android.com>
> > > Acked-by: Mel Gorman <mel@csn.ul.ie>
> > > 
> > > [This patch was submitted and acked a little over a year ago
> > > (see: http://lkml.org/lkml/2010/4/6/172 ), but never seemingly
> > > made it upstream. Resending for comments. -jstultz]
> > > 
> > > Signed-off-by: John Stultz <john.stultz@linaro.org>
> > 
> > Whoops, should have spotted it slipped through. FWIW, I'm still happy
> > with my Ack being stuck onto it.
> 
> Hehe, No.
> 
> You acked a different patch last year, and John picked up the old one. Sigh.
> Look, the correct one has pfn_valid_within().
> 	http://lkml.org/lkml/2010/4/6/172
> 

Bah, you're right, thanks for catching that. A pfn_valid_within check is
indeed required, particularly on ARM where there can be holes punched within
pageblock boundaries. Thanks.
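
To make the hazard concrete, here is a minimal userspace sketch, not kernel
code. The arrays pfn_valid_map and page_reserved_map, the bad_lookups counter
and the tiny PAGEBLOCK_NR_PAGES value are hypothetical stand-ins for
pfn_valid_within(), PageReserved(pfn_to_page(pfn)) and pageblock_nr_pages. It
assumes a hole is simply a pfn with no backing struct page.

```c
#include <stdbool.h>

/* Hypothetical tiny sizes for illustration; real pageblocks are far larger. */
#define PAGEBLOCK_NR_PAGES 8UL
#define NR_PFNS 16UL

/* Stand-in for pfn_valid_within(): false marks a hole in the memory map. */
static bool pfn_valid_map[NR_PFNS];
/* Stand-in for PageReserved(pfn_to_page(pfn)); only meaningful for valid pfns. */
static bool page_reserved_map[NR_PFNS];
/* Counts lookups on holes; in the kernel each could be a bad dereference. */
static int bad_lookups;

/* Reset the model: every pfn valid, no page reserved. */
static void model_reset(void)
{
	unsigned long pfn;

	for (pfn = 0; pfn < NR_PFNS; pfn++) {
		pfn_valid_map[pfn] = true;
		page_reserved_map[pfn] = false;
	}
	bad_lookups = 0;
}

static bool model_page_reserved(unsigned long pfn)
{
	if (!pfn_valid_map[pfn])
		bad_lookups++;
	return page_reserved_map[pfn];
}

/* Walker without the validity check, as in the resent (older) patch. */
static int pageblock_is_reserved_unguarded(unsigned long start_pfn)
{
	unsigned long end_pfn = start_pfn + PAGEBLOCK_NR_PAGES;
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn++)
		if (model_page_reserved(pfn))
			return 1;
	return 0;
}

/* Walker with the check: the || short-circuits before the page lookup. */
static int pageblock_is_reserved_guarded(unsigned long start_pfn)
{
	unsigned long end_pfn = start_pfn + PAGEBLOCK_NR_PAGES;
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn++)
		if (!pfn_valid_map[pfn] || model_page_reserved(pfn))
			return 1;
	return 0;
}
```

With a hole at pfn 3 of an otherwise unreserved block, the unguarded walker
performs one lookup on the hole and still reports the block as usable, while
the guarded walker rejects the block without touching the hole.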

> 
> Subject: [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE
> From: Arve Hjonnevag <arve@android.com>
> 
> This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
> though it only had a few free pages. E.g., on the current ARM port, the kernel starts
> at offset 0x8000 to leave room for boot parameters, and the memory is freed later.
> 
> This in turn caused no contiguous memory to be reserved and frequent kswapd
> wakeups that emptied the caches to get more contiguous memory.
> 
> Unfortunately, ARM needs an order-2 allocation for the pgd (see arm/mm/pgd.c#pgd_alloc()).
> Therefore the issue is neither minor nor easily avoidable.
> 
> CC: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Arve Hjonnevag <arve@android.com>
> Acked-by: Mel Gorman <mel@csn.ul.ie>
> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [added some
> explanation]
> ---
>  mm/page_alloc.c |   16 +++++++++++++++-
>  1 files changed, 15 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1d5c189..10d9fa7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3282,6 +3282,20 @@ static inline unsigned long wait_table_bits(unsigned long size)
>  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
>  
>  /*
> + * Check if a pageblock contains reserved pages
> + */
> +static int pageblock_is_reserved(unsigned long start_pfn)
> +{
> +	unsigned long end_pfn = start_pfn + pageblock_nr_pages;
> +	unsigned long pfn;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++)
> +		if (!pfn_valid_within(pfn) || PageReserved(pfn_to_page(pfn)))
> +			return 1;
> +	return 0;
> +}
> +
> +/*
>   * Mark a number of pageblocks as MIGRATE_RESERVE. The number
>   * of blocks reserved is based on min_wmark_pages(zone). The memory within
>   * the reserve will tend to store contiguous free pages. Setting min_free_kbytes
> @@ -3320,7 +3334,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
>  			continue;
>  
>  		/* Blocks with reserved pages will never free, skip them. */
> -		if (PageReserved(page))
> +		if (pageblock_is_reserved(pfn))
>  			continue;
>  
>  		block_migratetype = get_pageblock_migratetype(page);

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE
  2011-04-26  9:49   ` KOSAKI Motohiro
  2011-04-26 10:13     ` Mel Gorman
@ 2011-04-26 17:51     ` John Stultz
  1 sibling, 0 replies; 9+ messages in thread
From: John Stultz @ 2011-04-26 17:51 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Mel Gorman, linux-kernel, Arve Hjønnevåg, Dave Hansen,
	Andrew Morton

On Tue, 2011-04-26 at 18:49 +0900, KOSAKI Motohiro wrote:
> > On Thu, Apr 21, 2011 at 06:34:03PM -0700, John Stultz wrote:
> > > From: Arve Hjønnevåg <arve@android.com>
> > > 
> > > This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
> > > though it only had a few free pages. This in turn caused no contiguous memory
> > > to be reserved and frequent kswapd wakeups that emptied the caches to get more
> > > contiguous memory.
> > > 
> > > CC: Dave Hansen <dave@linux.vnet.ibm.com>
> > > CC: Mel Gorman <mgorman@suse.de>
> > > CC: Andrew Morton <akpm@linux-foundation.org>
> > > Signed-off-by: Arve Hjønnevåg <arve@android.com>
> > > Acked-by: Mel Gorman <mel@csn.ul.ie>
> > > 
> > > [This patch was submitted and acked a little over a year ago
> > > (see: http://lkml.org/lkml/2010/4/6/172 ), but never seemingly
> > > made it upstream. Resending for comments. -jstultz]
> > > 
> > > Signed-off-by: John Stultz <john.stultz@linaro.org>
> > 
> > Whoops, should have spotted it slipped through. FWIW, I'm still happy
> > with my Ack being stuck onto it.
> 
> Hehe, No.
> 
> You acked a different patch last year, and John picked up the old one. Sigh.
> Look, the correct one has pfn_valid_within().
> 	http://lkml.org/lkml/2010/4/6/172

Oh yikes! Many thanks for noticing that detail! Indeed, I started with
the patch in the Android tree, and didn't notice the difference in the
discussion I linked to. My apologies.

> Also, Minchan suggested adding more explanation to the description. So I think
> the following is the desirable one.

Thanks so much again!
-john




* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
@ 2010-04-05 10:14 Mel Gorman
  2010-04-06  3:09 ` [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE Arve Hjønnevåg
  0 siblings, 1 reply; 9+ messages in thread
From: Mel Gorman @ 2010-04-05 10:14 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: KOSAKI Motohiro, TAO HU, linux-mm, linux-kernel,
	Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel

On Fri, Apr 02, 2010 at 05:59:00PM -0700, Arve Hjønnevåg wrote:
> On Fri, Apr 2, 2010 at 2:48 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> > On Fri, Apr 02, 2010 at 02:03:23PM +0900, KOSAKI Motohiro wrote:
> >> Cc to Mel,
> >>
> >> > 2 patches related to page_alloc.c were applied.
> >> > Does anyone see a connection between the 2 patches and the panic?
> >> > NOTE: the full patches are attached.
> >>
> >> I think your two attached patches are completely unrelated to your problem.
> >>
> >
> > Agreed. It's unlikely that there is a race as such in the page
> > allocator. In buffered_rmqueue that you initially talk about, the lists
> > being manipulated are per-cpu lists. About the only way to corrupt them
> > is if you had an NMI handler that called the page allocator. I really hope
> > your platform is not doing anything like that.
> >
> > A double free of page->lru is a possibility. You could try reproducing
> > the problem with CONFIG_DEBUG_LIST enabled to see if anything falls out.
> >
> >> "mm: Add min_free_order_shift tunable." seems to make zero sense. I don't think this patch
> >> needs to be merged.
> >>
> >
> > It makes a marginal amount of sense. Basically what it does is allow
> > high-order allocations to go much further below their watermarks than is
> > currently allowed. If the platform in question is doing a lot of high-order
> > allocations, this patch could be seen to "fix" the problem but you wouldn't
> > touch mainline with it with a barge pole. It would be more stable to fix
> > the drivers to not use high order allocations or use a mempool.
> >
> 
> The high order allocation that caused problems was the first level
> page table for each process.

Out of curiosity, how big is that allocation? Is it specific to
android? If it is, I guess it can be let slide but if it's common, it
would be worth thinking of an arch-hook that tells the VM that a
particular high-order is very common. For example, one possibility would
be to ask kswapd to always reclaim at a given order even if the
watermarks required are for a lower order.

> Each time a new process started the
> kernel would empty the entire page cache to create contiguous free
> memory.

I ask because I'm surprised the entire page cache got chucked out.

> With the reserved pageblock mostly full (fixed by the second
> patch) this contiguous memory would then almost immediately get used
> for low order allocations, so the same problem starts again when the
> next process starts.

This is a little outside what I expected the reserved pageblock was
intended for. I expected it to be used for high-order short-lived
allocations such as required by some wireless drivers. Pagetables are a
bit more common.

> I agree this patch does not fix the problem, but
> it does improve things when the problem hits. I have not seen a device
> in this situation with the second patch applied, but I did not remove
> the first patch in case the reserved pageblock fills up.
> 
> > It is inconceivable this patch is related to the problem though.
> >
> >> but "mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE"
> >> treats strange hardware correctly, I think. If Mel acks this, I hope it gets merged.
> >> Mel, can we hear your opinion?
> >>
> >
> > This patch is interesting and I am surprised it is required. Is it really the
> > case that page blocks near the start of a zone are dominated with PageReserved
> > pages but the first one happens to be free? I guess it's conceivable on ARM
> > where memmap can be freed at boot time.
> 
> I think this happens by default on arm. The kernel starts at offset
> 0x8000 to leave room for boot parameters, and in recent kernel
> versions (>~2.6.26-29) this memory is freed.
> 

Ok, that's fine.

> >
> > There is a theoretical problem with the patch but it is easily resolved.
> > A PFN walker like this must call pfn_valid_within() before calling
> > pfn_to_page(). If they do not, it's possible to get complete garbage
> > for the page and result in a bad dereference. In this particular case,
> > it would be a kernel oops rather than memory corruption though.
> >
> > If that was fixed, I'd see no problem with Acking the patch.
> >
> 
> I can fix this if you want the patch in mainline. I was not sure it
> was acceptable since it will slow down boot on all systems, even where it
> is not needed.
> 

It will not be noticeable. Only a few pageblocks are scanned per zone
and the full zone gets walked for a variety of reasons during boot
anyway. If it ever became absolutely necessary, the lowest suitable
pageblock could be identified when the bootmem allocator is being torn
down as the necessary information becomes available then.

> > It is also inconceivable this patch is related to the problem.
> >
> >> >
> >> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> > index a596bfd..34a29e2 100644
> >> > --- a/mm/page_alloc.c
> >> > +++ b/mm/page_alloc.c
> >> > @@ -2551,6 +2551,20 @@ static inline unsigned long wait_table_bits(unsigned long size)
> >> >  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
> >> >
> >> >  /*
> >> > + * Check if a pageblock contains reserved pages
> >> > + */
> >> > +static int pageblock_is_reserved(unsigned long start_pfn)
> >> > +{
> >> > +   unsigned long end_pfn = start_pfn + pageblock_nr_pages;
> >> > +   unsigned long pfn;
> >> > +
> >> > +   for (pfn = start_pfn; pfn < end_pfn; pfn++)
> >> > +           if (PageReserved(pfn_to_page(pfn)))
> >> > +                   return 1;
> >> > +   return 0;
> >> > +}
> >> > +
> >> > +/*
> >> >   * Mark a number of pageblocks as MIGRATE_RESERVE. The number
> >> >   * of blocks reserved is based on zone->pages_min. The memory within the
> >> >   * reserve will tend to store contiguous free pages. Setting min_free_kbytes
> >> > @@ -2579,7 +2593,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
> >> >                     continue;
> >> >
> >> >             /* Blocks with reserved pages will never free, skip them. */
> >> > -           if (PageReserved(page))
> >> > +           if (pageblock_is_reserved(pfn))
> >> >                     continue;
> >> >
> >> >             block_migratetype = get_pageblock_migratetype(page);
> >> > --
> >> > 1.5.4.3
> >> >
> >> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> > index 5c44ed4..a596bfd 100644
> >> > --- a/mm/page_alloc.c
> >> > +++ b/mm/page_alloc.c
> >> > @@ -119,6 +119,7 @@ static char * const zone_names[MAX_NR_ZONES] = {
> >> >  };
> >> >
> >> >  int min_free_kbytes = 1024;
> >> > +int min_free_order_shift = 1;
> >> >
> >> >  unsigned long __meminitdata nr_kernel_pages;
> >> >  unsigned long __meminitdata nr_all_pages;
> >> > @@ -1256,7 +1257,7 @@ int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
> >> >             free_pages -= z->free_area[o].nr_free << o;
> >> >
> >> >             /* Require fewer higher order pages to be free */
> >> > -           min >>= 1;
> >> > +           min >>= min_free_order_shift;
> >> >
> >> >             if (free_pages <= min)
> >> >                     return 0;
> >> > --
> >> >
> >> >
> >> > On Thu, Apr 1, 2010 at 12:05 PM, TAO HU <tghk48@motorola.com> wrote:
> >> > > Hi, all
> >> > >
> >> > > We got a panic on our ARM (OMAP) based HW.
> >> > > Our code is based on 2.6.29 kernel (last commit for mm/page_alloc.c is
> >> > > cc2559bccc72767cb446f79b071d96c30c26439b)
> >> > >
> >> > > It appears to crash while going through pcp->list in
> >> > > buffered_rmqueue() of mm/page_alloc.c after checking vmlinux.
> >> > > "00100100" implies LIST_POISON1 that suggests a race condition between
> >> > > list_add() and list_del() in my personal view.
> >> > > However, we have not yet found a locking problem regarding page.lru.
> >> > >
> >> > > Any known issues about race condition in mm/page_alloc.c?
> >> > > And other hints are highly appreciated.
> >> > >
> >> > >  /* Find a page of the appropriate migrate type */
> >> > >                if (cold) {
> >> > >                   ... ...
> >> > >                } else {
> >> > >                        list_for_each_entry(page, &pcp->list, lru)
> >> > >                                if (page_private(page) == migratetype)
> >> > >                                        break;
> >> > >                }
> >> > >
> >> > > <1>[120898.805267] Unable to handle kernel paging request at virtual address 00100100
> >> > > <1>[120898.805633] pgd = c1560000
> >> > > <1>[120898.805786] [00100100] *pgd=897b3031, *pte=00000000, *ppte=00000000
> >> > > <4>[120898.806457] Internal error: Oops: 17 [#1] PREEMPT
> >> > > ... ...
> >> > > <4>[120898.807861] CPU: 0    Not tainted  (2.6.29-omap1 #1)
> >> > > <4>[120898.808044] PC is at get_page_from_freelist+0x1d0/0x4b0
> >> > > <4>[120898.808227] LR is at get_page_from_freelist+0xc8/0x4b0
> >> > > <4>[120898.808563] pc : [<c00a600c>]    lr : [<c00a5f04>]    psr: 800000d3
> >> > > <4>[120898.808563] sp : c49fbd18  ip : 00000000  fp : c49fbd74
> >> > > <4>[120898.809020] r10: 00000000  r9 : 001000e8  r8 : 00000002
> >> > > <4>[120898.809204] r7 : 001200d2  r6 : 60000053  r5 : c0507c4c  r4 : c49fa000
> >> > > <4>[120898.809509] r3 : 001000e8  r2 : 00100100  r1 : c0507c6c  r0 : 00000001
> >> > > <4>[120898.809844] Flags: Nzcv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment kernel
> >> > > <4>[120898.810028] Control: 10c5387d  Table: 82160019  DAC: 00000017
> >> > > <4>[120898.948425] Backtrace:
> >> > > <4>[120898.948760] [<c00a5e3c>] (get_page_from_freelist+0x0/0x4b0) from [<c00a6398>] (__alloc_pages_internal+0xac/0x3e8)
> >> > > <4>[120898.949554] [<c00a62ec>] (__alloc_pages_internal+0x0/0x3e8) from [<c00b461c>] (handle_mm_fault+0x16c/0xbac)
> >> > > <4>[120898.950347] [<c00b44b0>] (handle_mm_fault+0x0/0xbac) from [<c00b51d0>] (__get_user_pages+0x174/0x2b4)
> >> > > <4>[120898.951019] [<c00b505c>] (__get_user_pages+0x0/0x2b4) from [<c00b534c>] (get_user_pages+0x3c/0x44)
> >> > > <4>[120898.951812] [<c00b5310>] (get_user_pages+0x0/0x44) from [<c00caf9c>] (get_arg_page+0x50/0xa4)
> >> > > <4>[120898.952636] [<c00caf4c>] (get_arg_page+0x0/0xa4) from [<c00cb1ec>] (copy_strings+0x108/0x210)
> >> > > <4>[120898.953430]  r7:beffffe4 r6:00000ffc r5:00000000 r4:00000018
> >> > > <4>[120898.954223] [<c00cb0e4>] (copy_strings+0x0/0x210) from [<c00cb330>] (copy_strings_kernel+0x3c/0x74)
> >> > > <4>[120898.955047] [<c00cb2f4>] (copy_strings_kernel+0x0/0x74) from [<c00cc778>] (do_execve+0x18c/0x2b0)
> >> > > <4>[120898.955841]  r5:0001e240 r4:0001e224
> >> > > <4>[120898.956329] [<c00cc5ec>] (do_execve+0x0/0x2b0) from [<c00400e4>] (sys_execve+0x3c/0x5c)
> >> > > <4>[120898.957153] [<c00400a8>] (sys_execve+0x0/0x5c) from [<c003ce80>] (ret_fast_syscall+0x0/0x2c)
> >> > > <4>[120898.957946]  r7:0000000b r6:0001e270 r5:00000000 r4:0001d580
> >> > > <4>[120898.958740] Code: e1530008 0a000006 e2429018 e1a03009 (e5b32018)
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Best Regards
> >> > > Hu Tao
> >> > >
> >>
> >>
> >>
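
For readers following the quoted min_free_order_shift hunk, the effect of the
shift on the per-order watermark loop can be sketched in userspace as below.
This is a simplified model, not the kernel function: struct model_zone,
MODEL_MAX_ORDER and the helper names are hypothetical, and the check before
the loop is a simplifying assumption that ignores allocation flags and lowmem
reserves. Only the three statements inside the loop follow the quoted diff.

```c
#include <stdbool.h>

#define MODEL_MAX_ORDER 11	/* hypothetical number of free-list orders */

/* Hypothetical stand-in for the per-order free lists of a zone. */
struct model_zone {
	unsigned long nr_free[MODEL_MAX_ORDER];	/* free blocks at each order */
};

/* Total free pages: each block at order o contributes 2^o pages. */
static long model_total_free_pages(const struct model_zone *z)
{
	long total = 0;
	int o;

	for (o = 0; o < MODEL_MAX_ORDER; o++)
		total += (long)(z->nr_free[o] << o);
	return total;
}

/*
 * Model of the watermark check for an allocation of the given order.
 * min_free_order_shift == 1 reproduces the original "min >>= 1".
 */
static bool model_watermark_ok(const struct model_zone *z, int order,
			       long mark, int min_free_order_shift)
{
	long min = mark;
	long free_pages = model_total_free_pages(z) - (1L << order) + 1;
	int o;

	if (free_pages <= min)
		return false;

	for (o = 0; o < order; o++) {
		/* At the next order, this order's pages become unavailable */
		free_pages -= (long)(z->nr_free[o] << o);

		/* Require fewer higher order pages to be free */
		min >>= min_free_order_shift;

		if (free_pages <= min)
			return false;
	}
	return true;
}
```

With 100 order-0, 20 order-1 and 2 order-2 blocks free and a watermark of 32
pages, an order-2 request fails this check with a shift of 1 but passes with a
shift of 4, which is the "go much further below their watermarks" behaviour
described in the quoted text.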

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE
  2010-04-05 10:14 [Question] race condition in mm/page_alloc.c regarding page->lru? Mel Gorman
@ 2010-04-06  3:09 ` Arve Hjønnevåg
  2010-04-06  4:15   ` Minchan Kim
  2010-04-06 15:11   ` Mel Gorman
  0 siblings, 2 replies; 9+ messages in thread
From: Arve Hjønnevåg @ 2010-04-06  3:09 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, TAO HU, linux-mm, linux-kernel,
	Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel,
	Arve Hjønnevåg

This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
though it only had a few free pages. This in turn caused no contiguous memory
to be reserved and frequent kswapd wakeups that emptied the caches to get more
contiguous memory.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
---
 mm/page_alloc.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fb7df1d..46ade16 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2860,6 +2860,20 @@ static inline unsigned long wait_table_bits(unsigned long size)
 #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
 
 /*
+ * Check if a pageblock contains reserved pages
+ */
+static int pageblock_is_reserved(unsigned long start_pfn)
+{
+	unsigned long end_pfn = start_pfn + pageblock_nr_pages;
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++)
+		if (!pfn_valid_within(pfn) || PageReserved(pfn_to_page(pfn)))
+			return 1;
+	return 0;
+}
+
+/*
  * Mark a number of pageblocks as MIGRATE_RESERVE. The number
  * of blocks reserved is based on min_wmark_pages(zone). The memory within
  * the reserve will tend to store contiguous free pages. Setting min_free_kbytes
@@ -2898,7 +2912,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 			continue;
 
 		/* Blocks with reserved pages will never free, skip them. */
-		if (PageReserved(page))
+		if (pageblock_is_reserved(pfn))
 			continue;
 
 		block_migratetype = get_pageblock_migratetype(page);
-- 
1.6.5.1



* Re: [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE
  2010-04-06  3:09 ` [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE Arve Hjønnevåg
@ 2010-04-06  4:15   ` Minchan Kim
  2010-04-06 15:11   ` Mel Gorman
  1 sibling, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2010-04-06  4:15 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Mel Gorman, KOSAKI Motohiro, TAO HU, linux-mm, linux-kernel,
	Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel

On Tue, Apr 6, 2010 at 12:09 PM, Arve Hjønnevåg <arve@android.com> wrote:
> This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
> though it only had a few free pages. This in turn caused no contiguous memory
> to be reserved and frequent kswapd wakeups that emptied the caches to get more
> contiguous memory.

It would be better to add the following description of yours from the previous mail thread.
It can help others understand it in the future.

On Fri, Apr 02, 2010 at 05:59:00PM -0700, Arve Hjønnevåg wrote:
...
"I think this happens by default on arm. The kernel starts at offset
0x8000 to leave room for boot parameters, and in recent kernel
versions (>~2.6.26-29) this memory is freed."


-- 
Kind regards,
Minchan Kim


* Re: [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE
  2010-04-06  3:09 ` [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE Arve Hjønnevåg
  2010-04-06  4:15   ` Minchan Kim
@ 2010-04-06 15:11   ` Mel Gorman
  1 sibling, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2010-04-06 15:11 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: KOSAKI Motohiro, TAO HU, linux-mm, linux-kernel,
	Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel

On Mon, Apr 05, 2010 at 08:09:16PM -0700, Arve Hjønnevåg wrote:
> This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
> though it only had a few free pages. This in turn caused no contiguous memory
> to be reserved and frequent kswapd wakeups that emptied the caches to get more
> contiguous memory.
> 
> Signed-off-by: Arve Hjønnevåg <arve@android.com>

I would have used pageblock_reserve_suitable because what you are really
checking is "is this page block suitable for use by MIGRATE_RESERVE?".
The definition was "is the first page PageReserved" and you are changing it to
"does the page block have any memory holes or PageReserved pages?"

No biggie though. Change it if you like before upstreaming. Either way.
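
As a rough illustration of that change in definition, the following userspace
sketch models the first pageblock on ARM as described earlier in the thread:
the first 0x8000 bytes are ordinary memory and the kernel image after them is
reserved. Everything here is hypothetical and for illustration only: the
page_state enum, the 1024-page block size, the 4 KiB page size and the size of
the kernel image are assumptions. The name pageblock_reserve_suitable is the
one suggested above, with the return sense inverted relative to
pageblock_is_reserved().

```c
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096UL			/* assumed page size */
#define MODEL_PAGEBLOCK_NR_PAGES 1024UL		/* hypothetical pageblock size */

enum page_state { PAGE_NORMAL, PAGE_RESERVED, PAGE_HOLE };

static enum page_state block[MODEL_PAGEBLOCK_NR_PAGES];

/*
 * Lay out the block as described for ARM: the pages below offset 0x8000
 * are ordinary memory, followed by kernel_pages reserved pages.
 */
static void model_arm_first_block(unsigned long kernel_pages)
{
	unsigned long first_kernel_page = 0x8000UL / MODEL_PAGE_SIZE;
	unsigned long i;

	for (i = 0; i < MODEL_PAGEBLOCK_NR_PAGES; i++)
		block[i] = PAGE_NORMAL;
	for (i = first_kernel_page;
	     i < first_kernel_page + kernel_pages &&
	     i < MODEL_PAGEBLOCK_NR_PAGES; i++)
		block[i] = PAGE_RESERVED;
}

/* Old definition: "is the first page PageReserved?" */
static bool old_check_skips_block(void)
{
	return block[0] == PAGE_RESERVED;
}

/* New definition: suitable only with no holes and no reserved pages. */
static bool pageblock_reserve_suitable(void)
{
	unsigned long i;

	for (i = 0; i < MODEL_PAGEBLOCK_NR_PAGES; i++)
		if (block[i] != PAGE_NORMAL)
			return false;
	return true;
}
```

With a 512-page kernel image in the block, the old check does not skip the
block because page 0 is not reserved, while the new check rejects it.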

Acked-by: Mel Gorman <mel@csn.ul.ie>


Thanks

> ---
>  mm/page_alloc.c |   16 +++++++++++++++-
>  1 files changed, 15 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fb7df1d..46ade16 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2860,6 +2860,20 @@ static inline unsigned long wait_table_bits(unsigned long size)
>  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
>  
>  /*
> + * Check if a pageblock contains reserved pages
> + */
> +static int pageblock_is_reserved(unsigned long start_pfn)
> +{
> +	unsigned long end_pfn = start_pfn + pageblock_nr_pages;
> +	unsigned long pfn;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++)
> +		if (!pfn_valid_within(pfn) || PageReserved(pfn_to_page(pfn)))
> +			return 1;
> +	return 0;
> +}
> +
> +/*
>   * Mark a number of pageblocks as MIGRATE_RESERVE. The number
>   * of blocks reserved is based on min_wmark_pages(zone). The memory within
>   * the reserve will tend to store contiguous free pages. Setting min_free_kbytes
> @@ -2898,7 +2912,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
>  			continue;
>  
>  		/* Blocks with reserved pages will never free, skip them. */
> -		if (PageReserved(page))
> +		if (pageblock_is_reserved(pfn))
>  			continue;
>  
>  		block_migratetype = get_pageblock_migratetype(page);
> -- 
> 1.6.5.1
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

