linux-mm.kvack.org archive mirror
* Kernel panic due to page migration accessing memory holes
@ 2010-02-18  0:45 Michael Bohan
  2010-02-18  1:03 ` KAMEZAWA Hiroyuki
  2010-02-18  8:53 ` Russell King - ARM Linux
  0 siblings, 2 replies; 13+ messages in thread
From: Michael Bohan @ 2010-02-18  0:45 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, linux-arm-msm, linux-arm-kernel

Hi,

I have encountered a kernel panic on the ARM/msm platform in the mm 
migration code on 2.6.29.  My memory configuration has two discontiguous 
banks per our ATAG definition.  These banks end up on addresses that 
are 1 MB aligned.  I am using FLATMEM (not SPARSEMEM), but my 
understanding is that SPARSEMEM should not be necessary to support this 
configuration.  Please correct me if I'm wrong.

The crash occurs in mm/page_alloc.c:move_freepages() when being passed a 
start_page that corresponds to the last several megabytes of our first 
memory bank.  The code in move_freepages_block() aligns the passed in 
page number to pageblock_nr_pages, which corresponds to 4 MB.  It then 
passes that aligned pfn as the beginning of a 4 MB range to 
move_freepages().  The problem is that since our bank's end address is 
not 4 MB aligned, the range passed to move_freepages() exceeds the end 
of our memory bank.  The code later blows up when trying to access 
uninitialized page structures.
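To make the overrun concrete, here is a minimal user-space sketch of the same alignment arithmetic (the constants and pfn values below are illustrative, chosen to mirror the configuration described here, not actual kernel code):

```c
/* 4 KB pages and pageblock_order 10 give 1024-page (4 MB) pageblocks,
 * matching the MAX_ORDER = 11 buddy alignment discussed in this thread. */
#define PAGEBLOCK_NR_PAGES 1024UL

/* Round a pfn down to the start of its pageblock, as
 * move_freepages_block() does before calling move_freepages(). */
unsigned long pageblock_start(unsigned long pfn)
{
        return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/* Last pfn of the pageblock containing pfn. */
unsigned long pageblock_end(unsigned long pfn)
{
        return pageblock_start(pfn) + PAGEBLOCK_NR_PAGES - 1;
}
```

With bank0 covering pfns 0x200-0x7CFF, a page at (say) pfn 0x7C50 lands in the pageblock [0x7C00, 0x7FFF]: the tail of that block lies past the end of the bank but still inside the zone span, so it reaches the uninitialized memmap.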

As a temporary fix, I added some code to move_freepages_block() that 
inspects whether the range exceeds our first memory bank -- returning 0 
if it does.  This is not a clean solution, since it requires exporting 
the ARM specific meminfo structure to extract the bank information.

I see an option exists called CONFIG_HOLES_IN_ZONE, which has control 
over the definition of pfn_valid_within() used in move_freepages().  
This option seems relevant to the problem.  The ia64 architecture has a 
special version of pfn_valid() called ia64_pfn_valid() that is used in 
conjunction with this option.  It appears to inspect the page 
structure's state in a safe way that does not cause a crash, and can 
presumably be used to determine whether the page structure is 
initialized properly.  The ARM version of pfn_valid() used in the 
FLATMEM scenario does not appear to be memory hole aware, and will 
blindly return true in this case.
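A hole-aware pfn_valid() of the kind described would walk the bank table instead of comparing against a single limit. The sketch below is a user-space model: the struct and the bank values are simplified stand-ins loosely modeled on the ARM meminfo idea, not the kernel's actual definitions.

```c
#include <stdbool.h>

/* Simplified stand-in for the ARM meminfo bank table; the field and
 * type names here are illustrative, not the kernel's. */
struct bank {
        unsigned long start_pfn;
        unsigned long nr_pages;
};

static const struct bank banks[] = {
        { 0x00200, 0x07B00 },   /* bank0: base 0x00200000, size 0x07B00000 */
        { 0x40000, 0x04300 },   /* bank1: base 0x40000000, size 0x04300000 */
};

/* Hole-aware pfn_valid(): true only if the pfn falls inside some bank,
 * unlike the FLATMEM version, which only compares against max_mapnr. */
bool hole_aware_pfn_valid(unsigned long pfn)
{
        unsigned i;

        for (i = 0; i < sizeof(banks) / sizeof(banks[0]); i++)
                if (pfn >= banks[i].start_pfn &&
                    pfn < banks[i].start_pfn + banks[i].nr_pages)
                        return true;
        return false;
}
```

For this layout, pfns in [0x7D00, 0x40000) are the hole and would be rejected, which is exactly what the generic FLATMEM pfn_valid() cannot do.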

I have looked on linux-next, and at least the functions mentioned above 
have not changed.

I was curious if there is a stated requirement where memory banks must 
end on 4 MB aligned addresses.  Although I found this problem on ARM, it 
appears upon inspection that the problem could occur on other 
architectures as well, given the memory map assumptions stated above.  
I'm hoping that some mm experts might understand the problem in greater 
detail.

Thanks,
Michael

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Kernel panic due to page migration accessing memory holes
  2010-02-18  0:45 Kernel panic due to page migration accessing memory holes Michael Bohan
@ 2010-02-18  1:03 ` KAMEZAWA Hiroyuki
  2010-02-18  8:22   ` Michael Bohan
  2010-02-18  8:53 ` Russell King - ARM Linux
  1 sibling, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-18  1:03 UTC (permalink / raw)
  To: Michael Bohan; +Cc: linux-mm, linux-kernel, linux-arm-msm, linux-arm-kernel

On Wed, 17 Feb 2010 16:45:54 -0800
Michael Bohan <mbohan@codeaurora.org> wrote:

> Hi,
> 
> I have encountered a kernel panic on the ARM/msm platform in the mm 
> migration code on 2.6.29.  My memory configuration has two discontiguous 
> banks per our ATAG definition.   These banks end up on addresses that 
> are 1 MB aligned.  I am using FLATMEM (not SPARSEMEM), but my 
> understanding is that SPARSEMEM should not be necessary to support this 
> configuration.  Please correct me if I'm wrong.
> 
> The crash occurs in mm/page_alloc.c:move_freepages() when being passed a 
> start_page that corresponds to the last several megabytes of our first 
> memory bank.  The code in move_freepages_block() aligns the passed in 
> page number to pageblock_nr_pages, which corresponds to 4 MB.  It then 
> passes that aligned pfn as the beginning of a 4 MB range to 
> move_freepages().  The problem is that since our bank's end address is 
> not 4 MB aligned, the range passed to move_freepages() exceeds the end 
> of our memory bank.  The code later blows up when trying to access 
> uninitialized page structures.
> 
That should be aligned, I think.

> As a temporary fix, I added some code to move_freepages_block() that 
> inspects whether the range exceeds our first memory bank -- returning 0 
> if it does.  This is not a clean solution, since it requires exporting 
> the ARM specific meminfo structure to extract the bank information.
> 
Hmm, my first impression is...

- With FLATMEM, the memmap is created for the exact number of pages, so the
  memmap need not have an aligned size.
- With SPARSEMEM, the memmap is created for an aligned number of pages.

Then, the range [zone->start_pfn ... zone->start_pfn + zone->spanned_pages]
should always be checked.


 803 static int move_freepages_block(struct zone *zone, struct page *page,
 804                                 int migratetype)
 805 {
 816         if (start_pfn < zone->zone_start_pfn)
 817                 start_page = page;
 818         if (end_pfn >= zone->zone_start_pfn + zone->spanned_pages)
 819                 return 0;
 820 
 821         return move_freepages(zone, start_page, end_page, migratetype);
 822 }

"(end_pfn >= zone->zone_start_pfn + zone->spanned_pages)" is checked. 
What is zone->spanned_pages set to? The zone's range is
[zone->start_pfn ... zone->start_pfn+zone->spanned_pages], so this
area should have an initialized memmap. I wonder if zone->spanned_pages is too big.

Could you check? (/proc/zoneinfo may show it.)
A dump of /proc/zoneinfo or dmesg would be helpful.

Thanks,
-Kame


* Re: Kernel panic due to page migration accessing memory holes
  2010-02-18  1:03 ` KAMEZAWA Hiroyuki
@ 2010-02-18  8:22   ` Michael Bohan
  2010-02-18  9:36     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Bohan @ 2010-02-18  8:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, linux-arm-msm, linux-arm-kernel

On 2/17/2010 5:03 PM, KAMEZAWA Hiroyuki wrote:
> On Wed, 17 Feb 2010 16:45:54 -0800
> Michael Bohan<mbohan@codeaurora.org>  wrote:
>> As a temporary fix, I added some code to move_freepages_block() that
>> inspects whether the range exceeds our first memory bank -- returning 0
>> if it does.  This is not a clean solution, since it requires exporting
>> the ARM specific meminfo structure to extract the bank information.
>>
>>      
> Hmm, my first impression is...
>
> - Using FLATMEM, memmap is created for the number of pages and memmap should
>    not have aligned size.
> - Using SPARSEMEM, memmap is created for aligned number of pages.
>
> Then, the range [zone->start_pfn ... zone->start_pfn + zone->spanned_pages]
> should be checked always.
>
>
>   803 static int move_freepages_block(struct zone *zone, struct page *page,
>   804                                 int migratetype)
>   805 {
>   816         if (start_pfn<  zone->zone_start_pfn)
>   817                 start_page = page;
>   818         if (end_pfn>= zone->zone_start_pfn + zone->spanned_pages)
>   819                 return 0;
>   820
>   821         return move_freepages(zone, start_page, end_page, migratetype);
>   822 }
>
> "(end_pfn>= zone->zone_start_pfn + zone->spanned_pages)" is checked.
> What zone->spanned_pages is set ? The zone's range is
> [zone->start_pfn ... zone->start_pfn+zone->spanned_pages], so this
> area should have initialized memmap. I wonder zone->spanned_pages is too big.
>    

In the block of code above running on my target, zone_start_pfn is 
0x200 and spanned_pages is 0x44100.  This is consistent with the 
values shown in the zoneinfo file below.  It is also consistent with 
my memory map:

bank0:
     start: 0x00200000
     size:  0x07B00000

bank1:
     start: 0x40000000
     size:  0x04300000

Thus, spanned_pages here is the highest address reached minus the start 
address of the lowest bank (i.e. 0x40000000 + 0x04300000 - 0x00200000, in pages).

Both of these banks exist in the same zone.  This means that the check 
in move_freepages_block() will never be satisfied for cases that overlap 
with the prohibited pfns, since the zone spans invalid pfns.  Should 
each bank be associated with its own zone?
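That arithmetic can be sketched as a quick check (a user-space model; the function and its parameters are illustrative, with values taken from the map above):

```c
#define PAGE_SHIFT 12   /* 4 KB pages */

/* Zone span with holes included: the highest pfn reached minus the
 * start pfn of the lowest bank. */
unsigned long zone_spanned_pages(unsigned long bank0_base,
                                 unsigned long bank1_base,
                                 unsigned long bank1_size)
{
        unsigned long end_pfn = (bank1_base + bank1_size) >> PAGE_SHIFT;
        unsigned long start_pfn = bank0_base >> PAGE_SHIFT;

        return end_pfn - start_pfn;
}
```

For the two banks above this yields 0x44100 pages (278784 decimal), matching both zone->spanned_pages and the `spanned` line in the /proc/zoneinfo dump below.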

> Could you check ? (maybe /proc/zoneinfo can show it.)
> Dump of /proc/zoneinfo or dmesg will be helpful.
>    

Here is what I believe to be the relevant pieces from the kernel log:

<7>[    0.000000] On node 0 totalpages: 48640
<7>[    0.000000] free_area_init_node: node 0, pgdat 804875bc, 
node_mem_map 805af000
<7>[    0.000000]   Normal zone: 2178 pages used for memmap
<7>[    0.000000]   Normal zone: 0 pages reserved
<7>[    0.000000]   Normal zone: 46462 pages, LIFO batch:15
<4>[    0.000000] Built 1 zonelists in Zone order, mobility grouping 
on.  Total pages: 46462

# cat /proc/zoneinfo
Node 0, zone   Normal
   pages free     678
         min      431
         low      538
         high     646
         scanned  0 (aa: 0 ia: 0 af: 0 if: 0)
         spanned  278784
         present  46462
         mem_notify_status 0
     nr_free_pages 678
     nr_inactive_anon 8494
     nr_active_anon 8474
     nr_inactive_file 3234
     nr_active_file 2653
     nr_unevictable 71
     nr_mlock     0
     nr_anon_pages 12488
     nr_mapped    7237
     nr_file_pages 10446
     nr_dirty     0
     nr_writeback 0
     nr_slab_reclaimable 293
     nr_slab_unreclaimable 942
     nr_page_table_pages 1365
     nr_unstable  0
     nr_bounce    0
     nr_vmscan_write 0
     nr_writeback_temp 0
         protection: (0, 0)
   pagesets
     cpu: 0
               count: 42
               high:  90
               batch: 15
   all_unreclaimable: 0
   prev_priority:     12
   start_pfn:         512
   inactive_ratio:    1

Thanks,
Michael


* Re: Kernel panic due to page migration accessing memory holes
  2010-02-18  0:45 Kernel panic due to page migration accessing memory holes Michael Bohan
  2010-02-18  1:03 ` KAMEZAWA Hiroyuki
@ 2010-02-18  8:53 ` Russell King - ARM Linux
  1 sibling, 0 replies; 13+ messages in thread
From: Russell King - ARM Linux @ 2010-02-18  8:53 UTC (permalink / raw)
  To: Michael Bohan; +Cc: linux-mm, linux-arm-msm, linux-kernel, linux-arm-kernel

On Wed, Feb 17, 2010 at 04:45:54PM -0800, Michael Bohan wrote:
> I have encountered a kernel panic on the ARM/msm platform in the mm  
> migration code on 2.6.29.  My memory configuration has two discontiguous  
> banks per our ATAG definition.   These banks end up on addresses that  
> are 1 MB aligned.  I am using FLATMEM (not SPARSEMEM), but my  
> understanding is that SPARSEMEM should not be necessary to support this  
> configuration.  Please correct me if I'm wrong.

Make sure you have ARCH_HAS_HOLES_MEMORYMODEL enabled.


* Re: Kernel panic due to page migration accessing memory holes
  2010-02-18  8:22   ` Michael Bohan
@ 2010-02-18  9:36     ` KAMEZAWA Hiroyuki
  2010-02-18 10:04       ` Mel Gorman
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-18  9:36 UTC (permalink / raw)
  To: Michael Bohan
  Cc: linux-mm, linux-kernel, linux-arm-msm, linux-arm-kernel, mel

On Thu, 18 Feb 2010 00:22:24 -0800
Michael Bohan <mbohan@codeaurora.org> wrote:

> On 2/17/2010 5:03 PM, KAMEZAWA Hiroyuki wrote:
> > On Wed, 17 Feb 2010 16:45:54 -0800
> > Michael Bohan<mbohan@codeaurora.org>  wrote:
> >> As a temporary fix, I added some code to move_freepages_block() that
> >> inspects whether the range exceeds our first memory bank -- returning 0
> >> if it does.  This is not a clean solution, since it requires exporting
> >> the ARM specific meminfo structure to extract the bank information.
> >>
> >>      
> > Hmm, my first impression is...
> >
> > - Using FLATMEM, memmap is created for the number of pages and memmap should
> >    not have aligned size.
> > - Using SPARSEMEM, memmap is created for aligned number of pages.
> >
> > Then, the range [zone->start_pfn ... zone->start_pfn + zone->spanned_pages]
> > should be checked always.
> >
> >
> >   803 static int move_freepages_block(struct zone *zone, struct page *page,
> >   804                                 int migratetype)
> >   805 {
> >   816         if (start_pfn<  zone->zone_start_pfn)
> >   817                 start_page = page;
> >   818         if (end_pfn>= zone->zone_start_pfn + zone->spanned_pages)
> >   819                 return 0;
> >   820
> >   821         return move_freepages(zone, start_page, end_page, migratetype);
> >   822 }
> >
> > "(end_pfn>= zone->zone_start_pfn + zone->spanned_pages)" is checked.
> > What zone->spanned_pages is set ? The zone's range is
> > [zone->start_pfn ... zone->start_pfn+zone->spanned_pages], so this
> > area should have initialized memmap. I wonder zone->spanned_pages is too big.
> >    
> 
> In the block of code above running on my target, the zone_start_pfn is 
> is 0x200 and the spanned_pages is 0x44100.  This is consistent with the 
> values shown from the zoneinfo file below.  It is also consistent with 
> my memory map:
> 
> bank0:
>      start: 0x00200000
>      size:  0x07B00000
> 
> bank1:
>      start: 0x40000000
>      size:  0x04300000
> 
> Thus, spanned_pages here is the highest address reached minus the start 
> address of the lowest bank (eg. 0x40000000 + 0x04300000 - 0x00200000).
> 
> Both of these banks exist in the same zone.  This means that the check 
> in move_freepages_block() will never be satisfied for cases that overlap 
> with the prohibited pfns, since the zone spans invalid pfns.  Should 
> each bank be associated with its own zone?
> 

Hmm, okay then... (CCing Mel.)

 [Fact]
 - There are 2 banks of memory and a memory hole on your machine:
         0x00200000 - 0x07D00000
         0x40000000 - 0x44300000

 - Both banks are in the same zone.
 - You use FLATMEM.
 - You see a panic in move_freepages().
 - Your host's MAX_ORDER=11, so the buddy allocator's alignment is 0x400000.
   Then, it seems the 1st bank is not aligned.
 - When you added a special range check for bank0 in move_freepages(), there was
   no panic. So, it seems the kernel sees something bad when accessing the memmap
   for the memory hole between bank0 and bank1.


When you use FLATMEM, the memmap/migrate-type bitmap should be allocated for
the whole range [start_pfn...max_pfn), regardless of memory holes. 
So I think you have a memmap even for the memory hole [0x07D00000...0x40000000).

Then, the question is why move_freepages() panics when accessing *unused* memmap
entries for a memory hole. All memmap entries (struct page) are initialized in 
  memmap_init()
	-> memmap_init_zone()
		-> ....
  Here, all page structs are initialized (page->flags and page->lru are set up.)

Then, looking back into move_freepages().
 ==
 778         for (page = start_page; page <= end_page;) {
 779                 /* Make sure we are not inadvertently changing nodes */
 780                 VM_BUG_ON(page_to_nid(page) != zone_to_nid(zone));
 781 
 782                 if (!pfn_valid_within(page_to_pfn(page))) {
 783                         page++;
 784                         continue;
 785                 }
 786 
 787                 if (!PageBuddy(page)) {
 788                         page++;
 789                         continue;
 790                 }
 791 
 792                 order = page_order(page);
 793                 list_del(&page->lru);
 794                 list_add(&page->lru,
 795                         &zone->free_area[order].free_list[migratetype]);
 796                 page += 1 << order;
 797                 pages_moved += 1 << order;
 798         }
 ==
Assume that an access to the page struct itself doesn't cause a panic.
For touching the page struct's members such as page->lru to cause a panic,
PageBuddy must be set.

Then, there are 2 possibilities:
  1. page_to_nid(page) != zone_to_nid(zone).
  2. PageBuddy() is set by mistake.
     (A PG_reserved page should never have PG_buddy set.)

In either case, something is corrupted in the unused memmap area.
There are 2 possibilities:
 (1) the memmap for the memory hole was not initialized correctly.
 (2) something corrupted the memmap (by overwriting it).

I suspect (2) rather than (1).

One difficulty here is that your kernel is 2.6.29. Could you try 2.6.32 and
reproduce the trouble? Or could you check the page flags for the memory holes?
For holes, the nid should be zero, PG_buddy should not be set, and PG_reserved
should be set...

And checking the memmap initialization of memory holes in memmap_init_zone() 
may be a good starting point for debugging, I guess.
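A debug walk of the kind suggested could check each hole pfn's memmap entry against that expected state. Below is a user-space model of the check only; the struct, flag bit positions, and function name are all illustrative stand-ins, not the kernel's real definitions:

```c
#include <stdbool.h>

/* Illustrative stand-ins for the page flag bits under discussion. */
#define PG_RESERVED (1UL << 0)
#define PG_BUDDY    (1UL << 1)

/* Minimal model of the fields of struct page that matter here. */
struct page_model {
        int nid;
        unsigned long flags;
};

/* Expected state of a memmap entry covering a memory hole: node 0,
 * PG_reserved set, PG_buddy clear. Anything else suggests the entry
 * was never initialized, or has been corrupted or freed. */
bool hole_memmap_ok(const struct page_model *page)
{
        return page->nid == 0 &&
               (page->flags & PG_RESERVED) &&
               !(page->flags & PG_BUDDY);
}
```

In the kernel the equivalent loop would iterate the hole pfns and test page_to_nid(), PageReserved(), and PageBuddy() on each struct page.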


Off topic:
BTW, the memory hole seems huge for your amount of memory... using SPARSEMEM
is an option.

Regards,
-Kame

* Re: Kernel panic due to page migration accessing memory holes
  2010-02-18  9:36     ` KAMEZAWA Hiroyuki
@ 2010-02-18 10:04       ` Mel Gorman
  2010-02-19  1:47         ` Michael Bohan
  0 siblings, 1 reply; 13+ messages in thread
From: Mel Gorman @ 2010-02-18 10:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Michael Bohan, linux-mm, linux-kernel, linux-arm-msm,
	linux-arm-kernel

On Thu, Feb 18, 2010 at 06:36:04PM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 18 Feb 2010 00:22:24 -0800
> Michael Bohan <mbohan@codeaurora.org> wrote:
> 
> > On 2/17/2010 5:03 PM, KAMEZAWA Hiroyuki wrote:
> > > On Wed, 17 Feb 2010 16:45:54 -0800
> > > Michael Bohan<mbohan@codeaurora.org>  wrote:
> > >> As a temporary fix, I added some code to move_freepages_block() that
> > >> inspects whether the range exceeds our first memory bank -- returning 0
> > >> if it does.  This is not a clean solution, since it requires exporting
> > >> the ARM specific meminfo structure to extract the bank information.
> > >>
> > >>      
> > > Hmm, my first impression is...
> > >
> > > - Using FLATMEM, memmap is created for the number of pages and memmap should
> > >    not have aligned size.
> > > - Using SPARSEMEM, memmap is created for aligned number of pages.
> > >
> > > Then, the range [zone->start_pfn ... zone->start_pfn + zone->spanned_pages]
> > > should be checked always.
> > >
> > >
> > >   803 static int move_freepages_block(struct zone *zone, struct page *page,
> > >   804                                 int migratetype)
> > >   805 {
> > >   816         if (start_pfn<  zone->zone_start_pfn)
> > >   817                 start_page = page;
> > >   818         if (end_pfn>= zone->zone_start_pfn + zone->spanned_pages)
> > >   819                 return 0;
> > >   820
> > >   821         return move_freepages(zone, start_page, end_page, migratetype);
> > >   822 }
> > >
> > > "(end_pfn>= zone->zone_start_pfn + zone->spanned_pages)" is checked.
> > > What zone->spanned_pages is set ? The zone's range is
> > > [zone->start_pfn ... zone->start_pfn+zone->spanned_pages], so this
> > > area should have initialized memmap. I wonder zone->spanned_pages is too big.
> > >    
> > 
> > In the block of code above running on my target, the zone_start_pfn is 
> > is 0x200 and the spanned_pages is 0x44100.  This is consistent with the 
> > values shown from the zoneinfo file below.  It is also consistent with 
> > my memory map:
> > 
> > bank0:
> >      start: 0x00200000
> >      size:  0x07B00000
> > 
> > bank1:
> >      start: 0x40000000
> >      size:  0x04300000
> > 
> > Thus, spanned_pages here is the highest address reached minus the start 
> > address of the lowest bank (eg. 0x40000000 + 0x04300000 - 0x00200000).
> > 
> > Both of these banks exist in the same zone.  This means that the check 
> > in move_freepages_block() will never be satisfied for cases that overlap 
> > with the prohibited pfns, since the zone spans invalid pfns.  Should 
> > each bank be associated with its own zone?
> > 
> 
> Hmm. okay then..(CCing Mel.)
> 
>  [Fact]
>  - There are 2 banks of memory and a memory hole on your machine:
>          0x00200000 - 0x07D00000
>          0x40000000 - 0x44300000
> 
>  - Both banks are in the same zone.
>  - You use FLATMEM.
>  - You see a panic in move_freepages().
>  - Your host's MAX_ORDER=11, so the buddy allocator's alignment is 0x400000.
>    Then, it seems the 1st bank is not aligned.

It's not and assumptions are made about it being aligned.

>  - When you added a special range check for bank0 in move_freepages(), there was
>    no panic. So, it seems the kernel sees something bad when accessing the memmap
>    for the memory hole between bank0 and bank1.
> 
> 
> When you use FLATMEM, memmap/migrate-type-bitmap should be allocated for
> the whole range of [start_pfn....max_pfn) regardless of memory holes. 
> Then, I think you have memmap even for a memory hole [0x07D00000...0x40000000)
> 

It would have at the start, but then....


> Then, the question is why move_freepages() panic at accessing *unused* memmaps
> for memory hole. All memmap(struct page) are initialized in 
>   memmap_init()
> 	-> memmap_init_zone()
> 		-> ....
>   Here, all page structs are initialized (page->flags, page->lru are initialized.)
> 

ARM frees unused portions of the memmap to save memory. That's why memmap_valid_within()
exists when CONFIG_ARCH_HAS_HOLES_MEMORYMODEL is set, although previously only
reading /proc/pagetypeinfo cared.

In that case, the FLATMEM memory map had unexpected holes, which "never"
happens, and that was the workaround. The problem here is that there are
unaligned zones but no pfn_valid() implementation that can identify
them, as you'd have with SPARSEMEM. My expectation is that you are using
the pfn_valid() implementation from asm-generic

#define pfn_valid(pfn)          ((pfn) < max_mapnr)

which is insufficient in your case.

> Then, looking back into move_freepages().
>  ==
>  778         for (page = start_page; page <= end_page;) {
>  779                 /* Make sure we are not inadvertently changing nodes */
>  780                 VM_BUG_ON(page_to_nid(page) != zone_to_nid(zone));
>  781 
>  782                 if (!pfn_valid_within(page_to_pfn(page))) {
>  783                         page++;
>  784                         continue;
>  785                 }
>  786 
>  787                 if (!PageBuddy(page)) {
>  788                         page++;
>  789                         continue;
>  790                 }
>  791 
>  792                 order = page_order(page);
>  793                 list_del(&page->lru);
>  794                 list_add(&page->lru,
>  795                         &zone->free_area[order].free_list[migratetype]);
>  796                 page += 1 << order;
>  797                 pages_moved += 1 << order;
>  798         }
>  ==
> Assume an access to page struct itself doesn't cause panic.
> Touching page struct's member of page->lru at el to cause panic,
> So, PageBuddy should be set.
> 
> Then, there are 2 chances.
>   1. page_to_nid(page) != zone_to_nid(zone).
>   2. PageBuddy() is set by mistake.
>      (PG_reserved page never be set PG_buddy.)
> 
> For both, something corrupted in unused memmap area.
> There are 2 possibility.
>  (1) memmap for memory hole was not initialized correctly.
>  (2) something wrong currupt memmap. (by overwrite.)
> 
> I doubt (2) rather than (1).
> 

I think it's more likely that the memmap he is accessing has been
freed and is effectively random data.

> One of difficulty here is that your kernel is 2.6.29. Can't you try 2.6.32 and
> reproduce trouble ? Or could you check page flags for memory holes ?
> For holes, nid should be zero and PG_buddy shouldn't be set and PG_reserved
> should be set...
> 
> And checking memmap initialization of memory holes in memmap_init_zone() 
> may be good start point for debug, I guess.
> 
> Off topic:
> BTW, memory hole seems huge for your size of memory....using SPARSEMEM
> is a choice.
> 

SPARSEMEM would give you an implementation of pfn_valid() that you could
use here. The choices that spring to mind are:

1. reduce MAX_ORDER so they are aligned (easiest)
2. use SPARSEMEM (easy, but not necessarily what you want to do; might
	waste memory unless you drop MAX_ORDER as well)
3. implement a pfn_valid() that can handle the holes and set
	CONFIG_HOLES_IN_ZONE so it's called in move_freepages() to
	deal with the holes (should pass this by someone more familiar
	with ARM than I)
4. Call memmap_valid_within in move_freepages (very very ugly, not
	suitable for upstream merging)

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: Kernel panic due to page migration accessing memory holes
  2010-02-18 10:04       ` Mel Gorman
@ 2010-02-19  1:47         ` Michael Bohan
  2010-02-19  2:00           ` KAMEZAWA Hiroyuki
                             ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Michael Bohan @ 2010-02-19  1:47 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, linux-arm-msm,
	linux-arm-kernel

On 2/18/2010 2:04 AM, Mel Gorman wrote:
> On Thu, Feb 18, 2010 at 06:36:04PM +0900, KAMEZAWA Hiroyuki wrote:
>    
>>   [Fact]
>>   - There are 2 banks of memory and a memory hole on your machine:
>>           0x00200000 - 0x07D00000
>>           0x40000000 - 0x44300000
>>
>>   - Both banks are in the same zone.
>>   - You use FLATMEM.
>>   - You see a panic in move_freepages().
>>   - Your host's MAX_ORDER=11, so the buddy allocator's alignment is 0x400000.
>>     Then, it seems the 1st bank is not aligned.
>>      
> It's not and assumptions are made about it being aligned.
>    

Would it be prudent to have the ARM mm init code detect unaligned, 
discontiguous banks and print a warning message if 
CONFIG_ARCH_HAS_HOLES_MEMORYMODEL is not configured?  Should we take it 
a step further and even BUG()?

> ARM frees unused portions of memmap to save memory. It's why memmap_valid_within()
> exists when CONFIG_ARCH_HAS_HOLES_MEMORYMODEL although previously only
> reading /proc/pagetypeinfo cared.
>
> In that case, the FLATMEM memory map had unexpected holes which "never"
> happens and that was the workaround. The problem here is that there are
> unaligned zones but no pfn_valid() implementation that can identify
> them as you'd have with SPARSEMEM. My expectation is that you are using
> the pfn_valid() implementation from asm-generic
>
> #define pfn_valid(pfn)          ((pfn)<  max_mapnr)
>
> which is insufficient in your case.
>    

I am actually using the pfn_valid() implementation for FLATMEM in 
arch/arm/include/asm/memory.h.  This one is very similar to the 
asm-generic one, and has no knowledge of the holes.

> I think it's more likely that the memmap he is accessing has been
> freed and is effectively random data.
>
>    

I also think this is the case.

> SPARSEMEM would give you an implementation of pfn_valid() that you could
> use here. The choices that spring to mind are;
>
> 1. reduce MAX_ORDER so they are aligned (easiest)
>    

Is it safe to assume that reducing MAX_ORDER will hurt performance?

> 2. use SPARSEMEM (easy, but not necessarily what you want to do; might
> 	waste memory unless you drop MAX_ORDER as well)
>    

We intend to use SPARSEMEM, but we'd also like to maintain FLATMEM 
compatibility for some configurations.  My guess is that there are other 
ARM users that may want this support as well.

> 3. implement a pfn_valid() that can handle the holes and set
> 	CONFIG_HOLES_IN_ZONE so it's called in move_freepages() to
> 	deal with the holes (should pass this by someone more familiar
> 	with ARM than I)
>    

This option seems the best to me.  We should be able to implement an ARM 
specific pfn_valid() that walks the ARM meminfo struct to ensure the pfn 
is not within a hole.

My only concern with this is a comment in __rmqueue_fallback(), after 
the call to move_freepages_block(), that states "Claim the whole block if 
over half of it is free".  Suppose only 1 MB of the block lies beyond the 
bank limit.  Then over half of the pages of the 4 MB block could be 
reported by move_freepages() as free -- but 1 MB of those pages are 
invalid.  Won't this cause problems if these pages are assumed to be 
part of an active block?

It seems like we should have an additional check in 
move_freepages_block() that uses pfn_valid_within() on the last page in 
the block (i.e. end_pfn) before calling move_freepages().  If the 
last page is not valid, shouldn't we return 0, as in the zone 
span check?  This would also skip the extra burden of checking each 
individual page when we already know the proposed range is invalid.
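The proposed early bail-out can be sketched as below. This is a user-space model under stated assumptions: the hole predicate is a hypothetical stand-in for a hole-aware pfn_valid_within(), hard-coded to the bank layout above, and `block_usable()` is an illustrative name, not a kernel function.

```c
#include <stdbool.h>

#define PAGEBLOCK_NR_PAGES 1024UL   /* 4 MB blocks with 4 KB pages */

/* Hypothetical stand-in for a hole-aware pfn_valid_within(); valid
 * pfns here are bank0 [0x200, 0x7D00) and bank1 [0x40000, 0x44300),
 * simplified to the ranges relevant to this example. */
static bool pfn_valid_within_model(unsigned long pfn)
{
        return pfn < 0x7D00UL ||
               (pfn >= 0x40000UL && pfn < 0x44300UL);
}

/* Proposed early check: reject the whole pageblock when its last pfn
 * is invalid, before walking the individual pages. */
bool block_usable(unsigned long pfn)
{
        unsigned long end_pfn =
                (pfn & ~(PAGEBLOCK_NR_PAGES - 1)) + PAGEBLOCK_NR_PAGES - 1;

        return pfn_valid_within_model(end_pfn);
}
```

A block entirely inside bank0 (e.g. starting at pfn 0x7000) passes, while the last block of bank0 (containing pfn 0x7C50, ending at 0x7FFF in the hole) is rejected up front.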

Assuming we did return 0 in this case, would that sub-block of pages 
ever be usable for anything else, or would it be effectively wasted?  If 
this memory were wasted, then adjusting MAX_ORDER would have an 
advantage in this sense -- ignoring any performance implications.

Thanks,
Michael


* Re: Kernel panic due to page migration accessing memory holes
  2010-02-19  1:47         ` Michael Bohan
@ 2010-02-19  2:00           ` KAMEZAWA Hiroyuki
  2010-02-19  5:48             ` Michael Bohan
  2010-02-19  8:30           ` Russell King - ARM Linux
  2010-02-19 13:48           ` Mel Gorman
  2 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-19  2:00 UTC (permalink / raw)
  To: Michael Bohan
  Cc: Mel Gorman, linux-mm, linux-kernel, linux-arm-msm,
	linux-arm-kernel

On Thu, 18 Feb 2010 17:47:28 -0800
Michael Bohan <mbohan@codeaurora.org> wrote:

> On 2/18/2010 2:04 AM, Mel Gorman wrote:
> > On Thu, Feb 18, 2010 at 06:36:04PM +0900, KAMEZAWA Hiroyuki wrote:
> >    
> >>   [Fact]
> >>   - There are 2 banks of memory and a memory hole on your machine.
> >>     As
> >>           0x00200000 - 0x07D00000
> >>           0x40000000 - 0x43000000
> >>
> >>   - Each bancks are in the same zone.
> >>   - You use FLATMEM.
> >>   - You see panic in move_freepages().
> >>   - Your host's MAX_ORDER=11....buddy allocator's alignment is 0x400000
> >>     Then, it seems the 1st bank is not aligned.
> >>      
> > It's not and assumptions are made about it being aligned.
> >    
> 
> Would it be prudent to have the ARM mm init code detect unaligned, 
> discontiguous banks and print a warning message if 
> CONFIG_ARCH_HAS_HOLES_MEMORYMODEL is not configured?  Should we take it 
> a step further and even BUG()?
> 
> > ARM frees unused portions of memmap to save memory. It's why memmap_valid_within()
> > exists when CONFIG_ARCH_HAS_HOLES_MEMORYMODEL although previously only
> > reading /proc/pagetypeinfo cared.
> >
> > In that case, the FLATMEM memory map had unexpected holes which "never"
> > happens and that was the workaround. The problem here is that there are
> > unaligned zones but no pfn_valid() implementation that can identify
> > them as you'd have with SPARSEMEM. My expectation is that you are using
> > the pfn_valid() implementation from asm-generic
> >
> > #define pfn_valid(pfn)          ((pfn)<  max_mapnr)
> >
> > which is insufficient in your case.
> >    
> 
> I am actually using the FLATMEM pfn_valid implementation in 
> arch/arm/include/asm/memory.h.  This one is very similar to the 
> asm-generic one, and has no knowledge of the holes.
> 
That means that, in FLATMEM, memmaps are allocated for [start...max_pfn].
pfn_valid() isn't for "there is memory" but for "there is memmap".



> > I think it's more likely that the memmap he is accessing has been
> > freed and is effectively random data.
> >
> >    
> 
> I also think this is the case.
> 
Then, please check that free_bootmem() et al. don't free pages in a memory hole.


> > SPARSEMEM would give you an implementation of pfn_valid() that you could
> > use here. The choices that spring to mind are;
> >
> > 1. reduce MAX_ORDER so they are aligned (easiest)
> >    
> 
> Is it safe to assume that reducing MAX_ORDER will hurt performance?
> 
> > 2. use SPARSEMEM (easy, but not necessarily what you want to do, might
> > 	waste memory unless you drop MAX_ORDER as well)
> >    
> 
> We intend to use SPARSEMEM, but we'd also like to maintain FLATMEM 
> compatibility for some configurations.  My guess is that there are other 
> ARM users that may want this support as well.
> 
> > 3. implement a pfn_valid() that can handle the holes and set
> > 	CONFIG_HOLES_IN_ZONE so it's called in move_freepages() to
> > 	deal with the holes (should pass this by someone more familiar
> > 	with ARM than I)
> >    
> 
> This option seems the best to me.  We should be able to implement an ARM 
> specific pfn_valid() that walks the ARM meminfo struct to ensure the pfn 
> is not within a hole.
> 
> My only concern with this is a comment in __rmqueue_fallback() after 
> calling move_freepages_block()  that states "Claim the whole block if 
> over half of it is free".  Suppose only 1 MB is beyond the bank limit.  
> That means that over half of the pages of the 4 MB block will be 
> reported by move_freepages() as free -- but 1 MB of those pages are 
> invalid.  Won't this cause problems if these pages are assumed to be 
> part of an active block?
> 
memmap for memory holes should be marked as PG_reserved and never be freed
by free_bootmem(). Then, the memmap for memory holes will not be in the buddy allocator.

Again, pfn_valid() just shows "there is memmap", not "there is a valid page".


> It seems like we should have an additional check in 
> move_freepages_block() with pfn_valid_within() to check the last page in 
> the block (e.g. end_pfn) before calling move_freepages().  If the 
> last page is not valid, then shouldn't we return 0, as in the zone 
> span check?  This would also skip the extra burden of checking each 
> individual page, when we already know the proposed range is invalid.
> 
You don't need that.  Please check why PG_reserved is not set for your
memory holes.

> Assuming we did return 0 in this case, would that sub-block of pages 
> ever be usable for anything else, or would it be effectively wasted?  If 
> this memory were wasted, then adjusting MAX_ORDER would have an 
> advantage in this sense -- ignoring any performance implications.
> 

Even if you do that, you still have to fix the "someone corrupts memory" or
"someone frees PG_reserved memory" issue anyway.

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Kernel panic due to page migration accessing memory holes
  2010-02-19  2:00           ` KAMEZAWA Hiroyuki
@ 2010-02-19  5:48             ` Michael Bohan
  2010-02-19  6:10               ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Bohan @ 2010-02-19  5:48 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mel Gorman, linux-mm, linux-kernel, linux-arm-msm,
	linux-arm-kernel

On 2/18/2010 6:00 PM, KAMEZAWA Hiroyuki wrote:
> memmap for memory holes should be marked as PG_reserved and never be freed
> by free_bootmem(). Then, the memmap for memory holes will not be in the buddy allocator.
>
> Again, pfn_valid() just shows "there is memmap", not "there is a valid page".
>    

ARM seems to have been freeing the memmap holes for a long time.  I'm 
pretty sure there would be a lot of pushback if we tried to change 
that.  For example, in my memory map running FLATMEM, I would be 
consuming an extra ~7 MB of memory if these structures were not freed.

As a compromise, perhaps we could free everything except the first 
'pageblock_nr_pages' in a hole?  This would guarantee that 
move_freepages() doesn't dereference any memory that doesn't belong to the 
memmap -- but still only waste a relatively small amount of memory.  For 
a 4 MB page block, it should only consume an extra 32 KB per hole in the 
memory map.

Thanks,
Michael


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Kernel panic due to page migration accessing memory holes
  2010-02-19  5:48             ` Michael Bohan
@ 2010-02-19  6:10               ` KAMEZAWA Hiroyuki
  2010-02-19  8:21                 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-19  6:10 UTC (permalink / raw)
  To: Michael Bohan
  Cc: Mel Gorman, linux-mm, linux-kernel, linux-arm-msm,
	linux-arm-kernel

On Thu, 18 Feb 2010 21:48:37 -0800
Michael Bohan <mbohan@codeaurora.org> wrote:

> On 2/18/2010 6:00 PM, KAMEZAWA Hiroyuki wrote:
> > memmap for memory holes should be marked as PG_reserved and never be freed
> > by free_bootmem(). Then, the memmap for memory holes will not be in the buddy allocator.
> >
> > Again, pfn_valid() just shows "there is memmap", not "there is a valid page".
> >    
> 
> ARM seems to have been freeing the memmap holes for a long time.
Ouch.

> I'm pretty sure there would be a lot of pushback if we tried to change 
> that.  For example, in my memory map running FLATMEM, I would be 
> consuming an extra ~7 MB of memory if these structures were not freed.
> 
> As a compromise, perhaps we could free everything except the first 
> 'pageblock_nr_pages' in a hole?  This would guarantee that 
> move_freepages() doesn't dereference any memory that doesn't belong to the 
> memmap -- but still only waste a relatively small amount of memory.  For 
> a 4 MB page block, it should only consume an extra 32 KB per hole in the 
> memory map.
> 
No. Even if you do that, you still have to implement pfn_valid() so that it
returns the correct value, i.e. "pfn_valid() returns true if there is memmap".
Otherwise, many things will go bad.

You have 2 or 3 options.

1. Re-implement pfn_valid() so that it returns the correct value.
   Maybe not difficult, but please take care with defining CONFIG_HOLES_IN_...
   etc.

2. Use DISCONTIGMEM and treat each bank as a NUMA node.
   There will be no wasted memmap, but it brings the other complications of
   CONFIG_NUMA.

3. Use SPARSEMEM.
   You have 2 choices here:
   a - set your MAX_ORDER and SECTION_SIZE to proper values.
   b - waste some amount of memory for memmap at the edge of each section
       (and don't free the memmap for the edge).
      
Thanks,
-Kame


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Kernel panic due to page migration accessing memory holes
  2010-02-19  6:10               ` KAMEZAWA Hiroyuki
@ 2010-02-19  8:21                 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-19  8:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Michael Bohan, Mel Gorman, linux-mm, linux-kernel, linux-arm-msm,
	linux-arm-kernel

On Fri, 19 Feb 2010 15:10:12 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> 1. Re-implement pfn_valid() so that it returns the correct value.
>    Maybe not difficult, but please take care with defining CONFIG_HOLES_IN_...
>    etc.
> 
> 2. Use DISCONTIGMEM and treat each bank as a NUMA node.
>    There will be no wasted memmap, but it brings the other complications of
>    CONFIG_NUMA.
>    
> 3. Use SPARSEMEM.
>    You have 2 choices here:
>    a - set your MAX_ORDER and SECTION_SIZE to proper values.
>    b - waste some amount of memory for memmap at the edge of each section
>        (and don't free the memmap for the edge).
>       

I read ARM's code briefly. In 2.6.32, I think (1) is implemented, as:
==

#ifndef CONFIG_SPARSEMEM
int pfn_valid(unsigned long pfn)
{
        struct meminfo *mi = &meminfo;
        unsigned int left = 0, right = mi->nr_banks;

        do {
                unsigned int mid = (right + left) / 2;
                struct membank *bank = &mi->bank[mid];

                if (pfn < bank_pfn_start(bank))
                        right = mid;
                else if (pfn >= bank_pfn_end(bank))
                        left = mid + 1;
                else
                        return 1;
        } while (left < right);
        return 0;
}
EXPORT_SYMBOL(pfn_valid);
==
So, what you should do is upgrade to 2.6.32 or backport this one.

See this.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b7cfda9fc3d7aa60cffab5367f2a72a4a70060cd

Thanks,
-Kame





^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Kernel panic due to page migration accessing memory holes
  2010-02-19  1:47         ` Michael Bohan
  2010-02-19  2:00           ` KAMEZAWA Hiroyuki
@ 2010-02-19  8:30           ` Russell King - ARM Linux
  2010-02-19 13:48           ` Mel Gorman
  2 siblings, 0 replies; 13+ messages in thread
From: Russell King - ARM Linux @ 2010-02-19  8:30 UTC (permalink / raw)
  To: Michael Bohan
  Cc: Mel Gorman, linux-arm-kernel, linux-mm, linux-kernel,
	KAMEZAWA Hiroyuki, linux-arm-msm

On Thu, Feb 18, 2010 at 05:47:28PM -0800, Michael Bohan wrote:
> I am actually using the FLATMEM pfn_valid implementation in  
> arch/arm/include/asm/memory.h.  This one is very similar to the  
> asm-generic one, and has no knowledge of the holes.

Later kernels have a pfn_valid() which does have hole functionality.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Kernel panic due to page migration accessing memory holes
  2010-02-19  1:47         ` Michael Bohan
  2010-02-19  2:00           ` KAMEZAWA Hiroyuki
  2010-02-19  8:30           ` Russell King - ARM Linux
@ 2010-02-19 13:48           ` Mel Gorman
  2 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2010-02-19 13:48 UTC (permalink / raw)
  To: Michael Bohan
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, linux-arm-msm,
	linux-arm-kernel

On Thu, Feb 18, 2010 at 05:47:28PM -0800, Michael Bohan wrote:
> On 2/18/2010 2:04 AM, Mel Gorman wrote:
>> On Thu, Feb 18, 2010 at 06:36:04PM +0900, KAMEZAWA Hiroyuki wrote:
>>    
>>>   [Fact]
>>>   - There are 2 banks of memory and a memory hole on your machine.
>>>     As
>>>           0x00200000 - 0x07D00000
>>>           0x40000000 - 0x43000000
>>>
>>>   - Both banks are in the same zone.
>>>   - You use FLATMEM.
>>>   - You see panic in move_freepages().
>>>   - Your host's MAX_ORDER=11....buddy allocator's alignment is 0x400000
>>>     Then, it seems the 1st bank is not aligned.
>>>      
>> It's not and assumptions are made about it being aligned.
>>    
>
> Would it be prudent to have the ARM mm init code detect unaligned,  
> discontiguous banks and print a warning message if  
> CONFIG_ARCH_HAS_HOLES_MEMORYMODEL is not configured?  Should we take it  
> a step further and even BUG()?
>

I guess it wouldn't hurt. I wouldn't get too side-tracked though as it's
not the most important issue here.

>> ARM frees unused portions of memmap to save memory. It's why memmap_valid_within()
>> exists when CONFIG_ARCH_HAS_HOLES_MEMORYMODEL although previously only
>> reading /proc/pagetypeinfo cared.
>>
>> In that case, the FLATMEM memory map had unexpected holes which "never"
>> happens and that was the workaround. The problem here is that there are
>> unaligned zones but no pfn_valid() implementation that can identify
>> them as you'd have with SPARSEMEM. My expectation is that you are using
>> the pfn_valid() implementation from asm-generic
>>
>> #define pfn_valid(pfn)          ((pfn)<  max_mapnr)
>>
>> which is insufficient in your case.
>>    
>
> I am actually using the FLATMEM pfn_valid implementation in  
> arch/arm/include/asm/memory.h.  This one is very similar to the  
> asm-generic one, and has no knowledge of the holes.
>

The same problem applies, then.

>> I think it's more likely that the memmap he is accessing has been
>> freed and is effectively random data.
>>
>>    
>
> I also think this is the case.
>
>> SPARSEMEM would give you an implementation of pfn_valid() that you could
>> use here. The choices that spring to mind are;
>>
>> 1. reduce MAX_ORDER so they are aligned (easiest)
>>    
>
> Is it safe to assume that reducing MAX_ORDER will hurt performance?
>

No, it does not necessarily reduce performance. In some circumstances it
might even help although I wouldn't chase after it.

Downside one is that some hash tables might be hurt if you have a very
large amount of memory (look for "hash table entries:" in dmesg after
booting to see what order is being used).

Downside two is that if some drivers require large contiguous memory
early in boot, they might be hurt by MAX_ORDER being lower. If you
require CONFIG_HUGETLB_PAGE, it might not be possible to reduce
MAX_ORDER depending on the size of the huge page.

>> 2. use SPARSEMEM (easy, but not necessarily what you want to do, might
>> 	waste memory unless you drop MAX_ORDER as well)
>>    
>
> We intend to use SPARSEMEM, but we'd also like to maintain FLATMEM  
> compatibility for some configurations.  My guess is that there are other  
> ARM users that may want this support as well.
>
>> 3. implement a pfn_valid() that can handle the holes and set
>> 	CONFIG_HOLES_IN_ZONE so it's called in move_freepages() to
>> 	deal with the holes (should pass this by someone more familiar
>> 	with ARM than I)
>>    
>
> This option seems the best to me.  We should be able to implement an ARM  
> specific pfn_valid() that walks the ARM meminfo struct to ensure the pfn  
> is not within a hole.
>

Be sure to check your performance before and after. pfn_valid_within()
is used in a fair few places and you are likely enabling it.

> My only concern with this is a comment in __rmqueue_fallback() after  
> calling move_freepages_block()  that states "Claim the whole block if  
> over half of it is free".  Suppose only 1 MB is beyond the bank limit.   
> That means that over half of the pages of the 4 MB block will be  
> reported by move_freepages() as free -- but 1 MB of those pages are  
> invalid.  Won't this cause problems if these pages are assumed to be  
> part of an active block?
>

The only operation taking place there is updating a bitmap so I doubt
you'll hit snags there.

> It seems like we should have an additional check in  
> move_freepages_block() with pfn_valid_within() to check the last page in  
> the block (e.g. end_pfn) before calling move_freepages().  If the  
> last page is not valid, then shouldn't we return 0, as in the zone  
> span check? This would also skip the extra burden of checking each  
> individual page, when we already know the proposed range is invalid.
>

You don't know where the holes are going to be, so it is paranoid rather
than making assumptions about where architectures put holes.

> Assuming we did return 0 in this case, would that sub-block of pages  
> ever be usable for anything else, or would it be effectively wasted? 

They're still usable.

> If  
> this memory were wasted, then adjusting MAX_ORDER would have an  
> advantage in this sense -- ignoring any performance implications.
>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-02-19 13:48 UTC | newest]

Thread overview: 13+ messages
2010-02-18  0:45 Kernel panic due to page migration accessing memory holes Michael Bohan
2010-02-18  1:03 ` KAMEZAWA Hiroyuki
2010-02-18  8:22   ` Michael Bohan
2010-02-18  9:36     ` KAMEZAWA Hiroyuki
2010-02-18 10:04       ` Mel Gorman
2010-02-19  1:47         ` Michael Bohan
2010-02-19  2:00           ` KAMEZAWA Hiroyuki
2010-02-19  5:48             ` Michael Bohan
2010-02-19  6:10               ` KAMEZAWA Hiroyuki
2010-02-19  8:21                 ` KAMEZAWA Hiroyuki
2010-02-19  8:30           ` Russell King - ARM Linux
2010-02-19 13:48           ` Mel Gorman
2010-02-18  8:53 ` Russell King - ARM Linux
