* Kernel panic due to page migration accessing memory holes
@ 2010-02-18  0:45 Michael Bohan
  2010-02-18  1:03 ` KAMEZAWA Hiroyuki
  2010-02-18  8:53 ` Russell King - ARM Linux
  0 siblings, 2 replies; 13+ messages in thread

From: Michael Bohan @ 2010-02-18  0:45 UTC (permalink / raw)
To: linux-mm; +Cc: linux-kernel, linux-arm-msm, linux-arm-kernel

Hi,

I have encountered a kernel panic on the ARM/msm platform in the mm
migration code on 2.6.29. My memory configuration has two discontiguous
banks per our ATAG definition. These banks end up on addresses that are
1 MB aligned. I am using FLATMEM (not SPARSEMEM), but my understanding
is that SPARSEMEM should not be necessary to support this
configuration. Please correct me if I'm wrong.

The crash occurs in mm/page_alloc.c:move_freepages() when it is passed
a start_page that corresponds to the last several megabytes of our
first memory bank. The code in move_freepages_block() aligns the
passed-in page number to pageblock_nr_pages, which corresponds to 4 MB.
It then passes that aligned pfn as the beginning of a 4 MB range to
move_freepages(). The problem is that since our bank's end address is
not 4 MB aligned, the range passed to move_freepages() exceeds the end
of our memory bank. The code later blows up when trying to access
uninitialized page structures.

As a temporary fix, I added some code to move_freepages_block() that
inspects whether the range exceeds our first memory bank -- returning 0
if it does. This is not a clean solution, since it requires exporting
the ARM-specific meminfo structure to extract the bank information.

I see an option called CONFIG_HOLES_IN_ZONE, which controls the
definition of pfn_valid_within() used in move_freepages(). This option
seems relevant to the problem. The ia64 architecture has a special
version of pfn_valid() called ia64_pfn_valid() that is used in
conjunction with this option. It appears to inspect the page
structure's state in a safe way that does not cause a crash, and can
presumably be used to determine whether the page structure is properly
initialized. The ARM version of pfn_valid() used in the FLATMEM
scenario does not appear to be memory-hole aware, and will blindly
return true in this case. I have looked at linux-next, and at least the
functions mentioned above have not changed.

I was curious whether there is a stated requirement that memory banks
must end on 4 MB aligned addresses. Although I found this problem on
ARM, it appears upon inspection that the problem could occur on other
architectures as well, given the memory map assumptions stated above.
I'm hoping that some mm experts might understand the problem in greater
detail.

Thanks,
Michael

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body
to majordomo@kvack.org. For more info on Linux MM, see:
http://www.linux-mm.org/ . Don't email: email@kvack.org
* Re: Kernel panic due to page migration accessing memory holes

From: KAMEZAWA Hiroyuki @ 2010-02-18  1:03 UTC (permalink / raw)
To: Michael Bohan; +Cc: linux-mm, linux-kernel, linux-arm-msm, linux-arm-kernel

On Wed, 17 Feb 2010 16:45:54 -0800
Michael Bohan <mbohan@codeaurora.org> wrote:

> I have encountered a kernel panic on the ARM/msm platform in the mm
> migration code on 2.6.29. My memory configuration has two discontiguous
> banks per our ATAG definition. These banks end up on addresses that
> are 1 MB aligned. I am using FLATMEM (not SPARSEMEM), but my
> understanding is that SPARSEMEM should not be necessary to support this
> configuration. Please correct me if I'm wrong.
>
> The crash occurs in mm/page_alloc.c:move_freepages() when it is passed
> a start_page that corresponds to the last several megabytes of our
> first memory bank. The code in move_freepages_block() aligns the
> passed-in page number to pageblock_nr_pages, which corresponds to 4 MB.
> It then passes that aligned pfn as the beginning of a 4 MB range to
> move_freepages(). The problem is that since our bank's end address is
> not 4 MB aligned, the range passed to move_freepages() exceeds the end
> of our memory bank. The code later blows up when trying to access
> uninitialized page structures.

That should be aligned, I think.

> As a temporary fix, I added some code to move_freepages_block() that
> inspects whether the range exceeds our first memory bank -- returning 0
> if it does. This is not a clean solution, since it requires exporting
> the ARM-specific meminfo structure to extract the bank information.

Hmm, my first impression is...

 - Using FLATMEM, memmap is created for the number of pages, and memmap
   need not have an aligned size.
 - Using SPARSEMEM, memmap is created for an aligned number of pages.

Then, the range [zone->start_pfn ... zone->start_pfn + zone->spanned_pages]
should always be checked:

  803 static int move_freepages_block(struct zone *zone, struct page *page,
  804                                 int migratetype)
  805 {
      ...
  816         if (start_pfn < zone->zone_start_pfn)
  817                 start_page = page;
  818         if (end_pfn >= zone->zone_start_pfn + zone->spanned_pages)
  819                 return 0;
  820
  821         return move_freepages(zone, start_page, end_page, migratetype);
  822 }

"(end_pfn >= zone->zone_start_pfn + zone->spanned_pages)" is checked.
What is zone->spanned_pages set to? The zone's range is
[zone->start_pfn ... zone->start_pfn + zone->spanned_pages], so this
area should have an initialized memmap. I wonder whether
zone->spanned_pages is too big. Could you check? (/proc/zoneinfo can
show it.) A dump of /proc/zoneinfo or dmesg would be helpful.

Thanks,
-Kame
* Re: Kernel panic due to page migration accessing memory holes

From: Michael Bohan @ 2010-02-18  8:22 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, linux-arm-msm, linux-arm-kernel

On 2/17/2010 5:03 PM, KAMEZAWA Hiroyuki wrote:
> "(end_pfn >= zone->zone_start_pfn + zone->spanned_pages)" is checked.
> What is zone->spanned_pages set to? The zone's range is
> [zone->start_pfn ... zone->start_pfn + zone->spanned_pages], so this
> area should have an initialized memmap. I wonder whether
> zone->spanned_pages is too big.

In the block of code above running on my target, zone_start_pfn is
0x200 and spanned_pages is 0x44100. This is consistent with the values
shown in the zoneinfo output below. It is also consistent with my
memory map:

  bank0:
    start: 0x00200000
    size:  0x07B00000

  bank1:
    start: 0x40000000
    size:  0x04300000

Thus, spanned_pages here is the highest address reached minus the start
address of the lowest bank (i.e. 0x40000000 + 0x04300000 - 0x00200000).

Both of these banks exist in the same zone. This means that the check
in move_freepages_block() will never be satisfied for cases that
overlap with the prohibited pfns, since the zone spans invalid pfns.
Should each bank be associated with its own zone?

> Could you check? (/proc/zoneinfo can show it.)
> A dump of /proc/zoneinfo or dmesg would be helpful.

Here is what I believe to be the relevant piece of the kernel log:

<7>[    0.000000] On node 0 totalpages: 48640
<7>[    0.000000] free_area_init_node: node 0, pgdat 804875bc, node_mem_map 805af000
<7>[    0.000000]   Normal zone: 2178 pages used for memmap
<7>[    0.000000]   Normal zone: 0 pages reserved
<7>[    0.000000]   Normal zone: 46462 pages, LIFO batch:15
<4>[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 46462

# cat /proc/zoneinfo
Node 0, zone   Normal
  pages free     678
        min      431
        low      538
        high     646
        scanned  0 (aa: 0 ia: 0 af: 0 if: 0)
        spanned  278784
        present  46462
        mem_notify_status 0
    nr_free_pages 678
    nr_inactive_anon 8494
    nr_active_anon 8474
    nr_inactive_file 3234
    nr_active_file 2653
    nr_unevictable 71
    nr_mlock 0
    nr_anon_pages 12488
    nr_mapped 7237
    nr_file_pages 10446
    nr_dirty 0
    nr_writeback 0
    nr_slab_reclaimable 293
    nr_slab_unreclaimable 942
    nr_page_table_pages 1365
    nr_unstable 0
    nr_bounce 0
    nr_vmscan_write 0
    nr_writeback_temp 0
        protection: (0, 0)
  pagesets
    cpu: 0
              count: 42
              high:  90
              batch: 15
  all_unreclaimable: 0
  prev_priority:     12
  start_pfn:         512
  inactive_ratio:    1

Thanks,
Michael
* Re: Kernel panic due to page migration accessing memory holes

From: KAMEZAWA Hiroyuki @ 2010-02-18  9:36 UTC (permalink / raw)
To: Michael Bohan
Cc: linux-mm, linux-kernel, linux-arm-msm, linux-arm-kernel, mel

On Thu, 18 Feb 2010 00:22:24 -0800
Michael Bohan <mbohan@codeaurora.org> wrote:

> In the block of code above running on my target, zone_start_pfn is
> 0x200 and spanned_pages is 0x44100.
>
> Both of these banks exist in the same zone. This means that the check
> in move_freepages_block() will never be satisfied for cases that
> overlap with the prohibited pfns, since the zone spans invalid pfns.
> Should each bank be associated with its own zone?

Hmm, okay then... (CCing Mel.)

[Facts]
 - There are 2 banks of memory and a memory hole on your machine:
     0x00200000 - 0x07D00000
     0x40000000 - 0x44300000
 - Both banks are in the same zone.
 - You use FLATMEM.
 - You see a panic in move_freepages().
 - Your host's MAX_ORDER=11, so the buddy allocator's alignment is
   0x400000. The 1st bank is not aligned to that.
 - When you added a special range check for bank0 in move_freepages(),
   there was no panic.

So, it seems the kernel sees something bad when accessing the memmap
for the memory hole between bank0 and bank1.

When you use FLATMEM, the memmap/migrate-type bitmap should be
allocated for the whole range [start_pfn ... max_pfn) regardless of
memory holes. Then, I think you have a memmap even for the memory hole
[0x07D00000 ... 0x40000000).

Then, the question is why move_freepages() panics when accessing
*unused* memmaps for the memory hole. All memmaps (struct page) are
initialized in

  memmap_init()
    -> memmap_init_zone()
      -> ....

Here, all page structs are initialized (page->flags and page->lru are
initialized). Then, looking back into move_freepages():

  778         for (page = start_page; page <= end_page;) {
  779                 /* Make sure we are not inadvertently changing nodes */
  780                 VM_BUG_ON(page_to_nid(page) != zone_to_nid(zone));
  781
  782                 if (!pfn_valid_within(page_to_pfn(page))) {
  783                         page++;
  784                         continue;
  785                 }
  786
  787                 if (!PageBuddy(page)) {
  788                         page++;
  789                         continue;
  790                 }
  791
  792                 order = page_order(page);
  793                 list_del(&page->lru);
  794                 list_add(&page->lru,
  795                         &zone->free_area[order].free_list[migratetype]);
  796                 page += 1 << order;
  797                 pages_moved += 1 << order;
  798         }

Assume an access to the page struct itself doesn't cause a panic. For
touching the page struct's members (page->lru et al.) to cause a panic,
PageBuddy must be set. Then, there are 2 possibilities:

 1. page_to_nid(page) != zone_to_nid(zone).
 2. PageBuddy() is set by mistake.
    (A PG_reserved page never has PG_buddy set.)

In both cases, something is corrupted in the unused memmap area. There
are 2 possibilities:
 (1) memmap for the memory hole was not initialized correctly.
 (2) something corrupts the memmap (by overwrite).

I suspect (2) rather than (1).

One difficulty here is that your kernel is 2.6.29. Can't you try 2.6.32
and reproduce the trouble? Or could you check the page flags for memory
holes? For holes, nid should be zero, PG_buddy shouldn't be set, and
PG_reserved should be set... And checking the memmap initialization of
memory holes in memmap_init_zone() may be a good starting point for
debugging, I guess.

Off topic: BTW, the memory hole seems huge for your amount of memory --
using SPARSEMEM is an option.

Regards,
-Kame
* Re: Kernel panic due to page migration accessing memory holes

From: Mel Gorman @ 2010-02-18 10:04 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Michael Bohan, linux-mm, linux-kernel, linux-arm-msm, linux-arm-kernel

On Thu, Feb 18, 2010 at 06:36:04PM +0900, KAMEZAWA Hiroyuki wrote:
> [Facts]
>  - There are 2 banks of memory and a memory hole on your machine.
>  - Both banks are in the same zone.
>  - You use FLATMEM.
>  - You see a panic in move_freepages().
>  - Your host's MAX_ORDER=11, so the buddy allocator's alignment is
>    0x400000. The 1st bank is not aligned to that.

It's not, and assumptions are made about it being aligned.

> When you use FLATMEM, the memmap/migrate-type bitmap should be
> allocated for the whole range [start_pfn ... max_pfn) regardless of
> memory holes. Then, I think you have a memmap even for the memory hole
> [0x07D00000 ... 0x40000000).

It would have at the start, but then....

> Then, the question is why move_freepages() panics when accessing
> *unused* memmaps for the memory hole. All memmaps (struct page) are
> initialized in memmap_init() -> memmap_init_zone() -> ....
> Here, all page structs are initialized (page->flags and page->lru are
> initialized).

ARM frees unused portions of the memmap to save memory. It's why
memmap_valid_within() exists when CONFIG_ARCH_HAS_HOLES_MEMORYMODEL,
although previously only reading /proc/pagetypeinfo cared.

In that case, the FLATMEM memory map had unexpected holes -- which
"never" happens -- and that was the workaround. The problem here is
that there are unaligned zones but no pfn_valid() implementation that
can identify them as you'd have with SPARSEMEM. My expectation is that
you are using the pfn_valid() implementation from asm-generic

    #define pfn_valid(pfn)      ((pfn) < max_mapnr)

which is insufficient in your case.

> In both cases, something is corrupted in the unused memmap area. There
> are 2 possibilities:
>  (1) memmap for the memory hole was not initialized correctly.
>  (2) something corrupts the memmap (by overwrite).
>
> I suspect (2) rather than (1).

I think it's more likely that the memmap he is accessing has been freed
and is effectively random data.

> Off topic: BTW, the memory hole seems huge for your amount of memory --
> using SPARSEMEM is an option.

SPARSEMEM would give you an implementation of pfn_valid() that you
could use here. The choices that spring to mind are:

 1. Reduce MAX_ORDER so the banks are aligned (easiest).
 2. Use SPARSEMEM (easy, but not necessarily what you want to do;
    might waste memory unless you drop MAX_ORDER as well).
 3. Implement a pfn_valid() that can handle the holes and set
    CONFIG_HOLES_IN_ZONE so it's called in move_freepages() to deal
    with the holes (should pass this by someone more familiar with ARM
    than I am).
 4. Call memmap_valid_within() in move_freepages() (very, very ugly;
    not suitable for upstream merging).

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: Kernel panic due to page migration accessing memory holes

From: Michael Bohan @ 2010-02-19  1:47 UTC (permalink / raw)
To: Mel Gorman
Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, linux-arm-msm, linux-arm-kernel

On 2/18/2010 2:04 AM, Mel Gorman wrote:
> It's not, and assumptions are made about it being aligned.

Would it be prudent to have the ARM mm init code detect unaligned,
discontiguous banks and print a warning message if
CONFIG_ARCH_HAS_HOLES_MEMORYMODEL is not configured? Should we take it
a step further and even BUG()?

> The problem here is that there are unaligned zones but no pfn_valid()
> implementation that can identify them as you'd have with SPARSEMEM.
> My expectation is that you are using the pfn_valid() implementation
> from asm-generic
>
>     #define pfn_valid(pfn)      ((pfn) < max_mapnr)
>
> which is insufficient in your case.

I am actually using the FLATMEM pfn_valid() implementation in
arch/arm/include/asm/memory.h. It is very similar to the asm-generic
one and has no knowledge of the holes.

> I think it's more likely that the memmap he is accessing has been
> freed and is effectively random data.

I also think this is the case.

> SPARSEMEM would give you an implementation of pfn_valid() that you
> could use here. The choices that spring to mind are:
>
>  1. Reduce MAX_ORDER so the banks are aligned (easiest).

Is it safe to assume that reducing MAX_ORDER will hurt performance?

>  2. Use SPARSEMEM (easy, but not necessarily what you want to do;
>     might waste memory unless you drop MAX_ORDER as well).

We intend to use SPARSEMEM, but we'd also like to maintain FLATMEM
compatibility for some configurations. My guess is that there are other
ARM users that may want this support as well.

>  3. Implement a pfn_valid() that can handle the holes and set
>     CONFIG_HOLES_IN_ZONE so it's called in move_freepages() to deal
>     with the holes (should pass this by someone more familiar with ARM
>     than I am).

This option seems the best to me. We should be able to implement an
ARM-specific pfn_valid() that walks the ARM meminfo struct to ensure
the pfn is not within a hole.

My only concern with this is a comment in __rmqueue_fallback(), after
the call to move_freepages_block(), that states "Claim the whole block
if over half of it is free". Suppose only 1 MB is beyond the bank
limit. That means that over half of the pages of the 4 MB block will be
reported by move_freepages() as free -- but 1 MB of those pages are
invalid. Won't this cause problems if these pages are assumed to be
part of an active block?

It seems like we should have an additional check in
move_freepages_block() with pfn_valid_within() on the last page in the
block (i.e. end_pfn) before calling move_freepages(). If the last page
is not valid, then shouldn't we return 0, as in the zone span check?
This would also skip the extra burden of checking each individual page
when we already know the proposed range is invalid.

Assuming we did return 0 in this case, would that sub-block of pages
ever be usable for anything else, or would it be effectively wasted? If
this memory were wasted, then adjusting MAX_ORDER would have an
advantage in this sense -- ignoring any performance implications.

Thanks,
Michael
* Re: Kernel panic due to page migration accessing memory holes

From: KAMEZAWA Hiroyuki @ 2010-02-19  2:00 UTC (permalink / raw)
To: Michael Bohan
Cc: Mel Gorman, linux-mm, linux-kernel, linux-arm-msm, linux-arm-kernel

On Thu, 18 Feb 2010 17:47:28 -0800
Michael Bohan <mbohan@codeaurora.org> wrote:

> I am actually using the FLATMEM pfn_valid() implementation in
> arch/arm/include/asm/memory.h. It is very similar to the asm-generic
> one and has no knowledge of the holes.

That means, in FLATMEM, memmaps are allocated for [start ... max_pfn].
pfn_valid() doesn't mean "there is memory"; it means "there is a
memmap".

> > I think it's more likely that the memmap he is accessing has been
> > freed and is effectively random data.
>
> I also think this is the case.

Then, please check that free_bootmem() et al. don't free pages in a
memory hole.

> My only concern with this is a comment in __rmqueue_fallback(), after
> the call to move_freepages_block(), that states "Claim the whole block
> if over half of it is free". Suppose only 1 MB is beyond the bank
> limit. That means that over half of the pages of the 4 MB block will
> be reported by move_freepages() as free -- but 1 MB of those pages are
> invalid. Won't this cause problems if these pages are assumed to be
> part of an active block?

The memmap for memory holes should be marked PG_reserved and never be
freed by free_bootmem(). Then, the memmap for memory holes will not be
in the buddy allocator. Again, pfn_valid() just says "there is a
memmap", not "there is a valid page".

> It seems like we should have an additional check in
> move_freepages_block() with pfn_valid_within() on the last page in the
> block (i.e. end_pfn) before calling move_freepages(). If the last page
> is not valid, then shouldn't we return 0, as in the zone span check?
> This would also skip the extra burden of checking each individual page
> when we already know the proposed range is invalid.

You don't need that. Please check why PG_reserved is not set for your
memory holes.

> Assuming we did return 0 in this case, would that sub-block of pages
> ever be usable for anything else, or would it be effectively wasted?
> If this memory were wasted, then adjusting MAX_ORDER would have an
> advantage in this sense -- ignoring any performance implications.

Even if you do that, you have to fix the "someone corrupts memory" or
"someone frees PG_reserved memory" issue anyway.

Thanks,
-Kame
* Re: Kernel panic due to page migration accessing memory holes 2010-02-19 2:00 ` KAMEZAWA Hiroyuki @ 2010-02-19 5:48 ` Michael Bohan 2010-02-19 6:10 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 13+ messages in thread
From: Michael Bohan @ 2010-02-19 5:48 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Mel Gorman, linux-mm, linux-kernel, linux-arm-msm, linux-arm-kernel

On 2/18/2010 6:00 PM, KAMEZAWA Hiroyuki wrote:
> memmap for memory holes should be marked as PG_reserved and never be freed
> by free_bootmem(). Then, memmap for memory holes will not be in the buddy allocator.
>
> Again, pfn_valid() just shows "there is memmap", not "there is a valid page"

ARM seems to have been freeing the memmap holes for a long time. I'm pretty sure there would be a lot of pushback if we tried to change that. For example, in my memory map running FLATMEM, I would be consuming an extra ~7 MB of memory if these structures were not freed.

As a compromise, perhaps we could free everything except the first 'pageblock_nr_pages' in a hole? This would guarantee that move_freepages() doesn't dereference any memory that doesn't belong to the memmap -- but still wastes only a relatively small amount of memory. For a 4 MB page block, it should only consume an extra 32 KB per hole in the memory map.

Thanks,
Michael

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Kernel panic due to page migration accessing memory holes 2010-02-19 5:48 ` Michael Bohan @ 2010-02-19 6:10 ` KAMEZAWA Hiroyuki 2010-02-19 8:21 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-19 6:10 UTC (permalink / raw)
To: Michael Bohan
Cc: Mel Gorman, linux-mm, linux-kernel, linux-arm-msm, linux-arm-kernel

On Thu, 18 Feb 2010 21:48:37 -0800 Michael Bohan <mbohan@codeaurora.org> wrote:
> On 2/18/2010 6:00 PM, KAMEZAWA Hiroyuki wrote:
> > memmap for memory holes should be marked as PG_reserved and never be freed
> > by free_bootmem(). Then, memmap for memory holes will not be in the buddy allocator.
> >
> > Again, pfn_valid() just shows "there is memmap", not "there is a valid page"
>
> ARM seems to have been freeing the memmap holes for a long time.

Ouch.

> I'm pretty sure there would be a lot of pushback if we tried to change
> that. For example, in my memory map running FLATMEM, I would be
> consuming an extra ~7 MB of memory if these structures were not freed.
>
> As a compromise, perhaps we could free everything except the first
> 'pageblock_nr_pages' in a hole? This would guarantee that
> move_freepages() doesn't dereference any memory that doesn't belong to the
> memmap -- but still wastes only a relatively small amount of memory. For
> a 4 MB page block, it should only consume an extra 32 KB per hole in the
> memory map.

No. Even if you do that, you have to implement pfn_valid() so that it returns the correct value, i.e. "pfn_valid() returns true if there is memmap." Otherwise, many things will go bad.

You have 2 or 3 ways.

1. re-implement pfn_valid() so that it returns the correct value.
   Maybe not difficult, but please take care of defining CONFIG_HOLES_IN_.... etc.

2. use DISCONTIGMEM and treat each bank as a NUMA node.
   There will be no waste for memmap, but there are the other complications of CONFIG_NUMA.

3. use SPARSEMEM.
   You even have 2 choices here.
   a - Set your MAX_ORDER and SECTION_SIZE to proper values.
   b - waste some amount of memory for memmap on the edge of a section
       (and don't free the memmap for the edge).

Thanks,
-Kame

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Kernel panic due to page migration accessing memory holes 2010-02-19 6:10 ` KAMEZAWA Hiroyuki @ 2010-02-19 8:21 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-19 8:21 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Michael Bohan, Mel Gorman, linux-mm, linux-kernel, linux-arm-msm, linux-arm-kernel

On Fri, 19 Feb 2010 15:10:12 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 1. re-implement pfn_valid() so that it returns the correct value.
>    Maybe not difficult, but please take care of defining CONFIG_HOLES_IN_.... etc.
>
> 2. use DISCONTIGMEM and treat each bank as a NUMA node.
>    There will be no waste for memmap, but there are the other complications of CONFIG_NUMA.
>
> 3. use SPARSEMEM.
>    You even have 2 choices here.
>    a - Set your MAX_ORDER and SECTION_SIZE to proper values.
>    b - waste some amount of memory for memmap on the edge of a section
>        (and don't free the memmap for the edge).

I read ARM's code briefly. In 2.6.32, I think (1) is already implemented, as:

==
#ifndef CONFIG_SPARSEMEM
int pfn_valid(unsigned long pfn)
{
	struct meminfo *mi = &meminfo;
	unsigned int left = 0, right = mi->nr_banks;

	do {
		unsigned int mid = (right + left) / 2;
		struct membank *bank = &mi->bank[mid];

		if (pfn < bank_pfn_start(bank))
			right = mid;
		else if (pfn >= bank_pfn_end(bank))
			left = mid + 1;
		else
			return 1;
	} while (left < right);
	return 0;
}
EXPORT_SYMBOL(pfn_valid);
==

So, what you should do is upgrade to 2.6.32 or backport this one. See this:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b7cfda9fc3d7aa60cffab5367f2a72a4a70060cd

Thanks,
-Kame

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Kernel panic due to page migration accessing memory holes 2010-02-19 1:47 ` Michael Bohan 2010-02-19 2:00 ` KAMEZAWA Hiroyuki @ 2010-02-19 8:30 ` Russell King - ARM Linux 2010-02-19 13:48 ` Mel Gorman 2 siblings, 0 replies; 13+ messages in thread
From: Russell King - ARM Linux @ 2010-02-19 8:30 UTC (permalink / raw)
To: Michael Bohan
Cc: Mel Gorman, linux-arm-kernel, linux-mm, linux-kernel, KAMEZAWA Hiroyuki, linux-arm-msm

On Thu, Feb 18, 2010 at 05:47:28PM -0800, Michael Bohan wrote:
> I am actually using the FLATMEM pfn_valid implementation in
> arch/arm/include/asm/memory.h. This one is very similar to the
> asm-generic one, and has no knowledge of the holes.

Later kernels have a pfn_valid() which does have hole functionality.

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Kernel panic due to page migration accessing memory holes 2010-02-19 1:47 ` Michael Bohan 2010-02-19 2:00 ` KAMEZAWA Hiroyuki 2010-02-19 8:30 ` Russell King - ARM Linux @ 2010-02-19 13:48 ` Mel Gorman 2 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2010-02-19 13:48 UTC (permalink / raw)
To: Michael Bohan
Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, linux-arm-msm, linux-arm-kernel

On Thu, Feb 18, 2010 at 05:47:28PM -0800, Michael Bohan wrote:
> On 2/18/2010 2:04 AM, Mel Gorman wrote:
>> On Thu, Feb 18, 2010 at 06:36:04PM +0900, KAMEZAWA Hiroyuki wrote:
>>> [Fact]
>>> - There are 2 banks of memory and a memory hole on your machine, as
>>>   0x00200000 - 0x07D00000
>>>   0x40000000 - 0x43000000
>>> - Each bank is in the same zone.
>>> - You use FLATMEM.
>>> - You see panic in move_freepages().
>>> - Your host's MAX_ORDER=11....buddy allocator's alignment is 0x400000.
>>>   Then, it seems the 1st bank is not aligned.
>>
>> It's not, and assumptions are made about it being aligned.
>
> Would it be prudent to have the ARM mm init code detect unaligned,
> discontiguous banks and print a warning message if
> CONFIG_ARCH_HAS_HOLES_MEMORYMODEL is not configured? Should we take it
> a step further and even BUG()?

I guess it wouldn't hurt. I wouldn't get too side-tracked though, as it's not the most important issue here.

>> ARM frees unused portions of memmap to save memory. It's why memmap_valid_within()
>> exists when CONFIG_ARCH_HAS_HOLES_MEMORYMODEL, although previously only
>> reading /proc/pagetypeinfo cared.
>>
>> In that case, the FLATMEM memory map had unexpected holes which "never"
>> happens and that was the workaround. The problem here is that there are
>> unaligned zones but no pfn_valid() implementation that can identify
>> them as you'd have with SPARSEMEM. My expectation is that you are using
>> the pfn_valid() implementation from asm-generic
>>
>> #define pfn_valid(pfn) ((pfn) < max_mapnr)
>>
>> which is insufficient in your case.
> I am actually using the FLATMEM pfn_valid implementation in
> arch/arm/include/asm/memory.h. This one is very similar to the
> asm-generic one, and has no knowledge of the holes.

The same problem applies, then.

>> I think it's more likely that the memmap he is accessing has been
>> freed and is effectively random data.
>
> I also think this is the case.
>
>> SPARSEMEM would give you an implementation of pfn_valid() that you could
>> use here. The choices that spring to mind are;
>>
>> 1. reduce MAX_ORDER so they are aligned (easiest)
>
> Is it safe to assume that reducing MAX_ORDER will hurt performance?

No, it does not necessarily reduce performance. In some circumstances it might even help, although I wouldn't chase after it. Downside one is that some hash tables might get hurt if you have a very large amount of memory (look for "hash table entries:" in dmesg after booting to see what order is being used). Downside two is that if some drivers require large contiguous memory early in boot, they might be hurt by MAX_ORDER being lower. If you require CONFIG_HUGETLB_PAGE, it might not be possible to reduce MAX_ORDER, depending on the size of the huge page.

>> 2. use SPARSEMEM (easy, but not necessarily what you want to do, might
>>    waste memory unless you drop MAX_ORDER as well)
>
> We intend to use SPARSEMEM, but we'd also like to maintain FLATMEM
> compatibility for some configurations. My guess is that there are other
> ARM users that may want this support as well.
>
>> 3. implement a pfn_valid() that can handle the holes and set
>>    CONFIG_HOLES_IN_ZONE so it's called in move_freepages() to
>>    deal with the holes (should pass this by someone more familiar
>>    with ARM than I)
>
> This option seems the best to me. We should be able to implement an ARM
> specific pfn_valid() that walks the ARM meminfo struct to ensure the pfn
> is not within a hole.

Be sure to check your performance before and after.
pfn_valid_within() is used in a fair few places, and you are likely enabling it.

> My only concern with this is a comment in __rmqueue_fallback() after
> calling move_freepages_block() that states "Claim the whole block if
> over half of it is free". Suppose only 1 MB is beyond the bank limit.
> That means that over half of the pages of the 4 MB block will be
> reported by move_freepages() as free -- but 1 MB of those pages are
> invalid. Won't this cause problems if these pages are assumed to be
> part of an active block?

The only operation taking place there is updating a bitmap, so I doubt you'll hit snags there.

> It seems like we should have an additional check in
> move_freepages_block() with pfn_valid_within() to check the last page in
> the block (e.g. end_pfn) before calling move_freepages(). If the
> last page is not valid, then shouldn't we return 0 as in the zone
> span check? This will also skip the extra burden of checking each
> individual page, when we already know the proposed range is invalid.

You don't know where the holes are going to be, so checking each page is paranoid rather than making assumptions about where architectures put holes.

> Assuming we did return 0 in this case, would that sub-block of pages
> ever be usable for anything else, or would it be effectively wasted?

They're still usable.

> If this memory were wasted, then adjusting MAX_ORDER would have an
> advantage in this sense -- ignoring any performance implications.

--
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Kernel panic due to page migration accessing memory holes 2010-02-18 0:45 Kernel panic due to page migration accessing memory holes Michael Bohan 2010-02-18 1:03 ` KAMEZAWA Hiroyuki @ 2010-02-18 8:53 ` Russell King - ARM Linux 1 sibling, 0 replies; 13+ messages in thread
From: Russell King - ARM Linux @ 2010-02-18 8:53 UTC (permalink / raw)
To: Michael Bohan; +Cc: linux-mm, linux-arm-msm, linux-kernel, linux-arm-kernel

On Wed, Feb 17, 2010 at 04:45:54PM -0800, Michael Bohan wrote:
> I have encountered a kernel panic on the ARM/msm platform in the mm
> migration code on 2.6.29. My memory configuration has two discontiguous
> banks per our ATAG definition. These banks end up on addresses that
> are 1 MB aligned. I am using FLATMEM (not SPARSEMEM), but my
> understanding is that SPARSEMEM should not be necessary to support this
> configuration. Please correct me if I'm wrong.

Make sure you have ARCH_HAS_HOLES_MEMORYMODEL enabled.

^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2010-02-19 13:48 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed -- links below jump to the message on this page --)
2010-02-18  0:45 Kernel panic due to page migration accessing memory holes Michael Bohan
2010-02-18  1:03 ` KAMEZAWA Hiroyuki
2010-02-18  8:22 ` Michael Bohan
2010-02-18  9:36 ` KAMEZAWA Hiroyuki
2010-02-18 10:04 ` Mel Gorman
2010-02-19  1:47 ` Michael Bohan
2010-02-19  2:00 ` KAMEZAWA Hiroyuki
2010-02-19  5:48 ` Michael Bohan
2010-02-19  6:10 ` KAMEZAWA Hiroyuki
2010-02-19  8:21 ` KAMEZAWA Hiroyuki
2010-02-19  8:30 ` Russell King - ARM Linux
2010-02-19 13:48 ` Mel Gorman
2010-02-18  8:53 ` Russell King - ARM Linux
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).