* [Question] race condition in mm/page_alloc.c regarding page->lru?
@ 2010-04-01  4:05 TAO HU
  2010-04-02  3:51 ` TAO HU
  0 siblings, 1 reply; 19+ messages in thread
From: TAO HU @ 2010-04-01  4:05 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Ye Yuan.Bo-A22116, Chang Qing-A21550

Hi, all

We got a panic on our ARM (OMAP) based hardware.
Our code is based on the 2.6.29 kernel (the last commit for mm/page_alloc.c is
cc2559bccc72767cb446f79b071d96c30c26439b).

After checking vmlinux, the crash appears to happen while walking pcp->list in
buffered_rmqueue() of mm/page_alloc.c.
The faulting address "00100100" is LIST_POISON1, which in my view suggests a
race condition between list_add() and list_del().
However, we have not yet found a locking problem regarding page.lru.

Are there any known race conditions in mm/page_alloc.c?
Any other hints would be highly appreciated.

	/* Find a page of the appropriate migrate type */
	if (cold) {
		... ...
	} else {
		list_for_each_entry(page, &pcp->list, lru)
			if (page_private(page) == migratetype)
				break;
	}

<1>[120898.805267] Unable to handle kernel paging request at virtual address 00100100
<1>[120898.805633] pgd = c1560000
<1>[120898.805786] [00100100] *pgd=897b3031, *pte=00000000, *ppte=00000000
<4>[120898.806457] Internal error: Oops: 17 [#1] PREEMPT
... ...
<4>[120898.807861] CPU: 0 Not tainted (2.6.29-omap1 #1)
<4>[120898.808044] PC is at get_page_from_freelist+0x1d0/0x4b0
<4>[120898.808227] LR is at get_page_from_freelist+0xc8/0x4b0
<4>[120898.808563] pc : [<c00a600c>] lr : [<c00a5f04>] psr: 800000d3
<4>[120898.808563] sp : c49fbd18 ip : 00000000 fp : c49fbd74
<4>[120898.809020] r10: 00000000 r9 : 001000e8 r8 : 00000002
<4>[120898.809204] r7 : 001200d2 r6 : 60000053 r5 : c0507c4c r4 : c49fa000
<4>[120898.809509] r3 : 001000e8 r2 : 00100100 r1 : c0507c6c r0 : 00000001
<4>[120898.809844] Flags: Nzcv IRQs off FIQs off Mode SVC_32 ISA ARM Segment kernel
<4>[120898.810028] Control: 10c5387d Table: 82160019 DAC: 00000017
<4>[120898.948425] Backtrace:
<4>[120898.948760] [<c00a5e3c>] (get_page_from_freelist+0x0/0x4b0) from [<c00a6398>] (__alloc_pages_internal+0xac/0x3e8)
<4>[120898.949554] [<c00a62ec>] (__alloc_pages_internal+0x0/0x3e8) from [<c00b461c>] (handle_mm_fault+0x16c/0xbac)
<4>[120898.950347] [<c00b44b0>] (handle_mm_fault+0x0/0xbac) from [<c00b51d0>] (__get_user_pages+0x174/0x2b4)
<4>[120898.951019] [<c00b505c>] (__get_user_pages+0x0/0x2b4) from [<c00b534c>] (get_user_pages+0x3c/0x44)
<4>[120898.951812] [<c00b5310>] (get_user_pages+0x0/0x44) from [<c00caf9c>] (get_arg_page+0x50/0xa4)
<4>[120898.952636] [<c00caf4c>] (get_arg_page+0x0/0xa4) from [<c00cb1ec>] (copy_strings+0x108/0x210)
<4>[120898.953430] r7:beffffe4 r6:00000ffc r5:00000000 r4:00000018
<4>[120898.954223] [<c00cb0e4>] (copy_strings+0x0/0x210) from [<c00cb330>] (copy_strings_kernel+0x3c/0x74)
<4>[120898.955047] [<c00cb2f4>] (copy_strings_kernel+0x0/0x74) from [<c00cc778>] (do_execve+0x18c/0x2b0)
<4>[120898.955841] r5:0001e240 r4:0001e224
<4>[120898.956329] [<c00cc5ec>] (do_execve+0x0/0x2b0) from [<c00400e4>] (sys_execve+0x3c/0x5c)
<4>[120898.957153] [<c00400a8>] (sys_execve+0x0/0x5c) from [<c003ce80>] (ret_fast_syscall+0x0/0x2c)
<4>[120898.957946] r7:0000000b r6:0001e270 r5:00000000 r4:0001d580
<4>[120898.958740] Code: e1530008 0a000006 e2429018 e1a03009 (e5b32018)

-- 
Best Regards
Hu Tao

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread
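The link between the faulting address and list poisoning can be reproduced in user space. The following is a minimal sketch, not kernel code: struct fake_page and next_entry_addr() are hypothetical, and placing lru at offset 0x18 is an assumption taken from the register dump (r3 = 001000e8 is exactly 0x18 below the faulting address 00100100). The poison constants are the ones the kernel's list_del() writes into a removed entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Poison values written by list_del() into a removed entry. */
#define LIST_POISON1 ((struct list_head *)0x00100100)
#define LIST_POISON2 ((struct list_head *)0x00200200)

/* Hypothetical stand-in for struct page; only the lru offset matters. */
struct fake_page {
	unsigned char pad[0x18];	/* assumption: lru at offset 0x18 */
	struct list_head lru;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert item right after head, as list_add() does. */
static void list_add(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* Unlink item and poison its pointers, as list_del() does. */
static void list_del(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	item->next = LIST_POISON1;
	item->prev = LIST_POISON2;
}

/*
 * Address that list_for_each_entry() would compute for the entry after
 * pos. The value is only computed here, never dereferenced.
 */
static uintptr_t next_entry_addr(const struct list_head *pos)
{
	return (uintptr_t)pos->next - offsetof(struct fake_page, lru);
}
```

If a walker is still holding an entry that someone else has already passed to list_del(), the next step of the walk computes 0x00100100 - 0x18 = 0x001000e8 as the "page" and then reads its lru.next at 0x00100100, which matches the pattern in the oops above. The sketch says nothing about who deleted the entry.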
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru? 2010-04-01 4:05 [Question] race condition in mm/page_alloc.c regarding page->lru? TAO HU @ 2010-04-02 3:51 ` TAO HU 2010-04-02 5:03 ` KOSAKI Motohiro ` (3 more replies) 0 siblings, 4 replies; 19+ messages in thread From: TAO HU @ 2010-04-02 3:51 UTC (permalink / raw) To: linux-mm Cc: linux-kernel, Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel [-- Attachment #1: Type: text/plain, Size: 5470 bytes --] 2 patches related to page_alloc.c were applied. Does anyone see a connection between the 2 patches and the panic? NOTE: the full patches are attached. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a596bfd..34a29e2 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2551,6 +2551,20 @@ static inline unsigned long wait_table_bits(unsigned long size) #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1)) /* + * Check if a pageblock contains reserved pages + */ +static int pageblock_is_reserved(unsigned long start_pfn) +{ + unsigned long end_pfn = start_pfn + pageblock_nr_pages; + unsigned long pfn; + + for (pfn = start_pfn; pfn < end_pfn; pfn++) + if (PageReserved(pfn_to_page(pfn))) + return 1; + return 0; +} + +/* * Mark a number of pageblocks as MIGRATE_RESERVE. The number * of blocks reserved is based on zone->pages_min. The memory within the * reserve will tend to store contiguous free pages. Setting min_free_kbytes @@ -2579,7 +2593,7 @@ static void setup_zone_migrate_reserve(struct zone *zone) continue; /* Blocks with reserved pages will never free, skip them. 
*/ - if (PageReserved(page)) + if (pageblock_is_reserved(pfn)) continue; block_migratetype = get_pageblock_migratetype(page); -- 1.5.4.3 diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5c44ed4..a596bfd 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -119,6 +119,7 @@ static char * const zone_names[MAX_NR_ZONES] = { }; int min_free_kbytes = 1024; +int min_free_order_shift = 1; unsigned long __meminitdata nr_kernel_pages; unsigned long __meminitdata nr_all_pages; @@ -1256,7 +1257,7 @@ int zone_watermark_ok(struct zone *z, int order, unsigned long mark, free_pages -= z->free_area[o].nr_free << o; /* Require fewer higher order pages to be free */ - min >>= 1; + min >>= min_free_order_shift; if (free_pages <= min) return 0; -- On Thu, Apr 1, 2010 at 12:05 PM, TAO HU <tghk48@motorola.com> wrote: > Hi, all > > We got a panic on our ARM (OMAP) based HW. > Our code is based on 2.6.29 kernel (last commit for mm/page_alloc.c is > cc2559bccc72767cb446f79b071d96c30c26439b) > > It appears to crash while going through pcp->list in > buffered_rmqueue() of mm/page_alloc.c after checking vmlinux. > "00100100" implies LIST_POISON1 that suggests a race condition between > list_add() and list_del() in my personal view. > However we not yet figure out locking problem regarding page.lru. > > Any known issues about race condition in mm/page_alloc.c? > And other hints are highly appreciated. > > /* Find a page of the appropriate migrate type */ > if (cold) { > ... ... > } else { > list_for_each_entry(page, &pcp->list, lru) > if (page_private(page) == migratetype) > break; > } > > <1>[120898.805267] Unable to handle kernel paging request at virtual > address 00100100 > <1>[120898.805633] pgd = c1560000 > <1>[120898.805786] [00100100] *pgd=897b3031, *pte=00000000, *ppte=00000000 > <4>[120898.806457] Internal error: Oops: 17 [#1] PREEMPT > ... ... 
> <4>[120898.807861] CPU: 0 Not tainted (2.6.29-omap1 #1) > <4>[120898.808044] PC is at get_page_from_freelist+0x1d0/0x4b0 > <4>[120898.808227] LR is at get_page_from_freelist+0xc8/0x4b0 > <4>[120898.808563] pc : [<c00a600c>] lr : [<c00a5f04>] psr: 800000d3 > <4>[120898.808563] sp : c49fbd18 ip : 00000000 fp : c49fbd74 > <4>[120898.809020] r10: 00000000 r9 : 001000e8 r8 : 00000002 > <4>[120898.809204] r7 : 001200d2 r6 : 60000053 r5 : c0507c4c r4 : c49fa000 > <4>[120898.809509] r3 : 001000e8 r2 : 00100100 r1 : c0507c6c r0 : 00000001 > <4>[120898.809844] Flags: Nzcv IRQs off FIQs off Mode SVC_32 ISA > ARM Segment kernel > <4>[120898.810028] Control: 10c5387d Table: 82160019 DAC: 00000017 > <4>[120898.948425] Backtrace: > <4>[120898.948760] [<c00a5e3c>] (get_page_from_freelist+0x0/0x4b0) > from [<c00a6398>] (__alloc_pages_internal+0xac/0x3e8) > <4>[120898.949554] [<c00a62ec>] (__alloc_pages_internal+0x0/0x3e8) > from [<c00b461c>] (handle_mm_fault+0x16c/0xbac) > <4>[120898.950347] [<c00b44b0>] (handle_mm_fault+0x0/0xbac) from > [<c00b51d0>] (__get_user_pages+0x174/0x2b4) > <4>[120898.951019] [<c00b505c>] (__get_user_pages+0x0/0x2b4) from > [<c00b534c>] (get_user_pages+0x3c/0x44) > <4>[120898.951812] [<c00b5310>] (get_user_pages+0x0/0x44) from > [<c00caf9c>] (get_arg_page+0x50/0xa4) > <4>[120898.952636] [<c00caf4c>] (get_arg_page+0x0/0xa4) from > [<c00cb1ec>] (copy_strings+0x108/0x210) > <4>[120898.953430] r7:beffffe4 r6:00000ffc r5:00000000 r4:00000018 > <4>[120898.954223] [<c00cb0e4>] (copy_strings+0x0/0x210) from > [<c00cb330>] (copy_strings_kernel+0x3c/0x74) > <4>[120898.955047] [<c00cb2f4>] (copy_strings_kernel+0x0/0x74) from > [<c00cc778>] (do_execve+0x18c/0x2b0) > <4>[120898.955841] r5:0001e240 r4:0001e224 > <4>[120898.956329] [<c00cc5ec>] (do_execve+0x0/0x2b0) from > [<c00400e4>] (sys_execve+0x3c/0x5c) > <4>[120898.957153] [<c00400a8>] (sys_execve+0x0/0x5c) from > [<c003ce80>] (ret_fast_syscall+0x0/0x2c) > <4>[120898.957946] r7:0000000b r6:0001e270 r5:00000000 
r4:0001d580 > <4>[120898.958740] Code: e1530008 0a000006 e2429018 e1a03009 (e5b32018) > > > > -- > Best Regards > Hu Tao > [-- Attachment #2: 0001-mm-Add-min_free_order_shift-tunable.patch --] [-- Type: application/octet-stream, Size: 2105 bytes --] From d620f695290e4ffb1586420ba1dbbb5b2c8c075d Mon Sep 17 00:00:00 2001 From: =?utf-8?q?Arve=20Hj=C3=B8nnev=C3=A5g?= <arve@android.com> Date: Tue, 17 Feb 2009 14:51:02 -0800 Subject: [PATCH] mm: Add min_free_order_shift tunable. MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit By default the kernel tries to keep half as much memory free at each order as it does for one order below. This can be too agressive when running without swap. Signed-off-by: Arve Hjønnevåg <arve@android.com> --- kernel/sysctl.c | 9 +++++++++ mm/page_alloc.c | 3 ++- 2 files changed, 11 insertions(+), 1 deletions(-) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index c5ef44f..0e3d9aa 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -76,6 +76,7 @@ extern int suid_dumpable; extern char core_pattern[]; extern int pid_max; extern int min_free_kbytes; +extern int min_free_order_shift; extern int pid_max_min, pid_max_max; extern int sysctl_drop_caches; extern int percpu_pagelist_fraction; @@ -1097,6 +1098,14 @@ static struct ctl_table vm_table[] = { .extra1 = &zero, }, { + .ctl_name = CTL_UNNUMBERED, + .procname = "min_free_order_shift", + .data = &min_free_order_shift, + .maxlen = sizeof(min_free_order_shift), + .mode = 0644, + .proc_handler = &proc_dointvec + }, + { .ctl_name = VM_PERCPU_PAGELIST_FRACTION, .procname = "percpu_pagelist_fraction", .data = &percpu_pagelist_fraction, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5c44ed4..a596bfd 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -119,6 +119,7 @@ static char * const zone_names[MAX_NR_ZONES] = { }; int min_free_kbytes = 1024; +int min_free_order_shift = 1; unsigned long __meminitdata nr_kernel_pages; unsigned long __meminitdata 
nr_all_pages; @@ -1256,7 +1257,7 @@ int zone_watermark_ok(struct zone *z, int order, unsigned long mark, free_pages -= z->free_area[o].nr_free << o; /* Require fewer higher order pages to be free */ - min >>= 1; + min >>= min_free_order_shift; if (free_pages <= min) return 0; -- 1.5.4.3 [-- Attachment #3: 0002-mm-Check-if-any-page-in-a-pageblock-is-reserved-bef.patch --] [-- Type: application/octet-stream, Size: 1747 bytes --] From a4eb204a8029320c2dd748daf4f51fd48d337c3d Mon Sep 17 00:00:00 2001 From: =?utf-8?q?Arve=20Hj=C3=B8nnev=C3=A5g?= <arve@android.com> Date: Wed, 18 Mar 2009 17:27:31 -0700 Subject: [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even though it only had a few free pages. This in turn caused no contiguous memory to be reserved and frequent kswapd wakeups that emptied the caches to get more contiguous memory. --- mm/page_alloc.c | 16 +++++++++++++++- 1 files changed, 15 insertions(+), 1 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a596bfd..34a29e2 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2551,6 +2551,20 @@ static inline unsigned long wait_table_bits(unsigned long size) #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1)) /* + * Check if a pageblock contains reserved pages + */ +static int pageblock_is_reserved(unsigned long start_pfn) +{ + unsigned long end_pfn = start_pfn + pageblock_nr_pages; + unsigned long pfn; + + for (pfn = start_pfn; pfn < end_pfn; pfn++) + if (PageReserved(pfn_to_page(pfn))) + return 1; + return 0; +} + +/* * Mark a number of pageblocks as MIGRATE_RESERVE. The number * of blocks reserved is based on zone->pages_min. The memory within the * reserve will tend to store contiguous free pages. Setting min_free_kbytes @@ -2579,7 +2593,7 @@ static void setup_zone_migrate_reserve(struct zone *zone) continue; /* Blocks with reserved pages will never free, skip them. 
*/ - if (PageReserved(page)) + if (pageblock_is_reserved(pfn)) continue; block_migratetype = get_pageblock_migratetype(page); -- 1.5.4.3 ^ permalink raw reply related [flat|nested] 19+ messages in thread
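What the min_free_order_shift patch changes in the order loop of zone_watermark_ok() can be modelled in user space. This is a simplified sketch under stated assumptions, not the kernel function: struct zone_model, zone_free_pages() and watermark_ok() are hypothetical names, MAX_ORDER is set to 11 only to size the array, and the lowmem_reserve and allocation-flag adjustments are left out.

```c
#include <assert.h>

#define MAX_ORDER 11	/* assumption: only used to size the array */

/* Hypothetical, stripped-down stand-in for struct zone. */
struct zone_model {
	unsigned long nr_free[MAX_ORDER];	/* free blocks per order */
};

/* Total number of free pages across all orders. */
static long zone_free_pages(const struct zone_model *z)
{
	long total = 0;
	int o;

	for (o = 0; o < MAX_ORDER; o++)
		total += (long)(z->nr_free[o] << o);
	return total;
}

/*
 * Simplified model of the watermark check. With shift == 1 this is the
 * stock behaviour (halve the requirement per order); larger shifts
 * relax the requirement for higher orders much faster.
 */
static int watermark_ok(const struct zone_model *z, int order,
			unsigned long mark, int min_free_order_shift)
{
	long min = (long)mark;
	long free_pages;
	int o;

	assert(order >= 0 && order < MAX_ORDER);
	assert(min_free_order_shift >= 0 && min_free_order_shift < 16);

	free_pages = zone_free_pages(z) - (1L << order) + 1;
	if (free_pages <= min)
		return 0;

	for (o = 0; o < order; o++) {
		/* Blocks of this order are too small for the request. */
		free_pages -= (long)(z->nr_free[o] << o);

		/* Require fewer higher-order pages to be free. */
		min >>= min_free_order_shift;

		if (free_pages <= min)
			return 0;
	}
	return 1;
}
```

With nr_free = {100, 20, 5, 1} and mark = 64, an order-3 request fails the check with a shift of 1 (only one page remains at order 2 and above, against a requirement of 8) and passes with a shift of 4, where the requirement has already dropped to 0 by then.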
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
  2010-04-02  3:51 ` TAO HU
@ 2010-04-02  5:03   ` KOSAKI Motohiro
  2010-04-02  5:19     ` TAO HU
  2010-04-02  9:48     ` Mel Gorman
  2010-04-02  5:04   ` [Question] race condition in mm/page_alloc.c regarding page->lru? KAMEZAWA Hiroyuki
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2010-04-02  5:03 UTC (permalink / raw)
  To: TAO HU
  Cc: kosaki.motohiro, linux-mm, linux-kernel, Ye Yuan.Bo-A22116,
	Chang Qing-A21550, linux-arm-kernel, Mel Gorman

Cc to Mel,

> 2 patches related to page_alloc.c were applied.
> Does anyone see a connection between the 2 patches and the panic?
> NOTE: the full patches are attached.

I think the two patches you attached are completely unrelated to your problem.

"mm: Add min_free_order_shift tunable." does not seem to make sense to me.
I don't think this patch needs to be merged.

But "mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE"
handles strange hardware correctly, I think. If Mel acks it, I hope it gets merged.
Mel, can we hear your opinion?

>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a596bfd..34a29e2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2551,6 +2551,20 @@ static inline unsigned long
> wait_table_bits(unsigned long size)
>  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
>
>  /*
> + * Check if a pageblock contains reserved pages
> + */
> +static int pageblock_is_reserved(unsigned long start_pfn)
> +{
> +	unsigned long end_pfn = start_pfn + pageblock_nr_pages;
> +	unsigned long pfn;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++)
> +		if (PageReserved(pfn_to_page(pfn)))
> +			return 1;
> +	return 0;
> +}
> +
> +/*
>   * Mark a number of pageblocks as MIGRATE_RESERVE. The number
>   * of blocks reserved is based on zone->pages_min. The memory within the
>   * reserve will tend to store contiguous free pages.
Setting min_free_kbytes > @@ -2579,7 +2593,7 @@ static void setup_zone_migrate_reserve(struct zone *zone) > continue; > > /* Blocks with reserved pages will never free, skip them. */ > - if (PageReserved(page)) > + if (pageblock_is_reserved(pfn)) > continue; > > block_migratetype = get_pageblock_migratetype(page); > -- > 1.5.4.3 > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 5c44ed4..a596bfd 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -119,6 +119,7 @@ static char * const zone_names[MAX_NR_ZONES] = { > }; > > int min_free_kbytes = 1024; > +int min_free_order_shift = 1; > > unsigned long __meminitdata nr_kernel_pages; > unsigned long __meminitdata nr_all_pages; > @@ -1256,7 +1257,7 @@ int zone_watermark_ok(struct zone *z, int order, > unsigned long mark, > free_pages -= z->free_area[o].nr_free << o; > > /* Require fewer higher order pages to be free */ > - min >>= 1; > + min >>= min_free_order_shift; > > if (free_pages <= min) > return 0; > -- > > > On Thu, Apr 1, 2010 at 12:05 PM, TAO HU <tghk48@motorola.com> wrote: > > Hi, all > > > > We got a panic on our ARM (OMAP) based HW. > > Our code is based on 2.6.29 kernel (last commit for mm/page_alloc.c is > > cc2559bccc72767cb446f79b071d96c30c26439b) > > > > It appears to crash while going through pcp->list in > > buffered_rmqueue() of mm/page_alloc.c after checking vmlinux. > > "00100100" implies LIST_POISON1 that suggests a race condition between > > list_add() and list_del() in my personal view. > > However we not yet figure out locking problem regarding page.lru. > > > > Any known issues about race condition in mm/page_alloc.c? > > And other hints are highly appreciated. > > > > /* Find a page of the appropriate migrate type */ > > if (cold) { > > ... ... 
> > } else { > > list_for_each_entry(page, &pcp->list, lru) > > if (page_private(page) == migratetype) > > break; > > } > > > > <1>[120898.805267] Unable to handle kernel paging request at virtual > > address 00100100 > > <1>[120898.805633] pgd = c1560000 > > <1>[120898.805786] [00100100] *pgd=897b3031, *pte=00000000, *ppte=00000000 > > <4>[120898.806457] Internal error: Oops: 17 [#1] PREEMPT > > ... ... > > <4>[120898.807861] CPU: 0 Not tainted (2.6.29-omap1 #1) > > <4>[120898.808044] PC is at get_page_from_freelist+0x1d0/0x4b0 > > <4>[120898.808227] LR is at get_page_from_freelist+0xc8/0x4b0 > > <4>[120898.808563] pc : [<c00a600c>] lr : [<c00a5f04>] psr: 800000d3 > > <4>[120898.808563] sp : c49fbd18 ip : 00000000 fp : c49fbd74 > > <4>[120898.809020] r10: 00000000 r9 : 001000e8 r8 : 00000002 > > <4>[120898.809204] r7 : 001200d2 r6 : 60000053 r5 : c0507c4c r4 : c49fa000 > > <4>[120898.809509] r3 : 001000e8 r2 : 00100100 r1 : c0507c6c r0 : 00000001 > > <4>[120898.809844] Flags: Nzcv IRQs off FIQs off Mode SVC_32 ISA > > ARM Segment kernel > > <4>[120898.810028] Control: 10c5387d Table: 82160019 DAC: 00000017 > > <4>[120898.948425] Backtrace: > > <4>[120898.948760] [<c00a5e3c>] (get_page_from_freelist+0x0/0x4b0) > > from [<c00a6398>] (__alloc_pages_internal+0xac/0x3e8) > > <4>[120898.949554] [<c00a62ec>] (__alloc_pages_internal+0x0/0x3e8) > > from [<c00b461c>] (handle_mm_fault+0x16c/0xbac) > > <4>[120898.950347] [<c00b44b0>] (handle_mm_fault+0x0/0xbac) from > > [<c00b51d0>] (__get_user_pages+0x174/0x2b4) > > <4>[120898.951019] [<c00b505c>] (__get_user_pages+0x0/0x2b4) from > > [<c00b534c>] (get_user_pages+0x3c/0x44) > > <4>[120898.951812] [<c00b5310>] (get_user_pages+0x0/0x44) from > > [<c00caf9c>] (get_arg_page+0x50/0xa4) > > <4>[120898.952636] [<c00caf4c>] (get_arg_page+0x0/0xa4) from > > [<c00cb1ec>] (copy_strings+0x108/0x210) > > <4>[120898.953430] r7:beffffe4 r6:00000ffc r5:00000000 r4:00000018 > > <4>[120898.954223] [<c00cb0e4>] (copy_strings+0x0/0x210) from > 
> [<c00cb330>] (copy_strings_kernel+0x3c/0x74) > > <4>[120898.955047] [<c00cb2f4>] (copy_strings_kernel+0x0/0x74) from > > [<c00cc778>] (do_execve+0x18c/0x2b0) > > <4>[120898.955841] r5:0001e240 r4:0001e224 > > <4>[120898.956329] [<c00cc5ec>] (do_execve+0x0/0x2b0) from > > [<c00400e4>] (sys_execve+0x3c/0x5c) > > <4>[120898.957153] [<c00400a8>] (sys_execve+0x0/0x5c) from > > [<c003ce80>] (ret_fast_syscall+0x0/0x2c) > > <4>[120898.957946] r7:0000000b r6:0001e270 r5:00000000 r4:0001d580 > > <4>[120898.958740] Code: e1530008 0a000006 e2429018 e1a03009 (e5b32018) > > > > > > > > -- > > Best Regards > > Hu Tao > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
  2010-04-02  5:03   ` KOSAKI Motohiro
@ 2010-04-02  5:19     ` TAO HU
  2010-04-02  9:48     ` Mel Gorman
  1 sibling, 0 replies; 19+ messages in thread
From: TAO HU @ 2010-04-02  5:19 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: TAO HU, linux-mm, linux-kernel, Ye Yuan.Bo-A22116,
	Chang Qing-A21550, linux-arm-kernel, Mel Gorman, arve

Hi, KOSAKI Motohiro

I'm glad to know you're considering the patch "mm: Check if any ...",
though that was not my original purpose :)
Cc: Arve Hjønnevåg, who is the author.

On Fri, Apr 2, 2010 at 1:03 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> Cc to Mel,
>
>> 2 patches related to page_alloc.c were applied.
>> Does anyone see a connection between the 2 patches and the panic?
>> NOTE: the full patches are attached.
>
> I think your attached two patches are perfectly unrelated your problem.
>
> "mm: Add min_free_order_shift tunable." seems makes zero sense. I don't think this patch
> need to be merge.
>
> but "mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE"
> treat strange hardware correctly, I think. If Mel ack this, I hope merge it.
> Mel, Can we hear your opinion?
>
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index a596bfd..34a29e2 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2551,6 +2551,20 @@ static inline unsigned long
>> wait_table_bits(unsigned long size)
>>  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
>>
>>  /*
>> + * Check if a pageblock contains reserved pages
>> + */
>> +static int pageblock_is_reserved(unsigned long start_pfn)
>> +{
>> +	unsigned long end_pfn = start_pfn + pageblock_nr_pages;
>> +	unsigned long pfn;
>> +
>> +	for (pfn = start_pfn; pfn < end_pfn; pfn++)
>> +		if (PageReserved(pfn_to_page(pfn)))
>> +			return 1;
>> +	return 0;
>> +}
>> +
>> +/*
>>   * Mark a number of pageblocks as MIGRATE_RESERVE. The number
>>   * of blocks reserved is based on zone->pages_min.
The memory within the >> * reserve will tend to store contiguous free pages. Setting min_free_kbytes >> @@ -2579,7 +2593,7 @@ static void setup_zone_migrate_reserve(struct zone *zone) >> continue; >> >> /* Blocks with reserved pages will never free, skip them. */ >> - if (PageReserved(page)) >> + if (pageblock_is_reserved(pfn)) >> continue; >> >> block_migratetype = get_pageblock_migratetype(page); >> -- >> 1.5.4.3 >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 5c44ed4..a596bfd 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -119,6 +119,7 @@ static char * const zone_names[MAX_NR_ZONES] = { >> }; >> >> int min_free_kbytes = 1024; >> +int min_free_order_shift = 1; >> >> unsigned long __meminitdata nr_kernel_pages; >> unsigned long __meminitdata nr_all_pages; >> @@ -1256,7 +1257,7 @@ int zone_watermark_ok(struct zone *z, int order, >> unsigned long mark, >> free_pages -= z->free_area[o].nr_free << o; >> >> /* Require fewer higher order pages to be free */ >> - min >>= 1; >> + min >>= min_free_order_shift; >> >> if (free_pages <= min) >> return 0; >> -- >> >> >> On Thu, Apr 1, 2010 at 12:05 PM, TAO HU <tghk48@motorola.com> wrote: >> > Hi, all >> > >> > We got a panic on our ARM (OMAP) based HW. >> > Our code is based on 2.6.29 kernel (last commit for mm/page_alloc.c is >> > cc2559bccc72767cb446f79b071d96c30c26439b) >> > >> > It appears to crash while going through pcp->list in >> > buffered_rmqueue() of mm/page_alloc.c after checking vmlinux. >> > "00100100" implies LIST_POISON1 that suggests a race condition between >> > list_add() and list_del() in my personal view. >> > However we not yet figure out locking problem regarding page.lru. >> > >> > Any known issues about race condition in mm/page_alloc.c? >> > And other hints are highly appreciated. >> > >> > /* Find a page of the appropriate migrate type */ >> > if (cold) { >> > ... ... 
>> > } else { >> > list_for_each_entry(page, &pcp->list, lru) >> > if (page_private(page) == migratetype) >> > break; >> > } >> > >> > <1>[120898.805267] Unable to handle kernel paging request at virtual >> > address 00100100 >> > <1>[120898.805633] pgd = c1560000 >> > <1>[120898.805786] [00100100] *pgd=897b3031, *pte=00000000, *ppte=00000000 >> > <4>[120898.806457] Internal error: Oops: 17 [#1] PREEMPT >> > ... ... >> > <4>[120898.807861] CPU: 0 Not tainted (2.6.29-omap1 #1) >> > <4>[120898.808044] PC is at get_page_from_freelist+0x1d0/0x4b0 >> > <4>[120898.808227] LR is at get_page_from_freelist+0xc8/0x4b0 >> > <4>[120898.808563] pc : [<c00a600c>] lr : [<c00a5f04>] psr: 800000d3 >> > <4>[120898.808563] sp : c49fbd18 ip : 00000000 fp : c49fbd74 >> > <4>[120898.809020] r10: 00000000 r9 : 001000e8 r8 : 00000002 >> > <4>[120898.809204] r7 : 001200d2 r6 : 60000053 r5 : c0507c4c r4 : c49fa000 >> > <4>[120898.809509] r3 : 001000e8 r2 : 00100100 r1 : c0507c6c r0 : 00000001 >> > <4>[120898.809844] Flags: Nzcv IRQs off FIQs off Mode SVC_32 ISA >> > ARM Segment kernel >> > <4>[120898.810028] Control: 10c5387d Table: 82160019 DAC: 00000017 >> > <4>[120898.948425] Backtrace: >> > <4>[120898.948760] [<c00a5e3c>] (get_page_from_freelist+0x0/0x4b0) >> > from [<c00a6398>] (__alloc_pages_internal+0xac/0x3e8) >> > <4>[120898.949554] [<c00a62ec>] (__alloc_pages_internal+0x0/0x3e8) >> > from [<c00b461c>] (handle_mm_fault+0x16c/0xbac) >> > <4>[120898.950347] [<c00b44b0>] (handle_mm_fault+0x0/0xbac) from >> > [<c00b51d0>] (__get_user_pages+0x174/0x2b4) >> > <4>[120898.951019] [<c00b505c>] (__get_user_pages+0x0/0x2b4) from >> > [<c00b534c>] (get_user_pages+0x3c/0x44) >> > <4>[120898.951812] [<c00b5310>] (get_user_pages+0x0/0x44) from >> > [<c00caf9c>] (get_arg_page+0x50/0xa4) >> > <4>[120898.952636] [<c00caf4c>] (get_arg_page+0x0/0xa4) from >> > [<c00cb1ec>] (copy_strings+0x108/0x210) >> > <4>[120898.953430] r7:beffffe4 r6:00000ffc r5:00000000 r4:00000018 >> > <4>[120898.954223] 
[<c00cb0e4>] (copy_strings+0x0/0x210) from >> > [<c00cb330>] (copy_strings_kernel+0x3c/0x74) >> > <4>[120898.955047] [<c00cb2f4>] (copy_strings_kernel+0x0/0x74) from >> > [<c00cc778>] (do_execve+0x18c/0x2b0) >> > <4>[120898.955841] r5:0001e240 r4:0001e224 >> > <4>[120898.956329] [<c00cc5ec>] (do_execve+0x0/0x2b0) from >> > [<c00400e4>] (sys_execve+0x3c/0x5c) >> > <4>[120898.957153] [<c00400a8>] (sys_execve+0x0/0x5c) from >> > [<c003ce80>] (ret_fast_syscall+0x0/0x2c) >> > <4>[120898.957946] r7:0000000b r6:0001e270 r5:00000000 r4:0001d580 >> > <4>[120898.958740] Code: e1530008 0a000006 e2429018 e1a03009 (e5b32018) >> > >> > >> > >> > -- >> > Best Regards >> > Hu Tao >> > > > > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
  2010-04-02  5:03   ` KOSAKI Motohiro
  2010-04-02  5:19     ` TAO HU
@ 2010-04-02  9:48     ` Mel Gorman
  2010-04-03  0:59       ` Arve Hjønnevåg
  1 sibling, 1 reply; 19+ messages in thread
From: Mel Gorman @ 2010-04-02  9:48 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: TAO HU, linux-mm, linux-kernel, Ye Yuan.Bo-A22116,
	Chang Qing-A21550, linux-arm-kernel

On Fri, Apr 02, 2010 at 02:03:23PM +0900, KOSAKI Motohiro wrote:
> Cc to Mel,
>
> > 2 patches related to page_alloc.c were applied.
> > Does anyone see a connection between the 2 patches and the panic?
> > NOTE: the full patches are attached.
>
> I think your attached two patches are perfectly unrelated your problem.
>

Agreed. It's unlikely that there is a race as such in the page
allocator. In buffered_rmqueue(), which you initially asked about, the lists
being manipulated are per-cpu lists. About the only way to corrupt them
is if you had an NMI handler that called the page allocator. I really hope
your platform is not doing anything like that.

A double free of page->lru is a possibility. You could try reproducing
the problem with CONFIG_DEBUG_LIST enabled to see if anything falls out.

> "mm: Add min_free_order_shift tunable." seems makes zero sense. I don't think this patch
> need to be merge.
>

It makes a marginal amount of sense. Basically, what it does is allow
high-order allocations to go much further below their watermarks than is
currently allowed. If the platform in question is doing a lot of high-order
allocations, this patch could be seen to "fix" the problem, but you wouldn't
touch mainline with it with a barge pole. It would be more stable to fix
the drivers to not use high-order allocations or to use a mempool.

It is inconceivable that this patch is related to the problem though.

> but "mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE"
> treat strange hardware correctly, I think. If Mel ack this, I hope merge it.
> Mel, Can we hear your opinion?
>

This patch is interesting and I am surprised it is required. Is it really the
case that page blocks near the start of a zone are dominated by PageReserved
pages but the first one happens to be free? I guess it's conceivable on ARM,
where memmap can be freed at boot time.

There is a theoretical problem with the patch but it is easily resolved.
A PFN walker like this must call pfn_valid_within() before calling
pfn_to_page(). If it does not, it's possible to get complete garbage
for the page, resulting in a bad dereference. In this particular case, it
would be a kernel oops rather than memory corruption though.

If that was fixed, I'd see no problem with Acking the patch.

It is also inconceivable that this patch is related to the problem.

> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index a596bfd..34a29e2 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2551,6 +2551,20 @@ static inline unsigned long
> > wait_table_bits(unsigned long size)
> >  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
> >
> >  /*
> > + * Check if a pageblock contains reserved pages
> > + */
> > +static int pageblock_is_reserved(unsigned long start_pfn)
> > +{
> > +	unsigned long end_pfn = start_pfn + pageblock_nr_pages;
> > +	unsigned long pfn;
> > +
> > +	for (pfn = start_pfn; pfn < end_pfn; pfn++)
> > +		if (PageReserved(pfn_to_page(pfn)))
> > +			return 1;
> > +	return 0;
> > +}
> > +
> > +/*
> >   * Mark a number of pageblocks as MIGRATE_RESERVE. The number
> >   * of blocks reserved is based on zone->pages_min. The memory within the
> >   * reserve will tend to store contiguous free pages. Setting min_free_kbytes
> > @@ -2579,7 +2593,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
> >  			continue;
> >
> >  		/* Blocks with reserved pages will never free, skip them.
*/ > > - if (PageReserved(page)) > > + if (pageblock_is_reserved(pfn)) > > continue; > > > > block_migratetype = get_pageblock_migratetype(page); > > -- > > 1.5.4.3 > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 5c44ed4..a596bfd 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -119,6 +119,7 @@ static char * const zone_names[MAX_NR_ZONES] = { > > }; > > > > int min_free_kbytes = 1024; > > +int min_free_order_shift = 1; > > > > unsigned long __meminitdata nr_kernel_pages; > > unsigned long __meminitdata nr_all_pages; > > @@ -1256,7 +1257,7 @@ int zone_watermark_ok(struct zone *z, int order, > > unsigned long mark, > > free_pages -= z->free_area[o].nr_free << o; > > > > /* Require fewer higher order pages to be free */ > > - min >>= 1; > > + min >>= min_free_order_shift; > > > > if (free_pages <= min) > > return 0; > > -- > > > > > > On Thu, Apr 1, 2010 at 12:05 PM, TAO HU <tghk48@motorola.com> wrote: > > > Hi, all > > > > > > We got a panic on our ARM (OMAP) based HW. > > > Our code is based on 2.6.29 kernel (last commit for mm/page_alloc.c is > > > cc2559bccc72767cb446f79b071d96c30c26439b) > > > > > > It appears to crash while going through pcp->list in > > > buffered_rmqueue() of mm/page_alloc.c after checking vmlinux. > > > "00100100" implies LIST_POISON1 that suggests a race condition between > > > list_add() and list_del() in my personal view. > > > However we not yet figure out locking problem regarding page.lru. > > > > > > Any known issues about race condition in mm/page_alloc.c? > > > And other hints are highly appreciated. > > > > > > /* Find a page of the appropriate migrate type */ > > > if (cold) { > > > ... ... 
> > > } else { > > > list_for_each_entry(page, &pcp->list, lru) > > > if (page_private(page) == migratetype) > > > break; > > > } > > > > > > <1>[120898.805267] Unable to handle kernel paging request at virtual > > > address 00100100 > > > <1>[120898.805633] pgd = c1560000 > > > <1>[120898.805786] [00100100] *pgd=897b3031, *pte=00000000, *ppte=00000000 > > > <4>[120898.806457] Internal error: Oops: 17 [#1] PREEMPT > > > ... ... > > > <4>[120898.807861] CPU: 0 Not tainted (2.6.29-omap1 #1) > > > <4>[120898.808044] PC is at get_page_from_freelist+0x1d0/0x4b0 > > > <4>[120898.808227] LR is at get_page_from_freelist+0xc8/0x4b0 > > > <4>[120898.808563] pc : [<c00a600c>] lr : [<c00a5f04>] psr: 800000d3 > > > <4>[120898.808563] sp : c49fbd18 ip : 00000000 fp : c49fbd74 > > > <4>[120898.809020] r10: 00000000 r9 : 001000e8 r8 : 00000002 > > > <4>[120898.809204] r7 : 001200d2 r6 : 60000053 r5 : c0507c4c r4 : c49fa000 > > > <4>[120898.809509] r3 : 001000e8 r2 : 00100100 r1 : c0507c6c r0 : 00000001 > > > <4>[120898.809844] Flags: Nzcv IRQs off FIQs off Mode SVC_32 ISA > > > ARM Segment kernel > > > <4>[120898.810028] Control: 10c5387d Table: 82160019 DAC: 00000017 > > > <4>[120898.948425] Backtrace: > > > <4>[120898.948760] [<c00a5e3c>] (get_page_from_freelist+0x0/0x4b0) > > > from [<c00a6398>] (__alloc_pages_internal+0xac/0x3e8) > > > <4>[120898.949554] [<c00a62ec>] (__alloc_pages_internal+0x0/0x3e8) > > > from [<c00b461c>] (handle_mm_fault+0x16c/0xbac) > > > <4>[120898.950347] [<c00b44b0>] (handle_mm_fault+0x0/0xbac) from > > > [<c00b51d0>] (__get_user_pages+0x174/0x2b4) > > > <4>[120898.951019] [<c00b505c>] (__get_user_pages+0x0/0x2b4) from > > > [<c00b534c>] (get_user_pages+0x3c/0x44) > > > <4>[120898.951812] [<c00b5310>] (get_user_pages+0x0/0x44) from > > > [<c00caf9c>] (get_arg_page+0x50/0xa4) > > > <4>[120898.952636] [<c00caf4c>] (get_arg_page+0x0/0xa4) from > > > [<c00cb1ec>] (copy_strings+0x108/0x210) > > > <4>[120898.953430] r7:beffffe4 r6:00000ffc r5:00000000 
r4:00000018 > > > <4>[120898.954223] [<c00cb0e4>] (copy_strings+0x0/0x210) from > > > [<c00cb330>] (copy_strings_kernel+0x3c/0x74) > > > <4>[120898.955047] [<c00cb2f4>] (copy_strings_kernel+0x0/0x74) from > > > [<c00cc778>] (do_execve+0x18c/0x2b0) > > > <4>[120898.955841] r5:0001e240 r4:0001e224 > > > <4>[120898.956329] [<c00cc5ec>] (do_execve+0x0/0x2b0) from > > > [<c00400e4>] (sys_execve+0x3c/0x5c) > > > <4>[120898.957153] [<c00400a8>] (sys_execve+0x0/0x5c) from > > > [<c003ce80>] (ret_fast_syscall+0x0/0x2c) > > > <4>[120898.957946] r7:0000000b r6:0001e270 r5:00000000 r4:0001d580 > > > <4>[120898.958740] Code: e1530008 0a000006 e2429018 e1a03009 (e5b32018) > > > > > > > > > > > > -- > > > Best Regards > > > Hu Tao > > > > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
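An illustration of the LIST_POISON1 point in the report quoted above, not code from the thread. It is a minimal userspace sketch, assuming simplified stand-ins for the kernel's list helpers; `list_del_checked` is a hypothetical name that only loosely models the kind of check CONFIG_DEBUG_LIST performs. It shows that list_del() leaves 0x00100100 in the removed entry's next pointer, so a second delete, or a walk that reaches the stale entry, ends up touching that address.

```c
#include <stdint.h>

/* Poison values as defined in include/linux/poison.h for 2.6.29. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

struct list_head {
	struct list_head *next, *prev;
};

/* Make head an empty circular list. */
static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head, in the style of the kernel's list_add(). */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry and poison its pointers, in the style of list_del(). */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/*
 * Hypothetical checked variant: refuse to unlink an entry that already
 * carries a poison value. Returns 0 on success, -1 on a double delete.
 */
static int list_del_checked(struct list_head *entry)
{
	if (entry->next == LIST_POISON1 || entry->prev == LIST_POISON2)
		return -1;
	list_del(entry);
	return 0;
}
```

Calling list_del_checked() twice on the same entry from a small main() returns 0 and then -1, where an unchecked second list_del() would dereference the poison. In the oops, r2 holds 00100100 and r3/r9 hold 001000e8, which is 0x18 lower; that would be consistent with a struct page pointer derived from a poisoned lru.next, assuming lru sits at offset 0x18 in that build.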
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
  2010-04-02 9:48 ` Mel Gorman
@ 2010-04-03 0:59 ` Arve Hjønnevåg
  2010-04-04 22:45 ` KOSAKI Motohiro
  2010-04-05 10:14 ` Mel Gorman
  0 siblings, 2 replies; 19+ messages in thread
From: Arve Hjønnevåg @ 2010-04-03 0:59 UTC
To: Mel Gorman
Cc: KOSAKI Motohiro, TAO HU, linux-mm, linux-kernel, Ye Yuan.Bo-A22116,
    Chang Qing-A21550, linux-arm-kernel

On Fri, Apr 2, 2010 at 2:48 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> On Fri, Apr 02, 2010 at 02:03:23PM +0900, KOSAKI Motohiro wrote:
>> Cc to Mel,
>>
>> > 2 patches related to page_alloc.c were applied.
>> > Does anyone see a connection between the 2 patches and the panic?
>> > NOTE: the full patches are attached.
>>
>> I think your attached two patches are perfectly unrelated to your problem.
>>
>
> Agreed. It's unlikely that there is a race as such in the page
> allocator. In buffered_rmqueue that you initially talk about, the lists
> being manipulated are per-cpu lists. About the only way to corrupt them
> is if you had an NMI handler that called the page allocator. I really hope
> your platform is not doing anything like that.
>
> A double free of page->lru is a possibility. You could try reproducing
> the problem with CONFIG_DEBUG_LIST enabled to see if anything falls out.
>
>> "mm: Add min_free_order_shift tunable." seems to make zero sense. I don't think this patch
>> needs to be merged.
>>
>
> It makes a marginal amount of sense. Basically what it does is allow
> high-order allocations to go much further below their watermarks than is
> currently allowed. If the platform in question is doing a lot of high-order
> allocations, this patch could be seen to "fix" the problem but you wouldn't
> touch mainline with it with a barge pole. It would be more stable to fix
> the drivers to not use high order allocations or use a mempool.
>

The high order allocation that caused problems was the first level
page table for each process. Each time a new process started the
kernel would empty the entire page cache to create contiguous free
memory. With the reserved pageblock mostly full (fixed by the second
patch) this contiguous memory would then almost immediately get used
for low order allocations, so the same problem starts again when the
next process starts. I agree this patch does not fix the problem, but
it does improve things when the problem hits. I have not seen a device
in this situation with the second patch applied, but I did not remove
the first patch in case the reserved pageblock fills up.

> It is inconceivable this patch is related to the problem though.
>
>> but "mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE"
>> treats strange hardware correctly, I think. If Mel acks this, I hope it gets merged.
>> Mel, Can we hear your opinion?
>>
>
> This patch is interesting and I am surprised it is required. Is it really the
> case that page blocks near the start of a zone are dominated with PageReserved
> pages but the first one happens to be free? I guess it's conceivable on ARM
> where memmap can be freed at boot time.

I think this happens by default on arm. The kernel starts at offset
0x8000 to leave room for boot parameters, and in recent kernel
versions (>~2.6.26-29) this memory is freed.

>
> There is a theoretical problem with the patch but it is easily resolved.
> A PFN walker like this must call pfn_valid_within() before calling
> pfn_to_page(). If they do not, it's possible to get complete garbage
> for the page and result in a bad dereference. In this particular case,
> it would be a kernel oops rather than memory corruption though.
>
> If that was fixed, I'd see no problem with Acking the patch.
>

I can fix this if you want the patch in mainline. I was not sure it
was acceptable since it will slow down boot on all systems, even where it
is not needed.

> It is also inconceivable this patch is related to the problem.
--
Arve Hjønnevåg
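An illustration of the pageblock problem Arve describes, not code from the thread. It is a userspace model under stated assumptions: `struct fake_page`, `memmap`, `setup_example` and the 1024-page block size are all hypothetical stand-ins, and the validity flag only imitates what pfn_valid_within() reports. The layout mirrors the description above: the first 0x8000 bytes (eight 4 KiB pages) of the block are free and the rest is reserved, so a test of the first page alone misjudges the block.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGEBLOCK_NR_PAGES 1024	/* assumed block size, for illustration */
#define NR_PFNS (2 * PAGEBLOCK_NR_PAGES)

/* Hypothetical stand-in for struct page: only the two bits we need. */
struct fake_page {
	bool valid;	/* models pfn_valid_within(pfn) */
	bool reserved;	/* models PageReserved(page) */
};

static struct fake_page memmap[NR_PFNS];

/* Old behaviour: look only at the first page of the block. */
static bool first_page_is_reserved(size_t start_pfn)
{
	return memmap[start_pfn].reserved;
}

/* Scan the whole block; a hole or a reserved page disqualifies it. */
static bool pageblock_is_reserved(size_t start_pfn)
{
	size_t end_pfn = start_pfn + PAGEBLOCK_NR_PAGES;
	size_t pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn++)
		if (!memmap[pfn].valid || memmap[pfn].reserved)
			return true;
	return false;
}

/*
 * Block 0: eight free pages (the freed 0x8000 bytes), then reserved
 * pages standing in for the kernel image. Block 1: entirely free.
 */
static void setup_example(void)
{
	size_t pfn;

	for (pfn = 0; pfn < NR_PFNS; pfn++) {
		memmap[pfn].valid = true;
		memmap[pfn].reserved = false;
	}
	for (pfn = 8; pfn < PAGEBLOCK_NR_PAGES; pfn++)
		memmap[pfn].reserved = true;
}
```

After setup_example(), first_page_is_reserved(0) is false while pageblock_is_reserved(0) is true, which is the mismatch that let a nearly full block be picked as the reserve. The validity test is included because that is the check Mel asks for; the scan order and early return are just one straightforward way to write it.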
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
  2010-04-03 0:59 ` Arve Hjønnevåg
@ 2010-04-04 22:45 ` KOSAKI Motohiro
  2010-04-05 10:14 ` Mel Gorman
  1 sibling, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2010-04-04 22:45 UTC
To: Arve Hjønnevåg
Cc: kosaki.motohiro, Mel Gorman, TAO HU, linux-mm, linux-kernel,
    Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel

Hi

> >> "mm: Add min_free_order_shift tunable." seems to make zero sense. I don't think this patch
> >> needs to be merged.
> >
> > It makes a marginal amount of sense. Basically what it does is allow
> > high-order allocations to go much further below their watermarks than is
> > currently allowed. If the platform in question is doing a lot of high-order
> > allocations, this patch could be seen to "fix" the problem but you wouldn't
> > touch mainline with it with a barge pole. It would be more stable to fix
> > the drivers to not use high order allocations or use a mempool.
>
> The high order allocation that caused problems was the first level
> page table for each process. Each time a new process started the
> kernel would empty the entire page cache to create contiguous free
> memory. With the reserved pageblock mostly full (fixed by the second
> patch) this contiguous memory would then almost immediately get used
> for low order allocations, so the same problem starts again when the
> next process starts. I agree this patch does not fix the problem, but
> it does improve things when the problem hits. I have not seen a device
> in this situation with the second patch applied, but I did not remove
> the first patch in case the reserved pageblock fills up.

I would like to merge the second patch first. If the same problem
still occurs, please post a bug report (and please Cc the ARM folks if it
is ARM pagetable related).

> > It is inconceivable this patch is related to the problem though.
> >
> >> but "mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE"
> >> treats strange hardware correctly, I think. If Mel acks this, I hope it gets merged.
> >> Mel, Can we hear your opinion?
> >>
> >
> > This patch is interesting and I am surprised it is required. Is it really the
> > case that page blocks near the start of a zone are dominated with PageReserved
> > pages but the first one happens to be free? I guess it's conceivable on ARM
> > where memmap can be freed at boot time.
>
> I think this happens by default on arm. The kernel starts at offset
> 0x8000 to leave room for boot parameters, and in recent kernel
> versions (>~2.6.26-29) this memory is freed.
>
> >
> > There is a theoretical problem with the patch but it is easily resolved.
> > A PFN walker like this must call pfn_valid_within() before calling
> > pfn_to_page(). If they do not, it's possible to get complete garbage
> > for the page and result in a bad dereference. In this particular case,
> > it would be a kernel oops rather than memory corruption though.
> >
> > If that was fixed, I'd see no problem with Acking the patch.
> >
>
> I can fix this if you want the patch in mainline. I was not sure it
> was acceptable since it will slow down boot on all systems, even where it
> is not needed.

Bootup code is not a fast path, so a small slowdown is OK, I think.
I'm looking forward to your new version of the patch.
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
  2010-04-03 0:59 ` Arve Hjønnevåg
  2010-04-04 22:45 ` KOSAKI Motohiro
@ 2010-04-05 10:14 ` Mel Gorman
  2010-04-05 10:49 ` Minchan Kim
  2010-04-06 3:09 ` [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE Arve Hjønnevåg
  1 sibling, 2 replies; 19+ messages in thread
From: Mel Gorman @ 2010-04-05 10:14 UTC
To: Arve Hjønnevåg
Cc: KOSAKI Motohiro, TAO HU, linux-mm, linux-kernel, Ye Yuan.Bo-A22116,
    Chang Qing-A21550, linux-arm-kernel

On Fri, Apr 02, 2010 at 05:59:00PM -0700, Arve Hjønnevåg wrote:
> On Fri, Apr 2, 2010 at 2:48 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> > On Fri, Apr 02, 2010 at 02:03:23PM +0900, KOSAKI Motohiro wrote:
> >> Cc to Mel,
> >>
> >> > 2 patches related to page_alloc.c were applied.
> >> > Does anyone see a connection between the 2 patches and the panic?
> >> > NOTE: the full patches are attached.
> >>
> >> I think your attached two patches are perfectly unrelated to your problem.
> >>
> >
> > Agreed. It's unlikely that there is a race as such in the page
> > allocator. In buffered_rmqueue that you initially talk about, the lists
> > being manipulated are per-cpu lists. About the only way to corrupt them
> > is if you had an NMI handler that called the page allocator. I really hope
> > your platform is not doing anything like that.
> >
> > A double free of page->lru is a possibility. You could try reproducing
> > the problem with CONFIG_DEBUG_LIST enabled to see if anything falls out.
> >
> >> "mm: Add min_free_order_shift tunable." seems to make zero sense. I don't think this patch
> >> needs to be merged.
> >>
> >
> > It makes a marginal amount of sense. Basically what it does is allow
> > high-order allocations to go much further below their watermarks than is
> > currently allowed. If the platform in question is doing a lot of high-order
> > allocations, this patch could be seen to "fix" the problem but you wouldn't
> > touch mainline with it with a barge pole. It would be more stable to fix
> > the drivers to not use high order allocations or use a mempool.
> >
>
> The high order allocation that caused problems was the first level
> page table for each process.

Out of curiosity, how big is that allocation? Is it specific to
Android? If it is, I guess it can be let slide but if it's common, it
would be worth thinking of an arch-hook that tells the VM that a
particular high-order is very common. For example, one possibility would
be to ask kswapd to always reclaim at a given order even if the
watermarks required are for a lower order.

> Each time a new process started the
> kernel would empty the entire page cache to create contiguous free
> memory.

I ask because I'm surprised the entire page cache got chucked out.

> With the reserved pageblock mostly full (fixed by the second
> patch) this contiguous memory would then almost immediately get used
> for low order allocations, so the same problem starts again when the
> next process starts.

This is a little outside what I expected the reserved pageblock was
intended for. I expected it to be used for high-order short-lived
allocations such as required by some wireless drivers. Pagetables are
a bit more common.

> I agree this patch does not fix the problem, but
> it does improve things when the problem hits. I have not seen a device
> in this situation with the second patch applied, but I did not remove
> the first patch in case the reserved pageblock fills up.
>
> > It is inconceivable this patch is related to the problem though.
> >
> >> but "mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE"
> >> treats strange hardware correctly, I think. If Mel acks this, I hope it gets merged.
> >> Mel, Can we hear your opinion?
> >>
> >
> > This patch is interesting and I am surprised it is required. Is it really the
> > case that page blocks near the start of a zone are dominated with PageReserved
> > pages but the first one happens to be free? I guess it's conceivable on ARM
> > where memmap can be freed at boot time.
>
> I think this happens by default on arm. The kernel starts at offset
> 0x8000 to leave room for boot parameters, and in recent kernel
> versions (>~2.6.26-29) this memory is freed.
>

Ok, that's fine.

> >
> > There is a theoretical problem with the patch but it is easily resolved.
> > A PFN walker like this must call pfn_valid_within() before calling
> > pfn_to_page(). If they do not, it's possible to get complete garbage
> > for the page and result in a bad dereference. In this particular case,
> > it would be a kernel oops rather than memory corruption though.
> >
> > If that was fixed, I'd see no problem with Acking the patch.
> >
>
> I can fix this if you want the patch in mainline. I was not sure it
> was acceptable since it will slow down boot on all systems, even where it
> is not needed.
>

It will not be noticeable. Only a few pageblocks are scanned per zone
and the full zone gets walked for a variety of reasons during boot
anyway. If it ever became absolutely necessary, the lowest suitable
pageblock could be identified when the bootmem allocator is being torn
down as the necessary information becomes available then.

> > It is also inconceivable this patch is related to the problem.
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
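An illustration of the watermark discussion above, not code from the thread. It is a simplified userspace model of the per-order loop in zone_watermark_ok() as it appears in the quoted diff context; it assumes no alloc_flags adjustments and no lowmem_reserve, and the names `watermark_ok` and `order_shift` are hypothetical (the latter plays the role of min_free_order_shift).

```c
#include <stdbool.h>

#define MAX_ORDER 11

/*
 * nr_free[o] is the number of free blocks of order o. Returns true if
 * an allocation of the given order may proceed against watermark mark.
 */
static bool watermark_ok(const unsigned long nr_free[MAX_ORDER], int order,
			 long mark, int order_shift)
{
	long min = mark;
	long free_pages = 0;
	int o;

	/* Total free pages, minus what this request itself would take. */
	for (o = 0; o < MAX_ORDER; o++)
		free_pages += (long)(nr_free[o] << o);
	free_pages -= (1L << order) - 1;

	if (free_pages <= min)
		return false;

	for (o = 0; o < order; o++) {
		/* Blocks of this order are too small for the request. */
		free_pages -= (long)(nr_free[o] << o);

		/* Require fewer free pages at each higher order. */
		min >>= order_shift;

		if (free_pages <= min)
			return false;
	}
	return true;
}
```

With free lists of {100, 20, 2} blocks at orders 0 to 2 and a mark of 64, an order-2 request is refused when order_shift is 1 and allowed when it is 4 (an arbitrary value picked for the example). That is the effect Mel describes: a larger shift lets high-order allocations dip much further below the watermark.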
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
  2010-04-05 10:14 ` Mel Gorman
@ 2010-04-05 10:49 ` Minchan Kim
  2010-04-06 3:09 ` [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE Arve Hjønnevåg
  1 sibling, 0 replies; 19+ messages in thread
From: Minchan Kim @ 2010-04-05 10:49 UTC
To: Mel Gorman
Cc: Arve Hjønnevåg, KOSAKI Motohiro, TAO HU, linux-mm, linux-kernel,
    Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel

Hi, Mel and Arve.

On Mon, Apr 5, 2010 at 7:14 PM, Mel Gorman <mel@csn.ul.ie> wrote:
> On Fri, Apr 02, 2010 at 05:59:00PM -0700, Arve Hjønnevåg wrote:
>> On Fri, Apr 2, 2010 at 2:48 AM, Mel Gorman <mel@csn.ul.ie> wrote:
>> > On Fri, Apr 02, 2010 at 02:03:23PM +0900, KOSAKI Motohiro wrote:
>> >> Cc to Mel,
>> >>
>> >> > 2 patches related to page_alloc.c were applied.
>> >> > Does anyone see a connection between the 2 patches and the panic?
>> >> > NOTE: the full patches are attached.
>> >>
>> >> I think your attached two patches are perfectly unrelated to your problem.
>> >>
>> >
>> > Agreed. It's unlikely that there is a race as such in the page
>> > allocator. In buffered_rmqueue that you initially talk about, the lists
>> > being manipulated are per-cpu lists. About the only way to corrupt them
>> > is if you had an NMI handler that called the page allocator. I really hope
>> > your platform is not doing anything like that.
>> >
>> > A double free of page->lru is a possibility. You could try reproducing
>> > the problem with CONFIG_DEBUG_LIST enabled to see if anything falls out.
>> >
>> >> "mm: Add min_free_order_shift tunable." seems to make zero sense. I don't think this patch
>> >> needs to be merged.
>> >>
>> >
>> > It makes a marginal amount of sense. Basically what it does is allow
>> > high-order allocations to go much further below their watermarks than is
>> > currently allowed. If the platform in question is doing a lot of high-order
>> > allocations, this patch could be seen to "fix" the problem but you wouldn't
>> > touch mainline with it with a barge pole. It would be more stable to fix
>> > the drivers to not use high order allocations or use a mempool.
>> >
>>
>> The high order allocation that caused problems was the first level
>> page table for each process.
>
> Out of curiosity, how big is that allocation? Is it specific to
> Android? If it is, I guess it can be let slide but if it's common, it

It is specific to ARM.
You can refer to get_pgd_slow in arch/arm/mm/pgd.c.
It allocates an order-2 page for the pgd.

> would be worth thinking of an arch-hook that tells the VM that a
> particular high-order is very common. For example, one possibility would
> be to ask kswapd to always reclaim at a given order even if the
> watermarks required are for a lower order.

Just out of curiosity, too.

Normally, embedded systems don't have fork-bomb workloads, but I think
Android's case is somewhat different. That's because Dalvik (JVM) keeps
as much memory as it can in anon pages for byte codes, so the system
never has enough free memory. In addition, most embedded systems don't
have swap, which makes things worse. So the current reclaimer can't
work well.

I am not sure about my assumption. Arve, is my guess right?
If so, should Dalvik solve this problem?
For example, AFAIK, the Android kernel has a low memory killer.
If the kernel signals memory pressure, Dalvik would have to discard some
anon pages that hold byte codes.

This is just my guess about Android.
If I have misunderstood Android, please correct me. :)

>
>> Each time a new process started the
>> kernel would empty the entire page cache to create contiguous free
>> memory.
>
> I ask because I'm surprised the entire page cache got chucked out

Maybe it was because the system has lots of anon pages but no swap.
-- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
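For reference, the size of the order-2 allocation mentioned above can be worked out with a small userspace sketch. It assumes 4 KiB pages, and the model_* names are made up for illustration; they are not kernel functions.

```c
#include <assert.h>

/* Assumption for illustration: 4 KiB pages (a page shift of 12). */
#define MODEL_PAGE_SHIFT 12

/* Number of contiguous pages in an allocation of the given order. */
unsigned long model_order_to_pages(unsigned int order)
{
	assert(order < 20);	/* keep the shift well within range */
	return 1UL << order;
}

/* Size in bytes of an allocation of the given order. */
unsigned long model_order_to_bytes(unsigned int order)
{
	return model_order_to_pages(order) << MODEL_PAGE_SHIFT;
}
```

Under these assumptions model_order_to_bytes(2) is 16384, i.e. four physically contiguous 4 KiB pages for every new process, which is the kind of request a fragmented system without swap has trouble satisfying.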
* [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE
  2010-04-05 10:14 ` Mel Gorman
  2010-04-05 10:49 ` Minchan Kim
@ 2010-04-06 3:09 ` Arve Hjønnevåg
  2010-04-06 4:15 ` Minchan Kim
  2010-04-06 15:11 ` Mel Gorman
  1 sibling, 2 replies; 19+ messages in thread
From: Arve Hjønnevåg @ 2010-04-06 3:09 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, TAO HU, linux-mm, linux-kernel, Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel, Arve Hjønnevåg

This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
though it only had a few free pages. This in turn caused no contiguous memory
to be reserved and frequent kswapd wakeups that emptied the caches to get more
contiguous memory.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
---
 mm/page_alloc.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fb7df1d..46ade16 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2860,6 +2860,20 @@ static inline unsigned long wait_table_bits(unsigned long size)
 #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
 
 /*
+ * Check if a pageblock contains reserved pages
+ */
+static int pageblock_is_reserved(unsigned long start_pfn)
+{
+	unsigned long end_pfn = start_pfn + pageblock_nr_pages;
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++)
+		if (!pfn_valid_within(pfn) || PageReserved(pfn_to_page(pfn)))
+			return 1;
+	return 0;
+}
+
+/*
  * Mark a number of pageblocks as MIGRATE_RESERVE. The number
  * of blocks reserved is based on min_wmark_pages(zone). The memory within
  * the reserve will tend to store contiguous free pages. Setting min_free_kbytes
@@ -2898,7 +2912,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 			continue;
 
 		/* Blocks with reserved pages will never free, skip them. */
-		if (PageReserved(page))
+		if (pageblock_is_reserved(pfn))
 			continue;
 
 		block_migratetype = get_pageblock_migratetype(page);
-- 
1.6.5.1
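The difference between the old first-page check and the whole-block scan in the patch above can be illustrated with a userspace model. This is only a sketch: struct model_page, MODEL_PAGEBLOCK_NR_PAGES and the model_* functions are hypothetical stand-ins for the kernel's page state, pageblock_nr_pages, pfn_valid_within() and PageReserved(); none of them is kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state, standing in for struct page flags. */
struct model_page {
	bool valid;	/* models pfn_valid_within(pfn) */
	bool reserved;	/* models PageReserved(page) */
};

/* Assumption: a tiny pageblock so the example stays readable. */
#define MODEL_PAGEBLOCK_NR_PAGES 16UL

/* Old behaviour: only the first page of the block is examined. */
bool model_first_page_reserved(const struct model_page *mem,
			       unsigned long start_pfn)
{
	return mem[start_pfn].reserved;
}

/* Patched behaviour: any hole or reserved page disqualifies the block. */
bool model_pageblock_is_reserved(const struct model_page *mem,
				 unsigned long start_pfn)
{
	unsigned long end_pfn = start_pfn + MODEL_PAGEBLOCK_NR_PAGES;
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn++)
		if (!mem[pfn].valid || mem[pfn].reserved)
			return true;
	return false;
}
```

If the first few pages of a block are free and the rest are reserved (roughly the layout described in the changelog), the first-page check accepts the block as a reserve candidate while the scan rejects it.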
* Re: [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE
  2010-04-06 3:09 ` [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE Arve Hjønnevåg
@ 2010-04-06 4:15 ` Minchan Kim
  2010-04-06 15:11 ` Mel Gorman
  1 sibling, 0 replies; 19+ messages in thread
From: Minchan Kim @ 2010-04-06 4:15 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: Mel Gorman, KOSAKI Motohiro, TAO HU, linux-mm, linux-kernel, Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel

On Tue, Apr 6, 2010 at 12:09 PM, Arve Hjønnevåg <arve@android.com> wrote:
> This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
> though it only had a few free pages. This in turn caused no contiguous memory
> to be reserved and frequent kswapd wakeups that emptied the caches to get more
> contiguous memory.

It would be better to add the following description from your earlier
mail in this thread to the changelog. It would help others understand
the patch in the future.

On Fri, Apr 02, 2010 at 05:59:00PM -0700, Arve Hjønnevåg wrote:
...
"I think this happens by default on arm. The kernel starts at offset
0x8000 to leave room for boot parameters, and in recent kernel
versions (>~2.6.26-29) this memory is freed."

--
Kind regards,
Minchan Kim
* Re: [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE
  2010-04-06 3:09 ` [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE Arve Hjønnevåg
  2010-04-06 4:15 ` Minchan Kim
@ 2010-04-06 15:11 ` Mel Gorman
  1 sibling, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2010-04-06 15:11 UTC (permalink / raw)
  To: Arve Hjønnevåg
  Cc: KOSAKI Motohiro, TAO HU, linux-mm, linux-kernel, Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel

On Mon, Apr 05, 2010 at 08:09:16PM -0700, Arve Hjønnevåg wrote:
> This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
> though it only had a few free pages. This in turn caused no contiguous memory
> to be reserved and frequent kswapd wakeups that emptied the caches to get more
> contiguous memory.
>
> Signed-off-by: Arve Hjønnevåg <arve@android.com>

I would have used pageblock_reserve_suitable because what you are really
checking is "is this page block suitable for use by MIGRATE_RESERVE?". The
definition was "is the first page PageReserved" and you are changing it to
"does the page block have any memory holes or PageReserved pages?"

No biggie though. Change it if you like before upstreaming. Either way.

Acked-by: Mel Gorman <mel@csn.ul.ie>

Thanks

> ---
>  mm/page_alloc.c |   16 +++++++++++++++-
>  1 files changed, 15 insertions(+), 1 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fb7df1d..46ade16 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2860,6 +2860,20 @@ static inline unsigned long wait_table_bits(unsigned long size)
>  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
>
>  /*
> + * Check if a pageblock contains reserved pages
> + */
> +static int pageblock_is_reserved(unsigned long start_pfn)
> +{
> +	unsigned long end_pfn = start_pfn + pageblock_nr_pages;
> +	unsigned long pfn;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++)
> +		if (!pfn_valid_within(pfn) || PageReserved(pfn_to_page(pfn)))
> +			return 1;
> +	return 0;
> +}
> +
> +/*
>   * Mark a number of pageblocks as MIGRATE_RESERVE. The number
>   * of blocks reserved is based on min_wmark_pages(zone). The memory within
>   * the reserve will tend to store contiguous free pages. Setting min_free_kbytes
> @@ -2898,7 +2912,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
>  			continue;
>
>  		/* Blocks with reserved pages will never free, skip them. */
> -		if (PageReserved(page))
> +		if (pageblock_is_reserved(pfn))
>  			continue;
>
>  		block_migratetype = get_pageblock_migratetype(page);
> --
> 1.6.5.1
>

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
  2010-04-02 3:51 ` TAO HU
  2010-04-02 5:03 ` KOSAKI Motohiro
@ 2010-04-02 5:04 ` KAMEZAWA Hiroyuki
  2010-04-02 5:15 ` Minchan Kim
  2010-04-02 5:13 ` Minchan Kim
  2010-04-02 7:06 ` Daniel Mack
  3 siblings, 1 reply; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-02 5:04 UTC (permalink / raw)
  To: TAO HU
  Cc: linux-mm, linux-kernel, Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel

On Fri, 2 Apr 2010 11:51:33 +0800
TAO HU <tghk48@motorola.com> wrote:

> 2 patches related to page_alloc.c were applied.
> Does anyone see a connection between the 2 patches and the panic?
> NOTE: the full patches are attached.
>

I don't think there is any relationship between the patches and your panic.

BTW, there is another possible cause of this backtrace besides a race in
alloc_pages() itself. If someone calls list_del(&page->lru) on a page that
has already been freed, you'll see the same backtrace later.
So I suspect a use-after-free rather than a complicated race.

Thanks,
-Kame

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a596bfd..34a29e2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2551,6 +2551,20 @@ static inline unsigned long
> wait_table_bits(unsigned long size)
>  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
>
>  /*
> + * Check if a pageblock contains reserved pages
> + */
> +static int pageblock_is_reserved(unsigned long start_pfn)
> +{
> +	unsigned long end_pfn = start_pfn + pageblock_nr_pages;
> +	unsigned long pfn;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++)
> +		if (PageReserved(pfn_to_page(pfn)))
> +			return 1;
> +	return 0;
> +}
> +
> +/*
>   * Mark a number of pageblocks as MIGRATE_RESERVE. The number
>   * of blocks reserved is based on zone->pages_min. The memory within the
>   * reserve will tend to store contiguous free pages. Setting min_free_kbytes
> @@ -2579,7 +2593,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
>  			continue;
>
>  		/* Blocks with reserved pages will never free, skip them. */
> -		if (PageReserved(page))
> +		if (pageblock_is_reserved(pfn))
>  			continue;
>
>  		block_migratetype = get_pageblock_migratetype(page);
> --
> 1.5.4.3
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5c44ed4..a596bfd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -119,6 +119,7 @@ static char * const zone_names[MAX_NR_ZONES] = {
>  };
>
>  int min_free_kbytes = 1024;
> +int min_free_order_shift = 1;
>
>  unsigned long __meminitdata nr_kernel_pages;
>  unsigned long __meminitdata nr_all_pages;
> @@ -1256,7 +1257,7 @@ int zone_watermark_ok(struct zone *z, int order,
> unsigned long mark,
>  		free_pages -= z->free_area[o].nr_free << o;
>
>  		/* Require fewer higher order pages to be free */
> -		min >>= 1;
> +		min >>= min_free_order_shift;
>
>  		if (free_pages <= min)
>  			return 0;
> --
>
>
> On Thu, Apr 1, 2010 at 12:05 PM, TAO HU <tghk48@motorola.com> wrote:
> > Hi, all
> >
> > We got a panic on our ARM (OMAP) based HW.
> > Our code is based on 2.6.29 kernel (last commit for mm/page_alloc.c is
> > cc2559bccc72767cb446f79b071d96c30c26439b)
> >
> > It appears to crash while going through pcp->list in
> > buffered_rmqueue() of mm/page_alloc.c after checking vmlinux.
> > "00100100" implies LIST_POISON1 that suggests a race condition between
> > list_add() and list_del() in my personal view.
> > However we not yet figure out locking problem regarding page.lru.
> >
> > Any known issues about race condition in mm/page_alloc.c?
> > And other hints are highly appreciated.
> >
> >  /* Find a page of the appropriate migrate type */
> >         if (cold) {
> >                 ... ...
> >         } else {
> >                 list_for_each_entry(page, &pcp->list, lru)
> >                         if (page_private(page) == migratetype)
> >                                 break;
> >         }
> >
> > <1>[120898.805267] Unable to handle kernel paging request at virtual
> > address 00100100
> > <1>[120898.805633] pgd = c1560000
> > <1>[120898.805786] [00100100] *pgd=897b3031, *pte=00000000, *ppte=00000000
> > <4>[120898.806457] Internal error: Oops: 17 [#1] PREEMPT
> > ... ...
> > <4>[120898.807861] CPU: 0    Not tainted  (2.6.29-omap1 #1)
> > <4>[120898.808044] PC is at get_page_from_freelist+0x1d0/0x4b0
> > <4>[120898.808227] LR is at get_page_from_freelist+0xc8/0x4b0
> > <4>[120898.808563] pc : [<c00a600c>]    lr : [<c00a5f04>]    psr: 800000d3
> > <4>[120898.808563] sp : c49fbd18  ip : 00000000  fp : c49fbd74
> > <4>[120898.809020] r10: 00000000  r9 : 001000e8  r8 : 00000002
> > <4>[120898.809204] r7 : 001200d2  r6 : 60000053  r5 : c0507c4c  r4 : c49fa000
> > <4>[120898.809509] r3 : 001000e8  r2 : 00100100  r1 : c0507c6c  r0 : 00000001
> > <4>[120898.809844] Flags: Nzcv  IRQs off  FIQs off  Mode SVC_32  ISA
> > ARM  Segment kernel
> > <4>[120898.810028] Control: 10c5387d  Table: 82160019  DAC: 00000017
> > <4>[120898.948425] Backtrace:
> > <4>[120898.948760] [<c00a5e3c>] (get_page_from_freelist+0x0/0x4b0)
> > from [<c00a6398>] (__alloc_pages_internal+0xac/0x3e8)
> > <4>[120898.949554] [<c00a62ec>] (__alloc_pages_internal+0x0/0x3e8)
> > from [<c00b461c>] (handle_mm_fault+0x16c/0xbac)
> > <4>[120898.950347] [<c00b44b0>] (handle_mm_fault+0x0/0xbac) from
> > [<c00b51d0>] (__get_user_pages+0x174/0x2b4)
> > <4>[120898.951019] [<c00b505c>] (__get_user_pages+0x0/0x2b4) from
> > [<c00b534c>] (get_user_pages+0x3c/0x44)
> > <4>[120898.951812] [<c00b5310>] (get_user_pages+0x0/0x44) from
> > [<c00caf9c>] (get_arg_page+0x50/0xa4)
> > <4>[120898.952636] [<c00caf4c>] (get_arg_page+0x0/0xa4) from
> > [<c00cb1ec>] (copy_strings+0x108/0x210)
> > <4>[120898.953430]  r7:beffffe4 r6:00000ffc r5:00000000 r4:00000018
> > <4>[120898.954223] [<c00cb0e4>] (copy_strings+0x0/0x210) from
> > [<c00cb330>] (copy_strings_kernel+0x3c/0x74)
> > <4>[120898.955047] [<c00cb2f4>] (copy_strings_kernel+0x0/0x74) from
> > [<c00cc778>] (do_execve+0x18c/0x2b0)
> > <4>[120898.955841]  r5:0001e240 r4:0001e224
> > <4>[120898.956329] [<c00cc5ec>] (do_execve+0x0/0x2b0) from
> > [<c00400e4>] (sys_execve+0x3c/0x5c)
> > <4>[120898.957153] [<c00400a8>] (sys_execve+0x0/0x5c) from
> > [<c003ce80>] (ret_fast_syscall+0x0/0x2c)
> > <4>[120898.957946]  r7:0000000b r6:0001e270 r5:00000000 r4:0001d580
> > <4>[120898.958740] Code: e1530008 0a000006 e2429018 e1a03009 (e5b32018)
> >
> >
> >
> > --
> > Best Regards
> > Hu Tao
> >
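The effect of the quoted min_free_order_shift change on the watermark check can be sketched in userspace. This is a simplified model and not the kernel's zone_watermark_ok(): it ignores alloc_flags and lowmem_reserve, and MODEL_MAX_ORDER, model_watermark_ok() and the nr_free[] array are hypothetical names standing in for the per-order free counts of a zone.

```c
#include <assert.h>

/* Assumption: only a handful of orders, to keep the model small. */
#define MODEL_MAX_ORDER 4

/*
 * Simplified model of the per-order watermark loop.
 * nr_free[o] is the number of free blocks of order o.
 * order_shift plays the role of min_free_order_shift (1 is the
 * hard-coded behaviour before the quoted patch).
 * Returns 1 if the allocation would pass the watermark, 0 otherwise.
 */
int model_watermark_ok(const unsigned long nr_free[MODEL_MAX_ORDER],
		       int order, long mark, int order_shift)
{
	long min = mark;
	long free_pages = 0;
	int o;

	assert(order >= 0 && order < MODEL_MAX_ORDER);

	/* Total free pages across all orders. */
	for (o = 0; o < MODEL_MAX_ORDER; o++)
		free_pages += (long)(nr_free[o] << o);

	/* Account for the pages this request would take. */
	free_pages -= (1L << order) - 1;
	if (free_pages <= min)
		return 0;

	for (o = 0; o < order; o++) {
		/* Pages in blocks smaller than the request do not help. */
		free_pages -= (long)(nr_free[o] << o);

		/* Require fewer higher order pages to be free. */
		min >>= order_shift;

		if (free_pages <= min)
			return 0;
	}
	return 1;
}
```

With nr_free = {100, 10, 1, 0}, an order-2 request against a mark of 64 fails in this model when order_shift is 1 but passes when it is 4, which matches the description of the tunable as letting high-order allocations go further below their watermarks.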
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
  2010-04-02 5:04 ` [Question] race condition in mm/page_alloc.c regarding page->lru? KAMEZAWA Hiroyuki
@ 2010-04-02 5:15 ` Minchan Kim
  2010-04-02 7:00 ` TAO HU
  0 siblings, 1 reply; 19+ messages in thread
From: Minchan Kim @ 2010-04-02 5:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: TAO HU, linux-mm, linux-kernel, Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel

On Fri, Apr 2, 2010 at 2:04 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Fri, 2 Apr 2010 11:51:33 +0800
> TAO HU <tghk48@motorola.com> wrote:
>
>> 2 patches related to page_alloc.c were applied.
>> Does anyone see a connection between the 2 patches and the panic?
>> NOTE: the full patches are attached.
>>
>
> I don't think there are relationship between patches and your panic.
>
> BTW, there is other case about the backlog rather than race in alloc_pages()
> itself. If someone list_del(&page->lru) and the page is already freed,
> you'll see the same backlog later.
> Then, I doubt use-after-free case rather than complicated races.

That makes sense.
Please grep your out-of-mainline code for anything that handles pages directly.
If you find something, please post it.

--
Kind regards,
Minchan Kim
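The list_del() scenario discussed above can be modelled in userspace to show where an address like 00100100 comes from. The sketch below is a cut-down copy of the list idea and not the kernel's list.h: the model_* names are made up, and the poison values are the ones defined in include/linux/poison.h for kernels of this era (0x00100100 and 0x00200200). It is also worth noting, as an observation rather than a diagnosis, that in the oops r2 is 00100100 and r3/r9 are 001000e8, and 0x001000e8 + 0x18 = 0x00100100, which would be consistent with a page pointer computed from a poisoned next pointer.

```c
#include <assert.h>
#include <stdint.h>

struct model_list_head {
	struct model_list_head *next, *prev;
};

/* Poison values as in include/linux/poison.h of that kernel generation. */
#define MODEL_LIST_POISON1 ((struct model_list_head *)(uintptr_t)0x00100100)
#define MODEL_LIST_POISON2 ((struct model_list_head *)(uintptr_t)0x00200200)

void model_list_init(struct model_list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
void model_list_add(struct model_list_head *entry,
		    struct model_list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry and poison its pointers, as list_del() does. */
void model_list_del(struct model_list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = MODEL_LIST_POISON1;
	entry->prev = MODEL_LIST_POISON2;
}

/* True if the entry looks like it was already deleted. */
int model_list_is_poisoned(const struct model_list_head *entry)
{
	return entry->next == MODEL_LIST_POISON1 ||
	       entry->prev == MODEL_LIST_POISON2;
}
```

Any code that keeps walking from an entry after model_list_del() has run on it would load next == 0x00100100 and fault when it dereferences that value, so a stale page->lru that is still reachable from a list produces this kind of oops without any locking race being involved.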
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
  2010-04-02 5:15 ` Minchan Kim
@ 2010-04-02 7:00 ` TAO HU
  2010-04-02 7:22 ` Minchan Kim
  0 siblings, 1 reply; 19+ messages in thread
From: TAO HU @ 2010-04-02 7:00 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KAMEZAWA Hiroyuki, TAO HU, linux-mm, linux-kernel, Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel

Hi, kamezawa hiroyu

Thanks for the hint!

Hi, Minchan Kim

Sorry, I am not exactly sure what you mean by <grep "page handling">.
Below is the result of $ grep -n -r "list_del(&page->lru)" * in our src tree

arch/s390/mm/pgtable.c:83: list_del(&page->lru);
arch/s390/mm/pgtable.c:226: list_del(&page->lru);
arch/x86/mm/pgtable.c:60: list_del(&page->lru);
drivers/xen/balloon.c:154: list_del(&page->lru);
drivers/virtio/virtio_balloon.c:143: list_del(&page->lru);
fs/cifs/file.c:1780: list_del(&page->lru);
fs/btrfs/extent_io.c:2584: list_del(&page->lru);
fs/mpage.c:388: list_del(&page->lru);
include/linux/mm_inline.h:37: list_del(&page->lru);
include/linux/mm_inline.h:47: list_del(&page->lru);
kernel/kexec.c:391: list_del(&page->lru);
kernel/kexec.c:711: list_del(&page->lru);
mm/migrate.c:69: list_del(&page->lru);
mm/migrate.c:695: list_del(&page->lru);
mm/hugetlb.c:467: list_del(&page->lru);
mm/hugetlb.c:509: list_del(&page->lru);
mm/hugetlb.c:836: list_del(&page->lru);
mm/hugetlb.c:844: list_del(&page->lru);
mm/hugetlb.c:900: list_del(&page->lru);
mm/hugetlb.c:1130: list_del(&page->lru);
mm/hugetlb.c:1809: list_del(&page->lru);
mm/vmscan.c:597: list_del(&page->lru);
mm/vmscan.c:1148: list_del(&page->lru);
mm/vmscan.c:1246: list_del(&page->lru);
mm/slub.c:827: list_del(&page->lru);
mm/slub.c:1249: list_del(&page->lru);
mm/slub.c:1263: list_del(&page->lru);
mm/slub.c:2419: list_del(&page->lru);
mm/slub.c:2809: list_del(&page->lru);
mm/readahead.c:65: list_del(&page->lru);
mm/readahead.c:100: list_del(&page->lru);
mm/page_alloc.c:532: list_del(&page->lru);
mm/page_alloc.c:679: list_del(&page->lru);
mm/page_alloc.c:741: list_del(&page->lru);
mm/page_alloc.c:820: list_del(&page->lru);
mm/page_alloc.c:1107: list_del(&page->lru);
mm/page_alloc.c:4784: list_del(&page->lru);

On Fri, Apr 2, 2010 at 1:15 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
> On Fri, Apr 2, 2010 at 2:04 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> On Fri, 2 Apr 2010 11:51:33 +0800
>> TAO HU <tghk48@motorola.com> wrote:
>>
>>> 2 patches related to page_alloc.c were applied.
>>> Does anyone see a connection between the 2 patches and the panic?
>>> NOTE: the full patches are attached.
>>>
>>
>> I don't think there are relationship between patches and your panic.
>>
>> BTW, there is other case about the backlog rather than race in alloc_pages()
>> itself. If someone list_del(&page->lru) and the page is already freed,
>> you'll see the same backlog later.
>> Then, I doubt use-after-free case rather than complicated races.
>
> It does make sense.
> Please, grep "page handling" by out-of-mainline code.
> If you found out, Please, post it.
>
> --
> Kind regards,
> Minchan Kim
>
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
  2010-04-02 7:00 ` TAO HU
@ 2010-04-02 7:22 ` Minchan Kim
  0 siblings, 0 replies; 19+ messages in thread
From: Minchan Kim @ 2010-04-02 7:22 UTC (permalink / raw)
  To: TAO HU
  Cc: KAMEZAWA Hiroyuki, TAO HU, linux-mm, linux-kernel, Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel

On Fri, Apr 2, 2010 at 4:00 PM, TAO HU <tghk48@motorola.com> wrote:
> Hi, kamezawa hiroyu
>
> Thanks for the hint!
>
> Hi, Minchan Kim
>
> Sorry. Not exactly sure your idea about <grep "page handling">.
> Below is a result of $ grep -n -r "list_del(&page->lru)" * in our src tree

That's not enough.
You probably need to review your own patches on top of mainline.

>
> arch/s390/mm/pgtable.c:83: list_del(&page->lru);
> arch/s390/mm/pgtable.c:226: list_del(&page->lru);
> arch/x86/mm/pgtable.c:60: list_del(&page->lru);
> drivers/xen/balloon.c:154: list_del(&page->lru);
> drivers/virtio/virtio_balloon.c:143: list_del(&page->lru);
> fs/cifs/file.c:1780: list_del(&page->lru);
> fs/btrfs/extent_io.c:2584: list_del(&page->lru);
> fs/mpage.c:388: list_del(&page->lru);
> include/linux/mm_inline.h:37: list_del(&page->lru);
> include/linux/mm_inline.h:47: list_del(&page->lru);
> kernel/kexec.c:391: list_del(&page->lru);
> kernel/kexec.c:711: list_del(&page->lru);
> mm/migrate.c:69: list_del(&page->lru);
> mm/migrate.c:695: list_del(&page->lru);
> mm/hugetlb.c:467: list_del(&page->lru);
> mm/hugetlb.c:509: list_del(&page->lru);
> mm/hugetlb.c:836: list_del(&page->lru);
> mm/hugetlb.c:844: list_del(&page->lru);
> mm/hugetlb.c:900: list_del(&page->lru);
> mm/hugetlb.c:1130: list_del(&page->lru);
> mm/hugetlb.c:1809: list_del(&page->lru);
> mm/vmscan.c:597: list_del(&page->lru);
> mm/vmscan.c:1148: list_del(&page->lru);
> mm/vmscan.c:1246: list_del(&page->lru);
> mm/slub.c:827: list_del(&page->lru);
> mm/slub.c:1249: list_del(&page->lru);
> mm/slub.c:1263: list_del(&page->lru);
> mm/slub.c:2419: list_del(&page->lru);
> mm/slub.c:2809: list_del(&page->lru);
> mm/readahead.c:65: list_del(&page->lru);
> mm/readahead.c:100: list_del(&page->lru);
> mm/page_alloc.c:532: list_del(&page->lru);
> mm/page_alloc.c:679: list_del(&page->lru);
> mm/page_alloc.c:741: list_del(&page->lru);
> mm/page_alloc.c:820: list_del(&page->lru);
> mm/page_alloc.c:1107: list_del(&page->lru);
> mm/page_alloc.c:4784: list_del(&page->lru);
>

Those are the normal callers.
I expected that some out-of-mainline driver might be using pages
directly without enough review.

Does your kernel work well apart from this bug?
Do you see the same oops call trace (in the page allocator) whenever the
kernel panics? I mean, if something other than the page allocator is
corrupting memory, you should see other symptoms too, and then we can
suspect other things (H/W, another subsystem).

--
Kind regards,
Minchan Kim
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
  2010-04-02 3:51 ` TAO HU
  2010-04-02 5:03 ` KOSAKI Motohiro
  2010-04-02 5:04 ` [Question] race condition in mm/page_alloc.c regarding page->lru? KAMEZAWA Hiroyuki
@ 2010-04-02 5:13 ` Minchan Kim
  2010-04-02 6:48 ` TAO HU
  2010-04-02 7:06 ` Daniel Mack
  3 siblings, 1 reply; 19+ messages in thread
From: Minchan Kim @ 2010-04-02 5:13 UTC (permalink / raw)
  To: TAO HU
  Cc: linux-mm, linux-kernel, Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel, Mel Gorman

On Fri, Apr 2, 2010 at 12:51 PM, TAO HU <tghk48@motorola.com> wrote:
> 2 patches related to page_alloc.c were applied.
> Does anyone see a connection between the 2 patches and the panic?

They don't seem to be related to the problem.
I haven't seen this problem before.

Could you git-bisect to find which patch introduces the bug?
Is it reproducible?
Can I reproduce it in QEMU-goldfish?

--
Kind regards,
Minchan Kim
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
  2010-04-02 5:13 ` Minchan Kim
@ 2010-04-02 6:48 ` TAO HU
  0 siblings, 0 replies; 19+ messages in thread
From: TAO HU @ 2010-04-02 6:48 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Ye Yuan.Bo-A22116, Chang Qing-A21550, linux-arm-kernel, Mel Gorman

Hi, Minchan Kim

It is hard to reproduce the problem.
We only observed it twice in the past month.
And it randomly occurred a few more times before.
So I'm afraid neither git-bisect nor QEMU-goldfish would help.

On Fri, Apr 2, 2010 at 1:13 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
> On Fri, Apr 2, 2010 at 12:51 PM, TAO HU <tghk48@motorola.com> wrote:
>> 2 patches related to page_alloc.c were applied.
>> Does anyone see a connection between the 2 patches and the panic?
>
> Seem to not related to the problem.
> I don't have seen the problem before.
>
> Could you git-bisect to make sure which patch makes bug?
> Is it reproducible?
> Can I reproduce it in QEMU-goldfish?
>
> --
> Kind regards,
> Minchan Kim
>
* Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
  2010-04-02 3:51 ` TAO HU
  ` (2 preceding siblings ...)
  2010-04-02 5:13 ` Minchan Kim
@ 2010-04-02 7:06 ` Daniel Mack
  3 siblings, 0 replies; 19+ messages in thread
From: Daniel Mack @ 2010-04-02 7:06 UTC (permalink / raw)
  To: TAO HU
  Cc: linux-mm, Chang Qing-A21550, Ye Yuan.Bo-A22116, linux-kernel, linux-arm-kernel

On Fri, Apr 02, 2010 at 11:51:33AM +0800, TAO HU wrote:
> On Thu, Apr 1, 2010 at 12:05 PM, TAO HU <tghk48@motorola.com> wrote:
> > We got a panic on our ARM (OMAP) based HW.
> > Our code is based on 2.6.29 kernel (last commit for mm/page_alloc.c is
> > cc2559bccc72767cb446f79b071d96c30c26439b)
> >
> > It appears to crash while going through pcp->list in
> > buffered_rmqueue() of mm/page_alloc.c after checking vmlinux.
> > "00100100" implies LIST_POISON1 that suggests a race condition between
> > list_add() and list_del() in my personal view.
> > However we not yet figure out locking problem regarding page.lru.

I'm sure this is just a memory corruption which is unrelated to code in
the memory management area. The code there just happens to trigger it
as it is called frequently and is very sensitive to bogus data.

Did you see the other thread I started off yesterday?

  http://lkml.indiana.edu/hypermail/linux/kernel/1004.0/00157.html

We could well be seeing the same problem here. Not sure though, as any
kind of memory corruption ends up in Oopses like the ones you see, but it
could be a hint.

Daniel
end of thread, other threads:[~2010-04-06 15:11 UTC | newest]

Thread overview: 19+ messages
2010-04-01 4:05 [Question] race condition in mm/page_alloc.c regarding page->lru? TAO HU
2010-04-02 3:51 ` TAO HU
2010-04-02 5:03 ` KOSAKI Motohiro
2010-04-02 5:19 ` TAO HU
2010-04-02 9:48 ` Mel Gorman
2010-04-03 0:59 ` Arve Hjønnevåg
2010-04-04 22:45 ` KOSAKI Motohiro
2010-04-05 10:14 ` Mel Gorman
2010-04-05 10:49 ` Minchan Kim
2010-04-06 3:09 ` [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE Arve Hjønnevåg
2010-04-06 4:15 ` Minchan Kim
2010-04-06 15:11 ` Mel Gorman
2010-04-02 5:04 ` [Question] race condition in mm/page_alloc.c regarding page->lru? KAMEZAWA Hiroyuki
2010-04-02 5:15 ` Minchan Kim
2010-04-02 7:00 ` TAO HU
2010-04-02 7:22 ` Minchan Kim
2010-04-02 5:13 ` Minchan Kim
2010-04-02 6:48 ` TAO HU
2010-04-02 7:06 ` Daniel Mack