From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail6.bemta8.messagelabs.com (mail6.bemta8.messagelabs.com [216.82.243.55])
 by kanga.kvack.org (Postfix) with ESMTP id 7DE896B0169
 for ; Wed, 3 Aug 2011 08:29:04 -0400 (EDT)
Received: by vxg38 with SMTP id 38so813643vxg.14
 for ; Wed, 03 Aug 2011 05:29:03 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20110803110555.GD19099@suse.de>
References: <20110803110555.GD19099@suse.de>
Date: Wed, 3 Aug 2011 17:59:03 +0530
Message-ID:
Subject: Re: [PATCH] ARM: sparsemem: Enable CONFIG_HOLES_IN_ZONE config option
 for SparseMem and HAS_HOLES_MEMORYMODEL for linux-3.0.
From: Kautuk Consul
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman
Cc: Russell King , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org

Hi Mel,

Sorry for the formatting.

I forgot to include the entire backtrace:

#> cp test_huge_file nfsmnt
kernel BUG at mm/page_alloc.c:849!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ce9f0000
Backtrace:
[] (__bug+0x0/0x30) from [] (move_freepages_block+0xd4/0x158)
[] (move_freepages_block+0x0/0x158) from [] (__rmqueue+0x1dc/0x32c)
 r8:c0481120 r7:c048107c r6:00000003 r5:00000001 r4:c04f8200 r3:00000000
[] (__rmqueue+0x0/0x32c) from [] (get_page_from_freelist+0x12c/0x530)
[] (get_page_from_freelist+0x0/0x530) from [] (__alloc_pages_nodemask+0xf0/0x544)
[] (__alloc_pages_nodemask+0x0/0x544) from [] (cache_alloc_refill+0x2d0/0x654)
[] (cache_alloc_refill+0x0/0x654) from [] (kmem_cache_alloc+0x58/0x9c)
[] (kmem_cache_alloc+0x0/0x9c) from [] (radix_tree_preload+0x58/0xbc)
 r7:00006741 r6:000000d0 r5:c04a98a0 r4:ce986000
[] (radix_tree_preload+0x0/0xbc) from [] (add_to_page_cache_locked+0x20/0x1c4)
 r6:ce987d20 r5:ce346c1c r4:c04f8600 r3:000000d0
[] (add_to_page_cache_locked+0x0/0x1c4) from [] (add_to_page_cache_lru+0x4c/0x7c)
 r8:00000020 r7:ce7402a0 r6:ce987d20 r5:00000005 r4:c04f8600 r3:000000d0
[] (add_to_page_cache_lru+0x0/0x7c) from [] (mpage_readpages+0x7c/0x108)
 r5:00000005 r4:c04f8600
[] (mpage_readpages+0x0/0x108) from [] (fat_readpages+0x20/0x28)
[] (fat_readpages+0x0/0x28) from [] (__do_page_cache_readahead+0x1c4/0x27c)
[] (__do_page_cache_readahead+0x0/0x27c) from [] (ra_submit+0x2c/0x34)
[] (ra_submit+0x0/0x34) from [] (ondemand_readahead+0x20c/0x21c)
[] (ondemand_readahead+0x0/0x21c) from [] (page_cache_async_readahead+0xa4/0xd8)
[] (page_cache_async_readahead+0x0/0xd8) from [] (generic_file_aio_read+0x360/0x7f4)
 r8:00000000 r7:ce346c1c r6:ce88dba0 r5:0000671c r4:c04f9640
[] (generic_file_aio_read+0x0/0x7f4) from [] (do_sync_read+0xa0/0xd8)
[] (do_sync_read+0x0/0xd8) from [] (vfs_read+0xb8/0x154)
 r6:bed53650 r5:ce88dba0 r4:00001000
[] (vfs_read+0x0/0x154) from [] (sys_read+0x44/0x70)
 r8:0671c000 r7:00000003 r6:00001000 r5:bed53650 r4:ce88dba0
[] (sys_read+0x0/0x70) from [] (ret_fast_syscall+0x0/0x30)
 r9:ce986000 r8:c0023128 r6:bed53650 r5:00001000 r4:0013a4e0
Code: e59f0010 e1a01003 eb0d0852 e3a03000 (e5833000)

Since I was testing on linux-2.6.35.9, line 849 in page_alloc.c is the
same line that you mentioned:
BUG_ON(page_zone(start_page) != page_zone(end_page))

I reproduce this crash by altering the memory banks' memory ranges so
that they are not aligned to the section size given by SECTION_SIZE_BITS.
For example, on my ARM system SECTION_SIZE_BITS is 23 (8 MB), so I change
the arch/arm/mach-* code so that the total kernel memory size in the
memory banks is 1 MB smaller, which makes the total no longer an exact
multiple of 8 MB.

Thanks,
Kautuk.

On Wed, Aug 3, 2011 at 4:35 PM, Mel Gorman wrote:
> On Tue, Aug 02, 2011 at 05:38:31PM +0530, Kautuk Consul wrote:
>> Hi,
>>
>> In the case where the total kernel memory is not aligned to the
>> SECTION_SIZE_BITS I see a kernel crash.
>>
>> When I copy a huge file, then the kernel crashes at the following callstack:
>>
>
> The callstack should not be 80-column formatted as this is completely
> mangled and unreadable without manual editing. Also, why did you not
> include the full error message? With it, I'd have a better idea of
> which bug check you hit.
>
>> Backtrace:
>>
>>
>> The reason for this is that the CONFIG_HOLES_IN_ZONE configuration
>> option is not automatically enabled when SPARSEMEM or
>> ARCH_HAS_HOLES_MEMORYMODEL are enabled. Due to this, the
>> pfn_valid_within() macro always returns 1 due to which the BUG_ON is
>> encountered.
>> This patch enables the CONFIG_HOLES_IN_ZONE config option if either
>> ARCH_HAS_HOLES_MEMORYMODEL or SPARSEMEM is enabled.
>>
>> Although I tested this on an older kernel, i.e., 2.6.35.13, I see that
>> this option has not been enabled as yet in linux-3.0 and this appears
>> to be a logically correct change anyways with respect to
>> pfn_valid_within() functionality.
>>
>
> There is a performance cost associated with HOLES_IN_ZONE which may be
> offset by memory savings but not necessarily.
>
> If the BUG_ON you are hitting is this one
> BUG_ON(page_zone(start_page) != page_zone(end_page)) then I'd be
> wondering why the check in move_freepages_block() was insufficient.
>
> If it's because holes are punched in the memmap then the option does
> need to be set.
>
> --
> Mel Gorman
> SUSE Labs
>
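
For reference, the alignment arithmetic behind the reproduction steps
can be sketched in C. This is only an illustration, not kernel code.
SECTION_SIZE_BITS = 23 is the value quoted in this mail; the 64 MB bank
size used in the note below, the helper names, and the assumption that
the bank starts on a section boundary are all made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTION_SIZE_BITS 23 /* value quoted in the mail */
#define SECTION_SIZE ((uint64_t)1 << SECTION_SIZE_BITS)
#define MiB ((uint64_t)1 << 20)

/* 2^23 bytes is the 8 MB section size mentioned in the mail. */
static_assert(SECTION_SIZE == 8 * MiB, "section size should be 8 MB");

/*
 * Hypothetical helper: true when a bank of bank_bytes, assumed to start
 * on a section boundary, also ends exactly on a section boundary.
 */
bool bank_is_section_aligned(uint64_t bank_bytes)
{
    return (bank_bytes & (SECTION_SIZE - 1)) == 0;
}

/*
 * Hypothetical helper: number of bytes at the end of the last section
 * that are not backed by the bank (0 if the bank is section aligned).
 */
uint64_t trailing_hole_bytes(uint64_t bank_bytes)
{
    uint64_t rem = bank_bytes & (SECTION_SIZE - 1);

    return rem ? SECTION_SIZE - rem : 0;
}
```

With an assumed 64 MB bank, bank_is_section_aligned(64 * MiB) is true.
Shrinking it by 1 MB, as in the reproduction steps, gives 63 MB, which
is not a multiple of 8 MB, and trailing_hole_bytes(63 * MiB) reports a
1 MB hole at the end of the last section.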