linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Kautuk Consul <consul.kautuk@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org
Subject: Re: [PATCH] ARM: sparsemem: Enable CONFIG_HOLES_IN_ZONE config option for SparseMem and HAS_HOLES_MEMORYMODEL for linux-3.0.
Date: Thu, 4 Aug 2011 11:09:28 +0100	[thread overview]
Message-ID: <20110804100928.GN19099@suse.de> (raw)
In-Reply-To: <CAFPAmTS2JEVk3tWhJN034dUmaxLujswmmsqGABGYEV=N3v0Ehw@mail.gmail.com>

On Thu, Aug 04, 2011 at 03:06:39PM +0530, Kautuk Consul wrote:
> Hi Mel,
> 
> My ARM system has 2 memory banks which have the following 2 PFN ranges:
> 60000-62000 and 70000-7ce00.
> 
> My SECTION_SIZE_BITS is #defined to 23.
> 

So bank 0 is 4 sections and bank 1 is 26 sections with the last section
incomplete.

> I am altering the ranges via the following kind of pseudo-code in the
> arch/arm/mach-*/mach-*.c file:
> meminfo->bank[0].size -= (1 << 20)
> meminfo->bank[1].size -= (1 << 20)
> 

Why are you taking 1M off each bank? I could understand aligning the
banks to a section size at least.

That said, there is an assumption that pages within a MAX_ORDER-aligned
block. If you are taking 1M off the end of bank 0, it is no longer
MAX_ORDER aligned. This is the assumption move_freepages_block()
is falling foul of. The problem can be avoided by ensuring that memmap
is valid within MAX_ORDER-aligned ranges.

> After altering the size of both memory banks, the PFN ranges now
> visible to the kernel are:
> 60000-61f00 and 70000-7cd00.
> 
> There is only one node and one ZONE_NORMAL zone on the system and this
> zone accounts for both memory banks as CONFIG_SPARSEMEM
> is enabled in the kernel.
> 

Ok.

> I put some printks in the move_freeblockpages() function, compiled the
> kernel and then ran the following commands:
> ifconfig eth0 107.109.39.102 up
> mount /dev/sda1 /mnt   # Mounted the USB pen drive
> mount -t nfs -o nolock 107.109.39.103:/home/kautuk nfsmnt   # NFS
> mounted an NFS share
> cp test_huge_file nfsmnt/
> 
> I got the following output:
> #> cp test_huge nfsmnt/
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c068a000 start_pfn=7c400 start_page=c0686000
> end_page=c068dfe0 end_pfn=7c7ff
> page_zone(start_page)=c048107c page_zone(end_page)=c048107c
> page_zonenum(end_page) = 0
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c0652000 start_pfn=7a800 start_page=c064e000
> end_page=c0655fe0 end_pfn=7abff
> page_zone(start_page)=c048107c page_zone(end_page)=c048107c
> page_zonenum(end_page) = 0
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c065a000 start_pfn=7ac00 start_page=c0656000
> end_page=c065dfe0 end_pfn=7afff
> page_zone(start_page)=c048107c page_zone(end_page)=c048107c
> page_zonenum(end_page) = 0
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c0695000 start_pfn=7c800 start_page=c068e000
> end_page=c0695fe0 end_pfn=7cbff
> page_zone(start_page)=c048107c page_zone(end_page)=c048107c
> page_zonenum(end_page) = 0
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c04f7c00 start_pfn=61c00 start_page=c04f6000
> end_page=c04fdfe0 end_pfn=61fff
> page_zone(start_page)=c048107c page_zone(end_page)=c0481358
> page_zonenum(end_page) = 1
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> kernel BUG at mm/page_alloc.c:849!
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> pgd = ce9fc000
> [00000000] *pgd=7ca5a031, *pte=00000000, *ppte=00000000
> 
> As per the last line, we can clearly see that the
> page_zone(start_page)=c048107c and page_zone(end_page)=c0481358,
> which are not equal to each other.
> Since they do not match, the code in move_freepages() bugchecks
> because of the following BUG_ON() check:
> page_zone(start_page) != page_zone(end_page)

> The reason for this that the page_zonenum(end_page) is equal to 1 and
> this is different from the page_zonenum(start_page) which is 0.
> 

Because the MAX_ORDER alignment is gone.

> On checking the code within page_zonenum(), I see that this code tries
> to retrieve the zone number from the end_page->flags.
> 

Yes. In the majority of cases a pages node and zone is stored in the
page->flags.

> The reason why we cannot expect the 0x61fff end_page->flags to contain
> a valid zone number is:
> memmap_init_zone() initializes the zone number of all pages for a zone
> via the set_page_links() inline function.
> For the end_page (whose PFN is 0x61fff), set_page_links() cannot be
> possibly called, as the zones are simply not aware of of PFNs above
> 0x61f00 and below 0x70000.
> 

Can you ensure that the ranges passed into free_area_init_node()
are MAX_ORDER aligned as this would initialise the struct pages. You
may have already seen that care is taken when freeing memmap that it
is aligned to MAX_ORDER in free_unused_memmap() in ARM.

> The (end >= zone->zone_start_pfn + zone->spanned_pages) in
> move_freepages_block() does not stop this crash from happening as both
> our memory banks are in the same zone and the empty space within them
> is accomodated into this zone via the CONFIG_SPARSEMEM
> config option.
> 
> When we enable CONFIG_HOLES_IN_ZONE we survive this BUG_ON as well as
> any other BUG_ONs in the loop in move_freepages() as then the
> pfn_valid_within()/pfn_valid() function takes care of this
> functionality, especially in the case where the newly introduced
> CONFIG_HAVE_ARCH_PFN_VALID is
> enabled.
> 

This is an expensive option in terms of performance. If Russell
wants to pick it up, I won't object but I would strongly suggest that
you solve this problem by ensuring that memmap is initialised on a
MAX_ORDER-aligned boundaries as it'll perform better.

Thanks.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2011-08-04 10:09 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-02 12:08 [PATCH] ARM: sparsemem: Enable CONFIG_HOLES_IN_ZONE config option for SparseMem and HAS_HOLES_MEMORYMODEL for linux-3.0 Kautuk Consul
2011-08-03 11:05 ` Mel Gorman
2011-08-03 12:29   ` Kautuk Consul
2011-08-03 13:28     ` Mel Gorman
2011-08-04  9:36       ` Kautuk Consul
2011-08-04 10:09         ` Mel Gorman [this message]
2011-08-05  5:57           ` Kautuk Consul
2011-08-05  8:47             ` Mel Gorman
2011-08-05 11:40               ` Kautuk Consul
2011-08-24 12:31                 ` Kautuk Consul

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110804100928.GN19099@suse.de \
    --to=mgorman@suse.de \
    --cc=consul.kautuk@gmail.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=rmk@arm.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).