From: Vlastimil Babka <vbabka@suse.cz>
To: Laura Abbott <lauraa@codeaurora.org>,
Srinivas Kandagatla <srinivas.kandagatla@linaro.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
linux@arm.linux.org.uk, ssantosh@kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@suse.de>
Cc: Kevin Hilman <khilman@linaro.org>,
Stephen Boyd <sboyd@codeaurora.org>,
Arnd Bergmann <arnd@arndb.de>, Kumar Gala <galak@codeaurora.org>,
linux-mm@kvack.org
Subject: Re: Issue on reserving memory with no-map flag in DT
Date: Tue, 20 Jan 2015 10:54:55 +0100 [thread overview]
Message-ID: <54BE25EF.3060004@suse.cz> (raw)
In-Reply-To: <54BD99F7.8050603@codeaurora.org>
On 01/20/2015 12:57 AM, Laura Abbott wrote:
> On 1/19/2015 7:49 AM, Vlastimil Babka wrote:
>> On 01/17/2015 01:24 AM, Laura Abbott wrote:
>>
>> I admit I may not see clearly through all the arch-specific layers and various
>> config option combinations that are possible here, so I might be misinterpreting
>> the code. But I think the problem here is not insufficient allocation size, but
>> something else.
>>
>> The code above continues by this line:
>>
>> pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
>>
>> So, size for the map allocation has already been calculated aligned to
>> MAX_ORDER_NR_PAGES before your patch, and node_mem_map points to the first
>> actually present page, which might be offset from the perfect alignment. Your
>> patch adds another offset to the already aligned size (but you use
>> pageblock_nr_pages which might be lower than MAX_ORDER_NR_PAGES; this seems like
>> a mistake in itself?). So with your patch we have map of aligned size starting
>> from the node_mem_map. This means the last offset-worth of struct pages should
>> be beyond what's needed to access struct page of pgdat_end_pfn(). If we need
>> that extra padding to prevent crashing, then it looks really suspicious...
>>
>> And when I look at node_mem_map usage, I see include/asm/generic/memory_model.h
>> defines __pfn_to_page as (basically)
>>
>> NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\
>>
>> and further above is a generic definition of arch_local_page_offset:
>>
>> #define arch_local_page_offset(pfn, nid) \
>> ((pfn) - NODE_DATA(nid)->node_start_pfn)
>>
>> So it looks correct to me without your patch. The map is allocated aligned,
>> node_mem_map points to this map at the offset corresponding to node_start_pfn,
>> and pfn_to_page subtracts node_start_pfn to get the offset relative to
>> node_mem_map. We shouldn't need the extra padding by the node_start_pfn offset,
>> unless something else is misbehaving here.
>>
>> In the issue fixed by 7c45512 that you refer to, the problem was basically that
>> the allocation didn't use aligned size, but this looks different to me?
>>
>>
>
> With this hard coded debugging:
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..241b870 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5029,6 +5029,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
> map = memblock_virt_alloc_node_nopanic(size,
> pgdat->node_id);
> pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
> + pr_err(">>> node_start_pfn %lx node_end_pfn %lx\n",
> + pgdat->node_start_pfn, pgdat_end_pfn(pgdat));
> + pr_err(">>> size calculated %lx\n", size);
> + pr_err(">>> allocated region %p-%lx\n", map, ((unsigned long)map)+size);
> +
> }
> #ifndef CONFIG_NEED_MULTIPLE_NODES
> /*
> @@ -5043,6 +5048,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
> }
> #endif
> #endif /* CONFIG_FLAT_NODE_MEM_MAP */
> + pr_err(">>> pfn %lx page %p\n", 0x200, pfn_to_page(0x200));
> + pr_err(">>> pfn %lx page %p\n", 0xbffff, pfn_to_page(0xbffff));
> }
>
> void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>
> I get this output:
> [ 0.000000] >>> node_start_pfn 200 node_end_pfn c0000
> [ 0.000000] >>> size calculated 1800000
> [ 0.000000] >>> allocated region edffa000-ef7fa000
> [ 0.000000] >>> pfn 200 page ee002000
> [ 0.000000] >>> pfn bffff page ef7fdfe0
>
> The start and end pfn values are correct but that page value is outside of the
> allocated region for the memory map. This is a CONFIG_FLATMEM system so we
> aren't actually using arch_local_page_offset at all:
>
>
> #define __pfn_to_page(pfn) (mem_map + ((pfn) - ARCH_PFN_OFFSET))
> #define __page_to_pfn(page) ((unsigned long)((page) - mem_map) + \
> ARCH_PFN_OFFSET)
Ah, OK. I searched just for node_mem_map and didn't notice it's also assigned to
mem_map.
> If you do the math, the array size is fine if we don't offset by the
> start but alloc_node_mem_map offsets assuming pfn_to_page will offset
> as well but this doesn't happen in CONFIG_FLATMEM.
>
> Either alloc_node_mem_map needs to drop the offset or the pfn_to_page
> functions need to start adding the offset. It's worth noting that
> this gets corrected properly if we have CONFIG_HAVE_MEMBLOCK_NODE_MAP enabled
> so perhaps the fix is to unoffset for flatmem as well:
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..271c44b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5036,7 +5036,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
> */
> if (pgdat == NODE_DATA(0)) {
> mem_map = NODE_DATA(0)->node_mem_map;
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +#if defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) || defined(CONFIG_FLATMEM)
> if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
> mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);
But is this correcting the same thing? The offset that's added earlier is
(pgdat->node_start_pfn - start) where "start" is just alignment of the
node_start_pfn to MAX_ORDER_NR_PAGES. But here we subtract whole
pgdat->node_start_pfn, minus a ARCH_PFN_OFFSET constant. Is the constant always
equeal to the earlier value of "start", which is calculated dynamically?.
So I agree that mem_map assignment should be fixed, but maybe not exactly like this?
> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>
> Thanks,
> Laura
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-01-20 9:54 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <54B8F63C.1060300@linaro.org>
2015-01-17 0:24 ` Issue on reserving memory with no-map flag in DT Laura Abbott
2015-01-17 8:39 ` Srinivas Kandagatla
2015-01-19 15:49 ` Vlastimil Babka
2015-01-19 23:57 ` Laura Abbott
2015-01-20 9:54 ` Vlastimil Babka [this message]
2015-01-21 1:37 ` [PATCH] mm: Don't offset memmap for flatmem Laura Abbott
2015-01-21 10:15 ` Vlastimil Babka
2015-01-22 1:01 ` [PATCHv2] " Laura Abbott
2015-01-23 0:20 ` Andrew Morton
2015-01-23 0:33 ` Laura Abbott
2015-01-23 9:05 ` Vlastimil Babka
2015-01-26 15:56 ` Mel Gorman
2015-01-29 13:13 ` Vlastimil Babka
2015-02-04 2:25 ` Laura Abbott
2015-02-24 19:54 ` Laura Abbott
2015-02-27 15:24 ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54BE25EF.3060004@suse.cz \
--to=vbabka@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=arnd@arndb.de \
--cc=galak@codeaurora.org \
--cc=khilman@linaro.org \
--cc=lauraa@codeaurora.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-mm@kvack.org \
--cc=linux@arm.linux.org.uk \
--cc=mgorman@suse.de \
--cc=sboyd@codeaurora.org \
--cc=srinivas.kandagatla@linaro.org \
--cc=ssantosh@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).