All of lore.kernel.org
 help / color / mirror / Atom feed
From: vbabka@suse.cz (Vlastimil Babka)
To: linux-arm-kernel@lists.infradead.org
Subject: Issue on reserving memory with no-map flag  in  DT
Date: Tue, 20 Jan 2015 10:54:55 +0100	[thread overview]
Message-ID: <54BE25EF.3060004@suse.cz> (raw)
In-Reply-To: <54BD99F7.8050603@codeaurora.org>

On 01/20/2015 12:57 AM, Laura Abbott wrote:
> On 1/19/2015 7:49 AM, Vlastimil Babka wrote:
>> On 01/17/2015 01:24 AM, Laura Abbott wrote:
>>
>> I admit I may not see clearly through all the arch-specific layers and various
>> config option combinations that are possible here, so I might be misinterpreting
>> the code. But I think the problem here is not insufficient allocation size, but
>> something else.
>>
>> The code above continues by this line:
>>
>> 		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
>>
>> So, size for the map allocation has already been calculated aligned to
>> MAX_ORDER_NR_PAGES before your patch, and node_mem_map points to the first
>> actually present page, which might be offset from the perfect alignment. Your
>> patch adds another offset to the already aligned size (but you use
>> pageblock_nr_pages which might be lower than MAX_ORDER_NR_PAGES; this seems like
>> a mistake in itself?). So with your patch we have map of aligned size starting
>> from the node_mem_map. This means the last offset-worth of struct pages should
>> be beyond what's needed to access struct page of pgdat_end_pfn(). If we need
>> that extra padding to prevent crashing, then it looks really suspicious...
>>
>> And when I look at node_mem_map usage, I see include/asm/generic/memory_model.h
>> defines __pfn_to_page as (basically)
>>
>> NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\
>>
>> and further above is a generic definition of arch_local_page_offset:
>>
>> #define arch_local_page_offset(pfn, nid)        \
>>          ((pfn) - NODE_DATA(nid)->node_start_pfn)
>>
>> So it looks correct to me without your patch. The map is allocated aligned,
>> node_mem_map points to this map at the offset corresponding to node_start_pfn,
>> and pfn_to_page subtracts node_start_pfn to get the offset relative to
>> node_mem_map. We shouldn't need the extra padding by the node_start_pfn offset,
>> unless something else is misbehaving here.
>>
>> In the issue fixed by 7c45512 that you refer to, the problem was basically that
>> the allocation didn't use aligned size, but this looks different to me?
>>
>>
> 
> With this hard coded debugging:
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..241b870 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5029,6 +5029,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>                          map = memblock_virt_alloc_node_nopanic(size,
>                                                                 pgdat->node_id);
>                  pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
> +               pr_err(">>> node_start_pfn %lx node_end_pfn %lx\n",
> +                       pgdat->node_start_pfn, pgdat_end_pfn(pgdat));
> +               pr_err(">>> size calculated %lx\n", size);
> +               pr_err(">>> allocated region %p-%lx\n", map, ((unsigned long)map)+size);
> +
>          }
>   #ifndef CONFIG_NEED_MULTIPLE_NODES
>          /*
> @@ -5043,6 +5048,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>          }
>   #endif
>   #endif /* CONFIG_FLAT_NODE_MEM_MAP */
> +       pr_err(">>> pfn %lx page %p\n", 0x200, pfn_to_page(0x200));
> +       pr_err(">>> pfn %lx page %p\n", 0xbffff, pfn_to_page(0xbffff));
>   }
>   
>   void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> 
> I get this output:
> [    0.000000] >>> node_start_pfn 200 node_end_pfn c0000
> [    0.000000] >>> size calculated 1800000
> [    0.000000] >>> allocated region edffa000-ef7fa000
> [    0.000000] >>> pfn 200 page ee002000
> [    0.000000] >>> pfn bffff page ef7fdfe0
> 
> The start and end pfn values are correct but that page value is outside of the
> allocated region for the memory map. This is a CONFIG_FLATMEM system so we
> aren't actually using arch_local_page_offset at all:
> 
> 
> #define __pfn_to_page(pfn)      (mem_map + ((pfn) - ARCH_PFN_OFFSET))
> #define __page_to_pfn(page)     ((unsigned long)((page) - mem_map) + \
>                                   ARCH_PFN_OFFSET)

Ah, OK. I searched just for node_mem_map and didn't notice it's also assigned to
mem_map.

> If you do the math, the array size is fine if we don't offset by the
> start but alloc_node_mem_map offsets assuming pfn_to_page will offset
> as well but this doesn't happen in CONFIG_FLATMEM.
> 
> Either alloc_node_mem_map needs to drop the offset or the pfn_to_page
> functions need to start adding the offset. It's worth noting that
> this gets corrected properly if we have CONFIG_HAVE_MEMBLOCK_NODE_MAP enabled
> so perhaps the fix is to unoffset for flatmem as well:
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..271c44b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5036,7 +5036,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>           */
>          if (pgdat == NODE_DATA(0)) {
>                  mem_map = NODE_DATA(0)->node_mem_map;
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +#if defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) || defined(CONFIG_FLATMEM)
>                  if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
>                          mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);

But is this correcting the same thing? The offset that's added earlier is
(pgdat->node_start_pfn - start) where "start" is just alignment of the
node_start_pfn to MAX_ORDER_NR_PAGES. But here we subtract whole
pgdat->node_start_pfn, minus a ARCH_PFN_OFFSET constant. Is the constant always
equeal to the earlier value of "start", which is calculated dynamically?.

So I agree that mem_map assignment should be fixed, but maybe not exactly like this?

>   #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> 
> Thanks,
> Laura
> 

WARNING: multiple messages have this Message-ID (diff)
From: Vlastimil Babka <vbabka@suse.cz>
To: Laura Abbott <lauraa@codeaurora.org>,
	Srinivas Kandagatla <srinivas.kandagatla@linaro.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	linux@arm.linux.org.uk, ssantosh@kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>
Cc: Kevin Hilman <khilman@linaro.org>,
	Stephen Boyd <sboyd@codeaurora.org>,
	Arnd Bergmann <arnd@arndb.de>, Kumar Gala <galak@codeaurora.org>,
	linux-mm@kvack.org
Subject: Re: Issue on reserving memory with no-map flag  in  DT
Date: Tue, 20 Jan 2015 10:54:55 +0100	[thread overview]
Message-ID: <54BE25EF.3060004@suse.cz> (raw)
In-Reply-To: <54BD99F7.8050603@codeaurora.org>

On 01/20/2015 12:57 AM, Laura Abbott wrote:
> On 1/19/2015 7:49 AM, Vlastimil Babka wrote:
>> On 01/17/2015 01:24 AM, Laura Abbott wrote:
>>
>> I admit I may not see clearly through all the arch-specific layers and various
>> config option combinations that are possible here, so I might be misinterpreting
>> the code. But I think the problem here is not insufficient allocation size, but
>> something else.
>>
>> The code above continues by this line:
>>
>> 		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
>>
>> So, size for the map allocation has already been calculated aligned to
>> MAX_ORDER_NR_PAGES before your patch, and node_mem_map points to the first
>> actually present page, which might be offset from the perfect alignment. Your
>> patch adds another offset to the already aligned size (but you use
>> pageblock_nr_pages which might be lower than MAX_ORDER_NR_PAGES; this seems like
>> a mistake in itself?). So with your patch we have map of aligned size starting
>> from the node_mem_map. This means the last offset-worth of struct pages should
>> be beyond what's needed to access struct page of pgdat_end_pfn(). If we need
>> that extra padding to prevent crashing, then it looks really suspicious...
>>
>> And when I look at node_mem_map usage, I see include/asm/generic/memory_model.h
>> defines __pfn_to_page as (basically)
>>
>> NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\
>>
>> and further above is a generic definition of arch_local_page_offset:
>>
>> #define arch_local_page_offset(pfn, nid)        \
>>          ((pfn) - NODE_DATA(nid)->node_start_pfn)
>>
>> So it looks correct to me without your patch. The map is allocated aligned,
>> node_mem_map points to this map at the offset corresponding to node_start_pfn,
>> and pfn_to_page subtracts node_start_pfn to get the offset relative to
>> node_mem_map. We shouldn't need the extra padding by the node_start_pfn offset,
>> unless something else is misbehaving here.
>>
>> In the issue fixed by 7c45512 that you refer to, the problem was basically that
>> the allocation didn't use aligned size, but this looks different to me?
>>
>>
> 
> With this hard coded debugging:
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..241b870 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5029,6 +5029,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>                          map = memblock_virt_alloc_node_nopanic(size,
>                                                                 pgdat->node_id);
>                  pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
> +               pr_err(">>> node_start_pfn %lx node_end_pfn %lx\n",
> +                       pgdat->node_start_pfn, pgdat_end_pfn(pgdat));
> +               pr_err(">>> size calculated %lx\n", size);
> +               pr_err(">>> allocated region %p-%lx\n", map, ((unsigned long)map)+size);
> +
>          }
>   #ifndef CONFIG_NEED_MULTIPLE_NODES
>          /*
> @@ -5043,6 +5048,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>          }
>   #endif
>   #endif /* CONFIG_FLAT_NODE_MEM_MAP */
> +       pr_err(">>> pfn %lx page %p\n", 0x200, pfn_to_page(0x200));
> +       pr_err(">>> pfn %lx page %p\n", 0xbffff, pfn_to_page(0xbffff));
>   }
>   
>   void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> 
> I get this output:
> [    0.000000] >>> node_start_pfn 200 node_end_pfn c0000
> [    0.000000] >>> size calculated 1800000
> [    0.000000] >>> allocated region edffa000-ef7fa000
> [    0.000000] >>> pfn 200 page ee002000
> [    0.000000] >>> pfn bffff page ef7fdfe0
> 
> The start and end pfn values are correct but that page value is outside of the
> allocated region for the memory map. This is a CONFIG_FLATMEM system so we
> aren't actually using arch_local_page_offset at all:
> 
> 
> #define __pfn_to_page(pfn)      (mem_map + ((pfn) - ARCH_PFN_OFFSET))
> #define __page_to_pfn(page)     ((unsigned long)((page) - mem_map) + \
>                                   ARCH_PFN_OFFSET)

Ah, OK. I searched just for node_mem_map and didn't notice it's also assigned to
mem_map.

> If you do the math, the array size is fine if we don't offset by the
> start but alloc_node_mem_map offsets assuming pfn_to_page will offset
> as well but this doesn't happen in CONFIG_FLATMEM.
> 
> Either alloc_node_mem_map needs to drop the offset or the pfn_to_page
> functions need to start adding the offset. It's worth noting that
> this gets corrected properly if we have CONFIG_HAVE_MEMBLOCK_NODE_MAP enabled
> so perhaps the fix is to unoffset for flatmem as well:
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..271c44b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5036,7 +5036,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>           */
>          if (pgdat == NODE_DATA(0)) {
>                  mem_map = NODE_DATA(0)->node_mem_map;
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +#if defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) || defined(CONFIG_FLATMEM)
>                  if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
>                          mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);

But is this correcting the same thing? The offset that's added earlier is
(pgdat->node_start_pfn - start) where "start" is just alignment of the
node_start_pfn to MAX_ORDER_NR_PAGES. But here we subtract whole
pgdat->node_start_pfn, minus a ARCH_PFN_OFFSET constant. Is the constant always
equeal to the earlier value of "start", which is calculated dynamically?.

So I agree that mem_map assignment should be fixed, but maybe not exactly like this?

>   #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> 
> Thanks,
> Laura
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-01-20  9:54 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-16 11:30 Issue on reserving memory with no-map flag in DT Srinivas Kandagatla
2015-01-17  0:24 ` Laura Abbott
2015-01-17  0:24   ` Laura Abbott
2015-01-17  8:39   ` Srinivas Kandagatla
2015-01-17  8:39     ` Srinivas Kandagatla
2015-01-19 15:49   ` Vlastimil Babka
2015-01-19 15:49     ` Vlastimil Babka
2015-01-19 23:57     ` Laura Abbott
2015-01-19 23:57       ` Laura Abbott
2015-01-20  9:54       ` Vlastimil Babka [this message]
2015-01-20  9:54         ` Vlastimil Babka
2015-01-21  1:37         ` [PATCH] mm: Don't offset memmap for flatmem Laura Abbott
2015-01-21  1:37           ` Laura Abbott
2015-01-21 10:15           ` Vlastimil Babka
2015-01-21 10:15             ` Vlastimil Babka
2015-01-22  1:01           ` [PATCHv2] " Laura Abbott
2015-01-22  1:01             ` Laura Abbott
2015-01-23  0:20             ` Andrew Morton
2015-01-23  0:20               ` Andrew Morton
2015-01-23  0:33               ` Laura Abbott
2015-01-23  0:33                 ` Laura Abbott
2015-01-23  9:05                 ` Vlastimil Babka
2015-01-23  9:05                   ` Vlastimil Babka
2015-01-26 15:56                   ` Mel Gorman
2015-01-26 15:56                     ` Mel Gorman
2015-01-29 13:13                     ` Vlastimil Babka
2015-01-29 13:13                       ` Vlastimil Babka
2015-02-04  2:25                       ` Laura Abbott
2015-02-04  2:25                         ` Laura Abbott
2015-02-24 19:54                         ` Laura Abbott
2015-02-24 19:54                           ` Laura Abbott
2015-02-27 15:24                           ` Vlastimil Babka
2015-02-27 15:24                             ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54BE25EF.3060004@suse.cz \
    --to=vbabka@suse.cz \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.