Need some help in understanding sparsemem.

linux-mm.kvack.org archive mirror

* Need some help in understanding sparsemem.
@ 2010-07-06  5:11 naren.mehra
  2010-07-06  6:07 ` KAMEZAWA Hiroyuki
  2010-07-06  7:36 ` Minchan Kim
  0 siblings, 2 replies; 8+ messages in thread
From: naren.mehra @ 2010-07-06  5:11 UTC (permalink / raw)
  To: linux-mm

Hi,

I am trying to understand the sparsemem implementation in linux for
NUMA/multiple node systems.

From the available documentation and the sparsemem patches, I gather
that sparsemem divides memory into sections; if a whole section is a
hole, it is marked as an invalid section, and if only some pages in a
section form a hole, those pages are marked reserved. My problem is
that I am not able to map this classification to the code.

E.g. arch-specific code calls memory_present() to prepare a list of
sections in a particular node, but I am unable to find where exactly
sections are marked invalid because they contain a hole.

Can somebody tell me where in the code we identify sections as
invalid and where we mark pages as reserved?

Please correct me if my understanding is wrong.
Also, if there is any article or writeup on sparsemem, please point me to it.

I apologize if I have posted this mail to the wrong mailing list; in
that case, please let me know the correct forum for this question.

Regards,
Naren

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: Need some help in understanding sparsemem.
  2010-07-06  5:11 Need some help in understanding sparsemem naren.mehra
@ 2010-07-06  6:07 ` KAMEZAWA Hiroyuki
  2010-07-06  7:06   ` Minchan Kim
  2010-07-06  7:36 ` Minchan Kim
  1 sibling, 1 reply; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-06  6:07 UTC (permalink / raw)
  To: naren.mehra; +Cc: linux-mm

On Tue, 6 Jul 2010 10:41:06 +0530
naren.mehra@gmail.com wrote:

> Can somebody tell me where in the code are we identifying sections as
> invalid and where we are marking pages as reserved.
> 

As you wrote, memory_present() just sets the SECTION_MARKED_PRESENT flag.
If a section contains both valid pages and holes, the section itself is
still marked as SECTION_MARKED_PRESENT.
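
To make the section granularity concrete, here is a small userspace sketch
(not kernel code) of how a pfn range is rounded to sections and marked
present. The geometry (4 KiB pages, 64 MiB sections), the fixed-size
toy_present[] array and all toy_ names are assumptions for illustration;
the real code keeps the flag in the section_mem_map field of struct
mem_section.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed geometry for illustration: 4 KiB pages, 64 MiB sections. */
#define TOY_PAGE_SHIFT        12
#define TOY_SECTION_SHIFT     26
#define TOY_PFN_SECTION_SHIFT (TOY_SECTION_SHIFT - TOY_PAGE_SHIFT)
#define TOY_PAGES_PER_SECTION (1UL << TOY_PFN_SECTION_SHIFT)
#define TOY_NR_SECTIONS       8

/* Hypothetical stand-in for the per-section "present" flag. */
static bool toy_present[TOY_NR_SECTIONS];

static unsigned long toy_pfn_to_section_nr(unsigned long pfn)
{
    return pfn >> TOY_PFN_SECTION_SHIFT;
}

/* Mark every section overlapped by the pfn range [start, end) as present. */
static void toy_memory_present(unsigned long start, unsigned long end)
{
    unsigned long pfn;

    /* Round start down to a section boundary, as memory_present() does. */
    start &= ~(TOY_PAGES_PER_SECTION - 1);
    for (pfn = start; pfn < end; pfn += TOY_PAGES_PER_SECTION) {
        unsigned long nr = toy_pfn_to_section_nr(pfn);

        assert(nr < TOY_NR_SECTIONS);
        toy_present[nr] = true;
    }
}
```

Calling toy_memory_present(100, 4196) marks section 0 as present even
though only a quarter of it is RAM. Nothing in the loop looks at individual
pages, and a section that no call ever touches simply stays unmarked.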

memory_present() is called at a very early stage. The function which allocates
mem_map (the array of struct page) is sparse_init(). It is called some time after
memory_present().
(On x86 it is called by paging_init(); on ARM it is called by bootmem_init().)

After sparse_init(), the mem_maps are allocated (how depends on config; please see
the code). But at this point mem_map is not initialized.
This is because the initialization logic of memmap doesn't depend on
FLATMEM/DISCONTIGMEM/SPARSEMEM.

Once sparse_init() has allocated mem_map, it's not encouraged to check whether a
section is valid or invalid, but you can use pfn_valid() to check whether there is
a memmap or not.
(*) pfn_valid(pfn) is not for detecting whether there is memory but for detecting
    whether there is a memmap.
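
The distinction can be sketched in userspace as follows. This is a toy
model, not the kernel's pfn_valid(): the section size, the flag value and
the toy_ names are assumptions. The point is only that the check is made
per section, so a pfn inside a hole still passes when its section has a
memmap.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PFN_SECTION_SHIFT   14      /* assumed: 16384 pages per section */
#define TOY_NR_SECTIONS         4UL
#define TOY_SECTION_HAS_MEM_MAP 0x2UL   /* assumed flag value for this model */

/* Hypothetical, simplified stand-in for struct mem_section. */
struct toy_section {
    unsigned long flags;
};

static struct toy_section toy_sections[TOY_NR_SECTIONS];

/* Record that section nr has had a memmap allocated for it. */
static void toy_set_has_mem_map(unsigned long nr)
{
    assert(nr < TOY_NR_SECTIONS);
    toy_sections[nr].flags |= TOY_SECTION_HAS_MEM_MAP;
}

/* True when the section covering pfn has a memmap, regardless of
 * whether that particular pfn is backed by RAM. */
static bool toy_pfn_valid(unsigned long pfn)
{
    unsigned long nr = pfn >> TOY_PFN_SECTION_SHIFT;

    if (nr >= TOY_NR_SECTIONS)
        return false;
    return (toy_sections[nr].flags & TOY_SECTION_HAS_MEM_MAP) != 0;
}
```

After toy_set_has_mem_map(0), every pfn from 0 to 16383 is reported valid,
including any that fall in a hole inside that section.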

Initializing mem_map is done by free_area_init_node(). This function initializes
the memory ranges registered by add_active_range() (see mm/page_alloc.c).
(*) There are architectures which don't use add_active_range(), but this function
    is for generic use.

After free_area_init_node(), all mem_map entries are initialized as PG_reserved and
NODE_DATA(nid)->node_start_pfn, etc. are available.

PG_reserved is cleared in free_all_bootmem(). If you want to keep pages
Reserved (because of holes), either reserve them in bootmem or don't register
the memory hole as bootmem. Then the pages will be kept as Reserved.

Clarification:
 memory_present() .... prepares section[] and marks sections PRESENT.
 sparse_init() .... allocates mem_map, but only allocates it.
 free_area_init_node() .... initializes mem_map et al.
 free_all_bootmem() .... makes pages available and puts them into the buddy allocator.

 pfn_valid() ... useful for checking whether there is a mem_map.

 How to keep pages Reserved ....
                         reserve them in bootmem, or don't register them with bootmem.

All of the above may depend on CONFIG. I hope this can be a hint for you.
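
The four stages above can be sketched as a toy userspace model. This is
not how the kernel implements them: the tiny section size, the structs and
every toy_ function are hypothetical stand-ins, and bootmem is reduced to
"the caller passes the RAM ranges". It only shows which state each stage
leaves behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_PAGES_PER_SECTION 8UL   /* tiny sections keep the model readable */
#define TOY_NR_SECTIONS       4UL

/* Hypothetical, heavily simplified stand-ins for struct page / mem_section. */
struct toy_page {
    bool reserved;              /* models PG_reserved */
};

struct toy_section {
    bool present;               /* models SECTION_MARKED_PRESENT */
    struct toy_page *mem_map;   /* NULL until toy_sparse_init() */
};

static struct toy_section toy_sections[TOY_NR_SECTIONS];

/* Stage 1: mark every section overlapped by [start, end) as present. */
static void toy_memory_present(unsigned long start, unsigned long end)
{
    unsigned long pfn;

    start -= start % TOY_PAGES_PER_SECTION;
    for (pfn = start; pfn < end; pfn += TOY_PAGES_PER_SECTION) {
        assert(pfn / TOY_PAGES_PER_SECTION < TOY_NR_SECTIONS);
        toy_sections[pfn / TOY_PAGES_PER_SECTION].present = true;
    }
}

/* Stage 2: allocate a memmap for each present section. calloc() keeps the
 * model well defined; the kernel does not initialize the entries here. */
static void toy_sparse_init(void)
{
    unsigned long nr;

    for (nr = 0; nr < TOY_NR_SECTIONS; nr++) {
        struct toy_section *ms = &toy_sections[nr];

        if (!ms->present || ms->mem_map)
            continue;
        ms->mem_map = calloc(TOY_PAGES_PER_SECTION, sizeof(*ms->mem_map));
        assert(ms->mem_map);
    }
}

/* Returns the descriptor for pfn, or NULL when no memmap covers it. */
static struct toy_page *toy_pfn_to_page(unsigned long pfn)
{
    unsigned long nr = pfn / TOY_PAGES_PER_SECTION;

    if (nr >= TOY_NR_SECTIONS || !toy_sections[nr].mem_map)
        return NULL;
    return &toy_sections[nr].mem_map[pfn % TOY_PAGES_PER_SECTION];
}

/* Stage 3: initialize every descriptor in [start, end) as reserved,
 * holes included. */
static void toy_free_area_init(unsigned long start, unsigned long end)
{
    unsigned long pfn;

    for (pfn = start; pfn < end; pfn++) {
        struct toy_page *page = toy_pfn_to_page(pfn);

        if (page)
            page->reserved = true;
    }
}

/* Stage 4: release the registered RAM in [start, end); returns the
 * number of pages that left the reserved state. */
static unsigned long toy_free_all_bootmem(unsigned long start, unsigned long end)
{
    unsigned long pfn, released = 0;

    for (pfn = start; pfn < end; pfn++) {
        struct toy_page *page = toy_pfn_to_page(pfn);

        if (page && page->reserved) {
            page->reserved = false;
            released++;
        }
    }
    return released;
}
```

With RAM at pfns 0..4 and 16..23, section 0 ends up with a memmap whose
entries 5..7 stay reserved, section 1 never gets a memmap at all, and
section 2 is released completely.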

Hmm. unexpectedly long..

Thanks,
-Kame


* Re: Need some help in understanding sparsemem.
  2010-07-06  6:07 ` KAMEZAWA Hiroyuki
@ 2010-07-06  7:06   ` Minchan Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Minchan Kim @ 2010-07-06  7:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: naren.mehra, linux-mm

On Tue, Jul 6, 2010 at 3:07 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>  pfn_valid() ... useful for checking there are mem_map.

Kame explained it very well.
I want to elaborate on pfn_valid(), though it's a bit off-topic. ;)

pfn_valid() alone isn't enough on ARM if you walk the whole memmap.
That's because ARM frees the memmap that covers holes, to save memory, in
free_unused_memmap_node().

In that case, you have to use memmap_valid_within().
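
A userspace sketch of the idea, under the assumption that the extra check
works by cross-checking that a descriptor still describes the pfn and zone
the walker expects. The toy struct page below stores its own pfn and zone
id, which the real struct page does not do in this form; every toy_ name
and the 0x5a fill pattern are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NR_PAGES 16UL

/* Hypothetical struct page that records its own pfn and zone id. */
struct toy_page {
    unsigned long pfn;
    int zone_id;
};

static struct toy_page toy_mem_map[TOY_NR_PAGES];

/* Initialize the descriptors for [start, end) in the given zone. */
static void toy_memmap_init(unsigned long start, unsigned long end, int zone_id)
{
    unsigned long pfn;

    assert(start <= end && end <= TOY_NR_PAGES);
    for (pfn = start; pfn < end; pfn++) {
        toy_mem_map[pfn].pfn = pfn;
        toy_mem_map[pfn].zone_id = zone_id;
    }
}

/* Model of freeing the memmap behind a hole: the descriptors now hold
 * unrelated data, represented here by a fixed byte pattern. */
static void toy_free_unused_memmap(unsigned long start, unsigned long end)
{
    assert(start <= end && end <= TOY_NR_PAGES);
    memset(&toy_mem_map[start], 0x5a, (end - start) * sizeof(toy_mem_map[0]));
}

/* Cross-check that the descriptor still describes this pfn and zone. */
static bool toy_memmap_valid_within(unsigned long pfn,
                                    const struct toy_page *page, int zone_id)
{
    return page->pfn == pfn && page->zone_id == zone_id;
}
```

A walker that only asks "does this section have a memmap?" would accept
pfns 4..7 after toy_free_unused_memmap(4, 8); the cross-check rejects them
because the stored pfn and zone no longer match.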

-- 
Kind regards,
Minchan Kim


* Re: Need some help in understanding sparsemem.
  2010-07-06  5:11 Need some help in understanding sparsemem naren.mehra
  2010-07-06  6:07 ` KAMEZAWA Hiroyuki
@ 2010-07-06  7:36 ` Minchan Kim
  2010-07-06 10:48   ` naren.mehra
  1 sibling, 1 reply; 8+ messages in thread
From: Minchan Kim @ 2010-07-06  7:36 UTC (permalink / raw)
  To: naren.mehra; +Cc: linux-mm

On Tue, Jul 6, 2010 at 2:11 PM,  <naren.mehra@gmail.com> wrote:
> e.g. from arch specific code, we call memory_present()  to prepare a
> list of sections in a particular node. but unable to find where
> exactly some sections are marked invalid because they contain a hole.

On ARM with sparsemem:

static void arm_memory_present(struct meminfo *mi)
{
        int i;
        for_each_bank(i, mi) {
                struct membank *bank = &mi->bank[i];
                memory_present(0, bank_pfn_start(bank), bank_pfn_end(bank));
        }
}

It just marks each _bank_ which has memory with SECTION_MARKED_PRESENT.
Everything else is a hole.

>
> Can somebody tell me where in the code are we identifying sections as
> invalid and where we are marking pages as reserved.

Do you mean memmap_init_zone?


-- 
Kind regards,
Minchan Kim


* Re: Need some help in understanding sparsemem.
  2010-07-06  7:36 ` Minchan Kim
@ 2010-07-06 10:48   ` naren.mehra
  2010-07-07  3:49     ` Minchan Kim
  0 siblings, 1 reply; 8+ messages in thread
From: naren.mehra @ 2010-07-06 10:48 UTC (permalink / raw)
  To: Minchan Kim, kamezawa.hiroyu; +Cc: linux-mm

Thanks Kame for your elaborate response; I got a lot of pointers on
where to look in the code.
Kim, thanks for pointing out memmap_init_zone().
So basically, for sections which contain holes, the mem_map
of those sections skips the entries for the invalid pages (holes).
This happens in memmap_init_zone().
1) So does that mean that all the sections get the initial allocation of
mem_map, and in memmap_init_zone() we decide whether or not a page requires
a mem_map entry? Correct ??

2) Both of you mentioned that
> "If a section contains both of valid pages and
> holes, the section itself is marked as SECTION_MARKED_PRESENT."
> "It just mark _bank_ which has memory with SECTION_MARKED_PRESENT.
> Otherwise, Hole."

which happens in memory_present(). In the memory_present() code, I am not
able to find where we do this classification of valid
sections/banks. To me it looks like memory_present() marks all the
sections as present and doesn't verify whether a section contains any
valid pages or not. Correct ??

void __init memory_present(int nid, unsigned long start, unsigned long end)
{
        unsigned long pfn;

        start &= PAGE_SECTION_MASK;
        mminit_validate_memmodel_limits(&start, &end);
        for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
                /* find out the section no. of the given pfn */
                unsigned long section = pfn_to_section_nr(pfn);
                struct mem_section *ms;

                /* allocate a new section pointer to the mem_section array */
                sparse_index_init(section, nid);
                /* store the node id for the particular page. */
                set_section_nid(section, nid);

                /* get the pointer to the mem_section */
                ms = __nr_to_section(section);
                /* mark present, if not already marked. */
                if (!ms->section_mem_map)
                        ms->section_mem_map = sparse_encode_early_nid(nid) |
                                                        SECTION_MARKED_PRESENT;
        }
}

I know I am missing something very simple... please point it out if possible.

Regards,
Naren


* Re: Need some help in understanding sparsemem.
  2010-07-06 10:48   ` naren.mehra
@ 2010-07-07  3:49     ` Minchan Kim
  2010-07-09  7:05       ` naren.mehra
  0 siblings, 1 reply; 8+ messages in thread
From: Minchan Kim @ 2010-07-07  3:49 UTC (permalink / raw)
  To: naren.mehra; +Cc: kamezawa.hiroyu, linux-mm

On Tue, Jul 6, 2010 at 7:48 PM,  <naren.mehra@gmail.com> wrote:
> Thanks Kame for your elaborate response, I got a lot of pointers on
> where to look for in the code.
> Kim, thanks for pointing out memmap_init_zone.
> So basically those sections which contains holes in them, the mem_map
> in those sections skip the entry for the invalid pages (holes).
> This happens in memmap_init_zone().
> 1) So it means that all the sections get the initial allocation of
> mem_map and in memmap_init_zone we decide whether or not it requires

Yes. The kernel allocates a memmap for every non-empty section.
It even allocates a memmap for a section which has a mix of valid
and invalid (e.g. hole) pages. For example, a bank supports 64M but the system
has only 16M. Let's assume the section size is 64M. In this case, the section has
a hole of 48M.
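
The 64M/16M example works out as follows in page terms. This is a small
arithmetic sketch that assumes 4 KiB pages; the struct and function names
are made up for illustration.

```c
#include <assert.h>

#define TOY_PAGE_SIZE (4UL * 1024)      /* assumed 4 KiB pages */
#define TOY_MiB       (1024UL * 1024)

struct toy_section_usage {
    unsigned long total_pages;   /* struct page entries in the section's memmap */
    unsigned long valid_pages;   /* entries backed by RAM */
    unsigned long hole_pages;    /* entries with no RAM behind them */
};

/* Split a section's memmap entries into RAM-backed and hole entries. */
static struct toy_section_usage
toy_compute_usage(unsigned long section_bytes, unsigned long ram_bytes)
{
    struct toy_section_usage u;

    assert(ram_bytes <= section_bytes);
    u.total_pages = section_bytes / TOY_PAGE_SIZE;
    u.valid_pages = ram_bytes / TOY_PAGE_SIZE;
    u.hole_pages = u.total_pages - u.valid_pages;
    return u;
}
```

For toy_compute_usage(64 * TOY_MiB, 16 * TOY_MiB) the memmap has 16384
entries, of which 4096 describe real RAM and 12288 describe the hole.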

> any mem_map entry. Correct ??

No. memmap_init_zone() doesn't care about that.
Regardless of holes, it initializes all page descriptors (including the struct
pages which sit on a hole).
But page descriptors on holes stay _Reserved_, so they don't go to the
buddy allocator as free pages. To make that work, free_bootmem_node() clears the
bitmap bits (marks 0x0) only for the _valid_ pages of each bank. Afterwards,
free_all_bootmem_core() uses the bitmap and so doesn't insert pages on holes
into buddy. On ARM, even the memmap covering a hole may be freed by
free_unused_memmap_node().
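
The bitmap mechanism can be sketched in userspace like this. It is a toy
model, not the bootmem implementation: one byte per page stands in for one
bit, the toy_page_reserved[] array stands in for PG_reserved, and all toy_
names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NR_PAGES 32UL

/* 1 = reserved, 0 = free; one byte per page instead of one bit. */
static unsigned char toy_bitmap[TOY_NR_PAGES];
/* Models PG_reserved in each page descriptor. */
static bool toy_page_reserved[TOY_NR_PAGES];

/* Start with every page reserved, in both the bitmap and the memmap. */
static void toy_init_bootmem(void)
{
    unsigned long pfn;

    memset(toy_bitmap, 1, sizeof(toy_bitmap));
    for (pfn = 0; pfn < TOY_NR_PAGES; pfn++)
        toy_page_reserved[pfn] = true;
}

/* Register usable RAM: clear the bitmap for [start, end). */
static void toy_free_bootmem(unsigned long start, unsigned long end)
{
    assert(start <= end && end <= TOY_NR_PAGES);
    memset(&toy_bitmap[start], 0, end - start);
}

/* Keep [start, end) away from the allocator even though it is RAM. */
static void toy_reserve_bootmem(unsigned long start, unsigned long end)
{
    assert(start <= end && end <= TOY_NR_PAGES);
    memset(&toy_bitmap[start], 1, end - start);
}

/* Hand every page whose bitmap entry is clear to the (modelled) buddy
 * allocator; returns how many pages were released. */
static unsigned long toy_free_all_bootmem(void)
{
    unsigned long pfn, released = 0;

    for (pfn = 0; pfn < TOY_NR_PAGES; pfn++) {
        if (toy_bitmap[pfn])
            continue;           /* hole or explicitly reserved */
        toy_page_reserved[pfn] = false;
        released++;
    }
    return released;
}
```

With banks at pfns 0..7 and 16..31 registered through toy_free_bootmem()
and pfns 0..1 reserved again, the hole at 8..15 is never touched: its
bitmap entries stay set, so its descriptors remain reserved without any
hole-specific code.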

>
> 2) Both of you mentioned that
>> "If a section contains both of valid pages and
>> holes, the section itself is marked as SECTION_MARKED_PRESENT."
>> "It just mark _bank_ which has memory with SECTION_MARKED_PRESENT.
>> Otherwise, Hole."
>
> which happens in memory_present(). In memory_present() code, I am not
> able to find anything where we are doing this classification of valid
> section/bank ? To me it looks that memory_present marks, all the
> sections as present and doesnt verify whether any section contains any
> valid pages or not. Correct ??

memory_present() is only called on banks.
So sections which consist entirely of a hole are never marked SECTION_MARKED_PRESENT.

I hope this helps you.

-- 
Kind regards,
Minchan Kim


* Re: Need some help in understanding sparsemem.
  2010-07-07  3:49     ` Minchan Kim
@ 2010-07-09  7:05       ` naren.mehra
  2010-07-09  7:49         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 8+ messages in thread
From: naren.mehra @ 2010-07-09  7:05 UTC (permalink / raw)
  To: Minchan Kim; +Cc: kamezawa.hiroyu, linux-mm

Thanks to you guys, I am now getting a grip on the sparsemem code.
While going through the code, I came across several instances of the following:
#ifndef CONFIG_NEED_MULTIPLE_NODES
.
<some code>
.
#endif

Now, it seems like this configuration option is used when there are
multiple nodes in a system.
But it's linked to/depends on NUMA/discontigmem.

It could be possible that we have multiple nodes in a UMA system.
How can sparsemem handle such cases ??


Regards,
Naren



* Re: Need some help in understanding sparsemem.
  2010-07-09  7:05       ` naren.mehra
@ 2010-07-09  7:49         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-09  7:49 UTC (permalink / raw)
  To: naren.mehra; +Cc: Minchan Kim, linux-mm

On Fri, 9 Jul 2010 12:35:17 +0530
naren.mehra@gmail.com wrote:

> Thanks to you guys, I am now getting a grip on the sparsemem code.
> While going through the code, I came across several instances of the following:
> #ifndef CONFIG_NEED_MULTIPLE_NODES
> .
> <some code>
> .
> #endif
> 
> Now, it seems like this configuration option is used in case there are
> multiple nodes in a system.
> But its linked/depends on NUMA/discontigmem.
> 
> It could be possible that we have multiple nodes in a UMA system.
> How can sparsemem handle such cases ??
> 

sparsemem can be used in both the UMA and NUMA cases. IOW, sparsemem is for handling
memmap (the array of struct page) for flexible memory layouts, and not for NUMA.
So NUMA/MULTIPLENODE and SPARSEMEM basically have no relationship.
"nid" is recorded just for detecting the nearest node when allocating mem_map.
(And some 32-bit archs record some 'nid' information.)
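
How a node id can share one word with the present flag can be sketched as
below. The bit positions (flag in bit 0, nid shifted up by two bits) are
assumptions for this model and should be checked against
include/linux/mmzone.h and mm/sparse.c for a given kernel version; the toy_
names are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout for this model: bit 0 is the "present" flag and the
 * node id sits above the two low flag bits. */
#define TOY_SECTION_MARKED_PRESENT (1UL << 0)
#define TOY_SECTION_NID_SHIFT      2

/* Pack the node id and the present flag into one word. */
static unsigned long toy_encode_early_nid(int nid)
{
    assert(nid >= 0);
    return ((unsigned long)nid << TOY_SECTION_NID_SHIFT) |
           TOY_SECTION_MARKED_PRESENT;
}

/* Recover the node id, e.g. to pick the node to allocate mem_map from. */
static int toy_early_nid(unsigned long section_mem_map)
{
    return (int)(section_mem_map >> TOY_SECTION_NID_SHIFT);
}

static bool toy_section_present(unsigned long section_mem_map)
{
    return (section_mem_map & TOY_SECTION_MARKED_PRESENT) != 0;
}
```

On a UMA system every section would simply carry nid 0. The encoded word is
still non-zero because of the present flag, which is what lets a plain
"is this word zero?" test distinguish marked sections from untouched ones.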

So you shouldn't get confused by sparsemem when you think about
NUMA/MULTIPLENODE. Please look at free_area_init_nodes(), add_active_range(), and
remove_active_range(). They are for MULTIPLENODES.

Thanks,
-Kame
