From: Yinghai Lu <yhlu.kernel.send@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>, Ingo Molnar <mingo@elte.hu>
Cc: kernel list <linux-kernel@vger.kernel.org>,
Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>,
linuxppc-dev@ozlabs.org, Badari Pulavarty <pbadari@us.ibm.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>
Subject: [PATCH] mm: make mem_map allocation continuous v2.
Date: Wed, 2 Apr 2008 18:30:24 -0700
Message-ID: <200804021830.24563.yhlu.kernel@gmail.com>
In-Reply-To: <86802c440804021744m7c6e3d94vcb6af3ebcaa71b5b@mail.gmail.com>
The vmemmap allocation currently gives:
[ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
[ffffe20000200000-ffffe200003fffff] PMD ->ffff810001800000 on node 0
[ffffe20000400000-ffffe200005fffff] PMD ->ffff810001c00000 on node 0
[ffffe20000600000-ffffe200007fffff] PMD ->ffff810002000000 on node 0
[ffffe20000800000-ffffe200009fffff] PMD ->ffff810002400000 on node 0
...
There is a 2M hole between consecutive mem_map chunks.
The root cause is that a usemap (24 bytes) is allocated after every 2M
mem_map, which pushes the next vmemmap chunk (2M) up to the next 2M boundary.
Solution:
Try to allocate the mem_map chunks contiguously.
After the patch, we get:
[ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
[ffffe20000200000-ffffe200003fffff] PMD ->ffff810001600000 on node 0
[ffffe20000400000-ffffe200005fffff] PMD ->ffff810001800000 on node 0
[ffffe20000600000-ffffe200007fffff] PMD ->ffff810001a00000 on node 0
[ffffe20000800000-ffffe200009fffff] PMD ->ffff810001c00000 on node 0
...
The usemaps now share pages, because they are allocated contiguously too:
sparse_early_usemap_alloc: usemap = ffff810024e00000 size = 24
sparse_early_usemap_alloc: usemap = ffff810024e00080 size = 24
sparse_early_usemap_alloc: usemap = ffff810024e00100 size = 24
sparse_early_usemap_alloc: usemap = ffff810024e00180 size = 24
...
So the bootmem allocation becomes more compact and the usemaps use less memory.
For powerpc:
Badari Pulavarty <pbadari@us.ibm.com> wrote:
> You have to call sparse_init_one_section() on each pmap and usemap
> as we allocate - since valid_section() depends on it (which is needed
> by vmemmap_populate() to check if the section is populated or not).
> On ppc, we need to call htab_bolted_mapping() on each section and
> we need to skip existing sections.
So allocate all the usemaps together first, then allocate each mem_map and
initialize its section right away.
v2 replaces:
[PATCH] mm: make mem_map allocation continuous.
[PATCH] mm: allocate section_map for sparse_init
[PATCH] mm: allocate usemap at first instead of mem_map in sparse_init
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
diff --git a/mm/sparse.c b/mm/sparse.c
index f6a43c0..2881222 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -294,22 +294,48 @@ void __init sparse_init(void)
 	unsigned long pnum;
 	struct page *map;
 	unsigned long *usemap;
+	unsigned long **usemap_map;
+	int size;
+
+	/*
+	 * map is using big page (aka 2M in x86 64 bit)
+	 * usemap is less one page (aka 24 bytes)
+	 * so alloc 2M (with 2M align) and 24 bytes in turn will
+	 * make next 2M slip to one more 2M later.
+	 * then in big system, the memory will have a lot of holes...
+	 * here try to allocate 2M pages continously.
+	 *
+	 * powerpc need to call sparse_init_one_section right after each
+	 * sparse_early_mem_map_alloc, so allocate usemap_map at first.
+	 */
+	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
+	usemap_map = alloc_bootmem(size);
+	if (!usemap_map)
+		panic("can not allocate usemap_map\n");
 	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
 		if (!present_section_nr(pnum))
 			continue;
+		usemap_map[pnum] = sparse_early_usemap_alloc(pnum);
+	}
-		map = sparse_early_mem_map_alloc(pnum);
-		if (!map)
+	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
+		if (!present_section_nr(pnum))
 			continue;
-		usemap = sparse_early_usemap_alloc(pnum);
+		usemap = usemap_map[pnum];
 		if (!usemap)
 			continue;
+		map = sparse_early_mem_map_alloc(pnum);
+		if (!map)
+			continue;
+
 		sparse_init_one_section(__nr_to_section(pnum), pnum, map,
 					usemap);
 	}
+
+	free_bootmem(__pa(usemap_map), size);
 }
 #ifdef CONFIG_MEMORY_HOTPLUG