* [PATCH] sparsemem/bootmem: catch greater than section size allocations
@ 2012-02-24 19:33 Nishanth Aravamudan
2012-02-28 13:53 ` Johannes Weiner
2012-02-28 15:47 ` Mel Gorman
0 siblings, 2 replies; 11+ messages in thread
From: Nishanth Aravamudan @ 2012-02-24 19:33 UTC (permalink / raw)
To: Andrew Morton
Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras,
Robert Jennings, linuxppc-dev
While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
Overcommit) on powerpc, we tripped the following:
kernel BUG at mm/bootmem.c:483!
cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
sp: c000000000c03bc0
msr: 8000000000021032
current = 0xc000000000b0cce0
paca = 0xc000000001d80000
pid = 0, comm = swapper
kernel BUG at mm/bootmem.c:483!
enter ? for help
[c000000000c03c80] c000000000a64bcc
.sparse_early_usemaps_alloc_node+0x84/0x29c
[c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
[c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
[c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
[c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
This is
BUG_ON(limit && goal + size > limit);
and after some debugging, it seems that
goal = 0x7ffff000000
limit = 0x80000000000
and sparse_early_usemaps_alloc_node ->
sparse_early_usemaps_alloc_pgdat_section calls
return alloc_bootmem_section(usemap_size() * count, section_nr);
This is on a system with 8TB available via the AMS pool, and as a quirk
of AMS in firmware, all of that memory shows up in node 0. So, we end up
with an allocation that will fail the goal/limit constraints. In theory,
we could "fall-back" to alloc_bootmem_node() in
sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
defined, we'll BUG_ON() instead. A simple solution appears to be to
disable the limit check if the size of the allocation in
alloc_bootmem_section() exceeds the section size.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org
Cc: linuxppc-dev@lists.ozlabs.org
---
include/linux/mmzone.h | 2 ++
mm/bootmem.c | 5 ++++-
2 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 650ba2f..4176834 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -967,6 +967,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
* PA_SECTION_SHIFT physical address to/from section number
* PFN_SECTION_SHIFT pfn to/from section number
*/
+#define BYTES_PER_SECTION (1UL << SECTION_SIZE_BITS)
+
#define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
#define PA_SECTION_SHIFT (SECTION_SIZE_BITS)
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 668e94d..5cbbc76 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -770,7 +770,10 @@ void * __init alloc_bootmem_section(unsigned long size,
pfn = section_nr_to_pfn(section_nr);
goal = pfn << PAGE_SHIFT;
- limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
+ if (size > BYTES_PER_SECTION)
+ limit = 0;
+ else
+ limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
--
1.7.5.4
* Re: [PATCH] sparsemem/bootmem: catch greater than section size allocations
2012-02-24 19:33 [PATCH] sparsemem/bootmem: catch greater than section size allocations Nishanth Aravamudan
@ 2012-02-28 13:53 ` Johannes Weiner
2012-02-28 20:11 ` Nishanth Aravamudan
2012-02-28 15:47 ` Mel Gorman
1 sibling, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2012-02-28 13:53 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras,
Andrew Morton, Robert Jennings, linuxppc-dev
On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote:
> While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> Overcommit) on powerpc, we tripped the following:
>
> kernel BUG at mm/bootmem.c:483!
> cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
> pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
> lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
> sp: c000000000c03bc0
> msr: 8000000000021032
> current = 0xc000000000b0cce0
> paca = 0xc000000001d80000
> pid = 0, comm = swapper
> kernel BUG at mm/bootmem.c:483!
> enter ? for help
> [c000000000c03c80] c000000000a64bcc
> .sparse_early_usemaps_alloc_node+0x84/0x29c
> [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
> [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
> [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
> [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
>
> This is
>
> BUG_ON(limit && goal + size > limit);
>
> and after some debugging, it seems that
>
> goal = 0x7ffff000000
> limit = 0x80000000000
>
> and sparse_early_usemaps_alloc_node ->
> sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls
>
> return alloc_bootmem_section(usemap_size() * count, section_nr);
>
> This is on a system with 8TB available via the AMS pool, and as a quirk
> of AMS in firmware, all of that memory shows up in node 0. So, we end up
> with an allocation that will fail the goal/limit constraints. In theory,
> we could "fall-back" to alloc_bootmem_node() in
> sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> defined, we'll BUG_ON() instead. A simple solution appears to be to
> disable the limit check if the size of the allocation in
> alloc_bootmem_section() exceeds the section size.
It makes sense to allow the usemaps to spill over to subsequent
sections instead of panicking, so FWIW:
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
That being said, it would be good if check_usemap_section_nr() printed
the cross-dependencies between pgdats and sections when the usemaps of
a node spilled over to other sections than the ones holding the pgdat.
How about this?
---
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: sparsemem/bootmem: catch greater than section size allocations fix
If alloc_bootmem_section() no longer guarantees section-locality, we
need check_usemap_section_nr() to print possible cross-dependencies
between node descriptors and the usemaps allocated through it.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
diff --git a/mm/sparse.c b/mm/sparse.c
index 61d7cde..9e032dc 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -359,6 +359,7 @@ static void __init sparse_early_usemaps_alloc_node(unsigned long**usemap_map,
continue;
usemap_map[pnum] = usemap;
usemap += size;
+ check_usemap_section_nr(nodeid, usemap_map[pnum]);
}
return;
}
---
Furthermore, I wonder if we can remove the sparse-specific stuff from
bootmem.c as well, as now even more so than before, calculating the
desired area is really none of bootmem's business.
Would something like this be okay?
---
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: [patch] mm: remove sparsemem allocation details from the bootmem allocator
alloc_bootmem_section() derives allocation area constraints from the
specified sparsemem section. This is a bit specific for a generic
memory allocator like bootmem, though, so move it over to sparsemem.
Since __alloc_bootmem_node() already retries failed allocations with
relaxed area constraints, the fallback code in sparsemem.c can be
removed and the code becomes a bit more compact overall.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/bootmem.h | 3 ---
mm/bootmem.c | 26 --------------------------
mm/sparse.c | 29 +++++++++--------------------
3 files changed, 9 insertions(+), 49 deletions(-)
diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index ab344a5..001c248 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -135,9 +135,6 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
extern int reserve_bootmem_generic(unsigned long addr, unsigned long size,
int flags);
-extern void *alloc_bootmem_section(unsigned long size,
- unsigned long section_nr);
-
#ifdef CONFIG_HAVE_ARCH_ALLOC_REMAP
extern void *alloc_remap(int nid, unsigned long size);
#else
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 7bc0557..d34026c 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -756,32 +756,6 @@ void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
}
-#ifdef CONFIG_SPARSEMEM
-/**
- * alloc_bootmem_section - allocate boot memory from a specific section
- * @size: size of the request in bytes
- * @section_nr: sparse map section to allocate from
- *
- * Return NULL on failure.
- */
-void * __init alloc_bootmem_section(unsigned long size,
- unsigned long section_nr)
-{
- bootmem_data_t *bdata;
- unsigned long pfn, goal, limit;
-
- pfn = section_nr_to_pfn(section_nr);
- goal = pfn << PAGE_SHIFT;
- if (size > BYTES_PER_SECTION)
- limit = 0;
- else
- limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
- bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
-
- return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
-}
-#endif
-
void * __init __alloc_bootmem_node_nopanic(pg_data_t *pgdat, unsigned long size,
unsigned long align, unsigned long goal)
{
diff --git a/mm/sparse.c b/mm/sparse.c
index 9e032dc..ac0d5a3 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -273,10 +273,10 @@ static unsigned long *__kmalloc_section_usemap(void)
#ifdef CONFIG_MEMORY_HOTREMOVE
static unsigned long * __init
sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
- unsigned long count)
+ unsigned long size)
{
- unsigned long section_nr;
-
+ pg_data_t *host_pgdat;
+ unsigned long goal;
/*
* A page may contain usemaps for other sections preventing the
* page being freed and making a section unremovable while
@@ -287,8 +287,9 @@ sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
* from the same section as the pgdat where possible to avoid
* this problem.
*/
- section_nr = pfn_to_section_nr(__pa(pgdat) >> PAGE_SHIFT);
- return alloc_bootmem_section(usemap_size() * count, section_nr);
+ goal = __pa(pgdat) & PAGE_SECTION_MASK;
+ host_pgdat = NODE_DATA(early_pfn_to_nid(goal));
+ return __alloc_bootmem_node(host_pgdat, size, SMP_CACHE_BYTES, goal);
}
static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
@@ -332,9 +333,9 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
#else
static unsigned long * __init
sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
- unsigned long count)
+ unsigned long size)
{
- return NULL;
+ return alloc_bootmem_node(pgdat, size);
}
static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
@@ -352,19 +353,7 @@ static void __init sparse_early_usemaps_alloc_node(unsigned long**usemap_map,
int size = usemap_size();
usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
- usemap_count);
- if (usemap) {
- for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
- if (!present_section_nr(pnum))
- continue;
- usemap_map[pnum] = usemap;
- usemap += size;
- check_usemap_section_nr(nodeid, usemap_map[pnum]);
- }
- return;
- }
-
- usemap = alloc_bootmem_node(NODE_DATA(nodeid), size * usemap_count);
+ size * usemap_count);
if (usemap) {
for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
if (!present_section_nr(pnum))
--
1.7.7.6
* Re: [PATCH] sparsemem/bootmem: catch greater than section size allocations
2012-02-24 19:33 [PATCH] sparsemem/bootmem: catch greater than section size allocations Nishanth Aravamudan
2012-02-28 13:53 ` Johannes Weiner
@ 2012-02-28 15:47 ` Mel Gorman
2012-02-29 18:12 ` [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section Nishanth Aravamudan
1 sibling, 1 reply; 11+ messages in thread
From: Mel Gorman @ 2012-02-28 15:47 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras,
Andrew Morton, Robert Jennings, linuxppc-dev
On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote:
> While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> Overcommit) on powerpc, we tripped the following:
>
> kernel BUG at mm/bootmem.c:483!
> cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
> pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
> lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
> sp: c000000000c03bc0
> msr: 8000000000021032
> current = 0xc000000000b0cce0
> paca = 0xc000000001d80000
> pid = 0, comm = swapper
> kernel BUG at mm/bootmem.c:483!
> enter ? for help
> [c000000000c03c80] c000000000a64bcc
> .sparse_early_usemaps_alloc_node+0x84/0x29c
> [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
> [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
> [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
> [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
>
> This is
>
> BUG_ON(limit && goal + size > limit);
>
> and after some debugging, it seems that
>
> goal = 0x7ffff000000
> limit = 0x80000000000
>
> and sparse_early_usemaps_alloc_node ->
> sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls
>
> return alloc_bootmem_section(usemap_size() * count, section_nr);
>
> This is on a system with 8TB available via the AMS pool, and as a quirk
> of AMS in firmware, all of that memory shows up in node 0. So, we end up
> with an allocation that will fail the goal/limit constraints. In theory,
> we could "fall-back" to alloc_bootmem_node() in
> sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> defined, we'll BUG_ON() instead. A simple solution appears to be to
> disable the limit check if the size of the allocation in
> alloc_bootmem_section() exceeds the section size.
>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> Cc: Dave Hansen <haveblue@us.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
> Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
> Cc: linux-mm@kvack.org
> Cc: linuxppc-dev@lists.ozlabs.org
> ---
> include/linux/mmzone.h | 2 ++
> mm/bootmem.c | 5 ++++-
> 2 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 650ba2f..4176834 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -967,6 +967,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
> * PA_SECTION_SHIFT physical address to/from section number
> * PFN_SECTION_SHIFT pfn to/from section number
> */
> +#define BYTES_PER_SECTION (1UL << SECTION_SIZE_BITS)
> +
> #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
>
> #define PA_SECTION_SHIFT (SECTION_SIZE_BITS)
> diff --git a/mm/bootmem.c b/mm/bootmem.c
> index 668e94d..5cbbc76 100644
> --- a/mm/bootmem.c
> +++ b/mm/bootmem.c
> @@ -770,7 +770,10 @@ void * __init alloc_bootmem_section(unsigned long size,
>
> pfn = section_nr_to_pfn(section_nr);
> goal = pfn << PAGE_SHIFT;
> - limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
> + if (size > BYTES_PER_SECTION)
> + limit = 0;
> + else
> + limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
As it's ok to spill the allocation over to an adjacent section, why not
just make limit==0 unconditionally. That would avoid defining
BYTES_PER_SECTION.
--
Mel Gorman
SUSE Labs
* Re: [PATCH] sparsemem/bootmem: catch greater than section size allocations
2012-02-28 13:53 ` Johannes Weiner
@ 2012-02-28 20:11 ` Nishanth Aravamudan
2012-02-29 9:17 ` Johannes Weiner
0 siblings, 1 reply; 11+ messages in thread
From: Nishanth Aravamudan @ 2012-02-28 20:11 UTC (permalink / raw)
To: Johannes Weiner
Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras,
Nishanth Aravamudan, Andrew Morton, Robert Jennings, linuxppc-dev
On 28.02.2012 [14:53:26 +0100], Johannes Weiner wrote:
> On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote:
> > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > Overcommit) on powerpc, we tripped the following:
> >
> > kernel BUG at mm/bootmem.c:483!
> > cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
> > pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
> > lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
> > sp: c000000000c03bc0
> > msr: 8000000000021032
> > current = 0xc000000000b0cce0
> > paca = 0xc000000001d80000
> > pid = 0, comm = swapper
> > kernel BUG at mm/bootmem.c:483!
> > enter ? for help
> > [c000000000c03c80] c000000000a64bcc
> > .sparse_early_usemaps_alloc_node+0x84/0x29c
> > [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
> > [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
> > [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
> > [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
> >
> > This is
> >
> > BUG_ON(limit && goal + size > limit);
> >
> > and after some debugging, it seems that
> >
> > goal = 0x7ffff000000
> > limit = 0x80000000000
> >
> > and sparse_early_usemaps_alloc_node ->
> > sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls
> >
> > return alloc_bootmem_section(usemap_size() * count, section_nr);
> >
> > This is on a system with 8TB available via the AMS pool, and as a quirk
> > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > with an allocation that will fail the goal/limit constraints. In theory,
> > we could "fall-back" to alloc_bootmem_node() in
> > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > disable the limit check if the size of the allocation in
> > alloc_bootmem_section() exceeds the section size.
>
> It makes sense to allow the usemaps to spill over to subsequent
> sections instead of panicking, so FWIW:
>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>
> That being said, it would be good if check_usemap_section_nr() printed
> the cross-dependencies between pgdats and sections when the usemaps of
> a node spilled over to other sections than the ones holding the pgdat.
>
> How about this?
>
> ---
> From: Johannes Weiner <hannes@cmpxchg.org>
> Subject: sparsemem/bootmem: catch greater than section size allocations fix
>
> If alloc_bootmem_section() no longer guarantees section-locality, we
> need check_usemap_section_nr() to print possible cross-dependencies
> between node descriptors and the usemaps allocated through it.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 61d7cde..9e032dc 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -359,6 +359,7 @@ static void __init sparse_early_usemaps_alloc_node(unsigned long**usemap_map,
> continue;
> usemap_map[pnum] = usemap;
> usemap += size;
> + check_usemap_section_nr(nodeid, usemap_map[pnum]);
> }
> return;
> }
This makes sense to me -- ok if I fold it into the re-worked patch
(based upon Mel's comments)?
> ---
>
> Furthermore, I wonder if we can remove the sparse-specific stuff from
> bootmem.c as well, as now even more so than before, calculating the
> desired area is really none of bootmem's business.
>
> Would something like this be okay?
>
> ---
> From: Johannes Weiner <hannes@cmpxchg.org>
> Subject: [patch] mm: remove sparsemem allocation details from the bootmem allocator
>
> alloc_bootmem_section() derives allocation area constraints from the
> specified sparsemem section. This is a bit specific for a generic
> memory allocator like bootmem, though, so move it over to sparsemem.
>
> Since __alloc_bootmem_node() already retries failed allocations with
> relaxed area constraints, the fallback code in sparsemem.c can be
> removed and the code becomes a bit more compact overall.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
I've not tested it, but the intention seems sensible. I think it should
remain a separate change.
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
* Re: [PATCH] sparsemem/bootmem: catch greater than section size allocations
2012-02-28 20:11 ` Nishanth Aravamudan
@ 2012-02-29 9:17 ` Johannes Weiner
0 siblings, 0 replies; 11+ messages in thread
From: Johannes Weiner @ 2012-02-29 9:17 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras,
Nishanth Aravamudan, Andrew Morton, Robert Jennings, linuxppc-dev
On Tue, Feb 28, 2012 at 12:11:51PM -0800, Nishanth Aravamudan wrote:
> On 28.02.2012 [14:53:26 +0100], Johannes Weiner wrote:
> > On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote:
> > > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > > Overcommit) on powerpc, we tripped the following:
> > >
> > > kernel BUG at mm/bootmem.c:483!
> > > cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
> > > pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
> > > lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
> > > sp: c000000000c03bc0
> > > msr: 8000000000021032
> > > current = 0xc000000000b0cce0
> > > paca = 0xc000000001d80000
> > > pid = 0, comm = swapper
> > > kernel BUG at mm/bootmem.c:483!
> > > enter ? for help
> > > [c000000000c03c80] c000000000a64bcc
> > > .sparse_early_usemaps_alloc_node+0x84/0x29c
> > > [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
> > > [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
> > > [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
> > > [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
> > >
> > > This is
> > >
> > > BUG_ON(limit && goal + size > limit);
> > >
> > > and after some debugging, it seems that
> > >
> > > goal = 0x7ffff000000
> > > limit = 0x80000000000
> > >
> > > and sparse_early_usemaps_alloc_node ->
> > > sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls
> > >
> > > return alloc_bootmem_section(usemap_size() * count, section_nr);
> > >
> > > This is on a system with 8TB available via the AMS pool, and as a quirk
> > > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > > with an allocation that will fail the goal/limit constraints. In theory,
> > > we could "fall-back" to alloc_bootmem_node() in
> > > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > > disable the limit check if the size of the allocation in
> > > alloc_bootmem_section() exceeds the section size.
> >
> > It makes sense to allow the usemaps to spill over to subsequent
> > sections instead of panicking, so FWIW:
> >
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> >
> > That being said, it would be good if check_usemap_section_nr() printed
> > the cross-dependencies between pgdats and sections when the usemaps of
> > a node spilled over to other sections than the ones holding the pgdat.
> >
> > How about this?
> >
> > ---
> > From: Johannes Weiner <hannes@cmpxchg.org>
> > Subject: sparsemem/bootmem: catch greater than section size allocations fix
> >
> > If alloc_bootmem_section() no longer guarantees section-locality, we
> > need check_usemap_section_nr() to print possible cross-dependencies
> > between node descriptors and the usemaps allocated through it.
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 61d7cde..9e032dc 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -359,6 +359,7 @@ static void __init sparse_early_usemaps_alloc_node(unsigned long**usemap_map,
> > continue;
> > usemap_map[pnum] = usemap;
> > usemap += size;
> > + check_usemap_section_nr(nodeid, usemap_map[pnum]);
> > }
> > return;
> > }
>
> This makes sense to me -- ok if I fold it into the re-worked patch
> (based upon Mel's comments)?
Sure thing!
> > Furthermore, I wonder if we can remove the sparse-specific stuff from
> > bootmem.c as well, as now even more so than before, calculating the
> > desired area is really none of bootmem's business.
> >
> > Would something like this be okay?
> >
> > ---
> > From: Johannes Weiner <hannes@cmpxchg.org>
> > Subject: [patch] mm: remove sparsemem allocation details from the bootmem allocator
> >
> > alloc_bootmem_section() derives allocation area constraints from the
> > specified sparsemem section. This is a bit specific for a generic
> > memory allocator like bootmem, though, so move it over to sparsemem.
> >
> > Since __alloc_bootmem_node() already retries failed allocations with
> > relaxed area constraints, the fallback code in sparsemem.c can be
> > removed and the code becomes a bit more compact overall.
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>
> I've not tested it, but the intention seems sensible. I think it should
> remain a separate change.
Yes, I agree. I'll resend it in a bit as stand-alone patch.
* [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
2012-02-28 15:47 ` Mel Gorman
@ 2012-02-29 18:12 ` Nishanth Aravamudan
2012-02-29 18:45 ` Johannes Weiner
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Nishanth Aravamudan @ 2012-02-29 18:12 UTC (permalink / raw)
To: Mel Gorman
Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras,
Johannes Weiner, Andrew Morton, Robert Jennings, linuxppc-dev
On 28.02.2012 [15:47:32 +0000], Mel Gorman wrote:
> On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote:
> > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > Overcommit) on powerpc, we tripped the following:
> >
> > kernel BUG at mm/bootmem.c:483!
> > cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
> > pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
> > lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
> > sp: c000000000c03bc0
> > msr: 8000000000021032
> > current = 0xc000000000b0cce0
> > paca = 0xc000000001d80000
> > pid = 0, comm = swapper
> > kernel BUG at mm/bootmem.c:483!
> > enter ? for help
> > [c000000000c03c80] c000000000a64bcc
> > .sparse_early_usemaps_alloc_node+0x84/0x29c
> > [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
> > [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
> > [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
> > [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
> >
> > This is
> >
> > BUG_ON(limit && goal + size > limit);
> >
> > and after some debugging, it seems that
> >
> > goal = 0x7ffff000000
> > limit = 0x80000000000
> >
> > and sparse_early_usemaps_alloc_node ->
> > sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls
> >
> > return alloc_bootmem_section(usemap_size() * count, section_nr);
> >
> > This is on a system with 8TB available via the AMS pool, and as a quirk
> > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > with an allocation that will fail the goal/limit constraints. In theory,
> > we could "fall-back" to alloc_bootmem_node() in
> > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > disable the limit check if the size of the allocation in
> > > alloc_bootmem_section() exceeds the section size.
> >
> > Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> > Cc: Dave Hansen <haveblue@us.ibm.com>
> > Cc: Anton Blanchard <anton@au1.ibm.com>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
> > Cc: linux-mm@kvack.org
> > Cc: linuxppc-dev@lists.ozlabs.org
> > ---
> > include/linux/mmzone.h | 2 ++
> > mm/bootmem.c | 5 ++++-
> > 2 files changed, 6 insertions(+), 1 deletions(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 650ba2f..4176834 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -967,6 +967,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
> > * PA_SECTION_SHIFT physical address to/from section number
> > * PFN_SECTION_SHIFT pfn to/from section number
> > */
> > +#define BYTES_PER_SECTION (1UL << SECTION_SIZE_BITS)
> > +
> > #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
> >
> > #define PA_SECTION_SHIFT (SECTION_SIZE_BITS)
> > diff --git a/mm/bootmem.c b/mm/bootmem.c
> > index 668e94d..5cbbc76 100644
> > --- a/mm/bootmem.c
> > +++ b/mm/bootmem.c
> > @@ -770,7 +770,10 @@ void * __init alloc_bootmem_section(unsigned long size,
> >
> > pfn = section_nr_to_pfn(section_nr);
> > goal = pfn << PAGE_SHIFT;
> > - limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
> > + if (size > BYTES_PER_SECTION)
> > + limit = 0;
> > + else
> > + limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
>
> As it's ok to spill the allocation over to an adjacent section, why not
> just make limit==0 unconditionally. That would avoid defining
> BYTES_PER_SECTION.
Something like this?
Andrew, presuming Mel & Johannes give their acks, this should
supersede the patch you pulled into -mm.
Thanks,
Nish
-------
While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
Overcommit) on powerpc, we tripped the following:
kernel BUG at mm/bootmem.c:483!
cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
sp: c000000000c03bc0
msr: 8000000000021032
current = 0xc000000000b0cce0
paca = 0xc000000001d80000
pid = 0, comm = swapper
kernel BUG at mm/bootmem.c:483!
enter ? for help
[c000000000c03c80] c000000000a64bcc
.sparse_early_usemaps_alloc_node+0x84/0x29c
[c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
[c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
[c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
[c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
This is
BUG_ON(limit && goal + size > limit);
and after some debugging, it seems that
goal = 0x7ffff000000
limit = 0x80000000000
and sparse_early_usemaps_alloc_node ->
sparse_early_usemaps_alloc_pgdat_section calls
return alloc_bootmem_section(usemap_size() * count, section_nr);
This is on a system with 8TB available via the AMS pool, and as a quirk
of AMS in firmware, all of that memory shows up in node 0. So, we end up
with an allocation that will fail the goal/limit constraints. In theory,
we could "fall-back" to alloc_bootmem_node() in
sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
defined, we'll BUG_ON() instead. A simple solution appears to be to
unconditionally remove the limit condition in alloc_bootmem_section,
meaning allocations are allowed to cross section boundaries (necessary
for systems of this size).
Johannes Weiner pointed out that if alloc_bootmem_section() no longer
guarantees section-locality, we need check_usemap_section_nr() to print
possible cross-dependencies between node descriptors and the usemaps
allocated through it. That makes the two loops in
sparse_early_usemaps_alloc_node() identical, so re-factor the code a
bit.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
v2: Unconditionally set limit to 0. Fold in Johannes' changes to
sparse_early_usemaps_alloc_node.
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 668e94d..9c9ae09 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -770,7 +770,7 @@ void * __init alloc_bootmem_section(unsigned long size,
pfn = section_nr_to_pfn(section_nr);
goal = pfn << PAGE_SHIFT;
- limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
+ limit = 0;
bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
diff --git a/mm/sparse.c b/mm/sparse.c
index 61d7cde..a8bc7d3 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -353,29 +353,21 @@ static void __init sparse_early_usemaps_alloc_node(unsigned long**usemap_map,
usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
usemap_count);
- if (usemap) {
- for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
- if (!present_section_nr(pnum))
- continue;
- usemap_map[pnum] = usemap;
- usemap += size;
+ if (!usemap) {
+ usemap = alloc_bootmem_node(NODE_DATA(nodeid), size * usemap_count);
+ if (!usemap) {
+ printk(KERN_WARNING "%s: allocation failed\n", __func__);
+ return;
}
- return;
}
- usemap = alloc_bootmem_node(NODE_DATA(nodeid), size * usemap_count);
- if (usemap) {
- for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
- if (!present_section_nr(pnum))
- continue;
- usemap_map[pnum] = usemap;
- usemap += size;
- check_usemap_section_nr(nodeid, usemap_map[pnum]);
- }
- return;
+ for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
+ if (!present_section_nr(pnum))
+ continue;
+ usemap_map[pnum] = usemap;
+ usemap += size;
+ check_usemap_section_nr(nodeid, usemap_map[pnum]);
}
-
- printk(KERN_WARNING "%s: allocation failed\n", __func__);
}
#ifndef CONFIG_SPARSEMEM_VMEMMAP
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
* Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
2012-02-29 18:12 ` [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section Nishanth Aravamudan
@ 2012-02-29 18:45 ` Johannes Weiner
2012-02-29 23:28 ` Andrew Morton
2012-03-01 11:42 ` Mel Gorman
2 siblings, 0 replies; 11+ messages in thread
From: Johannes Weiner @ 2012-02-29 18:45 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras,
Mel Gorman, Andrew Morton, Robert Jennings, linuxppc-dev
On Wed, Feb 29, 2012 at 10:12:33AM -0800, Nishanth Aravamudan wrote:
> On 28.02.2012 [15:47:32 +0000], Mel Gorman wrote:
> > On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote:
> > > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > > Overcommit) on powerpc, we tripped the following:
> > >
> > > kernel BUG at mm/bootmem.c:483!
> > > cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
> > > pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
> > > lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
> > > sp: c000000000c03bc0
> > > msr: 8000000000021032
> > > current = 0xc000000000b0cce0
> > > paca = 0xc000000001d80000
> > > pid = 0, comm = swapper
> > > kernel BUG at mm/bootmem.c:483!
> > > enter ? for help
> > > [c000000000c03c80] c000000000a64bcc
> > > .sparse_early_usemaps_alloc_node+0x84/0x29c
> > > [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
> > > [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
> > > [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
> > > [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
> > >
> > > This is
> > >
> > > BUG_ON(limit && goal + size > limit);
> > >
> > > and after some debugging, it seems that
> > >
> > > goal = 0x7ffff000000
> > > limit = 0x80000000000
> > >
> > > and sparse_early_usemaps_alloc_node ->
> > > sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls
> > >
> > > return alloc_bootmem_section(usemap_size() * count, section_nr);
> > >
> > > This is on a system with 8TB available via the AMS pool, and as a quirk
> > > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > > with an allocation that will fail the goal/limit constraints. In theory,
> > > we could "fall-back" to alloc_bootmem_node() in
> > > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > > disable the limit check if the size of the allocation in
> > > alloc_bootmem_section exceeds the section size.
> > >
> > > Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> > > Cc: Dave Hansen <haveblue@us.ibm.com>
> > > Cc: Anton Blanchard <anton@au1.ibm.com>
> > > Cc: Paul Mackerras <paulus@samba.org>
> > > Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
> > > Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > ---
> > > include/linux/mmzone.h | 2 ++
> > > mm/bootmem.c | 5 ++++-
> > > 2 files changed, 6 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > > index 650ba2f..4176834 100644
> > > --- a/include/linux/mmzone.h
> > > +++ b/include/linux/mmzone.h
> > > @@ -967,6 +967,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
> > > * PA_SECTION_SHIFT physical address to/from section number
> > > * PFN_SECTION_SHIFT pfn to/from section number
> > > */
> > > +#define BYTES_PER_SECTION (1UL << SECTION_SIZE_BITS)
> > > +
> > > #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
> > >
> > > #define PA_SECTION_SHIFT (SECTION_SIZE_BITS)
> > > diff --git a/mm/bootmem.c b/mm/bootmem.c
> > > index 668e94d..5cbbc76 100644
> > > --- a/mm/bootmem.c
> > > +++ b/mm/bootmem.c
> > > @@ -770,7 +770,10 @@ void * __init alloc_bootmem_section(unsigned long size,
> > >
> > > pfn = section_nr_to_pfn(section_nr);
> > > goal = pfn << PAGE_SHIFT;
> > > - limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
> > > + if (size > BYTES_PER_SECTION)
> > > + limit = 0;
> > > + else
> > > + limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
> >
> > As it's ok to spill the allocation over to an adjacent section, why not
> > just make limit==0 unconditionally. That would avoid defining
> > BYTES_PER_SECTION.
>
> Something like this?
>
> Andrew, presuming Mel & Johannes give their ack, this should presumably
> supersede the patch you pulled into -mm.
>
> Thanks,
> Nish
>
> -------
>
> While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> Overcommit) on powerpc, we tripped the following:
>
> kernel BUG at mm/bootmem.c:483!
> cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
> pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
> lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
> sp: c000000000c03bc0
> msr: 8000000000021032
> current = 0xc000000000b0cce0
> paca = 0xc000000001d80000
> pid = 0, comm = swapper
> kernel BUG at mm/bootmem.c:483!
> enter ? for help
> [c000000000c03c80] c000000000a64bcc
> .sparse_early_usemaps_alloc_node+0x84/0x29c
> [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
> [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
> [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
> [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
>
> This is
>
> BUG_ON(limit && goal + size > limit);
>
> and after some debugging, it seems that
>
> goal = 0x7ffff000000
> limit = 0x80000000000
>
> and sparse_early_usemaps_alloc_node ->
> sparse_early_usemaps_alloc_pgdat_section calls
>
> return alloc_bootmem_section(usemap_size() * count, section_nr);
>
> This is on a system with 8TB available via the AMS pool, and as a quirk
> of AMS in firmware, all of that memory shows up in node 0. So, we end up
> with an allocation that will fail the goal/limit constraints. In theory,
> we could "fall-back" to alloc_bootmem_node() in
> sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> defined, we'll BUG_ON() instead. A simple solution appears to be to
> unconditionally remove the limit condition in alloc_bootmem_section,
> meaning allocations are allowed to cross section boundaries (necessary
> for systems of this size).
>
> Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> guarantees section-locality, we need check_usemap_section_nr() to print
> possible cross-dependencies between node descriptors and the usemaps
> allocated through it. That makes the two loops in
> sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> bit.
>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
* Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
2012-02-29 18:12 ` [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section Nishanth Aravamudan
2012-02-29 18:45 ` Johannes Weiner
@ 2012-02-29 23:28 ` Andrew Morton
2012-03-01 0:03 ` Nishanth Aravamudan
2012-03-01 23:12 ` Nishanth Aravamudan
2012-03-01 11:42 ` Mel Gorman
2 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2012-02-29 23:28 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Anton Blanchard, Dave Hansen, stable, linux-mm, Paul Mackerras,
Mel Gorman, Johannes Weiner, Robert Jennings, linuxppc-dev
On Wed, 29 Feb 2012 10:12:33 -0800
Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
> While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> Overcommit) on powerpc, we tripped the following:
>
> kernel BUG at mm/bootmem.c:483!
>
> ...
>
> This is
>
> BUG_ON(limit && goal + size > limit);
>
> and after some debugging, it seems that
>
> goal = 0x7ffff000000
> limit = 0x80000000000
>
> and sparse_early_usemaps_alloc_node ->
> sparse_early_usemaps_alloc_pgdat_section calls
>
> return alloc_bootmem_section(usemap_size() * count, section_nr);
>
> This is on a system with 8TB available via the AMS pool, and as a quirk
> of AMS in firmware, all of that memory shows up in node 0. So, we end up
> with an allocation that will fail the goal/limit constraints. In theory,
> we could "fall-back" to alloc_bootmem_node() in
> sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> defined, we'll BUG_ON() instead. A simple solution appears to be to
> unconditionally remove the limit condition in alloc_bootmem_section,
> meaning allocations are allowed to cross section boundaries (necessary
> for systems of this size).
>
> Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> guarantees section-locality, we need check_usemap_section_nr() to print
> possible cross-dependencies between node descriptors and the usemaps
> allocated through it. That makes the two loops in
> sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> bit.
The patch is a bit scary now, so I think we should merge it into
3.4-rc1 and then backport it into 3.3.1 if nothing blows up.
Do you think it should be backported into 3.3.x? Earlier kernels?
Also, this?
--- a/mm/bootmem.c~bootmem-sparsemem-remove-limit-constraint-in-alloc_bootmem_section-fix
+++ a/mm/bootmem.c
@@ -766,14 +766,13 @@ void * __init alloc_bootmem_section(unsi
unsigned long section_nr)
{
bootmem_data_t *bdata;
- unsigned long pfn, goal, limit;
+ unsigned long pfn, goal;
pfn = section_nr_to_pfn(section_nr);
goal = pfn << PAGE_SHIFT;
- limit = 0;
bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
- return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
+ return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, 0);
}
#endif
_
* Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
2012-02-29 23:28 ` Andrew Morton
@ 2012-03-01 0:03 ` Nishanth Aravamudan
2012-03-01 23:12 ` Nishanth Aravamudan
1 sibling, 0 replies; 11+ messages in thread
From: Nishanth Aravamudan @ 2012-03-01 0:03 UTC (permalink / raw)
To: Andrew Morton
Cc: Anton Blanchard, Dave Hansen, stable, linux-mm, Paul Mackerras,
Mel Gorman, Johannes Weiner, Robert Jennings, linuxppc-dev
On 29.02.2012 [15:28:30 -0800], Andrew Morton wrote:
> On Wed, 29 Feb 2012 10:12:33 -0800
> Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
>
> > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > Overcommit) on powerpc, we tripped the following:
> >
> > kernel BUG at mm/bootmem.c:483!
> >
> > ...
> >
> > This is
> >
> > BUG_ON(limit && goal + size > limit);
> >
> > and after some debugging, it seems that
> >
> > goal = 0x7ffff000000
> > limit = 0x80000000000
> >
> > and sparse_early_usemaps_alloc_node ->
> > sparse_early_usemaps_alloc_pgdat_section calls
> >
> > return alloc_bootmem_section(usemap_size() * count, section_nr);
> >
> > This is on a system with 8TB available via the AMS pool, and as a quirk
> > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > with an allocation that will fail the goal/limit constraints. In theory,
> > we could "fall-back" to alloc_bootmem_node() in
> > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > unconditionally remove the limit condition in alloc_bootmem_section,
> > meaning allocations are allowed to cross section boundaries (necessary
> > for systems of this size).
> >
> > Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> > guarantees section-locality, we need check_usemap_section_nr() to print
> > possible cross-dependencies between node descriptors and the usemaps
> > allocated through it. That makes the two loops in
> > sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> > bit.
>
> The patch is a bit scary now, so I think we should merge it into
> 3.4-rc1 and then backport it into 3.3.1 if nothing blows up.
I think that's fair.
> Do you think it should be backported into 3.3.x? Earlier kernels?
3.3.x seems reasonable. If I had to guess, I think this could be hit on
any kernels with this functionality -- that is, sparsemem in general?
Not sure how far back it's worth backporting.
> Also, this?
Urgh, yeah, that's way better.
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
> --- a/mm/bootmem.c~bootmem-sparsemem-remove-limit-constraint-in-alloc_bootmem_section-fix
> +++ a/mm/bootmem.c
> @@ -766,14 +766,13 @@ void * __init alloc_bootmem_section(unsi
> unsigned long section_nr)
> {
> bootmem_data_t *bdata;
> - unsigned long pfn, goal, limit;
> + unsigned long pfn, goal;
>
> pfn = section_nr_to_pfn(section_nr);
> goal = pfn << PAGE_SHIFT;
> - limit = 0;
> bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
>
> - return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
> + return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, 0);
> }
> #endif
Thanks for all the feedback!
-Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
* Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
2012-02-29 18:12 ` [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section Nishanth Aravamudan
2012-02-29 18:45 ` Johannes Weiner
2012-02-29 23:28 ` Andrew Morton
@ 2012-03-01 11:42 ` Mel Gorman
2 siblings, 0 replies; 11+ messages in thread
From: Mel Gorman @ 2012-03-01 11:42 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras,
Johannes Weiner, Andrew Morton, Robert Jennings, linuxppc-dev
On Wed, Feb 29, 2012 at 10:12:33AM -0800, Nishanth Aravamudan wrote:
> <SNIP>
>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
>
Acked-by: Mel Gorman <mgorman@suse.de>
--
Mel Gorman
SUSE Labs
* Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
2012-02-29 23:28 ` Andrew Morton
2012-03-01 0:03 ` Nishanth Aravamudan
@ 2012-03-01 23:12 ` Nishanth Aravamudan
1 sibling, 0 replies; 11+ messages in thread
From: Nishanth Aravamudan @ 2012-03-01 23:12 UTC (permalink / raw)
To: Andrew Morton
Cc: Anton Blanchard, Dave Hansen, stable, linux-mm, Paul Mackerras,
Mel Gorman, Johannes Weiner, Robert Jennings, linuxppc-dev
On 29.02.2012 [15:28:30 -0800], Andrew Morton wrote:
> On Wed, 29 Feb 2012 10:12:33 -0800
> Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
>
> > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > Overcommit) on powerpc, we tripped the following:
> >
> > kernel BUG at mm/bootmem.c:483!
> >
> > ...
> >
> > This is
> >
> > BUG_ON(limit && goal + size > limit);
> >
> > and after some debugging, it seems that
> >
> > goal = 0x7ffff000000
> > limit = 0x80000000000
> >
> > and sparse_early_usemaps_alloc_node ->
> > sparse_early_usemaps_alloc_pgdat_section calls
> >
> > return alloc_bootmem_section(usemap_size() * count, section_nr);
> >
> > This is on a system with 8TB available via the AMS pool, and as a quirk
> > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > with an allocation that will fail the goal/limit constraints. In theory,
> > we could "fall-back" to alloc_bootmem_node() in
> > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > unconditionally remove the limit condition in alloc_bootmem_section,
> > meaning allocations are allowed to cross section boundaries (necessary
> > for systems of this size).
> >
> > Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> > guarantees section-locality, we need check_usemap_section_nr() to print
> > possible cross-dependencies between node descriptors and the usemaps
> > allocated through it. That makes the two loops in
> > sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> > bit.
>
> The patch is a bit scary now, so I think we should merge it into
> 3.4-rc1 and then backport it into 3.3.1 if nothing blows up.
>
> Do you think it should be backported into 3.3.x? Earlier kernels?
Upon review, it would be good if we can get it pushed back to kernels
3.0.x, 3.1.x and 3.2.x.
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
Thread overview: 11+ messages
2012-02-24 19:33 [PATCH] sparsemem/bootmem: catch greater than section size allocations Nishanth Aravamudan
2012-02-28 13:53 ` Johannes Weiner
2012-02-28 20:11 ` Nishanth Aravamudan
2012-02-29 9:17 ` Johannes Weiner
2012-02-28 15:47 ` Mel Gorman
2012-02-29 18:12 ` [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section Nishanth Aravamudan
2012-02-29 18:45 ` Johannes Weiner
2012-02-29 23:28 ` Andrew Morton
2012-03-01 0:03 ` Nishanth Aravamudan
2012-03-01 23:12 ` Nishanth Aravamudan
2012-03-01 11:42 ` Mel Gorman