* Re: [patch 046/181] mm: remove sparsemem allocation details from the bootmem allocator
[not found] ` <CAE9FiQVEq5j1zoKS31fgfG_3xZSyfqsjdDorDku-Sgj6BTShEg@mail.gmail.com>
@ 2012-06-23 2:05 ` Yinghai Lu
2012-06-23 9:17 ` Johannes Weiner
0 siblings, 1 reply; 8+ messages in thread
From: Yinghai Lu @ 2012-06-23 2:05 UTC (permalink / raw)
To: akpm; +Cc: torvalds, hannes, davem, shangw, tj, Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 7290 bytes --]
On Fri, Jun 22, 2012 at 6:11 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 29, 2012 at 3:06 PM, <akpm@linux-foundation.org> wrote:
>> From: Johannes Weiner <hannes@cmpxchg.org>
>> Subject: mm: remove sparsemem allocation details from the bootmem allocator
>>
>> alloc_bootmem_section() derives allocation area constraints from the
>> specified sparsemem section. This is a bit specific for a generic memory
>> allocator like bootmem, though, so move it over to sparsemem.
>>
>> As __alloc_bootmem_node_nopanic() already retries failed allocations with
>> relaxed area constraints, the fallback code in sparsemem.c can be removed
>> and the code becomes a bit more compact overall.
>>
>> [akpm@linux-foundation.org: fix build]
>> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>> Acked-by: Tejun Heo <tj@kernel.org>
>> Acked-by: David S. Miller <davem@davemloft.net>
>> Cc: Yinghai Lu <yinghai@kernel.org>
>> Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
> hi, this one cause regression, will put usemap to last node's memory
> instead of each node.
>
> got warning:
>
> [ 0.000000] node 0 must be removed before remove section 16399
> [ 0.000000] node 1 must be removed before remove section 16399
> [ 0.000000] node 2 must be removed before remove section 16399
> [ 0.000000] node 3 must be removed before remove section 16399
> [ 0.000000] node 4 must be removed before remove section 16399
> [ 0.000000] node 5 must be removed before remove section 16399
> [ 0.000000] node 6 must be removed before remove section 16399
>
> location for usemap:
>
> [ 0.000000] Initmem setup node 0 [mem 0x00000000-0x407fffffff]
> [ 0.000000] NODE_DATA [mem 0x407ffd9000-0x407fffffff]
> [ 0.000000] Initmem setup node 1 [mem 0x4080000000-0x807fffffff]
> [ 0.000000] NODE_DATA [mem 0x807ffd9000-0x807fffffff]
> [ 0.000000] Initmem setup node 2 [mem 0x8080000000-0xc07fffffff]
> [ 0.000000] NODE_DATA [mem 0xc07ffd9000-0xc07fffffff]
> [ 0.000000] Initmem setup node 3 [mem 0xc080000000-0x1007fffffff]
> [ 0.000000] NODE_DATA [mem 0x1007ffd9000-0x1007fffffff]
> [ 0.000000] Initmem setup node 4 [mem 0x10080000000-0x1407fffffff]
> [ 0.000000] NODE_DATA [mem 0x1407ffd9000-0x1407fffffff]
> [ 0.000000] Initmem setup node 5 [mem 0x14080000000-0x1807fffffff]
> [ 0.000000] NODE_DATA [mem 0x1807ffd9000-0x1807fffffff]
> [ 0.000000] Initmem setup node 6 [mem 0x18080000000-0x1c07fffffff]
> [ 0.000000] NODE_DATA [mem 0x1c07ffd9000-0x1c07fffffff]
> [ 0.000000] Initmem setup node 7 [mem 0x1c080000000-0x2007fffffff]
> [ 0.000000] NODE_DATA [mem 0x2007fbd8000-0x2007fbfefff]
> [ 0.000000] MEMBLOCK configuration:
> [ 0.000000] memory size = 0x1ffff6d5000 reserved size = 0x105a544ee
> [ 0.000000] memory.cnt = 0xa
> [ 0.000000] memory[0x0] [0x00010000-0x00094fff], 0x85000 bytes on node 0
> [ 0.000000] memory[0x1] [0x00100000-0x7f74ffff], 0x7f650000
> bytes on node 0
> [ 0.000000] memory[0x2] [0x100000000-0x407fffffff],
> 0x3f80000000 bytes on node 0
> [ 0.000000] memory[0x3] [0x4080000000-0x807fffffff],
> 0x4000000000 bytes on node 1
> [ 0.000000] memory[0x4] [0x8080000000-0xc07fffffff],
> 0x4000000000 bytes on node 2
> [ 0.000000] memory[0x5] [0xc080000000-0x1007fffffff],
> 0x4000000000 bytes on node 3
> [ 0.000000] memory[0x6] [0x10080000000-0x1407fffffff],
> 0x4000000000 bytes on node 4
> [ 0.000000] memory[0x7] [0x14080000000-0x1807fffffff],
> 0x4000000000 bytes on node 5
> [ 0.000000] memory[0x8] [0x18080000000-0x1c07fffffff],
> 0x4000000000 bytes on node 6
> [ 0.000000] memory[0x9] [0x1c080000000-0x2007fffffff],
> 0x4000000000 bytes on node 7
> [ 0.000000] reserved.cnt = 0x10
> [ 0.000000] reserved[0x0] [0x0008f000-0x00094fff], 0x6000 bytes
> [ 0.000000] reserved[0x1] [0x00095400-0x000fffff], 0x6ac00 bytes
> [ 0.000000] reserved[0x2] [0x01000000-0x02afb9db], 0x1afb9dc bytes
> [ 0.000000] reserved[0x3] [0x02cfb9dc-0x0383f8ad], 0xb43ed2 bytes
> [ 0.000000] reserved[0x4] [0x7ccd9000-0x7d0d5fff], 0x3fd000 bytes
> [ 0.000000] reserved[0x5] [0x7d0d8000-0x7f744fff], 0x266d000 bytes
> [ 0.000000] reserved[0x6] [0x407ffd9000-0x407fffffff], 0x27000 bytes
> [ 0.000000] reserved[0x7] [0x807ffd9000-0x807fffffff], 0x27000 bytes
> [ 0.000000] reserved[0x8] [0xc07ffd9000-0xc07fffffff], 0x27000 bytes
> [ 0.000000] reserved[0x9] [0x1007ffd9000-0x1007fffffff], 0x27000 bytes
> [ 0.000000] reserved[0xa] [0x1407ffd9000-0x1407fffffff], 0x27000 bytes
> [ 0.000000] reserved[0xb] [0x1807ffd9000-0x1807fffffff], 0x27000 bytes
> [ 0.000000] reserved[0xc] [0x1c07ffd9000-0x1c07fffffff], 0x27000 bytes
> [ 0.000000] reserved[0xd] [0x1ff7f3f9000-0x2007f7fafff], 0x100402000 bytes
> [ 0.000000] reserved[0xe] [0x2007fbd8000-0x2007fbff03f], 0x27040 bytes
> [ 0.000000] reserved[0xf] [0x2007fc00000-0x2007fffffff], 0x400000 bytes
> [ 0.000000] memblock_reserve: [0x1ff7eff9000-0x1ff7f3f8fff] usemap_map
> [ 0.000000] memblock_reserve: [0x2007fbc4000-0x2007fbcffff] usemap section
> [ 0.000000] node 0 must be removed before remove section 16399
> [ 0.000000] memblock_reserve: [0x2007fbb8000-0x2007fbc3fff] usemap section
> [ 0.000000] node 1 must be removed before remove section 16399
> [ 0.000000] memblock_reserve: [0x2007fbac000-0x2007fbb7fff] usemap section
> [ 0.000000] node 2 must be removed before remove section 16399
> [ 0.000000] memblock_reserve: [0x2007fba0000-0x2007fbabfff] usemap section
> [ 0.000000] node 3 must be removed before remove section 16399
> [ 0.000000] memblock_reserve: [0x2007fb94000-0x2007fb9ffff] usemap section
> [ 0.000000] node 4 must be removed before remove section 16399
> [ 0.000000] memblock_reserve: [0x2007fb88000-0x2007fb93fff] usemap section
> [ 0.000000] node 5 must be removed before remove section 16399
> [ 0.000000] memblock_reserve: [0x2007fb7c000-0x2007fb87fff] usemap section
> [ 0.000000] node 6 must be removed before remove section 16399
> [ 0.000000] memblock_reserve: [0x2007fb70000-0x2007fb7bfff] usemap section
>
> reverting that patch fixes the problem.
>
> will get correct...
>
> [ 0.000000] memblock_reserve: [0x1ff7eff9000-0x1ff7f3f8fff] usemap_map
> [ 0.000000] memblock_reserve: [0x407ffc4000-0x407ffcffff] usermap section
> [ 0.000000] memblock_reserve: [0x807ffc5000-0x807ffd0fff] usermap section
> [ 0.000000] memblock_reserve: [0xc07ffc5000-0xc07ffd0fff] usermap section
> [ 0.000000] memblock_reserve: [0x1007ffc5000-0x1007ffd0fff] usermap section
> [ 0.000000] memblock_reserve: [0x1407ffc5000-0x1407ffd0fff] usermap section
> [ 0.000000] memblock_reserve: [0x1807ffc5000-0x1807ffd0fff] usermap section
> [ 0.000000] memblock_reserve: [0x1c07ffc5000-0x1c07ffd0fff] usermap section
> [ 0.000000] memblock_reserve: [0x2007fbc4000-0x2007fbcffff] usermap section
>
attached patch fixes the problem.
you can decide if reverting the commit or apply the patch.
Thanks
Yinghai
[-- Attachment #2: fix_usemap_goal.patch --]
[-- Type: application/octet-stream, Size: 713 bytes --]
Subject: [PATCH] mm: fix goal calculating with usemap
PAGE_SECTION_MASK should be used with pfn instead of pa.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
diff --git a/mm/sparse.c b/mm/sparse.c
index 6a4bf91..fd00928 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -287,7 +287,7 @@ sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
* from the same section as the pgdat where possible to avoid
* this problem.
*/
- goal = __pa(pgdat) & PAGE_SECTION_MASK;
+ goal = ((__pa(pgdat) >> PAGE_SHIFT) & PAGE_SECTION_MASK) << PAGE_SHIFT;
host_pgdat = NODE_DATA(early_pfn_to_nid(goal >> PAGE_SHIFT));
return __alloc_bootmem_node_nopanic(host_pgdat, size,
SMP_CACHE_BYTES, goal);
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [patch 046/181] mm: remove sparsemem allocation details from the bootmem allocator
2012-06-23 2:05 ` [patch 046/181] mm: remove sparsemem allocation details from the bootmem allocator Yinghai Lu
@ 2012-06-23 9:17 ` Johannes Weiner
2012-06-23 18:58 ` Yinghai Lu
0 siblings, 1 reply; 8+ messages in thread
From: Johannes Weiner @ 2012-06-23 9:17 UTC (permalink / raw)
To: Yinghai Lu; +Cc: akpm, torvalds, davem, shangw, tj, Linux Kernel Mailing List
On Fri, Jun 22, 2012 at 07:05:45PM -0700, Yinghai Lu wrote:
> On Fri, Jun 22, 2012 at 6:11 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Tue, May 29, 2012 at 3:06 PM, <akpm@linux-foundation.org> wrote:
> >> From: Johannes Weiner <hannes@cmpxchg.org>
> >> Subject: mm: remove sparsemem allocation details from the bootmem allocator
> >>
> >> alloc_bootmem_section() derives allocation area constraints from the
> >> specified sparsemem section. This is a bit specific for a generic memory
> >> allocator like bootmem, though, so move it over to sparsemem.
> >>
> >> As __alloc_bootmem_node_nopanic() already retries failed allocations with
> >> relaxed area constraints, the fallback code in sparsemem.c can be removed
> >> and the code becomes a bit more compact overall.
> >>
> >> [akpm@linux-foundation.org: fix build]
> >> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> >> Acked-by: Tejun Heo <tj@kernel.org>
> >> Acked-by: David S. Miller <davem@davemloft.net>
> >> Cc: Yinghai Lu <yinghai@kernel.org>
> >> Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
> >> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >
> > hi, this one cause regression, will put usemap to last node's memory
> > instead of each node.
>
> attached patch fixes the problem.
Sorry for the trouble and thanks for the patch! The number of bugs in
these three lines is too damn high...
> Subject: [PATCH] mm: fix goal calculating with usemap
>
> PAGE_SECTION_MASK should be used with pfn instead of pa.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 6a4bf91..fd00928 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -287,7 +287,7 @@ sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
> * from the same section as the pgdat where possible to avoid
> * this problem.
> */
> - goal = __pa(pgdat) & PAGE_SECTION_MASK;
> + goal = ((__pa(pgdat) >> PAGE_SHIFT) & PAGE_SECTION_MASK) << PAGE_SHIFT;
How about
goal = __pa(pgdat) & (PAGE_SECTION_MASK << PAGE_SHIFT);
instead?
> host_pgdat = NODE_DATA(early_pfn_to_nid(goal >> PAGE_SHIFT));
> return __alloc_bootmem_node_nopanic(host_pgdat, size,
> SMP_CACHE_BYTES, goal);
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch 046/181] mm: remove sparsemem allocation details from the bootmem allocator
2012-06-23 9:17 ` Johannes Weiner
@ 2012-06-23 18:58 ` Yinghai Lu
2012-06-23 22:06 ` Johannes Weiner
0 siblings, 1 reply; 8+ messages in thread
From: Yinghai Lu @ 2012-06-23 18:58 UTC (permalink / raw)
To: Johannes Weiner
Cc: akpm, torvalds, davem, shangw, tj, Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 2232 bytes --]
On Sat, Jun 23, 2012 at 2:17 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Fri, Jun 22, 2012 at 07:05:45PM -0700, Yinghai Lu wrote:
>> On Fri, Jun 22, 2012 at 6:11 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> > On Tue, May 29, 2012 at 3:06 PM, <akpm@linux-foundation.org> wrote:
>> >> From: Johannes Weiner <hannes@cmpxchg.org>
>> >> Subject: mm: remove sparsemem allocation details from the bootmem allocator
>> >>
>> >> alloc_bootmem_section() derives allocation area constraints from the
>> >> specified sparsemem section. This is a bit specific for a generic memory
>> >> allocator like bootmem, though, so move it over to sparsemem.
>> >>
>> >> As __alloc_bootmem_node_nopanic() already retries failed allocations with
>> >> relaxed area constraints, the fallback code in sparsemem.c can be removed
>> >> and the code becomes a bit more compact overall.
>> >>
>> >> [akpm@linux-foundation.org: fix build]
>> >> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>> >> Acked-by: Tejun Heo <tj@kernel.org>
>> >> Acked-by: David S. Miller <davem@davemloft.net>
>> >> Cc: Yinghai Lu <yinghai@kernel.org>
>> >> Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
>> >> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> >
>> > hi, this one cause regression, will put usemap to last node's memory
>> > instead of each node.
>>
>> attached patch fixes the problem.
>
> Sorry for the trouble and thanks for the patch! The number of bugs in
> these three lines is too damn high...
>
yeah, i should run Andrew -mm early.
Andrew, Do you have git tree for your -mm?
So I could merge it to my local tree.
now i only have tip, pci, scsi, net, driver-core, usb, tty in my local tree.
...
>> */
>> - goal = __pa(pgdat) & PAGE_SECTION_MASK;
>> + goal = ((__pa(pgdat) >> PAGE_SHIFT) & PAGE_SECTION_MASK) << PAGE_SHIFT;
>
> How about
>
> goal = __pa(pgdat) & (PAGE_SECTION_MASK << PAGE_SHIFT);
>
> instead?
yeah.
After closely looking, looks like we need to revert the commit or
apply -v2 patch.
old sequence is : try same section range at first.
this commit will try [start_of_section, end_of_same_node_range], so
could have chance to get
Yinghai
[-- Attachment #2: fix_usemap_goal_v2.patch --]
[-- Type: application/octet-stream, Size: 3460 bytes --]
Subject: [PATCH] mm: fix goal calculating with usemap
PAGE_SECTION_MASK should be used with pfn instead of pa.
Also restore the old behavoir:
limit the allocating to same section at first.
need to expose __alloc_bootmem_node_nopanic with limit.
-v2: add limit of same section
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
include/linux/bootmem.h | 5 +++++
mm/bootmem.c | 2 +-
mm/nobootmem.c | 2 +-
mm/sparse.c | 22 ++++++++++++++++------
4 files changed, 23 insertions(+), 8 deletions(-)
Index: linux-2.6/mm/sparse.c
===================================================================
--- linux-2.6.orig/mm/sparse.c
+++ linux-2.6/mm/sparse.c
@@ -275,8 +275,9 @@ static unsigned long * __init
sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
unsigned long size)
{
- pg_data_t *host_pgdat;
- unsigned long goal;
+ unsigned long goal, limit;
+ unsigned long *p;
+ int nid;
/*
* A page may contain usemaps for other sections preventing the
* page being freed and making a section unremovable while
@@ -287,10 +288,19 @@ sparse_early_usemaps_alloc_pgdat_section
* from the same section as the pgdat where possible to avoid
* this problem.
*/
- goal = __pa(pgdat) & PAGE_SECTION_MASK;
- host_pgdat = NODE_DATA(early_pfn_to_nid(goal >> PAGE_SHIFT));
- return __alloc_bootmem_node_nopanic(host_pgdat, size,
- SMP_CACHE_BYTES, goal);
+ goal = __pa(pgdat) & (PAGE_SECTION_MASK << PAGE_SHIFT);
+ limit = goal + (1UL << PA_SECTION_SHIFT);
+ nid = early_pfn_to_nid(goal >> PAGE_SHIFT);
+
+again:
+ p = ___alloc_bootmem_node_nopanic(NODE_DATA(nid), size,
+ SMP_CACHE_BYTES, goal, limit);
+ if (!p && limit) {
+ limit = 0;
+ goto again;
+ }
+
+ return p;
}
static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
Index: linux-2.6/include/linux/bootmem.h
===================================================================
--- linux-2.6.orig/include/linux/bootmem.h
+++ linux-2.6/include/linux/bootmem.h
@@ -91,6 +91,11 @@ extern void *__alloc_bootmem_node_nopani
unsigned long size,
unsigned long align,
unsigned long goal);
+void *___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
+ unsigned long size,
+ unsigned long align,
+ unsigned long goal,
+ unsigned long limit);
extern void *__alloc_bootmem_low(unsigned long size,
unsigned long align,
unsigned long goal);
Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -698,7 +698,7 @@ void * __init __alloc_bootmem(unsigned l
return ___alloc_bootmem(size, align, goal, limit);
}
-static void * __init ___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
+void * __init ___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
unsigned long size, unsigned long align,
unsigned long goal, unsigned long limit)
{
Index: linux-2.6/mm/nobootmem.c
===================================================================
--- linux-2.6.orig/mm/nobootmem.c
+++ linux-2.6/mm/nobootmem.c
@@ -274,7 +274,7 @@ void * __init __alloc_bootmem(unsigned l
return ___alloc_bootmem(size, align, goal, limit);
}
-static void * __init ___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
+void * __init ___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
unsigned long size,
unsigned long align,
unsigned long goal,
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch 046/181] mm: remove sparsemem allocation details from the bootmem allocator
2012-06-23 18:58 ` Yinghai Lu
@ 2012-06-23 22:06 ` Johannes Weiner
2012-06-23 22:35 ` Yinghai Lu
0 siblings, 1 reply; 8+ messages in thread
From: Johannes Weiner @ 2012-06-23 22:06 UTC (permalink / raw)
To: Yinghai Lu; +Cc: akpm, torvalds, davem, shangw, tj, Linux Kernel Mailing List
On Sat, Jun 23, 2012 at 11:58:02AM -0700, Yinghai Lu wrote:
> On Sat, Jun 23, 2012 at 2:17 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Fri, Jun 22, 2012 at 07:05:45PM -0700, Yinghai Lu wrote:
> >> On Fri, Jun 22, 2012 at 6:11 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> >> > On Tue, May 29, 2012 at 3:06 PM, <akpm@linux-foundation.org> wrote:
> >> >> From: Johannes Weiner <hannes@cmpxchg.org>
> >> >> Subject: mm: remove sparsemem allocation details from the bootmem allocator
> >> >>
> >> >> alloc_bootmem_section() derives allocation area constraints from the
> >> >> specified sparsemem section. This is a bit specific for a generic memory
> >> >> allocator like bootmem, though, so move it over to sparsemem.
> >> >>
> >> >> As __alloc_bootmem_node_nopanic() already retries failed allocations with
> >> >> relaxed area constraints, the fallback code in sparsemem.c can be removed
> >> >> and the code becomes a bit more compact overall.
> >> >>
> >> >> [akpm@linux-foundation.org: fix build]
> >> >> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> >> >> Acked-by: Tejun Heo <tj@kernel.org>
> >> >> Acked-by: David S. Miller <davem@davemloft.net>
> >> >> Cc: Yinghai Lu <yinghai@kernel.org>
> >> >> Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
> >> >> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >> >
> >> > hi, this one cause regression, will put usemap to last node's memory
> >> > instead of each node.
> >>
> >> attached patch fixes the problem.
> >
> > Sorry for the trouble and thanks for the patch! The number of bugs in
> > these three lines is too damn high...
> >
> yeah, i should run Andrew -mm early.
>
> Andrew, Do you have git tree for your -mm?
> So I could merge it to my local tree.
> now i only have tip, pci, scsi, net, driver-core, usb, tty in my local tree.
(Parts of) -mm are included in linux-next. I have a tree that is
automated raw -mm imports, including debugging stuff Andrew never
pushes upstream, you can find it here:
http://git.cmpxchg.org/?p=linux-mmotm.git;a=summary
git://git.cmpxchg.org/linux-mmotm.git
> >> */
> >> - goal = __pa(pgdat) & PAGE_SECTION_MASK;
> >> + goal = ((__pa(pgdat) >> PAGE_SHIFT) & PAGE_SECTION_MASK) << PAGE_SHIFT;
> >
> > How about
> >
> > goal = __pa(pgdat) & (PAGE_SECTION_MASK << PAGE_SHIFT);
> >
> > instead?
>
> yeah.
>
> After closely looking, looks like we need to revert the commit or
> apply -v2 patch.
>
> old sequence is : try same section range at first.
> this commit will try [start_of_section, end_of_same_node_range], so
> could have chance to get
...yes? :)
> Subject: [PATCH] mm: fix goal calculating with usemap
>
> PAGE_SECTION_MASK should be used with pfn instead of pa.
>
> Also restore the old behavoir:
> limit the allocating to same section at first.
> need to expose __alloc_bootmem_node_nopanic with limit.
>
> -v2: add limit of same section
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
> include/linux/bootmem.h | 5 +++++
> mm/bootmem.c | 2 +-
> mm/nobootmem.c | 2 +-
> mm/sparse.c | 22 ++++++++++++++++------
> 4 files changed, 23 insertions(+), 8 deletions(-)
>
> Index: linux-2.6/mm/sparse.c
> ===================================================================
> --- linux-2.6.orig/mm/sparse.c
> +++ linux-2.6/mm/sparse.c
> @@ -275,8 +275,9 @@ static unsigned long * __init
> sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
> unsigned long size)
> {
> - pg_data_t *host_pgdat;
> - unsigned long goal;
> + unsigned long goal, limit;
> + unsigned long *p;
> + int nid;
> /*
> * A page may contain usemaps for other sections preventing the
> * page being freed and making a section unremovable while
> @@ -287,10 +288,19 @@ sparse_early_usemaps_alloc_pgdat_section
> * from the same section as the pgdat where possible to avoid
> * this problem.
> */
> - goal = __pa(pgdat) & PAGE_SECTION_MASK;
> - host_pgdat = NODE_DATA(early_pfn_to_nid(goal >> PAGE_SHIFT));
> - return __alloc_bootmem_node_nopanic(host_pgdat, size,
> - SMP_CACHE_BYTES, goal);
> + goal = __pa(pgdat) & (PAGE_SECTION_MASK << PAGE_SHIFT);
> + limit = goal + (1UL << PA_SECTION_SHIFT);
> + nid = early_pfn_to_nid(goal >> PAGE_SHIFT);
> +
> +again:
> + p = ___alloc_bootmem_node_nopanic(NODE_DATA(nid), size,
> + SMP_CACHE_BYTES, goal, limit);
> + if (!p && limit) {
> + limit = 0;
> + goto again;
> + }
I was aware that __alloc_bootmem_node_nopanic does a slightly
different fallback sequence, but as soon as you go outside the
section, you have a node-section-dependency either way.
Without this patch, it would retry without the node first, then
without the goal. With your patch, you would try to limit the upper
end, but you may already go below the requested section. What would
the limit gain?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch 046/181] mm: remove sparsemem allocation details from the bootmem allocator
2012-06-23 22:06 ` Johannes Weiner
@ 2012-06-23 22:35 ` Yinghai Lu
2012-06-26 8:17 ` Johannes Weiner
0 siblings, 1 reply; 8+ messages in thread
From: Yinghai Lu @ 2012-06-23 22:35 UTC (permalink / raw)
To: Johannes Weiner
Cc: akpm, torvalds, davem, shangw, tj, Linux Kernel Mailing List
On Sat, Jun 23, 2012 at 3:06 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Sat, Jun 23, 2012 at 11:58:02AM -0700, Yinghai Lu wrote:
>> +again:
>> + p = ___alloc_bootmem_node_nopanic(NODE_DATA(nid), size,
>> + SMP_CACHE_BYTES, goal, limit);
>> + if (!p && limit) {
>> + limit = 0;
>> + goto again;
>> + }
>
> I was aware that __alloc_bootmem_node_nopanic does a slightly
> different fallback sequence, but as soon as you go outside the
> section, you have a node-section-dependency either way.
you need to make sure, first try to get the in the same section, then same node.
>
> Without this patch, it would retry without the node first, then
> without the goal. With your patch, you would try to limit the upper
> end, but you may already go below the requested section. What would
> the limit gain?
goal: start of section
limit: start of next section.
the layout should be
start_of_range_same_node...start_of_range_same_section....end_of_range_same_section...end_of_range_of_same_node.
before your commit:
1. first try: start_of_range_same_section....end_of_range_same_section
2. second try: on same node.
with your commit:
1. first try: start_of_range_same_section....end_of_range_same_node.
===> it would get buffer out of same section.
2. second try: 0...0, aka 0... -1UL
with this patch:
1. first try: start_of_range_same_section....end_of_range_same_section
2. second try: 0...end_of_range_of_same_section.
3. third try: start_of_range_same_section...0, aka
tart_of_range_same_section...-1UL;
4. third try: 0...0, aka 0...-1UL;
Yinghai
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch 046/181] mm: remove sparsemem allocation details from the bootmem allocator
2012-06-23 22:35 ` Yinghai Lu
@ 2012-06-26 8:17 ` Johannes Weiner
2012-06-26 18:20 ` Yinghai Lu
0 siblings, 1 reply; 8+ messages in thread
From: Johannes Weiner @ 2012-06-26 8:17 UTC (permalink / raw)
To: Yinghai Lu; +Cc: akpm, torvalds, davem, shangw, tj, Linux Kernel Mailing List
On Sat, Jun 23, 2012 at 03:35:37PM -0700, Yinghai Lu wrote:
> On Sat, Jun 23, 2012 at 3:06 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Sat, Jun 23, 2012 at 11:58:02AM -0700, Yinghai Lu wrote:
> >> +again:
> >> + p = ___alloc_bootmem_node_nopanic(NODE_DATA(nid), size,
> >> + SMP_CACHE_BYTES, goal, limit);
> >> + if (!p && limit) {
> >> + limit = 0;
> >> + goto again;
> >> + }
> >
> > I was aware that __alloc_bootmem_node_nopanic does a slightly
> > different fallback sequence, but as soon as you go outside the
> > section, you have a node-section-dependency either way.
>
> you need to make sure, first try to get the in the same section, then same node.
>
> >
> > Without this patch, it would retry without the node first, then
> > without the goal. With your patch, you would try to limit the upper
> > end, but you may already go below the requested section. What would
> > the limit gain?
>
> goal: start of section
> limit: start of next section.
>
> the layout should be
> start_of_range_same_node...start_of_range_same_section....end_of_range_same_section...end_of_range_of_same_node.
>
> before your commit:
> 1. first try: start_of_range_same_section....end_of_range_same_section
> 2. second try: on same node.
>
> with your commit:
> 1. first try: start_of_range_same_section....end_of_range_same_node.
> ===> it would get buffer out of same section.
> 2. second try: 0...0, aka 0... -1UL
>
> with this patch:
> 1. first try: start_of_range_same_section....end_of_range_same_section
> 2. second try: 0...end_of_range_of_same_section.
> 3. third try: start_of_range_same_section...0, aka
> tart_of_range_same_section...-1UL;
> 4. third try: 0...0, aka 0...-1UL;
What I failed to take into account at first is that neither bootmem
implementation (necessarily) does a linear bottom-up search starting
from goal, so yes, we definitely need an upper limit for the first
attempt.
No use in reverting it, as the bootmem version was already broken
before, when the limit was removed in f5bf18f "bootmem/sparsemem:
remove limit constraint in alloc_bootmem_section".
I'm okay with exporting and using ___alloc_bootmem_node_nopanic() in
3.5.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch 046/181] mm: remove sparsemem allocation details from the bootmem allocator
2012-06-26 8:17 ` Johannes Weiner
@ 2012-06-26 18:20 ` Yinghai Lu
2012-06-26 19:08 ` Johannes Weiner
0 siblings, 1 reply; 8+ messages in thread
From: Yinghai Lu @ 2012-06-26 18:20 UTC (permalink / raw)
To: Johannes Weiner
Cc: akpm, torvalds, davem, shangw, tj, Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 451 bytes --]
On Tue, Jun 26, 2012 at 1:17 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> No use in reverting it, as the bootmem version was already broken
> before, when the limit was removed in f5bf18f "bootmem/sparsemem:
> remove limit constraint in alloc_bootmem_section".
>
> I'm okay with exporting and using ___alloc_bootmem_node_nopanic() in
> 3.5.
Good. Assume that is Acked-by you.
Andrew,
Can you push attached -v2 patch to Linus?
Thanks
Yinghai
[-- Attachment #2: fix_usemap_goal_v2.patch --]
[-- Type: application/octet-stream, Size: 3460 bytes --]
Subject: [PATCH] mm: fix goal calculating with usemap
PAGE_SECTION_MASK should be used with pfn instead of pa.
Also restore the old behavoir:
limit the allocating to same section at first.
need to expose __alloc_bootmem_node_nopanic with limit.
-v2: add limit of same section
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
include/linux/bootmem.h | 5 +++++
mm/bootmem.c | 2 +-
mm/nobootmem.c | 2 +-
mm/sparse.c | 22 ++++++++++++++++------
4 files changed, 23 insertions(+), 8 deletions(-)
Index: linux-2.6/mm/sparse.c
===================================================================
--- linux-2.6.orig/mm/sparse.c
+++ linux-2.6/mm/sparse.c
@@ -275,8 +275,9 @@ static unsigned long * __init
sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
unsigned long size)
{
- pg_data_t *host_pgdat;
- unsigned long goal;
+ unsigned long goal, limit;
+ unsigned long *p;
+ int nid;
/*
* A page may contain usemaps for other sections preventing the
* page being freed and making a section unremovable while
@@ -287,10 +288,19 @@ sparse_early_usemaps_alloc_pgdat_section
* from the same section as the pgdat where possible to avoid
* this problem.
*/
- goal = __pa(pgdat) & PAGE_SECTION_MASK;
- host_pgdat = NODE_DATA(early_pfn_to_nid(goal >> PAGE_SHIFT));
- return __alloc_bootmem_node_nopanic(host_pgdat, size,
- SMP_CACHE_BYTES, goal);
+ goal = __pa(pgdat) & (PAGE_SECTION_MASK << PAGE_SHIFT);
+ limit = goal + (1UL << PA_SECTION_SHIFT);
+ nid = early_pfn_to_nid(goal >> PAGE_SHIFT);
+
+again:
+ p = ___alloc_bootmem_node_nopanic(NODE_DATA(nid), size,
+ SMP_CACHE_BYTES, goal, limit);
+ if (!p && limit) {
+ limit = 0;
+ goto again;
+ }
+
+ return p;
}
static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
Index: linux-2.6/include/linux/bootmem.h
===================================================================
--- linux-2.6.orig/include/linux/bootmem.h
+++ linux-2.6/include/linux/bootmem.h
@@ -91,6 +91,11 @@ extern void *__alloc_bootmem_node_nopani
unsigned long size,
unsigned long align,
unsigned long goal);
+void *___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
+ unsigned long size,
+ unsigned long align,
+ unsigned long goal,
+ unsigned long limit);
extern void *__alloc_bootmem_low(unsigned long size,
unsigned long align,
unsigned long goal);
Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -698,7 +698,7 @@ void * __init __alloc_bootmem(unsigned l
return ___alloc_bootmem(size, align, goal, limit);
}
-static void * __init ___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
+void * __init ___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
unsigned long size, unsigned long align,
unsigned long goal, unsigned long limit)
{
Index: linux-2.6/mm/nobootmem.c
===================================================================
--- linux-2.6.orig/mm/nobootmem.c
+++ linux-2.6/mm/nobootmem.c
@@ -274,7 +274,7 @@ void * __init __alloc_bootmem(unsigned l
return ___alloc_bootmem(size, align, goal, limit);
}
-static void * __init ___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
+void * __init ___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
unsigned long size,
unsigned long align,
unsigned long goal,
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch 046/181] mm: remove sparsemem allocation details from the bootmem allocator
2012-06-26 18:20 ` Yinghai Lu
@ 2012-06-26 19:08 ` Johannes Weiner
0 siblings, 0 replies; 8+ messages in thread
From: Johannes Weiner @ 2012-06-26 19:08 UTC (permalink / raw)
To: Yinghai Lu; +Cc: akpm, torvalds, davem, shangw, tj, Linux Kernel Mailing List
On Tue, Jun 26, 2012 at 11:20:16AM -0700, Yinghai Lu wrote:
> On Tue, Jun 26, 2012 at 1:17 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > No use in reverting it, as the bootmem version was already broken
> > before, when the limit was removed in f5bf18f "bootmem/sparsemem:
> > remove limit constraint in alloc_bootmem_section".
> >
> > I'm okay with exporting and using ___alloc_bootmem_node_nopanic() in
> > 3.5.
>
> Good. Assume that is Acked-by you.
Yes,
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Thanks, Yinghai!
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2012-06-26 19:08 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20120529220636.AAA66A0308@akpm.mtv.corp.google.com>
[not found] ` <CAE9FiQVEq5j1zoKS31fgfG_3xZSyfqsjdDorDku-Sgj6BTShEg@mail.gmail.com>
2012-06-23 2:05 ` [patch 046/181] mm: remove sparsemem allocation details from the bootmem allocator Yinghai Lu
2012-06-23 9:17 ` Johannes Weiner
2012-06-23 18:58 ` Yinghai Lu
2012-06-23 22:06 ` Johannes Weiner
2012-06-23 22:35 ` Yinghai Lu
2012-06-26 8:17 ` Johannes Weiner
2012-06-26 18:20 ` Yinghai Lu
2012-06-26 19:08 ` Johannes Weiner
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.