From: "Bob Picco" <bob.picco@hp.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Martin J. Bligh" <mbligh@mbligh.org>, Andi Kleen <ak@suse.de>,
Ingo Molnar <mingo@elte.hu>,
linux-kernel@vger.kernel.org, Andrew Morton <akpm@osdl.org>,
Linux Memory Management <linux-mm@kvack.org>,
Andy Whitcroft <apw@shadowen.org>
Subject: Re: assert/crash in __rmqueue() when enabling CONFIG_NUMA
Date: Wed, 3 May 2006 21:32:39 -0400
Message-ID: <20060504013239.GG19859@localhost>
In-Reply-To: <44576BF5.8070903@yahoo.com.au>
Nick Piggin wrote: [Tue May 02 2006, 10:25:57AM EDT]
> Martin J. Bligh wrote:
> >>Oh that's a 32bit kernel. I don't think the 32bit NUMA has ever worked
> >>anywhere but some Summit systems (at least every time I tried it it
> >>blew up on me and nobody seems to use it regularly). Maybe it would be
> >>finally time to mark it CONFIG_BROKEN though or just remove it (even
> >>by design it doesn't work very well)
> >
> >
> >Bollocks. It works fine, and is tested every single day, on every git
> >release, and every -mm tree.
>
> Whatever the case, there definitely does not appear to be sufficient
> zone alignment enforced for the buddy allocator. I cannot see how it
> could work if zones are not aligned on 4MB boundaries.
>
> Maybe some architectures / subarch code naturally does this for us,
> but Ingo is definitely hitting this bug because his config does not
> (align, that is).
>
> I've randomly added a couple more cc's.
>
The patch below isn't compile tested, and it isn't correct for those cases
where alloc_remap is called or where arch code has already allocated
node_mem_map for CONFIG_FLAT_NODE_MEM_MAP. It just conveys what I believe
the issue is.
Andy added code to the buddy allocator so that it doesn't require the zone's
endpoints to be aligned to MAX_ORDER. I think the issue is that the buddy
allocator still requires the node_mem_map's endpoints to be MAX_ORDER aligned.
Otherwise __page_find_buddy could compute a buddy that is not in node_mem_map
for partial MAX_ORDER regions at the zone's endpoints. With an aligned
node_mem_map, page_is_buddy will detect that these pages at the endpoints
aren't PG_buddy (they were zeroed out by the bootmem allocator and are not
part of the zone). Of course the negative here is that we could waste a
little memory, but the positive is eliminating all the old checks for zone
boundary conditions.
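To make the out-of-map buddy concrete, here is a minimal sketch in plain C, not kernel code. It assumes the usual buddy rule that the buddy of an order-aligned block is found by flipping bit `order` of its pfn. The `sketch_` names and the value of `SKETCH_MAX_ORDER` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value for illustration; the kernel's MAX_ORDER is config dependent. */
#define SKETCH_MAX_ORDER 11

/*
 * Buddy of the block that starts at pfn and spans (1 << order) pages.
 * Assumption: the buddy is found by flipping bit 'order' of the pfn.
 */
static unsigned long sketch_buddy_pfn(unsigned long pfn, unsigned int order)
{
	/* Blocks of the top order have no buddy to merge with. */
	assert(order < SKETCH_MAX_ORDER - 1);
	/* The block itself must be aligned to its own order. */
	assert((pfn & ((1UL << order) - 1)) == 0);
	return pfn ^ (1UL << order);
}

/* True if the whole buddy block lies inside the map [map_start, map_end). */
static bool sketch_buddy_in_map(unsigned long pfn, unsigned int order,
				unsigned long map_start, unsigned long map_end)
{
	unsigned long buddy = sketch_buddy_pfn(pfn, order);

	return buddy >= map_start && buddy + (1UL << order) <= map_end;
}
```

With a map that starts at an unaligned pfn such as 1000, the order-3 block at pfn 1000 has its buddy at pfn 992, which is below the start of the map. If the map instead starts at the MAX_ORDER aligned pfn below 1000, the same buddy is covered by a struct page that the allocator can safely inspect.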
SPARSEMEM won't encounter this issue because of the MAX_ORDER size
constraint when SPARSEMEM is configured. ia64 VIRTUAL_MEM_MAP doesn't
need the logic either, because holes and endpoints are handled
differently. This leaves checking alloc_remap and the other arches which
privately allocate node_mem_map.
Anyhow, I could be totally wrong, but like I said this requires more
thought.
bob
Index: linux-2.6.17-rc3/mm/page_alloc.c
===================================================================
--- linux-2.6.17-rc3.orig/mm/page_alloc.c 2006-04-27 09:44:02.000000000 -0400
+++ linux-2.6.17-rc3/mm/page_alloc.c 2006-05-03 14:50:13.000000000 -0400
@@ -2123,14 +2123,23 @@ static void __init alloc_node_mem_map(st
#ifdef CONFIG_FLAT_NODE_MEM_MAP
/* ia64 gets its own node_mem_map, before this, without bootmem */
if (!pgdat->node_mem_map) {
- unsigned long size;
+ unsigned long size, start, end;
struct page *map;
- size = (pgdat->node_spanned_pages + 1) * sizeof(struct page);
+ /*
+ * The zone's endpoints aren't required to be MAX_ORDER
+ * aligned but the node_mem_map endpoints must be in order
+ * for the buddy allocator to function correctly.
+ */
+ start = pgdat->node_start_pfn & ~((1 << (MAX_ORDER - 1)) - 1);
+ end = pgdat->node_start_pfn + pgdat->node_spanned_pages;
+ end = (end + ((1 << (MAX_ORDER - 1)) - 1)) &
+ ~((1 << (MAX_ORDER - 1)) - 1);
+ size = (end - start) * sizeof(struct page);
map = alloc_remap(pgdat->node_id, size);
if (!map)
map = alloc_bootmem_node(pgdat, size);
- pgdat->node_mem_map = map;
+ pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
}
#ifdef CONFIG_FLATMEM
/*
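
The alignment arithmetic the patch is aiming for can be checked in isolation. The following is a userspace sketch, not the kernel implementation: the `sketch_` helpers and the `SKETCH_MAX_ORDER` value are hypothetical, and it assumes the end of the node is measured from node_start_pfn before being rounded up.

```c
#include <assert.h>

/* Hypothetical value for illustration; the kernel's MAX_ORDER is config dependent. */
#define SKETCH_MAX_ORDER 11
#define SKETCH_MAX_ORDER_NR_PAGES (1UL << (SKETCH_MAX_ORDER - 1))

/* Round a pfn down to the nearest MAX_ORDER block boundary. */
static unsigned long sketch_align_down(unsigned long pfn)
{
	return pfn & ~(SKETCH_MAX_ORDER_NR_PAGES - 1);
}

/* Round a pfn up to the nearest MAX_ORDER block boundary. */
static unsigned long sketch_align_up(unsigned long pfn)
{
	return (pfn + SKETCH_MAX_ORDER_NR_PAGES - 1) &
		~(SKETCH_MAX_ORDER_NR_PAGES - 1);
}

/* Number of struct page entries needed to cover the aligned range. */
static unsigned long sketch_map_entries(unsigned long node_start_pfn,
					unsigned long node_spanned_pages)
{
	unsigned long start = sketch_align_down(node_start_pfn);
	unsigned long end = sketch_align_up(node_start_pfn + node_spanned_pages);

	assert(end >= start);
	return end - start;
}

/* Offset of the node's first real page inside the padded map. */
static unsigned long sketch_map_offset(unsigned long node_start_pfn)
{
	return node_start_pfn - sketch_align_down(node_start_pfn);
}
```

For example, a node starting at pfn 1023 and spanning 1026 pages ends at pfn 2049, so the padded map covers pfns 0 to 3072 and the node's first page sits 1023 entries into it. A node whose endpoints are already aligned gets no padding and an offset of zero.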