From: Johannes Weiner <hannes@saeurebad.de>
To: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <andi@firstfloor.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Ingo Molnar <mingo@elte.hu>, Yinghai Lu <yhlu.kernel@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 0/3] bootmem2 III
Date: Fri, 16 May 2008 22:42:51 +0200
Message-ID: <87zlqqx9o4.fsf@saeurebad.de>
In-Reply-To: <20080515191210.GE21787@shadowen.org> (Andy Whitcroft's message of "Thu, 15 May 2008 20:12:10 +0100")

Hi Andy,

Andy Whitcroft <apw@shadowen.org> writes:

> On Tue, May 13, 2008 at 02:40:44PM +0200, Johannes Weiner wrote:
>> Hi,
>>
>> Andi Kleen <andi@firstfloor.org> writes:
>>
>> > Johannes Weiner wrote:
>> >
>> >>> On Fri, May 09, 2008 at 05:17:13PM +0200, Johannes Weiner wrote:
>> >>>> here is bootmem2, a memory block-oriented boot time allocator.
>> >>>>
>> >>>> Recent NUMA topologies broke the current bootmem's assumption that
>> >>>> memory nodes provide non-overlapping and contiguous ranges of pages.
>> >>> I'm still not sure that's a really good rationale for bootmem2.
>> >>> e.g. the non continuous nodes are really special cases and there tends
>> >>> to be enough memory at the beginning which is enough for boot time
>> >>> use, so for those systems it would be quite reasonable to only
>> >>> put the continuous starts of the nodes into bootmem.
>> >>
>> >> Hm, that would put the logic into arch-code. I have no strong opinion
>> >> about it.
>> >
>> > In fact I suspect the current code will already work like that
>> > implicitly. The aliasing is only a problem for the new "arbitrary node
>> > free_bootmem" right?
>>
>> And that alloc_bootmem_node() cannot guarantee node-locality which is
>> the much worse part, I think.
>>
>> >>> That said the bootmem code has gotten a little crufty and a clean
>> >>> rewrite might be a good idea.
>> >>
>> >> I agree completely.
>> >
>> > The trouble is just that bootmem is used in early boot and early boot is
>> > very subtle and getting it working over all architectures could be a
>> > challenge. Not wanting to discourage you, but it's not exactly the
>> > easiest part of the kernel to hack on.
>>
>> Bootmem seemed pretty self-contained to me, at least in the beginning.
>> The bad thing is that I can test only the most simple configuration with
>> it.
>>
>> I was wondering yesterday if it would be feasible to enforce
>> contiguousness for nodes. So that arch-code does not create one pgdat
>> for each node but one for each contiguous block. I have not yet looked
>
> That re-introduces the concept that a node is not a unit of numa locality,
> but one of memory contiguity. The kernel pretty much assumes that a node
> exhibits memory locality.

Okay.

>> deeper into it, but I suspect that other mm code has similar problems
>> with nodes spanning other nodes.
>
> One thing we do know is that we already have systems in the wild with
> overlapping nodes. PowerPC systems sometimes exhibit this behaviour, the
> ones I have seen have node 1 embedded within node 0. x86_64 also enables
> this support. This necessitated checks when initially freeing memory
> into the allocator to make sure it ended up freed into the right node.
> For non-sparsemem configurations these systems have some wasted mem_map,
> but otherwise it does work.
>
> Check out NODES_SPAN_OTHER_NODES for the code to avoid misplacing
> memory.

Will have a better look at all this. Thanks for the comment.

Hannes
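
The check Andy describes for overlapping nodes (freeing each page only into the node that really owns it) can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel's code: `struct mem_block`, `blocks`, `pfn_to_nid_sketch()` and `free_range_to_node()` are hypothetical names, and the layout with node 1 embedded within node 0 is invented to mirror the PowerPC case mentioned above.

```c
#include <stddef.h>

/* Hypothetical description of one contiguous block of pages. */
struct mem_block {
	unsigned long start_pfn;	/* first page frame number in the block */
	unsigned long end_pfn;		/* one past the last page frame number */
	int nid;			/* node that owns the block */
};

/* Assumed example layout: node 1 sits inside the span of node 0. */
static const struct mem_block blocks[] = {
	{    0, 1024, 0 },
	{ 1024, 2048, 1 },
	{ 2048, 4096, 0 },
};

/* Return the node owning pfn, or -1 if pfn lies in no known block. */
static int pfn_to_nid_sketch(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < sizeof(blocks) / sizeof(blocks[0]); i++)
		if (pfn >= blocks[i].start_pfn && pfn < blocks[i].end_pfn)
			return blocks[i].nid;
	return -1;
}

/*
 * Walk [start, end) on behalf of node nid and count the pages that
 * may be freed into it.  Pages owned by another node (or by no node)
 * are skipped, so a node whose span covers another node never
 * receives that node's memory.
 */
static unsigned long free_range_to_node(int nid, unsigned long start,
					unsigned long end)
{
	unsigned long pfn, freed = 0;

	for (pfn = start; pfn < end; pfn++) {
		if (pfn_to_nid_sketch(pfn) != nid)
			continue;	/* belongs elsewhere, do not free here */
		freed++;
	}
	return freed;
}
```

With this layout, walking the whole span of node 0 (pfns 0 to 4096) frees only the pages in the two outer blocks, and the embedded block is left for node 1. The per-page lookup is deliberately naive; it just shows why a span-based walk needs an ownership check once nodes overlap.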