From: Andy Whitcroft <apw@shadowen.org>
To: Johannes Weiner <hannes@saeurebad.de>
Cc: Andi Kleen <andi@firstfloor.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Ingo Molnar <mingo@elte.hu>, Yinghai Lu <yhlu.kernel@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 0/3] bootmem2 III
Date: Thu, 15 May 2008 20:12:10 +0100	[thread overview]
Message-ID: <20080515191210.GE21787@shadowen.org> (raw)
In-Reply-To: <874p92qsvn.fsf@saeurebad.de>

On Tue, May 13, 2008 at 02:40:44PM +0200, Johannes Weiner wrote:
> Hi,
> 
> Andi Kleen <andi@firstfloor.org> writes:
> 
> > Johannes Weiner wrote:
> >
> >>> On Fri, May 09, 2008 at 05:17:13PM +0200, Johannes Weiner wrote:
> >>>> here is bootmem2, a memory block-oriented boot time allocator.
> >>>>
> >>>> Recent NUMA topologies broke the current bootmem's assumption that
> >>>> memory nodes provide non-overlapping and contiguous ranges of pages.
> >>> I'm still not sure that's a really good rationale for bootmem2.
> >>> e.g. the non-contiguous nodes are really special cases and there tends
> >>> to be enough memory at the beginning which is enough for boot time
> >>> use, so for those systems it would be quite reasonable to only
> >>> put the contiguous starts of the nodes into bootmem.
> >> 
> >> Hm, that would put the logic into arch-code.  I have no strong opinion
> >> about it.
> >
> > In fact I suspect the current code will already work like that
> > implicitly. The aliasing is only a problem for the new "arbitrary node
> > free_bootmem", right?
> 
> And that alloc_bootmem_node() cannot guarantee node-locality, which is
> the much worse part, I think.
> 
> >>> That said the bootmem code has gotten a little crufty and a clean
> >>> rewrite might be a good idea.
> >> 
> >> I agree completely.
> >
> > The trouble is just that bootmem is used in early boot and early boot is
> > very subtle and getting it working over all architectures could be a
> > challenge. Not wanting to discourage you, but it's not exactly the
> > easiest part of the kernel to hack on.
> 
> Bootmem seemed pretty self-contained to me, at least in the beginning.
> The bad thing is that I can test only the most simple configuration with
> it.
> 
> I was wondering yesterday if it would be feasible to enforce
> contiguousness for nodes.  So that arch-code does not create one pgdat
> for each node but one for each contiguous block.  I have not yet looked

That re-introduces the concept that a node is not a unit of NUMA locality,
but one of memory contiguity.  The kernel pretty much assumes that a node
exhibits memory locality.

> deeper into it, but I suspect that other mm code has similar problems
> with nodes spanning other nodes.

One thing we do know is that we already have systems in the wild with
overlapping nodes.  PowerPC systems sometimes exhibit this behaviour; the
ones I have seen have node 1 embedded within node 0.  x86_64 also enables
this support.  This necessitated checks when initially freeing memory
into the allocator to make sure it ended up freed into the right node.
For non-sparsemem configurations these systems have some wasted mem_map,
but otherwise it does work.

Check out NODES_SPAN_OTHER_NODES for the code to avoid misplacing
memory.
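
As a rough illustration of that kind of check, here is a userspace sketch
(not the kernel's code).  The block table, pfn_to_nid_sketch() and
free_span_into_node() are hypothetical names made up for this example, and
the layout simply mirrors the "node 1 embedded within node 0" case: while
walking a node's span, any page that actually belongs to another node is
skipped instead of being freed into the wrong node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory block: a contiguous pfn range owned by one node. */
struct mem_block {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
	int nid;
};

/* Assumed layout for illustration: node 1 sits inside node 0's span. */
static const struct mem_block blocks[] = {
	{  0,  40, 0 },
	{ 40,  60, 1 },
	{ 60, 100, 0 },
};

/* Look up which node really owns a pfn; -1 if it falls in no block. */
static int pfn_to_nid_sketch(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < sizeof(blocks) / sizeof(blocks[0]); i++)
		if (pfn >= blocks[i].start_pfn && pfn < blocks[i].end_pfn)
			return blocks[i].nid;
	return -1;
}

/*
 * Walk the span [start, end) of node 'nid' and count the pages that
 * would be freed into it.  Pages owned by another node are skipped,
 * which is the point of the check when nodes span other nodes.
 */
static unsigned long free_span_into_node(int nid, unsigned long start,
					 unsigned long end)
{
	unsigned long pfn, freed = 0;

	assert(start <= end);
	for (pfn = start; pfn < end; pfn++) {
		if (pfn_to_nid_sketch(pfn) != nid)
			continue;	/* belongs to a different node */
		freed++;
	}
	return freed;
}
```

With this table, walking node 0's full span of pfns 0..100 frees only the
80 pages that node 0 owns, and the 20 pages in between go to node 1 when
its own span is walked.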

-apw

Thread overview: 24+ messages
2008-05-09 15:17 [PATCH 0/3] bootmem2 III Johannes Weiner
2008-05-09 15:17 ` Johannes Weiner
2008-05-09 15:17 ` [PATCH 1/3] mm: Make NR_NODE_MEMBLKS global Johannes Weiner
2008-05-09 15:17   ` Johannes Weiner
2008-05-09 15:17 ` [PATCH 2/3] mm: bootmem2 Johannes Weiner
2008-05-09 15:17   ` Johannes Weiner
2008-05-09 15:17 ` [PATCH 3/3] x86: Migrate X86_32 to bootmem2 Johannes Weiner
2008-05-09 15:17   ` Johannes Weiner
2008-05-09 18:40 ` [PATCH 0/3] bootmem2 III Andi Kleen
2008-05-09 18:40   ` Andi Kleen
2008-05-11 19:18   ` Johannes Weiner
2008-05-11 19:18     ` Johannes Weiner
2008-05-11 20:18     ` Andi Kleen
2008-05-11 20:18       ` Andi Kleen
2008-05-13 12:40       ` Johannes Weiner
2008-05-13 12:40         ` Johannes Weiner
2008-05-13 12:59         ` Andi Kleen
2008-05-13 12:59           ` Andi Kleen
2008-05-14 19:12           ` Johannes Weiner
2008-05-14 19:12             ` Johannes Weiner
2008-05-15 19:12         ` Andy Whitcroft [this message]
2008-05-15 19:12           ` Andy Whitcroft
2008-05-16 20:42           ` Johannes Weiner
2008-05-16 20:42             ` Johannes Weiner
