public inbox for linux-kernel@vger.kernel.org
From: Andy Whitcroft <apw@shadowen.org>
To: Andi Kleen <ak@suse.de>
Cc: Mel Gorman <mel@skynet.ie>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86 Boot NUMA kernels on non-NUMA hardware with DISCONTIG memory model
Date: Sat, 25 Aug 2007 12:09:17 +0100
Message-ID: <46D00DDD.1070101@shadowen.org>
In-Reply-To: <20070824175339.GD16227@bingen.suse.de>

Andi Kleen wrote:
> On Fri, Aug 24, 2007 at 06:44:38PM +0100, Mel Gorman wrote:
>> On (24/08/07 19:38), Andi Kleen didst pronounce:
>>>> Other than the fact that the memmap must be PMD aligned to use hugepage
>>>> entries for the memmap. 
>>> Why is that so?  mem_map should be just part of lowmem anyways.
>>>
>> Not in this case. memmap is allocated node local and mapped in the virtual
>> memory area normally occupied by the end of low memory. The objective was
>> to have memmap for the struct pages node-local. Hence, portions of
>> memmap are really in highmem.
> 
> Ok, but that still doesn't mean it has to be PMD aligned, 
> as long as illegal virtual aliases are prevented in the overlap
> (which is not very hard) 
> 
>>>> It could be mapped with small pages in corner cases
>>>> but is the complexity worth it?
>>> You don't need to map it with small pages in the normal case,
>>> the only requirement is that c_p_a() is aware of it so it can
>>> split it if needed.
>>>
>>>> I can't see this type of lifting being done any time soon. As SPARSEMEM works
>>>> and there is hope with the vmemmap work that DISCONTIG will finally go away,
>>>> it may not be the best investment of time.
>>> It's a trivial change, probably less code than your original patch.
>>>
>> I'll have to take your word for it because I haven't looked closely
>> enough. I'll try and find time to look at it but the earliest I'll get around
>> to it is post kernel-summit. In the meantime, SPARSEMEM works.
> 
> Ok, so we disable DISCONTIG i386 NUMA because there's nobody willing
> to maintain it?
>
> I'll take your word SPARSEMEM works, although I was told DISCONTIG NUMA
> works too and then my testing told a quite different story.

That sounds like overkill to me.  The unfixed code works on all actual
NUMA systems I am aware of, else we would have had reports of this
problem before in the years that this code has been in the kernel.  The
fix Mel sent makes the code work on systems with unaligned node ends
(which is what triggers the issue).  It does mean that a little memory
is wasted when this kernel is used on non-NUMA systems with unaligned
node ends (only), but it works as designed at that point.  To be honest
it looks very much like only very small memory systems are going to
trip this, and we have traditionally used non-NUMA kernels on non-NUMA
systems so there is almost zero exposure in our install base.
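
To put a number on "a little memory", here is a rough sketch of the
arithmetic.  It is not the code from Mel's patch.  It assumes the node
end is trimmed down to a PMD-sized boundary, with 4 KiB pages and 512
pages per PMD (the PAE case; without PAE it would be 1024).  The names
PAGES_PER_PMD, pmd_align_down() and pages_wasted() are made up for this
illustration and are not kernel symbols.

```c
#include <stdio.h>

/* Assumption for illustration: PAE layout, 512 4 KiB pages per PMD (2 MiB). */
#define PAGES_PER_PMD 512UL

/* Round a page frame number down to the previous PMD-sized boundary. */
static unsigned long pmd_align_down(unsigned long pfn)
{
	return pfn & ~(PAGES_PER_PMD - 1);
}

/* Pages given up if an unaligned node end is trimmed to that boundary. */
static unsigned long pages_wasted(unsigned long node_end_pfn)
{
	return node_end_pfn - pmd_align_down(node_end_pfn);
}
```

For a hypothetical node ending at pfn 0x1fff0, pmd_align_down() gives
0x1fe00 and pages_wasted() gives 496 pages, so under these assumptions
the loss is always less than one PMD's worth (2 MiB) per unaligned
node end.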

Does this sudden interest in this combination indicate a distro-driven
change to using NUMA kernels on non-NUMA systems?

Having been involved in the original development of this code, I think
Mel's fix is a good compromise to fix the immediate problem.  Clearly
there are bigger problems in this code that need clearing up if we are
to use it as it is on small memory non-NUMA systems.  For one, the
change merged to fix the "memmap overlapping initrd allocation" severely
wastes memory by pushing the memmap into ZONE_NORMAL even when there is
spare Kernel Virtual Address space available, and also loses the memory
under it where it used to shift to HIGHMEM.

I think that most of this can become moot if we simply pull node-0 out
of this remap scheme, as node-0's memory is already local and the
problem only occurs on node-0.  I have a todo item to look over this,
but as Mel has indicated it's probably not going to be immediate.

I think it makes sense to take Mel's fix as the smallest repair and
we'll spend some time sorting it out cleanly soon.

-apw

Thread overview: 10+ messages
2007-08-24 16:28 [PATCH] x86 Boot NUMA kernels on non-NUMA hardware with DISCONTIG memory model Mel Gorman
2007-08-24 16:35 ` Andi Kleen
2007-08-24 16:52   ` Andy Whitcroft
2007-08-24 17:07     ` Andi Kleen
2007-08-24 17:26       ` Mel Gorman
2007-08-24 17:38         ` Andi Kleen
2007-08-24 17:44           ` Mel Gorman
2007-08-24 17:53             ` Andi Kleen
2007-08-24 18:02               ` Mel Gorman
2007-08-25 11:09               ` Andy Whitcroft [this message]
