Re: [PATCH 0/7] [RFC] Sizing zones and holes in an architecture independent manner V2

From: Andi Kleen <ak@suse.de>
To: Mel Gorman <mel@skynet.ie>
Cc: davej@codemonkey.org.uk, tony.luck@intel.com,
	linuxppc-dev@ozlabs.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	bob.picco@hp.com,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [PATCH 0/7] [RFC] Sizing zones and holes in an architecture independent manner V2
Date: Thu, 13 Apr 2006 02:56:59 +0200
Message-ID: <200604130257.00203.ak@suse.de>
In-Reply-To: <Pine.LNX.4.64.0604130058210.18950@skynet.skynet.ie>

On Thursday 13 April 2006 02:22, Mel Gorman wrote:

> I experimented with the idea of all architectures sharing the struct 
> node_active_region rather than storing the information twice. It got very 
> messy, particularly for x86 because it needs to store more than nid, 
> start_pfn and end_pfn for a range of page frames (see node_memory_chunk_s 
> in arch/i386/kernel/srat.c). Worse, some architecture-specific code 
> remembers the ranges of active memory as addresses and others as pfn's. In 
> the end, I was not too worried about having the information in two places, 
> because the active ranges are kept in __initdata and get freed.

The problem is not memory consumption but complexity of code/data structures.
Keeping information in two places is usually a good cue that something 
is wrong. This code is also fragile and hard to test.
 
> I'll admit that for x86_64, the entire code path for initialisation (i.e. 
> architecture specific and architecture independent paths) is now more 
> complex. The architecture independent code needed to be able to handle 
> every variety of node layout which is overkill for x86_64. Nevertheless, 
> without size_zones(), I thought the architecture-specific code for x86_64 
> memory initialisation was a bit easier to read. With 
> architecture-independent zone size and hole calculation, you only have to 
> understand the relevant code once, not once for each architecture.


I think i386 SRAT NUMA should be just removed at some point - it never
worked all that well and is quite complicated. That leaves IA64, x86-64
and ppc64.  I suspect keeping the code there near their low level
data structures is better.

> > I have my doubts that is really an improvement over the old state.
> >
> 
> For x86_64 in isolation or the entire set of patches?

For x86-64/i386. I haven't read the other architectures.

> > I think it would be better if you just defined some simple "library functions"
> > that can be called from the architecture specific code instead of adding
> > all this new high level code.
> >
> 
> What sort of library functions would you recommend? x86_64 uses 
> add_active_range() and free_area_init_nodes() from this patchset which 
> seemed fairly straight-forward.

e.g. a generic size_zones(). Possibly some others.

-Andi
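
To make the "library function" suggestion above concrete, here is a minimal
userspace sketch of what a generic size_zones()-style helper could look like.
It is not code from the patchset or from any kernel tree. The names
struct active_range, present_pages_in_zone() and size_zones_sketch() are
hypothetical, as are the conventions that end_pfn is exclusive, that zone 0
starts at pfn 0, and that the registered ranges of a node do not overlap.
The only detail taken from the thread is that a range is described by nid,
start_pfn and end_pfn.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical record for one range of present page frames on a node. */
struct active_range {
	int nid;
	unsigned long start_pfn;	/* first pfn in the range */
	unsigned long end_pfn;		/* one past the last pfn (assumption) */
};

/* Count the pages of node nid that are present in [zone_start, zone_end). */
static unsigned long present_pages_in_zone(const struct active_range *r,
					   size_t nr, int nid,
					   unsigned long zone_start,
					   unsigned long zone_end)
{
	unsigned long present = 0;

	for (size_t i = 0; i < nr; i++) {
		unsigned long s, e;

		if (r[i].nid != nid)
			continue;
		/* Clamp the range to the zone boundaries. */
		s = r[i].start_pfn > zone_start ? r[i].start_pfn : zone_start;
		e = r[i].end_pfn < zone_end ? r[i].end_pfn : zone_end;
		if (s < e)
			present += e - s;
	}
	return present;
}

/*
 * For node nid, fill zone_size[z] with the pages spanned by zone z and
 * zone_holes[z] with the spanned pages that are not present.
 * zone_limit[z] is the exclusive upper pfn of zone z.
 */
static void size_zones_sketch(const struct active_range *r, size_t nr,
			      int nid, const unsigned long *zone_limit,
			      size_t nr_zones, unsigned long *zone_size,
			      unsigned long *zone_holes)
{
	unsigned long node_start = ULONG_MAX, node_end = 0;

	/* The node spans from its lowest start_pfn to its highest end_pfn. */
	for (size_t i = 0; i < nr; i++) {
		if (r[i].nid != nid)
			continue;
		if (r[i].start_pfn < node_start)
			node_start = r[i].start_pfn;
		if (r[i].end_pfn > node_end)
			node_end = r[i].end_pfn;
	}

	for (size_t z = 0; z < nr_zones; z++) {
		unsigned long lo = z ? zone_limit[z - 1] : 0;
		unsigned long hi = zone_limit[z];
		unsigned long s = lo > node_start ? lo : node_start;
		unsigned long e = hi < node_end ? hi : node_end;
		unsigned long spanned = s < e ? e - s : 0;
		unsigned long present =
			present_pages_in_zone(r, nr, nid, lo, hi);

		/* Holds only if the registered ranges do not overlap. */
		assert(present <= spanned);
		zone_size[z] = spanned;
		zone_holes[z] = spanned - present;
	}
}
```

In this model an architecture would keep its own low-level data structures,
translate them into an array of ranges once, and call the helper per node.
For example, a node with ranges [0, 100) and [200, 1000) and zone limits of
256 and 4096 gets a first zone of 256 spanned pages with 100 pages of holes,
and a second zone of 744 spanned pages with no holes.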
