From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>,
Andrew Morton <akpm@linux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
LKML <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH] change global zonelist order v4 [0/2]
Date: Fri, 04 May 2007 13:24:19 -0400
Message-ID: <1178299460.5236.35.camel@localhost>
In-Reply-To: <Pine.LNX.4.64.0705040913340.21436@schroedinger.engr.sgi.com>
On Fri, 2007-05-04 at 09:18 -0700, Christoph Lameter wrote:
> On Fri, 4 May 2007, Jesse Barnes wrote:
>
> > I think the idea is to avoid exhausting ZONE_DMA on some NUMA boxes by
> > ordering the fallback list first by zone, then by node distance (e.g.
> > ZONE_NORMAL of local node, then ZONE_NORMAL of next nearest node etc.,
> > followed by ZONE_DMA of local node, ZONE_DMA of next nearest node, etc.).
>
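A minimal sketch of the zone-ordered fallback described above. It assumes a hypothetical 4-node box with made-up SLIT-style distances, where only node 0 has ZONE_DMA. The `distance` and `has_zone` tables and the `build_zone_order()` helper are illustrative inventions, not the kernel's `build_zonelists()` code. The only point is the ordering: every ZONE_NORMAL candidate, nearest node first, comes before any ZONE_DMA candidate.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical 4-node topology for illustration only: these are not the
 * kernel's data structures or real SLIT values. Every node has
 * ZONE_NORMAL; only node 0 has ZONE_DMA. */
#define NR_NODES 4
enum zone_type { ZONE_DMA, ZONE_NORMAL, NR_ZONES };

static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 20, 30 },
	{ 20, 10, 30, 20 },
	{ 20, 30, 10, 20 },
	{ 30, 20, 20, 10 },
};

static const int has_zone[NR_NODES][NR_ZONES] = {
	/* ZONE_DMA, ZONE_NORMAL */
	{ 1, 1 }, { 0, 1 }, { 0, 1 }, { 0, 1 },
};

struct zref {
	int node;
	enum zone_type zone;
};

/* Order node ids by distance from 'local' (insertion sort; ties keep id order). */
static void nodes_by_distance(int local, int order[NR_NODES])
{
	for (int n = 0; n < NR_NODES; n++) {
		int j = n;
		while (j > 0 && distance[local][order[j - 1]] > distance[local][n]) {
			order[j] = order[j - 1];
			j--;
		}
		order[j] = n;
	}
}

/* Zone-ordered fallback: every node's highest zone (nearest node first),
 * then every node's next lower zone, and so on. Returns the entry count. */
int build_zone_order(int local, struct zref out[NR_NODES * NR_ZONES])
{
	int order[NR_NODES];
	int count = 0;

	assert(local >= 0 && local < NR_NODES);
	nodes_by_distance(local, order);
	for (int z = NR_ZONES - 1; z >= 0; z--)
		for (int i = 0; i < NR_NODES; i++)
			if (has_zone[order[i]][z])
				out[count++] = (struct zref){ order[i], (enum zone_type)z };
	return count;
}

/* Print the fallback list for 'local' as zone@node entries. */
void print_zone_order(int local)
{
	static const char *const names[NR_ZONES] = { "DMA", "Normal" };
	struct zref list[NR_NODES * NR_ZONES];
	int n = build_zone_order(local, list);

	printf("node %d:", local);
	for (int i = 0; i < n; i++)
		printf(" %s@%d", names[list[i].zone], list[i].node);
	printf("\n");
}
```

With these tables, `print_zone_order(1)` prints `node 1: Normal@1 Normal@0 Normal@3 Normal@2 DMA@0`. Node 0's DMA pages are therefore touched only after every node's ZONE_NORMAL is exhausted. A node-ordered list would instead place DMA@0 immediately after Normal@0.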
> Maybe it would be cleaner to set up DMA and DMA32 "nodes" and define
> them at a certain distance from the rest of the nodes, which only contain
> ZONE_NORMAL (or the zone that is replicated on all nodes). Then we would
> get that effect without reworking zonelist generation. Plus, in the long
> run we may then be able to get to 1 zone per node, avoiding the
> difficulties that come with zone fallback altogether.
>
> > Another option would be to make this behavior automatic if both ZONE_DMA
> > and ZONE_NORMAL had pages. I initially wrote this stuff with the idea
> > that machines that really needed it would have all their memory in
> > ZONE_DMA, but obviously that's not the case, so some more smarts are
> > needed.
>
> I think what would work is to first set up nodes that use the highest zone,
> then add virtual nodes for the lower zones that may only exist on a single
> node.
>
> E.g., a 4-node x86_64 box may have:
>
> Node
> 0 ZONE_NORMAL
> 1 ZONE_NORMAL
> 2 ZONE_NORMAL
> 3 ZONE_NORMAL
> 4 ZONE_DMA32
> 5 [additional ZONE_DMA32 if zone DMA32 is split over multiple nodes]
> 6 ZONE_DMA
>
> The SLIT information can be used to control how the nodes fall back to the
> DMA32 nodes 4 and 5. Node 6 would be given a very high SLIT distance so
> that it would be used only if an actual __GFP_DMA allocation occurs or the
> system really runs into memory difficulties.
Hmmm... "serious hackery", indeed! ;-)
Lee
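A minimal sketch of how the 7-node virtual-node layout quoted above could drive fallback purely from SLIT distances. It assumes made-up distances: nodes 0-1 are nearer DMA32 node 4, nodes 2-3 are nearer DMA32 node 5, and the DMA node 6 sits at 200 from everyone. The `slit` and `node_zone` tables and `vnode_fallback()` are hypothetical. The `dma_only` flag merely stands in for a __GFP_DMA request and is not a kernel interface.

```c
#include <assert.h>

/* Hypothetical layout following the proposed virtual-node scheme: nodes
 * 0-3 hold only ZONE_NORMAL, nodes 4-5 are virtual ZONE_DMA32 nodes and
 * node 6 is a virtual ZONE_DMA node. Distances are made-up SLIT values. */
#define NR_VNODES 7
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL };

static const enum zone_type node_zone[NR_VNODES] = {
	ZONE_NORMAL, ZONE_NORMAL, ZONE_NORMAL, ZONE_NORMAL,
	ZONE_DMA32, ZONE_DMA32, ZONE_DMA,
};

static const int slit[NR_VNODES][NR_VNODES] = {
	/*  0    1    2    3    4    5    6 */
	{  10,  20,  20,  20,  30,  40, 200 },
	{  20,  10,  20,  20,  30,  40, 200 },
	{  20,  20,  10,  20,  40,  30, 200 },
	{  20,  20,  20,  10,  40,  30, 200 },
	{  30,  30,  40,  40,  10,  20, 200 },
	{  40,  40,  30,  30,  20,  10, 200 },
	{ 200, 200, 200, 200, 200, 200,  10 },
};

/* Fallback list for 'local': nodes sorted by SLIT distance (stable
 * insertion sort, ties keep id order). With dma_only set, keep only nodes
 * holding ZONE_DMA, as a stand-in for a __GFP_DMA request.
 * Returns the number of entries written to 'out'. */
int vnode_fallback(int local, int dma_only, int out[NR_VNODES])
{
	int count = 0;

	assert(local >= 0 && local < NR_VNODES);
	for (int n = 0; n < NR_VNODES; n++) {
		int j;

		if (dma_only && node_zone[n] != ZONE_DMA)
			continue;
		j = count++;
		while (j > 0 && slit[local][out[j - 1]] > slit[local][n]) {
			out[j] = out[j - 1];
			j--;
		}
		out[j] = n;
	}
	return count;
}
```

For node 2 this yields 2, 0, 1, 3, 5, 4, 6. The nearer DMA32 node 5 comes before node 4, and the DMA node comes last. With `dma_only` set, only node 6 is returned. No zonelist-specific code is needed: the ordering falls out of the distance table alone.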