linux-mm.kvack.org archive mirror
From: Robin Holt <holt@sgi.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Robin Holt <holt@sgi.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH 4/4] zone_reclaim_mode is always 0 by default
Date: Wed, 20 May 2009 09:00:45 -0500
Message-ID: <20090520140045.GA29447@sgi.com>
In-Reply-To: <20090519102003.4EAB.A69D9226@jp.fujitsu.com>

On Tue, May 19, 2009 at 11:53:44AM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> > > Current Linux policy is that zone_reclaim_mode is enabled by default if the
> > > machine has a large remote node distance. That is because, until recently,
> > > we could assume that a large distance meant a large server.
> > > 
> > > Unfortunately, recent x86 CPUs (e.g. Core i7, Opteron) have a P2P transport
> > > memory controller. IOW, they are seen as NUMA from the software view.
> > > 
> > > Some Core i7 machines have a large remote node distance, but zone_reclaim
> > > doesn't fit desktops and small file servers. It causes a performance regression.
> > > 
> > > Thus, zone_reclaim == 0 is a better default if the machine is small.
> > 
> > What if I had a node 0 with 32GB or 128GB of memory?  In that case,
> > we would have 3GB for DMA32, 125GB for Normal and then a node 1 with
> > 128GB.  I would suggest that zone reclaim would perform normally and
> > be beneficial.
> > 
> > You are unfairly classifying this as a size of machine problem when it is
> > really a problem with the underlying zone reclaim code being triggered
> > due to imbalanced node/zones, part of which is due to a single node
> > having multiple zones and those multiple zones setting up the conditions
> > for extremely aggressive reclaim.  In other words, you are putting a
> > bandage in place to hide a problem on your particular hardware.
> > 
> > Can RECLAIM_DISTANCE be adjusted so your Ci7 boxes are no longer caught?
> > Aren't 4 node Ci7 boxes soon to be readily available?  How are your apps
> > different from my apps in that you are not impacted by node locality?
> > Are you being too insensitive to node locality?  Conversely am I being
> > too sensitive?
> > 
> > All that said, I would not stop this from going in.  I just think the
> > selection criteria are rather random.  I think we know the condition we
> > are trying to avoid, which is a small Normal zone on one node and a larger
> > Normal zone on another causing zone reclaim to be overly aggressive.
> > I don't know how to quantify "small" versus "large".  I would suggest
> > that a node 0 with 16 or more GB should have zone reclaim on by default
> > as well.  Can that be expressed in the selection criteria?
> 
> I posted my opinion in another mail. Please see it.

I don't think you addressed my actual question.  How much of this is
a result of having a node where 1/4 of the memory is in the 'Normal'
zone and 3/4 is in the DMA32 zone?  How much is due to the imbalance
between Node 0 'Normal' and Node 1 'Normal'?  Shouldn't that type of
sanity check be used for turning on zone reclaim instead of some random
number of nodes?  Even with 128 nodes and 256 CPUs, I _NEVER_ see the
system swapping out before allocating off node so I can certainly not
reproduce the situation you are seeing.
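
To make the kind of sanity check asked about above concrete, here is a
rough userspace sketch.  It is not kernel code: struct node_mem,
zones_look_balanced() and the IMBALANCE_RATIO threshold are all
hypothetical names and values chosen only for illustration.  The idea is
to base the default on zone layout (a Normal zone smaller than DMA32 on
the same node, or Normal zones of very different size across nodes)
rather than on the number of nodes.

```c
#include <stdbool.h>

/* Hypothetical per-node summary; this is not a kernel structure. */
struct node_mem {
	unsigned long normal_mb;	/* size of the node's Normal zone */
	unsigned long dma32_mb;		/* size of the node's DMA32 zone, 0 if none */
};

/* Assumed threshold: largest Normal zone may be at most 4x the smallest. */
#define IMBALANCE_RATIO 4

/*
 * Return true if the zone layout looks balanced enough that zone reclaim
 * could reasonably be on by default, false otherwise.
 */
static bool zones_look_balanced(const struct node_mem *nodes, int nr_nodes)
{
	unsigned long min_normal, max_normal;
	int i;

	if (nr_nodes <= 0)
		return false;

	min_normal = max_normal = nodes[0].normal_mb;

	for (i = 0; i < nr_nodes; i++) {
		/* A Normal zone smaller than DMA32 on the same node is suspect. */
		if (nodes[i].normal_mb < nodes[i].dma32_mb)
			return false;
		if (nodes[i].normal_mb < min_normal)
			min_normal = nodes[i].normal_mb;
		if (nodes[i].normal_mb > max_normal)
			max_normal = nodes[i].normal_mb;
	}

	/* Compare Normal zones across nodes; divide to avoid overflow. */
	return max_normal / IMBALANCE_RATIO <= min_normal;
}
```

With these assumed thresholds, a node 0 holding 1GB Normal and 3GB DMA32
next to a 4GB node 1 is flagged as imbalanced, while a node 0 with 125GB
Normal and 3GB DMA32 next to a 128GB node 1 is not.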

The imbalance I have seen was when I had two small memory nodes and two
large memory nodes and then oversubscribed memory.  In that situation,
I noticed that the apps on the small memory nodes were more frequently
impacted.  This unfairness made sense to me and seemed perfectly
reasonable.

Thanks,
Robin


