From: Robin Holt <holt@sgi.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux-foundation.org>,
Robin Holt <holt@sgi.com>
Subject: Re: [PATCH 4/4] zone_reclaim_mode is always 0 by default
Date: Thu, 14 May 2009 06:48:27 -0500
Message-ID: <20090514114827.GN7601@sgi.com>
In-Reply-To: <20090514170721.9B75.A69D9226@jp.fujitsu.com>
> Unfortunately no.
> zone reclaim has two weaknesses by design.
>
> 1.
> zone reclaim doesn't work well when workingset size > local node size,
> and that can happen easily on a small machine.
> If it happens, zone reclaim drops the process's own memory.
>
> Plus, zone reclaim also doesn't fit a DB server: its processes have a large
> workingset.
Large DB server is not your typical desktop application either.
> 2.
> zone reclaim has an inter-zone balancing issue.
>
> example: an x86_64 2-node 8G machine has the following zone assignment
>
> zone 0 (DMA32): 3GB
> zone 0 (Normal): 1GB
> zone 1 (Normal): 4GB
>
> If the page is allocated from DMA32, you are lucky: DMA32 isn't reclaimed
> so frequently. But if it comes from zone 0 Normal, you are unlucky:
> that zone is reclaimed very frequently although it is smaller than the other zones.
I have seen that behavior on some of our mismatched large systems as well,
although I have never had one so imbalanced because ia64 only has Normal.
> I know my patch changes the large-server default, but I believe the Linux
> default kernel parameters should suit desktop and entry-level machines.
If this imbalance is an x86_64-only problem, then we could do something
simple like the following untested patch. This leaves the default
unchanged for everyone except x86_64.
Robin
------------------------------------------------------------------------
Even if there is a great node distance on x86_64, disable zone reclaim
by default. This was done to handle the imbalanced zone sizes where a
majority of the memory in zone 0 is DMA32 with a small remaining Normal
which will be aggressively reclaimed.
For other architectures, we leave the default behavior.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
---
arch/x86/include/asm/topology.h | 2 ++
include/linux/topology.h | 5 +++++
mm/page_alloc.c | 2 +-
3 files changed, 8 insertions(+), 1 deletion(-)
Index: page_reclaim_mode/arch/x86/include/asm/topology.h
===================================================================
--- page_reclaim_mode.orig/arch/x86/include/asm/topology.h 2009-05-14 06:44:20.118925713 -0500
+++ page_reclaim_mode/arch/x86/include/asm/topology.h 2009-05-14 06:44:21.251067716 -0500
@@ -128,6 +128,8 @@ extern unsigned long node_remap_size[];
#endif
+#define DEFAULT_ZONE_RECLAIM_MODE 0
+
/* sched_domains SD_NODE_INIT for NUMA machines */
#define SD_NODE_INIT (struct sched_domain) { \
.min_interval = 8, \
Index: page_reclaim_mode/include/linux/topology.h
===================================================================
--- page_reclaim_mode.orig/include/linux/topology.h 2009-05-14 06:44:20.070919619 -0500
+++ page_reclaim_mode/include/linux/topology.h 2009-05-14 06:44:21.279071382 -0500
@@ -61,6 +61,11 @@ int arch_update_cpu_topology(void);
*/
#define RECLAIM_DISTANCE 20
#endif
+
+#ifndef DEFAULT_ZONE_RECLAIM_MODE
+#define DEFAULT_ZONE_RECLAIM_MODE 1
+#endif
+
#ifndef PENALTY_FOR_NODE_WITH_CPUS
#define PENALTY_FOR_NODE_WITH_CPUS (1)
#endif
Index: page_reclaim_mode/mm/page_alloc.c
===================================================================
--- page_reclaim_mode.orig/mm/page_alloc.c 2009-05-14 06:44:20.138928363 -0500
+++ page_reclaim_mode/mm/page_alloc.c 2009-05-14 06:44:21.311075244 -0500
@@ -2331,7 +2331,7 @@ static void build_zonelists(pg_data_t *p
* to reclaim pages in a zone before going off node.
*/
if (distance > RECLAIM_DISTANCE)
- zone_reclaim_mode = 1;
+ zone_reclaim_mode = DEFAULT_ZONE_RECLAIM_MODE;
/*
* We don't want to pressure a particular node.
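
For readers who want to see the override pattern in isolation, here is a rough
userspace sketch, not kernel code. It assumes a hypothetical build switch,
SIMULATE_X86_64, standing in for the arch header, and a hypothetical helper,
pick_zone_reclaim_mode(), standing in for the check in build_zonelists().
Only the macro names and the values 0, 1 and 20 come from the patch above.

```c
/*
 * Userspace illustration of the #ifndef fallback used by the patch.
 * SIMULATE_X86_64 and pick_zone_reclaim_mode() are made up for this
 * sketch; they do not exist in the kernel.
 */

/* Stand-in for arch/x86/include/asm/topology.h: the arch opts out. */
#ifdef SIMULATE_X86_64
#define DEFAULT_ZONE_RECLAIM_MODE 0
#endif

/* Stand-in for include/linux/topology.h: generic fallbacks. */
#ifndef RECLAIM_DISTANCE
#define RECLAIM_DISTANCE 20
#endif

#ifndef DEFAULT_ZONE_RECLAIM_MODE
#define DEFAULT_ZONE_RECLAIM_MODE 1
#endif

/*
 * Simplified version of the test in build_zonelists(): when a node is
 * farther away than RECLAIM_DISTANCE, switch to the (possibly
 * arch-overridden) default; otherwise keep the current mode.
 */
static int pick_zone_reclaim_mode(int current_mode, int distance)
{
	if (distance > RECLAIM_DISTANCE)
		return DEFAULT_ZONE_RECLAIM_MODE;
	return current_mode;
}
```

Built without -DSIMULATE_X86_64, a distance above 20 yields mode 1, matching
the existing behavior; built with it, the mode stays 0 regardless of
distance, which is the x86_64 behavior the patch is aiming for.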