From: Michal Hocko <mhocko@suse.cz>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>
Subject: Re: [PATCH v5 2/6] memcg: stop vmscan when enough done.
Date: Wed, 10 Aug 2011 16:14:25 +0200
Message-ID: <20110810141425.GC15007@tiehlicka.suse.cz>
In-Reply-To: <20110809190933.d965888b.kamezawa.hiroyu@jp.fujitsu.com>
On Tue 09-08-11 19:09:33, KAMEZAWA Hiroyuki wrote:
> memcg :avoid node fallback scan if possible.
>
> Now, try_to_free_pages() scans all zonelist because the page allocator
> should visit all zonelists...but that behavior is harmful for memcg.
> Memcg just scans memory because it hits limit...no memory shortage
> in pased zonelist.
>
> For example, with following unbalanced nodes
>
> Node 0 Node 1
> File 1G 0
> Anon 200M 200M
>
> memcg will cause swap-out from Node1 at every vmscan.
>
> Another example, assume 1024 nodes system.
> With 1024 node system, memcg will visit 1024 nodes
> pages per vmscan... This is overkilling.
>
> This is why memcg's victim node selection logic doesn't work
> as expected.
>
> This patch is a help for stopping vmscan when we scanned enough.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
OK, I see the point. At first I was afraid that we would put more
pressure on the node which triggered the reclaim, but as we are selecting
it dynamically (mem_cgroup_select_victim_node), round robin at the
moment, it should be fair in the end. More targeted node selection
should be even more efficient.
I still have a concern about the resize_limit code path, though. It uses
memcg direct reclaim to get under the new limit (assuming it is lower
than the current one).

Currently we might reclaim nr_nodes * SWAP_CLUSTER_MAX pages, while
after your change we are limited to SWAP_CLUSTER_MAX. This means that
mem_cgroup_resize_mem_limit might fail sooner on large NUMA machines
(currently it does 5 rounds of reclaim before it gives up). I do not
consider this to be a blocker, but maybe we should enhance
mem_cgroup_hierarchical_reclaim with a nr_pages argument to tell it how
much we want to reclaim (min(SWAP_CLUSTER_MAX, nr_pages)).

What do you think?
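
To put rough numbers on the concern above, here is a small userspace
sketch (not kernel code). SWAP_CLUSTER_MAX is 32 in the kernel; the
helper names, RESIZE_RETRIES and the idea of a per-call "budget" are
assumptions made for illustration only. per_call_with_target() simply
follows the min(SWAP_CLUSTER_MAX, nr_pages) expression as written in
the mail and does not claim to match any later patch.

```c
#include <assert.h>

/* SWAP_CLUSTER_MAX matches the kernel value; the rest is illustrative. */
#define SWAP_CLUSTER_MAX 32UL
#define RESIZE_RETRIES   5UL	/* the "5 rounds of reclaim" from the mail */

/* Before the patch: every node may contribute up to SWAP_CLUSTER_MAX pages. */
static unsigned long per_call_before(unsigned long nr_nodes)
{
	assert(nr_nodes > 0);
	return nr_nodes * SWAP_CLUSTER_MAX;
}

/* After the patch: the scan stops once SWAP_CLUSTER_MAX pages are reclaimed. */
static unsigned long per_call_after(void)
{
	return SWAP_CLUSTER_MAX;
}

/* Hypothetical nr_pages argument, using the expression quoted in the mail. */
static unsigned long per_call_with_target(unsigned long nr_pages)
{
	return nr_pages < SWAP_CLUSTER_MAX ? nr_pages : SWAP_CLUSTER_MAX;
}

/* Pages the resize path can hope to reclaim before it gives up. */
static unsigned long resize_budget(unsigned long per_call)
{
	return RESIZE_RETRIES * per_call;
}
```

Under these assumptions a 1024-node machine goes from a budget of
resize_budget(per_call_before(1024)) pages to
resize_budget(per_call_after()) pages across the five retries, which is
why lowering a limit by a large amount could start failing earlier.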
> ---
> mm/vmscan.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> Index: mmotm-Aug3/mm/vmscan.c
> ===================================================================
> --- mmotm-Aug3.orig/mm/vmscan.c
> +++ mmotm-Aug3/mm/vmscan.c
> @@ -2124,6 +2124,16 @@ static void shrink_zones(int priority, s
> }
>
> shrink_zone(priority, zone, sc);
> + if (!scanning_global_lru(sc)) {
> + /*
> + * When we do scan for memcg's limit, it's bad to do
> + * fallback into more node/zones because there is no
> + * memory shortage. We quit as much as possible when
> + * we reache target.
> + */
> + if (sc->nr_to_reclaim <= sc->nr_reclaimed)
> + break;
> + }
> }
> }
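
The effect of the quoted hunk can be modelled in userspace roughly as
follows. This is only a sketch: struct scan_control_model, its "global"
flag (standing in for scanning_global_lru(sc)) and the per-zone
"reclaimable" array are hypothetical simplifications, and the real
shrink_zone() does far more than add a fixed number of pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct scan_control. */
struct scan_control_model {
	unsigned long nr_to_reclaim;	/* reclaim target */
	unsigned long nr_reclaimed;	/* pages reclaimed so far */
	bool global;			/* models scanning_global_lru(sc) */
};

/*
 * Walk the zones in order, pretending each zone yields reclaimable[i]
 * pages. For memcg (non-global) reclaim, stop as soon as the target is
 * met, mirroring the check added by the patch. Returns zones visited.
 */
static size_t shrink_zones_model(const unsigned long *reclaimable,
				 size_t nr_zones,
				 struct scan_control_model *sc)
{
	size_t visited = 0;

	assert(sc != NULL);
	assert(reclaimable != NULL || nr_zones == 0);

	for (size_t i = 0; i < nr_zones; i++) {
		sc->nr_reclaimed += reclaimable[i];
		visited++;

		if (!sc->global && sc->nr_to_reclaim <= sc->nr_reclaimed)
			break;
	}
	return visited;
}
```

With four zones that each yield 40 pages and a target of 32, the memcg
case stops after the first zone while the global case still walks all
four.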
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic