linux-mm.kvack.org archive mirror
From: Ying Han <yinghan@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Tejun Heo <tj@kernel.org>, Pavel Emelyanov <xemul@openvz.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Li Zefan <lizf@cn.fujitsu.com>, Mel Gorman <mel@csn.ul.ie>,
	Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
	Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@suse.cz>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Zhu Yanhai <zhu.yanhai@gmail.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH V6 00/10] memcg: per cgroup background reclaim
Date: Wed, 20 Apr 2011 22:28:17 -0700
Message-ID: <BANLkTimUQjW_XVdzoLJJwwFDuFvm=Qg_FA@mail.gmail.com>
In-Reply-To: <20110421050851.GI2333@cmpxchg.org>

On Wed, Apr 20, 2011 at 10:08 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:

> On Thu, Apr 21, 2011 at 01:00:16PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Thu, 21 Apr 2011 04:51:07 +0200
> > Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > > > If the cgroup is configured to use per cgroup background reclaim, a kswapd
> > > > thread is created which only scans the per-memcg LRU list.
> > >
> > > We already have direct reclaim, direct reclaim on behalf of a memcg,
> > > and global kswapd-reclaim.  Please don't add yet another reclaim path
> > > that does its own thing and interacts unpredictably with the rest of
> > > them.
> > >
> > > As discussed on LSF, we want to get rid of the global LRU.  So the
> > > goal is to have each reclaim entry end up at the same core part of
> > > reclaim that round-robin scans a subset of zones from a subset of
> > > memory control groups.
> >
> > It's not related to this set. And I think even if we remove global LRU,
> > global-kswapd and memcg-kswapd need to do independent work.
> >
> > global-kswapd : works for zone/node balancing and making free pages,
> >                 and compaction. select a memcg victim and ask it
> >                 to reduce memory with regard to gfp_mask. Starts its work
> >                 when zone/node is unbalanced.
>
> For soft limit reclaim (which is triggered by global memory pressure),
> we want to scan a group of memory cgroups equally in round robin
> fashion.  I think at LSF we established that it is not fair to find
> the one that exceeds its limit the most and hammer it until memory
> pressure is resolved or there is another group with more excess.
>
> So even for global kswapd, sooner or later we need a mechanism to
> apply equal pressure to a set of memcgs.
>
> With the removal of the global LRU, we ALWAYS operate on a set of
> memcgs in a round-robin fashion, not just for soft limit reclaim.
>
> So yes, these are two different things, but they have the same
> requirements.
>

Hmm. I don't think we disagree on global kswapd. The plan now is to do the
round-robin based on each memcg's soft_limit. (Note: this is not how it is
implemented now; I am working on that patch.)
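A minimal sketch of one possible reading of "round-robin based on soft_limit", under stated assumptions: memcgs are modeled as plain page counters (struct memcg_model and reclaim_round_robin() are hypothetical, not kernel APIs), a memcg is eligible only while its usage exceeds its soft limit, and each eligible memcg gives up at most one fixed batch per visit, so pressure is spread evenly instead of hammering the group with the largest excess.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified memcg model: page counts only. */
struct memcg_model {
	unsigned long usage;      /* pages currently charged */
	unsigned long soft_limit; /* pages */
};

/*
 * Reclaim up to 'target' pages by visiting memcgs in round-robin order,
 * taking at most 'batch' pages per visit from each memcg that is above
 * its soft limit. '*cursor' remembers where the previous call stopped,
 * so successive calls keep rotating instead of restarting at index 0.
 * Returns the number of pages reclaimed.
 */
static unsigned long reclaim_round_robin(struct memcg_model *groups, size_t n,
					 size_t *cursor, unsigned long target,
					 unsigned long batch)
{
	unsigned long reclaimed = 0;
	size_t idle = 0; /* consecutive visits that made no progress */

	assert(batch > 0);
	assert(n == 0 || *cursor < n);

	while (reclaimed < target && n > 0 && idle < n) {
		struct memcg_model *g = &groups[*cursor];
		*cursor = (*cursor + 1) % n;

		if (g->usage <= g->soft_limit) {
			idle++; /* not eligible: at or below soft limit */
			continue;
		}

		unsigned long excess = g->usage - g->soft_limit;
		unsigned long want = target - reclaimed;
		unsigned long take = batch;
		if (take > excess)
			take = excess;
		if (take > want)
			take = want;

		g->usage -= take;
		reclaimed += take;
		idle = 0;
	}
	return reclaimed;
}
```

Because the cursor persists across calls, a later invocation resumes with the next memcg rather than the first one again; the loop stops once the target is met or a full pass finds no memcg above its soft limit.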

>
> > memcg-kswapd  : works for reducing usage of memory, no interests on
> >                 zone/nodes. Starts when high/low watermarks are hit.
>
> When the watermark is hit in the charge path, we want to wake up the
> daemon to reclaim from a specific memcg.
>
> When multiple memcgs exceed their watermarks in parallel (after all,
> we DO allow concurrency), we again have a group of memcgs we want to
> reclaim from in a fair fashion until their watermarks are met again.
>
> And memcg reclaim is not oblivious to nodes and zones, right now, we
> also do mind the current node and respect the zone balancing when we
> do direct reclaim on behalf of a memcg.
>
> So, to be honest, I really don't see how both cases should be
> independent from each other.  On the contrary, I see very little
> difference between them.  The entry path differs slightly as well as
> the predicate for the set of memcgs to scan.  But most of the worker
> code is exactly the same, no?
>

They are triggered at different points and have different targets. Global
kswapd is triggered under global memory pressure, and which memcgs to
reclaim from, and how much, is calculated based on soft_limit. Its target
is to meet the zone wmark while also keeping the zones balanced. Per-memcg
kswapd is triggered per-memcg on its own wmarks, and its target is to bring
the memcg usage below the wmark.
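To make the comparison concrete, here is a toy sketch (all names hypothetical; struct target collapses zone and memcg state into one record for brevity) of a single reclaim worker driven by two different stop predicates: one checks a zone's free pages against its watermark, the other checks a memcg's usage against its own watermark. It only shows where the entry predicate and the target diverge; it does not model LRU scanning, priorities, or zone balancing.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reclaim target (hypothetical): one pool of reclaimable pages. */
struct target {
	unsigned long free;  /* zone view: free pages */
	unsigned long usage; /* memcg view: charged, reclaimable pages */
	unsigned long goal;  /* watermark the caller wants to reach */
};

typedef bool (*done_fn)(const struct target *t);

/* Global-kswapd-style predicate: done once free pages reach the zone wmark. */
static bool zone_balanced(const struct target *t)
{
	return t->free >= t->goal;
}

/* Per-memcg-kswapd-style predicate: done once usage is at or below the wmark. */
static bool memcg_below_wmark(const struct target *t)
{
	return t->usage <= t->goal;
}

/*
 * Shared worker: reclaim in batches until 'done' says stop or nothing is
 * left. Each batch uncharges pages from 'usage' and returns them to 'free'.
 * Returns the total number of pages reclaimed.
 */
static unsigned long reclaim_until(struct target *t, done_fn done,
				   unsigned long batch)
{
	unsigned long total = 0;

	assert(batch > 0);
	while (!done(t) && t->usage > 0) {
		unsigned long take = t->usage < batch ? t->usage : batch;
		t->usage -= take;
		t->free += take;
		total += take;
	}
	return total;
}
```

The worker loop is identical in both cases; only the predicate passed in (and therefore what "done" means) differs.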

>
> > > > Two watermarks ("high_wmark", "low_wmark") are added to trigger the
> > > > background reclaim and stop it. The watermarks are calculated based
> > > > on the cgroup's limit_in_bytes.
> > >
> > > Which brings me to the next issue: making the watermarks configurable.
> > >
> > > You argued that having them adjustable from userspace is required for
> > > overcommitting the hardlimits and per-memcg kswapd reclaim not kicking
> > > in in case of global memory pressure.  But that is only a problem
> > > because global kswapd reclaim is (apart from soft limit reclaim)
> > > unaware of memory control groups.
> > >
> > > I think the much better solution is to make global kswapd memcg aware
> > > (with the above mentioned round-robin reclaim scheduler), compared to
> > > adding new (and final!) kernel ABI to avoid an internal shortcoming.
> >
> > I don't think it's a good idea to kick kswapd even when free memory is enough.
>
> This depends on what kswapd is supposed to be doing.  I don't say we
> should reclaim from all memcgs (i.e. globally) just because one memcg
> hits its watermark, of course.
>
> But the argument was that we need the watermarks configurable to force
> per-memcg reclaim even when the hard limits are overcommitted, because
> global reclaim does not do a fair job to balance memcgs.


There seems to be some confusion here. The watermarks we defined are
per-memcg, and they are calculated based on the hard_limit. We need the
per-memcg wmarks for the same reason we have per-zone wmarks: they trigger
background reclaim before direct reclaim.
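A minimal sketch of deriving two per-memcg watermarks from the hard limit. It assumes, by analogy with zone watermarks rather than from the patch itself, that each wmark sits a fixed distance below the limit: background reclaim is woken once usage rises above the low wmark and runs until usage falls to the high wmark, which sits further below the limit. The distances and all function names are hypothetical. Since both wmarks are below the hard limit, the background reclaimer gets a chance to run before a charge would hit the limit and enter direct reclaim.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-memcg watermarks, expressed as usage in bytes. */
struct memcg_wmarks {
	unsigned long long low;  /* usage above this wakes background reclaim */
	unsigned long long high; /* reclaim runs until usage is at or below this */
};

/*
 * Derive both wmarks from the hard limit using two hypothetical distances
 * below the limit. high_dist >= low_dist, so high <= low, which leaves a
 * hysteresis gap between waking the reclaimer and stopping it.
 */
static struct memcg_wmarks wmarks_from_limit(unsigned long long limit,
					     unsigned long long low_dist,
					     unsigned long long high_dist)
{
	struct memcg_wmarks w;

	assert(low_dist <= high_dist && high_dist <= limit);
	w.low = limit - low_dist;
	w.high = limit - high_dist;
	return w;
}

/* Charge path: should the per-memcg background reclaimer be woken? */
static bool need_background_reclaim(const struct memcg_wmarks *w,
				    unsigned long long usage)
{
	return usage > w->low;
}

/* Reclaimer loop condition: stop once usage has dropped to the high wmark. */
static bool background_reclaim_done(const struct memcg_wmarks *w,
				    unsigned long long usage)
{
	return usage <= w->high;
}
```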

There is a patch in my patchset which adds tunables for both high_wmark and
low_wmark, which gives the admin more flexibility to configure the host. In
an over-committed environment, we might never hit the wmarks if they are
all set internally.
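The over-commit case can be illustrated with a small worked check. The figures are made up for illustration (a 16 GiB host running four memcgs with 8 GiB hard limits each, i.e. 32 GiB committed, and internal wake-up wmarks assumed to sit 64 MiB below each limit), the helper is hypothetical, and kernel and non-memcg memory are ignored. If the wake-up wmarks add up to more than physical memory, the groups can together fill the host while each one is still under its wmark, so no per-memcg reclaimer is ever woken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: can the host run out of memory while every memcg
 * is still at or below its background-reclaim wake-up wmark? That is
 * possible exactly when the wake-up wmarks sum to more than host memory.
 * All values are in MiB.
 */
static bool wmarks_can_miss_global_pressure(unsigned long host_mib,
					    const unsigned long *wake_wmark_mib,
					    size_t n)
{
	unsigned long long sum = 0;

	assert(wake_wmark_mib != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		sum += wake_wmark_mib[i];
	return sum > host_mib;
}
```

With internally set wmarks of 8128 MiB each the sum is 32512 MiB, well above the 16384 MiB host; with wmarks tuned down to 4000 MiB each the sum is 16000 MiB, so the host cannot fill up without at least one group crossing its wmark.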

> My counter proposal is to fix global reclaim instead and apply equal
> pressure on memcgs, such that we never have to tweak per-memcg watermarks
> to achieve the same thing.

We still need this, and that is the soft_limit reclaim under global
background reclaim.

--Ying

Thread overview: 58+ messages
2011-04-19  3:57 [PATCH V6 00/10] memcg: per cgroup background reclaim Ying Han
2011-04-19  3:57 ` [PATCH V6 01/10] Add kswapd descriptor Ying Han
2011-04-19  3:57 ` [PATCH V6 02/10] Add per memcg reclaim watermarks Ying Han
2011-04-19  3:57 ` [PATCH V6 03/10] New APIs to adjust per-memcg wmarks Ying Han
2011-04-19  3:57 ` [PATCH V6 04/10] Infrastructure to support per-memcg reclaim Ying Han
2011-04-19  3:57 ` [PATCH V6 05/10] Implement the select_victim_node within memcg Ying Han
2011-04-19  3:57 ` [PATCH V6 06/10] Per-memcg background reclaim Ying Han
2011-04-20  1:03   ` KAMEZAWA Hiroyuki
2011-04-20  3:25     ` Ying Han
2011-04-20  4:20     ` Ying Han
2012-03-19  8:14   ` Zhu Yanhai
2012-03-20  5:37     ` Ying Han
2011-04-19  3:57 ` [PATCH V6 07/10] Add per-memcg zone "unreclaimable" Ying Han
2011-04-19  3:57 ` [PATCH V6 08/10] Enable per-memcg background reclaim Ying Han
2011-04-19  3:57 ` [PATCH V6 09/10] Add API to export per-memcg kswapd pid Ying Han
2011-04-20  1:15   ` KAMEZAWA Hiroyuki
2011-04-20  3:39     ` Ying Han
2011-04-19  3:57 ` [PATCH V6 10/10] Add some per-memcg stats Ying Han
2011-04-21  2:51 ` [PATCH V6 00/10] memcg: per cgroup background reclaim Johannes Weiner
2011-04-21  3:05   ` Ying Han
2011-04-21  3:53     ` Johannes Weiner
2011-04-21  4:00   ` KAMEZAWA Hiroyuki
2011-04-21  4:24     ` Ying Han
2011-04-21  4:46       ` KAMEZAWA Hiroyuki
2011-04-21  5:08     ` Johannes Weiner
2011-04-21  5:28       ` Ying Han [this message]
2011-04-23  1:35         ` Johannes Weiner
2011-04-23  2:10           ` Ying Han
2011-04-23  2:34             ` Johannes Weiner
2011-04-23  3:33               ` Ying Han
2011-04-23  3:41                 ` Rik van Riel
2011-04-23  3:49                   ` Ying Han
2011-04-27  7:36                 ` Johannes Weiner
2011-04-27 17:41                   ` Ying Han
2011-04-27 21:37                     ` Johannes Weiner
2011-04-21  5:41       ` KAMEZAWA Hiroyuki
2011-04-21  6:23         ` Ying Han
2011-04-23  2:02         ` Johannes Weiner
2011-04-21  3:40 ` KAMEZAWA Hiroyuki
2011-04-21  3:48   ` [PATCH 2/3] weight for memcg background reclaim (Was " KAMEZAWA Hiroyuki
2011-04-21  6:11     ` Ying Han
2011-04-21  6:38       ` KAMEZAWA Hiroyuki
2011-04-21  6:59         ` Ying Han
2011-04-21  7:01           ` KAMEZAWA Hiroyuki
2011-04-21  7:12             ` Ying Han
2011-04-21  3:50   ` [PATCH 3/3/] fix mem_cgroup_watemark_ok " KAMEZAWA Hiroyuki
2011-04-21  5:29     ` Ying Han
2011-04-21  4:22   ` Ying Han
2011-04-21  4:27     ` KAMEZAWA Hiroyuki
2011-04-21  4:31     ` Ying Han
2011-04-21  3:43 ` [PATCH 1/3] memcg kswapd thread pool (Was " KAMEZAWA Hiroyuki
2011-04-21  7:09   ` Ying Han
2011-04-21  7:14     ` KAMEZAWA Hiroyuki
2011-04-21  8:10   ` Minchan Kim
2011-04-21  8:46     ` KAMEZAWA Hiroyuki
2011-04-21  9:05       ` Minchan Kim
2011-04-21 16:56         ` Ying Han
2011-04-22  1:02           ` Minchan Kim
