From: Ying Han <yinghan@google.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Minchan Kim <minchan.kim@gmail.com>,
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Tejun Heo <tj@kernel.org>, Pavel Emelyanov <xemul@openvz.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Andrew Morton <akpm@linux-foundation.org>,
Li Zefan <lizf@cn.fujitsu.com>, Mel Gorman <mel@csn.ul.ie>,
Christoph Lameter <cl@linux.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
Dave Hansen <dave@linux.vnet.ibm.com>,
Zhu Yanhai <zhu.yanhai@gmail.com>,
linux-mm@kvack.org
Subject: Re: [PATCH V4 00/10] memcg: per cgroup background reclaim
Date: Mon, 18 Apr 2011 10:01:20 -0700
Message-ID: <BANLkTimkPasX8AA=HCOgVeSyPBSivz8pMg@mail.gmail.com>
In-Reply-To: <20110418091351.GC8925@tiehlicka.suse.cz>
On Mon, Apr 18, 2011 at 2:13 AM, Michal Hocko <mhocko@suse.cz> wrote:
> On Fri 15-04-11 09:40:54, Ying Han wrote:
> > On Fri, Apr 15, 2011 at 2:40 AM, Michal Hocko <mhocko@suse.cz> wrote:
> >
> > > Hi Ying,
> > > sorry that I am jumping into game that late but I was quite busy after
> > > returning back from LSF and LFCS.
> > >
> >
> > Sure. Nice meeting you guys there and thank you for looking into this
> > patch :)
>
> Yes, nice meeting.
>
> >
> > >
> > > On Thu 14-04-11 15:54:19, Ying Han wrote:
> > > > The current implementation of memcg supports targeting reclaim when
> > > > the cgroup is reaching its hard_limit and we do direct reclaim per
> > > > cgroup. Per cgroup background reclaim is needed which helps to spread
> > > > out memory pressure over longer period of time and smoothes out the
> > > > cgroup performance.
> > > >
> > > > If the cgroup is configured to use per cgroup background reclaim, a
> > > > kswapd thread is created which only scans the per-memcg LRU list.
> > >
> > > Hmm, I am wondering if this fits into the get-rid-of-the-global-LRU
> > > strategy. If we make the background reclaim per-cgroup how do we
> > > balance from the global/zone POV? We can end up with all groups over
> > > the high
> > > limit while a memory zone is under this watermark. Or am I missing
> > > something?
> > > I thought that plans for the background reclaim were same as for direct
> > > reclaim so that kswapd would just evict pages from groups in the
> > > round-robin fashion (in first round just those that are under limit and
> > > proportionally when it cannot reach high watermark after it got through
> > > all groups).
> > >
> >
> > I think you are talking about the soft_limit reclaim which I am gonna
> > look at next.
>
> I see. I am just concerned whether 3rd level of reclaim is a good idea.
> We would need to do background reclaim anyway (and to preserve the
> original semantic it has to be somehow watermark controlled). I am just
> wondering why we have to implement it separately from kswapd. Cannot we
> just simply trigger global kswapd which would reclaim all cgroups that
> are under watermarks? [I am sorry for my ignorance if that is what is
> implemented in the series - I haven't got to the patches yet]
>
Per-zone reclaim and per-memcg reclaim are different. The first one is
triggered if the zone is under memory pressure and we need to free pages
to serve further page allocations. The second one is triggered if the
memcg is under memory pressure and we need to free pages to leave room
(limit - usage) for the memcg to grow.

Both of them are needed, and that is how it is implemented on the direct
reclaim path. kswapd only tries to smooth out the system and memcg
performance by reclaiming pages proactively. It doesn't affect the
functionality.
>
> > The soft_limit reclaim
> > is triggered under global memory pressure and doing round-robin across
> > memcgs. I will also cover the
> > zone-balancing by having a second list of memcgs under their soft_limit.
> >
> > Here is the summary of our LSF discussion :)
> > http://permalink.gmane.org/gmane.linux.kernel.mm/60966
>
> Yes, I have read it and thanks for putting it together.
>
sure.
>
> > > > Two watermarks ("high_wmark", "low_wmark") are added to trigger the
> > > > background reclaim and stop it. The watermarks are calculated based
> > > > on the cgroup's limit_in_bytes.
> > >
> > > I didn't have time to look at how the calculation works in the patch
> > > yet, but we should be careful to match the zone's watermark
> > > expectations.
> > >
> >
> > I have an API in the following patch which provides
> > high/low_wmark_distance to tune the wmarks individually. By default,
> > they are set to 0, which turns off the per-memcg kswapd. For now, we are
> > ok since the global kswapd is still doing per-zone scanning and
> > reclaiming :)
> >
> > >
> > > > By default, the per-memcg kswapd threads are running under root
> > > > cgroup. There is a per-memcg API which exports the pid of each kswapd
> > > > thread, and userspace can configure cpu cgroup separately.
> > > >
> > > > I ran a dd test on a large file and then cat the file. Then I
> > > > compared the reclaim related stats in memory.stat.
> > > >
> > > > Step1: Create a cgroup with 500M memory_limit.
> > > > $ mkdir /dev/cgroup/memory/A
> > > > $ echo 500m >/dev/cgroup/memory/A/memory.limit_in_bytes
> > > > $ echo $$ >/dev/cgroup/memory/A/tasks
> > > >
> > > > Step2: Test and set the wmarks.
> > > > $ cat /dev/cgroup/memory/A/memory.low_wmark_distance
> > > > 0
> > > > $ cat /dev/cgroup/memory/A/memory.high_wmark_distance
> > > > 0
> > >
> > >
> > They are used to tune the high/low_wmarks based on the hard_limit. We
> > might need to export that configuration to user admin especially on
> > machines where they over-commit by hard_limit.
>
> I remember there was some resistance against tuning watermarks
> separately.
>
This API is based on KAMEZAWA's request. :)
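As a rough illustration of how the two distances relate to the reported
wmarks (this is only a sketch, not the code from the series; the helper
name and the clamping are made up here), each watermark appears to be the
hard limit minus the configured distance:

```c
#include <stdint.h>

/*
 * Hypothetical helper for illustration only; the name and the clamping
 * are assumptions, not code from the series.
 * A watermark sits "distance" bytes below the hard limit. A distance of
 * 0 leaves the watermark equal to the limit.
 */
static uint64_t wmark_from_distance(uint64_t limit, uint64_t distance)
{
	/* Clamp so a distance larger than the limit cannot underflow. */
	return distance < limit ? limit - distance : 0;
}
```

With limit = 524288000 (500m), a distance of 50m gives 471859200 and a
distance of 40m gives 482344960, which are the values that
memory.reclaim_wmarks reports in the quoted test.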
>
> > > > $ cat /dev/cgroup/memory/A/memory.reclaim_wmarks
> > > > low_wmark 524288000
> > > > high_wmark 524288000
> > > >
> > > > $ echo 50m >/dev/cgroup/memory/A/memory.high_wmark_distance
> > > > $ echo 40m >/dev/cgroup/memory/A/memory.low_wmark_distance
> > > >
> > > > $ cat /dev/cgroup/memory/A/memory.reclaim_wmarks
> > > > low_wmark 482344960
> > > > high_wmark 471859200
> > >
> > > low_wmark is higher than high_wmark?
> > >
> >
> > hah, it is confusing. I have them documented. Basically, low_wmark
> > triggers reclaim and high_wmark stops the reclaim. And we have
> >
> > high_wmark < usage < low_wmark.
>
> OK, I will look at it.
>
> [...]
>
> > > I am not sure how much orthogonal per-cgroup-per-thread vs. zone
> > > approaches are, though. Maybe it makes some sense to do both
> > > per-cgroup and zone background reclaim. Anyway I think that we should
> > > start with
> > > the zone reclaim first.
> > >
> >
> > I missed the point here. Can you clarify the zone reclaim here?
>
> kswapd does the background zone reclaim and you are trying to do
> per-cgroup reclaim, right? I am concerned about those two fighting with
> slightly different goal.
>
> I am still thinking whether background reclaim would be sufficient,
> though. We would get rid of per-cgroup thread and wouldn't create a new
> reclaim interface.
>
The per-zone reclaim will look at memcgs and their soft_limits, and the
criteria are different from per-memcg background reclaim, where we look at
the hard_limit. This is how direct reclaim works on both sides, and kswapd
is just doing the work proactively.

Later, when we change the soft_limit reclaim for per-zone memory pressure,
the same logic will be changed in the per-zone try_to_free_pages().
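
To make the low_wmark/high_wmark semantics concrete, here is a minimal
sketch of the start/stop decision for one memcg. It is not taken from the
patches: the struct, the function name and the strict comparisons are
assumptions for illustration. It only encodes what is described above,
that low_wmark triggers background reclaim and high_wmark stops it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two per-memcg watermarks (in bytes). */
struct memcg_wmarks {
	uint64_t low_wmark;  /* usage above this starts background reclaim */
	uint64_t high_wmark; /* usage below this stops background reclaim */
};

/*
 * Decide whether background reclaim should be running for one memcg.
 * "running" is the current state, so the band between the two
 * watermarks keeps whatever state we are already in (hysteresis).
 * The strict comparisons are an assumption.
 */
static bool memcg_bg_reclaim_wanted(const struct memcg_wmarks *w,
				    uint64_t usage, bool running)
{
	if (usage > w->low_wmark)
		return true;	/* too close to the hard limit: reclaim */
	if (usage < w->high_wmark)
		return false;	/* enough room again: stop */
	return running;		/* in between: keep the current state */
}
```

Under these assumptions, setting both distances to 0 puts both wmarks at
the hard limit, so usage can never rise above low_wmark and the background
reclaim never starts, which matches the default being "off".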
Thanks
--Ying
> --
> Michal Hocko
> SUSE Labs
> SUSE LINUX s.r.o.
> Lihovarska 1060/12
> 190 00 Praha 9
> Czech Republic
>