From: Zhu Yanhai <zhu.yanhai@gmail.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ying Han <yinghan@google.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Minchan Kim <minchan.kim@gmail.com>,
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Tejun Heo <tj@kernel.org>, Pavel Emelyanov <xemul@openvz.org>,
Andrew Morton <akpm@linux-foundation.org>,
Li Zefan <lizf@cn.fujitsu.com>, Mel Gorman <mel@csn.ul.ie>,
Christoph Lameter <cl@linux.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
Michal Hocko <mhocko@suse.cz>,
Dave Hansen <dave@linux.vnet.ibm.com>,
linux-mm@kvack.org
Subject: Re: [PATCH V7 4/9] Add memcg kswapd thread pool
Date: Fri, 22 Apr 2011 14:02:51 +0800
Message-ID: <BANLkTinRyZyeJh-v2XeFRPCCd=x5OpWr+g@mail.gmail.com>
In-Reply-To: <20110422140023.949e5737.kamezawa.hiroyu@jp.fujitsu.com>
Hi Kame,
2011/4/22 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> On Thu, 21 Apr 2011 21:49:04 -0700
> Ying Han <yinghan@google.com> wrote:
>
> > On Thu, Apr 21, 2011 at 9:36 PM, KAMEZAWA Hiroyuki
> > <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > > On Thu, 21 Apr 2011 21:24:15 -0700
> > > Ying Han <yinghan@google.com> wrote:
> > >
> > > > This patch creates a thread pool for memcg-kswapd. All memcgs which
> > > > need background reclaim are linked to a list, and memcg-kswapd picks
> > > > up a memcg from the list and runs reclaim.
> > > >
> > > > The concern with using a per-memcg kswapd thread is the system
> > > > overhead, including memory and CPU time.
> > > >
> > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > > Signed-off-by: Ying Han <yinghan@google.com>
> > >
> > > Thank you for merging. This seems ok to me.
> > >
> > > Further development may make this better or change thread pools (to
> > > some other), but I think this is good enough.
> > >
> >
> > Thank you for reviewing and Acking. At the same time, I do have some
> > doubts about the thread-pool model, which I described in the cover
> > letter :)
> >
> > The per-memcg-per-kswapd model
> > Pros:
> > 1. memory overhead per thread: the memory consumption would be
> > 8k*1000 = 8M with 1k cgroups.
> > 2. we see lots of threads in 'ps -elf'
> >
> > Cons:
> > 1. the implementation is simple and straightforward.
> > 2. we can easily isolate the background reclaim overhead between cgroups.
> > 3. better latency from memory pressure to actual start reclaiming
> >
> > The thread-pool model
> > Pros:
> > 1. there is no isolation between memcg background reclaim, since the
> > memcg threads are shared.
> > 2. it is hard for visibility and debuggability. I have experienced a
> > lot of cases where some kswapds were running crazy and we needed a
> > straightforward way to identify which cgroup was causing the reclaim.
> > 3. potential starvation for some memcgs: if one work item gets stuck,
> > the rest of the work won't proceed.
> >
> > Cons:
> > 1. save some memory resource.
> >
> > In general, the per-memcg-per-kswapd implementation looks sane to me
> > at this point, especially since the shared memcg thread model will
> > make debugging issues very hard later.
> >
> > Comments?
> >
> Pros <-> Cons ?
>
> My idea is to add a trace point for memcg-kswapd and see what it's now
> doing. (We have too few trace points in memcg...)
>
> I don't think it's sane to create a kthread per memcg because we know
> there is a user who makes hundreds/thousands of memcgs.
>
I think we need to look at how 'thousands of cgroups' are actually used in
this case. Although not in much detail, in a previous email Ying did say that
they create thousands of cgroups on each box in Google's cluster and most of
them _sleep_ most of the time. So I guess what they actually do is create a
large number of cgroups, each with different limits on various resources.
Then, at job dispatch time, they can choose a suitable group on each box and
submit the job into it, without touching the other thousands of sleeping
groups. That is to say, though Google has a huge number of groups on each
box, only a few jobs run on it, so we should not see many busy groups at the
same time.

If the above is correct, then I think Ying can call kthread_stop() at the
moment we find there are no tasks left in the group, to kill the memcg thread
(as this group is expected to sleep for a long time after all the jobs
leave). This way we can keep the number of memcg threads small without
losing debuggability.
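A rough userspace sketch of that lifecycle, assuming pthreads as a stand-in for kthread_run()/kthread_stop() and a per-group attach_lock that serializes attach and detach the way cgroup_mutex would. struct group, group_attach_task(), group_detach_task() and group_has_worker() are hypothetical names for illustration, not memcg code:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of one cgroup and its optional reclaim worker. */
struct group {
    pthread_mutex_t attach_lock; /* serializes attach/detach (cf. cgroup_mutex) */
    pthread_mutex_t lock;        /* protects should_stop for the worker */
    pthread_cond_t wake;
    int nr_tasks;                /* tasks attached to the group */
    bool worker_running;
    bool should_stop;            /* analogue of kthread_should_stop() */
    pthread_t worker;
};

static void *reclaim_worker(void *arg)
{
    struct group *g = arg;

    pthread_mutex_lock(&g->lock);
    while (!g->should_stop) {
        /* A real worker would reclaim when the group exceeds its high
         * watermark; this one just sleeps until it is told to stop. */
        pthread_cond_wait(&g->wake, &g->lock);
    }
    pthread_mutex_unlock(&g->lock);
    return NULL;
}

void group_init(struct group *g)
{
    pthread_mutex_init(&g->attach_lock, NULL);
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->wake, NULL);
    g->nr_tasks = 0;
    g->worker_running = false;
    g->should_stop = false;
}

/* First task in: start the worker (cf. kthread_run()). */
int group_attach_task(struct group *g)
{
    int err = 0;

    pthread_mutex_lock(&g->attach_lock);
    if (g->nr_tasks == 0 && !g->worker_running) {
        g->should_stop = false;  /* no worker exists yet, so no race */
        err = pthread_create(&g->worker, NULL, reclaim_worker, g);
        g->worker_running = (err == 0);
    }
    g->nr_tasks++;  /* memcg works without the worker, so attach anyway */
    pthread_mutex_unlock(&g->attach_lock);
    return err;
}

/* Last task out: stop the worker and wait for it (cf. kthread_stop()). */
void group_detach_task(struct group *g)
{
    pthread_mutex_lock(&g->attach_lock);
    if (--g->nr_tasks == 0 && g->worker_running) {
        pthread_mutex_lock(&g->lock);
        g->should_stop = true;
        pthread_cond_signal(&g->wake);
        pthread_mutex_unlock(&g->lock);
        pthread_join(g->worker, NULL);
        g->worker_running = false;
    }
    pthread_mutex_unlock(&g->attach_lock);
}

bool group_has_worker(struct group *g)
{
    pthread_mutex_lock(&g->attach_lock);
    bool running = g->worker_running;
    pthread_mutex_unlock(&g->attach_lock);
    return running;
}
```

Starting the worker on the first attach and joining it on the last detach means a group that sits empty holds no thread, which matches the mostly-sleeping groups described above.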
What do you think?
Regards,
Zhu Yanhai
>
> And I think that creating more threads doing the same job than the
> number of CPUs will cause much more difficult starvation and priority
> inversion issues.
> Keeping the scheduling knobs/chances of jobs in memcg is important. I
> don't want to give a hint to the scheduler because of a memcg-internal
> issue.
>
> And even if memcg-kswapd doesn't exist, memcg works (well?).
> memcg-kswapd just helps make things better but does not do any critical
> jobs. So it's okay to have this as a best-effort service.
> Of course, a better scheduling idea for picking up memcgs is welcome.
> It's now round-robin.
>
> Thanks,
> -Kame
>
>
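For comparison, a minimal single-threaded sketch of the round-robin picking Kame describes for the pool, assuming a fixed-size ring buffer in place of the kernel's linked list and a per-turn batch limit. struct memcg, struct reclaim_queue, queue_memcg() and worker_step() are hypothetical names, not the patch's API:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_GROUPS 16

/* Hypothetical per-memcg state for the sketch. */
struct memcg {
    int id;
    long excess_pages; /* usage above the high watermark */
    bool queued;       /* already on the reclaim list? */
};

/* Fixed-size ring buffer standing in for the kernel's linked list. */
struct reclaim_queue {
    struct memcg *slot[MAX_GROUPS];
    size_t head;
    size_t count;
};

/* Queue a memcg for background reclaim unless it is already queued. */
bool queue_memcg(struct reclaim_queue *q, struct memcg *m)
{
    if (m->queued || q->count == MAX_GROUPS)
        return false;
    q->slot[(q->head + q->count) % MAX_GROUPS] = m;
    q->count++;
    m->queued = true;
    return true;
}

/*
 * One step of a pool worker: take the memcg at the head, reclaim at most
 * `batch` pages from it, and requeue it at the tail if it is still above
 * its watermark. Returns the id of the memcg served, or -1 if idle.
 */
int worker_step(struct reclaim_queue *q, long batch)
{
    if (q->count == 0)
        return -1;

    struct memcg *m = q->slot[q->head];
    q->head = (q->head + 1) % MAX_GROUPS;
    q->count--;
    m->queued = false;

    long n = m->excess_pages < batch ? m->excess_pages : batch;
    m->excess_pages -= n;   /* stand-in for real page reclaim */
    if (m->excess_pages > 0)
        queue_memcg(q, m);  /* round-robin: back of the line */
    return m->id;
}
```

Bounding each turn to a batch keeps one large memcg from monopolizing a worker, though it does not help if reclaim itself blocks, which is the starvation case Ying raises.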