From: Ying Han <yinghan@google.com>
Date: Thu, 21 Apr 2011 23:10:58 -0700
Subject: Re: [PATCH V7 4/9] Add memcg kswapd thread pool
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro, Minchan Kim, Daisuke Nishimura, Balbir Singh, Tejun Heo, Pavel Emelyanov, Andrew Morton, Li Zefan, Mel Gorman, Christoph Lameter, Johannes Weiner, Rik van Riel, Hugh Dickins, Michal Hocko, Dave Hansen, Zhu Yanhai, linux-mm@kvack.org

On Thu, Apr 21, 2011 at 10:59 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 21 Apr 2011 22:53:19 -0700
> Ying Han <yinghan@google.com> wrote:
>
> > On Thu, Apr 21, 2011 at 10:00 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > >
> > > On Thu, 21 Apr 2011 21:49:04 -0700
> > > Ying Han <yinghan@google.com> wrote:
> > >
> > > > On Thu, Apr 21, 2011 at 9:36 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > > >
> > > > > On Thu, 21 Apr 2011 21:24:15 -0700
> > > > > Ying Han <yinghan@google.com> wrote:
> > > > >
> > > > > > This patch creates a thread pool for memcg-kswapd. All memcgs which
> > > > > > need background reclaim are linked onto a list, and a memcg-kswapd
> > > > > > thread picks up a memcg from the list and runs reclaim on it.
> > > > > >
> > > > > > The concern with using a per-memcg kswapd thread is the system
> > > > > > overhead, including memory and cputime.
> > > > > >
> > > > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > > > > Signed-off-by: Ying Han <yinghan@google.com>
> > > > >
> > > > > Thank you for merging. This seems ok to me.
> > > > >
> > > > > Further development may make this better or change the thread pool
> > > > > (to something else), but I think this is good enough.
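[For reference, a minimal userspace sketch of the model described above: a
fixed pool of workers draining a shared list of memcgs. The names here
(struct memcg, memcg_kswapd, queue_memcg) are hypothetical and pthreads
stand in for kernel kthreads; this is an illustration, not the patch code.]

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct memcg {
	int id;
	struct memcg *next;		/* link in the global reclaim list */
};

static struct memcg *reclaim_list;	/* memcgs needing background reclaim */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t list_wait = PTHREAD_COND_INITIALIZER;
static int stopping;

/* Link a memcg onto the list and kick the pool (the "wakeup" side). */
static void queue_memcg(struct memcg *mem)
{
	pthread_mutex_lock(&list_lock);
	mem->next = reclaim_list;
	reclaim_list = mem;
	pthread_cond_signal(&list_wait);
	pthread_mutex_unlock(&list_lock);
}

/* Pool worker: take one memcg at a time off the shared list and reclaim. */
static void *memcg_kswapd(void *arg)
{
	long worker = (long)arg;

	for (;;) {
		struct memcg *mem;

		pthread_mutex_lock(&list_lock);
		while (!reclaim_list && !stopping)
			pthread_cond_wait(&list_wait, &list_lock);
		if (!reclaim_list) {	/* stopping and list fully drained */
			pthread_mutex_unlock(&list_lock);
			return NULL;
		}
		mem = reclaim_list;
		reclaim_list = mem->next;
		pthread_mutex_unlock(&list_lock);

		/* Stand-in for shrinking the memcg back below its watermark. */
		printf("kswapd/%ld: reclaiming from memcg-%d\n", worker, mem->id);
		free(mem);
	}
}

int main(void)
{
	pthread_t pool[2];		/* pool size is independent of memcg count */
	long i;

	for (i = 0; i < 2; i++)
		pthread_create(&pool[i], NULL, memcg_kswapd, (void *)i);

	for (i = 0; i < 8; i++) {	/* eight memcgs hit their watermarks */
		struct memcg *mem = malloc(sizeof(*mem));

		mem->id = i;
		queue_memcg(mem);
	}

	sleep(1);			/* let the pool drain the list */
	pthread_mutex_lock(&list_lock);
	stopping = 1;
	pthread_cond_broadcast(&list_wait);
	pthread_mutex_unlock(&list_lock);
	for (i = 0; i < 2; i++)
		pthread_join(pool[i], NULL);
	return 0;
}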
> > > > Thank you for reviewing and acking. At the same time, I still have
> > > > doubts about the thread-pool model, which I posted in the cover
> > > > letter :)
> > > >
> > > > The per-memcg-per-kswapd model
> > > > Pros:
> > > > 1. memory overhead per thread; the memory consumption would be
> > > > 8k * 1000 = 8M with 1k cgroups.
> > > > 2. we see lots of threads in 'ps -elf'.
> > > >
> > > > Cons:
> > > > 1. the implementation is simple and straightforward.
> > > > 2. we can easily isolate the background reclaim overhead between
> > > > cgroups.
> > > > 3. better latency from memory pressure to actually starting reclaim.
> > > >
> > > > The thread-pool model
> > > > Pros:
> > > > 1. there is no isolation between memcg background reclaims, since the
> > > > memcg threads are shared.
> > > > 2. it is hard for visibility and debuggability. I have experienced
> > > > this a lot when some kswapds were running crazy and we needed a
> > > > straightforward way to identify which cgroup was causing the reclaim.
> > > > 3. potential starvation of some memcgs: if one work item gets stuck,
> > > > the rest of the work won't proceed.
> > > >
> > > > Cons:
> > > > 1. saves some memory resources.
> > > >
> > > > In general, the per-memcg-per-kswapd implementation looks sane to me
> > > > at this point, especially since the shared memcg thread model will
> > > > make debugging issues very hard later.
> > > >
> > > > Comments?
> > > >
> > > Pros <-> Cons ?
> > >
> > > My idea is adding trace points for memcg-kswapd and seeing what it's
> > > doing. (We have too few trace points in memcg...)
> > >
> > > I don't think it's sane to create a kthread per memcg, because we know
> > > there are users who create hundreds or thousands of memcgs.
> > >
> > > And I think that creating more threads doing the same job than the
> > > number of cpus will cause much more difficult starvation and
> > > priority-inversion issues. Keeping the scheduling knobs/chances of jobs
> > > in memcg is important; I don't want to give hints to the scheduler
> > > because of a memcg-internal issue.
> > >
> > > And even if memcg-kswapd doesn't exist, memcg works (well?).
> > > memcg-kswapd just helps make things better; it doesn't do any critical
> > > jobs. So it's okay to have this as a best-effort service.
> > > Of course, a better scheduling idea for picking up memcgs is welcome.
> > > It's round-robin now.
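[To make the round-robin pickup concrete, a self-contained sketch with
hypothetical names, not the patch code: the pick function just advances a
cursor around a circular list, so every queued memcg gets a turn.]

#include <stdio.h>

struct memcg {
	int id;
	struct memcg *next;	/* circular list of memcgs needing reclaim */
};

static struct memcg *cursor;	/* advances each pick so no memcg is starved */

/* Round-robin pick: return the current memcg and rotate the cursor. */
static struct memcg *pick_next_memcg(void)
{
	struct memcg *mem = cursor;

	if (mem)
		cursor = mem->next;
	return mem;
}

int main(void)
{
	struct memcg a = { 1 }, b = { 2 }, c = { 3 };
	int i;

	a.next = &b;		/* three memcgs under pressure, linked in a ring */
	b.next = &c;
	c.next = &a;
	cursor = &a;

	for (i = 0; i < 6; i++)	/* two full rounds: 1 2 3 1 2 3 */
		printf("picked memcg-%d\n", pick_next_memcg()->id);
	return 0;
}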
> > Hmm. The concern I have is the debuggability. Let's say I am running a
> > system and find memcg-3 running crazy. Is there a way to find out which
> > memcg it is trying to reclaim pages from? Also, how do we account the
> > cputime of the shared threads to the memcgs, if we wanted to?
> >
> Adding counters for kswapd-scan, kswapd-reclaim, and kswapd-pickup will
> show you that information; if necessary, it would be good to show some
> latency stats too. I think we can add enough information by adding stats
> (or debug with perf tools). I'll consider this a bit more.

Something like "kswapd_pgscan" and "kswapd_steal" per memcg? If we are
going with the thread pool, we definitely need to add more stats to give us
enough visibility into per-memcg background reclaim activity. Still, not
sure about the cpu cycles.
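(To make that concrete: a sketch of hypothetical per-memcg counters, not an
existing interface. Each shared pool worker would bump the counters of
whichever memcg it is servicing, so reclaim activity stays attributable per
cgroup even though the threads are shared.)

enum memcg_kswapd_stat {
	MEMCG_KSWAPD_PGSCAN,	/* pages scanned on behalf of this memcg */
	MEMCG_KSWAPD_STEAL,	/* pages actually reclaimed */
	MEMCG_KSWAPD_PICKUP,	/* times this memcg was picked off the list */
	NR_MEMCG_KSWAPD_STAT
};

struct memcg_kswapd_stats {
	unsigned long count[NR_MEMCG_KSWAPD_STAT];
};

/* Called from the worker's reclaim loop; a real kernel version would use
 * per-cpu counters folded into memory.stat rather than a plain array. */
static inline void memcg_kswapd_stat_add(struct memcg_kswapd_stats *stats,
					 enum memcg_kswapd_stat item,
					 unsigned long nr)
{
	stats->count[item] += nr;
}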
--Ying

> Thanks,
> -Kame