* What's the progress of per cgroup background reclaim?
From: Zhu Yanhai @ 2012-03-14 7:53 UTC
To: cgroups-u79uwXL29TY76Z2rM5mHXA, Han Ying, KAMEZAWA Hiroyuki
Hi all,
Just a quick question: could you please tell me the current
status of the development of per-cgroup background reclaim? This topic
seems to have gone quiet since Han Ying's patchset V7
(http://lwn.net/Articles/440073/) and Kame's async reclaim patchset V3
(https://lkml.org/lkml/2011/5/26/20), and I can't find it in the
memcg-devel tree either.
Is anyone still working on this?
Thanks,
Zhu Yanhai
* Re: What's the progress of per cgroup background reclaim?
From: KAMEZAWA Hiroyuki @ 2012-03-14 8:32 UTC
To: Zhu Yanhai; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, Han Ying
(2012/03/14 16:53), Zhu Yanhai wrote:
> Is anyone still working on this?
I will, and I have a plan. But there are several ongoing projects:
a) per-lruvec lock
b) page cgroup diet
c) kmem accounting
d) hugetlb accounting
e) some fixes in cgroup
f) more tweaks in vmscan.c
g) writeback and dirty ratio
I think it will come after a) and b), at least. And it would be better
to have g) before kswapd-for-memcg.
Thanks,
-Kame
* Re: What's the progress of per cgroup background reclaim?
From: Ying Han @ 2012-03-14 16:17 UTC
To: Zhu Yanhai; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki
On Wed, Mar 14, 2012 at 12:53 AM, Zhu Yanhai <zhu.yanhai-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Is anyone still working on this?
There was some discussion at the time about whether to go with a
per-memcg kswapd thread or a workqueue. I think we have now agreed to
go with the per-memcg thread model. I haven't done much work since
then, and one of the open questions is how to demonstrate the need for
this feature.
I am glad you are asking. Do you have a workload showing problems
without it?
--Ying
* Re: What's the progress of per cgroup background reclaim?
From: KAMEZAWA Hiroyuki @ 2012-03-15 1:09 UTC
To: Ying Han; +Cc: Zhu Yanhai, cgroups-u79uwXL29TY76Z2rM5mHXA
(2012/03/15 1:17), Ying Han wrote:
> There were some discussions on going w/ per-memcg kswapd thread or
> workqueue by that time. And now I think we agree to go w/ the
> per-memcg thread model.
I think we are agreed, too.
Thanks,
-Kame
* Re: What's the progress of per cgroup background reclaim?
From: Zhu Yanhai @ 2012-03-15 3:12 UTC
To: Ying Han; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki
2012/3/15 Ying Han <yinghan-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>:
> I am glad you are asking, do you have workload showing problems w/o it?

Yes. The background is that we have a cluster of about 3k-4k servers,
all running JVMs. Because the load of each Java application is small,
we gave each of them a small GC heap, about 1.5GB. Then, to make full
use of the large memory of the servers, we set up several Xen-based
virtual machines on each physical box, with a single JVM running in
each Xen VM.

Now we are trying to switch to an LXC/cgroup-based solution. As a first
step we have built a small experimental cluster online, in which the
memcg-limited containers are sized the same as the Xen VMs (we haven't
enabled other controllers, since the main resource demand is for
memory, while the pressure on CPU and IO is usually smaller). As soon
as they went online, we noticed that the latency recorded on the client
side had periodic peaks. We also noticed that the memory.failcnt
counter increased periodically. By enabling
kmem:mm_directreclaim_reclaimall, kmem:mm_vmscan_direct_reclaim_begin
and kmem:mm_vmscan_direct_reclaim_end in trace events, we can see that
kmem:mm_directreclaim_reclaimall fires regularly, while
kmem:mm_vmscan_direct_reclaim_begin/end are never seen. That means the
caller of do_try_to_free_pages is try_to_free_mem_cgroup_pages, not the
global caller try_to_free_pages.

To balance load across the cluster, we tend to spread the Java apps
evenly over containers on different physical boxes; that is, unless the
cluster is nearly full, the number of active containers on each box
won't be large. I think in such a scenario the global pressure can
never be high, so kswapd sleeps most of the time, while the local
pressure within a single cgroup may be very high, which results in
frequent direct reclaim.

The kernel we are using is a custom RHEL6U2 kernel. We have a kernel
team here, so it's fine to backport the upstream solution if there is
any.
--
Thanks,
Zhu Yanhai
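
The periodic growth of memory.failcnt described in the message above can be observed with a small poller. The following is only a minimal sketch in Python under stated assumptions, not the tooling used by anyone in this thread: it assumes a cgroup v1 memory controller where each group directory exposes a memory.failcnt file, and the function names (read_failcnt, failcnt_deltas, watch) and the directory passed in are hypothetical. The mount point differs between systems, so the caller has to supply the group directory.

```python
import time
from pathlib import Path
from typing import List


def read_failcnt(cgroup_dir: Path) -> int:
    """Read memory.failcnt for one memcg directory (cgroup v1 layout assumed)."""
    return int((cgroup_dir / "memory.failcnt").read_text().strip())


def failcnt_deltas(samples: List[int]) -> List[int]:
    """Convert successive counter readings into per-interval increments."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def watch(cgroup_dir: Path, interval_s: float = 1.0, rounds: int = 10) -> List[int]:
    """Sample the counter `rounds` times and return the increments between samples.

    A non-zero increment means the group hit its memory limit during that
    interval, which is when memcg direct reclaim would be attempted.
    """
    samples = []
    for _ in range(rounds):
        samples.append(read_failcnt(cgroup_dir))
        time.sleep(interval_s)
    return failcnt_deltas(samples)
```

Calling watch() with the directory of one container's memory cgroup and comparing the timestamps of non-zero increments against the client-side latency peaks would show whether the two line up; the sketch itself makes no claim that they do.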
* Re: What's the progress of per cgroup background reclaim?
From: Ying Han @ 2012-03-15 16:31 UTC
To: Zhu Yanhai; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki
On Wed, Mar 14, 2012 at 8:12 PM, Zhu Yanhai <zhu.yanhai-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> The kernel we are using is a custom RHEL6U2 kernel. We have a kernel
> team here so it's fine to backport the upstream solution if there is
> any.
Thank you for the information.
That sounds like exactly what the per-memcg kswapd was designed for.
Two things I am interested in looking into now:
1. Why does per-memcg direct reclaim introduce such a noticeable
latency spike, and what workload can reproduce it?
2. We can quickly apply the last version of the per-memcg kswapd
patchset (V6) in your environment and see what difference it makes.
Maybe I can help with that.
Thanks
--Ying
* Re: What's the progress of per cgroup background reclaim?
From: Zhu Yanhai @ 2012-03-19 8:09 UTC
To: Ying Han; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki
Hi,
2012/3/16 Ying Han <yinghan-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>:
> 1. why the per-memcg direct reclaim introduce such noticeable latency
> spike, what workload that can reproduce that?
Sure, I will look into this this week.
> 2. we can quickly patch the last version of per-memcg kswapd (V6) on
> your environment and see what difference it makes. Maybe I can help on
> that.
Thanks a lot.