What's the progress of per cgroup background reclaim?

* What's the progress of per cgroup background reclaim?
@ 2012-03-14  7:53 Zhu Yanhai
       [not found] ` <CAC8teKW9Yry114+B06XiFbOtamwS6fNLzrKYo8PZazjEzMCiNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Zhu Yanhai @ 2012-03-14  7:53 UTC (permalink / raw)
  To: cgroups-u79uwXL29TY76Z2rM5mHXA, Han Ying, KAMEZAWA Hiroyuki

Hi all,
Just a quick question, could you please tell me what's the current
status of the development of per cgroup background reclaim? This topic
seems to be silent after Han Ying's patchset V7
(http://lwn.net/Articles/440073/) and Kame's async reclaim patchset V3
(https://lkml.org/lkml/2011/5/26/20), and I can't find it in
memcg-devel tree either.
Is anyone still working on this?

Thanks,
Zhu Yanhai

* Re: What's the progress of per cgroup background reclaim?
       [not found] ` <CAC8teKW9Yry114+B06XiFbOtamwS6fNLzrKYo8PZazjEzMCiNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-03-14  8:32   ` KAMEZAWA Hiroyuki
  2012-03-14 16:17   ` Ying Han
  1 sibling, 0 replies; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-14  8:32 UTC (permalink / raw)
  To: Zhu Yanhai; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, Han Ying

(2012/03/14 16:53), Zhu Yanhai wrote:

> Hi all,
> Just a quick question, could you please tell me what's the current
> status of the development of per cgroup background reclaim? This topic
> seems to be silent after Han Ying's patchset V7
> (http://lwn.net/Articles/440073/) and Kame's async reclaim patchset V3
> (https://lkml.org/lkml/2011/5/26/20), and I can't find it in
> memcg-devel tree either.
> Is anyone still working on this?


I will work on it, and I have a plan. But there are several ongoing projects, such as

 a) per lruvec lock
 b) page cgroup diet
 c) kmem accounting
 d) hugetlb accounting
 e) some fixes in cgroup
 f) more tweaks in vmscan.c
 g) writeback and dirty ratio.

I think it will come after a) and b), at least. And it's better
to have g) before kswapd-for-memcg.


Thanks,
-Kame

* Re: What's the progress of per cgroup background reclaim?
       [not found] ` <CAC8teKW9Yry114+B06XiFbOtamwS6fNLzrKYo8PZazjEzMCiNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-03-14  8:32   ` KAMEZAWA Hiroyuki
@ 2012-03-14 16:17   ` Ying Han
       [not found]     ` <CALWz4iyqyQD3g27J=wLvQ=WN0enXLB3rmHDAb51UJaLPJDz5BA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 7+ messages in thread
From: Ying Han @ 2012-03-14 16:17 UTC (permalink / raw)
  To: Zhu Yanhai; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki

On Wed, Mar 14, 2012 at 12:53 AM, Zhu Yanhai <zhu.yanhai-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi all,
> Just a quick question, could you please tell me what's the current
> status of the development of per cgroup background reclaim? This topic
> seems to be silent after Han Ying's patchset V7
> (http://lwn.net/Articles/440073/) and Kame's async reclaim patchset V3
> (https://lkml.org/lkml/2011/5/26/20), and I can't find it in
> memcg-devel tree either.
> Is anyone still working on this?

There were some discussions at that time on whether to go w/ a per-memcg
kswapd thread or a workqueue. Now I think we agree to go w/ the
per-memcg thread model. I haven't done much work since then, and one
of the open questions is how to demonstrate the need for this feature.

I am glad you are asking. Do you have a workload showing problems w/o it?

--Ying

>
> Thanks,
> Zhu Yanhai

* Re: What's the progress of per cgroup background reclaim?
       [not found]     ` <CALWz4iyqyQD3g27J=wLvQ=WN0enXLB3rmHDAb51UJaLPJDz5BA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-03-15  1:09       ` KAMEZAWA Hiroyuki
  2012-03-15  3:12       ` Zhu Yanhai
  1 sibling, 0 replies; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-15  1:09 UTC (permalink / raw)
  To: Ying Han; +Cc: Zhu Yanhai, cgroups-u79uwXL29TY76Z2rM5mHXA

(2012/03/15 1:17), Ying Han wrote:

> On Wed, Mar 14, 2012 at 12:53 AM, Zhu Yanhai <zhu.yanhai-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hi all,
>> Just a quick question, could you please tell me what's the current
>> status of the development of per cgroup background reclaim? This topic
>> seems to be silent after Han Ying's patchset V7
>> (http://lwn.net/Articles/440073/) and Kame's async reclaim patchset V3
>> (https://lkml.org/lkml/2011/5/26/20), and I can't find it in
>> memcg-devel tree either.
>> Is anyone still working on this?
> 
> There were some discussions at that time on whether to go w/ a per-memcg
> kswapd thread or a workqueue. Now I think we agree to go w/ the
> per-memcg thread model.


I think we agreed, too.

Thanks,
-Kame

* Re: What's the progress of per cgroup background reclaim?
       [not found]     ` <CALWz4iyqyQD3g27J=wLvQ=WN0enXLB3rmHDAb51UJaLPJDz5BA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-03-15  1:09       ` KAMEZAWA Hiroyuki
@ 2012-03-15  3:12       ` Zhu Yanhai
       [not found]         ` <CAC8teKUrJ_ufkhtVAnsv-gzHfYmp8agY_bncOYS7mEdiqD+YZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 7+ messages in thread
From: Zhu Yanhai @ 2012-03-15  3:12 UTC (permalink / raw)
  To: Ying Han; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki

2012/3/15 Ying Han <yinghan-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>:
> On Wed, Mar 14, 2012 at 12:53 AM, Zhu Yanhai <zhu.yanhai-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hi all,
>> Just a quick question, could you please tell me what's the current
>> status of the development of per cgroup background reclaim? This topic
>> seems to be silent after Han Ying's patchset V7
>> (http://lwn.net/Articles/440073/) and Kame's async reclaim patchset V3
>> (https://lkml.org/lkml/2011/5/26/20), and I can't find it in
>> memcg-devel tree either.
>> Is anyone still working on this?
>
> There were some discussions at that time on whether to go w/ a per-memcg
> kswapd thread or a workqueue. Now I think we agree to go w/ the
> per-memcg thread model. I haven't done much work since then, and one
> of the open questions is how to demonstrate the need for this feature.
>
> I am glad you are asking. Do you have a workload showing problems w/o it?

Yes. The background is that we have a cluster of about 3k-4k servers, all
running JVMs. Because the load of each Java application is small, we
gave each of them a small GC heap, 1.5GB or so. Then, to make full use
of the large memory on the servers, we set up several Xen-based virtual
machines on each physical box, each Xen VM running one single JVM.

Now we are trying to switch to an LXC/cgroup-based solution. As a
first step we have built a small experimental cluster online; the
containers, equipped with memcg, are sized the same as the Xen VMs
(we haven't enabled other controllers, since the major need for
resources comes from memory, while the pressure on CPU and IO is
usually smaller). As soon as they went online, we noticed that the
latency recorded on the client side had periodic peaks. We also
noticed that the memory.failcnt counter increased periodically. By
enabling kmem:mm_directreclaim_reclaimall,
kmem:mm_vmscan_direct_reclaim_begin and kmem:mm_vmscan_direct_reclaim_end
in trace events, we can see that kmem:mm_directreclaim_reclaimall fires
regularly, while kmem:mm_vmscan_direct_reclaim_begin/end are not seen.
That means the caller of do_try_to_free_pages is
try_to_free_mem_cgroup_pages, not the global caller try_to_free_pages.

To balance load across the cluster, we tend to spread the Java apps
evenly over the containers on different physical boxes; that is to say,
unless the cluster is about to fill up, the number of active containers
on each box won't be large. I think in such a scenario the global
pressure can never be high, so kswapd keeps sleeping most of the time;
however, the local pressure within a single cgroup may be very high,
which results in frequent direct reclaim.

The kernel we are using is a custom RHEL6U2 kernel. We have a kernel
team here, so it's fine to backport the upstream solution if there is
any.
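
The periodic growth of memory.failcnt mentioned in this message could be
watched with a small poller. What follows is only a sketch under stated
assumptions, not something taken from the thread: the helper names are
invented for illustration, and the directory passed in is whatever path
the memory controller happens to be mounted at for the container in
question (that path is an assumption and differs between systems).

```python
import time
from pathlib import Path


def read_failcnt(cgroup_dir):
    """Return the current memory.failcnt value for one memcg directory."""
    return int(Path(cgroup_dir, "memory.failcnt").read_text().strip())


def failcnt_deltas(samples):
    """Turn cumulative counter samples into per-interval increases."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def sample_failcnt(cgroup_dir, count, interval_s=1.0):
    """Take `count` samples of memory.failcnt, `interval_s` seconds apart."""
    samples = []
    for i in range(count):
        samples.append(read_failcnt(cgroup_dir))
        # Sleep only between samples, not after the last one.
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

Calling failcnt_deltas(sample_failcnt(path, count=10, interval_s=5.0))
with a hypothetical container path gives one number per interval;
intervals with non-zero increases are the ones to line up against the
client-side latency peaks.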

--
Thanks,
Zhu Yanhai

> --Ying
>
>>
>> Thanks,
>> Zhu Yanhai

* Re: What's the progress of per cgroup background reclaim?
       [not found]         ` <CAC8teKUrJ_ufkhtVAnsv-gzHfYmp8agY_bncOYS7mEdiqD+YZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-03-15 16:31           ` Ying Han
       [not found]             ` <CALWz4iwO+eGFW23qVwjy=MKufRtiVck08pJphSij226pjTQiBw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Ying Han @ 2012-03-15 16:31 UTC (permalink / raw)
  To: Zhu Yanhai; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki

On Wed, Mar 14, 2012 at 8:12 PM, Zhu Yanhai <zhu.yanhai-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> 2012/3/15 Ying Han <yinghan-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>:
>> On Wed, Mar 14, 2012 at 12:53 AM, Zhu Yanhai <zhu.yanhai-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> Hi all,
>>> Just a quick question, could you please tell me what's the current
>>> status of the development of per cgroup background reclaim? This topic
>>> seems to be silent after Han Ying's patchset V7
>>> (http://lwn.net/Articles/440073/) and Kame's async reclaim patchset V3
>>> (https://lkml.org/lkml/2011/5/26/20), and I can't find it in
>>> memcg-devel tree either.
>>> Is anyone still working on this?
>>
>> There were some discussions at that time on whether to go w/ a per-memcg
>> kswapd thread or a workqueue. Now I think we agree to go w/ the
>> per-memcg thread model. I haven't done much work since then, and one
>> of the open questions is how to demonstrate the need for this feature.
>>
>> I am glad you are asking. Do you have a workload showing problems w/o it?
>
> Yes. The background is that we have a cluster of about 3k-4k servers, all
> running JVMs. Because the load of each Java application is small, we
> gave each of them a small GC heap, 1.5GB or so. Then, to make full use
> of the large memory on the servers, we set up several Xen-based virtual
> machines on each physical box, each Xen VM running one single JVM.
>
> Now we are trying to switch to an LXC/cgroup-based solution. As a
> first step we have built a small experimental cluster online; the
> containers, equipped with memcg, are sized the same as the Xen VMs
> (we haven't enabled other controllers, since the major need for
> resources comes from memory, while the pressure on CPU and IO is
> usually smaller). As soon as they went online, we noticed that the
> latency recorded on the client side had periodic peaks. We also
> noticed that the memory.failcnt counter increased periodically. By
> enabling kmem:mm_directreclaim_reclaimall,
> kmem:mm_vmscan_direct_reclaim_begin and kmem:mm_vmscan_direct_reclaim_end
> in trace events, we can see that kmem:mm_directreclaim_reclaimall fires
> regularly, while kmem:mm_vmscan_direct_reclaim_begin/end are not seen.
> That means the caller of do_try_to_free_pages is
> try_to_free_mem_cgroup_pages, not the global caller try_to_free_pages.
>
> To balance load across the cluster, we tend to spread the Java apps
> evenly over the containers on different physical boxes; that is to say,
> unless the cluster is about to fill up, the number of active containers
> on each box won't be large. I think in such a scenario the global
> pressure can never be high, so kswapd keeps sleeping most of the time;
> however, the local pressure within a single cgroup may be very high,
> which results in frequent direct reclaim.
>
> The kernel we are using is a custom RHEL6U2 kernel. We have a kernel
> team here, so it's fine to backport the upstream solution if there is
> any.


Thank you for the information.

Sounds like that is exactly what the per-memcg kswapd was designed
for. Two things I am interested in looking into now:

1. Why does per-memcg direct reclaim introduce such a noticeable latency
spike, and what workload can reproduce it?

2. We can quickly apply the last version of per-memcg kswapd (V6) in
your environment and see what difference it makes. Maybe I can help
with that.

Thanks

--Ying

> --
> Thanks,
> Zhu Yanhai
>
>> --Ying
>>
>>>
>>> Thanks,
>>> Zhu Yanhai

* Re: What's the progress of per cgroup background reclaim?
       [not found]             ` <CALWz4iwO+eGFW23qVwjy=MKufRtiVck08pJphSij226pjTQiBw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-03-19  8:09               ` Zhu Yanhai
  0 siblings, 0 replies; 7+ messages in thread
From: Zhu Yanhai @ 2012-03-19  8:09 UTC (permalink / raw)
  To: Ying Han; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki

Hi,

2012/3/16 Ying Han <yinghan-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>:
>
> Thank you for the information.
>
> Sounds like that is exactly what the per-memcg kswapd was designed
> for. Two things I am interested in looking into now:
>
> 1. Why does per-memcg direct reclaim introduce such a noticeable latency
> spike, and what workload can reproduce it?
Sure, I will look into this this week.
>
> 2. We can quickly apply the last version of per-memcg kswapd (V6) in
> your environment and see what difference it makes. Maybe I can help
> with that.
Thanks a lot.
>
> Thanks
>
> --Ying
>
>> --
>> Thanks,
>> Zhu Yanhai
>>
>>> --Ying
>>>
>>>>
>>>> Thanks,
>>>> Zhu Yanhai
