From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Glauber Costa <glommer@parallels.com>
Cc: linux-mm@kvack.org, Pekka Enberg <penberg@kernel.org>,
Christoph Lameter <cl@linux.com>,
David Rientjes <rientjes@google.com>,
cgroups@vger.kernel.org, devel@openvz.org,
linux-kernel@vger.kernel.org,
Frederic Weisbecker <fweisbec@gmail.com>,
Suleiman Souhlal <suleiman@google.com>,
Pekka Enberg <penberg@cs.helsinki.fi>,
Michal Hocko <mhocko@suse.cz>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v4 23/25] memcg: propagate kmem limiting information to children
Date: Sat, 23 Jun 2012 13:19:37 +0900
Message-ID: <4FE543D9.2010802@jp.fujitsu.com>
In-Reply-To: <4FE19102.6030704@parallels.com>
(2012/06/20 17:59), Glauber Costa wrote:
> On 06/19/2012 12:54 PM, Glauber Costa wrote:
>> On 06/19/2012 12:35 PM, Glauber Costa wrote:
>>> On 06/19/2012 04:16 AM, Kamezawa Hiroyuki wrote:
>>>> (2012/06/18 21:43), Glauber Costa wrote:
>>>>> On 06/18/2012 04:37 PM, Kamezawa Hiroyuki wrote:
>>>>>> (2012/06/18 19:28), Glauber Costa wrote:
>>>>>>> The current memcg slab cache management fails to present satisfactory hierarchical
>>>>>>> behavior in the following scenario:
>>>>>>>
>>>>>>> -> /cgroups/memory/A/B/C
>>>>>>>
>>>>>>> * kmem limit set at A
>>>>>>> * A and B empty taskwise
>>>>>>> * bash in C does find /
>>>>>>>
>>>>>>> Because kmem_accounted is a boolean that was not set for C, no accounting
>>>>>>> would be done. This is, however, not what we expect.
>>>>>>>
>>>>>>
>>>>>> Hmm... do we need these new routines when we already have mem_cgroup_iter()?
>>>>>>
>>>>>> Doesn't this work ?
>>>>>>
>>>>>> struct mem_cgroup {
>>>>>>         .....
>>>>>>         bool kmem_accounted_this;
>>>>>>         atomic_t kmem_accounted;
>>>>>>         ....
>>>>>> };
>>>>>>
>>>>>> at set limit
>>>>>>
>>>>>> ....set_limit(memcg) {
>>>>>>
>>>>>>         if (newly accounted) {
>>>>>>                 mem_cgroup_iter() {
>>>>>>                         atomic_inc(&iter->kmem_accounted);
>>>>>>                 }
>>>>>>         } else {
>>>>>>                 mem_cgroup_iter() {
>>>>>>                         atomic_dec(&iter->kmem_accounted);
>>>>>>                 }
>>>>>>         }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Hm? Then you can tell whether kmem is accounted with atomic_read(&memcg->kmem_accounted).
>>>>>>
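(Illustrative sketch, not part of the original messages.) The counter scheme
suggested above can be tried out in user space roughly as follows. This is a
minimal sketch under stated assumptions: `struct group`, `subtree_add()`,
`set_kmem_limited()` and `kmem_is_accounted()` are hypothetical names standing
in for `struct mem_cgroup` and the set_limit path, and the recursive walk
stands in for a mem_cgroup_iter() loop over the subtree.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup; only the fields discussed. */
struct group {
    struct group *first_child;
    struct group *next_sibling;
    bool kmem_accounted_this;   /* a limit was set on this group itself */
    atomic_int kmem_accounted;  /* number of limited groups at or above */
};

/* Visit g and everything below it, adding delta to each counter. */
static void subtree_add(struct group *g, int delta)
{
    atomic_fetch_add(&g->kmem_accounted, delta);
    for (struct group *c = g->first_child; c; c = c->next_sibling)
        subtree_add(c, delta);
}

/* Called when a limit is set (limited = true) or removed (limited = false). */
static void set_kmem_limited(struct group *g, bool limited)
{
    assert(g != NULL);
    if (g->kmem_accounted_this == limited)
        return;                 /* no state change, nothing to propagate */
    g->kmem_accounted_this = limited;
    subtree_add(g, limited ? 1 : -1);
}

/* A group is accounted if it or any ancestor currently has a limit. */
static bool kmem_is_accounted(struct group *g)
{
    return atomic_load(&g->kmem_accounted) > 0;
}
```

With an A/B/C chain, limiting A makes C accounted even though no limit was
ever set on C, which is the scenario from the patch description. Because each
group counts the limited groups at or above it, removing one limit does not
disturb accounting that another ancestor still requires.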
>>>>>
>>>>> Accounted by itself / parent is still useful, and I see no reason to use
>>>>> an atomic + bool if we can use a pair of bits.
>>>>>
>>>>> As for the routine, I guess mem_cgroup_iter will work... It does a lot
>>>>> more than I need, but for the sake of using what's already in there, I
>>>>> can switch to it with no problems.
>>>>>
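(Illustrative sketch, not part of the original messages.) The "pair of bits"
alternative could look something like this. The flag names and both helpers
are assumptions made for the example, not identifiers confirmed by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag names: one bit per reason a group is accounted. */
enum {
    KMEM_ACCOUNTED_THIS   = 1u << 0,  /* limit set on this group itself */
    KMEM_ACCOUNTED_PARENT = 1u << 1,  /* inherited from a limited ancestor */
};

/* Accounting is active if either bit is set. */
static bool kmem_accounted(unsigned int flags)
{
    return (flags & (KMEM_ACCOUNTED_THIS | KMEM_ACCOUNTED_PARENT)) != 0;
}

/* A newly created child inherits only the "parent" bit. */
static unsigned int child_initial_flags(unsigned int parent_flags)
{
    /* Only the two known bits may be set. */
    assert((parent_flags & ~(KMEM_ACCOUNTED_THIS | KMEM_ACCOUNTED_PARENT)) == 0);
    return kmem_accounted(parent_flags) ? KMEM_ACCOUNTED_PARENT : 0;
}
```

Keeping the two reasons in separate bits preserves the "accounted by itself /
by parent" distinction that a single boolean loses.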
>>>>
>>>> Hmm. Please start by reusing the existing routines.
>>>> If they are not enough, an enhancement to the generic cgroup code would be
>>>> more welcome than a completely new routine only for memcg.
>>>>
>>>
>>> And now that I am trying to adapt the code to the new function, I
>>> remember clearly why I did it this way. Sorry for my faulty memory.
>>>
>>> That has to do with the order of the walk. I need to enforce hierarchy,
>>> which means whenever a cgroup has !use_hierarchy, I need to cut out that
>>> branch, but continue scanning the tree for other branches.
>>>
>>> That is a lot easier to do with depth-first tree walks like the one
>>> proposed in this patch. for_each_mem_cgroup() seems to walk the tree in
>>> css-creation order, which means we need to keep track of parents that
>>> have hierarchy disabled at all times (there can be many), and always test
>>> for ancestry - which is expensive, but I don't particularly care.
>>>
>>> But I'll give another shot with this one.
>>>
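(Illustrative sketch, not part of the original messages.) The pruning walk
described above, where a branch rooted at a group with !use_hierarchy is cut
but its siblings are still scanned, can be sketched as a depth-first
traversal. `struct group` and `propagate_accounting()` are hypothetical names,
and recursion is used only to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup; only the fields discussed. */
struct group {
    struct group *first_child;
    struct group *next_sibling;
    bool use_hierarchy;
    bool kmem_accounted;
};

/*
 * Depth-first walk below g. A child with !use_hierarchy is skipped together
 * with its whole subtree, and the loop moves on to the next sibling.
 */
static void propagate_accounting(struct group *g)
{
    assert(g != NULL);
    for (struct group *c = g->first_child; c; c = c->next_sibling) {
        if (!c->use_hierarchy)
            continue;               /* prune this branch only */
        c->kmem_accounted = true;
        propagate_accounting(c);
    }
}
```

In a depth-first walk the pruning decision is local: skipping a child skips
its subtree for free. An iterator that yields groups in creation order has no
such property, which is why the message above talks about tracking the pruned
parents and testing ancestry for every group visited.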
>>
>> Humm, silly me. I believed the hierarchical settings to be more
>> flexible than they really are.
>>
>> I thought it was possible for a child of a parent with
>> use_hierarchy = 1 to have use_hierarchy = 0.
>>
>> It seems not to be the case. This makes my life a lot easier.
>>
>
> How about the following patch?
>
> It is still expensive in the clear_bit case, because I can't just walk
> the whole tree flipping the bit down: I need to stop whenever I see a
> branch whose root is itself accounted - and the ordering of iter forces
> me to always check the tree up (so we get O(n*h), with h being the tree
> height, instead of O(n)).
>
> For flipping the bit up, it is easy enough.
>
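(Illustrative sketch, not part of the original messages.) The asymmetry
between setting and clearing can be seen in a small user-space model. This is
a sketch under assumptions: `struct group` and the function names are
hypothetical, and the `below` array stands in for whatever the iterator yields
for the subtree, in no particular order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup; only the fields discussed. */
struct group {
    struct group *parent;    /* NULL for the root */
    bool accounted_this;     /* limit set directly on this group */
    bool accounted_parent;   /* inherited from a limited ancestor */
};

/* Walk up from g, O(h): true if any proper ancestor is itself limited. */
static bool has_limited_ancestor(const struct group *g)
{
    for (const struct group *p = g->parent; p; p = p->parent)
        if (p->accounted_this)
            return true;
    return false;
}

/* Setting a limit: every group below root gains the inherited bit, O(n). */
static void set_limit(struct group *root, struct group **below, size_t n)
{
    assert(root != NULL);
    root->accounted_this = true;
    for (size_t i = 0; i < n; i++)
        below[i]->accounted_parent = true;
}

/*
 * Clearing a limit: a group below root keeps the inherited bit if some other
 * ancestor is still limited, so each one needs an upward check, O(n*h).
 */
static void clear_limit(struct group *root, struct group **below, size_t n)
{
    assert(root != NULL);
    root->accounted_this = false;
    for (size_t i = 0; i < n; i++)
        below[i]->accounted_parent = has_limited_ancestor(below[i]);
}
```

With limits on both A and B in an A/B/C chain, clearing A's limit must leave C
marked as inherited because B is still limited. That upward check per group is
the O(n*h) cost mentioned above.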
>
Yes. It seems much nicer.
Thanks,
-Kame