Re: [RFC REPOST] cgroup: removing css reference drain wait during cgroup removal


From: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
To: KAMEZAWA Hiroyuki
	<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
Cc: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>,
	Peter Zijlstra
	<a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	Hugh Dickins <hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [RFC REPOST] cgroup: removing css reference drain wait during cgroup removal
Date: Thu, 15 Mar 2012 15:24:23 +0400
Message-ID: <4F61D167.4000402@parallels.com>
In-Reply-To: <4F6134E1.5090601-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

On 03/15/2012 04:16 AM, KAMEZAWA Hiroyuki wrote:
> (2012/03/14 18:46), Glauber Costa wrote:
>
>> On 03/14/2012 04:28 AM, KAMEZAWA Hiroyuki wrote:
>>> IIUC, in general, even if the processes are in a tree, in most server
>>> cases their workloads are independent.
>>> I think FLAT mode is the default. 'hierarchical' is a crazy thing which
>>> cannot be managed.
>>
>> Better pay attention to the current overall cgroups discussions being
>> held by Tejun then. ([RFD] cgroup: about multiple hierarchies)
>>
>> The topic of whether to make all cgroups hierarchical by
>> default is a recurring one.
>>
>> I personally think it is achievable to make res_counters
>> cheaper, which would make this less of a problem.
>>
>
>
> I thought about this a little yesterday. My current idea is to apply the
> following rules to res_counter.
>
> 1. All res_counters are hierarchical, but the behavior should be optimized.
>
> 2. If the parent res_counter has an UNLIMITED limit, 'usage' will not be
>    propagated to its parent at _charge_ time.

That doesn't seem to make much sense. If you are unlimited but your
parent is limited, your parent has a much greater interest in knowing
about the charge than you do. So the logic should rather be the opposite:
don't go around taking locks and all that if you are unlimited. Your
parent might need to, though.
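
To make the suggested inversion concrete, here is a minimal user-space sketch in C. It assumes a simplified counter; the names `struct rc`, `rc_charge()` and `RC_UNLIMITED` are hypothetical and are not the kernel's `struct res_counter` API. An unlimited level does only a lock-free atomic add, so its usage stays readable. A limited level takes its lock, checks the limit, and charges. A failure partway up rolls back the levels already charged. A C11 `atomic_flag` spinlock stands in for a kernel spinlock.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RC_UNLIMITED LONG_MAX /* hypothetical "no limit" marker */

/* Hypothetical simplified counter, not the kernel's struct res_counter. */
struct rc {
	atomic_long usage;
	long limit;
	atomic_flag lock; /* only used when limit != RC_UNLIMITED */
	struct rc *parent;
};

static void rc_init(struct rc *c, long limit, struct rc *parent)
{
	assert(limit >= 0);
	atomic_init(&c->usage, 0);
	c->limit = limit;
	atomic_flag_clear(&c->lock);
	c->parent = parent;
}

static void rc_lock(struct rc *c)   { while (atomic_flag_test_and_set(&c->lock)) ; }
static void rc_unlock(struct rc *c) { atomic_flag_clear(&c->lock); }

/* Charge a single level. Unlimited: plain atomic add, no lock. */
static bool rc_charge_one(struct rc *c, long val)
{
	if (c->limit == RC_UNLIMITED) {
		atomic_fetch_add(&c->usage, val);
		return true;
	}
	rc_lock(c);
	bool ok = val <= c->limit - atomic_load(&c->usage);
	if (ok)
		atomic_fetch_add(&c->usage, val); /* uncharges may race, but only downward */
	rc_unlock(c);
	return ok;
}

/* Undo a charge on every level from c up to, but not including, stop. */
static void rc_uncharge_upto(struct rc *c, struct rc *stop, long val)
{
	for (; c != stop; c = c->parent) {
		long old = atomic_fetch_sub(&c->usage, val);
		assert(old >= val);
	}
}

/* Walk from c to the root; on failure, roll back the levels already charged. */
static bool rc_charge(struct rc *c, long val)
{
	for (struct rc *p = c; p; p = p->parent) {
		if (!rc_charge_one(p, val)) {
			rc_uncharge_upto(c, p, val);
			return false;
		}
	}
	return true;
}
```

Consider an unlimited leaf under a limited middle level under an unlimited root. Only the middle level ever spins on a lock. The leaf and the root each pay one atomic add per charge.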

I am experimenting a bit with billing unlimited res_counters to percpu
counters, but their inexact nature is giving me quite a headache.
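
The inexactness comes from batching. Each CPU accumulates a private delta and folds it into the shared count only when the delta crosses a batch threshold. Below is a minimal single-threaded sketch that assumes a fixed CPU count and a caller-supplied CPU index. The names `struct pcpu_counter`, `pcc_add()`, `NCPUS` and `BATCH` are hypothetical. The kernel's real facility is `struct percpu_counter`, which offers `percpu_counter_read()` for the cheap read and `percpu_counter_sum()` for the exact one. Concurrency is omitted here; the fold into `count` would need a lock.

```c
#include <assert.h>
#include <stdlib.h>

#define NCPUS 4  /* hypothetical fixed CPU count */
#define BATCH 32 /* hypothetical fold threshold */

struct pcpu_counter {
	long count;        /* shared total, only updated on a fold */
	long local[NCPUS]; /* per-CPU deltas not yet folded into count */
};

static void pcc_add(struct pcpu_counter *c, int cpu, long val)
{
	assert(cpu >= 0 && cpu < NCPUS);
	long v = c->local[cpu] + val;
	if (labs(v) >= BATCH) {
		c->count += v; /* a real implementation takes a lock here */
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;
	}
}

/* Cheap but inexact: ignores the unfolded per-CPU deltas. */
static long pcc_read_fast(const struct pcpu_counter *c)
{
	return c->count;
}

/* Exact but O(NCPUS): adds every per-CPU delta. */
static long pcc_read_exact(const struct pcpu_counter *c)
{
	long sum = c->count;
	for (int i = 0; i < NCPUS; i++)
		sum += c->local[i];
	return sum;
}
```

The fast read can be off by up to NCPUS * (BATCH - 1). That error is harmless for reporting. It is awkward for enforcing a limit, because a charge may pass a check made against a stale total.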

> 3. If a res_counter has an UNLIMITED limit, reading its usage must visit
>    all children and return their sum.
>
> Then,
> 	/cgroup/
> 		memory/                       (unlimited)
> 			libvirt/              (unlimited)
> 				qemu/         (unlimited)
> 					guest/ (limited)
>
> Every directory can show hierarchical usage, and the guest will not have
> any lock contention at runtime.

If we are okay with summing it up at read time, we may as well
keep everything in percpu counters at all times.
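
Rule 3 and the "children list" item can be sketched as a read-time walk. The names `struct rc_node`, `rc_node_read_usage()` and `RC_UNLIMITED` are hypothetical. The sketch assumes that limited counters are charged hierarchically and therefore already hold their subtree's usage. An unlimited counter reports its own direct charges plus the recursively read usage of each child. The example uses the libvirt tree from the quoted mail. No locking is shown; a real version would need to protect the child list during the walk.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define RC_UNLIMITED LONG_MAX /* hypothetical "no limit" marker */

/* Hypothetical node: each counter keeps its own list of children. */
struct rc_node {
	long usage; /* limited: whole subtree; unlimited: direct charges only */
	long limit;
	struct rc_node *parent;
	struct rc_node *first_child;
	struct rc_node *next_sibling;
};

static void rc_node_init(struct rc_node *n, long limit, struct rc_node *parent)
{
	assert(limit >= 0);
	n->usage = 0;
	n->limit = limit;
	n->parent = parent;
	n->first_child = NULL;
	n->next_sibling = NULL;
	if (parent) { /* link into the parent's child list */
		n->next_sibling = parent->first_child;
		parent->first_child = n;
	}
}

/* Hierarchical usage: limited nodes answer directly, unlimited ones sum children. */
static long rc_node_read_usage(const struct rc_node *n)
{
	long sum = n->usage;
	if (n->limit != RC_UNLIMITED)
		return sum;
	for (const struct rc_node *c = n->first_child; c; c = c->next_sibling)
		sum += rc_node_read_usage(c);
	return sum;
}
```

The cost moves from every charge to every read of an unlimited directory, and it grows with the number of descendants.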
>
> With this:
>   1. There is no runtime overhead if the parent has an unlimited limit.
>   2. Every res_counter can show the aggregate resource usage of its children.
>
> To do this:
>   1. res_counter should have its own list of children.
>
> Implementation problems:
>   - What should happen when a user sets a new limit on a res_counter that
>     has children? Should we simply not allow it? Or take all of the
>     children's locks and update atomically?

Well, increasing the limit should always be possible.

As for the kids, how about:

-) Take their locks.
-) Scan through them, checking whether their usage is below the new allowance:
     -) if it is, then OK;
     -) if it is not, try to reclaim (*), and fail if that is not possible.

(*) This may be hard to implement, because we already hold the res_counter
lock, and the code may get nasty. So maybe it is better to just fail
if any of your kids' usage is over the new allowance...
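
Here is a minimal sketch of the simpler variant, which fails instead of reclaiming. The names `struct lc` and `lc_set_limit()` are hypothetical, and the `-EBUSY` return value is chosen for illustration. Increasing the limit always succeeds. Decreasing it takes the parent's lock, then each direct child's lock in list order, and fails if any single child's usage exceeds the new limit. The parent's own usage check matters only once charges actually reach it. It does not help when the parent was unlimited and charges were not propagated to it, and that case is why the children must be scanned.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical counter with its own child list. */
struct lc {
	long usage;
	long limit;
	atomic_flag lock;
	struct lc *first_child;
	struct lc *next_sibling;
};

static void lc_init(struct lc *c, long limit, struct lc *parent)
{
	assert(limit >= 0);
	c->usage = 0;
	c->limit = limit;
	atomic_flag_clear(&c->lock);
	c->first_child = NULL;
	c->next_sibling = NULL;
	if (parent) {
		c->next_sibling = parent->first_child;
		parent->first_child = c;
	}
}

static void lc_lock(struct lc *c)   { while (atomic_flag_test_and_set(&c->lock)) ; }
static void lc_unlock(struct lc *c) { atomic_flag_clear(&c->lock); }

static int lc_set_limit(struct lc *c, long new_limit)
{
	struct lc *k;
	int ret = 0;

	lc_lock(c);
	if (new_limit >= c->limit) { /* increasing is always allowed */
		c->limit = new_limit;
		goto out;
	}
	if (c->usage > new_limit) { /* our own usage is already too high */
		ret = -EBUSY;
		goto out;
	}
	/* Lock every child (parent first, then list order) so none can be charged. */
	for (k = c->first_child; k; k = k->next_sibling)
		lc_lock(k);
	for (k = c->first_child; k; k = k->next_sibling) {
		if (k->usage > new_limit) { /* no reclaim attempt: just fail */
			ret = -EBUSY;
			break;
		}
	}
	if (ret == 0)
		c->limit = new_limit;
	for (k = c->first_child; k; k = k->next_sibling)
		lc_unlock(k);
out:
	lc_unlock(c);
	return ret;
}
```

Following the mail literally, each child is compared on its own; comparing the children's summed usage would be a stricter check. Taking locks strictly top-down, parent and then children, keeps two concurrent limit updates at adjacent levels from deadlocking in this sketch.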



>   - Should memory.use_hierarchy become obsolete?

If we're going fully hierarchical, yes.
