From mboxrd@z Thu Jan 1 00:00:00 1970
From: Glauber Costa
Subject: Re: [RFC REPOST] cgroup: removing css reference drain wait during cgroup removal
Date: Thu, 15 Mar 2012 15:24:23 +0400
Message-ID: <4F61D167.4000402@parallels.com>
References: <20120312213155.GE23255@google.com>
 <20120312213343.GF23255@google.com>
 <20120313151148.f8004a00.kamezawa.hiroyu@jp.fujitsu.com>
 <20120313163914.GD7349@google.com>
 <20120314092828.3321731c.kamezawa.hiroyu@jp.fujitsu.com>
 <4F6068F4.4090909@parallels.com>
 <4F6134E1.5090601@jp.fujitsu.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <4F6134E1.5090601-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: KAMEZAWA Hiroyuki
Cc: Jens Axboe, Peter Zijlstra,
 containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
 Hugh Dickins,
 linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 Michal Hocko,
 linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
 cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 Johannes Weiner, Tejun Heo, Vivek Goyal

On 03/15/2012 04:16 AM, KAMEZAWA Hiroyuki wrote:
> (2012/03/14 18:46), Glauber Costa wrote:
>
>> On 03/14/2012 04:28 AM, KAMEZAWA Hiroyuki wrote:
>>> IIUC, in general, even if the processes are in a tree, in the major case
>>> of servers, their workloads are independent.
>>> I think FLAT mode is the default. 'hierarchical' is a crazy thing which
>>> cannot be managed.
>>
>> Better pay attention to the current overall cgroups discussions being
>> held by Tejun then. ([RFD] cgroup: about multiple hierarchies)
>>
>> The topic of whether to adapt all cgroups to be hierarchical by
>> default is a recurring one.
>>
>> I personally think that it is not unachievable to make res_counters
>> cheaper, therefore making this less of a problem.
>>
>
>
> I thought of this a little yesterday. My current idea is to apply the
> following rules for res_counter.
>
> 1. All res_counters are hierarchical. But behavior should be optimized.
>
> 2. If parent res_counter has UNLIMITED limit, 'usage' will not be propagated
>    to its parent at _charge_.

That doesn't seem to make much sense. If you are unlimited, but your
parent is limited, the parent has far more interest in knowing about the
charge than you do. So the logic should rather be the opposite: don't go
around getting locks and all that if you are unlimited. Your parent
might, though.

I am trying to experiment a bit with billing to percpu counters for
unlimited res_counters. But their inexact nature is giving me quite a
headache.

> 3. If a res_counter has UNLIMITED limit, at reading usage, it must visit
>    all children and return a sum of them.
>
> Then,
>    /cgroup/
>        memory/                    (unlimited)
>            libvirt/               (unlimited)
>                qemu/              (unlimited)
>                    guest/         (limited)
>
> All dirs can show hierarchical usage and the guest will not have
> any lock contention at runtime.

If we are okay with summing it up at read time, we may as well
keep everything in percpu counters at all times.

>
> By this
>  1. no runtime overhead if the parent has unlimited limit.
>  2. All res_counters can show aggregate resource usage of children.
>
> To do this
>  1. res_counter should have a children list by itself.
>
> Implementation problem
>  - What should happen when a user sets a new limit on a res_counter which
>    has children ? Shouldn't we allow it ? Or take all locks of children and
>    update atomically ?

Well, increasing the limit should always be possible.

As for the kids, how about:

- ) Take their locks
- ) Scan through them, checking whether their usage is below the new allowance
  -) if it is, then ok
  -) if it is not, then try to reclaim (*). Fail if it is not possible.
(*) May be hard to implement, because we already have the res_counter
lock taken, and the code may get nasty. So maybe it is better to just fail
if the usage of any of your kids is over the new allowance...

> - memory.use_hierarchy should be obsolete ?

If we're going fully hierarchical, yes.
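
To make the simpler "just fail" variant of the limit update above a bit
more concrete, here is a rough userspace sketch. It is not the kernel's
res_counter code: the sketch_* names, the fixed-size children array and
the check on the counter's own usage are all assumptions made for
illustration, and locking is only indicated in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_UNLIMITED    UINT64_MAX
#define SKETCH_MAX_CHILDREN 8

/* Hypothetical, simplified counter; NOT the kernel's struct res_counter. */
struct sketch_counter {
	uint64_t usage;
	uint64_t limit;
	struct sketch_counter *children[SKETCH_MAX_CHILDREN];
	size_t nr_children;
};

/* Initialize a counter and, if a parent is given, link it as a child. */
static void sketch_init(struct sketch_counter *c,
			struct sketch_counter *parent, uint64_t limit)
{
	c->usage = 0;
	c->limit = limit;
	c->nr_children = 0;
	if (parent) {
		assert(parent->nr_children < SKETCH_MAX_CHILDREN);
		parent->children[parent->nr_children++] = c;
	}
}

/*
 * Returns 0 on success, -1 if the new limit is rejected.
 * Raising the limit always succeeds. Lowering it fails, with no
 * reclaim attempt, if this counter (an assumption of this sketch)
 * or any direct child is already over the new allowance.
 */
static int sketch_set_limit(struct sketch_counter *c, uint64_t new_limit)
{
	size_t i;

	if (new_limit >= c->limit) {
		c->limit = new_limit;
		return 0;
	}

	/* Real code would hold c's lock and every child's lock from here. */
	if (c->usage > new_limit)
		return -1;
	for (i = 0; i < c->nr_children; i++)
		if (c->children[i]->usage > new_limit)
			return -1;

	c->limit = new_limit;
	return 0;
}
```

With an unlimited parent and two children using 30 and 70, lowering the
parent's limit to 50 is refused and the limit stays unchanged, while 80
is accepted. The reclaim step marked (*) is deliberately left out.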