From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754383AbWLYKfN (ORCPT ); Mon, 25 Dec 2006 05:35:13 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754393AbWLYKfN (ORCPT ); Mon, 25 Dec 2006 05:35:13 -0500 Received: from mailhub.sw.ru ([195.214.233.200]:31813 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754383AbWLYKfM (ORCPT ); Mon, 25 Dec 2006 05:35:12 -0500 Message-ID: <458FA963.2050502@openvz.org> Date: Mon, 25 Dec 2006 13:35:15 +0300 From: Pavel Emelianov User-Agent: Thunderbird 1.5 (X11/20060317) MIME-Version: 1.0 To: herbert@13thfloor.at, dev@sw.ru, containers@lists.osdl.org, ckrm-tech@lists.sourceforge.net, linux-kernel@vger.kernel.org CC: akpm@osdl.org, pj@sgi.com, sekharan@us.ibm.com, menage@google.com, serue@us.ibm.com, vatsa@in.ibm.com, rohitseth@google.com, winget@google.com Subject: Re: [PATCH 6/6] containers: BeanCounters over generic process containers References: <20061222141442.753211763@menage.corp.google.com> <20061222145217.107513155@menage.corp.google.com> <20061223194955.GA30764@MAIL.13thfloor.at> In-Reply-To: <20061223194955.GA30764@MAIL.13thfloor.at> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Herbert Poetzl wrote: > On Fri, Dec 22, 2006 at 06:14:48AM -0800, Paul Menage wrote: >> This patch implements the BeanCounter resource control abstraction >> over generic process containers. It contains the beancounter core >> code, plus the numfiles resource counter. It doesn't currently contain >> any of the memory tracking code or the code for switching beancounter >> context in interrupts. > > I don't like it, it looks bloated and probably > adds plenty of overhead (similar to the OVZ > implementation where this seems to be taken from) FULL BC patch w/o pages fractions accounting doesn't add any noticeable overhead to mainstream kernel. Pages fractions accounting will be optimized as well. The part you're talking about is only 1/100 of the complete patch. > here are some comments/questions: > >> Currently all the beancounters resource counters are lumped into a >> single hierarchy; ideally it would be possible for each resource >> counter to be a separate container subsystem, allowing them to be >> connected to different hierarchies. >> >> +static inline void bc_uncharge(struct beancounter *bc, int res_id, >> + unsigned long val) >> +{ >> + unsigned long flags; >> + >> + spin_lock_irqsave(&bc->bc_lock, flags); >> + bc_uncharge_locked(bc, res_id, val); >> + spin_unlock_irqrestore(&bc->bc_lock, flags); > > why use a spinlock, when we could use atomic > counters? Because approach if (atomic_read(&bc->barrier) > aromic_read(&bc->held)) atomic_inc(&bc->held); used in vserver accounting is not atomic ;) Look at the comment below about charging two resources at once. > >> +int bc_charge_locked(struct beancounter *bc, int res, unsigned long val, >> + int strict, unsigned long flags) >> +{ >> + struct bc_resource_parm *parm; >> + unsigned long new_held; >> + >> + BUG_ON(val > BC_MAXVALUE); >> + >> + parm = &bc->bc_parms[res]; >> + new_held = parm->held + val; >> + >> + switch (strict) { >> + case BC_LIMIT: >> + if (new_held > parm->limit) >> + break; >> + /* fallthrough */ >> + case BC_BARRIER: >> + if (new_held > parm->barrier) { >> + if (strict == BC_BARRIER) >> + break; >> + if (parm->held < parm->barrier && >> + bc_resources[res]->bcr_barrier_hit) >> + bc_resources[res]->bcr_barrier_hit(bc); >> + } > > why do barrier checks with every accounting? > there are probably a few cases where the > checks could be independant from the accounting Let's look at if (parm->held < parm->barrier && bc_resources[res]->bcr_barrier_hit) bc_resources[res]->bcr_barrier_hit(bc); code one more time. In case of BC_LIMIT charge BC code informs resource controller about BARRIER hit to take some actions before hard resource shortage. >> + /* fallthrough */ >> + case BC_FORCE: >> + parm->held = new_held; >> + bc_adjust_maxheld(parm); > > in what cases do we want to cross the barrier? > >> + return 0; >> + default: >> + BUG(); >> + } >> + >> + if (bc_resources[res]->bcr_limit_hit) >> + return bc_resources[res]->bcr_limit_hit(bc, val, flags); >> + >> + parm->failcnt++; >> + return -ENOMEM; > >> +int bc_file_charge(struct file *file) >> +{ >> + int sev; >> + struct beancounter *bc; >> + >> + task_lock(current); > > why do we lock current? it won't go away that > easily, and for switching the bc, it might be > better to use RCU or a separate lock, no? This came from containers patches. BC code doesn't take locks on fast paths. >> + bc = task_bc(current); >> + css_get_current(&bc->css); >> + task_unlock(current); >> + >> + sev = (capable(CAP_SYS_ADMIN) ? BC_LIMIT : BC_BARRIER); >> + >> + if (bc_charge(bc, BC_NUMFILES, 1, sev)) { >> + css_put(&bc->css); >> + return -EMFILE; >> + } >> + >> + file->f_bc = bc; >> + return 0; >> +} > > also note that certain limits are much more > complicated than the (very simple) file limits > and the code will be called at higher frequency We do know it and we have "pre-charges" optimization for frequent calls. bc->lock we've seen is used to make two or more resources charge in only one atomic operation, that is faster than doing atomic_inc() for each resource as you've proposed above. > how to handle requests like: > try to get as 64 files or as many as available > whatever is smaller I promise, that if Linus will include patch that adds a syscall to open 64 or "as many as available whatever is smaller" files at once we'll add this functionality. > happy xmas, > Herbert > >