From: Pavel Emelianov <xemul@openvz.org>
To: herbert@13thfloor.at, dev@sw.ru, containers@lists.osdl.org,
ckrm-tech@lists.sourceforge.net, linux-kernel@vger.kernel.org
Cc: akpm@osdl.org, pj@sgi.com, sekharan@us.ibm.com,
menage@google.com, serue@us.ibm.com, vatsa@in.ibm.com,
rohitseth@google.com, winget@google.com
Subject: Re: [PATCH 6/6] containers: BeanCounters over generic process containers
Date: Mon, 25 Dec 2006 13:35:15 +0300 [thread overview]
Message-ID: <458FA963.2050502@openvz.org> (raw)
In-Reply-To: <20061223194955.GA30764@MAIL.13thfloor.at>
Herbert Poetzl wrote:
> On Fri, Dec 22, 2006 at 06:14:48AM -0800, Paul Menage wrote:
>> This patch implements the BeanCounter resource control abstraction
>> over generic process containers. It contains the beancounter core
>> code, plus the numfiles resource counter. It doesn't currently contain
>> any of the memory tracking code or the code for switching beancounter
>> context in interrupts.
>
> I don't like it, it looks bloated and probably
> adds plenty of overhead (similar to the OVZ
> implementation where this seems to be taken from)
FULL BC patch w/o pages fractions accounting doesn't
add any noticeable overhead to mainstream kernel.
Pages fractions accounting will be optimized as well.
The part you're talking about is only 1/100 of the
complete patch.
> here are some comments/questions:
>
>> Currently all the beancounters resource counters are lumped into a
>> single hierarchy; ideally it would be possible for each resource
>> counter to be a separate container subsystem, allowing them to be
>> connected to different hierarchies.
>>
>> +static inline void bc_uncharge(struct beancounter *bc, int res_id,
>> + unsigned long val)
>> +{
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&bc->bc_lock, flags);
>> + bc_uncharge_locked(bc, res_id, val);
>> + spin_unlock_irqrestore(&bc->bc_lock, flags);
>
> why use a spinlock, when we could use atomic
> counters?
Because approach
if (atomic_read(&bc->barrier) > aromic_read(&bc->held))
atomic_inc(&bc->held);
used in vserver accounting is not atomic ;)
Look at the comment below about charging two resources at once.
>
>> +int bc_charge_locked(struct beancounter *bc, int res, unsigned long val,
>> + int strict, unsigned long flags)
>> +{
>> + struct bc_resource_parm *parm;
>> + unsigned long new_held;
>> +
>> + BUG_ON(val > BC_MAXVALUE);
>> +
>> + parm = &bc->bc_parms[res];
>> + new_held = parm->held + val;
>> +
>> + switch (strict) {
>> + case BC_LIMIT:
>> + if (new_held > parm->limit)
>> + break;
>> + /* fallthrough */
>> + case BC_BARRIER:
>> + if (new_held > parm->barrier) {
>> + if (strict == BC_BARRIER)
>> + break;
>> + if (parm->held < parm->barrier &&
>> + bc_resources[res]->bcr_barrier_hit)
>> + bc_resources[res]->bcr_barrier_hit(bc);
>> + }
>
> why do barrier checks with every accounting?
> there are probably a few cases where the
> checks could be independant from the accounting
Let's look at
if (parm->held < parm->barrier &&
bc_resources[res]->bcr_barrier_hit)
bc_resources[res]->bcr_barrier_hit(bc);
code one more time.
In case of BC_LIMIT charge BC code informs resource
controller about BARRIER hit to take some actions
before hard resource shortage.
>> + /* fallthrough */
>> + case BC_FORCE:
>> + parm->held = new_held;
>> + bc_adjust_maxheld(parm);
>
> in what cases do we want to cross the barrier?
>
>> + return 0;
>> + default:
>> + BUG();
>> + }
>> +
>> + if (bc_resources[res]->bcr_limit_hit)
>> + return bc_resources[res]->bcr_limit_hit(bc, val, flags);
>> +
>> + parm->failcnt++;
>> + return -ENOMEM;
>
>> +int bc_file_charge(struct file *file)
>> +{
>> + int sev;
>> + struct beancounter *bc;
>> +
>> + task_lock(current);
>
> why do we lock current? it won't go away that
> easily, and for switching the bc, it might be
> better to use RCU or a separate lock, no?
This came from containers patches. BC code doesn't take
locks on fast paths.
>> + bc = task_bc(current);
>> + css_get_current(&bc->css);
>> + task_unlock(current);
>> +
>> + sev = (capable(CAP_SYS_ADMIN) ? BC_LIMIT : BC_BARRIER);
>> +
>> + if (bc_charge(bc, BC_NUMFILES, 1, sev)) {
>> + css_put(&bc->css);
>> + return -EMFILE;
>> + }
>> +
>> + file->f_bc = bc;
>> + return 0;
>> +}
>
> also note that certain limits are much more
> complicated than the (very simple) file limits
> and the code will be called at higher frequency
We do know it and we have "pre-charges" optimization
for frequent calls. bc->lock we've seen is used to
make two or more resources charge in only one atomic
operation, that is faster than doing atomic_inc()
for each resource as you've proposed above.
> how to handle requests like:
> try to get as 64 files or as many as available
> whatever is smaller
I promise, that if Linus will include patch that adds a syscall
to open 64 or "as many as available whatever is smaller" files
at once we'll add this functionality.
> happy xmas,
> Herbert
>
>
next prev parent reply other threads:[~2006-12-25 10:35 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-12-22 14:14 [PATCH 0/6] containers: Generic Process Containers (V6) Paul Menage
2006-12-22 14:14 ` [PATCH 1/6] containers: Generic container system abstracted from cpusets code Paul Menage
2006-12-30 13:10 ` Eric W. Biederman
2006-12-31 5:17 ` Paul Jackson
2007-01-02 22:15 ` Paul Menage
2006-12-22 14:14 ` [PATCH 2/6] containers: Cpusets hooked into containers Paul Menage
2006-12-22 14:14 ` [PATCH 3/6] containers: Add generic multi-subsystem API to containers Paul Menage
2007-01-10 15:56 ` [ckrm-tech] " Balbir Singh
2007-01-11 22:53 ` Paul Menage
2007-01-12 6:29 ` Balbir Singh
2007-01-12 8:10 ` Paul Menage
2007-01-12 8:22 ` Balbir Singh
2007-01-20 17:27 ` Balbir Singh
2006-12-22 14:14 ` [PATCH 4/6] containers: Simple CPU accounting container subsystem Paul Menage
2007-01-10 14:21 ` [ckrm-tech] " Balbir Singh
2007-01-12 0:33 ` Paul Menage
2007-01-12 6:24 ` Balbir Singh
2007-01-12 8:15 ` Paul Menage
2007-01-12 8:26 ` Balbir Singh
2007-01-12 17:32 ` Paul Menage
2007-01-15 9:01 ` [PATCH 0/1] Add mount/umount callbacks to containers (Re: [ckrm-tech] [PATCH 4/6] containers: Simple CPU accounting container subsystem) Balbir Singh
2007-01-15 9:04 ` [PATCH 1/1] Fix a panic while mouting containers on powerpc and some other small cleanups " Balbir Singh
2007-01-15 9:22 ` Paul Menage
2007-01-15 9:51 ` [ckrm-tech] [PATCH 1/1] Fix a panic while mouting containers on powerpc and some other small cleanups (Re: " Balbir Singh
2007-01-15 10:01 ` Paul Menage
2007-01-15 10:10 ` Balbir Singh
2006-12-22 14:14 ` [PATCH 5/6] containers: Resource Groups over generic containers Paul Menage
2006-12-22 14:14 ` [PATCH 6/6] containers: BeanCounters over generic process containers Paul Menage
2006-12-23 19:49 ` Herbert Poetzl
2006-12-24 11:32 ` Paul Menage
2006-12-25 10:16 ` Kirill Korotaev
2006-12-26 0:54 ` Paul Menage
2006-12-25 10:35 ` Pavel Emelianov [this message]
2007-01-03 14:43 ` [PATCH 0/6] containers: Generic Process Containers (V6) Serge E. Hallyn
2007-01-05 0:25 ` Paul Menage
2007-01-12 18:42 ` Serge E. Hallyn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=458FA963.2050502@openvz.org \
--to=xemul@openvz.org \
--cc=akpm@osdl.org \
--cc=ckrm-tech@lists.sourceforge.net \
--cc=containers@lists.osdl.org \
--cc=dev@sw.ru \
--cc=herbert@13thfloor.at \
--cc=linux-kernel@vger.kernel.org \
--cc=menage@google.com \
--cc=pj@sgi.com \
--cc=rohitseth@google.com \
--cc=sekharan@us.ibm.com \
--cc=serue@us.ibm.com \
--cc=vatsa@in.ibm.com \
--cc=winget@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox