From: Pavel Emelianov <xemul@openvz.org>
To: herbert@13thfloor.at, dev@sw.ru, containers@lists.osdl.org,
ckrm-tech@lists.sourceforge.net, linux-kernel@vger.kernel.org
Cc: akpm@osdl.org, pj@sgi.com, sekharan@us.ibm.com,
menage@google.com, serue@us.ibm.com, vatsa@in.ibm.com,
rohitseth@google.com, winget@google.com
Subject: Re: [PATCH 6/6] containers: BeanCounters over generic process containers
Date: Mon, 25 Dec 2006 13:35:15 +0300 [thread overview]
Message-ID: <458FA963.2050502@openvz.org> (raw)
In-Reply-To: <20061223194955.GA30764@MAIL.13thfloor.at>
Herbert Poetzl wrote:
> On Fri, Dec 22, 2006 at 06:14:48AM -0800, Paul Menage wrote:
>> This patch implements the BeanCounter resource control abstraction
>> over generic process containers. It contains the beancounter core
>> code, plus the numfiles resource counter. It doesn't currently contain
>> any of the memory tracking code or the code for switching beancounter
>> context in interrupts.
>
> I don't like it, it looks bloated and probably
> adds plenty of overhead (similar to the OVZ
> implementation where this seems to be taken from)
FULL BC patch w/o pages fractions accounting doesn't
add any noticeable overhead to mainstream kernel.
Pages fractions accounting will be optimized as well.
The part you're talking about is only 1/100 of the
complete patch.
> here are some comments/questions:
>
>> Currently all the beancounters resource counters are lumped into a
>> single hierarchy; ideally it would be possible for each resource
>> counter to be a separate container subsystem, allowing them to be
>> connected to different hierarchies.
>>
>> +static inline void bc_uncharge(struct beancounter *bc, int res_id,
>> + unsigned long val)
>> +{
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&bc->bc_lock, flags);
>> + bc_uncharge_locked(bc, res_id, val);
>> + spin_unlock_irqrestore(&bc->bc_lock, flags);
>
> why use a spinlock, when we could use atomic
> counters?
Because approach
if (atomic_read(&bc->barrier) > aromic_read(&bc->held))
atomic_inc(&bc->held);
used in vserver accounting is not atomic ;)
Look at the comment below about charging two resources at once.
>
>> +int bc_charge_locked(struct beancounter *bc, int res, unsigned long val,
>> + int strict, unsigned long flags)
>> +{
>> + struct bc_resource_parm *parm;
>> + unsigned long new_held;
>> +
>> + BUG_ON(val > BC_MAXVALUE);
>> +
>> + parm = &bc->bc_parms[res];
>> + new_held = parm->held + val;
>> +
>> + switch (strict) {
>> + case BC_LIMIT:
>> + if (new_held > parm->limit)
>> + break;
>> + /* fallthrough */
>> + case BC_BARRIER:
>> + if (new_held > parm->barrier) {
>> + if (strict == BC_BARRIER)
>> + break;
>> + if (parm->held < parm->barrier &&
>> + bc_resources[res]->bcr_barrier_hit)
>> + bc_resources[res]->bcr_barrier_hit(bc);
>> + }
>
> why do barrier checks with every accounting?
> there are probably a few cases where the
> checks could be independant from the accounting
Let's look at
if (parm->held < parm->barrier &&
bc_resources[res]->bcr_barrier_hit)
bc_resources[res]->bcr_barrier_hit(bc);
code one more time.
In case of BC_LIMIT charge BC code informs resource
controller about BARRIER hit to take some actions
before hard resource shortage.
>> + /* fallthrough */
>> + case BC_FORCE:
>> + parm->held = new_held;
>> + bc_adjust_maxheld(parm);
>
> in what cases do we want to cross the barrier?
>
>> + return 0;
>> + default:
>> + BUG();
>> + }
>> +
>> + if (bc_resources[res]->bcr_limit_hit)
>> + return bc_resources[res]->bcr_limit_hit(bc, val, flags);
>> +
>> + parm->failcnt++;
>> + return -ENOMEM;
>
>> +int bc_file_charge(struct file *file)
>> +{
>> + int sev;
>> + struct beancounter *bc;
>> +
>> + task_lock(current);
>
> why do we lock current? it won't go away that
> easily, and for switching the bc, it might be
> better to use RCU or a separate lock, no?
This came from containers patches. BC code doesn't take
locks on fast paths.
>> + bc = task_bc(current);
>> + css_get_current(&bc->css);
>> + task_unlock(current);
>> +
>> + sev = (capable(CAP_SYS_ADMIN) ? BC_LIMIT : BC_BARRIER);
>> +
>> + if (bc_charge(bc, BC_NUMFILES, 1, sev)) {
>> + css_put(&bc->css);
>> + return -EMFILE;
>> + }
>> +
>> + file->f_bc = bc;
>> + return 0;
>> +}
>
> also note that certain limits are much more
> complicated than the (very simple) file limits
> and the code will be called at higher frequency
We do know it and we have "pre-charges" optimization
for frequent calls. bc->lock we've seen is used to
make two or more resources charge in only one atomic
operation, that is faster than doing atomic_inc()
for each resource as you've proposed above.
> how to handle requests like:
> try to get as 64 files or as many as available
> whatever is smaller
I promise, that if Linus will include patch that adds a syscall
to open 64 or "as many as available whatever is smaller" files
at once we'll add this functionality.
> happy xmas,
> Herbert
>
>
next prev parent reply other threads:[~2006-12-25 10:35 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-12-22 14:14 [PATCH 0/6] containers: Generic Process Containers (V6) Paul Menage
2006-12-22 14:14 ` [PATCH 1/6] containers: Generic container system abstracted from cpusets code Paul Menage
2006-12-30 13:10 ` Eric W. Biederman
2006-12-31 5:17 ` Paul Jackson
2007-01-02 22:15 ` Paul Menage
2006-12-22 14:14 ` [PATCH 2/6] containers: Cpusets hooked into containers Paul Menage
2006-12-22 14:14 ` [PATCH 3/6] containers: Add generic multi-subsystem API to containers Paul Menage
2007-01-10 15:56 ` [ckrm-tech] " Balbir Singh
2007-01-11 22:53 ` Paul Menage
2007-01-12 6:29 ` Balbir Singh
2007-01-12 8:10 ` Paul Menage
2007-01-12 8:22 ` Balbir Singh
2007-01-20 17:27 ` Balbir Singh
2006-12-22 14:14 ` [PATCH 4/6] containers: Simple CPU accounting container subsystem Paul Menage
2007-01-10 14:21 ` [ckrm-tech] " Balbir Singh
2007-01-12 0:33 ` Paul Menage
2007-01-12 6:24 ` Balbir Singh
2007-01-12 8:15 ` Paul Menage
2007-01-12 8:26 ` Balbir Singh
2007-01-12 17:32 ` Paul Menage
2007-01-15 9:01 ` [PATCH 0/1] Add mount/umount callbacks to containers (Re: [ckrm-tech] [PATCH 4/6] containers: Simple CPU accounting container subsystem) Balbir Singh
2007-01-15 9:04 ` [PATCH 1/1] Fix a panic while mouting containers on powerpc and some other small cleanups " Balbir Singh
2007-01-15 9:22 ` Paul Menage
2007-01-15 9:51 ` [ckrm-tech] [PATCH 1/1] Fix a panic while mouting containers on powerpc and some other small cleanups (Re: " Balbir Singh
2007-01-15 10:01 ` Paul Menage
2007-01-15 10:10 ` Balbir Singh
2006-12-22 14:14 ` [PATCH 5/6] containers: Resource Groups over generic containers Paul Menage
2006-12-22 14:14 ` [PATCH 6/6] containers: BeanCounters over generic process containers Paul Menage
2006-12-23 19:49 ` Herbert Poetzl
2006-12-24 11:32 ` Paul Menage
2006-12-25 10:16 ` Kirill Korotaev
2006-12-26 0:54 ` Paul Menage
2006-12-25 10:35 ` Pavel Emelianov [this message]
2007-01-03 14:43 ` [PATCH 0/6] containers: Generic Process Containers (V6) Serge E. Hallyn
2007-01-05 0:25 ` Paul Menage
2007-01-12 18:42 ` Serge E. Hallyn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=458FA963.2050502@openvz.org \
--to=xemul@openvz.org \
--cc=akpm@osdl.org \
--cc=ckrm-tech@lists.sourceforge.net \
--cc=containers@lists.osdl.org \
--cc=dev@sw.ru \
--cc=herbert@13thfloor.at \
--cc=linux-kernel@vger.kernel.org \
--cc=menage@google.com \
--cc=pj@sgi.com \
--cc=rohitseth@google.com \
--cc=sekharan@us.ibm.com \
--cc=serue@us.ibm.com \
--cc=vatsa@in.ibm.com \
--cc=winget@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.