From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Paul Menage <menage@google.com>
Cc: linux-mm@kvack.org, Sudhir Kumar <skumar@linux.vnet.ibm.com>,
YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org,
David Rientjes <rientjes@google.com>,
Pavel Emelianov <xemul@openvz.org>,
Andrew Morton <akpm@linux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [-mm][PATCH 3/4] Add rlimit controller accounting and control
Date: Thu, 08 May 2008 20:24:48 +0530
Message-ID: <48231438.9030803@linux.vnet.ibm.com>
In-Reply-To: <6599ad830805062017n67d67f19w1469050d45e46ad6@mail.gmail.com>

Paul Menage wrote:
> On Sat, May 3, 2008 at 2:38 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> +
>> +int rlimit_cgroup_charge_as(struct mm_struct *mm, unsigned long nr_pages)
>> +{
>> + int ret;
>> + struct rlimit_cgroup *rcg;
>> +
>> + rcu_read_lock();
>> + rcg = rlimit_cgroup_from_task(rcu_dereference(mm->owner));
>> + css_get(&rcg->css);
>> + rcu_read_unlock();
>> +
>> + ret = res_counter_charge(&rcg->as_res, (nr_pages << PAGE_SHIFT));
>> + css_put(&rcg->css);
>> + return ret;
>> +}
>
> You need to synchronize against mm->owner changing, or
> mm->owner->cgroups changing. How about:
>
> int rlimit_cgroup_charge_as(struct mm_struct *mm, unsigned long nr_pages)
> {
> int ret;
> struct rlimit_cgroup *rcg;
> struct task_struct *owner;
>
> rcu_read_lock();
> again:
>
> /* Find and lock the owner of the mm */
> owner = rcu_dereference(mm->owner);
> task_lock(owner);
> if (mm->owner != owner) {
> task_unlock(owner);
> goto again;
> }
>
> /* Charge the owner's cgroup with the new memory */
> rcg = rlimit_cgroup_from_task(owner);
> ret = res_counter_charge(&rcg->as_res, (nr_pages << PAGE_SHIFT));
> task_unlock(owner);
> rcu_read_unlock();
> return ret;
> }
>
My mind goes blank at times, so forgive me for asking: what happens if we don't
use task_lock()? mm->owner cannot be freed; even if it changes, we get the
callback in mm_owner_changed(). At the locations from which we call _charge and
_uncharge, we also know that the mm is not going to change.
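For the record, the retry pattern you suggest can be modeled in plain userspace
C; the names below (struct task, struct mm, lock_mm_owner) are hypothetical
stand-ins for the kernel's task_struct, task_lock() and rcu_dereference(), not
the real API:

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for task_struct; the mutex plays the role of task_lock(). */
struct task {
	pthread_mutex_t lock;
};

/* Stand-in for mm_struct; only the owner pointer matters here. */
struct mm {
	struct task *owner;
};

/*
 * Return mm->owner with its lock held. If the owner changed between
 * the read and the lock, drop the lock and retry, so the caller always
 * charges the cgroup of the task that is still the owner.
 */
static struct task *lock_mm_owner(struct mm *mm)
{
	struct task *owner;

	for (;;) {
		owner = mm->owner;	/* rcu_dereference(mm->owner) */
		pthread_mutex_lock(&owner->lock);	/* task_lock(owner) */
		if (mm->owner == owner)
			return owner;	/* still the owner: safe to charge */
		pthread_mutex_unlock(&owner->lock);	/* raced; try again */
	}
}
```

The re-check after taking the lock is the whole point: without it, the owner
read before the lock may already be stale by the time we charge.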
>> +
>> +void rlimit_cgroup_uncharge_as(struct mm_struct *mm, unsigned long nr_pages)
>> +{
>> + struct rlimit_cgroup *rcg;
>> +
>> + rcu_read_lock();
>> + rcg = rlimit_cgroup_from_task(rcu_dereference(mm->owner));
>> + css_get(&rcg->css);
>> + rcu_read_unlock();
>> +
>> + res_counter_uncharge(&rcg->as_res, (nr_pages << PAGE_SHIFT));
>> + css_put(&rcg->css);
>> +}
>
> Can't this be implemented as just a call to charge() with a negative
> value? (Possibly fixing res_counter_charge() to handle negative values
> if necessary) Seems simpler.
>
I did that earlier, but Pavel then suggested splitting it up into
charge/uncharge. I found that his suggestion made the code more readable.
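To make the split semantics concrete, here is a minimal userspace model of the
counter as this patch assumes it behaves (a sketch, not the kernel's actual
res_counter implementation): charge fails past the limit, uncharge simply
gives the charge back.

```c
#include <assert.h>

/* Hypothetical model of a resource counter with a hard limit. */
struct res_counter {
	unsigned long usage;
	unsigned long limit;
};

/* Charge `val` bytes; fail (non-zero) if it would exceed the limit. */
static int res_counter_charge(struct res_counter *res, unsigned long val)
{
	if (res->usage + val > res->limit)
		return -1;	/* stands in for -ENOMEM */
	res->usage += val;
	return 0;
}

/* Give back a previous charge; uncharging can never fail. */
static void res_counter_uncharge(struct res_counter *res, unsigned long val)
{
	assert(res->usage >= val);	/* uncharging more than charged is a bug */
	res->usage -= val;
}
```

Folding uncharge into charge(-val) would work, but the asymmetry matters:
charge can fail against the limit, uncharge never should, and the split makes
that obvious at the call sites.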
>> +/*
>> + * TODO: get the attach callbacks to fail and disallow task movement.
>> + */
>
> You mean disallow all movement within a hierarchy that has this cgroup
> mounted? Doesn't that make it rather hard to use?
>
Consider the following scenario: we try to move task "t1" from cgroup "A" to
cgroup "B", and doing so causes "B" to go over its limit. What do we do?
Ideally, we would like to be able to go back to cgroups and say: please fail
the attach, since it takes "B" over its specified limit.
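The move itself follows the charge-destination-first pattern used below in
rlimit_cgroup_move_task(). A userspace sketch of that transfer (using the same
hypothetical counter model, not the kernel API) shows why a failed move is
harmless, and why we would like attach to fail rather than silently leave the
task accounted in the old group:

```c
#include <assert.h>

struct res_counter {
	unsigned long usage;
	unsigned long limit;
};

static int res_counter_charge(struct res_counter *res, unsigned long val)
{
	if (res->usage + val > res->limit)
		return -1;
	res->usage += val;
	return 0;
}

static void res_counter_uncharge(struct res_counter *res, unsigned long val)
{
	res->usage -= val;
}

/*
 * Move `size` bytes of accounting from old_res to new_res. Charge the
 * destination first; only on success drop the charge from the source,
 * so a failed move leaves both counters exactly as they were.
 */
static int move_charge(struct res_counter *new_res,
		       struct res_counter *old_res, unsigned long size)
{
	if (res_counter_charge(new_res, size))
		return -1;	/* destination over limit: fail the move */
	res_counter_uncharge(old_res, size);
	return 0;
}
```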
>> +static void rlimit_cgroup_move_task(struct cgroup_subsys *ss,
>> + struct cgroup *cgrp,
>> + struct cgroup *old_cgrp,
>> + struct task_struct *p)
>> +{
>> + struct mm_struct *mm;
>> + struct rlimit_cgroup *rcg, *old_rcg;
>> +
>> + mm = get_task_mm(p);
>> + if (mm == NULL)
>> + return;
>> +
>> + rcu_read_lock();
>> + if (p != rcu_dereference(mm->owner))
>> + goto out;
>> +
>> + rcg = rlimit_cgroup_from_cgrp(cgrp);
>> + old_rcg = rlimit_cgroup_from_cgrp(old_cgrp);
>> +
>> + if (rcg == old_rcg)
>> + goto out;
>> +
>> + if (res_counter_charge(&rcg->as_res, (mm->total_vm << PAGE_SHIFT)))
>> + goto out;
>> + res_counter_uncharge(&old_rcg->as_res, (mm->total_vm << PAGE_SHIFT));
>> +out:
>> + rcu_read_unlock();
>> + mmput(mm);
>> +}
>> +
>
> Since you need to protect against concurrent charges, and against
> concurrent mm ownership changes, I think you should just do this under
> task_lock(p).
Please see my comment above on task_lock(). I'll draw some diagrams on the
whiteboard and check the sanity of my argument.
>
>> +static void rlimit_cgroup_mm_owner_changed(struct cgroup_subsys *ss,
>> + struct cgroup *cgrp,
>> + struct cgroup *old_cgrp,
>> + struct task_struct *p)
>> +{
>> + struct rlimit_cgroup *rcg, *old_rcg;
>> + struct mm_struct *mm = get_task_mm(p);
>> +
>> + BUG_ON(!mm);
>> + rcg = rlimit_cgroup_from_cgrp(cgrp);
>> + old_rcg = rlimit_cgroup_from_cgrp(old_cgrp);
>> + if (res_counter_charge(&rcg->as_res, (mm->total_vm << PAGE_SHIFT)))
>> + goto out;
>> + res_counter_uncharge(&old_rcg->as_res, (mm->total_vm << PAGE_SHIFT));
>> +out:
>> + mmput(mm);
>> +}
>> +
>
> Also needs to task_lock(p) to prevent concurrent charges or cgroup
> reassignments?
>
The callbacks are called with task_lock() held. Please see
mm_update_next_owner->cgroup_mm_owner_callbacks() in kernel/exit.c
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL