From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Pavel Emelyanov <xemul@openvz.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hugh@veritas.com>,
Sudhir Kumar <skumar@linux.vnet.ibm.com>,
YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
Paul Menage <menage@google.com>,
lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org,
taka@valinux.co.jp, linux-mm@kvack.org,
David Rientjes <rientjes@google.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [RFC][2/3] Account and control virtual address space allocations (v2)
Date: Thu, 27 Mar 2008 14:00:31 +0530
Message-ID: <47EB5B27.2050907@linux.vnet.ibm.com>
In-Reply-To: <47EB59C3.3080803@openvz.org>
Pavel Emelyanov wrote:
> Balbir Singh wrote:
>> Pavel Emelyanov wrote:
>>> Balbir Singh wrote:
>>>> Changelog v2
>>>> ------------
>>>> Change the accounting to what is already present in the kernel. Split
>>>> the address space accounting into mem_cgroup_charge_as and
>>>> mem_cgroup_uncharge_as. At the time of VM expansion, call
>>>> mem_cgroup_cannot_expand_as to check if the new allocation will push
>>>> us over the limit
>>>>
>>>> This patch implements accounting and control of virtual address space.
>>>> Accounting is done when the virtual address space of any task/mm_struct
>>>> belonging to the cgroup is incremented or decremented. This patch
>>>> fails the expansion if the cgroup goes over its limit.
>>>>
>>>> TODOs
>>>>
>>>> 1. Only when CONFIG_MMU is enabled, is the virtual address space control
>>>> enabled. Should we do this for nommu cases as well? My suspicion is
>>>> that we don't have to.
>>>>
>>>> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
>>>> ---
>>>>
>>>> arch/ia64/kernel/perfmon.c | 2 +
>>>> arch/x86/kernel/ptrace.c | 7 +++
>>>> fs/exec.c | 2 +
>>>> include/linux/memcontrol.h | 26 +++++++++++++
>>>> include/linux/res_counter.h | 19 ++++++++--
>>>> init/Kconfig | 2 -
>>>> kernel/fork.c | 17 +++++++--
>>>> mm/memcontrol.c | 83 ++++++++++++++++++++++++++++++++++++++++++++
>>>> mm/mmap.c | 11 +++++
>>>> mm/mremap.c | 2 +
>>>> 10 files changed, 163 insertions(+), 8 deletions(-)
>>>>
>>>> diff -puN mm/memcontrol.c~memory-controller-virtual-address-space-accounting-and-control mm/memcontrol.c
>>>> --- linux-2.6.25-rc5/mm/memcontrol.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-26 16:27:59.000000000 +0530
>>>> +++ linux-2.6.25-rc5-balbir/mm/memcontrol.c 2008-03-27 00:18:16.000000000 +0530
>>>> @@ -526,6 +526,76 @@ unsigned long mem_cgroup_isolate_pages(u
>>>> return nr_taken;
>>>> }
>>>>
>>>> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_AS
>>>> +/*
>>>> + * Charge the address space usage for cgroup. This routine is most
>>>> + * likely to be called from places that expand the total_vm of a mm_struct.
>>>> + */
>>>> +void mem_cgroup_charge_as(struct mm_struct *mm, long nr_pages)
>>>> +{
>>>> + struct mem_cgroup *mem;
>>>> +
>>>> + if (mem_cgroup_subsys.disabled)
>>>> + return;
>>>> +
>>>> + rcu_read_lock();
>>>> + mem = rcu_dereference(mm->mem_cgroup);
>>>> + css_get(&mem->css);
>>>> + rcu_read_unlock();
>>>> +
>>>> + res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE));
>>>> + css_put(&mem->css);
>>> Why don't you check whether the counter is charged? This is
>>> bad for two reasons:
>>> 1. you allow for some growth above the limit (e.g. in expand_stack)
>> I was doing that earlier and then decided to keep the virtual address space code
>> in sync with the RLIMIT_AS checking code in the kernel. If you see the flow, it
>> closely resembles what we do with mm->total_vm and may_expand_vm().
>> expand_stack() in turn calls acct_stack_growth() which calls may_expand_vm()
>
> But this is racy! Look - you do expand_stack on two CPUs and the limit is
> almost reached - so that there's room for a single expansion. In this case
> may_expand_vm will return true for both, since it only checks the limit,
> while the subsequent charge will fail on one of them, since it actually
> tries to raise the usage...
>
Hmm... yes, possibly. Thanks for pointing this out. For a single mm_struct, the
check is done under mmap_sem, so it's OK within a single process. I suspect I'll
have to go back to what I had earlier. I don't want to add a mutex to mem_cgroup;
that would hurt parallelism badly.
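
To make the race concrete: a separate "check the limit" step followed by an
unconditional charge lets two CPUs both pass the check when there is room for
only one expansion. Below is a minimal user-space sketch of the alternative,
where the check and the charge are one atomic step and the caller must handle
failure. It is not the kernel code: struct as_counter, as_counter_try_charge
and as_counter_uncharge are hypothetical names, and C11 atomics stand in for
whatever locking the real counter uses.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-cgroup address space counter. */
struct as_counter {
	atomic_ulong usage;	/* bytes currently charged */
	unsigned long limit;	/* maximum bytes allowed */
};

/*
 * Check the limit and raise the usage as a single atomic step.
 * Returns 0 on success, -ENOMEM if the charge would exceed the limit.
 * Two concurrent callers can never both take the last free slot,
 * because the compare-and-swap only succeeds for one of them.
 */
static int as_counter_try_charge(struct as_counter *c, unsigned long val)
{
	unsigned long old = atomic_load(&c->usage);

	do {
		/* Written this way so that old + val cannot overflow. */
		if (old > c->limit || val > c->limit - old)
			return -ENOMEM;
		/* On failure, 'old' is reloaded with the current usage. */
	} while (!atomic_compare_exchange_weak(&c->usage, &old, old + val));

	return 0;
}

/* Give back a previously successful charge. */
static void as_counter_uncharge(struct as_counter *c, unsigned long val)
{
	atomic_fetch_sub(&c->usage, val);
}
```

With this shape, a caller such as the stack expansion path would charge first
and back out if the charge fails, instead of relying on an earlier check that
may already be stale. No extra mutex on the cgroup is needed for that.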
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL