From: Pavel Emelyanov <xemul@openvz.org>
To: balbir@linux.vnet.ibm.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hugh@veritas.com>,
	Sudhir Kumar <skumar@linux.vnet.ibm.com>,
	YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
	Paul Menage <menage@google.com>,
	lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org,
	taka@valinux.co.jp, linux-mm@kvack.org,
	David Rientjes <rientjes@google.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [RFC][2/3] Account and control virtual address space allocations (v2)
Date: Thu, 27 Mar 2008 11:24:35 +0300
Message-ID: <47EB59C3.3080803@openvz.org>
In-Reply-To: <47EB548D.2050609@linux.vnet.ibm.com>

Balbir Singh wrote:
> Pavel Emelyanov wrote:
>> Balbir Singh wrote:
>>> Changelog v2
>>> ------------
>>> Change the accounting to what is already present in the kernel. Split
>>> the address space accounting into mem_cgroup_charge_as and
>>> mem_cgroup_uncharge_as. At the time of VM expansion, call
>>> mem_cgroup_cannot_expand_as to check if the new allocation will push
>>> us over the limit
>>>
>>> This patch implements accounting and control of virtual address space.
>>> Accounting is done when the virtual address space of any task/mm_struct
>>> belonging to the cgroup is incremented or decremented. This patch
>>> fails the expansion if the cgroup goes over its limit.
>>>
>>> TODOs
>>>
>>> 1. Only when CONFIG_MMU is enabled, is the virtual address space control
>>>    enabled. Should we do this for nommu cases as well? My suspicion is
>>>    that we don't have to.
>>>
>>> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
>>> ---
>>>
>>>  arch/ia64/kernel/perfmon.c  |    2 +
>>>  arch/x86/kernel/ptrace.c    |    7 +++
>>>  fs/exec.c                   |    2 +
>>>  include/linux/memcontrol.h  |   26 +++++++++++++
>>>  include/linux/res_counter.h |   19 ++++++++--
>>>  init/Kconfig                |    2 -
>>>  kernel/fork.c               |   17 +++++++--
>>>  mm/memcontrol.c             |   83 ++++++++++++++++++++++++++++++++++++++++++++
>>>  mm/mmap.c                   |   11 +++++
>>>  mm/mremap.c                 |    2 +
>>>  10 files changed, 163 insertions(+), 8 deletions(-)
>>>
>>> diff -puN mm/memcontrol.c~memory-controller-virtual-address-space-accounting-and-control mm/memcontrol.c
>>> --- linux-2.6.25-rc5/mm/memcontrol.c~memory-controller-virtual-address-space-accounting-and-control	2008-03-26 16:27:59.000000000 +0530
>>> +++ linux-2.6.25-rc5-balbir/mm/memcontrol.c	2008-03-27 00:18:16.000000000 +0530
>>> @@ -526,6 +526,76 @@ unsigned long mem_cgroup_isolate_pages(u
>>>  	return nr_taken;
>>>  }
>>>  
>>> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_AS
>>> +/*
>>> + * Charge the address space usage for cgroup. This routine is most
>>> + * likely to be called from places that expand the total_vm of a mm_struct.
>>> + */
>>> +void mem_cgroup_charge_as(struct mm_struct *mm, long nr_pages)
>>> +{
>>> +	struct mem_cgroup *mem;
>>> +
>>> +	if (mem_cgroup_subsys.disabled)
>>> +		return;
>>> +
>>> +	rcu_read_lock();
>>> +	mem = rcu_dereference(mm->mem_cgroup);
>>> +	css_get(&mem->css);
>>> +	rcu_read_unlock();
>>> +
>>> +	res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE));
>>> +	css_put(&mem->css);
>> Why don't you check whether the counter is charged? This is
>> bad for two reasons:
>> 1. you allow for some growth above the limit (e.g. in expand_stack)
> 
> I was doing that earlier and then decided to keep the virtual address space code
> in sync with the RLIMIT_AS checking code in the kernel. If you see the flow, it
> closely resembles what we do with mm->total_vm and may_expand_vm().
> expand_stack() in turn calls acct_stack_growth() which calls may_expand_vm()

But this is racy! Suppose expand_stack() runs on two CPUs at once while the
limit is almost reached, so that there is room for only a single expansion.
may_expand_vm() will then return true on both CPUs, since it only checks the
limit, while the subsequent charge will fail on one of them, since it
actually tries to raise the usage. And since the return value of
res_counter_charge() is not checked, nobody notices that failure.
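
To make the interleaving concrete, here is a minimal userspace sketch, not
kernel code. The toy_counter type and the toy_* and replay_* helpers are
hypothetical stand-ins for res_counter and for the may_expand_vm()/charge
pair. The real counter takes a spinlock; it is left out here because the two
CPUs are simply replayed one after the other in the problematic order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct res_counter (no locking, see above). */
struct toy_counter {
	unsigned long usage;
	unsigned long limit;
};

static void toy_counter_init(struct toy_counter *c, unsigned long limit)
{
	assert(c != NULL);
	c->usage = 0;
	c->limit = limit;
}

/* Check only: "would val fit right now?" Nothing is reserved. */
static bool toy_can_charge(const struct toy_counter *c, unsigned long val)
{
	return c->usage + val <= c->limit;
}

/* Check and charge in one step: 0 on success, -1 if over the limit. */
static int toy_try_charge(struct toy_counter *c, unsigned long val)
{
	if (c->usage + val > c->limit)
		return -1;
	c->usage += val;
	return 0;
}

/*
 * Pattern under discussion: both CPUs check first, then both charge and
 * expand regardless of the charge result.
 * Returns the amount that was expanded without being accounted.
 */
static unsigned long replay_check_then_charge(struct toy_counter *c,
					      unsigned long val)
{
	bool a_ok = toy_can_charge(c, val);	/* CPU A: check passes */
	bool b_ok = toy_can_charge(c, val);	/* CPU B: check passes too */
	unsigned long unaccounted = 0;

	if (a_ok && toy_try_charge(c, val) != 0)
		unaccounted += val;
	if (b_ok && toy_try_charge(c, val) != 0)
		unaccounted += val;		/* B expands anyway */
	return unaccounted;
}

/*
 * Alternative: the charge itself is the check, and a CPU expands only if
 * its charge succeeded. Returns the amount actually expanded.
 */
static unsigned long replay_charge_is_check(struct toy_counter *c,
					    unsigned long val)
{
	unsigned long expanded = 0;

	if (toy_try_charge(c, val) == 0)	/* CPU A */
		expanded += val;
	if (toy_try_charge(c, val) == 0)	/* CPU B: fails, backs off */
		expanded += val;
	return expanded;
}
```

With a limit of 4 pages and 3 already charged, replaying a one-page
expansion through the first helper leaves one page unaccounted, while the
second helper lets exactly one CPU expand and the counter stays accurate.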

>> 2. you will undercharge it in the future when uncharging the
>>    vme, whose charge was failed and thus unaccounted.
> 
> Hmmm...  This should ideally never happen, since we do a may_expand_vm() before
> expanding the VM and in our case the virtual address space usage. I've not seen
> it during my runs either. But it is something to keep in mind.
> 
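
A minimal userspace sketch of the second point, assuming a toy counter
rather than the real res_counter. All names here (toy_counter, toy_charge,
toy_uncharge, replay_ignored_failure) are hypothetical, and the clamping in
toy_uncharge is an assumption made only to keep the toy counter from
wrapping. It replays a mapping whose charge failed silently and is later
uncharged in full.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct res_counter, without locking. */
struct toy_counter {
	unsigned long usage;
	unsigned long limit;
};

/* Returns 0 on success, -1 if the charge would exceed the limit. */
static int toy_charge(struct toy_counter *c, unsigned long val)
{
	assert(c != NULL);
	if (c->usage + val > c->limit)
		return -1;
	c->usage += val;
	return 0;
}

/* Uncharge, clamped at zero so the toy counter never wraps. */
static void toy_uncharge(struct toy_counter *c, unsigned long val)
{
	assert(c != NULL);
	if (val > c->usage)
		val = c->usage;
	c->usage -= val;
}

/*
 * Replays: map A (3 pages, charged), map B (2 pages, charge fails but the
 * result is ignored), then unmap B. Stores the pages that are really still
 * mapped in *live_pages and returns what the counter reports.
 */
static unsigned long replay_ignored_failure(struct toy_counter *c,
					    unsigned long *live_pages)
{
	*live_pages = 0;

	(void)toy_charge(c, 3);		/* A: accounted */
	*live_pages += 3;

	(void)toy_charge(c, 2);		/* B: fails, result dropped */
	*live_pages += 2;		/* ... but B is mapped anyway */

	toy_uncharge(c, 2);		/* unmap B: uncharges A's pages */
	*live_pages -= 2;

	return c->usage;
}
```

With a limit of 4 pages, the counter ends up reporting 1 page while 3 pages
are still mapped, so the group is undercharged from then on.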


