* [RFC][0/3] Virtual address space control for cgroups
@ 2008-03-16 17:29 Balbir Singh
2008-03-16 17:29 ` [RFC][1/3] Add user interface for virtual address space control Balbir Singh
` (3 more replies)
0 siblings, 4 replies; 30+ messages in thread
From: Balbir Singh @ 2008-03-16 17:29 UTC (permalink / raw)
To: linux-mm
Cc: Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
linux-kernel, taka, David Rientjes, Pavel Emelianov, Balbir Singh,
Andrew Morton, KAMEZAWA Hiroyuki
This is an early patchset for virtual address space control for cgroups.
The patches are against 2.6.25-rc5-mm1 and have been tested on top of
User Mode Linux.
The first patch adds the user interface. The second patch adds accounting
and control. The third patch updates documentation.
Review suggestions would be appreciated along the lines of
1. What is missing? Are all virtual address space expansion points covered?
2. Do we need to account and control address space at insert_special_mapping?
3. Address space accounting may contain duplications. Do we need to avoid it?
Comments?
series
memory-controller-virtual-address-space-control-user-interface
memory-controller-virtual-address-space-accounting-and-control
memory-controller-virtual-address-control-documentation
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 30+ messages in thread* [RFC][1/3] Add user interface for virtual address space control 2008-03-16 17:29 [RFC][0/3] Virtual address space control for cgroups Balbir Singh @ 2008-03-16 17:29 ` Balbir Singh 2008-03-16 17:30 ` [RFC][2/3] Account and control virtual address space allocations Balbir Singh ` (2 subsequent siblings) 3 siblings, 0 replies; 30+ messages in thread From: Balbir Singh @ 2008-03-16 17:29 UTC (permalink / raw) To: linux-mm Cc: Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Balbir Singh, Andrew Morton, KAMEZAWA Hiroyuki Add as_usage_in_bytes and as_limit_in_bytes interfaces. These provide control over the total address space that the processes combined together in the cgroup can grow upto. This functionality is analogous to the RLIMIT_AS function of the getrlimit(2) and setrlimit(2) calls. A as_res resource counter is added to the mem_cgroup structure. The as_res counter handles all the accounting associated with the virtual address space accounting and control of cgroups. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> --- mm/memcontrol.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff -puN mm/memcontrol.c~memory-controller-virtual-address-space-control-user-interface mm/memcontrol.c --- linux-2.6.25-rc5/mm/memcontrol.c~memory-controller-virtual-address-space-control-user-interface 2008-03-16 22:57:38.000000000 +0530 +++ linux-2.6.25-rc5-balbir/mm/memcontrol.c 2008-03-16 22:57:38.000000000 +0530 @@ -128,6 +128,10 @@ struct mem_cgroup { */ struct res_counter res; /* + * Address space limits + */ + struct res_counter as_res; + /* * Per cgroup active and inactive list, similar to the * per zone LRU lists. */ @@ -870,6 +874,21 @@ static ssize_t mem_cgroup_write(struct c mem_cgroup_write_strategy); } +static u64 mem_cgroup_as_read(struct cgroup *cont, struct cftype *cft) +{ + return res_counter_read_u64(&mem_cgroup_from_cont(cont)->as_res, + cft->private); +} + +static ssize_t mem_cgroup_as_write(struct cgroup *cont, struct cftype *cft, + struct file *file, const char __user *userbuf, + size_t nbytes, loff_t *ppos) +{ + return res_counter_write(&mem_cgroup_from_cont(cont)->as_res, + cft->private, userbuf, nbytes, ppos, + mem_cgroup_write_strategy); +} + static ssize_t mem_force_empty_write(struct cgroup *cont, struct cftype *cft, struct file *file, const char __user *userbuf, @@ -931,6 +950,17 @@ static struct cftype mem_cgroup_files[] .read_u64 = mem_cgroup_read, }, { + .name = "as_usage_in_bytes", + .private = RES_USAGE, + .read_u64 = mem_cgroup_as_read, + }, + { + .name = "as_limit_in_bytes", + .private = RES_LIMIT, + .write = mem_cgroup_as_write, + .read_u64 = mem_cgroup_as_read, + }, + { .name = "failcnt", .private = RES_FAILCNT, .read_u64 = mem_cgroup_read, @@ -999,6 +1029,7 @@ mem_cgroup_create(struct cgroup_subsys * return ERR_PTR(-ENOMEM); res_counter_init(&mem->res); + res_counter_init(&mem->as_res); memset(&mem->info, 0, sizeof(mem->info)); diff -puN include/linux/memcontrol.h~memory-controller-virtual-address-space-control-user-interface include/linux/memcontrol.h _ -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC][2/3] Account and control virtual address space allocations 2008-03-16 17:29 [RFC][0/3] Virtual address space control for cgroups Balbir Singh 2008-03-16 17:29 ` [RFC][1/3] Add user interface for virtual address space control Balbir Singh @ 2008-03-16 17:30 ` Balbir Singh 2008-03-17 2:02 ` Paul Menage ` (3 more replies) 2008-03-16 17:30 ` [RFC][3/3] Update documentation for virtual address space control Balbir Singh 2008-03-16 23:26 ` [RFC][0/3] Virtual address space control for cgroups Paul Menage 3 siblings, 4 replies; 30+ messages in thread From: Balbir Singh @ 2008-03-16 17:30 UTC (permalink / raw) To: linux-mm Cc: Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Balbir Singh, Andrew Morton, KAMEZAWA Hiroyuki This patch implements accounting and control of virtual address space. Accounting is done when the virtual address space of any task/mm_struct belonging to the cgroup is incremented or decremented. This patch fails the expansion if the cgroup goes over its limit. A new function mem_cgroup_update_as() is added to deal with the accounting of the virtual address space usage of cgroups. TODOs 1. IA64 has code in perfmon.c pfm_smpl_buffer_alloc(), which increments the total_vm of the mm_struct. This code has not yet been brought into virtual address space control 2. Only when CONFIG_MMU is enabled, is the virtual address space control enabled. Should we do this for nommu cases as well? My suspicion is that we don't have to. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> --- arch/x86/kernel/ptrace.c | 10 +++++++++- include/linux/memcontrol.h | 7 +++++++ init/Kconfig | 4 +++- kernel/fork.c | 9 +++++++-- mm/memcontrol.c | 37 +++++++++++++++++++++++++++++++++++++ mm/memory.c | 5 +++++ mm/mmap.c | 22 ++++++++++++++++++++-- mm/mremap.c | 21 ++++++++++++++++++--- 8 files changed, 106 insertions(+), 9 deletions(-) diff -puN mm/memcontrol.c~memory-controller-virtual-address-space-accounting-and-control mm/memcontrol.c --- linux-2.6.25-rc5/mm/memcontrol.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530 +++ linux-2.6.25-rc5-balbir/mm/memcontrol.c 2008-03-16 22:57:40.000000000 +0530 @@ -525,6 +525,32 @@ unsigned long mem_cgroup_isolate_pages(u } /* + * Check if the current cgroup exceeds its address space limit. + * Returns 0 on success and 1 on failure. + */ +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages) +{ + int ret = 0; + struct mem_cgroup *mem; + if (mem_cgroup_subsys.disabled) + return ret; + + rcu_read_lock(); + mem = rcu_dereference(mm->mem_cgroup); + css_get(&mem->css); + rcu_read_unlock(); + + if (nr_pages > 0) { + if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE))) + ret = 1; + } else + res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE)); + + css_put(&mem->css); + return ret; +} + +/* * Charge the memory controller for page usage. * Return * 0 if the charge was successful @@ -1103,6 +1129,17 @@ static void mem_cgroup_move_task(struct goto out; css_get(&mem->css); + /* + * For address space accounting, the charges are migrated. + * We need to migrate it since all the future uncharge/charge will + * now happen to the new cgroup. For consistency, we need to migrate + * all charges, otherwise we could end up dropping charges from + * the new cgroup (even though they were incurred in the current + * group). + */ + if (res_counter_charge(&mem->as_res, mm->total_vm)) + goto out; + res_counter_uncharge(&old_mem->as_res, mm->total_vm); rcu_assign_pointer(mm->mem_cgroup, mem); css_put(&old_mem->css); diff -puN include/linux/memcontrol.h~memory-controller-virtual-address-space-accounting-and-control include/linux/memcontrol.h --- linux-2.6.25-rc5/include/linux/memcontrol.h~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530 +++ linux-2.6.25-rc5-balbir/include/linux/memcontrol.h 2008-03-16 22:57:40.000000000 +0530 @@ -54,6 +54,7 @@ int task_in_mem_cgroup(struct task_struc extern int mem_cgroup_prepare_migration(struct page *page); extern void mem_cgroup_end_migration(struct page *page); extern void mem_cgroup_page_migration(struct page *page, struct page *newpage); +extern int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages); /* * For memory reclaim. @@ -172,6 +173,12 @@ static inline long mem_cgroup_calc_recla { return 0; } + +static inline int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages) +{ + return 0; +} + #endif /* CONFIG_CGROUP_MEM_CONT */ #endif /* _LINUX_MEMCONTROL_H */ diff -puN mm/mmap.c~memory-controller-virtual-address-space-accounting-and-control mm/mmap.c --- linux-2.6.25-rc5/mm/mmap.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530 +++ linux-2.6.25-rc5-balbir/mm/mmap.c 2008-03-16 22:57:40.000000000 +0530 @@ -1117,6 +1117,9 @@ munmap_back: } } + if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT)) + return -ENOMEM; + /* * Can we just expand an old private anonymous mapping? * The VM_SHARED test is necessary because shmem_zero_setup @@ -1226,8 +1229,11 @@ unmap_and_free_vma: free_vma: kmem_cache_free(vm_area_cachep, vma); unacct_error: - if (charged) + if (charged) { + mem_cgroup_update_as(mm, -charged); vm_unacct_memory(charged); + } +unacct_as_error: return error; } @@ -1555,6 +1561,9 @@ static int acct_stack_growth(struct vm_a if (security_vm_enough_memory(grow)) return -ENOMEM; + if (mem_cgroup_update_as(mm, grow)) + return -ENOMEM; + /* Ok, everything looks good - let it rip */ mm->total_vm += grow; if (vma->vm_flags & VM_LOCKED) @@ -2003,9 +2012,14 @@ unsigned long do_brk(unsigned long addr, if (mm->map_count > sysctl_max_map_count) return -ENOMEM; - if (security_vm_enough_memory(len >> PAGE_SHIFT)) + if (mem_cgroup_update_as(mm, (len >> PAGE_SHIFT))) return -ENOMEM; + if (security_vm_enough_memory(len >> PAGE_SHIFT)) { + mem_cgroup_update_as(mm, -(len >> PAGE_SHIFT)); + return -ENOMEM; + } + /* Can we just expand an old private anonymous mapping? */ if (vma_merge(mm, prev, addr, addr + len, flags, NULL, NULL, pgoff, NULL)) @@ -2236,6 +2250,9 @@ int install_special_mapping(struct mm_st if (unlikely(vma == NULL)) return -ENOMEM; + if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT)) + return -ENOMEM; + vma->vm_mm = mm; vma->vm_start = addr; vma->vm_end = addr + len; @@ -2248,6 +2265,7 @@ int install_special_mapping(struct mm_st if (unlikely(insert_vm_struct(mm, vma))) { kmem_cache_free(vm_area_cachep, vma); + mem_cgroup_update_as(mm, -(len >> PAGE_SHIFT)); return -ENOMEM; } diff -puN arch/x86/kernel/ptrace.c~memory-controller-virtual-address-space-accounting-and-control arch/x86/kernel/ptrace.c --- linux-2.6.25-rc5/arch/x86/kernel/ptrace.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530 +++ linux-2.6.25-rc5-balbir/arch/x86/kernel/ptrace.c 2008-03-16 22:57:40.000000000 +0530 @@ -20,6 +20,7 @@ #include <linux/audit.h> #include <linux/seccomp.h> #include <linux/signal.h> +#include <linux/memcontrol.h> #include <asm/uaccess.h> #include <asm/pgtable.h> @@ -787,6 +788,8 @@ static int ptrace_bts_realloc(struct tas current->mm->total_vm -= old_size; current->mm->locked_vm -= old_size; + mem_cgroup_update_as(current->mm, -old_size); + if (size == 0) goto out; @@ -816,10 +819,15 @@ static int ptrace_bts_realloc(struct tas goto out; } + if (mem_cgroup_update_as(current->mm, size)) + goto out; + ret = ds_allocate((void **)&child->thread.ds_area_msr, size << PAGE_SHIFT); - if (ret < 0) + if (ret < 0) { + mem_cgroup_update_as(current->mm, -size); goto out; + } current->mm->total_vm += size; current->mm->locked_vm += size; diff -puN kernel/fork.c~memory-controller-virtual-address-space-accounting-and-control kernel/fork.c --- linux-2.6.25-rc5/kernel/fork.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530 +++ linux-2.6.25-rc5-balbir/kernel/fork.c 2008-03-16 22:57:40.000000000 +0530 @@ -53,6 +53,7 @@ #include <linux/tty.h> #include <linux/proc_fs.h> #include <linux/blkdev.h> +#include <linux/memcontrol.h> #include <asm/pgtable.h> #include <asm/pgalloc.h> @@ -237,6 +238,7 @@ static int dup_mmap(struct mm_struct *mm for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next) { struct file *file; + unsigned int len = vma_pages(mpnt); if (mpnt->vm_flags & VM_DONTCOPY) { long pages = vma_pages(mpnt); @@ -247,11 +249,12 @@ static int dup_mmap(struct mm_struct *mm } charge = 0; if (mpnt->vm_flags & VM_ACCOUNT) { - unsigned int len = (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT; if (security_vm_enough_memory(len)) goto fail_nomem; charge = len; } + if (mem_cgroup_update_as(mm, len)) + goto fail_nomem_as; tmp = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL); if (!tmp) goto fail_nomem; @@ -311,8 +314,10 @@ out: fail_nomem_policy: kmem_cache_free(vm_area_cachep, tmp); fail_nomem: - retval = -ENOMEM; + mem_cgroup_update_as(mm, -charge); vm_unacct_memory(charge); +fail_nomem_as: + retval = -ENOMEM; goto out; } diff -puN mm/mremap.c~memory-controller-virtual-address-space-accounting-and-control mm/mremap.c --- linux-2.6.25-rc5/mm/mremap.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530 +++ linux-2.6.25-rc5-balbir/mm/mremap.c 2008-03-16 22:57:40.000000000 +0530 @@ -174,10 +174,15 @@ static unsigned long move_vma(struct vm_ if (mm->map_count >= sysctl_max_map_count - 3) return -ENOMEM; + if (mem_cgroup_update_as(mm, new_len >> PAGE_SHIFT)) + return -ENOMEM; + new_pgoff = vma->vm_pgoff + ((old_addr - vma->vm_start) >> PAGE_SHIFT); new_vma = copy_vma(&vma, new_addr, new_len, new_pgoff); - if (!new_vma) + if (!new_vma) { + mem_cgroup_update_as(mm, -(new_len >> PAGE_SHIFT)); return -ENOMEM; + } moved_len = move_page_tables(vma, old_addr, new_vma, new_addr, old_len); if (moved_len < old_len) { @@ -187,6 +192,7 @@ static unsigned long move_vma(struct vm_ * and then proceed to unmap new area instead of old. */ move_page_tables(new_vma, new_addr, vma, old_addr, moved_len); + mem_cgroup_update_as(mm, -(new_len >> PAGE_SHIFT)); vma = new_vma; old_len = new_len; old_addr = new_addr; @@ -347,10 +353,17 @@ unsigned long do_mremap(unsigned long ad goto out; } + if (mem_cgroup_update_as(mm, (new_len - old_len) >> PAGE_SHIFT)) { + ret = -ENOMEM; + goto out; + } + if (vma->vm_flags & VM_ACCOUNT) { charged = (new_len - old_len) >> PAGE_SHIFT; - if (security_vm_enough_memory(charged)) + if (security_vm_enough_memory(charged)) { + mem_cgroup_update_as(mm, -charged); goto out_nc; + } } /* old_len exactly to the end of the area.. @@ -406,8 +419,10 @@ unsigned long do_mremap(unsigned long ad ret = move_vma(vma, addr, old_len, new_len, new_addr); } out: - if (ret & ~PAGE_MASK) + if (ret & ~PAGE_MASK) { vm_unacct_memory(charged); + mem_cgroup_update_as(mm, -charged); + } out_nc: return ret; } diff -puN init/Kconfig~memory-controller-virtual-address-space-accounting-and-control init/Kconfig --- linux-2.6.25-rc5/init/Kconfig~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530 +++ linux-2.6.25-rc5-balbir/init/Kconfig 2008-03-16 22:57:40.000000000 +0530 @@ -369,7 +369,9 @@ config CGROUP_MEM_RES_CTLR depends on CGROUPS && RESOURCE_COUNTERS help Provides a memory resource controller that manages both page cache and - RSS memory. + RSS memory. It also provide accounting and control of address + space allocations (along the lines of RLIMIT_AS) for cgroups + when CONFIG_MMU is enabled. Note that setting this option increases fixed memory overhead associated with each page of memory in the system by 4/8 bytes diff -puN mm/swapfile.c~memory-controller-virtual-address-space-accounting-and-control mm/swapfile.c diff -puN mm/memory.c~memory-controller-virtual-address-space-accounting-and-control mm/memory.c --- linux-2.6.25-rc5/mm/memory.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530 +++ linux-2.6.25-rc5-balbir/mm/memory.c 2008-03-16 22:57:40.000000000 +0530 @@ -838,6 +838,11 @@ unsigned long unmap_vmas(struct mmu_gath if (vma->vm_flags & VM_ACCOUNT) *nr_accounted += (end - start) >> PAGE_SHIFT; + /* + * Unaccount used virtual memory for cgroups + */ + mem_cgroup_update_as(vma->vm_mm, + ((long)(start - end)) >> PAGE_SHIFT); while (start != end) { if (!tlb_start_valid) { _ -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-16 17:30 ` [RFC][2/3] Account and control virtual address space allocations Balbir Singh @ 2008-03-17 2:02 ` Paul Menage 2008-03-17 2:57 ` Balbir Singh 2008-03-17 11:36 ` Pavel Emelyanov ` (2 subsequent siblings) 3 siblings, 1 reply; 30+ messages in thread From: Paul Menage @ 2008-03-17 2:02 UTC (permalink / raw) To: Balbir Singh Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki On Mon, Mar 17, 2008 at 1:30 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > /* > + * Check if the current cgroup exceeds its address space limit. > + * Returns 0 on success and 1 on failure. > + */ > +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages) > +{ > + int ret = 0; > + struct mem_cgroup *mem; > + if (mem_cgroup_subsys.disabled) > + return ret; > + > + rcu_read_lock(); > + mem = rcu_dereference(mm->mem_cgroup); > + css_get(&mem->css); > + rcu_read_unlock(); > + How about if this function avoided charging the root cgroup? You'd save 4 atomic operations on a global data structure on every mmap/munmap when the virtual address limit cgroup wasn't in use, which could be significant on a large system. And I don't see situations where you really need to limit the address space of the root cgroup. Paul -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-17 2:02 ` Paul Menage @ 2008-03-17 2:57 ` Balbir Singh 2008-03-17 3:03 ` Paul Menage 0 siblings, 1 reply; 30+ messages in thread From: Balbir Singh @ 2008-03-17 2:57 UTC (permalink / raw) To: Paul Menage Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki Paul Menage wrote: > On Mon, Mar 17, 2008 at 1:30 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: >> /* >> + * Check if the current cgroup exceeds its address space limit. >> + * Returns 0 on success and 1 on failure. >> + */ >> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages) >> +{ >> + int ret = 0; >> + struct mem_cgroup *mem; >> + if (mem_cgroup_subsys.disabled) >> + return ret; >> + >> + rcu_read_lock(); >> + mem = rcu_dereference(mm->mem_cgroup); >> + css_get(&mem->css); >> + rcu_read_unlock(); >> + > > How about if this function avoided charging the root cgroup? You'd > save 4 atomic operations on a global data structure on every > mmap/munmap when the virtual address limit cgroup wasn't in use, which > could be significant on a large system. And I don't see situations > where you really need to limit the address space of the root cgroup. 4 atomic operations is very tempting, but we want to account for root usage due to the following reasons: 1. We want to be able to support hierarchial accounting and control 2. We want to track usage of the root cgroup and report it back to the user 3. We don't want to treat the root cgroup as a special case. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-17 2:57 ` Balbir Singh @ 2008-03-17 3:03 ` Paul Menage 0 siblings, 0 replies; 30+ messages in thread From: Paul Menage @ 2008-03-17 3:03 UTC (permalink / raw) To: balbir Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki On Mon, Mar 17, 2008 at 10:57 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > 1. We want to be able to support hierarchial accounting and control > 2. We want to track usage of the root cgroup and report it back to the user What use cases do you have for that? > 3. We don't want to treat the root cgroup as a special case. Why? It is a special case, in that in a lot of machines there's only going to be the root cgroup, and the subsystem won't be mounted. So in those cases, paying any overhead is a cost without a benefit. Alternatively, how about you skip tracking virtual address space changes if the virtual address cgroup isn't mounted on any hierarchy? When you mount it, you can do a pass across all mms and set the root cgroup usage to their total. Paul -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-16 17:30 ` [RFC][2/3] Account and control virtual address space allocations Balbir Singh 2008-03-17 2:02 ` Paul Menage @ 2008-03-17 11:36 ` Pavel Emelyanov 2008-03-17 12:29 ` Balbir Singh 2008-03-17 16:53 ` Dave Hansen 2008-03-17 23:35 ` YAMAMOTO Takashi 3 siblings, 1 reply; 30+ messages in thread From: Pavel Emelyanov @ 2008-03-17 11:36 UTC (permalink / raw) To: Balbir Singh Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki [snip] > +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages) > +{ > + int ret = 0; > + struct mem_cgroup *mem; > + if (mem_cgroup_subsys.disabled) > + return ret; > + > + rcu_read_lock(); > + mem = rcu_dereference(mm->mem_cgroup); > + css_get(&mem->css); > + rcu_read_unlock(); > + > + if (nr_pages > 0) { > + if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE))) > + ret = 1; > + } else > + res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE)); No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as. [snip] > @@ -1117,6 +1117,9 @@ munmap_back: > } > } > > + if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT)) > + return -ENOMEM; > + Why not use existintg cap_vm_enough_memory and co? [snip] -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-17 11:36 ` Pavel Emelyanov @ 2008-03-17 12:29 ` Balbir Singh 2008-03-17 12:40 ` Pavel Emelyanov 0 siblings, 1 reply; 30+ messages in thread From: Balbir Singh @ 2008-03-17 12:29 UTC (permalink / raw) To: Pavel Emelyanov Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Andrew Morton, KAMEZAWA Hiroyuki Pavel Emelyanov wrote: > [snip] > >> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages) >> +{ >> + int ret = 0; >> + struct mem_cgroup *mem; >> + if (mem_cgroup_subsys.disabled) >> + return ret; >> + >> + rcu_read_lock(); >> + mem = rcu_dereference(mm->mem_cgroup); >> + css_get(&mem->css); >> + rcu_read_unlock(); >> + >> + if (nr_pages > 0) { >> + if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE))) >> + ret = 1; >> + } else >> + res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE)); > > No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as. > > [snip] > Yes, sure :) >> @@ -1117,6 +1117,9 @@ munmap_back: >> } >> } >> >> + if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT)) >> + return -ENOMEM; >> + > > Why not use existintg cap_vm_enough_memory and co? > I thought about it and almost used may_expand_vm(), but there is a slight catch there. With cap_vm_enough_memory() or security_vm_enough_memory(), they are called after total_vm has been calculated. In our case we need to keep the cgroups equivalent of total_vm up to date, and we do this in mem_cgorup_update_as. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-17 12:29 ` Balbir Singh @ 2008-03-17 12:40 ` Pavel Emelyanov 2008-03-17 12:51 ` Balbir Singh 0 siblings, 1 reply; 30+ messages in thread From: Pavel Emelyanov @ 2008-03-17 12:40 UTC (permalink / raw) To: balbir Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Andrew Morton, KAMEZAWA Hiroyuki Balbir Singh wrote: > Pavel Emelyanov wrote: >> [snip] >> >>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages) >>> +{ >>> + int ret = 0; >>> + struct mem_cgroup *mem; >>> + if (mem_cgroup_subsys.disabled) >>> + return ret; >>> + >>> + rcu_read_lock(); >>> + mem = rcu_dereference(mm->mem_cgroup); >>> + css_get(&mem->css); >>> + rcu_read_unlock(); >>> + >>> + if (nr_pages > 0) { >>> + if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE))) >>> + ret = 1; >>> + } else >>> + res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE)); >> No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as. >> >> [snip] >> > > Yes, sure :) Thanks :) >>> @@ -1117,6 +1117,9 @@ munmap_back: >>> } >>> } >>> >>> + if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT)) >>> + return -ENOMEM; >>> + >> Why not use existintg cap_vm_enough_memory and co? >> > > I thought about it and almost used may_expand_vm(), but there is a slight catch > there. With cap_vm_enough_memory() or security_vm_enough_memory(), they are > called after total_vm has been calculated. In our case we need to keep the > cgroups equivalent of total_vm up to date, and we do this in mem_cgorup_update_as. So? What prevents us from using these hooks? :) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-17 12:40 ` Pavel Emelyanov @ 2008-03-17 12:51 ` Balbir Singh 2008-03-17 13:01 ` Pavel Emelyanov 0 siblings, 1 reply; 30+ messages in thread From: Balbir Singh @ 2008-03-17 12:51 UTC (permalink / raw) To: Pavel Emelyanov Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Andrew Morton, KAMEZAWA Hiroyuki Pavel Emelyanov wrote: > Balbir Singh wrote: >> Pavel Emelyanov wrote: >>> [snip] >>> >>>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages) >>>> +{ >>>> + int ret = 0; >>>> + struct mem_cgroup *mem; >>>> + if (mem_cgroup_subsys.disabled) >>>> + return ret; >>>> + >>>> + rcu_read_lock(); >>>> + mem = rcu_dereference(mm->mem_cgroup); >>>> + css_get(&mem->css); >>>> + rcu_read_unlock(); >>>> + >>>> + if (nr_pages > 0) { >>>> + if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE))) >>>> + ret = 1; >>>> + } else >>>> + res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE)); >>> No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as. >>> >>> [snip] >>> >> Yes, sure :) > > Thanks :) > >>>> @@ -1117,6 +1117,9 @@ munmap_back: >>>> } >>>> } >>>> >>>> + if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT)) >>>> + return -ENOMEM; >>>> + >>> Why not use existintg cap_vm_enough_memory and co? >>> >> I thought about it and almost used may_expand_vm(), but there is a slight catch >> there. With cap_vm_enough_memory() or security_vm_enough_memory(), they are >> called after total_vm has been calculated. In our case we need to keep the >> cgroups equivalent of total_vm up to date, and we do this in mem_cgorup_update_as. > > So? What prevents us from using these hooks? :) 1. We need to account total_vm usage of the task anyway. So why have two places, one for accounting and second for control? 2. These hooks are activated for conditionally invoked for vma's with VM_ACCOUNT set. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-17 12:51 ` Balbir Singh @ 2008-03-17 13:01 ` Pavel Emelyanov 2008-03-17 14:39 ` Balbir Singh 0 siblings, 1 reply; 30+ messages in thread From: Pavel Emelyanov @ 2008-03-17 13:01 UTC (permalink / raw) To: balbir Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Andrew Morton, KAMEZAWA Hiroyuki Balbir Singh wrote: > Pavel Emelyanov wrote: >> Balbir Singh wrote: >>> Pavel Emelyanov wrote: >>>> [snip] >>>> >>>>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages) >>>>> +{ >>>>> + int ret = 0; >>>>> + struct mem_cgroup *mem; >>>>> + if (mem_cgroup_subsys.disabled) >>>>> + return ret; >>>>> + >>>>> + rcu_read_lock(); >>>>> + mem = rcu_dereference(mm->mem_cgroup); >>>>> + css_get(&mem->css); >>>>> + rcu_read_unlock(); >>>>> + >>>>> + if (nr_pages > 0) { >>>>> + if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE))) >>>>> + ret = 1; >>>>> + } else >>>>> + res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE)); >>>> No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as. >>>> >>>> [snip] >>>> >>> Yes, sure :) >> Thanks :) >> >>>>> @@ -1117,6 +1117,9 @@ munmap_back: >>>>> } >>>>> } >>>>> >>>>> + if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT)) >>>>> + return -ENOMEM; >>>>> + >>>> Why not use existintg cap_vm_enough_memory and co? >>>> >>> I thought about it and almost used may_expand_vm(), but there is a slight catch >>> there. With cap_vm_enough_memory() or security_vm_enough_memory(), they are >>> called after total_vm has been calculated. In our case we need to keep the >>> cgroups equivalent of total_vm up to date, and we do this in mem_cgorup_update_as. >> So? What prevents us from using these hooks? :) > > 1. We need to account total_vm usage of the task anyway. So why have two places, > one for accounting and second for control? We still have two of them even placing hooks in each place manually. Besides, putting the mem_cgroup_(un)charge_as() in these vm hooks will 1. save the number of places to patch 2. help keeping memcgroup consistent in case someone adds more places that expand tasks vm (arches, drivers) - in case we have our hooks celled from inside vm ones, we won't have to patch more. > 2. These hooks are activated for conditionally invoked for vma's with VM_ACCOUNT > set. This is a good point against. But, wrt my previous comment, can we handle this somehow? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-17 13:01 ` Pavel Emelyanov @ 2008-03-17 14:39 ` Balbir Singh 0 siblings, 0 replies; 30+ messages in thread From: Balbir Singh @ 2008-03-17 14:39 UTC (permalink / raw) To: Pavel Emelyanov Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Andrew Morton, KAMEZAWA Hiroyuki Pavel Emelyanov wrote: > Balbir Singh wrote: >> Pavel Emelyanov wrote: >>> Balbir Singh wrote: >>>> Pavel Emelyanov wrote: >>>>> [snip] >>>>> >>>>>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages) >>>>>> +{ >>>>>> + int ret = 0; >>>>>> + struct mem_cgroup *mem; >>>>>> + if (mem_cgroup_subsys.disabled) >>>>>> + return ret; >>>>>> + >>>>>> + rcu_read_lock(); >>>>>> + mem = rcu_dereference(mm->mem_cgroup); >>>>>> + css_get(&mem->css); >>>>>> + rcu_read_unlock(); >>>>>> + >>>>>> + if (nr_pages > 0) { >>>>>> + if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE))) >>>>>> + ret = 1; >>>>>> + } else >>>>>> + res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE)); >>>>> No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as. >>>>> >>>>> [snip] >>>>> >>>> Yes, sure :) >>> Thanks :) >>> >>>>>> @@ -1117,6 +1117,9 @@ munmap_back: >>>>>> } >>>>>> } >>>>>> >>>>>> + if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT)) >>>>>> + return -ENOMEM; >>>>>> + >>>>> Why not use existintg cap_vm_enough_memory and co? >>>>> >>>> I thought about it and almost used may_expand_vm(), but there is a slight catch >>>> there. With cap_vm_enough_memory() or security_vm_enough_memory(), they are >>>> called after total_vm has been calculated. In our case we need to keep the >>>> cgroups equivalent of total_vm up to date, and we do this in mem_cgorup_update_as. >>> So? What prevents us from using these hooks? :) >> 1. We need to account total_vm usage of the task anyway. So why have two places, >> one for accounting and second for control? > > We still have two of them even placing hooks in each place manually. > > Besides, putting the mem_cgroup_(un)charge_as() in these vm hooks will > 1. save the number of places to patch > 2. help keeping memcgroup consistent in case someone adds more places > that expand tasks vm (arches, drivers) - in case we have our hooks > celled from inside vm ones, we won't have to patch more. > I am not sure I understand your proposal. Without manually placing these hooks how do we track 1. When the vm size has increased/decreased 2. In case due to some reason, the call following these hooks fail, how do we undo it, without placing hooks? >> 2. These hooks are activated for conditionally invoked for vma's with VM_ACCOUNT >> set. > > This is a good point against. But, wrt my previous comment, can we handle > this somehow? Not sure I understand -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-16 17:30 ` [RFC][2/3] Account and control virtual address space allocations Balbir Singh 2008-03-17 2:02 ` Paul Menage 2008-03-17 11:36 ` Pavel Emelyanov @ 2008-03-17 16:53 ` Dave Hansen 2008-03-18 1:14 ` Balbir Singh 2008-03-17 23:35 ` YAMAMOTO Takashi 3 siblings, 1 reply; 30+ messages in thread From: Dave Hansen @ 2008-03-17 16:53 UTC (permalink / raw) To: Balbir Singh Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki On Sun, 2008-03-16 at 23:00 +0530, Balbir Singh wrote: > @@ -787,6 +788,8 @@ static int ptrace_bts_realloc(struct tas > current->mm->total_vm -= old_size; > current->mm->locked_vm -= old_size; > > + mem_cgroup_update_as(current->mm, -old_size); > + > if (size == 0) > goto out; I think splattering these things all over is probably a bad idea. If you're going to do this, I think you need a couple of phases. 1. update the vm_(un)acct_memory() functions to take an mm 2. start using them (or some other abstracted functions in place) 3. update the new functions for cgroups It's a bit non-obvious why you do the mem_cgroup_update_as() calls in the places that you do from context. Having some other vm-abstracted functions will also keep you from splattering mem_cgroup_update_as() across the tree. That's a pretty bad name. :) ...update_mapped() or ...update_vm() might be a wee bit better. -- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-17 16:53 ` Dave Hansen @ 2008-03-18 1:14 ` Balbir Singh 2008-03-18 17:11 ` Dave Hansen 0 siblings, 1 reply; 30+ messages in thread From: Balbir Singh @ 2008-03-18 1:14 UTC (permalink / raw) To: Dave Hansen Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki Dave Hansen wrote: > On Sun, 2008-03-16 at 23:00 +0530, Balbir Singh wrote: >> @@ -787,6 +788,8 @@ static int ptrace_bts_realloc(struct tas >> current->mm->total_vm -= old_size; >> current->mm->locked_vm -= old_size; >> >> + mem_cgroup_update_as(current->mm, -old_size); >> + >> if (size == 0) >> goto out; > > I think splattering these things all over is probably a bad idea. > I agree and I tried to avoid the splattering > If you're going to do this, I think you need a couple of phases. > > 1. update the vm_(un)acct_memory() functions to take an mm There are other problems 1. vm_(un)acct_memory is conditionally dependent on VM_ACCOUNT. Look at shmem_(un)acct_size for example 2. These routines are not called from all contexts that we care about (look at insert_special_mapping()) > 2. start using them (or some other abstracted functions in place) > 3. update the new functions for cgroups > > It's a bit non-obvious why you do the mem_cgroup_update_as() calls in > the places that you do from context. > > Having some other vm-abstracted functions will also keep you from > splattering mem_cgroup_update_as() across the tree. That's a pretty bad > name. :) ...update_mapped() or ...update_vm() might be a wee bit > better. > I am going to split mem_cgroup_update_as() to two routines with a better name. I agree with you in principle about splattering, but please see my comments above -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-18 1:14 ` Balbir Singh @ 2008-03-18 17:11 ` Dave Hansen 2008-03-18 17:58 ` Balbir Singh 0 siblings, 1 reply; 30+ messages in thread From: Dave Hansen @ 2008-03-18 17:11 UTC (permalink / raw) To: balbir Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki On Tue, 2008-03-18 at 06:44 +0530, Balbir Singh wrote: > > If you're going to do this, I think you need a couple of phases. > > > > 1. update the vm_(un)acct_memory() functions to take an mm > > There are other problems > > 1. vm_(un)acct_memory is conditionally dependent on VM_ACCOUNT. Look at > shmem_(un)acct_size for example Yeah, but if VM_ACCOUNT isn't set, do you really want the controller accounting for them? It's there for a reason. :) The shmem_acct_size() helpers look good. I wonder if we should be using that kind of things more generically. > 2. These routines are not called from all contexts that we care about (look at > insert_special_mapping()) Could you explain why "we" care about it and why it isn't accounted for now? -- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-18 17:11 ` Dave Hansen @ 2008-03-18 17:58 ` Balbir Singh 0 siblings, 0 replies; 30+ messages in thread From: Balbir Singh @ 2008-03-18 17:58 UTC (permalink / raw) To: Dave Hansen Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki Dave Hansen wrote: > On Tue, 2008-03-18 at 06:44 +0530, Balbir Singh wrote: >>> If you're going to do this, I think you need a couple of phases. >>> >>> 1. update the vm_(un)acct_memory() functions to take an mm >> There are other problems >> >> 1. vm_(un)acct_memory is conditionally dependent on VM_ACCOUNT. Look at >> shmem_(un)acct_size for example > > Yeah, but if VM_ACCOUNT isn't set, do you really want the controller > accounting for them? It's there for a reason. :) > We are trying to account for virtual memory usage. Please see http://lwn.net/Articles/5016/ to see what VM_ACCOUNT does or Documentation/vm/overcommit-accounting. We want to account and control virtual memory usage and not necessarily implement overcommit accounting > The shmem_acct_size() helpers look good. I wonder if we should be using > that kind of things more generically. > Yes, it is well written. I wish there were more such abstractions, but it does not help us. >> 2. These routines are not called from all contexts that we care about (look at >> insert_special_mapping()) > > Could you explain why "we" care about it and why it isn't accounted for > now? It is accounted for in total_vm and that's why we care about :) -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-16 17:30 ` [RFC][2/3] Account and control virtual address space allocations Balbir Singh ` (2 preceding siblings ...) 2008-03-17 16:53 ` Dave Hansen @ 2008-03-17 23:35 ` YAMAMOTO Takashi 2008-03-18 1:10 ` Balbir Singh 3 siblings, 1 reply; 30+ messages in thread From: YAMAMOTO Takashi @ 2008-03-17 23:35 UTC (permalink / raw) To: balbir Cc: linux-mm, hugh, skumar, menage, lizf, linux-kernel, taka, rientjes, xemul, akpm, kamezawa.hiroyu > diff -puN mm/swapfile.c~memory-controller-virtual-address-space-accounting-and-control mm/swapfile.c > diff -puN mm/memory.c~memory-controller-virtual-address-space-accounting-and-control mm/memory.c > --- linux-2.6.25-rc5/mm/memory.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530 > +++ linux-2.6.25-rc5-balbir/mm/memory.c 2008-03-16 22:57:40.000000000 +0530 > @@ -838,6 +838,11 @@ unsigned long unmap_vmas(struct mmu_gath > > if (vma->vm_flags & VM_ACCOUNT) > *nr_accounted += (end - start) >> PAGE_SHIFT; > + /* > + * Unaccount used virtual memory for cgroups > + */ > + mem_cgroup_update_as(vma->vm_mm, > + ((long)(start - end)) >> PAGE_SHIFT); > > while (start != end) { > if (!tlb_start_valid) { i think you can sum and uncharge it with a single call. YAMAMOTO Takashi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][2/3] Account and control virtual address space allocations 2008-03-17 23:35 ` YAMAMOTO Takashi @ 2008-03-18 1:10 ` Balbir Singh 0 siblings, 0 replies; 30+ messages in thread From: Balbir Singh @ 2008-03-18 1:10 UTC (permalink / raw) To: YAMAMOTO Takashi Cc: linux-mm, hugh, skumar, menage, lizf, linux-kernel, taka, rientjes, xemul, akpm, kamezawa.hiroyu YAMAMOTO Takashi wrote: >> diff -puN mm/swapfile.c~memory-controller-virtual-address-space-accounting-and-control mm/swapfile.c >> diff -puN mm/memory.c~memory-controller-virtual-address-space-accounting-and-control mm/memory.c >> --- linux-2.6.25-rc5/mm/memory.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530 >> +++ linux-2.6.25-rc5-balbir/mm/memory.c 2008-03-16 22:57:40.000000000 +0530 >> @@ -838,6 +838,11 @@ unsigned long unmap_vmas(struct mmu_gath >> >> if (vma->vm_flags & VM_ACCOUNT) >> *nr_accounted += (end - start) >> PAGE_SHIFT; >> + /* >> + * Unaccount used virtual memory for cgroups >> + */ >> + mem_cgroup_update_as(vma->vm_mm, >> + ((long)(start - end)) >> PAGE_SHIFT); >> >> while (start != end) { >> if (!tlb_start_valid) { > > i think you can sum and uncharge it with a single call. > Like nr_accounted? I'll have to duplicate nr_accounted since that depends conditionally on VM_ACCOUNT. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC][3/3] Update documentation for virtual address space control 2008-03-16 17:29 [RFC][0/3] Virtual address space control for cgroups Balbir Singh 2008-03-16 17:29 ` [RFC][1/3] Add user interface for virtual address space control Balbir Singh 2008-03-16 17:30 ` [RFC][2/3] Account and control virtual address space allocations Balbir Singh @ 2008-03-16 17:30 ` Balbir Singh 2008-03-16 18:32 ` Randy Dunlap 2008-03-16 23:26 ` [RFC][0/3] Virtual address space control for cgroups Paul Menage 3 siblings, 1 reply; 30+ messages in thread From: Balbir Singh @ 2008-03-16 17:30 UTC (permalink / raw) To: linux-mm Cc: Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Balbir Singh, Andrew Morton, KAMEZAWA Hiroyuki This patch adds documentation for virtual address space control. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> --- Documentation/controllers/memory.txt | 26 +++++++++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff -puN Documentation/controllers/memory.txt~memory-controller-virtual-address-control-documentation Documentation/controllers/memory.txt --- linux-2.6.25-rc5/Documentation/controllers/memory.txt~memory-controller-virtual-address-control-documentation 2008-03-16 22:57:44.000000000 +0530 +++ linux-2.6.25-rc5-balbir/Documentation/controllers/memory.txt 2008-03-16 22:57:44.000000000 +0530 @@ -237,7 +237,31 @@ cgroup might have some charge associated tasks have migrated away from it. Such charges are automatically dropped at rmdir() if there are no tasks. -5. TODO +5. Virtual address space accounting + +A new resource counter controls the address space expansion of the tasks in +the cgroup. Address space control is provided along the same lines as +RLIMIT_AS control, which is available via getrlimit(2)/setrlimit(2). +The interface for controlling address space is provided through +"as_limit_in_bytes". The file is similar to "limit_in_bytes" w.r.t the user +interface. Please see section 3 for more details on how to use the user +interface to get and set values. + +The "as_usage_in_bytes" file provides information about the total address +space usage of the cgroup in bytes. + +5.1 Advantages of providing this feature + +1. Control over virtual address space allows for a cgroup to fail gracefully + i.e, via a malloc or mmap failure as compared to OOM kill when no + pages can be reclaimed +2. It provides better control over how many pages can be swapped out when + the cgroup goes over it's limit. A badly setup cgroup can cause excessive + swapping. Providing control over the address space allocations ensures + that the system administrator has control over the total swapping that + can take place. + +6. TODO 1. Add support for accounting huge pages (as a separate controller) 2. Make per-cgroup scanner reclaim not-shared pages first _ -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][3/3] Update documentation for virtual address space control 2008-03-16 17:30 ` [RFC][3/3] Update documentation for virtual address space control Balbir Singh @ 2008-03-16 18:32 ` Randy Dunlap 2008-03-17 1:33 ` Balbir Singh 0 siblings, 1 reply; 30+ messages in thread From: Randy Dunlap @ 2008-03-16 18:32 UTC (permalink / raw) To: Balbir Singh Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki On Sun, 16 Mar 2008 23:00:17 +0530 Balbir Singh wrote: > This patch adds documentation for virtual address space control. > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> > --- > > Documentation/controllers/memory.txt | 26 +++++++++++++++++++++++++- > 1 file changed, 25 insertions(+), 1 deletion(-) > > diff -puN Documentation/controllers/memory.txt~memory-controller-virtual-address-control-documentation Documentation/controllers/memory.txt > --- linux-2.6.25-rc5/Documentation/controllers/memory.txt~memory-controller-virtual-address-control-documentation 2008-03-16 22:57:44.000000000 +0530 > +++ linux-2.6.25-rc5-balbir/Documentation/controllers/memory.txt 2008-03-16 22:57:44.000000000 +0530 > @@ -237,7 +237,31 @@ cgroup might have some charge associated > tasks have migrated away from it. Such charges are automatically dropped at > rmdir() if there are no tasks. > > -5. TODO > +5. Virtual address space accounting > + > +A new resource counter controls the address space expansion of the tasks in > +the cgroup. Address space control is provided along the same lines as > +RLIMIT_AS control, which is available via getrlimit(2)/setrlimit(2). > +The interface for controlling address space is provided through > +"as_limit_in_bytes". The file is similar to "limit_in_bytes" w.r.t the user w.r.t. or even spelled out. > +interface. Please see section 3 for more details on how to use the user > +interface to get and set values. > + > +The "as_usage_in_bytes" file provides information about the total address > +space usage of the cgroup in bytes. > + > +5.1 Advantages of providing this feature > + > +1. Control over virtual address space allows for a cgroup to fail gracefully > + i.e, via a malloc or mmap failure as compared to OOM kill when no i.e., > + pages can be reclaimed end with period. > +2. It provides better control over how many pages can be swapped out when > + the cgroup goes over it's limit. A badly setup cgroup can cause excessive its (not "it is") > + swapping. Providing control over the address space allocations ensures > + that the system administrator has control over the total swapping that > + can take place. > + > +6. TODO > > 1. Add support for accounting huge pages (as a separate controller) > 2. Make per-cgroup scanner reclaim not-shared pages first > _ --- ~Randy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][3/3] Update documentation for virtual address space control 2008-03-16 18:32 ` Randy Dunlap @ 2008-03-17 1:33 ` Balbir Singh 0 siblings, 0 replies; 30+ messages in thread From: Balbir Singh @ 2008-03-17 1:33 UTC (permalink / raw) To: Randy Dunlap Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki Randy Dunlap wrote: > On Sun, 16 Mar 2008 23:00:17 +0530 Balbir Singh wrote: > >> This patch adds documentation for virtual address space control. >> >> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> >> --- >> >> Documentation/controllers/memory.txt | 26 +++++++++++++++++++++++++- >> 1 file changed, 25 insertions(+), 1 deletion(-) >> >> diff -puN Documentation/controllers/memory.txt~memory-controller-virtual-address-control-documentation Documentation/controllers/memory.txt >> --- linux-2.6.25-rc5/Documentation/controllers/memory.txt~memory-controller-virtual-address-control-documentation 2008-03-16 22:57:44.000000000 +0530 >> +++ linux-2.6.25-rc5-balbir/Documentation/controllers/memory.txt 2008-03-16 22:57:44.000000000 +0530 >> @@ -237,7 +237,31 @@ cgroup might have some charge associated >> tasks have migrated away from it. Such charges are automatically dropped at >> rmdir() if there are no tasks. >> >> -5. TODO >> +5. Virtual address space accounting >> + >> +A new resource counter controls the address space expansion of the tasks in >> +the cgroup. Address space control is provided along the same lines as >> +RLIMIT_AS control, which is available via getrlimit(2)/setrlimit(2). >> +The interface for controlling address space is provided through >> +"as_limit_in_bytes". The file is similar to "limit_in_bytes" w.r.t the user > > w.r.t. > or even spelled out. > Will spell out. >> +interface. Please see section 3 for more details on how to use the user >> +interface to get and set values. >> + >> +The "as_usage_in_bytes" file provides information about the total address >> +space usage of the cgroup in bytes. >> + >> +5.1 Advantages of providing this feature >> + >> +1. Control over virtual address space allows for a cgroup to fail gracefully >> + i.e, via a malloc or mmap failure as compared to OOM kill when no > > i.e., > >> + pages can be reclaimed > > end with period. Will fix > >> +2. It provides better control over how many pages can be swapped out when >> + the cgroup goes over it's limit. A badly setup cgroup can cause excessive > > its (not "it is") > Will fix :) >> + swapping. Providing control over the address space allocations ensures >> + that the system administrator has control over the total swapping that >> + can take place. >> + >> +6. TODO >> >> 1. Add support for accounting huge pages (as a separate controller) >> 2. Make per-cgroup scanner reclaim not-shared pages first >> _ > > --- > ~Randy > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][0/3] Virtual address space control for cgroups 2008-03-16 17:29 [RFC][0/3] Virtual address space control for cgroups Balbir Singh ` (2 preceding siblings ...) 2008-03-16 17:30 ` [RFC][3/3] Update documentation for virtual address space control Balbir Singh @ 2008-03-16 23:26 ` Paul Menage 2008-03-17 1:47 ` Li Zefan 2008-03-17 1:50 ` Balbir Singh 3 siblings, 2 replies; 30+ messages in thread From: Paul Menage @ 2008-03-16 23:26 UTC (permalink / raw) To: Balbir Singh Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki On Mon, Mar 17, 2008 at 1:29 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > This is an early patchset for virtual address space control for cgroups. > The patches are against 2.6.25-rc5-mm1 and have been tested on top of > User Mode Linux. What's the performance hit of doing these accounting checks on every mmap/munmap? If it's not totally lost in the noise, couldn't it be made a separate control group, so that it could be just enabled (and the performance hit taken) for users that actually want it? Paul -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][0/3] Virtual address space control for cgroups 2008-03-16 23:26 ` [RFC][0/3] Virtual address space control for cgroups Paul Menage @ 2008-03-17 1:47 ` Li Zefan 2008-03-17 1:57 ` Paul Menage 2008-03-17 1:50 ` Balbir Singh 1 sibling, 1 reply; 30+ messages in thread From: Li Zefan @ 2008-03-17 1:47 UTC (permalink / raw) To: Paul Menage Cc: Balbir Singh, linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki Paul Menage wrote: > On Mon, Mar 17, 2008 at 1:29 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: >> This is an early patchset for virtual address space control for cgroups. >> The patches are against 2.6.25-rc5-mm1 and have been tested on top of >> User Mode Linux. > > What's the performance hit of doing these accounting checks on every > mmap/munmap? If it's not totally lost in the noise, couldn't it be > made a separate control group, so that it could be just enabled (and > the performance hit taken) for users that actually want it? > It will be code duplication to make it a new subsystem, and it will be useful to control both of them, am I right? :) So could we just add a CONFIG to this patch series, like: CONFIG_CGROUP_MEM_RES_AS_CTLR ? > Paul > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][0/3] Virtual address space control for cgroups 2008-03-17 1:47 ` Li Zefan @ 2008-03-17 1:57 ` Paul Menage 2008-03-17 5:08 ` Balbir Singh 0 siblings, 1 reply; 30+ messages in thread From: Paul Menage @ 2008-03-17 1:57 UTC (permalink / raw) To: Li Zefan Cc: Balbir Singh, linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki On Mon, Mar 17, 2008 at 9:47 AM, Li Zefan <lizf@cn.fujitsu.com> wrote: > > It will be code duplication to make it a new subsystem, Would it? Other than the basic cgroup boilerplate, the only real duplication that I could see would be that there'd need to be an additional per-mm pointer back to the cgroup. (Which could be avoided if we added a single per-mm pointer back to the "owning" task, which would generally be the mm's thread group leader, so that you could go quickly from an mm to a set of cgroup subsystems). And the advantage would that you'd be able to more easily pick/choose which bits of control you use (and pay for). Paul -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][0/3] Virtual address space control for cgroups 2008-03-17 1:57 ` Paul Menage @ 2008-03-17 5:08 ` Balbir Singh 2008-03-17 5:22 ` Paul Menage 0 siblings, 1 reply; 30+ messages in thread From: Balbir Singh @ 2008-03-17 5:08 UTC (permalink / raw) To: Paul Menage Cc: Li Zefan, linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki Paul Menage wrote: > On Mon, Mar 17, 2008 at 9:47 AM, Li Zefan <lizf@cn.fujitsu.com> wrote: >> It will be code duplication to make it a new subsystem, > > Would it? Other than the basic cgroup boilerplate, the only real > duplication that I could see would be that there'd need to be an > additional per-mm pointer back to the cgroup. (Which could be avoided > if we added a single per-mm pointer back to the "owning" task, which > would generally be the mm's thread group leader, so that you could go > quickly from an mm to a set of cgroup subsystems). > I understand the per-mm pointer overhead back to the cgroup. I don't understand the part about adding a per-mm pointer back to the "owning" task. We already have task->mm. BTW, the reason by we directly add the mm_struct to mem_cgroup mapping is that there are contexts from where only the mm_struct is known (when we charge/uncharge). Assuming that current->mm's mem_cgorup is the one we want to charge/uncharge is incorrect. > And the advantage would that you'd be able to more easily pick/choose > which bits of control you use (and pay for). I am not sure I understand your proposal fully. But, if it can help provide the flexibility you are referring to, I am all ears. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][0/3] Virtual address space control for cgroups 2008-03-17 5:08 ` Balbir Singh @ 2008-03-17 5:22 ` Paul Menage 2008-03-17 15:15 ` Balbir Singh 0 siblings, 1 reply; 30+ messages in thread From: Paul Menage @ 2008-03-17 5:22 UTC (permalink / raw) To: balbir Cc: Li Zefan, linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki On Mon, Mar 17, 2008 at 1:08 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > I understand the per-mm pointer overhead back to the cgroup. I don't understand > the part about adding a per-mm pointer back to the "owning" task. We already > have task->mm. Yes, but we don't have mm->owner, which is what I was proposing - mm->owner would be a pointer typically to the mm's thread group leader. It would remove the need to have to have pointers for the various different cgroup subsystems that need to act on an mm rather than a task_struct, since then you could use mm->owner->cgroups[subsys_id]. But this is kind of orthogonal to whether virtual address space limits should be a separate cgroup subsystem. Paul -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][0/3] Virtual address space control for cgroups 2008-03-17 5:22 ` Paul Menage @ 2008-03-17 15:15 ` Balbir Singh 0 siblings, 0 replies; 30+ messages in thread From: Balbir Singh @ 2008-03-17 15:15 UTC (permalink / raw) To: Paul Menage Cc: Li Zefan, linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki Paul Menage wrote: > On Mon, Mar 17, 2008 at 1:08 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: >> I understand the per-mm pointer overhead back to the cgroup. I don't understand >> the part about adding a per-mm pointer back to the "owning" task. We already >> have task->mm. > > Yes, but we don't have mm->owner, which is what I was proposing - > mm->owner would be a pointer typically to the mm's thread group > leader. It would remove the need to have to have pointers for the > various different cgroup subsystems that need to act on an mm rather > than a task_struct, since then you could use > mm->owner->cgroups[subsys_id]. > Aaahh.. Yes.. mm->owner might be a good idea. The only thing we'll need to handle is when mm->owner dies (I think the thread group is still kept around). The other disadvantage is the double dereferencing, which should not be all that bad. > But this is kind of orthogonal to whether virtual address space limits > should be a separate cgroup subsystem. > Yes, sure. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][0/3] Virtual address space control for cgroups 2008-03-16 23:26 ` [RFC][0/3] Virtual address space control for cgroups Paul Menage 2008-03-17 1:47 ` Li Zefan @ 2008-03-17 1:50 ` Balbir Singh 2008-03-17 1:55 ` Paul Menage 1 sibling, 1 reply; 30+ messages in thread From: Balbir Singh @ 2008-03-17 1:50 UTC (permalink / raw) To: Paul Menage Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki Paul Menage wrote: > On Mon, Mar 17, 2008 at 1:29 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: >> This is an early patchset for virtual address space control for cgroups. >> The patches are against 2.6.25-rc5-mm1 and have been tested on top of >> User Mode Linux. > > What's the performance hit of doing these accounting checks on every > mmap/munmap? If it's not totally lost in the noise, couldn't it be > made a separate control group, so that it could be just enabled (and > the performance hit taken) for users that actually want it? > I am yet to measure the performance overhead of the accounting checks. I'll try and get started on that today. I did not consider making it a separate system, because I suspect that anybody wanting memory control would also want address space control (for the advantages listed in the documentation). I am not against the idea of making it a separate subsystem, but first let me get back with the numbers. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][0/3] Virtual address space control for cgroups 2008-03-17 1:50 ` Balbir Singh @ 2008-03-17 1:55 ` Paul Menage 2008-03-17 3:12 ` Balbir Singh 0 siblings, 1 reply; 30+ messages in thread From: Paul Menage @ 2008-03-17 1:55 UTC (permalink / raw) To: balbir Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki On Mon, Mar 17, 2008 at 9:50 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > I am yet to measure the performance overhead of the accounting checks. I'll try > and get started on that today. I did not consider making it a separate system, > because I suspect that anybody wanting memory control would also want address > space control (for the advantages listed in the documentation). I'm a counter-example to your suspicion :-) Trying to control virtual address space is a complete nightmare in the presence of anything that uses large sparsely-populated mappings (mmaps of large files, or large sparse heaps such as the JVM uses.) If we want to control the effect of swapping, the right way to do it is to control disk I/O, and ensure that the swapping is accounted to that. Or simply just not give apps much swap space. Paul -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC][0/3] Virtual address space control for cgroups 2008-03-17 1:55 ` Paul Menage @ 2008-03-17 3:12 ` Balbir Singh 0 siblings, 0 replies; 30+ messages in thread From: Balbir Singh @ 2008-03-17 3:12 UTC (permalink / raw) To: Paul Menage Cc: linux-mm, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf, linux-kernel, taka, David Rientjes, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki Paul Menage wrote: > On Mon, Mar 17, 2008 at 9:50 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: >> I am yet to measure the performance overhead of the accounting checks. I'll try >> and get started on that today. I did not consider making it a separate system, >> because I suspect that anybody wanting memory control would also want address >> space control (for the advantages listed in the documentation). > > I'm a counter-example to your suspicion :-) > > Trying to control virtual address space is a complete nightmare in the > presence of anything that uses large sparsely-populated mappings > (mmaps of large files, or large sparse heaps such as the JVM uses.) > Not really. Virtual limits are more gentle than an OOM kill that can occur if the cgroup runs out of memory. Please also see http://linux-vserver.org/Memory_Limits > If we want to control the effect of swapping, the right way to do it > is to control disk I/O, and ensure that the swapping is accounted to > that. Or simply just not give apps much swap space. Yes, a disk I/O and swap I/O controller are being developed (not by us, but others in the community). How does one restrict swap space for a particular application? I can think of RLIMIT_AS for a process and something similar to what I've posted for cgroups. Not enabling swap is an option, but not very practical IMHO. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2008-03-18 18:00 UTC | newest] Thread overview: 30+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-03-16 17:29 [RFC][0/3] Virtual address space control for cgroups Balbir Singh 2008-03-16 17:29 ` [RFC][1/3] Add user interface for virtual address space control Balbir Singh 2008-03-16 17:30 ` [RFC][2/3] Account and control virtual address space allocations Balbir Singh 2008-03-17 2:02 ` Paul Menage 2008-03-17 2:57 ` Balbir Singh 2008-03-17 3:03 ` Paul Menage 2008-03-17 11:36 ` Pavel Emelyanov 2008-03-17 12:29 ` Balbir Singh 2008-03-17 12:40 ` Pavel Emelyanov 2008-03-17 12:51 ` Balbir Singh 2008-03-17 13:01 ` Pavel Emelyanov 2008-03-17 14:39 ` Balbir Singh 2008-03-17 16:53 ` Dave Hansen 2008-03-18 1:14 ` Balbir Singh 2008-03-18 17:11 ` Dave Hansen 2008-03-18 17:58 ` Balbir Singh 2008-03-17 23:35 ` YAMAMOTO Takashi 2008-03-18 1:10 ` Balbir Singh 2008-03-16 17:30 ` [RFC][3/3] Update documentation for virtual address space control Balbir Singh 2008-03-16 18:32 ` Randy Dunlap 2008-03-17 1:33 ` Balbir Singh 2008-03-16 23:26 ` [RFC][0/3] Virtual address space control for cgroups Paul Menage 2008-03-17 1:47 ` Li Zefan 2008-03-17 1:57 ` Paul Menage 2008-03-17 5:08 ` Balbir Singh 2008-03-17 5:22 ` Paul Menage 2008-03-17 15:15 ` Balbir Singh 2008-03-17 1:50 ` Balbir Singh 2008-03-17 1:55 ` Paul Menage 2008-03-17 3:12 ` Balbir Singh
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).