From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V4 00/10] memcg: Add memcg extension to control HugeTLB allocation
Date: Fri, 16 Mar 2012 23:09:20 +0530
Message-ID: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

Hi,

This patchset implements a memory controller extension to control
HugeTLB allocations. The extension allows limiting HugeTLB usage per
control group and enforces the controller limit at page fault time.
Since HugeTLB doesn't support page reclaim, enforcing the limit at page
fault time means that the application will get a SIGBUS signal if it
tries to access HugeTLB pages beyond its limit. This requires the
application to know beforehand how many HugeTLB pages it will need.

The goal is to control how many HugeTLB pages a group of tasks can
allocate. It can be seen as an extension of the existing quota
interface, which limits the number of HugeTLB pages per hugetlbfs
superblock. HPC job schedulers require jobs to specify their resource
requirements in the job file. Once the requirements can be met, job
schedulers like SLURM will schedule the job. We need to make sure that
jobs won't consume more resources than requested. If they do, we should
either error out or kill the application.

Changes from V3:
 * Address review feedback.
 * Bug fix in cgroup removal related to parent charging with
   use_hierarchy set

Changes from V2:
 * Changed the implementation to limit HugeTLB usage at page fault
   time. This simplifies the extension and keeps it closer to the memcg
   design. It also allows supporting cgroup removal with less
   complexity. The only caveat is that the application must ensure its
   HugeTLB usage doesn't cross the cgroup limit.

Changes from V1:
 * Changed the implementation to be a memcg extension. We still use the
   same logic to track the cgroup and range.

Changes from RFC post:
 * Added support for HugeTLB cgroup hierarchy
 * Added support for task migration
 * Added documentation patch
 * Other bug fixes

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V4 03/10] hugetlbfs: Add an inline helper for finding hstate index
Date: Fri, 16 Mar 2012 23:09:23 +0530
Message-ID: <1331919570-2264-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

Add an inline helper and use it in the code.

Signed-off-by: Aneesh Kumar K.V
---
 include/linux/hugetlb.h |    6 ++++++
 mm/hugetlb.c            |   18 ++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d9d6c86..a2675b0 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -311,6 +311,11 @@ static inline unsigned hstate_index_to_shift(unsigned index)
 	return hstates[index].order + PAGE_SHIFT;
 }
 
+static inline int hstate_index(struct hstate *h)
+{
+	return h - hstates;
+}
+
 #else
 struct hstate {};
 #define alloc_huge_page_node(h, nid) NULL
@@ -329,6 +334,7 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
 	return 1;
 }
 #define hstate_index_to_shift(index) 0
+#define hstate_index(h) 0
 #endif
 
 #endif /* _LINUX_HUGETLB_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3782da8..ebe245c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1557,7 +1557,7 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
 				    struct attribute_group *hstate_attr_group)
 {
 	int retval;
-	int hi = h - hstates;
+	int hi = hstate_index(h);
 
 	hstate_kobjs[hi] = kobject_create_and_add(h->name, parent);
 	if (!hstate_kobjs[hi])
@@ -1652,11 +1652,13 @@ void hugetlb_unregister_node(struct node *node)
 	if (!nhs->hugepages_kobj)
 		return;		/* no hstate attributes */
 
-	for_each_hstate(h)
-		if (nhs->hstate_kobjs[h - hstates]) {
-			kobject_put(nhs->hstate_kobjs[h - hstates]);
-			nhs->hstate_kobjs[h - hstates] = NULL;
+	for_each_hstate(h) {
+		int idx = hstate_index(h);
+		if (nhs->hstate_kobjs[idx]) {
+			kobject_put(nhs->hstate_kobjs[idx]);
+			nhs->hstate_kobjs[idx] = NULL;
 		}
+	}
 
 	kobject_put(nhs->hugepages_kobj);
 	nhs->hugepages_kobj = NULL;
@@ -1759,7 +1761,7 @@ static void __exit hugetlb_exit(void)
 	hugetlb_unregister_all_nodes();
 
 	for_each_hstate(h) {
-		kobject_put(hstate_kobjs[h - hstates]);
+		kobject_put(hstate_kobjs[hstate_index(h)]);
 	}
 
 	kobject_put(hugepages_kobj);
@@ -2587,7 +2589,7 @@ retry:
 		 */
 		if (unlikely(PageHWPoison(page))) {
 			ret = VM_FAULT_HWPOISON |
-			      VM_FAULT_SET_HINDEX(h - hstates);
+			      VM_FAULT_SET_HINDEX(hstate_index(h));
 			goto backout_unlocked;
 		}
 	}
@@ -2660,7 +2662,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			return 0;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
-			       VM_FAULT_SET_HINDEX(h - hstates);
+			       VM_FAULT_SET_HINDEX(hstate_index(h));
 	}
 
 	ptep = huge_pte_alloc(mm, address, huge_page_size(h));
--
1.7.9

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V4 01/10] hugetlb: rename max_hstate to hugetlb_max_hstate
Date: Fri, 16 Mar 2012 23:09:21 +0530
Message-ID: <1331919570-2264-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

We will be using this from other subsystems like memcg
in later patches.

Acked-by: Hillf Danton
Signed-off-by: Aneesh Kumar K.V
---
 mm/hugetlb.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5f34bd8..d623e71 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
 static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
 unsigned long hugepages_treat_as_movable;
 
-static int max_hstate;
+static int hugetlb_max_hstate;
 unsigned int default_hstate_idx;
 struct hstate hstates[HUGE_MAX_HSTATE];
 
@@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
 static unsigned long __initdata default_hstate_size;
 
 #define for_each_hstate(h) \
-	for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
+	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
 
 /*
  * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
@@ -1808,9 +1808,9 @@ void __init hugetlb_add_hstate(unsigned order)
 		printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
 		return;
 	}
-	BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
+	BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
 	BUG_ON(order == 0);
-	h = &hstates[max_hstate++];
+	h = &hstates[hugetlb_max_hstate++];
 	h->order = order;
 	h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
 	h->nr_huge_pages = 0;
@@ -1831,10 +1831,10 @@ static int __init hugetlb_nrpages_setup(char *s)
 	static unsigned long *last_mhp;
 
 	/*
-	 * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
+	 * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet,
 	 * so this hugepages= parameter goes to the "default hstate".
 	 */
-	if (!max_hstate)
+	if (!hugetlb_max_hstate)
 		mhp = &default_hstate_max_huge_pages;
 	else
 		mhp = &parsed_hstate->max_huge_pages;
@@ -1853,7 +1853,7 @@ static int __init hugetlb_nrpages_setup(char *s)
 	 * But we need to allocate >= MAX_ORDER hstates here early to still
 	 * use the bootmem allocator.
 	 */
-	if (max_hstate && parsed_hstate->order >= MAX_ORDER)
+	if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER)
 		hugetlb_hstate_alloc_pages(parsed_hstate);
 
 	last_mhp = mhp;
--
1.7.9

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V4 04/10] memcg: Add HugeTLB extension
Date: Fri, 16 Mar 2012 23:09:24 +0530
Message-ID: <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

This patch implements a memcg extension that allows us to control
HugeTLB allocations via memory controller.

Signed-off-by: Aneesh Kumar K.V
---
 include/linux/hugetlb.h    |    1 +
 include/linux/memcontrol.h |   42 +++++++++++++
 init/Kconfig               |    8 +++
 mm/hugetlb.c               |    2 +-
 mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 190 insertions(+), 1 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a2675b0..1f70068 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -243,6 +243,7 @@ struct hstate *size_to_hstate(unsigned long size);
 #define HUGE_MAX_HSTATE 1
 #endif
 
+extern int hugetlb_max_hstate;
 extern struct hstate hstates[HUGE_MAX_HSTATE];
 extern unsigned int default_hstate_idx;
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4d34356..320dbad 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -429,5 +429,47 @@ static inline void sock_release_memcg(struct sock *sk)
 {
 }
 #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */
+
+#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
+extern int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
+					  struct mem_cgroup **ptr);
+extern void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
+					     struct mem_cgroup *memcg,
+					     struct page *page);
+extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
+					     struct page *page);
+extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
+					      struct mem_cgroup *memcg);
+
+#else
+static inline int
+mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
+			       struct mem_cgroup **ptr)
+{
+	return 0;
+}
+
+static inline void
+mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
+				 struct mem_cgroup *memcg,
+				 struct page *page)
+{
+	return;
+}
+
+static inline void
+mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
+				 struct page *page)
+{
+	return;
+}
+
+static inline void
+mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
+				  struct mem_cgroup *memcg)
+{
+	return;
+}
+#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif /* _LINUX_MEMCONTROL_H */
 
diff --git a/init/Kconfig b/init/Kconfig
index 3f42cd6..f0eb8aa 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -725,6 +725,14 @@ config CGROUP_PERF
 
 	  Say N if unsure.
 
+config MEM_RES_CTLR_HUGETLB
+	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
+	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
+	default n
+	help
+	  Add HugeTLB management to memory resource controller. When you
+	  enable this, you can put a per cgroup limit on HugeTLB usage.
+
 menuconfig CGROUP_SCHED
 	bool "Group CPU scheduler"
 	default n
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ebe245c..c672187 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
 static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
 unsigned long hugepages_treat_as_movable;
 
-static int hugetlb_max_hstate;
+int hugetlb_max_hstate;
 unsigned int default_hstate_idx;
 struct hstate hstates[HUGE_MAX_HSTATE];
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6728a7a..4b36c5e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -235,6 +235,10 @@ struct mem_cgroup {
 	 */
 	struct res_counter memsw;
 	/*
+	 * the counter to account for hugepages from hugetlb.
+	 */
+	struct res_counter hugepage[HUGE_MAX_HSTATE];
+	/*
 	 * Per cgroup active and inactive list, similar to the
 	 * per zone LRU lists.
 	 */
@@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
 }
 #endif
 
+#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
+static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
+{
+	int idx;
+	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
+		if (memcg->hugepage[idx].usage > 0)
+			return 1;
+	}
+	return 0;
+}
+
+int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
+				   struct mem_cgroup **ptr)
+{
+	int ret = 0;
+	struct mem_cgroup *memcg;
+	struct res_counter *fail_res;
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (mem_cgroup_disabled())
+		return 0;
+again:
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(current);
+	if (!memcg)
+		memcg = root_mem_cgroup;
+	if (mem_cgroup_is_root(memcg)) {
+		rcu_read_unlock();
+		goto done;
+	}
+	if (!css_tryget(&memcg->css)) {
+		rcu_read_unlock();
+		goto again;
+	}
+	rcu_read_unlock();
+
+	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
+	css_put(&memcg->css);
+done:
+	*ptr = memcg;
+	return ret;
+}
+
+void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
+				      struct mem_cgroup *memcg,
+				      struct page *page)
+{
+	struct page_cgroup *pc;
+
+	if (mem_cgroup_disabled())
+		return;
+
+	pc = lookup_page_cgroup(page);
+	lock_page_cgroup(pc);
+	if (unlikely(PageCgroupUsed(pc))) {
+		unlock_page_cgroup(pc);
+		mem_cgroup_hugetlb_uncharge_memcg(idx, nr_pages, memcg);
+		return;
+	}
+	pc->mem_cgroup = memcg;
+	/*
+	 * We access a page_cgroup asynchronously without lock_page_cgroup().
+	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
+	 * is accessed after testing USED bit. To make pc->mem_cgroup visible
+	 * before USED bit, we need memory barrier here.
+	 * See mem_cgroup_add_lru_list(), etc.
+	 */
+	smp_wmb();
+	SetPageCgroupUsed(pc);
+
+	unlock_page_cgroup(pc);
+	return;
+}
+
+void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
+				      struct page *page)
+{
+	struct page_cgroup *pc;
+	struct mem_cgroup *memcg;
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (mem_cgroup_disabled())
+		return;
+
+	pc = lookup_page_cgroup(page);
+	if (unlikely(!PageCgroupUsed(pc)))
+		return;
+
+	lock_page_cgroup(pc);
+	if (!PageCgroupUsed(pc)) {
+		unlock_page_cgroup(pc);
+		return;
+	}
+	memcg = pc->mem_cgroup;
+	pc->mem_cgroup = root_mem_cgroup;
+	ClearPageCgroupUsed(pc);
+	unlock_page_cgroup(pc);
+
+	if (!mem_cgroup_is_root(memcg))
+		res_counter_uncharge(&memcg->hugepage[idx], csize);
+	return;
+}
+
+void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
+				       struct mem_cgroup *memcg)
+{
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (mem_cgroup_disabled())
+		return;
+
+	if (!mem_cgroup_is_root(memcg))
+		res_counter_uncharge(&memcg->hugepage[idx], csize);
+	return;
+}
+#else
+static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
+{
+	return 0;
+}
+#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
+
 /*
  * Before starting migration, account PAGE_SIZE to mem_cgroup that the old
  * page belongs to.
@@ -4887,6 +5013,7 @@ err_cleanup:
 static struct cgroup_subsys_state * __ref
 mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 {
+	int idx;
 	struct mem_cgroup *memcg, *parent;
 	long error = -ENOMEM;
 	int node;
@@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 		 * mem_cgroup(see mem_cgroup_put).
 		 */
 		mem_cgroup_get(parent);
+		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
+			res_counter_init(&memcg->hugepage[idx],
+					 &parent->hugepage[idx]);
 	} else {
 		res_counter_init(&memcg->res, NULL);
 		res_counter_init(&memcg->memsw, NULL);
+		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
+			res_counter_init(&memcg->hugepage[idx], NULL);
 	}
 	memcg->last_scanned_node = MAX_NUMNODES;
 	INIT_LIST_HEAD(&memcg->oom_notify);
@@ -4951,6 +5083,12 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
 					struct cgroup *cont)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
+	/*
+	 * Don't allow memcg removal if we have HugeTLB resource
+	 * usage.
+	 */
+	if (mem_cgroup_have_hugetlb_usage(memcg))
+		return -EBUSY;
 
 	return mem_cgroup_force_empty(memcg, false);
 }
--
1.7.9

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values
Date: Fri, 16 Mar 2012 23:09:22 +0530
Message-ID: <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

Using VM_FAULT_* codes with ERR_PTR will require us to make sure
VM_FAULT_* values will not exceed MAX_ERRNO value.

Signed-off-by: Aneesh Kumar K.V --- mm/hugetlb.c | 18 +++++++++++++----- 1 files changed, 13 insertions(+), 5 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d623e71..3782da8 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1034,10 +1034,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, */ chg = vma_needs_reservation(h, vma, addr); if (chg < 0) - return ERR_PTR(-VM_FAULT_OOM); + return ERR_PTR(-ENOMEM); if (chg) if (hugetlb_get_quota(inode->i_mapping, chg)) - return ERR_PTR(-VM_FAULT_SIGBUS); + return ERR_PTR(-ENOSPC); spin_lock(&hugetlb_lock); page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve); @@ -1047,7 +1047,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, page = alloc_buddy_huge_page(h, NUMA_NO_NODE); if (!page) { hugetlb_put_quota(inode->i_mapping, chg); - return ERR_PTR(-VM_FAULT_SIGBUS); + return ERR_PTR(-ENOSPC); } } @@ -2395,6 +2395,7 @@ retry_avoidcopy: new_page = alloc_huge_page(vma, address, outside_reserve); if (IS_ERR(new_page)) { + int err = PTR_ERR(new_page); page_cache_release(old_page); /* @@ -2424,7 +2425,10 @@ retry_avoidcopy: /* Caller expects lock to be held */ spin_lock(&mm->page_table_lock); - return -PTR_ERR(new_page); + if (err == -ENOMEM) + return VM_FAULT_OOM; + else + return VM_FAULT_SIGBUS; } /* @@ -2542,7 +2546,11 @@ retry: goto out; page = alloc_huge_page(vma, address, 0); if (IS_ERR(page)) { - ret = -PTR_ERR(page); + ret = PTR_ERR(page); + if (ret == -ENOMEM) + ret = VM_FAULT_OOM; + else + ret = VM_FAULT_SIGBUS; goto out; } clear_huge_page(page, address, pages_per_huge_page(h)); -- 1.7.9 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Date: Fri, 16 Mar 2012 23:09:25 +0530 Message-ID: <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Return-path: In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" This adds necessary charge/uncharge calls in the HugeTLB code Acked-by: Hillf Danton Signed-off-by: Aneesh Kumar K.V --- mm/hugetlb.c | 21 ++++++++++++++++++++- mm/memcontrol.c | 5 +++++ 2 files changed, 25 insertions(+), 1 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c672187..91361a0 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -21,6 +21,8 @@ #include #include #include +#include +#include #include #include @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page) BUG_ON(page_mapcount(page)); INIT_LIST_HEAD(&page->lru); + if (mapping) + mem_cgroup_hugetlb_uncharge_page(hstate_index(h), + pages_per_huge_page(h), page); spin_lock(&hugetlb_lock); if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { update_and_free_page(h, page); @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h, static struct page *alloc_huge_page(struct vm_area_struct *vma, unsigned long addr, int avoid_reserve) { + int ret, idx; struct hstate *h = hstate_vma(vma); struct page *page; + 
struct mem_cgroup *memcg = NULL; struct address_space *mapping = vma->vm_file->f_mapping; struct inode *inode = mapping->host; long chg; + idx = hstate_index(h); /* * Processes that did not create the mapping will have no reserves and * will not have accounted against quota. Check that the quota can be @@ -1039,6 +1047,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, if (hugetlb_get_quota(inode->i_mapping, chg)) return ERR_PTR(-ENOSPC); + ret = mem_cgroup_hugetlb_charge_page(idx, pages_per_huge_page(h), + &memcg); + if (ret) { + hugetlb_put_quota(inode->i_mapping, chg); + return ERR_PTR(-ENOSPC); + } spin_lock(&hugetlb_lock); page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve); spin_unlock(&hugetlb_lock); @@ -1046,6 +1060,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, if (!page) { page = alloc_buddy_huge_page(h, NUMA_NO_NODE); if (!page) { + mem_cgroup_hugetlb_uncharge_memcg(idx, + pages_per_huge_page(h), + memcg); hugetlb_put_quota(inode->i_mapping, chg); return ERR_PTR(-ENOSPC); } @@ -1054,7 +1071,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, set_page_private(page, (unsigned long) mapping); vma_commit_reservation(h, vma, addr); - + /* update page cgroup details */ + mem_cgroup_hugetlb_commit_charge(idx, pages_per_huge_page(h), + memcg, page); return page; } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4b36c5e..7a9ea94 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2901,6 +2901,11 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) if (PageSwapCache(page)) return NULL; + /* + * HugeTLB page uncharge happen in the HugeTLB compound page destructor + */ + if (PageHuge(page)) + return NULL; if (PageTransHuge(page)) { nr_pages <<= compound_order(page); -- 1.7.9 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs Date: Fri, 16 Mar 2012 23:09:27 +0530 Message-ID: <1331919570-2264-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Return-path: In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" This add control files for hugetlbfs in memcg Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb.h | 17 +++++++++++++++ include/linux/memcontrol.h | 7 ++++++ mm/hugetlb.c | 25 ++++++++++++++++++++++- mm/memcontrol.c | 48 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 96 insertions(+), 1 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 1f70068..cbd8dc5 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -4,6 +4,7 @@ #include #include #include +#include struct ctl_table; struct user_struct; @@ -220,6 +221,12 @@ struct hstate { unsigned int nr_huge_pages_node[MAX_NUMNODES]; unsigned int free_huge_pages_node[MAX_NUMNODES]; unsigned int surplus_huge_pages_node[MAX_NUMNODES]; + /* mem cgroup control files */ +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB + struct cftype cgroup_limit_file; + struct cftype cgroup_usage_file; + struct cftype cgroup_max_usage_file; +#endif char name[HSTATE_NAME_LEN]; }; @@ -338,4 +345,14 @@ static inline unsigned int pages_per_huge_page(struct hstate 
*h) #define hstate_index(h) 0 #endif +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB +extern int register_hugetlb_memcg_files(struct cgroup *cgroup, + struct cgroup_subsys *ss); +#else +static inline int register_hugetlb_memcg_files(struct cgroup *cgroup, + struct cgroup_subsys *ss) +{ + return 0; +} +#endif #endif /* _LINUX_HUGETLB_H */ diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 320dbad..73900b9 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -440,6 +440,7 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, struct page *page); extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, struct mem_cgroup *memcg); +extern int mem_cgroup_hugetlb_file_init(int idx); #else static inline int @@ -470,6 +471,12 @@ mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, { return; } + +static inline int mem_cgroup_hugetlb_file_init(int idx) +{ + return 0; +} + #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ #endif /* _LINUX_MEMCONTROL_H */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 91361a0..684849a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1819,6 +1819,29 @@ static int __init hugetlb_init(void) } module_init(hugetlb_init); +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB +int register_hugetlb_memcg_files(struct cgroup *cgroup, + struct cgroup_subsys *ss) +{ + int ret = 0; + struct hstate *h; + + for_each_hstate(h) { + ret = cgroup_add_file(cgroup, ss, &h->cgroup_limit_file); + if (ret) + return ret; + ret = cgroup_add_file(cgroup, ss, &h->cgroup_usage_file); + if (ret) + return ret; + ret = cgroup_add_file(cgroup, ss, &h->cgroup_max_usage_file); + if (ret) + return ret; + + } + return ret; +} +#endif + /* Should be called on processing a hugepagesz=... 
option */ void __init hugetlb_add_hstate(unsigned order) { @@ -1842,7 +1865,7 @@ void __init hugetlb_add_hstate(unsigned order) h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", huge_page_size(h)/1024); - + mem_cgroup_hugetlb_file_init(hugetlb_max_hstate - 1); parsed_hstate = h; } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index d8b3513..4900b72 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5123,6 +5123,51 @@ static void mem_cgroup_destroy(struct cgroup_subsys *ss, mem_cgroup_put(memcg); } +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB +static char *mem_fmt(char *buf, unsigned long n) +{ + if (n >= (1UL << 30)) + sprintf(buf, "%luGB", n >> 30); + else if (n >= (1UL << 20)) + sprintf(buf, "%luMB", n >> 20); + else + sprintf(buf, "%luKB", n >> 10); + return buf; +} + +int mem_cgroup_hugetlb_file_init(int idx) +{ + char buf[32]; + struct cftype *cft; + struct hstate *h = &hstates[idx]; + + /* format the size */ + mem_fmt(buf, huge_page_size(h)); + + /* Add the limit file */ + cft = &h->cgroup_limit_file; + snprintf(cft->name, MAX_CFTYPE_NAME, "hugetlb.%s.limit_in_bytes", buf); + cft->private = __MEMFILE_PRIVATE(idx, _MEMHUGETLB, RES_LIMIT); + cft->read_u64 = mem_cgroup_read; + cft->write_string = mem_cgroup_write; + + /* Add the usage file */ + cft = &h->cgroup_usage_file; + snprintf(cft->name, MAX_CFTYPE_NAME, "hugetlb.%s.usage_in_bytes", buf); + cft->private = __MEMFILE_PRIVATE(idx, _MEMHUGETLB, RES_USAGE); + cft->read_u64 = mem_cgroup_read; + + /* Add the MAX usage file */ + cft = &h->cgroup_max_usage_file; + snprintf(cft->name, MAX_CFTYPE_NAME, "hugetlb.%s.max_usage_in_bytes", buf); + cft->private = __MEMFILE_PRIVATE(idx, _MEMHUGETLB, RES_MAX_USAGE); + cft->trigger = mem_cgroup_reset; + cft->read_u64 = mem_cgroup_read; + + return 0; +} +#endif + static int mem_cgroup_populate(struct cgroup_subsys *ss, struct cgroup *cont) { @@ -5137,6 +5182,9 @@ static int mem_cgroup_populate(struct cgroup_subsys 
*ss, if (!ret) ret = register_kmem_files(cont, ss); + if (!ret) + ret = register_hugetlb_memcg_files(cont, ss); + return ret; } -- 1.7.9 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: [PATCH -V4 06/10] memcg: track resource index in cftype private Date: Fri, 16 Mar 2012 23:09:26 +0530 Message-ID: <1331919570-2264-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Return-path: In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" This helps in using same memcg callbacks for non reclaim resource control files. Signed-off-by: Aneesh Kumar K.V --- mm/memcontrol.c | 27 +++++++++++++++++++++------ 1 files changed, 21 insertions(+), 6 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 7a9ea94..d8b3513 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -358,9 +358,14 @@ enum charge_type { #define _MEM (0) #define _MEMSWAP (1) #define _OOM_TYPE (2) -#define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val)) -#define MEMFILE_TYPE(val) (((val) >> 16) & 0xffff) -#define MEMFILE_ATTR(val) ((val) & 0xffff) +#define _MEMHUGETLB (3) + +/* 0 ... val ...16.... 
x...24...idx...32*/ +#define __MEMFILE_PRIVATE(idx, x, val) (((idx) << 24) | ((x) << 16) | (val)) +#define MEMFILE_PRIVATE(x, val) __MEMFILE_PRIVATE(0, x, val) +#define MEMFILE_TYPE(val) (((val) >> 16) & 0xff) +#define MEMFILE_IDX(val) (((val) >> 24) & 0xff) +#define MEMFILE_ATTR(val) ((val) & 0xffff) /* Used for OOM nofiier */ #define OOM_CONTROL (0) @@ -3954,7 +3959,7 @@ static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft) { struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); u64 val; - int type, name; + int type, name, idx; type = MEMFILE_TYPE(cft->private); name = MEMFILE_ATTR(cft->private); @@ -3971,6 +3976,10 @@ static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft) else val = res_counter_read_u64(&memcg->memsw, name); break; + case _MEMHUGETLB: + idx = MEMFILE_IDX(cft->private); + val = res_counter_read_u64(&memcg->hugepage[idx], name); + break; default: BUG(); break; @@ -4003,7 +4012,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft, break; if (type == _MEM) ret = mem_cgroup_resize_limit(memcg, val); - else + else if (type == _MEMHUGETLB) { + int idx = MEMFILE_IDX(cft->private); + ret = res_counter_set_limit(&memcg->hugepage[idx], val); + } else ret = mem_cgroup_resize_memsw_limit(memcg, val); break; case RES_SOFT_LIMIT: @@ -4067,7 +4079,10 @@ static int mem_cgroup_reset(struct cgroup *cont, unsigned int event) case RES_MAX_USAGE: if (type == _MEM) res_counter_reset_max(&memcg->res); - else + else if (type == _MEMHUGETLB) { + int idx = MEMFILE_IDX(event); + res_counter_reset_max(&memcg->hugepage[idx]); + } else res_counter_reset_max(&memcg->memsw); break; case RES_FAILCNT: -- 1.7.9 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages Date: Fri, 16 Mar 2012 23:09:28 +0530 Message-ID: <1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Return-path: In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" hugepage_activelist will be used to track the HugeTLB pages that are currently in use. We need to find the in-use HugeTLB pages to support memcg removal: on memcg removal we update each page's memory cgroup to point to the parent cgroup.
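The free-list/active-list bookkeeping described above can be sketched in userspace C. This is only an illustration under stated assumptions: every name below (list_node, demo_hstate, demo_enqueue, demo_dequeue, ...) is hypothetical and stands in for the kernel's list_head API and struct hstate; locking (hugetlb_lock) and per-node free lists are left out.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, standing in for struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
	n->prev = n->next = n;
}

static int list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

/* Unlink a node; harmless on a self-linked (freshly initialised) node. */
static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Same idea as list_move(): unlink, then add right after the head. */
static void list_move_to(struct list_node *n, struct list_node *head)
{
	list_unlink(n);
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

struct demo_page {
	struct list_node lru;
};

struct demo_hstate {
	struct list_node freelist;	/* pages available for allocation */
	struct list_node activelist;	/* pages handed out and in use */
	unsigned long free_pages;
};

static void demo_hstate_init(struct demo_hstate *h)
{
	list_init(&h->freelist);
	list_init(&h->activelist);
	h->free_pages = 0;
}

/* Freeing: a page always goes (back) to the free list. */
static void demo_enqueue(struct demo_hstate *h, struct demo_page *p)
{
	list_move_to(&p->lru, &h->freelist);
	h->free_pages++;
}

/* Allocation: the page moves to the active list instead of leaving all lists. */
static struct demo_page *demo_dequeue(struct demo_hstate *h)
{
	struct list_node *n;

	if (list_is_empty(&h->freelist))
		return NULL;
	n = h->freelist.next;
	list_move_to(n, &h->activelist);
	h->free_pages--;
	return (struct demo_page *)((char *)n - offsetof(struct demo_page, lru));
}

static unsigned long demo_count(const struct list_node *head)
{
	const struct list_node *it;
	unsigned long n = 0;

	for (it = head->next; it != head; it = it->next)
		n++;
	return n;
}
```

Because a move unlinks the node first, a page must be self-linked before its first enqueue. That is presumably why the patch initialises page->lru once in prep_new_huge_page() rather than on every free.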
Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb.h | 1 + mm/hugetlb.c | 23 ++++++++++++++++++----- 2 files changed, 19 insertions(+), 5 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index cbd8dc5..6919100 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -217,6 +217,7 @@ struct hstate { unsigned long resv_huge_pages; unsigned long surplus_huge_pages; unsigned long nr_overcommit_huge_pages; + struct list_head hugepage_activelist; struct list_head hugepage_freelists[MAX_NUMNODES]; unsigned int nr_huge_pages_node[MAX_NUMNODES]; unsigned int free_huge_pages_node[MAX_NUMNODES]; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 684849a..8fd465d 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -433,7 +433,7 @@ void copy_huge_page(struct page *dst, struct page *src) static void enqueue_huge_page(struct hstate *h, struct page *page) { int nid = page_to_nid(page); - list_add(&page->lru, &h->hugepage_freelists[nid]); + list_move(&page->lru, &h->hugepage_freelists[nid]); h->free_huge_pages++; h->free_huge_pages_node[nid]++; } @@ -445,7 +445,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid) if (list_empty(&h->hugepage_freelists[nid])) return NULL; page = list_entry(h->hugepage_freelists[nid].next, struct page, lru); - list_del(&page->lru); + list_move(&page->lru, &h->hugepage_activelist); set_page_refcounted(page); h->free_huge_pages--; h->free_huge_pages_node[nid]--; @@ -542,13 +542,14 @@ static void free_huge_page(struct page *page) page->mapping = NULL; BUG_ON(page_count(page)); BUG_ON(page_mapcount(page)); - INIT_LIST_HEAD(&page->lru); if (mapping) mem_cgroup_hugetlb_uncharge_page(hstate_index(h), pages_per_huge_page(h), page); spin_lock(&hugetlb_lock); if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { + /* remove the page from active list */ + list_del(&page->lru); update_and_free_page(h, page); h->surplus_huge_pages--; h->surplus_huge_pages_node[nid]--; @@ -562,6 +563,7 
@@ static void free_huge_page(struct page *page) static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) { + INIT_LIST_HEAD(&page->lru); set_compound_page_dtor(page, free_huge_page); spin_lock(&hugetlb_lock); h->nr_huge_pages++; @@ -1861,6 +1863,7 @@ void __init hugetlb_add_hstate(unsigned order) h->free_huge_pages = 0; for (i = 0; i < MAX_NUMNODES; ++i) INIT_LIST_HEAD(&h->hugepage_freelists[i]); + INIT_LIST_HEAD(&h->hugepage_activelist); h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]); h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, page = pte_page(pte); if (pte_dirty(pte)) set_page_dirty(page); - list_add(&page->lru, &page_list); + + spin_lock(&hugetlb_lock); + list_move(&page->lru, &page_list); + spin_unlock(&hugetlb_lock); } spin_unlock(&mm->page_table_lock); flush_tlb_range(vma, start, end); mmu_notifier_invalidate_range_end(mm, start, end); list_for_each_entry_safe(page, tmp, &page_list, lru) { page_remove_rmap(page); - list_del(&page->lru); + /* + * We need to move it back huge page active list. If we are + * holding the last reference, below put_page will move it + * back to free list. + */ + spin_lock(&hugetlb_lock); + list_move(&page->lru, &h->hugepage_activelist); + spin_unlock(&hugetlb_lock); put_page(page); } } -- 1.7.9 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal Date: Fri, 16 Mar 2012 23:09:29 +0530 Message-ID: <1331919570-2264-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Return-path: In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" This adds support for memcg removal with HugeTLB resource usage.
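The ordering of the uncharge and charge steps in this patch can be sketched with a simplified counter. This is a userspace illustration only: demo_counter, demo_charge, demo_uncharge and demo_move_to_parent are hypothetical names, the counter is assumed to propagate charges to its ancestors the way a hierarchical res_counter does, and the patch's special case of not charging the root cgroup, as well as all locking, is omitted.

```c
#include <errno.h>
#include <stddef.h>

struct demo_counter {
	unsigned long usage;
	unsigned long limit;
	/* Non-NULL only when accounting is hierarchical. */
	struct demo_counter *parent;
};

/* Charge this counter and every ancestor; roll back if any limit is hit. */
static int demo_charge(struct demo_counter *c, unsigned long val)
{
	struct demo_counter *it, *undo;

	for (it = c; it; it = it->parent) {
		if (it->usage + val > it->limit)
			goto rollback;
		it->usage += val;
	}
	return 0;
rollback:
	for (undo = c; undo != it; undo = undo->parent)
		undo->usage -= val;
	return -ENOMEM;
}

/* Uncharge this counter and every ancestor. */
static void demo_uncharge(struct demo_counter *c, unsigned long val)
{
	for (; c; c = c->parent)
		c->usage -= val;
}

/*
 * Move csize bytes of charge from child to parent.
 *
 * Hierarchical case: the parent already carries the child's usage, so
 * uncharging the child first frees exactly the room the parent charge
 * needs and the charge cannot hit the parent's limit.
 *
 * Flat case: the parent has never seen this usage, so charge it first
 * and only drop the child's charge once that succeeded.
 */
static int demo_move_to_parent(struct demo_counter *child,
			       struct demo_counter *parent,
			       int use_hierarchy, unsigned long csize)
{
	if (use_hierarchy) {
		demo_uncharge(child, csize);
		return demo_charge(parent, csize);
	}
	if (demo_charge(parent, csize))
		return -EBUSY;
	demo_uncharge(child, csize);
	return 0;
}
```

In the flat case a full parent makes the move fail with -EBUSY and leaves the child's usage untouched, which matches the comment in hugetlb_force_memcg_empty() that failure should only happen when use_hierarchy is not set.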
Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb.h | 6 ++++ include/linux/memcontrol.h | 15 +++++++++- mm/hugetlb.c | 41 ++++++++++++++++++++++++++ mm/memcontrol.c | 68 +++++++++++++++++++++++++++++++++++++------ 4 files changed, 119 insertions(+), 11 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 6919100..32e948c 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -349,11 +349,17 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) #ifdef CONFIG_MEM_RES_CTLR_HUGETLB extern int register_hugetlb_memcg_files(struct cgroup *cgroup, struct cgroup_subsys *ss); +extern int hugetlb_force_memcg_empty(struct cgroup *cgroup); #else static inline int register_hugetlb_memcg_files(struct cgroup *cgroup, struct cgroup_subsys *ss) { return 0; } + +static inline int hugetlb_force_memcg_empty(struct cgroup *cgroup) +{ + return 0; +} #endif #endif /* _LINUX_HUGETLB_H */ diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 73900b9..0980122 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -441,7 +441,9 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, struct mem_cgroup *memcg); extern int mem_cgroup_hugetlb_file_init(int idx); - +extern int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, + struct page *page); +extern bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup); #else static inline int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, @@ -477,6 +479,17 @@ static inline int mem_cgroup_hugetlb_file_init(int idx) return 0; } +static inline int +mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, + struct page *page) +{ + return 0; +} + +static inline bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup) +{ + return 0; +} #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ #endif /* _LINUX_MEMCONTROL_H */ diff 
--git a/mm/hugetlb.c b/mm/hugetlb.c index 8fd465d..685f0d5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1842,6 +1842,47 @@ int register_hugetlb_memcg_files(struct cgroup *cgroup, } return ret; } + +/* + * Force the memcg to empty the hugetlb resources by moving them to + * the parent cgroup. We can fail if the parent cgroup's limit prevented + * the charging. This should only happen if use_hierarchy is not set. + */ +int hugetlb_force_memcg_empty(struct cgroup *cgroup) +{ + struct hstate *h; + struct page *page; + int ret = 0, idx = 0; + + do { + if (cgroup_task_count(cgroup) || !list_empty(&cgroup->children)) + goto out; + /* + * If the task doing the cgroup_rmdir got a signal + * we don't really need to loop till the hugetlb resource + * usage become zero. + */ + if (signal_pending(current)) { + ret = -EINTR; + goto out; + } + for_each_hstate(h) { + spin_lock(&hugetlb_lock); + list_for_each_entry(page, &h->hugepage_activelist, lru) { + ret = mem_cgroup_move_hugetlb_parent(idx, cgroup, page); + if (ret) { + spin_unlock(&hugetlb_lock); + goto out; + } + } + spin_unlock(&hugetlb_lock); + idx++; + } + cond_resched(); + } while (mem_cgroup_have_hugetlb_usage(cgroup)); +out: + return ret; +} #endif /* Should be called on processing a hugepagesz=... 
option */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4900b72..e29d86d 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3171,9 +3171,11 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, #endif #ifdef CONFIG_MEM_RES_CTLR_HUGETLB -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) +bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup) { int idx; + struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup); + for (idx = 0; idx < hugetlb_max_hstate; idx++) { if (memcg->hugepage[idx].usage > 0) return 1; @@ -3285,10 +3287,57 @@ void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, res_counter_uncharge(&memcg->hugepage[idx], csize); return; } -#else -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) + +int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, + struct page *page) { - return 0; + struct page_cgroup *pc; + int csize, ret = 0; + struct res_counter *fail_res; + struct cgroup *pcgrp = cgroup->parent; + struct mem_cgroup *parent = mem_cgroup_from_cont(pcgrp); + struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup); + + if (!get_page_unless_zero(page)) + goto out; + + pc = lookup_page_cgroup(page); + lock_page_cgroup(pc); + if (!PageCgroupUsed(pc) || pc->mem_cgroup != memcg) + goto err_out; + + csize = PAGE_SIZE << compound_order(page); + /* + * uncharge from child and charge the parent. If we have + * use_hierarchy set, we can never fail here. In-order to make + * sure we don't get -ENOMEM on parent charge, we first uncharge + * the child and then charge the parent. 
+ */ + if (parent->use_hierarchy) { + res_counter_uncharge(&memcg->hugepage[idx], csize); + if (!mem_cgroup_is_root(parent)) + ret = res_counter_charge(&parent->hugepage[idx], + csize, &fail_res); + } else { + if (!mem_cgroup_is_root(parent)) { + ret = res_counter_charge(&parent->hugepage[idx], + csize, &fail_res); + if (ret) { + ret = -EBUSY; + goto err_out; + } + } + res_counter_uncharge(&memcg->hugepage[idx], csize); + } + /* + * caller should have done css_get + */ + pc->mem_cgroup = parent; +err_out: + unlock_page_cgroup(pc); + put_page(page); +out: + return ret; } #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ @@ -3806,6 +3855,11 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all) /* should free all ? */ if (free_all) goto try_to_free; + + /* move the hugetlb charges */ + ret = hugetlb_force_memcg_empty(cgrp); + if (ret) + goto out; move_account: do { ret = -EBUSY; @@ -5103,12 +5157,6 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss, struct cgroup *cont) { struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); - /* - * Don't allow memcg removal if we have HugeTLB resource - * usage. - */ - if (mem_cgroup_have_hugetlb_usage(memcg)) - return -EBUSY; return mem_cgroup_force_empty(memcg, false); } -- 1.7.9 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: [PATCH -V4 10/10] memcg: Add memory controller documentation for hugetlb management Date: Fri, 16 Mar 2012 23:09:30 +0530 Message-ID: <1331919570-2264-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Return-path: In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" Signed-off-by: Aneesh Kumar K.V --- Documentation/cgroups/memory.txt | 29 +++++++++++++++++++++++++++++ 1 files changed, 29 insertions(+), 0 deletions(-) diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 4c95c00..d99c41b 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -43,6 +43,7 @@ Features: - usage threshold notifier - oom-killer disable knob and oom-notifier - Root cgroup has no limit controls. + - resource accounting for HugeTLB pages Kernel memory support is work in progress, and the current version provides basically functionality. (See Section 2.7) @@ -75,6 +76,12 @@ Brief summary of control files. 
memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation + + memory.hugetlb..limit_in_bytes # set/show limit of "hugepagesize" hugetlb usage + memory.hugetlb..max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded + memory.hugetlb..usage_in_bytes # show current res_counter usage for "hugepagesize" hugetlb + # see 5.7 for details + 1. History The memory controller has a long history. A request for comments for the memory @@ -279,6 +286,15 @@ per cgroup, instead of globally. * tcp memory pressure: sockets memory pressure for the tcp protocol. +2.8 HugeTLB extension + +This extension allows to limit the HugeTLB usage per control group and +enforces the controller limit during page fault. Since HugeTLB doesn't +support page reclaim, enforcing the limit at page fault time implies that, +the application will get SIGBUS signal if it tries to access HugeTLB pages +beyond its limit. This requires the application to know beforehand how much +HugeTLB pages it would require for its use. + 3. User Interface 0. Configuration @@ -287,6 +303,7 @@ a. Enable CONFIG_CGROUPS b. Enable CONFIG_RESOURCE_COUNTERS c. Enable CONFIG_CGROUP_MEM_RES_CTLR d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) +f. Enable CONFIG_MEM_RES_CTLR_HUGETLB (to use HugeTLB extension) 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) # mount -t tmpfs none /sys/fs/cgroup @@ -510,6 +527,18 @@ unevictable= N0= N1= ... And we have total = file + anon + unevictable. +5.7 HugeTLB resource control files +For a system supporting two hugepage size (16M and 16G) the control +files include: + + memory.hugetlb.16GB.limit_in_bytes + memory.hugetlb.16GB.max_usage_in_bytes + memory.hugetlb.16GB.usage_in_bytes + memory.hugetlb.16MB.limit_in_bytes + memory.hugetlb.16MB.max_usage_in_bytes + memory.hugetlb.16MB.usage_in_bytes + + 6. 
Hierarchy support The memory controller supports a deep hierarchy and hierarchical accounting. -- 1.7.9 From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH -V4 01/10] hugetlb: rename max_hstate to hugetlb_max_hstate Date: Mon, 19 Mar 2012 11:07:00 +0900 Message-ID: <4F6694C4.2090800@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1331919570-2264-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > We will be using this from other subsystems like memcg > in later patches. > > Acked-by: Hillf Danton > Signed-off-by: Aneesh Kumar K.V Reviewed-by: KAMEZAWA Hiroyuki
From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values Date: Mon, 19 Mar 2012 11:11:56 +0900 Message-ID: <4F6695EC.2060208@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1331919570-2264-3-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Using VM_FAULT_* codes with ERR_PTR will require us to make sure > VM_FAULT_* values will not exceed MAX_ERRNO value. > > Signed-off-by: Aneesh Kumar K.V Is this a bug fix ?
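The constraint named in the quoted changelog can be illustrated in userspace. The sketch below assumes the encoding used by the kernel's ERR_PTR()/IS_ERR() helpers, where only the top MAX_ERRNO (4095) values of the address space count as errors; the demo_* names are hypothetical re-implementations for illustration, not the kernel functions themselves.

```c
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative error code in a pointer value. */
static void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* A pointer is treated as an error only if it lies in the top 4095 values. */
static int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-(intptr_t)DEMO_MAX_ERRNO;
}

/* Recover the error code from an error pointer. */
static long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}
```

A code of -2 or -4095 round-trips and is recognised as an error, but -4096 already falls outside the window and would be mistaken for a valid pointer. Any value passed through this encoding, including a negated VM_FAULT_* bit, therefore has to stay at or below MAX_ERRNO, which is the dependency the patch removes.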
Reviewed-by: KAMEZAWA Hiroyuki From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH -V4 03/10] hugetlbfs: Add an inline helper for finding hstate index Date: Mon, 19 Mar 2012 11:15:30 +0900 Message-ID: <4F6696C2.4020203@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1331919570-2264-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Add and inline helper and use it in the code. > > Signed-off-by: Aneesh Kumar K.V Reviewed-by: KAMEZAWA Hiroyuki -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Date: Mon, 19 Mar 2012 11:38:38 +0900 Message-ID: <4F669C2E.1010502@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1331919570-2264-5-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This patch implements a memcg extension that allows us to control > HugeTLB allocations via memory controller. > If you write some details here, it will be helpful for review and seeing log after merge. 
> Signed-off-by: Aneesh Kumar K.V > --- > include/linux/hugetlb.h | 1 + > include/linux/memcontrol.h | 42 +++++++++++++ > init/Kconfig | 8 +++ > mm/hugetlb.c | 2 +- > mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 190 insertions(+), 1 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index a2675b0..1f70068 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -243,6 +243,7 @@ struct hstate *size_to_hstate(unsigned long size); > #define HUGE_MAX_HSTATE 1 > #endif > > +extern int hugetlb_max_hstate; > extern struct hstate hstates[HUGE_MAX_HSTATE]; > extern unsigned int default_hstate_idx; > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 4d34356..320dbad 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -429,5 +429,47 @@ static inline void sock_release_memcg(struct sock *sk) > { > } > #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */ > + > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > +extern int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > + struct mem_cgroup **ptr); > +extern void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages, > + struct mem_cgroup *memcg, > + struct page *page); > +extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, > + struct page *page); > +extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > + struct mem_cgroup *memcg); > + > +#else > +static inline int > +mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > + struct mem_cgroup **ptr) > +{ > + return 0; > +} > + > +static inline void > +mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages, > + struct mem_cgroup *memcg, > + struct page *page) > +{ > + return; > +} > + > +static inline void > +mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, > + struct page *page) > +{ > + return; > +} > + > +static inline void > 
+mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > + struct mem_cgroup *memcg) > +{ > + return; > +} > +#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ > #endif /* _LINUX_MEMCONTROL_H */ > > diff --git a/init/Kconfig b/init/Kconfig > index 3f42cd6..f0eb8aa 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -725,6 +725,14 @@ config CGROUP_PERF > > Say N if unsure. > > +config MEM_RES_CTLR_HUGETLB > + bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)" > + depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL > + default n > + help > + Add HugeTLB management to memory resource controller. When you > + enable this, you can put a per cgroup limit on HugeTLB usage. > + > menuconfig CGROUP_SCHED > bool "Group CPU scheduler" > default n > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index ebe245c..c672187 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; > static gfp_t htlb_alloc_mask = GFP_HIGHUSER; > unsigned long hugepages_treat_as_movable; > > -static int hugetlb_max_hstate; > +int hugetlb_max_hstate; > unsigned int default_hstate_idx; > struct hstate hstates[HUGE_MAX_HSTATE]; > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 6728a7a..4b36c5e 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -235,6 +235,10 @@ struct mem_cgroup { > */ > struct res_counter memsw; > /* > + * the counter to account for hugepages from hugetlb. > + */ > + struct res_counter hugepage[HUGE_MAX_HSTATE]; > + /* > * Per cgroup active and inactive list, similar to the > * per zone LRU lists. 
> */ > @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, > } > #endif > > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > +{ > + int idx; > + for (idx = 0; idx < hugetlb_max_hstate; idx++) { > + if (memcg->hugepage[idx].usage > 0) > + return 1; > + } > + return 0; > +} Please use res_counter_read_u64() rather than reading the value directly. > + > +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > + struct mem_cgroup **ptr) > +{ > + int ret = 0; > + struct mem_cgroup *memcg; > + struct res_counter *fail_res; > + unsigned long csize = nr_pages * PAGE_SIZE; > + > + if (mem_cgroup_disabled()) > + return 0; > +again: > + rcu_read_lock(); > + memcg = mem_cgroup_from_task(current); > + if (!memcg) > + memcg = root_mem_cgroup; > + if (mem_cgroup_is_root(memcg)) { > + rcu_read_unlock(); > + goto done; > + } One concern is.... Now, yes, memory cgroup doesn't account root cgroup and doesn't update res->usage to avoid updating shared counter overheads when memcg is not mounted. But memory.usage_in_bytes files works for root memcg with reading percpu statistics. So, how about counting usage for root cgroup even if it cannot be limited ? Considering hugetlb fs usage, updating res_counter here doesn't have performance problem of false sharing.. Then, you can remove root_mem_cgroup() checks inserted several places. > struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); > + /* > + * Don't allow memcg removal if we have HugeTLB resource > + * usage. > + */ > + if (mem_cgroup_have_hugetlb_usage(memcg)) > + return -EBUSY; > > return mem_cgroup_force_empty(memcg, false); > } Is this fixed by patch 8+9 ? 
Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Date: Mon, 19 Mar 2012 11:41:07 +0900 Message-ID: <4F669CC3.9070007@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1331919570-2264-6-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This adds necessary charge/uncharge calls in the HugeTLB code > > Acked-by: Hillf Danton > Signed-off-by: Aneesh Kumar K.V Reviewed-by: KAMEZAWA Hiroyuki A nitpick below. 
> --- > mm/hugetlb.c | 21 ++++++++++++++++++++- > mm/memcontrol.c | 5 +++++ > 2 files changed, 25 insertions(+), 1 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index c672187..91361a0 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -21,6 +21,8 @@ > #include > #include > #include > +#include > +#include > > #include > #include > @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page) > BUG_ON(page_mapcount(page)); > INIT_LIST_HEAD(&page->lru); > > + if (mapping) > + mem_cgroup_hugetlb_uncharge_page(hstate_index(h), > + pages_per_huge_page(h), page); > spin_lock(&hugetlb_lock); > if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { > update_and_free_page(h, page); > @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h, > static struct page *alloc_huge_page(struct vm_area_struct *vma, > unsigned long addr, int avoid_reserve) > { > + int ret, idx; > struct hstate *h = hstate_vma(vma); > struct page *page; > + struct mem_cgroup *memcg = NULL; Can't we this initialization in mem_cgroup_hugetlb_charge_page() ? 
Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH -V4 06/10] memcg: track resource index in cftype private Date: Mon, 19 Mar 2012 11:43:34 +0900 Message-ID: <4F669D56.4080002@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1331919570-2264-7-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This helps in using same memcg callbacks for non reclaim resource > control files. > > Signed-off-by: Aneesh Kumar K.V Acked-by: KAMEZAWA Hiroyuki As mentioned, I'd be glad if you could handle usage_in_bytes for the root memcg.
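The cftype private encoding that patch 06 introduces (attribute in bits 0-15, resource type in bits 16-23, hstate index in bits 24-31) can be checked with a small userspace sketch. The DEMO_* macros mirror the layout of __MEMFILE_PRIVATE, MEMFILE_TYPE, MEMFILE_IDX and MEMFILE_ATTR from the patch but are renamed here; the uint32_t casts and the DEMO_RES_LIMIT placeholder are assumptions made for this illustration.

```c
#include <stdint.h>

/* Resource types, same numbering as _MEM.._MEMHUGETLB in the patch. */
#define DEMO_MEM	0
#define DEMO_MEMSWAP	1
#define DEMO_OOM_TYPE	2
#define DEMO_MEMHUGETLB	3

/* Placeholder attribute value; the real RES_* enum lives in res_counter.h. */
#define DEMO_RES_LIMIT	2

/* bits 0-15: attribute, bits 16-23: type, bits 24-31: hstate index */
#define DEMO_PRIVATE_IDX(idx, x, val) \
	(((uint32_t)(idx) << 24) | ((uint32_t)(x) << 16) | (uint32_t)(val))

/* Files that are not per-hstate keep working with index 0. */
#define DEMO_PRIVATE(x, val)	DEMO_PRIVATE_IDX(0, x, val)

#define DEMO_TYPE(val)	(((val) >> 16) & 0xff)
#define DEMO_IDX(val)	(((val) >> 24) & 0xff)
#define DEMO_ATTR(val)	((val) & 0xffff)

/* Build the private value for a HugeTLB control file of hstate idx. */
static uint32_t demo_hugetlb_private(unsigned int idx, unsigned int attr)
{
	return DEMO_PRIVATE_IDX(idx, DEMO_MEMHUGETLB, attr);
}
```

Narrowing the type mask from 0xffff to 0xff is what frees the top byte for the index. Existing files built with the two-argument macro still decode to the same type and attribute, with an index of 0.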
From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs Date: Mon, 19 Mar 2012 11:56:25 +0900 Message-ID: <4F66A059.20801@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1331919570-2264-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This add control files for hugetlbfs in memcg > > Signed-off-by: Aneesh Kumar K.V I have a question. When a user does 1. create memory cgroup as /cgroup/A 2. insmod hugetlb.ko 3. ls /cgroup/A and then, files can be shown ? Don't we have any problem at rmdir A ? I'm sorry if hugetlb never be used as module. a comment below. 
> --- > include/linux/hugetlb.h | 17 +++++++++++++++ > include/linux/memcontrol.h | 7 ++++++ > mm/hugetlb.c | 25 ++++++++++++++++++++++- > mm/memcontrol.c | 48 ++++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 96 insertions(+), 1 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 1f70068..cbd8dc5 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -4,6 +4,7 @@ > #include > #include > #include > +#include > > struct ctl_table; > struct user_struct; > @@ -220,6 +221,12 @@ struct hstate { > unsigned int nr_huge_pages_node[MAX_NUMNODES]; > unsigned int free_huge_pages_node[MAX_NUMNODES]; > unsigned int surplus_huge_pages_node[MAX_NUMNODES]; > + /* mem cgroup control files */ > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > + struct cftype cgroup_limit_file; > + struct cftype cgroup_usage_file; > + struct cftype cgroup_max_usage_file; > +#endif > char name[HSTATE_NAME_LEN]; > }; > > @@ -338,4 +345,14 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) > #define hstate_index(h) 0 > #endif > > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > +extern int register_hugetlb_memcg_files(struct cgroup *cgroup, > + struct cgroup_subsys *ss); > +#else > +static inline int register_hugetlb_memcg_files(struct cgroup *cgroup, > + struct cgroup_subsys *ss) > +{ > + return 0; > +} > +#endif > #endif /* _LINUX_HUGETLB_H */ > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 320dbad..73900b9 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -440,6 +440,7 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, > struct page *page); > extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > struct mem_cgroup *memcg); > +extern int mem_cgroup_hugetlb_file_init(int idx); > > #else > static inline int > @@ -470,6 +471,12 @@ mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > { > return; > } > + > 
+static inline int mem_cgroup_hugetlb_file_init(int idx) > +{ > + return 0; > +} > + > #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ > #endif /* _LINUX_MEMCONTROL_H */ > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 91361a0..684849a 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1819,6 +1819,29 @@ static int __init hugetlb_init(void) > } > module_init(hugetlb_init); > > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > +int register_hugetlb_memcg_files(struct cgroup *cgroup, > + struct cgroup_subsys *ss) > +{ > + int ret = 0; > + struct hstate *h; > + > + for_each_hstate(h) { > + ret = cgroup_add_file(cgroup, ss, &h->cgroup_limit_file); > + if (ret) > + return ret; > + ret = cgroup_add_file(cgroup, ss, &h->cgroup_usage_file); > + if (ret) > + return ret; > + ret = cgroup_add_file(cgroup, ss, &h->cgroup_max_usage_file); > + if (ret) > + return ret; > + > + } > + return ret; > +} > +#endif > + > /* Should be called on processing a hugepagesz=... option */ > void __init hugetlb_add_hstate(unsigned order) > { > @@ -1842,7 +1865,7 @@ void __init hugetlb_add_hstate(unsigned order) > h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); > snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", > huge_page_size(h)/1024); > - > + mem_cgroup_hugetlb_file_init(hugetlb_max_hstate - 1); > parsed_hstate = h; > } > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index d8b3513..4900b72 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -5123,6 +5123,51 @@ static void mem_cgroup_destroy(struct cgroup_subsys *ss, > mem_cgroup_put(memcg); > } > > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > +static char *mem_fmt(char *buf, unsigned long n) > +{ > + if (n >= (1UL << 30)) > + sprintf(buf, "%luGB", n >> 30); > + else if (n >= (1UL << 20)) > + sprintf(buf, "%luMB", n >> 20); > + else > + sprintf(buf, "%luKB", n >> 10); > + return buf; > +} > + > +int mem_cgroup_hugetlb_file_init(int idx) > +{ __init ? And... 
do we have guarantee that this function is called before creating root mem cgroup even if CONFIG_HUGETLBFS=y ? Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages Date: Mon, 19 Mar 2012 12:00:43 +0900 Message-ID: <4F66A15B.7070804@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1331919570-2264-9-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > hugepage_activelist will be used to track currently used HugeTLB pages. > We need to find the in-use HugeTLB pages to support memcg removal. > On memcg removal we update the page's memory cgroup to point to > parent cgroup. > > Signed-off-by: Aneesh Kumar K.V Reviewed-by: KAMEZAWA Hiroyuki seems ok to me but...why the new list is not per node ? no benefit ?
Thanks, -Kame > --- > include/linux/hugetlb.h | 1 + > mm/hugetlb.c | 23 ++++++++++++++++++----- > 2 files changed, 19 insertions(+), 5 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index cbd8dc5..6919100 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -217,6 +217,7 @@ struct hstate { > unsigned long resv_huge_pages; > unsigned long surplus_huge_pages; > unsigned long nr_overcommit_huge_pages; > + struct list_head hugepage_activelist; > struct list_head hugepage_freelists[MAX_NUMNODES]; > unsigned int nr_huge_pages_node[MAX_NUMNODES]; > unsigned int free_huge_pages_node[MAX_NUMNODES]; > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 684849a..8fd465d 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -433,7 +433,7 @@ void copy_huge_page(struct page *dst, struct page *src) > static void enqueue_huge_page(struct hstate *h, struct page *page) > { > int nid = page_to_nid(page); > - list_add(&page->lru, &h->hugepage_freelists[nid]); > + list_move(&page->lru, &h->hugepage_freelists[nid]); > h->free_huge_pages++; > h->free_huge_pages_node[nid]++; > } > @@ -445,7 +445,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid) > if (list_empty(&h->hugepage_freelists[nid])) > return NULL; > page = list_entry(h->hugepage_freelists[nid].next, struct page, lru); > - list_del(&page->lru); > + list_move(&page->lru, &h->hugepage_activelist); > set_page_refcounted(page); > h->free_huge_pages--; > h->free_huge_pages_node[nid]--; > @@ -542,13 +542,14 @@ static void free_huge_page(struct page *page) > page->mapping = NULL; > BUG_ON(page_count(page)); > BUG_ON(page_mapcount(page)); > - INIT_LIST_HEAD(&page->lru); > > if (mapping) > mem_cgroup_hugetlb_uncharge_page(hstate_index(h), > pages_per_huge_page(h), page); > spin_lock(&hugetlb_lock); > if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { > + /* remove the page from active list */ > + list_del(&page->lru); > 
update_and_free_page(h, page); > h->surplus_huge_pages--; > h->surplus_huge_pages_node[nid]--; > @@ -562,6 +563,7 @@ static void free_huge_page(struct page *page) > > static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) > { > + INIT_LIST_HEAD(&page->lru); > set_compound_page_dtor(page, free_huge_page); > spin_lock(&hugetlb_lock); > h->nr_huge_pages++; > @@ -1861,6 +1863,7 @@ void __init hugetlb_add_hstate(unsigned order) > h->free_huge_pages = 0; > for (i = 0; i < MAX_NUMNODES; ++i) > INIT_LIST_HEAD(&h->hugepage_freelists[i]); > + INIT_LIST_HEAD(&h->hugepage_activelist); > h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]); > h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); > snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", > @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, > page = pte_page(pte); > if (pte_dirty(pte)) > set_page_dirty(page); > - list_add(&page->lru, &page_list); > + > + spin_lock(&hugetlb_lock); > + list_move(&page->lru, &page_list); > + spin_unlock(&hugetlb_lock); > } > spin_unlock(&mm->page_table_lock); > flush_tlb_range(vma, start, end); > mmu_notifier_invalidate_range_end(mm, start, end); > list_for_each_entry_safe(page, tmp, &page_list, lru) { > page_remove_rmap(page); > - list_del(&page->lru); > + /* > + * We need to move it back huge page active list. If we are > + * holding the last reference, below put_page will move it > + * back to free list. 
> + */ > + spin_lock(&hugetlb_lock); > + list_move(&page->lru, &h->hugepage_activelist); > + spin_unlock(&hugetlb_lock); > put_page(page); > } > } From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal Date: Mon, 19 Mar 2012 12:04:56 +0900 Message-ID: <4F66A258.5060301@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1331919570-2264-10-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This add support for memcg removal with HugeTLB resource usage. > > Signed-off-by: Aneesh Kumar K.V seems ok for now. Now, Tejun and Costa, and I are discussing removing -EBUSY from rmdir(). We're now considering 'if use_hierarchy=false and parent seems full, reclaim all or move charges to the root cgroup.' then -EBUSY will go away. Is it acceptable for hugetlb ? Do you have another idea ?
Thanks, -Kame > --- > include/linux/hugetlb.h | 6 ++++ > include/linux/memcontrol.h | 15 +++++++++- > mm/hugetlb.c | 41 ++++++++++++++++++++++++++ > mm/memcontrol.c | 68 +++++++++++++++++++++++++++++++++++++------ > 4 files changed, 119 insertions(+), 11 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 6919100..32e948c 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -349,11 +349,17 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) > #ifdef CONFIG_MEM_RES_CTLR_HUGETLB > extern int register_hugetlb_memcg_files(struct cgroup *cgroup, > struct cgroup_subsys *ss); > +extern int hugetlb_force_memcg_empty(struct cgroup *cgroup); > #else > static inline int register_hugetlb_memcg_files(struct cgroup *cgroup, > struct cgroup_subsys *ss) > { > return 0; > } > + > +static inline int hugetlb_force_memcg_empty(struct cgroup *cgroup) > +{ > + return 0; > +} > #endif > #endif /* _LINUX_HUGETLB_H */ > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 73900b9..0980122 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -441,7 +441,9 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, > extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > struct mem_cgroup *memcg); > extern int mem_cgroup_hugetlb_file_init(int idx); > - > +extern int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, > + struct page *page); > +extern bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup); > #else > static inline int > mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > @@ -477,6 +479,17 @@ static inline int mem_cgroup_hugetlb_file_init(int idx) > return 0; > } > > +static inline int > +mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, > + struct page *page) > +{ > + return 0; > +} > + > +static inline bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup) > +{ 
> + return 0; > +} > #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ > #endif /* _LINUX_MEMCONTROL_H */ > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 8fd465d..685f0d5 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1842,6 +1842,47 @@ int register_hugetlb_memcg_files(struct cgroup *cgroup, > } > return ret; > } > + > +/* > + * Force the memcg to empty the hugetlb resources by moving them to > + * the parent cgroup. We can fail if the parent cgroup's limit prevented > + * the charging. This should only happen if use_hierarchy is not set. > + */ > +int hugetlb_force_memcg_empty(struct cgroup *cgroup) > +{ > + struct hstate *h; > + struct page *page; > + int ret = 0, idx = 0; > + > + do { > + if (cgroup_task_count(cgroup) || !list_empty(&cgroup->children)) > + goto out; > + /* > + * If the task doing the cgroup_rmdir got a signal > + * we don't really need to loop till the hugetlb resource > + * usage become zero. > + */ > + if (signal_pending(current)) { > + ret = -EINTR; > + goto out; > + } > + for_each_hstate(h) { > + spin_lock(&hugetlb_lock); > + list_for_each_entry(page, &h->hugepage_activelist, lru) { > + ret = mem_cgroup_move_hugetlb_parent(idx, cgroup, page); > + if (ret) { > + spin_unlock(&hugetlb_lock); > + goto out; > + } > + } > + spin_unlock(&hugetlb_lock); > + idx++; > + } > + cond_resched(); > + } while (mem_cgroup_have_hugetlb_usage(cgroup)); > +out: > + return ret; > +} > #endif > > /* Should be called on processing a hugepagesz=... 
option */ > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4900b72..e29d86d 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -3171,9 +3171,11 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, > #endif > > #ifdef CONFIG_MEM_RES_CTLR_HUGETLB > -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > +bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup) > { > int idx; > + struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup); > + > for (idx = 0; idx < hugetlb_max_hstate; idx++) { > if (memcg->hugepage[idx].usage > 0) > return 1; > @@ -3285,10 +3287,57 @@ void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > res_counter_uncharge(&memcg->hugepage[idx], csize); > return; > } > -#else > -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > + > +int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, > + struct page *page) > { > - return 0; > + struct page_cgroup *pc; > + int csize, ret = 0; > + struct res_counter *fail_res; > + struct cgroup *pcgrp = cgroup->parent; > + struct mem_cgroup *parent = mem_cgroup_from_cont(pcgrp); > + struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup); > + > + if (!get_page_unless_zero(page)) > + goto out; > + > + pc = lookup_page_cgroup(page); > + lock_page_cgroup(pc); > + if (!PageCgroupUsed(pc) || pc->mem_cgroup != memcg) > + goto err_out; > + > + csize = PAGE_SIZE << compound_order(page); > + /* > + * uncharge from child and charge the parent. If we have > + * use_hierarchy set, we can never fail here. In-order to make > + * sure we don't get -ENOMEM on parent charge, we first uncharge > + * the child and then charge the parent. 
> + */ > + if (parent->use_hierarchy) { > + res_counter_uncharge(&memcg->hugepage[idx], csize); > + if (!mem_cgroup_is_root(parent)) > + ret = res_counter_charge(&parent->hugepage[idx], > + csize, &fail_res); > + } else { > + if (!mem_cgroup_is_root(parent)) { > + ret = res_counter_charge(&parent->hugepage[idx], > + csize, &fail_res); > + if (ret) { > + ret = -EBUSY; > + goto err_out; > + } > + } > + res_counter_uncharge(&memcg->hugepage[idx], csize); > + } > + /* > + * caller should have done css_get > + */ > + pc->mem_cgroup = parent; > +err_out: > + unlock_page_cgroup(pc); > + put_page(page); > +out: > + return ret; > } > #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ > > @@ -3806,6 +3855,11 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all) > /* should free all ? */ > if (free_all) > goto try_to_free; > + > + /* move the hugetlb charges */ > + ret = hugetlb_force_memcg_empty(cgrp); > + if (ret) > + goto out; > move_account: > do { > ret = -EBUSY; > @@ -5103,12 +5157,6 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss, > struct cgroup *cont) > { > struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); > - /* > - * Don't allow memcg removal if we have HugeTLB resource > - * usage. 
> - */ > - if (mem_cgroup_have_hugetlb_usage(memcg)) > - return -EBUSY; > > return mem_cgroup_force_empty(memcg, false); > } From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values Date: Mon, 19 Mar 2012 12:07:47 +0530 Message-ID: <877gyhksec.fsf@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F6695EC.2060208@jp.fujitsu.com> Mime-Version: 1.0 Return-path: In-Reply-To: <4F6695EC.2060208@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Mon, 19 Mar 2012 11:11:56 +0900, KAMEZAWA Hiroyuki wrote: > (2012/03/17 2:39), Aneesh Kumar K.V wrote: > > > From: "Aneesh Kumar K.V" > > > > Using VM_FAULT_* codes with ERR_PTR will require us to make sure > > VM_FAULT_* values will not exceed MAX_ERRNO value. > > > > Signed-off-by: Aneesh Kumar K.V > > > Is this a bug fix ? No. Currently the values of VM_FAULT_* codes are all below MAX_ERRNO. The changes in the patch are done based on the suggestion from Andrew. http://article.gmane.org/gmane.linux.kernel.cgroups/1160 > Reviewed-by: KAMEZAWA Hiroyuki > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ >
From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Date: Mon, 19 Mar 2012 12:22:53 +0530 Message-ID: <874ntlkrp6.fsf@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> Mime-Version: 1.0 Return-path: In-Reply-To: <4F669C2E.1010502-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: KAMEZAWA Hiroyuki Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Mon, 19 Mar 2012 11:38:38 +0900, KAMEZAWA Hiroyuki wrote: > (2012/03/17 2:39), Aneesh Kumar K.V wrote: > > > From: "Aneesh Kumar K.V" > > > > This patch implements a memcg extension that allows us to control > > HugeTLB allocations via memory controller. > > > > > If you write some details here, it will be helpful for review and > seeing log after merge. Will add more info. > > > > Signed-off-by: Aneesh Kumar K.V > > --- > > include/linux/hugetlb.h | 1 + > > include/linux/memcontrol.h | 42 +++++++++++++ > > init/Kconfig | 8 +++ > > mm/hugetlb.c | 2 +- > > mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ > > 5 files changed, 190 insertions(+), 1 deletions(-) ....
> > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > > +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > > +{ > > + int idx; > > + for (idx = 0; idx < hugetlb_max_hstate; idx++) { > > + if (memcg->hugepage[idx].usage > 0) > > + return 1; > > + } > > + return 0; > > +} > > > Please use res_counter_read_u64() rather than reading the value directly. > The open-coded variant is mostly derived from mem_cgroup_force_empty. I have updated the patch to use res_counter_read_u64. > > > + > > +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > > + struct mem_cgroup **ptr) > > +{ > > + int ret = 0; > > + struct mem_cgroup *memcg; > > + struct res_counter *fail_res; > > + unsigned long csize = nr_pages * PAGE_SIZE; > > + > > + if (mem_cgroup_disabled()) > > + return 0; > > +again: > > + rcu_read_lock(); > > + memcg = mem_cgroup_from_task(current); > > + if (!memcg) > > + memcg = root_mem_cgroup; > > + if (mem_cgroup_is_root(memcg)) { > > + rcu_read_unlock(); > > + goto done; > > + } > > > One concern is.... Now, yes, memory cgroup doesn't account root cgroup > and doesn't update res->usage to avoid updating shared counter overheads > when memcg is not mounted. But memory.usage_in_bytes files works > for root memcg with reading percpu statistics. > > So, how about counting usage for root cgroup even if it cannot be limited ? > Considering hugetlb fs usage, updating res_counter here doesn't have > performance problem of false sharing.. > Then, you can remove root_mem_cgroup() checks inserted several places. > Yes. That is a good idea. Will update the patch. > > > > struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); > > + /* > > + * Don't allow memcg removal if we have HugeTLB resource > > + * usage. > > + */ > > + if (mem_cgroup_have_hugetlb_usage(memcg)) > > + return -EBUSY; > > > > return mem_cgroup_force_empty(memcg, false); > > } > > > Is this fixed by patch 8+9 ? Yes. 
-aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Date: Mon, 19 Mar 2012 12:31:36 +0530 Message-ID: <871uopkran.fsf@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669CC3.9070007@jp.fujitsu.com> Mime-Version: 1.0 Return-path: In-Reply-To: <4F669CC3.9070007@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Mon, 19 Mar 2012 11:41:07 +0900, KAMEZAWA Hiroyuki wrote: > (2012/03/17 2:39), Aneesh Kumar K.V wrote: > > > From: "Aneesh Kumar K.V" > > > > This adds necessary charge/uncharge calls in the HugeTLB code > > > > Acked-by: Hillf Danton > > Signed-off-by: Aneesh Kumar K.V > > > Reviewed-by: KAMEZAWA Hiroyuki > A nitpick below. 
> > > --- > > mm/hugetlb.c | 21 ++++++++++++++++++++- > > mm/memcontrol.c | 5 +++++ > > 2 files changed, 25 insertions(+), 1 deletions(-) > > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index c672187..91361a0 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -21,6 +21,8 @@ > > #include > > #include > > #include > > +#include > > +#include > > > > #include > > #include > > @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page) > > BUG_ON(page_mapcount(page)); > > INIT_LIST_HEAD(&page->lru); > > > > + if (mapping) > > + mem_cgroup_hugetlb_uncharge_page(hstate_index(h), > > + pages_per_huge_page(h), page); > > spin_lock(&hugetlb_lock); > > if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { > > update_and_free_page(h, page); > > @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h, > > static struct page *alloc_huge_page(struct vm_area_struct *vma, > > unsigned long addr, int avoid_reserve) > > { > > + int ret, idx; > > struct hstate *h = hstate_vma(vma); > > struct page *page; > > + struct mem_cgroup *memcg = NULL; > > > Can't we this initialization in mem_cgroup_hugetlb_charge_page() ? > Will update in the next iteration. -aneesh
From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Date: Mon, 19 Mar 2012 16:00:35 +0900 Message-ID: <4F66D993.2080100@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> <874ntlkrp6.fsf@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <874ntlkrp6.fsf-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (2012/03/19 15:52), Aneesh Kumar K.V wrote: > On Mon, 19 Mar 2012 11:38:38 +0900, KAMEZAWA Hiroyuki wrote: >> (2012/03/17 2:39), Aneesh Kumar K.V wrote: >> >>> From: "Aneesh Kumar K.V" >>> >>> This patch implements a memcg extension that allows us to control >>> HugeTLB allocations via memory controller. >>> >> >> >> If you write some details here, it will be helpful for review and >> seeing log after merge. > > Will add more info. > >> >> >>> Signed-off-by: Aneesh Kumar K.V >>> --- >>> include/linux/hugetlb.h | 1 + >>> include/linux/memcontrol.h | 42 +++++++++++++ >>> init/Kconfig | 8 +++ >>> mm/hugetlb.c | 2 +- >>> mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ >>> 5 files changed, 190 insertions(+), 1 deletions(-) > > ....
> >>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB >>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) >>> +{ >>> + int idx; >>> + for (idx = 0; idx < hugetlb_max_hstate; idx++) { >>> + if (memcg->hugepage[idx].usage > 0) >>> + return 1; >>> + } >>> + return 0; >>> +} >> >> >> Please use res_counter_read_u64() rather than reading the value directly. >> > > The open-coded variant is mostly derived from mem_cgroup_force_empty. I > have updated the patch to use res_counter_read_u64. > Ah, ok. it's(maybe) my bad. I'll schedule a fix. Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs Date: Mon, 19 Mar 2012 12:44:11 +0530 Message-ID: <87wr6hjc58.fsf@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F66A059.20801@jp.fujitsu.com> Mime-Version: 1.0 Return-path: In-Reply-To: <4F66A059.20801-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: KAMEZAWA Hiroyuki Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Mon, 19 Mar 2012 11:56:25 +0900, KAMEZAWA Hiroyuki wrote: > (2012/03/17 2:39), Aneesh Kumar K.V wrote: > > > From: "Aneesh Kumar K.V" > > > > This add control files for hugetlbfs in memcg > > > > Signed-off-by: Aneesh Kumar K.V > > > I have a question. When a user does > > 1. create memory cgroup as > /cgroup/A > 2. 
insmod hugetlb.ko > 3. ls /cgroup/A > > and then, files can be shown ? Don't we have any problem at rmdir A ? > > I'm sorry if hugetlb never be used as module. HUGETLBFS cannot be build as kernel module > > a comment below. > > > --- > > include/linux/hugetlb.h | 17 +++++++++++++++ > > include/linux/memcontrol.h | 7 ++++++ > > mm/hugetlb.c | 25 ++++++++++++++++++++++- > > mm/memcontrol.c | 48 ++++++++++++++++++++++++++++++++++++++++++++ > > 4 files changed, 96 insertions(+), 1 deletions(-) ...... > > > > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > > +static char *mem_fmt(char *buf, unsigned long n) > > +{ > > + if (n >= (1UL << 30)) > > + sprintf(buf, "%luGB", n >> 30); > > + else if (n >= (1UL << 20)) > > + sprintf(buf, "%luMB", n >> 20); > > + else > > + sprintf(buf, "%luKB", n >> 10); > > + return buf; > > +} > > + > > +int mem_cgroup_hugetlb_file_init(int idx) > > +{ > > > __init ? Added . >And... do we have guarantee that this function is called before > creating root mem cgroup even if CONFIG_HUGETLBFS=y ? > Yes. This should be called before creating root mem cgroup. 
-aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs Date: Mon, 19 Mar 2012 16:34:01 +0900 Message-ID: <4F66E169.5000909@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F66A059.20801@jp.fujitsu.com> <87wr6hjc58.fsf@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <87wr6hjc58.fsf-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tejun Heo (2012/03/19 16:14), Aneesh Kumar K.V wrote: > On Mon, 19 Mar 2012 11:56:25 +0900, KAMEZAWA Hiroyuki wrote: >> (2012/03/17 2:39), Aneesh Kumar K.V wrote: >> >>> From: "Aneesh Kumar K.V" >>> >>> This add control files for hugetlbfs in memcg >>> >>> Signed-off-by: Aneesh Kumar K.V >> >> >> I have a question. When a user does >> >> 1. create memory cgroup as >> /cgroup/A >> 2. insmod hugetlb.ko >> 3. ls /cgroup/A >> >> and then, files can be shown ? Don't we have any problem at rmdir A ? >> >> I'm sorry if hugetlb never be used as module. > > HUGETLBFS cannot be build as kernel module > > >> >> a comment below. 
>>
>>> ---
>>> include/linux/hugetlb.h | 17 +++++++++++++++
>>> include/linux/memcontrol.h | 7 ++++++
>>> mm/hugetlb.c | 25 ++++++++++++++++++++++-
>>> mm/memcontrol.c | 48 ++++++++++++++++++++++++++++++++++++++++++++
>>> 4 files changed, 96 insertions(+), 1 deletions(-)
>
>
> ......
>
>>>
>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>> +static char *mem_fmt(char *buf, unsigned long n)
>>> +{
>>> +	if (n >= (1UL << 30))
>>> +		sprintf(buf, "%luGB", n >> 30);
>>> +	else if (n >= (1UL << 20))
>>> +		sprintf(buf, "%luMB", n >> 20);
>>> +	else
>>> +		sprintf(buf, "%luKB", n >> 10);
>>> +	return buf;
>>> +}
>>> +
>>> +int mem_cgroup_hugetlb_file_init(int idx)
>>> +{
>>
>>
>> __init ?
>
> Added .
>
>> And... do we have guarantee that this function is called before
>> creating root mem cgroup even if CONFIG_HUGETLBFS=y ?
>>
>
> Yes. This should be called before creating root mem cgroup.
>

O.K. BTW, please read Tejun's recent post:

https://lkml.org/lkml/2012/3/16/522

Can you use his methods? I guess you can write...

CGROUP_SUBSYS_CFTYPES_COND(mem_cgroup_subsys,
			hugetlb_cgroup_files,
			if XXXXMB hugetlb is allowed);

Hmm.
Thanks,
-Kame

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: Re: [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal
Date: Mon, 19 Mar 2012 14:30:24 +0530
Message-ID: <87r4wpj787.fsf@linux.vnet.ibm.com>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F66A258.5060301@jp.fujitsu.com>
Mime-Version: 1.0
Return-path:
In-Reply-To: <4F66A258.5060301@jp.fujitsu.com>
Sender: owner-linux-mm@kvack.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

On Mon, 19 Mar 2012 12:04:56 +0900, KAMEZAWA Hiroyuki wrote:
> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
>
> > From: "Aneesh Kumar K.V"
> >
> > This adds support for memcg removal with HugeTLB resource usage.
> >
> > Signed-off-by: Aneesh Kumar K.V
>
>
> seems ok for now.
>
> Now, Tejun and Costa, and I are discussing removing -EBUSY from rmdir().
> We're now considering 'if use_hierarchy=false and parent seems full,
> reclaim all or move charges to the root cgroup.' then -EBUSY will go away.
>
> Is it acceptable for hugetlb ? Do you have another idea ?
>

That should work even for hugetlb.

-aneesh

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages
Date: Mon, 19 Mar 2012 14:29:14 +0530
Message-ID: <87ty1lj7a5.fsf@linux.vnet.ibm.com>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F66A15B.7070804@jp.fujitsu.com>
Mime-Version: 1.0
Return-path:
In-Reply-To: <4F66A15B.7070804@jp.fujitsu.com>
Sender: owner-linux-mm@kvack.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

On Mon, 19 Mar 2012 12:00:43 +0900, KAMEZAWA Hiroyuki wrote:
> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
>
> > From: "Aneesh Kumar K.V"
> >
> > hugepage_activelist will be used to track currently used HugeTLB pages.
> > We need to find the in-use HugeTLB pages to support memcg removal.
> > On memcg removal we update the page's memory cgroup to point to
> > parent cgroup.
> >
> > Signed-off-by: Aneesh Kumar K.V
>
>
> Reviewed-by: KAMEZAWA Hiroyuki
>
> seems ok to me but...why the new list is not per node ? no benefit ?
>

I am not sure whether a per-node list will bring any performance benefit. For cgroup removal we need to look at all the list entries anyway.

-aneesh
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Date: Mon, 19 Mar 2012 15:39:18 +0400 Message-ID: <4F671AE6.5020204@parallels.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> <874ntlkrp6.fsf@linux.vnet.ibm.com> <4F66D993.2080100@jp.fujitsu.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F66D993.2080100@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii"; format="flowed" To: KAMEZAWA Hiroyuki Cc: "Aneesh Kumar K.V" , linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote: > (2012/03/19 15:52), Aneesh Kumar K.V wrote: > >> On Mon, 19 Mar 2012 11:38:38 +0900, KAMEZAWA Hiroyuki wrote: >>> (2012/03/17 2:39), Aneesh Kumar K.V wrote: >>> >>>> From: "Aneesh Kumar K.V" >>>> >>>> This patch implements a memcg extension that allows us to control >>>> HugeTLB allocations via memory controller. >>>> >>> >>> >>> If you write some details here, it will be helpful for review and >>> seeing log after merge. >> >> Will add more info. >> >>> >>> >>>> Signed-off-by: Aneesh Kumar K.V >>>> --- >>>> include/linux/hugetlb.h | 1 + >>>> include/linux/memcontrol.h | 42 +++++++++++++ >>>> init/Kconfig | 8 +++ >>>> mm/hugetlb.c | 2 +- >>>> mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ >>>> 5 files changed, 190 insertions(+), 1 deletions(-) >> >> .... 
>> >>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB >>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) >>>> +{ >>>> + int idx; >>>> + for (idx = 0; idx< hugetlb_max_hstate; idx++) { >>>> + if (memcg->hugepage[idx].usage> 0) >>>> + return 1; >>>> + } >>>> + return 0; >>>> +} >>> >>> >>> Please use res_counter_read_u64() rather than reading the value directly. >>> >> >> The open-coded variant is mostly derived from mem_cgroup_force_empty. I >> have updated the patch to use res_counter_read_u64. >> > > Ah, ok. it's(maybe) my bad. I'll schedule a fix. > Kame, I actually have it ready here. I can submit it if you want. This one has bitten me as well when I was trying to experiment with the res_counter performance... -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Date: Mon, 19 Mar 2012 21:07:01 +0900 Message-ID: <4F672165.4050506@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> <874ntlkrp6.fsf@linux.vnet.ibm.com> <4F66D993.2080100@jp.fujitsu.com> <4F671AE6.5020204@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F671AE6.5020204@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Glauber Costa Cc: "Aneesh Kumar K.V" , linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org (2012/03/19 20:39), Glauber Costa wrote: > On 03/19/2012 11:00 AM, KAMEZAWA 
Hiroyuki wrote: >> (2012/03/19 15:52), Aneesh Kumar K.V wrote: >> >>> On Mon, 19 Mar 2012 11:38:38 +0900, KAMEZAWA Hiroyuki wrote: >>>> (2012/03/17 2:39), Aneesh Kumar K.V wrote: >>>> >>>>> From: "Aneesh Kumar K.V" >>>>> >>>>> This patch implements a memcg extension that allows us to control >>>>> HugeTLB allocations via memory controller. >>>>> >>>> >>>> >>>> If you write some details here, it will be helpful for review and >>>> seeing log after merge. >>> >>> Will add more info. >>> >>>> >>>> >>>>> Signed-off-by: Aneesh Kumar K.V >>>>> --- >>>>> include/linux/hugetlb.h | 1 + >>>>> include/linux/memcontrol.h | 42 +++++++++++++ >>>>> init/Kconfig | 8 +++ >>>>> mm/hugetlb.c | 2 +- >>>>> mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ >>>>> 5 files changed, 190 insertions(+), 1 deletions(-) >>> >>> .... >>> >>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB >>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) >>>>> +{ >>>>> + int idx; >>>>> + for (idx = 0; idx< hugetlb_max_hstate; idx++) { >>>>> + if (memcg->hugepage[idx].usage> 0) >>>>> + return 1; >>>>> + } >>>>> + return 0; >>>>> +} >>>> >>>> >>>> Please use res_counter_read_u64() rather than reading the value directly. >>>> >>> >>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I >>> have updated the patch to use res_counter_read_u64. >>> >> >> Ah, ok. it's(maybe) my bad. I'll schedule a fix. >> > Kame, > > I actually have it ready here. I can submit it if you want. > That's good :) please post. (But I'm sorry I'll be absent tomorrow.) Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs
Date: Tue, 20 Mar 2012 14:52:20 +0530
Message-ID: <874ntjtynn.fsf@linux.vnet.ibm.com>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F66A059.20801@jp.fujitsu.com> <87wr6hjc58.fsf@linux.vnet.ibm.com> <4F66E169.5000909@jp.fujitsu.com>
User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu)
Mime-Version: 1.0
Return-path:
In-Reply-To: <4F66E169.5000909@jp.fujitsu.com>
Sender: owner-linux-mm@kvack.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Tejun Heo

KAMEZAWA Hiroyuki writes:

>
> O.K. BTW, please read Tejun's recent post..
>
> https://lkml.org/lkml/2012/3/16/522
>
> Can you use his methods ?
>
> I guess you can write...
>
> CGROUP_SUBSYS_CFTYPES_COND(mem_cgroup_subsys,
>			hugetlb_cgroup_files,
>			if XXXXMB hugetlb is allowed);
>

I may not be able to do CGROUP_SUBSYS_CFTYPES_COND(). But as long as we are able to dynamically add new control files, we should be ok.

-aneesh

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
Date: Wed, 21 Mar 2012 10:18:43 +0530
Message-ID: <87obrqsgno.fsf@linux.vnet.ibm.com>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> <874ntlkrp6.fsf@linux.vnet.ibm.com> <4F66D993.2080100@jp.fujitsu.com> <4F671AE6.5020204@parallels.com>
User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu)
Mime-Version: 1.0
Return-path:
In-Reply-To: <4F671AE6.5020204-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Glauber Costa, KAMEZAWA Hiroyuki
Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Glauber Costa writes:

> On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote:
>> (2012/03/19 15:52), Aneesh Kumar K.V wrote:
>>
>>>
>>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>>>>> +{
>>>>> +	int idx;
>>>>> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
>>>>> +		if (memcg->hugepage[idx].usage > 0)
>>>>> +			return 1;
>>>>> +	}
>>>>> +	return 0;
>>>>> +}
>>>>
>>>>
>>>> Please use res_counter_read_u64() rather than reading the value directly.
>>>> >>> >>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I >>> have updated the patch to use res_counter_read_u64. >>> >> >> Ah, ok. it's(maybe) my bad. I'll schedule a fix. >> > Kame, > > I actually have it ready here. I can submit it if you want. > > This one has bitten me as well when I was trying to experiment with the > res_counter performance... Do we really need memcg.res.usage to be accurate in that while loop ? If we miss a zero update because we encountered a partial update; in the next loop we will find it zero right ? -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Date: Wed, 21 Mar 2012 14:22:52 +0900 Message-ID: <4F6965AC.4070004@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> <874ntlkrp6.fsf@linux.vnet.ibm.com> <4F66D993.2080100@jp.fujitsu.com> <4F671AE6.5020204@parallels.com>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) <87obrqsgno.fsf@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <87obrqsgno.fsf-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: Glauber Costa , linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (2012/03/21 13:48), Aneesh Kumar K.V wrote: > Glauber Costa writes: > >> 
On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote:
>>> (2012/03/19 15:52), Aneesh Kumar K.V wrote:
>>>
>>>>
>>>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>>>>>> +{
>>>>>> +	int idx;
>>>>>> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
>>>>>> +		if (memcg->hugepage[idx].usage > 0)
>>>>>> +			return 1;
>>>>>> +	}
>>>>>> +	return 0;
>>>>>> +}
>>>>>
>>>>>
>>>>> Please use res_counter_read_u64() rather than reading the value directly.
>>>>>
>>>>
>>>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I
>>>> have updated the patch to use res_counter_read_u64.
>>>>
>>>
>>> Ah, ok. it's(maybe) my bad. I'll schedule a fix.
>>>
>> Kame,
>>
>> I actually have it ready here. I can submit it if you want.
>>
>> This one has bitten me as well when I was trying to experiment with the
>> res_counter performance...
>
> Do we really need memcg.res.usage to be accurate in that while loop ? If
> we miss a zero update because we encountered a partial update; in the
> next loop we will find it zero right ?
>

At rmdir(), I assume there is no task in the memcg. That means res->usage never increases and no thread other than force_empty will touch res->counter. So I think the memcg->res.usage > 0 check is never wrong, and we will get the correct comparison by continuing the loop.

But recent kmem accounting et al. may break that assumption (I'm not fully sure). So I think it is good to use res_counter_read_u64(). This part is not important for performance, anyway.
Thanks,
-Kame

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Hocko
Subject: Re: [PATCH -V4 01/10] hugetlb: rename max_hstate to hugetlb_max_hstate
Date: Wed, 28 Mar 2012 11:18:11 +0200
Message-ID: <20120328091811.GB20949@tiehlicka.suse.cz>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Mime-Version: 1.0
Return-path:
Content-Disposition: inline
In-Reply-To: <1331919570-2264-2-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: "Aneesh Kumar K.V"
Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

[Sorry for the late review]

On Fri 16-03-12 23:09:21, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> We will be using this from other subsystems like memcg
> in later patches.

OK, why not. I would probably have loved an accessor function more, but whatever.
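The point above about going through an accessor instead of reading usage directly can be sketched in userspace. This is only an illustration under stated assumptions: struct counter, counter_read_u64() and have_usage() are hypothetical stand-ins, not the kernel's res_counter API, and a pthread mutex plays the role of the counter's lock so that a 64-bit read cannot observe a half-updated value on a 32-bit machine.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a lock-protected 64-bit counter. */
struct counter {
	pthread_mutex_t lock;
	uint64_t usage;
};

static void counter_init(struct counter *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->usage = 0;
}

/* Writers update usage only while holding the lock. */
static void counter_charge(struct counter *c, uint64_t val)
{
	pthread_mutex_lock(&c->lock);
	c->usage += val;
	pthread_mutex_unlock(&c->lock);
}

static void counter_uncharge(struct counter *c, uint64_t val)
{
	pthread_mutex_lock(&c->lock);
	c->usage -= (val > c->usage) ? c->usage : val;	/* clamp at zero */
	pthread_mutex_unlock(&c->lock);
}

/* Readers take the same lock, so they never see a partial update. */
static uint64_t counter_read_u64(struct counter *c)
{
	uint64_t val;

	pthread_mutex_lock(&c->lock);
	val = c->usage;
	pthread_mutex_unlock(&c->lock);
	return val;
}

/* Analogue of the quoted loop, using the accessor for every read. */
static int have_usage(struct counter *counters, int nr)
{
	int idx;

	for (idx = 0; idx < nr; idx++) {
		if (counter_read_u64(&counters[idx]) > 0)
			return 1;
	}
	return 0;
}
```

The cost is one lock round trip per counter, which matches the remark that this path is not performance sensitive.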
Acked-by: Michal Hocko > > Acked-by: Hillf Danton > Signed-off-by: Aneesh Kumar K.V > --- > mm/hugetlb.c | 14 +++++++------- > 1 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 5f34bd8..d623e71 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; > static gfp_t htlb_alloc_mask = GFP_HIGHUSER; > unsigned long hugepages_treat_as_movable; > > -static int max_hstate; > +static int hugetlb_max_hstate; > > unsigned int default_hstate_idx; > struct hstate hstates[HUGE_MAX_HSTATE]; > > @@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages; > static unsigned long __initdata default_hstate_size; > > #define for_each_hstate(h) \ > - for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++) > + for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++) > > /* > * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages > @@ -1808,9 +1808,9 @@ void __init hugetlb_add_hstate(unsigned order) > printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n"); > return; > } > - BUG_ON(max_hstate >= HUGE_MAX_HSTATE); > + BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE); > BUG_ON(order == 0); > - h = &hstates[max_hstate++]; > + h = &hstates[hugetlb_max_hstate++]; > h->order = order; > h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1); > h->nr_huge_pages = 0; > @@ -1831,10 +1831,10 @@ static int __init hugetlb_nrpages_setup(char *s) > static unsigned long *last_mhp; > > /* > - * !max_hstate means we haven't parsed a hugepagesz= parameter yet, > + * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet, > * so this hugepages= parameter goes to the "default hstate". 
> */ > - if (!max_hstate) > + if (!hugetlb_max_hstate) > mhp = &default_hstate_max_huge_pages; > else > mhp = &parsed_hstate->max_huge_pages; > @@ -1853,7 +1853,7 @@ static int __init hugetlb_nrpages_setup(char *s) > * But we need to allocate >= MAX_ORDER hstates here early to still > * use the bootmem allocator. > */ > - if (max_hstate && parsed_hstate->order >= MAX_ORDER) > + if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER) > hugetlb_hstate_alloc_pages(parsed_hstate); > > last_mhp = mhp; > -- > 1.7.9 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values Date: Wed, 28 Mar 2012 11:25:48 +0200 Message-ID: <20120328092547.GC20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Fri 16-03-12 23:09:22, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Using VM_FAULT_* codes with ERR_PTR will require us to make sure > VM_FAULT_* values will not exceed MAX_ERRNO value. > > Signed-off-by: Aneesh Kumar K.V > --- > mm/hugetlb.c | 18 +++++++++++++----- > 1 files changed, 13 insertions(+), 5 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index d623e71..3782da8 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c [...] 
> @@ -1047,7 +1047,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, > page = alloc_buddy_huge_page(h, NUMA_NO_NODE); > if (!page) { > hugetlb_put_quota(inode->i_mapping, chg); > - return ERR_PTR(-VM_FAULT_SIGBUS); > + return ERR_PTR(-ENOSPC); Hmm, so one error code abuse replaced by another? I know that ENOMEM would revert 4a6018f7 which would be unfortunate but ENOSPC doesn't feel right as well. > } > } > > @@ -2395,6 +2395,7 @@ retry_avoidcopy: > new_page = alloc_huge_page(vma, address, outside_reserve); > > if (IS_ERR(new_page)) { > + int err = PTR_ERR(new_page); > page_cache_release(old_page); > > /* > @@ -2424,7 +2425,10 @@ retry_avoidcopy: > > /* Caller expects lock to be held */ > spin_lock(&mm->page_table_lock); > - return -PTR_ERR(new_page); > + if (err == -ENOMEM) > + return VM_FAULT_OOM; > + else > + return VM_FAULT_SIGBUS; > } > > /* > @@ -2542,7 +2546,11 @@ retry: > goto out; > page = alloc_huge_page(vma, address, 0); > if (IS_ERR(page)) { > - ret = -PTR_ERR(page); > + ret = PTR_ERR(page); > + if (ret == -ENOMEM) > + ret = VM_FAULT_OOM; > + else > + ret = VM_FAULT_SIGBUS; > goto out; > } > clear_huge_page(page, address, pages_per_huge_page(h)); > -- > 1.7.9 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V4 03/10] hugetlbfs: Add an inline helper for finding hstate index Date: Wed, 28 Mar 2012 11:41:34 +0200 Message-ID: <20120328094134.GD20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1331919570-2264-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Fri 16-03-12 23:09:23, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Add and inline helper and use it in the code. OK, helper function looks much nicer. 
> > Signed-off-by: Aneesh Kumar K.V Acked-by: Michal Hocko > --- > include/linux/hugetlb.h | 6 ++++++ > mm/hugetlb.c | 18 ++++++++++-------- > 2 files changed, 16 insertions(+), 8 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index d9d6c86..a2675b0 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -311,6 +311,11 @@ static inline unsigned hstate_index_to_shift(unsigned index) > return hstates[index].order + PAGE_SHIFT; > } > > +static inline int hstate_index(struct hstate *h) > +{ > + return h - hstates; > +} > + > #else > struct hstate {}; > #define alloc_huge_page_node(h, nid) NULL > @@ -329,6 +334,7 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) > return 1; > } > #define hstate_index_to_shift(index) 0 > +#define hstate_index(h) 0 > #endif > > #endif /* _LINUX_HUGETLB_H */ > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 3782da8..ebe245c 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1557,7 +1557,7 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent, > struct attribute_group *hstate_attr_group) > { > int retval; > - int hi = h - hstates; > + int hi = hstate_index(h); > > hstate_kobjs[hi] = kobject_create_and_add(h->name, parent); > if (!hstate_kobjs[hi]) > @@ -1652,11 +1652,13 @@ void hugetlb_unregister_node(struct node *node) > if (!nhs->hugepages_kobj) > return; /* no hstate attributes */ > > - for_each_hstate(h) > - if (nhs->hstate_kobjs[h - hstates]) { > - kobject_put(nhs->hstate_kobjs[h - hstates]); > - nhs->hstate_kobjs[h - hstates] = NULL; > + for_each_hstate(h) { > + int idx = hstate_index(h); > + if (nhs->hstate_kobjs[idx]) { > + kobject_put(nhs->hstate_kobjs[idx]); > + nhs->hstate_kobjs[idx] = NULL; > } > + } > > kobject_put(nhs->hugepages_kobj); > nhs->hugepages_kobj = NULL; > @@ -1759,7 +1761,7 @@ static void __exit hugetlb_exit(void) > hugetlb_unregister_all_nodes(); > > for_each_hstate(h) { > - kobject_put(hstate_kobjs[h - 
hstates]); > + kobject_put(hstate_kobjs[hstate_index(h)]); > } > > kobject_put(hugepages_kobj); > @@ -2587,7 +2589,7 @@ retry: > */ > if (unlikely(PageHWPoison(page))) { > ret = VM_FAULT_HWPOISON | > - VM_FAULT_SET_HINDEX(h - hstates); > + VM_FAULT_SET_HINDEX(hstate_index(h)); > goto backout_unlocked; > } > } > @@ -2660,7 +2662,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > return 0; > } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) > return VM_FAULT_HWPOISON_LARGE | > - VM_FAULT_SET_HINDEX(h - hstates); > + VM_FAULT_SET_HINDEX(hstate_index(h)); > } > > ptep = huge_pte_alloc(mm, address, huge_page_size(h)); > -- > 1.7.9 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ > Don't email: email@kvack.org -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Date: Wed, 28 Mar 2012 13:33:04 +0200 Message-ID: <20120328113304.GE20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1331919570-2264-5-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, 
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This patch implements a memcg extension that allows us to control > HugeTLB allocations via memory controller. And the infrastructure is not used at this stage (you forgot to mention). The changelog should be much more descriptive. > > Signed-off-by: Aneesh Kumar K.V > --- > include/linux/hugetlb.h | 1 + > include/linux/memcontrol.h | 42 +++++++++++++ > init/Kconfig | 8 +++ > mm/hugetlb.c | 2 +- > mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 190 insertions(+), 1 deletions(-) > [...] > diff --git a/init/Kconfig b/init/Kconfig > index 3f42cd6..f0eb8aa 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -725,6 +725,14 @@ config CGROUP_PERF > > Say N if unsure. > > +config MEM_RES_CTLR_HUGETLB > + bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)" > + depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL > + default n > + help > + Add HugeTLB management to memory resource controller. When you > + enable this, you can put a per cgroup limit on HugeTLB usage. How does it interact with the hard/soft limists etc... [...] > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 6728a7a..4b36c5e 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -235,6 +235,10 @@ struct mem_cgroup { > */ > struct res_counter memsw; > /* > + * the counter to account for hugepages from hugetlb. > + */ > + struct res_counter hugepage[HUGE_MAX_HSTATE]; > + /* > * Per cgroup active and inactive list, similar to the > * per zone LRU lists. 
> */ > @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, > } > #endif > > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > +{ > + int idx; > + for (idx = 0; idx < hugetlb_max_hstate; idx++) { Maybe we should expose for_each_hstate as well... > + if (memcg->hugepage[idx].usage > 0) > + return 1; > + } > + return 0; > +} > + > +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > + struct mem_cgroup **ptr) > +{ > + int ret = 0; > + struct mem_cgroup *memcg; > + struct res_counter *fail_res; > + unsigned long csize = nr_pages * PAGE_SIZE; > + > + if (mem_cgroup_disabled()) > + return 0; > +again: > + rcu_read_lock(); > + memcg = mem_cgroup_from_task(current); > + if (!memcg) > + memcg = root_mem_cgroup; > + if (mem_cgroup_is_root(memcg)) { > + rcu_read_unlock(); > + goto done; > + } > + if (!css_tryget(&memcg->css)) { > + rcu_read_unlock(); > + goto again; > + } > + rcu_read_unlock(); > + > + ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res); > + css_put(&memcg->css); > +done: > + *ptr = memcg; Why do we set ptr even for the failure case after we dropped a reference? > + return ret; > +} > + > +void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages, > + struct mem_cgroup *memcg, > + struct page *page) > +{ > + struct page_cgroup *pc; > + > + if (mem_cgroup_disabled()) > + return; > + > + pc = lookup_page_cgroup(page); > + lock_page_cgroup(pc); > + if (unlikely(PageCgroupUsed(pc))) { > + unlock_page_cgroup(pc); > + mem_cgroup_hugetlb_uncharge_memcg(idx, nr_pages, memcg); > + return; > + } > + pc->mem_cgroup = memcg; > + /* > + * We access a page_cgroup asynchronously without lock_page_cgroup(). > + * Especially when a page_cgroup is taken from a page, pc->mem_cgroup > + * is accessed after testing USED bit. To make pc->mem_cgroup visible > + * before USED bit, we need memory barrier here. 
> + * See mem_cgroup_add_lru_list(), etc. > + */ > + smp_wmb(); Is this really necessary for hugetlb pages as well? > + SetPageCgroupUsed(pc); > + > + unlock_page_cgroup(pc); > + return; > +} > + [...] > @@ -4887,6 +5013,7 @@ err_cleanup: > static struct cgroup_subsys_state * __ref > mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > { > + int idx; > struct mem_cgroup *memcg, *parent; > long error = -ENOMEM; > int node; > @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > * mem_cgroup(see mem_cgroup_put). > */ > mem_cgroup_get(parent); > + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) Do we have to init all hstates or is hugetlb_max_hstate enough? > + res_counter_init(&memcg->hugepage[idx], > + &parent->hugepage[idx]); > } else { > res_counter_init(&memcg->res, NULL); > res_counter_init(&memcg->memsw, NULL); > + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) > + res_counter_init(&memcg->hugepage[idx], NULL); Same here -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values Date: Wed, 28 Mar 2012 17:05:49 +0530 Message-ID: <87vclpyn3e.fsf@skywalker.in.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328092547.GC20949@tiehlicka.suse.cz> User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Mime-Version: 1.0 Return-path: In-Reply-To: <20120328092547.GC20949-VqjxzfR4DlwKmadIfiO5sKVXKuFTiq87@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Michal Hocko Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Michal Hocko writes: > On Fri 16-03-12 23:09:22, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> Using VM_FAULT_* codes with ERR_PTR will require us to make sure >> VM_FAULT_* values will not exceed MAX_ERRNO value. >> >> Signed-off-by: Aneesh Kumar K.V >> --- >> mm/hugetlb.c | 18 +++++++++++++----- >> 1 files changed, 13 insertions(+), 5 deletions(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index d623e71..3782da8 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c > [...] 
>> @@ -1047,7 +1047,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, >> page = alloc_buddy_huge_page(h, NUMA_NO_NODE); >> if (!page) { >> hugetlb_put_quota(inode->i_mapping, chg); >> - return ERR_PTR(-VM_FAULT_SIGBUS); >> + return ERR_PTR(-ENOSPC); > > Hmm, so one error code abuse replaced by another? > I know that ENOMEM would revert 4a6018f7 which would be unfortunate but > ENOSPC doesn't feel right as well. > File systems do map ENOSPC to SIGBUS. block_page_mkwrite_return() does that. -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Date: Wed, 28 Mar 2012 15:17:06 +0200 Message-ID: <20120328131706.GF20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1331919570-2264-6-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This adds necessary charge/uncharge calls in the HugeTLB code This begs for more description... Other than that it looks correct. 
> Acked-by: Hillf Danton > Signed-off-by: Aneesh Kumar K.V > --- > mm/hugetlb.c | 21 ++++++++++++++++++++- > mm/memcontrol.c | 5 +++++ > 2 files changed, 25 insertions(+), 1 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index c672187..91361a0 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -21,6 +21,8 @@ > #include > #include > #include > +#include > +#include > > #include > #include > @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page) > BUG_ON(page_mapcount(page)); > INIT_LIST_HEAD(&page->lru); > > + if (mapping) > + mem_cgroup_hugetlb_uncharge_page(hstate_index(h), > + pages_per_huge_page(h), page); > spin_lock(&hugetlb_lock); > if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { > update_and_free_page(h, page); > @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h, > static struct page *alloc_huge_page(struct vm_area_struct *vma, > unsigned long addr, int avoid_reserve) > { > + int ret, idx; > struct hstate *h = hstate_vma(vma); > struct page *page; > + struct mem_cgroup *memcg = NULL; > struct address_space *mapping = vma->vm_file->f_mapping; > struct inode *inode = mapping->host; > long chg; > > + idx = hstate_index(h); > /* > * Processes that did not create the mapping will have no reserves and > * will not have accounted against quota. 
Check that the quota can be > @@ -1039,6 +1047,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, > if (hugetlb_get_quota(inode->i_mapping, chg)) > return ERR_PTR(-ENOSPC); > > + ret = mem_cgroup_hugetlb_charge_page(idx, pages_per_huge_page(h), > + &memcg); > + if (ret) { > + hugetlb_put_quota(inode->i_mapping, chg); > + return ERR_PTR(-ENOSPC); > + } > spin_lock(&hugetlb_lock); > page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve); > spin_unlock(&hugetlb_lock); > @@ -1046,6 +1060,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, > if (!page) { > page = alloc_buddy_huge_page(h, NUMA_NO_NODE); > if (!page) { > + mem_cgroup_hugetlb_uncharge_memcg(idx, > + pages_per_huge_page(h), > + memcg); > hugetlb_put_quota(inode->i_mapping, chg); > return ERR_PTR(-ENOSPC); > } > @@ -1054,7 +1071,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, > set_page_private(page, (unsigned long) mapping); > > vma_commit_reservation(h, vma, addr); > - > + /* update page cgroup details */ > + mem_cgroup_hugetlb_commit_charge(idx, pages_per_huge_page(h), > + memcg, page); > return page; > } > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4b36c5e..7a9ea94 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2901,6 +2901,11 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) > > if (PageSwapCache(page)) > return NULL; > + /* > + * HugeTLB page uncharge happen in the HugeTLB compound page destructor > + */ > + if (PageHuge(page)) > + return NULL; > > if (PageTransHuge(page)) { > nr_pages <<= compound_order(page); > -- > 1.7.9 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Date: Wed, 28 Mar 2012 15:40:20 +0200 Message-ID: <20120328134020.GG20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote: [...] > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 6728a7a..4b36c5e 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c [...] > @@ -4887,6 +5013,7 @@ err_cleanup: > static struct cgroup_subsys_state * __ref > mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > { > + int idx; > struct mem_cgroup *memcg, *parent; > long error = -ENOMEM; > int node; > @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > * mem_cgroup(see mem_cgroup_put). > */ > mem_cgroup_get(parent); > + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) > + res_counter_init(&memcg->hugepage[idx], > + &parent->hugepage[idx]); Hmm, I do not think we want to make groups deeper in the hierarchy unlimited as we cannot reclaim. Shouldn't we copy the limit from the parent? Still not ideal but slightly more expected behavior IMO. The hierarchy setups are still interesting and the limitations should be described in the documentation... 
> } else { > res_counter_init(&memcg->res, NULL); > res_counter_init(&memcg->memsw, NULL); > + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) > + res_counter_init(&memcg->hugepage[idx], NULL); > } > memcg->last_scanned_node = MAX_NUMNODES; > INIT_LIST_HEAD(&memcg->oom_notify); -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Date: Wed, 28 Mar 2012 19:10:36 +0530 Message-ID: <87d37wetd7.fsf@skywalker.in.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328113304.GE20949@tiehlicka.suse.cz>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Mime-Version: 1.0 Return-path: In-Reply-To: <20120328113304.GE20949@tiehlicka.suse.cz> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Michal Hocko Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Michal Hocko writes: > On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> This patch implements a memcg extension that allows us to control >> HugeTLB allocations via memory controller. > > And the infrastructure is not used at this stage (you forgot to > mention). > The changelog should be much more descriptive. Will update the changelog. 
> >> >> Signed-off-by: Aneesh Kumar K.V >> --- >> include/linux/hugetlb.h | 1 + >> include/linux/memcontrol.h | 42 +++++++++++++ >> init/Kconfig | 8 +++ >> mm/hugetlb.c | 2 +- >> mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ >> 5 files changed, 190 insertions(+), 1 deletions(-) >> > [...] >> diff --git a/init/Kconfig b/init/Kconfig >> index 3f42cd6..f0eb8aa 100644 >> --- a/init/Kconfig >> +++ b/init/Kconfig >> @@ -725,6 +725,14 @@ config CGROUP_PERF >> >> Say N if unsure. >> >> +config MEM_RES_CTLR_HUGETLB >> + bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)" >> + depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL >> + default n >> + help >> + Add HugeTLB management to memory resource controller. When you >> + enable this, you can put a per cgroup limit on HugeTLB usage. > > How does it interact with the hard/soft limists etc... There is no softlimit support for HugeTLB extension. > > [...] >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> index 6728a7a..4b36c5e 100644 >> --- a/mm/memcontrol.c >> +++ b/mm/memcontrol.c >> @@ -235,6 +235,10 @@ struct mem_cgroup { >> */ >> struct res_counter memsw; >> /* >> + * the counter to account for hugepages from hugetlb. >> + */ >> + struct res_counter hugepage[HUGE_MAX_HSTATE]; >> + /* >> * Per cgroup active and inactive list, similar to the >> * per zone LRU lists. >> */ >> @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, >> } >> #endif >> >> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB >> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) >> +{ >> + int idx; >> + for (idx = 0; idx < hugetlb_max_hstate; idx++) { > > Maybe we should expose for_each_hstate as well... That will not really help here. If we use for_each_hstate then we will need to use hstate_index to get the index. 
> >> + if (memcg->hugepage[idx].usage > 0) >> + return 1; >> + } >> + return 0; >> +} >> + >> +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, >> + struct mem_cgroup **ptr) >> +{ >> + int ret = 0; >> + struct mem_cgroup *memcg; >> + struct res_counter *fail_res; >> + unsigned long csize = nr_pages * PAGE_SIZE; >> + >> + if (mem_cgroup_disabled()) >> + return 0; >> +again: >> + rcu_read_lock(); >> + memcg = mem_cgroup_from_task(current); >> + if (!memcg) >> + memcg = root_mem_cgroup; >> + if (mem_cgroup_is_root(memcg)) { >> + rcu_read_unlock(); >> + goto done; >> + } >> + if (!css_tryget(&memcg->css)) { >> + rcu_read_unlock(); >> + goto again; >> + } >> + rcu_read_unlock(); >> + >> + ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res); >> + css_put(&memcg->css); >> +done: >> + *ptr = memcg; > > Why do we set ptr even for the failure case after we dropped a > reference? That ensures that *ptr is NULL. > >> + return ret; >> +} >> + >> +void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages, >> + struct mem_cgroup *memcg, >> + struct page *page) >> +{ >> + struct page_cgroup *pc; >> + >> + if (mem_cgroup_disabled()) >> + return; >> + >> + pc = lookup_page_cgroup(page); >> + lock_page_cgroup(pc); >> + if (unlikely(PageCgroupUsed(pc))) { >> + unlock_page_cgroup(pc); >> + mem_cgroup_hugetlb_uncharge_memcg(idx, nr_pages, memcg); >> + return; >> + } >> + pc->mem_cgroup = memcg; >> + /* >> + * We access a page_cgroup asynchronously without lock_page_cgroup(). >> + * Especially when a page_cgroup is taken from a page, pc->mem_cgroup >> + * is accessed after testing USED bit. To make pc->mem_cgroup visible >> + * before USED bit, we need memory barrier here. >> + * See mem_cgroup_add_lru_list(), etc. >> + */ >> + smp_wmb(); > > Is this really necessary for hugetlb pages as well? I used to do that in cgroup_rmdir path, I later changed that part of the code. I will look at the patches again to see if we really need this. 
> >> + SetPageCgroupUsed(pc); >> + >> + unlock_page_cgroup(pc); >> + return; >> +} >> + > [...] >> @@ -4887,6 +5013,7 @@ err_cleanup: >> static struct cgroup_subsys_state * __ref >> mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) >> { >> + int idx; >> struct mem_cgroup *memcg, *parent; >> long error = -ENOMEM; >> int node; >> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) >> * mem_cgroup(see mem_cgroup_put). >> */ >> mem_cgroup_get(parent); >> + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) > > Do we have to init all hstates or is hugetlb_max_hstate enough? Yes. we do call mem_cgroup_create for root cgroup before initialzing hugetlb hstate. > >> + res_counter_init(&memcg->hugepage[idx], >> + &parent->hugepage[idx]); >> } else { >> res_counter_init(&memcg->res, NULL); >> res_counter_init(&memcg->memsw, NULL); >> + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) >> + res_counter_init(&memcg->hugepage[idx], NULL); > > Same here > -- -aneesh 
From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages Date: Wed, 28 Mar 2012 15:58:46 +0200 Message-ID: <20120328135845.GH20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Fri 16-03-12 23:09:28, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > hugepage_activelist will be used to track currently used HugeTLB pages. > We need to find the in-use HugeTLB pages to support memcg removal. > On memcg removal we update the page's memory cgroup to point to > parent cgroup. > > Signed-off-by: Aneesh Kumar K.V > --- > include/linux/hugetlb.h | 1 + > mm/hugetlb.c | 23 ++++++++++++++++++----- > 2 files changed, 19 insertions(+), 5 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index cbd8dc5..6919100 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h [...] > @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, > page = pte_page(pte); > if (pte_dirty(pte)) > set_page_dirty(page); > - list_add(&page->lru, &page_list); > + > + spin_lock(&hugetlb_lock); > + list_move(&page->lru, &page_list); > + spin_unlock(&hugetlb_lock); Why do we really need the spinlock here? 
> } > spin_unlock(&mm->page_table_lock); > flush_tlb_range(vma, start, end); > mmu_notifier_invalidate_range_end(mm, start, end); > list_for_each_entry_safe(page, tmp, &page_list, lru) { > page_remove_rmap(page); > - list_del(&page->lru); > + /* > + * We need to move it back huge page active list. If we are > + * holding the last reference, below put_page will move it > + * back to free list. > + */ > + spin_lock(&hugetlb_lock); > + list_move(&page->lru, &h->hugepage_activelist); > + spin_unlock(&hugetlb_lock); This spinlock usage doesn't look nice but I guess we do not have many other options. -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal Date: Wed, 28 Mar 2012 16:07:33 +0200 Message-ID: <20120328140733.GI20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1331919570-2264-10-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, 
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Fri 16-03-12 23:09:29, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This add support for memcg removal with HugeTLB resource usage. > > Signed-off-by: Aneesh Kumar K.V > --- > include/linux/hugetlb.h | 6 ++++ > include/linux/memcontrol.h | 15 +++++++++- > mm/hugetlb.c | 41 ++++++++++++++++++++++++++ > mm/memcontrol.c | 68 +++++++++++++++++++++++++++++++++++++------ > 4 files changed, 119 insertions(+), 11 deletions(-) > [...] > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 8fd465d..685f0d5 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c [...] > @@ -3285,10 +3287,57 @@ void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > res_counter_uncharge(&memcg->hugepage[idx], csize); > return; > } > -#else > -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > + > +int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, > + struct page *page) > { > - return 0; > + struct page_cgroup *pc; > + int csize, ret = 0; > + struct res_counter *fail_res; > + struct cgroup *pcgrp = cgroup->parent; > + struct mem_cgroup *parent = mem_cgroup_from_cont(pcgrp); > + struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup); > + > + if (!get_page_unless_zero(page)) > + goto out; > + > + pc = lookup_page_cgroup(page); > + lock_page_cgroup(pc); > + if (!PageCgroupUsed(pc) || pc->mem_cgroup != memcg) > + goto err_out; > + > + csize = PAGE_SIZE << compound_order(page); > + /* > + * uncharge from child and charge the parent. If we have > + * use_hierarchy set, we can never fail here. In-order to make > + * sure we don't get -ENOMEM on parent charge, we first uncharge > + * the child and then charge the parent. 
> + */ > + if (parent->use_hierarchy) { > + res_counter_uncharge(&memcg->hugepage[idx], csize); > + if (!mem_cgroup_is_root(parent)) > + ret = res_counter_charge(&parent->hugepage[idx], > + csize, &fail_res); You can still race with other hugetlb charge which would make this fail. > + } else { > + if (!mem_cgroup_is_root(parent)) { > + ret = res_counter_charge(&parent->hugepage[idx], > + csize, &fail_res); > + if (ret) { > + ret = -EBUSY; > + goto err_out; > + } > + } > + res_counter_uncharge(&memcg->hugepage[idx], csize); > + } > + /* > + * caller should have done css_get > + */ > + pc->mem_cgroup = parent; > +err_out: > + unlock_page_cgroup(pc); > + put_page(page); > +out: > + return ret; > } > #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ [...] -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V4 10/10] memcg: Add memory controller documentation for hugetlb management Date: Wed, 28 Mar 2012 16:36:58 +0200 Message-ID: <20120328143658.GJ20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1331919570-2264-11-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, 
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Fri 16-03-12 23:09:30, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Signed-off-by: Aneesh Kumar K.V > --- > Documentation/cgroups/memory.txt | 29 +++++++++++++++++++++++++++++ > 1 files changed, 29 insertions(+), 0 deletions(-) > > diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt > index 4c95c00..d99c41b 100644 > --- a/Documentation/cgroups/memory.txt > +++ b/Documentation/cgroups/memory.txt > @@ -43,6 +43,7 @@ Features: > - usage threshold notifier > - oom-killer disable knob and oom-notifier > - Root cgroup has no limit controls. > + - resource accounting for HugeTLB pages > > Kernel memory support is work in progress, and the current version provides > basically functionality. (See Section 2.7) > @@ -75,6 +76,12 @@ Brief summary of control files. > memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory > memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation > > + > + memory.hugetlb..limit_in_bytes # set/show limit of "hugepagesize" hugetlb usage > + memory.hugetlb..max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded > + memory.hugetlb..usage_in_bytes # show current res_counter usage for "hugepagesize" hugetlb > + # see 5.7 for details > + > 1. History > > The memory controller has a long history. A request for comments for the memory > @@ -279,6 +286,15 @@ per cgroup, instead of globally. > > * tcp memory pressure: sockets memory pressure for the tcp protocol. > > +2.8 HugeTLB extension > + > +This extension allows to limit the HugeTLB usage per control group and > +enforces the controller limit during page fault. Since HugeTLB doesn't > +support page reclaim, enforcing the limit at page fault time implies that, > +the application will get SIGBUS signal if it tries to access HugeTLB pages > +beyond its limit. This is consistent with the quota so we should mention that. 
We should also add a note how we interact with quotas. Another important thing to note is that the limit/usage are unrelated to memcg hard/soft limit/usage. > This requires the application to know beforehand how much > +HugeTLB pages it would require for its use. > + > 3. User Interface > > 0. Configuration > @@ -287,6 +303,7 @@ a. Enable CONFIG_CGROUPS > b. Enable CONFIG_RESOURCE_COUNTERS > c. Enable CONFIG_CGROUP_MEM_RES_CTLR > d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) > +f. Enable CONFIG_MEM_RES_CTLR_HUGETLB (to use HugeTLB extension) > > 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) > # mount -t tmpfs none /sys/fs/cgroup > @@ -510,6 +527,18 @@ unevictable= N0= N1= ... > > And we have total = file + anon + unevictable. > > +5.7 HugeTLB resource control files > +For a system supporting two hugepage size (16M and 16G) the control > +files include: > + > + memory.hugetlb.16GB.limit_in_bytes > + memory.hugetlb.16GB.max_usage_in_bytes > + memory.hugetlb.16GB.usage_in_bytes > + memory.hugetlb.16MB.limit_in_bytes > + memory.hugetlb.16MB.max_usage_in_bytes > + memory.hugetlb.16MB.usage_in_bytes > + > + > 6. Hierarchy support > > The memory controller supports a deep hierarchy and hierarchical accounting. > -- > 1.7.9 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Date: Wed, 28 Mar 2012 17:44:34 +0200 Message-ID: <20120328154434.GN20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328113304.GE20949@tiehlicka.suse.cz> <87d37wetd7.fsf@skywalker.in.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <87d37wetd7.fsf@skywalker.in.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Wed 28-03-12 19:10:36, Aneesh Kumar K.V wrote: > Michal Hocko writes: > > > On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote: > >> From: "Aneesh Kumar K.V" > >> > >> This patch implements a memcg extension that allows us to control > >> HugeTLB allocations via memory controller. > > > > And the infrastructure is not used at this stage (you forgot to > > mention). > > The changelog should be much more descriptive. > > > Will update the changelog. Thx > > > > >> > >> Signed-off-by: Aneesh Kumar K.V > >> --- > >> include/linux/hugetlb.h | 1 + > >> include/linux/memcontrol.h | 42 +++++++++++++ > >> init/Kconfig | 8 +++ > >> mm/hugetlb.c | 2 +- > >> mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ > >> 5 files changed, 190 insertions(+), 1 deletions(-) > >> > > [...] > >> diff --git a/init/Kconfig b/init/Kconfig > >> index 3f42cd6..f0eb8aa 100644 > >> --- a/init/Kconfig > >> +++ b/init/Kconfig > >> @@ -725,6 +725,14 @@ config CGROUP_PERF > >> > >> Say N if unsure. 
> >> > >> +config MEM_RES_CTLR_HUGETLB > >> + bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)" > >> + depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL > >> + default n > >> + help > >> + Add HugeTLB management to memory resource controller. When you > >> + enable this, you can put a per cgroup limit on HugeTLB usage. > > > > How does it interact with the hard/soft limists etc... > > > There is no softlimit support for HugeTLB extension. Sure, sorry for not being precise. The point was how this interacts with memcg hard/soft limit (they are independent) etc... > > [...] > >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c > >> index 6728a7a..4b36c5e 100644 > >> --- a/mm/memcontrol.c > >> +++ b/mm/memcontrol.c > >> @@ -235,6 +235,10 @@ struct mem_cgroup { > >> */ > >> struct res_counter memsw; > >> /* > >> + * the counter to account for hugepages from hugetlb. > >> + */ > >> + struct res_counter hugepage[HUGE_MAX_HSTATE]; > >> + /* > >> * Per cgroup active and inactive list, similar to the > >> * per zone LRU lists. > >> */ > >> @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, > >> } > >> #endif > >> > >> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > >> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > >> +{ > >> + int idx; > >> + for (idx = 0; idx < hugetlb_max_hstate; idx++) { > > > > Maybe we should expose for_each_hstate as well... > > > That will not really help here. If we use for_each_hstate then we will > need to use hstate_index to get the index. 
Fair enough > >> + if (memcg->hugepage[idx].usage > 0) > >> + return 1; > >> + } > >> + return 0; > >> +} > >> + > >> +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > >> + struct mem_cgroup **ptr) > >> +{ > >> + int ret = 0; > >> + struct mem_cgroup *memcg; > >> + struct res_counter *fail_res; > >> + unsigned long csize = nr_pages * PAGE_SIZE; > >> + > >> + if (mem_cgroup_disabled()) > >> + return 0; > >> +again: > >> + rcu_read_lock(); > >> + memcg = mem_cgroup_from_task(current); > >> + if (!memcg) > >> + memcg = root_mem_cgroup; > >> + if (mem_cgroup_is_root(memcg)) { > >> + rcu_read_unlock(); > >> + goto done; > >> + } > >> + if (!css_tryget(&memcg->css)) { > >> + rcu_read_unlock(); > >> + goto again; > >> + } > >> + rcu_read_unlock(); > >> + > >> + ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res); > >> + css_put(&memcg->css); > >> +done: > >> + *ptr = memcg; > > > > Why do we set ptr even for the failure case after we dropped a > > reference? > > That ensures that *ptr is NULL. Does it? AFAICS res_counter_charge might fail and you would use non NULL memcg (with a dropped reference). [...] > >> + SetPageCgroupUsed(pc); > >> + > >> + unlock_page_cgroup(pc); > >> + return; > >> +} > >> + > > [...] > >> @@ -4887,6 +5013,7 @@ err_cleanup: > >> static struct cgroup_subsys_state * __ref > >> mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > >> { > >> + int idx; > >> struct mem_cgroup *memcg, *parent; > >> long error = -ENOMEM; > >> int node; > >> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > >> * mem_cgroup(see mem_cgroup_put). > >> */ > >> mem_cgroup_get(parent); > >> + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) > > > > Do we have to init all hstates or is hugetlb_max_hstate enough? > > > Yes. we do call mem_cgroup_create for root cgroup before initialzing > hugetlb hstate. drop a comment? -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Date: Wed, 28 Mar 2012 23:07:14 +0530 Message-ID: <87y5qk1vat.fsf@skywalker.in.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328134020.GG20949@tiehlicka.suse.cz>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Mime-Version: 1.0 Return-path: In-Reply-To: <20120328134020.GG20949-VqjxzfR4DlwKmadIfiO5sKVXKuFTiq87@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Michal Hocko Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Michal Hocko writes: > On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote: > [...] >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> index 6728a7a..4b36c5e 100644 >> --- a/mm/memcontrol.c >> +++ b/mm/memcontrol.c > [...]
>> @@ -4887,6 +5013,7 @@ err_cleanup: >> static struct cgroup_subsys_state * __ref >> mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) >> { >> + int idx; >> struct mem_cgroup *memcg, *parent; >> long error = -ENOMEM; >> int node; >> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) >> * mem_cgroup(see mem_cgroup_put). >> */ >> mem_cgroup_get(parent); >> + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) >> + res_counter_init(&memcg->hugepage[idx], >> + &parent->hugepage[idx]); > > Hmm, I do not think we want to make groups deeper in the hierarchy > unlimited as we cannot reclaim. Shouldn't we copy the limit from the parent? > Still not ideal but slightly more expected behavior IMO. But we should be limiting the child group based on the parent's limit only when hierarchy is set, right? > > The hierarchy setups are still interesting and the limitations should be > described in the documentation... > It should behave similarly to memcg, i.e., if hierarchy is set, then we limit using MIN(parent's limit, child's limit). Maybe I am missing some of the details of the memcg use_hierarchy config. My goal was to keep it similar to memcg. Can you explain why you think the patch would make it any different?
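As an illustration only (this is not code from the patch), the MIN(parent's limit, child's limit) behaviour can be sketched in userspace C. The sketch assumes a charge is applied to the group and then to each ancestor, and is rolled back if any level would exceed its limit. struct hcounter and the hcounter_* helpers are hypothetical names, not the kernel's res_counter API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a hierarchical usage counter. */
struct hcounter {
	unsigned long usage;     /* bytes currently charged */
	unsigned long limit;     /* hard limit in bytes */
	struct hcounter *parent; /* NULL for the root or when hierarchy is off */
};

static void hcounter_init(struct hcounter *c, unsigned long limit,
			  struct hcounter *parent)
{
	assert(c != NULL);
	c->usage = 0;
	c->limit = limit;
	c->parent = parent;
}

/*
 * Charge val to c and to every ancestor. If any level would go over its
 * limit, undo the charges already made below it and report failure.
 * Relies on usage <= limit holding at every level.
 */
static bool hcounter_charge(struct hcounter *c, unsigned long val)
{
	struct hcounter *cur, *undo;

	for (cur = c; cur != NULL; cur = cur->parent) {
		if (val > cur->limit - cur->usage)
			goto rollback;
		cur->usage += val;
	}
	return true;

rollback:
	/* Levels from c up to (but not including) cur were charged. */
	for (undo = c; undo != cur; undo = undo->parent)
		undo->usage -= val;
	return false;
}

/* Drop val from c and from every ancestor. */
static void hcounter_uncharge(struct hcounter *c, unsigned long val)
{
	for (; c != NULL; c = c->parent) {
		assert(c->usage >= val);
		c->usage -= val;
	}
}
```

With a child limit of 10 and a parent limit of 4, a charge of 3 succeeds and a further charge of 2 fails at the parent, so the child is effectively capped at 4. With parent set to NULL (no hierarchy), only the group's own limit applies.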
-aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages Date: Wed, 28 Mar 2012 23:08:34 +0530 Message-ID: <87vclo1v8l.fsf@skywalker.in.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328135845.GH20949@tiehlicka.suse.cz>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Mime-Version: 1.0 Return-path: In-Reply-To: <20120328135845.GH20949@tiehlicka.suse.cz> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Michal Hocko Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Michal Hocko writes: > On Fri 16-03-12 23:09:28, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> hugepage_activelist will be used to track currently used HugeTLB pages. >> We need to find the in-use HugeTLB pages to support memcg removal. >> On memcg removal we update the page's memory cgroup to point to >> parent cgroup. >> >> Signed-off-by: Aneesh Kumar K.V >> --- >> include/linux/hugetlb.h | 1 + >> mm/hugetlb.c | 23 ++++++++++++++++++----- >> 2 files changed, 19 insertions(+), 5 deletions(-) >> >> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h >> index cbd8dc5..6919100 100644 >> --- a/include/linux/hugetlb.h >> +++ b/include/linux/hugetlb.h > [...] 
>> @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, >> page = pte_page(pte); >> if (pte_dirty(pte)) >> set_page_dirty(page); >> - list_add(&page->lru, &page_list); >> + >> + spin_lock(&hugetlb_lock); >> + list_move(&page->lru, &page_list); >> + spin_unlock(&hugetlb_lock); > > Why do we really need the spinlock here? It does a list_del from hugepage_activelist. > >> } >> spin_unlock(&mm->page_table_lock); >> flush_tlb_range(vma, start, end); >> mmu_notifier_invalidate_range_end(mm, start, end); >> list_for_each_entry_safe(page, tmp, &page_list, lru) { >> page_remove_rmap(page); >> - list_del(&page->lru); >> + /* >> + * We need to move it back huge page active list. If we are >> + * holding the last reference, below put_page will move it >> + * back to free list. >> + */ >> + spin_lock(&hugetlb_lock); >> + list_move(&page->lru, &h->hugepage_activelist); >> + spin_unlock(&hugetlb_lock); > > This spinlock usage doesn't look nice but I guess we do not have many > other options. > -aneesh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Date: Wed, 28 Mar 2012 23:09:34 +0530 Message-ID: <87sjgs1v6x.fsf@skywalker.in.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328131706.GF20949@tiehlicka.suse.cz>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Mime-Version: 1.0 Return-path: In-Reply-To: <20120328131706.GF20949-VqjxzfR4DlwKmadIfiO5sKVXKuFTiq87@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Michal Hocko Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Michal Hocko writes: > On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> This adds necessary charge/uncharge calls in the HugeTLB code > > This begs for more description... > Other than that it looks correct. > Updated as below hugetlb: add charge/uncharge calls for HugeTLB alloc/free This adds necessary charge/uncharge calls in the HugeTLB code. We do memcg charge in page alloc and uncharge in compound page destructor. 
We also need to ignore HugeTLB pages in __mem_cgroup_uncharge_common because that get called from delete_from_page_cache -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Date: Thu, 29 Mar 2012 09:18:39 +0900 Message-ID: <4F73AA5F.5050604@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328134020.GG20949@tiehlicka.suse.cz>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) <87y5qk1vat.fsf@skywalker.in.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <87y5qk1vat.fsf-6yE53ggjAfyqSkle7U1LjlaTQe2KTcn/@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: Michal Hocko , linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (2012/03/29 2:37), Aneesh Kumar K.V wrote: > Michal Hocko writes: > >> On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote: >> [...] >>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >>> index 6728a7a..4b36c5e 100644 >>> --- a/mm/memcontrol.c >>> +++ b/mm/memcontrol.c >> [...] 
>>> @@ -4887,6 +5013,7 @@ err_cleanup: >>> static struct cgroup_subsys_state * __ref >>> mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) >>> { >>> + int idx; >>> struct mem_cgroup *memcg, *parent; >>> long error = -ENOMEM; >>> int node; >>> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) >>> * mem_cgroup(see mem_cgroup_put). >>> */ >>> mem_cgroup_get(parent); >>> + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) >>> + res_counter_init(&memcg->hugepage[idx], >>> + &parent->hugepage[idx]); >> >> Hmm, I do not think we want to make groups deeper in the hierarchy >> unlimited as we cannot reclaim. Shouldn't we copy the limit from the parent? >> Still not ideal but slightly more expected behavior IMO. > > But we should be limiting the child group based on parent's limit only > when hierarchy is set right ? > >> >> The hierarchy setups are still interesting and the limitations should be >> described in the documentation... >> > > It should behave similar to memcg. ie, if hierarchy is set, then we limit > using MIN(parent's limit, child's limit). May be I am missing some of > the details of memcg use_hierarchy config. My goal was to keep it > similar to memcg. Can you explain why do you think the patch would > make it any different ? > Maybe this is a different story, but Tejun (the cgroup maintainer) asked us to remove the 'use_hierarchy' setting because most other cgroups are hierarchical (*). I answered that an improvement in res_counter latency is required first. Now we have some ideas for improving res_counter. (I'd like to try this after the page_cgroup diet series.) If we change this and drop use_hierarchy, the equivalent of the current use_hierarchy=0 will be:
/cgroup/memory/ = unlimited
  level1 = unlimited
    level2 = unlimited
      level3 = limit
To do this, after improving res_counter, we will add use_hierarchy to the feature-removal list and wait for 2 versions. So this will not affect your development anyway.
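A rough userspace sketch of why that layout would behave like today's use_hierarchy=0, assuming the effective limit of a group is the smallest limit on the path to the root. struct group, LIMIT_UNLIMITED and effective_limit() are hypothetical names for illustration, not kernel interfaces.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define LIMIT_UNLIMITED ULONG_MAX /* hypothetical marker for "no limit" */

/* Hypothetical node in a cgroup-like tree; not a kernel structure. */
struct group {
	unsigned long limit;        /* own limit, or LIMIT_UNLIMITED */
	const struct group *parent; /* NULL for the root */
};

/* Return the smallest limit found between g and the root. */
static unsigned long effective_limit(const struct group *g)
{
	unsigned long min = LIMIT_UNLIMITED;

	assert(g != NULL);
	for (; g != NULL; g = g->parent)
		if (g->limit < min)
			min = g->limit;
	return min;
}
```

With every ancestor left at LIMIT_UNLIMITED, effective_limit() for level3 is just level3's own limit. Lowering any ancestor's limit below that value would cap level3 as well.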
Thanks, -Kame (*) AFAIK, blkio cgroup needs tons of work to be hierarchical... From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Date: Thu, 29 Mar 2012 09:57:22 +0200 Message-ID: <20120329075722.GB30465@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328134020.GG20949@tiehlicka.suse.cz> <87y5qk1vat.fsf@skywalker.in.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <87y5qk1vat.fsf-6yE53ggjAfyqSkle7U1LjlaTQe2KTcn/@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Wed 28-03-12 23:07:14, Aneesh Kumar K.V wrote: > Michal Hocko writes: > > > On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote: > > [...] > >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c > >> index 6728a7a..4b36c5e 100644 > >> --- a/mm/memcontrol.c > >> +++ b/mm/memcontrol.c > > [...] > >> @@ -4887,6 +5013,7 @@ err_cleanup: > >> static struct cgroup_subsys_state * __ref > >> mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > >> { > >> + int idx; > >> struct mem_cgroup *memcg, *parent; > >> long error = -ENOMEM; > >> int node; > >> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > >> * mem_cgroup(see mem_cgroup_put). 
> >> */ > >> mem_cgroup_get(parent); > >> + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) > >> + res_counter_init(&memcg->hugepage[idx], > >> + &parent->hugepage[idx]); > > > > Hmm, I do not think we want to make groups deeper in the hierarchy > > unlimited as we cannot reclaim. Shouldn't we copy the limit from the parent? > > Still not ideal but slightly more expected behavior IMO. > > But we should be limiting the child group based on parent's limit only > when hierarchy is set right ? Yes. Everything else should be unlimited by default. > > > > > The hierarchy setups are still interesting and the limitations should be > > described in the documentation... > > > > It should behave similar to memcg. ie, if hierarchy is set, then we limit > using MIN(parent's limit, child's limit). May be I am missing some of > the details of memcg use_hierarchy config. My goal was to keep it > similar to memcg. Can you explain why do you think the patch would > make it any different ? Yes, the patch tries to be consistent with the memcg limits. That is OK and I have no objections for that. It is just that consequences are different. The hugetlb limit is really hard... > > -aneesh > > -- > To unsubscribe from this list: send the line "unsubscribe cgroups" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Date: Thu, 29 Mar 2012 10:10:03 +0200 Message-ID: <20120329081003.GC30465@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328131706.GF20949@tiehlicka.suse.cz> <87sjgs1v6x.fsf@skywalker.in.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <87sjgs1v6x.fsf@skywalker.in.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Wed 28-03-12 23:09:34, Aneesh Kumar K.V wrote: > Michal Hocko writes: > > > On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote: > >> From: "Aneesh Kumar K.V" > >> > >> This adds necessary charge/uncharge calls in the HugeTLB code > > > > This begs for more description... > > Other than that it looks correct. > > > > Updated as below > > hugetlb: add charge/uncharge calls for HugeTLB alloc/free > > This adds necessary charge/uncharge calls in the HugeTLB code. We do > memcg charge in page alloc and uncharge in compound page destructor. > We also need to ignore HugeTLB pages in __mem_cgroup_uncharge_common > because that get called from delete_from_page_cache and from mem_cgroup_end_migration used during soft_offline_page. Btw., while looking at mem_cgroup_end_migration, I have noticed that you need to take care of mem_cgroup_prepare_migration as well otherwise the page would get charged as a normal (shmem) page. 
> > -aneesh -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages Date: Thu, 29 Mar 2012 10:11:57 +0200 Message-ID: <20120329081157.GD30465@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328135845.GH20949@tiehlicka.suse.cz> <87vclo1v8l.fsf@skywalker.in.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <87vclo1v8l.fsf@skywalker.in.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Wed 28-03-12 23:08:34, Aneesh Kumar K.V wrote: > Michal Hocko writes: > > > On Fri 16-03-12 23:09:28, Aneesh Kumar K.V wrote: > >> From: "Aneesh Kumar K.V" > >> > >> hugepage_activelist will be used to track currently used HugeTLB pages. > >> We need to find the in-use HugeTLB pages to support memcg removal. > >> On memcg removal we update the page's memory cgroup to point to > >> parent cgroup.
> >> > >> Signed-off-by: Aneesh Kumar K.V > >> --- > >> include/linux/hugetlb.h | 1 + > >> mm/hugetlb.c | 23 ++++++++++++++++++----- > >> 2 files changed, 19 insertions(+), 5 deletions(-) > >> > >> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > >> index cbd8dc5..6919100 100644 > >> --- a/include/linux/hugetlb.h > >> +++ b/include/linux/hugetlb.h > > [...] > >> @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, > >> page = pte_page(pte); > >> if (pte_dirty(pte)) > >> set_page_dirty(page); > >> - list_add(&page->lru, &page_list); > >> + > >> + spin_lock(&hugetlb_lock); > >> + list_move(&page->lru, &page_list); > >> + spin_unlock(&hugetlb_lock); > > > > Why do we really need the spinlock here? > > > It does a list_del from hugepage_activelist. right you are. sorry -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Date: Fri, 30 Mar 2012 12:46:50 +0200 Message-ID: <20120330104650.GB15375@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328131706.GF20949@tiehlicka.suse.cz> <87sjgs1v6x.fsf@skywalker.in.ibm.com> <20120329081003.GC30465@tiehlicka.suse.cz> <871uoamkxr.fsf@skywalker.in.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <871uoamkxr.fsf-6yE53ggjAfyqSkle7U1LjlaTQe2KTcn/@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, mgorman-l3A5Bk7waGM@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Fri 30-03-12 16:10:00, Aneesh Kumar K.V wrote: > Michal Hocko writes: > > > On Wed 28-03-12 23:09:34, Aneesh Kumar K.V wrote: > >> Michal Hocko writes: > >> > >> > On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote: > >> >> From: "Aneesh Kumar K.V" > >> >> > >> >> This adds necessary charge/uncharge calls in the HugeTLB code > >> > > >> > This begs for more description... > >> > Other than that it looks correct. 
> >> > > >> > >> Updated as below > >> > >> hugetlb: add charge/uncharge calls for HugeTLB alloc/free > >> > >> This adds necessary charge/uncharge calls in the HugeTLB code. We do > >> memcg charge in page alloc and uncharge in compound page destructor. > >> We also need to ignore HugeTLB pages in __mem_cgroup_uncharge_common > >> because that get called from delete_from_page_cache > > > > and from mem_cgroup_end_migration used during soft_offline_page. > > > > Btw., while looking at mem_cgroup_end_migration, I have noticed that you > > need to take care of mem_cgroup_prepare_migration as well otherwise the > > page would get charged as a normal (shmem) page. > > > > Won't we skip HugeTLB pages in migrate ? Yes but we still migrate for memory failure (see soft_offline_page). > check_range do check for is_vm_hugetlb_page. > > -aneesh > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Date: Fri, 30 Mar 2012 16:10:00 +0530 Message-ID: <871uoamkxr.fsf@skywalker.in.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328131706.GF20949@tiehlicka.suse.cz> <87sjgs1v6x.fsf@skywalker.in.ibm.com> <20120329081003.GC30465@tiehlicka.suse.cz>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Mime-Version: 1.0 Return-path: In-Reply-To: <20120329081003.GC30465@tiehlicka.suse.cz> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Michal Hocko Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Michal Hocko 
writes: > On Wed 28-03-12 23:09:34, Aneesh Kumar K.V wrote: >> Michal Hocko writes: >> >> > On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote: >> >> From: "Aneesh Kumar K.V" >> >> >> >> This adds necessary charge/uncharge calls in the HugeTLB code >> > >> > This begs for more description... >> > Other than that it looks correct. >> > >> >> Updated as below >> >> hugetlb: add charge/uncharge calls for HugeTLB alloc/free >> >> This adds necessary charge/uncharge calls in the HugeTLB code. We do >> memcg charge in page alloc and uncharge in compound page destructor. >> We also need to ignore HugeTLB pages in __mem_cgroup_uncharge_common >> because that get called from delete_from_page_cache > > and from mem_cgroup_end_migration used during soft_offline_page. > > Btw., while looking at mem_cgroup_end_migration, I have noticed that you > need to take care of mem_cgroup_prepare_migration as well otherwise the > page would get charged as a normal (shmem) page. > Won't we skip HugeTLB pages in migrate ? check_range do check for is_vm_hugetlb_page. -aneesh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx148.postini.com [74.125.245.148]) by kanga.kvack.org (Postfix) with SMTP id 71B0F6B004A for ; Sun, 18 Mar 2012 22:13:42 -0400 (EDT) Received: from m4.gw.fujitsu.co.jp (unknown [10.0.50.74]) by fgwmail5.fujitsu.co.jp (Postfix) with ESMTP id D2ED944DD85 for ; Mon, 19 Mar 2012 11:13:40 +0900 (JST) Received: from smail (m4 [127.0.0.1]) by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 9A9C545DE57 for ; Mon, 19 Mar 2012 11:13:40 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94]) by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 6FF0A45DE4F for ; Mon, 19 Mar 2012 11:13:40 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 5CCB71DB804A for ; Mon, 19 Mar 2012 11:13:40 +0900 (JST) Received: from m107.s.css.fujitsu.com (m107.s.css.fujitsu.com [10.240.81.147]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 0D2B61DB8043 for ; Mon, 19 Mar 2012 11:13:40 +0900 (JST) Message-ID: <4F6695EC.2060208@jp.fujitsu.com> Date: Mon, 19 Mar 2012 11:11:56 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: 
"Aneesh Kumar K.V" > > Using VM_FAULT_* codes with ERR_PTR will require us to make sure > VM_FAULT_* values will not exceed MAX_ERRNO value. > > Signed-off-by: Aneesh Kumar K.V Is this a bug fix ? Reviewed-by: KAMEZAWA Hiroyuki -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx159.postini.com [74.125.245.159]) by kanga.kvack.org (Postfix) with SMTP id B04436B004A for ; Sun, 18 Mar 2012 22:40:25 -0400 (EDT) Received: from m2.gw.fujitsu.co.jp (unknown [10.0.50.72]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id 2CB343EE0C0 for ; Mon, 19 Mar 2012 11:40:24 +0900 (JST) Received: from smail (m2 [127.0.0.1]) by outgoing.m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 11CF045DE52 for ; Mon, 19 Mar 2012 11:40:24 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (s2.gw.fujitsu.co.jp [10.0.50.92]) by m2.gw.fujitsu.co.jp (Postfix) with ESMTP id EC89E45DE50 for ; Mon, 19 Mar 2012 11:40:23 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id DC0A41DB803E for ; Mon, 19 Mar 2012 11:40:23 +0900 (JST) Received: from m107.s.css.fujitsu.com (m107.s.css.fujitsu.com [10.240.81.147]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 7D45E1DB802C for ; Mon, 19 Mar 2012 11:40:23 +0900 (JST) Message-ID: <4F669C2E.1010502@jp.fujitsu.com> Date: Mon, 19 Mar 2012 11:38:38 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; 
charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This patch implements a memcg extension that allows us to control > HugeTLB allocations via memory controller. > If you write some details here, it will be helpful for review and seeing log after merge. > Signed-off-by: Aneesh Kumar K.V > --- > include/linux/hugetlb.h | 1 + > include/linux/memcontrol.h | 42 +++++++++++++ > init/Kconfig | 8 +++ > mm/hugetlb.c | 2 +- > mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 190 insertions(+), 1 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index a2675b0..1f70068 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -243,6 +243,7 @@ struct hstate *size_to_hstate(unsigned long size); > #define HUGE_MAX_HSTATE 1 > #endif > > +extern int hugetlb_max_hstate; > extern struct hstate hstates[HUGE_MAX_HSTATE]; > extern unsigned int default_hstate_idx; > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 4d34356..320dbad 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -429,5 +429,47 @@ static inline void sock_release_memcg(struct sock *sk) > { > } > #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */ > + > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > +extern int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > + struct mem_cgroup **ptr); > +extern void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages, > + struct mem_cgroup *memcg, > + struct page *page); > +extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, > + struct page *page); > +extern void 
mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > + struct mem_cgroup *memcg); > + > +#else > +static inline int > +mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > + struct mem_cgroup **ptr) > +{ > + return 0; > +} > + > +static inline void > +mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages, > + struct mem_cgroup *memcg, > + struct page *page) > +{ > + return; > +} > + > +static inline void > +mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, > + struct page *page) > +{ > + return; > +} > + > +static inline void > +mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > + struct mem_cgroup *memcg) > +{ > + return; > +} > +#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ > #endif /* _LINUX_MEMCONTROL_H */ > > diff --git a/init/Kconfig b/init/Kconfig > index 3f42cd6..f0eb8aa 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -725,6 +725,14 @@ config CGROUP_PERF > > Say N if unsure. > > +config MEM_RES_CTLR_HUGETLB > + bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)" > + depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL > + default n > + help > + Add HugeTLB management to memory resource controller. When you > + enable this, you can put a per cgroup limit on HugeTLB usage. 
> + > menuconfig CGROUP_SCHED > bool "Group CPU scheduler" > default n > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index ebe245c..c672187 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; > static gfp_t htlb_alloc_mask = GFP_HIGHUSER; > unsigned long hugepages_treat_as_movable; > > -static int hugetlb_max_hstate; > +int hugetlb_max_hstate; > unsigned int default_hstate_idx; > struct hstate hstates[HUGE_MAX_HSTATE]; > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 6728a7a..4b36c5e 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -235,6 +235,10 @@ struct mem_cgroup { > */ > struct res_counter memsw; > /* > + * the counter to account for hugepages from hugetlb. > + */ > + struct res_counter hugepage[HUGE_MAX_HSTATE]; > + /* > * Per cgroup active and inactive list, similar to the > * per zone LRU lists. > */ > @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, > } > #endif > > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > +{ > + int idx; > + for (idx = 0; idx < hugetlb_max_hstate; idx++) { > + if (memcg->hugepage[idx].usage > 0) > + return 1; > + } > + return 0; > +} Please use res_counter_read_u64() rather than reading the value directly. > + > +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > + struct mem_cgroup **ptr) > +{ > + int ret = 0; > + struct mem_cgroup *memcg; > + struct res_counter *fail_res; > + unsigned long csize = nr_pages * PAGE_SIZE; > + > + if (mem_cgroup_disabled()) > + return 0; > +again: > + rcu_read_lock(); > + memcg = mem_cgroup_from_task(current); > + if (!memcg) > + memcg = root_mem_cgroup; > + if (mem_cgroup_is_root(memcg)) { > + rcu_read_unlock(); > + goto done; > + } One concern is.... 
Now, yes, the memory cgroup doesn't account the root cgroup and doesn't
update res->usage, to avoid the overhead of updating a shared counter
when memcg is not mounted. But the memory.usage_in_bytes file works
for the root memcg by reading percpu statistics.

So, how about counting usage for the root cgroup even if it cannot be limited ?
Considering hugetlbfs usage, updating res_counter here doesn't have
the false-sharing performance problem.
Then you can remove the root_mem_cgroup() checks inserted in several places.

>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> +	/*
> +	 * Don't allow memcg removal if we have HugeTLB resource
> +	 * usage.
> +	 */
> +	if (mem_cgroup_have_hugetlb_usage(memcg))
> +		return -EBUSY;
>
>  	return mem_cgroup_force_empty(memcg, false);
>  }

Is this fixed by patch 8+9 ?

Thanks,
-Kame

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx207.postini.com [74.125.245.207]) by kanga.kvack.org (Postfix) with SMTP id 900DE6B004A for ; Sun, 18 Mar 2012 22:42:50 -0400 (EDT)
Received: from m2.gw.fujitsu.co.jp (unknown [10.0.50.72]) by fgwmail5.fujitsu.co.jp (Postfix) with ESMTP id 1BA923EE0BD for ; Mon, 19 Mar 2012 11:42:49 +0900 (JST)
Received: from smail (m2 [127.0.0.1]) by outgoing.m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 01FBE45DE55 for ; Mon, 19 Mar 2012 11:42:49 +0900 (JST)
Received: from s2.gw.fujitsu.co.jp (s2.gw.fujitsu.co.jp [10.0.50.92]) by m2.gw.fujitsu.co.jp (Postfix) with ESMTP id DC9AD45DE50 for ; Mon, 19 Mar 2012 11:42:48 +0900 (JST)
Received: from s2.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id C3F221DB803A for ; Mon, 19 Mar 2012 11:42:48 +0900 (JST)
Received: from ml13.s.css.fujitsu.com
(ml13.s.css.fujitsu.com [10.240.81.133]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 76A3BE08001 for ; Mon, 19 Mar 2012 11:42:48 +0900 (JST) Message-ID: <4F669CC3.9070007@jp.fujitsu.com> Date: Mon, 19 Mar 2012 11:41:07 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This adds necessary charge/uncharge calls in the HugeTLB code > > Acked-by: Hillf Danton > Signed-off-by: Aneesh Kumar K.V Reviewed-by: KAMEZAWA Hiroyuki A nitpick below. 
> --- > mm/hugetlb.c | 21 ++++++++++++++++++++- > mm/memcontrol.c | 5 +++++ > 2 files changed, 25 insertions(+), 1 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index c672187..91361a0 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -21,6 +21,8 @@ > #include > #include > #include > +#include > +#include > > #include > #include > @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page) > BUG_ON(page_mapcount(page)); > INIT_LIST_HEAD(&page->lru); > > + if (mapping) > + mem_cgroup_hugetlb_uncharge_page(hstate_index(h), > + pages_per_huge_page(h), page); > spin_lock(&hugetlb_lock); > if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { > update_and_free_page(h, page); > @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h, > static struct page *alloc_huge_page(struct vm_area_struct *vma, > unsigned long addr, int avoid_reserve) > { > + int ret, idx; > struct hstate *h = hstate_vma(vma); > struct page *page; > + struct mem_cgroup *memcg = NULL; Can't we this initialization in mem_cgroup_hugetlb_charge_page() ? Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx178.postini.com [74.125.245.178]) by kanga.kvack.org (Postfix) with SMTP id AA7F66B004A for ; Sun, 18 Mar 2012 22:45:17 -0400 (EDT) Received: from m2.gw.fujitsu.co.jp (unknown [10.0.50.72]) by fgwmail5.fujitsu.co.jp (Postfix) with ESMTP id 4A1263EE0AE for ; Mon, 19 Mar 2012 11:45:16 +0900 (JST) Received: from smail (m2 [127.0.0.1]) by outgoing.m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 2DA4945DE55 for ; Mon, 19 Mar 2012 11:45:16 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (s2.gw.fujitsu.co.jp [10.0.50.92]) by m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 12FA345DE52 for ; Mon, 19 Mar 2012 11:45:16 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id EF1FD1DB803A for ; Mon, 19 Mar 2012 11:45:15 +0900 (JST) Received: from ml13.s.css.fujitsu.com (ml13.s.css.fujitsu.com [10.240.81.133]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id A35C11DB802C for ; Mon, 19 Mar 2012 11:45:15 +0900 (JST) Message-ID: <4F669D56.4080002@jp.fujitsu.com> Date: Mon, 19 Mar 2012 11:43:34 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH -V4 06/10] memcg: track resource index in cftype private References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1331919570-2264-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: 
"Aneesh Kumar K.V" > > This helps in using same memcg callbacks for non reclaim resource > control files. > > Signed-off-by: Aneesh Kumar K.V Acked-by: KAMEZAWA Hiroyuki As mentioned, I'm glad if you can handle usage_in_bytes for root memcg. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx184.postini.com [74.125.245.184]) by kanga.kvack.org (Postfix) with SMTP id 3F63B6B004A for ; Sun, 18 Mar 2012 23:02:26 -0400 (EDT) Received: from m1.gw.fujitsu.co.jp (unknown [10.0.50.71]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id A1B793EE0BB for ; Mon, 19 Mar 2012 12:02:24 +0900 (JST) Received: from smail (m1 [127.0.0.1]) by outgoing.m1.gw.fujitsu.co.jp (Postfix) with ESMTP id 863A645DE58 for ; Mon, 19 Mar 2012 12:02:24 +0900 (JST) Received: from s1.gw.fujitsu.co.jp (s1.gw.fujitsu.co.jp [10.0.50.91]) by m1.gw.fujitsu.co.jp (Postfix) with ESMTP id 6BA2E45DE5D for ; Mon, 19 Mar 2012 12:02:24 +0900 (JST) Received: from s1.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s1.gw.fujitsu.co.jp (Postfix) with ESMTP id 59BD31DB8049 for ; Mon, 19 Mar 2012 12:02:24 +0900 (JST) Received: from m105.s.css.fujitsu.com (m105.s.css.fujitsu.com [10.240.81.145]) by s1.gw.fujitsu.co.jp (Postfix) with ESMTP id 039D11DB804E for ; Mon, 19 Mar 2012 12:02:24 +0900 (JST) Message-ID: <4F66A15B.7070804@jp.fujitsu.com> Date: Mon, 19 Mar 2012 12:00:43 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: 
<1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > hugepage_activelist will be used to track currently used HugeTLB pages. > We need to find the in-use HugeTLB pages to support memcg removal. > On memcg removal we update the page's memory cgroup to point to > parent cgroup. > > Signed-off-by: Aneesh Kumar K.V Reviewed-by: KAMEZAWA Hiroyuki seems ok to me but...why the new list is not per node ? no benefit ? Thanks, -Kame > --- > include/linux/hugetlb.h | 1 + > mm/hugetlb.c | 23 ++++++++++++++++++----- > 2 files changed, 19 insertions(+), 5 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index cbd8dc5..6919100 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -217,6 +217,7 @@ struct hstate { > unsigned long resv_huge_pages; > unsigned long surplus_huge_pages; > unsigned long nr_overcommit_huge_pages; > + struct list_head hugepage_activelist; > struct list_head hugepage_freelists[MAX_NUMNODES]; > unsigned int nr_huge_pages_node[MAX_NUMNODES]; > unsigned int free_huge_pages_node[MAX_NUMNODES]; > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 684849a..8fd465d 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -433,7 +433,7 @@ void copy_huge_page(struct page *dst, struct page *src) > static void enqueue_huge_page(struct hstate *h, struct page *page) > { > int nid = page_to_nid(page); > - list_add(&page->lru, &h->hugepage_freelists[nid]); > + list_move(&page->lru, &h->hugepage_freelists[nid]); > h->free_huge_pages++; > h->free_huge_pages_node[nid]++; > } > @@ -445,7 +445,7 @@ static 
struct page *dequeue_huge_page_node(struct hstate *h, int nid) > if (list_empty(&h->hugepage_freelists[nid])) > return NULL; > page = list_entry(h->hugepage_freelists[nid].next, struct page, lru); > - list_del(&page->lru); > + list_move(&page->lru, &h->hugepage_activelist); > set_page_refcounted(page); > h->free_huge_pages--; > h->free_huge_pages_node[nid]--; > @@ -542,13 +542,14 @@ static void free_huge_page(struct page *page) > page->mapping = NULL; > BUG_ON(page_count(page)); > BUG_ON(page_mapcount(page)); > - INIT_LIST_HEAD(&page->lru); > > if (mapping) > mem_cgroup_hugetlb_uncharge_page(hstate_index(h), > pages_per_huge_page(h), page); > spin_lock(&hugetlb_lock); > if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { > + /* remove the page from active list */ > + list_del(&page->lru); > update_and_free_page(h, page); > h->surplus_huge_pages--; > h->surplus_huge_pages_node[nid]--; > @@ -562,6 +563,7 @@ static void free_huge_page(struct page *page) > > static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) > { > + INIT_LIST_HEAD(&page->lru); > set_compound_page_dtor(page, free_huge_page); > spin_lock(&hugetlb_lock); > h->nr_huge_pages++; > @@ -1861,6 +1863,7 @@ void __init hugetlb_add_hstate(unsigned order) > h->free_huge_pages = 0; > for (i = 0; i < MAX_NUMNODES; ++i) > INIT_LIST_HEAD(&h->hugepage_freelists[i]); > + INIT_LIST_HEAD(&h->hugepage_activelist); > h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]); > h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); > snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", > @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, > page = pte_page(pte); > if (pte_dirty(pte)) > set_page_dirty(page); > - list_add(&page->lru, &page_list); > + > + spin_lock(&hugetlb_lock); > + list_move(&page->lru, &page_list); > + spin_unlock(&hugetlb_lock); > } > spin_unlock(&mm->page_table_lock); > flush_tlb_range(vma, start, 
end); > mmu_notifier_invalidate_range_end(mm, start, end); > list_for_each_entry_safe(page, tmp, &page_list, lru) { > page_remove_rmap(page); > - list_del(&page->lru); > + /* > + * We need to move it back huge page active list. If we are > + * holding the last reference, below put_page will move it > + * back to free list. > + */ > + spin_lock(&hugetlb_lock); > + list_move(&page->lru, &h->hugepage_activelist); > + spin_unlock(&hugetlb_lock); > put_page(page); > } > } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx134.postini.com [74.125.245.134]) by kanga.kvack.org (Postfix) with SMTP id 6C7FF6B004D for ; Sun, 18 Mar 2012 23:06:39 -0400 (EDT) Received: from m4.gw.fujitsu.co.jp (unknown [10.0.50.74]) by fgwmail5.fujitsu.co.jp (Postfix) with ESMTP id 026BE3EE0BC for ; Mon, 19 Mar 2012 12:06:38 +0900 (JST) Received: from smail (m4 [127.0.0.1]) by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id DCDE445DE55 for ; Mon, 19 Mar 2012 12:06:37 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94]) by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 8690745DE53 for ; Mon, 19 Mar 2012 12:06:37 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 79FCA1DB8040 for ; Mon, 19 Mar 2012 12:06:37 +0900 (JST) Received: from ml14.s.css.fujitsu.com (ml14.s.css.fujitsu.com [10.240.81.134]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 1E2621DB803B for ; Mon, 19 Mar 2012 12:06:37 +0900 (JST) Message-ID: <4F66A258.5060301@jp.fujitsu.com> Date: Mon, 19 Mar 2012 12:04:56 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH -V4 09/10] memcg: move HugeTLB resource count to 
parent cgroup on memcg removal
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
In-Reply-To: <1331919570-2264-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

(2012/03/17 2:39), Aneesh Kumar K.V wrote:

> From: "Aneesh Kumar K.V"
>
> This add support for memcg removal with HugeTLB resource usage.
>
> Signed-off-by: Aneesh Kumar K.V

Seems OK for now.

Tejun, Costa, and I are now discussing removing -EBUSY from rmdir().
We're considering 'if use_hierarchy=false and the parent seems full,
reclaim all or move charges to the root cgroup.' Then -EBUSY will go away.

Is that acceptable for hugetlb ? Do you have another idea ?
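
For reference, the charge-moving order used by mem_cgroup_move_hugetlb_parent() in the patch quoted below can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct counter, counter_charge(), counter_uncharge() and move_charge_to_parent() are made-up stand-ins and not the kernel's res_counter API, and the root-cgroup check and page_cgroup locking from the patch are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a res_counter; not the kernel API. */
struct counter {
	unsigned long long usage;
	unsigned long long limit;
	struct counter *parent;	/* non-NULL only when use_hierarchy is modelled */
};

/* Charge val to c and to every ancestor; undo everything on failure. */
static int counter_charge(struct counter *c, unsigned long long val)
{
	struct counter *it, *undo;

	for (it = c; it != NULL; it = it->parent) {
		if (it->usage + val > it->limit)
			goto rollback;
		it->usage += val;
	}
	return 0;
rollback:
	/* Only the counters below the failing one were charged. */
	for (undo = c; undo != it; undo = undo->parent)
		undo->usage -= val;
	return -ENOMEM;
}

/* Uncharge val from c and from every ancestor. */
static void counter_uncharge(struct counter *c, unsigned long long val)
{
	for (; c != NULL; c = c->parent) {
		assert(c->usage >= val);
		c->usage -= val;
	}
}

/*
 * Move one huge page's charge (csize bytes) from child to parent.
 * hierarchical != 0 models use_hierarchy: the child's counter is linked
 * to the parent's, so the parent is already charged for the page.
 */
static int move_charge_to_parent(struct counter *child, struct counter *parent,
				 unsigned long long csize, int hierarchical)
{
	if (hierarchical) {
		/* Uncharge first so that re-charging the parent cannot fail. */
		counter_uncharge(child, csize);
		return counter_charge(parent, csize);
	}
	/* Flat case: the parent may be full, so charge it before
	 * giving up the child's charge. */
	if (counter_charge(parent, csize))
		return -EBUSY;
	counter_uncharge(child, csize);
	return 0;
}
```

With use_hierarchy the child's charge is already counted in the parent, so uncharging first frees exactly the room the re-charge needs. Without it the parent can be full, and that is where the -EBUSY discussed above comes from.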
Thanks, -Kame > --- > include/linux/hugetlb.h | 6 ++++ > include/linux/memcontrol.h | 15 +++++++++- > mm/hugetlb.c | 41 ++++++++++++++++++++++++++ > mm/memcontrol.c | 68 +++++++++++++++++++++++++++++++++++++------ > 4 files changed, 119 insertions(+), 11 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 6919100..32e948c 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -349,11 +349,17 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) > #ifdef CONFIG_MEM_RES_CTLR_HUGETLB > extern int register_hugetlb_memcg_files(struct cgroup *cgroup, > struct cgroup_subsys *ss); > +extern int hugetlb_force_memcg_empty(struct cgroup *cgroup); > #else > static inline int register_hugetlb_memcg_files(struct cgroup *cgroup, > struct cgroup_subsys *ss) > { > return 0; > } > + > +static inline int hugetlb_force_memcg_empty(struct cgroup *cgroup) > +{ > + return 0; > +} > #endif > #endif /* _LINUX_HUGETLB_H */ > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 73900b9..0980122 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -441,7 +441,9 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, > extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > struct mem_cgroup *memcg); > extern int mem_cgroup_hugetlb_file_init(int idx); > - > +extern int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, > + struct page *page); > +extern bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup); > #else > static inline int > mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > @@ -477,6 +479,17 @@ static inline int mem_cgroup_hugetlb_file_init(int idx) > return 0; > } > > +static inline int > +mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, > + struct page *page) > +{ > + return 0; > +} > + > +static inline bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup) > +{ 
> + return 0; > +} > #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ > #endif /* _LINUX_MEMCONTROL_H */ > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 8fd465d..685f0d5 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1842,6 +1842,47 @@ int register_hugetlb_memcg_files(struct cgroup *cgroup, > } > return ret; > } > + > +/* > + * Force the memcg to empty the hugetlb resources by moving them to > + * the parent cgroup. We can fail if the parent cgroup's limit prevented > + * the charging. This should only happen if use_hierarchy is not set. > + */ > +int hugetlb_force_memcg_empty(struct cgroup *cgroup) > +{ > + struct hstate *h; > + struct page *page; > + int ret = 0, idx = 0; > + > + do { > + if (cgroup_task_count(cgroup) || !list_empty(&cgroup->children)) > + goto out; > + /* > + * If the task doing the cgroup_rmdir got a signal > + * we don't really need to loop till the hugetlb resource > + * usage become zero. > + */ > + if (signal_pending(current)) { > + ret = -EINTR; > + goto out; > + } > + for_each_hstate(h) { > + spin_lock(&hugetlb_lock); > + list_for_each_entry(page, &h->hugepage_activelist, lru) { > + ret = mem_cgroup_move_hugetlb_parent(idx, cgroup, page); > + if (ret) { > + spin_unlock(&hugetlb_lock); > + goto out; > + } > + } > + spin_unlock(&hugetlb_lock); > + idx++; > + } > + cond_resched(); > + } while (mem_cgroup_have_hugetlb_usage(cgroup)); > +out: > + return ret; > +} > #endif > > /* Should be called on processing a hugepagesz=... 
option */ > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4900b72..e29d86d 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -3171,9 +3171,11 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, > #endif > > #ifdef CONFIG_MEM_RES_CTLR_HUGETLB > -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > +bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup) > { > int idx; > + struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup); > + > for (idx = 0; idx < hugetlb_max_hstate; idx++) { > if (memcg->hugepage[idx].usage > 0) > return 1; > @@ -3285,10 +3287,57 @@ void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > res_counter_uncharge(&memcg->hugepage[idx], csize); > return; > } > -#else > -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > + > +int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, > + struct page *page) > { > - return 0; > + struct page_cgroup *pc; > + int csize, ret = 0; > + struct res_counter *fail_res; > + struct cgroup *pcgrp = cgroup->parent; > + struct mem_cgroup *parent = mem_cgroup_from_cont(pcgrp); > + struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup); > + > + if (!get_page_unless_zero(page)) > + goto out; > + > + pc = lookup_page_cgroup(page); > + lock_page_cgroup(pc); > + if (!PageCgroupUsed(pc) || pc->mem_cgroup != memcg) > + goto err_out; > + > + csize = PAGE_SIZE << compound_order(page); > + /* > + * uncharge from child and charge the parent. If we have > + * use_hierarchy set, we can never fail here. In-order to make > + * sure we don't get -ENOMEM on parent charge, we first uncharge > + * the child and then charge the parent. 
> + */ > + if (parent->use_hierarchy) { > + res_counter_uncharge(&memcg->hugepage[idx], csize); > + if (!mem_cgroup_is_root(parent)) > + ret = res_counter_charge(&parent->hugepage[idx], > + csize, &fail_res); > + } else { > + if (!mem_cgroup_is_root(parent)) { > + ret = res_counter_charge(&parent->hugepage[idx], > + csize, &fail_res); > + if (ret) { > + ret = -EBUSY; > + goto err_out; > + } > + } > + res_counter_uncharge(&memcg->hugepage[idx], csize); > + } > + /* > + * caller should have done css_get > + */ > + pc->mem_cgroup = parent; > +err_out: > + unlock_page_cgroup(pc); > + put_page(page); > +out: > + return ret; > } > #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ > > @@ -3806,6 +3855,11 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all) > /* should free all ? */ > if (free_all) > goto try_to_free; > + > + /* move the hugetlb charges */ > + ret = hugetlb_force_memcg_empty(cgrp); > + if (ret) > + goto out; > move_account: > do { > ret = -EBUSY; > @@ -5103,12 +5157,6 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss, > struct cgroup *cont) > { > struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); > - /* > - * Don't allow memcg removal if we have HugeTLB resource > - * usage. > - */ > - if (mem_cgroup_have_hugetlb_usage(memcg)) > - return -EBUSY; > > return mem_cgroup_force_empty(memcg, false); > } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx187.postini.com [74.125.245.187]) by kanga.kvack.org (Postfix) with SMTP id 8BBC96B00E8 for ; Mon, 19 Mar 2012 02:53:48 -0400 (EDT) Received: from /spool/local by e28smtp07.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! 
Violators will be prosecuted for from ; Mon, 19 Mar 2012 12:23:00 +0530 Received: from d28av05.in.ibm.com (d28av05.in.ibm.com [9.184.220.67]) by d28relay02.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q2J6qtAm4014312 for ; Mon, 19 Mar 2012 12:22:55 +0530 Received: from d28av05.in.ibm.com (loopback [127.0.0.1]) by d28av05.in.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q2JCNME2023189 for ; Mon, 19 Mar 2012 23:23:23 +1100 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension In-Reply-To: <4F669C2E.1010502@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> Date: Mon, 19 Mar 2012 12:22:53 +0530 Message-ID: <874ntlkrp6.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Mon, 19 Mar 2012 11:38:38 +0900, KAMEZAWA Hiroyuki wrote: > (2012/03/17 2:39), Aneesh Kumar K.V wrote: > > > From: "Aneesh Kumar K.V" > > > > This patch implements a memcg extension that allows us to control > > HugeTLB allocations via memory controller. > > > > > If you write some details here, it will be helpful for review and > seeing log after merge. Will add more info. > > > > Signed-off-by: Aneesh Kumar K.V > > --- > > include/linux/hugetlb.h | 1 + > > include/linux/memcontrol.h | 42 +++++++++++++ > > init/Kconfig | 8 +++ > > mm/hugetlb.c | 2 +- > > mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ > > 5 files changed, 190 insertions(+), 1 deletions(-) .... 
> > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > > +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > > +{ > > + int idx; > > + for (idx = 0; idx < hugetlb_max_hstate; idx++) { > > + if (memcg->hugepage[idx].usage > 0) > > + return 1; > > + } > > + return 0; > > +} > > > Please use res_counter_read_u64() rather than reading the value directly. > The open-coded variant is mostly derived from mem_cgroup_force_empty. I have updated the patch to use res_counter_read_u64. > > > + > > +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > > + struct mem_cgroup **ptr) > > +{ > > + int ret = 0; > > + struct mem_cgroup *memcg; > > + struct res_counter *fail_res; > > + unsigned long csize = nr_pages * PAGE_SIZE; > > + > > + if (mem_cgroup_disabled()) > > + return 0; > > +again: > > + rcu_read_lock(); > > + memcg = mem_cgroup_from_task(current); > > + if (!memcg) > > + memcg = root_mem_cgroup; > > + if (mem_cgroup_is_root(memcg)) { > > + rcu_read_unlock(); > > + goto done; > > + } > > > One concern is.... Now, yes, memory cgroup doesn't account root cgroup > and doesn't update res->usage to avoid updating shared counter overheads > when memcg is not mounted. But memory.usage_in_bytes files works > for root memcg with reading percpu statistics. > > So, how about counting usage for root cgroup even if it cannot be limited ? > Considering hugetlb fs usage, updating res_counter here doesn't have > performance problem of false sharing.. > Then, you can remove root_mem_cgroup() checks inserted several places. > Yes. That is a good idea. Will update the patch. > > > > struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); > > + /* > > + * Don't allow memcg removal if we have HugeTLB resource > > + * usage. > > + */ > > + if (mem_cgroup_have_hugetlb_usage(memcg)) > > + return -EBUSY; > > > > return mem_cgroup_force_empty(memcg, false); > > } > > > Is this fixed by patch 8+9 ? Yes. 
-aneesh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx174.postini.com [74.125.245.174]) by kanga.kvack.org (Postfix) with SMTP id 9BC0D6B0083 for ; Mon, 19 Mar 2012 03:02:51 -0400 (EDT) Received: from m3.gw.fujitsu.co.jp (unknown [10.0.50.73]) by fgwmail5.fujitsu.co.jp (Postfix) with ESMTP id AAFEE3EE0BD for ; Mon, 19 Mar 2012 16:02:49 +0900 (JST) Received: from smail (m3 [127.0.0.1]) by outgoing.m3.gw.fujitsu.co.jp (Postfix) with ESMTP id 8E9E345DEC1 for ; Mon, 19 Mar 2012 16:02:49 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (s3.gw.fujitsu.co.jp [10.0.50.93]) by m3.gw.fujitsu.co.jp (Postfix) with ESMTP id 7369E45DEB8 for ; Mon, 19 Mar 2012 16:02:49 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id 5D8E31DB8047 for ; Mon, 19 Mar 2012 16:02:49 +0900 (JST) Received: from m107.s.css.fujitsu.com (m107.s.css.fujitsu.com [10.240.81.147]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id 0520D1DB8044 for ; Mon, 19 Mar 2012 16:02:49 +0900 (JST) Message-ID: <4F66D993.2080100@jp.fujitsu.com> Date: Mon, 19 Mar 2012 16:00:35 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> <874ntlkrp6.fsf@linux.vnet.ibm.com> In-Reply-To: <874ntlkrp6.fsf@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, 
aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org (2012/03/19 15:52), Aneesh Kumar K.V wrote: > On Mon, 19 Mar 2012 11:38:38 +0900, KAMEZAWA Hiroyuki wrote: >> (2012/03/17 2:39), Aneesh Kumar K.V wrote: >> >>> From: "Aneesh Kumar K.V" >>> >>> This patch implements a memcg extension that allows us to control >>> HugeTLB allocations via memory controller. >>> >> >> >> If you write some details here, it will be helpful for review and >> seeing log after merge. > > Will add more info. > >> >> >>> Signed-off-by: Aneesh Kumar K.V >>> --- >>> include/linux/hugetlb.h | 1 + >>> include/linux/memcontrol.h | 42 +++++++++++++ >>> init/Kconfig | 8 +++ >>> mm/hugetlb.c | 2 +- >>> mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ >>> 5 files changed, 190 insertions(+), 1 deletions(-) > > .... > >>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB >>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) >>> +{ >>> + int idx; >>> + for (idx = 0; idx < hugetlb_max_hstate; idx++) { >>> + if (memcg->hugepage[idx].usage > 0) >>> + return 1; >>> + } >>> + return 0; >>> +} >> >> >> Please use res_counter_read_u64() rather than reading the value directly. >> > > The open-coded variant is mostly derived from mem_cgroup_force_empty. I > have updated the patch to use res_counter_read_u64. > Ah, ok. it's(maybe) my bad. I'll schedule a fix. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs
In-Reply-To: <4F66A059.20801@jp.fujitsu.com>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F66A059.20801@jp.fujitsu.com>
Date: Mon, 19 Mar 2012 12:44:11 +0530
Message-ID: <87wr6hjc58.fsf@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-linux-mm@kvack.org
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

On Mon, 19 Mar 2012 11:56:25 +0900, KAMEZAWA Hiroyuki wrote:
> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
>
> > From: "Aneesh Kumar K.V"
> >
> > This add control files for hugetlbfs in memcg
> >
> > Signed-off-by: Aneesh Kumar K.V
>
>
> I have a question. When a user does
>
> 1. create memory cgroup as
>    /cgroup/A
> 2. insmod hugetlb.ko
> 3. ls /cgroup/A
>
> and then, files can be shown ? Don't we have any problem at rmdir A ?
>
> I'm sorry if hugetlb never be used as module.

HUGETLBFS cannot be built as a kernel module.

>
> a comment below.
>
> > ---
> >  include/linux/hugetlb.h    |   17 +++++++++++++++
> >  include/linux/memcontrol.h |    7 ++++++
> >  mm/hugetlb.c               |   25 ++++++++++++++++++++++-
> >  mm/memcontrol.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 96 insertions(+), 1 deletions(-)

......

> >
> > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> > +static char *mem_fmt(char *buf, unsigned long n)
> > +{
> > +	if (n >= (1UL << 30))
> > +		sprintf(buf, "%luGB", n >> 30);
> > +	else if (n >= (1UL << 20))
> > +		sprintf(buf, "%luMB", n >> 20);
> > +	else
> > +		sprintf(buf, "%luKB", n >> 10);
> > +	return buf;
> > +}
> > +
> > +int mem_cgroup_hugetlb_file_init(int idx)
> > +{
>
>
> __init ?

Added.

> And... do we have guarantee that this function is called before
> creating root mem cgroup even if CONFIG_HUGETLBFS=y ?
>

Yes. This should be called before creating root mem cgroup.

-aneesh
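
For readers following the thread, the unit selection in the quoted mem_fmt() helper can be tried outside the kernel. What follows is only a rough user-space sketch: the name mem_fmt_sketch, the explicit buffer-length parameter, the unsigned long long argument, and the use of snprintf() are assumptions made for illustration and are not part of the posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * User-space model of the quoted mem_fmt() helper: render a byte count
 * as a whole number of KB, MB or GB, using the largest unit that fits.
 * Hypothetical differences from the patch: the caller passes the buffer
 * length, and snprintf() is used so the write stays bounded.
 */
char *mem_fmt_sketch(char *buf, size_t len, unsigned long long n)
{
	assert(buf != NULL && len > 0);

	if (n >= (1ULL << 30))
		snprintf(buf, len, "%lluGB", n >> 30);
	else if (n >= (1ULL << 20))
		snprintf(buf, len, "%lluMB", n >> 20);
	else
		snprintf(buf, len, "%lluKB", n >> 10);
	return buf;
}
```

In this model a 2 MiB huge page size is rendered as "2MB" and a 1 GiB size as "1GB". Sizes that are not a whole multiple of the chosen unit are truncated by the shift, just as in the quoted code.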

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4F66E169.5000909@jp.fujitsu.com>
Date: Mon, 19 Mar 2012 16:34:01 +0900
From: KAMEZAWA Hiroyuki
MIME-Version: 1.0
Subject: Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F66A059.20801@jp.fujitsu.com> <87wr6hjc58.fsf@linux.vnet.ibm.com>
In-Reply-To: <87wr6hjc58.fsf@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Tejun Heo

(2012/03/19 16:14), Aneesh Kumar K.V wrote:

> On Mon, 19 Mar 2012 11:56:25 +0900, KAMEZAWA Hiroyuki wrote:
>> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
>>
>>> From: "Aneesh Kumar K.V"
>>>
>>> This add control files for hugetlbfs in memcg
>>>
>>> Signed-off-by: Aneesh Kumar K.V
>>
>>
>> I have a question. When a user does
>>
>> 1. create memory cgroup as
>>    /cgroup/A
>> 2. insmod hugetlb.ko
>> 3. ls /cgroup/A
>>
>> and then, files can be shown ? Don't we have any problem at rmdir A ?
>>
>> I'm sorry if hugetlb never be used as module.
>
> HUGETLBFS cannot be built as a kernel module.
>
>
>>
>> a comment below.
>>
>>> ---
>>>  include/linux/hugetlb.h    |   17 +++++++++++++++
>>>  include/linux/memcontrol.h |    7 ++++++
>>>  mm/hugetlb.c               |   25 ++++++++++++++++++++++-
>>>  mm/memcontrol.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++
>>>  4 files changed, 96 insertions(+), 1 deletions(-)
>
>
> ......
>
>>>
>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>> +static char *mem_fmt(char *buf, unsigned long n)
>>> +{
>>> +	if (n >= (1UL << 30))
>>> +		sprintf(buf, "%luGB", n >> 30);
>>> +	else if (n >= (1UL << 20))
>>> +		sprintf(buf, "%luMB", n >> 20);
>>> +	else
>>> +		sprintf(buf, "%luKB", n >> 10);
>>> +	return buf;
>>> +}
>>> +
>>> +int mem_cgroup_hugetlb_file_init(int idx)
>>> +{
>>
>>
>> __init ?
>
> Added.
>
>> And... do we have guarantee that this function is called before
>> creating root mem cgroup even if CONFIG_HUGETLBFS=y ?
>>
>
> Yes. This should be called before creating root mem cgroup.
>

O.K. BTW, please read Tejun's recent post:

https://lkml.org/lkml/2012/3/16/522

Can you use his methods ?

I guess you can write...

CGROUP_SUBSYS_CFTYPES_COND(mem_cgroup_subsys,
		hugetlb_cgroup_files,
		if XXXXMB hugetlb is allowed);

Hmm.

Thanks,
-Kame

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
In-Reply-To: <4F671AE6.5020204@parallels.com>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> <874ntlkrp6.fsf@linux.vnet.ibm.com> <4F66D993.2080100@jp.fujitsu.com> <4F671AE6.5020204@parallels.com>
User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu)
Date: Wed, 21 Mar 2012 10:18:43 +0530
Message-ID: <87obrqsgno.fsf@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-linux-mm@kvack.org
To: Glauber Costa, KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

Glauber Costa writes:

> On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote:
>> (2012/03/19 15:52), Aneesh Kumar K.V wrote:
>>
>>>
>>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>>>>> +{
>>>>> +	int idx;
>>>>> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
>>>>> +		if (memcg->hugepage[idx].usage > 0)
>>>>> +			return 1;
>>>>> +	}
>>>>> +	return 0;
>>>>> +}
>>>>
>>>>
>>>> Please use res_counter_read_u64() rather than reading the value directly.
>>>>
>>>
>>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I
>>> have updated the patch to use res_counter_read_u64.
>>>
>>
>> Ah, ok. it's(maybe) my bad. I'll schedule a fix.
>>
> Kame,
>
> I actually have it ready here. I can submit it if you want.
>
> This one has bitten me as well when I was trying to experiment with the
> res_counter performance...

Do we really need memcg.res.usage to be accurate in that while loop ? If
we miss a zero update because we encountered a partial update; in the
next loop we will find it zero right ?

-aneesh

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4F6965AC.4070004@jp.fujitsu.com>
Date: Wed, 21 Mar 2012 14:22:52 +0900
From: KAMEZAWA Hiroyuki
MIME-Version: 1.0
Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> <874ntlkrp6.fsf@linux.vnet.ibm.com> <4F66D993.2080100@jp.fujitsu.com> <4F671AE6.5020204@parallels.com> <87obrqsgno.fsf@linux.vnet.ibm.com>
In-Reply-To: <87obrqsgno.fsf@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: "Aneesh Kumar K.V"
Cc: Glauber Costa, linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

(2012/03/21 13:48), Aneesh Kumar K.V wrote:

> Glauber Costa writes:
>
>> On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote:
>>> (2012/03/19 15:52), Aneesh Kumar K.V wrote:
>>>
>>>>
>>>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>>>>>> +{
>>>>>> +	int idx;
>>>>>> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
>>>>>> +		if (memcg->hugepage[idx].usage > 0)
>>>>>> +			return 1;
>>>>>> +	}
>>>>>> +	return 0;
>>>>>> +}
>>>>>
>>>>>
>>>>> Please use res_counter_read_u64() rather than reading the value directly.
>>>>>
>>>>
>>>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I
>>>> have updated the patch to use res_counter_read_u64.
>>>>
>>>
>>> Ah, ok. it's(maybe) my bad. I'll schedule a fix.
>>>
>> Kame,
>>
>> I actually have it ready here. I can submit it if you want.
>>
>> This one has bitten me as well when I was trying to experiment with the
>> res_counter performance...
>
> Do we really need memcg.res.usage to be accurate in that while loop ? If
> we miss a zero update because we encountered a partial update; in the
> next loop we will find it zero right ?
>

At rmdir(), I assume there is no task in the memcg. That means res->usage
never increases and no thread other than force_empty will touch
res->counter. So I think "memcg->res.usage > 0" will never be wrong, and
we'll get the correct comparison by continuing the loop.

But recent kmem accounting et al. may break that assumption (I'm not fully
sure..). So I think it will be good to use res_counter_read_u64().
This part is not important for performance, anyway.

Thanks,
-Kame

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 28 Mar 2012 11:18:11 +0200
From: Michal Hocko
Subject: Re: [PATCH -V4 01/10] hugetlb: rename max_hstate to hugetlb_max_hstate
Message-ID: <20120328091811.GB20949@tiehlicka.suse.cz>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1331919570-2264-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

[Sorry for late review]

On Fri 16-03-12 23:09:21, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> We will be using this from other subsystems like memcg
> in later patches.

OK, why not. I would probably have loved an accessor function more, but
whatever.
Acked-by: Michal Hocko

>
> Acked-by: Hillf Danton
> Signed-off-by: Aneesh Kumar K.V
> ---
>  mm/hugetlb.c |   14 +++++++-------
>  1 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5f34bd8..d623e71 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
>  static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
>  unsigned long hugepages_treat_as_movable;
>
> -static int max_hstate;
> +static int hugetlb_max_hstate;
>
>  unsigned int default_hstate_idx;
>  struct hstate hstates[HUGE_MAX_HSTATE];
>
> @@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
>  static unsigned long __initdata default_hstate_size;
>
>  #define for_each_hstate(h) \
> -	for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
> +	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
>
>  /*
>   * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
> @@ -1808,9 +1808,9 @@ void __init hugetlb_add_hstate(unsigned order)
>  		printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
>  		return;
>  	}
> -	BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
> +	BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>  	BUG_ON(order == 0);
> -	h = &hstates[max_hstate++];
> +	h = &hstates[hugetlb_max_hstate++];
>  	h->order = order;
>  	h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
>  	h->nr_huge_pages = 0;
> @@ -1831,10 +1831,10 @@ static int __init hugetlb_nrpages_setup(char *s)
>  	static unsigned long *last_mhp;
>
>  	/*
> -	 * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
> +	 * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet,
>  	 * so this hugepages= parameter goes to the "default hstate".
>  	 */
> -	if (!max_hstate)
> +	if (!hugetlb_max_hstate)
>  		mhp = &default_hstate_max_huge_pages;
>  	else
>  		mhp = &parsed_hstate->max_huge_pages;
> @@ -1853,7 +1853,7 @@ static int __init hugetlb_nrpages_setup(char *s)
>  	 * But we need to allocate >= MAX_ORDER hstates here early to still
>  	 * use the bootmem allocator.
>  	 */
> -	if (max_hstate && parsed_hstate->order >= MAX_ORDER)
> +	if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER)
>  		hugetlb_hstate_alloc_pages(parsed_hstate);
>
>  	last_mhp = mhp;
> --
> 1.7.9
>

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 28 Mar 2012 11:41:34 +0200
From: Michal Hocko
Subject: Re: [PATCH -V4 03/10] hugetlbfs: Add an inline helper for finding hstate index
Message-ID: <20120328094134.GD20949@tiehlicka.suse.cz>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1331919570-2264-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

On Fri 16-03-12 23:09:23, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> Add and inline helper and use it in the code.

OK, helper function looks much nicer.

>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: Michal Hocko

> ---
>  include/linux/hugetlb.h |    6 ++++++
>  mm/hugetlb.c            |   18 ++++++++++--------
>  2 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index d9d6c86..a2675b0 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -311,6 +311,11 @@ static inline unsigned hstate_index_to_shift(unsigned index)
>  	return hstates[index].order + PAGE_SHIFT;
>  }
>
> +static inline int hstate_index(struct hstate *h)
> +{
> +	return h - hstates;
> +}
> +
>  #else
>  struct hstate {};
>  #define alloc_huge_page_node(h, nid) NULL
> @@ -329,6 +334,7 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
>  	return 1;
>  }
>  #define hstate_index_to_shift(index) 0
> +#define hstate_index(h) 0
>  #endif
>
>  #endif /* _LINUX_HUGETLB_H */
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3782da8..ebe245c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1557,7 +1557,7 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
>  				    struct attribute_group *hstate_attr_group)
>  {
>  	int retval;
> -	int hi = h - hstates;
> +	int hi = hstate_index(h);
>
>  	hstate_kobjs[hi] = kobject_create_and_add(h->name, parent);
>  	if (!hstate_kobjs[hi])
> @@ -1652,11 +1652,13 @@ void hugetlb_unregister_node(struct node *node)
>  	if (!nhs->hugepages_kobj)
>  		return; /* no hstate attributes */
>
> -	for_each_hstate(h)
> -		if (nhs->hstate_kobjs[h - hstates]) {
> -			kobject_put(nhs->hstate_kobjs[h - hstates]);
> -			nhs->hstate_kobjs[h - hstates] = NULL;
> +	for_each_hstate(h) {
> +		int idx = hstate_index(h);
> +		if (nhs->hstate_kobjs[idx]) {
> +			kobject_put(nhs->hstate_kobjs[idx]);
> +			nhs->hstate_kobjs[idx] = NULL;
>  		}
> +	}
>
>  	kobject_put(nhs->hugepages_kobj);
>  	nhs->hugepages_kobj = NULL;
> @@ -1759,7 +1761,7 @@ static void __exit hugetlb_exit(void)
>  	hugetlb_unregister_all_nodes();
>
>  	for_each_hstate(h) {
> -		kobject_put(hstate_kobjs[h - hstates]);
> +		kobject_put(hstate_kobjs[hstate_index(h)]);
>  	}
>
>  	kobject_put(hugepages_kobj);
> @@ -2587,7 +2589,7 @@ retry:
>  		 */
>  		if (unlikely(PageHWPoison(page))) {
>  			ret = VM_FAULT_HWPOISON |
> -			      VM_FAULT_SET_HINDEX(h - hstates);
> +			      VM_FAULT_SET_HINDEX(hstate_index(h));
>  			goto backout_unlocked;
>  		}
>  	}
> @@ -2660,7 +2662,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  			return 0;
>  		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
>  			return VM_FAULT_HWPOISON_LARGE |
> -			       VM_FAULT_SET_HINDEX(h - hstates);
> +			       VM_FAULT_SET_HINDEX(hstate_index(h));
>  	}
>
>  	ptep = huge_pte_alloc(mm, address, huge_page_size(h));
> --
> 1.7.9
>

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
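
The helper reviewed above replaces the repeated open-coded expression "h - hstates" with a named function. As a minimal user-space sketch of that pointer-subtraction idiom: the struct contents, the array size of 2, the const qualifier and the range assertion below are assumptions for illustration only, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

#define HUGE_MAX_HSTATE 2	/* illustrative size only */

/* Stand-in for the kernel's struct hstate; only one field is modelled. */
struct hstate {
	unsigned int order;
};

struct hstate hstates[HUGE_MAX_HSTATE];

/*
 * Index of h within hstates[], obtained by pointer subtraction.
 * Subtracting two pointers into the same array yields the distance
 * in elements, not in bytes. The range check is an addition made
 * for this sketch.
 */
int hstate_index(const struct hstate *h)
{
	ptrdiff_t idx = h - hstates;

	assert(idx >= 0 && idx < HUGE_MAX_HSTATE);
	return (int)idx;
}
```

Giving the expression a name means that callers such as the sysfs and fault paths in the quoted diff no longer need to know that hstates[] is a flat array.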

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 28 Mar 2012 13:33:04 +0200
From: Michal Hocko
Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
Message-ID: <20120328113304.GE20949@tiehlicka.suse.cz>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> This patch implements a memcg extension that allows us to control
> HugeTLB allocations via memory controller.

And the infrastructure is not used at this stage (you forgot to
mention).
The changelog should be much more descriptive.

>
> Signed-off-by: Aneesh Kumar K.V
> ---
>  include/linux/hugetlb.h    |    1 +
>  include/linux/memcontrol.h |   42 +++++++++++++
>  init/Kconfig               |    8 +++
>  mm/hugetlb.c               |    2 +-
>  mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 190 insertions(+), 1 deletions(-)
>
[...]
> diff --git a/init/Kconfig b/init/Kconfig
> index 3f42cd6..f0eb8aa 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -725,6 +725,14 @@ config CGROUP_PERF
>
>  	  Say N if unsure.
>
> +config MEM_RES_CTLR_HUGETLB
> +	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
> +	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
> +	default n
> +	help
> +	  Add HugeTLB management to memory resource controller. When you
> +	  enable this, you can put a per cgroup limit on HugeTLB usage.

How does it interact with the hard/soft limits etc.?

[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6728a7a..4b36c5e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -235,6 +235,10 @@ struct mem_cgroup {
>  	 */
>  	struct res_counter memsw;
>  	/*
> +	 * the counter to account for hugepages from hugetlb.
> +	 */
> +	struct res_counter hugepage[HUGE_MAX_HSTATE];
> +	/*
>  	 * Per cgroup active and inactive list, similar to the
>  	 * per zone LRU lists.
>  	 */
> @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>  }
>  #endif
>
> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> +{
> +	int idx;
> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {

Maybe we should expose for_each_hstate as well...

> +		if (memcg->hugepage[idx].usage > 0)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> +				   struct mem_cgroup **ptr)
> +{
> +	int ret = 0;
> +	struct mem_cgroup *memcg;
> +	struct res_counter *fail_res;
> +	unsigned long csize = nr_pages * PAGE_SIZE;
> +
> +	if (mem_cgroup_disabled())
> +		return 0;
> +again:
> +	rcu_read_lock();
> +	memcg = mem_cgroup_from_task(current);
> +	if (!memcg)
> +		memcg = root_mem_cgroup;
> +	if (mem_cgroup_is_root(memcg)) {
> +		rcu_read_unlock();
> +		goto done;
> +	}
> +	if (!css_tryget(&memcg->css)) {
> +		rcu_read_unlock();
> +		goto again;
> +	}
> +	rcu_read_unlock();
> +
> +	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
> +	css_put(&memcg->css);
> +done:
> +	*ptr = memcg;

Why do we set ptr even for the failure case after we dropped a
reference?

> +	return ret;
> +}
> +
> +void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
> +				      struct mem_cgroup *memcg,
> +				      struct page *page)
> +{
> +	struct page_cgroup *pc;
> +
> +	if (mem_cgroup_disabled())
> +		return;
> +
> +	pc = lookup_page_cgroup(page);
> +	lock_page_cgroup(pc);
> +	if (unlikely(PageCgroupUsed(pc))) {
> +		unlock_page_cgroup(pc);
> +		mem_cgroup_hugetlb_uncharge_memcg(idx, nr_pages, memcg);
> +		return;
> +	}
> +	pc->mem_cgroup = memcg;
> +	/*
> +	 * We access a page_cgroup asynchronously without lock_page_cgroup().
> +	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
> +	 * is accessed after testing USED bit. To make pc->mem_cgroup visible
> +	 * before USED bit, we need memory barrier here.
> +	 * See mem_cgroup_add_lru_list(), etc.
> +	 */
> +	smp_wmb();

Is this really necessary for hugetlb pages as well?

> +	SetPageCgroupUsed(pc);
> +
> +	unlock_page_cgroup(pc);
> +	return;
> +}
> +
[...]
> @@ -4887,6 +5013,7 @@ err_cleanup:
>  static struct cgroup_subsys_state * __ref
>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  {
> +	int idx;
>  	struct mem_cgroup *memcg, *parent;
>  	long error = -ENOMEM;
>  	int node;
> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  		 * mem_cgroup(see mem_cgroup_put).
>  		 */
>  		mem_cgroup_get(parent);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)

Do we have to init all hstates or is hugetlb_max_hstate enough?

> +			res_counter_init(&memcg->hugepage[idx],
> +					 &parent->hugepage[idx]);
>  	} else {
>  		res_counter_init(&memcg->res, NULL);
>  		res_counter_init(&memcg->memsw, NULL);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> +			res_counter_init(&memcg->hugepage[idx], NULL);

Same here

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values
In-Reply-To: <20120328092547.GC20949@tiehlicka.suse.cz>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328092547.GC20949@tiehlicka.suse.cz>
User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu)
Date: Wed, 28 Mar 2012 17:05:49 +0530
Message-ID: <87vclpyn3e.fsf@skywalker.in.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-linux-mm@kvack.org
To: Michal Hocko
Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

Michal Hocko writes:

> On Fri 16-03-12 23:09:22, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V"
>>
>> Using VM_FAULT_* codes with ERR_PTR will require us to make sure
>> VM_FAULT_* values will not exceed MAX_ERRNO value.
>>
>> Signed-off-by: Aneesh Kumar K.V
>> ---
>>  mm/hugetlb.c |   18 +++++++++++++-----
>>  1 files changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index d623e71..3782da8 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
> [...]
>> @@ -1047,7 +1047,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>>  		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>>  		if (!page) {
>>  			hugetlb_put_quota(inode->i_mapping, chg);
>> -			return ERR_PTR(-VM_FAULT_SIGBUS);
>> +			return ERR_PTR(-ENOSPC);
>
> Hmm, so one error code abuse replaced by another?
> I know that ENOMEM would revert 4a6018f7 which would be unfortunate but
> ENOSPC doesn't feel right as well.
>

File systems do map ENOSPC to SIGBUS. block_page_mkwrite_return() does
that.

-aneesh

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 28 Mar 2012 15:17:06 +0200
From: Michal Hocko
Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
Message-ID: <20120328131706.GF20949@tiehlicka.suse.cz>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> This adds necessary charge/uncharge calls in the HugeTLB code

This begs for more description...
Other than that it looks correct. > Acked-by: Hillf Danton > Signed-off-by: Aneesh Kumar K.V > --- > mm/hugetlb.c | 21 ++++++++++++++++++++- > mm/memcontrol.c | 5 +++++ > 2 files changed, 25 insertions(+), 1 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index c672187..91361a0 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -21,6 +21,8 @@ > #include > #include > #include > +#include > +#include > > #include > #include > @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page) > BUG_ON(page_mapcount(page)); > INIT_LIST_HEAD(&page->lru); > > + if (mapping) > + mem_cgroup_hugetlb_uncharge_page(hstate_index(h), > + pages_per_huge_page(h), page); > spin_lock(&hugetlb_lock); > if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { > update_and_free_page(h, page); > @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h, > static struct page *alloc_huge_page(struct vm_area_struct *vma, > unsigned long addr, int avoid_reserve) > { > + int ret, idx; > struct hstate *h = hstate_vma(vma); > struct page *page; > + struct mem_cgroup *memcg = NULL; > struct address_space *mapping = vma->vm_file->f_mapping; > struct inode *inode = mapping->host; > long chg; > > + idx = hstate_index(h); > /* > * Processes that did not create the mapping will have no reserves and > * will not have accounted against quota. 
Check that the quota can be > @@ -1039,6 +1047,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, > if (hugetlb_get_quota(inode->i_mapping, chg)) > return ERR_PTR(-ENOSPC); > > + ret = mem_cgroup_hugetlb_charge_page(idx, pages_per_huge_page(h), > + &memcg); > + if (ret) { > + hugetlb_put_quota(inode->i_mapping, chg); > + return ERR_PTR(-ENOSPC); > + } > spin_lock(&hugetlb_lock); > page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve); > spin_unlock(&hugetlb_lock); > @@ -1046,6 +1060,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, > if (!page) { > page = alloc_buddy_huge_page(h, NUMA_NO_NODE); > if (!page) { > + mem_cgroup_hugetlb_uncharge_memcg(idx, > + pages_per_huge_page(h), > + memcg); > hugetlb_put_quota(inode->i_mapping, chg); > return ERR_PTR(-ENOSPC); > } > @@ -1054,7 +1071,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, > set_page_private(page, (unsigned long) mapping); > > vma_commit_reservation(h, vma, addr); > - > + /* update page cgroup details */ > + mem_cgroup_hugetlb_commit_charge(idx, pages_per_huge_page(h), > + memcg, page); > return page; > } > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4b36c5e..7a9ea94 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2901,6 +2901,11 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) > > if (PageSwapCache(page)) > return NULL; > + /* > + * HugeTLB page uncharge happen in the HugeTLB compound page destructor > + */ > + if (PageHuge(page)) > + return NULL; > > if (PageTransHuge(page)) { > nr_pages <<= compound_order(page); > -- > 1.7.9 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx111.postini.com [74.125.245.111]) by kanga.kvack.org (Postfix) with SMTP id CCF526B0105 for ; Wed, 28 Mar 2012 10:07:35 -0400 (EDT) Date: Wed, 28 Mar 2012 16:07:33 +0200 From: Michal Hocko Subject: Re: [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal Message-ID: <20120328140733.GI20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1331919570-2264-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Fri 16-03-12 23:09:29, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This add support for memcg removal with HugeTLB resource usage. > > Signed-off-by: Aneesh Kumar K.V > --- > include/linux/hugetlb.h | 6 ++++ > include/linux/memcontrol.h | 15 +++++++++- > mm/hugetlb.c | 41 ++++++++++++++++++++++++++ > mm/memcontrol.c | 68 +++++++++++++++++++++++++++++++++++++------ > 4 files changed, 119 insertions(+), 11 deletions(-) > [...] > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 8fd465d..685f0d5 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c [...]
> @@ -3285,10 +3287,57 @@ void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > res_counter_uncharge(&memcg->hugepage[idx], csize); > return; > } > -#else > -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > + > +int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, > + struct page *page) > { > - return 0; > + struct page_cgroup *pc; > + int csize, ret = 0; > + struct res_counter *fail_res; > + struct cgroup *pcgrp = cgroup->parent; > + struct mem_cgroup *parent = mem_cgroup_from_cont(pcgrp); > + struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup); > + > + if (!get_page_unless_zero(page)) > + goto out; > + > + pc = lookup_page_cgroup(page); > + lock_page_cgroup(pc); > + if (!PageCgroupUsed(pc) || pc->mem_cgroup != memcg) > + goto err_out; > + > + csize = PAGE_SIZE << compound_order(page); > + /* > + * uncharge from child and charge the parent. If we have > + * use_hierarchy set, we can never fail here. In-order to make > + * sure we don't get -ENOMEM on parent charge, we first uncharge > + * the child and then charge the parent. > + */ > + if (parent->use_hierarchy) { > + res_counter_uncharge(&memcg->hugepage[idx], csize); > + if (!mem_cgroup_is_root(parent)) > + ret = res_counter_charge(&parent->hugepage[idx], > + csize, &fail_res); You can still race with other hugetlb charge which would make this fail. > + } else { > + if (!mem_cgroup_is_root(parent)) { > + ret = res_counter_charge(&parent->hugepage[idx], > + csize, &fail_res); > + if (ret) { > + ret = -EBUSY; > + goto err_out; > + } > + } > + res_counter_uncharge(&memcg->hugepage[idx], csize); > + } > + /* > + * caller should have done css_get > + */ > + pc->mem_cgroup = parent; > +err_out: > + unlock_page_cgroup(pc); > + put_page(page); > +out: > + return ret; > } > #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ [...] -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx164.postini.com [74.125.245.164]) by kanga.kvack.org (Postfix) with SMTP id 95ED16B010A for ; Wed, 28 Mar 2012 10:37:01 -0400 (EDT) Date: Wed, 28 Mar 2012 16:36:58 +0200 From: Michal Hocko Subject: Re: [PATCH -V4 10/10] memcg: Add memory controller documentation for hugetlb management Message-ID: <20120328143658.GJ20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1331919570-2264-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Fri 16-03-12 23:09:30, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Signed-off-by: Aneesh Kumar K.V > --- > Documentation/cgroups/memory.txt | 29 +++++++++++++++++++++++++++++ > 1 files changed, 29 insertions(+), 0 deletions(-) > > diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt > index 4c95c00..d99c41b 100644 > --- a/Documentation/cgroups/memory.txt > +++ b/Documentation/cgroups/memory.txt > @@ -43,6 +43,7 @@ Features: > - usage threshold notifier > - oom-killer disable knob and oom-notifier > - Root cgroup has no limit controls.
> + - resource accounting for HugeTLB pages > > Kernel memory support is work in progress, and the current version provides > basically functionality. (See Section 2.7) > @@ -75,6 +76,12 @@ Brief summary of control files. > memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory > memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation > > + > + memory.hugetlb.<hugepagesize>.limit_in_bytes # set/show limit of "hugepagesize" hugetlb usage > + memory.hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded > + memory.hugetlb.<hugepagesize>.usage_in_bytes # show current res_counter usage for "hugepagesize" hugetlb > + # see 5.7 for details > + > 1. History > > The memory controller has a long history. A request for comments for the memory > @@ -279,6 +286,15 @@ per cgroup, instead of globally. > > * tcp memory pressure: sockets memory pressure for the tcp protocol. > > +2.8 HugeTLB extension > + > +This extension allows to limit the HugeTLB usage per control group and > +enforces the controller limit during page fault. Since HugeTLB doesn't > +support page reclaim, enforcing the limit at page fault time implies that, > +the application will get SIGBUS signal if it tries to access HugeTLB pages > +beyond its limit. This is consistent with the quota so we should mention that. We should also add a note how we interact with quotas. Another important thing to note is that the limit/usage are unrelated to memcg hard/soft limit/usage. > This requires the application to know beforehand how much > +HugeTLB pages it would require for its use. > + > 3. User Interface > > 0. Configuration > @@ -287,6 +303,7 @@ a. Enable CONFIG_CGROUPS > b. Enable CONFIG_RESOURCE_COUNTERS > c. Enable CONFIG_CGROUP_MEM_RES_CTLR > d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) > +f. Enable CONFIG_MEM_RES_CTLR_HUGETLB (to use HugeTLB extension) > > 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
> # mount -t tmpfs none /sys/fs/cgroup > @@ -510,6 +527,18 @@ unevictable= N0= N1= ... > > And we have total = file + anon + unevictable. > > +5.7 HugeTLB resource control files > +For a system supporting two hugepage size (16M and 16G) the control > +files include: > + > + memory.hugetlb.16GB.limit_in_bytes > + memory.hugetlb.16GB.max_usage_in_bytes > + memory.hugetlb.16GB.usage_in_bytes > + memory.hugetlb.16MB.limit_in_bytes > + memory.hugetlb.16MB.max_usage_in_bytes > + memory.hugetlb.16MB.usage_in_bytes > + > + > 6. Hierarchy support > > The memory controller supports a deep hierarchy and hierarchical accounting. > -- > 1.7.9 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx134.postini.com [74.125.245.134]) by kanga.kvack.org (Postfix) with SMTP id D0BA16B0115 for ; Wed, 28 Mar 2012 13:37:38 -0400 (EDT) Received: from /spool/local by e23smtp03.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only!
Violators will be prosecuted for from ; Wed, 28 Mar 2012 17:28:56 +1000 Received: from d23av02.au.ibm.com (d23av02.au.ibm.com [9.190.235.138]) by d23relay04.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q2SHVIgZ3375176 for ; Thu, 29 Mar 2012 04:31:18 +1100 Received: from d23av02.au.ibm.com (loopback [127.0.0.1]) by d23av02.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q2SHbOe3028785 for ; Thu, 29 Mar 2012 04:37:24 +1100 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension In-Reply-To: <20120328134020.GG20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328134020.GG20949@tiehlicka.suse.cz>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Wed, 28 Mar 2012 23:07:14 +0530 Message-ID: <87y5qk1vat.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Michal Hocko writes: > On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote: > [...] >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> index 6728a7a..4b36c5e 100644 >> --- a/mm/memcontrol.c >> +++ b/mm/memcontrol.c > [...] >> @@ -4887,6 +5013,7 @@ err_cleanup: >> static struct cgroup_subsys_state * __ref >> mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) >> { >> + int idx; >> struct mem_cgroup *memcg, *parent; >> long error = -ENOMEM; >> int node; >> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) >> * mem_cgroup(see mem_cgroup_put). 
>> */ >> mem_cgroup_get(parent); >> + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) >> + res_counter_init(&memcg->hugepage[idx], >> + &parent->hugepage[idx]); > > Hmm, I do not think we want to make groups deeper in the hierarchy > unlimited as we cannot reclaim. Shouldn't we copy the limit from the parent? > Still not ideal but slightly more expected behavior IMO. But we should be limiting the child group based on parent's limit only when hierarchy is set right ? > > The hierarchy setups are still interesting and the limitations should be > described in the documentation... > It should behave similar to memcg. ie, if hierarchy is set, then we limit using MIN(parent's limit, child's limit). May be I am missing some of the details of memcg use_hierarchy config. My goal was to keep it similar to memcg. Can you explain why do you think the patch would make it any different ? -aneesh
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx171.postini.com [74.125.245.171]) by kanga.kvack.org (Postfix) with SMTP id CC67D6B011B for ; Wed, 28 Mar 2012 13:39:49 -0400 (EDT) Received: from /spool/local by e23smtp07.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only!
Violators will be prosecuted for from ; Wed, 28 Mar 2012 17:33:38 +1000 Received: from d23av01.au.ibm.com (d23av01.au.ibm.com [9.190.234.96]) by d23relay04.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q2SHXcB73203254 for ; Thu, 29 Mar 2012 04:33:38 +1100 Received: from d23av01.au.ibm.com (loopback [127.0.0.1]) by d23av01.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q2SHdiam017576 for ; Thu, 29 Mar 2012 04:39:44 +1100 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free In-Reply-To: <20120328131706.GF20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328131706.GF20949@tiehlicka.suse.cz>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Wed, 28 Mar 2012 23:09:34 +0530 Message-ID: <87sjgs1v6x.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Michal Hocko writes: > On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> This adds necessary charge/uncharge calls in the HugeTLB code > > This begs for more description... > Other than that it looks correct. > Updated as below hugetlb: add charge/uncharge calls for HugeTLB alloc/free This adds necessary charge/uncharge calls in the HugeTLB code. We do memcg charge in page alloc and uncharge in compound page destructor. 
We also need to ignore HugeTLB pages in __mem_cgroup_uncharge_common because that get called from delete_from_page_cache -aneesh
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx146.postini.com [74.125.245.146]) by kanga.kvack.org (Postfix) with SMTP id 537C26B0123 for ; Wed, 28 Mar 2012 20:20:30 -0400 (EDT) Received: from m3.gw.fujitsu.co.jp (unknown [10.0.50.73]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id 5A05D3EE0B5 for ; Thu, 29 Mar 2012 09:20:28 +0900 (JST) Received: from smail (m3 [127.0.0.1]) by outgoing.m3.gw.fujitsu.co.jp (Postfix) with ESMTP id 3BAF745DEB3 for ; Thu, 29 Mar 2012 09:20:28 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (s3.gw.fujitsu.co.jp [10.0.50.93]) by m3.gw.fujitsu.co.jp (Postfix) with ESMTP id 1AAC145DEAD for ; Thu, 29 Mar 2012 09:20:28 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id 01717E38008 for ; Thu, 29 Mar 2012 09:20:28 +0900 (JST) Received: from ml14.s.css.fujitsu.com (ml14.s.css.fujitsu.com [10.240.81.134]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id AD313E38005 for ; Thu, 29 Mar 2012 09:20:27 +0900 (JST) Message-ID: <4F73AA5F.5050604@jp.fujitsu.com> Date: Thu, 29 Mar 2012 09:18:39 +0900 From: KAMEZAWA Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328134020.GG20949@tiehlicka.suse.cz>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) <87y5qk1vat.fsf@skywalker.in.ibm.com> In-Reply-To:
<87y5qk1vat.fsf@skywalker.in.ibm.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: Michal Hocko , linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org (2012/03/29 2:37), Aneesh Kumar K.V wrote: > Michal Hocko writes: > >> On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote: >> [...] >>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >>> index 6728a7a..4b36c5e 100644 >>> --- a/mm/memcontrol.c >>> +++ b/mm/memcontrol.c >> [...] >>> @@ -4887,6 +5013,7 @@ err_cleanup: >>> static struct cgroup_subsys_state * __ref >>> mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) >>> { >>> + int idx; >>> struct mem_cgroup *memcg, *parent; >>> long error = -ENOMEM; >>> int node; >>> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) >>> * mem_cgroup(see mem_cgroup_put). >>> */ >>> mem_cgroup_get(parent); >>> + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) >>> + res_counter_init(&memcg->hugepage[idx], >>> + &parent->hugepage[idx]); >> >> Hmm, I do not think we want to make groups deeper in the hierarchy >> unlimited as we cannot reclaim. Shouldn't we copy the limit from the parent? >> Still not ideal but slightly more expected behavior IMO. > > But we should be limiting the child group based on parent's limit only > when hierarchy is set right ? > >> >> The hierarchy setups are still interesting and the limitations should be >> described in the documentation... >> > > It should behave similar to memcg. ie, if hierarchy is set, then we limit > using MIN(parent's limit, child's limit). May be I am missing some of > the details of memcg use_hierarchy config. My goal was to keep it > similar to memcg. Can you explain why do you think the patch would > make it any different ? > Maybe this is a different story but.... 
Tejun(Cgroup Maintainer) asked us to remove 'use_hierarchy' settings because most of other cgroups are hierarchical(*). I answered that improvement in res_counter latency is required. And now, we have some idea to improve res_counter. (I'd like to try this after page_cgroup diet series..) If we change and drop use_hierarchy, the usage similar to current use_hierarchy=0 will be..

/cgroup/memory/ = unlimited
    level1 = unlimited
        level2 = unlimited
            level3 = limit

To do this, after improvement of res_counter, we entry use_hierarchy into feature-removal-list and wait for 2 versions..So, this will not affect your developments, anyway. Thanks, -Kame (*) AFAIK, blkio cgroup needs tons of work to be hierarchical...
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx101.postini.com [74.125.245.101]) by kanga.kvack.org (Postfix) with SMTP id 7D0C36B0044 for ; Thu, 29 Mar 2012 03:57:25 -0400 (EDT) Date: Thu, 29 Mar 2012 09:57:22 +0200 From: Michal Hocko Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Message-ID: <20120329075722.GB30465@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328134020.GG20949@tiehlicka.suse.cz> <87y5qk1vat.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87y5qk1vat.fsf@skywalker.in.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
cgroups@vger.kernel.org On Wed 28-03-12 23:07:14, Aneesh Kumar K.V wrote: > Michal Hocko writes: > > > On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote: > > [...] > >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c > >> index 6728a7a..4b36c5e 100644 > >> --- a/mm/memcontrol.c > >> +++ b/mm/memcontrol.c > > [...] > >> @@ -4887,6 +5013,7 @@ err_cleanup: > >> static struct cgroup_subsys_state * __ref > >> mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > >> { > >> + int idx; > >> struct mem_cgroup *memcg, *parent; > >> long error = -ENOMEM; > >> int node; > >> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > >> * mem_cgroup(see mem_cgroup_put). > >> */ > >> mem_cgroup_get(parent); > >> + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) > >> + res_counter_init(&memcg->hugepage[idx], > >> + &parent->hugepage[idx]); > > > > Hmm, I do not think we want to make groups deeper in the hierarchy > > unlimited as we cannot reclaim. Shouldn't we copy the limit from the parent? > > Still not ideal but slightly more expected behavior IMO. > > But we should be limiting the child group based on parent's limit only > when hierarchy is set right ? Yes. Everything else should be unlimited by default. > > > > > The hierarchy setups are still interesting and the limitations should be > > described in the documentation... > > > > It should behave similar to memcg. ie, if hierarchy is set, then we limit > using MIN(parent's limit, child's limit). May be I am missing some of > the details of memcg use_hierarchy config. My goal was to keep it > similar to memcg. Can you explain why do you think the patch would > make it any different ? Yes, the patch tries to be consistent with the memcg limits. That is OK and I have no objections for that. It is just that consequences are different. The hugetlb limit is really hard... 
> > -aneesh > > -- > To unsubscribe from this list: send the line "unsubscribe cgroups" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx180.postini.com [74.125.245.180]) by kanga.kvack.org (Postfix) with SMTP id C9B686B0044 for ; Fri, 30 Mar 2012 06:47:03 -0400 (EDT) Date: Fri, 30 Mar 2012 12:46:50 +0200 From: Michal Hocko Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Message-ID: <20120330104650.GB15375@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328131706.GF20949@tiehlicka.suse.cz> <87sjgs1v6x.fsf@skywalker.in.ibm.com> <20120329081003.GC30465@tiehlicka.suse.cz> <871uoamkxr.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <871uoamkxr.fsf@skywalker.in.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Fri 30-03-12 16:10:00, Aneesh Kumar K.V wrote: > Michal Hocko writes: > > > On Wed 28-03-12 23:09:34, Aneesh Kumar K.V wrote: > >> Michal Hocko writes: > >> > >> > On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote: > >> >> From: "Aneesh Kumar K.V" > >> >> > >> >> This adds
necessary charge/uncharge calls in the HugeTLB code > >> > > >> > This begs for more description... > >> > Other than that it looks correct. > >> > > >> > >> Updated as below > >> > >> hugetlb: add charge/uncharge calls for HugeTLB alloc/free > >> > >> This adds necessary charge/uncharge calls in the HugeTLB code. We do > >> memcg charge in page alloc and uncharge in compound page destructor. > >> We also need to ignore HugeTLB pages in __mem_cgroup_uncharge_common > >> because that get called from delete_from_page_cache > > > > and from mem_cgroup_end_migration used during soft_offline_page. > > > > Btw., while looking at mem_cgroup_end_migration, I have noticed that you > > need to take care of mem_cgroup_prepare_migration as well otherwise the > > page would get charged as a normal (shmem) page. > > > > Won't we skip HugeTLB pages in migrate ? Yes but we still migrate for memory failure (see soft_offline_page). > check_range do check for is_vm_hugetlb_page. > > -aneesh > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752872Ab2CPRjz (ORCPT ); Fri, 16 Mar 2012 13:39:55 -0400 Received: from e28smtp08.in.ibm.com ([122.248.162.8]:33319 "EHLO e28smtp08.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752512Ab2CPRjv (ORCPT ); Fri, 16 Mar 2012 13:39:51 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: [PATCH -V4 00/10] memcg: Add memcg extension to control HugeTLB allocation Date: Fri, 16 Mar 2012 23:09:20 +0530 Message-Id: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.9 x-cbid: 12031617-2000-0000-0000-000006CC307F Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, This patchset implements a memory controller extension to control HugeTLB allocations. The extension allows to limit the HugeTLB usage per control group and enforces the controller limit during page fault. Since HugeTLB doesn't support page reclaim, enforcing the limit at page fault time implies that, the application will get SIGBUS signal if it tries to access HugeTLB pages beyond its limit. This requires the application to know beforehand how much HugeTLB pages it would require for its use. The goal is to control how many HugeTLB pages a group of task can allocate. It can be looked at as an extension of the existing quota interface which limits the number of HugeTLB pages per hugetlbfs superblock. HPC job scheduler requires jobs to specify their resource requirements in the job file.
Once their requirements can be met, job schedulers like (SLURM) will schedule the job. We need to make sure that the jobs won't consume more resources than requested. If they do we should either error out or kill the application. Changes from v3: * Address review feedback. * bug fix in cgroup removal related parent charging with use_hierarchy set Changes from V2: * Changed the implementation to limit the HugeTLB usage during page fault time. This simplifies the extension and keep it closer to memcg design. This also allows to support cgroup removal with less complexity. Only caveat is the application should ensure its HugeTLB usage doesn't cross the cgroup limit. Changes from V1: * Changed the implementation as a memcg extension. We still use the same logic to track the cgroup and range. Changes from RFC post: * Added support for HugeTLB cgroup hierarchy * Added support for task migration * Added documentation patch * Other bug fixes -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753440Ab2CPRkK (ORCPT ); Fri, 16 Mar 2012 13:40:10 -0400 Received: from e28smtp08.in.ibm.com ([122.248.162.8]:33369 "EHLO e28smtp08.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753099Ab2CPRkC (ORCPT ); Fri, 16 Mar 2012 13:40:02 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages Date: Fri, 16 Mar 2012 23:09:28 +0530 Message-Id: <1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.9 In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: 
<1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12031617-2000-0000-0000-000006CC309D Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" hugepage_activelist will be used to track currently used HugeTLB pages. We need to find the in-use HugeTLB pages to support memcg removal. On memcg removal we update the page's memory cgroup to point to parent cgroup. Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb.h | 1 + mm/hugetlb.c | 23 ++++++++++++++++++----- 2 files changed, 19 insertions(+), 5 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index cbd8dc5..6919100 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -217,6 +217,7 @@ struct hstate { unsigned long resv_huge_pages; unsigned long surplus_huge_pages; unsigned long nr_overcommit_huge_pages; + struct list_head hugepage_activelist; struct list_head hugepage_freelists[MAX_NUMNODES]; unsigned int nr_huge_pages_node[MAX_NUMNODES]; unsigned int free_huge_pages_node[MAX_NUMNODES]; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 684849a..8fd465d 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -433,7 +433,7 @@ void copy_huge_page(struct page *dst, struct page *src) static void enqueue_huge_page(struct hstate *h, struct page *page) { int nid = page_to_nid(page); - list_add(&page->lru, &h->hugepage_freelists[nid]); + list_move(&page->lru, &h->hugepage_freelists[nid]); h->free_huge_pages++; h->free_huge_pages_node[nid]++; } @@ -445,7 +445,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid) if (list_empty(&h->hugepage_freelists[nid])) return NULL; page = list_entry(h->hugepage_freelists[nid].next, struct page, lru); - list_del(&page->lru); + list_move(&page->lru, &h->hugepage_activelist); set_page_refcounted(page); h->free_huge_pages--; h->free_huge_pages_node[nid]--; @@ -542,13 +542,14 @@ static void free_huge_page(struct page *page) page->mapping = NULL; 
BUG_ON(page_count(page)); BUG_ON(page_mapcount(page)); - INIT_LIST_HEAD(&page->lru); if (mapping) mem_cgroup_hugetlb_uncharge_page(hstate_index(h), pages_per_huge_page(h), page); spin_lock(&hugetlb_lock); if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { + /* remove the page from active list */ + list_del(&page->lru); update_and_free_page(h, page); h->surplus_huge_pages--; h->surplus_huge_pages_node[nid]--; @@ -562,6 +563,7 @@ static void free_huge_page(struct page *page) static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) { + INIT_LIST_HEAD(&page->lru); set_compound_page_dtor(page, free_huge_page); spin_lock(&hugetlb_lock); h->nr_huge_pages++; @@ -1861,6 +1863,7 @@ void __init hugetlb_add_hstate(unsigned order) h->free_huge_pages = 0; for (i = 0; i < MAX_NUMNODES; ++i) INIT_LIST_HEAD(&h->hugepage_freelists[i]); + INIT_LIST_HEAD(&h->hugepage_activelist); h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]); h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, page = pte_page(pte); if (pte_dirty(pte)) set_page_dirty(page); - list_add(&page->lru, &page_list); + + spin_lock(&hugetlb_lock); + list_move(&page->lru, &page_list); + spin_unlock(&hugetlb_lock); } spin_unlock(&mm->page_table_lock); flush_tlb_range(vma, start, end); mmu_notifier_invalidate_range_end(mm, start, end); list_for_each_entry_safe(page, tmp, &page_list, lru) { page_remove_rmap(page); - list_del(&page->lru); + /* + * We need to move it back huge page active list. If we are + * holding the last reference, below put_page will move it + * back to free list. 
+ */ + spin_lock(&hugetlb_lock); + list_move(&page->lru, &h->hugepage_activelist); + spin_unlock(&hugetlb_lock); put_page(page); } } -- 1.7.9 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753369Ab2CPRkH (ORCPT ); Fri, 16 Mar 2012 13:40:07 -0400 Received: from e28smtp02.in.ibm.com ([122.248.162.2]:52783 "EHLO e28smtp02.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752512Ab2CPRj4 (ORCPT ); Fri, 16 Mar 2012 13:39:56 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V4 04/10] memcg: Add HugeTLB extension Date: Fri, 16 Mar 2012 23:09:24 +0530 Message-Id: <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.9 In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12031617-5816-0000-0000-000001C71785 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" This patch implements a memcg extension that allows us to control HugeTLB allocations via memory controller. 
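
As a reading aid, the charge-at-fault idea can be modeled in userspace roughly as below. This is only a sketch under stated assumptions: struct counter, counter_charge(), counter_uncharge() and fault_in_huge_page() are hypothetical names standing in for the kernel's res_counter and fault path, not the API this patch adds, and locking, RCU and css reference handling are left out entirely.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the kernel's res_counter: bytes used vs. a limit. */
struct counter {
	unsigned long usage;
	unsigned long limit;
};

/* Charge val bytes; return -ENOMEM if that would push usage past the limit. */
static int counter_charge(struct counter *c, unsigned long val)
{
	if (c->usage > c->limit || val > c->limit - c->usage)
		return -ENOMEM;
	c->usage += val;
	return 0;
}

/* Give back val bytes that were charged earlier. */
static void counter_uncharge(struct counter *c, unsigned long val)
{
	assert(val <= c->usage);
	c->usage -= val;
}

/*
 * Model of one HugeTLB page fault: charge first, and hand out the page
 * only if the charge succeeded. In the kernel a failed charge ends in
 * SIGBUS for the faulting task; here it is just an error return.
 */
static int fault_in_huge_page(struct counter *cg, unsigned long huge_page_size)
{
	return counter_charge(cg, huge_page_size);
}
```

With a limit of two 16MB pages, the third fault_in_huge_page() call in this model returns -ENOMEM until one page is uncharged again.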
Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb.h | 1 + include/linux/memcontrol.h | 42 +++++++++++++ init/Kconfig | 8 +++ mm/hugetlb.c | 2 +- mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 190 insertions(+), 1 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index a2675b0..1f70068 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -243,6 +243,7 @@ struct hstate *size_to_hstate(unsigned long size); #define HUGE_MAX_HSTATE 1 #endif +extern int hugetlb_max_hstate; extern struct hstate hstates[HUGE_MAX_HSTATE]; extern unsigned int default_hstate_idx; diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 4d34356..320dbad 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -429,5 +429,47 @@ static inline void sock_release_memcg(struct sock *sk) { } #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */ + +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB +extern int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, + struct mem_cgroup **ptr); +extern void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages, + struct mem_cgroup *memcg, + struct page *page); +extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, + struct page *page); +extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, + struct mem_cgroup *memcg); + +#else +static inline int +mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, + struct mem_cgroup **ptr) +{ + return 0; +} + +static inline void +mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages, + struct mem_cgroup *memcg, + struct page *page) +{ + return; +} + +static inline void +mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, + struct page *page) +{ + return; +} + +static inline void +mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, + struct mem_cgroup *memcg) +{ + return; +} +#endif /* 
CONFIG_MEM_RES_CTLR_HUGETLB */ #endif /* _LINUX_MEMCONTROL_H */ diff --git a/init/Kconfig b/init/Kconfig index 3f42cd6..f0eb8aa 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -725,6 +725,14 @@ config CGROUP_PERF Say N if unsure. +config MEM_RES_CTLR_HUGETLB + bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)" + depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL + default n + help + Add HugeTLB management to memory resource controller. When you + enable this, you can put a per cgroup limit on HugeTLB usage. + menuconfig CGROUP_SCHED bool "Group CPU scheduler" default n diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ebe245c..c672187 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; static gfp_t htlb_alloc_mask = GFP_HIGHUSER; unsigned long hugepages_treat_as_movable; -static int hugetlb_max_hstate; +int hugetlb_max_hstate; unsigned int default_hstate_idx; struct hstate hstates[HUGE_MAX_HSTATE]; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 6728a7a..4b36c5e 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -235,6 +235,10 @@ struct mem_cgroup { */ struct res_counter memsw; /* + * the counter to account for hugepages from hugetlb. + */ + struct res_counter hugepage[HUGE_MAX_HSTATE]; + /* * Per cgroup active and inactive list, similar to the * per zone LRU lists. 
*/ @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, } #endif +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) +{ + int idx; + for (idx = 0; idx < hugetlb_max_hstate; idx++) { + if (memcg->hugepage[idx].usage > 0) + return 1; + } + return 0; +} + +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, + struct mem_cgroup **ptr) +{ + int ret = 0; + struct mem_cgroup *memcg; + struct res_counter *fail_res; + unsigned long csize = nr_pages * PAGE_SIZE; + + if (mem_cgroup_disabled()) + return 0; +again: + rcu_read_lock(); + memcg = mem_cgroup_from_task(current); + if (!memcg) + memcg = root_mem_cgroup; + if (mem_cgroup_is_root(memcg)) { + rcu_read_unlock(); + goto done; + } + if (!css_tryget(&memcg->css)) { + rcu_read_unlock(); + goto again; + } + rcu_read_unlock(); + + ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res); + css_put(&memcg->css); +done: + *ptr = memcg; + return ret; +} + +void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages, + struct mem_cgroup *memcg, + struct page *page) +{ + struct page_cgroup *pc; + + if (mem_cgroup_disabled()) + return; + + pc = lookup_page_cgroup(page); + lock_page_cgroup(pc); + if (unlikely(PageCgroupUsed(pc))) { + unlock_page_cgroup(pc); + mem_cgroup_hugetlb_uncharge_memcg(idx, nr_pages, memcg); + return; + } + pc->mem_cgroup = memcg; + /* + * We access a page_cgroup asynchronously without lock_page_cgroup(). + * Especially when a page_cgroup is taken from a page, pc->mem_cgroup + * is accessed after testing USED bit. To make pc->mem_cgroup visible + * before USED bit, we need memory barrier here. + * See mem_cgroup_add_lru_list(), etc. 
+ */ + smp_wmb(); + SetPageCgroupUsed(pc); + + unlock_page_cgroup(pc); + return; +} + +void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, + struct page *page) +{ + struct page_cgroup *pc; + struct mem_cgroup *memcg; + unsigned long csize = nr_pages * PAGE_SIZE; + + if (mem_cgroup_disabled()) + return; + + pc = lookup_page_cgroup(page); + if (unlikely(!PageCgroupUsed(pc))) + return; + + lock_page_cgroup(pc); + if (!PageCgroupUsed(pc)) { + unlock_page_cgroup(pc); + return; + } + memcg = pc->mem_cgroup; + pc->mem_cgroup = root_mem_cgroup; + ClearPageCgroupUsed(pc); + unlock_page_cgroup(pc); + + if (!mem_cgroup_is_root(memcg)) + res_counter_uncharge(&memcg->hugepage[idx], csize); + return; +} + +void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, + struct mem_cgroup *memcg) +{ + unsigned long csize = nr_pages * PAGE_SIZE; + + if (mem_cgroup_disabled()) + return; + + if (!mem_cgroup_is_root(memcg)) + res_counter_uncharge(&memcg->hugepage[idx], csize); + return; +} +#else +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) +{ + return 0; +} +#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ + /* * Before starting migration, account PAGE_SIZE to mem_cgroup that the old * page belongs to. @@ -4887,6 +5013,7 @@ err_cleanup: static struct cgroup_subsys_state * __ref mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) { + int idx; struct mem_cgroup *memcg, *parent; long error = -ENOMEM; int node; @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) * mem_cgroup(see mem_cgroup_put). 
*/ mem_cgroup_get(parent); + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) + res_counter_init(&memcg->hugepage[idx], + &parent->hugepage[idx]); } else { res_counter_init(&memcg->res, NULL); res_counter_init(&memcg->memsw, NULL); + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) + res_counter_init(&memcg->hugepage[idx], NULL); } memcg->last_scanned_node = MAX_NUMNODES; INIT_LIST_HEAD(&memcg->oom_notify); @@ -4951,6 +5083,12 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss, struct cgroup *cont) { struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); + /* + * Don't allow memcg removal if we have HugeTLB resource + * usage. + */ + if (mem_cgroup_have_hugetlb_usage(memcg)) + return -EBUSY; return mem_cgroup_force_empty(memcg, false); } -- 1.7.9 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753645Ab2CPRlO (ORCPT ); Fri, 16 Mar 2012 13:41:14 -0400 Received: from e28smtp01.in.ibm.com ([122.248.162.1]:52856 "EHLO e28smtp01.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753239Ab2CPRkF (ORCPT ); Fri, 16 Mar 2012 13:40:05 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V4 10/10] memcg: Add memory controller documentation for hugetlb management Date: Fri, 16 Mar 2012 23:09:30 +0530 Message-Id: <1331919570-2264-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.9 In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12031617-4790-0000-0000-000001D40207 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" 
Signed-off-by: Aneesh Kumar K.V
---
 Documentation/cgroups/memory.txt | 29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 4c95c00..d99c41b 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -43,6 +43,7 @@ Features: - usage threshold notifier - oom-killer disable knob and oom-notifier - Root cgroup has no limit controls. + - resource accounting for HugeTLB pages Kernel memory support is work in progress, and the current version provides basically functionality. (See Section 2.7)
@@ -75,6 +76,12 @@ Brief summary of control files. memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation + + memory.hugetlb.<hugepagesize>.limit_in_bytes # set/show limit of "hugepagesize" hugetlb usage + memory.hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded + memory.hugetlb.<hugepagesize>.usage_in_bytes # show current res_counter usage for "hugepagesize" hugetlb + # see 5.7 for details + 1. History The memory controller has a long history. A request for comments for the memory
@@ -279,6 +286,15 @@ per cgroup, instead of globally. * tcp memory pressure: sockets memory pressure for the tcp protocol. +2.8 HugeTLB extension + +This extension allows to limit the HugeTLB usage per control group and +enforces the controller limit during page fault. Since HugeTLB doesn't +support page reclaim, enforcing the limit at page fault time implies that, +the application will get SIGBUS signal if it tries to access HugeTLB pages +beyond its limit. This requires the application to know beforehand how much +HugeTLB pages it would require for its use. + 3. User Interface 0. Configuration
@@ -287,6 +303,7 @@ a. Enable CONFIG_CGROUPS b. Enable CONFIG_RESOURCE_COUNTERS c. Enable CONFIG_CGROUP_MEM_RES_CTLR d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) +f. Enable CONFIG_MEM_RES_CTLR_HUGETLB (to use HugeTLB extension) 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) # mount -t tmpfs none /sys/fs/cgroup
@@ -510,6 +527,18 @@ unevictable= N0= N1= ... And we have total = file + anon + unevictable. +5.7 HugeTLB resource control files +For a system supporting two hugepage size (16M and 16G) the control +files include: + + memory.hugetlb.16GB.limit_in_bytes + memory.hugetlb.16GB.max_usage_in_bytes + memory.hugetlb.16GB.usage_in_bytes + memory.hugetlb.16MB.limit_in_bytes + memory.hugetlb.16MB.max_usage_in_bytes + memory.hugetlb.16MB.usage_in_bytes + + 6. Hierarchy support The memory controller supports a deep hierarchy and hierarchical accounting.
-- 1.7.9

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753163Ab2CPRkE (ORCPT ); Fri, 16 Mar 2012 13:40:04 -0400
Received: from e28smtp01.in.ibm.com ([122.248.162.1]:52821 "EHLO e28smtp01.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752863Ab2CPRjz (ORCPT ); Fri, 16 Mar 2012 13:39:55 -0400
From: "Aneesh Kumar K.V"
To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"
Subject: [PATCH -V4 03/10] hugetlbfs: Add an inline helper for finding hstate index
Date: Fri, 16 Mar 2012 23:09:23 +0530
Message-Id: <1331919570-2264-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.9
In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
x-cbid: 12031617-4790-0000-0000-000001D401EC
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
From:
"Aneesh Kumar K.V"

Add an inline helper and use it in the code.

Signed-off-by: Aneesh Kumar K.V
---
 include/linux/hugetlb.h | 6 ++++++
 mm/hugetlb.c | 18 ++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d9d6c86..a2675b0 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -311,6 +311,11 @@ static inline unsigned hstate_index_to_shift(unsigned index) return hstates[index].order + PAGE_SHIFT; } +static inline int hstate_index(struct hstate *h) +{ + return h - hstates; +} + #else struct hstate {}; #define alloc_huge_page_node(h, nid) NULL
@@ -329,6 +334,7 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) return 1; } #define hstate_index_to_shift(index) 0 +#define hstate_index(h) 0 #endif #endif /* _LINUX_HUGETLB_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3782da8..ebe245c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1557,7 +1557,7 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent, struct attribute_group *hstate_attr_group) { int retval; - int hi = h - hstates; + int hi = hstate_index(h); hstate_kobjs[hi] = kobject_create_and_add(h->name, parent); if (!hstate_kobjs[hi])
@@ -1652,11 +1652,13 @@ void hugetlb_unregister_node(struct node *node) if (!nhs->hugepages_kobj) return; /* no hstate attributes */ - for_each_hstate(h) - if (nhs->hstate_kobjs[h - hstates]) { - kobject_put(nhs->hstate_kobjs[h - hstates]); - nhs->hstate_kobjs[h - hstates] = NULL; + for_each_hstate(h) { + int idx = hstate_index(h); + if (nhs->hstate_kobjs[idx]) { + kobject_put(nhs->hstate_kobjs[idx]); + nhs->hstate_kobjs[idx] = NULL; } + } kobject_put(nhs->hugepages_kobj); nhs->hugepages_kobj = NULL;
@@ -1759,7 +1761,7 @@ static void __exit hugetlb_exit(void) hugetlb_unregister_all_nodes(); for_each_hstate(h) { - kobject_put(hstate_kobjs[h - hstates]); + kobject_put(hstate_kobjs[hstate_index(h)]); } kobject_put(hugepages_kobj);
@@ -2587,7 +2589,7 @@ retry: */ if (unlikely(PageHWPoison(page))) { ret = VM_FAULT_HWPOISON | - VM_FAULT_SET_HINDEX(h - hstates); + VM_FAULT_SET_HINDEX(hstate_index(h)); goto backout_unlocked; } }
@@ -2660,7 +2662,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, return 0; } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) return VM_FAULT_HWPOISON_LARGE | - VM_FAULT_SET_HINDEX(h - hstates); + VM_FAULT_SET_HINDEX(hstate_index(h)); } ptep = huge_pte_alloc(mm, address, huge_page_size(h));
-- 1.7.9

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753591Ab2CPRld (ORCPT ); Fri, 16 Mar 2012 13:41:33 -0400
Received: from e28smtp03.in.ibm.com ([122.248.162.3]:43452 "EHLO e28smtp03.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753175Ab2CPRkE (ORCPT ); Fri, 16 Mar 2012 13:40:04 -0400
From: "Aneesh Kumar K.V"
To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"
Subject: [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal
Date: Fri, 16 Mar 2012 23:09:29 +0530
Message-Id: <1331919570-2264-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.9
In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
x-cbid: 12031617-3864-0000-0000-000001E213CB
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
From: "Aneesh Kumar K.V"

This adds support for memcg removal with HugeTLB resource usage.
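
The ordering of uncharge and charge when a page's charge moves to the parent can be illustrated with a small userspace model. This is a sketch, not the kernel code: struct counter and the three functions below are hypothetical, the counters are assumed to be hierarchical exactly when the child's parent pointer is set, and the special handling of the root cgroup is omitted.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for res_counter with an optional parent link. */
struct counter {
	unsigned long usage;
	unsigned long limit;
	struct counter *parent;	/* NULL when accounting is not hierarchical */
};

/* Uncharge val from c and every ancestor; callers must have charged it before. */
static void counter_uncharge(struct counter *c, unsigned long val)
{
	for (; c; c = c->parent)
		c->usage -= val;
}

/* Charge val to c and every ancestor, undoing partial work on failure. */
static int counter_charge(struct counter *c, unsigned long val)
{
	struct counter *it, *undo;

	for (it = c; it; it = it->parent) {
		if (it->usage > it->limit || val > it->limit - it->usage)
			goto rollback;
		it->usage += val;
	}
	return 0;
rollback:
	for (undo = c; undo != it; undo = undo->parent)
		undo->usage -= val;
	return -ENOMEM;
}

/*
 * Move a charge of val bytes from child to parent.
 * Hierarchical: the parent already accounts for the child's usage, so
 * uncharge the child first (which also drops the parent), then recharge
 * the parent; the parent ends up where it started and cannot overflow.
 * Non-hierarchical: charge the parent first so a failure leaves the
 * child's charge untouched.
 */
static int move_charge_to_parent(struct counter *child, struct counter *parent,
				 unsigned long val, int use_hierarchy)
{
	if (use_hierarchy) {
		counter_uncharge(child, val);
		return counter_charge(parent, val);
	}
	if (counter_charge(parent, val))
		return -EBUSY;
	counter_uncharge(child, val);
	return 0;
}
```

In the non-hierarchical case the model returns -EBUSY and leaves both counters unchanged when the parent's limit is too small to absorb the charge.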
Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb.h | 6 ++++ include/linux/memcontrol.h | 15 +++++++++- mm/hugetlb.c | 41 ++++++++++++++++++++++++++ mm/memcontrol.c | 68 +++++++++++++++++++++++++++++++++++++------ 4 files changed, 119 insertions(+), 11 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 6919100..32e948c 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -349,11 +349,17 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) #ifdef CONFIG_MEM_RES_CTLR_HUGETLB extern int register_hugetlb_memcg_files(struct cgroup *cgroup, struct cgroup_subsys *ss); +extern int hugetlb_force_memcg_empty(struct cgroup *cgroup); #else static inline int register_hugetlb_memcg_files(struct cgroup *cgroup, struct cgroup_subsys *ss) { return 0; } + +static inline int hugetlb_force_memcg_empty(struct cgroup *cgroup) +{ + return 0; +} #endif #endif /* _LINUX_HUGETLB_H */ diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 73900b9..0980122 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -441,7 +441,9 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, struct mem_cgroup *memcg); extern int mem_cgroup_hugetlb_file_init(int idx); - +extern int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, + struct page *page); +extern bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup); #else static inline int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, @@ -477,6 +479,17 @@ static inline int mem_cgroup_hugetlb_file_init(int idx) return 0; } +static inline int +mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, + struct page *page) +{ + return 0; +} + +static inline bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup) +{ + return 0; +} #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ #endif /* _LINUX_MEMCONTROL_H */ diff 
--git a/mm/hugetlb.c b/mm/hugetlb.c index 8fd465d..685f0d5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1842,6 +1842,47 @@ int register_hugetlb_memcg_files(struct cgroup *cgroup, } return ret; } + +/* + * Force the memcg to empty the hugetlb resources by moving them to + * the parent cgroup. We can fail if the parent cgroup's limit prevented + * the charging. This should only happen if use_hierarchy is not set. + */ +int hugetlb_force_memcg_empty(struct cgroup *cgroup) +{ + struct hstate *h; + struct page *page; + int ret = 0, idx = 0; + + do { + if (cgroup_task_count(cgroup) || !list_empty(&cgroup->children)) + goto out; + /* + * If the task doing the cgroup_rmdir got a signal + * we don't really need to loop till the hugetlb resource + * usage become zero. + */ + if (signal_pending(current)) { + ret = -EINTR; + goto out; + } + for_each_hstate(h) { + spin_lock(&hugetlb_lock); + list_for_each_entry(page, &h->hugepage_activelist, lru) { + ret = mem_cgroup_move_hugetlb_parent(idx, cgroup, page); + if (ret) { + spin_unlock(&hugetlb_lock); + goto out; + } + } + spin_unlock(&hugetlb_lock); + idx++; + } + cond_resched(); + } while (mem_cgroup_have_hugetlb_usage(cgroup)); +out: + return ret; +} #endif /* Should be called on processing a hugepagesz=... 
option */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4900b72..e29d86d 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3171,9 +3171,11 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, #endif #ifdef CONFIG_MEM_RES_CTLR_HUGETLB -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) +bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup) { int idx; + struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup); + for (idx = 0; idx < hugetlb_max_hstate; idx++) { if (memcg->hugepage[idx].usage > 0) return 1; @@ -3285,10 +3287,57 @@ void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, res_counter_uncharge(&memcg->hugepage[idx], csize); return; } -#else -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) + +int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, + struct page *page) { - return 0; + struct page_cgroup *pc; + int csize, ret = 0; + struct res_counter *fail_res; + struct cgroup *pcgrp = cgroup->parent; + struct mem_cgroup *parent = mem_cgroup_from_cont(pcgrp); + struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup); + + if (!get_page_unless_zero(page)) + goto out; + + pc = lookup_page_cgroup(page); + lock_page_cgroup(pc); + if (!PageCgroupUsed(pc) || pc->mem_cgroup != memcg) + goto err_out; + + csize = PAGE_SIZE << compound_order(page); + /* + * uncharge from child and charge the parent. If we have + * use_hierarchy set, we can never fail here. In-order to make + * sure we don't get -ENOMEM on parent charge, we first uncharge + * the child and then charge the parent. 
+ */ + if (parent->use_hierarchy) { + res_counter_uncharge(&memcg->hugepage[idx], csize); + if (!mem_cgroup_is_root(parent)) + ret = res_counter_charge(&parent->hugepage[idx], + csize, &fail_res); + } else { + if (!mem_cgroup_is_root(parent)) { + ret = res_counter_charge(&parent->hugepage[idx], + csize, &fail_res); + if (ret) { + ret = -EBUSY; + goto err_out; + } + } + res_counter_uncharge(&memcg->hugepage[idx], csize); + } + /* + * caller should have done css_get + */ + pc->mem_cgroup = parent; +err_out: + unlock_page_cgroup(pc); + put_page(page); +out: + return ret; } #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ @@ -3806,6 +3855,11 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all) /* should free all ? */ if (free_all) goto try_to_free; + + /* move the hugetlb charges */ + ret = hugetlb_force_memcg_empty(cgrp); + if (ret) + goto out; move_account: do { ret = -EBUSY; @@ -5103,12 +5157,6 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss, struct cgroup *cont) { struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); - /* - * Don't allow memcg removal if we have HugeTLB resource - * usage. 
- */ - if (mem_cgroup_have_hugetlb_usage(memcg)) - return -EBUSY; return mem_cgroup_force_empty(memcg, false); }
-- 1.7.9

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753899Ab2CPRl5 (ORCPT ); Fri, 16 Mar 2012 13:41:57 -0400
Received: from e28smtp09.in.ibm.com ([122.248.162.9]:37691 "EHLO e28smtp09.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752847Ab2CPRkC (ORCPT ); Fri, 16 Mar 2012 13:40:02 -0400
From: "Aneesh Kumar K.V"
To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"
Subject: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs
Date: Fri, 16 Mar 2012 23:09:27 +0530
Message-Id: <1331919570-2264-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.9
In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
x-cbid: 12031617-2674-0000-0000-000003B90503
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
From: "Aneesh Kumar K.V"

This adds control files for hugetlbfs in memcg.

Signed-off-by: Aneesh Kumar K.V
---
 include/linux/hugetlb.h | 17 +++++++++++++++
 include/linux/memcontrol.h | 7 ++++++
 mm/hugetlb.c | 25 ++++++++++++++++++++++-
 mm/memcontrol.c | 48 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 96 insertions(+), 1 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 1f70068..cbd8dc5 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -4,6 +4,7 @@ #include #include #include +#include struct ctl_table; struct user_struct;
@@ -220,6 +221,12 @@ struct hstate { unsigned int nr_huge_pages_node[MAX_NUMNODES]; unsigned
int free_huge_pages_node[MAX_NUMNODES]; unsigned int surplus_huge_pages_node[MAX_NUMNODES]; + /* mem cgroup control files */ +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB + struct cftype cgroup_limit_file; + struct cftype cgroup_usage_file; + struct cftype cgroup_max_usage_file; +#endif char name[HSTATE_NAME_LEN]; }; @@ -338,4 +345,14 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) #define hstate_index(h) 0 #endif +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB +extern int register_hugetlb_memcg_files(struct cgroup *cgroup, + struct cgroup_subsys *ss); +#else +static inline int register_hugetlb_memcg_files(struct cgroup *cgroup, + struct cgroup_subsys *ss) +{ + return 0; +} +#endif #endif /* _LINUX_HUGETLB_H */ diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 320dbad..73900b9 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -440,6 +440,7 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, struct page *page); extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, struct mem_cgroup *memcg); +extern int mem_cgroup_hugetlb_file_init(int idx); #else static inline int @@ -470,6 +471,12 @@ mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, { return; } + +static inline int mem_cgroup_hugetlb_file_init(int idx) +{ + return 0; +} + #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ #endif /* _LINUX_MEMCONTROL_H */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 91361a0..684849a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1819,6 +1819,29 @@ static int __init hugetlb_init(void) } module_init(hugetlb_init); +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB +int register_hugetlb_memcg_files(struct cgroup *cgroup, + struct cgroup_subsys *ss) +{ + int ret = 0; + struct hstate *h; + + for_each_hstate(h) { + ret = cgroup_add_file(cgroup, ss, &h->cgroup_limit_file); + if (ret) + return ret; + ret = cgroup_add_file(cgroup, ss, &h->cgroup_usage_file); + if (ret) + return ret; + ret = 
cgroup_add_file(cgroup, ss, &h->cgroup_max_usage_file); + if (ret) + return ret; + + } + return ret; +} +#endif + /* Should be called on processing a hugepagesz=... option */ void __init hugetlb_add_hstate(unsigned order) { @@ -1842,7 +1865,7 @@ void __init hugetlb_add_hstate(unsigned order) h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", huge_page_size(h)/1024); - + mem_cgroup_hugetlb_file_init(hugetlb_max_hstate - 1); parsed_hstate = h; } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index d8b3513..4900b72 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5123,6 +5123,51 @@ static void mem_cgroup_destroy(struct cgroup_subsys *ss, mem_cgroup_put(memcg); } +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB +static char *mem_fmt(char *buf, unsigned long n) +{ + if (n >= (1UL << 30)) + sprintf(buf, "%luGB", n >> 30); + else if (n >= (1UL << 20)) + sprintf(buf, "%luMB", n >> 20); + else + sprintf(buf, "%luKB", n >> 10); + return buf; +} + +int mem_cgroup_hugetlb_file_init(int idx) +{ + char buf[32]; + struct cftype *cft; + struct hstate *h = &hstates[idx]; + + /* format the size */ + mem_fmt(buf, huge_page_size(h)); + + /* Add the limit file */ + cft = &h->cgroup_limit_file; + snprintf(cft->name, MAX_CFTYPE_NAME, "hugetlb.%s.limit_in_bytes", buf); + cft->private = __MEMFILE_PRIVATE(idx, _MEMHUGETLB, RES_LIMIT); + cft->read_u64 = mem_cgroup_read; + cft->write_string = mem_cgroup_write; + + /* Add the usage file */ + cft = &h->cgroup_usage_file; + snprintf(cft->name, MAX_CFTYPE_NAME, "hugetlb.%s.usage_in_bytes", buf); + cft->private = __MEMFILE_PRIVATE(idx, _MEMHUGETLB, RES_USAGE); + cft->read_u64 = mem_cgroup_read; + + /* Add the MAX usage file */ + cft = &h->cgroup_max_usage_file; + snprintf(cft->name, MAX_CFTYPE_NAME, "hugetlb.%s.max_usage_in_bytes", buf); + cft->private = __MEMFILE_PRIVATE(idx, _MEMHUGETLB, RES_MAX_USAGE); + cft->trigger = mem_cgroup_reset; + cft->read_u64 = mem_cgroup_read; + + return 0; +} 
+#endif + static int mem_cgroup_populate(struct cgroup_subsys *ss, struct cgroup *cont) { @@ -5137,6 +5182,9 @@ static int mem_cgroup_populate(struct cgroup_subsys *ss, if (!ret) ret = register_kmem_files(cont, ss); + if (!ret) + ret = register_hugetlb_memcg_files(cont, ss); + return ret; } -- 1.7.9 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752152Ab2CPRj7 (ORCPT ); Fri, 16 Mar 2012 13:39:59 -0400 Received: from e28smtp04.in.ibm.com ([122.248.162.4]:38743 "EHLO e28smtp04.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752847Ab2CPRjz (ORCPT ); Fri, 16 Mar 2012 13:39:55 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V4 01/10] hugetlb: rename max_hstate to hugetlb_max_hstate Date: Fri, 16 Mar 2012 23:09:21 +0530 Message-Id: <1331919570-2264-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.9 In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12031617-5564-0000-0000-000001D7FCC6 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" We will be using this from other subsystems like memcg in later patches. 
Acked-by: Hillf Danton Signed-off-by: Aneesh Kumar K.V --- mm/hugetlb.c | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 5f34bd8..d623e71 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; static gfp_t htlb_alloc_mask = GFP_HIGHUSER; unsigned long hugepages_treat_as_movable; -static int max_hstate; +static int hugetlb_max_hstate; unsigned int default_hstate_idx; struct hstate hstates[HUGE_MAX_HSTATE]; @@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages; static unsigned long __initdata default_hstate_size; #define for_each_hstate(h) \ - for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++) + for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++) /* * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages @@ -1808,9 +1808,9 @@ void __init hugetlb_add_hstate(unsigned order) printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n"); return; } - BUG_ON(max_hstate >= HUGE_MAX_HSTATE); + BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE); BUG_ON(order == 0); - h = &hstates[max_hstate++]; + h = &hstates[hugetlb_max_hstate++]; h->order = order; h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1); h->nr_huge_pages = 0; @@ -1831,10 +1831,10 @@ static int __init hugetlb_nrpages_setup(char *s) static unsigned long *last_mhp; /* - * !max_hstate means we haven't parsed a hugepagesz= parameter yet, + * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet, * so this hugepages= parameter goes to the "default hstate". */ - if (!max_hstate) + if (!hugetlb_max_hstate) mhp = &default_hstate_max_huge_pages; else mhp = &parsed_hstate->max_huge_pages; @@ -1853,7 +1853,7 @@ static int __init hugetlb_nrpages_setup(char *s) * But we need to allocate >= MAX_ORDER hstates here early to still * use the bootmem allocator. 
*/ - if (max_hstate && parsed_hstate->order >= MAX_ORDER) + if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER) hugetlb_hstate_alloc_pages(parsed_hstate); last_mhp = mhp; -- 1.7.9 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753937Ab2CPRmQ (ORCPT ); Fri, 16 Mar 2012 13:42:16 -0400 Received: from e28smtp01.in.ibm.com ([122.248.162.1]:52832 "EHLO e28smtp01.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753030Ab2CPRkA (ORCPT ); Fri, 16 Mar 2012 13:40:00 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V4 06/10] memcg: track resource index in cftype private Date: Fri, 16 Mar 2012 23:09:26 +0530 Message-Id: <1331919570-2264-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.9 In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12031617-4790-0000-0000-000001D401F7 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" This helps in using same memcg callbacks for non reclaim resource control files. 
Signed-off-by: Aneesh Kumar K.V --- mm/memcontrol.c | 27 +++++++++++++++++++++------ 1 files changed, 21 insertions(+), 6 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 7a9ea94..d8b3513 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -358,9 +358,14 @@ enum charge_type { #define _MEM (0) #define _MEMSWAP (1) #define _OOM_TYPE (2) -#define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val)) -#define MEMFILE_TYPE(val) (((val) >> 16) & 0xffff) -#define MEMFILE_ATTR(val) ((val) & 0xffff) +#define _MEMHUGETLB (3) + +/* 0 ... val ...16.... x...24...idx...32*/ +#define __MEMFILE_PRIVATE(idx, x, val) (((idx) << 24) | ((x) << 16) | (val)) +#define MEMFILE_PRIVATE(x, val) __MEMFILE_PRIVATE(0, x, val) +#define MEMFILE_TYPE(val) (((val) >> 16) & 0xff) +#define MEMFILE_IDX(val) (((val) >> 24) & 0xff) +#define MEMFILE_ATTR(val) ((val) & 0xffff) /* Used for OOM nofiier */ #define OOM_CONTROL (0) @@ -3954,7 +3959,7 @@ static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft) { struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); u64 val; - int type, name; + int type, name, idx; type = MEMFILE_TYPE(cft->private); name = MEMFILE_ATTR(cft->private); @@ -3971,6 +3976,10 @@ static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft) else val = res_counter_read_u64(&memcg->memsw, name); break; + case _MEMHUGETLB: + idx = MEMFILE_IDX(cft->private); + val = res_counter_read_u64(&memcg->hugepage[idx], name); + break; default: BUG(); break; @@ -4003,7 +4012,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft, break; if (type == _MEM) ret = mem_cgroup_resize_limit(memcg, val); - else + else if (type == _MEMHUGETLB) { + int idx = MEMFILE_IDX(cft->private); + ret = res_counter_set_limit(&memcg->hugepage[idx], val); + } else ret = mem_cgroup_resize_memsw_limit(memcg, val); break; case RES_SOFT_LIMIT: @@ -4067,7 +4079,10 @@ static int mem_cgroup_reset(struct cgroup *cont, unsigned int event) case RES_MAX_USAGE: if (type == _MEM) 
res_counter_reset_max(&memcg->res); - else + else if (type == _MEMHUGETLB) { + int idx = MEMFILE_IDX(event); + res_counter_reset_max(&memcg->hugepage[idx]); + } else res_counter_reset_max(&memcg->memsw); break; case RES_FAILCNT: -- 1.7.9 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754008Ab2CPRml (ORCPT ); Fri, 16 Mar 2012 13:42:41 -0400 Received: from e28smtp04.in.ibm.com ([122.248.162.4]:38752 "EHLO e28smtp04.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752822Ab2CPRj6 (ORCPT ); Fri, 16 Mar 2012 13:39:58 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Date: Fri, 16 Mar 2012 23:09:25 +0530 Message-Id: <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.9 In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12031617-5564-0000-0000-000001D7FCDC Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" This adds necessary charge/uncharge calls in the HugeTLB code Acked-by: Hillf Danton Signed-off-by: Aneesh Kumar K.V --- mm/hugetlb.c | 21 ++++++++++++++++++++- mm/memcontrol.c | 5 +++++ 2 files changed, 25 insertions(+), 1 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c672187..91361a0 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -21,6 +21,8 @@ #include #include #include +#include +#include #include #include @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page) BUG_ON(page_mapcount(page)); 
INIT_LIST_HEAD(&page->lru); + if (mapping) + mem_cgroup_hugetlb_uncharge_page(hstate_index(h), + pages_per_huge_page(h), page); spin_lock(&hugetlb_lock); if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { update_and_free_page(h, page); @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h, static struct page *alloc_huge_page(struct vm_area_struct *vma, unsigned long addr, int avoid_reserve) { + int ret, idx; struct hstate *h = hstate_vma(vma); struct page *page; + struct mem_cgroup *memcg = NULL; struct address_space *mapping = vma->vm_file->f_mapping; struct inode *inode = mapping->host; long chg; + idx = hstate_index(h); /* * Processes that did not create the mapping will have no reserves and * will not have accounted against quota. Check that the quota can be @@ -1039,6 +1047,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, if (hugetlb_get_quota(inode->i_mapping, chg)) return ERR_PTR(-ENOSPC); + ret = mem_cgroup_hugetlb_charge_page(idx, pages_per_huge_page(h), + &memcg); + if (ret) { + hugetlb_put_quota(inode->i_mapping, chg); + return ERR_PTR(-ENOSPC); + } spin_lock(&hugetlb_lock); page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve); spin_unlock(&hugetlb_lock); @@ -1046,6 +1060,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, if (!page) { page = alloc_buddy_huge_page(h, NUMA_NO_NODE); if (!page) { + mem_cgroup_hugetlb_uncharge_memcg(idx, + pages_per_huge_page(h), + memcg); hugetlb_put_quota(inode->i_mapping, chg); return ERR_PTR(-ENOSPC); } @@ -1054,7 +1071,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, set_page_private(page, (unsigned long) mapping); vma_commit_reservation(h, vma, addr); - + /* update page cgroup details */ + mem_cgroup_hugetlb_commit_charge(idx, pages_per_huge_page(h), + memcg, page); return page; } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4b36c5e..7a9ea94 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2901,6 
+2901,11 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) if (PageSwapCache(page)) return NULL; + /* + * HugeTLB page uncharge happen in the HugeTLB compound page destructor + */ + if (PageHuge(page)) + return NULL; if (PageTransHuge(page)) { nr_pages <<= compound_order(page); -- 1.7.9 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753916Ab2CPRnG (ORCPT ); Fri, 16 Mar 2012 13:43:06 -0400 Received: from e28smtp05.in.ibm.com ([122.248.162.5]:55419 "EHLO e28smtp05.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752548Ab2CPRjw (ORCPT ); Fri, 16 Mar 2012 13:39:52 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values Date: Fri, 16 Mar 2012 23:09:22 +0530 Message-Id: <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.9 In-Reply-To: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12031617-8256-0000-0000-000001ADDA8D Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" Using VM_FAULT_* codes with ERR_PTR will require us to make sure VM_FAULT_* values will not exceed MAX_ERRNO value. 
Signed-off-by: Aneesh Kumar K.V --- mm/hugetlb.c | 18 +++++++++++++----- 1 files changed, 13 insertions(+), 5 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d623e71..3782da8 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1034,10 +1034,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, */ chg = vma_needs_reservation(h, vma, addr); if (chg < 0) - return ERR_PTR(-VM_FAULT_OOM); + return ERR_PTR(-ENOMEM); if (chg) if (hugetlb_get_quota(inode->i_mapping, chg)) - return ERR_PTR(-VM_FAULT_SIGBUS); + return ERR_PTR(-ENOSPC); spin_lock(&hugetlb_lock); page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve); @@ -1047,7 +1047,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, page = alloc_buddy_huge_page(h, NUMA_NO_NODE); if (!page) { hugetlb_put_quota(inode->i_mapping, chg); - return ERR_PTR(-VM_FAULT_SIGBUS); + return ERR_PTR(-ENOSPC); } } @@ -2395,6 +2395,7 @@ retry_avoidcopy: new_page = alloc_huge_page(vma, address, outside_reserve); if (IS_ERR(new_page)) { + int err = PTR_ERR(new_page); page_cache_release(old_page); /* @@ -2424,7 +2425,10 @@ retry_avoidcopy: /* Caller expects lock to be held */ spin_lock(&mm->page_table_lock); - return -PTR_ERR(new_page); + if (err == -ENOMEM) + return VM_FAULT_OOM; + else + return VM_FAULT_SIGBUS; } /* @@ -2542,7 +2546,11 @@ retry: goto out; page = alloc_huge_page(vma, address, 0); if (IS_ERR(page)) { - ret = -PTR_ERR(page); + ret = PTR_ERR(page); + if (ret == -ENOMEM) + ret = VM_FAULT_OOM; + else + ret = VM_FAULT_SIGBUS; goto out; } clear_huge_page(page, address, pages_per_huge_page(h)); -- 1.7.9 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757541Ab2CSCIr (ORCPT ); Sun, 18 Mar 2012 22:08:47 -0400 Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:41005 "EHLO fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752953Ab2CSCIp (ORCPT ); Sun, 18 Mar 2012 22:08:45 -0400 
X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4F6694C4.2090800@jp.fujitsu.com> Date: Mon, 19 Mar 2012 11:07:00 +0900 From: KAMEZAWA Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 01/10] hugetlb: rename max_hstate to hugetlb_max_hstate References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1331919570-2264-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > We will be using this from other subsystems like memcg > in later patches. 
> > Acked-by: Hillf Danton > Signed-off-by: Aneesh Kumar K.V Reviewed-by: KAMEZAWA Hiroyuki From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757591Ab2CSCNn (ORCPT ); Sun, 18 Mar 2012 22:13:43 -0400 Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:45567 "EHLO fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752953Ab2CSCNl (ORCPT ); Sun, 18 Mar 2012 22:13:41 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4F6695EC.2060208@jp.fujitsu.com> Date: Mon, 19 Mar 2012 11:11:56 +0900 From: KAMEZAWA Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Using VM_FAULT_* codes with ERR_PTR will require us to make sure > VM_FAULT_* values will not exceed MAX_ERRNO value. > > Signed-off-by: Aneesh Kumar K.V Is this a bug fix ? 
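To make the constraint in the changelog concrete, here is a minimal user-space sketch, not kernel code. It assumes the usual kernel convention that IS_ERR() treats only the top MAX_ERRNO (4095) values of the address space as errors. err_ptr(), ptr_err() and is_err() are local stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(), and 0x2000 is a hypothetical fault code picked only because it is larger than MAX_ERRNO; it is not a real VM_FAULT_* value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the kernel's limit on errno values encodable in a pointer. */
#define MAX_ERRNO 4095

/* Stand-in for ERR_PTR(): carry a negative error code in a pointer. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Stand-in for PTR_ERR(): recover the error code from the pointer. */
static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Stand-in for IS_ERR(): only the last MAX_ERRNO addresses count as errors. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical helper: does a positive code survive the ERR_PTR round trip? */
static int code_is_detected(long code)
{
	return is_err(err_ptr(-code));
}
```

In this sketch an errno-sized code such as 12 is detected, while the hypothetical 0x2000 is not, so a caller checking is_err() would treat that return value as a valid page pointer. That is the failure mode the patch avoids by returning errno values and translating them to VM_FAULT_* codes in the callers.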
Reviewed-by: KAMEZAWA Hiroyuki From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757600Ab2CSCRN (ORCPT ); Sun, 18 Mar 2012 22:17:13 -0400 Received: from fgwmail6.fujitsu.co.jp ([192.51.44.36]:44791 "EHLO fgwmail6.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752953Ab2CSCRM (ORCPT ); Sun, 18 Mar 2012 22:17:12 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4F6696C2.4020203@jp.fujitsu.com> Date: Mon, 19 Mar 2012 11:15:30 +0900 From: KAMEZAWA Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 03/10] hugetlbfs: Add an inline helper for finding hstate index References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1331919570-2264-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Add and inline helper and use it in the code. 
> > Signed-off-by: Aneesh Kumar K.V Reviewed-by: KAMEZAWA Hiroyuki From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964800Ab2CSCk1 (ORCPT ); Sun, 18 Mar 2012 22:40:27 -0400 Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:38714 "EHLO fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757748Ab2CSCkZ (ORCPT ); Sun, 18 Mar 2012 22:40:25 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4F669C2E.1010502@jp.fujitsu.com> Date: Mon, 19 Mar 2012 11:38:38 +0900 From: KAMEZAWA Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This patch implements a memcg extension that allows us to control > HugeTLB allocations via memory controller. > If you write some details here, it will be helpful for review and seeing log after merge. 
> Signed-off-by: Aneesh Kumar K.V > --- > include/linux/hugetlb.h | 1 + > include/linux/memcontrol.h | 42 +++++++++++++ > init/Kconfig | 8 +++ > mm/hugetlb.c | 2 +- > mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 190 insertions(+), 1 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index a2675b0..1f70068 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -243,6 +243,7 @@ struct hstate *size_to_hstate(unsigned long size); > #define HUGE_MAX_HSTATE 1 > #endif > > +extern int hugetlb_max_hstate; > extern struct hstate hstates[HUGE_MAX_HSTATE]; > extern unsigned int default_hstate_idx; > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 4d34356..320dbad 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -429,5 +429,47 @@ static inline void sock_release_memcg(struct sock *sk) > { > } > #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */ > + > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > +extern int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > + struct mem_cgroup **ptr); > +extern void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages, > + struct mem_cgroup *memcg, > + struct page *page); > +extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, > + struct page *page); > +extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > + struct mem_cgroup *memcg); > + > +#else > +static inline int > +mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > + struct mem_cgroup **ptr) > +{ > + return 0; > +} > + > +static inline void > +mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages, > + struct mem_cgroup *memcg, > + struct page *page) > +{ > + return; > +} > + > +static inline void > +mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, > + struct page *page) > +{ > + return; > +} > + > +static inline void > 
+mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > + struct mem_cgroup *memcg) > +{ > + return; > +} > +#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ > #endif /* _LINUX_MEMCONTROL_H */ > > diff --git a/init/Kconfig b/init/Kconfig > index 3f42cd6..f0eb8aa 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -725,6 +725,14 @@ config CGROUP_PERF > > Say N if unsure. > > +config MEM_RES_CTLR_HUGETLB > + bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)" > + depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL > + default n > + help > + Add HugeTLB management to memory resource controller. When you > + enable this, you can put a per cgroup limit on HugeTLB usage. > + > menuconfig CGROUP_SCHED > bool "Group CPU scheduler" > default n > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index ebe245c..c672187 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; > static gfp_t htlb_alloc_mask = GFP_HIGHUSER; > unsigned long hugepages_treat_as_movable; > > -static int hugetlb_max_hstate; > +int hugetlb_max_hstate; > unsigned int default_hstate_idx; > struct hstate hstates[HUGE_MAX_HSTATE]; > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 6728a7a..4b36c5e 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -235,6 +235,10 @@ struct mem_cgroup { > */ > struct res_counter memsw; > /* > + * the counter to account for hugepages from hugetlb. > + */ > + struct res_counter hugepage[HUGE_MAX_HSTATE]; > + /* > * Per cgroup active and inactive list, similar to the > * per zone LRU lists. 
> */ > @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, > } > #endif > > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > +{ > + int idx; > + for (idx = 0; idx < hugetlb_max_hstate; idx++) { > + if (memcg->hugepage[idx].usage > 0) > + return 1; > + } > + return 0; > +} Please use res_counter_read_u64() rather than reading the value directly. > + > +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > + struct mem_cgroup **ptr) > +{ > + int ret = 0; > + struct mem_cgroup *memcg; > + struct res_counter *fail_res; > + unsigned long csize = nr_pages * PAGE_SIZE; > + > + if (mem_cgroup_disabled()) > + return 0; > +again: > + rcu_read_lock(); > + memcg = mem_cgroup_from_task(current); > + if (!memcg) > + memcg = root_mem_cgroup; > + if (mem_cgroup_is_root(memcg)) { > + rcu_read_unlock(); > + goto done; > + } One concern: yes, at present the memory cgroup doesn't account the root cgroup and doesn't update res->usage, to avoid the overhead of updating a shared counter when memcg is not mounted. But the memory.usage_in_bytes file still works for the root memcg by reading percpu statistics. So, how about counting usage for the root cgroup even though it cannot be limited? Considering how hugetlbfs is used, updating the res_counter here should not cause a false-sharing performance problem. Then you can remove the root_mem_cgroup() checks inserted in several places. > struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); > + /* > + * Don't allow memcg removal if we have HugeTLB resource > + * usage. > + */ > + if (mem_cgroup_have_hugetlb_usage(memcg)) > + return -EBUSY; > > return mem_cgroup_force_empty(memcg, false); > } Is this fixed by patches 8 and 9?
Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964791Ab2CSCmv (ORCPT ); Sun, 18 Mar 2012 22:42:51 -0400 Received: from fgwmail6.fujitsu.co.jp ([192.51.44.36]:34886 "EHLO fgwmail6.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757567Ab2CSCmu (ORCPT ); Sun, 18 Mar 2012 22:42:50 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4F669CC3.9070007@jp.fujitsu.com> Date: Mon, 19 Mar 2012 11:41:07 +0900 From: KAMEZAWA Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This adds necessary charge/uncharge calls in the HugeTLB code > > Acked-by: Hillf Danton > Signed-off-by: Aneesh Kumar K.V Reviewed-by: KAMEZAWA Hiroyuki A nitpick below. 
> --- > mm/hugetlb.c | 21 ++++++++++++++++++++- > mm/memcontrol.c | 5 +++++ > 2 files changed, 25 insertions(+), 1 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index c672187..91361a0 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -21,6 +21,8 @@ > #include > #include > #include > +#include > +#include > > #include > #include > @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page) > BUG_ON(page_mapcount(page)); > INIT_LIST_HEAD(&page->lru); > > + if (mapping) > + mem_cgroup_hugetlb_uncharge_page(hstate_index(h), > + pages_per_huge_page(h), page); > spin_lock(&hugetlb_lock); > if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { > update_and_free_page(h, page); > @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h, > static struct page *alloc_huge_page(struct vm_area_struct *vma, > unsigned long addr, int avoid_reserve) > { > + int ret, idx; > struct hstate *h = hstate_vma(vma); > struct page *page; > + struct mem_cgroup *memcg = NULL; Can't we do this initialization in mem_cgroup_hugetlb_charge_page()?
Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757786Ab2CSCpT (ORCPT ); Sun, 18 Mar 2012 22:45:19 -0400 Received: from fgwmail6.fujitsu.co.jp ([192.51.44.36]:43556 "EHLO fgwmail6.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757756Ab2CSCpR (ORCPT ); Sun, 18 Mar 2012 22:45:17 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4F669D56.4080002@jp.fujitsu.com> Date: Mon, 19 Mar 2012 11:43:34 +0900 From: KAMEZAWA Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 06/10] memcg: track resource index in cftype private References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1331919570-2264-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This helps in using same memcg callbacks for non reclaim resource > control files. > > Signed-off-by: Aneesh Kumar K.V Acked-by: KAMEZAWA Hiroyuki As mentioned, I'm glad if you can handle usage_in_bytes for root memcg. 
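For reference, the encoding this patch puts into cft->private can be checked in user space. The sketch below copies the four macros and _MEMHUGETLB from the patch; EXAMPLE_ATTR and roundtrip_ok() are hypothetical names used only for illustration, and the attribute value 2 is a placeholder, not a claim about the real RES_* numbering. Unsigned arguments are used so that shifting an index into the top byte stays well defined.

```c
#include <assert.h>

/* Layout from the patch: bits 0-15 attr, bits 16-23 type, bits 24-31 idx. */
#define __MEMFILE_PRIVATE(idx, x, val) (((idx) << 24) | ((x) << 16) | (val))
#define MEMFILE_PRIVATE(x, val)        __MEMFILE_PRIVATE(0, x, val)
#define MEMFILE_TYPE(val)              (((val) >> 16) & 0xff)
#define MEMFILE_IDX(val)               (((val) >> 24) & 0xff)
#define MEMFILE_ATTR(val)              ((val) & 0xffff)

#define _MEMHUGETLB  (3u)   /* type value from the patch */
#define EXAMPLE_ATTR (2u)   /* hypothetical attribute, illustration only */

/* Hypothetical helper: pack the three fields and check they decode unchanged. */
static int roundtrip_ok(unsigned int idx, unsigned int type, unsigned int attr)
{
	unsigned int priv = __MEMFILE_PRIVATE(idx, type, attr);

	return MEMFILE_IDX(priv) == idx &&
	       MEMFILE_TYPE(priv) == type &&
	       MEMFILE_ATTR(priv) == attr;
}
```

Two properties follow from the layout. Existing MEMFILE_PRIVATE() users decode to index 0, so they are unaffected. And because the type field shrinks from 16 to 8 bits, a type of 256 or more would spill into the index byte; the current type values (0 to 3) are far below that.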
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1030178Ab2CSC6f (ORCPT ); Sun, 18 Mar 2012 22:58:35 -0400 Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:56530 "EHLO fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932094Ab2CSC6c (ORCPT ); Sun, 18 Mar 2012 22:58:32 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4F66A059.20801@jp.fujitsu.com> Date: Mon, 19 Mar 2012 11:56:25 +0900 From: KAMEZAWA Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1331919570-2264-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This add control files for hugetlbfs in memcg > > Signed-off-by: Aneesh Kumar K.V I have a question. When a user does 1. create a memory cgroup as /cgroup/A 2. insmod hugetlb.ko 3. ls /cgroup/A will the files be shown? And won't we have a problem at rmdir A? Sorry if hugetlb can never be used as a module. A comment below.
> --- > include/linux/hugetlb.h | 17 +++++++++++++++ > include/linux/memcontrol.h | 7 ++++++ > mm/hugetlb.c | 25 ++++++++++++++++++++++- > mm/memcontrol.c | 48 ++++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 96 insertions(+), 1 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 1f70068..cbd8dc5 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -4,6 +4,7 @@ > #include > #include > #include > +#include > > struct ctl_table; > struct user_struct; > @@ -220,6 +221,12 @@ struct hstate { > unsigned int nr_huge_pages_node[MAX_NUMNODES]; > unsigned int free_huge_pages_node[MAX_NUMNODES]; > unsigned int surplus_huge_pages_node[MAX_NUMNODES]; > + /* mem cgroup control files */ > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > + struct cftype cgroup_limit_file; > + struct cftype cgroup_usage_file; > + struct cftype cgroup_max_usage_file; > +#endif > char name[HSTATE_NAME_LEN]; > }; > > @@ -338,4 +345,14 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) > #define hstate_index(h) 0 > #endif > > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > +extern int register_hugetlb_memcg_files(struct cgroup *cgroup, > + struct cgroup_subsys *ss); > +#else > +static inline int register_hugetlb_memcg_files(struct cgroup *cgroup, > + struct cgroup_subsys *ss) > +{ > + return 0; > +} > +#endif > #endif /* _LINUX_HUGETLB_H */ > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 320dbad..73900b9 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -440,6 +440,7 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, > struct page *page); > extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > struct mem_cgroup *memcg); > +extern int mem_cgroup_hugetlb_file_init(int idx); > > #else > static inline int > @@ -470,6 +471,12 @@ mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > { > return; > } > + > 
+static inline int mem_cgroup_hugetlb_file_init(int idx) > +{ > + return 0; > +} > + > #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ > #endif /* _LINUX_MEMCONTROL_H */ > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 91361a0..684849a 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1819,6 +1819,29 @@ static int __init hugetlb_init(void) > } > module_init(hugetlb_init); > > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > +int register_hugetlb_memcg_files(struct cgroup *cgroup, > + struct cgroup_subsys *ss) > +{ > + int ret = 0; > + struct hstate *h; > + > + for_each_hstate(h) { > + ret = cgroup_add_file(cgroup, ss, &h->cgroup_limit_file); > + if (ret) > + return ret; > + ret = cgroup_add_file(cgroup, ss, &h->cgroup_usage_file); > + if (ret) > + return ret; > + ret = cgroup_add_file(cgroup, ss, &h->cgroup_max_usage_file); > + if (ret) > + return ret; > + > + } > + return ret; > +} > +#endif > + > /* Should be called on processing a hugepagesz=... option */ > void __init hugetlb_add_hstate(unsigned order) > { > @@ -1842,7 +1865,7 @@ void __init hugetlb_add_hstate(unsigned order) > h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); > snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", > huge_page_size(h)/1024); > - > + mem_cgroup_hugetlb_file_init(hugetlb_max_hstate - 1); > parsed_hstate = h; > } > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index d8b3513..4900b72 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -5123,6 +5123,51 @@ static void mem_cgroup_destroy(struct cgroup_subsys *ss, > mem_cgroup_put(memcg); > } > > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > +static char *mem_fmt(char *buf, unsigned long n) > +{ > + if (n >= (1UL << 30)) > + sprintf(buf, "%luGB", n >> 30); > + else if (n >= (1UL << 20)) > + sprintf(buf, "%luMB", n >> 20); > + else > + sprintf(buf, "%luKB", n >> 10); > + return buf; > +} > + > +int mem_cgroup_hugetlb_file_init(int idx) > +{ __init ? And... 
do we have guarantee that this function is called before creating root mem cgroup even if CONFIG_HUGETLBFS=y ? Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964880Ab2CSDC1 (ORCPT ); Sun, 18 Mar 2012 23:02:27 -0400 Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:56685 "EHLO fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752751Ab2CSDCZ (ORCPT ); Sun, 18 Mar 2012 23:02:25 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4F66A15B.7070804@jp.fujitsu.com> Date: Mon, 19 Mar 2012 12:00:43 +0900 From: KAMEZAWA Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > hugepage_activelist will be used to track currently used HugeTLB pages. > We need to find the in-use HugeTLB pages to support memcg removal. > On memcg removal we update the page's memory cgroup to point to > parent cgroup. > > Signed-off-by: Aneesh Kumar K.V Reviewed-by: KAMEZAWA Hiroyuki seems ok to me but...why the new list is not per node ? no benefit ? 
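For reference, a minimal userspace sketch of the list handling this patch relies on. It assumes a simplified doubly linked list; the names lnode, lnode_init, lnode_empty and lnode_move are hypothetical stand-ins, not the kernel's list.h API. It illustrates why initializing the entry once (as the patch does in prep_new_huge_page()) makes a later "move" safe even when the entry is not yet on any list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct lnode {
	struct lnode *next, *prev;
};

/* Make a node point at itself: an empty list head or an unlinked entry. */
static void lnode_init(struct lnode *n)
{
	n->next = n;
	n->prev = n;
}

static int lnode_empty(const struct lnode *head)
{
	return head->next == head;
}

/*
 * Unlink e from wherever it currently is, then insert it right after head.
 * Unlinking a self-linked (freshly initialized) entry is a harmless no-op,
 * which is why the entry must be initialized before its first move.
 */
static void lnode_move(struct lnode *e, struct lnode *head)
{
	assert(e != head);
	e->prev->next = e->next;
	e->next->prev = e->prev;

	e->next = head->next;
	e->prev = head;
	head->next->prev = e;
	head->next = e;
}
```

With one free list and one active list, moving an entry to the active list on dequeue and back to the free list on free keeps every page on exactly one list at any time, which is what lets the removal path walk the in-use pages.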
Thanks, -Kame > --- > include/linux/hugetlb.h | 1 + > mm/hugetlb.c | 23 ++++++++++++++++++----- > 2 files changed, 19 insertions(+), 5 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index cbd8dc5..6919100 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -217,6 +217,7 @@ struct hstate { > unsigned long resv_huge_pages; > unsigned long surplus_huge_pages; > unsigned long nr_overcommit_huge_pages; > + struct list_head hugepage_activelist; > struct list_head hugepage_freelists[MAX_NUMNODES]; > unsigned int nr_huge_pages_node[MAX_NUMNODES]; > unsigned int free_huge_pages_node[MAX_NUMNODES]; > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 684849a..8fd465d 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -433,7 +433,7 @@ void copy_huge_page(struct page *dst, struct page *src) > static void enqueue_huge_page(struct hstate *h, struct page *page) > { > int nid = page_to_nid(page); > - list_add(&page->lru, &h->hugepage_freelists[nid]); > + list_move(&page->lru, &h->hugepage_freelists[nid]); > h->free_huge_pages++; > h->free_huge_pages_node[nid]++; > } > @@ -445,7 +445,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid) > if (list_empty(&h->hugepage_freelists[nid])) > return NULL; > page = list_entry(h->hugepage_freelists[nid].next, struct page, lru); > - list_del(&page->lru); > + list_move(&page->lru, &h->hugepage_activelist); > set_page_refcounted(page); > h->free_huge_pages--; > h->free_huge_pages_node[nid]--; > @@ -542,13 +542,14 @@ static void free_huge_page(struct page *page) > page->mapping = NULL; > BUG_ON(page_count(page)); > BUG_ON(page_mapcount(page)); > - INIT_LIST_HEAD(&page->lru); > > if (mapping) > mem_cgroup_hugetlb_uncharge_page(hstate_index(h), > pages_per_huge_page(h), page); > spin_lock(&hugetlb_lock); > if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { > + /* remove the page from active list */ > + list_del(&page->lru); > 
update_and_free_page(h, page); > h->surplus_huge_pages--; > h->surplus_huge_pages_node[nid]--; > @@ -562,6 +563,7 @@ static void free_huge_page(struct page *page) > > static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) > { > + INIT_LIST_HEAD(&page->lru); > set_compound_page_dtor(page, free_huge_page); > spin_lock(&hugetlb_lock); > h->nr_huge_pages++; > @@ -1861,6 +1863,7 @@ void __init hugetlb_add_hstate(unsigned order) > h->free_huge_pages = 0; > for (i = 0; i < MAX_NUMNODES; ++i) > INIT_LIST_HEAD(&h->hugepage_freelists[i]); > + INIT_LIST_HEAD(&h->hugepage_activelist); > h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]); > h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); > snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", > @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, > page = pte_page(pte); > if (pte_dirty(pte)) > set_page_dirty(page); > - list_add(&page->lru, &page_list); > + > + spin_lock(&hugetlb_lock); > + list_move(&page->lru, &page_list); > + spin_unlock(&hugetlb_lock); > } > spin_unlock(&mm->page_table_lock); > flush_tlb_range(vma, start, end); > mmu_notifier_invalidate_range_end(mm, start, end); > list_for_each_entry_safe(page, tmp, &page_list, lru) { > page_remove_rmap(page); > - list_del(&page->lru); > + /* > + * We need to move it back huge page active list. If we are > + * holding the last reference, below put_page will move it > + * back to free list. 
> + */ > + spin_lock(&hugetlb_lock); > + list_move(&page->lru, &h->hugepage_activelist); > + spin_unlock(&hugetlb_lock); > put_page(page); > } > } From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932196Ab2CSDGk (ORCPT ); Sun, 18 Mar 2012 23:06:40 -0400 Received: from fgwmail6.fujitsu.co.jp ([192.51.44.36]:42987 "EHLO fgwmail6.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754741Ab2CSDGj (ORCPT ); Sun, 18 Mar 2012 23:06:39 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4F66A258.5060301@jp.fujitsu.com> Date: Mon, 19 Mar 2012 12:04:56 +0900 From: KAMEZAWA Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1331919570-2264-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/03/17 2:39), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This add support for memcg removal with HugeTLB resource usage. > > Signed-off-by: Aneesh Kumar K.V Seems OK for now. Tejun, Costa, and I are now discussing removing -EBUSY from rmdir(). We're considering 'if use_hierarchy=false and parent seems full, reclaim all or move charges to the root cgroup.' Then -EBUSY will go away. Is that acceptable for hugetlb? Do you have another idea?
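For reference, the uncharge/charge ordering used by mem_cgroup_move_hugetlb_parent() in the patch under review can be sketched in userspace as below. This is only a sketch under stated assumptions: struct counter and its helpers are hypothetical simplifications of res_counter (no locking, no root-cgroup special case), and a counter's parent pointer is assumed to be set only when use_hierarchy is enabled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, lock-free simplification of a hierarchical res_counter. */
struct counter {
	unsigned long usage;
	unsigned long limit;
	struct counter *parent;	/* assumed NULL when use_hierarchy is off */
};

/* Charge c and every ancestor; roll back and fail if any limit is hit. */
static int counter_charge(struct counter *c, unsigned long val)
{
	struct counter *i, *u;

	for (i = c; i != NULL; i = i->parent) {
		if (i->usage + val > i->limit)
			goto undo;
		i->usage += val;
	}
	return 0;
undo:
	for (u = c; u != i; u = u->parent)
		u->usage -= val;
	return -ENOMEM;
}

/* Uncharge c and every ancestor. */
static void counter_uncharge(struct counter *c, unsigned long val)
{
	for (; c != NULL; c = c->parent) {
		assert(c->usage >= val);
		c->usage -= val;
	}
}

/*
 * Move one page's charge from child to parent.
 * Hierarchical: the parent already accounts for the child's usage, so
 * uncharging the child first frees exactly the room the parent charge needs.
 * Non-hierarchical: charge the parent first, so that a failure leaves the
 * child's accounting untouched and the caller sees -EBUSY.
 */
static int move_to_parent(struct counter *child, struct counter *parent,
			  int use_hierarchy, unsigned long size)
{
	if (use_hierarchy) {
		counter_uncharge(child, size);
		return counter_charge(parent, size);
	}
	if (counter_charge(parent, size))
		return -EBUSY;
	counter_uncharge(child, size);
	return 0;
}
```

In this model the -EBUSY case can only arise without use_hierarchy, when the parent is already at its own limit; that is the case the rmdir() discussion above is about.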
Thanks, -Kame > --- > include/linux/hugetlb.h | 6 ++++ > include/linux/memcontrol.h | 15 +++++++++- > mm/hugetlb.c | 41 ++++++++++++++++++++++++++ > mm/memcontrol.c | 68 +++++++++++++++++++++++++++++++++++++------ > 4 files changed, 119 insertions(+), 11 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 6919100..32e948c 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -349,11 +349,17 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) > #ifdef CONFIG_MEM_RES_CTLR_HUGETLB > extern int register_hugetlb_memcg_files(struct cgroup *cgroup, > struct cgroup_subsys *ss); > +extern int hugetlb_force_memcg_empty(struct cgroup *cgroup); > #else > static inline int register_hugetlb_memcg_files(struct cgroup *cgroup, > struct cgroup_subsys *ss) > { > return 0; > } > + > +static inline int hugetlb_force_memcg_empty(struct cgroup *cgroup) > +{ > + return 0; > +} > #endif > #endif /* _LINUX_HUGETLB_H */ > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 73900b9..0980122 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -441,7 +441,9 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages, > extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > struct mem_cgroup *memcg); > extern int mem_cgroup_hugetlb_file_init(int idx); > - > +extern int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, > + struct page *page); > +extern bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup); > #else > static inline int > mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > @@ -477,6 +479,17 @@ static inline int mem_cgroup_hugetlb_file_init(int idx) > return 0; > } > > +static inline int > +mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, > + struct page *page) > +{ > + return 0; > +} > + > +static inline bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup) > +{ 
> + return 0; > +} > #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ > #endif /* _LINUX_MEMCONTROL_H */ > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 8fd465d..685f0d5 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1842,6 +1842,47 @@ int register_hugetlb_memcg_files(struct cgroup *cgroup, > } > return ret; > } > + > +/* > + * Force the memcg to empty the hugetlb resources by moving them to > + * the parent cgroup. We can fail if the parent cgroup's limit prevented > + * the charging. This should only happen if use_hierarchy is not set. > + */ > +int hugetlb_force_memcg_empty(struct cgroup *cgroup) > +{ > + struct hstate *h; > + struct page *page; > + int ret = 0, idx = 0; > + > + do { > + if (cgroup_task_count(cgroup) || !list_empty(&cgroup->children)) > + goto out; > + /* > + * If the task doing the cgroup_rmdir got a signal > + * we don't really need to loop till the hugetlb resource > + * usage become zero. > + */ > + if (signal_pending(current)) { > + ret = -EINTR; > + goto out; > + } > + for_each_hstate(h) { > + spin_lock(&hugetlb_lock); > + list_for_each_entry(page, &h->hugepage_activelist, lru) { > + ret = mem_cgroup_move_hugetlb_parent(idx, cgroup, page); > + if (ret) { > + spin_unlock(&hugetlb_lock); > + goto out; > + } > + } > + spin_unlock(&hugetlb_lock); > + idx++; > + } > + cond_resched(); > + } while (mem_cgroup_have_hugetlb_usage(cgroup)); > +out: > + return ret; > +} > #endif > > /* Should be called on processing a hugepagesz=... 
option */ > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4900b72..e29d86d 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -3171,9 +3171,11 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, > #endif > > #ifdef CONFIG_MEM_RES_CTLR_HUGETLB > -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > +bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup) > { > int idx; > + struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup); > + > for (idx = 0; idx < hugetlb_max_hstate; idx++) { > if (memcg->hugepage[idx].usage > 0) > return 1; > @@ -3285,10 +3287,57 @@ void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages, > res_counter_uncharge(&memcg->hugepage[idx], csize); > return; > } > -#else > -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > + > +int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup, > + struct page *page) > { > - return 0; > + struct page_cgroup *pc; > + int csize, ret = 0; > + struct res_counter *fail_res; > + struct cgroup *pcgrp = cgroup->parent; > + struct mem_cgroup *parent = mem_cgroup_from_cont(pcgrp); > + struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup); > + > + if (!get_page_unless_zero(page)) > + goto out; > + > + pc = lookup_page_cgroup(page); > + lock_page_cgroup(pc); > + if (!PageCgroupUsed(pc) || pc->mem_cgroup != memcg) > + goto err_out; > + > + csize = PAGE_SIZE << compound_order(page); > + /* > + * uncharge from child and charge the parent. If we have > + * use_hierarchy set, we can never fail here. In-order to make > + * sure we don't get -ENOMEM on parent charge, we first uncharge > + * the child and then charge the parent. 
> + */ > + if (parent->use_hierarchy) { > + res_counter_uncharge(&memcg->hugepage[idx], csize); > + if (!mem_cgroup_is_root(parent)) > + ret = res_counter_charge(&parent->hugepage[idx], > + csize, &fail_res); > + } else { > + if (!mem_cgroup_is_root(parent)) { > + ret = res_counter_charge(&parent->hugepage[idx], > + csize, &fail_res); > + if (ret) { > + ret = -EBUSY; > + goto err_out; > + } > + } > + res_counter_uncharge(&memcg->hugepage[idx], csize); > + } > + /* > + * caller should have done css_get > + */ > + pc->mem_cgroup = parent; > +err_out: > + unlock_page_cgroup(pc); > + put_page(page); > +out: > + return ret; > } > #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ > > @@ -3806,6 +3855,11 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all) > /* should free all ? */ > if (free_all) > goto try_to_free; > + > + /* move the hugetlb charges */ > + ret = hugetlb_force_memcg_empty(cgrp); > + if (ret) > + goto out; > move_account: > do { > ret = -EBUSY; > @@ -5103,12 +5157,6 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss, > struct cgroup *cont) > { > struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); > - /* > - * Don't allow memcg removal if we have HugeTLB resource > - * usage. 
> - */ > - if (mem_cgroup_have_hugetlb_usage(memcg)) > - return -EBUSY; > > return mem_cgroup_force_empty(memcg, false); > } From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932314Ab2CSGiR (ORCPT ); Mon, 19 Mar 2012 02:38:17 -0400 Received: from e23smtp09.au.ibm.com ([202.81.31.142]:42338 "EHLO e23smtp09.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756805Ab2CSGiP (ORCPT ); Mon, 19 Mar 2012 02:38:15 -0400 From: "Aneesh Kumar K.V" To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values In-Reply-To: <4F6695EC.2060208@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F6695EC.2060208@jp.fujitsu.com> User-Agent: Notmuch/0.11.1+190~g31a336a (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Mon, 19 Mar 2012 12:07:47 +0530 Message-ID: <877gyhksec.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii x-cbid: 12031821-3568-0000-0000-00000161AF27 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 19 Mar 2012 11:11:56 +0900, KAMEZAWA Hiroyuki wrote: > (2012/03/17 2:39), Aneesh Kumar K.V wrote: > > > From: "Aneesh Kumar K.V" > > > > Using VM_FAULT_* codes with ERR_PTR will require us to make sure > > VM_FAULT_* values will not exceed MAX_ERRNO value. > > > > Signed-off-by: Aneesh Kumar K.V > > > Is this a bug fix ? No. Currently the values of VM_FAULT_* codes are all below MAX_ERRNO. The changes in the patch are done based on the suggestion from Andrew. 
http://article.gmane.org/gmane.linux.kernel.cgroups/1160 > Reviewed-by: KAMEZAWA Hiroyuki > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1030672Ab2CSGxt (ORCPT ); Mon, 19 Mar 2012 02:53:49 -0400 Received: from e28smtp07.in.ibm.com ([122.248.162.7]:40313 "EHLO e28smtp07.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S964923Ab2CSGxr (ORCPT ); Mon, 19 Mar 2012 02:53:47 -0400 From: "Aneesh Kumar K.V" To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension In-Reply-To: <4F669C2E.1010502@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> User-Agent: Notmuch/0.11.1+190~g31a336a (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Mon, 19 Mar 2012 12:22:53 +0530 Message-ID: <874ntlkrp6.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii x-cbid: 12031906-8878-0000-0000-000001B7B31F Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 19 Mar 2012 11:38:38 +0900, KAMEZAWA Hiroyuki wrote: > (2012/03/17 2:39), Aneesh Kumar K.V wrote: > > > From: "Aneesh Kumar K.V" > > > > This patch implements a memcg extension that allows us to control > > HugeTLB allocations via memory controller. 
> > > > > If you write some details here, it will be helpful for review and > seeing log after merge. Will add more info. > > > > Signed-off-by: Aneesh Kumar K.V > > --- > > include/linux/hugetlb.h | 1 + > > include/linux/memcontrol.h | 42 +++++++++++++ > > init/Kconfig | 8 +++ > > mm/hugetlb.c | 2 +- > > mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ > > 5 files changed, 190 insertions(+), 1 deletions(-) .... > > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > > +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > > +{ > > + int idx; > > + for (idx = 0; idx < hugetlb_max_hstate; idx++) { > > + if (memcg->hugepage[idx].usage > 0) > > + return 1; > > + } > > + return 0; > > +} > > > Please use res_counter_read_u64() rather than reading the value directly. > The open-coded variant is mostly derived from mem_cgroup_force_empty. I have updated the patch to use res_counter_read_u64. > > > + > > +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > > + struct mem_cgroup **ptr) > > +{ > > + int ret = 0; > > + struct mem_cgroup *memcg; > > + struct res_counter *fail_res; > > + unsigned long csize = nr_pages * PAGE_SIZE; > > + > > + if (mem_cgroup_disabled()) > > + return 0; > > +again: > > + rcu_read_lock(); > > + memcg = mem_cgroup_from_task(current); > > + if (!memcg) > > + memcg = root_mem_cgroup; > > + if (mem_cgroup_is_root(memcg)) { > > + rcu_read_unlock(); > > + goto done; > > + } > > > One concern is.... Now, yes, memory cgroup doesn't account root cgroup > and doesn't update res->usage to avoid updating shared counter overheads > when memcg is not mounted. But memory.usage_in_bytes files works > for root memcg with reading percpu statistics. > > So, how about counting usage for root cgroup even if it cannot be limited ? > Considering hugetlb fs usage, updating res_counter here doesn't have > performance problem of false sharing.. > Then, you can remove root_mem_cgroup() checks inserted several places. > Yes. 
That is a good idea. Will update the patch. > > > > struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); > > + /* > > + * Don't allow memcg removal if we have HugeTLB resource > > + * usage. > > + */ > > + if (mem_cgroup_have_hugetlb_usage(memcg)) > > + return -EBUSY; > > > > return mem_cgroup_force_empty(memcg, false); > > } > > > Is this fixed by patch 8+9 ? Yes. -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757987Ab2CSHBt (ORCPT ); Mon, 19 Mar 2012 03:01:49 -0400 Received: from e23smtp09.au.ibm.com ([202.81.31.142]:41094 "EHLO e23smtp09.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755444Ab2CSHBr (ORCPT ); Mon, 19 Mar 2012 03:01:47 -0400 From: "Aneesh Kumar K.V" To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free In-Reply-To: <4F669CC3.9070007@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669CC3.9070007@jp.fujitsu.com> User-Agent: Notmuch/0.11.1+190~g31a336a (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Mon, 19 Mar 2012 12:31:36 +0530 Message-ID: <871uopkran.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii x-cbid: 12031821-3568-0000-0000-00000161B7CD Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 19 Mar 2012 11:41:07 +0900, KAMEZAWA Hiroyuki wrote: > (2012/03/17 2:39), Aneesh Kumar K.V wrote: > > > From: "Aneesh Kumar K.V" > > > > This adds necessary charge/uncharge calls in the HugeTLB code > > > > Acked-by: Hillf Danton > > Signed-off-by: Aneesh Kumar K.V > > > 
Reviewed-by: KAMEZAWA Hiroyuki > A nitpick below. > > > --- > > mm/hugetlb.c | 21 ++++++++++++++++++++- > > mm/memcontrol.c | 5 +++++ > > 2 files changed, 25 insertions(+), 1 deletions(-) > > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index c672187..91361a0 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -21,6 +21,8 @@ > > #include > > #include > > #include > > +#include > > +#include > > > > #include > > #include > > @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page) > > BUG_ON(page_mapcount(page)); > > INIT_LIST_HEAD(&page->lru); > > > > + if (mapping) > > + mem_cgroup_hugetlb_uncharge_page(hstate_index(h), > > + pages_per_huge_page(h), page); > > spin_lock(&hugetlb_lock); > > if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { > > update_and_free_page(h, page); > > @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h, > > static struct page *alloc_huge_page(struct vm_area_struct *vma, > > unsigned long addr, int avoid_reserve) > > { > > + int ret, idx; > > struct hstate *h = hstate_vma(vma); > > struct page *page; > > + struct mem_cgroup *memcg = NULL; > > > Can't we this initialization in mem_cgroup_hugetlb_charge_page() ? > Will update in the next iteration. 
-aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757996Ab2CSHCw (ORCPT ); Mon, 19 Mar 2012 03:02:52 -0400 Received: from fgwmail6.fujitsu.co.jp ([192.51.44.36]:35287 "EHLO fgwmail6.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752428Ab2CSHCv (ORCPT ); Mon, 19 Mar 2012 03:02:51 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4F66D993.2080100@jp.fujitsu.com> Date: Mon, 19 Mar 2012 16:00:35 +0900 From: KAMEZAWA Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> <874ntlkrp6.fsf@linux.vnet.ibm.com> In-Reply-To: <874ntlkrp6.fsf@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/03/19 15:52), Aneesh Kumar K.V wrote: > On Mon, 19 Mar 2012 11:38:38 +0900, KAMEZAWA Hiroyuki wrote: >> (2012/03/17 2:39), Aneesh Kumar K.V wrote: >> >>> From: "Aneesh Kumar K.V" >>> >>> This patch implements a memcg extension that allows us to control >>> HugeTLB allocations via memory controller. >>> >> >> >> If you write some details here, it will be helpful for review and >> seeing log after merge. > > Will add more info. 
> >> >> >>> Signed-off-by: Aneesh Kumar K.V >>> --- >>> include/linux/hugetlb.h | 1 + >>> include/linux/memcontrol.h | 42 +++++++++++++ >>> init/Kconfig | 8 +++ >>> mm/hugetlb.c | 2 +- >>> mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ >>> 5 files changed, 190 insertions(+), 1 deletions(-) > > .... > >>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB >>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) >>> +{ >>> + int idx; >>> + for (idx = 0; idx < hugetlb_max_hstate; idx++) { >>> + if (memcg->hugepage[idx].usage > 0) >>> + return 1; >>> + } >>> + return 0; >>> +} >> >> >> Please use res_counter_read_u64() rather than reading the value directly. >> > > The open-coded variant is mostly derived from mem_cgroup_force_empty. I > have updated the patch to use res_counter_read_u64. > Ah, ok. it's(maybe) my bad. I'll schedule a fix. Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758167Ab2CSHOY (ORCPT ); Mon, 19 Mar 2012 03:14:24 -0400 Received: from e23smtp01.au.ibm.com ([202.81.31.143]:53631 "EHLO e23smtp01.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758051Ab2CSHOW (ORCPT ); Mon, 19 Mar 2012 03:14:22 -0400 From: "Aneesh Kumar K.V" To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs In-Reply-To: <4F66A059.20801@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F66A059.20801@jp.fujitsu.com> User-Agent: Notmuch/0.11.1+190~g31a336a (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Mon, 19 Mar 2012 12:44:11 +0530 Message-ID: 
<87wr6hjc58.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii x-cbid: 12031821-1618-0000-0000-00000118E3F2 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 19 Mar 2012 11:56:25 +0900, KAMEZAWA Hiroyuki wrote: > (2012/03/17 2:39), Aneesh Kumar K.V wrote: > > > From: "Aneesh Kumar K.V" > > > > This add control files for hugetlbfs in memcg > > > > Signed-off-by: Aneesh Kumar K.V > > > I have a question. When a user does > > 1. create memory cgroup as > /cgroup/A > 2. insmod hugetlb.ko > 3. ls /cgroup/A > > and then, files can be shown ? Don't we have any problem at rmdir A ? > > I'm sorry if hugetlb never be used as module. HUGETLBFS cannot be built as a kernel module. > > a comment below. > > > --- > > include/linux/hugetlb.h | 17 +++++++++++++++ > > include/linux/memcontrol.h | 7 ++++++ > > mm/hugetlb.c | 25 ++++++++++++++++++++++- > > mm/memcontrol.c | 48 ++++++++++++++++++++++++++++++++++++++++++++ > > 4 files changed, 96 insertions(+), 1 deletions(-) ...... > > > > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > > +static char *mem_fmt(char *buf, unsigned long n) > > +{ > > + if (n >= (1UL << 30)) > > + sprintf(buf, "%luGB", n >> 30); > > + else if (n >= (1UL << 20)) > > + sprintf(buf, "%luMB", n >> 20); > > + else > > + sprintf(buf, "%luKB", n >> 10); > > + return buf; > > +} > > + > > +int mem_cgroup_hugetlb_file_init(int idx) > > +{ > > > __init ? Added. > And... do we have guarantee that this function is called before > creating root mem cgroup even if CONFIG_HUGETLBFS=y ? > Yes. This should be called before creating root mem cgroup.
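For reference, a minimal userspace sketch of the size formatting done by mem_fmt() in the quoted patch. The name mem_fmt_sketch and the extra length parameter are assumptions added here so the sketch can use snprintf() and stay bounded; the patch itself uses sprintf() into a caller-supplied buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a byte count using the largest unit (GB, MB, KB) that it reaches,
 * truncating any remainder, as the quoted mem_fmt() does.
 * Hypothetical variant: takes the buffer length and uses snprintf().
 */
static char *mem_fmt_sketch(char *buf, size_t len, unsigned long n)
{
	assert(buf != NULL && len > 0);
	if (n >= (1UL << 30))
		snprintf(buf, len, "%luGB", n >> 30);
	else if (n >= (1UL << 20))
		snprintf(buf, len, "%luMB", n >> 20);
	else
		snprintf(buf, len, "%luKB", n >> 10);
	return buf;
}
```

For a 2 MB huge page size this yields "2MB", and for a 1 GB size "1GB". How the resulting string is combined into the per-hstate control file names is not visible in the quoted excerpt, so that part is left out here.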
-aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758138Ab2CSHf7 (ORCPT ); Mon, 19 Mar 2012 03:35:59 -0400 Received: from fgwmail6.fujitsu.co.jp ([192.51.44.36]:36543 "EHLO fgwmail6.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752298Ab2CSHf5 (ORCPT ); Mon, 19 Mar 2012 03:35:57 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4F66E169.5000909@jp.fujitsu.com> Date: Mon, 19 Mar 2012 16:34:01 +0900 From: KAMEZAWA Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Tejun Heo Subject: Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F66A059.20801@jp.fujitsu.com> <87wr6hjc58.fsf@linux.vnet.ibm.com> In-Reply-To: <87wr6hjc58.fsf@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/03/19 16:14), Aneesh Kumar K.V wrote: > On Mon, 19 Mar 2012 11:56:25 +0900, KAMEZAWA Hiroyuki wrote: >> (2012/03/17 2:39), Aneesh Kumar K.V wrote: >> >>> From: "Aneesh Kumar K.V" >>> >>> This add control files for hugetlbfs in memcg >>> >>> Signed-off-by: Aneesh Kumar K.V >> >> >> I have a question. When a user does >> >> 1. create memory cgroup as >> /cgroup/A >> 2. insmod hugetlb.ko >> 3. ls /cgroup/A >> >> and then, files can be shown ? Don't we have any problem at rmdir A ? >> >> I'm sorry if hugetlb never be used as module. 
> > HUGETLBFS cannot be build as kernel module > > >> >> a comment below. >> >>> --- >>> include/linux/hugetlb.h | 17 +++++++++++++++ >>> include/linux/memcontrol.h | 7 ++++++ >>> mm/hugetlb.c | 25 ++++++++++++++++++++++- >>> mm/memcontrol.c | 48 ++++++++++++++++++++++++++++++++++++++++++++ >>> 4 files changed, 96 insertions(+), 1 deletions(-) > > > ...... > >>> >>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB >>> +static char *mem_fmt(char *buf, unsigned long n) >>> +{ >>> + if (n >= (1UL << 30)) >>> + sprintf(buf, "%luGB", n >> 30); >>> + else if (n >= (1UL << 20)) >>> + sprintf(buf, "%luMB", n >> 20); >>> + else >>> + sprintf(buf, "%luKB", n >> 10); >>> + return buf; >>> +} >>> + >>> +int mem_cgroup_hugetlb_file_init(int idx) >>> +{ >> >> >> __init ? > > Added . > >> And... do we have guarantee that this function is called before >> creating root mem cgroup even if CONFIG_HUGETLBFS=y ? >> > > Yes. This should be called before creating root mem cgroup. > O.K. BTW, please read Tejun's recent post.. https://lkml.org/lkml/2012/3/16/522 Can you use his methods ? I guess you can write... CGROUP_SUBSYS_CFTYLES_COND(mem_cgroup_subsys, hugetlb_cgroup_files, if XXXXMB hugetlb is allowed); Hmm. 
Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758663Ab2CSJPh (ORCPT ); Mon, 19 Mar 2012 05:15:37 -0400 Received: from e28smtp01.in.ibm.com ([122.248.162.1]:48225 "EHLO e28smtp01.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754726Ab2CSJPf (ORCPT ); Mon, 19 Mar 2012 05:15:35 -0400 From: "Aneesh Kumar K.V" To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal In-Reply-To: <4F66A258.5060301@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F66A258.5060301@jp.fujitsu.com> User-Agent: Notmuch/0.11.1+190~g31a336a (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Mon, 19 Mar 2012 14:30:24 +0530 Message-ID: <87r4wpj787.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii x-cbid: 12031909-4790-0000-0000-000001DA4D92 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 19 Mar 2012 12:04:56 +0900, KAMEZAWA Hiroyuki wrote: > (2012/03/17 2:39), Aneesh Kumar K.V wrote: > > > From: "Aneesh Kumar K.V" > > > > This add support for memcg removal with HugeTLB resource usage. > > > > Signed-off-by: Aneesh Kumar K.V > > > seems ok for now. > > Now, Tejun and Costa, and I are discussing removing -EBUSY from rmdir(). > We're now considering 'if use_hierarchy=false and parent seems full, > reclaim all or move charges to the root cgroup.' then -EBUSY will go away. > > Is it acceptable for hugetlb ? Do you have another idea ? > That should work even for hugetlb.
-aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758680Ab2CSJUD (ORCPT ); Mon, 19 Mar 2012 05:20:03 -0400 Received: from e28smtp06.in.ibm.com ([122.248.162.6]:38400 "EHLO e28smtp06.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758495Ab2CSJUC (ORCPT ); Mon, 19 Mar 2012 05:20:02 -0400 From: "Aneesh Kumar K.V" To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages In-Reply-To: <4F66A15B.7070804@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F66A15B.7070804@jp.fujitsu.com> User-Agent: Notmuch/0.11.1+190~g31a336a (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Mon, 19 Mar 2012 14:29:14 +0530 Message-ID: <87ty1lj7a5.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii x-cbid: 12031908-9574-0000-0000-000001DF3811 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 19 Mar 2012 12:00:43 +0900, KAMEZAWA Hiroyuki wrote: > (2012/03/17 2:39), Aneesh Kumar K.V wrote: > > > From: "Aneesh Kumar K.V" > > > > hugepage_activelist will be used to track currently used HugeTLB pages. > > We need to find the in-use HugeTLB pages to support memcg removal. > > On memcg removal we update the page's memory cgroup to point to > > parent cgroup. > > > > Signed-off-by: Aneesh Kumar K.V > > > Reviewed-by: KAMEZAWA Hiroyuki > > seems ok to me but...why the new list is not per node ? no benefit ? > I am not sure whether having per node will bring any performance benefit. 
For cgroup removal we need to look at all the list entries anyway. -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758837Ab2CSMJD (ORCPT ); Mon, 19 Mar 2012 08:09:03 -0400 Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:47425 "EHLO fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757065Ab2CSMJA (ORCPT ); Mon, 19 Mar 2012 08:09:00 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4F672165.4050506@jp.fujitsu.com> Date: Mon, 19 Mar 2012 21:07:01 +0900 From: KAMEZAWA Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Glauber Costa CC: "Aneesh Kumar K.V" , linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> <874ntlkrp6.fsf@linux.vnet.ibm.com> <4F66D993.2080100@jp.fujitsu.com> <4F671AE6.5020204@parallels.com> In-Reply-To: <4F671AE6.5020204@parallels.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/03/19 20:39), Glauber Costa wrote: > On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote: >> (2012/03/19 15:52), Aneesh Kumar K.V wrote: >> >>> On Mon, 19 Mar 2012 11:38:38 +0900, KAMEZAWA Hiroyuki wrote: >>>> (2012/03/17 2:39), Aneesh Kumar K.V wrote: >>>> >>>>> From: "Aneesh Kumar K.V" >>>>> >>>>> This patch implements a memcg extension that allows us to control >>>>> HugeTLB allocations via memory controller. 
>>>>> >>>> >>>> >>>> If you write some details here, it will be helpful for review and >>>> seeing log after merge. >>> >>> Will add more info. >>> >>>> >>>> >>>>> Signed-off-by: Aneesh Kumar K.V >>>>> --- >>>>> include/linux/hugetlb.h | 1 + >>>>> include/linux/memcontrol.h | 42 +++++++++++++ >>>>> init/Kconfig | 8 +++ >>>>> mm/hugetlb.c | 2 +- >>>>> mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ >>>>> 5 files changed, 190 insertions(+), 1 deletions(-) >>> >>> .... >>> >>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB >>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) >>>>> +{ >>>>> + int idx; >>>>> + for (idx = 0; idx< hugetlb_max_hstate; idx++) { >>>>> + if (memcg->hugepage[idx].usage> 0) >>>>> + return 1; >>>>> + } >>>>> + return 0; >>>>> +} >>>> >>>> >>>> Please use res_counter_read_u64() rather than reading the value directly. >>>> >>> >>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I >>> have updated the patch to use res_counter_read_u64. >>> >> >> Ah, ok. it's(maybe) my bad. I'll schedule a fix. >> > Kame, > > I actually have it ready here. I can submit it if you want. > That's good :) please post. (But I'm sorry I'll be absent tomorrow.) 
Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756816Ab2CTJoG (ORCPT ); Tue, 20 Mar 2012 05:44:06 -0400 Received: from e28smtp02.in.ibm.com ([122.248.162.2]:39913 "EHLO e28smtp02.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755260Ab2CTJoE (ORCPT ); Tue, 20 Mar 2012 05:44:04 -0400 From: "Aneesh Kumar K.V" To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Tejun Heo Subject: Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs In-Reply-To: <4F66E169.5000909@jp.fujitsu.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F66A059.20801@jp.fujitsu.com> <87wr6hjc58.fsf@linux.vnet.ibm.com> <4F66E169.5000909@jp.fujitsu.com>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Tue, 20 Mar 2012 14:52:20 +0530 Message-ID: <874ntjtynn.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii x-cbid: 12032009-5816-0000-0000-000001D448E3 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org KAMEZAWA Hiroyuki writes: > > O.K. BTW, please read Tejun's recent post.. > > https://lkml.org/lkml/2012/3/16/522 > > Can you use his methods ? > > I guess you can write... > > CGROUP_SUBSYS_CFTYLES_COND(mem_cgroup_subsys, > hugetlb_cgroup_files, > if XXXXMB hugetlb is allowed); > I may not be able to do CGROUP_SUBSYS_CFTYPES_COND(). But as long as we are able to dynamically add new control files, we should be ok. 
-aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754111Ab2CUEtB (ORCPT ); Wed, 21 Mar 2012 00:49:01 -0400 Received: from e28smtp01.in.ibm.com ([122.248.162.1]:49840 "EHLO e28smtp01.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753438Ab2CUEs7 (ORCPT ); Wed, 21 Mar 2012 00:48:59 -0400 From: "Aneesh Kumar K.V" To: Glauber Costa , KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension In-Reply-To: <4F671AE6.5020204@parallels.com> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> <874ntlkrp6.fsf@linux.vnet.ibm.com> <4F66D993.2080100@jp.fujitsu.com> <4F671AE6.5020204@parallels.com>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Wed, 21 Mar 2012 10:18:43 +0530 Message-ID: <87obrqsgno.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii x-cbid: 12032104-4790-0000-0000-000001E228CF Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Glauber Costa writes: > On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote: >> (2012/03/19 15:52), Aneesh Kumar K.V wrote: >> >>> >>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB >>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) >>>>> +{ >>>>> + int idx; >>>>> + for (idx = 0; idx< hugetlb_max_hstate; idx++) { >>>>> + if (memcg->hugepage[idx].usage> 0) >>>>> + return 1; >>>>> + } >>>>> + return 0; >>>>> +} >>>> >>>> >>>> Please use res_counter_read_u64() rather than reading the value directly. 
>>>> >>> >>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I >>> have updated the patch to use res_counter_read_u64. >>> >> >> Ah, ok. it's(maybe) my bad. I'll schedule a fix. >> > Kame, > > I actually have it ready here. I can submit it if you want. > > This one has bitten me as well when I was trying to experiment with the > res_counter performance... Do we really need memcg.res.usage to be accurate in that while loop ? If we miss a zero update because we encountered a partial update; in the next loop we will find it zero right ? -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754102Ab2CUFYt (ORCPT ); Wed, 21 Mar 2012 01:24:49 -0400 Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:41596 "EHLO fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751035Ab2CUFYs (ORCPT ); Wed, 21 Mar 2012 01:24:48 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4F6965AC.4070004@jp.fujitsu.com> Date: Wed, 21 Mar 2012 14:22:52 +0900 From: KAMEZAWA Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: Glauber Costa , linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4F669C2E.1010502@jp.fujitsu.com> <874ntlkrp6.fsf@linux.vnet.ibm.com> <4F66D993.2080100@jp.fujitsu.com> <4F671AE6.5020204@parallels.com>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) <87obrqsgno.fsf@linux.vnet.ibm.com> In-Reply-To: <87obrqsgno.fsf@linux.vnet.ibm.com> 
Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/03/21 13:48), Aneesh Kumar K.V wrote: > Glauber Costa writes: > >> On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote: >>> (2012/03/19 15:52), Aneesh Kumar K.V wrote: >>> >>>> >>>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB >>>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) >>>>>> +{ >>>>>> + int idx; >>>>>> + for (idx = 0; idx< hugetlb_max_hstate; idx++) { >>>>>> + if (memcg->hugepage[idx].usage> 0) >>>>>> + return 1; >>>>>> + } >>>>>> + return 0; >>>>>> +} >>>>> >>>>> >>>>> Please use res_counter_read_u64() rather than reading the value directly. >>>>> >>>> >>>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I >>>> have updated the patch to use res_counter_read_u64. >>>> >>> >>> Ah, ok. it's(maybe) my bad. I'll schedule a fix. >>> >> Kame, >> >> I actually have it ready here. I can submit it if you want. >> >> This one has bitten me as well when I was trying to experiment with the >> res_counter performance... > > Do we really need memcg.res.usage to be accurate in that while loop ? If > we miss a zero update because we encountered a partial update; in the > next loop we will find it zero right ? > At rmdir(), I assume there is no task in memcg. It means res->usage never increases and no other thread than force_empty will touch res->counter. So, I think memcg->res.usage > 0 will never be wrong and we'll find correct comparison by continuing the loop. But recent kmem accounting et al. may break the assumption (I'm not fully sure..) So, I think it will be good to use res_counter_read_u64(). This part is not important for performance, anyway.
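The "partial update" worry discussed above can be illustrated outside the kernel: where a 64-bit value is not read in one access, a reader that takes the same lock as the writers cannot observe a half-written value. The following is only a user-space sketch under that assumption. struct counter and the counter_* functions are hypothetical names using a pthread mutex; they are not the res_counter API, which is not shown in this thread.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a usage counter; not the kernel's res_counter. */
struct counter {
	pthread_mutex_t lock;
	uint64_t usage;
};

static void counter_init(struct counter *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->usage = 0;
}

static void counter_charge(struct counter *c, uint64_t val)
{
	pthread_mutex_lock(&c->lock);
	c->usage += val;
	pthread_mutex_unlock(&c->lock);
}

static void counter_uncharge(struct counter *c, uint64_t val)
{
	pthread_mutex_lock(&c->lock);
	/* Clamp at zero rather than wrapping around on over-uncharge. */
	c->usage -= (val > c->usage) ? c->usage : val;
	pthread_mutex_unlock(&c->lock);
}

/* Read under the writers' lock so a torn 64-bit value is never observed. */
static uint64_t counter_read_u64(struct counter *c)
{
	uint64_t val;

	pthread_mutex_lock(&c->lock);
	val = c->usage;
	pthread_mutex_unlock(&c->lock);
	return val;
}

/* Shape of the "any usage left?" loop, using the accessor for each index. */
static int counters_have_usage(struct counter *counters, int n)
{
	for (int idx = 0; idx < n; idx++) {
		if (counter_read_u64(&counters[idx]) > 0)
			return 1;
	}
	return 0;
}
```

As noted in the reply above, this path is not performance critical, so the cost of taking the lock for each read is unlikely to matter there.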
Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932252Ab2C1JZv (ORCPT ); Wed, 28 Mar 2012 05:25:51 -0400 Received: from cantor2.suse.de ([195.135.220.15]:47397 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932187Ab2C1JZu (ORCPT ); Wed, 28 Mar 2012 05:25:50 -0400 Date: Wed, 28 Mar 2012 11:25:48 +0200 From: Michal Hocko To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values Message-ID: <20120328092547.GC20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 16-03-12 23:09:22, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Using VM_FAULT_* codes with ERR_PTR will require us to make sure > VM_FAULT_* values will not exceed MAX_ERRNO value. > > Signed-off-by: Aneesh Kumar K.V > --- > mm/hugetlb.c | 18 +++++++++++++----- > 1 files changed, 13 insertions(+), 5 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index d623e71..3782da8 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c [...] 
> @@ -1047,7 +1047,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, > page = alloc_buddy_huge_page(h, NUMA_NO_NODE); > if (!page) { > hugetlb_put_quota(inode->i_mapping, chg); > - return ERR_PTR(-VM_FAULT_SIGBUS); > + return ERR_PTR(-ENOSPC); Hmm, so one error code abuse replaced by another? I know that ENOMEM would revert 4a6018f7 which would be unfortunate but ENOSPC doesn't feel right as well. > } > } > > @@ -2395,6 +2395,7 @@ retry_avoidcopy: > new_page = alloc_huge_page(vma, address, outside_reserve); > > if (IS_ERR(new_page)) { > + int err = PTR_ERR(new_page); > page_cache_release(old_page); > > /* > @@ -2424,7 +2425,10 @@ retry_avoidcopy: > > /* Caller expects lock to be held */ > spin_lock(&mm->page_table_lock); > - return -PTR_ERR(new_page); > + if (err == -ENOMEM) > + return VM_FAULT_OOM; > + else > + return VM_FAULT_SIGBUS; > } > > /* > @@ -2542,7 +2546,11 @@ retry: > goto out; > page = alloc_huge_page(vma, address, 0); > if (IS_ERR(page)) { > - ret = -PTR_ERR(page); > + ret = PTR_ERR(page); > + if (ret == -ENOMEM) > + ret = VM_FAULT_OOM; > + else > + ret = VM_FAULT_SIGBUS; > goto out; > } > clear_huge_page(page, address, pages_per_huge_page(h)); > -- > 1.7.9 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757771Ab2C1JSR (ORCPT ); Wed, 28 Mar 2012 05:18:17 -0400 Received: from cantor2.suse.de ([195.135.220.15]:46353 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757669Ab2C1JSQ (ORCPT ); Wed, 28 Mar 2012 05:18:16 -0400 Date: Wed, 28 Mar 2012 11:18:11 +0200 From: Michal Hocko To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 01/10] hugetlb: rename max_hstate to hugetlb_max_hstate Message-ID: <20120328091811.GB20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1331919570-2264-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [Sorry for late review] On Fri 16-03-12 23:09:21, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > We will be using this from other subsystems like memcg > in later patches. OK, why not. I would probably loved an accessor function more but what ever. 
Acked-by: Michal Hocko > > Acked-by: Hillf Danton > Signed-off-by: Aneesh Kumar K.V > --- > mm/hugetlb.c | 14 +++++++------- > 1 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 5f34bd8..d623e71 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; > static gfp_t htlb_alloc_mask = GFP_HIGHUSER; > unsigned long hugepages_treat_as_movable; > > -static int max_hstate; > +static int hugetlb_max_hstate; > > unsigned int default_hstate_idx; > struct hstate hstates[HUGE_MAX_HSTATE]; > > @@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages; > static unsigned long __initdata default_hstate_size; > > #define for_each_hstate(h) \ > - for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++) > + for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++) > > /* > * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages > @@ -1808,9 +1808,9 @@ void __init hugetlb_add_hstate(unsigned order) > printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n"); > return; > } > - BUG_ON(max_hstate >= HUGE_MAX_HSTATE); > + BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE); > BUG_ON(order == 0); > - h = &hstates[max_hstate++]; > + h = &hstates[hugetlb_max_hstate++]; > h->order = order; > h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1); > h->nr_huge_pages = 0; > @@ -1831,10 +1831,10 @@ static int __init hugetlb_nrpages_setup(char *s) > static unsigned long *last_mhp; > > /* > - * !max_hstate means we haven't parsed a hugepagesz= parameter yet, > + * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet, > * so this hugepages= parameter goes to the "default hstate". 
> */ > - if (!max_hstate) > + if (!hugetlb_max_hstate) > mhp = &default_hstate_max_huge_pages; > else > mhp = &parsed_hstate->max_huge_pages; > @@ -1853,7 +1853,7 @@ static int __init hugetlb_nrpages_setup(char *s) > * But we need to allocate >= MAX_ORDER hstates here early to still > * use the bootmem allocator. > */ > - if (max_hstate && parsed_hstate->order >= MAX_ORDER) > + if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER) > hugetlb_hstate_alloc_pages(parsed_hstate); > > last_mhp = mhp; > -- > 1.7.9 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932498Ab2C1LdK (ORCPT ); Wed, 28 Mar 2012 07:33:10 -0400 Received: from cantor2.suse.de ([195.135.220.15]:33090 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932285Ab2C1LdI (ORCPT ); Wed, 28 Mar 2012 07:33:08 -0400 Date: Wed, 28 Mar 2012 13:33:04 +0200 From: Michal Hocko To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Message-ID: <20120328113304.GE20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This patch implements a memcg extension that allows us to control > 
HugeTLB allocations via memory controller. And the infrastructure is not used at this stage (you forgot to mention). The changelog should be much more descriptive. > > Signed-off-by: Aneesh Kumar K.V > --- > include/linux/hugetlb.h | 1 + > include/linux/memcontrol.h | 42 +++++++++++++ > init/Kconfig | 8 +++ > mm/hugetlb.c | 2 +- > mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 190 insertions(+), 1 deletions(-) > [...] > diff --git a/init/Kconfig b/init/Kconfig > index 3f42cd6..f0eb8aa 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -725,6 +725,14 @@ config CGROUP_PERF > > Say N if unsure. > > +config MEM_RES_CTLR_HUGETLB > + bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)" > + depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL > + default n > + help > + Add HugeTLB management to memory resource controller. When you > + enable this, you can put a per cgroup limit on HugeTLB usage. How does it interact with the hard/soft limists etc... [...] > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 6728a7a..4b36c5e 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -235,6 +235,10 @@ struct mem_cgroup { > */ > struct res_counter memsw; > /* > + * the counter to account for hugepages from hugetlb. > + */ > + struct res_counter hugepage[HUGE_MAX_HSTATE]; > + /* > * Per cgroup active and inactive list, similar to the > * per zone LRU lists. > */ > @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, > } > #endif > > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB > +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg) > +{ > + int idx; > + for (idx = 0; idx < hugetlb_max_hstate; idx++) { Maybe we should expose for_each_hstate as well... 
> + if (memcg->hugepage[idx].usage > 0) > + return 1; > + } > + return 0; > +} > + > +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages, > + struct mem_cgroup **ptr) > +{ > + int ret = 0; > + struct mem_cgroup *memcg; > + struct res_counter *fail_res; > + unsigned long csize = nr_pages * PAGE_SIZE; > + > + if (mem_cgroup_disabled()) > + return 0; > +again: > + rcu_read_lock(); > + memcg = mem_cgroup_from_task(current); > + if (!memcg) > + memcg = root_mem_cgroup; > + if (mem_cgroup_is_root(memcg)) { > + rcu_read_unlock(); > + goto done; > + } > + if (!css_tryget(&memcg->css)) { > + rcu_read_unlock(); > + goto again; > + } > + rcu_read_unlock(); > + > + ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res); > + css_put(&memcg->css); > +done: > + *ptr = memcg; Why do we set ptr even for the failure case after we dropped a reference? > + return ret; > +} > + > +void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages, > + struct mem_cgroup *memcg, > + struct page *page) > +{ > + struct page_cgroup *pc; > + > + if (mem_cgroup_disabled()) > + return; > + > + pc = lookup_page_cgroup(page); > + lock_page_cgroup(pc); > + if (unlikely(PageCgroupUsed(pc))) { > + unlock_page_cgroup(pc); > + mem_cgroup_hugetlb_uncharge_memcg(idx, nr_pages, memcg); > + return; > + } > + pc->mem_cgroup = memcg; > + /* > + * We access a page_cgroup asynchronously without lock_page_cgroup(). > + * Especially when a page_cgroup is taken from a page, pc->mem_cgroup > + * is accessed after testing USED bit. To make pc->mem_cgroup visible > + * before USED bit, we need memory barrier here. > + * See mem_cgroup_add_lru_list(), etc. > + */ > + smp_wmb(); Is this really necessary for hugetlb pages as well? > + SetPageCgroupUsed(pc); > + > + unlock_page_cgroup(pc); > + return; > +} > + [...] 
> @@ -4887,6 +5013,7 @@ err_cleanup: > static struct cgroup_subsys_state * __ref > mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > { > + int idx; > struct mem_cgroup *memcg, *parent; > long error = -ENOMEM; > int node; > @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > * mem_cgroup(see mem_cgroup_put). > */ > mem_cgroup_get(parent); > + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) Do we have to init all hstates or is hugetlb_max_hstate enough? > + res_counter_init(&memcg->hugepage[idx], > + &parent->hugepage[idx]); > } else { > res_counter_init(&memcg->res, NULL); > res_counter_init(&memcg->memsw, NULL); > + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) > + res_counter_init(&memcg->hugepage[idx], NULL); Same here -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932536Ab2C1LgC (ORCPT ); Wed, 28 Mar 2012 07:36:02 -0400 Received: from e23smtp04.au.ibm.com ([202.81.31.146]:46746 "EHLO e23smtp04.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932365Ab2C1LgA (ORCPT ); Wed, 28 Mar 2012 07:36:00 -0400 From: "Aneesh Kumar K.V" To: Michal Hocko Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values In-Reply-To: <20120328092547.GC20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328092547.GC20949@tiehlicka.suse.cz>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Wed, 28 Mar 2012 17:05:49 +0530 Message-ID: 
<87vclpyn3e.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii x-cbid: 12032801-9264-0000-0000-0000012461BA Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Michal Hocko writes: > On Fri 16-03-12 23:09:22, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> Using VM_FAULT_* codes with ERR_PTR will require us to make sure >> VM_FAULT_* values will not exceed MAX_ERRNO value. >> >> Signed-off-by: Aneesh Kumar K.V >> --- >> mm/hugetlb.c | 18 +++++++++++++----- >> 1 files changed, 13 insertions(+), 5 deletions(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index d623e71..3782da8 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c > [...] >> @@ -1047,7 +1047,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, >> page = alloc_buddy_huge_page(h, NUMA_NO_NODE); >> if (!page) { >> hugetlb_put_quota(inode->i_mapping, chg); >> - return ERR_PTR(-VM_FAULT_SIGBUS); >> + return ERR_PTR(-ENOSPC); > > Hmm, so one error code abuse replaced by another? > I know that ENOMEM would revert 4a6018f7 which would be unfortunate but > ENOSPC doesn't feel right as well. > File systems do map ENOSPC to SIGBUS. block_page_mkwrite_return() does that. 
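To make the reasoning behind patch 02/10 concrete, here is a small user-space sketch of the error-pointer convention and of the errno-to-fault translation the patch performs at the call sites. The lower-case helpers imitate the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() in spirit only, and the FAULT_* constants are placeholder values chosen for this example, not taken from the kernel headers.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* largest errno an error pointer can encode */

/* Placeholder fault codes for illustration only. */
enum { FAULT_OOM = 0x1, FAULT_SIGBUS = 0x2 };

/* Encode a negative errno in a pointer value. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the negative errno from an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Only the top MAX_ERRNO values of the address space count as errors. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Translate the allocator's errno into a fault code at the call site. */
static inline int alloc_err_to_fault(long err)
{
	return err == -ENOMEM ? FAULT_OOM : FAULT_SIGBUS;
}
```

A value such as -(MAX_ERRNO + 1) passed through err_ptr() is not recognised by is_err(). That is the hazard the changelog describes: fault codes carried in an error pointer would have to stay at or below MAX_ERRNO. Returning a plain errno and translating it afterwards avoids that constraint.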

-aneesh

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757868Ab2C1NRK (ORCPT ); Wed, 28 Mar 2012 09:17:10 -0400
Received: from cantor2.suse.de ([195.135.220.15]:41922 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755458Ab2C1NRI (ORCPT ); Wed, 28 Mar 2012 09:17:08 -0400
Date: Wed, 28 Mar 2012 15:17:06 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com,
	dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org,
	hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
Message-ID: <20120328131706.GF20949@tiehlicka.suse.cz>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> This adds necessary charge/uncharge calls in the HugeTLB code

This begs for more description...
Other than that it looks correct.
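
As a rough aid for the description asked for here, the charge/rollback
ordering used by the patch quoted in this message can be modelled in
userspace. This is only a sketch under stated assumptions: struct counter,
counter_charge(), counter_uncharge() and alloc_charged() are hypothetical
stand-ins and do not correspond to the res_counter or memcg API.

```c
#include <stdbool.h>

/* Hypothetical byte counter standing in for one per-hstate res_counter. */
struct counter {
	unsigned long usage;
	unsigned long limit;
};

/* Try to add 'size' bytes; fail without side effects if over the limit. */
static int counter_charge(struct counter *c, unsigned long size)
{
	if (c->usage + size > c->limit)
		return -1;
	c->usage += size;
	return 0;
}

static void counter_uncharge(struct counter *c, unsigned long size)
{
	c->usage -= size;
}

/*
 * Model of the allocation path: charge first, then try to get a page,
 * and roll the charge back if no page is available. Returns true when
 * the page was obtained and the charge is kept.
 */
static bool alloc_charged(struct counter *c, unsigned long size,
			  bool page_available)
{
	if (counter_charge(c, size))
		return false;	/* over the limit: caller reports an error */
	if (!page_available) {
		counter_uncharge(c, size);	/* undo the charge on failure */
		return false;
	}
	return true;	/* charge stays until the page is freed */
}
```

Charging before the allocation means a group that is already at its limit
fails early, and the only cleanup needed on a later allocation failure is
a single uncharge of the same size.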

> Acked-by: Hillf Danton
> Signed-off-by: Aneesh Kumar K.V
> ---
>  mm/hugetlb.c    |   21 ++++++++++++++++++++-
>  mm/memcontrol.c |    5 +++++
>  2 files changed, 25 insertions(+), 1 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c672187..91361a0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -21,6 +21,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>
>  #include
>  #include
> @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page)
>  	BUG_ON(page_mapcount(page));
>  	INIT_LIST_HEAD(&page->lru);
>
> +	if (mapping)
> +		mem_cgroup_hugetlb_uncharge_page(hstate_index(h),
> +						 pages_per_huge_page(h), page);
>  	spin_lock(&hugetlb_lock);
>  	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
>  		update_and_free_page(h, page);
> @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h,
>  static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  				    unsigned long addr, int avoid_reserve)
>  {
> +	int ret, idx;
>  	struct hstate *h = hstate_vma(vma);
>  	struct page *page;
> +	struct mem_cgroup *memcg = NULL;
>  	struct address_space *mapping = vma->vm_file->f_mapping;
>  	struct inode *inode = mapping->host;
>  	long chg;
>
> +	idx = hstate_index(h);
>  	/*
>  	 * Processes that did not create the mapping will have no reserves and
>  	 * will not have accounted against quota. Check that the quota can be
> @@ -1039,6 +1047,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  		if (hugetlb_get_quota(inode->i_mapping, chg))
>  			return ERR_PTR(-ENOSPC);
>
> +	ret = mem_cgroup_hugetlb_charge_page(idx, pages_per_huge_page(h),
> +					     &memcg);
> +	if (ret) {
> +		hugetlb_put_quota(inode->i_mapping, chg);
> +		return ERR_PTR(-ENOSPC);
> +	}
>  	spin_lock(&hugetlb_lock);
>  	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
>  	spin_unlock(&hugetlb_lock);
> @@ -1046,6 +1060,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  	if (!page) {
>  		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>  		if (!page) {
> +			mem_cgroup_hugetlb_uncharge_memcg(idx,
> +							  pages_per_huge_page(h),
> +							  memcg);
>  			hugetlb_put_quota(inode->i_mapping, chg);
>  			return ERR_PTR(-ENOSPC);
>  		}
> @@ -1054,7 +1071,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  	set_page_private(page, (unsigned long) mapping);
>
>  	vma_commit_reservation(h, vma, addr);
> -
> +	/* update page cgroup details */
> +	mem_cgroup_hugetlb_commit_charge(idx, pages_per_huge_page(h),
> +					 memcg, page);
>  	return page;
>  }
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4b36c5e..7a9ea94 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2901,6 +2901,11 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>
>  	if (PageSwapCache(page))
>  		return NULL;
> +	/*
> +	 * HugeTLB page uncharge happen in the HugeTLB compound page destructor
> +	 */
> +	if (PageHuge(page))
> +		return NULL;
>
>  	if (PageTransHuge(page)) {
>  		nr_pages <<= compound_order(page);
> -- 
> 1.7.9
>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758039Ab2C1NkY (ORCPT ); Wed, 28 Mar 2012 09:40:24 -0400
Received: from cantor2.suse.de ([195.135.220.15]:44325 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751467Ab2C1NkW (ORCPT ); Wed, 28 Mar 2012 09:40:22 -0400
Date: Wed, 28 Mar 2012 15:40:20 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com,
	dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org,
	hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
Message-ID: <20120328134020.GG20949@tiehlicka.suse.cz>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6728a7a..4b36c5e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
[...]
> @@ -4887,6 +5013,7 @@ err_cleanup:
>  static struct cgroup_subsys_state * __ref
>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  {
> +	int idx;
>  	struct mem_cgroup *memcg, *parent;
>  	long error = -ENOMEM;
>  	int node;
> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  		 * mem_cgroup(see mem_cgroup_put).
>  		 */
>  		mem_cgroup_get(parent);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> +			res_counter_init(&memcg->hugepage[idx],
> +					 &parent->hugepage[idx]);

Hmm, I do not think we want to make groups deeper in the hierarchy
unlimited as we cannot reclaim. Shouldn't we copy the limit from the parent?
Still not ideal but slightly more expected behavior IMO.
The hierarchy setups are still interesting and the limitations should be
described in the documentation...

>  	} else {
>  		res_counter_init(&memcg->res, NULL);
>  		res_counter_init(&memcg->memsw, NULL);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> +			res_counter_init(&memcg->hugepage[idx], NULL);
>  	}
>  	memcg->last_scanned_node = MAX_NUMNODES;
>  	INIT_LIST_HEAD(&memcg->oom_notify);

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758070Ab2C1Nzz (ORCPT ); Wed, 28 Mar 2012 09:55:55 -0400
Received: from e28smtp06.in.ibm.com ([122.248.162.6]:35587
	"EHLO e28smtp06.in.ibm.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1757933Ab2C1Nzx (ORCPT );
	Wed, 28 Mar 2012 09:55:53 -0400
From: "Aneesh Kumar K.V"
To: Michal Hocko
Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com,
	dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org,
	hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
In-Reply-To: <20120328113304.GE20949@tiehlicka.suse.cz>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<20120328113304.GE20949@tiehlicka.suse.cz>
User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org)
	Emacs/23.3.1 (x86_64-pc-linux-gnu)
Date: Wed, 28 Mar 2012 19:10:36 +0530
Message-ID: <87d37wetd7.fsf@skywalker.in.ibm.com>
MIME-Version: 1.0
Content-Type:
	text/plain; charset=us-ascii
x-cbid: 12032813-9574-0000-0000-00000201D181
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Michal Hocko writes:

> On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V"
>>
>> This patch implements a memcg extension that allows us to control
>> HugeTLB allocations via memory controller.
>
> And the infrastructure is not used at this stage (you forgot to
> mention).
> The changelog should be much more descriptive.

Will update the changelog.

>
>>
>> Signed-off-by: Aneesh Kumar K.V
>> ---
>>  include/linux/hugetlb.h    |    1 +
>>  include/linux/memcontrol.h |   42 +++++++++++++
>>  init/Kconfig               |    8 +++
>>  mm/hugetlb.c               |    2 +-
>>  mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 190 insertions(+), 1 deletions(-)
>>
> [...]
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 3f42cd6..f0eb8aa 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -725,6 +725,14 @@ config CGROUP_PERF
>>
>>  	  Say N if unsure.
>>
>> +config MEM_RES_CTLR_HUGETLB
>> +	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
>> +	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
>> +	default n
>> +	help
>> +	  Add HugeTLB management to memory resource controller. When you
>> +	  enable this, you can put a per cgroup limit on HugeTLB usage.
>
> How does it interact with the hard/soft limits etc...

There is no softlimit support for HugeTLB extension.

>
> [...]
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 6728a7a..4b36c5e 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -235,6 +235,10 @@ struct mem_cgroup {
>>  	 */
>>  	struct res_counter memsw;
>>  	/*
>> +	 * the counter to account for hugepages from hugetlb.
>> +	 */
>> +	struct res_counter hugepage[HUGE_MAX_HSTATE];
>> +	/*
>>  	 * Per cgroup active and inactive list, similar to the
>>  	 * per zone LRU lists.
>>  	 */
>> @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>>  }
>>  #endif
>>
>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>> +{
>> +	int idx;
>> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
>
> Maybe we should expose for_each_hstate as well...

That will not really help here. If we use for_each_hstate then we will
need to use hstate_index to get the index.

>
>> +		if (memcg->hugepage[idx].usage > 0)
>> +			return 1;
>> +	}
>> +	return 0;
>> +}
>> +
>> +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
>> +				   struct mem_cgroup **ptr)
>> +{
>> +	int ret = 0;
>> +	struct mem_cgroup *memcg;
>> +	struct res_counter *fail_res;
>> +	unsigned long csize = nr_pages * PAGE_SIZE;
>> +
>> +	if (mem_cgroup_disabled())
>> +		return 0;
>> +again:
>> +	rcu_read_lock();
>> +	memcg = mem_cgroup_from_task(current);
>> +	if (!memcg)
>> +		memcg = root_mem_cgroup;
>> +	if (mem_cgroup_is_root(memcg)) {
>> +		rcu_read_unlock();
>> +		goto done;
>> +	}
>> +	if (!css_tryget(&memcg->css)) {
>> +		rcu_read_unlock();
>> +		goto again;
>> +	}
>> +	rcu_read_unlock();
>> +
>> +	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
>> +	css_put(&memcg->css);
>> +done:
>> +	*ptr = memcg;
>
> Why do we set ptr even for the failure case after we dropped a
> reference?

That ensures that *ptr is NULL.

>
>> +	return ret;
>> +}
>> +
>> +void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
>> +				      struct mem_cgroup *memcg,
>> +				      struct page *page)
>> +{
>> +	struct page_cgroup *pc;
>> +
>> +	if (mem_cgroup_disabled())
>> +		return;
>> +
>> +	pc = lookup_page_cgroup(page);
>> +	lock_page_cgroup(pc);
>> +	if (unlikely(PageCgroupUsed(pc))) {
>> +		unlock_page_cgroup(pc);
>> +		mem_cgroup_hugetlb_uncharge_memcg(idx, nr_pages, memcg);
>> +		return;
>> +	}
>> +	pc->mem_cgroup = memcg;
>> +	/*
>> +	 * We access a page_cgroup asynchronously without lock_page_cgroup().
>> +	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
>> +	 * is accessed after testing USED bit. To make pc->mem_cgroup visible
>> +	 * before USED bit, we need memory barrier here.
>> +	 * See mem_cgroup_add_lru_list(), etc.
>> +	 */
>> +	smp_wmb();
>
> Is this really necessary for hugetlb pages as well?

I used to do that in cgroup_rmdir path, I later changed that part of the
code. I will look at the patches again to see if we really need this.

>
>> +	SetPageCgroupUsed(pc);
>> +
>> +	unlock_page_cgroup(pc);
>> +	return;
>> +}
>> +
> [...]
>> @@ -4887,6 +5013,7 @@ err_cleanup:
>>  static struct cgroup_subsys_state * __ref
>>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>>  {
>> +	int idx;
>>  	struct mem_cgroup *memcg, *parent;
>>  	long error = -ENOMEM;
>>  	int node;
>> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>>  		 * mem_cgroup(see mem_cgroup_put).
>>  		 */
>>  		mem_cgroup_get(parent);
>> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
>
> Do we have to init all hstates or is hugetlb_max_hstate enough?

Yes. We do call mem_cgroup_create for root cgroup before initializing
hugetlb hstate.
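
The ordering argument in this reply can be modelled in userspace. The
following is a toy sketch, not the memcg code: struct counter, struct
group, counter_init(), group_create() and the value 3 for HUGE_MAX_HSTATE
are hypothetical stand-ins chosen for illustration. The only point it
makes is that a loop bounded by the runtime count initialises nothing if
the root group is created while that count is still zero.

```c
#include <stddef.h>

#define HUGE_MAX_HSTATE 3	/* compile-time upper bound (value assumed) */

/* Runtime number of registered huge page sizes; 0 until hugetlb init runs. */
static int hugetlb_max_hstate;

/* Hypothetical stand-in for struct res_counter. */
struct counter {
	int initialized;
	struct counter *parent;
};

/* Hypothetical stand-in for the per-group array of hugepage counters. */
struct group {
	struct counter hugepage[HUGE_MAX_HSTATE];
};

static void counter_init(struct counter *c, struct counter *parent)
{
	c->initialized = 1;
	c->parent = parent;
}

/* Initialise the first 'bound' counters; return how many were set up. */
static int group_create(struct group *g, struct group *parent, int bound)
{
	int idx;

	for (idx = 0; idx < bound; idx++)
		counter_init(&g->hugepage[idx],
			     parent ? &parent->hugepage[idx] : NULL);
	return idx;
}
```

Calling group_create() with hugetlb_max_hstate as the bound before any
huge page size is registered sets up zero counters, whereas the
compile-time bound HUGE_MAX_HSTATE always covers the whole array at the
cost of initialising entries that may never be used.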

>
>> +			res_counter_init(&memcg->hugepage[idx],
>> +					 &parent->hugepage[idx]);
>>  	} else {
>>  		res_counter_init(&memcg->res, NULL);
>>  		res_counter_init(&memcg->memsw, NULL);
>> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
>> +			res_counter_init(&memcg->hugepage[idx], NULL);
>
> Same here
> -- 

-aneesh

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758114Ab2C1N6v (ORCPT ); Wed, 28 Mar 2012 09:58:51 -0400
Received: from cantor2.suse.de ([195.135.220.15]:45922 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752788Ab2C1N6u (ORCPT ); Wed, 28 Mar 2012 09:58:50 -0400
Date: Wed, 28 Mar 2012 15:58:46 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com,
	dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org,
	hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages
Message-ID: <20120328135845.GH20949@tiehlicka.suse.cz>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 16-03-12 23:09:28, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> hugepage_activelist will be used to track currently used HugeTLB pages.
> We need to find the in-use HugeTLB pages to support memcg removal.
> On memcg removal we update the page's memory cgroup to point to
> parent cgroup.
>
> Signed-off-by: Aneesh Kumar K.V
> ---
>  include/linux/hugetlb.h |    1 +
>  mm/hugetlb.c            |   23 ++++++++++++++++++-----
>  2 files changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index cbd8dc5..6919100 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
[...]
> @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
>  		page = pte_page(pte);
>  		if (pte_dirty(pte))
>  			set_page_dirty(page);
> -		list_add(&page->lru, &page_list);
> +
> +		spin_lock(&hugetlb_lock);
> +		list_move(&page->lru, &page_list);
> +		spin_unlock(&hugetlb_lock);

Why do we really need the spinlock here?

>  	}
>  	spin_unlock(&mm->page_table_lock);
>  	flush_tlb_range(vma, start, end);
>  	mmu_notifier_invalidate_range_end(mm, start, end);
>  	list_for_each_entry_safe(page, tmp, &page_list, lru) {
>  		page_remove_rmap(page);
> -		list_del(&page->lru);
> +		/*
> +		 * We need to move it back huge page active list. If we are
> +		 * holding the last reference, below put_page will move it
> +		 * back to free list.
> +		 */
> +		spin_lock(&hugetlb_lock);
> +		list_move(&page->lru, &h->hugepage_activelist);
> +		spin_unlock(&hugetlb_lock);

This spinlock usage doesn't look nice but I guess we do not have many
other options.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758104Ab2C1OHg (ORCPT ); Wed, 28 Mar 2012 10:07:36 -0400
Received: from cantor2.suse.de ([195.135.220.15]:46462 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1757767Ab2C1OHf (ORCPT ); Wed, 28 Mar 2012 10:07:35 -0400
Date: Wed, 28 Mar 2012 16:07:33 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com,
	dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org,
	hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal
Message-ID: <20120328140733.GI20949@tiehlicka.suse.cz>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1331919570-2264-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1331919570-2264-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 16-03-12 23:09:29, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> This add support for memcg removal with HugeTLB resource usage.
>
> Signed-off-by: Aneesh Kumar K.V
> ---
>  include/linux/hugetlb.h    |    6 ++++
>  include/linux/memcontrol.h |   15 +++++++++-
>  mm/hugetlb.c               |   41 ++++++++++++++++++++++++++
>  mm/memcontrol.c            |   68 +++++++++++++++++++++++++++++++++++++------
>  4 files changed, 119 insertions(+), 11 deletions(-)
>
[...]
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8fd465d..685f0d5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
[...]
> @@ -3285,10 +3287,57 @@ void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
>  	res_counter_uncharge(&memcg->hugepage[idx], csize);
>  	return;
>  }
> -#else
> -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> +
> +int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup,
> +				   struct page *page)
>  {
> -	return 0;
> +	struct page_cgroup *pc;
> +	int csize, ret = 0;
> +	struct res_counter *fail_res;
> +	struct cgroup *pcgrp = cgroup->parent;
> +	struct mem_cgroup *parent = mem_cgroup_from_cont(pcgrp);
> +	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
> +
> +	if (!get_page_unless_zero(page))
> +		goto out;
> +
> +	pc = lookup_page_cgroup(page);
> +	lock_page_cgroup(pc);
> +	if (!PageCgroupUsed(pc) || pc->mem_cgroup != memcg)
> +		goto err_out;
> +
> +	csize = PAGE_SIZE << compound_order(page);
> +	/*
> +	 * uncharge from child and charge the parent. If we have
> +	 * use_hierarchy set, we can never fail here. In-order to make
> +	 * sure we don't get -ENOMEM on parent charge, we first uncharge
> +	 * the child and then charge the parent.
> +	 */
> +	if (parent->use_hierarchy) {
> +		res_counter_uncharge(&memcg->hugepage[idx], csize);
> +		if (!mem_cgroup_is_root(parent))
> +			ret = res_counter_charge(&parent->hugepage[idx],
> +						 csize, &fail_res);

You can still race with other hugetlb charge which would make this fail.

> +	} else {
> +		if (!mem_cgroup_is_root(parent)) {
> +			ret = res_counter_charge(&parent->hugepage[idx],
> +						 csize, &fail_res);
> +			if (ret) {
> +				ret = -EBUSY;
> +				goto err_out;
> +			}
> +		}
> +		res_counter_uncharge(&memcg->hugepage[idx], csize);
> +	}
> +	/*
> +	 * caller should have done css_get
> +	 */
> +	pc->mem_cgroup = parent;
> +err_out:
> +	unlock_page_cgroup(pc);
> +	put_page(page);
> +out:
> +	return ret;
>  }
>  #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
[...]

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756083Ab2C1OhD (ORCPT ); Wed, 28 Mar 2012 10:37:03 -0400
Received: from cantor2.suse.de ([195.135.220.15]:48819 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752127Ab2C1OhB (ORCPT ); Wed, 28 Mar 2012 10:37:01 -0400
Date: Wed, 28 Mar 2012 16:36:58 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com,
	dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org,
	hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH -V4 10/10] memcg: Add memory controller documentation for hugetlb management
Message-ID: <20120328143658.GJ20949@tiehlicka.suse.cz>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1331919570-2264-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1331919570-2264-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 16-03-12 23:09:30, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> Signed-off-by: Aneesh Kumar K.V
> ---
>  Documentation/cgroups/memory.txt |   29 +++++++++++++++++++++++++++++
>  1 files changed, 29 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
> index 4c95c00..d99c41b 100644
> --- a/Documentation/cgroups/memory.txt
> +++ b/Documentation/cgroups/memory.txt
> @@ -43,6 +43,7 @@ Features:
>   - usage threshold notifier
>   - oom-killer disable knob and oom-notifier
>   - Root cgroup has no limit controls.
> + - resource accounting for HugeTLB pages
>
>   Kernel memory support is work in progress, and the current version provides
>   basically functionality. (See Section 2.7)
> @@ -75,6 +76,12 @@ Brief summary of control files.
>   memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory
>   memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation
>
> +
> + memory.hugetlb..limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
> + memory.hugetlb..max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded
> + memory.hugetlb..usage_in_bytes     # show current res_counter usage for "hugepagesize" hugetlb
> +				     # see 5.7 for details
> +
>  1. History
>
>  The memory controller has a long history. A request for comments for the memory
> @@ -279,6 +286,15 @@ per cgroup, instead of globally.
>
>  * tcp memory pressure: sockets memory pressure for the tcp protocol.
>
> +2.8 HugeTLB extension
> +
> +This extension allows to limit the HugeTLB usage per control group and
> +enforces the controller limit during page fault. Since HugeTLB doesn't
> +support page reclaim, enforcing the limit at page fault time implies that,
> +the application will get SIGBUS signal if it tries to access HugeTLB pages
> +beyond its limit.

This is consistent with the quota so we should mention that. We should
also add a note how we interact with quotas.
Another important thing to note is that the limit/usage are unrelated to
memcg hard/soft limit/usage.

> This requires the application to know beforehand how much
> +HugeTLB pages it would require for its use.
> +
>  3. User Interface
>
>  0. Configuration
> @@ -287,6 +303,7 @@ a. Enable CONFIG_CGROUPS
>  b. Enable CONFIG_RESOURCE_COUNTERS
>  c. Enable CONFIG_CGROUP_MEM_RES_CTLR
>  d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension)
> +f. Enable CONFIG_MEM_RES_CTLR_HUGETLB (to use HugeTLB extension)
>
>  1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
>  # mount -t tmpfs none /sys/fs/cgroup
> @@ -510,6 +527,18 @@ unevictable= N0= N1= ...
>
>  And we have total = file + anon + unevictable.
>
> +5.7 HugeTLB resource control files
> +For a system supporting two hugepage size (16M and 16G) the control
> +files include:
> +
> + memory.hugetlb.16GB.limit_in_bytes
> + memory.hugetlb.16GB.max_usage_in_bytes
> + memory.hugetlb.16GB.usage_in_bytes
> + memory.hugetlb.16MB.limit_in_bytes
> + memory.hugetlb.16MB.max_usage_in_bytes
> + memory.hugetlb.16MB.usage_in_bytes
> +
> +
>  6. Hierarchy support
>
>  The memory controller supports a deep hierarchy and hierarchical accounting.
> -- 
> 1.7.9
>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758343Ab2C1Poi (ORCPT ); Wed, 28 Mar 2012 11:44:38 -0400
Received: from cantor2.suse.de ([195.135.220.15]:57265 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1758302Ab2C1Pog (ORCPT ); Wed, 28 Mar 2012 11:44:36 -0400
Date: Wed, 28 Mar 2012 17:44:34 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com,
	dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org,
	hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
Message-ID: <20120328154434.GN20949@tiehlicka.suse.cz>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<20120328113304.GE20949@tiehlicka.suse.cz>
	<87d37wetd7.fsf@skywalker.in.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87d37wetd7.fsf@skywalker.in.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List:
	linux-kernel@vger.kernel.org

On Wed 28-03-12 19:10:36, Aneesh Kumar K.V wrote:
> Michal Hocko writes:
>
> > On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
> >> From: "Aneesh Kumar K.V"
> >>
> >> This patch implements a memcg extension that allows us to control
> >> HugeTLB allocations via memory controller.
> >
> > And the infrastructure is not used at this stage (you forgot to
> > mention).
> > The changelog should be much more descriptive.
>
>
> Will update the changelog.

Thx

>
> >
> >>
> >> Signed-off-by: Aneesh Kumar K.V
> >> ---
> >>  include/linux/hugetlb.h    |    1 +
> >>  include/linux/memcontrol.h |   42 +++++++++++++
> >>  init/Kconfig               |    8 +++
> >>  mm/hugetlb.c               |    2 +-
> >>  mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
> >>  5 files changed, 190 insertions(+), 1 deletions(-)
> >>
> > [...]
> >> diff --git a/init/Kconfig b/init/Kconfig
> >> index 3f42cd6..f0eb8aa 100644
> >> --- a/init/Kconfig
> >> +++ b/init/Kconfig
> >> @@ -725,6 +725,14 @@ config CGROUP_PERF
> >>
> >>  	  Say N if unsure.
> >>
> >> +config MEM_RES_CTLR_HUGETLB
> >> +	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
> >> +	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
> >> +	default n
> >> +	help
> >> +	  Add HugeTLB management to memory resource controller. When you
> >> +	  enable this, you can put a per cgroup limit on HugeTLB usage.
> >
> > How does it interact with the hard/soft limits etc...
>
>
> There is no softlimit support for HugeTLB extension.

Sure, sorry for not being precise. The point was how this interacts with
memcg hard/soft limit (they are independent) etc...

> > [...]
> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> index 6728a7a..4b36c5e 100644
> >> --- a/mm/memcontrol.c
> >> +++ b/mm/memcontrol.c
> >> @@ -235,6 +235,10 @@ struct mem_cgroup {
> >>  	 */
> >>  	struct res_counter memsw;
> >>  	/*
> >> +	 * the counter to account for hugepages from hugetlb.
> >> +	 */
> >> +	struct res_counter hugepage[HUGE_MAX_HSTATE];
> >> +	/*
> >>  	 * Per cgroup active and inactive list, similar to the
> >>  	 * per zone LRU lists.
> >>  	 */
> >> @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
> >>  }
> >>  #endif
> >>
> >> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> >> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> >> +{
> >> +	int idx;
> >> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
> >
> > Maybe we should expose for_each_hstate as well...
>
>
> That will not really help here. If we use for_each_hstate then we will
> need to use hstate_index to get the index.

Fair enough

> >> +		if (memcg->hugepage[idx].usage > 0)
> >> +			return 1;
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >> +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> >> +				   struct mem_cgroup **ptr)
> >> +{
> >> +	int ret = 0;
> >> +	struct mem_cgroup *memcg;
> >> +	struct res_counter *fail_res;
> >> +	unsigned long csize = nr_pages * PAGE_SIZE;
> >> +
> >> +	if (mem_cgroup_disabled())
> >> +		return 0;
> >> +again:
> >> +	rcu_read_lock();
> >> +	memcg = mem_cgroup_from_task(current);
> >> +	if (!memcg)
> >> +		memcg = root_mem_cgroup;
> >> +	if (mem_cgroup_is_root(memcg)) {
> >> +		rcu_read_unlock();
> >> +		goto done;
> >> +	}
> >> +	if (!css_tryget(&memcg->css)) {
> >> +		rcu_read_unlock();
> >> +		goto again;
> >> +	}
> >> +	rcu_read_unlock();
> >> +
> >> +	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
> >> +	css_put(&memcg->css);
> >> +done:
> >> +	*ptr = memcg;
> >
> > Why do we set ptr even for the failure case after we dropped a
> > reference?
>
> That ensures that *ptr is NULL.

Does it? AFAICS res_counter_charge might fail and you would use non NULL
memcg (with a dropped reference).

[...]
> >> +	SetPageCgroupUsed(pc);
> >> +
> >> +	unlock_page_cgroup(pc);
> >> +	return;
> >> +}
> >> +
> > [...]
> >> @@ -4887,6 +5013,7 @@ err_cleanup:
> >>  static struct cgroup_subsys_state * __ref
> >>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >>  {
> >> +	int idx;
> >>  	struct mem_cgroup *memcg, *parent;
> >>  	long error = -ENOMEM;
> >>  	int node;
> >> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >>  		 * mem_cgroup(see mem_cgroup_put).
> >>  		 */
> >>  		mem_cgroup_get(parent);
> >> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> >
> > Do we have to init all hstates or is hugetlb_max_hstate enough?
>
>
> Yes. We do call mem_cgroup_create for root cgroup before initializing
> hugetlb hstate.

drop a comment?

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758372Ab2C1Ri7 (ORCPT ); Wed, 28 Mar 2012 13:38:59 -0400
Received: from e23smtp02.au.ibm.com ([202.81.31.144]:49608
	"EHLO e23smtp02.au.ibm.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1753377Ab2C1Ri5 (ORCPT );
	Wed, 28 Mar 2012 13:38:57 -0400
From: "Aneesh Kumar K.V"
To: Michal Hocko
Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com,
	dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org,
	hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages
In-Reply-To: <20120328135845.GH20949@tiehlicka.suse.cz>
References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<20120328135845.GH20949@tiehlicka.suse.cz>
User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org)
	Emacs/23.3.1 (x86_64-pc-linux-gnu)
Date: Wed, 28 Mar 2012 23:08:34 +0530
Message-ID: <87vclo1v8l.fsf@skywalker.in.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
x-cbid:
12032807-5490-0000-0000-00000108B888 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Michal Hocko writes: > On Fri 16-03-12 23:09:28, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> hugepage_activelist will be used to track currently used HugeTLB pages. >> We need to find the in-use HugeTLB pages to support memcg removal. >> On memcg removal we update the page's memory cgroup to point to >> parent cgroup. >> >> Signed-off-by: Aneesh Kumar K.V >> --- >> include/linux/hugetlb.h | 1 + >> mm/hugetlb.c | 23 ++++++++++++++++++----- >> 2 files changed, 19 insertions(+), 5 deletions(-) >> >> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h >> index cbd8dc5..6919100 100644 >> --- a/include/linux/hugetlb.h >> +++ b/include/linux/hugetlb.h > [...] >> @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, >> page = pte_page(pte); >> if (pte_dirty(pte)) >> set_page_dirty(page); >> - list_add(&page->lru, &page_list); >> + >> + spin_lock(&hugetlb_lock); >> + list_move(&page->lru, &page_list); >> + spin_unlock(&hugetlb_lock); > > Why do we really need the spinlock here? It does a list_del from hugepage_activelist. > >> } >> spin_unlock(&mm->page_table_lock); >> flush_tlb_range(vma, start, end); >> mmu_notifier_invalidate_range_end(mm, start, end); >> list_for_each_entry_safe(page, tmp, &page_list, lru) { >> page_remove_rmap(page); >> - list_del(&page->lru); >> + /* >> + * We need to move it back huge page active list. If we are >> + * holding the last reference, below put_page will move it >> + * back to free list. >> + */ >> + spin_lock(&hugetlb_lock); >> + list_move(&page->lru, &h->hugepage_activelist); >> + spin_unlock(&hugetlb_lock); > > This spinlock usage doesn't look nice but I guess we do not have many > other options. 
> -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758634Ab2C1Rhe (ORCPT ); Wed, 28 Mar 2012 13:37:34 -0400 Received: from e23smtp09.au.ibm.com ([202.81.31.142]:36606 "EHLO e23smtp09.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753377Ab2C1Rhc (ORCPT ); Wed, 28 Mar 2012 13:37:32 -0400 From: "Aneesh Kumar K.V" To: Michal Hocko Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension In-Reply-To: <20120328134020.GG20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328134020.GG20949@tiehlicka.suse.cz>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Wed, 28 Mar 2012 23:07:14 +0530 Message-ID: <87y5qk1vat.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii x-cbid: 12032808-3568-0000-0000-0000016EEC3B Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Michal Hocko writes: > On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote: > [...] >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> index 6728a7a..4b36c5e 100644 >> --- a/mm/memcontrol.c >> +++ b/mm/memcontrol.c > [...] >> @@ -4887,6 +5013,7 @@ err_cleanup: >> static struct cgroup_subsys_state * __ref >> mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) >> { >> + int idx; >> struct mem_cgroup *memcg, *parent; >> long error = -ENOMEM; >> int node; >> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) >> * mem_cgroup(see mem_cgroup_put). 
>> */ >> mem_cgroup_get(parent); >> + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) >> + res_counter_init(&memcg->hugepage[idx], >> + &parent->hugepage[idx]); > > Hmm, I do not think we want to make groups deeper in the hierarchy > unlimited as we cannot reclaim. Shouldn't we copy the limit from the parent? > Still not ideal but slightly more expected behavior IMO. But we should be limiting the child group based on parent's limit only when hierarchy is set right ? > > The hierarchy setups are still interesting and the limitations should be > described in the documentation... > It should behave similar to memcg. ie, if hierarchy is set, then we limit using MIN(parent's limit, child's limit). May be I am missing some of the details of memcg use_hierarchy config. My goal was to keep it similar to memcg. Can you explain why do you think the patch would make it any different ? -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758563Ab2C1Rjv (ORCPT ); Wed, 28 Mar 2012 13:39:51 -0400 Received: from e23smtp08.au.ibm.com ([202.81.31.141]:48190 "EHLO e23smtp08.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753377Ab2C1Rjt (ORCPT ); Wed, 28 Mar 2012 13:39:49 -0400 From: "Aneesh Kumar K.V" To: Michal Hocko Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free In-Reply-To: <20120328131706.GF20949@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328131706.GF20949@tiehlicka.suse.cz>User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Wed, 28 Mar 2012 23:09:34 
+0530 Message-ID: <87sjgs1v6x.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii x-cbid: 12032807-5140-0000-0000-000000F749DC Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Michal Hocko writes: > On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> This adds necessary charge/uncharge calls in the HugeTLB code > > This begs for more description... > Other than that it looks correct. > Updated as below hugetlb: add charge/uncharge calls for HugeTLB alloc/free This adds necessary charge/uncharge calls in the HugeTLB code. We do memcg charge in page alloc and uncharge in compound page destructor. We also need to ignore HugeTLB pages in __mem_cgroup_uncharge_common because that get called from delete_from_page_cache -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758745Ab2C2AUi (ORCPT ); Wed, 28 Mar 2012 20:20:38 -0400 Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:39455 "EHLO fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751941Ab2C2AUa (ORCPT ); Wed, 28 Mar 2012 20:20:30 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4 Message-ID: <4F73AA5F.5050604@jp.fujitsu.com> Date: Thu, 29 Mar 2012 09:18:39 +0900 From: KAMEZAWA Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:11.0) Gecko/20120312 Thunderbird/11.0 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: Michal Hocko , linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328134020.GG20949@tiehlicka.suse.cz>User-Agent: 
Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) <87y5qk1vat.fsf@skywalker.in.ibm.com> In-Reply-To: <87y5qk1vat.fsf@skywalker.in.ibm.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (2012/03/29 2:37), Aneesh Kumar K.V wrote: > Michal Hocko writes: > >> On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote: >> [...] >>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >>> index 6728a7a..4b36c5e 100644 >>> --- a/mm/memcontrol.c >>> +++ b/mm/memcontrol.c >> [...] >>> @@ -4887,6 +5013,7 @@ err_cleanup: >>> static struct cgroup_subsys_state * __ref >>> mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) >>> { >>> + int idx; >>> struct mem_cgroup *memcg, *parent; >>> long error = -ENOMEM; >>> int node; >>> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) >>> * mem_cgroup(see mem_cgroup_put). >>> */ >>> mem_cgroup_get(parent); >>> + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) >>> + res_counter_init(&memcg->hugepage[idx], >>> + &parent->hugepage[idx]); >> >> Hmm, I do not think we want to make groups deeper in the hierarchy >> unlimited as we cannot reclaim. Shouldn't we copy the limit from the parent? >> Still not ideal but slightly more expected behavior IMO. > > But we should be limiting the child group based on parent's limit only > when hierarchy is set right ? > >> >> The hierarchy setups are still interesting and the limitations should be >> described in the documentation... >> > > It should behave similar to memcg. ie, if hierarchy is set, then we limit > using MIN(parent's limit, child's limit). May be I am missing some of > the details of memcg use_hierarchy config. My goal was to keep it > similar to memcg. Can you explain why do you think the patch would > make it any different ? > Maybe this is a different story but.... 
Tejun (the cgroup maintainer) asked us to remove the 'use_hierarchy' setting because most other cgroup controllers are hierarchical(*). I answered that res_counter latency has to be improved first, and we now have some ideas for improving res_counter. (I'd like to try this after the page_cgroup diet series.) If we drop use_hierarchy, a setup equivalent to the current use_hierarchy=0 would look like this:
/cgroup/memory/          = unlimited
    level1               = unlimited
        level2           = unlimited
            level3       = limit
To get there, once res_counter has been improved, we will add use_hierarchy to the feature-removal list and wait for 2 versions. So this will not affect your development anyway. Thanks, -Kame (*) AFAIK, the blkio cgroup needs tons of work to become hierarchical... From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753532Ab2C2IKP (ORCPT ); Thu, 29 Mar 2012 04:10:15 -0400 Received: from cantor2.suse.de ([195.135.220.15]:56609 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751264Ab2C2IKF (ORCPT ); Thu, 29 Mar 2012 04:10:05 -0400 Date: Thu, 29 Mar 2012 10:10:03 +0200 From: Michal Hocko To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Message-ID: <20120329081003.GC30465@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328131706.GF20949@tiehlicka.suse.cz> <87sjgs1v6x.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87sjgs1v6x.fsf@skywalker.in.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org On Wed 28-03-12 23:09:34, Aneesh Kumar K.V wrote: > Michal Hocko writes: > > > On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote: > >> From: "Aneesh Kumar K.V" > >> > >> This adds necessary charge/uncharge calls in the HugeTLB code > > > > This begs for more description... > > Other than that it looks correct. > > > > Updated as below > > hugetlb: add charge/uncharge calls for HugeTLB alloc/free > > This adds necessary charge/uncharge calls in the HugeTLB code. We do > memcg charge in page alloc and uncharge in compound page destructor. > We also need to ignore HugeTLB pages in __mem_cgroup_uncharge_common > because that get called from delete_from_page_cache and from mem_cgroup_end_migration used during soft_offline_page. Btw., while looking at mem_cgroup_end_migration, I have noticed that you need to take care of mem_cgroup_prepare_migration as well otherwise the page would get charged as a normal (shmem) page. > > -aneesh > > -- > To unsubscribe from this list: send the line "unsubscribe cgroups" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757058Ab2C2H5c (ORCPT ); Thu, 29 Mar 2012 03:57:32 -0400 Received: from cantor2.suse.de ([195.135.220.15]:55656 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751182Ab2C2H5Y (ORCPT ); Thu, 29 Mar 2012 03:57:24 -0400 Date: Thu, 29 Mar 2012 09:57:22 +0200 From: Michal Hocko To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension Message-ID: <20120329075722.GB30465@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328134020.GG20949@tiehlicka.suse.cz> <87y5qk1vat.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87y5qk1vat.fsf@skywalker.in.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 28-03-12 23:07:14, Aneesh Kumar K.V wrote: > Michal Hocko writes: > > > On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote: > > [...] > >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c > >> index 6728a7a..4b36c5e 100644 > >> --- a/mm/memcontrol.c > >> +++ b/mm/memcontrol.c > > [...] 
> >> @@ -4887,6 +5013,7 @@ err_cleanup: > >> static struct cgroup_subsys_state * __ref > >> mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > >> { > >> + int idx; > >> struct mem_cgroup *memcg, *parent; > >> long error = -ENOMEM; > >> int node; > >> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > >> * mem_cgroup(see mem_cgroup_put). > >> */ > >> mem_cgroup_get(parent); > >> + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) > >> + res_counter_init(&memcg->hugepage[idx], > >> + &parent->hugepage[idx]); > > > > Hmm, I do not think we want to make groups deeper in the hierarchy > > unlimited as we cannot reclaim. Shouldn't we copy the limit from the parent? > > Still not ideal but slightly more expected behavior IMO. > > But we should be limiting the child group based on parent's limit only > when hierarchy is set right ? Yes. Everything else should be unlimited by default. > > > > > The hierarchy setups are still interesting and the limitations should be > > described in the documentation... > > > > It should behave similar to memcg. ie, if hierarchy is set, then we limit > using MIN(parent's limit, child's limit). May be I am missing some of > the details of memcg use_hierarchy config. My goal was to keep it > similar to memcg. Can you explain why do you think the patch would > make it any different ? Yes, the patch tries to be consistent with the memcg limits. That is OK and I have no objection to that. It is just that the consequences are different: the hugetlb limit is a really hard limit, because we cannot reclaim. > > -aneesh -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753090Ab2C2IMQ (ORCPT ); Thu, 29 Mar 2012 04:12:16 -0400 Received: from cantor2.suse.de ([195.135.220.15]:56858 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753246Ab2C2IL6 (ORCPT ); Thu, 29 Mar 2012 04:11:58 -0400 Date: Thu, 29 Mar 2012 10:11:57 +0200 From: Michal Hocko To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages Message-ID: <20120329081157.GD30465@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328135845.GH20949@tiehlicka.suse.cz> <87vclo1v8l.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87vclo1v8l.fsf@skywalker.in.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 28-03-12 23:08:34, Aneesh Kumar K.V wrote: > Michal Hocko writes: > > > On Fri 16-03-12 23:09:28, Aneesh Kumar K.V wrote: > >> From: "Aneesh Kumar K.V" > >> > >> hugepage_activelist will be used to track currently used HugeTLB pages. > >> We need to find the in-use HugeTLB pages to support memcg removal. > >> On memcg removal we update the page's memory cgroup to point to > >> parent cgroup. 
> >> > >> Signed-off-by: Aneesh Kumar K.V > >> --- > >> include/linux/hugetlb.h | 1 + > >> mm/hugetlb.c | 23 ++++++++++++++++++----- > >> 2 files changed, 19 insertions(+), 5 deletions(-) > >> > >> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > >> index cbd8dc5..6919100 100644 > >> --- a/include/linux/hugetlb.h > >> +++ b/include/linux/hugetlb.h > > [...] > >> @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, > >> page = pte_page(pte); > >> if (pte_dirty(pte)) > >> set_page_dirty(page); > >> - list_add(&page->lru, &page_list); > >> + > >> + spin_lock(&hugetlb_lock); > >> + list_move(&page->lru, &page_list); > >> + spin_unlock(&hugetlb_lock); > > > > Why do we really need the spinlock here? > > > It does a list_del from hugepage_activelist. right you are. sorry -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934108Ab2C3Kzq (ORCPT ); Fri, 30 Mar 2012 06:55:46 -0400 Received: from e28smtp06.in.ibm.com ([122.248.162.6]:51885 "EHLO e28smtp06.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755769Ab2C3Kzi (ORCPT ); Fri, 30 Mar 2012 06:55:38 -0400 From: "Aneesh Kumar K.V" To: Michal Hocko Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free In-Reply-To: <20120329081003.GC30465@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328131706.GF20949@tiehlicka.suse.cz> <87sjgs1v6x.fsf@skywalker.in.ibm.com> 
<20120329081003.GC30465@tiehlicka.suse.cz> User-Agent: Notmuch/0.11.1+346~g13d19c3 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Fri, 30 Mar 2012 16:10:00 +0530 Message-ID: <871uoamkxr.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii x-cbid: 12033010-9574-0000-0000-000002094902 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Michal Hocko writes: > On Wed 28-03-12 23:09:34, Aneesh Kumar K.V wrote: >> Michal Hocko writes: >> >> > On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote: >> >> From: "Aneesh Kumar K.V" >> >> >> >> This adds necessary charge/uncharge calls in the HugeTLB code >> > >> > This begs for more description... >> > Other than that it looks correct. >> > >> >> Updated as below >> >> hugetlb: add charge/uncharge calls for HugeTLB alloc/free >> >> This adds necessary charge/uncharge calls in the HugeTLB code. We do >> memcg charge in page alloc and uncharge in compound page destructor. >> We also need to ignore HugeTLB pages in __mem_cgroup_uncharge_common >> because that get called from delete_from_page_cache > > and from mem_cgroup_end_migration used during soft_offline_page. > > Btw., while looking at mem_cgroup_end_migration, I have noticed that you > need to take care of mem_cgroup_prepare_migration as well otherwise the > page would get charged as a normal (shmem) page. > Won't we skip HugeTLB pages in migration? check_range does check for is_vm_hugetlb_page. 
-aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934107Ab2C3KrN (ORCPT ); Fri, 30 Mar 2012 06:47:13 -0400 Received: from cantor2.suse.de ([195.135.220.15]:47704 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756201Ab2C3KrD (ORCPT ); Fri, 30 Mar 2012 06:47:03 -0400 Date: Fri, 30 Mar 2012 12:46:50 +0200 From: Michal Hocko To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mgorman@suse.de, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, aarcange@redhat.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Message-ID: <20120330104650.GB15375@tiehlicka.suse.cz> References: <1331919570-2264-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1331919570-2264-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120328131706.GF20949@tiehlicka.suse.cz> <87sjgs1v6x.fsf@skywalker.in.ibm.com> <20120329081003.GC30465@tiehlicka.suse.cz> <871uoamkxr.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <871uoamkxr.fsf@skywalker.in.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 30-03-12 16:10:00, Aneesh Kumar K.V wrote: > Michal Hocko writes: > > > On Wed 28-03-12 23:09:34, Aneesh Kumar K.V wrote: > >> Michal Hocko writes: > >> > >> > On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote: > >> >> From: "Aneesh Kumar K.V" > >> >> > >> >> This adds necessary charge/uncharge calls in the HugeTLB code > >> > > >> > This begs for more description... > >> > Other than that it looks correct. > >> > > >> > >> Updated as below > >> > >> hugetlb: add charge/uncharge calls for HugeTLB alloc/free > >> > >> This adds necessary charge/uncharge calls in the HugeTLB code. 
We do > >> memcg charge in page alloc and uncharge in compound page destructor. > >> We also need to ignore HugeTLB pages in __mem_cgroup_uncharge_common > >> because that get called from delete_from_page_cache > > > > and from mem_cgroup_end_migration used during soft_offline_page. > > > > Btw., while looking at mem_cgroup_end_migration, I have noticed that you > > need to take care of mem_cgroup_prepare_migration as well otherwise the > > page would get charged as a normal (shmem) page. > > > > Won't we skip HugeTLB pages in migrate ? Yes but we still migrate for memory failure (see soft_offline_page). > check_range do check for is_vm_hugetlb_page. > > -aneesh > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic
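The hierarchy behaviour discussed earlier in this thread (with use_hierarchy set, a charge is bounded by the tightest limit among the group and its ancestors, i.e. MIN(parent's limit, child's limit)) can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct hcounter and hcounter_charge() are hypothetical names loosely modelled on the idea of a parent-linked res_counter, with no locking and no relation to the actual kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a hierarchical counter; not struct res_counter. */
struct hcounter {
	unsigned long usage;
	unsigned long limit;		/* invariant: usage <= limit */
	struct hcounter *parent;	/* NULL for the root */
};

/*
 * Charge val against c and every ancestor. If any level would exceed its
 * limit, undo the levels already charged and fail, so the tightest limit
 * on the path to the root decides the outcome.
 */
static int hcounter_charge(struct hcounter *c, unsigned long val)
{
	struct hcounter *pos, *undo;

	assert(c != NULL);

	for (pos = c; pos != NULL; pos = pos->parent) {
		if (val > pos->limit - pos->usage)
			goto unwind;
		pos->usage += val;
	}
	return 0;

unwind:
	/* Roll back every level below the one that refused the charge. */
	for (undo = c; undo != pos; undo = undo->parent)
		undo->usage -= val;
	return -1;
}
```

Because HugeTLB pages cannot be reclaimed, a failed charge in such a scheme has no fallback: a child with a generous limit of its own is still refused as soon as its parent is full.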