From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V9 00/15] hugetlb: Add HugeTLB controller to control HugeTLB allocation
Date: Wed, 13 Jun 2012 15:57:19 +0530
Message-ID: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

Hi,

This patchset implements a cgroup resource controller for HugeTLB pages.
The controller limits HugeTLB usage per control group and enforces the
limit at page-fault time. Since HugeTLB doesn't support page reclaim,
enforcing the limit at page-fault time means that an application receives
SIGBUS if it tries to access HugeTLB pages beyond its limit. The
application therefore needs to know beforehand how many HugeTLB pages it
will require.

The goal is to control how many HugeTLB pages a group of tasks can
allocate. It can be viewed as an extension of the existing quota
interface, which limits the number of HugeTLB pages per hugetlbfs
superblock. HPC job schedulers require jobs to specify their resource
requirements in the job file. Once the requirements can be met, a job
scheduler such as SLURM schedules the job. We need to make sure that jobs
don't consume more resources than requested. If they do, we should either
error out or kill the application.

Patches are on top of v3.5-rc2

Changes from V8:
 * Address review feedback

Changes from V7:
 * Remove dependency on page_cgroup.
 * Use page[2].lru.next to store HugeTLB cgroup information.

Changes from V6:
 * Implement the controller as a separate HugeTLB cgroup.
 * Folded fixup patches in -mm to the original patches

Changes from V5:
 * Address review feedback.

Changes from V4:
 * Add support for charge/uncharge during page migration
 * Drop the usage of page->lru in unmap_hugepage_range.

Changes from v3:
 * Address review feedback.
 * Fix a bug in cgroup removal related parent charging with use_hierarchy set

Changes from V2:
 * Changed the implementation to limit the HugeTLB usage during page
   fault time. This simplifies the extension and keeps it closer to the
   memcg design. It also allows cgroup removal to be supported with less
   complexity. The only caveat is that the application must ensure its
   HugeTLB usage doesn't cross the cgroup limit.

Changes from V1:
 * Changed the implementation as a memcg extension. We still use
   the same logic to track the cgroup and range.

Changes from RFC post:
 * Added support for HugeTLB cgroup hierarchy
 * Added support for task migration
 * Added documentation patch
 * Other bug fixes

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V9 03/15] hugetlb: add an inline helper for finding hstate index
Date: Wed, 13 Jun 2012 15:57:22 +0530
Message-ID: <1339583254-895-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

Add an inline helper and use it in the code.

Acked-by: David Rientjes
Acked-by: Michal Hocko
Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: Aneesh Kumar K.V
---
 include/linux/hugetlb.h |  6 ++++++
 mm/hugetlb.c            | 20 +++++++++++---------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d5d6bbe..217f528 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -302,6 +302,11 @@ static inline unsigned hstate_index_to_shift(unsigned index)
 	return hstates[index].order + PAGE_SHIFT;
 }
 
+static inline int hstate_index(struct hstate *h)
+{
+	return h - hstates;
+}
+
 #else
 struct hstate {};
 #define alloc_huge_page_node(h, nid) NULL
@@ -320,6 +325,7 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
 	return 1;
 }
 #define hstate_index_to_shift(index) 0
+#define hstate_index(h) 0
 #endif
 
 #endif /* _LINUX_HUGETLB_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 34a7e23..b1e0ed1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1646,7 +1646,7 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
 				    struct attribute_group *hstate_attr_group)
 {
 	int retval;
-	int hi = h - hstates;
+	int hi = hstate_index(h);
 
 	hstate_kobjs[hi] = kobject_create_and_add(h->name, parent);
 	if (!hstate_kobjs[hi])
@@ -1741,11 +1741,13 @@ void hugetlb_unregister_node(struct node *node)
 	if (!nhs->hugepages_kobj)
 		return;		/* no hstate attributes */
 
-	for_each_hstate(h)
-		if (nhs->hstate_kobjs[h - hstates]) {
-			kobject_put(nhs->hstate_kobjs[h - hstates]);
-			nhs->hstate_kobjs[h - hstates] = NULL;
+	for_each_hstate(h) {
+		int idx = hstate_index(h);
+		if (nhs->hstate_kobjs[idx]) {
+			kobject_put(nhs->hstate_kobjs[idx]);
+			nhs->hstate_kobjs[idx] = NULL;
 		}
+	}
 
 	kobject_put(nhs->hugepages_kobj);
 	nhs->hugepages_kobj = NULL;
@@ -1848,7 +1850,7 @@ static void __exit hugetlb_exit(void)
 	hugetlb_unregister_all_nodes();
 
 	for_each_hstate(h) {
-		kobject_put(hstate_kobjs[h - hstates]);
+		kobject_put(hstate_kobjs[hstate_index(h)]);
 	}
 
 	kobject_put(hugepages_kobj);
@@ -1869,7 +1871,7 @@ static int __init hugetlb_init(void)
 		if (!size_to_hstate(default_hstate_size))
 			hugetlb_add_hstate(HUGETLB_PAGE_ORDER);
 	}
-	default_hstate_idx = size_to_hstate(default_hstate_size) - hstates;
+	default_hstate_idx = hstate_index(size_to_hstate(default_hstate_size));
 	if (default_hstate_max_huge_pages)
 		default_hstate.max_huge_pages = default_hstate_max_huge_pages;
 
@@ -2687,7 +2689,7 @@ retry:
 		 */
 		if (unlikely(PageHWPoison(page))) {
 			ret = VM_FAULT_HWPOISON |
-			      VM_FAULT_SET_HINDEX(h - hstates);
+				VM_FAULT_SET_HINDEX(hstate_index(h));
 			goto backout_unlocked;
 		}
 	}
@@ -2760,7 +2762,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			return 0;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
-			       VM_FAULT_SET_HINDEX(h - hstates);
+				VM_FAULT_SET_HINDEX(hstate_index(h));
 	}
 
 	ptep = huge_pte_alloc(mm, address, huge_page_size(h));
-- 
1.7.10
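
The hstate_index() helper in the patch above relies on C pointer arithmetic: subtracting the base of the hstates[] array from a pointer to one of its elements yields the element's index, not a byte offset. The following is a minimal userspace sketch of that idiom. The cut-down struct hstate and the array size of 4 are assumptions for illustration, not the kernel definitions.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct hstate (assumption: one field is
 * enough to show the idiom). */
struct hstate {
	unsigned int order;
};

/* Illustrative array size; the kernel value is architecture-dependent. */
#define HUGE_MAX_HSTATE 4

static struct hstate hstates[HUGE_MAX_HSTATE];

/* Same idea as the helper in the patch: the difference between two
 * pointers into the same array is measured in elements, so this
 * returns the position of *h inside hstates[]. */
static inline int hstate_index(const struct hstate *h)
{
	return (int)(h - hstates);
}
```

Calling hstate_index(&hstates[2]) yields 2 regardless of sizeof(struct hstate), which is why the open-coded "h - hstates" expressions could be replaced one for one.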

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V9 01/15] hugetlb: rename max_hstate to hugetlb_max_hstate
Date: Wed, 13 Jun 2012 15:57:20 +0530
Message-ID: <1339583254-895-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

Rename max_hstate to hugetlb_max_hstate. We will be using this from other
subsystems like hugetlb controller in later patches.

Acked-by: David Rientjes
Reviewed-by: KAMEZAWA Hiroyuki
Acked-by: Hillf Danton
Acked-by: Michal Hocko
Signed-off-by: Aneesh Kumar K.V
---
 mm/hugetlb.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e198831..c868309 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
 static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
 unsigned long hugepages_treat_as_movable;
 
-static int max_hstate;
+static int hugetlb_max_hstate;
 unsigned int default_hstate_idx;
 struct hstate hstates[HUGE_MAX_HSTATE];
 
@@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
 static unsigned long __initdata default_hstate_size;
 
 #define for_each_hstate(h) \
-	for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
+	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
 
 /*
  * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
@@ -1897,9 +1897,9 @@ void __init hugetlb_add_hstate(unsigned order)
 		printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
 		return;
 	}
-	BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
+	BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
 	BUG_ON(order == 0);
-	h = &hstates[max_hstate++];
+	h = &hstates[hugetlb_max_hstate++];
 	h->order = order;
 	h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
 	h->nr_huge_pages = 0;
@@ -1920,10 +1920,10 @@ static int __init hugetlb_nrpages_setup(char *s)
 	static unsigned long *last_mhp;
 
 	/*
-	 * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
+	 * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet,
 	 * so this hugepages= parameter goes to the "default hstate".
 	 */
-	if (!max_hstate)
+	if (!hugetlb_max_hstate)
 		mhp = &default_hstate_max_huge_pages;
 	else
 		mhp = &parsed_hstate->max_huge_pages;
@@ -1942,7 +1942,7 @@ static int __init hugetlb_nrpages_setup(char *s)
 	 * But we need to allocate >= MAX_ORDER hstates here early to still
 	 * use the bootmem allocator.
 	 */
-	if (max_hstate && parsed_hstate->order >= MAX_ORDER)
+	if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER)
 		hugetlb_hstate_alloc_pages(parsed_hstate);
 
 	last_mhp = mhp;
-- 
1.7.10

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb
Date: Wed, 13 Jun 2012 15:57:24 +0530
Message-ID: <1339583254-895-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

The i_mmap_mutex lock was added to unmap_single_vma() by 502717f4e
("hugetlb: fix linked list corruption in unmap_hugepage_range()"), but
unmap_hugepage_range() no longer uses page->lru. The lock is also taken
higher up the stack in some code paths, which would result in a deadlock:

unmap_mapping_range (i_mmap_mutex)
 -> unmap_mapping_range_tree
    -> unmap_mapping_range_vma
       -> zap_page_range_single
          -> unmap_single_vma
             -> unmap_hugepage_range (i_mmap_mutex)

For shared pagetable support for huge pages, since pagetable pages are
ref counted, we don't need any lock during huge_pmd_unshare. We do take
i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in the
mapping (39dde65c9940c97f ("shared page table for hugetlb page")).

Signed-off-by: Aneesh Kumar K.V
---
 mm/memory.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 545e18a..f6bc04f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1326,11 +1326,8 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * Since no pte has actually been setup, it is
 			 * safe to do nothing in this case.
 			 */
-			if (vma->vm_file) {
-				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
+			if (vma->vm_file)
 				__unmap_hugepage_range(tlb, vma, start, end, NULL);
-				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
-			}
 		} else
 			unmap_page_range(tlb, vma, start, end, details);
 	}
-- 
1.7.10

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
Date: Wed, 13 Jun 2012 15:57:23 +0530
Message-ID: <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

Use a mmu_gather instead of a temporary linked list for accumulating
pages when we unmap a hugepage range

Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: Aneesh Kumar K.V
---
 fs/hugetlbfs/inode.c    |  4 ++--
 include/linux/hugetlb.h | 22 ++++++++++++++----
 mm/hugetlb.c            | 59 ++++++++++++++++++++++++++++-------------------
 mm/memory.c             |  7 ++++--
 4 files changed, 59 insertions(+), 33 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index cc9281b..ff233e4 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -416,8 +416,8 @@ hugetlb_vmtruncate_list(struct prio_tree_root *root, pgoff_t pgoff)
 		else
 			v_offset = 0;
 
-		__unmap_hugepage_range(vma,
-				vma->vm_start + v_offset, vma->vm_end, NULL);
+		unmap_hugepage_range(vma, vma->vm_start + v_offset,
+				     vma->vm_end, NULL);
 	}
 }
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 217f528..0f23c18 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -7,6 +7,7 @@
 
 struct ctl_table;
 struct user_struct;
+struct mmu_gather;
 
 #ifdef CONFIG_HUGETLB_PAGE
 
@@ -40,9 +41,10 @@ int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			struct page **, struct vm_area_struct **,
 			unsigned long *, int *, int, unsigned int flags);
 void unmap_hugepage_range(struct vm_area_struct *,
-			unsigned long, unsigned long, struct page *);
-void __unmap_hugepage_range(struct vm_area_struct *,
-			unsigned long, unsigned long, struct page *);
+			  unsigned long, unsigned long, struct page *);
+void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+			    unsigned long start, unsigned long end,
+			    struct page *ref_page);
 int hugetlb_prefault(struct address_space *, struct vm_area_struct *);
 void hugetlb_report_meminfo(struct seq_file *);
 int hugetlb_report_node_meminfo(int, char *);
@@ -98,7 +100,6 @@ static inline unsigned long hugetlb_total_pages(void)
 #define follow_huge_addr(mm, addr, write)	ERR_PTR(-EINVAL)
 #define copy_hugetlb_page_range(src, dst, vma)	({ BUG(); 0; })
 #define hugetlb_prefault(mapping, vma)		({ BUG(); 0; })
-#define unmap_hugepage_range(vma, start, end, page)	BUG()
 static inline void hugetlb_report_meminfo(struct seq_file *m)
 {
 }
@@ -112,13 +113,24 @@ static inline void hugetlb_report_meminfo(struct seq_file *m)
 #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; })
 #define hugetlb_fault(mm, vma, addr, flags)	({ BUG(); 0; })
 #define huge_pte_offset(mm, address)	0
-#define dequeue_hwpoisoned_huge_page(page)	0
+static inline int dequeue_hwpoisoned_huge_page(struct page *page)
+{
+	return 0;
+}
+
 static inline void copy_huge_page(struct page *dst, struct page *src)
 {
 }
 
 #define hugetlb_change_protection(vma, address, end, newprot)
 
+static inline void __unmap_hugepage_range(struct mmu_gather *tlb,
+			struct vm_area_struct *vma, unsigned long start,
+			unsigned long end, struct page *ref_page)
+{
+	BUG();
+}
+
 #endif /* !CONFIG_HUGETLB_PAGE */
 
 #define HUGETLB_ANON_FILE "anon_hugepage"
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b1e0ed1..e54b695 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -24,8 +24,9 @@
 
 #include
 #include
-#include
+#include
 
+#include
 #include
 #include
 #include "internal.h"
@@ -2310,30 +2311,26 @@ static int is_hugetlb_entry_hwpoisoned(pte_t pte)
 		return 0;
 }
 
-void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			    unsigned long end, struct page *ref_page)
+void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+			    unsigned long start, unsigned long end,
+			    struct page *ref_page)
 {
+	int force_flush = 0;
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
 	pte_t *ptep;
 	pte_t pte;
 	struct page *page;
-	struct page *tmp;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
 
-	/*
-	 * A page gathering list, protected by per file i_mmap_mutex. The
-	 * lock is used to avoid list corruption from multiple unmapping
-	 * of the same page since we are using page->lru.
-	 */
-	LIST_HEAD(page_list);
-
 	WARN_ON(!is_vm_hugetlb_page(vma));
 	BUG_ON(start & ~huge_page_mask(h));
 	BUG_ON(end & ~huge_page_mask(h));
 
+	tlb_start_vma(tlb, vma);
 	mmu_notifier_invalidate_range_start(mm, start, end);
+again:
 	spin_lock(&mm->page_table_lock);
 	for (address = start; address < end; address += sz) {
 		ptep = huge_pte_offset(mm, address);
@@ -2372,30 +2369,45 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 		}
 
 		pte = huge_ptep_get_and_clear(mm, address, ptep);
+		tlb_remove_tlb_entry(tlb, ptep, address);
 		if (pte_dirty(pte))
 			set_page_dirty(page);
-		list_add(&page->lru, &page_list);
 
+		page_remove_rmap(page);
+		force_flush = !__tlb_remove_page(tlb, page);
+		if (force_flush)
+			break;
 		/* Bail out after unmapping reference page if supplied */
 		if (ref_page)
 			break;
 	}
-	flush_tlb_range(vma, start, end);
 	spin_unlock(&mm->page_table_lock);
-	mmu_notifier_invalidate_range_end(mm, start, end);
-	list_for_each_entry_safe(page, tmp, &page_list, lru) {
-		page_remove_rmap(page);
-		list_del(&page->lru);
-		put_page(page);
+	/*
+	 * mmu_gather ran out of room to batch pages, we break out of
+	 * the PTE lock to avoid doing the potential expensive TLB invalidate
+	 * and page-free while holding it.
+	 */
+	if (force_flush) {
+		force_flush = 0;
+		tlb_flush_mmu(tlb);
+		if (address < end && !ref_page)
+			goto again;
 	}
+	mmu_notifier_invalidate_range_end(mm, start, end);
+	tlb_end_vma(tlb, vma);
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 			  unsigned long end, struct page *ref_page)
 {
-	mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
-	__unmap_hugepage_range(vma, start, end, ref_page);
-	mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
+	struct mm_struct *mm;
+	struct mmu_gather tlb;
+
+	mm = vma->vm_mm;
+
+	tlb_gather_mmu(&tlb, mm, 0);
+	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+	tlb_finish_mmu(&tlb, start, end);
 }
 
 /*
@@ -2440,9 +2452,8 @@ static int unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * from the time of fork. This would look like data corruption
 		 */
 		if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
-			__unmap_hugepage_range(iter_vma,
-				address, address + huge_page_size(h),
-				page);
+			unmap_hugepage_range(iter_vma, address,
+					     address + huge_page_size(h), page);
 	}
 	mutex_unlock(&mapping->i_mmap_mutex);
 
diff --git a/mm/memory.c b/mm/memory.c
index 1b7dc66..545e18a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * Since no pte has actually been setup, it is
 			 * safe to do nothing in this case.
 			 */
-			if (vma->vm_file)
-				unmap_hugepage_range(vma, start, end, NULL);
+			if (vma->vm_file) {
+				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
+				__unmap_hugepage_range(tlb, vma, start, end, NULL);
+				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
+			}
 		} else
 			unmap_page_range(tlb, vma, start, end, details);
 	}
-- 
1.7.10

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V9 02/15] hugetlb: don't use ERR_PTR with VM_FAULT* values
Date: Wed, 13 Jun 2012 15:57:21 +0530
Message-ID: <1339583254-895-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

The current use of VM_FAULT_* codes with ERR_PTR requires us to ensure
VM_FAULT_* values will not exceed MAX_ERRNO value. Decouple the
VM_FAULT_* values from MAX_ERRNO.
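
To illustrate the constraint described above: ERR_PTR() stores a negative errno in the top MAX_ERRNO values of the address space, and IS_ERR() only recognises pointers in that range. A fault code larger than MAX_ERRNO, pushed through ERR_PTR(), would therefore not be detected as an error. The sketch below re-creates the helpers in userspace. The VM_FAULT_* values are assumptions for illustration, and fault_code_from_errno() is a hypothetical helper name; the patch open-codes this mapping in the fault paths.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Illustrative fault codes (assumed values for this sketch only). */
#define VM_FAULT_OOM    0x0001
#define VM_FAULT_SIGBUS 0x0002

/* Userspace re-creations of the pointer-encoded error helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* True only for pointers in the last MAX_ERRNO values of the address
 * space, i.e. for encoded errors in the range -MAX_ERRNO..-1. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Hypothetical helper: translate an errno carried by ERR_PTR() into a
 * fault code at the point where the fault handler returns. */
static inline int fault_code_from_errno(long err)
{
	return err == -ENOMEM ? VM_FAULT_OOM : VM_FAULT_SIGBUS;
}
```

With this split, the allocator can return any ordinary errno and the fault handler decides which fault code it maps to, so fault codes are free to grow beyond MAX_ERRNO.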

Acked-by: Hillf Danton
Acked-by: KOSAKI Motohiro
Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: Aneesh Kumar K.V
---
 mm/hugetlb.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c868309..34a7e23 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1123,10 +1123,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	 */
 	chg = vma_needs_reservation(h, vma, addr);
 	if (chg < 0)
-		return ERR_PTR(-VM_FAULT_OOM);
+		return ERR_PTR(-ENOMEM);
 	if (chg)
 		if (hugepage_subpool_get_pages(spool, chg))
-			return ERR_PTR(-VM_FAULT_SIGBUS);
+			return ERR_PTR(-ENOSPC);
 
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
@@ -1136,7 +1136,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
 			hugepage_subpool_put_pages(spool, chg);
-			return ERR_PTR(-VM_FAULT_SIGBUS);
+			return ERR_PTR(-ENOSPC);
 		}
 	}
 
@@ -2496,6 +2496,7 @@ retry_avoidcopy:
 	new_page = alloc_huge_page(vma, address, outside_reserve);
 
 	if (IS_ERR(new_page)) {
+		long err = PTR_ERR(new_page);
 		page_cache_release(old_page);
 
 		/*
@@ -2524,7 +2525,10 @@ retry_avoidcopy:
 
 		/* Caller expects lock to be held */
 		spin_lock(&mm->page_table_lock);
-		return -PTR_ERR(new_page);
+		if (err == -ENOMEM)
+			return VM_FAULT_OOM;
+		else
+			return VM_FAULT_SIGBUS;
 	}
 
 	/*
@@ -2642,7 +2646,11 @@ retry:
 			goto out;
 		page = alloc_huge_page(vma, address, 0);
 		if (IS_ERR(page)) {
-			ret = -PTR_ERR(page);
+			ret = PTR_ERR(page);
+			if (ret == -ENOMEM)
+				ret = VM_FAULT_OOM;
+			else
+				ret = VM_FAULT_SIGBUS;
 			goto out;
 		}
 		clear_huge_page(page, address, pages_per_huge_page(h));
-- 
1.7.10

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V9 15/15] hugetlb/cgroup: add HugeTLB controller documentation
Date: Wed, 13 Jun 2012 15:57:34 +0530
Message-ID: <1339583254-895-16-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: Aneesh Kumar K.V
---
 Documentation/cgroups/hugetlb.txt | 45 +++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 Documentation/cgroups/hugetlb.txt

diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroups/hugetlb.txt
new file mode 100644
index 0000000..a9faaca
--- /dev/null
+++ b/Documentation/cgroups/hugetlb.txt
@@ -0,0 +1,45 @@
+HugeTLB Controller
+-------------------
+
+The HugeTLB controller limits HugeTLB usage per control group and enforces
+the limit at page-fault time. Since HugeTLB doesn't support page reclaim,
+enforcing the limit at page-fault time means that the application receives
+SIGBUS if it tries to access HugeTLB pages beyond its limit. The
+application therefore needs to know beforehand how many HugeTLB pages it
+will require.
+
+The HugeTLB controller can be created by first mounting the cgroup filesystem.
+
+# mount -t cgroup -o hugetlb none /sys/fs/cgroup
+
+With the above step, the initial or the parent HugeTLB group becomes
+visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
+the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
+
+New groups can be created under the parent group /sys/fs/cgroup.
+
+# cd /sys/fs/cgroup
+# mkdir g1
+# echo $$ > g1/tasks
+
+The above steps create a new group g1 and move the current shell
+process (bash) into it.
+
+Brief summary of control files
+
+ hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
+ hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded
+ hugetlb.<hugepagesize>.usage_in_bytes     # show current res_counter usage for "hugepagesize" hugetlb
+ hugetlb.<hugepagesize>.failcnt            # show the number of allocation failures due to the HugeTLB limit
+
+For a system supporting two hugepage sizes (16M and 16G) the control
+files include:
+
+hugetlb.16GB.limit_in_bytes
+hugetlb.16GB.max_usage_in_bytes
+hugetlb.16GB.usage_in_bytes
+hugetlb.16GB.failcnt
+hugetlb.16MB.limit_in_bytes
+hugetlb.16MB.max_usage_in_bytes
+hugetlb.16MB.usage_in_bytes
+hugetlb.16MB.failcnt
-- 
1.7.10
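
The control-file naming scheme listed in the documentation above can be sketched in userspace C. The size formatting follows the same GB/MB/KB thresholds as the mem_fmt() helper in patch 13/15 of this series. hugetlb_file_name() is a hypothetical helper introduced only for this illustration, and the "hugetlb." prefix is written out explicitly here because the sketch does not model the cgroup core.

```c
#include <stdio.h>
#include <stddef.h>

/* Format a huge page size given in bytes as "<n>GB", "<n>MB" or
 * "<n>KB". unsigned long long is used (an assumption of this sketch)
 * so that 16G also fits on 32-bit builds. */
static char *mem_fmt(char *buf, size_t size, unsigned long long hsize)
{
	if (hsize >= (1ULL << 30))
		snprintf(buf, size, "%lluGB", hsize >> 30);
	else if (hsize >= (1ULL << 20))
		snprintf(buf, size, "%lluMB", hsize >> 20);
	else
		snprintf(buf, size, "%lluKB", hsize >> 10);
	return buf;
}

/* Hypothetical helper: build "hugetlb.<hugepagesize>.<attr>" into out. */
static int hugetlb_file_name(char *out, size_t size,
			     unsigned long long hsize, const char *attr)
{
	char sz[32];

	return snprintf(out, size, "hugetlb.%s.%s",
			mem_fmt(sz, sizeof(sz), hsize), attr);
}
```

For the two page sizes used as the example in the documentation, hugetlb_file_name(name, sizeof(name), 16ULL << 20, "limit_in_bytes") produces hugetlb.16MB.limit_in_bytes, and passing 16ULL << 30 with "failcnt" produces hugetlb.16GB.failcnt.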

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V9 13/15] hugetlb/cgroup: add hugetlb cgroup control files
Date: Wed, 13 Jun 2012 15:57:32 +0530
Message-ID: <1339583254-895-14-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

Add the control files for hugetlb controller

Signed-off-by: Aneesh Kumar K.V
---
 include/linux/hugetlb.h        |   5 ++
 include/linux/hugetlb_cgroup.h |   6 ++
 mm/hugetlb.c                   |   8 +++
 mm/hugetlb_cgroup.c            | 129 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 148 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 4aca057..9650bb1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -4,6 +4,7 @@
 #include
 #include
 #include
+#include
 
 struct ctl_table;
 struct user_struct;
@@ -221,6 +222,10 @@ struct hstate {
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
 	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
+#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+	/* cgroup control files */
+	struct cftype cgroup_files[5];
+#endif
 	char name[HSTATE_NAME_LEN];
 };
 
diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index e05871c..bd8bc98 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -62,6 +62,7 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
 					 struct page *page);
 extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 					   struct hugetlb_cgroup *h_cg);
+extern int hugetlb_cgroup_file_init(int idx) __init;
 
 #else
 static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
@@ -108,5 +109,10 @@ hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 	return;
 }
 
+static inline int __init hugetlb_cgroup_file_init(int idx)
+{
+	return 0;
+}
+
 #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 59720b1..a5a30bf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 
 const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
@@ -1930,6 +1931,13 @@ void __init hugetlb_add_hstate(unsigned order)
 	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
 					huge_page_size(h)/1024);
+	/*
+	 * Add cgroup control files only if the huge page consists
+	 * of more than two normal pages. This is because we use
+	 * page[2].lru.next for storing cgroup details.
+ */ + if (order >= HUGETLB_CGROUP_MIN_ORDER) + hugetlb_cgroup_file_init(hugetlb_max_hstate - 1); parsed_hstate = h; } diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index a3a68a4..64e93e0 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -26,6 +26,10 @@ struct hugetlb_cgroup { struct res_counter hugepage[HUGE_MAX_HSTATE]; }; +#define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val)) +#define MEMFILE_IDX(val) (((val) >> 16) & 0xffff) +#define MEMFILE_ATTR(val) ((val) & 0xffff) + struct cgroup_subsys hugetlb_subsys __read_mostly; struct hugetlb_cgroup *root_h_cgroup __read_mostly; @@ -259,6 +263,131 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, return; } +static ssize_t hugetlb_cgroup_read(struct cgroup *cgroup, struct cftype *cft, + struct file *file, char __user *buf, + size_t nbytes, loff_t *ppos) +{ + u64 val; + char str[64]; + int idx, name, len; + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); + + idx = MEMFILE_IDX(cft->private); + name = MEMFILE_ATTR(cft->private); + + val = res_counter_read_u64(&h_cg->hugepage[idx], name); + len = scnprintf(str, sizeof(str), "%llu\n", (unsigned long long)val); + return simple_read_from_buffer(buf, nbytes, ppos, str, len); +} + +static int hugetlb_cgroup_write(struct cgroup *cgroup, struct cftype *cft, + const char *buffer) +{ + int idx, name, ret; + unsigned long long val; + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); + + idx = MEMFILE_IDX(cft->private); + name = MEMFILE_ATTR(cft->private); + + switch (name) { + case RES_LIMIT: + if (hugetlb_cgroup_is_root(h_cg)) { + /* Can't set limit on root */ + ret = -EINVAL; + break; + } + /* This function does all necessary parse...reuse it */ + ret = res_counter_memparse_write_strategy(buffer, &val); + if (ret) + break; + ret = res_counter_set_limit(&h_cg->hugepage[idx], val); + break; + default: + ret = -EINVAL; + break; + } + return ret; +} + +static int hugetlb_cgroup_reset(struct cgroup *cgroup, 
unsigned int event) +{ + int idx, name, ret = 0; + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); + + idx = MEMFILE_IDX(event); + name = MEMFILE_ATTR(event); + + switch (name) { + case RES_MAX_USAGE: + res_counter_reset_max(&h_cg->hugepage[idx]); + break; + case RES_FAILCNT: + res_counter_reset_failcnt(&h_cg->hugepage[idx]); + break; + default: + ret = -EINVAL; + break; + } + return ret; +} + +static char *mem_fmt(char *buf, int size, unsigned long hsize) +{ + if (hsize >= (1UL << 30)) + snprintf(buf, size, "%luGB", hsize >> 30); + else if (hsize >= (1UL << 20)) + snprintf(buf, size, "%luMB", hsize >> 20); + else + snprintf(buf, size, "%luKB", hsize >> 10); + return buf; +} + +int __init hugetlb_cgroup_file_init(int idx) +{ + char buf[32]; + struct cftype *cft; + struct hstate *h = &hstates[idx]; + + /* format the size */ + mem_fmt(buf, 32, huge_page_size(h)); + + /* Add the limit file */ + cft = &h->cgroup_files[0]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.limit_in_bytes", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_LIMIT); + cft->read = hugetlb_cgroup_read; + cft->write_string = hugetlb_cgroup_write; + + /* Add the usage file */ + cft = &h->cgroup_files[1]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.usage_in_bytes", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_USAGE); + cft->read = hugetlb_cgroup_read; + + /* Add the MAX usage file */ + cft = &h->cgroup_files[2]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.max_usage_in_bytes", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_MAX_USAGE); + cft->trigger = hugetlb_cgroup_reset; + cft->read = hugetlb_cgroup_read; + + /* Add the failcntfile */ + cft = &h->cgroup_files[3]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.failcnt", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_FAILCNT); + cft->trigger = hugetlb_cgroup_reset; + cft->read = hugetlb_cgroup_read; + + /* NULL terminate the last cft */ + cft = &h->cgroup_files[4]; + memset(cft, 0, sizeof(*cft)); + + 
WARN_ON(cgroup_add_cftypes(&hugetlb_subsys, h->cgroup_files)); + + return 0; +} + struct cgroup_subsys hugetlb_subsys = { .name = "hugetlb", .create = hugetlb_cgroup_create, -- 1.7.10

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V9 12/15] hugetlb/cgroup: Add support for cgroup removal
Date: Wed, 13 Jun 2012 15:57:31 +0530
Message-ID: <1339583254-895-13-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

This patch adds support for cgroup removal. On removal, the charges are moved to the parent cgroup; if there is no parent cgroup, they are moved to the root cgroup.

Signed-off-by: Aneesh Kumar K.V
---
 mm/hugetlb_cgroup.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 0f2f6ac..a3a68a4 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -107,10 +107,76 @@ static void hugetlb_cgroup_destroy(struct cgroup *cgroup) kfree(h_cgroup); } + +/* + * Should be called with hugetlb_lock held. + * Since we are holding hugetlb_lock, pages cannot get moved from + * active list or uncharged from the cgroup, So no need to get + * page reference and test for page active here. This function + * cannot fail.
+ */ +static void hugetlb_cgroup_move_parent(int idx, struct cgroup *cgroup, + struct page *page) +{ + int csize; + struct res_counter *counter; + struct res_counter *fail_res; + struct hugetlb_cgroup *page_hcg; + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); + struct hugetlb_cgroup *parent = parent_hugetlb_cgroup(cgroup); + + page_hcg = hugetlb_cgroup_from_page(page); + /* + * We can have pages in active list without any cgroup + * ie, hugepage with less than 3 pages. We can safely + * ignore those pages. + */ + if (!page_hcg || page_hcg != h_cg) + goto out; + + csize = PAGE_SIZE << compound_order(page); + if (!parent) { + parent = root_h_cgroup; + /* root has no limit */ + res_counter_charge_nofail(&parent->hugepage[idx], + csize, &fail_res); + } + counter = &h_cg->hugepage[idx]; + res_counter_uncharge_until(counter, counter->parent, csize); + + set_hugetlb_cgroup(page, parent); +out: + return; +} + +/* + * Force the hugetlb cgroup to empty the hugetlb resources by moving them to + * the parent cgroup. + */ static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup) { - /* We will add the cgroup removal support in later patches */ - return -EBUSY; + struct hstate *h; + struct page *page; + int ret = 0, idx = 0; + + do { + if (cgroup_task_count(cgroup) || + !list_empty(&cgroup->children)) { + ret = -EBUSY; + goto out; + } + for_each_hstate(h) { + spin_lock(&hugetlb_lock); + list_for_each_entry(page, &h->hugepage_activelist, lru) + hugetlb_cgroup_move_parent(idx, cgroup, page); + + spin_unlock(&hugetlb_lock); + idx++; + } + cond_resched(); + } while (hugetlb_cgroup_have_usage(cgroup)); +out: + return ret; } int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, -- 1.7.10 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
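
The removal path in the patch above can be sketched in userspace as follows. This is only an illustration under stated assumptions: the toy_* names are hypothetical and are not the kernel's res_counter API, and the sketch assumes (as the patch's use of res_counter_uncharge_until() suggests) that a charge against a child is also accounted in every ancestor, so moving a page to the parent only needs to drop the child's own share.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hierarchical resource counter. */
struct toy_counter {
	unsigned long long usage;   /* bytes charged here and in descendants */
	struct toy_counter *parent; /* NULL for the root */
};

/* Hypothetical stand-in for a huge page that remembers its owner. */
struct toy_page {
	struct toy_counter *owner;
};

/* Charge a counter and every ancestor up to the root. */
static void toy_charge(struct toy_counter *c, unsigned long long bytes)
{
	for (; c != NULL; c = c->parent)
		c->usage += bytes;
}

/* Uncharge from c upwards, stopping before top (top is left untouched). */
static void toy_uncharge_until(struct toy_counter *c, struct toy_counter *top,
			       unsigned long long bytes)
{
	for (; c != NULL && c != top; c = c->parent) {
		assert(c->usage >= bytes);
		c->usage -= bytes;
	}
}

/* Reassign a page from child to its parent without touching ancestors. */
static void toy_move_parent(struct toy_page *page, struct toy_counter *child,
			    unsigned long long bytes)
{
	/* Skip pages owned by someone else, and counters with no parent. */
	if (page->owner != child || child->parent == NULL)
		return;
	toy_uncharge_until(child, child->parent, bytes);
	page->owner = child->parent;
}
```

After toy_move_parent() the child's usage drops by the page size while the parent and root usage stay the same, which is why the move cannot fail against the parent's limit in this model.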
From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V9 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
Date: Wed, 13 Jun 2012 15:57:29 +0530
Message-ID: <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

Store the hugetlb cgroup pointer in the lru.next field of the third page (page[2]) of the compound page. This limits the hugetlb cgroup to hugepages made up of 3 or more normal pages. I guess that is an acceptable limitation.

Signed-off-by: Aneesh Kumar K.V
---
 include/linux/hugetlb_cgroup.h | 37 +++++++++++++++++++++++++++++++++++++
 mm/hugetlb.c | 4 ++++
 2 files changed, 41 insertions(+)

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index e9944b4..be1a9f8 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -20,6 +20,32 @@ struct hugetlb_cgroup; #ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR +/* + * Minimum page order trackable by hugetlb cgroup. + * At least 3 pages are necessary for all the tracking information.
+ */ +#define HUGETLB_CGROUP_MIN_ORDER 2 + +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) +{ + VM_BUG_ON(!PageHuge(page)); + + if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) + return NULL; + return (struct hugetlb_cgroup *)page[2].lru.next; +} + +static inline +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) +{ + VM_BUG_ON(!PageHuge(page)); + + if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) + return -1; + page[2].lru.next = (void *)h_cg; + return 0; +} + static inline bool hugetlb_cgroup_disabled(void) { if (hugetlb_subsys.disabled) @@ -28,6 +54,17 @@ static inline bool hugetlb_cgroup_disabled(void) } #else +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) +{ + return NULL; +} + +static inline +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) +{ + return 0; +} + static inline bool hugetlb_cgroup_disabled(void) { return true; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e899a2d..6a449c5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -28,6 +28,7 @@ #include #include +#include #include #include "internal.h" @@ -591,6 +592,7 @@ static void update_and_free_page(struct hstate *h, struct page *page) 1 << PG_active | 1 << PG_reserved | 1 << PG_private | 1 << PG_writeback); } + VM_BUG_ON(hugetlb_cgroup_from_page(page)); set_compound_page_dtor(page, NULL); set_page_refcounted(page); arch_release_hugepage(page); @@ -643,6 +645,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) INIT_LIST_HEAD(&page->lru); set_compound_page_dtor(page, free_huge_page); spin_lock(&hugetlb_lock); + set_hugetlb_cgroup(page, NULL); h->nr_huge_pages++; h->nr_huge_pages_node[nid]++; spin_unlock(&hugetlb_lock); @@ -892,6 +895,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) INIT_LIST_HEAD(&page->lru); r_nid = page_to_nid(page); set_compound_page_dtor(page, free_huge_page); + set_hugetlb_cgroup(page, NULL); /* * We 
incremented the global counters already */ -- 1.7.10 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: [PATCH -V9 08/15] hugetlb: Make some static variables global Date: Wed, 13 Jun 2012 15:57:27 +0530 Message-ID: <1339583254-895-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Return-path: In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" We will use them later in hugetlb_cgroup.c Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb.h | 5 +++++ mm/hugetlb.c | 7 ++----- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ed550d8..4aca057 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -21,6 +21,11 @@ struct hugepage_subpool { long max_hpages, used_hpages; }; +extern spinlock_t hugetlb_lock; +extern int hugetlb_max_hstate; +#define for_each_hstate(h) \ + for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++) + struct hugepage_subpool *hugepage_new_subpool(long nr_blocks); void hugepage_put_subpool(struct hugepage_subpool *spool); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index b5b6e15..e899a2d 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -35,7 +35,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; static gfp_t htlb_alloc_mask = GFP_HIGHUSER; 
unsigned long hugepages_treat_as_movable; -static int hugetlb_max_hstate; +int hugetlb_max_hstate; unsigned int default_hstate_idx; struct hstate hstates[HUGE_MAX_HSTATE]; @@ -46,13 +46,10 @@ static struct hstate * __initdata parsed_hstate; static unsigned long __initdata default_hstate_max_huge_pages; static unsigned long __initdata default_hstate_size; -#define for_each_hstate(h) \ - for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++) - /* * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages */ -static DEFINE_SPINLOCK(hugetlb_lock); +DEFINE_SPINLOCK(hugetlb_lock); static inline void unlock_or_release_subpool(struct hugepage_subpool *spool) { -- 1.7.10 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup Date: Wed, 13 Jun 2012 15:57:28 +0530 Message-ID: <1339583254-895-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Return-path: In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" This patch implements a new controller that allows us to control HugeTLB allocations. The extension allows to limit the HugeTLB usage per control group and enforces the controller limit during page fault. 
Since HugeTLB doesn't support page reclaim, enforcing the limit at page fault time means that the application will get a SIGBUS signal if it tries to access HugeTLB pages beyond its limit. This requires the application to know beforehand how many HugeTLB pages it will need.

The charge/uncharge calls will be added to the HugeTLB code in a later patch. Support for cgroup removal will be added in later patches.

Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: Aneesh Kumar K.V
---
 include/linux/cgroup_subsys.h | 6 ++
 include/linux/hugetlb_cgroup.h | 37 ++++++++++++
 init/Kconfig | 15 +++++
 mm/Makefile | 1 +
 mm/hugetlb_cgroup.c | 122 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 181 insertions(+)
 create mode 100644 include/linux/hugetlb_cgroup.h
 create mode 100644 mm/hugetlb_cgroup.c

diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h index 0bd390c..895923a 100644 --- a/include/linux/cgroup_subsys.h +++ b/include/linux/cgroup_subsys.h @@ -72,3 +72,9 @@ SUBSYS(net_prio) #endif /* */ + +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR +SUBSYS(hugetlb) +#endif + +/* */
diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h new file mode 100644 index 0000000..e9944b4 --- /dev/null +++ b/include/linux/hugetlb_cgroup.h @@ -0,0 +1,37 @@ +/* + * Copyright IBM Corporation, 2012 + * Author Aneesh Kumar K.V + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of version 2.1 of the GNU Lesser General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it would be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ * + */ + +#ifndef _LINUX_HUGETLB_CGROUP_H +#define _LINUX_HUGETLB_CGROUP_H + +#include + +struct hugetlb_cgroup; + +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR +static inline bool hugetlb_cgroup_disabled(void) +{ + if (hugetlb_subsys.disabled) + return true; + return false; +} + +#else +static inline bool hugetlb_cgroup_disabled(void) +{ + return true; +} + +#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ +#endif diff --git a/init/Kconfig b/init/Kconfig index d07dcf9..da05fae 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -751,6 +751,21 @@ config CGROUP_MEM_RES_CTLR_KMEM the kmem extension can use it to guarantee that no group of processes will ever exhaust kernel resources alone. +config CGROUP_HUGETLB_RES_CTLR + bool "HugeTLB Resource Controller for Control Groups" + depends on RESOURCE_COUNTERS && HUGETLB_PAGE && EXPERIMENTAL + default n + help + Provides a cgroup Resource Controller for HugeTLB pages. + When you enable this, you can put a per cgroup limit on HugeTLB usage. + The limit is enforced during page fault. Since HugeTLB doesn't + support page reclaim, enforcing the limit at page fault time implies + that, the application will get SIGBUS signal if it tries to access + HugeTLB pages beyond its limit. This requires the application to know + beforehand how much HugeTLB pages it would require for its use. The + control group is tracked in the third page lru pointer. This means + that we cannot use the controller with huge page less than 3 pages. 
+ config CGROUP_PERF bool "Enable perf_event per-cpu per-container group (cgroup) monitoring" depends on PERF_EVENTS && CGROUPS diff --git a/mm/Makefile b/mm/Makefile index 2e2fbbe..25e8002 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -49,6 +49,7 @@ obj-$(CONFIG_MIGRATION) += migrate.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o +obj-$(CONFIG_CGROUP_HUGETLB_RES_CTLR) += hugetlb_cgroup.o obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c new file mode 100644 index 0000000..5a4e71c --- /dev/null +++ b/mm/hugetlb_cgroup.c @@ -0,0 +1,122 @@ +/* + * + * Copyright IBM Corporation, 2012 + * Author Aneesh Kumar K.V + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of version 2.1 of the GNU Lesser General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it would be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + * + */ + +#include +#include +#include +#include + +struct hugetlb_cgroup { + struct cgroup_subsys_state css; + /* + * the counter to account for hugepages from hugetlb. 
+ */ + struct res_counter hugepage[HUGE_MAX_HSTATE]; +}; + +struct cgroup_subsys hugetlb_subsys __read_mostly; +struct hugetlb_cgroup *root_h_cgroup __read_mostly; + +static inline +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s) +{ + if (s) + return container_of(s, struct hugetlb_cgroup, css); + return NULL; +} + +static inline +struct hugetlb_cgroup *hugetlb_cgroup_from_cgroup(struct cgroup *cgroup) +{ + return hugetlb_cgroup_from_css(cgroup_subsys_state(cgroup, + hugetlb_subsys_id)); +} + +static inline +struct hugetlb_cgroup *hugetlb_cgroup_from_task(struct task_struct *task) +{ + return hugetlb_cgroup_from_css(task_subsys_state(task, + hugetlb_subsys_id)); +} + +static inline bool hugetlb_cgroup_is_root(struct hugetlb_cgroup *h_cg) +{ + return (h_cg == root_h_cgroup); +} + +static inline struct hugetlb_cgroup *parent_hugetlb_cgroup(struct cgroup *cg) +{ + if (!cg->parent) + return NULL; + return hugetlb_cgroup_from_cgroup(cg->parent); +} + +static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg) +{ + int idx; + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cg); + + for (idx = 0; idx < hugetlb_max_hstate; idx++) { + if ((res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE)) > 0) + return true; + } + return false; +} + +static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup) +{ + int idx; + struct cgroup *parent_cgroup; + struct hugetlb_cgroup *h_cgroup, *parent_h_cgroup; + + h_cgroup = kzalloc(sizeof(*h_cgroup), GFP_KERNEL); + if (!h_cgroup) + return ERR_PTR(-ENOMEM); + + parent_cgroup = cgroup->parent; + if (parent_cgroup) { + parent_h_cgroup = hugetlb_cgroup_from_cgroup(parent_cgroup); + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) + res_counter_init(&h_cgroup->hugepage[idx], + &parent_h_cgroup->hugepage[idx]); + } else { + root_h_cgroup = h_cgroup; + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) + res_counter_init(&h_cgroup->hugepage[idx], NULL); + } + return &h_cgroup->css; +} + +static 
void hugetlb_cgroup_destroy(struct cgroup *cgroup) +{ + struct hugetlb_cgroup *h_cgroup; + + h_cgroup = hugetlb_cgroup_from_cgroup(cgroup); + kfree(h_cgroup); +} + +static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup) +{ + /* We will add the cgroup removal support in later patches */ + return -EBUSY; +} + +struct cgroup_subsys hugetlb_subsys = { + .name = "hugetlb", + .create = hugetlb_cgroup_create, + .pre_destroy = hugetlb_cgroup_pre_destroy, + .destroy = hugetlb_cgroup_destroy, + .subsys_id = hugetlb_subsys_id, +}; -- 1.7.10

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V9 06/15] hugetlb: simplify migrate_huge_page()
Date: Wed, 13 Jun 2012 15:57:25 +0530
Message-ID: <1339583254-895-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

Since we migrate only one hugepage, don't use a linked list for passing the page around. Directly pass the page that needs to be migrated as an argument. This also removes the use of page->lru in the migrate path.
Reviewed-by: KAMEZAWA Hiroyuki Signed-off-by: Aneesh Kumar K.V --- include/linux/migrate.h | 4 +-- mm/memory-failure.c | 13 ++-------- mm/migrate.c | 65 +++++++++++++++-------------------------------- 3 files changed, 25 insertions(+), 57 deletions(-) diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 855c337..ce7e667 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -15,7 +15,7 @@ extern int migrate_page(struct address_space *, extern int migrate_pages(struct list_head *l, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode); -extern int migrate_huge_pages(struct list_head *l, new_page_t x, +extern int migrate_huge_page(struct page *, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode); @@ -36,7 +36,7 @@ static inline void putback_lru_pages(struct list_head *l) {} static inline int migrate_pages(struct list_head *l, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode) { return -ENOSYS; } -static inline int migrate_huge_pages(struct list_head *l, new_page_t x, +static inline int migrate_huge_page(struct page *page, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode) { return -ENOSYS; } diff --git a/mm/memory-failure.c b/mm/memory-failure.c index ab1e714..53a1495 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1414,7 +1414,6 @@ static int soft_offline_huge_page(struct page *page, int flags) int ret; unsigned long pfn = page_to_pfn(page); struct page *hpage = compound_head(page); - LIST_HEAD(pagelist); ret = get_any_page(page, pfn, flags); if (ret < 0) @@ -1429,19 +1428,11 @@ static int soft_offline_huge_page(struct page *page, int flags) } /* Keep page count to indicate a given hugepage is isolated. 
*/ - - list_add(&hpage->lru, &pagelist); - ret = migrate_huge_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0, - true); + ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, 0, true); + put_page(hpage); if (ret) { - struct page *page1, *page2; - list_for_each_entry_safe(page1, page2, &pagelist, lru) - put_page(page1); - pr_info("soft offline: %#lx: migration failed %d, type %lx\n", pfn, ret, page->flags); - if (ret > 0) - ret = -EIO; return ret; } done: diff --git a/mm/migrate.c b/mm/migrate.c index be26d5c..fdce3a2 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -932,15 +932,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, if (anon_vma) put_anon_vma(anon_vma); unlock_page(hpage); - out: - if (rc != -EAGAIN) { - list_del(&hpage->lru); - put_page(hpage); - } - put_page(new_hpage); - if (result) { if (rc) *result = rc; @@ -1016,48 +1009,32 @@ out: return nr_failed + retry; } -int migrate_huge_pages(struct list_head *from, - new_page_t get_new_page, unsigned long private, bool offlining, - enum migrate_mode mode) +int migrate_huge_page(struct page *hpage, new_page_t get_new_page, + unsigned long private, bool offlining, + enum migrate_mode mode) { - int retry = 1; - int nr_failed = 0; - int pass = 0; - struct page *page; - struct page *page2; - int rc; - - for (pass = 0; pass < 10 && retry; pass++) { - retry = 0; - - list_for_each_entry_safe(page, page2, from, lru) { + int pass, rc; + + for (pass = 0; pass < 10; pass++) { + rc = unmap_and_move_huge_page(get_new_page, + private, hpage, pass > 2, offlining, + mode); + switch (rc) { + case -ENOMEM: + goto out; + case -EAGAIN: + /* try again */ cond_resched(); - - rc = unmap_and_move_huge_page(get_new_page, - private, page, pass > 2, offlining, - mode); - - switch(rc) { - case -ENOMEM: - goto out; - case -EAGAIN: - retry++; - break; - case 0: - break; - default: - /* Permanent failure */ - nr_failed++; - break; - } + break; + case 0: + goto out; + default: + rc = -EIO; + goto out; } } - rc = 0; out: 
- if (rc) - return rc; - - return nr_failed + retry; + return rc; } #ifdef CONFIG_NUMA -- 1.7.10 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: [PATCH -V9 07/15] hugetlb: add a list for tracking in-use HugeTLB pages Date: Wed, 13 Jun 2012 15:57:26 +0530 Message-ID: <1339583254-895-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Return-path: In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" hugepage_activelist will be used to track currently used HugeTLB pages. We need to find the in-use HugeTLB pages to support HugeTLB cgroup removal. On cgroup removal we update the page's HugeTLB cgroup to point to parent cgroup. 
Reviewed-by: KAMEZAWA Hiroyuki Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb.h | 1 + mm/hugetlb.c | 12 +++++++----- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 0f23c18..ed550d8 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -211,6 +211,7 @@ struct hstate { unsigned long resv_huge_pages; unsigned long surplus_huge_pages; unsigned long nr_overcommit_huge_pages; + struct list_head hugepage_activelist; struct list_head hugepage_freelists[MAX_NUMNODES]; unsigned int nr_huge_pages_node[MAX_NUMNODES]; unsigned int free_huge_pages_node[MAX_NUMNODES]; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e54b695..b5b6e15 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -510,7 +510,7 @@ void copy_huge_page(struct page *dst, struct page *src) static void enqueue_huge_page(struct hstate *h, struct page *page) { int nid = page_to_nid(page); - list_add(&page->lru, &h->hugepage_freelists[nid]); + list_move(&page->lru, &h->hugepage_freelists[nid]); h->free_huge_pages++; h->free_huge_pages_node[nid]++; } @@ -522,7 +522,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid) if (list_empty(&h->hugepage_freelists[nid])) return NULL; page = list_entry(h->hugepage_freelists[nid].next, struct page, lru); - list_del(&page->lru); + list_move(&page->lru, &h->hugepage_activelist); set_page_refcounted(page); h->free_huge_pages--; h->free_huge_pages_node[nid]--; @@ -626,10 +626,11 @@ static void free_huge_page(struct page *page) page->mapping = NULL; BUG_ON(page_count(page)); BUG_ON(page_mapcount(page)); - INIT_LIST_HEAD(&page->lru); spin_lock(&hugetlb_lock); if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { + /* remove the page from active list */ + list_del(&page->lru); update_and_free_page(h, page); h->surplus_huge_pages--; h->surplus_huge_pages_node[nid]--; @@ -642,6 +643,7 @@ static void free_huge_page(struct page *page) static void 
prep_new_huge_page(struct hstate *h, struct page *page, int nid) { + INIT_LIST_HEAD(&page->lru); set_compound_page_dtor(page, free_huge_page); spin_lock(&hugetlb_lock); h->nr_huge_pages++; @@ -890,6 +892,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) spin_lock(&hugetlb_lock); if (page) { + INIT_LIST_HEAD(&page->lru); r_nid = page_to_nid(page); set_compound_page_dtor(page, free_huge_page); /* @@ -994,7 +997,6 @@ retry: list_for_each_entry_safe(page, tmp, &surplus_list, lru) { if ((--needed) < 0) break; - list_del(&page->lru); /* * This page is now managed by the hugetlb allocator and has * no users -- drop the buddy allocator's reference. @@ -1009,7 +1011,6 @@ free: /* Free unnecessary surplus pages to the buddy allocator */ if (!list_empty(&surplus_list)) { list_for_each_entry_safe(page, tmp, &surplus_list, lru) { - list_del(&page->lru); put_page(page); } } @@ -1909,6 +1910,7 @@ void __init hugetlb_add_hstate(unsigned order) h->free_huge_pages = 0; for (i = 0; i < MAX_NUMNODES; ++i) INIT_LIST_HEAD(&h->hugepage_freelists[i]); + INIT_LIST_HEAD(&h->hugepage_activelist); h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]); h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", -- 1.7.10 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
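
The free list / active list bookkeeping introduced by the patch above can be sketched in userspace. This is a minimal illustration, assuming a circular doubly linked list modelled on the kernel's list_head; every toy_* name is hypothetical and none of this is kernel code. It also shows why initialising the page's list node first (as the patch does with INIT_LIST_HEAD() in prep_new_huge_page()) presumably matters: a move is an unlink followed by an add, so the node must be validly self-linked even before it sits on any list.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node (hypothetical, list_head-like). */
struct toy_list {
	struct toy_list *prev, *next;
};

/* An initialised node points at itself, i.e. it is an empty list. */
static void toy_list_init(struct toy_list *l)
{
	l->prev = l->next = l;
}

static int toy_list_empty(const struct toy_list *l)
{
	return l->next == l;
}

/* Unlink e from its current list and leave it self-linked. */
static void toy_list_del(struct toy_list *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	toy_list_init(e);
}

/* Insert e right after head. */
static void toy_list_add(struct toy_list *e, struct toy_list *head)
{
	e->next = head->next;
	e->prev = head;
	head->next->prev = e;
	head->next = e;
}

/* Unlink e from whatever list it is on, then add it to head. */
static void toy_list_move(struct toy_list *e, struct toy_list *head)
{
	assert(e != head);
	toy_list_del(e);
	toy_list_add(e, head);
}

/* Hypothetical page and per-size state with the two lists. */
struct toy_page {
	struct toy_list lru;
};

struct toy_hstate {
	struct toy_list freelist;   /* pages available for allocation */
	struct toy_list activelist; /* pages currently in use */
};
```

In this model, enqueueing moves a page onto freelist, dequeueing moves it onto activelist, and walking activelist finds every in-use page, which is what cgroup removal needs.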
From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
Date: Wed, 13 Jun 2012 15:57:30 +0530
Message-ID: <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Return-path:
In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, "Aneesh Kumar K.V"

From: "Aneesh Kumar K.V"

This patch adds the charge and uncharge routines for the hugetlb cgroup. We charge the cgroup at page allocation and uncharge it in the compound page destructor. Assigning a page's hugetlb cgroup is protected by hugetlb_lock.
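
The charge, commit and uncharge sequence described above can be sketched in userspace as follows. This is a rough model only, assuming a single flat counter with a byte limit; the toy_* names are hypothetical stand-ins and not the kernel's res_counter or hugetlb_cgroup functions, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-page-size counter of a cgroup. */
struct toy_counter {
	unsigned long long usage;   /* bytes currently charged */
	unsigned long long limit;   /* maximum bytes allowed */
	unsigned long long failcnt; /* number of rejected charges */
};

/* Hypothetical stand-in for a huge page that remembers who was charged. */
struct toy_page {
	struct toy_counter *owner;  /* NULL while the page is not charged */
};

/* Step 1: reserve the bytes before a page has been obtained. */
static int toy_charge(struct toy_counter *c, unsigned long long bytes)
{
	assert(c != NULL);
	if (c->usage > c->limit || bytes > c->limit - c->usage) {
		c->failcnt++;
		return -1;
	}
	c->usage += bytes;
	return 0;
}

/* Step 2a: the allocation succeeded, so record the owner in the page. */
static void toy_commit(struct toy_page *page, struct toy_counter *c)
{
	page->owner = c;
}

/* Step 2b: the allocation failed, so give the reservation back. */
static void toy_uncharge_counter(struct toy_counter *c,
				 unsigned long long bytes)
{
	assert(c->usage >= bytes);
	c->usage -= bytes;
}

/* Step 3: when the page is freed, uncharge whoever owns it. */
static void toy_uncharge_page(struct toy_page *page, unsigned long long bytes)
{
	struct toy_counter *c = page->owner;

	if (c == NULL)
		return;
	page->owner = NULL;
	assert(c->usage >= bytes);
	c->usage -= bytes;
}
```

Splitting the charge from the commit lets the limit check happen before any page is taken, with a plain rollback path if no page can be obtained afterwards.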
Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb_cgroup.h | 38 +++++++++++++++++++ mm/hugetlb.c | 16 +++++++- mm/hugetlb_cgroup.c | 80 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 133 insertions(+), 1 deletion(-) diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index be1a9f8..e05871c 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -53,6 +53,16 @@ static inline bool hugetlb_cgroup_disabled(void) return false; } +extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup **ptr); +extern void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg, + struct page *page); +extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, + struct page *page); +extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg); + #else static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) { @@ -70,5 +80,33 @@ static inline bool hugetlb_cgroup_disabled(void) return true; } +static inline int +hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup **ptr) +{ + return 0; +} + +static inline void +hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg, + struct page *page) +{ + return; +} + +static inline void +hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, struct page *page) +{ + return; +} + +static inline void +hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg) +{ + return; +} + #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ #endif diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6a449c5..59720b1 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -627,6 +627,8 @@ static void free_huge_page(struct page *page) BUG_ON(page_mapcount(page)); spin_lock(&hugetlb_lock); + hugetlb_cgroup_uncharge_page(hstate_index(h), + 
pages_per_huge_page(h), page); if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { /* remove the page from active list */ list_del(&page->lru); @@ -1115,7 +1117,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, struct hstate *h = hstate_vma(vma); struct page *page; long chg; + int ret, idx; + struct hugetlb_cgroup *h_cg; + idx = hstate_index(h); /* * Processes that did not create the mapping will have no * reserves and will not have accounted against subpool @@ -1131,6 +1136,11 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, if (hugepage_subpool_get_pages(spool, chg)) return ERR_PTR(-ENOSPC); + ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg); + if (ret) { + hugepage_subpool_put_pages(spool, chg); + return ERR_PTR(-ENOSPC); + } spin_lock(&hugetlb_lock); page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve); spin_unlock(&hugetlb_lock); @@ -1138,6 +1148,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, if (!page) { page = alloc_buddy_huge_page(h, NUMA_NO_NODE); if (!page) { + hugetlb_cgroup_uncharge_cgroup(idx, + pages_per_huge_page(h), + h_cg); hugepage_subpool_put_pages(spool, chg); return ERR_PTR(-ENOSPC); } @@ -1146,7 +1159,8 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, set_page_private(page, (unsigned long)spool); vma_commit_reservation(h, vma, addr); - + /* update page cgroup details */ + hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); return page; } diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 5a4e71c..0f2f6ac 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -113,6 +113,86 @@ static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup) return -EBUSY; } +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup **ptr) +{ + int ret = 0; + struct res_counter *fail_res; + struct hugetlb_cgroup *h_cg = NULL; + unsigned long csize = nr_pages * PAGE_SIZE; + + 
if (hugetlb_cgroup_disabled()) + goto done; + /* + * We don't charge any cgroup if the compound page have less + * than 3 pages. + */ + if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER) + goto done; +again: + rcu_read_lock(); + h_cg = hugetlb_cgroup_from_task(current); + if (!h_cg) + h_cg = root_h_cgroup; + + if (!css_tryget(&h_cg->css)) { + rcu_read_unlock(); + goto again; + } + rcu_read_unlock(); + + ret = res_counter_charge(&h_cg->hugepage[idx], csize, &fail_res); + css_put(&h_cg->css); +done: + *ptr = h_cg; + return ret; +} + +void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg, + struct page *page) +{ + if (hugetlb_cgroup_disabled() || !h_cg) + return; + + spin_lock(&hugetlb_lock); + set_hugetlb_cgroup(page, h_cg); + spin_unlock(&hugetlb_lock); + return; +} + +/* + * Should be called with hugetlb_lock held + */ +void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, + struct page *page) +{ + struct hugetlb_cgroup *h_cg; + unsigned long csize = nr_pages * PAGE_SIZE; + + if (hugetlb_cgroup_disabled()) + return; + VM_BUG_ON(!spin_is_locked(&hugetlb_lock)); + h_cg = hugetlb_cgroup_from_page(page); + if (unlikely(!h_cg)) + return; + set_hugetlb_cgroup(page, NULL); + res_counter_uncharge(&h_cg->hugepage[idx], csize); + return; +} + +void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg) +{ + unsigned long csize = nr_pages * PAGE_SIZE; + + if (hugetlb_cgroup_disabled() || !h_cg) + return; + + res_counter_uncharge(&h_cg->hugepage[idx], csize); + return; +} + struct cgroup_subsys hugetlb_subsys = { .name = "hugetlb", .create = hugetlb_cgroup_create, -- 1.7.10 From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration Date: Wed, 13 Jun 2012 15:57:33 +0530 Message-ID: <1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com> 
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Return-path: In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor. Since we are holding a hugepage reference, we can be sure that old page won't get uncharged till the last put_page(). Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb_cgroup.h | 8 ++++++++ mm/hugetlb_cgroup.c | 20 ++++++++++++++++++++ mm/migrate.c | 5 +++++ 3 files changed, 33 insertions(+) diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index bd8bc98..e9e6d74 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -63,6 +63,8 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg); extern int hugetlb_cgroup_file_init(int idx) __init; +extern void hugetlb_cgroup_migrate(struct page *oldhpage, + struct page *newhpage); #else static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) @@ -114,5 +116,11 @@ static inline int __init hugetlb_cgroup_file_init(int idx) return 0; } +static inline void hugetlb_cgroup_migrate(struct page *oldhpage, + struct page *newhpage) +{ + return; +} + #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ #endif diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 64e93e0..8e7ca0a 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -388,6 +388,26 @@ int __init hugetlb_cgroup_file_init(int 
idx) return 0; } +void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage) +{ + struct hugetlb_cgroup *h_cg; + + if (hugetlb_cgroup_disabled()) + return; + + VM_BUG_ON(!PageHuge(oldhpage)); + spin_lock(&hugetlb_lock); + h_cg = hugetlb_cgroup_from_page(oldhpage); + set_hugetlb_cgroup(oldhpage, NULL); + cgroup_exclude_rmdir(&h_cg->css); + + /* move the h_cg details to new cgroup */ + set_hugetlb_cgroup(newhpage, h_cg); + spin_unlock(&hugetlb_lock); + cgroup_release_and_wakeup_rmdir(&h_cg->css); + return; +} + struct cgroup_subsys hugetlb_subsys = { .name = "hugetlb", .create = hugetlb_cgroup_create, diff --git a/mm/migrate.c b/mm/migrate.c index fdce3a2..6c37c51 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include @@ -931,6 +932,10 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, if (anon_vma) put_anon_vma(anon_vma); + + if (!rc) + hugetlb_cgroup_migrate(hpage, new_hpage); + unlock_page(hpage); out: put_page(new_hpage); -- 1.7.10 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
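For readers following the charge, commit, uncharge and migrate steps across patches 11/15 and 14/15, the flow can be sketched in user space roughly as below. This is only a toy model under stated assumptions, not the kernel code: `toy_counter` stands in for `res_counter`, `toy_page.owner` stands in for the pointer kept in `page[2].lru.next`, the 2 MB page size is assumed, and every `toy_*` name is hypothetical. Locking (`hugetlb_lock`) and css reference counting are left out.

```c
#include <stddef.h>

#define HPAGE_BYTES (2UL * 1024 * 1024)	/* assumed huge page size */

/* Hypothetical stand-ins for res_counter, hugetlb_cgroup and a huge page. */
struct toy_counter { unsigned long usage; unsigned long limit; };
struct toy_cgroup { struct toy_counter hugepage; };
struct toy_page { struct toy_cgroup *owner; };	/* models page[2].lru.next */

/* Charge the cgroup before any page has been allocated. */
static int toy_charge(struct toy_cgroup *cg, unsigned long bytes)
{
	if (bytes > cg->hugepage.limit - cg->hugepage.usage)
		return -1;	/* over the limit: the fault path would fail */
	cg->hugepage.usage += bytes;
	return 0;
}

/* Allocation failed after a successful charge: give the charge back. */
static void toy_uncharge_cgroup(struct toy_cgroup *cg, unsigned long bytes)
{
	cg->hugepage.usage -= bytes;
}

/* Allocation succeeded: remember which cgroup paid for the page. */
static void toy_commit_charge(struct toy_page *page, struct toy_cgroup *cg)
{
	page->owner = cg;
}

/* Page is being freed: clear the owner and drop its charge. */
static void toy_uncharge_page(struct toy_page *page, unsigned long bytes)
{
	struct toy_cgroup *cg = page->owner;

	if (!cg)
		return;		/* page was never charged, or already migrated */
	page->owner = NULL;
	cg->hugepage.usage -= bytes;
}

/* Migration: ownership moves to the new page, the counter is untouched. */
static void toy_migrate(struct toy_page *oldpage, struct toy_page *newpage)
{
	newpage->owner = oldpage->owner;
	oldpage->owner = NULL;
}

/* Walk one page through its whole life; returns the final usage in bytes. */
unsigned long toy_lifecycle(void)
{
	struct toy_cgroup cg = { { 0, HPAGE_BYTES } };	/* limit: one page */
	struct toy_page oldpage = { NULL }, newpage = { NULL };

	/* Charge, then pretend the allocation failed and back out. */
	if (toy_charge(&cg, HPAGE_BYTES))
		return (unsigned long)-1;
	toy_uncharge_cgroup(&cg, HPAGE_BYTES);

	/* Charge again, this time the allocation succeeds. */
	if (toy_charge(&cg, HPAGE_BYTES))
		return (unsigned long)-1;
	toy_commit_charge(&oldpage, &cg);

	/* A second page would exceed the limit. */
	if (toy_charge(&cg, HPAGE_BYTES) == 0)
		return (unsigned long)-1;

	toy_migrate(&oldpage, &newpage);
	toy_uncharge_page(&oldpage, HPAGE_BYTES);	/* no-op after migration */
	toy_uncharge_page(&newpage, HPAGE_BYTES);	/* the real uncharge */
	return cg.hugepage.usage;
}
```

The point of the model is that migration never touches the counter: only the owner pointer moves, so freeing the old page afterwards uncharges nothing and the single charge is dropped when the new page is eventually freed.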
From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V9 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru Date: Wed, 13 Jun 2012 17:02:47 +0530 Message-ID: <8762avo3a8.fsf@skywalker.in.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: In-Reply-To: <1339583254-895-11-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org This fix is needed for the case where the hugetlb cgroup is disabled. I will send an updated patch in reply. diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index e9e6d74..bc30413 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -18,14 +18,14 @@ #include struct hugetlb_cgroup; - -#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR /* * Minimum page order trackable by hugetlb cgroup. * At least 3 pages are necessary for all the tracking information. 
*/ #define HUGETLB_CGROUP_MIN_ORDER 2 +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR + static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) { VM_BUG_ON(!PageHuge(page)); From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru Date: Wed, 13 Jun 2012 17:04:30 +0530 Message-ID: <1339587270-5831-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Return-path: In-Reply-To: <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" Add the hugetlb cgroup pointer to 3rd page lru.next. This limit the usage to hugetlb cgroup to only hugepages with 3 or more normal pages. I guess that is an acceptable limitation. Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb_cgroup.h | 37 +++++++++++++++++++++++++++++++++++++ mm/hugetlb.c | 4 ++++ 2 files changed, 41 insertions(+) diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index e9944b4..2e4cb6b 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -18,8 +18,34 @@ #include struct hugetlb_cgroup; +/* + * Minimum page order trackable by hugetlb cgroup. + * At least 3 pages are necessary for all the tracking information. 
+ */ +#define HUGETLB_CGROUP_MIN_ORDER 2 #ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR + +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) +{ + VM_BUG_ON(!PageHuge(page)); + + if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) + return NULL; + return (struct hugetlb_cgroup *)page[2].lru.next; +} + +static inline +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) +{ + VM_BUG_ON(!PageHuge(page)); + + if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) + return -1; + page[2].lru.next = (void *)h_cg; + return 0; +} + static inline bool hugetlb_cgroup_disabled(void) { if (hugetlb_subsys.disabled) @@ -28,6 +54,17 @@ static inline bool hugetlb_cgroup_disabled(void) } #else +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) +{ + return NULL; +} + +static inline +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) +{ + return 0; +} + static inline bool hugetlb_cgroup_disabled(void) { return true; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e899a2d..6a449c5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -28,6 +28,7 @@ #include #include +#include #include #include "internal.h" @@ -591,6 +592,7 @@ static void update_and_free_page(struct hstate *h, struct page *page) 1 << PG_active | 1 << PG_reserved | 1 << PG_private | 1 << PG_writeback); } + VM_BUG_ON(hugetlb_cgroup_from_page(page)); set_compound_page_dtor(page, NULL); set_page_refcounted(page); arch_release_hugepage(page); @@ -643,6 +645,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) INIT_LIST_HEAD(&page->lru); set_compound_page_dtor(page, free_huge_page); spin_lock(&hugetlb_lock); + set_hugetlb_cgroup(page, NULL); h->nr_huge_pages++; h->nr_huge_pages_node[nid]++; spin_unlock(&hugetlb_lock); @@ -892,6 +895,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) INIT_LIST_HEAD(&page->lru); r_nid = page_to_nid(page); set_compound_page_dtor(page, free_huge_page); + 
set_hugetlb_cgroup(page, NULL); /* * We incremented the global counters already */ -- 1.7.10 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages Date: Wed, 13 Jun 2012 16:59:23 +0200 Message-ID: <20120613145923.GA14777@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1339583254-895-5-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Use a mmu_gather instead of a temporary linked list for accumulating > pages when we unmap a hugepage range Sorry for coming up with the comment that late but you owe us an explanation _why_ you are doing this. I assume that this fixes a real problem when we take i_mmap_mutex already up in unmap_mapping_range mutex_lock(&mapping->i_mmap_mutex); unmap_mapping_range_tree | unmap_mapping_range_list unmap_mapping_range_vma zap_page_range_single unmap_single_vma unmap_hugepage_range mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); And that this should have been marked for stable as well (I haven't checked when this has been introduced). 
But then I do not see how this help when you still do this: [...] > diff --git a/mm/memory.c b/mm/memory.c > index 1b7dc66..545e18a 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb, > * Since no pte has actually been setup, it is > * safe to do nothing in this case. > */ > - if (vma->vm_file) > - unmap_hugepage_range(vma, start, end, NULL); > + if (vma->vm_file) { > + mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); > + __unmap_hugepage_range(tlb, vma, start, end, NULL); > + mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex); > + } > } else > unmap_page_range(tlb, vma, start, end, details); > } -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages Date: Wed, 13 Jun 2012 17:03:38 +0200 Message-ID: <20120613150338.GB14777@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120613145923.GA14777@tiehlicka.suse.cz> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <20120613145923.GA14777-VqjxzfR4DlwKmadIfiO5sKVXKuFTiq87@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Wed 13-06-12 16:59:23, Michal Hocko 
wrote: > On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote: > > From: "Aneesh Kumar K.V" > > > > Use a mmu_gather instead of a temporary linked list for accumulating > > pages when we unmap a hugepage range > > Sorry for coming up with the comment that late but you owe us an > explanation _why_ you are doing this. > > I assume that this fixes a real problem when we take i_mmap_mutex > already up in > unmap_mapping_range > mutex_lock(&mapping->i_mmap_mutex); > unmap_mapping_range_tree | unmap_mapping_range_list > unmap_mapping_range_vma > zap_page_range_single > unmap_single_vma > unmap_hugepage_range > mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); > > And that this should have been marked for stable as well (I haven't > checked when this has been introduced). > > But then I do not see how this help when you still do this: > [...] > > diff --git a/mm/memory.c b/mm/memory.c > > index 1b7dc66..545e18a 100644 > > --- a/mm/memory.c > > +++ b/mm/memory.c > > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb, > > * Since no pte has actually been setup, it is > > * safe to do nothing in this case. > > */ > > - if (vma->vm_file) > > - unmap_hugepage_range(vma, start, end, NULL); > > + if (vma->vm_file) { > > + mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); > > + __unmap_hugepage_range(tlb, vma, start, end, NULL); > > + mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex); > > + } > > } else > > unmap_page_range(tlb, vma, start, end, details); > > } Ahhh, you are removing the lock in the next patch. Really confusing and not nice for the stable backport. Could you merge those two patches and add Cc: stable? Then you can add my Reviewed-by: Michal Hocko -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages Date: Wed, 13 Jun 2012 22:07:06 +0530 Message-ID: <871uljnp71.fsf@skywalker.in.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120613145923.GA14777@tiehlicka.suse.cz> Mime-Version: 1.0 Return-path: In-Reply-To: <20120613145923.GA14777-VqjxzfR4DlwKmadIfiO5sKVXKuFTiq87@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Michal Hocko Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Michal Hocko writes: > On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> Use a mmu_gather instead of a temporary linked list for accumulating >> pages when we unmap a hugepage range > > Sorry for coming up with the comment that late but you owe us an > explanation _why_ you are doing this. > > I assume that this fixes a real problem when we take i_mmap_mutex > already up in > unmap_mapping_range > mutex_lock(&mapping->i_mmap_mutex); > unmap_mapping_range_tree | unmap_mapping_range_list > unmap_mapping_range_vma > zap_page_range_single > unmap_single_vma > unmap_hugepage_range > mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); > > And that this should have been marked for stable as well (I haven't > checked when this has been introduced). 
The switch to mmu_gather gets rid of the use of page->lru in unmap_hugepage_range, so that I can use page->lru for the active list. -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages Date: Wed, 13 Jun 2012 22:13:00 +0530 Message-ID: <87y5nrmacr.fsf@skywalker.in.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120613145923.GA14777@tiehlicka.suse.cz> <20120613150338.GB14777@tiehlicka.suse.cz> Mime-Version: 1.0 Return-path: In-Reply-To: <20120613150338.GB14777@tiehlicka.suse.cz> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Michal Hocko Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Michal Hocko writes: > On Wed 13-06-12 16:59:23, Michal Hocko wrote: >> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote: >> > From: "Aneesh Kumar K.V" >> > >> > Use a mmu_gather instead of a temporary linked list for accumulating >> > pages when we unmap a hugepage range >> >> Sorry for coming up with the comment that late but you owe us an >> explanation _why_ you are doing this. >> >> I assume that this fixes a real problem when we take i_mmap_mutex >> already up in >> unmap_mapping_range >> mutex_lock(&mapping->i_mmap_mutex); >> unmap_mapping_range_tree | unmap_mapping_range_list >> unmap_mapping_range_vma >> zap_page_range_single >> unmap_single_vma >> unmap_hugepage_range >> mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); >> >> And that this should have been marked for stable as well (I haven't >> checked when this has been introduced). >> >> But then I do not see how this help when you still do this: >> [...] 
>> > diff --git a/mm/memory.c b/mm/memory.c >> > index 1b7dc66..545e18a 100644 >> > --- a/mm/memory.c >> > +++ b/mm/memory.c >> > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb, >> > * Since no pte has actually been setup, it is >> > * safe to do nothing in this case. >> > */ >> > - if (vma->vm_file) >> > - unmap_hugepage_range(vma, start, end, NULL); >> > + if (vma->vm_file) { >> > + mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); >> > + __unmap_hugepage_range(tlb, vma, start, end, NULL); >> > + mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex); >> > + } >> > } else >> > unmap_page_range(tlb, vma, start, end, details); >> > } > > Ahhh, you are removing the lock in the next patch. Really confusing and > not nice for the stable backport. > Could you merge those two patches and add Cc: stable? > Then you can add my > Reviewed-by: Michal Hocko > In the last review cycle I was asked to see if we can get a lockdep report for the above and what I found was we don't really cause the above deadlock with the current codebase because for hugetlb we don't directly call unmap_mapping_range. But still it is good to remove the i_mmap_mutex, because we don't need that protection now. I didn't mark it for stable because of the above reason. -aneesh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
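The nested acquisition discussed in this sub-thread can be illustrated with a small user-space model. It is a sketch under assumptions, not kernel code: `toy_mutex` is a hypothetical non-recursive lock that reports `-EDEADLK` where a real mutex would block forever, `outer_unmap()` plays the role of a caller that already holds i_mmap_mutex, and `inner_unmap()` plays the role of unmap_single_vma() with and without the inner lock. As noted above, the real hugetlb code does not currently reach this chain, so the model shows only the hypothetical case.

```c
#include <errno.h>

/* Toy non-recursive lock: it only records whether it is held. */
struct toy_mutex { int held; };

static int toy_lock(struct toy_mutex *m)
{
	if (m->held)
		return -EDEADLK;	/* a real mutex would hang here */
	m->held = 1;
	return 0;
}

static void toy_unlock(struct toy_mutex *m)
{
	m->held = 0;
}

/*
 * Models the inner unmap path. take_lock = 1 is the variant that takes
 * the mutex itself; take_lock = 0 is the variant that relies on callers.
 */
static int inner_unmap(struct toy_mutex *i_mmap, int take_lock)
{
	if (take_lock) {
		int ret = toy_lock(i_mmap);

		if (ret)
			return ret;
	}
	/* ... the actual unmapping work would happen here ... */
	if (take_lock)
		toy_unlock(i_mmap);
	return 0;
}

/* Models an outer caller that holds the mutex across the whole chain. */
static int outer_unmap(struct toy_mutex *i_mmap, int inner_takes_lock)
{
	int ret = toy_lock(i_mmap);

	if (ret)
		return ret;
	ret = inner_unmap(i_mmap, inner_takes_lock);
	toy_unlock(i_mmap);
	return ret;
}

/* Convenience wrapper: run the chain once on a fresh lock. */
int nested_unmap_result(int inner_takes_lock)
{
	struct toy_mutex i_mmap = { 0 };

	return outer_unmap(&i_mmap, inner_takes_lock);
}
```

With the inner lock in place the chain fails as soon as the outer caller holds the same mutex; dropping the inner lock lets the same chain complete.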
From mboxrd@z Thu Jan 1 00:00:00 1970 From: Kamezawa Hiroyuki Subject: Re: [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb Date: Thu, 14 Jun 2012 12:09:28 +0900 Message-ID: <4FD955E8.5050100@jp.fujitsu.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1339583254-895-6-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (2012/06/13 19:27), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > i_mmap_mutex lock was added in unmap_single_vma by 502717f4e ("hugetlb: > fix linked list corruption in unmap_hugepage_range()") but we don't use > page->lru in unmap_hugepage_range any more. Also the lock was taken > higher up in the stack in some code path. That would result in deadlock. > > unmap_mapping_range (i_mmap_mutex) > -> unmap_mapping_range_tree > -> unmap_mapping_range_vma > -> zap_page_range_single > -> unmap_single_vma > -> unmap_hugepage_range (i_mmap_mutex) > > For shared pagetable support for huge pages, since pagetable pages are ref > counted we don't need any lock during huge_pmd_unshare. We do take > i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in mapping. > (39dde65c9940c97f ("shared page table for hugetlb page")). 
> > Signed-off-by: Aneesh Kumar K.V Acked-by: KAMEZAWA Hiroyuki From mboxrd@z Thu Jan 1 00:00:00 1970 From: Kamezawa Hiroyuki Subject: Re: [PATCH -V9 08/15] hugetlb: Make some static variables global Date: Thu, 14 Jun 2012 12:11:33 +0900 Message-ID: <4FD95665.5050300@jp.fujitsu.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1339583254-895-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org (2012/06/13 19:27), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > We will use them later in hugetlb_cgroup.c > > Signed-off-by: Aneesh Kumar K.V Acked-by: KAMEZAWA Hiroyuki -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
From mboxrd@z Thu Jan 1 00:00:00 1970 From: Kamezawa Hiroyuki Subject: Re: [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru Date: Thu, 14 Jun 2012 13:04:36 +0900 Message-ID: <4FD962D4.1020908@jp.fujitsu.com> References: <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339587270-5831-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1339587270-5831-1-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (2012/06/13 20:34), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Add the hugetlb cgroup pointer to 3rd page lru.next. This limit > the usage to hugetlb cgroup to only hugepages with 3 or more > normal pages. I guess that is an acceptable limitation. 
> > Signed-off-by: Aneesh Kumar K.V Acked-by: KAMEZAWA Hiroyuki From mboxrd@z Thu Jan 1 00:00:00 1970 From: Kamezawa Hiroyuki Subject: Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup Date: Thu, 14 Jun 2012 13:07:12 +0900 Message-ID: <4FD96370.2020708@jp.fujitsu.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1339583254-895-12-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (2012/06/13 19:27), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This patchset add the charge and uncharge routines for hugetlb cgroup. > We do cgroup charging in page alloc and uncharge in compound page > destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock. 
> > Signed-off-by: Aneesh Kumar K.V Acked-by: KAMEZAWA Hiroyuki From mboxrd@z Thu Jan 1 00:00:00 1970 From: Kamezawa Hiroyuki Subject: Re: [PATCH -V9 12/15] hugetlb/cgroup: Add support for cgroup removal Date: Thu, 14 Jun 2012 13:09:05 +0900 Message-ID: <4FD963E1.6080506@jp.fujitsu.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-13-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1339583254-895-13-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (2012/06/13 19:27), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This patch add support for cgroup removal. If we don't have parent > cgroup, the charges are moved to root cgroup. 
> > Signed-off-by: Aneesh Kumar K.V Acked-by: KAMEZAWA Hiroyuki From mboxrd@z Thu Jan 1 00:00:00 1970 From: Kamezawa Hiroyuki Subject: Re: [PATCH -V9 13/15] hugetlb/cgroup: add hugetlb cgroup control files Date: Thu, 14 Jun 2012 13:10:42 +0900 Message-ID: <4FD96442.3040509@jp.fujitsu.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-14-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1339583254-895-14-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org (2012/06/13 19:27), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Add the control files for hugetlb controller > > Signed-off-by: Aneesh Kumar K.V Acked-by: KAMEZAWA Hiroyuki From mboxrd@z Thu Jan 1 00:00:00 1970 From: Kamezawa Hiroyuki Subject: Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration Date: Thu, 14 Jun 2012 13:13:17 +0900 Message-ID: <4FD964DD.6060802@jp.fujitsu.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1339583254-895-15-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, 
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (2012/06/13 19:27), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor. Since > we are holding a hugepage reference, we can be sure that old page won't > get uncharged till the last put_page(). > > Signed-off-by: Aneesh Kumar K.V Acked-by: KAMEZAWA Hiroyuki From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages Date: Thu, 14 Jun 2012 09:14:23 +0200 Message-ID: <20120614071423.GA27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120613145923.GA14777@tiehlicka.suse.cz> <20120613150338.GB14777@tiehlicka.suse.cz> <87y5nrmacr.fsf@skywalker.in.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <87y5nrmacr.fsf-6yE53ggjAfyqSkle7U1LjlaTQe2KTcn/@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Wed 13-06-12 22:13:00, Aneesh Kumar K.V wrote: > Michal Hocko writes: > > > On Wed 13-06-12 16:59:23, Michal Hocko wrote: > >> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote: > >> > From: 
"Aneesh Kumar K.V" > >> > > >> > Use a mmu_gather instead of a temporary linked list for accumulating > >> > pages when we unmap a hugepage range > >> > >> Sorry for coming up with the comment that late but you owe us an > >> explanation _why_ you are doing this. > >> > >> I assume that this fixes a real problem when we take i_mmap_mutex > >> already up in > >> unmap_mapping_range > >> mutex_lock(&mapping->i_mmap_mutex); > >> unmap_mapping_range_tree | unmap_mapping_range_list > >> unmap_mapping_range_vma > >> zap_page_range_single > >> unmap_single_vma > >> unmap_hugepage_range > >> mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); > >> > >> And that this should have been marked for stable as well (I haven't > >> checked when this has been introduced). > >> > >> But then I do not see how this help when you still do this: > >> [...] > >> > diff --git a/mm/memory.c b/mm/memory.c > >> > index 1b7dc66..545e18a 100644 > >> > --- a/mm/memory.c > >> > +++ b/mm/memory.c > >> > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb, > >> > * Since no pte has actually been setup, it is > >> > * safe to do nothing in this case. > >> > */ > >> > - if (vma->vm_file) > >> > - unmap_hugepage_range(vma, start, end, NULL); > >> > + if (vma->vm_file) { > >> > + mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); > >> > + __unmap_hugepage_range(tlb, vma, start, end, NULL); > >> > + mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex); > >> > + } > >> > } else > >> > unmap_page_range(tlb, vma, start, end, details); > >> > } > > > > Ahhh, you are removing the lock in the next patch. Really confusing and > > not nice for the stable backport. > > Could you merge those two patches and add Cc: stable? 
> > Then you can add my > > Reviewed-by: Michal Hocko > > > > In the last review cycle I was asked to see if we can get a lockdep > report for the above and what I found was we don't really cause the > above deadlock with the current codebase because for hugetlb we don't > directly call unmap_mapping_range. Ahh, ok I missed that. > But still it is good to remove the i_mmap_mutex, because we don't need > that protection now. I didn't mark it for stable because of the above > reason. Thanks for clarification > > -aneesh > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages Date: Thu, 14 Jun 2012 09:16:37 +0200 Message-ID: <20120614071637.GB27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120613145923.GA14777@tiehlicka.suse.cz> <871uljnp71.fsf@skywalker.in.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <871uljnp71.fsf-6yE53ggjAfyqSkle7U1LjlaTQe2KTcn/@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Wed 13-06-12 22:07:06, Aneesh Kumar K.V wrote: > Michal Hocko writes: > > > On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote: > >> From: "Aneesh Kumar K.V" > >> > 
>> Use a mmu_gather instead of a temporary linked list for accumulating > >> pages when we unmap a hugepage range > > > > Sorry for coming up with the comment that late but you owe us an > > explanation _why_ you are doing this. > > > > I assume that this fixes a real problem when we take i_mmap_mutex > > already up in > > unmap_mapping_range > > mutex_lock(&mapping->i_mmap_mutex); > > unmap_mapping_range_tree | unmap_mapping_range_list > > unmap_mapping_range_vma > > zap_page_range_single > > unmap_single_vma > > unmap_hugepage_range > > mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); > > > > And that this should have been marked for stable as well (I haven't > > checked when this has been introduced). > > Switch to mmu_gather is to get rid of the use of page->lru so that i can use it for > active list. So can we get this to the changelog please? -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb Date: Thu, 14 Jun 2012 09:20:53 +0200 Message-ID: <20120614072053.GC27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1339583254-895-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Wed 13-06-12 15:57:24, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > i_mmap_mutex lock was added in unmap_single_vma by 502717f4e ("hugetlb: > fix 
linked list corruption in unmap_hugepage_range()") but we don't use > page->lru in unmap_hugepage_range any more. Also the lock was taken > higher up in the stack in some code path. That would result in deadlock. This sounds like the deadlock is real but in the other email you wrote that the deadlock cannot happen so it would be good to mention it here. > unmap_mapping_range (i_mmap_mutex) > -> unmap_mapping_range_tree > -> unmap_mapping_range_vma > -> zap_page_range_single > -> unmap_single_vma > -> unmap_hugepage_range (i_mmap_mutex) > > For shared pagetable support for huge pages, since pagetable pages are ref > counted we don't need any lock during huge_pmd_unshare. We do take > i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in mapping. > (39dde65c9940c97f ("shared page table for hugetlb page")). > > Signed-off-by: Aneesh Kumar K.V > --- > mm/memory.c | 5 +---- > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 545e18a..f6bc04f 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1326,11 +1326,8 @@ static void unmap_single_vma(struct mmu_gather *tlb, > * Since no pte has actually been setup, it is > * safe to do nothing in this case. > */ > - if (vma->vm_file) { > - mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); > + if (vma->vm_file) > __unmap_hugepage_range(tlb, vma, start, end, NULL); > - mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex); > - } > } else > unmap_page_range(tlb, vma, start, end, details); > } > -- > 1.7.10 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V9 06/15] hugetlb: simplify migrate_huge_page() Date: Thu, 14 Jun 2012 09:28:31 +0200 Message-ID: <20120614072831.GD27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1339583254-895-7-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Wed 13-06-12 15:57:25, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Since we migrate only one hugepage, don't use linked list for passing the > page around. Directly pass the page that need to be migrated as argument. > This also remove the usage page->lru in migrate path. > > Reviewed-by: KAMEZAWA Hiroyuki > Signed-off-by: Aneesh Kumar K.V Yes nice. 
Reviewed-by: Michal Hocko > --- > include/linux/migrate.h | 4 +-- > mm/memory-failure.c | 13 ++-------- > mm/migrate.c | 65 +++++++++++++++-------------------------------- > 3 files changed, 25 insertions(+), 57 deletions(-) > > diff --git a/include/linux/migrate.h b/include/linux/migrate.h > index 855c337..ce7e667 100644 > --- a/include/linux/migrate.h > +++ b/include/linux/migrate.h > @@ -15,7 +15,7 @@ extern int migrate_page(struct address_space *, > extern int migrate_pages(struct list_head *l, new_page_t x, > unsigned long private, bool offlining, > enum migrate_mode mode); > -extern int migrate_huge_pages(struct list_head *l, new_page_t x, > +extern int migrate_huge_page(struct page *, new_page_t x, > unsigned long private, bool offlining, > enum migrate_mode mode); > > @@ -36,7 +36,7 @@ static inline void putback_lru_pages(struct list_head *l) {} > static inline int migrate_pages(struct list_head *l, new_page_t x, > unsigned long private, bool offlining, > enum migrate_mode mode) { return -ENOSYS; } > -static inline int migrate_huge_pages(struct list_head *l, new_page_t x, > +static inline int migrate_huge_page(struct page *page, new_page_t x, > unsigned long private, bool offlining, > enum migrate_mode mode) { return -ENOSYS; } > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index ab1e714..53a1495 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1414,7 +1414,6 @@ static int soft_offline_huge_page(struct page *page, int flags) > int ret; > unsigned long pfn = page_to_pfn(page); > struct page *hpage = compound_head(page); > - LIST_HEAD(pagelist); > > ret = get_any_page(page, pfn, flags); > if (ret < 0) > @@ -1429,19 +1428,11 @@ static int soft_offline_huge_page(struct page *page, int flags) > } > > /* Keep page count to indicate a given hugepage is isolated. 
*/ > - > - list_add(&hpage->lru, &pagelist); > - ret = migrate_huge_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0, > - true); > + ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, 0, true); > + put_page(hpage); > if (ret) { > - struct page *page1, *page2; > - list_for_each_entry_safe(page1, page2, &pagelist, lru) > - put_page(page1); > - > pr_info("soft offline: %#lx: migration failed %d, type %lx\n", > pfn, ret, page->flags); > - if (ret > 0) > - ret = -EIO; > return ret; > } > done: > diff --git a/mm/migrate.c b/mm/migrate.c > index be26d5c..fdce3a2 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -932,15 +932,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, > if (anon_vma) > put_anon_vma(anon_vma); > unlock_page(hpage); > - > out: > - if (rc != -EAGAIN) { > - list_del(&hpage->lru); > - put_page(hpage); > - } > - > put_page(new_hpage); > - > if (result) { > if (rc) > *result = rc; > @@ -1016,48 +1009,32 @@ out: > return nr_failed + retry; > } > > -int migrate_huge_pages(struct list_head *from, > - new_page_t get_new_page, unsigned long private, bool offlining, > - enum migrate_mode mode) > +int migrate_huge_page(struct page *hpage, new_page_t get_new_page, > + unsigned long private, bool offlining, > + enum migrate_mode mode) > { > - int retry = 1; > - int nr_failed = 0; > - int pass = 0; > - struct page *page; > - struct page *page2; > - int rc; > - > - for (pass = 0; pass < 10 && retry; pass++) { > - retry = 0; > - > - list_for_each_entry_safe(page, page2, from, lru) { > + int pass, rc; > + > + for (pass = 0; pass < 10; pass++) { > + rc = unmap_and_move_huge_page(get_new_page, > + private, hpage, pass > 2, offlining, > + mode); > + switch (rc) { > + case -ENOMEM: > + goto out; > + case -EAGAIN: > + /* try again */ > cond_resched(); > - > - rc = unmap_and_move_huge_page(get_new_page, > - private, page, pass > 2, offlining, > - mode); > - > - switch(rc) { > - case -ENOMEM: > - goto out; > - case -EAGAIN: > - retry++; > - break; > - 
case 0: > - break; > - default: > - /* Permanent failure */ > - nr_failed++; > - break; > - } > + break; > + case 0: > + goto out; > + default: > + rc = -EIO; > + goto out; > } > } > - rc = 0; > out: > - if (rc) > - return rc; > - > - return nr_failed + retry; > + return rc; > } > > #ifdef CONFIG_NUMA > -- > 1.7.10 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V9 07/15] hugetlb: add a list for tracking in-use HugeTLB pages Date: Thu, 14 Jun 2012 09:33:20 +0200 Message-ID: <20120614073320.GE27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1339583254-895-8-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Wed 13-06-12 15:57:26, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > hugepage_activelist will be used to track currently used HugeTLB pages. > We need to find the in-use HugeTLB pages to support HugeTLB cgroup removal. > On cgroup removal we update the page's HugeTLB cgroup to point to parent > cgroup. 
> > Reviewed-by: KAMEZAWA Hiroyuki > Signed-off-by: Aneesh Kumar K.V Reviewed-by: Michal Hocko > --- > include/linux/hugetlb.h | 1 + > mm/hugetlb.c | 12 +++++++----- > 2 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 0f23c18..ed550d8 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -211,6 +211,7 @@ struct hstate { > unsigned long resv_huge_pages; > unsigned long surplus_huge_pages; > unsigned long nr_overcommit_huge_pages; > + struct list_head hugepage_activelist; > struct list_head hugepage_freelists[MAX_NUMNODES]; > unsigned int nr_huge_pages_node[MAX_NUMNODES]; > unsigned int free_huge_pages_node[MAX_NUMNODES]; > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index e54b695..b5b6e15 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -510,7 +510,7 @@ void copy_huge_page(struct page *dst, struct page *src) > static void enqueue_huge_page(struct hstate *h, struct page *page) > { > int nid = page_to_nid(page); > - list_add(&page->lru, &h->hugepage_freelists[nid]); > + list_move(&page->lru, &h->hugepage_freelists[nid]); > h->free_huge_pages++; > h->free_huge_pages_node[nid]++; > } > @@ -522,7 +522,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid) > if (list_empty(&h->hugepage_freelists[nid])) > return NULL; > page = list_entry(h->hugepage_freelists[nid].next, struct page, lru); > - list_del(&page->lru); > + list_move(&page->lru, &h->hugepage_activelist); > set_page_refcounted(page); > h->free_huge_pages--; > h->free_huge_pages_node[nid]--; > @@ -626,10 +626,11 @@ static void free_huge_page(struct page *page) > page->mapping = NULL; > BUG_ON(page_count(page)); > BUG_ON(page_mapcount(page)); > - INIT_LIST_HEAD(&page->lru); > > spin_lock(&hugetlb_lock); > if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { > + /* remove the page from active list */ > + list_del(&page->lru); > update_and_free_page(h, page); > 
h->surplus_huge_pages--; > h->surplus_huge_pages_node[nid]--; > @@ -642,6 +643,7 @@ static void free_huge_page(struct page *page) > > static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) > { > + INIT_LIST_HEAD(&page->lru); > set_compound_page_dtor(page, free_huge_page); > spin_lock(&hugetlb_lock); > h->nr_huge_pages++; > @@ -890,6 +892,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) > > spin_lock(&hugetlb_lock); > if (page) { > + INIT_LIST_HEAD(&page->lru); > r_nid = page_to_nid(page); > set_compound_page_dtor(page, free_huge_page); > /* > @@ -994,7 +997,6 @@ retry: > list_for_each_entry_safe(page, tmp, &surplus_list, lru) { > if ((--needed) < 0) > break; > - list_del(&page->lru); > /* > * This page is now managed by the hugetlb allocator and has > * no users -- drop the buddy allocator's reference. > @@ -1009,7 +1011,6 @@ free: > /* Free unnecessary surplus pages to the buddy allocator */ > if (!list_empty(&surplus_list)) { > list_for_each_entry_safe(page, tmp, &surplus_list, lru) { > - list_del(&page->lru); > put_page(page); > } > } > @@ -1909,6 +1910,7 @@ void __init hugetlb_add_hstate(unsigned order) > h->free_huge_pages = 0; > for (i = 0; i < MAX_NUMNODES; ++i) > INIT_LIST_HEAD(&h->hugepage_freelists[i]); > + INIT_LIST_HEAD(&h->hugepage_activelist); > h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]); > h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); > snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", > -- > 1.7.10 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V9 08/15] hugetlb: Make some static variables global Date: Thu, 14 Jun 2012 09:38:00 +0200 Message-ID: <20120614073800.GF27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1339583254-895-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Wed 13-06-12 15:57:27, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > We will use them later in hugetlb_cgroup.c > > Signed-off-by: Aneesh Kumar K.V Reviewed-by: Michal Hocko Just a nit [...] > +extern int hugetlb_max_hstate; Maybe we can mark it __read_mostly as it is modified only during initialization and then it is just a constant. -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic
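The free-list/active-list handling in the 07/15 patch quoted earlier relies on list_move(), which unlinks a node before re-adding it; that only works on a page that is not yet on any list if its lru was initialized first, which appears to be why INIT_LIST_HEAD() moves into prep_new_huge_page(). A minimal userspace sketch of that behaviour, assuming a simplified re-implementation of the list helpers and hypothetical toy_page/toy_pool types (none of this is the patch's code):

```c
#include <assert.h>

/* Minimal doubly linked list, modeled on the kernel's list_head (simplified). */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *l) { l->next = l->prev = l; }

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Unlink the node from wherever it is, then add it to 'head'. */
static void list_move(struct list_head *n, struct list_head *head)
{
	list_del(n);
	list_add(n, head);
}

static int list_empty(const struct list_head *head) { return head->next == head; }

/* Hypothetical stand-ins for struct page and struct hstate. */
struct toy_page { struct list_head lru; };
struct toy_pool { struct list_head freelist, activelist; };

static int demo_active_list(void)
{
	struct toy_pool pool;
	struct toy_page page;

	INIT_LIST_HEAD(&pool.freelist);
	INIT_LIST_HEAD(&pool.activelist);
	INIT_LIST_HEAD(&page.lru);	/* self-linked, so the first list_move is safe */

	list_move(&page.lru, &pool.freelist);	/* analogous to enqueue_huge_page() */
	assert(!list_empty(&pool.freelist) && list_empty(&pool.activelist));

	list_move(&page.lru, &pool.activelist);	/* analogous to dequeue_huge_page_node() */
	assert(list_empty(&pool.freelist) && !list_empty(&pool.activelist));

	list_move(&page.lru, &pool.freelist);	/* page freed: back on the free list */
	return list_empty(&pool.activelist) && pool.freelist.next == &page.lru;
}
```

Calling demo_active_list() walks one page through free, active and free again; the locking (hugetlb_lock) and the per-node free lists of the real code are left out.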
From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup Date: Thu, 14 Jun 2012 10:24:06 +0200 Message-ID: <20120614082406.GG27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1339583254-895-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Wed 13-06-12 15:57:28, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This patch implements a new controller that allows us to control HugeTLB > allocations. The extension allows to limit the HugeTLB usage per control > group and enforces the controller limit during page fault. Since HugeTLB > doesn't support page reclaim, enforcing the limit at page fault time implies > that, the application will get SIGBUS signal if it tries to access HugeTLB > pages beyond its limit. This requires the application to know beforehand > how much HugeTLB pages it would require for its use. > > The charge/uncharge calls will be added to HugeTLB code in later patch. > Support for cgroup removal will be added in later patches. 
> > Reviewed-by: KAMEZAWA Hiroyuki > Signed-off-by: Aneesh Kumar K.V Looks good Reviewed-by: Michal Hocko > --- > include/linux/cgroup_subsys.h | 6 ++ > include/linux/hugetlb_cgroup.h | 37 ++++++++++++ > init/Kconfig | 15 +++++ > mm/Makefile | 1 + > mm/hugetlb_cgroup.c | 122 ++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 181 insertions(+) > create mode 100644 include/linux/hugetlb_cgroup.h > create mode 100644 mm/hugetlb_cgroup.c > > diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h > index 0bd390c..895923a 100644 > --- a/include/linux/cgroup_subsys.h > +++ b/include/linux/cgroup_subsys.h > @@ -72,3 +72,9 @@ SUBSYS(net_prio) > #endif > > /* */ > + > +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR > +SUBSYS(hugetlb) > +#endif > + > +/* */ > diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h > new file mode 100644 > index 0000000..e9944b4 > --- /dev/null > +++ b/include/linux/hugetlb_cgroup.h > @@ -0,0 +1,37 @@ > +/* > + * Copyright IBM Corporation, 2012 > + * Author Aneesh Kumar K.V > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of version 2.1 of the GNU Lesser General Public License > + * as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it would be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
> + * > + */ > + > +#ifndef _LINUX_HUGETLB_CGROUP_H > +#define _LINUX_HUGETLB_CGROUP_H > + > +#include > + > +struct hugetlb_cgroup; > + > +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR > +static inline bool hugetlb_cgroup_disabled(void) > +{ > + if (hugetlb_subsys.disabled) > + return true; > + return false; > +} > + > +#else > +static inline bool hugetlb_cgroup_disabled(void) > +{ > + return true; > +} > + > +#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ > +#endif > diff --git a/init/Kconfig b/init/Kconfig > index d07dcf9..da05fae 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -751,6 +751,21 @@ config CGROUP_MEM_RES_CTLR_KMEM > the kmem extension can use it to guarantee that no group of processes > will ever exhaust kernel resources alone. > > +config CGROUP_HUGETLB_RES_CTLR > + bool "HugeTLB Resource Controller for Control Groups" > + depends on RESOURCE_COUNTERS && HUGETLB_PAGE && EXPERIMENTAL > + default n > + help > + Provides a cgroup Resource Controller for HugeTLB pages. > + When you enable this, you can put a per cgroup limit on HugeTLB usage. > + The limit is enforced during page fault. Since HugeTLB doesn't > + support page reclaim, enforcing the limit at page fault time implies > + that, the application will get SIGBUS signal if it tries to access > + HugeTLB pages beyond its limit. This requires the application to know > + beforehand how much HugeTLB pages it would require for its use. The > + control group is tracked in the third page lru pointer. This means > + that we cannot use the controller with huge page less than 3 pages. 
> + > config CGROUP_PERF > bool "Enable perf_event per-cpu per-container group (cgroup) monitoring" > depends on PERF_EVENTS && CGROUPS > diff --git a/mm/Makefile b/mm/Makefile > index 2e2fbbe..25e8002 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -49,6 +49,7 @@ obj-$(CONFIG_MIGRATION) += migrate.o > obj-$(CONFIG_QUICKLIST) += quicklist.o > obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o > obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o > +obj-$(CONFIG_CGROUP_HUGETLB_RES_CTLR) += hugetlb_cgroup.o > obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o > obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o > obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o > diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c > new file mode 100644 > index 0000000..5a4e71c > --- /dev/null > +++ b/mm/hugetlb_cgroup.c > @@ -0,0 +1,122 @@ > +/* > + * > + * Copyright IBM Corporation, 2012 > + * Author Aneesh Kumar K.V > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of version 2.1 of the GNU Lesser General Public License > + * as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it would be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > + * > + */ > + > +#include > +#include > +#include > +#include > + > +struct hugetlb_cgroup { > + struct cgroup_subsys_state css; > + /* > + * the counter to account for hugepages from hugetlb. 
> + */ > + struct res_counter hugepage[HUGE_MAX_HSTATE]; > +}; > + > +struct cgroup_subsys hugetlb_subsys __read_mostly; > +struct hugetlb_cgroup *root_h_cgroup __read_mostly; > + > +static inline > +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s) > +{ > + if (s) > + return container_of(s, struct hugetlb_cgroup, css); > + return NULL; > +} > + > +static inline > +struct hugetlb_cgroup *hugetlb_cgroup_from_cgroup(struct cgroup *cgroup) > +{ > + return hugetlb_cgroup_from_css(cgroup_subsys_state(cgroup, > + hugetlb_subsys_id)); > +} > + > +static inline > +struct hugetlb_cgroup *hugetlb_cgroup_from_task(struct task_struct *task) > +{ > + return hugetlb_cgroup_from_css(task_subsys_state(task, > + hugetlb_subsys_id)); > +} > + > +static inline bool hugetlb_cgroup_is_root(struct hugetlb_cgroup *h_cg) > +{ > + return (h_cg == root_h_cgroup); > +} > + > +static inline struct hugetlb_cgroup *parent_hugetlb_cgroup(struct cgroup *cg) > +{ > + if (!cg->parent) > + return NULL; > + return hugetlb_cgroup_from_cgroup(cg->parent); > +} > + > +static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg) > +{ > + int idx; > + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cg); > + > + for (idx = 0; idx < hugetlb_max_hstate; idx++) { > + if ((res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE)) > 0) > + return true; > + } > + return false; > +} > + > +static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup) > +{ > + int idx; > + struct cgroup *parent_cgroup; > + struct hugetlb_cgroup *h_cgroup, *parent_h_cgroup; > + > + h_cgroup = kzalloc(sizeof(*h_cgroup), GFP_KERNEL); > + if (!h_cgroup) > + return ERR_PTR(-ENOMEM); > + > + parent_cgroup = cgroup->parent; > + if (parent_cgroup) { > + parent_h_cgroup = hugetlb_cgroup_from_cgroup(parent_cgroup); > + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) > + res_counter_init(&h_cgroup->hugepage[idx], > + &parent_h_cgroup->hugepage[idx]); > + } else { > + root_h_cgroup = 
h_cgroup; > + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) > + res_counter_init(&h_cgroup->hugepage[idx], NULL); > + } > + return &h_cgroup->css; > +} > + > +static void hugetlb_cgroup_destroy(struct cgroup *cgroup) > +{ > + struct hugetlb_cgroup *h_cgroup; > + > + h_cgroup = hugetlb_cgroup_from_cgroup(cgroup); > + kfree(h_cgroup); > +} > + > +static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup) > +{ > + /* We will add the cgroup removal support in later patches */ > + return -EBUSY; > +} > + > +struct cgroup_subsys hugetlb_subsys = { > + .name = "hugetlb", > + .create = hugetlb_cgroup_create, > + .pre_destroy = hugetlb_cgroup_pre_destroy, > + .destroy = hugetlb_cgroup_destroy, > + .subsys_id = hugetlb_subsys_id, > +}; > -- > 1.7.10 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic
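In hugetlb_cgroup_create() above, each per-hstate counter is initialized with the parent group's counter as its parent, so a charge against a child group is also accounted in every ancestor. A minimal userspace sketch of that idea, assuming a toy struct counter with hypothetical counter_init()/counter_charge()/counter_uncharge() helpers rather than the kernel's res_counter API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a hierarchical resource counter (not the kernel's res_counter). */
struct counter {
	unsigned long usage;
	unsigned long limit;
	struct counter *parent;	/* NULL for the root group */
};

static void counter_init(struct counter *c, struct counter *parent,
			 unsigned long limit)
{
	c->usage = 0;
	c->limit = limit;
	c->parent = parent;
}

/*
 * Charge 'val' to 'c' and to all of its ancestors. If any level would
 * exceed its limit, undo the partial charges and return -1. In the
 * controller described in this thread, such a failure at fault time
 * is what ends in SIGBUS for the application.
 */
static int counter_charge(struct counter *c, unsigned long val)
{
	struct counter *it, *undo;

	for (it = c; it; it = it->parent) {
		if (it->usage + val > it->limit)
			goto rollback;
		it->usage += val;
	}
	return 0;

rollback:
	for (undo = c; undo != it; undo = undo->parent)
		undo->usage -= val;
	return -1;
}

static void counter_uncharge(struct counter *c, unsigned long val)
{
	for (; c; c = c->parent)
		c->usage -= val;
}

/* Example: the child allows 4 pages, but its parent only allows 2. */
static int demo_hierarchy(void)
{
	struct counter root, child;

	counter_init(&root, NULL, 2);
	counter_init(&child, &root, 4);

	assert(counter_charge(&child, 2) == 0);		/* fits at both levels */
	assert(counter_charge(&child, 1) == -1);	/* parent limit reached */
	assert(child.usage == 2 && root.usage == 2);	/* failed charge was rolled back */

	counter_uncharge(&child, 2);
	return child.usage == 0 && root.usage == 0;
}
```

The sketch keeps one counter per group; the patch keeps an array of HUGE_MAX_HSTATE counters per group, one for each huge page size, and the real counters also take a lock.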
From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru Date: Thu, 14 Jun 2012 10:44:29 +0200 Message-ID: <20120614084429.GH27397@tiehlicka.suse.cz> References: <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339587270-5831-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1339587270-5831-1-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Wed 13-06-12 17:04:30, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Add the hugetlb cgroup pointer to 3rd page lru.next. This limit > the usage to hugetlb cgroup to only hugepages with 3 or more > normal pages. I guess that is an acceptable limitation. 
> > Signed-off-by: Aneesh Kumar K.V I would be happier if you explicitly mentioned that both hugetlb_cgroup_from_page and set_hugetlb_cgroup need hugetlb_lock held, but Reviewed-by: Michal Hocko > --- > include/linux/hugetlb_cgroup.h | 37 +++++++++++++++++++++++++++++++++++++ > mm/hugetlb.c | 4 ++++ > 2 files changed, 41 insertions(+) > > diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h > index e9944b4..2e4cb6b 100644 > --- a/include/linux/hugetlb_cgroup.h > +++ b/include/linux/hugetlb_cgroup.h > @@ -18,8 +18,34 @@ > #include > > struct hugetlb_cgroup; > +/* > + * Minimum page order trackable by hugetlb cgroup. > + * At least 3 pages are necessary for all the tracking information. > + */ > +#define HUGETLB_CGROUP_MIN_ORDER 2 > > #ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR > + > +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) > +{ > + VM_BUG_ON(!PageHuge(page)); > + > + if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) > + return NULL; > + return (struct hugetlb_cgroup *)page[2].lru.next; > +} > + > +static inline > +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) > +{ > + VM_BUG_ON(!PageHuge(page)); > + > + if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) > + return -1; > + page[2].lru.next = (void *)h_cg; > + return 0; > +} > + > static inline bool hugetlb_cgroup_disabled(void) > { > if (hugetlb_subsys.disabled) > @@ -28,6 +54,17 @@ static inline bool hugetlb_cgroup_disabled(void) > } > > #else > +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) > +{ > + return NULL; > +} > + > +static inline > +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) > +{ > + return 0; > +} > + > static inline bool hugetlb_cgroup_disabled(void) > { > return true; > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index e899a2d..6a449c5 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -28,6 +28,7 @@ > > #include > #include > +#include > #include > 
#include "internal.h" > > @@ -591,6 +592,7 @@ static void update_and_free_page(struct hstate *h, struct page *page) > 1 << PG_active | 1 << PG_reserved | > 1 << PG_private | 1 << PG_writeback); > } > + VM_BUG_ON(hugetlb_cgroup_from_page(page)); > set_compound_page_dtor(page, NULL); > set_page_refcounted(page); > arch_release_hugepage(page); > @@ -643,6 +645,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) > INIT_LIST_HEAD(&page->lru); > set_compound_page_dtor(page, free_huge_page); > spin_lock(&hugetlb_lock); > + set_hugetlb_cgroup(page, NULL); > h->nr_huge_pages++; > h->nr_huge_pages_node[nid]++; > spin_unlock(&hugetlb_lock); > @@ -892,6 +895,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) > INIT_LIST_HEAD(&page->lru); > r_nid = page_to_nid(page); > set_compound_page_dtor(page, free_huge_page); > + set_hugetlb_cgroup(page, NULL); > /* > * We incremented the global counters already > */ > -- > 1.7.10 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Li Zefan
Subject: Re: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup
Date: Thu, 14 Jun 2012 16:54:14 +0800
Message-ID: <4FD9A6B6.50503@huawei.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <1339583254-895-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
> +{
> +	if (s)

Neither cgroup_subsys_state() nor task_subsys_state() will ever return NULL,
so here 's' won't be NULL.

> +		return container_of(s, struct hugetlb_cgroup, css);
> +	return NULL;
> +}
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_cgroup(struct cgroup *cgroup)
> +{
> +	return hugetlb_cgroup_from_css(cgroup_subsys_state(cgroup,
> +							   hugetlb_subsys_id));
> +}
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_task(struct task_struct *task)
> +{
> +	return hugetlb_cgroup_from_css(task_subsys_state(task,
> +							 hugetlb_subsys_id));
> +}
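The review point about the redundant NULL check can be illustrated in userspace. `container_of` recovers the enclosing structure from a pointer to an embedded member by subtracting the member's offset, so if callers never pass NULL there is nothing left to check. The sketch below is an assumption-laden model: the `toy_*` types are hypothetical stand-ins for `cgroup_subsys_state` and `hugetlb_cgroup`, and the macro is a simplified version of the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: member pointer -> enclosing structure. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for cgroup_subsys_state and hugetlb_cgroup. */
struct toy_css { int refcnt; };

struct toy_hugetlb_cgroup {
	long usage;
	struct toy_css css;	/* embedded, deliberately not the first member */
};

/* No NULL check: callers are assumed to always pass a valid css. */
static struct toy_hugetlb_cgroup *toy_from_css(struct toy_css *s)
{
	return toy_container_of(s, struct toy_hugetlb_cgroup, css);
}
```

The conversion is pure pointer arithmetic, so it is only meaningful for a valid `css` pointer; that is why the guarantee has to come from the callers rather than from a check inside the helper.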
From mboxrd@z Thu Jan 1 00:00:00 1970
From: Li Zefan
Subject: Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
Date: Thu, 14 Jun 2012 16:58:05 +0800
Message-ID: <4FD9A79D.9030303@huawei.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

> +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
> +				 struct hugetlb_cgroup **ptr)
> +{
> +	int ret = 0;
> +	struct res_counter *fail_res;
> +	struct hugetlb_cgroup *h_cg = NULL;
> +	unsigned long csize = nr_pages * PAGE_SIZE;
> +
> +	if (hugetlb_cgroup_disabled())
> +		goto done;
> +	/*
> +	 * We don't charge any cgroup if the compound page have less
> +	 * than 3 pages.
> +	 */
> +	if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER)
> +		goto done;
> +again:
> +	rcu_read_lock();
> +	h_cg = hugetlb_cgroup_from_task(current);
> +	if (!h_cg)

In no circumstances should h_cg be NULL.

> +		h_cg = root_h_cgroup;
> +
> +	if (!css_tryget(&h_cg->css)) {
> +		rcu_read_unlock();
> +		goto again;
> +	}
> +	rcu_read_unlock();
> +
> +	ret = res_counter_charge(&h_cg->hugepage[idx], csize, &fail_res);
> +	css_put(&h_cg->css);
> +done:
> +	*ptr = h_cg;
> +	return ret;
> +}
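The charge path quoted above ends in res_counter_charge(). As a hedged illustration of the accounting idea only, here is a toy userspace model of a hierarchical counter: a charge has to fit under the limit of the group and of every ancestor, and a failed charge is rolled back and reported through a `fail_res`-style out parameter. The `toy_counter` type and both functions are hypothetical; they omit the locking and css reference counting of the real code and are not the kernel's res_counter implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a res_counter; no locking, single-threaded only. */
struct toy_counter {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long failcnt;
	struct toy_counter *parent;
};

/*
 * Charge val bytes to c and to all of its ancestors. On failure, undo the
 * partial charges, store the counter that hit its limit in *fail and
 * return -1. Returns 0 on success.
 */
static int toy_charge(struct toy_counter *c, unsigned long long val,
		      struct toy_counter **fail)
{
	struct toy_counter *it, *undo;

	*fail = NULL;
	for (it = c; it; it = it->parent) {
		if (it->usage + val > it->limit) {
			it->failcnt++;
			*fail = it;
			for (undo = c; undo != it; undo = undo->parent)
				undo->usage -= val;
			return -1;
		}
		it->usage += val;
	}
	return 0;
}

/* Uncharge val bytes from c and from all of its ancestors. */
static void toy_uncharge(struct toy_counter *c, unsigned long long val)
{
	for (; c; c = c->parent)
		c->usage -= val;
}
```

In the thread's terms, a failed charge corresponds to the page fault that would push a group past its HugeTLB limit, which the cover letter says ends in SIGBUS for the faulting application.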
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup Date: Thu, 14 Jun 2012 11:25:40 +0200 Message-ID: <20120614092539.GI27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Wed 13-06-12 15:57:30, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This patchset add the charge and uncharge routines for hugetlb cgroup. > We do cgroup charging in page alloc and uncharge in compound page > destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock. > > Signed-off-by: Aneesh Kumar K.V Reviewed-by: Michal Hocko One minor comment [...] > +void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, > + struct hugetlb_cgroup *h_cg, > + struct page *page) > +{ > + if (hugetlb_cgroup_disabled() || !h_cg) > + return; > + > + spin_lock(&hugetlb_lock); > + set_hugetlb_cgroup(page, h_cg); > + spin_unlock(&hugetlb_lock); > + return; > +} I guess we can remove the lock here because nobody can see the page yet, right? -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V9 12/15] hugetlb/cgroup: Add support for cgroup removal Date: Thu, 14 Jun 2012 11:31:03 +0200 Message-ID: <20120614093103.GJ27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-13-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1339583254-895-13-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Wed 13-06-12 15:57:31, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This patch add support for cgroup removal. If we don't have parent > cgroup, the charges are moved to root cgroup. > > Signed-off-by: Aneesh Kumar K.V Reviewed-by: Michal Hocko > --- > mm/hugetlb_cgroup.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 68 insertions(+), 2 deletions(-) > > diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c > index 0f2f6ac..a3a68a4 100644 > --- a/mm/hugetlb_cgroup.c > +++ b/mm/hugetlb_cgroup.c > @@ -107,10 +107,76 @@ static void hugetlb_cgroup_destroy(struct cgroup *cgroup) > kfree(h_cgroup); > } > > + > +/* > + * Should be called with hugetlb_lock held. > + * Since we are holding hugetlb_lock, pages cannot get moved from > + * active list or uncharged from the cgroup, So no need to get > + * page reference and test for page active here. This function > + * cannot fail. 
> + */ > +static void hugetlb_cgroup_move_parent(int idx, struct cgroup *cgroup, > + struct page *page) > +{ > + int csize; > + struct res_counter *counter; > + struct res_counter *fail_res; > + struct hugetlb_cgroup *page_hcg; > + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); > + struct hugetlb_cgroup *parent = parent_hugetlb_cgroup(cgroup); > + > + page_hcg = hugetlb_cgroup_from_page(page); > + /* > + * We can have pages in active list without any cgroup > + * ie, hugepage with less than 3 pages. We can safely > + * ignore those pages. > + */ > + if (!page_hcg || page_hcg != h_cg) > + goto out; > + > + csize = PAGE_SIZE << compound_order(page); > + if (!parent) { > + parent = root_h_cgroup; > + /* root has no limit */ > + res_counter_charge_nofail(&parent->hugepage[idx], > + csize, &fail_res); > + } > + counter = &h_cg->hugepage[idx]; > + res_counter_uncharge_until(counter, counter->parent, csize); > + > + set_hugetlb_cgroup(page, parent); > +out: > + return; > +} > + > +/* > + * Force the hugetlb cgroup to empty the hugetlb resources by moving them to > + * the parent cgroup. > + */ > static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup) > { > - /* We will add the cgroup removal support in later patches */ > - return -EBUSY; > + struct hstate *h; > + struct page *page; > + int ret = 0, idx = 0; > + > + do { > + if (cgroup_task_count(cgroup) || > + !list_empty(&cgroup->children)) { > + ret = -EBUSY; > + goto out; > + } > + for_each_hstate(h) { > + spin_lock(&hugetlb_lock); > + list_for_each_entry(page, &h->hugepage_activelist, lru) > + hugetlb_cgroup_move_parent(idx, cgroup, page); > + > + spin_unlock(&hugetlb_lock); > + idx++; > + } > + cond_resched(); > + } while (hugetlb_cgroup_have_usage(cgroup)); > +out: > + return ret; > } > > int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, > -- > 1.7.10 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V9 13/15] hugetlb/cgroup: add hugetlb cgroup control files Date: Thu, 14 Jun 2012 11:36:52 +0200 Message-ID: <20120614093652.GK27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-14-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1339583254-895-14-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Wed 13-06-12 15:57:32, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Add the control files for hugetlb controller > > Signed-off-by: Aneesh Kumar K.V Reviewed-by: Michal Hocko > --- > include/linux/hugetlb.h | 5 ++ > include/linux/hugetlb_cgroup.h | 6 ++ > mm/hugetlb.c | 8 +++ > mm/hugetlb_cgroup.c | 129 ++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 148 insertions(+) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 4aca057..9650bb1 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -4,6 +4,7 @@ > #include > #include > #include > +#include > > struct ctl_table; > struct user_struct; > @@ -221,6 +222,10 @@ struct hstate { > unsigned int nr_huge_pages_node[MAX_NUMNODES]; > unsigned int free_huge_pages_node[MAX_NUMNODES]; > unsigned int surplus_huge_pages_node[MAX_NUMNODES]; > +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR > + /* 
cgroup control files */ > + struct cftype cgroup_files[5]; > +#endif > char name[HSTATE_NAME_LEN]; > }; > > diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h > index e05871c..bd8bc98 100644 > --- a/include/linux/hugetlb_cgroup.h > +++ b/include/linux/hugetlb_cgroup.h > @@ -62,6 +62,7 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, > struct page *page); > extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, > struct hugetlb_cgroup *h_cg); > +extern int hugetlb_cgroup_file_init(int idx) __init; > > #else > static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) > @@ -108,5 +109,10 @@ hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, > return; > } > > +static inline int __init hugetlb_cgroup_file_init(int idx) > +{ > + return 0; > +} > + > #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ > #endif > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 59720b1..a5a30bf 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -30,6 +30,7 @@ > #include > #include > #include > +#include > #include "internal.h" > > const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; > @@ -1930,6 +1931,13 @@ void __init hugetlb_add_hstate(unsigned order) > h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); > snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", > huge_page_size(h)/1024); > + /* > + * Add cgroup control files only if the huge page consists > + * of more than two normal pages. This is because we use > + * page[2].lru.next for storing cgoup details. 
> + */ > + if (order >= HUGETLB_CGROUP_MIN_ORDER) > + hugetlb_cgroup_file_init(hugetlb_max_hstate - 1); > > parsed_hstate = h; > } > diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c > index a3a68a4..64e93e0 100644 > --- a/mm/hugetlb_cgroup.c > +++ b/mm/hugetlb_cgroup.c > @@ -26,6 +26,10 @@ struct hugetlb_cgroup { > struct res_counter hugepage[HUGE_MAX_HSTATE]; > }; > > +#define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val)) > +#define MEMFILE_IDX(val) (((val) >> 16) & 0xffff) > +#define MEMFILE_ATTR(val) ((val) & 0xffff) > + > struct cgroup_subsys hugetlb_subsys __read_mostly; > struct hugetlb_cgroup *root_h_cgroup __read_mostly; > > @@ -259,6 +263,131 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, > return; > } > > +static ssize_t hugetlb_cgroup_read(struct cgroup *cgroup, struct cftype *cft, > + struct file *file, char __user *buf, > + size_t nbytes, loff_t *ppos) > +{ > + u64 val; > + char str[64]; > + int idx, name, len; > + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); > + > + idx = MEMFILE_IDX(cft->private); > + name = MEMFILE_ATTR(cft->private); > + > + val = res_counter_read_u64(&h_cg->hugepage[idx], name); > + len = scnprintf(str, sizeof(str), "%llu\n", (unsigned long long)val); > + return simple_read_from_buffer(buf, nbytes, ppos, str, len); > +} > + > +static int hugetlb_cgroup_write(struct cgroup *cgroup, struct cftype *cft, > + const char *buffer) > +{ > + int idx, name, ret; > + unsigned long long val; > + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); > + > + idx = MEMFILE_IDX(cft->private); > + name = MEMFILE_ATTR(cft->private); > + > + switch (name) { > + case RES_LIMIT: > + if (hugetlb_cgroup_is_root(h_cg)) { > + /* Can't set limit on root */ > + ret = -EINVAL; > + break; > + } > + /* This function does all necessary parse...reuse it */ > + ret = res_counter_memparse_write_strategy(buffer, &val); > + if (ret) > + break; > + ret = res_counter_set_limit(&h_cg->hugepage[idx], val); 
> + break; > + default: > + ret = -EINVAL; > + break; > + } > + return ret; > +} > + > +static int hugetlb_cgroup_reset(struct cgroup *cgroup, unsigned int event) > +{ > + int idx, name, ret = 0; > + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); > + > + idx = MEMFILE_IDX(event); > + name = MEMFILE_ATTR(event); > + > + switch (name) { > + case RES_MAX_USAGE: > + res_counter_reset_max(&h_cg->hugepage[idx]); > + break; > + case RES_FAILCNT: > + res_counter_reset_failcnt(&h_cg->hugepage[idx]); > + break; > + default: > + ret = -EINVAL; > + break; > + } > + return ret; > +} > + > +static char *mem_fmt(char *buf, int size, unsigned long hsize) > +{ > + if (hsize >= (1UL << 30)) > + snprintf(buf, size, "%luGB", hsize >> 30); > + else if (hsize >= (1UL << 20)) > + snprintf(buf, size, "%luMB", hsize >> 20); > + else > + snprintf(buf, size, "%luKB", hsize >> 10); > + return buf; > +} > + > +int __init hugetlb_cgroup_file_init(int idx) > +{ > + char buf[32]; > + struct cftype *cft; > + struct hstate *h = &hstates[idx]; > + > + /* format the size */ > + mem_fmt(buf, 32, huge_page_size(h)); > + > + /* Add the limit file */ > + cft = &h->cgroup_files[0]; > + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.limit_in_bytes", buf); > + cft->private = MEMFILE_PRIVATE(idx, RES_LIMIT); > + cft->read = hugetlb_cgroup_read; > + cft->write_string = hugetlb_cgroup_write; > + > + /* Add the usage file */ > + cft = &h->cgroup_files[1]; > + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.usage_in_bytes", buf); > + cft->private = MEMFILE_PRIVATE(idx, RES_USAGE); > + cft->read = hugetlb_cgroup_read; > + > + /* Add the MAX usage file */ > + cft = &h->cgroup_files[2]; > + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.max_usage_in_bytes", buf); > + cft->private = MEMFILE_PRIVATE(idx, RES_MAX_USAGE); > + cft->trigger = hugetlb_cgroup_reset; > + cft->read = hugetlb_cgroup_read; > + > + /* Add the failcntfile */ > + cft = &h->cgroup_files[3]; > + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.failcnt", 
buf); > + cft->private = MEMFILE_PRIVATE(idx, RES_FAILCNT); > + cft->trigger = hugetlb_cgroup_reset; > + cft->read = hugetlb_cgroup_read; > + > + /* NULL terminate the last cft */ > + cft = &h->cgroup_files[4]; > + memset(cft, 0, sizeof(*cft)); > + > + WARN_ON(cgroup_add_cftypes(&hugetlb_subsys, h->cgroup_files)); > + > + return 0; > +} > + > struct cgroup_subsys hugetlb_subsys = { > .name = "hugetlb", > .create = hugetlb_cgroup_create, > -- > 1.7.10 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration Date: Thu, 14 Jun 2012 12:04:54 +0200 Message-ID: <20120614100454.GL27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1339583254-895-15-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Aneesh Kumar K.V" Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, 
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed 13-06-12 15:57:33, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor. Since
> we are holding a hugepage reference, we can be sure that old page won't
> get uncharged till the last put_page().
>
> Signed-off-by: Aneesh Kumar K.V

Reviewed-by: Michal Hocko

One question below
[...]
> +void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
> +{
> +	struct hugetlb_cgroup *h_cg;
> +
> +	if (hugetlb_cgroup_disabled())
> +		return;
> +
> +	VM_BUG_ON(!PageHuge(oldhpage));
> +	spin_lock(&hugetlb_lock);
> +	h_cg = hugetlb_cgroup_from_page(oldhpage);
> +	set_hugetlb_cgroup(oldhpage, NULL);
> +	cgroup_exclude_rmdir(&h_cg->css);
> +
> +	/* move the h_cg details to new cgroup */
> +	set_hugetlb_cgroup(newhpage, h_cg);
> +	spin_unlock(&hugetlb_lock);
> +	cgroup_release_and_wakeup_rmdir(&h_cg->css);
> +	return;
> +}
> +

The changelog says that the old page won't get uncharged - which means
that the cgroup cannot go away (even if we raced with the move
parent, hugetlb_lock makes sure we either see old or new cgroup) so why
do we need to play with css ref. counting?

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Hocko
Subject: Re: [PATCH -V9 15/15] hugetlb/cgroup: add HugeTLB controller documentation
Date: Thu, 14 Jun 2012 12:07:55 +0200
Message-ID: <20120614100755.GM27397@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-16-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Mime-Version: 1.0
Return-path:
Content-Disposition: inline
In-Reply-To: <1339583254-895-16-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: "Aneesh Kumar K.V"
Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed 13-06-12 15:57:34, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> Reviewed-by: KAMEZAWA Hiroyuki
> Signed-off-by: Aneesh Kumar K.V

Reviewed-by: Michal Hocko

Minor nit below

> ---
>  Documentation/cgroups/hugetlb.txt |   45 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>  create mode 100644 Documentation/cgroups/hugetlb.txt
>
> diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroups/hugetlb.txt
> new file mode 100644
> index 0000000..a9faaca
> --- /dev/null
> +++ b/Documentation/cgroups/hugetlb.txt
[...]
> +With the above step, the initial or the parent HugeTLB group becomes
> +visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
> +the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
> +
> +New groups can be created under the parent group /sys/fs/cgroup.
> +
> +# cd /sys/fs/cgroup
> +# mkdir g1
> +# echo $$ > g1/tasks
> +
> +The above steps create a new group g1 and move the current shell
> +process (bash) into it.

This is probably not needed as it is already described in the generic
cgroups description.

> +
> +Brief summary of control files
> +
> + hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
> + hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded
> + hugetlb.<hugepagesize>.usage_in_bytes     # show current res_counter usage for "hugepagesize" hugetlb
> + hugetlb.<hugepagesize>.failcnt            # show the number of allocation failure due to HugeTLB limit
> +
> +For a system supporting two hugepage size (16M and 16G) the control
> +files include:
> +
> +hugetlb.16GB.limit_in_bytes
> +hugetlb.16GB.max_usage_in_bytes
> +hugetlb.16GB.usage_in_bytes
> +hugetlb.16GB.failcnt
> +hugetlb.16MB.limit_in_bytes
> +hugetlb.16MB.max_usage_in_bytes
> +hugetlb.16MB.usage_in_bytes
> +hugetlb.16MB.failcnt
> --
> 1.7.10
>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: Re: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup
Date: Fri, 15 Jun 2012 11:50:52 +0530
Message-ID: <87mx45m6yj.fsf@skywalker.in.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4FD9A6B6.50503@huawei.com>
Mime-Version: 1.0
Return-path:
In-Reply-To: <4FD9A6B6.50503-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Li Zefan
Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, mhocko-AlSwsSmVLrQ@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Li Zefan writes:

>> +static inline
>
>> +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
>> +{
>> +	if (s)
>
>
> Neither cgroup_subsys_state() nor task_subsys_state() will ever return NULL,
> so here 's' won't be NULL.
>

That is a change that didn't get updated when I dropped the page_cgroup
changes. I had a series that tracked cgroup_subsys_state in page_cgroup.
I will send a fix on top.

Thanks for the review.
-aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup Date: Fri, 15 Jun 2012 15:36:10 +0530 Message-ID: <87k3z8nb3h.fsf@skywalker.in.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120614092539.GI27397@tiehlicka.suse.cz> Mime-Version: 1.0 Return-path: In-Reply-To: <20120614092539.GI27397-VqjxzfR4DlwKmadIfiO5sKVXKuFTiq87@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Michal Hocko Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Michal Hocko writes: > On Wed 13-06-12 15:57:30, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> This patchset add the charge and uncharge routines for hugetlb cgroup. >> We do cgroup charging in page alloc and uncharge in compound page >> destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock. >> >> Signed-off-by: Aneesh Kumar K.V > > Reviewed-by: Michal Hocko > > One minor comment > [...] >> +void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, >> + struct hugetlb_cgroup *h_cg, >> + struct page *page) >> +{ >> + if (hugetlb_cgroup_disabled() || !h_cg) >> + return; >> + >> + spin_lock(&hugetlb_lock); >> + set_hugetlb_cgroup(page, h_cg); >> + spin_unlock(&hugetlb_lock); >> + return; >> +} > > I guess we can remove the lock here because nobody can see the page yet, > right? 
> We need that to make sure when we remove cgroup we find correct page hugetlb cgroup values. But i guess we have a bug here. How about the below ? NOTE: We also need another patch to update active list during soft offline. I will send that in reply. commit e4c3fd3cc0f0faa30ea283cb48ba478a5c0d3e74 Author: Aneesh Kumar K.V Date: Fri Jun 15 14:42:27 2012 +0530 hugetlb/cgroup: Assign the page hugetlb cgroup when we move the page to active list. page's hugetlb cgroup assign and moving to active list should happen with hugetlb_lock held. Otherwise when we remove the hugetlb cgroup we would iterate the active list and will find page with NULL hugetlb cgroup values. Signed-off-by: Aneesh Kumar K.V diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ee4da3b..b90dfb4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1146,9 +1146,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, } spin_lock(&hugetlb_lock); page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve); - spin_unlock(&hugetlb_lock); - - if (!page) { + if (page) { + /* update page cgroup details */ + hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); + spin_unlock(&hugetlb_lock); + } else { + spin_unlock(&hugetlb_lock); page = alloc_buddy_huge_page(h, NUMA_NO_NODE); if (!page) { hugetlb_cgroup_uncharge_cgroup(idx, @@ -1159,14 +1162,13 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, } spin_lock(&hugetlb_lock); list_move(&page->lru, &h->hugepage_activelist); + hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); spin_unlock(&hugetlb_lock); } set_page_private(page, (unsigned long)spool); vma_commit_reservation(h, vma, addr); - /* update page cgroup details */ - hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); return page; } diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 8e7ca0a..d4f3f7b 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -218,6 +218,7 @@ done: return ret; } +/* Should be called with 
hugetlb_lock held */ void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg, struct page *page) @@ -225,9 +226,7 @@ void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, if (hugetlb_cgroup_disabled() || !h_cg) return; - spin_lock(&hugetlb_lock); set_hugetlb_cgroup(page, h_cg); - spin_unlock(&hugetlb_lock); return; } From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration Date: Fri, 15 Jun 2012 16:20:31 +0530 Message-ID: <87haucn91k.fsf@skywalker.in.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120614100454.GL27397@tiehlicka.suse.cz> Mime-Version: 1.0 Return-path: In-Reply-To: <20120614100454.GL27397-VqjxzfR4DlwKmadIfiO5sKVXKuFTiq87@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Michal Hocko Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org, dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Michal Hocko writes: > On Wed 13-06-12 15:57:33, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor. Since >> we are holding a hugepage reference, we can be sure that old page won't >> get uncharged till the last put_page(). >> >> Signed-off-by: Aneesh Kumar K.V > > Reviewed-by: Michal Hocko > > One question below > [...] 
>> +void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage) >> +{ >> + struct hugetlb_cgroup *h_cg; >> + >> + if (hugetlb_cgroup_disabled()) >> + return; >> + >> + VM_BUG_ON(!PageHuge(oldhpage)); >> + spin_lock(&hugetlb_lock); >> + h_cg = hugetlb_cgroup_from_page(oldhpage); >> + set_hugetlb_cgroup(oldhpage, NULL); >> + cgroup_exclude_rmdir(&h_cg->css); >> + >> + /* move the h_cg details to new cgroup */ >> + set_hugetlb_cgroup(newhpage, h_cg); >> + spin_unlock(&hugetlb_lock); >> + cgroup_release_and_wakeup_rmdir(&h_cg->css); >> + return; >> +} >> + > > The changelog says that the old page won't get uncharged - which means > that the the cgroup cannot go away (even if we raced with the move > parent, hugetlb_lock makes sure we either see old or new cgroup) so why > do we need to play with css ref. counting? Ok hugetlb_lock should be sufficient here i guess. I will send a patch on top to remove the exclude_rmdir and release_and_wakeup_rmdir -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup Date: Fri, 22 Jun 2012 15:11:21 -0700 Message-ID: <20120622151121.917178eb.akpm@linux-foundation.org> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4FD9A79D.9030303@huawei.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4FD9A79D.9030303@huawei.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Li Zefan Cc: "Aneesh Kumar K.V" , linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Thu, 14 Jun 2012 16:58:05 +0800 Li Zefan wrote: > > +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, > > > + struct hugetlb_cgroup **ptr) > > +{ > > + int 
ret = 0; > > + struct res_counter *fail_res; > > + struct hugetlb_cgroup *h_cg = NULL; > > + unsigned long csize = nr_pages * PAGE_SIZE; > > + > > + if (hugetlb_cgroup_disabled()) > > + goto done; > > + /* > > + * We don't charge any cgroup if the compound page have less > > + * than 3 pages. > > + */ > > + if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER) > > + goto done; > > +again: > > + rcu_read_lock(); > > + h_cg = hugetlb_cgroup_from_task(current); > > + if (!h_cg) > > > In no circumstances should h_cg be NULL. > Aneesh? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup Date: Sun, 24 Jun 2012 22:14:51 +0530 Message-ID: <87txy07j7g.fsf@skywalker.in.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4FD9A79D.9030303@huawei.com> <20120622151121.917178eb.akpm@linux-foundation.org> Mime-Version: 1.0 Return-path: In-Reply-To: <20120622151121.917178eb.akpm@linux-foundation.org> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Andrew Morton , Li Zefan Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Hi Andrew, Andrew Morton writes: > On Thu, 14 Jun 2012 16:58:05 +0800 > Li Zefan wrote: > >> > +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, >> >> > + struct hugetlb_cgroup **ptr) >> > +{ >> > + int ret = 0; >> > + struct res_counter *fail_res; >> > + struct hugetlb_cgroup *h_cg = NULL; >> > + unsigned long csize = nr_pages * PAGE_SIZE; >> > 
+ >> > + if (hugetlb_cgroup_disabled()) >> > + goto done; >> > + /* >> > + * We don't charge any cgroup if the compound page have less >> > + * than 3 pages. >> > + */ >> > + if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER) >> > + goto done; >> > +again: >> > + rcu_read_lock(); >> > + h_cg = hugetlb_cgroup_from_task(current); >> > + if (!h_cg) >> >> >> In no circumstances should h_cg be NULL. >> > > Aneesh? I missed this in the last review. Thanks for reminding. I will send a patch addressing this and another related comment in 4FD9A6B6.50503@huawei.com as a separate mail. -aneesh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx113.postini.com [74.125.245.113]) by kanga.kvack.org (Postfix) with SMTP id E581D6B0080 for ; Wed, 13 Jun 2012 06:28:49 -0400 (EDT) Received: from /spool/local by e28smtp08.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! 
Violators will be prosecuted for from ; Wed, 13 Jun 2012 15:58:46 +0530 Received: from d28av01.in.ibm.com (d28av01.in.ibm.com [9.184.220.63]) by d28relay05.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q5DASOoo58064984 for ; Wed, 13 Jun 2012 15:58:24 +0530 Received: from d28av01.in.ibm.com (loopback [127.0.0.1]) by d28av01.in.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q5DFvqLx001779 for ; Wed, 13 Jun 2012 21:27:59 +0530 From: "Aneesh Kumar K.V" Subject: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup Date: Wed, 13 Jun 2012 15:57:30 +0530 Message-Id: <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" This patchset add the charge and uncharge routines for hugetlb cgroup. We do cgroup charging in page alloc and uncharge in compound page destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock. 
Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb_cgroup.h | 38 +++++++++++++++++++ mm/hugetlb.c | 16 +++++++- mm/hugetlb_cgroup.c | 80 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 133 insertions(+), 1 deletion(-) diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index be1a9f8..e05871c 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -53,6 +53,16 @@ static inline bool hugetlb_cgroup_disabled(void) return false; } +extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup **ptr); +extern void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg, + struct page *page); +extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, + struct page *page); +extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg); + #else static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) { @@ -70,5 +80,33 @@ static inline bool hugetlb_cgroup_disabled(void) return true; } +static inline int +hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup **ptr) +{ + return 0; +} + +static inline void +hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg, + struct page *page) +{ + return; +} + +static inline void +hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, struct page *page) +{ + return; +} + +static inline void +hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg) +{ + return; +} + #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ #endif diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6a449c5..59720b1 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -627,6 +627,8 @@ static void free_huge_page(struct page *page) BUG_ON(page_mapcount(page)); spin_lock(&hugetlb_lock); + hugetlb_cgroup_uncharge_page(hstate_index(h), + 
pages_per_huge_page(h), page); if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { /* remove the page from active list */ list_del(&page->lru); @@ -1115,7 +1117,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, struct hstate *h = hstate_vma(vma); struct page *page; long chg; + int ret, idx; + struct hugetlb_cgroup *h_cg; + idx = hstate_index(h); /* * Processes that did not create the mapping will have no * reserves and will not have accounted against subpool @@ -1131,6 +1136,11 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, if (hugepage_subpool_get_pages(spool, chg)) return ERR_PTR(-ENOSPC); + ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg); + if (ret) { + hugepage_subpool_put_pages(spool, chg); + return ERR_PTR(-ENOSPC); + } spin_lock(&hugetlb_lock); page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve); spin_unlock(&hugetlb_lock); @@ -1138,6 +1148,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, if (!page) { page = alloc_buddy_huge_page(h, NUMA_NO_NODE); if (!page) { + hugetlb_cgroup_uncharge_cgroup(idx, + pages_per_huge_page(h), + h_cg); hugepage_subpool_put_pages(spool, chg); return ERR_PTR(-ENOSPC); } @@ -1146,7 +1159,8 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, set_page_private(page, (unsigned long)spool); vma_commit_reservation(h, vma, addr); - + /* update page cgroup details */ + hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); return page; } diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 5a4e71c..0f2f6ac 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -113,6 +113,86 @@ static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup) return -EBUSY; } +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup **ptr) +{ + int ret = 0; + struct res_counter *fail_res; + struct hugetlb_cgroup *h_cg = NULL; + unsigned long csize = nr_pages * PAGE_SIZE; + + 
if (hugetlb_cgroup_disabled()) + goto done; + /* + * We don't charge any cgroup if the compound page have less + * than 3 pages. + */ + if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER) + goto done; +again: + rcu_read_lock(); + h_cg = hugetlb_cgroup_from_task(current); + if (!h_cg) + h_cg = root_h_cgroup; + + if (!css_tryget(&h_cg->css)) { + rcu_read_unlock(); + goto again; + } + rcu_read_unlock(); + + ret = res_counter_charge(&h_cg->hugepage[idx], csize, &fail_res); + css_put(&h_cg->css); +done: + *ptr = h_cg; + return ret; +} + +void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg, + struct page *page) +{ + if (hugetlb_cgroup_disabled() || !h_cg) + return; + + spin_lock(&hugetlb_lock); + set_hugetlb_cgroup(page, h_cg); + spin_unlock(&hugetlb_lock); + return; +} + +/* + * Should be called with hugetlb_lock held + */ +void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, + struct page *page) +{ + struct hugetlb_cgroup *h_cg; + unsigned long csize = nr_pages * PAGE_SIZE; + + if (hugetlb_cgroup_disabled()) + return; + VM_BUG_ON(!spin_is_locked(&hugetlb_lock)); + h_cg = hugetlb_cgroup_from_page(page); + if (unlikely(!h_cg)) + return; + set_hugetlb_cgroup(page, NULL); + res_counter_uncharge(&h_cg->hugepage[idx], csize); + return; +} + +void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg) +{ + unsigned long csize = nr_pages * PAGE_SIZE; + + if (hugetlb_cgroup_disabled() || !h_cg) + return; + + res_counter_uncharge(&h_cg->hugepage[idx], csize); + return; +} + struct cgroup_subsys hugetlb_subsys = { .name = "hugetlb", .create = hugetlb_cgroup_create, -- 1.7.10 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
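
For readers following the thread, the two-phase accounting in this patch (charge the cgroup before a page exists, commit the owner into the page once allocation succeeds, uncharge on failure or on free) can be sketched in userspace roughly as below. This is only a model under stated assumptions: every `toy_*` name is hypothetical, the counter is a simplified stand-in for `res_counter`, and no locking, RCU or css reference counting is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for res_counter: usage and limit in bytes. */
struct toy_counter {
	unsigned long usage;
	unsigned long limit;
};

/* Hypothetical stand-in for a hugetlb cgroup with a single hstate. */
struct toy_cgroup {
	struct toy_counter hugepage;
};

/* Hypothetical stand-in for a huge page; owner models page[2].lru.next. */
struct toy_page {
	struct toy_cgroup *owner;
};

/* Phase 1: reserve the bytes before a page exists. Returns 0 or -1. */
static int toy_charge_cgroup(struct toy_cgroup *cg, unsigned long bytes)
{
	if (bytes > cg->hugepage.limit - cg->hugepage.usage)
		return -1;	/* over limit: the caller fails the fault */
	cg->hugepage.usage += bytes;
	return 0;
}

/* Phase 2: the allocation succeeded, so record the owner in the page. */
static void toy_commit_charge(struct toy_cgroup *cg, struct toy_page *page)
{
	page->owner = cg;
}

/* Undo phase 1 when no page could be allocated after a successful charge. */
static void toy_uncharge_cgroup(struct toy_cgroup *cg, unsigned long bytes)
{
	cg->hugepage.usage -= bytes;
}

/* Free path: drop the charge through the owner stored in the page. */
static void toy_uncharge_page(struct toy_page *page, unsigned long bytes)
{
	struct toy_cgroup *cg = page->owner;

	if (!cg)
		return;		/* page was never charged */
	page->owner = NULL;
	cg->hugepage.usage -= bytes;
}
```

In the patch itself the same ordering shows up in alloc_huge_page(): hugetlb_cgroup_charge_cgroup() runs first, hugetlb_cgroup_uncharge_cgroup() runs if alloc_buddy_huge_page() fails, and hugetlb_cgroup_commit_charge() runs just before the page is returned. free_huge_page() then calls hugetlb_cgroup_uncharge_page().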
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx181.postini.com [74.125.245.181]) by kanga.kvack.org (Postfix) with SMTP id 876286B0083 for ; Wed, 13 Jun 2012 06:28:50 -0400 (EDT) Received: from /spool/local by e28smtp08.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 13 Jun 2012 15:58:47 +0530 Received: from d28av01.in.ibm.com (d28av01.in.ibm.com [9.184.220.63]) by d28relay04.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q5DAS4OT15597642 for ; Wed, 13 Jun 2012 15:58:04 +0530 Received: from d28av01.in.ibm.com (loopback [127.0.0.1]) by d28av01.in.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q5DFvako032761 for ; Wed, 13 Jun 2012 21:27:39 +0530 From: "Aneesh Kumar K.V" Subject: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages Date: Wed, 13 Jun 2012 15:57:23 +0530 Message-Id: <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" Use a mmu_gather instead of a temporary linked list for accumulating pages when we unmap a hugepage range Reviewed-by: KAMEZAWA Hiroyuki Signed-off-by: Aneesh Kumar K.V --- fs/hugetlbfs/inode.c | 4 ++-- include/linux/hugetlb.h | 22 ++++++++++++++---- mm/hugetlb.c | 59 ++++++++++++++++++++++++++++------------------- mm/memory.c | 7 ++++-- 4 files changed, 59 insertions(+), 33 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index cc9281b..ff233e4 100644 --- 
a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -416,8 +416,8 @@ hugetlb_vmtruncate_list(struct prio_tree_root *root, pgoff_t pgoff) else v_offset = 0; - __unmap_hugepage_range(vma, - vma->vm_start + v_offset, vma->vm_end, NULL); + unmap_hugepage_range(vma, vma->vm_start + v_offset, + vma->vm_end, NULL); } } diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 217f528..0f23c18 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -7,6 +7,7 @@ struct ctl_table; struct user_struct; +struct mmu_gather; #ifdef CONFIG_HUGETLB_PAGE @@ -40,9 +41,10 @@ int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int, unsigned int flags); void unmap_hugepage_range(struct vm_area_struct *, - unsigned long, unsigned long, struct page *); -void __unmap_hugepage_range(struct vm_area_struct *, - unsigned long, unsigned long, struct page *); + unsigned long, unsigned long, struct page *); +void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma, + unsigned long start, unsigned long end, + struct page *ref_page); int hugetlb_prefault(struct address_space *, struct vm_area_struct *); void hugetlb_report_meminfo(struct seq_file *); int hugetlb_report_node_meminfo(int, char *); @@ -98,7 +100,6 @@ static inline unsigned long hugetlb_total_pages(void) #define follow_huge_addr(mm, addr, write) ERR_PTR(-EINVAL) #define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; }) #define hugetlb_prefault(mapping, vma) ({ BUG(); 0; }) -#define unmap_hugepage_range(vma, start, end, page) BUG() static inline void hugetlb_report_meminfo(struct seq_file *m) { } @@ -112,13 +113,24 @@ static inline void hugetlb_report_meminfo(struct seq_file *m) #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; }) #define hugetlb_fault(mm, vma, addr, flags) ({ BUG(); 0; }) #define huge_pte_offset(mm, address) 0 -#define dequeue_hwpoisoned_huge_page(page) 0 
+static inline int dequeue_hwpoisoned_huge_page(struct page *page) +{ + return 0; +} + static inline void copy_huge_page(struct page *dst, struct page *src) { } #define hugetlb_change_protection(vma, address, end, newprot) +static inline void __unmap_hugepage_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, unsigned long start, + unsigned long end, struct page *ref_page) +{ + BUG(); +} + #endif /* !CONFIG_HUGETLB_PAGE */ #define HUGETLB_ANON_FILE "anon_hugepage" diff --git a/mm/hugetlb.c b/mm/hugetlb.c index b1e0ed1..e54b695 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -24,8 +24,9 @@ #include #include -#include +#include +#include #include #include #include "internal.h" @@ -2310,30 +2311,26 @@ static int is_hugetlb_entry_hwpoisoned(pte_t pte) return 0; } -void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, - unsigned long end, struct page *ref_page) +void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma, + unsigned long start, unsigned long end, + struct page *ref_page) { + int force_flush = 0; struct mm_struct *mm = vma->vm_mm; unsigned long address; pte_t *ptep; pte_t pte; struct page *page; - struct page *tmp; struct hstate *h = hstate_vma(vma); unsigned long sz = huge_page_size(h); - /* - * A page gathering list, protected by per file i_mmap_mutex. The - * lock is used to avoid list corruption from multiple unmapping - * of the same page since we are using page->lru. 
- */ - LIST_HEAD(page_list); - WARN_ON(!is_vm_hugetlb_page(vma)); BUG_ON(start & ~huge_page_mask(h)); BUG_ON(end & ~huge_page_mask(h)); + tlb_start_vma(tlb, vma); mmu_notifier_invalidate_range_start(mm, start, end); +again: spin_lock(&mm->page_table_lock); for (address = start; address < end; address += sz) { ptep = huge_pte_offset(mm, address); @@ -2372,30 +2369,45 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, } pte = huge_ptep_get_and_clear(mm, address, ptep); + tlb_remove_tlb_entry(tlb, ptep, address); if (pte_dirty(pte)) set_page_dirty(page); - list_add(&page->lru, &page_list); + page_remove_rmap(page); + force_flush = !__tlb_remove_page(tlb, page); + if (force_flush) + break; /* Bail out after unmapping reference page if supplied */ if (ref_page) break; } - flush_tlb_range(vma, start, end); spin_unlock(&mm->page_table_lock); - mmu_notifier_invalidate_range_end(mm, start, end); - list_for_each_entry_safe(page, tmp, &page_list, lru) { - page_remove_rmap(page); - list_del(&page->lru); - put_page(page); + /* + * mmu_gather ran out of room to batch pages, we break out of + * the PTE lock to avoid doing the potential expensive TLB invalidate + * and page-free while holding it. 
+ */ + if (force_flush) { + force_flush = 0; + tlb_flush_mmu(tlb); + if (address < end && !ref_page) + goto again; } + mmu_notifier_invalidate_range_end(mm, start, end); + tlb_end_vma(tlb, vma); } void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigned long end, struct page *ref_page) { - mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); - __unmap_hugepage_range(vma, start, end, ref_page); - mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex); + struct mm_struct *mm; + struct mmu_gather tlb; + + mm = vma->vm_mm; + + tlb_gather_mmu(&tlb, mm, 0); + __unmap_hugepage_range(&tlb, vma, start, end, ref_page); + tlb_finish_mmu(&tlb, start, end); } /* @@ -2440,9 +2452,8 @@ static int unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma, * from the time of fork. This would look like data corruption */ if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER)) - __unmap_hugepage_range(iter_vma, - address, address + huge_page_size(h), - page); + unmap_hugepage_range(iter_vma, address, + address + huge_page_size(h), page); } mutex_unlock(&mapping->i_mmap_mutex); diff --git a/mm/memory.c b/mm/memory.c index 1b7dc66..545e18a 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb, * Since no pte has actually been setup, it is * safe to do nothing in this case. */ - if (vma->vm_file) - unmap_hugepage_range(vma, start, end, NULL); + if (vma->vm_file) { + mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); + __unmap_hugepage_range(tlb, vma, start, end, NULL); + mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex); + } } else unmap_page_range(tlb, vma, start, end, details); } -- 1.7.10 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
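
The batching loop this patch adds to __unmap_hugepage_range() (queue pages, break out when the batch is full, flush outside the page table lock, then rescan) can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel code: all `toy_*` names and the batch capacity are hypothetical, a zero array slot stands in for an empty PTE, and the lock is only marked by comments.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BATCH_MAX 4	/* hypothetical batch capacity */

/* Hypothetical stand-in for struct mmu_gather. */
struct toy_gather {
	int pages[TOY_BATCH_MAX];
	size_t nr;
	size_t flushes;	/* number of flushes performed */
	size_t freed;	/* number of pages released by flushes */
};

/* Queue a page; returns the number of free slots left (0 means "flush now"). */
static size_t toy_remove_page(struct toy_gather *tlb, int page)
{
	tlb->pages[tlb->nr++] = page;
	return TOY_BATCH_MAX - tlb->nr;
}

/* Release everything queued so far; models tlb_flush_mmu(). */
static void toy_flush(struct toy_gather *tlb)
{
	if (!tlb->nr)
		return;
	tlb->freed += tlb->nr;
	tlb->nr = 0;
	tlb->flushes++;
}

/*
 * Unmap every populated slot in ptes[0..n). A value of 0 models an empty
 * PTE. The lock itself is not modelled; comments mark where it would be held.
 */
static void toy_unmap_range(struct toy_gather *tlb, int *ptes, size_t n)
{
	int force_flush;
	size_t i;

again:
	force_flush = 0;
	/* page_table_lock would be taken here */
	for (i = 0; i < n; i++) {
		int page = ptes[i];

		if (!page)
			continue;	/* already cleared on an earlier pass */
		ptes[i] = 0;
		force_flush = !toy_remove_page(tlb, page);
		if (force_flush)
			break;
	}
	/* page_table_lock would be dropped here, before the expensive flush */
	if (force_flush) {
		toy_flush(tlb);
		if (i < n)
			goto again;
	}
}

/* Final flush; models tlb_finish_mmu(). */
static void toy_finish(struct toy_gather *tlb)
{
	toy_flush(tlb);
}
```

As in the patch, the rescan starts from the beginning of the range and relies on already-cleared entries being skipped, so no separate resume position has to be carried across the flush.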
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx150.postini.com [74.125.245.150]) by kanga.kvack.org (Postfix) with SMTP id 5F8786B0083 for ; Wed, 13 Jun 2012 07:33:01 -0400 (EDT) Received: from /spool/local by e23smtp04.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 13 Jun 2012 11:12:18 +1000 Received: from d23av01.au.ibm.com (d23av01.au.ibm.com [9.190.234.96]) by d23relay03.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q5DBWoS553018830 for ; Wed, 13 Jun 2012 21:32:51 +1000 Received: from d23av01.au.ibm.com (loopback [127.0.0.1]) by d23av01.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q5DBWnZa002180 for ; Wed, 13 Jun 2012 21:32:50 +1000 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V9 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru In-Reply-To: <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Date: Wed, 13 Jun 2012 17:02:47 +0530 Message-ID: <8762avo3a8.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Need this patch for hugetlb cgroup disabled. I will send an updated patch in reply. diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index e9e6d74..bc30413 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -18,14 +18,14 @@ #include struct hugetlb_cgroup; - -#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR /* * Minimum page order trackable by hugetlb cgroup. * At least 3 pages are necessary for all the tracking information. 
*/ #define HUGETLB_CGROUP_MIN_ORDER 2 +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR + static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) { VM_BUG_ON(!PageHuge(page)); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx206.postini.com [74.125.245.206]) by kanga.kvack.org (Postfix) with SMTP id 618516B004D for ; Wed, 13 Jun 2012 07:34:42 -0400 (EDT) Received: from /spool/local by e28smtp08.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 13 Jun 2012 17:04:38 +0530 Received: from d28av04.in.ibm.com (d28av04.in.ibm.com [9.184.220.66]) by d28relay02.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q5DBYawr10813888 for ; Wed, 13 Jun 2012 17:04:36 +0530 Received: from d28av04.in.ibm.com (loopback [127.0.0.1]) by d28av04.in.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q5DH4MRp013303 for ; Thu, 14 Jun 2012 03:04:23 +1000 From: "Aneesh Kumar K.V" Subject: [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru Date: Wed, 13 Jun 2012 17:04:30 +0530 Message-Id: <1339587270-5831-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" Add the hugetlb cgroup pointer to 3rd page lru.next. This limit the usage to hugetlb cgroup to only hugepages with 3 or more normal pages. 
I guess that is an acceptable limitation. Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb_cgroup.h | 37 +++++++++++++++++++++++++++++++++++++ mm/hugetlb.c | 4 ++++ 2 files changed, 41 insertions(+) diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index e9944b4..2e4cb6b 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -18,8 +18,34 @@ #include struct hugetlb_cgroup; +/* + * Minimum page order trackable by hugetlb cgroup. + * At least 3 pages are necessary for all the tracking information. + */ +#define HUGETLB_CGROUP_MIN_ORDER 2 #ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR + +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) +{ + VM_BUG_ON(!PageHuge(page)); + + if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) + return NULL; + return (struct hugetlb_cgroup *)page[2].lru.next; +} + +static inline +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) +{ + VM_BUG_ON(!PageHuge(page)); + + if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) + return -1; + page[2].lru.next = (void *)h_cg; + return 0; +} + static inline bool hugetlb_cgroup_disabled(void) { if (hugetlb_subsys.disabled) @@ -28,6 +54,17 @@ static inline bool hugetlb_cgroup_disabled(void) } #else +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) +{ + return NULL; +} + +static inline +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) +{ + return 0; +} + static inline bool hugetlb_cgroup_disabled(void) { return true; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e899a2d..6a449c5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -28,6 +28,7 @@ #include #include +#include #include #include "internal.h" @@ -591,6 +592,7 @@ static void update_and_free_page(struct hstate *h, struct page *page) 1 << PG_active | 1 << PG_reserved | 1 << PG_private | 1 << PG_writeback); } + VM_BUG_ON(hugetlb_cgroup_from_page(page)); set_compound_page_dtor(page, NULL); 
set_page_refcounted(page); arch_release_hugepage(page); @@ -643,6 +645,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) INIT_LIST_HEAD(&page->lru); set_compound_page_dtor(page, free_huge_page); spin_lock(&hugetlb_lock); + set_hugetlb_cgroup(page, NULL); h->nr_huge_pages++; h->nr_huge_pages_node[nid]++; spin_unlock(&hugetlb_lock); @@ -892,6 +895,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) INIT_LIST_HEAD(&page->lru); r_nid = page_to_nid(page); set_compound_page_dtor(page, free_huge_page); + set_hugetlb_cgroup(page, NULL); /* * We incremented the global counters already */ -- 1.7.10 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx127.postini.com [74.125.245.127]) by kanga.kvack.org (Postfix) with SMTP id 72B0B6B0069 for ; Wed, 13 Jun 2012 10:59:29 -0400 (EDT) Date: Wed, 13 Jun 2012 16:59:23 +0200 From: Michal Hocko Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages Message-ID: <20120613145923.GA14777@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Use a mmu_gather instead of a temporary linked list for 
accumulating
> pages when we unmap a hugepage range

Sorry for coming up with this comment so late but you owe us an
explanation _why_ you are doing this.

I assume that this fixes a real problem when we take i_mmap_mutex
already higher up in
unmap_mapping_range
  mutex_lock(&mapping->i_mmap_mutex);
  unmap_mapping_range_tree | unmap_mapping_range_list
    unmap_mapping_range_vma
      zap_page_range_single
        unmap_single_vma
          unmap_hugepage_range
            mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);

And that this should have been marked for stable as well (I haven't
checked when this has been introduced).

But then I do not see how this helps when you still do this:
[...]
> diff --git a/mm/memory.c b/mm/memory.c
> index 1b7dc66..545e18a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
>  			 * Since no pte has actually been setup, it is
>  			 * safe to do nothing in this case.
>  			 */
> -			if (vma->vm_file)
> -				unmap_hugepage_range(vma, start, end, NULL);
> +			if (vma->vm_file) {
> +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> +			}
>  		} else
>  			unmap_page_range(tlb, vma, start, end, details);
>  	}

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx139.postini.com [74.125.245.139]) by kanga.kvack.org (Postfix) with SMTP id 970F66B0070 for ; Wed, 13 Jun 2012 11:03:41 -0400 (EDT) Date: Wed, 13 Jun 2012 17:03:38 +0200 From: Michal Hocko Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages Message-ID: <20120613150338.GB14777@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120613145923.GA14777@tiehlicka.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120613145923.GA14777@tiehlicka.suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Wed 13-06-12 16:59:23, Michal Hocko wrote: > On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote: > > From: "Aneesh Kumar K.V" > > > > Use a mmu_gather instead of a temporary linked list for accumulating > > pages when we unmap a hugepage range > > Sorry for coming up with the comment that late but you owe us an > explanation _why_ you are doing this. > > I assume that this fixes a real problem when we take i_mmap_mutex > already up in > unmap_mapping_range > mutex_lock(&mapping->i_mmap_mutex); > unmap_mapping_range_tree | unmap_mapping_range_list > unmap_mapping_range_vma > zap_page_range_single > unmap_single_vma > unmap_hugepage_range > mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); > > And that this should have been marked for stable as well (I haven't > checked when this has been introduced). > > But then I do not see how this help when you still do this: > [...] 
> > diff --git a/mm/memory.c b/mm/memory.c > > index 1b7dc66..545e18a 100644 > > --- a/mm/memory.c > > +++ b/mm/memory.c > > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb, > > * Since no pte has actually been setup, it is > > * safe to do nothing in this case. > > */ > > - if (vma->vm_file) > > - unmap_hugepage_range(vma, start, end, NULL); > > + if (vma->vm_file) { > > + mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); > > + __unmap_hugepage_range(tlb, vma, start, end, NULL); > > + mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex); > > + } > > } else > > unmap_page_range(tlb, vma, start, end, details); > > } Ahhh, you are removing the lock in the next patch. Really confusing and not nice for the stable backport. Could you merge those two patches and add Cc: stable? Then you can add my Reviewed-by: Michal Hocko -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx201.postini.com [74.125.245.201]) by kanga.kvack.org (Postfix) with SMTP id 8BEA46B004D for ; Wed, 13 Jun 2012 12:37:17 -0400 (EDT) Received: from /spool/local by e28smtp02.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! 
Violators will be prosecuted for from ; Wed, 13 Jun 2012 22:07:14 +0530 Received: from d28av02.in.ibm.com (d28av02.in.ibm.com [9.184.220.64]) by d28relay02.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q5DGbA8o8978918 for ; Wed, 13 Jun 2012 22:07:11 +0530 Received: from d28av02.in.ibm.com (loopback [127.0.0.1]) by d28av02.in.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q5DM7mqn008656 for ; Thu, 14 Jun 2012 08:07:48 +1000 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages In-Reply-To: <20120613145923.GA14777@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120613145923.GA14777@tiehlicka.suse.cz> Date: Wed, 13 Jun 2012 22:07:06 +0530 Message-ID: <871uljnp71.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Michal Hocko writes: > On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> Use a mmu_gather instead of a temporary linked list for accumulating >> pages when we unmap a hugepage range > > Sorry for coming up with the comment that late but you owe us an > explanation _why_ you are doing this. > > I assume that this fixes a real problem when we take i_mmap_mutex > already up in > unmap_mapping_range > mutex_lock(&mapping->i_mmap_mutex); > unmap_mapping_range_tree | unmap_mapping_range_list > unmap_mapping_range_vma > zap_page_range_single > unmap_single_vma > unmap_hugepage_range > mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); > > And that this should have been marked for stable as well (I haven't > checked when this has been introduced). 

The switch to mmu_gather is to get rid of the use of page->lru so that I can
use it for the active list.

-aneesh

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FD955E8.5050100@jp.fujitsu.com>
Date: Thu, 14 Jun 2012 12:09:28 +0900
From: Kamezawa Hiroyuki
MIME-Version: 1.0
Subject: Re: [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
In-Reply-To: <1339583254-895-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, dhillf@gmail.com,
	rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> i_mmap_mutex lock was added in unmap_single_vma by 502717f4e ("hugetlb:
> fix linked list corruption in unmap_hugepage_range()") but we don't use
> page->lru in unmap_hugepage_range any more. Also the lock was taken
> higher up in the stack in some code path. That would result in deadlock.
>
> unmap_mapping_range (i_mmap_mutex)
>  -> unmap_mapping_range_tree
>     -> unmap_mapping_range_vma
>        -> zap_page_range_single
>           -> unmap_single_vma
>              -> unmap_hugepage_range (i_mmap_mutex)
>
> For shared pagetable support for huge pages, since pagetable pages are ref
> counted we don't need any lock during huge_pmd_unshare. We do take
> i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in mapping.
> (39dde65c9940c97f ("shared page table for hugetlb page")).
>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: KAMEZAWA Hiroyuki

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FD962D4.1020908@jp.fujitsu.com>
Date: Thu, 14 Jun 2012 13:04:36 +0900
From: Kamezawa Hiroyuki
MIME-Version: 1.0
Subject: Re: [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
References: <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339587270-5831-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
In-Reply-To: <1339587270-5831-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

(2012/06/13 20:34), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> Add the hugetlb cgroup pointer to 3rd page lru.next.
> This limit the usage to hugetlb cgroup to only hugepages with 3 or more
> normal pages. I guess that is an acceptable limitation.
>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: KAMEZAWA Hiroyuki

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FD96370.2020708@jp.fujitsu.com>
Date: Thu, 14 Jun 2012 13:07:12 +0900
From: Kamezawa Hiroyuki
MIME-Version: 1.0
Subject: Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
In-Reply-To: <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Sender:
	owner-linux-mm@kvack.org
List-ID:
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> This patchset add the charge and uncharge routines for hugetlb cgroup.
> We do cgroup charging in page alloc and uncharge in compound page
> destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock.
>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: KAMEZAWA Hiroyuki

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FD963E1.6080506@jp.fujitsu.com>
Date: Thu, 14 Jun 2012 13:09:05 +0900
From: Kamezawa Hiroyuki
MIME-Version: 1.0
Subject: Re: [PATCH -V9 12/15]
	hugetlb/cgroup: Add support for cgroup removal
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-13-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
In-Reply-To: <1339583254-895-13-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> This patch add support for cgroup removal. If we don't have parent
> cgroup, the charges are moved to root cgroup.
>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: KAMEZAWA Hiroyuki

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FD96442.3040509@jp.fujitsu.com>
Date: Thu, 14 Jun 2012 13:10:42 +0900
From: Kamezawa Hiroyuki
MIME-Version: 1.0
Subject: Re: [PATCH -V9 13/15] hugetlb/cgroup: add hugetlb cgroup control files
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-14-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
In-Reply-To: <1339583254-895-14-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> Add the control files for hugetlb controller
>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: KAMEZAWA Hiroyuki

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FD964DD.6060802@jp.fujitsu.com>
Date: Thu, 14 Jun 2012 13:13:17 +0900
From: Kamezawa Hiroyuki
MIME-Version: 1.0
Subject: Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
In-Reply-To: <1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz,
	akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor. Since
> we are holding a hugepage reference, we can be sure that old page won't
> get uncharged till the last put_page().
>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: KAMEZAWA Hiroyuki

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 14 Jun 2012 09:14:23 +0200
From: Michal Hocko
Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
Message-ID: <20120614071423.GA27397@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<20120613145923.GA14777@tiehlicka.suse.cz>
	<20120613150338.GB14777@tiehlicka.suse.cz>
	<87y5nrmacr.fsf@skywalker.in.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87y5nrmacr.fsf@skywalker.in.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

On Wed 13-06-12 22:13:00, Aneesh Kumar K.V wrote:
> Michal Hocko writes:
>
> > On Wed 13-06-12 16:59:23, Michal Hocko wrote:
> >> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> >> > From: "Aneesh Kumar K.V"
> >> >
> >> > Use a mmu_gather instead of a
> >> > temporary linked list for accumulating
> >> > pages when we unmap a hugepage range
> >>
> >> Sorry for coming up with the comment that late but you owe us an
> >> explanation _why_ you are doing this.
> >>
> >> I assume that this fixes a real problem when we take i_mmap_mutex
> >> already up in
> >> unmap_mapping_range
> >>   mutex_lock(&mapping->i_mmap_mutex);
> >>   unmap_mapping_range_tree | unmap_mapping_range_list
> >>     unmap_mapping_range_vma
> >>       zap_page_range_single
> >>         unmap_single_vma
> >>           unmap_hugepage_range
> >>             mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >>
> >> And that this should have been marked for stable as well (I haven't
> >> checked when this has been introduced).
> >>
> >> But then I do not see how this help when you still do this:
> >> [...]
> >> > diff --git a/mm/memory.c b/mm/memory.c
> >> > index 1b7dc66..545e18a 100644
> >> > --- a/mm/memory.c
> >> > +++ b/mm/memory.c
> >> > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
> >> >  			 * Since no pte has actually been setup, it is
> >> >  			 * safe to do nothing in this case.
> >> >  			 */
> >> > -			if (vma->vm_file)
> >> > -				unmap_hugepage_range(vma, start, end, NULL);
> >> > +			if (vma->vm_file) {
> >> > +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >> > +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> >> > +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >> > +			}
> >> >  		} else
> >> >  			unmap_page_range(tlb, vma, start, end, details);
> >> >  	}
> >
> > Ahhh, you are removing the lock in the next patch. Really confusing and
> > not nice for the stable backport.
> > Could you merge those two patches and add Cc: stable?
> > Then you can add my
> > Reviewed-by: Michal Hocko
> >
>
> In the last review cycle I was asked to see if we can get a lockdep
> report for the above and what I found was we don't really cause the
> above deadlock with the current codebase because for hugetlb we don't
> directly call unmap_mapping_range.

Ahh, ok I missed that.

> But still it is good to remove the i_mmap_mutex, because we don't need
> that protection now. I didn't mark it for stable because of the above
> reason.

Thanks for the clarification.

>
> -aneesh
>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 14 Jun 2012 09:16:37 +0200
From: Michal Hocko
Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
Message-ID: <20120614071637.GB27397@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<20120613145923.GA14777@tiehlicka.suse.cz>
	<871uljnp71.fsf@skywalker.in.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <871uljnp71.fsf@skywalker.in.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

On Wed 13-06-12 22:07:06, Aneesh Kumar K.V wrote:
> Michal Hocko writes:
>
> > On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> >> From: "Aneesh Kumar K.V"
> >>
> >> Use a mmu_gather instead of a temporary linked list for accumulating
> >> pages when we unmap a hugepage range
> >
> > Sorry for coming up with the comment that late but you owe us an
> > explanation _why_ you are doing this.
> >
> > I assume that this fixes a real problem when we take i_mmap_mutex
> > already up in
> > unmap_mapping_range
> >   mutex_lock(&mapping->i_mmap_mutex);
> >   unmap_mapping_range_tree | unmap_mapping_range_list
> >     unmap_mapping_range_vma
> >       zap_page_range_single
> >         unmap_single_vma
> >           unmap_hugepage_range
> >             mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >
> > And that this should have been marked for stable as well (I haven't
> > checked when this has been introduced).
>
> Switch to mmu_gather is to get rid of the use of page->lru so that i can use it for
> active list.

So can we get this to the changelog please?

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 14 Jun 2012 09:20:53 +0200
From: Michal Hocko
Subject: Re: [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb
Message-ID: <20120614072053.GC27397@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1339583254-895-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

On Wed 13-06-12 15:57:24, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> i_mmap_mutex lock was added in unmap_single_vma by 502717f4e ("hugetlb:
> fix linked list corruption in unmap_hugepage_range()") but we don't use
> page->lru in unmap_hugepage_range any more. Also the lock was taken
> higher up in the stack in some code path. That would result in deadlock.

This sounds like the deadlock is real but in the other email you wrote
that the deadlock cannot happen so it would be good to mention it here.

> unmap_mapping_range (i_mmap_mutex)
>  -> unmap_mapping_range_tree
>     -> unmap_mapping_range_vma
>        -> zap_page_range_single
>           -> unmap_single_vma
>              -> unmap_hugepage_range (i_mmap_mutex)
>
> For shared pagetable support for huge pages, since pagetable pages are ref
> counted we don't need any lock during huge_pmd_unshare. We do take
> i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in mapping.
> (39dde65c9940c97f ("shared page table for hugetlb page")).
>
> Signed-off-by: Aneesh Kumar K.V
> ---
>  mm/memory.c |    5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 545e18a..f6bc04f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1326,11 +1326,8 @@ static void unmap_single_vma(struct mmu_gather *tlb,
>  			 * Since no pte has actually been setup, it is
>  			 * safe to do nothing in this case.
>  			 */
> -			if (vma->vm_file) {
> -				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> +			if (vma->vm_file)
>  				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> -				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> -			}
>  		} else
>  			unmap_page_range(tlb, vma, start, end, details);
>  	}
> --
> 1.7.10
>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 14 Jun 2012 09:28:31 +0200
From: Michal Hocko
Subject: Re: [PATCH -V9 06/15] hugetlb: simplify migrate_huge_page()
Message-ID: <20120614072831.GD27397@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1339583254-895-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

On Wed 13-06-12 15:57:25, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> Since we migrate only one hugepage, don't use linked list for passing the
> page around. Directly pass the page that need to be migrated as argument.
> This also remove the usage page->lru in migrate path.
>
> Reviewed-by: KAMEZAWA Hiroyuki
> Signed-off-by: Aneesh Kumar K.V

Yes nice.
Reviewed-by: Michal Hocko

> ---
>  include/linux/migrate.h |    4 +--
>  mm/memory-failure.c     |   13 ++--------
>  mm/migrate.c            |   65 +++++++++++++++--------------------------------
>  3 files changed, 25 insertions(+), 57 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 855c337..ce7e667 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -15,7 +15,7 @@ extern int migrate_page(struct address_space *,
>  extern int migrate_pages(struct list_head *l, new_page_t x,
>  			unsigned long private, bool offlining,
>  			enum migrate_mode mode);
> -extern int migrate_huge_pages(struct list_head *l, new_page_t x,
> +extern int migrate_huge_page(struct page *, new_page_t x,
>  			unsigned long private, bool offlining,
>  			enum migrate_mode mode);
>
> @@ -36,7 +36,7 @@ static inline void putback_lru_pages(struct list_head *l) {}
>  static inline int migrate_pages(struct list_head *l, new_page_t x,
>  		unsigned long private, bool offlining,
>  		enum migrate_mode mode) { return -ENOSYS; }
> -static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
> +static inline int migrate_huge_page(struct page *page, new_page_t x,
>  		unsigned long private, bool offlining,
>  		enum migrate_mode mode) { return -ENOSYS; }
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ab1e714..53a1495 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1414,7 +1414,6 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  	int ret;
>  	unsigned long pfn = page_to_pfn(page);
>  	struct page *hpage = compound_head(page);
> -	LIST_HEAD(pagelist);
>
>  	ret = get_any_page(page, pfn, flags);
>  	if (ret < 0)
> @@ -1429,19 +1428,11 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  	}
>
>  	/* Keep page count to indicate a given hugepage is isolated. */
> -
> -	list_add(&hpage->lru, &pagelist);
> -	ret = migrate_huge_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0,
> -				true);
> +	ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, 0, true);
> +	put_page(hpage);
>  	if (ret) {
> -		struct page *page1, *page2;
> -		list_for_each_entry_safe(page1, page2, &pagelist, lru)
> -			put_page(page1);
> -
>  		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
>  			pfn, ret, page->flags);
> -		if (ret > 0)
> -			ret = -EIO;
>  		return ret;
>  	}
>  done:
> diff --git a/mm/migrate.c b/mm/migrate.c
> index be26d5c..fdce3a2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -932,15 +932,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  	if (anon_vma)
>  		put_anon_vma(anon_vma);
>  	unlock_page(hpage);
> -
>  out:
> -	if (rc != -EAGAIN) {
> -		list_del(&hpage->lru);
> -		put_page(hpage);
> -	}
> -
>  	put_page(new_hpage);
> -
>  	if (result) {
>  		if (rc)
>  			*result = rc;
> @@ -1016,48 +1009,32 @@ out:
>  	return nr_failed + retry;
>  }
>
> -int migrate_huge_pages(struct list_head *from,
> -		new_page_t get_new_page, unsigned long private, bool offlining,
> -		enum migrate_mode mode)
> +int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
> +		      unsigned long private, bool offlining,
> +		      enum migrate_mode mode)
>  {
> -	int retry = 1;
> -	int nr_failed = 0;
> -	int pass = 0;
> -	struct page *page;
> -	struct page *page2;
> -	int rc;
> -
> -	for (pass = 0; pass < 10 && retry; pass++) {
> -		retry = 0;
> -
> -		list_for_each_entry_safe(page, page2, from, lru) {
> +	int pass, rc;
> +
> +	for (pass = 0; pass < 10; pass++) {
> +		rc = unmap_and_move_huge_page(get_new_page,
> +					      private, hpage, pass > 2, offlining,
> +					      mode);
> +		switch (rc) {
> +		case -ENOMEM:
> +			goto out;
> +		case -EAGAIN:
> +			/* try again */
>  			cond_resched();
> -
> -			rc = unmap_and_move_huge_page(get_new_page,
> -					private, page, pass > 2, offlining,
> -					mode);
> -
> -			switch(rc) {
> -			case -ENOMEM:
> -				goto out;
> -			case -EAGAIN:
> -				retry++;
> -				break;
> -			case 0:
> -				break;
> -			default:
> -				/* Permanent failure */
> -				nr_failed++;
> -				break;
> -			}
> +			break;
> +		case 0:
> +			goto out;
> +		default:
> +			rc = -EIO;
> +			goto out;
>  		}
>  	}
> -	rc = 0;
>  out:
> -	if (rc)
> -		return rc;
> -
> -	return nr_failed + retry;
> +	return rc;
>  }
>
>  #ifdef CONFIG_NUMA
> --
> 1.7.10
>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 14 Jun 2012 09:33:20 +0200
From: Michal Hocko
Subject: Re: [PATCH -V9 07/15] hugetlb: add a list for tracking in-use HugeTLB pages
Message-ID: <20120614073320.GE27397@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1339583254-895-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

On Wed 13-06-12 15:57:26, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> hugepage_activelist will be used to track currently used HugeTLB pages.
> We need to find the in-use HugeTLB pages to support HugeTLB cgroup removal.
> On cgroup removal we update the page's HugeTLB cgroup to point to parent
> cgroup.
> > Reviewed-by: KAMEZAWA Hiroyuki > Signed-off-by: Aneesh Kumar K.V Reviewed-by: Michal Hocko > --- > include/linux/hugetlb.h | 1 + > mm/hugetlb.c | 12 +++++++----- > 2 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 0f23c18..ed550d8 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -211,6 +211,7 @@ struct hstate { > unsigned long resv_huge_pages; > unsigned long surplus_huge_pages; > unsigned long nr_overcommit_huge_pages; > + struct list_head hugepage_activelist; > struct list_head hugepage_freelists[MAX_NUMNODES]; > unsigned int nr_huge_pages_node[MAX_NUMNODES]; > unsigned int free_huge_pages_node[MAX_NUMNODES]; > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index e54b695..b5b6e15 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -510,7 +510,7 @@ void copy_huge_page(struct page *dst, struct page *src) > static void enqueue_huge_page(struct hstate *h, struct page *page) > { > int nid = page_to_nid(page); > - list_add(&page->lru, &h->hugepage_freelists[nid]); > + list_move(&page->lru, &h->hugepage_freelists[nid]); > h->free_huge_pages++; > h->free_huge_pages_node[nid]++; > } > @@ -522,7 +522,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid) > if (list_empty(&h->hugepage_freelists[nid])) > return NULL; > page = list_entry(h->hugepage_freelists[nid].next, struct page, lru); > - list_del(&page->lru); > + list_move(&page->lru, &h->hugepage_activelist); > set_page_refcounted(page); > h->free_huge_pages--; > h->free_huge_pages_node[nid]--; > @@ -626,10 +626,11 @@ static void free_huge_page(struct page *page) > page->mapping = NULL; > BUG_ON(page_count(page)); > BUG_ON(page_mapcount(page)); > - INIT_LIST_HEAD(&page->lru); > > spin_lock(&hugetlb_lock); > if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { > + /* remove the page from active list */ > + list_del(&page->lru); > update_and_free_page(h, page); > 
h->surplus_huge_pages--; > h->surplus_huge_pages_node[nid]--; > @@ -642,6 +643,7 @@ static void free_huge_page(struct page *page) > > static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) > { > + INIT_LIST_HEAD(&page->lru); > set_compound_page_dtor(page, free_huge_page); > spin_lock(&hugetlb_lock); > h->nr_huge_pages++; > @@ -890,6 +892,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) > > spin_lock(&hugetlb_lock); > if (page) { > + INIT_LIST_HEAD(&page->lru); > r_nid = page_to_nid(page); > set_compound_page_dtor(page, free_huge_page); > /* > @@ -994,7 +997,6 @@ retry: > list_for_each_entry_safe(page, tmp, &surplus_list, lru) { > if ((--needed) < 0) > break; > - list_del(&page->lru); > /* > * This page is now managed by the hugetlb allocator and has > * no users -- drop the buddy allocator's reference. > @@ -1009,7 +1011,6 @@ free: > /* Free unnecessary surplus pages to the buddy allocator */ > if (!list_empty(&surplus_list)) { > list_for_each_entry_safe(page, tmp, &surplus_list, lru) { > - list_del(&page->lru); > put_page(page); > } > } > @@ -1909,6 +1910,7 @@ void __init hugetlb_add_hstate(unsigned order) > h->free_huge_pages = 0; > for (i = 0; i < MAX_NUMNODES; ++i) > INIT_LIST_HEAD(&h->hugepage_freelists[i]); > + INIT_LIST_HEAD(&h->hugepage_activelist); > h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]); > h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); > snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", > -- > 1.7.10 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
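The list handling in the patch above can be modelled in userspace. The following is a minimal sketch, not the kernel code: struct list_head and the list helpers are re-implemented here for illustration, hstate_model keeps a single free list instead of one per NUMA node, and hugetlb_lock is left out entirely. All names ending in _model, plus hstate_init, prep_new_page, enqueue_page, dequeue_page and list_len, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's intrusive list; not the kernel API. */
struct list_head { struct list_head *next, *prev; };

void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/*
 * Unlink n from whatever list it is on, then add it after head.
 * Unlinking a self-linked node is a no-op, so this is safe on a page
 * that was only initialised and never queued.
 */
void list_move(struct list_head *n, struct list_head *head)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_add(n, head);
}

size_t list_len(const struct list_head *head)
{
	size_t len = 0;
	const struct list_head *pos;

	for (pos = head->next; pos != head; pos = pos->next)
		len++;
	return len;
}

struct page_model { struct list_head lru; };

struct hstate_model {
	struct list_head freelist;    /* assumption: one list, no NUMA nodes */
	struct list_head activelist;
	unsigned long free_huge_pages;
};

void hstate_init(struct hstate_model *h)
{
	init_list_head(&h->freelist);
	init_list_head(&h->activelist);
	h->free_huge_pages = 0;
}

/* Mirrors the INIT_LIST_HEAD() the patch adds for freshly allocated pages. */
void prep_new_page(struct page_model *p) { init_list_head(&p->lru); }

/* Free path: the page leaves the active list (if on it) and joins the free list. */
void enqueue_page(struct hstate_model *h, struct page_model *p)
{
	list_move(&p->lru, &h->freelist);
	h->free_huge_pages++;
}

/* Allocation path: the page goes straight from the free list to the active list. */
struct page_model *dequeue_page(struct hstate_model *h)
{
	struct page_model *p;

	if (h->freelist.next == &h->freelist)
		return NULL;
	p = (struct page_model *)((char *)h->freelist.next -
				  offsetof(struct page_model, lru));
	list_move(&p->lru, &h->activelist);
	assert(h->free_huge_pages > 0);
	h->free_huge_pages--;
	return p;
}
```

Because every page is always on exactly one list, a walker of the active list sees all in-use pages, which is what the cgroup removal path described in the changelog relies on.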
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx200.postini.com [74.125.245.200]) by kanga.kvack.org (Postfix) with SMTP id 9F57E6B0069 for ; Thu, 14 Jun 2012 04:44:31 -0400 (EDT) Date: Thu, 14 Jun 2012 10:44:29 +0200 From: Michal Hocko Subject: Re: [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru Message-ID: <20120614084429.GH27397@tiehlicka.suse.cz> References: <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339587270-5831-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1339587270-5831-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Wed 13-06-12 17:04:30, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Add the hugetlb cgroup pointer to 3rd page lru.next. This limit > the usage to hugetlb cgroup to only hugepages with 3 or more > normal pages. I guess that is an acceptable limitation. > > Signed-off-by: Aneesh Kumar K.V I would be happier if you explicitely mentioned that both hugetlb_cgroup_from_page and set_hugetlb_cgroup need hugetlb_lock held, but Reviewed-by: Michal Hocko > --- > include/linux/hugetlb_cgroup.h | 37 +++++++++++++++++++++++++++++++++++++ > mm/hugetlb.c | 4 ++++ > 2 files changed, 41 insertions(+) > > diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h > index e9944b4..2e4cb6b 100644 > --- a/include/linux/hugetlb_cgroup.h > +++ b/include/linux/hugetlb_cgroup.h > @@ -18,8 +18,34 @@ > #include > > struct hugetlb_cgroup; > +/* > + * Minimum page order trackable by hugetlb cgroup. 
> + * At least 3 pages are necessary for all the tracking information. > + */ > +#define HUGETLB_CGROUP_MIN_ORDER 2 > > #ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR > + > +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) > +{ > + VM_BUG_ON(!PageHuge(page)); > + > + if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) > + return NULL; > + return (struct hugetlb_cgroup *)page[2].lru.next; > +} > + > +static inline > +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) > +{ > + VM_BUG_ON(!PageHuge(page)); > + > + if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) > + return -1; > + page[2].lru.next = (void *)h_cg; > + return 0; > +} > + > static inline bool hugetlb_cgroup_disabled(void) > { > if (hugetlb_subsys.disabled) > @@ -28,6 +54,17 @@ static inline bool hugetlb_cgroup_disabled(void) > } > > #else > +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) > +{ > + return NULL; > +} > + > +static inline > +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) > +{ > + return 0; > +} > + > static inline bool hugetlb_cgroup_disabled(void) > { > return true; > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index e899a2d..6a449c5 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -28,6 +28,7 @@ > > #include > #include > +#include > #include > #include "internal.h" > > @@ -591,6 +592,7 @@ static void update_and_free_page(struct hstate *h, struct page *page) > 1 << PG_active | 1 << PG_reserved | > 1 << PG_private | 1 << PG_writeback); > } > + VM_BUG_ON(hugetlb_cgroup_from_page(page)); > set_compound_page_dtor(page, NULL); > set_page_refcounted(page); > arch_release_hugepage(page); > @@ -643,6 +645,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) > INIT_LIST_HEAD(&page->lru); > set_compound_page_dtor(page, free_huge_page); > spin_lock(&hugetlb_lock); > + set_hugetlb_cgroup(page, NULL); > h->nr_huge_pages++; > h->nr_huge_pages_node[nid]++; > 
spin_unlock(&hugetlb_lock); > @@ -892,6 +895,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) > INIT_LIST_HEAD(&page->lru); > r_nid = page_to_nid(page); > set_compound_page_dtor(page, free_huge_page); > + set_hugetlb_cgroup(page, NULL); > /* > * We incremented the global counters already > */ > -- > 1.7.10 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx163.postini.com [74.125.245.163]) by kanga.kvack.org (Postfix) with SMTP id 629AA6B005C for ; Thu, 14 Jun 2012 05:25:43 -0400 (EDT) Date: Thu, 14 Jun 2012 11:25:40 +0200 From: Michal Hocko Subject: Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup Message-ID: <20120614092539.GI27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org On Wed 13-06-12 15:57:30, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This patchset add the charge and uncharge routines for hugetlb cgroup. > We do cgroup charging in page alloc and uncharge in compound page > destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock. > > Signed-off-by: Aneesh Kumar K.V Reviewed-by: Michal Hocko One minor comment [...] 
> +void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
> +				  struct hugetlb_cgroup *h_cg,
> +				  struct page *page)
> +{
> +	if (hugetlb_cgroup_disabled() || !h_cg)
> +		return;
> +
> +	spin_lock(&hugetlb_lock);
> +	set_hugetlb_cgroup(page, h_cg);
> +	spin_unlock(&hugetlb_lock);
> +	return;
> +}

I guess we can remove the lock here because nobody can see the page yet,
right?
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx104.postini.com [74.125.245.104])
	by kanga.kvack.org (Postfix) with SMTP id B26B76B005C
	for ; Thu, 14 Jun 2012 06:04:56 -0400 (EDT)
Date: Thu, 14 Jun 2012 12:04:54 +0200
From: Michal Hocko
Subject: Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration
Message-ID: <20120614100454.GL27397@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

On Wed 13-06-12 15:57:33, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor. Since
> we are holding a hugepage reference, we can be sure that old page won't
> get uncharged till the last put_page().
>
> Signed-off-by: Aneesh Kumar K.V

Reviewed-by: Michal Hocko

One question below
[...]
> +void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
> +{
> +	struct hugetlb_cgroup *h_cg;
> +
> +	if (hugetlb_cgroup_disabled())
> +		return;
> +
> +	VM_BUG_ON(!PageHuge(oldhpage));
> +	spin_lock(&hugetlb_lock);
> +	h_cg = hugetlb_cgroup_from_page(oldhpage);
> +	set_hugetlb_cgroup(oldhpage, NULL);
> +	cgroup_exclude_rmdir(&h_cg->css);
> +
> +	/* move the h_cg details to new cgroup */
> +	set_hugetlb_cgroup(newhpage, h_cg);
> +	spin_unlock(&hugetlb_lock);
> +	cgroup_release_and_wakeup_rmdir(&h_cg->css);
> +	return;
> +}
> +

The changelog says that the old page won't get uncharged - which means
that the cgroup cannot go away (even if we raced with the move
parent, hugetlb_lock makes sure we either see old or new cgroup) so why
do we need to play with css ref. counting?
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
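The pointer hand-off discussed in this review can be sketched in userspace. This is a simplified model under stated assumptions, not the kernel implementation: a huge page is an array of page_model entries, the cgroup pointer lives in the third entry (as page[2].lru.next does in patch 10/15), and hugetlb_lock is replaced by a boolean so that the "lock must be held" rule can be checked with assert(). MIN_ORDER mirrors HUGETLB_CGROUP_MIN_ORDER; every other name is hypothetical. The css reference counting that the review questions is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_ORDER 2	/* mirrors HUGETLB_CGROUP_MIN_ORDER from patch 10/15 */

struct cgroup_model { const char *name; };

/* One entry per base page of a compound page; only the field used here. */
struct page_model { void *lru_next; };

struct hugepage_model {
	unsigned int order;		/* compound order: 1 << order base pages */
	struct page_model *pages;	/* array of 1 << order entries */
};

static bool lock_held;	/* stands in for hugetlb_lock (single-threaded model) */

void model_lock(void)   { assert(!lock_held); lock_held = true; }
void model_unlock(void) { assert(lock_held);  lock_held = false; }

/* Read the owning cgroup; pages too small to have a third entry have none. */
struct cgroup_model *cgroup_from_page(const struct hugepage_model *hp)
{
	assert(lock_held);
	if (hp->order < MIN_ORDER)
		return NULL;
	return hp->pages[2].lru_next;
}

/* Store the owning cgroup; returns -1 when the page cannot be tracked. */
int set_cgroup(struct hugepage_model *hp, struct cgroup_model *cg)
{
	assert(lock_held);
	if (hp->order < MIN_ORDER)
		return -1;
	hp->pages[2].lru_next = cg;
	return 0;
}

/* Hand the cgroup pointer from the old page to the new one in one critical section. */
void migrate_cgroup(struct hugepage_model *oldp, struct hugepage_model *newp)
{
	struct cgroup_model *cg;

	model_lock();
	cg = cgroup_from_page(oldp);
	set_cgroup(oldp, NULL);
	set_cgroup(newp, cg);
	model_unlock();
}
```

Since the read, the clear and the store happen under one lock hold, any other path that takes the same lock sees the cgroup on either the old page or the new page, never on both and never on neither.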
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx196.postini.com [74.125.245.196])
	by kanga.kvack.org (Postfix) with SMTP id 4ACB46B005C
	for ; Thu, 14 Jun 2012 06:07:57 -0400 (EDT)
Date: Thu, 14 Jun 2012 12:07:55 +0200
From: Michal Hocko
Subject: Re: [PATCH -V9 15/15] hugetlb/cgroup: add HugeTLB controller documentation
Message-ID: <20120614100755.GM27397@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-16-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1339583254-895-16-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

On Wed 13-06-12 15:57:34, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> Reviewed-by: KAMEZAWA Hiroyuki
> Signed-off-by: Aneesh Kumar K.V

Reviewed-by: Michal Hocko

Minor nit below

> ---
>  Documentation/cgroups/hugetlb.txt |   45 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>  create mode 100644 Documentation/cgroups/hugetlb.txt
>
> diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroups/hugetlb.txt
> new file mode 100644
> index 0000000..a9faaca
> --- /dev/null
> +++ b/Documentation/cgroups/hugetlb.txt
[...]
> +With the above step, the initial or the parent HugeTLB group becomes
> +visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
> +the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
> +
> +New groups can be created under the parent group /sys/fs/cgroup.
> + > +# cd /sys/fs/cgroup > +# mkdir g1 > +# echo $$ > g1/tasks > + > +The above steps create a new group g1 and move the current shell > +process (bash) into it. This is probably not needed as it is already described in the generic cgroups description > + > +Brief summary of control files > + > + hugetlb..limit_in_bytes # set/show limit of "hugepagesize" hugetlb usage > + hugetlb..max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded > + hugetlb..usage_in_bytes # show current res_counter usage for "hugepagesize" hugetlb > + hugetlb..failcnt # show the number of allocation failure due to HugeTLB limit > + > +For a system supporting two hugepage size (16M and 16G) the control > +files include: > + > +hugetlb.16GB.limit_in_bytes > +hugetlb.16GB.max_usage_in_bytes > +hugetlb.16GB.usage_in_bytes > +hugetlb.16GB.failcnt > +hugetlb.16MB.limit_in_bytes > +hugetlb.16MB.max_usage_in_bytes > +hugetlb.16MB.usage_in_bytes > +hugetlb.16MB.failcnt > -- > 1.7.10 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx181.postini.com [74.125.245.181]) by kanga.kvack.org (Postfix) with SMTP id CE6686B005C for ; Fri, 15 Jun 2012 02:20:59 -0400 (EDT) Received: from /spool/local by e28smtp04.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! 
Violators will be prosecuted for from ; Fri, 15 Jun 2012 11:50:56 +0530
Received: from d28av05.in.ibm.com (d28av05.in.ibm.com [9.184.220.67])
	by d28relay01.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q5F6Ks7u13697458
	for ; Fri, 15 Jun 2012 11:50:54 +0530
Received: from d28av05.in.ibm.com (loopback [127.0.0.1])
	by d28av05.in.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q5FBpOWc029754
	for ; Fri, 15 Jun 2012 21:51:26 +1000
From: "Aneesh Kumar K.V"
Subject: Re: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup
In-Reply-To: <4FD9A6B6.50503@huawei.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<4FD9A6B6.50503@huawei.com>
Date: Fri, 15 Jun 2012 11:50:52 +0530
Message-ID: <87mx45m6yj.fsf@skywalker.in.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain
Sender: owner-linux-mm@kvack.org
List-ID:
To: Li Zefan
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org

Li Zefan writes:

>> +static inline
>
>> +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
>> +{
>> +	if (s)
>
>
> Neither cgroup_subsys_state() or task_subsys_state() will ever return NULL,
> so here 's' won't be NULL.
>

That is a change that didn't get updated when I dropped the page_cgroup
changes. I had a series that tracked cgroup_subsys_state in page_cgroup.
I will send a fix on top.

Thanks for the review.
-aneesh
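The reason a NULL check is redundant here can be illustrated with a small userspace sketch. It assumes the helper recovers the controller's structure from an embedded subsystem state with container_of()-style pointer arithmetic, which the quoted fragment does not show. The struct layouts and the _model names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup_subsys_state. */
struct css_model { int refcnt; };

/* Hypothetical stand-in for struct hugetlb_cgroup: the css is embedded. */
struct hugetlb_cgroup_model {
	unsigned long usage;
	struct css_model css;
};

/*
 * Recover the enclosing structure from a pointer to its embedded css.
 * The callers named in the review never pass NULL, so the contract is
 * expressed as an assertion instead of a runtime branch.
 */
struct hugetlb_cgroup_model *hugetlb_cgroup_from_css_model(struct css_model *s)
{
	assert(s != NULL);
	return (struct hugetlb_cgroup_model *)
		((char *)s - offsetof(struct hugetlb_cgroup_model, css));
}
```

With an embedded member at a non-zero offset, silently accepting a NULL input would require a dedicated branch anyway, because the plain subtraction would yield a bogus non-NULL pointer. Dropping the check is therefore only safe when the non-NULL guarantee holds at every call site, as the review states.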
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx178.postini.com [74.125.245.178]) by kanga.kvack.org (Postfix) with SMTP id 6787E6B0069 for ; Fri, 15 Jun 2012 06:08:37 -0400 (EDT) Received: from /spool/local by e28smtp05.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 15 Jun 2012 15:38:33 +0530 Received: from d28av02.in.ibm.com (d28av02.in.ibm.com [9.184.220.64]) by d28relay01.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q5FA8VHM9830868 for ; Fri, 15 Jun 2012 15:38:31 +0530 Received: from d28av02.in.ibm.com (loopback [127.0.0.1]) by d28av02.in.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q5FFd92V021069 for ; Sat, 16 Jun 2012 01:39:09 +1000 From: "Aneesh Kumar K.V" Subject: [PATCH 2/2] hugetlb/cgroup: Assign the page hugetlb cgroup when we move the page to active list. Date: Fri, 15 Jun 2012 15:38:22 +0530 Message-Id: <1339754902-17779-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1339754902-17779-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <87k3z8nb3h.fsf@skywalker.in.ibm.com> <1339754902-17779-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, akpm@linux-foundation.org Cc: "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" page's hugetlb cgroup assign and moving to active list should happen with hugetlb_lock held. Otherwise when we remove the hugetlb cgroup we would iterate the active list and will find page with NULL hugetlb cgroup values. 
Signed-off-by: Aneesh Kumar K.V
---
 mm/hugetlb.c        |   12 +++++++-----
 mm/hugetlb_cgroup.c |    3 +--
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ee4da3b..b90dfb4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1146,9 +1146,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	}
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
-	spin_unlock(&hugetlb_lock);
-
-	if (!page) {
+	if (page) {
+		/* update page cgroup details */
+		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
+		spin_unlock(&hugetlb_lock);
+	} else {
+		spin_unlock(&hugetlb_lock);
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
 			hugetlb_cgroup_uncharge_cgroup(idx,
@@ -1159,14 +1162,13 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		}
 		spin_lock(&hugetlb_lock);
 		list_move(&page->lru, &h->hugepage_activelist);
+		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
 		spin_unlock(&hugetlb_lock);
 	}
 
 	set_page_private(page, (unsigned long)spool);
 
 	vma_commit_reservation(h, vma, addr);
-	/* update page cgroup details */
-	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
 	return page;
 }
 
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 8e7ca0a..d4f3f7b 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -218,6 +218,7 @@ done:
 	return ret;
 }
 
+/* Should be called with hugetlb_lock held */
 void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
 				  struct hugetlb_cgroup *h_cg,
 				  struct page *page)
@@ -225,9 +226,7 @@ void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
 	if (hugetlb_cgroup_disabled() || !h_cg)
 		return;
 
-	spin_lock(&hugetlb_lock);
 	set_hugetlb_cgroup(page, h_cg);
-	spin_unlock(&hugetlb_lock);
 	return;
 }
 
--
1.7.10
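The ordering rule this patch enforces (put the page on the active list and record its cgroup inside one critical section) can be sketched as a single-threaded userspace model. Everything here is an assumption made for illustration: the boolean lock_held stands in for hugetlb_lock, the active flag stands in for membership of hugepage_activelist, usage counters are flat rather than hierarchical, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cgroup_model {
	struct cgroup_model *parent;
	unsigned long usage;	/* pages charged to this group (flat accounting) */
};

struct page_model {
	struct cgroup_model *cg;
	bool active;		/* stands in for being on hugepage_activelist */
};

static bool lock_held;		/* stands in for hugetlb_lock */

void model_lock(void)   { assert(!lock_held); lock_held = true; }
void model_unlock(void) { assert(lock_held);  lock_held = false; }

/* Step 1: account the page against the limit before any page is handed out. */
int charge(struct cgroup_model *cg, unsigned long limit)
{
	if (cg->usage + 1 > limit)
		return -1;
	cg->usage++;
	return 0;
}

/* Step 2: record the owner; the caller must already hold the lock. */
void commit_charge(struct page_model *p, struct cgroup_model *cg)
{
	assert(lock_held);
	p->cg = cg;
}

/* Activation and commit share one critical section, as in the patch. */
void activate_and_commit(struct page_model *p, struct cgroup_model *cg)
{
	model_lock();
	p->active = true;
	commit_charge(p, cg);
	model_unlock();
}

/* Cgroup removal: hand every active page of child to its parent. */
unsigned long reparent(struct page_model *pages, size_t n,
		       struct cgroup_model *child)
{
	unsigned long moved = 0;
	size_t i;

	model_lock();
	for (i = 0; i < n; i++) {
		if (pages[i].active && pages[i].cg == child) {
			pages[i].cg = child->parent;
			child->usage--;
			child->parent->usage++;
			moved++;
		}
	}
	model_unlock();
	return moved;
}
```

If activation and commit were split across two lock holds, a removal walk running in between would find an active page with no owner and skip it, leaving the child's charge behind. That is the window the commit message describes.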
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx115.postini.com [74.125.245.115]) by kanga.kvack.org (Postfix) with SMTP id 2C0C56B006C for ; Fri, 15 Jun 2012 06:08:38 -0400 (EDT) Received: from /spool/local by e28smtp07.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 15 Jun 2012 15:38:35 +0530 Received: from d28av02.in.ibm.com (d28av02.in.ibm.com [9.184.220.64]) by d28relay01.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q5FA8VPe1835488 for ; Fri, 15 Jun 2012 15:38:31 +0530 Received: from d28av02.in.ibm.com (loopback [127.0.0.1]) by d28av02.in.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q5FFd93Z021054 for ; Sat, 16 Jun 2012 01:39:09 +1000 From: "Aneesh Kumar K.V" Subject: [PATCH 1/2] hugetlb: Move all the in use pages to active list Date: Fri, 15 Jun 2012 15:38:21 +0530 Message-Id: <1339754902-17779-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <87k3z8nb3h.fsf@skywalker.in.ibm.com> References: <87k3z8nb3h.fsf@skywalker.in.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz, akpm@linux-foundation.org Cc: "Aneesh Kumar K.V" From: "Aneesh Kumar K.V" When we fail to allocate pages from the reserve pool, hugetlb do try to allocate huge pages using alloc_buddy_huge_page. Add these to the active list. We also need to add the huge page we allocate when we soft offline the oldpage to active list. 
Signed-off-by: Aneesh Kumar K.V --- mm/hugetlb.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c57740b..ee4da3b 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -928,8 +928,10 @@ struct page *alloc_huge_page_node(struct hstate *h, int nid) page = dequeue_huge_page_node(h, nid); spin_unlock(&hugetlb_lock); - if (!page) + if (!page) { page = alloc_buddy_huge_page(h, nid); + list_move(&page->lru, &h->hugepage_activelist); + } return page; } @@ -1155,6 +1157,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, hugepage_subpool_put_pages(spool, chg); return ERR_PTR(-ENOSPC); } + spin_lock(&hugetlb_lock); + list_move(&page->lru, &h->hugepage_activelist); + spin_unlock(&hugetlb_lock); } set_page_private(page, (unsigned long)spool); -- 1.7.10 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx151.postini.com [74.125.245.151]) by kanga.kvack.org (Postfix) with SMTP id 7F8C66B004D for ; Fri, 15 Jun 2012 06:06:22 -0400 (EDT) Received: from /spool/local by e23smtp03.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! 
Violators will be prosecuted for from ; Fri, 15 Jun 2012 09:55:36 +1000 Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.235.139]) by d23relay03.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q5FA6Etk46792906 for ; Fri, 15 Jun 2012 20:06:14 +1000 Received: from d23av04.au.ibm.com (loopback [127.0.0.1]) by d23av04.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q5FA6DAZ029513 for ; Fri, 15 Jun 2012 20:06:13 +1000 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup In-Reply-To: <20120614092539.GI27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120614092539.GI27397@tiehlicka.suse.cz> Date: Fri, 15 Jun 2012 15:36:10 +0530 Message-ID: <87k3z8nb3h.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Michal Hocko writes: > On Wed 13-06-12 15:57:30, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> This patchset add the charge and uncharge routines for hugetlb cgroup. >> We do cgroup charging in page alloc and uncharge in compound page >> destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock. >> >> Signed-off-by: Aneesh Kumar K.V > > Reviewed-by: Michal Hocko > > One minor comment > [...] 
>> +void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, >> + struct hugetlb_cgroup *h_cg, >> + struct page *page) >> +{ >> + if (hugetlb_cgroup_disabled() || !h_cg) >> + return; >> + >> + spin_lock(&hugetlb_lock); >> + set_hugetlb_cgroup(page, h_cg); >> + spin_unlock(&hugetlb_lock); >> + return; >> +} > > I guess we can remove the lock here because nobody can see the page yet, > right? > We need that to make sure when we remove cgroup we find correct page hugetlb cgroup values. But i guess we have a bug here. How about the below ? NOTE: We also need another patch to update active list during soft offline. I will send that in reply. commit e4c3fd3cc0f0faa30ea283cb48ba478a5c0d3e74 Author: Aneesh Kumar K.V Date: Fri Jun 15 14:42:27 2012 +0530 hugetlb/cgroup: Assign the page hugetlb cgroup when we move the page to active list. page's hugetlb cgroup assign and moving to active list should happen with hugetlb_lock held. Otherwise when we remove the hugetlb cgroup we would iterate the active list and will find page with NULL hugetlb cgroup values. 
Signed-off-by: Aneesh Kumar K.V diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ee4da3b..b90dfb4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1146,9 +1146,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, } spin_lock(&hugetlb_lock); page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve); - spin_unlock(&hugetlb_lock); - - if (!page) { + if (page) { + /* update page cgroup details */ + hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); + spin_unlock(&hugetlb_lock); + } else { + spin_unlock(&hugetlb_lock); page = alloc_buddy_huge_page(h, NUMA_NO_NODE); if (!page) { hugetlb_cgroup_uncharge_cgroup(idx, @@ -1159,14 +1162,13 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, } spin_lock(&hugetlb_lock); list_move(&page->lru, &h->hugepage_activelist); + hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); spin_unlock(&hugetlb_lock); } set_page_private(page, (unsigned long)spool); vma_commit_reservation(h, vma, addr); - /* update page cgroup details */ - hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); return page; } diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 8e7ca0a..d4f3f7b 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -218,6 +218,7 @@ done: return ret; } +/* Should be called with hugetlb_lock held */ void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg, struct page *page) @@ -225,9 +226,7 @@ void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, if (hugetlb_cgroup_disabled() || !h_cg) return; - spin_lock(&hugetlb_lock); set_hugetlb_cgroup(page, h_cg); - spin_unlock(&hugetlb_lock); return; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx104.postini.com [74.125.245.104]) by kanga.kvack.org (Postfix) with SMTP id 7BB2D6B005C for ; Fri, 15 Jun 2012 06:50:43 -0400 (EDT) Received: from /spool/local by e23smtp07.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 15 Jun 2012 10:41:50 +1000 Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.235.139]) by d23relay03.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q5FAoZes59768888 for ; Fri, 15 Jun 2012 20:50:35 +1000 Received: from d23av04.au.ibm.com (loopback [127.0.0.1]) by d23av04.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q5FAoYrh027534 for ; Fri, 15 Jun 2012 20:50:35 +1000 From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration In-Reply-To: <20120614100454.GL27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120614100454.GL27397@tiehlicka.suse.cz> Date: Fri, 15 Jun 2012 16:20:31 +0530 Message-ID: <87haucn91k.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Michal Hocko writes: > On Wed 13-06-12 15:57:33, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor. Since >> we are holding a hugepage reference, we can be sure that old page won't >> get uncharged till the last put_page(). >> >> Signed-off-by: Aneesh Kumar K.V > > Reviewed-by: Michal Hocko > > One question below > [...] 
>> +void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
>> +{
>> +	struct hugetlb_cgroup *h_cg;
>> +
>> +	if (hugetlb_cgroup_disabled())
>> +		return;
>> +
>> +	VM_BUG_ON(!PageHuge(oldhpage));
>> +	spin_lock(&hugetlb_lock);
>> +	h_cg = hugetlb_cgroup_from_page(oldhpage);
>> +	set_hugetlb_cgroup(oldhpage, NULL);
>> +	cgroup_exclude_rmdir(&h_cg->css);
>> +
>> +	/* move the h_cg details to new cgroup */
>> +	set_hugetlb_cgroup(newhpage, h_cg);
>> +	spin_unlock(&hugetlb_lock);
>> +	cgroup_release_and_wakeup_rmdir(&h_cg->css);
>> +	return;
>> +}
>> +
>
> The changelog says that the old page won't get uncharged - which means
> that the cgroup cannot go away (even if we raced with the move
> parent, hugetlb_lock makes sure we either see old or new cgroup) so why
> do we need to play with css ref. counting?

OK, hugetlb_lock should be sufficient here, I guess. I will send a patch on
top to remove the cgroup_exclude_rmdir() and
cgroup_release_and_wakeup_rmdir() calls.

-aneesh
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx136.postini.com [74.125.245.136]) by kanga.kvack.org (Postfix) with SMTP id 237706B0068 for ; Sat, 16 Jun 2012 02:26:58 -0400 (EDT) Received: from m3.gw.fujitsu.co.jp (unknown [10.0.50.73]) by fgwmail5.fujitsu.co.jp (Postfix) with ESMTP id AD92A3EE081 for ; Sat, 16 Jun 2012 15:26:56 +0900 (JST) Received: from smail (m3 [127.0.0.1]) by outgoing.m3.gw.fujitsu.co.jp (Postfix) with ESMTP id 9806B45DEA6 for ; Sat, 16 Jun 2012 15:26:56 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (s3.gw.fujitsu.co.jp [10.0.50.93]) by m3.gw.fujitsu.co.jp (Postfix) with ESMTP id 80B1345DE9E for ; Sat, 16 Jun 2012 15:26:56 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id 750441DB803B for ; Sat, 16 Jun 2012 15:26:56 +0900 (JST) Received: from m1001.s.css.fujitsu.com (m1001.s.css.fujitsu.com [10.240.81.139]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id 2FD901DB8038 for ; Sat, 16 Jun 2012 15:26:56 +0900 (JST) Message-ID: <4FDC26BB.80806@jp.fujitsu.com> Date: Sat, 16 Jun 2012 15:24:59 +0900 From: Kamezawa Hiroyuki MIME-Version: 1.0 Subject: Re: [PATCH 2/2] hugetlb/cgroup: Assign the page hugetlb cgroup when we move the page to active list. References: <87k3z8nb3h.fsf@skywalker.in.ibm.com> <1339754902-17779-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339754902-17779-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1339754902-17779-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, mhocko@suse.cz, akpm@linux-foundation.org (2012/06/15 19:08), Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > page's hugetlb cgroup assign and moving to active list should happen with > hugetlb_lock held. 
Otherwise when we remove the hugetlb cgroup we would > iterate the active list and will find page with NULL hugetlb cgroup values. > > Signed-off-by: Aneesh Kumar K.V Reviewed-by: KAMEZAWA Hiroyuki -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753924Ab2FMK2g (ORCPT ); Wed, 13 Jun 2012 06:28:36 -0400 Received: from e28smtp09.in.ibm.com ([122.248.162.9]:51061 "EHLO e28smtp09.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753849Ab2FMK23 (ORCPT ); Wed, 13 Jun 2012 06:28:29 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V9 08/15] hugetlb: Make some static variables global Date: Wed, 13 Jun 2012 15:57:27 +0530 Message-Id: <1339583254-895-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12061310-2674-0000-0000-000004E24DF4 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" We will use them later in hugetlb_cgroup.c Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb.h | 5 +++++ mm/hugetlb.c | 7 ++----- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ed550d8..4aca057 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -21,6 +21,11 @@ struct hugepage_subpool { long max_hpages, used_hpages; }; +extern spinlock_t hugetlb_lock; +extern int hugetlb_max_hstate; +#define for_each_hstate(h) \ + for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++) + struct hugepage_subpool *hugepage_new_subpool(long nr_blocks); void
hugepage_put_subpool(struct hugepage_subpool *spool); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index b5b6e15..e899a2d 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -35,7 +35,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; static gfp_t htlb_alloc_mask = GFP_HIGHUSER; unsigned long hugepages_treat_as_movable; -static int hugetlb_max_hstate; +int hugetlb_max_hstate; unsigned int default_hstate_idx; struct hstate hstates[HUGE_MAX_HSTATE]; @@ -46,13 +46,10 @@ static struct hstate * __initdata parsed_hstate; static unsigned long __initdata default_hstate_max_huge_pages; static unsigned long __initdata default_hstate_size; -#define for_each_hstate(h) \ - for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++) - /* * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages */ -static DEFINE_SPINLOCK(hugetlb_lock); +DEFINE_SPINLOCK(hugetlb_lock); static inline void unlock_or_release_subpool(struct hugepage_subpool *spool) { -- 1.7.10 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753940Ab2FMK2j (ORCPT ); Wed, 13 Jun 2012 06:28:39 -0400 Received: from e28smtp02.in.ibm.com ([122.248.162.2]:58164 "EHLO e28smtp02.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753905Ab2FMK2h (ORCPT ); Wed, 13 Jun 2012 06:28:37 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup Date: Wed, 13 Jun 2012 15:57:28 +0530 Message-Id: <1339583254-895-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: 
<1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12061310-5816-0000-0000-0000032119E4 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" This patch implements a new controller that allows us to control HugeTLB allocations. The extension allows to limit the HugeTLB usage per control group and enforces the controller limit during page fault. Since HugeTLB doesn't support page reclaim, enforcing the limit at page fault time implies that, the application will get SIGBUS signal if it tries to access HugeTLB pages beyond its limit. This requires the application to know beforehand how much HugeTLB pages it would require for its use. The charge/uncharge calls will be added to HugeTLB code in later patch. Support for cgroup removal will be added in later patches. Reviewed-by: KAMEZAWA Hiroyuki Signed-off-by: Aneesh Kumar K.V --- include/linux/cgroup_subsys.h | 6 ++ include/linux/hugetlb_cgroup.h | 37 ++++++++++++ init/Kconfig | 15 +++++ mm/Makefile | 1 + mm/hugetlb_cgroup.c | 122 ++++++++++++++++++++++++++++++++++++++++ 5 files changed, 181 insertions(+) create mode 100644 include/linux/hugetlb_cgroup.h create mode 100644 mm/hugetlb_cgroup.c diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h index 0bd390c..895923a 100644 --- a/include/linux/cgroup_subsys.h +++ b/include/linux/cgroup_subsys.h @@ -72,3 +72,9 @@ SUBSYS(net_prio) #endif /* */ + +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR +SUBSYS(hugetlb) +#endif + +/* */ diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h new file mode 100644 index 0000000..e9944b4 --- /dev/null +++ b/include/linux/hugetlb_cgroup.h @@ -0,0 +1,37 @@ +/* + * Copyright IBM Corporation, 2012 + * Author Aneesh Kumar K.V + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of version 2.1 of the GNU Lesser General Public License + * as published by the Free Software 
Foundation. + * + * This program is distributed in the hope that it would be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + * + */ + +#ifndef _LINUX_HUGETLB_CGROUP_H +#define _LINUX_HUGETLB_CGROUP_H + +#include + +struct hugetlb_cgroup; + +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR +static inline bool hugetlb_cgroup_disabled(void) +{ + if (hugetlb_subsys.disabled) + return true; + return false; +} + +#else +static inline bool hugetlb_cgroup_disabled(void) +{ + return true; +} + +#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ +#endif diff --git a/init/Kconfig b/init/Kconfig index d07dcf9..da05fae 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -751,6 +751,21 @@ config CGROUP_MEM_RES_CTLR_KMEM the kmem extension can use it to guarantee that no group of processes will ever exhaust kernel resources alone. +config CGROUP_HUGETLB_RES_CTLR + bool "HugeTLB Resource Controller for Control Groups" + depends on RESOURCE_COUNTERS && HUGETLB_PAGE && EXPERIMENTAL + default n + help + Provides a cgroup Resource Controller for HugeTLB pages. + When you enable this, you can put a per cgroup limit on HugeTLB usage. + The limit is enforced during page fault. Since HugeTLB doesn't + support page reclaim, enforcing the limit at page fault time implies + that, the application will get SIGBUS signal if it tries to access + HugeTLB pages beyond its limit. This requires the application to know + beforehand how much HugeTLB pages it would require for its use. The + control group is tracked in the third page lru pointer. This means + that we cannot use the controller with huge page less than 3 pages. 
+ config CGROUP_PERF bool "Enable perf_event per-cpu per-container group (cgroup) monitoring" depends on PERF_EVENTS && CGROUPS diff --git a/mm/Makefile b/mm/Makefile index 2e2fbbe..25e8002 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -49,6 +49,7 @@ obj-$(CONFIG_MIGRATION) += migrate.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o +obj-$(CONFIG_CGROUP_HUGETLB_RES_CTLR) += hugetlb_cgroup.o obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c new file mode 100644 index 0000000..5a4e71c --- /dev/null +++ b/mm/hugetlb_cgroup.c @@ -0,0 +1,122 @@ +/* + * + * Copyright IBM Corporation, 2012 + * Author Aneesh Kumar K.V + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of version 2.1 of the GNU Lesser General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it would be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + * + */ + +#include +#include +#include +#include + +struct hugetlb_cgroup { + struct cgroup_subsys_state css; + /* + * the counter to account for hugepages from hugetlb. 
+ */ + struct res_counter hugepage[HUGE_MAX_HSTATE]; +}; + +struct cgroup_subsys hugetlb_subsys __read_mostly; +struct hugetlb_cgroup *root_h_cgroup __read_mostly; + +static inline +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s) +{ + if (s) + return container_of(s, struct hugetlb_cgroup, css); + return NULL; +} + +static inline +struct hugetlb_cgroup *hugetlb_cgroup_from_cgroup(struct cgroup *cgroup) +{ + return hugetlb_cgroup_from_css(cgroup_subsys_state(cgroup, + hugetlb_subsys_id)); +} + +static inline +struct hugetlb_cgroup *hugetlb_cgroup_from_task(struct task_struct *task) +{ + return hugetlb_cgroup_from_css(task_subsys_state(task, + hugetlb_subsys_id)); +} + +static inline bool hugetlb_cgroup_is_root(struct hugetlb_cgroup *h_cg) +{ + return (h_cg == root_h_cgroup); +} + +static inline struct hugetlb_cgroup *parent_hugetlb_cgroup(struct cgroup *cg) +{ + if (!cg->parent) + return NULL; + return hugetlb_cgroup_from_cgroup(cg->parent); +} + +static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg) +{ + int idx; + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cg); + + for (idx = 0; idx < hugetlb_max_hstate; idx++) { + if ((res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE)) > 0) + return true; + } + return false; +} + +static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup) +{ + int idx; + struct cgroup *parent_cgroup; + struct hugetlb_cgroup *h_cgroup, *parent_h_cgroup; + + h_cgroup = kzalloc(sizeof(*h_cgroup), GFP_KERNEL); + if (!h_cgroup) + return ERR_PTR(-ENOMEM); + + parent_cgroup = cgroup->parent; + if (parent_cgroup) { + parent_h_cgroup = hugetlb_cgroup_from_cgroup(parent_cgroup); + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) + res_counter_init(&h_cgroup->hugepage[idx], + &parent_h_cgroup->hugepage[idx]); + } else { + root_h_cgroup = h_cgroup; + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) + res_counter_init(&h_cgroup->hugepage[idx], NULL); + } + return &h_cgroup->css; +} + +static 
void hugetlb_cgroup_destroy(struct cgroup *cgroup) +{ + struct hugetlb_cgroup *h_cgroup; + + h_cgroup = hugetlb_cgroup_from_cgroup(cgroup); + kfree(h_cgroup); +} + +static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup) +{ + /* We will add the cgroup removal support in later patches */ + return -EBUSY; +} + +struct cgroup_subsys hugetlb_subsys = { + .name = "hugetlb", + .create = hugetlb_cgroup_create, + .pre_destroy = hugetlb_cgroup_pre_destroy, + .destroy = hugetlb_cgroup_destroy, + .subsys_id = hugetlb_subsys_id, +}; -- 1.7.10 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753970Ab2FMK2v (ORCPT ); Wed, 13 Jun 2012 06:28:51 -0400 Received: from e28smtp08.in.ibm.com ([122.248.162.8]:38474 "EHLO e28smtp08.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753830Ab2FMK2t (ORCPT ); Wed, 13 Jun 2012 06:28:49 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V9 07/15] hugetlb: add a list for tracking in-use HugeTLB pages Date: Wed, 13 Jun 2012 15:57:26 +0530 Message-Id: <1339583254-895-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12061310-2000-0000-0000-000007E6F9EA Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" hugepage_activelist will be used to track currently used HugeTLB pages. We need to find the in-use HugeTLB pages to support HugeTLB cgroup removal. On cgroup removal we update the page's HugeTLB cgroup to point to parent cgroup. 
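
For illustration only (not part of the patch): a minimal userspace sketch of the active-list idea described above. All names here (struct demo_page, demo_dequeue(), demo_reparent() and so on) are hypothetical. The per-page owner field is an assumption made to keep the sketch short, since in this series the cgroup pointer is stored in page[2].lru.next by a later patch. Locking is left out; in the kernel these list operations run under hugetlb_lock.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }

static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static void list_push(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Like list_move(): safe for any node that was set up with list_init(). */
static void list_move_to(struct list_node *n, struct list_node *head)
{
	list_unlink(n);
	list_push(n, head);
}

struct demo_cgroup { struct demo_cgroup *parent; };

struct demo_page {
	struct list_node lru;
	struct demo_cgroup *owner;	/* assumption: stand-in for page[2].lru.next */
};

struct demo_hstate {
	struct list_node freelist;
	struct list_node activelist;	/* every in-use page sits here */
};

static struct demo_page *page_of(struct list_node *n)
{
	return (struct demo_page *)((char *)n - offsetof(struct demo_page, lru));
}

static void demo_hstate_init(struct demo_hstate *h)
{
	list_init(&h->freelist);
	list_init(&h->activelist);
}

static void demo_page_init(struct demo_page *p)
{
	list_init(&p->lru);
	p->owner = NULL;
}

/* Free path: the page goes (back) to the free list. */
static void demo_enqueue(struct demo_hstate *h, struct demo_page *p)
{
	p->owner = NULL;
	list_move_to(&p->lru, &h->freelist);
}

/* Allocation path: move the page to the active list instead of unlinking it. */
static struct demo_page *demo_dequeue(struct demo_hstate *h,
				      struct demo_cgroup *cg)
{
	struct demo_page *p;

	if (h->freelist.next == &h->freelist)
		return NULL;
	p = page_of(h->freelist.next);
	list_move_to(&p->lru, &h->activelist);
	p->owner = cg;
	return p;
}

/* cgroup removal: walk the in-use pages and hand them to the parent. */
static int demo_reparent(struct demo_hstate *h, struct demo_cgroup *cg)
{
	struct list_node *n;
	int moved = 0;

	for (n = h->activelist.next; n != &h->activelist; n = n->next) {
		struct demo_page *p = page_of(n);

		if (p->owner == cg) {
			p->owner = cg->parent;
			moved++;
		}
	}
	return moved;
}
```

The point of the sketch is that a dequeued page stays reachable through the active list, so a removal path can find every page still charged to the cgroup being removed without scanning all of memory.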
Reviewed-by: KAMEZAWA Hiroyuki Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb.h | 1 + mm/hugetlb.c | 12 +++++++----- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 0f23c18..ed550d8 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -211,6 +211,7 @@ struct hstate { unsigned long resv_huge_pages; unsigned long surplus_huge_pages; unsigned long nr_overcommit_huge_pages; + struct list_head hugepage_activelist; struct list_head hugepage_freelists[MAX_NUMNODES]; unsigned int nr_huge_pages_node[MAX_NUMNODES]; unsigned int free_huge_pages_node[MAX_NUMNODES]; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e54b695..b5b6e15 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -510,7 +510,7 @@ void copy_huge_page(struct page *dst, struct page *src) static void enqueue_huge_page(struct hstate *h, struct page *page) { int nid = page_to_nid(page); - list_add(&page->lru, &h->hugepage_freelists[nid]); + list_move(&page->lru, &h->hugepage_freelists[nid]); h->free_huge_pages++; h->free_huge_pages_node[nid]++; } @@ -522,7 +522,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid) if (list_empty(&h->hugepage_freelists[nid])) return NULL; page = list_entry(h->hugepage_freelists[nid].next, struct page, lru); - list_del(&page->lru); + list_move(&page->lru, &h->hugepage_activelist); set_page_refcounted(page); h->free_huge_pages--; h->free_huge_pages_node[nid]--; @@ -626,10 +626,11 @@ static void free_huge_page(struct page *page) page->mapping = NULL; BUG_ON(page_count(page)); BUG_ON(page_mapcount(page)); - INIT_LIST_HEAD(&page->lru); spin_lock(&hugetlb_lock); if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { + /* remove the page from active list */ + list_del(&page->lru); update_and_free_page(h, page); h->surplus_huge_pages--; h->surplus_huge_pages_node[nid]--; @@ -642,6 +643,7 @@ static void free_huge_page(struct page *page) static void 
prep_new_huge_page(struct hstate *h, struct page *page, int nid) { + INIT_LIST_HEAD(&page->lru); set_compound_page_dtor(page, free_huge_page); spin_lock(&hugetlb_lock); h->nr_huge_pages++; @@ -890,6 +892,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) spin_lock(&hugetlb_lock); if (page) { + INIT_LIST_HEAD(&page->lru); r_nid = page_to_nid(page); set_compound_page_dtor(page, free_huge_page); /* @@ -994,7 +997,6 @@ retry: list_for_each_entry_safe(page, tmp, &surplus_list, lru) { if ((--needed) < 0) break; - list_del(&page->lru); /* * This page is now managed by the hugetlb allocator and has * no users -- drop the buddy allocator's reference. @@ -1009,7 +1011,6 @@ free: /* Free unnecessary surplus pages to the buddy allocator */ if (!list_empty(&surplus_list)) { list_for_each_entry_safe(page, tmp, &surplus_list, lru) { - list_del(&page->lru); put_page(page); } } @@ -1909,6 +1910,7 @@ void __init hugetlb_add_hstate(unsigned order) h->free_huge_pages = 0; for (i = 0; i < MAX_NUMNODES; ++i) INIT_LIST_HEAD(&h->hugepage_freelists[i]); + INIT_LIST_HEAD(&h->hugepage_activelist); h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]); h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", -- 1.7.10 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753839Ab2FMK2Z (ORCPT ); Wed, 13 Jun 2012 06:28:25 -0400 Received: from e28smtp05.in.ibm.com ([122.248.162.5]:52430 "EHLO e28smtp05.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753022Ab2FMK2M (ORCPT ); Wed, 13 Jun 2012 06:28:12 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V9 04/15] hugetlb: use mmu_gather 
instead of a temporary linked list for accumulating pages Date: Wed, 13 Jun 2012 15:57:23 +0530 Message-Id: <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12061310-8256-0000-0000-000002E809E7 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" Use a mmu_gather instead of a temporary linked list for accumulating pages when we unmap a hugepage range Reviewed-by: KAMEZAWA Hiroyuki Signed-off-by: Aneesh Kumar K.V --- fs/hugetlbfs/inode.c | 4 ++-- include/linux/hugetlb.h | 22 ++++++++++++++---- mm/hugetlb.c | 59 ++++++++++++++++++++++++++++------------------- mm/memory.c | 7 ++++-- 4 files changed, 59 insertions(+), 33 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index cc9281b..ff233e4 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -416,8 +416,8 @@ hugetlb_vmtruncate_list(struct prio_tree_root *root, pgoff_t pgoff) else v_offset = 0; - __unmap_hugepage_range(vma, - vma->vm_start + v_offset, vma->vm_end, NULL); + unmap_hugepage_range(vma, vma->vm_start + v_offset, + vma->vm_end, NULL); } } diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 217f528..0f23c18 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -7,6 +7,7 @@ struct ctl_table; struct user_struct; +struct mmu_gather; #ifdef CONFIG_HUGETLB_PAGE @@ -40,9 +41,10 @@ int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int, unsigned int flags); void unmap_hugepage_range(struct vm_area_struct *, - unsigned long, unsigned long, struct page *); -void __unmap_hugepage_range(struct vm_area_struct *, - unsigned long, unsigned long, struct page *); + unsigned long, unsigned long, 
struct page *); +void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma, + unsigned long start, unsigned long end, + struct page *ref_page); int hugetlb_prefault(struct address_space *, struct vm_area_struct *); void hugetlb_report_meminfo(struct seq_file *); int hugetlb_report_node_meminfo(int, char *); @@ -98,7 +100,6 @@ static inline unsigned long hugetlb_total_pages(void) #define follow_huge_addr(mm, addr, write) ERR_PTR(-EINVAL) #define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; }) #define hugetlb_prefault(mapping, vma) ({ BUG(); 0; }) -#define unmap_hugepage_range(vma, start, end, page) BUG() static inline void hugetlb_report_meminfo(struct seq_file *m) { } @@ -112,13 +113,24 @@ static inline void hugetlb_report_meminfo(struct seq_file *m) #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; }) #define hugetlb_fault(mm, vma, addr, flags) ({ BUG(); 0; }) #define huge_pte_offset(mm, address) 0 -#define dequeue_hwpoisoned_huge_page(page) 0 +static inline int dequeue_hwpoisoned_huge_page(struct page *page) +{ + return 0; +} + static inline void copy_huge_page(struct page *dst, struct page *src) { } #define hugetlb_change_protection(vma, address, end, newprot) +static inline void __unmap_hugepage_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, unsigned long start, + unsigned long end, struct page *ref_page) +{ + BUG(); +} + #endif /* !CONFIG_HUGETLB_PAGE */ #define HUGETLB_ANON_FILE "anon_hugepage" diff --git a/mm/hugetlb.c b/mm/hugetlb.c index b1e0ed1..e54b695 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -24,8 +24,9 @@ #include #include -#include +#include +#include #include #include #include "internal.h" @@ -2310,30 +2311,26 @@ static int is_hugetlb_entry_hwpoisoned(pte_t pte) return 0; } -void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, - unsigned long end, struct page *ref_page) +void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma, + 
unsigned long start, unsigned long end, + struct page *ref_page) { + int force_flush = 0; struct mm_struct *mm = vma->vm_mm; unsigned long address; pte_t *ptep; pte_t pte; struct page *page; - struct page *tmp; struct hstate *h = hstate_vma(vma); unsigned long sz = huge_page_size(h); - /* - * A page gathering list, protected by per file i_mmap_mutex. The - * lock is used to avoid list corruption from multiple unmapping - * of the same page since we are using page->lru. - */ - LIST_HEAD(page_list); - WARN_ON(!is_vm_hugetlb_page(vma)); BUG_ON(start & ~huge_page_mask(h)); BUG_ON(end & ~huge_page_mask(h)); + tlb_start_vma(tlb, vma); mmu_notifier_invalidate_range_start(mm, start, end); +again: spin_lock(&mm->page_table_lock); for (address = start; address < end; address += sz) { ptep = huge_pte_offset(mm, address); @@ -2372,30 +2369,45 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, } pte = huge_ptep_get_and_clear(mm, address, ptep); + tlb_remove_tlb_entry(tlb, ptep, address); if (pte_dirty(pte)) set_page_dirty(page); - list_add(&page->lru, &page_list); + page_remove_rmap(page); + force_flush = !__tlb_remove_page(tlb, page); + if (force_flush) + break; /* Bail out after unmapping reference page if supplied */ if (ref_page) break; } - flush_tlb_range(vma, start, end); spin_unlock(&mm->page_table_lock); - mmu_notifier_invalidate_range_end(mm, start, end); - list_for_each_entry_safe(page, tmp, &page_list, lru) { - page_remove_rmap(page); - list_del(&page->lru); - put_page(page); + /* + * mmu_gather ran out of room to batch pages, we break out of + * the PTE lock to avoid doing the potential expensive TLB invalidate + * and page-free while holding it. 
+ */ + if (force_flush) { + force_flush = 0; + tlb_flush_mmu(tlb); + if (address < end && !ref_page) + goto again; } + mmu_notifier_invalidate_range_end(mm, start, end); + tlb_end_vma(tlb, vma); } void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigned long end, struct page *ref_page) { - mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); - __unmap_hugepage_range(vma, start, end, ref_page); - mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex); + struct mm_struct *mm; + struct mmu_gather tlb; + + mm = vma->vm_mm; + + tlb_gather_mmu(&tlb, mm, 0); + __unmap_hugepage_range(&tlb, vma, start, end, ref_page); + tlb_finish_mmu(&tlb, start, end); } /* @@ -2440,9 +2452,8 @@ static int unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma, * from the time of fork. This would look like data corruption */ if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER)) - __unmap_hugepage_range(iter_vma, - address, address + huge_page_size(h), - page); + unmap_hugepage_range(iter_vma, address, + address + huge_page_size(h), page); } mutex_unlock(&mapping->i_mmap_mutex); diff --git a/mm/memory.c b/mm/memory.c index 1b7dc66..545e18a 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb, * Since no pte has actually been setup, it is * safe to do nothing in this case. 
*/ - if (vma->vm_file) - unmap_hugepage_range(vma, start, end, NULL); + if (vma->vm_file) { + mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex); + __unmap_hugepage_range(tlb, vma, start, end, NULL); + mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex); + } } else unmap_page_range(tlb, vma, start, end, details); } -- 1.7.10 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753899Ab2FMK2f (ORCPT ); Wed, 13 Jun 2012 06:28:35 -0400 Received: from e28smtp06.in.ibm.com ([122.248.162.6]:34502 "EHLO e28smtp06.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753855Ab2FMK23 (ORCPT ); Wed, 13 Jun 2012 06:28:29 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V9 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru Date: Wed, 13 Jun 2012 15:57:29 +0530 Message-Id: <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12061310-9574-0000-0000-000003280474 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" Store the hugetlb cgroup pointer in the lru.next field of the third page (page[2]) of the hugepage. This limits the hugetlb cgroup to hugepages made up of 3 or more normal pages. I guess that is an acceptable limitation.
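
For illustration only (not part of the patch): a small userspace sketch of keeping a per-hugepage cgroup pointer in the lru.next slot of the third subpage. The types and names (struct demo_page, demo_set_cgroup(), DEMO_MIN_ORDER) are hypothetical stand-ins. The order field on the head page is an assumption that replaces compound_order(), and the minimum order of 2 mirrors HUGETLB_CGROUP_MIN_ORDER from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Order 2 means 4 subpages, the smallest power of two where index 2 exists. */
#define DEMO_MIN_ORDER 2

struct demo_list { struct demo_list *next, *prev; };

/* Stand-in for struct page; only the head page's order is meaningful. */
struct demo_page {
	unsigned int order;
	struct demo_list lru;
};

struct demo_cgroup { struct demo_cgroup *parent; };

/* Read the owner back from the otherwise unused lru.next of subpage 2. */
static struct demo_cgroup *demo_cgroup_from_page(struct demo_page *head)
{
	if (head->order < DEMO_MIN_ORDER)
		return NULL;
	return (struct demo_cgroup *)head[2].lru.next;
}

/* Record the owner; refuse pages too small to have a third subpage. */
static int demo_set_cgroup(struct demo_page *head, struct demo_cgroup *cg)
{
	if (head->order < DEMO_MIN_ORDER)
		return -1;
	head[2].lru.next = (struct demo_list *)cg;
	return 0;
}
```

Reusing a field of a tail page avoids a separate per-page tracking structure, at the cost of not being able to track hugepages smaller than the minimum order; the setter reports that case with -1 and the getter with NULL.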
Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb_cgroup.h | 37 +++++++++++++++++++++++++++++++++++++ mm/hugetlb.c | 4 ++++ 2 files changed, 41 insertions(+) diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index e9944b4..be1a9f8 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -20,6 +20,32 @@ struct hugetlb_cgroup; #ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR +/* + * Minimum page order trackable by hugetlb cgroup. + * At least 3 pages are necessary for all the tracking information. + */ +#define HUGETLB_CGROUP_MIN_ORDER 2 + +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) +{ + VM_BUG_ON(!PageHuge(page)); + + if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) + return NULL; + return (struct hugetlb_cgroup *)page[2].lru.next; +} + +static inline +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) +{ + VM_BUG_ON(!PageHuge(page)); + + if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) + return -1; + page[2].lru.next = (void *)h_cg; + return 0; +} + static inline bool hugetlb_cgroup_disabled(void) { if (hugetlb_subsys.disabled) @@ -28,6 +54,17 @@ static inline bool hugetlb_cgroup_disabled(void) } #else +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) +{ + return NULL; +} + +static inline +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) +{ + return 0; +} + static inline bool hugetlb_cgroup_disabled(void) { return true; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e899a2d..6a449c5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -28,6 +28,7 @@ #include #include +#include #include #include "internal.h" @@ -591,6 +592,7 @@ static void update_and_free_page(struct hstate *h, struct page *page) 1 << PG_active | 1 << PG_reserved | 1 << PG_private | 1 << PG_writeback); } + VM_BUG_ON(hugetlb_cgroup_from_page(page)); set_compound_page_dtor(page, NULL); set_page_refcounted(page); 
arch_release_hugepage(page); @@ -643,6 +645,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) INIT_LIST_HEAD(&page->lru); set_compound_page_dtor(page, free_huge_page); spin_lock(&hugetlb_lock); + set_hugetlb_cgroup(page, NULL); h->nr_huge_pages++; h->nr_huge_pages_node[nid]++; spin_unlock(&hugetlb_lock); @@ -892,6 +895,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) INIT_LIST_HEAD(&page->lru); r_nid = page_to_nid(page); set_compound_page_dtor(page, free_huge_page); + set_hugetlb_cgroup(page, NULL); /* * We incremented the global counters already */ -- 1.7.10 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753759Ab2FMK3x (ORCPT ); Wed, 13 Jun 2012 06:29:53 -0400 Received: from e28smtp03.in.ibm.com ([122.248.162.3]:57536 "EHLO e28smtp03.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753765Ab2FMK2X (ORCPT ); Wed, 13 Jun 2012 06:28:23 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V9 06/15] hugetlb: simplify migrate_huge_page() Date: Wed, 13 Jun 2012 15:57:25 +0530 Message-Id: <1339583254-895-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12061310-3864-0000-0000-0000034FEFC3 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" Since we migrate only one hugepage, don't use linked list for passing the page around. Directly pass the page that need to be migrated as argument. 
This also removes the use of page->lru in the migrate path. Reviewed-by: KAMEZAWA Hiroyuki Signed-off-by: Aneesh Kumar K.V --- include/linux/migrate.h | 4 +-- mm/memory-failure.c | 13 ++-------- mm/migrate.c | 65 +++++++++++++++-------------------------------- 3 files changed, 25 insertions(+), 57 deletions(-) diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 855c337..ce7e667 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -15,7 +15,7 @@ extern int migrate_page(struct address_space *, extern int migrate_pages(struct list_head *l, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode); -extern int migrate_huge_pages(struct list_head *l, new_page_t x, +extern int migrate_huge_page(struct page *, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode); @@ -36,7 +36,7 @@ static inline void putback_lru_pages(struct list_head *l) {} static inline int migrate_pages(struct list_head *l, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode) { return -ENOSYS; } -static inline int migrate_huge_pages(struct list_head *l, new_page_t x, +static inline int migrate_huge_page(struct page *page, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode) { return -ENOSYS; } diff --git a/mm/memory-failure.c b/mm/memory-failure.c index ab1e714..53a1495 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1414,7 +1414,6 @@ static int soft_offline_huge_page(struct page *page, int flags) int ret; unsigned long pfn = page_to_pfn(page); struct page *hpage = compound_head(page); - LIST_HEAD(pagelist); ret = get_any_page(page, pfn, flags); if (ret < 0) @@ -1429,19 +1428,11 @@ static int soft_offline_huge_page(struct page *page, int flags) } /* Keep page count to indicate a given hugepage is isolated.
*/ - - list_add(&hpage->lru, &pagelist); - ret = migrate_huge_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0, - true); + ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, 0, true); + put_page(hpage); if (ret) { - struct page *page1, *page2; - list_for_each_entry_safe(page1, page2, &pagelist, lru) - put_page(page1); - pr_info("soft offline: %#lx: migration failed %d, type %lx\n", pfn, ret, page->flags); - if (ret > 0) - ret = -EIO; return ret; } done: diff --git a/mm/migrate.c b/mm/migrate.c index be26d5c..fdce3a2 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -932,15 +932,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, if (anon_vma) put_anon_vma(anon_vma); unlock_page(hpage); - out: - if (rc != -EAGAIN) { - list_del(&hpage->lru); - put_page(hpage); - } - put_page(new_hpage); - if (result) { if (rc) *result = rc; @@ -1016,48 +1009,32 @@ out: return nr_failed + retry; } -int migrate_huge_pages(struct list_head *from, - new_page_t get_new_page, unsigned long private, bool offlining, - enum migrate_mode mode) +int migrate_huge_page(struct page *hpage, new_page_t get_new_page, + unsigned long private, bool offlining, + enum migrate_mode mode) { - int retry = 1; - int nr_failed = 0; - int pass = 0; - struct page *page; - struct page *page2; - int rc; - - for (pass = 0; pass < 10 && retry; pass++) { - retry = 0; - - list_for_each_entry_safe(page, page2, from, lru) { + int pass, rc; + + for (pass = 0; pass < 10; pass++) { + rc = unmap_and_move_huge_page(get_new_page, + private, hpage, pass > 2, offlining, + mode); + switch (rc) { + case -ENOMEM: + goto out; + case -EAGAIN: + /* try again */ cond_resched(); - - rc = unmap_and_move_huge_page(get_new_page, - private, page, pass > 2, offlining, - mode); - - switch(rc) { - case -ENOMEM: - goto out; - case -EAGAIN: - retry++; - break; - case 0: - break; - default: - /* Permanent failure */ - nr_failed++; - break; - } + break; + case 0: + goto out; + default: + rc = -EIO; + goto out; } } - rc = 0; out: 
- if (rc) - return rc; - - return nr_failed + retry; + return rc; } #ifdef CONFIG_NUMA -- 1.7.10 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753809Ab2FMK2X (ORCPT ); Wed, 13 Jun 2012 06:28:23 -0400 Received: from e28smtp03.in.ibm.com ([122.248.162.3]:57496 "EHLO e28smtp03.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753513Ab2FMK2M (ORCPT ); Wed, 13 Jun 2012 06:28:12 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V9 02/15] hugetlb: don't use ERR_PTR with VM_FAULT* values Date: Wed, 13 Jun 2012 15:57:21 +0530 Message-Id: <1339583254-895-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12061310-3864-0000-0000-0000034FEF8B Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" The current use of VM_FAULT_* codes with ERR_PTR requires us to ensure VM_FAULT_* values will not exceed MAX_ERRNO value. Decouple the VM_FAULT_* values from MAX_ERRNO. 
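
As a rough userspace illustration of the decoupling described above (not the kernel code itself): the allocator returns a plain errno through an error pointer, and only the fault path turns that errno into a fault code. All sketch_* names and the SKETCH_VM_FAULT_* values are made up for this example; only the 4095 bound mirrors MAX_ERRNO.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative stand-ins. The fault values are NOT the kernel's. */
#define SKETCH_MAX_ERRNO       4095
#define SKETCH_VM_FAULT_OOM    0x0001
#define SKETCH_VM_FAULT_SIGBUS 0x0002

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int sketch_is_err(const void *ptr)
{
	/* Only the top SKETCH_MAX_ERRNO addresses count as errors. */
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/*
 * Hypothetical allocator: reports failure with a negative errno,
 * in the way alloc_huge_page() does once this patch is applied.
 */
static void *sketch_alloc_page(int fail_errno)
{
	static int dummy_page;

	assert(fail_errno >= 0 && fail_errno <= SKETCH_MAX_ERRNO);
	if (fail_errno)
		return sketch_err_ptr(-fail_errno);
	return &dummy_page;
}

/* Fault path: the errno is translated to a fault code only here. */
static int sketch_fault(int fail_errno)
{
	void *page = sketch_alloc_page(fail_errno);

	if (sketch_is_err(page)) {
		long err = sketch_ptr_err(page);

		if (err == -ENOMEM)
			return SKETCH_VM_FAULT_OOM;
		return SKETCH_VM_FAULT_SIGBUS;
	}
	return 0;
}
```

With this split, the fault codes never pass through the pointer encoding, so a fault value above the errno range (say a hypothetical 0x2000 flag, which sketch_is_err() would not recognise) is no longer a concern.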
Acked-by: Hillf Danton Acked-by: KOSAKI Motohiro Reviewed-by: KAMEZAWA Hiroyuki Signed-off-by: Aneesh Kumar K.V --- mm/hugetlb.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c868309..34a7e23 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1123,10 +1123,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, */ chg = vma_needs_reservation(h, vma, addr); if (chg < 0) - return ERR_PTR(-VM_FAULT_OOM); + return ERR_PTR(-ENOMEM); if (chg) if (hugepage_subpool_get_pages(spool, chg)) - return ERR_PTR(-VM_FAULT_SIGBUS); + return ERR_PTR(-ENOSPC); spin_lock(&hugetlb_lock); page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve); @@ -1136,7 +1136,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, page = alloc_buddy_huge_page(h, NUMA_NO_NODE); if (!page) { hugepage_subpool_put_pages(spool, chg); - return ERR_PTR(-VM_FAULT_SIGBUS); + return ERR_PTR(-ENOSPC); } } @@ -2496,6 +2496,7 @@ retry_avoidcopy: new_page = alloc_huge_page(vma, address, outside_reserve); if (IS_ERR(new_page)) { + long err = PTR_ERR(new_page); page_cache_release(old_page); /* @@ -2524,7 +2525,10 @@ retry_avoidcopy: /* Caller expects lock to be held */ spin_lock(&mm->page_table_lock); - return -PTR_ERR(new_page); + if (err == -ENOMEM) + return VM_FAULT_OOM; + else + return VM_FAULT_SIGBUS; } /* @@ -2642,7 +2646,11 @@ retry: goto out; page = alloc_huge_page(vma, address, 0); if (IS_ERR(page)) { - ret = -PTR_ERR(page); + ret = PTR_ERR(page); + if (ret == -ENOMEM) + ret = VM_FAULT_OOM; + else + ret = VM_FAULT_SIGBUS; goto out; } clear_huge_page(page, address, pages_per_huge_page(h)); -- 1.7.10 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753880Ab2FMK2d (ORCPT ); Wed, 13 Jun 2012 06:28:33 -0400 Received: from e28smtp05.in.ibm.com ([122.248.162.5]:52541 "EHLO e28smtp05.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org 
with ESMTP id S1753856Ab2FMK2a (ORCPT ); Wed, 13 Jun 2012 06:28:30 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup Date: Wed, 13 Jun 2012 15:57:30 +0530 Message-Id: <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12061310-8256-0000-0000-000002E80A27 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" This patch adds the charge and uncharge routines for the hugetlb cgroup. We charge the cgroup at page allocation time and uncharge it in the compound page destructor. Assigning a page's hugetlb cgroup is protected by hugetlb_lock.
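
The charge, commit and uncharge ordering can be modelled in userspace roughly as below. This is only a sketch under simplifying assumptions: the counter works in pages rather than bytes, there is no hierarchy, and locking (hugetlb_lock in the patch) is left out. Every sketch_* name is hypothetical and stands in for res_counter and the hugetlb_cgroup_* helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a res_counter. */
struct sketch_counter {
	unsigned long usage;
	unsigned long limit;
	unsigned long failcnt;
};

/* Stand-in for a huge page that remembers which counter it was charged to. */
struct sketch_page {
	struct sketch_counter *owner;
};

/* Step 1: charge the counter before any page has been obtained. */
static int sketch_charge(struct sketch_counter *c, unsigned long nr)
{
	if (c->usage + nr > c->limit) {
		c->failcnt++;
		return -1;
	}
	c->usage += nr;
	return 0;
}

/* Roll back a charge that never got attached to a page. */
static void sketch_uncharge(struct sketch_counter *c, unsigned long nr)
{
	assert(c->usage >= nr);
	c->usage -= nr;
}

/* Step 2: once a page exists, record the owner on the page. */
static void sketch_commit(struct sketch_page *p, struct sketch_counter *c)
{
	p->owner = c;
}

/* Free path: find the owner through the page and drop the charge. */
static void sketch_uncharge_page(struct sketch_page *p, unsigned long nr)
{
	struct sketch_counter *c = p->owner;

	if (!c)
		return;
	p->owner = NULL;
	sketch_uncharge(c, nr);
}

/* Allocation path; backing_ok == 0 simulates the page allocator failing. */
static int sketch_alloc_huge(struct sketch_counter *c, struct sketch_page *p,
			     unsigned long nr, int backing_ok)
{
	if (sketch_charge(c, nr))
		return -1;
	if (!backing_ok) {
		sketch_uncharge(c, nr);
		return -1;
	}
	sketch_commit(p, c);
	return 0;
}
```

Charging before the page exists means a task over its limit fails early, and the rollback keeps the counter balanced when the charge succeeds but no page can be found.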
Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb_cgroup.h | 38 +++++++++++++++++++ mm/hugetlb.c | 16 +++++++- mm/hugetlb_cgroup.c | 80 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 133 insertions(+), 1 deletion(-) diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index be1a9f8..e05871c 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -53,6 +53,16 @@ static inline bool hugetlb_cgroup_disabled(void) return false; } +extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup **ptr); +extern void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg, + struct page *page); +extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, + struct page *page); +extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg); + #else static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) { @@ -70,5 +80,33 @@ static inline bool hugetlb_cgroup_disabled(void) return true; } +static inline int +hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup **ptr) +{ + return 0; +} + +static inline void +hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg, + struct page *page) +{ + return; +} + +static inline void +hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, struct page *page) +{ + return; +} + +static inline void +hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg) +{ + return; +} + #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ #endif diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6a449c5..59720b1 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -627,6 +627,8 @@ static void free_huge_page(struct page *page) BUG_ON(page_mapcount(page)); spin_lock(&hugetlb_lock); + hugetlb_cgroup_uncharge_page(hstate_index(h), + 
pages_per_huge_page(h), page); if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) { /* remove the page from active list */ list_del(&page->lru); @@ -1115,7 +1117,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, struct hstate *h = hstate_vma(vma); struct page *page; long chg; + int ret, idx; + struct hugetlb_cgroup *h_cg; + idx = hstate_index(h); /* * Processes that did not create the mapping will have no * reserves and will not have accounted against subpool @@ -1131,6 +1136,11 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, if (hugepage_subpool_get_pages(spool, chg)) return ERR_PTR(-ENOSPC); + ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg); + if (ret) { + hugepage_subpool_put_pages(spool, chg); + return ERR_PTR(-ENOSPC); + } spin_lock(&hugetlb_lock); page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve); spin_unlock(&hugetlb_lock); @@ -1138,6 +1148,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, if (!page) { page = alloc_buddy_huge_page(h, NUMA_NO_NODE); if (!page) { + hugetlb_cgroup_uncharge_cgroup(idx, + pages_per_huge_page(h), + h_cg); hugepage_subpool_put_pages(spool, chg); return ERR_PTR(-ENOSPC); } @@ -1146,7 +1159,8 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, set_page_private(page, (unsigned long)spool); vma_commit_reservation(h, vma, addr); - + /* update page cgroup details */ + hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); return page; } diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 5a4e71c..0f2f6ac 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -113,6 +113,86 @@ static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup) return -EBUSY; } +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup **ptr) +{ + int ret = 0; + struct res_counter *fail_res; + struct hugetlb_cgroup *h_cg = NULL; + unsigned long csize = nr_pages * PAGE_SIZE; + + 
if (hugetlb_cgroup_disabled()) + goto done; + /* + * We don't charge any cgroup if the compound page have less + * than 3 pages. + */ + if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER) + goto done; +again: + rcu_read_lock(); + h_cg = hugetlb_cgroup_from_task(current); + if (!h_cg) + h_cg = root_h_cgroup; + + if (!css_tryget(&h_cg->css)) { + rcu_read_unlock(); + goto again; + } + rcu_read_unlock(); + + ret = res_counter_charge(&h_cg->hugepage[idx], csize, &fail_res); + css_put(&h_cg->css); +done: + *ptr = h_cg; + return ret; +} + +void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg, + struct page *page) +{ + if (hugetlb_cgroup_disabled() || !h_cg) + return; + + spin_lock(&hugetlb_lock); + set_hugetlb_cgroup(page, h_cg); + spin_unlock(&hugetlb_lock); + return; +} + +/* + * Should be called with hugetlb_lock held + */ +void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, + struct page *page) +{ + struct hugetlb_cgroup *h_cg; + unsigned long csize = nr_pages * PAGE_SIZE; + + if (hugetlb_cgroup_disabled()) + return; + VM_BUG_ON(!spin_is_locked(&hugetlb_lock)); + h_cg = hugetlb_cgroup_from_page(page); + if (unlikely(!h_cg)) + return; + set_hugetlb_cgroup(page, NULL); + res_counter_uncharge(&h_cg->hugepage[idx], csize); + return; +} + +void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup *h_cg) +{ + unsigned long csize = nr_pages * PAGE_SIZE; + + if (hugetlb_cgroup_disabled() || !h_cg) + return; + + res_counter_uncharge(&h_cg->hugepage[idx], csize); + return; +} + struct cgroup_subsys hugetlb_subsys = { .name = "hugetlb", .create = hugetlb_cgroup_create, -- 1.7.10 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753865Ab2FMKaQ (ORCPT ); Wed, 13 Jun 2012 06:30:16 -0400 Received: from e28smtp04.in.ibm.com ([122.248.162.4]:51501 "EHLO e28smtp04.in.ibm.com" rhost-flags-OK-OK-OK-OK) by 
vger.kernel.org with ESMTP id S1753548Ab2FMK2W (ORCPT ); Wed, 13 Jun 2012 06:28:22 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V9 13/15] hugetlb/cgroup: add hugetlb cgroup control files Date: Wed, 13 Jun 2012 15:57:32 +0530 Message-Id: <1339583254-895-14-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12061310-5564-0000-0000-0000032F7646 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" Add the control files for hugetlb controller Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb.h | 5 ++ include/linux/hugetlb_cgroup.h | 6 ++ mm/hugetlb.c | 8 +++ mm/hugetlb_cgroup.c | 129 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 148 insertions(+) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 4aca057..9650bb1 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -4,6 +4,7 @@ #include #include #include +#include struct ctl_table; struct user_struct; @@ -221,6 +222,10 @@ struct hstate { unsigned int nr_huge_pages_node[MAX_NUMNODES]; unsigned int free_huge_pages_node[MAX_NUMNODES]; unsigned int surplus_huge_pages_node[MAX_NUMNODES]; +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR + /* cgroup control files */ + struct cftype cgroup_files[5]; +#endif char name[HSTATE_NAME_LEN]; }; diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index e05871c..bd8bc98 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -62,6 +62,7 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned 
long nr_pages, struct page *page); extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg); +extern int hugetlb_cgroup_file_init(int idx) __init; #else static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) @@ -108,5 +109,10 @@ hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, return; } +static inline int __init hugetlb_cgroup_file_init(int idx) +{ + return 0; +} + #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ #endif diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 59720b1..a5a30bf 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -30,6 +30,7 @@ #include #include #include +#include #include "internal.h" const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; @@ -1930,6 +1931,13 @@ void __init hugetlb_add_hstate(unsigned order) h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", huge_page_size(h)/1024); + /* + * Add cgroup control files only if the huge page consists + * of more than two normal pages. This is because we use + * page[2].lru.next for storing cgroup details.
+ */ + if (order >= HUGETLB_CGROUP_MIN_ORDER) + hugetlb_cgroup_file_init(hugetlb_max_hstate - 1); parsed_hstate = h; } diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index a3a68a4..64e93e0 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -26,6 +26,10 @@ struct hugetlb_cgroup { struct res_counter hugepage[HUGE_MAX_HSTATE]; }; +#define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val)) +#define MEMFILE_IDX(val) (((val) >> 16) & 0xffff) +#define MEMFILE_ATTR(val) ((val) & 0xffff) + struct cgroup_subsys hugetlb_subsys __read_mostly; struct hugetlb_cgroup *root_h_cgroup __read_mostly; @@ -259,6 +263,131 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, return; } +static ssize_t hugetlb_cgroup_read(struct cgroup *cgroup, struct cftype *cft, + struct file *file, char __user *buf, + size_t nbytes, loff_t *ppos) +{ + u64 val; + char str[64]; + int idx, name, len; + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); + + idx = MEMFILE_IDX(cft->private); + name = MEMFILE_ATTR(cft->private); + + val = res_counter_read_u64(&h_cg->hugepage[idx], name); + len = scnprintf(str, sizeof(str), "%llu\n", (unsigned long long)val); + return simple_read_from_buffer(buf, nbytes, ppos, str, len); +} + +static int hugetlb_cgroup_write(struct cgroup *cgroup, struct cftype *cft, + const char *buffer) +{ + int idx, name, ret; + unsigned long long val; + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); + + idx = MEMFILE_IDX(cft->private); + name = MEMFILE_ATTR(cft->private); + + switch (name) { + case RES_LIMIT: + if (hugetlb_cgroup_is_root(h_cg)) { + /* Can't set limit on root */ + ret = -EINVAL; + break; + } + /* This function does all necessary parse...reuse it */ + ret = res_counter_memparse_write_strategy(buffer, &val); + if (ret) + break; + ret = res_counter_set_limit(&h_cg->hugepage[idx], val); + break; + default: + ret = -EINVAL; + break; + } + return ret; +} + +static int hugetlb_cgroup_reset(struct cgroup *cgroup, 
unsigned int event) +{ + int idx, name, ret = 0; + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); + + idx = MEMFILE_IDX(event); + name = MEMFILE_ATTR(event); + + switch (name) { + case RES_MAX_USAGE: + res_counter_reset_max(&h_cg->hugepage[idx]); + break; + case RES_FAILCNT: + res_counter_reset_failcnt(&h_cg->hugepage[idx]); + break; + default: + ret = -EINVAL; + break; + } + return ret; +} + +static char *mem_fmt(char *buf, int size, unsigned long hsize) +{ + if (hsize >= (1UL << 30)) + snprintf(buf, size, "%luGB", hsize >> 30); + else if (hsize >= (1UL << 20)) + snprintf(buf, size, "%luMB", hsize >> 20); + else + snprintf(buf, size, "%luKB", hsize >> 10); + return buf; +} + +int __init hugetlb_cgroup_file_init(int idx) +{ + char buf[32]; + struct cftype *cft; + struct hstate *h = &hstates[idx]; + + /* format the size */ + mem_fmt(buf, 32, huge_page_size(h)); + + /* Add the limit file */ + cft = &h->cgroup_files[0]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.limit_in_bytes", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_LIMIT); + cft->read = hugetlb_cgroup_read; + cft->write_string = hugetlb_cgroup_write; + + /* Add the usage file */ + cft = &h->cgroup_files[1]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.usage_in_bytes", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_USAGE); + cft->read = hugetlb_cgroup_read; + + /* Add the MAX usage file */ + cft = &h->cgroup_files[2]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.max_usage_in_bytes", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_MAX_USAGE); + cft->trigger = hugetlb_cgroup_reset; + cft->read = hugetlb_cgroup_read; + + /* Add the failcntfile */ + cft = &h->cgroup_files[3]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.failcnt", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_FAILCNT); + cft->trigger = hugetlb_cgroup_reset; + cft->read = hugetlb_cgroup_read; + + /* NULL terminate the last cft */ + cft = &h->cgroup_files[4]; + memset(cft, 0, sizeof(*cft)); + + 
WARN_ON(cgroup_add_cftypes(&hugetlb_subsys, h->cgroup_files)); + + return 0; +} + struct cgroup_subsys hugetlb_subsys = { .name = "hugetlb", .create = hugetlb_cgroup_create, -- 1.7.10 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753930Ab2FMKak (ORCPT ); Wed, 13 Jun 2012 06:30:40 -0400 Received: from e28smtp04.in.ibm.com ([122.248.162.4]:51500 "EHLO e28smtp04.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753743Ab2FMK2W (ORCPT ); Wed, 13 Jun 2012 06:28:22 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V9 15/15] hugetlb/cgroup: add HugeTLB controller documentation Date: Wed, 13 Jun 2012 15:57:34 +0530 Message-Id: <1339583254-895-16-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12061310-5564-0000-0000-0000032F7643 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" Reviewed-by: KAMEZAWA Hiroyuki Signed-off-by: Aneesh Kumar K.V --- Documentation/cgroups/hugetlb.txt | 45 +++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) create mode 100644 Documentation/cgroups/hugetlb.txt diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroups/hugetlb.txt new file mode 100644 index 0000000..a9faaca --- /dev/null +++ b/Documentation/cgroups/hugetlb.txt @@ -0,0 +1,45 @@ +HugeTLB Controller +------------------- + +The HugeTLB controller allows to limit the HugeTLB usage per control group and +enforces the controller limit during page fault. 
Since HugeTLB doesn't +support page reclaim, enforcing the limit at page fault time implies that +the application will get a SIGBUS signal if it tries to access HugeTLB pages +beyond its limit. This requires the application to know beforehand how many +HugeTLB pages it would require for its use. + +HugeTLB controller can be created by first mounting the cgroup filesystem. + +# mount -t cgroup -o hugetlb none /sys/fs/cgroup + +With the above step, the initial or the parent HugeTLB group becomes +visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in +the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup. + +New groups can be created under the parent group /sys/fs/cgroup. + +# cd /sys/fs/cgroup +# mkdir g1 +# echo $$ > g1/tasks + +The above steps create a new group g1 and move the current shell +process (bash) into it. + +Brief summary of control files + + hugetlb.<hugepagesize>.limit_in_bytes # set/show limit of "hugepagesize" hugetlb usage + hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded + hugetlb.<hugepagesize>.usage_in_bytes # show current res_counter usage for "hugepagesize" hugetlb + hugetlb.<hugepagesize>.failcnt # show the number of allocation failures due to HugeTLB limit + +For a system supporting two hugepage sizes (16M and 16G) the control +files include: + +hugetlb.16GB.limit_in_bytes +hugetlb.16GB.max_usage_in_bytes +hugetlb.16GB.usage_in_bytes +hugetlb.16GB.failcnt +hugetlb.16MB.limit_in_bytes +hugetlb.16MB.max_usage_in_bytes +hugetlb.16MB.usage_in_bytes +hugetlb.16MB.failcnt -- 1.7.10 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753740Ab2FMK2V (ORCPT ); Wed, 13 Jun 2012 06:28:21 -0400 Received: from e28smtp01.in.ibm.com ([122.248.162.1]:54199 "EHLO e28smtp01.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753548Ab2FMK2O (ORCPT ); Wed, 13 Jun 2012 06:28:14 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org,
kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V9 03/15] hugetlb: add an inline helper for finding hstate index Date: Wed, 13 Jun 2012 15:57:22 +0530 Message-Id: <1339583254-895-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12061310-4790-0000-0000-000003370E18 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" Add an inline helper and use it in the code. Acked-by: David Rientjes Acked-by: Michal Hocko Reviewed-by: KAMEZAWA Hiroyuki Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb.h | 6 ++++++ mm/hugetlb.c | 20 +++++++++++--------- 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index d5d6bbe..217f528 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -302,6 +302,11 @@ static inline unsigned hstate_index_to_shift(unsigned index) return hstates[index].order + PAGE_SHIFT; } +static inline int hstate_index(struct hstate *h) +{ + return h - hstates; +} + #else struct hstate {}; #define alloc_huge_page_node(h, nid) NULL @@ -320,6 +325,7 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) return 1; } #define hstate_index_to_shift(index) 0 +#define hstate_index(h) 0 #endif #endif /* _LINUX_HUGETLB_H */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 34a7e23..b1e0ed1 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1646,7 +1646,7 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent, struct attribute_group *hstate_attr_group) { int retval; - int hi = h - hstates; + int hi = hstate_index(h); 
hstate_kobjs[hi] = kobject_create_and_add(h->name, parent); if (!hstate_kobjs[hi]) @@ -1741,11 +1741,13 @@ void hugetlb_unregister_node(struct node *node) if (!nhs->hugepages_kobj) return; /* no hstate attributes */ - for_each_hstate(h) - if (nhs->hstate_kobjs[h - hstates]) { - kobject_put(nhs->hstate_kobjs[h - hstates]); - nhs->hstate_kobjs[h - hstates] = NULL; + for_each_hstate(h) { + int idx = hstate_index(h); + if (nhs->hstate_kobjs[idx]) { + kobject_put(nhs->hstate_kobjs[idx]); + nhs->hstate_kobjs[idx] = NULL; } + } kobject_put(nhs->hugepages_kobj); nhs->hugepages_kobj = NULL; @@ -1848,7 +1850,7 @@ static void __exit hugetlb_exit(void) hugetlb_unregister_all_nodes(); for_each_hstate(h) { - kobject_put(hstate_kobjs[h - hstates]); + kobject_put(hstate_kobjs[hstate_index(h)]); } kobject_put(hugepages_kobj); @@ -1869,7 +1871,7 @@ static int __init hugetlb_init(void) if (!size_to_hstate(default_hstate_size)) hugetlb_add_hstate(HUGETLB_PAGE_ORDER); } - default_hstate_idx = size_to_hstate(default_hstate_size) - hstates; + default_hstate_idx = hstate_index(size_to_hstate(default_hstate_size)); if (default_hstate_max_huge_pages) default_hstate.max_huge_pages = default_hstate_max_huge_pages; @@ -2687,7 +2689,7 @@ retry: */ if (unlikely(PageHWPoison(page))) { ret = VM_FAULT_HWPOISON | - VM_FAULT_SET_HINDEX(h - hstates); + VM_FAULT_SET_HINDEX(hstate_index(h)); goto backout_unlocked; } } @@ -2760,7 +2762,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, return 0; } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) return VM_FAULT_HWPOISON_LARGE | - VM_FAULT_SET_HINDEX(h - hstates); + VM_FAULT_SET_HINDEX(hstate_index(h)); } ptep = huge_pte_alloc(mm, address, huge_page_size(h)); -- 1.7.10 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753936Ab2FMKa5 (ORCPT ); Wed, 13 Jun 2012 06:30:57 -0400 Received: from e28smtp02.in.ibm.com ([122.248.162.2]:58092 "EHLO 
e28smtp02.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753546Ab2FMK2U (ORCPT ); Wed, 13 Jun 2012 06:28:20 -0400 From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH -V9 12/15] hugetlb/cgroup: Add support for cgroup removal Date: Wed, 13 Jun 2012 15:57:31 +0530 Message-Id: <1339583254-895-13-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> x-cbid: 12061310-5816-0000-0000-0000032119A7 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Aneesh Kumar K.V" This patch adds support for cgroup removal. If we don't have a parent cgroup, the charges are moved to the root cgroup. Signed-off-by: Aneesh Kumar K.V --- mm/hugetlb_cgroup.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 68 insertions(+), 2 deletions(-) diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 0f2f6ac..a3a68a4 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -107,10 +107,76 @@ static void hugetlb_cgroup_destroy(struct cgroup *cgroup) kfree(h_cgroup); } + +/* + * Should be called with hugetlb_lock held. + * Since we are holding hugetlb_lock, pages cannot get moved from + * active list or uncharged from the cgroup, So no need to get + * page reference and test for page active here. This function + * cannot fail.
+ */ +static void hugetlb_cgroup_move_parent(int idx, struct cgroup *cgroup, + struct page *page) +{ + int csize; + struct res_counter *counter; + struct res_counter *fail_res; + struct hugetlb_cgroup *page_hcg; + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); + struct hugetlb_cgroup *parent = parent_hugetlb_cgroup(cgroup); + + page_hcg = hugetlb_cgroup_from_page(page); + /* + * We can have pages in active list without any cgroup + * ie, hugepage with less than 3 pages. We can safely + * ignore those pages. + */ + if (!page_hcg || page_hcg != h_cg) + goto out; + + csize = PAGE_SIZE << compound_order(page); + if (!parent) { + parent = root_h_cgroup; + /* root has no limit */ + res_counter_charge_nofail(&parent->hugepage[idx], + csize, &fail_res); + } + counter = &h_cg->hugepage[idx]; + res_counter_uncharge_until(counter, counter->parent, csize); + + set_hugetlb_cgroup(page, parent); +out: + return; +} + +/* + * Force the hugetlb cgroup to empty the hugetlb resources by moving them to + * the parent cgroup. 
+ */
 static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
 {
-	/* We will add the cgroup removal support in later patches */
-	return -EBUSY;
+	struct hstate *h;
+	struct page *page;
+	int ret = 0, idx = 0;
+
+	do {
+		if (cgroup_task_count(cgroup) ||
+		    !list_empty(&cgroup->children)) {
+			ret = -EBUSY;
+			goto out;
+		}
+		for_each_hstate(h) {
+			spin_lock(&hugetlb_lock);
+			list_for_each_entry(page, &h->hugepage_activelist, lru)
+				hugetlb_cgroup_move_parent(idx, cgroup, page);
+
+			spin_unlock(&hugetlb_lock);
+			idx++;
+		}
+		cond_resched();
+	} while (hugetlb_cgroup_have_usage(cgroup));
+out:
+	return ret;
 }
 
 int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
-- 
1.7.10

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753902Ab2FMKbP (ORCPT );
	Wed, 13 Jun 2012 06:31:15 -0400
Received: from e28smtp01.in.ibm.com ([122.248.162.1]:54239 "EHLO
	e28smtp01.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753700Ab2FMK2U (ORCPT );
	Wed, 13 Jun 2012 06:28:20 -0400
From: "Aneesh Kumar K.V"
To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com,
	rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org,
	hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	"Aneesh Kumar K.V"
Subject: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from
	oldpage to new page during migration
Date: Wed, 13 Jun 2012 15:57:33 +0530
Message-Id: <1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.10
In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
x-cbid: 12061310-4790-0000-0000-000003370E38
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: "Aneesh Kumar K.V"

With HugeTLB pages, hugetlb cgroup is uncharged in compound page
destructor. Since we are holding a hugepage reference, we can be sure
that old page won't get uncharged till the last put_page().

Signed-off-by: Aneesh Kumar K.V
---
 include/linux/hugetlb_cgroup.h | 8 ++++++++
 mm/hugetlb_cgroup.c | 20 ++++++++++++++++++++
 mm/migrate.c | 5 +++++
 3 files changed, 33 insertions(+)

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index bd8bc98..e9e6d74 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -63,6 +63,8 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
 extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 					   struct hugetlb_cgroup *h_cg);
 extern int hugetlb_cgroup_file_init(int idx) __init;
+extern void hugetlb_cgroup_migrate(struct page *oldhpage,
+				   struct page *newhpage);
 
 #else
 static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
@@ -114,5 +116,11 @@ static inline int __init hugetlb_cgroup_file_init(int idx)
 	return 0;
 }
 
+static inline void hugetlb_cgroup_migrate(struct page *oldhpage,
+					  struct page *newhpage)
+{
+	return;
+}
+
 #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 64e93e0..8e7ca0a 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -388,6 +388,26 @@ int __init hugetlb_cgroup_file_init(int idx)
 	return 0;
 }
 
+void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
+{
+	struct hugetlb_cgroup *h_cg;
+
+	if (hugetlb_cgroup_disabled())
+		return;
+
+	VM_BUG_ON(!PageHuge(oldhpage));
+	spin_lock(&hugetlb_lock);
+	h_cg = hugetlb_cgroup_from_page(oldhpage);
+	set_hugetlb_cgroup(oldhpage, NULL);
+	cgroup_exclude_rmdir(&h_cg->css);
+
+	/* move the h_cg details to new cgroup */
+	set_hugetlb_cgroup(newhpage, h_cg);
+	spin_unlock(&hugetlb_lock);
+	cgroup_release_and_wakeup_rmdir(&h_cg->css);
+	return;
+}
+
 struct cgroup_subsys hugetlb_subsys = {
 	.name = "hugetlb",
 	.create = hugetlb_cgroup_create,
diff --git a/mm/migrate.c b/mm/migrate.c
index fdce3a2..6c37c51 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -931,6 +932,10 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 
 	if (anon_vma)
 		put_anon_vma(anon_vma);
+
+	if (!rc)
+		hugetlb_cgroup_migrate(hpage, new_hpage);
+
 	unlock_page(hpage);
 out:
 	put_page(new_hpage);
-- 
1.7.10

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753709Ab2FMK2T (ORCPT );
	Wed, 13 Jun 2012 06:28:19 -0400
Received: from e28smtp02.in.ibm.com ([122.248.162.2]:58037 "EHLO
	e28smtp02.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753546Ab2FMK2O (ORCPT );
	Wed, 13 Jun 2012 06:28:14 -0400
From: "Aneesh Kumar K.V"
To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com,
	rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org,
	hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	"Aneesh Kumar K.V"
Subject: [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in
	unmap_single_vma() for hugetlb
Date: Wed, 13 Jun 2012 15:57:24 +0530
Message-Id: <1339583254-895-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.10
In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
x-cbid: 12061310-5816-0000-0000-00000321198F
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: "Aneesh Kumar K.V"

i_mmap_mutex lock was added in unmap_single_vma by 502717f4e ("hugetlb:
fix linked list corruption in unmap_hugepage_range()") but we don't use
page->lru in unmap_hugepage_range any more. Also the lock was taken
higher up in the stack in some code path. That would result in deadlock.

unmap_mapping_range (i_mmap_mutex)
 -> unmap_mapping_range_tree
    -> unmap_mapping_range_vma
       -> zap_page_range_single
          -> unmap_single_vma
             -> unmap_hugepage_range (i_mmap_mutex)

For shared pagetable support for huge pages, since pagetable pages are ref
counted we don't need any lock during huge_pmd_unshare. We do take
i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in mapping.
(39dde65c9940c97f ("shared page table for hugetlb page")).

Signed-off-by: Aneesh Kumar K.V
---
 mm/memory.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 545e18a..f6bc04f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1326,11 +1326,8 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * Since no pte has actually been setup, it is
 			 * safe to do nothing in this case.
 			 */
-			if (vma->vm_file) {
-				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
+			if (vma->vm_file)
 				__unmap_hugepage_range(tlb, vma, start, end, NULL);
-				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
-			}
 		} else
 			unmap_page_range(tlb, vma, start, end, details);
 	}
-- 
1.7.10

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753641Ab2FMK2Q (ORCPT );
	Wed, 13 Jun 2012 06:28:16 -0400
Received: from e28smtp06.in.ibm.com ([122.248.162.6]:34423 "EHLO
	e28smtp06.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753531Ab2FMK2M (ORCPT );
	Wed, 13 Jun 2012 06:28:12 -0400
From: "Aneesh Kumar K.V"
To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com,
	rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org,
	hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	"Aneesh Kumar K.V"
Subject: [PATCH -V9 01/15] hugetlb: rename max_hstate to hugetlb_max_hstate
Date: Wed, 13 Jun 2012 15:57:20 +0530
Message-Id: <1339583254-895-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.10
In-Reply-To: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
x-cbid: 12061310-9574-0000-0000-00000328041E
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: "Aneesh Kumar K.V"

Rename max_hstate to hugetlb_max_hstate. We will be using this from other
subsystems like hugetlb controller in later patches.

Acked-by: David Rientjes
Reviewed-by: KAMEZAWA Hiroyuki
Acked-by: Hillf Danton
Acked-by: Michal Hocko
Signed-off-by: Aneesh Kumar K.V
---
 mm/hugetlb.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e198831..c868309 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
 static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
 unsigned long hugepages_treat_as_movable;
 
-static int max_hstate;
+static int hugetlb_max_hstate;
 unsigned int default_hstate_idx;
 struct hstate hstates[HUGE_MAX_HSTATE];
 
@@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
 static unsigned long __initdata default_hstate_size;
 
 #define for_each_hstate(h) \
-	for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
+	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
 
 /*
  * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
@@ -1897,9 +1897,9 @@ void __init hugetlb_add_hstate(unsigned order)
 		printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
 		return;
 	}
-	BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
+	BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
 	BUG_ON(order == 0);
-	h = &hstates[max_hstate++];
+	h = &hstates[hugetlb_max_hstate++];
 	h->order = order;
 	h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
 	h->nr_huge_pages = 0;
@@ -1920,10 +1920,10 @@ static int __init hugetlb_nrpages_setup(char *s)
 	static unsigned long *last_mhp;
 
 	/*
-	 * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
+	 * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet,
 	 * so this hugepages= parameter goes to the "default hstate".
 	 */
-	if (!max_hstate)
+	if (!hugetlb_max_hstate)
 		mhp = &default_hstate_max_huge_pages;
 	else
 		mhp = &parsed_hstate->max_huge_pages;
@@ -1942,7 +1942,7 @@ static int __init hugetlb_nrpages_setup(char *s)
 	 * But we need to allocate >= MAX_ORDER hstates here early to still
 	 * use the bootmem allocator.
 	 */
-	if (max_hstate && parsed_hstate->order >= MAX_ORDER)
+	if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER)
 		hugetlb_hstate_alloc_pages(parsed_hstate);
 
 	last_mhp = mhp;
-- 
1.7.10

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753566Ab2FMLc5 (ORCPT );
	Wed, 13 Jun 2012 07:32:57 -0400
Received: from e23smtp01.au.ibm.com ([202.81.31.143]:49115 "EHLO
	e23smtp01.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752807Ab2FMLcz (ORCPT );
	Wed, 13 Jun 2012 07:32:55 -0400
From: "Aneesh Kumar K.V"
To: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com,
	rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org,
	hannes@cmpxchg.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
In-Reply-To: <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
User-Agent: Notmuch/0.13.2+35~g0ff57e7 (http://notmuchmail.org)
	Emacs/24.1.50.1 (x86_64-unknown-linux-gnu)
Date: Wed, 13 Jun 2012 17:02:47 +0530
Message-ID: <8762avo3a8.fsf@skywalker.in.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain
x-cbid: 12061301-1618-0000-0000-000001D26659
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org


Need this patch for hugetlb cgroup
disabled. I will send an updated patch in reply.

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index e9e6d74..bc30413 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -18,14 +18,14 @@
 #include
 
 struct hugetlb_cgroup;
-
-#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
 /*
  * Minimum page order trackable by hugetlb cgroup.
  * At least 3 pages are necessary for all the tracking information.
  */
 #define HUGETLB_CGROUP_MIN_ORDER 2
 
+#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+
 static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
 {
 	VM_BUG_ON(!PageHuge(page));

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752774Ab2FMO7a (ORCPT );
	Wed, 13 Jun 2012 10:59:30 -0400
Received: from cantor2.suse.de ([195.135.220.15]:56857 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751557Ab2FMO72 (ORCPT );
	Wed, 13 Jun 2012 10:59:28 -0400
Date: Wed, 13 Jun 2012 16:59:23 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com,
	rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a
	temporary linked list for accumulating pages
Message-ID: <20120613145923.GA14777@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> Use a mmu_gather instead of a temporary linked list for accumulating
> pages when we unmap a hugepage range

Sorry for coming up with the comment that late but you owe us an
explanation _why_ you are doing this.

I assume that this fixes a real problem when we take i_mmap_mutex
already up in
unmap_mapping_range
  mutex_lock(&mapping->i_mmap_mutex);
  unmap_mapping_range_tree | unmap_mapping_range_list
    unmap_mapping_range_vma
      zap_page_range_single
        unmap_single_vma
          unmap_hugepage_range
            mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);

And that this should have been marked for stable as well (I haven't
checked when this has been introduced).

But then I do not see how this help when you still do this:
[...]
> diff --git a/mm/memory.c b/mm/memory.c
> index 1b7dc66..545e18a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
>  			 * Since no pte has actually been setup, it is
>  			 * safe to do nothing in this case.
>  			 */
> -			if (vma->vm_file)
> -				unmap_hugepage_range(vma, start, end, NULL);
> +			if (vma->vm_file) {
> +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> +			}
>  		} else
>  			unmap_page_range(tlb, vma, start, end, details);
>  	}

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753400Ab2FMPDm (ORCPT );
	Wed, 13 Jun 2012 11:03:42 -0400
Received: from cantor2.suse.de ([195.135.220.15]:57142 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751802Ab2FMPDk (ORCPT );
	Wed, 13 Jun 2012 11:03:40 -0400
Date: Wed, 13 Jun 2012 17:03:38 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com,
	rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a
	temporary linked list for accumulating pages
Message-ID: <20120613150338.GB14777@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<20120613145923.GA14777@tiehlicka.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120613145923.GA14777@tiehlicka.suse.cz>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 13-06-12 16:59:23, Michal Hocko wrote:
> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> > From: "Aneesh Kumar K.V"
> >
> > Use a mmu_gather instead of a temporary linked list for accumulating
> > pages when we unmap a hugepage range
>
> Sorry for coming up with the comment that late but you owe us an
> explanation _why_ you are doing this.
>
> I assume that this fixes a real problem when we take i_mmap_mutex
> already up in
> unmap_mapping_range
>   mutex_lock(&mapping->i_mmap_mutex);
>   unmap_mapping_range_tree | unmap_mapping_range_list
>     unmap_mapping_range_vma
>       zap_page_range_single
>         unmap_single_vma
>           unmap_hugepage_range
>             mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
>
> And that this should have been marked for stable as well (I haven't
> checked when this has been introduced).
>
> But then I do not see how this help when you still do this:
> [...]
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 1b7dc66..545e18a 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
> >  			 * Since no pte has actually been setup, it is
> >  			 * safe to do nothing in this case.
> >  			 */
> > -			if (vma->vm_file)
> > -				unmap_hugepage_range(vma, start, end, NULL);
> > +			if (vma->vm_file) {
> > +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> > +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> > +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> > +			}
> >  		} else
> >  			unmap_page_range(tlb, vma, start, end, details);
> >  	}

Ahhh, you are removing the lock in the next patch. Really confusing and
not nice for the stable backport.
Could you merge those two patches and add Cc: stable?
Then you can add my
Reviewed-by: Michal Hocko

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754372Ab2FMQni (ORCPT );
	Wed, 13 Jun 2012 12:43:38 -0400
Received: from e28smtp09.in.ibm.com ([122.248.162.9]:35939 "EHLO
	e28smtp09.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752917Ab2FMQng (ORCPT );
	Wed, 13 Jun 2012 12:43:36 -0400
From: "Aneesh Kumar K.V"
To: Michal Hocko
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com,
	rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a
	temporary linked list for accumulating pages
In-Reply-To: <20120613150338.GB14777@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<20120613145923.GA14777@tiehlicka.suse.cz>
	<20120613150338.GB14777@tiehlicka.suse.cz>
User-Agent: Notmuch/0.13.2+35~g0ff57e7 (http://notmuchmail.org)
	Emacs/24.1.50.1 (x86_64-unknown-linux-gnu)
Date: Wed, 13 Jun 2012 22:13:00 +0530
Message-ID: <87y5nrmacr.fsf@skywalker.in.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain
x-cbid: 12061316-2674-0000-0000-000004E368C3
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Michal Hocko writes:

> On Wed 13-06-12 16:59:23, Michal Hocko wrote:
>> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
>> > From: "Aneesh Kumar K.V"
>> >
>> > Use a mmu_gather instead of a temporary linked list for accumulating
>> > pages when we unmap a hugepage range
>>
>> Sorry for coming up with the comment that late but you owe us an
>> explanation _why_ you are doing this.
>>
>> I assume that this fixes a real problem when we take i_mmap_mutex
>> already up in
>> unmap_mapping_range
>>   mutex_lock(&mapping->i_mmap_mutex);
>>   unmap_mapping_range_tree | unmap_mapping_range_list
>>     unmap_mapping_range_vma
>>       zap_page_range_single
>>         unmap_single_vma
>>           unmap_hugepage_range
>>             mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
>>
>> And that this should have been marked for stable as well (I haven't
>> checked when this has been introduced).
>>
>> But then I do not see how this help when you still do this:
>> [...]
>> > diff --git a/mm/memory.c b/mm/memory.c
>> > index 1b7dc66..545e18a 100644
>> > --- a/mm/memory.c
>> > +++ b/mm/memory.c
>> > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
>> >  			 * Since no pte has actually been setup, it is
>> >  			 * safe to do nothing in this case.
>> >  			 */
>> > -			if (vma->vm_file)
>> > -				unmap_hugepage_range(vma, start, end, NULL);
>> > +			if (vma->vm_file) {
>> > +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
>> > +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
>> > +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
>> > +			}
>> >  		} else
>> >  			unmap_page_range(tlb, vma, start, end, details);
>> >  	}
>
> Ahhh, you are removing the lock in the next patch. Really confusing and
> not nice for the stable backport.
> Could you merge those two patches and add Cc: stable?
> Then you can add my
> Reviewed-by: Michal Hocko
>

In the last review cycle I was asked to see if we can get a lockdep
report for the above and what I found was we don't really cause the
above deadlock with the current codebase because for hugetlb we don't
directly call unmap_mapping_range. But still it is good to remove the
i_mmap_mutex, because we don't need that protection now. I didn't mark
it for stable because of the above reason.

-aneesh

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754039Ab2FMQhS (ORCPT );
	Wed, 13 Jun 2012 12:37:18 -0400
Received: from e28smtp09.in.ibm.com ([122.248.162.9]:42165 "EHLO
	e28smtp09.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752624Ab2FMQhQ (ORCPT );
	Wed, 13 Jun 2012 12:37:16 -0400
From: "Aneesh Kumar K.V"
To: Michal Hocko
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com,
	rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a
	temporary linked list for accumulating pages
In-Reply-To: <20120613145923.GA14777@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<20120613145923.GA14777@tiehlicka.suse.cz>
User-Agent: Notmuch/0.13.2+35~g0ff57e7 (http://notmuchmail.org)
	Emacs/24.1.50.1 (x86_64-unknown-linux-gnu)
Date: Wed, 13 Jun 2012 22:07:06 +0530
Message-ID: <871uljnp71.fsf@skywalker.in.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain
x-cbid: 12061316-2674-0000-0000-000004E365B8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Michal Hocko writes:

> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V"
>>
>> Use a mmu_gather instead of a temporary linked list for accumulating
>> pages when we unmap a hugepage range
>
> Sorry for coming up with the comment that late but you owe us an
> explanation _why_ you are doing this.
>
> I assume that this fixes a real problem when we take i_mmap_mutex
> already up in
> unmap_mapping_range
>   mutex_lock(&mapping->i_mmap_mutex);
>   unmap_mapping_range_tree | unmap_mapping_range_list
>     unmap_mapping_range_vma
>       zap_page_range_single
>         unmap_single_vma
>           unmap_hugepage_range
>             mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
>
> And that this should have been marked for stable as well (I haven't
> checked when this has been introduced).

Switch to mmu_gather is to get rid of the use of page->lru so that i can use it for
active list.

-aneesh

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755311Ab2FNDNn (ORCPT );
	Wed, 13 Jun 2012 23:13:43 -0400
Received: from fgwmail6.fujitsu.co.jp ([192.51.44.36]:44257 "EHLO
	fgwmail6.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755182Ab2FNDNk (ORCPT );
	Wed, 13 Jun 2012 23:13:40 -0400
X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4
Message-ID: <4FD95665.5050300@jp.fujitsu.com>
Date: Thu, 14 Jun 2012 12:11:33 +0900
From: Kamezawa Hiroyuki
User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:12.0) Gecko/20120428 Thunderbird/12.0.1
MIME-Version: 1.0
To: "Aneesh Kumar K.V"
CC: linux-mm@kvack.org, dhillf@gmail.com, rientjes@google.com,
	mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 08/15] hugetlb: Make some static variables global
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
In-Reply-To: <1339583254-895-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> We will use them later in hugetlb_cgroup.c
>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: KAMEZAWA Hiroyuki

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755200Ab2FNDLk (ORCPT );
	Wed, 13 Jun 2012 23:11:40 -0400
Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:55120 "EHLO
	fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754862Ab2FNDLj (ORCPT );
	Wed, 13 Jun 2012 23:11:39 -0400
X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4
Message-ID: <4FD955E8.5050100@jp.fujitsu.com>
Date: Thu, 14 Jun 2012 12:09:28 +0900
From: Kamezawa Hiroyuki
User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:12.0) Gecko/20120428 Thunderbird/12.0.1
MIME-Version: 1.0
To: "Aneesh Kumar K.V"
CC: linux-mm@kvack.org, dhillf@gmail.com, rientjes@google.com,
	mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in
	unmap_single_vma() for hugetlb
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
In-Reply-To: <1339583254-895-6-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> i_mmap_mutex lock was added in unmap_single_vma by 502717f4e ("hugetlb:
> fix linked list corruption in unmap_hugepage_range()") but we don't use
> page->lru in unmap_hugepage_range any more. Also the lock was taken
> higher up in the stack in some code path. That would result in deadlock.
>
> unmap_mapping_range (i_mmap_mutex)
>  -> unmap_mapping_range_tree
>     -> unmap_mapping_range_vma
>        -> zap_page_range_single
>           -> unmap_single_vma
>              -> unmap_hugepage_range (i_mmap_mutex)
>
> For shared pagetable support for huge pages, since pagetable pages are ref
> counted we don't need any lock during huge_pmd_unshare. We do take
> i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in mapping.
> (39dde65c9940c97f ("shared page table for hugetlb page")).
>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: KAMEZAWA Hiroyuki

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1750826Ab2FNEHG (ORCPT );
	Thu, 14 Jun 2012 00:07:06 -0400
Received: from fgwmail6.fujitsu.co.jp ([192.51.44.36]:50213 "EHLO
	fgwmail6.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750702Ab2FNEHD (ORCPT );
	Thu, 14 Jun 2012 00:07:03 -0400
X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4
Message-ID: <4FD962D4.1020908@jp.fujitsu.com>
Date: Thu, 14 Jun 2012 13:04:36 +0900
From: Kamezawa Hiroyuki
User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:12.0) Gecko/20120428 Thunderbird/12.0.1
MIME-Version: 1.0
To: "Aneesh Kumar K.V"
CC: linux-mm@kvack.org, dhillf@gmail.com, rientjes@google.com,
	mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup
	pointer to page lru
References: <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339587270-5831-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
In-Reply-To: <1339587270-5831-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

(2012/06/13 20:34), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> Add the hugetlb cgroup pointer to 3rd page lru.next. This limit
> the usage to hugetlb cgroup to only hugepages with 3 or more
> normal pages. I guess that is an acceptable limitation.
>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: KAMEZAWA Hiroyuki

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1750871Ab2FNEJV (ORCPT );
	Thu, 14 Jun 2012 00:09:21 -0400
Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:50566 "EHLO
	fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750702Ab2FNEJT (ORCPT );
	Thu, 14 Jun 2012 00:09:19 -0400
X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4
Message-ID: <4FD96370.2020708@jp.fujitsu.com>
Date: Thu, 14 Jun 2012 13:07:12 +0900
From: Kamezawa Hiroyuki
User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:12.0) Gecko/20120428 Thunderbird/12.0.1
MIME-Version: 1.0
To: "Aneesh Kumar K.V"
CC: linux-mm@kvack.org, dhillf@gmail.com, rientjes@google.com,
	mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines
	for hugetlb cgroup
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
In-Reply-To: <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> This patchset add the charge and uncharge routines for hugetlb cgroup.
> We do cgroup charging in page alloc and uncharge in compound page
> destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock.
>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: KAMEZAWA Hiroyuki

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1750949Ab2FNELM (ORCPT );
	Thu, 14 Jun 2012 00:11:12 -0400
Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:50644 "EHLO
	fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750702Ab2FNELL (ORCPT );
	Thu, 14 Jun 2012 00:11:11 -0400
X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4
Message-ID: <4FD963E1.6080506@jp.fujitsu.com>
Date: Thu, 14 Jun 2012 13:09:05 +0900
From: Kamezawa Hiroyuki
User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:12.0) Gecko/20120428 Thunderbird/12.0.1
MIME-Version: 1.0
To: "Aneesh Kumar K.V"
CC: linux-mm@kvack.org, dhillf@gmail.com, rientjes@google.com,
	mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 12/15] hugetlb/cgroup: Add support for cgroup removal
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1339583254-895-13-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
In-Reply-To: <1339583254-895-13-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> This patch add support for cgroup removal. If we don't have parent
> cgroup, the charges are moved to root cgroup.
>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: KAMEZAWA Hiroyuki

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751169Ab2FNEPZ (ORCPT ); Thu, 14 Jun 2012 00:15:25 -0400
Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:60959 "EHLO fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750768Ab2FNEPX (ORCPT ); Thu, 14 Jun 2012 00:15:23 -0400
X-SecurityPolicyCheck: OK by SHieldMailChecker v1.7.4
Message-ID: <4FD964DD.6060802@jp.fujitsu.com>
Date: Thu, 14 Jun 2012 13:13:17 +0900
From: Kamezawa Hiroyuki
User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:12.0) Gecko/20120428 Thunderbird/12.0.1
MIME-Version: 1.0
To: "Aneesh Kumar K.V"
CC: linux-mm@kvack.org, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
In-Reply-To: <1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor. Since
> we are holding a hugepage reference, we can be sure that old page won't
> get uncharged till the last put_page().
>
> Signed-off-by: Aneesh Kumar K.V

Acked-by: KAMEZAWA Hiroyuki

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752839Ab2FNHO2 (ORCPT ); Thu, 14 Jun 2012 03:14:28 -0400
Received: from cantor2.suse.de ([195.135.220.15]:56090 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751208Ab2FNHO1 (ORCPT ); Thu, 14 Jun 2012 03:14:27 -0400
Date: Thu, 14 Jun 2012 09:14:23 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
Message-ID: <20120614071423.GA27397@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120613145923.GA14777@tiehlicka.suse.cz> <20120613150338.GB14777@tiehlicka.suse.cz> <87y5nrmacr.fsf@skywalker.in.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87y5nrmacr.fsf@skywalker.in.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 13-06-12 22:13:00, Aneesh Kumar K.V wrote:
> Michal Hocko writes:
>
> > On Wed 13-06-12 16:59:23, Michal Hocko wrote:
> >> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> >> > From: "Aneesh Kumar K.V"
> >> >
> >> > Use a mmu_gather instead of a temporary linked list for accumulating
> >> > pages when we unmap a hugepage range
> >>
> >> Sorry for coming up with the comment that late but you owe us an
> >> explanation _why_ you are doing this.
> >>
> >> I assume that this fixes a real problem when we take i_mmap_mutex
> >> already up in
> >> unmap_mapping_range
> >>   mutex_lock(&mapping->i_mmap_mutex);
> >>   unmap_mapping_range_tree | unmap_mapping_range_list
> >>     unmap_mapping_range_vma
> >>       zap_page_range_single
> >>         unmap_single_vma
> >>           unmap_hugepage_range
> >>             mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >>
> >> And that this should have been marked for stable as well (I haven't
> >> checked when this has been introduced).
> >>
> >> But then I do not see how this help when you still do this:
> >> [...]
> >> > diff --git a/mm/memory.c b/mm/memory.c
> >> > index 1b7dc66..545e18a 100644
> >> > --- a/mm/memory.c
> >> > +++ b/mm/memory.c
> >> > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
> >> >  			 * Since no pte has actually been setup, it is
> >> >  			 * safe to do nothing in this case.
> >> >  			 */
> >> > -			if (vma->vm_file)
> >> > -				unmap_hugepage_range(vma, start, end, NULL);
> >> > +			if (vma->vm_file) {
> >> > +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >> > +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> >> > +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >> > +			}
> >> >  		} else
> >> >  			unmap_page_range(tlb, vma, start, end, details);
> >> >  	}
> >
> > Ahhh, you are removing the lock in the next patch. Really confusing and
> > not nice for the stable backport.
> > Could you merge those two patches and add Cc: stable?
> > Then you can add my
> > Reviewed-by: Michal Hocko
> >
>
> In the last review cycle I was asked to see if we can get a lockdep
> report for the above and what I found was we don't really cause the
> above deadlock with the current codebase because for hugetlb we don't
> directly call unmap_mapping_range.

Ahh, ok I missed that.

> But still it is good to remove the i_mmap_mutex, because we don't need
> that protection now. I didn't mark it for stable because of the above
> reason.

Thanks for clarification

>
> -aneesh
>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752851Ab2FNHQj (ORCPT ); Thu, 14 Jun 2012 03:16:39 -0400
Received: from cantor2.suse.de ([195.135.220.15]:56137 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751208Ab2FNHQi (ORCPT ); Thu, 14 Jun 2012 03:16:38 -0400
Date: Thu, 14 Jun 2012 09:16:37 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
Message-ID: <20120614071637.GB27397@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120613145923.GA14777@tiehlicka.suse.cz> <871uljnp71.fsf@skywalker.in.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <871uljnp71.fsf@skywalker.in.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 13-06-12 22:07:06, Aneesh Kumar K.V wrote:
> Michal Hocko writes:
>
> > On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> >> From: "Aneesh Kumar K.V"
> >>
> >> Use a mmu_gather instead of a temporary linked list for accumulating
> >> pages when we unmap a hugepage range
> >
> > Sorry for coming up with the comment that late but you owe us an
> > explanation _why_ you are doing this.
> >
> > I assume that this fixes a real problem when we take i_mmap_mutex
> > already up in
> > unmap_mapping_range
> >   mutex_lock(&mapping->i_mmap_mutex);
> >   unmap_mapping_range_tree | unmap_mapping_range_list
> >     unmap_mapping_range_vma
> >       zap_page_range_single
> >         unmap_single_vma
> >           unmap_hugepage_range
> >             mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >
> > And that this should have been marked for stable as well (I haven't
> > checked when this has been introduced).
>
> Switch to mmu_gather is to get rid of the use of page->lru so that i can use it for
> active list.

So can we get this to the changelog please?

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753124Ab2FNH2f (ORCPT ); Thu, 14 Jun 2012 03:28:35 -0400
Received: from cantor2.suse.de ([195.135.220.15]:56585 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752075Ab2FNH2d (ORCPT ); Thu, 14 Jun 2012 03:28:33 -0400
Date: Thu, 14 Jun 2012 09:28:31 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 06/15] hugetlb: simplify migrate_huge_page()
Message-ID: <20120614072831.GD27397@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1339583254-895-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 13-06-12 15:57:25, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> Since we migrate only one hugepage, don't use linked list for passing the
> page around. Directly pass the page that need to be migrated as argument.
> This also remove the usage page->lru in migrate path.
>
> Reviewed-by: KAMEZAWA Hiroyuki
> Signed-off-by: Aneesh Kumar K.V

Yes nice.
Reviewed-by: Michal Hocko

> ---
>  include/linux/migrate.h |    4 +--
>  mm/memory-failure.c     |   13 ++--------
>  mm/migrate.c            |   65 +++++++++++++++--------------------------------
>  3 files changed, 25 insertions(+), 57 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 855c337..ce7e667 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -15,7 +15,7 @@ extern int migrate_page(struct address_space *,
>  extern int migrate_pages(struct list_head *l, new_page_t x,
>  			unsigned long private, bool offlining,
>  			enum migrate_mode mode);
> -extern int migrate_huge_pages(struct list_head *l, new_page_t x,
> +extern int migrate_huge_page(struct page *, new_page_t x,
>  			unsigned long private, bool offlining,
>  			enum migrate_mode mode);
>
> @@ -36,7 +36,7 @@ static inline void putback_lru_pages(struct list_head *l) {}
>  static inline int migrate_pages(struct list_head *l, new_page_t x,
>  		unsigned long private, bool offlining,
>  		enum migrate_mode mode) { return -ENOSYS; }
> -static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
> +static inline int migrate_huge_page(struct page *page, new_page_t x,
>  		unsigned long private, bool offlining,
>  		enum migrate_mode mode) { return -ENOSYS; }
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ab1e714..53a1495 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1414,7 +1414,6 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  	int ret;
>  	unsigned long pfn = page_to_pfn(page);
>  	struct page *hpage = compound_head(page);
> -	LIST_HEAD(pagelist);
>
>  	ret = get_any_page(page, pfn, flags);
>  	if (ret < 0)
> @@ -1429,19 +1428,11 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  	}
>
>  	/* Keep page count to indicate a given hugepage is isolated. */
> -
> -	list_add(&hpage->lru, &pagelist);
> -	ret = migrate_huge_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0,
> -				true);
> +	ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, 0, true);
> +	put_page(hpage);
>  	if (ret) {
> -		struct page *page1, *page2;
> -		list_for_each_entry_safe(page1, page2, &pagelist, lru)
> -			put_page(page1);
> -
>  		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
>  			pfn, ret, page->flags);
> -		if (ret > 0)
> -			ret = -EIO;
>  		return ret;
>  	}
>  done:
> diff --git a/mm/migrate.c b/mm/migrate.c
> index be26d5c..fdce3a2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -932,15 +932,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  	if (anon_vma)
>  		put_anon_vma(anon_vma);
>  	unlock_page(hpage);
> -
>  out:
> -	if (rc != -EAGAIN) {
> -		list_del(&hpage->lru);
> -		put_page(hpage);
> -	}
> -
>  	put_page(new_hpage);
> -
>  	if (result) {
>  		if (rc)
>  			*result = rc;
> @@ -1016,48 +1009,32 @@ out:
>  	return nr_failed + retry;
>  }
>
> -int migrate_huge_pages(struct list_head *from,
> -		new_page_t get_new_page, unsigned long private, bool offlining,
> -		enum migrate_mode mode)
> +int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
> +		      unsigned long private, bool offlining,
> +		      enum migrate_mode mode)
>  {
> -	int retry = 1;
> -	int nr_failed = 0;
> -	int pass = 0;
> -	struct page *page;
> -	struct page *page2;
> -	int rc;
> -
> -	for (pass = 0; pass < 10 && retry; pass++) {
> -		retry = 0;
> -
> -		list_for_each_entry_safe(page, page2, from, lru) {
> +	int pass, rc;
> +
> +	for (pass = 0; pass < 10; pass++) {
> +		rc = unmap_and_move_huge_page(get_new_page,
> +					      private, hpage, pass > 2, offlining,
> +					      mode);
> +		switch (rc) {
> +		case -ENOMEM:
> +			goto out;
> +		case -EAGAIN:
> +			/* try again */
>  			cond_resched();
> -
> -			rc = unmap_and_move_huge_page(get_new_page,
> -					private, page, pass > 2, offlining,
> -					mode);
> -
> -			switch(rc) {
> -			case -ENOMEM:
> -				goto out;
> -			case -EAGAIN:
> -				retry++;
> -				break;
> -			case 0:
> -				break;
> -			default:
> -				/* Permanent failure */
> -				nr_failed++;
> -				break;
> -			}
> +			break;
> +		case 0:
> +			goto out;
> +		default:
> +			rc = -EIO;
> +			goto out;
>  		}
>  	}
> -	rc = 0;
>  out:
> -	if (rc)
> -		return rc;
> -
> -	return nr_failed + retry;
> +	return rc;
>  }
>
>  #ifdef CONFIG_NUMA
> -- 
> 1.7.10
>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753231Ab2FNHiD (ORCPT ); Thu, 14 Jun 2012 03:38:03 -0400
Received: from cantor2.suse.de ([195.135.220.15]:56877 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752122Ab2FNHiB (ORCPT ); Thu, 14 Jun 2012 03:38:01 -0400
Date: Thu, 14 Jun 2012 09:38:00 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 08/15] hugetlb: Make some static variables global
Message-ID: <20120614073800.GF27397@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1339583254-895-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 13-06-12 15:57:27, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> We will use them later in hugetlb_cgroup.c
>
> Signed-off-by: Aneesh Kumar K.V

Reviewed-by: Michal Hocko

Just a nit

[...]
> +extern int hugetlb_max_hstate;

Maybe we can mark it __read_mostly as it is modified only during
initialization and then it is just a constant.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753728Ab2FNHdY (ORCPT ); Thu, 14 Jun 2012 03:33:24 -0400
Received: from cantor2.suse.de ([195.135.220.15]:56736 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752506Ab2FNHdW (ORCPT ); Thu, 14 Jun 2012 03:33:22 -0400
Date: Thu, 14 Jun 2012 09:33:20 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 07/15] hugetlb: add a list for tracking in-use HugeTLB pages
Message-ID: <20120614073320.GE27397@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1339583254-895-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 13-06-12 15:57:26, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> hugepage_activelist will be used to track currently used HugeTLB pages.
> We need to find the in-use HugeTLB pages to support HugeTLB cgroup removal.
> On cgroup removal we update the page's HugeTLB cgroup to point to parent
> cgroup.
>
> Reviewed-by: KAMEZAWA Hiroyuki
> Signed-off-by: Aneesh Kumar K.V

Reviewed-by: Michal Hocko

> ---
>  include/linux/hugetlb.h |    1 +
>  mm/hugetlb.c            |   12 +++++++-----
>  2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 0f23c18..ed550d8 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -211,6 +211,7 @@ struct hstate {
>  	unsigned long resv_huge_pages;
>  	unsigned long surplus_huge_pages;
>  	unsigned long nr_overcommit_huge_pages;
> +	struct list_head hugepage_activelist;
>  	struct list_head hugepage_freelists[MAX_NUMNODES];
>  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
>  	unsigned int free_huge_pages_node[MAX_NUMNODES];
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e54b695..b5b6e15 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -510,7 +510,7 @@ void copy_huge_page(struct page *dst, struct page *src)
>  static void enqueue_huge_page(struct hstate *h, struct page *page)
>  {
>  	int nid = page_to_nid(page);
> -	list_add(&page->lru, &h->hugepage_freelists[nid]);
> +	list_move(&page->lru, &h->hugepage_freelists[nid]);
>  	h->free_huge_pages++;
>  	h->free_huge_pages_node[nid]++;
>  }
> @@ -522,7 +522,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
>  	if (list_empty(&h->hugepage_freelists[nid]))
>  		return NULL;
>  	page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
> -	list_del(&page->lru);
> +	list_move(&page->lru, &h->hugepage_activelist);
>  	set_page_refcounted(page);
>  	h->free_huge_pages--;
>  	h->free_huge_pages_node[nid]--;
> @@ -626,10 +626,11 @@ static void free_huge_page(struct page *page)
>  	page->mapping = NULL;
>  	BUG_ON(page_count(page));
>  	BUG_ON(page_mapcount(page));
> -	INIT_LIST_HEAD(&page->lru);
>
>  	spin_lock(&hugetlb_lock);
>  	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
> +		/* remove the page from active list */
> +		list_del(&page->lru);
>  		update_and_free_page(h, page);
>  		h->surplus_huge_pages--;
>  		h->surplus_huge_pages_node[nid]--;
> @@ -642,6 +643,7 @@ static void free_huge_page(struct page *page)
>
>  static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  {
> +	INIT_LIST_HEAD(&page->lru);
>  	set_compound_page_dtor(page, free_huge_page);
>  	spin_lock(&hugetlb_lock);
>  	h->nr_huge_pages++;
> @@ -890,6 +892,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
>
>  	spin_lock(&hugetlb_lock);
>  	if (page) {
> +		INIT_LIST_HEAD(&page->lru);
>  		r_nid = page_to_nid(page);
>  		set_compound_page_dtor(page, free_huge_page);
>  		/*
> @@ -994,7 +997,6 @@ retry:
>  	list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
>  		if ((--needed) < 0)
>  			break;
> -		list_del(&page->lru);
>  		/*
>  		 * This page is now managed by the hugetlb allocator and has
>  		 * no users -- drop the buddy allocator's reference.
> @@ -1009,7 +1011,6 @@ free:
>  	/* Free unnecessary surplus pages to the buddy allocator */
>  	if (!list_empty(&surplus_list)) {
>  		list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
> -			list_del(&page->lru);
>  			put_page(page);
>  		}
>  	}
> @@ -1909,6 +1910,7 @@ void __init hugetlb_add_hstate(unsigned order)
>  	h->free_huge_pages = 0;
>  	for (i = 0; i < MAX_NUMNODES; ++i)
>  		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
> +	INIT_LIST_HEAD(&h->hugepage_activelist);
>  	h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
>  	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
>  	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
> -- 
> 1.7.10
>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754980Ab2FNIYR (ORCPT ); Thu, 14 Jun 2012 04:24:17 -0400
Received: from cantor2.suse.de ([195.135.220.15]:59337 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752184Ab2FNIYM (ORCPT ); Thu, 14 Jun 2012 04:24:12 -0400
Date: Thu, 14 Jun 2012 10:24:06 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup
Message-ID: <20120614082406.GG27397@tiehlicka.suse.cz>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1339583254-895-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 13-06-12 15:57:28, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> This patch implements a new controller that allows us to control HugeTLB
> allocations. The extension allows to limit the HugeTLB usage per control
> group and enforces the controller limit during page fault. Since HugeTLB
> doesn't support page reclaim, enforcing the limit at page fault time implies
> that, the application will get SIGBUS signal if it tries to access HugeTLB
> pages beyond its limit. This requires the application to know beforehand
> how much HugeTLB pages it would require for its use.
>
> The charge/uncharge calls will be added to HugeTLB code in later patch.
> Support for cgroup removal will be added in later patches.
>
> Reviewed-by: KAMEZAWA Hiroyuki
> Signed-off-by: Aneesh Kumar K.V

Looks good
Reviewed-by: Michal Hocko

> ---
>  include/linux/cgroup_subsys.h  |    6 ++
>  include/linux/hugetlb_cgroup.h |   37 ++++++++++++
>  init/Kconfig                   |   15 +++++
>  mm/Makefile                    |    1 +
>  mm/hugetlb_cgroup.c            |  122 ++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 181 insertions(+)
>  create mode 100644 include/linux/hugetlb_cgroup.h
>  create mode 100644 mm/hugetlb_cgroup.c
>
> diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
> index 0bd390c..895923a 100644
> --- a/include/linux/cgroup_subsys.h
> +++ b/include/linux/cgroup_subsys.h
> @@ -72,3 +72,9 @@ SUBSYS(net_prio)
>  #endif
>
>  /* */
> +
> +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
> +SUBSYS(hugetlb)
> +#endif
> +
> +/* */
> diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
> new file mode 100644
> index 0000000..e9944b4
> --- /dev/null
> +++ b/include/linux/hugetlb_cgroup.h
> @@ -0,0 +1,37 @@
> +/*
> + * Copyright IBM Corporation, 2012
> + * Author Aneesh Kumar K.V
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of version 2.1 of the GNU Lesser General Public License
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it would be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> + *
> + */
> +
> +#ifndef _LINUX_HUGETLB_CGROUP_H
> +#define _LINUX_HUGETLB_CGROUP_H
> +
> +#include
> +
> +struct hugetlb_cgroup;
> +
> +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
> +static inline bool hugetlb_cgroup_disabled(void)
> +{
> +	if (hugetlb_subsys.disabled)
> +		return true;
> +	return false;
> +}
> +
> +#else
> +static inline bool hugetlb_cgroup_disabled(void)
> +{
> +	return true;
> +}
> +
> +#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
> +#endif
> diff --git a/init/Kconfig b/init/Kconfig
> index d07dcf9..da05fae 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -751,6 +751,21 @@ config CGROUP_MEM_RES_CTLR_KMEM
>  	  the kmem extension can use it to guarantee that no group of processes
>  	  will ever exhaust kernel resources alone.
>
> +config CGROUP_HUGETLB_RES_CTLR
> +	bool "HugeTLB Resource Controller for Control Groups"
> +	depends on RESOURCE_COUNTERS && HUGETLB_PAGE && EXPERIMENTAL
> +	default n
> +	help
> +	  Provides a cgroup Resource Controller for HugeTLB pages.
> +	  When you enable this, you can put a per cgroup limit on HugeTLB usage.
> +	  The limit is enforced during page fault. Since HugeTLB doesn't
> +	  support page reclaim, enforcing the limit at page fault time implies
> +	  that, the application will get SIGBUS signal if it tries to access
> +	  HugeTLB pages beyond its limit. This requires the application to know
> +	  beforehand how much HugeTLB pages it would require for its use. The
> +	  control group is tracked in the third page lru pointer. This means
> +	  that we cannot use the controller with huge page less than 3 pages.
> +
>  config CGROUP_PERF
>  	bool "Enable perf_event per-cpu per-container group (cgroup) monitoring"
>  	depends on PERF_EVENTS && CGROUPS
> diff --git a/mm/Makefile b/mm/Makefile
> index 2e2fbbe..25e8002 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -49,6 +49,7 @@ obj-$(CONFIG_MIGRATION) += migrate.o
>  obj-$(CONFIG_QUICKLIST) += quicklist.o
>  obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o
>  obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o
> +obj-$(CONFIG_CGROUP_HUGETLB_RES_CTLR) += hugetlb_cgroup.o
>  obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
>  obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
>  obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
> diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
> new file mode 100644
> index 0000000..5a4e71c
> --- /dev/null
> +++ b/mm/hugetlb_cgroup.c
> @@ -0,0 +1,122 @@
> +/*
> + *
> + * Copyright IBM Corporation, 2012
> + * Author Aneesh Kumar K.V
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of version 2.1 of the GNU Lesser General Public License
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it would be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> + *
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +
> +struct hugetlb_cgroup {
> +	struct cgroup_subsys_state css;
> +	/*
> +	 * the counter to account for hugepages from hugetlb.
> +	 */
> +	struct res_counter hugepage[HUGE_MAX_HSTATE];
> +};
> +
> +struct cgroup_subsys hugetlb_subsys __read_mostly;
> +struct hugetlb_cgroup *root_h_cgroup __read_mostly;
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
> +{
> +	if (s)
> +		return container_of(s, struct hugetlb_cgroup, css);
> +	return NULL;
> +}
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_cgroup(struct cgroup *cgroup)
> +{
> +	return hugetlb_cgroup_from_css(cgroup_subsys_state(cgroup,
> +							   hugetlb_subsys_id));
> +}
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_task(struct task_struct *task)
> +{
> +	return hugetlb_cgroup_from_css(task_subsys_state(task,
> +							 hugetlb_subsys_id));
> +}
> +
> +static inline bool hugetlb_cgroup_is_root(struct hugetlb_cgroup *h_cg)
> +{
> +	return (h_cg == root_h_cgroup);
> +}
> +
> +static inline struct hugetlb_cgroup *parent_hugetlb_cgroup(struct cgroup *cg)
> +{
> +	if (!cg->parent)
> +		return NULL;
> +	return hugetlb_cgroup_from_cgroup(cg->parent);
> +}
> +
> +static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg)
> +{
> +	int idx;
> +	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cg);
> +
> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
> +		if ((res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE)) > 0)
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup)
> +{
> +	int idx;
> +	struct cgroup *parent_cgroup;
> +	struct hugetlb_cgroup *h_cgroup, *parent_h_cgroup;
> +
> +	h_cgroup = kzalloc(sizeof(*h_cgroup), GFP_KERNEL);
> +	if (!h_cgroup)
> +		return ERR_PTR(-ENOMEM);
> +
> +	parent_cgroup = cgroup->parent;
> +	if (parent_cgroup) {
> +		parent_h_cgroup = hugetlb_cgroup_from_cgroup(parent_cgroup);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> +			res_counter_init(&h_cgroup->hugepage[idx],
> +					 &parent_h_cgroup->hugepage[idx]);
> +	} else {
> +		root_h_cgroup = h_cgroup;
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> +			res_counter_init(&h_cgroup->hugepage[idx], NULL);
> +	}
> +	return &h_cgroup->css;
> +}
> +
> +static void hugetlb_cgroup_destroy(struct cgroup *cgroup)
> +{
> +	struct hugetlb_cgroup *h_cgroup;
> +
> +	h_cgroup = hugetlb_cgroup_from_cgroup(cgroup);
> +	kfree(h_cgroup);
> +}
> +
> +static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
> +{
> +	/* We will add the cgroup removal support in later patches */
> +	return -EBUSY;
> +}
> +
> +struct cgroup_subsys hugetlb_subsys = {
> +	.name = "hugetlb",
> +	.create = hugetlb_cgroup_create,
> +	.pre_destroy = hugetlb_cgroup_pre_destroy,
> +	.destroy = hugetlb_cgroup_destroy,
> +	.subsys_id = hugetlb_subsys_id,
> +};
> -- 
> 1.7.10
>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755361Ab2FNIod (ORCPT ); Thu, 14 Jun 2012 04:44:33 -0400
Received: from cantor2.suse.de ([195.135.220.15]:60502 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753478Ab2FNIoa (ORCPT ); Thu, 14 Jun 2012 04:44:30 -0400
Date: Thu, 14 Jun 2012 10:44:29 +0200
From: Michal Hocko
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
Message-ID: <20120614084429.GH27397@tiehlicka.suse.cz>
References: <1339583254-895-11-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339587270-5831-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1339587270-5831-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 13-06-12 17:04:30, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"
>
> Add the hugetlb cgroup pointer to 3rd page lru.next. This limit
> the usage to hugetlb cgroup to only hugepages with 3 or more
> normal pages. I guess that is an acceptable limitation.
>
> Signed-off-by: Aneesh Kumar K.V

I would be happier if you explicitly mentioned that both
hugetlb_cgroup_from_page and set_hugetlb_cgroup need hugetlb_lock held,
but
Reviewed-by: Michal Hocko

> ---
>  include/linux/hugetlb_cgroup.h |   37 +++++++++++++++++++++++++++++++++++++
>  mm/hugetlb.c                   |    4 ++++
>  2 files changed, 41 insertions(+)
>
> diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
> index e9944b4..2e4cb6b 100644
> --- a/include/linux/hugetlb_cgroup.h
> +++ b/include/linux/hugetlb_cgroup.h
> @@ -18,8 +18,34 @@
>  #include
>
>  struct hugetlb_cgroup;
> +/*
> + * Minimum page order trackable by hugetlb cgroup.
> + * At least 3 pages are necessary for all the tracking information.
> + */ > +#define HUGETLB_CGROUP_MIN_ORDER 2 > > #ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR > + > +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) > +{ > + VM_BUG_ON(!PageHuge(page)); > + > + if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) > + return NULL; > + return (struct hugetlb_cgroup *)page[2].lru.next; > +} > + > +static inline > +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) > +{ > + VM_BUG_ON(!PageHuge(page)); > + > + if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) > + return -1; > + page[2].lru.next = (void *)h_cg; > + return 0; > +} > + > static inline bool hugetlb_cgroup_disabled(void) > { > if (hugetlb_subsys.disabled) > @@ -28,6 +54,17 @@ static inline bool hugetlb_cgroup_disabled(void) > } > > #else > +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) > +{ > + return NULL; > +} > + > +static inline > +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) > +{ > + return 0; > +} > + > static inline bool hugetlb_cgroup_disabled(void) > { > return true; > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index e899a2d..6a449c5 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -28,6 +28,7 @@ > > #include > #include > +#include > #include > #include "internal.h" > > @@ -591,6 +592,7 @@ static void update_and_free_page(struct hstate *h, struct page *page) > 1 << PG_active | 1 << PG_reserved | > 1 << PG_private | 1 << PG_writeback); > } > + VM_BUG_ON(hugetlb_cgroup_from_page(page)); > set_compound_page_dtor(page, NULL); > set_page_refcounted(page); > arch_release_hugepage(page); > @@ -643,6 +645,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) > INIT_LIST_HEAD(&page->lru); > set_compound_page_dtor(page, free_huge_page); > spin_lock(&hugetlb_lock); > + set_hugetlb_cgroup(page, NULL); > h->nr_huge_pages++; > h->nr_huge_pages_node[nid]++; > spin_unlock(&hugetlb_lock); > @@ -892,6 +895,7 @@ static struct page 
*alloc_buddy_huge_page(struct hstate *h, int nid) > INIT_LIST_HEAD(&page->lru); > r_nid = page_to_nid(page); > set_compound_page_dtor(page, free_huge_page); > + set_hugetlb_cgroup(page, NULL); > /* > * We incremented the global counters already > */ > -- > 1.7.10 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755408Ab2FNI6Q (ORCPT ); Thu, 14 Jun 2012 04:58:16 -0400 Received: from szxga01-in.huawei.com ([119.145.14.64]:13574 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755028Ab2FNI6O (ORCPT ); Thu, 14 Jun 2012 04:58:14 -0400 Message-ID: <4FD9A6B6.50503@huawei.com> Date: Thu, 14 Jun 2012 16:54:14 +0800 From: Li Zefan User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:11.0) Gecko/20120312 Thunderbird/11.0 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: , , , , , , , , Subject: Re: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1339583254-895-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset="GB2312" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.166.88.162] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > +static inline > +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s) > +{ > + if (s) Neither cgroup_subsys_state() or task_subsys_state() will ever return NULL, so here 's' won't be NULL. 
> + return container_of(s, struct hugetlb_cgroup, css); > + return NULL; > +} > + > +static inline > +struct hugetlb_cgroup *hugetlb_cgroup_from_cgroup(struct cgroup *cgroup) > +{ > + return hugetlb_cgroup_from_css(cgroup_subsys_state(cgroup, > + hugetlb_subsys_id)); > +} > + > +static inline > +struct hugetlb_cgroup *hugetlb_cgroup_from_task(struct task_struct *task) > +{ > + return hugetlb_cgroup_from_css(task_subsys_state(task, > + hugetlb_subsys_id)); > +} From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755428Ab2FNJDJ (ORCPT ); Thu, 14 Jun 2012 05:03:09 -0400 Received: from szxga02-in.huawei.com ([119.145.14.65]:38905 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751582Ab2FNJDI (ORCPT ); Thu, 14 Jun 2012 05:03:08 -0400 Message-ID: <4FD9A79D.9030303@huawei.com> Date: Thu, 14 Jun 2012 16:58:05 +0800 From: Li Zefan User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:11.0) Gecko/20120312 Thunderbird/11.0 MIME-Version: 1.0 To: "Aneesh Kumar K.V" CC: , , , , , , , , Subject: Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> In-Reply-To: <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Content-Type: text/plain; charset="GB2312" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.166.88.162] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, > + struct hugetlb_cgroup **ptr) > +{ > + int ret = 0; > + struct res_counter *fail_res; > + struct hugetlb_cgroup *h_cg = NULL; > + unsigned long csize = nr_pages * PAGE_SIZE; > + > + if (hugetlb_cgroup_disabled()) > + goto done; > + /* > + * We don't charge any cgroup if the 
compound page have less > + * than 3 pages. > + */ > + if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER) > + goto done; > +again: > + rcu_read_lock(); > + h_cg = hugetlb_cgroup_from_task(current); > + if (!h_cg) In no circumstances should h_cg be NULL. > + h_cg = root_h_cgroup; > + > + if (!css_tryget(&h_cg->css)) { > + rcu_read_unlock(); > + goto again; > + } > + rcu_read_unlock(); > + > + ret = res_counter_charge(&h_cg->hugepage[idx], csize, &fail_res); > + css_put(&h_cg->css); > +done: > + *ptr = h_cg; > + return ret; > +} From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755509Ab2FNJbI (ORCPT ); Thu, 14 Jun 2012 05:31:08 -0400 Received: from cantor2.suse.de ([195.135.220.15]:35207 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755460Ab2FNJbF (ORCPT ); Thu, 14 Jun 2012 05:31:05 -0400 Date: Thu, 14 Jun 2012 11:31:03 +0200 From: Michal Hocko To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V9 12/15] hugetlb/cgroup: Add support for cgroup removal Message-ID: <20120614093103.GJ27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-13-git-send-email-aneesh.kumar@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1339583254-895-13-git-send-email-aneesh.kumar@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 13-06-12 15:57:31, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > This patch add support for cgroup removal. If we don't have parent > cgroup, the charges are moved to root cgroup. 
> > Signed-off-by: Aneesh Kumar K.V Reviewed-by: Michal Hocko > --- > mm/hugetlb_cgroup.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 68 insertions(+), 2 deletions(-) > > diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c > index 0f2f6ac..a3a68a4 100644 > --- a/mm/hugetlb_cgroup.c > +++ b/mm/hugetlb_cgroup.c > @@ -107,10 +107,76 @@ static void hugetlb_cgroup_destroy(struct cgroup *cgroup) > kfree(h_cgroup); > } > > + > +/* > + * Should be called with hugetlb_lock held. > + * Since we are holding hugetlb_lock, pages cannot get moved from > + * active list or uncharged from the cgroup, So no need to get > + * page reference and test for page active here. This function > + * cannot fail. > + */ > +static void hugetlb_cgroup_move_parent(int idx, struct cgroup *cgroup, > + struct page *page) > +{ > + int csize; > + struct res_counter *counter; > + struct res_counter *fail_res; > + struct hugetlb_cgroup *page_hcg; > + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); > + struct hugetlb_cgroup *parent = parent_hugetlb_cgroup(cgroup); > + > + page_hcg = hugetlb_cgroup_from_page(page); > + /* > + * We can have pages in active list without any cgroup > + * ie, hugepage with less than 3 pages. We can safely > + * ignore those pages. > + */ > + if (!page_hcg || page_hcg != h_cg) > + goto out; > + > + csize = PAGE_SIZE << compound_order(page); > + if (!parent) { > + parent = root_h_cgroup; > + /* root has no limit */ > + res_counter_charge_nofail(&parent->hugepage[idx], > + csize, &fail_res); > + } > + counter = &h_cg->hugepage[idx]; > + res_counter_uncharge_until(counter, counter->parent, csize); > + > + set_hugetlb_cgroup(page, parent); > +out: > + return; > +} > + > +/* > + * Force the hugetlb cgroup to empty the hugetlb resources by moving them to > + * the parent cgroup. 
> + */ > static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup) > { > - /* We will add the cgroup removal support in later patches */ > - return -EBUSY; > + struct hstate *h; > + struct page *page; > + int ret = 0, idx = 0; > + > + do { > + if (cgroup_task_count(cgroup) || > + !list_empty(&cgroup->children)) { > + ret = -EBUSY; > + goto out; > + } > + for_each_hstate(h) { > + spin_lock(&hugetlb_lock); > + list_for_each_entry(page, &h->hugepage_activelist, lru) > + hugetlb_cgroup_move_parent(idx, cgroup, page); > + > + spin_unlock(&hugetlb_lock); > + idx++; > + } > + cond_resched(); > + } while (hugetlb_cgroup_have_usage(cgroup)); > +out: > + return ret; > } > > int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, > -- > 1.7.10 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755208Ab2FNJg6 (ORCPT ); Thu, 14 Jun 2012 05:36:58 -0400 Received: from cantor2.suse.de ([195.135.220.15]:35413 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754793Ab2FNJgz (ORCPT ); Thu, 14 Jun 2012 05:36:55 -0400 Date: Thu, 14 Jun 2012 11:36:52 +0200 From: Michal Hocko To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V9 13/15] hugetlb/cgroup: add hugetlb cgroup control files Message-ID: <20120614093652.GK27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-14-git-send-email-aneesh.kumar@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1339583254-895-14-git-send-email-aneesh.kumar@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: 
linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 13-06-12 15:57:32, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Add the control files for hugetlb controller > > Signed-off-by: Aneesh Kumar K.V Reviewed-by: Michal Hocko > --- > include/linux/hugetlb.h | 5 ++ > include/linux/hugetlb_cgroup.h | 6 ++ > mm/hugetlb.c | 8 +++ > mm/hugetlb_cgroup.c | 129 ++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 148 insertions(+) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 4aca057..9650bb1 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -4,6 +4,7 @@ > #include > #include > #include > +#include > > struct ctl_table; > struct user_struct; > @@ -221,6 +222,10 @@ struct hstate { > unsigned int nr_huge_pages_node[MAX_NUMNODES]; > unsigned int free_huge_pages_node[MAX_NUMNODES]; > unsigned int surplus_huge_pages_node[MAX_NUMNODES]; > +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR > + /* cgroup control files */ > + struct cftype cgroup_files[5]; > +#endif > char name[HSTATE_NAME_LEN]; > }; > > diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h > index e05871c..bd8bc98 100644 > --- a/include/linux/hugetlb_cgroup.h > +++ b/include/linux/hugetlb_cgroup.h > @@ -62,6 +62,7 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, > struct page *page); > extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, > struct hugetlb_cgroup *h_cg); > +extern int hugetlb_cgroup_file_init(int idx) __init; > > #else > static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) > @@ -108,5 +109,10 @@ hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, > return; > } > > +static inline int __init hugetlb_cgroup_file_init(int idx) > +{ > + return 0; > +} > + > #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ > #endif > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 59720b1..a5a30bf 100644 > --- 
a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -30,6 +30,7 @@ > #include > #include > #include > +#include > #include "internal.h" > > const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; > @@ -1930,6 +1931,13 @@ void __init hugetlb_add_hstate(unsigned order) > h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); > snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", > huge_page_size(h)/1024); > + /* > + * Add cgroup control files only if the huge page consists > + * of more than two normal pages. This is because we use > + * page[2].lru.next for storing cgoup details. > + */ > + if (order >= HUGETLB_CGROUP_MIN_ORDER) > + hugetlb_cgroup_file_init(hugetlb_max_hstate - 1); > > parsed_hstate = h; > } > diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c > index a3a68a4..64e93e0 100644 > --- a/mm/hugetlb_cgroup.c > +++ b/mm/hugetlb_cgroup.c > @@ -26,6 +26,10 @@ struct hugetlb_cgroup { > struct res_counter hugepage[HUGE_MAX_HSTATE]; > }; > > +#define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val)) > +#define MEMFILE_IDX(val) (((val) >> 16) & 0xffff) > +#define MEMFILE_ATTR(val) ((val) & 0xffff) > + > struct cgroup_subsys hugetlb_subsys __read_mostly; > struct hugetlb_cgroup *root_h_cgroup __read_mostly; > > @@ -259,6 +263,131 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, > return; > } > > +static ssize_t hugetlb_cgroup_read(struct cgroup *cgroup, struct cftype *cft, > + struct file *file, char __user *buf, > + size_t nbytes, loff_t *ppos) > +{ > + u64 val; > + char str[64]; > + int idx, name, len; > + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); > + > + idx = MEMFILE_IDX(cft->private); > + name = MEMFILE_ATTR(cft->private); > + > + val = res_counter_read_u64(&h_cg->hugepage[idx], name); > + len = scnprintf(str, sizeof(str), "%llu\n", (unsigned long long)val); > + return simple_read_from_buffer(buf, nbytes, ppos, str, len); > +} > + > +static int hugetlb_cgroup_write(struct cgroup *cgroup, struct cftype *cft, 
> + const char *buffer) > +{ > + int idx, name, ret; > + unsigned long long val; > + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); > + > + idx = MEMFILE_IDX(cft->private); > + name = MEMFILE_ATTR(cft->private); > + > + switch (name) { > + case RES_LIMIT: > + if (hugetlb_cgroup_is_root(h_cg)) { > + /* Can't set limit on root */ > + ret = -EINVAL; > + break; > + } > + /* This function does all necessary parse...reuse it */ > + ret = res_counter_memparse_write_strategy(buffer, &val); > + if (ret) > + break; > + ret = res_counter_set_limit(&h_cg->hugepage[idx], val); > + break; > + default: > + ret = -EINVAL; > + break; > + } > + return ret; > +} > + > +static int hugetlb_cgroup_reset(struct cgroup *cgroup, unsigned int event) > +{ > + int idx, name, ret = 0; > + struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup); > + > + idx = MEMFILE_IDX(event); > + name = MEMFILE_ATTR(event); > + > + switch (name) { > + case RES_MAX_USAGE: > + res_counter_reset_max(&h_cg->hugepage[idx]); > + break; > + case RES_FAILCNT: > + res_counter_reset_failcnt(&h_cg->hugepage[idx]); > + break; > + default: > + ret = -EINVAL; > + break; > + } > + return ret; > +} > + > +static char *mem_fmt(char *buf, int size, unsigned long hsize) > +{ > + if (hsize >= (1UL << 30)) > + snprintf(buf, size, "%luGB", hsize >> 30); > + else if (hsize >= (1UL << 20)) > + snprintf(buf, size, "%luMB", hsize >> 20); > + else > + snprintf(buf, size, "%luKB", hsize >> 10); > + return buf; > +} > + > +int __init hugetlb_cgroup_file_init(int idx) > +{ > + char buf[32]; > + struct cftype *cft; > + struct hstate *h = &hstates[idx]; > + > + /* format the size */ > + mem_fmt(buf, 32, huge_page_size(h)); > + > + /* Add the limit file */ > + cft = &h->cgroup_files[0]; > + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.limit_in_bytes", buf); > + cft->private = MEMFILE_PRIVATE(idx, RES_LIMIT); > + cft->read = hugetlb_cgroup_read; > + cft->write_string = hugetlb_cgroup_write; > + > + /* Add the usage 
file */ > + cft = &h->cgroup_files[1]; > + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.usage_in_bytes", buf); > + cft->private = MEMFILE_PRIVATE(idx, RES_USAGE); > + cft->read = hugetlb_cgroup_read; > + > + /* Add the MAX usage file */ > + cft = &h->cgroup_files[2]; > + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.max_usage_in_bytes", buf); > + cft->private = MEMFILE_PRIVATE(idx, RES_MAX_USAGE); > + cft->trigger = hugetlb_cgroup_reset; > + cft->read = hugetlb_cgroup_read; > + > + /* Add the failcntfile */ > + cft = &h->cgroup_files[3]; > + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.failcnt", buf); > + cft->private = MEMFILE_PRIVATE(idx, RES_FAILCNT); > + cft->trigger = hugetlb_cgroup_reset; > + cft->read = hugetlb_cgroup_read; > + > + /* NULL terminate the last cft */ > + cft = &h->cgroup_files[4]; > + memset(cft, 0, sizeof(*cft)); > + > + WARN_ON(cgroup_add_cftypes(&hugetlb_subsys, h->cgroup_files)); > + > + return 0; > +} > + > struct cgroup_subsys hugetlb_subsys = { > .name = "hugetlb", > .create = hugetlb_cgroup_create, > -- > 1.7.10 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. 
Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755577Ab2FNKE5 (ORCPT ); Thu, 14 Jun 2012 06:04:57 -0400 Received: from cantor2.suse.de ([195.135.220.15]:38221 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755541Ab2FNKE4 (ORCPT ); Thu, 14 Jun 2012 06:04:56 -0400 Date: Thu, 14 Jun 2012 12:04:54 +0200 From: Michal Hocko To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration Message-ID: <20120614100454.GL27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 13-06-12 15:57:33, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor. Since > we are holding a hugepage reference, we can be sure that old page won't > get uncharged till the last put_page(). > > Signed-off-by: Aneesh Kumar K.V Reviewed-by: Michal Hocko One question below [...] 
> +void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage) > +{ > + struct hugetlb_cgroup *h_cg; > + > + if (hugetlb_cgroup_disabled()) > + return; > + > + VM_BUG_ON(!PageHuge(oldhpage)); > + spin_lock(&hugetlb_lock); > + h_cg = hugetlb_cgroup_from_page(oldhpage); > + set_hugetlb_cgroup(oldhpage, NULL); > + cgroup_exclude_rmdir(&h_cg->css); > + > + /* move the h_cg details to new cgroup */ > + set_hugetlb_cgroup(newhpage, h_cg); > + spin_unlock(&hugetlb_lock); > + cgroup_release_and_wakeup_rmdir(&h_cg->css); > + return; > +} > + The changelog says that the old page won't get uncharged - which means that the cgroup cannot go away (even if we raced with the move parent, hugetlb_lock makes sure we either see old or new cgroup) so why do we need to play with css ref. counting? -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755609Ab2FNKH7 (ORCPT ); Thu, 14 Jun 2012 06:07:59 -0400 Received: from cantor2.suse.de ([195.135.220.15]:38536 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755571Ab2FNKH4 (ORCPT ); Thu, 14 Jun 2012 06:07:56 -0400 Date: Thu, 14 Jun 2012 12:07:55 +0200 From: Michal Hocko To: "Aneesh Kumar K.V" Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V9 15/15] hugetlb/cgroup: add HugeTLB controller documentation Message-ID: <20120614100755.GM27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-16-git-send-email-aneesh.kumar@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: 
<1339583254-895-16-git-send-email-aneesh.kumar@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 13-06-12 15:57:34, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Reviewed-by: KAMEZAWA Hiroyuki > Signed-off-by: Aneesh Kumar K.V Reviewed-by: Michal Hocko Minor nit below > --- > Documentation/cgroups/hugetlb.txt | 45 +++++++++++++++++++++++++++++++++++++ > 1 file changed, 45 insertions(+) > create mode 100644 Documentation/cgroups/hugetlb.txt > > diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroups/hugetlb.txt > new file mode 100644 > index 0000000..a9faaca > --- /dev/null > +++ b/Documentation/cgroups/hugetlb.txt [...] > +With the above step, the initial or the parent HugeTLB group becomes > +visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in > +the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup. > + > +New groups can be created under the parent group /sys/fs/cgroup. > + > +# cd /sys/fs/cgroup > +# mkdir g1 > +# echo $$ > g1/tasks > + > +The above steps create a new group g1 and move the current shell > +process (bash) into it. 
This is probably not needed as it is already described in the generic cgroups description. > + > +Brief summary of control files > + > + hugetlb.<hugepagesize>.limit_in_bytes # set/show limit of "hugepagesize" hugetlb usage > + hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded > + hugetlb.<hugepagesize>.usage_in_bytes # show current res_counter usage for "hugepagesize" hugetlb > + hugetlb.<hugepagesize>.failcnt # show the number of allocation failure due to HugeTLB limit > + > +For a system supporting two hugepage size (16M and 16G) the control > +files include: > + > +hugetlb.16GB.limit_in_bytes > +hugetlb.16GB.max_usage_in_bytes > +hugetlb.16GB.usage_in_bytes > +hugetlb.16GB.failcnt > +hugetlb.16MB.limit_in_bytes > +hugetlb.16MB.max_usage_in_bytes > +hugetlb.16MB.usage_in_bytes > +hugetlb.16MB.failcnt > -- > 1.7.10 > -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752677Ab2FOGVE (ORCPT ); Fri, 15 Jun 2012 02:21:04 -0400 Received: from e28smtp02.in.ibm.com ([122.248.162.2]:54582 "EHLO e28smtp02.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750972Ab2FOGVB (ORCPT ); Fri, 15 Jun 2012 02:21:01 -0400 From: "Aneesh Kumar K.V" To: Li Zefan Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup In-Reply-To: <4FD9A6B6.50503@huawei.com> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4FD9A6B6.50503@huawei.com> User-Agent: Notmuch/0.13.2+35~g0ff57e7 (http://notmuchmail.org) Emacs/24.1.50.1 (x86_64-unknown-linux-gnu) Date: Fri, 15 Jun 2012 11:50:52 +0530 Message-ID: 
<87mx45m6yj.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain x-cbid: 12061506-5816-0000-0000-000003289056 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Li Zefan writes: >> +static inline > >> +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s) >> +{ >> + if (s) > > > Neither cgroup_subsys_state() or task_subsys_state() will ever return NULL, > so here 's' won't be NULL. > That is a change that didn't get updated when I dropped the page_cgroup changes. I had a series that tracked cgroup_subsys_state in page_cgroup. I will send a fix on top. Thanks for the review. -aneesh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756622Ab2FOKGX (ORCPT ); Fri, 15 Jun 2012 06:06:23 -0400 Received: from e23smtp02.au.ibm.com ([202.81.31.144]:58942 "EHLO e23smtp02.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756247Ab2FOKGV (ORCPT ); Fri, 15 Jun 2012 06:06:21 -0400 From: "Aneesh Kumar K.V" To: Michal Hocko Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup In-Reply-To: <20120614092539.GI27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120614092539.GI27397@tiehlicka.suse.cz> User-Agent: Notmuch/0.13.2+35~g0ff57e7 (http://notmuchmail.org) Emacs/24.1.50.1 (x86_64-unknown-linux-gnu) Date: Fri, 15 Jun 2012 15:36:10 +0530 Message-ID: <87k3z8nb3h.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain x-cbid: 12061423-5490-0000-0000-000001993D76 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org Michal Hocko writes: > On Wed 13-06-12 15:57:30, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> This patchset add the charge and uncharge routines for hugetlb cgroup. >> We do cgroup charging in page alloc and uncharge in compound page >> destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock. >> >> Signed-off-by: Aneesh Kumar K.V > > Reviewed-by: Michal Hocko > > One minor comment > [...] >> +void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, >> + struct hugetlb_cgroup *h_cg, >> + struct page *page) >> +{ >> + if (hugetlb_cgroup_disabled() || !h_cg) >> + return; >> + >> + spin_lock(&hugetlb_lock); >> + set_hugetlb_cgroup(page, h_cg); >> + spin_unlock(&hugetlb_lock); >> + return; >> +} > > I guess we can remove the lock here because nobody can see the page yet, > right? > We need that to make sure that when we remove a cgroup we find the correct hugetlb cgroup value for each page. But I guess we have a bug here. How about the patch below? NOTE: We also need another patch to update the active list during soft offline. I will send that in reply. commit e4c3fd3cc0f0faa30ea283cb48ba478a5c0d3e74 Author: Aneesh Kumar K.V Date: Fri Jun 15 14:42:27 2012 +0530 hugetlb/cgroup: Assign the page hugetlb cgroup when we move the page to active list. page's hugetlb cgroup assign and moving to active list should happen with hugetlb_lock held. Otherwise when we remove the hugetlb cgroup we would iterate the active list and will find page with NULL hugetlb cgroup values. 
Signed-off-by: Aneesh Kumar K.V diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ee4da3b..b90dfb4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1146,9 +1146,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, } spin_lock(&hugetlb_lock); page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve); - spin_unlock(&hugetlb_lock); - - if (!page) { + if (page) { + /* update page cgroup details */ + hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); + spin_unlock(&hugetlb_lock); + } else { + spin_unlock(&hugetlb_lock); page = alloc_buddy_huge_page(h, NUMA_NO_NODE); if (!page) { hugetlb_cgroup_uncharge_cgroup(idx, @@ -1159,14 +1162,13 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, } spin_lock(&hugetlb_lock); list_move(&page->lru, &h->hugepage_activelist); + hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); spin_unlock(&hugetlb_lock); } set_page_private(page, (unsigned long)spool); vma_commit_reservation(h, vma, addr); - /* update page cgroup details */ - hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); return page; } diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 8e7ca0a..d4f3f7b 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -218,6 +218,7 @@ done: return ret; } +/* Should be called with hugetlb_lock held */ void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg, struct page *page) @@ -225,9 +226,7 @@ void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, if (hugetlb_cgroup_disabled() || !h_cg) return; - spin_lock(&hugetlb_lock); set_hugetlb_cgroup(page, h_cg); - spin_unlock(&hugetlb_lock); return; } From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756714Ab2FOKum (ORCPT ); Fri, 15 Jun 2012 06:50:42 -0400 Received: from e23smtp05.au.ibm.com ([202.81.31.147]:48339 "EHLO e23smtp05.au.ibm.com" rhost-flags-OK-OK-OK-OK) by 
vger.kernel.org with ESMTP id S1756546Ab2FOKuk (ORCPT ); Fri, 15 Jun 2012 06:50:40 -0400 From: "Aneesh Kumar K.V" To: Michal Hocko Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, akpm@linux-foundation.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration In-Reply-To: <20120614100454.GL27397@tiehlicka.suse.cz> References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-15-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20120614100454.GL27397@tiehlicka.suse.cz> User-Agent: Notmuch/0.13.2+35~g0ff57e7 (http://notmuchmail.org) Emacs/24.1.50.1 (x86_64-unknown-linux-gnu) Date: Fri, 15 Jun 2012 16:20:31 +0530 Message-ID: <87haucn91k.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain x-cbid: 12061500-1396-0000-0000-000001656A8E Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Michal Hocko writes: > On Wed 13-06-12 15:57:33, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" >> >> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor. Since >> we are holding a hugepage reference, we can be sure that old page won't >> get uncharged till the last put_page(). >> >> Signed-off-by: Aneesh Kumar K.V > > Reviewed-by: Michal Hocko > > One question below > [...] 
>> +void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
>> +{
>> +	struct hugetlb_cgroup *h_cg;
>> +
>> +	if (hugetlb_cgroup_disabled())
>> +		return;
>> +
>> +	VM_BUG_ON(!PageHuge(oldhpage));
>> +	spin_lock(&hugetlb_lock);
>> +	h_cg = hugetlb_cgroup_from_page(oldhpage);
>> +	set_hugetlb_cgroup(oldhpage, NULL);
>> +	cgroup_exclude_rmdir(&h_cg->css);
>> +
>> +	/* move the h_cg details to new cgroup */
>> +	set_hugetlb_cgroup(newhpage, h_cg);
>> +	spin_unlock(&hugetlb_lock);
>> +	cgroup_release_and_wakeup_rmdir(&h_cg->css);
>> +	return;
>> +}
>> +
>
> The changelog says that the old page won't get uncharged - which means
> that the cgroup cannot go away (even if we raced with the move
> parent, hugetlb_lock makes sure we either see old or new cgroup) so why
> do we need to play with css ref. counting?

OK, hugetlb_lock should be sufficient here, I guess. I will send a patch on
top to remove the exclude_rmdir and release_and_wakeup_rmdir calls.

-aneesh

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758793Ab2FVWLZ (ORCPT ); Fri, 22 Jun 2012 18:11:25 -0400
Received: from mail.linuxfoundation.org ([140.211.169.12]:51428 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754339Ab2FVWLX (ORCPT ); Fri, 22 Jun 2012 18:11:23 -0400
Date: Fri, 22 Jun 2012 15:11:21 -0700
From: Andrew Morton
To: Li Zefan
Cc: "Aneesh Kumar K.V" , , , , , , , ,
Subject: Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
Message-Id: <20120622151121.917178eb.akpm@linux-foundation.org>
In-Reply-To: <4FD9A79D.9030303@huawei.com>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4FD9A79D.9030303@huawei.com>
X-Mailer: Sylpheed 3.0.2 (GTK+ 2.20.1; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 14 Jun 2012 16:58:05 +0800
Li Zefan wrote:

> > +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
>
> > +				 struct hugetlb_cgroup **ptr)
> > +{
> > +	int ret = 0;
> > +	struct res_counter *fail_res;
> > +	struct hugetlb_cgroup *h_cg = NULL;
> > +	unsigned long csize = nr_pages * PAGE_SIZE;
> > +
> > +	if (hugetlb_cgroup_disabled())
> > +		goto done;
> > +	/*
> > +	 * We don't charge any cgroup if the compound page have less
> > +	 * than 3 pages.
> > +	 */
> > +	if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER)
> > +		goto done;
> > +again:
> > +	rcu_read_lock();
> > +	h_cg = hugetlb_cgroup_from_task(current);
> > +	if (!h_cg)
>
>
> In no circumstances should h_cg be NULL.
>

Aneesh?

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753091Ab2FXQpF (ORCPT ); Sun, 24 Jun 2012 12:45:05 -0400
Received: from e28smtp02.in.ibm.com ([122.248.162.2]:34739 "EHLO e28smtp02.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751910Ab2FXQpD (ORCPT ); Sun, 24 Jun 2012 12:45:03 -0400
From: "Aneesh Kumar K.V"
To: Andrew Morton , Li Zefan
Cc: linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com, dhillf@gmail.com, rientjes@google.com, mhocko@suse.cz, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
In-Reply-To: <20120622151121.917178eb.akpm@linux-foundation.org>
References: <1339583254-895-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1339583254-895-12-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <4FD9A79D.9030303@huawei.com> <20120622151121.917178eb.akpm@linux-foundation.org>
User-Agent: Notmuch/0.13.2+63~g548a9bf (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu)
Date: Sun, 24 Jun 2012 22:14:51 +0530
Message-ID: <87txy07j7g.fsf@skywalker.in.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
x-cbid: 12062416-5816-0000-0000-0000034A0DD6
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Andrew,

Andrew Morton writes:

> On Thu, 14 Jun 2012 16:58:05 +0800
> Li Zefan wrote:
>
>> > +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
>>
>> > +				 struct hugetlb_cgroup **ptr)
>> > +{
>> > +	int ret = 0;
>> > +	struct res_counter *fail_res;
>> > +	struct hugetlb_cgroup *h_cg = NULL;
>> > +	unsigned long csize = nr_pages * PAGE_SIZE;
>> > +
>> > +	if (hugetlb_cgroup_disabled())
>> > +		goto done;
>> > +	/*
>> > +	 * We don't charge any cgroup if the compound page have less
>> > +	 * than 3 pages.
>> > +	 */
>> > +	if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER)
>> > +		goto done;
>> > +again:
>> > +	rcu_read_lock();
>> > +	h_cg = hugetlb_cgroup_from_task(current);
>> > +	if (!h_cg)
>>
>>
>> In no circumstances should h_cg be NULL.
>>
>
> Aneesh?

I missed this in the last review. Thanks for the reminder. I will send a
patch addressing this and another related comment in
4FD9A6B6.50503@huawei.com as a separate mail.

-aneesh