* [PATCH -V7 01/14] hugetlb: rename max_hstate to hugetlb_max_hstate
2012-05-30 14:38 [PATCH -V7 00/14] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
@ 2012-05-30 14:38 ` Aneesh Kumar K.V
[not found] ` <1338388739-22919-2-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2012-05-31 0:55 ` David Rientjes
2012-05-30 14:38 ` [PATCH -V7 02/14] hugetlbfs: don't use ERR_PTR with VM_FAULT* values Aneesh Kumar K.V
` (12 subsequent siblings)
13 siblings, 2 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-30 14:38 UTC (permalink / raw)
To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
Cc: linux-kernel, cgroups, Aneesh Kumar K.V, Andrea Arcangeli
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Rename max_hstate to hugetlb_max_hstate. We will be using this from other
subsystems like hugetlb controller in later patches.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
mm/hugetlb.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 285a81e..e07d4cd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
unsigned long hugepages_treat_as_movable;
-static int max_hstate;
+static int hugetlb_max_hstate;
unsigned int default_hstate_idx;
struct hstate hstates[HUGE_MAX_HSTATE];
@@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
static unsigned long __initdata default_hstate_size;
#define for_each_hstate(h) \
- for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
+ for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
/*
* Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
@@ -1897,9 +1897,9 @@ void __init hugetlb_add_hstate(unsigned order)
printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
return;
}
- BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
+ BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
BUG_ON(order == 0);
- h = &hstates[max_hstate++];
+ h = &hstates[hugetlb_max_hstate++];
h->order = order;
h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
h->nr_huge_pages = 0;
@@ -1920,10 +1920,10 @@ static int __init hugetlb_nrpages_setup(char *s)
static unsigned long *last_mhp;
/*
- * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
+ * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet,
* so this hugepages= parameter goes to the "default hstate".
*/
- if (!max_hstate)
+ if (!hugetlb_max_hstate)
mhp = &default_hstate_max_huge_pages;
else
mhp = &parsed_hstate->max_huge_pages;
@@ -1942,7 +1942,7 @@ static int __init hugetlb_nrpages_setup(char *s)
* But we need to allocate >= MAX_ORDER hstates here early to still
* use the bootmem allocator.
*/
- if (max_hstate && parsed_hstate->order >= MAX_ORDER)
+ if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER)
hugetlb_hstate_alloc_pages(parsed_hstate);
last_mhp = mhp;
--
1.7.10
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 37+ messages in thread[parent not found: <1338388739-22919-2-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH -V7 01/14] hugetlb: rename max_hstate to hugetlb_max_hstate
[not found] ` <1338388739-22919-2-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2012-05-31 0:48 ` Konrad Rzeszutek Wilk
2012-05-31 5:47 ` Aneesh Kumar K.V
0 siblings, 1 reply; 37+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-05-31 0:48 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
mhocko-AlSwsSmVLrQ, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
hannes-druUgvl0LCNAfugRpC6u6w,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, Andrea Arcangeli
On Wed, May 30, 2012 at 08:08:46PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>
> Rename max_hstate to hugetlb_max_hstate. We will be using this from other
> subsystems like hugetlb controller in later patches.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> Acked-by: Hillf Danton <dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> Cc: Andrea Arcangeli <aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Your SOB needs to be the last thing.
> ---
> mm/hugetlb.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 285a81e..e07d4cd 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
> static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
> unsigned long hugepages_treat_as_movable;
>
> -static int max_hstate;
> +static int hugetlb_max_hstate;
> unsigned int default_hstate_idx;
> struct hstate hstates[HUGE_MAX_HSTATE];
>
> @@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
> static unsigned long __initdata default_hstate_size;
>
> #define for_each_hstate(h) \
> - for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
> + for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
>
> /*
> * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
> @@ -1897,9 +1897,9 @@ void __init hugetlb_add_hstate(unsigned order)
> printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
> return;
> }
> - BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
> + BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
> BUG_ON(order == 0);
> - h = &hstates[max_hstate++];
> + h = &hstates[hugetlb_max_hstate++];
> h->order = order;
> h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
> h->nr_huge_pages = 0;
> @@ -1920,10 +1920,10 @@ static int __init hugetlb_nrpages_setup(char *s)
> static unsigned long *last_mhp;
>
> /*
> - * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
> + * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet,
> * so this hugepages= parameter goes to the "default hstate".
> */
> - if (!max_hstate)
> + if (!hugetlb_max_hstate)
> mhp = &default_hstate_max_huge_pages;
> else
> mhp = &parsed_hstate->max_huge_pages;
> @@ -1942,7 +1942,7 @@ static int __init hugetlb_nrpages_setup(char *s)
> * But we need to allocate >= MAX_ORDER hstates here early to still
> * use the bootmem allocator.
> */
> - if (max_hstate && parsed_hstate->order >= MAX_ORDER)
> + if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER)
> hugetlb_hstate_alloc_pages(parsed_hstate);
>
> last_mhp = mhp;
> --
> 1.7.10
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a>
>
^ permalink raw reply [flat|nested] 37+ messages in thread* Re: [PATCH -V7 01/14] hugetlb: rename max_hstate to hugetlb_max_hstate
2012-05-31 0:48 ` Konrad Rzeszutek Wilk
@ 2012-05-31 5:47 ` Aneesh Kumar K.V
0 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-31 5:47 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes,
linux-kernel, cgroups, Andrea Arcangeli
On Wed, May 30, 2012 at 08:48:40PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, May 30, 2012 at 08:08:46PM +0530, Aneesh Kumar K.V wrote:
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >
> > Rename max_hstate to hugetlb_max_hstate. We will be using this from other
> > subsystems like hugetlb controller in later patches.
> >
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Acked-by: Hillf Danton <dhillf@gmail.com>
> > Acked-by: Michal Hocko <mhocko@suse.cz>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
>
> Your SOB needs to be the last thing.
I started with patches in -next, because Andrew had few fixup on top of last
patch series. So I ended up with this format. I have fixed them locally now
-aneesh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH -V7 01/14] hugetlb: rename max_hstate to hugetlb_max_hstate
2012-05-30 14:38 ` [PATCH -V7 01/14] hugetlb: rename max_hstate to hugetlb_max_hstate Aneesh Kumar K.V
[not found] ` <1338388739-22919-2-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2012-05-31 0:55 ` David Rientjes
1 sibling, 0 replies; 37+ messages in thread
From: David Rientjes @ 2012-05-31 0:55 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, kamezawa.hiroyu, dhillf, mhocko, akpm, hannes,
linux-kernel, cgroups, Andrea Arcangeli
On Wed, 30 May 2012, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> Rename max_hstate to hugetlb_max_hstate. We will be using this from other
> subsystems like hugetlb controller in later patches.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Acked-by: Hillf Danton <dhillf@gmail.com>
> Acked-by: Michal Hocko <mhocko@suse.cz>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Rientjes <rientjes@google.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH -V7 02/14] hugetlbfs: don't use ERR_PTR with VM_FAULT* values
2012-05-30 14:38 [PATCH -V7 00/14] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
2012-05-30 14:38 ` [PATCH -V7 01/14] hugetlb: rename max_hstate to hugetlb_max_hstate Aneesh Kumar K.V
@ 2012-05-30 14:38 ` Aneesh Kumar K.V
[not found] ` <1338388739-22919-3-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2012-05-31 1:02 ` David Rientjes
2012-05-30 14:38 ` [PATCH -V7 03/14] hugetlbfs: add an inline helper for finding hstate index Aneesh Kumar K.V
` (11 subsequent siblings)
13 siblings, 2 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-30 14:38 UTC (permalink / raw)
To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
Cc: linux-kernel, cgroups, Aneesh Kumar K.V, Andrea Arcangeli
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
The current use of VM_FAULT_* codes with ERR_PTR requires us to ensure
VM_FAULT_* values will not exceed MAX_ERRNO value. Decouple the
VM_FAULT_* values from MAX_ERRNO.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
mm/hugetlb.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e07d4cd..8ded02d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1123,10 +1123,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
*/
chg = vma_needs_reservation(h, vma, addr);
if (chg < 0)
- return ERR_PTR(-VM_FAULT_OOM);
+ return ERR_PTR(-ENOMEM);
if (chg)
if (hugepage_subpool_get_pages(spool, chg))
- return ERR_PTR(-VM_FAULT_SIGBUS);
+ return ERR_PTR(-ENOSPC);
spin_lock(&hugetlb_lock);
page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
@@ -1136,7 +1136,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
if (!page) {
hugepage_subpool_put_pages(spool, chg);
- return ERR_PTR(-VM_FAULT_SIGBUS);
+ return ERR_PTR(-ENOSPC);
}
}
@@ -2496,6 +2496,7 @@ retry_avoidcopy:
new_page = alloc_huge_page(vma, address, outside_reserve);
if (IS_ERR(new_page)) {
+ int err = PTR_ERR(new_page);
page_cache_release(old_page);
/*
@@ -2524,7 +2525,10 @@ retry_avoidcopy:
/* Caller expects lock to be held */
spin_lock(&mm->page_table_lock);
- return -PTR_ERR(new_page);
+ if (err == -ENOMEM)
+ return VM_FAULT_OOM;
+ else
+ return VM_FAULT_SIGBUS;
}
/*
@@ -2642,7 +2646,11 @@ retry:
goto out;
page = alloc_huge_page(vma, address, 0);
if (IS_ERR(page)) {
- ret = -PTR_ERR(page);
+ ret = PTR_ERR(page);
+ if (ret == -ENOMEM)
+ ret = VM_FAULT_OOM;
+ else
+ ret = VM_FAULT_SIGBUS;
goto out;
}
clear_huge_page(page, address, pages_per_huge_page(h));
--
1.7.10
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 37+ messages in thread[parent not found: <1338388739-22919-3-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH -V7 02/14] hugetlbfs: don't use ERR_PTR with VM_FAULT* values
[not found] ` <1338388739-22919-3-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2012-05-31 0:54 ` Konrad Rzeszutek Wilk
0 siblings, 0 replies; 37+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-05-31 0:54 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
mhocko-AlSwsSmVLrQ, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
hannes-druUgvl0LCNAfugRpC6u6w,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, Andrea Arcangeli
On Wed, May 30, 2012 at 08:08:47PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>
> The current use of VM_FAULT_* codes with ERR_PTR requires us to ensure
> VM_FAULT_* values will not exceed MAX_ERRNO value. Decouple the
> VM_FAULT_* values from MAX_ERRNO.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> Cc: Hillf Danton <dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> Cc: Andrea Arcangeli <aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
> ---
> mm/hugetlb.c | 18 +++++++++++++-----
> 1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e07d4cd..8ded02d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1123,10 +1123,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
> */
> chg = vma_needs_reservation(h, vma, addr);
> if (chg < 0)
> - return ERR_PTR(-VM_FAULT_OOM);
> + return ERR_PTR(-ENOMEM);
> if (chg)
> if (hugepage_subpool_get_pages(spool, chg))
> - return ERR_PTR(-VM_FAULT_SIGBUS);
> + return ERR_PTR(-ENOSPC);
Not enough space? Why not just pass what 'hugepage_subpool_get_pages'
returns?
>
> spin_lock(&hugetlb_lock);
> page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
> @@ -1136,7 +1136,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
> page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
> if (!page) {
> hugepage_subpool_put_pages(spool, chg);
> - return ERR_PTR(-VM_FAULT_SIGBUS);
> + return ERR_PTR(-ENOSPC);
-ENOMEM seems more appropiate?
> }
> }
>
> @@ -2496,6 +2496,7 @@ retry_avoidcopy:
> new_page = alloc_huge_page(vma, address, outside_reserve);
>
> if (IS_ERR(new_page)) {
> + int err = PTR_ERR(new_page);
> page_cache_release(old_page);
>
> /*
> @@ -2524,7 +2525,10 @@ retry_avoidcopy:
>
> /* Caller expects lock to be held */
> spin_lock(&mm->page_table_lock);
> - return -PTR_ERR(new_page);
> + if (err == -ENOMEM)
> + return VM_FAULT_OOM;
> + else
> + return VM_FAULT_SIGBUS;
Ah, you are doing it to translate it.
Perhaps you should return -EFAULT for the really bad case
where you need to do OOM and then for all the other cases
return SIGBUS? Or maybe the other way around? ENOSPC doesn't
seem like the right error.
> }
>
> /*
> @@ -2642,7 +2646,11 @@ retry:
> goto out;
> page = alloc_huge_page(vma, address, 0);
> if (IS_ERR(page)) {
> - ret = -PTR_ERR(page);
> + ret = PTR_ERR(page);
> + if (ret == -ENOMEM)
> + ret = VM_FAULT_OOM;
> + else
> + ret = VM_FAULT_SIGBUS;
> goto out;
> }
> clear_huge_page(page, address, pages_per_huge_page(h));
> --
> 1.7.10
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH -V7 02/14] hugetlbfs: don't use ERR_PTR with VM_FAULT* values
2012-05-30 14:38 ` [PATCH -V7 02/14] hugetlbfs: don't use ERR_PTR with VM_FAULT* values Aneesh Kumar K.V
[not found] ` <1338388739-22919-3-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2012-05-31 1:02 ` David Rientjes
2012-05-31 5:45 ` Aneesh Kumar K.V
1 sibling, 1 reply; 37+ messages in thread
From: David Rientjes @ 2012-05-31 1:02 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, kamezawa.hiroyu, dhillf, mhocko, akpm, hannes,
linux-kernel, cgroups, Andrea Arcangeli
On Wed, 30 May 2012, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> The current use of VM_FAULT_* codes with ERR_PTR requires us to ensure
> VM_FAULT_* values will not exceed MAX_ERRNO value. Decouple the
> VM_FAULT_* values from MAX_ERRNO.
>
Yeah, but is there a reason for using VM_FAULT_HWPOISON_LARGE_MASK since
that's the only VM_FAULT_* value that is greater than MAX_ERRNO? The rest
of your patch set doesn't require this, so I think this change should just
be dropped. (And PTR_ERR() still returns long, this wasn't fixed from my
original review.)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 37+ messages in thread* Re: [PATCH -V7 02/14] hugetlbfs: don't use ERR_PTR with VM_FAULT* values
2012-05-31 1:02 ` David Rientjes
@ 2012-05-31 5:45 ` Aneesh Kumar K.V
2012-05-31 6:50 ` David Rientjes
0 siblings, 1 reply; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-31 5:45 UTC (permalink / raw)
To: David Rientjes
Cc: linux-mm, kamezawa.hiroyu, dhillf, mhocko, akpm, hannes,
linux-kernel, cgroups, Andrea Arcangeli
On Wed, May 30, 2012 at 06:02:59PM -0700, David Rientjes wrote:
> On Wed, 30 May 2012, Aneesh Kumar K.V wrote:
>
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >
> > The current use of VM_FAULT_* codes with ERR_PTR requires us to ensure
> > VM_FAULT_* values will not exceed MAX_ERRNO value. Decouple the
> > VM_FAULT_* values from MAX_ERRNO.
> >
>
> Yeah, but is there a reason for using VM_FAULT_HWPOISON_LARGE_MASK since
> that's the only VM_FAULT_* value that is greater than MAX_ERRNO? The rest
> of your patch set doesn't require this, so I think this change should just
> be dropped. (And PTR_ERR() still returns long, this wasn't fixed from my
> original review.)
>
The changes was done as per Andrew's request so that we don't have such hidden
dependencies on the values of VM_FAULT_*. Yes it can be a seperate patch from
the patchset. I have changed int to long as per your review.
-aneesh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH -V7 02/14] hugetlbfs: don't use ERR_PTR with VM_FAULT* values
2012-05-31 5:45 ` Aneesh Kumar K.V
@ 2012-05-31 6:50 ` David Rientjes
0 siblings, 0 replies; 37+ messages in thread
From: David Rientjes @ 2012-05-31 6:50 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, kamezawa.hiroyu, dhillf, mhocko, akpm, hannes,
linux-kernel, cgroups, Andrea Arcangeli
On Thu, 31 May 2012, Aneesh Kumar K.V wrote:
> > Yeah, but is there a reason for using VM_FAULT_HWPOISON_LARGE_MASK since
> > that's the only VM_FAULT_* value that is greater than MAX_ERRNO? The rest
> > of your patch set doesn't require this, so I think this change should just
> > be dropped. (And PTR_ERR() still returns long, this wasn't fixed from my
> > original review.)
> >
>
> The changes was done as per Andrew's request so that we don't have such hidden
> dependencies on the values of VM_FAULT_*. Yes it can be a seperate patch from
> the patchset. I have changed int to long as per your review.
>
I think it confuscates the code, can't we just add something like
BUILD_BUG_ON() to ensure that PTR_ERR() never uses values that are outside
the bounds of MAX_ERRNO so we'll catch these at compile time if
mm/hugetlb.c or anything else is ever extended to use such values?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH -V7 03/14] hugetlbfs: add an inline helper for finding hstate index
2012-05-30 14:38 [PATCH -V7 00/14] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
2012-05-30 14:38 ` [PATCH -V7 01/14] hugetlb: rename max_hstate to hugetlb_max_hstate Aneesh Kumar K.V
2012-05-30 14:38 ` [PATCH -V7 02/14] hugetlbfs: don't use ERR_PTR with VM_FAULT* values Aneesh Kumar K.V
@ 2012-05-30 14:38 ` Aneesh Kumar K.V
2012-05-31 1:05 ` David Rientjes
2012-05-30 14:38 ` [PATCH -V7 04/14] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages Aneesh Kumar K.V
` (10 subsequent siblings)
13 siblings, 1 reply; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-30 14:38 UTC (permalink / raw)
To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
Cc: linux-kernel, cgroups, Aneesh Kumar K.V, Andrea Arcangeli
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Add an inline helper and use it in the code.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/hugetlb.h | 6 ++++++
mm/hugetlb.c | 20 +++++++++++---------
2 files changed, 17 insertions(+), 9 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d5d6bbe..217f528 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -302,6 +302,11 @@ static inline unsigned hstate_index_to_shift(unsigned index)
return hstates[index].order + PAGE_SHIFT;
}
+static inline int hstate_index(struct hstate *h)
+{
+ return h - hstates;
+}
+
#else
struct hstate {};
#define alloc_huge_page_node(h, nid) NULL
@@ -320,6 +325,7 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
return 1;
}
#define hstate_index_to_shift(index) 0
+#define hstate_index(h) 0
#endif
#endif /* _LINUX_HUGETLB_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8ded02d..9b97a5c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1646,7 +1646,7 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
struct attribute_group *hstate_attr_group)
{
int retval;
- int hi = h - hstates;
+ int hi = hstate_index(h);
hstate_kobjs[hi] = kobject_create_and_add(h->name, parent);
if (!hstate_kobjs[hi])
@@ -1741,11 +1741,13 @@ void hugetlb_unregister_node(struct node *node)
if (!nhs->hugepages_kobj)
return; /* no hstate attributes */
- for_each_hstate(h)
- if (nhs->hstate_kobjs[h - hstates]) {
- kobject_put(nhs->hstate_kobjs[h - hstates]);
- nhs->hstate_kobjs[h - hstates] = NULL;
+ for_each_hstate(h) {
+ int idx = hstate_index(h);
+ if (nhs->hstate_kobjs[idx]) {
+ kobject_put(nhs->hstate_kobjs[idx]);
+ nhs->hstate_kobjs[idx] = NULL;
}
+ }
kobject_put(nhs->hugepages_kobj);
nhs->hugepages_kobj = NULL;
@@ -1848,7 +1850,7 @@ static void __exit hugetlb_exit(void)
hugetlb_unregister_all_nodes();
for_each_hstate(h) {
- kobject_put(hstate_kobjs[h - hstates]);
+ kobject_put(hstate_kobjs[hstate_index(h)]);
}
kobject_put(hugepages_kobj);
@@ -1869,7 +1871,7 @@ static int __init hugetlb_init(void)
if (!size_to_hstate(default_hstate_size))
hugetlb_add_hstate(HUGETLB_PAGE_ORDER);
}
- default_hstate_idx = size_to_hstate(default_hstate_size) - hstates;
+ default_hstate_idx = hstate_index(size_to_hstate(default_hstate_size));
if (default_hstate_max_huge_pages)
default_hstate.max_huge_pages = default_hstate_max_huge_pages;
@@ -2687,7 +2689,7 @@ retry:
*/
if (unlikely(PageHWPoison(page))) {
ret = VM_FAULT_HWPOISON |
- VM_FAULT_SET_HINDEX(h - hstates);
+ VM_FAULT_SET_HINDEX(hstate_index(h));
goto backout_unlocked;
}
}
@@ -2760,7 +2762,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
return 0;
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
return VM_FAULT_HWPOISON_LARGE |
- VM_FAULT_SET_HINDEX(h - hstates);
+ VM_FAULT_SET_HINDEX(hstate_index(h));
}
ptep = huge_pte_alloc(mm, address, huge_page_size(h));
--
1.7.10
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 37+ messages in thread* Re: [PATCH -V7 03/14] hugetlbfs: add an inline helper for finding hstate index
2012-05-30 14:38 ` [PATCH -V7 03/14] hugetlbfs: add an inline helper for finding hstate index Aneesh Kumar K.V
@ 2012-05-31 1:05 ` David Rientjes
0 siblings, 0 replies; 37+ messages in thread
From: David Rientjes @ 2012-05-31 1:05 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, kamezawa.hiroyu, dhillf, mhocko, akpm, hannes,
linux-kernel, cgroups, Andrea Arcangeli
On Wed, 30 May 2012, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> Add an inline helper and use it in the code.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Acked-by: Michal Hocko <mhocko@suse.cz>
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Hillf Danton <dhillf@gmail.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Rientjes <rientjes@google.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH -V7 04/14] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
2012-05-30 14:38 [PATCH -V7 00/14] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
` (2 preceding siblings ...)
2012-05-30 14:38 ` [PATCH -V7 03/14] hugetlbfs: add an inline helper for finding hstate index Aneesh Kumar K.V
@ 2012-05-30 14:38 ` Aneesh Kumar K.V
[not found] ` <1338388739-22919-5-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2012-05-30 14:38 ` [PATCH -V7 05/14] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb Aneesh Kumar K.V
` (9 subsequent siblings)
13 siblings, 1 reply; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-30 14:38 UTC (permalink / raw)
To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
Cc: linux-kernel, cgroups, Aneesh Kumar K.V, Andrea Arcangeli
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Use an mmu_gather instead of a temporary linked list for accumulating
pages when we unmap a hugepage range. This also allows us to get rid of
i_mmap_mutex unmap_hugepage_range in the following patch.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
fs/hugetlbfs/inode.c | 4 ++--
include/linux/hugetlb.h | 22 ++++++++++++++----
mm/hugetlb.c | 59 ++++++++++++++++++++++++++++-------------------
mm/memory.c | 7 ++++--
4 files changed, 59 insertions(+), 33 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index cc9281b..ff233e4 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -416,8 +416,8 @@ hugetlb_vmtruncate_list(struct prio_tree_root *root, pgoff_t pgoff)
else
v_offset = 0;
- __unmap_hugepage_range(vma,
- vma->vm_start + v_offset, vma->vm_end, NULL);
+ unmap_hugepage_range(vma, vma->vm_start + v_offset,
+ vma->vm_end, NULL);
}
}
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 217f528..c21e136 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -7,6 +7,7 @@
struct ctl_table;
struct user_struct;
+struct mmu_gather;
#ifdef CONFIG_HUGETLB_PAGE
@@ -40,9 +41,10 @@ int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
struct page **, struct vm_area_struct **,
unsigned long *, int *, int, unsigned int flags);
void unmap_hugepage_range(struct vm_area_struct *,
- unsigned long, unsigned long, struct page *);
-void __unmap_hugepage_range(struct vm_area_struct *,
- unsigned long, unsigned long, struct page *);
+ unsigned long, unsigned long, struct page *);
+void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vms,
+ unsigned long start, unsigned long end,
+ struct page *ref_page);
int hugetlb_prefault(struct address_space *, struct vm_area_struct *);
void hugetlb_report_meminfo(struct seq_file *);
int hugetlb_report_node_meminfo(int, char *);
@@ -98,7 +100,6 @@ static inline unsigned long hugetlb_total_pages(void)
#define follow_huge_addr(mm, addr, write) ERR_PTR(-EINVAL)
#define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; })
#define hugetlb_prefault(mapping, vma) ({ BUG(); 0; })
-#define unmap_hugepage_range(vma, start, end, page) BUG()
static inline void hugetlb_report_meminfo(struct seq_file *m)
{
}
@@ -112,13 +113,24 @@ static inline void hugetlb_report_meminfo(struct seq_file *m)
#define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; })
#define hugetlb_fault(mm, vma, addr, flags) ({ BUG(); 0; })
#define huge_pte_offset(mm, address) 0
-#define dequeue_hwpoisoned_huge_page(page) 0
+static inline int dequeue_hwpoisoned_huge_page(struct page *page)
+{
+ return 0;
+}
+
static inline void copy_huge_page(struct page *dst, struct page *src)
{
}
#define hugetlb_change_protection(vma, address, end, newprot)
+static inline void __unmap_hugepage_range(struct mmu_gather *tlb,
+ struct vm_area_struct *vma, unsigned long start,
+ unsigned long end, struct page *ref_page)
+{
+ BUG();
+}
+
#endif /* !CONFIG_HUGETLB_PAGE */
#define HUGETLB_ANON_FILE "anon_hugepage"
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9b97a5c..704a269 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -24,8 +24,9 @@
#include <asm/page.h>
#include <asm/pgtable.h>
-#include <linux/io.h>
+#include <asm/tlb.h>
+#include <linux/io.h>
#include <linux/hugetlb.h>
#include <linux/node.h>
#include "internal.h"
@@ -2310,30 +2311,26 @@ static int is_hugetlb_entry_hwpoisoned(pte_t pte)
return 0;
}
-void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, struct page *ref_page)
+void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ struct page *ref_page)
{
+ int force_flush = 0;
struct mm_struct *mm = vma->vm_mm;
unsigned long address;
pte_t *ptep;
pte_t pte;
struct page *page;
- struct page *tmp;
struct hstate *h = hstate_vma(vma);
unsigned long sz = huge_page_size(h);
- /*
- * A page gathering list, protected by per file i_mmap_mutex. The
- * lock is used to avoid list corruption from multiple unmapping
- * of the same page since we are using page->lru.
- */
- LIST_HEAD(page_list);
-
WARN_ON(!is_vm_hugetlb_page(vma));
BUG_ON(start & ~huge_page_mask(h));
BUG_ON(end & ~huge_page_mask(h));
+ tlb_start_vma(tlb, vma);
mmu_notifier_invalidate_range_start(mm, start, end);
+again:
spin_lock(&mm->page_table_lock);
for (address = start; address < end; address += sz) {
ptep = huge_pte_offset(mm, address);
@@ -2372,30 +2369,45 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
}
pte = huge_ptep_get_and_clear(mm, address, ptep);
+ tlb_remove_tlb_entry(tlb, ptep, address);
if (pte_dirty(pte))
set_page_dirty(page);
- list_add(&page->lru, &page_list);
+ page_remove_rmap(page);
+ force_flush = !__tlb_remove_page(tlb, page);
+ if (force_flush)
+ break;
/* Bail out after unmapping reference page if supplied */
if (ref_page)
break;
}
- flush_tlb_range(vma, start, end);
spin_unlock(&mm->page_table_lock);
- mmu_notifier_invalidate_range_end(mm, start, end);
- list_for_each_entry_safe(page, tmp, &page_list, lru) {
- page_remove_rmap(page);
- list_del(&page->lru);
- put_page(page);
+ /*
+ * mmu_gather ran out of room to batch pages, we break out of
+ * the PTE lock to avoid doing the potential expensive TLB invalidate
+ * and page-free while holding it.
+ */
+ if (force_flush) {
+ force_flush = 0;
+ tlb_flush_mmu(tlb);
+ if (address < end && !ref_page)
+ goto again;
}
+ mmu_notifier_invalidate_range_end(mm, start, end);
+ tlb_end_vma(tlb, vma);
}
void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end, struct page *ref_page)
{
- mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
- __unmap_hugepage_range(vma, start, end, ref_page);
- mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
+ struct mm_struct *mm;
+ struct mmu_gather tlb;
+
+ mm = vma->vm_mm;
+
+ tlb_gather_mmu(&tlb, mm, 0);
+ __unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+ tlb_finish_mmu(&tlb, start, end);
}
/*
@@ -2440,9 +2452,8 @@ static int unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
* from the time of fork. This would look like data corruption
*/
if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
- __unmap_hugepage_range(iter_vma,
- address, address + huge_page_size(h),
- page);
+ unmap_hugepage_range(iter_vma, address,
+ address + huge_page_size(h), page);
}
mutex_unlock(&mapping->i_mmap_mutex);
diff --git a/mm/memory.c b/mm/memory.c
index 1b7dc66..545e18a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
* Since no pte has actually been setup, it is
* safe to do nothing in this case.
*/
- if (vma->vm_file)
- unmap_hugepage_range(vma, start, end, NULL);
+ if (vma->vm_file) {
+ mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
+ __unmap_hugepage_range(tlb, vma, start, end, NULL);
+ mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
+ }
} else
unmap_page_range(tlb, vma, start, end, details);
}
--
1.7.10
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH -V7 05/14] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb
2012-05-30 14:38 [PATCH -V7 00/14] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
` (3 preceding siblings ...)
2012-05-30 14:38 ` [PATCH -V7 04/14] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages Aneesh Kumar K.V
@ 2012-05-30 14:38 ` Aneesh Kumar K.V
[not found] ` <1338388739-22919-6-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2012-05-30 14:38 ` [PATCH -V7 06/14] hugetlb: simplify migrate_huge_page() Aneesh Kumar K.V
` (8 subsequent siblings)
13 siblings, 1 reply; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-30 14:38 UTC (permalink / raw)
To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
Cc: linux-kernel, cgroups, Aneesh Kumar K.V, Andrea Arcangeli
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
i_mmap_mutex lock was added in unmap_single_vma by 502717f4e ("hugetlb:
fix linked list corruption in unmap_hugepage_range()") but we don't use
page->lru in unmap_hugepage_range any more. Also the lock was taken
higher up in the stack in some code path. That would result in deadlock.
unmap_mapping_range (i_mmap_mutex)
-> unmap_mapping_range_tree
-> unmap_mapping_range_vma
-> zap_page_range_single
-> unmap_single_vma
-> unmap_hugepage_range (i_mmap_mutex)
For shared pagetable support for huge pages, since pagetable pages are ref
counted we don't need any lock during huge_pmd_unshare. We do take
i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in mapping.
(39dde65c9940c97f ("shared page table for hugetlb page")).
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memory.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 545e18a..f6bc04f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1326,11 +1326,8 @@ static void unmap_single_vma(struct mmu_gather *tlb,
* Since no pte has actually been setup, it is
* safe to do nothing in this case.
*/
- if (vma->vm_file) {
- mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
+ if (vma->vm_file)
__unmap_hugepage_range(tlb, vma, start, end, NULL);
- mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
- }
} else
unmap_page_range(tlb, vma, start, end, details);
}
--
1.7.10
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH -V7 06/14] hugetlb: simplify migrate_huge_page()
2012-05-30 14:38 [PATCH -V7 00/14] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
` (4 preceding siblings ...)
2012-05-30 14:38 ` [PATCH -V7 05/14] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb Aneesh Kumar K.V
@ 2012-05-30 14:38 ` Aneesh Kumar K.V
2012-05-30 14:38 ` [PATCH -V7 07/14] mm/page_cgroup: Make page_cgroup point to the cgroup rather than the mem_cgroup Aneesh Kumar K.V
` (7 subsequent siblings)
13 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-30 14:38 UTC (permalink / raw)
To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
Cc: linux-kernel, cgroups, Aneesh Kumar K.V, Andrea Arcangeli
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Since we migrate only one hugepage don't use linked list for passing the
page around. Directly pass page that need to be migrated as argument.
This also remove the usage page->lru in migrate path.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/migrate.h | 4 +--
mm/memory-failure.c | 13 ++--------
mm/migrate.c | 65 +++++++++++++++--------------------------------
3 files changed, 25 insertions(+), 57 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 855c337..ce7e667 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -15,7 +15,7 @@ extern int migrate_page(struct address_space *,
extern int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode);
-extern int migrate_huge_pages(struct list_head *l, new_page_t x,
+extern int migrate_huge_page(struct page *, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode);
@@ -36,7 +36,7 @@ static inline void putback_lru_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode) { return -ENOSYS; }
-static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
+static inline int migrate_huge_page(struct page *page, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode) { return -ENOSYS; }
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ab1e714..53a1495 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1414,7 +1414,6 @@ static int soft_offline_huge_page(struct page *page, int flags)
int ret;
unsigned long pfn = page_to_pfn(page);
struct page *hpage = compound_head(page);
- LIST_HEAD(pagelist);
ret = get_any_page(page, pfn, flags);
if (ret < 0)
@@ -1429,19 +1428,11 @@ static int soft_offline_huge_page(struct page *page, int flags)
}
/* Keep page count to indicate a given hugepage is isolated. */
-
- list_add(&hpage->lru, &pagelist);
- ret = migrate_huge_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0,
- true);
+ ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, 0, true);
+ put_page(hpage);
if (ret) {
- struct page *page1, *page2;
- list_for_each_entry_safe(page1, page2, &pagelist, lru)
- put_page(page1);
-
pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
pfn, ret, page->flags);
- if (ret > 0)
- ret = -EIO;
return ret;
}
done:
diff --git a/mm/migrate.c b/mm/migrate.c
index ab81d48..927254c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -929,15 +929,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
if (anon_vma)
put_anon_vma(anon_vma);
unlock_page(hpage);
-
out:
- if (rc != -EAGAIN) {
- list_del(&hpage->lru);
- put_page(hpage);
- }
-
put_page(new_hpage);
-
if (result) {
if (rc)
*result = rc;
@@ -1013,48 +1006,32 @@ out:
return nr_failed + retry;
}
-int migrate_huge_pages(struct list_head *from,
- new_page_t get_new_page, unsigned long private, bool offlining,
- enum migrate_mode mode)
+int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
+ unsigned long private, bool offlining,
+ enum migrate_mode mode)
{
- int retry = 1;
- int nr_failed = 0;
- int pass = 0;
- struct page *page;
- struct page *page2;
- int rc;
-
- for (pass = 0; pass < 10 && retry; pass++) {
- retry = 0;
-
- list_for_each_entry_safe(page, page2, from, lru) {
+ int pass, rc;
+
+ for (pass = 0; pass < 10; pass++) {
+ rc = unmap_and_move_huge_page(get_new_page,
+ private, hpage, pass > 2, offlining,
+ mode);
+ switch (rc) {
+ case -ENOMEM:
+ goto out;
+ case -EAGAIN:
+ /* try again */
cond_resched();
-
- rc = unmap_and_move_huge_page(get_new_page,
- private, page, pass > 2, offlining,
- mode);
-
- switch(rc) {
- case -ENOMEM:
- goto out;
- case -EAGAIN:
- retry++;
- break;
- case 0:
- break;
- default:
- /* Permanent failure */
- nr_failed++;
- break;
- }
+ break;
+ case 0:
+ goto out;
+ default:
+ rc = -EIO;
+ goto out;
}
}
- rc = 0;
out:
- if (rc)
- return rc;
-
- return nr_failed + retry;
+ return rc;
}
#ifdef CONFIG_NUMA
--
1.7.10
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH -V7 07/14] mm/page_cgroup: Make page_cgroup point to the cgroup rather than the mem_cgroup
2012-05-30 14:38 [PATCH -V7 00/14] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
` (5 preceding siblings ...)
2012-05-30 14:38 ` [PATCH -V7 06/14] hugetlb: simplify migrate_huge_page() Aneesh Kumar K.V
@ 2012-05-30 14:38 ` Aneesh Kumar K.V
[not found] ` <1338388739-22919-8-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2012-05-30 14:38 ` [PATCH -V7 08/14] hugetlbfs: add a list for tracking in-use HugeTLB pages Aneesh Kumar K.V
` (6 subsequent siblings)
13 siblings, 1 reply; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-30 14:38 UTC (permalink / raw)
To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
Cc: linux-kernel, cgroups, Aneesh Kumar K.V
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
We will use it later to make page_cgroup track the hugetlb cgroup information.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
include/linux/mmzone.h | 2 +-
include/linux/page_cgroup.h | 8 ++++----
init/Kconfig | 4 ++++
mm/Makefile | 3 ++-
mm/memcontrol.c | 42 +++++++++++++++++++++++++-----------------
5 files changed, 36 insertions(+), 23 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2427706..2483cc5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1052,7 +1052,7 @@ struct mem_section {
/* See declaration of similar field in struct zone */
unsigned long *pageblock_flags;
-#ifdef CONFIG_CGROUP_MEM_RES_CTLR
+#ifdef CONFIG_PAGE_CGROUP
/*
* If !SPARSEMEM, pgdat doesn't have page_cgroup pointer. We use
* section. (see memcontrol.h/page_cgroup.h about this.)
diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index a88cdba..7bbfe37 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -12,7 +12,7 @@ enum {
#ifndef __GENERATING_BOUNDS_H
#include <generated/bounds.h>
-#ifdef CONFIG_CGROUP_MEM_RES_CTLR
+#ifdef CONFIG_PAGE_CGROUP
#include <linux/bit_spinlock.h>
/*
@@ -24,7 +24,7 @@ enum {
*/
struct page_cgroup {
unsigned long flags;
- struct mem_cgroup *mem_cgroup;
+ struct cgroup *cgroup;
};
void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat);
@@ -82,7 +82,7 @@ static inline void unlock_page_cgroup(struct page_cgroup *pc)
bit_spin_unlock(PCG_LOCK, &pc->flags);
}
-#else /* CONFIG_CGROUP_MEM_RES_CTLR */
+#else /* CONFIG_PAGE_CGROUP */
struct page_cgroup;
static inline void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
@@ -102,7 +102,7 @@ static inline void __init page_cgroup_init_flatmem(void)
{
}
-#endif /* CONFIG_CGROUP_MEM_RES_CTLR */
+#endif /* CONFIG_PAGE_CGROUP */
#include <linux/swap.h>
diff --git a/init/Kconfig b/init/Kconfig
index 81816b8..1363203 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -687,10 +687,14 @@ config RESOURCE_COUNTERS
This option enables controller independent resource accounting
infrastructure that works with cgroups.
+config PAGE_CGROUP
+ bool
+
config CGROUP_MEM_RES_CTLR
bool "Memory Resource Controller for Control Groups"
depends on RESOURCE_COUNTERS
select MM_OWNER
+ select PAGE_CGROUP
help
Provides a memory resource controller that manages both anonymous
memory and page cache. (See Documentation/cgroups/memory.txt)
diff --git a/mm/Makefile b/mm/Makefile
index a156285..a70f9a9 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -47,7 +47,8 @@ obj-$(CONFIG_FS_XIP) += filemap_xip.o
obj-$(CONFIG_MIGRATION) += migrate.o
obj-$(CONFIG_QUICKLIST) += quicklist.o
obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o
-obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o
+obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o
+obj-$(CONFIG_PAGE_CGROUP) += page_cgroup.o
obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ac35bcc..6df019b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -864,6 +864,8 @@ static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
struct mem_cgroup *mem_cgroup_from_cont(struct cgroup *cont)
{
+ if (!cont)
+ return NULL;
return container_of(cgroup_subsys_state(cont,
mem_cgroup_subsys_id), struct mem_cgroup,
css);
@@ -1097,7 +1099,7 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct zone *zone)
return &zone->lruvec;
pc = lookup_page_cgroup(page);
- memcg = pc->mem_cgroup;
+ memcg = mem_cgroup_from_cont(pc->cgroup);
/*
* Surreptitiously switch any uncharged offlist page to root:
@@ -1108,8 +1110,10 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct zone *zone)
* under page_cgroup lock: between them, they make all uses
* of pc->mem_cgroup safe.
*/
- if (!PageLRU(page) && !PageCgroupUsed(pc) && memcg != root_mem_cgroup)
- pc->mem_cgroup = memcg = root_mem_cgroup;
+ if (!PageLRU(page) && !PageCgroupUsed(pc) && memcg != root_mem_cgroup) {
+ memcg = root_mem_cgroup;
+ pc->cgroup = memcg->css.cgroup;
+ }
mz = page_cgroup_zoneinfo(memcg, page);
return &mz->lruvec;
@@ -1889,12 +1893,14 @@ static bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask,
void __mem_cgroup_begin_update_page_stat(struct page *page,
bool *locked, unsigned long *flags)
{
+ struct cgroup *cgroup;
struct mem_cgroup *memcg;
struct page_cgroup *pc;
pc = lookup_page_cgroup(page);
again:
- memcg = pc->mem_cgroup;
+ cgroup = pc->cgroup;
+ memcg = mem_cgroup_from_cont(cgroup);
if (unlikely(!memcg || !PageCgroupUsed(pc)))
return;
/*
@@ -1907,7 +1913,7 @@ again:
return;
move_lock_mem_cgroup(memcg, flags);
- if (memcg != pc->mem_cgroup || !PageCgroupUsed(pc)) {
+ if (cgroup != pc->cgroup || !PageCgroupUsed(pc)) {
move_unlock_mem_cgroup(memcg, flags);
goto again;
}
@@ -1923,7 +1929,7 @@ void __mem_cgroup_end_update_page_stat(struct page *page, unsigned long *flags)
* lock is held because a routine modifies pc->mem_cgroup
* should take move_lock_page_cgroup().
*/
- move_unlock_mem_cgroup(pc->mem_cgroup, flags);
+ move_unlock_mem_cgroup(mem_cgroup_from_cont(pc->cgroup), flags);
}
void mem_cgroup_update_page_stat(struct page *page,
@@ -1936,7 +1942,7 @@ void mem_cgroup_update_page_stat(struct page *page,
if (mem_cgroup_disabled())
return;
- memcg = pc->mem_cgroup;
+ memcg = mem_cgroup_from_cont(pc->cgroup);
if (unlikely(!memcg || !PageCgroupUsed(pc)))
return;
@@ -2444,7 +2450,7 @@ struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
pc = lookup_page_cgroup(page);
lock_page_cgroup(pc);
if (PageCgroupUsed(pc)) {
- memcg = pc->mem_cgroup;
+ memcg = mem_cgroup_from_cont(pc->cgroup);
if (memcg && !css_tryget(&memcg->css))
memcg = NULL;
} else if (PageSwapCache(page)) {
@@ -2491,14 +2497,15 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
zone = page_zone(page);
spin_lock_irq(&zone->lru_lock);
if (PageLRU(page)) {
- lruvec = mem_cgroup_zone_lruvec(zone, pc->mem_cgroup);
+ lruvec = mem_cgroup_zone_lruvec(zone,
+ mem_cgroup_from_cont(pc->cgroup));
ClearPageLRU(page);
del_page_from_lru_list(page, lruvec, page_lru(page));
was_on_lru = true;
}
}
- pc->mem_cgroup = memcg;
+ pc->cgroup = memcg->css.cgroup;
/*
* We access a page_cgroup asynchronously without lock_page_cgroup().
* Especially when a page_cgroup is taken from a page, pc->mem_cgroup
@@ -2511,7 +2518,8 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
if (lrucare) {
if (was_on_lru) {
- lruvec = mem_cgroup_zone_lruvec(zone, pc->mem_cgroup);
+ lruvec = mem_cgroup_zone_lruvec(zone,
+ mem_cgroup_from_cont(pc->cgroup));
VM_BUG_ON(PageLRU(page));
SetPageLRU(page);
add_page_to_lru_list(page, lruvec, page_lru(page));
@@ -2601,7 +2609,7 @@ static int mem_cgroup_move_account(struct page *page,
lock_page_cgroup(pc);
ret = -EINVAL;
- if (!PageCgroupUsed(pc) || pc->mem_cgroup != from)
+ if (!PageCgroupUsed(pc) || pc->cgroup != from->css.cgroup)
goto unlock;
move_lock_mem_cgroup(from, &flags);
@@ -2616,7 +2624,7 @@ static int mem_cgroup_move_account(struct page *page,
mem_cgroup_charge_statistics(from, anon, -nr_pages);
/* caller should have done css_get */
- pc->mem_cgroup = to;
+ pc->cgroup = to->css.cgroup;
mem_cgroup_charge_statistics(to, anon, nr_pages);
/*
* We charges against "to" which may not have any tasks. Then, "to"
@@ -2937,7 +2945,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
lock_page_cgroup(pc);
- memcg = pc->mem_cgroup;
+ memcg = mem_cgroup_from_cont(pc->cgroup);
if (!PageCgroupUsed(pc))
goto unlock_out;
@@ -3183,7 +3191,7 @@ int mem_cgroup_prepare_migration(struct page *page,
pc = lookup_page_cgroup(page);
lock_page_cgroup(pc);
if (PageCgroupUsed(pc)) {
- memcg = pc->mem_cgroup;
+ memcg = mem_cgroup_from_cont(pc->cgroup);
css_get(&memcg->css);
/*
* At migrating an anonymous page, its mapcount goes down
@@ -3328,7 +3336,7 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,
/* fix accounting on old pages */
lock_page_cgroup(pc);
if (PageCgroupUsed(pc)) {
- memcg = pc->mem_cgroup;
+ memcg = mem_cgroup_from_cont(pc->cgroup);
mem_cgroup_charge_statistics(memcg, false, -1);
ClearPageCgroupUsed(pc);
}
@@ -5135,7 +5143,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
* mem_cgroup_move_account() checks the pc is valid or not under
* the lock.
*/
- if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
+ if (PageCgroupUsed(pc) && pc->cgroup == mc.from->css.cgroup) {
ret = MC_TARGET_PAGE;
if (target)
target->page = page;
--
1.7.10
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH -V7 08/14] hugetlbfs: add a list for tracking in-use HugeTLB pages
2012-05-30 14:38 [PATCH -V7 00/14] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
` (6 preceding siblings ...)
2012-05-30 14:38 ` [PATCH -V7 07/14] mm/page_cgroup: Make page_cgroup point to the cgroup rather than the mem_cgroup Aneesh Kumar K.V
@ 2012-05-30 14:38 ` Aneesh Kumar K.V
2012-05-30 14:38 ` [PATCH -V7 09/14] hugetlbfs: Make some static variables global Aneesh Kumar K.V
` (5 subsequent siblings)
13 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-30 14:38 UTC (permalink / raw)
To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
Cc: linux-kernel, cgroups, Aneesh Kumar K.V, Andrea Arcangeli
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
hugepage_activelist will be used to track currently used HugeTLB pages.
We need to find the in-use HugeTLB pages to support memcg removal. On
memcg removal we update the page's memory cgroup to point to parent
cgroup.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/hugetlb.h | 1 +
mm/hugetlb.c | 12 +++++++-----
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c21e136..c4353ea 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -211,6 +211,7 @@ struct hstate {
unsigned long resv_huge_pages;
unsigned long surplus_huge_pages;
unsigned long nr_overcommit_huge_pages;
+ struct list_head hugepage_activelist;
struct list_head hugepage_freelists[MAX_NUMNODES];
unsigned int nr_huge_pages_node[MAX_NUMNODES];
unsigned int free_huge_pages_node[MAX_NUMNODES];
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 704a269..0f38728 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -510,7 +510,7 @@ void copy_huge_page(struct page *dst, struct page *src)
static void enqueue_huge_page(struct hstate *h, struct page *page)
{
int nid = page_to_nid(page);
- list_add(&page->lru, &h->hugepage_freelists[nid]);
+ list_move(&page->lru, &h->hugepage_freelists[nid]);
h->free_huge_pages++;
h->free_huge_pages_node[nid]++;
}
@@ -522,7 +522,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
if (list_empty(&h->hugepage_freelists[nid]))
return NULL;
page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
- list_del(&page->lru);
+ list_move(&page->lru, &h->hugepage_activelist);
set_page_refcounted(page);
h->free_huge_pages--;
h->free_huge_pages_node[nid]--;
@@ -626,10 +626,11 @@ static void free_huge_page(struct page *page)
page->mapping = NULL;
BUG_ON(page_count(page));
BUG_ON(page_mapcount(page));
- INIT_LIST_HEAD(&page->lru);
spin_lock(&hugetlb_lock);
if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
+ /* remove the page from active list */
+ list_del(&page->lru);
update_and_free_page(h, page);
h->surplus_huge_pages--;
h->surplus_huge_pages_node[nid]--;
@@ -642,6 +643,7 @@ static void free_huge_page(struct page *page)
static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
{
+ INIT_LIST_HEAD(&page->lru);
set_compound_page_dtor(page, free_huge_page);
spin_lock(&hugetlb_lock);
h->nr_huge_pages++;
@@ -890,6 +892,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
spin_lock(&hugetlb_lock);
if (page) {
+ INIT_LIST_HEAD(&page->lru);
r_nid = page_to_nid(page);
set_compound_page_dtor(page, free_huge_page);
/*
@@ -994,7 +997,6 @@ retry:
list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
if ((--needed) < 0)
break;
- list_del(&page->lru);
/*
* This page is now managed by the hugetlb allocator and has
* no users -- drop the buddy allocator's reference.
@@ -1009,7 +1011,6 @@ free:
/* Free unnecessary surplus pages to the buddy allocator */
if (!list_empty(&surplus_list)) {
list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
- list_del(&page->lru);
put_page(page);
}
}
@@ -1909,6 +1910,7 @@ void __init hugetlb_add_hstate(unsigned order)
h->free_huge_pages = 0;
for (i = 0; i < MAX_NUMNODES; ++i)
INIT_LIST_HEAD(&h->hugepage_freelists[i]);
+ INIT_LIST_HEAD(&h->hugepage_activelist);
h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
--
1.7.10
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH -V7 09/14] hugetlbfs: Make some static variables global
2012-05-30 14:38 [PATCH -V7 00/14] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
` (7 preceding siblings ...)
2012-05-30 14:38 ` [PATCH -V7 08/14] hugetlbfs: add a list for tracking in-use HugeTLB pages Aneesh Kumar K.V
@ 2012-05-30 14:38 ` Aneesh Kumar K.V
2012-05-30 14:38 ` [PATCH -V7 10/14] hugetlbfs: Add new HugeTLB cgroup Aneesh Kumar K.V
` (4 subsequent siblings)
13 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-30 14:38 UTC (permalink / raw)
To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
Cc: linux-kernel, cgroups, Aneesh Kumar K.V
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
We will use them later in hugetlb_cgroup.c
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
include/linux/hugetlb.h | 5 +++++
mm/hugetlb.c | 7 ++-----
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c4353ea..dcd55c7 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -21,6 +21,11 @@ struct hugepage_subpool {
long max_hpages, used_hpages;
};
+extern spinlock_t hugetlb_lock;
+extern int hugetlb_max_hstate;
+#define for_each_hstate(h) \
+ for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
+
struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
void hugepage_put_subpool(struct hugepage_subpool *spool);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0f38728..53840dd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -35,7 +35,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
unsigned long hugepages_treat_as_movable;
-static int hugetlb_max_hstate;
+int hugetlb_max_hstate;
unsigned int default_hstate_idx;
struct hstate hstates[HUGE_MAX_HSTATE];
@@ -46,13 +46,10 @@ static struct hstate * __initdata parsed_hstate;
static unsigned long __initdata default_hstate_max_huge_pages;
static unsigned long __initdata default_hstate_size;
-#define for_each_hstate(h) \
- for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
-
/*
* Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
*/
-static DEFINE_SPINLOCK(hugetlb_lock);
+DEFINE_SPINLOCK(hugetlb_lock);
static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
{
--
1.7.10
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH -V7 10/14] hugetlbfs: Add new HugeTLB cgroup
2012-05-30 14:38 [PATCH -V7 00/14] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
` (8 preceding siblings ...)
2012-05-30 14:38 ` [PATCH -V7 09/14] hugetlbfs: Make some static variables global Aneesh Kumar K.V
@ 2012-05-30 14:38 ` Aneesh Kumar K.V
[not found] ` <1338388739-22919-11-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2012-05-30 14:38 ` [PATCH -V7 11/14] hugetlbfs: add hugetlb cgroup control files Aneesh Kumar K.V
` (3 subsequent siblings)
13 siblings, 1 reply; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-30 14:38 UTC (permalink / raw)
To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
Cc: linux-kernel, cgroups, Aneesh Kumar K.V
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
This patch implements a new controller that allows us to control HugeTLB
allocations. The extension allows to limit the HugeTLB usage per control
group and enforces the controller limit during page fault. Since HugeTLB
doesn't support page reclaim, enforcing the limit at page fault time implies
that, the application will get SIGBUS signal if it tries to access HugeTLB
pages beyond its limit. This requires the application to know beforehand
how much HugeTLB pages it would require for its use.
The charge/uncharge calls will be added to HugeTLB code in later patch.
Support for cgroup removal will be added in later patches.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
include/linux/cgroup_subsys.h | 6 +
include/linux/hugetlb_cgroup.h | 79 ++++++++++++
init/Kconfig | 14 ++
mm/Makefile | 1 +
mm/hugetlb_cgroup.c | 280 ++++++++++++++++++++++++++++++++++++++++
mm/page_cgroup.c | 5 +-
6 files changed, 383 insertions(+), 2 deletions(-)
create mode 100644 include/linux/hugetlb_cgroup.h
create mode 100644 mm/hugetlb_cgroup.c
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 0bd390c..895923a 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -72,3 +72,9 @@ SUBSYS(net_prio)
#endif
/* */
+
+#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+SUBSYS(hugetlb)
+#endif
+
+/* */
diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
new file mode 100644
index 0000000..5794be4
--- /dev/null
+++ b/include/linux/hugetlb_cgroup.h
@@ -0,0 +1,79 @@
+/*
+ * Copyright IBM Corporation, 2012
+ * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#ifndef _LINUX_HUGETLB_CGROUP_H
+#define _LINUX_HUGETLB_CGROUP_H
+
+#include <linux/res_counter.h>
+
+struct hugetlb_cgroup {
+ struct cgroup_subsys_state css;
+ /*
+ * the counter to account for hugepages from hugetlb.
+ */
+ struct res_counter hugepage[HUGE_MAX_HSTATE];
+};
+
+#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+static inline bool hugetlb_cgroup_disabled(void)
+{
+ if (hugetlb_subsys.disabled)
+ return true;
+ return false;
+}
+
+extern int hugetlb_cgroup_charge_page(int idx, unsigned long nr_pages,
+ struct hugetlb_cgroup **ptr);
+extern void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
+ struct hugetlb_cgroup *h_cg,
+ struct page *page);
+extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
+ struct page *page);
+extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
+ struct hugetlb_cgroup *h_cg);
+#else
+static inline bool hugetlb_cgroup_disabled(void)
+{
+ return true;
+}
+
+static inline int
+hugetlb_cgroup_charge_page(int idx, unsigned long nr_pages,
+ struct hugetlb_cgroup **ptr)
+{
+ return 0;
+}
+
+static inline void
+hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
+ struct hugetlb_cgroup *h_cg,
+ struct page *page)
+{
+ return;
+}
+
+static inline void
+hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, struct page *page)
+{
+ return;
+}
+
+static inline void
+hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
+ struct hugetlb_cgroup *h_cg)
+{
+ return;
+}
+#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
+#endif
diff --git a/init/Kconfig b/init/Kconfig
index 1363203..73b14b0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -714,6 +714,20 @@ config CGROUP_MEM_RES_CTLR
This config option also selects MM_OWNER config option, which
could in turn add some fork/exit overhead.
+config CGROUP_HUGETLB_RES_CTLR
+ bool "HugeTLB Resource Controller for Control Groups"
+ depends on RESOURCE_COUNTERS && HUGETLB_PAGE && EXPERIMENTAL
+ select PAGE_CGROUP
+ default n
+ help
+ Provides a simple cgroup Resource Controller for HugeTLB pages.
+ When you enable this, you can put a per cgroup limit on HugeTLB usage.
+ The limit is enforced during page fault. Since HugeTLB doesn't
+ support page reclaim, enforcing the limit at page fault time implies
+ that, the application will get SIGBUS signal if it tries to access
+ HugeTLB pages beyond its limit. This requires the application to know
+ beforehand how much HugeTLB pages it would require for its use.
+
config CGROUP_MEM_RES_CTLR_SWAP
bool "Memory Resource Controller Swap Extension"
depends on CGROUP_MEM_RES_CTLR && SWAP
diff --git a/mm/Makefile b/mm/Makefile
index a70f9a9..bed4944 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MIGRATION) += migrate.o
obj-$(CONFIG_QUICKLIST) += quicklist.o
obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o
obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o
+obj-$(CONFIG_CGROUP_HUGETLB_RES_CTLR) += hugetlb_cgroup.o
obj-$(CONFIG_PAGE_CGROUP) += page_cgroup.o
obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
new file mode 100644
index 0000000..3a288f7
--- /dev/null
+++ b/mm/hugetlb_cgroup.c
@@ -0,0 +1,280 @@
+/*
+ *
+ * Copyright IBM Corporation, 2012
+ * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#include <linux/cgroup.h>
+#include <linux/slab.h>
+#include <linux/hugetlb.h>
+#include <linux/page_cgroup.h>
+#include <linux/hugetlb_cgroup.h>
+
+struct cgroup_subsys hugetlb_subsys __read_mostly;
+struct hugetlb_cgroup *root_h_cgroup __read_mostly;
+
+static inline
+struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
+{
+ return container_of(s, struct hugetlb_cgroup, css);
+}
+
+static inline
+struct hugetlb_cgroup *hugetlb_cgroup_from_cgroup(struct cgroup *cgroup)
+{
+ if (!cgroup)
+ return NULL;
+ return hugetlb_cgroup_from_css(cgroup_subsys_state(cgroup,
+ hugetlb_subsys_id));
+}
+
+static inline
+struct hugetlb_cgroup *hugetlb_cgroup_from_task(struct task_struct *task)
+{
+ return hugetlb_cgroup_from_css(task_subsys_state(task,
+ hugetlb_subsys_id));
+}
+
+static inline bool hugetlb_cgroup_is_root(struct hugetlb_cgroup *h_cg)
+{
+ return (h_cg == root_h_cgroup);
+}
+
+static struct hugetlb_cgroup *parent_hugetlb_cgroup(struct cgroup *cg)
+{
+ if (!cg->parent)
+ return NULL;
+ return hugetlb_cgroup_from_cgroup(cg->parent);
+}
+
+static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg)
+{
+ int idx;
+ struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cg);
+
+ for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
+ if ((res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE)) > 0)
+ return 1;
+ }
+ return 0;
+}
+
+static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup)
+{
+ int idx;
+ struct cgroup *parent_cgroup;
+ struct hugetlb_cgroup *h_cgroup, *parent_h_cgroup;
+
+ h_cgroup = kzalloc(sizeof(*h_cgroup), GFP_KERNEL);
+ if (!h_cgroup)
+ return ERR_PTR(-ENOMEM);
+
+ parent_cgroup = cgroup->parent;
+ if (parent_cgroup) {
+ parent_h_cgroup = hugetlb_cgroup_from_cgroup(parent_cgroup);
+ for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
+ res_counter_init(&h_cgroup->hugepage[idx],
+ &parent_h_cgroup->hugepage[idx]);
+ } else {
+ root_h_cgroup = h_cgroup;
+ for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
+ res_counter_init(&h_cgroup->hugepage[idx], NULL);
+ }
+ return &h_cgroup->css;
+}
+
+static int hugetlb_cgroup_move_parent(int idx, struct cgroup *cgroup,
+ struct page *page)
+{
+ int csize, ret = 0;
+ struct page_cgroup *pc;
+ struct res_counter *counter;
+ struct res_counter *fail_res;
+ struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
+ struct hugetlb_cgroup *parent = parent_hugetlb_cgroup(cgroup);
+
+ if (!get_page_unless_zero(page))
+ goto out;
+
+ pc = lookup_page_cgroup(page);
+ lock_page_cgroup(pc);
+ if (!PageCgroupUsed(pc) || pc->cgroup != cgroup)
+ goto err_out;
+
+ csize = PAGE_SIZE << compound_order(page);
+ /* If use_hierarchy == 0, we need to charge root */
+ if (!parent) {
+ parent = root_h_cgroup;
+ /* root has no limit */
+ res_counter_charge_nofail(&parent->hugepage[idx],
+ csize, &fail_res);
+ }
+ counter = &h_cg->hugepage[idx];
+ res_counter_uncharge_until(counter, counter->parent, csize);
+
+ pc->cgroup = cgroup->parent;
+err_out:
+ unlock_page_cgroup(pc);
+ put_page(page);
+out:
+ return ret;
+}
+
+/*
+ * Force the hugetlb cgroup to empty the hugetlb resources by moving them to
+ * the parent cgroup.
+ */
+static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
+{
+ struct hstate *h;
+ struct page *page;
+ int ret = 0, idx = 0;
+
+ do {
+ if (cgroup_task_count(cgroup) ||
+ !list_empty(&cgroup->children)) {
+ ret = -EBUSY;
+ goto out;
+ }
+ /*
+ * If the task doing the cgroup_rmdir got a signal
+ * we don't really need to loop till the hugetlb resource
+ * usage become zero.
+ */
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ goto out;
+ }
+ for_each_hstate(h) {
+ spin_lock(&hugetlb_lock);
+ list_for_each_entry(page, &h->hugepage_activelist, lru) {
+ ret = hugetlb_cgroup_move_parent(idx, cgroup, page);
+ if (ret) {
+ spin_unlock(&hugetlb_lock);
+ goto out;
+ }
+ }
+ spin_unlock(&hugetlb_lock);
+ idx++;
+ }
+ cond_resched();
+ } while (hugetlb_cgroup_have_usage(cgroup));
+out:
+ return ret;
+}
+
+static void hugetlb_cgroup_destroy(struct cgroup *cgroup)
+{
+ struct hugetlb_cgroup *h_cgroup;
+
+ h_cgroup = hugetlb_cgroup_from_cgroup(cgroup);
+ kfree(h_cgroup);
+}
+
+int hugetlb_cgroup_charge_page(int idx, unsigned long nr_pages,
+ struct hugetlb_cgroup **ptr)
+{
+ int ret = 0;
+ struct res_counter *fail_res;
+ struct hugetlb_cgroup *h_cg = NULL;
+ unsigned long csize = nr_pages * PAGE_SIZE;
+
+ if (hugetlb_cgroup_disabled())
+ goto done;
+again:
+ rcu_read_lock();
+ h_cg = hugetlb_cgroup_from_task(current);
+ if (!h_cg)
+ h_cg = root_h_cgroup;
+
+ if (!css_tryget(&h_cg->css)) {
+ rcu_read_unlock();
+ goto again;
+ }
+ rcu_read_unlock();
+
+ ret = res_counter_charge(&h_cg->hugepage[idx], csize, &fail_res);
+ css_put(&h_cg->css);
+done:
+ *ptr = h_cg;
+ return ret;
+}
+
+void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
+ struct hugetlb_cgroup *h_cg,
+ struct page *page)
+{
+ struct page_cgroup *pc;
+
+ if (hugetlb_cgroup_disabled())
+ return;
+
+ pc = lookup_page_cgroup(page);
+ lock_page_cgroup(pc);
+ if (unlikely(PageCgroupUsed(pc))) {
+ unlock_page_cgroup(pc);
+ hugetlb_cgroup_uncharge_cgroup(idx, nr_pages, h_cg);
+ return;
+ }
+ pc->cgroup = h_cg->css.cgroup;
+ SetPageCgroupUsed(pc);
+ unlock_page_cgroup(pc);
+ return;
+}
+
+void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
+ struct page *page)
+{
+ struct page_cgroup *pc;
+ struct hugetlb_cgroup *h_cg;
+ unsigned long csize = nr_pages * PAGE_SIZE;
+
+ if (hugetlb_cgroup_disabled())
+ return;
+
+ pc = lookup_page_cgroup(page);
+ if (unlikely(!PageCgroupUsed(pc)))
+ return;
+
+ lock_page_cgroup(pc);
+ if (!PageCgroupUsed(pc)) {
+ unlock_page_cgroup(pc);
+ return;
+ }
+ h_cg = hugetlb_cgroup_from_cgroup(pc->cgroup);
+ pc->cgroup = root_h_cgroup->css.cgroup;
+ ClearPageCgroupUsed(pc);
+ unlock_page_cgroup(pc);
+
+ res_counter_uncharge(&h_cg->hugepage[idx], csize);
+ return;
+}
+
+void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
+ struct hugetlb_cgroup *h_cg)
+{
+ unsigned long csize = nr_pages * PAGE_SIZE;
+
+ if (hugetlb_cgroup_disabled())
+ return;
+
+ res_counter_uncharge(&h_cg->hugepage[idx], csize);
+ return;
+}
+
+struct cgroup_subsys hugetlb_subsys = {
+ .name = "hugetlb",
+ .create = hugetlb_cgroup_create,
+ .pre_destroy = hugetlb_cgroup_pre_destroy,
+ .destroy = hugetlb_cgroup_destroy,
+ .subsys_id = hugetlb_subsys_id,
+};
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 1ccbd71..26271b7 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -10,6 +10,7 @@
#include <linux/cgroup.h>
#include <linux/swapops.h>
#include <linux/kmemleak.h>
+#include <linux/hugetlb_cgroup.h>
static unsigned long total_usage;
@@ -68,7 +69,7 @@ void __init page_cgroup_init_flatmem(void)
int nid, fail;
- if (mem_cgroup_disabled())
+ if (mem_cgroup_disabled() && hugetlb_cgroup_disabled())
return;
for_each_online_node(nid) {
@@ -268,7 +269,7 @@ void __init page_cgroup_init(void)
unsigned long pfn;
int nid;
- if (mem_cgroup_disabled())
+ if (mem_cgroup_disabled() && hugetlb_cgroup_disabled())
return;
for_each_node_state(nid, N_HIGH_MEMORY) {
--
1.7.10
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH -V7 11/14] hugetlbfs: add hugetlb cgroup control files
2012-05-30 14:38 [PATCH -V7 00/14] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
` (9 preceding siblings ...)
2012-05-30 14:38 ` [PATCH -V7 10/14] hugetlbfs: Add new HugeTLB cgroup Aneesh Kumar K.V
@ 2012-05-30 14:38 ` Aneesh Kumar K.V
[not found] ` <1338388739-22919-12-git-send-email-aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2012-05-30 14:38 ` [PATCH -V7 12/14] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Aneesh Kumar K.V
` (2 subsequent siblings)
13 siblings, 1 reply; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-30 14:38 UTC (permalink / raw)
To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
Cc: linux-kernel, cgroups, Aneesh Kumar K.V
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Add the control files for hugetlb controller
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
include/linux/hugetlb.h | 5 ++
include/linux/hugetlb_cgroup.h | 6 ++
mm/hugetlb.c | 2 +
mm/hugetlb_cgroup.c | 130 ++++++++++++++++++++++++++++++++++++++++
4 files changed, 143 insertions(+)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index dcd55c7..92f75a5 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -4,6 +4,7 @@
#include <linux/mm_types.h>
#include <linux/fs.h>
#include <linux/hugetlb_inline.h>
+#include <linux/cgroup.h>
struct ctl_table;
struct user_struct;
@@ -221,6 +222,10 @@ struct hstate {
unsigned int nr_huge_pages_node[MAX_NUMNODES];
unsigned int free_huge_pages_node[MAX_NUMNODES];
unsigned int surplus_huge_pages_node[MAX_NUMNODES];
+#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+ /* cgroup control files */
+ struct cftype cgroup_files[5];
+#endif
char name[HSTATE_NAME_LEN];
};
diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index 5794be4..fbf8c5f 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -42,6 +42,7 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
struct page *page);
extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
struct hugetlb_cgroup *h_cg);
+extern int hugetlb_cgroup_file_init(int idx) __init;
#else
static inline bool hugetlb_cgroup_disabled(void)
{
@@ -75,5 +76,10 @@ hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
{
return;
}
+
+static inline int __init hugetlb_cgroup_file_init(int idx)
+{
+ return 0;
+}
#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
#endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 53840dd..6330de2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -29,6 +29,7 @@
#include <linux/io.h>
#include <linux/hugetlb.h>
#include <linux/node.h>
+#include <linux/hugetlb_cgroup.h>
#include "internal.h"
const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
@@ -1912,6 +1913,7 @@ void __init hugetlb_add_hstate(unsigned order)
h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
huge_page_size(h)/1024);
+ hugetlb_cgroup_file_init(hugetlb_max_hstate - 1);
parsed_hstate = h;
}
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 3a288f7..49a3f20 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -19,6 +19,11 @@
#include <linux/page_cgroup.h>
#include <linux/hugetlb_cgroup.h>
+/* lifted from mem control */
+#define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val))
+#define MEMFILE_IDX(val) (((val) >> 16) & 0xffff)
+#define MEMFILE_ATTR(val) ((val) & 0xffff)
+
struct cgroup_subsys hugetlb_subsys __read_mostly;
struct hugetlb_cgroup *root_h_cgroup __read_mostly;
@@ -271,6 +276,131 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
return;
}
+static ssize_t hugetlb_cgroup_read(struct cgroup *cgroup, struct cftype *cft,
+ struct file *file, char __user *buf,
+ size_t nbytes, loff_t *ppos)
+{
+ u64 val;
+ char str[64];
+ int idx, name, len;
+ struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
+
+ idx = MEMFILE_IDX(cft->private);
+ name = MEMFILE_ATTR(cft->private);
+
+ val = res_counter_read_u64(&h_cg->hugepage[idx], name);
+ len = scnprintf(str, sizeof(str), "%llu\n", (unsigned long long)val);
+ return simple_read_from_buffer(buf, nbytes, ppos, str, len);
+}
+
+static int hugetlb_cgroup_write(struct cgroup *cgroup, struct cftype *cft,
+ const char *buffer)
+{
+ int idx, name, ret;
+ unsigned long long val;
+ struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
+
+ idx = MEMFILE_IDX(cft->private);
+ name = MEMFILE_ATTR(cft->private);
+
+ switch (name) {
+ case RES_LIMIT:
+ if (hugetlb_cgroup_is_root(h_cg)) {
+ /* Can't set limit on root */
+ ret = -EINVAL;
+ break;
+ }
+ /* This function does all necessary parse...reuse it */
+ ret = res_counter_memparse_write_strategy(buffer, &val);
+ if (ret)
+ break;
+ ret = res_counter_set_limit(&h_cg->hugepage[idx], val);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+ return ret;
+}
+
+static int hugetlb_cgroup_reset(struct cgroup *cgroup, unsigned int event)
+{
+ int idx, name, ret = 0;
+ struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
+
+ idx = MEMFILE_IDX(event);
+ name = MEMFILE_ATTR(event);
+
+ switch (name) {
+ case RES_MAX_USAGE:
+ res_counter_reset_max(&h_cg->hugepage[idx]);
+ break;
+ case RES_FAILCNT:
+ res_counter_reset_failcnt(&h_cg->hugepage[idx]);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+ return ret;
+}
+
+static char *mem_fmt(char *buf, int size, unsigned long hsize)
+{
+ if (hsize >= (1UL << 30))
+ snprintf(buf, size, "%luGB", hsize >> 30);
+ else if (hsize >= (1UL << 20))
+ snprintf(buf, size, "%luMB", hsize >> 20);
+ else
+ snprintf(buf, size, "%luKB", hsize >> 10);
+ return buf;
+}
+
+int __init hugetlb_cgroup_file_init(int idx)
+{
+ char buf[32];
+ struct cftype *cft;
+ struct hstate *h = &hstates[idx];
+
+ /* format the size */
+ mem_fmt(buf, 32, huge_page_size(h));
+
+ /* Add the limit file */
+ cft = &h->cgroup_files[0];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.limit_in_bytes", buf);
+ cft->private = MEMFILE_PRIVATE(idx, RES_LIMIT);
+ cft->read = hugetlb_cgroup_read;
+ cft->write_string = hugetlb_cgroup_write;
+
+ /* Add the usage file */
+ cft = &h->cgroup_files[1];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.usage_in_bytes", buf);
+ cft->private = MEMFILE_PRIVATE(idx, RES_USAGE);
+ cft->read = hugetlb_cgroup_read;
+
+ /* Add the MAX usage file */
+ cft = &h->cgroup_files[2];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.max_usage_in_bytes", buf);
+ cft->private = MEMFILE_PRIVATE(idx, RES_MAX_USAGE);
+ cft->trigger = hugetlb_cgroup_reset;
+ cft->read = hugetlb_cgroup_read;
+
+ /* Add the failcntfile */
+ cft = &h->cgroup_files[3];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.failcnt", buf);
+ cft->private = MEMFILE_PRIVATE(idx, RES_FAILCNT);
+ cft->trigger = hugetlb_cgroup_reset;
+ cft->read = hugetlb_cgroup_read;
+
+ /* NULL terminate the last cft */
+ cft = &h->cgroup_files[4];
+ memset(cft, 0, sizeof(*cft));
+
+ WARN_ON(cgroup_add_cftypes(&hugetlb_subsys, h->cgroup_files));
+
+ return 0;
+}
+
struct cgroup_subsys hugetlb_subsys = {
.name = "hugetlb",
.create = hugetlb_cgroup_create,
--
1.7.10
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH -V7 12/14] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
2012-05-30 14:38 [PATCH -V7 00/14] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
` (10 preceding siblings ...)
2012-05-30 14:38 ` [PATCH -V7 11/14] hugetlbfs: add hugetlb cgroup control files Aneesh Kumar K.V
@ 2012-05-30 14:38 ` Aneesh Kumar K.V
2012-05-30 14:38 ` [PATCH -V7 13/14] hugetlb: migrate hugetlb cgroup info from oldpage to new page during migration Aneesh Kumar K.V
2012-05-30 14:38 ` [PATCH -V7 14/14] hugetlb: add HugeTLB controller documentation Aneesh Kumar K.V
13 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-30 14:38 UTC (permalink / raw)
To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
Cc: linux-kernel, cgroups, Aneesh Kumar K.V, Andrea Arcangeli
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
This adds necessary charge/uncharge calls in the HugeTLB code. We do
hugetlb cgroup charge in page alloc and uncharge in compound page destructor. We
also need to ignore HugeTLB pages in __mem_cgroup_uncharge_common because
that get called from delete_from_page_cache
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
mm/hugetlb.c | 16 +++++++++++++++-
mm/memcontrol.c | 5 +++++
2 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6330de2..cad7a4d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -625,6 +625,8 @@ static void free_huge_page(struct page *page)
BUG_ON(page_count(page));
BUG_ON(page_mapcount(page));
+ hugetlb_cgroup_uncharge_page(hstate_index(h),
+ pages_per_huge_page(h), page);
spin_lock(&hugetlb_lock);
if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
/* remove the page from active list */
@@ -1112,7 +1114,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
struct hstate *h = hstate_vma(vma);
struct page *page;
long chg;
+ int ret, idx;
+ struct hugetlb_cgroup *h_cg;
+ idx = hstate_index(h);
/*
* Processes that did not create the mapping will have no
* reserves and will not have accounted against subpool
@@ -1128,6 +1133,11 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
if (hugepage_subpool_get_pages(spool, chg))
return ERR_PTR(-ENOSPC);
+ ret = hugetlb_cgroup_charge_page(idx, pages_per_huge_page(h), &h_cg);
+ if (ret) {
+ hugepage_subpool_put_pages(spool, chg);
+ return ERR_PTR(-ENOSPC);
+ }
spin_lock(&hugetlb_lock);
page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
spin_unlock(&hugetlb_lock);
@@ -1135,6 +1145,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
if (!page) {
page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
if (!page) {
+ hugetlb_cgroup_uncharge_cgroup(idx,
+ pages_per_huge_page(h),
+ h_cg);
hugepage_subpool_put_pages(spool, chg);
return ERR_PTR(-ENOSPC);
}
@@ -1143,7 +1156,8 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
set_page_private(page, (unsigned long)spool);
vma_commit_reservation(h, vma, addr);
-
+ /* update page cgroup details */
+ hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
return page;
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6df019b..a52780b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2931,6 +2931,11 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
if (PageSwapCache(page))
return NULL;
+ /*
+ * HugeTLB page uncharge happen in the HugeTLB compound page destructor
+ */
+ if (PageHuge(page))
+ return NULL;
if (PageTransHuge(page)) {
nr_pages <<= compound_order(page);
--
1.7.10
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH -V7 13/14] hugetlb: migrate hugetlb cgroup info from oldpage to new page during migration
2012-05-30 14:38 [PATCH -V7 00/14] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
` (11 preceding siblings ...)
2012-05-30 14:38 ` [PATCH -V7 12/14] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Aneesh Kumar K.V
@ 2012-05-30 14:38 ` Aneesh Kumar K.V
2012-05-30 14:38 ` [PATCH -V7 14/14] hugetlb: add HugeTLB controller documentation Aneesh Kumar K.V
13 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-30 14:38 UTC (permalink / raw)
To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
Cc: linux-kernel, cgroups, Aneesh Kumar K.V, Andrea Arcangeli
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor. Since
we are holding a hugepage reference, we can be sure that old page won't
get uncharged till the last put_page(). On successful migrate, we can
move the memcg information to new page's page_cgroup and mark the old
page's page_cgroup unused.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/hugetlb_cgroup.h | 8 ++++++++
mm/hugetlb_cgroup.c | 30 ++++++++++++++++++++++++++++++
mm/migrate.c | 5 +++++
3 files changed, 43 insertions(+)
diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index fbf8c5f..387cbd6 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -43,6 +43,8 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
struct hugetlb_cgroup *h_cg);
extern int hugetlb_cgroup_file_init(int idx) __init;
+extern void hugetlb_cgroup_migrate(struct page *oldhpage,
+ struct page *newhpage);
#else
static inline bool hugetlb_cgroup_disabled(void)
{
@@ -81,5 +83,11 @@ static inline int __init hugetlb_cgroup_file_init(int idx)
{
return 0;
}
+
+static inline void hugetlb_cgroup_migrate(struct page *oldhpage,
+ struct page *newhpage)
+{
+ return;
+}
#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
#endif
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 49a3f20..f99007b 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -401,6 +401,36 @@ int __init hugetlb_cgroup_file_init(int idx)
return 0;
}
+void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
+{
+ struct cgroup *cg;
+ struct page_cgroup *pc;
+ struct hugetlb_cgroup *h_cg;
+
+ VM_BUG_ON(!PageHuge(oldhpage));
+
+ if (hugetlb_cgroup_disabled())
+ return;
+
+ pc = lookup_page_cgroup(oldhpage);
+ lock_page_cgroup(pc);
+ cg = pc->cgroup;
+ h_cg = hugetlb_cgroup_from_cgroup(cg);
+ pc->cgroup = root_h_cgroup->css.cgroup;
+ ClearPageCgroupUsed(pc);
+ cgroup_exclude_rmdir(&h_cg->css);
+ unlock_page_cgroup(pc);
+
+ /* move the h_cg details to new cgroup */
+ pc = lookup_page_cgroup(newhpage);
+ lock_page_cgroup(pc);
+ pc->cgroup = cg;
+ SetPageCgroupUsed(pc);
+ unlock_page_cgroup(pc);
+ cgroup_release_and_wakeup_rmdir(&h_cg->css);
+ return;
+}
+
struct cgroup_subsys hugetlb_subsys = {
.name = "hugetlb",
.create = hugetlb_cgroup_create,
diff --git a/mm/migrate.c b/mm/migrate.c
index 927254c..22f414f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -33,6 +33,7 @@
#include <linux/memcontrol.h>
#include <linux/syscalls.h>
#include <linux/hugetlb.h>
+#include <linux/hugetlb_cgroup.h>
#include <linux/gfp.h>
#include <asm/tlbflush.h>
@@ -928,6 +929,10 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
if (anon_vma)
put_anon_vma(anon_vma);
+
+ if (!rc)
+ hugetlb_cgroup_migrate(hpage, new_hpage);
+
unlock_page(hpage);
out:
put_page(new_hpage);
--
1.7.10
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH -V7 14/14] hugetlb: add HugeTLB controller documentation
2012-05-30 14:38 [PATCH -V7 00/14] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
` (12 preceding siblings ...)
2012-05-30 14:38 ` [PATCH -V7 13/14] hugetlb: migrate hugetlb cgroup info from oldpage to new page during migration Aneesh Kumar K.V
@ 2012-05-30 14:38 ` Aneesh Kumar K.V
13 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2012-05-30 14:38 UTC (permalink / raw)
To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
Cc: linux-kernel, cgroups, Aneesh Kumar K.V
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
Documentation/cgroups/hugetlb.txt | 45 +++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
create mode 100644 Documentation/cgroups/hugetlb.txt
diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroups/hugetlb.txt
new file mode 100644
index 0000000..a9faaca
--- /dev/null
+++ b/Documentation/cgroups/hugetlb.txt
@@ -0,0 +1,45 @@
+HugeTLB Controller
+-------------------
+
+The HugeTLB controller allows to limit the HugeTLB usage per control group and
+enforces the controller limit during page fault. Since HugeTLB doesn't
+support page reclaim, enforcing the limit at page fault time implies that,
+the application will get SIGBUS signal if it tries to access HugeTLB pages
+beyond its limit. This requires the application to know beforehand how much
+HugeTLB pages it would require for its use.
+
+HugeTLB controller can be created by first mounting the cgroup filesystem.
+
+# mount -t cgroup -o hugetlb none /sys/fs/cgroup
+
+With the above step, the initial or the parent HugeTLB group becomes
+visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
+the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
+
+New groups can be created under the parent group /sys/fs/cgroup.
+
+# cd /sys/fs/cgroup
+# mkdir g1
+# echo $$ > g1/tasks
+
+The above steps create a new group g1 and move the current shell
+process (bash) into it.
+
+Brief summary of control files
+
+ hugetlb.<hugepagesize>.limit_in_bytes # set/show limit of "hugepagesize" hugetlb usage
+ hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded
+ hugetlb.<hugepagesize>.usage_in_bytes # show current res_counter usage for "hugepagesize" hugetlb
+ hugetlb.<hugepagesize>.failcnt # show the number of allocation failure due to HugeTLB limit
+
+For a system supporting two hugepage size (16M and 16G) the control
+files include:
+
+hugetlb.16GB.limit_in_bytes
+hugetlb.16GB.max_usage_in_bytes
+hugetlb.16GB.usage_in_bytes
+hugetlb.16GB.failcnt
+hugetlb.16MB.limit_in_bytes
+hugetlb.16MB.max_usage_in_bytes
+hugetlb.16MB.usage_in_bytes
+hugetlb.16MB.failcnt
--
1.7.10
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 37+ messages in thread