* [PATCH 1/4] hugetlb: Move update_and_free_page
2007-10-01 15:17 [PATCH 0/4] [hugetlb] Dynamic huge page pool resizing V6 Adam Litke
@ 2007-10-01 15:17 ` Adam Litke
2007-10-02 10:03 ` Bill Irwin
2007-10-01 15:17 ` [PATCH 2/4] hugetlb: Try to grow hugetlb pool for MAP_PRIVATE mappings Adam Litke
` (3 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Adam Litke @ 2007-10-01 15:17 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, libhugetlbfs-devel, Adam Litke, Andy Whitcroft,
Mel Gorman, Bill Irwin, Ken Chen, Dave McCracken
This patch simply moves update_and_free_page() so that it can be reused
later in this patch series. The implementation is not changed.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Dave McCracken <dave.mccracken@oracle.com>
---
mm/hugetlb.c | 30 +++++++++++++++---------------
1 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4a374fa..8d3919d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -92,6 +92,21 @@ static struct page *dequeue_huge_page(struct vm_area_struct *vma,
return page;
}
+static void update_and_free_page(struct page *page)
+{
+ int i;
+ nr_huge_pages--;
+ nr_huge_pages_node[page_to_nid(page)]--;
+ for (i = 0; i < (HPAGE_SIZE / PAGE_SIZE); i++) {
+ page[i].flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
+ 1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
+ 1 << PG_private | 1<< PG_writeback);
+ }
+ set_compound_page_dtor(page, NULL);
+ set_page_refcounted(page);
+ __free_pages(page, HUGETLB_PAGE_ORDER);
+}
+
static void free_huge_page(struct page *page)
{
BUG_ON(page_count(page));
@@ -201,21 +216,6 @@ static unsigned int cpuset_mems_nr(unsigned int *array)
}
#ifdef CONFIG_SYSCTL
-static void update_and_free_page(struct page *page)
-{
- int i;
- nr_huge_pages--;
- nr_huge_pages_node[page_to_nid(page)]--;
- for (i = 0; i < (HPAGE_SIZE / PAGE_SIZE); i++) {
- page[i].flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
- 1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
- 1 << PG_private | 1<< PG_writeback);
- }
- set_compound_page_dtor(page, NULL);
- set_page_refcounted(page);
- __free_pages(page, HUGETLB_PAGE_ORDER);
-}
-
#ifdef CONFIG_HIGHMEM
static void try_to_free_low(unsigned long count)
{
--
* Re: [PATCH 1/4] hugetlb: Move update_and_free_page
2007-10-01 15:17 ` [PATCH 1/4] hugetlb: Move update_and_free_page Adam Litke
@ 2007-10-02 10:03 ` Bill Irwin
0 siblings, 0 replies; 9+ messages in thread
From: Bill Irwin @ 2007-10-02 10:03 UTC (permalink / raw)
To: Adam Litke
Cc: Andrew Morton, linux-mm, libhugetlbfs-devel, Andy Whitcroft,
Mel Gorman, Bill Irwin, Ken Chen, Dave McCracken
On Mon, Oct 01, 2007 at 08:17:47AM -0700, Adam Litke wrote:
> This patch simply moves update_and_free_page() so that it can be reused
> later in this patch series. The implementation is not changed.
> Signed-off-by: Adam Litke <agl@us.ibm.com>
> Acked-by: Andy Whitcroft <apw@shadowen.org>
> Acked-by: Dave McCracken <dave.mccracken@oracle.com>
Okay, this one's easy enough.
Acked-by: William Irwin <bill.irwin@oracle.com>
-- wli
--
* [PATCH 2/4] hugetlb: Try to grow hugetlb pool for MAP_PRIVATE mappings
2007-10-01 15:17 [PATCH 0/4] [hugetlb] Dynamic huge page pool resizing V6 Adam Litke
2007-10-01 15:17 ` [PATCH 1/4] hugetlb: Move update_and_free_page Adam Litke
@ 2007-10-01 15:17 ` Adam Litke
2007-10-11 22:09 ` [Libhugetlbfs-devel] " Dave Hansen
2007-10-01 15:18 ` [PATCH 3/4] hugetlb: Try to grow hugetlb pool for MAP_SHARED mappings Adam Litke
` (2 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Adam Litke @ 2007-10-01 15:17 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, libhugetlbfs-devel, Adam Litke, Andy Whitcroft,
Mel Gorman, Bill Irwin, Ken Chen, Dave McCracken
Because we overcommit hugepages for MAP_PRIVATE mappings, it is possible
that the hugetlb pool will be exhausted or completely reserved when a
hugepage is needed to satisfy a page fault. Before killing the process in
this situation, try to allocate a hugepage directly from the buddy
allocator.
The explicitly configured pool size becomes a low watermark. When
dynamically grown, the allocated huge pages are accounted as a surplus over
the watermark. As huge pages are freed on a node, surplus pages are
released to the buddy allocator so that the pool will shrink back to the
watermark.
Surplus accounting also allows for friendlier explicit pool resizing. When
shrinking a pool that is fully in-use, increase the surplus so pages will
be returned to the buddy allocator as soon as they are freed. When growing
a pool that has a surplus, consume the surplus first and then allocate new
pages.
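As an aside, a minimal userspace sketch (not part of this patch; it assumes
root and the /proc interfaces named below, with HugePages_Surp appearing only
once this series is applied) can be used to watch the surplus accounting while
resizing the pool through /proc/sys/vm/nr_hugepages:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void dump_hugepage_counters(void)
{
        char line[128];
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f)
                return;
        while (fgets(line, sizeof(line), f))
                if (!strncmp(line, "HugePages_", 10))
                        fputs(line, stdout);
        fclose(f);
}

int main(int argc, char **argv)
{
        /* Optionally set a new persistent pool size, then dump the counters */
        if (argc == 2) {
                FILE *f = fopen("/proc/sys/vm/nr_hugepages", "w");

                if (!f) {
                        perror("nr_hugepages");
                        return 1;
                }
                fprintf(f, "%lu\n", strtoul(argv[1], NULL, 0));
                fclose(f);
        }
        dump_hugepage_counters();
        return 0;
}

Shrinking the pool below the number of in-use pages should then show
HugePages_Surp rising, and it should fall back toward zero as the huge pages
are freed.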
Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Dave McCracken <dave.mccracken@oracle.com>
---
mm/hugetlb.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 125 insertions(+), 14 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8d3919d..dabe3d6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -23,10 +23,12 @@
const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
static unsigned long nr_huge_pages, free_huge_pages, resv_huge_pages;
+static unsigned long surplus_huge_pages;
unsigned long max_huge_pages;
static struct list_head hugepage_freelists[MAX_NUMNODES];
static unsigned int nr_huge_pages_node[MAX_NUMNODES];
static unsigned int free_huge_pages_node[MAX_NUMNODES];
+static unsigned int surplus_huge_pages_node[MAX_NUMNODES];
static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
unsigned long hugepages_treat_as_movable;
@@ -109,15 +111,57 @@ static void update_and_free_page(struct page *page)
static void free_huge_page(struct page *page)
{
- BUG_ON(page_count(page));
+ int nid = page_to_nid(page);
+ BUG_ON(page_count(page));
INIT_LIST_HEAD(&page->lru);
spin_lock(&hugetlb_lock);
- enqueue_huge_page(page);
+ if (surplus_huge_pages_node[nid]) {
+ update_and_free_page(page);
+ surplus_huge_pages--;
+ surplus_huge_pages_node[nid]--;
+ } else {
+ enqueue_huge_page(page);
+ }
spin_unlock(&hugetlb_lock);
}
+/*
+ * Increment or decrement surplus_huge_pages. Keep node-specific counters
+ * balanced by operating on them in a round-robin fashion.
+ * Returns 1 if an adjustment was made.
+ */
+static int adjust_pool_surplus(int delta)
+{
+ static int prev_nid;
+ int nid = prev_nid;
+ int ret = 0;
+
+ VM_BUG_ON(delta != -1 && delta != 1);
+ do {
+ nid = next_node(nid, node_online_map);
+ if (nid == MAX_NUMNODES)
+ nid = first_node(node_online_map);
+
+ /* To shrink on this node, there must be a surplus page */
+ if (delta < 0 && !surplus_huge_pages_node[nid])
+ continue;
+ /* Surplus cannot exceed the total number of pages */
+ if (delta > 0 && surplus_huge_pages_node[nid] >=
+ nr_huge_pages_node[nid])
+ continue;
+
+ surplus_huge_pages += delta;
+ surplus_huge_pages_node[nid] += delta;
+ ret = 1;
+ break;
+ } while (nid != prev_nid);
+
+ prev_nid = nid;
+ return ret;
+}
+
static int alloc_fresh_huge_page(void)
{
static int prev_nid;
@@ -150,10 +194,30 @@ static int alloc_fresh_huge_page(void)
return 0;
}
+static struct page *alloc_buddy_huge_page(struct vm_area_struct *vma,
+ unsigned long address)
+{
+ struct page *page;
+
+ page = alloc_pages(htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN,
+ HUGETLB_PAGE_ORDER);
+ if (page) {
+ set_compound_page_dtor(page, free_huge_page);
+ spin_lock(&hugetlb_lock);
+ nr_huge_pages++;
+ nr_huge_pages_node[page_to_nid(page)]++;
+ surplus_huge_pages++;
+ surplus_huge_pages_node[page_to_nid(page)]++;
+ spin_unlock(&hugetlb_lock);
+ }
+
+ return page;
+}
+
static struct page *alloc_huge_page(struct vm_area_struct *vma,
unsigned long addr)
{
- struct page *page;
+ struct page *page = NULL;
spin_lock(&hugetlb_lock);
if (vma->vm_flags & VM_MAYSHARE)
@@ -173,7 +237,16 @@ fail:
if (vma->vm_flags & VM_MAYSHARE)
resv_huge_pages++;
spin_unlock(&hugetlb_lock);
- return NULL;
+
+ /*
+ * Private mappings do not use reserved huge pages so the allocation
+ * may have failed due to an undersized hugetlb pool. Try to grab a
+ * surplus huge page from the buddy allocator.
+ */
+ if (!(vma->vm_flags & VM_MAYSHARE))
+ page = alloc_buddy_huge_page(vma, addr);
+
+ return page;
}
static int __init hugetlb_init(void)
@@ -241,26 +314,62 @@ static inline void try_to_free_low(unsigned long count)
}
#endif
+#define persistent_huge_pages (nr_huge_pages - surplus_huge_pages)
static unsigned long set_max_huge_pages(unsigned long count)
{
- while (count > nr_huge_pages) {
- if (!alloc_fresh_huge_page())
- return nr_huge_pages;
- }
- if (count >= nr_huge_pages)
- return nr_huge_pages;
+ unsigned long min_count, ret;
+ /*
+ * Increase the pool size
+ * First take pages out of surplus state. Then make up the
+ * remaining difference by allocating fresh huge pages.
+ */
spin_lock(&hugetlb_lock);
- count = max(count, resv_huge_pages);
- try_to_free_low(count);
- while (count < nr_huge_pages) {
+ while (surplus_huge_pages && count > persistent_huge_pages) {
+ if (!adjust_pool_surplus(-1))
+ break;
+ }
+
+ while (count > persistent_huge_pages) {
+ int ret;
+ /*
+ * If this allocation races such that we no longer need the
+ * page, free_huge_page will handle it by freeing the page
+ * and reducing the surplus.
+ */
+ spin_unlock(&hugetlb_lock);
+ ret = alloc_fresh_huge_page();
+ spin_lock(&hugetlb_lock);
+ if (!ret)
+ goto out;
+
+ }
+ if (count >= persistent_huge_pages)
+ goto out;
+
+ /*
+ * Decrease the pool size
+ * First return free pages to the buddy allocator (being careful
+ * to keep enough around to satisfy reservations). Then place
+ * pages into surplus state as needed so the pool will shrink
+ * to the desired size as pages become free.
+ */
+ min_count = max(count, resv_huge_pages);
+ try_to_free_low(min_count);
+ while (min_count < persistent_huge_pages) {
struct page *page = dequeue_huge_page(NULL, 0);
if (!page)
break;
update_and_free_page(page);
}
+ while (count < persistent_huge_pages) {
+ if (!adjust_pool_surplus(1))
+ break;
+ }
+out:
+ ret = persistent_huge_pages;
spin_unlock(&hugetlb_lock);
- return nr_huge_pages;
+ return ret;
}
int hugetlb_sysctl_handler(struct ctl_table *table, int write,
@@ -292,10 +401,12 @@ int hugetlb_report_meminfo(char *buf)
"HugePages_Total: %5lu\n"
"HugePages_Free: %5lu\n"
"HugePages_Rsvd: %5lu\n"
+ "HugePages_Surp: %5lu\n"
"Hugepagesize: %5lu kB\n",
nr_huge_pages,
free_huge_pages,
resv_huge_pages,
+ surplus_huge_pages,
HPAGE_SIZE/1024);
}
--
* Re: [Libhugetlbfs-devel] [PATCH 2/4] hugetlb: Try to grow hugetlb pool for MAP_PRIVATE mappings
2007-10-01 15:17 ` [PATCH 2/4] hugetlb: Try to grow hugetlb pool for MAP_PRIVATE mappings Adam Litke
@ 2007-10-11 22:09 ` Dave Hansen
2007-10-12 12:50 ` Mel Gorman
0 siblings, 1 reply; 9+ messages in thread
From: Dave Hansen @ 2007-10-11 22:09 UTC (permalink / raw)
To: Adam Litke
Cc: Andrew Morton, libhugetlbfs-devel, Dave McCracken, linux-mm,
Mel Gorman, Ken Chen, Andy Whitcroft, Bill Irwin
On Mon, 2007-10-01 at 08:17 -0700, Adam Litke wrote:
>
> spin_lock(&hugetlb_lock);
> - enqueue_huge_page(page);
> + if (surplus_huge_pages_node[nid]) {
> + update_and_free_page(page);
> + surplus_huge_pages--;
> + surplus_huge_pages_node[nid]--;
> + } else {
> + enqueue_huge_page(page);
> + }
> spin_unlock(&hugetlb_lock);
> }
Why does it matter that these surplus pages are tracked per-node?
-- Dave
--
* Re: [Libhugetlbfs-devel] [PATCH 2/4] hugetlb: Try to grow hugetlb pool for MAP_PRIVATE mappings
2007-10-11 22:09 ` [Libhugetlbfs-devel] " Dave Hansen
@ 2007-10-12 12:50 ` Mel Gorman
0 siblings, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2007-10-12 12:50 UTC (permalink / raw)
To: Dave Hansen
Cc: Adam Litke, Andrew Morton, libhugetlbfs-devel, Dave McCracken,
linux-mm, Ken Chen, Andy Whitcroft, Bill Irwin
On (11/10/07 15:09), Dave Hansen didst pronounce:
> On Mon, 2007-10-01 at 08:17 -0700, Adam Litke wrote:
> >
> > spin_lock(&hugetlb_lock);
> > - enqueue_huge_page(page);
> > + if (surplus_huge_pages_node[nid]) {
> > + update_and_free_page(page);
> > + surplus_huge_pages--;
> > + surplus_huge_pages_node[nid]--;
> > + } else {
> > + enqueue_huge_page(page);
> > + }
> > spin_unlock(&hugetlb_lock);
> > }
>
> Why does it matter that these surplus pages are tracked per-node?
>
Because presumably one does not want to end up in a situation whereby
the pools are initially filled with balanced nodes for MPOL_INTERLEAVE
and then get screwed up by dynamic pool resizing. The per-node surplus
counting should ensure the node balancing remains the same.
(I have not verified this is the case, it just makes sense)
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--
* [PATCH 3/4] hugetlb: Try to grow hugetlb pool for MAP_SHARED mappings
2007-10-01 15:17 [PATCH 0/4] [hugetlb] Dynamic huge page pool resizing V6 Adam Litke
2007-10-01 15:17 ` [PATCH 1/4] hugetlb: Move update_and_free_page Adam Litke
2007-10-01 15:17 ` [PATCH 2/4] hugetlb: Try to grow hugetlb pool for MAP_PRIVATE mappings Adam Litke
@ 2007-10-01 15:18 ` Adam Litke
2007-10-01 15:18 ` [PATCH 4/4] hugetlb: Add hugetlb_dynamic_pool sysctl Adam Litke
2007-10-02 9:41 ` [PATCH 0/4] [hugetlb] Dynamic huge page pool resizing V6 Bill Irwin
4 siblings, 0 replies; 9+ messages in thread
From: Adam Litke @ 2007-10-01 15:18 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, libhugetlbfs-devel, Adam Litke, Andy Whitcroft,
Mel Gorman, Bill Irwin, Ken Chen, Dave McCracken
Shared mappings require special handling because the huge pages needed to
fully populate the VMA must be reserved at mmap time. If not enough pages
are available when making the reservation, allocate all of the shortfall at
once from the buddy allocator and add the pages directly to the hugetlb
pool. If they cannot be allocated, then fail the mapping. The page
surplus is accounted for in the same way as for private mappings; faulted
surplus pages will be freed at unmap time. Reserved surplus pages that were
never used must be freed separately when their reservation is released.
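For illustration only (not part of the patch): a MAP_SHARED mapping of a
hugetlbfs file exercises this path, since the reservation -- and, with this
patch, any surplus allocation needed to back it -- is made at mmap() time
rather than at fault time. The mount point /mnt/huge and the 2MB huge page
size below are assumptions for the example:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE      (2UL * 1024 * 1024)     /* assumed huge page size */
#define MAP_LENGTH      (4 * HPAGE_SIZE)

int main(void)
{
        void *p;
        int fd = open("/mnt/huge/resv-test", O_CREAT | O_RDWR, 0600);

        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* The reservation is made here; ENOMEM means it could not be met */
        p = mmap(NULL, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        memset(p, 0, MAP_LENGTH);       /* faults consume the reserved pages */
        munmap(p, MAP_LENGTH);
        close(fd);
        unlink("/mnt/huge/resv-test");
        return 0;
}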
Signed-off-by: Adam Litke <agl@us.ibm.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Dave McCracken <dave.mccracken@oracle.com>
---
mm/hugetlb.c | 155 +++++++++++++++++++++++++++++++++++++++++++++++++---------
1 files changed, 132 insertions(+), 23 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dabe3d6..aa945c2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -87,6 +87,8 @@ static struct page *dequeue_huge_page(struct vm_area_struct *vma,
list_del(&page->lru);
free_huge_pages--;
free_huge_pages_node[nid]--;
+ if (vma && vma->vm_flags & VM_MAYSHARE)
+ resv_huge_pages--;
break;
}
}
@@ -214,15 +216,116 @@ static struct page *alloc_buddy_huge_page(struct vm_area_struct *vma,
return page;
}
+/*
+ * Increase the hugetlb pool such that it can accommodate a reservation
+ * of size 'delta'.
+ */
+static int gather_surplus_pages(int delta)
+{
+ struct list_head surplus_list;
+ struct page *page, *tmp;
+ int ret, i;
+ int needed, allocated;
+
+ needed = (resv_huge_pages + delta) - free_huge_pages;
+ if (needed <= 0)
+ return 0;
+
+ allocated = 0;
+ INIT_LIST_HEAD(&surplus_list);
+
+ ret = -ENOMEM;
+retry:
+ spin_unlock(&hugetlb_lock);
+ for (i = 0; i < needed; i++) {
+ page = alloc_buddy_huge_page(NULL, 0);
+ if (!page) {
+ /*
+ * We were not able to allocate enough pages to
+ * satisfy the entire reservation so we free what
+ * we've allocated so far.
+ */
+ spin_lock(&hugetlb_lock);
+ needed = 0;
+ goto free;
+ }
+
+ list_add(&page->lru, &surplus_list);
+ }
+ allocated += needed;
+
+ /*
+ * After retaking hugetlb_lock, we need to recalculate 'needed'
+ * because either resv_huge_pages or free_huge_pages may have changed.
+ */
+ spin_lock(&hugetlb_lock);
+ needed = (resv_huge_pages + delta) - (free_huge_pages + allocated);
+ if (needed > 0)
+ goto retry;
+
+ /*
+ * The surplus_list now contains _at_least_ the number of extra pages
+ * needed to accommodate the reservation. Add the appropriate number
+ * of pages to the hugetlb pool and free the extras back to the buddy
+ * allocator.
+ */
+ needed += allocated;
+ ret = 0;
+free:
+ list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
+ list_del(&page->lru);
+ if ((--needed) >= 0)
+ enqueue_huge_page(page);
+ else
+ update_and_free_page(page);
+ }
+
+ return ret;
+}
+
+/*
+ * When releasing a hugetlb pool reservation, any surplus pages that were
+ * allocated to satisfy the reservation must be explicitly freed if they were
+ * never used.
+ */
+void return_unused_surplus_pages(unsigned long unused_resv_pages)
+{
+ static int nid = -1;
+ struct page *page;
+ unsigned long nr_pages;
+
+ nr_pages = min(unused_resv_pages, surplus_huge_pages);
+
+ while (nr_pages) {
+ nid = next_node(nid, node_online_map);
+ if (nid == MAX_NUMNODES)
+ nid = first_node(node_online_map);
+
+ if (!surplus_huge_pages_node[nid])
+ continue;
+
+ if (!list_empty(&hugepage_freelists[nid])) {
+ page = list_entry(hugepage_freelists[nid].next,
+ struct page, lru);
+ list_del(&page->lru);
+ update_and_free_page(page);
+ free_huge_pages--;
+ free_huge_pages_node[nid]--;
+ surplus_huge_pages--;
+ surplus_huge_pages_node[nid]--;
+ nr_pages--;
+ }
+ }
+}
+
static struct page *alloc_huge_page(struct vm_area_struct *vma,
unsigned long addr)
{
struct page *page = NULL;
+ int use_reserved_page = vma->vm_flags & VM_MAYSHARE;
spin_lock(&hugetlb_lock);
- if (vma->vm_flags & VM_MAYSHARE)
- resv_huge_pages--;
- else if (free_huge_pages <= resv_huge_pages)
+ if (!use_reserved_page && (free_huge_pages <= resv_huge_pages))
goto fail;
page = dequeue_huge_page(vma, addr);
@@ -234,8 +337,6 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
return page;
fail:
- if (vma->vm_flags & VM_MAYSHARE)
- resv_huge_pages++;
spin_unlock(&hugetlb_lock);
/*
@@ -243,7 +344,7 @@ fail:
* may have failed due to an undersized hugetlb pool. Try to grab a
* surplus huge page from the buddy allocator.
*/
- if (!(vma->vm_flags & VM_MAYSHARE))
+ if (!use_reserved_page)
page = alloc_buddy_huge_page(vma, addr);
return page;
@@ -952,21 +1053,6 @@ static int hugetlb_acct_memory(long delta)
int ret = -ENOMEM;
spin_lock(&hugetlb_lock);
- if ((delta + resv_huge_pages) <= free_huge_pages) {
- resv_huge_pages += delta;
- ret = 0;
- }
- spin_unlock(&hugetlb_lock);
- return ret;
-}
-
-int hugetlb_reserve_pages(struct inode *inode, long from, long to)
-{
- long ret, chg;
-
- chg = region_chg(&inode->i_mapping->private_list, from, to);
- if (chg < 0)
- return chg;
/*
* When cpuset is configured, it breaks the strict hugetlb page
* reservation as the accounting is done on a global variable. Such
@@ -984,8 +1070,31 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to)
* a best attempt and hopefully to minimize the impact of changing
* semantics that cpuset has.
*/
- if (chg > cpuset_mems_nr(free_huge_pages_node))
- return -ENOMEM;
+ if (delta > 0) {
+ if (gather_surplus_pages(delta) < 0)
+ goto out;
+
+ if (delta > cpuset_mems_nr(free_huge_pages_node))
+ goto out;
+ }
+
+ ret = 0;
+ resv_huge_pages += delta;
+ if (delta < 0)
+ return_unused_surplus_pages((unsigned long) -delta);
+
+out:
+ spin_unlock(&hugetlb_lock);
+ return ret;
+}
+
+int hugetlb_reserve_pages(struct inode *inode, long from, long to)
+{
+ long ret, chg;
+
+ chg = region_chg(&inode->i_mapping->private_list, from, to);
+ if (chg < 0)
+ return chg;
ret = hugetlb_acct_memory(chg);
if (ret < 0)
--
* [PATCH 4/4] hugetlb: Add hugetlb_dynamic_pool sysctl
2007-10-01 15:17 [PATCH 0/4] [hugetlb] Dynamic huge page pool resizing V6 Adam Litke
` (2 preceding siblings ...)
2007-10-01 15:18 ` [PATCH 3/4] hugetlb: Try to grow hugetlb pool for MAP_SHARED mappings Adam Litke
@ 2007-10-01 15:18 ` Adam Litke
2007-10-02 9:41 ` [PATCH 0/4] [hugetlb] Dynamic huge page pool resizing V6 Bill Irwin
4 siblings, 0 replies; 9+ messages in thread
From: Adam Litke @ 2007-10-01 15:18 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, libhugetlbfs-devel, Adam Litke, Andy Whitcroft,
Mel Gorman, Bill Irwin, Ken Chen, Dave McCracken
The maximum size of the huge page pool can be controlled using the overall
size of the hugetlb filesystem (via its 'size' mount option). However in
the common case this will not be set, as the pool is traditionally fixed
in size at boot time. In order to maintain the expected semantics, we need
to prevent the pool from expanding by default.
This patch introduces a new sysctl controlling dynamic pool resizing. When
this is enabled the pool will expand beyond its base size up to the size of
the hugetlb filesystem. It is disabled by default.
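For completeness, an illustrative snippet (not part of the patch) that turns
the new sysctl on; it is equivalent to running
echo 1 > /proc/sys/vm/hugetlb_dynamic_pool as root:

#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/vm/hugetlb_dynamic_pool", "w");

        if (!f) {
                perror("hugetlb_dynamic_pool");
                return 1;
        }
        fputs("1\n", f);
        return fclose(f) ? 1 : 0;
}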
Signed-off-by: Adam Litke <agl@us.ibm.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Dave McCracken <dave.mccracken@oracle.com>
---
include/linux/hugetlb.h | 1 +
kernel/sysctl.c | 8 ++++++++
mm/hugetlb.c | 5 +++++
3 files changed, 14 insertions(+), 0 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3a19b03..ea0f50b 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -33,6 +33,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
extern unsigned long max_huge_pages;
extern unsigned long hugepages_treat_as_movable;
+extern int hugetlb_dynamic_pool;
extern const unsigned long hugetlb_zero, hugetlb_infinity;
extern int sysctl_hugetlb_shm_group;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index b11d22b..56c2949 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -885,6 +885,14 @@ static struct ctl_table vm_table[] = {
.mode = 0644,
.proc_handler = &hugetlb_treat_movable_handler,
},
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "hugetlb_dynamic_pool",
+ .data = &hugetlb_dynamic_pool,
+ .maxlen = sizeof(hugetlb_dynamic_pool),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
#endif
{
.ctl_name = VM_LOWMEM_RESERVE_RATIO,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index aa945c2..8c122bc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -31,6 +31,7 @@ static unsigned int free_huge_pages_node[MAX_NUMNODES];
static unsigned int surplus_huge_pages_node[MAX_NUMNODES];
static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
unsigned long hugepages_treat_as_movable;
+int hugetlb_dynamic_pool;
/*
* Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
@@ -201,6 +202,10 @@ static struct page *alloc_buddy_huge_page(struct vm_area_struct *vma,
{
struct page *page;
+ /* Check if the dynamic pool is enabled */
+ if (!hugetlb_dynamic_pool)
+ return NULL;
+
page = alloc_pages(htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN,
HUGETLB_PAGE_ORDER);
if (page) {
--
* Re: [PATCH 0/4] [hugetlb] Dynamic huge page pool resizing V6
2007-10-01 15:17 [PATCH 0/4] [hugetlb] Dynamic huge page pool resizing V6 Adam Litke
` (3 preceding siblings ...)
2007-10-01 15:18 ` [PATCH 4/4] hugetlb: Add hugetlb_dynamic_pool sysctl Adam Litke
@ 2007-10-02 9:41 ` Bill Irwin
4 siblings, 0 replies; 9+ messages in thread
From: Bill Irwin @ 2007-10-02 9:41 UTC (permalink / raw)
To: Adam Litke
Cc: Andrew Morton, linux-mm, libhugetlbfs-devel, Andy Whitcroft,
Mel Gorman, Bill Irwin, Ken Chen, Dave McCracken
On Mon, Oct 01, 2007 at 08:17:36AM -0700, Adam Litke wrote:
> This release contains no significant changes to any of the patches. I have
> been running regression and performance tests on a variety of machines and
> configurations. Andrew, these patches relax restrictions related to sizing the
> hugetlb pool. The patches have reached stability in content, function, and
> performance and I believe they are ready for wider testing. Please consider
> for merging into -mm. I have included performance results at the end of this
> mail.
I very much like the concept and am impressed with the testing level.
I'll ack the individual patches as I get to them.
-- wli
--