linux-mm.kvack.org archive mirror
* [PATCH 0/5] hugetlb: Fix up filesystem quota accounting
@ 2007-10-30 20:45 Adam Litke
  2007-10-30 20:46 ` [PATCH 1/5] hugetlb: Split alloc_huge_page into private and shared components Adam Litke
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Adam Litke @ 2007-10-30 20:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Ken Chen, Andy Whitcroft, Dave Hansen,
	Adam Litke


Changelog:
24 Oct 2007 - Initial Post
30 Oct 2007 - V2
 Changes in response to review
 * Split alloc_huge_page() into private/shared components
 * Store mapping pointer in page->private
 * Consolidated {get|put} quota calls to {alloc|free}_huge_page

Hugetlbfs implements a quota system which can limit the amount of memory that
can be used by the filesystem.  Before allocating a new huge page for a file,
the quota is checked and debited.  The quota is then credited when truncating
the file.  I found a few bugs in the code for both MAP_PRIVATE and MAP_SHARED
mappings.  Before detailing the problems and my proposed solutions, we should
agree on a definition of quotas that properly addresses both private and shared
pages.  Since the purpose of quotas is to limit total memory consumption on a
per-filesystem basis, I argue that all pages allocated by the fs (private and
shared) should be charged against quota.

Private Mappings
================
The current code will debit quota for private pages sometimes, but will never
credit it.  At a minimum, this leaks quota, which renders the accounting
essentially useless.  Shared pages have a one-to-one mapping with a hugetlbfs
file and are easy to account for by debiting on allocation and crediting on
truncate.  Private pages are anonymous in nature and have a many-to-one
relationship with their hugetlbfs files (due to copy on write).  Because
private pages are not indexed by the mapping's radix tree, their quota cannot
be credited at file truncation time.  Crediting must instead be done when the
page is unmapped and freed.

Shared Pages
============
I discovered an issue concerning the interaction between the MAP_SHARED
reservation system and quotas.  Since quota is not checked until page
instantiation, an over-quota mmap/reservation will initially succeed.  When
instantiating the first over-quota page, the program will receive SIGBUS.  This
is inconsistent since the reservation is supposed to be a guarantee.  The
solution is to debit the full amount of quota at reservation time and credit
the unused portion when the reservation is released.

This patch series brings quotas back in line by making the following
modifications:
 * Private pages
   - Debit quota in alloc_huge_page()
   - Credit quota in free_huge_page()
 * Shared pages
   - Debit quota for entire reservation at mmap time
   - Credit quota for instantiated pages in free_huge_page()
   - Credit quota for unused reservation at munmap time

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/5] hugetlb: Split alloc_huge_page into private and shared components
  2007-10-30 20:45 [PATCH 0/5] hugetlb: Fix up filesystem quota accounting Adam Litke
@ 2007-10-30 20:46 ` Adam Litke
  2007-10-30 20:46 ` [PATCH 2/5] hugetlb: Fix quota management for private mappings Adam Litke
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Adam Litke @ 2007-10-30 20:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Ken Chen, Andy Whitcroft, Dave Hansen,
	Adam Litke

The shared page reservation and dynamic pool resizing features have made
the allocation of private vs. shared huge pages quite different.  By
splitting out the private/shared-specific portions of the process into
their own functions, readability is greatly improved.  alloc_huge_page now
calls the proper helper and performs common operations.

Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 mm/hugetlb.c |   46 +++++++++++++++++++++++++++-------------------
 1 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ae2959b..d319e8e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -353,35 +353,43 @@ void return_unused_surplus_pages(unsigned long unused_resv_pages)
 	}
 }
 
-static struct page *alloc_huge_page(struct vm_area_struct *vma,
-				    unsigned long addr)
+
+static struct page *alloc_huge_page_shared(struct vm_area_struct *vma,
+			 			unsigned long addr)
 {
-	struct page *page = NULL;
-	int use_reserved_page = vma->vm_flags & VM_MAYSHARE;
+	struct page *page;
 
 	spin_lock(&hugetlb_lock);
-	if (!use_reserved_page && (free_huge_pages <= resv_huge_pages))
-		goto fail;
-
 	page = dequeue_huge_page(vma, addr);
-	if (!page)
-		goto fail;
-
 	spin_unlock(&hugetlb_lock);
-	set_page_refcounted(page);
 	return page;
+}
 
-fail:
-	spin_unlock(&hugetlb_lock);
+static struct page *alloc_huge_page_private(struct vm_area_struct *vma,
+			 			unsigned long addr)
+{
+	struct page *page = NULL;
 
-	/*
-	 * Private mappings do not use reserved huge pages so the allocation
-	 * may have failed due to an undersized hugetlb pool.  Try to grab a
-	 * surplus huge page from the buddy allocator.
-	 */
-	if (!use_reserved_page)
+	spin_lock(&hugetlb_lock);
+	if (free_huge_pages > resv_huge_pages)
+		page = dequeue_huge_page(vma, addr);
+	spin_unlock(&hugetlb_lock);
+	if (!page)
 		page = alloc_buddy_huge_page(vma, addr);
+	return page;
+}
 
+static struct page *alloc_huge_page(struct vm_area_struct *vma,
+				    unsigned long addr)
+{
+	struct page *page;
+
+	if (vma->vm_flags & VM_MAYSHARE)
+		page = alloc_huge_page_shared(vma, addr);
+	else
+		page = alloc_huge_page_private(vma, addr);
+	if (page)
+		set_page_refcounted(page);
 	return page;
 }
 


* [PATCH 2/5] hugetlb: Fix quota management for private mappings
  2007-10-30 20:45 [PATCH 0/5] hugetlb: Fix up filesystem quota accounting Adam Litke
  2007-10-30 20:46 ` [PATCH 1/5] hugetlb: Split alloc_huge_page into private and shared components Adam Litke
@ 2007-10-30 20:46 ` Adam Litke
  2007-10-30 23:22   ` Andrew Morton
  2007-10-30 20:46 ` [PATCH 3/5] hugetlb: Debit quota in alloc_huge_page Adam Litke
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Adam Litke @ 2007-10-30 20:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Ken Chen, Andy Whitcroft, Dave Hansen,
	Adam Litke

The hugetlbfs quota management system was never taught to handle
MAP_PRIVATE mappings when that support was added.  Currently, quota is
debited at page instantiation and credited at file truncation.  This
approach works correctly for shared pages but is incomplete for private
pages.  In addition to hugetlb_no_page(), private pages can be instantiated
by hugetlb_cow(); but this function does not respect quotas.

Private huge pages are treated very much like normal, anonymous pages.
They are not "backed" by the hugetlbfs file and are not stored in the
mapping's radix tree.  This means that private pages are invisible to
truncate_hugepages() so that function will not credit the quota.

This patch (based on a prototype provided by Ken Chen) moves quota
crediting for all pages into free_huge_page().  page->private is used to
store a pointer to the mapping to which this page belongs.  This is used to
credit quota on the appropriate hugetlbfs instance.

Signed-off-by: Adam Litke <agl@us.ibm.com>
CC: Ken Chen <kenchen@google.com>
---

 fs/hugetlbfs/inode.c |    1 -
 mm/hugetlb.c         |   13 ++++++++++---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 04598e1..5f4e888 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -364,7 +364,6 @@ static void truncate_hugepages(struct inode *inode, loff_t lstart)
 			++next;
 			truncate_huge_page(page);
 			unlock_page(page);
-			hugetlb_put_quota(mapping);
 			freed++;
 		}
 		huge_pagevec_release(&pvec);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d319e8e..0b09ef2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -116,7 +116,9 @@ static void update_and_free_page(struct page *page)
 static void free_huge_page(struct page *page)
 {
 	int nid = page_to_nid(page);
+	struct address_space *mapping;
 
+	mapping = (struct address_space *) page_private(page);
 	BUG_ON(page_count(page));
 	INIT_LIST_HEAD(&page->lru);
 
@@ -129,6 +131,9 @@ static void free_huge_page(struct page *page)
 		enqueue_huge_page(page);
 	}
 	spin_unlock(&hugetlb_lock);
+	if (mapping)
+		hugetlb_put_quota(mapping);
+	set_page_private(page, 0);
 }
 
 /*
@@ -388,8 +393,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		page = alloc_huge_page_shared(vma, addr);
 	else
 		page = alloc_huge_page_private(vma, addr);
-	if (page)
+	if (page) {
 		set_page_refcounted(page);
+		set_page_private(page, (unsigned long) vma->vm_file->f_mapping);
+	}
 	return page;
 }
 
@@ -730,6 +737,8 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 		set_huge_ptep_writable(vma, address, ptep);
 		return 0;
 	}
+	if (hugetlb_get_quota(vma->vm_file->f_mapping))
+		return VM_FAULT_SIGBUS;
 
 	page_cache_get(old_page);
 	new_page = alloc_huge_page(vma, address);
@@ -796,7 +805,6 @@ retry:
 			err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);
 			if (err) {
 				put_page(page);
-				hugetlb_put_quota(mapping);
 				if (err == -EEXIST)
 					goto retry;
 				goto out;
@@ -830,7 +838,6 @@ out:
 
 backout:
 	spin_unlock(&mm->page_table_lock);
-	hugetlb_put_quota(mapping);
 	unlock_page(page);
 	put_page(page);
 	goto out;


* [PATCH 3/5] hugetlb: Debit quota in alloc_huge_page
  2007-10-30 20:45 [PATCH 0/5] hugetlb: Fix up filesystem quota accounting Adam Litke
  2007-10-30 20:46 ` [PATCH 1/5] hugetlb: Split alloc_huge_page into private and shared components Adam Litke
  2007-10-30 20:46 ` [PATCH 2/5] hugetlb: Fix quota management for private mappings Adam Litke
@ 2007-10-30 20:46 ` Adam Litke
  2007-10-30 20:46 ` [PATCH 4/5] hugetlb: Allow bulk updating in hugetlb_*_quota() Adam Litke
  2007-10-30 20:46 ` [PATCH 5/5] hugetlb: Enforce quotas during reservation for shared mappings Adam Litke
  4 siblings, 0 replies; 13+ messages in thread
From: Adam Litke @ 2007-10-30 20:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Ken Chen, Andy Whitcroft, Dave Hansen,
	Adam Litke

Now that quota is credited by free_huge_page(), calls to
hugetlb_get_quota() seem out of place.  The alloc/free API is unbalanced
because we handle the hugetlb_put_quota() but expect the caller to
open-code hugetlb_get_quota().  Move the get inside alloc_huge_page to
clean up this disparity.

This patch has been kept apart from the previous patch because of the
somewhat dodgy ERR_PTR() use herein.  Moving the quota logic means that
alloc_huge_page() has two failure modes.  Quota failure must result in a
SIGBUS while a standard allocation failure is OOM.  Unfortunately,
ERR_PTR() doesn't like the small positive errnos we have in VM_FAULT_* so
they must be negated before they are used.

Does anyone take issue with the way I am using PTR_ERR?  If so, what are
your thoughts on how to clean this up (without needing an if/else-if/else
block at each alloc_huge_page() call site)?

Signed-off-by: Adam Litke <agl@us.ibm.com>

---

 mm/hugetlb.c |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0b09ef2..5eacee8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -388,6 +388,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 				    unsigned long addr)
 {
 	struct page *page;
+	struct address_space *mapping = vma->vm_file->f_mapping;
+
+	if (hugetlb_get_quota(mapping))
+		return ERR_PTR(-VM_FAULT_SIGBUS);
 
 	if (vma->vm_flags & VM_MAYSHARE)
 		page = alloc_huge_page_shared(vma, addr);
@@ -395,9 +399,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		page = alloc_huge_page_private(vma, addr);
 	if (page) {
 		set_page_refcounted(page);
-		set_page_private(page, (unsigned long) vma->vm_file->f_mapping);
-	}
-	return page;
+		set_page_private(page, (unsigned long) mapping);
+		return page;
+	} else
+		return ERR_PTR(-VM_FAULT_OOM);
 }
 
 static int __init hugetlb_init(void)
@@ -737,15 +742,13 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 		set_huge_ptep_writable(vma, address, ptep);
 		return 0;
 	}
-	if (hugetlb_get_quota(vma->vm_file->f_mapping))
-		return VM_FAULT_SIGBUS;
 
 	page_cache_get(old_page);
 	new_page = alloc_huge_page(vma, address);
 
-	if (!new_page) {
+	if (IS_ERR(new_page)) {
 		page_cache_release(old_page);
-		return VM_FAULT_OOM;
+		return -PTR_ERR(new_page);
 	}
 
 	spin_unlock(&mm->page_table_lock);
@@ -789,12 +792,9 @@ retry:
 		size = i_size_read(mapping->host) >> HPAGE_SHIFT;
 		if (idx >= size)
 			goto out;
-		if (hugetlb_get_quota(mapping))
-			goto out;
 		page = alloc_huge_page(vma, address);
-		if (!page) {
-			hugetlb_put_quota(mapping);
-			ret = VM_FAULT_OOM;
+		if (IS_ERR(page)) {
+			ret = -PTR_ERR(page);
 			goto out;
 		}
 		clear_huge_page(page, address);


* [PATCH 4/5] hugetlb: Allow bulk updating in hugetlb_*_quota()
  2007-10-30 20:45 [PATCH 0/5] hugetlb: Fix up filesystem quota accounting Adam Litke
                   ` (2 preceding siblings ...)
  2007-10-30 20:46 ` [PATCH 3/5] hugetlb: Debit quota in alloc_huge_page Adam Litke
@ 2007-10-30 20:46 ` Adam Litke
  2007-10-30 20:46 ` [PATCH 5/5] hugetlb: Enforce quotas during reservation for shared mappings Adam Litke
  4 siblings, 0 replies; 13+ messages in thread
From: Adam Litke @ 2007-10-30 20:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Ken Chen, Andy Whitcroft, Dave Hansen,
	Adam Litke

Add a second parameter 'delta' to hugetlb_get_quota and hugetlb_put_quota
to allow bulk updating of the sbinfo->free_blocks counter.  This will be
used by the next patch in the series.

Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 fs/hugetlbfs/inode.c    |   10 +++++-----
 include/linux/hugetlb.h |    4 ++--
 mm/hugetlb.c            |    4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 5f4e888..449ba8b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -858,15 +858,15 @@ out_free:
 	return -ENOMEM;
 }
 
-int hugetlb_get_quota(struct address_space *mapping)
+int hugetlb_get_quota(struct address_space *mapping, long delta)
 {
 	int ret = 0;
 	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
 
 	if (sbinfo->free_blocks > -1) {
 		spin_lock(&sbinfo->stat_lock);
-		if (sbinfo->free_blocks > 0)
-			sbinfo->free_blocks--;
+		if (sbinfo->free_blocks - delta >= 0)
+			sbinfo->free_blocks -= delta;
 		else
 			ret = -ENOMEM;
 		spin_unlock(&sbinfo->stat_lock);
@@ -875,13 +875,13 @@ int hugetlb_get_quota(struct address_space *mapping)
 	return ret;
 }
 
-void hugetlb_put_quota(struct address_space *mapping)
+void hugetlb_put_quota(struct address_space *mapping, long delta)
 {
 	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
 
 	if (sbinfo->free_blocks > -1) {
 		spin_lock(&sbinfo->stat_lock);
-		sbinfo->free_blocks++;
+		sbinfo->free_blocks += delta;
 		spin_unlock(&sbinfo->stat_lock);
 	}
 }
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ea0f50b..770dbed 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -165,8 +165,8 @@ static inline struct hugetlbfs_sb_info *HUGETLBFS_SB(struct super_block *sb)
 extern const struct file_operations hugetlbfs_file_operations;
 extern struct vm_operations_struct hugetlb_vm_ops;
 struct file *hugetlb_file_setup(const char *name, size_t);
-int hugetlb_get_quota(struct address_space *mapping);
-void hugetlb_put_quota(struct address_space *mapping);
+int hugetlb_get_quota(struct address_space *mapping, long delta);
+void hugetlb_put_quota(struct address_space *mapping, long delta);
 
 static inline int is_file_hugepages(struct file *file)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5eacee8..deba411 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -132,7 +132,7 @@ static void free_huge_page(struct page *page)
 	}
 	spin_unlock(&hugetlb_lock);
 	if (mapping)
-		hugetlb_put_quota(mapping);
+		hugetlb_put_quota(mapping, 1);
 	set_page_private(page, 0);
 }
 
@@ -390,7 +390,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	struct page *page;
 	struct address_space *mapping = vma->vm_file->f_mapping;
 
-	if (hugetlb_get_quota(mapping))
+	if (hugetlb_get_quota(mapping, 1))
 		return ERR_PTR(-VM_FAULT_SIGBUS);
 
 	if (vma->vm_flags & VM_MAYSHARE)


* [PATCH 5/5] hugetlb: Enforce quotas during reservation for shared mappings
  2007-10-30 20:45 [PATCH 0/5] hugetlb: Fix up filesystem quota accounting Adam Litke
                   ` (3 preceding siblings ...)
  2007-10-30 20:46 ` [PATCH 4/5] hugetlb: Allow bulk updating in hugetlb_*_quota() Adam Litke
@ 2007-10-30 20:46 ` Adam Litke
  4 siblings, 0 replies; 13+ messages in thread
From: Adam Litke @ 2007-10-30 20:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Ken Chen, Andy Whitcroft, Dave Hansen,
	Adam Litke

When a MAP_SHARED mmap of a hugetlbfs file succeeds, huge pages are
reserved to guarantee no problems will occur later when instantiating
pages.  If quotas are in force, page instantiation could fail due to a race
with another process or an oversized (but approved) shared mapping.

To prevent these scenarios, debit the quota for the full reservation amount
up front and credit the unused quota when the reservation is released.

Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 mm/hugetlb.c |   23 +++++++++++++----------
 1 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index deba411..2fc3cd6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -367,7 +367,7 @@ static struct page *alloc_huge_page_shared(struct vm_area_struct *vma,
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page(vma, addr);
 	spin_unlock(&hugetlb_lock);
-	return page;
+	return page ? page : ERR_PTR(-VM_FAULT_OOM);
 }
 
 static struct page *alloc_huge_page_private(struct vm_area_struct *vma,
@@ -375,13 +375,16 @@ static struct page *alloc_huge_page_private(struct vm_area_struct *vma,
 {
 	struct page *page = NULL;
 
+	if (hugetlb_get_quota(vma->vm_file->f_mapping, 1))
+		return ERR_PTR(-VM_FAULT_SIGBUS);
+
 	spin_lock(&hugetlb_lock);
 	if (free_huge_pages > resv_huge_pages)
 		page = dequeue_huge_page(vma, addr);
 	spin_unlock(&hugetlb_lock);
 	if (!page)
 		page = alloc_buddy_huge_page(vma, addr);
-	return page;
+	return page ? page : ERR_PTR(-VM_FAULT_OOM);
 }
 
 static struct page *alloc_huge_page(struct vm_area_struct *vma,
@@ -390,19 +393,16 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	struct page *page;
 	struct address_space *mapping = vma->vm_file->f_mapping;
 
-	if (hugetlb_get_quota(mapping, 1))
-		return ERR_PTR(-VM_FAULT_SIGBUS);
-
 	if (vma->vm_flags & VM_MAYSHARE)
 		page = alloc_huge_page_shared(vma, addr);
 	else
 		page = alloc_huge_page_private(vma, addr);
-	if (page) {
+
+	if (!IS_ERR(page)) {
 		set_page_refcounted(page);
 		set_page_private(page, (unsigned long) mapping);
-		return page;
-	} else
-		return ERR_PTR(-VM_FAULT_OOM);
+	}
+	return page;
 }
 
 static int __init hugetlb_init(void)
@@ -1147,6 +1147,8 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to)
 	if (chg < 0)
 		return chg;
 
+	if (hugetlb_get_quota(inode->i_mapping, chg))
+		return -ENOSPC;
 	ret = hugetlb_acct_memory(chg);
 	if (ret < 0)
 		return ret;
@@ -1157,5 +1159,6 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to)
 void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
 {
 	long chg = region_truncate(&inode->i_mapping->private_list, offset);
-	hugetlb_acct_memory(freed - chg);
+	hugetlb_put_quota(inode->i_mapping, (chg - freed));
+	hugetlb_acct_memory(-(chg - freed));
 }


* Re: [PATCH 2/5] hugetlb: Fix quota management for private mappings
  2007-10-30 20:46 ` [PATCH 2/5] hugetlb: Fix quota management for private mappings Adam Litke
@ 2007-10-30 23:22   ` Andrew Morton
  2007-10-30 23:28     ` Christoph Lameter
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2007-10-30 23:22 UTC (permalink / raw)
  To: Adam Litke
  Cc: linux-mm, linux-kernel, kenchen, apw, haveblue, Christoph Lameter

On Tue, 30 Oct 2007 13:46:15 -0700
Adam Litke <agl@us.ibm.com> wrote:

> 
> The hugetlbfs quota management system was never taught to handle
> MAP_PRIVATE mappings when that support was added.  Currently, quota is
> debited at page instantiation and credited at file truncation.  This
> approach works correctly for shared pages but is incomplete for private
> pages.  In addition to hugetlb_no_page(), private pages can be instantiated
> by hugetlb_cow(); but this function does not respect quotas.
> 
> Private huge pages are treated very much like normal, anonymous pages.
> They are not "backed" by the hugetlbfs file and are not stored in the
> mapping's radix tree.  This means that private pages are invisible to
> truncate_hugepages() so that function will not credit the quota.
> 
> This patch (based on a prototype provided by Ken Chen) moves quota
> crediting for all pages into free_huge_page().  page->private is used to
> store a pointer to the mapping to which this page belongs.  This is used to
> credit quota on the appropriate hugetlbfs instance.
> 

Consuming page.private on hugetlb pages is a noteworthy change.  I'm in
fact surprised that it's still available.

I'd expect that others (eg Christoph?) have designs upon it as well.  We
need to work out if this is the best use we can put it to.


* Re: [PATCH 2/5] hugetlb: Fix quota management for private mappings
  2007-10-30 23:22   ` Andrew Morton
@ 2007-10-30 23:28     ` Christoph Lameter
  2007-10-31 13:26       ` Adam Litke
  2007-10-31 14:54       ` Adam Litke
  0 siblings, 2 replies; 13+ messages in thread
From: Christoph Lameter @ 2007-10-30 23:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Adam Litke, linux-mm, linux-kernel, kenchen, apw, haveblue

On Tue, 30 Oct 2007, Andrew Morton wrote:

> > This patch (based on a prototype provided by Ken Chen) moves quota
> > crediting for all pages into free_huge_page().  page->private is used to
> > store a pointer to the mapping to which this page belongs.  This is used to
> > credit quota on the appropriate hugetlbfs instance.
> > 
> 
> Consuming page.private on hugetlb pages is a noteworthy change.  I'm in
> fact surprised that it's still available.
> 
> I'd expect that others (eg Christoph?) have designs upon it as well.  We
> need to work out if this is the best use we can put it to.

The private pointer in the first page of a compound page is always 
available. However, why do we not use page->mapping for that purpose? 
Could we stay as close as possible to regular page cache field use?



* Re: [PATCH 2/5] hugetlb: Fix quota management for private mappings
  2007-10-30 23:28     ` Christoph Lameter
@ 2007-10-31 13:26       ` Adam Litke
  2007-10-31 14:54       ` Adam Litke
  1 sibling, 0 replies; 13+ messages in thread
From: Adam Litke @ 2007-10-31 13:26 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, linux-mm, linux-kernel, kenchen, apw, haveblue

On Tue, 2007-10-30 at 16:28 -0700, Christoph Lameter wrote:
> On Tue, 30 Oct 2007, Andrew Morton wrote:
> 
> > > This patch (based on a prototype provided by Ken Chen) moves quota
> > > crediting for all pages into free_huge_page().  page->private is used to
> > > store a pointer to the mapping to which this page belongs.  This is used to
> > > credit quota on the appropriate hugetlbfs instance.
> > > 
> > 
> > Consuming page.private on hugetlb pages is a noteworthy change.  I'm in
> > fact surprised that it's still available.
> > 
> > I'd expect that others (eg Christoph?) have designs upon it as well.  We
> > need to work out if this is the best use we can put it to.
> 
> The private pointer in the first page of a compound page is always 
> available. However, why do we not use page->mapping for that purpose? 
> Could we stay as close as possible to regular page cache field use?

We could use page->mapping, but add_to_page_cache() already fills it in
for shared huge pages.  Even though add_to_page_cache and
alloc_huge_page would be storing the same value there, it seems a bit
odd to double-assign it.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center


* Re: [PATCH 2/5] hugetlb: Fix quota management for private mappings
  2007-10-30 23:28     ` Christoph Lameter
  2007-10-31 13:26       ` Adam Litke
@ 2007-10-31 14:54       ` Adam Litke
  2007-10-31 17:33         ` Christoph Lameter
  2007-10-31 18:42         ` Hugh Dickins
  1 sibling, 2 replies; 13+ messages in thread
From: Adam Litke @ 2007-10-31 14:54 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, linux-mm, linux-kernel, kenchen, apw, haveblue

On Tue, 2007-10-30 at 16:28 -0700, Christoph Lameter wrote:
> On Tue, 30 Oct 2007, Andrew Morton wrote:
> 
> > > This patch (based on a prototype provided by Ken Chen) moves quota
> > > crediting for all pages into free_huge_page().  page->private is used to
> > > store a pointer to the mapping to which this page belongs.  This is used to
> > > credit quota on the appropriate hugetlbfs instance.
> > > 
> > 
> > Consuming page.private on hugetlb pages is a noteworthy change.  I'm in
> > fact surprised that it's still available.
> > 
> > I'd expect that others (eg Christoph?) have designs upon it as well.  We
> > need to work out if this is the best use we can put it to.
> 
> The private pointer in the first page of a compound page is always 
> available. However, why do we not use page->mapping for that purpose? 
> Could we stay as close as possible to regular page cache field use?

There is an additional problem I forgot to mention in the previous mail.
The remove_from_page_cache() call path clears page->mapping.  This means
that if the free_huge_page destructor is called on a previously shared
page, we will not have the needed information to release quota.  Perhaps
this is a further indication that use of page->mapping at this level is
inappropriate. 

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center


* Re: [PATCH 2/5] hugetlb: Fix quota management for private mappings
  2007-10-31 14:54       ` Adam Litke
@ 2007-10-31 17:33         ` Christoph Lameter
  2007-10-31 18:25           ` Dave Hansen
  2007-10-31 18:42         ` Hugh Dickins
  1 sibling, 1 reply; 13+ messages in thread
From: Christoph Lameter @ 2007-10-31 17:33 UTC (permalink / raw)
  To: Adam Litke; +Cc: Andrew Morton, linux-mm, linux-kernel, kenchen, apw, haveblue

On Wed, 31 Oct 2007, Adam Litke wrote:

> > The private pointer in the first page of a compound page is always 
> > available. However, why do we not use page->mapping for that purpose? 
> > Could we stay as close as possible to regular page cache field use?
> 
> There is an additional problem I forgot to mention in the previous mail.
> The remove_from_page_cache() call path clears page->mapping.  This means
> that if the free_huge_page destructor is called on a previously shared
> page, we will not have the needed information to release quota.  Perhaps
> this is a further indication that use of page->mapping at this level is
> inappropriate. 

How does quota handle that for regular pages? Can you update the quotas 
before the page is released?


* Re: [PATCH 2/5] hugetlb: Fix quota management for private mappings
  2007-10-31 17:33         ` Christoph Lameter
@ 2007-10-31 18:25           ` Dave Hansen
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Hansen @ 2007-10-31 18:25 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Adam Litke, Andrew Morton, linux-mm, linux-kernel, kenchen, apw

On Wed, 2007-10-31 at 10:33 -0700, Christoph Lameter wrote:
> On Wed, 31 Oct 2007, Adam Litke wrote:
> 
> > > The private pointer in the first page of a compound page is always 
> > > available. However, why do we not use page->mapping for that purpose? 
> > > Could we stay as close as possible to regular page cache field use?
> > 
> > There is an additional problem I forgot to mention in the previous mail.
> > The remove_from_page_cache() call path clears page->mapping.  This means
> > that if the free_huge_page destructor is called on a previously shared
> > page, we will not have the needed information to release quota.  Perhaps
> > this is a further indication that use of page->mapping at this level is
> > inappropriate. 
> 
> How does quota handle that for regular pages? Can you update the quotas 
> before the page is released?

It should happen for normal pages at truncate time, which happens when
i_nlink hits zero.  We also truncate these a whole file at a time.

I think the hugetlbfs problem is that we want to release pages at unmap
(and thus put_page()) time.  If we have an unlinked hugetlbfs file which
has 100 huge pages in it, but only a single huge page still mapped, we
probably don't want to wait around for that last user before freeing
those hundred pages.  That's the case for private pages because their
last ref always comes from the ptes.

Shared pages should be a different matter because their last ref comes
from the page cache (generally, I guess).

So, hugetlbfs has a need to release pages _earlier_ in the process than
a normal page, at least for private mappings.  The complication comes
because the private mappings really aren't file-backed in the sense that
they don't have entries in any address_space, but we still need
file-like inode operations on them.  Gah.

-- Dave



* Re: [PATCH 2/5] hugetlb: Fix quota management for private mappings
  2007-10-31 14:54       ` Adam Litke
  2007-10-31 17:33         ` Christoph Lameter
@ 2007-10-31 18:42         ` Hugh Dickins
  1 sibling, 0 replies; 13+ messages in thread
From: Hugh Dickins @ 2007-10-31 18:42 UTC (permalink / raw)
  To: Adam Litke
  Cc: Christoph Lameter, Andrew Morton, linux-mm, linux-kernel, kenchen,
	apw, haveblue

On Wed, 31 Oct 2007, Adam Litke wrote:
> On Tue, 2007-10-30 at 16:28 -0700, Christoph Lameter wrote:
> > 
> > The private pointer in the first page of a compound page is always 
> > available. However, why do we not use page->mapping for that purpose? 
> > Could we stay as close as possible to regular page cache field use?
> 
> There is an additional problem I forgot to mention in the previous mail.
> The remove_from_page_cache() call path clears page->mapping.  This means
> that if the free_huge_page destructor is called on a previously shared
> page, we will not have the needed information to release quota.  Perhaps
> this is a further indication that use of page->mapping at this level is
> inappropriate. 

Or is it an indication that use of a struct address_space pointer
at this level is inappropriate?  What guarantee do you have at
free_huge_page time, that the memory once used for that struct
address_space is still being used for the same?

Hugh



end of thread, other threads:[~2007-10-31 18:42 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-30 20:45 [PATCH 0/5] hugetlb: Fix up filesystem quota accounting Adam Litke
2007-10-30 20:46 ` [PATCH 1/5] hugetlb: Split alloc_huge_page into private and shared components Adam Litke
2007-10-30 20:46 ` [PATCH 2/5] hugetlb: Fix quota management for private mappings Adam Litke
2007-10-30 23:22   ` Andrew Morton
2007-10-30 23:28     ` Christoph Lameter
2007-10-31 13:26       ` Adam Litke
2007-10-31 14:54       ` Adam Litke
2007-10-31 17:33         ` Christoph Lameter
2007-10-31 18:25           ` Dave Hansen
2007-10-31 18:42         ` Hugh Dickins
2007-10-30 20:46 ` [PATCH 3/5] hugetlb: Debit quota in alloc_huge_page Adam Litke
2007-10-30 20:46 ` [PATCH 4/5] hugetlb: Allow bulk updating in hugetlb_*_quota() Adam Litke
2007-10-30 20:46 ` [PATCH 5/5] hugetlb: Enforce quotas during reservation for shared mappings Adam Litke
