* [PATCH 0/3] Guarantee faults for processes that call mmap(MAP_PRIVATE) on hugetlbfs v4
@ 2008-05-27 18:50 Mel Gorman
From: Mel Gorman @ 2008-05-27 18:50 UTC (permalink / raw)
To: akpm
Cc: Mel Gorman, dean, apw, linux-kernel, wli, dwg, linux-mm, andi,
kenchen, agl, abh
Hi Andrew,
This is a patchset to give reliable behaviour to a process that successfully
calls mmap(MAP_PRIVATE) on a hugetlbfs file. Currently, it is possible for
the process to be killed due to a small hugepage pool size even if it calls
mlock(). More details are below. No objections have been raised in a while
and I believe the series is ready for wider testing. People are cc'd just in
case minds have changed since. Thanks.
Changelog since V3
o Differentiate between a shared pagecache page and a shared parent/child page.
Without the check, a BUG is triggered when an existing hugetlbfs file is
mapped MAP_PRIVATE and the pool is too small.
Changelog since V2
o Rebase to 2.6.26-rc2-mm1
o Document when hugetlb_lock is held for reserve counter updates
o Add vma_has_private_reserves() helper for clarity
MAP_SHARED mappings on hugetlbfs reserve huge pages at mmap() time.
This guarantees all future faults against the mapping will succeed.
This allows pages to be allocated locally at first use, improving NUMA
locality whilst retaining reliability.
MAP_PRIVATE mappings do not reserve pages. This can result in an application
being SIGKILLed later if a huge page is not available at fault time. This
makes huge page usage ill-advised in some cases, as the unexpected application
failure cannot be detected and handled; it is immediately fatal.
Although an application may force instantiation of the pages using mlock(),
this may lead to poor memory placement and the process may still be killed
when performing COW.
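
To make that failure window concrete, the sketch below walks through the same
sequence from userspace. It is illustrative only and is not part of this
series: the hugetlbfs mount point /mnt/huge, the 2MB huge page size and the
four-page length are assumptions, and whether the final loop survives depends
entirely on how many huge pages are free when the parent performs COW.

/*
 * Illustrative only: demonstrates the mlock()-then-fork() window described
 * above. The mount point /mnt/huge and the 2MB huge page size are
 * assumptions; adjust both for the system under test.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define HPAGE_SIZE (2UL * 1024 * 1024)
#define LENGTH     (4 * HPAGE_SIZE)

int main(void)
{
	size_t off;
	int fd = open("/mnt/huge/demo", O_CREAT | O_RDWR, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Step 1: private mapping of a hugetlbfs file */
	char *p = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Step 2: force instantiation of every huge page. Without the
	 * reservation patches, success here proves nothing about later
	 * writes. */
	if (mlock(p, LENGTH) != 0)
		perror("mlock");

	/* Step 3: the child keeps a reference to every instantiated page */
	if (fork() == 0) {
		sleep(10);
		_exit(0);
	}

	/* Step 4: each write now needs a fresh huge page for the COW copy.
	 * If the pool cannot supply one, the writer is killed (step 5). */
	for (off = 0; off < LENGTH; off += HPAGE_SIZE)
		p[off] = 1;

	printf("parent completed its writes\n");
	wait(NULL);
	munlock(p, LENGTH);
	munmap(p, LENGTH);
	close(fd);
	return 0;
}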
This patchset introduces a reliability guarantee for the process which creates
a private mapping, i.e. the process that calls mmap() on a hugetlbfs file
successfully. The first patch of the set is a purely mechanical code move to
make later diffs easier to read. The second patch will guarantee faults up
until the process calls fork(). After patch two, as long as the child keeps
the mappings, the parent is no longer guaranteed to be reliable. Patch
3 guarantees that the parent will always COW successfully by unmapping
the pages from the child in the event there are insufficient pages in the
hugepage pool to allocate a new page, be it via a static or dynamic pool.
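
A rough way to observe the effect of patch 2 is to watch the hugetlb counters
in /proc/meminfo around the mmap() call; with this series applied, a private
mapping should raise HugePages_Rsvd just as a shared one already does. The
helper below only parses /proc/meminfo; the field names are the standard
hugetlb counters, but the expected before/after values are an assumption to
be verified on a patched kernel rather than a guarantee.

/*
 * Minimal helper to read one hugetlb counter from /proc/meminfo, e.g.
 * HugePages_Rsvd or HugePages_Free. Intended for eyeballing the effect of
 * the reservation patches around an mmap() call; not part of the series.
 */
#include <stdio.h>
#include <string.h>

static long hugetlb_counter(const char *name)
{
	char line[128];
	long value = -1;
	FILE *fp = fopen("/proc/meminfo", "r");

	if (!fp)
		return -1;

	while (fgets(line, sizeof(line), fp)) {
		/* Lines look like "HugePages_Rsvd:     0" */
		if (strncmp(line, name, strlen(name)) == 0) {
			sscanf(line + strlen(name), ": %ld", &value);
			break;
		}
	}
	fclose(fp);
	return value;
}

int main(void)
{
	printf("HugePages_Free: %ld\n", hugetlb_counter("HugePages_Free"));
	printf("HugePages_Rsvd: %ld\n", hugetlb_counter("HugePages_Rsvd"));
	return 0;
}

Sampling these two counters immediately before and after the mmap(MAP_PRIVATE)
call in the earlier example, and again after the mapping is torn down, should
be enough to see when reserves are taken and released.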
Existing hugepage-aware applications are unlikely to be affected by this
change. For much of hugetlbfs's history, pages were pre-faulted at mmap()
time or the mmap() failed, which acts in a reserve-like manner. If the pool
is already sized correctly so that parent and child can fault reliably,
the application will not even notice the reserves. It's only when the pool
is too small for the application to function perfectly reliably that the
reserves come into play.
Credit goes to Andy Whitcroft for cleaning up a number of mistakes during
review before the patches were released.
--
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
^ permalink raw reply [flat|nested] 16+ messages in thread* [PATCH 1/3] Move hugetlb_acct_memory() 2008-05-27 18:50 [PATCH 0/3] Guarantee faults for processes that call mmap(MAP_PRIVATE) on hugetlbfs v4 Mel Gorman @ 2008-05-27 18:50 ` Mel Gorman 2008-05-28 13:37 ` Adam Litke 2008-05-27 18:51 ` [PATCH 2/3] Reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork() Mel Gorman 2008-05-27 18:51 ` [PATCH 3/3] Guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed Mel Gorman 2 siblings, 1 reply; 16+ messages in thread From: Mel Gorman @ 2008-05-27 18:50 UTC (permalink / raw) To: akpm Cc: Mel Gorman, dean, linux-kernel, wli, dwg, agl, linux-mm, andi, kenchen, apw, abh A later patch in this set needs to call hugetlb_acct_memory() before it is defined. This patch moves the function without modification. This makes later diffs easier to read. Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- mm/hugetlb.c | 82 +++++++++++++++++++++++++++--------------------------- 1 file changed, 41 insertions(+), 41 deletions(-) diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.26-rc2-mm1-clean/mm/hugetlb.c linux-2.6.26-rc2-mm1-0010-move-hugetlb_acct_memory/mm/hugetlb.c --- linux-2.6.26-rc2-mm1-clean/mm/hugetlb.c 2008-05-27 10:25:54.000000000 +0100 +++ linux-2.6.26-rc2-mm1-0010-move-hugetlb_acct_memory/mm/hugetlb.c 2008-05-27 10:26:13.000000000 +0100 @@ -716,6 +716,47 @@ unsigned long hugetlb_total_pages(void) return nr_huge_pages * (HPAGE_SIZE / PAGE_SIZE); } +static int hugetlb_acct_memory(long delta) +{ + int ret = -ENOMEM; + + spin_lock(&hugetlb_lock); + /* + * When cpuset is configured, it breaks the strict hugetlb page + * reservation as the accounting is done on a global variable. Such + * reservation is completely rubbish in the presence of cpuset because + * the reservation is not checked against page availability for the + * current cpuset. Application can still potentially OOM'ed by kernel + * with lack of free htlb page in cpuset that the task is in. + * Attempt to enforce strict accounting with cpuset is almost + * impossible (or too ugly) because cpuset is too fluid that + * task or memory node can be dynamically moved between cpusets. + * + * The change of semantics for shared hugetlb mapping with cpuset is + * undesirable. However, in order to preserve some of the semantics, + * we fall back to check against current free page availability as + * a best attempt and hopefully to minimize the impact of changing + * semantics that cpuset has. + */ + if (delta > 0) { + if (gather_surplus_pages(delta) < 0) + goto out; + + if (delta > cpuset_mems_nr(free_huge_pages_node)) { + return_unused_surplus_pages(delta); + goto out; + } + } + + ret = 0; + if (delta < 0) + return_unused_surplus_pages((unsigned long) -delta); + +out: + spin_unlock(&hugetlb_lock); + return ret; +} + /* * We cannot handle pagefaults against hugetlb pages at all. They cause * handle_mm_fault() to try to instantiate regular-sized pages in the @@ -1248,47 +1289,6 @@ static long region_truncate(struct list_ return chg; } -static int hugetlb_acct_memory(long delta) -{ - int ret = -ENOMEM; - - spin_lock(&hugetlb_lock); - /* - * When cpuset is configured, it breaks the strict hugetlb page - * reservation as the accounting is done on a global variable. Such - * reservation is completely rubbish in the presence of cpuset because - * the reservation is not checked against page availability for the - * current cpuset. 
Application can still potentially OOM'ed by kernel - * with lack of free htlb page in cpuset that the task is in. - * Attempt to enforce strict accounting with cpuset is almost - * impossible (or too ugly) because cpuset is too fluid that - * task or memory node can be dynamically moved between cpusets. - * - * The change of semantics for shared hugetlb mapping with cpuset is - * undesirable. However, in order to preserve some of the semantics, - * we fall back to check against current free page availability as - * a best attempt and hopefully to minimize the impact of changing - * semantics that cpuset has. - */ - if (delta > 0) { - if (gather_surplus_pages(delta) < 0) - goto out; - - if (delta > cpuset_mems_nr(free_huge_pages_node)) { - return_unused_surplus_pages(delta); - goto out; - } - } - - ret = 0; - if (delta < 0) - return_unused_surplus_pages((unsigned long) -delta); - -out: - spin_unlock(&hugetlb_lock); - return ret; -} - int hugetlb_reserve_pages(struct inode *inode, long from, long to) { long ret, chg; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] Move hugetlb_acct_memory()
From: Adam Litke @ 2008-05-28 13:37 UTC (permalink / raw)
To: Mel Gorman
Cc: akpm, dean, linux-kernel, wli, dwg, linux-mm, andi, kenchen, apw, abh
On Tue, 2008-05-27 at 19:50 +0100, Mel Gorman wrote:
> A later patch in this set needs to call hugetlb_acct_memory() before it
> is defined. This patch moves the function without modification. This makes
> later diffs easier to read.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Adam Litke <agl@us.ibm.com>
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
* [PATCH 2/3] Reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork() 2008-05-27 18:50 [PATCH 0/3] Guarantee faults for processes that call mmap(MAP_PRIVATE) on hugetlbfs v4 Mel Gorman 2008-05-27 18:50 ` [PATCH 1/3] Move hugetlb_acct_memory() Mel Gorman @ 2008-05-27 18:51 ` Mel Gorman 2008-05-28 13:52 ` Adam Litke 2008-05-27 18:51 ` [PATCH 3/3] Guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed Mel Gorman 2 siblings, 1 reply; 16+ messages in thread From: Mel Gorman @ 2008-05-27 18:51 UTC (permalink / raw) To: akpm Cc: Mel Gorman, abh, dean, linux-kernel, wli, dwg, linux-mm, andi, kenchen, agl, apw This patch reserves huge pages at mmap() time for MAP_PRIVATE mappings in a similar manner to the reservations taken for MAP_SHARED mappings. The reserve count is accounted both globally and on a per-VMA basis for private mappings. This guarantees that a process that successfully calls mmap() will successfully fault all pages in the future unless fork() is called. The characteristics of private mappings of hugetlbfs files behaviour after this patch are; 1. The process calling mmap() is guaranteed to succeed all future faults until it forks(). 2. On fork(), the parent may die due to SIGKILL on writes to the private mapping if enough pages are not available for the COW. For reasonably reliable behaviour in the face of a small huge page pool, children of hugepage-aware processes should not reference the mappings; such as might occur when fork()ing to exec(). 3. On fork(), the child VMAs inherit no reserves. Reads on pages already faulted by the parent will succeed. Successful writes will depend on enough huge pages being free in the pool. 4. Quotas of the hugetlbfs mount are checked at reserve time for the mapper and at fault time otherwise. Before this patch, all reads or writes in the child potentially needs page allocations that can later lead to the death of the parent. This applies to reads and writes of uninstantiated pages as well as COW. After the patch it is only a write to an instantiated page that causes problems. 
Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- fs/hugetlbfs/inode.c | 8 +- include/linux/hugetlb.h | 9 ++ kernel/fork.c | 9 ++ mm/hugetlb.c | 158 ++++++++++++++++++++++++++++++++----------- 4 files changed, 140 insertions(+), 44 deletions(-) diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.26-rc2-mm1-0010-move-hugetlb_acct_memory/fs/hugetlbfs/inode.c linux-2.6.26-rc2-mm1-0020-map_private_reserve/fs/hugetlbfs/inode.c --- linux-2.6.26-rc2-mm1-0010-move-hugetlb_acct_memory/fs/hugetlbfs/inode.c 2008-05-12 01:09:41.000000000 +0100 +++ linux-2.6.26-rc2-mm1-0020-map_private_reserve/fs/hugetlbfs/inode.c 2008-05-27 10:26:22.000000000 +0100 @@ -103,9 +103,9 @@ static int hugetlbfs_file_mmap(struct fi ret = -ENOMEM; len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); - if (vma->vm_flags & VM_MAYSHARE && - hugetlb_reserve_pages(inode, vma->vm_pgoff >> (HPAGE_SHIFT-PAGE_SHIFT), - len >> HPAGE_SHIFT)) + if (hugetlb_reserve_pages(inode, + vma->vm_pgoff >> (HPAGE_SHIFT-PAGE_SHIFT), + len >> HPAGE_SHIFT, vma)) goto out; ret = 0; @@ -942,7 +942,7 @@ struct file *hugetlb_file_setup(const ch goto out_dentry; error = -ENOMEM; - if (hugetlb_reserve_pages(inode, 0, size >> HPAGE_SHIFT)) + if (hugetlb_reserve_pages(inode, 0, size >> HPAGE_SHIFT, NULL)) goto out_inode; d_instantiate(dentry, inode); diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.26-rc2-mm1-0010-move-hugetlb_acct_memory/include/linux/hugetlb.h linux-2.6.26-rc2-mm1-0020-map_private_reserve/include/linux/hugetlb.h --- linux-2.6.26-rc2-mm1-0010-move-hugetlb_acct_memory/include/linux/hugetlb.h 2008-05-12 01:09:41.000000000 +0100 +++ linux-2.6.26-rc2-mm1-0020-map_private_reserve/include/linux/hugetlb.h 2008-05-27 10:26:22.000000000 +0100 @@ -17,6 +17,7 @@ static inline int is_vm_hugetlb_page(str return vma->vm_flags & VM_HUGETLB; } +void reset_vma_resv_huge_pages(struct vm_area_struct *vma); int hugetlb_sysctl_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *); int hugetlb_overcommit_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *); int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *); @@ -30,7 +31,8 @@ int hugetlb_report_node_meminfo(int, cha unsigned long hugetlb_total_pages(void); int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, int write_access); -int hugetlb_reserve_pages(struct inode *inode, long from, long to); +int hugetlb_reserve_pages(struct inode *inode, long from, long to, + struct vm_area_struct *vma); void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed); extern unsigned long max_huge_pages; @@ -58,6 +60,11 @@ static inline int is_vm_hugetlb_page(str { return 0; } + +static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma) +{ +} + static inline unsigned long hugetlb_total_pages(void) { return 0; diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.26-rc2-mm1-0010-move-hugetlb_acct_memory/kernel/fork.c linux-2.6.26-rc2-mm1-0020-map_private_reserve/kernel/fork.c --- linux-2.6.26-rc2-mm1-0010-move-hugetlb_acct_memory/kernel/fork.c 2008-05-27 10:25:54.000000000 +0100 +++ linux-2.6.26-rc2-mm1-0020-map_private_reserve/kernel/fork.c 2008-05-27 10:26:22.000000000 +0100 @@ -54,6 +54,7 @@ #include <linux/tty.h> #include <linux/proc_fs.h> #include <linux/blkdev.h> +#include <linux/hugetlb.h> #include <asm/pgtable.h> #include <asm/pgalloc.h> @@ -306,6 +307,14 @@ static int dup_mmap(struct mm_struct *mm } /* + * Clear 
hugetlb-related page reserves for children. This only + * affects MAP_PRIVATE mappings. Faults generated by the child + * are not guaranteed to succeed, even if read-only + */ + if (is_vm_hugetlb_page(tmp)) + reset_vma_resv_huge_pages(tmp); + + /* * Link in the new vma and copy the page table entries. */ *pprev = tmp; diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.26-rc2-mm1-0010-move-hugetlb_acct_memory/mm/hugetlb.c linux-2.6.26-rc2-mm1-0020-map_private_reserve/mm/hugetlb.c --- linux-2.6.26-rc2-mm1-0010-move-hugetlb_acct_memory/mm/hugetlb.c 2008-05-27 10:26:13.000000000 +0100 +++ linux-2.6.26-rc2-mm1-0020-map_private_reserve/mm/hugetlb.c 2008-05-27 10:26:22.000000000 +0100 @@ -40,6 +40,69 @@ static int hugetlb_next_nid; */ static DEFINE_SPINLOCK(hugetlb_lock); +/* + * These helpers are used to track how many pages are reserved for + * faults in a MAP_PRIVATE mapping. Only the process that called mmap() + * is guaranteed to have their future faults succeed. + * + * With the exception of reset_vma_resv_huge_pages() which is called at fork(), + * the reserve counters are updated with the hugetlb_lock held. It is safe + * to reset the VMA at fork() time as it is not in use yet and there is no + * chance of the global counters getting corrupted as a result of the values. + */ +static unsigned long vma_resv_huge_pages(struct vm_area_struct *vma) +{ + VM_BUG_ON(!is_vm_hugetlb_page(vma)); + if (!(vma->vm_flags & VM_SHARED)) + return (unsigned long)vma->vm_private_data; + return 0; +} + +static void set_vma_resv_huge_pages(struct vm_area_struct *vma, + unsigned long reserve) +{ + VM_BUG_ON(!is_vm_hugetlb_page(vma)); + VM_BUG_ON(vma->vm_flags & VM_SHARED); + + vma->vm_private_data = (void *)reserve; +} + +/* Decrement the reserved pages in the hugepage pool by one */ +static void decrement_hugepage_resv_vma(struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_SHARED) { + /* Shared mappings always use reserves */ + resv_huge_pages--; + } else { + /* + * Only the process that called mmap() has reserves for + * private mappings. + */ + if (vma_resv_huge_pages(vma)) { + resv_huge_pages--; + reserve = (unsigned long)vma->vm_private_data - 1; + vma->vm_private_data = (void *)reserve; + } + } +} + +void reset_vma_resv_huge_pages(struct vm_area_struct *vma) +{ + VM_BUG_ON(!is_vm_hugetlb_page(vma)); + if (!(vma->vm_flags & VM_SHARED)) + vma->vm_private_data = (void *)0; +} + +/* Returns true if the VMA has associated reserve pages */ +static int vma_has_private_reserves(struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_SHARED) + return 0; + if (!vma_resv_huge_pages(vma)) + return 0; + return 1; +} + static void clear_huge_page(struct page *page, unsigned long addr) { int i; @@ -101,6 +164,15 @@ static struct page *dequeue_huge_page_vm struct zone *zone; struct zoneref *z; + /* + * A child process with MAP_PRIVATE mappings created by their parent + * have no page reserves. This check ensures that reservations are + * not "stolen". 
The child may still get SIGKILLed + */ + if (!vma_has_private_reserves(vma) && + free_huge_pages - resv_huge_pages == 0) + return NULL; + for_each_zone_zonelist_nodemask(zone, z, zonelist, MAX_NR_ZONES - 1, nodemask) { nid = zone_to_nid(zone); @@ -111,8 +183,8 @@ static struct page *dequeue_huge_page_vm list_del(&page->lru); free_huge_pages--; free_huge_pages_node[nid]--; - if (vma && vma->vm_flags & VM_MAYSHARE) - resv_huge_pages--; + decrement_hugepage_resv_vma(vma); + break; } } @@ -461,55 +533,40 @@ static void return_unused_surplus_pages( } } - -static struct page *alloc_huge_page_shared(struct vm_area_struct *vma, - unsigned long addr) +static struct page *alloc_huge_page(struct vm_area_struct *vma, + unsigned long addr) { struct page *page; + struct address_space *mapping = vma->vm_file->f_mapping; + struct inode *inode = mapping->host; + unsigned int chg = 0; + + /* + * Processes that did not create the mapping will have no reserves and + * will not have accounted against quota. Check that the quota can be + * made before satisfying the allocation + */ + if (!vma_has_private_reserves(vma)) { + chg = 1; + if (hugetlb_get_quota(inode->i_mapping, chg)) + return ERR_PTR(-ENOSPC); + } spin_lock(&hugetlb_lock); page = dequeue_huge_page_vma(vma, addr); spin_unlock(&hugetlb_lock); - return page ? page : ERR_PTR(-VM_FAULT_OOM); -} -static struct page *alloc_huge_page_private(struct vm_area_struct *vma, - unsigned long addr) -{ - struct page *page = NULL; - - if (hugetlb_get_quota(vma->vm_file->f_mapping, 1)) - return ERR_PTR(-VM_FAULT_SIGBUS); - - spin_lock(&hugetlb_lock); - if (free_huge_pages > resv_huge_pages) - page = dequeue_huge_page_vma(vma, addr); - spin_unlock(&hugetlb_lock); if (!page) { page = alloc_buddy_huge_page(vma, addr); if (!page) { - hugetlb_put_quota(vma->vm_file->f_mapping, 1); + hugetlb_put_quota(inode->i_mapping, chg); return ERR_PTR(-VM_FAULT_OOM); } } - return page; -} -static struct page *alloc_huge_page(struct vm_area_struct *vma, - unsigned long addr) -{ - struct page *page; - struct address_space *mapping = vma->vm_file->f_mapping; - - if (vma->vm_flags & VM_MAYSHARE) - page = alloc_huge_page_shared(vma, addr); - else - page = alloc_huge_page_private(vma, addr); + set_page_refcounted(page); + set_page_private(page, (unsigned long) mapping); - if (!IS_ERR(page)) { - set_page_refcounted(page); - set_page_private(page, (unsigned long) mapping); - } return page; } @@ -757,6 +814,13 @@ out: return ret; } +static void hugetlb_vm_op_close(struct vm_area_struct *vma) +{ + unsigned long reserve = vma_resv_huge_pages(vma); + if (reserve) + hugetlb_acct_memory(-reserve); +} + /* * We cannot handle pagefaults against hugetlb pages at all. They cause * handle_mm_fault() to try to instantiate regular-sized pages in the @@ -771,6 +835,7 @@ static int hugetlb_vm_op_fault(struct vm struct vm_operations_struct hugetlb_vm_ops = { .fault = hugetlb_vm_op_fault, + .close = hugetlb_vm_op_close, }; static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, @@ -1289,11 +1354,25 @@ static long region_truncate(struct list_ return chg; } -int hugetlb_reserve_pages(struct inode *inode, long from, long to) +int hugetlb_reserve_pages(struct inode *inode, + long from, long to, + struct vm_area_struct *vma) { long ret, chg; - chg = region_chg(&inode->i_mapping->private_list, from, to); + /* + * Shared mappings base their reservation on the number of pages that + * are already allocated on behalf of the file. 
Private mappings need + * to reserve the full area even if read-only as mprotect() may be + * called to make the mapping read-write. Assume !vma is a shm mapping + */ + if (!vma || vma->vm_flags & VM_SHARED) + chg = region_chg(&inode->i_mapping->private_list, from, to); + else { + chg = to - from; + set_vma_resv_huge_pages(vma, chg); + } + if (chg < 0) return chg; @@ -1304,7 +1383,8 @@ int hugetlb_reserve_pages(struct inode * hugetlb_put_quota(inode->i_mapping, chg); return ret; } - region_add(&inode->i_mapping->private_list, from, to); + if (!vma || vma->vm_flags & VM_SHARED) + region_add(&inode->i_mapping->private_list, from, to); return 0; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] Reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork() 2008-05-27 18:51 ` [PATCH 2/3] Reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork() Mel Gorman @ 2008-05-28 13:52 ` Adam Litke 0 siblings, 0 replies; 16+ messages in thread From: Adam Litke @ 2008-05-28 13:52 UTC (permalink / raw) To: Mel Gorman Cc: akpm, abh, dean, linux-kernel, wli, dwg, linux-mm, andi, kenchen, apw On Tue, 2008-05-27 at 19:51 +0100, Mel Gorman wrote: > This patch reserves huge pages at mmap() time for MAP_PRIVATE mappings in a > similar manner to the reservations taken for MAP_SHARED mappings. The reserve count is > accounted both globally and on a per-VMA basis for private mappings. This > guarantees that a process that successfully calls mmap() will successfully > fault all pages in the future unless fork() is called. > > The characteristics of private mappings of hugetlbfs files behaviour after > this patch are; > > 1. The process calling mmap() is guaranteed to succeed all future faults until > it forks(). > 2. On fork(), the parent may die due to SIGKILL on writes to the private > mapping if enough pages are not available for the COW. For reasonably > reliable behaviour in the face of a small huge page pool, children of > hugepage-aware processes should not reference the mappings; such as > might occur when fork()ing to exec(). > 3. On fork(), the child VMAs inherit no reserves. Reads on pages already > faulted by the parent will succeed. Successful writes will depend on enough > huge pages being free in the pool. > 4. Quotas of the hugetlbfs mount are checked at reserve time for the mapper > and at fault time otherwise. > > Before this patch, all reads or writes in the child potentially needs page > allocations that can later lead to the death of the parent. This applies > to reads and writes of uninstantiated pages as well as COW. After the > patch it is only a write to an instantiated page that causes problems. > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Adam Litke <agl@us.ibm.com> -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 3/3] Guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed 2008-05-27 18:50 [PATCH 0/3] Guarantee faults for processes that call mmap(MAP_PRIVATE) on hugetlbfs v4 Mel Gorman 2008-05-27 18:50 ` [PATCH 1/3] Move hugetlb_acct_memory() Mel Gorman 2008-05-27 18:51 ` [PATCH 2/3] Reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork() Mel Gorman @ 2008-05-27 18:51 ` Mel Gorman 2008-05-28 16:00 ` Mel Gorman ` (2 more replies) 2 siblings, 3 replies; 16+ messages in thread From: Mel Gorman @ 2008-05-27 18:51 UTC (permalink / raw) To: akpm Cc: Mel Gorman, dean, linux-kernel, wli, dwg, apw, linux-mm, andi, kenchen, agl, abh After patch 2 in this series, a process that successfully calls mmap() for a MAP_PRIVATE mapping will be guaranteed to successfully fault until a process calls fork(). At that point, the next write fault from the parent could fail due to COW if the child still has a reference. We only reserve pages for the parent but a copy must be made to avoid leaking data from the parent to the child after fork(). Reserves could be taken for both parent and child at fork time to guarantee faults but if the mapping is large it is highly likely we will not have sufficient pages for the reservation, and it is common to fork only to exec() immediatly after. A failure here would be very undesirable. Note that the current behaviour of mainline with MAP_PRIVATE pages is pretty bad. The following situation is allowed to occur today. 1. Process calls mmap(MAP_PRIVATE) 2. Process calls mlock() to fault all pages and makes sure it succeeds 3. Process forks() 4. Process writes to MAP_PRIVATE mapping while child still exists 5. If the COW fails at this point, the process gets SIGKILLed even though it had taken care to ensure the pages existed This patch improves the situation by guaranteeing the reliability of the process that successfully calls mmap(). When the parent performs COW, it will try to satisfy the allocation without using reserves. If that fails the parent will steal the page leaving any children without a page. Faults from the child after that point will result in failure. If the child COW happens first, an attempt will be made to allocate the page without reserves and the child will get SIGKILLed on failure. To summarise the new behaviour: 1. If the original mapper performs COW on a private mapping with multiple references, it will attempt to allocate a hugepage from the pool or the buddy allocator without using the existing reserves. On fail, VMAs mapping the same area are traversed and the page being COW'd is unmapped where found. It will then steal the original page as the last mapper in the normal way. 2. The VMAs the pages were unmapped from are flagged to note that pages with data no longer exist. Future no-page faults on those VMAs will terminate the process as otherwise it would appear that data was corrupted. A warning is printed to the console that this situation occured. 2. If the child performs COW first, it will attempt to satisfy the COW from the pool if there are enough pages or via the buddy allocator if overcommit is allowed and the buddy allocator can satisfy the request. If it fails, the child will be killed. If the pool is large enough, existing applications will not notice that the reserves were a factor. 
Existing applications depending on the no-reserves been set are unlikely to exist as for much of the history of hugetlbfs, pages were prefaulted at mmap(), allocating the pages at that point or failing the mmap(). Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- fs/hugetlbfs/inode.c | 2 include/linux/hugetlb.h | 6 - mm/hugetlb.c | 196 +++++++++++++++++++++++++++++++++++++++---- mm/memory.c | 2 4 files changed, 184 insertions(+), 22 deletions(-) diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.26-rc2-mm1-0020-map_private_reserve/fs/hugetlbfs/inode.c linux-2.6.26-rc2-mm1-0030-reliable_parent_faults/fs/hugetlbfs/inode.c --- linux-2.6.26-rc2-mm1-0020-map_private_reserve/fs/hugetlbfs/inode.c 2008-05-27 10:26:22.000000000 +0100 +++ linux-2.6.26-rc2-mm1-0030-reliable_parent_faults/fs/hugetlbfs/inode.c 2008-05-27 12:59:20.000000000 +0100 @@ -441,7 +441,7 @@ hugetlb_vmtruncate_list(struct prio_tree v_offset = 0; __unmap_hugepage_range(vma, - vma->vm_start + v_offset, vma->vm_end); + vma->vm_start + v_offset, vma->vm_end, NULL); } } diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.26-rc2-mm1-0020-map_private_reserve/include/linux/hugetlb.h linux-2.6.26-rc2-mm1-0030-reliable_parent_faults/include/linux/hugetlb.h --- linux-2.6.26-rc2-mm1-0020-map_private_reserve/include/linux/hugetlb.h 2008-05-27 10:26:22.000000000 +0100 +++ linux-2.6.26-rc2-mm1-0030-reliable_parent_faults/include/linux/hugetlb.h 2008-05-27 12:59:20.000000000 +0100 @@ -23,8 +23,10 @@ int hugetlb_overcommit_handler(struct ct int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *); int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *); int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int, int); -void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); -void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); +void unmap_hugepage_range(struct vm_area_struct *, + unsigned long, unsigned long, struct page *); +void __unmap_hugepage_range(struct vm_area_struct *, + unsigned long, unsigned long, struct page *); int hugetlb_prefault(struct address_space *, struct vm_area_struct *); int hugetlb_report_meminfo(char *); int hugetlb_report_node_meminfo(int, char *); diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.26-rc2-mm1-0020-map_private_reserve/mm/hugetlb.c linux-2.6.26-rc2-mm1-0030-reliable_parent_faults/mm/hugetlb.c --- linux-2.6.26-rc2-mm1-0020-map_private_reserve/mm/hugetlb.c 2008-05-27 10:26:22.000000000 +0100 +++ linux-2.6.26-rc2-mm1-0030-reliable_parent_faults/mm/hugetlb.c 2008-05-27 15:25:09.000000000 +0100 @@ -40,6 +40,9 @@ static int hugetlb_next_nid; */ static DEFINE_SPINLOCK(hugetlb_lock); +#define HPAGE_RESV_OWNER (1UL << (BITS_PER_LONG - 1)) +#define HPAGE_RESV_UNMAPPED (1UL << (BITS_PER_LONG - 2)) +#define HPAGE_RESV_MASK (HPAGE_RESV_OWNER | HPAGE_RESV_UNMAPPED) /* * These helpers are used to track how many pages are reserved for * faults in a MAP_PRIVATE mapping. 
Only the process that called mmap() @@ -54,17 +57,32 @@ static unsigned long vma_resv_huge_pages { VM_BUG_ON(!is_vm_hugetlb_page(vma)); if (!(vma->vm_flags & VM_SHARED)) - return (unsigned long)vma->vm_private_data; + return (unsigned long)vma->vm_private_data & ~HPAGE_RESV_MASK; return 0; } static void set_vma_resv_huge_pages(struct vm_area_struct *vma, unsigned long reserve) { + unsigned long flags; VM_BUG_ON(!is_vm_hugetlb_page(vma)); VM_BUG_ON(vma->vm_flags & VM_SHARED); - vma->vm_private_data = (void *)reserve; + flags = (unsigned long)vma->vm_private_data & HPAGE_RESV_MASK; + vma->vm_private_data = (void *)(reserve | flags); +} + +static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags) +{ + unsigned long reserveflags = (unsigned long)vma->vm_private_data; + VM_BUG_ON(!is_vm_hugetlb_page(vma)); + vma->vm_private_data = (void *)(reserveflags | flags); +} + +static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag) +{ + VM_BUG_ON(!is_vm_hugetlb_page(vma)); + return ((unsigned long)vma->vm_private_data & flag) != 0; } /* Decrement the reserved pages in the hugepage pool by one */ @@ -78,14 +96,18 @@ static void decrement_hugepage_resv_vma( * Only the process that called mmap() has reserves for * private mappings. */ - if (vma_resv_huge_pages(vma)) { + if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) { + unsigned long flags, reserve; resv_huge_pages--; + flags = (unsigned long)vma->vm_private_data & + HPAGE_RESV_MASK; reserve = (unsigned long)vma->vm_private_data - 1; - vma->vm_private_data = (void *)reserve; + vma->vm_private_data = (void *)(reserve | flags); } } } +/* Reset counters to 0 and clear all HPAGE_RESV_* flags */ void reset_vma_resv_huge_pages(struct vm_area_struct *vma) { VM_BUG_ON(!is_vm_hugetlb_page(vma)); @@ -153,7 +175,7 @@ static struct page *dequeue_huge_page(vo } static struct page *dequeue_huge_page_vma(struct vm_area_struct *vma, - unsigned long address) + unsigned long address, int avoid_reserve) { int nid; struct page *page = NULL; @@ -173,6 +195,10 @@ static struct page *dequeue_huge_page_vm free_huge_pages - resv_huge_pages == 0) return NULL; + /* If reserves cannot be used, ensure enough pages are in the pool */ + if (avoid_reserve && free_huge_pages - resv_huge_pages == 0) + return NULL; + for_each_zone_zonelist_nodemask(zone, z, zonelist, MAX_NR_ZONES - 1, nodemask) { nid = zone_to_nid(zone); @@ -183,7 +209,9 @@ static struct page *dequeue_huge_page_vm list_del(&page->lru); free_huge_pages--; free_huge_pages_node[nid]--; - decrement_hugepage_resv_vma(vma); + + if (!avoid_reserve) + decrement_hugepage_resv_vma(vma); break; } @@ -534,7 +562,7 @@ static void return_unused_surplus_pages( } static struct page *alloc_huge_page(struct vm_area_struct *vma, - unsigned long addr) + unsigned long addr, int avoid_reserve) { struct page *page; struct address_space *mapping = vma->vm_file->f_mapping; @@ -546,14 +574,15 @@ static struct page *alloc_huge_page(stru * will not have accounted against quota. 
Check that the quota can be * made before satisfying the allocation */ - if (!vma_has_private_reserves(vma)) { + if (!(vma->vm_flags & VM_SHARED) && + !is_vma_resv_set(vma, HPAGE_RESV_OWNER)) { chg = 1; if (hugetlb_get_quota(inode->i_mapping, chg)) return ERR_PTR(-ENOSPC); } spin_lock(&hugetlb_lock); - page = dequeue_huge_page_vma(vma, addr); + page = dequeue_huge_page_vma(vma, addr, avoid_reserve); spin_unlock(&hugetlb_lock); if (!page) { @@ -909,7 +938,7 @@ nomem: } void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, - unsigned long end) + unsigned long end, struct page *ref_page) { struct mm_struct *mm = vma->vm_mm; unsigned long address; @@ -937,6 +966,27 @@ void __unmap_hugepage_range(struct vm_ar if (huge_pmd_unshare(mm, &address, ptep)) continue; + /* + * If a reference page is supplied, it is because a specific + * page is being unmapped, not a range. Ensure the page we + * are about to unmap is the actual page of interest. + */ + if (ref_page) { + pte = huge_ptep_get(ptep); + if (huge_pte_none(pte)) + continue; + page = pte_page(pte); + if (page != ref_page) + continue; + + /* + * Mark the VMA as having unmapped its page so that + * future faults in this VMA will fail rather than + * looking like data was lost + */ + set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED); + } + pte = huge_ptep_get_and_clear(mm, address, ptep); if (huge_pte_none(pte)) continue; @@ -955,7 +1005,7 @@ void __unmap_hugepage_range(struct vm_ar } void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, - unsigned long end) + unsigned long end, struct page *ref_page) { /* * It is undesirable to test vma->vm_file as it should be non-null @@ -967,19 +1017,63 @@ void unmap_hugepage_range(struct vm_area */ if (vma->vm_file) { spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); - __unmap_hugepage_range(vma, start, end); + __unmap_hugepage_range(vma, start, end, ref_page); spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); } } +/* + * This is called when the original mapper is failing to COW a MAP_PRIVATE + * mappping it owns the reserve page for. The intention is to unmap the page + * from other VMAs and let the children be SIGKILLed if they are faulting the + * same region. + */ +int unmap_ref_private(struct mm_struct *mm, + struct vm_area_struct *vma, + struct page *page, + unsigned long address) +{ + struct vm_area_struct *iter_vma; + struct address_space *mapping; + pgoff_t pgoff = ((address - vma->vm_start) >> HPAGE_SHIFT) + + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); + struct prio_tree_iter iter; + + mapping = (struct address_space *)page_private(page); + vma_prio_tree_foreach(iter_vma, &iter, &mapping->i_mmap, pgoff, pgoff) { + BUG_ON(vma->vm_start != iter_vma->vm_start); + + /* Do not unmap the current VMA */ + if (iter_vma == vma) + continue; + + /* + * Unmap the page from other VMAs without their own reserves. + * They get marked to be SIGKILLed if they fault in these + * areas. This is because a future no-page fault on this VMA + * could insert a zeroed page instead of the data existing + * from the time of fork. 
This would look like data corruption + */ + if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER)) + unmap_hugepage_range(iter_vma, + address, address + HPAGE_SIZE, + page); + } + + return 1; +} + static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long address, pte_t *ptep, pte_t pte) + unsigned long address, pte_t *ptep, pte_t pte, + struct page *pagecache_page) { struct page *old_page, *new_page; int avoidcopy; + int outside_reserve = 0; old_page = pte_page(pte); +retry_avoidcopy: /* If no-one else is actually using this page, avoid the copy * and just make the page writable */ avoidcopy = (page_count(old_page) == 1); @@ -988,11 +1082,43 @@ static int hugetlb_cow(struct mm_struct return 0; } + /* + * If the process that created a MAP_PRIVATE mapping is about to + * perform a COW due to a shared page count, attempt to satisfy + * the allocation without using the existing reserves. The pagecache + * page is used to determine if the reserve at this address was + * consumed or not. If reserves were used, a partial faulted mapping + * at the time of fork() could consume its reserves on COW instead + * of the full address range. + */ + if (!(vma->vm_flags & VM_SHARED) && + is_vma_resv_set(vma, HPAGE_RESV_OWNER) && + old_page != pagecache_page) + outside_reserve = 1; + page_cache_get(old_page); - new_page = alloc_huge_page(vma, address); + new_page = alloc_huge_page(vma, address, outside_reserve); if (IS_ERR(new_page)) { page_cache_release(old_page); + + /* + * If a process owning a MAP_PRIVATE mapping fails to COW, + * it is due to references held by a child and an insufficient + * huge page pool. To guarantee the original mappers + * reliability, unmap the page from child processes. The child + * may get SIGKILLed if it later faults. + */ + if (outside_reserve) { + BUG_ON(huge_pte_none(pte)); + if (unmap_ref_private(mm, vma, old_page, address)) { + BUG_ON(page_count(old_page) != 1); + BUG_ON(huge_pte_none(pte)); + goto retry_avoidcopy; + } + WARN_ON_ONCE(1); + } + return -PTR_ERR(new_page); } @@ -1015,6 +1141,20 @@ static int hugetlb_cow(struct mm_struct return 0; } +/* Return the pagecache page at a given address within a VMA */ +static struct page *hugetlbfs_pagecache_page(struct vm_area_struct *vma, + unsigned long address) +{ + struct address_space *mapping; + unsigned long idx; + + mapping = vma->vm_file->f_mapping; + idx = ((address - vma->vm_start) >> HPAGE_SHIFT) + + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); + + return find_lock_page(mapping, idx); +} + static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, pte_t *ptep, int write_access) { @@ -1025,6 +1165,18 @@ static int hugetlb_no_page(struct mm_str struct address_space *mapping; pte_t new_pte; + /* + * Currently, we are forced to kill the process in the event the + * original mapper has unmapped pages from the child due to a failed + * COW. 
Warn that such a situation has occured as it may not be obvious + */ + if (is_vma_resv_set(vma, HPAGE_RESV_UNMAPPED)) { + printk(KERN_WARNING + "PID %d killed due to inadequate hugepage pool\n", + current->pid); + return ret; + } + mapping = vma->vm_file->f_mapping; idx = ((address - vma->vm_start) >> HPAGE_SHIFT) + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); @@ -1039,7 +1191,7 @@ retry: size = i_size_read(mapping->host) >> HPAGE_SHIFT; if (idx >= size) goto out; - page = alloc_huge_page(vma, address); + page = alloc_huge_page(vma, address, 0); if (IS_ERR(page)) { ret = -PTR_ERR(page); goto out; @@ -1081,7 +1233,7 @@ retry: if (write_access && !(vma->vm_flags & VM_SHARED)) { /* Optimization, do the COW without a second fault */ - ret = hugetlb_cow(mm, vma, address, ptep, new_pte); + ret = hugetlb_cow(mm, vma, address, ptep, new_pte, page); } spin_unlock(&mm->page_table_lock); @@ -1126,8 +1278,15 @@ int hugetlb_fault(struct mm_struct *mm, spin_lock(&mm->page_table_lock); /* Check for a racing update before calling hugetlb_cow */ if (likely(pte_same(entry, huge_ptep_get(ptep)))) - if (write_access && !pte_write(entry)) - ret = hugetlb_cow(mm, vma, address, ptep, entry); + if (write_access && !pte_write(entry)) { + struct page *page; + page = hugetlbfs_pagecache_page(vma, address); + ret = hugetlb_cow(mm, vma, address, ptep, entry, page); + if (page) { + unlock_page(page); + put_page(page); + } + } spin_unlock(&mm->page_table_lock); mutex_unlock(&hugetlb_instantiation_mutex); @@ -1371,6 +1530,7 @@ int hugetlb_reserve_pages(struct inode * else { chg = to - from; set_vma_resv_huge_pages(vma, chg); + set_vma_resv_flags(vma, HPAGE_RESV_OWNER); } if (chg < 0) diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.26-rc2-mm1-0020-map_private_reserve/mm/memory.c linux-2.6.26-rc2-mm1-0030-reliable_parent_faults/mm/memory.c --- linux-2.6.26-rc2-mm1-0020-map_private_reserve/mm/memory.c 2008-05-27 10:25:54.000000000 +0100 +++ linux-2.6.26-rc2-mm1-0030-reliable_parent_faults/mm/memory.c 2008-05-27 12:59:20.000000000 +0100 @@ -883,7 +883,7 @@ unsigned long unmap_vmas(struct mmu_gath } if (unlikely(is_vm_hugetlb_page(vma))) { - unmap_hugepage_range(vma, start, end); + unmap_hugepage_range(vma, start, end, NULL); zap_work -= (end - start) / (HPAGE_SIZE / PAGE_SIZE); start = end; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
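
Patch 3 packs both the private reserve count and the HPAGE_RESV_OWNER and
HPAGE_RESV_UNMAPPED flags into the single vm_private_data word of the VMA.
The stand-alone userspace model below mirrors those helpers so the packing
can be sanity-checked in isolation; the 'owner' and 'child' variables are
hypothetical stand-ins for real VMAs, and the example is not part of the
series (the later cleanup patch in this thread wraps the same casts in
get/set_vma_private_data()).

/*
 * Stand-alone model of the vm_private_data packing used by patch 3.
 * The two unsigned longs below stand in for the vm_private_data field
 * of an owner VMA and a child VMA; they are illustrative, not kernel code.
 */
#include <assert.h>
#include <limits.h>
#include <stdio.h>

#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define HPAGE_RESV_OWNER	(1UL << (BITS_PER_LONG - 1))
#define HPAGE_RESV_UNMAPPED	(1UL << (BITS_PER_LONG - 2))
#define HPAGE_RESV_MASK		(HPAGE_RESV_OWNER | HPAGE_RESV_UNMAPPED)

/* Reserve count lives in the low bits, flags in the two top bits */
static unsigned long resv_huge_pages(unsigned long priv)
{
	return priv & ~HPAGE_RESV_MASK;
}

static void set_resv_huge_pages(unsigned long *priv, unsigned long reserve)
{
	*priv = (*priv & HPAGE_RESV_MASK) | reserve;
}

static void set_resv_flags(unsigned long *priv, unsigned long flags)
{
	*priv |= flags;
}

static int is_resv_set(unsigned long priv, unsigned long flag)
{
	return (priv & flag) != 0;
}

int main(void)
{
	unsigned long owner = 0, child = 0;

	/* mmap() of a 128-hugepage MAP_PRIVATE mapping by the owner */
	set_resv_huge_pages(&owner, 128);
	set_resv_flags(&owner, HPAGE_RESV_OWNER);

	/* fork(): the child VMA inherits no reserves and no flags */
	assert(resv_huge_pages(child) == 0);
	assert(!is_resv_set(child, HPAGE_RESV_OWNER));

	/* The owner faults one page, consuming one reserved page */
	set_resv_huge_pages(&owner, resv_huge_pages(owner) - 1);

	/* A failed COW by the owner unmaps the page from the child and
	 * flags the child VMA so that later faults there fail cleanly */
	set_resv_flags(&child, HPAGE_RESV_UNMAPPED);

	printf("owner reserves=%lu owner flag=%d child unmapped=%d\n",
	       resv_huge_pages(owner),
	       is_resv_set(owner, HPAGE_RESV_OWNER),
	       is_resv_set(child, HPAGE_RESV_UNMAPPED));
	return 0;
}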
* Re: [PATCH 3/3] Guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed 2008-05-27 18:51 ` [PATCH 3/3] Guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed Mel Gorman @ 2008-05-28 16:00 ` Mel Gorman 2008-05-28 18:15 ` Adam Litke 2008-05-28 18:16 ` Adam Litke 2008-05-29 1:42 ` Andrew Morton 2 siblings, 1 reply; 16+ messages in thread From: Mel Gorman @ 2008-05-28 16:00 UTC (permalink / raw) To: akpm Cc: dean, linux-kernel, wli, dwg, apw, linux-mm, andi, kenchen, agl, abh, hannes [PATCH 4/3] Fix prio tree lookup I spoke too soon. This is a fix to patch 3/3. If a child unmaps the start of the VMA, the start address is different and that is perfectly legimite making the BUG_ON check bogus and should be removed. While page cache lookups are in HPAGE_SIZE, the vma->vm_pgoff is in PAGE_SIZE units, not HPAGE_SIZE. The offset calculation needs to be in PAGE_SIZE units to find other VMAs that are mapping the same range of pages. This patch fixes the offset calculation and adds an explanation comment as to why it is different from a page cache lookup. Credit goes to Johannes Weiner for spotting the bogus BUG_ON on IRC which led to the discovery of the faulty offset calculation. Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- mm/hugetlb.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.26-rc2-mm1-0030-reliable_parent_faults/mm/hugetlb.c linux-2.6.26-rc2-mm1-1010_fix_priotree_lookup/mm/hugetlb.c --- linux-2.6.26-rc2-mm1-0030-reliable_parent_faults/mm/hugetlb.c 2008-05-28 14:57:51.000000000 +0100 +++ linux-2.6.26-rc2-mm1-1010_fix_priotree_lookup/mm/hugetlb.c 2008-05-28 15:05:32.000000000 +0100 @@ -1035,14 +1035,18 @@ int unmap_ref_private(struct mm_struct * { struct vm_area_struct *iter_vma; struct address_space *mapping; - pgoff_t pgoff = ((address - vma->vm_start) >> HPAGE_SHIFT) - + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); struct prio_tree_iter iter; + pgoff_t pgoff; + /* + * vm_pgoff is in PAGE_SIZE units, hence the different calculation + * from page cache lookup which is in HPAGE_SIZE units. + */ + pgoff = ((address - vma->vm_start) >> PAGE_SHIFT) + + (vma->vm_pgoff >> PAGE_SHIFT); mapping = (struct address_space *)page_private(page); - vma_prio_tree_foreach(iter_vma, &iter, &mapping->i_mmap, pgoff, pgoff) { - BUG_ON(vma->vm_start != iter_vma->vm_start); + vma_prio_tree_foreach(iter_vma, &iter, &mapping->i_mmap, pgoff, pgoff) { /* Do not unmap the current VMA */ if (iter_vma == vma) continue; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] Guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed
From: Adam Litke @ 2008-05-28 18:15 UTC (permalink / raw)
To: Mel Gorman
Cc: akpm, dean, linux-kernel, wli, dwg, apw, linux-mm, andi, kenchen, abh, hannes
On Wed, 2008-05-28 at 17:00 +0100, Mel Gorman wrote:
> [PATCH 4/3] Fix prio tree lookup
>
> I spoke too soon. This is a fix to patch 3/3.
>
> If a child unmaps the start of the VMA, the start address is different and
> that is perfectly legimite making the BUG_ON check bogus and should be removed.
> While page cache lookups are in HPAGE_SIZE, the vma->vm_pgoff is in PAGE_SIZE
> units, not HPAGE_SIZE. The offset calculation needs to be in PAGE_SIZE units
> to find other VMAs that are mapping the same range of pages. This patch
> fixes the offset calculation and adds an explanation comment as to why it
> is different from a page cache lookup.
>
> Credit goes to Johannes Weiner for spotting the bogus BUG_ON on IRC which
> led to the discovery of the faulty offset calculation.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Adam Litke <agl@us.ibm.com>
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
* Re: [PATCH 3/3] Guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed 2008-05-27 18:51 ` [PATCH 3/3] Guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed Mel Gorman 2008-05-28 16:00 ` Mel Gorman @ 2008-05-28 18:16 ` Adam Litke 2008-05-29 1:42 ` Andrew Morton 2 siblings, 0 replies; 16+ messages in thread From: Adam Litke @ 2008-05-28 18:16 UTC (permalink / raw) To: Mel Gorman Cc: akpm, dean, linux-kernel, wli, dwg, apw, linux-mm, andi, kenchen, abh On Tue, 2008-05-27 at 19:51 +0100, Mel Gorman wrote: > After patch 2 in this series, a process that successfully calls mmap() > for a MAP_PRIVATE mapping will be guaranteed to successfully fault until a > process calls fork(). At that point, the next write fault from the parent > could fail due to COW if the child still has a reference. > > We only reserve pages for the parent but a copy must be made to avoid leaking > data from the parent to the child after fork(). Reserves could be taken for > both parent and child at fork time to guarantee faults but if the mapping > is large it is highly likely we will not have sufficient pages for the > reservation, and it is common to fork only to exec() immediatly after. A > failure here would be very undesirable. > > Note that the current behaviour of mainline with MAP_PRIVATE pages is > pretty bad. The following situation is allowed to occur today. > > 1. Process calls mmap(MAP_PRIVATE) > 2. Process calls mlock() to fault all pages and makes sure it succeeds > 3. Process forks() > 4. Process writes to MAP_PRIVATE mapping while child still exists > 5. If the COW fails at this point, the process gets SIGKILLed even though it > had taken care to ensure the pages existed > > This patch improves the situation by guaranteeing the reliability of the > process that successfully calls mmap(). When the parent performs COW, it > will try to satisfy the allocation without using reserves. If that fails the > parent will steal the page leaving any children without a page. Faults from > the child after that point will result in failure. If the child COW happens > first, an attempt will be made to allocate the page without reserves and > the child will get SIGKILLed on failure. > > To summarise the new behaviour: > > 1. If the original mapper performs COW on a private mapping with multiple > references, it will attempt to allocate a hugepage from the pool or > the buddy allocator without using the existing reserves. On fail, VMAs > mapping the same area are traversed and the page being COW'd is unmapped > where found. It will then steal the original page as the last mapper in > the normal way. > > 2. The VMAs the pages were unmapped from are flagged to note that pages > with data no longer exist. Future no-page faults on those VMAs will > terminate the process as otherwise it would appear that data was corrupted. > A warning is printed to the console that this situation occured. > > 2. If the child performs COW first, it will attempt to satisfy the COW > from the pool if there are enough pages or via the buddy allocator if > overcommit is allowed and the buddy allocator can satisfy the request. If > it fails, the child will be killed. > > If the pool is large enough, existing applications will not notice that the > reserves were a factor. 
Existing applications depending on the no-reserves > been set are unlikely to exist as for much of the history of hugetlbfs, > pages were prefaulted at mmap(), allocating the pages at that point or failing > the mmap(). > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Adam Litke <agl@us.ibm.com> -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] Guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed 2008-05-27 18:51 ` [PATCH 3/3] Guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed Mel Gorman 2008-05-28 16:00 ` Mel Gorman 2008-05-28 18:16 ` Adam Litke @ 2008-05-29 1:42 ` Andrew Morton 2008-05-30 16:57 ` [PATCH 0/2] hugetlb reservations v4/MAP_NORESERVE V3 cleanups Andy Whitcroft 2 siblings, 1 reply; 16+ messages in thread From: Andrew Morton @ 2008-05-29 1:42 UTC (permalink / raw) To: Mel Gorman Cc: dean, linux-kernel, wli, dwg, apw, linux-mm, andi, kenchen, agl, abh On Tue, 27 May 2008 19:51:28 +0100 (IST) Mel Gorman <mel@csn.ul.ie> wrote: > > After patch 2 in this series, a process that successfully calls mmap() > for a MAP_PRIVATE mapping will be guaranteed to successfully fault until a > process calls fork(). At that point, the next write fault from the parent > could fail due to COW if the child still has a reference. > > We only reserve pages for the parent but a copy must be made to avoid leaking > data from the parent to the child after fork(). Reserves could be taken for > both parent and child at fork time to guarantee faults but if the mapping > is large it is highly likely we will not have sufficient pages for the > reservation, and it is common to fork only to exec() immediatly after. A > failure here would be very undesirable. > > Note that the current behaviour of mainline with MAP_PRIVATE pages is > pretty bad. The following situation is allowed to occur today. > > 1. Process calls mmap(MAP_PRIVATE) > 2. Process calls mlock() to fault all pages and makes sure it succeeds > 3. Process forks() > 4. Process writes to MAP_PRIVATE mapping while child still exists > 5. If the COW fails at this point, the process gets SIGKILLed even though it > had taken care to ensure the pages existed > > This patch improves the situation by guaranteeing the reliability of the > process that successfully calls mmap(). When the parent performs COW, it > will try to satisfy the allocation without using reserves. If that fails the > parent will steal the page leaving any children without a page. Faults from > the child after that point will result in failure. If the child COW happens > first, an attempt will be made to allocate the page without reserves and > the child will get SIGKILLed on failure. > > To summarise the new behaviour: > > 1. If the original mapper performs COW on a private mapping with multiple > references, it will attempt to allocate a hugepage from the pool or > the buddy allocator without using the existing reserves. On fail, VMAs > mapping the same area are traversed and the page being COW'd is unmapped > where found. It will then steal the original page as the last mapper in > the normal way. > > 2. The VMAs the pages were unmapped from are flagged to note that pages > with data no longer exist. Future no-page faults on those VMAs will > terminate the process as otherwise it would appear that data was corrupted. > A warning is printed to the console that this situation occured. > > 2. If the child performs COW first, it will attempt to satisfy the COW > from the pool if there are enough pages or via the buddy allocator if > overcommit is allowed and the buddy allocator can satisfy the request. If > it fails, the child will be killed. > > If the pool is large enough, existing applications will not notice that the > reserves were a factor. 
Existing applications depending on the no-reserves > been set are unlikely to exist as for much of the history of hugetlbfs, > pages were prefaulted at mmap(), allocating the pages at that point or failing > the mmap(). > Implementation nitlets: > +#define HPAGE_RESV_OWNER (1UL << (BITS_PER_LONG - 1)) > +#define HPAGE_RESV_UNMAPPED (1UL << (BITS_PER_LONG - 2)) > +#define HPAGE_RESV_MASK (HPAGE_RESV_OWNER | HPAGE_RESV_UNMAPPED) > /* > * These helpers are used to track how many pages are reserved for > * faults in a MAP_PRIVATE mapping. Only the process that called mmap() > @@ -54,17 +57,32 @@ static unsigned long vma_resv_huge_pages > { > VM_BUG_ON(!is_vm_hugetlb_page(vma)); > if (!(vma->vm_flags & VM_SHARED)) > - return (unsigned long)vma->vm_private_data; > + return (unsigned long)vma->vm_private_data & ~HPAGE_RESV_MASK; > return 0; > } It might be better to have unsigned long get_vma_private_data(struct vm_area_struct); unsigned long set_vma_private_data(struct vm_area_struct); (or even get_private & set_private) to do away with all the funky casting. Or it might not be, too. > + pgoff_t pgoff = ((address - vma->vm_start) >> HPAGE_SHIFT) > + + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); > > ... > > + unsigned long idx; > + > + mapping = vma->vm_file->f_mapping; > + idx = ((address - vma->vm_start) >> HPAGE_SHIFT) > + + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); It would be clearer to have a little helper function for the above two snippets. And the first uses pgoff_t whereas the second uses bare ulong. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 0/2] hugetlb reservations v4/MAP_NORESERVE V3 cleanups
From: Andy Whitcroft @ 2008-05-30 16:57 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-kernel, agl, wli, kenchen, dwg, andi, Mel Gorman, dean, abh, Andy Whitcroft
Following on from Andrew's feedback here are a couple of update patches
for hugetlb reservations v4 and MAP_NORESERVE V3. These introduce a number
of new helpers and use those throughout.
This stack consists of two patches to allow them to be merged into the
appropriate stacks.
huge-page-private-reservation-review-cleanups -- adds the helpers and
utilises them in the base reservations stack, and
huge-page-MAP_NORESERVE-review-cleanups -- uses the same helpers in the
MAP_NORESERVE stack.
-apw
* [PATCH 1/2] huge page private reservation review cleanups 2008-05-30 16:57 ` [PATCH 0/2] hugetlb reservations v4/MAP_NORESERVE V3 cleanups Andy Whitcroft @ 2008-05-30 16:58 ` Andy Whitcroft 2008-05-30 20:29 ` Andrew Morton 2008-05-31 12:21 ` Mel Gorman 2008-05-30 16:58 ` [PATCH 2/2] huge page MAP_NORESERVE " Andy Whitcroft 1 sibling, 2 replies; 16+ messages in thread From: Andy Whitcroft @ 2008-05-30 16:58 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, agl, wli, kenchen, dwg, andi, Mel Gorman, dean, abh, Andy Whitcroft Create some new accessors for vma private data to cut down on and contain the casts. Encapsulates the huge and small page offset calculations. Also adds a couple of VM_BUG_ONs for consistency. Signed-off-by: Andy Whitcroft <apw@shadowen.org> --- mm/hugetlb.c | 56 +++++++++++++++++++++++++++++++++++++++++++------------- 1 files changed, 43 insertions(+), 13 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 729a830..7a5ac81 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -40,6 +40,26 @@ static int hugetlb_next_nid; */ static DEFINE_SPINLOCK(hugetlb_lock); +/* + * Convert the address within this vma to the page offset within + * the mapping, in base page units. + */ +pgoff_t vma_page_offset(struct vm_area_struct *vma, unsigned long address) +{ + return ((address - vma->vm_start) >> PAGE_SHIFT) + + (vma->vm_pgoff >> PAGE_SHIFT); +} + +/* + * Convert the address within this vma to the page offset within + * the mapping, in pagecache page units; huge pages here. + */ +pgoff_t vma_pagecache_offset(struct vm_area_struct *vma, unsigned long address) +{ + return ((address - vma->vm_start) >> HPAGE_SHIFT) + + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); +} + #define HPAGE_RESV_OWNER (1UL << (BITS_PER_LONG - 1)) #define HPAGE_RESV_UNMAPPED (1UL << (BITS_PER_LONG - 2)) #define HPAGE_RESV_MASK (HPAGE_RESV_OWNER | HPAGE_RESV_UNMAPPED) @@ -53,36 +73,48 @@ static DEFINE_SPINLOCK(hugetlb_lock); * to reset the VMA at fork() time as it is not in use yet and there is no * chance of the global counters getting corrupted as a result of the values. 
*/ +static unsigned long get_vma_private_data(struct vm_area_struct *vma) +{ + return (unsigned long)vma->vm_private_data; +} + +static void set_vma_private_data(struct vm_area_struct *vma, + unsigned long value) +{ + vma->vm_private_data = (void *)value; +} + static unsigned long vma_resv_huge_pages(struct vm_area_struct *vma) { VM_BUG_ON(!is_vm_hugetlb_page(vma)); if (!(vma->vm_flags & VM_SHARED)) - return (unsigned long)vma->vm_private_data & ~HPAGE_RESV_MASK; + return get_vma_private_data(vma) & ~HPAGE_RESV_MASK; return 0; } static void set_vma_resv_huge_pages(struct vm_area_struct *vma, unsigned long reserve) { - unsigned long flags; VM_BUG_ON(!is_vm_hugetlb_page(vma)); VM_BUG_ON(vma->vm_flags & VM_SHARED); - flags = (unsigned long)vma->vm_private_data & HPAGE_RESV_MASK; - vma->vm_private_data = (void *)(reserve | flags); + set_vma_private_data(vma, + (get_vma_private_data(vma) & HPAGE_RESV_MASK) | reserve); } static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags) { - unsigned long reserveflags = (unsigned long)vma->vm_private_data; VM_BUG_ON(!is_vm_hugetlb_page(vma)); - vma->vm_private_data = (void *)(reserveflags | flags); + VM_BUG_ON(vma->vm_flags & VM_SHARED); + + set_vma_private_data(vma, get_vma_private_data(vma) | flags); } static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag) { VM_BUG_ON(!is_vm_hugetlb_page(vma)); - return ((unsigned long)vma->vm_private_data & flag) != 0; + + return (get_vma_private_data(vma) & flag) != 0; } /* Decrement the reserved pages in the hugepage pool by one */ @@ -1150,11 +1182,10 @@ static struct page *hugetlbfs_pagecache_page(struct vm_area_struct *vma, unsigned long address) { struct address_space *mapping; - unsigned long idx; + pgoff_t idx; mapping = vma->vm_file->f_mapping; - idx = ((address - vma->vm_start) >> HPAGE_SHIFT) - + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); + idx = vma_pagecache_offset(vma, address); return find_lock_page(mapping, idx); } @@ -1163,7 +1194,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, pte_t *ptep, int write_access) { int ret = VM_FAULT_SIGBUS; - unsigned long idx; + pgoff_t idx; unsigned long size; struct page *page; struct address_space *mapping; @@ -1182,8 +1213,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma, } mapping = vma->vm_file->f_mapping; - idx = ((address - vma->vm_start) >> HPAGE_SHIFT) - + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); + idx = vma_pagecache_offset(vma, address); /* * Use page lock to guard against racing truncation -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 16+ messages in thread
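To make the arithmetic in the new vma_pagecache_offset() helper concrete, here is a minimal user-space sketch of the same calculation. The PAGE_SHIFT/HPAGE_SHIFT values (4 KB base pages, 2 MB huge pages), the example addresses and main() are assumptions chosen for illustration rather than anything taken from the patch.

#include <stdio.h>

#define PAGE_SHIFT	12UL	/* 4 KB base pages -- illustrative value */
#define HPAGE_SHIFT	21UL	/* 2 MB huge pages -- illustrative value */

/* Same arithmetic as vma_pagecache_offset() in the patch above. */
static unsigned long pagecache_offset(unsigned long vm_start,
				      unsigned long vm_pgoff,	/* in base pages */
				      unsigned long address)
{
	return ((address - vm_start) >> HPAGE_SHIFT) +
		(vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
}

int main(void)
{
	unsigned long vm_start = 0x40000000UL;	/* hypothetical 2 MB-aligned base */
	unsigned long vm_pgoff = 512;		/* file offset of one huge page */
	unsigned long address  = vm_start + 3 * (1UL << HPAGE_SHIFT);

	printf("pagecache index = %lu\n",
	       pagecache_offset(vm_start, vm_pgoff, address));	/* prints 4 */
	return 0;
}

With vm_pgoff covering one huge page of file offset (512 base pages) and a fault landing three huge pages into the mapping, the computed pagecache index is 4 -- the units are huge pages, matching the hugetlbfs pagecache.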
* Re: [PATCH 1/2] huge page private reservation review cleanups 2008-05-30 16:58 ` [PATCH 1/2] huge page private reservation review cleanups Andy Whitcroft @ 2008-05-30 20:29 ` Andrew Morton 2008-05-31 13:06 ` Mel Gorman 2008-05-31 12:21 ` Mel Gorman 1 sibling, 1 reply; 16+ messages in thread From: Andrew Morton @ 2008-05-30 20:29 UTC (permalink / raw) To: Andy Whitcroft Cc: linux-mm, linux-kernel, agl, wli, kenchen, dwg, andi, mel, dean, abh On Fri, 30 May 2008 17:58:24 +0100 Andy Whitcroft <apw@shadowen.org> wrote: > > Create some new accessors for vma private data to cut down on and contain > the casts. Encapsulates the huge and small page offset calculations. Also > adds a couple of VM_BUG_ONs for consistency. > I'll stage this after Mel's hugetlb-guarantee-that-cow-faults-for-a-process-that-called-mmapmap_private-on-hugetlbfs-will-succeed.patch > --- > mm/hugetlb.c | 56 +++++++++++++++++++++++++++++++++++++++++++------------- > 1 files changed, 43 insertions(+), 13 deletions(-) > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 729a830..7a5ac81 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -40,6 +40,26 @@ static int hugetlb_next_nid; > */ > static DEFINE_SPINLOCK(hugetlb_lock); > > +/* > + * Convert the address within this vma to the page offset within > + * the mapping, in base page units. > + */ > +pgoff_t vma_page_offset(struct vm_area_struct *vma, unsigned long address) > +{ > + return ((address - vma->vm_start) >> PAGE_SHIFT) + > + (vma->vm_pgoff >> PAGE_SHIFT); > +} > + > +/* > + * Convert the address within this vma to the page offset within > + * the mapping, in pagecache page units; huge pages here. > + */ > +pgoff_t vma_pagecache_offset(struct vm_area_struct *vma, unsigned long address) > +{ > + return ((address - vma->vm_start) >> HPAGE_SHIFT) + > + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); > +} I'll make these static. > #define HPAGE_RESV_OWNER (1UL << (BITS_PER_LONG - 1)) > #define HPAGE_RESV_UNMAPPED (1UL << (BITS_PER_LONG - 2)) > #define HPAGE_RESV_MASK (HPAGE_RESV_OWNER | HPAGE_RESV_UNMAPPED) > @@ -53,36 +73,48 @@ static DEFINE_SPINLOCK(hugetlb_lock); > * to reset the VMA at fork() time as it is not in use yet and there is no > * chance of the global counters getting corrupted as a result of the values. > */ > +static unsigned long get_vma_private_data(struct vm_area_struct *vma) > +{ > + return (unsigned long)vma->vm_private_data; > +} > + > +static void set_vma_private_data(struct vm_area_struct *vma, > + unsigned long value) > +{ > + vma->vm_private_data = (void *)value; > +} Better. > static unsigned long vma_resv_huge_pages(struct vm_area_struct *vma) > { > VM_BUG_ON(!is_vm_hugetlb_page(vma)); > if (!(vma->vm_flags & VM_SHARED)) > - return (unsigned long)vma->vm_private_data & ~HPAGE_RESV_MASK; > + return get_vma_private_data(vma) & ~HPAGE_RESV_MASK; > return 0; > } But I wonder if helpers which manipulate a vma's HPAGE_RESV_MASK flag(s) rather than the whole vm_provate_data would have been better. 
> static void set_vma_resv_huge_pages(struct vm_area_struct *vma, > unsigned long reserve) > { > - unsigned long flags; > VM_BUG_ON(!is_vm_hugetlb_page(vma)); > VM_BUG_ON(vma->vm_flags & VM_SHARED); > > - flags = (unsigned long)vma->vm_private_data & HPAGE_RESV_MASK; > - vma->vm_private_data = (void *)(reserve | flags); > + set_vma_private_data(vma, > + (get_vma_private_data(vma) & HPAGE_RESV_MASK) | reserve); > } > > static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags) > { > - unsigned long reserveflags = (unsigned long)vma->vm_private_data; > VM_BUG_ON(!is_vm_hugetlb_page(vma)); > - vma->vm_private_data = (void *)(reserveflags | flags); > + VM_BUG_ON(vma->vm_flags & VM_SHARED); > + > + set_vma_private_data(vma, get_vma_private_data(vma) | flags); > } > > static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag) > { > VM_BUG_ON(!is_vm_hugetlb_page(vma)); > - return ((unsigned long)vma->vm_private_data & flag) != 0; > + > + return (get_vma_private_data(vma) & flag) != 0; > } Oh. We already kinda have it. Perhaps vma_resv_huge_pages() should have called set_vma_resv_flags(). I guess the assertions would have busted that. Oh well, whatever. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] huge page private reservation review cleanups 2008-05-30 20:29 ` Andrew Morton @ 2008-05-31 13:06 ` Mel Gorman 0 siblings, 0 replies; 16+ messages in thread From: Mel Gorman @ 2008-05-31 13:06 UTC (permalink / raw) To: Andrew Morton Cc: Andy Whitcroft, linux-mm, linux-kernel, agl, wli, kenchen, dwg, andi, dean, abh On (30/05/08 13:29), Andrew Morton didst pronounce: > On Fri, 30 May 2008 17:58:24 +0100 > Andy Whitcroft <apw@shadowen.org> wrote: > > > > > Create some new accessors for vma private data to cut down on and contain > > the casts. Encapsulates the huge and small page offset calculations. Also > > adds a couple of VM_BUG_ONs for consistency. > > > > I'll stage this after Mel's > hugetlb-guarantee-that-cow-faults-for-a-process-that-called-mmapmap_private-on-hugetlbfs-will-succeed.patch > Sounds good, thanks. > > --- > > mm/hugetlb.c | 56 +++++++++++++++++++++++++++++++++++++++++++------------- > > 1 files changed, 43 insertions(+), 13 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index 729a830..7a5ac81 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -40,6 +40,26 @@ static int hugetlb_next_nid; > > */ > > static DEFINE_SPINLOCK(hugetlb_lock); > > > > +/* > > + * Convert the address within this vma to the page offset within > > + * the mapping, in base page units. > > + */ > > +pgoff_t vma_page_offset(struct vm_area_struct *vma, unsigned long address) > > +{ > > + return ((address - vma->vm_start) >> PAGE_SHIFT) + > > + (vma->vm_pgoff >> PAGE_SHIFT); > > +} > > + > > +/* > > + * Convert the address within this vma to the page offset within > > + * the mapping, in pagecache page units; huge pages here. > > + */ > > +pgoff_t vma_pagecache_offset(struct vm_area_struct *vma, unsigned long address) > > +{ > > + return ((address - vma->vm_start) >> HPAGE_SHIFT) + > > + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); > > +} > > I'll make these static. > > > #define HPAGE_RESV_OWNER (1UL << (BITS_PER_LONG - 1)) > > #define HPAGE_RESV_UNMAPPED (1UL << (BITS_PER_LONG - 2)) > > #define HPAGE_RESV_MASK (HPAGE_RESV_OWNER | HPAGE_RESV_UNMAPPED) > > @@ -53,36 +73,48 @@ static DEFINE_SPINLOCK(hugetlb_lock); > > * to reset the VMA at fork() time as it is not in use yet and there is no > > * chance of the global counters getting corrupted as a result of the values. > > */ > > +static unsigned long get_vma_private_data(struct vm_area_struct *vma) > > +{ > > + return (unsigned long)vma->vm_private_data; > > +} > > + > > +static void set_vma_private_data(struct vm_area_struct *vma, > > + unsigned long value) > > +{ > > + vma->vm_private_data = (void *)value; > > +} > > Better. > > > static unsigned long vma_resv_huge_pages(struct vm_area_struct *vma) > > { > > VM_BUG_ON(!is_vm_hugetlb_page(vma)); > > if (!(vma->vm_flags & VM_SHARED)) > > - return (unsigned long)vma->vm_private_data & ~HPAGE_RESV_MASK; > > + return get_vma_private_data(vma) & ~HPAGE_RESV_MASK; > > return 0; > > } > > But I wonder if helpers which manipulate a vma's HPAGE_RESV_MASK > flag(s) rather than the whole vm_provate_data would have been better. > There are helpers that do that below. It was suggested that I define a struct with bit-fields instead but I didn't feel it was much easier to understand than masks which are already pretty common. 
> > static void set_vma_resv_huge_pages(struct vm_area_struct *vma, > > unsigned long reserve) > > { > > - unsigned long flags; > > VM_BUG_ON(!is_vm_hugetlb_page(vma)); > > VM_BUG_ON(vma->vm_flags & VM_SHARED); > > > > - flags = (unsigned long)vma->vm_private_data & HPAGE_RESV_MASK; > > - vma->vm_private_data = (void *)(reserve | flags); > > + set_vma_private_data(vma, > > + (get_vma_private_data(vma) & HPAGE_RESV_MASK) | reserve); > > } > > > > static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags) > > { > > - unsigned long reserveflags = (unsigned long)vma->vm_private_data; > > VM_BUG_ON(!is_vm_hugetlb_page(vma)); > > - vma->vm_private_data = (void *)(reserveflags | flags); > > + VM_BUG_ON(vma->vm_flags & VM_SHARED); > > + > > + set_vma_private_data(vma, get_vma_private_data(vma) | flags); > > } > > > > static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag) > > { > > VM_BUG_ON(!is_vm_hugetlb_page(vma)); > > - return ((unsigned long)vma->vm_private_data & flag) != 0; > > + > > + return (get_vma_private_data(vma) & flag) != 0; > > } > > Oh. We already kinda have it. Perhaps vma_resv_huge_pages() should > have called set_vma_resv_flags(). I guess the assertions would have > busted that. > The assertions as-is would have made that hard all right, but the checks (particularly the SHARED ones) that are there are really defensive in nature rather than set in stone. A VM_SHARED mapping could use the flags as well if there was a good reason for it but I didn't want the helpers to be used by accident. > Oh well, whatever. > I am currently under the belief that the helpers as-is are fairly easy to understand, should not interfere badly with the 1GB-and-multi-large-page-support being worked on and are reasonably difficult to use incorrectly but I'm open to being corrected on it. Thanks -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
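For what it's worth, the bit-field alternative Mel mentions rejecting above might have looked something like the hypothetical sketch below; nothing of the sort was posted, and the field names and widths are assumptions.

/*
 * Hypothetical bit-field view of the per-VMA reservation word -- the
 * alternative Mel mentions and rejects. The 62-bit field presumes a
 * 64-bit long plus the common compiler extension allowing long
 * bit-fields, and bit-field layout within the word is
 * implementation-defined, which is one reason the posted code sticks
 * with explicit masks instead.
 */
struct hugetlb_vma_resv {
	unsigned long reserve  : 62;	/* reserved huge pages remaining */
	unsigned long unmapped : 1;	/* would replace HPAGE_RESV_UNMAPPED */
	unsigned long owner    : 1;	/* would replace HPAGE_RESV_OWNER */
};

int main(void)
{
	/* sanity check: the struct should still fit in one word */
	return sizeof(struct hugetlb_vma_resv) == sizeof(unsigned long) ? 0 : 1;
}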
* Re: [PATCH 1/2] huge page private reservation review cleanups 2008-05-30 16:58 ` [PATCH 1/2] huge page private reservation review cleanups Andy Whitcroft 2008-05-30 20:29 ` Andrew Morton @ 2008-05-31 12:21 ` Mel Gorman 1 sibling, 0 replies; 16+ messages in thread From: Mel Gorman @ 2008-05-31 12:21 UTC (permalink / raw) To: Andy Whitcroft Cc: Andrew Morton, linux-mm, linux-kernel, agl, wli, kenchen, dwg, andi, dean, abh On (30/05/08 17:58), Andy Whitcroft didst pronounce: > > Create some new accessors for vma private data to cut down on and contain > the casts. Encapsulates the huge and small page offset calculations. Also > adds a couple of VM_BUG_ONs for consistency. > Things are a bit more readable with the helpers for sure. Thanks for catching the missing VM_BUG_ONs as well. > Signed-off-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Mel Gorman <mel@csn.ul.ie> > --- > mm/hugetlb.c | 56 +++++++++++++++++++++++++++++++++++++++++++------------- > 1 files changed, 43 insertions(+), 13 deletions(-) > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 729a830..7a5ac81 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -40,6 +40,26 @@ static int hugetlb_next_nid; > */ > static DEFINE_SPINLOCK(hugetlb_lock); > > +/* > + * Convert the address within this vma to the page offset within > + * the mapping, in base page units. > + */ > +pgoff_t vma_page_offset(struct vm_area_struct *vma, unsigned long address) > +{ > + return ((address - vma->vm_start) >> PAGE_SHIFT) + > + (vma->vm_pgoff >> PAGE_SHIFT); > +} > + > +/* > + * Convert the address within this vma to the page offset within > + * the mapping, in pagecache page units; huge pages here. > + */ > +pgoff_t vma_pagecache_offset(struct vm_area_struct *vma, unsigned long address) > +{ > + return ((address - vma->vm_start) >> HPAGE_SHIFT) + > + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); > +} > + > #define HPAGE_RESV_OWNER (1UL << (BITS_PER_LONG - 1)) > #define HPAGE_RESV_UNMAPPED (1UL << (BITS_PER_LONG - 2)) > #define HPAGE_RESV_MASK (HPAGE_RESV_OWNER | HPAGE_RESV_UNMAPPED) > @@ -53,36 +73,48 @@ static DEFINE_SPINLOCK(hugetlb_lock); > * to reset the VMA at fork() time as it is not in use yet and there is no > * chance of the global counters getting corrupted as a result of the values. 
> */ > +static unsigned long get_vma_private_data(struct vm_area_struct *vma) > +{ > + return (unsigned long)vma->vm_private_data; > +} > + > +static void set_vma_private_data(struct vm_area_struct *vma, > + unsigned long value) > +{ > + vma->vm_private_data = (void *)value; > +} > + > static unsigned long vma_resv_huge_pages(struct vm_area_struct *vma) > { > VM_BUG_ON(!is_vm_hugetlb_page(vma)); > if (!(vma->vm_flags & VM_SHARED)) > - return (unsigned long)vma->vm_private_data & ~HPAGE_RESV_MASK; > + return get_vma_private_data(vma) & ~HPAGE_RESV_MASK; > return 0; > } > > static void set_vma_resv_huge_pages(struct vm_area_struct *vma, > unsigned long reserve) > { > - unsigned long flags; > VM_BUG_ON(!is_vm_hugetlb_page(vma)); > VM_BUG_ON(vma->vm_flags & VM_SHARED); > > - flags = (unsigned long)vma->vm_private_data & HPAGE_RESV_MASK; > - vma->vm_private_data = (void *)(reserve | flags); > + set_vma_private_data(vma, > + (get_vma_private_data(vma) & HPAGE_RESV_MASK) | reserve); > } > > static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags) > { > - unsigned long reserveflags = (unsigned long)vma->vm_private_data; > VM_BUG_ON(!is_vm_hugetlb_page(vma)); > - vma->vm_private_data = (void *)(reserveflags | flags); > + VM_BUG_ON(vma->vm_flags & VM_SHARED); > + > + set_vma_private_data(vma, get_vma_private_data(vma) | flags); > } > > static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag) > { > VM_BUG_ON(!is_vm_hugetlb_page(vma)); > - return ((unsigned long)vma->vm_private_data & flag) != 0; > + > + return (get_vma_private_data(vma) & flag) != 0; > } > > /* Decrement the reserved pages in the hugepage pool by one */ > @@ -1150,11 +1182,10 @@ static struct page *hugetlbfs_pagecache_page(struct vm_area_struct *vma, > unsigned long address) > { > struct address_space *mapping; > - unsigned long idx; > + pgoff_t idx; > > mapping = vma->vm_file->f_mapping; > - idx = ((address - vma->vm_start) >> HPAGE_SHIFT) > - + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); > + idx = vma_pagecache_offset(vma, address); > > return find_lock_page(mapping, idx); > } > @@ -1163,7 +1194,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma, > unsigned long address, pte_t *ptep, int write_access) > { > int ret = VM_FAULT_SIGBUS; > - unsigned long idx; > + pgoff_t idx; > unsigned long size; > struct page *page; > struct address_space *mapping; > @@ -1182,8 +1213,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma, > } > > mapping = vma->vm_file->f_mapping; > - idx = ((address - vma->vm_start) >> HPAGE_SHIFT) > - + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); > + idx = vma_pagecache_offset(vma, address); > > /* > * Use page lock to guard against racing truncation > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 2/2] huge page MAP_NORESERVE review cleanups 2008-05-30 16:57 ` [PATCH 0/2] hugetlb reservations v4/MAP_NORESERVE V3 cleanups Andy Whitcroft 2008-05-30 16:58 ` [PATCH 1/2] huge page private reservation review cleanups Andy Whitcroft @ 2008-05-30 16:58 ` Andy Whitcroft 1 sibling, 0 replies; 16+ messages in thread From: Andy Whitcroft @ 2008-05-30 16:58 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, agl, wli, kenchen, dwg, andi, Mel Gorman, dean, abh, Andy Whitcroft Use the new encapsulated huge page offset helper. Signed-off-by: Andy Whitcroft <apw@shadowen.org> --- mm/hugetlb.c | 6 ++---- 1 files changed, 2 insertions(+), 4 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 1dce03a..901e580 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -733,8 +733,7 @@ static int vma_needs_reservation(struct vm_area_struct *vma, unsigned long addr) struct inode *inode = mapping->host; if (vma->vm_flags & VM_SHARED) { - unsigned long idx = ((addr - vma->vm_start) >> HPAGE_SHIFT) + - (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); + pgoff_t idx = vma_pagecache_offset(vma, addr); return region_chg(&inode->i_mapping->private_list, idx, idx + 1); @@ -752,8 +751,7 @@ static void vma_commit_reservation(struct vm_area_struct *vma, struct inode *inode = mapping->host; if (vma->vm_flags & VM_SHARED) { - unsigned long idx = ((addr - vma->vm_start) >> HPAGE_SHIFT) + - (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); + pgoff_t idx = vma_pagecache_offset(vma, addr); region_add(&inode->i_mapping->private_list, idx, idx + 1); } } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 16+ messages in thread
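The two hunks above follow a check-then-commit pattern: vma_needs_reservation() asks, via region_chg(), whether a fault at this index would require a new reservation against the shared mapping, and vma_commit_reservation() records it via region_add() once the fault has been satisfied. The toy sketch below models only that single-page idempotence; the array-backed map, the toy_ names and main() are assumptions standing in for the real region list kept on inode->i_mapping->private_list.

#include <stdio.h>

#define NR_IDX 64

/*
 * Toy stand-in for the shared-mapping reservation map, reduced to the
 * single-page [idx, idx + 1) case the patch deals with.
 */
static int reserved[NR_IDX];

/* How many pages must newly be reserved to cover idx (0 if already covered)? */
static long toy_region_chg(unsigned long idx)
{
	return reserved[idx] ? 0 : 1;
}

/* Commit the reservation for idx. */
static void toy_region_add(unsigned long idx)
{
	reserved[idx] = 1;
}

int main(void)
{
	unsigned long idx = 4;	/* e.g. the pagecache index of the faulting page */

	if (toy_region_chg(idx) > 0)	/* vma_needs_reservation() step */
		toy_region_add(idx);	/* vma_commit_reservation() step */

	printf("a second fault on idx %lu needs %ld new reservations\n",
	       idx, toy_region_chg(idx));
	return 0;
}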
Thread overview: 16+ messages
2008-05-27 18:50 [PATCH 0/3] Guarantee faults for processes that call mmap(MAP_PRIVATE) on hugetlbfs v4 Mel Gorman
2008-05-27 18:50 ` [PATCH 1/3] Move hugetlb_acct_memory() Mel Gorman
2008-05-28 13:37 ` Adam Litke
2008-05-27 18:51 ` [PATCH 2/3] Reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork() Mel Gorman
2008-05-28 13:52 ` Adam Litke
2008-05-27 18:51 ` [PATCH 3/3] Guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed Mel Gorman
2008-05-28 16:00 ` Mel Gorman
2008-05-28 18:15 ` Adam Litke
2008-05-28 18:16 ` Adam Litke
2008-05-29 1:42 ` Andrew Morton
2008-05-30 16:57 ` [PATCH 0/2] hugetlb reservations v4/MAP_NORESERVE V3 cleanups Andy Whitcroft
2008-05-30 16:58 ` [PATCH 1/2] huge page private reservation review cleanups Andy Whitcroft
2008-05-30 20:29 ` Andrew Morton
2008-05-31 13:06 ` Mel Gorman
2008-05-31 12:21 ` Mel Gorman
2008-05-30 16:58 ` [PATCH 2/2] huge page MAP_NORESERVE " Andy Whitcroft