linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [patch] mm: fix xip issue with /dev/zero
@ 2007-02-16 12:22 Carsten Otte
  2007-02-18 18:56 ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Carsten Otte @ 2007-02-16 12:22 UTC (permalink / raw)
  To: Hugh Dickins, Nick Piggin, Linux Memory Management, LinusTorvalds

This patch removes usage of ZERO_PAGE for xip. We use our own zeroed
page for mapping sparse holes to userland now. That gets us rid of
dependencies with other users of ZERO_PAGE, such as /dev/zero. Thanks to
Hugh for reporting this issue. I tested this briefly and it seems to
work fine, please apply.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>

---
diff -ruN linux-2.6/mm/filemap_xip.c linux-2.6+fix/mm/filemap_xip.c
--- linux-2.6/mm/filemap_xip.c	2007-02-02 13:02:58.000000000 +0100
+++ linux-2.6+fix/mm/filemap_xip.c	2007-02-15 15:18:51.000000000 +0100
@@ -17,6 +17,30 @@
 #include "filemap.h"
 
 /*
+ * We do use our own empty page to avoid interference with other users
+ * of ZERO_PAGE(), such as /dev/zero
+ */
+static struct page * __xip_sparse_page = NULL;
+static spinlock_t   xip_alloc_lock = SPIN_LOCK_UNLOCKED;
+
+static inline struct page *
+xip_sparse_page(void)
+{
+	unsigned long tmp;
+
+	if (!__xip_sparse_page) {
+		tmp = get_zeroed_page(GFP_KERNEL);
+		spin_lock(&xip_alloc_lock);
+		if (!__xip_sparse_page)
+			__xip_sparse_page = virt_to_page(tmp);
+		else
+			free_page (tmp);;
+		spin_unlock(&xip_alloc_lock);
+	}
+	return __xip_sparse_page;
+}
+
+/*
  * This is a file read routine for execute in place files, and uses
  * the mapping->a_ops->get_xip_page() function for the actual low-level
  * stuff.
@@ -68,7 +92,7 @@
 		if (unlikely(IS_ERR(page))) {
 			if (PTR_ERR(page) == -ENODATA) {
 				/* sparse */
-				page = ZERO_PAGE(0);
+				page = xip_sparse_page();
 			} else {
 				desc->error = PTR_ERR(page);
 				goto out;
@@ -162,7 +186,7 @@
  * xip_write
  *
  * This function walks all vmas of the address_space and unmaps the
- * ZERO_PAGE when found at pgoff. Should it go in rmap.c?
+ * xip_sparse_page() when found at pgoff.
  */
 static void
 __xip_unmap (struct address_space * mapping,
@@ -183,7 +207,7 @@
 		address = vma->vm_start +
 			((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 		BUG_ON(address < vma->vm_start || address >= vma->vm_end);
-		page = ZERO_PAGE(0);
+		page = xip_sparse_page();
 		pte = page_check_address(page, mm, address, &ptl);
 		if (pte) {
 			/* Nuke the page table entry. */
@@ -245,8 +269,8 @@
 		/* unmap page at pgoff from all other vmas */
 		__xip_unmap(mapping, pgoff);
 	} else {
-		/* not shared and writable, use ZERO_PAGE() */
-		page = ZERO_PAGE(0);
+		/* not shared and writable, use xip_sparse_page() */
+		page = xip_sparse_page();
 	}
 
 out:


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] mm: fix xip issue with /dev/zero
  2007-02-16 12:22 [patch] mm: fix xip issue with /dev/zero Carsten Otte
@ 2007-02-18 18:56 ` Hugh Dickins
  2007-02-26 18:04   ` [RFC] " Carsten Otte
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2007-02-18 18:56 UTC (permalink / raw)
  To: Carsten Otte; +Cc: Nick Piggin, Linux Memory Management, LinusTorvalds

On Fri, 16 Feb 2007, Carsten Otte wrote:

> This patch removes usage of ZERO_PAGE for xip. We use our own zeroed
> page for mapping sparse holes to userland now. That gets us rid of
> dependencies with other users of ZERO_PAGE, such as /dev/zero. Thanks to
> Hugh for reporting this issue. I tested this briefly and it seems to
> work fine, please apply.

That's too vague a description of the bug.  The problem was that
a long read from /dev/zero into a private xip mapping would insert the
ZERO_PAGE into the userspace page table, but xip could not distinguish
that from its own use of the ZERO_PAGE where there's a hole: if the
hole gets filled in later, it would wrongly update the private mapping
from the zeroes read there to the data newly written into the file.

> 
> Signed-off-by: Carsten Otte <cotte@de.ibm.com>

Your patch is on the right lines, but not yet correct:
sorry if I misled you when I concentrated on the locking.

> 
> ---
> diff -ruN linux-2.6/mm/filemap_xip.c linux-2.6+fix/mm/filemap_xip.c

Please include the "-p" option in your diff flags, to show what
function each hunk falls in: this is a case where that would have
been helpful.

> --- linux-2.6/mm/filemap_xip.c	2007-02-02 13:02:58.000000000 +0100
> +++ linux-2.6+fix/mm/filemap_xip.c	2007-02-15 15:18:51.000000000 +0100
> @@ -17,6 +17,30 @@
>  #include "filemap.h"
>  
>  /*
> + * We do use our own empty page to avoid interference with other users
> + * of ZERO_PAGE(), such as /dev/zero
> + */
> +static struct page * __xip_sparse_page = NULL;
> +static spinlock_t   xip_alloc_lock = SPIN_LOCK_UNLOCKED;

(You tend to insert too many spaces for kernel CodingStyle,
but filemap_xip.c is already like that, so I won't worry now.)

> +
> +static inline struct page *
> +xip_sparse_page(void)

Leave it to gcc to decide whether this is right to inline,
and put the whole declaration on one line:

static struct page *xip_sparse_page(void)

> +{
> +	unsigned long tmp;
> +
> +	if (!__xip_sparse_page) {
> +		tmp = get_zeroed_page(GFP_KERNEL);

It's rare for a GFP_KERNEL allocation to fail, but it can happen
(last time I looked it could only happen if the calling process
has been chosen for OOM-kill; but details are subject to change).

You do need to allow for the 0 return here, and deal with it
in your callers: proceeding with virt_to_page(0) will turn a
temporary lack of memory into a crash for all subsequent users.

> +		spin_lock(&xip_alloc_lock);
> +		if (!__xip_sparse_page)
> +			__xip_sparse_page = virt_to_page(tmp);
> +		else
> +			free_page (tmp);;
> +		spin_unlock(&xip_alloc_lock);
> +	}
> +	return __xip_sparse_page;
> +}
> +
> +/*
>   * This is a file read routine for execute in place files, and uses
>   * the mapping->a_ops->get_xip_page() function for the actual low-level
>   * stuff.
> @@ -68,7 +92,7 @@
>  		if (unlikely(IS_ERR(page))) {
>  			if (PTR_ERR(page) == -ENODATA) {
>  				/* sparse */
> -				page = ZERO_PAGE(0);
> +				page = xip_sparse_page();
>  			} else {
>  				desc->error = PTR_ERR(page);
>  				goto out;
> @@ -162,7 +186,7 @@
>   * xip_write
>   *
>   * This function walks all vmas of the address_space and unmaps the
> - * ZERO_PAGE when found at pgoff. Should it go in rmap.c?
> + * xip_sparse_page() when found at pgoff.
>   */
>  static void
>  __xip_unmap (struct address_space * mapping,
> @@ -183,7 +207,7 @@
>  		address = vma->vm_start +
>  			((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
>  		BUG_ON(address < vma->vm_start || address >= vma->vm_end);
> -		page = ZERO_PAGE(0);
> +		page = xip_sparse_page();

No: "diff -p" would show this one is in __xip_unmap, and here you're
inside spin_lock(&mapping->i_mmap_lock): it is not safe to allocate
a page with GFP_KERNEL here, nor is there any point in doing so.
__xip_unmap should return immediately if __xip_sparse_page has not
yet been set.

>  		pte = page_check_address(page, mm, address, &ptl);
>  		if (pte) {
>  			/* Nuke the page table entry. */
> @@ -245,8 +269,8 @@
>  		/* unmap page at pgoff from all other vmas */
>  		__xip_unmap(mapping, pgoff);
>  	} else {
> -		/* not shared and writable, use ZERO_PAGE() */
> -		page = ZERO_PAGE(0);
> +		/* not shared and writable, use xip_sparse_page() */
> +		page = xip_sparse_page();
>  	}
>  
>  out:

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC] [patch] mm: fix xip issue with /dev/zero
  2007-02-18 18:56 ` Hugh Dickins
@ 2007-02-26 18:04   ` Carsten Otte
  2007-03-01 18:59     ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Carsten Otte @ 2007-02-26 18:04 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Nick Piggin, Linux Memory Management

Thanks for your review feedback Hugh, I do appreciate it. Here comes my
second attempt:

This patch fixes the bug, that reading into xip mapping from /dev/zero
fills the user page table with ZERO_PAGE() entries. Later on, xip cannot
tell which pages have been ZERO_PAGE() filled by access to a sparse
mapping, and which ones origin from /dev/zero. It will unmap ZERO_PAGE
from all mappings when filling the sparse hole with data.
xip does now use its own zeroed page for its sparse mappings.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>

---
diff -pruN linux-2.6/mm/filemap_xip.c linux-2.6+fix/mm/filemap_xip.c
--- linux-2.6/mm/filemap_xip.c	2007-02-26 13:46:28.000000000 +0100
+++ linux-2.6+fix/mm/filemap_xip.c	2007-02-26 18:45:15.000000000 +0100
@@ -17,6 +17,31 @@
 #include "filemap.h"
 
 /*
+ * We do use our own empty page to avoid interference with other users
+ * of ZERO_PAGE(), such as /dev/zero
+ */
+static struct page *__xip_sparse_page = NULL;
+static DEFINE_SPINLOCK(xip_alloc_lock);
+
+static struct page *xip_sparse_page(void)
+{
+	unsigned long tmp;
+
+	if (!__xip_sparse_page) {
+		tmp = get_zeroed_page(GFP_ATOMIC);
+		if (!tmp)
+			return NULL;
+		spin_lock(&xip_alloc_lock);
+		if (!__xip_sparse_page)
+			__xip_sparse_page = virt_to_page(tmp);
+		else
+			free_page (tmp);
+		spin_unlock(&xip_alloc_lock);
+	}
+	return __xip_sparse_page;
+}
+
+/*
  * This is a file read routine for execute in place files, and uses
  * the mapping->a_ops->get_xip_page() function for the actual low-level
  * stuff.
@@ -63,12 +88,18 @@ do_xip_mapping_read(struct address_space
 
 		page = mapping->a_ops->get_xip_page(mapping,
 			index*(PAGE_SIZE/512), 0);
-		if (!page)
+		if (!page) {
+			desc->error = -EIO;
 			goto no_xip_page;
+		}
 		if (unlikely(IS_ERR(page))) {
 			if (PTR_ERR(page) == -ENODATA) {
 				/* sparse */
-				page = ZERO_PAGE(0);
+				page = xip_sparse_page();
+				if (!page) {
+					desc->error = -ENOMEM;
+					goto no_xip_page;
+				}
 			} else {
 				desc->error = PTR_ERR(page);
 				goto out;
@@ -102,7 +133,6 @@ do_xip_mapping_read(struct address_space
 
 no_xip_page:
 		/* Did not get the page. Report it */
-		desc->error = -EIO;
 		goto out;
 	}
 
@@ -162,7 +192,7 @@ EXPORT_SYMBOL_GPL(xip_file_sendfile);
  * xip_write
  *
  * This function walks all vmas of the address_space and unmaps the
- * ZERO_PAGE when found at pgoff. Should it go in rmap.c?
+ * xip_sparse_page() when found at pgoff.
  */
 static void
 __xip_unmap (struct address_space * mapping,
@@ -183,7 +213,11 @@ __xip_unmap (struct address_space * mapp
 		address = vma->vm_start +
 			((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 		BUG_ON(address < vma->vm_start || address >= vma->vm_end);
-		page = ZERO_PAGE(0);
+		page = xip_sparse_page();
+		if (!page)
+			/* cannot allocate xip page, is not mapped anywhere */
+			goto out_unlock;
+
 		pte = page_check_address(page, mm, address, &ptl);
 		if (pte) {
 			/* Nuke the page table entry. */
@@ -196,6 +230,7 @@ __xip_unmap (struct address_space * mapp
 			page_cache_release(page);
 		}
 	}
+out_unlock:
 	spin_unlock(&mapping->i_mmap_lock);
 }
 
@@ -245,12 +280,13 @@ xip_file_nopage(struct vm_area_struct * 
 		/* unmap page at pgoff from all other vmas */
 		__xip_unmap(mapping, pgoff);
 	} else {
-		/* not shared and writable, use ZERO_PAGE() */
-		page = ZERO_PAGE(0);
+		/* not shared and writable, use xip_sparse_page() */
+		page = xip_sparse_page();
 	}
 
 out:
-	page_cache_get(page);
+	if (page)
+		page_cache_get(page);
 	return page;
 }
 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] [patch] mm: fix xip issue with /dev/zero
  2007-02-26 18:04   ` [RFC] " Carsten Otte
@ 2007-03-01 18:59     ` Hugh Dickins
  2007-03-27 15:37       ` Carsten Otte
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2007-03-01 18:59 UTC (permalink / raw)
  To: Carsten Otte; +Cc: Nick Piggin, Linux Memory Management

On Mon, 26 Feb 2007, Carsten Otte wrote:

> Thanks for your review feedback Hugh, I do appreciate it. Here comes my
> second attempt:

Still not quite right, so I took your patch and reworked it below:
if you agree with that version, please send it on to akpm.

Things I didn't like about yours: I think you misunderstood me on
__xip_unmap, my point was that it's silly for that to be allocating
at all; if you can avoid GFP_ATOMIC you should, GFP_HIGHUSER is the
most appropriate in this case; xip_file_nopage ought to distinguish
the new NOPAGE_OOM case from its existing NULLs, which I therefore
changed to NOPAGE_SIGBUSs (filemap_nopage was made clearer that way
in 2.6.19); and in doing this, I've realized that there's no need
to change do_xip_mapping_read, its use of the ZERO_PAGE is safe
(so very likely we'll never allocate a __xip_sparse_page at all).

But I hesitated over my !page test at the start of __xip_unmap:
doesn't the "page = __xip_sparse_page" need an smp_rmb() before
it, to serialize against something setting __xip_sparse_page
concurrently?  Then realized there's an underlying raciness
there, and that's the least of it: __xip_unmap has nothing to
guard against a racing task (whose get_xip_page said -ENODATA
a moment earlier) inserting __xip_sparse_page into a vma of the
file just after __xip_unmap has checked it.

Well, fixing that would be another patch entirely, and not one
I want to think about at present: it's not obvious to me what
the appropriate locking should be, and it's probably not even
fixable within xip_file_nopage: that's a price we pay for
trying to play special zero/sparse page tricks here.



This patch fixes the bug, that reading into xip mapping from /dev/zero
fills the user page table with ZERO_PAGE() entries. Later on, xip cannot
tell which pages have been ZERO_PAGE() filled by access to a sparse
mapping, and which ones origin from /dev/zero. It will unmap ZERO_PAGE
from all mappings when filling the sparse hole with data.
xip does now use its own zeroed page for its sparse mappings.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 mm/filemap_xip.c |   48 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 37 insertions(+), 11 deletions(-)

--- 2.6.21-rc2/mm/filemap_xip.c	2007-02-04 18:44:54.000000000 +0000
+++ linux/mm/filemap_xip.c	2007-03-01 18:25:48.000000000 +0000
@@ -17,6 +17,29 @@
 #include "filemap.h"
 
 /*
+ * We do use our own empty page to avoid interference with other users
+ * of ZERO_PAGE(), such as /dev/zero
+ */
+static struct page *__xip_sparse_page;
+
+static struct page *xip_sparse_page(void)
+{
+	if (!__xip_sparse_page) {
+		unsigned long zeroes = get_zeroed_page(GFP_HIGHUSER);
+		if (zeroes) {
+			static DEFINE_SPINLOCK(xip_alloc_lock);
+			spin_lock(&xip_alloc_lock);
+			if (!__xip_sparse_page)
+				__xip_sparse_page = virt_to_page(zeroes);
+			else
+				free_page(zeroes);
+			spin_unlock(&xip_alloc_lock);
+		}
+	}
+	return __xip_sparse_page;
+}
+
+/*
  * This is a file read routine for execute in place files, and uses
  * the mapping->a_ops->get_xip_page() function for the actual low-level
  * stuff.
@@ -162,7 +185,7 @@ EXPORT_SYMBOL_GPL(xip_file_sendfile);
  * xip_write
  *
  * This function walks all vmas of the address_space and unmaps the
- * ZERO_PAGE when found at pgoff. Should it go in rmap.c?
+ * __xip_sparse_page when found at pgoff.
  */
 static void
 __xip_unmap (struct address_space * mapping,
@@ -177,13 +200,16 @@ __xip_unmap (struct address_space * mapp
 	spinlock_t *ptl;
 	struct page *page;
 
+	page = __xip_sparse_page;
+	if (!page)
+		return;
+
 	spin_lock(&mapping->i_mmap_lock);
 	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
 		mm = vma->vm_mm;
 		address = vma->vm_start +
 			((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 		BUG_ON(address < vma->vm_start || address >= vma->vm_end);
-		page = ZERO_PAGE(0);
 		pte = page_check_address(page, mm, address, &ptl);
 		if (pte) {
 			/* Nuke the page table entry. */
@@ -222,16 +248,14 @@ xip_file_nopage(struct vm_area_struct * 
 		+ area->vm_pgoff;
 
 	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-	if (pgoff >= size) {
-		return NULL;
-	}
+	if (pgoff >= size)
+		return NOPAGE_SIGBUS;
 
 	page = mapping->a_ops->get_xip_page(mapping, pgoff*(PAGE_SIZE/512), 0);
-	if (!IS_ERR(page)) {
+	if (!IS_ERR(page))
 		goto out;
-	}
 	if (PTR_ERR(page) != -ENODATA)
-		return NULL;
+		return NOPAGE_SIGBUS;
 
 	/* sparse block */
 	if ((area->vm_flags & (VM_WRITE | VM_MAYWRITE)) &&
@@ -241,12 +265,14 @@ xip_file_nopage(struct vm_area_struct * 
 		page = mapping->a_ops->get_xip_page (mapping,
 			pgoff*(PAGE_SIZE/512), 1);
 		if (IS_ERR(page))
-			return NULL;
+			return NOPAGE_SIGBUS;
 		/* unmap page at pgoff from all other vmas */
 		__xip_unmap(mapping, pgoff);
 	} else {
-		/* not shared and writable, use ZERO_PAGE() */
-		page = ZERO_PAGE(0);
+		/* not shared and writable, use xip_sparse_page() */
+		page = xip_sparse_page();
+		if (!page)
+			return NOPAGE_OOM;
 	}
 
 out:

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] [patch] mm: fix xip issue with /dev/zero
  2007-03-01 18:59     ` Hugh Dickins
@ 2007-03-27 15:37       ` Carsten Otte
  2007-03-27 17:07         ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Carsten Otte @ 2007-03-27 15:37 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Nick Piggin, Linux Memory Management

Am Donnerstag, den 01.03.2007, 18:59 +0000 schrieb Hugh Dickins:
> Still not quite right, so I took your patch and reworked it below:
> if you agree with that version, please send it on to akpm.
Sorry for my late reply. The patch does'nt apply on -mm anymore, because
filemap_xip now uses fault instead of nopage. I modified your patch
again to fit on current -mm. Did I miss something? If no, I will send it
to Andrew. I've done some basic testing on it, all seems to work well.

This patch fixes the bug, that reading into xip mapping from /dev/zero
fills the user page table with ZERO_PAGE() entries. Later on, xip cannot
tell which pages have been ZERO_PAGE() filled by access to a sparse
mapping, and which ones origin from /dev/zero. It will unmap ZERO_PAGE
from all mappings when filling the sparse hole with data.
xip does now use its own zeroed page for its sparse mappings.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---

--- linux-2.6.21-rc5-mm2/mm/filemap_xip.c	2007-03-27 12:51:22.000000000 +0200
+++ linux-2.6.21-rc5-mm2+patch/mm/filemap_xip.c	2007-03-27 15:37:44.000000000 +0200
@@ -17,6 +17,29 @@
 #include "filemap.h"
 
 /*
+ * We do use our own empty page to avoid interference with other users
+ * of ZERO_PAGE(), such as /dev/zero
+ */
+static struct page *__xip_sparse_page;
+
+static struct page *xip_sparse_page(void)
+{
+	if (!__xip_sparse_page) {
+		unsigned long zeroes = get_zeroed_page(GFP_HIGHUSER);
+		if (zeroes) {
+			static DEFINE_SPINLOCK(xip_alloc_lock);
+			spin_lock(&xip_alloc_lock);
+			if (!__xip_sparse_page)
+				__xip_sparse_page = virt_to_page(zeroes);
+			else
+				free_page(zeroes);
+			spin_unlock(&xip_alloc_lock);
+		}
+	}
+	return __xip_sparse_page;
+}
+
+/*
  * This is a file read routine for execute in place files, and uses
  * the mapping->a_ops->get_xip_page() function for the actual low-level
  * stuff.
@@ -162,7 +185,7 @@
  * xip_write
  *
  * This function walks all vmas of the address_space and unmaps the
- * ZERO_PAGE when found at pgoff. Should it go in rmap.c?
+ * __xip_sparse_page when found at pgoff.
  */
 static void
 __xip_unmap (struct address_space * mapping,
@@ -177,13 +200,16 @@
 	spinlock_t *ptl;
 	struct page *page;
 
+	page = __xip_sparse_page;
+	if (!page)
+		return;
+
 	spin_lock(&mapping->i_mmap_lock);
 	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
 		mm = vma->vm_mm;
 		address = vma->vm_start +
 			((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 		BUG_ON(address < vma->vm_start || address >= vma->vm_end);
-		page = ZERO_PAGE(0);
 		pte = page_check_address(page, mm, address, &ptl);
 		if (pte) {
 			/* Nuke the page table entry. */
@@ -245,8 +271,12 @@
 		/* unmap page at pgoff from all other vmas */
 		__xip_unmap(mapping, fdata->pgoff);
 	} else {
-		/* not shared and writable, use ZERO_PAGE() */
-		page = ZERO_PAGE(0);
+		/* not shared and writable, use xip_sparse_page() */
+		page = xip_sparse_page();
+		if (!page) {
+	                fdata->type = VM_FAULT_OOM;
+	                return NULL;
+		}
 	}
 
 out:


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] [patch] mm: fix xip issue with /dev/zero
  2007-03-27 15:37       ` Carsten Otte
@ 2007-03-27 17:07         ` Hugh Dickins
  0 siblings, 0 replies; 6+ messages in thread
From: Hugh Dickins @ 2007-03-27 17:07 UTC (permalink / raw)
  To: Carsten Otte; +Cc: Nick Piggin, Linux Memory Management

On Tue, 27 Mar 2007, Carsten Otte wrote:
> Am Donnerstag, den 01.03.2007, 18:59 +0000 schrieb Hugh Dickins:
> > Still not quite right, so I took your patch and reworked it below:
> > if you agree with that version, please send it on to akpm.
> Sorry for my late reply.

I am the last person anyone should apologize to for lateness!

> The patch does'nt apply on -mm anymore, because
> filemap_xip now uses fault instead of nopage. I modified your patch
> again to fit on current -mm. Did I miss something? If no, I will send it
> to Andrew. I've done some basic testing on it, all seems to work well.

Comparing against what I suggested, it looks just right to me:
do go ahead and send Andrew.

But that comparison does show one discrepancy, not in your patch
below, but where Nick and I independently fixed up the "error
reporting".  He interprets one failure of get_xip_page as
VM_FAULT_OOM then the next as VM_FAULT_SIGBUS, where I thought
them both NOPAGE_SIGBUS.

I'm inclined to agree with me on that, though it's hard to tell
without peering into the internals of ->get_xip_page()s.

Hmm, and in looking into that, the whole file seems quite confused
as to whether ->get_xip_page might return NULL page or not: some
places allow for it (one treating it as -EIO, another as -ENOMEM),
others don't allow for it at all.  Something to tidy up.

Hugh

> 
> This patch fixes the bug, that reading into xip mapping from /dev/zero
> fills the user page table with ZERO_PAGE() entries. Later on, xip cannot
> tell which pages have been ZERO_PAGE() filled by access to a sparse
> mapping, and which ones origin from /dev/zero. It will unmap ZERO_PAGE
> from all mappings when filling the sparse hole with data.
> xip does now use its own zeroed page for its sparse mappings.
> 
> Signed-off-by: Carsten Otte <cotte@de.ibm.com>
> ---
> 
> --- linux-2.6.21-rc5-mm2/mm/filemap_xip.c	2007-03-27 12:51:22.000000000 +0200
> +++ linux-2.6.21-rc5-mm2+patch/mm/filemap_xip.c	2007-03-27 15:37:44.000000000 +0200
> @@ -17,6 +17,29 @@
>  #include "filemap.h"
>  
>  /*
> + * We do use our own empty page to avoid interference with other users
> + * of ZERO_PAGE(), such as /dev/zero
> + */
> +static struct page *__xip_sparse_page;
> +
> +static struct page *xip_sparse_page(void)
> +{
> +	if (!__xip_sparse_page) {
> +		unsigned long zeroes = get_zeroed_page(GFP_HIGHUSER);
> +		if (zeroes) {
> +			static DEFINE_SPINLOCK(xip_alloc_lock);
> +			spin_lock(&xip_alloc_lock);
> +			if (!__xip_sparse_page)
> +				__xip_sparse_page = virt_to_page(zeroes);
> +			else
> +				free_page(zeroes);
> +			spin_unlock(&xip_alloc_lock);
> +		}
> +	}
> +	return __xip_sparse_page;
> +}
> +
> +/*
>   * This is a file read routine for execute in place files, and uses
>   * the mapping->a_ops->get_xip_page() function for the actual low-level
>   * stuff.
> @@ -162,7 +185,7 @@
>   * xip_write
>   *
>   * This function walks all vmas of the address_space and unmaps the
> - * ZERO_PAGE when found at pgoff. Should it go in rmap.c?
> + * __xip_sparse_page when found at pgoff.
>   */
>  static void
>  __xip_unmap (struct address_space * mapping,
> @@ -177,13 +200,16 @@
>  	spinlock_t *ptl;
>  	struct page *page;
>  
> +	page = __xip_sparse_page;
> +	if (!page)
> +		return;
> +
>  	spin_lock(&mapping->i_mmap_lock);
>  	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
>  		mm = vma->vm_mm;
>  		address = vma->vm_start +
>  			((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
>  		BUG_ON(address < vma->vm_start || address >= vma->vm_end);
> -		page = ZERO_PAGE(0);
>  		pte = page_check_address(page, mm, address, &ptl);
>  		if (pte) {
>  			/* Nuke the page table entry. */
> @@ -245,8 +271,12 @@
>  		/* unmap page at pgoff from all other vmas */
>  		__xip_unmap(mapping, fdata->pgoff);
>  	} else {
> -		/* not shared and writable, use ZERO_PAGE() */
> -		page = ZERO_PAGE(0);
> +		/* not shared and writable, use xip_sparse_page() */
> +		page = xip_sparse_page();
> +		if (!page) {
> +	                fdata->type = VM_FAULT_OOM;
> +	                return NULL;
> +		}
>  	}
>  
>  out:

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2007-03-27 17:07 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-02-16 12:22 [patch] mm: fix xip issue with /dev/zero Carsten Otte
2007-02-18 18:56 ` Hugh Dickins
2007-02-26 18:04   ` [RFC] " Carsten Otte
2007-03-01 18:59     ` Hugh Dickins
2007-03-27 15:37       ` Carsten Otte
2007-03-27 17:07         ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).