From: Mel Gorman <mel@csn.ul.ie>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Jeff Chua <jeff.chua.linux@gmail.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 4/8] mm: FOLL_DUMP replace FOLL_ANON
Date: Wed, 9 Sep 2009 12:14:38 +0100
Message-ID: <20090909111438.GF24614@csn.ul.ie>
In-Reply-To: <Pine.LNX.4.64.0909072233240.15430@sister.anvils>

On Mon, Sep 07, 2009 at 10:35:32PM +0100, Hugh Dickins wrote:
> The "FOLL_ANON optimization" and its use_zero_page() test have caused
> confusion and bugs: why does it test VM_SHARED? for the very good but
> unsatisfying reason that VMware crashed without.  As we look to maybe
> reinstating anonymous use of the ZERO_PAGE, we need to sort this out.
> 
> Easily done: it's silly for __get_user_pages() and follow_page() to
> be guessing whether it's safe to assume that they're being used for
> a coredump (which can take a shortcut snapshot where other uses must
> handle a fault) - just tell them with GUP_FLAGS_DUMP and FOLL_DUMP.
> 
> get_dump_page() doesn't even want a ZERO_PAGE: an error suits fine.
> 
> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>

Acked-by: Mel Gorman <mel@csn.ul.ie>

> ---
> 
>  include/linux/mm.h |    2 +-
>  mm/internal.h      |    1 +
>  mm/memory.c        |   43 ++++++++++++-------------------------------
>  3 files changed, 14 insertions(+), 32 deletions(-)
> 
> --- mm3/include/linux/mm.h	2009-09-07 13:16:32.000000000 +0100
> +++ mm4/include/linux/mm.h	2009-09-07 13:16:39.000000000 +0100
> @@ -1247,7 +1247,7 @@ struct page *follow_page(struct vm_area_
>  #define FOLL_WRITE	0x01	/* check pte is writable */
>  #define FOLL_TOUCH	0x02	/* mark page accessed */
>  #define FOLL_GET	0x04	/* do get_page on page */
> -#define FOLL_ANON	0x08	/* give ZERO_PAGE if no pgtable */
> +#define FOLL_DUMP	0x08	/* give error on hole if it would be zero */
>  
>  typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
>  			void *data);
> --- mm3/mm/internal.h	2009-09-07 13:16:22.000000000 +0100
> +++ mm4/mm/internal.h	2009-09-07 13:16:39.000000000 +0100
> @@ -252,6 +252,7 @@ static inline void mminit_validate_memmo
>  
>  #define GUP_FLAGS_WRITE		0x01
>  #define GUP_FLAGS_FORCE		0x02
> +#define GUP_FLAGS_DUMP		0x04
>  
>  int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>  		     unsigned long start, int len, int flags,
> --- mm3/mm/memory.c	2009-09-07 13:16:32.000000000 +0100
> +++ mm4/mm/memory.c	2009-09-07 13:16:39.000000000 +0100
> @@ -1174,41 +1174,22 @@ no_page:
>  	pte_unmap_unlock(ptep, ptl);
>  	if (!pte_none(pte))
>  		return page;
> -	/* Fall through to ZERO_PAGE handling */
> +
>  no_page_table:
>  	/*
>  	 * When core dumping an enormous anonymous area that nobody
> -	 * has touched so far, we don't want to allocate page tables.
> +	 * has touched so far, we don't want to allocate unnecessary pages or
> +	 * page tables.  Return error instead of NULL to skip handle_mm_fault,
> +	 * then get_dump_page() will return NULL to leave a hole in the dump.
> +	 * But we can only make this optimization where a hole would surely
> +	 * be zero-filled if handle_mm_fault() actually did handle it.
>  	 */
> -	if (flags & FOLL_ANON) {
> -		page = ZERO_PAGE(0);
> -		if (flags & FOLL_GET)
> -			get_page(page);
> -		BUG_ON(flags & FOLL_WRITE);
> -	}
> +	if ((flags & FOLL_DUMP) &&
> +	    (!vma->vm_ops || !vma->vm_ops->fault))
> +		return ERR_PTR(-EFAULT);
>  	return page;
>  }
>  
> -/* Can we do the FOLL_ANON optimization? */
> -static inline int use_zero_page(struct vm_area_struct *vma)
> -{
> -	/*
> -	 * We don't want to optimize FOLL_ANON for make_pages_present()
> -	 * when it tries to page in a VM_LOCKED region. As to VM_SHARED,
> -	 * we want to get the page from the page tables to make sure
> -	 * that we serialize and update with any other user of that
> -	 * mapping.
> -	 */
> -	if (vma->vm_flags & (VM_LOCKED | VM_SHARED))
> -		return 0;
> -	/*
> -	 * And if we have a fault routine, it's not an anonymous region.
> -	 */
> -	return !vma->vm_ops || !vma->vm_ops->fault;
> -}
> -
> -
> -
>  int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>  		     unsigned long start, int nr_pages, int flags,
>  		     struct page **pages, struct vm_area_struct **vmas)
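
Below is a small user-space model of the decision the new no_page_table tail makes. It is a sketch, not kernel code, and it covers only the case where nothing is mapped at the address. model_vma, model_no_page_table() and MODEL_FOLL_DUMP are hypothetical names. err_ptr(), is_err() and ptr_err() are local imitations of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() encoding, assumed to behave the same way on common platforms.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local imitation of ERR_PTR()/IS_ERR()/PTR_ERR(): small negative
 * errno values are encoded at the very top of the pointer range. */
#define MODEL_MAX_ERRNO 4095

static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical model: only the pieces the decision looks at. */
#define MODEL_FOLL_DUMP 0x08	/* same value as FOLL_DUMP in the patch */

struct model_vma {
	int has_fault;	/* stands in for vma->vm_ops && vma->vm_ops->fault */
};

/*
 * Models the no_page_table tail of follow_page() after this patch:
 * nothing is mapped, so report an error (leave a hole in the dump)
 * only when dumping a region that a fault would simply zero-fill;
 * otherwise return NULL and let the caller fault the page in.
 */
static void *model_no_page_table(unsigned int flags,
				 const struct model_vma *vma)
{
	if ((flags & MODEL_FOLL_DUMP) && !vma->has_fault)
		return err_ptr(-EFAULT);
	return NULL;
}
```

With FOLL_DUMP set, an untouched region without a ->fault handler yields -EFAULT, so handle_mm_fault() is skipped. A region with a ->fault handler, or any caller that does not pass FOLL_DUMP, still gets NULL and goes on to fault the page in.
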
> @@ -1288,8 +1269,8 @@ int __get_user_pages(struct task_struct
>  		foll_flags = FOLL_TOUCH;
>  		if (pages)
>  			foll_flags |= FOLL_GET;
> -		if (!write && use_zero_page(vma))
> -			foll_flags |= FOLL_ANON;
> +		if (flags & GUP_FLAGS_DUMP)
> +			foll_flags |= FOLL_DUMP;
>  
>  		do {
>  			struct page *page;
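
This hunk replaces the use_zero_page() guess with an explicit request from the caller. Here is a minimal model of that translation. It assumes only the flag values shown in this patch. model_foll_flags() is a hypothetical name, and FOLL_WRITE handling, which lies outside the quoted hunk, is left out.

```c
#include <assert.h>

/* Flag values as defined in this patch (mm/internal.h, include/linux/mm.h). */
#define GUP_FLAGS_WRITE		0x01
#define GUP_FLAGS_FORCE		0x02
#define GUP_FLAGS_DUMP		0x04

#define FOLL_WRITE	0x01
#define FOLL_TOUCH	0x02
#define FOLL_GET	0x04
#define FOLL_DUMP	0x08

/*
 * Hypothetical model of how __get_user_pages() builds foll_flags for
 * follow_page() after this patch: the dump intent arrives explicitly
 * in gup_flags instead of being inferred from the vma.
 */
static unsigned int model_foll_flags(int gup_flags, int want_pages)
{
	unsigned int foll_flags = FOLL_TOUCH;

	if (want_pages)			/* caller passed a pages array */
		foll_flags |= FOLL_GET;
	if (gup_flags & GUP_FLAGS_DUMP)	/* set by get_dump_page() in this patch */
		foll_flags |= FOLL_DUMP;
	return foll_flags;
}
```

The GUP and FOLL namespaces are distinct: GUP_FLAGS_DUMP is 0x04, the same value as FOLL_GET. The caller's flags therefore cannot be copied through unchanged, and each bit has to be translated, as the hunk does.
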
> @@ -1447,7 +1428,7 @@ struct page *get_dump_page(unsigned long
>  	struct page *page;
>  
>  	if (__get_user_pages(current, current->mm, addr, 1,
> -				GUP_FLAGS_FORCE, &page, &vma) < 1)
> +			GUP_FLAGS_FORCE | GUP_FLAGS_DUMP, &page, &vma) < 1)
>  		return NULL;
>  	if (page == ZERO_PAGE(0)) {
>  		page_cache_release(page);
> 
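
On the consumer side, a NULL from get_dump_page() is meant to leave a hole in the dump. The sketch below assumes the core-dump writer skips past such a page rather than writing it, as it would in a sparse file. It uses tiny 8-byte "pages" and hypothetical names (model_dump(), MODEL_PAGE_SIZE). A NULL entry in the input array stands in for get_dump_page() returning NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 8	/* tiny "pages" to keep the example readable */

/*
 * Hypothetical model of a core-dump writer.  pages[i] is what
 * get_dump_page() returned for page i; NULL means "leave a hole".
 * out must hold nr_pages * MODEL_PAGE_SIZE bytes and starts zeroed,
 * standing in for a sparse file whose unwritten ranges read as zero.
 * Returns the number of pages actually copied.
 */
static size_t model_dump(char *out, const char *const *pages, size_t nr_pages)
{
	size_t written = 0;

	memset(out, 0, nr_pages * MODEL_PAGE_SIZE);
	for (size_t i = 0; i < nr_pages; i++) {
		if (!pages[i])
			continue;	/* hole: skip past it, write nothing */
		memcpy(out + i * MODEL_PAGE_SIZE, pages[i], MODEL_PAGE_SIZE);
		written++;
	}
	return written;
}
```

The zeroed output plays the role of the sparse file, so the skipped page reads back as zeroes. For that reason, the optimization is only safe where a fault would have produced a zero-filled page anyway.
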

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
