All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <npiggin@suse.de>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-arch@vger.kernel.org
Subject: Re: [PATCH 7/8] mm: reinstate ZERO_PAGE
Date: Tue, 8 Sep 2009 09:31:19 +0200	[thread overview]
Message-ID: <20090908073119.GA29902@wotan.suse.de> (raw)
In-Reply-To: <Pine.LNX.4.64.0909072238320.15430@sister.anvils>

On Mon, Sep 07, 2009 at 10:39:34PM +0100, Hugh Dickins wrote:
> KAMEZAWA Hiroyuki has observed customers of earlier kernels taking
> advantage of the ZERO_PAGE: which we stopped do_anonymous_page() from
> using in 2.6.24.  And there were a couple of regression reports on LKML.
> 
> Following suggestions from Linus, reinstate do_anonymous_page() use of
> the ZERO_PAGE; but this time avoid dirtying its struct page cacheline
> with (map)count updates - let vm_normal_page() regard it as abnormal.
> 
> Use it only on arches which __HAVE_ARCH_PTE_SPECIAL (x86, s390, sh32,
> most powerpc): that's not essential, but minimizes additional branches
> (keeping them in the unlikely pte_special case); and incidentally
> excludes mips (some models of which needed eight colours of ZERO_PAGE
> to avoid costly exceptions).

Without looking closely, why is it a big problem to have a
!HAVE PTE SPECIAL case? Couldn't it just be a check for
pfn == zero_pfn that is conditionally compiled away for pte
special architectures anyway?

If zero page is such a good idea, I don't see the logic of
limiting it like thisa. Your patch looks pretty clean though.

At any rate, I think it might be an idea to cc linux-arch. 

> 
> Don't be fanatical about avoiding ZERO_PAGE updates: get_user_pages()
> callers won't want to make exceptions for it, so increment its count
> there.  Changes to mlock and migration? happily seems not needed.
> 
> In most places it's quicker to check pfn than struct page address:
> prepare a __read_mostly zero_pfn for that.  Does get_dump_page()
> still need its ZERO_PAGE check? probably not, but keep it anyway.
> 
> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> ---
> I have not studied the performance of this at all: I'd rather it go
> into mmotm where others may decide whether it's a good thing or not.
> 
>  mm/memory.c |   53 +++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 44 insertions(+), 9 deletions(-)
> 
> --- mm6/mm/memory.c	2009-09-07 13:16:53.000000000 +0100
> +++ mm7/mm/memory.c	2009-09-07 13:17:01.000000000 +0100
> @@ -107,6 +107,17 @@ static int __init disable_randmaps(char
>  }
>  __setup("norandmaps", disable_randmaps);
>  
> +static unsigned long zero_pfn __read_mostly;
> +
> +/*
> + * CONFIG_MMU architectures set up ZERO_PAGE in their paging_init()
> + */
> +static int __init init_zero_pfn(void)
> +{
> +	zero_pfn = page_to_pfn(ZERO_PAGE(0));
> +	return 0;
> +}
> +core_initcall(init_zero_pfn);
>  
>  /*
>   * If a p?d_bad entry is found while walking page tables, report
> @@ -499,7 +510,9 @@ struct page *vm_normal_page(struct vm_ar
>  	if (HAVE_PTE_SPECIAL) {
>  		if (likely(!pte_special(pte)))
>  			goto check_pfn;
> -		if (!(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)))
> +		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> +			return NULL;
> +		if (pfn != zero_pfn)
>  			print_bad_pte(vma, addr, pte, NULL);
>  		return NULL;
>  	}
> @@ -1144,9 +1157,14 @@ struct page *follow_page(struct vm_area_
>  		goto no_page;
>  	if ((flags & FOLL_WRITE) && !pte_write(pte))
>  		goto unlock;
> +
>  	page = vm_normal_page(vma, address, pte);
> -	if (unlikely(!page))
> -		goto bad_page;
> +	if (unlikely(!page)) {
> +		if ((flags & FOLL_DUMP) ||
> +		    pte_pfn(pte) != zero_pfn)
> +			goto bad_page;
> +		page = pte_page(pte);
> +	}
>  
>  	if (flags & FOLL_GET)
>  		get_page(page);
> @@ -2085,10 +2103,19 @@ gotten:
>  
>  	if (unlikely(anon_vma_prepare(vma)))
>  		goto oom;
> -	VM_BUG_ON(old_page == ZERO_PAGE(0));
> -	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
> -	if (!new_page)
> -		goto oom;
> +
> +	if (pte_pfn(orig_pte) == zero_pfn) {
> +		new_page = alloc_zeroed_user_highpage_movable(vma, address);
> +		if (!new_page)
> +			goto oom;
> +	} else {
> +		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
> +		if (!new_page)
> +			goto oom;
> +		cow_user_page(new_page, old_page, address, vma);
> +	}
> +	__SetPageUptodate(new_page);
> +
>  	/*
>  	 * Don't let another task, with possibly unlocked vma,
>  	 * keep the mlocked page.
> @@ -2098,8 +2125,6 @@ gotten:
>  		clear_page_mlock(old_page);
>  		unlock_page(old_page);
>  	}
> -	cow_user_page(new_page, old_page, address, vma);
> -	__SetPageUptodate(new_page);
>  
>  	if (mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))
>  		goto oom_free_new;
> @@ -2594,6 +2619,15 @@ static int do_anonymous_page(struct mm_s
>  	spinlock_t *ptl;
>  	pte_t entry;
>  
> +	if (HAVE_PTE_SPECIAL && !(flags & FAULT_FLAG_WRITE)) {
> +		entry = pte_mkspecial(pfn_pte(zero_pfn, vma->vm_page_prot));
> +		ptl = pte_lockptr(mm, pmd);
> +		spin_lock(ptl);
> +		if (!pte_none(*page_table))
> +			goto unlock;
> +		goto setpte;
> +	}
> +
>  	/* Allocate our own private page. */
>  	pte_unmap(page_table);
>  
> @@ -2617,6 +2651,7 @@ static int do_anonymous_page(struct mm_s
>  
>  	inc_mm_counter(mm, anon_rss);
>  	page_add_new_anon_rmap(page, vma, address);
> +setpte:
>  	set_pte_at(mm, address, page_table, entry);
>  
>  	/* No need to invalidate - it was non-present before */

WARNING: multiple messages have this Message-ID (diff)
From: Nick Piggin <npiggin@suse.de>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-arch@vger.kernel.org
Subject: Re: [PATCH 7/8] mm: reinstate ZERO_PAGE
Date: Tue, 8 Sep 2009 09:31:19 +0200	[thread overview]
Message-ID: <20090908073119.GA29902@wotan.suse.de> (raw)
In-Reply-To: <Pine.LNX.4.64.0909072238320.15430@sister.anvils>

On Mon, Sep 07, 2009 at 10:39:34PM +0100, Hugh Dickins wrote:
> KAMEZAWA Hiroyuki has observed customers of earlier kernels taking
> advantage of the ZERO_PAGE: which we stopped do_anonymous_page() from
> using in 2.6.24.  And there were a couple of regression reports on LKML.
> 
> Following suggestions from Linus, reinstate do_anonymous_page() use of
> the ZERO_PAGE; but this time avoid dirtying its struct page cacheline
> with (map)count updates - let vm_normal_page() regard it as abnormal.
> 
> Use it only on arches which __HAVE_ARCH_PTE_SPECIAL (x86, s390, sh32,
> most powerpc): that's not essential, but minimizes additional branches
> (keeping them in the unlikely pte_special case); and incidentally
> excludes mips (some models of which needed eight colours of ZERO_PAGE
> to avoid costly exceptions).

Without looking closely, why is it a big problem to have a
!HAVE PTE SPECIAL case? Couldn't it just be a check for
pfn == zero_pfn that is conditionally compiled away for pte
special architectures anyway?

If zero page is such a good idea, I don't see the logic of
limiting it like thisa. Your patch looks pretty clean though.

At any rate, I think it might be an idea to cc linux-arch. 

> 
> Don't be fanatical about avoiding ZERO_PAGE updates: get_user_pages()
> callers won't want to make exceptions for it, so increment its count
> there.  Changes to mlock and migration? happily seems not needed.
> 
> In most places it's quicker to check pfn than struct page address:
> prepare a __read_mostly zero_pfn for that.  Does get_dump_page()
> still need its ZERO_PAGE check? probably not, but keep it anyway.
> 
> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> ---
> I have not studied the performance of this at all: I'd rather it go
> into mmotm where others may decide whether it's a good thing or not.
> 
>  mm/memory.c |   53 +++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 44 insertions(+), 9 deletions(-)
> 
> --- mm6/mm/memory.c	2009-09-07 13:16:53.000000000 +0100
> +++ mm7/mm/memory.c	2009-09-07 13:17:01.000000000 +0100
> @@ -107,6 +107,17 @@ static int __init disable_randmaps(char
>  }
>  __setup("norandmaps", disable_randmaps);
>  
> +static unsigned long zero_pfn __read_mostly;
> +
> +/*
> + * CONFIG_MMU architectures set up ZERO_PAGE in their paging_init()
> + */
> +static int __init init_zero_pfn(void)
> +{
> +	zero_pfn = page_to_pfn(ZERO_PAGE(0));
> +	return 0;
> +}
> +core_initcall(init_zero_pfn);
>  
>  /*
>   * If a p?d_bad entry is found while walking page tables, report
> @@ -499,7 +510,9 @@ struct page *vm_normal_page(struct vm_ar
>  	if (HAVE_PTE_SPECIAL) {
>  		if (likely(!pte_special(pte)))
>  			goto check_pfn;
> -		if (!(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)))
> +		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> +			return NULL;
> +		if (pfn != zero_pfn)
>  			print_bad_pte(vma, addr, pte, NULL);
>  		return NULL;
>  	}
> @@ -1144,9 +1157,14 @@ struct page *follow_page(struct vm_area_
>  		goto no_page;
>  	if ((flags & FOLL_WRITE) && !pte_write(pte))
>  		goto unlock;
> +
>  	page = vm_normal_page(vma, address, pte);
> -	if (unlikely(!page))
> -		goto bad_page;
> +	if (unlikely(!page)) {
> +		if ((flags & FOLL_DUMP) ||
> +		    pte_pfn(pte) != zero_pfn)
> +			goto bad_page;
> +		page = pte_page(pte);
> +	}
>  
>  	if (flags & FOLL_GET)
>  		get_page(page);
> @@ -2085,10 +2103,19 @@ gotten:
>  
>  	if (unlikely(anon_vma_prepare(vma)))
>  		goto oom;
> -	VM_BUG_ON(old_page == ZERO_PAGE(0));
> -	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
> -	if (!new_page)
> -		goto oom;
> +
> +	if (pte_pfn(orig_pte) == zero_pfn) {
> +		new_page = alloc_zeroed_user_highpage_movable(vma, address);
> +		if (!new_page)
> +			goto oom;
> +	} else {
> +		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
> +		if (!new_page)
> +			goto oom;
> +		cow_user_page(new_page, old_page, address, vma);
> +	}
> +	__SetPageUptodate(new_page);
> +
>  	/*
>  	 * Don't let another task, with possibly unlocked vma,
>  	 * keep the mlocked page.
> @@ -2098,8 +2125,6 @@ gotten:
>  		clear_page_mlock(old_page);
>  		unlock_page(old_page);
>  	}
> -	cow_user_page(new_page, old_page, address, vma);
> -	__SetPageUptodate(new_page);
>  
>  	if (mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))
>  		goto oom_free_new;
> @@ -2594,6 +2619,15 @@ static int do_anonymous_page(struct mm_s
>  	spinlock_t *ptl;
>  	pte_t entry;
>  
> +	if (HAVE_PTE_SPECIAL && !(flags & FAULT_FLAG_WRITE)) {
> +		entry = pte_mkspecial(pfn_pte(zero_pfn, vma->vm_page_prot));
> +		ptl = pte_lockptr(mm, pmd);
> +		spin_lock(ptl);
> +		if (!pte_none(*page_table))
> +			goto unlock;
> +		goto setpte;
> +	}
> +
>  	/* Allocate our own private page. */
>  	pte_unmap(page_table);
>  
> @@ -2617,6 +2651,7 @@ static int do_anonymous_page(struct mm_s
>  
>  	inc_mm_counter(mm, anon_rss);
>  	page_add_new_anon_rmap(page, vma, address);
> +setpte:
>  	set_pte_at(mm, address, page_table, entry);
>  
>  	/* No need to invalidate - it was non-present before */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2009-09-08  7:31 UTC|newest]

Thread overview: 105+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-07 21:26 [PATCH 0/8] mm: around get_user_pages flags Hugh Dickins
2009-09-07 21:26 ` Hugh Dickins
2009-09-07 21:29 ` [PATCH 1/8] mm: munlock use follow_page Hugh Dickins
2009-09-07 21:29   ` Hugh Dickins
2009-09-08  2:58   ` KAMEZAWA Hiroyuki
2009-09-08  2:58     ` KAMEZAWA Hiroyuki
2009-09-08 11:30     ` Hugh Dickins
2009-09-08 11:30       ` Hugh Dickins
2009-09-08 17:10   ` Rik van Riel
2009-09-08 17:10     ` Rik van Riel
2009-09-09 15:59   ` Minchan Kim
2009-09-09 15:59     ` Minchan Kim
2009-09-11 11:07   ` Hiroaki Wakabayashi
2009-09-11 11:07     ` Hiroaki Wakabayashi
2009-09-07 21:31 ` [PATCH 2/8] mm: remove unused GUP flags Hugh Dickins
2009-09-07 21:31   ` Hugh Dickins
2009-09-08 17:27   ` Rik van Riel
2009-09-08 17:27     ` Rik van Riel
2009-09-07 21:33 ` [PATCH 3/8] mm: add get_dump_page Hugh Dickins
2009-09-07 21:33   ` Hugh Dickins
2009-09-08 18:57   ` Rik van Riel
2009-09-08 18:57     ` Rik van Riel
2009-09-07 21:35 ` [PATCH 4/8] mm: FOLL_DUMP replace FOLL_ANON Hugh Dickins
2009-09-07 21:35   ` Hugh Dickins
2009-09-08 18:59   ` Rik van Riel
2009-09-08 18:59     ` Rik van Riel
2009-09-09 11:14   ` Mel Gorman
2009-09-09 11:14     ` Mel Gorman
2009-09-09 16:16   ` Minchan Kim
2009-09-09 16:16     ` Minchan Kim
2009-09-13 15:46     ` Hugh Dickins
2009-09-13 23:05       ` Minchan Kim
2009-09-13 23:05         ` Minchan Kim
2009-09-07 21:37 ` [PATCH 5/8] mm: follow_hugetlb_page flags Hugh Dickins
2009-09-07 21:37   ` Hugh Dickins
2009-09-08 22:21   ` Rik van Riel
2009-09-08 22:21     ` Rik van Riel
2009-09-09 11:31   ` Mel Gorman
2009-09-09 11:31     ` Mel Gorman
2009-09-13 15:35     ` Hugh Dickins
2009-09-13 15:35       ` Hugh Dickins
2009-09-14 13:27       ` Mel Gorman
2009-09-14 13:27         ` Mel Gorman
2009-09-15 20:26         ` Hugh Dickins
2009-09-15 20:26           ` Hugh Dickins
2009-09-07 21:38 ` [PATCH 6/8] mm: fix anonymous dirtying Hugh Dickins
2009-09-07 21:38   ` Hugh Dickins
2009-09-08 22:23   ` Rik van Riel
2009-09-08 22:23     ` Rik van Riel
2009-09-07 21:39 ` [PATCH 7/8] mm: reinstate ZERO_PAGE Hugh Dickins
2009-09-07 21:39   ` Hugh Dickins
2009-09-08  2:37   ` KAMEZAWA Hiroyuki
2009-09-08  2:37     ` KAMEZAWA Hiroyuki
2009-09-08 11:56     ` Hugh Dickins
2009-09-08 11:56       ` Hugh Dickins
2009-09-09  1:44       ` KAMEZAWA Hiroyuki
2009-09-09  1:44         ` KAMEZAWA Hiroyuki
2009-09-15 20:15         ` Hugh Dickins
2009-09-15 20:15           ` Hugh Dickins
2009-09-08  7:31   ` Nick Piggin [this message]
2009-09-08  7:31     ` Nick Piggin
2009-09-08 12:17     ` Hugh Dickins
2009-09-08 12:17       ` Hugh Dickins
2009-09-08 15:34       ` Nick Piggin
2009-09-08 15:34         ` Nick Piggin
2009-09-08 16:40         ` Hugh Dickins
2009-09-08 16:40           ` Hugh Dickins
2009-09-08 14:13     ` Linus Torvalds
2009-09-08 14:13       ` Linus Torvalds
2009-09-08 23:35   ` Rik van Riel
2009-09-08 23:35     ` Rik van Riel
2009-09-07 21:40 ` [PATCH 8/8] mm: FOLL flags for GUP flags Hugh Dickins
2009-09-07 21:40   ` Hugh Dickins
2009-09-07 23:51 ` [PATCH 0/8] mm: around get_user_pages flags Linus Torvalds
2009-09-07 23:51   ` Linus Torvalds
2009-09-07 23:52 ` KAMEZAWA Hiroyuki
2009-09-07 23:52   ` KAMEZAWA Hiroyuki
2009-09-08  0:00 ` KOSAKI Motohiro
2009-09-08  0:00   ` KOSAKI Motohiro
2009-09-10  0:33   ` KOSAKI Motohiro
2009-09-10  0:33     ` KOSAKI Motohiro
2009-09-15 20:16     ` Hugh Dickins
2009-09-15 20:16       ` Hugh Dickins
2009-09-15 20:30 ` [PATCH 0/4] mm: mlock, hugetlb, zero followups Hugh Dickins
2009-09-15 20:30   ` Hugh Dickins
2009-09-15 20:31   ` [PATCH 1/4] mm: m(un)lock avoid ZERO_PAGE Hugh Dickins
2009-09-15 20:31     ` Hugh Dickins
2009-09-16  0:08     ` KOSAKI Motohiro
2009-09-16  0:08       ` KOSAKI Motohiro
2009-09-16  9:35     ` Mel Gorman
2009-09-16  9:35       ` Mel Gorman
2009-09-16 11:40       ` Hugh Dickins
2009-09-16 11:40         ` Hugh Dickins
2009-09-16 12:47         ` Mel Gorman
2009-09-16 12:47           ` Mel Gorman
2009-09-15 20:33   ` [PATCH 2/4] mm: hugetlbfs_pagecache_present Hugh Dickins
2009-09-15 20:33     ` Hugh Dickins
2009-09-15 20:37   ` [PATCH 3/4] mm: ZERO_PAGE without PTE_SPECIAL Hugh Dickins
2009-09-15 20:37     ` Hugh Dickins
2009-09-16  6:20     ` KAMEZAWA Hiroyuki
2009-09-16  6:20       ` KAMEZAWA Hiroyuki
2009-09-15 20:38   ` [PATCH 4/4] mm: move highest_memmap_pfn Hugh Dickins
2009-09-15 20:38     ` Hugh Dickins
2009-09-17  0:33   ` [PATCH 0/4] mm: mlock, hugetlb, zero followups KOSAKI Motohiro
2009-09-17  0:33     ` KOSAKI Motohiro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090908073119.GA29902@wotan.suse.de \
    --to=npiggin@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=hugh.dickins@tiscali.co.uk \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.