Re: [PATCH 7/8] mm: reinstate ZERO_PAGE

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Nick Piggin <npiggin@suse.de>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-arch@vger.kernel.org
Subject: Re: [PATCH 7/8] mm: reinstate ZERO_PAGE
Date: Tue, 8 Sep 2009 09:31:19 +0200	[thread overview]
Message-ID: <20090908073119.GA29902@wotan.suse.de> (raw)
In-Reply-To: <Pine.LNX.4.64.0909072238320.15430@sister.anvils>

On Mon, Sep 07, 2009 at 10:39:34PM +0100, Hugh Dickins wrote:
> KAMEZAWA Hiroyuki has observed customers of earlier kernels taking
> advantage of the ZERO_PAGE: which we stopped do_anonymous_page() from
> using in 2.6.24.  And there were a couple of regression reports on LKML.
> 
> Following suggestions from Linus, reinstate do_anonymous_page() use of
> the ZERO_PAGE; but this time avoid dirtying its struct page cacheline
> with (map)count updates - let vm_normal_page() regard it as abnormal.
> 
> Use it only on arches which __HAVE_ARCH_PTE_SPECIAL (x86, s390, sh32,
> most powerpc): that's not essential, but minimizes additional branches
> (keeping them in the unlikely pte_special case); and incidentally
> excludes mips (some models of which needed eight colours of ZERO_PAGE
> to avoid costly exceptions).

Without looking closely, why is it a big problem to have a
!HAVE PTE SPECIAL case? Couldn't it just be a check for
pfn == zero_pfn that is conditionally compiled away for pte
special architectures anyway?

If zero page is such a good idea, I don't see the logic of
limiting it like thisa. Your patch looks pretty clean though.

At any rate, I think it might be an idea to cc linux-arch. 

> 
> Don't be fanatical about avoiding ZERO_PAGE updates: get_user_pages()
> callers won't want to make exceptions for it, so increment its count
> there.  Changes to mlock and migration? happily seems not needed.
> 
> In most places it's quicker to check pfn than struct page address:
> prepare a __read_mostly zero_pfn for that.  Does get_dump_page()
> still need its ZERO_PAGE check? probably not, but keep it anyway.
> 
> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> ---
> I have not studied the performance of this at all: I'd rather it go
> into mmotm where others may decide whether it's a good thing or not.
> 
>  mm/memory.c |   53 +++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 44 insertions(+), 9 deletions(-)
> 
> --- mm6/mm/memory.c	2009-09-07 13:16:53.000000000 +0100
> +++ mm7/mm/memory.c	2009-09-07 13:17:01.000000000 +0100
> @@ -107,6 +107,17 @@ static int __init disable_randmaps(char
>  }
>  __setup("norandmaps", disable_randmaps);
>  
> +static unsigned long zero_pfn __read_mostly;
> +
> +/*
> + * CONFIG_MMU architectures set up ZERO_PAGE in their paging_init()
> + */
> +static int __init init_zero_pfn(void)
> +{
> +	zero_pfn = page_to_pfn(ZERO_PAGE(0));
> +	return 0;
> +}
> +core_initcall(init_zero_pfn);
>  
>  /*
>   * If a p?d_bad entry is found while walking page tables, report
> @@ -499,7 +510,9 @@ struct page *vm_normal_page(struct vm_ar
>  	if (HAVE_PTE_SPECIAL) {
>  		if (likely(!pte_special(pte)))
>  			goto check_pfn;
> -		if (!(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)))
> +		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> +			return NULL;
> +		if (pfn != zero_pfn)
>  			print_bad_pte(vma, addr, pte, NULL);
>  		return NULL;
>  	}
> @@ -1144,9 +1157,14 @@ struct page *follow_page(struct vm_area_
>  		goto no_page;
>  	if ((flags & FOLL_WRITE) && !pte_write(pte))
>  		goto unlock;
> +
>  	page = vm_normal_page(vma, address, pte);
> -	if (unlikely(!page))
> -		goto bad_page;
> +	if (unlikely(!page)) {
> +		if ((flags & FOLL_DUMP) ||
> +		    pte_pfn(pte) != zero_pfn)
> +			goto bad_page;
> +		page = pte_page(pte);
> +	}
>  
>  	if (flags & FOLL_GET)
>  		get_page(page);
> @@ -2085,10 +2103,19 @@ gotten:
>  
>  	if (unlikely(anon_vma_prepare(vma)))
>  		goto oom;
> -	VM_BUG_ON(old_page == ZERO_PAGE(0));
> -	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
> -	if (!new_page)
> -		goto oom;
> +
> +	if (pte_pfn(orig_pte) == zero_pfn) {
> +		new_page = alloc_zeroed_user_highpage_movable(vma, address);
> +		if (!new_page)
> +			goto oom;
> +	} else {
> +		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
> +		if (!new_page)
> +			goto oom;
> +		cow_user_page(new_page, old_page, address, vma);
> +	}
> +	__SetPageUptodate(new_page);
> +
>  	/*
>  	 * Don't let another task, with possibly unlocked vma,
>  	 * keep the mlocked page.
> @@ -2098,8 +2125,6 @@ gotten:
>  		clear_page_mlock(old_page);
>  		unlock_page(old_page);
>  	}
> -	cow_user_page(new_page, old_page, address, vma);
> -	__SetPageUptodate(new_page);
>  
>  	if (mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))
>  		goto oom_free_new;
> @@ -2594,6 +2619,15 @@ static int do_anonymous_page(struct mm_s
>  	spinlock_t *ptl;
>  	pte_t entry;
>  
> +	if (HAVE_PTE_SPECIAL && !(flags & FAULT_FLAG_WRITE)) {
> +		entry = pte_mkspecial(pfn_pte(zero_pfn, vma->vm_page_prot));
> +		ptl = pte_lockptr(mm, pmd);
> +		spin_lock(ptl);
> +		if (!pte_none(*page_table))
> +			goto unlock;
> +		goto setpte;
> +	}
> +
>  	/* Allocate our own private page. */
>  	pte_unmap(page_table);
>  
> @@ -2617,6 +2651,7 @@ static int do_anonymous_page(struct mm_s
>  
>  	inc_mm_counter(mm, anon_rss);
>  	page_add_new_anon_rmap(page, vma, address);
> +setpte:
>  	set_pte_at(mm, address, page_table, entry);
>  
>  	/* No need to invalidate - it was non-present before */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2009-09-08  7:31 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-07 21:26 [PATCH 0/8] mm: around get_user_pages flags Hugh Dickins
2009-09-07 21:29 ` [PATCH 1/8] mm: munlock use follow_page Hugh Dickins
2009-09-08  2:58   ` KAMEZAWA Hiroyuki
2009-09-08 11:30     ` Hugh Dickins
2009-09-08 17:10   ` Rik van Riel
2009-09-09 15:59   ` Minchan Kim
2009-09-11 11:07   ` Hiroaki Wakabayashi
2009-09-07 21:31 ` [PATCH 2/8] mm: remove unused GUP flags Hugh Dickins
2009-09-08 17:27   ` Rik van Riel
2009-09-07 21:33 ` [PATCH 3/8] mm: add get_dump_page Hugh Dickins
2009-09-08 18:57   ` Rik van Riel
2009-09-07 21:35 ` [PATCH 4/8] mm: FOLL_DUMP replace FOLL_ANON Hugh Dickins
2009-09-08 18:59   ` Rik van Riel
2009-09-09 11:14   ` Mel Gorman
2009-09-09 16:16   ` Minchan Kim
2009-09-13 15:46     ` Hugh Dickins
2009-09-13 23:05       ` Minchan Kim
2009-09-07 21:37 ` [PATCH 5/8] mm: follow_hugetlb_page flags Hugh Dickins
2009-09-08 22:21   ` Rik van Riel
2009-09-09 11:31   ` Mel Gorman
2009-09-13 15:35     ` Hugh Dickins
2009-09-14 13:27       ` Mel Gorman
2009-09-15 20:26         ` Hugh Dickins
2009-09-07 21:38 ` [PATCH 6/8] mm: fix anonymous dirtying Hugh Dickins
2009-09-08 22:23   ` Rik van Riel
2009-09-07 21:39 ` [PATCH 7/8] mm: reinstate ZERO_PAGE Hugh Dickins
2009-09-08  2:37   ` KAMEZAWA Hiroyuki
2009-09-08 11:56     ` Hugh Dickins
2009-09-09  1:44       ` KAMEZAWA Hiroyuki
2009-09-15 20:15         ` Hugh Dickins
2009-09-08  7:31   ` Nick Piggin [this message]
2009-09-08 12:17     ` Hugh Dickins
2009-09-08 15:34       ` Nick Piggin
2009-09-08 16:40         ` Hugh Dickins
2009-09-08 14:13     ` Linus Torvalds
2009-09-08 23:35   ` Rik van Riel
2009-09-07 21:40 ` [PATCH 8/8] mm: FOLL flags for GUP flags Hugh Dickins
2009-09-07 23:51 ` [PATCH 0/8] mm: around get_user_pages flags Linus Torvalds
2009-09-07 23:52 ` KAMEZAWA Hiroyuki
2009-09-08  0:00 ` KOSAKI Motohiro
2009-09-10  0:33   ` KOSAKI Motohiro
2009-09-15 20:16     ` Hugh Dickins
2009-09-15 20:30 ` [PATCH 0/4] mm: mlock, hugetlb, zero followups Hugh Dickins
2009-09-15 20:31   ` [PATCH 1/4] mm: m(un)lock avoid ZERO_PAGE Hugh Dickins
2009-09-16  0:08     ` KOSAKI Motohiro
2009-09-16  9:35     ` Mel Gorman
2009-09-16 11:40       ` Hugh Dickins
2009-09-16 12:47         ` Mel Gorman
2009-09-15 20:33   ` [PATCH 2/4] mm: hugetlbfs_pagecache_present Hugh Dickins
2009-09-15 20:37   ` [PATCH 3/4] mm: ZERO_PAGE without PTE_SPECIAL Hugh Dickins
2009-09-16  6:20     ` KAMEZAWA Hiroyuki
2009-09-15 20:38   ` [PATCH 4/4] mm: move highest_memmap_pfn Hugh Dickins
2009-09-17  0:33   ` [PATCH 0/4] mm: mlock, hugetlb, zero followups KOSAKI Motohiro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090908073119.GA29902@wotan.suse.de \
    --to=npiggin@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=hugh.dickins@tiscali.co.uk \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).