All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Andrea Arcangeli <aarcange@redhat.com>, Mel Gorman <mgorman@suse.de>
Cc: Sasha Levin <sasha.levin@oracle.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	shli@kernel.org, Hugh Dickins <hughd@google.com>,
	Rik van Riel <riel@redhat.com>, Shaohua Li <shli@fusionio.com>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: [patch 019/154] mm: make madvise(MADV_WILLNEED) support swap file prefetch
Date: Fri, 20 Dec 2013 15:36:19 +0200 (EET)	[thread overview]
Message-ID: <20131220133619.4980AE0090@blue.fi.intel.com> (raw)
In-Reply-To: <20131220131003.93C9AE0090@blue.fi.intel.com>

Kirill A. Shutemov wrote:
> Andrea Arcangeli wrote:
> > Hi,
> > 
> > On Mon, Dec 16, 2013 at 10:18:39AM -0500, Sasha Levin wrote:
> > > On 12/16/2013 07:47 AM, Kirill A. Shutemov wrote:
> > > > I probably miss some context here. Do you have crash on some use-case or
> > > > what? Could you point me to start of discussion.
> > > 
> > > Yes, Sorry, here's the crash that started this discussion originally:
> > > 
> > > The code points to:
> > > 
> > 
> > At this point pmd_none_or_trans_huge_or_clear_bad guaranteed us the
> > pmd points to a regular pte.
> 
> It took too long, but I finally found a way to reproduce the bug easily:
> 
> 	#define _GNU_SOURCE
> 	#include <sys/mman.h>
> 
> 	#define MB (1024 * 1024)
> 
> 	int main(int argc, char **argv)
> 	{
> 		void *p;
> 
> 		p = mmap(0, 10 * MB, PROT_READ,
> 				MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
> 				-1, 0);
> 		mprotect(p, 10 * MB, PROT_NONE);
> 		madvise(p, 10 * MB, MADV_WILLNEED);
> 		return 0;
> 	}
> 
> And I track it down to pmd_none_or_trans_huge_or_clear_bad().
> 
> It seems it doesn't guarantee to return 1 for pmd_trans_huge() page and I
> don't know how it suppose to do this for non-bad page.
> 
> I've fixed this with patch below.
> 
> Andrea, do I miss something important here or
> pmd_none_or_trans_huge_or_clear_bad() is broken from day 1?

[ resend with fixed mail headers. ]

Oh.. It seems it cased by change pmd_bad() behaviour by commit be3a728427a6, so
pmd_none_or_trans_huge_or_clear_bad() misses THP pmds if they are pmd_numa().

Other way to get it work is below. I'm not sure which is more correct (if any).

Mel? Andrea?

> 
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index f330d28e4d0e..0694c9bf2a34 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -599,7 +599,7 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  	barrier();
>  #endif
> -	if (pmd_none(pmdval))
> +	if (pmd_none(pmdval) || pmd_trans_huge(pmdval))
>  		return 1;
>  	if (unlikely(pmd_bad(pmdval))) {
>  		if (!pmd_trans_huge(pmdval))


diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f330d28e4d0e..1f8bc7881bdb 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -558,6 +558,14 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 }
 #endif
 
+#ifndef pmd_numa
+static inline int pmd_numa(pmd_t pmd)
+{
+	return (pmd_flags(pmd) &
+		(_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
+}
+#endif
+
 /*
  * This function is meant to be used by sites walking pagetables with
  * the mmap_sem hold in read mode to protect against MADV_DONTNEED and
@@ -601,7 +609,7 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
 #endif
 	if (pmd_none(pmdval))
 		return 1;
-	if (unlikely(pmd_bad(pmdval))) {
+	if (unlikely(pmd_bad(pmdval) || pmd_numa(pmdval))) {
 		if (!pmd_trans_huge(pmdval))
 			pmd_clear_bad(pmd);
 		return 1;
@@ -650,14 +658,6 @@ static inline int pte_numa(pte_t pte)
 }
 #endif
 
-#ifndef pmd_numa
-static inline int pmd_numa(pmd_t pmd)
-{
-	return (pmd_flags(pmd) &
-		(_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
-}
-#endif
-
 /*
  * pte/pmd_mknuma sets the _PAGE_ACCESSED bitflag automatically
  * because they're called by the NUMA hinting minor page fault. If we
-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2013-12-20 13:36 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20130223003232.4CDDB5A41B6@corp2gmr1-2.hot.corp.google.com>
     [not found] ` <52AA0613.2000908@oracle.com>
     [not found]   ` <CA+55aFw3_0_Et9bbfWgGLXEUaGQW1HE8j=oGBqFG_8j+h6jmvQ@mail.gmail.com>
     [not found]     ` <CA+55aFyRZW=Uy9w+bZR0vMOFNPqV-yW2Xs9N42qEwTQ3AY0fDw@mail.gmail.com>
     [not found]       ` <52AE271C.4040805@oracle.com>
2013-12-15 22:58         ` [patch 019/154] mm: make madvise(MADV_WILLNEED) support swap file prefetch Linus Torvalds
2013-12-16 12:47           ` Kirill A. Shutemov
2013-12-16 15:18             ` Sasha Levin
2013-12-16 20:52               ` Andrea Arcangeli
2013-12-17  0:18                 ` Sasha Levin
2013-12-17 12:44                   ` Kirill A. Shutemov
2013-12-17 14:09                     ` Sasha Levin
2013-12-20 13:10                 ` Kirill A. Shutemov
2013-12-20 13:31                   ` Kirill A. Shutemov
2013-12-20 13:36                   ` Kirill A. Shutemov [this message]
2013-12-20 17:42                     ` Andrea Arcangeli
2013-12-23 10:25                       ` Mel Gorman
2013-12-23 10:54                         ` Kirill A. Shutemov
2013-12-23 11:15                           ` Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131220133619.4980AE0090@blue.fi.intel.com \
    --to=kirill.shutemov@linux.intel.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=riel@redhat.com \
    --cc=sasha.levin@oracle.com \
    --cc=shli@fusionio.com \
    --cc=shli@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.