From: Mel Gorman <mgorman@suse.de>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Sasha Levin <sasha.levin@oracle.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
shli@kernel.org, Hugh Dickins <hughd@google.com>,
Rik van Riel <riel@redhat.com>, Shaohua Li <shli@fusionio.com>,
linux-mm <linux-mm@kvack.org>
Subject: Re: [patch 019/154] mm: make madvise(MADV_WILLNEED) support swap file prefetch
Date: Mon, 23 Dec 2013 10:25:24 +0000 [thread overview]
Message-ID: <20131223102524.GE11295@suse.de> (raw)
In-Reply-To: <20131220174210.GB727@redhat.com>
On Fri, Dec 20, 2013 at 06:42:10PM +0100, Andrea Arcangeli wrote:
> > <SNIP>
> > diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> > index f330d28e4d0e..1f8bc7881bdb 100644
> > --- a/include/asm-generic/pgtable.h
> > +++ b/include/asm-generic/pgtable.h
> > @@ -558,6 +558,14 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
> > }
> > #endif
> >
> > +#ifndef pmd_numa
> > +static inline int pmd_numa(pmd_t pmd)
> > +{
> > + return (pmd_flags(pmd) &
> > + (_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
> > +}
> > +#endif
> > +
> > /*
> > * This function is meant to be used by sites walking pagetables with
> > * the mmap_sem hold in read mode to protect against MADV_DONTNEED and
> > @@ -601,7 +609,7 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
> > #endif
> > if (pmd_none(pmdval))
> > return 1;
> > - if (unlikely(pmd_bad(pmdval))) {
> > + if (unlikely(pmd_bad(pmdval) || pmd_numa(pmdval))) {
> > if (!pmd_trans_huge(pmdval))
> > pmd_clear_bad(pmd);
> > return 1;
> > @@ -650,14 +658,6 @@ static inline int pte_numa(pte_t pte)
> > }
> > #endif
> >
> > -#ifndef pmd_numa
> > -static inline int pmd_numa(pmd_t pmd)
> > -{
> > - return (pmd_flags(pmd) &
> > - (_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
> > -}
> > -#endif
> > -
>
> If we should do the below I would suggest Mel to decide.
>
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 3d19994..d70235b 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -532,8 +532,10 @@ static inline int pmd_bad(pmd_t pmd)
> {
> #ifdef CONFIG_NUMA_BALANCING
> /* pmd_numa check */
> - if ((pmd_flags(pmd) & (_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA)
> - return 0;
> + if ((pmd_flags(pmd) & (_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA) {
> + pmd_clear_flags(pmd, _PAGE_NUMA);
> + pmd_set_flags(pmd, _PAGE_PRESENT);
> + }
> #endif
> return (pmd_flags(pmd) & ~_PAGE_USER) != _KERNPG_TABLE;
> }
>
I'm not keen on pmd_bad clearing NUMA hinting information. The name
implies it's a check only and this patch transforms the pmd. It also
puts a new burden on the architecture side that may be easily missed in
the future.
>
> The reason for not doing this was that it felt slow to have that kind
> of mangling in a fast path bad check. But maybe it's no problem to do
> it. The above should also avoid the bug.
>
If this bug is related to mprotect then it should also have been fixed
indirectly by commit 1667918b ("mm: numa: clear numa hinting information
on mprotect") although for the wrong reasons. Applying cleanly requires
mm: clear pmd_numa before invalidating
mm: numa: do not clear PMD during PTE update scan
mm: numa: do not clear PTE for pte_numa update
mm: numa: clear numa hinting information on mprotect
Is this bug reproducible in 3.13-rc5?
> However I would suggest the below, it was overkill strict to depend on
> pmd_bad to fail, and overall it made it weaker to depend on debug
> checks that may change in the future and that can provide false
> negatives (they only should never provide false positives). No need to
> check pmd_trans_huge inside the branch, the pmdval is stable there.
>
> If in a later patch I'd suggest you to cleanup the barrier with a
> ACCESS_ONCE (ACCESS_ONCE conditional only to
> CONFIG_TRANSPARENT_HUGEPAGE, no need of it if that's not configured
> in) that could give gcc more room for optimizations in this function.
>
> If CONFIG_TRANSPARENT_HUGEPAGE is not set, if pmd_none fails, the pmd
> cannot change at all.
>
> Again for the pmd_bad change above it's up to you... it's purely a
> performance/reliability tradeoff.
>
> From 238436f53937ed0a6821aa380b85366e4f6ad166 Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@redhat.com>
> Date: Fri, 20 Dec 2013 18:24:49 +0100
> Subject: [PATCH] mm: thp: don't depend on pmd_bad to fail on transhuge pmds
>
> pmd_bad stopped failing on transhuge pmds if the pmd_numa is
> armed. Don't depend on that invariant anymore as it has been broken.
>
> pmd_bad doesn't actually check if the pmd is bad if the pmd_numa is
> set. An alternative could be to teach it to mangle the local pmd to
> convert it from pmd_numa to non-pmd_numa and then issue a real check
> on it. However that would likely be slower, and keeping the
> pmd_trans_huge check in pmd_none_or_trans_huge_or_clear_bad in an
> unlikely pmd_bad branch was also not optimal.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
I prefer this patch
Acked-by: Mel Gorman <mgorman@suse.de>
--
Mel Gorman
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-12-23 10:25 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20130223003232.4CDDB5A41B6@corp2gmr1-2.hot.corp.google.com>
[not found] ` <52AA0613.2000908@oracle.com>
[not found] ` <CA+55aFw3_0_Et9bbfWgGLXEUaGQW1HE8j=oGBqFG_8j+h6jmvQ@mail.gmail.com>
[not found] ` <CA+55aFyRZW=Uy9w+bZR0vMOFNPqV-yW2Xs9N42qEwTQ3AY0fDw@mail.gmail.com>
[not found] ` <52AE271C.4040805@oracle.com>
2013-12-15 22:58 ` [patch 019/154] mm: make madvise(MADV_WILLNEED) support swap file prefetch Linus Torvalds
2013-12-16 12:47 ` Kirill A. Shutemov
2013-12-16 15:18 ` Sasha Levin
2013-12-16 20:52 ` Andrea Arcangeli
2013-12-17 0:18 ` Sasha Levin
2013-12-17 12:44 ` Kirill A. Shutemov
2013-12-17 14:09 ` Sasha Levin
2013-12-20 13:10 ` Kirill A. Shutemov
2013-12-20 13:31 ` Kirill A. Shutemov
2013-12-20 13:36 ` Kirill A. Shutemov
2013-12-20 17:42 ` Andrea Arcangeli
2013-12-23 10:25 ` Mel Gorman [this message]
2013-12-23 10:54 ` Kirill A. Shutemov
2013-12-23 11:15 ` Kirill A. Shutemov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131223102524.GE11295@suse.de \
--to=mgorman@suse.de \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-mm@kvack.org \
--cc=riel@redhat.com \
--cc=sasha.levin@oracle.com \
--cc=shli@fusionio.com \
--cc=shli@kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).