All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Elena Ufimtseva <ufimtseva@gmail.com>
Cc: Steven Noonan <steven@uplinklabs.net>,
	Daniel Borkmann <borkmann@iogearbox.net>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	David Vrabel <david.vrabel@citrix.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Dario Faggioli <dario.faggioli@citrix.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Linux Kernel mailing List <linux-kernel@vger.kernel.org>,
	Rik van Riel <riel@redhat.com>, Alex Thorlton <athorlton@sgi.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Michel Lespinasse <walken@google.com>
Subject: Re: [BISECTED] Linux 3.12.7 introduces page map handling regression
Date: Fri, 24 Jan 2014 13:38:30 +0000	[thread overview]
Message-ID: <20140124133830.GU4963@suse.de> (raw)
In-Reply-To: <CAEr7rXjge6rKzxbwy+0A6-5YhVZL9WGmaLrDYbE8H5hrtwq_4A@mail.gmail.com>

On Thu, Jan 23, 2014 at 11:23:37AM -0500, Elena Ufimtseva wrote:
> >> >> <SNIP>
> >> >>
> >> >> This dump doesn't look dramatically different, either.
> >> >>
> >> >>>
> >> >>> The other question is - how is AutoNUMA running when it is not enabled?
> >> >>> Shouldn't those _PAGE_NUMA ops be nops when AutoNUMA hasn't even been
> >> >>> turned on?
> >> >>
> >> >>
> >> >> Well, NUMA_BALANCING is enabled in the kernel config[1], but I presume you
> >> >> mean not enabled at runtime?
> >> >>
> >> >> [1]
> >> >> http://git.uplinklabs.net/snoonan/projects/archlinux/ec2/ec2-packages.git/tree/linux-ec2/config.x86_64
> >>
> >>
> >>
> >> --
> >> Elena
> 
> I was able to reproduce this consistently, also with the latest mm
> patches from yesterday.
> Can you please try this:
> 

Thanks Elena,

> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index ce563be..76dcf96 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -365,7 +365,7 @@ void xen_ptep_modify_prot_commit(struct mm_struct
> *mm, unsigned long addr,
>  /* Assume pteval_t is equivalent to all the other *val_t types. */
>  static pteval_t pte_mfn_to_pfn(pteval_t val)
>  {
> -       if (val & _PAGE_PRESENT) {
> +       if ((val & _PAGE_PRESENT) || ((val &
> (_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA)) {
>                 unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
>                 unsigned long pfn = mfn_to_pfn(mfn);
> 
> @@ -381,7 +381,7 @@ static pteval_t pte_mfn_to_pfn(pteval_t val)
> 
>  static pteval_t pte_pfn_to_mfn(pteval_t val)
>  {
> -       if (val & _PAGE_PRESENT) {
> +       if ((val & _PAGE_PRESENT) || ((val &
> (_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA)) {
>                 unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
>                 pteval_t flags = val & PTE_FLAGS_MASK;
>                 unsigned long mfn;

Would reusing pte_present be an option? Ordinarily I expect that
PAGE_NUMA/PAGE_PROTNONE is only set if PAGE_PRESENT is not set and pte_present
is defined as

static inline int pte_present(pte_t a)
{
        return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
                               _PAGE_NUMA);
}

So it looks like it work work. Of course it would need to be split to
reuse it within xen if pte_present was split to have a pteval_present
helper like so

static inline int pteval_present(pteval_t val)
{
	/*
	 * Yes Linus, _PAGE_PROTNONE == _PAGE_NUMA. Expressing it this
	 * way clearly states that the intent is that a protnone and numa
	 * hinting ptes are considered present for the purposes of
	 * pagetable operations like zapping, protection changes, gup etc.
	 */
	return val & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_NUMA);
}

static inline int pte_present(pte_t pte)
{
	return pteval_present(pte_flags(pte))
}

If Xen is doing some other tricks with _PAGE_PRESENT then it might be
ruled out as an option. If so, then maybe it could still be made a
little clearer for future reference?


diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index c1d406f..ff621de 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -365,7 +365,7 @@ void xen_ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
 /* Assume pteval_t is equivalent to all the other *val_t types. */
 static pteval_t pte_mfn_to_pfn(pteval_t val)
 {
-	if (val & _PAGE_PRESENT) {
+	if ((val & _PAGE_PRESENT) || pteval_numa(val)) {
 		unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
 		unsigned long pfn = mfn_to_pfn(mfn);
 
@@ -381,7 +381,7 @@ static pteval_t pte_mfn_to_pfn(pteval_t val)
 
 static pteval_t pte_pfn_to_mfn(pteval_t val)
 {
-	if (val & _PAGE_PRESENT) {
+	if ((val & _PAGE_PRESENT) || pteval_numa(val)) {
 		unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
 		pteval_t flags = val & PTE_FLAGS_MASK;
 		unsigned long mfn;
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 8e4f41d..693fe00 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -654,10 +654,14 @@ static inline int pmd_trans_unstable(pmd_t *pmd)
  * (because _PAGE_PRESENT is not set).
  */
 #ifndef pte_numa
+static inline int pteval_numa(pteval_t pteval)
+{
+	return (pteval & (_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
+}
+
 static inline int pte_numa(pte_t pte)
 {
-	return (pte_flags(pte) &
-		(_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
+	return pteval_numa(pte_flags(pte));
 }
 #endif
 

  parent reply	other threads:[~2014-01-24 13:38 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-21 23:27 [BISECTED] Linux 3.12.7 introduces page map handling regression Steven Noonan
2014-01-22  1:49 ` Greg Kroah-Hartman
2014-01-22  2:47   ` Linus Torvalds
2014-01-22  3:20     ` Steven Noonan
2014-01-22  5:02       ` Konrad Rzeszutek Wilk
2014-01-22  5:02       ` Konrad Rzeszutek Wilk
2014-01-22  7:29         ` Steven Noonan
2014-01-22  7:29         ` Steven Noonan
2014-01-22 14:29           ` Daniel Borkmann
2014-01-22 14:29           ` Daniel Borkmann
2014-01-22 20:18             ` Elena Ufimtseva
2014-01-22 20:18             ` Elena Ufimtseva
2014-01-22 20:33               ` Steven Noonan
2014-01-22 20:33               ` Steven Noonan
2014-01-23 16:23                 ` Elena Ufimtseva
2014-01-23 23:20                   ` Steven Noonan
2014-01-24  4:28                     ` Elena Ufimtseva
2014-01-24  4:28                     ` Elena Ufimtseva
2014-01-23 23:20                   ` Steven Noonan
2014-01-24 11:05                   ` David Vrabel
2014-01-24 11:05                   ` David Vrabel
2014-01-24 13:38                   ` Mel Gorman
2014-01-24 13:38                   ` Mel Gorman [this message]
2014-01-26 18:02                     ` Elena Ufimtseva
2014-02-04  6:58                       ` Elena Ufimtseva
2014-02-04 11:44                         ` [PATCH] Subject: [PATCH] xen: Properly account for _PAGE_NUMA during xen pte translations Mel Gorman
2014-02-04 11:44                         ` Mel Gorman
2014-02-04 11:44                           ` Mel Gorman
2014-02-04 11:48                           ` David Vrabel
2014-02-04 11:48                           ` David Vrabel
2014-02-04 11:48                             ` David Vrabel
2014-02-04 14:38                             ` Konrad Rzeszutek Wilk
2014-02-04 14:38                             ` Konrad Rzeszutek Wilk
2014-02-04 14:38                               ` Konrad Rzeszutek Wilk
2014-02-04  6:58                       ` [BISECTED] Linux 3.12.7 introduces page map handling regression Elena Ufimtseva
2014-01-26 18:02                     ` Elena Ufimtseva
2014-01-23 16:23                 ` Elena Ufimtseva
2014-01-22 18:07     ` Rik van Riel
2014-01-22 18:24       ` Linus Torvalds
2014-01-22 18:39         ` Rik van Riel
2014-01-24 11:43           ` Mel Gorman
2014-01-23 17:03 ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140124133830.GU4963@suse.de \
    --to=mgorman@suse.de \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=athorlton@sgi.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=borkmann@iogearbox.net \
    --cc=dario.faggioli@citrix.com \
    --cc=david.vrabel@citrix.com \
    --cc=george.dunlap@eu.citrix.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=riel@redhat.com \
    --cc=steven@uplinklabs.net \
    --cc=torvalds@linux-foundation.org \
    --cc=ufimtseva@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=walken@google.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.