From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Steven Noonan <steven@uplinklabs.net>,
	boris.ostrovsky@oracle.com, david.vrabel@citrix.com,
	xen-devel@lists.xenproject.org, george.dunlap@eu.citrix.com,
	dario.faggioli@citrix.com, Elena Ufimtseva <ufimtseva@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Linux Kernel mailing List <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Alex Thorlton <athorlton@sgi.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [BISECTED] Linux 3.12.7 introduces page map handling regression
Date: Wed, 22 Jan 2014 00:02:15 -0500
Message-ID: <20140122050215.GC9931@konrad-lan.dumpdata.com>
In-Reply-To: <20140122032045.GA22182@falcon.amazon.com>

On Tue, Jan 21, 2014 at 07:20:45PM -0800, Steven Noonan wrote:
> On Tue, Jan 21, 2014 at 06:47:07PM -0800, Linus Torvalds wrote:
> > On Tue, Jan 21, 2014 at 5:49 PM, Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:

Adding extra folks to the party.
> > >
> > > Odds are this also shows up in 3.13, right?
> 
> Reproduced using 3.13 on the PV guest:
> 
> 	[  368.756763] BUG: Bad page map in process mp  pte:80000004a67c6165 pmd:e9b706067
> 	[  368.756777] page:ffffea001299f180 count:0 mapcount:-1 mapping:          (null) index:0x0
> 	[  368.756781] page flags: 0x2fffff80000014(referenced|dirty)
> 	[  368.756786] addr:00007fd1388b7000 vm_flags:00100071 anon_vma:ffff880e9ba15f80 mapping:          (null) index:7fd1388b7
> 	[  368.756792] CPU: 29 PID: 618 Comm: mp Not tainted 3.13.0-ec2 #1
> 	[  368.756795]  ffff880e9b718958 ffff880e9eaf3cc0 ffffffff814d8748 00007fd1388b7000
> 	[  368.756803]  ffff880e9eaf3d08 ffffffff8116d289 0000000000000000 0000000000000000
> 	[  368.756809]  ffff880e9b7065b8 ffffea001299f180 00007fd1388b8000 ffff880e9eaf3e30
> 	[  368.756815] Call Trace:
> 	[  368.756825]  [<ffffffff814d8748>] dump_stack+0x45/0x56
> 	[  368.756833]  [<ffffffff8116d289>] print_bad_pte+0x229/0x250
> 	[  368.756837]  [<ffffffff8116eae3>] unmap_single_vma+0x583/0x890
> 	[  368.756842]  [<ffffffff8116feb5>] unmap_vmas+0x65/0x90
> 	[  368.756847]  [<ffffffff81175dac>] unmap_region+0xac/0x120
> 	[  368.756852]  [<ffffffff81176379>] ? vma_rb_erase+0x1c9/0x210
> 	[  368.756856]  [<ffffffff81177f10>] do_munmap+0x280/0x370
> 	[  368.756860]  [<ffffffff81178041>] vm_munmap+0x41/0x60
> 	[  368.756864]  [<ffffffff81178f32>] SyS_munmap+0x22/0x30
> 	[  368.756869]  [<ffffffff814e70ed>] system_call_fastpath+0x1a/0x1f
> 	[  368.756872] Disabling lock debugging due to kernel taint
> 	[  368.760084] BUG: Bad rss-counter state mm:ffff880e9d079680 idx:0 val:-1
> 	[  368.760091] BUG: Bad rss-counter state mm:ffff880e9d079680 idx:1 val:1
> 
> > 
> > Probably. I don't have a Xen PV setup to test with (and very little
> > interest in setting one up).. And I have a suspicion that it might not
> > be so much about Xen PV, as perhaps about the kind of hardware.
> > 
> > I suspect the issue has something to do with the magic _PAGE_NUMA
> > tie-in with _PAGE_PRESENT. And then mprotect(PROT_NONE) ends up
> > removing the _PAGE_PRESENT bit, and now the crazy numa code is
> > confused.
> > 
> > The whole _PAGE_NUMA thing is a f*cking horrible hack, and shares the
> > bit with _PAGE_PROTNONE, which is why it then has that tie-in to
> > _PAGE_PRESENT.
> > 
> > Adding Andrea to the Cc, because he's the author of that horridness.
> > Putting Steven's test-case here as an attachment for Andrea, maybe
> > that makes him go "Ahh, yes, silly case".
> > 
> > Also added Kirill, because he was involved in the last _PAGE_NUMA debacle.
> > 
> > Andrea, you can find the thread on lkml, but it boils down to commit
> > 1667918b6483 (backported to 3.12.7 as 3d792d616ba4) breaking the
> > attached test-case (but apparently only under Xen PV). There it
> > apparently causes a "BUG: Bad page map .." error.

I *think* it is due to the fact that pmd_numa() and pte_numa() get the _raw_
value of PMDs and PTEs. That is, they use pmd_flags()/pte_flags(), which do
not go through the pvops interface and instead read the values directly from
the page-table. Since the page-table is also manipulated by the hypervisor,
there are certain flags it sets for its own business. It might be that it
uses _PAGE_GLOBAL as well, and Linux picks up on that: on x86, _PAGE_NUMA and
_PAGE_PROTNONE are the same bit as _PAGE_GLOBAL. If they used pte_val()
instead, the read would go through the pvops interface.

Elena, Dario and George, you have been looking at this in more depth
than I have. Does the Xen hypervisor use _PAGE_GLOBAL for PV guests?

This not-compiled, totally-bad patch might shed some light on what I was
thinking _could_ fix this issue. It IS NOT A FIX, JUST A HACK.
Naturally, it does not fix the PMD case (as there are no PMD paravirt ops
for that).

The other question is: how is AutoNUMA running when it is not enabled?
Shouldn't those _PAGE_NUMA ops be no-ops when AutoNUMA hasn't even been
turned on?


diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index ce563be..9fa7088 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -370,12 +370,15 @@ static pteval_t pte_mfn_to_pfn(pteval_t val)
 		unsigned long pfn = mfn_to_pfn(mfn);
 
 		pteval_t flags = val & PTE_FLAGS_MASK;
+		/* No AutoNUMA for PV. TODO: if Linux sees the PTE having
+		 * said bit, just ignore it. */
+		if (flags & _PAGE_NUMA)
+			flags = flags & ~_PAGE_NUMA;
 		if (unlikely(pfn == ~0))
 			val = flags & ~_PAGE_PRESENT;
 		else
 			val = ((pteval_t)pfn << PAGE_SHIFT) | flags;
 	}
-
 	return val;
 }
 
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index db09234..a8bc07d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -644,7 +644,7 @@ static inline int pmd_trans_unstable(pmd_t *pmd)
 #ifndef pte_numa
 static inline int pte_numa(pte_t pte)
 {
-	return (pte_flags(pte) &
+	return (pte_val(pte) &
 		(_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
 }
 #endif
