From: Daniel Borkmann <borkmann@iogearbox.net>
To: Steven Noonan <steven@uplinklabs.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
boris.ostrovsky@oracle.com, david.vrabel@citrix.com,
xen-devel@lists.xenproject.org, george.dunlap@eu.citrix.com,
dario.faggioli@citrix.com, Elena Ufimtseva <ufimtseva@gmail.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Linux Kernel mailing List <linux-kernel@vger.kernel.org>,
Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
Alex Thorlton <athorlton@sgi.com>,
Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
Michel Lespinasse <walken@google.com>
Subject: Re: [BISECTED] Linux 3.12.7 introduces page map handling regression
Date: Wed, 22 Jan 2014 15:29:47 +0100
Message-ID: <52DFD5DB.6060603@iogearbox.net>
In-Reply-To: <20140122072914.GA9283@orcus.uplinklabs.net>

On 01/22/2014 08:29 AM, Steven Noonan wrote:
> On Wed, Jan 22, 2014 at 12:02:15AM -0500, Konrad Rzeszutek Wilk wrote:
>> On Tue, Jan 21, 2014 at 07:20:45PM -0800, Steven Noonan wrote:
>>> On Tue, Jan 21, 2014 at 06:47:07PM -0800, Linus Torvalds wrote:
>>>> On Tue, Jan 21, 2014 at 5:49 PM, Greg Kroah-Hartman
>>>> <gregkh@linuxfoundation.org> wrote:
>>
>> Adding extra folks to the party.
>>>>>
>>>>> Odds are this also shows up in 3.13, right?
>>>
>>> Reproduced using 3.13 on the PV guest:
>>>
>>> [ 368.756763] BUG: Bad page map in process mp pte:80000004a67c6165 pmd:e9b706067
>>> [ 368.756777] page:ffffea001299f180 count:0 mapcount:-1 mapping: (null) index:0x0
>>> [ 368.756781] page flags: 0x2fffff80000014(referenced|dirty)
>>> [ 368.756786] addr:00007fd1388b7000 vm_flags:00100071 anon_vma:ffff880e9ba15f80 mapping: (null) index:7fd1388b7
>>> [ 368.756792] CPU: 29 PID: 618 Comm: mp Not tainted 3.13.0-ec2 #1
>>> [ 368.756795] ffff880e9b718958 ffff880e9eaf3cc0 ffffffff814d8748 00007fd1388b7000
>>> [ 368.756803] ffff880e9eaf3d08 ffffffff8116d289 0000000000000000 0000000000000000
>>> [ 368.756809] ffff880e9b7065b8 ffffea001299f180 00007fd1388b8000 ffff880e9eaf3e30
>>> [ 368.756815] Call Trace:
>>> [ 368.756825] [<ffffffff814d8748>] dump_stack+0x45/0x56
>>> [ 368.756833] [<ffffffff8116d289>] print_bad_pte+0x229/0x250
>>> [ 368.756837] [<ffffffff8116eae3>] unmap_single_vma+0x583/0x890
>>> [ 368.756842] [<ffffffff8116feb5>] unmap_vmas+0x65/0x90
>>> [ 368.756847] [<ffffffff81175dac>] unmap_region+0xac/0x120
>>> [ 368.756852] [<ffffffff81176379>] ? vma_rb_erase+0x1c9/0x210
>>> [ 368.756856] [<ffffffff81177f10>] do_munmap+0x280/0x370
>>> [ 368.756860] [<ffffffff81178041>] vm_munmap+0x41/0x60
>>> [ 368.756864] [<ffffffff81178f32>] SyS_munmap+0x22/0x30
>>> [ 368.756869] [<ffffffff814e70ed>] system_call_fastpath+0x1a/0x1f
>>> [ 368.756872] Disabling lock debugging due to kernel taint
>>> [ 368.760084] BUG: Bad rss-counter state mm:ffff880e9d079680 idx:0 val:-1
>>> [ 368.760091] BUG: Bad rss-counter state mm:ffff880e9d079680 idx:1 val:1
>>>
>>>>
>>>> Probably. I don't have a Xen PV setup to test with (and very little
>>>> interest in setting one up).. And I have a suspicion that it might not
>>>> be so much about Xen PV, as perhaps about the kind of hardware.
>>>>
>>>> I suspect the issue has something to do with the magic _PAGE_NUMA
>>>> tie-in with _PAGE_PRESENT. And then mprotect(PROT_NONE) ends up
>>>> removing the _PAGE_PRESENT bit, and now the crazy numa code is
>>>> confused.
>>>>
>>>> The whole _PAGE_NUMA thing is a f*cking horrible hack, and shares the
>>>> bit with _PAGE_PROTNONE, which is why it then has that tie-in to
>>>> _PAGE_PRESENT.
>>>>
>>>> Adding Andrea to the Cc, because he's the author of that horridness.
>>>> Putting Steven's test-case here as an attachment for Andrea, maybe
>>>> that makes him go "Ahh, yes, silly case".
>>>>
>>>> Also added Kirill, because he was involved in the last _PAGE_NUMA debacle.
>>>>
>>>> Andrea, you can find the thread on lkml, but it boils down to commit
>>>> 1667918b6483 (backported to 3.12.7 as 3d792d616ba4) breaking the
>>>> attached test-case (but apparently only under Xen PV). There it
>>>> apparently causes a "BUG: Bad page map .." error.
>>
>> I *think* it is due to the fact that pmd_numa and pte_numa are getting the _raw_
>> value of PMDs and PTEs. That is - it does not use the pvops interface
>> and instead reads the values directly from the page-table. Since the
>> page-table is also manipulated by the hypervisor - there are certain
>> flags it also sets to do its business. It might be that it uses
>> _PAGE_GLOBAL as well - and Linux picks up on that. If it was using
>> pte_flags that would invoke the pvops interface.
>>
>> Elena, Dariof and George, you guys had been looking at this a bit deeper
>> than I have. Does the Xen hypervisor use the _PAGE_GLOBAL for PV guests?
>>
>> This not-compiled-totally-bad-patch might shed some light on what I was
>> thinking _could_ fix this issue - and IS NOT A FIX - JUST A HACK.
>> It does not fix it for PMDs naturally (as there are no PMD paravirt ops
>> for that).
>
> Unfortunately the Totally Bad Patch seems to make no difference. I am
> still able to repro the issue:

This might also be related to the BUG reported here (I've Cc'ed the people
investigating that one):

  https://lkml.org/lkml/2014/1/10/427

Not sure, though.

> [ 346.374929] BUG: Bad page map in process mp pte:80000004ae928065 pmd:e993f9067
> [ 346.374942] page:ffffea0012ba4a00 count:0 mapcount:-1 mapping: (null) index:0x0
> [ 346.374946] page flags: 0x2fffff80000014(referenced|dirty)
> [ 346.374951] addr:00007f06a9bbb000 vm_flags:00100071 anon_vma:ffff880e9939fe00 mapping: (null) index:7f06a9bbb
> [ 346.374956] CPU: 29 PID: 609 Comm: mp Not tainted 3.13.0-ec2+ #1
> [ 346.374960] ffff880e9cc38da8 ffff880e991a3cc0 ffffffff814d8768 00007f06a9bbb000
> [ 346.374967] ffff880e991a3d08 ffffffff8116d289 0000000000000000 0000000000000000
> [ 346.374972] ffff880e993f9dd8 ffffea0012ba4a00 00007f06a9bbc000 ffff880e991a3e30
> [ 346.374979] Call Trace:
> [ 346.374988] [<ffffffff814d8768>] dump_stack+0x45/0x56
> [ 346.374996] [<ffffffff8116d289>] print_bad_pte+0x229/0x250
> [ 346.375000] [<ffffffff8116eae3>] unmap_single_vma+0x583/0x890
> [ 346.375006] [<ffffffff8116feb5>] unmap_vmas+0x65/0x90
> [ 346.375011] [<ffffffff81175dbc>] unmap_region+0xac/0x120
> [ 346.375016] [<ffffffff81176389>] ? vma_rb_erase+0x1c9/0x210
> [ 346.375021] [<ffffffff81177f20>] do_munmap+0x280/0x370
> [ 346.375025] [<ffffffff81178051>] vm_munmap+0x41/0x60
> [ 346.375029] [<ffffffff81178f42>] SyS_munmap+0x22/0x30
> [ 346.375034] [<ffffffff814e712d>] system_call_fastpath+0x1a/0x1f
> [ 346.375037] Disabling lock debugging due to kernel taint
> [ 346.380082] BUG: Bad rss-counter state mm:ffff880e9d22bc00 idx:0 val:-1
> [ 346.380088] BUG: Bad rss-counter state mm:ffff880e9d22bc00 idx:1 val:1
>
> This dump doesn't look dramatically different, either.
>
>>
>> The other question is - how is AutoNUMA running when it is not enabled?
>> Shouldn't those _PAGE_NUMA ops be nops when AutoNUMA hasn't even been
>> turned on?
>
> Well, NUMA_BALANCING is enabled in the kernel config[1], but I presume you
> mean not enabled at runtime?
>
> [1] http://git.uplinklabs.net/snoonan/projects/archlinux/ec2/ec2-packages.git/tree/linux-ec2/config.x86_64
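
For anyone comparing the two "Bad page map" reports above, the flag bits in
the logged pte values can be decoded by hand. The following is only a rough
sketch: the bit positions are the architectural x86-64 ones, and the idea
that bit 8 (_PAGE_GLOBAL) doubles as _PAGE_PROTNONE/_PAGE_NUMA is taken from
the discussion quoted above, not checked against any particular kernel tree.
The helper names (decode_pte, looks_numa) are made up for illustration, and
looks_numa() merely mirrors the "NUMA bit set, PRESENT clear" idea described
in the thread.

```python
# Architectural x86-64 page-table entry flag bits (bit position -> name).
X86_PTE_BITS = {
    0: "PRESENT",
    1: "RW",
    2: "USER",
    3: "PWT",
    4: "PCD",
    5: "ACCESSED",
    6: "DIRTY",
    7: "PAT",
    8: "GLOBAL",  # per the thread, shared with _PAGE_PROTNONE/_PAGE_NUMA
    63: "NX",
}

PRESENT = 1 << 0
GLOBAL = 1 << 8  # assumption from the thread: alias of _PAGE_NUMA


def decode_pte(pte):
    """Return the names of the known flag bits set in a raw 64-bit PTE."""
    return [name for bit, name in sorted(X86_PTE_BITS.items()) if pte & (1 << bit)]


def looks_numa(pte):
    """Hypothetical check: NUMA-hinting bit set while PRESENT is clear."""
    return (pte & (GLOBAL | PRESENT)) == GLOBAL


if __name__ == "__main__":
    # The two pte values from the "Bad page map" lines quoted above.
    for raw in (0x80000004A67C6165, 0x80000004AE928065):
        print(hex(raw), decode_pte(raw), "numa-like:", looks_numa(raw))
```

Decoded this way, the first logged pte has bit 8 set together with PRESENT,
while the second has PRESENT without bit 8. That is just an observation about
the logged values, not a diagnosis.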