From: Steven Noonan <steven@uplinklabs.net>
To: Linux Kernel mailing List <linux-kernel@vger.kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Alex Thorlton <athorlton@sgi.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [BISECTED] Linux 3.12.7 introduces page map handling regression
Date: Tue, 21 Jan 2014 15:27:08 -0800
Message-ID: <20140121232708.GA29787@amazon.com>
A user reported a problem starting vsftpd on a Xen paravirtualized
guest, with this in dmesg:
[ 60.654862] BUG: Bad page map in process vsftpd pte:8000000493b88165 pmd:e9cc01067
[ 60.654876] page:ffffea00124ee200 count:0 mapcount:-1 mapping: (null) index:0x0
[ 60.654879] page flags: 0x2ffc0000000014(referenced|dirty)
[ 60.654885] addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping: (null) index:7f97eea74
[ 60.654890] CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1
[ 60.654893] ffff880e9cc6ec38 ffff880e9cc61ca0 ffffffff814c763b 00007f97eea74000
[ 60.654900] ffff880e9cc61ce8 ffffffff8116784e 0000000000000000 0000000000000000
[ 60.654906] ffff880e9cc013a0 ffffea00124ee200 00007f97eea75000 ffff880e9cc61e10
[ 60.654912] Call Trace:
[ 60.654921] [<ffffffff814c763b>] dump_stack+0x45/0x56
[ 60.654928] [<ffffffff8116784e>] print_bad_pte+0x22e/0x250
[ 60.654933] [<ffffffff81169073>] unmap_single_vma+0x583/0x890
[ 60.654938] [<ffffffff8116a405>] unmap_vmas+0x65/0x90
[ 60.654942] [<ffffffff81173795>] exit_mmap+0xc5/0x170
[ 60.654948] [<ffffffff8105d295>] mmput+0x65/0x100
[ 60.654952] [<ffffffff81062983>] do_exit+0x393/0x9e0
[ 60.654955] [<ffffffff810630dc>] do_group_exit+0xcc/0x140
[ 60.654959] [<ffffffff81063164>] SyS_exit_group+0x14/0x20
[ 60.654965] [<ffffffff814d602d>] system_call_fastpath+0x1a/0x1f
[ 60.654968] Disabling lock debugging due to kernel taint
[ 60.655191] BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1
[ 60.655196] BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1
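To make the pte value in the first BUG line easier to read, its flag bits can be decoded using the architectural x86-64 layout for a 4 KiB page-table entry. The following is only a sketch: pte_flags_to_string() and pte_frame() are hypothetical helpers written for this illustration (they are not kernel functions), and it assumes the value printed by print_bad_pte() follows the plain hardware layout with a frame number in bits 12 to 51.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Architectural x86-64 flag bits of a 4 KiB page-table entry. */
static const struct {
    unsigned bit;
    const char *name;
} pte_bits[] = {
    { 0, "PRESENT" }, { 1, "RW" },       { 2, "USER" },
    { 3, "PWT" },     { 4, "PCD" },      { 5, "ACCESSED" },
    { 6, "DIRTY" },   { 7, "PAT" },      { 8, "GLOBAL" },
    { 63, "NX" },
};

/*
 * Hypothetical helper: write the names of all set flag bits into buf,
 * separated by single spaces. Stops early if buf is too small.
 */
void pte_flags_to_string(uint64_t pte, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(pte_bits) / sizeof(pte_bits[0]); i++) {
        if (!(pte & (UINT64_C(1) << pte_bits[i].bit)))
            continue;

        size_t n = strlen(pte_bits[i].name);
        size_t need = n + (used ? 1 : 0);
        if (used + need >= len)
            break; /* not enough room for this name plus the NUL */

        if (used)
            buf[used++] = ' ';
        memcpy(buf + used, pte_bits[i].name, n);
        used += n;
        buf[used] = '\0';
    }
}

/* Hypothetical helper: extract the frame number (assumed bits 12..51). */
uint64_t pte_frame(uint64_t pte)
{
    return (pte & UINT64_C(0x000ffffffffff000)) >> 12;
}
```

Fed the value 0x8000000493b88165 from the log, this yields the flags PRESENT USER ACCESSED DIRTY GLOBAL NX and frame number 0x493b88. How the kernel interprets each bit in software is outside what this sketch establishes.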
The issue could not be reproduced under an HVM instance with the same
kernel, so it appears to be exclusive to paravirtual Xen guests.
I noted that it wasn't present in 3.10.27, but was present in 3.12.7 and
3.12.8. I ran through a bisection to find the root cause:
# start: 'v3.12.7' 'v3.10.27'
# bad: [4301b7a8] Linux 3.12.7
# good: [1071ea6e] Linux 3.10.27
# good: [8bb495e3] Linux 3.10
# good: [8fe73691] staging: comedi: comedi_bond: change return value
# good: [22e04f6b] Merge branch 'for-linus' of git://git.kernel.org/p
# good: [b7c09ad4] Merge branch 'for-linus' of git://git.kernel.org/p
# good: [13caa8ed] Merge git://git.kernel.org/pub/scm/linux/kernel/gi
# good: [13caa8ed] Merge git://git.kernel.org/pub/scm/linux/kernel/gi
# good: [f5fa9283] ipv6: reset dst.expires value when clearing expire
# good: [4af9d888] bridge: flush br's address entry in fdb when remov
# good: [8c13daf6] dm delay: fix a possible deadlock due to shared wo
# good: [93c02d70] firewire: sbp2: bring back WRITE SAME support
# good: [18065245] ACPI / PCI / hotplug: Avoid warning when _ADR not
# bad: [8807a436] mm/memory-failure.c: transfer page count from head
# bad: [fd5df800] mm: numa: avoid unnecessary disruption of NUMA hin
# good: [c18e3316] mm: numa: do not clear PMD during PTE update scan
# good: [f3b578d9] mm: numa: avoid unnecessary work on the failure pa
# bad: [3d792d61] mm: numa: clear numa hinting information on mprote
# good: [cefeb279] sched: numa: skip inaccessible VMAs
# first bad: [3d792d61] mm: numa: clear numa hinting information on mprote
If only I'd tested v3.12.0, that bisection would have been a lot shorter!
It looks like this is the implicated change (introduced in v3.12.7):
    commit 3d792d616ba408ab55a54c1bb75a9367d997acfa
    Author: Mel Gorman <mgorman@suse.de>
    Date:   Tue Jan 7 14:00:44 2014 +0000

        mm: numa: clear numa hinting information on mprotect

        commit 1667918b6483b12a6496bf54151b827b8235d7b1 upstream.

        On a protection change it is no longer clear if the page should be still
        accessible.  This patch clears the NUMA hinting fault bits on a
        protection change.

        Signed-off-by: Mel Gorman <mgorman@suse.de>
        Reviewed-by: Rik van Riel <riel@redhat.com>
        Cc: Alex Thorlton <athorlton@sgi.com>
        Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
        Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
        Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This clearly points to breakage of mprotect() in particular. Checking
what vsftpd was doing via strace, I was able to come up with a simple
test case which triggers the issue:
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    void die(const char *what)
    {
        perror(what);
        exit(1);
    }

    int main(int argc, char **argv)
    {
        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            die("mmap");

        /* Tickle the page. */
        ((char *)p)[0] = 0;

        if (mprotect(p, 4096, PROT_NONE) != 0)
            die("mprotect");

        if (mprotect(p, 4096, PROT_READ) != 0)
            die("mprotect");

        if (munmap(p, 4096) != 0)
            die("munmap");

        return 0;
    }
This could probably be reduced further. I didn't spend much time on it.
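For anyone who wants to vary the test, here is a sketch of a slightly more defensive form of the same sequence. It makes a few assumptions that are not in the test case above: it asks sysconf() for the page size instead of hard-coding 4096, it reads the page back after the second mprotect() to check the contents survived, and it wraps everything in a hypothetical repro_once() function so the cycle can be called repeatedly.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the original test case: run one
 * mmap/touch/mprotect/munmap cycle. Returns 0 on success, -1 on failure.
 */
int repro_once(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        perror("sysconf");
        return -1;
    }
    size_t len = (size_t)page;
    int rc = 0;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    /* Fault the page in with a recognisable value. */
    p[0] = 0x5a;

    if (mprotect(p, len, PROT_NONE) != 0) {
        perror("mprotect(PROT_NONE)");
        rc = -1;
    } else if (mprotect(p, len, PROT_READ) != 0) {
        perror("mprotect(PROT_READ)");
        rc = -1;
    } else if (*(volatile unsigned char *)p != 0x5a) {
        /* The page should still hold what was written before. */
        fprintf(stderr, "unexpected page contents after mprotect\n");
        rc = -1;
    }

    /* Always unmap, even if an earlier step failed. */
    if (munmap(p, len) != 0) {
        perror("munmap");
        rc = -1;
    }
    return rc;
}
```

Calling repro_once() from main() and returning its result gives a drop-in replacement for the program above. A return value of 0 only means the system calls succeeded. The BUG itself is reported by the kernel, so dmesg still has to be checked after the run.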
Adding people cited in the patch to CC, as well as Konrad since this is
a Xen issue (I haven't been able to repro on HVM or bare metal so far).
Any ideas what's causing the BUG, and how we can fix it?
- Steven