From: Mel Gorman <mel@csn.ul.ie>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, bugzilla-daemon@bugzilla.kernel.org
Subject: Re: [Bug 29772] New: memory compaction crashed
Date: Thu, 24 Feb 2011 15:06:05 +0000
Message-ID: <20110224150605.GX15652@csn.ul.ie>
In-Reply-To: <1298546750.3764.23.camel@jlt3.sipsolutions.net>
On Thu, Feb 24, 2011 at 12:25:50PM +0100, Johannes Berg wrote:
> On Thu, 2011-02-24 at 10:37 +0000, Mel Gorman wrote:
>
> > > Yes. I was using evince to pan around in a fairly large PDF that really
> > > is a large single-page bitmap, but that's about it. I also have a fairly
> > > large (bit more than full HD) external monitor, both of these probably
> > > take some amount of memory. The system had been up for a while, a few hours
> > > at most, with similar workloads, sometimes a kernel compile (but none
> > > was running at the time).
>
> > Is this reproducible or did it just happen the once?
>
> It happened only once so far. And it wasn't the first time I was doing
> this (panning large files) either.
>
> > > > Can you tell me what line the instruction ffffffff8100f1c2 corresponds to? If
> > > > you have CONFIG_DEBUG_INFO set, it should be a case of telling me what the
> > > > output of "addr2line -e vmlinux 0xffffffff8100f1c2" is. On a similar note,
> > > > do you know what sort of crash this was? i.e. was it a NULL dereference or
> > > > did a VM_BUG_ON or BUG_ON hit such as VM_BUG_ON(PageTransCompound(page))?
> > > > Was CONFIG_DEBUG_VM set? Actually, it would be preferable to have the
> > > > whole .config attached to the bugzilla if possible please.
> > >
> > > Attached the config. addr2line failed so I probably don't have enough
> > > debug info,
> >
> > Indeed not, can you enable CONFIG_DEBUG_INFO for future reference
> > please? It'll be easier to figure out where things crashed exactly.
> > Also, what compiler are you using?
>
> $ gcc --version
> gcc-4.5.real (Debian 4.5.2-2) 4.5.2
>
> I thought I had DEBUG_INFO, but I just checked in my .config and it
> seems not. My mistake. Is DEBUG_INFO_REDUCED=y acceptable? From
> experience, not setting that takes an order of magnitude longer to
> compile on my laptop.
It's not something I use myself, but if you run objdump on vmlinux, check
whether there are symbolic names against instructions like "call" and
whether addr2line works. If so, then it's enough information.
> > > ffffffff8110f197: 48 81 c3 ff 07 00 00 add $0x7ff,%rbx
> > > ffffffff8110f19e: 4c 89 45 a0 mov %r8,-0x60(%rbp)
> > > ffffffff8110f1a2: 48 81 e3 00 fc ff ff and $0xfffffffffffffc00,%rbx
> > > ffffffff8110f1a9: 48 ff cb dec %rbx
> > > ffffffff8110f1ac: 0f 1f 40 00 nopl 0x0(%rax)
> > > ffffffff8110f1b0: 48 ff c3 inc %rbx
> > > ffffffff8110f1b3: 49 39 de cmp %rbx,%r14
> > > ffffffff8110f1b6: 76 58 jbe 0xffffffff8110f210
> > > ffffffff8110f1b8: 48 6b cb 38 imul $0x38,%rbx,%rcx
> > > ffffffff8110f1bc: 49 ff c4 inc %r12
> > > ffffffff8110f1bf: 4c 01 f9 add %r15,%rcx
> > > ffffffff8110f1c2:**** 8b 41 0c mov 0xc(%rcx),%eax
> > > ffffffff8110f1c5: 83 f8 fe cmp $0xfffffffffffffffe,%eax
> > > ffffffff8110f1c8: 74 e6 je 0xffffffff8110f1b0
> > > ffffffff8110f1ca: 41 80 7d 40 00 cmpb $0x0,0x40(%r13)
> > > ffffffff8110f1cf: 74 8f je 0xffffffff8110f160
> > > ffffffff8110f1d1: 48 8b 01 mov (%rcx),%rax
> > > ffffffff8110f1d4: a8 20 test $0x20,%al
> > > ffffffff8110f1d6: 74 d8 je 0xffffffff8110f1b0
> > >
> > > (this matches the Code: in the picture) which means it was some sort of
> > > bad pointer dereference since %rcx is 0xffffea0000a00000 (I think). That
> > > almost seems like a valid pointer, hmm.
> > >
> >
> > I believe this corresponds to;
> >
> >         for (; low_pfn < end_pfn; low_pfn++) {
> >                 struct page *page;
> >                 if (!pfn_valid_within(low_pfn))
> >                         continue;
> >                 nr_scanned++;
> >
> >                 /* Get the page and skip if free */
> >                 page = pfn_to_page(low_pfn);
> >                 if (PageBuddy(page))    <----- HERE
> >                         continue;
> >
> > rcx is storing a struct page pointer and the 0xc offset is the _mapcount.
> > It should be "impossible" for this page to be invalid though so I'm wondering
> > if there is some other memory corruption going on.
>
> Possible. I had some graphics issues with X hanging once a while, but
> with all of those I could still ssh in and reboot the machine.
>
It could very well be related, with the main difference being that
compaction blew up with interrupts disabled, taking down the whole
machine. Have you a reproduction case for the X hangs?

It might also be worth running memtest on the machine just in case, but I
find it doubtful that it's the problem. A buggy graphics driver feels
more likely. Are you running anything like compiz? If yes, are the
hangs still reproducible with it disabled?
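
For reference, here is a rough userspace sketch of how the quoted
disassembly appears to line up with the PageBuddy() check in the loop
above. It is not kernel code: struct mock_page, mock_pfn_to_page(),
mock_page_buddy() and count_non_buddy() are hypothetical names made up
for illustration. The only things taken from the disassembly are the
0x38 stride (imul $0x38,%rbx,%rcx), the 0xc offset of the faulting load
and the comparison against 0xfffffffe, i.e. -2. The field at offset 0x8
and the idea that %r15 holds the memmap base are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page. Only the total size (0x38) and
 * the _mapcount offset (0xc) come from the quoted disassembly. */
struct mock_page {
	uint64_t flags;      /* offset 0x0: "mov (%rcx),%rax; test $0x20,%al" */
	int32_t  _count;     /* offset 0x8: assumed, not visible in the dump */
	int32_t  _mapcount;  /* offset 0xc: read by the faulting mov */
	uint8_t  rest[0x38 - 16];
};

_Static_assert(sizeof(struct mock_page) == 0x38, "stride used by imul");
_Static_assert(offsetof(struct mock_page, _mapcount) == 0xc, "faulting offset");

/* Value the dump compares against: cmp $0xfffffffe,%eax */
#define MOCK_BUDDY_MAPCOUNT (-2)

/* Mirrors "imul $0x38,%rbx,%rcx; add %r15,%rcx", assuming %r15 is the
 * base of the page array and %rbx is the pfn. */
struct mock_page *mock_pfn_to_page(struct mock_page *memmap, unsigned long pfn)
{
	return memmap + pfn;
}

/* Mirrors "mov 0xc(%rcx),%eax; cmp $0xfffffffe,%eax" */
int mock_page_buddy(const struct mock_page *page)
{
	return page->_mapcount == MOCK_BUDDY_MAPCOUNT;
}

/* Same shape as the scanning loop: skip free (buddy) pages, count the rest. */
unsigned long count_non_buddy(struct mock_page *memmap,
			      unsigned long low_pfn, unsigned long end_pfn)
{
	unsigned long nr = 0;

	for (; low_pfn < end_pfn; low_pfn++) {
		struct mock_page *page = mock_pfn_to_page(memmap, low_pfn);

		if (mock_page_buddy(page))
			continue;
		nr++;
	}
	return nr;
}
```

If this reading is right, the load that faulted is the _mapcount read
itself, so the struct page address computed for that pfn was not backed
by a valid mapping at the time.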
> > > Also,
> > > since I was working on the kernel and didn't make a snapshot, I rebuilt
> > > the image using the attached config. That shouldn't change anything
> > > (went back to the same sources), but still -- FYI.
> > >
> >
> > Can you also enable;
> >
> > CONFIG_DEBUG_INFO
> > CONFIG_DEBUG_VM
> >
> > If this works for you, also enable
> >
> > CONFIG_DEBUG_PAGEALLOC
> >
> > The last option should work but it'll also slow your machine quite a
> > bit.
>
> Ok, I'll give it a try.
>
Thanks. With luck, it'll show up a driver that is corrupting memory.
> > > > However, I can't see what this corresponds to. eac0466 is not a commit I
> > > > can identify and the "dirty" implies that it's patched. How does this
> > > > kernel differ from mainline?
> > >
> > > The "-wl" indicates that it's a wireless-testing kernel (John Linville's
> > > repository), but I'm using iwlwifi-2.6 right now. The -dirty indicates
> > > that I've played with it, but only in the wireless code; the diffstat
> > > between this and rc6 indicates that only wireless, bluetooth and some
> > > tiny arch/arm changes are in here.
> > >
> >
> > There is a chance this is a driver bug that is corrupting memory. With
> > the debug options above, it would be worth trying to stress the machine
> > with network traffic with mainline, the wireless testing tree and
> > iwlwifi-2.6 (out of tree driver?) and see does each behave differently.
>
> I'd agree, but it's unlikely to be network -- my laptop doesn't even
> have iwlwifi hardware (which iwlwifi-2.6 contains, not out of tree, but
> our development tree, I just run it out of habit); and I wasn't even
> using wireless at all; networking itself and ethernet drivers are
> untouched in this tree.
>
Ok, good to know. Right now I am leaning towards a buggy graphics driver
or X server corrupting memory, with compaction suffering particularly
badly from it.
--
Mel Gorman
SUSE Labs