From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: pacman@kosh.dhis.org
Cc: Mel Gorman <mel@csn.ul.ie>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
Date: Tue, 19 Oct 2010 21:16:50 +1100
Message-ID: <1287483410.2341.66.camel@pasglop>
In-Reply-To: <20101018213348.10281.qmail@kosh.dhis.org>


> > From there, you might be able to close in on the culprit a bit more, for
> > example, try using the DABR register to set data access breakpoints
> > shortly before the corruption spot. AFAIK, on those old 32-bit CPUs, you
> > can set whether you want it to break on a real or a virtual address.
> 
> I thought of that, but as far as I can tell, this CPU doesn't have DABR.
> /proc/cpuinfo
> processor	: 0
> cpu		: 7447/7457
> clock		: 999.999990MHz
> revision	: 1.1 (pvr 8002 0101)
> bogomips	: 66.66
> timebase	: 33333333
> platform	: CHRP
> model		: Pegasos2
> machine		: CHRP Pegasos2
> Memory		: 512 MB

AFAIK, the 7447 is just a derivative of the 7450 design which -does-
have a DABR ... Unless it's broken :-)
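
For illustration, here is a rough sketch of how a DABR value could be put
together on a classic 32-bit core. The three flag names match the ones in
the Linux powerpc headers as far as I recall; the helper dabr_value() is
hypothetical, and the sketch only computes the value (actually loading it
with mtspr has to happen in kernel context and is not shown). The
"translated" flag corresponds to the real-versus-virtual choice mentioned
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low three bits of the 32-bit DABR; names assumed to match the kernel's. */
#define DABR_TRANSLATION (1u << 2) /* BT: match accesses made with data translation on */
#define DABR_DATA_WRITE  (1u << 1) /* DW: trap on stores */
#define DABR_DATA_READ   (1u << 0) /* DR: trap on loads */

/*
 * Hypothetical helper: build a DABR value for the doubleword containing
 * addr. The breakpoint granularity is 8 bytes, so the low three address
 * bits are dropped and reused for the flags.
 */
static uint32_t dabr_value(uint32_t addr, bool translated,
                           bool on_read, bool on_write)
{
    /* A breakpoint that matches neither loads nor stores is useless. */
    assert(on_read || on_write);

    uint32_t value = addr & ~UINT32_C(7);
    if (translated)
        value |= DABR_TRANSLATION;
    if (on_write)
        value |= DABR_DATA_WRITE;
    if (on_read)
        value |= DABR_DATA_READ;
    return value;
}
```

To catch a stray store through the kernel's virtual mapping, something like
dabr_value(vaddr, true, false, true) would be the value to load, assuming
the culprit is a CPU store and not DMA, which the DABR cannot see.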

> My next thought was: right after the correct value appears in memory, unmap
> the page from the kernel and let it Oops when it tries to write there. Then I
> found out that the kernel is using BATs instead of page tables for its own
> view of memory. Booting with "nobats" completely changes the memory usage
> pattern (probably because it's allocating a lot of pages to hold PTEs that it
> didn't need before)

Right. And that hides the problem, I suppose?

> > You can also sprinkle tests for the page content through the code if
> > that doesn't work to try to "close in" on the culprit (for example if
> > it's a case of stray DMA, like a network driver bug or such).
> 
> No network drivers are loaded when this happens.

Ok.

Cheers,
Ben.


