public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Robert Hancock <hancockr@shaw.ca>
To: Jack Malmostoso <malmostoso@email.it>
Cc: linux-kernel@vger.kernel.org
Subject: Re: MCE on an NForce4 board (again)
Date: Fri, 23 Mar 2007 08:24:14 -0600	[thread overview]
Message-ID: <4603E30E.3050203@shaw.ca> (raw)
In-Reply-To: <fa.7WMpSSoKT5HrVvZw161tkP591C8@ifi.uio.no>

Jack Malmostoso wrote:
> Hi there list,
> 
> this is a repost from the x86-64.org discuss list. I think it could be
> relevant here too, if not please excuse me and forget this message. I am not
> subscribed to the LKML, but I can follow the thread on usenet, so no need to
> CC. Feel free to do it if you think it's relevant!
> 
> Since a couple of days I have been experiencing weird lockups on my AMD64
> machine running Debian Sid.
> The computer is composed like this:
> 
> Asus A8N-E (NForce4 socket 939)
> Athlon64 X2 3800+
> 2x512MB Corsair TwinX
> 2x250GB Sata Western Digital
> 2xDVD/DVDRW on PATA channels
> 
> None of the components has *ever* been overclocked and the whole is one year
> old.
> 
> I have logged on in single mode and tried to find a way to reproduce the
> crash. It has been enough to do a stupid script like
> 
> while true; do
> tar -xjvf linux-2.6.20.3.tar.bz2 && rm -fr linux-2.6.20.3
> done
> 
> and regularly, after one or two cycles, the system would lockup with this
> error:
> 
> CPU 0: Machine Check Exception:		4 Bank 4: b200000000070f0f
> TSC 185ef6d81ca
> 
> It also gave me the hint to use mcelog, so I rebooted and installed it. The
> decoded error read:
> 
> CPU 0 4 northbridge TSC 185ef6d81ca 
>   Northbridge Watchdog error
>        bit57 = processor context corrupt
>        bit61 = error uncorrected
>   bus error 'generic participation, request timed out
>       generic error mem transaction
>       generic access, level generic'
> STATUS b200000000070f0f MCGSTATUS 4
> 
> I googled and found a thread on the x86-64.org discuss list [1] that blamed
> the RAM for the
> problem. So I have started doing some tests:
> 
> 1) If I use only 512MB of my RAM, alternatively, I don't get the error.
> 2) Memtest+ has been running for 10hrs and no errors have been detected.
> I'll have it running for the day just to be sure.
> 
> Additionally, right before the MCE, I could read another error:
> 
> ata3: CPB flags CMD err, flags=0x11
> 
> Googling this brought up some threads in the LKML about the sata_nv driver
> (the ADMA bit, I think).

It means the controller had an error sending commands to the drive. 
There's a possibility that if the controller is taking a long time 
retrying commands, etc. the CPU might complain about a bus transaction 
timing out. I'd check the SATA cable for that drive, maybe try a 
different one.

> 
> Since I am running a 2.6.20 kernel (a testing version from the Debian kernel
> team) I have tried booting older kernels but looks like anything I have
> tried does not work (but this is not your problem).
> 
> So I have booted a livecd (I had around an Ubuntu with a 2.6.12 kernel) and
> with my great surprise the machine worked without problems.
> 
> Now, do you think it would be even slightly possible that some regression in
> the sata_nv module could trigger an MCE? If not, would you put your money on
> the RAM, or could the motherboard be blamed? I hope it's not the CPU ;)

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/


       reply	other threads:[~2007-03-23 14:25 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <fa.7WMpSSoKT5HrVvZw161tkP591C8@ifi.uio.no>
2007-03-23 14:24 ` Robert Hancock [this message]
2007-03-23 10:16 MCE on an NForce4 board (again) Jack Malmostoso
2007-03-23 11:20 ` Avuton Olrich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4603E30E.3050203@shaw.ca \
    --to=hancockr@shaw.ca \
    --cc=linux-kernel@vger.kernel.org \
    --cc=malmostoso@email.it \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox