* MCE on an NForce4 board (again)
@ 2007-03-23 10:16 Jack Malmostoso
2007-03-23 11:20 ` Avuton Olrich
0 siblings, 1 reply; 3+ messages in thread
From: Jack Malmostoso @ 2007-03-23 10:16 UTC
To: linux-kernel
Hi there list,
this is a repost from the x86-64.org discuss list. I think it could be
relevant here too, if not please excuse me and forget this message. I am not
subscribed to the LKML, but I can follow the thread on usenet, so no need to
CC. Feel free to do it if you think it's relevant!
For a couple of days I have been experiencing weird lockups on my AMD64
machine running Debian Sid.
The machine is built from these parts:
Asus A8N-E (NForce4 socket 939)
Athlon64 X2 3800+
2x512MB Corsair TwinX
2x250GB Sata Western Digital
2xDVD/DVDRW on PATA channels
None of the components has *ever* been overclocked, and the whole system is
one year old.
I logged in in single-user mode and looked for a way to reproduce the
crash. It was enough to run a stupid script like
while true; do
    tar -xjvf linux-2.6.20.3.tar.bz2 && rm -fr linux-2.6.20.3
done
and regularly, after one or two cycles, the system would lock up with this
error:
CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
TSC 185ef6d81ca
The message also hinted that I should use mcelog, so I rebooted and installed
it. The decoded error read:
CPU 0 4 northbridge TSC 185ef6d81ca
Northbridge Watchdog error
bit57 = processor context corrupt
bit61 = error uncorrected
bus error 'generic participation, request timed out
generic error mem transaction
generic access, level generic'
STATUS b200000000070f0f MCGSTATUS 4
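For anyone checking a status word like this by hand, here is a minimal sketch in POSIX shell. It assumes the architectural MCi_STATUS layout (bit 63 valid, bit 61 uncorrected, bit 60 enabled, bit 57 processor context corrupt, low 16 bits the MCA error code). Treating bits 19:16 as the K8 northbridge extended error code is an assumption based on my reading of the K8 layout, and decode_mci_status is a hypothetical helper name, not part of mcelog.

```shell
#!/bin/sh
# Sketch: decode a few fields of a 64-bit MCi_STATUS value.
# Argument: the status as exactly 16 hex digits, without a 0x prefix.
# decode_mci_status is a hypothetical helper, not an mcelog command.
decode_mci_status() {
    status=$1
    # Split into 32-bit halves so the arithmetic never exceeds the
    # signed 64-bit range of the shell.
    hi=$((0x${status%????????}))   # bits 63..32
    lo=$((0x${status#????????}))   # bits 31..0
    echo "valid=$(( (hi >> 31) & 1 ))"         # bit 63
    echo "uncorrected=$(( (hi >> 29) & 1 ))"   # bit 61
    echo "enabled=$(( (hi >> 28) & 1 ))"       # bit 60
    echo "pcc=$(( (hi >> 25) & 1 ))"           # bit 57
    # Assumption: on the K8 northbridge bank, bits 19:16 hold the
    # extended error code.
    echo "ext_code=$(( (lo >> 16) & 15 ))"
    printf 'error_code=0x%04x\n' $(( lo & 65535 ))
}

decode_mci_status b200000000070f0f
```

The bit 57 and bit 61 results can be checked against the two "bitNN" lines in the mcelog output above.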
I googled and found a thread on the x86-64.org discuss list [1] that blamed
the RAM for the problem. So I have started doing some tests:
1) If I use only 512MB of my RAM (either module on its own), I don't get the
error.
2) Memtest86+ has been running for 10 hours and no errors have been detected.
I'll have it running for the day just to be sure.
Additionally, right before the MCE, I could read another error:
ata3: CPB flags CMD err, flags=0x11
Googling this brought up some threads in the LKML about the sata_nv driver
(the ADMA bit, I think).
Since I am running a 2.6.20 kernel (a testing version from the Debian kernel
team) I have tried booting older kernels, but it looks like none of the ones
I have tried work (but this is not your problem).
So I booted a live CD (I had an Ubuntu one with a 2.6.12 kernel lying around)
and to my great surprise the machine worked without problems.
Now, do you think it would be even slightly possible that some regression in
the sata_nv module could trigger an MCE? If not, would you put your money on
the RAM, or could the motherboard be blamed? I hope it's not the CPU ;)
Thanks for your help, I hope the information I gave you is enough. Please
feel free to suggest any more tests and diagnostics!
[1] https://www.x86-64.org/pipermail/discuss/2005-March/005902.html
* Re: MCE on an NForce4 board (again)
From: Avuton Olrich @ 2007-03-23 11:20 UTC
To: Jack Malmostoso; +Cc: linux-kernel
On 3/23/07, Jack Malmostoso <malmostoso@email.it> wrote:
> Hi there list,
>
> this is a repost from the x86-64.org discuss list. I think it could be
> relevant here too, if not please excuse me and forget this message. I am not
> subscribed to the LKML, but I can follow the thread on usenet, so no need to
> CC. Feel free to do it if you think it's relevant!
I had this problem before. After much debugging and hair pulling I
contacted the manufacturer, who said they weren't going to try to figure
out my issue due to my unwillingness to run Windows. I went to the store
and replaced the motherboard with the same model, but the problem
persisted. After I replaced the motherboard with a different model, the
problem went away completely.
http://marc.info/?l=linux-kernel&m=113239372109342&w=2
--
avuton
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
* Re: MCE on an NForce4 board (again)
From: Robert Hancock @ 2007-03-23 14:24 UTC
To: Jack Malmostoso; +Cc: linux-kernel
Jack Malmostoso wrote:
> Hi there list,
>
> this is a repost from the x86-64.org discuss list. I think it could be
> relevant here too, if not please excuse me and forget this message. I am not
> subscribed to the LKML, but I can follow the thread on usenet, so no need to
> CC. Feel free to do it if you think it's relevant!
>
> For a couple of days I have been experiencing weird lockups on my AMD64
> machine running Debian Sid.
> The machine is built from these parts:
>
> Asus A8N-E (NForce4 socket 939)
> Athlon64 X2 3800+
> 2x512MB Corsair TwinX
> 2x250GB Sata Western Digital
> 2xDVD/DVDRW on PATA channels
>
> None of the components has *ever* been overclocked, and the whole system is
> one year old.
>
> I logged in in single-user mode and looked for a way to reproduce the
> crash. It was enough to run a stupid script like
>
> while true; do
>     tar -xjvf linux-2.6.20.3.tar.bz2 && rm -fr linux-2.6.20.3
> done
>
> and regularly, after one or two cycles, the system would lock up with this
> error:
>
> CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
> TSC 185ef6d81ca
>
> The message also hinted that I should use mcelog, so I rebooted and installed
> it. The decoded error read:
>
> CPU 0 4 northbridge TSC 185ef6d81ca
> Northbridge Watchdog error
> bit57 = processor context corrupt
> bit61 = error uncorrected
> bus error 'generic participation, request timed out
> generic error mem transaction
> generic access, level generic'
> STATUS b200000000070f0f MCGSTATUS 4
>
> I googled and found a thread on the x86-64.org discuss list [1] that blamed
> the RAM for the problem. So I have started doing some tests:
>
> 1) If I use only 512MB of my RAM (either module on its own), I don't get the
> error.
> 2) Memtest86+ has been running for 10 hours and no errors have been detected.
> I'll have it running for the day just to be sure.
>
> Additionally, right before the MCE, I could read another error:
>
> ata3: CPB flags CMD err, flags=0x11
>
> Googling this brought up some threads in the LKML about the sata_nv driver
> (the ADMA bit, I think).
It means the controller had an error sending commands to the drive.
There's a possibility that if the controller is taking a long time
retrying commands, etc. the CPU might complain about a bus transaction
timing out. I'd check the SATA cable for that drive, maybe try a
different one.
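
If the cable or port is swapped, one way to see whether the errors follow it is to count these messages per port in the kernel log. A minimal sketch, assuming the log lines look like the one quoted above; count_cpb_errors is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: count "CPB flags CMD err" lines per ATA port.
# Reads kernel log text on stdin, prints "<port> <count>" sorted by port.
# count_cpb_errors is a hypothetical helper name.
count_cpb_errors() {
    awk '/CPB flags CMD err/ {
        # Find the "ataN:" field; a timestamp may precede it.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^ata[0-9]+:$/) {
                port = $i
                sub(/:$/, "", port)
                count[port]++
                break
            }
        }
    }
    END { for (port in count) print port, count[port] }' | sort
}

# Usage (reading the kernel log may need root on some systems):
#   dmesg | count_cpb_errors
```

If the count moves to a different ataN after swapping cables, that points at the cable or drive rather than the controller port.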
>
> Since I am running a 2.6.20 kernel (a testing version from the Debian kernel
> team) I have tried booting older kernels, but it looks like none of the ones
> I have tried work (but this is not your problem).
>
> So I booted a live CD (I had an Ubuntu one with a 2.6.12 kernel lying around)
> and to my great surprise the machine worked without problems.
>
> Now, do you think it would be even slightly possible that some regression in
> the sata_nv module could trigger an MCE? If not, would you put your money on
> the RAM, or could the motherboard be blamed? I hope it's not the CPU ;)
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/