linux-raid.vger.kernel.org archive mirror
* random lockups, raid problems SOLVED (plus a question)
@ 2005-12-06 20:19 Michael Stumpf
From: Michael Stumpf @ 2005-12-06 20:19 UTC
  To: linux-raid

Sure it's a FAQ.  It's probably even documented.  And, I know it, but it 
still surprised me.  Such is life:

2 of the 3 sticks of perfectly good ECC RAM in an old server-class P3 board 
have apparently gone bad.  Result?  Random lockups/reboots with nothing 
in the system logs to even lend a clue.

Memtest86 showed one problem immediately, and after some time, exposed 
some more.  Remove the bad memory and it works fine.

Is there some daemon that can more actively monitor memory function?  I 
must have had this problem for months, but with sputtering hard drives 
that were slowly dying and causing very similar problems, this diagnosis 
got muddled.

Regards-
Michael Stumpf


* Re: random lockups, raid problems SOLVED (plus a question)
From: Mattias Wadenstein @ 2005-12-07  7:49 UTC
  To: Michael Stumpf; +Cc: linux-raid

On Tue, 6 Dec 2005, Michael Stumpf wrote:

> Sure it's a FAQ.  It's probably even documented.  And, I know it, but it 
> still surprised me.  Such is life:
>
> 2 of the 3 sticks of perfectly good ECC RAM in an old server-class P3 board 
> have apparently gone bad.  Result?  Random lockups/reboots with nothing in 
> the system logs to even lend a clue.
>
> Memtest86 showed one problem immediately, and after some time, exposed some 
> more.  Remove the bad memory and it works fine.
>
> Is there some daemon that can more actively monitor memory function?  I must 
> have had this problem for months, but with sputtering hard drives that were 
> slowly dying and causing very similar problems, this diagnosis got muddled.

1. Run Memtest86 if you experience instability.

2. Use a system that supports ECC and enable it in the BIOS; that way you 
are likely to get a proper machine fault instead of lockups, etc.

Note, though, that bugs in these systems are very common: our last 3 
clusters over the last 4 years have all had BIOS errata where ECC was 
shown as enabled but was not actually enabled.

/Mattias Wadenstein
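
On the question of a daemon that monitors memory more actively: later Linux 
kernels include an EDAC (Error Detection and Correction) subsystem that, given 
a driver for the board's memory controller, exposes per-controller counters of 
corrected (ce_count) and uncorrected (ue_count) ECC errors under 
/sys/devices/system/edac/mc/. The Python script below is a minimal sketch 
under that assumption, not a tool from this thread, and the helper names 
read_count and edac_counts are hypothetical. Run periodically (e.g. from 
cron), a rising corrected count can flag a failing stick before it causes 
lockups. Finding no memory controllers at all is only a hint that ECC 
reporting is unavailable or, given the BIOS errata above, not really enabled.

```python
import glob
import os

# Assumption: a Linux kernel with an EDAC driver loaded for this machine's
# memory controller. EDAC exposes one directory per controller (mc0, mc1, ...).
EDAC_ROOT = "/sys/devices/system/edac/mc"


def read_count(path):
    """Return the integer stored in a sysfs counter file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def edac_counts(root=EDAC_ROOT):
    """Map each memory controller name to its (corrected, uncorrected) error counts."""
    counts = {}
    for mc in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        ce = read_count(os.path.join(mc, "ce_count"))  # corrected ECC errors
        ue = read_count(os.path.join(mc, "ue_count"))  # uncorrected ECC errors
        counts[os.path.basename(mc)] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = edac_counts()
    if not counts:
        # No controllers found: EDAC is missing, or ECC may not be active.
        print("No EDAC memory controllers found; ECC reporting may be unavailable.")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

Any nonzero uncorrected count is serious; corrected counts that keep growing 
between runs point at a stick worth pulling and testing with Memtest86.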
