From: Michal Szymanski <msz@astrouw.edu.pl>
To: SMP list <linux-smp@vger.kernel.org>
Subject: Re: FC4 crashes repeatedly on Supermicro AS1020A-T dual-core Opterons, SMP
Date: Fri, 12 May 2006 12:54:36 +0200
Message-ID: <20060512105436.GA16850@astrouw.edu.pl>
In-Reply-To: <20060505152344.GA8408@boogeyman>
On Fri, May 05, 2006 at 08:23:44AM -0700, cerise@armory.com wrote:
> > Michal Szymanski wrote:
> >
> > >All systems crash (either hang with some "machine check exception"
> > >kernel messages or reset) when loaded with repeating runs of 1.3gb, CPU
> > >intensive with some I/O. I run 2 or 4 jobs simultaneously and they had
> > >never survived more than a few hours.
>
> Let's try the easy stuff first -- if it's crashing with a machine check
> exception, then let's disable machine check exceptions, and see if things
> still break.
>
> Try booting with the parameter "nomce". Be aware that mce is a mechanism
> for the processor to inform the kernel of thermal issues or component
> failure. You'll only want to disable this mechanism if you aren't having
> thermal problems.
I tried "nomce". The machine no longer halts with MCE kernel panic
messages on screen, but it still resets after 3-4 hours of work under 2 or
more jobs.
As I wrote in a response to Robert's message, it seems to be a memory
issue, as there are no crashes with Kingston 1GB memory modules.
One of the machines and the memory went back to the dealer for tests.
> P.S. I came a little late to this party -- I didn't see the original message.
> Did you include the text of the kernel crash?
Below is the kernel message, OCR-ed from a digital photo of the screen :)
Plus the decoded message, as advised by the first message:
Fedora Core release 4 (Stentz)
kernel 2.6.16-1.2069_FC4smp on an x86_64
red10 login:
HARDWARE ERROR
CPU 0: Machine Check Exception: 4 Bank 4: f604a00200000813
TSC 1504205a42ba ADDR 115e47828
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Machine check
Call Trace: <#MC>
<ffffffff80134e6a>{panic+133} <ffffffff801129eb>{mcheck_timer+0}
<ffffffff801131fc>{do_machine_check+753}
<ffffffff8010be43>{machine_check+127} <EOE>
------------------
mcelog --ascii output:
HARDWARE ERROR
CPU 0 BANK 4 TSC 1504205a42ba
MCG status:MCIP
MCi status:
Error overflow
Uncorrected error
Error enabled
MCi_ADDR register valid
Processor context corrupt
MCA:BUS Generic Originated-request Read Memory-access Request-timeout Error
Model:
STATUS f604a00200000813 MCGSTATUS 4
------------------
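For reference, the flag lines in the mcelog output above can be reproduced by hand from the two raw register values. This is only a minimal Python sketch, assuming the standard x86 machine-check layout (MCi_STATUS flags in bits 63-57, MCG_STATUS flags in bits 0-2). The table and function names are made up for illustration, and the low 16-bit MCA error code is deliberately left undecoded.

```python
# Sketch: decode the flag bits of the raw values printed in the panic message.
# Assumption: standard x86 MCA bit positions; names below are illustrative.

MCI_STATUS_FLAGS = [
    (63, "VAL"),    # register contents valid
    (62, "OVER"),   # error overflow
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error enabled
    (59, "MISCV"),  # MCi_MISC register valid
    (58, "ADDRV"),  # MCi_ADDR register valid
    (57, "PCC"),    # processor context corrupt
]

MCG_STATUS_FLAGS = [
    (0, "RIPV"),    # restart IP valid
    (1, "EIPV"),    # error IP valid
    (2, "MCIP"),    # machine check in progress
]


def decode_flags(value, table):
    """Return the names of all flags in `table` whose bit is set in `value`."""
    return [name for bit, name in table if (value >> bit) & 1]


status = 0xF604A00200000813   # "STATUS" from the mcelog output
mcgstatus = 0x4               # "MCGSTATUS" from the mcelog output

print("MCi status:", decode_flags(status, MCI_STATUS_FLAGS))
print("MCG status:", decode_flags(mcgstatus, MCG_STATUS_FLAGS))
```

The set flags agree with what mcelog printed: overflow, uncorrected, enabled, address valid and processor context corrupt for the bank status, and MCIP alone for the global status.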
regards, Michal.
--
Michal Szymanski (msz at astrouw dot edu dot pl)
Warsaw University Observatory, Warszawa, POLAND