Symmetric Multiprocessing (SMP) development
From: Michal Szymanski <msz@astrouw.edu.pl>
To: SMP list <linux-smp@vger.kernel.org>
Subject: Re: FC4 crashes repeatedly on Supermicro AS1020A-T dual-core Opterons, SMP
Date: Fri, 12 May 2006 12:54:36 +0200
Message-ID: <20060512105436.GA16850@astrouw.edu.pl>
In-Reply-To: <20060505152344.GA8408@boogeyman>

On Fri, May 05, 2006 at 08:23:44AM -0700, cerise@armory.com wrote:
> > Michal Szymanski wrote:
> >
> > >All systems crash (either hang with some "machine check exception"
> > >kernel messages or reset) when loaded with repeating runs of 1.3gb, CPU
> > >intensive with some I/O. I run 2 or 4 jobs simultaneously and they had
> > >never survived more than a few hours.
> 
> Let's try the easy stuff first -- if it's crashing with a machine check
> exception, then let's disable machine check exceptions, and see if things
> still break.
> 
> Try booting with the parameter "nomce".  Be aware that mce is a mechanism
> for the processor to inform the kernel of thermal issues or component 
> failure.  You'll only want to disable this mechanism if you aren't having
> thermal problems.  

I tried "nomce". The machine no longer halts with MCE kernel panic
messages on screen, but it still resets after 3-4 hours of running 2 or
more jobs.

As I wrote in a response to Robert's message, it seems to be a memory
issue, as there are no crashes with Kingston 1GB memory modules.
One of the machines and the memory went back to the dealer for tests.

> P.S.  I came a little late to this party -- I didn't see the original message.
> Did you include the text of the kernel crash?

Below is the kernel message, OCR-ed from a digital photo of the screen :)
It is followed by the decoded output, as advised by the first message:

Fedora Core release 4 (Stentz)
kernel 2.6.16-1.2069_FC4smp on an x86_64

red10 login:
HARDWARE ERROR
        CPU 0: Machine Check Exception: 4 Bank 4: f604a00200000813
TSC 1504205a42ba ADDR 115e47828
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Machine check

Call Trace: <#MC> 
     <ffffffff80134e6a>{panic+133} <ffffffff801129eb>{mcheck_timer+0}
     <ffffffff801131fc>{do_machine_check+753} 
     <ffffffff8010be43>{machine_check+127} <EOE>

------------------

mcelog --ascii  output:

HARDWARE ERROR
CPU 0 BANK 4 TSC 1504205a42ba 
MCG status:MCIP 
MCi status:
Error overflow
Uncorrected error
Error enabled
MCi_ADDR register valid
Processor context corrupt
MCA:BUS Generic Originated-request Read Memory-access Request-timeout Error
Model:
STATUS f604a00200000813 MCGSTATUS 4
------------------
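
For reference, the flag lines that mcelog prints above can be recovered by
hand from the STATUS and MCGSTATUS words. The following is only a minimal
sketch, assuming the architectural x86 layout of the upper MCi_STATUS bits
(63 = valid, 62 = overflow, 61 = uncorrected, 60 = enabled, 59 = MISC valid,
58 = ADDR valid, 57 = processor context corrupt) and of MCG_STATUS (bit 0 =
RIPV, bit 1 = EIPV, bit 2 = MCIP). The table and function names are made up
for illustration and are not part of mcelog. The model-specific bits and the
sub-fields of the low 16-bit error code are deliberately left undecoded.

```python
# Assumed architectural bit positions; labels mirror the mcelog wording above.
STATUS_FLAGS = [
    (63, "Valid"),
    (62, "Error overflow"),
    (61, "Uncorrected error"),
    (60, "Error enabled"),
    (59, "MCi_MISC register valid"),
    (58, "MCi_ADDR register valid"),
    (57, "Processor context corrupt"),
]
MCG_FLAGS = [(0, "RIPV"), (1, "EIPV"), (2, "MCIP")]


def decode_flags(value, table):
    """Return the names of all flags in `table` whose bit is set in `value`."""
    return [name for bit, name in table if (value >> bit) & 1]


def mca_error_code(status):
    """Return the low 16 bits of MCi_STATUS (the MCA error code), undecoded."""
    return status & 0xFFFF


if __name__ == "__main__":
    status = 0xF604A00200000813   # STATUS value from the report above
    mcg_status = 0x4              # MCGSTATUS value from the report above

    print("MCG status:", " ".join(decode_flags(mcg_status, MCG_FLAGS)))
    for name in decode_flags(status, STATUS_FLAGS):
        print(name)
    print("MCA error code: 0x%04x" % mca_error_code(status))
```

Run on the values above, this lists the same five conditions that mcelog
reports (overflow, uncorrected, enabled, ADDR valid, context corrupt), plus
the valid bit, which mcelog does not print as a separate line.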

regards, Michal.

-- 
  Michal Szymanski (msz at astrouw dot edu dot pl)
  Warsaw University Observatory, Warszawa, POLAND
