public inbox for linux-kernel@vger.kernel.org
* mcelog ?
@ 2006-05-15  9:42 Stephan von Krawczynski
  2006-05-15 15:20 ` thockin
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Stephan von Krawczynski @ 2006-05-15  9:42 UTC (permalink / raw)
  To: linux-kernel

Hello,

Could some kind soul please briefly explain what this message tells me:

HARDWARE ERROR
CPU 1: Machine Check Exception:                4 Bank 4: b60a200170080813
TSC 89cfb4725b17 ADDR 1025cb3f0 
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Machine check



Of course I ran mcelog, but I don't quite understand how the additional info
helps me find the problem.
Is this a problem with RAM? And if so, which module?

The box is a dual Opteron with two banks of memory (4 sockets each), each socket
holding a 1 GB memory module.

Thanks for any hints.
-- 
Regards,
Stephan



* Re: mcelog ?
  2006-05-15  9:42 mcelog ? Stephan von Krawczynski
@ 2006-05-15 15:20 ` thockin
  2006-05-16  9:37   ` Stephan von Krawczynski
  2006-05-15 15:45 ` Andi Kleen
  2006-05-20 22:46 ` Bernd Pfrommer
  2 siblings, 1 reply; 5+ messages in thread
From: thockin @ 2006-05-15 15:20 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel

On Mon, May 15, 2006 at 11:42:43AM +0200, Stephan von Krawczynski wrote:
> HARDWARE ERROR
> CPU 1: Machine Check Exception:                4 Bank 4: b60a200170080813
> TSC 89cfb4725b17 ADDR 1025cb3f0 
> This is not a software problem!
> Run through mcelog --ascii to decode and contact your hardware vendor
> Kernel panic - not syncing: Machine check
> 
> Of course I ran mcelog, but I don't quite understand how the additional info
> helps me find the problem.
> Is this a problem with RAM? And if so, which module?

It sounds like a memory error, but there are some other bank4 errors that
can crop up.  What does mcedecode say?


* Re: mcelog ?
  2006-05-15  9:42 mcelog ? Stephan von Krawczynski
  2006-05-15 15:20 ` thockin
@ 2006-05-15 15:45 ` Andi Kleen
  2006-05-20 22:46 ` Bernd Pfrommer
  2 siblings, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2006-05-15 15:45 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel

Stephan von Krawczynski <skraw@ithnet.com> writes:

> This is not a software problem!
> Run through mcelog --ascii to decode and contact your hardware vendor

Since when is linux-kernel your hardware vendor?
Would it help if the message was written all upper case?

-Andi


* Re: mcelog ?
  2006-05-15 15:20 ` thockin
@ 2006-05-16  9:37   ` Stephan von Krawczynski
  0 siblings, 0 replies; 5+ messages in thread
From: Stephan von Krawczynski @ 2006-05-16  9:37 UTC (permalink / raw)
  To: thockin; +Cc: linux-kernel

On Mon, 15 May 2006 08:20:08 -0700
thockin@hockin.org wrote:

> On Mon, May 15, 2006 at 11:42:43AM +0200, Stephan von Krawczynski wrote:
> > HARDWARE ERROR
> > CPU 1: Machine Check Exception:                4 Bank 4: b60a200170080813
> > TSC 89cfb4725b17 ADDR 1025cb3f0 
> > This is not a software problem!
> > Run through mcelog --ascii to decode and contact your hardware vendor
> > Kernel panic - not syncing: Machine check
> > 
> > Of course I ran mcelog, but I don't quite understand how the additional info
> > helps me find the problem.
> > Is this a problem with RAM? And if so, which module?
> 
> It sounds like a memory error, but there are some other bank4 errors that
> can crop up.  What does mcedecode say?

Well, here it is:

HARDWARE ERROR
CPU 1 4 northbridge TSC 89cfb4725b17 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 7014
       bit32 = err cpu0
       bit45 = uncorrected ecc error
       bit57 = processor context corrupt
       bit61 = error uncorrected
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS b60a200170080813 MCGSTATUS 4
This is not a software problem!

 
Is this some sort of mem error?

Thank you for your help
-- 
Regards,
Stephan
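
[Editorial sketch, not part of the original message.] The decoded output in
the message above can be reproduced by picking bit fields out of the STATUS
word by hand. The flag bits 63..57 follow the generic x86 MCi_STATUS layout.
The remaining positions (bit 45 for the uncorrected ECC flag, bits 19:16 for
the extended error code, and the syndrome split across bits 31:24 and 54:47)
are assumptions about the Opteron northbridge bank, chosen because they
reproduce the syndrome 7014 and the bus-error text shown above. The table and
function names are hypothetical.

```python
# Hedged sketch: decode selected fields of the bank 4 STATUS word from this thread.
STATUS = 0xB60A200170080813

# Single-bit flags; bits 46 and 45 are assumed northbridge-specific positions.
FLAGS = {
    63: "valid",
    62: "overflow",
    61: "error uncorrected",
    60: "error reporting enabled",
    59: "MISC register valid",
    58: "ADDR register valid",
    57: "processor context corrupt",
    46: "corrected ecc error",
    45: "uncorrected ecc error",
}

# Sub-fields of a bus error code, bit pattern 0000 1PPT RRRR IILL.
PP = {0: "local node origin", 1: "local node response",
      2: "observed as third party", 3: "generic"}
RRRR = {0: "generic", 1: "generic read", 2: "generic write", 3: "data read",
        4: "data write", 5: "instruction fetch", 6: "prefetch", 7: "evict",
        8: "snoop"}
II = {0: "mem", 2: "io", 3: "generic"}
LL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_status(status):
    """Return a dict of the fields discussed in the thread."""
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF  # architectural MCA error code
    result = {
        "flags": flags,
        "error_code": code,
        "ext_error_code": (status >> 16) & 0xF,
        # Assumed layout: high syndrome byte in 31:24, low byte in 54:47.
        "syndrome": (((status >> 24) & 0xFF) << 8) | ((status >> 47) & 0xFF),
    }
    if (code & 0xF800) == 0x0800:  # top five bits 00001 mark a bus error
        result["bus"] = {
            "participation": PP[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 1),
            "request": RRRR.get((code >> 4) & 0xF, "reserved"),
            "transaction": II.get((code >> 2) & 0x3, "reserved"),
            "level": LL[code & 0x3],
        }
    return result


if __name__ == "__main__":
    for key, value in decode_status(STATUS).items():
        print(key, value)
```

Under these assumptions the word decodes as a valid, uncorrected ECC error on
a memory read, which is consistent with a memory error. The STATUS word alone
does not say which module is affected; that would need the ADDR value mapped
through the board's memory configuration.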


* Re: mcelog ?
  2006-05-15  9:42 mcelog ? Stephan von Krawczynski
  2006-05-15 15:20 ` thockin
  2006-05-15 15:45 ` Andi Kleen
@ 2006-05-20 22:46 ` Bernd Pfrommer
  2 siblings, 0 replies; 5+ messages in thread
From: Bernd Pfrommer @ 2006-05-20 22:46 UTC (permalink / raw)
  To: linux-kernel

Stephan von Krawczynski <skraw@ithnet.com> writes:

> 
> Hello,
> 
> Could some kind soul please briefly explain what this message tells me:
> 
> HARDWARE ERROR
> CPU 1: Machine Check Exception:                4 Bank 4: b60a200170080813
> TSC 89cfb4725b17 ADDR 1025cb3f0 
> This is not a software problem!
> Run through mcelog --ascii to decode and contact your hardware vendor
> Kernel panic - not syncing: Machine check
> 
> Of course I ran mcelog, but I don't quite understand how the additional info
> helps me find the problem.
> Is this a problem with RAM? And if so, which module?
> 
> The box is a dual Opteron with two banks of memory (4 sockets each), each socket
> holding a 1 GB memory module.
> 
> Thanks for any hints.


I got a very similar error on a Supermicro H8QC8+ (4-way dual-core Opteron)
during heavy disk writes. It has only happened once so far. The error message
also mentioned
4 Bank 4: b608a00100000813 (strange that the last 4 digits agree).

Bernd
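
[Editorial sketch, not part of the original message.] A likely reason the last
four digits agree: the low 16 bits of a machine check STATUS word hold the
architectural error code, so two reports of the same error class share them
even when other fields differ. The snippet below compares the two values
quoted in this thread. The position of the extended error code (bits 19:16)
and of the uncorrected ECC flag (bit 45) are assumptions about the Opteron
northbridge bank, and the helper name is hypothetical.

```python
# Hedged sketch: compare the two bank 4 STATUS words reported in this thread.
STEPHAN = 0xB60A200170080813
BERND = 0xB608A00100000813


def fields(status):
    """Extract the fields that are compared below."""
    return {
        "error_code": status & 0xFFFF,           # architectural MCA error code
        "ext_error_code": (status >> 16) & 0xF,  # assumed northbridge field
        "uncorrected": bool((status >> 61) & 1),
        "uecc": bool((status >> 45) & 1),        # assumed northbridge flag
    }


a, b = fields(STEPHAN), fields(BERND)
same = sorted(k for k in a if a[k] == b[k])
different = sorted(k for k in a if a[k] != b[k])

print("same:", same)
print("different:", {k: (hex(a[k]), hex(b[k])) for k in different})
```

Both words carry error code 0x0813 and both flag an uncorrected error. They
differ in the extended error code (8 versus 0), which under the assumed layout
separates the Chipkill ECC report from a plain ECC report.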




