public inbox for linux-kernel@vger.kernel.org
* CPU error codes
@ 2001-01-24 19:30 James Simmons
  2001-01-25  1:34 ` Alan Cox
  0 siblings, 1 reply; 10+ messages in thread
From: James Simmons @ 2001-01-24 19:30 UTC
  To: Linux Kernel Mailing List


I was wondering if someone could tell me where I can find
Pentium III Xeon CPU error messages/codes.

I have a machine that crashed with:
kernel: CPU 3: Machine Check Exception: 0000000000000004
kernel: Bank 1: b200000000000175<0>Kernel panic: CPU context


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: CPU error codes
  2001-01-24 19:30 CPU error codes James Simmons
@ 2001-01-25  1:34 ` Alan Cox
  2001-01-25  9:14   ` James Sutherland
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Cox @ 2001-01-25  1:34 UTC
  To: James Simmons; +Cc: Linux Kernel Mailing List

> I was wondering if someone could tell me where I can find
> Xeon Pentium III cpu error messages/codes

In the Intel databook. Generally an MCE indicates hardware/power/cooling
issues.

* Re: CPU error codes
  2001-01-25  1:34 ` Alan Cox
@ 2001-01-25  9:14   ` James Sutherland
  2001-01-28 21:29     ` H. Peter Anvin
  2001-01-31 15:23     ` Alan Cox
  0 siblings, 2 replies; 10+ messages in thread
From: James Sutherland @ 2001-01-25  9:14 UTC
  To: Alan Cox; +Cc: James Simmons, Linux Kernel Mailing List

On Thu, 25 Jan 2001, Alan Cox wrote:

> > I was wondering if someone could tell me where I can find
> > Xeon Pentium III cpu error messages/codes
> 
> In the intel databook. Generally an MCE indicates hardware/power/cooling
> issues

Doesn't an MCE also cover some hardware memory problems - parity/ECC
issues etc?


James.


* Re: CPU error codes
  2001-01-25  9:14   ` James Sutherland
@ 2001-01-28 21:29     ` H. Peter Anvin
  2001-01-31 15:23     ` Alan Cox
  1 sibling, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2001-01-28 21:29 UTC
  To: linux-kernel

Followup to:  <Pine.SOL.4.21.0101250913590.15936-100000@orange.csi.cam.ac.uk>
By author:    James Sutherland <jas88@cam.ac.uk>
In newsgroup: linux.dev.kernel
>
> On Thu, 25 Jan 2001, Alan Cox wrote:
> 
> > > I was wondering if someone could tell me where I can find
> > > Xeon Pentium III cpu error messages/codes
> > 
> > In the intel databook. Generally an MCE indicates hardware/power/cooling
> > issues
> 
> Doesn't an MCE also cover some hardware memory problems - parity/ECC
> issues etc?
> 

Not main memory, but it does include cache errors.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

* Re: CPU error codes
  2001-01-25  9:14   ` James Sutherland
  2001-01-28 21:29     ` H. Peter Anvin
@ 2001-01-31 15:23     ` Alan Cox
  2001-01-31 21:24       ` James Sutherland
  2001-02-06 20:39       ` Carlos Carvalho
  1 sibling, 2 replies; 10+ messages in thread
From: Alan Cox @ 2001-01-31 15:23 UTC
  To: James Sutherland; +Cc: Alan Cox, James Simmons, Linux Kernel Mailing List

> > In the intel databook. Generally an MCE indicates hardware/power/cooling
> > issues
> 
> Doesn't an MCE also cover some hardware memory problems - parity/ECC
> issues etc?

Parity/ECC on main memory is reported by the chipset and needs separate
drivers or apps to handle this

* Re: CPU error codes
  2001-01-31 15:23     ` Alan Cox
@ 2001-01-31 21:24       ` James Sutherland
  2001-01-31 21:33         ` Dan Hollis
  2001-02-06 20:39       ` Carlos Carvalho
  1 sibling, 1 reply; 10+ messages in thread
From: James Sutherland @ 2001-01-31 21:24 UTC
  To: Alan Cox; +Cc: James Simmons, Linux Kernel Mailing List

On Wed, 31 Jan 2001, Alan Cox wrote:

> > > In the intel databook. Generally an MCE indicates hardware/power/cooling
> > > issues
> > 
> > Doesn't an MCE also cover some hardware memory problems - parity/ECC
> > issues etc?
> 
> Parity/ECC on main memory is reported by the chipset and needs separate
> drivers or apps to handle this

Yes - MCE only covers errors in the CPU's cache, IIRC? (Is there still an
NMI on main memory parity errors, or has this changed on modern
chipsets? Presumably ECC is handled differently, being recoverable??)


James.


* Re: CPU error codes
  2001-01-31 21:24       ` James Sutherland
@ 2001-01-31 21:33         ` Dan Hollis
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Hollis @ 2001-01-31 21:33 UTC
  To: James Sutherland; +Cc: Alan Cox, James Simmons, Linux Kernel Mailing List

On Wed, 31 Jan 2001, James Sutherland wrote:
> On Wed, 31 Jan 2001, Alan Cox wrote:
> > Parity/ECC on main memory is reported by the chipset and needs separate
> > drivers or apps to handle this
> Yes - MCE only covers errors in the CPU's cache, IIRC? (Is there still an
> NMI on main memory parity errors, or has this changed on modern
> chipsets? Presumably ECC is handled differently, being recoverable??)

You can program the northbridge to generate NMI or not, on ECC errors.
Most chipsets still need to scrub memory after an error to reset ECC bits.

-Dan


* Re: CPU error codes
  2001-01-31 15:23     ` Alan Cox
  2001-01-31 21:24       ` James Sutherland
@ 2001-02-06 20:39       ` Carlos Carvalho
  2001-02-07  4:21         ` H. Peter Anvin
  2001-02-07  8:53         ` Alan Cox
  1 sibling, 2 replies; 10+ messages in thread
From: Carlos Carvalho @ 2001-02-06 20:39 UTC
  To: Alan Cox; +Cc: Linux Kernel Mailing List

Alan Cox (alan@lxorguk.ukuu.org.uk) wrote on 31 January 2001 15:23:
 >> > In the intel databook. Generally an MCE indicates hardware/power/cooling
 >> > issues
 >> 
 >> Doesn't an MCE also cover some hardware memory problems - parity/ECC
 >> issues etc?
 >
 >Parity/ECC on main memory is reported by the chipset and needs separate
 >drivers or apps to handle this

Really? I thought it could be because of RAM. Here's the story:

The kernel is 2.2.18pre24.

I'm getting this VERY frequently (sometimes once a day, sometimes once
a week, sometimes twice a day, on a heavily used machine):

CPU 1: Machine Check Exception: 0000000000000004
Bank 4: b200000000040151<0>Kernel panic: CPU context corrupt

CPU 0: Machine Check Exception: 0000000000000004
Bank 4: b200000000040151<0>Kernel panic: CPU context corrupt

CPU 0: Machine Check Exception: 0000000000000004
Bank 4: b200000000040151<0>Kernel panic: CPU context corrupt

This is on an ASUS P2B-DS with two PIII 700MHz and 100MHz FSB, 1GB of
RAM. The MCE happens with both processors (the above is just part of
it).

I've already changed the motherboard and processors, and it continued.
Then I changed the memory, and it still continues. I also changed the
power supply just in case, to no avail...

It happens with PC100 and PC133 memory. I increased the memory latency
(the SPD says it's CL2, I put it 3T and 10T DRAM) but the problem
persists.

Since I changed the main board and processor, I think the most likely
cause is RAM. It seems the x86 can access RAM directly, so if there's
an NMI there what will happen?

This is happening on a CRITICAL machine, so any help will be much
appreciated.

* Re: CPU error codes
  2001-02-06 20:39       ` Carlos Carvalho
@ 2001-02-07  4:21         ` H. Peter Anvin
  2001-02-07  8:53         ` Alan Cox
  1 sibling, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2001-02-07  4:21 UTC
  To: linux-kernel

Followup to:  <14976.24819.276892.26475@hoggar.fisica.ufpr.br>
By author:    Carlos Carvalho <carlos@fisica.ufpr.br>
In newsgroup: linux.dev.kernel
> 
> Really? I thought it could be because of RAM. Here's the story:
> 
> The kernel is 2.2.18pre24.
> 
> I'm getting this VERY frequently (sometimes once a day, sometimes once
> a week, sometimes twice a day, on a heavily used machine):
> 
> CPU 1: Machine Check Exception: 0000000000000004
> Bank 4: b200000000040151<0>Kernel panic: CPU context corrupt
> 
> CPU 0: Machine Check Exception: 0000000000000004
> Bank 4: b200000000040151<0>Kernel panic: CPU context corrupt
> 
> CPU 0: Machine Check Exception: 0000000000000004
> Bank 4: b200000000040151<0>Kernel panic: CPU context corrupt
> 
> This is on an ASUS P2B-DS with two PIII 700MHz and 100MHz FSB, 1GB of
> RAM. The MCE happens with both processors (the above is just part of
> it).
> 
> I've already changed the motherboard and processors, and it continued.
> Then I changed the memory, and it still continues. I also changed the
> power supply just in case, to no avail...
> 
> It happens with PC100 and PC133 memory. I increased the memory latency
> (the SPD says it's CL2, I put it 3T and 10T DRAM) but the problem
> persists.
> 
> Since I changed the main board and processor, I think the most likely
> cause is RAM. It seems the x86 can access RAM directly, so if there's
> an NMI there what will happen?
> 

Much more likely is that your CPU is bad, or overclocked.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

* Re: CPU error codes
  2001-02-06 20:39       ` Carlos Carvalho
  2001-02-07  4:21         ` H. Peter Anvin
@ 2001-02-07  8:53         ` Alan Cox
  1 sibling, 0 replies; 10+ messages in thread
From: Alan Cox @ 2001-02-07  8:53 UTC
  To: Carlos Carvalho; +Cc: Alan Cox, Linux Kernel Mailing List

> Really? I thought it could be because of RAM. Here's the story:

RAM talks to the chipset so I don't think it could (unless it confused the
chipset).

> CPU 1: Machine Check Exception: 0000000000000004
> Bank 4: b200000000040151<0>Kernel panic: CPU context corrupt

OK, that decodes as:
	Status valid
	Uncorrected Error
	Error Enabled
	Processor Context Corrupt

Memory Hierarchy Error
	Instruction Fetch
	L1 cache

More than that I can't really say. Power and heat problems can certainly
trigger MCEs. I don't know if I/O devices can influence them.
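
The decode above can be reproduced mechanically. What follows is only a
rough sketch, not kernel code: it assumes the "Bank N:" value is the raw
IA32_MCi_STATUS register and the "Machine Check Exception:" value is
IA32_MCG_STATUS, and it uses the architectural bit layout from the Intel
manuals. All function and table names are made up for illustration, and
only the "memory hierarchy" error-code form (0000 0001 RRRR TTLL) is
recognized.

```python
# Illustrative decoder for the two hex values in the panic message.
# Assumption: bit positions follow Intel's architectural MCA layout.

STATUS_FLAGS = [
    (63, "Status valid"),
    (62, "Error overflow"),
    (61, "Uncorrected error"),
    (60, "Error enabled"),
    (59, "MISC register valid"),
    (58, "ADDR register valid"),
    (57, "Processor context corrupt"),
]

MCG_FLAGS = [
    (0, "Restart IP valid"),
    (1, "Error IP valid"),
    (2, "Machine check in progress"),
]

# Sub-fields of a memory hierarchy error code: 0000 0001 RRRR TTLL
REQUEST = {
    0b0000: "Generic error", 0b0001: "Generic read", 0b0010: "Generic write",
    0b0011: "Data read", 0b0100: "Data write", 0b0101: "Instruction fetch",
    0b0110: "Prefetch", 0b0111: "Eviction", 0b1000: "Snoop",
}
TRANSACTION = {0b00: "Instruction", 0b01: "Data", 0b10: "Generic"}
LEVEL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "Generic"}


def set_flags(value, table):
    """Return the names of all flag bits from `table` that are set."""
    return [name for bit, name in table if (value >> bit) & 1]


def decode_mcg_status(value):
    """Decode the global status word (the 'Machine Check Exception' value)."""
    return set_flags(value, MCG_FLAGS)


def decode_mci_status(status):
    """Decode one bank status word (the 'Bank N' value)."""
    code = status & 0xFFFF                    # bits 15:0, MCA error code
    result = {
        "flags": set_flags(status, STATUS_FLAGS),
        "mca_code": code,
        "model_specific": (status >> 16) & 0xFFFF,   # bits 31:16
    }
    if (code >> 8) == 0x01:                   # 0000 0001 RRRR TTLL
        result["class"] = "Memory hierarchy error"
        result["request"] = REQUEST.get((code >> 4) & 0xF, "Reserved")
        result["transaction"] = TRANSACTION.get((code >> 2) & 0x3, "Reserved")
        result["level"] = LEVEL[code & 0x3]
    return result


if __name__ == "__main__":
    print(decode_mcg_status(0x0000000000000004))
    print(decode_mci_status(0xB200000000040151))
```

Under those assumptions, b200000000040151 comes out as valid,
uncorrected, enabled, context corrupt, with a memory hierarchy error on
an instruction fetch at L1, which agrees with the list above. The same
sketch applied to the b200000000000175 value from the first message in
the thread gives the same four flags with an eviction of data at L1, and
a global status of 4 has only the "machine check in progress" bit set.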

