Simple question re: oops

public inbox for linux-kernel@vger.kernel.org

* Simple question re: oops
@ 2005-07-30 23:48 Lee Revell
  2005-07-31  0:10 ` Lee Revell
  2005-07-31  0:11 ` Alexander Nyberg
  0 siblings, 2 replies; 10+ messages in thread
From: Lee Revell @ 2005-07-30 23:48 UTC (permalink / raw)
  To: linux-kernel

I have a machine here that oopses reliably when I start X, but the
interesting stuff scrolls away too fast, and a bunch more Oopses get
printed ending with "Aiee, killing interrupt handler".

How do I get the output to stop after the first Oops?

Lee

* Re: Simple question re: oops
  2005-07-30 23:48 Simple question re: oops Lee Revell
@ 2005-07-31  0:10 ` Lee Revell
  2005-07-31  0:11 ` Alexander Nyberg
  1 sibling, 0 replies; 10+ messages in thread
From: Lee Revell @ 2005-07-31  0:10 UTC (permalink / raw)
  To: linux-kernel

On Sat, 2005-07-30 at 19:48 -0400, Lee Revell wrote:
> I have a machine here that oopses reliably when I start X, but the
> interesting stuff scrolls away too fast, and a bunch more Oopses get
> printed ending with "Aiee, killing interrupt handler".
> 
> How do I get the output to stop after the first Oops?
> 

Never mind, /proc/sys/kernel/panic_on_oops should do it.

Lee



* Re: Simple question re: oops
  2005-07-30 23:48 Simple question re: oops Lee Revell
  2005-07-31  0:10 ` Lee Revell
@ 2005-07-31  0:11 ` Alexander Nyberg
  2005-07-31  0:15   ` Lee Revell
  2005-07-31  0:21   ` Lee Revell
  1 sibling, 2 replies; 10+ messages in thread
From: Alexander Nyberg @ 2005-07-31  0:11 UTC (permalink / raw)
  To: Lee Revell; +Cc: linux-kernel

On Sat, Jul 30, 2005 at 07:48:11PM -0400 Lee Revell wrote:

> I have a machine here that oopses reliably when I start X, but the
> interesting stuff scrolls away too fast, and a bunch more Oopses get
> printed ending with "Aiee, killing interrupt handler".
> 
> How do I get the output to stop after the first Oops?
> 

set /proc/sys/kernel/panic_on_oops to 1
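
As a minimal sketch of the step above: writing the sysctl by hand is the same as running `echo 1 > /proc/sys/kernel/panic_on_oops` as root. The helper names `read_sysctl` and `write_sysctl` below are hypothetical, made up for illustration, and the `root` parameter exists only so the code can be tried against a scratch directory instead of the live /proc/sys tree.

```python
from pathlib import Path

# Location of the sysctl tree on a running Linux system.
# Overridable so the helpers can be exercised without root.
PROC_SYS = Path("/proc/sys")


def read_sysctl(name: str, root: Path = PROC_SYS) -> str:
    """Return the current value of a sysctl such as 'kernel/panic_on_oops'."""
    return (root / name).read_text().strip()


def write_sysctl(name: str, value: int, root: Path = PROC_SYS) -> None:
    """Write an integer to a sysctl file (requires root on a real system)."""
    (root / name).write_text(f"{value}\n")


# Hypothetical usage, as root:
#   write_sysctl("kernel/panic_on_oops", 1)
#   read_sysctl("kernel/panic_on_oops")
```

The setting does not persist across reboots, so it has to be reapplied after each boot while debugging.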

What version of the kernel is that? It shouldn't do recursive oopses
(of the same task) any more.

* Re: Simple question re: oops
  2005-07-31  0:11 ` Alexander Nyberg
@ 2005-07-31  0:15   ` Lee Revell
  2005-07-31  0:21   ` Lee Revell
  1 sibling, 0 replies; 10+ messages in thread
From: Lee Revell @ 2005-07-31  0:15 UTC (permalink / raw)
  To: Alexander Nyberg; +Cc: linux-kernel

On Sun, 2005-07-31 at 02:11 +0200, Alexander Nyberg wrote:
> On Sat, Jul 30, 2005 at 07:48:11PM -0400 Lee Revell wrote:
> 
> > I have a machine here that oopses reliably when I start X, but the
> > interesting stuff scrolls away too fast, and a bunch more Oopses get
> > printed ending with "Aiee, killing interrupt handler".
> > 
> > How do I get the output to stop after the first Oops?
> > 
> 
> set /proc/sys/kernel/panic_on_oops to 1
> 
> What version of the kernel is that? It shouldn't do recursive oopses
> (of the same task) any more.
> 

2.6.10 (whatever comes with Ubuntu Hoary).  It's a demo install for a
client on cobbled-together hardware.  At first I suspected the
bleeding-edge GeForce video card, but swapping it didn't help.  Now I
suspect the hard drive (or a kernel bug).

And I was wrong, it wasn't more Oopses, it was "scheduling while atomic"
messages that forced the interesting stuff offscreen.

Lee


* Re: Simple question re: oops
  2005-07-31  0:11 ` Alexander Nyberg
  2005-07-31  0:15   ` Lee Revell
@ 2005-07-31  0:21   ` Lee Revell
  2005-07-31  0:40     ` Dave Airlie
  1 sibling, 1 reply; 10+ messages in thread
From: Lee Revell @ 2005-07-31  0:21 UTC (permalink / raw)
  To: Alexander Nyberg; +Cc: linux-kernel

On Sun, 2005-07-31 at 02:11 +0200, Alexander Nyberg wrote:
> On Sat, Jul 30, 2005 at 07:48:11PM -0400 Lee Revell wrote:
> 
> > I have a machine here that oopses reliably when I start X, but the
> > interesting stuff scrolls away too fast, and a bunch more Oopses get
> > printed ending with "Aiee, killing interrupt handler".
> > 
> > How do I get the output to stop after the first Oops?
> > 
> 
> set /proc/sys/kernel/panic_on_oops to 1
> 
> What version of the kernel is that? It shouldn't do recursive oopses
> (of the same task) any more.
> 

panic_on_oops has no effect, a bunch of stuff flies past and the last
thing I see is "gam_server: scheduling while atomic" then a stack trace
of the core dump path then "Aiee, killing interrupt handler".

I am starting to suspect the hard drive, does that sound plausible?
It's as if it locks up when it hits a certain disk block.

Lee


* Re: Simple question re: oops
  2005-07-31  0:21   ` Lee Revell
@ 2005-07-31  0:40     ` Dave Airlie
  2005-07-31  0:46       ` Lee Revell
  2005-07-31  2:50       ` SOLVED - Re: Simple question re: oops Lee Revell
  0 siblings, 2 replies; 10+ messages in thread
From: Dave Airlie @ 2005-07-31  0:40 UTC (permalink / raw)
  To: Lee Revell; +Cc: Alexander Nyberg, linux-kernel

> panic_on_oops has no effect, a bunch of stuff flies past and the last
> thing I see is "gam_server: scheduling while atomic" then a stack trace
> of the core dump path then "Aiee, killing interrupt handler".
> 
> I am starting to suspect the hard drive, does that sound plausible?
> It's as if it locks up when it hits a certain disk block.

run memtest on it... you might have bad RAM..

Dave.

* Re: Simple question re: oops
  2005-07-31  0:40     ` Dave Airlie
@ 2005-07-31  0:46       ` Lee Revell
  2005-08-01 18:03         ` ECC Support in Linux Roger Heflin
  2005-07-31  2:50       ` SOLVED - Re: Simple question re: oops Lee Revell
  1 sibling, 1 reply; 10+ messages in thread
From: Lee Revell @ 2005-07-31  0:46 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Alexander Nyberg, linux-kernel

On Sun, 2005-07-31 at 10:40 +1000, Dave Airlie wrote:
> > panic_on_oops has no effect, a bunch of stuff flies past and the last
> > thing I see is "gam_server: scheduling while atomic" then a stack trace
> > of the core dump path then "Aiee, killing interrupt handler".
> > 
> > I am starting to suspect the hard drive, does that sound plausible?
> > It's as if it locks up when it hits a certain disk block.
> 
> run memtest on it... you might have bad RAM..
> 

Already swapped it out, but I'll try memtest.

Any idea why printk_ratelimit does not work?  I set it to 1000 (per the
docs this should limit to 1 printk per second) and burst to 1 but I
still get screenfuls of text flying by.
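
A quick way to confirm what the kernel actually accepted is to read both knobs back. This is only a sketch: `read_ratelimit_settings` is a hypothetical helper name, and the `root` parameter is there purely so it can be pointed at a scratch directory. As far as I understand, these two settings only throttle messages whose call sites check printk_ratelimit(), so other printk output would pass through regardless of the values.

```python
from pathlib import Path

# The two rate-limit sysctls mentioned above, relative to /proc/sys.
RATELIMIT_KNOBS = (
    "kernel/printk_ratelimit",
    "kernel/printk_ratelimit_burst",
)


def read_ratelimit_settings(root: Path = Path("/proc/sys")) -> dict:
    """Return the current integer value of each printk rate-limit knob."""
    settings = {}
    for knob in RATELIMIT_KNOBS:
        # Each sysctl file holds a single integer followed by a newline.
        settings[knob] = int((root / knob).read_text().strip())
    return settings
```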

Lee



* SOLVED - Re: Simple question re: oops
  2005-07-31  0:40     ` Dave Airlie
  2005-07-31  0:46       ` Lee Revell
@ 2005-07-31  2:50       ` Lee Revell
  1 sibling, 0 replies; 10+ messages in thread
From: Lee Revell @ 2005-07-31  2:50 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Alexander Nyberg, linux-kernel

On Sun, 2005-07-31 at 10:40 +1000, Dave Airlie wrote:
> > panic_on_oops has no effect, a bunch of stuff flies past and the last
> > thing I see is "gam_server: scheduling while atomic" then a stack trace
> > of the core dump path then "Aiee, killing interrupt handler".
> > 
> > I am starting to suspect the hard drive, does that sound plausible?
> > It's as if it locks up when it hits a certain disk block.
> 
> run memtest on it... you might have bad RAM..

This was some kind of (ACPI related?) kernel bug.  I upgraded from Hoary
(2.6.11) to Breezy (2.6.12) and the problem, which had been 100%
reproducible, went away.

One odd thing I noticed was some APM/ACPI related messages in the logs
when starting X (APM: overridden by ACPI or something).  Now I don't get
these and the X log just says /dev/apm_bios: No such device.

Oh well, it's working now.

Lee

* ECC Support in Linux
  2005-07-31  0:46       ` Lee Revell
@ 2005-08-01 18:03         ` Roger Heflin
  2005-08-02  1:22           ` Wang, Zhenyu
  0 siblings, 1 reply; 10+ messages in thread
From: Roger Heflin @ 2005-08-01 18:03 UTC (permalink / raw)
  To: 'linux-kernel'

I have had a fair amount of trouble with ECC error reporting on
higher-end dual- and quad-CPU servers; the support is limited and
the reporting is pretty weak.

On the Opterons I can tell which CPU gets errors, but mcelog
does not isolate things down to the DIMM level properly.  Is
there a way to do this sort of thing?  I am talking about most
of the whitebox-type motherboards.

On the newer Intels I have not found any usable ECC support.
Is there any in the kernels?

I can test a variety of hardware if someone needs it, and can
probably even come up with some test memory that will generate ecc
errors.

                     Roger   

* Re: ECC Support in Linux
  2005-08-01 18:03         ` ECC Support in Linux Roger Heflin
@ 2005-08-02  1:22           ` Wang, Zhenyu
  0 siblings, 0 replies; 10+ messages in thread
From: Wang, Zhenyu @ 2005-08-02  1:22 UTC (permalink / raw)
  To: Roger Heflin; +Cc: 'linux-kernel'

On 2005.08.01 13:03:34 +0000, Roger Heflin wrote:
>  
> On the newer Intels I have not found any useable ECC support
> is there any in the kernels?

For ia32, not in the kernel yet; see http://bluesmoke.sf.net
For ia64, the kernel already has support.

> 
> I can test a variety of hardware if someone needs it, and can
> probably even come up with some test memory that will generate ecc
> errors.
> 

Good! bluesmoke now supports many advanced servers, so you can help
test those drivers. Please subscribe to the bluesmoke mailing list.

thanks
-zhen
