* Simple question re: oops
@ 2005-07-30 23:48 Lee Revell
2005-07-31 0:10 ` Lee Revell
2005-07-31 0:11 ` Alexander Nyberg
0 siblings, 2 replies; 10+ messages in thread
From: Lee Revell @ 2005-07-30 23:48 UTC (permalink / raw)
To: linux-kernel
I have a machine here that oopses reliably when I start X, but the
interesting stuff scrolls away too fast, and a bunch more Oopses get
printed ending with "Aieee, killing interrupt handler".
How do I get the output to stop after the first Oops?
Lee
* Re: Simple question re: oops
2005-07-30 23:48 Simple question re: oops Lee Revell
@ 2005-07-31 0:10 ` Lee Revell
2005-07-31 0:11 ` Alexander Nyberg
1 sibling, 0 replies; 10+ messages in thread
From: Lee Revell @ 2005-07-31 0:10 UTC (permalink / raw)
To: linux-kernel
On Sat, 2005-07-30 at 19:48 -0400, Lee Revell wrote:
> I have a machine here that oopses reliably when I start X, but the
> interesting stuff scrolls away too fast, and a bunch more Oopses get
> printed ending with "Aieee, killing interrupt handler".
>
> How do I get the output to stop after the first Oops?
>
Never mind, /proc/sys/kernel/panic_on_oops should do it.
Lee
* Re: Simple question re: oops
2005-07-30 23:48 Simple question re: oops Lee Revell
2005-07-31 0:10 ` Lee Revell
@ 2005-07-31 0:11 ` Alexander Nyberg
2005-07-31 0:15 ` Lee Revell
2005-07-31 0:21 ` Lee Revell
1 sibling, 2 replies; 10+ messages in thread
From: Alexander Nyberg @ 2005-07-31 0:11 UTC (permalink / raw)
To: Lee Revell; +Cc: linux-kernel
On Sat, Jul 30, 2005 at 07:48:11PM -0400 Lee Revell wrote:
> I have a machine here that oopses reliably when I start X, but the
> interesting stuff scrolls away too fast, and a bunch more Oopses get
> printed ending with "Aieee, killing interrupt handler".
>
> How do I get the output to stop after the first Oops?
>
set /proc/sys/kernel/panic_on_oops to 1
What version of the kernel is that? It shouldn't do recursive oopses
(of the same task) any more.
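
As a minimal sketch of the suggestion above: setting the sysctl amounts to writing "1" to the file, which on a real system needs root. The helper names set_sysctl and get_sysctl are hypothetical and used only for illustration; the demo writes to a scratch file so it can run unprivileged.

```python
import tempfile
from pathlib import Path

# Path taken from the thread; writing to it requires root on a real system.
PANIC_ON_OOPS = Path("/proc/sys/kernel/panic_on_oops")


def set_sysctl(path: Path, value: int) -> None:
    """Write an integer value to a sysctl file, like `echo 1 > path`."""
    path.write_text(f"{value}\n")


def get_sysctl(path: Path) -> int:
    """Read an integer sysctl value back."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    # Demonstrate on a scratch file instead of the real /proc entry.
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / "panic_on_oops"
        set_sysctl(scratch, 1)
        print(get_sysctl(scratch))
```

On the affected machine the call would be set_sysctl(PANIC_ON_OOPS, 1), run as root before starting X.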
* Re: Simple question re: oops
2005-07-31 0:11 ` Alexander Nyberg
@ 2005-07-31 0:15 ` Lee Revell
2005-07-31 0:21 ` Lee Revell
1 sibling, 0 replies; 10+ messages in thread
From: Lee Revell @ 2005-07-31 0:15 UTC (permalink / raw)
To: Alexander Nyberg; +Cc: linux-kernel
On Sun, 2005-07-31 at 02:11 +0200, Alexander Nyberg wrote:
> On Sat, Jul 30, 2005 at 07:48:11PM -0400 Lee Revell wrote:
>
> > I have a machine here that oopses reliably when I start X, but the
> > interesting stuff scrolls away too fast, and a bunch more Oopses get
> > printed ending with "Aieee, killing interrupt handler".
> >
> > How do I get the output to stop after the first Oops?
> >
>
> set /proc/sys/kernel/panic_on_oops to 1
>
> What version of the kernel is that? It shouldn't do recursive oopses
> (of the same task) any more.
>
2.6.10 (whatever comes with Ubuntu Hoary). It's a demo install for a
client on cobbled together hardware. First I suspected the bleeding
edge GeForce video card, then we swapped it which didn't help. Now I
suspect the hard drive (or a kernel bug).
And I was wrong, it wasn't more Oopses, it was "scheduling while atomic"
messages that forced the interesting stuff offscreen.
Lee
* Re: Simple question re: oops
2005-07-31 0:11 ` Alexander Nyberg
2005-07-31 0:15 ` Lee Revell
@ 2005-07-31 0:21 ` Lee Revell
2005-07-31 0:40 ` Dave Airlie
1 sibling, 1 reply; 10+ messages in thread
From: Lee Revell @ 2005-07-31 0:21 UTC (permalink / raw)
To: Alexander Nyberg; +Cc: linux-kernel
On Sun, 2005-07-31 at 02:11 +0200, Alexander Nyberg wrote:
> On Sat, Jul 30, 2005 at 07:48:11PM -0400 Lee Revell wrote:
>
> > I have a machine here that oopses reliably when I start X, but the
> > interesting stuff scrolls away too fast, and a bunch more Oopses get
> > printed ending with "Aieee, killing interrupt handler".
> >
> > How do I get the output to stop after the first Oops?
> >
>
> set /proc/sys/kernel/panic_on_oops to 1
>
> What version of the kernel is that? It shouldn't do recursive oopses
> (of the same task) any more.
>
panic_on_oops has no effect, a bunch of stuff flies past and the last
thing I see is "gam_server: scheduling while atomic" then a stack trace
of the core dump path then "Aiee, killing interrupt handler".
I am starting to suspect the hard drive, does that sound plausible?
It's as if it locks up when it hits a certain disk block.
Lee
* Re: Simple question re: oops
2005-07-31 0:21 ` Lee Revell
@ 2005-07-31 0:40 ` Dave Airlie
2005-07-31 0:46 ` Lee Revell
2005-07-31 2:50 ` SOLVED - Re: Simple question re: oops Lee Revell
0 siblings, 2 replies; 10+ messages in thread
From: Dave Airlie @ 2005-07-31 0:40 UTC (permalink / raw)
To: Lee Revell; +Cc: Alexander Nyberg, linux-kernel
> panic_on_oops has no effect, a bunch of stuff flies past and the last
> thing I see is "gam_server: scheduling while atomic" then a stack trace
> of the core dump path then "Aiee, killing interrupt handler".
>
> I am starting to suspect the hard drive, does that sound plausible?
> It's as if it locks up when it hits a certain disk block.
run memtest on it... you might have bad RAM..
Dave.
* Re: Simple question re: oops
2005-07-31 0:40 ` Dave Airlie
@ 2005-07-31 0:46 ` Lee Revell
2005-08-01 18:03 ` ECC Support in Linux Roger Heflin
2005-07-31 2:50 ` SOLVED - Re: Simple question re: oops Lee Revell
1 sibling, 1 reply; 10+ messages in thread
From: Lee Revell @ 2005-07-31 0:46 UTC (permalink / raw)
To: Dave Airlie; +Cc: Alexander Nyberg, linux-kernel
On Sun, 2005-07-31 at 10:40 +1000, Dave Airlie wrote:
> > panic_on_oops has no effect, a bunch of stuff flies past and the last
> > thing I see is "gam_server: scheduling while atomic" then a stack trace
> > of the core dump path then "Aiee, killing interrupt handler".
> >
> > I am starting to suspect the hard drive, does that sound plausible?
> > It's as if it locks up when it hits a certain disk block.
>
> run memtest on it... you might have bad RAM..
>
Already swapped it out, but I'll try memtest.
Any idea why printk_ratelimit does not work? I set it to 1000 (per the
docs this should limit to 1 printk per second) and burst to 1 but I
still get screenfuls of text flying by.
Lee
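
One possible explanation, offered as an assumption rather than something confirmed in this thread: as far as I know, printk_ratelimit and printk_ratelimit_burst only throttle call sites that explicitly ask the rate limiter for permission, so messages printed unconditionally (such as an oops dump) would not be slowed down. The sketch below is a simplified token bucket in that spirit; it is not the kernel's code, and RateLimiter and log are hypothetical names.

```python
class RateLimiter:
    """Token bucket: one message per `interval` seconds, up to `burst` at once."""

    def __init__(self, interval: float, burst: int, now: float = 0.0) -> None:
        self.interval = interval
        self.burst = burst
        self.tokens = interval * burst  # start with a full bucket
        self.last = now
        self.missed = 0  # messages suppressed so far

    def allow(self, now: float) -> bool:
        # Elapsed time is earned as credit, capped at the burst size.
        cap = self.interval * self.burst
        self.tokens = min(self.tokens + (now - self.last), cap)
        self.last = now
        if self.tokens >= self.interval:
            self.tokens -= self.interval
            return True
        self.missed += 1
        return False


def log(limiter: RateLimiter, now: float, msg: str, out: list,
        limited: bool = True) -> None:
    # Only callers that opt in (limited=True) are throttled;
    # unconditional messages bypass the limiter entirely.
    if not limited or limiter.allow(now):
        out.append(msg)


if __name__ == "__main__":
    limiter = RateLimiter(interval=1.0, burst=1)
    shown = []
    for t in (0.0, 0.1, 0.2, 1.3):
        log(limiter, t, "warning", shown)               # throttled
        log(limiter, t, "oops line", shown, limited=False)  # never throttled
    print(len(shown), "messages shown,", limiter.missed, "suppressed")
```

Under this model, tightening the interval and burst changes nothing for messages that never consult the limiter, which would match the screenfuls of text still flying by.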
* SOLVED - Re: Simple question re: oops
2005-07-31 0:40 ` Dave Airlie
2005-07-31 0:46 ` Lee Revell
@ 2005-07-31 2:50 ` Lee Revell
1 sibling, 0 replies; 10+ messages in thread
From: Lee Revell @ 2005-07-31 2:50 UTC (permalink / raw)
To: Dave Airlie; +Cc: Alexander Nyberg, linux-kernel
On Sun, 2005-07-31 at 10:40 +1000, Dave Airlie wrote:
> > panic_on_oops has no effect, a bunch of stuff flies past and the last
> > thing I see is "gam_server: scheduling while atomic" then a stack trace
> > of the core dump path then "Aiee, killing interrupt handler".
> >
> > I am starting to suspect the hard drive, does that sound plausible?
> > It's as if it locks up when it hits a certain disk block.
>
> run memtest on it... you might have bad RAM..
This was some kind of (ACPI related?) kernel bug. I upgraded from Hoary
(2.6.10) to Breezy (2.6.12) and the problem, which had been 100%
reproducible, went away.
One odd thing I noticed was some APM/ACPI-related messages in the logs
when starting X ("APM: overridden by ACPI" or something). Now I don't
get these and the X log just says "/dev/apm_bios: No such device".
Oh well, it's working now.
Lee
* ECC Support in Linux
2005-07-31 0:46 ` Lee Revell
@ 2005-08-01 18:03 ` Roger Heflin
2005-08-02 1:22 ` Wang, Zhenyu
0 siblings, 1 reply; 10+ messages in thread
From: Roger Heflin @ 2005-08-01 18:03 UTC (permalink / raw)
To: 'linux-kernel'
I have had a fair amount of trouble with ECC error reporting on
higher-end dual- and quad-CPU servers, as the support is limited
and the reporting is pretty weak.
On the Opterons I can tell which CPU gets errors, but mcelog
does not isolate things down to the DIMM level properly. Is
there a way to do this sort of thing? I am talking about most
of the whitebox-type motherboards.
On the newer Intels I have not found any usable ECC support.
Is there any in the kernels?
I can test a variety of hardware if someone needs it, and can
probably even come up with some test memory that will generate ecc
errors.
Roger
* Re: ECC Support in Linux
2005-08-01 18:03 ` ECC Support in Linux Roger Heflin
@ 2005-08-02 1:22 ` Wang, Zhenyu
0 siblings, 0 replies; 10+ messages in thread
From: Wang, Zhenyu @ 2005-08-02 1:22 UTC (permalink / raw)
To: Roger Heflin; +Cc: 'linux-kernel'
On 2005.08.01 13:03:34 +0000, Roger Heflin wrote:
>
> On the newer Intels I have not found any useable ECC support
> is there any in the kernels?
For ia32, it is not in the kernel yet; see http://bluesmoke.sf.net
For ia64, the kernel already has support.
>
> I can test a variety of hardware if someone needs it, and can
> probably even come up with some test memory that will generate ecc
> errors.
>
Good! bluesmoke now has support for many advanced servers, and you can
help test those drivers. Please subscribe to the bluesmoke mailing list.
thanks
-zhen