* 3.2.0-rc1 panic on PowerPC
@ 2011-11-15 8:44 Christian Kujau
2011-11-20 23:31 ` Christian Kujau
0 siblings, 1 reply; 8+ messages in thread
From: Christian Kujau @ 2011-11-15 8:44 UTC
To: LKML; +Cc: linuxppc-dev
Hi,
I noticed a few crashes on this PowerBook G4 lately, starting somewhere in
3.2.0-rc1. The crashes are really rare and as I'm not on the system all
the time I did not notice most of them. By the time I did, the screen was
blank already and I had to hard-reset the box. But not this time:
http://nerdbynature.de/bits/3.2.0-rc1/oops/
When the crash occurred, the system was fairly loaded (CPU and disk I/O
wise), so that may have triggered it. I tried to type off the stack trace,
I hope there are not too many typos, see below.
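Because the trace was copied by hand, a mechanical sanity check can flag
likely transcription errors before anyone tries to resolve the addresses.
The sketch below is only an illustration (check_entry and its messages are
made-up names): it assumes the usual symbol+0xoffset/0xsize format, that
every PowerPC instruction is 4 bytes wide, and that an offset should fall
inside its symbol.

```python
import re

# Matches entries such as "panic+0xc8/0x220" or "T.975+0x3f4/0x570".
ENTRY = re.compile(r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)")


def check_entry(line, insn_bytes=4):
    """Return a list of reasons why a call-trace entry looks mistyped."""
    m = ENTRY.search(line)
    if m is None:
        return ["no symbol+offset/size found"]
    off = int(m["off"], 16)
    size = int(m["size"], 16)
    problems = []
    # Assumption: fixed-width 4-byte instructions, so offsets are multiples of 4.
    if off % insn_bytes:
        problems.append("offset not instruction-aligned")
    # Assumption: an address resolved to a symbol lies strictly inside it.
    if off >= size:
        problems.append("offset not inside symbol")
    return problems


if __name__ == "__main__":
    for entry in ("panic+0xc8/0x220",
                  "kmem_cache_alloc+0x150/0x150",
                  "tcp_send_ack+0x35/0x138"):
        print(entry, check_entry(entry))
```

An entry that fails either check is more likely a typo than a real return
address, so it is worth re-reading from the photo.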
The machine is fairly old, so maybe it's "just" bad RAM or something, I
wouldn't be surprised. But maybe not, the box is pretty stable most of the
time and only now I notice these rare crashes.
If anyone could take a quick look...?
Thank you,
Christian.
Instruction dump:
92c40008 68000001 0f000000 80040000 5400003c 90040000 817f000c 380bffff
901f000c 2f090000 81640018 81440014 <916a0004> 914b0000 92840014 92a49918
Kernel panic - not syncing: Fatal exception in interrupt
Call Trace:
show_stack+0x70/0x1bc (unreliable)
panic+0xc8/0x220
die+0x2ac/0x2b8
bad_page_fault+0xbc/0x104
handle_page_fault+0x7c/0x80
Exception: 300 at T.975+0x3f4/0x570
LR = T.957+0x300/0x570
kmem_cache_alloc+0x150/0x150
__alloc_skb+0x50/0x148
tcp_send_ack+0x35/0x138
tcp_delack_timer+0x140/0x244
run_timer_softirq+0x1a0/0x2ec
__do_softirq+0xf4/0x1bc
call_do_softirq+0x14/0x24
do_softirq+0xfc/0x128
irq_exit+0xa0/0xa4
timer_interrupt+0x148/0x180
ret_from_except+0x0/0x14
cpu_idle+0xa0/0x118
rest_init+0xf0/0x114
start_kernel+0x2d0/0x2f0
0x3444
Rebooting in 180 seconds..
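
The bracketed word in the instruction dump marks the faulting instruction,
and it can be decoded by hand. The sketch below is a minimal Python decoder
for the two D-form opcodes that surround it (lwz = 32 and stw = 36 in the
PowerPC instruction set); the name decode_dform is made up for this example.
Reading the result as a list unlink (next->prev = prev; prev->next = next)
is only a guess, nothing in the oops confirms it.

```python
# Primary opcodes of the two D-form instructions handled in this sketch.
D_FORM = {32: "lwz", 36: "stw"}


def decode_dform(word):
    """Decode a 32-bit PowerPC word if it is lwz or stw, else show it raw."""
    opcode = word >> 26            # bits 0-5: primary opcode
    rt = (word >> 21) & 0x1F       # bits 6-10: target/source register
    ra = (word >> 16) & 0x1F       # bits 11-15: base register
    disp = word & 0xFFFF           # bits 16-31: signed displacement
    if disp & 0x8000:
        disp -= 0x10000
    name = D_FORM.get(opcode)
    if name is None:
        return ".long 0x%08x" % word
    return "%s r%d,%d(r%d)" % (name, rt, disp, ra)


if __name__ == "__main__":
    # The two words before the fault, the faulting word, and the one after.
    for word in (0x81640018, 0x81440014, 0x916A0004, 0x914B0000):
        print("%08x  %s" % (word, decode_dform(word)))
    # 81640018  lwz r11,24(r4)
    # 81440014  lwz r10,20(r4)
    # 916a0004  stw r11,4(r10)   <- faulting store
    # 914b0000  stw r10,0(r11)
```

If the guess is right, the store faulted because the pointer loaded into r10
was already bad when it was read from the structure at r4.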
--
BOFH excuse #184:
loop found in loop in redundant loopback
* Re: 3.2.0-rc1 panic on PowerPC
2011-11-15 8:44 3.2.0-rc1 panic on PowerPC Christian Kujau
@ 2011-11-20 23:31 ` Christian Kujau
2011-11-21 0:58 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 8+ messages in thread
From: Christian Kujau @ 2011-11-20 23:31 UTC
To: LKML; +Cc: linuxppc-dev
On Tue, 15 Nov 2011 at 00:44, Christian Kujau wrote:
> I noticed a few crashes on this PowerBook G4 lately, starting somewhere in
> 3.2.0-rc1. The crashes are really rare and as I'm not on the system all
> the time I did not notice most of them. By the time I did, the screen was
> blank already and I had to hard-reset the box. But not this time:
>
> http://nerdbynature.de/bits/3.2.0-rc1/oops/
>
> When the crash occurred, the system was fairly loaded (CPU and disk I/O
> wise), so that may have triggered it. I tried to type off the stack trace,
> I hope there are not too many typos, see below.
>
> The machine is fairly old, so maybe it's "just" bad RAM or something, I
> wouldn't be surprised. But maybe not, the box is pretty stable most of the
> time and only now I notice these rare crashes.
Happened again with 3.2.0-rc2-00027-gff0ff78, this time with netconsole
enabled. But this time the machine just stopped, w/o any output on the
screen or on netconsole :(
Christian.
--
BOFH excuse #387:
Your computer's union contract is set to expire at midnight.
* Re: 3.2.0-rc1 panic on PowerPC
2011-11-20 23:31 ` Christian Kujau
@ 2011-11-21 0:58 ` Benjamin Herrenschmidt
2011-11-21 1:17 ` Christian Kujau
0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2011-11-21 0:58 UTC
To: Christian Kujau; +Cc: linuxppc-dev, LKML
On Sun, 2011-11-20 at 15:31 -0800, Christian Kujau wrote:
> On Tue, 15 Nov 2011 at 00:44, Christian Kujau wrote:
> > I noticed a few crashes on this PowerBook G4 lately, starting somewhere in
> > 3.2.0-rc1. The crashes are really rare and as I'm not on the system all
> > the time I did not notice most of them. By the time I did, the screen was
> > blank already and I had to hard-reset the box. But not this time:
> >
> > http://nerdbynature.de/bits/3.2.0-rc1/oops/
> >
> > When the crash occurred, the system was fairly loaded (CPU and disk I/O
> > wise), so that may have triggered it. I tried to type off the stack trace,
> > I hope there are not too many typos, see below.
> >
> > The machine is fairly old, so maybe it's "just" bad RAM or something, I
> > wouldn't be surprised. But maybe not, the box is pretty stable most of the
> > time and only now I notice these rare crashes.
>
> Happened again with 3.2.0-rc2-00027-gff0ff78, this time with netconsole
> enabled. But this time the machine just stopped, w/o any output on the
> screen or on netconsole :(
I've seen something similar with 3.2-rc2 at cfcfc9ec, unfortunately I
couldn't capture the oops log at the time.
Looks like there's some kind of memory corruption happening. So far I
haven't been able to get a good target at what could be causing it.
Cheers,
Ben.
* Re: 3.2.0-rc1 panic on PowerPC
2011-11-21 0:58 ` Benjamin Herrenschmidt
@ 2011-11-21 1:17 ` Christian Kujau
2011-11-21 1:25 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 8+ messages in thread
From: Christian Kujau @ 2011-11-21 1:17 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, LKML
On Mon, 21 Nov 2011 at 11:58, Benjamin Herrenschmidt wrote:
> I've seen something similar with 3.2-rc2 at cfcfc9ec, unfortunately I
> couldn't capture the oops log at the time.
It just happened again today, after heavy CPU & IO load (rsyncing from/to
external disks on dm-crypt). This time the oops was printed on the screen
but nothing on netconsole:
http://nerdbynature.de/bits/3.2.0-rc1/oops/oops3m.JPG
It looks like the oops I reported earlier (oops2m.JPG) so I doubt it's a
random corruption due to hardware issues...?
Any debug or boot options to set in my next kernel build?
Thanks,
Christian.
> Looks like there's some kind of memory corruption happening. So far I
> haven't been able to get a good target at what could be causing it.
--
BOFH excuse #90:
Budget cuts
* Re: 3.2.0-rc1 panic on PowerPC
2011-11-21 1:17 ` Christian Kujau
@ 2011-11-21 1:25 ` Benjamin Herrenschmidt
2011-11-21 1:51 ` Benjamin Herrenschmidt
2011-11-21 8:08 ` Markus Trippelsdorf
0 siblings, 2 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2011-11-21 1:25 UTC
To: Christian Kujau; +Cc: linuxppc-dev, LKML
On Sun, 2011-11-20 at 17:17 -0800, Christian Kujau wrote:
> On Mon, 21 Nov 2011 at 11:58, Benjamin Herrenschmidt wrote:
> > I've seen something similar with 3.2-rc2 at cfcfc9ec, unfortunately I
> > couldn't capture the oops log at the time.
>
> It just happened again today, after heavy CPU & IO load (rsyncing from/to
> external disks on dm-crypt). This time the oops was printed on the screen
> but nothing on netconsole:
>
> http://nerdbynature.de/bits/3.2.0-rc1/oops/oops3m.JPG
>
> It looks like the oops I reported earlier (oops2m.JPG) so I doubt it's a
> random corruption due to hardware issues...?
Yeah it's starting to look like a pattern. Your latest oops looks a lot
like the one I had (though it was with tg3 on the g5), ie, vfs_read ->
driver -> allocator -> crash.
> Any debug or boot options to set in my next kernel build?
Well, you can turn all the debug options on and see whether that makes any
difference or pinpoints something a bit more precisely.
Cheers,
Ben.
* Re: 3.2.0-rc1 panic on PowerPC
2011-11-21 1:25 ` Benjamin Herrenschmidt
@ 2011-11-21 1:51 ` Benjamin Herrenschmidt
2011-11-21 2:03 ` Christian Kujau
2011-11-21 8:08 ` Markus Trippelsdorf
1 sibling, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2011-11-21 1:51 UTC
To: Christian Kujau; +Cc: linuxppc-dev, LKML
On Mon, 2011-11-21 at 12:25 +1100, Benjamin Herrenschmidt wrote:
> On Sun, 2011-11-20 at 17:17 -0800, Christian Kujau wrote:
> > On Mon, 21 Nov 2011 at 11:58, Benjamin Herrenschmidt wrote:
> > > I've seen something similar with 3.2-rc2 at cfcfc9ec, unfortunately I
> > > couldn't capture the oops log at the time.
> >
> > It just happened again today, after heavy CPU & IO load (rsyncing from/to
> > external disks on dm-crypt). This time the oops was printed on the screen
> > but nothing on netconsole:
> >
> > http://nerdbynature.de/bits/3.2.0-rc1/oops/oops3m.JPG
> >
> > It looks like the oops I reported earlier (oops2m.JPG) so I doubt it's a
> > random corruption due to hardware issues...?
>
> Yeah it's starting to look like a pattern. Your latest oops looks a lot
> like the one I had (though it was with tg3 on the g5), ie, vfs_read ->
> driver -> allocator -> crash.
>
> > Any debug or boot options to set in my next kernel build?
>
> Well, you can turn all the debug options on and see whether that makes any
> difference or pinpoints something a bit more precisely.
BTW. SLUB or SLAB ? Mine was SLUB with SLUB_DEBUG enabled (tho the debug
didn't seem to catch anything).
Cheers,
Ben.
* Re: 3.2.0-rc1 panic on PowerPC
2011-11-21 1:51 ` Benjamin Herrenschmidt
@ 2011-11-21 2:03 ` Christian Kujau
0 siblings, 0 replies; 8+ messages in thread
From: Christian Kujau @ 2011-11-21 2:03 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, LKML
On Mon, 21 Nov 2011 at 12:51, Benjamin Herrenschmidt wrote:
> BTW. SLUB or SLAB ? Mine was SLUB with SLUB_DEBUG enabled (tho the debug
> didn't seem to catch anything).
SLUB, and SLUB_DEBUG=y (but w/o SLUB_DEBUG_ON and SLUB_STATS). Full config
here: http://nerdbynature.de/bits/3.2.0-rc1/oops/config.txt
I'm compiling today's git checkout (mainline) with more debug settings
enabled[0], let's see if this helps anything.
Christian.
[0] diff to old config
+CONFIG_RT_MUTEX_TESTER=y
+CONFIG_DEBUG_LOCKDEP=y
+CONFIG_DEBUG_HIGHMEM=y
+CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_VM=y
+CONFIG_DEBUG_WRITECOUNT=y
+CONFIG_DEBUG_LIST=y
+CONFIG_ATOMIC64_SELFTEST=y
+CONFIG_XMON=y
+CONFIG_XMON_DEFAULT=y
+CONFIG_XMON_DISASSEMBLY=y
+CONFIG_DEBUGGER=y
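
A diff like the one above can be produced mechanically. The kernel tree
ships scripts/diffconfig for this; the sketch below is a minimal stand-alone
Python version, with parse_config and diff_config as made-up helper names.
Treating an option that is absent the same as one marked "is not set" is an
assumption of this sketch.

```python
NOT_SET = " is not set"


def parse_config(text):
    """Map option names to values from the text of a kernel .config file."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET):
            # "# CONFIG_FOO is not set" is recorded as FOO = n.
            opts[line[2:-len(NOT_SET)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def diff_config(old, new):
    """List options whose value differs; missing options count as 'n'."""
    changes = []
    for name in sorted(set(old) | set(new)):
        before = old.get(name, "n")
        after = new.get(name, "n")
        if before != after:
            changes.append("%s: %s -> %s" % (name, before, after))
    return changes


if __name__ == "__main__":
    old = parse_config("CONFIG_SLUB_DEBUG=y\n# CONFIG_DEBUG_VM is not set\n")
    new = parse_config("CONFIG_SLUB_DEBUG=y\nCONFIG_DEBUG_VM=y\nCONFIG_XMON=y\n")
    for change in diff_config(old, new):
        print(change)
```

Posting such a diff next to each new oops makes it easier to tell which
debug option, if any, changed the behaviour.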
--
BOFH excuse #242:
Software uses US measurements, but the OS is in metric...
* Re: 3.2.0-rc1 panic on PowerPC
2011-11-21 1:25 ` Benjamin Herrenschmidt
2011-11-21 1:51 ` Benjamin Herrenschmidt
@ 2011-11-21 8:08 ` Markus Trippelsdorf
1 sibling, 0 replies; 8+ messages in thread
From: Markus Trippelsdorf @ 2011-11-21 8:08 UTC
To: Benjamin Herrenschmidt; +Cc: Christian Kujau, linuxppc-dev, LKML
On 2011.11.21 at 12:25 +1100, Benjamin Herrenschmidt wrote:
> On Sun, 2011-11-20 at 17:17 -0800, Christian Kujau wrote:
> > On Mon, 21 Nov 2011 at 11:58, Benjamin Herrenschmidt wrote:
> > > I've seen something similar with 3.2-rc2 at cfcfc9ec, unfortunately I
> > > couldn't capture the oops log at the time.
> >
> > It just happened again today, after heavy CPU & IO load (rsyncing from/to
> > external disks on dm-crypt). This time the oops was printed on the screen
> > but nothing on netconsole:
> >
> > http://nerdbynature.de/bits/3.2.0-rc1/oops/oops3m.JPG
> >
> > It looks like the oops I reported earlier (oops2m.JPG) so I doubt it's a
> > random corruption due to hardware issues...?
>
> Yeah it's starting to look like a pattern. Your latest oops looks a lot
> like the one I had (though it was with tg3 on the g5), ie, vfs_read ->
> driver -> allocator -> crash.
I might be seeing a similar issue on x86_64. See:
http://thread.gmane.org/gmane.linux.kernel.mm/70254
--
Markus