* Re: kernel troubles on rx2600 while running sbuild
2004-09-26 16:33 kernel troubles on rx2600 while running sbuild Thibaut VARENE
@ 2004-09-29 7:17 ` Thibaut VARENE
2004-09-29 7:47 ` Thibaut VARENE
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Thibaut VARENE @ 2004-09-29 7:17 UTC
To: linux-ia64
Hi,
Here is another report. The crash happened tonight while the box was
idle (after it had built quite a lot of packages, about 1000, without
problems) and was apparently triggered by a cron task (at 5 AM). Once
again (see the end of the dump) there is a kernel NULL pointer
dereference. Full dump attached.
rx2600 1-way 2GB RAM
Debian sarge, kernel-image-2.4.27-mckinley
start-stop-daem[9393]: NaT consumption 2216203124768
Pid: 9393, CPU 0, comm: start-stop-daem
psr : 0000121008022038 ifs : 800000000000070f ip : [<e00000000446f311>]
Not tainted ip is at (no symbol)
unat: 0000000000000000 pfs : 000000000000048f rsc : 0000000000000003
rnat: 00000fffffffbfff bsps: 0000000000007fa8 pr : 8000006af6556699
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : e0000000046490c0 b6 : e0000000044cf1e0 b7 : e000000004648250
f6 : 1003e0000000000000008 f7 : 000000000000000000000
f8 : 1003e0000000000bfcde8 f9 : 1003e000000000017f9bd
f10 : 1003e000000000000766c f11 : 1003e00000000000516a4
r1 : e000000004b2ee70 r2 : 0000000000000080 r3 : e00000000494eb4c
r8 : 0000000000000000 r9 : 0000000000000000 r10 : e00000000494eb40
r11 : 0000000000000001 r12 : e00000000621fb30 r13 : e000000006218000
r14 : 000000000225cb5c r15 : e00000000494ed4c r16 : e00000003f399760
r17 : 0000000000000000 r18 : e00000003f399768 r19 : 000000000225cb64
r20 : 0000000000000008 r21 : 0000000000000003 r22 : 0000000000000803
r23 : e000000004648250 r24 : ffffffffffe1db88 r25 : e00000000482a6c8
r26 : 0000000000000028 r27 : e00000000482a6a0 r28 : e000000004930550
r29 : e0000000040af3e0 r30 : 0000000000000000 r31 : 00000000004da072
Call Trace:
[<e000000004413e80>] (no symbol)
sp=e00000000621f690 bsp=e000000006219280
[<e00000000442e330>] (no symbol)
sp=e00000000621f860 bsp=e000000006219248
[<e00000000442f020>] (no symbol)
sp=e00000000621f860 bsp=e000000006219200
[<e00000000440eb20>] (no symbol)
sp=e00000000621f960 bsp=e000000006219200
[<e00000000446f310>] (no symbol)
sp=e00000000621fb30 bsp=e000000006219180
[<e0000000046490c0>] (no symbol)
sp=e00000000621fb30 bsp=e000000006219138
[<e0000000044cf600>] (no symbol)
sp=e00000000621fb30 bsp=e0000000062190a0
[<a000000000087bf0>] (no symbol)
sp=e00000000621fc30 bsp=e000000006219078
[<e0000000044a4880>] (no symbol)
sp=e00000000621fc30 bsp=e000000006218fb8
[<e0000000044a5520>] (no symbol)
sp=e00000000621fc30 bsp=e000000006218f50
[<e0000000044dbbb0>] (no symbol)
sp=e00000000621fc50 bsp=e000000006218f10
[<e0000000044dd820>] (no symbol)
sp=e00000000621fc60 bsp=e000000006218ea8
[<e000000004415400>] (no symbol)
sp=e00000000621fe30 bsp=e000000006218e30
[<e00000000440e110>] (no symbol)
sp=e00000000621fe30 bsp=e000000006218e08
[<e00000000440eb00>] (no symbol)
sp=e00000000621fe30 bsp=e000000006218dd0
<1>Unable to handle kernel NULL pointer dereferenceswapper[0]: Oops
11012296146944
Pid: 0, CPU 0, comm: swapper
psr : 0000121008022018 ifs : 800000000000070f ip : [<e00000000446f311>]
Not tainted ip is at (no symbol)
unat: 0000000000000000 pfs : 0000000000000207 rsc : 0000000000000003
rnat: 000000000000038a bsps: 0000000000000003 pr : 80000000ff556a65
ldrs: 0000000000000000 ccv : 000000000000001d fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : e0000000044c9d00 b6 : e000000004403310 b7 : e0000000044cbda0
f6 : 1003e00000000000000de f7 : 0ffebdcc4a9ae00000000
f8 : 1003e0000000000097c80 f9 : 1003e0000000000005340
f10 : 1003e0000000000000060 f11 : 1003e00000000000012f9
r1 : e000000004b2ee70 r2 : 000000000000011d r3 : 000000000000011d
r8 : 000000000000011d r9 : 000000000000011d r10 : 000000000000011d
r11 : 000000000000011d r12 : e0000000048abba0 r13 : e0000000048a4000
r14 : 000000000000001d r15 : 000000000000011d r16 : 000000000000001d
r17 : 0000000000000000 r18 : e00000002c2589a0 r19 : 000000000000001d
r20 : 0000000000000019 r21 : 000000000000001d r22 : 000000000000006a
r23 : 000000000000015d r24 : 000000000000011d r25 : 000000000000015d
r26 : 0000000000001800 r27 : 000000000000006a r28 : 0000000000002000
r29 : 0000000000000800 r30 : e00000003f399754 r31 : e00000003f399758
Call Trace:
[<e000000004413e80>] (no symbol)
sp=e0000000048ab770 bsp=e0000000048a5708
[<e00000000442e330>] (no symbol)
sp=e0000000048ab940 bsp=e0000000048a56d0
[<e000000004449910>] (no symbol)
sp=e0000000048ab940 bsp=e0000000048a5670
[<e00000000440eb20>] (no symbol)
sp=e0000000048ab9d0 bsp=e0000000048a5670
[<e00000000446f310>] (no symbol)
sp=e0000000048abba0 bsp=e0000000048a55f0
[<e0000000044c9d00>] (no symbol)
sp=e0000000048abba0 bsp=e0000000048a55d0
[<e0000000044cbed0>] (no symbol)
sp=e0000000048abba0 bsp=e0000000048a5598
[<e0000000046665b0>] (no symbol)
sp=e0000000048abba0 bsp=e0000000048a54d8
[<e000000004666cb0>] (no symbol)
sp=e0000000048abba0 bsp=e0000000048a5438
[<e0000000046863a0>] (no symbol)
sp=e0000000048abba0 bsp=e0000000048a53d8
[<e000000004663d30>] (no symbol)
sp=e0000000048abbb0 bsp=e0000000048a5368
[<a0000000000349f0>] (no symbol)
sp=e0000000048abbb0 bsp=e0000000048a52f8
[<a00000000001c3a0>] (no symbol)
sp=e0000000048abbb0 bsp=e0000000048a5210
[<e000000004411700>] (no symbol)
sp=e0000000048abbb0 bsp=e0000000048a51d0
[<e000000004411d40>] (no symbol)
sp=e0000000048abbb0 bsp=e0000000048a5190
[<e0000000044134f0>] (no symbol)
sp=e0000000048abbb0 bsp=e0000000048a5158
[<e00000000440eb20>] (no symbol)
sp=e0000000048abbb0 bsp=e0000000048a5158
[<e000000004413a40>] (no symbol)
sp=e0000000048abd80 bsp=e0000000048a4e50
[<e000000004414770>] (no symbol)
sp=e0000000048abd80 bsp=e0000000048a4e30
[<e000000004414900>] (no symbol)
sp=e0000000048abe20 bsp=e0000000048a4e00
[<e000000004409090>] (no symbol)
sp=e0000000048abe20 bsp=e0000000048a4de0
[<e000000004878cc0>] (no symbol)
sp=e0000000048abe20 bsp=e0000000048a4d80
[<e0000000044085e0>] (no symbol)
sp=e0000000048abe30 bsp=e0000000048a4d80
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
* Re: kernel troubles on rx2600 while running sbuild
From: Thibaut VARENE @ 2004-09-29 7:47 UTC
To: linux-ia64
On Wed, 29 Sep 2004 09:17:20 +0200
Thibaut VARENE <T-Bone@parisc-linux.org> wrote:
> Hi,
>
> Another report, that happened tonight while the box was idling (after
> having built quite a lot of packages - about 1000 - without problems),
> and which was apparently triggered by a cron task (at 5 AM). There
> again (see at the end) there's a kernel NULL pointer dereference. Full
> dump attached.
>
> rx2600 1-way 2GB RAM
> Debian sarge, kernel-image-2.4.27-mckinley
After rebooting, it turned out that the last build the box was running
(gcc-3.4) had not completed, even though the box was idle when it died
(according to the mrtg reports). For the record, I saw my glibc build
stall in the test suite (i.e. the box goes idle and no progress is made)
shortly before gcc was built.
Then, on the same box after a fresh reboot, running "du -sh" on the
unclean gcc-3.4 build directory (I tried twice) made du segfault and
dmesg show the following (note that I tried du elsewhere and it worked
without problems):
Unable to handle kernel paging request at virtual address
00000000000100a6 du[3608]: Oops 8813272891392
Pid: 3608, CPU 0, comm: du
psr : 0000101008026018 ifs : 8000000000000690 ip : [<e0000000044da450>]
Not tainted ip is at (no symbol)
unat: 0000000000000000 pfs : 0000000000000690 rsc : 0000000000000003
rnat: 0000000000000010 bsps: 0000000000000000 pr : 80000000ff56a659
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : e0000000044da3f0 b6 : e000000004494020 b7 : a00000000009ee00
f6 : 1003e0000000000001000 f7 : 000000000000000000000
f8 : 000000000000000000000 f9 : 1000b8000000000000000
f10 : 000000000000000000000 f11 : 1003e0000000000000000
r1 : e000000004b2ee70 r2 : 000000000001003e r3 : e000000034258a88
r8 : 0000000000000000 r9 : e000000034258980 r10 : e00000003438cf88
r11 : 0000000000000002 r12 : e00000000fc5fe00 r13 : e00000000fc58000
r14 : 000000000000f96e r15 : 00000000000100a6 r16 : 0000000000000000
r17 : e00000003f3e2388 r18 : e00000003f3f15e8 r19 : e00000003f3f15d0
r20 : e00000003f3e2388 r21 : 00000000000ab7ba r22 : e000000001025928
r23 : 000000000004df26 r24 : 000000003e5b8000 r25 : e00000003f3f15e0
r26 : e00000000493ce48 r27 : 2000000000000000 r28 : e00000000493ce48
r29 : e00000003f3f15d8 r30 : e00000003f3f15d0 r31 : e00000003f3f15c8
Call Trace:
[<e000000004413e80>] (no symbol)
sp=e00000000fc5f9d0 bsp=e00000000fc59030
[<e00000000442e330>] (no symbol)
sp=e00000000fc5fba0 bsp=e00000000fc58ff0
[<e000000004449910>] (no symbol)
sp=e00000000fc5fba0 bsp=e00000000fc58f90
[<e00000000440eb20>] (no symbol)
sp=e00000000fc5fc30 bsp=e00000000fc58f90
[<e0000000044da450>] (no symbol)
sp=e00000000fc5fe00 bsp=e00000000fc58f10
[<e00000000440e920>] (no symbol)
sp=e00000000fc5fe30 bsp=e00000000fc58ef8
<1>Unable to handle kernel paging request at virtual address
00000000000100a6 du[3646]: Oops 8813272891392
Pid: 3646, CPU 0, comm: du
psr : 0000101008026018 ifs : 8000000000000690 ip : [<e0000000044da450>]
Not tainted ip is at (no symbol)
unat: 0000000000000000 pfs : 0000000000000690 rsc : 0000000000000003
rnat: 0000000000000010 bsps: 00000000000db3da pr : 80000000ff56a659
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : e0000000044da3f0 b6 : e000000004494020 b7 : a00000000009ee00
f6 : 1003e0000000000001000 f7 : 000000000000000000000
f8 : 000000000000000000000 f9 : 1000b8000000000000000
f10 : 000000000000000000000 f11 : 1003e0000000000000000
r1 : e000000004b2ee70 r2 : 000000000001003e r3 : e000000034258a88
r8 : 0000000000000000 r9 : e000000034258980 r10 : e00000003438cf88
r11 : 0000000000000002 r12 : e00000000fccfe00 r13 : e00000000fcc8000
r14 : 000000000000f96e r15 : 00000000000100a6 r16 : 0000000000000000
r17 : e00000003f3e2388 r18 : e00000003f3f15e8 r19 : e00000003f3f15d0
r20 : e00000003f3e2388 r21 : 00000000000ab7ba r22 : e000000001025928
r23 : 000000000004df26 r24 : 000000003e5b8000 r25 : e00000003f3f15e0
r26 : e00000000493ce48 r27 : 2000000000000000 r28 : e00000000493ce48
r29 : e00000003f3f15d8 r30 : e00000003f3f15d0 r31 : e00000003f3f15c8
Call Trace:
[<e000000004413e80>] (no symbol)
sp=e00000000fccf9d0 bsp=e00000000fcc9030
[<e00000000442e330>] (no symbol)
sp=e00000000fccfba0 bsp=e00000000fcc8ff0
[<e000000004449910>] (no symbol)
sp=e00000000fccfba0 bsp=e00000000fcc8f90
[<e00000000440eb20>] (no symbol)
sp=e00000000fccfc30 bsp=e00000000fcc8f90
[<e0000000044da450>] (no symbol)
sp=e00000000fccfe00 bsp=e00000000fcc8f10
[<e00000000440e920>] (no symbol)
sp=e00000000fccfe30 bsp=e00000000fcc8ef8
HTH,
Thibaut VARENE
The PA/Linux ESIEE Team
http://www.pateam.org/
* Re: kernel troubles on rx2600 while running sbuild
From: Thibaut VARENE @ 2004-09-29 20:31 UTC
To: linux-ia64
On Wed, 29 Sep 2004 13:13:05 -0600
dann frazier <dannf@hp.com> wrote:
> > Unable to handle kernel paging request at virtual address
> > 00000000000100a6 du[3608]: Oops 8813272891392
>
> Can you run these through ksymoops and provide the version of the
> 2.4.27 kernel you're running?
[varenet@envy ~]$ uname -a
Linux envy 2.4.27-1-mckinley #1 Fri Sep 3 13:33:45 MDT 2004 ia64
GNU/Linux
Unfortunately, I can't run ksymoops anymore. This system is under high
load and has to be restored to a working state quickly. The partition
containing the troublesome directory has been reinitialized (though
neither fsck nor badblocks showed anything wrong with it).
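For anyone who still has the matching System.map at hand, the bracketed addresses in the dumps can be resolved roughly by hand. The following is only a minimal sketch of a nearest-symbol lookup, not a replacement for ksymoops. The map entries in SAMPLE_MAP and their hypothetical_* names are invented for illustration; the real entries would come from the System.map of the running kernel. Addresses that are not covered by System.map (the a0000000... ones in the traces may be such a case) will not resolve to anything meaningful this way.

```python
import bisect
import re


def load_system_map(lines):
    """Parse System.map-style lines ('address type name') into a sorted list."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'name+0xoffset' for the closest symbol at or below address."""
    addresses = [a for a, _ in symbols]
    i = bisect.bisect_right(addresses, address) - 1
    if i < 0:
        return None  # address lies below the first known symbol
    base, name = symbols[i]
    return f"{name}+{address - base:#x}"


def oops_addresses(text):
    """Extract the bracketed addresses ('[<e000...>]') from an oops dump."""
    return [int(m, 16) for m in re.findall(r"\[<([0-9a-f]{16})>\]", text)]


# HYPOTHETICAL map entries, for demonstration only. The real ones come from
# the System.map that belongs to the kernel which produced the oops.
SAMPLE_MAP = [
    "e000000004400000 T hypothetical_func_a",
    "e0000000044da000 T hypothetical_func_b",
    "e000000004500000 T hypothetical_func_c",
]

if __name__ == "__main__":
    symbols = load_system_map(SAMPLE_MAP)
    dump = "ip : [<e0000000044da450>]\n[<e00000000440e920>] (no symbol)"
    for addr in oops_addresses(dump):
        print(f"{addr:016x} {resolve(symbols, addr)}")
```

The lookup simply picks the last symbol whose address is not above the one being resolved, so a result is only as trustworthy as the map it was given.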
I have some more reports.
Back on the SMP box (2.6.8.1-1-mckinley-smp), I tried again to build
gcc-3.4_3.4.2-2ubuntu1, the build that failed on the 2.4 UP box. It
seems that I have been able to reproduce the "stall effect": several
tests from the Ada testsuite stall for no apparent reason. The only
thing they have in common is that they ALL get stuck issuing an
unaligned access at the SAME ip (see the inlined dmesg output). I had to
kill those processes with SIGKILL, since SIGTERM had no effect.
Two things about this:
1) The PID (?) shown in parentheses in this dump differs from (it is
greater than) the PID reported by ps. These tests are single-threaded.
2) The ip looks like it is in a shared library, and ldd showed the following:
libgnarl-3.4.so.1 => not found
libgnat-3.4.so.1 => not found
libpthread.so.0 => /lib/tls/libpthread.so.0 (0x2000000000040000)
libc.so.6.1 => /lib/tls/libc.so.6.1 (0x2000000000070000)
/lib/ld-linux-ia64.so.2 => /lib/ld-linux-ia64.so.2 (0x2000000000000000)
Unfortunately, before I could change LD_LIBRARY_PATH, the build finished
and the executables were gone.
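As a rough cross-check of point 2, the reported ip can be compared with the load addresses ldd printed. This is a minimal sketch under two assumptions: that the addresses shown by ldd match those of the stalled processes, and that the low four bits of the reported ip hold the slot number within a 16-byte ia64 instruction bundle. Since libgnarl and libgnat were not found, they are missing from the table and cannot be ruled out as the real owner of the ip.

```python
# Load addresses copied from the ldd output above. ldd could not find
# libgnarl and libgnat, so they are absent here and cannot be ruled out.
LDD_BASES = {
    "/lib/ld-linux-ia64.so.2": 0x2000000000000000,
    "/lib/tls/libpthread.so.0": 0x2000000000040000,
    "/lib/tls/libc.so.6.1": 0x2000000000070000,
}


def split_ip(ip):
    """Split a reported ia64 ip into (bundle address, slot number)."""
    return ip & ~0xF, ip & 0xF


def nearest_base(ip, bases):
    """Return (library, offset) for the highest load address at or below ip."""
    candidates = [(base, name) for name, base in bases.items() if base <= ip]
    if not candidates:
        return None
    base, name = max(candidates)
    return name, ip - base


if __name__ == "__main__":
    ip = 0x200000000037af71
    bundle, slot = split_ip(ip)
    print(f"bundle={bundle:#x} slot={slot}")
    name, offset = nearest_base(ip, LDD_BASES)
    print(f"nearest mapping below ip: {name} (offset {offset:#x})")
```

The nearest mapping below the ip is only a candidate: without the size of each mapping (for example from /proc/<pid>/maps of a live process) the ip may just as well fall into a library loaded above it.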
Only the Ada testsuite was affected. The rest of the build went mostly
fine (I had trouble at the very end of the process, with dpkg-*, but I
don't think this is related).
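The observation that every stalled test faults at the same ip can be checked mechanically against the dmesg lines inlined at the end of this message. Below is a minimal sketch that groups the "unaligned access" messages by process and collects the distinct ips and faulting addresses. The regular expression is an assumption based only on the line format seen in this dump, and SAMPLE holds just four of those lines for brevity.

```python
import re
from collections import defaultdict

# Pattern assumed from the lines in this dump: comm(pid): unaligned access ...
LINE_RE = re.compile(
    r"^(?P<comm>\S+)\((?P<pid>\d+)\): unaligned access to "
    r"0x(?P<addr>[0-9a-f]+), ip=0x(?P<ip>[0-9a-f]+)$"
)


def summarize(lines):
    """Group unaligned-access messages by (comm, pid)."""
    hits = defaultdict(lambda: {"count": 0, "addrs": set(), "ips": set()})
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore anything that is not an unaligned-access line
        entry = hits[(m["comm"], int(m["pid"]))]
        entry["count"] += 1
        entry["addrs"].add(int(m["addr"], 16))
        entry["ips"].add(int(m["ip"], 16))
    return dict(hits)


# Four lines copied from the dmesg output at the end of this message.
SAMPLE = """\
c91004b(10788): unaligned access to 0x6000000000002469, ip=0x200000000037af71
c91004b(10788): unaligned access to 0x6000000000002469, ip=0x200000000037af71
c940010(12248): unaligned access to 0x6000000000003e59, ip=0x200000000037af71
c95022b(13988): unaligned access to 0x6000000000003cf9, ip=0x200000000037af71
""".splitlines()

if __name__ == "__main__":
    summary = summarize(SAMPLE)
    all_ips = set().union(*(e["ips"] for e in summary.values()))
    print("distinct ips:", sorted(hex(ip) for ip in all_ips))
    for (comm, pid), e in sorted(summary.items(), key=lambda kv: kv[0][1]):
        misalign = sorted({a % 8 for a in e["addrs"]})
        print(f"{comm}({pid}): {e['count']} messages, addr mod 8 = {misalign}")
```

Feeding it the whole dmesg output instead of SAMPLE would show at a glance whether any process faults at a different ip or with a different misalignment.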
HTH,
Thibaut VARENE
The PA/Linux ESIEE Team
http://www.pateam.org/
c91004b(10788): unaligned access to 0x6000000000002469, ip=0x200000000037af71
c91004b(10788): unaligned access to 0x6000000000002469, ip=0x200000000037af71
c940010(12248): unaligned access to 0x6000000000003e59, ip=0x200000000037af71
c940010(12248): unaligned access to 0x6000000000003e59, ip=0x200000000037af71
c94002g(12929): unaligned access to 0x600000000000f709, ip=0x200000000037af71
c94002g(12929): unaligned access to 0x600000000000f709, ip=0x200000000037af71
c94007a(13287): unaligned access to 0x600000000000fbb1, ip=0x200000000037af71
c94007a(13287): unaligned access to 0x600000000000fbb1, ip=0x200000000037af71
c95022b(13988): unaligned access to 0x6000000000003cf9, ip=0x200000000037af71
c95022b(13988): unaligned access to 0x6000000000003cf9, ip=0x200000000037af71
c95022b(13988): unaligned access to 0x6000000000003cf9, ip=0x200000000037af71
c95022b(13988): unaligned access to 0x6000000000003cf9, ip=0x200000000037af71
c95072a(14703): unaligned access to 0x6000000000001cd1, ip=0x200000000037af71
c95072a(14703): unaligned access to 0x6000000000001cd1, ip=0x200000000037af71
c95072b(14744): unaligned access to 0x6000000000001f11, ip=0x200000000037af71
c95072b(14744): unaligned access to 0x6000000000001f11, ip=0x200000000037af71
c954016(16750): unaligned access to 0x6000000000002791, ip=0x200000000037af71
c954016(16750): unaligned access to 0x6000000000002791, ip=0x200000000037af71
c954017(16791): unaligned access to 0x6000000000002969, ip=0x200000000037af71
c954017(16791): unaligned access to 0x6000000000002969, ip=0x200000000037af71
c974004(19316): unaligned access to 0x60000000000019d9, ip=0x200000000037af71
c974004(19316): unaligned access to 0x60000000000019d9, ip=0x200000000037af71
c974009(19510): unaligned access to 0x6000000000000b71, ip=0x200000000037af71
c974009(19510): unaligned access to 0x6000000000000b71, ip=0x200000000037af71
c9a011a(20229): unaligned access to 0x600000000000de49, ip=0x200000000037af71
c9a011a(20229): unaligned access to 0x600000000000de49, ip=0x200000000037af71
cb1010a(23612): unaligned access to 0x60000000000014b9, ip=0x200000000037af71
cb1010a(23612): unaligned access to 0x60000000000014b9, ip=0x200000000037af71
cb20001(23724): unaligned access to 0x60000000000023c1, ip=0x200000000037af71
cb20001(23724): unaligned access to 0x60000000000023c1, ip=0x200000000037af71
cb20004(23802): unaligned access to 0x6000000000001619, ip=0x200000000037af71
cb20004(23802): unaligned access to 0x6000000000001619, ip=0x200000000037af71
cb20004(23802): unaligned access to 0x6000000000001619, ip=0x200000000037af71
cb20004(23802): unaligned access to 0x6000000000001619, ip=0x200000000037af71
cb41002(24763): unaligned access to 0x60000000000063a9, ip=0x200000000037af71
cb41002(24763): unaligned access to 0x60000000000063a9, ip=0x200000000037af71
cb5001a(24866): unaligned access to 0x600000000000e7f1, ip=0x200000000037af71
cb5001a(24866): unaligned access to 0x600000000000e7f1, ip=0x200000000037af71
cb5001b(24905): unaligned access to 0x600000000000f979, ip=0x200000000037af71
cb5001b(24905): unaligned access to 0x600000000000f979, ip=0x200000000037af71
cb5002a(24943): unaligned access to 0x6000000000001e01, ip=0x200000000037af71
cb5002a(24943): unaligned access to 0x6000000000001e01, ip=0x200000000037af71