public inbox for linux-ia64@vger.kernel.org
* crash on SysRq : Show Memory
@ 2004-03-09 16:09 Aron Griffis
  2004-03-09 17:14 ` Aron Griffis
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: Aron Griffis @ 2004-03-09 16:09 UTC (permalink / raw)
  To: linux-ia64

I'm running davidm's bk kernel, version 2.6.4-rc1, built with gcc-3.3.
I've been poking around with SysRq and hit a crash last evening with the
following footprint.  After this the machine was wedged so badly that
the Ethernet MP stopped responding completely, so I was unable to power
cycle the machine until I arrived at the office.

telnet> send brk
SysRq : Show Memory
swapper[0]: Oops 8813272891392 [1]

Pid: 0, CPU 0, comm:              swapper
psr : 0000101008026018 ifs : 8000000000000389 ip  : [<a00000010005b9e0>]    Tainted: GF
ip is at show_mem+0x100/0x260
unat: 0000000000000000 pfs : 0000000000000389 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 80000000ff556565
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a00000010005b970 b6  : a00000010006dc70 b7  : a0000001002eda00
f6  : 0fffbccccccccc8c00000 f7  : 0ffdaa200000000000000
f8  : 100008000000000000000 f9  : 10002a000000000000000
f10 : 0fffcccccccccc8c00000 f11 : 1003e0000000000000000
r1  : a000000100afd5c0 r2  : ffffffffffffff05 r3  : 000000000103ffef
r8  : a0000001009174f0 r9  : a0000001009174f0 r10 : 0000000000000000
r11 : a0000001009172a0 r12 : e0000000047ffb90 r13 : e0000000047f8000
r14 : a000000100848510 r15 : a00000010084dcf8 r16 : a0007fffff0fffb0
r17 : 000000000100ffff r18 : a0007fffff0fffd0 r19 : a0007fffaec00000
r20 : a000000100917508 r21 : 0000000000000000 r22 : 000000000504fffb
r23 : 000000000100ffff r24 : 0000000000000000 r25 : 0000000000000001
r26 : 00000000000008eb r27 : 0000000000000001 r28 : a000000100931c12
r29 : 0000000000004b3a r30 : 0000000000000000 r31 : a000000100917258

Call Trace:
 [<a000000100018420>] show_stack+0x80/0xa0
                                sp=e0000000047ff760 bsp=e0000000047f96b8
 [<a00000010003ca70>] die+0x170/0x200
                                sp=e0000000047ff930 bsp=e0000000047f9680
 [<a00000010005a850>] ia64_do_page_fault+0x350/0x980
                                sp=e0000000047ff930 bsp=e0000000047f9618
 [<a000000100011540>] ia64_leave_kernel+0x0/0x260
                                sp=e0000000047ff9c0 bsp=e0000000047f9618
 [<a00000010005b9e0>] show_mem+0x100/0x260
                                sp=e0000000047ffb90 bsp=e0000000047f95c8
 [<a0000001003928a0>] sysrq_handle_showmem+0x20/0x40
                                sp=e0000000047ffb90 bsp=e0000000047f95b0
 [<a000000100392dd0>] __handle_sysrq_nolock+0x110/0x260
                                sp=e0000000047ffb90 bsp=e0000000047f9568
 [<a000000100392c90>] handle_sysrq+0x70/0xa0
                                sp=e0000000047ffb90 bsp=e0000000047f9538
 [<a00000010040e110>] receive_chars+0x330/0x680
                                sp=e0000000047ffb90 bsp=e0000000047f9448
 [<a00000010040ebf0>] serial8250_interrupt+0x1b0/0x200
                                sp=e0000000047ffba0 bsp=e0000000047f93d8
 [<a000000100014d40>] handle_IRQ_event+0xa0/0x120
                                sp=e0000000047ffbb0 bsp=e0000000047f9390
 [<a000000100015780>] do_IRQ+0x280/0x380
                                sp=e0000000047ffbb0 bsp=e0000000047f9340
 [<a000000100017400>] ia64_handle_irq+0xa0/0x1a0
                                sp=e0000000047ffbb0 bsp=e0000000047f9308
 [<a000000100011540>] ia64_leave_kernel+0x0/0x260
                                sp=e0000000047ffbb0 bsp=e0000000047f9308
 [<a0000001000177c0>] ia64_pal_call_static+0x80/0xa0
                                sp=e0000000047ffd80 bsp=e0000000047f9000
 [<a000000100018e30>] default_idle+0x90/0x140
                                sp=e0000000047ffd80 bsp=e0000000047f8fd8
 [<a000000100019000>] cpu_idle+0x120/0x1e0
                                sp=e0000000047ffe20 bsp=e0000000047f8f78
 [<a000000100008d30>] rest_init+0x90/0xe0
                                sp=e0000000047ffe20 bsp=e0000000047f8f60
 [<a0000001007c4e10>] start_kernel+0x490/0x520
                                sp=e0000000047ffe20 bsp=e0000000047f8f00
 [<a0000001000081a0>] _start+0x280/0x2a0
                                sp=e0000000047ffe30 bsp=e0000000047f8e90
 <<00>>KIenr nienlt eprarnuipct:  hAainedel,e rk i-l lniontg  siynntceirnrugp
 h a                                                                         t
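
A note for readers decoding the first line of the oops: on ia64 the number printed after "Oops" appears to be the interruption status register (ISR) value that the page-fault handler hands to die(), printed in decimal. That reading is an assumption; the report itself does not say so. Under it, a short Python sketch shows which bits are set. The bit positions 32, 33 and 34 for execute, write and read follow the ia64 ISR layout, and the helper names are purely illustrative:

```python
# Assumed ISR access-type bits (ia64): x = 32, w = 33, r = 34.
ACCESS_BITS = {32: "execute", 33: "write", 34: "read"}


def set_bits(value):
    """Return the positions of all bits set in a 64-bit value."""
    return [bit for bit in range(64) if (value >> bit) & 1]


def access_types(isr):
    """Name the access-type bits that are set in an ISR value."""
    return [name for bit, name in sorted(ACCESS_BITS.items())
            if (isr >> bit) & 1]


OOPS_CODE = 8813272891392  # the number from "swapper[0]: Oops ... [1]"

print(hex(OOPS_CODE))           # the same value in hex
print(set_bits(OOPS_CODE))      # which bit positions are set
print(access_types(OOPS_CODE))  # which access-type bits are set
```

If the assumption holds, the value has the read bit set, so the fault was taken on a read access.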

-- 
Aron Griffis
hp Linux and Open Source Lab
Key fingerprint = 4601 AE87 379D A917 BA62  5263 C284 0366 5E6A 3C6B




* Re: crash on SysRq : Show Memory
  2004-03-09 16:09 crash on SysRq : Show Memory Aron Griffis
@ 2004-03-09 17:14 ` Aron Griffis
  2004-03-09 17:21 ` Aron Griffis
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Aron Griffis @ 2004-03-09 17:14 UTC (permalink / raw)
  To: linux-ia64

Aron Griffis wrote:	[Tue Mar 09 2004, 11:09:04AM EST]
> I'm running davidm's bk kernel, version 2.6.4-rc1, built with gcc-3.3.
                                                                ^^^^^^^^
Sorry, I meant gcc-3.3.3

-- 
Aron Griffis
hp Linux and Open Source Lab
Key fingerprint = 4601 AE87 379D A917 BA62  5263 C284 0366 5E6A 3C6B



* Re: crash on SysRq : Show Memory
  2004-03-09 16:09 crash on SysRq : Show Memory Aron Griffis
  2004-03-09 17:14 ` Aron Griffis
@ 2004-03-09 17:21 ` Aron Griffis
  2004-03-09 17:54 ` Jesse Barnes
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Aron Griffis @ 2004-03-09 17:21 UTC (permalink / raw)
  To: linux-ia64

Aron Griffis wrote:	[Tue Mar 09 2004, 11:09:04AM EST]
>  <<00>>KIenr nienlt eprarnuipct:  hAainedel,e rk i-l lniontg  siynntceirnrugp
>  h a                                                                         t

Deciphered:
    Kernel panic: Aiee, killing interrupt
    In interrupt handler - not syncing

-- 
Aron Griffis
hp Linux and Open Source Lab
Key fingerprint = 4601 AE87 379D A917 BA62  5263 C284 0366 5E6A 3C6B



* Re: crash on SysRq : Show Memory
  2004-03-09 16:09 crash on SysRq : Show Memory Aron Griffis
  2004-03-09 17:14 ` Aron Griffis
  2004-03-09 17:21 ` Aron Griffis
@ 2004-03-09 17:54 ` Jesse Barnes
  2004-03-09 18:47 ` Aron Griffis
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jesse Barnes @ 2004-03-09 17:54 UTC (permalink / raw)
  To: linux-ia64

On Tue, Mar 09, 2004 at 11:09:04AM -0500, Aron Griffis wrote:
> I'm running davidm's bk kernel, version 2.6.4-rc1, built with gcc-3.3.
> I've been poking around with SysRq and hit a crash last evening with the
> following footprint.  After this the machine was wedged so badly that
> the Ethernet MP stopped responding completely, so I was unable to power
> cycle the machine until I arrived at the office.

What kind of machine are you running on?  What's your .config look like?

Jesse


* Re: crash on SysRq : Show Memory
  2004-03-09 16:09 crash on SysRq : Show Memory Aron Griffis
                   ` (2 preceding siblings ...)
  2004-03-09 17:54 ` Jesse Barnes
@ 2004-03-09 18:47 ` Aron Griffis
  2004-03-09 18:57 ` Grant Grundler
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Aron Griffis @ 2004-03-09 18:47 UTC (permalink / raw)
  To: linux-ia64

Jesse Barnes wrote:	[Tue Mar 09 2004, 12:54:25PM EST]
> What kind of machine are you running on?  What's your .config look like?

Long's Peak.  I've dropped the config at
http://dev.gentoo.org/~agriffis/sysrq_oops/

-- 
Aron Griffis
hp Linux and Open Source Lab
Key fingerprint = 4601 AE87 379D A917 BA62  5263 C284 0366 5E6A 3C6B



* Re: crash on SysRq : Show Memory
  2004-03-09 16:09 crash on SysRq : Show Memory Aron Griffis
                   ` (3 preceding siblings ...)
  2004-03-09 18:47 ` Aron Griffis
@ 2004-03-09 18:57 ` Grant Grundler
  2004-03-09 19:25 ` Jesse Barnes
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Grant Grundler @ 2004-03-09 18:57 UTC (permalink / raw)
  To: linux-ia64

On Tue, Mar 09, 2004 at 01:47:15PM -0500, Aron Griffis wrote:
> Jesse Barnes wrote:	[Tue Mar 09 2004, 12:54:25PM EST]
> > What kind of machine are you running on?  What's your .config look like?
> 
> Long's Peak.

That's an RX2600 for anyone wanting to find the product on www.hp.com

grant


* Re: crash on SysRq : Show Memory
  2004-03-09 16:09 crash on SysRq : Show Memory Aron Griffis
                   ` (4 preceding siblings ...)
  2004-03-09 18:57 ` Grant Grundler
@ 2004-03-09 19:25 ` Jesse Barnes
  2004-03-09 20:10 ` Kenneth Chen
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jesse Barnes @ 2004-03-09 19:25 UTC (permalink / raw)
  To: linux-ia64

On Tue, Mar 09, 2004 at 10:57:48AM -0800, Grant Grundler wrote:
> On Tue, Mar 09, 2004 at 01:47:15PM -0500, Aron Griffis wrote:
> > Jesse Barnes wrote:	[Tue Mar 09 2004, 12:54:25PM EST]
> > > What kind of machine are you running on?  What's your .config look like?
> > 
> > Long's Peak.
> 
> That's an RX2600 for anyone wanting to find the product on www.hp.com

Ah, zx1 then.  I accidentally deleted the original backtrace and it
doesn't appear to be in the list archives yet, but since the panic
occurred at arch/ia64/mm/contig.c:show_mem+0x100, I'm guessing the
problem is somewhere after the call to show_free_areas().  I don't have
a zx1 box that's easy to test with, so I probably won't be much help.

Jesse


* RE: crash on SysRq : Show Memory
  2004-03-09 16:09 crash on SysRq : Show Memory Aron Griffis
                   ` (5 preceding siblings ...)
  2004-03-09 19:25 ` Jesse Barnes
@ 2004-03-09 20:10 ` Kenneth Chen
  2004-03-09 21:13 ` Aron Griffis
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Kenneth Chen @ 2004-03-09 20:10 UTC (permalink / raw)
  To: linux-ia64


Jesse Barnes wrote on Tue, March 09, 2004 11:26 AM
> On Tue, Mar 09, 2004 at 10:57:48AM -0800, Grant Grundler wrote:
> > On Tue, Mar 09, 2004 at 01:47:15PM -0500, Aron Griffis wrote:
> > > Jesse Barnes wrote:	[Tue Mar 09 2004, 12:54:25PM EST]
> > > > What kind of machine are you running on?
> > >
> > > Long's Peak.
> >
> > That's an RX2600 for anyone wanting to find the product on www.hp.com
>
> Ah, zx1 then.  I accidentally deleted the original backtrace and it
> doesn't appear to be in the list archives yet, but since the panic
> occurred at arch/ia64/mm/contig.c:show_mem+0x100, I'm guessing the
> problem is somewhere after the call to show_free_areas().  I don't have
> a zx1 box that's easy to test with, so I probably won't be much help.

Looks like it got past show_free_areas(); the faulting IP was in the
while loop that accesses the mem_map array.

By the way, the local variable total is redundant in that function.
The same data already exists in max_mapnr.

- Ken




* Re: crash on SysRq : Show Memory
  2004-03-09 16:09 crash on SysRq : Show Memory Aron Griffis
                   ` (6 preceding siblings ...)
  2004-03-09 20:10 ` Kenneth Chen
@ 2004-03-09 21:13 ` Aron Griffis
  2004-03-09 21:36 ` Aron Griffis
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Aron Griffis @ 2004-03-09 21:13 UTC (permalink / raw)
  To: linux-ia64

Jesse Barnes wrote:	[Tue Mar 09 2004, 02:25:41PM EST]
> Ah, zx1 then.  I accidentally deleted the original backtrace

http://dev.gentoo.org/~agriffis/oops.txt

> and it
> doesn't appear to be in the list archives yet, but since the panic
> occurred at arch/ia64/mm/contig.c:show_mem+0x100, I'm guessing the
> problem is somewhere after the call to show_free_areas().  I don't have
> a zx1 box that's easy to test with, so I probably won't be much help.

So I have the vmlinux which was built with debugging symbols (I assume
this is why I got a backtrace as well)...  however I'm not immediately
seeing how to trace this back into show_mem()

$ gdb vmlinux
(gdb) list *(show_mem+0x100)
0xa00000010005b9e0 is in show_mem (bitops.h:280).
275     }
276
277     static __inline__ int
278     test_bit (int nr, const volatile void *addr)
279     {
280             return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
281     }
282
283     /**
284      * ffz - find the first zero bit in a long word

--
Aron Griffis
hp Linux and Open Source Lab
Key fingerprint = 4601 AE87 379D A917 BA62  5263 C284 0366 5E6A 3C6B



* Re: crash on SysRq : Show Memory
  2004-03-09 16:09 crash on SysRq : Show Memory Aron Griffis
                   ` (7 preceding siblings ...)
  2004-03-09 21:13 ` Aron Griffis
@ 2004-03-09 21:36 ` Aron Griffis
  2004-03-09 22:49 ` Kenneth Chen
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Aron Griffis @ 2004-03-09 21:36 UTC (permalink / raw)
  To: linux-ia64

Kenneth Chen wrote:	[Tue Mar 09 2004, 03:10:50PM EST]
> Looks like it passed beyond show_free_areas(), faulting IP was in the
> while loop accessing mem_map variable.

How did you determine that?

-- 
Aron Griffis
hp Linux and Open Source Lab
Key fingerprint = 4601 AE87 379D A917 BA62  5263 C284 0366 5E6A 3C6B



* RE: crash on SysRq : Show Memory
  2004-03-09 16:09 crash on SysRq : Show Memory Aron Griffis
                   ` (8 preceding siblings ...)
  2004-03-09 21:36 ` Aron Griffis
@ 2004-03-09 22:49 ` Kenneth Chen
  2004-03-09 23:31 ` Aron Griffis
  2004-03-09 23:39 ` Andreas Schwab
  11 siblings, 0 replies; 13+ messages in thread
From: Kenneth Chen @ 2004-03-09 22:49 UTC (permalink / raw)
  To: linux-ia64

Aron Griffis wrote on Tue, March 09, 2004 1:36 PM
> > Looks like it passed beyond show_free_areas(), faulting IP was
> > in the while loop accessing mem_map variable.

> How did you determine that?

Run objdump on vmlinux, then look at the disassembled code at the
faulting IP:

show_mem+0x100
a00000010003cd00:    [MMI]       ld4.acq r21=[r16];;

It is page faulting on r16.  Look up r16 in the panic dump; it shows
r16 : a0007fffff0fffb0, which basically points to a page struct in
the mem_map array.  You might want to verify whether that address is
valid or not.
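
One cheap first check on such an address is its ia64 region number, which is the top three bits of the virtual address. The sketch below assumes the usual Linux/ia64 layout, where region 5 holds the kernel's virtually mapped area and region 7 is the cached identity mapping. The function name is illustrative and the addresses are copied from the oops:

```python
def ia64_region(addr):
    """Return the region number (bits 63..61) of an ia64 virtual address."""
    return (addr >> 61) & 0x7


R16_FAULT_ADDR = 0xa0007fffff0fffb0  # r16, the faulting page struct pointer
IP = 0xa00000010005b9e0              # faulting ip, inside kernel text
R12_STACK = 0xe0000000047ffb90       # r12, the kernel stack pointer

for name, addr in [("r16", R16_FAULT_ADDR), ("ip", IP), ("r12", R12_STACK)]:
    print(name, hex(addr), "region", ia64_region(addr))
```

This only says which region the pointer falls in. Whether that particular page of the region is actually mapped still has to be checked separately.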

Examine the code above and below a little and you will see that the
while loop lies between show_mem+0xe0 and show_mem+0x1dc.  Variable i is
stored in r17/r3; look at these instructions to see how the pointer is
calculated from index i
(i*5*16 + base of mem_map, since sizeof(struct page) = 80):

a00000010003cce6:                  sxt4 r23=r17;;
a00000010003ccec:                  shladd r22=r23,2,r23;;
a00000010003ccf0:      [MMI]       shladd r16=r22,4,r19;;

Anyway, hope you get the idea ....
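
The arithmetic can be replayed against the register dump in the original report. The following is a sketch in Python rather than anything from the kernel; the helper names are illustrative, and the three constants are r17, r19 and r16 as printed in the oops:

```python
MASK64 = (1 << 64) - 1


def sxt4(value):
    """Sign-extend the low 32 bits to 64 bits, like ia64 sxt4."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value


def shladd(a, count, b):
    """ia64 shladd: (a << count) + b, truncated to 64 bits."""
    return ((a << count) + b) & MASK64


def page_struct_address(i, mem_map):
    """Replay the three instructions that compute &mem_map[i]."""
    r23 = sxt4(i)                    # sxt4   r23=r17
    r22 = shladd(r23, 2, r23)        # shladd r22=r23,2,r23  -> i*5
    return shladd(r22, 4, mem_map)   # shladd r16=r22,4,r19  -> i*80 + mem_map


R17 = 0x000000000100ffff  # index i at the time of the fault
R19 = 0xa0007fffaec00000  # base of mem_map
R16 = 0xa0007fffff0fffb0  # faulting address from the dump

print(hex(page_struct_address(R17, R19)))  # address computed from i
print(hex((R16 - R19) // 80))              # index recovered from r16
```

With the values from the dump the computed address equals r16, and the intermediate i*5 matches r22 (000000000504fffb), so the faulting page struct is entry 0x100ffff of mem_map.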

- Ken




* Re: crash on SysRq : Show Memory
  2004-03-09 16:09 crash on SysRq : Show Memory Aron Griffis
                   ` (9 preceding siblings ...)
  2004-03-09 22:49 ` Kenneth Chen
@ 2004-03-09 23:31 ` Aron Griffis
  2004-03-09 23:39 ` Andreas Schwab
  11 siblings, 0 replies; 13+ messages in thread
From: Aron Griffis @ 2004-03-09 23:31 UTC (permalink / raw)
  To: linux-ia64

Kenneth Chen wrote:	[Tue Mar 09 2004, 05:49:26PM EST]
> Anyway, hope you get the idea ....

Thanks!  That's exactly what I was trying to do but lacked the
know-how...

-- 
Aron Griffis
hp Linux and Open Source Lab
Key fingerprint = 4601 AE87 379D A917 BA62  5263 C284 0366 5E6A 3C6B



* Re: crash on SysRq : Show Memory
  2004-03-09 16:09 crash on SysRq : Show Memory Aron Griffis
                   ` (10 preceding siblings ...)
  2004-03-09 23:31 ` Aron Griffis
@ 2004-03-09 23:39 ` Andreas Schwab
  11 siblings, 0 replies; 13+ messages in thread
From: Andreas Schwab @ 2004-03-09 23:39 UTC (permalink / raw)
  To: linux-ia64

jbarnes@sgi.com (Jesse Barnes) writes:

> On Tue, Mar 09, 2004 at 10:57:48AM -0800, Grant Grundler wrote:
>> On Tue, Mar 09, 2004 at 01:47:15PM -0500, Aron Griffis wrote:
>> > Jesse Barnes wrote:	[Tue Mar 09 2004, 12:54:25PM EST]
>> > > What kind of machine are you running on?  What's your .config look like?
>> > 
>> > Long's Peak.
>> 
>> That's an RX2600 for anyone wanting to find the product on www.hp.com
>
> Ah, zx1 then.

I can reproduce it on Tiger as well.  Doesn't seem to be platform
dependent.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."
