* Re: sunbmac panic when no card present
2005-03-22 8:32 sunbmac panic when no card present Meelis Roos
@ 2005-03-23 2:42 ` Eric Brower
2005-03-23 2:55 ` Eric Brower
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Eric Brower @ 2005-03-23 2:42 UTC (permalink / raw)
To: sparclinux
I'm assuming your "qe/be" card is actually installed in this system, correct?
E
On Tue, 22 Mar 2005 10:32:48 +0200 (EET), Meelis Roos <mroos@linux.ee> wrote:
> Hi, this is 2.6.12-rc1 on Sparcstation 20 with no Sun BigMac card. I
> tried to load the sunbmac module anyway and this resulted in kernel
> panic:
>
> sunbmac.c:v2.0 24/Nov/03 David S. Miller (davem@redhat.com)
> Unable to handle kernel NULL pointer dereference
> tsk->{mm,active_mm}->context = 000005ca
> tsk->{mm,active_mm}->pgd = fc0a4400
> \|/ ____ \|/
> "@'/ ,. \`@"
> /_| \__/ |_\
> \__U_/
> modprobe(980): Oops [#1]
> PSR: 400000c3 PC: f00df4d0 NPC: f00df4d4 Y: 00000000 Not tainted
> PC: <tty_wakeup+0x4/0x64>
> %G: 00000008 f0034afc 00000001 404000e5 f0034aa0 00000000 fb864000 00000001
> %O: ffffffff 00000000 fff0f000 fff0f000 f01e8c00 f01f8400 fb865a28 f00116dc
> RPC: <__udelay+0x1c/0x24>
> %L: 40800fc4 f00116c0 f00116c4 00000010 00000000 00000000 fb864000 fe62dfff
> %I: 00000000 00049000 00000001 00000000 fe637ed4 fe637ed4 fb865a90 f0034afc
> Caller[f0034afc]: tasklet_action+0x6c/0xb8
> Caller[f00347b0]: __do_softirq+0xa0/0xc4
> Caller[f0034814]: do_softirq+0x40/0x54
> Caller[f00108e8]: patch_handler_irq+0x0/0x24
> Caller[f00cc61c]: prom_getproplen+0x4c/0x5c
> Caller[f00cc634]: prom_getproperty+0x8/0x80
> Caller[f00cc6c4]: prom_getint+0x18/0x34
> Caller[f00cc6ec]: prom_getintdefault+0xc/0x28
> Caller[fe61c2f0]: bigmac_ether_init+0x2f0/0x46c [sunbmac]
> Caller[fe61c554]: bigmac_probe+0x98/0xb4 [sunbmac]
> Caller[f004af98]: sys_init_module+0x1a4/0x23c
> Caller[f00113bc]: syscall_is_too_hard+0x34/0x40
> Caller[00012758]: 0x12758
> Instruction DUMP: 81c3e008 90023fe7 9de3bf98 <c20620b8> 80886020 12800006 b
>
> (no more instruction dump fit into minicom window)
>
> --
> Meelis Roos (mroos@linux.ee)
--
E
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: sunbmac panic when no card present
2005-03-22 8:32 sunbmac panic when no card present Meelis Roos
2005-03-23 2:42 ` Eric Brower
2005-03-23 2:55 ` Eric Brower
@ 2005-03-23 3:02 ` David S. Miller
2005-03-23 5:58 ` Meelis Roos
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: David S. Miller @ 2005-03-23 3:02 UTC (permalink / raw)
To: sparclinux
On Tue, 22 Mar 2005 18:55:38 -0800
Eric Brower <ebrower@gmail.com> wrote:
> I'm assuming your "qe/be" card is actually installed in this system, correct?
I think it is. The kernel is crashing on a prom_getproplen() call,
probably on a bogus prom node key; this is happening via prom_getintdefault().
There are three such prom_getintdefault() calls in bigmac_ether_init().
Meelis, can you try commenting them out one by one to see which one triggers
the crash? Perhaps the bp->qec_sdev or bp->bigmac_sdev pointers are
corrupt or point to structures which don't have their prom_node values
set up properly. A comparison of the prom_node values with /usr/sbin/prtconf -pv
output would determine this.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: sunbmac panic when no card present
2005-03-22 8:32 sunbmac panic when no card present Meelis Roos
` (2 preceding siblings ...)
2005-03-23 3:02 ` David S. Miller
@ 2005-03-23 5:58 ` Meelis Roos
2005-03-23 12:58 ` Meelis Roos
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Meelis Roos @ 2005-03-23 5:58 UTC (permalink / raw)
To: sparclinux
> I'm assuming your "qe/be" card is actually installed in this system, correct?
Yes.
--
Meelis Roos (mroos@linux.ee)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: sunbmac panic when no card present
2005-03-22 8:32 sunbmac panic when no card present Meelis Roos
` (3 preceding siblings ...)
2005-03-23 5:58 ` Meelis Roos
@ 2005-03-23 12:58 ` Meelis Roos
2005-03-23 14:55 ` Meelis Roos
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Meelis Roos @ 2005-03-23 12:58 UTC (permalink / raw)
To: sparclinux
> There are three such prom_getintdefault() calls in bigmac_ether_init().
> Meelis, can you try commenting them out one by one to see which one triggers
> the crash? Perhaps the bp->qec_sdev or bp->bigmac_sdev pointers are
> corrupt or point to structures which don't have their prom_node values
> set up properly. A comparison of the prom_node values with /usr/sbin/prtconf -pv
> output would determine this.
Will test now that my 2.4 compile has finished.
--
Meelis Roos
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: sunbmac panic when no card present
2005-03-22 8:32 sunbmac panic when no card present Meelis Roos
` (4 preceding siblings ...)
2005-03-23 12:58 ` Meelis Roos
@ 2005-03-23 14:55 ` Meelis Roos
2005-03-23 16:21 ` Bob Breuer
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Meelis Roos @ 2005-03-23 14:55 UTC (permalink / raw)
To: sparclinux
> There are three such prom_getintdefault() calls in bigmac_ether_init().
> Meelis, can you try commenting them out one by one to see which one triggers
> the crash? Perhaps the bp->qec_sdev or bp->bigmac_sdev pointers are
> corrupt or point to structures which don't have their prom_node values
> set up properly. A comparison of the prom_node values with /usr/sbin/prtconf -pv
> output would determine this.
I commented all of them out and replaced them with assignments of the default
values and a printk of the parameters. It still oopses, but now in another
place (tty_wakeup, as with all my oopsen).
If I removed only the first two occurrences, the oops moved to the third one
(offset changed).
This is an excerpt from the prtconf output for comparison.
Node 0xffd767d0
#channels: 00000001
ranges: 00000000.00000000.00000001.00030000.00004000.00000010.00000000.00000001.00010000.00008000.00000020.00000000.00000001.00018000.00008000
reg: 00000001.00020000.00010000.00000001.00040000.00020000
model: 'SUNW,270-2450'
name: 'qec'
Node 0xffd77960
board-version: 00000001
reg: 00000000.00000000.00004000.00000010.00000000.00008000.00000020.00000000.00008000
device_type: 'network'
intr: 00000037.00000000
interrupts: 00000004
address-bits: 00000030
max-frame-size: 00004000
channel#: 00000000
name: 'be'
ffd767d0 seems to be a correct prom_node for qec and
ffd77960 seems to be a correct prom_node for be.
ffd5c0e0 is the prom node for 'sbus' (so qec_sdev->bus->prom_node is
also OK). So nothing suspicious here?
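The same comparison can be done mechanically. Below is a minimal user-space sketch in C, assuming the prtconf output looks like the excerpt above ("Node 0x..." followed by its properties, including a "name:" line). The function name prtconf_node_name is hypothetical and not part of any existing tool; it only looks up the name property of a given node value.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan prtconf-style text for "Node 0x<hex>" equal to
 * `want` and copy the value of that node's "name:" property into `out`.
 * Returns 1 if the node and its name were found, 0 otherwise.
 */
static int prtconf_node_name(const char *text, unsigned long want,
                             char *out, size_t outlen)
{
    int in_node = 0;            /* are we inside the wanted node's block? */
    const char *p = text;

    while (*p) {
        char line[512];
        char name[64];
        unsigned long node;
        size_t n = strcspn(p, "\n");
        size_t copy = n < sizeof(line) - 1 ? n : sizeof(line) - 1;

        /* Copy one line (truncated if longer than the buffer). */
        memcpy(line, p, copy);
        line[copy] = '\0';
        p += n;
        if (*p == '\n')
            p++;

        if (sscanf(line, " Node 0x%lx", &node) == 1) {
            /* A new node header starts; remember whether it is ours. */
            in_node = (node == want);
        } else if (in_node &&
                   sscanf(line, " name: '%63[^']'", name) == 1) {
            snprintf(out, outlen, "%s", name);
            return 1;
        }
    }
    return 0;
}
```

With the excerpt above as input, looking up ffd767d0 should give 'qec' and ffd77960 should give 'be', which matches what the printk output reports for qec_sdev and bigmac_sdev.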
sunbmac.c:v2.0 24/Nov/03 David S. Miller (davem@redhat.com)
qec_sdev->prom_node=ffd767d0
qec_sdev->bus=f085e200
qec_sdev->bus->prom_node=ffd5c0e0
bigmac_sdev = f0a54800
bigmac_sdev->prom_node = ffd77960
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 0000092f
tsk->{mm,active_mm}->pgd = fc09b400
\|/ ____ \|/
"@'/ ,. \`@"
/_| \__/ |_\
\__U_/
khelper(1395): Oops [#1]
PSR: 400010c6 PC: f00df4d0 NPC: f00df4d4 Y: 00000000 Not tainted
PC: <tty_wakeup+0x4/0x64>
%G: 00000000 f0034afc 00000001 404010e0 f0034aa0 00000000 fac12000 00000001
%O: ffffffff 00000000 00000000 00000000 00000001 fb8d68a0 fac13b00 f00116dc
RPC: <__udelay+0x1c/0x24>
%L: fbe00f8c fbe00f34 d034fa4d 00000008 fb341006 fbe00f34 fac12000 f0011098
%I: 00000000 00049000 00000003 00000924 00000004 00000004 fac13b68 f0034afc
Caller[f0034afc]: tasklet_action+0x6c/0xb8
Caller[f00347b0]: __do_softirq+0xa0/0xc4
Caller[f0034814]: do_softirq+0x40/0x54
Caller[f00108e8]: patch_handler_irq+0x0/0x24
Caller[f00774e0]: count+0x68/0x84
Caller[f00787c0]: do_execve+0xa0/0x1d8
Caller[f0014c9c]: sparc_execve+0x44/0x84
Caller[f00113bc]: syscall_is_too_hard+0x34/0x40
Caller[f003fba8]: ____call_usermodehelper+0x30/0xa4
Caller[f0014d10]: kernel_thread+0x34/0x50
Caller[f003fcd0]: __call_usermodehelper+0x34/0x74
Caller[f00400f8]: worker_thread+0x19c/0x258
Caller[f0044988]: kthread+0x9c/0xb4
Caller[f0014d10]: kernel_thread+0x34/0x50
Caller[f00449b0]: keventd_create_kthread+0x10/0x50
Caller[00000004]: 0x4
Instruction DUMP: 81c3e008 90023fe7 9de3bf98 <c20620b8> 80886020 12800006 b
Kernel panic - not syncing: Aiee, killing interrupt handler!
<0>Press L1-A to return to the boot prom
--
Meelis Roos (mroos@linux.ee)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: sunbmac panic when no card present
2005-03-22 8:32 sunbmac panic when no card present Meelis Roos
` (5 preceding siblings ...)
2005-03-23 14:55 ` Meelis Roos
@ 2005-03-23 16:21 ` Bob Breuer
2005-03-23 21:17 ` Meelis Roos
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Bob Breuer @ 2005-03-23 16:21 UTC (permalink / raw)
To: sparclinux
Meelis Roos wrote:
> PSR: 400010c6 PC: f00df4d0 NPC: f00df4d4 Y: 00000000 Not tainted
> PC: <tty_wakeup+0x4/0x64>
> %G: 00000000 f0034afc 00000001 404010e0 f0034aa0 00000000 fac12000 00000001
> %O: ffffffff 00000000 00000000 00000000 00000001 fb8d68a0 fac13b00 f00116dc
> RPC: <__udelay+0x1c/0x24>
> %L: fbe00f8c fbe00f34 d034fa4d 00000008 fb341006 fbe00f34 fac12000 f0011098
> %I: 00000000 00049000 00000003 00000924 00000004 00000004 fac13b68 f0034afc
> Caller[f0034afc]: tasklet_action+0x6c/0xb8
> Caller[f00347b0]: __do_softirq+0xa0/0xc4
> Caller[f0034814]: do_softirq+0x40/0x54
Looks like some tasklet is calling tty_wakeup with a NULL tty. I have
seen this before when using serial console, but didn't track it down
once I got a framebuffer console working.
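The oops itself seems to support the NULL tty reading. The faulting word marked in the instruction dump is c20620b8, at tty_wakeup+0x4, right after the save (9de3bf98). The sketch below decodes the fields of a SPARC V8 format-3 instruction with an immediate operand; the struct and function names are made up for illustration, only the bit layout comes from the architecture manual.

```c
#include <stdint.h>

/* Fields of a SPARC V8 format-3 instruction (immediate form). */
struct sparc_fmt3 {
    unsigned op;     /* bits 31:30, 3 = load/store class */
    unsigned rd;     /* bits 29:25, destination register */
    unsigned op3;    /* bits 24:19, 0 = ld (load word) when op == 3 */
    unsigned rs1;    /* bits 18:14, base register */
    unsigned i;      /* bit 13, 1 = simm13 is used as the offset */
    int simm13;      /* bits 12:0, sign-extended */
};

/* Split a 32-bit instruction word into its format-3 fields. */
static struct sparc_fmt3 sparc_decode_fmt3(uint32_t insn)
{
    struct sparc_fmt3 f;

    f.op = insn >> 30;
    f.rd = (insn >> 25) & 0x1f;
    f.op3 = (insn >> 19) & 0x3f;
    f.rs1 = (insn >> 14) & 0x1f;
    f.i = (insn >> 13) & 1;
    f.simm13 = (int)(insn & 0x1fff);
    if (f.simm13 & 0x1000)          /* sign bit of the 13-bit immediate */
        f.simm13 -= 0x2000;
    return f;
}

/* Map a register number 0..31 to its windowed name. */
static const char *sparc_reg_name(unsigned r)
{
    static const char *const names[32] = {
        "%g0", "%g1", "%g2", "%g3", "%g4", "%g5", "%g6", "%g7",
        "%o0", "%o1", "%o2", "%o3", "%o4", "%o5", "%o6", "%o7",
        "%l0", "%l1", "%l2", "%l3", "%l4", "%l5", "%l6", "%l7",
        "%i0", "%i1", "%i2", "%i3", "%i4", "%i5", "%i6", "%i7",
    };
    return names[r & 31];
}
```

Fed with 0xc20620b8 this comes out as op 3, op3 0, rd %g1, rs1 %i0 and an offset of 0xb8, that is "ld [%i0 + 0xb8], %g1". After the save, %i0 holds the first argument, and the first value in the %I line of the dump is 00000000, so the load goes through a NULL first argument.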
Looking into it a little more now, uart_tasklet_action() in
drivers/serial/serial_core.c looks suspicious. Meelis, can you add a
BUG_ON(state->info->tty==NULL) in the middle of uart_tasklet_action()
and see if it dies on that instead?
Bob
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: sunbmac panic when no card present
2005-03-22 8:32 sunbmac panic when no card present Meelis Roos
` (6 preceding siblings ...)
2005-03-23 16:21 ` Bob Breuer
@ 2005-03-23 21:17 ` Meelis Roos
2005-03-24 3:22 ` Bob Breuer
2005-03-24 6:37 ` Meelis Roos
9 siblings, 0 replies; 11+ messages in thread
From: Meelis Roos @ 2005-03-23 21:17 UTC (permalink / raw)
To: sparclinux
> Looking into it a little more now, uart_tasklet_action() in
> drivers/serial/serial_core.c looks suspicious. Meelis, can you add a
> BUG_ON(state->info->tty==NULL) in the middle of uart_tasklet_action() and see
> if it dies on that instead?
Yes, this BUG triggers and produces
sunbmac.c:v2.0 24/Nov/03 David S. Miller (davem@redhat.com)
eth1: BigMAC 100baseT Ethernet 08:00:20:72:8b:a1
kernel BUG at drivers/serial/serial_core.c:111!
\|/ ____ \|/
"@'/ ,. \`@"
/_| \__/ |_\
\__U_/
klogd(344): Kernel bad trap [#1]
PSR: 404000c7 PC: f00f4a80 NPC: f00f4a84 Y: 00000000 Not tainted
PC: <uart_tasklet_action+0x2c/0x30>
%G: 0000006f f0195c00 f0195efc 40400fe5 f002f084 f0195c00 fbe14000 00000000
%O: 00000033 f017ea60 0000006f 00000000 00000000 501509e0 fbe15998 f00f4a78
RPC: <uart_tasklet_action+0x24/0x30>
%L: 40800fc0 f0011530 f0011534 00000001 00000000 65646861 fbe14000 50171e9c
%I: 00000000 00049000 fbe15de8 fbe15f38 50173de4 00000057 fbe15a00 f0034894
Caller[f0034894]: tasklet_action+0x6c/0xb8
Caller[f0034548]: __do_softirq+0xa0/0xc4
Caller[f00345ac]: do_softirq+0x40/0x54
Caller[f0010758]: patch_handler_irq+0x0/0x24
Caller[f01236e8]: memcpy_fromiovec+0xa8/0xc0
Caller[fe61a9bc]: unix_dgram_sendmsg+0x180/0x4ec [unix]
Caller[f011dd78]: sock_aio_write+0xe8/0x104
Caller[f006caa0]: do_sync_write+0x88/0xb4
Caller[f006cbd0]: vfs_write+0x104/0x11c
Caller[f006cc7c]: sys_write+0x30/0x64
Caller[f001122c]: syscall_is_too_hard+0x34/0x40
Caller[00012534]: 0x12534
Instruction DUMP: 9210206f 7ffc7810 90122260 <91d02005> 9de3bf98 e0062010 e2062014 c2042010 80a06000
Kernel panic - not syncing: Aiee, killing interrupt handler!
--
Meelis Roos (mroos@linux.ee)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: sunbmac panic when no card present
2005-03-22 8:32 sunbmac panic when no card present Meelis Roos
` (7 preceding siblings ...)
2005-03-23 21:17 ` Meelis Roos
@ 2005-03-24 3:22 ` Bob Breuer
2005-03-24 6:37 ` Meelis Roos
9 siblings, 0 replies; 11+ messages in thread
From: Bob Breuer @ 2005-03-24 3:22 UTC (permalink / raw)
To: sparclinux
Meelis Roos wrote:
>> Looking into it a little more now, uart_tasklet_action() in
>> drivers/serial/serial_core.c looks suspicious. Meelis, can you add a
>> BUG_ON(state->info->tty==NULL) in the middle of uart_tasklet_action()
>> and see if it dies on that instead?
>
> Yes, this BUG triggers
At least I'm looking in the right place, even if I don't know what I'm
looking for.
Ok, one more thing to try. Still in serial_core.c, let's also add a
BUG_ON(info->tty==NULL) in the middle of uart_write_wakeup(). I don't
expect this bug to trigger. My current guess is that the tty is closed
between scheduling and running the tasklet. How about adding a printk
after each time where info->tty is set: near line 1224 in uart_close(),
line 1320 in uart_hangup(), and line 1538 in uart_open(). If my guess
is right, we will see a uart_close just before the BUG from the tasklet.
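To make the suspected ordering concrete, here is a small user-space model in C. It is not kernel code and not a proposed patch; every name in it (tty_model, uart_info_model, the model_* functions) is hypothetical. It only mimics "wakeup schedules a tasklet, close clears info->tty, tasklet runs later", with a NULL check standing in for whatever the real fix turns out to be.

```c
#include <stddef.h>

/* Stand-in for the tty; counts how many wakeups it received. */
struct tty_model {
    int wakeups;
};

/* Stand-in for the per-port state: tty pointer plus a pending tasklet. */
struct uart_info_model {
    struct tty_model *tty;
    int tasklet_pending;
};

/* Models the wakeup request: only schedules the deferred work. */
static void model_write_wakeup(struct uart_info_model *info)
{
    info->tasklet_pending = 1;
}

/* Models close: clears the tty pointer, leaves pending work alone. */
static void model_close(struct uart_info_model *info)
{
    info->tty = NULL;
}

/*
 * Models the deferred work. Returns 1 if a wakeup was delivered and 0 if
 * nothing was pending or the tty had already gone away.
 */
static int model_tasklet_action(struct uart_info_model *info)
{
    if (!info->tasklet_pending)
        return 0;
    info->tasklet_pending = 0;

    /* Without this check the next line would dereference NULL. */
    if (info->tty == NULL)
        return 0;

    info->tty->wakeups++;
    return 1;
}
```

Calling model_write_wakeup(), then model_close(), then model_tasklet_action() is the ordering guessed at above: the deferred work finds tty already NULL. In the model the check simply skips the wakeup; whether the driver should check, or cancel the pending tasklet on close instead, is left open here.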
Bob
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: sunbmac panic when no card present
2005-03-22 8:32 sunbmac panic when no card present Meelis Roos
` (8 preceding siblings ...)
2005-03-24 3:22 ` Bob Breuer
@ 2005-03-24 6:37 ` Meelis Roos
9 siblings, 0 replies; 11+ messages in thread
From: Meelis Roos @ 2005-03-24 6:37 UTC (permalink / raw)
To: sparclinux
> Ok, one more thing to try. Still in serial_core.c, let's also add a
> BUG_ON(info->tty==NULL) in the middle of uart_write_wakeup(). I don't expect
> this bug to trigger. My current guess is that the tty is closed between
> scheduling and running the tasklet. How about adding a printk after each
> time where info->tty is set: near line 1224 in uart_close(), line 1320 in
> uart_hangup(), and line 1538 in uart_open(). If my guess is right, we will
> see a uart_close just before the BUG from the tasklet.
Sounds quite possible. This reminds me that when the machine starts up,
minicom changes the serial status from "Offline" to "Online"; it remains
Online during bootup but is Offline during normal work. It seems to
work though - when I do a shutdown, I see all the messages scrolling by.
With these printks and the additional BUG check, the new BUG triggers
very early on boot, during init startup:
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 104k freed
uart_open setting info->tty = f0846000
uart_open setting info->tty = f0846000
uart_close setting info->tty = NULL
uart_open setting info->tty = f0846000
uart_close setting info->tty = NULL
kernel BUG at drivers/serial/serial_core.c:73!
\|/ ____ \|/
"@'/ ,. \`@"
/_| \__/ |_\
\__U_/
init(1): Kernel bad trap [#1]
PSR: 40400fc4 PC: f00f49a8 NPC: f00f49ac Y: 00000000 Not tainted
PC: <uart_write_wakeup+0x54/0x60>
%G: 00000049 f0195c00 f0195fc4 40400fe2 f002f084 f0195c00 f0862000 00000000
%O: 00000032 f017eaa8 00000049 500a4880 500197c8 00000314 f0863db0 f00f49a0
RPC: <uart_write_wakeup+0x4c/0x60>
%L: 40800fc5 f0011530 f0011534 00000020 0000000c 6d610000 f0862000 50171e9c
%I: f01d1f20 00049000 00000004 80808080 00000001 00000050 f0863e18 f00f900c
Caller[f00f900c]: sunzilog_transmit_chars+0xf8/0x180
Caller[f00f91c8]: sunzilog_interrupt+0x134/0x174
Caller[f0012f58]: handler_irq+0x78/0xa4
Caller[f0010758]: patch_handler_irq+0x0/0x24
Caller[500097e0]: 0x500097e0
Instruction DUMP: 92102049 7ffc7846 901222a8 <91d02005> 81c7e008 81e80000 9
Kernel panic - not syncing: Aiee, killing interrupt handler!
--
Meelis Roos (mroos@linux.ee)
^ permalink raw reply [flat|nested] 11+ messages in thread